@randomm randomm commented Jun 14, 2025

What does this PR do?

This PR adds a new Dockerfile-cpu-amd to resolve Intel MKL compatibility issues on AMD processors, specifically addressing the SGEMM errors reported when running Qwen3 embedding models on AMD CPUs.

Problem

Users running text-embeddings-inference on AMD processors encounter Intel MKL errors:

  • "Intel MKL ERROR: Parameter 8 was incorrect on entry to SGEMM"
  • "Intel MKL ERROR: Parameter 13 was incorrect on entry to SGEMM"

This occurs because Intel MKL is optimized for Intel processors and has compatibility issues with AMD architectures.

Solution

  • Adds Dockerfile-cpu-amd: A new specialized Dockerfile following the project's existing pattern (similar to Dockerfile-cuda, Dockerfile-intel)
  • Removes Intel MKL dependencies: Uses generic BLAS libraries (libomp-dev) instead of Intel MKL
  • Maintains compatibility: No changes to existing Dockerfiles or functionality
  • Clean and minimal: Only adds the essential Dockerfile without additional workflow files

Key Changes

  • New file: Dockerfile-cpu-amd - AMD-compatible CPU Dockerfile without Intel MKL

Testing

Successfully tested on AMD server - no more Intel MKL errors

  • Image builds successfully
  • Qwen3 embedding models now work correctly on AMD processors
  • Performance is not awesome (Qwen3 0.6B model), but at least it runs on AMD chips now. Tested on AWS t3a.2xlarge instances.

Testing Results (from simple shell script)

Ran with concurrency of 3 (so not a lot!)

Total Requests: 100
Successful: 100
Failed: 0
Success Rate: 100.0%

Overall Performance (successful requests):
Average Response Time: 3.690 seconds
Median Response Time: 4.000 seconds
Min Response Time: 1.000 seconds
Max Response Time: 6.000 seconds

Performance by Text Length:

Short (1 word) ( 1 tokens):
Count: 20, Avg: 4.350s, Min: 3.000s, Max: 5.000s
Medium (3 words) ( 3 tokens):
Count: 20, Avg: 3.600s, Min: 1.000s, Max: 4.000s
Question (13 words) ( 12 tokens):
Count: 20, Avg: 2.600s, Min: 2.000s, Max: 4.000s
Paragraph (47 words) ( 42 tokens):
Count: 20, Avg: 3.100s, Min: 2.000s, Max: 6.000s
Long text (95 words) ( 74 tokens):
Count: 20, Avg: 4.800s, Min: 4.000s, Max: 6.000s

Total Test Duration: 126.00 seconds
Requests per Second: 0.79
Total Tokens Processed: 2640
Tokens per Second: 20.95

Note: most time is spent queueing.
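
As a sanity check, the summary figures can be re-derived from the per-bucket rows. The sketch below copies the reported numbers and recomputes the totals. It also applies Little's law (mean in-flight requests = throughput × mean latency), which was not part of the original test script and is added here only as an illustration.

```python
# Per-bucket figures copied from the reported results:
# (label, tokens per request, request count, average seconds)
buckets = [
    ("Short", 1, 20, 4.350),
    ("Medium", 3, 20, 3.600),
    ("Question", 12, 20, 2.600),
    ("Paragraph", 42, 20, 3.100),
    ("Long text", 74, 20, 4.800),
]
total_duration_s = 126.00  # "Total Test Duration"

total_requests = sum(count for _, _, count, _ in buckets)
total_tokens = sum(tokens * count for _, tokens, count, _ in buckets)

# Request-weighted mean of the per-bucket averages.
avg_latency_s = sum(avg * count for _, _, count, avg in buckets) / total_requests

requests_per_s = total_requests / total_duration_s
tokens_per_s = total_tokens / total_duration_s

# Little's law: mean number of requests in flight = throughput * mean latency.
in_flight = requests_per_s * avg_latency_s

print(f"requests: {total_requests}, tokens: {total_tokens}")
print(f"avg latency: {avg_latency_s:.3f} s")
print(f"requests/s: {requests_per_s:.2f}, tokens/s: {tokens_per_s:.2f}")
print(f"mean in-flight requests: {in_flight:.2f}")
```

The recomputed totals match the report (100 requests, 2640 tokens, 3.690 s average, 0.79 requests/s, 20.95 tokens/s). The in-flight estimate comes out just under 3, matching the client concurrency. The three clients were therefore busy almost continuously, which is consistent with the queueing note if the server handled requests roughly one at a time.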

Fixes #636

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

@OlivierDehaene @Narsil

This follows the project's existing pattern of specialized Dockerfiles and provides a clean solution for AMD CPU compatibility without affecting existing functionality.

…dependencies that cause SGEMM errors on AMD processors - Uses generic BLAS (libomp-dev) instead of Intel MKL - Follows project's existing pattern of specialized Dockerfiles - Resolves issue huggingface#636
@randomm randomm force-pushed the fix-dockerfile-issue branch from 296396e to c0a00f1 Compare June 14, 2025 18:01

polarathene commented Jun 22, 2025

Nothing in this image is specific to AMD. There's already a specialized image for Intel with MKL, so it's unclear why MKL leaked into the default Docker image.

What you're trying to fix here is also an issue for other CPUs, such as Apple's. The default Dockerfile should have those changes reverted to make it vendor-agnostic; it doesn't help that the MKL addition is also x86_64-specific.

Related issue: #611 (comment)

randomm commented Jun 23, 2025

Got it, @polarathene... if the Intel stuff is reverted from the default image, then this PR is pointless.

@alvarobartt alvarobartt mentioned this pull request Jul 9, 2025
4 tasks

choronz commented Aug 4, 2025

@randomm

Seems to be broken on AMD Ryzen™ 7 8845HS

#667 (comment)

alvarobartt (Member) commented

Hey @choronz, @randomm, @polarathene!

First of all, thanks @randomm for opening the PR and being willing to contribute to Text Embeddings Inference, really appreciate it 🤗

Then, given that we're using libfakeintel.so, as per the line

COPY --from=builder /usr/src/libfakeintel.so /usr/local/libfakeintel.so

it should work out of the box on AMD CPUs too as of the latest release, Text Embeddings Inference v1.8.2. More information on libfakeintel.so for AMD CPU support is available at https://danieldk.eu/Intel-MKL-on-AMD-Zen from @danieldk 🤗

If you read the release notes, you'll see that the issue was not with the Dockerfile per se, but rather with the linking for intel-mkl-src not working properly, which caused the whole thing to fail. That shouldn't be the case anymore.

That being said, I'd suggest re-running whatever workload was failing before (e.g. with "Intel MKL ERROR: Parameter 8 was incorrect on entry to SGEMM") and checking whether the error persists with ghcr.io/huggingface/text-embeddings-inference:cpu-1.8.2.
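
For anyone re-testing, a minimal sketch of a single request against a running container might look like the following. It uses TEI's /embed route with an {"inputs": ...} JSON body. The TEI_URL environment variable, the example host and port, and the sample sentence are assumptions made for this illustration and depend on how the container was started. If the MKL error persists, it would be expected to show up in the container's log output, so a successful HTTP response alone may not be sufficient proof.

```python
import json
import os
import urllib.request


def build_embed_request(base_url: str, text: str) -> urllib.request.Request:
    """Build a POST request for the /embed route of a TEI server."""
    body = json.dumps({"inputs": text}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/embed",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(base_url: str, text: str, timeout_s: float = 60.0) -> list:
    """Send one embedding request and return the decoded JSON response."""
    request = build_embed_request(base_url, text)
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.loads(response.read().decode("utf-8"))


# TEI_URL is a hypothetical variable used only by this sketch,
# e.g. TEI_URL=http://localhost:8080 if the container port is published there.
tei_url = os.environ.get("TEI_URL")
if tei_url:
    vectors = embed(tei_url, "What is Deep Learning?")
    print(f"received {len(vectors)} embedding(s) of dimension {len(vectors[0])}")
else:
    print("Set TEI_URL to the address of a running TEI container to send a request.")
```

Running this once against the cpu-1.8.2 image and once against the previously failing image, while watching the container logs, should show whether the SGEMM message is gone.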

See below the (partial) output of lscpu on an AMD CPU instance that I just used for testing:

$ lscpu
Architecture:             x86_64
  CPU op-mode(s):         32-bit, 64-bit
  Address sizes:          48 bits physical, 48 bits virtual
  Byte Order:             Little Endian
CPU(s):                   16
  On-line CPU(s) list:    0-15
Vendor ID:                AuthenticAMD
  Model name:             AMD EPYC 7R13 Processor
    CPU family:           25
    Model:                1
    Thread(s) per core:   2
    Core(s) per socket:   8
    Socket(s):            1
    Stepping:             1
    BogoMIPS:             5300.00
    Flags:                fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall
                          nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf
                          tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand
                          hypervisor lahf_lm cmp_legacy cr8_legacy abm sse4a misalignsse 3dnowprefetch topoext ssbd ibrs ibpb stibp
                          vmmcall fsgsbase bmi1 avx2 smep bmi2 invpcid rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1
                          clzero xsaveerptr rdpru wbnoinvd arat npt nrip_save vaes vpclmulqdq rdpid
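
When reporting results from other machines, it can help to state the CPU vendor and the relevant SIMD flags explicitly. The sketch below parses lscpu-style "Key: value" text and pulls out those fields. The helper names parse_lscpu and summarize_cpu are hypothetical, and the embedded sample is an abbreviated copy of the output above with only a subset of the flags.

```python
def parse_lscpu(text: str) -> dict[str, str]:
    """Parse lscpu-style 'Key: value' lines into a dictionary."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a 'Key:' prefix
            fields[key.strip()] = value.strip()
    return fields


def summarize_cpu(fields: dict[str, str]) -> dict:
    """Extract the vendor, model, and a few SIMD flags of interest."""
    flags = set(fields.get("Flags", "").split())
    return {
        "vendor": fields.get("Vendor ID", "unknown"),
        "model": fields.get("Model name", "unknown"),
        "is_amd": fields.get("Vendor ID") == "AuthenticAMD",
        "avx2": "avx2" in flags,
        "fma": "fma" in flags,
    }


# Abbreviated sample based on the lscpu output above (flags list shortened).
SAMPLE = """\
Architecture:             x86_64
CPU(s):                   16
  On-line CPU(s) list:    0-15
Vendor ID:                AuthenticAMD
  Model name:             AMD EPYC 7R13 Processor
    Flags:                fpu sse sse2 ssse3 fma sse4_1 sse4_2 avx f16c avx2
"""

summary = summarize_cpu(parse_lscpu(SAMPLE))
for name, value in summary.items():
    print(f"{name}: {value}")
```

On a real machine, the same functions could be fed the text of `lscpu` captured from a pipe, where the flags are not wrapped across lines as they are in a terminal.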

Hope that helps resolve the issue 🤗

Successfully merging this pull request may close these issues.

TEI CPU inference fails with Intel MKL errors on AMD processors when running Qwen3 embedding models