mergennachin
Contributor

Problem

The tokenizers library fails to build on aarch64 Linux systems during CI with:
/opt/rh/gcc-toolset-13/root/usr/libexec/gcc/aarch64-redhat-linux/13/ld: cannot find -latomic: No such file or directory

Root Cause

  • On aarch64, certain atomic operations require explicit linking with libatomic
  • The sentencepiece dependency detects libatomic (Found atomic: /usr/lib64/libatomic.so.1) but incorrectly adds just the string "atomic" to link flags instead of the actual library path
  • This causes the linker to fail when building the Python extension
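
To make the distinction in the second bullet concrete, here is a minimal sketch of the two link forms, assuming a hypothetical target `demo_ext` and variable `DEMO_ATOMIC_LIB` (neither name comes from this PR). A bare library name becomes `-latomic`, which the linker resolves by searching for `libatomic.so` or `libatomic.a`; a full path from `find_library` is normally handed to the linker as a file. The error above would be consistent with only the versioned `libatomic.so.1` being installed, but that is an assumption, not something the CI log confirms.

```cmake
cmake_minimum_required(VERSION 3.18)
project(atomic_link_forms LANGUAGES CXX)

# Hypothetical target, used only for illustration.
add_library(demo_ext INTERFACE)

# Form 1: bare library name. CMake emits `-latomic`, so the linker must
# locate libatomic.so or libatomic.a on its own search path.
# target_link_libraries(demo_ext INTERFACE atomic)

# Form 2: full path. find_library stores the path of the file it found
# (or DEMO_ATOMIC_LIB-NOTFOUND, which evaluates to false in if()).
find_library(DEMO_ATOMIC_LIB NAMES atomic libatomic.so.1)
if(DEMO_ATOMIC_LIB)
  target_link_libraries(demo_ext INTERFACE ${DEMO_ATOMIC_LIB})
endif()
```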

Solution

Added proper libatomic detection and linking for aarch64/arm64 systems in two places:

  1. For the tokenizers static library
  2. For the pytorch_tokenizers_cpp Python extension module

The fix uses CMake's find_library to locate libatomic and explicitly links it when building on aarch64/arm64 architectures.
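
A rough sketch of that pattern for one target follows. The target name `demo_tokenizers` is a hypothetical stand-in, the `PUBLIC`/`INTERFACE` choice is an assumption, and the actual change in this PR may differ in detail; the `find_library` names are the ones shown in the snippet under review.

```cmake
cmake_minimum_required(VERSION 3.18)
project(atomic_aarch64_sketch LANGUAGES CXX)

# Hypothetical stand-in for the real static library target.
add_library(demo_tokenizers INTERFACE)

# Only look for libatomic on 64-bit ARM, where the PR reports the failure.
if(CMAKE_SYSTEM_PROCESSOR MATCHES "aarch64|arm64")
  find_library(ATOMIC_LIB NAMES atomic libatomic.so libatomic.so.1)
  if(ATOMIC_LIB)
    message(STATUS "Linking libatomic: ${ATOMIC_LIB}")
    # Link by the path that was found rather than by bare name.
    target_link_libraries(demo_tokenizers INTERFACE ${ATOMIC_LIB})
  else()
    message(STATUS "libatomic not found; not adding it to the link line")
  endif()
endif()
```

Per the description, the same block is applied a second time for the Python extension module.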

@meta-cla bot added the CLA Signed label on Sep 8, 2025
@mergennachin force-pushed the fix_tokenizers_aarch64 branch from 523fe5f to 4d8a9a0 on September 8, 2025 15:39
@mergennachin force-pushed the fix_tokenizers_aarch64 branch from 4d8a9a0 to 089ff80 on September 8, 2025 16:52
@mergennachin force-pushed the fix_tokenizers_aarch64 branch from 089ff80 to 3e608b5 on September 8, 2025 17:05
@swolchok
Contributor

> incorrectly adds just the string "atomic" to link flags instead of the actual library path

Hmm, target_link_libraries(mything atomic) sounds to me like the correct way to get -latomic onto the link line. A code pointer would be helpful here.

# Link with atomic library on aarch64/arm64 systems
# Some aarch64 systems require explicit linking with libatomic for certain atomic operations
if(CMAKE_SYSTEM_PROCESSOR MATCHES "aarch64|arm64")
find_library(ATOMIC_LIB NAMES atomic libatomic.so libatomic.so.1)
Contributor

Why do we have to duplicate (1) the find_library call and (2) the gating? Can't we just use if(ATOMIC_LIB), since the previous lookup, on lines 97-105, is unconditional?
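
One way to read this suggestion, sketched under assumptions: a single unconditional `find_library` call, whose result both link sites then test with `if(ATOMIC_LIB)`. The target names `demo_tokenizers` and `demo_python_ext` are hypothetical stand-ins, and whether the earlier lookup in the real file is in fact unconditional is taken from the comment above, not verified here.

```cmake
cmake_minimum_required(VERSION 3.18)
project(atomic_single_lookup LANGUAGES CXX)

# Hypothetical stand-ins for the static library and the Python extension.
add_library(demo_tokenizers INTERFACE)
add_library(demo_python_ext INTERFACE)

# Single lookup, no architecture gate. The result is a cache variable, so
# later code (including subdirectories) can test it without searching again.
find_library(ATOMIC_LIB NAMES atomic libatomic.so libatomic.so.1)

# First link site.
if(ATOMIC_LIB)
  target_link_libraries(demo_tokenizers INTERFACE ${ATOMIC_LIB})
endif()

# Second link site reuses the same variable instead of repeating the search.
if(ATOMIC_LIB)
  target_link_libraries(demo_python_ext INTERFACE ${ATOMIC_LIB})
endif()
```

A trade-off of dropping the architecture check is that libatomic gets linked wherever it is found, including platforms that may not need it.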

@larryliu0820
Contributor

Please address comments before merging
