Skip to content

Conversation

madhav-madhusoodanan
Copy link
Contributor

Context

This is a redo of PR #1814, since a lot of details have changed with PRs #1863, #1862, #1861, #1852.

r? @folkertdev
cc: @Amanieu

@madhav-madhusoodanan madhav-madhusoodanan force-pushed the intrinsic-test-x86-addition branch 2 times, most recently from 41db5a8 to 5fc0f3b Compare August 5, 2025 10:16
Comment on lines +199 to +235
match str::parse::<u32>(etype_processed.as_str()) {
Ok(value) => data.bit_len = Some(value),
Err(_) => {
data.bit_len = match data.kind() {
TypeKind::Char(_) => Some(8),
TypeKind::BFloat => Some(16),
TypeKind::Int(_) => Some(32),
TypeKind::Float => Some(32),
_ => None,
};
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are only some type kinds covered here? Maybe this could be a method on TypeKind?

@madhav-madhusoodanan madhav-madhusoodanan force-pushed the intrinsic-test-x86-addition branch from 9e28106 to 111cd5d Compare August 5, 2025 16:22
Copy link
Contributor

@folkertdev folkertdev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you should rebase on top of the upstream master branch instead of merging it in. That keeps the git history clean.

ci/run.sh Outdated
Comment on lines 188 to 195
x86_64-unknown-linux-gnu*)
CPPFLAGS="${TEST_CPPFLAGS}" RUSTFLAGS="${HOST_RUSTFLAGS}" RUST_LOG=warn \
cargo run "${INTRINSIC_TEST}" "${PROFILE}" \
--bin intrinsic-test -- intrinsics_data/x86-intel.xml \
--runner "${TEST_RUNNER}" \
--cppcompiler "${TEST_CXX_COMPILER}" \
--target "${TARGET}"
;;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we'll have to see how to do this exactly, but we want to split these out of the main CI job to speed it up

@madhav-madhusoodanan madhav-madhusoodanan force-pushed the intrinsic-test-x86-addition branch from f3f87f2 to 2ec747c Compare August 9, 2025 12:20
@madhav-madhusoodanan madhav-madhusoodanan force-pushed the intrinsic-test-x86-addition branch from 2ec747c to a8313d0 Compare September 5, 2025 08:16
@madhav-madhusoodanan
Copy link
Contributor Author

Seems like the CI run at this point failed due to this error. I'll retry shortly:

#6 39.45   Could not connect to archive.ubuntu.com:80 (185.125.190.82), connection timed out Could not connect to archive.ubuntu.com:80 (185.125.190.83), connection timed out Could not connect to archive.ubuntu.com:80 (91.189.91.83), connection timed out Could not connect to archive.ubuntu.com:80 (185.125.190.81), connection timed out Could not connect to archive.ubuntu.com:80 (91.189.91.81), connection timed out Could not connect to archive.ubuntu.com:80 (185.125.190.36), connection timed out Could not connect to archive.ubuntu.com:80 (185.125.190.39), connection timed out Could not connect to archive.ubuntu.com:80 (91.189.91.82), connection timed out
#6 39.45   Unable to connect to archive.ubuntu.com:80:
#6 39.45 Fetched 126 kB in 39s (3208 B/s)
#6 39.45 Reading package lists...
#6 39.46 W: Failed to fetch http://archive.ubuntu.com/ubuntu/dists/questing/InRelease  Unable to connect to archive.ubuntu.com:80:
#6 39.46 W: Failed to fetch http://archive.ubuntu.com/ubuntu/dists/questing-updates/InRelease  Unable to connect to archive.ubuntu.com:80:
#6 39.46 W: Failed to fetch http://archive.ubuntu.com/ubuntu/dists/questing-backports/InRelease  Unable to connect to archive.ubuntu.com:80:
#6 39.46 W: Some index files failed to download. They have been ignored, or old ones used instead.

@madhav-madhusoodanan madhav-madhusoodanan force-pushed the intrinsic-test-x86-addition branch 9 times, most recently from a119be1 to 0f3900b Compare September 8, 2025 19:25
@folkertdev
Copy link
Contributor

This is looking good code-wise I think, now it just needs to work :)

Running `/intel-sde/sde64 -cpuid-in /checkout/ci/docker/x86_64-unknown-linux-gnu/cpuid.def -rtm-mode full -tsx -- target/release/intrinsic-test intrinsics_data/x86-intel.xml --runner '/intel-sde/sde64             -cpuid-in /checkout/ci/docker/x86_64-unknown-linux-gnu/cpuid.def             -rtm-mode full -tsx --' --cppcompiler clang++-19 --target x86_64-unknown-linux-gnu`

Looks like the command is malformed? Where does that inner --runner come from, that doesn't look right.

CPPFLAGS="${TEST_CPPFLAGS}" RUSTFLAGS="${HOST_RUSTFLAGS}" RUST_LOG=warn \
cargo run "${INTRINSIC_TEST}" "${PROFILE}" \
--bin intrinsic-test -- intrinsics_data/x86-intel.xml \
--runner "${TEST_RUNNER}" \
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@folkertdev It seems like the error comes from here, but I'm not sure how the outer runner command comes to be.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, right, that comes from

ENV CARGO_TARGET_X86_64_UNKNOWN_LINUX_GNU_RUNNER="/intel-sde/sde64 \
-cpuid-in /checkout/ci/docker/x86_64-unknown-linux-gnu/cpuid.def \
-rtm-mode full -tsx --"

In other words, any cargo run is actually run with that runner. That is useful for the unit tests, but not here where we actually want to run the intrinsic-test binary on the host. On aarch64 etc. we don't configure the runner.

It might work if you add CARGO_TARGET_X86_64_UNKNOWN_LINUX_GNU_RUNNER="" to this command?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apparently CARGO_TARGET_X86_64_UNKNOWN_LINUX_GNU_RUNNER cannot have a zero size value, so I set it to "true" instead

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that'll need a comment at least

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I read more on what "true" does.

Apparently, it simply exits successfully and does nothing else. I think this was the reason why the CI test for x86 passed, because it did not run in the first place.

Thank you so much for pointing out this one. I've changed it to be 'bash" instead.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still I think it's useful here to mention that you're resetting the runner, because we want to run this binary directly on the host

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I've mentioned it in a comment right above this highlighted section.

By the way, even bash failed because it cannot execute binary files. I think eval might be a better command for this purpose.

@madhav-madhusoodanan madhav-madhusoodanan force-pushed the intrinsic-test-x86-addition branch 4 times, most recently from eadffb0 to d64935f Compare September 16, 2025 10:17
@madhav-madhusoodanan
Copy link
Contributor Author

Hang on, it isn't supposed to work like this.
Let me check if there are errors that we are not catching.

"{indentation}const {ty} {name}_vals[] = {values};",
"{indentation}alignas(64) const {ty} {name}_vals[] = {values};",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interesting, I suppose we're casting values to simd vectors and then performing some sort of aligned read? In any case, can you leave a comment on why the alignment is required?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really, I added this as a general precautionary measure.

Fortunately when I was working on x86, there were intrinsics that helped with unaligned reads (like _mm_loadu_epi16 and the like), but I'm not sure if the same could be told for other architectures too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But yes, I'll add a comment on the same.

Copy link
Contributor Author

@madhav-madhusoodanan madhav-madhusoodanan Sep 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@folkertdev I wish to confer with you on this. I chose 64 so that it generalizes well across 16-bit, 32-bit and 64-bit alignment requirements.

Would 64 be a good value, or is it overkill?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd go with either 64 or remove it completely, any other inbetween value does not really make sense.

Comment on lines 80 to 87
if !output.status.success() {
io::stdout()
.write_all(&output.stdout)
.expect("Failed to write to stdout!");
io::stderr()
.write_all(&output.stderr)
.expect("Failed to write to stderr!");
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the debug print that is part of the assert should already print the stdoud and stderr data. Is this to get better formatting?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yes, you're right.
I hadn't noticed the debug print of the assert, my bad.

Comment on lines 34 to 44
const PLATFORM_C_HEADERS: &[&str] = &[
"iostream",
"cstring",
"iomanip",
"sstream",
"cstddef",
"cstdint",
"arm_neon.h",
"arm_acle.h",
"arm_fp16.h",
];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are additional headers needed on arm?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There were a couple of headers that were part of common/gen_c.rs which I had brought here.

Oh, I think cstddef and cstdint won't be necessary in arm though.

@sayantn
Copy link
Contributor

sayantn commented Sep 17, 2025

@madhav-madhusoodanan there is already a version of the x86-intel.xml file in crates/stdarch-verify, you can just move that file to intrinsics-data (and change the path in the crates/stdarch-verify/tests/x86-intel.rs)

.into_iter()
// Not sure how we would compare intrinsic that returns void.
.filter(|i| i.results.kind() != TypeKind::Void)
.filter(|i| i.results.kind() != TypeKind::BFloat)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can probably be dropped, x86 stdarch has the bfloat16 newtype that takes the place of C __bfloat16 (at least for now)

@madhav-madhusoodanan madhav-madhusoodanan force-pushed the intrinsic-test-x86-addition branch 3 times, most recently from ff8a8cc to 571eab0 Compare September 17, 2025 17:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants