Skip to content

Conversation

calebzulawski
Copy link
Member

@calebzulawski calebzulawski commented Sep 10, 2025

Summary:

  • Mask<T, N> == Simd<T, N>
  • Allow 0b0101.select(a, b)

This is a compromise. My justification:

  • There is no actual universal/portable mask implementation, but full width SIMD vector masks are pretty close.
  • Having a specified memory layout is simpler and some people will probably make the assumption anyway, even if we document otherwise.
  • AVX/AVX512 and NEON/SVE mean that on a single architecture you could have both types of masks. Using cfg and building std from source isn't really a good option. By providing both we can at least allow using each mask type in the same program depending on detected target features.
  • If Mask<T, N> is equivalent to Simd<T, N> in layout, I think we can finally put Change Mask element to match Simd element #322 to rest.

Since Mask is still represented as <N x i1> in LLVM, I think this mostly just affects memory layout. I think this actually improves codegen since the full width version of Mask is more transparent than the (deleted in this PR) AVX512 bitmask version.

Also, this PR allows selecting any vector from any width Mask with the same number of elements. I'm not sure if that should be supported or not, but Masks already implement Into to each other.

This is the first item on my list for trying to get everything resolved for stabilization :)

@programmerjake
Copy link
Member

I'm not so sure that we don't want an opaque Mask<T, N> type -- just because x86 and Arm Neon work fine with those types doesn't mean everything will work great with those types -- e.g. RISC-V uses bitmasks, and even if you use fixed-size SIMD values (relying on the minimum vector register sizes), using bitmasks should be more efficient.

If we want known-layout mask types, we could just export the FullMask<T, N> type (equivalent to Simd<i32, N> where T is the same size/align as i32 and so on) and the BitMask<N> type.

@calebzulawski
Copy link
Member Author

calebzulawski commented Sep 11, 2025

Full vector Mask<T, N> is still pretty transparently <N x i1> so other than going to memory, it should work pretty much everywhere as you'd expect (RVV example). Once you're going to memory I feel like it's more useful to know the layout than having an opaque optimization to possibly remove one instruction. The underlying bitmask layout is hacky (to support arbitrary length bitmasks) so we lose the transparency and the codegen is terrible.

With this change you can still use Mask<T, N> most of the time, but allows you to opt to use u64 as a mask instead which is more transparently <64 x i1> and actually produces usable bitmask codegen.

My issue with keeping them the same type is that Mask<i32, 4> seems to have a very specific implication (it contains i32s) and a bitmask should look more like BitMask<4>, or literally just an integer. Maybe the language will gain arbitrary length integers or a bitset type to support that in the future.

@calebzulawski
Copy link
Member Author

If we had a magic int<N> type that we could use as the bitmask implementation I'd probably be fine with that, but I don't want to wait to stabilize std::simd on that

@calebzulawski
Copy link
Member Author

calebzulawski commented Sep 11, 2025

RVV doesn't have the bitmask implementation cfged right now anyway, but for reference, here's how those same two functions compile with it:

<masked_add>:
                addi    sp, sp, -0x10
                sd  a1, 0x8(sp)
                addi    a1, sp, 0x8
                vsetivli    zero, 0x5, e32, m2, ta, ma
                vlm.v   v0, (a1)
                vmv.v.i v8, 0x0
                vsetvli zero, zero, e32, m2, tu, mu
                vle32.v v8, (a3), v0.t
                vsetvli zero, zero, e32, m2, ta, ma
                vle32.v v10, (a2)
                vadd.vv v8, v10, v8
                vse32.v v8, (a0)
                addi    sp, sp, 0x10
                ret

Disassembly of section .text.bitmasked_add:

<bitmasked_add>:
                addi    sp, sp, -0x10
                sd  a1, 0x8(sp)
                addi    a1, sp, 0x8
                vsetivli    zero, 0x5, e32, m2, ta, ma
                vlm.v   v0, (a1)
                vmv.v.i v8, 0x0
                vsetvli zero, zero, e32, m2, tu, mu
                vle32.v v8, (a3), v0.t
                vsetvli zero, zero, e32, m2, ta, ma
                vle32.v v10, (a2)
                vadd.vv v8, v10, v8
                vse32.v v8, (a0)
                addi    sp, sp, 0x10
                ret

@programmerjake
Copy link
Member

If we had a magic int<N> type that we could use as the bitmask implementation I'd probably be fine with that, but I don't want to wait to stabilize std::simd on that

I'm fine if we temporarily don't have the bitmask Mask while waiting for generic integers (we can stabilize it like that as long as we're very clear that it has unspecified layout and will probably change in the future on some targets), I'm not fine with guaranteeing the main Mask type is always a full mask.

@calebzulawski
Copy link
Member Author

calebzulawski commented Sep 12, 2025

So maybe I keep this PR how it is (with the new Select trait) but add back the unspecified layout documentation. I'll make a follow up PR that changes the generic type to be the vector type rather than the full width mask integer type. With the addition of the Select trait it should make it possible to do e.g. Select<Simd<u64, N>> for Mask<f64, N>> (rather than needing .into() everywhere), that's probably good enough for me to be okay with the unspecified layout.

@programmerjake
Copy link
Member

programmerjake commented Sep 12, 2025

I think the full mask type should have the contained Simd type still use integers as the element type for better LLVM optimization even if it's fine to now allow Mask<f64, 4> or Mask<*const (), 2> -- so those would contain Simd<i64, 4> and Simd<isize, 2> respectively

@calebzulawski calebzulawski force-pushed the masks-v2-final-final-copy branch from 6efa217 to da8fe78 Compare September 12, 2025 04:38
@calebzulawski calebzulawski changed the title Define Mask<T, N> as equivalent to Simd<T, N> and enable select on integer bitmasks Remove poor-performing bitmasks, add Select trait, and enable select on integer bitmasks Sep 12, 2025
@calebzulawski
Copy link
Member Author

calebzulawski commented Sep 12, 2025

Ok, I reverted masks to have an unspecified layout. My plan in the follow-up is to change masks to struct Mask<T: SimdElement, const N: usize>(Simd<T::Mask, N>) which should match what you're describing.

This PR now just removes the bitmask implementation and changes mask.select() to be a trait

@calebzulawski calebzulawski force-pushed the masks-v2-final-final-copy branch from da8fe78 to cd88ff3 Compare September 12, 2025 06:02
@calebzulawski calebzulawski force-pushed the masks-v2-final-final-copy branch from cd88ff3 to 45a18ae Compare September 12, 2025 06:36
} else {
// Safety: bitmask matches length
unsafe { to_bitmask_impl::<T, u64, 64, N>(self) }
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe add a todo!() for N > 64?

@@ -450,7 +491,8 @@ where
type Output = Self;
#[inline]
fn bitand(self, rhs: Self) -> Self {
Self(self.0 & rhs.0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can just stay unchanged -- same for all the other bitwise ops.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants