@@ -18,8 +18,8 @@ This document describes how the SME ACLE attributes map to LLVM IR
attributes and how LLVM lowers these attributes to implement the rules and
requirements of the ABI.

- Below we describe the LLVM IR attributes and their relation to the C/C++
- level ACLE attributes:
+ Below, we describe the LLVM IR attributes and their relation to the
+ C/C++-level ACLE attributes:

``aarch64_pstate_sm_enabled``
is used for functions with ``__arm_streaming``
@@ -51,8 +51,8 @@ level ACLE attributes:

Clang must ensure that the above attributes are added both to the
function's declaration/definition as well as to their call-sites. This is
- important for calls to attributed function pointers, where there is no
- definition or declaration available.
+ important for calls to attributed function pointers, where no
+ definition or declaration is available.


2. Handling PSTATE.SM
@@ -77,7 +77,7 @@ and almost all parts of CodeGen we can assume that the runtime value for
``vscale`` does not. If we let the compiler insert the appropriate ``smstart``
and ``smstop`` instructions around call boundaries, then the effects on SVE
state can be mitigated. By limiting the state changes to a very brief window
- around the call we can control how the operations are scheduled and how live
+ around the call, we can control how the operations are scheduled and how live
values remain preserved between state transitions.

In order to control PSTATE.SM at this level of granularity, we use function and
@@ -89,7 +89,7 @@ Restrictions on attributes

* It is undefined behaviour to pass or return (pointers to) scalable vector
objects to/from functions which may use a different SVE vector length.
- This includes functions with a non-streaming interface, but marked with
+ This includes functions with a non-streaming interface but marked with
``aarch64_pstate_sm_body``.

* It is not allowed for a function to be decorated with both
@@ -100,7 +100,7 @@ Restrictions on attributes
``aarch64_new_za``, ``aarch64_in_za``, ``aarch64_out_za``, ``aarch64_inout_za``,
``aarch64_preserves_za``.

- These restrictions also apply in the higher level SME ACLE, which means we can
+ These restrictions also apply in the higher-level SME ACLE, which means we can
emit diagnostics in Clang to signal users about incorrect behaviour.

@@ -224,7 +224,7 @@ The ``COND_SMSTART/COND_SMSTOP`` nodes additionally take ``CurrentState`` and

When ``CurrentState`` and ``ExpectedState`` can be evaluated at compile-time
(i.e. they are both constants) then an unconditional ``smstart/smstop``
- instruction is emitted. Otherwise the node is matched to a Pseudo instruction
+ instruction is emitted. Otherwise, the node is matched to a Pseudo instruction
which expands to a compare/branch and a ``smstart/smstop``. This is necessary to
implement transitions from ``SC -> N`` and ``SC -> S``.

@@ -236,7 +236,7 @@ streaming compatible, the compiler has to insert a SMSTOP before the call and
insert a SMSTOP after the call.

If the function that is called is an intrinsic with no side-effects which in
- turn is lowered to a function call (e.g. ``@llvm.cos()``), then the call to
+ turn is lowered to a function call (e.g., ``@llvm.cos()``), then the call to
``@llvm.cos()`` is not part of any Chain; it can be scheduled freely.

Lowering of a Callsite creates a small chain of nodes which:
@@ -297,11 +297,11 @@ To ensure we use the correct SVE vector length to allocate the locals with, we
can use the streaming vector-length to allocate the stack-slots through the
``ADDSVL`` instruction, even when the CPU is not yet in streaming mode.

- This only works for locals and not callee-save slots, since LLVM doesn't support
+ This works only for locals and not callee-save slots, since LLVM doesn't support
mixing two different scalable vector lengths in one stack frame. That means that the
case where a function is marked ``arm_locally_streaming`` and needs to spill SVE
callee-saves in the prologue is currently unsupported. However, it is unlikely
- for this to happen without user intervention, because ``arm_locally_streaming``
+ for this to happen without user intervention because ``arm_locally_streaming``
functions cannot take or return vector-length-dependent values. This would otherwise
require forcing both the SVE PCS using '``aarch64_sve_pcs``' combined with using
``arm_locally_streaming`` in order to encounter this problem. This combination
@@ -330,7 +330,7 @@ attributed with ``arm_locally_streaming``:
return array[N - 1] + arg;
}

- should use ADDSVL for allocating the stack space and should avoid clobbering
+ should use ``ADDSVL`` for allocating the stack space and should avoid clobbering
the return/argument values.

.. code-block:: none
@@ -381,17 +381,17 @@ Preventing the use of illegal instructions in Streaming Mode
* When executing a program in normal mode (PSTATE.SM=0), a subset of SME
instructions are invalid.

- * Streaming-compatible functions must only use instructions that are valid when
+ * Streaming-compatible functions must use only instructions that are valid when
either PSTATE.SM=0 or PSTATE.SM=1.

The value of PSTATE.SM is not controlled by the feature flags, but rather by the
- function attributes. This means that we can compile for '``+sme``' and the compiler
+ function attributes. This means that we can compile for '``+sme``', and the compiler
will code-generate any instructions, even if they are not legal under the requested
streaming mode. The compiler needs to use the function attributes to ensure the
compiler doesn't do transformations under the assumption that certain operations
are available at runtime.

- We made a conscious choice not to model this with feature flags, because we
+ We made a conscious choice not to model this with feature flags because we
still want to support inline-asm in either mode (with the user placing
smstart/smstop manually), and this became rather complicated to implement at the
individual instruction level (see `D120261 <https://reviews.llvm.org/D120261>`_
@@ -408,7 +408,7 @@ auto-vectorization with a subset of streaming-compatible instructions, but that
requires changes to the CostModel, Legalization and SelectionDAG lowering.

We will also emit diagnostics in Clang to prevent the use of
- non-streaming(-compatible) operations, e.g. through ACLE intrinsics, when a
+ non-streaming(-compatible) operations, e.g., through ACLE intrinsics, when a
function is decorated with the streaming mode attributes.

@@ -456,7 +456,7 @@ AArch64 Predicate-as-Counter Type
:Overview:

The predicate-as-counter type represents the type of a predicate-as-counter
- value held in a AArch64 SVE predicate register. Such a value contains
+ value held in an AArch64 SVE predicate register. Such a value contains
information about the number of active lanes, the element width and a bit that
tells whether the generated mask should be inverted. ACLE intrinsics should be
used to move the predicate-as-counter value to/from a predicate vector.