Commit 909c552

Greg Roth authored
Create new raw buffer load lowering function (#7144)
Disentangles the raw, structured, and typed buffer lowering implementations into an isolated function. Alters the various places where lowering took place to call into the common function.

The load lowering now takes place in a few phases. The basic information about the load is gathered as part of the ResLoadHelper constructor. One variant extracts most of this information from a call instruction. The other sets things such as offsets more explicitly, usually for subscripted or matrix loads. The helper is used to assemble call instruction arguments appropriate for the call. The call is issued, possibly repeatedly for raw buffers of types with more than 4 elements. The results are then packaged and converted from the memory storage type into a vector of register types.

When raw buffers use a templated load with a struct, they reuse the subscript path also used for subscripted structured buffers. Such loads with structs containing vectors or matrices invoke the load lowering from within this recursive call, which traverses GEPs and other users of the original call to set up correct offsets and so on. This change adapts that code to use the common load lowering, which enables long vectors within structs to be loaded correctly.

This requires the ability to override the type used by the ResLoadHelper explicitly, so a member is added to accommodate the vector representation of matrices, which doesn't match the types of the load call.

It also requires removing the bufIdx and offset swapping that was done, confusingly, throughout the TranslateStructBufSubscriptUser code to account for the fact that byte address buffers have to represent offsets using the main coord parameter. Instead, the resource kind is passed down so that the right parameter can receive the increment when necessary for longer types such as matrices. This is also enabled by adding ResKind-appropriate offset calculation to the ResLoadHelper.

ResLoadHelper also gets an opcode set based on the ResKind for both overloads in preparation for further expansion to different resource kinds.

Adds filecheck, verify, and IR pass tests. Lays groundwork for #7118
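
The phases described in the commit message can be illustrated with a small standalone sketch. It is not the code from HLOperationLower.cpp: `ResKind`, `LoadCall`, and `planLoads` are hypothetical names introduced here, and the sketch assumes every call returns at most four elements of a fixed byte size. It only shows the idea that a long load is issued as repeated calls, and that the increment goes into the main coordinate for byte address (raw) buffers but into the secondary offset for structured buffers.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for illustration only; these are not the DXC types.
enum class ResKind { RawBuffer, StructuredBuffer };

struct LoadCall {
  uint32_t coord;   // main coordinate: byte offset (raw) or element index (structured)
  uint32_t offset;  // secondary byte offset; left at 0 for raw buffers in this sketch
  unsigned numElts; // elements fetched by this call, at most 4
};

// Split a load of `numElts` elements, each `eltBytes` bytes wide, into calls
// that fetch at most four elements apiece. The resource kind decides which
// parameter receives the increment for the second and later calls.
std::vector<LoadCall> planLoads(ResKind kind, uint32_t coord, uint32_t offset,
                                unsigned numElts, unsigned eltBytes) {
  std::vector<LoadCall> calls;
  for (unsigned done = 0; done < numElts; done += 4) {
    unsigned n = std::min(4u, numElts - done);
    uint32_t advance = done * eltBytes; // bytes already covered by earlier calls
    if (kind == ResKind::RawBuffer)
      calls.push_back({coord + advance, 0, n});      // raw: bump the main coord
    else
      calls.push_back({coord, offset + advance, n}); // structured: bump the offset
  }
  return calls;
}
```

Under these assumptions, a nine-element load of 4-byte values (the size of a 3x3 float matrix) is planned as three calls fetching 4, 4, and 1 elements.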
1 parent ebc8c5c commit 909c552

8 files changed, +1722 -444 lines changed

lib/HLSL/HLOperationLower.cpp

Lines changed: 347 additions & 418 deletions
Large diffs are not rendered by default.

tools/clang/test/CodeGenDXIL/hlsl/intrinsics/buffer-agg-load-stores.hlsl

Lines changed: 34 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -3,14 +3,34 @@
33
// RUN: %dxc -T vs_6_6 -DETY=uint64_t -DCOLS=2 %s | FileCheck %s
44
// RUN: %dxc -T vs_6_6 -DETY=double -DCOLS=2 %s | FileCheck %s
55

6+
// RUN: %dxc -T vs_6_6 -DETY=float1 -DCOLS=4 %s | FileCheck %s
7+
// RUN: %dxc -T vs_6_6 -DETY=bool1 -DCOLS=4 %s | FileCheck %s
8+
// RUN: %dxc -T vs_6_6 -DETY=uint64_t1 -DCOLS=2 %s | FileCheck %s
9+
// RUN: %dxc -T vs_6_6 -DETY=double1 -DCOLS=2 %s | FileCheck %s
10+
11+
// RUN: %dxc -T vs_6_6 -DETY=float4 -DCOLS=4 %s | FileCheck %s
12+
// RUN: %dxc -T vs_6_6 -DETY=bool4 -DCOLS=4 %s | FileCheck %s
13+
// RUN: %dxc -T vs_6_6 -DETY=uint64_t4 -DCOLS=2 %s | FileCheck %s
14+
// RUN: %dxc -T vs_6_6 -DETY=double4 -DCOLS=2 %s | FileCheck %s
15+
616
// RUN: %dxc -T vs_6_6 -DATY=matrix -DETY=float -DCOLS=2 -DROWS=2 %s | FileCheck %s
17+
// RUN: %dxc -T vs_6_6 -DATY=matrix -DETY=bool -DCOLS=2 -DROWS=2 %s | FileCheck %s
718
// RUN: %dxc -T vs_6_6 -DATY=matrix -DETY=uint64_t -DCOLS=2 -DROWS=2 %s | FileCheck %s
819
// RUN: %dxc -T vs_6_6 -DATY=matrix -DETY=double -DCOLS=2 -DROWS=2 %s | FileCheck %s
20+
921
// RUN: %dxc -T vs_6_6 -DATY=matrix -DETY=float -DCOLS=3 -DROWS=3 %s | FileCheck %s --check-prefixes=CHECK,MAT
1022
// RUN: %dxc -T vs_6_6 -DATY=matrix -DETY=bool -DCOLS=3 -DROWS=3 %s | FileCheck %s --check-prefixes=CHECK,MAT
1123
// RUN: %dxc -T vs_6_6 -DATY=matrix -DETY=uint64_t -DCOLS=3 -DROWS=3 %s | FileCheck %s --check-prefixes=CHECK,MAT
1224
// RUN: %dxc -T vs_6_6 -DATY=matrix -DETY=double -DCOLS=3 -DROWS=3 %s | FileCheck %s --check-prefixes=CHECK,MAT
1325

26+
// RUN: %dxc -T vs_6_6 -DATY=Matrix -DETY=float -DCOLS=2 -DROWS=2 %s | FileCheck %s
27+
// RUN: %dxc -T vs_6_6 -DATY=Matrix -DETY=uint64_t -DCOLS=2 -DROWS=2 %s | FileCheck %s
28+
// RUN: %dxc -T vs_6_6 -DATY=Matrix -DETY=double -DCOLS=2 -DROWS=2 %s | FileCheck %s
29+
// RUN: %dxc -T vs_6_6 -DATY=Matrix -DETY=float -DCOLS=3 -DROWS=3 %s | FileCheck %s --check-prefixes=CHECK,MAT
30+
// RUN: %dxc -T vs_6_6 -DATY=Matrix -DETY=bool -DCOLS=3 -DROWS=3 %s | FileCheck %s --check-prefixes=CHECK,MAT
31+
// RUN: %dxc -T vs_6_6 -DATY=Matrix -DETY=uint64_t -DCOLS=3 -DROWS=3 %s | FileCheck %s --check-prefixes=CHECK,MAT
32+
// RUN: %dxc -T vs_6_6 -DATY=Matrix -DETY=double -DCOLS=3 -DROWS=3 %s | FileCheck %s --check-prefixes=CHECK,MAT
33+
1434
// RUN: %dxc -T vs_6_6 -DATY=Vector -DETY=float -DCOLS=4 %s | FileCheck %s
1535
// RUN: %dxc -T vs_6_6 -DATY=Vector -DETY=bool -DCOLS=4 %s | FileCheck %s
1636
// RUN: %dxc -T vs_6_6 -DATY=Vector -DETY=uint64_t -DCOLS=2 %s | FileCheck %s
@@ -26,8 +46,6 @@
2646
// for different aggregate buffer types and indices.
2747
///////////////////////////////////////////////////////////////////////
2848

29-
30-
3149
// CHECK: %dx.types.ResRet.[[TY:[a-z][0-9][0-9]]] = type { [[TYPE:[a-z0-9]*]],
3250

3351
#if !defined(ATY)
@@ -68,6 +86,16 @@ struct OffVector {
6886
}
6987
};
7088

89+
template<typename T, int N, int M>
90+
struct Matrix {
91+
matrix<T, N, M> m;
92+
Matrix operator+(Matrix mat) {
93+
Matrix ret;
94+
ret.m = m + mat.m;
95+
return ret;
96+
}
97+
};
98+
7199
ByteAddressBuffer RoByBuf : register(t1);
72100
RWByteAddressBuffer RwByBuf : register(u1);
73101

@@ -156,13 +184,17 @@ void main(uint ix[2] : IX) {
156184
// StructuredBuffer Tests
157185
// CHECK: [[ANHDLRWST:%.*]] = call %dx.types.Handle @dx.op.annotateHandle(i32 216, %dx.types.Handle [[HDLRWST]]
158186
// CHECK: call %dx.types.ResRet.[[TY]] @dx.op.rawBufferLoad.[[TY]](i32 139, %dx.types.Handle [[ANHDLRWST]], i32 [[IX0]], i32 [[BOFF]]
187+
// MAT: call %dx.types.ResRet.[[TY]] @dx.op.rawBufferLoad.[[TY]](i32 139, %dx.types.Handle [[ANHDLRWST]], i32 [[IX0]], i32 [[p4]]
188+
// MAT: call %dx.types.ResRet.[[TY]] @dx.op.rawBufferLoad.[[TY]](i32 139, %dx.types.Handle [[ANHDLRWST]], i32 [[IX0]], i32 [[p8]]
159189
// I1: icmp ne i32 %{{.*}}, 0
160190
// I1: icmp ne i32 %{{.*}}, 0
161191
// I1: icmp ne i32 %{{.*}}, 0
162192
// I1: icmp ne i32 %{{.*}}, 0
163193
TYPE stbElt1 SS = RwStBuf.Load(ix[0]);
164194
// CHECK: [[IX1:%.*]] = call i32 @dx.op.loadInput.i32(i32 4,
165195
// CHECK: call %dx.types.ResRet.[[TY]] @dx.op.rawBufferLoad.[[TY]](i32 139, %dx.types.Handle [[ANHDLRWST]], i32 [[IX1]], i32 [[BOFF]]
196+
// MAT: call %dx.types.ResRet.[[TY]] @dx.op.rawBufferLoad.[[TY]](i32 139, %dx.types.Handle [[ANHDLRWST]], i32 [[IX1]], i32 [[p4]]
197+
// MAT: call %dx.types.ResRet.[[TY]] @dx.op.rawBufferLoad.[[TY]](i32 139, %dx.types.Handle [[ANHDLRWST]], i32 [[IX1]], i32 [[p8]]
166198
// I1: icmp ne i32 %{{.*}}, 0
167199
// I1: icmp ne i32 %{{.*}}, 0
168200
// I1: icmp ne i32 %{{.*}}, 0
Lines changed: 162 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,162 @@
1+
// RUN: %dxc -DTYPE=float -T vs_6_6 %s | FileCheck %s
2+
// RUN: %dxc -DTYPE=bool -T vs_6_6 %s | FileCheck %s --check-prefixes=CHECK,I1
3+
// RUN: %dxc -DTYPE=uint64_t -T vs_6_6 %s | FileCheck %s --check-prefixes=CHECK,I64
4+
// RUN: %dxc -DTYPE=double -T vs_6_6 %s | FileCheck %s --check-prefixes=CHECK,F64
5+
6+
// RUN: %dxc -DTYPE=float1 -T vs_6_6 %s | FileCheck %s
7+
// RUN: %dxc -DTYPE=bool1 -T vs_6_6 %s | FileCheck %s --check-prefixes=CHECK,I1
8+
// RUN: %dxc -DTYPE=uint64_t1 -T vs_6_6 %s | FileCheck %s --check-prefixes=CHECK,I64
9+
// RUN: %dxc -DTYPE=double1 -T vs_6_6 %s | FileCheck %s --check-prefixes=CHECK,F64
10+
11+
// Confirm that 6.9 doesn't use vector loads for scalars and vec1s
12+
// RUN: %dxc -DTYPE=float -T vs_6_9 %s | FileCheck %s
13+
// RUN: %dxc -DTYPE=bool -T vs_6_9 %s | FileCheck %s --check-prefixes=CHECK,I1
14+
// RUN: %dxc -DTYPE=uint64_t -T vs_6_9 %s | FileCheck %s --check-prefixes=CHECK,I64
15+
// RUN: %dxc -DTYPE=double -T vs_6_9 %s | FileCheck %s --check-prefixes=CHECK,F64
16+
17+
// RUN: %dxc -DTYPE=float1 -T vs_6_9 %s | FileCheck %s
18+
// RUN: %dxc -DTYPE=bool1 -T vs_6_9 %s | FileCheck %s --check-prefixes=CHECK,I1
19+
// RUN: %dxc -DTYPE=uint64_t1 -T vs_6_9 %s | FileCheck %s --check-prefixes=CHECK,I64
20+
// RUN: %dxc -DTYPE=double1 -T vs_6_9 %s | FileCheck %s --check-prefixes=CHECK,F64
21+
22+
///////////////////////////////////////////////////////////////////////
23+
// Test codegen for various load and store operations and conversions
24+
// for different scalar buffer types and confirm that the proper
25+
// loads, stores, and conversion operations take place.
26+
///////////////////////////////////////////////////////////////////////
27+
28+
29+
// These -DAGs must match the same line. That is the only reason for the -DAG.
30+
// The first match will assign [[TY]] to the native type
31+
// For most runs, the second match will assign [[TY32]] to the same thing.
32+
// For 64-bit types, the memory representation is i32 and a separate variable is needed.
33+
// For these cases, there is another line that will always match i32.
34+
// This line will also force the previous -DAGs to match the same line since the most
35+
// this shader can produce is two ResRet types.
36+
// CHECK-DAG: %dx.types.ResRet.[[TY:[a-z][0-9][0-9]]] = type { [[TYPE:[a-z0-9]*]],
37+
// CHECK-DAG: %dx.types.ResRet.[[TY32:[a-z][0-9][0-9]]] = type { [[TYPE]],
38+
// I64: %dx.types.ResRet.[[TY32:i32]]
39+
// F64: %dx.types.ResRet.[[TY32:i32]]
40+
41+
ByteAddressBuffer RoByBuf : register(t1);
42+
RWByteAddressBuffer RwByBuf : register(u1);
43+
44+
StructuredBuffer< TYPE > RoStBuf : register(t2);
45+
RWStructuredBuffer< TYPE > RwStBuf : register(u2);
46+
47+
Buffer< TYPE > RoTyBuf : register(t3);
48+
RWBuffer< TYPE > RwTyBuf : register(u3);
49+
50+
ConsumeStructuredBuffer<TYPE> CnStBuf : register(u4);
51+
AppendStructuredBuffer<TYPE> ApStBuf : register(u5);
52+
53+
void main(uint ix[2] : IX) {
54+
// ByteAddressBuffer Tests
55+
56+
// CHECK-DAG: [[HDLROBY:%.*]] = call %dx.types.Handle @dx.op.createHandleFromBinding(i32 217, %dx.types.ResBind { i32 1, i32 1, i32 0, i8 0 }, i32 1, i1 false)
57+
// CHECK-DAG: [[HDLRWBY:%.*]] = call %dx.types.Handle @dx.op.createHandleFromBinding(i32 217, %dx.types.ResBind { i32 1, i32 1, i32 0, i8 1 }, i32 1, i1 false)
58+
59+
// CHECK-DAG: [[HDLROST:%.*]] = call %dx.types.Handle @dx.op.createHandleFromBinding(i32 217, %dx.types.ResBind { i32 2, i32 2, i32 0, i8 0 }, i32 2, i1 false)
60+
// CHECK-DAG: [[HDLRWST:%.*]] = call %dx.types.Handle @dx.op.createHandleFromBinding(i32 217, %dx.types.ResBind { i32 2, i32 2, i32 0, i8 1 }, i32 2, i1 false)
61+
62+
// CHECK-DAG: [[HDLROTY:%.*]] = call %dx.types.Handle @dx.op.createHandleFromBinding(i32 217, %dx.types.ResBind { i32 3, i32 3, i32 0, i8 0 }, i32 3, i1 false)
63+
// CHECK-DAG: [[HDLRWTY:%.*]] = call %dx.types.Handle @dx.op.createHandleFromBinding(i32 217, %dx.types.ResBind { i32 3, i32 3, i32 0, i8 1 }, i32 3, i1 false)
64+
65+
// CHECK-DAG: [[HDLCON:%.*]] = call %dx.types.Handle @dx.op.createHandleFromBinding(i32 217, %dx.types.ResBind { i32 4, i32 4, i32 0, i8 1 }, i32 4, i1 false)
66+
// CHECK-DAG: [[HDLAPP:%.*]] = call %dx.types.Handle @dx.op.createHandleFromBinding(i32 217, %dx.types.ResBind { i32 5, i32 5, i32 0, i8 1 }, i32 5, i1 false)
67+
68+
// CHECK: [[IX0:%.*]] = call i32 @dx.op.loadInput.i32(i32 4,
69+
70+
// CHECK: [[ANHDLRWBY:%.*]] = call %dx.types.Handle @dx.op.annotateHandle(i32 216, %dx.types.Handle [[HDLRWBY]]
71+
// CHECK: call %dx.types.ResRet.[[TY]] @dx.op.rawBufferLoad.[[TY]](i32 139, %dx.types.Handle [[ANHDLRWBY]], i32 [[IX0]]
72+
// I1: icmp ne i32 %{{.*}}, 0
73+
TYPE babElt1 = RwByBuf.Load< TYPE >(ix[0]);
74+
75+
// CHECK: [[ANHDLROBY:%.*]] = call %dx.types.Handle @dx.op.annotateHandle(i32 216, %dx.types.Handle [[HDLROBY]]
76+
// CHECK: call %dx.types.ResRet.[[TY]] @dx.op.rawBufferLoad.[[TY]](i32 139, %dx.types.Handle [[ANHDLROBY]], i32 [[IX0]]
77+
// I1: icmp ne i32 %{{.*}}, 0
78+
TYPE babElt2 = RoByBuf.Load< TYPE >(ix[0]);
79+
80+
// I1: zext i1 %{{.*}} to i32
81+
// CHECK: call void @dx.op.rawBufferStore.[[TY]](i32 140, %dx.types.Handle [[ANHDLRWBY]], i32 [[IX0]]
82+
RwByBuf.Store< TYPE >(ix[0], babElt1 + babElt2);
83+
84+
// StructuredBuffer Tests
85+
// CHECK: [[ANHDLRWST:%.*]] = call %dx.types.Handle @dx.op.annotateHandle(i32 216, %dx.types.Handle [[HDLRWST]]
86+
// CHECK: call %dx.types.ResRet.[[TY]] @dx.op.rawBufferLoad.[[TY]](i32 139, %dx.types.Handle [[ANHDLRWST]], i32 [[IX0]]
87+
// I1: icmp ne i32 %{{.*}}, 0
88+
TYPE stbElt1 = RwStBuf.Load(ix[0]);
89+
// CHECK: [[IX1:%.*]] = call i32 @dx.op.loadInput.i32(i32 4,
90+
// CHECK: call %dx.types.ResRet.[[TY]] @dx.op.rawBufferLoad.[[TY]](i32 139, %dx.types.Handle [[ANHDLRWST]], i32 [[IX1]]
91+
// I1: icmp ne i32 %{{.*}}, 0
92+
TYPE stbElt2 = RwStBuf[ix[1]];
93+
94+
// CHECK: [[ANHDLROST:%.*]] = call %dx.types.Handle @dx.op.annotateHandle(i32 216, %dx.types.Handle [[HDLROST]]
95+
// CHECK: call %dx.types.ResRet.[[TY]] @dx.op.rawBufferLoad.[[TY]](i32 139, %dx.types.Handle [[ANHDLROST]], i32 [[IX0]]
96+
// I1: icmp ne i32 %{{.*}}, 0
97+
TYPE stbElt3 = RoStBuf.Load(ix[0]);
98+
// CHECK: call %dx.types.ResRet.[[TY]] @dx.op.rawBufferLoad.[[TY]](i32 139, %dx.types.Handle [[ANHDLROST]], i32 [[IX1]]
99+
// I1: icmp ne i32 %{{.*}}, 0
100+
TYPE stbElt4 = RoStBuf[ix[1]];
101+
102+
// I1: zext i1 %{{.*}} to i32
103+
// CHECK: call void @dx.op.rawBufferStore.[[TY]](i32 140, %dx.types.Handle [[ANHDLRWST]], i32 [[IX0]]
104+
RwStBuf[ix[0]] = stbElt1 + stbElt2 + stbElt3 + stbElt4;
105+
106+
// {Append/Consume}StructuredBuffer Tests
107+
// CHECK: [[ANHDLCON:%.*]] = call %dx.types.Handle @dx.op.annotateHandle(i32 216, %dx.types.Handle [[HDLCON]]
108+
// CHECK: [[CONIX:%.*]] = call i32 @dx.op.bufferUpdateCounter(i32 70, %dx.types.Handle [[ANHDLCON]], i8 -1)
109+
// CHECK: call %dx.types.ResRet.[[TY]] @dx.op.rawBufferLoad.[[TY]](i32 139, %dx.types.Handle [[ANHDLCON]], i32 [[CONIX]]
110+
// I1: icmp ne i32 %{{.*}}, 0
111+
TYPE cnElt = CnStBuf.Consume();
112+
113+
// CHECK: [[ANHDLAPP:%.*]] = call %dx.types.Handle @dx.op.annotateHandle(i32 216, %dx.types.Handle [[HDLAPP]]
114+
// CHECK: [[APPIX:%.*]] = call i32 @dx.op.bufferUpdateCounter(i32 70, %dx.types.Handle [[ANHDLAPP]], i8 1)
115+
// I1: zext i1 %{{.*}} to i32
116+
// CHECK: call void @dx.op.rawBufferStore.[[TY]](i32 140, %dx.types.Handle [[ANHDLAPP]], i32 [[APPIX]]
117+
ApStBuf.Append(cnElt);
118+
119+
// TypedBuffer Tests
120+
// CHECK: [[ANHDLRWTY:%.*]] = call %dx.types.Handle @dx.op.annotateHandle(i32 216, %dx.types.Handle [[HDLRWTY]]
121+
// CHECK: call %dx.types.ResRet.[[TY32]] @dx.op.bufferLoad.[[TY32]](i32 68, %dx.types.Handle [[ANHDLRWTY]], i32 [[IX0]]
122+
// F64: call double @dx.op.makeDouble.f64(i32 101
123+
// I64: zext i32 %{{.*}} to i64
124+
// I64: zext i32 %{{.*}} to i64
125+
// I64: shl nuw i64
126+
// I64: or i64
127+
// I1: icmp ne i32 %{{.*}}, 0
128+
TYPE typElt1 = RwTyBuf.Load(ix[0]);
129+
// CHECK: call %dx.types.ResRet.[[TY32]] @dx.op.bufferLoad.[[TY32]](i32 68, %dx.types.Handle [[ANHDLRWTY]], i32 [[IX1]]
130+
// F64: call double @dx.op.makeDouble.f64(i32 101
131+
// I64: zext i32 %{{.*}} to i64
132+
// I64: zext i32 %{{.*}} to i64
133+
// I64: shl nuw i64
134+
// I64: or i64
135+
// I1: icmp ne i32 %{{.*}}, 0
136+
TYPE typElt2 = RwTyBuf[ix[1]];
137+
// CHECK: [[ANHDLROTY:%.*]] = call %dx.types.Handle @dx.op.annotateHandle(i32 216, %dx.types.Handle [[HDLROTY]]
138+
// CHECK: call %dx.types.ResRet.[[TY32]] @dx.op.bufferLoad.[[TY32]](i32 68, %dx.types.Handle [[ANHDLROTY]], i32 [[IX0]]
139+
// F64: call double @dx.op.makeDouble.f64(i32 101
140+
// I64: zext i32 %{{.*}} to i64
141+
// I64: zext i32 %{{.*}} to i64
142+
// I64: shl nuw i64
143+
// I64: or i64
144+
// I1: icmp ne i32 %{{.*}}, 0
145+
TYPE typElt3 = RoTyBuf.Load(ix[0]);
146+
// CHECK: call %dx.types.ResRet.[[TY32]] @dx.op.bufferLoad.[[TY32]](i32 68, %dx.types.Handle [[ANHDLROTY]], i32 [[IX1]]
147+
// F64: call double @dx.op.makeDouble.f64(i32 101
148+
// I64: zext i32 %{{.*}} to i64
149+
// I64: zext i32 %{{.*}} to i64
150+
// I64: shl nuw i64
151+
// I64: or i64
152+
// I1: icmp ne i32 %{{.*}}, 0
153+
TYPE typElt4 = RoTyBuf[ix[1]];
154+
155+
// F64: call %dx.types.splitdouble @dx.op.splitDouble.f64(i32 102
156+
// I64: trunc i64 %{{.*}} to i32
157+
// I64: lshr i64 %{{.*}}, 32
158+
// I64: trunc i64 %{{.*}} to i32
159+
// I1: zext i1 %{{.*}} to i32
160+
// CHECK: call void @dx.op.bufferStore.[[TY32]](i32 69, %dx.types.Handle [[ANHDLRWTY]], i32 [[IX0]]
161+
RwTyBuf[ix[0]] = typElt1 + typElt2 + typElt3 + typElt4;
162+
}
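
The I64 and I1 check lines in the test above follow a fixed pattern of conversions between the memory representation (i32) and the register type. A minimal C++ sketch of the same arithmetic may make the expected instruction sequences easier to read. The function names are hypothetical, and the sketch assumes the first 32-bit value is the low half, which the check lines themselves do not state.

```cpp
#include <cstdint>

// Rebuild a 64-bit value from two 32-bit halves, as a typed buffer load must.
// Corresponds to the checked sequence: zext, zext, shl, or.
// Assumption: `lo` is the low half; the test does not spell out the order.
uint64_t combineHalves(uint32_t lo, uint32_t hi) {
  return static_cast<uint64_t>(lo) | (static_cast<uint64_t>(hi) << 32);
}

// Split a 64-bit value into two 32-bit halves before a typed buffer store.
// Corresponds to the checked sequence: trunc, lshr 32, trunc.
void splitHalves(uint64_t value, uint32_t &lo, uint32_t &hi) {
  lo = static_cast<uint32_t>(value);       // trunc i64 to i32
  hi = static_cast<uint32_t>(value >> 32); // lshr 32, then trunc
}

// Bools are stored as i32 in memory and used as i1 in registers.
bool boolFromMemory(uint32_t stored) { return stored != 0; }  // icmp ne i32 %x, 0
uint32_t boolToMemory(bool value) { return value ? 1u : 0u; } // zext i1 to i32
```

Splitting a value and combining the halves again returns the original value, which is the round trip the typed buffer load and store checks rely on.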
