[SYCL][NATIVE_CPU] Update docs for Native CPU compiler pipeline #20042
base: sycl
This integrates the appropriate compiler documentation originally in the oneAPI Construction Kit (OCK) into the NativeCPU compiler pipeline documentation. It has been updated to try to reflect the Native CPU pipeline, and remove some of the references to OCK's structures, as well as moving some of the documentation to markdown files to be consistent with some of the other documentation. Some of it may be irrelevant for Native CPU, and if so this should be updated over time. Support was added for the mermaid flowcharts in the config.
show how the `WorkItemLoopsPass` lays out and schedules a kernel's work-item
loops in the face of barriers.

```C
Maybe it's worth converting this code into SYCL?
If you want to do that, perhaps put it in as a suggestion; I'm more familiar with OpenCL.
```

As you can see, the `subhandler` steals the kernel's function name, and receives two pointer arguments: the first one points to the kernel arguments from the SYCL runtime, and the second one to the `nativecpu::state` struct.
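
To make the two-pointer calling convention concrete, here is a minimal C sketch of the idea. It is not the IR the pass emits: `state_t`, `kernel_impl`, `my_kernel`, the argument layout (an array of untyped pointers, with scalars passed by address), and the single `global_id` field are all hypothetical stand-ins chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the `nativecpu::state` struct; the real
   struct carries more scheduling information than this one field. */
typedef struct {
  size_t global_id;
} state_t;

/* The original kernel body after it has been renamed (illustrative name). */
static void kernel_impl(int *data, int value, const state_t *state) {
  data[state->global_id] = value;
}

/* The subhandler takes over the kernel's original name. It unpacks the
   untyped argument array and forwards typed arguments to the kernel.
   Assumption: every argument, including scalars, is passed by pointer. */
void my_kernel(void *const *args, const state_t *state) {
  assert(args != NULL && state != NULL);
  int *data = (int *)args[0];
  int value = *(const int *)args[1];
  kernel_impl(data, value, state);
}
```

With this shape the runtime only ever needs one uniform entry-point signature, whatever the kernel's own parameter list looks like.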
It would be good to also keep the original input IR (from https://github.com/intel/llvm/blob/sycl/sycl/doc/design/SYCLNativeCPU.md) because it uses the original name - otherwise it's not clear (at least from this page) that "the subhandler steals the kernel's function name".
Added original IR earlier in the section.
Hi @intel/dpcpp-doc-reviewers I need a review.
3. All builtins which do not relate to scheduling have been processed and we are
left with some scheduling related calls to `mux builtins`.
4. The final compiled kernel is assumed to be invoked from the
host-side runtime once per *work-group* in the **NDRange**.
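
As a rough illustration of what "invoked once per work-group" implies, the C sketch below shows a wrapper that loops over the work-items of one group, with the kernel split into two regions at a barrier. The function names, the `MAX_WG_SIZE` bound, and the kernel body are hypothetical; the actual pass operates on LLVM IR and handles far more cases.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_WG_SIZE 64 /* hypothetical upper bound, for this sketch only */

/* Region before the barrier: each work-item writes its own slot. */
static void region_before_barrier(size_t local_id, int *scratch) {
  scratch[local_id] = (int)local_id + 1;
}

/* Region after the barrier: each work-item reads its neighbour's slot,
   which is only safe once every work-item has finished the first region. */
static void region_after_barrier(size_t local_id, size_t wg_size,
                                 const int *scratch, int *out) {
  out[local_id] = scratch[(local_id + 1) % wg_size];
}

/* Hypothetical wrapper, called once per work-group. The barrier becomes
   the boundary between two sequential work-item loops. */
void run_work_group(size_t wg_size, int *out) {
  int scratch[MAX_WG_SIZE];
  assert(wg_size > 0 && wg_size <= MAX_WG_SIZE);
  for (size_t wi = 0; wi < wg_size; ++wi)
    region_before_barrier(wi, scratch);
  /* All work-items have now reached the barrier. */
  for (size_t wi = 0; wi < wg_size; ++wi)
    region_after_barrier(wi, wg_size, scratch, out);
}
```

Running the first loop to completion before starting the second is what gives barrier semantics on a single CPU thread without any actual synchronization.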
Nit; Why is the indentation of 1. and 4. different?
Thanks, removed bad indentation.
The indentation now matches between 1 and 4 (next lines start one space after the start of the first line's sentence), but 3 is different (next line is aligned with the first line's sentence).

The inner-most function is the original input kernel, which is *wrapped*
by new functions in successive phases, until it is ready in a form to be
executed by the Native CPU driver. These include effectively wrapping a `for (wi : wg)`
Without additional context, it may not be clear what `wg` and `wi` are in this case. I assume it relates to the next paragraph, but if so this last sentence feels a little redundant.
removed this line
Simple fix-up passes take place at this stage: the IR is massaged to
conform to specifications or to fix known deficiencies in earlier
representations. The input IR at this point will contain special
builtins, called `mux builtins` for ndrange or subgroup
Since it isn't code, it seems more appropriate to put quotations around "mux builtins" here, then drop the quotes for future mentions. Maybe `mux` can stay code, given it is part of the builtin names.
done

The [vecz](SYCLNativeCPUVecz.md) whole-function vectorizer is optionally run.

Note that VECZ may perform its own scalarization, depending on the
Nit; It's referred to as "vecz" in the previous paragraph and then as "VECZ" here. Would be good to have it consistent.
Changed to Vecz