@coldav coldav commented Sep 10, 2025

This integrates the appropriate compiler documentation originally in the oneAPI Construction Kit (OCK) into the NativeCPU compiler pipeline documentation.

It has been updated to try to reflect the Native CPU pipeline, and remove some of the references to OCK's structures, as well as moving some of the documentation to markdown files to be consistent with some of the other documentation.

Some of it may be irrelevant for Native CPU, and if so this should be updated over time.

Support was added for the mermaid flowcharts in the config.

@coldav coldav requested a review from a team as a code owner September 10, 2025 16:27
@coldav coldav force-pushed the colin/add_native_cpu_pipeline_docs branch 2 times, most recently from 4e43981 to 0e74590 Compare September 11, 2025 09:44
@coldav coldav requested a review from a team as a code owner September 11, 2025 09:44
@coldav coldav force-pushed the colin/add_native_cpu_pipeline_docs branch from 0e74590 to a742204 Compare September 11, 2025 10:03
@coldav coldav force-pushed the colin/add_native_cpu_pipeline_docs branch from a742204 to 5c56997 Compare September 11, 2025 10:11
@coldav coldav force-pushed the colin/add_native_cpu_pipeline_docs branch from 5c56997 to 80bee8b Compare September 11, 2025 11:12
@coldav coldav requested a review from a team as a code owner September 11, 2025 11:12
@coldav coldav force-pushed the colin/add_native_cpu_pipeline_docs branch from 80bee8b to afe1ace Compare September 11, 2025 12:38
@coldav coldav requested a review from a team as a code owner September 11, 2025 12:38
@coldav coldav force-pushed the colin/add_native_cpu_pipeline_docs branch from afe1ace to 7c6e3a8 Compare September 11, 2025 12:46
@coldav coldav force-pushed the colin/add_native_cpu_pipeline_docs branch from 7c6e3a8 to 3e06d72 Compare September 11, 2025 13:00
@coldav coldav force-pushed the colin/add_native_cpu_pipeline_docs branch 2 times, most recently from 37c026c to 6092762 Compare September 11, 2025 13:24
show how the `WorkItemLoopsPass` lays out and schedules a kernel's work-item
loops in the face of barriers.

```C
Contributor

Maybe it's worth converting this code into SYCL?

Contributor Author

If you want to do that, perhaps put it in as a suggestion; I'm more familiar with OpenCL.

```

As you can see, the `subhandler` steals the kernel's function name, and receives two pointer arguments: the first one points to the kernel arguments from the SYCL runtime, and the second one to the `nativecpu::state` struct.
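
As a rough illustration of that shape, here is a minimal C sketch. It assumes the runtime passes kernel arguments as an array of pointers; `native_cpu_state`, `vec_add`, and `vec_add_original` are hypothetical names, and the struct is only a stand-in for `nativecpu::state`, not its real layout.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for nativecpu::state; the real struct differs. */
typedef struct {
  size_t global_id[3];
} native_cpu_state;

/* The original kernel body after renaming (this name is made up). */
static void vec_add_original(const int *a, const int *b, int *out,
                             const native_cpu_state *state) {
  size_t i = state->global_id[0];
  out[i] = a[i] + b[i];
}

/* The subhandler carries the kernel's original name. It unpacks the
 * argument array (assumed here to be an array of pointers) and forwards
 * everything, plus the state pointer, to the renamed kernel. */
void vec_add(void *const *args, native_cpu_state *state) {
  assert(args != NULL && state != NULL);
  const int *a = (const int *)args[0];
  const int *b = (const int *)args[1];
  int *out = (int *)args[2];
  vec_add_original(a, b, out, state);
}
```

The caller only ever sees the two-pointer signature, which is why the subhandler can take over the kernel's name without the runtime knowing the kernel's real parameter list.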

Contributor

It would be good to also keep the original input IR (from https://github.com/intel/llvm/blob/sycl/sycl/doc/design/SYCLNativeCPU.md) because it uses the original name - otherwise it's not clear (at least from this page) that "the subhandler steals the kernel's function name".

Contributor Author

Added original IR earlier in the section.


coldav commented Sep 16, 2025

Hi @intel/dpcpp-doc-reviewers I need a review.

3. All builtins which do not relate to scheduling have been processed and we are
left with some scheduling related calls to `mux builtins`.
4. The final compiled kernel is assumed to be invoked from the
host-side runtime once per *work-group* in the **NDRange**.
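
The once-per-work-group assumption can be pictured with a hand-written C sketch. This is not the output of any pass: `WG_SIZE`, `kernel_wrapper`, `run_ndrange`, and the two region functions are hypothetical, and the kernel is assumed to contain a single barrier that splits it into two regions.

```c
#include <assert.h>
#include <stddef.h>

#define WG_SIZE 4 /* hypothetical fixed work-group size */

/* Region before the barrier: each work-item fills its own scratch slot. */
static void region_before_barrier(size_t wi, const int *in, int *scratch) {
  scratch[wi] = in[wi] * 2;
}

/* Region after the barrier: each work-item reads its neighbour's slot,
 * which is only safe once every work-item has finished the first region. */
static void region_after_barrier(size_t wi, const int *scratch, int *out) {
  out[wi] = scratch[(wi + 1) % WG_SIZE];
}

/* Wrapper executed once per work-group: one work-item loop per region. */
void kernel_wrapper(size_t group_id, const int *in, int *out) {
  int scratch[WG_SIZE];
  const int *group_in = in + group_id * WG_SIZE;
  int *group_out = out + group_id * WG_SIZE;
  for (size_t wi = 0; wi < WG_SIZE; ++wi)
    region_before_barrier(wi, group_in, scratch);
  /* The barrier is implicit: the first loop has fully completed here. */
  for (size_t wi = 0; wi < WG_SIZE; ++wi)
    region_after_barrier(wi, scratch, group_out);
}

/* Host-side stand-in: invoke the wrapper once per work-group. */
void run_ndrange(size_t num_groups, const int *in, int *out) {
  assert(in != NULL && out != NULL);
  for (size_t g = 0; g < num_groups; ++g)
    kernel_wrapper(g, in, out);
}
```

Splitting the loop at the barrier keeps the barrier's guarantee without threads: every work-item finishes the first region before any work-item starts the second.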
Contributor

Nit; Why is the indentation of 1. and 4. different?

Contributor Author

Thanks, removed bad indentation.

Contributor

The indentation now matches between 1 and 4 (next lines start one space after the start of the first line's sentence), but 3 is different (next line is aligned with the first line's sentence).


The inner-most function is the original input kernel, which is *wrapped*
by new functions in successive phases, until it is ready in a form to be
executed by the Native CPU driver. These include effectively wrapping a `for (wi : wg)`
Contributor

Without additional context, it may not be clear what wg and wi are in this case. I assume it relates to the next paragraph, but if so this last sentence feels a little redundant.

Contributor Author

Removed this line.

Simple fix-up passes take place at this stage: the IR is massaged to
conform to specifications or to fix known deficiencies in earlier
representations. The input IR at this point will contain special
builtins, called `mux builtins` for ndrange or subgroup
Contributor

Since it isn't code, it seems more appropriate to put quotation marks around "mux builtins" here, then drop the quotes for future mentions. Maybe mux can stay as code, given it is part of the builtin names.

Contributor Author

done


The [vecz](SYCLNativeCPUVecz.md) whole-function vectorizer is optionally run.

Note that VECZ may perform its own scalarization, depending on the
Contributor

Nit; It's referred to as "vecz" in the previous paragraph and then as "VECZ" here. Would be good to have it consistent.

Contributor Author

Changed to Vecz
