AVX512 version for POVRAY #452

Srihari-mcw · 2023-05-16T07:26:11Z

The current PR implements AVX512 version of POV-RAY. This contains optimized implementations for Noise and DNoise functions.

Details of implementation:

Implementations/Modifications for single coordinate and double coordinate Noise/DNoise function that processes single/two Vector3D inputs at a time.
An implementation to process and get noise for 8 multiples of single Vector3D coordinate input at a time.
Rewritten versions of functions like DTurbulence/Turbulence/wrinkles/Initialize_Waves and classes like GranitePattern/WrinklesPattern so as to process multiple Vector3D coordinates at a time and use capabilities of AVX512. The present versions of these functions are also retained so that we can take that implementation route whenever we want the code to run with the other versions (Let's say AVX2FMA3)
Additional flags and checks were added to check if the machine supports AVX512 and corresponding code flow.

We are able to observe significantly better performance for POVRAY pipeline with the AVX512 implementation.
Results related to performance comparison of AVX512 version and AVX2FMA3 version with MSVC compiler are added in avx512_build_setup folder in a xlsx file.
The results are recorded in Intel I5-1035 G1 and AMD 7600X.
README related to the AVX512 version is attached in the avx512_build_setup folder detailing how to build the Windows and UNIX versions. Details with the Windows build is attached here also for easier reference

Notes for AVX512 Windows build
==============================

The visual studio version was updated from vs2015 to vs2022 to enable support of AVX512 Version

Set the configuration in solution file to Release-AVX512 | x64
Select the 'Generic POV-Ray > povbase' project and expand 'Backend Headers', then open the
file build.h(source/base/build.h) listed within it.In it replace with name
and email of person who builds the code in BUILT_BY flag and comment the #error directive (line 129)
In syspovconfig.h(windows/povconfig/syspovconfig.h) uncomment the #define _CONSOLE. (line 56)
The AVX512 version was developed with the console version.
The GUI build has been skipped in the solution file.
Note: (Presently with the updated code the GUI project is skipped for building,
as the cmedit64.dll and povcmax64.dll from official windows distribution are
incompatible with VS2022. The console version alone is available to build and test).
Build the solution file and in the vs2022/bin64 folder we can run the POVRAY examples with povconsole-avx512.exe.

         General command example - povconsole-avx512.exe +Ibenchmark.pov
         Single worker thread - povconsole-avx512.exe +WT1 benchmark.pov
         Output image - benchmark.png

Srihari-mcw · 2023-05-23T04:13:43Z

For easier reference of the performance results a excel workbook detailing the same has been attached here (also part of the branch submitted) -
POVRAY Compiled Results - AVX512 - AMD 7600X and Intel I5-1035 G1 with +WT1 Option.xlsx

Srihari-mcw · 2023-06-01T01:19:11Z

Sample results for dimensions 320 x 240 :

AMD Raphael 7600X

Intel Icelake I5-1035 G1

chris20 · 2024-11-08T18:18:00Z

Note: (Presently with the updated code the GUI project is skipped for building,
as the cmedit64.dll and povcmax64.dll from official windows distribution are
incompatible with VS2022. The console version alone is available to build and test).

Curious as to why exactly you say cmedit64 is "incompatible". Could you clarify?

That aside, there is absolutely no reason to confine the build to the console version, DLL issues or no DLL issues. Just rename or delete the editor DLL's and fire up pvengine as it has been explicitly designed to work with or without the editor.

chris20 · 2024-11-08T18:47:01Z

This is quite an extensive patch. The @Srihari-mcw account profile is quite bare and as of the time of writing shows nothing about you, your background, or where you work. In addition there seems little activity apart from your threading issue PR and a few commits here and there.

I need to know whether you coded this personally, are submitting it for someone who did, and whether it was a "work for hire".

Putting it another way: who exactly owns the intellectual property over the code being submitted in this pull request? With a patch this extensive I need to work that out before spending any more time on it.

Note: 3rd-party ownership of the code isn't bad, we have accepted similar patches before, but on those occasions we already knew who we were dealing with.

Srihari-mcw · 2024-11-14T11:15:08Z

Hi @chris20 ,

This was work done as a contractor. We have confirmed this code is provided license free. We will try to check on the DLLs and get back. Thanks

chris20 · 2024-11-17T18:54:19Z

Sorry but "confirmed this code is provided license-free" can't work in this case. Intellectual property generated as a "work for hire" has a specific meaning within the US copyright system. The IP belongs to the entity that paid for the work to be done. For this patch to even be considered for acceptance I must know the identity of the IP owner and must communicate directly with them.

I don't understand the apparent secrecy regarding this work. It seems to be the product of significant effort. I am supposing this was sponsored by either AMD or Intel (probably the former). And if it was, that's perfectly OK. We've accepted similar patches in the past, but we were dealing directly with the copyright owner.

So please cut the secrecy and just tell us who sponsored the work and how we may contact them.

Srihari-mcw · 2024-11-19T12:45:29Z

@chris20 ,

There is no intentional secrecy here, but we are bound by an NDA unless we receive explicit permission to share contract details. We don't anticipate any confidentiality or license issues, and have requested our client to reach out directly or comment here. Thanks

anshuarya · 2024-11-20T16:54:46Z

@chris20 -- @Srihari-mcw works in my team. Feel free to reach out to me if you need any further clarifications: anshu.arya (at) amd

Srihari-mcw · 2024-11-21T03:56:51Z

Recently some testing was carried out with this patch on AMD Granite Ridge 9600X(MSVC Console Version). Significant improvements were seen with benchmark and mediasky examples especially (26% and 21%) respectively. Attaching sheet and details here for reference

AMD 9600X - POVRay test results.xlsx

…l Studio Updates

chris20 · 2024-11-23T13:06:22Z

Recently some testing was carried out with this patch on AMD Granite Ridge 9600X(MSVC Console Version). Significant improvements were seen with benchmark and mediasky examples especially (35% and 26%) respectively.

35% improvement on the benchmark is fairly impressive I will admit.

Srihari-mcw · 2024-11-25T17:25:55Z

Recently some testing was carried out with this patch on AMD Granite Ridge 9600X(MSVC Console Version). Significant improvements were seen with benchmark and mediasky examples especially (35% and 26%) respectively.

35% improvement on the benchmark is fairly impressive I will admit.

@chris20 , some updates were made in the numbers above after further checks. The updated numbers have benchmark giving good gains around 26%. The same has been updated in the comment shared earlier. Thanks

Srihari-mcw · 2024-12-09T16:20:44Z

Hi @chris20 ,

Post your comments on the GUI Version, we tried to build the GUI Version. Currently its generating an executable (pvengine-avx.exe) and currently we are unable to run it. Kindly share your thoughts here, thanks. Screenshot of the folder with build is attached here

We installed from your release package in the repo and were able to run the GUI Version. We tried to remove the DLLs (cmedit32/64.dll , povcmax32/64.dll) from the executable path and were able to run it successfully

Also, have you tried to build with the latest changes with VS2022 for CLI/GUI Versions? Thanks

chris20 · 2024-12-09T17:31:41Z

Hi Srihari,

When you say you're "unable to run it", could you please be more specific?

I haven't had a chance to build the code yet. I will drop an email to Anshu's AMD address with regard to the copyright assignment.

Srihari-mcw · 2024-12-10T16:15:28Z

Hi @chris20 ,

Currently with the generated pvengine executable, on clicking there is no activity seen/ no GUI window opens up. Thanks

chris20 · 2024-12-10T17:53:15Z

Currently with the generated pvengine executable, on clicking there is no activity seen/ no GUI window opens up. Thanks

I don't have the current configuration you're building from and even if I did, I would expect someone who has raised an 80-file PR should at least make an attempt to solve simple issues like this by themselves. If it were some deep, gnarly issue that would require the attention of one of the primary developers and is in scope for the project it would be a different issue. But, c'mon, man, ... "unable to run it" ??? ¯\(ツ)/¯ :-)

Here's a hint: have you thought of using the debugger?

Srihari-mcw · 2024-12-11T15:37:57Z

Yeah @chris20 , actually we are also looking for a fix for this on our side. Just wanted to check with you for any possible inputs. Thanks

Srihari-mcw · 2025-01-17T13:42:52Z

Hi @chris20 ,

The issues with GUI Versions were fixed in the latest commits. Thanks

trevorsandy · 2025-07-04T06:56:09Z

source/core/material/noise.h

-
+inline void Noise2D(const Vector3d& EPoint, int noise_generator, double &value) { }//PortableNoise2D(EPoint, noise_generator, value); }
+inline void DNoise2D(Vector3d& result, const Vector3d& EPoint) { }//PortableDNoise2D(result, EPoint); }
+inline DBL Noise8D(Vector3d& result, const Vector3d& EPoint) { return 0.0;}


The signature of Noise8D does not match its function pointer defined above:

typedef DBL(*NoiseFunction8D) (const Vector3d& EPoint, int noise_generator);

This will trigger a build break with the error: no matching function for call to 'Noise8D' on systems where TRY_OPTIMIZED_NOISE is not defined - e.g. macOS.

Cheers,

trevorsandy · 2025-07-04T07:44:39Z

source/core/material/pattern.h

    virtual DBL EvaluateRaw(const Vector3d& EPoint, const Intersection *pIsection, const Ray *pRay, TraceThreadData *pThread) const;
 };

+/// Implements the `granite` pattern. - AVX512 implementation


These updates to pattern breaks the clang build at linking on macOS Intel and Arm builds with the following message:

ld: Undefined symbols: pov::wrinkles(pov::GenericVector3d<double> const&, pov::Tnormal_Struct const*, pov::GenericVector3d<double>&), referenced from: pov::Perturb_Normal(pov::GenericVector3d<double>&, pov::Tnormal_Struct const*, pov::GenericVector3d<double> const&, pov::Intersection*, pov::Ray const*, pov::TraceThreadData*) in libpovray.a[76](normal.o) clang: error: linker command failed with exit code 1 (use -v to see invocation)

Cheers,

We have followed the commands for unix build and resolved the above stated issues.
Is it possible to understand the commands you used for building in macOS with Intel and Arm builds?

Is it possible to understand the commands you used for building in macOS with Intel and Arm builds?

Certainly, follow this URL to my GitHub build actions: https://github.com/trevorsandy/povray/actions/runs/16244570362

Cheers,

trevorsandy · 2025-08-18T03:54:27Z

platform/x86/avx512/avx512noise.cpp

+/* Use avx2 intrinsics for vpermpd and native fma3    */
+/******************************************************/
+
+#define FMA_PD(a,b,c) _mm256_fmadd_pd((a),(b),(c))


Another issue.

This macro fails on Fedora version 40 and higher (tested Fedora 40 - 42 Docker images) with the following feedback:

In file included from /usr/lib/gcc/x86_64-redhat-linux/14/include/immintrin.h:117, from /usr/lib/gcc/x86_64-redhat-linux/14/include/x86intrin.h:32, from x86/avx512/avx512noise.cpp:48: /usr/lib/gcc/x86_64-redhat-linux/14/include/fmaintrin.h: In function ‘pov::AVX512Noise(pov::GenericVector3d<double> const&, int)’: /usr/lib/gcc/x86_64-redhat-linux/14/include/fmaintrin.h:47:1: error: inlining failed in call to ‘always_inline’ ‘_mm256_fmadd_pd(double __vector(4), double __vector(4), double __vector(4))’: target specific option mismatch 47 | _mm256_fmadd_pd (__m256d __A, __m256d __B, __m256d __C) | ^~~~~~~~~~~~~~~ x86/avx512/avx512noise.cpp:114:38: note: called from here 114 | #define FMA_PD(a,b,c) _mm256_fmadd_pd((a),(b),(c)) | ~~~~~~~~~~~~~~~^~~~~~~~~~~~~

Compiler checks seems ok. There is no check for -fma3:

... checking whether g++ accepts -march=native... yes checking which architecture to optimize for... x86_64-pc-linux-gnu (using -march=native) checking whether C++ compiler accepts -mavx... yes checking whether C++ compiler accepts -mavx2... yes checking whether C++ compiler accepts -mfma... yes checking whether C++ compiler accepts -mfma4... yes checking whether C++ compiler accepts -mavx512f... yes ...

Compilation settings also looks good.

Compilation settings: Build architecture: x86_64-pc-linux-gnu Built/Optimized for: x86_64-pc-linux-gnu (using -march=native) Compiler vendor: gnu Compiler version: g++ 14 CXX Compiler flags: -pipe -Wno-multichar -Wno-write-strings -fno-enforce-eh-specs -Wno-non-template-friend -s -O3 -ffast-math -march=native -O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -march=x86-64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -mtls-dialect=gnu2 -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -pthread CPP Compiler flags: -I/usr/include/SDL2 -D_REENTRANT -I/usr/include/OpenEXR -pthread -I/usr/include/Imath -pthread -I/usr/include -I/usr/include C Compiler flags: -O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -march=x86-64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -mtls-dialect=gnu2 -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer Linker flags: -Wl,-z,relro -Wl,--as-needed -Wl,-z,pack-relative-relocs -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -Wl,--build-id=sha1 -specs=/usr/lib/rpm/redhat/redhat-package-notes -L/usr/lib64 -L/usr/lib CC parameters: gcc CXX parameters: g++ Libraries: -lSDL2 -lSDL2 -lX11 -lOpenEXR-3_2 -lOpenEXRUtil-3_2 -lOpenEXRCore-3_2 -lIex-3_2 -lIlmThread-3_2 -lImath-3_1 -ltiff -ljpeg -lpng -lz -lrt -lm -lboost_thread -lpthread -pthread

You can see the full GitHub build log here

For your information, builds are successful on Fedora 36.

Cheers,

trevorsandy · 2025-08-19T19:13:50Z

unix/prebuild.sh

+if BUILD_x86avx512
+libraries_platformcpu += libx86avx512.a
+libx86avx512_a_SOURCES =  `echo $files_x86avx512`
+libx86avx512_a_CXXFLAGS = \$(CXXFLAGS) -mavx512f


The avx512noise object uses avx2 intrinsics for vpermpd and native fma3 so the BUILD_x86avx512 CXXFLAGS should include the fma flag.

Consequently, line 1448 above should be:

linelibx86avx512_a_CXXFLAGS = \$(CXXFLAGS) -mavx512f -mfma

This addition corrects the Fedora 40+ build failure reported in the post above.

Lastly, it is a good idea to add a dirinfo.dox file to platform\x86\avx512 with the following content:

/** @dir @ingroup PovPlatform @brief Source code files containing FMA3 and AVX512-specific implementations of dynamically dispatched code. The files in this directory contain code that is intended for dynamic dispatch, and therefore may need to be compiled with different options than the rest of POV-Ray. For instance, to compile these files with gcc, the `-mfma -mavx512f` flags will be needed. */

Cheers,

RokerHRO · 2025-09-04T14:06:48Z

platform/x86/cpuid.cpp

 #define CPUID_00000001_ECX_AVX_MASK     (0x1 << 28)
 #define CPUID_00000001_EDX_SSE2_MASK    (0x1 << 26)
 #define CPUID_00000007_EBX_AVX2_MASK    (0x1 <<  5)
+#define CPUID_00000007_EBX_AVX512_MASK  (0x1 << 16)


two constants with the same value? Intentional or by mistake?

Both FMA4 and AVX512 constants are indeed set to 1 << 16. However, for cpuid information comparison to detect machine architecture information, AVX512 is using data from the EBX register

povray/platform/x86/cpuid.cpp

Line 229 in e26c5c4

avx512 = ((info[CPUID_EBX] & CPUID_00000007_EBX_AVX512_MASK) != 0);

and FMA4 is using data from ECX register

povray/platform/x86/cpuid.cpp

Line 236 in e26c5c4

fma4 = ((info[CPUID_ECX] & CPUID_80000001_ECX_FMA4_MASK) != 0);

RokerHRO · 2025-09-04T14:13:38Z

Does this AVX512 patch only work on MS Windows or also on Un*x?

Srihari-mcw · 2025-09-05T11:20:21Z

Does this AVX512 patch only work on MS Windows or also on Un*x?

@RokerHRO , the support for AVX512 patch is added for both Windows and Linux Systems. Thanks

Srihari-mcw added 2 commits May 12, 2023 00:05

Add updates for AVX512 version of POV-RAY

e26c5c4

Update the sheet with additional performance details

81f15b3

chris20 self-assigned this Oct 24, 2024

Make updates to fix issues with OpenEXR dependency after latest Visua…

7066e75

…l Studio Updates

Srihari-mcw added 3 commits January 17, 2025 18:54

Update the gui changes along with minor fixes

1ac864b

update indentation

8e06545

Update README files

ebfdc8c

trevorsandy reviewed Jul 4, 2025

View reviewed changes

Fixes for issues with Noise8D and wrinkles functions definitions

a925046

trevorsandy reviewed Aug 18, 2025

View reviewed changes

trevorsandy reviewed Aug 19, 2025

View reviewed changes

Added fma flag for unix setup with avx512

5190396

RokerHRO reviewed Sep 4, 2025

View reviewed changes

AVX512 version for POVRAY #452

Are you sure you want to change the base?

AVX512 version for POVRAY #452

Uh oh!

Conversation

Srihari-mcw commented May 16, 2023

Uh oh!

Srihari-mcw commented May 23, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Srihari-mcw commented Jun 1, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

chris20 commented Nov 8, 2024

Uh oh!

chris20 commented Nov 8, 2024

Uh oh!

Srihari-mcw commented Nov 14, 2024

Uh oh!

chris20 commented Nov 17, 2024

Uh oh!

Srihari-mcw commented Nov 19, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

anshuarya commented Nov 20, 2024

Uh oh!

Srihari-mcw commented Nov 21, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

chris20 commented Nov 23, 2024

Uh oh!

Srihari-mcw commented Nov 25, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Srihari-mcw commented Dec 9, 2024

Uh oh!

chris20 commented Dec 9, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Srihari-mcw commented Dec 10, 2024

Uh oh!

chris20 commented Dec 10, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Srihari-mcw commented Dec 11, 2024

Uh oh!

Srihari-mcw commented Jan 17, 2025

Uh oh!

trevorsandy Jul 4, 2025

Choose a reason for hiding this comment

Uh oh!

trevorsandy Jul 4, 2025

Choose a reason for hiding this comment

Uh oh!

jenetscaria-mcw Jul 11, 2025

Choose a reason for hiding this comment

Uh oh!

trevorsandy Jul 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

trevorsandy Aug 18, 2025

Choose a reason for hiding this comment

Uh oh!

trevorsandy Aug 19, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

RokerHRO Sep 4, 2025

Choose a reason for hiding this comment

Uh oh!

Srihari-mcw Sep 5, 2025

Choose a reason for hiding this comment

Uh oh!

RokerHRO commented Sep 4, 2025

Uh oh!

Srihari-mcw commented Sep 5, 2025

Uh oh!

Reviewers

Assignees

Srihari-mcw commented May 23, 2023 •

edited

Loading

Srihari-mcw commented Jun 1, 2023 •

edited

Loading

Srihari-mcw commented Nov 19, 2024 •

edited

Loading

Srihari-mcw commented Nov 21, 2024 •

edited

Loading

Srihari-mcw commented Nov 25, 2024 •

edited

Loading

chris20 commented Dec 9, 2024 •

edited

Loading

chris20 commented Dec 10, 2024 •

edited

Loading

trevorsandy Jul 11, 2025 •

edited

Loading

trevorsandy Aug 19, 2025 •

edited

Loading