Skip to content

Conversation

Srihari-mcw
Copy link

The current PR implements AVX512 version of POV-RAY. This contains optimized implementations for Noise and DNoise functions.

Details of implementation:

  • Implementations/Modifications for single coordinate and double coordinate Noise/DNoise function that processes single/two Vector3D inputs at a time.
  • An implementation to process and get noise for 8 multiples of single Vector3D coordinate input at a time.
  • Rewritten versions of functions like DTurbulence/Turbulence/wrinkles/Initialize_Waves and classes like GranitePattern/WrinklesPattern so as to process multiple Vector3D coordinates at a time and use capabilities of AVX512. The present versions of these functions are also retained so that we can take that implementation route whenever we want the code to run with the other versions (Let's say AVX2FMA3)
  • Additional flags and checks were added to check if the machine supports AVX512 and corresponding code flow.

We are able to observe significantly better performance for POVRAY pipeline with the AVX512 implementation.
Results related to performance comparison of AVX512 version and AVX2FMA3 version with MSVC compiler are added in avx512_build_setup folder in a xlsx file.
The results are recorded in Intel I5-1035 G1 and AMD 7600X.
README related to the AVX512 version is attached in the avx512_build_setup folder detailing how to build the Windows and UNIX versions. Details with the Windows build is attached here also for easier reference

Notes for AVX512 Windows build
==============================

The visual studio version was updated from vs2015 to vs2022 to enable support of AVX512 Version

  1. Set the configuration in solution file to Release-AVX512 | x64
  2. Select the 'Generic POV-Ray > povbase' project and expand 'Backend Headers', then open the
    file build.h(source/base/build.h) listed within it.In it replace with name
    and email of person who builds the code in BUILT_BY flag and comment the #error directive (line 129)
  3. In syspovconfig.h(windows/povconfig/syspovconfig.h) uncomment the #define _CONSOLE. (line 56)
    The AVX512 version was developed with the console version.
    The GUI build has been skipped in the solution file.
    Note: (Presently with the updated code the GUI project is skipped for building,
    as the cmedit64.dll and povcmax64.dll from official windows distribution are
    incompatible with VS2022. The console version alone is available to build and test).
  4. Build the solution file and in the vs2022/bin64 folder we can run the POVRAY examples with povconsole-avx512.exe.
         General command example - povconsole-avx512.exe +Ibenchmark.pov
         Single worker thread - povconsole-avx512.exe +WT1 benchmark.pov
         Output image - benchmark.png

@Srihari-mcw
Copy link
Author

Srihari-mcw commented May 23, 2023

For easier reference of the performance results a excel workbook detailing the same has been attached here (also part of the branch submitted) -
POVRAY Compiled Results - AVX512 - AMD 7600X and Intel I5-1035 G1 with +WT1 Option.xlsx

@Srihari-mcw
Copy link
Author

Srihari-mcw commented Jun 1, 2023

Sample results for dimensions 320 x 240 :

AMD Raphael 7600X

image

image

Intel Icelake I5-1035 G1
image
image

@chris20 chris20 self-assigned this Oct 24, 2024
@chris20
Copy link
Member

chris20 commented Nov 8, 2024

Note: (Presently with the updated code the GUI project is skipped for building,
as the cmedit64.dll and povcmax64.dll from official windows distribution are
incompatible with VS2022. The console version alone is available to build and test).

Curious as to why exactly you say cmedit64 is "incompatible". Could you clarify?

That aside, there is absolutely no reason to confine the build to the console version, DLL issues or no DLL issues. Just rename or delete the editor DLL's and fire up pvengine as it has been explicitly designed to work with or without the editor.

@chris20
Copy link
Member

chris20 commented Nov 8, 2024

This is quite an extensive patch. The @Srihari-mcw account profile is quite bare and as of the time of writing shows nothing about you, your background, or where you work. In addition there seems little activity apart from your threading issue PR and a few commits here and there.

I need to know whether you coded this personally, are submitting it for someone who did, and whether it was a "work for hire".

Putting it another way: who exactly owns the intellectual property over the code being submitted in this pull request? With a patch this extensive I need to work that out before spending any more time on it.

Note: 3rd-party ownership of the code isn't bad, we have accepted similar patches before, but on those occasions we already knew who we were dealing with.

@Srihari-mcw
Copy link
Author

Hi @chris20 ,

This was work done as a contractor. We have confirmed this code is provided license free. We will try to check on the DLLs and get back. Thanks

@chris20
Copy link
Member

chris20 commented Nov 17, 2024

Sorry but "confirmed this code is provided license-free" can't work in this case. Intellectual property generated as a "work for hire" has a specific meaning within the US copyright system. The IP belongs to the entity that paid for the work to be done. For this patch to even be considered for acceptance I must know the identity of the IP owner and must communicate directly with them.

I don't understand the apparent secrecy regarding this work. It seems to be the product of significant effort. I am supposing this was sponsored by either AMD or Intel (probably the former). And if it was, that's perfectly OK. We've accepted similar patches in the past, but we were dealing directly with the copyright owner.

So please cut the secrecy and just tell us who sponsored the work and how we may contact them.

@Srihari-mcw
Copy link
Author

Srihari-mcw commented Nov 19, 2024

@chris20 ,

There is no intentional secrecy here, but we are bound by an NDA unless we receive explicit permission to share contract details. We don't anticipate any confidentiality or license issues, and have requested our client to reach out directly or comment here. Thanks

@anshuarya
Copy link
Contributor

@chris20 -- @Srihari-mcw works in my team. Feel free to reach out to me if you need any further clarifications: anshu.arya (at) amd

@Srihari-mcw
Copy link
Author

Srihari-mcw commented Nov 21, 2024

Recently some testing was carried out with this patch on AMD Granite Ridge 9600X(MSVC Console Version). Significant improvements were seen with benchmark and mediasky examples especially (26% and 21%) respectively. Attaching sheet and details here for reference

AMD 9600X - POVRay test results.xlsx

image
image

@chris20
Copy link
Member

chris20 commented Nov 23, 2024

Recently some testing was carried out with this patch on AMD Granite Ridge 9600X(MSVC Console Version). Significant improvements were seen with benchmark and mediasky examples especially (35% and 26%) respectively.

35% improvement on the benchmark is fairly impressive I will admit.

@Srihari-mcw
Copy link
Author

Srihari-mcw commented Nov 25, 2024

Recently some testing was carried out with this patch on AMD Granite Ridge 9600X(MSVC Console Version). Significant improvements were seen with benchmark and mediasky examples especially (35% and 26%) respectively.

35% improvement on the benchmark is fairly impressive I will admit.

@chris20 , some updates were made in the numbers above after further checks. The updated numbers have benchmark giving good gains around 26%. The same has been updated in the comment shared earlier. Thanks

@Srihari-mcw
Copy link
Author

Hi @chris20 ,

Post your comments on the GUI Version, we tried to build the GUI Version. Currently its generating an executable (pvengine-avx.exe) and currently we are unable to run it. Kindly share your thoughts here, thanks. Screenshot of the folder with build is attached here

povray output folder

We installed from your release package in the repo and were able to run the GUI Version. We tried to remove the DLLs (cmedit32/64.dll , povcmax32/64.dll) from the executable path and were able to run it successfully

Also, have you tried to build with the latest changes with VS2022 for CLI/GUI Versions? Thanks

@chris20
Copy link
Member

chris20 commented Dec 9, 2024

Hi Srihari,

When you say you're "unable to run it", could you please be more specific?

I haven't had a chance to build the code yet. I will drop an email to Anshu's AMD address with regard to the copyright assignment.

@Srihari-mcw
Copy link
Author

Hi @chris20 ,

Currently with the generated pvengine executable, on clicking there is no activity seen/ no GUI window opens up. Thanks

@chris20
Copy link
Member

chris20 commented Dec 10, 2024

Currently with the generated pvengine executable, on clicking there is no activity seen/ no GUI window opens up. Thanks

I don't have the current configuration you're building from and even if I did, I would expect someone who has raised an 80-file PR should at least make an attempt to solve simple issues like this by themselves. If it were some deep, gnarly issue that would require the attention of one of the primary developers and is in scope for the project it would be a different issue. But, c'mon, man, ... "unable to run it" ??? ¯\(ツ)/¯ :-)

Here's a hint: have you thought of using the debugger?

@Srihari-mcw
Copy link
Author

Yeah @chris20 , actually we are also looking for a fix for this on our side. Just wanted to check with you for any possible inputs. Thanks

@Srihari-mcw
Copy link
Author

Hi @chris20 ,

The issues with GUI Versions were fixed in the latest commits. Thanks


inline void Noise2D(const Vector3d& EPoint, int noise_generator, double &value) { }//PortableNoise2D(EPoint, noise_generator, value); }
inline void DNoise2D(Vector3d& result, const Vector3d& EPoint) { }//PortableDNoise2D(result, EPoint); }
inline DBL Noise8D(Vector3d& result, const Vector3d& EPoint) { return 0.0;}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The signature of Noise8D does not match its function pointer defined above:

typedef DBL(*NoiseFunction8D) (const Vector3d& EPoint, int noise_generator);

This will trigger a build break with the error: no matching function for call to 'Noise8D' on systems where TRY_OPTIMIZED_NOISE is not defined - e.g. macOS.

Cheers,

{
virtual PatternPtr Clone() const { return BasicPattern::Clone(*this); }
virtual DBL EvaluateRaw(const Vector3d& EPoint, const Intersection *pIsection, const Ray *pRay, TraceThreadData *pThread) const;
};

/// Implements the `granite` pattern. - AVX512 implementation

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These updates to pattern breaks the clang build at linking on macOS Intel and Arm builds with the following message:

ld: Undefined symbols:
  pov::wrinkles(pov::GenericVector3d<double> const&, pov::Tnormal_Struct const*, pov::GenericVector3d<double>&), referenced from:
      pov::Perturb_Normal(pov::GenericVector3d<double>&, pov::Tnormal_Struct const*, pov::GenericVector3d<double> const&, pov::Intersection*, pov::Ray const*, pov::TraceThreadData*) in libpovray.a[76](normal.o)
clang: error: linker command failed with exit code 1 (use -v to see invocation)

Cheers,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have followed the commands for unix build and resolved the above stated issues.
Is it possible to understand the commands you used for building in macOS with Intel and Arm builds?

Copy link

@trevorsandy trevorsandy Jul 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to understand the commands you used for building in macOS with Intel and Arm builds?

Certainly, follow this URL to my GitHub build actions: https://github.com/trevorsandy/povray/actions/runs/16244570362

Cheers,

/* Use avx2 intrinsics for vpermpd and native fma3 */
/******************************************************/

#define FMA_PD(a,b,c) _mm256_fmadd_pd((a),(b),(c))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another issue.

This macro fails on Fedora version 40 and higher (tested Fedora 40 - 42 Docker images) with the following feedback:

In file included from /usr/lib/gcc/x86_64-redhat-linux/14/include/immintrin.h:117,
                 from /usr/lib/gcc/x86_64-redhat-linux/14/include/x86intrin.h:32,
                 from x86/avx512/avx512noise.cpp:48:
/usr/lib/gcc/x86_64-redhat-linux/14/include/fmaintrin.h: In function ‘pov::AVX512Noise(pov::GenericVector3d<double> const&, int)’:
/usr/lib/gcc/x86_64-redhat-linux/14/include/fmaintrin.h:47:1: error: inlining failed in call to ‘always_inline’ ‘_mm256_fmadd_pd(double __vector(4), double __vector(4), double __vector(4))’: target specific option mismatch
   47 | _mm256_fmadd_pd (__m256d __A, __m256d __B, __m256d __C)
      | ^~~~~~~~~~~~~~~
x86/avx512/avx512noise.cpp:114:38: note: called from here
  114 | #define FMA_PD(a,b,c) _mm256_fmadd_pd((a),(b),(c))
      |                       ~~~~~~~~~~~~~~~^~~~~~~~~~~~~

Compiler checks seems ok. There is no check for -fma3:

...
checking whether g++ accepts -march=native... yes
checking which architecture to optimize for... x86_64-pc-linux-gnu (using -march=native)
checking whether C++ compiler accepts -mavx... yes
checking whether C++ compiler accepts -mavx2... yes
checking whether C++ compiler accepts -mfma... yes
checking whether C++ compiler accepts -mfma4... yes
checking whether C++ compiler accepts -mavx512f... yes
...

Compilation settings also looks good.

Compilation settings:
  Build architecture:  x86_64-pc-linux-gnu
  Built/Optimized for: x86_64-pc-linux-gnu (using -march=native)
  Compiler vendor:     gnu
  Compiler version:    g++ 14
  CXX Compiler flags:  -pipe -Wno-multichar -Wno-write-strings -fno-enforce-eh-specs -Wno-non-template-friend -s -O3 -ffast-math -march=native -O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -march=x86-64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -mtls-dialect=gnu2 -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -pthread
  CPP Compiler flags:  -I/usr/include/SDL2 -D_REENTRANT -I/usr/include/OpenEXR -pthread -I/usr/include/Imath  -pthread -I/usr/include  -I/usr/include
  C Compiler flags:    -O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1  -m64 -march=x86-64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -mtls-dialect=gnu2 -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer 
  Linker flags:        -Wl,-z,relro -Wl,--as-needed  -Wl,-z,pack-relative-relocs -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1  -Wl,--build-id=sha1 -specs=/usr/lib/rpm/redhat/redhat-package-notes  -L/usr/lib64  -L/usr/lib
  CC parameters:       gcc
  CXX parameters:      g++
  Libraries:           -lSDL2 -lSDL2  -lX11  -lOpenEXR-3_2 -lOpenEXRUtil-3_2 -lOpenEXRCore-3_2 -lIex-3_2 -lIlmThread-3_2 -lImath-3_1 -ltiff -ljpeg -lpng -lz -lrt -lm -lboost_thread -lpthread  -pthread 

You can see the full GitHub build log here

For your information, builds are successful on Fedora 36.

Cheers,

unix/prebuild.sh Outdated
if BUILD_x86avx512
libraries_platformcpu += libx86avx512.a
libx86avx512_a_SOURCES = `echo $files_x86avx512`
libx86avx512_a_CXXFLAGS = \$(CXXFLAGS) -mavx512f
Copy link

@trevorsandy trevorsandy Aug 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The avx512noise object uses avx2 intrinsics for vpermpd and native fma3 so the BUILD_x86avx512 CXXFLAGS should include the fma flag.

Consequently, line 1448 above should be:

linelibx86avx512_a_CXXFLAGS = \$(CXXFLAGS) -mavx512f -mfma

This addition corrects the Fedora 40+ build failure reported in the post above.

Lastly, it is a good idea to add a dirinfo.dox file to platform\x86\avx512 with the following content:

/**

@dir
@ingroup PovPlatform
@brief Source code files containing FMA3 and AVX512-specific implementations of dynamically dispatched code.

The files in this directory contain code that is intended for dynamic dispatch,
and therefore may need to be compiled with different options than the rest of POV-Ray.
For instance, to compile these files with gcc, the `-mfma -mavx512f` flags will be needed.

*/

Cheers,

@@ -114,6 +114,7 @@ static unsigned long long getXCR0()
#define CPUID_00000001_ECX_AVX_MASK (0x1 << 28)
#define CPUID_00000001_EDX_SSE2_MASK (0x1 << 26)
#define CPUID_00000007_EBX_AVX2_MASK (0x1 << 5)
#define CPUID_00000007_EBX_AVX512_MASK (0x1 << 16)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

two constants with the same value? Intentional or by mistake?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both FMA4 and AVX512 constants are indeed set to 1 << 16. However, for cpuid information comparison to detect machine architecture information, AVX512 is using data from the EBX register

avx512 = ((info[CPUID_EBX] & CPUID_00000007_EBX_AVX512_MASK) != 0);
and FMA4 is using data from ECX register
fma4 = ((info[CPUID_ECX] & CPUID_80000001_ECX_FMA4_MASK) != 0);

@RokerHRO
Copy link

RokerHRO commented Sep 4, 2025

Does this AVX512 patch only work on MS Windows or also on Un*x?

@Srihari-mcw
Copy link
Author

Does this AVX512 patch only work on MS Windows or also on Un*x?

@RokerHRO , the support for AVX512 patch is added for both Windows and Linux Systems. Thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants