-
Notifications
You must be signed in to change notification settings - Fork 293
AVX512 version for POVRAY #452
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: release/v3.8.0
Are you sure you want to change the base?
AVX512 version for POVRAY #452
Conversation
For easier reference of the performance results a excel workbook detailing the same has been attached here (also part of the branch submitted) - |
Curious as to why exactly you say cmedit64 is "incompatible". Could you clarify? That aside, there is absolutely no reason to confine the build to the console version, DLL issues or no DLL issues. Just rename or delete the editor DLL's and fire up pvengine as it has been explicitly designed to work with or without the editor. |
This is quite an extensive patch. The @Srihari-mcw account profile is quite bare and as of the time of writing shows nothing about you, your background, or where you work. In addition there seems little activity apart from your threading issue PR and a few commits here and there. I need to know whether you coded this personally, are submitting it for someone who did, and whether it was a "work for hire". Putting it another way: who exactly owns the intellectual property over the code being submitted in this pull request? With a patch this extensive I need to work that out before spending any more time on it. Note: 3rd-party ownership of the code isn't bad, we have accepted similar patches before, but on those occasions we already knew who we were dealing with. |
Hi @chris20 , This was work done as a contractor. We have confirmed this code is provided license free. We will try to check on the DLLs and get back. Thanks |
Sorry but "confirmed this code is provided license-free" can't work in this case. Intellectual property generated as a "work for hire" has a specific meaning within the US copyright system. The IP belongs to the entity that paid for the work to be done. For this patch to even be considered for acceptance I must know the identity of the IP owner and must communicate directly with them. I don't understand the apparent secrecy regarding this work. It seems to be the product of significant effort. I am supposing this was sponsored by either AMD or Intel (probably the former). And if it was, that's perfectly OK. We've accepted similar patches in the past, but we were dealing directly with the copyright owner. So please cut the secrecy and just tell us who sponsored the work and how we may contact them. |
@chris20 , There is no intentional secrecy here, but we are bound by an NDA unless we receive explicit permission to share contract details. We don't anticipate any confidentiality or license issues, and have requested our client to reach out directly or comment here. Thanks |
@chris20 -- @Srihari-mcw works in my team. Feel free to reach out to me if you need any further clarifications: anshu.arya (at) amd |
Recently some testing was carried out with this patch on AMD Granite Ridge 9600X(MSVC Console Version). Significant improvements were seen with benchmark and mediasky examples especially (26% and 21%) respectively. Attaching sheet and details here for reference |
35% improvement on the benchmark is fairly impressive I will admit. |
@chris20 , some updates were made in the numbers above after further checks. The updated numbers have benchmark giving good gains around 26%. The same has been updated in the comment shared earlier. Thanks |
Hi @chris20 , Post your comments on the GUI Version, we tried to build the GUI Version. Currently its generating an executable (pvengine-avx.exe) and currently we are unable to run it. Kindly share your thoughts here, thanks. Screenshot of the folder with build is attached here We installed from your release package in the repo and were able to run the GUI Version. We tried to remove the DLLs (cmedit32/64.dll , povcmax32/64.dll) from the executable path and were able to run it successfully Also, have you tried to build with the latest changes with VS2022 for CLI/GUI Versions? Thanks |
Hi Srihari, When you say you're "unable to run it", could you please be more specific? I haven't had a chance to build the code yet. I will drop an email to Anshu's AMD address with regard to the copyright assignment. |
Hi @chris20 , Currently with the generated pvengine executable, on clicking there is no activity seen/ no GUI window opens up. Thanks |
I don't have the current configuration you're building from and even if I did, I would expect someone who has raised an 80-file PR should at least make an attempt to solve simple issues like this by themselves. If it were some deep, gnarly issue that would require the attention of one of the primary developers and is in scope for the project it would be a different issue. But, c'mon, man, ... "unable to run it" ??? ¯\(ツ)/¯ :-) Here's a hint: have you thought of using the debugger? |
Yeah @chris20 , actually we are also looking for a fix for this on our side. Just wanted to check with you for any possible inputs. Thanks |
Hi @chris20 , The issues with GUI Versions were fixed in the latest commits. Thanks |
source/core/material/noise.h
Outdated
|
||
inline void Noise2D(const Vector3d& EPoint, int noise_generator, double &value) { }//PortableNoise2D(EPoint, noise_generator, value); } | ||
inline void DNoise2D(Vector3d& result, const Vector3d& EPoint) { }//PortableDNoise2D(result, EPoint); } | ||
inline DBL Noise8D(Vector3d& result, const Vector3d& EPoint) { return 0.0;} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The signature of Noise8D does not match its function pointer defined above:
typedef DBL(*NoiseFunction8D) (const Vector3d& EPoint, int noise_generator);
This will trigger a build break with the error: no matching function for call to 'Noise8D' on systems where TRY_OPTIMIZED_NOISE is not defined - e.g. macOS.
Cheers,
{ | ||
virtual PatternPtr Clone() const { return BasicPattern::Clone(*this); } | ||
virtual DBL EvaluateRaw(const Vector3d& EPoint, const Intersection *pIsection, const Ray *pRay, TraceThreadData *pThread) const; | ||
}; | ||
|
||
/// Implements the `granite` pattern. - AVX512 implementation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These updates to pattern breaks the clang build at linking on macOS Intel and Arm builds with the following message:
ld: Undefined symbols:
pov::wrinkles(pov::GenericVector3d<double> const&, pov::Tnormal_Struct const*, pov::GenericVector3d<double>&), referenced from:
pov::Perturb_Normal(pov::GenericVector3d<double>&, pov::Tnormal_Struct const*, pov::GenericVector3d<double> const&, pov::Intersection*, pov::Ray const*, pov::TraceThreadData*) in libpovray.a[76](normal.o)
clang: error: linker command failed with exit code 1 (use -v to see invocation)
Cheers,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have followed the commands for unix build
and resolved the above stated issues.
Is it possible to understand the commands you used for building in macOS with Intel and Arm builds?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to understand the commands you used for building in macOS with Intel and Arm builds?
Certainly, follow this URL to my GitHub build actions: https://github.com/trevorsandy/povray/actions/runs/16244570362
Cheers,
/* Use avx2 intrinsics for vpermpd and native fma3 */ | ||
/******************************************************/ | ||
|
||
#define FMA_PD(a,b,c) _mm256_fmadd_pd((a),(b),(c)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another issue.
This macro fails on Fedora version 40 and higher (tested Fedora 40 - 42 Docker images) with the following feedback:
In file included from /usr/lib/gcc/x86_64-redhat-linux/14/include/immintrin.h:117,
from /usr/lib/gcc/x86_64-redhat-linux/14/include/x86intrin.h:32,
from x86/avx512/avx512noise.cpp:48:
/usr/lib/gcc/x86_64-redhat-linux/14/include/fmaintrin.h: In function ‘pov::AVX512Noise(pov::GenericVector3d<double> const&, int)’:
/usr/lib/gcc/x86_64-redhat-linux/14/include/fmaintrin.h:47:1: error: inlining failed in call to ‘always_inline’ ‘_mm256_fmadd_pd(double __vector(4), double __vector(4), double __vector(4))’: target specific option mismatch
47 | _mm256_fmadd_pd (__m256d __A, __m256d __B, __m256d __C)
| ^~~~~~~~~~~~~~~
x86/avx512/avx512noise.cpp:114:38: note: called from here
114 | #define FMA_PD(a,b,c) _mm256_fmadd_pd((a),(b),(c))
| ~~~~~~~~~~~~~~~^~~~~~~~~~~~~
Compiler checks seems ok. There is no check for -fma3:
...
checking whether g++ accepts -march=native... yes
checking which architecture to optimize for... x86_64-pc-linux-gnu (using -march=native)
checking whether C++ compiler accepts -mavx... yes
checking whether C++ compiler accepts -mavx2... yes
checking whether C++ compiler accepts -mfma... yes
checking whether C++ compiler accepts -mfma4... yes
checking whether C++ compiler accepts -mavx512f... yes
...
Compilation settings also looks good.
Compilation settings:
Build architecture: x86_64-pc-linux-gnu
Built/Optimized for: x86_64-pc-linux-gnu (using -march=native)
Compiler vendor: gnu
Compiler version: g++ 14
CXX Compiler flags: -pipe -Wno-multichar -Wno-write-strings -fno-enforce-eh-specs -Wno-non-template-friend -s -O3 -ffast-math -march=native -O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -march=x86-64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -mtls-dialect=gnu2 -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -pthread
CPP Compiler flags: -I/usr/include/SDL2 -D_REENTRANT -I/usr/include/OpenEXR -pthread -I/usr/include/Imath -pthread -I/usr/include -I/usr/include
C Compiler flags: -O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -march=x86-64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -mtls-dialect=gnu2 -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer
Linker flags: -Wl,-z,relro -Wl,--as-needed -Wl,-z,pack-relative-relocs -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -Wl,--build-id=sha1 -specs=/usr/lib/rpm/redhat/redhat-package-notes -L/usr/lib64 -L/usr/lib
CC parameters: gcc
CXX parameters: g++
Libraries: -lSDL2 -lSDL2 -lX11 -lOpenEXR-3_2 -lOpenEXRUtil-3_2 -lOpenEXRCore-3_2 -lIex-3_2 -lIlmThread-3_2 -lImath-3_1 -ltiff -ljpeg -lpng -lz -lrt -lm -lboost_thread -lpthread -pthread
You can see the full GitHub build log here
For your information, builds are successful on Fedora 36.
Cheers,
unix/prebuild.sh
Outdated
if BUILD_x86avx512 | ||
libraries_platformcpu += libx86avx512.a | ||
libx86avx512_a_SOURCES = `echo $files_x86avx512` | ||
libx86avx512_a_CXXFLAGS = \$(CXXFLAGS) -mavx512f |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The avx512noise object uses avx2 intrinsics for vpermpd and native fma3 so the BUILD_x86avx512 CXXFLAGS should include the fma flag.
Consequently, line 1448 above should be:
linelibx86avx512_a_CXXFLAGS = \$(CXXFLAGS) -mavx512f -mfma
This addition corrects the Fedora 40+ build failure reported in the post above.
Lastly, it is a good idea to add a dirinfo.dox file to platform\x86\avx512
with the following content:
/**
@dir
@ingroup PovPlatform
@brief Source code files containing FMA3 and AVX512-specific implementations of dynamically dispatched code.
The files in this directory contain code that is intended for dynamic dispatch,
and therefore may need to be compiled with different options than the rest of POV-Ray.
For instance, to compile these files with gcc, the `-mfma -mavx512f` flags will be needed.
*/
Cheers,
@@ -114,6 +114,7 @@ static unsigned long long getXCR0() | |||
#define CPUID_00000001_ECX_AVX_MASK (0x1 << 28) | |||
#define CPUID_00000001_EDX_SSE2_MASK (0x1 << 26) | |||
#define CPUID_00000007_EBX_AVX2_MASK (0x1 << 5) | |||
#define CPUID_00000007_EBX_AVX512_MASK (0x1 << 16) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
two constants with the same value? Intentional or by mistake?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both FMA4 and AVX512 constants are indeed set to 1 << 16. However, for cpuid information comparison to detect machine architecture information, AVX512 is using data from the EBX register
Line 229 in e26c5c4
avx512 = ((info[CPUID_EBX] & CPUID_00000007_EBX_AVX512_MASK) != 0); |
Line 236 in e26c5c4
fma4 = ((info[CPUID_ECX] & CPUID_80000001_ECX_FMA4_MASK) != 0); |
Does this AVX512 patch only work on MS Windows or also on Un*x? |
@RokerHRO , the support for AVX512 patch is added for both Windows and Linux Systems. Thanks |
The current PR implements AVX512 version of POV-RAY. This contains optimized implementations for Noise and DNoise functions.
Details of implementation:
We are able to observe significantly better performance for POVRAY pipeline with the AVX512 implementation.
Results related to performance comparison of AVX512 version and AVX2FMA3 version with MSVC compiler are added in avx512_build_setup folder in a xlsx file.
The results are recorded in Intel I5-1035 G1 and AMD 7600X.
README related to the AVX512 version is attached in the avx512_build_setup folder detailing how to build the Windows and UNIX versions. Details with the Windows build is attached here also for easier reference
Notes for AVX512 Windows build
==============================
The visual studio version was updated from vs2015 to vs2022 to enable support of AVX512 Version
file
build.h
(source/base/build.h) listed within it.In it replace with nameand email of person who builds the code in
BUILT_BY
flag and comment the #error directive (line 129)The AVX512 version was developed with the console version.
The GUI build has been skipped in the solution file.
Note: (Presently with the updated code the GUI project is skipped for building,
as the cmedit64.dll and povcmax64.dll from official windows distribution are
incompatible with VS2022. The console version alone is available to build and test).