19 commits
04fbb2f
Working version of diffusion and reflection. TODO fix refraction
Black-Phoenix Sep 23, 2019
dd86be4
Finished diffusion and reflection. Also finished Thrust stream compac…
Black-Phoenix Sep 24, 2019
5191715
Optimized intersection calculations and set max depth = scenedepth
Black-Phoenix Sep 24, 2019
2beb0d2
Added sorting and caching first bounce
Black-Phoenix Sep 24, 2019
7cbef1a
Added refraction; Fixed cache bounce to reset with view change
Black-Phoenix Sep 26, 2019
8ac672b
Added AA
Black-Phoenix Sep 26, 2019
05de6d7
Added AA to path tracer
Black-Phoenix Sep 27, 2019
8a7cb02
Added motion blur
Black-Phoenix Sep 27, 2019
806dcb3
Added mesh loading; Tested on simple cube mesh; Mesh must include nor…
Black-Phoenix Sep 28, 2019
fdde37d
Added Fresnel's approximation for refraction; Added material support …
Black-Phoenix Sep 28, 2019
6701905
First pass at readme; Added support to load mesh files without normals
Black-Phoenix Sep 29, 2019
d1fec28
Updated README with more images
Black-Phoenix Sep 29, 2019
650cc23
Finished most images required in the README
Black-Phoenix Sep 29, 2019
e069cfb
Updated readme with performance plots; Added timing code via common.cu
Black-Phoenix Sep 29, 2019
cc8952a
Fixed plot in README
Black-Phoenix Sep 29, 2019
cca1d6f
Added reflective reference to Fresnel's effect
Black-Phoenix Sep 29, 2019
f224b2c
Changed banner image
Black-Phoenix Sep 29, 2019
2069243
Added model files
Black-Phoenix Sep 29, 2019
e04bb3e
Updated README
Black-Phoenix Sep 29, 2019
2 changes: 2 additions & 0 deletions CMakeLists.txt
@@ -73,6 +73,7 @@ set(headers
src/sceneStructs.h
src/preview.h
src/utilities.h
src/common.h
)

set(sources
@@ -84,6 +85,7 @@ set(sources
src/scene.cpp
src/preview.cpp
src/utilities.cpp
src/common.cu
)

list(SORT headers)
10 changes: 5 additions & 5 deletions INSTRUCTION.md
@@ -86,7 +86,7 @@ by material type. This should be easily toggleable.
are contiguous in memory before shading. How does this impact performance? Why?
* A toggleable option to cache the first bounce intersections for re-use across all
subsequent iterations. Provide performance benefit analysis across different
max ray depths.
max ray depths.

##### Part 2 - Make Your Pathtracer Unique!

@@ -105,7 +105,7 @@ with point value up to +20/100 at the grader's discretion
* Refraction (e.g. glass/water) [PBRT 8.2] with Frensel effects using
[Schlick's approximation](https://en.wikipedia.org/wiki/Schlick's_approximation)
or more accurate methods [PBRT 8.5]. You can use `glm::refract` for
Snell's law.
Snell's law. ???
* Recommended but not required: non-perfect specular surfaces. (See below.)
* Physically-based depth-of-field (by jittering rays within an aperture)
[PBRT 6.2.3].
@@ -114,7 +114,7 @@ with point value up to +20/100 at the grader's discretion
* Procedural Shapes & Textures.
* You must generate a minimum of two different complex shapes procedurally. (Not primitives)
* You must be able to shade object with a minimum of two different textures
* Texture mapping [PBRT 10.4] and Bump mapping [PBRT 9.3].
* Texture mapping [PBRT 10.4] and Bump mapping ??? [PBRT 9.3].
* Implement file-loaded textures AND a basic procedural texture
* Provide a performance comparison between the two
* Direct lighting (by taking a final ray directly to a random point on an
@@ -124,7 +124,7 @@ with point value up to +20/100 at the grader's discretion
* Subsurface scattering [PBRT 5.6.2, 11.6].
* [Better hemisphere sampling methods](http://graphics.ucsd.edu/courses/cse168_s14/ucsd/CSE168_11_Random.pdf)
* Arbitrary mesh loading and rendering (e.g. `obj` files) with
toggleable bounding volume intersection culling
toggleable bounding volume intersection culling ???
* You can find models online or export them from your favorite 3D modeling application.
With approval, you may use a third-party loading code to bring the data
into C++. [tinyObj](http://syoyo.github.io/tinyobjloader/) is highly recommended.
@@ -141,7 +141,7 @@ toggleable bounding volume intersection culling
* If implemented in conjunction with Arbitrary mesh loading, this qualifies as the
toggleable bounding volume intersection culling.
* See below for more resources
* [Wavefront pathtracing](https://research.nvidia.com/publication/megakernels-considered-harmful-wavefront-path-tracing-gpus):
* [Wavefront pathtracing](https://research.nvidia.com/publication/megakernels-considered-harmful-wavefront-path-tracing-gpus): ???
Group rays by material without a sorting pass. A sane implementation will
require considerable refactoring, since every supported material suddenly needs
its own kernel.
218 changes: 213 additions & 5 deletions README.md
@@ -3,11 +3,219 @@ CUDA Path Tracer

**University of Pennsylvania, CIS 565: GPU Programming and Architecture, Project 3**

* (TODO) YOUR NAME HERE
* Tested on: (TODO) Windows 22, i7-2222 @ 2.22GHz 22GB, GTX 222 222MB (Moore 2222 Lab)
* Name: Vaibhav Arcot
- [LinkedIn](https://www.linkedin.com/in/vaibhav-arcot-129829167/)
* Tested on: Windows 10, i7-7700HQ @ 2.8GHz (3.8 Boost) 32GB, External GTX 1080Ti, 11G (My personal laptop)

### (TODO: Your README)
![Banner Image](./img/banner.png)

*DO NOT* leave the README to the last minute! It is a crucial part of the
project, and we will not be able to grade you without a good README.
## Path Tracing overview

This repo is a CUDA-accelerated path tracer written entirely in C++ and CUDA. The idea of a path tracer is to simulate how light interacts with the materials in a scene and how each object affects the appearance of the others.

All images shown were created by running 5000 iterations unless otherwise specified, and all scene files and meshes are provided.

### Features
* Reflective and diffused materials
* Stream compaction
* Material sorting
* Caching first bounce
* Refractive materials using Schlick's approximation of Fresnel's law
* Motion blur
* Anti Aliasing
* Normal debugging view
* Loading arbitrary meshes and ray culling
## Cornell Box

![Ref image](./img/REFERENCE_cornell.5000samp.png)

The Cornell box is a simple stage consisting of 5 diffusive walls (1 red, 1 green and the other 3 white). The sample above also contains a diffusive sphere.

## Different Materials

Below is an image with the 4 types of materials inside a Cornell box. The small dodecicosacron is a diffusive material, one orb is reflective and one is transparent. Finally, a cube is used as a light for the scene.

![All Materials](./img/all_materials.png)

## Effects

### Motion blur
To perform motion blur, we move the object slightly, according to its velocity, each time we render a frame. Accumulating these frames gives the illusion of motion. To showcase this feature, I decided to pull down the red and green walls of the Cornell box. Each wall moved down at a constant velocity and stopped once it reached a set point. Just for fun, I also put a light source inside the transparent orb.
![Motion blur](./img/motion_blur.png)
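
As a rough illustration of the per-frame update described above, here is a minimal sketch in plain C++ (not the CUDA code from this repo). The `Vec3` type, the function name and the `stopY` parameter are assumptions made for this example only.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical minimal vector type (a real renderer would likely use glm::vec3).
struct Vec3 { float x, y, z; };

// Position of a falling wall at a given iteration: start + velocity * iteration,
// with y clamped so the wall stops once it reaches stopY.
// Assumes the wall moves downwards (velocity.y <= 0).
Vec3 positionAtIteration(Vec3 start, Vec3 velocity, int iteration, float stopY) {
    assert(iteration >= 0);
    float t = static_cast<float>(iteration);
    Vec3 p{start.x + velocity.x * t, start.y + velocity.y * t, start.z + velocity.z * t};
    p.y = std::max(p.y, stopY);
    return p;
}
```

Because every iteration sees the object at a slightly different position, the accumulated image smears the object along its path.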

### Anti Aliasing

To perform anti-aliasing, I used the simple approach of jittering each ray within its pixel every time the rays for the scene are generated. This prevents the rays through a pixel from always having the same first bounce, which would otherwise make the edges of objects appear jagged (aliasing, shown in the zoomed versions below).

| <p align="center"> <b>Anti Aliasing Off </b></p> | <p align="center"> <b>Anti Aliasing On </b></p>|
| ---- | ---- |
| ![AA off](./img/AA_off.png) | ![AA on](./img/AA_on.png) |
| <p align="center">![AA off Zoomed](./img/AA_off_zoom.png)</p> | <p align="center">![AA on Zoomed](./img/AA_on_zoom.png)</p> |
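
The jittering step can be sketched as follows in plain C++. This is an illustration rather than the repo's kernel: `Sample2D`, `pixelSample` and the use of `std::mt19937` are assumptions (a CUDA version would use a device-side random number generator instead).

```cpp
#include <cassert>
#include <random>

// Hypothetical type holding a sample position in pixel coordinates.
struct Sample2D { float x, y; };

// Returns the point inside pixel (px, py) that the camera ray is shot through.
// Without anti-aliasing this is always the pixel centre; with anti-aliasing
// it is a random point inside the pixel, different on every call.
Sample2D pixelSample(int px, int py, bool antiAlias, std::mt19937& rng) {
    assert(px >= 0 && py >= 0);
    if (!antiAlias) {
        return {px + 0.5f, py + 0.5f};
    }
    std::uniform_real_distribution<float> u01(0.0f, 1.0f);
    float jx = u01(rng);
    float jy = u01(rng);
    return {px + jx, py + jy};
}
```

Averaging many such samples per pixel blends the colours on either side of an edge, which is what smooths the jagged outline.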



### Fresnel's Effect

Fresnel's effect is the idea that even a refractive material has a reflective quality to it (based on the incident ray angle). To approximate this effect, Schlick's approximation was used. The results are shown below (the diffuse object is shown for orientation context).

| <p align="center"> <b>Transparent object with Fresnel's effect Off </b></p> | <p align="center"> <b>Transparent object with Fresnel's effect On </b></p> |
| ------------------------------------------------------------ | ------------------------------------------------------------ |
| ![](./img/fresnels/refractive_no_fresnel.png) | ![](./img/fresnels/refractive_fresnels.png) |
| <p align="center"> <b>Diffused object </b></p> | <p align="center"> <b>Reflective object </b></p> |
| <img src="./img/fresnels/diffused.png" title="Diffuse reference"/> | <img src="./img/fresnels/reflective.png" title="reflective reference"/> |
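
For reference, Schlick's approximation computes the reflectance as R = R0 + (1 - R0)(1 - cos θ)^5 with R0 = ((n1 - n2) / (n1 + n2))^2. The sketch below is a plain C++ version of that formula; the function name and parameter names are assumptions for this example, and it does not handle total internal reflection.

```cpp
#include <cassert>
#include <cmath>

// Schlick's approximation of the Fresnel reflectance.
// cosTheta: cosine of the angle between the incident ray and the surface normal.
// n1, n2:   refractive indices of the medium the ray leaves and the one it enters.
float schlickReflectance(float cosTheta, float n1, float n2) {
    assert(cosTheta >= 0.0f && cosTheta <= 1.0f);
    float r0 = (n1 - n2) / (n1 + n2);
    r0 *= r0;
    float m = 1.0f - cosTheta;
    return r0 + (1.0f - r0) * m * m * m * m * m;
}
```

One common way to use the result is to reflect the ray with probability R and refract it otherwise, so that grazing angles look more mirror-like than head-on ones.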



### Meshes

Mesh loading is supported in this path tracer with the help of *tinyobjloader*. The implementation allows for a mesh to have a rotation, translation and scale added to it, and also allows the importing of mesh files that have or lack normals defined inside them. Currently, only triangular meshes are supported, but the code should be easy to extend to higher-order polygons.

#### Great Dodecicosacron

This mesh is one of the first meshes I was able to load and render (besides a debugging square). It has 360 vertices and 120 faces, and the material used was a reflective dark blue colour with a refractive index of 1.33. As mentioned previously, all scene files are present in the scenes/Scenes folder.

![Great Dodecicosacron](./img/dodecicosacron.png)

#### Elephant

Below is a mesh of an elephant with a diffused red surface (and diffused white ground). This mesh has 623 vertices and 1148 faces.

![Elephant](./img/elephant.png)

#### Stanford bunny

To push the system to its limits, I decided to load the Stanford bunny. The material is dark and refractive (it has a metallic quality only because it is hard to tell whether you are seeing a reflection or a refraction, which I like). This model has 34,817 vertices and 69,630 faces.

![Bunny!!!](./img/bunny.png)

#### Stanford dragon

Finally, I decided to load the Stanford dragon mesh. This mesh has a staggering (for me) 435,545 vertices and 871,306 faces. Due to time limitations, I was only able to run this for 2.5k iterations; an octree or KD-tree would have added a massive speedup (future work).

![](./img/dragon.png)

### Debugging Normal view

To debug the mesh normals, I implemented a simple normal view mode. In this mode, each surface is coloured by the absolute value of its normal. Thus, if the surface is a roof (or floor), its normal lies along the y-axis (0, 1, 0) and it is coloured green (RGB colouring). Below is a sample image of a tilted cube (made of triangles) with the faces coloured using the normals.

![](./img/normal_debugging_view.png)
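
The colour mapping used by this view can be written in a few lines. This is a plain C++ sketch with a hypothetical `Vec3` type and function name, not the repo's shading kernel.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical minimal vector type, used here for both normals and RGB colours.
struct Vec3 { float x, y, z; };

// Maps a unit normal to an RGB colour by taking the absolute value of each
// component, so (0, 1, 0) and (0, -1, 0) both come out pure green.
Vec3 normalToColour(Vec3 n) {
    // The mapping only yields valid colours for (approximately) unit normals.
    assert(std::fabs(n.x * n.x + n.y * n.y + n.z * n.z - 1.0f) < 1e-3f);
    return {std::fabs(n.x), std::fabs(n.y), std::fabs(n.z)};
}
```

Taking the absolute value means opposite-facing surfaces share a colour, which is enough to spot missing or badly interpolated normals.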

## Optimizations

### Stream Compaction

One of the first optimizations was to stop bouncing terminated rays. This reduces the number of threads we need to spawn on each bounce (after each bounce, some rays terminate by hitting a light source or escaping into the void). To do this, I used thrust::partition to split the array of rays by completion state (completed rays are moved to the end). The number of rays to bounce is then reduced and the main bounce loop runs again. Once the entire process has finished, we just need to reset the number of rays (so that every ray is used to create the final image). The performance impact is shown in the Results section below.
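
The partition step can be illustrated on the CPU with `std::partition`, which has the same contract as `thrust::partition` (elements satisfying the predicate are moved to the front). The `PathSegment` layout and the function name below are assumptions for this sketch, not code taken from the repo.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical path state: a path is finished when it has no bounces left.
struct PathSegment {
    int pixelIndex;
    int remainingBounces;
};

// Moves the live paths among the first numActive entries to the front and
// returns how many are still alive. Finished paths stay in the array (behind
// the live ones) so they can still contribute to the final image.
int compactPaths(std::vector<PathSegment>& paths, int numActive) {
    assert(numActive >= 0 && numActive <= static_cast<int>(paths.size()));
    auto firstDead = std::partition(
        paths.begin(), paths.begin() + numActive,
        [](const PathSegment& p) { return p.remainingBounces > 0; });
    return static_cast<int>(firstDead - paths.begin());
}
```

The returned count becomes the number of threads to launch for the next bounce, which is where the saving comes from.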

### Material Sorting

The idea of material sorting is to reduce warp divergence. To implement this, I decided to go with thrust::sort_by_key, where the key is the material type. The results are shown below, but the key point is that it performs worse than not sorting. This could be because warp divergence still occurs in my implementation (because of the probabilistic choice between reflection and refraction), and the few cases where sorting does reduce divergence don't justify the added overhead of sorting the rays (and intersections).
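
A key/value sort of this kind can be sketched on the CPU as below. This is a stand-in for `thrust::sort_by_key`, written in plain C++; the function name and the use of `int` payloads (in place of full path and intersection structs) are assumptions for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Sorts materialIds in ascending order and applies the same permutation to
// payload, so entries that share a material end up next to each other.
void sortByMaterial(std::vector<int>& materialIds, std::vector<int>& payload) {
    assert(materialIds.size() == payload.size());
    // Sort a list of indices by key, then gather both arrays through it.
    std::vector<std::size_t> order(materialIds.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::stable_sort(order.begin(), order.end(),
                     [&](std::size_t a, std::size_t b) {
                         return materialIds[a] < materialIds[b];
                     });
    std::vector<int> ids, vals;
    ids.reserve(order.size());
    vals.reserve(order.size());
    for (std::size_t i : order) {
        ids.push_back(materialIds[i]);
        vals.push_back(payload[i]);
    }
    materialIds = ids;
    payload = vals;
}
```

The extra gather pass over every ray and intersection on each bounce is the overhead that has to be paid back by reduced divergence.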

### Caching first bounce

First bounce caching avoids recomputing the first bounce at the start of every iteration, because the initial rays always start from the same place and travel in the same direction (this is no longer true after the first bounce). Two important things to note: this optimization cannot be used with the anti-aliasing technique implemented here, because jittering the initial ray changes its first bounce location. It also cannot be used with motion blur, because the object changes position after every rendered frame (making the previously cached bounce incorrect). Both cases are guarded by assertions in the code.
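
The caching logic, including the two incompatibility checks, might look roughly like the following plain C++ sketch. `Intersection`, `FirstBounceCache` and its members are hypothetical names for this example; the real code keeps its cache in GPU memory.

```cpp
#include <cassert>
#include <vector>

// Hypothetical first-hit record for one camera ray.
struct Intersection {
    float t;
    int materialId;
};

struct FirstBounceCache {
    std::vector<Intersection> cached;
    bool valid = false;

    // Returns the first-bounce intersections, running compute() only when
    // nothing is cached yet. Caching is incompatible with jittered camera
    // rays and with moving objects, so both are rejected up front.
    template <typename ComputeFn>
    const std::vector<Intersection>& get(ComputeFn compute, bool antiAlias, bool motionBlur) {
        assert(!antiAlias && !motionBlur);
        if (!valid) {
            cached = compute();
            valid = true;
        }
        return cached;
    }

    // Must be called whenever the camera changes, since the cached hits
    // belong to the old view.
    void invalidate() { valid = false; }
};
```

After the first iteration, every later iteration skips one full intersection pass, so the relative benefit shrinks as the maximum ray depth grows.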

### Bounding box ray culling

The final optimization is for collision detection with meshes. Each of the meshes loaded had a LOT of polygons, and checking each ray against each polygon would quickly become impossible to run in any reasonable time. As a first optimization, a bounding box around the mesh is created when the mesh is loaded. This bounding box is then used as the first check for collision. This allows a significant number of rays to be discarded (provided the object covers only a small part of the scene).
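
A common way to test a ray against an axis-aligned bounding box is the slab method, sketched below in plain C++. The type and function names are assumptions for this example, and the repo's actual check may differ in its details.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <utility>

// Hypothetical minimal types for this sketch.
struct Vec3 { float x, y, z; };
struct Ray { Vec3 origin, dir; };
struct Aabb { Vec3 lo, hi; };

// Slab test: intersect the ray with the three pairs of axis-aligned planes
// and check that the resulting parameter intervals overlap for t >= 0.
bool hitsBox(const Ray& r, const Aabb& b) {
    assert(b.lo.x <= b.hi.x && b.lo.y <= b.hi.y && b.lo.z <= b.hi.z);
    const float o[3] = {r.origin.x, r.origin.y, r.origin.z};
    const float d[3] = {r.dir.x, r.dir.y, r.dir.z};
    const float lo[3] = {b.lo.x, b.lo.y, b.lo.z};
    const float hi[3] = {b.hi.x, b.hi.y, b.hi.z};
    float tMin = 0.0f;
    float tMax = std::numeric_limits<float>::max();
    for (int i = 0; i < 3; ++i) {
        if (d[i] == 0.0f) {
            // Ray is parallel to this slab: it must start inside it.
            if (o[i] < lo[i] || o[i] > hi[i]) return false;
            continue;
        }
        float t0 = (lo[i] - o[i]) / d[i];
        float t1 = (hi[i] - o[i]) / d[i];
        if (t0 > t1) std::swap(t0, t1);
        tMin = std::max(tMin, t0);
        tMax = std::min(tMax, t1);
        if (tMin > tMax) return false;
    }
    return true;
}
```

Only rays for which this test succeeds need to be checked against the mesh's triangles; every other ray skips the whole mesh after a handful of comparisons.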

### Results

To test the optimizations, I ran 500 iterations on 2 different scenes (below). Both scenes contain a mesh object (including the cube) to be able to test the ray culling optimization. The second scene was created to allow a good number of the rays to be compacted.

| Scene 1 after 500 iterations | Scene 2 after 500 iterations |
| :----------------------------------------------------------: | :----------------------------------------------------------: |
| <img src="./img/profile_scene_1.png" alt="Scene 1 example" style="zoom: 33%;" /> | <img src="./img/profile_scene_2.png" alt="Scene 2 example" style="zoom: 33%;" /> |

The runtimes for each optimization (alone) are shown below. The best-case option is the case where the optimizations that help are turned on (ray culling and caching).

![Data plot](./img/profile_plot.png)

From the above plots, we can see that material sorting doesn't improve performance. Stream compaction comes close to breaking even but is still slightly more expensive. I believe this is because thrust's implementation of partition isn't optimal. Another reason is that the benefit is scene dependent: the more rays we can mark as terminated, the better stream compaction performs.

Bounding box culling did extremely well and would scale well with the increase in complexity of the mesh. Caching also works well but has the issue that it cannot be used with the anti-aliasing technique chosen.

## Other cool results

### Recomputing normals makes the model more jagged

Normals are assigned to each of the vertices of the polygon in a mesh (not to the face itself). Then to find the normal of a point on the face, we can interpolate the normal using the barycentric coordinates. This results in a smoother look to the edges.
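
The interpolation step can be sketched as follows in plain C++. The `Vec3` type and function names are assumptions, and the convention that the weights (1 - u - v, u, v) belong to the first, second and third vertex is also an assumption of this example.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical minimal vector type (a real renderer would likely use glm::vec3).
struct Vec3 { float x, y, z; };

Vec3 normalize(Vec3 v) {
    float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    assert(len > 0.0f);
    return {v.x / len, v.y / len, v.z / len};
}

// Blends the three vertex normals with barycentric weights (1 - u - v, u, v)
// and renormalizes, since a weighted sum of unit vectors is usually shorter
// than unit length.
Vec3 interpolateNormal(Vec3 n0, Vec3 n1, Vec3 n2, float u, float v) {
    float w = 1.0f - u - v;
    Vec3 n{w * n0.x + u * n1.x + v * n2.x,
           w * n0.y + u * n1.y + v * n2.y,
           w * n0.z + u * n1.z + v * n2.z};
    return normalize(n);
}
```

At a vertex the result equals that vertex's normal, and it varies continuously across the face, which is why shared vertex normals hide the edges between triangles.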

While loading a *.obj* model, not all of them come with the normals precomputed, so to solve this, I included a simple normal calculation mode. Though it works, it isn't ideal because while calculating the normals for the vertices, I only use the 3 edges/vertices of that face (and take the cross product) and set all the 3 vertex normals to this same value.

The issue is that the resultant model will be jagged at the internal edges. Below is a comparison of using the normals created using a program (CAD Exchanger) vs calculating them myself (It looks kinda cool actually). The solution would be to find all the faces attached to a vertex and then compute the normal using a mean of all the faces, but this has been left for the future.

This is only noticeable for low polycount objects. For the dragon shown above, I had to compute the normals using my approximation and I couldn't tell the difference.

| Existing normals | Approximation of normals |
| ----------------------------------------------- | ---------------------------------------------- |
| ![Smooth elephant](./img/elephant_2_smooth.png) | ![Rough elephant](./img/elephant_2_jagged.png) |
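
The averaging fix mentioned above is not implemented in this repo. As a rough idea of what it could look like, the plain C++ sketch below accumulates the unit face normal of every triangle onto its three vertices and normalizes the sums. All names (`Vec3`, `Tri`, `averagedVertexNormals`) are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical minimal types for this sketch.
struct Vec3 { float x, y, z; };
struct Tri { int a, b, c; };  // indices into the position array

Vec3 sub(Vec3 p, Vec3 q) { return {p.x - q.x, p.y - q.y, p.z - q.z}; }

Vec3 cross(Vec3 p, Vec3 q) {
    return {p.y * q.z - p.z * q.y, p.z * q.x - p.x * q.z, p.x * q.y - p.y * q.x};
}

// Returns v scaled to unit length, or v unchanged if it has zero length.
Vec3 normalizeOrZero(Vec3 v) {
    float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    if (len == 0.0f) return v;
    return {v.x / len, v.y / len, v.z / len};
}

// Per-vertex normal = normalized sum of the unit face normals of all
// triangles that share the vertex.
std::vector<Vec3> averagedVertexNormals(const std::vector<Vec3>& positions,
                                        const std::vector<Tri>& triangles) {
    const int n = static_cast<int>(positions.size());
    std::vector<Vec3> normals(positions.size(), Vec3{0.0f, 0.0f, 0.0f});
    for (const Tri& t : triangles) {
        assert(t.a >= 0 && t.a < n && t.b >= 0 && t.b < n && t.c >= 0 && t.c < n);
        Vec3 e1 = sub(positions[t.b], positions[t.a]);
        Vec3 e2 = sub(positions[t.c], positions[t.a]);
        Vec3 faceNormal = normalizeOrZero(cross(e1, e2));
        for (int idx : {t.a, t.b, t.c}) {
            normals[idx].x += faceNormal.x;
            normals[idx].y += faceNormal.y;
            normals[idx].z += faceNormal.z;
        }
    }
    for (Vec3& v : normals) v = normalizeOrZero(v);
    return normals;
}
```

This only smooths vertices that are shared by index between faces. A loader that duplicates vertices per face would first need to merge identical positions.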



### Effect of depth on a render

To show the effect of depth on the render, I decided to render a reflection-intensive scene. 2 of the walls (red and green) and 6 orbs are reflective, there are 2 light sources (one is the middle orb) and 2 transparent (green) orbs, and 1 orb (blue) plus 3 walls are diffusive. Because of this setup, the number of remaining rays doesn't reach 0 by a depth of 8, meaning deeper reflections could still improve the image.

| Depth | Render | Comment |
| ----- | ----------------------- | ------------------------------------------------------------ |
| 1 | ![](./img/depth/1.png) | For this render, we see no reflections at all. The no path tracing case. |
| 2 | ![](./img/depth/2.png) | We start to see some reflections (only the simplest ones). |
| 3 | ![](./img/depth/3.png) | We can see more reflections on the reflection of the orbs in the walls. |
| 4 | ![](./img/depth/4.png) | We now have better refractions. |
| 5 | ![](./img/depth/5.png) | The reflections of the orbs have some transparency. |
| 6 | ![6](./img/depth/6.png) | The reflection of the transparent orbs isn't transparent. |
| 7     | ![7](./img/depth/7.png) | The difference is subtle, but it shows up in the 3rd order reflections. |
| 8 | ![](./img/depth/8.png) | We can keep going, but here is a good stopping point. |



### Effect of iterations on a render

To see the effect of iterations on render quality, I went with the same image I used above (with a depth of 8) to test the effect of iteration on render for a semi-complex scene. From visual inspection, 2000 seems to be the tipping point, and further iterations have diminishing value.

| Iterations | Render |
| ---------- | ------------------------ |
| 50 | ![](./img/iter/50.png) |
| 250 | ![](./img/iter/250.png) |
| 500 | ![](./img/iter/500.png) |
| 1000 | ![](./img/iter/1000.png) |
| 2000 | ![](./img/iter/2000.png) |
| 5000 | ![](./img/iter/5000.png) |



## Observations

### Material sorting is slow

I mentioned this before, but sorting is slow! Maybe using my radix sort implementation (which seemed to outperform thrust's implementation by a lot) could overcome this.

### Creating meshes with normals helps

Finding meshes with normals or creating them using CAD Exchanger saved time during the initial phases, by reducing the number of things to debug (not really, but kind of).

## Bloopers

The first blooper is from the very early stages, when the floor and ceiling were reflective and stole the colour from the right and left walls and the light.

![Reflections error](./img/bloopers/reflective_materials_screwed.png)



For the next stumble: for a long time, I couldn't figure out why my roof was black. Then I realized I was adding colours twice (which made the walls very vivid) and also had a bug in the loop termination condition.

![Black roof](./img/bloopers/black_roof.png)

## Dependencies & CMake changes

- CUDA 10+
- [tinyobjloader](https://github.com/syoyo/tinyobjloader) (Included in repo)
- Added *common.h* and *common.cu* to CMakeLists.txt

## Useful links

[3D obj files with normals](https://casual-effects.com/data/)

[Fresnel's law](https://blog.demofox.org/2017/01/09/raytracing-reflection-refraction-fresnel-total-internal-reflection-and-beers-law/)

[Easier 3D obj files](https://graphics.cmlab.csie.ntu.edu.tw/~robin/courses/cg04/model/index.html)
Binary file added data/metrics.xlsx
Binary file not shown.
2 changes: 2 additions & 0 deletions external/include/tiny_obj_loader.cc
@@ -0,0 +1,2 @@
#define TINYOBJLOADER_IMPLEMENTATION
#include "tiny_obj_loader.h"