diff --git a/README.md b/README.md
index 110697c..e27bc2d 100644
--- a/README.md
+++ b/README.md
@@ -1,13 +1,156 @@
 CUDA Path Tracer
 ================
 
-**University of Pennsylvania, CIS 565: GPU Programming and Architecture, Project 3**
+* **University of Pennsylvania, CIS 565: GPU Programming and Architecture, Project 2**
 
-* (TODO) YOUR NAME HERE
-* Tested on: (TODO) Windows 22, i7-2222 @ 2.22GHz 22GB, GTX 222 222MB (Moore 2222 Lab)
+  - #### Author information
 
-### (TODO: Your README)
+    - Tianming Xu (Mark)
+      - www.linkedin.com/in/tianming-xu-8bb81816a (LinkedIn)
+    - Tested on: Windows 10, i7-8700 @ 3.20GHz 16GB, RTX 2080 8192MB (my personal desktop)
 
-*DO NOT* leave the README to the last minute! It is a crucial part of the
-project, and we will not be able to grade you without a good README.
+### Output Screenshots
 
+###### Mirror World
+
+![](img/mirror_world.png)
+
+### General overview
+
+Path tracing is a rendering algorithm that simulates how light bounces around in the real world. Every path-traced image starts from a scene that contains the geometric information of all objects in the picture. We render each pixel based on the amount of light hitting the object seen through that pixel and on that object's material color and attributes. Because the image follows the physical rules of the real world, it looks very "realistic".
+
+![](img/bsdf.png)
+
+
+
+(https://en.wikipedia.org/wiki/Bidirectional_scattering_distribution_function)
+
+The three fundamental ways light travels in the scene
+
+However, there is a fundamental difference between path tracing and real-world light: in path tracing, rays start from the eye (usually called the camera) and try to reach a light source, whereas in the real world light travels from the light source to the eye. This is why path tracing is also called "backward tracing".
+
+![](img/real_world_light.PNG)
+
+This is how light bounces in the real world
+
+![](img/backward_tracing.PNG)
+
+This is how light works in the path tracer
+
+In this project, we not only implement a path tracer but also use the GPU's CUDA cores to accelerate rendering significantly.
+
+### Features 
+
+#### Overview
+
+- Shading kernel with BSDF features: diffuse, reflective, and refractive with Fresnel's law
+- Kernel optimization: terminating paths with stream compaction
+- Kernel optimization: sorting rays by material to improve global memory locality
+- Kernel optimization: caching the first-bounce paths and their intersections for reuse in later iterations
+- Stochastic sampled antialiasing
+- Motion blur
+
+
+
+#### Details
+
+###### basic scene
+
+![](img/basic_scene.png)
+
+###### three materials
+
+![](img/three_materials.png)
+
+Three materials:
+
+- Top middle: reflective
+
+- Bottom right: refractive with IOR 1.55
+
+- Bottom left: diffuse yellow
+
+###### Stochastic sampled antialiasing
+
+With Antialiasing
+
+![](img/with_antialiasing.png)
+
+Without antialiasing
+
+![](img/without_antialiasing.png)
+
+The idea of antialiasing is fairly simple: in each iteration, we generate each initial ray with a small random offset within a limited range. As more iterations accumulate, each pixel's color blends with that of its neighbors, producing a natural transition, especially along object boundaries.
+
+The effect is hard to see in the full-size screenshots, so the crops below are enlarged to show it more clearly.
+
+![](img/with_aa_enlarge.PNG)
+
+The boundary of the cube is close to a straight line when AA is on
+
+![](img/without_aa_enlarge.PNG)
+
+Notice that the boundary of the cube is visibly jagged when AA is off
+
+###### Motion Blur
+
+![](img/blur.png)
+
+![](img/motion_blur.png)
+
+Motion blur is a very cool effect. The basic idea is to give the object a velocity and, in each iteration, slowly but continuously update its position. Each new position is shaded with the object's color and blended with the colors accumulated in previous iterations, which produces the blur.
+
+I made a funny mistake at first: I updated the object's position too fast, so the object flew out of the scene after several iterations.
+
+### Performance Analysis
+
+We use our basic scene with anti-aliasing, stream compaction, first-ray caching, and material sorting all disabled as the baseline against which we compare the three optimizations in our CUDA code.
+
+
+
+###### Stream compaction
+
+![](img/ray_terminated_compact.png)
+
+The idea of stream compaction is that once a ray terminates for some reason (it hits a light source or misses everything in the scene), we no longer need it in the following bounces. Hence, we remove such rays from the ray pool to reduce the number of threads required in the next bounce, which speeds up rendering.
+
+![](img/time_compact.png)
+
+In this figure, we can clearly see the improvement from applying stream compaction in our CUDA code. Threads that would otherwise sit idle on terminated rays are instead used to compute the remaining rays.
+
+
+
+###### Sorting by materials
+
+Sorting by material would be unnecessary in a typical CPU path tracer. On the GPU, however, it can be an optimization, because performance is often limited by memory access rather than by computation. Sorting rays by material lets neighboring threads fetch the same material data from global memory, which improves locality and should, in principle, improve performance.
+
+![](img/time_sorting.png)
+
+However, we don't see the improvement we expected. Instead, the time per iteration rises significantly compared to the program without this technique. My guess is that the number of materials in my (very basic) benchmark scene is too small for the technique to pay off. Sorting is time-consuming, so when there are only a few materials the sorting time can outweigh the time saved through better locality. The technique might perform better on a more complex scene.
+
+
+
+###### Cache first ray and intersection
+
+The initial rays generated from the camera (eye) are the same in every iteration, and rendering a picture takes many iterations, so it is natural to cache this fixed data and reuse it. The benefit only appears after the first iteration, since the first iteration still has to generate the rays. Hence, I only compare performance after the first iteration.
+
+![](img/time_cache.png)
+
+The improvement in performance matches what we expected.
+
+
+
+### Acknowledgements
+
+- Motion blur
+  - http://www.cemyuksel.com/research/papers/time_interval_ray_tracing_for_motion_blur-high.pdf
+  - Yue Li
+- Jie Meng and Hannah for helping me with the sort-by-material optimization
+- Jiangping Xu for helping me with the stream compaction part and the refractive material with Fresnel's law
+  - also the Schlick's approximation formula from https://en.wikipedia.org/wiki/Schlick's_approximation
+
+
+
+### Comments and Future work
+
+I ran into some issues when running the path tracer in Release mode, so I collected the data in Debug mode. The program is therefore significantly slower than it would be in Release mode, but the relative comparison between configurations should still be valid. I will work on fixing the issues and update the charts later.
\ No newline at end of file
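The stream compaction step described in the README above can be sketched on the CPU with `std::stable_partition`. This is only an illustration under assumptions: `PathSketch` and `compactActivePaths` are hypothetical names, not the project's `PathSegment` or its kernels, and although the diff includes `<thrust/partition.h>`, the actual call is not shown in this section. Partitioning, rather than erasing, keeps terminated paths in the tail of the array so their colors can still be gathered into the image.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, trimmed-down stand-in for the project's PathSegment.
struct PathSketch {
    int pixelIndex;
    int remainingBounces;
};

// Moves every still-active path among the first `numActive` entries to the
// front and returns the new active count. Terminated paths are kept behind
// the returned index instead of being erased.
std::size_t compactActivePaths(std::vector<PathSketch>& paths, std::size_t numActive) {
    assert(numActive <= paths.size());
    auto last = paths.begin() + static_cast<std::ptrdiff_t>(numActive);
    auto firstTerminated = std::stable_partition(
        paths.begin(), last,
        [](const PathSketch& p) { return p.remainingBounces > 0; });
    return static_cast<std::size_t>(firstTerminated - paths.begin());
}
```

Each bounce would then process only the first `numActive` entries, which is where the reduction in launched threads comes from.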
diff --git a/img/50_50chance_by_diffuse.PNG b/img/50_50chance_by_diffuse.PNG
new file mode 100644
index 0000000..6311a08
Binary files /dev/null and b/img/50_50chance_by_diffuse.PNG differ
diff --git a/img/after_mod.PNG b/img/after_mod.PNG
new file mode 100644
index 0000000..d566dde
Binary files /dev/null and b/img/after_mod.PNG differ
diff --git a/img/backward_tracing.PNG b/img/backward_tracing.PNG
new file mode 100644
index 0000000..6925535
Binary files /dev/null and b/img/backward_tracing.PNG differ
diff --git a/img/basic_scene.png b/img/basic_scene.png
new file mode 100644
index 0000000..5734427
Binary files /dev/null and b/img/basic_scene.png differ
diff --git a/img/blur.png b/img/blur.png
new file mode 100644
index 0000000..250281b
Binary files /dev/null and b/img/blur.png differ
diff --git a/img/bsdf.png b/img/bsdf.png
new file mode 100644
index 0000000..8f2f17f
Binary files /dev/null and b/img/bsdf.png differ
diff --git a/img/low_depth.PNG b/img/low_depth.PNG
new file mode 100644
index 0000000..a5e9021
Binary files /dev/null and b/img/low_depth.PNG differ
diff --git a/img/mirror_world.png b/img/mirror_world.png
new file mode 100644
index 0000000..f69b5da
Binary files /dev/null and b/img/mirror_world.png differ
diff --git a/img/motion_blur.png b/img/motion_blur.png
new file mode 100644
index 0000000..d38000f
Binary files /dev/null and b/img/motion_blur.png differ
diff --git a/img/motion_blur_debug.png b/img/motion_blur_debug.png
new file mode 100644
index 0000000..022345f
Binary files /dev/null and b/img/motion_blur_debug.png differ
diff --git a/img/ray_terminated_compact.png b/img/ray_terminated_compact.png
new file mode 100644
index 0000000..231af63
Binary files /dev/null and b/img/ray_terminated_compact.png differ
diff --git a/img/real_world_light.PNG b/img/real_world_light.PNG
new file mode 100644
index 0000000..6e97a8b
Binary files /dev/null and b/img/real_world_light.PNG differ
diff --git a/img/three_materials.png b/img/three_materials.png
new file mode 100644
index 0000000..38a1485
Binary files /dev/null and b/img/three_materials.png differ
diff --git a/img/time_cache.png b/img/time_cache.png
new file mode 100644
index 0000000..04ac165
Binary files /dev/null and b/img/time_cache.png differ
diff --git a/img/time_compact.png b/img/time_compact.png
new file mode 100644
index 0000000..cd117ea
Binary files /dev/null and b/img/time_compact.png differ
diff --git a/img/time_sorting.png b/img/time_sorting.png
new file mode 100644
index 0000000..94fc80d
Binary files /dev/null and b/img/time_sorting.png differ
diff --git a/img/too_light_when_add_depth.PNG b/img/too_light_when_add_depth.PNG
new file mode 100644
index 0000000..08c9c4c
Binary files /dev/null and b/img/too_light_when_add_depth.PNG differ
diff --git a/img/with_aa_enlarge.PNG b/img/with_aa_enlarge.PNG
new file mode 100644
index 0000000..8097f67
Binary files /dev/null and b/img/with_aa_enlarge.PNG differ
diff --git a/img/with_antialiasing.png b/img/with_antialiasing.png
new file mode 100644
index 0000000..b8a503b
Binary files /dev/null and b/img/with_antialiasing.png differ
diff --git a/img/without_aa_enlarge.PNG b/img/without_aa_enlarge.PNG
new file mode 100644
index 0000000..b6e8cd4
Binary files /dev/null and b/img/without_aa_enlarge.PNG differ
diff --git a/img/without_antialiasing.png b/img/without_antialiasing.png
new file mode 100644
index 0000000..645de06
Binary files /dev/null and b/img/without_antialiasing.png differ
diff --git a/scenes/cornell.txt b/scenes/cornell.txt
index 83ff820..73508cd 100644
--- a/scenes/cornell.txt
+++ b/scenes/cornell.txt
@@ -48,6 +48,17 @@ REFR        0
 REFRIOR     0
 EMITTANCE   0
 
+// Diffuse yellow
+MATERIAL 5
+RGB         .85 .85 .35
+SPECEX      0
+SPECRGB     0 0 0
+REFL        0
+REFR        0
+REFRIOR     0
+EMITTANCE   0
+
+
 // Camera
 CAMERA
 RES         800 800
@@ -112,6 +123,14 @@ SCALE       .01 10 10
 OBJECT 6
 sphere
 material 4
-TRANS       -1 4 -1
+TRANS       -1 5 -1
 ROTAT       0 0 0
 SCALE       3 3 3
+
+// Cube
+OBJECT 7
+cube
+material 5
+TRANS       2 2 1
+ROTAT       0 45 0
+SCALE       3 4 3
\ No newline at end of file
diff --git a/scenes/cornell_motion.txt b/scenes/cornell_motion.txt
new file mode 100644
index 0000000..bd25bd9
--- /dev/null
+++ b/scenes/cornell_motion.txt
@@ -0,0 +1,135 @@
+// Emissive material (light)
+MATERIAL 0
+RGB         1 1 1
+SPECEX      0
+SPECRGB     0 0 0
+REFL        0
+REFR        0
+REFRIOR     0
+EMITTANCE   5
+
+// Diffuse white
+MATERIAL 1
+RGB         .98 .98 .98
+SPECEX      0
+SPECRGB     0 0 0
+REFL        0
+REFR        0
+REFRIOR     0
+EMITTANCE   0
+
+// Diffuse red
+MATERIAL 2
+RGB         .85 .35 .35
+SPECEX      0
+SPECRGB     0 0 0
+REFL        0
+REFR        0
+REFRIOR     0
+EMITTANCE   0
+
+// Diffuse green
+MATERIAL 3
+RGB         .35 .85 .35
+SPECEX      0
+SPECRGB     0 0 0
+REFL        0
+REFR        0
+REFRIOR     0
+EMITTANCE   0
+
+// Specular white
+MATERIAL 4
+RGB         .98 .98 .98
+SPECEX      0
+SPECRGB     .98 .98 .98
+REFL        1
+REFR        0
+REFRIOR     0
+EMITTANCE   0
+
+// Diffuse yellow
+MATERIAL 5
+RGB         .85 .85 .35
+SPECEX      0
+SPECRGB     0 0 0
+REFL        0
+REFR        0
+REFRIOR     0
+EMITTANCE   0
+
+
+// Camera
+CAMERA
+RES         800 800
+FOVY        45
+ITERATIONS  5000
+DEPTH       8
+FILE        cornell
+EYE         0.0 5 10.5
+LOOKAT      0 5 0
+UP          0 1 0
+
+
+// Ceiling light
+OBJECT 0
+cube
+material 0
+TRANS       0 10 0
+ROTAT       0 0 0
+SCALE       3 .3 3
+VELOCITY    0 0 0
+
+// Floor
+OBJECT 1
+cube
+material 1
+TRANS       0 0 0
+ROTAT       0 0 0
+SCALE       10 .01 10
+VELOCITY    0 0 0
+
+// Ceiling
+OBJECT 2
+cube
+material 1
+TRANS       0 10 0
+ROTAT       0 0 90
+SCALE       .01 10 10
+VELOCITY    0 0 0
+
+// Back wall
+OBJECT 3
+cube
+material 1
+TRANS       0 5 -5
+ROTAT       0 90 0
+SCALE       .01 10 10
+VELOCITY    0 0 0
+
+// Left wall
+OBJECT 4
+cube
+material 2
+TRANS       -5 5 0
+ROTAT       0 0 0
+SCALE       .01 10 10
+VELOCITY    0 0 0
+
+// Right wall
+OBJECT 5
+cube
+material 3
+TRANS       5 5 0
+ROTAT       0 0 0
+SCALE       .01 10 10
+VELOCITY    0 0 0
+
+// Sphere
+OBJECT 6
+sphere
+material 5
+TRANS       -1 4 -1
+ROTAT       0 0 0
+SCALE       3 3 3
+VELOCITY    1 0 0
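The `VELOCITY` field in the scene file above drives the motion blur described in the README: the object is moved a little in every iteration and the accumulated image blends all of its positions. A minimal sketch, assuming linear motion with a fixed step per iteration; `Vec3`, `positionAtIteration`, and `stepScale` are hypothetical names, and the project's actual update rule is not shown in this section.

```cpp
#include <cassert>

// Hypothetical minimal vector type, used instead of glm::vec3 to keep the
// sketch self-contained.
struct Vec3 {
    float x, y, z;
};

// Position of a moving object at a given iteration, assuming it travels
// along `velocity` by `stepScale` scene units of time per iteration.
Vec3 positionAtIteration(Vec3 start, Vec3 velocity, int iteration, float stepScale) {
    assert(iteration >= 0);
    const float t = stepScale * static_cast<float>(iteration);
    return {start.x + velocity.x * t,
            start.y + velocity.y * t,
            start.z + velocity.z * t};
}
```

With the scene's 5000 iterations and a velocity of `1 0 0`, a `stepScale` of 1/4096 would move the sphere a little over one unit in total. A much larger value would carry it out of the 10-unit box, which is consistent with the mistake mentioned in the README.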
diff --git a/scenes/cornell_refract.txt b/scenes/cornell_refract.txt
new file mode 100644
index 0000000..cf03893
--- /dev/null
+++ b/scenes/cornell_refract.txt
@@ -0,0 +1,140 @@
+// Emissive material (light)
+MATERIAL 0
+RGB         1 1 1
+SPECEX      0
+SPECRGB     0 0 0
+REFL        0
+REFR        0
+REFRIOR     0
+EMITTANCE   5
+
+// Diffuse white
+MATERIAL 1
+RGB         .98 .98 .98
+SPECEX      0
+SPECRGB     0 0 0
+REFL        0
+REFR        0
+REFRIOR     0
+EMITTANCE   0
+
+// Diffuse red
+MATERIAL 2
+RGB         .85 .35 .35
+SPECEX      0
+SPECRGB     0 0 0
+REFL        0
+REFR        0
+REFRIOR     0
+EMITTANCE   0
+
+// Diffuse green
+MATERIAL 3
+RGB         .35 .85 .35
+SPECEX      0
+SPECRGB     0 0 0
+REFL        0
+REFR        0
+REFRIOR     0
+EMITTANCE   0
+
+// Specular white
+MATERIAL 4
+RGB         .98 .98 .98
+SPECEX      0
+SPECRGB     .98 .98 .98
+REFL        1
+REFR        0
+REFRIOR     0
+EMITTANCE   0
+
+// Diffuse yellow
+MATERIAL 5
+RGB         .85 .85 .35
+SPECEX      0
+SPECRGB     0 0 0
+REFL        0
+REFR        0
+REFRIOR     0
+EMITTANCE   0
+
+
+// transmissive white
+MATERIAL 6
+RGB         .98 .98 .98
+SPECEX      0
+SPECRGB     .98 .98 .98
+REFL        0
+REFR        1
+REFRIOR     1.55
+EMITTANCE   0
+
+
+
+// Camera
+CAMERA
+RES         800 800
+FOVY        45
+ITERATIONS  5000
+DEPTH       8
+FILE        cornell
+EYE         0.0 5 10.5
+LOOKAT      0 5 0
+UP          0 1 0
+
+
+// Ceiling light
+OBJECT 0
+cube
+material 0
+TRANS       0 10 0
+ROTAT       0 0 0
+SCALE       3 .3 3
+
+// Floor
+OBJECT 1
+cube
+material 1
+TRANS       0 0 0
+ROTAT       0 0 0
+SCALE       10 .01 10
+
+// Ceiling
+OBJECT 2
+cube
+material 1
+TRANS       0 10 0
+ROTAT       0 0 90
+SCALE       .01 10 10
+
+// Back wall
+OBJECT 3
+cube
+material 1
+TRANS       0 5 -5
+ROTAT       0 90 0
+SCALE       .01 10 10
+
+// Left wall
+OBJECT 4
+cube
+material 2
+TRANS       -5 5 0
+ROTAT       0 0 0
+SCALE       .01 10 10
+
+// Right wall
+OBJECT 5
+cube
+material 3
+TRANS       5 5 0
+ROTAT       0 0 0
+SCALE       .01 10 10
+
+// Sphere
+OBJECT 6
+sphere
+material 6
+TRANS       -1 4 -1
+ROTAT       0 0 0
+SCALE       3 3 3
\ No newline at end of file
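The transmissive material in the scene above (`REFRIOR 1.55`) is shaded by bending the ray according to Snell's law. The sketch below follows the same formula as the GLSL `refract` function and, as an addition of this sketch, reports total internal reflection explicitly instead of returning a zero vector. `Vec3`, `dot3`, and `refractDirection` are hypothetical names; `eta` is the ratio of the incident index of refraction to the transmitted one, and both input vectors are assumed to be normalized.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical minimal vector type, used instead of glm::vec3.
struct Vec3 {
    float x, y, z;
};

float dot3(Vec3 a, Vec3 b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Computes the refracted direction for a unit `incident` direction and a unit
// `normal` that faces against it. Returns false on total internal reflection,
// in which case `out` is left unchanged.
bool refractDirection(Vec3 incident, Vec3 normal, float eta, Vec3& out) {
    assert(eta > 0.0f);
    const float cosI = -dot3(incident, normal);
    const float k = 1.0f - eta * eta * (1.0f - cosI * cosI);
    if (k < 0.0f) {
        return false;  // no transmitted ray exists
    }
    const float a = eta * cosI - std::sqrt(k);
    out = {eta * incident.x + a * normal.x,
           eta * incident.y + a * normal.y,
           eta * incident.z + a * normal.z};
    return true;
}
```

For a ray leaving a medium with index 1.55 into air, the critical angle is about 40 degrees from the normal, so steeper rays inside the sphere should be reflected rather than refracted.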
diff --git a/scenes/cornell_three_materials.txt b/scenes/cornell_three_materials.txt
new file mode 100644
index 0000000..c011478
--- /dev/null
+++ b/scenes/cornell_three_materials.txt
@@ -0,0 +1,154 @@
+// Emissive material (light)
+MATERIAL 0
+RGB         1 1 1
+SPECEX      0
+SPECRGB     0 0 0
+REFL        0
+REFR        0
+REFRIOR     0
+EMITTANCE   5
+
+// Diffuse white
+MATERIAL 1
+RGB         .98 .98 .98
+SPECEX      0
+SPECRGB     0 0 0
+REFL        0
+REFR        0
+REFRIOR     0
+EMITTANCE   0
+
+// Diffuse red
+MATERIAL 2
+RGB         .85 .35 .35
+SPECEX      0
+SPECRGB     0 0 0
+REFL        0
+REFR        0
+REFRIOR     0
+EMITTANCE   0
+
+// Diffuse green
+MATERIAL 3
+RGB         .35 .85 .35
+SPECEX      0
+SPECRGB     0 0 0
+REFL        0
+REFR        0
+REFRIOR     0
+EMITTANCE   0
+
+// Specular white
+MATERIAL 4
+RGB         .98 .98 .98
+SPECEX      0
+SPECRGB     .98 .98 .98
+REFL        1
+REFR        0
+REFRIOR     0
+EMITTANCE   0
+
+// Diffuse yellow
+MATERIAL 5
+RGB         .85 .85 .35
+SPECEX      0
+SPECRGB     0 0 0
+REFL        0
+REFR        0
+REFRIOR     0
+EMITTANCE   0
+
+// transmissive white
+MATERIAL 6
+RGB         .98 .98 .98
+SPECEX      0
+SPECRGB     .98 .98 .98
+REFL        0
+REFR        1
+REFRIOR     1.55
+EMITTANCE   0
+
+
+// Camera
+CAMERA
+RES         800 800
+FOVY        45
+ITERATIONS  5000
+DEPTH       8
+FILE        cornell
+EYE         0.0 3 8
+LOOKAT      0 5 0
+UP          0 1 0
+
+
+// Ceiling light
+OBJECT 0
+cube
+material 0
+TRANS       0 10 0
+ROTAT       0 0 0
+SCALE       3 .3 3
+
+// Floor
+OBJECT 1
+cube
+material 1
+TRANS       0 0 0
+ROTAT       0 0 0
+SCALE       10 .01 10
+
+// Ceiling
+OBJECT 2
+cube
+material 1
+TRANS       0 10 0
+ROTAT       0 0 90
+SCALE       .01 10 10
+
+// Back wall
+OBJECT 3
+cube
+material 1
+TRANS       0 5 -5
+ROTAT       0 90 0
+SCALE       .01 10 10
+
+// Left wall
+OBJECT 4
+cube
+material 2
+TRANS       -5 5 0
+ROTAT       0 0 0
+SCALE       .01 10 10
+
+// Right wall
+OBJECT 5
+cube
+material 3
+TRANS       5 5 0
+ROTAT       0 0 0
+SCALE       .01 10 10
+
+// reflective Sphere
+OBJECT 6
+sphere
+material 4
+TRANS       -0.5 6 1
+ROTAT       0 0 0
+SCALE       3 3 3
+
+// refractive sphere
+OBJECT 7
+sphere
+material 6
+TRANS       1 3 1
+ROTAT       0 0 0
+SCALE       3 3 3
+
+// diffuse sphere
+OBJECT 8
+sphere
+material 5
+TRANS       -2 3 1
+ROTAT       0 0 0
+SCALE       3 3 3
\ No newline at end of file
diff --git a/scenes/mirror_world.txt b/scenes/mirror_world.txt
new file mode 100644
index 0000000..c73407a
--- /dev/null
+++ b/scenes/mirror_world.txt
@@ -0,0 +1,144 @@
+// Emissive material (light)
+MATERIAL 0
+RGB         1 1 1
+SPECEX      0
+SPECRGB     0 0 0
+REFL        0
+REFR        0
+REFRIOR     0
+EMITTANCE   5
+
+// Diffuse white
+MATERIAL 1
+RGB         .98 .98 .98
+SPECEX      0
+SPECRGB     0 0 0
+REFL        0
+REFR        0
+REFRIOR     0
+EMITTANCE   0
+
+// Diffuse red
+MATERIAL 2
+RGB         .85 .35 .35
+SPECEX      0
+SPECRGB     0 0 0
+REFL        0
+REFR        0
+REFRIOR     0
+EMITTANCE   0
+
+// Diffuse green
+MATERIAL 3
+RGB         .35 .85 .35
+SPECEX      0
+SPECRGB     0 0 0
+REFL        0
+REFR        0
+REFRIOR     0
+EMITTANCE   0
+
+// Specular white
+MATERIAL 4
+RGB         .98 .98 .98
+SPECEX      0
+SPECRGB     .98 .98 .98
+REFL        1
+REFR        0
+REFRIOR     0
+EMITTANCE   0
+
+// Diffuse yellow
+MATERIAL 5
+RGB         .85 .85 .35
+SPECEX      0
+SPECRGB     0 0 0
+REFL        0
+REFR        0
+REFRIOR     0
+EMITTANCE   0
+
+
+// Camera
+CAMERA
+RES         800 800
+FOVY        45
+ITERATIONS  5000
+DEPTH       8
+FILE        cornell
+EYE         1 2 4
+LOOKAT      0 5 0
+UP          0 1 0
+
+
+// Ceiling light
+OBJECT 0
+cube
+material 0
+TRANS       0 10 0
+ROTAT       0 0 0
+SCALE       2 .3 2
+
+// Floor
+OBJECT 1
+cube
+material 1
+TRANS       0 0 0
+ROTAT       0 0 0
+SCALE       10 .01 10
+
+// Ceiling
+OBJECT 2
+cube
+material 1
+TRANS       0 10 0
+ROTAT       0 0 90
+SCALE       .01 10 10
+
+// Back wall
+OBJECT 3
+cube
+material 4
+TRANS       0 5 -5
+ROTAT       0 90 0
+SCALE       .01 10 10
+
+// Left wall
+OBJECT 4
+cube
+material 2
+TRANS       -5 5 0
+ROTAT       0 0 0
+SCALE       .01 10 10
+
+// Right wall
+OBJECT 5
+cube
+material 3
+TRANS       5 5 0
+ROTAT       0 0 0
+SCALE       .01 10 10
+
+// Sphere
+OBJECT 6
+sphere
+material 4
+TRANS       -1 6 -1
+ROTAT       0 0 0
+SCALE       3 3 3
+
+// Cube
+OBJECT 7
+cube
+material 5
+TRANS       -1 2.5 -1
+ROTAT       0 45 0
+SCALE       2 5 2
+
+// Front wall
+OBJECT 8
+cube
+material 4
+TRANS       0 5 5
+ROTAT       0 -90 0
+SCALE       .01 10 10
\ No newline at end of file
diff --git a/src/interactions.h b/src/interactions.h
index 5ce3628..267a0ab 100644
--- a/src/interactions.h
+++ b/src/interactions.h
@@ -2,6 +2,10 @@
 
 #include "intersections.h"
 
+#define checkCUDAError(msg) checkCUDAErrorFn(msg, FILENAME, __LINE__)
+
+#define EPSILON_OFFSET 0.01f
+
 // CHECKITOUT
 /**
  * Computes a cosine-weighted random direction in a hemisphere.
@@ -50,7 +54,7 @@ glm::vec3 calculateRandomDirectionInHemisphere(
  * 
  * The visual effect you want is to straight-up add the diffuse and specular
  * components. You can do this in a few ways. This logic also applies to
- * combining other types of materias (such as refractive).
+ * combining other types of materials (such as refractive). Question: does this apply only to reflective materials? A pure diffuse material should not have a reflected component.
  * 
  * - Always take an even (50/50) split between a each effect (a diffuse bounce
  *   and a specular bounce), but divide the resulting color of either branch
@@ -66,14 +70,139 @@ glm::vec3 calculateRandomDirectionInHemisphere(
  *
  * You may need to change the parameter list for your purposes!
  */
+
+//scatterRay updates the path color directly, so the shading kernel only needs to call scatterRay
 __host__ __device__
 void scatterRay(
 		PathSegment & pathSegment,
         glm::vec3 intersect,
         glm::vec3 normal,
         const Material &m,
-        thrust::default_random_engine &rng) {
+        thrust::default_random_engine &rng,
+        bool outside = true) {
     // TODO: implement this.
     // A basic implementation of pure-diffuse shading will just call the
-    // calculateRandomDirectionInHemisphere defined above.
-}
+
+    //first determine the material
+    
+    //TODO: give each material type a probability of being sampled and use a random number to choose between them
+    //diffuse case
+    //a pure-diffuse surface multiplies in its material color; with cosine-weighted sampling the 1/pi and cosine terms cancel against the pdf (other diffuse models may differ)
+    if (m.hasReflective == 0 && m.hasRefractive == 0)
+    {
+
+        //if diffuse -- update color but not respawn ray
+        pathSegment.color *= m.color;
+        if (pathSegment.remainingBounces <= 0)
+        {
+            return;
+        }
+        // calculateRandomDirectionInHemisphere defined above.
+        glm::vec3 diffuse_dir = calculateRandomDirectionInHemisphere(normal, rng);
+        glm::vec3 wo = -pathSegment.ray.direction;
+        glm::vec3 wi = diffuse_dir;
+        //the color was already updated above -- it does not depend on ray information for now
+        //glm::vec3 temp_col = m.color * (float)InvPi;
+        //apply lambert's law
+        //float lightTerm = glm::abs(glm::dot(wi, normal));  //don't need, because we don't include pdf
+        //update ray -- determine whether we should keep bouncing
+        pathSegment.ray.origin = intersect + EPSILON_OFFSET * normal;
+        pathSegment.ray.direction = diffuse_dir;
+
+
+    }
+    //if pure-specular, itself doesn't have color, all is about its reflected item's color
+    else if (m.hasReflective > 0)
+    {
+        if (pathSegment.remainingBounces <= 0)
+        {
+            return;
+        }
+        //pure reflective when hasReflective is 1, otherwise, go 50/50
+        if (m.hasReflective == 1)
+        {
+            glm::vec3 specular_dir = glm::normalize(glm::reflect(pathSegment.ray.direction, normal));
+            //no update on color
+            //update ray
+            pathSegment.ray.origin = intersect + EPSILON_OFFSET * normal;
+            pathSegment.ray.direction = specular_dir;
+            //pathSegment.color *= m.specular.color;
+            //pathSegment.color *= glm::abs(glm::dot(pathSegment.ray.direction, normal)) * m.color;
+        }
+        else
+        {
+            //draw a uniform random number in [0, 1)
+            thrust::uniform_real_distribution<float> u01(0, 1);
+            float condition = u01(rng);
+            //start with a simple 50/50 split between diffuse and specular
+            if (condition < 0.5)
+            {
+                glm::vec3 diffuse_dir = calculateRandomDirectionInHemisphere(normal, rng);
+                //first update color -- it does not depend on ray information for now
+                pathSegment.color *= m.color;
+                //update ray
+                pathSegment.ray.origin = intersect + EPSILON_OFFSET * normal;
+                pathSegment.ray.direction = diffuse_dir;
+            }
+            else
+            {
+                glm::vec3 specular_dir = glm::normalize(glm::reflect(pathSegment.ray.direction, normal));
+                //no update on color
+                //update ray
+                pathSegment.ray.origin = intersect + EPSILON_OFFSET * normal;
+                pathSegment.ray.direction = specular_dir;
+                //pathSegment.color *= m.specular.color;
+                //pathSegment.color *= glm::abs(glm::dot(pathSegment.ray.direction, normal)) * m.color;
+            }
+        }
+
+    }
+    else if (m.hasRefractive > 0)
+    {
+        if (pathSegment.remainingBounces <= 0)
+        {
+            return;
+        }
+
+        //we assume the index of refraction of air is always 1
+        float eta_in = 1.0f;
+        float eta_out = m.indexOfRefraction;
+        if (!outside)
+        {
+            float temp = eta_in;
+            eta_in = eta_out;
+            eta_out = temp;
+        }
+
+        float eta = eta_in / eta_out;
+
+        //then compute Schlick's approximation of the Fresnel reflectance
+        float cos_theta = glm::dot(-pathSegment.ray.direction, normal);
+        float r0 = pow((1 - m.indexOfRefraction) / (1 + m.indexOfRefraction), 2);
+        float fresnel = r0 + (1 - r0) * pow(1 - cos_theta, 5);
+
+        //draw a uniform random number to choose between reflection and refraction
+        thrust::uniform_real_distribution<float> u01(0, 1);
+        float condition = u01(rng);
+        //reflect when the Fresnel reflectance is larger than the random number
+        if (fresnel > condition)
+        {
+            glm::vec3 specular_dir = glm::normalize(glm::reflect(pathSegment.ray.direction, normal));
+            //no update on color
+            //update ray
+            pathSegment.ray.origin = intersect + EPSILON_OFFSET * normal;
+            pathSegment.ray.direction = specular_dir;
+            
+        }
+        else //otherwise we refract
+        {
+            glm::vec3 refract_dir = glm::normalize(glm::refract(pathSegment.ray.direction, normal, eta));
+            //no update on color
+            //update ray
+            pathSegment.ray.origin = intersect + EPSILON_OFFSET * pathSegment.ray.direction;
+            pathSegment.ray.direction = refract_dir;
+        }
+    }
+
+    pathSegment.remainingBounces--;
+}
\ No newline at end of file
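The refractive branch of `scatterRay` above chooses between reflection and refraction by comparing a random number with Schlick's approximation of the Fresnel reflectance. The sketch below isolates that decision in plain C++. `schlickReflectance` and `shouldReflect` are hypothetical helpers, not functions from the project, and clamping the cosine to [0, 1] is an assumption of this sketch rather than something the diff does.

```cpp
#include <algorithm>
#include <cassert>

// Schlick's approximation: R(theta) = R0 + (1 - R0) * (1 - cos(theta))^5,
// with R0 = ((n1 - n2) / (n1 + n2))^2.
float schlickReflectance(float cosTheta, float etaIncident, float etaTransmitted) {
    assert(etaIncident > 0.0f && etaTransmitted > 0.0f);
    float r0 = (etaIncident - etaTransmitted) / (etaIncident + etaTransmitted);
    r0 *= r0;
    // Clamp so that a back-facing normal cannot push the result above 1.
    const float c = std::min(1.0f, std::max(0.0f, cosTheta));
    const float m = 1.0f - c;
    return r0 + (1.0f - r0) * m * m * m * m * m;
}

// Reflect when a uniform sample u in [0, 1) falls below the reflectance,
// which makes the probability of reflecting equal to the reflectance.
bool shouldReflect(float reflectance, float u) {
    return u < reflectance;
}
```

At normal incidence with an index of 1.55 the reflectance is roughly 4.7%, so most rays are transmitted. At grazing angles it approaches 1, so nearly all rays are reflected.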
diff --git a/src/main.cpp b/src/main.cpp
index fe8e85e..a6898a3 100644
--- a/src/main.cpp
+++ b/src/main.cpp
@@ -69,7 +69,7 @@ int main(int argc, char** argv) {
     // Initialize CUDA and GL components
     init();
 
-    // GLFW main loop
+    // GLFW main loop (in preview.cpp) -- calls runCuda repeatedly until the window is closed
     mainLoop();
 
     return 0;
@@ -98,6 +98,7 @@ void saveImage() {
     //img.saveHDR(filename);  // Save a Radiance HDR file
 }
 
+//main function
 void runCuda() {
     if (camchanged) {
         iteration = 0;
diff --git a/src/pathtrace.cu b/src/pathtrace.cu
index c1ec122..c499ea7 100644
--- a/src/pathtrace.cu
+++ b/src/pathtrace.cu
@@ -1,23 +1,34 @@
 #include <cstdio>
 #include <cuda.h>
 #include <cmath>
+#include <iostream>
 #include <thrust/execution_policy.h>
+#include <thrust/device_vector.h>
+#include <thrust/host_vector.h>
 #include <thrust/random.h>
 #include <thrust/remove.h>
+#include <thrust/partition.h>
 
 #include "sceneStructs.h"
 #include "scene.h"
 #include "glm/glm.hpp"
 #include "glm/gtx/norm.hpp"
+#include "glm/gtc/matrix_inverse.hpp"
 #include "utilities.h"
 #include "pathtrace.h"
 #include "intersections.h"
 #include "interactions.h"
 
 #define ERRORCHECK 1
+#define TEST_RADIX 0
+#define SORT_MATERIAL 0
+#define CACHE_FIRST_BOUNCE 0
+#define ANTI_ALIASING 0
+#define MOTION_BLUR 0
 
 #define FILENAME (strrchr(__FILE__, '/') ? strrchr(__FILE__, '/') + 1 : __FILE__)
 #define checkCUDAError(msg) checkCUDAErrorFn(msg, FILENAME, __LINE__)
+
 void checkCUDAErrorFn(const char *msg, const char *file, int line) {
 #if ERRORCHECK
     cudaDeviceSynchronize();
@@ -75,7 +86,11 @@ static PathSegment * dev_paths = NULL;
 static ShadeableIntersection * dev_intersections = NULL;
 // TODO: static variables for device memory, any extra info you need, etc
 // ...
-
+//cached first-bounce intersections and paths -- must be invalidated when the camera changes
+static ShadeableIntersection * dev_first_bounce_intersections = NULL;
+static PathSegment * dev_first_bounce_paths = NULL;
+//for radix sort
+static int n_bit_material_bound = -1;
 void pathtraceInit(Scene *scene) {
     hst_scene = scene;
     const Camera &cam = hst_scene->state.camera;
@@ -84,6 +99,7 @@ void pathtraceInit(Scene *scene) {
     cudaMalloc(&dev_image, pixelcount * sizeof(glm::vec3));
     cudaMemset(dev_image, 0, pixelcount * sizeof(glm::vec3));
 
+    //one path segment per pixel
   	cudaMalloc(&dev_paths, pixelcount * sizeof(PathSegment));
 
   	cudaMalloc(&dev_geoms, scene->geoms.size() * sizeof(Geom));
@@ -96,6 +112,16 @@ void pathtraceInit(Scene *scene) {
   	cudaMemset(dev_intersections, 0, pixelcount * sizeof(ShadeableIntersection));
 
     // TODO: initialize any extra device memeory you need
+    cudaMalloc(&dev_first_bounce_intersections, pixelcount * sizeof(ShadeableIntersection));
+    cudaMemset(dev_first_bounce_intersections, 0, pixelcount * sizeof(ShadeableIntersection));
+
+    cudaMalloc(&dev_first_bounce_paths, pixelcount * sizeof(PathSegment));
+
+    //number of bits needed to represent a material index (used by the radix sort)
+    if (n_bit_material_bound == -1)
+    {
+        n_bit_material_bound = std::log2(hst_scene->materials.size()) + 1;
+    }
 
     checkCUDAError("pathtraceInit");
 }
@@ -107,7 +133,8 @@ void pathtraceFree() {
   	cudaFree(dev_materials);
   	cudaFree(dev_intersections);
     // TODO: clean up any extra device memory you created
-
+    cudaFree(dev_first_bounce_intersections);
+    cudaFree(dev_first_bounce_paths);
     checkCUDAError("pathtraceFree");
 }
 
@@ -117,7 +144,7 @@ void pathtraceFree() {
 *
 * Antialiasing - add rays for sub-pixel sampling
 * motion blur - jitter rays "in time"
-* lens effect - jitter ray origin positions based on a lens
+* lens effect - jitter ray origin positions based on a lens -- need to modify if I want to implement
 */
 __global__ void generateRayFromCamera(Camera cam, int iter, int traceDepth, PathSegment* pathSegments)
 {
@@ -129,16 +156,31 @@ __global__ void generateRayFromCamera(Camera cam, int iter, int traceDepth, Path
 		PathSegment & segment = pathSegments[index];
 
 		segment.ray.origin = cam.position;
-    segment.color = glm::vec3(1.0f, 1.0f, 1.0f);
+        segment.color = glm::vec3(1.0f, 1.0f, 1.0f);
 
 		// TODO: implement antialiasing by jittering the ray
+        #if ANTI_ALIASING
+        thrust::default_random_engine rng = makeSeededRandomEngine(iter, index, 0);
+        thrust::uniform_real_distribution<float> u01(0, 1);
+        float rn_x = u01(rng);
+        float rn_y = u01(rng);
+        segment.ray.direction = glm::normalize(cam.view
+            - cam.right * cam.pixelLength.x * ((float)(x + rn_x) - (float)cam.resolution.x * 0.5f)
+            - cam.up * cam.pixelLength.y * ((float)(y + rn_y) - (float)cam.resolution.y * 0.5f)
+        );
+
+        #else
 		segment.ray.direction = glm::normalize(cam.view
 			- cam.right * cam.pixelLength.x * ((float)x - (float)cam.resolution.x * 0.5f)
 			- cam.up * cam.pixelLength.y * ((float)y - (float)cam.resolution.y * 0.5f)
 			);
 
+
+        #endif
+
 		segment.pixelIndex = index;
 		segment.remainingBounces = traceDepth;
+        segment.terminated = false;
 	}
 }
 
@@ -198,6 +240,7 @@ __global__ void computeIntersections(
 			}
 		}
 
+        //no hit
 		if (hit_geom_index == -1)
 		{
 			intersections[path_index].t = -1.0f;
@@ -208,6 +251,7 @@ __global__ void computeIntersections(
 			intersections[path_index].t = t_min;
 			intersections[path_index].materialId = geoms[hit_geom_index].materialid;
 			intersections[path_index].surfaceNormal = normal;
+            intersections[path_index].outside = outside;
 		}
 	}
 }
@@ -243,24 +287,45 @@ __global__ void shadeFakeMaterial (
       Material material = materials[intersection.materialId];
       glm::vec3 materialColor = material.color;
 
-      // If the material indicates that the object was a light, "light" the ray
+      // If the material indicates that the object was a light, "light" the ray -- should light source be + instead of mult?
       if (material.emittance > 0.0f) {
         pathSegments[idx].color *= (materialColor * material.emittance);
+        //should we terminate? -- yes
+        pathSegments[idx].remainingBounces = -1;
       }
       // Otherwise, do some pseudo-lighting computation. This is actually more
       // like what you would expect from shading in a rasterizer like OpenGL.
       // TODO: replace this! you should be able to start with basically a one-liner
+
+      //basic implementation of bsdf
       else {
-        float lightTerm = glm::dot(intersection.surfaceNormal, glm::vec3(0.0f, 1.0f, 0.0f));
-        pathSegments[idx].color *= (materialColor * lightTerm) * 0.3f + ((1.0f - intersection.t * 0.02f) * materialColor) * 0.7f;
-        pathSegments[idx].color *= u01(rng); // apply some noise because why not
+          //if specular, completely depend on the reflected color
+          glm::vec3 intersec_pos = pathSegments[idx].ray.direction * intersection.t + pathSegments[idx].ray.origin;
+          if (material.hasRefractive)
+          {
+              //refraction needs to know whether the ray is entering or leaving the object; pass the outside flag recorded by the intersection test
+              scatterRay(pathSegments[idx], intersec_pos, intersection.surfaceNormal, material, rng, intersection.outside);
+          }
+          else
+          {
+              scatterRay(pathSegments[idx], intersec_pos, intersection.surfaceNormal, material, rng);
+          }
       }
+
+      // fake implementation
+      //else {
+      //  float lightTerm = glm::dot(intersection.surfaceNormal, glm::vec3(0.0f, 1.0f, 0.0f));
+      //  pathSegments[idx].color *= (materialColor * lightTerm) * 0.3f + ((1.0f - intersection.t * 0.02f) * materialColor) * 0.7f;
+      //  pathSegments[idx].color *= u01(rng); // apply some noise because why not
+      //}
+
     // If there was no intersection, color the ray black.
     // Lots of renderers use 4 channel color, RGBA, where A = alpha, often
     // used for opacity, in which case they can indicate "no opacity".
     // This can be useful for post-processing and image compositing.
     } else {
       pathSegments[idx].color = glm::vec3(0.0f);
+      pathSegments[idx].remainingBounces = -1;//no further bouncing -- help stream compaction
     }
   }
 }
@@ -273,10 +338,200 @@ __global__ void finalGather(int nPaths, glm::vec3 * image, PathSegment * iterati
 	if (index < nPaths)
 	{
 		PathSegment iterationPath = iterationPaths[index];
+        //should here be average?
 		image[iterationPath.pixelIndex] += iterationPath.color;
 	}
 }
 
+//need to change position later
+struct have_more_bounce
+{
+    __host__ __device__
+        bool operator()(const PathSegment& p) const
+    {
+        return p.remainingBounces >= 0;
+    }
+};
+
+
+//sort rays based on material id -- failed..
+//__global__ void compute_b_e(int values_size, int* values, int* dev_b, int* dev_e, unsigned int bit)
+//{
+//    int idx = blockIdx.x * blockDim.x + threadIdx.x;
+//    if (idx < values_size)
+//    {
+//        int x_i = values[idx];          // value of integer at position i
+//        unsigned int p_i = (x_i >> bit) & 1;
+//
+//        if (p_i)
+//        {
+//            dev_b[idx] = 1;
+//        }
+//        else
+//        {
+//            dev_e[idx] = 1;
+//        }
+//    }
+//}
+//
+//__global__ void compute_t(int values_size, int* dev_e, int* dev_t, int* dev_f)
+//{
+//    int idx = blockIdx.x * blockDim.x + threadIdx.x;
+//    if (idx < values_size)
+//    {
+//        int total_false = dev_e[values_size - 1] + dev_f[values_size - 1];
+//        dev_t[idx] = idx - dev_f[idx] + total_false;
+//    }
+//}
+//
+//__global__ void compute_d(int values_size, int* dev_b, int* dev_t, int* dev_f, int* dev_d)
+//{
+//    int idx = blockIdx.x * blockDim.x + threadIdx.x;
+//    if (idx < values_size)
+//    {
+//        bool has_bit = dev_b[idx] == 1;
+//        int address;
+//        if (has_bit)
+//        {
+//            address = dev_t[idx];
+//            //dev_d[address] = dev_values[idx];  --test
+//        }
+//        else
+//        {
+//            address = dev_f[idx];
+//            //dev_d[address] = dev_values[idx];  --test
+//        }
+//        dev_d[idx] = address;
+//    }
+//}
+//
+//__global__ void apply_address(int values_size, PathSegment* dev_paths, PathSegment* dev_output_paths, ShadeableIntersection* dev_intersections, ShadeableIntersection* dev_output_intersections, int* dev_d)
+//{
+//    int idx = blockIdx.x * blockDim.x + threadIdx.x;
+//    if (idx < values_size)
+//    {
+//        int address = dev_d[idx]; //address to store to
+//        dev_output_paths[address] = dev_paths[idx];
+//        dev_output_intersections[address] = dev_intersections[idx];
+//    }
+//}
+//
+//__global__ void setup_material_array(int values_size, ShadeableIntersection* dev_input_intersections, int* dev_materialIds)
+//{
+//    int idx = blockIdx.x * blockDim.x + threadIdx.x;
+//    if (idx < values_size)
+//    {
+//        dev_materialIds[idx] = dev_input_intersections[idx].materialId;
+//    }
+//
+//
+//}
+//
+//void sort_by_materialID(int values_size)
+//{
+//    //first allocate 5 intermediate buffers  -- similar in slides
+//    int* dev_b;
+//    int* dev_e;
+//    int* dev_f;
+//    int* dev_t;
+//    int* dev_d;
+//    int* dev_materialIds;
+//
+//    //intermediate buffer
+//    PathSegment* dev_output_paths;
+//    PathSegment* dev_input_paths;
+//    ShadeableIntersection* dev_output_intersections;
+//    ShadeableIntersection* dev_input_intersections;
+//
+//    cudaMalloc(&dev_b, values_size * sizeof(int));
+//    cudaMalloc(&dev_e, values_size * sizeof(int));
+//    cudaMalloc(&dev_f, values_size * sizeof(int));
+//    cudaMalloc(&dev_t, values_size * sizeof(int));
+//    cudaMalloc(&dev_d, values_size * sizeof(int));
+//    cudaMalloc(&dev_materialIds, values_size * sizeof(int));
+//    cudaMalloc(&dev_output_paths, values_size * sizeof(PathSegment));
+//    cudaMalloc(&dev_output_intersections, values_size * sizeof(ShadeableIntersection));
+//    cudaMalloc(&dev_input_paths, values_size * sizeof(PathSegment));
+//    cudaMalloc(&dev_input_intersections, values_size * sizeof(ShadeableIntersection));
+//
+//    cudaMemcpy(dev_input_paths, dev_paths, values_size * sizeof(PathSegment), cudaMemcpyDeviceToDevice);
+//    cudaMemcpy(dev_input_intersections, dev_intersections, values_size * sizeof(ShadeableIntersection), cudaMemcpyDeviceToDevice);
+//
+//
+//    for (int bit = 0; bit < n_bit_material_bound; ++bit)
+//    {
+//        cudaMemset(dev_b, 0, values_size * sizeof(int));
+//        cudaMemset(dev_e, 0, values_size * sizeof(int));
+//        cudaMemset(dev_d, 0, values_size * sizeof(int));
+//        cudaMemset(dev_materialIds, 0, values_size * sizeof(int));
+//
+//        const int blockSize = 128;
+//        dim3 blocksPerGrid((values_size + blockSize - 1) / blockSize);
+//
+//        //store material id into the int array and use that to do radix sort
+//        setup_material_array << <blocksPerGrid, blockSize >> > (values_size, dev_input_intersections, dev_materialIds);
+//
+//        //compute array b and e
+//        compute_b_e<<<blocksPerGrid, blockSize>>>(values_size, dev_materialIds, dev_b, dev_e, bit);
+//
+//        //use thrust to compute the scan of e -- for device data, pass thrust::device as the execution policy, otherwise it doesn't work
+//        thrust::device_ptr<int> dev_temp_e(dev_e);
+//        thrust::device_ptr<int> dev_temp_f(dev_f);
+//        thrust::exclusive_scan(thrust::device, dev_temp_e, dev_temp_e + values_size, dev_temp_f); //doesn't work, don't know why
+//        //total false = last element of dev_f + last element of dev_e -- not computed on the host, because that would need a copy back just for one number
+//        //so it is added directly inside each kernel
+//        compute_t <<<blocksPerGrid, blockSize >> > (values_size, dev_e, dev_t, dev_f);
+//
+//        //compute the corresponding address of each element in new array and store in dev_d
+//        compute_d <<<blocksPerGrid, blockSize >> > (values_size, dev_b, dev_t, dev_f, dev_d);
+//
+//        //apply dev_d's address back to PathSegment and ShadeableIntersection for next round
+//        apply_address << <blocksPerGrid, blockSize >> > (values_size, dev_paths, dev_output_paths, dev_intersections, dev_output_intersections, dev_d);
+//        //apply the output_paths and output_intersections back to dev_paths and dev_intersections
+//        cudaMemcpy(dev_paths, dev_output_paths, values_size * sizeof(PathSegment), cudaMemcpyDeviceToDevice);
+//        cudaMemcpy(dev_intersections, dev_output_intersections, values_size * sizeof(ShadeableIntersection), cudaMemcpyDeviceToDevice);
+//    }
+//
+//    //no need to store another time because we have already done that in the last iteration in the loop
+//
+//    //Free all cuda stuff
+//
+//    cudaFree(dev_b);
+//    cudaFree(dev_e);
+//    cudaFree(dev_f);
+//    cudaFree(dev_t);
+//    cudaFree(dev_d);
+//    cudaFree(dev_materialIds);
+//
+//    cudaFree(dev_output_paths);
+//    cudaFree(dev_output_intersections);
+//    cudaFree(dev_input_paths);
+//    cudaFree(dev_input_intersections);
+//}
+
+
+
+
+//thanks to Jie Meng and Hanna for their help
+struct material_comparison
+{
+    __host__ __device__
+        bool operator()(const ShadeableIntersection& i1, const ShadeableIntersection& i2)
+    {
+        return i1.materialId < i2.materialId;
+    }
+};
+
+void sort_by_material(int num_paths, PathSegment* dev_paths, ShadeableIntersection* dev_intersections)
+{
+    //wrap the raw device pointers in thrust::device_ptr so thrust treats them as device memory
+    thrust::device_ptr<PathSegment> dev_paths_ptr(dev_paths);
+    thrust::device_ptr<ShadeableIntersection> dev_intersections_ptr(dev_intersections);
+
+    thrust::sort_by_key(thrust::device, dev_intersections_ptr, dev_intersections_ptr + num_paths, dev_paths_ptr, material_comparison());
+}
+
+
 /**
  * Wrapper for the __global__ call that sets up the kernel calls and does a ton
  * of memory management
@@ -310,7 +565,7 @@ void pathtrace(uchar4 *pbo, int frame, int iter) {
     //     Currently, intersection distance is recorded as a parametric distance,
     //     t, or a "distance along the ray." t = -1.0 indicates no intersection.
     //     * Color is attenuated (multiplied) by reflections off of any object
-    //   * TODO: Stream compact away all of the terminated paths.
+    //   * TODO: Stream compact away all of the terminated paths. -- but you still need their data right?
     //     You may use either your implementation or `thrust::remove_if` or its
     //     cousins.
     //     * Note that you can't really use a 2D kernel launch any more - switch
@@ -325,23 +580,189 @@ void pathtrace(uchar4 *pbo, int frame, int iter) {
     //   for you.
 
     // TODO: perform one iteration of path tracing
+    //start gpu timer
+    utilityCore::PerformanceTimer timer;
+    timer.startGpuTimer();
+
+    //if there is motion blur, we update the transformation of the geometry
+#if MOTION_BLUR
+    //the velocity only determines how far we want to move the object
+    //float timeStep = iter; //wrong: objects fly out of the scene
+    float timeStep = 1 / (hst_scene->state.iterations * 0.1f); // per-iteration step: the object has moved by one full velocity vector after 10% of the total iterations (500 in this setup)
+    for (int i = 0; i < hst_scene->geoms.size(); i++)
+    {
+        hst_scene->geoms[i].translation += hst_scene->geoms[i].velocity * timeStep;
+        hst_scene->geoms[i].transform = utilityCore::buildTransformationMatrix(hst_scene->geoms[i].translation, hst_scene->geoms[i].rotation, hst_scene->geoms[i].scale);
+        hst_scene->geoms[i].inverseTransform = glm::inverse(hst_scene->geoms[i].transform);
+        hst_scene->geoms[i].invTranspose = glm::inverseTranspose(hst_scene->geoms[i].transform);
+    }
+
+    cudaMemcpy(dev_geoms, &(hst_scene->geoms)[0], hst_scene->geoms.size() * sizeof(Geom), cudaMemcpyHostToDevice);
+    checkCUDAError("motion blur error");
+
+#endif 
+
+#if CACHE_FIRST_BOUNCE && !ANTI_ALIASING && !MOTION_BLUR
+    //first iteration, we need to generate Ray and cache
+    if (iter == 1)
+    {
+        generateRayFromCamera << <blocksPerGrid2d, blockSize2d >> > (cam, iter, traceDepth, dev_paths);
+        checkCUDAError("generate camera ray");
+
+        //cache the ray
+        cudaMemcpy(dev_first_bounce_paths, dev_paths, pixelcount * sizeof(PathSegment), cudaMemcpyDeviceToDevice);
+
+        int depth = 0;
+        PathSegment* dev_path_end = dev_paths + pixelcount; //the tail of path segment array
+        int num_paths = dev_path_end - dev_paths; //is that the same as pixel count? -- no when antialiasing, do we need to change?
+
+        // --- PathSegment Tracing Stage ---
+        // Shoot ray into scene, bounce between objects, push shading chunks
+
+        bool iterationComplete = false;
+        //create a intermediate buffer
+        while (!iterationComplete) {
+
+            // clean shading chunks
+            cudaMemset(dev_intersections, 0, pixelcount * sizeof(ShadeableIntersection));
+            // tracing
+            dim3 numblocksPathSegmentTracing = (num_paths + blockSize1d - 1) / blockSize1d;
+            computeIntersections << <numblocksPathSegmentTracing, blockSize1d >> > (
+                depth
+                , num_paths
+                , dev_paths
+                , dev_geoms
+                , hst_scene->geoms.size()
+                , dev_intersections
+                );
+            checkCUDAError("trace one bounce");
+            cudaDeviceSynchronize();
+            //cache the intersection
+            if (depth == 0)
+            {
+                cudaMemcpy(dev_first_bounce_intersections, dev_intersections, pixelcount * sizeof(ShadeableIntersection), cudaMemcpyDeviceToDevice);
+            }
+            depth++;
+
+
+            // TODO:
+            // --- Shading Stage ---
+            // Shade path segments based on intersections and generate new rays by
+          // evaluating the BSDF.
+          // Start off with just a big kernel that handles all the different
+          // materials you have in the scenefile.
+
+          // TODO: compare between directly shading the path segments and shading
+          // path segments that have been reshuffled to be contiguous in memory.
+
+#if SORT_MATERIAL
+    //reshuffle the pathSegments
+            sort_by_material(num_paths, dev_paths, dev_intersections);
+#endif
 
+            shadeFakeMaterial << <numblocksPathSegmentTracing, blockSize1d >> > (
+                iter,
+                num_paths,
+                dev_intersections,
+                dev_paths,
+                dev_materials
+                );
+            // TODO: should be based on stream compaction results, and possibly shoot more rays
+            // update num_paths: a path with remainingBounces == -1 is terminated, so partition moves it to the tail and it is no longer traced
+            PathSegment* dev_paths_end_result = thrust::partition(thrust::device, dev_paths, dev_paths + num_paths, have_more_bounce());
+            num_paths = dev_paths_end_result - dev_paths;
+            if (depth >= traceDepth || num_paths <= 0)
+            {
+                iterationComplete = true;
+            }
+        }
+
+        //restore num_paths to the full path count so finalGather also accumulates the terminated paths
+        num_paths = dev_path_end - dev_paths;
+        // Assemble this iteration and apply it to the image
+
+        dim3 numBlocksPixels = (pixelcount + blockSize1d - 1) / blockSize1d;
+        finalGather << <numBlocksPixels, blockSize1d >> > (num_paths, dev_image, dev_paths);
+    }
+    else
+    {
+
+        //copy dev_first_bounce_paths and dev_first_bounce_intersections to dev_paths and dev_intersections
+        cudaMemcpy(dev_paths, dev_first_bounce_paths, pixelcount * sizeof(PathSegment), cudaMemcpyDeviceToDevice);
+        cudaMemcpy(dev_intersections, dev_first_bounce_intersections, pixelcount * sizeof(ShadeableIntersection), cudaMemcpyDeviceToDevice);
+        //set the initial depth to 1
+        int depth = 1;
+        PathSegment* dev_path_end = dev_paths + pixelcount; //the tail of path segment array
+        int num_paths = dev_path_end - dev_paths;
+
+        bool iterationComplete = false;
+        //no need to generate ray, directly go into the while loop
+        while (!iterationComplete) {
+            //first go shading and then update the depth
+#if SORT_MATERIAL
+    //reshuffle the pathSegments
+            sort_by_material(num_paths, dev_paths, dev_intersections);
+#endif
+            dim3 numblocksPathSegmentTracing = (num_paths + blockSize1d - 1) / blockSize1d;
+            shadeFakeMaterial << <numblocksPathSegmentTracing, blockSize1d >> > (
+                iter,
+                num_paths,
+                dev_intersections,
+                dev_paths,
+                dev_materials
+                );
+            // TODO: should be based on stream compaction results, and possibly shoot more rays
+            // update num_paths: a path with remainingBounces == -1 is terminated, so partition moves it to the tail and it is no longer traced
+            PathSegment* dev_paths_end_result = thrust::partition(thrust::device, dev_paths, dev_paths + num_paths, have_more_bounce());
+            num_paths = dev_paths_end_result - dev_paths;
+            if (depth >= traceDepth || num_paths <= 0)
+            {
+                iterationComplete = true;
+                continue;
+            }
+
+
+            // reset dev_intersections
+            cudaMemset(dev_intersections, 0, pixelcount * sizeof(ShadeableIntersection));
+            // tracing
+            computeIntersections << <numblocksPathSegmentTracing, blockSize1d >> > (
+                depth
+                , num_paths
+                , dev_paths
+                , dev_geoms
+                , hst_scene->geoms.size()
+                , dev_intersections
+                );
+            checkCUDAError("trace one bounce");
+            cudaDeviceSynchronize();
+            depth++;
+        }
+
+        //restore num_paths to the full path count so finalGather also accumulates the terminated paths
+        num_paths = dev_path_end - dev_paths;
+        // Assemble this iteration and apply it to the image
+
+        dim3 numBlocksPixels = (pixelcount + blockSize1d - 1) / blockSize1d;
+        finalGather << <numBlocksPixels, blockSize1d >> > (num_paths, dev_image, dev_paths);
+    }
+#else
 	generateRayFromCamera <<<blocksPerGrid2d, blockSize2d >>>(cam, iter, traceDepth, dev_paths);
 	checkCUDAError("generate camera ray");
 
 	int depth = 0;
-	PathSegment* dev_path_end = dev_paths + pixelcount;
-	int num_paths = dev_path_end - dev_paths;
+	PathSegment* dev_path_end = dev_paths + pixelcount; //the tail of path segment array
+	int num_paths = dev_path_end - dev_paths; //is that the same as pixel count? -- no when antialiasing, do we need to change?
 
 	// --- PathSegment Tracing Stage ---
 	// Shoot ray into scene, bounce between objects, push shading chunks
 
   bool iterationComplete = false;
+  //create a intermediate buffer
 	while (!iterationComplete) {
 
 	// clean shading chunks
 	cudaMemset(dev_intersections, 0, pixelcount * sizeof(ShadeableIntersection));
-
+    //cudaMemset(dev_intersections, 0, num_paths * sizeof(ShadeableIntersection));
 	// tracing
 	dim3 numblocksPathSegmentTracing = (num_paths + blockSize1d - 1) / blockSize1d;
 	computeIntersections <<<numblocksPathSegmentTracing, blockSize1d>>> (
@@ -363,9 +784,15 @@ void pathtrace(uchar4 *pbo, int frame, int iter) {
   // evaluating the BSDF.
   // Start off with just a big kernel that handles all the different
   // materials you have in the scenefile.
+
   // TODO: compare between directly shading the path segments and shading
   // path segments that have been reshuffled to be contiguous in memory.
 
+#if SORT_MATERIAL
+    //reshuffle the pathSegments
+    sort_by_material(num_paths,dev_paths,dev_intersections);
+#endif
+
   shadeFakeMaterial<<<numblocksPathSegmentTracing, blockSize1d>>> (
     iter,
     num_paths,
@@ -373,12 +800,27 @@ void pathtrace(uchar4 *pbo, int frame, int iter) {
     dev_paths,
     dev_materials
   );
-  iterationComplete = true; // TODO: should be based off stream compaction results.
+  // TODO: should be based on stream compaction results, and possibly shoot more rays
+  // update num_paths: a path with remainingBounces == -1 is terminated, so partition moves it to the tail and it is no longer traced
+  PathSegment* dev_paths_end_result = thrust::partition(thrust::device, dev_paths, dev_paths + num_paths, have_more_bounce());
+  //int previous_paths = num_paths;
+  num_paths = dev_paths_end_result - dev_paths;
+  //std::cout << previous_paths - num_paths << std::endl;
+  if (depth >= traceDepth || num_paths <= 0)
+  {
+      iterationComplete = true;
+  }
 	}
+    //restore num_paths to the full path count so finalGather also accumulates the terminated paths
+    num_paths = dev_path_end - dev_paths;
+    // Assemble this iteration and apply it to the image
 
-  // Assemble this iteration and apply it to the image
-  dim3 numBlocksPixels = (pixelcount + blockSize1d - 1) / blockSize1d;
-	finalGather<<<numBlocksPixels, blockSize1d>>>(num_paths, dev_image, dev_paths);
+    dim3 numBlocksPixels = (pixelcount + blockSize1d - 1) / blockSize1d;
+    finalGather << <numBlocksPixels, blockSize1d >> > (num_paths, dev_image, dev_paths);
+#endif
+    //std::cout << std::endl;
+    timer.endGpuTimer();
+    utilityCore::printElapsedTime(timer.getGpuElapsedTimeForPreviousOperation(), "With stream compaction");
 
     ///////////////////////////////////////////////////////////////////////////
 
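Review note on the compaction step in `pathtrace()`: the loop calls `thrust::partition` and then restores `num_paths` before `finalGather`, which seems to answer the "you still need their data" question in the TODO. Partitioning only reorders the array, so terminated paths stay in the tail with their colors intact. Below is a minimal host-side sketch of that idea. It assumes `std::partition` as a CPU stand-in for `thrust::partition`, and `PathSegmentSketch`, `HasMoreBounces`, and `compactLivePaths` are hypothetical names that do not appear in the diff.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical, trimmed stand-in for the project's PathSegment.
struct PathSegmentSketch {
    int pixelIndex;
    int remainingBounces;
};

// Same test as have_more_bounce in the diff: -1 marks a terminated path.
struct HasMoreBounces {
    bool operator()(const PathSegmentSketch& p) const {
        return p.remainingBounces >= 0;
    }
};

// Moves live paths to the front of the first numPaths entries and returns
// how many are still live. Terminated paths are kept behind that point,
// not erased, so a later gather over the whole array can still read them.
int compactLivePaths(std::vector<PathSegmentSketch>& paths, int numPaths) {
    auto liveEnd = std::partition(paths.begin(), paths.begin() + numPaths,
                                  HasMoreBounces());
    return static_cast<int>(liveEnd - paths.begin());
}
```

The returned count plays the role of the shrinking `num_paths` inside the bounce loop, while the vector size plays the role of `pixelcount` used by the final gather.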
diff --git a/src/scene.cpp b/src/scene.cpp
index cbae043..3c1b444 100644
--- a/src/scene.cpp
+++ b/src/scene.cpp
@@ -75,6 +75,10 @@ int Scene::loadGeom(string objectid) {
             } else if (strcmp(tokens[0].c_str(), "SCALE") == 0) {
                 newGeom.scale = glm::vec3(atof(tokens[1].c_str()), atof(tokens[2].c_str()), atof(tokens[3].c_str()));
             }
+            // load the optional VELOCITY attribute, if the scene file provides one
+            else if (strcmp(tokens[0].c_str(), "VELOCITY") == 0) {
+                newGeom.velocity = glm::vec3(atof(tokens[1].c_str()), atof(tokens[2].c_str()), atof(tokens[3].c_str()));
+            }
 
             utilityCore::safeGetline(fp_in, line);
         }
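Review note on the `VELOCITY` parsing above: geometry without a `VELOCITY` line never assigns `newGeom.velocity`, so it is worth checking whether the value is zero in that case (this may depend on the GLM version and its constructor settings). The sketch below shows one way to make the default explicit and to reject malformed lines. It is an illustration only: `Vec3Sketch` and `parseVelocityLine` are hypothetical names, and it uses `std::istringstream` instead of the project's tokenizer and `atof`.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical stand-in for glm::vec3 so the sketch builds without GLM.
// Members default to zero, i.e. "no motion".
struct Vec3Sketch {
    float x = 0.0f;
    float y = 0.0f;
    float z = 0.0f;
};

// Parses a line such as "VELOCITY 0 1.5 -2". Returns true and fills `out`
// only when the keyword matches and three numbers follow; otherwise `out`
// is left untouched.
bool parseVelocityLine(const std::string& line, Vec3Sketch& out) {
    std::istringstream in(line);
    std::string keyword;
    Vec3Sketch v;
    if (!(in >> keyword) || keyword != "VELOCITY") {
        return false;
    }
    if (!(in >> v.x >> v.y >> v.z)) {
        return false;
    }
    out = v;
    return true;
}
```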
diff --git a/src/sceneStructs.h b/src/sceneStructs.h
index b38b820..7aeba4c 100644
--- a/src/sceneStructs.h
+++ b/src/sceneStructs.h
@@ -23,6 +23,7 @@ struct Geom {
     glm::vec3 translation;
     glm::vec3 rotation;
     glm::vec3 scale;
+    glm::vec3 velocity;
     glm::mat4 transform;
     glm::mat4 inverseTransform;
     glm::mat4 invTranspose;
@@ -64,6 +65,7 @@ struct PathSegment {
 	glm::vec3 color;
 	int pixelIndex;
 	int remainingBounces;
+    bool terminated;
 };
 
 // Use with a corresponding PathSegment to do:
@@ -73,4 +75,5 @@ struct ShadeableIntersection {
   float t;
   glm::vec3 surfaceNormal;
   int materialId;
+  bool outside;
 };
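Review note on `materialId` as a sort key: `sort_by_material` in pathtrace.cu reorders `dev_paths` by the `materialId` of the matching `ShadeableIntersection`, so that threads shading the same material sit next to each other. The host-side sketch below shows the same key/value reordering. It assumes the standard library in place of `thrust::sort_by_key`, and `IntersectionSketch`, `PathSketch`, and `sortByMaterial` are hypothetical names. It uses a stable sort to make the result deterministic, which is not necessarily true of the device version.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical, trimmed stand-ins for ShadeableIntersection and PathSegment.
struct IntersectionSketch { int materialId; };
struct PathSketch { int pixelIndex; };

// Reorders both arrays together so that entries with the same materialId
// become adjacent. Entry i of `paths` always stays paired with entry i of
// `isects`.
void sortByMaterial(std::vector<IntersectionSketch>& isects,
                    std::vector<PathSketch>& paths) {
    assert(isects.size() == paths.size());

    // Sort a list of indices by material id instead of moving the data twice.
    std::vector<int> order(isects.size());
    std::iota(order.begin(), order.end(), 0);
    std::stable_sort(order.begin(), order.end(), [&isects](int a, int b) {
        return isects[a].materialId < isects[b].materialId;
    });

    // Apply the permutation to keys and values.
    std::vector<IntersectionSketch> sortedIsects;
    std::vector<PathSketch> sortedPaths;
    sortedIsects.reserve(order.size());
    sortedPaths.reserve(order.size());
    for (int i : order) {
        sortedIsects.push_back(isects[i]);
        sortedPaths.push_back(paths[i]);
    }
    isects.swap(sortedIsects);
    paths.swap(sortedPaths);
}
```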
diff --git a/src/utilities.h b/src/utilities.h
index abb4f27..54fd8d7 100644
--- a/src/utilities.h
+++ b/src/utilities.h
@@ -1,5 +1,13 @@
 #pragma once
 
+#include <cuda.h>
+#include <cuda_runtime.h>
+#include <cstdio>
+#include <cstring>
+#include <cmath>
+#include <chrono>
+#include <stdexcept>
+
 #include "glm/glm.hpp"
 #include <algorithm>
 #include <istream>
@@ -11,6 +19,11 @@
 
 #define PI                3.1415926535897932384626422832795028841971f
 #define TWO_PI            6.2831853071795864769252867665590057683943f
+#define InvPi             0.31830988618379067154f
+#define Inv2Pi            0.15915494309189533577f
+#define PiOver2           1.57079632679489661923f
+#define PiOver4           0.78539816339744830961f
+#define Sqrt2             1.41421356237309504880f
 #define SQRT_OF_ONE_THIRD 0.5773502691896257645091487805019574556476f
 #define EPSILON           0.00001f
 
@@ -23,4 +36,116 @@ namespace utilityCore {
     extern glm::mat4 buildTransformationMatrix(glm::vec3 translation, glm::vec3 rotation, glm::vec3 scale);
     extern std::string convertIntToString(int number);
     extern std::istream& safeGetline(std::istream& is, std::string& t); //Thanks to http://stackoverflow.com/a/6089413
+
+    //performance timer from hw2
+    class PerformanceTimer
+    {
+    public:
+        PerformanceTimer()
+        {
+            cudaEventCreate(&event_start);
+            cudaEventCreate(&event_end);
+        }
+
+        ~PerformanceTimer()
+        {
+            cudaEventDestroy(event_start);
+            cudaEventDestroy(event_end);
+        }
+
+        void startCpuTimer()
+        {
+            if (cpu_timer_started) { throw std::runtime_error("CPU timer already started"); }
+            cpu_timer_started = true;
+
+            time_start_cpu = std::chrono::high_resolution_clock::now();
+        }
+
+        void endCpuTimer()
+        {
+            time_end_cpu = std::chrono::high_resolution_clock::now();
+
+            if (!cpu_timer_started) { throw std::runtime_error("CPU timer not started"); }
+
+            std::chrono::duration<double, std::milli> duro = time_end_cpu - time_start_cpu;
+            prev_elapsed_time_cpu_milliseconds =
+                static_cast<decltype(prev_elapsed_time_cpu_milliseconds)>(duro.count());
+
+            cpu_timer_started = false;
+        }
+
+        void startGpuTimer()
+        {
+            if (gpu_timer_started) { throw std::runtime_error("GPU timer already started"); }
+            gpu_timer_started = true;
+
+            cudaEventRecord(event_start);
+        }
+
+        void endGpuTimer()
+        {
+            cudaEventRecord(event_end);
+            cudaEventSynchronize(event_end);
+
+            if (!gpu_timer_started) { throw std::runtime_error("GPU timer not started"); }
+
+            cudaEventElapsedTime(&prev_elapsed_time_gpu_milliseconds, event_start, event_end);
+            gpu_timer_started = false;
+        }
+
+        float getCpuElapsedTimeForPreviousOperation() //noexcept //(damn I need VS 2015
+        {
+            return prev_elapsed_time_cpu_milliseconds;
+        }
+
+        float getGpuElapsedTimeForPreviousOperation() //noexcept
+        {
+            return prev_elapsed_time_gpu_milliseconds;
+        }
+
+        // remove copy and move functions
+        PerformanceTimer(const PerformanceTimer&) = delete;
+        PerformanceTimer(PerformanceTimer&&) = delete;
+        PerformanceTimer& operator=(const PerformanceTimer&) = delete;
+        PerformanceTimer& operator=(PerformanceTimer&&) = delete;
+
+    private:
+        cudaEvent_t event_start = nullptr;
+        cudaEvent_t event_end = nullptr;
+
+        using time_point_t = std::chrono::high_resolution_clock::time_point;
+        time_point_t time_start_cpu;
+        time_point_t time_end_cpu;
+
+        bool cpu_timer_started = false;
+        bool gpu_timer_started = false;
+
+        float prev_elapsed_time_cpu_milliseconds = 0.f;
+        float prev_elapsed_time_gpu_milliseconds = 0.f;
+    };
+
+    template<typename T>
+    void printElapsedTime(T time, std::string note = "")
+    {
+        std::cout << "   elapsed time: " << time << "ms    " << note << std::endl;
+    }
+}
+
+inline int ilog2(int x) {
+    int lg = 0;
+    while (x >>= 1) {
+        ++lg;
+    }
+    return lg;
+}
+
+inline int ilog2ceil(int x) {
+    return x == 1 ? 0 : ilog2(x - 1) + 1;
+}
+//from 561 hw
+inline float CosTheta(const glm::vec3 &w) { return w.z; }
+inline float Cos2Theta(const glm::vec3 &w) { return w.z * w.z; }
+inline float AbsCosTheta(const glm::vec3 &w) { return std::abs(w.z); }
+inline float Sin2Theta(const glm::vec3 &w) {
+    return std::max(0.0f, 1.0f - Cos2Theta(w));
 }
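Review note on `ilog2` / `ilog2ceil`: `pathtraceInit` derives `n_bit_material_bound` from `std::log2(...) + 1`, while these integer helpers could give the bit count without floating point. The sketch below repeats the two helpers from the diff and adds `materialIdBits`, a hypothetical wrapper that is not in the diff. It assumes material ids run from 0 to `materialCount - 1`.

```cpp
#include <cassert>

// Copied from the diff: floor(log2(x)) for x >= 1.
inline int ilog2(int x) {
    int lg = 0;
    while (x >>= 1) {
        ++lg;
    }
    return lg;
}

// Copied from the diff: ceil(log2(x)) for x >= 1.
inline int ilog2ceil(int x) {
    return x == 1 ? 0 : ilog2(x - 1) + 1;
}

// Hypothetical helper: number of bits needed to tell apart the ids
// 0 .. materialCount - 1. At least one bit is always reported so that a
// radix sort runs at least one pass.
inline int materialIdBits(int materialCount) {
    return materialCount <= 1 ? 1 : ilog2ceil(materialCount);
}
```

For four materials this gives 2 bits, whereas `floor(log2(4)) + 1` gives 3. The extra pass is harmless for correctness, since the additional bit is always zero, but it costs one more sort round.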