* Tracing rays is fast; shading is slow
* `normalize` is slow (it needs a square root and a division) and should be avoided wherever possible
* C is not inherently faster than C++; correct use of C++ features such as templates and inlining can improve performance
* SSE is not always faster; it pays off only when data stays in SSE registers instead of being repeatedly loaded from and stored back to memory or the FPU
* Taking a branch is slow, especially when it is mispredicted; doing extra computation to avoid the branch is often faster
* Computation is fast; accessing memory is slow
* Avoid spilling values to stack memory; make the best use of registers
* Write clean code first; optimize once the code is clean enough
* Always use a profiler; most of the time the bottleneck is not where you expect
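A minimal sketch of how to avoid or reduce the cost of `normalize`, assuming a hypothetical `Vec3` struct and helper names (`withinRange`, `normalizeOnce`) that are not from the original. When only a distance comparison is needed, comparing squared lengths removes the square root entirely; when a unit vector really is needed, one square root and one division are paid and the reciprocal is reused.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical minimal vector type standing in for the renderer's own.
struct Vec3 {
    float x, y, z;
};

inline float dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

inline Vec3 sub(const Vec3& a, const Vec3& b) {
    return {a.x - b.x, a.y - b.y, a.z - b.z};
}

// Range check without a square root: compare the squared distance
// against the squared radius instead of normalizing or taking a length.
inline bool withinRange(const Vec3& p, const Vec3& light, float radius) {
    Vec3 d = sub(p, light);
    return dot(d, d) <= radius * radius;
}

// When a unit vector is unavoidable: one sqrt and one division,
// then three multiplications by the shared reciprocal.
inline Vec3 normalizeOnce(const Vec3& v) {
    float invLen = 1.0f / std::sqrt(dot(v, v));
    return {v.x * invLen, v.y * invLen, v.z * invLen};
}
```

For example, `withinRange({0, 0, 0}, {3, 4, 0}, 5.0f)` is true because the squared distance 25 does not exceed 5 squared.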
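One common illustration of the C versus C++ point is sorting, for instance sorting hit distances along a ray. The function names below (`sortHitsC`, `sortHitsCpp`, `compareFloat`) are hypothetical. `qsort` calls its comparator through a function pointer, which compilers usually cannot inline, while `std::sort` is a template whose comparator is visible at the call site. This is a sketch of the mechanism, not a measured benchmark.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <vector>

// C style: every comparison is an indirect call through a function pointer.
static int compareFloat(const void* a, const void* b) {
    float fa = *static_cast<const float*>(a);
    float fb = *static_cast<const float*>(b);
    return (fa > fb) - (fa < fb);  // -1, 0 or 1 without branching on the result
}

// Hypothetical helper: sort hit distances with the C library.
void sortHitsC(std::vector<float>& t) {
    std::qsort(t.data(), t.size(), sizeof(float), compareFloat);
}

// C++ style: std::sort is instantiated for this lambda type, so the
// comparison can be inlined directly into the sorting loop.
void sortHitsCpp(std::vector<float>& t) {
    std::sort(t.begin(), t.end(), [](float a, float b) { return a < b; });
}
```

Both functions produce the same order; the difference lies only in how much the compiler can optimize the comparison.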
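A minimal sketch of SSE used the way the note recommends, assuming an x86/x86-64 target and a hypothetical structure-of-arrays type `Vec3x4` holding four vectors. Each input array is loaded once, all arithmetic stays in `__m128` registers, and the four results leave with a single store. Packing scalars into a register and extracting them back for every operation would add the load/store traffic that makes SSE slower.

```cpp
#include <cassert>
#include <xmmintrin.h>  // SSE intrinsics, x86/x86-64 only

// Hypothetical SoA layout: xs[i], ys[i], zs[i] belong to vector i.
// 16-byte alignment allows the aligned _mm_load_ps.
struct Vec3x4 {
    alignas(16) float xs[4];
    alignas(16) float ys[4];
    alignas(16) float zs[4];
};

// Four dot products at once: six loads, five arithmetic ops, one store.
inline void dot4(const Vec3x4& a, const Vec3x4& b, float out[4]) {
    __m128 ax = _mm_load_ps(a.xs), ay = _mm_load_ps(a.ys), az = _mm_load_ps(a.zs);
    __m128 bx = _mm_load_ps(b.xs), by = _mm_load_ps(b.ys), bz = _mm_load_ps(b.zs);
    __m128 d = _mm_add_ps(_mm_mul_ps(ax, bx),
                          _mm_add_ps(_mm_mul_ps(ay, by), _mm_mul_ps(az, bz)));
    _mm_storeu_ps(out, d);  // out need not be aligned
}
```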
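The advice to trade branches for computation can be sketched with the standard ray/box slab test, assuming hypothetical `Ray` and `AABB` types with a precomputed reciprocal direction. Instead of returning early per axis, all three slab intervals are computed and combined with `std::min`/`std::max`, which for floats typically compile to branch-free min/max instructions. This is an illustration of the idea, not the original renderer's code.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical ray with a precomputed 1/direction so the test only multiplies.
struct Ray {
    float origin[3];
    float invDir[3];
};

// Hypothetical axis-aligned bounding box.
struct AABB {
    float lo[3];
    float hi[3];
};

// Slab test: intersect the three per-axis [t0, t1] intervals with
// min/max instead of an early-out branch on each axis.
inline bool hitAABB(const Ray& r, const AABB& box, float tMax) {
    float tNear = 0.0f;
    float tFar = tMax;
    for (int a = 0; a < 3; ++a) {
        float t0 = (box.lo[a] - r.origin[a]) * r.invDir[a];
        float t1 = (box.hi[a] - r.origin[a]) * r.invDir[a];
        tNear = std::max(tNear, std::min(t0, t1));
        tFar = std::min(tFar, std::max(t0, t1));
    }
    return tNear <= tFar;  // the ray overlaps the box within [0, tMax]
}
```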
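One way to help the compiler keep a value in a register instead of memory is to copy it into a local before a hot loop. In the sketch below, `Film`, `exposure`, and both function names are hypothetical. Because `out` and `film.exposure` are both `float`, a write through `out` could legally change `film.exposure`, so the compiler may have to reload it from memory on every iteration; the local copy removes that possibility.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical film settings with a per-pixel scale factor.
struct Film {
    float exposure;
};

// out[i] might alias film.exposure, so the value may be re-read
// from memory after every store.
void applyExposureNaive(float* out, std::size_t n, const Film& film) {
    for (std::size_t i = 0; i < n; ++i) out[i] *= film.exposure;
}

// Copy into a local first: the compiler knows stores to out cannot
// change it, so it can stay in a register for the whole loop.
void applyExposureLocal(float* out, std::size_t n, const Film& film) {
    const float exposure = film.exposure;
    for (std::size_t i = 0; i < n; ++i) out[i] *= exposure;
}
```

Whether this actually matters should be confirmed with a profiler, as the last note advises.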