An E5200 overclocked to 2.9GHz, running 2 OpenMP threads, rasterized a two-level noised density field into a 512M volume in 1150.62 seconds with the paging system, using 130M of memory: a trade-off between time and space.
For Stokes, I have to take care of multithreading, the GPU, even the network, and the paging system for DSO rendering, etc.
I decided to use a block-based addressing mode rather than a slice-based one, because of sparse access, compression, and paging. Why paging? Suppose we raymarch an 8G density field file from a RenderMan DSO: we cannot use too much memory, yet one slice of that file might be up to 32MB. If the DSO is limited to 512M of memory, we could hold only 16 slices, but we can cache far more blocks (a 64^3 density block occupies only 1M), which pays off once adaptive sampling is enabled. At the same time, a linear slice-based volume that large cannot be loaded onto the GPU for previewing.
What matters most in a piece of software? Architecture. So how do you improve the architecture? Spend more and more time on the execution workflow. I still need a lot of time to refactor the architecture, to make it faster and more flexible.