Photon mapping with final gathering and stratified importance sampling was implemented on CUDA this week. I also tried a crude implementation of irradiance caching on the GPU using a divergence field based cache position selection which I came up with myself, it still needs further improvement, and I will try to post that next week. Now, without irradiance caching, with 800x600 resolution, 100k photons and 200 final gather rays per pixel, rendering a frame takes ~ 15-20 secs. This is far from interactive, but since I uses a incremental based approach where intermediate results are shown, the system is far more responsive in quickly visualize results.
The following is a screen shot of the current renderer. With 1024 by 768 resolution, 100k photons and 260 final gather rays per pixel, It's fully rendered in 25 seconds.
The following video showcases the current state of the renderer. Note that when the light source is moved, the reponse is pretty slow (~1fps). This is because we need to precompute irradiance at each photon position to accelerate the final gather. Also we shoot 100k photons. With 10k photons, we get around 10-15fps when moving the light source and similar rendering results.