Sunday, October 17, 2010

kd-tree CPU builder and GPU-based tracer is ready

In the past week, I added significant amount of code to my renderer, most of which pertained to kd-tree construction and traversal. I read through Kun Zhou's paper on GPU based kd-tree construction, and decided that the paper could be implemented in the following steps:

1) Make sure the kd-tree tracer (traversal) is working. To achieve this, I first implement a CPU based kd-tree builder with data structure optimized for CUDA. Then a GPU based tracer that utilizes the resulting kd-tree should be implemented. At this point, I should have a kd-tree CPU builder and GPU tracer.

2) Substitute a GPU kd-tree builder for the CPU one built in the first step. The output data structure should be in sync so that we don't need to change (hopefully) the GPU tracer in step 1. The GPU kd-tree builder should implement the Small Node stage according to Zhou's paper.

3) Implement the Large Node stage as proposed in Zhou's paper. This is trickier than Small Node Stage, but hopefully this is going to provide worthwhile performance boost.

So for this week Step 1 is done. I set up a team foundation server project so that I can keep track of different versions of the code which is very useful. For the actual kd-tree implementation, to make the data readily available to the GPU (and later to the GPU kd-tree builder), the kd-tree nodes, triangle lists and split position lists are pre-allocated chunks of memory. For example, KdTreeNode class looks like the following:

struct KdTreeNode
{
    union
    {
        struct
        {
            float split;
            int axis, left, tri;
        };
        struct
        {
            float4 __data;
        };
    };
};

It is packed into one float4 value. Left is the index of the left child in the pre-allocated array of KdTreeNodes, and the right child is simply left + 1. The integer tri points at the head index of the triangle list of the node in the pre-allocated triangle list array. The triangle list and split position lists are two big linked lists. Whenever some resource is requested, an element is fetched from the next available element in the lists, and gets returned to the "pool" once it's no longer used. This way, we only need to allocate the memory once.

The GPU tracing part is a stacked-based kd-tree traversal implementation according to Zhou's paper. However, the triangle list isn't optimized into pre-order traversal order yet. This can be done in Step 2 & 3.

For the next week, I plan to complete Step 2 at the minimum, and optionally Step 3 if time permits. Moreover, I will implement more intuitive movement control by using keyboard and let the obj loader support texture mapping.

The following is a video showcasing the current state of the tracer. I used goldfish.obj with 23k triangles. After kd-tree construction using the CPU with 13 maximum levels and 9516 kd-tree nodes, the ray-tracer achieved still a stable 60fps (max FPS for the monitor) with 2 levels of reflection. Pretty satisfactory result for now.


No comments:

Post a Comment