Tuesday, December 18, 2012

Texture Mapping

A few weeks back I started work on another piece of super low-hanging fruit: texture mapping! Before I delve into the details, here's a test render showing three texture mapped spheres with varying degrees of glossiness in a glossy-walled Cornell box. I was also playing with logos for Takua render and put a test logo idea on the back wall for fun:


...and the same scene with the camera tilted down just to show off the glossy floor (because I really like the blurry glossy reflections):


My texturing system can, of course, support textures of arbitrary resolution. The black and white grid and colored UV tile textures in the above render are square 1024x1024, while the Earth texture is rectangular 1024x512. Huge textures are handled just fine, as demonstrated by the following render using a giant 2048x2048, color tweaked version of Simon Page's Space Janus wallpaper:


Of course UV transformations are supported. Here's the same texture with a 35 degree UV rotation applied and tiling switched on:


Since memory is always at a premium, especially on the GPU, I've implemented textures in a fashion inspired by geometry instancing and by node-based material systems, such as Maya's. Inside of my renderer, I represent texture files as a file node containing the raw image data, streamed from disk via stb_image. I then apply transformations, UV operations, etc. through a texture properties node, which maintains a pointer to the relevant texture file node, and materials in turn point to whatever texture properties nodes they need. This way, texture data can be read and stored once in memory and recycled as many times as needed, meaning that a well formatted scene file can eliminate redundant texture reads and storage altogether. This system allows me to create amusing scenes like the following one, where a single striped texture is reused in a number of materials with varied properties:


Admittedly I made that stripe texture really quickly in Photoshop without too much care for straightness of lines, so it doesn't actually tile very well. Hence why the sphere in the lower front shows a discontinuity in its texture... that's not glitchy UVing, just a crappy texture!
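Stepping back to the node setup for a second, here's a rough sketch of what the file node / texture properties node / material arrangement looks like. This is heavily simplified, and the struct and member names are purely illustrative rather than the actual Takua classes:

```cpp
#include <memory>
#include <string>
#include <vector>

// Raw image data, streamed from disk once (via stb_image in the real renderer).
struct TextureFileNode {
    std::string filepath;
    int width = 0, height = 0, channels = 0;
    std::vector<unsigned char> pixels;
};

// Per-use UV transformations and other texture properties; many of these can
// point at the same file node without duplicating any pixel data.
struct TexturePropertiesNode {
    std::shared_ptr<TextureFileNode> file;
    float uvRotationDegrees = 0.0f;
    float uScale = 1.0f, vScale = 1.0f;
    bool tile = false;
};

// Materials just point at whatever texture properties nodes they need.
struct Material {
    std::shared_ptr<TexturePropertiesNode> color;
    std::shared_ptr<TexturePropertiesNode> reflectivity; // any property can be texture driven
};
```

The point is that the heavy pixel data lives in exactly one place no matter how many properties nodes or materials end up referencing it.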

I've also gone ahead and extended my materials system to allow any material property to be driven with a texture. In fact, the stripe room render above is using the same stripe texture to drive reflectiveness on the side walls, resulting in reflective surfaces where the texture is black and diffuse surfaces where the texture is white. Here's another example of texture driven material properties showing emission being driven using the same color-adjusted Space Janus texture from before:


Even refractive and reflective index of refraction can be driven with textures, which can yield some weird/interesting results. Here are a pair of renders showing a refractive red cube with uniform IOR, and with IOR driven with a Perlin noise map:

Uniform refractive IOR

Refractive IOR driven with a Perlin noise texture map

The nice thing about a node-style material representation is that I should be able to easily plug in procedural functions in place of textures whenever I get around to implementing some (that way I can use procedural Perlin noise instead of using a noise texture).

Here's an admittedly kind of ugly render using the color UV grid texture to drive refractive color:


For some properties, I've had to require the user to specify a range of valid values when using a texture map, since 8-bit RGB values don't map well to said properties. An example would be glossiness, where mapping a texture straight onto a gloss value range of 0% to 100% leaves little room for detailed adjustment; the workaround boils down to remapping the texture value into a user-specified range. Of course this issue can also be fixed by adding support for floating point image formats such as OpenEXR, which is coming very soon! In the following render, the back wall's glossiness is being driven using the stripe texture (texture driven IOR is also still in effect on the red refractive cube):


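For reference, the remap behind a texture-driven property like this is trivial; here's a minimal sketch, assuming the texture sample has already been normalized to [0, 1] (the function and parameter names here are just illustrative):

```cpp
// Map a normalized texture value into a user-specified property range,
// e.g. confining texture-driven gloss to a narrow 60%-80% band.
float remapTextureToProperty(float textureValue, float propertyMin, float propertyMax) {
    return propertyMin + textureValue * (propertyMax - propertyMin);
}

// Hypothetical usage: float gloss = remapTextureToProperty(sampleTexture(uv), 0.6f, 0.8f);
```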
Of course, even with nice instancing schemes, textures can potentially take up a gargantuan amount of memory, which poses a huge problem in the GPU world where onboard memory is at a premium. I still need to think more about how I'm going to deal with memory footprints larger than on-device memory, but at the moment my plan is to let the renderer allocate and overflow into pinned host memory whenever it detects that the needed footprint is within some margin of total available device memory. This concern is also a major reason why I've decided to stick with CUDA for now... until OpenCL gets support for a unified address space for pinned memory, I'm not wholly sure how I'm supposed to deal with memory overflow issues in OpenCL. I haven't reexamined OpenCL in a while now though, so perhaps it is time to take another look.

Unfortunately, something I discovered while in the process of extending my material system to support texture driven properties is that my renderer could probably use a bit of refactoring for the sake of organization and readability. Since I now have some time over winter break and am planning on making my Github repo for Takua-RT public soon, I'll probably undertake a bit of code refactoring over the next few weeks.

Friday, December 7, 2012

Blurred Glossy Reflections


Over the past few months I haven't been making as much progress on my renderer as I would have liked, mainly because another major project has been occupying most of my attention: TAing/restructuring the GPU Programming course here at Penn. I'll probably write a post with detailed thoughts and comments about that at the end of the semester. More on that later!

I recently had a bit of extra time, which I used to tackle a piece of super low-hanging fruit: blurred glossy reflections. The simplest brute force approach to blurred glossy reflections is to take the reflected ray direction from specular reflection, construct a lobe around that ray, and sample across the lobe instead of only along the reflected direction. The wider the lobe, the blurrier the glossy reflection gets. The following diagram, borrowed from Wikipedia, illustrates this property:

 
In a normal raytracer or rasterization based renderer, blurred glossy reflections require something of a compromise between speed and visual quality (much like many other effects!), since using a large number of samples within the glossy specular lobe to achieve a high quality reflection can be prohibitively expensive. This cost-quality tradeoff is therefore similar to the tradeoffs that must be made in any distributed raytracing effect. However, in a pathtracer, we're already using a massive number of samples, so we can fold the blurred glossy reflection work into our existing high sample count. In a GPU renderer, we have massive amounts of compute as well, making blurred glossy reflections far more tractable than in a traditional system.
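Concretely, the sampling step can be as simple as picking a direction uniformly within a cone around the perfect specular reflection direction. Here's a rough sketch of that idea; this particular uniform-cone sampler is just illustrative and not necessarily the exact distribution Takua uses:

```cpp
#include <cmath>
#include <random>

struct Vec3 { float x, y, z; };

Vec3 normalize(Vec3 v) {
    float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    return {v.x / len, v.y / len, v.z / len};
}

Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

// Sample a direction uniformly within a cone of half-angle lobeAngle (radians)
// around reflectDir; a wider lobe angle gives a blurrier glossy reflection.
Vec3 sampleGlossyLobe(Vec3 reflectDir, float lobeAngle, std::mt19937& rng) {
    std::uniform_real_distribution<float> uniform(0.0f, 1.0f);
    float u1 = uniform(rng), u2 = uniform(rng);
    float cosTheta = 1.0f - u1 * (1.0f - std::cos(lobeAngle));
    float sinTheta = std::sqrt(1.0f - cosTheta * cosTheta);
    float phi = 2.0f * 3.14159265f * u2;
    // Build an orthonormal basis around the reflection direction.
    Vec3 w = normalize(reflectDir);
    Vec3 helper = std::fabs(w.x) > 0.9f ? Vec3{0.0f, 1.0f, 0.0f} : Vec3{1.0f, 0.0f, 0.0f};
    Vec3 u = normalize(cross(helper, w));
    Vec3 v = cross(w, u);
    return {u.x * std::cos(phi) * sinTheta + v.x * std::sin(phi) * sinTheta + w.x * cosTheta,
            u.y * std::cos(phi) * sinTheta + v.y * std::sin(phi) * sinTheta + w.y * cosTheta,
            u.z * std::cos(phi) * sinTheta + v.z * std::sin(phi) * sinTheta + w.z * cosTheta};
}
```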

The image at the top of this post shows three spheres of varying gloss amounts in a modified Cornell box with a glossy floor and reflective colored walls, rendered entirely inside of Takua-RT. Glossy to glossy light transport is an especially inefficient scenario to resolve in pathtracing, but throwing brute force GPU compute at it allows for arriving at a good solution reasonably quickly: the above image took around a minute to render at 800x800 resolution. Here is another test of blurred glossy reflections, this time in a standard diffuse Cornell box:

 
...and some tests showing varying degrees of gloss, within a modified Cornell box with glossy left and right walls. Needless to say, all of these images were also rendered entirely inside of Takua-RT.

Full specular reflection

Approximately 10% blurred glossy reflection

Approximately 30% glossy reflection

Finally, here's another version of the first image in this post, but with the camera in the wrong place. You can see a bit of the stand-in sky I have right now. I'm working on a sun & sky system, but since it's not ready yet, I have a simple gradient serving as a stand-in. I'll post more about sun & sky when I'm closer to finishing it... I'm not doing anything fancy like Peter Kutz is doing (his sky renderer blog is definitely worth checking out, by the way), just standard Preetham et al. style.

 

Saturday, September 15, 2012

Thoughts on Stackless KD-Tree Traversal

Edit: Erwin Coumans in the comments section has pointed me to a GDC 2009 talk by Takahiro Harada proposing something called Tree Traversal using History Flags, which is essentially the same as the idea proposed in this post, with the small exception that Harada's technique uses a bit field to track previously visited nodes on the up traverse. I think that Harada's technique is actually better than the pointer check I wrote about in this post, since keeping a bit field allows for tracking the previously visited node without having to go back to global memory to do a node check. In other words, the bit field method allows for less thrashing of global memory, which I should think gives a nice performance edge. So, much as I suspected, the idea in this post is one that folks smarter than me have arrived at previously, and my knowledge of the literature on this topic is indeed incomplete. Much thanks to Erwin for pointing me to the Harada talk! The original post is preserved below, in case anyone still has an interest in reading it.



Of course, one of the biggest challenges in implementing a CUDA pathtracer is the lack of recursion on pre-Fermi GPUs. Since I intend for Takua-RT to be able to run on any CUDA-enabled GPU, I necessarily have to work with the assumption that I won't have recursion support. Getting around this limitation in the core pathtracer is not actually a significant issue, as building raytracing systems that operate in an iterative fashion as opposed to a recursive fashion is a well-covered topic.

Traversing a kd-tree without recursion, however, is a more challenging proposition. So far as I can tell from a very cursory glance at the existing literature on the topic, there are presently two major approaches: fully stackless methods that require some amount of pre-processing of the kd-tree, such as the rope-based method presented in Popov et al. [2007], and methods utilizing a short stack or something similar, such as the method presented in Zhou et al. [2008]. I'm in the process of reading both of these papers more carefully, and will probably explore at least one of these approaches soon. In the meantime, however, I thought it might be a fun exercise to try to come up with some solution of my own, which I'll summarize in this post. I have to admit that I have no idea if this is actually a novel approach, or if it's something that somebody else came up with and rejected a long time ago and I just haven't found yet. My coverage of the literature in this area is highly incomplete, so if you, the reader, are aware of a pre-existing version of this idea, please let me know so that I can attribute it properly!

The basic idea I'm starting with is that when traversing a KD-tree (or any other type of tree, for that matter), at a given node there's only a finite number of directions one can go in, and a finite number of previous nodes one could have arrived at the current node from. In other words, one could conceivably define a finite-state-machine-type system for traversing a KD-tree, given an input ray. I say finite-state-machine type because what I shall define here isn't actually strictly an FSM, as this method requires storing information about the previous state in addition to the current state. So here we go:

We begin by tracking two pieces of information: which node we are currently at, and which direction we had to take from the previous node to get to the current node. There are three possible directions we could have come from:

1. Down from the current node's parent node
2. Up from the current node's left child
3. Up from the current node's right child

Similarly, there are only three directions we can possibly travel in from the current node:

1. Up to the current node's parent node
2. Down to the current node's left child
3. Down to the current node's right child

When we travel up from the current node to its parent, we can easily figure out if we are traveling up from the right or the left by looking at whether the current node is the parent node's left or right child. 

Now we need a few rules on which direction to travel in given the direction we came from and some information on where our ray currently is in space:

1. If we came down from the parent node and if the current node is not a leaf node, intersection test our ray with both children of the current node. If the ray only intersects one of the children, traverse down to that child. If the ray intersects both of the children, traverse down to the left child.
2. If we came down from the parent node and if the current node is a leaf node, carry out intersection tests between the ray and the contents of the node, store the nearest intersection, and then traverse back up to the parent node.
3. If we came up from the left child, intersection test our ray with the right child of the current node. If we have an intersection, traverse down the right child. If we don't have an intersection, traverse upwards to the parent.
4. If we came up from the right child, traverse upwards to the parent.

That's it. With those four rules, we can traverse an entire KD-Tree in a DFS fashion, while skipping branches that our ray does not intersect for a more efficient traverse, and avoiding any use of recursion or the use of a stack in memory.
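Here's a rough sketch of those rules written out as an iterative loop. The Node and Ray structs and the two intersection helpers are hypothetical stand-ins for the real KD-tree and ray types:

```cpp
struct Node {
    Node* parent = nullptr;
    Node* left = nullptr;
    Node* right = nullptr;
    bool isLeaf() const { return left == nullptr && right == nullptr; }
};

struct Ray { /* origin, direction, etc. */ };

// Hypothetical stand-ins for the real bounding-box and primitive tests.
bool rayIntersectsNodeBounds(const Ray& r, const Node* n) { /* real AABB test goes here */ return n != nullptr; }
void intersectLeafContents(const Ray& r, const Node* n) { /* test ray against leaf primitives, keep nearest hit */ }

enum class CameFrom { Parent, LeftChild, RightChild };

void traverse(const Ray& ray, Node* root) {
    Node* current = root;
    CameFrom cameFrom = CameFrom::Parent;
    while (current != nullptr) {
        if (cameFrom == CameFrom::Parent) {
            if (!current->isLeaf()) {
                // Rule 1: go down to an intersected child, preferring the left child.
                bool hitLeft = rayIntersectsNodeBounds(ray, current->left);
                bool hitRight = rayIntersectsNodeBounds(ray, current->right);
                if (hitLeft) { current = current->left; continue; }
                if (hitRight) { current = current->right; continue; }
            } else {
                // Rule 2: test leaf contents, then fall through and go back up.
                intersectLeafContents(ray, current);
            }
        } else if (cameFrom == CameFrom::LeftChild) {
            // Rule 3: try the right child before going up.
            if (rayIntersectsNodeBounds(ray, current->right)) {
                cameFrom = CameFrom::Parent;
                current = current->right;
                continue;
            }
        }
        // Rule 4 (and the fallthrough cases above): go up to the parent,
        // remembering which child we were.
        Node* parent = current->parent;
        if (parent != nullptr) {
            cameFrom = (parent->left == current) ? CameFrom::LeftChild : CameFrom::RightChild;
        }
        current = parent;
    }
}
```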

There is, of course, the edge case that our ray is coming in to the tree from the "back", so that the right child of each node is "in front" of the left child instead of "behind", but we can easily deal with this case by simply testing which side of the KD-tree we're entering from and swapping left and right in our ruleset accordingly.

I haven't actually gotten around to implementing this idea yet (as of September 15th, when I started writing this post, although this post may get published much later), so I'm not sure what the performance looks like. There are some inefficiencies in how many nodes our traverse will end up visiting, but on the flipside, we won't need to keep much of anything in memory except for two pieces of state information and the actual KD-tree itself. On the GPU, I might run into implementation level problems that could impact performance, such as too many branching statements or memory thrashing if the KD-tree is kept in global memory and a gazillion threads try to traverse it at once, so these issues will need to be addressed later.

Again, if you, the reader, knows of this idea from a pre-existing place, please let me know! Also, if you see a gaping hole in my logic, please let me know too!

Since this has been a very text heavy post, I'll close with some pictures of a KD-tree representing the scene from the Takua-RT post. They don't really have much to do with the traverse method presented in this post, but they are KD-tree related!


Vimeo's compression really does not like thin narrow lines, so here are some stills:




Monday, September 10, 2012

TAKUA/Avohkii Render


One question I've been asking myself ever since my friend Peter Kutz and I wrapped our little GPU Pathtracer experiment is "why am I writing Takua Render as a CPU-only renderer?" One of the biggest lessons learned from the GPU Pathtracer experiment was that GPUs can indeed provide vast quantities of compute suitable for use in pathtracing rendering. After thinking for a bit at the beginning of the summer, I've decided that since I'm starting my renderer from scratch and don't have to worry about the tons of legacy that real-world big boy renderers like RenderMan have to deal with, there is no reason why I shouldn't architect my renderer to use whatever compute power is available.

With that being said, from this point forward, I will be concurrently developing CPU and GPU based implementations of Takua Render. I call this new overall project TAKUA/Avohkii, mainly because Avohkii is a cool name. Within this project, I will continue developing the C++ based x86 version of Takua, which will retain the name of just Takua, and I will also work on a CUDA based GPU version, called Takua-RT, with full feature parity. I'm also planning on investigating the idea of an ARM port, but that's an idea for later. I'm going to stick with CUDA for the GPU version now since I know CUDA better than OpenCL and since almost all of the hardware I have access to develop and test on right now is NVIDIA based (the SIG Lab runs on NVIDIA cards...), but that could change down the line. The eventual goal is to have a set of renderers that together cover as many hardware bases as possible, and can all interoperate and intercommunicate for farming purposes.

I've already gone ahead and finished the initial work of porting Takua Render to CUDA. One major lesson learned from the GPU Pathtracer experiment was that enormous CUDA kernels tend to run into a number of problems, much like massive monolithic GL shaders do. One problem in particular is that enormous kernels tend to take a long time to run and can result in the GPU driver terminating the kernel, since NVIDIA's drivers by default assume that kernels taking longer than 2 seconds to run are hanging and kill them. In the GPU Pathtracer experiment, we used a giant monolithic kernel for a single ray bounce, which ran into problems as geometry counts went up and intersection testing, and therefore kernel execution time, also increased. For Takua-RT, I've decided to split a single ray bounce into a sequence of micro-kernels that launch in succession. Basically, each operation is now a kernel: each intersection test is a kernel, BRDF evaluation is a kernel, etc. While I suppose I lose a bit of time in having numerous kernel launches, I am getting around the kernel time-out problem.

Another important lesson learned was that culling useless kernel launches matters a great deal. I'm checking for empty threads at the end of each ray bounce and culling via stream compaction for now, but this can of course be further extended to the micro-kernels for intersection testing later.
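As a CPU-side illustration of the compaction idea (on the GPU this would be a parallel scan-based compact rather than std::remove_if), the operation boils down to something like this:

```cpp
#include <algorithm>
#include <vector>

// After each bounce, remove path states that have terminated so that later
// kernel launches only cover active work. PathState is an illustrative
// stand-in for the real per-path data.
struct PathState {
    bool terminated = false;
    // throughput, pixel index, current ray, etc. would live here
};

void compactPaths(std::vector<PathState>& paths) {
    paths.erase(std::remove_if(paths.begin(), paths.end(),
                               [](const PathState& p) { return p.terminated; }),
                paths.end());
}
```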

Anyhow, enough text. Takua-RT, even in its super-naive unoptimized CUDA-port state right now, is already so much faster than the CPU version that I can render frames with fairly high convergence in seconds to minutes. That means the renderer is now fast enough to use for rendering animations, such as the one at the top of this post. No post-processing whatsoever was applied, aside from my name watermark in the lower right hand corner. The following images are raw output frames from Takua-RT, this time straight from the renderer, without even watermarking:
 







Each of these frames represents 5000 iterations of convergence, and took about a minute to render on an NVIDIA GeForce GTX 480. The flickering in the glass ball in the animated version comes from having a low trace depth of 3 bounces, including for glass surfaces.

Sunday, September 9, 2012

Jello KD-Tree

I've started an effort to clean up, rewrite, and enhance my ObjCore library, and part of that effort includes taking my KD-Tree viewer from Takua Render and making it a standard component of ObjCore. As a result, I can now plug the latest version of ObjCore into any of my projects that use it and quickly wire up a KD-Tree view for that project. Here's the jello sim from a few months back visualized as a KD-Tree:


I've adopted a new standard grey background for OpenGL tests, since I've found that the higher contrast this darker grey provides plays nicer with Vimeo's compression and yields a clearer result. But of course I'll still post stills too.

Hopefully at the end of this clean up process, I'll have ObjCore in a solid enough of a state to post to Github.

Wednesday, September 5, 2012

Volumetric Renderer Revisited

I've been meaning to add animation support to my volume renderer for demoreel purposes for a while now, so I did that this week! Here's a rotating cloud animation:


...and of course, a still or two:


Instead of just rotating the camera around the cloud, I wanted the cloud itself to rotate while the noise field it samples stays stationary, resulting in a cool kind of morphing effect on the cloud's actual shape. In order to author animations easily, I implemented a fairly rough, crude version of Maya integration: I wrote a script that takes spheres and point lights in Maya and builds a scene file for my volume renderer, using the Maya spheres to define cloud clusters and the point lights to define... well... lights. With an easy bit of scripting, I can do this for each frame in a keyframed animation in Maya and then simply call the volume renderer once per frame. Here's what the above animation's Maya scene file looks like:



Also, since my pseudo-blackbody trick was originally intended to simulate the appearance of a fireball, I tried creating an animation of a fireball by just scaling a sphere:


...and as usual again, stills:

So that's that for the volume renderer for now! I think this might be the end of the line for this particular incarnation of the volume renderer (it remains the only piece of tech I'm keeping around that is more or less unmodified from its original CIS460/560 state). I think the next time I revisit the volume renderer, I'm either going to port it entirely to CUDA, as my good friend Adam Mally did with his, or I'm going to integrate it into my renderer project, Peter Kutz style.

Thursday, August 16, 2012

More Experiments with Trees

Every once in a while, I return to trying to make a good looking tree. Here's a frame from my latest attempt:

Have I finally managed to create a tree that I'm happy with? Well..... no. But I do think this batch comes closer than previous attempts! I've had a workflow for creating base tree geometry for a while now that I'm fairly pleased with, which is centered around using OnyxTREE as a starting point and then custom sculpting in Maya and Mudbox. However, I haven't tried actually animating trees before, and shading trees properly has remained a challenge. So, my goal this time around was to see if I could make any progress in animating and shading trees.

As a starting point, I played with just using the built-in wind simulation tools in OnyxTREE, which were admittedly difficult to control. I found that medium to high windspeeds usually led to random branches glitching out and jumping all over the place. I also wanted to make a weeping willow style tree, and even medium-low windspeeds often produced hilarious results:

It turns out OnyxTREE runs fine in Wine on OSX. Huh

A bigger problem, though, was the sheer amount of storage space that exporting animated tree sequences from Onyx to Maya requires. The only way to bring Onyx simulations into programs other than 3ds Max is to export the simulation as an obj sequence from Onyx and then import the sequence into whatever program you're using. Maya doesn't have a native method to import obj sequences, so I wrote a custom Python script to take care of it for me. Here's a short compilation of some results:


One important thing I discovered was that the vertex numbering in each obj frame exported from Onyx remains consistent; this fact allowed for an important improvement. Instead of storing a gazillion individual frames of obj meshes, I experimented with dropping a large number of intermediate frames and leaving a relatively smaller number of keyframes which I then used as blendshape frames with more scripting hackery. This method works rather well; in the above video, the weeping willow at the end uses this approach. There is, however, a significant flaw with this entire Onyx based animation workflow: geometry clipping. Onyx's system does not resolve cases where leaves and entire branches clip through each other... while from a distance the trees look fine, up close the clipping can become quite apparent. For this reason, I'm thinking about abandoning the Onyx approach altogether down the line and perhaps experimenting with building my own tree rigs and procedurally animating them. That's a project for another day, however. 

On the shading front, my basic approach is still the same: use a Vray double sided material with a waxier, more specular shader for the "front" of the leaves and a more diffuse shader for the "back". In real life, leaves of course display an enormous amount of subsurface scattering, but leaves are a special case for subsurface scatter: they're really really thin! Normally subsurface scattering is a rather expensive effect to render, but for thin material cases, the Vray double sided material can quite efficiently approximate the subsurface effect for a fraction of the rendertime.


Bark is fairly straightforward too; it all comes down to the displacement and bump mapping. Unfortunately, the limbs in the tree models I made this time around were straight because I forgot to go in and vary them up/sculpt them. Because of the straightness, my tree twigs don't look very good this time, even with a decent shader. Must remember for next time! Creating displacement bark maps from photographs or images sourced from Google Image Search or wherever is really simple; take your color texture into Photoshop, slam it to black and white, and adjust contrast as necessary:


Here's a few seconds of rendered output with the camera inside of the tree's leaf canopy, pointed skyward. It's not exactly totally realistic looking yet, meaning it needs more work of course, but I do like the green-ness of the whole thing. More importantly, you can see the subsurface effect on the leaves from the double sided material!

 

Something that continues to prove challenging is how my shaders hold up at various distances. The exact same shader (with a different leaf texture) looks great from a distance, but loses realism when the camera is closer. I did a test render of the weeping willow from further away using the same shader, and it looks a lot better. Still not perfect, but closer than previous attempts:

 

...and of course, a pretty still or two:


A fun experiment I tried was building a shader that can imitate the color change that occurs as fall comes around. This shader is in no way physically based; it's just a pure mix function controlled through keyframes. Here's a quick test showing the result:

 

Eventually, building a physically based leaf BSSRDF system might be a fun project for my own renderer. Speaking of which, I couldn't resist throwing the weeping willow model through my KD-tree library to get a tree KD-tree:


Since the Vimeo compression kind of borks thin lines, here's a few stills:



Alright, that's all for this time! I will most likely return to trees yet again perhaps a few weeks or months from now, but for now, much has been learned!

Saturday, July 14, 2012

Random Point Sampling On Surfaces

Just a heads up, this post is admittedly more of a brain dump for myself than it is anything else.

A while back I implemented a couple of fast methods to generate random points on geometry surfaces, which will be useful for a number of applications, such as direct lighting calculations involving area lights.


The way I'm sampling random points varies by geometry type, but all methods are pretty simple. Right now the system is implemented such that I can give the renderer a global point density to follow, and points will be generated according to that density value. This means the number of points generated on each piece of geometry is directly linked to the geometry's surface area.

For spheres, the method I use is super simple: get the surface area of the sphere, generate random UV coordinates, and map those coordinates back to the surface of the sphere. This method is directly pulled from this Wolfram Mathworld page, which also describes why the most naive approach to point picking on a sphere is actually wrong.
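For reference, here's a minimal sketch of that sphere point picking method; the struct and names here are just illustrative:

```cpp
#include <cmath>
#include <random>

// Uniform point picking on a sphere via random "UV" coordinates, following
// the Wolfram MathWorld approach. Naively using uniform spherical angles
// would bunch points at the poles; acos(2v - 1) corrects for that.
struct Point3 { float x, y, z; };

Point3 randomPointOnSphere(Point3 center, float radius, std::mt19937& rng) {
    std::uniform_real_distribution<float> uniform(0.0f, 1.0f);
    float u = uniform(rng);
    float v = uniform(rng);
    float theta = 2.0f * 3.14159265f * u;    // azimuthal angle
    float phi = std::acos(2.0f * v - 1.0f);  // polar angle, corrected for area
    return {center.x + radius * std::cos(theta) * std::sin(phi),
            center.y + radius * std::sin(theta) * std::sin(phi),
            center.z + radius * std::cos(phi)};
}
```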

My approach for ellipsoids unfortunately is a bit brute force. Since getting the actual surface area for an ellipsoid is actually fairly mathematically tricky, I just approximate it and then use plain old rejection sampling to get a point.
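One way to set up that rejection sampling (a sketch of the general idea, not necessarily exactly what my implementation does) is to pick a direction on the unit sphere, stretch it onto the ellipsoid, and accept with probability proportional to the local area stretch factor:

```cpp
#include <algorithm>
#include <cmath>
#include <random>

struct P3 { float x, y, z; };

// Uniform point on the surface of an axis-aligned ellipsoid with radii a, b, c.
P3 randomPointOnEllipsoid(float a, float b, float c, std::mt19937& rng) {
    std::uniform_real_distribution<float> uniform(0.0f, 1.0f);
    float gMax = std::max(a * b, std::max(b * c, a * c));
    while (true) {
        // Uniform direction on the unit sphere (MathWorld-style point picking).
        float theta = 2.0f * 3.14159265f * uniform(rng);
        float phi = std::acos(2.0f * uniform(rng) - 1.0f);
        float nx = std::cos(theta) * std::sin(phi);
        float ny = std::sin(theta) * std::sin(phi);
        float nz = std::cos(phi);
        // Area stretch factor when mapping the sphere point onto the ellipsoid.
        float gx = b * c * nx, gy = a * c * ny, gz = a * b * nz;
        float g = std::sqrt(gx * gx + gy * gy + gz * gz);
        if (uniform(rng) * gMax <= g) {
            return {a * nx, b * ny, c * nz};
        }
    }
}
```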

Boxes are the easiest of the bunch; find the surface area of each face, randomly select a face weighted by the proportion of the total surface area that face comprises, and then pick a random x and y coordinate on that face. The method I use for meshes is similar, just on potentially a larger scale: find the surface area of all of the faces in the mesh and select a face randomly weighted by the face's proportion of the total surface area. Then instead of generating random cartesian coordinates, I generate a random barycentric coordinate, and I'm done.
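Here's a rough sketch of the mesh case, with area-weighted triangle selection followed by a uniform barycentric sample; the types here are illustrative stand-ins for my actual mesh classes, and in practice the area table would be precomputed per mesh rather than rebuilt every call:

```cpp
#include <cmath>
#include <random>
#include <vector>

struct V3 { float x, y, z; };

static V3 sub(V3 a, V3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static V3 cross(V3 a, V3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
static float length(V3 a) { return std::sqrt(a.x * a.x + a.y * a.y + a.z * a.z); }

struct Triangle { V3 v0, v1, v2; };

float triangleArea(const Triangle& t) {
    return 0.5f * length(cross(sub(t.v1, t.v0), sub(t.v2, t.v0)));
}

V3 randomPointOnMesh(const std::vector<Triangle>& tris, std::mt19937& rng) {
    // Select a triangle weighted by its share of the total surface area.
    std::vector<float> areas;
    for (const Triangle& t : tris) areas.push_back(triangleArea(t));
    std::discrete_distribution<size_t> pickFace(areas.begin(), areas.end());
    const Triangle& t = tris[pickFace(rng)];

    // Uniform barycentric coordinates on the chosen triangle (the sqrt keeps
    // the distribution uniform over the triangle's area).
    std::uniform_real_distribution<float> uniform(0.0f, 1.0f);
    float su = std::sqrt(uniform(rng));
    float b0 = 1.0f - su;
    float b1 = uniform(rng) * su;
    float b2 = 1.0f - b0 - b1;
    return {b0 * t.v0.x + b1 * t.v1.x + b2 * t.v2.x,
            b0 * t.v0.y + b1 * t.v1.y + b2 * t.v2.y,
            b0 * t.v0.z + b1 * t.v1.z + b2 * t.v2.z};
}
```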


The method that I'm using right now is purely random, so there's no guarantee of equal spacing between points initially. Of course, as one picks more and more points, the spacing between any given set of points will converge toward something like equal spacing, but that would take a lot of random points. I've been looking at this Dart Throwing On Surfaces paper for ideas, but at least for now, this solution should work well enough for what I want it for (direct lighting). But we shall see!


Also, as I am sure you can guess from the window chrome on that last screenshot, I've successfully tested Takua Render on Linux! Specifically, on Fedora!

Thursday, July 5, 2012

Thoughts on Ray Bounce Depth

I finally got around to doing a long overdue piece of analysis on Takua Render: looking at the impact of ray bounce depth on performance and on the final image.

Of course, in real life, light can bounce around (almost) indefinitely before it is either totally absorbed or enters our eyeballs. Unfortunately, simulating this behavior completely is extremely difficult in any type of raytracing solution, since letting a ray bounce around indefinitely until it does something interesting can lead to extremely long render times. Thus, one of the first shortcuts that most raytracing (and therefore pathtracing) systems take is cutting off rays after they bounce a certain number of times. This strategy should not have much of an impact on the final visual quality of a rendered image, since the more a light ray bounces around, the less each successive bounce contributes to the final image anyway.
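To make the cutoff concrete, here's a minimal sketch of how a maximum bounce depth cap fits into an iterative pathtracing loop; the scene intersection and BRDF sampling calls are hypothetical stand-ins, and the point is just the hard cap on bounces:

```cpp
#include <random>

struct Color { float r = 0.0f, g = 0.0f, b = 0.0f; };
struct Ray { /* origin, direction */ };
struct Hit { bool valid = false; bool isLight = false; Color emission; };

// Stand-in scene queries assumed to exist elsewhere in the renderer.
Hit intersectScene(const Ray& r) { return Hit{}; }
Ray sampleNextBounce(const Ray& r, const Hit& h, std::mt19937& rng, Color& throughput) { return r; }

Color tracePath(Ray ray, int maxBounceDepth, std::mt19937& rng) {
    Color radiance;
    Color throughput{1.0f, 1.0f, 1.0f};
    for (int bounce = 0; bounce < maxBounceDepth; ++bounce) {
        Hit hit = intersectScene(ray);
        if (!hit.valid) break;           // ray escaped the scene
        if (hit.isLight) {               // ray found a light: accumulate and terminate
            radiance.r += throughput.r * hit.emission.r;
            radiance.g += throughput.g * hit.emission.g;
            radiance.b += throughput.b * hit.emission.b;
            break;
        }
        // Otherwise attenuate throughput by the surface BRDF and keep bouncing.
        ray = sampleNextBounce(ray, hit, rng, throughput);
    }
    return radiance;                     // paths longer than maxBounceDepth are simply cut off
}
```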

With that in mind, I did some tests with Takua Render in hopes of finding a good balance between ray bounce depth and quality/speed. The following images of a glossy white sphere in a Cornell box were rendered on a quad-core 2.5 GHz Core i5 machine.

For a reference, I started with a render with a maximum ray bounce depth of 50 and 200 samples per pixel:

Max Bounce Depth of 50, 200 iterations, took 1325 seconds to render.

Then I ran a test render with a maximum of just 2 bounces; essentially, this represents the direct lighting part of the solution only, albeit generated in a Monte Carlo fashion. Since I made the entire global limit 2 bounces, no reflections of the walls show up on the sphere, just the light overhead. Note the total lack of color bleeding and the dark shadow under the ball.

Max Bounce Depth of 2, 200 iterations, took 480 seconds to render.

The next test was with a maximum of 5 bounces. In this test, nice effects like color bleeding and indirect illumination are back! However, compared to the reference render, the area under the sphere still has a bit of dark shadowing, much like what one would expect if an ambient occlusion pass had been added to the image. While not totally accurate to the reference render, this image under certain artistic guidelines might actually be acceptable, and renders considerably faster.

Max Bounce Depth of 5, 200 iterations, took 811 seconds to render.

Differencing the 5 bounce render from the reference 50 bounce render shows that the 5 bounce one is ever so slightly dimmer and that most of the difference between the two images is in the shadow area under the sphere. Ignore the random fireflying pixels, which are just a result of standard pathtracing variance in the renders:

5 bounce test differenced with the 50 bounce reference.

The next test was 10 bounces. At 10 bounces, the resultant image is essentially visually indistinguishable from the 50 bounce reference, as the included difference image shows. This result implies that beyond 10 bounces, the contributions of successive bounces to the final image are more or less negligible.

Max Bounce Depth of 10, 200 iterations, took 995 seconds to render.

10 bounce test differenced with the 50 bounce reference. Note that there is essentially no difference.

Finally, a test with a maximum of 20 bounces is still essentially indistinguishable from both the 10 bounce test and the 50 bounce reference:

Max Bounce Depth of 20, 200 iterations, took 1277 seconds to render.

Interestingly, render times do not scale linearly with maximum bounce depth! The reason can be found in the fact that the longer a ray bounces around, the more likely it is to find a light source and terminate. At 20 bounces, the odds of a ray finding a light source are very close to the odds at 50 bounces, which explains why the gap in render time between 20 and 50 bounces is so small (especially compared to the difference in render time between, say, 2 and 5 bounces).