A raytracer in python – part 4: profiling

After having finally obtained a raytracer which produces antialiasing, it is now time to take a look at performance. We already saw some numbers in the last post. Rendering a 200×200 image with 16 samples per pixels (a grand total of 640.000 rays) takes definitely too much. I want to perform some profiling with python, find the hotspots in the code, and eventually devise a strategy to optimize them.

General profiling with cProfile

To perform basic profiling, I used the cProfile program provided in the standard library. It appears that the longest processing time is in the hit() function

$ python -m cProfile -s time test3.py
 Ordered by: internal time

 ncalls  tottime  percall  cumtime  percall filename:lineno(function)
 2560000  85.329    0.000  108.486    0.000 Sphere.py:12(hit)
 1960020  30.157    0.000   30.157    0.000 {numpy.core.multiarray.array}
 7680000  23.115    0.000   23.115    0.000 {numpy.core._dotblas.dot}
 1        19.589   19.589  195.476  195.476 World.py:25(render)
 2560000   7.968    0.000  116.454    0.000 World.py:62(f)
 640000    6.710    0.000  133.563    0.000 World.py:61(hit_bare_bones_object)
 640025    4.438    0.000  120.902    0.000 {map}
 640000    3.347    0.000  136.910    0.000 Tracer.py:5(trace_ray)
 640000    3.009    0.000    3.009    0.000 {numpy.core.multiarray.zeros}
 640000    2.596    0.000    2.613    0.000 {sorted}
 640000    2.502    0.000    3.347    0.000 {filter}
 640000    1.835    0.000   16.784    0.000 Ray.py:4(__init__)

This does not surprise me, as the main computation a raytracer performs is to test each ray for intersection on the objects in the scene, in this case multiple Sphere objects.

Profiling line by line for hot spots

Understood that most of the time is spent into hit(), I wanted to perform line-by-line profiling. This is not possible with the standard python cProfile module, therefore I searched and found an alternative, line_profiler:

$ easy_install-2.7 --prefix=$HOME line_profiler
$ kernprof.py -l test3.py
Wrote profile results to test3.py.lprof
$ python -m line_profiler test3.py.lprof

Before running the commands above, I added the @profile decorator to the method I am interested in. This decorator is added by line_profiler to the __builtin__ module, so no explicit import statement is needed.

class Sphere(object):
    <...>
    @profile
    def hit(self, ray):
         <...>

The results of this profiling are

Line # Hits  Time    Per Hit  % Time Line Contents
==============================================================
12                                   @profile
13                                   def hit(self, ray):
14 2560000  27956358  10.9     19.2     temp = ray.origin - self.center
15 2560000  17944912   7.0     12.3     a = numpy.dot(ray.direction, ray.direction)
16 2560000  24132737   9.4     16.5     b = 2.0 * numpy.dot(temp, ray.direction)
17 2560000  37113811  14.5     25.4     c = numpy.dot(temp, temp) \
                                              - self.radius * self.radius
18 2560000  20808930   8.1     14.3     disc = b * b - 4.0 * a * c
19
20 2560000  10963318   4.3      7.5     if (disc < 0.0):
21 2539908   5403624   2.1      3.7         return None
22                                      else:
23   20092     75076   3.7      0.1         e = math.sqrt(disc)
24   20092    104950   5.2      0.1         denom = 2.0 * a
25   20092    115956   5.8      0.1         t = (-b - e) / denom
26   20092     83382   4.2      0.1         if (t > 1.0e-7):
27   20092    525272  26.1      0.4            normal = (temp + t * ray.direction)\
                                                           / self.radius
28   20092    333879  16.6      0.2            hit_point = ray.origin + t * \
                                                              ray.direction
29   20092    299494  14.9      0.2            return ShadeRecord.ShadeRecord(
                                                         normal=normal,
                                                         hit_point=hit_point,
                                                         parameter=t,
                                                         color=self.color)

Therefore, it appears that most of the time is spent in this chunk of code:

temp = ray.origin - self.center
a = numpy.dot(ray.direction, ray.direction)
b = 2.0 * numpy.dot(temp, ray.direction)
c = numpy.dot(temp, temp) - self.radius * self.radius
disc = b * b - 4.0 * a * c

We cannot really optimize much. We could precompute self.radius * self.radius, but it does not really have an impact. Something we can observe is the huge amount of routine calls. Is the routine call overhead relevant ? Maybe: Python has a relevant call overhead, but a very simple program like this

def main():
    def f():
        return 0
    a=0
    for i in xrange(2560000):
        if f():
            a = a+1

    print a

main()

is going to take 0.6 seconds, not small, but definitely not as huge as the numbers we see. Why is that ? And why is the raytracer so slow for the same task ? I think the bottleneck is somewhere else.

Finding the problem

I decided to profile World.render() to understand what’s going on: this is the routine in charge of going through the pixels, shooting the rays, then delegating the task of finding intersections to Tracer.trace_ray, which in turns re-delegates the task to World.hit_bare_bone_object. I don’t really like this design, but I stick to the book as much as possible, mostly because I don’t know how things will become later on.

The profiling showed two hot spots in World.render(), in the inner loop:

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================

    41    640000     18786192     29.4     29.2  ray = Ray.Ray(origin = origin,
                                                               direction = (0.0,0.0,-1.0))
    42
    43    640000     22414265     35.0     34.9  color += numpy.array(tracer.trace_ray(ray))

Why is it so slow to perform these two operations? It turns out that numpy is incredibly slow at creating arrays. This may indeed be the reason why it’s so slow to instantiate a Ray object (two numpy.arrays), to add the color (another instantiation) and to perform operations in the Sphere.hit slow lines. At this point I’m not sure I can trust numpy.array, and I decide to remove it completely replacing arrays with tuples. The result is pleasing

$ time python test3.py
real	0m31.215s
user	0m29.923s
sys	0m2.355s

This is an important point: tuples are much faster than small arrays. numpy seems to be optimized for large datasets and performs poorly when handling small ones. This includes not only the creation of the arrays, but also any operation in numpy that may create numpy arrays as a consequence, such as calling numpy.dot on two tuples instead of a trivial implementation such as

def dot(a,b):
    return a[0]*b[0]+a[1]*b[1]+a[2]*b[2]

in fact, if I use numpy.dot on tuples in Sphere.hit():

 a = numpy.dot(ray.direction, ray.direction)
 b = 2.0 * numpy.dot(temp, ray.direction)
 c = numpy.dot(temp, temp) - self.radius * self.radius

the total running time goes from 31 seconds to a staggering 316 seconds (5 minutes). My guess is that they are converted to numpy.arrays internally, followed by the actual vector-vector operation.

I call myself happy with a runtime of 30 seconds for now, and plan to optimize further when more complex operations are performed. You can find the version for this post at github.

NASA image shows tree density in United States

This impressive image, released a few months ago, shows the tree density in the United States (click to enlarge)

According to the NASA Earth Observatory website, the map has been obtained from a variety of sources, from ground data collection to space-based radar, requiring 6 years of work. At the highest resolution, ten pixels correspond to an hectare of land. It shows how little of the territory is actually covered in trees, either because of territory constraints (height, climate) or because of agriculture needs. As I said in a previous post, trees are fundamental (and cheap) carbon dioxide gatherers, but they also tend to require a lot of water and proper climate.

Academia StackExchange started

More than a year ago, I decided to propose a Question/Answer site for Academia on the StackExchange platform. The idea was to gather academics to share their expertise in academic career strategies, grant proposals, publication process, visa and immigration for academics, and generally all the troubles of an academic lifestyle.

A few days ago, the site finally reached the threshold commitment, went through private beta, and is now open to the public in beta stage. It is now possible to join in and ask questions or give answers.

Mountain Lion: freedom is no longer cool.

This is a rant.

Here is a 10 years long Apple customer, generally satisfied with the quality of the product, but my stance started to change recently, and radically. Apple is pushing it too far. As I smelled long ago with the introduction of the App marketplace, their plan apparently is to make OSX like iOS: Apple uniquely decides what can or can’t run on your computer, it can remove software remotely from your device, force you to pay to develop and release your software, wants to inspect your code to approve it, and wants a share of your profits. If you didn’t expect it, then take a look at Mountain Lion “Gatekeeper” option, which, rest assured, will magically disappear as soon as convenient. Possible scenarios include:

  • Your software is “politically incorrect”? Won’t work on macs.
  • Your software competes with Apple’s applications ? Won’t work on macs.
  • Your software is stomping on someone else’s feet? Won’t work on macs.
  • Want to write some code ? Pay the Apple tribute or it won’t work on macs.
  • Have a scripting language interpreter ? Won’t work on macs.

I left Windows because I considered it a trap, and I moved to Linux. I then left Linux for OSX because it gave me the freedom of a Unix environment with a nice and polished interface. I am not your slave Apple. Keep your controlling crap away from my computer, away from my life.

More jaw-dropping advances in prosthetics

I already posted about advances in prosthetics some time ago, but this is beyond words. You may want to apply the wadsworth constant to the movie.

A raytracer in python – part 3: samplers

In the previous post, we explored a very basic way of plotting images: shooting a ray from the center of every pixel, and plot the color of the object we hit. The result is a rather flat, very jagged image

Border jagging arises from the fact that we are sampling with a discrete grid (our ViewPlane) an object that is smooth due to its functional expression (our spheres). The sampling is by nature approximated, and when mapped to pixels it produces either the color of the object, or the color of the background.

We would like to remove the jagged behavior. To achieve this, we could increase the resolution. That would make the jaggies less appreciable thanks to a higher number of pixels. An alternative is to keep the lower resolution, and shoot many rays per each pixels instead of only one (a technique called supersampling). To compute the final color, weighting is performed according to the number of rays that impact vs. the total number of rays sent from that pixel. This technique is known as anti-aliasing.

The figure details visually what said above: the real description of the sphere is smooth (left hand figure). Shooting one ray per pixel, and coloring the pixel according to hit/no-hit, produces either a fully colored pixel, or a fully background pixel (center). With supersampling, we shoot a higher number of rays, and per each pixel we perform weighting of the color.

As a result, the jaggies in the sphere are replaced with a smoother, more pleasant transition

Choice of Samplers

The pattern used for the supersampled rays is important. The module samplers in the python raytracer implements three common strategies.

Regular sampling

It is the most intuitive and easy, and the one used above: a regular grid of rays. It is easy to implement but it can introduce unpleasant artifacts for more complex situations. Plotting the position of the rays in the pixel will produce the following layout (for a 8×8 supersample grid)

As we see, the layout is regular on the grid of subcells (painted yellow and white for better visualization) that define the pixel. On the vertical and horizontal distributions (plotted on the grey bars above and on the left) we also see a regular distribution, but its regularity may introduce artifacts in some cases.

Random sampling

Random sampling positions the rays at random within the pixel. This may sound appealing, but it may instead end up as suboptimal: it can produce random clumping, in particular for a small number of samples. This will unbalance the weighting leading to an incorrect evaluation. Plotting one distribution one may obtain

Note the uneven distribution of the points, leaving large parts not sampled and other parts oversampled. In addition, the vertical and horizontal distribution tend to be uneven.

Jittered sampling

Jittered sampling takes the best of both worlds: the regularity of the Regular sampler with a degree of randomness from the Random sampler. The idea is to select the center of each subcell and apply randomization, so that each subcell produces only one ray, but without the artifact inducing regularity proper of the Regular sampler.

Computational cost impact

Unfortunately, performing supersampling makes the creation of the image considerably slower. In the following table you can see the timings (in seconds) for Jittered and Regular sampling, compared against the case with no sampling. The size of the image rendered is 200×200 pixels.

Sample size/sets No sampling Regular Random Jittered
1/1 23
4/1 68 63 160
9/1 158 152 271
16/1 229 235 276
16/2 223 247 267
16/4 240 277 260

As we can see, supersampling introduces a considerably higher computational effort. We also see that having multiple sets (a requirement to prevent the same set of subsamples to be reused for adjacent pixels, something that again would introduce artifacts) does not really change the timings. According to the current implementation, I expect this to be verified. On the other hand, I don’t expect timings for Regular and Jittered to be so different, since the creation of the values is performed once and for all at startup. This is worth investigating while looking for performance improvement. In the next post I will perform profiling of the python code and check possible strategies to reduce the timings, eventually revising the current design.

Current implementation

The current implementation of python-raytrace can be found at github. In this release, I added the samplers. Samplers are derived classes of the BaseSampler class, and are hosted in the samplers module. Derived Samplers must reimplement _generate_samples. Points are stored because we want to be able to select sets at random as well as replay the same points. Samplers also reimplements the __iter__() method as a generator of (x,y) tuples, with x and y being in the interval [0.0, 1.0). Once initialized, the Sampler can therefore be iterated over with a simple

for subpixel in sampler:
    # use subpixel

The World class can now be configured with different Samplers for the antialiasing. The default Sampler is a Regular 1 subpixel Sampler, which is the original one-ray-per-pixel sampling.

Linear algebra courses at MIT from Prof. Gilbert Strang

Linear algebra is fundamental mathematical knowledge for those who need to perform computational natural sciences. It is a neat formalism to express things in a compact way, and describe precious algorithms to solve computational problems from chemistry, physics, astronomy, and so on.

I found these precious and very clear lectures from MIT professor Gilbert Strang. The lectures are available to the general public under the MIT OpenCourseWare distant learning initiative. I am pretty sure not to say anything excessive when I claim that such initiatives should be declared Intangible Heritage of Humanity due to their important role in scientific advancement and education.

Here is the list of the lectures, linked to the proper material on the MIT OCW website:

  1. The geometry of linear equations
  2. Elimination with matrices
  3. Multiplication and inverse matrices
  4. Factorization into A=LU
  5. Transposes, permutation, spaces R^n
  6. Column space and nullspace
  7. Solving Ax=0: pivot variables, special solutions
  8. Solving Ax=b: row reduced form R
  9. Independence, basis and dimension
  10. The four fundamental subspaces
  11. Matrix spaces; rank 1; small world graphs
  12. Graphs, networks, incidence matrices
  13. Quiz 1 review
  14. Orthogonal vectors and subspaces
  15. Projections onto subspaces
  16. Projection matrices and least squares
  17. Orthogonal matrices and Gram-Schmidt
  18. Properties of determinants
  19. Determinant formulas and cofactors
  20. Cramer’s rule, inverse matrix, and volume
  21. Eigenvalues and Eigenvectors
  22. Diagonalization and powers of A
  23. Differential equations and exp(At)
  24. Markov matrices; Fourier seriesQuiz 2 review
  25. Symmetric matrices and positive definiteness
  26. Complex matrices; fast Fourier transform
  27. Positive definite matrices and minima
  28. Similar matrices and Jordan form
  29. Singular value decomposition
  30. Linear transformations and their matrices
  31. Change of basis; image compression
  32. Quiz 3 review
  33. Left and right inverses; pseudoinverse
  34. Final course review

The third eye of the iguana

Did you know that iguanas (and not only them) have three eyes ?

the feature is called Parietal eye, it is highly advantageous to spot the shadow of a predator coming from above, and it’s not unique to iguanas, but it can be found in frogs, lizards, sharks and other reptiles and fishes. In the Tuatara the eye is way more developed, but it regresses as the animal gets older. All these eyes can influence the pineal gland, apparently responsible for the circadian rhythm regulation.

Offsetting carbon emissions with trees: a little math

I want to do a thought experiment, hoping not to make any mistake.

The world produces 29,888,121,000 metric tons of CO2 per year. According to Wikipedia, one ton of dry wood sequestrates 1.8 tons of CO2. If you put these two together, you quickly realize that you need to produce 16,604,511,666 tons of wood per year in order to offset the worldwide CO2 production.

The largest single stem tree available today is the General Sherman giant sequoia. It is 83.8 meters tall, with a diameter of 7 meters. Its estimated dry weight is 1121 tons. It took thousands of years to reach that size, and it’s a unique tree, but I will use it as a greatly exaggerated upper limit to the size and mass of a tree. I estimate a normal tree as a fifth of that, or 250 tons.  In order to compensate the CO2 emission, we need to plant and grow to full size 66 millions trees. Assuming the tree occupies an area of 10 x 10 square meter this adds up to a required forest size of 6600 square kilometers, which is one sixth of Switzerland. Every year.

Of course, I am not considering the fact that a tree must grow to full size in order to offset the CO2. If we do, and if we assume it takes 20 years for a tree to reach the size given above, we need to plant a little more than three Switzerlands of trees right now, to offset the CO2 production of the next 20 years, for a grand total of 1.3 billion trees. Considering the population of Europe is 852 million people, it means that each of us needs to plant a little more than one tree.

I’m working on that. And you ?

 

A raytracer in python – part 2: rendering multiple objects

A quick addition needed to the raytracer is providing freedom to add more objects to the rendering scene. In Part 1, the design was such that only one object, a sphere, could be drawn. The new code allows much more flexibility. I added a Plane object, introduced assignment of colors to the objects, divided the source into multiple files, and fixed a bug relative to rendering direction. Let’s see more specifically.

The class World is the main interface to user programming of the raytracer. This small program creates and renders a scene containing four spheres, a white one in the origin, the others along the axes, each with different colors.

import raytrace
from raytrace import objects

w=raytrace.World()
w.add_object(objects.Sphere(center=(0.0,0.0,0.0),
                            radius=10.0,
                            color=(1.0,1.0,1.0)
                           )
            )
w.add_object(objects.Sphere(center=(50.0,0.0,0.0),
                            radius=10.0,
                            color=(1.0,0.0,0.0)
                           )
            )
w.add_object(objects.Sphere(center=(0.0,50.0,0.0),
                            radius=10.0,
                            color=(0.0,1.0,0.0)
                           )
            )
w.add_object(objects.Sphere(center=(0.0,0.0,50.0),
                            radius=10.0,
                            color=(0.0,0.0,1.0)
                            )
            )
w.render()

The resulting image is the following

Clearly, the white sphere is not visible, as it’s hidden by the blue sphere. This test allowed me to discover a problem with orientation: the green sphere was on the wrong side. I therefore had to analyze a bit the orientation and the different coordinate systems into play here

  1. The measure of the screen is given in pixels. We are used to this when it comes to screen size, for example. An image which is 320×200 means that it’s 320 pixels wide (horizontal resolution) and 200 pixel high (vertical resolution). Don’t fall into “thinking matrix”, where NxM means N rows x M columns. It is the exact opposite.
  2. The geometry uses the cartesian system, which has the origin in the center of the picture. The x axis is oriented towards the right, the y axis towards the top, and the z axis towards the observer. This is not different from a traditional cartesian layout: for the z=0 plane, the top left pixel correspond to a (-,+) coordinate, the bottom right to a (+,-) coordinate. Points closer to the observer have positive z, honoring the right hand system. The camera in the above picture is at z=+100.0
  3. The raytracer uses pixel coordinates for the viewplane. Pixel 0,0 is at the bottom left. Changing the first index moves horizontally (along the row) from left to right. Changing the second index moves vertically (along the column) from bottom to top. As a consequence, the point at the bottom right is (hres-1,0), and at the top left is (o, vres-1). Note that this is equivalent to a cartesian (x,y) system with origin on the bottom left corner. Not a surprise, since there is a direct mapping between the pixels and the rays’ origins.
  4. Finally, the pixel image indexing of pygame and PIL. For them, pixel 0,0 is top left. Like the case above, incrementing the first index also moves along the row from left to right. However, incrementing the second index moves vertically from top to bottom, which is the opposite of the raytracing index. Bottom left is (0, vres-1) and the bottom right is (hres-1, vres-1).

A remapping is therefore needed from the pixel coordinate of the rendering and the pixel coordinate of the display (e.g. pygame). The transformation is trivial, of course, but it must be kept into account, otherwise pics will be flipped horizontally.

Another interesting fact is that, according to “Ray Tracing from the ground up” it’s useful to perform the raytracing operation starting from the bottom and working our way up, rendering pixels from left to right. According to them this is for coding symmetry and future convenience, so we stick to it.

Finding the foremost object

In the book, finding the foremost object is made more complex by the language used, C++. In python, you can use functional programming style to obtain the same in a very concise statement. The idea is to cast a ray, then go through all objects in the world to find the intersection point (if any) between the ray and the object. If more than one object is hit, the one with the intersection point closer to the observer commands the pixel color, since it’s in front.

I achieve this with the following code

def hit_bare_bones_object(self,ray):
   def f(o):
     shadeRec = o.hit(ray)
     if shadeRec:
       return (shadeRec.parameter, o)
     else:
       return None

   try:
     foremost=sorted( \
                filter(lambda x: x is not None, \
                  map(f, self.objects)
                ), key=lambda x: x[0]
              )[0][1]
   except IndexError:
     return None

   return foremost

What does this code do ? I defined a simple internal function f which accepts an object, performs the hit and returns a tuple containing the hit point position (as a parameter of the ray, so it’s a single number, not a xyz coordinate) and the object.

Now, I use this function to map all the objects defined in the world. I will obtain a list with one entry per each object, either a None (not hit) or a 2-tuple containing the parameter and the hit object. I filter out the None entries, leaving only the 2-tuples and then sort according to their first element. The 2-tuple with the lowest parameter is now at index 0, and the [1] element of this tuple is the foremost object. At any time, the list may be empty (such as if you don’t have any object,  or no object is hit. In that case, a IndexError will be raised and that will indicate that the ray hit nothing. I may rework on this function later on, but for this second round, it suits my needs.

It’s now time to move on to samplers. Given that the code is growing in size, I created a git repository you can clone from. The release of this post is available here. The code is under BSD license.