Can't add widgets in Qt Designer? Don't use Unity


I got around a problem in Qt Designer that prevented me from dropping any widget. When I tried, I got the barred circle mouse pointer instead of the plus. When dropped, the drop operation did not complete, and the receiving widget remained empty. Apparently, it's due to some form of conflict between Designer and Unity (the graphical environment on Ubuntu). I switched to Gnome and now Designer works. Note that I occasionally got it to work on Unity as well, but I was never able to reproduce the exact conditions that allowed it.

Git stash size in the command line prompt


Too often I get confused with git when it comes to stashes. I tend to stash often, as I jump from one task to another or from one branch to another, and more than once I have forgotten that I stashed something. The stash grows and I don't remember what each entry contains. Fortunately, I have never really ended up doing duplicate work, but it is bound to happen if I don't take appropriate measures.

I concocted this function and bash prompt to present (in proper color, no less) the current number of stashed items. This way I always know if I have stashes around.

function git_stash_size {
  # List the stashes; bail out quietly if we are not inside a git repository
  lines=$(git stash list -n 100 2> /dev/null) || return
  if [ "${#lines}" -gt 0 ]
  then
    count=$(echo "$lines" | wc -l | sed 's/^[ \t]*//') # strip leading whitespace
    echo " [${count} stash] "
  fi
}
# Comment in the above and uncomment this below for a color prompt
$(__git_ps1 " (%s)")\[\033[00m\]\[\033[01;31m\]$(git_stash_size)\[\033[00m\]\$ '

Reformat properly as a single line.

A good lesson in python and unicode


Ever wanted to understand more about unicode in python? This talk is a good explanation of how to deal with it properly.

Difference between mpiexec and mpirun?

A few days ago I started playing with MPI, and I started wondering: "what's the difference between mpiexec and mpirun?" It turns out that the distinction is mostly historical. In the first MPI specifications, there was nothing defining how the executables should run. Implementors of the specifications created an mpirun executable, but each implementation had different switches and different behavior. The MPI2 standard filled this gap, but it was not possible to merge the different implementor-dependent behaviors that became established in the meantime. The solution was to standardize the utility name as mpiexec. As a consequence, MPI2 compliant implementations will generally have both: mpiexec to honor the standard, and mpirun to honor compatibility with their previous implementation.

A pythonic way out of the GPL restrictions in MySQL client library


I recently became aware of PyMySQL, a native Python package. It has one important benefit over the other solutions for talking to a MySQL server, such as MySQLdb (AKA mysql-python): it reimplements the MySQL protocol instead of binding to the MySQL connector library (also known as libmysqlclient). Why is this an issue? Well, because the MySQL connector library is GPL, and you can't bind against GPL code unless your code is under a GPL-compatible license. In practice this rules out closed-source commercial use, and makes all derivative works of libmysqlclient GPL as well, including the Python binding MySQLdb. If you thought about circumventing the problem using unixodbc, tough luck: the ODBC MySQL connector is also GPL, thus making unixodbc GPL as well.

Despite the low version number, PyMySQL seems to work well, has no dependencies, is pure python, and is released under the very liberal MIT license.

A raytracer in python – part 5: non-planar samplers


In this post we are going to describe and implement non-planar samplers. In the previous post about samplers, we implemented and characterized different planar samplers to make antialiasing possible. The characteristic of these samplers was to produce regular or random points on a plane with x,y between zero and one. To implement effects such as simulation of lenses behavior, reflections and so on, we also need to be able to shoot rays according to geometrical patterns other than the plane. More specifically, we need to be able to map points on a disk (to simulate lenses) or on a hemisphere (to simulate other optical effects such as reflections), while at the same time preserving the good characteristics of the random distributions outlined in the planar case.

To achieve this, the samplers now implement two new methods, BaseSampler.map_samples_to_disk() and BaseSampler.map_sampler_to_hemisphere(). They are in charge of remapping the planar distribution to a disk or to a hemisphere, but with a couple of twists. In the disk remap, the points in the range [0:1] must be remapped to a full circle spanning [-1:1] on both axes, so as to cover the circle completely while preserving the distribution. This is done through a formulation called Shirley's concentric map.
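As an illustration of the disk remap, here is a minimal sketch of one common formulation of the concentric map. The function name concentric_map is hypothetical and this is not necessarily how BaseSampler.map_samples_to_disk() is written; it only shows the idea of mapping concentric squares to concentric circles.

```python
import math


def concentric_map(x, y):
    """Map a point of the unit square [0,1]x[0,1] onto the unit disk.

    Sketch of the concentric mapping: concentric squares become
    concentric circles, so neighbouring samples stay neighbours.
    """
    # Shift the sample from [0,1] to [-1,1] on both axes
    sx = 2.0 * x - 1.0
    sy = 2.0 * y - 1.0

    # The centre of the square maps to the centre of the disk
    if sx == 0.0 and sy == 0.0:
        return (0.0, 0.0)

    if abs(sx) > abs(sy):
        # Left and right wedges of the square
        r = sx
        phi = (math.pi / 4.0) * (sy / sx)
    else:
        # Top and bottom wedges of the square
        r = sy
        phi = (math.pi / 2.0) - (math.pi / 4.0) * (sx / sy)

    return (r * math.cos(phi), r * math.sin(phi))
```

Every point returned lies within the unit circle, and the centre of the square ends up at the centre of the disk.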


In the hemisphere remapping, we also want to introduce a variation in the density so that it changes with the cosine of the polar angle from the top of the hemisphere. In other words, we want an adjustable parameter e to focus the point density closer to the top of the hemisphere.


We will need the characteristics of these distributions later on, when we implement reflections and other optical effects. As you can see from the above plot, higher values of the parameter e produce a higher concentration of points close to the top of the hemisphere. On the other hand, a low e parameter tends to produce a more uniform distribution over the full hemisphere.
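To make the role of e concrete, here is a small sketch of a commonly used cosine-power mapping from the unit square to the hemisphere. The function name map_to_hemisphere is hypothetical, and I am assuming this formulation only for illustration; the actual sampler method may differ in its details.

```python
import math


def map_to_hemisphere(x, y, e):
    """Map a point of the unit square to the unit hemisphere (z >= 0).

    The density of the resulting points follows cos(theta)**e, where
    theta is the polar angle measured from the top of the hemisphere.
    """
    # Azimuthal angle: spread uniformly around the vertical axis
    phi = 2.0 * math.pi * x

    # Polar angle: larger e pushes cos(theta) towards 1 (the top)
    cos_theta = (1.0 - y) ** (1.0 / (e + 1.0))
    sin_theta = math.sqrt(max(0.0, 1.0 - cos_theta * cos_theta))

    # Return a 3D point on the unit hemisphere
    return (sin_theta * math.cos(phi),
            sin_theta * math.sin(phi),
            cos_theta)
```

For the same input sample, a larger e gives a point with a larger z coordinate, that is, closer to the top of the hemisphere.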

To obtain the points, the sampler object now has three methods to request an iterator. We are no longer iterating on the object itself, because we need to provide three different iteration strategies: the methods BaseSampler.diskiter(), BaseSampler.hemisphereiter() and BaseSampler.squareiter() each return a generator over the proper set of points. Note that the hemisphere point generator returns 3D points, unlike the other two, which return 2D points.

You can find the code for this post at github.

Calling a C routine from python, the easy way


I think it may be interesting for others to see how to easily call a C routine from python, without implementing a python module. What you need is the ctypes module. Remember, however, that the use of this module is apparently frowned upon, at least according to a note I found in PEP 399:

Usage of ``ctypes`` to provide an API for a C library will continue to be frowned upon as ``ctypes`` lacks compiler guarantees that C code typically relies upon to prevent certain errors from occurring (e.g., API changes).

although, to be honest, this may apply only in the context of the PEP itself, and not as a general recommendation.

Nevertheless, suppose you want to call the function

double pow(double, double)

in the standard math library.

The first thing to do is to define the prototype of the function. You achieve this via the following:

prototype=ctypes.CFUNCTYPE(ctypes.c_double, ctypes.c_double, ctypes.c_double)

The call to ctypes.CFUNCTYPE creates a prototype object for a C function that returns a double (the first argument) and accepts two doubles (the second and third arguments).

Now you have a prototype. This entity is a class

>>> prototype
<class 'ctypes.CFunctionType'>

and you can bind this prototype to the actual function in the math library with the following. First you create an object representing the library

>>> dll=ctypes.cdll.LoadLibrary("libm.dylib") # on Mac. Linux use

and then you bind to the actual function

>>> pow = prototype(("pow", dll))
>>> pow(3,4)
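For reference, the steps above can be put together in a single script. This is a sketch under one assumption of mine: instead of hardcoding the library file name, it uses ctypes.util.find_library("m") to locate the math library, which should work on most POSIX systems. The name c_pow is only illustrative.

```python
import ctypes
import ctypes.util

# Assumption: a POSIX system where the math library can be found by name.
# find_library("m") typically returns something like "libm.so.6" on Linux.
libm_path = ctypes.util.find_library("m")

# If libm_path is None, CDLL(None) falls back on POSIX systems to the
# symbols already loaded in the running process.
dll = ctypes.CDLL(libm_path)

# double pow(double, double): return type first, then the argument types
prototype = ctypes.CFUNCTYPE(ctypes.c_double, ctypes.c_double, ctypes.c_double)

# Bind the prototype to the "pow" symbol exported by the library
c_pow = prototype(("pow", dll))

result = c_pow(3, 4)
print(result)  # 81.0
```

The integer arguments are converted to C doubles automatically, because the prototype declares both parameters as ctypes.c_double.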

This just brushes the surface, but I wanted to make a simple introductory post.

Copying and pasting from the python interpreter


One very powerful feature of python is the interactive interpreter: it allows you to test and quickly evaluate snippets of code. Occasionally, I need to rerun the same code, either during the same or another python interpreter session. One quick way to achieve this would be to copy and paste the code again, but you quickly realize the prompt makes it hard:

>>> for i in xrange(10):
...     print i+10
...     print i-10
...     print i/2
and so on

If you directly copy and paste the above snippet, it clearly won't work due to the presence of the prompts:

>>> >>> for i in xrange(10):
 File "<stdin>", line 1
 >>> for i in xrange(10):
SyntaxError: invalid syntax
>>> ...     print i+10

Frustrated, I decided to solve the problem once and for all: I created a .pythonrc file where I override the normal ">>> " prompt to a header prompt, and the continuation prompt to the empty string:

import sys
sys.ps1='--- [Python] ---\n'
sys.ps2=''

Then, I added the PYTHONSTARTUP variable in my .bash_profile to refer to this file:

export PYTHONSTARTUP=$HOME/.pythonrc

Now my interactive session looks like this

Python 2.7.1 (r271:86832, Feb 27 2011, 20:04:04)
[GCC 4.2.1 (Apple Inc. build 5664)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
--- [Python] ---
for i in xrange(2):
        print i

--- [Python] ---

I am now free to copy and paste the above code and replay it in the interpreter, or later on, directly into a vim session (but I will have to change tab spacing).

3d plot of earthquakes in Ferrara-Modena-Bologna


The earthquakes that recently hit my hometown and region triggered a natural sequence of minor quakes. Thanks to the available data, I was able to plot a 3d representation of the hypocenters. Red points represent all the quakes from the start of the seismic sequence to the 3rd of June. Points in purple represent particularly strong quakes (5.0 or higher). Blue squares represent cities. I suggest enabling the 720p version.

A raytracer in python – part 4: profiling


After having finally obtained a raytracer that produces antialiased images, it is now time to take a look at performance. We already saw some numbers in the last post. Rendering a 200x200 image with 16 samples per pixel (a grand total of 640,000 rays) definitely takes too long. I want to perform some profiling with python, find the hotspots in the code, and eventually devise a strategy to optimize them.

General profiling with cProfile

To perform basic profiling, I used the cProfile module provided in the standard library. It appears that most of the processing time is spent in the hit() function:

$ python -m cProfile -s time
 Ordered by: internal time

 ncalls  tottime  percall  cumtime  percall filename:lineno(function)
 2560000  85.329    0.000  108.486    0.000
 1960020  30.157    0.000   30.157    0.000 {numpy.core.multiarray.array}
 7680000  23.115    0.000   23.115    0.000 {}
 1        19.589   19.589  195.476  195.476
 2560000   7.968    0.000  116.454    0.000
 640000    6.710    0.000  133.563    0.000
 640025    4.438    0.000  120.902    0.000 {map}
 640000    3.347    0.000  136.910    0.000
 640000    3.009    0.000    3.009    0.000 {numpy.core.multiarray.zeros}
 640000    2.596    0.000    2.613    0.000 {sorted}
 640000    2.502    0.000    3.347    0.000 {filter}
 640000    1.835    0.000   16.784    0.000

This does not surprise me, as the main computation a raytracer performs is to test each ray for intersection on the objects in the scene, in this case multiple Sphere objects.
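The same kind of report can also be produced from inside a script. The following is a minimal sketch, written for Python 3, that profiles a toy workload; fake_hit and render are hypothetical stand-ins for the raytracer routines, not the actual code.

```python
import cProfile
import io
import pstats


def fake_hit(i):
    # Hypothetical stand-in for an expensive intersection test
    return (i * i) % 7 == 0


def render(n):
    # Call the stand-in once per "ray", as a renderer would
    return sum(1 for i in range(n) if fake_hit(i))


# Collect profiling data only around the code of interest
profiler = cProfile.Profile()
profiler.enable()
hits = render(100000)
profiler.disable()

# Sort by internal time, like "-s time" on the command line,
# and keep only the five most expensive entries
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("time").print_stats(5)
report = stream.getvalue()
print(report)
```

Sorting by internal time puts the functions where the interpreter actually spends its time at the top, which is what matters when hunting for hot spots.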

Profiling line by line for hot spots

Having understood that most of the time is spent in hit(), I wanted to perform line-by-line profiling. This is not possible with the standard python cProfile module, so I searched and found an alternative, line_profiler:

$ easy_install-2.7 --prefix=$HOME line_profiler
$ -l
Wrote profile results to
$ python -m line_profiler

Before running the commands above, I added the @profile decorator to the method I am interested in. This decorator is added by line_profiler to the __builtin__ module, so no explicit import statement is needed.

class Sphere(object):
    @profile
    def hit(self, ray):

The results of this profiling are

Line # Hits  Time    Per Hit  % Time Line Contents
12                                   @profile
13                                   def hit(self, ray):
14 2560000  27956358  10.9     19.2     temp = ray.origin -
15 2560000  17944912   7.0     12.3     a =, ray.direction)
16 2560000  24132737   9.4     16.5     b = 2.0 *, ray.direction)
17 2560000  37113811  14.5     25.4     c =, temp) \
                                              - self.radius * self.radius
18 2560000  20808930   8.1     14.3     disc = b * b - 4.0 * a * c
20 2560000  10963318   4.3      7.5     if (disc < 0.0):
21 2539908   5403624   2.1      3.7         return None
22                                      else:
23   20092     75076   3.7      0.1         e = math.sqrt(disc)
24   20092    104950   5.2      0.1         denom = 2.0 * a
25   20092    115956   5.8      0.1         t = (-b - e) / denom
26   20092     83382   4.2      0.1         if (t > 1.0e-7):
27   20092    525272  26.1      0.4            normal = (temp + t * ray.direction)\
                                                           / self.radius
28   20092    333879  16.6      0.2            hit_point = ray.origin + t * \
29   20092    299494  14.9      0.2            return ShadeRecord.ShadeRecord(

Therefore, it appears that most of the time is spent in this chunk of code:

temp = ray.origin -
a =, ray.direction)
b = 2.0 *, ray.direction)
c =, temp) - self.radius * self.radius
disc = b * b - 4.0 * a * c

We cannot really optimize much. We could precompute self.radius * self.radius, but it would not have a real impact. Something we can observe is the huge number of routine calls. Is the routine call overhead significant? Maybe: Python has a noticeable call overhead, but a very simple program like this

def main():
    def f():
        return 0
    a = 0  # initialize the counter, otherwise "print a" fails
    for i in xrange(2560000):
        if f():
            a = a+1

    print a

main()


is going to take 0.6 seconds: not small, but nowhere near the numbers we see. Why is that? And why is the raytracer so slow for the same number of calls? I think the bottleneck is somewhere else.
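A quick way to measure the call overhead on your own machine is the timeit module. This is a small sketch with a reduced iteration count; the names call_many and the count of 100000 are my own choices, and the timing you get will of course differ from the figure quoted above.

```python
import timeit


def f():
    return 0


def call_many(n):
    # Same structure as the toy program: n calls to a trivial function
    a = 0
    for i in range(n):
        if f():
            a = a + 1
    return a


# Time a single run of 100000 calls; scale up to compare with the raytracer
elapsed = timeit.timeit(lambda: call_many(100000), number=1)
print("100000 calls took %.4f seconds" % elapsed)
```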

Finding the problem

I decided to profile World.render() to understand what's going on: this is the routine in charge of going through the pixels, shooting the rays, then delegating the task of finding intersections to Tracer.trace_ray, which in turn re-delegates the task to World.hit_bare_bone_object. I don't really like this design, but I stick to the book as much as possible, mostly because I don't know how things will evolve later on.

The profiling showed two hot spots in World.render(), in the inner loop:

Line #      Hits         Time  Per Hit   % Time  Line Contents

    41    640000     18786192     29.4     29.2  ray = Ray.Ray(origin = origin,
                                                               direction = (0.0,0.0,-1.0))
    43    640000     22414265     35.0     34.9  color += numpy.array(tracer.trace_ray(ray))

Why is it so slow to perform these two operations? It turns out that numpy is incredibly slow at creating small arrays. This may indeed be the reason why it's so slow to instantiate a Ray object (two numpy.arrays), to add the color (another instantiation) and to perform operations in the slow lines of Sphere.hit. At this point I was not sure I could trust numpy.array, so I decided to remove it completely, replacing arrays with tuples. The result is pleasing:

$ time python
real    0m31.215s
user    0m29.923s
sys     0m2.355s

This is an important point: tuples are much faster than small arrays. numpy seems to be optimized for large datasets and performs poorly when handling small ones. This includes not only the creation of the arrays, but also any numpy operation that may create arrays as a consequence, such as calling numpy's dot product on two tuples instead of a trivial implementation such as

def dot(a,b):
    return a[0]*b[0]+a[1]*b[1]+a[2]*b[2]

In fact, if I use numpy's dot product on tuples in Sphere.hit():

a =, ray.direction)
b = 2.0 *, ray.direction)
c =, temp) - self.radius * self.radius

the total running time goes from 31 seconds to a staggering 316 seconds (5 minutes). My guess is that they are converted to numpy.arrays internally, followed by the actual vector-vector operation.
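To show what the tuple-based approach looks like in practice, here is a minimal sketch of a ray-sphere intersection test using only plain tuples and the trivial dot product. The names sub and hit_sphere, and the choice to pass origin, direction, center and radius as separate arguments, are my own assumptions for illustration; this is not the actual Sphere.hit code, and it only checks the nearest root.

```python
import math


def dot(a, b):
    # Trivial dot product of two 3-tuples
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def sub(a, b):
    # Component-wise difference of two 3-tuples
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def hit_sphere(origin, direction, center, radius):
    """Return the ray parameter t of the nearest hit, or None on a miss."""
    temp = sub(origin, center)
    a = dot(direction, direction)
    b = 2.0 * dot(temp, direction)
    c = dot(temp, temp) - radius * radius
    disc = b * b - 4.0 * a * c

    if disc < 0.0:
        return None  # the ray misses the sphere

    t = (-b - math.sqrt(disc)) / (2.0 * a)
    if t > 1.0e-7:
        return t
    return None


# A ray shot down the z axis towards a unit sphere at the origin
t = hit_sphere((0.0, 0.0, 5.0), (0.0, 0.0, -1.0), (0.0, 0.0, 0.0), 1.0)
print(t)  # 4.0
```

No arrays are created anywhere in this routine, which is the property that made the difference in the timings above.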

I am happy with a runtime of about 30 seconds for now, and plan to optimize further once more complex operations are performed. You can find the version for this post at github.