The subtle art of writing a code example

One of the most frustrating experiences when learning a new technology is finding useless examples. An example is the most precious thing that comes with a new library, language, or technology. It must be a starting point, a wise and unadulterated explanation of how to achieve a given result. A perfect example must have the following characteristics:

  • Self-contained: it should be small enough to be compiled or executed as a single program, without dependencies or complex makefiles. An example is also a strong functional test of whether you installed the new technology correctly. The more issues that could arise, the more likely it is that something goes wrong, and the more difficult it is to debug and solve the situation.
  • Pertinent: it should demonstrate one, and only one, specific feature of your software/library, involving minimal additional behavior from external libraries.
  • Helpful: the code should bring you forward, step by step, using comments or self-documenting code.
  • Extensible: the example code should be a small “framework” or blueprint for additional tinkering. A learner can start by adding features to this blueprint.
  • Recyclable: it should be possible to extract parts of the example to use in your own code.
  • Easy: An example code is not the place to show your code-fu skillz. Keep it easy.

And here comes the helpful acronym: SPHERE. Yes, I looked for the correct synonyms to make it, after the main concepts were in place.

Prototypical examples of violations of those rules:

Violation of self-containedness: an example spanning multiple files without any real need for it. If your example is a Python program, keep everything in a single module file. Don’t sub-modularize it. In Java, try to keep everything in a single class, unless you really must partition some entity into a meaningful object you need to pass around (and Java allows only one public top-level class per file).

Violation of Pertinency: when showing how many different shapes you can draw, adding radio buttons and complex controls with all the possible choices for point shapes is a bad idea. You defocus your example code by introducing code for event handling, control initialization, etc. None of this is part of the feature you want to demonstrate; it is unnecessary noise that obscures the crucial mechanisms providing the feature.

Violation of Helpfulness: code containing dubious naming, wrong comments, hacks, and functions longer than one page of code.

Violation of Extensibility: badly factored code that has everything in a single function, with potentially swappable entities embedded within the code. Example: if an example reads data from a file and displays it, create a method getData() returning a useful entity, instead of opening the file raw and plotting the stuff. This way, if the user of the library needs to read data from an HTTP server instead, they just have to modify the getData() method and use the example almost as-is. Another violation of Extensibility comes if the example code is not under a fully liberal (e.g. MIT or BSD) license.
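As a rough illustration of the getData() idea, here is a minimal Python sketch. The names get_data and display, and the two-column comma-separated layout, are my own assumptions for the sake of the example; the only point is that the data source lives in one swappable function.

```python
import csv
import io


def get_data(source):
    """Return the data as a list of (x, y) float pairs.

    This is the only function to replace if the data should come
    from somewhere else (an HTTP server, a database, ...).
    """
    return [(float(x), float(y)) for x, y in csv.reader(source)]


def display(points):
    """Show the data; a real example would plot it instead of printing."""
    for x, y in points:
        print(f"{x:8.2f} {y:8.2f}")


def main():
    # io.StringIO stands in for an open file, to keep the sketch self-contained
    source = io.StringIO("0,1.5\n1,2.5\n2,4.0\n")
    display(get_data(source))


if __name__ == "__main__":
    main()
```

Because display() only sees a list of pairs, it does not care where the numbers came from, which is what makes the example reusable almost as-is.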

Violation of Recyclability: when the code layout is so intermingled that it is difficult to copy and paste parts of it and recycle them into another program. Again, licensing is also a factor.

Violation of Easiness: Yes, you are a functional-programming nerd and want to show how cool you are by doing everything on a single line of map, filter and so on, but that may not be helpful to someone else, who is already under pressure to understand your library, and now has to understand your code as well.
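To make the contrast concrete, here is a small made-up Python comparison (the task, summing the squares of the even values, is purely illustrative). Both versions compute the same number; the second is the one I would rather meet in an example.

```python
from functools import reduce

values = [3, 8, 5, 12, 7]

# The "look how clever I am" version: correct, but the reader has to
# unpack three nested lambdas before getting to the point.
dense = reduce(lambda a, b: a + b,
               map(lambda v: v * v, filter(lambda v: v % 2 == 0, values)),
               0)

# The easy version: a plain loop that reads top to bottom.
plain = 0
for value in values:
    if value % 2 == 0:
        plain += value * value

print(dense, plain)
```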

And in general, the final rule: if it takes more than 10 minutes to do the following: compile the code, run it, read the source, and understand it fully, it means that the example is not a good one.

Large quake occurs near Chile.

A magnitude 8.8 quake occurred in Chile. A tsunami warning has been issued. Very good details, and a map of the predicted height of the tsunami across the whole Pacific, can be found at Phil Plait’s blog.

Apparently, Japan is not concerned about the tsunami, which should hit around 3 AM GMT (12 PM Japanese time). The advisories from the Japanese meteo/quake agency have no information about any critical measures to be taken.

Yesterday, a weaker but still relevant quake hit the island of Okinawa, in the south of the Japanese Archipelago.

New paper published on Journal of Physical Chemistry A

The paper I submitted some time ago to the Journal of Physical Chemistry A has been published: Borini S, Limacher PA, Luethi HP, “Structural Features Analysis and Nonlinearity of End-Cap-Substituted Polyacetylenes”, DOI: 10.1021/jp908439x

I already wrote about the findings reported in this paper at the time it was accepted. It is a very nice paper, and the journal’s very high reputation makes me even more confident that its content is cool and insightful, in particular for experimentalists.

Eight molecules that changed the rules of the game: Diethyl Ether

With this post I want to start a series about single molecules whose synthesis, discovery, or explanation had such dramatic effects for humanity as to produce a complete paradigm shift in daily life or scientific insight. On purpose, I left out the “big ones”: you will not find DNA in this list, nor will you find vitamins. Instead, I will focus on small, apparently insignificant compounds of a handful of atoms, whose legacy is so pervasive that we cannot imagine a world without them.

So… here we go.

Diethyl Ether

Rule changed: started the formal discipline of anesthesia

Diethyl Ether, also simply known as ether, chemical formula CH3-CH2-O-CH2-CH3, is not a particularly pleasant compound at first sight: it is highly flammable, with tendencies of explosiveness, toxic in high doses, and with an unpleasant, suffocating smell… but in fact it is a really precious substance.

Diethyl Ether

Ether was the breakthrough that started the formal discipline of anesthesia. Before 1842, having a surgery or a tooth extraction was synonymous with excruciating, horrible pain. There was basically no reliable anesthetic available. Traditional methods against pain were based on either alcohol, opium, mixtures of herbs or similar drugs, all suboptimal: their effect is difficult to control, sometimes partial, and often dangerous in the required dose. In the first half of the 19th century, not many candidates were available as potential anesthetics.

Known and synthesized hundreds of years earlier, ether was not the only substance known to demonstrate anesthetic properties. Chloroform, chemical formula CHCl3, is another one, but it is strongly toxic. Improperly used, it is fatal. A better compound was known to be nitrous oxide, chemical formula N2O, also known as “laughing gas”.

Nitrous oxide

Produced for the first time in 1775 by Joseph Priestley, this colorless, harmless and slightly sweet gas did not receive much attention until Humphry Davy, aged 21, decided to take a sniff… you know, as an experiment. The result: tingling sensation, sensitivity to sounds, hallucinations, and a very pleasant euphoria. In the spirit of scientific sharing, in particular when you are 21 and science gives you the air of paradise, big public parties were organized. People got inebriated and apparently had a good time, so good that N2O became a de-facto alternative to alcohol. The distraction created by the recreational use of nitrous oxide was so strong that for more than 30 years nobody really took the time to use it as an anesthetic.

Around the same time, ether was used for the same activity: getting high. There were some rumors of danger about it, so it remained less popular than nitrous oxide, but regardless of the rumors some people continued using it, at what were called “ether frolics” in the US. Among them was a 17-year-old student, Philip Wilhite. Not what we would call a kind person: while getting inebriated with friends, they forced a black passer-by to breathe large quantities of ether. The poor guy fell completely unconscious, and only after an hour or so, and much slapping by the local doctor, was he able to, I assume, run away on his own legs from that bunch of criminals.

In 1842, Wilhite became an assistant to Crawford Long, a doctor and surgeon. Apparently, it was not unusual for Long’s team to throw wild ether parties, where it was also not uncommon to get bruises due to the ether-induced tumbling and falling. The realization that no memory remained about the bruises, together with Wilhite’s experience with the poor black guy years earlier, made them realize they had found something interesting. They performed the first surgery under anesthesia, to remove a tumor, and it was successful, but they did not publish their findings until 1849, although Long openly demonstrated his findings to other colleagues.

In the meantime, Horace Wells, a dentist, pioneered the use of nitrous oxide during dental surgery. Unfortunately, the process did not work during a crucial presentation at the Massachusetts General Hospital, and Wells was strongly derided by the audience. He ended up living in shame until he took his own life while in jail. However, his disappointing result set the stage for the success of his former associate, William Morton, in 1846.

Morton, a dentist as well (although he never graduated), used ether for his patients. Apparently, he got the idea from his tutor Charles Jackson, from Wells, and from hearing about the “ether frolics”. He was invited for a demonstration of his claims at the very same hospital where Wells had his personal disgrace the year before, and successfully excised a tumor without any sign of pain from the patient. Morton claimed he was not using ether, but a substance he called “Letheon”, for which he applied for a patent. Nobody believed him, as it was clear as sunlight it was ether. He made many enemies, and he never really obtained anything out of his efforts to make money from it. He died poor at the relatively young age of 49. Nevertheless, the buzz generated around the issue was enough for him to be credited as the one who started anesthesia.

Except that he wasn’t. Crawford Long did it four years before him.
Today, Long (and partially Wilhite and the poor black guy) is recognized as the person who started it all, but he did not receive a commemorative monument, unlike Morton.

Morton monument in Boston

Today, mainly because it’s flammable and slightly toxic, and because better alternatives exist, ether is no longer used as an anesthetic, except in developing countries where cost is a major factor. On the other hand, nitrous oxide is still used as a mild anesthetic in dentistry. Interestingly, the jury is still out on the mechanism behind the anesthetic effect of these substances. After 150 years and all the improvements in our scientific techniques, we still don’t really know what exactly happens when we inhale ether or nitrous oxide.

Additional Links

http://en.wikipedia.org/wiki/Ernest_Duchesne

My business card, with QR-code. Geeky!

I ran out of business cards recently, so I had to make new ones. I took the chance to indulge a bit in the QR-Code, a two-dimensional barcode you can find on everything in Japan. It’s a pretty nice barcode system, very robust against corruption and quite ok in terms of capacity. Among many others, there’s a library (ZebraCrossing) to encode and decode QR-codes, and online services are also available for this (to decode and encode). Google Charts can also be used to encode.

Stefano Borini business card

Ok, it’s not particularly stylish (I’ve never been good at design and graphic arts), but it’s definitely geeky! Since I am an artistically impaired geek, this card represents me pretty well. The idea is that you can take a picture of it (with your cellphone, for example) and, with proper software installed on the cellphone, a vcard-like format (MECARD) gets automatically imported into your contacts. The QR-Code contains nothing more than the text on the card (with some omissions to stay small). I’m pretty sure I’m not the first person having this gadget on their business card, but I still haven’t seen it on someone else’s card.
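For the curious, the text inside such a QR-Code looks roughly like what the following Python sketch produces. This is only my understanding of the MECARD layout (a "MECARD:" prefix, "KEY:value;" fields, and a closing ";"), and the choice of characters to escape is an assumption on my side. The helper names and the contact details are made up, and this is not the exact content of my card.

```python
def escape(value):
    """Backslash-escape characters that act as MECARD delimiters (assumed set)."""
    for char in ("\\", ";", ":", ","):
        value = value.replace(char, "\\" + char)
    return value


def mecard(name, phone=None, email=None, url=None):
    """Build a MECARD string from the given fields, skipping the empty ones."""
    fields = [("N", name), ("TEL", phone), ("EMAIL", email), ("URL", url)]
    body = "".join(f"{key}:{escape(value)};" for key, value in fields if value)
    return f"MECARD:{body};"


# Hypothetical contact, for illustration only
print(mecard("Jane Doe", phone="+81312345678", email="jane@example.com"))
```

The resulting string is what you would hand to a QR-Code encoder; the fewer fields you include, the smaller the barcode gets.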

Please donate for the Haiti earthquake

After the recent tragic events in Haiti, it is a priority to help Doctors Without Borders as much as possible. Please donate: even a small amount from a lot of people can make a sizable difference.

How I ate Fugu and survived to tell the tale

Some time ago I had Fugu, or puffer fish, a highly poisonous fish whose poison has no known antidote. Here is a picture to document the fact.

Well, it could just be me in front of something that looks like fish, and I’m not going to eat it anyway, but trust me, I had it. Yes, I wanted to take the risk of dying from tetrodotoxin poisoning. After all, we live only once (very appropriate), and since I’m here, why not try it out? Also, note the smile of a sushi lover dream come true.

So, how does Fugu work? A specialized chef prepares the puffer fish with a proper cutting process, removing all parts containing poison, and leaving only the edible ones. In particular, the liver is among the deadly parts, and is therefore removed completely. Other parts, like the meat, the fins, and parts of the head, are safe to consume, all going into a rather particular dinner. Unfortunately I was not able to watch the cutting process: the chef was behind a bench (you can see him in the picture), but apparently he did a proper job, since I am still alive.

Why is puffer fish so poisonous? The culprit is a substance it accumulates, tetrodotoxin, probably obtained from diet or produced by symbiotic bacteria ingested by the fish. This molecule disrupts nerve signal transmission, leading to body paralysis, starting from the lips and tongue, then the hands, then all the rest, including the diaphragm. With no control of the diaphragm, the victim is unable to breathe, and dies of asphyxiation. During the whole process, which occurs in a matter of hours, the victim is fully conscious and awake, just unable to move, speak, and (in the end) breathe. This is because tetrodotoxin is not able to enter the brain, leaving its nerve tissues unscathed. Scary, isn’t it? The poison is so powerful that 1 milligram (the quantity you can put on the tip of a pin) is enough to kill a human. A single pufferfish contains enough poison to kill tens of people. If you are treated early, kept breathing, and get the toxin removed from your body, you can survive the poison and recover completely.

Nerve signal transmission is actuated by an exchange of sodium and potassium ions between the two sides of the nerve cell membrane. The different ion concentration gives rise to a difference of potential, maintained at the expense of energy. There is an enzyme, known as the sodium-potassium pump, on the surface of the nerve cell membrane, with the task of keeping this imbalance by actively carrying three sodium ions outside the cell, and two potassium ions inside the cell, at every cycle. The nerve cell stays “loaded and ready” to transmit the signal. When a signal transmission is triggered, sodium ions are allowed to flow back into the cell in a cascade event, trying to re-establish the equilibrium and suppress the gradient. This is made possible by another protein, a sodium transport channel. Tetrodotoxin binds strongly to this channel, thus preventing the sodium from entering the cell. In some sense, it acts like a cork. Without this mechanism in place, the signal is no longer able to travel along the nerves down to the muscles, and paralysis ensues.

Is eating fugu really so dangerous? According to this site, there are fewer than 100 incidents per year, with a 10 to 50 % mortality. Most, probably all, of these cases are untrained people eating their own catch. The probability of dying at the hands of a certified, experienced Fugu chef is close to negligible, and your life is probably more in danger while driving to the restaurant.

The price for a Fugu dinner is high: 30.000 Yen (230 Euro) for a full course dinner for me and my host, and it is definitely not worth it. The consistency is reminiscent of a rubber band, and the taste is basically neutral. The dinner therefore focuses on additional herbs, sauces and preparation to please your senses, with the fish as an additional, risky business. Definitely interesting once in a lifetime, as a “been there, done that” story, but for a much lower price I can have a delicious Italian meal where my taste buds really get involved the right way.

References

  • http://emedicine.medscape.com/article/818763-overview
  • http://www.life.umd.edu/grad/MLfsc/zctsim/ionchannel.html
  • http://www.chm.bris.ac.uk/motm/ttx/ttx.htm

Image self consistency from xkcd

I love xkcd. A comic combining fun and math has to be good and geeky by definition, and the author, Randall Munroe, is a real genius at this. The latest comic is pretty interesting.

xkcd, by Randall Munroe

The image is self-descriptive, meaning that each graph represents information about the image itself. For example, the first panel contains a pie chart which says how many pixels are either white or black in the image. Clearly, the relative amount of black pixels in the image depends on the size of the slice of that pie chart representing the amount of black pixels, a “chicken-and-egg” kind of problem. Such an image is apparently difficult to obtain, because the plotted data must be consistent with itself via the graphical representation. This kind of problem, where the solution depends on itself, is quite common in science, and it’s solved through self-consistency.

The trick is as follows: we start with a first, approximate solution, called a guess, and we apply a method that gives us a result depending on this guess. Then, we take this newly obtained result, and reapply the method to obtain a new result, and then again, and again, until, hopefully, the input and the output of the method are the same. When this occurs, we have solved our problem via self-consistency. Of course, this convergence is not guaranteed to occur, but if it occurs, we have found a solution (there could be more than one).
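In code, the bare procedure could be sketched like this. The function self_consistent is a generic loop; black_pixels_drawn is only a toy stand-in for "draw the chart, then count its black pixels", and its two constants (OUTLINE_PIXELS and PIE_PIXELS) are made-up numbers, not measurements.

```python
def self_consistent(method, guess, max_steps=100):
    """Reapply `method` to its own output until the output stops changing."""
    current = guess
    for step in range(1, max_steps + 1):
        result = method(current)
        if result == current:
            # input and output agree: self-consistency reached
            return result, step
        current = result
    raise RuntimeError("no self-consistent solution found within max_steps")


TOTAL_PIXELS = 210 * 158   # size of a small image
OUTLINE_PIXELS = 1430      # assumed: black pixels drawn whatever the data is
PIE_PIXELS = 9600          # assumed: pixels covered by the pie disc


def black_pixels_drawn(black_guess):
    """Toy model: the black slice covers its share of the disc, plus the outline."""
    return OUTLINE_PIXELS + (PIE_PIXELS * black_guess) // TOTAL_PIXELS


for start in (9192, 0):
    solution, steps = self_consistent(black_pixels_drawn, start)
    print(f"guess {start}: self-consistent at {solution} black pixels ({steps} steps)")
```

In this toy model the integer truncation means that different starting guesses may settle on neighbouring values, a small reminder that more than one self-consistent solution can exist.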

Let’s see it in action in a simplified form. I wrote two small Python programs. They use matplotlib and the Python Imaging Library. The first (called piechart.py) creates a pie chart from a given data input:

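import sys
from matplotlib import pyplot

# usage: python piechart.py <white> <black> <output.pdf>
white = int(sys.argv[1])
black = int(sys.argv[2])

# only the ratio between the two values matters for the size of the slices
pyplot.pie([white, black], colors=('w', 'k'))
pyplot.savefig(sys.argv[3], format="pdf")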

If we call this program specifying two values (the absolute values are not important, as the pie chart shows relative amount), it draws the pie chart accordingly:

python piechart.py 100 400 piechart_100w_400b.pdf
convert -geometry 210x158 piechart_100w_400b.pdf piechart_100w_400b.png

Starting guess

This creates a pie chart where white is 1/5 of the pie chart area and black is 4/5. Please note that due to a setup problem of my matplotlib I can only create pdf, so I convert the pdf into png of defined size, in our case, 210×158, using the convert program. The total size of the image is of course important, having an influence on the total number of pixels. I chose a good value for presentation purposes which guarantees quick convergence.

The second program is called imagedata.py and extracts size and number of white and black pixels from an image.

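import sys

from PIL import Image

# usage: python imagedata.py <image.png>
# convert to RGB so that every pixel is a (red, green, blue) tuple
im = Image.open(sys.argv[1]).convert("RGB")
white = 0
black = 0
for i in im.getdata():
  if i == (255,255,255):
    white += 1
  else:
    # we assume black everything that is not white:
    black += 1
# prints: width height white black
print(im.size[0], im.size[1], white, black)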

If we run this program on the png image, it will tell us how many pixels are white, and how many are black.

$ python imagedata.py piechart_100w_400b.png
210 158 23988 9192

Of the 33180 pixels defining the full image above (border included, not only the pie chart circle), 23988 are white (72 %), and 9192 are black (28 %). Hence the image is not representing itself: the plot represents our initial values of 20 % white and 80 % black.

Now we create a new image, in agreement with the iterative procedure, passing the most recently obtained values:

python piechart.py 23988 9192 piechart_23988w_9192b.pdf
convert -geometry 210x158 piechart_23988w_9192b.pdf piechart_23988w_9192b.png

and repeat the process. This becomes tedious very soon, so I wrote a driver (driver.sh) to perform the process for me

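# generates the starting guess
python piechart.py 100 400 iter_0.pdf
convert -geometry 210x158 iter_0.pdf iter_0.png

# iterative process
# note: there is no stopping condition, the loop runs until interrupted (Ctrl-C)
echo "step w   h  white black"
step=1
while true
do
 # measure the previous image: prints "width height white black"
 data=$(python imagedata.py iter_$(($step-1)).png)
 echo "$step - $data"
 # redraw the chart from the measured white (3rd field) and black (4th field) counts
 python piechart.py $(echo $data|awk '{print $3}') $(echo $data|awk '{print $4}') iter_$step.pdf
 convert -geometry 210x158 iter_$step.pdf iter_$step.png
 step=$(($step+1))
done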

If we run it, we immediately see a very interesting result:

step w   h  white black
1 - 210 158 23988 9192
2 - 210 158 29075 4105
3 - 210 158 30551 2629
4 - 210 158 30977 2203
5 - 210 158 31108 2072
6 - 210 158 31158 2022
7 - 210 158 31164 2016
8 - 210 158 31169 2011
9 - 210 158 31172 2008
10 - 210 158 31172 2008
11 - 210 158 31172 2008
12 - 210 158 31172 2008

The number of black pixels decreases, and the number of white ones increases. At every step, the image slightly changes, until it reaches a point where it does not change anymore: it has achieved self-consistency, and it is representing itself. This is a movie of the various steps until convergence:

Piechart convergence

What if we started from the other direction, namely, with a guess containing zero as the number of black pixels? The result would have been the same:

1 - 210 158 31750 1430
2 - 210 158 31320 1860
3 - 210 158 31221 1959
4 - 210 158 31184 1996
5 - 210 158 31178 2002
6 - 210 158 31174 2006
7 - 210 158 31172 2008
8 - 210 158 31172 2008
9 - 210 158 31172 2008

Again, even with a different starting guess, we obtain the same result, here depicted as a movie

Piechart convergence 2

I hope this gave a brief explanation of how Randall achieved the self-consistent image. His case was more complex, having three plots. Also, the comic is scribbled, so either he drew it by hand, approximating the computed result, or he performed some scribble-like transformation preserving the pixel count. I assume it is the former.

How much statistics should one know?

I just wrote an answer to this very interesting question on Stackoverflow. Now, as a disclaimer, I’m not an expert in statistics, but I did enough statistics to “know the beast”, or at least what the dangers are. I will rearrange my answer for this post, to address the more general case.

The main issue is “How much statistics should any person know?”. In our life, we all deal with statistics, willingly or not: polls, weather forecasts, drug effectiveness, insurance, and of course some parts of computer science. Being able to critically analyze the presented data draws the line between getting the right understanding out of it and being scammed, tricked, or misdirected.

Technically, the following points are important:

All these points are critical if you want to interpret anything with a grain of salt. Yet, they are not the whole story. Let’s face it. Statistics needs understanding before anything can be inferred, otherwise wrong conclusions will be obtained. I will give you some examples:

  • The evaluation of the null hypothesis is critical for testing the effectiveness of a method. For example, whether a drug works, or whether a fix to your hardware had a concrete result or it’s just a matter of chance. Say you want to improve the speed of a machine, and change the hard drive. Does this change matter? You could sample the performance with the old and the new hard disk, and check for differences. Even if you find that the average with the new disk is lower, that does not mean the hard disk has an effect at all. Here enters null hypothesis testing, and it will give you a probability, not a definitive answer, like: if the hard drive made no difference at all, there would be only a 10 % chance of measuring a gap this large. Depending on this value, you could decide to upgrade the hard drives of all 10.000 machines in your server farm, or not.
  • Correlation is important to find out if two entities “change alike”. As the internet mantra “correlation is not causation” teaches, it should be taken with care. The fact that two random variables show correlation does not mean that one causes the other, nor that they are related by a third variable (which you are not measuring). They could just behave in the same way. Look for pirates and global warming to understand the point. A correlation reports the possible presence of a signal, it does not report a finding.
  • Bayesian inference. We all know Bayesian spam filters, but there’s more, and it’s important to see how human decisions and mood can be influenced by a clear understanding of data analysis. Suppose someone goes to a medical checkup and the result says they have cancer. Fact is: most people at this point would think “I have cancer” without any doubt. That’s wrong. A positive test for cancer moves your probability of having cancer from the baseline for the population (say, 12 % of women for breast cancer) to a higher value, which is not 100 %. How high this number is depends on the accuracy of the test. If the test is lousy, you could just be a false positive. The more accurate the method, the larger the shift, but still not 100 %. Of course, if multiple independent tests all confirm cancer, then it’s very probable it is there, but still it’s not 100 %. Maybe it’s 99.999 %. This is a point many people don’t understand about Bayesian statistics.
  • Plotting methods. That’s another thing that is always left unattended. Analysis of data does not mean anything if you cannot convey effectively what the data mean via a simple plot. Depending on what information you want to put into focus, or the kind of data you have, you will prefer an xy plot, a histogram, a violin plot, etc. Each data insight has a different preferred plot, exactly as each conversation has a different appropriate wording.
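To put some numbers on the Bayesian point above, here is a small Python sketch of Bayes’ theorem applied to a positive test. The 12 % baseline is the one quoted above; the test accuracy figures (90 % sensitivity, 10 % false positive rate) are assumptions picked only for illustration, and treating the second test as fully independent is an assumption too.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """Probability of having the disease after one positive test (Bayes' theorem)."""
    true_positives = sensitivity * prior
    false_positives = false_positive_rate * (1.0 - prior)
    return true_positives / (true_positives + false_positives)


prior = 0.12                 # baseline quoted in the text
SENSITIVITY = 0.90           # assumed: chance the test is positive if you are ill
FALSE_POSITIVE_RATE = 0.10   # assumed: chance the test is positive if you are healthy

after_one = posterior(prior, SENSITIVITY, FALSE_POSITIVE_RATE)
# a second, independent positive test starts from the updated probability
after_two = posterior(after_one, SENSITIVITY, FALSE_POSITIVE_RATE)

print(f"baseline: {prior:.0%}")
print(f"after one positive test: {after_one:.1%}")
print(f"after two positive tests: {after_two:.1%}")
```

With these made-up accuracies, one positive result lifts the probability well above the baseline but leaves it far from certainty, and each further independent confirmation pushes it closer to 100 % without ever reaching it.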

Statistics enter our lives every time we have to distill an answer or compare numerical (or reduced to numerical) data from unreliable sources: a signal from an instrument, a bunch of pages and the number of words they contain, and so on. Think for example of the algorithm to perform click detection on the iPhone. You are using a trembling, fat stylus (also known as a finger) to point to an icon which is much smaller than the stylus itself. Clearly, the hardware (capacitive touchscreen) will send a bunch of data about the finger, plus a bunch of data about random noise from the environment. The driver must make sense out of this mess and give you an x,y coordinate on the screen. That needs a lot of statistics.

An additional issue is sampling. Sampling actually comes before statistical analysis: you collect a sample, reduce it to a number, and perform statistics on this number (among many others). Sampling is a fine and delicate art, and no statistics will correct, or even point out, an incorrect sampling, unless you act smart. Sampling introduces bias, either from the sampler, the sampling method, the analysis method, the nature of the sample, or the nature of nature itself. A good sampler knows these things and tries to turn unwanted bias, as much as possible, into a random distribution, so that it can be treated statistically.

As a closing remark, statistics is among the most powerful allies we have to understand the noisy universe we live in, but it’s also a very dangerous backstabbing enemy, if not used properly. Willfully misusing it is definitely evil.

Periodic table of videos

I found this very interesting site about the periodic table of elements, from the University of Nottingham. For each element, there’s a video showing the characteristics of the element, and a brief commentary. Worth checking out if you always had some curiosity about the chemical elements, what they look like, and how they behave.

They also have a YouTube channel for even more interesting short movies about chemistry and physics.
