Computational chemistry development in research

Imagine you are a professor in organic chemistry. You received financial support for a project, and you are ready to hire a Ph.D. student to make it happen. The project requires the synthesis of a new compound.

Imagine you interview your best candidate. At the whiteboard, you present him with various problems on how to synthesize different products, and you find he is very good at planning these strategies. He also knows how to estimate the yields of the products, and how much of each starting reagent is needed. He knows the temperatures and catalysts required, as well as the conditions that may break down the product. He can also write excellent articles. He seems a very promising person, so you hire him.

After a few weeks, he joins the group and enters your lab for the first time. Here you find out the following about him:

  • It is the first, maybe second time he has set foot in a lab.
  • He does not know the names of lab glassware. He has never heard terms like “flask” or “test tube”. He actually gives you a strange look when presented with them, and says that the only time he did an experiment during his course, he used a pan and it was enough.
  • He is unable to keep his bench and glassware clean, and more often than not, he throws away used glassware instead of washing it.
  • He constantly breaks glassware, scattering glass and reagents around, damaging the work of his colleagues who have to clean his mess or just deal with it in the most improbable ways.
  • He is unable to assemble glassware (for example, he cannot build a distillation setup). He puts clamps on the wrong way, and uses neither silicone grease on the joints nor clips to keep these joints sealed and prevent “popping” due to pressure buildup.
  • He has never run a chromatography or recorded an infrared spectrum. He has only vaguely heard about them, but his course professor said that they are useless and that you can identify a substance with a sniff, which is enough most of the time.
  • He insists on preparing and using his own very impure reagents, even when high-purity, standard reagents are available from a reagent reseller. He claims that it takes too much time to call them and place the order. Also, he is unable to understand the codes and numbers written on the bottle’s label.
  • He labels his flasks with cryptic names, and passes them to colleagues who constantly have to ask him what they contain. Most often, he does not remember and has to check his notebook. When he does remember, it turns out that the product inside is either contaminated or damaged.
  • He disposes of his byproducts down the drain, instead of using the proper disposal units. When confronted about it, he says “who cares? it’s less trouble”.
  • When asked about his low-quality work, he claims that it is neither his task nor his core expertise to produce top-quality laboratory work, or to know how to use laboratory equipment, as long as he can manage to get the required product at the end. He also points you to a very efficient synthetic strategy he just devised at the whiteboard, claiming that he is doing excellent work.
  • At the end of his employment, he leaves a notebook containing the process to create the product. The notebook is completely disorganized, and the pages are mixed up and not numbered. The handwriting is close to impossible to read, it is written in four different languages, and the quantities are specified as “a pinch of”. However, the glassware setup he made and left behind, when fed with the contents of an unlabeled reagent bottle you find on a bench downstairs, gives you the product, but only if the humidity in the laboratory is at exactly 75 %.

My question for you, the professor of the group, is the following: would you keep this person in your group, or would you dismiss him?

Same story, different chemist

The scenario presented above is considered the norm in a different branch of chemistry, theoretical and computational chemistry. While “white coat” chemists use whiteboard synthesis strategies, a laboratory, glassware tools, spectroscopic instruments, chromatography and distillations, theoretical and computational chemists use mathematics, computers, editors and software development. Both these sets are tools for the job, and in order to practice the discipline, proficiency with them is expected. Yet, in the discipline of theoretical and computational chemistry, the general level of competence and proficiency with its tools can be rightfully compared one-to-one with the scenario given above for the case of organic chemistry.

Organic chemists are supposed to be able to use a laboratory environment with high quality standards and protocols. By analogy, computational chemists are supposed to be able to use a software development environment to high quality standards and protocols. In both cases, to perform their research they should be required to be masters of the basic tools, and proficient in a broad set of specialized and modern tools. It is inexcusable not to be.

“Google plus” and “What do you love”

I got access to Google+ at its release and have been playing with it since, so I feel obliged to join the crowd and say something about it.

I first want to state one important point. I am not a fan of social networks, at all, except when they are useful (such as LinkedIn). Why? For four reasons, all boiling down to two main concerns, privacy and relevance:

  1. The strongest violations of your privacy (and everyone has one) come not from what you put online, but from what others put online with your name slapped on it. You have little control over this in general, but a tightly knit, open “village square” is more likely to induce others to talk about personal facts about themselves and others in broad daylight. We all need a little privacy, ranging from that time you got sick from too much something, to your family’s current or past health. Even a “How is your mother?” thrown into a large pool of contacts by accident may not produce a really pleasant discussion. It’s a matter of tact, discretion and courtesy, and social networks may tend to undermine this out of acquired habit (depending on how they are designed). In the end, however, it’s always about people.
  2. It’s very difficult to remove something once it’s out on the web. Some things are better left forgotten, because they may lead to harassment from idiots (whose absolute number is staggeringly high in the crowded square of the internet). Nobody likes to be harassed, and occasionally you would just want to remove things with some degree of confidence.
  3. A central idea of some social networks is that “all friends are equal”. This is not true. Your boss may be a “friend” in terms of semantic connectivity, but he or she is certainly not a friend in the human sense. A similar argument goes for parents, relatives and so on. There’s a stronger and very, very accurate semantic meaning to the label we assign to a social connection, which in most cases is nuanced down to the level of the single person. Squashing it flat reminds me of Orwell’s 1984 Newspeak: “remove all shades of meaning from language, leaving simple dichotomies (pleasure and pain, happiness and sadness, goodthink and crimethink)” (quote from Wikipedia). Human lives generate data, and this is the information age. We want to have our life, our data, our experiences and memories accessible to humans and computers for many practical and non-practical reasons. Having computers interpret our complex human lives, with all their data, nuances, errors, and inconsistencies, is an almost impossible task: we are still unable to make computers as intelligent as people, so we are making people as dumb as computers, to meet halfway.
  4. Most of the social network traffic is irrelevant noise. I may be interested in some people’s posts, but I want my stream of information balanced, and I want to check some streams more frequently, other streams less frequently. Some people may babble a lot about things I am interested in once a month. Some others, I want to hear as soon as I can.

So much for the motivations. What about Google+?

Google+ is Google Wave with a usable interface. I may even be tempted to say that Wave was just a “phony experiment” to stress the protocol and backend, but that’s just a wild guess. The shared document concept is now applied to the small chunks of information we post in the streams. Instead of picking the participants one by one, they can be classified once and for all and referred to as a whole: the Circles. Adding comments, posting links, images and Google Maps positions were already Wave features, some of them provided by Wave widget plugins. I would not be surprised to see the old Wave widget plugins become Google+ applications soon, so you can play chess with a friend on Google+, while others watch and comment on the moves (Edit: two weeks after this post went live, Google announced Google+ games). Visibility of a post is nothing but Wave’s adding people to, or removing them from, the post participants list. Google+ also introduces social graph features not present in Wave, with a strategy similar to Twitter: one-sided “following”, as opposed to a fully two-sided “friending” negotiation, making it feel less like a “personal mailing list of (huge number of) friends” and more akin to a personal blog with many, possibly unknown readers. Finally, it removed the Play/Rewind feature, admittedly useless in this new context.

I liked Wave for its intrinsic underlying power, so it would be inconsistent of me to say that I don’t like Google+. In fact, it’s quite ok: it gives much more control over how information is handled, and it adequately addresses the concerns I wrote about. It allows a complete removal of all your Google+ data if you so wish, not just “disabling”, and the overall appearance is fresh, although some improvements are still needed. I would not use it for anything too personal, but as a complement to or even substitute for Twitter, to post information that may be relevant only to some people I know.

Closing this post, I also want to draw your attention to “What do you love?”, some kind of Google aggregator that searches everything available about something you claim you love. I’ve never seen it before, so I assume it’s something new, but I may be wrong. Don’t bother poking it with four-letter words: Google’s engineers know you really love kittens.

The end of the space age

Today, an era ends.

Credit: NASA

Today the last Shuttle, Atlantis, is scheduled to land for the last time, closing the era of the Shuttle missions, and basically the Space age.

Why do I say so?

Well, I don’t think I should spend low-grade effort explaining something that has already been professionally written up in The Economist. Instead, I want to say thank you.

I want to say thank you to everyone who made this marvel of technology possible, a companion of life, the visible proof of what humans could accomplish if they just focused on “doing” rather than “babbling”, on catching dreams rather than following lies, on concentrating rather than dispersing. Unfortunately, it is painful to observe that the US technological advancement when it comes to space arose from a head-to-head with the USSR during the Cold War. There was nothing inspirational about it. It was just a field where the US could fight, and indeed won, a war of propaganda against their enemy of the time. Had the field been how to prepare delicious fried potatoes, today we would have the best frying pan ever produced by mankind.

But I digress. We took the chance, and won our race, as humans, to free ourselves from the comfort of our own planet, and got there, in the weightless darkness that can kill and amaze you.

The country that gave us this:

and this

today puts less money into this activity than is required to air-condition soldiers’ tents (check the audio). True? False? I don’t know. I also don’t know how true it is that the US budget in 2010 for the Department of Defense was 663 billion dollars, vs. 46.7 billion for the Department of Education and 26.3 billion for the Department of Energy. That’s about 7 % and 4 % respectively of the DoD budget. These are the numbers I found. However accurate they may be, they still explain a lot about the current status of the country.

Kitchen scheduling and administration

I don’t like TV, but occasionally I come across something that pokes my brain. I am an interested observer of a TV “reality show” called “Gordon’s Kitchen Nightmares”. Professional chef Gordon Ramsay travels from restaurant to restaurant in order to analyze and (allegedly?) try to fix their troublesome financial issues, generally due to bad managerial and food preparation habits.

The point of this introduction is that I am fascinated by restaurant kitchens. It’s an impressive task of scheduling, management and resource optimization for a reliable quality of service, under an insane amount of time pressure. When it comes down to a professional environment that requires fully dedicated and flawless teamwork, nothing compares to restaurants. OK, probably train stations and airports, but that’s a different scale. Here are some of the issues they have to handle:

  • Different food likely requires different preparation time, but when you receive an order from a table, people sitting at that table are supposed to get the food at the same time.
  • Some components can be prepared beforehand, thus paying the price of a slow preparation step earlier, when the restaurant is still closed. When the actual meal needs to be prepared, the component is ready to be used. It is then important to decide which components can (and must) be prepared beforehand, and the appropriate quantities (so that there’s enough for the whole evening, but not too much so that it expires) while at the same time preserving food quality.
  • If someone gets a first course, the second course is supposed to come after a while, giving him the time to eat the first, relax a few minutes, and start the second. Timings must be calibrated so that the client doesn’t get the second course too early (while he is still eating the first one) or too late (having him wait for too long). In addition, if you have a table of people, clearly you should start serving the second course only when every guest has finished their first one.
  • Stoves, ovens, ranges and helper chefs are finite resources that must be allocated properly. If you have to prepare ten portions of spaghetti, you don’t use ten helpers and ten ranges at the same time. Ovens have a fixed maximum capacity, and you have to take it into account if extensive use is required, possibly at different temperatures.
  • A broad menu requires a lot of ingredients, hence more space in the fridge, and more stuff that is potentially unused at expiration date, thus increasing waste. A small menu is more manageable, but it removes customer choice and customer return value, which drives profits.
  • A large number of tables puts more strain on the kitchen, increasing the likelihood that the waiting times will become unsustainable, hurting the restaurant’s reputation. A small number of tables leads to less pressure on the kitchen, but it also means reduced earnings and fewer picks from the menu, increasing the likelihood of expired food in the fridge.
  • There are two major and very different patterns of client flow: the mass commit (such as at opening time, or if a whole bus of tourists decides to stop by and have a meal) and the constant flow, where the new orders are regularly spaced and evenly distributed in time. In both cases, a well organized kitchen must deliver food within an acceptable time span after the order has taken place.
  • The chef must choose the proper menu that optimizes preparation time, resource usage, ingredient consumption, price, local taste of customers, and preparation skills of his team.
  • The size of the dishes is important for business. Nobody wants to see a big dish with a small portion in it, unless you are a famous Nouvelle Cuisine chef, of course. Normal customers in normal restaurants prefer a proportionate amount, so a large dish forces a large amount of food to be put in. Customers leaving with doggy bags are indicative of an excessive amount of served food, which is a waste of money and requires high prices leading to a less competitive restaurant.
  • In some cases, the preparation (or part of it) can be delegated to the waiter, thus freeing kitchen manpower. Examples are finishing desserts (e.g. burning the sugar on the Crème brûlée with a torch) or carving a big roasted chicken directly at the table, for both delegation and customer satisfaction.
  • On top of all this, the kitchen has to deliver good food, from good fresh ingredients.
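Two of the constraints above (dishes for one table arriving together, and finite oven capacity) can be sketched in a few lines. This is only an illustrative sketch: the `Dish` class, the dish names, the preparation times and the oven capacity are all made-up assumptions, and a real kitchen has to juggle many tables and resources at once.

```python
import math
from dataclasses import dataclass


@dataclass
class Dish:
    name: str
    prep_minutes: int  # hypothetical preparation time, not a real figure


def start_offsets(dishes):
    """Minutes after the order at which each dish should be started
    so that every dish for the table is ready at the same moment."""
    longest = max(d.prep_minutes for d in dishes)
    return {d.name: longest - d.prep_minutes for d in dishes}


def batches_needed(portions, capacity):
    """Number of rounds a finite resource (an oven, a range) needs
    to handle the requested number of portions."""
    return math.ceil(portions / capacity)


# One table orders three dishes with very different preparation times.
table = [Dish("risotto", 25), Dish("steak", 12), Dish("salad", 5)]
offsets = start_offsets(table)

# Ten portions in an oven that is assumed to hold four at a time.
oven_rounds = batches_needed(portions=10, capacity=4)

print(offsets)
print(oven_rounds)
```

The slowest dish starts immediately and every other dish is delayed by the difference, which is the simplest way to make everything land on the pass together.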

And let’s not underestimate additional human factors. Being a professional chef means that the salary is most likely low, the work schedule tough, and there are no regular vacations, but if you get proficient enough and you have some good professional connections and a bit of luck, you can get to work on one of these

It kind of helps… I think.

Bad science, good science – Part 3: Developing a critical eye

In this final post, I want to give some form of grocery list to get an idea of the reliability of scientific communication performed by general purpose media. To do this, I necessarily had to introduce some nomenclature in the previous posts, in particular about citations, type of article, structure of an article, Impact Factors, and h-Index. This nomenclature will now become useful.

Suppose you read an article in your local newspaper, claiming some new impressive scientific breakthrough, something like

“crocodile tears make you lose weight”

The article details how Professor Whatever at the University of Smart People reports, in a recent article in the Journal of Succulent Science, that a substance contained in crocodile tears, Vitamin Z, can increase metabolism and therefore make you lose weight, according to his research. The message of the article is that a daily dose of crocodile tears will burn fat away.

Not so fast, I say. The claim is unlikely to be that simple. It has been simplified to make it accessible, but reality is much more complex. You may need to check the assertions a bit, along with the reputation of the involved parties and the actual result of the research. Let’s go with reputation first:

  1. Check the reputation of the journalist. Is he or she qualified as a scientific journalist? Does he or she have a good background in the science and scientific terms of the discipline being presented? This point allows you to get an idea of how much the journalist understood about the research result, and how much is hyperbole.
  2. Check the reputation of the scientific journal of the cited scientific article. Is it a real journal, or just something with a sciencey name that is not a reputable journal? When was the journal started? Does it have a good impact factor?
  3. Check the reputation of the authors of the scientific paper. What is their h-Index? To which institution are they affiliated? Are there any conflicts of interest that may bias the evaluation?
  4. Did the journalist get the information he or she presents directly from the paper, or from an agency? How many steps happened between the information source and the journalist?

You should then proceed to question the claim:

  1. Does the result arise from new research (therefore more likely to have bias due to smaller sampling groups for a preliminary assessment) or from a review, which is likely to have larger samples and more accurate and additional considerations?
  2. How much active substance is needed to deliver the presented result, and how much raw material do you have to drink/eat in order to obtain the requested amount of active component in your body? Suppose you would have to drink 4 liters of crocodile tears per day, would you do it, and would it be easy and cheap to obtain them, even if you technically could drink them?
  3. Does the substance become available in the body, or does it get expelled or undigested unless administered through unconventional methods? IV injection is a very unpleasant way to get substances in your body. If it’s a food, does it break down if you cook it, and is cooking the only way of eating it?
  4. Was the experiment performed on humans, on laboratory mice, or on yeast cells? Mice can be a valid model organism for humans in most cases, but metabolism can be different, and so is the effect, considering the different weight of a mouse and a human.
  5. How many people were used in the experiment? 10, or 10,000? The size of the sample is very important. Results on a small sample may be due to random chance, or to a particular sensitivity (or lack thereof) of that particular set of people to the substance. Were the selected people all of the same ethnicity, age, sex, or lifestyle, or did the effects differ within the group? Was there a placebo control? If you take a group of healthy, young students at the end of winter and want to check if crocodile tears have an effect on losing weight, you may obtain a (wrong) positive answer: in reality they lost weight because, with Spring outside, they started biking more.
  6. Suppose the claimed effect actually exists, but it allows you to lose only 20 grams per month. Does it really matter in practice?
  7. Is the presence of an effect factual (e.g. some sort of metabolism increase) but the conclusion far-fetched (losing weight)? How well are the actual signal and the claimed outcome correlated?
  8. If the proposed substance is a drug, has it already passed all the clinical trial phases? Promising drugs may be cut short and never see the market because of troublesome effects found during trials.
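The point about sample size can be made concrete with a small simulation. The sketch below assumes a substance with no effect at all, and monthly weight changes that are pure random noise; the spread of 2 kg, the 0.5 kg threshold and the group sizes are invented numbers chosen only for illustration.

```python
import random
import statistics


def chance_of_spurious_loss(group_size, trials=500, spread_kg=2.0,
                            threshold_kg=0.5, seed=0):
    """Fraction of simulated studies in which a group taking a useless
    substance still shows an average loss of at least threshold_kg."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Each person's weight change is noise centered on zero (no effect).
        changes = [rng.gauss(0.0, spread_kg) for _ in range(group_size)]
        if statistics.fmean(changes) <= -threshold_kg:
            hits += 1
    return hits / trials


small = chance_of_spurious_loss(group_size=10)
large = chance_of_spurious_loss(group_size=1000)

print(f"10 people:   {small:.1%} of studies look like a success")
print(f"1000 people: {large:.1%} of studies look like a success")
```

With ten people, a noticeable share of these imaginary studies would report a "weight loss" produced by nothing but chance, while with a thousand people the average stays very close to zero.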

I mostly focused on health. What about new exceptional technologies to solve problems we know we have, and problems we didn’t know we had?

  1. Does the stated technology actually last, or does it degrade within a few days or months? Does it still carry potential problems such as excessive fragility?
  2. Does it introduce new problems, such as higher pollution? What is the strategy to deal with these new problems?
  3. Is there enough of the material needed to produce the technology? Some materials are intrinsically rare, and this may limit the impact of the technology to a very small number of people, unlikely to have a relevant effect on the broader market.
  4. Is the promised technology able to push aside the legacy and habits of its future customers?

There may be many additional remarks to check, but in general the safest approach is to remember that

  • if it sounds too good to be true, it probably is, or there may be a catch
  • if it’s too simple, important details have probably been left out
  • beware of the telephone game
  • beware of incorrect use of statistics

but as a counter-argument, remember that only through knowledge can we better understand what is presented. Skepticism is good as long as it is knowledgeable skepticism. Ignorant skepticism is as bad as ignorant acceptance.

What makes the color of things?

Suppose someone gives you the chemical formula of a substance, such as

and asks you the color this substance is expected to have. Is it possible to give an answer? In most cases, you may have an educated guess, but an accurate prediction is far from trivial: the color of a substance is decided at various levels, from the basic molecular level up to the macroscopic structure.

The first level: the molecule by itself

The most “trivial” level is the molecule by itself, and it is decided by the elements it is made of, its geometric structure (the position of the atoms), and their charges. These parameters have a key impact on how its electrons are distributed in space and how this distribution changes when light enters the scene, a phenomenon which is strongly related to light absorption and thus to color.

When it comes to perception of visible light, white light is a mixture of all the wavelengths of electromagnetic radiation from ~700 nanometers to ~400 nanometers. These wavelengths are perceived by our eyes (and brains) as colors, with the longer value of 700 nanometers being almost infrared and the shorter 400 nanometers being almost ultraviolet.

EM spectrum

In a more simplified rewording, white light is a mixture of all the colors of the rainbow, spanning from red to violet passing through yellow, green, blue etc. as beautifully shown by this prism

When you shine white light on the molecule you basically provide all the colors. The electronic setup of the molecule is such that it “prefers” specific light wavelengths (hence, specific light colors), and this preference results in an absorption. This is due to the light promoting an “electronic transition” between a ground state and an excited state: the electronic distribution is rearranged due to the interaction between the electrons and the electromagnetic radiation. A simplified vision of this event is the electron “jumping” to a higher, excited level, but in reality, it is the electronic cloud that changes.

Transition from a ground state (E1) to an excited state (E2) due to absorption of an incoming photon of light (hν)
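The energy carried by the photon, hν, fixes the gap between the two levels, E2 − E1 = hν = hc/λ. As a rough sketch, the snippet below converts the two edges of the visible range mentioned earlier (~700 and ~400 nanometers) into photon energies in electronvolts. The function name is mine; the constants are the exact SI defining values.

```python
PLANCK_J_S = 6.62607015e-34        # Planck constant h, in J*s
LIGHT_SPEED_M_S = 299_792_458.0    # speed of light c, in m/s
ELECTRON_VOLT_J = 1.602176634e-19  # one electronvolt, in J


def photon_energy_ev(wavelength_nm):
    """Energy of one photon, E = h*c/lambda, expressed in electronvolts."""
    wavelength_m = wavelength_nm * 1e-9
    return PLANCK_J_S * LIGHT_SPEED_M_S / wavelength_m / ELECTRON_VOLT_J


red_edge = photon_energy_ev(700.0)
violet_edge = photon_energy_ev(400.0)

print(f"700 nm: {red_edge:.2f} eV")
print(f"400 nm: {violet_edge:.2f} eV")
```

A molecule is colored only if it has a transition whose energy gap falls between these two values, roughly 1.8 to 3.1 eV.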

The accumulated energy is then “quenched” (dispersed) as heat. As a consequence, the molecule removes some colors from the white light, leaving others unscathed, and the resulting color we see is the complementary one. If the molecule absorbs blue, you get red. If it absorbs yellow, you get violet. Absorption in general is not “all-or-nothing”. The intensity of absorption at each wavelength depends on many factors, producing what is called an absorption spectrum, which is unique and characteristic of every molecule or atom. The color of the substance is the complementary result of this spectrum. The uniqueness of the spectrum allows us to infer the composition of our Sun, and of distant stars and planets, through what are commonly known as Fraunhofer lines.

Fraunhofer lines are absorption lines of atoms in the Sun’s atmosphere. Some absorption is also performed by the Earth’s atmosphere. They act as fingerprints for a given atomic species.

Electronic transitions, however, are not the only mechanism responsible for absorbing light. A molecule can also absorb light by excitation of rotations and vibrations (meaning that the molecule spins faster, or vibrates more). One case is water. Water appears transparent, but in reality it’s slightly blue. The reason is that some wavelengths in the red make it vibrate more (to be exact, water absorbs in the infrared, which would not make a difference to our eyes, but this absorption has a so-called “overtone” which is in the visible red). As a result, a minimal amount of red is subtracted from white light and water ends up being slightly blue.

Can we predict this information? Yes, we totally can, with relatively good, but not perfect accuracy. There are many different programs capable of obtaining this information: the wavelengths where absorptions occur, vibrations, and other parameters that are important to decide the final spectrum. For atoms and small molecules, accuracy is very good, but as the molecule size increases, predictions require larger and larger computing power. For this reason, quantum chemistry method developers daily create new smart approximations, able to deliver a very accurate result for a reduced computational cost. In any case, the required input is just the geometric position (xyz coordinates) of the atoms, their atomic numbers and masses, and the net charge.
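To give an idea of how small that input really is, here is a sketch that describes a water molecule and writes it in the plain XYZ text format (atom count, a comment line, then one line per atom with coordinates in ångström). The `Atom` class and helper names are my own, the geometry is only approximate, and masses are left out for brevity; this is not the input format of any specific program.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    symbol: str
    atomic_number: int
    x: float  # coordinates in angstrom
    y: float
    z: float


def to_xyz(atoms, comment=""):
    """Serialize the atoms in XYZ format: count, comment, one line per atom."""
    lines = [str(len(atoms)), comment]
    for a in atoms:
        lines.append(f"{a.symbol} {a.x:.4f} {a.y:.4f} {a.z:.4f}")
    return "\n".join(lines)


def electron_count(atoms, net_charge=0):
    """Electrons in the system: sum of atomic numbers minus the net charge."""
    return sum(a.atomic_number for a in atoms) - net_charge


# Approximate, illustrative geometry for water (not an optimized structure).
water = [
    Atom("O", 8, 0.0, 0.0, 0.0),
    Atom("H", 1, 0.757, 0.586, 0.0),
    Atom("H", 1, -0.757, 0.586, 0.0),
]

xyz_text = to_xyz(water, comment="water, approximate geometry")
print(xyz_text)
print("electrons:", electron_count(water, net_charge=0))
```

Everything else (the electron distribution, the transitions, the spectrum) is computed by the program from these few numbers.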

The second level: molecular interactions and reactions

Molecules are generally not alone. They can come close to other molecules, either of the same species, or of other species, such as those of a solvent: from simple water, alcohol or acetone, to a complex cell environment. There are no reactions involved, just the proximity of other molecules, with their protons and electrons. These partners alter the electronic setup of the molecule, promoting a slight variation of its electronic and vibrational behavior. Absorption, and thus the color, is consequently changed. In general, this change is a shift of the original spectrum either towards longer wavelengths (bathochromic shift) or shorter ones (hypsochromic shift).

Then you have anything that can change the structure of the molecule through a chemical reaction. Take tea, put some lemon into it, and its color becomes lighter. The reason is that with lemon you are increasing the acidity of the water, which means a higher concentration of charged hydrogen ions (H+). The higher concentration of hydrogen ions pushes thearubigins, a class of colored substances found in fermented tea, into a form with the ion attached, which creates a change in the molecular electronic distribution, which in turn changes the absorption and thus the color. For some substances, this effect can be dramatic: from blue to red, from transparent to purple, from yellow to blue. These are the so-called pH indicators.

Bromothymol blue in acidic, neutral and alkaline solution (left to right).
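How sharply an indicator switches between its two forms can be estimated with the Henderson–Hasselbalch relation, which gives the fraction of molecules carrying the extra hydrogen ion as a function of pH. The sketch below is deliberately simple: the pKa of 7.0 is an assumed round number for a bromothymol-blue-like indicator, not a measured value, and the function name is mine.

```python
def protonated_fraction(ph, pka):
    """Fraction of indicator molecules with the hydrogen ion attached,
    from the Henderson-Hasselbalch relation."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


ASSUMED_PKA = 7.0  # illustrative assumption, not a reference value

for ph in (4.0, 7.0, 10.0):
    fraction = protonated_fraction(ph, ASSUMED_PKA)
    print(f"pH {ph:4.1f}: {fraction:.1%} in the acidic-colored form")
```

Three pH units on either side of the pKa are enough to move essentially all of the molecules into one form or the other, which is why the color change looks so dramatic.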

These effects may technically be predictable, but they require considering a complex system of interacting species, with different chemical exchanges, short- and long-range interactions of charges and so on. This may be very difficult, if not impossible, to perform with today’s methods and computational power, although approximations exist to work around the heavy computational weight and provide reasonable results.

The third level: crystal and impurities

A crystal is a solid where the constituent molecules or ions are arranged in space with a well-defined order. For any given substance, the ordering of its atoms or molecules in space is not necessarily unique, a phenomenon known as polymorphism. Depending on the packing, different properties arise, and different colors are the result. Diamond is transparent, graphite is black, and so is C60 fullerene, but they are all made of the same element: carbon.

Diamond and graphite

For another example, take gold. You may say that it has gold color, but if you take a small cluster (say, 100 atoms) of gold, what you see is red, not gold-colored.

When you have atoms or molecules ordered in a crystalline structure, the result can absorb light by virtue of this ordered structure. Note that this effect comes in addition to the initial absorption characteristics of the molecule or atom taken by itself. For example, one single atom of carbon may absorb close to nothing in the visible, but due to the highly ordered crystalline structure, the macroscopic block of graphite you hold in your hands absorbs light, most of it, and thus is black. A similar effect occurs with any pigment whose crystalline structure influences the color. At the quantum level, the effect just presented is related to band structure and Bloch wavefunctions. The same facts also explain semiconductors and the conductivity of metals.

These effects are relatively predictable. A large number of computational programs deal with periodic structures in a very efficient way, providing spectroscopic information about the properties of both atomic and molecular crystals.

As an additional twist, crystals can have defects, such as imperfect packing or impurities of foreign elements in the periodic structure. The resulting effect is beautifully shown in diamonds, for example in the Aurora Pyramid of Hope

and in aluminium oxide: pure, it is colorless. Add some chromium, iron, vanadium and titanium and it may become ruby

or sapphire, which is blue, pink, yellow, orange, purple or green, depending on the crystal structure, and the relative quantities of these impurities.

These effects are generally very hard to compute, as they may require statistically large ensembles of atoms. I am not aware of any computational techniques in this regard.

The fourth level: macroscopic properties

Finally you have how the substance is structured at the macroscopic level. Take a smooth platinum electrode: it is platinum color. Make it sponge-like (by making very tiny bubbles and pits) to increase the surface area and it appears black as coal. The reason is that light is scattered and absorbed completely, leading to a black color.

This opens the door to many additional effects concerning matter-light interaction. What is the color of a CD? Is it silver? Is it “rainbow”? What about the color of an oil slick on the road on a rainy day? What about the color of a tiger’s eye, or of an opal?

And what about blue eyes, and the blue color of a spoon of flour dispersed in water? Both are due to Tyndall scattering. There is no blue pigment in blue eyes, nor in flour, but the scattering of light is frequency dependent: blue is scattered back and red is transmitted, leading to a blue color.

This Wikipedia and Wikimedia Commons image is from the user Chris 73 and is freely available at http://commons.wikimedia.org/wiki/File:WaterAndFlourSuspensionLiquid.jpg under the creative commons cc-by-sa 3.0 license.

As you see, color is a very particular property, and while you may make an educated guess from quantum mechanics techniques, it’s not always easy to infer the color of a substance. This is just the tip of the iceberg. Many other phenomena (such as how much light penetrates into the substance, or which macroscopic imperfections are present) affect both the color and the reflective properties of a substance. Ice is transparent, but if it’s full of bubbles it is white. Plastic looks like plastic, and metal looks like metal, depending on how light is scattered and absorbed, which then changes the way it is reflected back to the viewer. In addition, this affects not only color, but also the general material texture.

What about the opening molecule?

The opening molecule is Indigo, a natural dye found in some plants. Today, it is synthetically produced in large quantities. It is commonly used to dye blue jeans.

Bad science, good science – Part 2: Reputation, quality and meaning

This article continues my series of three articles on how to defend yourself from bad scientific communication perpetrated by non-scientific newspapers. The first post detailed the scientific article and the mechanism of citations. In this post I will detail the peer reviewing process, and two numbers, the Impact Factor and the h-Index, used to obtain a very rough estimate of the authoritativeness of journals and scientists. I want to stress this point, and I will stress it even more: using these values gives a very rough evaluation, which is not an absolute verdict. It is just a potential signal that, together with other evidence, may give an insight into the trustworthiness of a scientific finding.

Peer reviewing

Bringing an article from a draft to a polished publication for a scientific journal requires going through a process called “peer reviewing”. Peer reviewing means that your submission is scrutinized by other experts (known as referees) before it is accepted for publication. This process aims to satisfy the following needs:

  • filter out articles that are not appropriate for the journal (e.g. an article on lab synthesis in a journal for computational sciences)
  • check if the article satisfies basic requirements for rigorous scientific investigation, allowing it to be reproducible, verifiable, and with proportionate claims
  • check if mistakes have been made in the procedure, such as an incorrect collection of samples (e.g. checking the water quality of a lake by collecting samples from only one point of the lake, far from the source of pollution), incorrect analysis of data (e.g. not enough samples, like testing a drug on only one person), introduction of errors (e.g. using an unreliable analysis kit) and many others.
  • request additional proof for a claim to be considered valid, typically because the claim made by the authors is too general for the amount of data available.
  • point out similar techniques, or missing citations.
  • ask general questions to the author about some aspects of the paper.

Peer review is generally done anonymously: when you submit an article, the journal editor chooses two or three referees considered suitable to give a sensible opinion on your claims. The editor collects their feedback and forwards it to you as anonymous reports. Your name, on the other hand, is generally known to the referees, although this may not always be the case.

Claiming that something has been peer reviewed is not necessarily a guarantee of certified scientific quality. Suppose I decide to start a journal, give it a sciencey name, and have my mom (who is not a scientist) do the peer review. Although this technically would be a peer reviewed journal, what is published in it will not necessarily be authoritative. Thus, claiming something is published in a “peer reviewed journal” says little about its scientific value and correctness: it just says that someone else took some kind of look at the article before publication. Sounds far-fetched? It happened (Link 1, Link 2).

Even with recognized journals, the peer reviewing process can vary from very strict to lax, depending on editorial policies and choice of referees. Different journals have different levels of strictness: some journals want just the “cream of the crop” of science, being ruthless about what gets on their pages, and selecting not only for scientific excellence but also for interdisciplinary impact. Journals such as Nature and Science belong to this category. Other journals may focus on a very narrow scientific field, with editors delegating to a pool of highly reputed, very tough referees who demand a certain level of importance in your claims and tear your paper to pieces before accepting it. Finally, you may have journals accepting articles with low impact on the discipline, or even with experimental or methodological errors.

Needless to say, the process of peer reviewing is not perfect. There are many objections to peer review as a process, and we won’t debate them here, but at the moment it’s the best compromise for the task. The aim is basic filtering and methodological quality, not necessarily the correctness of the claim and of the obtained experimental values. The referee does not try to reproduce the experiment: he/she just checks if the obtained results can be reproduced with the given information, and if the paper makes scientific sense and provides new information to the discipline. The scientific community will then evaluate the claim, comparing it to other methods, and eventually finding a new insight, a methodological error, or an intentionally fraudulent activity. The latter is taken very seriously by the scientific community. I’ve seen Ph.D. titles revoked and group heads resign over fraudulent scientific activity performed by others under their direct management.

Measuring (rough) journal authority: the Impact Factor

The authoritativeness of a journal comes from the reputation it has accumulated inside and outside the community it writes for. This community is made of people who are both readers and authors at the same time. One rough way to measure the reputation of a journal is the Impact Factor (IF). Before explaining how it works, remember that:

  • it is not a perfect method
  • it is not the only one
  • it acts “by proxy”, meaning that it measures something else which generally is assumed to be correlated with reputation, but this is not the rule

This post is about giving rough tools, and the Impact Factor is a sufficiently appropriate tool to get at least a rough idea about a journal’s reputation.

In part 1 of this series, we discussed citations. An article which is cited a lot by other papers had some impact on the scientific community, which reacted by investigating more; an article which is cited poorly was received with a “meh” and people moved on. A scientific journal publishes tens of articles every issue, and each of these articles may eventually be cited by others in the near future. The Impact Factor of a journal can be understood as the average number of citations an article in that journal collects, averaged over the last two years.

For example, suppose a journal publishes 300 articles in 2006 and 200 articles in 2007, for a total of 500 articles in this two-year period. In 2008, you check the total number of citations these 500 articles collected from other articles in the same or other journals. Suppose this number is 1500. Then the impact factor of that journal in 2008 is 1500/500 = 3.0. This value is the average number of citations collected by a single article in that journal.
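The arithmetic of the example above can be written as a tiny Python sketch. The function name `impact_factor` and its arguments are made up for illustration, and the simplified two-year definition is the one given in this post, not necessarily the exact procedure used by the official database.

```python
def impact_factor(citations_this_year, articles_per_year):
    """Average citations per article over the two-year window.

    citations_this_year: citations collected this year by the articles
        published in the two previous years.
    articles_per_year: number of articles published in each of those
        two years, e.g. [300, 200].
    """
    total_articles = sum(articles_per_year)
    if total_articles == 0:
        raise ValueError("no articles published in the two-year window")
    return citations_this_year / total_articles


# Numbers from the example: 300 articles in 2006, 200 in 2007,
# and 1500 citations collected in 2008.
print(impact_factor(1500, [300, 200]))  # 3.0
```

Publishing fewer articles while collecting the same number of citations raises the result, which is exactly the lever a journal can pull.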

How do you increase the Impact Factor? Publish very few articles which get cited a lot. How do you decrease it? Publish a lot of poorly cited articles. Peer reviewing strictness can influence the impact factor, as can an editorial shift towards review articles (which are cited more). It is also important to note that impact factors cannot be compared across disciplines: the highest impact factors you may find in theoretical chemistry journals are lower than typical impact factors found in biology journals.

To sum up, the Impact Factor measures a network of citations: it gives a measure of the interest the scientific community has in the average paper published in that journal, not necessarily how good the average article is, although it may be claimed that these two concepts are somehow correlated.

Where do you find Impact Factors? Generally on the journal website. Alternatively, you can query the ISI Web of Knowledge database, which requires a subscription. You may therefore need an academic friend or a visit to your university library.

Measuring (rough) scientist authority: the h-Index

As the Impact Factor measures (with caveats) the importance of a journal, the h-Index measures (with caveats) the importance of a scientist. A scientist’s career is about producing papers, either by himself in the first years of his career, or through others, such as Ph.D. students, postdocs, and collaborators. Clearly, as a scientist becomes more experienced and more involved in scientific progress, he gets more articles, and more citations from other colleagues working in the field. The h-Index addresses both these factors at the same time.

The h-Index is a number, and I will explain its meaning with an example: I have an h-Index of 7 (see citation metrics), not high, but in line with friends who did more research than I. This value of 7 means that I have 7 publications that have at least 7 citations each. I actually have 15 publications, but the remaining 8 have fewer than 7 citations. More generally, an h-index of N means that N is the largest number for which the researcher has N papers with at least N citations each.

The h-Index is far from perfect, but its point is to measure the cumulative productivity and visibility of a researcher in their field. Let’s look at these two limit cases to understand why:

  • A researcher publishes 100 papers in his career, but only one of them is ever cited, and it receives just 2 citations. All his remaining articles are not cited. His h-index is therefore 1. He has one article with at least one citation (two citations). His h-index is not 2, because he does not have two articles with at least 2 citations each.
  • A young Ph.D. student publishes one disruptive paper collecting 200 citations. His h-Index is 1 because he has one paper with at least one citation. As in the previous case we see how the stress is on both productivity and impact at the same time.
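The two limit cases above can be checked with a short Python sketch. The function name `h_index` is chosen here for illustration; it takes a list with the number of citations of each paper and returns the largest N such that N papers have at least N citations each.

```python
def h_index(citations):
    """Largest N such that N papers have at least N citations each."""
    # Rank papers from most cited to least cited.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank   # the top `rank` papers all have >= rank citations
        else:
            break      # every later paper has even fewer citations
    return h


# First limit case: 100 papers, only one of them cited (twice).
print(h_index([2] + [0] * 99))  # 1

# Second limit case: a single paper with 200 citations.
print(h_index([200]))  # 1
```

Both researchers end up with the same value, which shows how the metric rewards productivity and impact only when they come together.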

There are many objections to be made to the h-Index. This beautiful but very deep post “Who Is Today’s Einstein? An Exercise In Ranking Scientists” by Johannes Koelman explains in great detail what the problem of ranking scientists is, and why the h-Index is flawed. It compares, in particular, a very young Einstein-like genius (such as the one above, 1 paper with 200 citations, h-Index = 1) who loses his chance of continuing his scientific career to Mr. Mediocre, a guy with 5 papers having 5 citations each (h-Index = 5). I repeat, the h-Index gives a measure of the productivity and impact of a scientist, which may represent his authoritativeness, especially if he has been in the field for a long time. Note the additional point that the h-Index cannot be compared across disciplines, because it depends on the number of citations, which in turn depends on the size of the scientific community in your field.

Summing up

In this post, I described two rough metrics to evaluate journals (the Impact Factor) and researchers (the h-Index). These two metrics are far from perfect, but they may give a signal about the authoritativeness of a scientific claim, by checking how the community responds to the general level of a given journal or researcher.

How science heals amputees

Check out this great movie on the BBC website: a young amputee decided to replace his non-functional hand with a robotic one. The new hand allows him to perform tasks such as tying a shoe, opening a bottle, and fully rotating around the wrist, something not possible for a human hand. This is impressive on so many levels. The future is here. Cue Star Wars music.

Export vim text (with colors) to HTML

Vim is a great, great programming tool. Even after years of experience with it you still get to discover, either by chance or by sharing, fantastic tips to make an impossible task incredibly easy.

It is the case with my recent problem of exporting the visual aspect of vim (as from terminal) to an HTML document, including the highlight colors. How to do it? Well, I tried pygments and it’s really great, but I asked on SuperUser for alternative methods. It appears that Vim supports this natively. Just issue

:TOhtml

to Vim, and it will create and open an HTML file containing what you see in your terminal. A big thank you to user progo for this tip.

Tulips failure

I am really sad to report my utter (but expected) failure with green stuff: the tulips died. From really green sprouts they suddenly became yellow and died within a week. This is the cold body aftermath

Poor once-green thing

I have no idea what went wrong, but I will try again next year. What hasn’t failed, however, is the associated donation. I did not receive a lot from the book royalties this year, but I doubled that amount and donated it to Telethon.