No shame for “Import your contacts” feature

I am unpleasantly surprised (and I am not alone) by a new and widespread practice among networking websites: during the registration process, they ask for your Gmail (or other webmail) username and password, so that they can log in on your behalf and import the contacts in your address book.

What?

Giving your personal password to another website allows them to poke into your mail account, where you may keep personal emails (possibly containing other passwords), documents, calendars and so on. You are potentially handing them your digital life (and a bit of your contacts’). Despite these sites’ assurances of honesty, even honest and very respectable companies can happen to have dishonest employees. Finally, there’s also a fair chance that you are violating the Gmail terms of service, which clearly state (paragraph 5.3) that your password is confidential. Other interesting paragraphs are 6.1 and 6.2, which in my opinion could also be grounds for a TOS violation.

But apart from these facts, what I consider most dangerous is passing this practice off as “normal” or “acceptable” just because it looks convenient and easy. I wonder what people would say if they were asked to hand over their house keys to the shop where they just bought a TV, so that the delivery can take place with no additional fuss for them. The user is generally the weakest point in security, and educating users in the wrong direction is really something to be worried about.

Added Twitter

I added Twitter to the blog. It’s a nice toy, and I want to see if it’s actually useful. For brief reports on my status, it seems like a nice thing, and I am having fun using the cellphone to update it.

From URI to ontologies

It is a well-known pattern, the “http://” stuff. It is so frequently used that browsers fill it in automatically by default. But what exactly does it mean? What’s behind it? In reality, behind a so-called URI, there’s more than meets the eye.

What is a URI?

First of all, we need some definitions. A URI, or Uniform Resource Identifier, is a string that, as the name implies, identifies some resource. A resource is “something” that can be referred to. It could be an internet document, a book, whatever. The official definition of a URI is detailed in its latest form in RFC 3986.

The URI is made of the following parts:

  1. A scheme
  2. A hierarchical part
  3. Optionally, a query
  4. Optionally, a fragment
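
As a small illustration of this list, here is a sketch in Python that splits a URI into its parts using the standard library's urllib.parse.urlsplit. The URI itself is made up for the example (it is not from any real site); note that urlsplit reports the hierarchical part as two separate fields, the authority (called netloc) and the path.

```python
from urllib.parse import urlsplit

# Hypothetical URI, chosen only because it has every optional part.
uri = "http://example.com/docs/page?lang=en#intro"
parts = urlsplit(uri)

print(parts.scheme)    # the scheme
print(parts.netloc)    # the authority (first half of the hierarchical part)
print(parts.path)      # the path (second half of the hierarchical part)
print(parts.query)     # the query, an empty string when absent
print(parts.fragment)  # the fragment, an empty string when absent
```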

Some URIs can be URLs (Locators) or URNs (Names). Both are URIs, but a URL identifies a resource and the place where to find it, while a URN names a resource through the special scheme name urn:. When you type an address in your browser, you are specifying a URL. When you talk about “urn:isbn:0-395-36341-1”, you are using a URN to talk about a book with a specific ISBN. Please note that the relationship only goes one way: every URL is a URI, but not every URI is a URL, and the same holds for URNs.

Although many scheme names are named after protocols (e.g. http, ftp, ldap), this is mostly incidental. There is no “technical correspondence” between the scheme and the protocol: for example, given a URI with the scheme name http, trying to get the resource (therefore assuming the URI is also a location) will involve not only HTTP, but also DNS. Moreover, if you have the document in cache, you will not use HTTP at all: you will get a resource that is on your hard drive. “Resolving” a URI means finding an access strategy for the resource it identifies; actually “using” the resource (in very broad terms) through that strategy is called a “dereference”. The most common dereference is retrieval, such as downloading the resource.

The hierarchical part includes either an authority and a path, or just a path. Please note that the “path” concept is very loose: for example, in “mailto:[email protected]”, “[email protected]” is a path. Similarly, in “urn:isbn:0-395-36341-1”, the path is “isbn:0-395-36341-1” (which corresponds to the Webster Dictionary, in case you are wondering). When an authority is involved in the resolution of the path, and only in this case, you have the “//” to introduce the authority, for example “http://example.com/”, where the authority is “example.com” and the path is “/”. An apparent exception is “telnet://example.com”, but here too you do have an authority; it is just that the path is empty. Another apparently strange case is “mailto:[email protected]”: here, be careful not to confuse a path with an authority (possibly user-qualified, as in “http://user@host/”).
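
To see the authority/path distinction in practice, the following Python sketch asks the standard library's urlsplit what it considers authority and what it considers path for URIs like the ones discussed above. The mail address is a placeholder I made up for the example, and authority_and_path is just a hypothetical helper name.

```python
from urllib.parse import urlsplit

examples = [
    "mailto:someone@example.com",  # hypothetical address, path only
    "urn:isbn:0-395-36341-1",      # path only
    "http://example.com/",         # authority and path "/"
    "telnet://example.com",        # authority and empty path
    "http://user@host/",           # user-qualified authority
]

def authority_and_path(uri):
    """Return the (authority, path) pair of a URI; authority is '' when there is no '//'."""
    parts = urlsplit(uri)
    return parts.netloc, parts.path

for uri in examples:
    authority, path = authority_and_path(uri)
    print(f"{uri!r}: authority={authority!r}, path={path!r}")
```

The “@” in the mailto URI ends up in the path, while the “@” in “http://user@host/” stays inside the authority, which is exactly the confusion to watch out for.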

What can you use URIs for?

So, what is the distinction between URI, URL and URN really useful for? The fact is that, while a URL refers to a “place” where to find a resource, a URN refers to a “name” of a resource in the urn scheme, and a URI refers to a name of a resource in any scheme. When we talk about a resource, the meaning is very broad. It could be anything, even an abstract concept. An example is the definition of namespaces in XML, something like

<h:html xmlns:h="http://www.w3.org/1999/xhtml">
 <h:head><h:title>Hello!</h:title></h:head>
 <h:body>
   <h:h1>Big Title</h:h1>
 </h:body>
</h:html>

The URI http://www.w3.org/1999/xhtml is just a URI. It is not a location (the fact that W3C actually serves an informative page at that address can be seen as a coincidence). It is a symbol that means something, namely, that some elements in that XML document belong to the vocabulary of XHTML.
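
A quick way to convince yourself that the namespace URI is only a name is to parse such a document with Python's xml.etree.ElementTree. This is a sketch under a couple of assumptions: I use the namespace URI that the XHTML specification defines, http://www.w3.org/1999/xhtml, and two tiny documents written just for this example. The parser never tries to fetch the URI; it simply attaches it to each element name, and the prefix (h or x) turns out to be irrelevant.

```python
import xml.etree.ElementTree as ET

# Namespace URI defined by the XHTML specification; used here purely as a name.
XHTML_NS = "http://www.w3.org/1999/xhtml"

# Two example documents that differ only in the prefix bound to the namespace.
doc_h = f'<h:html xmlns:h="{XHTML_NS}"><h:body><h:h1>Big Title</h:h1></h:body></h:html>'
doc_x = f'<x:html xmlns:x="{XHTML_NS}"><x:body><x:h1>Big Title</x:h1></x:body></x:html>'

root_h = ET.fromstring(doc_h)
root_x = ET.fromstring(doc_x)

# ElementTree stores qualified names as "{namespace-uri}local-name".
print(root_h.tag)
print(root_h.tag == root_x.tag)

# Elements are looked up by namespace URI, not by prefix.
title = root_h.find(f"{{{XHTML_NS}}}body/{{{XHTML_NS}}}h1")
print(title.text)
```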

In the Chestnut package manager, the XML manifest starts with

<Package xmlns="urn:uuid:d195be0c-200a-40a4-9d05-35fdf42eb29f" version="1.0.0">

where the URI “urn:uuid:d195be0c-200a-40a4-9d05-35fdf42eb29f” again means something: the fact that the “grammar” used is that of the Chestnut package manager. I created this URI with the utility uuidgen, so it is by definition a unique id. I could have used anything else, even the URI of this post. The important point is that it has to be a URI, and it has to be unique for this specific use. In this particular case, the URI is also a URN.
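
If you do not have uuidgen at hand, the same kind of URN can be produced with Python's standard uuid module. This is only a sketch of an alternative, not how the Chestnut URI above was made: uuid4() draws a new random UUID on every call, and the urn attribute renders it in the urn:uuid: form.

```python
import uuid

# A brand new random UUID, rendered as a URN (different on every run).
fresh = uuid.uuid4()
print(fresh.urn)

# The identifier from the Chestnut manifest, parsed and rendered back as a URN.
chestnut = uuid.UUID("d195be0c-200a-40a4-9d05-35fdf42eb29f")
print(chestnut.urn)
```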

Another interesting usage of URIs is in ontologies and RDF descriptions. Briefly and roughly, an ontology is a “description of a world”. Suppose that you want to describe the following information

my guitar is white

and you want a computer to be able to understand it. Moreover, you would like to make the computer understand that the guitar is a musical instrument

The guitar is a musical instrument

and that white is a color

white is a color

Now, if you ask the computer to search for all the musical instruments that are white, you would like to get my guitar, because it is white, and because it is a musical instrument. Seems easy, but it’s not. The fact is that humans are very smart at interpreting those phrases, but a computer is not. Quoting Bill Clinton, it depends on what the meaning of “is” is. By saying that “my guitar is white” we describe a property of the guitar in terms of its color. We have a subject (“my guitar”), a verb (“is”, in terms of “has color”) and a predicate (“white”). This is also known as a “triplet”, and is the basis of RDF. (RDF itself names the three parts subject, predicate and object, and calls the statement a “triple”: what I call the verb here is RDF’s predicate, and “white” is RDF’s object.)

Of course, if we take “the guitar is a musical instrument” we have again a case of triplet: “the guitar” is the subject, “is a” in terms of “is a kind of” is the verb, and “musical instrument” is the predicate. Please note that we have another interesting fact here: while “my guitar” is a very specific guitar (that is, the guitar I own), “the guitar” is an abstract concept that applies to any guitar. They are not the same thing, they are two different concepts, but we can say without doubt that “my guitar (the first concept) is a guitar (the second concept)” (this is a case of instance/class relationship).

We can form very complex networks of subject-verb-predicate triplets describing the digital and non-digital world we live in, so that a computer can help us in doing complex searches and analyses. How do we differentiate all the concepts and ambiguities we just encountered? You guessed it: with URIs. We will have a URI to express the concept of “my guitar”, another URI to express the concept of “white”, other URIs to express the concepts of “guitar”, “musical instrument” and “color”, and we will also have different URIs expressing the concepts of “is (as a color)” and “is (a kind of)”. All these concepts are part of the description (also known as an ontology) we want to give to the small world we created in this example. This description is rather simple and loose, but we can define far more complex ontologies. For example, we can create a color ontology, describing the colors and their relationships:

white is a color, red is a color, snow white is a kind of white, blood is a kind of red, red is a warm color, blue is a cold color

or an instrument ontology describing

guitar is a six-string instrument, violin is a four-string instrument, four-string instrument is a string instrument,
six-string instrument is a string instrument, string instrument is a musical instrument

Each of these concepts (“guitar”, “six-string instrument”, “violin”, “four-string instrument”, “string instrument”, “musical instrument”) will then be referred to by means of a unique URI.
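
To make the idea a bit more concrete, here is a toy sketch in plain Python (no RDF library involved) of the guitar example. Every URI below is invented for the illustration under the hypothetical prefix http://example.org/ontology/, and the function names are mine. Each statement is stored as a tuple of three URIs, and the search for “white musical instruments” follows the instance/class link first and then the “is a kind of” links upward.

```python
EX = "http://example.org/ontology/"  # hypothetical prefix, not a real ontology

MY_GUITAR = EX + "my-guitar"
GUITAR = EX + "guitar"
SIX_STRING = EX + "six-string-instrument"
STRING_INSTRUMENT = EX + "string-instrument"
MUSICAL_INSTRUMENT = EX + "musical-instrument"
WHITE = EX + "white"
HAS_COLOR = EX + "has-color"         # "is", as a color
INSTANCE_OF = EX + "is-instance-of"  # "is", as instance/class
KIND_OF = EX + "is-a-kind-of"        # "is", as a kind of

# Each statement is three URIs: (subject, verb, what is said about the subject).
TRIPLES = {
    (MY_GUITAR, HAS_COLOR, WHITE),
    (MY_GUITAR, INSTANCE_OF, GUITAR),
    (GUITAR, KIND_OF, SIX_STRING),
    (SIX_STRING, KIND_OF, STRING_INSTRUMENT),
    (STRING_INSTRUMENT, KIND_OF, MUSICAL_INSTRUMENT),
}

def classes_of(thing):
    """Collect every class reachable from `thing` via instance-of, then kind-of links."""
    found = set()
    frontier = [o for s, p, o in TRIPLES if s == thing and p in (INSTANCE_OF, KIND_OF)]
    while frontier:
        cls = frontier.pop()
        if cls in found:
            continue
        found.add(cls)
        frontier.extend(o for s, p, o in TRIPLES if s == cls and p == KIND_OF)
    return found

def find(kind, color):
    """Return the subjects that have the given color and belong to the given kind."""
    colored = {s for s, p, o in TRIPLES if p == HAS_COLOR and o == color}
    return sorted(s for s in colored if kind in classes_of(s))

print(find(MUSICAL_INSTRUMENT, WHITE))
```

Because the three meanings of “is” have three different URIs, the search never confuses “is white” with “is a kind of instrument”, which is the whole point of giving each concept its own identifier.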

LHC proton dump

How do you stop a beam of protons traveling at 99.99% of the speed of light and able to melt 500 kg of copper in less than a second?

I’ve found this interesting article about how to dispose of the proton beam from the Large Hadron Collider. The energy and the focus of the beam are so great that any metal would melt. So they use graphite, a solid block of it, 8 meters long and almost one meter wide, weighing almost 10 tons. Graphite has a very high melting temperature, making it a good choice for the task. Any other material would either melt or instantly vaporize, leading to a catastrophic explosion.

As a consequence, the human head is not supposed to be used for the task. Despite this, a dramatic accident befell Anatoli Bugorski, who happened to stick his head into a particle accelerator beam in 1978, and survived to tell the tale.

Error 1044 in MySQL: Access denied when using LOCK TABLES

I got an error while using mysqldump

mysqldump: Got error: 1044: Access denied for user x@y to database z when using LOCK TABLES

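To solve this problem, either ask your administrator to grant you the LOCK TABLES privilege, or use the following command instead, which dumps inside a single transaction and so does not lock the tables (the dump is consistent only for transactional tables such as InnoDB).

mysqldump -u username -p --single-transaction database > dump.sql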

Proofreading!

Proofreading is fun.

You proofread something with three pairs of eyes for a bunch of months. When you believe your manuscript is perfect, you send the stuff to the editor, who after a while returns the formatted proofs to you.

Then you take the proofs and print them out, and the three pairs of eyes start reading them again. The result is that you still find a bunch of errors! But… there’s more. Each pair of eyes finds errors that the other two pairs did not spot.

But the funniest part is when you send a document containing the accumulated corrections to the editor, and while reviewing this document you spot errors in it.

You just have to live with it. A book will never be without errors.

Moving to ETH Zürich

I am starting a postdoc at ETH Zürich on the 1st of October. I will work on data management and high-throughput calculations in quantum chemistry, and I will probably be involved in a standardization process (as I have been before) for communication and data sharing. The experience I gained in sequence analysis was precious, as that community worked on standardization issues before quantum chemists did. The experience I gained outside academia was important for developing a method and a rigorous approach to programming.

I am very excited about the project and the opportunity. It will be fun and intriguing, and it is a project that can really create new perspectives in computational sciences.

Chestnut Package Manager 2.0.0 released

I just released a program I developed: Chestnut Package Manager, a utility to handle executables and resource files in a transparent, platform-independent and relocatable way. Its concept is similar to Apple bundles and Java archives. It is implemented in Python.

I also provide a nice tutorial about how to use it and how to deploy your packages. It has been very useful to me, and I guess it will be for other people out there.

Computing for Comparative Microbial Genomics

I am proud to announce that Springer has finally released on the web (and on Amazon as well) the descriptive information of the textbook I took part in: Computing for Comparative Microbial Genomics. It was a fantastic and incredible experience, for which I will always be grateful to Dave Ussery, my supervisor at the Technical University of Denmark, and to Trudy Wassenaar, an independent professional and Associate Professor at the same institute.

Together with the book, I deployed a supplemental information website, comparativemicrobial.com. At the moment, it does not contain much information, apart from some biographies and a couple of links. We expect to enrich it with simple tutorial code and up-to-date news as time passes, following the feedback we obtain from the readers.

EDIT: specified that only the description is available on the web, not the textbook itself ;)

Long time no write

I have a lot of stuff cooking in the pot. Soon I will post about all of it, one thing after another, but I prefer to post about new things when they are ready and officially out. So, stay tuned, because there’s really a lot of good stuff.
