Why is most science programming done in Fortran?

I found this interesting question in the referral logs on ForTheScience. Why is most science programming done in Fortran (77 or 95)?

After some thought, I can offer the following reasons:

  • Fortran is simple to understand. Not the code itself maybe, but the style. The learning curve for doing something in Fortran is very gentle, and once you have grasped the basic concepts, such as read and write, you can be proficient enough to write even complex computational applications. Most scientists are not programmers, and they would be overwhelmed by the intricacies of C and C++. I would strongly prefer that a non-programmer code in Fortran rather than in, say, Perl. In other words, Fortran fits the needs computational scientists have: read numbers, do a calculation, write the result. All of this can be taught to a layperson in a standard semester course.
  • Fortran is computationally efficient. I will not go into the age-old debate about “is Fortran really faster than C?”, on which I have a rather articulate opinion that I won’t delve into. Instead, I will just point out that it is indeed one of the languages whose compilers and computational libraries have been optimized to death for computational efficiency, since performance is their main selling point.
  • Fortran is old. This has produced a huge amount of legacy code that has to be maintained or reused. Rewriting this code is normally not possible: who should do it? Even if the task required just one man-month out of a three-year Ph.D. contract, it would probably not produce a scientific publication, so nobody wants it. Moreover, a rewrite would likely break interfacing with other codes, programs and libraries, as well as the group’s knowledge (if any) of the code, so such a move will almost always be opposed.
  • Fortran has a slow release cycle, and backward compatibility has been taken into account. Knowing that the code you wrote in the 80s will still compile today (at worst, after adding a compiler switch) makes everyone happier. I am not sure you can take a Perl or Python program written 10 years ago and run it unchanged today. I have no experience with old C and C++ code, so I am not completely sure about this point, and I welcome being proven wrong.

There are surely many other reasons, but I won’t go further.

Let’s see instead why and when Fortran should not be used. I mostly have Python in mind as the term of comparison:

  • Fortran has very limited expressivity. You need a lot of code, often redundant, to get something done. In some cases, you need to put data in temporary variables to pass it to a subroutine, which means either introducing more variables (harder to maintain) or recycling old ones (bad).
  • Fortran (77, things are better in 95) makes it very difficult to do modular programming. Namespace pollution can be dramatic in large programs, especially considering the short identifier limit (modern compilers accept longer names, but using them violates the standard). Fortran 95 modules are a step forward, but you can’t group modules into submodules.
  • Fortran (95) does not allow storage of function or subroutine pointers, making callback-oriented programming very hard.
  • Fortran (95) does not allow inheritance. Smart workarounds exist, but they require some skill, and the base class develops a dependency on the derived ones.
  • Fortran has neither polymorphism nor templating, which makes it very painful to work with generic data types. Again, workarounds exist, but they require external tools.
  • Fortran makes it very difficult to keep coupling loose. A very strong dependency network arises. For large programs, the number of modules USEd (or the amount of code in them) may increase considerably. Compare with Python, where you do not need to import a module just to call a method on an object defined in that module, or with C++, where you have forward declarations.
  • Fortran (95) does not have object orientation, so it is very difficult, if not impossible, to use traditional design patterns.
  • Fortran does not have exceptions (F2003 will, but not custom ones, as far as I know).
  • Fortran has IMPLICIT. (Edit: yes, it has IMPLICIT NONE, but implicit declaration is unfortunately still abused today. It should have been deprecated.)
  • Fortran (77) does not have aggregate data types or dynamic memory allocation (in the standard).
  • Fortran strings are not dynamic in length (unless, if I remember correctly, you resort to very weird hacks). A string of length 100 and another of length 101 behave like two different data types (say, an int and a string), unless you use LEN=* in routine calls, and you cannot make more room for an already allocated string if needed.
  • Fortran did not have clear interfacing with C, and every compiler did as it pleased. Apparently this is no longer true with the introduction of BIND.
  • No effective tools exist for documenting the code or for easily doing Test Driven Development.
  • The libraries out there are targeted at computational tasks. I haven’t seen any good library for GUI programming, networking or database access, and even if one existed, would you use it?
  • Fortran is full of unusual pitfalls for anyone used to a different language. While pitfalls exist in any language, some of Fortran’s come from compatibility with older, improper usage (e.g. the implicit SAVE when a variable is initialized at its declaration). Others are due to the highly optimized nature of the language. In general, these pitfalls deviate strongly from the behavior of other languages that use similar constructs.
  • Most of the code out there uses old code. Even though the language has progressed, you will still find ancient remains of code written when the main writing tool was a stick on a clay tablet. This code will most likely be impossible to refactor.
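
To make the points about callbacks and loose coupling a bit more concrete, here is a minimal Python sketch of what I have in mind. All the names in it (integrate, Parabola, callbacks) are made up for illustration and do not come from any real library. The integration routine accepts anything callable, functions can be stored in a dictionary like ordinary data, and integrate never needs to know where Parabola is defined.

```python
from typing import Callable


def integrate(f: Callable[[float], float], a: float, b: float, n: int = 1000) -> float:
    """Trapezoidal rule over [a, b]; f is any callable taking and returning a float."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


class Parabola:
    """Hypothetical callable object; integrate() needs no knowledge of this class."""

    def __init__(self, coefficient: float) -> None:
        self.coefficient = coefficient

    def __call__(self, x: float) -> float:
        return self.coefficient * x * x


# Callbacks stored in a plain dictionary: a lambda and an object side by side.
callbacks = {"linear": lambda x: x, "parabola": Parabola(3.0)}

results = {name: integrate(f, 0.0, 1.0) for name, f in callbacks.items()}
for name, value in results.items():
    print(f"{name}: {value:.6f}")
```

The same design in Fortran 95 would typically need one explicit interface per callback and a USE of every module involved, which is exactly the dependency network described above.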

This is just off the top of my head, and I am sure there is a lot more. In any case, Fortran 2003 seems to alleviate most of the problems outlined above. In particular, it will have object-oriented programming and function/subroutine pointers. A considerable step forward.

Please note that I wasn’t a Fortran fan, but with time I became tolerant of it. It should be used sparingly and only where a real need exists: use high-level programming languages with good expressivity, such as Python, first. Then, if necessary, optimize where needed, sometimes with a drop of Fortran, but only if you really, really (yes, I mean really) need it.

4 Comments

  1. Magermans says:

    Perhaps the comment is late, but it is, I think, relevant.
    First of all, I’m a fan of Java. I also work on some simulation software written in Fortran.

    1) a lot of code to perform a task ??
    let me give a counter example
    Java (could be C or C++):

    float[][] a = new float[10][10];
    for (int i = 0; i < a.length; i++)
        for (int j = 0; j < a[i].length; j++)
            a[i][j] = 0;

    Fortran:

    real::a(10,10)
    a = 0

    2) Modular programming is easy with Fortran 95, and object-oriented programming is possible with Fortran 2003.

    3) If Fortran has IMPLICIT, it also has IMPLICIT NONE, and each compiler has flags to forbid implicit typing.

    4) Dynamic memory allocation, aggregation: not missing any more since Fortran 90.

    5) Dynamic strings: foreseen in Fortran 2003, not yet implemented by compilers. The only con so far. Nevertheless, character(100) and character(101) are more like a double and a float than an int and a string, as they can be compared and assigned to each other…

    6) GUI programming, networking, DB access, real time, inter-process communication and much more… Yes, I've done that in Fortran 15 years ago when I was programming a satellite control center, with high requirements on reliability. A lot safer than doing them in C or C++.

    Conclusion: The language is secondary and you cannot say "Don't use that or that language". There are Pros and cons for all of them, and it always depends on the programmer…

    Pol Magermans
    Software Engineer
    University of Liège, Belgium.

  2. > 1) a lot of code to perform a task ??
    > let me give a counter example
    > Java (could be C or C++):
    >
    > float[10][10] a;
    > for (int i=0; i<a.length; i++) for(int j=0; j<a[i].length; j++) a[i][j] = 0;
    >
    > Fortran:
    >
    > real::a(10,10)
    > a = 0

    The example you present specifically addresses one strength of Fortran (90): handling arrays. For any other task, it will take more code. I cannot show you real examples here, but you are welcome to download the Q5Cost library for real cases of bloated code performing trivial tasks such as exception handling.
    By the way, if the example you gave were C, I would not zero an array the way you did; I would simply use bzero() (or the standard memset()).

    > 2) modular programming is easy with fortran 95, Object
    > programming is with fortran 2003.

    Not really. Modular programming is “easy” once you know how it works, but the way it is implemented makes it a pain to handle. Namespacing is limited to basically one level, and you cannot rename your module import. When you use USE, you import every symbol into your scope, meaning that if you have two different modules with the same symbol name inside, they will clash. Unless there’s a trick I don’t know about, this means you cannot import them at the same time. This wreaks havoc if you want to use two different modules with the same interface but different implementations (for example, if you want to read data from one file format and store it in another).

    > 3) if fortran has IMPLICIT, it also have IMPLICIT NONE and
    > each compiler has flags to prevent implicit statement.

    Yes, but they are not turned on by default. This is one case where backward compatibility should have been dropped for the sake of forcing a bit of proper coding style. We would have much better code around now.

    > 4) Dynamic memory allocation, aggregation: not missing any more
    > since fortran 90

    Except for strings.

    > 5) dynamic strings: foreseen in fortran 2003, not yet
    > implemented by compilers. The only cons so far.

    well, last time I checked we are in 2009 ;)

    > Nevertheless, character(100) and character(101) are more like
    > a double and a float than an int and a string, as they can be
    > compared and assigned to each other…

    In other words, they are like two different datatypes, with forced implicit coercion.

    > 6) GUI programming, Networking, DB access, real time,
    > inter-process communication and much more… Yes, I’ve done that
    > in Fortran 15 years ago when I was programming Satellite
    > control center, with high requirements on reliability.

    Interesting. Using which publicly available libraries? I guess they were produced internally in your case. Unfortunately, this keeps them out of reach of the rest of the world, making adoption of Fortran for these tasks basically nil.

    > Conclusion: The language is secondary and you cannot say
    > “Don’t use that or that language”. There are Pros and cons
    > for all of them, and it always depends on the programmer…

    Exactly what I said, and I totally agree with you. I would not use Python for a low-level device driver and I would not use Fortran to access a MySQL database.

  3. MSB says:

    An ISO Technical Report defined a standard addition for variable-length strings. An open source implementation has existed since 2003. You “use” iso_varying_string. Deferred-length allocatable scalar strings (Fortran 2003) provide dynamic-length strings. Some compilers support this already, e.g., Intel ifort since mid-2009.

    Re: “When you use USE, you import every symbol in your scope, meaning that if you have two different modules, but with the same symbol name inside, they will clash. Unless there’s a trick I don’t know about, this means you cannot import them at the same time.”

    There are two methods in Fortran to solve this problem:
    “use ModuleName, only: ItemName” will select only “ItemName” from the module “ModuleName”, so if that module has items that you don’t need that clash with another module, they are omitted and don’t clash. If two modules have items that you need that have the same name, you can rename the item: “use ModuleName, NewName => OrigName”. Clash gone!

    “short identifier limit”? 31 characters in Fortran 95, 63 in Fortran 2003. Longer than this, the lines of the program will get rather long…

  4. @MSB:

    > There are two methods in Fortran to solve this problem:
    > “use ModuleName, only: ItemName” will select only “ItemName”
    > from the module “ModuleName”, so if that module has items that
    > you don’t need that clash with another module, they are omitted
    > and don’t clash. If two modules have items that you need that
    > have the same name, you can rename the item: “use ModuleName,
    > NewName => OrigName”. Clash gone!

    Yes, but you will have to specify it by hand for each routine you need to import. This can become very tedious, as you have to cherry-pick what you need and write even more boilerplate code. Compare with how Python solves the problem:

    import foo
    import bar

    foo.hello()
    bar.hello()

    In Fortran you would have to “USE foo” and “USE bar”, plus rename the hello routines with “USE foo, foo_hello => hello” and “USE bar, bar_hello => hello”. This introduces two problems. The first is given above: you need to do it for every routine you need whose name clashes, which can be a lot if you are, for example, using two IO modules with a standard interface for conversion among different formats.

    The second problem is that if you are particularly unlucky, and instead of hello you have Conn_getRemoteServerInfo (24 chars) in the module Network (7 chars), then to stay consistent in style you will have to write USE Network, Network_Conn_getRemoteServerInfo => Conn_getRemoteServerInfo, and the new name is 32 chars long, in excess of the 31 allowed by the Fortran 95 standard.

    The identifier issue I raise is, however, a minor point and an unfortunate case. It is not representative and you can work around it, but we did hit it during actual programming. The root cause was forcing a namespacing-like workaround to keep things organized and programming OO-style without doing OO at all (don’t ask, I fought against it but sometimes a compromise is needed), in a language that does not support nested namespaces.
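
For anyone who wants to try the Python side of this comparison without creating two files, here is a small self-contained sketch. It builds two in-memory modules that stand in for foo.py and bar.py; the make_module helper and the greeting strings are hypothetical and exist only for this illustration.

```python
import types


def make_module(name: str, greeting: str) -> types.ModuleType:
    """Build a throwaway module object exposing a single hello() function."""
    module = types.ModuleType(name)
    module.hello = lambda: f"{greeting} from {name}"
    return module


# Stand-ins for "import foo" and "import bar": both modules define hello().
foo = make_module("foo", "hello")
bar = make_module("bar", "hello")

# The module name acts as the namespace, so the two hello() never clash
# and no per-routine renaming is required.
print(foo.hello())
print(bar.hello())
```

With real files, the two make_module calls would simply be replaced by import foo and import bar.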
