Screening the coding style of Large Fortran HPC Codes

Cover image for blog article, showing sequoias.
Large HPC Fortran codes are like a family of ancient giant sequoias in the forest of scientific software.

The EXCELLERAT Centre of Excellence includes several high-performance computing (HPC) laboratories that are preparing and promoting the use of HPC for engineering in the future. The project focuses on the software that will run on the next generation of hardware, Exascale computers.

Surprisingly, five out of seven core codes used for this exploration use Fortran, a language born in 1954. Moreover, three are more than twenty years old, and the largest of them started in the 1990s.

| NAME | LANGUAGE | CREATED |
|---|---|---|
| Nek5000 | Fortran | 1990 |
| AVBP | Fortran | 1993 |
| Fenics | C++/Python | 2003 |
| ALYA | Fortran | 2005 |
| Fluidity | Fortran (40%) | ? |
| TPLS | Fortran | 2009 |
| Flucs | C++/Python | 2012 |

If you consider the typical staff turnover in labs, PhD candidates can invest three to five years at most. HPC supercomputers show a similar turnover.
In terms of longevity, these codes have survived two to six generations of hardware and lab personnel.

Why do physicists use Fortran to prepare for Exascale computing?

Fortran has been popular among physicists for decades. You can find an overview of the reasons in a blog article. However, we are not talking here about just any academic software. The peculiar aspect of this software is the objective of promoting the use of HPC research software in engineering decisions. In other words, all software on the list has already demonstrated:

  • Good performance, with respect to the HPC community.
  • Proper physical modeling, with respect to the physics community.
  • Acceptable accuracy, with respect to the numerical community.
  • A large enough application field, to attract engineering users.
  • Satisfactory user experience, to not scare these users away.

The more recent – and trendy – approaches (C++ templates, HPC domain-specific languages) are supported by relatively larger communities than Fortran. How has Fortran kept these huge warships afloat?

Pushing further longevity and performance

Before developing, we need to understand the structure of this software.

The fabric of scientific solvers: source code

A scientific solver is – usually – a very simple single component. You insert a bundle of data at the input, the black box hums for a while, then delivers heaps of data at the other end. What can we see if we open the top of this black box? Is it scary?

Inside the black box, you only find the source code, translated for the computer. Mere humans work on the code through its source, which looks like this (sample from the Nek5000 GitHub repository):

      if (icalld.eq.0) then
         ! just in case we call setup from usrdat2
         call fix_geom
         call geom_reset(1)

         call set_intflag
         call neknekmv
         if (nid.eq.0) write(6,*) 'session id:', idsess
         if (nid.eq.0) write(6,*) 'extrapolation order:', ninter
         if (nid.eq.0) write(6,*) 'nfld_neknek:', nfld_neknek

         nfld_min = iglmin_ms(nfld_neknek,1)
         nfld_max = iglmax_ms(nfld_neknek,1)
         if (nfld_min .ne. nfld_max) then
            nfld_neknek = nfld_min
            if (nid.eq.0) write(6,*)
     $         'WARNING: reset nfld_neknek to ', nfld_neknek
         endif
      endif

There is obviously a lot of jargon, but Fortran keywords are easy to spot and quite explicit: IF, ENDIF, THEN…
While mainstream computer science can deal with an endless list of actions (database queries, text processing, 3D rendering, security, etc.), these codes involve only computation and memory access.
Therefore, you do not really need a computer scientist or a general-purpose language to write these codes, but you do need numerics, physics, and HPC experts… and lots of them.

Looking at the structure of the fabric

The source code is a lot of lines, nested in a typical structure.

  1. These lines are grouped in blocks (usually subroutines, functions, or modules in Fortran).
  2. The blocks are gathered in files (usually .f or .f90).
  3. The files are grouped in directories.
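
The nesting described above (lines in blocks, blocks in files, files in directories) can be sketched in a few lines of Python. This is only an illustration under strong assumptions: the two regular expressions recognise only the simplest ways of opening and closing a Fortran block, and the file names and routines in `SAMPLE` are made up for the example.

```python
import re
from collections import defaultdict

# Simplified patterns (assumption): real Fortran has many more ways to open
# and close a block, e.g. "real function f(x)" or "module procedure".
BLOCK_START = re.compile(r"^\s*(subroutine|function|module)\s+(\w+)", re.IGNORECASE)
BLOCK_END = re.compile(r"^\s*end\s*(subroutine|function|module)\b", re.IGNORECASE)


def lines_per_block(source):
    """Return {block_name: line_count} for one Fortran file given as a string."""
    counts = {}
    stack = []  # names of the blocks currently open, innermost last
    for line in source.splitlines():
        start = BLOCK_START.match(line)
        if start:
            stack.append(start.group(2))
            counts[stack[-1]] = 0
        if stack:
            # Each line is attributed to the innermost open block only.
            counts[stack[-1]] += 1
        if stack and BLOCK_END.match(line):
            stack.pop()
    return counts


def nest(files):
    """Build {directory: {file: {block: lines}}} from a {path: source} mapping."""
    tree = defaultdict(dict)
    for path, source in files.items():
        directory, _, filename = path.rpartition("/")
        tree[directory or "."][filename] = lines_per_block(source)
    return dict(tree)


# Hypothetical two-file source tree, kept in memory for the example.
SAMPLE = {
    "IO/reader.f90": (
        "subroutine read_mesh(n)\n  integer :: n\n  n = 1\nend subroutine read_mesh\n"
    ),
    "NUMERICS/grad.f90": (
        "function dfdx(f)\n  real :: f, dfdx\n  dfdx = f\nend function dfdx\n"
        "\nsubroutine noop()\nend subroutine noop\n"
    ),
}

tree = nest(SAMPLE)
for directory, files in tree.items():
    for filename, blocks in files.items():
        print(directory, filename, blocks)
```

A nested dictionary of this kind is the sort of input a circular packing layout needs: one number per leaf, and containers whose size is the sum of their children.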

The following figure gives you an overview of the code AVBP illustrating this nesting with circular packing.

Distribution of AVBP source code

The largest circle is the root folder of the source code. In the first circle, the darker circles represent several subfolders: ./NUMERICS, ./PARSER, ./IO, etc…
The nesting continues down to the darkest and smallest discs, each standing for one subroutine or function. For example, the subroutine hdf_kewords inside ./IO/hdf_kewords.f90 is highlighted with a white circle.
The size of each circle is proportional to the number of lines it contains.

This figure lets you grasp the variability and nesting of the source code, the very material each developer will edit. These characteristics are specific to each code. The very same image for the core folder of Nek5000 looks quite different, without any nesting.

Distribution of Nek5000 source code

How big are the codes?

Now comes the scale. The “large” codes contain between 100 000 and 1 000 000 lines. This is small compared to large commercial software developed by teams of over one hundred people. But it is very large for a single black box managed by a community of domain experts, with no mainstream computer scientist.

For example, the AVBP software has about 200 000 lines. Assuming one hour to completely understand 50 statements of code, one would need 4000 hours, or about two and a half years, to read the code (at 1600 working hours per year, with no e-mail).
Within the three-year timeframe of a French PhD, you can reasonably assume that a single worker can read, understand, and edit ten percent of the code at most.
Such a situation is fertile ground for dangerous code smells: spaghetti code, cut-and-paste programming, and many more.
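
The reading-time estimate can be checked with a few lines of Python. The function name and its defaults are only a sketch: they encode the assumptions stated in the text (50 statements understood per hour, 1600 working hours per year), nothing more.

```python
def years_to_read(statements, statements_per_hour=50, hours_per_year=1600):
    """Estimate how many working years are needed to read a code base once."""
    hours = statements / statements_per_hour
    return hours / hours_per_year


# About 200 000 statements, as for AVBP in the text.
avbp_years = years_to_read(200_000)
print(f"{avbp_years:.1f} years")  # prints "2.5 years"
```

Changing `statements_per_hour` shows how sensitive the estimate is to the assumed reading speed.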

Large legacy codes imply that a single person can only work on a small part. The key is to make efficient use of this patchwork of individual efforts across the whole code.

How to make the best of this situation?

The challenge of working with legacy code is not limited to Fortran, and plenty of excellent resources already exist.
You can start with some educational articles, or investigate further with specialised authors such as Adam Tornhill.

In the EXCELLERAT CoE, we focused first on reducing the time a new developer needs to understand someone else’s abstract ideas, coded in Fortran. We did so by promoting a homogeneous coding style via the use of linters. The success of Python’s PEP 8 coding standard is a good illustration of this approach. Unfortunately, Fortran linters are scarce and often commercial. Moreover, the reader will now understand that a linter for legacy Fortran codes needs the following additional characteristics:

  • Open source, to be sure of what the linter is doing in my outdated but mandatory Fortran 66 routine (backward compatibility prevails!).
  • Easy to install and embed in continuous integration pipelines, to provide systematic enforcement.
  • A customisable and extendable set of rules, because every community comes with its specific preferences.
  • The ability to clearly spot the weak parts, because a single person cannot do the full work.
  • Inclusivity. Fortran has changed a lot, from Fortran 66 to Fortran 2018, and all versions must be accepted.


Here is a very partial shortlist of existing Fortran linters:

  • fortran-linter is light and free, but does not offer customisation yet.
  • linter-gfortran is a plug-in for the linter of the Atom editor. While perfect inside Atom, it is a bit complex to run automatically in a continuous integration pipeline.
  • Cleanscape’s FortranLint is a commercial product.

Concerning coding style, there is no widely adopted standard in the community yet. The well-known Fortran standards are oriented toward compilers, not humans.
The Colorado State University Fortran Coding Style is worth mentioning, since it is human-oriented.

Consequently, the EXCELLERAT CoE developed its own lightweight linter, flinter, with an initial set of rules inspired by the Colorado State Coding Style and Python’s PEP 8. The customisable set of rules should help compare the coding styles of different teams, and – maybe – converge towards a more general set of rules.
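
To make the idea of a customisable rule set concrete, here is a minimal sketch of a regex-based checker. It is not flinter’s implementation or its rule set: the rule names, the patterns, and the `lint` function are all hypothetical, and the patterns are naive (for example, they also fire inside string literals).

```python
import re

# Hypothetical default rules, {rule_name: regex}. Illustrative only:
# these are NOT flinter's actual rules.
DEFAULT_RULES = {
    "trailing-whitespace": r"[ \t]+$",
    "tab-character": r"\t",
    "missing-space-after-comma": r",\S",
}


def lint(source, rules=None):
    """Return a list of (line_number, rule_name) for every rule a line violates."""
    active = DEFAULT_RULES if rules is None else rules
    compiled = {name: re.compile(pattern) for name, pattern in active.items()}
    warnings = []
    for number, line in enumerate(source.splitlines(), start=1):
        for name, regex in compiled.items():
            if regex.search(line):
                warnings.append((number, name))
    return warnings


# A team can extend the defaults with its own preferences, for instance
# flagging old-style comparison operators such as .eq. or .ne.
team_rules = {**DEFAULT_RULES, "old-style-comparison": r"\.(eq|ne|lt|le|gt|ge)\."}

sample = "if (nid.eq.0) write(6,*) 'id:', idsess  \n"
found = lint(sample, team_rules)
for line_number, rule in found:
    print(f"line {line_number}: {rule}")
```

Keeping the rules as plain data is what lets two teams run the same tool with different styles, and compare what each one chooses to enforce.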

What is the added value of checking the coding styles?

The advantages of monitoring the coding style will be illustrated in this last section. Several code bases are checked, and the global figures are reported in the following table:

| Name | Version | Size (statements) | Raw flint score (blind test) |
|---|---|---|---|
| Nek5000 | v19 No Refactoring | 50 008 | -5.73 |
| AVBP sources | 7.7 | 193 257 | 0.53 |
| AVBP tools | 7.7 | 136 387 | 0.93 |
| AVSP sources | 6.1 | 30 636 | -0.56 |
| NTMIX sources | 5.1 | 28 363 | 0.01 |
| NTMIX sources | 3.0 | 31 525 | -0.28 |

All these scores are computed with the same coding style. However, do remember that this default coding style IS NOT the standard actually enforced by each development team today.

The rating formula is similar to the pylint formula:

rate = 10 - (struct_nb * 5 + syntax_nb) / stmt_nb * 10

In this formula, stmt_nb is the number of statements (blank lines and comments excluded). This number is much smaller than the actual number of lines.
syntax_nb is the number of syntax warnings. These range from a mere space character missing around an operator to a bare EXIT without an error code (i.e. an exit reported as a correct end, even if it comes from a failed sanity test).
struct_nb is the number of structural warnings, ranging from too-long routines (a single subroutine of 5000 statements is inhumane to maintain) to short variable names (try to rename the variable t in a text…).
Like pylint, the maximum score is 10. A rating of 0 or below means the authors were simply not aware of this coding style.
There is no minimum score, and you can easily reach -100, especially on a short routine once spacing warnings pile up.
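
The behaviour of the score can be explored with a small function. This is a sketch, not flinter’s code: it assumes that the weighted warning density is subtracted from 10, as in pylint, which is what a maximum score of 10 and negative ratings imply. The function name is made up for the example.

```python
def style_rate(struct_nb, syntax_nb, stmt_nb):
    """Pylint-like rating: 10 minus ten times the weighted warnings per statement.

    Assumption: structural warnings weigh five times a syntax warning,
    as in the formula given in the text.
    """
    if stmt_nb <= 0:
        raise ValueError("stmt_nb must be a positive number of statements")
    return 10.0 - (struct_nb * 5 + syntax_nb) / stmt_nb * 10


print(style_rate(0, 0, 100))   # no warning at all: 10.0
print(style_rate(1, 5, 10))    # one weighted warning per statement: 0.0
print(style_rate(0, 110, 10))  # eleven syntax warnings per statement: -100.0
```

The second case shows why a rating around 0 is easy to reach: on average, a single weighted warning per statement is enough.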

Refactoring monitoring

In this first comparison, a CFD direct numerical simulation code has been refactored. In the first scan, version 3.0 shows many badly rated (purple) small routines on the left-hand side.

Scan of NTMIX v3, before refactoring (rated -0.28)

On the following refactored version, these badly rated routines vanished. Actually, they were merged into larger routines and cleaned. In case you were wondering, these routines are all named dfdx_* and dfdy_*, meaning the refactoring was done, at least, on the gradient operators.

Scan of NTMIX v5, after refactoring (rated 0.01)

This example illustrates the typical task of refactoring, with someone rewriting a batch of selected files.

Two solvers from two teams

In the following we will compare two general-purpose academic CFD codes, Nek5000 and AVBP.

Scan of Nek5000 v19, before refactoring (rated -5.73)
Scan of AVBP V7.7 (rated 0.53)

The two scans show that the default coding style is closer to the one enforced in the AVBP team. A different default style could have given the opposite result. For example, AVBP allows more nesting in the sources (see the upper blue part, related to analytic chemistry schemes). Nested sources can be seen as a bad pattern, because they make navigation difficult. This trait is not part of the current coding style; if it were, it could have decreased the rating of AVBP with respect to Nek5000.

The proper coding style should always be defined by the developers, and never blindly imposed.

Same teams, different solvers.

To illustrate how coding styles are team-dependent, we will look at two more projects from Cerfacs.

First, there are the tools used alongside the combustion simulations of AVBP. These tools are independent of each other. This explains the extraordinary variability in size in the first circle. So independent in fact that some lines are duplicated, instead of being shared. You can spot four identical structures on the right part, which are four repetitions, with slight tunings, of the same open-source Chemkin library.

Scan of tools related to AVBP (rating 0.93)
Scan of AVSP, a spin-off of AVBP (rating -0.56)

Surprisingly, four different projects developed by the same team show global ratings in the range [-1, 1], while the rating formula allows values from -100 or lower up to 10. In other words, the “distance” to the standard is globally the same.

Takeaways

We have noted that the majority of the software selected by an HPC Centre of Excellence for engineering applications are large Fortran legacy codes. We have shown that, in this context, “large” means that the usual single worker can only edit a small part of the source code.

The EXCELLERAT project, which aims to promote the use of HPC by engineers, provided flinter, a tool that helps developers observe and monitor these small parts from a maintenance point of view. The tool computes the adherence of code parts to a customisable coding style.
The examples have illustrated that coding styles are already enforced in the tested situations, and that refactoring already involves coding style improvements.

We stress that in the flinter score above, as in pylint, the syntax warnings (spaces, deprecated uses, etc.) weigh far less than structural warnings (too many variables, undescriptive naming, bloated functions or subroutines). This is because technical debt comes more from these structural problems, and no magic tool can do this refactoring in any language. (Even Python’s black and autopep8 tools are limited to syntax issues.) Such chores have to be done manually, and the adherence circle scans can tell where refactoring matters.

The comparison of these coding styles among EXCELLERAT partners will be the topic of a future communication.

—Antoine Dauptain, Cerfacs

Acknowledgements:

This work has been supported by the EXCELLERAT project which has received funding from the European Union’s Horizon 2020 research and innovation program under grant agreement No 823691.

Many thanks to Jérôme Lecomte who kindly provided the community with the nice circlify package, which enables the circular packing to be computed, and codemetrics, which gives a lot of insight into the code we wrote.