Success Story: Generative AI Meets Legacy HPC Codes – Local, Frugal and Confidential AI Assistance for Scientific Software
Success story highlights:
- Keywords: Generative AI, Local LLMs, Legacy code modernisation, Fortran, Code documentation, Data sovereignty, Frugal AI
- Industry sector: Aeronautics, Energy, Environment/climate/weather, Manufacturing & engineering
- Key codes used: CodeLSD, Compilysis (LLM compiler helper), AVBP
Benefits:
- 500,000 lines of legacy Fortran documented in natural language in 70 minutes
- Full data sovereignty: source code never leaves the premises during AI-assisted work
- No per-token fees: running costs stay far below cloud API pricing
- Faster onboarding of junior developers through searchable, plain-language code maps
Organisation involved:
Cerfacs is a science centre that produces innovative solutions for the simulation of earth physics and engineering, backed by seven public and private partners (Airbus, CNES, EDF, Météo-France, Onera, Safran, TotalEnergies). Its COOP team focuses on the engineering of scientific software. Since the end of 2025, Cerfacs has developed, benchmarked and deployed the local generative-AI tools used in this story.
Codes involved:
- CodeLSD: an open-source (MIT) pipeline translating scientific source code into natural-language documentation
- LLM compiler helper (Compilysis): an interactive compile-explain-fix loop for Fortran/C++ developers
- AVBP: high-fidelity reactive flow solver used as the demonstration legacy codebase
Technical / scientific challenge:
Scientific HPC software lives in massive, long-lived Fortran or C codebases maintained by a handful of domain experts. The AVBP combustion solver, for instance, has grown beyond 500,000 lines of code over 20 years. Hardware heterogeneity now forces deep rewrites, while half of the upcoming contributors are junior scientists who were trained on generative AI before they learned research methodology. Yet cloud AI assistants are ill-suited to this world: confidential industrial sources cannot leave the premises, and token costs and energy footprints are hard to control.
Solution:
The Cerfacs COOP team built and benchmarked a complete local generative-AI workbench for scientific code, running entirely on on-premise hardware (Apple M-series laptops, NVIDIA Spark and H100 nodes) with open-weight models served by Ollama and vLLM. CodeLSD, an open-source pipeline, translated the 500,000 lines of AVBP Fortran into structured English documentation in 70 minutes; the result is served to users through an interactive code map. A compiler-helper tool closes the loop between compiler errors and plain-language explanations and fixes, keeping the developer in control. Systematic benchmarks on model choice, Fortran fine-tuning, agents, skills and deterministic prompt pipelines turned these prototypes into reproducible practice.
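To make the compile-explain-fix idea concrete, here is a minimal sketch of such a loop against a local Ollama server. It is not the Compilysis implementation: the function names, the prompt wording and the `placeholder-model` name are assumptions made for illustration. Only the `/api/generate` endpoint on port 11434 and its `model`, `prompt`, `stream` and `response` fields come from Ollama's REST API.

```python
import json
import subprocess
import urllib.request

# Default endpoint of a locally running Ollama server.
OLLAMA_URL = "http://localhost:11434/api/generate"


def ask_local_llm(prompt, model="placeholder-model", url=OLLAMA_URL):
    """Send one prompt to the local server and return the generated text.

    The model name is a placeholder (assumption): use a model pulled locally.
    """
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False})
    request = urllib.request.Request(
        url,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))["response"]


def build_prompt(source_text, compiler_output):
    """Combine the source and the compiler messages into one request (hypothetical wording)."""
    return (
        "Explain the following compiler errors in plain language and "
        "suggest a fix. Do not rewrite unrelated code.\n\n"
        "Compiler output:\n" + compiler_output + "\n\n"
        "Source:\n" + source_text
    )


def compile_and_explain(command, source_text, ask=ask_local_llm):
    """Run the compiler; return None on success, else the model's explanation.

    Nothing is patched automatically: the developer reads the explanation
    and decides which fix to apply.
    """
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode == 0:
        return None
    return ask(build_prompt(source_text, result.stderr))
```

A call could look like `compile_and_explain(["gfortran", "-c", "solver.f90"], open("solver.f90").read())`, where `solver.f90` is a hypothetical file name. Because the request goes to `localhost`, neither the source nor the compiler output leaves the machine.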
Impact:
European engineering depends on legacy HPC codes whose modernisation is slowed by scarce expertise. By proving that useful AI assistance runs on local, modest hardware, this work removes the main blocker for industrial adoption: confidentiality. In line with the French Cybersecurity Agency (ANSSI) security recommendations for generative AI systems, no source code or user data ever leaves the premises, so even restricted codebases can benefit from such tools.
The approach is also frugal. Internal benchmarks measured generation costs of around $0.02 per million tokens at a 7 W power draw on a laptop, versus $0.06–0.22 per million tokens on cloud APIs, and showed that small, well-chosen or fine-tuned models can match larger ones on specific tasks. Serving engines such as vLLM raise throughput by an order of magnitude under concurrent load.
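The link between power draw and cost per token can be sketched with a short calculation. Only the 7 W figure comes from the benchmarks above. The generation rate (20 tokens per second) and the electricity price ($0.20 per kWh) are illustrative assumptions, not measured values from this work.

```python
def energy_cost_per_million_tokens(power_watts, tokens_per_second, price_per_kwh):
    """Electricity cost, in the currency of price_per_kwh, to generate one million tokens."""
    seconds = 1_000_000 / tokens_per_second
    # 1 kWh = 3,600,000 watt-seconds
    energy_kwh = power_watts * seconds / 3_600_000
    return energy_kwh * price_per_kwh


# 7 W is the measured laptop power draw quoted above.
# 20 tokens/s and $0.20/kWh are assumptions chosen for illustration.
cost = energy_cost_per_million_tokens(
    power_watts=7, tokens_per_second=20, price_per_kwh=0.20
)
print(f"{cost:.4f} USD per million tokens")
```

Under these assumptions the electricity cost comes out just under $0.02 per million tokens. The estimate covers energy only and ignores hardware purchase and maintenance.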
Finally, this work changes how teams relate to AI: by building their own agents, skills and invariants (“mooring points” such as code manifestos and single-responsibility tests), developers keep scientific control over the AI-assisted evolution of their codes, preparing them for heterogeneous exascale systems such as Alice Recoque.
Potential EXCELLERAT Services:
Code-to-documentation campaigns with CodeLSD: batch translation of a legacy Fortran/C/C++ codebase into structured natural-language documentation, delivered as an interactive, searchable code map.
Secure local AI deployment audit: selection and benchmarking of hardware (laptop to H100), open-weight models and serving engines (Ollama, vLLM) for an engineering team’s confidentiality, cost and energy constraints.
Custom agents and skills for code communities: compiler helpers, review and test-generation agents built around the community’s own invariants (code manifestos, single-responsibility tests, know-how skills), which can be coupled to other tools developed during EXCELLERAT such as Maraudersmap.
Unique value of each service:
CodeLSD is code-agnostic, open-source and fast enough (a full industrial solver in about an hour) to be rerun at every release; unlike raw chat summaries, its results are served to all users through an interactive map of the codebase.
Recommendations are grounded in measured power, throughput and cost benchmarks across real hardware tiers rather than vendor claims, and are aligned with ANSSI security guidance, ensuring viable, sober and sovereign AI deployments.
The agents encode the team’s own standards and keep the human in the loop at every fix, so the community owns its AI tooling and never has to say “the AI did it” about its scientific software.
