Explaining Scientific Code, Assisted by Large Language Models
Understanding an unfamiliar codebase is a recurring challenge in scientific computing. Many researchers have experienced the convenience of pasting code into a large language model (LLM) and asking: “Explain this code.” While effective for small snippets, this manual approach quickly becomes tedious, inconsistent, and inefficient as projects grow in size and complexity.
Our technical EXCELLERAT P2 partner, CERFACS, addresses this problem in a blog article presenting WalkingPrompt: a workflow for code explanation assisted by language models, designed to keep the human developer in control while reducing repetitive effort. Instead of relying on ad-hoc copy–paste prompting, WalkingPrompt systematically traverses a repository, identifies the relevant code parts, splits them into chunks when needed, links them with their dependencies, and generates structured summaries that can be aggregated across modules and directories.
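The overall shape of such a workflow can be sketched as follows. This is a minimal illustration, not CERFACS's actual implementation: the function names (`chunk_source`, `summarise`, `explain_repository`) and the placeholder summariser are our own assumptions, and a real deployment would replace `summarise` with a prompt sent to a (preferably local) language model.

```python
from pathlib import Path

def chunk_source(text: str, max_lines: int = 80) -> list[str]:
    """Split a source file into line-bounded chunks so each fits a model context."""
    lines = text.splitlines()
    return ["\n".join(lines[i:i + max_lines])
            for i in range(0, len(lines), max_lines)] or [""]

def summarise(chunk: str) -> str:
    """Placeholder for an LLM call; a real workflow would prompt a local model here."""
    first = chunk.strip().splitlines()[0] if chunk.strip() else "(empty)"
    return f"summary of chunk starting with: {first[:60]}"

def explain_repository(root: str, suffixes=(".py", ".f90", ".c")):
    """Walk a repository, summarise each source file, then aggregate per directory."""
    per_file = {}
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            chunks = chunk_source(path.read_text())
            per_file[path] = " | ".join(summarise(c) for c in chunks)
    # Group file-level summaries by directory; a second LLM pass would
    # condense each group into one higher-level overview.
    per_dir: dict[Path, list[str]] = {}
    for path, summary in per_file.items():
        per_dir.setdefault(path.parent, []).append(summary)
    return per_file, {d: summarise("\n".join(s)) for d, s in per_dir.items()}
```

The key design point, as described in the article, is that the traversal and aggregation are systematic while the model only ever sees bounded, dependency-aware pieces of code, which keeps outputs consistent and reviewable.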
In practice, this repository-level approach produces more consistent explanations than manual prompting. It helps avoid common pitfalls such as misleading generalisations or inconsistent terminology. Systematic aggregation and summarisation yield an understanding of the code base at higher levels and create a coherent first layer of documentation, even for large, multi-module scientific codes. The human remains central: reviewing outputs, spotting hallucinations, and deciding how summaries are used.
The article also discusses current limitations and future directions. Improvements in prompt design, smarter code chunking, integration of structural information such as call graphs, and careful model selection all influence the trade-off between time, energy consumption, and trust. Particular emphasis is placed on local deployment, privacy, and reproducibility – key requirements in scientific and HPC environments.
Overall, WalkingPrompt illustrates how code understanding assisted by LLMs can become a sustainable background service rather than a fragile, manual task. By embedding documentation generation into regular workflows, scientific software can remain more accessible, maintainable, and trustworthy over time.
Read the full blog article by Antoine Dauptain (CERFACS) on the COOP Blog.
