Nek5000 in EXCELLERAT: towards Exascale

One of EXCELLERAT’s tasks is improving Nek5000’s Exascale readiness, particularly with respect to accelerators. As a starting point, we used an OpenACC version of the proxy-app Nekbone and focused on further optimising the small matrix-matrix kernels, which constitute most of the work in the full Nek5000. By tuning the OpenACC directives, the performance of these kernels could be improved by almost a factor of two; however, this was still far from the measured roofline of these kernels. This led to the development of a hybrid OpenACC + CUDA version of Nekbone, where the main part of the code is parallelised using OpenACC, while the matrix-matrix kernels have been rewritten in CUDA. Using CUDA in the performance-critical kernels allowed us to tune them far beyond what is possible with OpenACC, increasing the performance by almost a factor of three compared to the baseline OpenACC implementation and achieving 80% of the measured roofline. The experience we have gained from tuning Nekbone is now helping us to develop and tune the accelerator port of Nek5000.
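For readers unfamiliar with the term, the basic roofline model bounds the attainable performance of a kernel by the smaller of the peak compute rate and the product of memory bandwidth and arithmetic intensity. The sketch below only illustrates how a "fraction of the roofline" figure is computed. The function names and all hardware numbers are hypothetical placeholders, not measurements from Nekbone or from any particular accelerator.

```python
def roofline_gflops(arithmetic_intensity, peak_gflops, bandwidth_gbs):
    """Attainable performance (GFLOP/s) under the basic roofline model.

    arithmetic_intensity: floating-point operations per byte moved (flop/byte)
    peak_gflops:          peak compute rate of the device (GFLOP/s)
    bandwidth_gbs:        sustained memory bandwidth (GB/s)
    """
    return min(peak_gflops, arithmetic_intensity * bandwidth_gbs)


def fraction_of_roofline(measured_gflops, arithmetic_intensity,
                         peak_gflops, bandwidth_gbs):
    """Ratio of measured performance to the roofline ceiling."""
    ceiling = roofline_gflops(arithmetic_intensity, peak_gflops, bandwidth_gbs)
    return measured_gflops / ceiling


if __name__ == "__main__":
    # Hypothetical device and kernel characteristics, for illustration only.
    PEAK_GFLOPS = 7000.0
    BANDWIDTH_GBS = 800.0
    INTENSITY = 2.0  # flop/byte; a low value makes the kernel bandwidth bound

    ceiling = roofline_gflops(INTENSITY, PEAK_GFLOPS, BANDWIDTH_GBS)
    measured = 1280.0  # hypothetical measured kernel performance
    frac = fraction_of_roofline(measured, INTENSITY, PEAK_GFLOPS, BANDWIDTH_GBS)
    print(f"ceiling = {ceiling:.0f} GFLOP/s, fraction = {frac:.2f}")
```

With a low arithmetic intensity the ceiling is set by memory bandwidth rather than by peak compute, which is why tuning such kernels focuses on data movement.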

Another important issue is the post-processing step, which is challenging due to the quickly growing volume of data produced by a simulation and the relatively high cost of I/O operations. That is why in-situ visualisation has become more popular lately, and we have instrumented Nek5000 to use Catalyst to perform data reduction during the simulation. An important part of the workflow is a Python script produced by the visualisation software Paraview. This script contains the instructions that will be executed during the run time of the simulation. Usually the main objective of these instructions is to manipulate and display scientific data through the Python VTK API.

One of the main advantages of this approach is that, once the code has been instrumented, several different analyses can be performed by simply changing the script used. This great advantage is overshadowed by the number of details to be considered when compiling Catalyst: OpenGL, Mesa, and Python (along with its dependencies) must already be available before compilation.

To simplify part of the procedure described above, version 5.6 of Paraview has been modified to reduce the need for third-party software components, so that filters are implemented in C++ instead of through the scripts created by Paraview. As a consequence of these modifications, scalability and portability are improved, while rendering is no longer needed. This tuned version of Paraview, named Decaf-Paraview, is an HPC in-situ data analysis tool that is currently being developed. The main goal of this tool is to handle data coming from the C++ VTK API. As a result, data can be analysed in real time using external algorithms not originally available in VTK.

From an engineering design point of view, it is both important and interesting to assess the sensitivity and robustness of scale-resolving simulations of fluid flows with respect to the variation of numerical, modelling, and physical parameters. We may also need to construct a data-driven model that can be used as a surrogate, for instance for optimisation. Furthermore, we would like to estimate the uncertainty in time-averaged quantities of turbulent flows due to potentially insufficient sampling in time. Such estimates matter given the importance of verification in CFD and the fact that industrially relevant flows are computationally expensive and cannot be run over very long time intervals.

To achieve these goals, relevant techniques from the fields of uncertainty quantification (UQ) and computer experiments can be employed. Implementing such techniques for CFD applications in general, and for the simulation of turbulent flows in particular, has led to the development of UQit, a Python toolbox for uncertainty quantification.

UQit will be released as open source during the run of EXCELLERAT and hence can potentially benefit a larger user community. Generally speaking, UQit can be non-intrusively linked to any CFD solver through appropriate interfaces for data transfer. The main features currently available in UQit are as follows: standard and probabilistic polynomial chaos expansion (PCE) with the possibility of using compressed sensing, ANOVA (analysis-of-variance)-based global sensitivity analysis, Gaussian process regression with observation-dependent noise structure, as well as batch-based and autoregressive models for the analysis of time series. The flexible structure of UQit allows for implementing new techniques.
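To make the batch-based idea concrete, the following is a minimal sketch of the classical non-overlapping batch-means estimator for the uncertainty of a time average. It does not use or reproduce the UQit API. The function names are hypothetical, and the correlated signal is a synthetic first-order autoregressive series standing in for a turbulent time signal.

```python
import math
import random


def batch_means(samples, n_batches):
    """Return (mean, standard error) of a time average from batch means.

    The series is split into n_batches non-overlapping batches of equal
    size; leading samples that do not fill a whole batch are dropped.
    """
    batch_size = len(samples) // n_batches
    if n_batches < 2 or batch_size < 1:
        raise ValueError("need at least 2 batches with 1 sample each")
    used = samples[len(samples) - n_batches * batch_size:]
    means = [
        sum(used[i * batch_size:(i + 1) * batch_size]) / batch_size
        for i in range(n_batches)
    ]
    grand_mean = sum(means) / n_batches
    # Sample variance of the batch means (treated as nearly independent).
    var_batches = sum((m - grand_mean) ** 2 for m in means) / (n_batches - 1)
    return grand_mean, math.sqrt(var_batches / n_batches)


def naive_standard_error(samples):
    """Standard error that wrongly assumes independent samples."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / (n - 1)
    return math.sqrt(var / n)


def synthetic_correlated_signal(n, phi=0.9, seed=0):
    """Synthetic AR(1) series x[t] = phi * x[t-1] + Gaussian noise."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out


if __name__ == "__main__":
    signal = synthetic_correlated_signal(20000)
    mean, se_batch = batch_means(signal, n_batches=20)
    print("time average:", mean)
    print("batch-means standard error:", se_batch)
    print("naive standard error:", naive_standard_error(signal))
```

For a correlated signal, the naive estimate typically understates the uncertainty, because consecutive samples carry less information than independent ones. The number of batches is a user choice: batches should be long enough for their means to be approximately uncorrelated.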

To demonstrate the importance of UQ in assessing the accuracy, robustness and sensitivity of flow quantities of interest (QoIs) when the design parameters are allowed to vary, we have used UQit in different computer experiments using Nek5000. A novel aspect of the analysis is that we have been able to combine the uncertainty in the training data, for instance due to finite time-averaging, with the variation of numerical and modelling parameters. The developed framework has also been applied to compare the performance of two widely used open-source CFD solvers, Nek5000 and OpenFOAM, through the evaluation of different metrics.

UQit has also been employed in collaboration with the in-situ and data analytics working groups. The overall aim is to develop the pipelines required for conducting computer experiments that rely on CFD simulators on HPC systems. The aims of these experiments can be diverse, including UQ and sensitivity analysis, constructing predictive surrogates and reduced-order models, developing data-driven models, and performing robust optimisation. In this regard, several projects have been defined, which have shown promising results so far. In particular, we are investigating the impact of shape uncertainty on the flow features and developing a novel strategy for driving adaptive mesh refinement (AMR) in Nek5000.

We will continue our work within EXCELLERAT, combining the presented tools and making use of them during the final simulation of our use case. As a first step, we will enable in-situ analysis for nonconforming meshes produced by AMR runs.

—N. Jansson, A. Peplinski, S. Rezaeiravesh, M. Zavala