Data management workflow for HPC - SSC Data Management Workflow Portal
Organisations and smaller industry partners today face various problems with high-performance computing (HPC): running the computations themselves, working with HPC in general, or even gaining access to HPC resources. In many cases the computations are complex, and potential users lack the expertise to make full use of HPC technologies without support.
The EXCELLERAT Data Management Service develops best practices and provides support for managing the large amounts of data generated and used in technical applications. It explores efficient strategies for pre- and post-processing, storage, accessibility and metadata management. In addition, it offers highly automated data dispatch via exchange mechanisms; for each process step, it is clear where the data comes from and where it still has to be delivered. Overarching data workflow monitoring can be set up to avoid redundancies or unnecessary archiving of data.
SSC-Services GmbH is an IT service provider and member of the EXCELLERAT Centre of Excellence consortium. It develops individual concepts for corporate collaboration and customer-oriented solutions for all aspects of digital transformation. Since 1998, the company, based in Böblingen (Germany), has offered solutions for the technical connection and collaboration of large companies and their suppliers or development partners. The company’s roots lie in the management and exchange of data of all types and sizes.
As part of the EXCELLERAT project, SSC is developing a secure data exchange and transfer platform to facilitate the use of HPC for industry and to make data transfer more efficient. The developed data platform will help companies overcome or, in the best case, eliminate the above-mentioned obstacles to the use of HPC.
During the exchange of data or the provision of data to be computed, workflow monitoring keeps track of where the data comes from, what is done with it and where the result needs to be transferred. Should a second run with the same data be required, the workflow identifies the data already present, which avoids redundant transfers and so speeds up the exchange. In addition, smart compression rules are applied to reduce the large amount of data provided. Versioning of the computation job is mandatory at this stage: it allows data throughput to be monitored and minimises the amount of changed data to be considered.
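One common way to recognise data that is already present is to compare content hashes before a transfer. The following is a minimal sketch under that assumption, not the platform's actual mechanism; the function names `file_digest` and `files_to_transfer` and the idea of a set of known digests on the receiving side are hypothetical and chosen only for illustration.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    hasher = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # Read fixed-size chunks so large files never sit fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def files_to_transfer(local_files, known_digests):
    """Keep only the files whose content is not yet known on the target.

    known_digests is an assumed set of digests reported by the receiving
    side (hypothetical; how it is obtained is out of scope here).
    """
    return [p for p in local_files if file_digest(p) not in known_digests]
```

In a second run with unchanged input, every digest would already be known, so the list of files to transfer would be empty and only new or modified files would be sent.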
Depending on the code and on the availability of HPC resources, the workflow automatically determines the best configuration within the HPC environment so that the computations run at the earliest possible time.
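The selection step can be pictured as choosing, among the resources that can run the job at all, the one with the earliest expected start. The sketch below assumes a simplified model in which each candidate has a node limit and an estimated waiting time; the class `ResourceOption`, its fields and the function `earliest_option` are hypothetical and do not describe the platform's real scheduling logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceOption:
    """A hypothetical candidate partition or queue in the HPC environment."""
    name: str
    max_nodes: int      # largest job size the option accepts
    wait_minutes: int   # estimated wait before a job could start


def earliest_option(options, nodes_needed):
    """Return the feasible option with the shortest estimated wait, or None."""
    feasible = [o for o in options if o.max_nodes >= nodes_needed]
    if not feasible:
        return None
    return min(feasible, key=lambda o: o.wait_minutes)
```

A real environment would also have to weigh licences, software availability and code-specific requirements, which this sketch deliberately leaves out.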
With data management and a workflow overview implemented on the platform, simple tasks such as cleaning or managing directories become possible, and the way is opened for machine learning and artificial intelligence. These could be used to generate workspaces based on experience or features, or to suggest calculations or optimised data transfers.
The unique value proposition is that SSC offers industry access to HPC through academic institutes and connects the two partners so that both sides can benefit from the platform.
SSC’s offering includes training, consultancy, best practice guides for using the platform, and provision and operation/support of the platform for industry to significantly facilitate the use of HPC.
The benefits for industrial end users and scientists can include:
- Secure, fast, traceable, automated, and intelligent data exchange
- Time and cost savings through a high degree of automation that streamlines the process chain
- Calculations that can be started from any location with a secure connection
- Workflow management
All simulation software (open source or licensed) is supported due to the generic structure of the platform, which benefits independent software providers.
The advantages for HPC vendors and HPC cloud providers are shorter training phases for inexperienced users and reduced support effort. In addition, the web front-end reduces the complexity of HPC use and increases user engagement in the HPC environment.
The SSC data management platform will be available via the EXCELLERAT Service Portal in the second half of 2022.