05/05/2026

From 14 hours to under 5: faster whole-genome analysis with EuroHPC’s MeluXina

By Latvian Biomedical Research and Study Centre

 

 

Large-scale genome sequencing projects are opening doors to earlier diagnoses, more effective treatments, and new insights into disease. Yet analysing thousands of whole genomes remains a major computational challenge.

 

The Latvian genome reference project explores scalable solutions to make fast whole-genome sequencing (WGS) analysis accessible and practical, to the benefit of patients and clinicians. With EPICURE’s support, the project reduced whole-genome analysis runtime from over 14 hours to under 5, a 3.1× speed-up that enables more scalable genomic processing.

 

The Latvian Biomedical Research and Study Centre has already sequenced around 4,000 human genomes but faces limitations in local HPC capacity that slow down secondary analyses. To solve this, they benchmarked and optimised their containerised variant-calling pipeline (nf-core/sarek with Nextflow) on the MeluXina supercomputer to identify configurations suitable for efficient large-scale genomic processing.
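For context, a containerised nf-core/sarek run of this kind is typically launched through Nextflow along the following lines; the release, profile, and file paths below are illustrative placeholders, not the project’s actual configuration:

```shell
# Sketch: launching the nf-core/sarek variant-calling pipeline via Nextflow.
#   -r         pins a pipeline release for reproducibility
#   -profile   runs every step inside Singularity/Apptainer containers
#   --input    sample sheet listing the FASTQ files for each genome
nextflow run nf-core/sarek \
    -r 3.4.0 \
    -profile singularity \
    --input samplesheet.csv \
    --genome GATK.GRCh38 \
    --tools haplotypecaller \
    --outdir results/
```

Because each step runs in its own container, the same workflow definition is portable between the local cluster and a system such as MeluXina.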

 

 

 

Scientific and technical challenges

 

WGS analysis at this scale requires efficient orchestration of complex, resource-intensive workflows. The initial pipeline setup faced several limitations, such as restricted parallelism at the node level and inefficient scheduling caused by large numbers of small jobs.

 

In addition, specific steps such as duplicate marking introduced significant computational overhead. Performance varied depending on storage tiers, executor configuration, and hardware choices. The researchers needed to understand how to balance CPU and GPU resources, optimise data placement, and ensure efficient execution across nodes, all of which were essential to improving scalability.

 

 

 

Image of a supercomputer with a blue filter (@LuxProvide)

 

 

EPICURE support and EuroHPC resources

 

To overcome these challenges, EPICURE support focused on redesigning and optimising the existing Nextflow/Sarek pipeline for execution on the MeluXina system.

 

The original workflow was reconfigured to use the HyperQueue executor on pre-allocated nodes, avoiding inefficient job scheduling and enabling effective multi-node execution. EPICURE also guided the application of MeluXina’s best practices for data placement and supported the installation of NVIDIA’s GPU-accelerated genomics suite, Parabricks.
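The pre-allocation pattern can be sketched as a single Slurm batch script; the node count, wall time, and placement of the `hq` commands are assumptions based on a typical HyperQueue setup, not the project’s exact script:

```shell
#!/bin/bash
# Sketch: run the whole workflow inside one Slurm allocation and let
# HyperQueue schedule the many small pipeline tasks within it, instead
# of submitting each task to Slurm individually.
#SBATCH --nodes=3
#SBATCH --time=08:00:00

hq server start &                      # HyperQueue server on the head node
sleep 5                                # give the server time to come up

srun --ntasks-per-node=1 hq worker start &   # one worker per allocated node

# nextflow.config selects the HyperQueue executor:  process.executor = 'hq'
nextflow run nf-core/sarek -profile singularity \
    --input samplesheet.csv --outdir results/

hq worker stop all                     # tear down workers and server
hq server stop
```

Nextflow then hands tasks to HyperQueue, which packs them onto the already-reserved nodes, so the cluster scheduler sees one large job rather than thousands of small ones.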

 

The result was a robust and scalable pipeline that runs efficiently across both CPU and GPU environments and supports large-scale genomic analyses.
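As an illustration of the GPU path, Parabricks collapses alignment, sorting, and duplicate marking, one of the bottlenecks noted above, into a single GPU-accelerated step via its `fq2bam` tool; the reference and sample paths here are placeholders:

```shell
# Sketch: GPU-accelerated alignment, sorting, and duplicate marking with
# NVIDIA Parabricks. fq2bam fuses these CPU-heavy steps into one pass.
pbrun fq2bam \
    --ref GRCh38.fasta \
    --in-fq sample_R1.fastq.gz sample_R2.fastq.gz \
    --out-bam sample.markdup.bam
```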

 

 

Results and impact

 

The optimised configurations delivered a significant improvement in performance and scalability. The best-performing setup, which used GPU nodes, Parabricks, and HyperQueue, reduced runtime from approximately 14.6 hours in the initial CPU-only configuration to around 4.7 hours using three GPU nodes. This corresponds to a speed-up of about 3.1×.
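The quoted figure is simply the ratio of the two runtimes:

```python
# Speed-up of the best GPU configuration over the initial CPU-only run,
# using the runtimes reported above (in hours).
cpu_hours = 14.6   # initial CPU-only configuration
gpu_hours = 4.7    # three GPU nodes with Parabricks + HyperQueue

speedup = cpu_hours / gpu_hours
print(f"speed-up: {speedup:.1f}x")   # speed-up: 3.1x
```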

 

Beyond raw performance gains, the project defined practical strategies for scaling containerised genomics workflows on HPC systems. The results showed that the most substantial improvements came from GPU-accelerated tools combined with appropriate workflow orchestration, rather than relying only on hardware upgrades.

 

These advances enable faster and more reliable processing of large genomic datasets. They also support the completion of the Latvian genome reference project and contribute to broader initiatives such as the Genome of Europe project. In the long term, this work by the Latvian Biomedical Research and Study Centre will support the development of allele frequency databases for clinical and scientific use.

 

 

 

Close-up 3D illustration of a DNA double helix composed of clustered particles against a light background.

 

 

Next steps

 

The next phase of the project will focus on applying these configurations and best practices to large-scale production analyses of Latvian genomic data. This will ensure more efficient use of both CPU and GPU resources at each stage of the pipeline.

 

Future work will address the optimisation of resource allocation and the development of additional components, such as joint-calling workflows, to support more advanced genomic analysis.

 

 

To learn more about the “Exploring additional computational resources for Latvian genome reference” project, visit its project page on the European HPC application support portal.
