# Protein Folding Simulation at Scale
A deep dive into optimizing GROMACS for large-scale molecular dynamics simulations of the SARS-CoV-2 spike protein.
In this case study, we'll explore how we optimized a large-scale protein folding simulation for the SARS-CoV-2 spike protein using GROMACS on a multi-node GPU cluster.
## Project Overview

**Challenge:** Simulate the folding dynamics of the SARS-CoV-2 spike protein (2.2M atoms) for 1 µs.

**Hardware Available:**

- 8 nodes with 4× NVIDIA A100 GPUs each
- 100 Gb/s InfiniBand interconnect
- 1 TB RAM per node
## Initial Setup and Challenges
When we first attempted this simulation, we encountered several challenges:
- **Memory Bottleneck** with the 2.2M-atom system
- **Poor GPU Utilization** (around 25% initially; see the results table below)
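Diagnosing the second problem meant watching utilization while a job ran. Below is a minimal sketch of that kind of probe, assuming `nvidia-smi` is on the PATH (the `gpu_utilization` helper name is ours, for illustration):

```python
# Illustrative GPU-utilization probe (assumes nvidia-smi is available).
# Polls once per second and prints the mean utilization across visible GPUs.
import subprocess
import time

def gpu_utilization():
    # nvidia-smi prints one integer per GPU, e.g. "23"
    out = subprocess.run(
        ['nvidia-smi', '--query-gpu=utilization.gpu',
         '--format=csv,noheader,nounits'],
        capture_output=True, text=True, check=True,
    ).stdout
    values = [int(v) for v in out.split()]
    return sum(values) / len(values)

for _ in range(10):
    print(f"mean GPU utilization: {gpu_utilization():.0f}%")
    time.sleep(1)
```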
## Solution Strategy

### 1. System Preparation

We first sized the periodic box to the protein's actual dimensions, keeping a consistent solvent margin on every side:
```python
# Using MDAnalysis to size the simulation box
import MDAnalysis as mda

# Load structure
u = mda.Universe('spike.pdb')

# Protein extent along each axis (MDAnalysis coordinates are in Angstrom)
extent = u.atoms.positions.max(axis=0) - u.atoms.positions.min(axis=0)

# Add a solvent margin on each side of the protein
box_margin = 15.0  # Angstrom (1.5 nm)
new_box = extent + 2 * box_margin

print(f"Optimal box dimensions: {new_box / 10.0} nm")
```
### 2. GPU Task Distribution

We implemented a hybrid MPI/OpenMP parallelization strategy and offloaded the major interaction kernels to the GPUs:
```bash
# GROMACS run command with optimized parameters
gmx mdrun -deffnm spike \
    -nb gpu \
    -pme gpu \
    -bonded gpu \
    -update gpu \
    -ntomp 8 \
    -npme 2 \
    -nstlist 100 \
    -dlb yes
```
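For the record: `-nb`, `-pme`, `-bonded`, and `-update` offload the short-range nonbonded, PME, bonded, and integration/constraint work to the GPUs; `-npme 2` dedicates two ranks to long-range PME so it stops competing with the particle-particle work; `-nstlist 100` rebuilds the pair list only every 100 steps; and `-dlb yes` forces dynamic load balancing between domains.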
### 3. Performance Monitoring

We developed a custom monitoring script:
```python
import matplotlib.pyplot as plt

def parse_log(logfile):
    """Collect ns/day figures from 'Performance' lines in a GROMACS log."""
    performance = []
    with open(logfile, 'r') as f:
        for line in f:
            if 'Performance' in line:
                # The line ends with '<ns/day> <hour/ns>'; take ns/day
                perf = float(line.split()[-2])
                performance.append(perf)
    return performance

# Plot performance across log entries (appended restart logs give several)
performance = parse_log('md.log')
plt.plot(performance)
plt.xlabel('Log entry')
plt.ylabel('Performance (ns/day)')
plt.show()
```
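If a run is instead split with `-noappend`, GROMACS numbers the logs `md.part0001.log`, `md.part0002.log`, and so on; the same parser extends naturally across them:

```python
# Aggregate performance entries across numbered part logs (from -noappend runs),
# reusing parse_log() from the monitoring script above
import glob

performance = []
for logfile in sorted(glob.glob('md.part*.log')):
    performance.extend(parse_log(logfile))
```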
## Results

### Performance Improvements

| Optimization Step | Performance (ns/day) | GPU Utilization |
|---|---|---|
| Initial Setup | 15.2 | 25% |
| Memory Optimized | 22.7 | 45% |
| GPU Tasks Tuned | 42.3 | 85% |
| Final Setup | 48.9 | 92% |
### Scaling Analysis

```python
# Scaling data visualization
import matplotlib.pyplot as plt

nodes = [1, 2, 4, 8]
speedup = [1, 1.85, 3.42, 6.1]

plt.plot(nodes, speedup, 'bo-', label='Measured')
plt.plot(nodes, nodes, 'k--', label='Linear')
plt.xlabel('Number of Nodes')
plt.ylabel('Speedup')
plt.legend()
plt.grid(True)
plt.show()
```
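The dashed line marks ideal linear scaling; dividing measured speedup by node count gives the parallel efficiency directly. A quick check on the same data:

```python
# Parallel efficiency = measured speedup / ideal (linear) speedup
nodes = [1, 2, 4, 8]
speedup = [1, 1.85, 3.42, 6.1]
for n, s in zip(nodes, speedup):
    print(f"{n} nodes: {s / n:.0%} efficiency")  # 8 nodes -> 76%
```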
## Key Learnings

- **Memory Management**
  - Use PME decomposition carefully
  - Monitor memory usage per rank (see the sketch after this list)
  - Adjust box size based on protein dimensions
- **GPU Optimization**
  - Balance PME workload
  - Tune update groups
  - Optimize neighbor searching
- **Network Considerations**
  - Use GPUDirect RDMA
  - Optimize domain decomposition
  - Monitor communication patterns
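On the per-rank memory point, a tiny MPI-aware probe makes each rank's consumption visible. A minimal sketch, assuming `psutil` and `mpi4py` are available on the cluster (the `log_rank_memory` name is illustrative):

```python
# Illustrative per-rank memory probe (assumes psutil and mpi4py).
# Launched under mpirun as a separate diagnostic, not inside GROMACS itself.
import psutil
from mpi4py import MPI

def log_rank_memory():
    rank = MPI.COMM_WORLD.Get_rank()
    rss_gb = psutil.Process().memory_info().rss / 1e9
    print(f"rank {rank}: {rss_gb:.2f} GB resident")

if __name__ == '__main__':
    log_rank_memory()
```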
## Code Snippets

### Production Run Script
```bash
#!/bin/bash
#SBATCH --job-name=spike_fold
#SBATCH --nodes=8
#SBATCH --gpus-per-node=4
#SBATCH --time=48:00:00

module load GROMACS/2023.2-CUDA

# Environment setup
export GMX_FORCE_UPDATE_DEFAULT_GPU=true
export CUDA_VISIBLE_DEVICES=0,1,2,3

# Run simulation (one rank per GPU: 8 nodes x 4 GPUs = 32 ranks)
mpirun -np 32 gmx_mpi mdrun \
    -deffnm spike \
    -maxh 47.5 \
    -nb gpu \
    -pme gpu \
    -bonded gpu \
    -update gpu \
    -ntomp 8 \
    -nstlist 100 \
    -dlb yes
```
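A note on `-maxh 47.5`: mdrun stops itself shortly before the requested wall-clock limit and writes a checkpoint, so the job exits cleanly within the 48-hour SLURM allocation instead of being killed mid-step.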
## Conclusions

Through careful optimization and monitoring, we achieved:

- 3.2× performance improvement (15.2 → 48.9 ns/day)
- 92% GPU utilization
- 6.1× speedup on 8 nodes (roughly 76% parallel efficiency)
## Resources
- GROMACS Performance Analysis
- GPU Optimization Guide
- Raw Data and Scripts
## Discussion
Share your experiences with large-scale protein simulations or ask questions below!