Communication Patterns
Optimizing communication patterns is crucial for achieving high performance in parallel applications. This guide presents common MPI communication patterns, optimized variants of them, and best practices for using them efficiently.
Common Communication Patterns
1. Master-Worker Pattern
// Master process distributes work to the workers
if (rank == 0) {
    for (int i = 1; i < size; i++) {
        MPI_Send(data, count, MPI_FLOAT, i, tag, comm);
    }
}

// Worker processes receive their portion from the master
if (rank > 0) {
    MPI_Recv(data, count, MPI_FLOAT, 0, tag, comm, &status);
}
2. Ring Pattern
// Ring communication: each rank exchanges data with its neighbours
int left  = (rank + size - 1) % size;
int right = (rank + 1) % size;

// Send to the right neighbour, receive from the left neighbour
MPI_Sendrecv(sendbuf, count, MPI_FLOAT, right, tag,
             recvbuf, count, MPI_FLOAT, left,  tag,
             comm, &status);
3. Pipeline Pattern
// Pipeline communication: data flows through the ranks in stages
if (rank == 0) {
    // First stage: process data and send it to the next rank
    MPI_Send(processed_data, count, MPI_FLOAT, 1, tag, comm);
} else if (rank < size - 1) {
    // Intermediate stages: receive from the previous rank, process, send on
    MPI_Recv(data, count, MPI_FLOAT, rank - 1, tag, comm, &status);
    MPI_Send(processed_data, count, MPI_FLOAT, rank + 1, tag, comm);
} else {
    // Last stage: receive and finalize
    MPI_Recv(data, count, MPI_FLOAT, rank - 1, tag, comm, &status);
}
Optimized Communication Patterns
1. Communication-Avoiding Algorithms
// Example: communication-avoiding matrix multiplication
// Instead of exchanging full matrices every iteration, do mostly local work
// and reduce across ranks only at fixed intervals.
for (int i = 0; i < local_size; i++) {
    // Local computation on this rank's block
    for (int j = 0; j < local_size; j++) {
        for (int k = 0; k < local_size; k++) {
            local_result[i][j] += local_a[i][k] * local_b[k][j];
        }
    }
    // Periodic communication instead of a reduction every iteration
    if ((i + 1) % communication_interval == 0) {
        MPI_Allreduce(local_result, global_result, count, MPI_FLOAT, MPI_SUM, comm);
    }
}
2. Hybrid Communication Patterns
// Example: hybrid MPI+OpenMP communication
// Threads compute and reduce within a process; MPI communicates between processes.
#pragma omp parallel
{
    int thread_id   = omp_get_thread_num();
    int num_threads = omp_get_num_threads();

    // Thread-local computation over this process's portion of the data
    #pragma omp for
    for (int i = 0; i < local_size; i++) {
        // Local computation
    }

    // Thread-local reduction into process-level data
    #pragma omp critical
    {
        // Update shared data
    }
}

// MPI communication between processes
MPI_Allreduce(local_data, global_data, count, MPI_FLOAT, MPI_SUM, comm);
Best Practices
- Minimize Communication
  - Use local computation where possible
  - Combine multiple messages into single transfers
  - Use non-blocking communication to overlap computation with communication (see the first sketch after this list)
- Optimize Collective Operations
  - Choose the appropriate collective operation for the data pattern
  - Use MPI_IN_PLACE when possible
  - Consider non-blocking collectives for large data (see the second sketch after this list)
- Load Balancing
  - Distribute work evenly across processes
  - Use dynamic load balancing for irregular workloads
  - Consider process migration for load balancing
- Communication-Avoiding Algorithms
  - Implement algorithms that minimize global communication
  - Use local reductions before global operations
  - Consider domain decomposition strategies
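A minimal sketch of the non-blocking overlap idea, assuming a halo exchange with a single neighbouring rank; neighbor, sendbuf, recvbuf, compute_interior, and compute_boundary are placeholder names, not part of any library:

// Sketch: overlap computation with communication using non-blocking calls
MPI_Request reqs[2];

MPI_Irecv(recvbuf, count, MPI_FLOAT, neighbor, tag, comm, &reqs[0]);
MPI_Isend(sendbuf, count, MPI_FLOAT, neighbor, tag, comm, &reqs[1]);

// Do work that does not depend on recvbuf while the transfer is in flight
compute_interior(local_data, local_size);

// Complete both transfers before reading recvbuf or reusing sendbuf
MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
compute_boundary(local_data, recvbuf, count);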
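The collective-related points can be combined in one call: MPI_IN_PLACE removes the separate send buffer, and the non-blocking MPI_Iallreduce (MPI-3) lets other work proceed while the reduction runs. A sketch, assuming each rank holds its partial sums in a placeholder buffer local_sums of length n:

// Sketch: in-place, non-blocking reduction
MPI_Request req;
MPI_Iallreduce(MPI_IN_PLACE, local_sums, n, MPI_FLOAT, MPI_SUM, comm, &req);

// Independent work can proceed while the reduction is in progress
do_unrelated_work();

// The reduced result is available in local_sums only after the wait completes
MPI_Wait(&req, MPI_STATUS_IGNORE);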
Performance Considerations
- Network Topology
  - Use topology-aware communication patterns (see the Cartesian-communicator sketch after this list)
  - Consider network distance in process placement
  - Use process affinity for better cache utilization
- Memory Bandwidth
  - Optimize data layout for better cache utilization
  - Use appropriate data types for communication
  - Consider memory pinning for DMA transfers
- Synchronization
  - Minimize synchronization points
  - Use non-blocking operations where possible
  - Consider using one-sided communication (see the second sketch after this list)
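One common way to make a pattern topology-aware is to let MPI reorder ranks on a Cartesian process grid. A sketch for a 2D decomposition; the grid shape, periodicity, and shift direction are illustrative choices:

// Sketch: 2D Cartesian communicator with rank reordering enabled
int dims[2]    = {0, 0};   // let MPI choose the process grid shape
int periods[2] = {1, 1};   // periodic in both dimensions (torus-like)
MPI_Comm cart_comm;

MPI_Dims_create(size, 2, dims);
MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 1 /* reorder */, &cart_comm);

// Neighbours for a shift of +1 along dimension 0: receive from src, send to dst.
// These ranks can be used directly in MPI_Sendrecv halo exchanges.
int src, dst;
MPI_Cart_shift(cart_comm, 0, 1, &src, &dst);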
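For the one-sided option, a hedged sketch of exposing a buffer in an RMA window and writing boundary data directly into a neighbour's memory; the buffer names (halo_buf, boundary_buf, neighbor) and the fence-based synchronization are just one possible choice:

// Sketch: one-sided halo update using an RMA window and fence synchronization
MPI_Win win;
MPI_Win_create(halo_buf, count * sizeof(float), sizeof(float),
               MPI_INFO_NULL, comm, &win);

MPI_Win_fence(0, win);
// Write this rank's boundary data into the neighbour's halo buffer
MPI_Put(boundary_buf, count, MPI_FLOAT, neighbor, 0 /* target displacement */,
        count, MPI_FLOAT, win);
MPI_Win_fence(0, win);   // data is visible at the target after the closing fence

MPI_Win_free(&win);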