
Communication Patterns

Efficient communication is crucial for achieving high performance in parallel applications. This guide presents common communication patterns and best practices for using them effectively in MPI applications.

Common Communication Patterns

1. Master-Worker Pattern

// Master process
if (rank == 0) {
    for (int i = 1; i < size; i++) {
        MPI_Send(data, count, MPI_FLOAT, i, tag, comm);
    }
}

// Worker processes
if (rank > 0) {
    MPI_Recv(data, count, MPI_FLOAT, 0, tag, comm, &status);
}
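
When every worker needs the same buffer, as in the loop above, the send loop can usually be replaced by a single collective call. A minimal sketch is shown below; MPI_Scatter would be the analogue if each worker needed a distinct slice instead.

// Collective equivalent of the send loop above: every rank calls
// MPI_Bcast, and rank 0 acts as the root that supplies the data.
MPI_Bcast(data, count, MPI_FLOAT, 0, comm);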

2. Ring Pattern

// Ring communication
int left = (rank + size - 1) % size;
int right = (rank + 1) % size;

// Send to right, receive from left
MPI_Sendrecv(sendbuf, count, MPI_FLOAT, right, tag,
             recvbuf, count, MPI_FLOAT, left, tag,
             comm, &status);
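
A single exchange is rarely the whole story: ring algorithms typically rotate data around the ring for size - 1 steps so that every rank eventually sees every block. Below is a minimal sketch of that rotation, assuming workbuf initially holds this rank's own block (the buffer name is illustrative).

// Rotate the block around the ring; after size - 1 steps every rank
// has received every other rank's block (the core of a ring allgather).
for (int step = 0; step < size - 1; step++) {
    MPI_Sendrecv_replace(workbuf, count, MPI_FLOAT,
                         right, tag, left, tag, comm, MPI_STATUS_IGNORE);
    // workbuf now holds the block received from the left neighbor;
    // use or copy it before the next rotation overwrites it.
}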

3. Pipeline Pattern

// Pipeline communication
if (rank == 0) {
    // Process data and send to next
    MPI_Send(processed_data, count, MPI_FLOAT, 1, tag, comm);
} else if (rank < size - 1) {
    // Receive from previous, process, send to next
    MPI_Recv(data, count, MPI_FLOAT, rank - 1, tag, comm, &status);
    MPI_Send(processed_data, count, MPI_FLOAT, rank + 1, tag, comm);
} else {
    // Last process: receive and finalize
    MPI_Recv(data, count, MPI_FLOAT, rank - 1, tag, comm, &status);
}
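
A pipeline only pays off when data streams through in chunks, so that all stages work concurrently on different pieces. The sketch below illustrates this, assuming the stream is split into num_chunks blocks and process() stands in for the per-stage work (both names are hypothetical).

// Streaming version: each rank works on chunk c while its neighbors
// work on earlier or later chunks, keeping all pipeline stages busy.
for (int c = 0; c < num_chunks; c++) {
    if (rank == 0) {
        process(chunk[c]);                       /* hypothetical first-stage work */
        MPI_Send(chunk[c], count, MPI_FLOAT, 1, tag, comm);
    } else if (rank < size - 1) {
        MPI_Recv(chunk[c], count, MPI_FLOAT, rank - 1, tag, comm, MPI_STATUS_IGNORE);
        process(chunk[c]);                       /* middle-stage work */
        MPI_Send(chunk[c], count, MPI_FLOAT, rank + 1, tag, comm);
    } else {
        MPI_Recv(chunk[c], count, MPI_FLOAT, rank - 1, tag, comm, MPI_STATUS_IGNORE);
        process(chunk[c]);                       /* final-stage work */
    }
}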

Optimized Communication Patterns

1. Communication-Avoiding Algorithms

// Example: communication-avoiding matrix multiplication.
// Instead of exchanging full matrices after every update, accumulate
// partial results locally and combine them only periodically.
for (int i = 0; i < local_size; i++) {
    // Local computation: accumulate one row of the partial product
    for (int j = 0; j < local_size; j++) {
        for (int k = 0; k < local_size; k++) {
            local_result[i][j] += local_a[i][k] * local_b[k][j];
        }
    }
    // Periodic communication: merge partial results across processes
    if (i % communication_interval == 0) {
        MPI_Allreduce(local_result, global_result, count, MPI_FLOAT, MPI_SUM, comm);
    }
}

2. Hybrid Communication Patterns

// Example: Hybrid MPI+OpenMP communication
#pragma omp parallel
{
    int thread_id = omp_get_thread_num();
    int num_threads = omp_get_num_threads();

    // Thread-local computation
    #pragma omp for
    for (int i = 0; i < local_size; i++) {
        // Local computation
    }

    // Reduce thread-local results into shared process-level data
    #pragma omp critical
    {
        // Update shared data
    }
}

// MPI communication between processes
MPI_Allreduce(local_data, global_data, count, MPI_FLOAT, MPI_SUM, comm);
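
One correctness detail worth noting for the hybrid pattern: the MPI library must be initialized with an appropriate thread-support level. Since all MPI calls above happen outside the OpenMP parallel region, MPI_THREAD_FUNNELED is sufficient; a minimal sketch of the initialization follows.

// Request thread support for hybrid MPI+OpenMP at startup.
int provided;
MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
if (provided < MPI_THREAD_FUNNELED) {
    // The library cannot guarantee the requested level;
    // abort (or fall back to a flat-MPI configuration).
    MPI_Abort(MPI_COMM_WORLD, 1);
}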

Best Practices

  1. Minimize Communication
     - Use local computation where possible
     - Combine multiple small messages into single transfers
     - Use non-blocking communication to overlap communication with computation (see the sketch after this list)

  2. Optimize Collective Operations
     - Choose the collective operation that matches the data pattern
     - Use MPI_IN_PLACE when possible (also shown in the sketch below)
     - Consider non-blocking collectives for large data

  3. Load Balancing
     - Distribute work evenly across processes
     - Use dynamic load balancing for irregular workloads
     - Consider process migration for load balancing

  4. Communication-Avoiding Algorithms
     - Implement algorithms that minimize global communication
     - Use local reductions before global operations
     - Consider domain decomposition strategies
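
As a concrete illustration of the first two points, the hedged sketch below overlaps computation with non-blocking point-to-point transfers and then uses MPI_IN_PLACE in a reduction; compute_on_independent_data() is a placeholder for work that does not touch the communication buffers.

// Post communication, compute while it progresses, then complete it.
MPI_Request reqs[2];
MPI_Irecv(recvbuf, count, MPI_FLOAT, left,  tag, comm, &reqs[0]);
MPI_Isend(sendbuf, count, MPI_FLOAT, right, tag, comm, &reqs[1]);

compute_on_independent_data();   /* placeholder: must not touch sendbuf/recvbuf */

MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);

// MPI_IN_PLACE avoids allocating and filling a separate send buffer.
MPI_Allreduce(MPI_IN_PLACE, local_sums, count, MPI_FLOAT, MPI_SUM, comm);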

Performance Considerations

  1. Network Topology
     - Use topology-aware communication patterns
     - Consider network distance in process placement
     - Use process affinity for better cache utilization

  2. Memory Bandwidth
     - Optimize data layout for better cache utilization
     - Use appropriate data types for communication
     - Consider memory pinning for DMA transfers

  3. Synchronization
     - Minimize synchronization points
     - Use non-blocking operations where possible
     - Consider one-sided communication (see the sketch after this list)
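
For the last point, a minimal sketch of one-sided communication is shown below: each rank exposes recvbuf as a window and writes its data directly into its right neighbor's memory, with fences providing the synchronization (left and right as defined in the ring pattern above).

// Expose recvbuf as an RMA window on every rank.
MPI_Win win;
MPI_Win_create(recvbuf, count * sizeof(float), sizeof(float),
               MPI_INFO_NULL, comm, &win);

MPI_Win_fence(0, win);                        // open the access epoch
MPI_Put(sendbuf, count, MPI_FLOAT,            // write local data...
        right, 0, count, MPI_FLOAT, win);     // ...into the right neighbor's window
MPI_Win_fence(0, win);                        // close the epoch: data is now visible

MPI_Win_free(&win);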

Discussion

Join the discussion about communication patterns and optimization techniques in our community forum.