
This page summarizes the fundamentals and main findings of the ongoing multi-phase work by the Scientific Computing Group at NRAO. The main goal is to characterize the execution of the pipelines with respect to computing resources. We expect this work to give the ALMA ARCs a deeper understanding of the computational cost of data processing jobs, while providing developers with an additional tool for tracking specific areas where CASA can be made more resource efficient.

Data measured by the profiling framework

  • Timing
  • Memory footprint per process (a sampling sketch follows this list)
  • Memory load of a node (used, cached, swap, and the largest slab block)
  • Number of file descriptors per process
  • I/O statistics on the Lustre file system (number of files per I/O size range: 0k-4k, 4k-8k, ...)
  • Number and duration of system calls (open, close, read, write, fcntl, fsync)
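
The profiling framework's implementation is not reproduced on this page. As a rough, hypothetical illustration of how metrics of the kind listed above can be collected on a Linux node, the sketch below reads the per-process memory footprint (VmRSS) and open file descriptor count from /proc/<pid>, and node-wide memory counters from /proc/meminfo. The monitored pid list and sampling cadence are placeholders, and the real framework may gather these numbers differently.

# Minimal sketch of per-process resource sampling on Linux (illustrative only,
# not the NRAO profiling framework). The pid list and sampling cadence are
# hypothetical placeholders.
import os
import time


def process_sample(pid):
    """Return (rss_kB, open_fd_count) for one process, read from /proc."""
    rss_kb = None
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                rss_kb = int(line.split()[1])   # resident set size, reported in kB
                break
    n_fds = len(os.listdir(f"/proc/{pid}/fd"))  # one entry per open file descriptor
    return rss_kb, n_fds


def node_memory():
    """Return node-wide memory counters (kB) from /proc/meminfo.

    The total 'Slab' counter is used as a coarse stand-in; finding the largest
    individual slab cache would require parsing /proc/slabinfo instead.
    """
    wanted = {"MemTotal", "MemFree", "Cached", "SwapTotal", "SwapFree", "Slab"}
    counters = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key = line.split(":")[0]
            if key in wanted:
                counters[key] = int(line.split()[1])
    return counters


if __name__ == "__main__":
    pids = [os.getpid()]      # hypothetical: monitor only this process
    for _ in range(3):        # hypothetical: three samples, one second apart
        for pid in pids:
            rss, fds = process_sample(pid)
            print(f"pid={pid} rss={rss} kB open_fds={fds}")
        print("node:", node_memory())
        time.sleep(1)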


Tests

The following tests were performed on the AOC cluster:

  • Serial benchmarks for all datasets
  • Parallelization breadth (number of MPI processes; see the launch sketch below)
  • Storage type
  • Concurrency

Nodes on the AOC cluster were selected according to the requirements and properties of each test:

  • nmpost001-050 for parallelization breadth (to be consistent with phase 1) and storage
  • nmpost051-060 for concurrency (and also parallelization breadth)
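
For the parallelization breadth tests, the quantity varied is the number of MPI processes handed to the CASA launcher. The sketch below shows one hypothetical way such a sweep could be scripted, assuming the standard mpicasa front end (mpicasa -n <N> casa ...); the pipeline script name, the breadth values, and the bare wall-clock timing are placeholders rather than the actual test harness.

# Sketch of a parallelization-breadth sweep (not the actual test harness).
# Assumes the standard mpicasa launcher syntax, mpicasa -n <N> casa ...;
# the pipeline script name and the breadth values are placeholders.
import subprocess
import time

PIPELINE_SCRIPT = "run_pipeline.py"   # hypothetical pipeline driver script
PROCESS_COUNTS = [2, 4, 8, 16]        # hypothetical MPI breadths to compare

for n in PROCESS_COUNTS:
    t0 = time.time()
    # One pipeline run per breadth; in the real tests the profiling framework
    # collects the detailed metrics, not this bare wall-clock number.
    subprocess.run(
        ["mpicasa", "-n", str(n), "casa", "--nogui", "--nologger",
         "-c", PIPELINE_SCRIPT],
        check=True,
    )
    print(f"{n} MPI processes: {time.time() - t0:.1f} s wall clock")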

The following tests were performed on AWS:

  • Parallelization breadth (number of MPI processes)
  • Memory limit (see the sketch after this list, which also covers the OpenMP thread count)
  • Timing vs. CPU type
  • Number of OpenMP threads
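
The memory limit and OpenMP thread count tests each constrain a single resource for an otherwise identical run. The mechanism actually used on AWS (for example instance sizing versus per-process limits) is not detailed here; the sketch below shows one generic way to cap a child process's address space and pin its OpenMP thread count. The command line, the 32 GB cap, and the thread count are hypothetical.

# Sketch of constraining one run's memory and OpenMP thread count
# (illustrative only; the AWS tests may instead rely on instance sizing
# or cgroups). The command line, the 32 GB cap, and the thread count
# are hypothetical.
import os
import resource
import subprocess

MEM_LIMIT_BYTES = 32 * 1024**3   # hypothetical 32 GB address-space cap
OMP_THREADS = 4                  # hypothetical OpenMP thread count


def limit_memory():
    # Runs in the child just before exec; caps its virtual address space.
    resource.setrlimit(resource.RLIMIT_AS, (MEM_LIMIT_BYTES, MEM_LIMIT_BYTES))


env = dict(os.environ, OMP_NUM_THREADS=str(OMP_THREADS))
subprocess.run(
    ["casa", "--nogui", "-c", "run_pipeline.py"],   # hypothetical command
    env=env,
    preexec_fn=limit_memory,
    check=True,
)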


