...
- Timing
- Memory footprint per process
- Memory load of a node (used, cached, swap and the largest slab block)
- Number of file descriptors per process
- IO statistics on the Lustre file system (number of files per IO size range: 0k-4k, 4k-8k, ...)
- Number and duration of system calls (open, close, read, write, fcntl, fsync)
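The IO size ranges above can be tallied with a small script once each file has been reduced to a single IO size. The sketch below is illustrative only and rests on assumptions not stated in the source: the input is one size in bytes per file, the first bucket is 0k-4k, and each later bucket doubles in width (4k-8k, 8k-16k, ...), with the upper bound inclusive. The helper names `bucket_label` and `io_size_histogram` are hypothetical.

```python
from collections import Counter
from typing import Iterable


def bucket_label(size_bytes: int) -> str:
    """Map an IO size in bytes to a range label such as '0k-4k' or '4k-8k'.

    Assumption (not from the source): power-of-two KiB buckets, where the
    first bucket is 0-4 KiB and each following bucket doubles the upper bound.
    Upper bounds are inclusive, so exactly 4096 bytes falls in '0k-4k'.
    """
    upper_kib = 4
    # Double the upper bound until the size fits inside the bucket.
    while size_bytes > upper_kib * 1024:
        upper_kib *= 2
    lower_kib = 0 if upper_kib == 4 else upper_kib // 2
    return f"{lower_kib}k-{upper_kib}k"


def io_size_histogram(sizes: Iterable[int]) -> Counter:
    """Count files per IO size range, assuming one IO size per file."""
    return Counter(bucket_label(s) for s in sizes)


if __name__ == "__main__":
    # Hypothetical per-file IO sizes in bytes, for illustration only.
    sample = [100, 4096, 5000, 9000, 70000]
    for label, count in sorted(io_size_histogram(sample).items()):
        print(label, count)
```

If the actual tooling reports fixed-width ranges instead of doubling ones, only `bucket_label` needs to change.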
Tests
The following tests were performed on the AOC cluster:
- Serial benchmarks for all datasets
- Parallelization breadth (number of MPI processes)
- Storage type
- Concurrency
Nodes on the AOC cluster were selected according to test requirements and node properties, as follows:
- nmpost001-050 for parallelization breadth (to be consistent with phase 1) and storage
- nmpost051-060 for concurrency (and also parallelization breadth)
The following tests were performed on AWS:
- Parallelization breadth (number of MPI processes)
- Memory limit
- Timing vs CPU type
- Number of OpenMP threads
Conclusions
- Parallelization (MPI) of the calibration pipeline without creating an MMS reduces only tclean times, saving approximately 1 - 15% of the total pipeline runtime
- Parallelization (MPI) of the imaging pipeline results in runtimes decreasing nearly linearly with the number of MPI processes
- The runtime reduction with local NVMe storage devices is less than 15% relative to Lustre - to be tested with larger devices that can accommodate working directories larger than ~1.5 TB
- No appreciable difference in imaging runtime between 8, 16 and 32 GB RAM per process (8-way MPI) - not yet tested below 8 GB per process
- The current recommendation is to run isolated jobs or 2-way concurrency (2 jobs per node) with 8-way parallelization - more testing is planned to understand the swap memory behavior of 4-way concurrency, which is more time-efficient
- MPI parallelization is advantageous over OpenMP if there is enough memory to support more processes; OpenMP is advantageous when memory is exhausted and there are unused cores
- Higher PassMark scores (an industry-standard CPU benchmark - https://www.passmark.com/), as found on newer, faster CPUs, are likely indicative of faster runs
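The contrast between the calibration and imaging results is consistent with Amdahl's law: if only a fraction of the serial runtime (here, tclean) is parallelized, the total saving can never exceed that fraction, however many MPI processes are used. The sketch below is a simplified model and not the method used in these tests. It ignores MPI overhead, IO contention and memory effects, and the tclean share of 10% and the imaging share of 95% are hypothetical values chosen only for illustration.

```python
def amdahl_runtime(serial_runtime: float, parallel_fraction: float, n_procs: int) -> float:
    """Estimated runtime when only `parallel_fraction` of the work scales with n_procs.

    Simplified Amdahl's-law model: no communication or IO overhead is included.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if n_procs < 1:
        raise ValueError("n_procs must be >= 1")
    serial_part = 1.0 - parallel_fraction
    return serial_runtime * (serial_part + parallel_fraction / n_procs)


def saving_fraction(parallel_fraction: float, n_procs: int) -> float:
    """Fraction of total runtime saved relative to a serial run."""
    return 1.0 - amdahl_runtime(1.0, parallel_fraction, n_procs)


if __name__ == "__main__":
    # Hypothetical: tclean accounts for 10% of calibration pipeline runtime.
    calib = saving_fraction(0.10, 8)
    # Hypothetical: 95% of imaging pipeline runtime parallelizes.
    imaging = saving_fraction(0.95, 8)
    print(f"calibration, 8-way MPI: {calib:.1%} saved")
    print(f"imaging, 8-way MPI:     {imaging:.1%} saved")
```

With a small parallel fraction the saving stays bounded by that fraction, while a parallel fraction close to 1 gives the nearly linear scaling reported for imaging.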