...

  • Timing
  • Memory footprint per process
  • Memory load of a node (used, cached, swap and the largest slab block)
  • Number of file descriptors per process
  • IO statistics on the Lustre file system (number of files per IO size range: 0k-4k, 4k-8k, ...)
  • Number and duration of system calls (open, close, read, write, fcntl, fsync)
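The per-process metrics above (memory footprint, open file descriptors) can be sampled on a Linux node from /proc. A minimal sketch, assuming Linux; the function names are illustrative and not part of the monitoring tooling used in these tests:

```python
import os

def memory_footprint_kb(pid):
    """Return the resident set size (VmRSS) of a process in kB, read from /proc."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])  # /proc reports VmRSS in kB
    return 0

def open_fd_count(pid):
    """Count open file descriptors by listing /proc/<pid>/fd."""
    return len(os.listdir(f"/proc/{pid}/fd"))

pid = os.getpid()
print(f"RSS: {memory_footprint_kb(pid)} kB, open fds: {open_fd_count(pid)}")
```

A periodic sampler over all pipeline PIDs, written the same way, is enough to reproduce the per-process timing and memory traces described here.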


Tests

The following tests were performed on the AOC cluster:

  • Serial benchmarks for all datasets
  • Parallelization breadth (number of MPI processes)
  • Storage type
  • Concurrency

Nodes on the AOC cluster were selected as follows, according to the requirements and properties of each test.

  • nmpost001-050 for parallelization breadth (to be consistent with phase 1) and storage
  • nmpost051-060 for concurrency (and also parallelization breadth)

The following tests were performed on AWS:

  • Parallelization breadth (number of MPI processes)
  • Memory limit
  • Timing vs CPU type
  • Number of OpenMP threads

Conclusions

  • Parallelization (MPI) of the calibration pipeline without creating an MMS reduces only tclean times, which account for approximately 1-15% of total pipeline runtime
  • Parallelization (MPI) of the imaging pipeline results in runtimes decreasing nearly linearly with the number of MPI processes
  • Reduction in runtime with local NVMe storage devices is less than 15% with respect to Lustre - to be tested with larger devices that can accommodate working directories larger than ~1.5 TB
  • No appreciable difference in imaging run time between 8, 16 and 32 GB RAM per process (8-way MPI) - not yet tested below 8 GB per process

  • Current recommendation is to run isolated jobs or 2-way concurrency (2 jobs per node) with 8-way parallelization - more testing is planned to understand the swap memory behavior of 4-way concurrency, which is more time-efficient

  • MPI parallelization is advantageous over OpenMP if there’s enough memory to support more processes; OpenMP is advantageous when memory is exhausted and there are unused cores

  • Newer, faster CPUs with higher PassMark scores (an industry-standard benchmark - https://www.passmark.com/) are likely to yield faster runs
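The contrast between the calibration and imaging conclusions above is what Amdahl's law predicts: if only tclean (roughly 1-15% of calibration runtime) parallelizes, overall speedup stays small, while imaging, where most of the runtime parallelizes, scales nearly linearly. A short illustration; the 0.95 parallel fraction used for imaging is an assumed value, not a measurement from these tests:

```python
def amdahl_speedup(parallel_fraction, n_procs):
    """Amdahl's law: overall speedup when only a fraction of runtime parallelizes."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n_procs)

# Calibration without an MMS: only tclean (~15% of runtime at most) uses MPI,
# so 8 processes yield only about a 1.15x overall speedup.
print(amdahl_speedup(0.15, 8))

# Imaging (assumed ~95% parallel fraction): 8 processes approach linear scaling.
print(amdahl_speedup(0.95, 8))
```

This is why MPI breadth pays off for the imaging pipeline but barely moves calibration runtimes.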