...

  • Timing
  • Memory footprint per process
  • Memory load of a node (used, cached, swap and the largest slab block)
  • Number of file descriptors per process
  • IO statistics on the Lustre file system (number of files per IO size range: 0k-4k, 4k-8k, ...)
  • Number and duration of system calls (open, close, read, write, fcntl, fsync)
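The per-process metrics above (memory footprint, open file descriptors) can be sampled on a Linux node from /proc. A minimal sketch, assuming Linux; the function names are illustrative and not part of the monitoring tooling used in these tests:

```python
import os

def memory_footprint_kb(pid):
    """Return the resident set size (VmRSS) of a process in kB, read from /proc."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])  # /proc reports VmRSS in kB
    return 0

def open_fd_count(pid):
    """Count open file descriptors by listing /proc/<pid>/fd."""
    return len(os.listdir(f"/proc/{pid}/fd"))

pid = os.getpid()
print(f"RSS: {memory_footprint_kb(pid)} kB, open fds: {open_fd_count(pid)}")
```

A periodic sampler over all pipeline PIDs, written the same way, is enough to reproduce the per-process timing and memory traces described here.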


Tests

The following tests were performed on the AOC cluster:

  • Serial benchmarks for all datasets
  • Parallelization breadth (number of MPI processes)
  • Storage type
  • Concurrency

Nodes on the AOC cluster were selected as follows, according to the requirements and properties of each test.

  • nmpost001-050 for parallelization breadth (to be consistent with phase 1) and storage
  • nmpost051-060 for concurrency (and also parallelization breadth)

The following tests were performed on AWS:

  • Parallelization breadth (number of MPI processes)
  • Memory limit
  • Timing vs CPU type
  • Number of OpenMP threads

Conclusions

  • Parallelization (MPI) of the calibration pipeline without creating an MMS reduces only tclean times, which account for approximately 1-15% of total pipeline runtime
  • Parallelization (MPI) of the imaging pipeline results in runtimes decreasing nearly linearly with the number of MPI processes
  • Reduction in runtime with local NVMe storage devices is less than 15% with respect to Lustre - to be tested with larger devices that can accommodate working directories larger than ~1.5 TB
  • No appreciable difference in imaging run time between 8, 16 and 32 GB RAM per process (8-way MPI) - not yet tested below 8 GB per process

  • Current recommendation is to run isolated jobs or 2-way concurrency (2 jobs per node) with 8-way parallelization - more testing is planned to understand the swap memory behavior of 4-way concurrency, which is more time-efficient

  • MPI parallelization is advantageous over OpenMP if there’s enough memory to support more processes; OpenMP is advantageous when memory is exhausted and there are unused cores

  • Newer, faster CPUs with higher PassMark scores (an industry-standard benchmark - https://www.passmark.com/) are likely to yield faster runs
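The contrast between the calibration and imaging conclusions above is what Amdahl's law predicts: if only tclean (roughly 1-15% of calibration runtime) parallelizes, overall speedup stays small, while imaging, where most of the runtime parallelizes, scales nearly linearly. A short illustration; the 0.95 parallel fraction used for imaging is an assumed value, not a measurement from these tests:

```python
def amdahl_speedup(parallel_fraction, n_procs):
    """Amdahl's law: overall speedup when only a fraction of runtime parallelizes."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n_procs)

# Calibration without an MMS: only tclean (~15% of runtime at most) uses MPI,
# so 8 processes yield only about a 1.15x overall speedup.
print(amdahl_speedup(0.15, 8))

# Imaging (assumed ~95% parallel fraction): 8 processes approach linear scaling.
print(amdahl_speedup(0.95, 8))
```

This is why MPI breadth pays off for the imaging pipeline but barely moves calibration runtimes.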