  • Parallelization breadth (number of MPI processes)
  • Memory limit
  • Timing vs CPU type
  • Number of OpenMP threads

Files

Conclusions

Summary of the main conclusions. See the reports and presentations under Files for detailed information.

  • Parallelization (MPI) of the calibration pipeline without creating an MMS reduces only the tclean times, resulting in a reduction of approximately 1-15% in total pipeline runtime
  • Parallelization (MPI) of the imaging pipeline results in runtimes decreasing nearly linearly with the number of MPI processes
  • Reduction in runtime with local NVMe storage devices is less than 15% with respect to Lustre - to be tested with larger devices that can accommodate working directories larger than ~1.5 TB
  • No appreciable difference in imaging runtime between 8, 16 and 32 GB RAM per process (8-way MPI) - not yet tested below 8 GB per process

  • Current recommendation is to run isolated jobs or 2-way concurrency (2 jobs on a node) with 8-way parallelization - more testing is planned to understand the swap memory behavior of 4-way concurrency, which is more time-efficient

  • MPI parallelization is advantageous over OpenMP if there’s enough memory to support more processes; OpenMP is advantageous when memory is exhausted and there are unused cores

  • Newer, faster CPUs with a higher PassMark score (an industry-standard benchmark - https://www.passmark.com/) are likely to give faster runs
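
The trade-off between concurrency, MPI breadth and memory per process in the conclusions above can be explored with simple arithmetic. The following Python sketch is illustrative only: the function names, the 512 GB node size and the even split of node RAM across processes are assumptions, not measured values from the tests. The 8 GB floor reflects only that less than 8 GB per process has not yet been tested.

```python
def memory_per_process_gb(node_ram_gb: float, jobs_per_node: int,
                          mpi_processes_per_job: int) -> float:
    """RAM available to each MPI process, assuming node memory is split
    evenly across all processes (ignores OS and filesystem overhead)."""
    if jobs_per_node < 1 or mpi_processes_per_job < 1:
        raise ValueError("jobs_per_node and mpi_processes_per_job must be >= 1")
    return node_ram_gb / (jobs_per_node * mpi_processes_per_job)


def max_concurrency(node_ram_gb: float, mpi_processes_per_job: int,
                    min_gb_per_process: float = 8.0) -> int:
    """Largest number of concurrent jobs per node that keeps every MPI
    process at or above min_gb_per_process (assumed floor: 8 GB)."""
    if mpi_processes_per_job < 1 or min_gb_per_process <= 0:
        raise ValueError("invalid process count or memory floor")
    return int(node_ram_gb // (mpi_processes_per_job * min_gb_per_process))


if __name__ == "__main__":
    node_ram_gb = 512  # hypothetical node size, not taken from the tests
    for jobs in (1, 2, 4):
        per_proc = memory_per_process_gb(node_ram_gb, jobs, 8)
        print(f"{jobs} job(s) x 8-way MPI: {per_proc:.1f} GB per process")
```

On the hypothetical 512 GB node, 2-way concurrency with 8-way MPI leaves 32 GB per process and 4-way concurrency leaves 16 GB, both inside the 8-32 GB range where no appreciable difference in imaging runtime was seen. This estimate says nothing about swap behavior, which still needs the planned testing.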