Numbers are in hours

CPUs at CHTC are noticeably slower than CPUs at NRAO. For example, their set of c20xx machines (e20{03..18}) each have two Intel Xeon Silver 4114 2.20GHz processors and 0.5TB to 1TB of memory, while their large memory machines (mem3, mem2001, mem2002) each have four Intel Xeon E7-4820 v4 2.00GHz processors and 2TB to 4TB of memory.


CASA-6

Small Data Set

Small data set VLASS1.2.sb36484946.eb36542800.58574.4235612037_ptgfix_split_smaller.ms with full parameters, using cfcache from local disk.

Step    NRAO (run06)    NRAO/CHTC ()    NRAO/AWS ()
01      1.0
05      4.8
06      1.0
07      1.2
15      4.0
16      0.8
23      3.0
24      1.3
Total   17.1

CASA-5

Large Data Set

Large data set VLASS1.2.sb36491855.eb36574404.58585.53016267361_datacolumn.ms with full parameters and copying cfcache to local disk at CHTC.

Step    NRAO (steps-all-parallel9)    NRAO/CHTC (steps-all-parallel17)    NRAO/AWS (steps-all-parallel16)
01      9.4      27.7 (ran at CHTC)6.48.912.3
05      60.2     171.567.365.9
06      24       27.924.48
07      11.8     218.411.2
15      55.2     161.05861.96
16      6.1      4.05.77.6
23      230.8    140.1
24      46       54.4
Total   443.5    373.9


Small Data Set

Small data set test.ms with full parameters, not copying cfcache to local disk at CHTC, and using the 16k (wrong) cfcache.

Step    NRAO (steps-all-parallel21)    NRAO/CHTC (steps-all-parallel19)    NRAO/AWS (steps-all-parallel20)
01      1.2                            1.2                                 1.2
05      5.5                            13.2                                5.3
06      1.6                            2.2                                 1.8
07      1.3                            1.4                                 1.4
15      5.4                            12.0                                5.2
16      1.0                            1.0                                 1.0
23      6.6                            13.9                                6.4
24      3.5                            7.2                                 3.4
Total   26.1                           52.8                                25.0
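To see which steps account for the difference, the per-step times can be turned into ratios against the NRAO run. The snippet below is only a sketch: the numbers are typed in by hand from the small data set table above, and the helper name `ratios` is made up for illustration.

```python
# Per-step run times in hours, copied by hand from the small data set table.
steps = ["01", "05", "06", "07", "15", "16", "23", "24"]
nrao = [1.2, 5.5, 1.6, 1.3, 5.4, 1.0, 6.6, 3.5]
chtc = [1.2, 13.2, 2.2, 1.4, 12.0, 1.0, 13.9, 7.2]
aws = [1.2, 5.3, 1.8, 1.4, 5.2, 1.0, 6.4, 3.4]


def ratios(site, baseline):
    """Return {step: site_time / baseline_time}, rounded to two decimals."""
    return {s: round(t / b, 2) for s, t, b in zip(steps, site, baseline)}


chtc_ratio = ratios(chtc, nrao)
aws_ratio = ratios(aws, nrao)

# Print one line per step: step, CHTC/NRAO ratio, AWS/NRAO ratio.
for step in steps:
    print(f"{step}  CHTC/NRAO={chtc_ratio[step]:.2f}  AWS/NRAO={aws_ratio[step]:.2f}")
```

With these inputs, steps 05, 15, 23 and 24 come out at a little over twice the NRAO time at CHTC, while steps 01 and 16 are at parity.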

Possible reasons for this slowdown:

  • cfcache on cephfs
  • Slower CPUs
  • Multiple users
  • Hyperthreading

I ran a small data set test with full parameters at CHTC that copied cfcache from /staging to local disk, and step05 took only 16.7 hours instead of the 56.8 hours it had taken using cfcache on /staging.
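A minimal sketch of that copy step, for reference. It is not the script used for these runs: the function name `stage_cfcache` and any paths passed to it are hypothetical, and it assumes Python 3.8 or later for `dirs_exist_ok`.

```python
import shutil
import time
from pathlib import Path


def stage_cfcache(src, dst):
    """Copy a cfcache directory tree to local disk.

    Returns (destination path, elapsed seconds) so the copy cost can be
    compared with the time saved in the imaging steps.
    """
    src, dst = Path(src), Path(dst)
    start = time.monotonic()
    # dirs_exist_ok=True lets a rerun reuse an existing destination directory.
    shutil.copytree(src, dst, dirs_exist_ok=True)
    return dst, time.monotonic() - start
```

In a job this would be called once before the first step that reads the cfcache, for example with a source directory under /staging and a destination in the job's local scratch directory (HTCondor exposes that as the `_CONDOR_SCRATCH_DIR` environment variable). The job would then point at the returned path instead of /staging.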

Small data set test.ms with full parameters, using the 8k (right) cfcache and copying the cfcache to local disk at CHTC.


Wallclock time from start to finish for the small data set (test.ms)

  • NRAO: 26.5
  • NRAO/CHTC: 111.5, so this job spent about as much time waiting for nodes as running
  • NRAO/AWS: 27.3
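The waiting time can be estimated by subtracting the summed step times from the wallclock time. This sketch assumes the Total row of the small data set table reads 26.1, 52.8 and 25.0 hours for the three sites, and it treats everything outside the steps as overhead (queue wait, file transfer and so on), which is a simplification.

```python
# Wallclock hours from the list above.
wallclock = {"NRAO": 26.5, "NRAO/CHTC": 111.5, "NRAO/AWS": 27.3}
# Summed step times in hours (assumed reading of the table's Total row).
run_total = {"NRAO": 26.1, "NRAO/CHTC": 52.8, "NRAO/AWS": 25.0}

# Hours not accounted for by the steps themselves.
overhead = {
    site: round(wallclock[site] - run_total[site], 1) for site in wallclock
}

for site, hours in overhead.items():
    share = hours / wallclock[site]
    print(f"{site}: {hours} h outside the steps ({share:.0%} of wallclock)")
```

For NRAO/CHTC this gives 58.7 hours outside the steps against 52.8 hours inside them, which matches the "about as much time waiting as running" observation.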

...