Definitions
cpuset: The set of cores on which the job is allowed to run. On the dual-processor Linux nodes used here, all the even-numbered cores are on one socket and the odd-numbered cores are on the other socket. E.g.
cpuset=0,2,4,6,8,10,12,14 # all the cores on one 8-core socket.
cpuset=0,1,2,3,4,5,6,7 # four cores on one socket and four on the other.
A distribution written as X-Y (e.g. 6-2) means X cores on one socket and Y cores on the other.
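The definition above can be made concrete with a small Python sketch. It expands a cpuset string, including the range syntax Torque reports (such as 0-1,3-5), and counts how many cores land on each socket. It assumes, as stated above, that even-numbered cores are on one socket and odd-numbered cores on the other. That holds for the nodes tested here but is not universal, so lscpu is worth checking on other hardware. The function names are hypothetical.

```python
def expand_cpuset(spec: str) -> list[int]:
    """Expand a cpuset list such as "0-1,3-5,7" into [0, 1, 3, 4, 5, 7]."""
    cpus: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))  # inclusive range
        else:
            cpus.add(int(part))
    return sorted(cpus)


def socket_split(spec: str) -> str:
    """Return the distribution as "<even>-<odd>", e.g. "4-4".

    Assumption: even-numbered cores share one socket and odd-numbered
    cores the other, as on the dual-processor nodes tested here.
    """
    cpus = expand_cpuset(spec)
    even = sum(1 for c in cpus if c % 2 == 0)
    return f"{even}-{len(cpus) - even}"


if __name__ == "__main__":
    for spec in ("0,2,4,6,8,10,12,14", "0,1,2,3,4,5,6,7"):
        print(spec, "->", socket_split(spec))
```

Run directly, it reports 8-0 for the first example cpuset and 4-4 for the second.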
Conclusions
- casa-5 seems to produce the same image no matter what the cpuset is.
https://docs.google.com/spreadsheets/d/1aKCzeCOj1-50mC7I4fN2eMupPrfR6OH-6LN4jSK9LtQ/edit#gid=670565607
- casa-pipeline-release-5.6.1-8.el7 and casa-6.1.2-7-pipeline-2020.1.0.36 both use the same version of OpenMPI (1.10.4), as reported by:
/home/casa/packages/RHEL7/release/casa-pipeline-release-5.6.1-8.el7/lib/mpi/bin/mpirun -version
/home/casa/packages/RHEL7/release/casa-6.1.2-7-pipeline-2020.1.0.36/lib/mpi/bin/mpirun -version
- With casa-6, the resulting image is dependent on the cpuset used.
https://docs.google.com/spreadsheets/d/1aKCzeCOj1-50mC7I4fN2eMupPrfR6OH-6LN4jSK9LtQ/edit#gid=2101591390
https://docs.google.com/spreadsheets/d/1aKCzeCOj1-50mC7I4fN2eMupPrfR6OH-6LN4jSK9LtQ/edit#gid=93665106
- When using 8 cores and mpicasa -n 9, casa-6 always produces the same image regardless of the cpuset.
https://docs.google.com/spreadsheets/d/1aKCzeCOj1-50mC7I4fN2eMupPrfR6OH-6LN4jSK9LtQ/edit#gid=1339676938
- jobs jr-batch.9 and jr-nmpost005b.2 show that -n 9 gives the same result as -n $machinefile when ppn is 9.
- running a manual job with access to all the cores (no cpuset) and -n 9 produces the same result as jr-nmpost005.55 (all 8 even cores), though I only have a few data points.
- nmpost005, nmpost006, and nmpost072 produce the same images given the same input and using the same cpuset.
- the cores chosen by Torque don't seem to change for a given host, though I only have a few data points. If they did vary occasionally, that could explain the occasional differences I saw in my end-to-end runs.
https://docs.google.com/spreadsheets/d/1aKCzeCOj1-50mC7I4fN2eMupPrfR6OH-6LN4jSK9LtQ/edit#gid=1234076945
- Torque seems to choose different cpusets for different hosts. E.g. nmpost005 gets 0-1,3-5,7,9,11,13 while nmpost006 gets 0-2,4-6,8,10,12. This cpuset doesn't seem to change after a reboot or after smaller jobs have run on the host. I have no idea where Torque saves this cpuset between jobs, but it seems to be doing just that. This could produce different images if you are using something like ppn:8 and -n 8 or -n machinefile, and you might think the difference is host dependent when it may actually just be Torque choosing different cpusets for you.
- It seems that the specific cores chosen don't dictate the image created, but the number of cores on each socket does.
- It looks like the hardware doesn't really matter; the cpuset does.
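Since these conclusions point at the cpuset Torque hands out rather than the host, one cheap safeguard is to record the cores each job was given next to its image. The following is a minimal Python sketch, assuming it runs inside the job on Linux so that os.sched_getaffinity reflects the job's cpuset. The even/odd socket mapping is again an assumption that holds on the nodes tested here, and the function names are hypothetical.

```python
import os


def compress(cpus: list[int]) -> str:
    """Format sorted CPU ids in cpuset range syntax, e.g. [0, 1, 3, 4, 5] -> "0-1,3-5"."""
    runs: list[list[int]] = []
    for c in cpus:
        if runs and c == runs[-1][1] + 1:
            runs[-1][1] = c  # extend the current run of consecutive ids
        else:
            runs.append([c, c])  # start a new run
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in runs)


def describe(cpus: set[int]) -> str:
    """Summarize a cpuset as "<ranges> (<even>-<odd>)", assuming even/odd sockets."""
    ordered = sorted(cpus)
    even = sum(1 for c in ordered if c % 2 == 0)
    return f"{compress(ordered)} ({even}-{len(ordered) - even})"


if __name__ == "__main__":
    # Cores this process may run on; inside a Torque job this should
    # match the job's cpuset, which child processes inherit.
    print(f"{os.uname().nodename}: {describe(os.sched_getaffinity(0))}")
```

For the two cpusets quoted above this would log nmpost005 as 2-7 and nmpost006 as 7-2, which makes a changed cpuset visible in the job output.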
Questions
QUESTION: Does the number of threads per process (ps -T <pid>) change with different cpusets?
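One way to answer this without reading ps output by eye is to take the Threads: field from /proc/<pid>/status, which is the kernel's thread count for that process, and print it next to the cores the process is allowed to use. The following is a minimal Python sketch, Linux only. Passing the PIDs of the casa MPI ranks on the command line is an assumed workflow, and the function name is hypothetical.

```python
import os
import sys


def thread_count(pid: int) -> int:
    """Number of threads in process `pid`, read from /proc (Linux only).

    Should match the number of rows `ps -T` lists for that PID.
    """
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("Threads:"):
                return int(line.split()[1])
    raise RuntimeError(f"no Threads: field for pid {pid}")


if __name__ == "__main__":
    # Usage: python threads.py <pid> [<pid> ...]
    for arg in sys.argv[1:]:
        pid = int(arg)
        print(pid, thread_count(pid), sorted(os.sched_getaffinity(pid)))
```

Running it once per cpuset under test and comparing the counts would answer the question directly.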
QUESTION: Check whether nodescheduler gives a whole NUMA node. Also test whether nodescheduler + mpicasa -n 8 gives the same image as the good -n 8 images (i.e. 8-0, not 6-2 or 5-3).
ANSWER: Except for what I am guessing is a Torque bug on nmpost060, all the other nodes honored the numanode request in nodescheduler and gave me or other users 8 cores on the same socket.
- nmpost011/8-15: cpuset.cpus: 1,5,7,11,13,17,19,23 cpuset.mems: 1 (dual 12core sockets)
- nmpost013/0-7: cpuset.cpus: 0,4,6,10,12,16,18,22 cpuset.mems: 0 (dual 12core sockets)
- nmpost021/0-7: cpuset.cpus: 0,2,4,6,8,10,12,14 cpuset.mems: 0 (dual 16core sockets)
- nmpost033/0-7: cpuset.cpus: 0,2,4,6,8,10,12,14 cpuset.mems: 0 (dual 16core sockets)
- nmpost033/8-15: cpuset.cpus: 1,3,5,7,9,11,13,15 cpuset.mems: 1 (dual 16core sockets)
- nmpost036/0-7: cpuset.cpus: 0,2,6,8,10,12,16,18 cpuset.mems: 0 (dual 20core sockets)
- nmpost036/8-15: cpuset.cpus: 1,3,7,9,11,13,17,19 cpuset.mems: 1 (dual 20core sockets)
- nmpost060/0-7: cpuset.cpus: 0,2 cpuset.mems: 0 (dual 16core sockets). Why is this cpuset only 0,2? Torque shows L_Request = -L tasks=1:lprocs=8:memory=92gb:place=numanode, which looks like nodescheduler, but cpuset_string = nmpost060:0,2.
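The cpuset.cpus/cpuset.mems entries above can be checked mechanically. The following minimal Python sketch flags a nodescheduler cpuset whose core count is not 8 or whose cores are not all on the NUMA node named in cpuset.mems; the function name is hypothetical. It assumes plain comma-separated lists and a single node in cpuset.mems. It also assumes that node 0 holds the even-numbered cores and node 1 the odd-numbered ones, which matches every entry listed above.

```python
def check_numanode(cpus: str, mems: str, expected: int = 8) -> list[str]:
    """Return problems with a nodescheduler cpuset (empty list if it looks right).

    Assumptions: `cpus` is a plain comma-separated list, `mems` is a single
    node id, and node 0 holds the even-numbered cores, node 1 the odd ones.
    """
    cores = [int(c) for c in cpus.split(",") if c.strip()]
    node = int(mems)
    problems = []
    if len(cores) != expected:
        problems.append(f"{len(cores)} cores, expected {expected}")
    off_node = [c for c in cores if c % 2 != node]  # parity must match the node
    if off_node:
        problems.append(f"cores {off_node} are not on NUMA node {node}")
    return problems


if __name__ == "__main__":
    observed = {
        "nmpost011/8-15": ("1,5,7,11,13,17,19,23", "1"),
        "nmpost036/0-7": ("0,2,6,8,10,12,16,18", "0"),
        "nmpost060/0-7": ("0,2", "0"),
    }
    for name, (cpus, mems) in observed.items():
        print(name, check_numanode(cpus, mems) or "ok")
```

With these sample entries, only nmpost060 is flagged (2 cores instead of 8).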
QUESTION: Running jobs with nodescheduler
ANSWER: Using nodescheduler (which provides 8 cores) to reserve a node and then manually running casa with either -n 8 or -n 9 produces images that are pixel-identical to those you would get with a hand-crafted cpuset of 8 cores on the same socket and -n 8 or -n 9. In other words, if you have been using nodescheduler to reserve nodes, I don't think your casa images are suspect.
QUESTION: Test with 4-way parallelization whether the 4-0, 0-4, 2-2, 1-3, or 3-1 distribution impacts the resulting image using -n 4. Also try -n 5.
ANSWER:
- Using 4 cores and -n 4: 4-0 and 0-4 produce a different image than 2-2, 1-3, and 3-1.
- Using 4 cores and -n 5: all permutations tested (4-0, 0-4, 2-2, 1-3, 3-1) produce the same image.
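The permutations above can be generated mechanically. The following Python sketch builds a cpuset list for an X-Y distribution by taking the lowest-numbered cores on each socket. The notes don't record which specific cores were used in these tests, so that choice is an assumption; per the conclusions, the count per socket rather than the specific cores appears to matter. It also assumes the even/odd socket layout and treats the first number as the even-numbered socket. The function name is hypothetical.

```python
def cpuset_for(on_even: int, on_odd: int) -> str:
    """Build a cpuset list with `on_even` cores on the even-numbered socket
    and `on_odd` cores on the odd-numbered socket, lowest ids first.

    Assumption: even-numbered cores share one socket and odd-numbered
    cores the other, as on the dual-processor nodes tested here.
    """
    even = [2 * i for i in range(on_even)]
    odd = [2 * i + 1 for i in range(on_odd)]
    return ",".join(str(c) for c in sorted(even + odd))


if __name__ == "__main__":
    for a, b in [(4, 0), (0, 4), (2, 2), (1, 3), (3, 1)]:
        print(f"{a}-{b}: cpuset={cpuset_for(a, b)}")
```

With these choices, cpuset_for(8, 0) and cpuset_for(4, 4) reproduce the two example cpusets in the Definitions section.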
QUESTION: Test an ALMA data set; examine oussid.s12_0.2276_444_53712_sci.spw16_18_20_22.cont.I.iter1.image for comparison.
ANSWER: I can't use John Tobin's compare script, so I have had to use cmp on individual files.
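Running cmp by hand on each file inside two image directories can be scripted. The following is a minimal Python sketch using the standard filecmp module with shallow=False, which compares file contents byte by byte; the function name is hypothetical. Byte identity is stricter than pixel identity, since an image directory may also contain metadata (for example history) that could differ between otherwise identical runs.

```python
import filecmp
import os
import sys


def diff_trees(left: str, right: str) -> list[str]:
    """Byte-compare every file under two directories (e.g. two CASA images).

    Returns relative paths that differ or exist on only one side; an empty
    list means the trees are byte-identical, like cmp on each file.
    """
    def files(root: str) -> set[str]:
        found = set()
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                found.add(os.path.relpath(os.path.join(dirpath, name), root))
        return found

    filecmp.clear_cache()  # avoid stale results if images were rewritten in place
    lfiles, rfiles = files(left), files(right)
    problems = sorted(lfiles ^ rfiles)  # present on only one side
    for rel in sorted(lfiles & rfiles):
        # shallow=False compares contents, not just os.stat() signatures.
        if not filecmp.cmp(os.path.join(left, rel), os.path.join(right, rel), shallow=False):
            problems.append(rel)
    return problems


if __name__ == "__main__":
    if len(sys.argv) == 3:
        for rel in diff_trees(sys.argv[1], sys.argv[2]):
            print("differs:", rel)
    else:
        print("usage: python difftrees.py <image1> <image2>")
```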
QUESTION: Can I get different images using batch jobs as well as manual runs?