...
mkdir -p /dev/shm/mpicasa
export OMPI_MCA_orte_tmpdir_base=/dev/shm/mpicasa
CFCache Creation
There is a known CASA bug where setting cfcache='' makes mpicasa run for days or longer (I never let it finish naturally). Perhaps it is trying to create a cfcache but is deadlocked with itself.
...
causes one part of CASA to create a cfcache with a name like imagename_base.cf, and another part of CASA to look for the cfcache as cfcache.cf. Or something like that. Anyway, never set cfcache=''. Either set it to an existing cfcache or to a directory that doesn't exist yet, like cfcache="cachedir".
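To make that rule harder to forget, here is a minimal shell sketch of a pre-flight check you could run before launching mpicasa. The function name check_cfcache is hypothetical (it is not part of CASA), and it assumes the cfcache is a directory on the local filesystem. It rejects an empty value, accepts an existing directory (reuse a cfcache), and accepts a path that does not exist yet (let CASA create it).

```shell
# Hypothetical helper, not part of CASA: validate a cfcache value before
# handing it to tclean under mpicasa.
check_cfcache() {
    cf="$1"
    # An empty cfcache is the case that hangs mpicasa, so refuse it.
    if [ -z "$cf" ]; then
        echo "error: cfcache must not be empty" >&2
        return 1
    fi
    if [ -d "$cf" ]; then
        # Existing directory: assume it is a cfcache to reuse.
        echo "reuse existing cfcache: $cf"
    elif [ -e "$cf" ]; then
        # Something exists at that path but it is not a directory.
        echo "error: $cf exists but is not a directory" >&2
        return 1
    else
        # Nothing there yet: CASA should create the cfcache here.
        echo "new cfcache will be created: $cf"
    fi
    return 0
}
```

For example, `check_cfcache "cachedir" || exit 1` at the top of a job script stops the job before mpicasa starts if the value is empty or points at a regular file.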
I don't know why this only seems to be a problem with mpicasa and not serial CASA. Perhaps it triggers a race condition between the parallel processes.
Questions
- Why doesn't the serial job fail with matplotlib errors because of a missing .matplotlib directory, like the parallel job does? Does serial CASA not start matplotlib, or does it perhaps use a different version of matplotlib?
- Can I run my parallel DAGs at CHTC?
...