AUDI jobs operating on the same dataset will produce some data products with identical file names, so we need to decide how to handle this situation.

Continuum/Cube products:

We should avoid deciding whether a product is a duplicate from its filename, because AUDI products can have the same file name but differ in angular resolution, pipeline version, spectral coverage, etc. Instead, we should compare metadata: the metadata of the new product should be compared to the metadata of all products already in the archive.

The header keywords below should be checked for identical values; if all are equal, the product is the same. The values shown are examples only and will not match the values in every dataset. If the new product does not contain the same set of header fields as a product already in the archive (or vice versa), it should be regarded as a new product. If the metadata are identical, the products are identical and the new one should not be ingested.

NAXIS   =                    4
NAXIS1  =                 1728
NAXIS2  =                 1728
NAXIS3  =                    1
NAXIS4  =                    1
BMAJ    =   3.401172359470E-05
BMIN    =   2.806701517786E-05
BPA     =  -4.208995788542E+01
BTYPE   = 'Intensity'
OBJECT  = 'GW_Ori  '
BUNIT   = 'Jy/beam '           /Brightness (pixel) unit
CTYPE1  = 'RA---SIN'
CRVAL1  =   8.228496247106E+01
CDELT1  =  -7.222222217153E-06
CRPIX1  =   8.650000000000E+02
CUNIT1  = 'deg     '
CTYPE2  = 'DEC--SIN'
CRVAL2  =   1.187020728400E+01
CDELT2  =   7.222222217153E-06
CRPIX2  =   8.650000000000E+02
CUNIT2  = 'deg     '
CTYPE3  = 'FREQ    '
CRVAL3  =   2.248650462272E+11
CDELT3  =   1.810101696316E+10
CRPIX3  =   1.000000000000E+00
CTYPE4  = 'STOKES  '
CRVAL4  =   1.000000000000E+00
CDELT4  =   1.000000000000E+00
CRPIX4  =   1.000000000000E+00
CUNIT4  = '        '
RESTFRQ =   2.305380000000E+11
SPECSYS = 'LSRK'
TELESCOP= 'ALMA    '
DATE-OBS= '2017-12-10T04:53:09.359999'
PIPEVER = '43130 (Pipeline-CASA56-P2-B)'
CASAVER = '5.6.2-3 '
SPECMODE= 'cube    '
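
As an illustration of the identity check described above, here is a minimal Python sketch. It assumes each header has already been read into a plain dict (for example from a FITS header), and the names IDENTITY_KEYS, is_duplicate and find_duplicates are hypothetical, not part of any existing archive or pipeline code. String values are stripped before comparison because FITS pads them with trailing spaces; that normalisation is also an assumption of this sketch.

```python
# Keywords taken from the list above. Using exactly this set for the
# identity check is an assumption of this sketch.
IDENTITY_KEYS = [
    "NAXIS", "NAXIS1", "NAXIS2", "NAXIS3", "NAXIS4",
    "BMAJ", "BMIN", "BPA", "BTYPE", "OBJECT", "BUNIT",
    "CTYPE1", "CRVAL1", "CDELT1", "CRPIX1", "CUNIT1",
    "CTYPE2", "CRVAL2", "CDELT2", "CRPIX2", "CUNIT2",
    "CTYPE3", "CRVAL3", "CDELT3", "CRPIX3",
    "CTYPE4", "CRVAL4", "CDELT4", "CRPIX4", "CUNIT4",
    "RESTFRQ", "SPECSYS", "TELESCOP", "DATE-OBS",
    "PIPEVER", "CASAVER", "SPECMODE",
]


def _norm(value):
    """Strip padding from string values; leave numbers untouched."""
    return value.strip() if isinstance(value, str) else value


def is_duplicate(new_header, archived_header, keys=IDENTITY_KEYS):
    """Return True only if every checked keyword matches exactly."""
    for key in keys:
        in_new = key in new_header
        in_old = key in archived_header
        if in_new != in_old:
            # One header has a field the other lacks: treat as a new product.
            return False
        if in_new and _norm(new_header[key]) != _norm(archived_header[key]):
            return False
    return True


def find_duplicates(new_header, archive_headers, keys=IDENTITY_KEYS):
    """Return the IDs of archived products whose metadata match the new one.

    archive_headers maps a (hypothetical) product ID to its header dict.
    """
    return [pid for pid, hdr in archive_headers.items()
            if is_duplicate(new_header, hdr, keys)]
```

Under this sketch a product would be skipped at ingest when find_duplicates returns a non-empty list, and ingested otherwise.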

New version of same product?

We can also attempt to define criteria under which a product counts as a new version of the same product. This would let us keep products whose metadata change by very small amounts due to heuristic adjustments within the imaging pipelines, but which are not fundamentally different.

Fields that should be identical:

NAXIS   =                    4

SPECMODE= 'cube    '

TELESCOP= 'ALMA    '

DATE-OBS= '2017-12-10T04:53:09.359999'

BTYPE   = 'Intensity'

OBJECT  = 'GW_Ori  '

BUNIT   = 'Jy/beam '           /Brightness (pixel) unit

SPECSYS = 'LSRK'

CTYPE1  = 'RA---SIN'

CUNIT1  = 'deg     '

CTYPE2  = 'DEC--SIN'

CUNIT2  = 'deg     '

CTYPE3  = 'FREQ    '

CTYPE4  = 'STOKES  '

CUNIT4  = '        '

RESTFRQ =   2.305380000000E+11

Fields that should agree to within 0.25 arcsec:

CRVAL1  =   8.228496247106E+01

CRVAL2  =   1.187020728400E+01

Fields that should agree to within 0.5%:

CRVAL3  =   2.248650462272E+11

CDELT3  =   1.810101696316E+10

CRPIX3  =   1.000000000000E+00

NAXIS3  =                    1

Fields that should agree to within 5%:

BMAJ    =   3.401172359470E-05

BMIN    =   2.806701517786E-05

BPA     =  -4.208995788542E+01

NAXIS1  =                 1728

NAXIS2  =                 1728

NAXIS4  =                    1

CDELT1  =  -7.222222217153E-06

CRPIX1  =   8.650000000000E+02

CDELT2  =   7.222222217153E-06

CRPIX2  =   8.650000000000E+02

May or may not be the same:

PIPEVER = '43130 (Pipeline-CASA56-P2-B)'

CASAVER = '5.6.2-3 '
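
The tolerance groups above could be applied as in the following Python sketch. It again assumes headers held as plain dicts; is_new_version and the *_KEYS lists are hypothetical names. Two simplifications are assumptions of the sketch rather than part of the criteria: the 0.25 arcsec limit is applied to the raw CRVAL1 and CRVAL2 differences (no cos(Dec) factor, no handling of RA wrap at 0/360 degrees), and the percentage limits are taken relative to the larger of the two values. PIPEVER and CASAVER are deliberately not compared. An exact duplicate also satisfies these criteria, so a caller would need to test for identity first.

```python
import math

IDENTICAL_KEYS = ["NAXIS", "SPECMODE", "TELESCOP", "DATE-OBS", "BTYPE",
                  "OBJECT", "BUNIT", "SPECSYS", "CTYPE1", "CUNIT1",
                  "CTYPE2", "CUNIT2", "CTYPE3", "CTYPE4", "CUNIT4",
                  "RESTFRQ"]
POSITION_KEYS = ["CRVAL1", "CRVAL2"]                   # within 0.25 arcsec
SPECTRAL_KEYS = ["CRVAL3", "CDELT3", "CRPIX3", "NAXIS3"]  # within 0.5%
LOOSE_KEYS = ["BMAJ", "BMIN", "BPA", "NAXIS1", "NAXIS2", "NAXIS4",
              "CDELT1", "CRPIX1", "CDELT2", "CRPIX2"]  # within 5%

# CRVAL1/CRVAL2 are in degrees (CUNIT = 'deg'), so convert the limit.
POSITION_TOL_DEG = 0.25 / 3600.0


def _norm(value):
    """Strip padding from string values; leave numbers untouched."""
    return value.strip() if isinstance(value, str) else value


def is_new_version(new, old):
    """Return True if `new` meets the 'new version of same product' criteria."""
    try:
        if any(_norm(new[k]) != _norm(old[k]) for k in IDENTICAL_KEYS):
            return False
        if any(abs(new[k] - old[k]) > POSITION_TOL_DEG for k in POSITION_KEYS):
            return False
        if not all(math.isclose(new[k], old[k], rel_tol=0.005)
                   for k in SPECTRAL_KEYS):
            return False
        if not all(math.isclose(new[k], old[k], rel_tol=0.05)
                   for k in LOOSE_KEYS):
            return False
    except KeyError:
        # A keyword missing from either header: not treated as a new version.
        return False
    return True
```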



8 Comments

  1. Thanks John, looks good. The only fields where the 5% might not work well are the CRVAL keywords (RA, Dec and frequency refs); a 5% change to one of those would be quite a different cube. I can't think of a way that the RA and Dec could be changed by the user, but I guess the ref. frequency could be (for example, the case of an extragalactic cube where no refreq is set), so perhaps that should be tighter.

  2. My thought was that the RA and Dec could change slightly with different mosaicing algorithms or heuristic changes that shift the centering of a mosaic. I changed the criteria to be in arcseconds rather than a percentage, which I agree in retrospect doesn't make sense for RA/Dec.

  3. We just had another set of images requested with one target and multiple spw, which seems quite a common use case for reimaging. For this use case we just need to allow ingestion of multiple continuum images; it seems to me it would be harmless to ingest the new images and deprecate the old ones (or delete them altogether). My astrocloud is getting full with hand-staged images, so can we put in a ticket for this?

  4. So it looks like we need to come up with something for the July 2nd 2020 planning meeting. My impression is that discussions are converging on two lines of action:

    1) We ask SSA to use the guidelines above (and I would include the CASA/Pipeline version in determination of uniqueness) to determine whether a product is or is not duplicating a prior product. If it is, do not ingest that product into the archive (and email a warning to the email address given in the audiPass command argument), but ingest any other image products and associated ancillary data in the same job that are not duplicates.

    2) We ask SSA to give each product a unique ID in the archive that is not the filename. This will enable us to keep multiple versions of a given product (for example, 2 continuum images made with different CASA versions) without having to change filenames in the archive or on ingest.

  5. I think 1) would be a shorter term solution, but 2) would be more comprehensive. I will file a ticket and put the DMS discuss label on it for next week's meeting.

  6. I've added the SRDP-Priority label to John's ticket. Mark, I think that the same product but with different CASA versions is a version of the product, not a unique one.



  7. SSA-6476

  8. Now that we also have the capability to change beam size, we are seeing attempts to make multiple images with the same MOUS/source/spw but different beam sizes, so we will need to decide how close the beam size/tapering/robust can be before we call it a duplicate. I would argue for something quite tight like 5-10% in beam area: that should be enough that reruns of the default beam size with different versions of CASA would not result in new products, but a significant change in robust would result in a new product. Though actually I see that BMAJ and BMIN are included in the description above, so maybe we have this covered.
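
The beam-area criterion suggested in the last comment can be sketched as follows. This is only an illustration: it assumes a Gaussian beam, whose area is pi * BMAJ * BMIN / (4 ln 2), and the function names and the default 5% threshold are hypothetical choices, not an agreed policy.

```python
import math


def beam_area(bmaj, bmin):
    """Area of a Gaussian beam, in the square of the input unit."""
    return math.pi * bmaj * bmin / (4.0 * math.log(2.0))


def beam_area_fractional_change(old, new):
    """Fractional change in beam area between two header dicts."""
    a_old = beam_area(old["BMAJ"], old["BMIN"])
    a_new = beam_area(new["BMAJ"], new["BMIN"])
    return abs(a_new - a_old) / a_old


def is_same_beam(old, new, tolerance=0.05):
    """True if the beam areas differ by no more than `tolerance`."""
    return beam_area_fractional_change(old, new) <= tolerance
```

Note that a 5% limit applied to BMAJ and BMIN separately allows the beam area to change by up to about 10% (1.05 x 1.05 = 1.1025), so the per-keyword criterion is roughly comparable to the loose end of the 5-10% range in beam area.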