review, simplify, and finalize fiberassign output format #271

Open
sbailey opened this issue Sep 21, 2020 · 3 comments
Comments

@sbailey
Contributor

sbailey commented Sep 21, 2020

Related to dangling PR #254: review the fiberassign output formats, simplify/trim them, and finalize them as something we are happy to use for years to come. A non-exhaustive list of items to check:

  • The lightweight fba-*.fits format was designed to minimize I/O and simplify debugging and studies with mocks, but it lacks the full information needed for actual observations, which is merged post facto into the full fiberassign-*.fits format. The HDUs from the fba files are propagated forward so that any QA scripts written for the fba files also work on the fiberassign files, but this results in replicated information:
    • FAVAIL vs. POTENTIAL_ASSIGNMENTS
    • FASSIGN vs. FIBERASSIGN
  • How much targeting information should be kept for targets that were assigned?
    • and should this be kept in the FIBERASSIGN HDU or a separate TARGETS HDU?
  • How much targeting information should be kept for targets that were reachable but not assigned?
    • the previous default was everything, including for reachable SKY targets, but that produced an HDU with 80% default values, because most columns for Tractor targets do not apply to the much larger number of SKY targets
  • Do we need to maintain backwards compatibility with tiles that were already observed and have a non-ideal format?
  • Find and review emails on the desi-data list (and possibly elsewhere) before re-inventing the wheel again

Finalize this before restarting observations in Fall 2020.

@tskisner
Member

tskisner commented Nov 6, 2020

Just had an ad hoc conversation with @forero and @geordie666 on a cancelled Zoom call... Testing at KPNO on a couple hundred tiles showed:

  1. Out of a 1.5-hour run time, one hour was spent on the merging step.
  2. Merging all the input columns ran out of memory.

Here is a proposition: can fiberassign just write out the minimal file needed for ICS and then we can do the merging later at NERSC as a convenience step? We could add some extra header keys with checksums of the input target files, to ensure that post-facto merging is using the same target files. I think the minimal file format would include:

  • The main fiberassign HDU with only assigned targets and minimal columns needed by ICS
  • The potential / available targets HDU with the current columns plus one additional bitfield to be used with #182 (Improvements to event logging / reconstruction)
  • Any sky / gfa / other HDUs needed

The result would be that the merged files would still be needed for fancy plots or any QA that needed additional properties of available targets, but actually running the assignment in operations would be fast and light.
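
A minimal sketch of the checksum idea above, assuming SHA-256 digests and plain key/value pairs standing in for FITS header cards. The keyword names `TGTnnn` / `TGTSnnn` and all function names are hypothetical, not anything fiberassign currently writes:

```python
import hashlib
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks to bound memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_header_keys(target_files):
    """Build header-style key/value pairs recording each input file and its digest.

    TGTnnn holds the file name and TGTSnnn its digest; both keyword
    patterns are hypothetical placeholders.
    """
    header = {}
    for index, path in enumerate(target_files):
        header[f"TGT{index:03d}"] = Path(path).name
        header[f"TGTS{index:03d}"] = file_sha256(path)
    return header


def verify_inputs(header, target_files):
    """Return the names of files whose current digest differs from the recorded one."""
    mismatched = []
    for index, path in enumerate(target_files):
        if header.get(f"TGTS{index:03d}") != file_sha256(path):
            mismatched.append(Path(path).name)
    return mismatched
```

At assignment time the output of `checksum_header_keys` would go into the minimal file's header; the later merging step at NERSC could call `verify_inputs` and refuse to merge if the returned list is non-empty.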

@forero
Member

forero commented Nov 17, 2020

From [desi-data 5128]:

Suggested fiberassign file columns (minimally needed for ops, plus a few more for future developments, keeping more columns for the assigned targets than the potential targets):

In the FIBERASSIGN HDU (5000 targets that were actually assigned):

FIBER TARGETID LOCATION FIBERSTATUS LAMBDA_REF PETAL_LOC TARGET_RA TARGET_DEC FA_TARGET FA_TYPE FIBERASSIGN_X FIBERASSIGN_Y DEVICE_LOC
OBJTYPE CMX_TARGET DESI_TARGET FLUX_G FLUX_R FLUX_Z PHOTSYS
BGS_TARGET, MWS_TARGET, SCND_TARGET
FIBERTOTFLUX_G, FIBERTOTFLUX_R, FIBERTOTFLUX_Z
MORPHTYPE
SERSIC, SHAPE_R, SHAPE_E1, SHAPE_E2
PARALLAX, PMRA, PMDEC, REF_EPOCH
EBV
NUMTARGET
PRIORITY, SUBPRIORITY, OBSCONDITIONS, NUMOBS_MORE
PRIORITY_INIT, NUMOBS_INIT
FLUX_IVAR_G, FLUX_IVAR_R, FLUX_IVAR_Z
FIBERFLUX_G, FIBERFLUX_R, FIBERFLUX_Z
FLUX_W1, FLUX_W2
REF_ID, REF_CAT
GAIA_PHOT_G_MEAN_MAG, GAIA_PHOT_BP_MEAN_MAG, GAIA_PHOT_RP_MEAN_MAG
TIMESTAMP, VERSION, TARGET_STATE (from ledger)

In the TARGETS HDU (anything that was covered by a fiber, even if it wasn't assigned) — a much smaller set of columns because there are so many rows:

TARGETID DESI_TARGET CMX_TARGET SV1_TARGET RA DEC FA_TARGET FA_TYPE PRIORITY SUBPRIORITY OBSCONDITIONS

Note that the *_TARGET bit columns are the only ones that are not currently in the minimalist fba_run output, and when we reach the main survey we could drop CMX_TARGET and SV1_TARGET.
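
The TARGETS column selection above could be sketched as follows. This is only an illustration: rows are plain dicts so the example runs without a FITS library, and the function names and the `main_survey` flag are assumptions, not part of fiberassign.

```python
# Column list copied from the suggested TARGETS HDU above.
TARGETS_COLUMNS = [
    "TARGETID", "DESI_TARGET", "CMX_TARGET", "SV1_TARGET", "RA", "DEC",
    "FA_TARGET", "FA_TYPE", "PRIORITY", "SUBPRIORITY", "OBSCONDITIONS",
]

# Columns that could be dropped once the main survey starts.
PRE_MAIN_SURVEY_ONLY = {"CMX_TARGET", "SV1_TARGET"}


def targets_columns(main_survey=False):
    """Return the TARGETS column names, optionally without CMX/SV1 bit columns."""
    if main_survey:
        return [name for name in TARGETS_COLUMNS if name not in PRE_MAIN_SURVEY_ONLY]
    return list(TARGETS_COLUMNS)


def prune_rows(rows, keep):
    """Keep only the listed columns, in list order; skip columns a row lacks."""
    return [{name: row[name] for name in keep if name in row} for row in rows]


if __name__ == "__main__":
    # Hypothetical input row carrying one column (FLUX_G) that TARGETS does not keep.
    rows = [{"TARGETID": 1, "RA": 10.0, "DEC": -5.0, "FLUX_G": 3.2, "CMX_TARGET": 0}]
    print(prune_rows(rows, targets_columns(main_survey=True)))
```

Because `prune_rows` skips missing columns rather than filling defaults, the pruned rows never grow columns that do not apply to a given target class.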

@tskisner
Member

I think this has been resolved now that the target columns were pruned in the output? If so, we should close this.
