"on the fly" classification: profiling and benchmark

jhu-s edited this page Mar 8, 2024 · 13 revisions

Test data

Set 1

  • source: https://zenodo.org/records/3265189
  • original data: DES_Ia-0001_HEAD.FITS, DES_Ia-0001_PHOT.FITS
  • preprocessing: converted to a CSV file, keeping only the columns 'SNID', 'MJD', 'FLUXCAL', 'FLUXCALERR', 'FLT'
  • size: 57 MB (1295258 rows)
  • number of SN objects: 21558
  • avg. rows per object: 60.1

Set 2

  • source: new simulation data from the science team
  • processed data: DES_sims.csv
  • size: 6.8 MB (133925 rows)
  • number of SN objects: 1667
  • avg. rows per object: 80.3
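The per-object averages follow directly from the row and object counts listed for each set. A quick check of that arithmetic (the dictionary layout is only for illustration):

```python
# Row and object counts as listed for the two test sets.
datasets = {
    "Set 1": (1295258, 21558),
    "Set 2": (133925, 1667),
}

# Average rows per object, rounded to one decimal place as on this page.
avg_rows = {
    name: round(rows / objects, 1)
    for name, (rows, objects) in datasets.items()
}

for name, avg in avg_rows.items():
    print(f"{name}: {avg} rows per object")
```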

Test setup

Original code

Time benchmark

Data set 1:

  • total running time: 439 s

Data set 2:

  • total running time: 4.8 s
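This page does not record how the total running time was measured. One common way to take such a wall-clock measurement is sketched below; `time_run` and `classify_all` are hypothetical names, and `classify_all` is only a stand-in workload, not the project's classification code.

```python
import time


def time_run(func, *args, **kwargs):
    """Run func once and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


def classify_all(n):
    """Hypothetical placeholder for the real classification pipeline."""
    return sum(i * i for i in range(n))


result, elapsed = time_run(classify_all, 100_000)
print(f"total running time: {elapsed:.3f} s")
```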

Memory profile

Data set 1: [image: Screen Shot 2024-03-01 at 5.04.06 pm]

Data set 2: [image: Screen Shot 2024-03-01 at 5.05.26 pm]

Optimization 1: use groupby to get data batches
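The original batching code is not shown on this page. The sketch below assumes it selected each object's rows with a boolean mask on 'SNID', which scans the full table once per object, and contrasts that with a single pandas `groupby` pass. The function names and the synthetic table are illustrative only.

```python
import numpy as np
import pandas as pd


def batches_by_mask(df):
    """Assumed baseline: one full-table boolean scan per object."""
    for snid in df["SNID"].unique():
        yield snid, df[df["SNID"] == snid]


def batches_by_groupby(df):
    """Single pass: pandas partitions the rows once, then yields each group."""
    for snid, batch in df.groupby("SNID", sort=False):
        yield snid, batch


# Small synthetic table with the same columns as the test CSV files.
rng = np.random.default_rng(0)
n_objects, rows_per_object = 50, 60
n = n_objects * rows_per_object
df = pd.DataFrame({
    "SNID": np.repeat(np.arange(n_objects), rows_per_object),
    "MJD": rng.uniform(56000.0, 57000.0, size=n),
    "FLUXCAL": rng.normal(0.0, 10.0, size=n),
    "FLUXCALERR": rng.uniform(1.0, 5.0, size=n),
    "FLT": rng.choice(list("griz"), size=n),
})

mask_batches = dict(batches_by_mask(df))
groupby_batches = dict(batches_by_groupby(df))

# Both approaches should produce identical batches for every object.
same = all(mask_batches[k].equals(groupby_batches[k]) for k in mask_batches)
print(f"{len(groupby_batches)} batches, identical: {same}")
```

Under this assumption the mask approach does work proportional to (number of objects) × (number of rows), while `groupby` touches each row a roughly constant number of times. That would be consistent with the much larger gain on data set 1 than on data set 2.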

Time benchmark

Data set 1

Total running time: 15.5 s

Data set 2

Total running time: 2.5 s

Memory profile

Data set 1

[image: Screen Shot 2024-03-04 at 12.52.01 pm]

Data set 2

[image: Screen Shot 2024-03-04 at 1.40.56 pm]

Optimization 2: move ordered feature selection into the function format_data

Time benchmark

Data set 1

Total running time: 11.6 s

Data set 2

Total running time: 1.1 s
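The running times reported on this page can be turned into speedup factors relative to the original code. The short calculation below uses only the numbers listed above; the dictionary layout is just for illustration.

```python
# Total running times in seconds, as reported on this page.
timings = {
    "Data set 1": {"Original code": 439.0, "Optimization 1": 15.5, "Optimization 2": 11.6},
    "Data set 2": {"Original code": 4.8, "Optimization 1": 2.5, "Optimization 2": 1.1},
}

# Speedup of each stage relative to the original code on the same data set.
speedups = {
    dataset: {
        stage: stages["Original code"] / seconds
        for stage, seconds in stages.items()
    }
    for dataset, stages in timings.items()
}

for dataset, stages in speedups.items():
    for stage, factor in stages.items():
        print(f"{dataset}, {stage}: {factor:.1f}x")
```

On data set 1 this works out to roughly 28x after optimization 1 and 38x after optimization 2. On the smaller data set 2 the factors are about 1.9x and 4.4x.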