
SPR1-3159: Estimate size of MS produced by mvftoms #382

Open
ludwigschwardt opened this issue Nov 7, 2024 · 7 comments

@ludwigschwardt
Contributor

ludwigschwardt commented Nov 7, 2024

This is a dump of SPR1-3159 to facilitate external collaboration...

The calculation used in the archive web interface seems to overestimate the MS size by about 25%, which makes users worry that some of their data is missing:

[Screenshot 2024-10-23 at 16:20:07: archive web interface showing the estimated MS size]

This originates from katsdparchive in @ctgschollar's domain.

The last person who worked on this was Kgomotso (2 years ago) but he is not around anymore. It’s been in production for the past year at least.

@ludwigschwardt
Contributor Author

ludwigschwardt commented Nov 7, 2024

Let’s work through an example.

  • Pick a small dataset (1730279709, MVF4 size after mvf_download = 822 MB).
  • Run
mvftoms.py mvf4/1730279709/1730279709_sdp_l0.full.rdb -f --flags=cam,data_lost,ingest_rfi -o ms/test
  • Look at the directory created:
ls -la ms/test/

total 787732
drwxrwxr-x 15 kat kat       700 Oct 30 10:20 ./
drwxrwxr-x  3 kat kat        60 Oct 30 10:20 ../
drwxrwxr-x  2 kat kat       120 Oct 30 10:20 ANTENNA/
drwxrwxr-x  2 kat kat       120 Oct 30 10:20 DATA_DESCRIPTION/
drwxrwxr-x  2 kat kat       140 Oct 30 10:20 FEED/
drwxrwxr-x  2 kat kat       140 Oct 30 10:20 FIELD/
drwxrwxr-x  2 kat kat       120 Oct 30 10:20 FLAG_CMD/
drwxrwxr-x  2 kat kat       120 Oct 30 10:20 HISTORY/
drwxrwxr-x  2 kat kat       120 Oct 30 10:20 OBSERVATION/
drwxrwxr-x  2 kat kat       160 Oct 30 10:20 POINTING/
drwxrwxr-x  2 kat kat       140 Oct 30 10:20 POLARIZATION/
drwxrwxr-x  2 kat kat       120 Oct 30 10:20 PROCESSOR/
drwxrwxr-x  2 kat kat       140 Oct 30 10:20 SOURCE/
drwxrwxr-x  2 kat kat       140 Oct 30 10:20 SPECTRAL_WINDOW/
drwxrwxr-x  2 kat kat       120 Oct 30 10:20 STATE/
-rw-rw-r--  1 kat kat      8125 Oct 30 10:20 table.dat
-rw-rw-r--  1 kat kat       271 Oct 30 12:23 table.f0
-rw-rw-r--  1 kat kat   3145728 Oct 30 12:23 table.f0_TSM0
-rw-rw-r--  1 kat kat       284 Oct 30 12:23 table.f1
-rw-rw-r--  1 kat kat   6291456 Oct 30 12:23 table.f1_TSM0
-rw-rw-r--  1 kat kat       304 Oct 30 12:23 table.f2
-rw-rw-r--  1 kat kat   6291456 Oct 30 12:23 table.f2_TSM0
-rw-rw-r--  1 kat kat       274 Oct 30 12:23 table.f3
-rw-rw-r--  1 kat kat   2097152 Oct 30 12:23 table.f3_TSM0
-rw-rw-r--  1 kat kat       273 Oct 30 12:23 table.f4
-rw-rw-r--  1 kat kat   2097152 Oct 30 12:23 table.f4_TSM0
-rw-rw-r--  1 kat kat    231932 Oct 30 12:23 table.f5
-rw-rw-r--  1 kat kat       284 Oct 30 12:23 table.f6
-rw-rw-r--  1 kat kat 392167424 Oct 30 12:23 table.f6_TSM0
-rw-rw-r--  1 kat kat       294 Oct 30 12:23 table.f7
-rw-rw-r--  1 kat kat 197132288 Oct 30 12:23 table.f7_TSM0
-rw-rw-r--  1 kat kat       293 Oct 30 12:23 table.f8
-rw-rw-r--  1 kat kat 197132288 Oct 30 12:23 table.f8_TSM0
-rw-rw-r--  1 kat kat       104 Oct 30 10:20 table.info
-rw-rw-r--  1 kat kat       357 Oct 30 12:23 table.lock

du -sb ms/test/
806914372	ms/test/

Our basic parameters are:

  • 22 dumps selected (scan 1, 15 dumps + scan 4, 7 dumps)
  • 4096 channels
  • 544 correlation products (136 baselines after dividing by the 4 polarisation terms)

This results in 22 dumps * 544 corrprods / 4 pols = 22 * 136 = 2992 rows in the MS ✅
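The row count can be cross-checked with a quick calculation (a sketch using the numbers from this dataset):

```python
# Numbers from the small dataset 1730279709 (see the bullets above)
n_dumps = 22       # scan 1 (15 dumps) + scan 4 (7 dumps)
n_corrprods = 544  # correlation products
n_pols = 4         # polarisation terms per baseline

n_baselines = n_corrprods // n_pols  # 136
n_rows = n_dumps * n_baselines       # rows in the main MS table
print(n_rows)  # → 2992
```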

The basic block size / tile on disk seems to be 2 ** 21 = 2097152 bytes. All table.fX_TSM0 file sizes are multiples of this (except the index table.f0_TSM0). In particular, it explains why WEIGHT_SPECTRUM is slightly larger than its raw payload even though that payload is an exact multiple of 2 ** 20 bytes.

The table.fX_TSM0 files have the following associations (run strings on the corresponding table.fX file to see the column name):

  • table.f0: 3145728 = 1.5 tiles → some index
  • table.f1: 6291456 = 3 tiles → FLAG
  • table.f2: 6291456 = 3 tiles → FLAG_CATEGORY
  • table.f3: 2097152 = 1 tile → WEIGHT
  • table.f4: 2097152 = 1 tile → SIGMA
  • table.f6: 392167424 = 187 tiles → DATA
  • table.f7: 197132288 = 94 tiles → WEIGHT_SPECTRUM
  • table.f8: 197132288 = 94 tiles → SIGMA_SPECTRUM

@ludwigschwardt
Contributor Author

ludwigschwardt commented Nov 7, 2024

This suggests the following formula for the main payload in the MS:

import math

# The main table.fX_TSM0 files have sizes that are multiples of these
# block sizes ("bucket" sizes in CASA Tiled Storage Manager speak?).
# XXX This is empirically determined so far; maybe cross-check with code.
# See the output of `casacore.tables.table.getdminfo` and look for `BucketSize`.
BIG_BLOCK = 2 ** 21
SMALL_BLOCK = 2 ** 18


def col_size(n_cells, bits, block_size=BIG_BLOCK):
    """Round `n_cells` of `bits` bits up to next block size."""
    return math.ceil(n_cells * bits / 8 / block_size) * block_size


def estimate_ms_size(n_dumps, n_chans, n_corrprods, n_pols=4):
    """Estimate MS size from basic MVF4 parameters."""
    n_baselines = n_corrprods // n_pols
    n_rows = n_dumps * n_baselines
    n_cells = n_rows * n_pols
    n_cells_per_spectrum = n_cells * n_chans
    # Start with table.f0_TSM0 (not sure what that is, an index?)
    size = 12 * SMALL_BLOCK
    # DATA: complex64
    size += col_size(n_cells_per_spectrum, bits=64)
    # WEIGHT_SPECTRUM: float32
    size += col_size(n_cells_per_spectrum, bits=32)
    # SIGMA_SPECTRUM: float32
    size += col_size(n_cells_per_spectrum, bits=32)
    # FLAG: bit
    size += col_size(n_cells_per_spectrum, bits=1, block_size=SMALL_BLOCK)
    # FLAG_CATEGORY: bit
    size += col_size(n_cells_per_spectrum, bits=1, block_size=SMALL_BLOCK)
    # WEIGHT: float32
    size += col_size(n_cells, bits=32)
    # SIGMA: float32
    size += col_size(n_cells, bits=32)
    return size

Try it out on the small dataset:

In [10]: estimate_ms_size(22, 4096, 544)
Out[10]: 806354944

In [11]: !du -sb ms/test/
806914372	ms/test/

In [17]: 806354944 / 806914372
Out[17]: 0.9993067071062157

This now underestimates the size by 0.07%. Much better!

@ludwigschwardt
Contributor Author

ludwigschwardt commented Nov 7, 2024

Test this idea on the dataset in the original query: 1703007682.

  • d.shape = (3545, 32768, 8064) = 7.494 TB
  • Select scans="track" → 109 scans, 3319 dumps
  • Average channels by factor of 8 → 4096 channels afterwards
  • Run estimate_ms_size(3319, 4096, 8064) → 1.782 TB

Looks promising!

Out of interest, the corresponding MVF4 size is 3319 * 4096 * 8064 * 10 / 1e12 = 1.096 TB. MS is not that efficient, probably because it stores both SIGMA_SPECTRUM and the redundant WEIGHT_SPECTRUM. Without the latter the MS size could have been 1.343 TB.

The basic number of bytes per element differs like this:

  • MKAT: complex64 vis + byte weights + byte flags = 8 + 1 + 1 = 10 bytes
  • MS: complex64 vis + float32 weights + float32 sigma + 2/8 flags = 8 + 4 + 4 + 0.25 = 16.25 bytes
  • more efficient MS: complex64 vis + float32 sigma + 2/8 flags = 8 + 4 + 0.25 = 12.25 bytes
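The per-element byte counts translate into the quoted totals like this (a quick cross-check; the raw 16.25-byte figure gives 1.781 TB, while the estimate_ms_size result of 1.782 TB also includes tile padding):

```python
# dumps * channels * corrprods for dataset 1703007682 after selection
n_elements = 3319 * 4096 * 8064

bytes_per_element = {
    "MVF4": 8 + 1 + 1,             # complex64 vis + byte weights + byte flags
    "MS": 8 + 4 + 4 + 0.25,        # vis + weights + sigma + 2 bit-flag columns
    "efficient MS": 8 + 4 + 0.25,  # drop the redundant WEIGHT_SPECTRUM
}
for fmt, nbytes in bytes_per_element.items():
    print(f"{fmt}: {n_elements * nbytes / 1e12:.3f} TB")
```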

@ludwigschwardt
Contributor Author

ludwigschwardt commented Nov 7, 2024

The advantage of building this estimate into mvftoms.py as opposed to katsdparchive is that mvftoms.py knows exactly how many dumps, channels and corrprods it will produce based on the selections, unlike the archive software that will need to estimate it.

My suggestion is an estimate_ms_size utility function called from mvftoms.py so that it incorporates the effects of all the script options. We can add an option like --estimate-size to print out the estimated number of bytes. During normal use the script can also print out the estimate, and then determine the size afterwards and report the discrepancy. This will help us a lot with fine-tuning, effectively making every dataset a test case.
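The post-run comparison could look something like this (a sketch only; dir_size_bytes is a hypothetical helper approximating du -sb file totals, and the hook into mvftoms.py is merely the suggestion above, not existing code):

```python
import os


def dir_size_bytes(path):
    """Sum apparent file sizes under `path`, roughly like `du -sb`
    (du also counts directory entries, so it reports slightly more)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


# Sketch of the reporting step at the end of mvftoms.py:
# estimate = estimate_ms_size(n_dumps, n_chans, n_corrprods)
# actual = dir_size_bytes(ms_path)
# print(f"Estimated {estimate} bytes, wrote {actual} bytes "
#       f"({100 * (estimate - actual) / actual:+.2f}% discrepancy)")
```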

@ZachSARAO

ZachSARAO commented Nov 8, 2024

Thanks @ludwigschwardt. I'm not following the logic above entirely. But I hazily understand that the size on disk is less than estimated by the archive website.

I downloaded the .rdb file locally (instead of working from the remote at archive-gw-1.kat.ac.za) and I see that the 3.1 MB input was expanded to a 755 MB directory. What additional data is added / where does it come from?

The command above took several minutes to run on my local machine, which is too long to do on the fly depending on a user's flag choices.

Would it be possible to speed this up a lot (to a few seconds)? I.e. if the data is coming from somewhere else, mount it locally, avoid any file writes, plus any other optimizations.

Or otherwise... is it possible to improve our estimations in a meaningful way?

@ludwigschwardt
Contributor Author

The RDB file is only the metadata (3 MB). The bulk of the data (755 MB) lives in Ceph as chunks / objects. The additional data is the data. 😊

You cannot run mvftoms.py to estimate its size... That's kinda pointless. I ran it to get the correct size in my example. If you could speed it up, we'd be golden 😁

Yes, the estimations could be improved by incorporating the estimate_ms_size function inside mvftoms.py and running only that when passed an --estimate-size option.

@ZachSARAO

How can I tell the number of dumps unless I create the MS? (That's the reason I thought you had to run mvftoms before estimating the size.) I see your point about how pointless that would be...
