Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

very first interim_l3 dataset #121

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

hgloeckner
Copy link
Collaborator

I added a first draft for a gridded class that can be used to start l3 creation.
Right now it does some things where I am not sure if that's useful:

  • launch_time is made a dimension
  • datasets are concatenated along launch_time and timecausing many missing values (those will go away when we switch dimension to some height in subsequent steps, but maybe the switch should happen before?)
  • all the former attributes (aircraft data, like true_heading_(deg), but also things like author) are now variables. We should probably add something that looks at them and if it's the same thing for every sonde makes it an attribute again. I just did not want to lose information.

It's not meant to be perfect at this point, but maybe we can use it to discuss whether it makes sense like this at all?

@hgloeckner
Copy link
Collaborator Author

... and Issue #120 should be solved here as well - for now I just copied the Sonde header and way to do things

@Geet-George
Copy link
Owner

Hi @hgloeckner, thanks for starting this. I think it makes a lot of sense, but maybe with some changes? I'm "typing out loud" my thoughts here.

There are two main steps (and a whole bunch of sub-steps, see #90) to go from L2 to L3:

(a) Prepare L2 data for concatenation (interpolate, cell methods, etc.) - this still happens to Sonde instances. The final step could be your function prepare_l2_for_gridded of keeping attributes, add launch_time dim, etc. (although sonde_id is the dim we want for Gridded not launch_time.)

(b) Concatenate & incorporate L3 features (attrs, vars, etc.) - this happens to Gridded instances

After (a) is done, this will be stored in an attribute (currently _interim_l3_ds, we could probably change the name!) I see the Gridded class as something that would concatenate multiple instances from Sonde (in particular the attribute _interim_l3_ds) and then apply further changes, maybe such as adding wspd,wdir,T,RH, other attributes, etc.

To summarize, there will be a bunch of functions between "add_l2_ds" and "prepare_l2_for_gridded" in the pipeline, and we should make sonde_id the dimension for gridded sondes, instead of launch_time.

I hope these make sense. Otherwise the PR is exactly in the direction where we want to go! :)


object.__setattr__(self, "_interim_l3_ds", ds)

return self


@dataclass(order=True, frozen=True)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
@dataclass(order=True, frozen=True)
@dataclass(order=True)

Let's skip the frozen already for Gridded! For Sonde, it was more important because sonde_id should be unique.

sondes: dict
# platforms: dict = field(default_factory=dict)
_: KW_ONLY
interpolation_grid: np.ndarray = field(
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These class attributes of interpolation_grid, interpolation_bins and max_gap_fill will make it fixed values, and unable to be changed by user config. I think it is better to have these values as something that can be imported from a helper function. The concat_sondes can then use these as arguments.

src/halodrops/processor.py Outdated Show resolved Hide resolved
@@ -967,9 +986,32 @@ def add_q_and_theta_to_l2_ds(self):
else:
ds = self.l2_ds

ds = calc_q_from_rh(ds)
ds = calc_theta_from_T(ds)
ds = hh.calc_q_from_rh(ds)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be a different commit. (that's my bad to not include hh, sorry :P)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes true. I subcontiously changed it at some point ^^'

src/halodrops/processor.py Outdated Show resolved Hide resolved
src/halodrops/processor.py Outdated Show resolved Hide resolved
src/halodrops/processor.py Outdated Show resolved Hide resolved
src/halodrops/processor.py Outdated Show resolved Hide resolved
@hgloeckner
Copy link
Collaborator Author

Your completely right - Nina, Theresa and I will also talk this afternoon.
Interpolating before is definitely more useful. I unfortunately only realized that too late and wanted to already give something up for discussion^^' I hope I can fix your points later today

Copy link
Owner

@Geet-George Geet-George left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't able to test it out, but from looking at the code, it looks okay. I think something might break if we don't provide a file that is prepared from L2 before concatenating, but this is worth trying when we actually add in the L3 functions.

def sondes_to_gridded(sondes: dict) -> xr.Dataset:
pass
def sondes_to_gridded(sondes: dict, config: configparser.ConfigParser):
print(sondes)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
print(sondes)


def remove_non_mono_incr_alt(self):
"""
This function removes the indices in the
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
This function removes the indices in the
This function removes the value at the indices which have a non-monotonically increasing value in the

@Geet-George Geet-George marked this pull request as ready for review August 10, 2024 17:26
@hgloeckner
Copy link
Collaborator Author

I am not really sure if I accidentally deleted something essential from the interpolation method compared to joanne. One downside in the comparison is that my version needs the bottleneck package to run (I think otherwise interpolate_na over a time variable does not work out of the box). I also haven't thoroughly checked the functions. so it's not ready for merge yet, but maybe you can have a look if you like

@hgloeckner hgloeckner force-pushed the create_interim_l3 branch 2 times, most recently from 42002b9 to b95e72b Compare August 12, 2024 18:09
@hgloeckner
Copy link
Collaborator Author

It's not working if there are sondes without a launch detect yet. Fixing that currently by removing those from the sonde_dictionary. That means that the sondes that don't detect a launch are removed from the dataset, but I guess for level3 that's ok (they still show in level2). Also, are the sondes where the qc fails currently still in there? then we should remove them as well

@hgloeckner
Copy link
Collaborator Author

I get a basic level3 file out of this. However, I am not sure whether we are (correctly) iterating over the different flight IDs. As far as I understood there should be a level3 file for each flight right? I guess we are not doing that currently

@hgloeckner
Copy link
Collaborator Author

I added the q and theta calculations since I needed them for the plot :)

@ninarobbins
Copy link
Collaborator

hi @hgloeckner, when trying to plot the sondes in a map, I realized that the lon/lat have dimension (gpsalt), when they should also have dimension (sonde_id). I'll try to fix this

@hgloeckner
Copy link
Collaborator Author

I think also, main is not working anymore right now (or only for one path structure) because the last to commits are actually fixes for the path_structure not so much for this branch.

@hgloeckner
Copy link
Collaborator Author

I think also, main is not working anymore right now (or only for one path structure) because the last to commits are actually fixes for the path_structure not so much for this branch.

I put those changes into another pull request so that we can have a working main before we merge this :)

@hgloeckner hgloeckner force-pushed the create_interim_l3 branch 3 times, most recently from 33b3621 to 3cdd07e Compare August 19, 2024 13:49
@ninarobbins
Copy link
Collaborator

Where does the sonde_id currently come from? I guess it should be something containing the platform, flight date, and sonde number (HALO_20240811_s1).

@hgloeckner
Copy link
Collaborator Author

It's the internal serial id. Geet and Tobi answered this to me in Issue #127

@hgloeckner
Copy link
Collaborator Author

hgloeckner commented Aug 23, 2024

Are you okay with merging? I think it would be nice to have a basic level 3 in main so that we can start further editing from there @Geet-George

@Geet-George
Copy link
Owner

Are you okay with merging? I think it would be nice to have a basic level 3 in main so that we can start further editing from there @Geet-George

I am not sure I could completely understand the conversation here. So, is the main branch working now? And what does this basic level 3 include?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants