Merge pull request #85 from Geet-George/multiple-flights
The following is the logic behind the commits in this PR.

## Data directory structure

The following is taken from the current documentation:

> The Data_Directory is a directory that includes all data from a single campaign. Therein, data from individual flights are stored in their respective folders with their name in the format of YYYYMMDD, indicating the flight-date. In case of flying through midnight, consider the date of take-off. In case, there are multiple flights in a day, consider adding alphabetical suffixes to distinguish chronologically between flights, e.g. 20200202-A and 20200202-B would be two flights on the same day in the same order that they were flown.
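
The naming convention quoted above can be checked and sorted mechanically. The following is only a sketch: the regular expression (eight digits, optionally followed by a hyphen and one uppercase letter) and the helper `parse_flight_dir` are assumptions made for illustration, not part of `halodrops`.

```python
import re
from datetime import date, datetime
from typing import Tuple

# Assumed pattern: YYYYMMDD, optionally followed by "-<uppercase letter>".
FLIGHT_DIR_PATTERN = re.compile(r"^(\d{8})(?:-([A-Z]))?$")


def parse_flight_dir(name: str) -> Tuple[date, str]:
    """Split a flight directory name into (take-off date, suffix)."""
    match = FLIGHT_DIR_PATTERN.match(name)
    if match is None:
        raise ValueError(f"Not a valid flight directory name: {name!r}")
    day = datetime.strptime(match.group(1), "%Y%m%d").date()
    return day, match.group(2) or ""


# Sorting by (date, suffix) gives the chronological flight order.
names = ["20200202-B", "20200124", "20200202-A"]
ordered = sorted(names, key=parse_flight_dir)
```

Sorting on the parsed key rather than on the raw string makes the ordering explicit and rejects directory names that do not follow the convention.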

This system excludes the possibility of having multiple platforms in a single campaign. Batching by campaign, however, can mean one of two things: for a single-platform campaign, batching across all flights in the campaign; for a multi-platform campaign, batching across all flights of each platform and then across all platforms of the campaign. Currently, batching is only possible for all sondes in a single flight, which is selected by providing a mandatory `flight_id` in the config file.

## Suggested changes:

The Data_Directory should be structured so that each directory in it stands for a platform, and the directories within a platform's directory are the individual flight directories. This structure will be made mandatory. The package will then auto-infer platform names (the `platforms` attribute) from the names of the platform directories. These values will go into the dataset attributes (e.g. `platform_id`) and, if the user wishes, also into the filenames of the datasets.
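
A minimal sketch of the proposed layout and of inferring platform names from it. The platform and flight names (`halo`, `wp3d`, `20200202`) and the helper `infer_platforms` are hypothetical, and the result is sorted here only to make the order reproducible.

```python
import os
import tempfile


def infer_platforms(data_directory: str) -> list:
    """Return the names of all sub-directories of `data_directory`, sorted."""
    return sorted(
        name
        for name in os.listdir(data_directory)
        if os.path.isdir(os.path.join(data_directory, name))
    )


# Build a throwaway campaign directory: <data_directory>/<platform>/<flight>/Level_0
with tempfile.TemporaryDirectory() as data_directory:
    for platform, flight in [("halo", "20200202"), ("wp3d", "20200202-A")]:
        os.makedirs(os.path.join(data_directory, platform, flight, "Level_0"))
    platforms = infer_platforms(data_directory)
```

Every directory directly under `data_directory` is treated as a platform, so unrelated directories placed there would also be picked up.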

If the user wishes to provide custom `platforms` values, they can be provided as an attribute under the `MANDATORY` section of the config file, but then a separate `platform_directory_names` attribute must also be provided, listing the platforms' data directory names in the same sequence as the platform names in `platforms`. If there are multiple platforms in the campaign, the `platforms` values must be comma-separated, e.g. `halo,wp3d` (spaces around a comma are kept as part of the platform name, e.g. when setting the `platform_id` value). If there is only one platform, provide a name with no commas.
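
As an illustration of the comma-separated convention, the sketch below parses a hypothetical config with `configparser`. The path and the directory names `HALO` and `P3` are made up; the space after the comma in `platforms` is deliberate, to show that it ends up in the platform name.

```python
import configparser

# Hypothetical config file content; only the option names come from this PR.
config_text = """
[MANDATORY]
data_directory = /path/to/campaign
platforms = halo, wp3d
platform_directory_names = HALO,P3
"""

config = configparser.ConfigParser()
config.read_string(config_text)

# Split on commas only; surrounding spaces are NOT stripped.
platform_names = config.get("MANDATORY", "platforms").split(",")
directory_names = config.get("MANDATORY", "platform_directory_names").split(",")

# Pair each platform name with its directory name, in the order given.
name_to_directory = dict(zip(platform_names, directory_names))
```

Here the second platform name is `" wp3d"` with a leading space, which is why the description recommends writing `halo,wp3d` without spaces.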

Now, the only way to batch process will be to process all sondes of a campaign, i.e. all sondes from all flights of all platforms in the campaign. If the user wants to process only a subset, they can include only the desired directories in the `data_directory` they provide in the config file. However, considering that the processing is not compute-heavy, no use cases come to my mind that warrant a separate mode of batch processing.

## Now how to go about doing this?

The function `create_and_populate_flight_object` in the `pipeline` module processes all sondes of a flight.

A new function in the pipeline module `get_platforms` will get `platforms` value/s based on the directory names in `data_directory` or the user-provided `platforms` values corresponding to directory names (`platform_directory_names`). For each platform, a `Platform` object will be created with its `platform_id` attribute coming from the `platforms` attribute.

For each `Platform` object, another function in the pipeline module will get all corresponding `flight_id` values by looping over all directory names in a platform's directory and process all sondes in flight-wise batches.
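
The platform-then-flight loop described above could look roughly like the sketch below. The generator `iter_flight_batches` is a hypothetical helper, not the function added in this PR, and the directory names are invented for the example.

```python
import os
import tempfile
from typing import Iterator, Tuple


def iter_flight_batches(data_directory: str) -> Iterator[Tuple[str, str]]:
    """Yield (platform, flight_id) pairs for every flight directory found."""
    for platform in sorted(os.listdir(data_directory)):
        platform_dir = os.path.join(data_directory, platform)
        if not os.path.isdir(platform_dir):
            continue  # ignore stray files next to the platform directories
        for flight_id in sorted(os.listdir(platform_dir)):
            if os.path.isdir(os.path.join(platform_dir, flight_id)):
                yield platform, flight_id


with tempfile.TemporaryDirectory() as data_directory:
    for platform, flight_id in [
        ("halo", "20200202"),
        ("halo", "20200205"),
        ("wp3d", "20200202-A"),
    ]:
        os.makedirs(os.path.join(data_directory, platform, flight_id))
    # A stray file inside a platform directory is skipped.
    open(os.path.join(data_directory, "halo", "notes.txt"), "w").close()
    batches = list(iter_flight_batches(data_directory))
```

Each yielded pair identifies one flight-wise batch, so the per-flight processing can be run once per pair.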

After the flight-wise batch processing is done, the corresponding `flight_id` directories will be populated with L2 files whose datasets contain the corresponding `platform_id` and `flight_id` attributes. For creating L3 and onwards, the script will just look for all L2 files in the `data_directory` and get the flight and platform information from the `platform_id` and `flight_id` attributes of the L2 files.
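
To illustrate how L3 creation could recover the flight and platform information from attributes alone, here is a sketch that groups datasets by `(platform_id, flight_id)`. It uses `SimpleNamespace` stand-ins with an `attrs` dictionary instead of real L2 files; an `xarray.Dataset` exposes its attributes through `.attrs` in the same way. The helper name is hypothetical.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_by_platform_and_flight(datasets):
    """Group dataset-like objects by their (platform_id, flight_id) attributes."""
    grouped = defaultdict(list)
    for ds in datasets:
        key = (ds.attrs["platform_id"], ds.attrs["flight_id"])
        grouped[key].append(ds)
    return dict(grouped)


# Stand-ins for opened L2 datasets (one per sonde).
l2_datasets = [
    SimpleNamespace(attrs={"platform_id": "halo", "flight_id": "20200202"}),
    SimpleNamespace(attrs={"platform_id": "halo", "flight_id": "20200202"}),
    SimpleNamespace(attrs={"platform_id": "wp3d", "flight_id": "20200202-A"}),
]
grouped = group_by_platform_and_flight(l2_datasets)
```

Because the key comes from the file attributes, the grouping does not depend on where the L2 files sit inside `data_directory`.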
Geet-George authored Dec 11, 2023
2 parents 49a9f19 + f0d3928 commit f638883
Showing 3 changed files with 154 additions and 24 deletions.
66 changes: 55 additions & 11 deletions src/halodrops/helper/paths.py
@@ -11,15 +11,49 @@
 module_logger = logging.getLogger("halodrops.helper.paths")


-class Paths:
+class Platform:
+    """
+    Deriving flight paths from the provided platform directory
+    The input should align in terms of hierarchy and nomenclature
+    with the {doc}`Directory Structure </handbook/directory_structure>` that `halodrops` expects.
+    """
+
+    def __init__(
+        self, data_directory, platform_id, platform_directory_name=None
+    ) -> None:
+        self.platform_id = platform_id
+        self.platform_directory_name = platform_directory_name
+        self.data_directory = data_directory
+        self.flight_ids = self.get_flight_ids()
+
+    def get_flight_ids(self):
+        """Returns a list of flight IDs for the given platform directory"""
+        if self.platform_directory_name is None:
+            platform_dir = os.path.join(self.data_directory, self.platform_id)
+        else:
+            platform_dir = os.path.join(
+                self.data_directory, self.platform_directory_name
+            )
+
+        flight_ids = []
+        for flight_dir in os.listdir(platform_dir):
+            if os.path.isdir(os.path.join(platform_dir, flight_dir)):
+                flight_ids.append(flight_dir)
+        return flight_ids
+
+
+class Flight:
     """
     Deriving paths from the provided directory
     The input should align in terms of hierarchy and nomenclature
     with the {doc}`Directory Structure </handbook/directory_structure>` that `halodrops` expects.
     """

-    def __init__(self, data_directory, flight_id):
+    def __init__(
+        self, data_directory, flight_id, platform_id, platform_directory_name=None
+    ):
         """Creates an instance of Paths object for a given flight
         Parameters
@@ -30,25 +64,33 @@ def __init__(self, data_directory, flight_id):
         `flight_id` : `str`
             Individual flight directory name
+        `platform_id` : `str`
+            Platform name
         Attributes
         ----------
-        `flight_id`
+        `flight_idpath`
             Path to flight data directory
-        `flight_idname`
+        `flight_id`
             Name of flight data directory
         `l1dir`
             Path to Level-1 data directory
         """
         self.logger = logging.getLogger("halodrops.helper.paths.Paths")
-        self.flight_id = os.path.join(data_directory, flight_id)
-        self.flight_idname = flight_id
-        self.l0dir = os.path.join(data_directory, flight_id, "Level_0")
-        self.l1dir = os.path.join(data_directory, flight_id, "Level_1")
+        if platform_directory_name is None:
+            platform_directory_name = platform_id
+        self.flight_idpath = os.path.join(
+            data_directory, platform_directory_name, flight_id
+        )
+        self.flight_id = flight_id
+        self.platform_id = platform_id
+        self.l1dir = os.path.join(self.flight_idpath, "Level_1")
+        self.l0dir = os.path.join(self.flight_idpath, "Level_0")

         self.logger.info(
-            f"Created Path Instance: {self.flight_id=}; {self.flight_idname=}; {self.l1dir=}"
+            f"Created Path Instance: {self.flight_idpath=}; {self.flight_id=}; {self.l1dir=}"
         )

     def get_all_afiles(self):
@@ -69,7 +111,7 @@ def quicklooks_path(self):
         `str`
             Path to quicklooks directory
         """
-        quicklooks_path_str = os.path.join(self.flight_id, "Quicklooks")
+        quicklooks_path_str = os.path.join(self.flight_idpath, "Quicklooks")
         if pp(quicklooks_path_str).exists():
             self.logger.info(f"Path exists: {quicklooks_path_str=}")
         else:
@@ -80,7 +122,7 @@ def quicklooks_path(self):
         return quicklooks_path_str

     def populate_sonde_instances(self) -> Dict:
-        """Returns a dictionary of `Sonde` class instances for all A-files found in `flight_id`
+        """Returns a dictionary of `Sonde` class instances for all A-files found in `flight_idpath`
         and also sets the dictionary as value of `Sondes` attribute
         """
         afiles = self.get_all_afiles()
@@ -94,6 +136,8 @@ def populate_sonde_instances(self) -> Dict:

             Sondes[sonde_id] = Sonde(sonde_id, launch_time=launch_time)
             Sondes[sonde_id].add_launch_detect(launch_detect)
+            Sondes[sonde_id].add_flight_id(self.flight_id)
+            Sondes[sonde_id].add_platform_id(self.platform_id)
             Sondes[sonde_id].add_afile(a_file)
             if launch_detect:
                 Sondes[sonde_id].add_postaspenfile()
92 changes: 79 additions & 13 deletions src/halodrops/pipeline.py
@@ -1,7 +1,8 @@
-from .helper.paths import Paths
+from .helper.paths import Platform, Flight
 from .sonde import Sonde
 import configparser
 import inspect
+import os
 import xarray as xr


@@ -137,9 +138,67 @@ def get_args_for_function(config, function):
     return args


-def create_and_populate_Paths_object(config: configparser.ConfigParser) -> Paths:
+def get_platforms(config):
     """
-    Creates a Paths object and populates it with A-files.
+    Get platforms based on the directory names in `data_directory` or the user-provided `platforms` values.
+    Parameters
+    ----------
+    config : ConfigParser instance
+        The configuration file parser.
+    Returns
+    -------
+    dict
+        A dictionary where keys are platform names and values are `Platform` objects.
+    Raises
+    ------
+    ValueError
+        If `platforms` is specified in the config file but `platform_directory_names` is not, or
+        if a value in `platform_directory_names` does not correspond to a directory in `data_directory`.
+    """
+    data_directory = config.get("MANDATORY", "data_directory")
+    if config.has_option("MANDATORY", "platforms"):
+        if not config.has_option("MANDATORY", "platform_directory_names"):
+            raise ValueError(
+                "platform_directory_names must be provided in the config file when platforms is specified"
+            )
+        platforms = config.get("MANDATORY", "platforms").split(",")
+        platform_directory_names = config.get(
+            "MANDATORY", "platform_directory_names"
+        ).split(",")
+        platforms = dict(zip(platforms, platform_directory_names))
+        for directory_name in platform_directory_names:
+            if not os.path.isdir(os.path.join(data_directory, directory_name)):
+                raise ValueError(
+                    f"No directory found for {directory_name} in data_directory"
+                )
+        platform_objects = {}
+        for platform, platform_directory_name in platforms.items():
+            platform_objects[platform] = Platform(
+                data_directory=data_directory,
+                platform_id=platform,
+                platform_directory_name=platform_directory_name,
+            )
+    else:
+        platforms = [
+            name
+            for name in os.listdir(data_directory)
+            if os.path.isdir(os.path.join(data_directory, name))
+        ]
+        platform_objects = {}
+        for platform in platforms:
+            platform_objects[platform] = Platform(
+                data_directory=data_directory, platform_id=platform
+            )
+    return platform_objects
+
+
+def create_and_populate_flight_object(config: configparser.ConfigParser) -> Flight:
+    """
+    Creates a Flight object and populates it with A-files.
     Parameters
     ----------
@@ -148,15 +207,22 @@ def create_and_populate_Paths_object(config: configparser.ConfigParser) -> Paths
     Returns
     -------
-    Paths
-        A Paths object.
+    Flight
+        A Flight object.
     """
+    platform_objects = get_platforms(config)
     output = {}
-    mandatory = get_mandatory_args(Paths)
-    mandatory_args = get_mandatory_values_from_config(config, mandatory)
-    output["paths"] = Paths(**mandatory_args)
-    output["sondes"] = output["paths"].populate_sonde_instances()
-    return output["paths"], output["sondes"]
+    output["sondes"] = {}
+    for platform in platform_objects:
+        for flight_id in platform_objects[platform].flight_ids:
+            flight = Flight(
+                platform_objects[platform].data_directory,
+                flight_id,
+                platform,
+                platform_objects[platform].platform_directory_name,
+            )
+            output["sondes"].update(flight.populate_sonde_instances())
+    return output["sondes"]


 def iterate_Sonde_method_over_dict_of_Sondes_objects(
@@ -299,10 +365,10 @@ def run_pipeline(pipeline: dict, config: configparser.ConfigParser):


 pipeline = {
-    "create_paths": {
+    "create_flight": {
         "intake": None,
-        "apply": create_and_populate_Paths_object,
-        "output": ["paths", "sondes"],
+        "apply": create_and_populate_flight_object,
+        "output": "sondes",
     },
     "qc": {
         "intake": "sondes",
20 changes: 20 additions & 0 deletions src/halodrops/sonde.py
@@ -43,6 +43,26 @@ def __post_init__(self):
         if self.launch_time is not None:
             object.__setattr__(self, "sort_index", self.launch_time)

+    def add_flight_id(self, flight_id: str) -> None:
+        """Sets attribute of flight ID
+        Parameters
+        ----------
+        flight_id : str
+            The flight ID of the flight during which the sonde was launched
+        """
+        object.__setattr__(self, "flight_id", flight_id)
+
+    def add_platform_id(self, platform_id: str) -> None:
+        """Sets attribute of platform ID
+        Parameters
+        ----------
+        platform_id : str
+            The platform ID of the flight during which the sonde was launched
+        """
+        object.__setattr__(self, "platform_id", platform_id)
+
     def add_spatial_coordinates_at_launch(self, launch_coordinates: List) -> None:
         """Sets attributes of spatial coordinates at launch
