Merge pull request #85 from Geet-George/multiple-flights
Following is the logic behind the commits in this PR.

## Data directory structure

The following is taken from the current documentation:

> The Data_Directory is a directory that includes all data from a single campaign. Therein, data from individual flights are stored in their respective folders, named in the format YYYYMMDD, indicating the flight date. In case of flying through midnight, use the date of take-off. If there are multiple flights in a day, add alphabetical suffixes to distinguish the flights chronologically, e.g. 20200202-A and 20200202-B would be two flights on the same day, in the order they were flown.

This system excludes the possibility of having multiple platforms in a single campaign. However, batching by campaign could take one of two forms: for a single-platform campaign, batching across all flights in the campaign; for a multi-platform campaign, batching across all flights of a platform and then, again, across all platforms of the campaign. Currently, batching is only possible for all sondes in a single flight. This is done by providing a mandatory `flight_id` in the config file.

## Suggested changes:

The Data_Directory should be structured such that each directory in it stands for a platform, and the directories within a platform's directory are individual flight directories. This structure will be made mandatory. The package will then auto-infer platform names (the `platforms` attribute) from the platform directories' names. This value will go into the dataset attributes (e.g. `platform_id`) and, if the user wishes, also into the filenames of the dataset. If the user wishes to provide custom `platforms` values, they can be provided as an attribute under the `MANDATORY` section of the config file, but then a separate `platform_directory_names` must also be provided, listing the platforms' data directory names in the same sequence as the platform names in `platforms`.
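As an illustration, the proposed layout and config entries might look like the following (the directory and platform names are made up for the example, and the exact config keys other than `platforms` and `platform_directory_names` are assumptions, not the package's actual schema):

```
Data_Directory/
├── halo/
│   ├── 20200122/
│   ├── 20200202-A/
│   └── 20200202-B/
└── wp3d/
    └── 20200202/
```

```ini
[MANDATORY]
data_directory = /path/to/Data_Directory
; Optional override of the auto-inferred names; if given, it must be
; paired position-by-position with platform_directory_names.
platforms = HALO,WP-3D
platform_directory_names = halo,wp3d
```

With no `platforms` entry, the package would infer `halo` and `wp3d` directly from the directory names.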
If there are multiple platforms in the campaign, the `platforms` values provided by the user must be comma-separated, e.g. `halo,wp3d` (leading and trailing spaces will become part of the platform name, e.g. when setting the `platform_id`). If there is only one platform, provide a name with no commas.

Now, the only way to batch process will be to process all sondes of a campaign, i.e. all sondes from all flights of all platforms. If the user wants to batch over a subset, they can include only the relevant directories in the `data_directory` they provide in the config file. However, considering that the processing is not compute-heavy, no use-cases come to mind that warrant a separate mode of batch processing.

## Now how to go about doing this?

The function `create_and_populate_flight_object` in the `pipeline` module processes all sondes of a flight. A new function in the pipeline module, `get_platforms`, will get the `platforms` value/s either from the directory names in `data_directory` or from the user-provided `platforms` values and their corresponding directory names (`platform_directory_names`). For each platform, a `Platform` object will be created, with its `platform_id` attribute coming from the `platforms` attribute. For each `Platform` object, another function in the pipeline module will get all corresponding `flight_id` values by looping over the directory names in that platform's directory and will process all sondes in flight-wise batches. Once the flight-wise batch processing is done, the corresponding `flight_id` directories will be populated with L2 datasets that contain the matching `platform_id` and `flight_id` attributes. For creating L3 and onwards, the script will simply look for all L2 files in the `data_directory` and read the flight and platform information from the `platform_id` and `flight_id` attributes of those files.
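The steps above could be sketched roughly as follows. This is a minimal illustration of the intended logic, assuming the directory layout described earlier; the function signatures and helper names are my own guesses, not the package's actual API:

```python
# Hypothetical sketch of `get_platforms` and the per-platform flight lookup.
from pathlib import Path


def get_platforms(data_directory, platforms=None, platform_directory_names=None):
    """Return a mapping of platform name -> platform data directory.

    If `platforms` is None, platform names are inferred from the names of
    the immediate subdirectories of `data_directory`. Otherwise, `platforms`
    and `platform_directory_names` are comma-separated strings paired by
    position (spaces are deliberately NOT stripped, per the PR description).
    """
    data_directory = Path(data_directory)
    if platforms is None:
        return {p.name: p for p in sorted(data_directory.iterdir()) if p.is_dir()}
    names = platforms.split(",")
    dirs = platform_directory_names.split(",")
    if len(names) != len(dirs):
        raise ValueError(
            "platforms and platform_directory_names must have the same length"
        )
    return {name: data_directory / d for name, d in zip(names, dirs)}


def get_flight_ids(platform_dir):
    """Flight directories (YYYYMMDD or YYYYMMDD-A style) inside a platform directory."""
    return sorted(p.name for p in Path(platform_dir).iterdir() if p.is_dir())
```

The campaign-wide batch run would then be a nested loop: for each `(platform_id, platform_dir)` pair from `get_platforms`, loop over `get_flight_ids(platform_dir)` and hand each flight to `create_and_populate_flight_object`, so every L2 dataset ends up carrying its `platform_id` and `flight_id`.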