
Technical scoping: expanding GC Uploader to support multiple file formats #73

Open
rudokemper opened this issue Feb 4, 2025 · 3 comments


rudokemper commented Feb 4, 2025

#72 introduced a new "GC Uploader" Windmill app to handle Locus Maps exports exclusively.

However, we know this is not the only type of file our users will want to upload. Other formats we've already heard about include:

  • Mapeo or CoMapeo exports (GeoJSON, or ZIP containing GeoJSON and attachments).
  • KoboToolbox CSV/XLS export that was manually cleaned up by a user in Excel.
  • Compressed files containing Esri Shapefile data.
  • Compressed files with Timelapse template and annotation data, including media files.
  • CSV or GeoJSON data from other sources.

Rather than creating a separate "GC Uploader" app for each file type, we could adapt the existing application to recognize different formats and schemas, triggering the appropriate connector script automatically.
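The dispatch step could be sketched roughly as follows. This is only an illustration of the idea: the connector names and detection rules below are hypothetical placeholders, not the actual GC connector scripts, and a real implementation might use magic bytes or schema inspection rather than extensions alone.

```python
import io
import json
import zipfile


def detect_format(filename: str, data: bytes) -> str:
    """Guess which connector should handle an upload.

    Returns a connector key. The keys here are hypothetical
    placeholders, not real GC connector script paths.
    """
    name = filename.lower()
    if name.endswith((".geojson", ".json")):
        doc = json.loads(data)
        if doc.get("type") == "FeatureCollection":
            return "geojson_connector"
    if name.endswith(".zip"):
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            members = zf.namelist()
            if any(m.endswith(".shp") for m in members):
                return "shapefile_connector"
            if any(m.endswith(".geojson") for m in members):
                # e.g. a Mapeo/CoMapeo export: GeoJSON plus attachments
                return "comapeo_connector"
        return "archive_connector"
    if name.endswith((".csv", ".xls", ".xlsx")):
        return "tabular_connector"
    return "unknown"
```

In Windmill, the uploader's main script could then invoke the matching connector by path (for example via the `wmill` client's run-script facility), keeping each connector's logic unchanged.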

Perhaps more ambitiously, within the same scope of work we might consider consolidating two or more connector scripts that are broadly similar in flow and differ only in their data structures.

The user experience could be roughly similar to Felt's "Upload Anything": a single entry point that handles the many diverse file formats our users want to store in GuardianConnector.

This issue will focus on technical scoping to determine the best approach in Windmill for extending the GC Uploader to support multiple file formats. The issue will be closed once we have converged on a clear approach, at which point implementation work can be scoped and planned separately.

@rudokemper rudokemper added this to the 11th Hour Project milestone Feb 4, 2025
@rudokemper rudokemper self-assigned this Feb 4, 2025
@rudokemper
Member Author

One consideration for very large compressed files (such as Timelapse data with media files) is that the Windmill file input converts and returns the uploaded data as a Base64-encoded string. This is poised to run into memory limitations with files larger than 1 GB.

@rudokemper
Member Author

A related concern, from the user side in working with Timelapse, is that compressing the Timelapse folder with all of the camera-trap media is proving onerous: they are running into file-size and disk-space issues, and need to compress the files piecemeal.

@IamJeffG
Contributor

IamJeffG commented Feb 6, 2025

> One consideration for very large compressed files (such as Timelapse data with media files) is that the Windmill file input converts and returns the uploaded data as a Base64 encoded string.

When this happens I end up using the Python Azure Storage SDK in the Windmill script. The script's input is not a file upload but the path to a folder or large file in cloud storage.
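That pattern might look something like the sketch below: the script accepts a `container/blob` path string and streams the blob to disk with `azure-storage-blob`, so the data never passes through a Base64 string. The path convention and helper names are assumptions for illustration, not the actual scripts in use.

```python
def split_blob_path(path: str) -> tuple[str, str]:
    """Split 'container/some/dir/file.zip' into (container, blob_name)."""
    container, _, blob_name = path.partition("/")
    if not container or not blob_name:
        raise ValueError(f"expected 'container/blob' path, got {path!r}")
    return container, blob_name


def download_to_disk(conn_str: str, path: str, dest: str) -> None:
    # Imported lazily so split_blob_path stays usable without the SDK.
    from azure.storage.blob import BlobClient  # pip install azure-storage-blob

    container, blob_name = split_blob_path(path)
    client = BlobClient.from_connection_string(conn_str, container, blob_name)
    with open(dest, "wb") as f:
        # download_blob() returns a streaming downloader; readinto() writes
        # chunk by chunk, so the whole blob never sits in memory at once.
        client.download_blob().readinto(f)
```

The same shape works for a folder prefix by listing blobs under the prefix and downloading each one in turn.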
