This project provides an easy-to-deploy Docker environment for running geospatial processing jobs in Python on Apache Spark (PySpark), extended with Apache Sedona for geospatial analytics and Delta Lake for reliable data storage.
Check out the following project for a usage example.
- Docker - This environment requires an installed copy of Docker (Docker Engine or Docker Desktop). Either product can be downloaded from the official Docker website or installed through a package manager.
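Before building any images, it can help to confirm that Docker is installed and that its daemon is reachable. The following is a minimal Python sketch, assuming the `docker` CLI is on your PATH; `docker_available` is a hypothetical helper, not part of this project. It relies on `docker info` exiting with a non-zero status when the daemon cannot be reached.

```python
import shutil
import subprocess


def docker_available(timeout: float = 10.0) -> bool:
    """Return True if the docker CLI is on PATH and can reach a running daemon.

    Hypothetical helper for a quick prerequisite check; not part of this project.
    """
    docker = shutil.which("docker")
    if docker is None:
        # The docker CLI is not installed or not on PATH.
        return False
    try:
        # `docker info` exits non-zero when the daemon is not running or unreachable.
        result = subprocess.run(
            [docker, "info"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        # Treat launch failures and hangs as "not available".
        return False
    return result.returncode == 0


if __name__ == "__main__":
    print("Docker is ready" if docker_available() else "Docker not found or daemon not running")
```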
1. Clone the repository and navigate to the project folder:

       git clone https://github.com/Raychani1/PySpark_Sedona_Delta_Docker
       cd PySpark_Sedona_Delta_Docker

2. Build the Docker image:

       docker build -t pyspark_sedona_delta_docker .

3. Navigate to your project directory and create a project-specific Dockerfile based on the new image with the following content:

       FROM pyspark_sedona_delta_docker:latest
       WORKDIR /app
       # Install project-related Python libraries
       COPY requirements.txt .
       RUN pip install -r requirements.txt --no-cache-dir

4. Build the project-specific Docker image:

       docker build -t my_project .

5. Create a shortcut for more convenient execution:

   - On Linux (the single quotes make $(pwd) expand each time the alias runs, so it always mounts the current directory):

         alias My_Project='docker run --rm -it -v "$(pwd)":/app my_project:latest'

   - On Windows:

     1. Create a PowerShell script file named my_project.ps1 with the following content:

            function My_Project { docker run --rm -it -v ${pwd}:/app my_project:latest $args }

     2. Load the new function by dot-sourcing the script:

            . .\my_project.ps1

6. Run your project through the environment:

       My_Project python main.py
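The Linux alias and the PowerShell function both expand to the same `docker run` call. If you prefer a single cross-platform entry point, a minimal Python sketch of that command could look like the following. `build_docker_command` and `run_in_container` are hypothetical helpers, and the default image name `my_project:latest` assumes the project tag used above.

```python
import os
import subprocess
from typing import List, Optional, Sequence


def build_docker_command(
    args: Sequence[str],
    image: str = "my_project:latest",
    project_dir: Optional[str] = None,
) -> List[str]:
    """Return the docker run command that the My_Project shortcut expands to.

    Hypothetical helper mirroring: docker run --rm -it -v <dir>:/app <image> <args>
    """
    # Mount the given directory (default: the current one) at /app,
    # which is the WORKDIR of the project image.
    mount_source = os.path.abspath(project_dir or os.getcwd())
    return ["docker", "run", "--rm", "-it", "-v", f"{mount_source}:/app", image, *args]


def run_in_container(args: Sequence[str], **kwargs) -> int:
    """Run args inside the project container and return docker's exit status.

    When the container starts, this is the exit code of the command run inside it.
    """
    return subprocess.run(build_docker_command(args, **kwargs), check=False).returncode
```

With this in place, `run_in_container(["python", "main.py"])` behaves like `My_Project python main.py`. As with the shell shortcuts, the `-it` flags need an interactive terminal; a non-interactive caller such as a CI job would have to drop them.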
1. Navigate to your project directory and create a project-specific Dockerfile based on the prebuilt image with the following content:

       FROM rajcsanyiladislavit/local_geo_analysis:latest
       WORKDIR /app
       # Install project-related Python libraries
       COPY requirements.txt .
       RUN pip install -r requirements.txt --no-cache-dir

2. Follow steps 4-6 in the previous section.
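The two project Dockerfiles differ only in their FROM line, so a small Python sketch can generate either one. `render_dockerfile` and `write_dockerfile` are hypothetical helpers, not part of this project; the base image names are the two used in this README. Keep in mind that `COPY requirements.txt .` makes the build fail if the project directory has no requirements.txt.

```python
from pathlib import Path

# Base images used in this README: the locally built one and the prebuilt one.
LOCAL_BASE = "pyspark_sedona_delta_docker:latest"
PUBLISHED_BASE = "rajcsanyiladislavit/local_geo_analysis:latest"


def render_dockerfile(base_image: str) -> str:
    """Return the project-specific Dockerfile text for the given base image."""
    lines = [
        f"FROM {base_image}",
        "WORKDIR /app",
        "# Install project-related Python libraries",
        "COPY requirements.txt .",
        "RUN pip install -r requirements.txt --no-cache-dir",
    ]
    return "\n".join(lines) + "\n"


def write_dockerfile(project_dir: str, base_image: str = LOCAL_BASE) -> Path:
    """Write a Dockerfile into project_dir, refusing to overwrite an existing one."""
    path = Path(project_dir) / "Dockerfile"
    if path.exists():
        raise FileExistsError(f"{path} already exists")
    path.write_text(render_dockerfile(base_image), encoding="utf-8")
    return path
```

For example, `write_dockerfile(".", PUBLISHED_BASE)` creates the Dockerfile for the prebuilt image in the current directory.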
1. Navigate to your project directory and create a project-specific Dockerfile as described in the previous section(s).

2. Start your project container and connect to it by following the official documentation for VS Code and PyCharm.
Distributed under the MIT License. See LICENSE for more information.