-
Notifications
You must be signed in to change notification settings - Fork 4
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Added detailed example about how to use the EventsUnion and its union() method to overlay information across multiple inputs.
- Loading branch information
Showing
3 changed files
with
259 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,259 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"id": "5d13fc96-ad0d-46ca-8924-dff8ef56ef38", | ||
"metadata": {}, | ||
"source": [ | ||
"# How to transfer data from one GIS layer to another - Overlay\n", | ||
"\n", | ||
"## Setup\n", | ||
"\n", | ||
"Suppose we have two separate GeoDataFrames, both containing different pieces of information about the same route, but the line breaks don't line up across the two GeoDataFrames. \n", | ||
"\n", | ||
"Here is a visual example of the problem:\n", | ||
"\n", | ||
"\n", | ||
"Figure 1: Example Setup\n", | ||
"\n", | ||
"The blue line on top (labeled \"Reference\") has three segments: from 0 to 2, from 2 to 4 and from 4 to 5. \n", | ||
"The orange line on the bottom (labelled \"Input\") has two segments: from 1 to 3, and from 3 to 6. \n", | ||
"Notice how the breaks don't line up nicely across them, and there are even parts where the Reference layer has data and the Input doesn't (and vice versa). Furthermore, notice how the Reference layer has information about the `Value` variable for each of its segments, while the Input has information about the `Category` variable for each of its segments.\n", | ||
"\n", | ||
"It is worth noting that, in this example, all of the lines shown are assumed to belong to the same Route ID, so that information is ommitted from the images." | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "f25e64a6-1755-432d-84c7-c05b192c40fa", | ||
"metadata": {}, | ||
"source": [ | ||
"## The idea behind the solution\n", | ||
"\n", | ||
"It is quite common for us to have situations like these, where linework was generated with different breakpoints, each one of which contains important information, and we are tasked with merging or joining the data to figure out the characteristics of the entire route for all variables across all of the disparate layers. \n", | ||
"\n", | ||
"The idea behind it is actually quite simple. Ultimately, a new DataFrame is created with breaks at every point across the two layers. You can think about this process as trying to find the smallest common denominator, one little piece at a time, across the Reference and Input layers. \n", | ||
"\n", | ||
"\n", | ||
"Figure 2: Solution\n", | ||
"\n", | ||
"In the output, we can see several smaller segments: one from 0 to 1, another from 1 to 2, and so on. This way, it becomes easy to figure out what the `Value` and `Category` values need to be for each little segment in the output." | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "2d42cc38-7231-4fe9-b4a6-5e7793d830df", | ||
"metadata": {}, | ||
"source": [ | ||
"## Solving the problem with code\n", | ||
"\n", | ||
"In `linref`, we can solve this problem using a process called `union`. For reference, this process is sometimes called \"overlay\" (or overlaying) in other software, such as ArcGIS Pro. \n", | ||
"\n", | ||
"Let's walk through how we would make this work using `linref`.\n", | ||
"\n", | ||
"### Basic setup\n", | ||
"The first thing to do is to load up `pandas`, `linref` and create the input data we will use." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 1, | ||
"id": "d6f3629a-c359-48c5-b2e5-e54e956a72f0", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"# Loading important libraries\n", | ||
"import pandas as pd\n", | ||
"import linref as lr" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 2, | ||
"id": "04c55507-73ff-4485-9da7-b10f95712ab4", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"# Creating synthetic data \n", | ||
"reference_df = pd.DataFrame({'reference_rowid':[1,2,3],\n", | ||
" 'route_id':['main_route','main_route','main_route'],\n", | ||
" 'beg':[0,2,4],\n", | ||
" 'end':[2,4,5],\n", | ||
" 'val':[3.5,6,1.5]})\n", | ||
"\n", | ||
"reference_ec = lr.EventsCollection(reference_df , \n", | ||
" keys=['route_id'],\n", | ||
" beg='beg',\n", | ||
" end='end',\n", | ||
" )\n", | ||
"\n", | ||
"input_df = pd.DataFrame({'input_rowid':[100,200],\n", | ||
" 'route_id':['main_route','main_route'],\n", | ||
" 'beg':[1,3],\n", | ||
" 'end':[3,6],\n", | ||
" 'categ':['A','B']\n", | ||
" })\n", | ||
"\n", | ||
"input_ec = lr.EventsCollection(input_df, \n", | ||
" keys=['route_id'],\n", | ||
" beg='beg',\n", | ||
" end='end',\n", | ||
" )" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "ca1ed1da-a906-47f3-96bc-c3396c57f0fd", | ||
"metadata": {}, | ||
"source": [ | ||
"### Creating an `EventsUnion` and executing a `union()`\n", | ||
"\n", | ||
"Now we have two `EventCollection` objects: `reference_ec` (analogous to the blue \"Reference\" linework) and `input_ec` (analogous to the orange \"Input\" linework), identical to the setup shown in Figure 1 above. \n", | ||
"\n", | ||
"The only thing we need to do is create an `EventsUnion` object and then execute the `union()` method, as shown below." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 3, | ||
"id": "f8b236cf-063c-48d2-a16d-879f6fbff151", | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"name": "stderr", | ||
"output_type": "stream", | ||
"text": [ | ||
"C:\\Users\\diasf\\.conda\\envs\\gpd\\Lib\\site-packages\\linref\\events\\collection.py:2384: FutureWarning: When grouping with a length-1 list-like, you will need to pass a length-1 tuple to get_group in a future version of pandas. Pass `(name,)` instead of `name` to silence this warning.\n", | ||
" group = self._build_group(self._groups.get_group(keys))\n", | ||
"C:\\Users\\diasf\\.conda\\envs\\gpd\\Lib\\site-packages\\linref\\events\\collection.py:2384: FutureWarning: When grouping with a length-1 list-like, you will need to pass a length-1 tuple to get_group in a future version of pandas. Pass `(name,)` instead of `name` to silence this warning.\n", | ||
" group = self._build_group(self._groups.get_group(keys))\n" | ||
] | ||
} | ||
], | ||
"source": [ | ||
"# Creating the EventsUnion object\n", | ||
"eu = lr.EventsUnion([reference_ec,input_ec])\n", | ||
"\n", | ||
"# Executing the `union()` method\n", | ||
"union_ec = eu.union()" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "efc8a02e-264f-4419-8b8e-5139f4da4305", | ||
"metadata": {}, | ||
"source": [ | ||
"Once the `union()` method is executed, it creates a new `EventsCollection` object containing the results.\n", | ||
"\n", | ||
"We can then look inside the resulting `DataFrame` as follows:" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 4, | ||
"id": "0ba0ff6a-7e0e-4a5e-89d5-a5ef193aa185", | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"name": "stdout", | ||
"output_type": "stream", | ||
"text": [ | ||
" route_id beg end index_0 index_1\n", | ||
"0 main_route 0.0 1.0 0.0 NaN\n", | ||
"1 main_route 1.0 2.0 0.0 0.0\n", | ||
"2 main_route 2.0 3.0 1.0 0.0\n", | ||
"3 main_route 3.0 4.0 1.0 1.0\n", | ||
"4 main_route 4.0 5.0 2.0 1.0\n", | ||
"5 main_route 5.0 6.0 NaN 1.0\n" | ||
] | ||
} | ||
], | ||
"source": [ | ||
"# Extracting the DataFrame from the result of the EventsUnion\n", | ||
"union_df = union_ec.df.copy()\n", | ||
"\n", | ||
"print(union_df)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "fcb81973-1685-41c0-b843-2c3188594b19", | ||
"metadata": {}, | ||
"source": [ | ||
"### Merging information from the original DataFrames into the output from the `union()`\n", | ||
"\n", | ||
"The resulting `DataFrame` is broken down into the smallest common segments across both members of the union, just like we showed in Figure 2. \n", | ||
"\n", | ||
"The two `index` columns (`index_0` and `index_1`) contain information about how each one of its segments relates to the rows in the Reference and Input layers. Specifically, `index_0` and `index_1` are the indexes of the rows of the `reference_df` and `input_df` objects, respectively. \n", | ||
"\n", | ||
"Therefore, if we want to join or merge any data from those original layers, we can just use a basic pandas `merge()`, with `left_on=\"index_0\"` (or `left_on=\"index_1\"`) and with `right_index=True`. " | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 5, | ||
"id": "3272484f-fe6f-4d02-8d5b-f0d5e17789cb", | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"name": "stdout", | ||
"output_type": "stream", | ||
"text": [ | ||
" route_id beg end index_0 index_1 val categ\n", | ||
"0 main_route 0.0 1.0 0.0 NaN 3.5 NaN\n", | ||
"1 main_route 1.0 2.0 0.0 0.0 3.5 A\n", | ||
"2 main_route 2.0 3.0 1.0 0.0 6.0 A\n", | ||
"3 main_route 3.0 4.0 1.0 1.0 6.0 B\n", | ||
"4 main_route 4.0 5.0 2.0 1.0 1.5 B\n", | ||
"5 main_route 5.0 6.0 NaN 1.0 NaN B\n" | ||
] | ||
} | ||
], | ||
"source": [ | ||
"# Merging the output of the EventsUnion with the original `reference_df` and `input_df` \n", | ||
"# to carry over the `'val'` and `'categ'` columns, respectively.\n", | ||
"out_df = (union_df\n", | ||
" .merge(reference_df[['val']], \n", | ||
" how='left', \n", | ||
" left_on='index_0', \n", | ||
" right_index=True)\n", | ||
" .merge(input_df[['categ']], \n", | ||
" how='left', \n", | ||
" left_on='index_1', \n", | ||
" right_index=True)\n", | ||
" )\n", | ||
"\n", | ||
"print(out_df)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "2069e58d-029e-497b-ad16-1929cdb3e780", | ||
"metadata": {}, | ||
"source": [ | ||
"Finally, the result looks exactly like the green Output line shown in Figure 2, where each little segment has information about the `Value` and `Category` segment. " | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "Python 3 (ipykernel)", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.12.2" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 5 | ||
} |
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.