fixed references
Chwiggy committed May 12, 2024
1 parent b2c4ba3 commit c18b795
Showing 8 changed files with 290 additions and 21 deletions.
3 changes: 1 addition & 2 deletions src/chapters/01_introduction.typ
@@ -33,7 +33,6 @@
Not least, this means that this thesis won't be free of these implicit or explicit value judgements either.
Transit is an inherently political topic.

// TODO expand:
As an exteriority to daily life, transport systems are hard to influence by an individual according to their own needs.
With the strong link between transportation and the necessary built infrastructure, transit systems within their geographic location have a great deal of inertia @holzapfel_urbanismus_2020.
Historical decisions about the transport network therefore still have an outsized influence on the transit opportunities of modern people.
@@ -49,7 +48,7 @@

This almost precludes the idea of an easy spatial measure of transit access based on individual experiences. However, with an idea focused on opportunities for social connection and social services, which generally tend to be spread more evenly across cities, and a focus on a singular urban space, it might be possible to characterise the nature of a transit system and its capacity to move people without an intense focus on individual points of interest. To this end, I employed statistical routing data based on the Conveyal engine @conway_evidencetransit.

== Research Questions
== Research Questions <hypo>
Guiding this research are a few research questions, which I tried to answer with two indices: one for average travel times and one trying to capture the need to plan journeys out before starting on an itinerary. For each research question there are a handful of hypotheses to test. Ideally this thesis can falsify some of these hypotheses.

First of all, I will focus on the spatial pattern of mean travel times, to answer the question of whether the proposed indices can capture observable patterns in the traffic flow of the city. There are two hypotheses associated with this question:
2 changes: 1 addition & 1 deletion src/chapters/02_related_work.typ
@@ -1,7 +1,7 @@
#import "../preamble.typ": *
#set math.equation(numbering: "(1)")

= Related Work
= Related Work <related>
In the following section I will explore the aspects of this work already discussed in the literature at the intersection of transit accessibility and transit network analysis. This overview is necessarily incomplete and, in the face of various historical traditions of transit planning procedures, can only give a sample of ideas present in a very active field of literature. First of all, this section is concerned with giving an overview of the literature that led to the formation of the ideas in this thesis. A discussion of competing ideas follows in a later chapter.

== Access
10 changes: 4 additions & 6 deletions src/chapters/03_methodology.typ
@@ -5,8 +5,7 @@

= Methodological Approach
This thesis project started as an exploratory data analysis project, trying to find easy to implement metrics for public transit service coverage and accessibility with transit based on open source software and openly accessible data.
Starting points were prior considerations about closeness centrality and reach, as well as an interest in the temporal variability of transit services on the macro and micro scale.
// TODO add references to Related Work
Starting points were prior considerations about closeness centrality and reach, as well as an interest in the temporal variability of transit services on the macro and micro scale (compare @related).

== Case Studies <case-study>
To this end it was necessary to find at least one suitable case study. As my choice to approximate network characteristics here fell on routing with `r5py`, based on Conveyal's `r5` routing engine @r5py, there were several data availability requirements. There needed to be a routable General Transit Feed Specification (GTFS) schedule @mobility_data_reference_2024 and suitable street network data from OpenStreetMap (OSM). For further details see @data below.
@@ -15,8 +14,7 @@
After fine-tuning the process in Bonn, Germany (mostly due to concerns over points of interest like school locations), I made the decision to use Heidelberg again for the final data for this thesis, based on my personal familiarity with Heidelberg and its transit network.
As this thesis does not include any measures to verify the data acquired through routing with an empirical sample of real life experiences, personal experience and familiarity at least allowed for checks against my own experience and intuition.

// TODO better map of Heidelberg
#figure(image("../figures/basemap.png"), caption: [Overview map of Heidelberg (OpenStreetMap contributors)], kind: "Map", supplement: "Map") <overview>
#figure(image("../figures/basemap.svg"), caption: [Overview map of Heidelberg (OpenStreetMap contributors)], kind: "Map", supplement: "Map") <overview>

For a transit study, Heidelberg, a city of roughly 130,000 people with a large student population, offers a variety of modes of public transit. There are buses and trams operated by the local municipal transit company rnv, regional buses, as well as multiple S-Bahn stations with regular commuter trains.
Beyond that Heidelberg offers a few different urban spaces (see @overview).
@@ -62,12 +60,12 @@
The easiest choice for this is overlaying grid cells. Here the choice fell on hexagonal grid cells for their translational symmetries with regard to Cartesian distance between all adjacent cells. To this end, hex grids for the area of Heidelberg were acquired from `h3pandas` @dahn_h3pandas_2023. For these cells, population density data was acquired from the Global Human Settlement (GHS) project @schiavina_ghs-pop_2023.


== Processing
== Processing <method_processing>
With all this data it becomes possible to calculate travel time matrices for multimodal public transport journeys with `r5py`. The general flow of data, as described in @processing_chart, was primarily contained within a `python` application running in a `docker` container that could run on a Linux server. The only exceptions were the supply of a suitable GTFS schedule dataset and the supply of the right tiles from the GHS layer dataset.

The large DELFI GTFS dataset was cropped to the general area of Heidelberg to reduce computational overhead for the travel time matrix routing. For this, I used the `gtfs-general` command line tool @psotta_michaelsjpgtfs-general_2024. Similarly, OSM `.pbf` files acquired from Geofabrik were, this time automatically, cropped using `osmosis` @openstreetmap_osmosis_2023 if they were larger than a file size limit based on the locally available computing power.

Both the OSM data and the GTFS data were then supplied as properties to the `r5py` class `TransitNetwork`. A departure date was then chosen automatically out of the `r5py TransitNetwork`, based on a few heuristics, to pick an arbitrary non-special weekday. The date arrived at by this process was /* TODO insert date*/. For this date a departure time was chosen for each hour with a departure time window of 60 minutes, thus covering the entire day. Routing modes were set to walking and public transit to capture a common use case of public transit use, where transit users walk to the first stop of their itinerary, and walk from the last transit stop of their itinerary to their destination.
Both the OSM data and the GTFS data were then supplied as properties to the `r5py` class `TransitNetwork`. A departure date was then chosen automatically out of the `r5py TransitNetwork`, based on a few heuristics, to pick an arbitrary non-special weekday. For this date a departure time was chosen for each hour with a departure time window of 60 minutes, thus covering the entire day. Routing modes were set to walking and public transit to capture a common use case of public transit use, where transit users walk to the first stop of their itinerary, and walk from the last transit stop of their itinerary to their destination.

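The hourly departure scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `hourly_departures` and the example date are placeholders and not taken from the thesis code, and passing the resulting pairs on to the routing call is left out.

```python
from datetime import date, datetime, time, timedelta


def hourly_departures(day):
    """Return 24 (departure, window) pairs that tile one full day.

    Each departure starts on the hour and carries a 60 minute
    departure time window, so consecutive windows do not overlap.
    """
    window = timedelta(minutes=60)
    return [(datetime.combine(day, time(hour=hour)), window) for hour in range(24)]


# Placeholder date for illustration only, not the date used in the thesis.
slots = hourly_departures(date(2024, 5, 14))
for departure, window in slots[:3]:
    print(departure.isoformat(), "+", window)
```

Keeping the departure and its window together in one pair makes it easy to run one routing query per hour and to label each resulting travel time matrix by its starting hour.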
#figure(
box(
10 changes: 5 additions & 5 deletions src/chapters/04_access.typ
@@ -1,21 +1,21 @@
#import "../preamble.typ": *
#set math.equation(numbering: "(1)")
= Transit Access
As seen in /*TODO Reference Section Related Work*/, there are plenty of ways to operationalise transit accessibility as a local measure.
For this thesis, I focused on local connectivity without a large focus on specific itinerary scenarios. As detailed in /*TODO reference Methodology*/, data acquisition focused on travel time matrices to ascertain a measure of reach. As I set up these travel time matrices, they are an active measure of reach, that is, they measure how easy or hard it may be to move from one cell to another, as opposed to how easy it is for a cell to be reached @levinson_towards_2020.
As seen in @related, there are plenty of ways to operationalise transit accessibility as a local measure.
For this thesis, I focused on local connectivity without a large focus on specific itinerary scenarios. As detailed in @data, data acquisition focused on travel time matrices to ascertain a measure of reach. As I set up these travel time matrices, they are an active measure of reach, that is, they measure how easy or hard it may be to move from one cell to another, as opposed to how easy it is for a cell to be reached @levinson_towards_2020.
This can obviously easily be reversed to measure passive reach.

Travel time here is used as a common cost measure for transit accessibility.
As discussed earlier /*TODO add to related work*/ this is of course not the only realistic measure of impedance.
As discussed earlier @related this is of course not the only realistic measure of impedance.
As Heidelberg's transit has generally integrated ticketing and a reasonably high share of people with public transit subscriptions, it seems reasonable to ignore some cost measures like fare rules.
With this approach to reach, I'm basically approximating the inverse of closeness centrality as formulated by @stamos_transportation_2023.

Transit access, however, depends on temporal aspects as well, both because different destinations offer various time constraints as well as the transport network changing over the course of the day @levinson_towards_2020. As mentioned in /*TODO reference to related work*/, this represents a gap in current travel time datasets, and transit accessibility analyses @verduzco_torres_public_2024. By calculating travel time matrices for every hour of the day for this thesis, I try to fill this gap.
Transit access, however, depends on temporal aspects as well, both because different destinations offer various time constraints as well as the transport network changing over the course of the day @levinson_towards_2020. As mentioned in @related, this represents a gap in current travel time datasets, and transit accessibility analyses @verduzco_torres_public_2024. By calculating travel time matrices for every hour of the day for this thesis, I try to fill this gap.

== Post-Processing <processing>
After travel time matrices were calculated with `r5py` @r5py as used in @tenkanen_longitudinal_2020, based on the Conveyal engine @Conway_uncertainty_2018, average travel times $T_c$ were calculated for each cell as in @TravelTimeEq, with $C_d$ as the travel time cost from the cell to another destination cell, summed over all destinations and divided by $N_c-1$, the number of cells excluding the cell itself.
$ T_c = (sum C_d)/(N_c-1) $ <TravelTimeEq>
Here $C_d$ describes the median travel time for a cell-to-cell connection across every point in time within the set one-hour time interval given to `r5py`. These average travel times were calculated for each cell for all 24 hours of a representative day, capturing median travel times for each hour of that day from cell to cell, based on the Conveyal approach to travel time uncertainties @Conway_uncertainty_2018. This also represents `r5py`'s default behaviour, as `r5py` @r5py by default returns the median travel time over the supplied departure time window for the travel time matrix calculation. It can also supply other percentile travel times within this departure time window. I will make use of this fact in /*TODO reference next chapter*/. Using this median travel time for the average cell-to-cell travel times over the course of the departure time window of an hour can provide a more realistic measure than a single departure point in time, which would be inherently dependent on the whims of the schedule @levinson_towards_2020 @owen_modeling_2015.
Here $C_d$ describes the median travel time for a cell-to-cell connection across every point in time within the set one-hour time interval given to `r5py`. These average travel times were calculated for each cell for all 24 hours of a representative day, capturing median travel times for each hour of that day from cell to cell, based on the Conveyal approach to travel time uncertainties @Conway_uncertainty_2018. This also represents `r5py`'s default behaviour, as `r5py` @r5py by default returns the median travel time over the supplied departure time window for the travel time matrix calculation. It can also supply other percentile travel times within this departure time window. I will make use of this fact in @planning. Using this median travel time for the average cell-to-cell travel times over the course of the departure time window of an hour can provide a more realistic measure than a single departure point in time, which would be inherently dependent on the whims of the schedule @levinson_towards_2020 @owen_modeling_2015.

This measure has also been used in travel time datasets for metrics spanning the UK @verduzco_torres_public_2024 or Helsinki @tenkanen_longitudinal_2020. There is however a gap in temporal variability of transport choices across the course of a day @verduzco_torres_public_2024. One could obviously extend the whole departure time window to an entire day and compare various percentile outcomes. However, that approach limits insights into specific service patterns across a day and their influence on connectivity. Therefore, it makes more sense to compute travel time matrices for multiple routing queries over the course of a day and to compare their results with each other.

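As a rough illustration of the average travel time $T_c$ defined above, the following Python sketch computes it from a small travel time matrix. It is a minimal sketch, not the thesis implementation: the function name `mean_travel_times`, the dictionary layout and the three-cell example values are assumptions made for this example, and every cell pair is assumed to be reachable.

```python
from collections import defaultdict


def mean_travel_times(costs):
    """Average travel time T_c per origin cell.

    `costs` maps (origin, destination) -> median travel time in minutes.
    Trips from a cell to itself are skipped, so each sum is divided
    by N_c - 1. Assumes at least two cells and no unreachable pairs.
    """
    cells = {cell for pair in costs for cell in pair}
    n_cells = len(cells)
    totals = defaultdict(float)
    for (origin, destination), minutes in costs.items():
        if origin != destination:
            totals[origin] += minutes
    return {cell: totals[cell] / (n_cells - 1) for cell in cells}


# Hypothetical three-cell matrix; cell ids and minutes are made up.
example = {
    ("a", "b"): 10.0, ("a", "c"): 20.0,
    ("b", "a"): 12.0, ("b", "c"): 6.0,
    ("c", "a"): 18.0, ("c", "b"): 8.0,
}
print(mean_travel_times(example))
```

Running the same function once per hourly travel time matrix would give one $T_c$ value per cell and hour, which is the form needed to compare service patterns over the course of a day.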
6 changes: 3 additions & 3 deletions src/chapters/05_planning.typ
@@ -1,13 +1,13 @@
#import "../preamble.typ": *
#set math.equation(numbering: "(1)")

= Planning to Access
= Planning to Access <planning>
Beyond raw travel times, I also looked at different percentiles of `r5py` travel time calculations. These percentile differences can be understood as a proxy for the amount of flexibility a traveller brings to adjust their departure time to minimise their waiting time at their first public transport stop or to minimise their overall journey time.
As such a high percentile implies transit usage without much planning and adaptation to the schedule and a low percentile implies transit usage with a high degree of planning, but flexibly adapted to the schedule @verduzco_torres_public_2024.
The mean difference between these two resulting travel times from each cell to each other cell then represents the expected travel time difference that a public transit user might experience in a specific location when travelling either with a considerable amount of premeditation, or on a whim just hoping for public transit to show up.

== Post-Processing
To measure these differences, I performed essentially the same processing as for mean travel times in /*TODO reference processing*/ but taking the difference between the 90th and 10th percentile of r5py @r5py travel times according to /*TODO reference TravelTimeEq */ as seen in @Percentile_Difference.
To measure these differences, I performed essentially the same processing as for mean travel times in @method_processing but taking the difference between the 90th and 10th percentile of r5py @r5py travel times according to @TravelTimeEq as seen in @Percentile_Difference.

$ P_c = (sum C_d\("90th"\)-C_d\("10th"\))/(N_c-1) $ <Percentile_Difference>

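The percentile difference $P_c$ can be sketched in the same way. Again this is only an illustration under assumptions: the function name `percentile_spread`, the two dictionaries keyed by cell pair and the example values are hypothetical, and both inputs are assumed to cover the same reachable cell pairs.

```python
def percentile_spread(p90, p10):
    """P_c per origin cell: mean gap between 90th and 10th percentile times.

    `p90` and `p10` map (origin, destination) -> travel time in minutes
    at the respective percentile. Trips from a cell to itself are
    skipped, so each sum is divided by N_c - 1.
    """
    cells = {cell for pair in p90 for cell in pair}
    n_cells = len(cells)
    spread = {cell: 0.0 for cell in cells}
    for (origin, destination), slow in p90.items():
        if origin != destination:
            spread[origin] += slow - p10[(origin, destination)]
    return {cell: total / (n_cells - 1) for cell, total in spread.items()}


# Hypothetical two-cell example; all values are made up.
p90 = {("a", "b"): 25.0, ("b", "a"): 30.0}
p10 = {("a", "b"): 12.0, ("b", "a"): 14.0}
print(percentile_spread(p90, p10))
```

A large value means that a traveller who leaves without consulting the schedule loses much more time than one who plans ahead, which is the interpretation given to $P_c$ in this chapter.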
@@ -20,7 +20,7 @@

#figure(image("../figures/Difference_Map17.svg"), kind: "Map", supplement: "Map", caption: [Map of differences in travel time in Heidelberg for 90th and 10th percentile of travel times in minutes per cell. Difference in minutes.]) <difference_map>

In general, the populated areas in @difference_map seem to display, with a few exceptions, a largely homogeneous travel time difference of around 10 to 15 minutes. This homogeneity remains intact when looking at neighbourhood averages including unpopulated cells (compare @boxplot_difference). Here neighbourhood averages for travel time differences all range between 10 and 15 minutes. With the exclusion of Handschuhsheim, the window gets even narrower and average travel time differences all lie between 12.5 and 15 minutes. Again, however, as in /*TODO reference previous chapter or boxplot */, neighbourhoods with a lot of unpopulated area like Altstadt or Handschuhsheim exhibit a much larger spread of travel time differences than central districts without much unpopulated area, like Bahnstadt. Without unpopulated cells the picture remains largely the same; however, differences between the different neighbourhoods shrink further, and the spread of travel time differences within neighbourhoods diminishes as well.
In general, the populated areas in @difference_map seem to display, with a few exceptions, a largely homogeneous travel time difference of around 10 to 15 minutes. This homogeneity remains intact when looking at neighbourhood averages including unpopulated cells (compare @boxplot_difference). Here neighbourhood averages for travel time differences all range between 10 and 15 minutes. With the exclusion of Handschuhsheim, the window gets even narrower and average travel time differences all lie between 12.5 and 15 minutes. Again, however, as in @clean_boxplot, neighbourhoods with a lot of unpopulated area like Altstadt or Handschuhsheim exhibit a much larger spread of travel time differences than central districts without much unpopulated area, like Bahnstadt. Without unpopulated cells the picture remains largely the same; however, differences between the different neighbourhoods shrink further, and the spread of travel time differences within neighbourhoods diminishes as well.

#figure(image("../figures/Boxplots_Difference.svg"), caption: [Boxplot of travel times grouped by neighbourhoods.]) <boxplot_difference>
