From c18b79550d8d7948f5beb3166131e59d6b8e0bcd Mon Sep 17 00:00:00 2001
From: Chwiggy
Date: Sun, 12 May 2024 15:55:53 +0200
Subject: [PATCH] fixed references

---
 src/chapters/01_introduction.typ |   3 +-
 src/chapters/02_related_work.typ |   2 +-
 src/chapters/03_methodology.typ  |  10 +-
 src/chapters/04_access.typ       |  10 +-
 src/chapters/05_planning.typ     |   6 +-
 src/chapters/06_summary.typ      |   8 +-
 src/chapters/07_discussion.typ   |   2 +
 src/figures/basemap.svg          | 270 +++++++++++++++++++++++++++++++
 8 files changed, 290 insertions(+), 21 deletions(-)
 create mode 100644 src/figures/basemap.svg

diff --git a/src/chapters/01_introduction.typ b/src/chapters/01_introduction.typ index 5289afb..c52f7ec 100644 --- a/src/chapters/01_introduction.typ +++ b/src/chapters/01_introduction.typ @@ -33,7 +33,6 @@ And not least, this means, that this thesis won't be free of these implicit or explicit value judgements either. Transit is an inherently political topic. - // TODO expand: As an exteriority to daily life, transport systems are hard to influence by an individual according to their own needs. With the strong link between transportation and the necessary built infrastructure, transit systems within their geographic location have a strong amount of inertia @holzapfel_urbanismus_2020. Historical decisions about the transport network therefor still have an outsized influence on the transit opportunities of modern people. @@ -49,7 +48,7 @@ This almost precludes the idea of an easy spatial measure of transit access based on individual experiences. However, with an idea focused on opportunities for social connection and social services, which generally tend to be spread more evenly across cities, and a focus on a singular urban space, it might be possible to characterise the nature of a transit system and its capacity to move people without an intense focus on individual points of interest. To this end, I employed statistical routing data based on conveyal engine @conway_evidencetransit.
- == Research Questions + == Research Questions Guiding this research, are a few research questions then, that I tried to answer with two indices one for average travel times and one trying to capture the need to plan journeys out before starting on an itinerary. For each research question there are a handful of hypotheses to test. Ideally this thesis can falsify some of these hypotheses. First of all, I will focus on the spatial pattern of mean travel times, to answer the question if the proposed indices can capture observable patterns in the traffic flow of the city. There are two hypothesis associated with this question: diff --git a/src/chapters/02_related_work.typ b/src/chapters/02_related_work.typ index 75fa886..f9b1d01 100644 --- a/src/chapters/02_related_work.typ +++ b/src/chapters/02_related_work.typ @@ -1,7 +1,7 @@ #import "../preamble.typ": * #set math.equation(numbering: "(1)") -= Related Work += Related Work In the following section I will explore, the aspects of this work already discussed in the literature at the intersection of transit accessibility and transit network analysis. This overview is necessarily incomplete, and in the face of various historical traditions of transit planning procedures can only give a sample of ideas present in a very active field of literature. First of all, this section is concerned with giving an overview of the literature that led to the formation of the ideas in this thesis. A discussion of competing ideas follows in a later chapter. == Access diff --git a/src/chapters/03_methodology.typ b/src/chapters/03_methodology.typ index c0b2ccd..05058c4 100644 --- a/src/chapters/03_methodology.typ +++ b/src/chapters/03_methodology.typ @@ -5,8 +5,7 @@ = Methodological Approach This thesis project started as an exploratory data analysis project, trying to find easy to implement metrics for public transit service coverage and accessibility with transit based on open source software and openly accessible data. 
- Starting points were prior considerations about closeness centrality and reach, as well as an interest in the temporal variability of transit services on the macro and micro scale. - // TODO add references to Related Work + Starting points were prior considerations about closeness centrality and reach, as well as an interest in the temporal variability of transit services on the macro and micro scale (compare @related) . == Case Studies To this end it was necessary to find at least on suitable case study. As my choice to approximate network characteristics here fell on routing with `r5py`, based on conveyal's `r5` routing engine @r5py, there were several data availability requirements. There needed to be a routable General Transit Feed Specification schedule (GTFS) @mobility_data_reference_2024 and suitable street network data from Open Street Map (OSM). For further details see @data below. @@ -15,8 +14,7 @@ After fine tuning the process in Bonn, Germany (mostly due to concerns over point of interest like school locations), I made the decision to use Heidelberg again for the final data for this thesis, based on my personal familiarity with Heidelberg and its transit network. As this thesis does not include any measures to verify the data acquired through routing with an empircal sample of real life experiences, personal experience and familiarity at least allowed for checks against my own experience and intuition. - // TODO better map of Heidelberg - #figure(image("../figures/basemap.png"), caption: [Overview map of Heidelberg (OpenStreetMap contrubutors)], kind: "Map", supplement: "Map") + #figure(image("../figures/basemap.svg"), caption: [Overview map of Heidelberg (OpenStreetMap contrubutors)], kind: "Map", supplement: "Map") For a transit study, Heidelberg, a city of roughly 130,000 people with a large student population, offers a variety of modes of public transit. 
There are buses and trams operated by the local municpal transit company rnv, regional buses, as well as multiple S-Bahn stations with regular commuter trains. Beyond that Heidelberg offers a few different urban spaces (see @overview). @@ -62,12 +60,12 @@ The easiest choice for this is overlaying grid cells. For this the choice fell on hexagonal grid cells for their translational symmetries in regards to cartesian distance between all adjacent cells. For this hexgrids for the aroa of Heidelberg were acquired from `h3pandas` @dahn_h3pandas_2023. For these cells population density data was acquired from the Global Human Settlement (GHS) project @schiavina_ghs-pop_2023. - == Processing + == Processing With all this data it becomes possible to calculate travel time matrices for multimodal public transport journeys with `r5py`. The general flow of data as described in @processing_chart, was primarily contained within a `python` application running in a `docker` container, that could run on a linux server. The only exceptions were the supply of a suitable GTFS schedule dataset, and the supply of the right tiles from the GHS layer dataset. The large DELFI GTFS dataset was cropped to the general area of Heidelberg to reduce computational overhead for the travel time matrix routing. For this, I used the `gtfs-general` command line tool @psotta_michaelsjpgtfs-general_2024. Similarly osm `.pbf` files acquired from geofabrik were, this time automatically, cropped using `osmosis` @openstreetmap_osmosis_2023 if they were larger than a filesize limit based on the locally available computing power. - Both the OSM data and the gtfs data then were supplied as properties to the `r5py` class `TransitNetwork`. A departure date was automatically chosen then out of the `r5py TransitNetwork` automatically based on a few heuristics, to pick an arbitray non-special weekday. The date arrived at by this process was /* TODO insert date*/. 
For this date a departure time was chosen for each hour with a departure time window of 60 minutes, as such covering the entire day. Routing modes were set to walking and public transit to capture a common use case of public transit use, where transit users walk to the first stop of their itinerary, and walk from the last transit stop of their itinerary to their destination. + Both the OSM data and the gtfs data then were supplied as properties to the `r5py` class `TransitNetwork`. A departure date was automatically chosen then out of the `r5py TransitNetwork` automatically based on a few heuristics, to pick an arbitray non-special weekday. For this date a departure time was chosen for each hour with a departure time window of 60 minutes, as such covering the entire day. Routing modes were set to walking and public transit to capture a common use case of public transit use, where transit users walk to the first stop of their itinerary, and walk from the last transit stop of their itinerary to their destination. #figure( box( diff --git a/src/chapters/04_access.typ b/src/chapters/04_access.typ index 2e6f5ca..2ab931c 100644 --- a/src/chapters/04_access.typ +++ b/src/chapters/04_access.typ @@ -1,21 +1,21 @@ #import "../preamble.typ": * #set math.equation(numbering: "(1)") = Transit Access - As seen in /*TODO Reference Section Related Work*/, there are plenty of ways to operationalise transit accessibility, as a local measure. - For this thesis, I focused on local connectivity without a large focus on specific itinerary scenarios. As detailed in /*TODO reference Methodology*/, data acquisition focused on travel time matrices to ascertain a measure of reach. As as i set up these these travel time matrices, they are an active measure of reach, that is they measure how easy or hard it may be to move from one cell to another, as oppossed how easy it is for a cell to be reached @levinson_towards_2020. 
+ As seen in @related, there are plenty of ways to operationalise transit accessibility, as a local measure. + For this thesis, I focused on local connectivity without a large focus on specific itinerary scenarios. As detailed in @data, data acquisition focused on travel time matrices to ascertain a measure of reach. As as i set up these these travel time matrices, they are an active measure of reach, that is they measure how easy or hard it may be to move from one cell to another, as oppossed how easy it is for a cell to be reached @levinson_towards_2020. This can obviously easily be reversed to measure passive reach. Travel time here is used as a common cost measure for transit accessibility. - As discussed earlier /*TODO add to related work*/ this is of course not the only realistic measure of impedance. + As discussed earlier @related this is of course not the only realistic measure of impedance. As Heidelbergs transit has generally integrated ticketing and a reasonably high degree of people with public transit subscriptions, it seems reasonable to ignore some cost measures like fare rules. With this approach to reach, I'm basically approximating the inverse of closeness centrality as formulated by @stamos_transportation_2023. - Transit access, however, depends on temporal aspects as well, both because different destinations offer various time constraints as well as the transport network changing over the course of the day @levinson_towards_2020. As mentioned in /*TODO reference to related work*/, this represents a gap in current travel time datasets, and transit accessibility analyses @verduzco_torres_public_2024. By calculating travel time matrices for every hour of the day for this thesis, I try to fill this gap. + Transit access, however, depends on temporal aspects as well, both because different destinations offer various time constraints as well as the transport network changing over the course of the day @levinson_towards_2020. 
As mentioned in @related, this represents a gap in current travel time datasets, and transit accessibility analyses @verduzco_torres_public_2024. By calculating travel time matrices for every hour of the day for this thesis, I try to fill this gap. == Post-Processing After travel time matrices were calculated with `r5py` @r5py as used in @tenkanen_longitudinal_2020, based on the conveyal engine @Conway_uncertainty_2018, average Travel Times $T_c$ were calculated as in @TravelTimeEq for each cell with $C_d$ as Travel Time Cost from cell to another destination cell divided by the Number of Cells $N_c-1$ for the cell itself. $ T_c = (sum C_d)/(N_c-1) $ - Here $C_d$ describes the median travel time for a cell to cell connection at every point in time within the set 1 hour time interval given to `r5py`. These average travel times were calculated for 24 hours in a representative day for each cell. Capturing median travel times for each hour of that day from cell to cell, based on the Conveyal approach to travel time uncertainties @Conway_uncertainty_2018. This also represents `r5py`'s default behaviour, as `r5py` @r5py by default returns the median travel time over the supplied departure time window for the travel time matrix calculation. It can also supply other percentile travel times within this departure time window. I will make use of this fact in /*TODO reference next chapter*/. Using this median travel time for the average cell to cell travel times over the course of the departure time window of an hour can provide a more realistic measure than a single departure point in time, that would be inherently dependent on the whims of the schedule @levinson_towards_2020 @owen_modeling_2015. + Here $C_d$ describes the median travel time for a cell to cell connection at every point in time within the set 1 hour time interval given to `r5py`. These average travel times were calculated for 24 hours in a representative day for each cell. 
Capturing median travel times for each hour of that day from cell to cell, based on the Conveyal approach to travel time uncertainties @Conway_uncertainty_2018. This also represents `r5py`'s default behaviour, as `r5py` @r5py by default returns the median travel time over the supplied departure time window for the travel time matrix calculation. It can also supply other percentile travel times within this departure time window. I will make use of this fact in @planning. Using this median travel time for the average cell to cell travel times over the course of the departure time window of an hour can provide a more realistic measure than a single departure point in time, that would be inherently dependent on the whims of the schedule @levinson_towards_2020 @owen_modeling_2015. This measure has also been used in travel time datasets for metrics spanning the UK @verduzco_torres_public_2024 or Helsinki @tenkanen_longitudinal_2020. There is however a gap in temporal variability of transport choices across the course of a day @verduzco_torres_public_2024. One could obviously extend the whole departure time window to an entire day and compare various percentile outcomes. However, that approach limits insights into specific service patterns accross a day and their influence on connectivity. Therefore, it makes more sense to compute travel time matrices for multiple routing queries over the course of a day and to compare their results with each other. diff --git a/src/chapters/05_planning.typ b/src/chapters/05_planning.typ index 3cb464a..33c0dcc 100644 --- a/src/chapters/05_planning.typ +++ b/src/chapters/05_planning.typ @@ -1,13 +1,13 @@ #import "../preamble.typ": * #set math.equation(numbering: "(1)") -= Planning to Access += Planning to Access Beyond raw travel times, I also looked at different percentiles of `r5py` travel time calculations. 
These percentile differences can be understood as a proxy for the amount of flexibility a traveller brings to adjust their departure time to minimise their waiting time at their first public transport stop or to minimise their overall journey time. As such a high percentile implies transit usage without much planning and adaptation to the schedule and a low percentile implies transit usage with a high degree of planning, but flexibly adapted to the schedule @verduzco_torres_public_2024. The mean difference between these two resulting travel times from each cell to each other cell then represents the expected travel time difference that a public transit user might experience in a specific location when travelling either with a considerable amount of premeditation, or on a whim just hoping for public transit to show up. == Post-Processing - To measure these differences, I performed essentially the same processing as for mean travel times in /*TODO reference processing*/ but taking the difference between the 90th and 10th percentile of r5py @r5py travel times according to /*TODO reference TravelTimeEq */ as seen in @Percentile_Difference. + To measure these differences, I performed essentially the same processing as for mean travel times in @method_processing but taking the difference between the 90th and 10th percentile of r5py @r5py travel times according to @TravelTimeEq as seen in @Percentile_Difference. $ P_c = (sum C_d\("90th"\)-C_d\("10th"\))/(N_c-1) $ @@ -20,7 +20,7 @@ #figure(image("../figures/Difference_Map17.svg"), kind: "Map", supplement: "Map", caption: [Map of differences in travel time in Heidelberg for 90th and 10th percentile of travel times in minutes per cell. Difference in minutes.]) - In general the populated areas in @difference_map seem to display with a few exceptions a hugely homogenous travel time difference of around 10 to 15 minutes. 
This homogeneity is remains intact looking at neighbourhood averages including unpopulated cells (compare @boxplot_difference). Here neighbourhood averages for travel time differences all range between 10 and 15 minutes. With the exclusion of Handschuhsheim, the window gets even more narrow and average travel times all lie between 12.5 and 15 minutes. Again however as in /*TODO reference previous chapter or boxplot */, neighbourhoods with a lot of unpopulated area like Altstad or Handschuhsheim exhibit a much larger spread of travel time differences than central districts without much unpopulated area, like Bahnstadt. Without populated cell the picture remains largely the same, however differences between the different neighbourhoods shrink further, and the spread of travel time differences within neighbourhoods diminishes as well. + In general the populated areas in @difference_map seem to display with a few exceptions a hugely homogenous travel time difference of around 10 to 15 minutes. This homogeneity is remains intact looking at neighbourhood averages including unpopulated cells (compare @boxplot_difference). Here neighbourhood averages for travel time differences all range between 10 and 15 minutes. With the exclusion of Handschuhsheim, the window gets even more narrow and average travel times all lie between 12.5 and 15 minutes. Again however as in @clean_boxplot, neighbourhoods with a lot of unpopulated area like Altstad or Handschuhsheim exhibit a much larger spread of travel time differences than central districts without much unpopulated area, like Bahnstadt. Without populated cell the picture remains largely the same, however differences between the different neighbourhoods shrink further, and the spread of travel time differences within neighbourhoods diminishes as well. 
#figure(image("../figures/Boxplots_Difference.svg"), caption: [Boxplot of travel times grouped by neighbourhoods.]) diff --git a/src/chapters/06_summary.typ b/src/chapters/06_summary.typ index dfdbceb..39c495e 100644 --- a/src/chapters/06_summary.typ +++ b/src/chapters/06_summary.typ @@ -1,12 +1,12 @@ #import "../preamble.typ": * = Hypotheses and their Falsification -At the start of this project I set a few hypotheses to primarily falsify. It is tempting to change these hypotheses based on the data acquired here, but that would be akin to putting a cart in front of a horse. The hypotheses as originally set out are listed in /*TODO reference right section here*/. In the following two sections I will try and see if I managed to bring evidence to falsify these hypotheses. +At the start of this project I set a few hypotheses to primarily falsify. It is tempting to change these hypotheses based on the data acquired here, but that would be akin to putting a cart in front of a horse. The hypotheses as originally set out are listed in @hypo. In the following two sections I will try and see if I managed to bring evidence to falsify these hypotheses. == Travel Time Indicator - The first hypothesis concerned mean travel times, and their ability to capture transit service patterns. The null hypothesis would be that our indicator cannot capture any details about public transit patterns in Heidelberg. Comparing the map of average cell to cell travel times /*TODO reference travel time maps*/ to for example a map of the Heidelberg tram network (@trams), the tram network coincides with a lot of lower travel time cells. + The first hypothesis concerned mean travel times, and their ability to capture transit service patterns. The null hypothesis would be that our indicator cannot capture any details about public transit patterns in Heidelberg. 
Comparing the map of average cell to cell travel times @map_17_tt to for example a map of the Heidelberg tram network (@trams), the tram network coincides with a lot of lower travel time cells. - #figure(image("../figures/2024-02-11_Straßenbahn-Linien-Heidelberg-2019-09.svg"), caption: [rnv Tram service map as of September 2023 from /*TODO reference this right*/], kind: "Map", supplement: "Map") + #figure(image("../figures/2024-02-11_Straßenbahn-Linien-Heidelberg-2019-09.svg"), caption: [rnv Tram service map as of September 2023], kind: "Map", supplement: "Map") Furthermore, the indicator clearly captures elements of centrality within the city, such as outlying districts which need more travel time to other districts than the central nodes of the public transit network. As such the null hypothesis that the travel time indicator cannot reproduce features of Heidelberg transit, can be rejected. The mean travel time is a useful indicator. @@ -19,7 +19,7 @@ At the start of this project I set a few hypotheses to primarily falsify. It is == Planning Indicator The third hypothesis concerns the alignment of the planning indicator with the mean travel time indicator. I hypothesised that these would align fairly well, and central locations with a low average travel time would be also well connected in terms of the need to plan out journeys, or how much difference planning a journey makes to travel times. - For peak hour travel this is definitely not true. At 17:00 local time /*TODO reference right chapter, or image*/, it is weirdly enough not central places that have a low travel time difference, but the places that lie in unpopulated areas without direct access to a stop, but that are about equidistant from two stops that lie on completely differenc line. These, for example, are cells in the forest east of Handschuhsheim. Cells within the populated area display a largely homogenous distribution of travel time differences, acrosse the day. 
Only at edge times, do outlying districts exhibit worse travel time differences than central hubs. As such the hypothesis can be falsified, but in a way that puts the usefulness of this indicator as a proxy for the need to plan into question. + For peak hour travel this is definitely not true. At 17:00 local time @difference_map, it is weirdly enough not central places that have a low travel time difference, but the places that lie in unpopulated areas without direct access to a stop, but that are about equidistant from two stops that lie on completely differenc line. These, for example, are cells in the forest east of Handschuhsheim. Cells within the populated area display a largely homogenous distribution of travel time differences, acrosse the day. Only at edge times, do outlying districts exhibit worse travel time differences than central hubs. As such the hypothesis can be falsified, but in a way that puts the usefulness of this indicator as a proxy for the need to plan into question. This however also falsifies the fourth hypothesis, that the planning indicator will preserve it's pattern across the day. As seen above, this isn't true, and outlying areas experience higher travel time differences specifically before the morning rush-hour and after the evening peak. For nighttime travel the pattern reverses because in outlying areas walking becomes the only viable mode in the set of modes used for routing, and suddenly the schedules matter only at stations that still see some limited night time service. diff --git a/src/chapters/07_discussion.typ b/src/chapters/07_discussion.typ index ed78566..c9cb997 100644 --- a/src/chapters/07_discussion.typ +++ b/src/chapters/07_discussion.typ @@ -46,6 +46,8 @@ I also excluded public transit fare structures as implemented in @conway_off_the_mta. 
While these are a common cost factor used next to travel time @levinson_towards_2020, they are complicated to implement for a simple analysis like this, especially when GTFS schedule data doesn't include the optional fare rules, dataset. And a focus on more detailed model itineraries could help make this a much more robust look into less stereotypical nine to five commute scenarios. Unfortunately point of interest data comes with its own challenges. == Outlook + The analysis provide here, is of course a snapshot in time. Transit schedules change, permanently or due to constructions. These changes make it useful to reasses particular transit systems regularly and might make it interesting to check how transit system changes affect indicators like these. There is already valuable modelling of transit service quality improvements and their effects on percieved transit costs @litman_valuing_2008. + So what can be done to improve the robustness and veracity of these results. First the analysis could be expanded by differenc model scenarios. This could include school locations, which didn't make the cut because good school data with a distinction of school types was not available for Heidelberg, but for Bonn @ministerium_fur_schule_und_bildung_nrw_grunddaten_2016. However, this could also include a myriad of other point of interests for interesting travel scenarios. Beyond that it seems useful to test these results against real world data. This could involve a more heavy usage of spatial statistics to correlate results with other measures like socio-economic data for example @pereira_is_2023. Just to test the assumption of the routing approach however, actual public transit usage data would be fairly useful. Unfortunately, this data rarely exists in a openly accessible way -- especially not in transit fare systems, like Germany, where there is no rigorous system of ticket checks or counting passengers. 
diff --git a/src/figures/basemap.svg b/src/figures/basemap.svg new file mode 100644 index 0000000..8a080ea --- /dev/null +++ b/src/figures/basemap.svg @@ -0,0 +1,270 @@ [270 added lines of SVG markup not recoverable here; surviving metadata text: Lotte; QGIS 3.36.2-Maidenhead; image/svg+xml; 2024-05-03T16:37:36]