# GTFS Processing
The configuration data for a system, including routes, stops, trips, and schedule times, is read from data in the GTFS format. The key additions of the Transitime system are 1) the ability to use "supplemental" GTFS files and 2) the ability to modify titles.
GTFS is inherently voluminous because it defines details for each trip. To reduce the amount of information, the trips are processed into trip patterns so that more data is reused in the database.
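The reduction can be pictured as grouping trips that share the same ordered list of stops. The following is a minimal sketch, assuming a trip pattern is identified by its route and stop sequence; the function name, the tuple layout, and the grouping criteria are hypothetical and may differ from what Transitime actually does.

```python
from collections import defaultdict


def group_into_trip_patterns(trips):
    """Group trips by (route_id, ordered stop_ids).

    `trips` is an iterable of (trip_id, route_id, stop_ids) tuples.
    Returns a dict mapping each pattern key to the trip_ids sharing it.
    Assumption: route plus stop sequence is enough to identify a pattern.
    """
    patterns = defaultdict(list)
    for trip_id, route_id, stop_ids in trips:
        key = (route_id, tuple(stop_ids))  # tuples are hashable, lists are not
        patterns[key].append(trip_id)
    return dict(patterns)


# Hypothetical sample data: three trips, two distinct stop sequences.
trips = [
    ("t1", "14", ["s1", "s2", "s3"]),
    ("t2", "14", ["s1", "s2", "s3"]),
    ("t3", "14", ["s1", "s3"]),
]
patterns = group_into_trip_patterns(trips)
```

With this sample, the stop sequence is stored twice instead of three times, and the saving grows with the number of trips per pattern.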
Most GTFS files can have supplemental files that contain additional information or corrections. Ideally the agency would fix its GTFS data, but that is often not realistic. Hand editing the agency's GTFS files is not supportable, especially for agencies that update their data frequently. With supplemental files one can correct just the key problems, and the corrections continue to be applied even when new GTFS data is processed.
The data is matched using a key. For the stops.txt file the key is the stop_id column: when the stop_id of a supplemental row matches that of a row in the main stops.txt file, the supplemental data is used.
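The key-based matching can be sketched as follows. This is an illustration only, under two assumptions not stated in the text: non-empty supplemental values override the main values column by column, and supplemental rows with no matching key are ignored. The function name and sample data are hypothetical.

```python
import csv
import io


def merge_supplement(main_rows, supplement_rows, key):
    """Overlay supplemental rows onto main rows that share the same key."""
    merged = {row[key]: dict(row) for row in main_rows}
    for sup in supplement_rows:
        target = merged.get(sup[key])
        if target is None:
            continue  # assumption: unmatched supplemental rows are skipped
        for column, value in sup.items():
            if value != "":  # assumption: empty cells do not override
                target[column] = value
    return list(merged.values())


# Hypothetical main and supplemental stops.txt contents.
main_csv = "stop_id,stop_name,stop_lat\n100,4th@Main,37.78\n200,HWARD ST,37.79\n"
sup_csv = "stop_id,stop_name\n100,4th St & Main St\n"

main_rows = list(csv.DictReader(io.StringIO(main_csv)))
sup_rows = list(csv.DictReader(io.StringIO(sup_csv)))
stops = merge_supplement(main_rows, sup_rows, key="stop_id")
```

Here only the name of stop 100 changes; its latitude and all of stop 200 come through from the main file untouched.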
Additional columns are defined for certain GTFS files, such as stops.txt. A supplemental stops.txt file can also define the columns `deleteFromRoutes` and `deleteFirstStopFromRoutes` in order to delete a stop from the system.
One specifies a directory containing supplemental GTFS files using the `-supplementDir /GTFS/sfmta/supplement` command line option when processing the GTFS data.
Often the names or titles specified in GTFS are not intended for end users. The titles might be SHOUTED, peculiar symbols such as '@' might be used as in "4th@Main", white space might be missing, and of course there are misspellings. Transitime handles these situations with a title replacement text file that contains regular expressions to be applied to the titles. When processing the GTFS data one uses the `-regexReplaceFile file.txt` option to specify the file containing the regular expressions. Each line in the file contains a regular expression to match, then the separator `=>`, and then the replacement text. So to replace Hward with Howard one would use `Hward=>Howard`.
The following is data used for sfmta:
-- For sf-muni. This file contains filters for modifying titles.
-- The first column in this file is a regular expression and the
-- second column is the corresponding replacement text. The
-- delimiter between the columns is "=>".
-- To fix O'shanessy
O's=>O'S
O'f=>O'F
Ccsf=>CCSF
Ucsf=>UCSF
Ggnra=>GGNRA
Mcl=>McL
Mca=>McA
Hward=>Howard
-- Following could be part of a longer word so need to use
-- word boundary \b
\bBart\b=>BART
\bUs\b=>US
-- Following two are for making sure have spaces around the '&' char
-- Uses the fancy lookahead and lookbehind features described
-- at http://www.regular-expressions.info/lookaround.html
&(?! )=>&
(?<! )&=> &
Titles can also be changed manually by using the supplemental GTFS files described above.
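The replacement-file format described above can be illustrated with a short sketch. It assumes that lines starting with `--` are comments (as in the sfmta file), that each rule is split on the first `=>`, and that rules are applied in file order. It uses Python's `re` module for illustration; the engine Transitime uses may differ in details, and the function names here are hypothetical.

```python
import re

SEPARATOR = "=>"


def load_rules(text):
    """Parse 'regex=>replacement' lines, skipping blanks and '--' comments."""
    rules = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("--"):
            continue
        # Split on the first separator only; keep spaces in the replacement.
        pattern, replacement = line.split(SEPARATOR, 1)
        rules.append((re.compile(pattern), replacement))
    return rules


def apply_rules(title, rules):
    """Apply every rule to the title, in the order the rules were loaded."""
    for pattern, replacement in rules:
        # A function replacement inserts the text literally, so any
        # backslashes in it are not interpreted as escapes.
        title = pattern.sub(lambda _match, r=replacement: r, title)
    return title


# Three rules copied from the sfmta file shown above.
rules_text = r"""-- sample rules
Hward=>Howard
\bBart\b=>BART
(?<! )&=> &
"""
rules = load_rules(rules_text)
print(apply_rules("Hward St&Bart Station", rules))
```

Note that the replacement text is deliberately not stripped, because a rule such as `(?<! )&=> &` depends on the leading space in its replacement.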