add HAFAS mgate.exe endpoints #7
Nice! I'd say we can land this even if still incomplete and fill the gaps as the remaining details become available. I have been working on adjusting the coverage format in KPublicTransport, so it should get easier to merge that into here. Regarding the file names: KPublicTransport uses the ISO-3166-2 region code as an additional namespace level next to the country code, for local operators. This helps to disambiguate e.g. "AVV" in Germany (Aachen vs. Augsburg). That could of course also be done by other means, but IMHO something worth thinking about.
It sounds tempting, but there are quite a lot of endpoints that don't cover a single county/region, either less or far more (e.g. VRN/RMV, VBB/BVG, NVV). I think we could go arbitrarily fine-grained here, so I just used the endpoint's provider's abbreviation for now.
@em0lar & @n0emis I will just tag you here, since you have created pyhafas. Once this is merged, |
Force-pushed: ac4aa53 to 12efb66
Right. I'd not see this as coverage information, but purely as a systematic disambiguation. OTOH I'm only aware of one practically relevant abbreviation collision within one country, and that's AVV. Another thing I noticed while trying to integrate this: we are using |
Makes sense. Let's go with the systematic distinction then and store/name endpoints by their provider's "center of operation".
I've picked |
Nearly all files in this pull request specify coverage with country codes instead of GeoJSON polygons. While this can simplify both data maintenance and consumption (API consumers wanting to find out if a location is covered by an endpoint simply need to check whether it is in one of the listed countries, using an arbitrarily coarse/accurate database of country polygons), it also means that API consumers -must- have such a database to take advantage of coverage data. Is there such a database we can refer to? If not, I'd prefer to always provide polygons. Extending our spec so we can augment them with country codes might be useful, though. Side note: the definition for Germany at https://github.com/isellsoap/deutschlandGeoJSON/blob/master/1_deutschland/4_niedrig.geo.json is already 22kB. I think we should settle for a much lower level of detail here, e.g. the rectangles already used in https://invent.kde.org/libraries/kpublictransport/-/tree/master/src/lib/networks. Do you have different preferences or am I missing something here? If not, I'll replace the coverage country codes with polygons in the next few days.
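To make the consumer-side check described above concrete, here is a minimal Python sketch of testing a location against a list of coverage country codes. The `COUNTRY_RINGS` table, its rectangle coordinates, and the function names are assumptions made for illustration only; a real consumer would load country outlines from whichever country-polygon database it chooses.

```python
from typing import Dict, Iterable, Sequence, Tuple

Point = Tuple[float, float]  # (longitude, latitude), as in GeoJSON
Ring = Sequence[Point]       # closed ring: first point equals last point

# Hypothetical, deliberately coarse country outlines (plain rectangles).
# These stand in for a real country-polygon database.
COUNTRY_RINGS: Dict[str, Ring] = {
    "DE": [(5.9, 47.3), (15.0, 47.3), (15.0, 55.1), (5.9, 55.1), (5.9, 47.3)],
    "LU": [(5.7, 49.4), (6.5, 49.4), (6.5, 50.2), (5.7, 50.2), (5.7, 49.4)],
}


def point_in_ring(point: Point, ring: Ring) -> bool:
    """Ray casting: toggle on every ring edge crossed by a ray going east."""
    x, y = point
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        # Only edges that straddle the point's latitude can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def endpoint_covers(coverage_countries: Iterable[str], point: Point) -> bool:
    """True if the point lies in any listed country we have an outline for."""
    return any(
        code in COUNTRY_RINGS and point_in_ring(point, COUNTRY_RINGS[code])
        for code in coverage_countries
    )


if __name__ == "__main__":
    print(endpoint_covers(["DE"], (13.4, 52.5)))
    print(endpoint_covers(["LU"], (13.4, 52.5)))
```

The accuracy of the answer depends entirely on the outlines supplied: with rectangles as coarse as these, neighbouring countries overlap, which is the trade-off being discussed here.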
I found several, and probably there are many more:
These are in different states of maintenance, and handle disputes & non-UN-recognized countries/regions differently, etc. But generally speaking: Yes, there are.
IMO coverage polygons imply greater accuracy than what they actually convey. And once you go down this rabbit hole, you'll be very busy specifying exactly what and what not each endpoint covers. Most endpoints' coverage areas don't have "clean" borders, they have many exceptions. Maintaining a dataset, and later on deciding to extend it, is far easier to do than having a very detailed dataset that ends up unmaintained (and therefore partially wrong). This is why I'm a bit skeptical of adding polygons for endpoints with a large (and very random) coverage area. I definitely see your point that providing polygons would make consumption a lot easier, but maybe we can find a middle ground by linking to some of these databases. Keep in mind that, as long as a coverage area is composed of "complete" Bundesländer (country subdivisions to be exact) – which is often the case –, we could just use ISO 3166-2 codes.
As in "this shape corresponds exactly to that country"? Or as in "this shape lies entirely within that country"?
On Tue, Jan 26, 2021 at 06:24:56AM -0800, Jannis Redmann wrote:
> IMO coverage polygons *imply* greater accuracy than what they actually convey. And once you go down this rabbit hole, you'll be *very busy* specifying exactly what and what not each endpoint covers. Most endpoints' coverage areas don't have "clean" borders, they have many exceptions.
> Maintaining a dataset, and later on deciding to extend it, is far easier to do than having a very detailed dataset that ends up unmaintained (and therefore partially wrong). This is why I'm a bit skeptical of adding polygons for endpoints with a large (and very random) coverage area. I definitely see your point that providing polygons would make consumption a lot easier, but maybe we can find a middle ground by linking to some of these databases.
> Keep in mind that, as long as a coverage area is composed of "complete" *Bundesländer* (country subdivisions to be exact) – which is often the case –, we could just use [ISO 3166-2 codes](https://www.npmjs.com/package/iso3166-2-db).
The reason I brought this up is that (as I just noticed) I forgot to add
coverage.region to schema.json and also didn't think of looking at the
definition in readme.md when jsonschema complained about missing coverage
polygons in this PR. So I was under the impression that we only allowed geojson
polygons for coverage regions, and not ISO-3166-1/2 codes. Oops :)
Thank you for the insights regarding polygons vs country codes -- you're
absolutely right. I agree that using ISO 3166-1/2 where possible is best.
>> Extending our spec so we can augment them with country codes might be useful, though.
> As in "this shape corresponds exactly to that country"? Or as in "this shape lies entirely within that country"?
I meant adding region codes to the spec -- which is a moot point, as they are
already part of it.
I'll update schema.json right away.
The distinction between "area" and "region" as well as the definition of "region" were missing. See also #7
Right, I see the polygons and region codes as complementary, i.e. we want both IMHO.
Maintenance overhead for the extra region codes should be minimal, I think; the part that usually takes time is determining the coverage area and creating the polygons. The region codes might even simplify this by decoupling those two steps, as happened here.
👍 I see the value of having polygons for consumption, but I think we should try to have them in a reasonably high resolution:
For people querying a connection from Gubin (in Poland, close to the Polish-German border) to Berlin, it's relevant to know if there's a local bus connecting the center of Gubin with Guben (German side, about 2km away, where the RE1 stops). I might have an extreme opinion on this: With transit routing, in certain situations (e.g. any form of accessibility requirement), inaccurate connections become borderline incorrect. Of course, we can't make all routing responses completely correct (including accessibility info, etc.), but we can at least handle the situation in the least unhelpful way:
Once we support Krosno Odrzańskie County public transport (imaginary example), a consuming app could show:
Maybe I'm too ambitious here, and cross-region/-country public transportation is not ready to be that user-friendly yet.
Right, high-res polygons in the source data and pre-processing would work for KPublicTransport, even just bounding boxes as the extreme low-res case would help us already.
Right, but isn't all that independent of whether you have a high-res polygon or, say, just a bounding box? For implementing something like this it would seem important to know which operators need to be checked for possible routes along the way, DB and the local operator in your example. Both would still be selected when only having bounding boxes; you might however consider more operators, due to their bounding boxes covering areas not actually supported. Or would you use the high-res polygon to determine the nearest location reachable by one operator client-side?
The problem here is that most API endpoints don't tell (or wouldn't even know themselves!) if they know all local transit. For example: given a computed connection that includes 10km of walking, I can't tell (and often even they wouldn't be able to tell) whether 10km of walking is indeed the best route given all local transit, or whether it just appears to be the best because the endpoint only knows about a few regional trains. This is where exact coverage areas can help, defined with background knowledge that we've managed to obtain (e.g. by checking if there's local transit that the endpoint doesn't know about). We're drifting a little far off here though. In general, I'm in favor of adding coverage polygons, as long as we don't make them less accurate on purpose so that the statement is borderline "wrong".
That works for KPublicTransport; our only constraint there is that the polygon contains the covered area, i.e. that it is too large rather than too small. An exact high-res polygon would satisfy that as well.
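As a rough illustration of the bounding-box case discussed above, the following Python sketch derives a bounding box from a coverage ring and selects every endpoint whose box contains a query point. The endpoint names (`national-rail`, `local-bus`), the coordinates, and the function names are hypothetical and not taken from this repository or from KPublicTransport.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]               # (longitude, latitude)
BBox = Tuple[float, float, float, float]  # (min_lon, min_lat, max_lon, max_lat)

# Hypothetical coverage rings, for illustration only.
COVERAGE_RINGS: Dict[str, Sequence[Point]] = {
    "national-rail": [(5.9, 47.3), (15.0, 47.3), (15.0, 55.1), (5.9, 55.1), (5.9, 47.3)],
    "local-bus": [(14.6, 51.8), (15.2, 51.9), (14.9, 52.2), (14.6, 51.8)],
}


def bounding_box(ring: Sequence[Point]) -> BBox:
    """Smallest axis-aligned rectangle containing the ring (never too small)."""
    lons = [lon for lon, _ in ring]
    lats = [lat for _, lat in ring]
    return (min(lons), min(lats), max(lons), max(lats))


def bbox_contains(bbox: BBox, point: Point) -> bool:
    min_lon, min_lat, max_lon, max_lat = bbox
    lon, lat = point
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat


def candidate_endpoints(point: Point) -> List[str]:
    """Endpoints worth querying for this point, judged by bounding box alone."""
    return sorted(
        name
        for name, ring in COVERAGE_RINGS.items()
        if bbox_contains(bounding_box(ring), point)
    )


if __name__ == "__main__":
    print(candidate_endpoints((14.73, 51.95)))
    print(candidate_endpoints((10.0, 50.0)))
```

Because a bounding box is always a superset of the polygon it is derived from, this selection can include endpoints that do not actually serve the point, but it cannot miss one whose polygon does contain it.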
With my |
Force-pushed: ff8b26e to 4aba333
Force-pushed: c730684 to e996ea5
Previously, only a single Polygon was accepted.
It'd be odd for these two endpoints to have different attributes
Force-pushed: 5643926 to 63e3833
still to do:
data/at/oebb-hafas-mgate.json
data/at/vkg-hafas-mgate.json
data/at/vvt-hafas-mgate.json
data/be/nmbs-sncb-hafas-mgate.json
data/ch/sbb-cff-ffs-hafas-mgate.json
data/ch/zvv-hafas-mgate.json
data/de/avv-hafas-mgate.json
data/de/bvg-hafas-mgate.json
data/de/db-busradar-nrw-hafas-mgate.json
data/de/db-hafas-mgate.json
data/de/db-sbahn-muenchen-hafas-mgate.json
data/de/hvv-hafas-mgate.json
data/de/insa-hafas-mgate.json
data/de/invg-hafas-mgate.json
data/de/mobil-nrw-hafas-mgate.json
data/de/nahsh-hafas-mgate.json
data/de/nvv-hafas-mgate.json
data/de/rmv-hafas-mgate.json
data/de/rsag-hafas-mgate.json
data/de/saarvv-hafas-mgate.json
data/de/svv-hafas-mgate.json
data/de/vbb-hafas-mgate.json
data/de/vbn-hafas-mgate.json
data/de/vmt-hafas-mgate.json
data/de/vos-hafas-mgate.json
data/de/vrn-hafas-mgate.json
data/de/vsn-hafas-mgate.json
data/dk/rejseplanen-hafas-mgate.json
data/ie/iarnrod-eireann-hafas-mgate.json
data/lu/cfl-hafas-mgate.json
data/lu/mobiliteit-lu-hafas-mgate.json
data/us/bart-hafas-mgate.json
data/us/cmta-hafas-mgate.json
attribution
coverage
lists