add HAFAS mgate.exe endpoints #7

Merged (35 commits) on Feb 25, 2021

derhuerst (Member) commented Jan 17, 2021

still to do:

  • add required fields
    • data/at/oebb-hafas-mgate.json
    • data/at/vkg-hafas-mgate.json
    • data/at/vvt-hafas-mgate.json
    • data/be/nmbs-sncb-hafas-mgate.json
    • data/ch/sbb-cff-ffs-hafas-mgate.json
    • data/ch/zvv-hafas-mgate.json
    • data/de/avv-hafas-mgate.json
    • data/de/bvg-hafas-mgate.json
    • data/de/db-busradar-nrw-hafas-mgate.json
    • data/de/db-hafas-mgate.json
    • data/de/db-sbahn-muenchen-hafas-mgate.json
    • data/de/hvv-hafas-mgate.json
    • data/de/insa-hafas-mgate.json
    • data/de/invg-hafas-mgate.json
    • data/de/mobil-nrw-hafas-mgate.json
    • data/de/nahsh-hafas-mgate.json
    • data/de/nvv-hafas-mgate.json
    • data/de/rmv-hafas-mgate.json
    • data/de/rsag-hafas-mgate.json
    • data/de/saarvv-hafas-mgate.json
    • data/de/svv-hafas-mgate.json
    • data/de/vbb-hafas-mgate.json
    • data/de/vbn-hafas-mgate.json
    • data/de/vmt-hafas-mgate.json
    • data/de/vos-hafas-mgate.json
    • data/de/vrn-hafas-mgate.json
    • data/de/vsn-hafas-mgate.json
    • data/dk/rejseplanen-hafas-mgate.json
    • data/ie/iarnrod-eireann-hafas-mgate.json
    • data/lu/cfl-hafas-mgate.json
    • data/lu/mobiliteit-lu-hafas-mgate.json
    • data/us/bart-hafas-mgate.json
    • data/us/cmta-hafas-mgate.json
  • add optional fields like attribution
  • add Liechtenstein to coverage lists
  • adapt to Proposal: Hafas product metadata #16

vkrause (Member) commented Jan 18, 2021

Nice! I'd say we can land this even if it's still incomplete, and fill the gaps as the remaining details become available. I have been working on adjusting the coverage format in KPublicTransport, so it should get easier to merge that into here.

Regarding the file names: KPublicTransport uses the ISO 3166-2 region code as an additional namespace level next to the country code, for local operators. This helps to disambiguate e.g. "AVV" in Germany (Aachen vs. Augsburg). That could of course also be done by other means, but IMHO it's something worth thinking about.

derhuerst (Member, Author)

> Regarding the file names: KPublicTransport uses the ISO-3166-2 region code as an additional namespace level next to the country code, for local operators. This helps to disambiguate e.g. "AVV" in Germany (Aachen vs. Augsburg). That could of course also be done by other means, but IMHO something worth thinking about.

It sounds tempting, but quite a lot of endpoints don't cover exactly one county/region; they cover either less or far more (e.g. VRN/RMV, VBB/BVG, NVV). I think we could go arbitrarily fine-grained here, so for now I just used the abbreviation of the endpoint's provider.

derhuerst (Member, Author) commented Jan 19, 2021

@em0lar & @n0emis, I'm tagging you here since you created pyhafas. Once this is merged, kpublictransport & hafas-client will pull their HAFAS API endpoint definitions (or at least the basic details about them) from this repo; others like public-transport-enabler, TripKit & BahnhofsAbfahrten/marudor.de might follow. More background info: #1

vkrause (Member) commented Jan 20, 2021

> It sounds tempting, but there are quite a lot of endpoints that don't cover a single county/region, either less or far more (e.g. VRN/RMV, VBB/BVG, NVV). I think we could go arbitrarily fine-grained here, so I just used the endpoint's provider's abbreviation for now.

Right. I wouldn't see this as coverage information, but purely as a systematic disambiguation. OTOH, I'm only aware of one practically relevant abbreviation collision within a single country, and that's AVV; avv-aachen and avv-augsburg would address that sufficiently as well.
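A minimal sketch of the naming scheme under discussion, assuming the `data/<country>/<provider>-hafas-mgate.json` layout from the file list above and an optional disambiguator (such as a city) appended only when an abbreviation collides within a country. `endpoint_path` and `collisions` are hypothetical helpers, not part of this repository.

```python
from typing import Iterable, Optional, Set, Tuple


def endpoint_path(country: str, provider: str, disambiguator: Optional[str] = None) -> str:
    """Build a path like data/de/avv-aachen-hafas-mgate.json (hypothetical helper)."""
    slug = provider.lower()
    if disambiguator:
        # Only needed when the bare abbreviation collides, e.g. "avv" -> "avv-aachen".
        slug = f"{slug}-{disambiguator.lower()}"
    return f"data/{country.lower()}/{slug}-hafas-mgate.json"


def collisions(entries: Iterable[Tuple[str, str]]) -> Set[Tuple[str, str]]:
    """Return (country, abbreviation) pairs used by more than one provider."""
    seen: Set[Tuple[str, str]] = set()
    dupes: Set[Tuple[str, str]] = set()
    for country, abbr in entries:
        key = (country.lower(), abbr.lower())
        if key in seen:
            dupes.add(key)
        seen.add(key)
    return dupes
```

For example, `collisions([("de", "AVV"), ("de", "AVV"), ("de", "BVG")])` flags `("de", "avv")`, which could then be resolved as `endpoint_path("de", "AVV", "Aachen")` and `endpoint_path("de", "AVV", "Augsburg")`.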

Another thing I noticed while trying to integrate this: we are using `ver` here, but the readme calls this field `version`. Which one do we want? `ver` matches the HAFAS naming, `version` is nicer to read; I don't have a strong preference for either option.

derhuerst (Member, Author)

> I'd not see this as coverage information, but purely as a systematic disambiguation.

Makes sense. Let's go with the systematic distinction then, and store/name endpoints by their provider's "center of operation".

> we are using `ver` here, the readme calls this field `version`, which one do we want? `ver` matches the Hafas naming, `version` is nicer to read, I don't have a strong preference for either option.

I've picked `ver` so far because its possible values are HAFAS-specific anyway, just like with `client` or `auth`.

vkrause added a commit that referenced this pull request Jan 24, 2021

vkrause (Member) commented Jan 24, 2021

> I've picked `ver` so far because its possible values are HAFAS-specific anyways, just like with `client` or `auth`.

OK, I updated #8 to follow that, and fixed the existing documentation and data in #9.

derf (Contributor) commented Jan 25, 2021

Nearly all files in this pull request specify coverage with country codes instead of GeoJSON polygons. While this can simplify both data maintenance and consumption (API consumers wanting to find out whether a location is covered by an endpoint simply need to check whether it lies in one of the listed countries, using an arbitrarily coarse or accurate database of country polygons), it also means that API consumers *must* have such a database to take advantage of the coverage data. Is there such a database we can refer to? If not, I'd prefer to always provide polygons. Extending our spec so we can augment them with country codes might be useful, though.
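To make derf's point concrete: with country-code coverage, answering "is this location covered?" still requires country outlines. A minimal sketch using the even-odd (ray casting) point-in-polygon test; `COUNTRY_POLYGONS` is a hypothetical stand-in for a real country-polygon database, and its single rectangle is a rough placeholder, not an actual border.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (longitude, latitude), same axis order as GeoJSON

# Hypothetical, deliberately coarse outline (a rough rectangle around Luxembourg),
# standing in for a real country-polygon database.
COUNTRY_POLYGONS: Dict[str, List[Point]] = {
    "LU": [(5.73, 49.44), (6.53, 49.44), (6.53, 50.18), (5.73, 50.18)],
}


def point_in_polygon(pt: Point, ring: List[Point]) -> bool:
    """Even-odd rule: count how often a ray from pt towards +x crosses the ring."""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the ray's y coordinate can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def covered_by_countries(pt: Point, coverage: List[str]) -> bool:
    """True if pt lies in any listed country we actually have an outline for."""
    return any(
        point_in_polygon(pt, COUNTRY_POLYGONS[cc])
        for cc in coverage
        if cc in COUNTRY_POLYGONS
    )
```

Note that a country code without an outline in the table silently yields "not covered", which is exactly the dependency on an external database that derf describes.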

Side note: the definition for Germany at https://github.com/isellsoap/deutschlandGeoJSON/blob/master/1_deutschland/4_niedrig.geo.json is already 22 kB. I think we should settle for a much lower level of detail here, e.g. the rectangles already used in https://invent.kde.org/libraries/kpublictransport/-/tree/master/src/lib/networks.

Do you have different preferences or am I missing something here? If not, I'll replace the coverage country codes with polygons in the next days.

derhuerst (Member, Author)

> Is there such a database we can refer to?

I found several, and probably there are many more:

These are in different states of maintenance, and handle disputes & non-UN-recognized countries/regions differently, etc. But generally speaking: Yes, there are.

> If not, I'd prefer to always provide polygons.

IMO coverage polygons imply greater accuracy than what they actually convey. And once you go down this rabbit hole, you'll be very busy specifying exactly what and what not each endpoint covers. Most endpoints' coverage areas don't have "clean" borders, they have many exceptions.

Maintaining a dataset, and later on deciding to extend it, is far easier to do than having a very detailed dataset that ends up unmaintained (and therefore partially wrong). This is why I'm a bit skeptical of adding polygons for endpoints with a large (and very random) coverage area. I definitely see your point that providing polygons would make consumption a lot easier, but maybe we can find a middle ground by linking to some of these databases.

Keep in mind that as long as a coverage area is composed of "complete" Bundesländer (country subdivisions, to be exact), which is often the case, we could just use ISO 3166-2 codes.
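When coverage consists of whole countries or complete subdivisions, a consumer that already knows a location's ISO 3166-2 code can match by prefix without any geometry. A minimal sketch, assuming coverage is a list mixing ISO 3166-1 alpha-2 codes (`DE`) and ISO 3166-2 codes (`DE-BE`); `region_covered` is a hypothetical helper.

```python
from typing import Iterable


def region_covered(coverage: Iterable[str], location_code: str) -> bool:
    """Check whether an ISO 3166-2 location code falls within the coverage list.

    A country code ("DE") covers every subdivision of that country ("DE-BE"),
    while a subdivision code ("DE-BB") only covers itself.
    """
    loc = location_code.upper()
    country = loc.split("-", 1)[0]
    for code in coverage:
        code = code.upper()
        if code == loc or code == country:
            return True
    return False
```

The asymmetry is deliberate: knowing only that a location is "somewhere in DE" is not enough to claim it is covered by an endpoint that lists just `DE-BE` and `DE-BB`.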

> Extending our spec so we can augment them with country codes might be useful, though.

As in "this shape corresponds exactly to that country"? Or as in "this shape lies entirely within that country"?

derf (Contributor) commented Jan 26, 2021 via email

derf added a commit that referenced this pull request Jan 26, 2021:
The distinction between "area" and "region" as well as the definition of "region" were missing. See also #7

vkrause (Member) commented Jan 26, 2021

Right, I see the polygons and region codes as complementary, i.e. IMHO we want both.

  • Polygons are most useful when having to pick the right endpoint when searching by coordinate. Ideally those should be as low-res as reasonably possible, to keep this performing well. And it's not like the usefulness of a transport operator immediately stops just because you are a few hundred meters outside of an imaginary border.
  • Polygons can model things that region codes can't, such as local operators serving an area much smaller than what a code can express.
  • Region codes are useful for endpoint selection whenever no coordinates are known yet (e.g. name-based search when at least the country is known), or for presenting some form of grouped list of endpoints to the user. Transportr does that for example, and KDE Itinerary also has a similar view in its settings, but both are so far constrained to a single country per endpoint.
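The two lookup paths described above could be combined on the consumer side roughly as follows. This is a sketch under assumptions: the `Endpoint` record with its `bbox` (min lon, min lat, max lon, max lat) and `regions` fields is hypothetical and not the repository's schema, and bounding boxes stand in for low-res polygons.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

BBox = Tuple[float, float, float, float]  # (min_lon, min_lat, max_lon, max_lat)


@dataclass
class Endpoint:  # hypothetical record, not the transport-apis schema
    name: str
    bbox: BBox
    regions: List[str] = field(default_factory=list)  # ISO 3166-1/3166-2 codes


def in_bbox(lon: float, lat: float, bbox: BBox) -> bool:
    min_lon, min_lat, max_lon, max_lat = bbox
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat


def candidate_endpoints(
    endpoints: List[Endpoint],
    coord: Optional[Tuple[float, float]] = None,
    country: Optional[str] = None,
) -> List[Endpoint]:
    """Prefer geometry when a (lon, lat) coordinate is known; fall back to region codes."""
    if coord is not None:
        lon, lat = coord
        return [e for e in endpoints if in_bbox(lon, lat, e.bbox)]
    if country is not None:
        cc = country.upper()
        # A region code "DE-BE" belongs to country "DE"; a bare "DE" matches as well.
        return [e for e in endpoints
                if any(r.upper().split("-", 1)[0] == cc for r in e.regions)]
    # Nothing known yet: every endpoint remains a candidate.
    return list(endpoints)
```

For example, with `Endpoint("vbb", (11.2, 51.3, 15.0, 53.6), ["DE-BE", "DE-BB"])` in the list (rough, illustrative values), a coordinate in Berlin selects it by geometry, while `country="DE"` selects it by region code before any coordinate is known.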

Maintenance overhead for the extra region codes should be minimal I think, the part that usually takes time is to determine the coverage area and create the polygons. The region codes might even simplify this by decoupling those two steps, like it happened here.

derhuerst (Member, Author) commented Jan 26, 2021

> Polygons are most useful when having to pick the right endpoint when searching by coordinate. Ideally those should be as low-res as reasonably possible, to keep this performing well.

> Polygons can model things that region codes can't, such as local operators serving an area much smaller than what a code can express.

👍

I see the value of having polygons for consumption, but I think we should try to have them in a reasonably high resolution:

  • AFAIK there's no widely used machine-readable way to denote the precision of GeoJSON data. People consuming the polygons might make assumptions that they shouldn't make because it's low-res. There is a relevant difference between covered areas and "almost covered" areas (see below).
  • File size and processing time are no issue, are they? I assume transport-apis will mostly be used ahead-of-time (a.k.a. pre-processed), not during runtime.

> And it's not like the usefulness of a transport operator immediately stops just because you are a few hundred meters outside of an imaginary border.

For people querying a connection from Gubin (in Poland, close to the Polish-German border) to Berlin, it's relevant to know whether there's a local bus connecting the center of Gubin with Guben (on the German side, about 2 km away, where the RE1 stops).

I might have an extreme opinion on this: With transit routing, in certain situations (e.g. any form of accessibility requirement), inaccurate connections become borderline incorrect.

Of course, we can't make all routing responses completely correct (including accessibility info, etc.), but we can at least handle the situation in the least unhelpful way:

Look, the Deutsche Bahn service doesn't seem to cover your area. There might be an (accessible) local bus to Guben, or you might have to get to Guben on your own.

Once we support Krosno Odrzańskie County public transport (imaginary example), a consuming app could show:

Neither Krosno Odrzańskie County public transport nor Deutsche Bahn knows how to get from Gubin to Berlin, but the former knows that there's a bus to Guben, and the latter says that there's a train "RE1" to Berlin.

Maybe I'm too ambitious here, and cross-region/-country public transportation is not ready to be that user-friendly yet.

vkrause (Member) commented Jan 26, 2021

> File size and processing time are no issue, are they? I assume `transport-apis` will mostly be used _ahead-of-time_ (a.k.a. pre-processed), not during runtime.

Right, high-res polygons in the source data plus pre-processing would work for KPublicTransport; even just bounding boxes, as the extreme low-res case, would already help us.
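As a sketch of that pre-processing step: reducing a high-res coverage geometry to a bounding box ahead of time. It assumes coverage is stored as a GeoJSON `Polygon` or `MultiPolygon` (RFC 7946, positions in longitude, latitude order) and does not handle areas crossing the antimeridian; the function names are hypothetical.

```python
from typing import Any, Dict, Iterator, Tuple


def _positions(geometry: Dict[str, Any]) -> Iterator[Tuple[float, float]]:
    """Yield every (lon, lat) position of a GeoJSON Polygon or MultiPolygon."""
    gtype = geometry["type"]
    if gtype == "Polygon":
        polygons = [geometry["coordinates"]]
    elif gtype == "MultiPolygon":
        polygons = geometry["coordinates"]
    else:
        raise ValueError(f"unsupported geometry type: {gtype}")
    for polygon in polygons:
        for ring in polygon:  # outer ring first, then holes (holes never widen the box)
            for lon, lat, *_ in ring:  # ignore an optional altitude value
                yield lon, lat


def bounding_box(geometry: Dict[str, Any]) -> Tuple[float, float, float, float]:
    """Return (min_lon, min_lat, max_lon, max_lat) for the geometry."""
    lons, lats = zip(*_positions(geometry))
    return (min(lons), min(lats), max(lons), max(lats))
```

Because a bounding box always contains the polygon it is derived from, this reduction can only err on the side of "too large", never "too small".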

> I might have an extreme opinion on this: With transit routing, in certain situations (e.g. any form of accessibility requirement), inaccurate connections become borderline incorrect.

> […]

Right, but isn't all that independent of whether you have a high-res polygon or, say, just a bounding box? For implementing something like this, it would seem important to know which operators need to be checked for possible routes along the way: DB and the local operator in your example. Both would still be selected when only having bounding boxes; you might however consider more operators because their bounding boxes cover areas they don't actually serve. Or would you use the high-res polygon to determine the nearest location reachable by one operator client-side?

derhuerst (Member, Author)

> I might have an extreme opinion on this: With transit routing, in certain situations (e.g. any form of accessibility requirement), inaccurate connections become borderline incorrect.

[…]

> Right, but isn't all that independent of whether you have a high res polygon or say just a bounding box? For implementing something like this it would seem important to know which operators need to be checked for possible routes along the way, DB and the local operator in your example. Both would still be selected when only having bounding boxes, you might however consider more operators due to their bounding boxes covering areas not actually supported. Or would you use the high-res polygon to determine the nearest location reachable by one operator client-side?

The problem here is that most API endpoints don't tell (or wouldn't even know themselves!) whether they know about all local transit.

For example: given a computed connection that includes 10 km of walking, I can't tell (and often the endpoint itself couldn't either) whether 10 km of walking is indeed the best route given all local transit, or whether it just appears to be the best because the endpoint only knows about a few regional trains.

This is where exact coverage areas can help, defined with background knowledge that we've managed to obtain (e.g. by checking if there's local transit that the endpoint doesn't know about).


We're drifting a little far off here though. In general, I'm in favor of adding coverage polygons, as long as we don't make them less accurate on purpose so that the statement is borderline "wrong".

vkrause (Member) commented Jan 26, 2021

> We're drifting a little far off here though. In general, I'm in favor of adding coverage polygons, as long as we don't make them less accurate on purpose so that the statement is borderline "wrong".

That works for KPublicTransport. Our only constraint there is that the polygon contains the covered area, i.e. it should rather be too large than too small; an exact high-res polygon would satisfy that as well.

derhuerst (Member, Author)

With my pan-european-public-transport prototype, I have more or less the same requirements, and I have used `simplify(buffer(coveredRegion))`.
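`simplify(buffer(coveredRegion))` is typically done with a GIS library (Shapely, for instance, offers `buffer()` and `simplify()` on its geometries). As a dependency-free stand-in with the property vkrause asks for (the result contains the covered area, so it errs on the side of too large), here is a sketch that swaps in a different technique: the convex hull of the known stop coordinates (Andrew's monotone chain), with each hull vertex pushed a fixed margin away from the hull's vertex centroid as a crude substitute for buffering. This is not what either prototype does, and the default margin is an arbitrary assumption.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (lon, lat)


def _cross(o: Point, a: Point, b: Point) -> float:
    # z-component of (a - o) x (b - o); > 0 means a counter-clockwise turn
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) < 3:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other one.
    return lower[:-1] + upper[:-1]


def coarse_coverage(stops: List[Point], margin: float = 0.05) -> List[Point]:
    """Convex hull of the stops, each vertex pushed `margin` degrees outward.

    The vertex centroid lies strictly inside a non-degenerate hull, and moving
    every vertex outward along its ray from that point keeps each original fan
    triangle inside the new one, so the result still contains every stop.
    """
    hull = convex_hull(stops)
    if len(hull) < 3:
        raise ValueError("need at least three non-collinear stops")
    cx = sum(x for x, _ in hull) / len(hull)
    cy = sum(y for _, y in hull) / len(hull)
    out: List[Point] = []
    for x, y in hull:
        dist = math.hypot(x - cx, y - cy)
        scale = (dist + margin) / dist
        out.append((cx + (x - cx) * scale, cy + (y - cy) * scale))
    return out
```

The margin is in degrees, so it is a crude, latitude-dependent approximation of a metric buffer; a GIS library would handle that properly and could also follow concave outlines more closely than a convex hull.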

derhuerst force-pushed the hafas-based branch 7 times, most recently from c730684 to e996ea5, February 17, 2021 16:20
derhuerst marked this pull request as ready for review February 17, 2021 16:22