Make dates available for SPARQL #3
Comments
2021-02-12 status: high priority |
@mzeinstra what to do for seasons? Start month? End month? Middle month? Both start and end month? |
Just off the top of my head: wouldn't the first day and last day of the
season work?
|
Implementation-wise we can make it all happen. Use-case-wise, the first and last day for seasons works for some cases but not for others. Some examples where it does not work:
I also suspect that using start and end month is better than using the days. The dates also come with precision, and month precision is closer to season than day precision. If we go with multiple values (i.e. start and end month), then perhaps it might even make sense to include all months that are part of the season? |
Another example of a use case that gets messed up when using start and end time, this time applicable to intervals: imagine having the interval 1900-1999 (20th century). If you query for times between 1890 and 1910, you will find the item via its start time. But if you query for all times between 1910 and 1930, you will not find the item at all. It is not clear to me how to solve that, and it might not be possible without adding features to Blazegraph. I can imagine that, for the people running the queries, it might be best if intervals (and possibly seasons and sets) are skipped. It is possible that we make queries less usable for them by including these. Hard to tell. |
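To illustrate the interval problem described above, here is a minimal Python sketch. It is not part of the extension and does not use Blazegraph; `Interval`, `endpoint_match`, and `overlap_match` are hypothetical names introduced only for this example. It compares matching on the exposed start and end values with a true overlap test.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A closed year interval, e.g. 1900-1999 for the 20th century."""
    start: int
    end: int


def endpoint_match(item: Interval, query: Interval) -> bool:
    """True if the item's start OR end year falls inside the query range.

    This mimics querying two separately exposed time values.
    """
    return (query.start <= item.start <= query.end
            or query.start <= item.end <= query.end)


def overlap_match(item: Interval, query: Interval) -> bool:
    """True if the two closed intervals share at least one year."""
    return item.start <= query.end and query.start <= item.end


century = Interval(1900, 1999)
print(endpoint_match(century, Interval(1890, 1910)))  # found via the start year
print(endpoint_match(century, Interval(1910, 1930)))  # missed: no endpoint in range
print(overlap_match(century, Interval(1910, 1930)))   # an overlap test would find it
```

An overlap test needs both endpoints of the same value in one comparison, which is why exposing start and end as independent values is not enough on its own.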
I agree on adding the months in a season; that seems to be the best way forward. For intervals we could add all years in a sequence, but that might not be the best solution in this case. Is your proposal to not expose intervals at all? I assume we will also expose the 'raw' EDTF string? |
Yes. That could be added without all the guessing once a concrete use case materializes, which might well be never.
At the moment not. The EDTF is being translated into standard Wikibase time values, so we get maximum compatibility with tools. I am not sure what the implications of also exposing it in RDF as a string are, and am worried that having some values for P123 be a date and some be a string is a big no-no. So this would need some investigation, which will either take a lot longer than the technical work itself, or uncover the need for a bunch of extra technical work. |
I agree, that is a good start within the limited time. I am afraid that not exposing the EDTF-as-string might also close the route of exporting EDTF data from the platform. Would that be true, or would it only apply to SPARQL? |
I was just talking about SPARQL and RDF. The standard MediaWiki and Wikibase export mechanisms contain the string version. Example of entity JSON with some EDTF strings in it via the web API: http://edtf.wikibase.wiki/w/api.php?action=wbgetentities&ids=P1 So not having the string in SPARQL or RDF does not create an export issue. Indeed, the RDF does not contain the entirety of the native Wikibase time values. |
Ah OK, then that use case doesn't exist anymore, thanks. |
Implementation should be done. Now we will test if the SPARQL queries actually work.
|
I think we can close this task and open more specific tickets if some further tweaks are needed. |
Agreed, we will test this and get back to you with specific tickets. |
@JeroenDeDauw can we use the environment that you set up to test this? http://edtf.wikibase.wiki/wiki/Property:P1 I don't see the query service available there. |
I will update the demo instance later today so you can test queries there, and ping you once that has happened. |
I was pointed to this by @mzeinstra. Not having the EDTF representation in Blazegraph might be an issue for downstream pipelines that ingest from or export to CIDOC-CRM represented data values. CIDOC-CRM is RDF. I am wondering whether this issue could be resolved if EDTF dates are transformed into Wikibase's native date values already within Wikibase, with the EDTF representation maintained as a qualifier on those statements. This would look something like this: https://safsandbox1.wiki.opencura.com/wiki/Item:Q1 Wouldn't using a qualifier maintain the integrity of the EDTF value in the RDF representation as well? |
@andrawaag I thought you suggested creating a 'hidden' qualifier to contain the string and not the other way around. right? |
No, I would not hide this. My point is that conceptually Wikibase consists of two redundant data layers, a relational model and an RDF model. We should not remove this redundancy. The RDF layer is crucial for information retrieval, since querying Wikibase through the API is suboptimal. It is not possible to query Wikibase on both strings and statements. Here the WBQS is key. If there is a discrepancy between the two models, information retrieval will become difficult. There is indeed a difficulty with EDTF if one would like to do sorting, especially if the model captures only the string representation. If the solution is transforming the EDTF time string to an xsd:dateTime value, I would do that in both layers, and the suggestion I made is one possibility. But I would not hide that; on the contrary, hiding it would lead to downstream confusion. |
@JeroenDeDauw I'll have a further discussion with @andrawaag on this. In the meantime, I was wondering if it is possible to do something like this with Blazegraph: PersonX birthDate "~2021-XX-05?"^^xsd:string . That way we could move the responsibility of searching through the dates to the person creating the query. |
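As a rough illustration of what "moving the responsibility to the person creating the query" could mean in practice, here is a minimal Python sketch of string-based matching against an EDTF value. It is only an assumption about how a query author might filter such strings; `edtf_to_regex` and `could_match` are hypothetical helpers, and only the qualifiers `~`, `?`, `%` and the unspecified-digit marker `X` are handled.

```python
import re


def edtf_to_regex(edtf: str) -> re.Pattern:
    """Turn a simple EDTF date string into a regex over ISO dates.

    Assumption: only leading/trailing "~", "?" and "%" qualifiers and the
    unspecified-digit marker "X" are handled; intervals, sets and seasons
    are not, and month/day ranges are not validated.
    """
    core = edtf.strip("~?%")
    pattern = "".join(r"\d" if ch == "X" else re.escape(ch) for ch in core)
    return re.compile(pattern)


def could_match(edtf: str, iso_date: str) -> bool:
    """True if the concrete ISO date is compatible with the EDTF string."""
    return edtf_to_regex(edtf).fullmatch(iso_date) is not None


print(could_match("~2021-XX-05?", "2021-03-05"))
print(could_match("~2021-XX-05?", "2021-03-06"))
```

Matching of this kind supports equality-style checks, but it does not give sorting or range comparisons, which is the part that typed date values provide.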
I am not familiar with RDF or Blazegraph, so can't tell what is appropriate or what will work without prior investigation. What I do know is what Wikibase outputs as RDF for dates:
See the bottom of: view-source:http://edtf.wikibase.wiki/wiki/Special:EntityData/Q1.rdf This includes
It does not include I suspect we can add more fields to the above RDF Description without breaking the query service. So we could add the EDTF string as such. I am not sure it is "correct" to do this from an RDF perspective. And I am unsure to which degree the information will be queryable via SPARQL. So to change things here, I either need a specification of what the desired RDF output is, or I first need to investigate these topics more so I can make an informed recommendation. |
I had a discussion with Andra on this functionality. To have the proper functionality for export in RDF and for presentation in SPARQL, we suggest that you add the StringValue after you add the TimeValues here: WikibaseEdtf/src/Services/RdfBuilder.php Line 31 in 1d9f772
As you say, it will most likely not break Blazegraph, and it will help us present the EDTF string in SPARQL as well as use the TTL and RDF export, e.g. http://edtf.wikibase.wiki/wiki/Special:EntityData/Q2.ttl Would that work, @JeroenDeDauw? After this export I will ask @andrawaag and Jose Labra to verify that it is working as expected. |
huh? |
Do you want to have a call on this today? e.g. at 16:00? |
I've sent you an invite |
This is for #3 Note: I am unsure if this output makes sense RDF/SPARQL wise.
The RDF now also contains the plain EDTF as a string. The above item results in: https://pastebin.com/gGiRPGAb. (Search for (Not deployed on demo system yet) |
I've asked Andra and Jose if this works for their use cases as well. Could you make this available on the demo system, so we can test SPARQL as well? |
Done |
Interesting. I see that it appears in the ttl files e.g. (http://edtf.wikibase.wiki/wiki/Special:EntityData/P1.ttl)
But then I expect the following to work too, right? @andrawaag
|
Jose and I reviewed the TTL file at http://edtf.wikibase.wiki/wiki/Special:EntityData/Q1.ttl. Adding the EDTF value as a string to the RDF representation allows the round-tripping that is crucial to maintaining data integrity inside Wikibase. We do have some concerns, though, regarding the transformation from EDTF to xsd:dateTime. In the current implementation, EDTF is stated as xsd:edtf, which is incorrect: EDTF is not part of XSD. Can this be changed to the applicable namespace? E.g. https://id.loc.gov/datatypes/edtf.html Would it also be possible to document the rules that are used to transform between EDTF and xsd:dateTime? For example, in the above-cited RDF representation of Q1 we see 2006-24, which represents winter of 2006. This seems to be transformed to January, February and December 2006. Is that correct? One could argue that it is actually 2005-12, 2006-1, 2006-2 or 2006-12, 2007-1, 2007-2. If you are from e.g. Australia or Chile, that might be 2006-7, 2006-8, 2006-9. |
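The ambiguity raised above can be made concrete with a small Python sketch. It does not reflect the extension's actual transformation rules, which are not documented in this thread; `parse_season` and `winter_readings` are hypothetical helpers, and the month lists are simply the candidate readings named in the comment. The season codes 21 to 24 are the hemisphere-independent EDTF codes for spring, summer, autumn and winter.

```python
from __future__ import annotations

# EDTF season codes that do not specify a hemisphere.
SEASONS = {21: "spring", 22: "summer", 23: "autumn", 24: "winter"}


def parse_season(value: str) -> tuple[int, str]:
    """Split an EDTF season value such as "2006-24" into (year, season name).

    Assumption: positive four-digit years only.
    """
    year_text, code_text = value.split("-")
    return int(year_text), SEASONS[int(code_text)]


def winter_readings(year: int) -> dict[str, list[tuple[int, int]]]:
    """Return the candidate (year, month) expansions discussed in the thread."""
    return {
        "same calendar year": [(year, 1), (year, 2), (year, 12)],
        "ending in this year": [(year - 1, 12), (year, 1), (year, 2)],
        "starting in this year": [(year, 12), (year + 1, 1), (year + 1, 2)],
        "southern hemisphere (months as given in the comment)": [
            (year, 7), (year, 8), (year, 9)
        ],
    }


year, season = parse_season("2006-24")
for label, months in winter_readings(year).items():
    print(f"{season} {year}, {label}: {months}")
```

Whichever reading is chosen, documenting it would let query authors know which months a season value is expected to match.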
|
It is true that for some it could be problematic, but for the day-within-a-month or month-within-a-year cases it's useful to have at least one date. Here's my use case: I want a database of multi-day events and instances of those so I can show them on maps, lists, etc. I might also want to transfer this data into metadata for other consumers, e.g. JSON-LD Event. In some of these cases, I want to retrieve the most recent or upcoming instance of an annual event, which would be determined via a SPARQL query. Previously I'd intended to use start time statements with end time qualifiers to allow for the possibility of cancellation or rescheduling, which I also want to record and (in some cases) show. The new type might be better for this, because the period/interval itself is the subject of a single claim. However, without a datetime value in SPARQL I'd likely have to do my own sorting through items and parsing of dates (or, possibly, pass in a lot of matching text-mode filters) to get the right one. This could include uncertain dates, e.g. "Eurofurence (likely August 2022)". I'd normally consider this to fall on the start of that month, or (less preferred, but maybe nice to have as well) the end of the month. Presumably it'd use day precision per the example above, or less for month or year intervals. (Failing that, given the distribution of time zones, it might be best to use ~11:00 UTC on the day rather than midnight, to avoid being in a different day in some locations.) |
I don't see this as an open ticket yet.
We discussed that the MVP would be to expose the lowest date of an EDTF value to SPARQL in Wikibase. Depending on what is possible, this could be both the lowest and the highest values.
This is to make the current operators on dates available in SPARQL.
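As a sketch of what "lowest" and "highest" date could mean for simple values, the following Python example computes both bounds. It is an illustration under stated assumptions, not the extension's implementation: `edtf_bounds` is a hypothetical helper that only handles plain `YYYY`, `YYYY-MM` and `YYYY-MM-DD` strings (no qualifiers, seasons, intervals or sets).

```python
from __future__ import annotations

import calendar
from datetime import date


def edtf_bounds(value: str) -> tuple[date, date]:
    """Return the (lowest, highest) calendar date covered by a simple value.

    Assumption: value is YYYY, YYYY-MM or YYYY-MM-DD with a positive year.
    Anything else (including season codes such as 2006-24) raises ValueError.
    """
    parts = [int(part) for part in value.split("-")]
    if len(parts) == 1:
        (year,) = parts
        return date(year, 1, 1), date(year, 12, 31)
    if len(parts) == 2:
        year, month = parts
        if not 1 <= month <= 12:
            raise ValueError(f"unsupported month or season code: {value!r}")
        last_day = calendar.monthrange(year, month)[1]
        return date(year, month, 1), date(year, month, last_day)
    if len(parts) == 3:
        day = date(*parts)
        return day, day
    raise ValueError(f"unsupported value: {value!r}")


lowest, highest = edtf_bounds("2021-02")
print(lowest.isoformat(), highest.isoformat())
```

Exposing only the first element corresponds to the MVP described above; exposing both would allow range-style comparisons on either end.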