Application profile project
This page provides additional details for the "getting started" section on the Home page.
An example project is available here.
A Fingerprinter project is a folder structure organised according to the following set of conventions (a sample layout is sketched after the list):
- configuration.json file comprising the project configuration
- /fragments sub-folder containing the document patterns (e.g. Jinja, Mustache, etc.); it is also the root for the templating engine file loader
- /static sub-folder containing all the JavaScript, CSS, images and other static artifacts used (imported) in the final document; this folder is copied into the output folder
- /data sub-folder containing any data sources, namespace definitions or other content assets
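For orientation, a minimal project could look like the layout below. The sub-folder names and configuration.json follow the conventions above and the data file names match the sample configuration; the fragment and static file names are illustrative only.

<project folder>
├── configuration.json
├── fragments/
│   └── main.html
├── static/
│   └── style.css
└── data/
    ├── alpha.csv
    ├── beta.csv
    └── prefix.csv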
Below is a sample configuration file. If only the alpha dataset is specified, no diff is generated; when a beta dataset is also available, the diff is automatically inserted into the project. The diff section of the configuration is needed only when the beta dataset is present: it indicates that only the structural columns are used to generate the diff, and that those columns are then renamed to carry friendly column titles.
Please note that the paths specified in the configuration file can be relative or absolute; they are resolved by the project builder before use.
The prefix.csv file defines the prefixes used to shorten URIs. If none is provided, default prefixes are generated in the form ns0, ns1, ns2, ..., nsx.
{
    "title": "Dataset Fingerprint Report",
    "type": "report",
    "author": "Eugeniu Costetchi",
    "ns_file": "data/prefix.csv",
    "output": "output",
    "alpha": {
        "file": "data/alpha.csv",
        "title": "Alpha Dataset",
        "description": "Alpha dataset description"
    },
    "beta": {
        "file": "data/beta.csv",
        "title": "Beta Dataset",
        "description": "Beta dataset description"
    },
    "diff": {
        "structural_columns": ["stype", "p", "ootype"],
        "column_titles": ["Domain", "Property", "Range"]
    }
}
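The ns_file entry points to data/prefix.csv, which lists the prefixes used for URI shortening. The exact column layout expected by the fingerprinter is not documented on this page, so the two-column form below (prefix, namespace URI) is an assumption for illustration only; check the example project for the authoritative layout.

prefix,uri
rdf,http://www.w3.org/1999/02/22-rdf-syntax-ns#
skos,http://www.w3.org/2004/02/skos/core#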
This SPARQL query needs to be executed on each dataset you intend to fingerprint. The query results, saved as CSV files, serve as input to the RDF Fingerprinter. Store all the query results in the /data sub-folder.
##########################################################################
# counting SPV plus, the most informative query so far,
# counting prop types as well for SPO types
##########################################################################
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

select distinct ?stype ?p ?ootype ?propType
    (count(distinct ?s) as ?scnt)
    (count(distinct ?o) as ?ocnt)
    (count(*) as ?cnt)
    (min(?sp_star) as ?min_sp)
    (max(?sp_star) as ?max_sp)
    (avg(?sp_star) as ?avg_sp)
where
{
    ?s ?p ?o .
    ?s a ?stype .
    optional {
        ?o a ?otype .
    }
    # per (subject, property) count of outgoing triples
    {
        select distinct ?s ?p (count(*) as ?sp_star)
        {
            ?s ?p [] .
        } group by ?s ?p
    }
    # object type: the subject type for rdf:type statements, otherwise the
    # object's class if it has one, or the literal datatype
    bind( if(?p = rdf:type, ?stype, if(bound(?otype), ?otype, datatype(?o))) as ?ootype )
    # property kind: "object" when the object is a resource, "data" for literals
    bind( if(?p = rdf:type, "object", if(bound(?otype), "object", "data")) as ?propType )
}
group by ?stype ?p ?ootype ?propType
order by ?stype ?p ?ootype ?propType
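If the datasets are published through a SPARQL endpoint, the query can also be executed programmatically and its CSV result saved straight into the /data sub-folder. The sketch below is an illustration only: it assumes the SPARQLWrapper Python library is installed, and the endpoint URL, the query file name and the output file name are placeholders to adapt.

from SPARQLWrapper import SPARQLWrapper, CSV

# Placeholder endpoint of the dataset to fingerprint (adapt to your setup).
ENDPOINT = "http://localhost:3030/alpha/sparql"

# The fingerprint query shown above, saved in a local file
# (the file name is an assumption for this sketch).
FINGERPRINT_QUERY = open("fingerprint_query.rq").read()

sparql = SPARQLWrapper(ENDPOINT)
sparql.setQuery(FINGERPRINT_QUERY)
sparql.setReturnFormat(CSV)  # ask the endpoint to return the result set as CSV

# Write the CSV result where the sample configuration expects the alpha dataset.
with open("data/alpha.csv", "wb") as out_file:
    out_file.write(sparql.query().convert())

Repeat the same for the beta dataset (data/beta.csv) if a diff is wanted.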
If the RDF Fingerprinter is installed, launch it by simply running the fingerprinter command (with no parameters) in the project folder:
cd <project folder>
fingerprinter