Wikidata Query Service/Categories: number of pages/subcategories and HiddenCategory attributes - sparql

Using the gas:service or mediawiki:categoryTree services of the Wikidata API, is it possible somehow to include the mediawiki:pages, mediawiki:subcategories, and mediawiki:HiddenCategory attributes in query results? I see these attributes in the dumps, but have had no luck trying to access them programmatically (with SPARQL or some other API)...

You just need to add your conditions; e.g., for pages, add:
?out mediawiki:pages ?pages .
Result:
{
  "out" : {
    "type" : "uri",
    "value" : "https://en.wikipedia.org/wiki/Category:Fictional_ducks"
  },
  "depth" : {
    "datatype" : "http://www.w3.org/2001/XMLSchema#int",
    "type" : "literal",
    "value" : "1"
  },
  "pages" : {
    "datatype" : "http://www.w3.org/2001/XMLSchema#integer",
    "type" : "literal",
    "value" : "113"
  }
}
They warn that you can't access this through the UI, so you need to encode your query and pass it in the URL: https://query.wikidata.org/bigdata/namespace/categories/sparql?query=&format=json
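As a minimal sketch of that encoding step (Python standard library only; the endpoint is the categories namespace URL quoted above, and the query body is a simplified example, not the full query below):

```python
from urllib.parse import urlencode

# The categories namespace endpoint quoted above (not the regular
# https://query.wikidata.org/sparql endpoint).
ENDPOINT = "https://query.wikidata.org/bigdata/namespace/categories/sparql"

# A simplified example query asking for the page count of one category.
query = """PREFIX mediawiki: <https://www.mediawiki.org/ontology#>
SELECT ?pages WHERE {
  <https://en.wikipedia.org/wiki/Category:Ducks> mediawiki:pages ?pages .
}"""

# urlencode percent-escapes the query text so it can travel in the URL.
url = ENDPOINT + "?" + urlencode({"query": query, "format": "json"})
print(url)
```

The resulting URL can then be fetched with any HTTP client (e.g. urllib.request or the requests library) to get the JSON results shown above.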
Full query:
PREFIX gas: <http://www.bigdata.com/rdf/gas#>
prefix mediawiki: <https://www.mediawiki.org/ontology#>
SELECT * WHERE {
SERVICE gas:service {
gas:program gas:gasClass "com.bigdata.rdf.graph.analytics.BFS" .
gas:program gas:linkType mediawiki:isInCategory .
gas:program gas:traversalDirection "Reverse" .
gas:program gas:in <https://en.wikipedia.org/wiki/Category:Ducks>. # one or more times, specifies the initial frontier.
gas:program gas:out ?out . # exactly once - will be bound to the visited vertices.
gas:program gas:out1 ?depth . # exactly once - will be bound to the depth of the visited vertices.
gas:program gas:maxIterations 8 . # optional limit on breadth first expansion.
}
?out mediawiki:pages ?pages .
} ORDER BY ASC(?depth)

Related

JSON response for a nested SPARQL

I have a structure like the one below. I am currently looking to get a JSON response like this with a SPARQL query. I have tried a few things like CONCAT and STR, but they didn't work out that well for me; it got complicated at the third level.
I have now added two frames that I have tried with JSON-LD framing. It gives the correct output at the first level, but at the second level it fails to expand the data.
:Reference xx1:timestamp "12/15/2020" .
:Reference xx2:logs xxx:log1 .
:log1 rdf:type xxx:Logs .
:log1 xx1:approver "xxx:bob" .
:log1 xx1:requester "xxx:daisy" .
:log1 xx1:timestamp "12/15/2020" .
:log1 xx1:name "log1" .
:log2 rdf:type xxx:Logs .
:log2 xx1:approver "xxx:bob" .
:log2 xx1:requester "xxx.daisy" .
:log2 xx1:timestamp "18/15/2020" .
:log2 xx1:name "log2" .
:bob rdf:type xxx:User .
:bob xx1:name "bob" .
:daisy rdf:type xxx:User .
:daisy xx1:name "daisy" .
Required Response with SPARQL (3 Levels)
[
  {
    "timestamp": "12/15/2020",
    "logs": [
      { "name": "log1", "timestamp": "12/15/2020", "approver": { "name": "bob" }, "requester": { "name": "bob" } },
      { "name": "log2", "timestamp": "12/15/2020", "approver": { "name": "bob" }, "requester": { "name": "bob" } }
    ]
  }
]
JSON-LD FRAMING
FRAME 1
{
  "@context": {
    "XXX": "http://ABC"
  },
  "@type": "xxx:Reference",
  "contains": {
    "@type": "xxx:Log",
    "contains": {
      "@type": "xxx:User"
    }
  }
}
FRAME 2
{
  "@context": {
    "XXX": "http://ABC"
  },
  "@type": "xxx:Reference",
  "contains": {
    "@type": "xxx:Log",
    "hasApprover": { "@type": "xxx:User" },
    "hasRequester": { "@type": "xxx:User" }
  }
}
The output that I get is
Reference [Log 1 {User is expanded}, Log2 {User is not expanded}]
What I need is
Reference [Log 1 {User is expanded}, Log2{User is expanded}]
This JSON-LD frame helps to get the required result
"#context":{
"XXX":"http://ABC"
},
"#type":"xxx:Reference",
"contains":{
"#type":"xxx:Log",
"hasApprover" :{"#type":"xxx:User","#embed": "#always"},
"hasRequester" :{"#type":"xxx:User","#embed": "#always"}
}
}

one to many relationship definition in ttl and SPARQL-Generate nested GENERATE

There are too many things I'm not sure of, so I may not have asked the right question.
I want to use
https://ci.mines-stetienne.fr/sparql-generate/playground.html
to map some JSON data to turtle RDF format.
Here is a working version, with the problematic part commented out:
BASE <http://example.com/>
PREFIX iter: <http://w3id.org/sparql-generate/iter/>
PREFIX fun: <http://w3id.org/sparql-generate/fn/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX crm: <http://www.cidoc-crm.org/cidoc-crm/>
PREFIX gr: <http://purl.org/goodrelations/v1#>
PREFIX cocoon: <https://miranda-zhang.github.io/cloud-computing-schema/v1.0/ontology.ttl>
GENERATE {
[] a cocoon:VM;
rdfs:label ?name;
cocoon:numberOfCores ?cores;
cocoon:hasCPUcapacity[
a cocoon:PhysicalQuantity;
cocoon:numericValue ?gceu;
cocoon:hasUnitOfMeasurement cocoon:gceu;
];
cocoon:hasMemory [
a cocoon:PhysicalQuantity;
cocoon:numericValue ?memory;
cocoon:hasUnitOfMeasurement cocoon:GB;
];
# GENERATE {
# gr:hasPriceSpecification [
# gr:UnitPriceSpecification;
# gr:hasCurrency "USD"^^xsd:string;
# gr:hasCurrencyValue ?regionalPrice^^xsd:float;
# gr:hasRegion cocoon:?region;
# ];
# }
# ITERATOR iter:JSONPath(?gcloudVM,".price") AS ?price .
# .
}
SOURCE <https://raw.githubusercontent.com/miranda-zhang/cloud-computing-schema/master/example/jq/gcloud/vm.json> AS ?source
ITERATOR iter:JSONPath(?source,"$[*]") AS ?gcloudVM
WHERE {
BIND (fun:JSONPath(?gcloudVM,".name") AS ?name)
BIND (fun:JSONPath(?gcloudVM,".cores") AS ?cores)
BIND (fun:JSONPath(?gcloudVM,".memory") AS ?memory)
BIND (fun:JSONPath(?gcloudVM,".gceu") AS ?gceu)
BIND (fun:JSONPath(?price,".price") AS ?regionalPrice)
BIND (fun:JSONPath(?price,".region") AS ?region)
}
The ontology I defined https://miranda-zhang.github.io/cloud-computing-schema/v1.0/ontology.ttl
Assuming it is correct, my problem is the nested GENERATE.
I want to annotate
"price": [
{
"region": "us",
"price": 0.0076
},
{
"region": "us-central1",
"price": 0.0076
}
]
Maybe into something like:
gr:hasPriceSpecification [
gr:UnitPriceSpecification;
gr:hasCurrency "USD"^^xsd:string;
gr:hasCurrencyValue 0.0076^^xsd:float;
gr:hasRegion cocoon:us;
];
gr:hasPriceSpecification [
gr:UnitPriceSpecification;
gr:hasCurrency "USD"^^xsd:string;
gr:hasCurrencyValue 0.0076^^xsd:float;
gr:hasRegion cocoon:us-central1;
];
Full data is at
https://github.com/miranda-zhang/cloud-computing-schema/blob/master/example/jq/gcloud/vm.json
AKSW is right, I got rid of the syntax error.
BASE <https://w3id.org/cocoon/>
PREFIX iter: <http://w3id.org/sparql-generate/iter/>
PREFIX fun: <http://w3id.org/sparql-generate/fn/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX crm: <http://www.cidoc-crm.org/cidoc-crm/>
PREFIX gr: <http://purl.org/goodrelations/v1#>
PREFIX cocoon: <https://raw.githubusercontent.com/miranda-zhang/cloud-computing-schema/master/ontology/1.0/cocoon.ttl>
GENERATE {
<data#{?name}> a cocoon:VM;
rdfs:label ?name;
cocoon:numberOfCores ?cores;
cocoon:hasCPUcapacity[
a cocoon:PhysicalQuantity;
cocoon:numericValue ?gceu;
cocoon:hasUnitOfMeasurement cocoon:gceu;
];
cocoon:hasMemory [
a cocoon:PhysicalQuantity;
cocoon:numericValue ?memory;
cocoon:hasUnitOfMeasurement cocoon:GB;
];
GENERATE {
<data#{?name}> gr:hasPriceSpecification [
a gr:UnitPriceSpecification ;
gr:hasCurrency "USD"^^xsd:string;
gr:hasCurrencyValue "{?regionalPrice}"^^xsd:float;
gr:hasRegion "{?region}"^^xsd:string;
]
}
ITERATOR iter:JSONPath(?gcloudVM,".price[*]") AS ?price
WHERE {
BIND (fun:JSONPath(?price,".price") AS ?regionalPrice)
BIND (fun:JSONPath(?price,".region") AS ?region)
}
.
}
SOURCE <https://raw.githubusercontent.com/miranda-zhang/cloud-computing-schema/master/example/jq/gcloud_vm.json> AS ?source
ITERATOR iter:JSONPath(?source,"$[*]") AS ?gcloudVM
WHERE {
BIND (fun:JSONPath(?gcloudVM,".name") AS ?name)
BIND (fun:JSONPath(?gcloudVM,".cores") AS ?cores)
BIND (fun:JSONPath(?gcloudVM,".memory") AS ?memory)
BIND (fun:JSONPath(?gcloudVM,".gceu") AS ?gceu)
BIND (fun:JSONPath(?price,".price") AS ?regionalPrice)
BIND (fun:JSONPath(?price,".region") AS ?region)
}
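If SPARQL-Generate is not a hard requirement, the same nested iteration (outer loop over VMs, inner loop over each VM's price array) can be sketched in plain Python. The VM name "f1-micro" is a hypothetical stand-in; the price array is copied from the question:

```python
import json

# One VM record shaped like the entries in gcloud_vm.json
# (the name "f1-micro" is assumed for illustration).
vm = json.loads("""
{
  "name": "f1-micro",
  "price": [
    {"region": "us", "price": 0.0076},
    {"region": "us-central1", "price": 0.0076}
  ]
}
""")

# The loop below plays the role of the nested GENERATE with its
# iter:JSONPath(?gcloudVM, ".price[*]") iterator: one bracketed
# price specification per array element.
blocks = []
for spec in vm["price"]:
    blocks.append(
        '<data#{name}> gr:hasPriceSpecification [\n'
        '    a gr:UnitPriceSpecification ;\n'
        '    gr:hasCurrency "USD"^^xsd:string ;\n'
        '    gr:hasCurrencyValue "{price}"^^xsd:float ;\n'
        '    gr:hasRegion "{region}"^^xsd:string ;\n'
        '] .'.format(name=vm["name"], price=spec["price"], region=spec["region"])
    )
turtle = "\n".join(blocks)
print(turtle)
```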

SPARQL: Insert query results as RDF LIST

I want to get results from graph A bound to a specific variable, let's say ?s.
Next I want to insert those results as an RDF list in graph B.
This is my SPARQL Update:
prefix foo:<http://foo.com/>
insert
{
graph <http://B.com>
{
?var foo:propA foo:A ;
foo:propB [ foo:propA foo:A ;
foo:propX ( ?s )
] ;
foo:propC ?o .
}
}
where
{
graph <http://A.com>
{
?s ?p ?o.
BIND(URI(CONCAT("http://example.org/", STRAFTER( STR(?o), "http://someuri.org/"))) as ?var)
}
}
My problem is that it inserts this data:
<http://example.org/varX> foo:propA foo:A ;
foo:propB [ foo:propA foo:A ;
foo:propX ( <http://s1.com> )
],
[
foo:propA foo:A ;
foo:propX ( <http://s2.com> )
] ;
foo:propC <http://oX.com> .
Instead I want it to insert this:
<http://example.org/varX> foo:propA foo:A ;
foo:propB [ foo:propA foo:A ;
foo:propX ( <http://s1.com> <http://s2.com> )
] ;
foo:propC <http://oX.com> .
Is it possible to achieve this result?
Basically, I want to set as the object of the foo:propX predicate an RDF list containing the values bound to variable ?s.
Note: the exact same query executes fine in RDF4J, but strangely causes Blazegraph to throw a
MalformedQueryException: Undefined vocabulary: http://www.w3.org/1999/02/22-rdf-syntax-ns#first
I don't think this is possible using SPARQL only. You'll need to use some of the API functionality to create the RDF collection.
One way to do this is to first construct your graphB as a Model object in memory, and then at the end insert that model in one go. Something along these lines (untested, but this should illustrate the general idea - have a look at the RDF4J documentation and javadoc for more details):
ValueFactory vf = conn.getValueFactory();
TupleQuery query = conn.prepareTupleQuery("SELECT ?s ?o ?var WHERE ...");
List<BindingSet> queryResult = QueryResults.asList(query.evaluate());
// map values of var to values of S
Map<Value, List<Value>> varToS = new HashMap<>();
... // do something clever with the query result to fill this HashMap
// start building our result graph
Model graphB = new TreeModel();
ModelBuilder mb = new ModelBuilder(graphB);
mb.setNamespace("foo", "http://example.org/");
mb.namedGraph("foo:graphB");
for(Value var: varToS.keySet()) {
BNode propBValue = vf.createBNode();
BNode propXValue = vf.createBNode();
mb.subject((Resource) var) // keys are IRIs here, so the cast to Resource is safe
.add("foo:propA", "foo:A")
.add("foo:propB", propBValue)
.subject(propBValue)
.add("foo:propA", "foo:A")
.add("foo:propX", propXValue);
// add the values of ?s for the given v as a collection
RDFCollections.asRDF(varToS.get(var), propXValue, graphB);
}
// insert our created graphB model into the database
conn.add(graphB);
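The elided "do something clever" step is just grouping the result rows by ?var. As a language-neutral sketch (Python, with tuples standing in for the BindingSets and IRI strings standing in for Values):

```python
from collections import defaultdict

# Hypothetical query rows: (var, s) pairs as they would come back
# from the SELECT ?s ?o ?var query above.
rows = [
    ("http://example.org/varX", "http://s1.com"),
    ("http://example.org/varX", "http://s2.com"),
]

# Equivalent of filling the Map<Value, List<Value>> varToS:
# every ?s bound alongside the same ?var lands in one list.
var_to_s = defaultdict(list)
for var, s in rows:
    var_to_s[var].append(s)

# Each list then becomes one RDF collection (the foo:propX object).
print(dict(var_to_s))
```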

how to programmatically get all available information from a Wikidata entity?

I'm really new to Wikidata. I just figured out that Wikidata uses a lot of reification.
Suppose we want to get all information available for Obama. If we are going to do it from DBpedia, we would just use a simple query:
select * where { <http://dbpedia.org/resource/Barack_Obama> ?p ?o . }
This would return all the properties and values with Obama as the subject. Essentially the result is the same as this page: http://dbpedia.org/page/Barack_Obama, except that the query result is in a format I need.
I'm wondering how to do the same thing with Wikidata. This is the Wikidata page for Obama: https://www.wikidata.org/wiki/Q76. Let's say I want all the statements on this page. But almost all the statements on this page are reified in that they have ranks and qualifiers, etc. For example, for the "educated at" part, it not only has the school, but also the "start time" and "end time" and all schools are ranked as normal since Obama is not in these schools anymore.
I could just get all the schools by getting the truthy statements (using https://query.wikidata.org):
SELECT ?school ?schoolLabel WHERE {
wd:Q76 wdt:P69 ?school .
SERVICE wikibase:label {
bd:serviceParam wikibase:language "en" .
}
}
The above query will simply return all the schools.
If I want to get the start time and end time of the school, I need to do this:
SELECT ?school ?schoolLabel ?start ?end WHERE {
wd:Q76 p:P69 ?school_statement .
?school_statement ps:P69 ?school .
?school_statement pq:P580 ?start .
?school_statement pq:P582 ?end .
SERVICE wikibase:label {
bd:serviceParam wikibase:language "en" .
}
}
But the thing is, without looking at the actual page, how would I know that the ?school_statement has pq:P580 and pq:P582, namely the "start time" and "end time"? And it all comes down to a question that how do I get all the information (including reification) from https://www.wikidata.org/wiki/Q76?
Ultimately, I would expect a table like this:
||predicate||object||objectLabel||qualifier1||qualifier1Value||qualifier2||qualifier2Value||...
You should probably go for the Wikidata API (more specifically the wbgetentities module) instead of the SPARQL endpoint:
In your case:
https://www.wikidata.org/w/api.php?action=wbgetentities&format=json&ids=Q76
There you should find all the qualifier data you were looking for. Example with entities.Q76.claims.P69.1:
{ mainsnak:
{ snaktype: 'value',
property: 'P69',
datavalue:
{ value: { 'entity-type': 'item', 'numeric-id': 3273124, id: 'Q3273124' },
type: 'wikibase-entityid' },
datatype: 'wikibase-item' },
type: 'statement',
qualifiers:
{ P580:
[ { snaktype: 'value',
property: 'P580',
hash: 'a1db249baf916bb22da7fa5666d426954435256c',
datavalue:
{ value:
{ time: '+1971-01-01T00:00:00Z',
timezone: 0,
before: 0,
after: 0,
precision: 9,
calendarmodel: 'http://www.wikidata.org/entity/Q1985727' },
type: 'time' },
datatype: 'time' } ],
P582:
[ { snaktype: 'value',
property: 'P582',
hash: 'a065bff95f5cb3026ebad306b3df7587c8daa2e9',
datavalue:
{ value:
{ time: '+1979-01-01T00:00:00Z',
timezone: 0,
before: 0,
after: 0,
precision: 9,
calendarmodel: 'http://www.wikidata.org/entity/Q1985727' },
type: 'time' },
datatype: 'time' } ] },
'qualifiers-order': [ 'P580', 'P582' ],
id: 'q76$464382F6-E090-409E-B7B9-CB913F1C2166',
rank: 'normal' }
Then you might be interested in ways to extract readable results from those responses.
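As a sketch of that extraction, here is a stdlib-only Python snippet that walks the qualifiers of a claim shaped like the JSON above (the sample dict is trimmed from the response shown; nothing here is a Wikidata library API):

```python
# A trimmed claim shaped like entities.Q76.claims.P69[1] above.
claim = {
    "mainsnak": {
        "property": "P69",
        "datavalue": {"value": {"id": "Q3273124"}, "type": "wikibase-entityid"},
    },
    "qualifiers": {
        "P580": [{"datavalue": {"value": {"time": "+1971-01-01T00:00:00Z"}, "type": "time"}}],
        "P582": [{"datavalue": {"value": {"time": "+1979-01-01T00:00:00Z"}, "type": "time"}}],
    },
    "qualifiers-order": ["P580", "P582"],
    "rank": "normal",
}

def qualifier_pairs(claim):
    """Yield (property id, raw datavalue) for each qualifier snak,
    following the display order given in 'qualifiers-order'."""
    for prop in claim.get("qualifiers-order", []):
        for snak in claim.get("qualifiers", {}).get(prop, []):
            yield prop, snak["datavalue"]["value"]

pairs = list(qualifier_pairs(claim))
```

This answers the "how would I know the statement has pq:P580 and pq:P582" part: you don't need to know in advance, you enumerate whatever qualifiers each claim carries.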

SPARQL update with optional parts

Consider the following SPARQL update:
INSERT {
?performance
mo:performer ?performer ; # optional
mo:singer ?singer ; # optional
mo:performance_of [
dc:title ?title ; # mandatory
mo:composed_in [ a mo:Composition ;
mo:composer ?composer # optional
]
]
}
WHERE {}
If I do not provide values (e.g. via Jena's ParameterizedSparqlString.setIri()) for ?performer, ?singer, or ?composer, this update won't insert statements with the corresponding objects, which is as intended.
But how can I suppress [] a mo:Composition as well if ?composer is missing? Creating it in a second INSERT whose WHERE filters on ISIRI(?composer) doesn't seem to be an option, because that INSERT won't know the blank node that has already been created by the first one.
So how can I support this kind of optional parameters in a single SPARQL update? E.g., is there any means for "storing" the blank node between two INSERTs?
The following seems to work when the caller sets ?composition to a blank node if and only if it sets ?composer to an IRI:
if (composer != null) {
parameterizedSparqlString.setIri ("composer" , composer);
parameterizedSparqlString.setParam("composition", NodeFactory.createAnon());
}
INSERT {
?performance
mo:performer ?performer ; # optional
mo:singer ?singer ; # optional
mo:performance_of [
dc:title ?title ; # mandatory
mo:composed_in ?composition ] . # optional
?composition a mo:Composition ;
mo:composer ?composer .
}
WHERE {}
Hats off to @Joshua Taylor for the lead.
I'd still prefer a self-contained version that does not require the additional parameter ?composition (i.e. works without making additional demands on the caller), if that's possible at all.
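One self-contained alternative (my own suggestion, not part of the answer above) is to assemble the update text conditionally before parameterizing it, so the mo:Composition block is simply absent when no composer is supplied. A sketch with hypothetical prefixes elided:

```python
def build_insert(composer_iri=None):
    """Assemble the INSERT template, omitting the mo:Composition
    part entirely when no composer IRI is given (sketch only;
    PREFIX declarations and the other optional parts are elided)."""
    composed_in = ""
    if composer_iri is not None:
        composed_in = (
            " ;\n      mo:composed_in [ a mo:Composition ;"
            "\n        mo:composer <{}> ]".format(composer_iri)
        )
    return (
        "INSERT {{\n"
        "  ?performance mo:performance_of [\n"
        "      dc:title ?title{}\n"
        "  ]\n"
        "}} WHERE {{}}"
    ).format(composed_in)

with_composer = build_insert("http://example.org/composer/1")
without_composer = build_insert()
```

With Jena this would correspond to building the command string before handing it to ParameterizedSparqlString, at the cost of some string assembly in the caller.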