how to programmatically get all available information from a Wikidata entity?


I'm really new to Wikidata. I just figured out that Wikidata uses a lot of reification.
Suppose we want to get all the information available for Obama. If we were doing it with DBpedia, we would just use a simple query:
select * where {<http://dbpedia.org/resource/Barack_Obama> ?p ?o .}
This would return all the properties and values with Obama as the subject. Essentially the result is the same as this page: http://dbpedia.org/page/Barack_Obama, except that the query result is in the format I need.
I'm wondering how to do the same thing with Wikidata. This is the Wikidata page for Obama: https://www.wikidata.org/wiki/Q76. Let's say I want all the statements on this page. But almost all the statements on this page are reified, in that they have ranks, qualifiers, etc. For example, the "educated at" part has not only the school but also a "start time" and an "end time", and all schools are ranked as normal since Obama is no longer at these schools.
I could just get all the schools by getting the truthy statements (using https://query.wikidata.org):
SELECT ?school ?schoolLabel WHERE {
wd:Q76 wdt:P69 ?school .
SERVICE wikibase:label {
bd:serviceParam wikibase:language "en" .
}
}
The above query will simply return all the schools.
If I want to get the start time and end time of the school, I need to do this:
SELECT ?school ?schoolLabel ?start ?end WHERE {
wd:Q76 p:P69 ?school_statement .
?school_statement ps:P69 ?school .
?school_statement pq:P580 ?start .
?school_statement pq:P582 ?end .
SERVICE wikibase:label {
bd:serviceParam wikibase:language "en" .
}
}
But without looking at the actual page, how would I know that ?school_statement has pq:P580 and pq:P582, namely "start time" and "end time"? It all comes down to one question: how do I get all the information (including reification) from https://www.wikidata.org/wiki/Q76?
Ultimately, I would expect a table like this:
||predicate||object||objectLabel||qualifier1||qualifier1Value||qualifier2||qualifier2Value||...

You should probably use the Wikidata API (more specifically the wbgetentities module) instead of the SPARQL endpoint.
In your case:
https://www.wikidata.org/w/api.php?action=wbgetentities&format=json&ids=Q76
You should find all the qualifier data you were looking for there. For example, entities.Q76.claims.P69.1:
{ mainsnak:
{ snaktype: 'value',
property: 'P69',
datavalue:
{ value: { 'entity-type': 'item', 'numeric-id': 3273124, id: 'Q3273124' },
type: 'wikibase-entityid' },
datatype: 'wikibase-item' },
type: 'statement',
qualifiers:
{ P580:
[ { snaktype: 'value',
property: 'P580',
hash: 'a1db249baf916bb22da7fa5666d426954435256c',
datavalue:
{ value:
{ time: '+1971-01-01T00:00:00Z',
timezone: 0,
before: 0,
after: 0,
precision: 9,
calendarmodel: 'http://www.wikidata.org/entity/Q1985727' },
type: 'time' },
datatype: 'time' } ],
P582:
[ { snaktype: 'value',
property: 'P582',
hash: 'a065bff95f5cb3026ebad306b3df7587c8daa2e9',
datavalue:
{ value:
{ time: '+1979-01-01T00:00:00Z',
timezone: 0,
before: 0,
after: 0,
precision: 9,
calendarmodel: 'http://www.wikidata.org/entity/Q1985727' },
type: 'time' },
datatype: 'time' } ] },
'qualifiers-order': [ 'P580', 'P582' ],
id: 'q76$464382F6-E090-409E-B7B9-CB913F1C2166',
rank: 'normal' }
You might then be interested in ways to extract readable values from those results.
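As a rough illustration of extracting readable values, here is a minimal Python sketch that flattens the claims object of a wbgetentities response into one row per statement, keeping the qualifiers. The function names (`snak_value`, `flatten_claims`) and the sample data are hypothetical; the sample is a trimmed copy of the P69 statement shown above, and the sketch assumes the entity-id value carries an `id` field as it does in that example.

```python
def snak_value(snak):
    """Return a readable value for a snak, or None if it has no datavalue."""
    # Snaks of type 'somevalue' / 'novalue' carry no datavalue at all.
    if snak.get("snaktype") != "value":
        return None
    datavalue = snak["datavalue"]
    value = datavalue["value"]
    if datavalue["type"] == "wikibase-entityid":
        return value["id"]      # e.g. 'Q3273124'
    if datavalue["type"] == "time":
        return value["time"]    # e.g. '+1971-01-01T00:00:00Z'
    return value                # other types are returned unchanged


def flatten_claims(claims):
    """Turn {property: [statement, ...]} into a flat list of rows."""
    rows = []
    for prop, statements in claims.items():
        for statement in statements:
            qualifiers = statement.get("qualifiers", {})
            rows.append({
                "property": prop,
                "value": snak_value(statement["mainsnak"]),
                "rank": statement.get("rank"),
                "qualifiers": {
                    qprop: [snak_value(q) for q in qsnaks]
                    for qprop, qsnaks in qualifiers.items()
                },
            })
    return rows


# Trimmed sample of entities.Q76.claims (only the fields used above).
sample_claims = {
    "P69": [{
        "mainsnak": {
            "snaktype": "value",
            "property": "P69",
            "datavalue": {
                "value": {"entity-type": "item", "id": "Q3273124"},
                "type": "wikibase-entityid",
            },
        },
        "qualifiers": {
            "P580": [{"snaktype": "value", "property": "P580",
                      "datavalue": {"value": {"time": "+1971-01-01T00:00:00Z"},
                                    "type": "time"}}],
            "P582": [{"snaktype": "value", "property": "P582",
                      "datavalue": {"value": {"time": "+1979-01-01T00:00:00Z"},
                                    "type": "time"}}],
        },
        "rank": "normal",
    }]
}

rows = flatten_claims(sample_claims)
for row in rows:
    print(row)
```

Because the qualifiers are discovered by iterating over whatever keys are present, there is no need to know in advance that a statement has P580 or P582.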

Related

SPARQL: How to retrieve a 1:N relationship?

After learning the basics of SPARQL I'm still trying to make sense of 1:N relationships. How can I retrieve an object with all its relationships as a single record?
For example, I have a Project linked to two Topics. And I try to retrieve it with:
SELECT ?projName ?topic ?topicName {
?proj hasName ?projName .
?proj hasTopic ?topic .
?topic hasName ?topicName .
FILTER (?proj = <$uri>) .
}
But what I get is:
result: [
[
projName: "My Project"
topic: "TOPIC1_URI"
topicName: "First Topic"
],
[
projName: "My Project"
topic: "TOPIC2_URI"
topicName: "Second Topic"
]
]
But I would want to get it as:
result: [
projName: "My Project"
topics: [
[
topic: "TOPIC1_URI"
topicName: "First Topic"
],
[
topic: "TOPIC2_URI"
topicName: "Second Topic"
]
]
]
How could I achieve this? I don't know what I'm missing but I don't see how to do this with SPARQL.
Thanks a lot in advance

As you can read here, the result of a SPARQL SELECT query is a set of bindings, i.e., assignments of values to the free variables of the query. You can think of such bindings as a matrix or a table, as with SQL queries, whose columns are the variables' names. A flat table like this cannot represent nested structures, so you can't arrange the variable bindings into nested records directly.
What you can do is to collect all data about topics in a single variable using the GROUP_CONCAT function. For example:
SELECT
?projName
(GROUP_CONCAT(?topicData; separator=", ") AS ?topics)
WHERE {
?proj hasName ?projName .
?proj hasTopic ?topic .
?topic hasName ?topicName .
FILTER (?proj = <$uri>) .
BIND(CONCAT(STR(?topic), ": ", ?topicName) AS ?topicData)
}
GROUP BY ?projName
But remember that the values assigned to ?topics will be strings, not JSON arrays:
result: [
projName: "My Project"
topics: "TOPIC1_URI: First Topic, TOPIC2_URI: Second Topic"
]
Clearly you can choose other separators than ": " and ", ".
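If real nested arrays are needed rather than a concatenated string, one alternative (not what the answer above proposes) is to run the original flat query and group the rows on the client. The following is a minimal Python sketch under that assumption; `nest_topics` is a hypothetical helper, and the rows mirror the flat result shown in the question.

```python
def nest_topics(rows):
    """Group flat (projName, topic, topicName) rows into one record per project."""
    projects = {}
    for row in rows:
        # Create the project record the first time its name is seen.
        project = projects.setdefault(
            row["projName"], {"projName": row["projName"], "topics": []}
        )
        project["topics"].append(
            {"topic": row["topic"], "topicName": row["topicName"]}
        )
    return list(projects.values())


# Flat bindings as returned by the original SELECT query.
flat_rows = [
    {"projName": "My Project", "topic": "TOPIC1_URI", "topicName": "First Topic"},
    {"projName": "My Project", "topic": "TOPIC2_URI", "topicName": "Second Topic"},
]

nested = nest_topics(flat_rows)
print(nested)
```

Grouping by the project name follows the GROUP BY in the query above; if two projects could share a name, grouping by the project URI would be the safer key.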

JSON response for a nested SPARQL

I have the structure shown below, and I am looking to get a JSON response like the one further down from a SPARQL query. I have tried a few things such as CONCAT and STR, but they didn't work out well; it got complicated at the third level.
I have now added two frames that I tried with JSON-LD Framing. They give the correct output at the first level, but at the second level the data fails to expand.
:Reference xx1:timestamp "12/15/2020" .
:Reference xx2:logs xxx:log1 .
:log1 rdf:type xxx:Logs .
:log1 xx1:approver "xxx:bob" .
:log1 xx1:requester "xxx:daisy" .
:log1 xx1:timestamp "12/15/2020" .
:log1 xx1:name "log1" .
:log2 rdf:type xxx:Logs .
:log2 xx1:approver "xxx:bob" .
:log2 xx1:requester "xxx.daisy" .
:log2 xx1:timestamp "18/15/2020" .
:log2 xx1:name "log2" .
:bob rdf:type xxx:User .
:bob xx1:name "bob" .
:daisy rdf:type xxx:User .
:daisy xx1:name "daisy" .
Required Response with SPARQL (3 Levels)
[
{
"timestamp": "12/15/2020",
"logs": [
{ "name": "log1", "timestamp": "12/15/2020", "approver": { "name": "bob" }, "requester": { "name": "bob" } },
{ "name": "log2", "timestamp": "12/15/2020", "approver": { "name": "bob" }, "requester": { "name": "bob" } }
]
}
]
JSON-LD FRAMING
FRAME 1
{
"@context":{
"XXX":"http://ABC"
},
"@type":"xxx:Reference",
"contains":{
"@type":"xxx:Log",
"contains":{
"@type":"xxx:User"
}
}
}
FRAME 2
{
"@context":{
"XXX":"http://ABC"
},
"@type":"xxx:Reference",
"contains":{
"@type":"xxx:Log",
"hasApprover" :{"@type":"xxx:User"},
"hasRequester" :{"@type":"xxx:User"}
}
}
The output that I get is
Reference [Log 1 {User is expanded}, Log2{User is not expanded}]
What I need is
Reference [Log 1 {User is expanded}, Log2{User is expanded}]

This JSON-LD frame helps to get the required result:
{
"@context":{
"XXX":"http://ABC"
},
"@type":"xxx:Reference",
"contains":{
"@type":"xxx:Log",
"hasApprover" :{"@type":"xxx:User","@embed": "@always"},
"hasRequester" :{"@type":"xxx:User","@embed": "@always"}
}
}

SPARQL filter xsd:Date fails

I have the following SPARQL query
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT DISTINCT ?d
WHERE { GRAPH <https:/my/triples>{
?s <http://my/ontology/v1.0#hasTimestamp> ?d .
FILTER(?d > "1945-01-01"^^xsd:date)
}
}
When I execute the above, it fails to filter the results correctly by date. In fact, it gives me no results at all, and no errors.
If I remove the date filter, I get this response:
"bindings": [
{
"s": { "type": "uri" , "value": "seo:S2A_MSI_2019_11_28_09_33_31_T33SXB_t_dogliotti" } ,
"d": { "datatype": "xsd:date" , "type": "typed-literal" , "value": "2019-11-28" }
}
What could be wrong?

I tried all of the suggestions in the comments, but I couldn't make any of them work properly.
I stumbled upon a post from 2011 (!!) that gave me a partial solution.
The post is here, and it states that the solution below ("stringifying" the date) works only for equality
FILTER(str(?d) >= "2018") .
and that casting to xsd:dateTime works for comparisons.
In my case, the first solution worked for comparisons (including equality), while casting failed completely.
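A likely reason the string comparison behaves well here is that ISO 8601 dates of the form YYYY-MM-DD sort lexicographically in the same order as chronologically. The short Python sketch below illustrates that property with made-up sample values; it assumes four-digit, zero-padded years with no sign or time zone suffix, and it says nothing about how a particular triple store evaluates the FILTER.

```python
from datetime import date

# Hypothetical sample values in the same lexical form as the ?d literals.
values = ["2019-11-28", "1944-06-06", "2018-01-01", "2017-12-31"]

# Plain string comparison, analogous to FILTER(str(?d) >= "2018").
as_strings = [v for v in values if v >= "2018"]

# The same selection using real date objects.
as_dates = [v for v in values if date.fromisoformat(v) >= date(2018, 1, 1)]

print(as_strings)
print(as_dates)
```

Both selections agree for these values; the equivalence would break for negative years, years with more than four digits, or values that mix in time zone suffixes.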

Exact match of variable string in SPARQL Wikidata Query Service

Exact matching of a variable string in SPARQL on the Wikidata Query Service at https://query.wikidata.org does not give the results I expected.
I was expecting I could do:
SELECT * {
hint:Query hint:optimizer "None" .
{ SELECT DISTINCT (xsd:string(?author_name_) AS ?author_name) { wd:Q5565155 skos:altLabel ?author_name_ . } }
?work wdt:P2093 ?author_name .
}
But I get no results back from the Wikidata Query Service.
However, if I use the "=" comparison, I can match the strings:
SELECT * {
hint:Query hint:optimizer "None" .
{ SELECT DISTINCT (xsd:string(?author_name_) AS ?author_name) { wd:Q5565155 skos:altLabel ?author_name_ . } }
?work wdt:P50 wd:Q5565155 .
?work wdt:P2093 ?author_name__ .
FILTER (?author_name = ?author_name__)
}
With the current data in Wikidata, I get five rows returned in this query.
Another way to get this data is by using a BIND:
SELECT * {
BIND("Knudsen GM" AS ?author_name)
?work wdt:P2093 ?author_name .
}
I suppose there might be something wrong with the casting as this does not return anything:
SELECT * {
BIND(xsd:string("Knudsen GM") AS ?author_name)
?work wdt:P2093 ?author_name .
}
Variants of the original query with xsd:string changed to STR, or with no conversion at all, do not yield any result rows either.

SPARQL DATASET definition

Can anyone explain to me what the statement "Dataset(QuadPattern,μ,GS,GS)" means? In particular, I am trying to figure out the model of the DELETE DATA operation (DELETE DATA QuadData), but I can't understand what Dataset(QuadPattern,{},GS,GS) means.

You seem to be referring to the SPARQL 1.1 Update spec:
Dataset(QuadPattern,μ,DS,GS) ...[an] auxiliary function constructs an RDF Dataset from a QuadPattern, given a solution mapping and an RDF Dataset.
Put simply, it's a function which takes a bunch of RDF in graphs, which may include variables, e.g.:
GRAPH ?g { ?person a Person ; ex:tel ?tel }
{ ?g ex:source ?source }
and a set of solutions μ:
{ ?g => <http://example.com/graph1> , ?person => <http://example.com/alice> , ?tel => "0898 505050" , ?source => <http://192.com/> }
{ ?g => <http://example.com/graph2> , ?person => <http://example.com/bob> , ?tel => "117 117" , ?source => <http://192.com/> }
and binds those values, resulting in a dataset:
{
<http://example.com/graph1> ex:source <http://192.com/> .
<http://example.com/graph2> ex:source <http://192.com/> .
}
GRAPH <http://example.com/graph1> { <http://example.com/alice> a Person ; ex:tel "0898 505050" }
GRAPH <http://example.com/graph2> { <http://example.com/bob> a Person ; ex:tel "117 117" }
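The substitution step described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the formal definition from the spec: the representation (quads as 4-tuples, `None` for the default graph, variables as strings starting with `?`) and the name `instantiate` are assumptions made for this example, and blank-node handling is ignored.

```python
def instantiate(quad_pattern, solutions):
    """Bind each solution into the quad pattern and collect triples per graph."""
    dataset = {}
    for mu in solutions:
        for quad in quad_pattern:
            # Replace every variable that the solution binds; keep other terms.
            graph, s, p, o = (mu.get(term, term) for term in quad)
            # Skip quads that still contain an unbound variable.
            if any(isinstance(t, str) and t.startswith("?") for t in (graph, s, p, o)):
                continue
            dataset.setdefault(graph, set()).add((s, p, o))
    return dataset


G1 = "<http://example.com/graph1>"
G2 = "<http://example.com/graph2>"
SRC = "<http://192.com/>"

# GRAPH ?g { ?person a Person ; ex:tel ?tel }  and  { ?g ex:source ?source }
quad_pattern = [
    ("?g", "?person", "rdf:type", "Person"),
    ("?g", "?person", "ex:tel", "?tel"),
    (None, "?g", "ex:source", "?source"),   # None stands for the default graph
]

solutions = [
    {"?g": G1, "?person": "<http://example.com/alice>",
     "?tel": '"0898 505050"', "?source": SRC},
    {"?g": G2, "?person": "<http://example.com/bob>",
     "?tel": '"117 117"', "?source": SRC},
]

dataset = instantiate(quad_pattern, solutions)
for graph, triples in dataset.items():
    print(graph, sorted(triples))
```

With an empty solution mapping, as in Dataset(QuadPattern,{},GS,GS) for DELETE DATA, nothing is substituted, so a pattern made only of ground quads simply becomes the dataset it spells out.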
