I am new to SPARQL, and to graph database querying as a whole, so please excuse any ignorance. I am trying to render some basic output from data stored in Fuseki, and I am struggling to understand the best practice for handling the duplicated rows caused by the cardinality of the relationships between the various concepts.
I will use a simple example to hopefully demonstrate my point.
Data Set
This is a representative sample of the types of data and relationships I am currently working with:
[Image: Data Set]
Based on this structure I have produced the following triples (N-Triples format):
<http://www.test.com/ontologies/Author/JohnGrisham> <http://www.test.com/ontologies/property#firstName> "John" .
<http://www.test.com/ontologies/Author/JohnGrisham> <http://www.test.com/ontologies/property#lastName> "Grisham" .
<http://www.test.com/ontologies/Author/JohnGrisham> <http://www.test.com/ontologies/property#hasWritten> <http://www.test.com/ontologies/Book/TheClient> .
<http://www.test.com/ontologies/Author/JohnGrisham> <http://www.test.com/ontologies/property#hasWritten> <http://www.test.com/ontologies/Book/TheFirm> .
<http://www.test.com/ontologies/Book/TheFirm> <http://www.test.com/ontologies/property#name> "The Firm" .
<http://www.test.com/ontologies/Book/TheFirm> <http://www.test.com/ontologies/property#soldBy> <http://www.test.com/ontologies/Retailer/Foyles> .
<http://www.test.com/ontologies/Book/TheFirm> <http://www.test.com/ontologies/property#soldBy> <http://www.test.com/ontologies/Retailer/Waterstones> .
<http://www.test.com/ontologies/Book/TheClient> <http://www.test.com/ontologies/property#name> "The Client" .
<http://www.test.com/ontologies/Book/TheClient> <http://www.test.com/ontologies/property#soldBy> <http://www.test.com/ontologies/Retailer/Amazon> .
<http://www.test.com/ontologies/Book/TheClient> <http://www.test.com/ontologies/property#soldBy> <http://www.test.com/ontologies/Retailer/Waterstones> .
<http://www.test.com/ontologies/Retailer/Amazon> <http://www.test.com/ontologies/property#name> "Amazon" .
<http://www.test.com/ontologies/Retailer/Waterstones> <http://www.test.com/ontologies/property#name> "Waterstones" .
<http://www.test.com/ontologies/Retailer/Foyles> <http://www.test.com/ontologies/property#name> "Foyles" .
Render Output Format
What I am trying to do is render a page that lists every author, with the details of each of their books and the retailers through which each book is sold. Something like this (pseudocode):
for-each:Author
    <h1>Author.firstName + Author.lastName</h1>
    for-each:Author.Book
        <h2>Book.Name</h2>
        Sold By:
        for-each:Book.Retailer
            <h2>Retailer.name</h2>
SPARQL
For the rendering to work, my thinking was that I would need each author's first name and last name, the names of all their books, and the names of the retailers through which those books are sold. I therefore came up with the following SPARQL:
PREFIX p: <http://www.test.com/ontologies/property#>
SELECT ?authorfirstname
       ?authorlastname
       ?bookname
       ?retailername
WHERE {
    ?author p:firstName ?authorfirstname;
            p:lastName ?authorlastname;
            p:hasWritten ?book .
    OPTIONAL {
        ?book p:name ?bookname;
              p:soldBy ?retailer .
        ?retailer p:name ?retailername .
    }
}
This provides the following results:
[Image: Results Triple Table]
Unfortunately, because of the duplicated rows, my basic rendering attempt cannot produce the expected output. In fact, it renders a new "Author" section for every row returned from the query.
I guess what I'm trying to understand is how this type of rendering should be done.
Is the renderer supposed to regroup the data back into the graph form it wants to traverse? (I honestly cannot see how this can be the case.)
Is the SPARQL invalid? Is there a way to do what I want in the SPARQL language itself?
Am I just doing something completely wrong?
AMENDMENT - More Detailed Analysis on GROUP_CONCAT
While reviewing the options available to me I came across GROUP_CONCAT, but after playing with it for a bit I decided it probably wasn't going to give me what I wanted and wasn't the best route. The reasons for this are:
Data Size
The data set I am using for the examples in this post is small, spanning only 3 concepts and a very restricted set of values. The concepts and data I am running against in the real world are far larger, so concatenating results will produce extremely long delimited strings, especially for free-format columns such as descriptions.
Loss of context
While trying out GROUP_CONCAT I quickly realised that I could no longer tell how the data elements in the different GROUP_CONCAT columns related to each other. I can show that using the book example above.
SPARQL
PREFIX p: <http://www.test.com/ontologies/property#>
select ?authorfirstname
       ?authorLastName
       (group_concat(distinct ?bookname; separator = ";") as ?booknames)
       (group_concat(distinct ?retailername; separator = ";") as ?retailernames)
where {
    ?author p:firstName ?authorfirstname;
            p:lastName ?authorLastName;
            p:hasWritten ?book .
    OPTIONAL {
        ?book p:name ?bookname;
              p:soldBy ?retailer .
        ?retailer p:name ?retailername .
    }
}
group by ?authorfirstname ?authorLastName
This produced the following output;
firstname = "John"
lastname = "Grisham"
booknames = "The Client;The Firm"
retailernames = "Amazon;Waterstones;Foyles"
As you can see this has produced one result row but you can no longer work out how the various data elements relate. Which Retailers are for which Book?
Any help/guidance would be greatly appreciated.
Current Solution
Based on the recommended solution below, I have used the concept of keys to bring the various data sets together. However, I have tweaked it slightly so that I run one query per concept (e.g. author, book and retailer) and then use the keys to bring the results together in my renderer.
Author Results
    firstname    lastname    books
--------------------------------------------------------------------------------
1   John         Grisham     ontologies/Book/TheClient|ontologies/Book/TheFirm
Book Results
    id                           name          retailers
-------------------------------------------------------------------------------------------------------
1   ontologies/Book/TheClient    The Client    ontologies/Retailer/Waterstones|ontologies/Retailer/Amazon
2   ontologies/Book/TheFirm      The Firm      ontologies/Retailer/Waterstones|ontologies/Retailer/Foyles
Retailer Results
    id                                 name
--------------------------------------------------
1   ontologies/Retailer/Amazon         Amazon
2   ontologies/Retailer/Waterstones    Waterstones
3   ontologies/Retailer/Foyles         Foyles
What I then do in my renderer is use the IDs to pull results from the various result sets:
for-each author a : authors
    output(a.firstname)
    for-each book b : a.books.split("|")
        book = books.get(b) // look up the result row for book b (i.e. ID to foreign key)
        output(book.name)
        for-each retailer r : book.retailers.split("|")
            retailer = retailers.get(r)
            output(retailer.name)
So effectively you are stitching together what you want from the various different result sets and presenting it.
This seems to be working OK for the moment.
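The stitching step can be sketched in runnable form. This is a minimal Python sketch, assuming the three result sets have already been loaded into plain dictionaries keyed by ID; the variable names and the `render` function are illustrative and not part of any library. The lookups only work if the IDs inside the `|`-delimited strings match the keys exactly, including case.

```python
# Hypothetical in-memory copies of the three result sets shown above.
authors = [
    {"firstname": "John", "lastname": "Grisham",
     "books": "ontologies/Book/TheClient|ontologies/Book/TheFirm"},
]
books = {
    "ontologies/Book/TheClient": {
        "name": "The Client",
        "retailers": "ontologies/Retailer/Waterstones|ontologies/Retailer/Amazon",
    },
    "ontologies/Book/TheFirm": {
        "name": "The Firm",
        "retailers": "ontologies/Retailer/Waterstones|ontologies/Retailer/Foyles",
    },
}
retailers = {
    "ontologies/Retailer/Amazon": {"name": "Amazon"},
    "ontologies/Retailer/Waterstones": {"name": "Waterstones"},
    "ontologies/Retailer/Foyles": {"name": "Foyles"},
}


def render(authors, books, retailers):
    """Return indented output lines: author, then books, then retailers."""
    lines = []
    for author in authors:
        lines.append(f"{author['firstname']} {author['lastname']}")
        for book_id in author["books"].split("|"):
            book = books[book_id]  # ID acts as a foreign key into the book results
            lines.append(f"  {book['name']}")
            for retailer_id in book["retailers"].split("|"):
                lines.append(f"    {retailers[retailer_id]['name']}")
    return lines


output = render(authors, books, retailers)
print("\n".join(output))
```

Keeping each result set in a dictionary makes every lookup a constant-time operation, so the cost of rendering grows with the number of author/book/retailer links rather than with the size of a joined result table.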
I find it easier to construct objects out of the SPARQL results in code rather than trying to form a query that returns only a single row per the relevant resource.
I would use the URI of the resources to identify which rows belong to which resource (author in this case), and then merge the result rows based on said URI.
For JS applications I use the code here to construct objects out of SPARQL results.
For complex values I use __ in the variable name to denote that an object should be constructed from the value. For example all values with variables prefixed with ?book__ would be turned into an object with the remainder of the variable's name as the name of the object's attribute, each object identified by ?book__id. So having values for ?book__id and ?book__name would result in an attribute book for the author, such that author.book = { id: '<book-uri>', name: 'book name'} (or a list of such objects if there are multiple books).
For example in this case I would use the following query:
PREFIX p: <http://www.test.com/ontologies/property#>
SELECT ?id ?firstName ?lastName ?book__id ?book__name
       ?book__retailer
WHERE {
    ?id p:firstName ?firstName;
        p:lastName ?lastName;
        p:hasWritten ?book__id .
    OPTIONAL {
        ?book__id p:name ?book__name;
                  p:soldBy/p:name ?book__retailer .
    }
}
And in the application code I would construct Author objects that look like this (JavaScript notation):
[{
  id: '<http://www.test.com/ontologies/Author/JohnGrisham>',
  firstName: 'John',
  lastName: 'Grisham',
  book: [
    {
      id: '<http://www.test.com/ontologies/Book/TheFirm>',
      name: 'The Firm',
      retailer: ['Foyles', 'Waterstones']
    },
    {
      id: '<http://www.test.com/ontologies/Book/TheClient>',
      name: 'The Client',
      retailer: ['Amazon', 'Waterstones']
    }
  ]
}]
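To illustrate the merging step, here is a minimal Python sketch. It is not the JavaScript code linked above and it does not implement the generic `__` convention; it hard-codes the variable names from this one query. The function name `merge_rows` and the sample `rows` are assumptions made for the example, with each row represented as a dictionary of variable name to value.

```python
FIRM = "<http://www.test.com/ontologies/Book/TheFirm>"
CLIENT = "<http://www.test.com/ontologies/Book/TheClient>"
GRISHAM = "<http://www.test.com/ontologies/Author/JohnGrisham>"


def merge_rows(rows):
    """Merge flat result rows into one object per author URI."""
    authors = {}
    for row in rows:
        # One author object per URI; later rows for the same URI reuse it.
        author = authors.setdefault(row["id"], {
            "id": row["id"],
            "firstName": row["firstName"],
            "lastName": row["lastName"],
            "book": {},  # keyed by book URI while merging
        })
        book_id = row.get("book__id")
        if book_id is None:
            continue
        book = author["book"].setdefault(book_id, {
            "id": book_id,
            "name": row.get("book__name"),  # may be unbound because of OPTIONAL
            "retailer": [],
        })
        retailer = row.get("book__retailer")
        if retailer is not None and retailer not in book["retailer"]:
            book["retailer"].append(retailer)
    # Turn the per-author book dictionaries into lists for the final shape.
    return [{**a, "book": list(a["book"].values())} for a in authors.values()]


# Sample rows, shaped like the results of the query above for the example data.
base = {"id": GRISHAM, "firstName": "John", "lastName": "Grisham"}
rows = [
    {**base, "book__id": FIRM, "book__name": "The Firm", "book__retailer": "Foyles"},
    {**base, "book__id": FIRM, "book__name": "The Firm", "book__retailer": "Waterstones"},
    {**base, "book__id": CLIENT, "book__name": "The Client", "book__retailer": "Amazon"},
    {**base, "book__id": CLIENT, "book__name": "The Client", "book__retailer": "Waterstones"},
]
merged = merge_rows(rows)
```

Because every level is keyed by URI, the row order does not matter and duplicated rows collapse naturally.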
This is a common problem that can strike any relational database too, I suppose. As you say, GROUP_CONCAT is useful in many situations, but it does lose fidelity.
I worked out a solution you might find interesting. Let's assume you want to construct a view or result tree by looping through authors, then for each author their books, then for each book its retailers.
SELECT DISTINCT ?authorname ?bookname ?retailername {
...
} ORDER BY ?authorname ?bookname ?retailername
That gives you results like this:
author book retailer
-----------------------------
1 author1 book1 retailer1
2 author1 book1 retailer2
3 author1 book2 retailer2
4 author2 book3 retailer2
5 author2 book3 retailer3
...
Because of the ordering, it's possible to step through the results and print a heading only when the author or book changes:
currentauthor = none
currentbook = none
for each result:
    if author in result != currentauthor:
        currentauthor = author in result
        currentbook = none
        print currentauthor
    if book in result != currentbook:
        currentbook = book in result
        print currentbook
    print retailer in result
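The same walk over sorted rows can be written with `itertools.groupby` from the Python standard library, which groups consecutive items that share a key. This is a sketch only: the `rows` below are the placeholder values from the table above, and `render_sorted` is a name made up for the example. It relies on the rows already being sorted by author, then book, as the ORDER BY guarantees.

```python
from itertools import groupby
from operator import itemgetter


def render_sorted(rows):
    """Walk rows sorted by (author, book, retailer) and emit a nested outline."""
    lines = []
    for author, author_rows in groupby(rows, key=itemgetter(0)):
        lines.append(author)
        # Within one author, consecutive rows with the same book form a group.
        for book, book_rows in groupby(author_rows, key=itemgetter(1)):
            lines.append("  " + book)
            for _, _, retailer in book_rows:
                lines.append("    " + retailer)
    return lines


rows = [
    ("author1", "book1", "retailer1"),
    ("author1", "book1", "retailer2"),
    ("author1", "book2", "retailer2"),
    ("author2", "book3", "retailer2"),
    ("author2", "book3", "retailer3"),
]
outline = render_sorted(rows)
print("\n".join(outline))
```

Only one row needs to be held in memory at a time, so this approach also suits result sets that are streamed from the endpoint.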
Related
I want to traverse a graph starting from any "root" concept and work down to its leaf concepts, following only reified predicates of a certain type (e.g. hasChild only).
I have a large graph in which concepts C are connected by named predicates R.
Each predicate is in turn attached to a blank node B that holds its metadata, including the connection type CT.
Essentially the pattern is:
root_concept -[^subject]-> blank -[object]-> concept -[^subject]-> blank -[object]-> concept -[^subject]-> blank -[object]-> ...
And I want to get all downstream concepts.
So normally you would do something like (1):
SELECT ?c
WHERE {
    FILTER( ?root = <SomeValue> ) .
    ?root (^rdf:subject/rdf:object)* ?c .
}
But I need to get intermediate blank node B and filter on its CT!
For a single step this looks like (2):
SELECT ?c
WHERE {
    FILTER( ?root = <SomeValue> ) .
    ?root ^rdf:subject ?blank .
    ?blank rdf:object ?c .
    ?blank :hasRelationType "hasChild" .
}
Question: how do I marry (1) and (2)?
Additional info:
Each blank node has at least these triples:
blank1    http://www.w3.org/1999/02/22-rdf-syntax-ns#type       http://some/ontology/terms/Relationship
blank1    http://www.w3.org/1999/02/22-rdf-syntax-ns#object     C2
blank1    http://www.w3.org/1999/02/22-rdf-syntax-ns#predicate  R
blank1    http://www.w3.org/1999/02/22-rdf-syntax-ns#subject    C1
blank1    http://some/ontology/terms/hasRelationType            NotHasChild (!)
Predicates represented by blank nodes B actually have their own IDs, but they are completely uninformative (i.e. rdf:predicate from a blank node leads to something like R0134235).
I have tried modeling my query on arbitrary-length property paths and on the general recipe for getting all child properties, but couldn't figure it out.
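To make the intended semantics concrete, here is a client-side sketch in Python. It is not a SPARQL answer to the question; it only shows what "follow reified statements whose relation type is hasChild, to any depth" means. It assumes the reified statements have already been fetched as `(subject, object, relation_type)` tuples, and the function name `descendants` and the sample data are hypothetical. Unlike the `*` path in (1), it does not return the root itself.

```python
from collections import deque


def descendants(root, statements, relation_type="hasChild"):
    """Return every concept reachable from root via statements of one type."""
    # Index the matching statements by subject for quick lookup.
    children = {}
    for subject, obj, rel in statements:
        if rel == relation_type:
            children.setdefault(subject, []).append(obj)

    seen = set()
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child != root and child not in seen:  # guards against cycles
                seen.add(child)
                queue.append(child)
    return seen


# Each tuple stands for one blank node: (rdf:subject, rdf:object, hasRelationType).
statements = [
    ("C1", "C2", "hasChild"),
    ("C2", "C3", "hasChild"),
    ("C2", "C4", "NotHasChild"),
    ("C4", "C5", "hasChild"),
]
found = descendants("C1", statements)
```

C4 and C5 are excluded because the only link into C4 is of type NotHasChild, which is exactly the filtering that query (1) alone cannot express.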
I have the URI of an entity (:P1) and I want to find all entities that are similar to this entity. Right now, I am trying to find those entities that are connected to :P1 through a common entity and the same attribute. I have a query like this.
SELECT ?simP (SUM(?score) AS ?simScore)
WHERE {
    {
        :P1 ?prop ?q.
        ?q ?propInv ?simP.
        ?propInv owl:inverseOf ?prop.
        FILTER(?simP != :P1).
        BIND(1 AS ?score).
    } UNION {
        :P1 ?prop1 ?q1.
        ?q1 ?prop2 ?q.
        ?q ?prop2Inv ?q2.
        ?q2 ?prop1Inv ?simP.
        ?prop1Inv owl:inverseOf ?prop1.
        ?prop2Inv owl:inverseOf ?prop2.
        FILTER(?simP != :P1).
        BIND(0.5 AS ?score).
    }
}
GROUP BY ?simP
ORDER BY DESC(?simScore)
As you can see, I am trying to find those entities that are connected to :P1 through a common entity at a distance of 1 hop and then 2 hops. The score (the reciprocal of the number of hops) decreases as the number of hops increases.
My issues with this query are:
It requires that each property have an inverse (owl:inverseOf) defined, which need not always be the case.
The number of statements in the query keeps growing as I increase the number of hops from 1 to 2 to 3 and so on.
My question is whether there is a better way of getting the outcome I am expecting. It would be great if the query could, at the very least, find entities that are connected to :P1 through a common entity at least 2 hops away without having to add a UNION clause for each hop.
Also, is there a better approach to finding entities considered "similar" to :P1?
I wonder how I can find out how many labels there are in Wikidata for each language, out of the total of about 50 million entries.
For example, at https://query.wikidata.org, for the Catalan language ("ca") I tried:
SELECT ?lang (COUNT(DISTINCT ?item) AS ?count) WHERE {
?item schema:inLanguage "ca" .
} GROUP BY ?lang
ORDER BY DESC (?count)
and got a result of 703351, but I don't think that is correct: I downloaded the Wikidata dump (from https://dumps.wikimedia.org/wikidatawiki/entities/) and have already extracted more than two million labels in Catalan (and the extraction process is still running).
So, any clue as to what I am doing wrong?
As suggested in the notes above, using Quarry:
https://quarry.wmflabs.org/query/27976
USE wikidatawiki_p;
DESCRIBE wb_terms;
SELECT COUNT(*) FROM wb_terms
WHERE term_type = 'label' AND term_language = "ca";
My ontology has 2 classes: food and foodSource.
I want to get the data for the class food with 2 kinds of foodSource, using this SPARQL query:
SELECT ?food ?foodSource ?foodSource2
WHERE {
    {
        ?foodSource mpasi:data_memerlukanBahan ?memerlukanBahan.
        ?food ?memerlukanBahan ?value.
        FILTER regex(str(?foodSource),"carrot")
    }
    UNION
    {
        ?foodSource2 mpasi:data_memerlukanBahan ?memerlukanBahan.
        ?food ?memerlukanBahan ?value.
        FILTER regex(str(?foodSource2),"tomato")
    }
}
order by ?food
I get the correct result. But now I want to exclude some food sources. So how can I get food that doesn't have a given food source? For example, I have 3 kinds of food:
kedelai porridge
apple crumble
tomato soup
I want the food without apple in it, so I should get just kedelai porridge and tomato soup.
Can I get it using !regex?
FILTER (!regex(str(?foodSource),"apple")).
Perhaps a first step is to refactor your UNION statement. The first two triple patterns are the same for both UNION graph statements, so move those outside of your UNION statement. Then you're left with just the two FILTER statements, which can be re-written with an || expression:
SELECT ?food ?foodSource
WHERE {
    ?foodSource mpasi:data_memerlukanBahan ?memerlukanBahan.
    ?food ?memerlukanBahan ?value.
    FILTER (regex(str(?foodSource),"carrot") || regex(str(?foodSource),"apple"))
}
Perhaps better, as @AKSW suggests, is to perform the or in the regex itself:
FILTER regex(str(?foodSource),"carrot|apple")
Since FILTER in SPARQL is a positive filter (it states what to let through), "carrot" and "apple" are the only possible solutions. "potato" can't get through the filter.
However, if you wanted to find everything but "potato", negate the regex:
FILTER (!regex(str(?foodSource),"potato"))
Note that a negated character class is not a substitute for this. The following lets through any value containing at least one character other than p, o, t or a, which is not the same as excluding "potato":
FILTER regex(str(?foodSource),"[^potato]")
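The difference between negating the match and using a character class such as `[^potato]` is easy to check outside SPARQL. The sketch below uses Python's `re` module, which is not the XPath regex flavour that SPARQL uses, but negated character classes behave the same way in both. The sample values in `foods` are made up for the illustration.

```python
import re

foods = ["potato", "sweet potato", "carrot", "apt"]

# Negating the match: keep values that do NOT contain the word "potato".
without_potato = [f for f in foods if not re.search("potato", f)]

# Negated character class: keep values containing at least one character
# that is not p, o, t or a. This says nothing about the word "potato".
char_class = [f for f in foods if re.search("[^potato]", f)]

print(without_potato)
print(char_class)
```

The character class wrongly keeps "sweet potato" (because of the "s") and wrongly drops "apt" (because all of its letters are in the class), so only the negated match implements "everything but potato".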
All queries were tested on a Virtuoso SPARQL endpoint.
I want to find the categories of two DBpedia subjects, here Bharatiya_Janata_Party and New_Delhi, and work out how similar the categories of one are to the categories of the other.
With the first query I get the categories of Bharatiya_Janata_Party.
With the second query I get the categories of New_Delhi.
Now I want to match each category of Bharatiya_Janata_Party against each category of New_Delhi, like this:
Nationalist_parties---New_Delhi
Nationalist_parties---New_Delhi_district
Nationalist_parties---Populated_places_established_in_1911
Nationalist_parties---Capitals_in_Asia
Nationalist_parties---Capitals_in_Asia
Nationalist_parties--Planned_capitals
Political_parties_established_in_1980---New_Delhi
Political_parties_established_in_1980---New_Delhi_district
.....
....
..
..
I ran Query III to find a match between Nationalist_parties and New_Delhi, and got the first match at level 4 ((^skos:broader){0,4}).
I then have to do the same for Nationalist_parties---New_Delhi_district, and so on.
The real problem is that I want to combine these 3 queries so that I get the result directly in tabular form. Is there any way to automate the whole process?
Query I:
SELECT *
WHERE {
dbpedia:Bharatiya_Janata_Party dcterms:subject ?x
}
Result of Query I:
dbpedia.org/resource/Category:Nationalist_parties
dbpedia.org/resource/Category:Political_parties_established_in_1980
dbpedia.org/resource/Category:Conservative_parties_in_India
dbpedia.org/resource/Category:Hindu_political_parties
dbpedia.org/resource/Category:Hindutva
dbpedia.org/resource/Category:Bharatiya_Janata_Party
dbpedia.org/resource/Category:1980_establishments_in_India
Query II:
SELECT *
WHERE {
dbpedia:New_Delhi dcterms:subject ?x
}
Result of Query II:
dbpedia.org/resource/Category:New_Delhi
dbpedia.org/resource/Category:New_Delhi_district
dbpedia.org/resource/Category:Populated_places_established_in_1911
dbpedia.org/resource/Category:Capitals_in_Asia
dbpedia.org/resource/Category:Indian_capital_cities
dbpedia.org/resource/Category:Planned_capitals
dbpedia.org/resource/Category:Urdu-speaking_countries_and_territories
QUERY III:
select distinct ?super where {
?super (^skos:broader){0,4} category:Nationalist_parties, category:New_Delhi
}
Result:
dbpedia.org/resource/Category:Government-related_organizations
dbpedia.org/resource/Category:Government
First Match at level 4 with 2 Super Classes
P.S.: The other pairs will not necessarily match at (^skos:broader){0,4}. So I am manually running the above query starting from (^skos:broader){0,0} and incrementing to (^skos:broader){0,1}, (^skos:broader){0,2}, ... until the first match.
select distinct ?super where {
?super (^skos:broader){0,6} category:Nationalist_parties, category:New_Delhi_district
}
Result:
dbpedia.org/resource/Category:Categories_by_topic
dbpedia.org/resource/Category:Government
dbpedia.org/resource/Category:Categories_by_parameter
dbpedia.org/resource/Category:Political_geography
First Match at level 6 with 4 Super Classes
Combining these 3 queries, I want this type of result in tabular form:
Category (Query I)     Category (Query II)                     Level    Count of matches
Nationalist_parties    New_Delhi                               4        2
Nationalist_parties    New_Delhi_district                      6        4
Nationalist_parties    Populated_places_established_in_1911
Nationalist_parties    Capitals_in_Asia
Nationalist_parties    Capitals_in_Asia
...
Please help me automate and combine the above queries. I have read several posts but have not been able to figure out how.
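The manual procedure described above (increase the depth until the two categories first share a super-category, then record the level and the number of matches) can be sketched as a loop. This is a minimal Python sketch under stated assumptions: it works on an in-memory dictionary `broader` mapping each category to its direct skos:broader parents, which in practice would have to be filled from the endpoint; the toy category names and all function names are hypothetical.

```python
def ancestors_within(category, broader, max_depth):
    """Categories reachable from `category` in 0..max_depth skos:broader steps."""
    reached = {category}
    frontier = {category}
    for _ in range(max_depth):
        frontier = {p for c in frontier for p in broader.get(c, ())} - reached
        if not frontier:
            break
        reached |= frontier
    return reached


def first_match(cat_a, cat_b, broader, limit=10):
    """Smallest level at which both categories share a super-category."""
    for level in range(limit + 1):
        common = (ancestors_within(cat_a, broader, level)
                  & ancestors_within(cat_b, broader, level))
        if common:
            return level, common
    return None  # no shared super-category within `limit` steps


def match_table(cats_a, cats_b, broader, limit=10):
    """One row per pair: (category I, category II, level, count of matches)."""
    table = []
    for a in cats_a:
        for b in cats_b:
            result = first_match(a, b, broader, limit)
            if result is not None:
                level, common = result
                table.append((a, b, level, len(common)))
    return table


# Toy hierarchy: A -> P -> G, B -> Q -> G, C -> G.
broader = {"A": ["P"], "P": ["G"], "B": ["Q"], "Q": ["G"], "C": ["G"]}
table = match_table(["A"], ["B", "C"], broader)
```

Each row of `table` corresponds to one line of the desired output, so the pairwise loop replaces running Query III by hand for every combination and depth.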