How to group and count items in SPARQL, accumulating low-hit entries?

How do I count grouped entries in SPARQL, merging entries whose count is below a given threshold?
Consider for example the Nobel Prize data. I could get a count of all family names with a query like:
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name (count(*) as ?count) WHERE {
?id foaf:familyName ?name
}
GROUP BY $name
ORDER BY DESC($count)
How do I modify the query so that it only returns the family names occurring at least 3 times, accumulating all the other names as "Other"?

Just wrap your SELECT into another one.
Query
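PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name_ (SUM(?count) AS ?count_) {
  {
    # Inner query: one row per family name with its count.
    SELECT ?name (COUNT(*) AS ?count) {
      ?id foaf:familyName ?name
    } GROUP BY ?name
  }
  # Names seen fewer than 3 times are relabelled "Other".
  BIND (IF(?count > 2, ?name, "Other") AS ?name_)
}
GROUP BY ?name_
# Sort by count, forcing the "Other" bucket to the end.
ORDER BY DESC(IF(?name_ = "Other", -1, ?count_))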
Results
name_ count_
----------- ---------
Smith 5
Fischer 4
Wilson 4
Lee 3
Lewis 3
Müller 3
Other 878
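To make the two-level aggregation easier to follow, here is a minimal Python sketch of the same logic, outside of any triple store. The function name count_with_other, its parameters and the sample names are made up for illustration; only the threshold of 3 and the "Other" label come from the query above.

```python
from collections import Counter

def count_with_other(names, min_count=3, other_label="Other"):
    """Count names, folding every name seen fewer than min_count times into other_label."""
    per_name = Counter(names)  # inner query: COUNT(*) ... GROUP BY ?name
    merged = Counter()
    for name, count in per_name.items():
        # BIND(IF(?count > 2, ?name, "Other") AS ?name_)
        key = name if count >= min_count else other_label
        merged[key] += count  # outer query: SUM(?count) ... GROUP BY ?name_
    # Descending count, with the "Other" bucket forced to the end.
    return sorted(merged.items(), key=lambda kv: (kv[0] == other_label, -kv[1], kv[0]))

# Hypothetical sample data, not the Nobel Prize dataset.
sample = ["Smith"] * 5 + ["Lee"] * 3 + ["Curie", "Curie", "Bohr"]
result = count_with_other(sample)
print(result)  # [('Smith', 5), ('Lee', 3), ('Other', 3)]
```

As in the SPARQL version, a real family name that happens to be "Other" would be merged into the bucket, so pick a label that cannot clash with the data if that matters.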

Related

Constructing all Wikidata triples in SPARQL that have to do with a certain identifier

I am trying to return a list of triples by using a construct query in the Wikidata SPARQL query service.
I want all triples of people who have a GTAA_ID identifier. In addition to this, of all people who have a GTAA_ID, I want all triples containing the person with a GTAA_ID as the subject + all combinations of predicates and objects.
Take the following example:
Tom Hanks has a GTAA_ID (identifier). Tom Hanks also has a lot of other statements and identifiers in Wikidata. I want to retrieve everything that has to do with Tom Hanks (as he has a GTAA_ID).
So for example:
Tom Hanks | GTAA_ID | 106942
Tom Hanks | residence | Los Angeles
Tom Hanks | residence | Oakland
Tom Hanks | occupation | film actor
Tom Hanks | occupation | film director
and so on..
The example query Wikidata gives (for all triples to do with Asthma) is:
#CONSTRUCT query to get the RDF graph for a Wikidata item (e.g. asthma)
CONSTRUCT {
wd:Q35869 ?p ?o .
?o ?qualifier ?f .
?o prov:wasDerivedFrom ?u .
?u ?a ?b .
}
WHERE {
wd:Q35869 ?p ?o .
OPTIONAL {?o ?qualifier ?f .}
OPTIONAL {?o prov:wasDerivedFrom ?u .
?u ?a ?b .}
}
This returns a list of triples that has to do with the item 'Asthma'.
However, what I want to do is return a list of triples where the item is a person with a GTAA_ID.
This is the code which returns all subjects with the corresponding GTAA_ID:
CONSTRUCT {
?s ps:P1741 ?o .
}
WHERE {
?s ps:P1741 ?o.
}
and the link to the Wikidata SPARQL Query Service
Many thanks in advance!

SPARQL combine - concatenate two columns into one

I'm a newbie to SPARQL, and I would like to combine two columns into one
initial table
a|b|1
c|d|2
wanted table
a|b
c|d
b|1
d|2
It's like creating two different tables and stacking one on top of the other.
I need that to make a visualisation using d3sparql, which takes this data form. I also thought about importing a JSON file, modifying it and then printing it to the screen, but if this is possible it makes things a lot easier and faster.
UPDATE:
My original Query looks like that
PREFIX prefix1:<...>
PREFIX prefix2:<...>
SELECT ?x ?y ?z
WHERE {
?x prefix1:OwnsY ?y.
?y prefix1:PublishesZ ?z.
}
The simplest way (but not the most efficient) is:
PREFIX prefix1:<...>
PREFIX prefix2:<...>
SELECT DISTINCT ?x ?z
WHERE {
{ ?x prefix1:OwnsY ?y. ?y prefix1:PublishesZ ?z. }
UNION
{ ?x prefix1:OwnsY ?z. ?z prefix1:PublishesZ ?whatever. }
}
which in effect performs the query twice.
If you're just trying to extract all the subjects and objects of the owns and published triples, then this can be a very simple query. If, on the other hand, you only want data where there's all three parts, you'll need a full union. Let's create some example data and see what we can do:
@prefix : <urn:ex:> .
:a :owns :b .
:b :publishes 1 .
:c :owns :d .
:d :publishes 2 .
:e :owns :f . # no corresponding :publishes
:g :publishes :h . # no corresponding :owns
Here's your current query and results:
prefix : <urn:ex:>
select ?x ?y ?z {
?x :owns ?y .
?y :publishes ?z .
}
---------------
| x | y | z |
===============
| :a | :b | 1 |
| :c | :d | 2 |
---------------
Now, if you're willing to get those owns and publishes triples that don't have corresponding publishes and owns triples, you can use a union, values block, or property path. The union would be
{ ?x :owns ?y } union { ?x :publishes ?y }
The values block would be a bit simpler:
values ?p { :owns :publishes }
?x ?p ?y
You can make that even simpler with a property path:
?x :owns|:publishes ?y
These all give you:
-----------
| x | y |
===========
| :a | :b |
| :b | 1 |
| :c | :d |
| :d | 2 |
| :e | :f | * owns
| :g | :h | * publishes
-----------
Note that the starred rows are present because "e owns f" and "g publishes h", even though there's nothing that f publishes and nothing that owns g. Your original query won't find those. If you need to exclude the owns and publishes triples that don't have publishes and owns counterparts, you'll need the union option that includes all the parts, as in user205512's answer.
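The difference between the two approaches can be checked on the example data with a small Python sketch. The triples list is a hand-written copy of the data above, and the function names simple_pairs and joined_pairs are made up for illustration; this only mimics the query semantics and is not how a SPARQL engine evaluates them.

```python
# Hand-written copy of the example triples (subject, predicate, object).
triples = [
    ("a", "owns", "b"), ("b", "publishes", 1),
    ("c", "owns", "d"), ("d", "publishes", 2),
    ("e", "owns", "f"),       # no corresponding publishes
    ("g", "publishes", "h"),  # no corresponding owns
]

def simple_pairs(triples):
    """Like ?x :owns|:publishes ?y : every owns/publishes edge, matched or not."""
    return [(s, o) for s, p, o in triples if p in ("owns", "publishes")]

def joined_pairs(triples):
    """Like the full union: only edges that belong to an owns + publishes chain."""
    owns = [(s, o) for s, p, o in triples if p == "owns"]
    publishes = [(s, o) for s, p, o in triples if p == "publishes"]
    chains = [(x, y, z) for x, y in owns for y2, z in publishes if y == y2]
    first = [(x, y) for x, y, _ in chains]    # first branch of the UNION
    second = [(y, z) for _, y, z in chains]   # second branch of the UNION
    return list(dict.fromkeys(first + second))  # DISTINCT, first-seen order

print(simple_pairs(triples))  # includes ('e', 'f') and ('g', 'h')
print(joined_pairs(triples))  # [('a', 'b'), ('c', 'd'), ('b', 1), ('d', 2)]
```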

How to return a union of entity IDs

Consider a list of persons (parents) and their posts (children).
Note that I have simplified the example; it could be parent-child-childOfChild and so on, that is, entities with a tree of children.
I would like a query that returns a union of the IDs of some persons (filtered on some criteria) and their post IDs, like so:
person1Id
person2Id
person1Post1Id
person1Post2Id
.......
Given a concrete situation with 3 persons (with properties age and name), where Person1 has 2 posts and the other persons have none, I managed to do this in 2 ways, but both have limitations. Inside the where clause we have:
?entityId ?p ?o.
{
#persons/posts graph pattern
}
filter (?entityId = ?person || ?entityId = ?post) # || otherChildFilter
This works but uses a filter, which I want to avoid by using a union:
{
SELECT (?person as ?s) WHERE{ #criteria}
order by ...
limit 1
}
UNION
{
{
SELECT ?person WHERE{ #criteria}
order by ...
limit 1
}
?s a <http://www.example.org/schema/Post> .
?s <http://www.example.org/schema/postedBy> ?person .
}
This also works but I have to duplicate the person inner query for each child.
So I tried to somehow use bind like so:
(Assume the inner query returns 1 person that has 2 posts.)
select ?s ?person ?projectedOutPerson
where
{
#tag 1
{
#returns 1 person with 2 posts
SELECT ?person WHERE
{
?person a <http://www.example.org/schema/Person> .
optional{?person <http://www.example.org/schema/age> ?age.}
}
order by desc(?age)
limit 1
}
#tag 2
bind(?person as ?projectedOutPerson)
#tag 3
{
bind(?projectedOutPerson as ?s)
}
#tag 4
UNION
{
?s a <http://www.example.org/schema/Post> .
?s <http://www.example.org/schema/postedBy> ?projectedOutPerson.
}
#tag 5
}
This returns post IDs but doesn't add person IDs, and it has some curious behaviour.
I'm testing this on bigdata 1.3 and stardog 2.
?projectedOutPerson is bound correctly (tags 2-3): if I select a person with no posts, no posts are returned.
In both databases the bind between tags 3 and 4 is not joined, so ?s is not bound to a person (an empty result is returned).
If I remove the portion between tags 4 and 5, bigdata displays the selected person (so the bind between tags 3 and 4 now works), but stardog returns (empty result, person1, person1), so in stardog chaining the bindings doesn't work:
bind(?person as ?p1) #only this is bound
bind(?p2 as ?p3)
bind(?p3 as ?s)
The SPARQL 1.1 docs say "The variable introduced by the BIND clause must not have been used in the group graph pattern up to the point of use in BIND.", but a new variable is introduced each time, so I think it should work.
So how can I solve this without duplicating the subquery or using a filter?
It's much easier to answer this kind of question if you provide some data. If I understand your question correctly, you have some data like the following. I'm providing it in Turtle because it's fairly human readable.
@prefix : <http://stackoverflow.com/q/21115947/1281433/> .
# person1 has two posts, person2 has one post, and
# person3 has no posts at all.
:person1 a :Person ;
:hasPost :person1post1 , :person1post2 .
:person2 a :Person ;
:hasPost :person2post1 .
:person3 a :Person .
Now, if you want to select each person, as well as all their posts, you can do it with a query like this:
prefix : <http://stackoverflow.com/q/21115947/1281433/>
select ?id where {
?person a :Person ;
:hasPost? ?id .
}
-----------------
| id |
=================
| :person3 |
| :person2 |
| :person2post1 |
| :person1 |
| :person1post2 |
| :person1post1 |
-----------------
The trick here is that in the query pattern
?person a :Person ;
:hasPost? ?id .
the ?person a :Person ensures that ?person is a Person. Then, the pattern ?person :hasPost? ?id finds ?ids such that there's a path from ?person to ?id of length zero or one. The length zero case means that ?id can be bound to the same value of ?person. The case of length one means that you'll get every x such that ?person :hasPost x.
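As a rough illustration of what the zero-or-one path does, here is a small Python sketch over a hand-written copy of the data above. The dictionary has_post and the function zero_or_one are made up for this example; a real engine evaluates the path on the graph, and the row order it returns is not guaranteed.

```python
# Hand-written copy of the example data: person -> list of posts.
has_post = {
    "person1": ["person1post1", "person1post2"],
    "person2": ["person2post1"],
    "person3": [],
}

def zero_or_one(start, edges):
    """Nodes reachable from start over a path of length 0 or 1 (the ? operator)."""
    # Length 0 yields the start node itself; length 1 yields its direct targets.
    return [start] + list(edges.get(start, []))

# ?person a :Person ; :hasPost? ?id .
ids = [node for person in has_post for node in zero_or_one(person, has_post)]
print(ids)
```

Each person appears once (the length-zero case) followed by that person's posts (the length-one case), which gives the same six IDs as the query result above.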
Now, in the case of persons and posts, it doesn't make a lot of sense (I think) to talk about posts having other posts, but if you're just looking for descendants in a tree structure, you can use * instead of ? in your property path. For instance, if you had this data:
@prefix : <http://stackoverflow.com/q/21115947/1281433/> .
#            1
#          /   \
#         /     \
#        2       3
#       / \     / \
#      4   5   6   7
#       \       \   \
#        8       9   10
:node1 :hasChild :node2 , :node3 .
:node2 :hasChild :node4 , :node5 .
:node3 :hasChild :node6 , :node7 .
:node4 :hasChild :node8 .
:node6 :hasChild :node9 .
:node7 :hasChild :node10 .
You could get :node3 and all its descendants with a query like this:
prefix : <http://stackoverflow.com/q/21115947/1281433/>
select ?node where {
:node3 :hasChild* ?node
}
-----------
| node |
===========
| :node3 |
| :node7 |
| :node10 |
| :node6 |
| :node9 |
-----------
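The zero-or-more path behaves like a reachability search that also includes the start node. Below is a minimal Python sketch using a breadth-first traversal over a hand-written copy of the tree; has_child and zero_or_more are names invented for this example, and the traversal order is a choice of the sketch, not something SPARQL promises.

```python
from collections import deque

# Hand-written copy of the example tree: node -> list of children.
has_child = {
    "node1": ["node2", "node3"],
    "node2": ["node4", "node5"],
    "node3": ["node6", "node7"],
    "node4": ["node8"],
    "node6": ["node9"],
    "node7": ["node10"],
}

def zero_or_more(start, edges):
    """Nodes reachable from start over a path of any length, including 0 (the * operator)."""
    seen = {start}
    order = [start]
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in edges.get(node, []):
            if child not in seen:  # guards against cycles and repeated nodes
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order

# :node3 :hasChild* ?node
descendants = zero_or_more("node3", has_child)
print(descendants)  # ['node3', 'node6', 'node7', 'node9', 'node10']
```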

How to pull out a list of all hyperlinked people on a person's Wikipedia page using SPARQL and DBpedia

I want to pull out a list of all the "persons" which have a link to another person on Wikipedia.
For instance, George H. W. Bush has this sentence in his bio:
"Bush was born in Milton, Massachusetts, to Senator
Prescott Bush and Dorothy Walker Bush."
Now Dorothy Bush is hyperlinked to her own page. Can I get a list which looks like:
George H. W. Bush | Dorothy Walker Bush
George H. W. Bush | Babe Ruth
George H. W. Bush | Bill Clinton
and extend this to everyone on Wikipedia? I'll obviously have to break this down into bite-sized chunks for it to output, but I'm just not sure how to code this to select linked persons only. Thanks
One way to start would simply be to search for connected resources that are both of type Person. You can use DBpedia's web-based query form.
SELECT ?person1 ?p ?person2
WHERE {
?person1 ?p ?person2.
?person1 a foaf:Person.
?person2 a foaf:Person.
}
ORDER BY ?person1
LIMIT 10
OFFSET 0
You can "split this data into chunks" by using the ORDER BY keyword and iterating over the value after OFFSET (e.g. 10, 20, 30, ...). Save the results of these separate queries and combine them afterwards to get the full result.
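A minimal sketch of that chunking idea in Python, which only builds the query strings. The names QUERY_TEMPLATE and paged_queries are made up for this example, and actually sending each query to an endpoint and collecting the rows is left out.

```python
# The query from above, with LIMIT and OFFSET turned into placeholders.
QUERY_TEMPLATE = """
SELECT ?person1 ?p ?person2
WHERE {{
  ?person1 ?p ?person2 .
  ?person1 a foaf:Person .
  ?person2 a foaf:Person .
}}
ORDER BY ?person1
LIMIT {limit}
OFFSET {offset}
"""

def paged_queries(page_size, pages):
    """Yield one query per page, stepping OFFSET by page_size each time."""
    for page in range(pages):
        yield QUERY_TEMPLATE.format(limit=page_size, offset=page * page_size)

queries = list(paged_queries(page_size=10, pages=3))
print(queries[2])  # third page: LIMIT 10, OFFSET 20
```

In practice you would keep requesting pages until one comes back with fewer rows than the page size. Since ORDER BY ?person1 alone leaves rows with the same ?person1 in an unspecified order, ordering by all three variables is probably safer if pages must not overlap.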
If you are only looking for a particular kind of interperson relationship on dbpedia, the following query will give you all the properties used to connect two persons.
SELECT DISTINCT ?p
WHERE {
?person1 ?p ?person2.
?person1 a foaf:Person.
?person2 a foaf:Person.
}
Choose one or several of those properties, e.g. http://dbpedia.org/property/married, and get a list of persons related by this property using the following query.
SELECT ?person1 ?person2
WHERE {
?person1 <http://dbpedia.org/property/married> ?person2.
?person1 a foaf:Person.
?person2 a foaf:Person.
}
ORDER BY ?person1
LIMIT 10
OFFSET 0
As you will see for yourself, property usage on DBpedia is quite heterogeneous, so it might take some effort to get what you want.
Hope this helps as a starting point.

IF statement in SPARQL

I have the following SPARQL query:
PREFIX yago: <http://dbpedia.org/class/yago/>
SELECT distinct count (?Montreal) as ?Montreal count(?Toronto) as ?Toronto
WHERE
{
{?Montreal rdf:type yago:HospitalsInMontreal} UNION {?Toronto rdf:type yago:HospitalsInToronto}.
}
This query results in:
Montreal = 20
Toronto = 28
What I want is:
I want to edit the query and instead of giving 20 and 28, I want to compare the results such that
If number of hospitals in Montreal is larger than the number of hospitals in Toronto then:
Montreal = 1
Toronto = 2
If the number of hospitals in Toronto is larger than the number of hospitals in Montreal then:
Montreal = 2
Toronto = 1
I tried this query but it didn't work:
PREFIX yago: <http://dbpedia.org/class/yago/>
SELECT distinct count (?Montreal) as ?Montreal count(?Toronto) as ?Toronto
WHERE
{
{?Montreal rdf:type yago:HospitalsInMontreal} UNION {?Toronto rdf:type yago:HospitalsInToronto}.
LET( ?Montreal := IF( ?Montreal > ?Toronto, -1, 1 ).
}
Thanks
There is no LET expression in standard SPARQL (SPARQL 1.1 uses BIND for assignment), but you don't need one for the above query:
PREFIX yago: <http://dbpedia.org/class/yago/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
# Each projected expression is written as (expr AS ?var), as the SPARQL 1.1 grammar requires.
SELECT (IF(COUNT(?MontrealC) > COUNT(?TorontoC), 1, 2) AS ?Montreal)
       (IF(COUNT(?TorontoC) > COUNT(?MontrealC), 1, 2) AS ?Toronto)
WHERE
{
{?MontrealC rdf:type yago:HospitalsInMontreal} UNION {?TorontoC rdf:type yago:HospitalsInToronto}.
}
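The comparison itself is simple enough to check by hand. Here is a small Python sketch of the two IF expressions; the function name rank is made up for this example, and the counts 20 and 28 are the ones from the question.

```python
def rank(montreal_count, toronto_count):
    """Mirror the two IF expressions: 1 for the strictly larger count, 2 otherwise."""
    montreal = 1 if montreal_count > toronto_count else 2
    toronto = 1 if toronto_count > montreal_count else 2
    return montreal, toronto

print(rank(20, 28))  # (2, 1): Toronto has more hospitals
print(rank(28, 20))  # (1, 2): Montreal has more hospitals
print(rank(5, 5))    # (2, 2): on a tie neither comparison is true
```

Note the tie case: with strict comparisons both cities get 2 when the counts are equal, so add an explicit equality branch if you need a different result there.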