Let's say there is an RDF DB of people, and each person has many triples defining that person's friends (so many triples of the form 'person' x:hasFriend 'otherPerson'). How can I find people who have the most similar friends? I'm a novice at SPARQL and this seems like a really complex query.
Basically, the results would be a list of people starting from the ones with the most similar friends list (to the person specified in the query) and then going down the list to the people with the least similar friends list.
So let's say I run this query for person1; the results would be something like:
person2 - 300 of the same friends
person30 - 245 of the same friends
person18 - 16 of the same friends
etc.
If you adapt the approach in my answer to How to find similar content using SPARQL (of which this might be considered a duplicate), you'd end up with something like:
select ?otherPerson (count(?friend) as ?numFriends) where {
:person1 :hasFriend ?friend . #-- person1 has a friend, ?friend .
?otherPerson :hasFriend ?friend . #-- so does ?otherPerson .
}
group by ?otherPerson #-- One result row per ?otherPerson,
order by desc(?numFriends) #-- ordered by number of common friends.
If you really want to, you can use a reverse property path to make the query pattern a little shorter:
select ?otherPerson (count(?friend) as ?numFriends) where {
#-- some ?friend is a friend of both :person1 and ?otherPerson .
?friend ^:hasFriend :person1, ?otherPerson .
}
group by ?otherPerson #-- One result row per ?otherPerson,
order by desc(?numFriends) #-- ordered by number of common friends.
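To make the join explicit, here is a minimal Python sketch of what the queries above compute, run over an in-memory list of (subject, predicate, object) tuples instead of a real triple store. The sample data and the function name common_friend_counts are made up for illustration. One difference is deliberate: as written, the queries also match :person1 itself as ?otherPerson, and the sketch skips that row (in SPARQL you could add FILTER(?otherPerson != :person1) for the same effect).

```python
from collections import Counter

def common_friend_counts(triples, person):
    """Count, for every other person, how many friends they share with `person`."""
    # Friends of the person we are searching for
    friends = {o for s, p, o in triples if s == person and p == "hasFriend"}
    # One count per (otherPerson, shared friend) match, skipping the person itself
    counts = Counter(
        s for s, p, o in triples
        if p == "hasFriend" and o in friends and s != person
    )
    # most_common() sorts by descending count, like ORDER BY DESC(?numFriends)
    return counts.most_common()

# Hypothetical sample data
triples = [
    ("person1", "hasFriend", "a"),
    ("person1", "hasFriend", "b"),
    ("person1", "hasFriend", "c"),
    ("person2", "hasFriend", "a"),
    ("person2", "hasFriend", "b"),
    ("person3", "hasFriend", "c"),
    ("person3", "hasFriend", "d"),
]
print(common_friend_counts(triples, "person1"))
```

People who share no friends with person1 do not appear at all, which matches the SPARQL versions.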
Is there a way to get a variable number of columns for a given predicate? Essentially, I want to turn this:
title note
A. 1
A. 2
A. 3
B. 4
B. 5
into
title note1 note2 note3
A. 1 2 3
B. 4 5 null
For example, can I set the number of columns to the maximum number of "notes" in the query, or something like that? Thanks.
There are several ways you can approach this. One way is to change your query. Now, in the general case it is not possible to do a SELECT query that does exactly what you want. However, if you happen to know in advance what the maximum number of notes per title is, you can sort of do this.
Supposing your original query was something like this:
SELECT ?title ?note
WHERE { ?title :hasNote ?note }
And supposing you know titles have at most 3 notes, you could probably (untested) do something like this:
SELECT ?title ?note1 ?note2 ?note3
WHERE {
?title :hasNote ?note1 .
OPTIONAL { ?title :hasNote ?note2 . FILTER (?note2 != ?note1) }
OPTIONAL { ?title :hasNote ?note3 . FILTER (?note3 != ?note1 && ?note3 != ?note2) }
}
As you can see this is not a very nice solution though: it doesn't scale and is probably very inefficient to process as well.
Alternatives are various forms of post-processing. To make it simpler to post-process you could use an aggregate operator to get all notes for a single item on a single line at least:
SELECT ?title (GROUP_CONCAT(?note) as ?notes)
WHERE { ?title :hasNote ?note }
GROUP BY ?title
result:
title notes
A. "1 2 3"
B. "4 5"
You could then post-process the values of the ?notes variable to split them into the separate notes again.
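As a rough sketch of that post-processing step, assuming the results have already been fetched as (title, notes) pairs of plain strings: the function name split_notes and the sample rows are hypothetical, and the code relies on GROUP_CONCAT's default single-space separator.

```python
def split_notes(rows, separator=" "):
    """Turn (title, "1 2 3") rows into fixed-width rows padded with None."""
    split_rows = [(title, notes.split(separator)) for title, notes in rows]
    # The widest row decides how many note columns the table gets
    width = max((len(notes) for _, notes in split_rows), default=0)
    return [
        [title] + notes + [None] * (width - len(notes))
        for title, notes in split_rows
    ]

# Rows shaped like the GROUP_CONCAT result above
rows = [("A.", "1 2 3"), ("B.", "4 5")]
table = split_notes(rows)
for row in table:
    print(row)
```

If the notes themselves can contain spaces, this breaks; in that case it is safer to pick an explicit separator in the query, e.g. GROUP_CONCAT(?note; separator="|"), and pass the same separator to the function.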
Another solution is that instead of using a SELECT query, you use a CONSTRUCT query to give you back an RDF graph, rather than a table, and work directly with that in your code. Tables are kinda weird in an RDF world if you think about it: you're querying a graph model, why is the query result not a graph but a table?
CONSTRUCT
WHERE { ?title :hasNote ?note }
...and then process the result in whatever API you're using to do the queries.
I would like to count the number of Wikidata items that have two properties at the same time. For example, a VIAF ID and a BnF ID, or a LoC ID and a SUDOC ID. The first way that comes to my mind would be a query like this:
SELECT (COUNT(DISTINCT ?item) AS ?count) WHERE {
?item wdt:P214 ?viaf.
?item wdt:P268 ?bnf.
}
But this query is inefficient (23 seconds) and, to apply it to 10 properties, would require 90 comparisons two by two. Is there a more efficient way to perform these calculations?
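On the number of comparisons: "has both properties" is symmetric, so only unordered pairs are needed, which for 10 properties is 45 rather than 90. The Python sketch below just enumerates those pairs and builds one count query per pair in the shape of the query above. It does not make any single query faster. P214 and P268 come from the question; the PX3..PX10 entries are placeholders, not real property IDs, and pair_count_queries is a made-up name.

```python
from itertools import combinations

def pair_count_queries(properties):
    """Yield ((p1, p2), query) for every unordered pair of property IDs."""
    for p1, p2 in combinations(properties, 2):
        query = (
            "SELECT (COUNT(DISTINCT ?item) AS ?count) WHERE {\n"
            f"  ?item wdt:{p1} [] .\n"
            f"  ?item wdt:{p2} [] .\n"
            "}"
        )
        yield (p1, p2), query

# P214 and P268 are from the question; PX3..PX10 are placeholders, not real IDs
properties = ["P214", "P268"] + [f"PX{i}" for i in range(3, 11)]
queries = list(pair_count_queries(properties))
print(len(queries))   # number of unordered pairs
print(queries[0][1])  # the query for the first pair
```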
I am trying to write a SPARQL query which will return a set of patient identifier codes (?Crid) which have associated with them a specific diagnosis code (?ICD9) and DO NOT have associated with them a specific medication AND which have an order date (?OrderDate) prior to their recruitment date (?RecruitDate). I have incorporated the OBIB ontology into my graph.
Here is what I have so far (a bit simplified and with a few steps through the graph omitted for readability/sensitivity):
SELECT DISTINCT ?Crid WHERE
{?Crid a obib:CRID .
#-- Return CRIDs with a diagnosis
?Crid obib:hasPart ?ICD9 .
?ICD9 a obib:diagnosis .
#-- Return CRIDs with a medical prescription record
?Crid obib:hasPart ?medRecord .
?medRecord a obib:medicalRecord .
#-- Return CRIDs with an order date
?medRecord obib:hasPart ?OrderDate .
?OrderDate a obib:dateOfDataEntry .
#-- Return CRIDs with a recruitment date
?Crid obib:hasPart ?FormFilling .
?FormFilling a obib:formFilling .
?RecruitDate obib:isAbout ?FormFilling .
?RecruitDate a obib:dateOfDataEntry .
#-- Filter results for specific ICD9 codes
FILTER (?ICD9 = '1')
#-- Subtract Results with Certain Medication and Order Date Prior to Recruitment
#-- This is the part that I think is giving me a problem
MINUS {
FILTER (regex (?medRecord, "medication_1", "i"))
FILTER (?RecruitDate-?OrderDate < "P0D"^^xsd:dayTimeDuration)
}
}
My gut feeling is that I am not using MINUS correctly. This query returns mostly the right results: I am expecting 10 results and it is returning 12. The extraneous 2 results did take "medication_1" and have order dates before their recruitment dates, so I do not want them to be included in the set.
In case it matters, I am using a Stardog endpoint to run this query and to store my graph data.
Instead of
#-- Subtract Results with Certain Medication and Order Date Prior to Recruitment
#-- This is the part that I think is giving me a problem
MINUS {
FILTER (regex (?medRecord, "medication_1", "i"))
FILTER (?RecruitDate-?OrderDate < "P0D"^^xsd:dayTimeDuration)
}
}
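The MINUS block here contains only FILTERs, so it binds no variables of its own; MINUS only removes solutions that share at least one variable with its right-hand side, so this block removes nothing. I'd probably just write this without MINUS as: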
FILTER (!regex(?medRecord, "medication_1", "i"))
FILTER (?RecruitDate-?OrderDate >= "P0D"^^xsd:dayTimeDuration)
I'd also probably consider whether REGEX is the right tool here (would a simple string comparison work?), but that's a different issue.
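One way to see why the MINUS block in the question has no effect is a simplified Python model of MINUS over solution mappings (dicts from variable name to value). This is only a sketch of the semantics, not how Stardog implements it; the function names and sample solutions are invented for illustration.

```python
def compatible(mu1, mu2):
    """Two solutions are compatible if they agree on every shared variable."""
    return all(mu1[v] == mu2[v] for v in mu1.keys() & mu2.keys())

def minus(left, right):
    """Drop a left solution only if some right solution shares a variable
    with it and is compatible with it."""
    return [
        mu for mu in left
        if not any(mu.keys() & other.keys() and compatible(mu, other)
                   for other in right)
    ]

# Hypothetical solutions from the outer pattern
left = [
    {"Crid": "c1", "medRecord": "medication_1"},
    {"Crid": "c2", "medRecord": "medication_2"},
]

# A MINUS block with only FILTERs binds no variables: nothing is removed
no_bindings = [{}]
print(minus(left, no_bindings) == left)

# If the block repeats a pattern that binds ?medRecord, rows can be removed
with_bindings = [{"medRecord": "medication_1"}]
print(minus(left, with_bindings))
```

So if you do want to keep MINUS, the block would need to repeat the triple patterns that bind the variables used in its FILTERs.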
I have a DBpedia resource and I'd like to obtain all the DBpedia categories associated with it. For this purpose I wrote this SPARQL query:
SELECT ?p ?o WHERE
{
<http://dbpedia.org/resource/Rihanna> ?p ?o .
}
focusing only on the http://purl.org/dc/terms/subject property.
The result is a set of categories. What would be a good way to select the most relevant category describing the singer Rihanna?
This query orders Rihanna's categories by the total number of members in each category:
SELECT ?category (COUNT(?member) as ?memberCount) WHERE {
?member dct:subject ?category.
{ SELECT ?category WHERE { dbr:Rihanna dct:subject ?category. } }
}
GROUP BY ?category #-- required by standard SPARQL when projecting ?category next to COUNT
ORDER BY ?memberCount
The assumption here is that, the fewer members a category has, the higher the relevance of that category for any particular member.
The results for this query list the following categories as most relevant to Rihanna:
Barbadian fashion designers
Barbadian people of Irish descent
Barbadian Christians
Barbadian people of Guyanese descent
Barbadian female singers
My question is a follow-up to my first question about SPARQL here.
My SPARQL query results for Mountain objects are here.
From those results I picked a certain object resource.
Now I want to get the values of the "is dbpedia-owl:highestPlace of" records for this chosen Mountain object.
That is, the names of the mountain ranges of which this mountain is the highest place.
This is, as I figure, complex. Not only because I do not know the required syntax, but also because I get two objects here.
One of them is Mont Blanc massif, which is of type "place".
Another one is Western Alps which is of type "mountain range" - my desired record.
I need record # 2 above but not 1. I know 1 is also relevant but sometimes it doesn't follow same pattern. Sometimes the records appear to be of YAGO type, which can be totally misleading. To be safe, I simply want to discard those records whenever there is type mismatch.
How can I form my SPARQL query to get these "is dbpedia-owl:highestPlace of" records and also have the type filtering?
You can use this query. Note, however, that Mont_Blanc_massif in your example is both a dbpedia-owl:Place and a dbpedia-owl:MountainRange:
select * where {
?place dbpedia-owl:highestPlace :Mont_Blanc.
?place rdf:type dbpedia-owl:MountainRange.
}
Edit after comment: filter
It is not really clear what you want to filter (YAGO?). Technically, you can filter like this, for example:
select * where {
?place dbpedia-owl:highestPlace :Mont_Blanc.
?place rdf:type dbpedia-owl:MountainRange.
FILTER NOT EXISTS {
?place ?pred ?obj
FILTER (regex(str(?obj), "yago")) #-- str() because regex expects a string, not an IRI
}
}
This filters out results that have any object with 'yago' in its URL.
Extending the result from the previous answer, the appropriate query would be
select * where {
?mountain a dbpedia-owl:Mountain ;
dbpedia-owl:abstract ?abstract ;
foaf:depiction ?depiction .
?range a dbpedia-owl:MountainRange ;
dbpedia-owl:highestPlace ?mountain .
FILTER(langMatches(lang(?abstract),"EN"))
}
LIMIT 10
This selects mountains with English abstracts that have at least one depiction (or else the pattern wouldn't match) and for which there is some mountain range of which the mountain is the highest place. Without the parts from the earlier question, if you just want to retrieve mountains that are the highest place of a range, you can use a query like this:
select * where {
?mountain a dbpedia-owl:Mountain .
?range a dbpedia-owl:MountainRange ;
dbpedia-owl:highestPlace ?mountain .
}
LIMIT 10