How to use the Count query to return results from two different places? - sql

Here is my query:
prefix dbc: <http://dbpedia.org/resource/Category:>
SELECT COUNT (?x) AS ?numberIn_Horror, COUNT (?y) AS ?numberIn_Action
WHERE { ?x dct:subject dbc:Horror_film .
?y dct:subject dbc:Action_film .}
I get a numeric output for this query; however, it does not match the actual number of movies in these genres.
Is there a way to change the query so that it returns the number of movies accurately?
Additionally, I am unsure as to what value it is currently returning.
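As for what the current query returns: the two triple patterns share no variable, so every match for ?x is paired with every match for ?y. Both counts then come out equal to (number of horror matches) × (number of action matches). One way to count each category separately is to give each count its own subquery. This is only a sketch and has not been run against the live endpoint; the dct: prefix declaration is added on the assumption that it stands for Dublin Core terms.

```sparql
PREFIX dbc: <http://dbpedia.org/resource/Category:>
# Assumption: dct: is the Dublin Core terms namespace.
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?numberIn_Horror ?numberIn_Action
WHERE {
  # Each subquery is aggregated on its own, so the two counts
  # no longer multiply each other.
  { SELECT (COUNT(?x) AS ?numberIn_Horror)
    WHERE { ?x dct:subject dbc:Horror_film . } }
  { SELECT (COUNT(?y) AS ?numberIn_Action)
    WHERE { ?y dct:subject dbc:Action_film . } }
}
```

Each subquery yields exactly one row, so joining them gives a single row holding both numbers.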

Related

Why does the DISTINCT keyword lead to a different entity for these two queries?

Query 1
PREFIX ns: <http://rdf.freebase.com/ns/>
SELECT DISTINCT ?x
WHERE {
FILTER (!isLiteral(?x) OR lang(?x) = '' OR langMatches(lang(?x), 'en'))
?x ns:type.object.type ns:religion.religious_leadership_title .
?x ns:religion.religious_leadership_title.leaders ?c0 .
?c0 ns:religion.religious_organization_leadership.start_date ?sk0 .
}
ORDER BY ?sk0
LIMIT 1
Query 2
PREFIX ns: <http://rdf.freebase.com/ns/>
SELECT ?x
WHERE {
FILTER (!isLiteral(?x) OR lang(?x) = '' OR langMatches(lang(?x), 'en'))
?x ns:type.object.type ns:religion.religious_leadership_title .
?x ns:religion.religious_leadership_title.leaders ?c0 .
?c0 ns:religion.religious_organization_leadership.start_date ?sk0 .
}
ORDER BY ?sk0
LIMIT 1
So the only difference between Q1 and Q2 is that there is a DISTINCT keyword when SELECT ?x in Q1. However, Q1 gives answer m.01h_90 while Q2 gives answer m.05rd8.
Ideally, this should not lead to different results. As I understand it, the purpose of DISTINCT is only to remove duplicates from the result set, so if the original results contain no duplicates, adding DISTINCT should make no difference.
You have a tie on the value you're ordering by. Specifying DISTINCT causes a different execution plan, which orders the rows differently: still by the one column requested, but with a different row coming out first. Add the output column to the ORDER BY clause and you should see consistent results between the two queries.
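To make the tie-break concrete, here is Q1 with the selected variable added as a second sort key. Treat it as a sketch: it assumes the endpoint orders the values bound to ?x consistently, and the FILTER is rewritten with ||, the standard SPARQL spelling of the OR used in the original queries.

```sparql
PREFIX ns: <http://rdf.freebase.com/ns/>
SELECT DISTINCT ?x
WHERE {
  ?x ns:type.object.type ns:religion.religious_leadership_title .
  ?x ns:religion.religious_leadership_title.leaders ?c0 .
  ?c0 ns:religion.religious_organization_leadership.start_date ?sk0 .
  FILTER (!isLiteral(?x) || lang(?x) = '' || langMatches(lang(?x), 'en'))
}
# ?x breaks ties between rows that share the same ?sk0.
ORDER BY ?sk0 ?x
LIMIT 1
```

Applying the same ORDER BY to Q2 should make both queries return the same first row.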

SPARQL Query to find results which do not meet a certain criteria

I am trying to write a SPARQL query which will return a set of patient identifier codes (?Crid) which have associated with them a specific diagnosis code (?ICD9) and DO NOT have associated with them a specific medication AND which have an order date (?OrderDate) prior to their recruitment date (?RecruitDate). I have incorporated the OBIB ontology into my graph.
Here is what I have so far (a bit simplified and with a few steps through the graph omitted for readability/sensitivity):
SELECT DISTINCT ?Crid WHERE
{?Crid a obib:CRID .
#-- Return CRIDs with a diagnosis
?Crid obib:hasPart ?ICD9 .
?ICD9 a obib:diagnosis .
#-- Return CRIDs with a medical prescription record
?Crid obib:hasPart ?medRecord .
?medRecord a obib:medicalRecord .
#-- Return CRIDs with an order date
?medRecord obib:hasPart ?OrderDate .
?OrderDate a obib:dateOfDataEntry .
#-- Return CRIDs with a recruitment date
?Crid obib:hasPart ?FormFilling .
?FormFilling a obib:formFilling .
?RecruitDate obib:isAbout ?FormFilling .
?RecruitDate a obib:dateOfDataEntry .
#-- Filter results for specific ICD9 codes
FILTER (?ICD9 = '1')
#-- Subtract Results with Certain Medication and Order Date Prior to Recruitment
#-- This is the part that I think is giving me a problem
MINUS {
FILTER (regex (?medRecord, "medication_1", "i"))
FILTER (?RecruitDate-?OrderDate < "P0D"^^xsd:dayTimeDuration)
}
}
My gut feeling is that I am not using MINUS correctly. This query returns mostly the right results: I am expecting 10 results and it is returning 12. The extraneous 2 results did take "medication_1" and have order dates before their recruitment dates, so I do not want them to be included in the set.
In case it matters, I am using a Stardog endpoint to run this query and to store my graph data.
Instead of
#-- Subtract Results with Certain Medication and Order Date Prior to Recruitment
#-- This is the part that I think is giving me a problem
MINUS {
FILTER (regex (?medRecord, "medication_1", "i"))
FILTER (?RecruitDate-?OrderDate < "P0D"^^xsd:dayTimeDuration)
}
}
I'd probably just write this without MINUS as:
FILTER (!regex(?medRecord, "medication_1", "i"))
FILTER (?RecruitDate-?OrderDate >= "P0D"^^xsd:dayTimeDuration)
I'd also probably consider whether REGEX is the right tool here (would a simple string comparison work?), but that's a different issue.
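The reason the MINUS block removes nothing is probably this: MINUS only discards a solution when its right-hand pattern shares at least one bound variable with that solution, and a block containing only FILTERs binds no variables at all. FILTER NOT EXISTS does see the outer bindings, so another option is to restate the medication patterns inside it. The following is a sketch under assumptions: prefix declarations are omitted as in the question, str() is added because regex expects a string rather than an IRI, the date comparison is copied unchanged from the question, and a patient is dropped if any one of their records matches both conditions.

```sparql
# Prefix declarations (obib:, xsd:) omitted, as in the question.
SELECT DISTINCT ?Crid WHERE {
  ?Crid a obib:CRID .
  #-- CRIDs with the diagnosis of interest
  ?Crid obib:hasPart ?ICD9 .
  ?ICD9 a obib:diagnosis .
  FILTER (?ICD9 = '1')
  #-- CRIDs with a recruitment date
  ?Crid obib:hasPart ?FormFilling .
  ?FormFilling a obib:formFilling .
  ?RecruitDate obib:isAbout ?FormFilling .
  ?RecruitDate a obib:dateOfDataEntry .
  #-- Drop the patient if any of their records matches both conditions.
  #-- ?Crid and ?RecruitDate are visible here because NOT EXISTS
  #-- substitutes the outer bindings into the inner pattern.
  FILTER NOT EXISTS {
    ?Crid obib:hasPart ?medRecord .
    ?medRecord a obib:medicalRecord .
    ?medRecord obib:hasPart ?OrderDate .
    ?OrderDate a obib:dateOfDataEntry .
    FILTER (regex(str(?medRecord), "medication_1", "i"))
    FILTER (?RecruitDate - ?OrderDate < "P0D"^^xsd:dayTimeDuration)
  }
}
```

Note that this version no longer requires a patient to have any medical record at all; if that requirement matters, keep the original ?medRecord patterns in the outer block as well.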

How to get all the entities that do not have a given attribute?

I need to formulate a SPARQL query that returns me all entities that have a given number of values for a given attribute. For example, I want to have all the countries that border with exactly two other countries.
I also might want to find all countries that do not border any other country (so the number of values of the attribute "hasBorderWith" is equal to zero). In this context, it is not clear to me whether there is a difference between the following two cases:
An entity has zero values for the given attribute.
An entity does not have the given attribute.
For example, I can imagine that a country that has no borders with other countries simply does not have the "hasBorderWith" attribute. Will that cause a problem?
There are a couple of questions embedded here. To find countries bordered by exactly two countries, you need to group by the country and compute the count. Then use HAVING, which is evaluated after the aggregate has been calculated, to filter on the count:
SELECT ?country (count(?bordered) AS ?borderCount)
WHERE {
?country a :Country .
?country :hasBorderWith ?bordered
} GROUP BY ?country
HAVING (?borderCount = 2)
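As far as I know, engines differ on whether HAVING may refer to an alias defined in the SELECT clause. Repeating the aggregate inside HAVING avoids depending on that. A sketch, with a made-up default prefix standing in for whatever vocabulary defines :Country and :hasBorderWith:

```sparql
# Hypothetical namespace; replace it with the one your data uses.
PREFIX : <http://example.org/>

SELECT ?country (COUNT(?bordered) AS ?borderCount)
WHERE {
  ?country a :Country .
  ?country :hasBorderWith ?bordered .
}
GROUP BY ?country
# The aggregate is repeated here instead of using the ?borderCount alias.
HAVING (COUNT(?bordered) = 2)
```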
For the second question, I don't see a difference between 0 and no property, and this can be computed with a negation query:
SELECT ?country
WHERE {
?country a :Country .
FILTER NOT EXISTS {
?country :hasBorderWith ?x
}
}
EDIT: to find a count of 0
Per the questions and @ASKW's suggestion, the following would get a count of 0 if there are no hasBorderWith properties:
SELECT ?country (count(?bordered) AS ?borderCount)
WHERE {
?country a :Country .
OPTIONAL {
?country :hasBorderWith ?bordered
}
} GROUP BY ?country
HAVING (?borderCount = 0)
The OPTIONAL clause allows the match to occur, but will not contribute to the count(?bordered) aggregate if ?bordered is not bound, hence members of :Country without a :hasBorderWith property will get a count of 0.

How to retrieve blank nodes from DBpedia in SPARQL, and explaining reduced results with DISTINCT

I want to retrieve blank nodes with a SPARQL query. I am using DBpedia as my dataset. For example, when I use the following query, I got a count of about 3.4 million results.
PREFIX prop:<http://dbpedia.org/property/>
select count(?x) where {
?x prop:name ?y
}
When I use the DISTINCT solution modifier, I get approximately 2.2 million results.
PREFIX prop:<http://dbpedia.org/property/>
select count(DISTINCT ?x) where {
?x prop:name ?y
}
I have two questions:
Are the 1.2 million records eliminated in the second query duplicates or blank nodes or something else?
How can I retrieve blank nodes and their values from DBpedia?
Getting Blank Nodes
A query like this could be used to retrieve (up to 10) blank nodes:
select ?bnode where {
?bnode ?p ?o
filter(isBlank(?bnode))
}
limit 10
However, I get no results. It doesn't look like there are blank nodes (as subjects, anyhow) in the DBpedia data.
Using DISTINCT and duplicate results
The reason that your queries return a different number of results is that ?x's have more than one name. A query like your first one:
select count(?x) where { ?x prop:name ?y }
on data like:
<somePerson> prop:name "Jim" .
<somePerson> prop:name "James" .
would produce 2, since there are two ways to match ?x prop:name ?y. ?x is bound to <somePerson> in both of them, but ?y is bound to different names. In a query like your second one:
select count(DISTINCT ?x) where { ?x prop:name ?y }
you're explicitly only counting the distinct values of ?x, and there's only one of those in my sample data. This is one way that you can end up with different numbers of results, and it doesn't require any blank nodes.
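If it helps to see the two numbers side by side, both counts can be requested in one query. This is a small sketch in standard SPARQL 1.1 aggregate syntax; the result variable names ?rows and ?subjects are made up for illustration.

```sparql
PREFIX prop: <http://dbpedia.org/property/>

# ?rows counts every (?x, ?y) match; ?subjects counts each ?x only once.
SELECT (COUNT(?x) AS ?rows) (COUNT(DISTINCT ?x) AS ?subjects)
WHERE {
  ?x prop:name ?y
}
```

On the two-triple sample above, ?rows would be 2 and ?subjects would be 1.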

How to form SPARQL queries that refers to multiple resources

My question is a followup with my first question about SPARQL here.
My SPARQL query results for Mountain objects are here.
From those results I picked a certain object resource.
Now I want to get values of "is dbpedia-owl:highestPlace of" records for this chosen Mountain object.
That is, names of mountain ranges for which this mountain is highest place of.
This is, as I figure, complex. Not only because I do not know the required syntax, but also I get two objects here.
One of them is Mont Blanc massif which is of type "place".
Another one is Western Alps which is of type "mountain range" - my desired record.
I need record # 2 above but not 1. I know 1 is also relevant but sometimes it doesn't follow same pattern. Sometimes the records appear to be of YAGO type, which can be totally misleading. To be safe, I simply want to discard those records whenever there is type mismatch.
How can I form my SPARQL query to get these "is dbpedia-owl:highestPlace of" records and also have the type filtering?
You can use this query. Note, however, that Mont_Blanc_massif in your example is both a dbpedia-owl:Place and a dbpedia-owl:MountainRange:
select * where {
?place dbpedia-owl:highestPlace :Mont_Blanc.
?place rdf:type dbpedia-owl:MountainRange.
}
edit after comment: filter
It is not really clear what you want to filter out (YAGO?). Technically, you can filter like this, for example:
select * where {
?place dbpedia-owl:highestPlace :Mont_Blanc.
?place rdf:type dbpedia-owl:MountainRange.
FILTER NOT EXISTS {
?place ?pred ?obj
FILTER (regex(str(?obj), "yago"))
}
}
This filters out results that have any object with 'yago' in its URL.
Extending the result from the previous answer, the appropriate query would be
select * where {
?mountain a dbpedia-owl:Mountain ;
dbpedia-owl:abstract ?abstract ;
foaf:depiction ?depiction .
?range a dbpedia-owl:MountainRange ;
dbpedia-owl:highestPlace ?mountain .
FILTER(langMatches(lang(?abstract),"EN"))
}
LIMIT 10
This selects mountains with English abstracts that have at least one depiction (or else the pattern wouldn't match) and for which there is some mountain range of which the mountain is the highest place. Without the parts from the earlier question, if you just want to retrieve mountains that are the highest place of a range, you can use a query like this:
select * where {
?mountain a dbpedia-owl:Mountain .
?range a dbpedia-owl:MountainRange ;
dbpedia-owl:highestPlace ?mountain .
}
LIMIT 10