SPARQL Query returning strange results - sparql

I have been working with SPARQL for nearly two years, but I have never seen a situation this strange before. (Note: I am using a native triplestore.)
Query1:
prefix leaks: <http://data.ontotext.com/resource/leaks/>
prefix leak: <http://data.ontotext.com/resource/leak/>
SELECT * WHERE
{
leaks:entity-10000001 leak:jurisdiction_description ?object.
}
Query2:
prefix leaks: <http://data.ontotext.com/resource/leaks/>
prefix leak: <http://data.ontotext.com/resource/leak/>
SELECT * WHERE
{
leaks:entity-10000001 ?p ?object.
}
Here Query1 returns some results, whereas Query2 returns no results.
To put it another way, if I merge the two queries above into the query below (Query3), it returns a few records.
Query3:
prefix leaks: <http://data.ontotext.com/resource/leaks/>
prefix leak: <http://data.ontotext.com/resource/leak/>
SELECT distinct ?s WHERE
{
?s leak:jurisdiction_description ?object.
FILTER NOT EXISTS { ?s ?p ?o}.
}
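Ideally this should not be the case. Query3 should always return no results, since the second pattern, ?s ?p ?o, is a superset of the first one, ?s leak:jurisdiction_description ?object: any subject that matches the first pattern also matches the second, so FILTER NOT EXISTS should remove it.
I have no clue why this is happening.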

I guess you have an issue with your triplestore.
Try the same queries at http://data.ontotext.com/sparql, which is the "home" of this dataset and is backed by GraphDB. There, Query2 returns 23 results, as expected.
The fact that your store returns results for Query3 is a serious indication that something is wrong with your setup.

I found out why this was happening. For some reason the index files (pos, sop, etc.) had not been synchronized properly and had gone into an inconsistent state. When I deleted the primary-triples folder under MARMOTTA_HOME and re-ingested the data, it started working, since that forced the triplestore to reindex its data. Thanks @Jeen Broekstra for the heads-up :)
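To see why Query3 can only return rows when the store is inconsistent, here is a toy Python model (not Marmotta's actual storage code). The same triples are kept in two hypothetical indexes, one keyed by subject and one keyed by predicate, and the two patterns of Query3 are answered from different ones. While the indexes agree, the result is always empty; deleting an entry from one index reproduces the symptom described in the question.

```python
"""Toy model of Query3 evaluated over two index structures.

Hypothetical: `by_subject` and `by_predicate` stand in for a store's
subject-first and predicate-first indexes; real stores differ in detail.
"""
from collections import defaultdict

P = "leak:jurisdiction_description"

# Hypothetical example data; only the first subject comes from the question.
EXAMPLE_TRIPLES = [
    ("leaks:entity-10000001", P, '"some jurisdiction"'),
    ("leaks:entity-10000002", P, '"another jurisdiction"'),
]


def build_indexes(triples):
    """Index every triple twice: once keyed by subject, once by predicate."""
    by_subject = defaultdict(set)
    by_predicate = defaultdict(set)
    for s, p, o in triples:
        by_subject[s].add((p, o))
        by_predicate[p].add((s, o))
    return by_subject, by_predicate


def query3(by_subject, by_predicate, predicate):
    """SELECT DISTINCT ?s WHERE { ?s <predicate> ?object
    FILTER NOT EXISTS { ?s ?p ?o } }"""
    # The first pattern binds ?s through the predicate-keyed index ...
    candidates = {s for s, _ in by_predicate.get(predicate, set())}
    # ... while NOT EXISTS { ?s ?p ?o } probes the subject-keyed index.
    return {s for s in candidates if not by_subject.get(s)}


if __name__ == "__main__":
    by_subject, by_predicate = build_indexes(EXAMPLE_TRIPLES)
    print(query3(by_subject, by_predicate, P))  # set(): the indexes agree
    # Simulate a desynchronised index: the subject index lost one entry.
    del by_subject["leaks:entity-10000001"]
    print(query3(by_subject, by_predicate, P))  # {'leaks:entity-10000001'}
```

With consistent indexes every candidate subject has at least one triple in the subject index, so NOT EXISTS always removes it; a non-empty Query3 result therefore points at the indexes disagreeing rather than at the query.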

Related

MarkLogic: How can we perform a case-insensitive search in a pure SPARQL query?

I have a scenario where I am trying to find content using a SPARQL query over the triples stored in MarkLogic. The filter condition in the SPARQL query needs to perform a case-insensitive search for a particular term. How can I do that?
For example:
filter(strstarts(?personName, "FA"^^xs:string))
The above filter should also fetch results whose personName value starts with the lowercase form (like: fa). I think this gives a clear idea of the issue I am asking about.
I believe you have two options to do case-insensitive search using SPARQL in MarkLogic.
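If you want to use SPARQL only, then you can do the following (modify the select statement as needed). The name is matched in the object position, since a literal such as a name cannot be the subject of a triple, and strstarts is used because an equality test would only match names that are exactly "fa":
select * where {
?person ?p ?personName
FILTER (strstarts(lcase(str(?personName)), "fa"))
}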
As an alternative, you could also mix some fn:* functions into your SPARQL statement, so you could do something similar to:
prefix fn: <http://www.w3.org/2005/xpath-functions#>
select * where {
?person ?p ?personName
FILTER (fn:starts-with(fn:lower-case(str(?personName)), fn:lower-case("FA")))
}
Don't forget that in MarkLogic you can use any fn:* or cts:* function as well (the prefix for cts:* functions is prefix cts: <http://marklogic.com/cts#>).
I hope this helps.
In addition to Tamas's good suggestions, there is also REGEX, which accepts a case-insensitivity flag. Something like:
select * where {
?person ?p ?personName
FILTER( regex(str(?personName), "^fa", "i") )
}
HTH!
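The difference between a case-sensitive prefix test and the two case-insensitive approaches (lowercasing before comparing, and a regex with the "i" flag) can be sketched in Python. This is only an analogy: str.lower() and re.IGNORECASE approximate SPARQL's LCASE and the regex "i" flag, and may differ for some non-ASCII characters; the names are hypothetical.

```python
"""Compare case-sensitive and case-insensitive prefix filters.

Assumption: str.lower() and re.IGNORECASE stand in for SPARQL's LCASE and
the regex "i" flag; they can differ for some non-ASCII characters.
"""
import re

# Hypothetical personName values.
NAMES = ["Fabian", "fatima", "FAY", "Olaf"]


def strstarts(name, prefix):
    """Like FILTER(strstarts(?personName, "FA")): case-sensitive."""
    return name.startswith(prefix)


def strstarts_lcase(name, prefix):
    """Like FILTER(strstarts(lcase(str(?personName)), "fa"))."""
    return name.lower().startswith(prefix.lower())


def regex_i(name, prefix):
    """Like FILTER(regex(str(?personName), "^fa", "i"))."""
    return re.search("^" + re.escape(prefix), name, re.IGNORECASE) is not None


if __name__ == "__main__":
    for test in (strstarts, strstarts_lcase, regex_i):
        print(test.__name__, [n for n in NAMES if test(n, "FA")])
```

Only the case-sensitive test drops "Fabian" and "fatima"; the two case-insensitive variants select the same names, so the choice between them is mostly about what a given store optimizes better.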

Bizarre results with bif:contains - corrupted full text index?

I recently built a Virtuoso database (version 07.10.3207) using DBpedia data. I'm trying to build some queries for it, and I'm encountering very strange results. For example:
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select ?s ?p ?o where {
?s ?p ?o .
?s rdfs:label "Almond"@en .
?o bif:contains "mythical"
}
This yields a hit. One might expect it to mean that the comment field (the field that matches "mythical") for Almond contains the word "mythical". However, it does not. It is, in fact:
"The almond (/??m?nd/) (Prunus dulcis, syn. Prunus amygdalus, Amygdalus communis, Amygdalus dulcis) (or badam in Indian English, from Persian: ??????) is a species of tree native to the Middle East and South Asia. "Almond" is also the name of the edible and widely cultivated seed of this tree."@en
Many other queries yield similarly strange results.
Trying the same queries on the public dbpedia endpoint does not yield these bizarre results, so I know it's somehow a problem with my database. I guessed that it might have to do with some corruption of the full text indices.
I tried the following, without a super-clear understanding of what exactly they might do, based on other notes I was able to find:
DB.DBA.RDF_OBJ_FT_RULE_ADD(null, null, 'All');
DB.DBA.VT_INC_INDEX_DB_DBA_RDF_OBJ();
DB.DBA.RDF_OBJ_FT_RECOVER();
DB.DBA.VT_INDEX_DB_DBA_RDF_OBJ();
Thus far, no dice. I'm sort of wondering if it might have to do with the mangled characters in the comment field - the online dbpedia endpoint renders them properly, while my Virtuoso installation just gives question marks, as seen above. No idea even how to begin approaching this though.
I did include SQL_UTF8_EXECS = 1 in virtuoso.ini (and subsequently restarted the server), which still left me with question marks in the results.
Actually, it does not appear to have anything to do with those question marks; I ran the following query:
select ?s ?p ?o where {
?s ?p ?o .
?o bif:contains "mythical" .
FILTER (!regex(?o, "mythical", "i"))
}
A pseudorandom selection of hits, none of which contain "mythical" or "?":
"Asgrrr"
"403 BC"
"Potential infinity"
"Beauty and the Beast (talk show)"
"Alberta highway highway 22"
The same query, run at http://dbpedia.org/sparql, returns nothing (as it should).
Any ideas?
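The diagnostic query above (full-text hits that the regex says do not contain the word) is essentially a consistency check on the full-text index. A toy Python version of that check, using a hypothetical word index rather than Virtuoso's real one, looks like this:

```python
"""Toy sanity check for a word index, mirroring the diagnostic query:
?o bif:contains "mythical" . FILTER (!regex(?o, "mythical", "i"))

Hypothetical model only; Virtuoso's real full-text index is not reproduced.
"""
import re
from collections import defaultdict


def tokenize(text):
    """Split a literal into lowercase words."""
    return re.findall(r"\w+", text.lower())


def build_index(literals):
    """Map each lowercase word to the ids of the literals containing it."""
    index = defaultdict(set)
    for lit_id, text in literals.items():
        for word in tokenize(text):
            index[word].add(lit_id)
    return index


def suspicious_hits(index, literals, term):
    """Ids the index returns for `term` whose text does not contain it."""
    return {
        lit_id
        for lit_id in index.get(term.lower(), set())
        if not re.search(re.escape(term), literals[lit_id], re.IGNORECASE)
    }


# Hypothetical literals; the last two are titles from the hits listed above.
LITERALS = {1: "A mythical creature", 2: "403 BC", 3: "Potential infinity"}

if __name__ == "__main__":
    index = build_index(LITERALS)
    print(suspicious_hits(index, LITERALS, "mythical"))  # set()
    index["mythical"].update({2, 3})  # simulate a corrupted posting list
    print(sorted(suspicious_hits(index, LITERALS, "mythical")))  # [2, 3]
```

A correctly built index yields no suspicious hits, so any non-empty result (as on the local Virtuoso instance) suggests the index and the stored literals have drifted apart.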
Rebuilding the database did not correct the problem. I was, however, able to get a working version by taking the following steps. Some of them may have been unnecessary, but given how long it takes, I haven't done controlled experiments to narrow things down to the minimum.
First, delete the database and associated files, to start with a blank slate.
Edit virtuoso.ini to uncomment/include:
SQL_UTF8_EXECS = 1
Start up Virtuoso, and then from isql issue the following commands:
DB.DBA.RDF_OBJ_FT_RULE_ADD (null, null, 'All');
DB.DBA.VT_BATCH_UPDATE ('DB.DBA.RDF_OBJ', 'OFF', null);
DB.DBA.VT_INC_INDEX_DB_DBA_RDF_OBJ ();
DB.DBA.RDF_OBJ_FT_RECOVER ();
COMMIT WORK;
CHECKPOINT;
CHECKPOINT_INTERVAL(60000);
Then, load the data.
Then, call:
COMMIT WORK;
CHECKPOINT;
DB.DBA.VT_INC_INDEX_DB_DBA_RDF_OBJ();
CHECKPOINT;
COMMIT WORK;
CHECKPOINT;
CHECKPOINT_INTERVAL(60);
COMMIT WORK;
Enjoy your fully-text-searchable database!

SPARQL query for all people for an institution on dbpedia

I'm trying to extract alumni lists for universities using SPARQL.
I've identified the ontologies I need:
http://mappings.dbpedia.org/server/ontology/classes/University
http://mappings.dbpedia.org/server/ontology/classes/Person
I tried this query:
SELECT * WHERE {
?University dbpedia2:alumni ?Person .
}
This seemed to make sense, except that it returns counts instead of the people that, according to the ontology, the property contains.
I found this query somewhere which seemed to do a better job finding universities, but was very slow.
SELECT * WHERE {
{ <http://dbpedia.org/ontology/University> ?property ?hasValue }
UNION
{ ?isValueOf ?property <http://dbpedia.org/ontology/University> }
}
I also tried going the other way, start with all people and look for their almae matres, in this form:
SELECT * WHERE {
?person dbpedia2:almaMater ?University
}
But this is much slower, possibly because searching through the people space is too laborious. It does actually work, but in practice it returns a different set of results: all people with a listed alma mater, rather than all people listed by universities as alumni. I'd prefer a syntax that gets me the alumni.
How can I phrase this to return all alumni listed for universities?
The performance of DBpedia's SPARQL endpoint can be a bit unreliable at times. After all, it's a public service, and it isn't intended for huge queries. Nonetheless, I think you can get what you're looking for here without too much trouble. First, you can check how many results there are with a query like this at the public SPARQL endpoint:
select (count(*) as ?nResults) where {
?person dbpedia-owl:almaMater ?almaMater
}
SPARQL results (64928)
Now, if you just want the big list, you'd get it like this. The order by helps organize the results for easy consumption, but isn't technically necessary:
select ?almaMater ?person where {
?person dbpedia-owl:almaMater ?almaMater
}
order by ?almaMater ?person
If you need to place some additional restrictions on ?almaMater, e.g., to ensure that it's a university, then you can add them to the query. For instance:
select ?almaMater ?person where {
?person dbpedia-owl:almaMater ?almaMater .
?almaMater a dbpedia-owl:University .
}
order by ?almaMater ?person
In your last query, you are almost there. However, you are currently asking for any resource that can take the place of the ?University variable. As you only want universities to take that place, you can use another triple to further restrict that variable:
SELECT * WHERE {
?University a dbpedia-owl:University.
?person dbpedia2:almaMater ?University.
}
This means that ?University can only be an individual of class dbpedia-owl:University (where dbpedia-owl is mapped to http://dbpedia.org/ontology/).
Your first query:
SELECT * WHERE {
?University dbpedia2:alumni ?Person .
}
isn't just returning counts; it's returning both counts and individual alumni. Apparently DBpedia's data here is of poor quality, and a number of triples misuse the dbpedia2:alumni relation.
You can filter out the counts by adding a second condition requiring that the value bound to ?person be a member of the appropriate class:
SELECT * WHERE {
?university dbpedia2:alumni ?person .
?person rdf:type <http://dbpedia.org/ontology/Person>
}
Running this shows that very few individuals are tagged as alumni; the data is surprisingly scant, unfortunately.
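Adding a type triple (rdf:type dbpedia-owl:Person here, or a dbpedia-owl:University here) works as a join: only bindings that also appear as subjects of the type triple survive. A toy Python version of that join, with hypothetical IRIs and a hypothetical count literal, shows the counts dropping out:

```python
"""Toy join showing how a type triple filters out count literals.

Hypothetical data: the ex: IRIs and the count are illustrative, not DBpedia's.
"""
ALUMNI = "dbpedia2:alumni"
RDF_TYPE = "rdf:type"
PERSON = "dbpedia-owl:Person"

TRIPLES = {
    ("ex:University_A", ALUMNI, "ex:Alice"),
    ("ex:University_A", ALUMNI, 12000),  # a count misusing the property
    ("ex:University_B", ALUMNI, "ex:Bob"),
    ("ex:Alice", RDF_TYPE, PERSON),
    ("ex:Bob", RDF_TYPE, PERSON),
}


def alumni_of_type(triples, required_type):
    """?university dbpedia2:alumni ?person . ?person rdf:type <required_type>"""
    # Resources that carry the required type.
    typed = {s for s, p, o in triples if p == RDF_TYPE and o == required_type}
    # Keep only alumni bindings whose object is one of those resources.
    return sorted((u, o) for u, p, o in triples if p == ALUMNI and o in typed)


if __name__ == "__main__":
    for university, person in alumni_of_type(TRIPLES, PERSON):
        print(university, person)
```

The count literal can never be the subject of an rdf:type triple, so it is excluded without any special handling for numbers.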

Query on sindice SPARQL endpoint

I tried to make this query on http://sparql.sindice.com/
PREFIX rev: <http://purl.org/stuff/rev#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT *
WHERE
{
?thing rdfs:label ?name .
?thing rev:hasReview ?review .
filter regex(str(?name), "harlem", "i")
} LIMIT 10
And it returns 504 Gateway Time-out
The server didn't respond in time.
What am I doing wrong?
Thanks.
You made a query that was too hard for the endpoint to answer in a timely fashion, which is why you got a timeout response. Note that their website states the following:
all queries are time and resource limited. notice that this means that
sometime you will get incomplete or even no results. If this is
happening often for you or you really want to run more complex queries
please contact us
Your query essentially selects a vast swathe of data and then makes the engine run a regular expression over every possible value, which is extremely slow.
I believe Sindice uses Virtuoso as its SPARQL implementation, so you can cheat and use the Virtuoso-specific full-text query extension, like so:
PREFIX rev: <http://purl.org/stuff/rev#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT *
WHERE
{
?thing rdfs:label ?name .
?thing rev:hasReview ?review .
?name bif:contains "harlem" .
}
LIMIT 10
However, this query also seems to time out. If you can add more conditions to constrain your query further, you will have a better chance of getting results in a timely fashion.
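The cost difference between a regex FILTER and a full-text index can be sketched with a toy Python model: the regex has to be evaluated against every candidate label, whereas a word index answers with a single lookup. This is a hypothetical illustration, not Virtuoso's bif:contains, which has its own word-matching rules; in this toy model the index only matches whole words, so its results can differ from a substring regex.

```python
"""Why FILTER regex over every label is slow compared with an index lookup.

Toy model: a linear regex scan versus a hypothetical word index; it does
not reproduce Virtuoso's bif:contains, which has its own matching rules.
"""
import re
from collections import defaultdict

# Hypothetical rdfs:label values.
LABELS = ["Harlem Renaissance", "Spanish Harlem", "Brooklyn Bridge", "Queens"]


def regex_scan(labels, pattern):
    """Run the regex on every label; returns (matches, labels examined)."""
    rx = re.compile(pattern, re.IGNORECASE)
    matches = [label for label in labels if rx.search(label)]
    return matches, len(labels)


def build_word_index(labels):
    """Map each lowercase word to the labels that contain it."""
    index = defaultdict(list)
    for label in labels:
        for word in set(re.findall(r"\w+", label.lower())):
            index[word].append(label)
    return index


def index_lookup(index, word):
    """One dictionary probe, however many labels there are."""
    return index.get(word.lower(), [])


if __name__ == "__main__":
    print(regex_scan(LABELS, "harlem"))
    print(index_lookup(build_word_index(LABELS), "harlem"))
```

The index is built once ahead of time, which is the work a full-text extension moves out of query time; the regex scan repeats its work on every query and grows with the number of labels.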

sparql empty result for dbpedia-owl:influenced property

I am trying to retrieve the value of dbpedia-owl:influenced on a page, e.g. Andy_Warhol.
The query I write is:
PREFIX rsc: <http://dbpedia.org/resource>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology>
SELECT ?o WHERE {
rsc:Andy_Warhol dbpedia-owl:infuenced ?o .
}
but it is EMPTY.
What is strange is that when I run the same query for another ontology property like "birthPlace", the SPARQL engine returns a result:
SELECT ?o WHERE {
rsc:Andy_Warhol dbpedia-owl:birthplace ?o .
}
which is a link to another resource:
dbpedia.org/resource/Pittsburgh
I am just confused about how to write this query.
Besides several formal errors addressed in Joshua's answer, there is also a semantic problem: the properties you are looking for seem, in this case, to be found on the entities that were influenced.
This query might give you the desired results:
PREFIX rsc: <http://dbpedia.org/resource/>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
SELECT ?s WHERE {
?s dbpedia-owl:influencedBy rsc:Andy_Warhol .
}
There are a few issues here. One is that the SPARQL, as presented, isn't correct. I edited to make the prefix syntax legal, but the prefixes were still wrong (they didn't end with a final slash). You don't want to be querying for http://dbpedia.org/resourceAndy_Warhol after all; you want to query for http://dbpedia.org/resource/Andy_Warhol. Some standard namespaces for DBpedia are listed on their SPARQL endpoint. Using those namespaces and the SPARQL endpoint, we can ask for all the triples that have http://dbpedia.org/resource/Andy_Warhol as the subject with this query:
SELECT * WHERE {
dbpedia:Andy_Warhol ?p ?o .
}
In the results produced there, you'll see the one using http://dbpedia.org/ontology/birthPlace (note the capital P in birthPlace), but you won't see any triples with the predicate http://dbpedia.org/ontology/infuenced (note also the misspelling: infuenced rather than influenced), so it makes sense that your first query has no results. Do you have some reason to suppose that there should be some results?
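The missing-slash problem follows from how SPARQL expands a prefixed name: the prefix IRI and the local part are concatenated verbatim. A small Python sketch of that expansion (ignoring the escape rules of the full grammar) makes the difference visible:

```python
"""How a SPARQL prefixed name expands: the prefix IRI is concatenated
verbatim with the local part, so a missing final slash changes the IRI.
Simplified: escape sequences in local names are not handled."""


def expand(prefixes, prefixed_name):
    """Expand e.g. 'rsc:Andy_Warhol' using a prefix -> IRI mapping."""
    prefix, local = prefixed_name.split(":", 1)
    return prefixes[prefix] + local


WRONG = {"rsc": "http://dbpedia.org/resource"}
RIGHT = {"rsc": "http://dbpedia.org/resource/"}

if __name__ == "__main__":
    print(expand(WRONG, "rsc:Andy_Warhol"))  # http://dbpedia.org/resourceAndy_Warhol
    print(expand(RIGHT, "rsc:Andy_Warhol"))  # http://dbpedia.org/resource/Andy_Warhol
```

Because no separator is inserted, the prefix declaration itself must end with the slash (or hash) that the target IRIs use.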