In a SPARQL query, it is possible to choose a language for each literal (and also strip the language tag):
CONSTRUCT {?x dc:title ?stripped_title.}
WHERE{
?x rdfs:label ?title .
FILTER (langMatches(lang(?title),"en"))
BIND (STR(?title) AS ?stripped_title)
}
But if the query has many different literal properties, that can become very verbose. Is there a way to set a default language for all literals in a query? Or, at least, to make this query less verbose?
I wrote this query but it doesn't work. Does anyone know what the problem is?
PREFIX : <http://www.semanticweb.org/ontologies/2009/pizza.owl#>
SELECT ?X ?Y
WHERE {?X :hasCountryOfOrigin "Italy".
?Y :hasCalorificValue "400"}
According to the Pizza ontology tutorial here, there are several issues with your query:
hasCountryOfOrigin is an object property, thus, the values can't be literals. Italy is an individual, thus, you have to use the correct URI, probably http://www.semanticweb.org/ontologies/2009/pizza.owl#Italy
The data property hasCalorificValue has values of type integer, i.e. literals should be used like "400"^^xsd:integer (or maybe xsd:int, depends on what you've chosen in Protege)
Both triple patterns in your query are not connected, i.e. no shared variable. I don't see the goal of your query.
PREFIX : <http://www.semanticweb.org/ontologies/2009/pizza.owl#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?X ?Y
WHERE {?X :hasCountryOfOrigin :Italy.
?Y :hasCalorificValue "400"^^xsd:integer}
I've studied the SPARQL specification on the topic and also found this answer rather interesting. However, the definitions are complicated enough that I still don't see the answer to my question.
I can't find any example of query with blank nodes that returns different results than the same query with variables in place of blank nodes.
For example, is there any case in which the following queries return different results:
SELECT ?a ?b
WHERE {
?a :predicate _:blankNode .
_:blankNode :otherPredicate ?b .
}
SELECT ?a ?b
WHERE {
?a :predicate ?variable .
?variable :otherPredicate ?b .
}
Maybe there are more complex queries that cause different behavior?
In particular, I wonder whether there are any examples of queries that return different results when executed on an RDF graph that doesn't contain blank nodes.
Thanks.
PS. Yes, I know that blank nodes can be used only in one BasicGraphPattern as opposed to variables. But this is not the difference I'm talking about.
The answer that you linked to is about blank nodes in the data that is being queried, not about blank nodes in the query. You're absolutely right that blank nodes in the query act just like variables. The specification says this (emphasis added):
4.1.4 Syntax for Blank Nodes
Blank nodes in graph patterns act as variables, not as references to
specific blank nodes in the data being queried.
Blank nodes are indicated by either the label form, such as "_:abc",
or the abbreviated form "[]". A blank node that is used in only one
place in the query syntax can be indicated with []. A unique blank
node will be used to form the triple pattern. Blank node labels are
written as "_:abc" for a blank node with label "abc". The same blank
node label cannot be used in two different basic graph patterns in the
same query.
As such, your queries
SELECT ?a ?b
WHERE {
?a :predicate _:blankNode .
_:blankNode :otherPredicate ?b .
}
SELECT ?a ?b
WHERE {
?a :predicate ?variable .
?variable :otherPredicate ?b .
}
behave identically. The benefit of using a blank node instead of a variable is that you can use some more compact syntax. In this case, you could write:
SELECT ?a ?b
WHERE {
?a :predicate [ :otherPredicate ?b ] .
}
Actually, in this case, since you're only looking for one property on the thing that the blank node matches, you could use a property path:
SELECT ?a ?b
WHERE {
?a :predicate/:otherPredicate ?b .
}
For most entailment regimes, blank nodes are variables within the basic graph pattern. For OWL-DL (and others) you can get more answers (examples include the "little house" and "Oedipus" examples -- the Description Logic Handbook has details).
In the definition of basic graph pattern matching in SPARQL (http://www.w3.org/TR/sparql11-query/#BasicGraphPattern), for simple entailment the instance mapping σ(b) behaves just like the solution mapping μ(v).
One factor not explicitly discussed: a blank node "variable" is not limited to matching blank nodes in the data. Like a normal variable, it can match any term (named resources, blank nodes, and literals).
However, a blank node "variable" cannot be used in expressions (FILTER, BIND, etc.) or returned as a column in the results.
There seems to be no consistent way to query for programming languages based on name. Examples:
http://dbpedia.org/page/D_(programming_language)
rdfs:label "D (programming language)"@en
dbpprop:name "D programming language"
owl:sameAs freebase:"D (programming language)"
foaf:name "D programming language"
vs.
http://dbpedia.org/page/C++
rdfs:label "C++"@en
dbpprop:name "C++"
owl:sameAs freebase:"C++"
foaf:name "C++"
Since there's no standard convention for whether "programming language", "(programming language)", "programming_language", "(programming_language)", or "" is part of the name of a programming language in DBpedia, I have no idea how to search by name consistently.
I'd like to create some sort of SPARQL query that returns http://dbpedia.org/page/D_(programming_language) for "D" and http://dbpedia.org/page/C++ for "C++", but I don't know how to do this.
Unless at least one of the various triples for programming languages uses a consistent naming convention, I'll have to hack it by querying first against name + " (programming_language)", and falling back to name + " (programming language)" and name + " programming language" when no results are found. But I'd like a much more robust method.
You could of course just match using a basic substring match or a regex, e.g. like this to find a match for "C++":
SELECT DISTINCT ?pl ?label
WHERE {
?pl a dbpedia-owl:ProgrammingLanguage ;
rdfs:label ?label .
FILTER(langMatches(lang(?label), "en"))
FILTER(regex(str(?label), "C\\+\\+"))
}
Of course, the above will be problematic for a programming language name like "D", since you will get back several matches ("D", "Dylan", "MAD", etc.). In those cases, you might want to do some clever postprocessing of the result, for example tokenizing the returned label and seeing if your input string occurs as a standalone word.
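The postprocessing step described above could look roughly like the following Python sketch. It is only an illustration under stated assumptions: the function name `is_standalone_match` is made up for this example, the sample labels are illustrative rather than actual query results, and splitting on whitespace and parentheses is just one possible tokenization.

```python
import re


def is_standalone_match(label: str, name: str) -> bool:
    """Return True if `name` occurs in `label` as a whole token."""
    # Split on whitespace and parentheses only, so that symbols such as
    # "+" or "-" stay attached to their token ("C++" remains one token).
    tokens = re.findall(r"[^\s()]+", label)
    return name.casefold() in (token.casefold() for token in tokens)


# Illustrative labels, not real endpoint output.
labels = [
    "D (programming language)",
    "Dylan (programming language)",
    "MAD (programming language)",
    "C++",
]
matches = [label for label in labels if is_standalone_match(label, "D")]
print(matches)
```

With this tokenization, "D" matches only the first label, while "Dylan" and "MAD" are rejected because "D" is not a token on its own there.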
Regex matching in SPARQL is notoriously expensive (in terms of evaluation time), but since you combine it with a type constraint that restricts matching to one particular class, the DBpedia endpoint should be able to handle this kind of query just fine.
I'd use
SELECT distinct ?pl ?label
WHERE {
?pl a dbpedia-owl:ProgrammingLanguage ;
rdfs:label ?label.
?label bif:contains "'C++'" .
filter (str (?label) like '%C++%')
filter (lang(?label)="en")
}
?label bif:contains "'C++'" (a Virtuoso-specific full-text extension) would filter to some degree, more specifically to C++, C, Objective-C etc., because ++ would be treated as noise and excluded from the actual search pattern.
After that you have C and need the two pluses, so filter (str (?label) like '%C++%') would check for them faster than a regex.
Add filter (lang(?label)="en") or filter (langmatches(lang(?label),"en")), or no language check at all, according to taste.
In ANSI SQL, you can write something like:
SELECT * FROM DBTable WHERE Description LIKE 'MEET'
or also:
SELECT * FROM DBTable WHERE Description LIKE '%MEET%'
I would like help writing the SPARQL equivalent of the above, please.
Use a regex filter. You can find a short tutorial here
Here's what it looks like:
PREFIX ns: <http://example.com/namespace>
SELECT ?x
WHERE
{ ?x ns:SomePredicate ?y .
FILTER regex(?y, "YOUR_REGEX", "i") }
YOUR_REGEX must be an expression of the XQuery regular expression language
i is an optional flag. It means that the match is case insensitive.
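To make the correspondence with SQL LIKE concrete, here is a small Python sketch that translates a LIKE pattern into a regular expression and wraps it in a FILTER clause. The helper names `like_to_regex` and `regex_filter` are hypothetical, and the sketch assumes the usual LIKE semantics (`%` for any run of characters, `_` for exactly one character, no escape character). Metacharacters are escaped by hand because the XQuery regex dialect accepts fewer backslash escapes than Python's `re.escape` produces.

```python
# Characters that have a special meaning in a regular expression.
REGEX_METACHARACTERS = set("\\.[]{}()*+?^$|")


def like_to_regex(pattern: str) -> str:
    """Translate a SQL LIKE pattern into an anchored regular expression."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")       # % matches any run of characters
        elif ch == "_":
            parts.append(".")        # _ matches exactly one character
        elif ch in REGEX_METACHARACTERS:
            parts.append("\\" + ch)  # everything else must match literally
        else:
            parts.append(ch)
    # LIKE matches the whole value, so anchor both ends.
    return "^" + "".join(parts) + "$"


def regex_filter(variable: str, like_pattern: str) -> str:
    """Build a SPARQL FILTER clause equivalent to `variable LIKE like_pattern`."""
    regex = like_to_regex(like_pattern)
    # Backslashes and double quotes must be escaped inside a SPARQL string.
    literal = regex.replace("\\", "\\\\").replace('"', '\\"')
    return f'FILTER regex(str({variable}), "{literal}")'


print(regex_filter("?y", "MEET"))    # FILTER regex(str(?y), "^MEET$")
print(regex_filter("?y", "%MEET%"))  # FILTER regex(str(?y), "^.*MEET.*$")
```

Note that `LIKE 'MEET'` without wildcards becomes `^MEET$`, i.e. an exact match, whereas an unanchored `regex(?y, "MEET")` behaves like `LIKE '%MEET%'`.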
If you have a fixed string to match you can use that directly in your graph pattern e.g.
PREFIX ns: <http://example.com/namespace>
SELECT ?x
WHERE
{ ?x ns:SomePredicate "YourString" }
Note that this does not always work, because pattern matching is based on RDF term equality, which means "YourString" is not considered the same term as, say, "YourString"@en. If that may be an issue, use the REGEX approach Tom suggests.
Also, some SPARQL engines provide a full-text search extension that lets you integrate Lucene-style queries into your SPARQL queries. This may better fit your use case and will almost certainly be more efficient for the kind of generic search that would otherwise require a REGEX.
There is probably an easy answer to this, but I can't even figure out how to formulate the Google query to find it.
I'm writing SPARQL construct queries against a dataset that includes blank nodes. So if I do a query like
CONSTRUCT {?x ?y ?z .}
WHERE {?x ?y ?z .}
Then one of my results might be:
nm:John nm:owns _:Node
Which is a problem if all of the
_:Node nm:has nm:Hats
triples don't also get into the query result somehow (because some parsers I'm using like rdflib for Python really don't like dangling bnodes).
Is there a way to write my original CONSTRUCT query to recursively add all triples attached to any bnode results such that no bnodes are left dangling in my new graph?
Recursion isn't possible. The closest I can think of is SPARQL 1.1 property paths (note: that version is out of date) but bnode tests aren't available (afaik).
You could just remove the statements with trailing bnodes:
CONSTRUCT {?x ?y ?z .} WHERE
{
?x ?y ?z .
FILTER (!isBlank(?z))
}
or try your luck fetching the next bit:
CONSTRUCT {?x ?y ?z . ?z ?w ?v } WHERE
{
?x ?y ?z .
OPTIONAL {
?z ?w ?v
FILTER (isBlank(?z) && !isBlank(?v))
}
}
(that last query is pretty punishing, btw)
You may be better off with DESCRIBE, which will often skip bnodes.
As user205512 suggests, performing that grab recursively is not possible, and as they point out, using OPTIONALs to go arbitrarily many levels down into your data to get the nodes is not feasible on anything but trivially sized databases.
Bnodes themselves are locally scoped, to the result set or to the file. There's no guarantee that the id of a bnode you get from parsing or from a result set is the same id that is used in the database (though some databases do guarantee this for query results). Furthermore, a query like "select ?s where { ?s ?p _:bnodeid1 }" is the same as "select ?s where { ?s ?p ?o }". Note that the bnode is treated as a variable in that case, not as "the thing with the id 'bnodeid1'". This quirk of the design makes it difficult to query for bnodes, so if you are in control of the data, I'd suggest not using them. It's not hard to generate names for things that would otherwise be bnodes, and using named resources instead of bnodes will not increase overhead during querying.
That does not help you recurse down and grab data, but I don't recommend doing such general queries anyway; they don't scale well and usually return more than you want or need. I'd suggest you do more directed queries. Your original CONSTRUCT query will pull down the contents of the entire database, which is generally not what you want.
Lastly, while describe can be useful, there's not a standard implementation; the SPARQL spec doesn't define any particular behavior, so what it returns is left to the database vendor, and it can be different. That can make your code less portable if you plan on trying different databases with your application. If you want a specific behavior out of describe, you're best off implementing it yourself. Doing something like the concise bounded description for a resource is an easy piece of code, though you can run into some headaches around Bnodes.
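As a rough illustration of how little code a simplified concise bounded description needs, here is a Python sketch over plain tuples. It makes several assumptions that are not from the answer above: triples are `(subject, predicate, object)` tuples of strings, blank nodes are recognizable by a `_:` prefix, the names `is_blank` and `concise_bounded_description` and the `nm:storedIn` / `nm:locatedAt` predicates are invented for the example, and reification (which the full CBD definition also covers) is ignored.

```python
from collections import defaultdict


def is_blank(term):
    # Assumption for this sketch: blank nodes are strings starting with "_:".
    return isinstance(term, str) and term.startswith("_:")


def concise_bounded_description(triples, resource):
    """Collect the triples about `resource`, following blank-node objects."""
    by_subject = defaultdict(list)
    for s, p, o in triples:
        by_subject[s].append((s, p, o))

    description = set()
    visited = set()       # guards against cycles between blank nodes
    pending = [resource]
    while pending:
        subject = pending.pop()
        if subject in visited:
            continue
        visited.add(subject)
        for triple in by_subject.get(subject, []):
            description.add(triple)
            if is_blank(triple[2]):
                pending.append(triple[2])  # expand the dangling bnode too
    return description


triples = [
    ("nm:John", "nm:owns", "_:b1"),
    ("_:b1", "nm:has", "nm:Hats"),
    ("_:b1", "nm:storedIn", "_:b2"),
    ("_:b2", "nm:locatedAt", "_:b1"),  # deliberate cycle
    ("nm:Mary", "nm:owns", "nm:Car"),
]
for triple in sorted(concise_bounded_description(triples, "nm:John")):
    print(triple)
```

The `visited` set is the part that addresses the bnode headaches mentioned above: without it, two blank nodes that reference each other would make the traversal loop forever.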
If you are working with the Ruby RDF.rb library, which offers convenient query methods on RDF::Graph objects, the following should expand blank nodes.
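require 'rdf'
require 'set'

# Recursively collect all statements reachable from a blank node.
# `rdf` is the source graph; `seen` prevents endless recursion on cyclic blank nodes.
def rdf_expand_blank_nodes(rdf, object, seen = Set.new)
  g = RDF::Graph.new
  return g unless object.node? && seen.add?(object.to_s)
  rdf.query([object, nil, nil]) do |statement|
    g << statement
    g << rdf_expand_blank_nodes(rdf, statement.object, seen)
  end
  g
end

# `rdf` is assumed to be an already loaded RDF::Graph.
rdf_type = RDF::SCHEMA.Person # for example
rdf.query([nil, RDF.type, rdf_type]).each_subject do |subject|
  g = RDF::Graph.new
  rdf.query([subject, nil, nil]) do |statement|
    g << statement
    g << rdf_expand_blank_nodes(rdf, statement.object)
  end
  # g now holds the subject's statements with all blank nodes expanded
end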