How to use regex in a SPARQL query with an umlaut? - sparql

I tried to execute a SPARQL query that uses regex to get all resources whose label contains a certain string (case insensitive, that's why I use regex). Unfortunately, no resource that contains an umlaut is being returned. The regex should match any label that contains Sigmund or sigmund.
Example:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?s WHERE {{
?s rdfs:label ?label .
FILTER ( LANG(?label) = "en" )
BIND (STR(?label) AS ?label_text)
FILTER(REGEX(?label_text, "Sigmund", "i"))
}}
The result of this query is the following:
http://dbpedia.org/resource/Category:Analysands_of_Sigmund_Freud
http://dbpedia.org/resource/Category:Books_about_Sigmund_Freud
http://dbpedia.org/resource/Category:Books_by_Sigmund_Freud
http://dbpedia.org/resource/Category:Case_studies_by_Sigmund_Freud
http://dbpedia.org/resource/Category:Compositions_by_Sigmund_Romberg
http://dbpedia.org/resource/Category:Cultural_depictions_of_Sigmund_Freud
http://dbpedia.org/resource/Category:Essays_by_Sigmund_Freud
http://dbpedia.org/resource/Category:Musicals_by_Sigmund_Romberg
http://dbpedia.org/resource/Category:Operas_by_Sigmund_Theophil_Staden
http://dbpedia.org/resource/Category:Sigmund_Freud
http://dbpedia.org/resource/Sigmund_Freud
http://dbpedia.org/resource/Category:Sigmund_Freud's_views
http://dbpedia.org/resource/Category:Songs_with_music_by_Sigmund_Romberg
http://dbpedia.org/resource/Category:Translators_of_Sigmund_Freud
http://dbpedia.org/resource/Category:Works_about_Sigmund_Freud
http://dbpedia.org/resource/Category:Works_by_Sigmund_Freud
http://dbpedia.org/resource/Sigmund_Mauderli
http://dbpedia.org/resource/Sigmund_Spaeth
http://dbpedia.org/resource/Barbara_Boggs_Sigmund
http://dbpedia.org/resource/Ben_Sigmund
http://dbpedia.org/resource/Sigmund
http://dbpedia.org/resource/Anne_Sigmund
http://dbpedia.org/resource/Dagobert_Sigmund_von_Wurmser
http://dbpedia.org/resource/Bernhard_Sigmund_Schultze
http://dbpedia.org/resource/Sigmund_Eisner
http://dbpedia.org/resource/Cabinet_of_Sigmundur_Davíð_Gunnlaugsson
http://dbpedia.org/resource/Carl_Ludwig_Sigmund
http://dbpedia.org/resource/Anne_Marie_Sigmund
The problem is that this list is incomplete: at least one resource is missing (Sigmund Jähn).
I think this happens because of the umlaut, since Sigmund Jähn is the only resource whose name contains such a character.
I already tried using the whole name in the regex, using the u flag as a regex option, writing the ä as a Unicode escape, using \X instead of the ä, and replacing the ä with an a, but nothing works.

This seems to be an issue specifically with http://dbpedia.org/resource/Sigmund_J%C3%A4hn, presumably not related to umlauts.
While its page and, e.g., its Turtle representation list an English rdfs:label, DBpedia's endpoint doesn't find it:
SELECT * WHERE { <http://dbpedia.org/resource/Sigmund_J%C3%A4hn> rdfs:label ?o . }
It finds the labels in all other languages (19 of 20), and most of them contain the umlaut. I have no idea why it’s not finding the English one.
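One way to see what the endpoint actually holds is to list every label together with its language tag. This is only a sketch; it assumes the public DBpedia endpoint, and the variable names ?o and ?lang are arbitrary:

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
# List each label of the resource with its language tag,
# so a missing "en" entry is easy to spot.
SELECT ?lang ?o WHERE {
  <http://dbpedia.org/resource/Sigmund_J%C3%A4hn> rdfs:label ?o .
  BIND (LANG(?o) AS ?lang)
}
ORDER BY ?lang
```

If no row with the tag en shows up, the LANG(?label) = "en" filter in the original query excludes the resource regardless of what the regex matches.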
Regarding "case insensitive, that's why I use regex":
You could avoid using REGEX() here, because SPARQL offers LCASE()/UCASE(), whose result can be passed to CONTAINS():
?s rdfs:label ?label .
FILTER( LANG(?label) = "en" ) .
FILTER( CONTAINS(LCASE(?label), "sigmund") ) .
This filter checks if the lower-cased ?label contains the string "sigmund".
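Put together as a complete query, this could look like the following sketch. It assumes the same DBpedia endpoint as in the question; the LIMIT is only there to keep the result small and can be dropped:

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?s WHERE {
  ?s rdfs:label ?label .
  FILTER ( LANG(?label) = "en" )
  # LCASE() keeps the language tag; CONTAINS() accepts a plain string as second argument
  FILTER ( CONTAINS(LCASE(?label), "sigmund") )
}
LIMIT 100
```

Like REGEX(), this still has to inspect every English label, so it may be slow on a large public endpoint.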
Suggested by UninformedUser:
A better alternative, but specific to Virtuoso's triple store (which is used by the DBpedia endpoint), is using bif:contains:
?s rdfs:label ?label .
FILTER( LANG(?label) = "en" ) .
?label bif:contains "'Sigmund'" . # the string value itself is enclosed in quotes to allow to search for spaces or other non-alphanumeric chars
It uses Virtuoso's full-text index instead of scanning every label, which is typically much faster.
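As a sketch of the phrase search mentioned in the comment above, the following looks for labels containing the two-word phrase Sigmund Freud. It assumes a Virtuoso endpoint, where the bif: prefix is built in; on other triple stores this query will not work:

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?s WHERE {
  ?s rdfs:label ?label .
  # the inner single quotes make the two words one phrase for the full-text index
  ?label bif:contains "'Sigmund Freud'" .
  FILTER ( LANG(?label) = "en" )
}
```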

Related

How to query for publication date of books using SPARQL in DBPEDIA

I am trying to retrieve the publication date and the number of pages for books in DBpedia. I tried the following query and it gives empty results. I see that these are properties of the Book class (http://mappings.dbpedia.org/server/ontology/classes/Book), but I could not retrieve them.
I would like to know if there is an error in the code or if dbpedia does not store these dates related to books.
SELECT ?book ?genre ?date ?numberOfPages
WHERE {
?book rdf:type dbpedia-owl:Book .
?book dbp:genre ?genre .
?book dbp:firstPublicationDate ?date .
OPTIONAL {?book dbp:numberOfPages ?numberOfPages .}
}
The dbp:firstPublicationDate does not work for two reasons:
First, as pointed out in the first answer, you used the wrong prefix.
But even if you correct it, you'll see that you still get no results. The best thing to do then is to test with the minimum number of patterns: in your case, ask for books with a first publication date, using only two triple patterns. If you still don't get results, check how <http://dbpedia.org/ontology/firstPublicationDate> is actually used with a query like this:
SELECT ?class (COUNT (DISTINCT ?s) AS ?instances)
WHERE {
?s <http://dbpedia.org/ontology/firstPublicationDate> ?date ;
a ?class
}
GROUP BY ?class
ORDER BY DESC(?instances)
LIMIT 1000
Mapping-based properties use the namespace http://dbpedia.org/ontology/, so the prefix must be dbo rather than dbp, which stands for http://dbpedia.org/property/.
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbp: <http://dbpedia.org/property/>
SELECT ?book ?genre ?date ?numberOfPages
WHERE {
?book a dbo:Book ;
dbp:genre ?genre ;
dbo:firstPublicationDate ?date .
OPTIONAL {?book dbp:numberOfPages ?numberOfPages .}
}
Some additional comments:
Put the prefix declarations into the SPARQL query so that others here can run it without errors (also in the future). The current query uses dbpedia-owl, but this prefix is no longer predefined on the official DBpedia endpoint; it is called dbo instead.
Which brings me to the second point: if you're using a public SPARQL endpoint, show its URL.
You can debug your own SPARQL query by starting with only part of it and then adding more triple patterns. In your case, for example, you could check whether there is any triple with the property:
PREFIX dbp: <http://dbpedia.org/property/>
SELECT * WHERE {?book dbp:firstPublicationDate ?date } LIMIT 10
Update
As Ivo Velitchkov noticed in his answer below, the property dbo:firstPublicationDate is only used for mangas, etc., i.e. written work that was published periodically. Thus, the result will be empty.
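To find out which property does carry dates for books, one option is to list the properties used on dbo:Book instances whose URI mentions a date. A sketch, assuming the public DBpedia endpoint (an aggregate over all books may be slow or hit the endpoint's limits); the substring "date" is only a guess at the naming:

```sparql
PREFIX dbo: <http://dbpedia.org/ontology/>
# Count, per property, how many books use a property whose URI contains "date".
SELECT ?p (COUNT(DISTINCT ?book) AS ?books)
WHERE {
  ?book a dbo:Book ;
        ?p ?o .
  FILTER ( CONTAINS(LCASE(STR(?p)), "date") )
}
GROUP BY ?p
ORDER BY DESC(?books)
LIMIT 50
```

The properties at the top of the result are the candidates worth trying in place of dbo:firstPublicationDate.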

extract the comment from DBpedia

So my question is simple: from this URI --
http://dbpedia.org/snorql/?describe=http%3A%2F%2Fdbpedia.org%2Fresource%2FRed_Dragon_%28spacecraft%29
-- I want to extract specific things like --
rdfs:comment
rdfs:label
How can I do that?
Currently your query gets all properties, including rdfs:label and rdfs:comment. To get just those properties, substitute them for ?property, e.g.:
{ <http://dbpedia.org/resource/Red_Dragon_(spacecraft)> rdfs:label ?label .
<http://dbpedia.org/resource/Red_Dragon_(spacecraft)> rdfs:comment ?comment .
}
Also, you may want to filter for language tags, e.g., FILTER (lang(?label) = "en").
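A complete query along these lines might look like the following sketch. It assumes the DBpedia endpoint and that only the English label and comment are wanted:

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?label ?comment WHERE {
  <http://dbpedia.org/resource/Red_Dragon_(spacecraft)> rdfs:label ?label ;
      rdfs:comment ?comment .
  # keep only the English values of both properties
  FILTER ( LANG(?label) = "en" && LANG(?comment) = "en" )
}
```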

using bind concat in construct query

I have the following query
CONSTRUCT{
?entity a something;
a label ?label .
}
WHERE
{
?entity a something;
a label ?label .
BIND(CONCAT(STR( ?label ), " | SOME ADDITIONAL TEXT I WOULD LIKE TO APPEND MANUALLY") ) AS ?label ) .
}
I simply want to concatenate some text with ?label, however when running the query I get the following error:
BIND clause alias '?label' was previously used
I only want to return a single instance of ?label hence, I defined it in the construct clause.
The error message is accurate, but it is only the first of many you will get with this query. My usual advice is to look at some SPARQL learning resources to understand the basics of triple-based graph pattern matching; below are a couple of hints on what to look for. CONSTRUCT isn't a bad place to start, and the following should almost do what I think you intend:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ex: <http://example.org/example#>
CONSTRUCT{
?entity rdfs:label ?label .
}
WHERE
{
?entity a ex:something ;
rdfs:label ?oldlabel .
BIND(CONCAT(STR( ?oldlabel ), " | SOME ADDITIONAL TEXT I WOULD LIKE TO APPEND MANUALLY") AS ?label ) .
}
Quite a few things are different in that query, so check whether it does what you want. One hint is the syntactic difference between '.' and ';' as separators of triple patterns. Another is that each term in a triple pattern must be either a URI (written as a qname in the example) or a variable (prefixed by '?'). Neither 'label' nor 'something' is valid on its own.
I say "almost" because CONSTRUCT only returns a set of triples. To modify the labels, which I think is the intent, you need to use SPARQL Update, i.e.:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ex: <http://example.org/example#>
DELETE {
?entity rdfs:label ?oldlabel .
}
INSERT{
?entity rdfs:label ?label .
}
WHERE
{
?entity a ex:something .
?entity rdfs:label ?oldlabel .
BIND(CONCAT(STR( ?oldlabel ), " | SOME ADDITIONAL TEXT I WOULD LIKE TO APPEND MANUALLY") AS ?label ) .
}
Note how the triple pattern finds matches for ?oldlabel and deletes them, inserting the newly bound ?label instead. This query assumes a default graph is defined that holds both the original data and the target for updates. If not then the graph needs to be specified using WITH or GRAPH. (Also included another hint on the syntactic difference between using '.' and ';' to separate triple patterns.)
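If the data lives in a named graph, the same update can be scoped with WITH. This is a sketch only; the graph IRI http://example.org/graph is a hypothetical placeholder for your own graph:

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ex: <http://example.org/example#>
# WITH applies the DELETE, INSERT and WHERE parts to this one graph
WITH <http://example.org/graph>
DELETE { ?entity rdfs:label ?oldlabel . }
INSERT { ?entity rdfs:label ?label . }
WHERE {
  ?entity a ex:something ;
          rdfs:label ?oldlabel .
  BIND(CONCAT(STR(?oldlabel), " | SOME ADDITIONAL TEXT I WOULD LIKE TO APPEND MANUALLY") AS ?label)
}
```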

SPARQL : how to get values for a particular resource?

I am trying this query:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?label
WHERE
{
?AGE rdfs:label ?label.
}
I need all the values of AGE from my model, but this query also gives me the values of other resources that have the same rdfs:label property.
For example, I have given the resource gender an rdfs:label property as well, so in my result I get both the age values and the gender values.
Can anybody tell me where I am going wrong?
It seems you may be assigning some semantics to the variable '?AGE'. SPARQL is a graph pattern matching language and anything with a '?' as the first character is a variable - or better yet, an unknown in the graph pattern match. I.e., the following is an equivalent query to yours:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?label
WHERE
{ ?s rdfs:label ?label .
}
This will find all triples that have an rdfs:label property and select the value of ?label.
If you have a specific resource you want to query, then specify that resource in the subject, for example:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ex: <http://example.org/ex>
SELECT ?label
WHERE
{ ex:AGE rdfs:label ?label .
}
So understanding the difference between an unknown (denoted by '?' (or '$')) and a known (a qname or a full URI) is important to understand how SPARQL performs graph pattern matching.
There is lots of SPARQL learning material on the Web, so I suggest looking into some of it to learn the basics.

full text search in jena sparql?

I am new to SPARQL and I am trying to search for a word in one of the properties. Simple queries work fine, but I don't know how to perform a full-text search. I saw this example on the Jena website:
PREFIX text: <http://jena.apache.org/text#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?s
{ ?s text:query (rdfs:label 'word' 10) ;
rdfs:label ?label
}
My model contains a property named SUB: and I want to write a query for it. I don't understand what text and query mean in text:query in the above example. Pardon me if this question doesn't meet the requirements of SO.
Link to website: http://jena.apache.org/documentation/query/text-query.html
You may not need a full text index:
SELECT ?s
{ ?s your:property ?o .
FILTER regex(str(?o), "word", "i")
}
If you do need one: text:query is a "property function". It triggers a lookup in the Apache Lucene index and binds ?s to each result of matching 'word' (up to a limit of 10) against the rdfs:label properties, provided you have correctly configured the index and loaded the data.
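The example from the Jena documentation can be adapted to another property by replacing rdfs:label. A sketch, assuming a hypothetical namespace http://example.org/ns# for the SUB property and assuming that this property has been included in the text index configuration:

```sparql
PREFIX text: <http://jena.apache.org/text#>
PREFIX ex:   <http://example.org/ns#>   # hypothetical namespace for SUB
SELECT ?s ?value
{ ?s text:query (ex:SUB 'word' 10) ;    # look up 'word' in the index for ex:SUB, at most 10 hits
     ex:SUB ?value                      # fetch the stored value of each hit
}
```

If the property is not part of the index definition, the lookup is expected to return no matches, so the index setup is the first thing to check.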