What is the benefit of defining datatypes for literals in an RDF graph?

I am using rdflib in Python to build my first RDF graph. However, I do not understand the explicit purpose of defining Literal datatypes. I have scoured the documentation and did my due diligence with Google and the Stack Overflow search, but I cannot seem to find an actual explanation for this. Why not just leave everything as a plain old Literal?
From what I have experimented with, is this so that you can search for explicit terms in your SPARQL query with BIND? Does this also help with FILTERing, e.g. FILTER (?var1 > ?var2), where var1 and var2 should represent integers/floats/etc.? Does it help with query speed? Or am I just way off altogether?
Specifically, why add the following triple to mygraph:
mygraph.add((amazingrdf, ns['hasValue'], Literal('42.0', datatype=XSD.float)))
instead of just this?
mygraph.add((amazingrdf, ns['hasValue'], Literal("42.0")))
I suspect that there must be some purpose I am overlooking. I appreciate your help and explanations - I want to learn this right the first time! Thanks!

Comparing two xsd:integer values in SPARQL:
ASK { FILTER (9 < 15) }
Result: true. Now with xsd:string:
ASK { FILTER ("9" < "15") }
Result: false, because strings are compared character by character, and "9" sorts after "1".
Some equality checks with xsd:decimal:
ASK { FILTER (+1.000 = 01.0) }
Result: true; both denote the same number. Now with xsd:string:
ASK { FILTER ("+1.000" = "01.0") }
Result: false, because they are different strings.
Doing some maths with xsd:integer:
SELECT (1+1 AS ?result) {}
It returns 2 (as an xsd:integer). Now for strings:
SELECT ("1"+"1" AS ?result) {}
It returns "11" as an xsd:string, because adding strings is interpreted as string concatenation (at least in Jena where I tried this; in other SPARQL engines, adding two strings might be an error, returning nothing).
As you can see, using the right datatype is important to communicate your intent to code that works with the data. The SPARQL examples make this very clear, but when working directly with an RDF API, the same kind of issues crop up around object identity, ordering, and so on.
As shown in the examples above, SPARQL offers convenient syntax for xsd:string, xsd:integer and xsd:decimal (and, not shown, for xsd:boolean and for language-tagged strings). That elevates those datatypes above the rest.
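The same effect can be reproduced outside SPARQL. The sketch below is a minimal illustration, not rdflib's or any engine's actual code: it assumes a hypothetical TO_VALUE table that maps a few XSD datatype IRIs to Python constructors, roughly what an RDF library or SPARQL engine does when it interprets a typed literal, and then repeats the ordering, equality and addition checks from the examples above, plus the 42.0 literal from the question.

```python
from decimal import Decimal

XSD = "http://www.w3.org/2001/XMLSchema#"

# Hypothetical lookup table: how a few XSD datatypes turn a lexical form
# into a value. An untyped literal is treated as an xsd:string.
TO_VALUE = {
    XSD + "string": str,
    XSD + "integer": int,
    XSD + "decimal": Decimal,
    XSD + "float": float,
}


def value(lexical, datatype=XSD + "string"):
    """Interpret a literal's lexical form according to its datatype."""
    return TO_VALUE[datatype](lexical)


# Ordering: numbers compare numerically, strings character by character.
print(value("9", XSD + "integer") < value("15", XSD + "integer"))  # True
print(value("9") < value("15"))                                    # False

# Equality: different lexical forms, same decimal value.
print(value("+1.000", XSD + "decimal") == value("01.0", XSD + "decimal"))  # True
print(value("+1.000") == value("01.0"))                                   # False

# Addition: numeric sum versus string concatenation (like the Jena result above).
print(value("1", XSD + "integer") + value("1", XSD + "integer"))  # 2
print(value("1") + value("1"))                                    # 11

# The literal from the question: typed as a float, 42.0 sorts below 100.
print(value("42.0", XSD + "float") < value("100", XSD + "float"))  # True
print(value("42.0") < value("100"))                                # False
```

The last pair is the practical answer to the FILTER (?var1 > ?var2) question: without a numeric datatype, the comparison is a string comparison.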


Use Literal or String as SPARQL Predicate

I am constructing a small knowledge graph from triples of strings using rdflib. A typical triple would look like: "Bob" "went" "home", and I am adding them to my graph as shown below (I know I should be using standard objects and namespaces, but this is an experiment to construct the most "barebones" graph that I can):
for s, p, o in triples:
    g.add((Literal(s), Literal(p), Literal(o)))
I am attempting to query such a graph using SPARQL, and my query for extracting "Bob" from the above triple looks like:
q = """
WHERE { (?s) %s Literal("home"). }
""" % (Literal('went'))
This gives me the error below, which tells me that the query is malformed:
ParseException: Expected {SelectQuery | ConstructQuery | DescribeQuery | AskQuery}, found 'w' (at char 23), (line:1, col:24)
I have tried plugging in the actual strings (e.g. "went" instead of Literal("went")), but that doesn't work either. Several existing answers address how to match literals, but they do not seem to help.
So my question is, is it possible to use Literals or simple strings as predicates in SPARQL, and if so, how? Any help would be much appreciated.
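RDF requires predicates to be IRIs, and SPARQL's grammar only accepts a variable, an IRI, or the keyword a in predicate position, so a literal such as "went" cannot appear there. The error message above also shows that the parser expected a query form such as SELECT before WHERE. A minimal sketch of one workaround, assuming a hypothetical base IRI http://example.org/kg/ and hypothetical helpers to_iri, sparql_string and subject_query, mints an IRI for each subject and predicate string and builds a complete SELECT query, keeping the object as a string literal:

```python
from urllib.parse import quote

# Hypothetical namespace for minted IRIs; replace it with one you control.
BASE = "http://example.org/kg/"


def to_iri(text):
    """Mint an IRI from a plain string: spaces become underscores, then
    every character outside the URI unreserved set is percent-encoded."""
    return BASE + quote(text.replace(" ", "_"), safe="")


def sparql_string(text):
    """Write text as a SPARQL string literal, escaping backslashes and quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def subject_query(predicate, obj):
    """Build a SELECT query for subjects of (?s, predicate, obj), with the
    predicate written as an IRI and the object as a string literal."""
    return "SELECT ?s WHERE { ?s <%s> %s . }" % (to_iri(predicate), sparql_string(obj))


triples = [("Bob", "went", "home")]
# Subjects and predicates become IRIs; objects stay as plain strings.
converted = [(to_iri(s), to_iri(p), o) for s, p, o in triples]
print(converted[0])
print(subject_query("went", "home"))
```

In rdflib, the minted strings would be wrapped in URIRef (and the objects in Literal) before calling g.add. Note that the query then returns the minted IRI for Bob rather than the plain string "Bob".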

Spinrdf sp:now function is not returning anything

SPIN is a way to represent a wide range of business rules.
This is the official one-line description of SPIN (spinrdf).
SPIN enables users to represent their rules as SPARQL queries in ontologies.
I added these descriptions because there is no spinrdf tag.
I have been using SPIN for about a week to write some rules. Now I'm writing some functions to simplify the SPARQL queries in my rules. I have written a simple date comparison function, compareDates. When I call the function with the following query, there are no errors and it gives the expected result.
SELECT ?result
WHERE {
  BIND(:compareDates("2015-03-03"^^xsd:date, "2015-06-09"^^xsd:date) as ?result)
}
I would like to use the sp:now function that comes with SPIN. When I use the following query, I get no output.
SELECT ?result
WHERE {
  BIND(:compareDates("2015-03-03"^^xsd:date, sp:now()) as ?result)
}
Then I tried the following, but no luck:
SELECT ?result
WHERE {
  BIND(sp:now() as ?now)
  BIND(:compareDates("2015-03-03"^^xsd:date, ?now) as ?result)
}
Then I decided to see what sp:now returns, so I ran the following query; the result is null. This led me to conclude that I won't be able to use this function.
SELECT ?now
WHERE {
  BIND(sp:now() as ?now)
}
I would like to use that function or a similar one, but I don't understand the problem. Any comment is appreciated.
As shown in the following screenshot, the function does not have any body! This could be the problem, but why has it been placed in the ontology if it won't work?
After some research, I found two alternative ways to get the current dateTime. First, SPARQL itself has a now() function, documented here.
BIND(now() as ?now).
This query returns the current date and time as an xsd:dateTime literal.
There is also an alternative in the SPIN ontology: afn:now(), which is placed under the spl:MiscFunctions class. This function gives the same result.
Note that I have been using xsd:date for my function's arguments, but both now() alternatives return xsd:dateTime literals.
Converting these to xsd:date is another story.
There are some cast functions, but they only change the type; they do not trim the time part of the xsd:dateTime, which causes my comparison to fail.
So I came up with the following query, which takes an indirect approach to converting xsd:dateTime to xsd:date:
SELECT ?nowDateTime ?nowDate
WHERE {
  BIND(now() as ?nowDateTime).
  BIND(spif:cast(spif:dateFormat(?nowDateTime, "yyyy-MM-dd"), xsd:date) as ?nowDate).
}
This converted the value successfully.
This may be a naive way to convert between the two date literal types, but it is what I came up with to solve my problem.
Any advice is appreciated.
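The truncation step in the query above can be sketched outside SPIN as well. This minimal Python version assumes dateTime values in ISO 8601 form; rather than formatting the value and casting the resulting string back, it parses the dateTime and keeps only its date part. compare_dates is a hypothetical stand-in for the question's :compareDates, whose definition isn't shown, so its -1/0/1 return convention is an assumption.

```python
from datetime import date, datetime, timezone


def datetime_to_date(lexical):
    """Truncate an xsd:dateTime lexical form to an xsd:date lexical form,
    keeping the calendar date in the literal's own timezone offset."""
    # Python < 3.11 rejects a trailing "Z" in fromisoformat, so normalise it.
    if lexical.endswith("Z"):
        lexical = lexical[:-1] + "+00:00"
    return datetime.fromisoformat(lexical).date().isoformat()


def compare_dates(a, b):
    """Hypothetical stand-in for :compareDates: -1 if a < b, 0 if equal, 1 if a > b."""
    da, db = date.fromisoformat(a), date.fromisoformat(b)
    return (da > db) - (da < db)


# Like now() in SPARQL, this yields a full dateTime with a timezone offset.
now = datetime.now(timezone.utc).isoformat()
today = datetime_to_date(now)
print(datetime_to_date("2015-06-09T14:30:00+02:00"))  # 2015-06-09
print(compare_dates("2015-03-03", today))            # -1
```

The point mirrors the question: the comparison only works once both arguments are of the same kind (dates), so the time part has to be dropped, not merely relabelled.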

SPARQL queries with blank nodes can be complex

I read the blog article Problems of the RDF model: Blank Nodes, which mentions that using blank nodes can complicate the handling of data.
Can you give me an example of why blank nodes make it difficult to write a SPARQL query?
I do not understand the complexity of blank nodes.
Can you explain the meaning and semantics of an existential variable?
I do not clearly understand the explanation given in the RDF Semantics Recommendation, section 1.5, Blank Nodes as Existential Variables.
Existential Variables
In the (first-order) predicate calculus, there is existential quantification, which lets us make assertions about things that exist without saying (or, possibly, knowing) which specific individuals in the domain we're actually talking about. For instance, a sentence like
hasUserId(a, n)
which says that a specific individual a has the user id n, entails the sentence
∃x.hasUserId(x, n)
Of course, there are lots of scenarios in which the second sentence could be true without the first one being true. In that sense, the second sentence gives us less information than the first. It's also important to note that the variable x in the second sentence doesn't provide any way to find out which element in the domain of discourse actually has the given userId. It also doesn't make any claim that there's only one thing that has the given user id. To make that clearer, we might use an example:
∃y.hasAge(y, 29)
This is presumably true, since someone or something out there is age 29. Note that we can't talk about y as the individual that is age 29, though, because there could be lots of them. All this sentence tells us is that there is at least one.
Even though we used different variables in the two sentences, nothing says that the individuals with the specified properties are not the same. This is particularly important in nested quantification, e.g.,
∃x.∃y.likes(x, y)
This sentence could be true because there is one individual in the domain that likes itself. Just because x and y have different names in the sentence doesn't mean that they can't refer to the same individual.
Blank Nodes as Existential Variables
RDF Semantics defines an RDF entailment model. This has been described further in another Stack Overflow question, RDF Graph Entailment. The idea is that an RDF graph is treated as a big existential quantification over the blank nodes mentioned in the graph. E.g., if the triples in the graph are t1, …, tn, and the blank nodes that appear in those triples are b1, …, bm, then the graph is the formula:
∃b1, …, bm.(t1 ∧ … ∧ tn)
Based on the discussion of existential variables above, this means that blank nodes in the data might refer to the same element of the domain or to different elements, and that nothing requires exactly one element to be able to take the place of a blank node. So a graph with blank nodes, when interpreted in this manner, provides much less information than you might expect.
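To make the existential reading concrete, here is a small brute-force sketch of simple entailment, following the interpolation lemma in RDF Semantics: graph g entails graph h if some assignment of g's terms to h's blank nodes turns every triple of h into a triple of g. Terms are plain strings here, and marking blank nodes with a _: prefix is a convention of this sketch, not a real parser.

```python
from itertools import product


def is_blank(term):
    """In this sketch, terms written as '_:label' are blank nodes."""
    return term.startswith("_:")


def simply_entails(g, h):
    """Brute-force check that graph g simply entails graph h.

    h's blank nodes act as existential variables: g entails h if some
    assignment of g's terms to them makes every triple of h a triple of g.
    Exponential in the number of blank nodes, so only for tiny examples.
    """
    g = set(g)
    blanks = sorted({t for triple in h for t in triple if is_blank(t)})
    terms = sorted({t for triple in g for t in triple})
    for choice in product(terms, repeat=len(blanks)):
        assignment = dict(zip(blanks, choice))
        if all(tuple(assignment.get(t, t) for t in triple) in g for triple in h):
            return True
    return False


carol = {
    (":Carol", ":hasAddress", "_:a"),
    ("_:a", ":hasNumber", "4222"),
    ("_:a", ":hasStreet", ":Clinton_Way"),
}
# The more specific graph entails the vaguer one, but not the other way round.
print(simply_entails(carol, {(":Carol", ":hasAddress", "_:b")}))  # True
print(simply_entails({(":Carol", ":hasAddress", "_:b")}, carol))  # False
# Distinct blank nodes may denote the same thing, as with ∃x.∃y.likes(x, y).
print(simply_entails({(":Alice", ":likes", ":Alice")}, {("_:x", ":likes", "_:y")}))  # True
```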
Blank Nodes in Real Data
Now, the discussion above is useful if people are using blank nodes as existential variables. In many cases, authors think of them more as anonymous, but definite and distinct objects. E.g., if we casually write
@prefix : <https://stackoverflow.com/q/20629437/1281433/> .
:Carol :hasAddress [ :hasNumber 4222 ;
                     :hasStreet :Clinton_Way ] .
we may well be trying to say that there is a single address out there with the specified properties, but according to the RDF entailment model, that's not what we're doing.
In practice, this isn't so much of a problem, because we're usually not using RDF entailment. What is a problem, though, is that since the scope of a blank node is local to a graph, we can't run a SPARQL query against an endpoint asking for Carol's address and get back an IRI that we can reuse. If we run a query like this:
prefix : <https://stackoverflow.com/q/20629437/1281433/>
construct {
  :Mike :hasAddress ?address
}
where {
  :Carol :hasAddress ?address
}
then we get back the following (unhelpful) graph as a result:
@prefix : <https://stackoverflow.com/q/20629437/1281433/> .
:Mike :hasAddress [] .
We won't have a way to get more information about the address because all we have now is a blank node. If we had used IRIs, e.g.,
@prefix : <https://stackoverflow.com/q/20629437/1281433/> .
:Carol :hasAddress :address1267389 .
:address1267389 :hasNumber 4222 ;
                :hasStreet :Clinton_Way .
then the query would have produced something more helpful:
@prefix : <https://stackoverflow.com/q/20629437/1281433/> .
:Mike :hasAddress :address1267389 .
Why is this more useful? The first case is like having the data
∃x.(hasAddress(Carol,x) ∧ hasNumber(x,4222) ∧ hasStreet(x,ClintonWay))
and getting back a result
∃y.hasAddress(Mike,y)
Sure, it's possible that Mike and Carol have the same address, but from these sentences there's no way to know for sure. It's much more helpful to have data like
hasAddress(Carol,address1267389) ∧ hasNumber(address1267389,4222) ∧ hasStreet(address1267389,ClintonWay)
and to get back a result
hasAddress(Mike,address1267389)
From this, you know that they have the same address, and you can ask things about it.
How much this will affect your data and its consumers depends on the typical use cases. For automatically constructed graphs, it may be hard to know in advance just what kind of data you'll need to be able to refer to later, so it's a good idea to generate IRIs for as many of your resources as you can. Since IRIs are free-form, this is usually not too hard. For instance, if you've got some sensible “base” IRI, then you can easily append suffixes to identify your resources.
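The suffix approach can be sketched in a few lines of Python. This is a minimal illustration, assuming a hypothetical base IRI http://example.org/resource/ and a hypothetical mint_iri helper that appends a resource type and a random UUID; any scheme that yields stable, unique suffixes would do.

```python
import uuid

# Hypothetical base IRI; substitute a namespace you actually control.
BASE = "http://example.org/resource/"


def mint_iri(kind):
    """Append a type segment and a random UUID to BASE, giving an IRI such as
    http://example.org/resource/address/<32 hex digits>."""
    return f"{BASE}{kind}/{uuid.uuid4().hex}"


# Carol's address gets its own IRI instead of a blank node, so a CONSTRUCT
# query that returns it hands back something later queries can reuse.
address = mint_iri("address")
triples = [
    (BASE + "Carol", BASE + "hasAddress", address),
    (address, BASE + "hasNumber", "4222"),  # literals kept as plain strings here
    (address, BASE + "hasStreet", BASE + "Clinton_Way"),
]
for triple in triples:
    print(triple)
```

If you would rather get the same IRI each time the same input is processed, uuid.uuid5(uuid.NAMESPACE_URL, key) gives a deterministic, name-based alternative to uuid4.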