Datalog Rule taking forever to compute - sparql

I'm using RDFox v3, and when I import the following rule it takes a long time to run. My data store holds CRM information, and I'm trying to classify customers' origins using rules.
[ ?customer, :referral, "true"] :- [ ?customer, :has, ?referralLink] .
I don't see what I'm doing wrong.
Thank you!

This suggests you have a lot of :has relationships in your data. If so, the variables ?customer and ?referralLink match everything related by :has, regardless of whether the subject is actually a customer, and that is most likely what is taking the time.
If ?customer and ?referralLink have types, you might want to specify them in your rule. For example, supposing they are of type :CustomerType and :ReferralLinkType respectively, your rule becomes:
[ ?customer, :referral, "true"] :-
    [ ?customer, :has, ?referralLink] ,
    [ ?customer, a, :CustomerType] ,
    [ ?referralLink, a, :ReferralLinkType] .
Hopefully that helps.

Writing the correct SQL statement in AWS IoT rule

I am working with AWS IoT to develop a custom solution based on a set of sensors, and I have a problem writing the SQL statement for the kind of data I receive from a Zigbee sensor.
An example of what I receive from my sensor is reported here:
{
    "type": "reportAttribute",
    "from": "WIFI",
    "deviceCode": "aws_device_code",
    "to": "CLOUD",
    "mac": "30:ae:7b:e2:e1:e6",
    "time": 1668506014,
    "data": {...}
}
What I would like to do is select messages whose from field equals GREENPOWER, something along the lines of SELECT * FROM 'test' WHERE from = 'GREENPOWER', but from is also a SQL keyword, hence my problem. I am no expert in SQL, so I am not sure how this can be done. I am also looking for a way to modify the received data, but solving this problem on AWS would be much easier.
Thank you very much for your help!
There are quite a lot of SQL functions available in AWS IoT Rules. You can find them here: https://docs.aws.amazon.com/iot/latest/developerguide/iot-sql-functions.html
In your case, something like this should work:
SELECT * FROM 'test' WHERE get(*, "from") = "GREENPOWER"
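For intuition, the same filter expressed client-side in Python (a sketch only; the payload is the sample from the question, with from set to GREENPOWER so the filter matches) shows why the reserved word has to be accessed by key lookup rather than as a bare identifier:

```python
import json

# Sample sensor message from the question, with "from" set to the
# value we want to match.
message = json.loads("""
{
  "type": "reportAttribute",
  "from": "GREENPOWER",
  "deviceCode": "aws_device_code",
  "to": "CLOUD",
  "mac": "30:ae:7b:e2:e1:e6",
  "time": 1668506014,
  "data": {}
}
""")

# "from" is a reserved word in Python too, so it can only be read by
# key lookup -- analogous to get(*, "from") in the IoT SQL dialect.
if message["from"] == "GREENPOWER":
    print("matched")
```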

CosmosDB: Is it a good practice to use ORDER BY on the property that is also used in range filter?

When I ran the query below in the Cosmos DB Explorer on the Azure portal, several hundred RUs were consumed according to Query Stats.
select * from c where c.name = "john" and c._ts > 0
But after I added order by c._ts to the query above, only roughly 20 RUs were consumed.
According to a similar question, this behavior is expected
(though I don't really understand why the range filter alone is not enough to avoid scanning unnecessary index entries).
So, is it good practice to use ORDER BY on properties that are also used in a range filter?
There is no guarantee that an ORDER BY query will use a range index, although it normally does.
The best way to consistently get a good index hit, and thus lower RU consumption, is to use a composite index like the one below. Adjust the other properties as needed; note that the /_ts path is included as well.
This information can be found in the documentation here
{
    "automatic": true,
    "indexingMode": "Consistent",
    "includedPaths": [
        { "path": "/*" }
    ],
    "excludedPaths": [],
    "compositeIndexes": [
        [
            { "path": "/foodGroup", "order": "ascending" },
            { "path": "/_ts", "order": "ascending" }
        ]
    ]
}
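With that composite index in place, a query that filters and orders on both indexed paths can be served efficiently. A sketch (the foodGroup value is a hypothetical placeholder; substitute your own filtered property):

```sql
SELECT * FROM c
WHERE c.foodGroup = "myFoodGroup" AND c._ts > 0
ORDER BY c.foodGroup ASC, c._ts ASC
```

Note that the ORDER BY clause lists the paths in the same order and direction as the composite index definition, which is what allows the index to be used.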

RDFlib Blank node update query

I am trying to update the object of a triple that has a blank node as its subject using RDFlib. I first select the blank node in one function and insert it into the update query in a second function, but this does not produce the required output. I can't use the add() method or initBindings because I need to save the SPARQL query that was executed for the user.
Sample data
@prefix rr: <http://www.w3.org/ns/r2rml#> .
[ rr:objectMap [ rr:column "age" ;
rr:language "dhhdhd"] ].
Code
from rdflib import Graph
from rdflib.plugins.sparql.processor import processUpdate

mapping_graph = Graph().parse("valid_mapping.ttl", format="ttl")

# find the blank node for the update query
def find_om_IRI():
    query = """SELECT ?om
               WHERE {
                   ?om rr:language 'dhhdhd' .
               }"""
    qres = mapping_graph.query(query)
    for row in qres:
        return row[0]

# insert the blank node as the subject of the update query
def change_language_tag():
    om_IRI = find_om_IRI()
    update_query = """
    PREFIX rr: <http://www.w3.org/ns/r2rml#>
    DELETE DATA{
        _:%s rr:language 'dhhdhd' .
    }
    """ % (om_IRI)
    processUpdate(mapping_graph, update_query)
    print(update_query)
    print(mapping_graph.serialize(format="ttl").decode("utf-8"))
    return update_query

change_language_tag()
This, however, returns the following output, leaving the graph unchanged.
@prefix rr: <http://www.w3.org/ns/r2rml#> .
[ rr:objectMap [ rr:column "age" ;
rr:language "dhhdhd"] ].
It does work if you filter based on the blank node's internal label. This is the final query I came up with:
PREFIX rr: <http://www.w3.org/ns/r2rml#>
DELETE { ?om rr:language "dhhdhd" }
INSERT { ?om rr:language "en-fhfhfh" }
WHERE {
SELECT ?om
WHERE {
?om rr:language "dhhdhd" .
FILTER(str(?om) = "ub1bL24C24").
}
}
Indeed, as commenter @TallTed says, "Blank nodes cannot be directly referenced in separate queries". You are trying to do something with blank nodes for which they are expressly not defined, that is, persisting their absolute identity across separate queries. You should take the approach of relative identification (locate the blank node with reference to identified, URI, nodes) or use single SPARQL queries. So this question is an RDF/SPARQL question, not an RDFlib question.
You said: "I can't combine the queries as there could be other object maps with the same language tag". If you cannot deterministically refer to a node due to its lack of distinctness, you will have to change your data, but I suspect you can refer to it; see the suggestion at the end.
Then you said, "I have figured out the solution and have updated the question accordingly. Its a hack really...". Yes, don't do this! You should have a solution that does not depend on quirks of RDFlib. RDF and the Semantic Web in general are all about universally defined, standard data and querying, so don't rely on a particular toolkit for a data question like this. Use RDFlib only as an implementation, one that could be replicated in another language. I personally model all my RDFlib triple adding/deleting/selecting code as standard SPARQL queries first, so that my RDFlib code is just the equivalent of a standard query.
In your own answer you said "If you filter based on the blank node value...": don't do this either!
My suggestion is to change your underlying data to include representations of things (named nodes etc.) that you can fix on for querying. If you cannot distinguish between the things you want to change without resorting to hacks, then you have a data modelling problem that needs solving. I do think you can distinguish object maps, though.
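As a sketch of that remodelling (the IRI :ageObjectMap and the example.org namespace are hypothetical), the mapping could give each object map a name so that later queries can reference it directly instead of relying on a blank node's label:

```turtle
@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix : <http://example.org/mapping#> .

[ rr:objectMap :ageObjectMap ] .

:ageObjectMap rr:column "age" ;
    rr:language "dhhdhd" .
```

With a named node, a DELETE/INSERT against :ageObjectMap works across any number of separate queries.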
In your data, you must be able to fix on the particular object map whose language you are changing. Is the object map unique per column, and is the column uniquely identified by its rr:column value? If so:
PREFIX rr: <http://www.w3.org/ns/r2rml#>
SELECT ?om ?lang
WHERE {
  ?om rr:column ?col .
  ?om rr:language ?lang .
  FILTER (?col = "age")
}
This will get you the object map for the column "age" so, to change it:
PREFIX rr: <http://www.w3.org/ns/r2rml#>
DELETE {
  ?om rr:language ?lang .
}
INSERT {
  ?om rr:language "new-language" .
}
WHERE {
  ?om rr:column ?col .
  ?om rr:language ?lang .
  FILTER (?col = "age")
}

SPIN representation to SPARQL

Is there an API that could help convert SPIN representation (of a SPARQL query) back to its SPARQL query form?
From:
[ a <http://spinrdf.org/sp#Select> ;
  <http://spinrdf.org/sp#where> (
    [ <http://spinrdf.org/sp#object>    [ <http://spinrdf.org/sp#varName> "o"^^<http://www.w3.org/2001/XMLSchema#string> ] ;
      <http://spinrdf.org/sp#predicate> [ <http://spinrdf.org/sp#varName> "p"^^<http://www.w3.org/2001/XMLSchema#string> ] ;
      <http://spinrdf.org/sp#subject>   [ <http://spinrdf.org/sp#varName> "s"^^<http://www.w3.org/2001/XMLSchema#string> ] ]
  )
] .
To:
SELECT *
WHERE {
?s ?p ?o .
}
Thanks in advance.
I know of two Jena-based APIs for working with SPIN.
You can use either org.topbraid:shacl:1.0.1, which is based on jena-arq:3.0.4, or org.spinrdf:spinrdf:3.0.0-SNAPSHOT mentioned in the comment, which is a fork of the first one with changed namespaces and updated dependencies.
Note that the first (original) API may also work with modern Jena (3.13.x); at least, your task can be solved in those circumstances.
The second API has no Maven release yet, but can be included in your project via JitPack.
To solve the problem you need to find the root org.apache.jena.rdf.model.Resource and cast it to org.topbraid.spin.model.Select (or org.spinrdf.model.Select) using jena polymorphism (i.e. the operation org.apache.jena.rdf.model.RDFNode#as(Class)).
Then #toString() will return the desired query with the model's prefixes.
Note that all the personalities are already included in the model via static initialization.
A demonstration of this approach is SpinTransformer from the ONT-API test scope, which transforms SPARQL-based queries into an equivalent form with sp:text.

Ravendb Multi Get with index

I am trying to use ravendb (build 960) multi get to get the results of several queries.
I am posting to /multi_get with:
[
{"Url":"/databases/myDb/indexes/composers?query=title:beethoven&fetch=title&fetch=biography"},
{"Url":"/databases/myDb/indexes/products?query=title:beethoven&fetch=title&fetch=price"}
]
The server responds with results for each query, but it returns EVERY document for each index. It looks like neither the query nor the fetch parameters are being used.
Is there something I am doing wrong here?
Multi GET assumes all the URLs are local to the current database, so you can't specify URLs starting with /databases/foo.
You specify the database in the multi GET URL itself.
Change your code to generate:
[
{"Url":"/indexes/composers?query=title:beethoven&fetch=title&fetch=biography"},
{"Url":"/indexes/products?query=title:beethoven&fetch=title&fetch=price"}
]
And make sure that you multi get goes to
/databases/mydb/multi_get
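As a minimal sketch of the corrected request (the server address is hypothetical; the database name and queries are from the question), the payload and endpoint could be assembled like this in Python:

```python
import json
from urllib import request

# Inner URLs are relative to the database; the database is named in the
# multi_get endpoint itself, not in each inner URL.
payload = [
    {"Url": "/indexes/composers?query=title:beethoven&fetch=title&fetch=biography"},
    {"Url": "/indexes/products?query=title:beethoven&fetch=title&fetch=price"},
]
body = json.dumps(payload).encode("utf-8")

# Hypothetical server address; uncomment to actually send the request.
# req = request.Request("http://localhost:8080/databases/myDb/multi_get",
#                       data=body,
#                       headers={"Content-Type": "application/json"},
#                       method="POST")
# with request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```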