Analysis with SPARQL - sparql

I am trying to accomplish some relatively simple analysis with a specific graph.
In MarkLogic, SPARQL property paths are built from the following patterns:
path+ (one or more duplicate path links)
path* (zero or more duplicate path links)
path? (zero or one path link)
path1/path2 (traversing through 2 different links)
From here, one analysis I would like to achieve is retrieving all nodes that fulfil a specific condition between node X and node Y. Based on this, my query would be something like:
?nodeX <nodeID> 1
?nodeY <nodeID> 250
?nodeX <nodeLink>* ?nodeY
Which does not really seem correct to me, as I don't think this allows me to retrieve the path linking nodeX to nodeY.
I would also like to know if it is possible to compute measures such as:
Betweenness centrality, which measures the number of times a vertex lies on the shortest path between each pair of vertices in a graph.
Closeness centrality, which measures the distance from one vertex to all other reachable vertices in the graph.
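Standard SPARQL 1.1 does not define centrality functions, so one option is to export the edges and compute the measures client-side. The following is a minimal Python sketch under that assumption: the adjacency dict is hypothetical and stands in for the rows of a query such as SELECT ?a ?b WHERE { ?a <nodeLink> ?b }. Closeness is computed here as (reachable vertices) / (sum of distances), which is only one of several normalisations in use, and betweenness follows Brandes' accumulation scheme for unweighted, directed graphs.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distance from source to every vertex reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj.get(v, ()):
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def closeness(adj, v):
    """Reachable vertices divided by the sum of their distances from v."""
    dist = bfs_distances(adj, v)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


def betweenness(adj):
    """Unnormalised betweenness for a directed, unweighted graph (Brandes)."""
    nodes = set(adj) | {w for ws in adj.values() for w in ws}
    score = dict.fromkeys(nodes, 0.0)
    for s in nodes:
        order = []                        # vertices in BFS visiting order
        preds = {v: [] for v in nodes}    # predecessors on shortest paths
        sigma = dict.fromkeys(nodes, 0)   # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj.get(v, ()):
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest vertices, accumulating dependencies.
        delta = dict.fromkeys(nodes, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score


# Hypothetical graph: a -> b -> c
example = {"a": ["b"], "b": ["c"]}
print(closeness(example, "a"), betweenness(example)["b"])
```

For large graphs a dedicated graph library would normally be a better fit; the sketch is only meant to show what the two definitions ask for.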
==Update==
Based on the suggestion, I have managed to retrieve the path using the following query.
?nodeX <nodeID> "1"
?nodeY <nodeID> "250"
?nodeX <nodeLink>* ?v
?v ?p ?u
?u <nodeLink>* ?nodeY
When I attempted to use <p> | !<p> in my query, an error occurred stating that ! was not a valid expression. However, I believe I can achieve the same thing with a variable in the predicate position (?p above), which will match any predicate.
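The query in the update returns the edges (?v ?p ?u) that lie on some route from nodeX to nodeY, but in no particular order. A minimal sketch of how those rows could be turned into one ordered path on the client side follows; it assumes the rows have already been fetched as (v, p, u) tuples, and the node names in the usage example are hypothetical.

```python
from collections import deque


def shortest_path(edges, start, goal):
    """Return one shortest start -> goal path as a list of (v, p, u) edges.

    Returns [] when start == goal and None when goal is unreachable.
    """
    # Index outgoing edges by their source node.
    out = {}
    for v, p, u in edges:
        out.setdefault(v, []).append((p, u))

    # Breadth-first search, remembering how each node was first reached.
    came_from = {start: None}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        if v == goal:
            break
        for p, u in out.get(v, ()):
            if u not in came_from:
                came_from[u] = (v, p)
                queue.append(u)

    if goal not in came_from:
        return None

    # Walk back from the goal to the start, then reverse.
    path = []
    node = goal
    while came_from[node] is not None:
        v, p = came_from[node]
        path.append((v, p, node))
        node = v
    path.reverse()
    return path


# Hypothetical result rows of the query above.
rows = [
    ("n1", "nodeLink", "n2"),
    ("n2", "nodeLink", "n3"),
    ("n1", "nodeLink", "n4"),
    ("n4", "nodeLink", "n5"),
    ("n5", "nodeLink", "n3"),
]
print(shortest_path(rows, "n1", "n3"))
```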

Related

Elasticsearch connector and owl:sameAs on graphdb

I'm using the OWL-RL optimized ruleset and the Elasticsearch connector for search.
All I want is to recognize entities that share the same value and merge all their values into one document in ES.
I'm doing this with:
Person - hasPhone - Phone, with hasPhone declared as an InverseFunctionalProperty.
Example:
http://example.com#1 http://example.com#hasPhone http://example.com#111.
http://example.com#2 http://example.com#hasPhone http://example.com#111.
=> #1 owl:sameAs #2
When I search through ES, I receive two results, both #1 and #2. But when I repair the connector I get only one result (which is what I want).
1./ Is there a way for the ES connector to merge the documents automatically and delete the previous document? I don't want to repair the connector all the time. When I set manageIndex:false, searching always returns two results.
2./ How can I retrieve only one record with SPARQL, excluding the others that are owl:sameAs this record?
3./ Is there a better ruleset for owl:sameAs and InverseFunctionalProperty?
The connector watches for changes to property paths (as specified by the index definition), but I don't think it can detect the merging (smushing) caused by sameAs, which is why you need the rebuild. If this is an important case, I can post an improvement issue for you, but please email graphdb-support and vladimir.alexiev at ontotext with a description of your business case (and a link to this question).
If you have "sameAs optimization" enabled for the repo (which it is by default) and do NOT have "expand sameAs URLs" in the query, you should get only 1 result for queries like ?x <http://example.com#hasPhone> <http://example.com#111>
OWL-RL-Optimized is good for your case. (The rulesets supporting InverseFunctionalProperty are OWL-RL, OWL-QL, rdfsPlus and their optimized variants.)

SPARQL path traversing

I am trying to create a query using SPARQL on a ttl file where I have part of the graph representing links as follows:
Is it possible to search for the type Debit and get all the literals associated with its parent, i.e. R494Vol1D2, Salvo, Vassallo?
Do I need to use paths?
As AKSW correctly said, RDF is about directed graphs. So I created a small n-triples file based on your image of the graph. I assume that the dataset looks like this:
<http://natarchives.com.mt/deed/R494Vol1-D2> <http://purl.org/dc/terms/type> "Debit".
<http://natarchives.com.mt/deed/R494Vol1-D2> <http://purl.org/dc/terms/identifier> "R494Vol1D2".
<http://natarchives.com.mt/deed/R494Vol1-D2> <http://data.archiveshub.ac.uk/def/associatedWith> <http://natarchives.com.mt/person/person796>.
<http://natarchives.com.mt/person/person796> <http://xmlns.com/foaf/0.1/firstName> "Salvo".
<http://natarchives.com.mt/person/person796> <http://xmlns.com/foaf/0.1/family_name> "Vassallo".
Also I did not know the prefix locah but according to http://prefix.cc it stands for http://data.archiveshub.ac.uk/def/
So if this dataset is correct you could use the following query:
1 SELECT ?literal WHERE{
2 ?start <http://purl.org/dc/terms/type> "Debit".
3 ?start <http://data.archiveshub.ac.uk/def/associatedWith>* ?parent.
4 ?parent ?hasLiteral ?literal.
5 FILTER(isLiteral(?literal) && ?literal != "Debit" )
6 }
Line 2 defines the starting point of the path: every vertex that has the type "Debit". Line 3 then follows zero or more edges labelled <http://data.archiveshub.ac.uk/def/associatedWith> from ?start and binds each vertex it reaches (including ?start itself, because * also matches a path of length zero) to ?parent. Line 4 looks for all triples that have ?parent as subject and stores the object in ?literal. In line 5 we filter out everything that is not a literal or is "Debit", resulting in the desired outcome.
If I modeled the direction of <http://data.archiveshub.ac.uk/def/associatedWith> wrongly, you could change line 3 of the query to:
?start ^<http://data.archiveshub.ac.uk/def/associatedWith>* ?parent
This would change the direction of the edge.
And to answer the question if you need to use paths: If you do not know how long the path of edges labeled with <http://data.archiveshub.ac.uk/def/associatedWith> will be, then in my opinion yes, you will have to use either * or + of property paths.
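To see what the * in line 3 of the query does, the traversal can be mimicked in plain Python over the five triples assumed above. This is only an illustrative sketch, not how a SPARQL engine evaluates the path: the Literal wrapper and the function name are made up for the example, and the data is the same assumed dataset.

```python
from typing import NamedTuple


class Literal(NamedTuple):
    """Marks an object as a literal rather than a URI (illustrative only)."""
    value: str


DCT = "http://purl.org/dc/terms/"
FOAF = "http://xmlns.com/foaf/0.1/"
ASSOCIATED = "http://data.archiveshub.ac.uk/def/associatedWith"
DEED = "http://natarchives.com.mt/deed/R494Vol1-D2"
PERSON = "http://natarchives.com.mt/person/person796"

TRIPLES = [
    (DEED, DCT + "type", Literal("Debit")),
    (DEED, DCT + "identifier", Literal("R494Vol1D2")),
    (DEED, ASSOCIATED, PERSON),
    (PERSON, FOAF + "firstName", Literal("Salvo")),
    (PERSON, FOAF + "family_name", Literal("Vassallo")),
]


def literals_for_type(triples, type_value, link=ASSOCIATED):
    # Line 2 of the query: every subject typed with the given literal.
    starts = {s for s, p, o in triples
              if p == DCT + "type" and o == Literal(type_value)}

    # Line 3: follow `link` zero or more times. The start nodes are
    # included because * also matches a path of length zero.
    reached = set(starts)
    frontier = list(starts)
    while frontier:
        node = frontier.pop()
        for s, p, o in triples:
            if s == node and p == link and o not in reached:
                reached.add(o)
                frontier.append(o)

    # Lines 4 and 5: keep literal objects of reached nodes, minus the type.
    return sorted(o.value for s, p, o in triples
                  if s in reached and isinstance(o, Literal)
                  and o.value != type_value)


print(literals_for_type(TRIPLES, "Debit"))
```

Replacing the subject/object roles in the inner loop (matching on o and following to s) would correspond to the ^ variant of the path.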

How Do I Query Against Data.gov

I am trying to teach myself this weekend how to run API queries against a data source in this case data.gov. At first I thought I'd use a simple SQL variant, but it seems in this case I have to use SPARQL.
I've read through the documentation and downloaded Twinkle, but I can't quite get it to run. Here is an example of a query I'm running. I'm basically trying to find all gas stations around Denver, CO whose network is null.
PREFIX station: https://api.data.gov/nrel/alt-fuel-stations/v1/nearest.json?api_key=???location=Denver+CO
SELECT *
WHERE
{ ?x station:network ?network like "null"
}
Any help would be very much appreciated.
SPARQL is a graph pattern language for RDF triples. A query consists of a set of "basic graph patterns" described by triple patterns of the form <subject>, <predicate>, <object>. RDF defines the subject and predicate with URIs, and the object is either a URI (object property) or a literal (datatype or language-tagged property). Each triple pattern in a query must therefore have three entities.
Since we don't have any examples of your data, I'll provide a way to explore the data a bit. Let's assume your prefix is correctly defined, which I doubt - it will not be the REST API URL, but the URI of the entity itself. Then you can try the following:
PREFIX station: <http://api.data.gov/nrel...>
SELECT *
WHERE
{ ?s station:network ?network .
}
...setting the PREFIX to correctly represent the namespace for network. Then look at the binding for ?network and find out how they represent null. Let's say it is a string as you show. Then the query would look like:
PREFIX station: <http://api.data.gov/nrel...>
SELECT ?s
WHERE
{ ?s station:network "null" .
}
There is no LIKE operator in SPARQL, but you could use a FILTER clause with regex or one of SPARQL's other string-matching functions.
And please, please, please google "SPARQL" and "RDF". There is lots of information about SPARQL, and the W3C's SPARQL 1.1 Query Language Recommendation is a comprehensive source with many good examples.

How to assign parameter values in a SparqlParameterizedString

I'm playing around with dotNetRDF's SPARQL engine and I'm trying to create parameterized queries, with no success yet.
Say I'm working on a graph g with a blank node identified as _:1690 with the code
Dim queryString As SparqlParameterizedString = New SparqlParameterizedString()
queryString.Namespaces.AddNamespace("rdfs", UriFactory.Create("http://www.w3.org/2000/01/rdf-schema#"))
queryString.CommandText = "SELECT ?label { @context rdfs:label ?label } "
queryString.SetParameter("context", g.GetBlankNode("1690"))
Dim result As VDS.RDF.Query.SparqlResultSet = g.ExecuteQuery(New SparqlQueryParser().ParseFromString(queryString))
Whenever I run this, I get all nodes having an rdfs:label property instead of a result filtered on my blank node only.
How do I set the parameter's value properly so that I get only one item in the result?
Thanks in advance,
Max.
Blank Nodes in a SPARQL query differ from Blank Nodes in a RDF Graph
In a SPARQL query, a blank node is treated as a temporary variable with limited scope. It does not match a specific blank node, so you cannot write a SPARQL query that selects by blank node identifier.
So the query your code builds gives the same result as if you had replaced the parameter with a variable, e.g. ?s.
If you need to find the value associated with a specific blank node then you need to formulate a query that uniquely selects that blank node based on the triples it participates in. If you can't do that then you need to re-think your data modelling since if this is the case then you should likely be using URIs instead of blank nodes.
As a workaround, since you are using dotNetRDF and have the original graph you are querying, you can use the IGraph API instead, e.g.
INode label = g.GetTriplesWithSubjectPredicate(g.GetBlankNode("1690"), g.CreateUriNode("rdfs:label")).Select(t => t.Object).FirstOrDefault();
Just remember that label will be null if the triple you are looking for doesn't exist.

SPARQL prefix wildcard

I'm attempting to write a SPARQL query which would allow me to find all nodes which are reachable from a given node. At the moment every edge has the prefix http://www.foo.com/edge# and there are 3 possible edges (uses, extends, implements). While I can get the correct result from "?start (edge:uses | edge:implements | edge:extends)* ?reached " I would like to reduce that down to one statement, some kind of wildcard after edge:, so that if I add more edge types then I wouldn't need to extend the query. Is this possible?
See this question: SPARQL - Restricting Result Resource to Certain Namespace(s)
If you know it's always going to be in the same namespace, you could have something looking like:
?start ?edge ?reached .
FILTER(REGEX(STR(?edge), "^http://www.foo.com/edge#"))
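Note that a pattern with a variable predicate matches a single edge only: SPARQL 1.1 property paths do not accept a variable as a path element, so the * from the original query cannot simply be applied to it. One possible workaround is to fetch the namespace-filtered edges and compute the reachable set client-side. The sketch below assumes the triples are already available as (subject, predicate, object) string tuples; the node URIs in the usage example are hypothetical.

```python
from collections import deque

EDGE_NS = "http://www.foo.com/edge#"


def reachable(triples, start, namespace=EDGE_NS):
    """Nodes reachable from start over any predicate in the namespace.

    Like the (a|b|c)* path, the result includes start itself.
    """
    # Keep only edges whose predicate lies in the namespace.
    out = {}
    for s, p, o in triples:
        if p.startswith(namespace):
            out.setdefault(s, []).append(o)

    # Plain breadth-first search over the filtered edges.
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in out.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Hypothetical data: the last edge is outside the edge namespace.
data = [
    ("urn:A", EDGE_NS + "uses", "urn:B"),
    ("urn:B", EDGE_NS + "extends", "urn:C"),
    ("urn:C", "http://www.foo.com/other#mentions", "urn:D"),
]
print(sorted(reachable(data, "urn:A")))
```

New edge types in the same namespace are picked up without changing the code, which is the behaviour the question asks for.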