SPARQL path traversing

I am trying to create a query using SPARQL on a ttl file where I have part of the graph representing links as follows:
Is it possible to search for the type Debit and get all the literals associated with its parent, i.e. R494Vol1D2, Salvo, Vassallo?
Do I need to use paths?

As AKSW correctly said, RDF is about directed graphs. So I created a small n-triples file based on your image of the graph. I assume that the dataset looks like this:
<http://natarchives.com.mt/deed/R494Vol1-D2> <http://purl.org/dc/terms/type> "Debit".
<http://natarchives.com.mt/deed/R494Vol1-D2> <http://purl.org/dc/terms/identifier> "R494Vol1D2".
<http://natarchives.com.mt/deed/R494Vol1-D2> <http://data.archiveshub.ac.uk/def/associatedWith> <http://natarchives.com.mt/person/person796>.
<http://natarchives.com.mt/person/person796> <http://xmlns.com/foaf/0.1/firstName> "Salvo".
<http://natarchives.com.mt/person/person796> <http://xmlns.com/foaf/0.1/family_name> "Vassallo".
Also, I did not know the prefix locah, but according to http://prefix.cc it stands for http://data.archiveshub.ac.uk/def/
So if this dataset is correct you could use the following query:
1 SELECT ?literal WHERE{
2 ?start <http://purl.org/dc/terms/type> "Debit".
3 ?start <http://data.archiveshub.ac.uk/def/associatedWith>* ?parent.
4 ?parent ?hasLiteral ?literal.
5 FILTER(isLiteral(?literal) && ?literal != "Debit" )
6 }
In line 2 we define the starting point of the path: every vertex that has the type "Debit". In line 3 we follow zero or more edges labelled <http://data.archiveshub.ac.uk/def/associatedWith> from ?start and bind every vertex reached to ?parent. Because * also matches the zero-length path, ?start itself is bound to ?parent, which is why its own identifier is returned. In line 4 we match every triple that has ?parent as subject and bind its object to ?literal. In line 5 we filter out everything that is not a literal, as well as the literal "Debit", which leaves the desired result.
If I modeled the direction of <http://data.archiveshub.ac.uk/def/associatedWith> wrongly, you could change line 3 of the query to:
?start ^<http://data.archiveshub.ac.uk/def/associatedWith>* ?parent
This traverses the edge in the opposite direction.
As for whether you need to use paths: if you do not know how long the chain of edges labelled <http://data.archiveshub.ac.uk/def/associatedWith> will be, then in my opinion yes, you will have to use either the * or the + property path operator.
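To make the effect of the zero-or-more path concrete, here is a minimal Python sketch that mimics the query over the five assumed triples without using any RDF library. The tuple layout, the `Lit` wrapper and the function names are illustrative assumptions, not part of any SPARQL engine.

```python
from collections import namedtuple

# Hypothetical wrapper so that literals can be told apart from IRIs.
Lit = namedtuple("Lit", "value")

DEED = "http://natarchives.com.mt/deed/R494Vol1-D2"
PERSON = "http://natarchives.com.mt/person/person796"
DC = "http://purl.org/dc/terms/"
FOAF = "http://xmlns.com/foaf/0.1/"
ASSOCIATED = "http://data.archiveshub.ac.uk/def/associatedWith"

# The five triples assumed in the answer above.
TRIPLES = [
    (DEED, DC + "type", Lit("Debit")),
    (DEED, DC + "identifier", Lit("R494Vol1D2")),
    (DEED, ASSOCIATED, PERSON),
    (PERSON, FOAF + "firstName", Lit("Salvo")),
    (PERSON, FOAF + "family_name", Lit("Vassallo")),
]


def zero_or_more(start, predicate, triples):
    """Nodes reachable from start via predicate*, including start itself."""
    seen, todo = {start}, [start]
    while todo:
        node = todo.pop()
        for s, p, o in triples:
            if s == node and p == predicate and o not in seen:
                seen.add(o)
                todo.append(o)
    return seen


def literals_for_type(type_value, triples):
    """Mimic the query: start at typed nodes, follow associatedWith*, keep literals."""
    found = set()
    starts = {s for s, p, o in triples if p == DC + "type" and o == Lit(type_value)}
    for start in starts:
        for parent in zero_or_more(start, ASSOCIATED, triples):
            for s, p, o in triples:
                # Same role as FILTER(isLiteral(?literal) && ?literal != "Debit").
                if s == parent and isinstance(o, Lit) and o.value != type_value:
                    found.add(o.value)
    return found


print(sorted(literals_for_type("Debit", TRIPLES)))
```

Because the start node is part of its own zero-or-more closure, the deed's identifier is collected along with the two names of the associated person.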

Related

Analysis with SPARQL

I am trying to accomplish some relatively simple analysis with a specific graph.
In MarkLogic, SPARQL property paths are written with the following patterns:
path+ (one or more duplicate path links)
path* (zero or more duplicate path links)
path? (zero or one path link)
path1/path2 (traversing through 2 different links)
From here, one analysis I would like to achieve is retrieving all nodes that fulfill a specific condition between node X and node Y. Based on this, my query would be something like
?nodeX <nodeID> 1
?nodeY <nodeID> 250
?nodeX <nodeLink>* ?nodeY
Which does not really seem correct to me, as I don't think this allows me to retrieve the path linking nodeX to nodeY.
I would also like to know if it is possible to do things such as
Betweenness centrality, which is a measure of the number of times a vertex lies on the shortest path between each pair of vertices in a graph.
Closeness centrality which is a measure of the distance of one vertex to all other reachable vertices in the graph.
==Update==
Based on the suggestion I have managed to retrieve the path using the following query.
?nodeX <nodeID> "1"
?nodeY <nodeID> "250"
?nodeX <nodeLink>* ?v
?v ?p ?u
?u <nodeLink>* ?nodeY
When I attempted to use <p> | !<p> in my query, an error occurred stating that ! was not a valid expression. However, I believe I can achieve the same thing by using a variable such as ?p in the predicate position, which will match any predicate.
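A property path only binds the two endpoints; it does not return the intermediate nodes in order. One possible approach for ordered paths and for distance-based measures is to export the <nodeLink> pairs and compute them in client code. The following Python sketch assumes a made-up edge list and uses breadth-first search to find one shortest path and a simple closeness value (reachable-node count divided by the sum of hop distances, which is only one of several normalizations in use). Betweenness would additionally need shortest-path counts over all vertex pairs and is not shown.

```python
from collections import deque

# Made-up edge list standing in for the exported "?a <nodeLink> ?b" pairs.
EDGES = [("1", "2"), ("2", "3"), ("1", "4"), ("4", "3"), ("3", "250")]


def build_adjacency(edges):
    """Directed adjacency lists keyed by source node."""
    adjacency = {}
    for source, target in edges:
        adjacency.setdefault(source, []).append(target)
        adjacency.setdefault(target, [])
    return adjacency


def distances_from(start, adjacency):
    """Breadth-first search: hop count from start to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def shortest_path(start, goal, adjacency):
    """One shortest path from start to goal as a list of nodes, or None."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in adjacency[node]:
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


def closeness(node, adjacency):
    """Reachable-node count divided by the sum of distances (0.0 if none)."""
    dist = distances_from(node, adjacency)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


graph = build_adjacency(EDGES)
print(shortest_path("1", "250", graph))
print(closeness("1", graph))
```

The edge list is directed here, matching the direction of the triples; for an undirected analysis each pair would have to be added in both directions.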

How to filter query SPARQL for property "type"

I have a data source file in which one of the properties points to an actual class instance:
<clinic:Radiology rdf:ID="rad1234">
<clinic:diagnosis>Stage 4</clinic:diagnosis>
<clinic:ProvidedBy rdf:resource="#MountSinai"/>
<clinic:ReceivedBy rdf:resource="#JohnSmith"/>
<clinic:patientId>7890123</clinic:patientId>
<clinic:radiologyDate>01-01-2017</clinic:radiologyDate>
</clinic:Radiology>
so clinic:ProvidedBy is pointing to this:
<clinic:Radiologists rdf:ID="MountSinai">
<clinic:name>Mount Sinai</clinic:name>
<clinic:npi>1234567</clinic:npi>
<clinic:specialty>Oncology</clinic:specialty>
</clinic:Radiologists>
How do I query using the property clinic:ProvidedBy (whose value is of type clinic:Radiologists)? Whatever I have tried does not bring back any results.
It's also not clear what exactly you want to have, so my answer will return "all radiology resources that are provided by MountSinai":
PREFIX clinic: <THE_NAMESPACE_OF_THE_CLINIC_PREFIX>
PREFIX : <THE_BASE_NAMESPACE_OF_YOUR_RDF_DOCUMENT>
SELECT DISTINCT ?s WHERE {
?s clinic:ProvidedBy :MountSinai
}
But I really suggest starting with an RDF and SPARQL tutorial since, judging from your comment, your query
SELECT * WHERE { ?x rdf:resource "#MountSinai" }
is missing fundamental SPARQL basics. And for writing a matching SPARQL query, it's always good to look at the data in Turtle or N-Triples format, both of which are closer to SPARQL syntax.
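One reason the attempt with rdf:resource "#MountSinai" cannot match is that, in RDF/XML, rdf:ID and rdf:resource are syntax rather than predicates, and "#MountSinai" is a relative reference that gets resolved against the document's base IRI. A small Python sketch of that resolution follows, assuming a made-up base IRI (the real one depends on xml:base or on where the file is loaded from); the helper names are hypothetical.

```python
from urllib.parse import urljoin

# Hypothetical base IRI of the RDF/XML document; the real value depends on
# xml:base or on the location the file was loaded from.
BASE = "http://example.org/clinic-data"


def resolve(reference, base=BASE):
    """Resolve a relative IRI reference such as '#MountSinai' against the base."""
    return urljoin(base, reference)


def id_to_iri(rdf_id, base=BASE):
    """rdf:ID="x" names the resource <base#x>."""
    return urljoin(base, "#" + rdf_id)


print(resolve("#MountSinai"))
print(id_to_iri("MountSinai"))
```

Under this assumed base, declaring PREFIX : <http://example.org/clinic-data#> would make :MountSinai in the query expand to the same IRI that the data uses.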

How Do I Query Against Data.gov

I am trying to teach myself this weekend how to run API queries against a data source in this case data.gov. At first I thought I'd use a simple SQL variant, but it seems in this case I have to use SPARQL.
I've read through the documentation, downloaded Twinkle, and can't seem to quite get it to run. Here is an example of a query I'm running. I'm basically trying to find all gas stations that are null around Denver, CO.
PREFIX station: https://api.data.gov/nrel/alt-fuel-stations/v1/nearest.json?api_key=???location=Denver+CO
SELECT *
WHERE
{ ?x station:network ?network like "null"
}
Any help would be very much appreciated.
SPARQL is a graph pattern language for RDF triples. A query consists of a set of "basic graph patterns" described by triple patterns of the form <subject>, <predicate>, <object>. RDF identifies the subject and predicate with URIs, and the object is either a URI (object property) or a literal (datatype or language-tagged property). Each triple pattern in a query must therefore have three entities.
Since we don't have any examples of your data, I'll provide a way to explore the data a bit. Let's assume your prefix is correctly defined, which I doubt - it will not be the REST API URL, but the URI of the entity itself. Then you can try the following:
PREFIX station: <http://api.data.gov/nrel...>
SELECT *
WHERE
{ ?s station:network ?network .
}
...setting the PREFIX to correctly represent the namespace for network. Then look at the binding for ?network and find out how they represent null. Let's say it is a string as you show. Then the query would look like:
PREFIX station: <http://api.data.gov/nrel...>
SELECT ?s
WHERE
{ ?s station:network "null" .
}
There is no LIKE operator in SPARQL, but you could use a FILTER clause with regex or one of SPARQL's other string-matching functions.
And please, please, please google "SPARQL" and "RDF". There is lots of information about SPARQL, and the W3C's SPARQL 1.1 Query Language Recommendation is a comprehensive source with many good examples.

How to assign parameter values in a SparqlParameterizedString

I'm playing around with dotNetRDF's SPARQL engine and I'm trying to create parameterized queries, with no success yet.
Say I'm working on a graph g with a blank node identified as _:1690 with the code
Dim queryString As SparqlParameterizedString = New SparqlParameterizedString()
queryString.Namespaces.AddNamespace("rdfs", UriFactory.Create("http://www.w3.org/2000/01/rdf-schema#"))
queryString.CommandText = "SELECT ?label { @context rdfs:label ?label } "
queryString.SetParameter("context", g.GetBlankNode("1690"))
Dim result As VDS.RDF.Query.SparqlResultSet = g.ExecuteQuery(New SparqlQueryParser().ParseFromString(queryString))
Whenever I run this, I get all nodes that have an rdfs:label property instead of a result filtered on my blank node only.
How do I set the parameter's value properly so that I get only one item in the result?
Thanks in advance,
Max.
Blank Nodes in a SPARQL query differ from Blank Nodes in a RDF Graph
In a SPARQL query, a blank node is treated as a temporary variable with limited scope. It does not match a specific blank node, so you cannot write a SPARQL query that selects by blank node identifier.
So the query your code builds gives the same result as if you had replaced the context parameter with a variable, e.g. ?s
If you need to find the value associated with a specific blank node, then you need to formulate a query that uniquely selects that blank node based on the triples it participates in. If you can't do that, then you need to rethink your data modelling, since you should likely be using URIs instead of blank nodes.
As a workaround, since you are using dotNetRDF and have the original graph you are querying, you can use the IGraph API instead, e.g.
INode label = g.GetTriplesWithSubjectPredicate(g.GetBlankNode("1690"), g.CreateUriNode("rdfs:label")).Select(t => t.Object).FirstOrDefault();
Just remember that label will be null if the triple you are looking for doesn't exist.

SPARQL prefix wildcard

I'm attempting to write a SPARQL query which would allow me to find all nodes which are reachable from a given node. At the moment every edge has the prefix http://www.foo.com/edge# and there are 3 possible edges (uses, extends, implements). While I can get the correct result from "?start (edge:uses | edge:implements | edge:extends)* ?reached " I would like to reduce that down to one statement, some kind of wildcard after edge:, so that if I add more edge types then I wouldn't need to extend the query. Is this possible?
See this question: SPARQL - Restricting Result Resource to Certain Namespace(s)
If you know it's always going to be in the same namespace, you could have something looking like:
?start ?edge ?reached .
FILTER(REGEX(STR(?edge), "^http://www.foo.com/edge#"))
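Note that the pattern above matches a single edge only. SPARQL 1.1 property paths do not accept a variable as the path element, so the filter cannot simply be combined with the * operator. If the full set of reachable nodes is needed, one option is to fetch the matching edges and walk them in client code. Here is a minimal Python sketch, with made-up node names and an in-memory triple list standing in for the query results:

```python
EDGE_NS = "http://www.foo.com/edge#"

# Made-up triples; the last one uses a predicate outside the edge namespace.
TRIPLES = [
    ("A", EDGE_NS + "uses", "B"),
    ("B", EDGE_NS + "extends", "C"),
    ("C", EDGE_NS + "implements", "D"),
    ("A", "http://www.foo.com/other#label", "E"),
]


def reachable(start, triples, namespace=EDGE_NS):
    """Nodes reachable from start via predicates in namespace (zero or more hops)."""
    seen, todo = {start}, [start]
    while todo:
        node = todo.pop()
        for s, p, o in triples:
            # Same test as the FILTER: the predicate IRI must start with the namespace.
            if s == node and p.startswith(namespace) and o not in seen:
                seen.add(o)
                todo.append(o)
    return seen


print(sorted(reachable("A", TRIPLES)))
```

New edge types in the same namespace are picked up without touching the code, which is the behaviour the question asks for.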