Virtuoso SQL query of a SPARQL query in iSQL - sparql

I was going through this link, which uses EXPLAIN() to show the SQL query that Virtuoso generates (i.e., uses internally) for an input SPARQL query. I tried it on my Virtuoso 7.x installation and got different output, which I am not able to understand fully. Would it be possible to explain what this output from iSQL means, and how I would read a SQL query out of it?
The SPARQL query is
SPARQL SELECT DISTINCT ?s FROM <http://dbpedia.org> WHERE {
?s a <http://dbpedia.org/ontology/Cricketer> .
?s <http://dbpedia.org/property/testdebutyear> ?o .
};
The output I get is
{
Subquery 27
{
RDF_QUAD 3.2e+03 rows(s_1_2_t1.S)
inlined P = #/testdebutyear G = #/dbpedia.org
RDF_QUAD unq 0.8 rows (s_1_2_t0.S)
inlined P = ##type , S = s_1_2_t1.S , O = #/Cricketer , G = #/dbpedia.org
Distinct (s_1_2_t0.S)
After code:
0: s := := artm s_1_2_t0.S
4: BReturn 0
Subquery Select(s)
}
After code:
0: s := Call __ro2sq (s)
5: BReturn 0
Select (s)
}
20 Rows. -- 2 msec.
How do I find the SQL query in this case? Is there a command or link that I am missing?

To answer your specific question, you might read further down the same page you pointed to above, to where it talks specifically about how to "Translate a SPARQL query into the correspondent SQL."
This is also discussed on another feature-specific page.
(Also note, the sample output on both these pages came from Virtuoso 6.x, while you're running 7.x, so your output will likely still differ.)
Here's what I got from my local Virtuoso 7.2.4 (Commercial Edition) --
SQL> SET SPARQL_TRANSLATE ON ;
SQL> SELECT DISTINCT ?s FROM <http://dbpedia.org> WHERE { ?s a <http://dbpedia.org/ontology/Cricketer> . ?s <http://dbpedia.org/property/testdebutyear> ?o . } ;
SPARQL_TO_SQL_TEXT
LONG VARCHAR
_______________________________________________________________________________
SELECT __ro2sq ("s_1_2_rbc"."s") AS "s" FROM (SELECT DISTINCT "s_1_2_t0"."S" AS "s"
FROM DB.DBA.RDF_QUAD AS "s_1_2_t0"
INNER JOIN DB.DBA.RDF_QUAD AS "s_1_2_t1"
ON (
"s_1_2_t0"."S" = "s_1_2_t1"."S")
WHERE
"s_1_2_t0"."G" = __i2idn ( __bft( 'http://dbpedia.org' , 1))
AND
"s_1_2_t0"."P" = __i2idn ( __bft( 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type' , 1))
AND
"s_1_2_t0"."O" = __i2idn ( __bft( 'http://dbpedia.org/ontology/Cricketer' , 1))
AND
"s_1_2_t1"."G" = __i2idn ( __bft( 'http://dbpedia.org' , 1))
AND
"s_1_2_t1"."P" = __i2idn ( __bft( 'http://dbpedia.org/property/testdebutyear' , 1))
OPTION (QUIETCAST)) AS "s_1_2_rbc"
1 Rows. -- 3 msec.
SQL> SET SPARQL_TRANSLATE OFF ;
NOTE -- this feature is only available in the command-line iSQL; it is not found in the browser-based iSQL interface.
All that said... I strongly suggest you ask us your initial questions going forward, instead of likely getting distracted by XY Problems. There's fairly little to understand about the single table of quad data (DB.DBA.RDF_QUAD), which has 2 full and 3 partial indexes by default, which are sufficient for most typical uses, all as discussed in the documentation.
The Virtuoso Users mailing list is often a better resource for Virtuoso-specific questions than here, where you tend to get a fair amount of guessing responses.
(ObDisclaimer: I work for OpenLink Software, producer of Virtuoso.)


SPARQL SubQuery in Filter

SELECT ?Name FROM <http://.../biblio.rdf>
WHERE { ?Aut name ?Name . ?Pub author ?Aut . ?Pub conf ?Conf
FILTER (?Conf IN ( SELECT ?ConfX FROM <http://.../biblio.rdf>
WHERE { ?ConfX series "ISWC" }))}
I have taken the query from http://www.renzoangles.net/files/amw2011.pdf.
Getting the malformed query syntax error when I tried the above format in AWS Neptune.
Please help me fix the above query.
If you want to test that a triple is in the data, which seems to be the intention of the "FILTER IN SELECT" example here, you can use FILTER EXISTS:
FILTER EXISTS { ?Conf series "ISWC" }
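Applied to the full query (keeping the paper's abbreviated graph URI and bare predicate names as-is), that would look something like:

```sparql
SELECT ?Name FROM <http://.../biblio.rdf>
WHERE {
  ?Aut name ?Name .
  ?Pub author ?Aut .
  ?Pub conf ?Conf
  FILTER EXISTS { ?Conf series "ISWC" }
}
```

Unlike the `FILTER (?Conf IN (SELECT ...))` form, which is not legal SPARQL 1.1 syntax, `FILTER EXISTS` is standard and should be accepted by Neptune.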

Aggregate functions in Sparql query with empty records

I've been trying to run a SPARQL query against https://landregistry.data.gov.uk/app/qonsole# to get some results about sold properties.
The query is the following:
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX ukhpi: <http://landregistry.data.gov.uk/def/ukhpi/>
SELECT sum(?ukhpi_salesVolume)
WHERE
{ { SELECT ?ukhpi_refMonth ?item
WHERE
{ ?item ukhpi:refRegion <http://landregistry.data.gov.uk/id/region/haringey> ;
ukhpi:refMonth ?ukhpi_refMonth
FILTER ( ?ukhpi_refMonth >= "2019-03"^^xsd:gYearMonth )
FILTER ( ?ukhpi_refMonth < "2020-03"^^xsd:gYearMonth )
}
}
OPTIONAL
{ ?item ukhpi:salesVolume ?ukhpi_salesVolume }
}
The problem is, the result of this is empty. However, if I run the same query without the SUM on the fourth line, I can see there are 11 integer records.
My thought is that there is a 12th, empty record which causes the issue in the SUM operation, but SPARQL is not my strongest side, so I'm not sure how to filter this (and remove any empty records) if that's really the problem.
I've also noticed that most of the other aggregate functions (MIN, MAX, AVG) do not work either. COUNT does, and returns 11.
I actually solved this myself: all that was needed was a COALESCE, which apparently exists in SPARQL too.
So:
SELECT sum(COALESCE(?ukhpi_salesVolume, 0))
instead of just
SELECT sum(?ukhpi_salesVolume)
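For completeness, the full query with the fix in place. COALESCE substitutes 0 for the unbound ?ukhpi_salesVolume that the OPTIONAL produces for some months, so SUM no longer hits a type error (an error inside an aggregate is what made the result empty). Strictly, SPARQL 1.1 also wants an AS alias on a projected expression; ?total here is just an illustrative name:

```sparql
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX ukhpi: <http://landregistry.data.gov.uk/def/ukhpi/>
SELECT (SUM(COALESCE(?ukhpi_salesVolume, 0)) AS ?total)
WHERE
{ { SELECT ?ukhpi_refMonth ?item
    WHERE
    { ?item ukhpi:refRegion <http://landregistry.data.gov.uk/id/region/haringey> ;
            ukhpi:refMonth ?ukhpi_refMonth
      FILTER ( ?ukhpi_refMonth >= "2019-03"^^xsd:gYearMonth )
      FILTER ( ?ukhpi_refMonth < "2020-03"^^xsd:gYearMonth )
    }
  }
  OPTIONAL
  { ?item ukhpi:salesVolume ?ukhpi_salesVolume }
}
```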

SPARQL subquery with xsd:dateTime

I've made one query which returns the date of the Hiroshima bombing.
SELECT ?date
WHERE {
history:the_bombing_of_hiroshima history:hasStartDate ?date .
}
Result: 1945-08-06T:00:00:00
And another query which returns all of the administrations that ruled the Soviet Union.
SELECT ?administration
WHERE {
?administration history:rulesCountry "The Soviet Union"^^xsd:string .
}
Now, I'm wondering. An administration in our ontology has a "startDate" and "endDate" data property. Both containing xsd:dateTime values. I want to pass the date, received from the first query into the second query so that we can get the administration that ruled the Soviet Union during the bombing of Hiroshima.
I've tried to read some SPARQL subquery examples but none of them concern dateTime filtering (check to see if a date falls between two others) and many of them are a bit confusing to me.
I suspect the query must be something like
SELECT ?administration
WHERE {
?administration history:rulesCountry "The Soviet Union"^^xsd:string .
?administration history:hasStartDate > [RESULT FROM QUERY1 HERE] && history:hasEndDate < [RESULT FROM QUERY1 HERE]
}
I'd be very grateful for any answers or pointers towards resources and tutorials I can read to get this query to work.
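One way to combine the two, sketched under the assumption that administrations carry the history:hasStartDate and history:hasEndDate properties from the pseudocode above, is a subquery that binds the bombing date plus a FILTER comparing the three dateTime values:

```sparql
SELECT ?administration
WHERE {
  ?administration history:rulesCountry "The Soviet Union"^^xsd:string ;
                  history:hasStartDate ?start ;
                  history:hasEndDate ?end .
  { SELECT ?date
    WHERE { history:the_bombing_of_hiroshima history:hasStartDate ?date . }
  }
  FILTER ( ?start <= ?date && ?date <= ?end )
}
```

The subquery shares no variables with the outer pattern, so its single ?date binding is joined onto every candidate administration, and the FILTER keeps only those whose term of rule contains that date.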

SPARQL: filter on both string and integer?

I'm testing SPARQL with Protégé on this data file
https://raw.githubusercontent.com/miranda-zhang/cloud-computing-schema/master/example/sparql-generate/result/gcloud_vm.ttl
I validated that the following works:
PREFIX cocoon: <https://raw.githubusercontent.com/miranda-zhang/cloud-computing-schema/master/ontology_dev/cocoon.ttl>
SELECT ?VM ?cores
WHERE {
?VM a cocoon:VM ;
cocoon:numberOfCores ?cores .
}
For example, it returns something like:
https://w3id.org/cocoon/data/vm/gcloud/CP-COMPUTEENGINE-VMIMAGE-N1-ULTRAMEM-80-PREEMPTIBLE "80"#
https://w3id.org/cocoon/data/vm/gcloud/CP-COMPUTEENGINE-VMIMAGE-N1-HIGHCPU-64-PREEMPTIBLE "64"#
https://w3id.org/cocoon/data/vm/gcloud/CP-COMPUTEENGINE-VMIMAGE-N1-STANDARD-2 "2"#
https://w3id.org/cocoon/data/vm/gcloud/CP-COMPUTEENGINE-VMIMAGE-F1-MICRO "shared"#
https://w3id.org/cocoon/data/vm/gcloud/CP-COMPUTEENGINE-VMIMAGE-N1-HIGHCPU-8-PREEMPTIBLE "8"#
https://w3id.org/cocoon/data/vm/gcloud/CP-COMPUTEENGINE-VMIMAGE-N1-HIGHCPU-32 "32"#
https://w3id.org/cocoon/data/vm/gcloud/CP-COMPUTEENGINE-VMIMAGE-N1-HIGHMEM-16-PREEMPTIBLE "16"#
https://w3id.org/cocoon/data/vm/gcloud/CP-COMPUTEENGINE-VMIMAGE-N1-STANDARD-96-PREEMPTIBLE "96"#
https://w3id.org/cocoon/data/vm/gcloud/CP-COMPUTEENGINE-VMIMAGE-N1-STANDARD-4 "4"#
I'm not sure if I can apply a filter on ?cores, I tried the following, but they returned nothing:
cocoon:numberOfCores "shared" .
Or
FILTER(?cores = "4") .
I'd also like to apply a range filter on ?cores (i.e. > 4 and < 8), so do I have to make it an xsd:integer? But then I'd have to get rid of "shared", which denotes less than 1 core.
Thanks AKSW, impressive knowledge about Protégé.
In the end, I changed my data type to xsd:decimal. Seems to be enough for now.
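For the range filter, a cast inside the FILTER is one option. In SPARQL, xsd:integer("shared") raises a type error, and a FILTER that errors simply drops the row, so the "shared" rows fall out automatically. A sketch against the query above:

```sparql
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX cocoon: <https://raw.githubusercontent.com/miranda-zhang/cloud-computing-schema/master/ontology_dev/cocoon.ttl>
SELECT ?VM ?cores
WHERE {
  ?VM a cocoon:VM ;
      cocoon:numberOfCores ?cores .
  FILTER ( xsd:integer(?cores) > 4 && xsd:integer(?cores) < 8 )
}
```

The same casting approach explains why FILTER(?cores = "4") matched nothing: a plain string literal and a typed literal are different RDF terms, so the comparison never succeeds.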

Jena StmtIterator and database

I have my model stored in a triple store (persistent storage).
I want to select all individuals related to a given document name.
I can do it in two ways:
1) SPARQL request:
PREFIX base:<http://example#>
select ?s2 ?p2 ?o2
where {
{?doc base:fullName <file:/c:/1.txt>; ?p1 ?o1
} UNION {
?s2 ?p2 ?o2 ;
base:documentID ?doc }
}
Question: How to create Jena's Model from the ResultSet?
2) StmtIterator stmtIterator = model.listStatements(...)
The problem with this approach is that I need to call model.listStatements(...) several times:
a) get document URI by document name
b) get a list of individuals ID related to this document URI
c) finally - get a collection of individuals
I am concerned about performance: running model.listStatements(...) three times means many database requests.
Or is all the data read into memory from the database (I doubt it) during model creation:
Dataset ds = RdfStoreFactory.connectDataset(store, conn);
GraphStore graphStore = GraphStoreFactory.create(ds) ;
?
You need to back up a little and think more clearly about what you are trying to do. Your SPARQL query, once it's corrected (see below), will do a perfectly good job of producing an iterator over the result set, which will give you the properties of each of the documents you're looking for. Specifically, you get one set of bindings for s2, p2 and o2 for each value in the result set. That's what you ask for when you specify select ?s2 ?p2 ?o2, and it's normally what you want: usually we select values out of the triple store in order to process them in some way (e.g. rendering them into a list in the UI), and for that we want exactly an iterator over the results.
You can have the query return a model rather than a result set by using a SPARQL construct or describe query. However, you then need to iterate over the resources in the model, so you aren't much further forward (except that your model is smaller and in-memory).
Your query, incidentally, can be improved. The variables p1 and o1 make the query engine do useless work since you never use them, and there's no need for a union. Corrected, your query should be:
PREFIX base:<http://example#>
select ?s2 ?p2 ?o2
where {
?doc base:fullName <file:/c:/1.txt> .
?s2 base:documentID ?doc ;
?p2 ?o2 .
}
To execute any query, select, describe or construct, from Java see the Jena documentation.
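Since the question also asks how to get a Jena Model out of a query, here is a sketch of the construct route (untested; it uses the modern org.apache.jena package names — older Jena releases used com.hp.hpl.jena instead):

```java
import org.apache.jena.query.Query;
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.QueryFactory;
import org.apache.jena.rdf.model.Model;

// 'model' is the Model backed by your triple store
String queryString =
    "PREFIX base: <http://example#>\n" +
    "CONSTRUCT { ?s2 ?p2 ?o2 }\n" +
    "WHERE {\n" +
    "  ?doc base:fullName <file:/c:/1.txt> .\n" +
    "  ?s2 base:documentID ?doc ;\n" +
    "      ?p2 ?o2 .\n" +
    "}";
Query query = QueryFactory.create(queryString);
try (QueryExecution qexec = QueryExecutionFactory.create(query, model)) {
    // execConstruct() returns a small in-memory Model of the matching statements
    Model result = qexec.execConstruct();
}
```

A SELECT query, by contrast, gives you a ResultSet of variable bindings, not a Model; CONSTRUCT (or DESCRIBE) is the mechanism the answer above is referring to.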
You can efficiently achieve the same results as your query using the model API. For example, (untested):
Model m = ..... ; // your model here
String baseNS = "http://example#";
Resource fileName = m.createResource( "file:/c:/1.txt" );
// create RDF property objects for the properties we need. This can be done in
// a vocab class, or automated with schemagen
Property fullName = m.createProperty( baseNS + "fullName" );
Property documentID = m.createProperty( baseNS + "documentID" );
// find the ?doc with the given fullName
for (ResIterator i = m.listSubjectsWithProperty( fullName, fileName ); i.hasNext(); ) {
Resource doc = i.next();
// find all of the ?s2 whose documentID is ?doc
for (StmtIterator j = m.listStatements( null, documentID, doc ); j.hasNext(); ) {
Resource s2 = j.next().getSubject();
// list the properties of ?s2
for (StmtIterator k = s2.listProperties(); k.hasNext(); ) {
Statement stmt = k.next();
Property p2 = stmt.getPredicate();
RDFNode o2 = stmt.getObject();
// do something with s2 p2 o2 ...
}
}
}
Note that your schema design makes this more complex than it needs to be. If, for example, the document full-name resource had a base:isFullNameOf property, then you could simply do a lookup to get the doc. Similarly, it's not clear why you need to distinguish between doc and s2: why not simply attach the document properties to the doc resource?
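With such an inverse property in place, the first lookup collapses to a couple of lines (untested; base:isFullNameOf is a hypothetical property suggested above, not part of the original schema):

```java
// Hypothetical inverse property: <file:/c:/1.txt> base:isFullNameOf ?doc
Property isFullNameOf = m.createProperty( baseNS + "isFullNameOf" );
// getPropertyResourceValue returns the object of the first matching statement
Resource doc = m.getResource( "file:/c:/1.txt" )
               .getPropertyResourceValue( isFullNameOf );
```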
Finally: no, opening a database connection does not load the entire DB into memory. However, TDB in particular does make extensive use of caching of regions of the graph in order to make queries more efficient.