Is there a way to sort SPARQL query results by relevance score in MarkLogic 8?

We're running SPARQL queries on some clinical ontology data in our MarkLogic server. Our queries look like the following:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX cts: <http://marklogic.com/cts#>
SELECT *
FROM <http://example/ontologies/snomedct>
WHERE {
?s rdfs:label ?o .
FILTER cts:contains(?o, cts:word-query("Smoke*", "wildcarded"))
}
LIMIT 10
We expected the results to be sorted by relevance score, but they appear to come back in arbitrary order. We tried several variations of the query, but none of them worked. After some research, we found this statement in the MarkLogic docs:
When understanding the order an expression returns in, there are two
main rules to consider:
cts:search expressions always return in relevance order (the most relevant to the least relevant).
XPath expressions always return in document order.
Does this mean that cts:contains is an XPath expression that always returns in document order? If that's the case, how can we construct a SPARQL query that returns results in relevance order?
Thanks,
Kevin

In your example, the language you are using is SPARQL, with cts:contains acting as a fragment filter.
In this case, cts:contains is only useful for isolating the fragment IDs that match, thus filtering the candidate documents used in the SPARQL query. Therefore, I do not believe that the cts relevance score is taken into account.
However, you could possibly get the results you are looking for in a different way: do an actual cts:search on the documents in question, then filter them using a cts:triple-range-query.
https://docs.marklogic.com/cts:triple-range-query

Related

Referencing results from a SPARQL query in a subsequent SPARQL query

We would like to use SPARQL in the following scenario:
1. Execute a SPARQL query, e.g.
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX brick: <https://brickschema.org/schema/1.1/Brick#>
SELECT ?ahu WHERE { ?ahu rdf:type brick:AHU }
2. Iterate over the SPARQL query results and filter them based on data from a source other than the RDF database, e.g. "filter out any ahu that does not have valid metadata in a SQL database".
3. Execute another SPARQL query for each result that remains after the previous step, e.g.
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX brick: <https://brickschema.org/schema/1.1/Brick#>
SELECT ?myZone WHERE {
?myZone rdf:type brick:HVAC_Zone .
"a particular ahu from previous steps" brick:feeds ?myZone }
The two SPARQL queries cannot be expressed as one SPARQL query, since interaction with another data source is needed in between. How should we design the query in step 3 so that it "points" to the particular triple (or SPARQL query result)?
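One possible way to sketch step 3 in Python: if the ?ahu bindings from step 1 are IRIs, the follow-up query can refer to them simply by writing those IRIs into the query text, and a VALUES clause lets a single query cover every AHU that passed the external check. The example IRIs, the helper names iri_term and zones_query, and the assumption that results are IRIs rather than blank nodes (which cannot be referenced from a later query) are all illustrative and not taken from the question.

```python
from string import Template

BRICK = "https://brickschema.org/schema/1.1/Brick#"

# Step-3 query with a placeholder for the AHUs kept by the external filter.
ZONE_QUERY = Template("""\
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX brick: <$brick>
SELECT ?ahu ?myZone WHERE {
  VALUES ?ahu { $ahus }
  ?myZone rdf:type brick:HVAC_Zone .
  ?ahu brick:feeds ?myZone .
}""")


def iri_term(iri: str) -> str:
    """Wrap an IRI in angle brackets, rejecting characters not allowed inside <...>."""
    forbidden = set('<>"{}|^`\\')
    if any(ch in forbidden or ord(ch) <= 0x20 for ch in iri):
        raise ValueError(f"not a safe IRI: {iri!r}")
    return f"<{iri}>"


def zones_query(ahu_iris):
    """Build the step-3 query for the AHUs that survived the external filter."""
    ahus = " ".join(iri_term(iri) for iri in ahu_iris)
    return ZONE_QUERY.substitute(brick=BRICK, ahus=ahus)


# Hypothetical IRIs standing in for step-1 results that passed the SQL check.
kept = [
    "http://example.org/building#AHU_1",
    "http://example.org/building#AHU_2",
]
query = zones_query(kept)
print(query)
```

Because ?ahu is also selected, each returned zone can be matched back to the AHU that feeds it. Running one query per AHU works the same way, with a single IRI in the VALUES block.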

How to determine whether two SPARQL queries are identical using Python?

When using SPARQL to query an RDF dataset, the same query can be written in many different ways. For example, a SPARQL query is invariant under permutation of some of its clauses, and its variables can be renamed. But how can we identify such identical SPARQL queries? Ideally, there would be a Python package that parses a SPARQL query (i.e., a string) into a query object such that different strings sharing the same underlying query are parsed into the same object; then we could simply compare the parsed objects to determine whether two SPARQL queries are identical. Is there any tool like this (prepareQuery() in rdflib doesn't seem to work this way)? If not, what should I do?
Semantically identical queries example:
SELECT ?x WHERE { ?x foaf:haha ?k .\n ?person foaf:knows ?x .}
SELECT ?s WHERE { ?person foaf:knows ?s .\n ?s foaf:haha ?k .}
The paper "Generating SPARQL Query Containment Benchmarks using the SQCFramework" by Muhammad Saleem et al. mentions "SPARQL query containment solvers", where
Query containment is the problem of deciding if the result set of a query Q1 is included
in the result set of another query Q2
If you use such a solver to test whether the result set of Q1 is a subset of that of Q2 and vice versa, and both tests succeed, you have established that the queries are semantically identical.
As for your "off-the-shelf tool": the former paper mentions that such solvers are tested in another paper, "Evaluating and benchmarking SPARQL query containment solvers" by M.W. Chekol et al.
As for the complexity and computability, the latter paper mentions:
The query containment problem for full SPARQL is undecidable [15, 1].
Hence, it is necessary to reduce SPARQL in order to consider it. A
double exponential upper bound has been proven for the containment and
equivalence problems of SPARQL queries without OPTIONAL , FILTER and
under set semantics [7].
However, query containment in both directions is only one way to determine the identity of queries. I am not aware of a proof that query identity has better complexity/computability than query containment (or of a proof to the contrary).
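For the narrow case shown in the question (a SELECT over a plain list of triple patterns), a much weaker check than containment can already catch reordering and variable renaming. The following is a minimal sketch under that assumption, not a containment solver: queries are given as already-parsed tuples rather than strings, the function name same_up_to_renaming is made up for illustration, and the brute-force search over renamings is factorial in the number of variables. A match implies the two queries are equivalent, but a mismatch does not prove they differ.

```python
from itertools import permutations


def is_var(term):
    return term.startswith("?")


def variables(patterns):
    """Variables of a basic graph pattern, in order of first appearance."""
    seen = []
    for triple in patterns:
        for term in triple:
            if is_var(term) and term not in seen:
                seen.append(term)
    return seen


def same_up_to_renaming(select1, bgp1, select2, bgp2):
    """True if some one-to-one variable renaming maps query 1 onto query 2."""
    vars1, vars2 = variables(bgp1), variables(bgp2)
    if len(vars1) != len(vars2) or len(select1) != len(select2):
        return False
    target = set(bgp2)
    for candidate in permutations(vars2):
        rename = dict(zip(vars1, candidate))
        # Triple order is ignored by comparing sets of renamed triples.
        renamed = {tuple(rename.get(t, t) for t in triple) for triple in bgp1}
        projected = [rename.get(v, v) for v in select1]
        if renamed == target and projected == list(select2):
            return True
    return False


# The two queries from the question, plus a variant that is not equivalent.
q1 = (["?x"], [("?x", "foaf:haha", "?k"), ("?person", "foaf:knows", "?x")])
q2 = (["?s"], [("?person", "foaf:knows", "?s"), ("?s", "foaf:haha", "?k")])
q3 = (["?s"], [("?person", "foaf:knows", "?s"), ("?k", "foaf:haha", "?s")])

print(same_up_to_renaming(*q1, *q2))  # True
print(same_up_to_renaming(*q1, *q3))  # False
```

Anything beyond this fragment (OPTIONAL, FILTER, UNION, redundant patterns) is outside what the sketch handles.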

Display all fields in Wikidata Query Service

Wikidata provides a query browser at https://query.wikidata.org
I want to display all fields for films. I tried using * but it's not working. Does anybody know how to display all fields of the data for films?
To work with SPARQL, it is necessary to understand some concepts, as @AKSW said in the comments on the question. In case the meaning of ?film ?p ?o is unclear: this is called a triple¹ and is composed of subject-predicate-object. E.g., in the case of films, it could be: x is a film. This is what you are querying in the Wikidata Query Service (WDQS) when you use ?film wdt:P31 wd:Q11424.
I think it isn't possible to display all the property-values of an item. In addition, it would probably cause a timeout because there are many statements on many items.
If you want to check the property-values of all the films in Wikidata, I think one option might be to write or find a script that extracts the items with P31-Q11424 (instance of film). For that, the accessing data section could be useful (e.g. with pywikibot you could query and extract what you want).
If you are interested in SPARQL and WDQS I recommend you to read some help resources:
Wikidata Query Service Help, specifically the SPARQL tutorial.
Query examples (reading other people's queries is how I began to learn).
SPARQL 1.1 Query Language specification.
RDF Dump Format (because reading about the ontology of Wikidata could help to understand the concepts).
Edit
When I first answered, I wrote triplestore and linked it to the corresponding page on English Wikipedia, but after @AKSW's comment I realized I was wrong: triplestore refers to a system for the storage and retrieval of triples, whereas a triple, or semantic triple, is "a set of three entities that codifies a statement about semantic data in the form of subject–predicate–object expressions" (from the Semantic triple page on English Wikipedia).

SPARQL query returns no results when a property path is present

The following query returns some results which have skos:broader set as category:History
select ?subject
where
{
?subject skos:broader category:History .
}
However, replacing skos:broader with skos:broader+ or skos:broader* returns no results. Why is this? I would expect either to fetch at least the results returned by the first query.
I'm using the SPARQL front end here: http://dbpedia.org/sparql
Virtuoso (the endpoint that DBpedia uses) has some idiosyncrasies, supports some non-standard syntax (which often leads people to wonder why a query that worked on DBpedia doesn't work with other libraries), and (I think) doesn't support all of SPARQL 1.1. This may be a case where you've run into some internal limitations. You can approximate the results that you want with a query like the following, though:
select ?category { ?category skos:broader{,7} category:History }
This only follows paths of length seven or less. The {m,n} notation for property paths isn't part of SPARQL 1.1, but was in early drafts, and Virtuoso supports it. It is convenient for limiting the resources used in answering a query, and this is a good use case for it.
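For endpoints that implement standard SPARQL 1.1 property paths but not the {m,n} notation, a path of at most n steps can be approximated by chaining the zero-or-one operator n times. The helper below is a small sketch of that idea; the name bounded_path is made up for illustration, and whether Virtuoso handles the generated path any better than + or * is not something the sketch establishes. DISTINCT is added because a sequence path may return the same category more than once.

```python
def bounded_path(predicate: str, max_len: int) -> str:
    """Return a SPARQL 1.1 path matching 0..max_len steps of `predicate`."""
    if max_len < 1:
        raise ValueError("max_len must be at least 1")
    # Each "p?" matches zero or one step, so n of them in sequence match 0..n steps.
    return "/".join(f"{predicate}?" for _ in range(max_len))


path = bounded_path("skos:broader", 7)
query = f"select distinct ?category {{ ?category {path} category:History }}"
print(query)
```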

Different SPARQL query engines give differing results for DESCRIBE Query

I tried one SPARQL query in two different engines:
Protege 4.3 - SPARQL query tab
Jena 2.11.0
While the query is the same the results returned by these two tools are different.
I tried a DESCRIBE query like the following:
DESCRIBE ?x
WHERE { ?x :someproperty "somevalue"}
The results from Protege contain triples that have ?x as either subject or object, while the ones from Jena only contain triples that have ?x as subject.
My questions are:
Is the syntax of SPARQL uniform?
If I want DESCRIBE to work as it does in Protege, what should I do in Jena?
To answer your first question: yes, the SPARQL syntax is uniform, since you've used the same query in both tools. However, what I think you are actually asking is whether the results from the two tools should be different or not, i.e. are the semantics of SPARQL uniform?
In the case of DESCRIBE, the results are explicitly allowed to differ by the SPARQL specification, i.e. no, the semantics of SPARQL are not uniform, but only in the case of DESCRIBE.
See Section 16.4 DESCRIBE (Informative) of the SPARQL Specification which states the following:
The query pattern is used to create a result set. The DESCRIBE form
takes each of the resources identified in a solution, together with
any resources directly named by IRI, and assembles a single RDF graph
by taking a "description" which can come from any information
available including the target RDF Dataset. The description is
determined by the query service
The important part of this is the last couple of sentences that say the description is determined by the query service. This means that both Protege's and Jena's answers are correct since they are allowed to choose how they form the description.
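To make the difference concrete, here is a toy Python sketch of two description strategies that would both be permitted: one collects only triples with the resource as subject, the other also includes triples where it appears as object. The data and function names are invented for illustration, and neither function claims to reproduce the actual algorithm of Jena or Protege; they only mirror the difference reported in the question.

```python
# Toy dataset of (subject, predicate, object) triples.
TRIPLES = [
    (":alice", ":someproperty", "somevalue"),
    (":alice", ":name", "Alice"),
    (":bob", ":knows", ":alice"),
    (":bob", ":name", "Bob"),
]


def describe_subject_only(resource, triples):
    """Description made of triples whose subject is the resource."""
    return [t for t in triples if t[0] == resource]


def describe_subject_or_object(resource, triples):
    """Description made of triples mentioning the resource as subject or object."""
    return [t for t in triples if t[0] == resource or t[2] == resource]


narrow = describe_subject_only(":alice", TRIPLES)
wide = describe_subject_or_object(":alice", TRIPLES)
print(len(narrow), len(wide))  # 2 3
```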
Changing Jena DESCRIBE handling
To answer the second part of your question: you can change how Jena processes DESCRIBE queries by implementing a custom DescribeHandler and an associated DescribeHandlerFactory. You then need to register your factory like so:
DescribeHandlerRegistry.get().add(new YourDescribeHandlerFactory());