command line tdbquery with text index - sparql

I trying to run a text search query with Jena via command line.
tdbquery --desc textsearch.ttl --query search.rq
The query return empty results with the messages:
17:23:46 WARN TextQueryPF :: Failed to find the text index : tried context and as a text-enabled dataset
17:23:46 WARN TextQueryPF :: No text index - no text search performed
My assembler file is:
#prefix : <http://localhost/jena_example/#> .
#prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
#prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
#prefix tdb: <http://jena.hpl.hp.com/2008/tdb#> .
#prefix ja: <http://jena.hpl.hp.com/2005/11/Assembler#> .
#prefix text: <http://jena.apache.org/text#> .
## Example of a TDB dataset and text index
## Initialize TDB
[] ja:loadClass "org.apache.jena.tdb.TDB" .
tdb:DatasetTDB rdfs:subClassOf ja:RDFDataset .
tdb:GraphTDB rdfs:subClassOf ja:Model .
## Initialize text query
[] ja:loadClass "org.apache.jena.query.text.TextQuery" .
# A TextDataset is a regular dataset with a text index.
text:TextDataset rdfs:subClassOf ja:RDFDataset .
# Lucene index
text:TextIndexLucene rdfs:subClassOf text:TextIndex .
# Solr index
text:TextIndexSolr rdfs:subClassOf text:TextIndex .
## ---------------------------------------------------------------
## This URI must be fixed - it's used to assemble the text dataset.
:text_dataset rdf:type text:TextDataset ;
text:dataset <#dataset> ;
text:index <#indexLucene> ;
.
# A TDB datset used for RDF storage
<#dataset> rdf:type tdb:DatasetTDB ;
tdb:location "DB2" ;
tdb:unionDefaultGraph true ; # Optional
.
# Text index description
<#indexLucene> a text:TextIndexLucene ;
text:directory <file:Lucene2> ;
##text:directory "mem" ;
text:entityMap <#entMap> ;
.
# Mapping in the index
# URI stored in field "uri"
# rdfs:label is mapped to field "text"
<#entMap> a text:EntityMap ;
text:entityField "uri" ;
text:defaultField "text" ;
text:map (
[ text:field "text" ; text:predicate rdfs:label ]
) .
My query is :
PREFIX text: <http://jena.apache.org/text#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?s
{ ?s text:query 'London' ;
rdfs:label ?label
}
I would like to know if I miss any configuration or this query can only be done inside fuseki.

First, you can do text search outside Fuseki. The example you took the code from shows how to do this using plain Jena Dataset in Java.
Second, Andy Seaborne on Jena mailing list suggests the following:
SELECT (count(*) AS ?C) { ?x text:query .... }
in order to "touch" the index before running the real queries.

Related

SHACL: Require sh:property to be a URI

I was wondering if there's a way to specify that a given sh:property is expected to have a value of a URI without any particular class.
In the example SHACL below, the property meta:value would only allow URIs, although I don't know of a way to represent it in SHACL.
Example SHACL
#prefix sh: <http://www.w3.org/ns/shacl#> .
#prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
#prefix meta: <metadata#> .
#prefix meta_sh: <metadata/shacl#> .
meta_sh:Entry
a sh:NodeShape;
sh:targetClass meta:Entry;
sh:property [
sh:path meta:value;
# Value expected to be a URI
sh:minCount 1; # Required; 1 or more
];
sh:property [
sh:path meta:time;
sh:datatype xsd:dateTime;
sh:minCount 1; sh:maxCount 1; # Required; 1
];
.
Example Data
#prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
#prefix meta: <metadata#> .
#prefix data: <data#> .
data:Entry_001
a meta:Entry;
meta:value data:ProductListing_459; # URI
meta:value data:RentalListing_934; # URI
meta:time "2022-06-15T06:20:31Z"^^xsd:dateTime;
.
this should work:
#prefix sh: <http://www.w3.org/ns/shacl#> .
#prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
#prefix meta: <metadata#> .
#prefix meta_sh: <metadata/shacl#> .
meta_sh:Entry
a sh:NodeShape;
sh:targetClass meta:Entry;
sh:nodeKind sh:IRI ;
sh:property [
sh:path meta:value;
# Value expected to be a URI
sh:minCount 1; # Required; 1 or more
];
sh:property [
sh:path meta:time;
sh:datatype xsd:dateTime;
sh:minCount 1; sh:maxCount 1; # Required; 1
];
.
Yes, you are right, the sh:nodeKind has to be part of the property shape.
The reason why it still not worked: the way you defined the prefixes. They were no full URIs so the SHACL processor added some parts (maybe the disk location of the file, depends). The result: the meta in the shapes never matched the meta in the data.
This works:
SHACL
#prefix sh: <http://www.w3.org/ns/shacl#> .
#prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
#prefix meta: <https://example.org/#> .
#prefix meta_sh: <metadata/shacl#> .
meta_sh:Entry
a sh:NodeShape;
sh:targetClass meta:Entry;
sh:property [
sh:path meta:value;
sh:nodeKind sh:IRI ;
sh:minCount 1; # Required; 1 or more
];
.
Data
#prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
#prefix meta: <https://example.org/#> .
#prefix data: <data#> .
data:Entry_001
a meta:Entry;
meta:value data:RentalListing_934 ; # URI
meta:time "2022-06-15T06:20:31Z"^^xsd:dateTime;
.
data:Entry_002
a meta:Entry;
meta:value "this is not an uri" ; # No Uri
meta:time "2022-06-15T06:20:31Z"^^xsd:dateTime;
.

SPARQL selecting unittext (blank nodes)

I recently started working with Linked Data and SPARQL.
I've a dataset which contains unittext, indicating what kind of unit the property has (meters, kilograms and so on).
The unit is a values which is inserted on the relationship between object and its quantitative property.
In my RDF dataset the units are included in a blank node and indicated by https://schema.org/unitText.
An example of the data set is included below.
], [
a owl:Restriction ;
owl:minCardinality "0"^^xsd:nonNegativeInteger ;
owl:onProperty <https://someuri> ;
ns1:unitText "kg"
How can I select this property?
The query so far is:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX sh: <http://www.w3.org/ns/shacl#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
select distinct ?label ?aspect ?datatype where {
?s rdfs:subClassOf/owl:onProperty ?aspect .
?aspect rdf:type owl:ObjectProperty.
?aspect skos:prefLabel ?label .
?aspect rdfs:range ?datatype .
FILTER EXISTS{ ?aspect rdfs:range ?datatype. }
VALUES ?datatype {
xsd:string
xsd:gYear
xsd:boolean
xsd:decimal
xsd:integer
xsd:date
}
}
The RDF dataset looks actually like this:
rdfs:subClassOf [ a owl:Restriction ;
owl:minCardinality "0"^^xsd:nonNegativeInteger ;
owl:onProperty <someuri> ;
<https://schema.org/unitText> "kg"
] ;

Inferencing in Protege does not infer types based on `subClassOf`

We are developing an ontology, where we need to infer the type of each node, based on some mappings defined in # Test mapping file section. We tried rdfs:subClassOf, as in the below snippet, but it doesn't infer types. Ideally we want to infer Person 1 and Person 2 to NodeTypePerson, also sydney and canberra to NodeTypeCity.
I tried owl:equivalentClass instead but no luck. rdfs:range and rdfs:domain infer types as expected, but having trouble inferring with rdfs:subClassOf. Any advice is highly appreciated.
UPDATE:
owl:equivalentClass works in Protege. (If infers the type). But can't we use rdfs:subClassOf similarly? Actually I want a way to get this done with RDFS inferencing, where owl:equivalentClass doesn't work obviously. Is there any other RDFS property we can use here?
# baseURI: http://www.Test-app.com/ns
#prefix : <http://www.Test-app.com/ns#> .
#prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
#prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
#prefix owl: <http://www.w3.org/2002/07/owl#> .
#prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
#prefix acl: <http://www.w3.org/ns/auth/acl#> .
#prefix cc: <http://creativecommons.org/ns#> .
#prefix cert: <http://www.w3.org/ns/auth/cert#> .
#prefix csvw: <http://www.w3.org/ns/csvw#> .
#prefix dc: <http://purl.org/dc/terms/> .
#prefix dcam: <http://purl.org/dc/dcam/> .
#prefix dcat: <http://www.w3.org/ns/dcat#> .
#prefix dctype: <http://purl.org/dc/dcmitype/> .
#prefix foaf: <http://xmlns.com/foaf/0.1/> .
#prefix ldp: <http://www.w3.org/ns/ldp#> .
#prefix posixstat: <http://www.w3.org/ns/posix/stat#> .
#prefix schema: <https://schema.org/> .
#prefix shacl: <http://www.w3.org/ns/shacl#> .
#prefix skos: <http://www.w3.org/2004/02/skos/core#> .
#prefix skosxl: <http://www.w3.org/2008/05/skos-xl#> .
#prefix solid: <http://www.w3.org/ns/solid/terms#> .
#prefix swapdoc: <http://www.w3.org/2000/10/swap/pim/doc#> .
#prefix ui: <http://www.w3.org/ns/ui#> .
#prefix vann: <http://purl.org/vocab/vann/> .
#prefix vcard: <http://www.w3.org/2006/vcard/ns#> .
#prefix vs: <http://www.w3.org/2003/06/sw-vocab-status/ns#> .
#prefix ws: <http://www.w3.org/ns/pim/space#> .
#base <http://www.Test-app.com/ns> .
<http://www.Test-app.com/ns>
rdf:type owl:Ontology ;
dc:title "Test Ontology"#en ;
owl:versionIRI <http://www.Test-app.com/ns/0.1> .
# Test mapping file
#prefix test: <http://www.Test-app.com/ns#> .
:Person
a rdfs:Class ;
owl:equivalentClass :NodeTypePerson .
:locatedNear
a rdf:Property ;
rdfs:subClassOf :NodeCustomAttribute ;
rdfs:label "Located Near" .
:Location
a rdfs:Class ;
rdfs:label "Location" ;
rdfs:subClassOf :NodeTypeCity ;
rdfs:comment "This represents a geolocation." .
:City
a rdfs:Class ;
rdfs:label "City" ;
rdfs:comment "This represents a city." ;
rdfs:subClassOf :Location .
# Some graph instances data
:sydney
a :City ;
rdfs:label "Sydney" ;
rdfs:comment "Australia's largest city." .
:canberra
a :City ;
rdfs:label "Canberra" ;
rdfs:comment "Australia's national capital." .
:person1
a :Person ;
rdfs:label "Person 1" ;
rdfs:comment "First vertex." ;
:locatedNear :sydney .
:person2
a :Person ;
rdfs:label "Person 2" ;
rdfs:comment "Second vertex." ;
:locatedNear :canberra .
# Types
:NodeTypePerson
rdf:type rdfs:Class ;
rdfs:label "NodeTypePerson" ;
rdfs:comment "" ;
vs:term_status "stable" ;
rdfs:isDefinedBy <http://www.Test-app.com/ns> .
:NodeTypeCity
rdf:type rdfs:Class ;
rdfs:label "NodeTypeCity" ;
rdfs:comment "" ;
vs:term_status "stable" ;
rdfs:isDefinedBy <http://www.Test-app.com/ns> .```

sparql and query in group of multiple predicates

I did not get the right sentence to ask the question.
I would like to explain my problem using a sample data.
let's say the triples are as below.
#prefix skos: <http://www.w3.org/2004/02/skos/core#> .
#prefix xs: <http://www.w3.org/2001/XMLSchema#> .
#prefix p0: <http://www.mlacustom.com#> .
#prefix p2: <http://www.mla.com/term/> .
_:bnode7021016689601753065 p0:lastModifiedDateTime "2018-09-14T12:55:38"^^<xsd:dateTime> ;
p0:lastModifiedUser "admin"^^xs:string .
<http://www.mla.com/name/4204078359> p0:hasClassingFacet "http://www.mla.com/facet/RA"^^xs:string ;
p0:type "Normal"^^xs:string ;
p0:classification _:bnode3452184423513029143 ,
_:bnode6827572371999795686 ;
p0:recordType "Name"^^xs:string ;
p0:recordNumber "4204078359"^^xs:string ;
p0:stdDescriptor "classification111111"^^xs:string ;
p0:establishedBy "admin"^^xs:string ;
skos:prefLabel "classification111111"^^xs:string ;
p0:createdBy "admin"^^xs:string ;
a skos:Concept ;
p0:createdDate "2018-09-14T12:55:38"^^<xsd:dateTime> ;
p0:establishedDate "2018-09-14T12:55:38"^^<xsd:dateTime> ;
p0:hasRAsubFacet "http://www.mla.com/subfacet/classing-subject-authors"^^xs:string ;
p0:lastModifiedDate "2018-09-14T12:55:38"^^<xsd:dateTime> ;
p0:lastModifiedDetails _:bnode7021016689601753065 ;
p0:isProblematic "N,N"^^xs:string ;
p0:lastModifiedBy "admin"^^xs:string ;
p0:status "established"^^xs:string .
_:bnode3452184423513029143 p0:literature p2:1513 ;
p0:timePeriod p2:1005 ;
p0:language p2:3199 .
_:bnode6827572371999795686 p0:literature p2:11307 ;
p0:timePeriod p2:1009 ;
p0:language p2:31 .
please have a look at the p0:classification it has two blank nodes and both the blank nodes has triples with p0:literature, p0:timePeriod, p0:language
Now I want to write a SPARQL query where
(
(p0:literature is p2:1513 AND p0:timePeriod is p2:1005) AND
(p0:literature is p2:11307 AND p0:timePeriod is p2:1009)
)
As per the above scenario it should return me the http://www.mla.com/name/4204078359 subject
Classification can have any number of blank nodes.
Here is one of the solutions. It is possible to find many different queries. This one is explicit.
SELECT ?s WHERE {
?s p0:classification ?o1 .
?s p0:classification ?o2 .
?o1 p0:literature p2:1513 ;
p0:timePeriod p2:1005 .
?o2 p0:literature p2:11307 ;
p0:timePeriod p2:1009 .
}

sparql query: get several values from a List

My query returns result dependent on which one of two sparql OPTIONS clauses come first for a List of Images. Although using Jena ARQ is not an option at this point, and I'd like to solve this with a pure SPARQL query, still I'd like to know how it could be solved with Jena as well.
My data presentation is attached below, the data may contain a list of images. My image representation is also below. I'm attaching my query as well.
The sprql query has two variables urlX, and urlY declared in the 2 OPTIONS blocks, if a List of images exists. Depending on which of the OPTIONS comes first, I get the value for that one variable, while the other one doesn't get reached. It seems the issue has to do with using OPTIONS clause. I'm not sure what else I can try instead, I'm far from being an expert on sparql queries. I want the query to do the following: if a collection of images is present, I want to see if both image sizes (dc:conformsTo) are present and get both urlX and urlY values, or get the ones that exist. Much appreciate your time.
My data representation:
#prefix dc: <http://purl.org/dc/terms/> .
#prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
#prefix lews: <http://lews.com/content/> .
#prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
lews:26331340 lews:name "the Human Good Luck Charm is Back"^^xsd:token ;
dc:created "2014-10-20T17:14:55.357-07:00"^^xsd:dateTime ;
dc:identifier "26331340"^^xsd:int ;
dc:modified "2016-08-04T13:43:00.897-07:00"^^xsd:dateTime ;
dc:title "the Human Good Luck Charm is Back" ;
dc:hasPart <http://lews.com/content/26331340#Images> ;
dc:abstract "As the World Series gets underway..." ;
dc:description "The super fan who rooted for the Royals is back to boost morale." ;
dc:subject "hoping for a World Series victory".
<http://lews.com/content/26331340#Images> dc:identifier "Images"^^xsd:token ;
rdf:first lews:26331375 ;
rdf:rest _:genid-15e530b0195547d9ac3f8e5e6785a747-x26331340Images26331376 ;
rdf:li <http://lews.com/content/26331340#Images> , _:genid-15e530b0195547d9ac3f8e5e6785a747-x26331340Images26331376 , _:genid-15e530b0195547d9ac3f8e5e6785a747-x26331340Images26331377 , _:genid-15e530b0195547d9ac3f8e5e6785a747-x26331340Images26331378 , _:genid-15e530b0195547d9ac3f8e5e6785a747-x26331340Images26331379 , _:genid-15e530b0195547d9ac3f8e5e6785a747-x26331340Images26331380 ;
a rdf:List .
_:genid-15e530b0195547d9ac3f8e5e6785a747-x26331340Images26331376 rdf:first lews:26331376 ;
rdf:rest _:genid-15e530b0195547d9ac3f8e5e6785a747-x26331340Images26331377 .
_:genid-15e530b0195547d9ac3f8e5e6785a747-x26331340Images26331377 rdf:first lews:26331377 ;
rdf:rest _:genid-15e530b0195547d9ac3f8e5e6785a747-x26331340Images26331378 .
_:genid-15e530b0195547d9ac3f8e5e6785a747-x26331340Images26331378 rdf:first lews:26331378 ;
rdf:rest _:genid-15e530b0195547d9ac3f8e5e6785a747-x26331340Images26331379 .
_:genid-15e530b0195547d9ac3f8e5e6785a747-x26331340Images26331379 rdf:first lews:26331379 ;
rdf:rest _:genid-15e530b0195547d9ac3f8e5e6785a747-x26331340Images26331380 .
_:genid-15e530b0195547d9ac3f8e5e6785a747-x26331340Images26331380 rdf:first lews:26331380 ;
rdf:rest rdf:nil .
My image representation:
#prefix dc: <http://purl.org/dc/terms/> .
#prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
#prefix lews: <http://abcnews.com/content/> .
#prefix mrss: <http://search.yahoo.com/mrss/> .
#prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
lews:26331376 lews:name "141020_wn_donvan0_704x396.jpg"^^xsd:token ;
lews:section "wnt"^^xsd:token ;
lews:type "Image"^^xsd:token ;
dc:conformsTo "704x396"^^xsd:token ;
dc:created "2014-10-20T17:15:09.637-07:00"^^xsd:dateTime ;
dc:hasFormat <http://lews.go.com/images/WNT/141020_wn_donvan0_704x396.jpg> ;
dc:identifier "26331376"^^xsd:int ;
dc:isPartOf <http://lews.go.com/WNT> ;
dc:modified "2014-10-20T17:15:09.947-07:00"^^xsd:dateTime ;
dc:type "StillImage"^^xsd:token ;
mrss:height "396"^^xsd:int ;
mrss:width "704"^^xsd:int ;
xsd:date "2014-10-20"^^xsd:date ;
xsd:gMonthDay "--10-20"^^xsd:gMonthDay ;
xsd:gYear "2014"^^xsd:gYear ;
xsd:gYearMonth "2014-10"^^xsd:gYearMonth .
My query:
PREFIX dc: <http://purl.org/dc/terms/>
PREFIX mrss: <http://search.yahoo.com/mrss/>
PREFIX search: <http://www.openrdf.org/contrib/lucenesail#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX lews: <http://abcnews.com/content/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?id ?title ?description ?urlX ?urlY ?section ?imgName
?subject dc:identifier ?id.
OPTIONAL {?subject dc:title ?title.}
OPTIONAL {?subject dc:description ?description.}
OPTIONAL {?subject dc:isPartOf ?section.}
OPTIONAL {
?subject dc:hasPart ?imageCol.
?imageCol dc:identifier "Images"^^xsd:token.
OPTIONAL{
?imageCol rdf:li ?bnode.
?bnode rdf:first ?image.
?image lews:name ?imgName;
dc:conformsTo "4x3";
dc:hasFormat ?urlX.
}
OPTIONAL{
?imageCol rdf:li ?bnode.
?bnode rdf:first ?image.
?image lews:name ?imgName;
dc:conformsTo "16x9";
dc:hasFormat ?urlY.
}
}
}
LIMIT ${limit}
If I understood correctly what it is you want, you just need to group the optionals differently:
PREFIX dc: <http://purl.org/dc/terms/>
PREFIX mrss: <http://search.yahoo.com/mrss/>
PREFIX search: <http://www.openrdf.org/contrib/lucenesail#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX lews: <http://abcnews.com/content/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?id ?title ?description ?urlX ?urlY ?section ?imgName {
?subject dc:identifier ?id.
OPTIONAL { ?subject dc:title ?title. }
OPTIONAL { ?subject dc:description ?description. }
OPTIONAL { ?subject dc:isPartOf ?section. }
OPTIONAL {
?subject dc:hasPart ?imageCol.
?imageCol dc:identifier "Images"^^xsd:token.
OPTIONAL {
?imageCol rdf:li ?bnode.
?bnode rdf:first ?image.
?image lews:name ?imgName;
# Here we optionally bind ?urlX and/or ?urlY
OPTIONAL {
?image dc:conformsTo "4x3";
dc:hasFormat ?urlX.
}
OPTIONAL {
?image dc:conformsTo "16x9";
dc:hasFormat ?urlY.
}
}
}
}