GraphDB load triples from URI - sparql

I'm trying to use the following SPARQL syntax with GraphDB. I've never used it anywhere else before, so I don't have a good sense of whether it's official.
See also Using the 'GRAPH' keyword in SPARQL to fetch remote graphs
LOAD <http://raw.githubusercontent.com/evidenceontology/evidenceontology/master/eco.owl> INTO GRAPH <http://example.com/loaded>
When I try to load an RDF/XML file with an .owl extension, I generally get an error message like
Internal Server Error (#500)
White spaces are required between publicId and systemId. [line 1,
column 63]
When I try to load a turtle file with a .ttl extension, I generally get something more like
Internal Server Error (#500)
IRI included an unencoded space: '32' [line 1]
What am I doing wrong?

Related

Not full error message shown after importing RDF text snippet

I am new to GraphDB. I am trying to use SHACL validation by pasting my shapes into the text field in the "import RDF text snippet" function. The problem I have is that if there is an error not the entire error message is shown. I can see "..." at the end of the error message that is shown but the details of what has gone wrong seem to be missing.
This is what I get:
org.eclipse.rdf4j.sail.shacl.GraphDBShaclSailValidationException:
Failed SHACL validation
#prefix sh: <http://www.w3.org/ns/shacl#> .
_:node1gh8je8tbx1779741 ...
The "..." at the end of the message show that there is more to this error message (and according to the SHACL validation I can expect more details to be shown in the error message).
I have tried it in both MS Edge and Chrome so I don't think it's a browser issue. There doesn't seem to be anything to expand the message. In case it makes a difference, I use Windows 10.
The message is intentionally cut off, because the violation reports tend to be too long. For the full message you can check GraphDB main log in /logs directory.

When passing a path as flowfile attribute XMLValidator doesn't work, but when passing the exact same path in the schema directly it does

i'm fairly new with working with NiFi. We're trying to validate an xmlfile, except we need to use a different xsd depending on some value passed in the file. Extracting and routing on the name wasn't an issue, and we stored the desired filepath in an attribute (xsdFile).
However, when trying to use that attribute in the XMLValidation processor, it changes the path and gives an error. When I copy the path from the attributes and copy it to the schema, it works, so the path itself isn't wrong.
Attribute passed in flowfile:
xsdFile:
C:\Users\MYNAME\Documents\NiFi\FLOW_RESOURCES\input\validatexml\camt.053.001.02_CvW_2.xsd
XMLValidation processor properties:
Schema File: ${xsdFile}
Error:
Failed to properly initialize Processor. If still scheduled to run, NiFi will attempt to initialize and run the Processor again after the 'Administrative Yield Duration' has elapsed. Failure is due to java.io.FileNotFoundException:
Schema file not found at specified location: C:\Users\MYNAME\DOCUME~1\NiFi\NIFI-1~1.0: java.io.FileNotFoundException:
Schema file not found at specified location: C:\Users\MYNAME\DOCUME~1\NiFi\NIFI-1~1.0
java.io.FileNotFoundException: Schema file not found at specified location: C:\Users\MYNAME\DOCUME~1\NiFi\NIFI-1~1.0
Why does this not work? Is there another way to do this, or do we need to route to different XMLValidators?
Check documentation for this processor:
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.9.2/org.apache.nifi.processors.standard.ValidateXml/index.html
Schema File:
The path to the Schema file that is to be used for validation
Supports Expression Language: true
(will be evaluated using variable registry only)
So, flow file attribute can't be used for this parameter

Solr pdf index bad request

I'd like to have a simple setup of solr where I can index and search large folders of pdf/docx files. I mostly need just full text search, no need to have fields separated and the original documents do not seem to have well defined structure anyway. I follow https://lucene.apache.org/solr/quickstart.html which is straightforward, however, when I try to index my own folder with some pdf files, some files return error like:
POSTing file G1504225.pdf (application/pdf) to [base]/extract
SimplePostTool: WARNING: Solr returned an error #400 (Bad Request) for
url: http://localhost:8983/solr/gettingstarted/update/extract?
resource.name=%2Fhome%2Fsolr%2Fsolr-6.5.1%2F..%2Ftrain_data%2FG1504225.pdf&literal.id=%2Fhome%2Fsolr%2Fsolr-6.5.1%2F..%2Ftrain_data%2FG1504225.pdf
SimplePostTool: WARNING: Response: <?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">400</int><int
name="QTime">263</int></lst><lst name="error"><lst name="metadata"><str
name="error-class">org.apache.solr.common.SolrException</str><str
name="root-error-class">java.lang.NumberFormatException</str><str
name="error-class">org.apache.solr.update.processor.DistributedUpdateProcessor$DistributedUpdatesAsyncException</str><str name="root-error-class">org.apache.solr.update.processor.DistributedUpdateProcessor$DistributedUpdatesAsyncException</str></lst><str name="msg">Async exception during distributed update: Error from server at http://127.0.1.1:8983/solr/gettingstarted_shard2_replica1: Bad Request
request:
http://127.0.1.1:8983/solr/gettingstarted_shard2_replica1/update?update.chain=add-unknown-fields-to-the-schema&update.distrib=TOLEADER&distrib.from=http%3A%2F%2F127.0.1.1%3A8983%2Fsolr%2Fgettingstarted_shard1_replica1%2F&wt=javabin&version=2
Remote error message: ERROR: [doc=/home/solr/solr-6.5.1/../train_data/G1504225.pdf] Error adding field 'title'='United Nations' msg=For input string: "United Nations"</str><int name="code">400</int></lst>
</response>
SimplePostTool: WARNING: IOException while reading response:
java.io.IOException: Server returned HTTP response code: 400 for URL:
http://localhost:8983/solr/gettingstarted/update/extract?
resource.name=%2Fhome%2Fsolr%2Fsolr-6.5.1%2F..%2Ftrain_data%2FG1504225.pdf&literal.id=%2Fhome%2Fsolr%2Fsolr-6.5.1%2F..%2Ftrain_data%2FG1504225.pdf
Most of the files are fine and I can search them. Any ideas?
Solr uses Tika to extract the text from those files. Some types of files, pdf specially, are hard to parse, as it is a proprietary format and Tika is always trying to catch up edge cases etc. So it is normal that some files will throw errors. You have to expect that.
See how many instances of NumberFormatException/pdfbox are found...(pdfbox is the library Tika uses for pdf files).
If you really want to get all the text from all pdf, even the ones erroring, you can put them in a special folder, and process them again extracting the text yourself with another library, different libraries will have different results of the same pdf, so you can use the superset of the text several libraries produce. But you will have to write some glue code for this, unless Tika allows you to plug specific libraries for specific file types (not sure if it does now, it didn't do that before).

JSON Schema for FHIR false positives

I am new to JSON Schema, and am trying to validate JSON based on the HL7-FHIR schemas. Data I think should be invalid (and that the official Java-based validator says are invalid) shows up as valid.
For example, {"dog": "food"} should be invalid, because when I run the validator, I get:
> java -jar org.hl7.fhir.validator.jar bad.json -defn definitions.json.zip
.. load FHIR from definitions.json.zip
.. connect to tx server # http://tx.fhir.org/r3
(vnull-null)
.. validate
*FAILURE* validating bad.json: error:1 warn:0 info:0
Fatal # $ (line 1, col2) : Unable to find resourceType property
But if I paste the fhir.schema.json file from here into a JSON Schema validator like the one here, and evaluate {"dog": "food"}, it's valid.
It's valid even if I supply a resourceType, which I thought might cause the restrictions to kick in. It's also valid if I copy an example I expect to be valid—say, this Practitioner example—and change some of the types (set name to be a string rather than an array, for example).
I'm not sure if I'm running into a problem with the HL7-FHIR JSON Schema in particular or with JSON Schemas in general. I believe my question is different than this one because it appears that we're up to release 3.0, and so the schema I'm using is updated.

bq CLI says my JSON schema is invalid while the browser GUI says it's fine. Where am I going wrong?

I have a JSON schema:
[{"name":"timestamp","type":"integer"},{"name":"xml_id","type":"string"},{"name":"prod","type":"string"},{"name":"version","type":"string"},{"name":"distcode","type":"string"},{"name":"origcode","type":"string"},{"name":"overcode","type":"string"},{"name":"prevcode","type":"string"},{"name":"ie","type":"string"},{"name":"os","type":"string"},{"name":"payload","type":"string"},{"name":"language","type":"string"},{"name":"userid","type":"string"},{"name":"sysid","type":"string"},{"name":"loc","type":"string"},{"name":"impetus","type":"string"},{"name":"numprompts","type":"record","mode":"repeated","fields":[{"name":"type","type":"string"},{"name":"count","type":"integer"}]},{"name":"rcode","type":"record","mode":"repeated","fields":[{"name":"offer","type":"string"},{"name":"code","type":"integer"}]},{"name":"bw","type":"string"},{"name":"pkg_id","type":"string"},{"name":"cpath","type":"string"},{"name":"rsrc","type":"string"},{"name":"pcode","type":"string"},{"name":"opage","type":"string"},{"name":"action","type":"string"},{"name":"value","type":"string"},{"name":"other","type":"record","mode":"repeated","fields":[{"name":"param","type":"string"},{"name":"value","type":"string"}]}]
(http://jsoneditoronline.org/ for pretty print)
When loading through the browser GUI the schema is accepted as valid. The cli throws the following error:
BigQuery error in load operation: Invalid schema entry: "fields":[{"name":"type"
Is there something wrong with my schema as specified?
If you are passing the schema as json, you should write it to a file and pass the file name as the schema parameter. Passing the schema inline on the command line is only allowed for simple flat schemas.