Saving SPARQL query results from AWS Neptune to CSV files

Can anyone please let me know how to save the results of SPARQL queries in AWS Neptune to a CSV file? I am using a SageMaker notebook to connect to the database cluster. Please find the query below. Any help would be greatly appreciated.
%%sparql
PREFIX mo: <>
SELECT (strafter(str(?s), '#') AS ?sample) WHERE {
  ?s mo:belongs_to_project ?o .
  FILTER(regex(str(?o), "PROJECT_00001"))
}

The results from a query can be saved to a Python variable using:
%%sparql --store-to result
Then in another cell you could write a little Python code that takes the result (which will be in JSON) and creates a CSV containing whatever data you need from the result using the Python CSV helper classes and methods (https://docs.python.org/3/library/csv.html).
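A minimal sketch of what that second cell might look like, assuming `result` holds the standard SPARQL 1.1 JSON results structure (a `head.vars` list plus `results.bindings`). The function name `sparql_json_to_csv` and the sample data are made up for illustration:

```python
import csv
import io


def sparql_json_to_csv(result, out):
    """Write a SPARQL JSON result (from a SELECT query) to `out` as CSV."""
    variables = result["head"]["vars"]
    writer = csv.writer(out)
    writer.writerow(variables)
    for binding in result["results"]["bindings"]:
        # Unbound variables are absent from a binding, so fall back to "".
        writer.writerow([binding.get(var, {}).get("value", "") for var in variables])


# Hypothetical sample in the shape that --store-to is assumed to produce.
sample_result = {
    "head": {"vars": ["sample"]},
    "results": {
        "bindings": [
            {"sample": {"type": "literal", "value": "SAMPLE_001"}},
            {"sample": {"type": "literal", "value": "SAMPLE_002"}},
        ]
    },
}

buffer = io.StringIO()
sparql_json_to_csv(sample_result, buffer)
csv_text = buffer.getvalue()
```

In the notebook you would pass the stored `result` variable and a real file instead, for example one opened with `open("results.csv", "w", newline="")`.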
UPDATE: if you just want the whole result as a CSV file, you can get it by running a curl command from a cell using the %%bash magic. Here is an example:
%%bash
curl -X POST --data-binary 'query=SELECT ?s ?p ?o WHERE {?s ?p ?o} LIMIT 10' -H "Accept: text/csv" https://{cluster endpoint}:8182/sparql > results.csv

Related

Retrieving data from Neptune DB using SPARQL queries

I am trying to retrieve data from Neptune DB using SPARQL queries. From a local Jupyter notebook, I connected to an EC2 instance that is in the same VPC as Neptune. But the query does not retrieve any data and stdout is empty. I'm not sure where I am going wrong. Any help would be greatly appreciated. Please find the query below.
stdin, stdout, stderr = ssh.exec_command(
' curl https://.cluster-.us-east1.neptune.amazonaws.com:8182/sparql \-d "query=PREFIX mol: <http://www.semanticweb.org/25#>\
SELECT * WHERE {?s ?p ?o } LIMIT 1" \ -H "Accept: text/csv"')
Thank you in advance.
You may be running into issues with cURL, which has trouble with multi-line input (such as a SPARQL query). It is best to use the following format:
curl 'https://cluster.cluster.us-east-2.neptune.amazonaws.com:8182/sparql' \
-H "Accept: text/csv" \
--data-urlencode query='
PREFIX mol: <http://www.semanticweb.org/25#>
SELECT * WHERE { ?s ?p ?o } LIMIT 1'
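The same request can be sketched in Python, where URL-encoding the form body takes care of the newlines in the query. This uses only the standard library; the endpoint is a placeholder, and `build_sparql_request` is a hypothetical helper, not part of any Neptune tooling:

```python
import urllib.parse
import urllib.request

QUERY = """PREFIX mol: <http://www.semanticweb.org/25#>
SELECT * WHERE { ?s ?p ?o } LIMIT 1"""


def build_sparql_request(endpoint, query):
    """Build a POST request carrying the query URL-encoded in a form body."""
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Accept": "text/csv"},
    )


# Placeholder endpoint; substitute your own cluster address.
request = build_sparql_request("https://example.invalid:8182/sparql", QUERY)

# To actually send it from a host that can reach the cluster:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

Because the newline is encoded as `%0A` inside the body, the multi-line query survives intact regardless of how the surrounding shell or SSH command is quoted.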

Running gh-rdf3x's rdf3xquery command fails with: parse error: unknown prefix 'http'

I am trying to use the gh-rdf3x engine to run some SPARQL queries. I took the LUBM-100 dataset and used the RDF2RDF tool to convert all the .owl files into a single test.nt file.
Then I ran the gh-rdf3x command
./rdf3xload dataDB test.nt
to build a dataDB file. Finally, to run a query, I saved LUBM SPARQL query #1 as test.sparql.
Then I ran the command
./rdf3xquery dataDB test.sparql
It prints
parse error: unknown prefix 'http'
I did everything as described in the GH-RDF3X wiki, so I don't know why it prints that.
The message may come from the file gh-rdf3x/cts/parser/TurtleParser.cpp.
Thank you for your help.
I guess you're using the LUBM query from this file, which unfortunately contains several syntax errors.
The first query is missing the angle brackets < and > which must be put around full URIs:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ub: <http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#>
SELECT ?X WHERE {
?X rdf:type ub:GraduateStudent .
?X ub:takesCourse <http://www.Department0.University0.edu/GraduateCourse0>
}
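If you want to catch this kind of mistake before handing a query to the engine, a rough check can be sketched with a regular expression. This is only a heuristic, not a SPARQL parser, and the `broken` query below is a made-up illustration of the error rather than the exact contents of the LUBM file:

```python
import re

# Heuristic only: an absolute IRI that is not directly preceded by "<",
# a quote, a slash, or a word character is probably missing its angle brackets.
BARE_IRI = re.compile(r"""(?<![<\w"'/])https?://[^\s<>"']+""")


def find_bare_iris(query):
    """Return IRIs in a SPARQL query that appear without angle brackets."""
    return BARE_IRI.findall(query)


broken = "SELECT ?X WHERE { ?X ub:takesCourse http://www.Department0.University0.edu/GraduateCourse0 }"
fixed = "SELECT ?X WHERE { ?X ub:takesCourse <http://www.Department0.University0.edu/GraduateCourse0> }"

print(find_bare_iris(broken))
print(find_bare_iris(fixed))
```

Without the brackets, a parser reads `http:` as a prefix name, which is consistent with the "unknown prefix 'http'" message.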

Virtuoso ISQL Result Dump Format

I am running the following query in Virtuoso isql.
SPARQL
CONSTRUCT
{
?infectee ?getInfectedBy ?infector
}
FROM <http://ndssl.bi.vt.edu/chicago/>
WHERE
{
?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://ndssl.bi.vt.edu/chicago/vocab/dendrogram>.
?s <http://ndssl.bi.vt.edu/chicago/vocab/dendrogram_infectee_pid> ?infectee.
?s <http://ndssl.bi.vt.edu/chicago/vocab/dendrogram_infector_pid> ?infector.
?s <http://ndssl.bi.vt.edu/chicago/vocab/dendrogram_iteration> '0'^^xsd:decimal.
BIND (iri('http://ndssl.bi.vt.edu/chicago/vocab/getInfectedBy') as ?getInfectedBy)
};
I want to dump the result in "N-Triples" format. How can I do that in isql?
Answered on the Virtuoso Users mailing list where the question was also asked...
Dumping results in various formats can be done by using the
define output:format "{XX}"
pragma, so in your case it would be:
SQL> sparql define output:format "TURTLE" CONSTRUCT ...
Other possible formats are:
NICE_TTL
RDF_XML
etc.
When using the ISQL client to fetch long texts, use the set blobs on; directive to avoid receiving a data truncated warning.
i.e.:
SQL> set blobs on;
SQL> sparql define output:format ...
For CONSTRUCT, the supported formats are:
TRIG, TTL, JSON, JSON;TALIS, SOAP, RDF/XML, NT, RDFA;XHTML, JSON;RES, HTML;MICRODATA, HTML, JS, ATOM;XML, JSON;ODATA, XML, CXML;QRCODE, CXML, HTML;UL, HTML;TR, JSON;LD, CSV, TSV, NICE_TTL, HTML;NICE_MICRODATA, HTML;SCRIPT_LD_JSON, HTML;SCRIPT_TTL, HTML;NICE_TTL
Documentation links:
Pragmas to control the type of the result
List of supported formats
Examples:
Example Dump arbitrary query result as N-Triples
Controlling SPARQL Output Data Types
To get the result into a local file, the following should work:
Write the data to a local file XX.ttl:
isql host:port dba pwd exec="set blobs on; sparql define output:format '"TURTLE"' construct {...} from <....> where {....}" > XX.ttl
Trim the isql header lines so that the file contains only the triples (tail -n +9 starts the output at line 9):
tail -n +9 XX.ttl > XX_new.ttl

Query a Fuseki server using Python (or something)

I'm trying to issue a complicated query against a Fuseki server that I'm running locally. When I run it through a browser, it keeps crashing. Is it possible to do it through a Python script? If so, how?
You can use any suitable command line tool, for example curl:
curl http://localhost:3030/your_service/sparql --data 'query=ASK { ?s ?p ?o . }'
If you want to use Python specifically, you can use SPARQLWrapper, or just the Requests package.
Example using Requests:
import requests

# POST the query as form data to the dataset's SPARQL endpoint.
response = requests.post('http://localhost:3030/your_service/sparql',
                         data={'query': 'ASK { ?s ?p ?o . }'})
print(response.json())
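What `response.json()` contains depends on the query form: in the SPARQL 1.1 JSON results format, an ASK query yields a `boolean` member, while a SELECT query yields `results.bindings`. A small sketch of handling both, using a hypothetical helper name and hand-written sample payloads in place of a live server:

```python
def simplify_sparql_json(payload):
    """Return a bool for ASK results, or a list of plain dicts for SELECT results."""
    if "boolean" in payload:
        return payload["boolean"]
    return [
        {name: term["value"] for name, term in binding.items()}
        for binding in payload["results"]["bindings"]
    ]


# Hand-written payloads in the SPARQL 1.1 JSON results format.
ask_payload = {"head": {}, "boolean": True}
select_payload = {
    "head": {"vars": ["s", "title"]},
    "results": {
        "bindings": [
            {
                "s": {"type": "uri", "value": "http://example.org/book/book1"},
                "title": {"type": "literal", "value": "SPARQL Tutorial"},
            }
        ]
    },
}

print(simplify_sparql_json(ask_payload))
print(simplify_sparql_json(select_payload))
```

With a running server you would pass `response.json()` to the helper instead of the sample payloads.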
./s-query --service=http://localhost:3030/myDataset/query --query=/home/matthias/EIS/EDSA/27/18.05/queryFile.rq
The command above also works.
Follow the ideas from the SOH - SPARQL over HTTP page, i.e.
SOH SPARQL Query
s-query --service=endpointURL 'query string'
s-query --service=endpointURL --query=queryFile.rq

Is there a way to write my RDF data directly and query it via SPARQL?

I'm practicing some SPARQL queries and features. My problem is that whenever I get new RDF data, I go to Protégé, build the classes, relationships, and instances that go with the data, explore the data, import it into a dataset in Fuseki, and only then query it.
If I make any mistake in my data, I need to repeat the whole process.
It's becoming tedious and time-consuming, so I'd like to ask whether there is a place (a tool, a Fuseki plugin, or anything else) where I can write my simple RDF data directly and then query it from Fuseki.
The data I want to write is really simple, for example:
@prefix dei: <http://www.warsaw.pl/ > .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
dei:~rcardish/
      rdf:type foaf:Person ;
      foaf:name "Rafa Cardish" ;
      foaf:member <http://www.warsaw.pl> ;
      rdfs:SeeAlso <http://www.linkedin.com/in/rafacardish>
But even such a small, simple piece of RDF takes me a long time with my current approach.
Your help is appreciated.
Update 1
Now I can do this to load my data into Fuseki:
./fuseki-server --file=/Users/bla bla bla/rdfData.rdf /testdataset
and this is the data
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix : <http://example.org/book/> .
@prefix ns: <http://example.org/ns#> .
:book1 dc:title "SPARQL Tutorial" .
:book1 ns:price 42 .
:book1 ns:discount 0.2 .
:book2 dc:title "The Semantic Web" .
:book2 ns:price 23 .
:book2 ns:discount 0.25 .
But I don't know how to query it: when I run Fuseki and go to the query page, I have to select a dataset, but no dataset is shown in the drop-down.
Update 2
Now it's working. My mistake was that the file extension was .ttl.rdf; when I changed it to just .ttl, it worked.
You can use command line tools (e.g. Redland tools) or write a simple script to load RDF and run SPARQL queries over it (e.g. using RDFlib):
1) Using Redland: roqet query.rq -D data.ttl -r json
Running it with a query for ?s dc:title ?title on your example data returns:
"bindings" : [
  {
    "s" : { "type": "uri", "value": "http://example.org/book/book1" },
    "title" : { "type": "literal", "value": "SPARQL Tutorial" }
  },
  {
    "s" : { "type": "uri", "value": "http://example.org/book/book2" },
    "title" : { "type": "literal", "value": "The Semantic Web" }
  }
]
JSON is one of the result formats, but you can also get CSV, etc.
See also: Command line Semantic Web with Redland
2) Using RDFlib to run SPARQL queries is also quite a compact way to do this (in Python):
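import rdflib

g = rdflib.Graph()
# data.ttl holds the example book data shown in the question.
g.parse("data.ttl", format="n3")
# rdflib resolves dc: from the prefixes bound on the graph,
# which include those declared in data.ttl.
qres = g.query(
    """SELECT DISTINCT *
       WHERE {
         ?s dc:title ?title .
       }""")
for row in qres:
    print("%s : %s" % row)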
Results:
http://example.org/book/book1 : SPARQL Tutorial
http://example.org/book/book2 : The Semantic Web
You can use Jena's command line tools to run SPARQL queries easily. In the Jena distribution, there's a sparql executable with --query and --data flags. Once you have your RDF file (if you're writing it by hand, I'd suggest using the Turtle/N3 serialization and giving your file the .ttl or .n3 suffix), it's easy to run a query. For instance, when I have a data.n3 file like this:
@prefix : <http://stackoverflow.com/a/35854800/1281433/> .
<http://stackoverflow.com/users/1281433/> :hasUserName "Joshua Taylor" .
and a query.rq like this:
prefix : <http://stackoverflow.com/a/35854800/1281433/>
select ?user where {
?user :hasUserName "Joshua Taylor"
}
Then I can get output like this:
$ sparql --query query.rq --data data.n3
---------------------------------------------
| user |
=============================================
| <http://stackoverflow.com/users/1281433/> |
---------------------------------------------