Fetch data from Factbook offline - SPARQL

Since Factbook's SPARQL endpoint is down, I downloaded the zip file of their data, but I do not understand how I can fire SPARQL queries at it. Any idea how this can be achieved? The unzipped data is not a single RDF file, so Fuseki is giving me an error.

Related

Airbyte ETL: connection between HTTP API source and BigQuery

I have a task at hand where I am supposed to create a Python-based HTTP API connector for Airbyte. The connector will return a response containing links to zip files.
Each zip file contains a CSV file, which is supposed to be uploaded to BigQuery.
So far I have made the connector, and it returns the URL of the zip file.
The main question is how to send the underlying CSV file to BigQuery.
I can certainly unzip or even read the CSV file in the Python connector, but I am stuck on the part of sending it to BigQuery.
P.S. If you can tell me about sending the CSV to Google Cloud Storage instead, that would be awesome too.
When you are building an Airbyte source connector with the CDK, your connector code must output records that will be sent to the destination, BigQuery in your case. This decouples the extraction logic from the loading logic and makes your source connector destination-agnostic.
I'd suggest the following high-level logic in your source connector's implementation (a sketch follows below):
Call the source API to retrieve the zip file's URL.
Download and unzip it.
Parse the CSV file with Pandas.
Output the parsed records.
This is under the assumption that all CSV files have the same schema; if not, you'll have to declare one stream per schema.
A great guide with more details on how to develop a Python connector is available here.
Once your source connector outputs AirbyteRecordMessages, you'll be able to connect it to BigQuery and choose the loading method that best fits your needs (Standard or GCS staging).
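Here is a minimal sketch of the download/unzip/parse/yield steps, assuming the zip URL has already been retrieved from the source API and that each archive holds plain CSV files; the function name is illustrative, and inside a real CDK stream you would yield these dicts from read_records so the CDK can wrap them into AirbyteRecordMessages:
import io
import zipfile

import pandas as pd
import requests

def read_csv_records(zip_url):
    # Download the archive that the source API linked to.
    response = requests.get(zip_url, timeout=60)
    response.raise_for_status()
    # Unzip in memory and parse every CSV inside.
    with zipfile.ZipFile(io.BytesIO(response.content)) as archive:
        for name in archive.namelist():
            if not name.endswith(".csv"):
                continue
            with archive.open(name) as csv_file:
                frame = pd.read_csv(csv_file)
            # One dict per CSV row; the CDK turns these into record messages.
            for record in frame.to_dict(orient="records"):
                yield record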

I want to set up the DBpedia dataset locally

I want to set up the DBpedia dataset locally, but I'm not sure how to do it. I have downloaded mappingbased_objects_en.ttl and infobox_properties_mapped_en.ttl.bz2; is there anything else I need to download?
How can I query this using SPARQL? Do I need to install anything to make it queryable via SPARQL? Is there any database software for SPARQL, like MySQL?
I tried http://dbpedia.org/sparql, but due to its 10,000-result query limit I want to set up DBpedia on my own system.
Any lead would be appreciated.
Thanks
PS: These two files (mappingbased_objects_en.ttl, infobox_properties_mapped_en.ttl.bz2) don't seem to have all the entity information; for example, Steve Jobs is not in those files but Tim Cook is, and I'm certain Steve Jobs is present in DBpedia.
You need to load DBpedia into a local triplestore, such as Virtuoso. I explain this in this article, but here is the gist of how to install and query DBpedia locally with the Virtuoso triplestore:
The Virtuoso Open Source Edition can be downloaded from here.
Once Virtuoso is installed, run it and start the VOS database.
Go to the Virtuoso admin page in the browser (you may have to give it a bit of time to start): http://localhost:8890/conductor/
Log in with the default credentials (dba/dba).
In the “Quad Store Upload” tab you can, for testing, upload a .ttl file to a specified named graph IRI, such as “http://localhost:8890/DBPedia”.
Next you can test the triplestore in the SPARQL tab or directly at the local endpoint. For example:
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT (COUNT(*) AS ?c) WHERE
{ ?category skos:broader <http://dbpedia.org/resource/Category:Environmental_issues> }
However, the upload might fail for bigger files.
For bigger files, and also for uploading multiple files, it is best to use the bulk upload.
In order to bulk load files from anywhere (and not just the Virtuoso import folder), you must add your folder to the DirsAllowed property in the Virtuoso configuration file virtuoso.ini, and restart Virtuoso for the changes to take effect. For example, assuming the dumps are in /tmp/virtuoso_db/dbpedia/ttl, you can add the path /tmp/virtuoso_db to DirsAllowed.
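For example, the relevant entry lives in the [Parameters] section of virtuoso.ini; keep whatever entries are already there and append your own path (the defaults vary by installation):
[Parameters]
DirsAllowed = ., ../vad, /tmp/virtuoso_db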
Once Virtuoso is back up and running, go to the Interactive SQL (ISQL) window and register the files to be loaded by typing in:
ld_dir('/tmp/virtuoso_db/dbpedia/ttl/','*.ttl','http://localhost:8890/DBPedia');
You can then perform the bulk load of all the registered files by typing in:
rdf_loader_run();
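Bulk loading bypasses Virtuoso's normal transaction log, so once the loader finishes it is worth persisting the state with a checkpoint; you can also inspect the per-file status in DB.DBA.load_list, where ll_state is 2 for finished files and ll_error is NULL when a file loaded cleanly:
checkpoint;
SELECT ll_file, ll_state, ll_error FROM DB.DBA.load_list;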
You can monitor the number of triples being loaded by running the following SPARQL query against the local endpoint:
SELECT (COUNT(*) AS ?c) WHERE { ?a ?b ?c }
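If you prefer running that check from code, here is a small sketch using Python's SPARQLWrapper against the local endpoint (the endpoint URL assumes a stock install):
from SPARQLWrapper import SPARQLWrapper, JSON

# Point at the local Virtuoso SPARQL endpoint.
sparql = SPARQLWrapper("http://localhost:8890/sparql")
sparql.setQuery("SELECT (COUNT(*) AS ?c) WHERE { ?a ?b ?c }")
sparql.setReturnFormat(JSON)
result = sparql.query().convert()

# Print the current triple count.
print(result["results"]["bindings"][0]["c"]["value"])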
Although #firefly's answer is still correct, there is a much simpler way to set up DBpedia locally, provided by DBpedia itself:
git clone https://github.com/dbpedia/virtuoso-sparql-endpoint-quickstart.git
cd virtuoso-sparql-endpoint-quickstart
COLLECTION_URI=https://databus.dbpedia.org/dbpedia/collections/latest-core VIRTUOSO_ADMIN_PASSWD=password docker-compose up
Source: https://github.com/dbpedia/virtuoso-sparql-endpoint-quickstart
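Once the containers are up, Virtuoso's SPARQL endpoint should be reachable at http://localhost:8890/sparql (the stock port), so the queries shown above can be run against it unchanged.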

Load OpenStreetMap data into Virtuoso

How can I load OpenStreetMap data for a particular area (e.g. Berlin) into the (open-source) Virtuoso triple store running on my local computer (Ubuntu)?
I tried to download the relevant OSM file and process it with Sparqlify in order to convert it to RDF (Turtle, etc.), which later (at least that was my idea) could be loaded into Virtuoso using the bulk-loading strategy. That did not work.
I would be happy if you could tell me whether there is any alternative way to convert the OSM files into RDF, or if there is a totally different approach.
I also thought about using Apache Jena within Java to access the LinkedGeoData endpoint directly; however, I guess having the data locally gives me much more performance later on when I run SPARQL queries.
Cheers

Access LinkedMDB offline [duplicate]

I want to query the Linked Movie Database at linkedmdb.org locally.
Is there some RDF or OWL version of it that I can download and query locally instead of remotely?
I tried to query it and got the following error:
org.openjena.riot.RiotException: <E:\Applications\linkedmdb-latest-dump\linkedmdb-latest-dump.nt> Code: 11/LOWERCASE_PREFERRED in SCHEME: lowercase is preferred in this component
org.openjena.riot.system.IRIResolver.exceptions(IRIResolver.java:256)
org.openjena.riot.system.IRIResolver.access$100(IRIResolver.java:24)
org.openjena.riot.system.IRIResolver$IRIResolverNormal.resolveToString(IRIResolver.java:380)
org.openjena.riot.system.IRIResolver.resolveGlobalToString(IRIResolver.java:78)
org.openjena.riot.system.JenaReaderRIOT.readImpl(JenaReaderRIOT.java:121)
org.openjena.riot.system.JenaReaderRIOT.read(JenaReaderRIOT.java:79)
com.hp.hpl.jena.rdf.model.impl.ModelCom.read(ModelCom.java:226)
com.hp.hpl.jena.util.FileManager.readModelWorker(FileManager.java:395)
com.hp.hpl.jena.util.FileManager.loadModelWorker(FileManager.java:299)
com.hp.hpl.jena.util.FileManager.loadModel(FileManager.java:250)
ServletExample.runQuery(ServletExample.java:92)
ServletExample.doGet(ServletExample.java:62)
javax.servlet.http.HttpServlet.service(HttpServlet.java:627)
javax.servlet.http.HttpServlet.service(HttpServlet.java:729)
There's a claim that there is a download from this page. Haven't tried it myself, so I don't know whether it's fresh or not.
There is a dump in N-Triples format at this address:
http://queens.db.toronto.edu/~oktie/linkedmdb/
If you want to query it, you can load the dump files into a local triple store such as 4store or Jena (using its relational storage support). Other libraries and tools are available, depending on the language you're more familiar with.
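For instance, if Python is an option, a quick experiment with rdflib could look like the following (the file name is taken from the question; parsing a dump this size into an in-memory graph is slow, so treat it as a sketch rather than a production setup):
from rdflib import Graph

g = Graph()
# Parse the N-Triples dump into an in-memory graph.
g.parse("linkedmdb-latest-dump.nt", format="nt")

# Run a SPARQL query locally over the parsed data.
for row in g.query("SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"):
    print(row.s, row.p, row.o)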
If you need more information let me know.

How to install the Jena Semantic Web framework in the Play Framework

I put the Jena jar files in the lib folder and see the message:
A JPA error occurred (Cannot start a JPA manager without a properly configured database): No datasource configured
What am I doing wrong?
I found the answer. This was a problem in Play.
For some reason something had been put in front of the class directive from the javax module. I do not know why it happened; I simply removed it and it worked.
This error is not related to Jena, because if you don't choose a model (dataset) when you execute a query, you get a different message instead: "No dataset description for query" together with a com.hp.hpl.jena.query.QueryExecException. But if you chose Jena as a datasource in Play, you may get your message (sorry, but I don't know much about Play).
What operations are you doing with Jena?
I don't know Jena well, but it seems that some persistent ontologies might be stored in a database. Would that mean Jena needs a database connection?
Is this an error from Jena and not from Play?
What do you try to do in your code before getting this error?
If Jena requires some configuration and resource creation before use, you should think about creating a little Jena Play plugin to initialize your Jena context.