Query and Graph are not found Fuseki Info [108] 400 Error - sparql

I am running a query from the live demo (http://biohackathon.org/d3sparql/). The SPARQL endpoint is changed to a localhost endpoint built with the Apache Jena Fuseki server. The RDF file is uploaded and stored on the server and the SPARQL endpoint is set. The setup for making the RDF file accessible via HTTP basically works, and requests can be tracked in the terminal.
So I am trying to use d3sparql to visualise the data. All I have done so far is change the endpoint and paste my SPARQL query into the d3sparql live demo page. However, when I run the query, I get this error:
[2017-02-25 00:05:30] Fuseki INFO [108] GET /ds :: 'data' :: <none> ? query=PREFIX%20up%3A%20%3Chttp%3A%2F%2Fpurl.uniprot.org%2Fcore%2F%3E%0APREFIX%20tax%3A%20%3Chttp%3A%2F%2Fpurl.uniprot.org%2Ftaxonomy%2F%3E%0ASELECT%20%3Froot_name%20%3Fparent_name%20%3Fchild_name%0AFROM%20%3Chttp%3A%2F%2Ftogogenome.org%2Fgraph%2Funiprot%3E%0AWHERE%0A%7B%0A%20%20VALUES%20%3Froot_name%20%7B%20%22Tardigrada%22%20%7D%0A%20%20%3Froot%20up%3AscientificName%20%3Froot_name%20.%0A%20%20%3Fchild%20rdfs%3AsubClassOf%2B%20%3Froot%20.%0A%20%20%3Fchild%20rdfs%3AsubClassOf%20%3Fparent%20.%0A%20%20%3Fchild%20up%3AscientificName%20%3Fchild_name%20.%0A%20%20%3Fparent%20up%3AscientificName%20%3Fparent_name%20.%0A%7D%0A%20%20%20%20%20%20
[2017-02-25 00:05:30] Fuseki INFO [108] 400 Neither ?default nor ?graph in the query string of the request (0 ms)
Has anybody had the same experience, or any advice?
Thank you in advance for your help.

The request is not directed to the query endpoint. Usually that's /ds/sparql or /ds/query.
It is going to /ds/data which is the default for SPARQL Graph Store Protocol (GSP) operations.
The error is from the GSP handler.
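In d3sparql's endpoint field this means pointing at something like http://localhost:3030/ds/sparql instead of http://localhost:3030/ds/data. A quick way to confirm the query service works, as a sketch (assuming the default Fuseki port 3030 and the dataset name ds from the log above):

# send a trivial SELECT to the query service and ask for JSON results
curl -G 'http://localhost:3030/ds/sparql' \
     -H 'Accept: application/sparql-results+json' \
     --data-urlencode 'query=SELECT * WHERE { ?s ?p ?o } LIMIT 10'

If this returns a JSON result set while /ds/data keeps returning the 400 above, only the endpoint URL in the demo page needs to change.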

Related

SPARQL over HTTP not executing

I am using the SOH commands provided by the Apache Jena Fuseki platform (s-get, s-post and s-query), but I always get the output s-get: command not found. What is wrong in this case, and how can I make this work?

How can I reconcile the AGROVOC SPARQL endpoint with OpenRefine

Given the SPARQL service available at http://agrovoc.uniroma2.it/sparql, is it possible to use it as a reconciliation service with the OpenRefine RDF extension?
So far I've tried but got the following error:
12:06:51.461 [..ctReconciliationService] error reconciling 'Lenteja' (3153ms)
org.shaded.apache.jena.query.QueryException: Endpoint returned Content-Type: text/html which is not recognized for SELECT queries
I think maybe I have the wrong endpoint.
The OpenRefine version is 3.3.

GraphDB SPARQL cannot make a federated query

I am trying to make a federated query using my local GraphDB database and SPARQL endpoint. For some reason I always get errors or empty columns in my result table.
I tried several SPARQL endpoints which work perfectly fine on FactForge. My code and error message are below; can you help me get it working?
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX bif: <bif:>
SELECT ?duration1 ?duration2 {
  VALUES (?start ?end)
    { ("2011-02-02T14:45:14"^^xsd:dateTime "2011-02-04T14:45:13"^^xsd:dateTime) }
  SERVICE <http://dbpedia.org/sparql> #-- Virtuoso
    { BIND ((?end - ?start)/86400.0 AS ?duration1) }
}
Error 500: error
Query evaluation error: org.eclipse.rdf4j.query.QueryEvaluationException: Virtuoso 22023 Error Can not load own service description
I am using the free version of GraphDB. My OS is macOS Mojave 10.14.5. I installed GraphDB directly from the website and I have never used RDF4J before. Also, after this error I installed a local Virtuoso SPARQL endpoint, and I can run a federated query from that endpoint using my local GraphDB as a service. But I cannot run a federated query from my local GraphDB endpoint using my local Virtuoso endpoint as a service, since it gives the same error as above.
The root cause of the above error is "Virtuoso 22023 Error Can not load own service description", which is handled as an unknown runtime exception, resulting in HTTP code 500.
I suspect this was a temporary glitch in the DBpedia service, since the same query currently works from GraphDB; I tested it with the latest GraphDB 9.0 version and an older GraphDB 8.x version via the FactForge service.
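One way to rule out a transient problem on the remote side is to query the DBpedia endpoint directly, outside GraphDB. A minimal sketch from the command line (the trivial ASK query is only a reachability check):

# ask the remote Virtuoso endpoint directly, bypassing the federating store
curl -G 'http://dbpedia.org/sparql' \
     -H 'Accept: application/sparql-results+json' \
     --data-urlencode 'query=ASK { ?s ?p ?o }'

If this answers normally while the SERVICE call from GraphDB still fails with the same 22023 error, the problem is more likely on the federation side than a temporary outage of the remote service.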

How to fetch Spark Streaming job statistics using REST calls when running in yarn-cluster mode

I have a Spark Streaming program running on a YARN cluster in "yarn-cluster" mode (--master yarn-cluster).
I want to fetch Spark job statistics using REST APIs in JSON format.
I am able to fetch basic statistics using the REST URL call:
http://yarn-cluster:8088/proxy/application_1446697245218_0091/metrics/json. But this gives only very basic statistics.
However, I want to fetch per-executor or per-RDD statistics.
How can I do that using REST calls, and where can I find the exact REST URL to get these statistics?
The $SPARK_HOME/conf/metrics.properties file sheds some light regarding URLs, i.e.
5. MetricsServlet is added by default as a sink in master, worker and client driver, you can send http request "/metrics/json" to get a snapshot of all the registered metrics in json format. For master, requests "/metrics/master/json" and "/metrics/applications/json" can be sent seperately to get metrics snapshot of instance master and applications. MetricsServlet may not be configured by self.
but those return HTML pages, not JSON. Only "/metrics/json" fetches stats in JSON format.
On top of that, knowing the application_id programmatically is a challenge in itself when running in yarn-cluster mode.
I checked the REST API section of the Spark Monitoring page, but that didn't work when running the Spark job in yarn-cluster mode. Any pointers/answers are welcome.
You should be able to access the Spark REST API using:
http://yarn-cluster:8088/proxy/application_1446697245218_0091/api/v1/applications/
From here you can select the app-id from the list and then use the following endpoint to get information about executors, for example:
http://yarn-cluster:8088/proxy/application_1446697245218_0091/api/v1/applications/{app-id}/executors
I verified this with my Spark Streaming application that is running in yarn-cluster mode.
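Since discovering the application_id programmatically is also mentioned as a pain point, one option is to ask the YARN ResourceManager REST API for running applications first and feed the id into the proxy URL. A sketch only (it assumes the ResourceManager is reachable at yarn-cluster:8088 and that jq is installed; the filter simply picks the first running Spark application):

# look up the id of a running Spark application via the YARN ResourceManager REST API
APP_ID=$(curl -s 'http://yarn-cluster:8088/ws/v1/cluster/apps?states=RUNNING&applicationTypes=SPARK' \
           | jq -r '.apps.app[0].id')

# per-executor statistics through the proxy, using that id for both the proxy path and the app-id
curl -s "http://yarn-cluster:8088/proxy/${APP_ID}/api/v1/applications/${APP_ID}/executors"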
I'll explain how I arrived at the JSON response using a web browser. (This is for a Spark 1.5.2 streaming application in yarn-cluster mode).
First, use the Hadoop URL to view the RUNNING applications: http://{yarn-cluster}:8088/cluster/apps/RUNNING.
Next, select a running application, say http://{yarn-cluster}:8088/cluster/app/application_1450927949656_0021.
Next, click on the TrackingUrl link. This uses a proxy, and the port is different in my case: http://{yarn-proxy}:20888/proxy/application_1450927949656_0021/. This shows the Spark UI. Now, append api/v1/applications to this URL: http://{yarn-proxy}:20888/proxy/application_1450927949656_0021/api/v1/applications.
You should see a JSON response with the application name supplied to SparkConf and the start time of the application.
I was able to reconstruct the metrics in the columns seen in the Spark Streaming web UI (batch start time, processing delay, scheduling delay) using the /jobs/ endpoint.
The script I used is available here. I wrote a short post describing it and tying its functionality back to the Spark codebase. This does not need any web scraping.
It works for Spark 2.0.0 and YARN 2.7.2, but may work for other version combinations too.
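For a rough idea of what that reconstruction looks like (a sketch only, not the linked script; the host, port, and application id are the placeholders from the walkthrough above), each job record returned by the /jobs endpoint carries submission and completion times from which batch-level delays can be derived:

# dump per-job timing fields from the Spark REST API behind the YARN proxy
curl -s 'http://{yarn-proxy}:20888/proxy/application_1450927949656_0021/api/v1/applications/application_1450927949656_0021/jobs' \
     | jq '.[] | {jobId, name, submissionTime, completionTime, status}'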
You'll need to scrape through the HTML page to get the relevant metrics. There isn't a Spark REST endpoint for capturing this info.

Multiple file upload to BigQuery

I am trying to upload multiple files simultaneously to Google BigQuery using the command-line tool. I got the following error:
BigQuery error in load operation: Could not connect with BigQuery server.
Http response status: 503
Http response content:
Service Unavailable
Is there any way to work around this problem?
How do I upload multiple files simultaneously to Google BigQuery using the command-line tool?
Multiple file upload should work (and we use it every day). If you're getting a 503, that indicates something is wrong with the service. One thing you might want to make sure of is that, if you're using a * in your command line, you have it quoted so that the shell doesn't expand it automatically before it gets passed to bq.
If you're getting a 503 error, can you retry the command with the flag --apilog=- (this needs to be one of the first params), which will dump the interaction with the server to stdout? The problem may be obvious from that log, but if it isn't, can you update your question with the relevant portions of the log? If you're not comfortable posting that information on a public forum, can you e-mail it to me at tigani at google dot com?
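A concrete shape of such an invocation, as a sketch (the dataset, table, and bucket names are made up; --apilog=- is a global flag, so it goes before the load command, and the quoted wildcard is passed to bq unexpanded):

# load multiple CSV files in one job; quotes keep the shell from expanding the *
bq --apilog=- load --source_format=CSV mydataset.mytable "gs://my-bucket/uploads/*.csv"

The API log printed to stdout shows exactly which requests were sent when the 503 came back.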