How can I use Jena to run SPARQL Update queries against a remote endpoint (assuming I am allowed to)?
So far, what I have is the following:
UpdateRequest update = UpdateFactory.create(queryString);
UpdateProcessor processor = UpdateExecutionFactory.create(update, dataset);
processor.execute();
But I do not know what dataset is supposed to be, or where to supply the remote endpoint URL.
OK, so here is the solution:
RDFConnection conn = RDFConnectionFactory.connect(endpoint);
UpdateRequest update = UpdateFactory.create(queryString);
conn.update(update);
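For completeness, here is a minimal, self-contained sketch of that solution; the endpoint URL and the INSERT payload are placeholders:

import org.apache.jena.rdfconnection.RDFConnection;
import org.apache.jena.rdfconnection.RDFConnectionFactory;
import org.apache.jena.update.UpdateFactory;
import org.apache.jena.update.UpdateRequest;

public class RemoteSparqlUpdate {
    public static void main(String[] args) {
        String endpoint = "http://example.org/dataset/update";   // placeholder update endpoint
        String queryString =
                "PREFIX foaf: <http://xmlns.com/foaf/0.1/> " +
                "INSERT DATA { <http://example.org/alice> a foaf:Person }";

        UpdateRequest update = UpdateFactory.create(queryString);
        // RDFConnection is AutoCloseable, so try-with-resources closes the connection for us.
        try (RDFConnection conn = RDFConnectionFactory.connect(endpoint)) {
            conn.update(update);
        }
    }
}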
I am trying to run a custom SQL query against an Oracle database source, using an existing connector from the Data Catalog, in an AWS Glue job script. I found this in the AWS documentation:
DataSource = glueContext.create_dynamic_frame.from_options(
    connection_type="custom.jdbc",
    connection_options={
        "query": "SELECT id, name, department FROM department WHERE id < 200",
        "connectionName": "test-connection-jdbc"},
    transformation_ctx="DataSource0")
But it's not working and I don't know why.
PS: the connector is well configured and tested.
The error raised is: getDynamicFrame. empty.reduceLeft
The expected behavior is that the query is executed and the result loaded to the target.
In Amazon Neptune I would like to run multiple Gremlin commands in Java as a single transaction. The documentation says that tx.commit() and tx.rollback() are not supported. It suggests this instead: multiple statements separated by a semicolon (;) or a newline character (\n) are included in a single transaction.
The example from the documentation shows that Gremlin is supported in Java, but I don't understand how to send "multiple statements separated by a semicolon":
GraphTraversalSource g = traversal().withRemote(DriverRemoteConnection.using(cluster));
// Add a vertex.
// Note that a Gremlin terminal step, e.g. next(), is required to make a request to the remote server.
// The full list of Gremlin terminal steps is at https://tinkerpop.apache.org/docs/current/reference/#terminal-steps
g.addV("Person").property("Name", "Justin").next();
// Add a vertex with a user-supplied ID.
g.addV("Custom Label").property(T.id, "CustomId1").property("name", "Custom id vertex 1").next();
g.addV("Custom Label").property(T.id, "CustomId2").property("name", "Custom id vertex 2").next();
g.addE("Edge Label").from(g.V("CustomId1")).to(g.V("CustomId2")).next();
The doc you are referring to covers the "string" mode for query submission. In your approach you are using the "bytecode" mode, because you go through a remote instance of the graph traversal source (the "g" object). Instead, you should submit a string script via the client object:
Client client = gremlinCluster.connect();
client.submit("g.V()...iterate(); g.V()...iterate(); g.V()...");
Gremlin sessions
Java Example
After getting the cluster object,
String sessionId = UUID.randomUUID().toString();
Client client = cluster.connect(sessionId);
client.submit(query1);
client.submit(query2);
.
.
.
client.submit(query3);
client.close();
When you run .close(), all the mutations get committed.
You can also capture the response from the query submission:
List<Result> results = client.submit(query);
results.stream()...
You can also use the SessionedClient, which will run all queries in the same transaction upon close().
More information is here: https://docs.aws.amazon.com/neptune/latest/userguide/access-graph-gremlin-sessions.html#access-graph-gremlin-sessions-glv
It seems that my query execution is now performed synchronously rather than asynchronously with the latest (as of Oct 17, 2017) version of the BigQuery libraries, available via "com.google.cloud:google-cloud-bigquery:0.26.0-beta".
I need to use the latest version so that I can properly set the maxBillingTier option.
Here is my code-snippet:
QueryJobConfiguration request =
    QueryJobConfiguration.newBuilder(query)
        .setDefaultDataset(datasetId)
        .setMaximumBillingTier(MAX_BILLING_TIER)
        .build();
BigQuery.QueryOption pageSizeOption = BigQuery.QueryOption.of(
    BigQuery.QueryResultsOption.pageSize(PAGE_SIZE));
BigQuery.QueryOption maxWaitOption = BigQuery.QueryOption.of(
    BigQuery.QueryResultsOption.maxWaitTime(MAX_WAIT_MILLIS));
QueryResponse response = null;
try {
    response = bigQuery.query(request, pageSizeOption, maxWaitOption);
} catch ( /* exception-handling code deleted for brevity */ ) {
    ...
}
return response.getJobId();
A similarly formatted request using QueryRequest from version 0.24.0 instead of QueryJobConfiguration would have (quickly) returned the jobId, which I could then use to poll for status. Now, I suddenly have no simple way of reporting the status of the query to my calling code.
Update:
I was able to get asynchronous query results with this approach:
QueryJobConfiguration request =
    QueryJobConfiguration.newBuilder(query)
        .setDefaultDataset(datasetId)
        .setMaximumBillingTier(MAX_BILLING_TIER)
        .build();
JobInfo jobInfo = JobInfo
    .newBuilder(request)
    .setJobId(jobId)
    .build();
Job job = bigQuery.create(jobInfo);
QueryResponse response = job.getQueryResults(pageSizeOption, maxWaitOption);
return response.getJobId();
Of course, I need to add exception-handling, but that's the gist. However, it is far less elegant than the simpler format available in version 0.24.0-beta.
Is there a more elegant solution?
Would setting the priority to BATCH have an effect on this?
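For reference, this is roughly what I mean; as far as I can tell the same builder exposes a priority setter, though the exact enum usage here is my assumption:

QueryJobConfiguration batchRequest =
    QueryJobConfiguration.newBuilder(query)
        .setDefaultDataset(datasetId)
        .setMaximumBillingTier(MAX_BILLING_TIER)
        // BATCH-priority jobs are queued and started when idle resources are available,
        // instead of running immediately at interactive priority.
        .setPriority(QueryJobConfiguration.Priority.BATCH)
        .build();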
I am using Pivotal GemFire 9.0.0 with 1 Locator and 1 Server. The Server has a Region called "submissions", like below -
<gfe:replicated-region id="submissionsRegion" name="submissions"
statistics="true" template="replicateRegionTemplate">
...
</gfe:replicated-region>
I am getting Region as null when executing the following code -
Region<K, V> region = clientCache.getRegion("submissions");
Surprisingly, the same ClientCache returns all the records when I query using OQL and QueryService as shown below -
String queryString = "SELECT * FROM /submissions";
QueryService queryService = clientCache.getQueryService();
Query query = queryService.newQuery(queryString);
SelectResults results = (SelectResults) query.execute();
I am initializing my ClientCache like this -
ClientCache clientCache = new ClientCacheFactory()
.addPoolLocator("localhost", 10479)
.set("name", "MyClientCache")
.set("log-level", "error")
.create();
I am really baffled by this. Any pointer or help would be great.
You need to configure your ClientCache (either through cache.xml or the pure GemFire API) with the regions as well. Using your example:
ClientRegionFactory regionFactory = clientCache.createClientRegionFactory(ClientRegionShortcut.PROXY);
Region region = regionFactory.create("submissions");
ClientRegionShortcut.PROXY is used here just for the sake of simplicity; you should use the shortcut that best meets your needs.
The OQL works as expected because you are obtaining the QueryService through the ClientCache.getQueryService() method (instead of ClientCache.getLocalQueryService()), so the query is actually executed on the server side.
You can get more information about how to configure the Client/Server topology in
Client/Server Configuration.
Hope this helps.
Cheers.
Yes, you need to "define" a corresponding client-side Region matching the server-side REPLICATE Region by name (i.e. "submissions"). Actually, this is a requirement independent of the server Region's DataPolicy type (e.g. REPLICATE or PARTITION).
This is necessary since not every client wants to know about, or even needs to have, data/events from every possible server Region. Of course, this is also configurable through subscriptions and "Interest Registration" (with Client/Server Event Messaging, or alternatively, CQs).
Anyway, you can completely avoid the use of the GemFire API directly or even GemFire's native cache.xml (highly recommend avoiding) by using either SDG's XML namespace...
<gfe:client-cache properties-ref="gemfireProperties" ... />
<gfe:client-region id="submissions" shortcut="PROXY"/>
Or by using Spring JavaConfig with SDG's API...
@Configuration
class GemFireConfiguration {

    Properties gemfireProperties() {
        Properties gemfireProperties = new Properties();
        gemfireProperties.setProperty("log-level", "config");
        ...
        return gemfireProperties;
    }

    @Bean
    ClientCacheFactoryBean gemfireCache() {
        ClientCacheFactoryBean gemfireCache = new ClientCacheFactoryBean();
        gemfireCache.setClose(true);
        gemfireCache.setProperties(gemfireProperties());
        ...
        return gemfireCache;
    }

    @Bean(name = "submissions")
    ClientRegionFactoryBean submissionsRegion(GemFireCache gemfireCache) {
        ClientRegionFactoryBean submissions = new ClientRegionFactoryBean();
        submissions.setCache(gemfireCache);
        submissions.setClose(false);
        submissions.setShortcut(ClientRegionShortcut.PROXY);
        ...
        return submissions;
    }

    ...
}
The "submissions" Region can be wrapped with SDG's GemfireTemplate, which will handle getting the "correct" QueryService on your behalf when running queries using the find(..) method.
Of course, you may be interested in making your client "submissions" Region a "CACHING_PROXY" too. You will then need to register "interest" in the keys or data you care about. CQs are the best way to do this, as they use query criteria to define the data of "interest".
CACHING_PROXY is exactly what it sounds like: data is cached locally in the client based on the interest policies. This also gives you the ability to use the "local" QueryService to query data locally, avoiding the network hop.
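Here is a small sketch of that variation using the plain GemFire API, assuming the client pool has subscriptions enabled:

// Create the client Region as CACHING_PROXY so data of interest is also stored locally.
ClientRegionFactory<Object, Object> factory =
    clientCache.createClientRegionFactory(ClientRegionShortcut.CACHING_PROXY);
Region<Object, Object> submissions = factory.create("submissions");

// Register interest so the server pushes creates/updates/destroys for these keys to this client.
submissions.registerInterest("ALL_KEYS");

// The local QueryService then queries the locally cached data, avoiding the network hop.
QueryService localQueryService = clientCache.getLocalQueryService();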
Anyway, many options here.
Cheers,
John
If I run a DELETE/INSERT query from the SPARQL endpoint, the UPDATE operation works. The SPARQL query is:
INSERT DATA INTO <PERSONGRAPH> { personURI rdf:type foaf:Person }
Is it possible to do the same operation using Java code (either Jena or VirtuosoExecutionFactory), such that the UPDATE operation happens without needing to load the entire graph into memory? I would like to invoke the SPARQL endpoint from code to do the UPDATE operation. Please correct me if the assumption that the entire set of triples of a graph will be loaded into memory is wrong. The following Jena code works, but it loads the entire model into memory, which brings the machine to a halt once the triple count grows above 50,000.
String queryString1 = " INSERT DATA { personURI rdf:type foaf:Person } ";
UpdateRequest request1 = UpdateFactory.create(queryString1);
UpdateAction.execute(request1, personModel);
I would like to do the same by invoking sparqlService or using createServiceRequest, so that it avoids loading the entire graph into memory, similar to the way it works for the SPARQL endpoint.
The following code is not updating the Virtuoso store.
String queryString1 = " INSERT DATA { personURI rdf:type foaf:Person } ";
com.hp.hpl.jena.query.Query query1 = com.hp.hpl.jena.query.QueryFactory.create(queryString1);
com.hp.hpl.jena.query.QueryExecution qexec1 = com.hp.hpl.jena.query.QueryExecutionFactory.sparqlService("http://IP:8890/sparql", query1);
I have tried using VirtuosoQueryExecutionFactory.sparqlService, QueryExecutionFactory.createServiceRequest and QueryExecutionFactory.sparqlService. These work for SELECT but not for UPDATE. Please let me know how to do update by invoking the SPARQL endpoint from Java code. Any suggestions, tips are much appreciated.
For new StackOverflow users there is a restriction of 2 URLs, and sadly personURI is a URL, so it can't be included inline. I am mentioning it here for completeness: personURI is http://onmobile.com/umdb/person/juhi_chawla_268e7a02-8737-464f-97f8-172961d3335b
Based on Andy's feedback, I tried both suggestions: UpdateExecutionFactory and a raw HTTP client.
With both, I got a "Connection refused" error when performing UPDATE, but not when doing SELECT. The proxy host and port are already set.
Thank you for the comment. The INSERT syntax works for Virtuoso Open Source, which is the store being used. I have a problem using UpdateExecutionFactory. I tried the following:
String queryString = "DELETE DATA FROM <PERSONGRAPH> { <"
+ personURI
+ "> rdf:type foaf:Person } ";
com.hp.hpl.jena.update.UpdateRequest request = com.hp.hpl.jena.update.UpdateFactory.create(queryString);
UpdateProcessor proc = UpdateExecutionFactory.createRemote(request, "http://IP:8890/sparql");
proc.execute();
and got the following error stack trace:
org.apache.http.conn.HttpHostConnectException: Connection to IP:8890 refused
at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:158)
at org.apache.http.impl.conn.AbstractPoolEntry.open(AbstractPoolEntry.java:149)
at org.apache.http.impl.conn.AbstractPooledConnAdapter.open(AbstractPooledConnAdapter.java:121)
at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:573)
at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:425)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:820)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:754)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:732)
at org.openjena.riot.web.HttpOp.execHttpPost(HttpOp.java:208)
at org.openjena.riot.web.HttpOp.execHttpPost(HttpOp.java:154)
at org.openjena.riot.web.HttpOp.execHttpPost(HttpOp.java:128)
at com.hp.hpl.jena.sparql.modify.UpdateProcessRemote.execute(UpdateProcessRemote.java:60)
Could it be possible that Virtuoso has a different URL endpoint for update?
An update is not a query in SPARQL.
Use UpdateExecutionFactory.
To stream data to the server, simply open an HTTP POST connection with content type application/sparql-update, and write the update (with large data) into the stream.
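A minimal sketch of that streaming approach with plain java.net; the endpoint URL and the update string are placeholders:

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class StreamSparqlUpdate {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://example.org/sparql-update");   // placeholder update endpoint
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/sparql-update");

        // Write the update directly into the request body; large data can be streamed here.
        try (OutputStream out = conn.getOutputStream()) {
            out.write("INSERT DATA { <http://example.org/s> <http://example.org/p> \"o\" }"
                    .getBytes(StandardCharsets.UTF_8));
        }

        System.out.println("HTTP response: " + conn.getResponseCode());
        conn.disconnect();
    }
}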
BTW:
INSERT DATA INTO <PERSONGRAPH>
isn't legal SPARQL Update syntax.
I'm not sure I follow your question. It's not clear what was added after AndyS' response, and what was there to start with. His note about your syntax error is probably worth more study, if you haven't resolved this yet -- and if you have, an update sharing that resolution would be a good idea.
I'd also recommend reviewing the documentation regarding Jena connections to Virtuoso.
Also worth noting -- questions specifically regarding Virtuoso are generally best raised on the public OpenLink Discussion Forums, the Virtuoso Users mailing list, or through a confidential Support Case.
You could use the following code to update the data directly on the server side, without loading it into local client memory.
public static void main(String[] args) {
    String url;
    if (args.length == 0)
        url = "jdbc:virtuoso://localhost:1111";
    else
        url = args[0];

    VirtGraph set = new VirtGraph(url, "dba", "dba");

    String str = "CLEAR GRAPH <http://test1>";
    VirtuosoUpdateRequest vur = VirtuosoUpdateFactory.create(str, set);
    vur.exec();

    str = "INSERT INTO GRAPH <http://test1> { <http://aa> <http://bb> 'cc' . <http://aa1> <http://bb> 123 . <http://aa1> <http://bb> 124 . <http://aa1> <http://bb> 125 . }";
    vur = VirtuosoUpdateFactory.create(str, set);
    vur.exec();
}
Look at VirtuosoSPARQLExample8.java and VirtuosoSPARQLExample9.java in Virtuoso Jena Provider examples.