How to run multiple Gremlin commands as a single transaction? - amazon-neptune

In Amazon Neptune I would like to run multiple Gremlin commands in Java as a single transaction. The documentation says that tx.commit() and tx.rollback() are not supported. It suggests this: "Multiple statements separated by a semicolon (;) or a newline character (\n) are included in a single transaction."
The example from the documentation shows that Gremlin is supported in Java, but I don't understand how to submit "multiple statements separated by a semicolon":
GraphTraversalSource g = traversal().withRemote(DriverRemoteConnection.using(cluster));
// Add a vertex.
// Note that a Gremlin terminal step, e.g. next(), is required to make a request to the remote server.
// The full list of Gremlin terminal steps is at https://tinkerpop.apache.org/docs/current/reference/#terminal-steps
g.addV("Person").property("Name", "Justin").next();
// Add a vertex with a user-supplied ID.
g.addV("Custom Label").property(T.id, "CustomId1").property("name", "Custom id vertex 1").next();
g.addV("Custom Label").property(T.id, "CustomId2").property("name", "Custom id vertex 2").next();
g.addE("Edge Label").from(g.V("CustomId1")).to(g.V("CustomId2")).next();

The doc you are referring to covers the "string" mode of query submission. In your approach you are using the "bytecode" mode, via the remote instance of the graph traversal source (the "g" object). Instead, you should submit a string script via the client object:
Client client = gremlinCluster.connect();
client.submit("g.V()...iterate(); g.V()...iterate(); g.V()...");
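For example, a minimal end-to-end sketch in string mode (the endpoint, labels, and property values here are placeholders, not from your setup):
import org.apache.tinkerpop.gremlin.driver.Client;
import org.apache.tinkerpop.gremlin.driver.Cluster;

public class NeptuneStringModeTx {
    public static void main(String[] args) {
        // Placeholder endpoint/port - replace with your Neptune cluster endpoint.
        Cluster cluster = Cluster.build()
                .addContactPoint("your-neptune-endpoint")
                .port(8182)
                .enableSsl(true)
                .create();
        Client client = cluster.connect();
        // Both statements travel in one request, so Neptune executes them
        // in a single transaction; if either fails, both are rolled back.
        client.submit(
                "g.addV('Person').property('name', 'Justin').iterate(); " +
                "g.addV('Person').property('name', 'Anna').iterate()").all().join();
        client.close();
        cluster.close();
    }
}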

Gremlin sessions
Java Example
After getting the cluster object,
String sessionId = UUID.randomUUID().toString();
Client client = cluster.connect(sessionId);
client.submit(query1);
client.submit(query2);
// ... more client.submit(...) calls in the same transaction ...
client.submit(query3);
client.close();
When you call .close(), all the mutations get committed.
You can also capture the results returned by each query:
List<Result> results = client.submit(query).all().get();
results.stream()...
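Putting the pieces together, a minimal sketch (the endpoint and the query strings are placeholders):
import java.util.List;
import java.util.UUID;
import org.apache.tinkerpop.gremlin.driver.Client;
import org.apache.tinkerpop.gremlin.driver.Cluster;
import org.apache.tinkerpop.gremlin.driver.Result;

public class NeptuneSessionTx {
    public static void main(String[] args) throws Exception {
        // Placeholder endpoint - replace with your Neptune cluster endpoint.
        Cluster cluster = Cluster.build().addContactPoint("your-neptune-endpoint").port(8182).create();
        String sessionId = UUID.randomUUID().toString();
        Client client = cluster.connect(sessionId);  // session-bound client
        try {
            client.submit("g.addV('Person').property('name', 'Justin')").all().get();
            List<Result> results = client.submit("g.V().limit(10)").all().get();
            results.forEach(r -> System.out.println(r.getString()));
        } finally {
            client.close();   // closing the session commits the mutations
            cluster.close();
        }
    }
}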

You can also use the SessionedClient, which runs all queries in the same transaction, committed upon close().
More information is here: https://docs.aws.amazon.com/neptune/latest/userguide/access-graph-gremlin-sessions.html#access-graph-gremlin-sessions-glv

Related

Do we need to close DBCPConnectionPool object in Custom Processor or is it handled by Controller Service itself?

I have created a custom processor which takes care of saving some records in a MySQL database. To set up the MySQL connection I am using a DBCPConnectionPool object in my custom processor, and it saves data to the database tables correctly. But I am worried about the pooling mechanism: I am not closing this connection after my saving logic completes. This works for 2 to 3 flowfiles, but will it work correctly when I send many flowfiles?
DBCPService dbcpService = context.getProperty(DBCP_SERVICE).asControllerService(DBCPService.class);
Connection con = dbcpService.getConnection();
I am looking for clarification, as my current flow works correctly with a small number of flowfiles.
You should be returning it to the pool, most likely with a try-with-resources block:
try (final Connection con = dbcpService.getConnection();
     final PreparedStatement st = con.prepareStatement(selectQuery)) {
    // use the statement here; both resources are closed (returned to the pool) automatically
}
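In context, a sketch of how that might look in onTrigger (the table, column, and value are made up for illustration; DBCP_SERVICE is your existing property descriptor):
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import org.apache.nifi.dbcp.DBCPService;
import org.apache.nifi.processor.ProcessContext;
import org.apache.nifi.processor.ProcessSession;
import org.apache.nifi.processor.exception.ProcessException;

// inside your custom processor
public void onTrigger(final ProcessContext context, final ProcessSession session) throws ProcessException {
    final DBCPService dbcpService = context.getProperty(DBCP_SERVICE).asControllerService(DBCPService.class);
    // try-with-resources returns the connection to the pool even if the insert fails
    try (final Connection con = dbcpService.getConnection();
         final PreparedStatement st = con.prepareStatement("INSERT INTO my_table (col) VALUES (?)")) {
        st.setString(1, "value");
        st.executeUpdate();
    } catch (final SQLException e) {
        throw new ProcessException("Failed to save record", e);
    }
}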
You can always consult the standard processors to see what they do:
https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/AbstractExecuteSQL.java#L223

Storing graph to gremlin server from in memory graph

I'm new to Graphs in general.
I'm attempting to store a TinkerPop graph that I've created dynamically into Gremlin Server, to be able to issue Gremlin queries against it.
Consider the following code:
Graph inMemoryGraph;
inMemoryGraph = TinkerGraph.open();
inMemoryGraph.io(IoCore.graphml()).readGraph("test.graphml");
GraphTraversalSource g = inMemoryGraph.traversal();
List<Result> results = client.submit("g.V().valueMap()").all().get();
I need some glue code. The Gremlin query here is issued against the modern graph, which is the default binding for the g variable. I would like to somehow store my inMemoryGraph so that when I run a Gremlin query, it's run against my graph.
All graph configurations in Gremlin Server must occur through its YAML configuration file. Since you say you're connected to the modern graph, I'll assume that you're using the default "modern" configuration file that ships with the standard distribution of Gremlin Server. If that is the case, then you should look at conf/gremlin-server-modern.yaml. You'll notice this entry:
graphs: {
  graph: conf/tinkergraph-empty.properties}
That creates a Graph reference in Gremlin Server called "graph" which you can reference from scripts. Next, note this second configuration:
org.apache.tinkerpop.gremlin.jsr223.ScriptFileGremlinPlugin: {files: [scripts/generate-modern.groovy]}}}
Specifically, pay attention to scripts/generate-modern.groovy which is a Gremlin Server initialization script. Opening that up you will see this:
// an init script that returns a Map allows explicit setting of global bindings.
def globals = [:]

// Generates the modern graph into an "empty" TinkerGraph via LifeCycleHook.
// Note that the name of the key in the "global" map is unimportant.
globals << [hook : [
    onStartUp: { ctx ->
        ctx.logger.info("Loading 'modern' graph data.")
        org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerFactory.generateModern(graph)
    }
] as LifeCycleHook]

// define the default TraversalSource to bind queries to - this one will be named "g".
globals << [g : graph.traversal()]
The comments should do most of the explaining. The connection here is that you need to inject your graph initialization code into this script and assign your inMemoryGraph.traversal() to g or whatever variable name you wish to use to identify it on the server. All of this is described in the Reference Documentation.
There is a way to make this work in a more dynamic fashion, but it involves extending Gremlin Server through its interfaces. You would have to build a custom GraphManager - the interface can be found here. Then you would set the graphManager key in the server configuration file with the fully qualified name of your instance.
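If you would rather not edit the init script, and assuming your server allows arbitrary script execution (the default TinkerGraph setup does) and the GraphML file is readable on the server's filesystem, you could also push the load across the wire by submitting it as a string script against the "graph" binding created above:
import org.apache.tinkerpop.gremlin.driver.Client;
import org.apache.tinkerpop.gremlin.driver.Cluster;

public class RemoteGraphmlLoad {
    public static void main(String[] args) {
        Cluster cluster = Cluster.open();   // defaults to localhost:8182
        Client client = cluster.connect();
        // "graph" is the server-side binding from the YAML configuration;
        // the path is resolved on the server, not on this client.
        client.submit("graph.io(graphml()).readGraph('data/test.graphml')").all().join();
        // queries against "g" now see the loaded data
        client.submit("g.V().valueMap()").all().join();
        client.close();
        cluster.close();
    }
}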

Get the JDBC Providers for the Cell using wsadmin

I am trying to list the JDBC providers at cell scope, but the command also lists the JDBC providers at node and server scope. How do I remove the node- and server-scoped providers from the list?
AdminConfig.list('JDBCProvider', AdminConfig.getid( '/Cell:CellV70A/'))
output:
'"DB2 Universal JDBC Driver Provider(cells/CellV70A/nodes/nodename|resources.xml#JDBCProvider_1302300228086)"\n"DB2 Universal JDBC Driver Provider(cells/CellV70A|resources.xml#JDBCProvider_1263590015775)"\n"WebSphere embedded ConnectJDBC driver for MS SQL Server(cells/CellV70A|resources.xml#JDBCProvider_1272027151294)"'
If you look at the help for the AdminConfig.list command:
wsadmin>print AdminConfig.help('list')
WASX7056I: Method: list
...
Method: list
Arguments: type, scope
Description: Lists all the configuration objects of the type named
by "type" within the scope of the configuration object named by "scope."
...
It says "within the scope". Since node and server-scoped JDBCProviders are within the scope of the cell, they are returned by your command. If you list all JDBCProviders at cell scope using the Admin Console and then look at the Command Assistance, you'll see something like:
Note that scripting list commands may generate more information than is displayed by the administrative console because the console generally filters with respect to scope, templates, and built-in entries.
AdminConfig.list('JDBCProvider', AdminConfig.getid('/Cell:MyCell/'))
So you'll need to filter the returned list similarly. You could throw together a very simple script to do so:
jdbcProviders = AdminConfig.list('JDBCProvider', AdminConfig.getid('/Cell:MyCell')).split('\r\n')
for jdbcProvider in jdbcProviders:
    # test each substring explicitly; 'if "/nodes/" or "/servers/" in x' is always true
    if "/nodes/" in jdbcProvider or "/servers/" in jdbcProvider:
        continue
    print jdbcProvider

How to use insert_job

I want to run a BigQuery SQL query using the insert_job method.
I ran the following code:
JobConfigurationQuery = Google::Apis::BigqueryV2::JobConfigurationQuery
bq = Google::Apis::BigqueryV2::BigqueryService.new
scopes = [Google::Apis::BigqueryV2::AUTH_BIGQUERY]
bq.authorization = Google::Auth.get_application_default(scopes)
bq.authorization.fetch_access_token!
query_config = {query: "select colA from [dataset.table]"}
qr = JobConfigurationQuery.new(configuration:{query: query_config})
bq.insert_job(projectId, qr)
and I got an error as below:
Caught error invalid: Job configuration must contain exactly one job-specific configuration object (e.g., query, load, extract, spreadsheetExtract), but there were 0:
Please let me know how to use the insert_job method.
I'm not sure what client library you're using, but insert_job most likely takes a Job whose configuration field is a JobConfiguration. You should create one of those, set its query field to the JobConfigurationQuery you've created, and wrap the result in a Job.
This is necessary because this single API method can insert various kinds of jobs (query, load, copy, extract), and they all take one configuration object with a subfield that specifies the job type and the details of the job to insert.
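I don't have a Ruby snippet handy, but for illustration, here is the same nesting with the generated Java client (google-api-services-bigquery); the bigquery client and projectId are assumed to already exist, and the Ruby classes mirror this structure:
import com.google.api.services.bigquery.Bigquery;
import com.google.api.services.bigquery.model.Job;
import com.google.api.services.bigquery.model.JobConfiguration;
import com.google.api.services.bigquery.model.JobConfigurationQuery;

// the JobConfiguration wrapper is what tells BigQuery which kind of job this is
Job job = new Job().setConfiguration(
        new JobConfiguration().setQuery(
                new JobConfigurationQuery().setQuery("select colA from [dataset.table]")));
// "bigquery" is an authorized Bigquery instance; "projectId" is your project's ID
bigquery.jobs().insert(projectId, job).execute();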
More info from BigQuery's documentation:
jobs.insert documentation
job resource: note the "configuration" field and its "query" subfield

BizTalk 2009: SQL to WCF-SQL adapter migration; orchestration not receiving message?

From the topic:
I have a receive location that currently uses the SQL adapter (receive port) to call (poll?) a stored procedure. The stored proc returns a FOR XML result.
The receiver then activates an orchestration which takes the message and populates some variables from the message's data (expression shape).
Orchestration looks like:
LongScope[ AtomicScope[ Receive location -> Expression ] ][Error handling]
I tried a direct migration to WCF-SQL with XmlPolling as the InboundOperationType, but it throws a null exception during the variable assignment (I assume).
Additional detail:
I caught the message from the receiver by filtering on pipelineName using a send port. There is a slight difference between the messages retrieved by the SQL and WCF-SQL adapters:
sql:
<rootNode xmlns="namespace"><row data1="data1" data2="data2" /></rootNode>
wcf-sql:
<rootNode xmlns="namespace"><row data1="data1" data2="data2" xmlns=""/></rootNode>
This should change nothing, if this MSDN post is correct.
I also went into the Orchestration Debugger. The weird thing is, when using the SQL adapter, the message is still null, but the variables are assigned without problem. I also tried adding a send port directly after the receive port to dump the message. Nothing came out.
I would appreciate any info/suggestions/solutions.
Do tell me if I'm missing any info.
Irrelevant info:
As of this post the receive port doesn't even trigger anymore. I don't know why. Rebooting the PC.
Also, I suspect BizTalk gave me bruxism and led to me requiring 6 teeth fillings.
The difference between the XML from the SQL adapter and the WCF-SQL adapter has nothing to do with the MSDN post you are linking to.
In the 2nd XML (WCF-SQL adapter), the row node does not have a namespace: xmlns="" un-declares the default namespace on that element. In the 1st XML (SQL adapter), the row node inherits the default namespace "namespace" from its parent, rootNode. A schema or expression that expects row to be in that namespace will therefore not match the WCF-SQL message.
Regarding the Receive Port not triggering anymore:
Are you sure your Host Instance(s) are still running?
My solution:
I added "xmlns = 'namespace'" as a piece of data in the stored procedure's output.
The adapter recognized it and removed it (since it was the same as the parent node's namespace), allowing me to use the old schema.
Filler:
So I generated a schema using the output from the WCF-SQL adapter; however, I couldn't replace my old one with it, since the expression shape would not recognize its child elements (var = messageObject.childElement).
I created a map to map the new one back to the old one.
But that didn't work, because they both shared the same namespace, and BizTalk complained at runtime that it couldn't decide which schema to use.