How to set Neo4J config keys in gremlin-scala?

How to set Neo4J config keys in gremlin-scala? - apache

When running a Neo4J database server standalone (on Ubuntu 14.04), configuration options are set for the global installation in etc/neo4j/neo4j.conf or possibly $NEO4J_HOME/conf/neo4j.conf.
However, when instantiating a Neo4j database from Java or Scala using Apache's Neo4jGraph class (org.apache.tinkerpop.gremlin.neo4j.structure.Neo4jGraph), there is no global installation, and the constructor does not (as far as I can tell) look for any configuration files.
In particular, when running the test suite for my application, I end up with many simultaneous instances of Neo4jGraph, which ends up throwing a java.net.BindException: Address already in use because all of these instances are trying to communicate over a small range of ports for online backup, which I don't actually need. These channels are set with config options dbms.backup.address (default value: 127.0.0.1:6362-6372) and dbms.backup.enabled (default value: true).
My problem would be solved by setting dbms.backup.enabled to false, or expanding the port range.
Things that have not worked:
Creating /etc/neo4j/neo4j.conf containing the line dbms.backup.enabled=false.
Creating the same file in my project's src/main/resources directory.
Creating the same file in src/main/resources/neo4j.
Manually setting the configuration property inside the Scala code:
val db = new Neo4jGraph(dataDirectory)
db.configuration.addProperty("dbms.backup.enabled",false)
or
db.configuration.addProperty("neo4j.conf.dbms.backup.enabled",false)
or
db.configuration.addProperty("gremlin.neo4j.conf.dbms.backup.enabled",false)
How should I go about setting this property?

Neo4jGraph configuration through TinkerPop is accomplished by a pass-through of configuration keys. In TinkerPop 3.x, that would mean that all Neo4j keys prefixed with gremlin.neo4j.conf that are provided via Configuration object to Neo4jGraph.open() or GraphFactory.open() will be passed down directly to the Neo4j instance. You can see examples of this here in the TinkerPop documentation on high availability configuration.
In TinkerPop 2.x, the same approach was taken however the key prefix was instead blueprints.neo4j.conf.* as discussed here.

Manipulating db.configuration after the database connection had already been opened was definitely futile.
stephen mallette's answer was on the right track, but this particular configuration doesn't appear to pass through in the way his linked example does. There is a naming mismatch between the configuration keys expected in neo4j.conf and those expected in org.neo4j.backup.OnlineBackupKernelExtension. Instead of dbms.backup.address and dbms.backup.enabled, that class looks for config keys online_backup_server and online_backup_enabled.
I was not able to get these keys passed down to the underlying Neo4jGraphAPI instance correctly. What I had to do, instead, was the following:
import org.neo4j.tinkerpop.api.impl.Neo4jFactoryImpl
import scala.collection.JavaConverters._
val factory = new Neo4jFactoryImpl()
val config = Map(
"online_backup_enabled" -> "true",
"online_backup_server" -> "0.0.0.0:6350-6359"
).asJava
val db = Neo4jGraph.open(factory.newGraphDatabase(dataDirectory,config))
With this initialization, the instance correctly listened for backups on port 6350; changing "true" to "false" disabled backup listening.

Using Neo4j 3.0.0 the following disables port listening for me (Java code)
import org.apache.commons.configuration.BaseConfiguration;
import org.apache.tinkerpop.gremlin.neo4j.structure.Neo4jGraph;
BaseConfiguration conf = new BaseConfiguration();
conf.setProperty(Neo4jGraph.CONFIG_DIRECTORY, "/path/to/db");
conf.setProperty(Neo4jGraph.CONFIG_CONF + "." + "dbms.backup.enabled", "false");
graph = Neo4jGraph.open(config);

Related

How can I configure a specific serialization method to use only for Celery ping?

I have a celery app which has to be pinged by another app. This other app uses json to serialize celery task parameters, but my app has a custom serialization protocol. When the other app tries to ping my app (app.control.ping), it throws the following error:
"Celery ping failed: Refusing to deserialize untrusted content of type application/x-stjson (application/x-stjson)"
My whole codebase relies on this custom encoding, so I was wondering if there is a way to configure a json serialization but only for this ping, and to continue using the custom encoding for the other tasks.
These are the relevant celery settings:
accept_content = [CUSTOM_CELERY_SERIALIZATION, "json"]
result_accept_content = [CUSTOM_CELERY_SERIALIZATION, "json"]
result_serializer = CUSTOM_CELERY_SERIALIZATION
task_serializer = CUSTOM_CELERY_SERIALIZATION
event_serializer = CUSTOM_CELERY_SERIALIZATION
Changing any of the last 3 to [CUSTOM_CELERY_SERIALIZATION, "json"] causes the app to crash, so that's not an option.
Specs: celery=5.1.2
python: 3.8
OS: Linux docker container
Any help would be much appreciated.

Changing any of the last 3 to [CUSTOM_CELERY_SERIALIZATION, "json"] causes the app to crash, so that's not an option.
Because result_serializer, task_serializer, and event_serializer doesn't accept list but just a single str value, unlike e.g. accept_content
The list for e.g. accept_content is possible because if there are 2 items, we can check if the type of an incoming request is one of the 2 items. It isn't possible for e.g. result_serializer because if there were 2 items, then what should be chosen for the result of task-A? (thus the need for a single value)
This means that if you set result_serializer = 'json', this will have a global effect where all the results of all tasks (the returned value of the tasks which can be retrieved by calling e.g. response.get()) would be serialized/deserialized using the json-serializer. Thus, it might work for the ping but it might not for the tasks that can't be directly serialized/deserialized to/from JSON which really needs the custom stjson-serializer.
Currently with Celery==5.1.2, it seems that task-specific setting of result_serializer isn't possible, thus we can't set a single task to be encoded in 'json' and not 'stjson' without setting it globally for all, I assume the same case applies to ping.
Open request to add result_serializer option for tasks
A short discussion in another question
Not the best solution but a workaround is that instead of fixing it in your app's side, you may opt to just add support to serialize/deserialize the contents of type 'application/x-stjson' in the other app.
other_app/celery.py
import ast
from celery import Celery
from kombu.serialization import register
# This is just a possible implementation. Replace with the actual serializer/deserializer for stjson in your app.
def stjson_encoder(obj):
return str(obj)
def stjson_decoder(obj):
obj = ast.literal_eval(obj)
return obj
register(
'stjson',
stjson_encoder,
stjson_decoder,
content_type='application/x-stjson',
content_encoding='utf-8',
)
app = Celery('other_app')
app.conf.update(
accept_content=['json', 'stjson'],
)
You app remains to accept and respond stjson format, but now the other app is configured to be able to parse such format.

Spring Cloud Server serving multiple property files for the same application

Lets say I have applicationA that has 3 property files:
-> applicationA
- datasource.properties
- security.properties
- jms.properties
How do I move all properties to a spring cloud config server and keep them separate?
As of today I have configured the config server that will only read ONE property file as this seems to be the standard way. This file the config server picks up seems to be resolved by using the spring.application.name. In my case it will only read ONE file with this name:
-> applicationA.properties
How can I add the other files to be resolved by the config server?

Not possible in the way how you requested. Spring Cloud Config Server uses NativeEnvironmentRepository which is:
Simple implementation of {#link EnvironmentRepository} that uses a SpringApplication and configuration files located through the normal protocols. The resulting Environment is composed of property sources located using the application name as the config file stem (spring.config.name) and the environment name as a Spring profile.
See: https://github.com/spring-cloud/spring-cloud-config/blob/master/spring-cloud-config-server/src/main/java/org/springframework/cloud/config/server/environment/NativeEnvironmentRepository.java
So basically every time when client request properties from Config Server it creates ConfigurableApplicationContext using SpringApplicationBuilder. And it is launched with next configuration property:
String config = application;
if (!config.startsWith("application")) {
config = "application," + config;
}
list.add("--spring.config.name=" + config);
So possible names for property files will be only application.properties(or .yml) and config client application name that is requesting configuration - in your case applicationA.properties.
But you can "cheat".
In config server configuration you can add such property
spring:
cloud:
config:
server:
git:
search-paths: '{application}, {application}/your-subdirectory'
In this case Config Server will search for same property file names but in few directories and you can use subdirectories to keep your properties separate.
So with configuration above you will be able to load configuration from:
applicationA/application.properies
applicationA/your-subdirectory/application.properies

This can be done.
You need to create your own EnvironmentRepository, which loads your property files.
org.springframework.cloud.config.server.support.AbstractScmAccessor#getSearchLocations
searches for the property files to load :
for (String prof : profiles) {
for (String app : apps) {
String value = location;
if (app != null) {
value = value.replace("{application}", app);
}
if (prof != null) {
value = value.replace("{profile}", prof);
}
if (label != null) {
value = value.replace("{label}", label);
}
if (!value.endsWith("/")) {
value = value + "/";
}
output.addAll(matchingDirectories(dir, value));
}
}
There you could add custom code, that reads the required property files.
The above code matches exactly the behaviour described in the spring docs.
The NativeEnvironmentRepository does NOT access GIT/SCM in any way, so you should use
JGitEnvironmentRepository as base for your own implementation.

As #nmyk pointed out, NativeEnvironmentRepository boots a mini app in order to collect the properties by providing it with - sort of speak - "hardcoded" {appname}.* and application.* supported property file names. (#Stefan Isele - prefabware.com JGitEnvironmentRepository ends up using NativeEnvironmentRepository as well, for that matter).
I have issued a pull request for spring-cloud-config-server 1.4.x, that supports defining additional file names, through a spring.cloud.config.server.searchNames environment property, in the same sense one can do for a single springboot app, as defined in the Externalized Configuration.Application Property Files section of the documentation, using the spring.config.name enviroment property. I hope they review it soon, since it seems many have asked about this feature in stack overflow, and surely many many more search for it and read the currently advised solutions.
It worths mentioning that many ppl advise "abusing" the profile feature to achieve this, which is a bad practice, in my humble opinion, as I describe in this answer

Storing graph to gremlin server from in memory graph

I'm new to Graphs in general.
I'm attempting to store a TinkerPopGraph that I've created dynamically to gremlin server to be able to issue gremlin queries against it.
Consider the following code:
Graph inMemoryGraph;
inMemoryGraph = TinkerGraph.open();
inMemoryGraph.io(IoCore.graphml()).readGraph("test.graphml");
GraphTraversalSource g = inMemoryGraph.traversal();
List<Result> results =
client.submit("g.V().valueMap()").all().get();
I need some glue code. The gremlin query here is issued against the modern graph that is a default binding for the g variable. I would like to somehow store my inMemoryGraph so that when I run a gremlin query, its ran against my graph.

All graph configurations in Gremlin Server must occur through its YAML configuration file. Since you say you're connected to the modern graph I'll assume that you're using the default "modern" configuration file that ships with the standard distribution of Gremlin Server. If that is the case, then you should look at conf/gremlin-server-modern.yaml. You'll notice that this:
graphs: {
graph: conf/tinkergraph-empty.properties}
That creates a Graph reference in Gremlin Server called "graph" which you can reference from scripts. Next, note this second configuration:
org.apache.tinkerpop.gremlin.jsr223.ScriptFileGremlinPlugin: {files: [scripts/generate-modern.groovy]}}}
Specifically, pay attention to scripts/generate-modern.groovy which is a Gremlin Server initialization script. Opening that up you will see this:
// an init script that returns a Map allows explicit setting of global bindings.
def globals = [:]
// Generates the modern graph into an "empty" TinkerGraph via LifeCycleHook.
// Note that the name of the key in the "global" map is unimportant.
globals << [hook : [
onStartUp: { ctx ->
ctx.logger.info("Loading 'modern' graph data.")
org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerFactory.generateModern(graph)
}
] as LifeCycleHook]
// define the default TraversalSource to bind queries to - this one will be named "g".
globals << [g : graph.traversal()]
The comments should do most of the explaining. The connection here is that you need to inject your graph initialization code into this script and assign your inMemoryGraph.traversal() to g or whatever variable name you wish to use to identify it on the server. All of this is described in the Reference Documentation.
There is a way to make this work in a more dynamic fashion, but it involves extending Gremlin Server through its interfaces. You would have to build a custom GraphManager - the interface can be found here. Then you would set the graphManager key in the server configuration file with the fully qualified name of your instance.

Setting user credentials on aws instance using jclouds

I am trying to create an aws instance using jclouds 1.9.0 and then run a script on it (via ssh). I am following the example locate here but I am getting authentication failed errors when the client (java program) tries to connect at the instance. The AWS console show that instance is up and running.
The example tries to create a LoginCrendentials object
String user = System.getProperty("user.name");
String privateKey = Files.toString(new File(System.getProperty("user.home") + "/.ssh/id_rsa"), UTF_8);
return LoginCredentials.builder().user(user).privateKey(privateKey).build();
which is latter used from the ssh client
responses = compute.runScriptOnNodesMatching(
inGroup(groupName), // predicate used to select nodes
exec(command), // what you actually intend to run
overrideLoginCredentials(login) // use my local user & ssh key
.runAsRoot(false) // don't attempt to run as root (sudo)
.wrapInInitScript(false));
Some Login information are injected to the instance with following commands
Statement bootInstructions = AdminAccess.standard();
templateBuilder.options(runScript(bootInstructions));
Since I am on Windows machine the creation of LoginCrendentials 'fails' and thus I alter its code to
String user = "ec2-user";
String privateKey = "-----BEGIN RSA PRIVATE KEY-----.....-----END RSA PRIVATE KEY-----";
return LoginCredentials.builder().user(user).privateKey(privateKey).build();
I also to define the credentials while building the template as described in "EC2: In Depth" guide but with no luck.
An alternative is to build instance and inject the keypair as follows, but this implies that I need to have the ssh key stored in my AWS console, which is not currently the case and also breaks the functionality of running a script (via ssh) since I can not infer the NodeMetadata from a RunningInstance object.
RunInstancesOptions options = RunInstancesOptions.Builder.asType("t2.micro").withKeyName(keypair).withSecurityGroup(securityGroup).withUserData(script.getBytes());
Any suggestions??
Note: While I am currently testing this on aws, I want to keep the code as decoupled from the provider as possible.
Update 26/10/2015
Based on #Ignasi Barrera answer, I changed my implementation by adding .init(new MyAdminAccessConfiguration()) while creating the bootInstructions
Statement bootInstructions = AdminAccess.standard().init(new MyAdminAccessConfiguration());
templateBuilder.options(runScript(bootInstructions));
Where MyAdminAccessConfiguration is my own implementation of the AdminAccessConfiguration interface as #Ignasi Barrera described it.

I think the issue relies on the fact that the jclouds code runs on a Windows machine and jclouds makes some Unix assumptions by default.
There are two different things here: first, the AdminAccess.standard() is used to configure a user in the deployed node once it boots, and later the LoginCredentials object passed to the run script method is used to authenticate against the user that has been created with the previous statement.
The issue here is that the AdminAccess.standard() reads the "current user" information and assumes a Unix System. That user information is provided by this Default class, and in your case I'm pretty sure it will fallback to the catch block and return an auto-generated SSH key pair. That means, the AdminAccess.standard() is creating a user in the node with an auto-generated (random) SSH key, but the LoginCredentials you are building don't match those keys, thus the authentication failure.
Since the AdminAccess entity is immutable, the better and cleaner approach to fix this is to create your own implementation of the AdminAccessConfiguration interface. You can just copy the entire Default class and change the Unix specific bits to accommodate the SSH setup in your Windows machine. Once you have the implementation class, you can inject it by creating a Guice module and passing it to the list of modules provided when creating the jclouds context. Something like:
// Create the custom module to inject your implementation
Module windowsAdminAccess = new AbstractModule() {
#Override protected void configure() {
bind(AdminAccessConfiguration.class).to(YourCustomWindowsImpl.class).in(Scopes.SINGLETON);
}
};
// Provide the module in the module list when creating the context
ComputeServiceContext context = ContextBuilder.newBuilder("aws-ec2")
.credentials("api-key", "api-secret")
.modules(ImmutableSet.<Module> of(windowsAdminAccess, new SshjSshClientModule()))
.buildView(ComputeServiceContext.class);

DBD::Oracle, Cursors and Environment under mod_perl

Need some help, because I can't find any solution for my problems with DBD::Oracle.
So at first, this is the current situation:
We are running Apache2 with mod_perl 2.0.4 at our company
Apache web server was set up with a startup script which is setting some environment variables (LD_LIBRARY_PATH, ORACLE_HOME, NLS_LANG)
In httpd.conf there are also environment variables for LD_LIBRARY_PATH and ORACLE_HOME (via SetEnv)
We are generally using the perl module DBI with driver DBD::Oracle to connect to our main database
Before we create a new instance of DBI we are setting some perl env variables, too (%ENV). We are setting ORACLE_HOME and NLS_LANG.
So far, this works fine. But now we are extending our system and need to connect to a remote database. Again, we are using DBI and DBD::Oracle. But for now there are some new conditions:
New connection must run in parallel to the existing one
TNSNAMES.ORA for the new connection is placed at a different location (not at $ORACLE_HOME.'/network/admin')
New database contents are provided by stored procedures, which we are fetching with DBD::Oracle and cursors (like explained here: https://metacpan.org/pod/DBD::Oracle#Binding-Cursors)
The stored procedures are returning object types and collection types, containing attributes of oracle type DATE
To get these dates in a readable format, we set a new env variable $ENV{NLS_DATE_FORMAT}
To ensure the date format we additionally alter the session by alter session set nls_date_format ...
Okay, this works fine, too. But only if we make a new connection on the console. New TNS location is found by the script, connection could be established and fetching data from the procedures by cursor is also working. Alle DATE types are formatted as specified.
Now, if we try to make this connection at apache environment, it fails. At first the datasource name could not resolved by DBI/DBD::Oracle. I think this is because of our new TNSNAMES.ORA file or rather the location is not found by DBI/DBD::Oracle in Apache context (published by $ENV{TNS_ADMIN}). But I don't know why???
The second problem is (if I create a dirty workaround for our first one) that the date format, published by $ENV{NLS_DATE_FORMAT} is only working on first level of our cursor select.
BEGIN OPEN :cursor FOR SELECT * FROM TABLE(stored_procedure) END;
The example above returns collection types of object which are containing date attributes. In Apache context the format published by NLS_DATE_FORMAT is not recognized. If I use a simple form of the example like this
BEGIN OPEN :cursor FOR SELECT SYSDATE FROM TABLE(stored_procedure) END;
the result (a single date field) is formatted well. So I think subordinated structures were not formatted because $ENV{NLS_DATE_FORMAT} works only in console context and not in Apache context, too.
So there must be a problem with the perl environment variables (%ENV) running under Apache and mod_perl. Maybe a problem of mod_perl?
I am at my wit's end. Maybe anyone in the whole wide world has a solution ... and excuse my english :-) If you need some further explanations, I will try to define it more precisely.

If your problem is that changes to %ENV made while processing a request don't seem to be honoured, this is because mod_perl assumes you might be running multiple threads and doesn't actually change the process environment when you change %ENV, so external libraries (like the oracle client) or child processes don't see the change.
You can work around it by first using the prefork MPM, so there aren't any threading issues, and then making changes to the environment using Env::C instead of the %ENV hash.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas