Create a repository on a remote server with RDF4J - graphdb

I've been trying to create a new repository on a remote GraphDB server using RDF4J, but I'm having problems.
This runs, but the result does not seem to be correct:
HTTPRepositoryConfig implConfig = new HTTPRepositoryConfig(address);
RepositoryConfig repoConfig = new RepositoryConfig("test", "test", implConfig);
Model m = new TreeModel();
However, based on the info I get from "edit repository" in the workbench, the result doesn't look right. All the values are empty, except for id and title.
This fails
I tried to copy the settings from an existing repository that I created on the workbench, but that failed with:
org.eclipse.rdf4j.repository.config.RepositoryConfigException:
Unsupported repository type: owlim:MonitorRepository
The code for that attempt is inspired by the one found here, except that the config file is based on an existing repo, as explained above. I also tried the config file provided in the example, but that failed as well:
org.eclipse.rdf4j.repository.config.RepositoryConfigException:
Unsupported Sail type: graphdb:FreeSail
Anyone got any tips?
UPDATE
As Henriette Harmse correctly pointed out, I should have provided my code rather than simply linking to it. That way I might have discovered that I hadn't done a complete copy after all, but had changed the important first bits that she points out in her answer. Full code below:
String address = "serveradr";
RemoteRepositoryManager repositoryManager = new RemoteRepositoryManager( address);
repositoryManager.initialize();
// Instantiate a repository graph model
TreeModel graph = new TreeModel();
InputStream config = Rdf4jHelper.class.getResourceAsStream("/repoconf2.ttl");
RDFParser rdfParser = Rio.createParser(RDFFormat.TURTLE);
rdfParser.setRDFHandler(new StatementCollector(graph));
rdfParser.parse(config, RepositoryConfigSchema.NAMESPACE);
config.close();
// Retrieve the repository node as a resource
Resource repositoryNode = graph.filter(null, RDF.TYPE, RepositoryConfigSchema.REPOSITORY).subjects().iterator().next();
// Create a repository configuration object and add it to the repositoryManager
RepositoryConfig repositoryConfig = RepositoryConfig.create(graph, repositoryNode);
It fails on the last line.
ANSWERED: @HenrietteHarmse gives the correct method in her answer below. The error is caused by missing dependencies: instead of using RDF4J directly, I should have used the graphdb-free-runtime.

There are a number of issues here:
(1) RepositoryManager repositoryManager = new LocalRepositoryManager(new File(".")); will create a repository wherever your Java application is running from.
(2) Changing to new LocalRepositoryManager(new File("$GraphDBInstall/data/repositories")) will cause the repository to be created under the control of GraphDB (assuming you have a local GraphDB instance) only if GraphDB is not running. If you start GraphDB after running your program, you will be able to see the repository in GraphDB workbench.
(3) What you need to do is get the repository manager of the remote GraphDB, which can be done with RepositoryManager repositoryManager = RepositoryProvider.getRepositoryManager("http://IPAddressOfGraphDB:7200");.
(4) The way you have specified the config causes the RDF graph config to be lost. The correct way to specify it is:
RepositoryConfig repositoryConfig = RepositoryConfig.create(graph, repositoryNode);
repositoryManager.addRepositoryConfig(repositoryConfig);
(5) A minor issue is that GraphUtil.getUniqueSubject(...) has been deprecated, for which you can use something like the following:
Model model = graph.filter(null, RDF.TYPE, RepositoryConfigSchema.REPOSITORY);
Iterator<Statement> iterator = model.iterator();
if (!iterator.hasNext())
throw new RuntimeException("Oops, no <http://www.openrdf.org/config/repository#> subject found!");
Statement statement = iterator.next();
Resource repositoryNode = statement.getSubject();
EDIT on 20180408:
(5) Alternatively, you can use the compact option that @JeenBroekstra suggested in the comments:
Resource repositoryNode = Models.subject(
        graph.filter(null, RDF.TYPE, RepositoryConfigSchema.REPOSITORY))
    .orElseThrow(() -> new RuntimeException("Oops, no <http://www.openrdf.org/config/repository#> subject found!"));
EDIT on 20180409:
For convenience I have added the complete code example here.
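Since the linked example is not reproduced here, a minimal sketch along the lines of points (3) to (5) above (the server URL and the /repoconfig.ttl resource name are placeholders, and the graphdb-free-runtime dependency mentioned below still needs to be on the classpath):
import java.io.InputStream;
import org.eclipse.rdf4j.model.Model;
import org.eclipse.rdf4j.model.Resource;
import org.eclipse.rdf4j.model.impl.TreeModel;
import org.eclipse.rdf4j.model.util.Models;
import org.eclipse.rdf4j.model.vocabulary.RDF;
import org.eclipse.rdf4j.repository.config.RepositoryConfig;
import org.eclipse.rdf4j.repository.config.RepositoryConfigSchema;
import org.eclipse.rdf4j.repository.manager.RepositoryManager;
import org.eclipse.rdf4j.repository.manager.RepositoryProvider;
import org.eclipse.rdf4j.rio.RDFFormat;
import org.eclipse.rdf4j.rio.RDFParser;
import org.eclipse.rdf4j.rio.Rio;
import org.eclipse.rdf4j.rio.helpers.StatementCollector;

public class CreateRemoteRepository {
    public static void main(String[] args) throws Exception {
        // (3) Get the repository manager of the remote GraphDB instance
        RepositoryManager repositoryManager =
                RepositoryProvider.getRepositoryManager("http://IPAddressOfGraphDB:7200");

        // Parse the repository configuration (Turtle) into an RDF model
        Model graph = new TreeModel();
        try (InputStream config =
                CreateRemoteRepository.class.getResourceAsStream("/repoconfig.ttl")) {
            RDFParser rdfParser = Rio.createParser(RDFFormat.TURTLE);
            rdfParser.setRDFHandler(new StatementCollector(graph));
            rdfParser.parse(config, RepositoryConfigSchema.NAMESPACE);
        }

        // (5) Find the repository node in the parsed config
        Resource repositoryNode = Models.subject(
                graph.filter(null, RDF.TYPE, RepositoryConfigSchema.REPOSITORY))
            .orElseThrow(() -> new RuntimeException("No repository node found in config"));

        // (4) Create the repository config and register it with the remote manager
        RepositoryConfig repositoryConfig = RepositoryConfig.create(graph, repositoryNode);
        repositoryManager.addRepositoryConfig(repositoryConfig);
    }
}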
EDIT on 20180410:
So the actual culprit turned out to be an incorrect pom.xml. The correct version is as below:
<dependency>
    <groupId>com.ontotext.graphdb</groupId>
    <artifactId>graphdb-free-runtime</artifactId>
    <version>8.4.1</version>
</dependency>

I believe I just had the same issue. I used the example code from GraphDB Free for running with RDF4J as a remote service and ran into the same exception as you (Unsupported Sail type: graphdb:FreeSail). Henriette Harmse's answer does not directly address this issue, but one should follow the suggestions given there to avoid running into problems later. In addition, based on a look into the RDF4J code, you need the following dependency in your pom.xml file (assuming GraphDB 8.5):
<dependency>
    <groupId>com.ontotext.graphdb</groupId>
    <artifactId>graphdb-free-runtime</artifactId>
    <version>8.5.0</version>
</dependency>
This seems to be because there is some kind of service loading going on via META-INF, which I frankly am not familiar with. Maybe someone can provide more details in the comments. The requirement to add this dependency also seems to be absent from the instructions, so if this works for you, please let me know. Others who followed the same steps should then be able to resolve this issue as well.
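If it helps, RDF4J appears to discover Sail implementations through the standard java.util.ServiceLoader mechanism, i.e. provider files under META-INF/services inside the jars, which would explain why the runtime dependency matters. A rough diagnostic sketch, assuming RDF4J's SailFactory SPI, that prints the Sail types registered on the current classpath (graphdb:FreeSail should show up once the graphdb-free-runtime jar is present):
import java.util.ServiceLoader;
import org.eclipse.rdf4j.sail.config.SailFactory;

public class ListSailTypes {
    public static void main(String[] args) {
        // Every jar that ships a META-INF/services/org.eclipse.rdf4j.sail.config.SailFactory
        // file contributes one or more factories to this loop.
        for (SailFactory factory : ServiceLoader.load(SailFactory.class)) {
            System.out.println(factory.getSailType());
        }
    }
}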

Related

opendaylight: Genius install flow in switch

I am using OpenDaylight Carbon and am trying to work with the Genius wrapper. I want to install a flow in a switch based on a MAC address match for an incoming packet. The instruction I want to install is a "GOTO" instruction. I proceed as follows:
FlowEntityBuilder flowEntityBuilder = new FlowEntityBuilder();
flowEntityBuilder.setTableId(tableId)
.setDpnId(dpnId)
.setFlowId(FlowUtils.createFlowId().toString())
.setFlowName("gotoTable1");
MatchInfo matchInfo = new MatchEthernetSource(macAddress);
InstructionInfo instructionInfo = new InstructionGotoTable(tableId);
FlowEntity flowEntity = flowEntityBuilder.addInstructionInfoList(instructionInfo).addMatchInfoList(matchInfo).build();
mdsalApiManager.installFlow(dpnId,flowEntity);
My intention was to create a flow entity and install it using the IMDSalApiManager.installFlow method.
Here is the exception that I see:
java.lang.IllegalArgumentException: Node (urn:opendaylight:flow:inventory?revision=2013-08-19)ethernet-source is missing mandatory descendant /(urn:opendaylight:flow:inventory?revision=2013-08-19)address
Any help debugging this would be appreciated.
This is how you build a GOTO instruction with OpenDaylight:
GoToTableBuilder gttb = new GoToTableBuilder();
gttb.setTableId(tableGoto);
Instruction gotoInstruction = new InstructionBuilder()
        .setOrder(1)
        .setInstruction(new GoToTableCaseBuilder()
                .setGoToTable(gttb.build())
                .build())
        .build();
You can use this to adjust your code.
It turned out to be an issue on my end: the MacAddress supplied was null. I fixed this problem. However, I still do not see the flow in the switch.
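For completeness, a tiny illustrative guard in the spirit of that fix (MatchEthernetSource is the Genius match class used in the question; the check itself is just an assumption about where to fail fast):
// Fail early instead of letting a null MAC address surface later as the
// "missing mandatory descendant ... address" error shown above.
if (macAddress == null) {
    throw new IllegalArgumentException("macAddress must not be null when matching on ethernet-source");
}
MatchInfo matchInfo = new MatchEthernetSource(macAddress);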

How to set Neo4J config keys in gremlin-scala?

When running a Neo4J database server standalone (on Ubuntu 14.04), configuration options are set for the global installation in etc/neo4j/neo4j.conf or possibly $NEO4J_HOME/conf/neo4j.conf.
However, when instantiating a Neo4j database from Java or Scala using Apache's Neo4jGraph class (org.apache.tinkerpop.gremlin.neo4j.structure.Neo4jGraph), there is no global installation, and the constructor does not (as far as I can tell) look for any configuration files.
In particular, when running the test suite for my application, I end up with many simultaneous instances of Neo4jGraph, which ends up throwing a java.net.BindException: Address already in use because all of these instances are trying to communicate over a small range of ports for online backup, which I don't actually need. These channels are set with config options dbms.backup.address (default value: 127.0.0.1:6362-6372) and dbms.backup.enabled (default value: true).
My problem would be solved by setting dbms.backup.enabled to false, or expanding the port range.
Things that have not worked:
Creating /etc/neo4j/neo4j.conf containing the line dbms.backup.enabled=false.
Creating the same file in my project's src/main/resources directory.
Creating the same file in src/main/resources/neo4j.
Manually setting the configuration property inside the Scala code:
val db = new Neo4jGraph(dataDirectory)
db.configuration.addProperty("dbms.backup.enabled",false)
or
db.configuration.addProperty("neo4j.conf.dbms.backup.enabled",false)
or
db.configuration.addProperty("gremlin.neo4j.conf.dbms.backup.enabled",false)
How should I go about setting this property?
Neo4jGraph configuration through TinkerPop is accomplished by a pass-through of configuration keys. In TinkerPop 3.x, that would mean that all Neo4j keys prefixed with gremlin.neo4j.conf that are provided via Configuration object to Neo4jGraph.open() or GraphFactory.open() will be passed down directly to the Neo4j instance. You can see examples of this here in the TinkerPop documentation on high availability configuration.
In TinkerPop 2.x, the same approach was taken however the key prefix was instead blueprints.neo4j.conf.* as discussed here.
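To make the pass-through concrete, here is a minimal Java sketch of the TinkerPop 3.x approach described above (the directory is a placeholder; the last answer below shows the same idea using the Neo4jGraph constants, and as the follow-up notes, not every key was honored in the asker's particular setup):
import org.apache.commons.configuration.BaseConfiguration;
import org.apache.tinkerpop.gremlin.neo4j.structure.Neo4jGraph;

BaseConfiguration conf = new BaseConfiguration();
// Where the embedded Neo4j database lives
conf.setProperty("gremlin.neo4j.directory", "/path/to/db");
// Anything under gremlin.neo4j.conf.* is passed straight through to Neo4j
conf.setProperty("gremlin.neo4j.conf.dbms.backup.enabled", "false");
Neo4jGraph graph = Neo4jGraph.open(conf);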
Manipulating db.configuration after the database connection had already been opened was definitely futile.
stephen mallette's answer was on the right track, but this particular configuration doesn't appear to pass through in the way his linked example does. There is a naming mismatch between the configuration keys expected in neo4j.conf and those expected in org.neo4j.backup.OnlineBackupKernelExtension. Instead of dbms.backup.address and dbms.backup.enabled, that class looks for config keys online_backup_server and online_backup_enabled.
I was not able to get these keys passed down to the underlying Neo4jGraphAPI instance correctly. What I had to do, instead, was the following:
import org.neo4j.tinkerpop.api.impl.Neo4jFactoryImpl
import scala.collection.JavaConverters._
val factory = new Neo4jFactoryImpl()
val config = Map(
"online_backup_enabled" -> "true",
"online_backup_server" -> "0.0.0.0:6350-6359"
).asJava
val db = Neo4jGraph.open(factory.newGraphDatabase(dataDirectory,config))
With this initialization, the instance correctly listened for backups on port 6350; changing "true" to "false" disabled backup listening.
Using Neo4j 3.0.0, the following disables port listening for me (Java code):
import org.apache.commons.configuration.BaseConfiguration;
import org.apache.tinkerpop.gremlin.neo4j.structure.Neo4jGraph;
BaseConfiguration conf = new BaseConfiguration();
conf.setProperty(Neo4jGraph.CONFIG_DIRECTORY, "/path/to/db");
conf.setProperty(Neo4jGraph.CONFIG_CONF + "." + "dbms.backup.enabled", "false");
graph = Neo4jGraph.open(conf);

Hadoop: wrong classpath in map reduce job

I'm running a Cloudera cluster in 3 virtual machines and am trying to execute an HBase bulk load via a MapReduce job, but I always get the error:
error: Class org.apache.hadoop.hbase.mapreduce.HFileOutputFormat not found
So it seems that the map process doesn't find the class. I tried the following:
1) adding the hbase.jar to the HADOOP_CLASSPATH on every node
2) adding TableMapReduceUtil.addDependencyJars(job) / TableMapReduceUtil.addDependencyJars(myConf, HFileOutputFormat.class) to my source code
Nothing worked. I have absolutely no idea why the class is not found, because the jar/class is definitely available on the classpath.
If I take a look into the job.xml I see the following entry:
name=tmpjars value=file:/C:/Users/Thomas/.m2/repository/org/apache/zookeeper/zookeeper/3.4.5-cdh4.3.0/zookeeper-3.4.5-cdh4.3.0.jar,file:/C:/Users/Thomas/.m2/repository/org/apache/hbase/hbase/0.94.6-cdh4.3.0/hbase-0.94.6-cdh4.3.0.jar,file:/C:/Users/Thomas/.m2/repository/org/apache/hadoop/hadoop-core/2.0.0-mr1-cdh4.3.0/hadoop-core-2.0.0-mr1-cdh4.3.0.jar,file:/C:/Users/Thomas/.m2/repository/com/google/guava/guava/11.0.2/guava-11.0.2.jar,file:/C:/Users/Thomas/.m2/repository/com/google/protobuf/protobuf-java/2.4.0a/protobuf-java-2.4.0a.jar
This seems a little odd to me: these are my local jars on the Windows system. Should these be the HDFS jars instead? If so, how can I change the values for "tmpjars"?
Here is the java code I try to execute:
configuration = new Configuration(false);
configuration.set("mapred.job.tracker", "192.168.2.41:8021");
configuration.set("fs.defaultFS", "hdfs://192.168.2.41:8020/");
configuration.set("hbase.zookeeper.quorum", "192.168.2.41");
configuration.set("hbase.zookeeper.property.clientPort", "2181");
Job job = new Job(configuration, "HBase Bulk Import for "
+ tablename);
job.setJarByClass(HBaseKVMapper.class);
job.setMapperClass(HBaseKVMapper.class);
job.setMapOutputKeyClass(ImmutableBytesWritable.class);
job.setMapOutputValueClass(KeyValue.class);
job.setOutputFormatClass(HFileOutputFormat.class);
job.setPartitionerClass(TotalOrderPartitioner.class);
job.setInputFormatClass(TextInputFormat.class);
HFileOutputFormat.configureIncrementalLoad(job, hTable);
FileInputFormat.addInputPath(job, new Path("myfile1"));
FileOutputFormat.setOutputPath(job, new Path("myfile2"));
job.waitForCompletion(true);
LoadIncrementalHFiles loader = new LoadIncrementalHFiles(
configuration);
loader.doBulkLoad(new Path("myFile3"), hTable);
EDIT:
I experimented a bit more and it's totally strange. I added the following line to the Java code:
job.setJarByClass(HFileOutputFormat.class);
After I executed this, the error was gone, but another ClassNotFoundException appeared:
java.lang.RuntimeException: java.lang.ClassNotFoundException: Class mypackage.bulkLoad.HBaseKVMapper not found
HBaseKVMapper is the custom Mapper class I want to execute. So I tried to add it with job.setJarByClass(HBaseKVMapper.class), but that doesn't work since it's only a class file and not a jar. So I generated a jar file including HBaseKVMapper.class. After that, I executed it again and got the HFileOutputFormat.class not found exception again.
After debugging a little, I found out that the setJarByClass method only copies the local jar file to .staging/job_#number/job.jar on HDFS. So setJarByClass() will only work for one jar file, because calling it again with another jar overwrites job.jar.
While searching for the error I looked at the structure of the job staging directory, and inside the libjars directory I saw the relevant jar files.
So the hbase jar is inside the libjars directory, but the jobtracker doesn't use it when executing the job. Why?
I would try using Cloudera Manager (free version), as it takes care of these issues for you. Otherwise note the following:
Both your own classes and the HBase class HFileOutputFormat need to be available on the classpath locally and remotely.
Submitting the job
That means getting the classpath right locally for when your driver runs:
$ env HADOOP_CLASSPATH=$(hbase classpath) hadoop jar path/to/jar class....
On the server
In your hadoop-env.sh
export HADOOP_CLASSPATH=$(hbase classpath)
or use
TableMapReduceUtil.addDependencyJars
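For example, in the job driver (a sketch; HBaseKVMapper is the custom mapper from the question):
// Ship the HBase jars and the jar containing the custom mapper with the job
// via the distributed cache, instead of relying on the remote classpath.
TableMapReduceUtil.addDependencyJars(job);
TableMapReduceUtil.addDependencyJars(job.getConfiguration(),
        HFileOutputFormat.class, HBaseKVMapper.class);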
I found a "hacked" solution which worked for me, but I'm not happy with it because it's not really practical.
My "hacked" solution:
create one big jar with all necessary class files (I called it "big.jar") and add it to the local (Eclipse) classpath
add the line job.setJarByClass(MyMapperClass.class) ... MyMapperClass has to be in big.jar
When I execute this, big.jar is copied to the filesystem for every job. No more errors. The problem is that the jar is 80 MB in size and has to be copied every time.
If anyone knows a better way, I would be thankful if they could tell me how.
EDIT:
Now I'm trying to execute jobs with Apache Pig and have exactly the same problem. My hacked solution doesn't work in this case because Pig creates the jobs automatically. Here is the Pig error:
java.lang.ClassNotFoundException: Class org.apache.hadoop.hbase.mapreduce.TableSplit not found

repository created via RepositoryManager not behaving the same as repo created via workbench

I create a Sesame native Java store using the following code:
Create a native Java store:
// create a configuration for the SAIL stack
boolean persist = true;
String indexes = "spoc,posc,cspo";
SailImplConfig backendConfig = new NativeStoreConfig(indexes);
// stack an inferencer config on top of our backend-config
backendConfig = new ForwardChainingRDFSInferencerConfig(backendConfig);
// create a configuration for the repository implementation
RepositoryImplConfig repositoryTypeSpec = new SailRepositoryConfig(backendConfig);
RepositoryConfig repConfig = new RepositoryConfig(repositoryId, repositoryTypeSpec);
repConfig.setTitle(repositoryId);
manager.addRepositoryConfig(repConfig);
Repository repository = manager.getRepository(repositoryId);
Create an in-memory store:
// create a configuration for the SAIL stack
boolean persist = true;
SailImplConfig backendConfig = new MemoryStoreConfig(persist);
// stack an inferencer config on top of our backend-config
backendConfig = new ForwardChainingRDFSInferencerConfig(backendConfig);
// create a configuration for the repository implementation
RepositoryImplConfig repositoryTypeSpec = new SailRepositoryConfig(backendConfig);
RepositoryConfig repConfig = new RepositoryConfig(repositoryId, repositoryTypeSpec);
repConfig.setTitle(repositoryId);
manager.addRepositoryConfig(repConfig);
Repository repository = manager.getRepository(repositoryId);
When I store data in this repo and query it back, the results are not the same as the results returned from a repository created using the workbench: I get duplicate/multiple entries in my result set.
Same behavior for the in-memory store.
I also observed that my triples belong to a blank context, which is not the case in a repository created via the workbench.
What is wrong with my code above?
There is nothing wrong with your code, as far as I can see. If the store as created from the Workbench behaves differently, this most likely means that it's configured with a different SAIL stack.
The most likely candidate for the difference is this bit:
// stack an inferencer config on top of our backend-config
backendConfig = new ForwardChainingRDFSInferencerConfig(backendConfig);
You have configured your repository with a reasoner on top here. If the repository created via the workbench does not use a reasoner, you will get different results on identical queries (including, sometimes, apparent duplicate results).
If you consider this a problem, you can fix this in two ways. One is (of course) to simply not create your repository with a reasoner on top. The other is to disable reasoning for specific queries. In the Workbench, you can do this by disabling the "Include inferred statements" checkbox in the query screen. Programmatically, you can do this by using Query.setIncludeInferred(false) on your prepared query object. See the javadoc for details.
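A short sketch of the programmatic option, assuming the repository object created in the question (Sesame 2.x names from org.openrdf.query and org.openrdf.repository; the RDF4J equivalents are identical apart from the package names):
public static void queryWithoutInference(Repository repository) throws Exception {
    RepositoryConnection conn = repository.getConnection();
    try {
        TupleQuery query = conn.prepareTupleQuery(QueryLanguage.SPARQL,
                "SELECT ?s ?p ?o WHERE { ?s ?p ?o }");
        // Exclude statements added by the RDFS inferencer from the results
        query.setIncludeInferred(false);
        TupleQueryResult result = query.evaluate();
        try {
            while (result.hasNext()) {
                System.out.println(result.next());
            }
        } finally {
            result.close();
        }
    } finally {
        conn.close();
    }
}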

extra-paths not added to python path with zc.recipe.testrunner

I am trying to run tests by adding a version of tornado downloaded from github.com to the sys.path.
[tests]
recipe = zc.recipe.testrunner
extra-paths = ${buildout:directory}/parts/tornado/
defaults = ['--auto-color', '--auto-progress', '-v']
But when I run bin/tests I get the following error :
ImportError: No module named tornado
Am I not understanding how to use extra-paths ?
Martin
Have you tried looking into the generated bin/tests script to see whether it contains your path? That will tell you definitively whether your buildout.cfg is correct or not. Maybe the problem is elsewhere, because your configuration looks OK.
If you happen to regularly include various branches from git/mercurial or elsewhere in buildout, you might be interested in mr.developer. mr.developer can download the package and add it to develop =, so you won't need to set extra-paths in every section.