File IO in Cypher/Neo4j, or via other language integration - file-io

Is there file IO capability built into Cypher/Neo4j, or does one accomplish that from another language, such as by using Neo4j from Python or Java? An example motivation is producing system documentation from a graph of knowledge about a system.

The Neo4j ecosystem provides numerous built-in ways to do file IO, depending on which part of Neo4j you are using.
As a few examples:
The Cypher language has a LOAD CSV clause that supports importing data from CSV files (a short sketch of driving it from Java follows below).
The APOC plugin provides many useful procedures that import from or export to files.
The Neo4j Browser provides ways to export query results (as images or as data).
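As an illustration of the first point, here is a minimal, hedged sketch of issuing LOAD CSV from Java through the Neo4j Java driver; the connection URI, credentials, CSV file name, and column names are assumptions, and the CSV has to be readable from the server's import directory.

import org.neo4j.driver.AuthTokens;
import org.neo4j.driver.Driver;
import org.neo4j.driver.GraphDatabase;
import org.neo4j.driver.Session;

public class LoadCsvExample {
    public static void main(String[] args) {
        // Hypothetical connection details; adjust to your installation.
        try (Driver driver = GraphDatabase.driver("bolt://localhost:7687",
                AuthTokens.basic("neo4j", "password"));
             Session session = driver.session()) {
            // LOAD CSV resolves file:/// URLs against the server's import directory.
            session.run(
                "LOAD CSV WITH HEADERS FROM 'file:///components.csv' AS row " +
                "MERGE (c:Component {name: row.name}) " +
                "SET c.description = row.description");
        }
    }
}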

Related

Load testing tools for SpringBoot Web applications

Currently we use SoapUI / manual posting of XMLs, etc., for load testing our Spring Boot web applications.
Looking for any free load testing tools that others have been using and want to recommend?
Thanks for your help!
Which load testing tool (or testing tool in general) you will "like" more depends on a lot of personal and company needs. There are plenty of them; here is a short list of the ones I have used:
Blazemeter
Gatling
JMeter
I personally spent most of my load testing time in Gatling: first of all, it uses the Scala language and is quite easy to include in your Java project via Maven/Gradle; secondly, among other benefits, it has a direct JDBC connection facility that lets you keep your test data directly in a database. There are lots of other pros. But once more, this is strongly my own opinion and preference. E.g. if you are a big fan of XML you will most probably like JMeter, and Blazemeter is in general a kind of next level of JMeter.
I usually code the test using my current favorite language, Python in recent times. For example, in Python this would be some code around the requests library, and possibly some multi-process code - nothing too heavy or complex.
I find it more flexible to code myself, on average having more control over the load (there are pros and cons to using a prebuilt tool in this sense) and it usually integrates better with other code in my automation suite.
But the answer is somewhat context dependent: is there someone with the knowledge and resources to develop a tool? Do you have to document the results or make them comparable with other systems?
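For what it's worth, here is a rough sketch of that hand-rolled approach - in Java with java.net.http.HttpClient and a thread pool rather than Python/requests, since this is a Spring Boot shop; the URL, thread count, and request count are made-up placeholders.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SimpleLoadTest {
    public static void main(String[] args) throws InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        // Hypothetical endpoint under test.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8080/api/health"))
                .GET()
                .build();

        // Fire 1000 requests from 20 concurrent workers and log the latency of each.
        ExecutorService pool = Executors.newFixedThreadPool(20);
        for (int i = 0; i < 1000; i++) {
            pool.submit(() -> {
                try {
                    long start = System.nanoTime();
                    HttpResponse<String> response =
                            client.send(request, HttpResponse.BodyHandlers.ofString());
                    long elapsedMs = (System.nanoTime() - start) / 1_000_000;
                    System.out.println(response.statusCode() + " in " + elapsedMs + " ms");
                } catch (Exception e) {
                    System.err.println("Request failed: " + e.getMessage());
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.MINUTES);
    }
}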

Lucene index replication

In a load-balanced environment, I have a standalone Java thread (essentially a Spring Boot jar; for simplicity let's call it Project 1) that reads some metadata and updates Lucene indexes at a certain location.
Then there is an actual web application (Project 2) through which I want to query these indexes (which Project 1 has created). What are the available options:
Copy the index files periodically to the web application's Lucene location - which I believe would not be possible, as we may have to restart the application.
Maintain both projects as one package in a WAR, so that a single Lucene instance is available to both.
Any other replication strategy??
Any help on the above would be highly appreciated.
Best,
- Vaibhav
This really depends on your application's non-functional requirements and the architectural decisions driven by them.
But here are some thoughts:
Copying an index from folderA to folderB sounds like a pretty bad idea, especially if both applications have to run all the time.
You don't want a direct dependency between these two applications, so you will have to build some kind of Lucene component of your own that serves the API functionality you need.
I would recommend building a component with a proper API. This component uses Lucene as a library, and for cases where multiple systems or instances want to use it, I would suggest a proper NRT (Near Real Time) implementation of Lucene.
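To make the NRT suggestion a bit more concrete, here is a hedged sketch of such a component: one class owns the IndexWriter and exposes searches through a SearcherManager, so readers see recent writes without any folder copying. The index path and field names are assumptions, and constructor signatures vary slightly between Lucene versions.

import java.nio.file.Paths;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.SearcherManager;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.FSDirectory;

public class NrtIndexComponent {
    private final IndexWriter writer;
    private final SearcherManager searcherManager;

    public NrtIndexComponent(String indexPath) throws Exception {
        writer = new IndexWriter(FSDirectory.open(Paths.get(indexPath)),
                new IndexWriterConfig(new StandardAnalyzer()));
        // A SearcherManager opened on the writer hands out near-real-time readers.
        searcherManager = new SearcherManager(writer, null);
    }

    // Called by the indexing side (Project 1's role).
    public void addDocument(String id, String body) throws Exception {
        Document doc = new Document();
        doc.add(new TextField("id", id, Field.Store.YES));
        doc.add(new TextField("body", body, Field.Store.YES));
        writer.addDocument(doc);
        searcherManager.maybeRefresh(); // make the new document visible to searches
    }

    // Called by the querying side (Project 2's role).
    public int countMatches(String term) throws Exception {
        IndexSearcher searcher = searcherManager.acquire();
        try {
            return searcher.count(new TermQuery(new Term("body", term)));
        } finally {
            searcherManager.release(searcher);
        }
    }
}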

Loading big RDF file into Sesame

I'm trying to create a SPARQL endpoint based on Sesame. I installed Tomcat and PostgreSQL and deployed the Sesame web application. I created a repository based on the PostgreSQL RDF store. Now I need to load a big TTL file (540M triples; the file size is several GB) into the repository. Loading a big file over the Workbench is not a good solution - it would take several days. What is the best non-programming solution to load the data? Are there tools like a "console" to load data? For example, Virtuoso has the isql tool for bulk loading...
There is no ready-made bulk loading tool available for Sesame that I am aware of - though Sesame-compatible triplestore vendors do have such tooling available as part of their specific database. Programming a bulk-upload solution is not particularly hard, but we somehow never got around to including such a tool in the Sesame core distribution.
540M triples, by the way, is probably too large for any of Sesame's default stores - the Native Store only scales to about 150M, and loading such a large dataset into the memory store is just too unwieldy (even if you had the available RAM). So you probably need to look into using a Sesame-compatible database provided by a third party. There are many choices available, both commercial and free/open-source, see this overview on the Sesame website for a list of some suggestions.
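If you do end up programming it yourself, the general shape of such a bulk load with the Sesame API is roughly the following sketch; the server URL, repository id, file path, and base URI are assumptions.

import java.io.File;
import org.openrdf.repository.RepositoryConnection;
import org.openrdf.repository.http.HTTPRepository;
import org.openrdf.rio.RDFFormat;

public class SesameBulkLoad {
    public static void main(String[] args) throws Exception {
        // Hypothetical remote repository hosted by the deployed Sesame server.
        HTTPRepository repo = new HTTPRepository(
                "http://localhost:8080/openrdf-sesame", "my-repo");
        repo.initialize();
        RepositoryConnection conn = repo.getConnection();
        try {
            // A single add() streams the whole file; for a multi-gigabyte dump
            // you would likely split the input and load it chunk by chunk.
            conn.add(new File("/data/dump.ttl"), "http://example.org/", RDFFormat.TURTLE);
        } finally {
            conn.close();
            repo.shutDown();
        }
    }
}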

Importing data from SQL Server to Neo4J

First let me explain. My platform is mostly Windows and my data mostly resides in a relational database (SQL Server 2008). I primarily work with C# but occasionally work with Perl and JavaScript. I was looking to learn what a graph database could do for my data, but there seems to be a continual stream of tools and utilities that I need to install and learn. I am so busy learning the tools that I lose focus of what I really want, which is to work with a graph database.
It seems that Neo4j is relatively small and should be accessible enough to evaluate its features. I would like to import my data from an existing SQL database into Neo4J, with the relationships established initially via the foreign keys. The idea seems relatively straightforward, but it seems I need to learn Java, PHP, etc. not only to access Neo4J but also to access the existing database. I was wondering if anyone had recommendations, tools, or documentation that would accomplish this goal fairly simply. Do I go down the route of PHP? Java? What additional libraries/packages do I need? What tools are most useful? Thank you.
I think you want the Neo4J batch importer, which can be found on github. Using this, we were able to export 20 million nodes and relationships to import them into Neo4j.
I think you will have to write your own. Neo4jD looks interesting.
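If neither tool fits, rolling your own import is also an option. Below is a hedged sketch of the general shape in Java: read rows over JDBC and write them with parameterized Cypher through the Neo4j Java driver. The connection strings, table, and column names are assumptions.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import org.neo4j.driver.AuthTokens;
import org.neo4j.driver.Driver;
import org.neo4j.driver.GraphDatabase;
import org.neo4j.driver.Session;
import org.neo4j.driver.Values;

public class SqlServerToNeo4j {
    public static void main(String[] args) throws Exception {
        try (Connection sql = DriverManager.getConnection(
                     "jdbc:sqlserver://localhost;databaseName=MyDb", "sa", "secret");
             Driver neo4j = GraphDatabase.driver("bolt://localhost:7687",
                     AuthTokens.basic("neo4j", "password"));
             Session session = neo4j.session();
             Statement stmt = sql.createStatement();
             ResultSet rows = stmt.executeQuery("SELECT Id, Name FROM Customers")) {

            // One node per row; MERGE keeps the import idempotent.
            while (rows.next()) {
                session.run("MERGE (c:Customer {id: $id}) SET c.name = $name",
                        Values.parameters("id", rows.getLong("Id"),
                                          "name", rows.getString("Name")));
            }
            // A second pass over each foreign-key column can then create the
            // relationships, e.g. MATCH the two nodes by key and MERGE an edge.
        }
    }
}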

J2SE desktop applications - JPA database vs Collections?

I come from a web development background and haven't done anything significant in Java in quite some time.
I'm doing a small project, most of which involves some models with relationships and straightforward CRUD operations with those objects.
JPA/EclipseLink seems to suit the problem, but this is the kind of app that has File->Open and File->Save features, i.e. the data will be stored in files by the user, rather than persisting in the database between sessions.
The last time I worked on a project like this, I stored the objects in ArrayList objects, but having worked with MVC frameworks since, that seems a bit primitive. On the other hand, using JPA, opening a file would require loading a whole bunch of objects in the database, just for the convenience of not having to write code to manage the objects.
What's the typical approach for managing model data with Java SE desktop applications?
JPA was specifically built with databases in mind. This means that it typically operates on a big datastore with objects belonging to many different users.
In a file-based scenario, the files are quite often not that big, and all objects in a file belong to the same user and the same document. In that case I'd say that, for a binary format, the old Java serialization still works for temporary files.
For longer-term or interchangeable formats, XML is better suited. Using JAXB (included in the standard Java library) you can marshal and unmarshal Java objects to XML using an annotation-based approach that on the surface resembles JPA. In fact, I've worked with model objects that have both JPA and JAXB annotations so they can be stored in a database as well as in an XML file.
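As a minimal sketch of that JAXB approach (the Project class, its fields, and the file handling are assumptions, not part of the original answer):

import java.io.File;
import java.util.ArrayList;
import java.util.List;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.Marshaller;
import javax.xml.bind.Unmarshaller;
import javax.xml.bind.annotation.XmlElement;
import javax.xml.bind.annotation.XmlRootElement;

@XmlRootElement
public class Project {
    @XmlElement
    public String name;
    @XmlElement
    public List<String> tasks = new ArrayList<>();

    // File -> Save
    public static void save(Project project, File file) throws Exception {
        JAXBContext ctx = JAXBContext.newInstance(Project.class);
        Marshaller marshaller = ctx.createMarshaller();
        marshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, Boolean.TRUE);
        marshaller.marshal(project, file);
    }

    // File -> Open
    public static Project open(File file) throws Exception {
        JAXBContext ctx = JAXBContext.newInstance(Project.class);
        Unmarshaller unmarshaller = ctx.createUnmarshaller();
        return (Project) unmarshaller.unmarshal(file);
    }
}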
If, however, your desktop app uses files that represent potentially huge datasets for which you need paging and querying, then using JPA might still be the better option. There are various small embedded DBs available for Java, although I don't know how simple it is to let a data source point to a user-selected file. Normally a persistence unit in Java is mapped to a fixed data source, and you can't yet create persistence units on the fly.
Yet another option would be to use JDO, which is a mapping technology like JPA, but not an ORM. It's much more independent of the backend persistence technology that's being used and indeed maps to files as well.
Sorry that this is not a real answer but more a set of things to take into account; I hope it's helpful in some way.