What is AgensGraph? - definition

I have heard about AgensGraph, but I would like to know exactly what it is.
If anyone knows, please let me know.

The following description, "What is AgensGraph", comes from the AgensGraph documentation, which you can find at this link: http://bitnine.net/support/documents_backup/quick-start-guide-html/
Agens Graph is a new generation multi-model graph database for the modern complex data environment. Agens Graph is a multi-model database, which supports the relational and graph data models at the same time. It enables developers to integrate the legacy relational data model and the novel graph data model in one database. Agens Graph supports ANSI SQL and openCypher (http://www.opencypher.org). SQL queries and Cypher queries can be integrated into a single query in Agens Graph.
Agens Graph is based on the powerful PostgreSQL RDBMS, so it is very robust, fully featured and ready for enterprise use. It is optimized for handling complex connected graph data but, at the same time, it provides plenty of powerful database features essential to the enterprise database environment, like ACID transactions, multi-version concurrency control, stored procedures, triggers, constraints, sophisticated monitoring and a flexible data model (JSON). Moreover, Agens Graph can leverage the rich ecosystem of PostgreSQL and can be extended with many outstanding external modules, like PostGIS.

Related

How to mix an RDBMS with a Graph DB

I am developing a website using Django and PostgreSQL, which will likely hold a huge amount of data, as social networking sites do.
I need to use an RDBMS with SQL for tabular data, where query complexity is low, and also a graph DB with Cypher for large data, where query complexity is high.
Please let me know how to go about this. Also please let me know whether it is feasible.
EDIT: Clarification, as requested in the comments:
The database structure can be similar to that of a social network like Facebook. I've checked the FB Engineering page for their open graph. For a graph DB, I can find only Neo4j with proper ACID guarantees, though I would prefer an open source graph DB. I basically need the graph DB structure to summarize a huge volume of relationship data, such as friends, updates, and daily user-related updates, as individual relations. Horizontal scalability is important to me for future upgrades.
I intend to use PostgreSQL for the base informational data and push the relational data updates to the graph DB, much as Facebook uses both MySQL and open graph.
Based on your replies to my queries, I would first suggest looking at TitanDB. I believe it fulfills many of your requirements:
It is open source.
It scales horizontally.
In addition to meeting your requirements, it has existed for quite some time and many companies are using it in production. The only thing you would have to get used to is that it uses TinkerPop traversals, not Cypher queries. Also note that I believe Titan is not ACID for most backends. This is a result of it being horizontally scalable.
If you would like a more structured (but significantly less mature) approach to graph DBs, then you can look at the stack that some colleagues and I are working on, MindmapsDB, which sits on top of Titan but uses a more "SQL-like" query language.
OrientDB Gremlin is also a very good option but lacks the maturity and support of Titan.
There are many other graph vendors out there, such as DSE Graph, IBM Graph, etc., but the ones I have listed above are the open source ones I have worked with.

Graph database implemented using key value store

I have a requirement for a graph database that needs to be backed up and potentially accessed at a lower level of abstraction. It also must be distributed for the sake of load balancing (single-master replication will do).
I know that it is possible to implement a graph database using a self-referencing key-value store. The Git object database is an example of this pattern. One of the frustrating things I find about most graph databases is that they do not "disclose" their underlying persistence layer in their public API.
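As a rough illustration of the self-referencing key-value pattern, here is a minimal Python sketch. It assumes Git-style content addressing (the key is a hash of the value, and values refer to other nodes by key). The `store` dict, the `put_node`, `get_node` and `walk` helpers, and the JSON encoding are all hypothetical names chosen for this example, not the API of any particular database.

```python
import hashlib
import json

# A plain dict stands in for the pluggable key-value store (hypothetical).
store = {}

def put_node(data, children=()):
    """Store a node whose value references other nodes by key; return its key."""
    value = json.dumps({"data": data, "children": list(children)}, sort_keys=True)
    key = hashlib.sha1(value.encode("utf-8")).hexdigest()
    store[key] = value
    return key

def get_node(key):
    """Read a node back directly from the underlying store."""
    return json.loads(store[key])

def walk(key):
    """Depth-first traversal yielding the data of every reachable node once."""
    seen = set()
    stack = [key]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        node = get_node(current)
        yield node["data"]
        # Reverse so children are visited in their stored order.
        stack.extend(reversed(node["children"]))

leaf_a = put_node("a")
leaf_b = put_node("b")
root = put_node("root", [leaf_a, leaf_b])
reachable = list(walk(root))
```

Because a key is derived from the content it points to, this particular scheme can only express acyclic graphs; a general graph would need keys that are assigned independently of the value.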
Do any replicated graph databases exist that allow an underlying key-value store to be "plugged in" or accessed directly?
In addition to Gremlin/TinkerPop, mentioned by @amirouche above, I'm aware of two solutions:
Redis, complemented by its Graph module, matches your description.
Cayley could also be a solution, as it provides graph features over various SQL and NoSQL backends, some of them supporting distributed mode (PostgreSQL, MySQL, MongoDB, CockroachDB).

How is a graph database different to a graph represented in a relational database?

I can represent a graph trivially in a relational database with two tables: vertex and edge. Richer structure like "properties" and "labels" (in Neo4j terminology) can be represented as more tables. Have I misunderstood, or does a graph database like Neo4j allow me to represent anything that is not easily representable relationally?
I can query this graph using SQL, with recursive subqueries if necessary, and with multiple separate queries in a transaction if necessary. Have I misunderstood, or does a graph query language like Cypher provide greater expressivity than SQL?
The relational model of a graph is stored and queried efficiently, AFAIK. Does a graph database structure its storage, or optimize its queries, in some way that provides performance characteristics that cannot be gained from a relational database?
My relational database provides ACID guarantees, and allows me to write fairly expressive constraints on my graph data (and even more constraints if I break out the single vertex table into a properly normalized schema). Have I misunderstood, or does a graph database provide some guarantees or verify some kind of correctness properties that are not available in my relational database?
I am struggling to see how a graph database such as Neo4j is anything but a subset of the relational model. (Apologies for using Neo4j as representative of all graph databases here; it's the only one I've looked at.)
In short: Is graph database ⊆ relational database?
Is One a Subset of the Other?
Definitely not; both are ultimately modeled on the mathematical concepts of relations or graphs. Both models being super-general, there is basically no information content that you can't represent using either one. This means that while they might differ in many syntactic sugar ways, and in the way they encourage you to model/think of data (just like programming languages differ), they both have the same "expressive power".
What you describe in your question is one way of modeling a graph (vertex and edge tables). That implementation of a graph is a subset of what relational can express. Similarly, I could mock up tables and rows using a graph database, but I would have chosen a particular implementation - this wouldn't demonstrate that relational data is a subset of graph data.
So the first insight is that they have roughly equal expressive power. You can model anything in either. So the real question you should be asking is why would you choose one over the other?
Why Would You Choose One Over the Other?
All databases exist to facilitate data access. Simply put, you store it so that you can get at the data. But exactly how do you need to get at the data? There are many different access patterns. The design space for databases in general is enormous. Any time a database makes a certain decision, that tends to automatically make it better at some things, worse at others. For example, when you create an index in a relational database, you've just sped up reads -- but you've degraded the performance of writes, because the index has to be maintained.
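The read side of that index trade-off can be seen in a small sketch using Python's built-in sqlite3 module. The `person` table, its columns, and the sample rows are made up for illustration. It only shows how the query plan changes once an index exists; the extra write cost of maintaining the index is not measured here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO person (name, city) VALUES (?, ?)",
    [(f"user{i}", f"city{i % 10}") for i in range(1000)],
)

def plan(sql):
    """Return the human-readable detail column of SQLite's query plan."""
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

lookup = "SELECT * FROM person WHERE name = 'user42'"
plan_before = plan(lookup)  # without an index: a scan of the whole table
conn.execute("CREATE INDEX idx_person_name ON person (name)")
plan_after = plan(lookup)   # with the index: a search through idx_person_name
```

Every later INSERT, UPDATE or DELETE on `person` now also has to update `idx_person_name`, which is the write-side cost described above.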
So, when approaching the question, "Graph or Relational?", you should first figure out what your data looks like, and what your data access patterns look like. If you knew what those things were, then you could evaluate a bunch of databases, see the choices they've made, and pick the one that's a good fit for what you need. And then if a DBMS made a choice that would make certain access patterns difficult, buggy, or slow -- you could avoid that DBMS for that data set.
It's (Partly) About Data Access Patterns
Graph databases tend to be better than relational when the data being stored is a graph, when the data access pattern involves a lot of graph traversal, or both. (See this other answer I wrote for a more in-depth discussion of why this is). That link there also provides the answer to your specific question: "Does a graph database structure its storage, or optimize its queries, in some way that provides performance characteristics that cannot be gained from a relational database?"
You say: "I can query this graph using SQL, with recursive subqueries if necessary, and with multiple separate queries in a transaction if necessary." Technically this is true, but let's take an example to see why relational might not be good enough. Say I have a graph (in an RDBMS, a table of nodes and a table of edges, with a join key between them). Let's say I pick out one node, and I want to identify everything that is between 6 and 8 hops away from that node. Here's the Cypher to do that:
match (myChosenNode {id: 'foo'})-[r:relationshipType*6..8]->(y) return y;
I really want to see you write that up as SQL. It's possible, but it's hard and complicated. And it will also perform like a dog, because of the sheer quantity of joining you'll be doing on non-trivial quantities of data.
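For comparison, here is a rough sketch of what a relational version might look like, written as a recursive CTE and run through Python's built-in sqlite3 module. The `node` and `edge` tables, their columns, and the chain-shaped sample data are assumptions made for illustration. The query is only an approximation of the Cypher above: it bounds the recursion at depth 8 and returns distinct nodes, and it does not reproduce Cypher's rule that a relationship is not traversed twice within one path.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE node (id TEXT PRIMARY KEY);
    CREATE TABLE edge (
        src  TEXT NOT NULL REFERENCES node(id),
        dst  TEXT NOT NULL REFERENCES node(id),
        type TEXT NOT NULL
    );
""")

# Sample data (hypothetical): a simple chain foo -> n1 -> n2 -> ... -> n10.
names = ["foo"] + [f"n{i}" for i in range(1, 11)]
conn.executemany("INSERT INTO node (id) VALUES (?)", [(n,) for n in names])
conn.executemany(
    "INSERT INTO edge (src, dst, type) VALUES (?, ?, 'relationshipType')",
    list(zip(names, names[1:])),
)

query = """
    WITH RECURSIVE hops(id, depth) AS (
        SELECT ?, 0                      -- start at the chosen node
        UNION ALL
        SELECT e.dst, h.depth + 1        -- follow one more edge per step
        FROM hops AS h
        JOIN edge AS e ON e.src = h.id
        WHERE e.type = 'relationshipType' AND h.depth < 8
    )
    SELECT DISTINCT id FROM hops WHERE depth BETWEEN 6 AND 8 ORDER BY id
"""
result = [row[0] for row in conn.execute(query, ("foo",))]
```

On this tiny chain the query is cheap, but every recursion step is another join against `edge`, and on a densely connected graph the intermediate `hops` rows grow with the number of paths rather than the number of nodes.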
ACID
OK, now on the ACID guarantees: Neo4j provides transactions with ACID guarantees. The answer will be different for different graph databases though, particularly the ones implemented on top of Hadoop/HBase. YMMV there, so check the fine print with each database.
It is true that there are a number of features of RDBMS that you typically won't find in graph databases, examples being triggers and certain kinds of constraints. As a long-time RDBMS nerd myself, I'm not so happy about those things being missing; I think they are valuable.
Summary
What this mostly boils down to, for me and many other engineers I work with, is:
What is your data?
What are your access patterns?
If your data is a graph, or your access patterns involve a lot of graph traversal, you should probably use a graph DB. If your data is more tabular, or your access patterns are more oriented around bulk scans, then you should use an RDBMS. At the end of the day, they're two different tools with different niches. If you use them in their area of strength, you'll be happy. If you use an RDBMS to model a graph just "because you can", you'll suffer. If you use a graph database to do a lot of bulk scans of every node in every graph, you'll suffer. Like most of tech, it's just about using the right tool for the job.

Is Neo4j's Cypher query language open-source?

What is the status of Neo4j's query language, Cypher? I really like it, but I would like to avoid Neo4j lock-in. Are there other Cypher implementations, as there are for Gremlin?
Regards
Cypher is totally OSS, see https://github.com/neo4j/community/tree/master/cypher . Right now there is one implementation, but potentially there can be more. It's just too early in the evolution to make it a standard, we are still heavily experimenting with it.
Check out Pixy, a declarative graph query language that works on any Blueprints-compatible graph database. It is built on Gremlin/Pipes from the TinkerPop software stack.
Pixy enables complex pattern matching and logic programming on graph databases by translating PROLOG-style rules and goals to Gremlin pipelines that represent graph traversal operations. It has some additional advantages over Cypher, other than avoiding vendor lock-in.
Pixy is available under the Apache 2.0 license.
openCypher has been implemented by many databases. According to their site these are some of them:
Agens Graph: A multi-model database
Amazon Neptune
AnzoGraph: A native massively parallel (MPP) graph analytical database
ArcadeDB
CAPS: Cypher for Apache Spark
Cypher for Gremlin
Katana Graph
Memgraph: An in-memory, transactional graph database
Neo4j: A native, transactional property graph database
RedisGraph: A graph module for Redis
SAP HANA Graph

Graph Database: TinkerPop/Blueprints vs W3C Linked data

Looking for an infrastructure for network analysis for heterogeneous (multiple node types (multi-mode), multiple edge types (multi-relation) and multiple descriptive features (multi-featured)) networks, I've noticed that there are two standard stacks in the Graph Database world:
On one hand we have the TinkerPop/Blueprints property graph model. It is supported by Neo4j, OrientDB GraphDB, Dex, Titan, InfiniteGraph, etc.
The TinkerPop stack includes the Blueprints property graph model interface, the Gremlin graph traversal language, and the Furnace graph algorithms package.
On the other hand we have W3C's Linked Data technology stack, which is supported by AllegroGraph, 4store, Oracle Database Semantic Technologies, OWLIM, SYSTAP Bigdata, etc.
Semantic data is represented using RDF/RDFS/OWL, and can be queried using SPARQL. On top of that, it offers rules and reasoning capabilities.
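To make the difference between the two data models a little more concrete, here is a minimal Python sketch of the same tiny heterogeneous network held both ways: as a property graph (nodes and edges that carry their own key-value properties) and as a flat set of subject-predicate-object triples. All names and data are made up for illustration, and the `match` helper is only a toy stand-in for a single SPARQL triple pattern, not a real SPARQL engine or any library's API.

```python
# Property graph: nodes and edges both carry a label and properties.
nodes = {
    "alice": {"label": "Person", "age": 30},
    "acme": {"label": "Company", "founded": 1999},
}
edges = [
    {"src": "alice", "dst": "acme", "label": "WORKS_AT", "since": 2015},
]

# RDF-style: everything, including types and attributes, is a triple.
triples = {
    ("alice", "rdf:type", "Person"),
    ("alice", "age", 30),
    ("acme", "rdf:type", "Company"),
    ("acme", "founded", 1999),
    ("alice", "worksAt", "acme"),
}

def match(pattern, data):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        (t for t in data if all(p is None or p == v for p, v in zip(pattern, t))),
        key=repr,
    )

people = [s for s, _, _ in match((None, "rdf:type", "Person"), triples)]
employers = [o for _, _, o in match(("alice", "worksAt", None), triples)]
edge_years = [e["since"] for e in edges if e["label"] == "WORKS_AT"]
```

Note that the edge property `since` has no direct counterpart in the triple set: a plain triple cannot carry attributes of its own, so in RDF this kind of edge annotation typically needs reification or a similar workaround.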
Now, suppose that I want to represent heterogeneous data in a graph database, and analyse such data (statistics, relations discovery, structure, evolution, etc.) (I know these terms are wide and vague) - What are the relative strengths of each model for various types of network analysis tasks? Do these two models complement each other?
A couple of things: your exemplars of linked data stacks are all triple stores. You would start building a linked data application by first getting your triple store set up, but calling a database a linked data stack is incorrect imo. That's also an incomplete list of triple stores; there are also Sesame, Jena, Mulgara, and Stardog. Sesame and Jena kind of pull double duty: they're the two de facto standard Java APIs for the semantic web, but both provide triple stores that come bundled with the APIs. I also know that both Cray and IBM are working on triple stores, but I don't know much about either at this point. I do know that Stardog works well with the TinkerPop stack and that it's basically a matter of dropping it in and starting to write Gremlin queries against the RDF.
I think the strengths of RDF/OWL are that 1) you get a real query language, 2) they're W3C standards, and 3) you get reasoning, if the triple store supports it, for free (more or less -- you still have to write an ontology).
With RDF/OWL/SPARQL being standards, it is quite easy to pick up and move to a new triple store with a different feature set should you need to: your data is already in a common format that everyone understands, and any application logic encoded as queries is completely portable. And in most cases, you'd be writing against either the Sesame or Jena APIs, or working over the SPARQL protocol, so you might only need to change your config/init. I think that's a big win in the early prototyping phases.
I also think that RDF/OWL, especially combined with reasoning and the kinds of complex SPARQL queries that you can create with the new SPARQL 1.1, really lend themselves to building complicated analytic applications. Also, I think that the impression that most people have that RDF triple stores don't scale is no longer correct. Most triple stores at this point easily scale into the billions of triples and have very competitive throughput numbers as well.
So based on what I think you might be doing, I think semweb might be a better bet for you. I did a similar project a few years back using RDF & RDFS for the backend fronted by a simple Pylons based webapp and was very happy with the results.