OrientDB select Vertex, Edge pairs from query - sql

In an OrientDB graph database, I'm trying to get some information about Vertex, Edge pairs.
For example, consider the following case:
V1 ---E1---> V2
   ---E2---> V3 --E3--> V2
I would like to get the following 3 rows as the result:
V1, E1
V1, E2
V3, E3
I've tried the following:
select label, flatten(out.label) from V
select label from (select flatten(out) from V)
select label, flatten(out) from V
select flatten(out) from V
select $current, label from (traverse out from V while $depth <= 1) where $depth = 1
But none of these queries returns what I want. How can I return Vertex, Edge pairs?

What you are trying to do is actually extremely simple with OrientDB; it seems you are overthinking the issue.
Let's create your example:
V1 ---E1---> V2
   ---E2---> V3 --E3--> V2
In OrientDB, you would do this as follows:
/* Create nodes */
CREATE CLASS Node EXTENDS V
CREATE PROPERTY Node.name STRING (MANDATORY TRUE)
CREATE VERTEX Node SET name = 'V1'
CREATE VERTEX Node SET name = 'V2'
CREATE VERTEX Node SET name = 'V3'
/* Create edges */
CREATE CLASS Link EXTENDS E
CREATE PROPERTY Link.name STRING (MANDATORY TRUE)
CREATE EDGE Link
FROM (SELECT FROM Node WHERE name = 'V1')
TO (SELECT FROM Node WHERE name = 'V2')
SET name = 'E1'
CREATE EDGE Link
FROM (SELECT FROM Node WHERE name = 'V1')
TO (SELECT FROM Node WHERE name = 'V3')
SET name = 'E2'
CREATE EDGE Link
FROM (SELECT FROM Node WHERE name = 'V3')
TO (SELECT FROM Node WHERE name = 'V2')
SET name = 'E3'
This creates the graph from your example.
Now a little explanation of how to query in OrientDB. Let's say you load one vertex: SELECT * FROM Node WHERE name = 'V1'. Then, to load other information, you use:
To load all vertices at the other end of incoming edges (skipping the edges): in()
To load all vertices at the other end of incoming edges of class Link (skipping the edges): in('Link')
To load all incoming edges: inE()
To load all incoming edges of class Link: inE('Link')
To load all vertices at the other end of outgoing edges (skipping the edges): out()
To load all vertices at the other end of outgoing edges of class Link (skipping the edges): out('Link')
To load all outgoing edges: outE()
To load all outgoing edges of class Link: outE('Link')
So in your case, you want to load all the vertices and their outgoing edges, so we do:
SELECT name, outE('Link') FROM Node
This loads the name of each vertex and pointers to its outgoing edges.
If you would like to have a list of the names of the outgoing edges, we simply do:
SELECT name, outE('Link').name FROM Node
This gives the names of the outgoing edges for each vertex, which is what you asked for in your question. As you can see, this is extremely simple to do in OrientDB; you just need to realize that OrientDB is smarter than you think :)
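For reference, the two result shapes can be modelled outside the database. The following is a minimal plain-Python sketch of the example graph, not OrientDB code; the `edges` list and the helper names are made up for illustration. It assumes a per-vertex query yields one row per vertex with a list of edge names, and contrasts that with one row per (vertex, edge) pair, which is the form written out in the question.

```python
# Plain-Python model of the example graph; this is not OrientDB code.
# Each edge is (edge_name, source_vertex, target_vertex).
edges = [
    ("E1", "V1", "V2"),
    ("E2", "V1", "V3"),
    ("E3", "V3", "V2"),
]
vertices = ["V1", "V2", "V3"]


def out_edge_names(vertex):
    """Names of the edges leaving `vertex`."""
    return [name for name, source, _ in edges if source == vertex]


def per_vertex_rows():
    """One row per vertex: (vertex, [outgoing edge names])."""
    return [(v, out_edge_names(v)) for v in vertices]


def pair_rows():
    """One row per (vertex, edge) pair, i.e. one row per edge."""
    return [(source, name) for name, source, _ in edges]


print(per_vertex_rows())
print(pair_rows())
```

The second form suggests that iterating over the edges, rather than the vertices, is the natural way to get exactly one row per pair.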

The FLATTEN operator has to be used alone, because it takes a field and makes it the result. I don't understand what you want to do. Can you write the expected output, please?

The Cypher syntax, as used in Neo4j, finally rescued me.
start n=node(*) MATCH (n)-[left]->(n2)<-[right]-(n3) WHERE n.type? ='myType' AND left.line > right.line - 1 AND left.line < right.line + 1 RETURN n, left, n2, right, n3
The node n is the pivot element, on which a filter can be applied, just as on every other step within the path. For me it was important to select a further step depending on another part of the path.
With OrientDB I couldn't find a way to relate the properties to each other easily.

Related

Filter neo4j result, return distinct combination of node IDs

I have a graph with Airport nodes and Flight relationships, and I want to find triangles from a specific node where the edges are all within 10% length of each other.
MATCH path = (first:Airport{ID: 12953})-[f1:Flight]->
(second:Airport)-[f2:Flight]->
(third:Airport)-[f3:Flight]->
(last:Airport{ID: 12953})
WHERE second.ID <>first.ID AND
third.ID <>first.ID AND
f1.Distance<=(1.1*f2.Distance) AND
f1.Distance<=(1.1*f3.Distance) AND
f2.Distance<=(1.1*f1.Distance) AND
f2.Distance<=(1.1*f3.Distance) AND
f3.Distance<=(1.1*f1.Distance) AND
f3.Distance<=(1.1*f2.Distance)
WITH (first.ID, second.ID, third.ID) as triplet
return count(DISTINCT triplet)
I only want to return a set of nodes once (no matter how many different flights exist between them), but the WITH line doesn't work. Basically, what I want to create is a new kind of "object" variable that has the three IDs as its properties, and then run DISTINCT on that. Is that possible in Neo4j? If not, is there some workaround?
You can use the APOC function apoc.coll.sort to sort each list of 3 IDs, so that the DISTINCT option will properly treat lists with the same IDs as being the same.
Here is a simplified query that uses the APOC function:
MATCH path = (first:Airport{ID: 12953})-[f1:Flight]->
(second:Airport)-[f2:Flight]->
(third:Airport)-[f3:Flight]->
(first)
WHERE second <> first <> third AND
f2.Distance<=(1.1*f1.Distance)>=f3.Distance AND
f1.Distance<=(1.1*f2.Distance)>=f3.Distance AND
f1.Distance<=(1.1*f3.Distance)>=f2.Distance
RETURN COUNT(DISTINCT apoc.coll.sort([first.ID, second.ID, third.ID]))
NOTE: the second <> first test may not be necessary since there should not be any flights (if a "flight" is the same as a "leg") that fly from an airport back to itself.
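Why sorting helps can be seen in a small plain-Python sketch (an illustration only, not Cypher or APOC; the function name and the IDs other than 12953 are made up). Two lists holding the same IDs in a different order compare as different values, so a distinct count treats them as two results unless each list is sorted first.

```python
def distinct_id_sets(triplets):
    """Deduplicate ID triplets regardless of the order of the IDs."""
    return {tuple(sorted(t)) for t in triplets}


# Hypothetical results: the first two rows visit the same airports
# in a different order.
triplets = [(12953, 10, 20), (12953, 20, 10), (12953, 10, 30)]

print(len(set(triplets)))               # order-sensitive count: 3
print(len(distinct_id_sets(triplets)))  # order-insensitive count: 2
```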
You can return an object with keys or an array. For example:
UNWIND range(1, 10000) AS i
WITH
{
id1: toInteger(rand()*3),
id2: toInteger(rand()*3),
id3: toInteger(rand()*3)
} AS triplet
RETURN DISTINCT triplet
or
UNWIND range(1, 10000) AS i
WITH
[ toInteger(rand()*3), toInteger(rand()*3), toInteger(rand()*3) ] AS triplet
RETURN DISTINCT triplet
Update. You can simplify your query by reusing a variable in the query, specifying the length of the path and using the list functions:
MATCH ps = (A:Airport {ID: 12953})-[:Flight*3]->(A)
WITH ps
WHERE reduce(
total = 0,
rel1 IN relationships(ps) |
total + reduce(
acc = 0,
rel2 IN relationships(ps) |
acc + CASE WHEN rel1.Distance <= 1.1 * rel2.Distance THEN 0 ELSE 1 END
)) = 0
RETURN count(DISTINCT [n IN nodes(ps) | n.ID][0..3])
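The nested reduce above counts the ordered pairs of relationships that violate the 10% condition and keeps the path only if that count is zero. A minimal Python sketch of the same check follows (the function names and sample distances are illustrative assumptions). It also shows that, for a positive factor, the pairwise test is equivalent to comparing the largest distance with the smallest one.

```python
def violations(distances, factor=1.1):
    """Count ordered pairs (a, b) where a <= factor * b does not hold."""
    return sum(
        0 if a <= factor * b else 1
        for a in distances
        for b in distances
    )


def all_within(distances, factor=1.1):
    """Equivalent shortcut: the largest value may exceed the smallest
    by at most the given factor."""
    return max(distances) <= factor * min(distances)


close = [100, 105, 109]  # hypothetical distances, all within 10%
apart = [100, 120, 105]  # 120 is more than 10% above 100 and 105

print(violations(close), all_within(close))
print(violations(apart), all_within(apart))
```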

How to merge results of two queries in OrientDB

Let's say I have 4 vertex classes: V1,V2,V3,V4
And also 3 edge classes: E1,E2,E3
Then instances of them are (possibly) connected like this:
V1 --E1--> V2
V2 --E2--> V3
V2 --E3--> V4
V3 --E3--> V4
So, graph-wise something like:
V1---E1---V2
          | \
         E2  E3
          |    \
         V3---E3---V4
With directions shown above.
I'm now interested in paths over the exact edges shown from V1 to V4 (There might be other edges between them as well that we don't know about, so only the edge types already mentioned are ok.)
To check whether the first path from V1 to V4 exists (or rather, V4 is returned if the path exists):
SELECT EXPAND(out('E1').out('E3')) FROM V1 WHERE id = <someIdThatV1Has>
To check whether the other path exists (again, V4 is returned if the path exists):
SELECT EXPAND(out('E1').out('E2').out('E3')) FROM V1 WHERE id = <someIdThatV1Has>
The only interest I have is to know if ONE of the two paths exists. I would like to do this with one query.
Question
How can I merge these two queries to one query to find out if one of the two paths exists?
(If possible, a general answer to how to merge different traversal queries in OrientDB along with an explicit answer would be highly appreciated.)
Thanks!
Try with unionAll
select expand($c)
let $a = ( SELECT EXPAND(out('E1').out('E3')) FROM V1 WHERE id = <someIdThatV1Has>),
$b = ( SELECT EXPAND(out('E1').out('E2').out('E3')) FROM V1 WHERE id = <someIdThatV1Has>),
$c = unionAll( $a, $b )
You can find the documentation at the following link:
http://orientdb.com/docs/2.1/SQL.html#select-from-multiple-targets
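The general pattern (run each traversal separately, then concatenate the results) can be sketched in plain Python. This is a model of the idea only, not OrientDB code: the `graph` dictionary, the lowercase vertex names and the helper functions are assumptions made for illustration.

```python
# Adjacency model: (vertex, edge_class) -> list of target vertices.
graph = {
    ("v1", "E1"): ["v2"],
    ("v2", "E2"): ["v3"],
    ("v2", "E3"): ["v4"],
    ("v3", "E3"): ["v4"],
}


def follow(start, edge_classes):
    """Vertices reached from `start` by following the edge classes in order."""
    frontier = [start]
    for edge_class in edge_classes:
        frontier = [
            target
            for vertex in frontier
            for target in graph.get((vertex, edge_class), [])
        ]
    return frontier


def union_all(*results):
    """Concatenate result lists, keeping duplicates."""
    return [row for result in results for row in result]


a = follow("v1", ["E1", "E3"])
b = follow("v1", ["E1", "E2", "E3"])
c = union_all(a, b)

# At least one of the two paths exists if the merged result is non-empty.
print(bool(c))
```

Any number of further traversals can be merged the same way by passing more result lists to the union step.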

Refining a query by Edge metadata

I currently have two Vertex classes, VersionSet and Version, with one non-lightweight Edge class, VersionSetToVersion. The VersionSetToVersion edge class also has a property called status which can have the value 'latest'.
If I have the #rid of a VersionSet vertex (e.g. #14:1), how would I construct an OrientDB SQL query to retrieve only the Version vertex that has a VersionSetToVersion edge with a status of 'latest'?
Here's a query that will return all Versions related to the VersionSet with #rid #14:1 regardless of the status property
SELECT out('VersionSetToVersion') FROM #14:1
This returns two Version objects, #15:1 and #15:2, but only the edge to #15:2 has the status of 'latest'.
How can I refine this query by the status property on the edge so that only #15:2 is returned in the results?
Try this:
SELECT outE('VersionSetToVersion')[status = 'latest'].inV() FROM #14:1
select out_VersionSetToVersion[status = 'latest'] from 14:1

SQL: Finding a subgraph

I have a graph network stored in an SQL server. The graph network (a collection of labeled, undirected and connected graphs) is stored in a vertex-edge mapping scheme (i.e. there are 2 tables, one for vertices and one for edges):
Vertices ( graphID , vertexID, vertexLabel )
Edges ( graphID , sourceVertex , destinationVertex ,edgeLabel )
I am looking for a simple way of counting a particular subgraph in this network. For example, I would like to find how many instances of "A-B-C" are present in this network: "C-D-A-B-C-E-A-B-C-F". I have a few ideas on how this can be done in, say, Java or C++, but I have no clue how to approach this problem using SQL. Any ideas?
A little background: I'm not a student; this is a small project I would like to pursue. I do a lot of social media analysis (in memory) but have little experience mining graphs against an SQL database.
My idea is to create a stored procedure whose input is a string like 'A-B-C' or a pre-created table with the vertex labels in the proper order ('A', 'B', 'C'). You then loop and walk through the path 'A-B-C' step by step. For this you need a temp table holding the vertices of the current step:
1) Step 0
SET @currentLabel = getNextVertexLabel(...) -- need to decide how to do this
-- vertices that carry the first label of the path
select
    *
into #v
from Vertices
where
    vertexLabel = @currentLabel
-- empty copy of #v (0 <> 0 is never true); we need it later
select
    *
into #tempV
from #v
where
    0 <> 0
2) Step i
SET @currentLabel = getNextVertexLabel(...)
-- neighbours of the current vertices that carry the next label
insert #tempV
select
    vs.*
from #v v
join Edges e on
    e.SourceVertex = v.VertexID
    and e.graphID = v.graphID
join Vertices vs on
    e.destinationVertex = vs.VertexID
    and e.graphID = vs.graphID
where
    vs.vertexLabel = @currentLabel
-- the new vertices become the current ones
truncate table #v
insert #v
select * from #tempV
truncate table #tempV
3) After the loop
Your result will be stored in #v, so the number of subgraphs will be:
select count(*) from #v
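The loop above can be tried out with a minimal sketch in Python using the standard sqlite3 module instead of SQL Server (an assumption made only so the example is self-contained). The function name `count_label_path` is made up, the frontier is held in a Python list rather than in temp tables, and edges are followed from source to destination only, as in the join above.

```python
import sqlite3


def count_label_path(conn, labels):
    """Count directed paths whose vertex labels match `labels` in order."""
    # Step 0: every vertex that carries the first label.
    frontier = conn.execute(
        "SELECT graphID, vertexID FROM Vertices WHERE vertexLabel = ?",
        (labels[0],),
    ).fetchall()
    # Step i: move along the edges to vertices carrying the next label.
    for label in labels[1:]:
        next_frontier = []
        for graph_id, vertex_id in frontier:
            rows = conn.execute(
                """SELECT v.graphID, v.vertexID
                   FROM Edges e
                   JOIN Vertices v
                     ON v.vertexID = e.destinationVertex
                    AND v.graphID = e.graphID
                   WHERE e.graphID = ?
                     AND e.sourceVertex = ?
                     AND v.vertexLabel = ?""",
                (graph_id, vertex_id, label),
            ).fetchall()
            next_frontier.extend(rows)
        frontier = next_frontier
    return len(frontier)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Vertices (graphID INT, vertexID INT, vertexLabel TEXT)")
conn.execute(
    "CREATE TABLE Edges (graphID INT, sourceVertex INT, "
    "destinationVertex INT, edgeLabel TEXT)"
)

# The chain from the question, stored as one graph with edges i -> i + 1.
chain = "C-D-A-B-C-E-A-B-C-F".split("-")
conn.executemany("INSERT INTO Vertices VALUES (1, ?, ?)", list(enumerate(chain)))
conn.executemany(
    "INSERT INTO Edges VALUES (1, ?, ?, '')",
    [(i, i + 1) for i in range(len(chain) - 1)],
)

print(count_label_path(conn, ["A", "B", "C"]))  # 2
```

Since the graphs in the question are undirected, a fuller version would also have to follow each edge from destination to source, or store every edge in both directions.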

Data structure for efficient multi-parameters search

I have a collection of multidimensional objects (e.g. class Person = {age : int, height : int, weight : int, etc.}).
I need to query the collection with queries where some dimensions are fixed and the rest are unspecified (e.g. getAllPersonWith {age = c, height = a} or getAllPersonWith {weight = d}...).
Right now I have a multimap from {age, height, ...} (i.e. all dimensions that can be fixed) to a list of Person. To perform a query, I first compute the set of keys that satisfy the query, then merge the corresponding lists from the map.
Is there anything better in terms of query speed? In particular, is there anything closer to using one sorted list per dimension (which I believe to be the fastest solution, but too cumbersome to manage :) )?
Just to be clear, I am not looking for an SQL query.
For your purpose you can have a look at:
http://code.google.com/p/cqengine/
It should point you in the right direction.
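The general idea of keeping one index per dimension can be illustrated with a minimal Python sketch. This is not the API of the library linked above; the `AttributeIndex` class, its methods and the sample people are hypothetical. Each attribute gets an inverted index from value to item positions, and a query intersects the position sets of the fixed attributes.

```python
from collections import defaultdict


class AttributeIndex:
    """Collection with one inverted index (value -> positions) per attribute."""

    def __init__(self, attributes):
        self.attributes = tuple(attributes)
        self.items = []
        self.index = {a: defaultdict(set) for a in self.attributes}

    def add(self, item):
        position = len(self.items)
        self.items.append(item)
        for a in self.attributes:
            self.index[a][item[a]].add(position)

    def query(self, **fixed):
        """Return the items whose attributes equal every value in `fixed`."""
        if not fixed:
            return list(self.items)
        # Start from the smallest candidate set so intermediate results stay small.
        candidates = sorted(
            (self.index[a].get(value, set()) for a, value in fixed.items()),
            key=len,
        )
        hits = set.intersection(*candidates)
        return [self.items[i] for i in sorted(hits)]


people = AttributeIndex(["age", "height", "weight"])
people.add({"name": "Ann", "age": 30, "height": 170, "weight": 60})
people.add({"name": "Bob", "age": 30, "height": 180, "weight": 80})
people.add({"name": "Cid", "age": 41, "height": 170, "weight": 60})

print([p["name"] for p in people.query(age=30, height=170)])
print([p["name"] for p in people.query(weight=60)])
```

This sketch handles only equality on the fixed dimensions; range queries would need sorted structures per dimension instead of hash maps.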
You mean something like:
SELECT * FROM person p
WHERE gender = 'F'
AND age >=18
AND age < 30
AND weight > 60 -- metric measures here !!
AND weight < 70
AND NOT EXISTS (
SELECT * from couple c
WHERE c.one = p.id OR c.two=p.id
);
Why do you think I use SQL?