I recently discovered OrientDB and have been playing with it for the past few weeks. However, I noticed today that something seems to be wrong whenever I add an edge between two vertices. The edge record is not present if I run a query such as SELECT FROM E; it just returns an empty set. In spite of this, the relationship is visible as a property on the nodes, and queries like SELECT IN() FROM V do work.
This poses an issue: if I can't directly access the edge record, I can't add more properties to it, and even if I could, I wouldn't be able to see the changes. I thought this could be a design decision, but the GratefulDeadConcerts example database doesn't seem to have this problem.
I'll illustrate my question with an example:
Let's create a graph database in OrientDB from scratch and name it "Test". We'll create a couple of vertices:
CREATE VERTEX SET TEST=123
CREATE VERTEX SET TEST=456
Let's assume the #rids of these nodes are #9:0 and #9:1 respectively, as we haven't changed anything from the default settings. Let's create an edge between them:
CREATE EDGE FROM #9:0 TO #9:1
Now, let's take a look at the output of the query SELECT FROM V:
orientdb {Test}> SELECT FROM V
----+----+----+----+----
# |#RID|TEST|out_|in_
----+----+----+----+----
0 |#9:0|123 |#9:1|null
1 |#9:1|456 |null|#9:0
----+----+----+----+----
2 item(s) found. Query executed in 0.005 sec(s).
Everything looks right so far. However, the output of the query SELECT FROM E is simply "0 item(s) found. Query executed in 0.016 sec(s)." If we execute SELECT IN() FROM V we get the following:
orientdb {Test}> SELECT IN() FROM V
----+-----+----
# |#RID |IN
----+-----+----
0 |#-2:1|[0]
1 |#-2:2|[1]
----+-----+----
2 item(s) found. Query executed in 0.005 sec(s).
From this, I assume that the edges are created in cluster number -2, even though the default cluster for the class E is 10 and I haven't added any other clusters. I suspect this has something to do with the problem, but I'm not sure how to fix it. I have tried adding new clusters to the class E and creating the edges in the new cluster, but to no avail; I keep getting the exact same result.
So my question is, how do I make edge records show up in OrientDB?
I'm using OrientDB Community 1.7-RC2 and have tried this on two different machines, one running Windows 7 and the other Debian Wheezy.
Extracted from https://github.com/orientechnologies/orientdb/wiki/Troubleshooting#why-i-cant-see-all-the-edges:
OrientDB, by default, manages edges as "lightweight" edges if they have no properties. This means that if an edge has no properties, it's not stored as physical record. But don't worry, your edge is still there but encoded in a separate data structure. For this reason if you execute a select from E no edges or less edges than expected are returned. It's extremely rare the need to have the list of edges, but if this is your case you can disable this feature by issuing this command once (with a slow down and a bigger database size):
alter database custom useLightweightEdges=false
I have a big layer with lines, and a view that needs to calculate the length of these lines without counting their overlaps
A working query that does half the job (but does not account for the overlap, so overestimates the number)
select name, sum(st_length(t.geom)) from mytable t where st_isvalid(t.geom) group by name
The intended query that returns SQL Error [XX000]: ERROR: GEOSUnaryUnion: TopologyException: found non-noded intersection between LINESTRING (446659 422287, 446661 422289) and LINESTRING (446659 422288, 446660 422288) at 446659.27944086661 422288.0015405959
select name,st_length(st_union(t.geom)) from mytable t where st_isvalid(t.geom) group by name
The thing is that the latter works fine for the first 200 rows; it's only when I try to export the entire view that I get the error.
Would there be a way to use the preferred query first, and if it returns an error on a row use the other one? Something like:
case when st_length(st_union(t.geom)) = error then sum(st_length(t.geom))
else st_length(st_union(t.geom)) end
Make sure your geometries are valid before the union by wrapping them in ST_MakeValid(). You can also check their individual validity using select id, ST_IsValid(t.geom) from mytable; and then filter out or correct the affected ones. This helps in cases where one of your geometries is itself invalid. It still leaves cases where the invalidity only appears after combining multiple valid geometries.
See if ST_UnaryUnion(ST_Collect(ST_MakeValid(t.geom))) changes anything. It will try to dissolve and node the component linestrings.
When really desperate, you can make a PL/pgSQL wrapper around both of your functions and switch to the backup one in the exception block.
At the expense of some precision, and with the benefit of somewhat higher performance, you could try snapping them to a grid with ST_Union(ST_SnapToGrid(t.geom,1e-7)), gradually increasing the grid size to 1e-6, 1e-5. Some geometries might not actually intersect but lie so close together that PostGIS can't tell at the precision it operates at. You can also try applying this only to your problematic geometries, if you can pinpoint them.
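To get a feel for why snapping can help, here is a rough illustration in plain Python rather than PostGIS. snap_to_grid is a hypothetical helper, and rounding each coordinate to the nearest multiple of the grid size is an assumption about the general idea, not a statement of how ST_SnapToGrid is implemented. Two vertices that differ only by a tiny amount end up with identical coordinates once snapped.

```python
def snap_to_grid(points, size):
    """Round every coordinate to the nearest multiple of `size`.

    Hypothetical helper for illustration only; it mimics the idea of
    grid snapping, not the PostGIS implementation.
    """
    return [(round(x / size) * size, round(y / size) * size) for x, y in points]


# Two line endpoints that are almost, but not exactly, the same point.
line_a_end = (1.004, 2.0)
line_b_end = (1.001, 2.0)

snapped_a, snapped_b = snap_to_grid([line_a_end, line_b_end], 0.01)

# Before snapping the points differ; on the 0.01 grid they coincide.
print(line_a_end == line_b_end)
print(snapped_a == snapped_b)
```

The larger the grid size, the more near-misses collapse onto the same vertex, which is also why precision is lost as the size grows.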
As @dr_jts reminded me, PostGIS 3.1.0 includes a new overlay engine, so if your select postgis_full_version(); shows anything below that and GEOS 3.9.0, it might be worth upgrading. The upcoming PostGIS 3.2.0 with GEOS 3.10.1 should also provide some improvement in validity checks.
Here's a related thread.
I have a simple vertex "url":
schema.vertexLabel('url').partitionKey('url_fingerprint', 'prop1').properties("url_complete").ifNotExists().create()
And an edgeLabel called "links" which connects one url to another.
schema.edgeLabel('links').properties("prop1", 'prop2').connection('url', 'url').ifNotExists().create()
It's possible that one url has millions of incoming links (e.g. the front page of ebay.com from all its subpages).
But that seems to result in really big partitions and a crash of DSE because of wide partitions (from the OpsCenter wide partitions report):
graphdbname.url_e (2284 mb)
How can I avoid that situation? How should I handle these "supernodes"? I found a "partition" command for labels (see the article [1]), but it is deprecated and will be removed in DSE 6.0. The only hint in the release notes is to model the data another way, but I have no idea how to do that in this case.
I'd appreciate any hint. Thanks!
[1] https://www.experoinc.com/post/dse-graph-partitioning-part-2-taming-your-supernodes
The current recommendation is to use the concept of "bucketing" that drives data model design in the C* world and apply that to the graph by creating an intermediary Vertex that represents groups of links.
2 Vertex Labels
URL
URL_Group | partition key ((url, group)) … i.e. a composite primary key with 2 partition key components
2 Edges
URL -> URL_Group
URL_Group <-> URL_Group (replaces the existing self-reference edge)
Store no more than about 100k url_fingerprints per group, and create a new group once the current one holds about 100k edges.
This solution requires bookkeeping to determine when a new group is needed.
This could be done through a simple C* table for fast, easy retrieval.
CREATE TABLE lookup (url_fingerprint text, group int, count counter, PRIMARY KEY (url_fingerprint, group)); -- column types are assumed
This should preserve DESC order; you may need to add an ORDER BY clause if DESC order is not preserved.
Prior to writing to the Graph, one would need to read the table to find the latest group.
SELECT url_fingerprint, group, count FROM lookup WHERE url_fingerprint = ? LIMIT 1
If the counter is > 100kish, create a new group (increment group +1). During or after writing a new row to Graph, one would need to increment the counter.
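The bookkeeping described above could look roughly like the following sketch. It uses an in-memory dictionary as a stand-in for the lookup counter table, so GroupLookup, assign_group and latest_group are hypothetical names, and GROUP_LIMIT is just the "100k-ish" threshold from the text written as a constant. It is not DSE or driver code, only the decision logic.

```python
from collections import defaultdict

GROUP_LIMIT = 100_000  # assumed threshold, the "100k-ish" from the text


class GroupLookup:
    """In-memory stand-in for the lookup counter table (hypothetical)."""

    def __init__(self, limit=GROUP_LIMIT):
        self.limit = limit
        # url_fingerprint -> {group number -> number of edges written}
        self.counts = defaultdict(dict)

    def latest_group(self, url_fingerprint):
        """Highest group number seen for this URL, or 0 if none exists yet."""
        groups = self.counts[url_fingerprint]
        return max(groups) if groups else 0

    def assign_group(self, url_fingerprint):
        """Pick the group for a new incoming link and bump its counter."""
        groups = self.counts[url_fingerprint]
        group = self.latest_group(url_fingerprint)
        if groups.get(group, 0) >= self.limit:
            group += 1  # current bucket is full, so open a new one
        groups[group] = groups.get(group, 0) + 1  # the counter increment
        return group


# Small limit so the rollover is visible.
lookup = GroupLookup(limit=3)
assigned = [lookup.assign_group("ebay-front-page") for _ in range(7)]
print(assigned)
```

In a real deployment the read and the increment would be separate round trips to the table, so concurrent writers could overshoot the limit slightly; since the limit is approximate anyway, that is probably acceptable.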
Traversing would require something similar to:
g.V().has(some url).out(URL).out(URL_Group).in(URL)
Where conceptually you would traverse the relationships like URL -> URL_Group -> URL_Group <- URL
The visual model of this type of traversal would look like the following diagram
I am having a strange issue when querying OrientDB for a list of outgoing relationships from a node. Imagine that I have a node #34:1 with 100 outgoing relationships of class CONTACT_OF. I can query OrientDB for these relationships using either the vertex or the edge class, like this:
Using the vertex
SELECT outE('CONTACT_OF') FROM #34:1
As a result OrientDB returns '[]'. This makes no sense at all, since the node is connected to 100 contacts. I've tried other kinds of relationships and they work as expected, but for some reason that I don't understand it returns '[]' when querying for CONTACT_OF.
Using the edge class
SELECT FROM CONTACT_OF WHERE out=#34:1
As a result OrientDB returns the 100 records of contacts.
The question is why when executing
SELECT outE('CONTACT_OF') FROM #34:1
the result is an empty array?
Any help would be appreciated, thanks.
EDIT: I am using OrientDB community 2.1-rc3
Here is a sample of the anomaly in orientdb studio http://s8.postimg.org/5p1vxbk45/Captura.png
What's an efficient way to find all nodes within N hops of a given node? My particular graph isn't highly connected, i.e. most nodes have only degree 2, so for example the following query returns only 27 nodes (as expected), but it takes about a minute of runtime and the CPU is pegged:
MATCH (a {id:"36380_A"})-[*1..20]-(b) RETURN a,b;
All the engine's time is spent in traversals, because if I just find that starting node by itself, the result returns instantly.
I really only want the set of unique nodes and relationships (for visualization), so I also tried adding DISTINCT to try to stop it from re-visiting nodes it's seen before, but I see no change in run time.
As you said, matching the start node alone is really fast, and even faster if your property is indexed.
However what you are trying to do now is matching the whole pattern in the graph.
Keep your idea of your fast starting point:
MATCH (a:Label {id:"1234-a"})
Once you have it, pass it to the rest of the query with WITH:
WITH a
Then match the relationships from your fast starting point:
MATCH (a)-[:Rel*1..20]->(b)
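As a side note on the "don't re-visit nodes" idea from the question: outside the database, the set of nodes within N hops is what a plain breadth-first search produces. The sketch below is not what the Cypher query above does and says nothing about how the engine evaluates variable-length patterns; it only shows the visit-each-node-once approach on an adjacency dictionary. nodes_within_hops and the sample graph are made up for illustration.

```python
from collections import deque


def nodes_within_hops(adjacency, start, max_hops):
    """Return {node: hop distance} for every node within max_hops of start."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in adjacency.get(node, ()):
            if neighbor not in distance:  # each node is visited only once
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return distance


# A chain a - b - c - d - e, where most nodes have degree 2.
chain = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c", "e"],
    "e": ["d"],
}
print(nodes_within_hops(chain, "a", 2))
```

Because every node enters the queue at most once, the work grows with the number of nodes and relationships reached, not with the number of distinct paths between them.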
I am using Oracle APEX 4.2.2 and have constructed a Tree region based off a view.
When I run the query below in Oracle SQL Developer, all is fine. But when I place the same query in a Tree region on a page in Oracle APEX, everything saves correctly, yet when I run the page no records or tree are displayed at all.
The underlying view can vary in size, but in the example I am talking about here I have just over 6000 records from which I need to build an Oracle tree hierarchy.
One thing I have noticed is that if I reduce the record size to say 500 rows, the tree displays perfectly.
Questions:
1) Now is there a limitation that I am not aware of as I really need to get this going based on whether there are 500 records or 6000 records?
2) Is 6000 rows too many for a tree hierarchy representation?
3) Could it be that Oracle APEX 4.2.2 now uses JS for building trees and that this is causing issues due to the quantity of data?
4) Is there a means of reducing the depth of the tree records so that I can still at least display something to the user?
My query is something like:
SELECT case when connect_by_isleaf = 1 then 0
            when level = 1 then 1
            else -1
       end as status,
       level,
       c as title,
       null as icon,
       c as value,
       null as tooltip,
       null as link
FROM t
START WITH p IS NULL
CONNECT BY NOCYCLE PRIOR c = p;
Also I've noticed that if I try and run the query in SQL Workshop, it doesn't work there either unless I reduce the record size down to say 500 records.
I asked about IE because the 'too large tree' issue shows up especially in IE. I've seen this issue come up a couple of times already. The conclusion was simply that there isn't much to be done about it, and browsers generally don't cope well with a tree built from such a large dataset. The issue is usually absent or minimal in Firefox or Chrome, though, while IE mostly doesn't play ball; my guess is that this has to do with memory and DOM manipulation.
1) Now is there a limitation that I am not aware of as I really need
to get this going based on whether there are 500 records or 6000
records?
No limitation.
2) Is 6000 rows too many for a tree hierarchy representation?
Probably, yes.
3) Could it be that Oracle APEX 4.2.2 now uses JS for building trees
and that this is causing issues due to the quantity of
data?
Trees have been built with jstree since 4.0 (I don't know about 3.2). APEX puts out a global variable in the tree region which holds all the data. The initialization of the widget then creates the complete ul-li list structure. Part of the issue might be that there are so many nodes to begin with, combined with how they are run through jstree and the huge amount of DOM manipulation occurring. I'm not sure whether this would go better with a newer release of jstree (the APEX version is 0.9.9, while 1.x has been out for a while now).
4) Is there a means of reducing the depth of the tree records so that
I can still at least display something to the user?
If you want to limit the depth, you can restrict the query by using level in the where clause, e.g.:
WHERE level <= 3
Alternative options will probably be non-apex solutions. Dynamic trees, ajax for the tree nodes, another plugin,... I haven't really explored those as I haven't had to deal with such a big tree yet.
In my experience, the number of displayable tree nodes also depends on the text lengths in your tree (e.g. node labels and tooltips). The shorter the texts, the more nodes your tree can display. However, it only makes a difference of maybe 50 nodes, so it won't solve your problem, just as it didn't solve mine.
My moderately educated guess is that this ul-li structure is limited in size.
I built in a drop-down prefilter, so the user has to narrow down what she/he wants to have displayed.