how to find closed cycles using neo4j - cypher

I need to find the meshes (closed cycles) of a circuit to apply Kirchhoff's laws (I just need the cycles that do not contain other cycles in). I already have the model created with nodes and connections.
I hope for your help on the closest query to my problem.

I tried
MATCH p=(n)-[*2..3]-(n) RETURN n, length(p), nodes(p), relationships(p)
This query is very slow so, I set 2 to 3 relationship limit.

Related

Multithreaded grouping algorithm

I have a collection of circles, each of which may or may not intersect one or more other circles in the collection. I want to group these circles such that each "group" contains all circles such that every member of the group intersects at least one other member of the group, and such that no member of any group intersects any member of any other group. I have come up with the following VB.NET/pseudocode algorithm to solve this problem on a single thread:
Dim groups As New List(Of List(Of Circle))
For Each circleToClassify In allCircles
Dim added As Boolean
For Each group In groups
For Each circle In group
If circleToClassify.Intersects(circle) Then
group.Add(circleToClassify)
added = True
Exit For
End If
Next
If added Then
Exit For
End If
Next
If Not added Then
Dim newGroup As New List(Of Circle)
newGroup.Add(circleToClassify)
groups.Add(newGroup)
End If
Next
Return groups
Or in English
Take each item from the collection of circles
Check if it intersects with any member of any existing group (Bear in mind a "group" may only contain a single circle)
If the circle does intersect in the aforementioned manner add it to the appropriate group
Otherwise create a new group with this circle as its only member
Go to step 1.
What I want to be able to do is perform this task using an arbitrary number of threads. However, I haven't got very far at all as all solutions I've come up with so far will just end up executing serially due to locking.
Can anyone provide any tips on what I want to be thinking about to achieve this multithreading?
TLDR
The best multithreaded solutions avoid sharing or perform read-only sharing. (And hence don't need locks.)
Consider partitioning your work so that threads don't share result data, and then merging each thread's results.
Note that when you strip away the detail of detecting whether groups of circles intersect, you are really dealing with a connected components graph theory problem. There's plenty of useful material on this subject online. And in fact you may find it much easier and sufficiently fast to simply apply a breadth first search algorithm to find connected components.
Detail
When doing multi-threaded development, first prize is to implement the threads in such a way as to minimise the number of locks. In the most trivial case: if they don't share any data, they don't need locks at all. However, if you can guarantee that the shared data won't be modified while the threads are running: then you don't need locks in this case either.
In your question, there's no need for your input list of circles to be modified. The problem you have is that you're building up a shared list of circle groups. Basically you're sharing your result space and need locks to ensure the integrity of the results.
One technique in this situation is to "partition and merge". As a trivial example consider finding the maximum of a large list of numbers. The naive (and ideal single-threaded solution) is to:
keep a single "current maximum" found;
compare each element to this value;
and update the "current maximum" if it's higher.
The problem for multithreading occurs in updating of the shared result. One solution is to:
partition the list for each of p threads;
find the maximum within each partition;
once all threads finish their work, the final result is trivially obtained by finding the maximum of the p partitioned maximums.
The trade-off against a single-threaded solution involves weighing up the ease with which the workload can be partitioned and the per-thread results merged versus the often much simpler single-threaded approach.
Applying partition and merge to circle clusters
As a side note: Observe that your question is essentially a graph theory question such that: Each circle is a node; where if any 2 circles intersect, there's an undirected edge between them; and you're trying to determine the connected components of the graph.
Obviously this provides an area that you can research for more ideas/information. But more importantly it makes easier to analyse the problem with simple boolean assessment of whether 2 circles intersect.
Also note the potential performance improvements by first pre-processing your circles into a suitable graph structure.
Assume you have 8 circles (A-H) where 1's in the table below indicate the 2 circles intersect.
ABCDEFGH
A11000110
B11000000
C00100000
D00010101
E00001110
F10011100
G10001010
H00010001
One partitioning idea involves determining what's connected by only considering a subset of circles and all their immediate connections.
ABCDEFGH
A11000110 p1 [AB]
B11000000
---------
C00100000 p2 [CD]
D00010101
---------
E00001110 p3 [EF]
F10011100
---------
G10001010 p4 [GH]
H00010001
NB Even though threads are sharing data (e.g. 2 threads may consider the intersection between circles A and F concurrently), the share is read-only and doesn't require a lock.
Assume 4 partitions (and 4 threads) of [AB][CD][EF][GH]. Connected components per partition would be broken down as follows:
[AB]: ABFG
[CD]: C DFH
[EF]: ADEFG
[GH]: AEG DH
You now have a list of potentially overlapping connected components. Merging involves iterating the list to find overlaps. If found, take union of the 2 sets is a new connected component. This will finally produce: ABFGDHE and C
Some optimisation techniques to consider:
The bottom left of the matrix mirrors the top-right. So you should be able to avoid duplicating processing of the inverse connections.
The merging of partitions can itself be partitioned and merged.
In fact in the extreme case you could start out partitioning a single circle per partition.
Connected(A) = ABFG
Connected(B) = B
Connected(AB) = ABFG
Connected(C) = C
Connected(D) = DFH
Connected(CD) = C,DFH
Connected(ABCD) = ABFGDH,C
Connected(E) = EFG
Connected(F) = F
Connected(EF) = EFG
Connected(G) = G
Connected(H) = H
Connected(GH) = G,H
Connected(EFGH) = EFG,H
Connected(ABCDEFGH) = ABFGDHE,C
Very NB You need to ensure appropriate selection of data structures and algorithms or suffer extremely poor performance. E.g. A naive intersection implementation might require O(n^2) operations to determine if two intermediate connected components intersect and totally destroy your goal that lead to all this additional complexity.
One approach is to divide the image into blocks, run the algorithm for each block independently, on different threads (i.e. considering only the circles whose center is in that block), and afterwards join the groups from different blocks that have intersecting circles.
Another approach is to formulate the problem using a graph, where the nodes represent circles, and an edge exists between two nodes if the corresponding circles are intersecting. We need to find the connected components of this graph. This disregards the geometric aspects of the problem, however, there are general algorithms which may be useful (e.g. you could consider the last slides from this link).

Neo4J: efficient neighborhood query

What's an efficient way to find all nodes within N hops of a given node? My particular graph isn't highly connected, i.e. most nodes have only degree 2, so for example the following query returns only 27 nodes (as expected), but it takes about a minute of runtime and the CPU is pegged:
MATCH (a {id:"36380_A"})-[*1..20]-(b) RETURN a,b;
All the engine's time is spent in traversals, because if I just find that starting node by itself, the result returns instantly.
I really only want the set of unique nodes and relationships (for visualization), so I also tried adding DISTINCT to try to stop it from re-visiting nodes it's seen before, but I see no change in run time.
As you said, matching the start node alone is really fast and faster if your property is indexed.
However what you are trying to do now is matching the whole pattern in the graph.
Keep your idea of your fast starting point:
MATCH (a:Label {id:"1234-a"})
once you got it pass it to the rest of the query with WITH
WITH a
then match the relationships from your fast starting point :
MATCH (a)-[:Rel*1..20]->(b)

Neo4j Query Optimization - 15 diff labels

So I have 15 different Labels in Neo4j which represent 15 real world business Objects. All these Objects are related to each other. Each Object (Label) has thousands of nodes . What I'm trying to do is to do optional Matches between all these Labels and get the Related data. The query is running extremely slow with all 15 selected and works fine with 3-4 object types.
So Typically this is the query with less object types which works fine.
MATCH (incident:Incidents)
WHERE incident.incident_number IN ["INC000005590903","INC000005590903"]
MATCH (device:Devices)
WHERE device.deviceid_udr in ["RE221869491800Uh_pVAevJpYAhRcJ"]
MATCH (alarm:Alarms)
WHERE alarm.entryid_udr in ["ALM123000000110"]
MATCH incident-[a]-alarm
MATCH device-[b]-alarm
MATCH incident-[c]-device
RETURN incident.incident_number, device.deviceid,alarm.entryid_udr
When I do a query to find data related between 15 different object types it runs extremely slow. Do you have any suggestions how I could approach this problem
When you analyze what is happening in your query, it's not to hard to see why it is slow. Each of your initial matches is unrelated to the others, so the entire domain for each label is searched. If you use relationship matching up front, you can significantly reduce the size of each the time it takes to do the query.
Try this query and see how it goes in comparison with the one in your question.
MATCH (incident:Incidents {incident_number : "INC000005590903"})
WITH incident
MATCH (incident)--(device:Devices {deviceid_udr : "RE221869491800Uh_pVAevJpYAhRcJ"})
WITH incident, device
MATCH (device)--(alarm:Alarms {entryid_udr : "ALM123000000110"})
WITH incident, device, alarm
MATCH (incident)--(alarm)
RETURN incident.incident_number, device.deviceid_udr, alarm.entryid_udr
In this query, once you find the incident, the next match is a search of the relationships on the particular incident to find a matching device. Once that is found, matching the alarm is limited in the same fashion.
In addition, if you have a different relationship type for each of the relationships (incident to device, device to alarm, alarm to incident, etc), using the specifications of the relationship types in the matches will speed things up even further, since it will again reduce the number of items that must be searched. Without relationship typing, relationships to nodes with wrong labels will be tested.
I don't know if it is intentional or not, but you also are matching for a closed ring in your example. That's not a problem if you are careful not to allow match loops to occur, but the query won't succeed if there isn't a closed ring. Just thought I'd point it out.

Oracle APEX Tree Hierarchy Limitations/Issues

I am using Oracle APEX 4.2.2 and have constructed a Tree region based off a view.
Now when I take this query (see below) and run this query say in Oracle SQL Developer - all is fine but when I place this same query within the page in Oracle APEX based off a Tree region - all saves correctly but when I run this query, no records/tree is displayed at all.
Now the underlying view can change in record size but for the example I am talking about here, I have just over 6000 records that I need to build a Oracle Tree hierarchy from.
One thing I have noticed is that if I reduce the record size to say 500 rows, the tree displays perfectly.
Questions:
1) Now is there a limitation that I am not aware of as I really need to get this going based on whether there are 500 records or 6000 records?
2) Is 6000 rows too many for a tree hierarchy representation?
3) Could it possibly be because that Oracle APEX 4.2.2 is now using js for building trees and there causing issues due to the quantity of data?
4) Is there a means of reducing the depth of the tree records so that I can still at least display something to the user?
My query is something like:
SELECT case when connect_by_isleaf = 1 then 0
when level = 1 then 1
else -1
end as status,
level,
c as title,
null as icon,
c as value,
null as tooltip,
null as link
FROM t
start with p IS NULL
CONNECT BY NOCYCLE PRIOR c = p;
Also I've noticed that if I try and run the query in SQL Workshop, it doesn't work there either unless I reduce the record size down to say 500 records.
I asked about using IE because the 'too large tree' issue especially plays up in IE. I've seen this issue pass by and asked about a couple of times already. The conclusion was simply that there isn't much to be done about it and generally the browser(s) don't cope too well with a tree with such a large dataset. Usually the issue isn't there or is minimal in ff or chrome though, and ie is mostly not playing ball, and my guess is that this has to do with memory and dom manipulation.
1) Now is there a limitation that I am not aware of as I really need
to get this going based on whether there are 500 records or 6000
records?
No limitation.
2) Is 6000 rows too many for a tree hierarchy representation?
Probably, yes.
3) Could it possibly be because that Oracle APEX 4.2.2 is now using js
for building trees and there causing issues due to the quantity of
data?
Trees are being built with jstree since 4.0 (don't know about 3.2). Apex puts out a global variable in the tree region which holds all the data. The initialization of the widget will then create the complete ul-li list structure. Part of the issue might be that there are so many nodes to begin with, and then how this is ran through jstree, and the huge amount of dom manipulation occuring. I'm not sure if this would go better with the newer release of jstree (apex version is 0.9.9 while 1.x has been released for a while now).
4) Is there a means of reducing the depth of the tree records so that
I can still atleast display something to the user?
If you want to limit the depth you can limit the query by using level in the where clause. eg
WHERE level <= 3
Alternative options will probably be non-apex solutions. Dynamic trees, ajax for the tree nodes, another plugin,... I haven't really explored those as I haven't had to deal with such a big tree yet.
I experienced, that the number of displayable tree nodes depends also on the text lenghts in you tree (e.g. nodes and tooltips). The shorter the texts, the more nodes your tree can display. However, it makes a difference of maybe 50 nodes, so it won't solve your problem, as it didn't solve mine.
My mediocre educated guess is, that this ul-li is limited in size.
I built in a drop-down prefilter, so the user has to narrow down what she/he wants to have displayed.

Storage algorithm question - verify sequential data with little memory

I found this on an "interview questions" site and have been pondering it for a couple of days. I will keep churning, but am interested what you guys think
"10 Gbytes of 32-bit numbers on a magnetic tape, all there from 0 to 10G in random order. You have 64 32 bit words of memory available: design an algorithm to check that each number from 0 to 10G occurs once and only once on the tape, with minimum passes of the tape by a read head connected to your algorithm."
32-bit numbers can take 4G = 2^32 different values. There are 2.5*2^32 numbers on tape total. So after 2^32 count one of numbers will repeat 100%. If there were <= 2^32 numbers on tape then it was possible that there are two different cases – when all numbers are different or when at least one repeats.
It's a trick question, as Michael Anderson and I have figured out. You can't store 10G 32b numbers on a 10G tape. The interviewer (a) is messing with you and (b) is trying to find out how much you think about a problem before you start solving it.
The utterly naive algorithm, which takes as many passes as there are numbers to check, would be to walk through and verify that the lowest number is there. Then do it again checking that the next lowest is there. And so on.
This requires one word of storage to keep track of where you are - you could cut down the number of passes by a factor of 64 by using all 64 words to keep track of where you're up to in several different locations in the search space - checking all of your current ones on each pass. Still O(n) passes, of course.
You could probably cut it down even more by using portions of the words - given that your search space for each segment is smaller, you won't need to keep track of the full 32-bit range.
Perform an in-place mergesort or quicksort, using tape for storage? Then iterate through the numbers in sequence, tracking to see that each number = previous+1.
Requires cleverly implemented sort, and is fairly slow, but achieves the goal I believe.
Edit: oh bugger, it's never specified you can write.
Here's a second approach: scan through trying to build up to 30-ish ranges of contiginous numbers. IE 1,2,3,4,5 would be one range, 8,9,10,11,12 would be another, etc. If ranges overlap with existing, then they are merged. I think you only need to make a limited number of passes to either get the complete range or prove there are gaps... much less than just scanning through in blocks of a couple thousand to see if all digits are present.
It'll take me a bit to prove or disprove the limits for this though.
Do 2 reduces on the numbers, a sum and a bitwise XOR.
The sum should be (10G + 1) * 10G / 2
The XOR should be ... something
It looks like there is a catch in the question that no one has talked about so far; the interviewer has only asked the interviewee to write a program that CHECKS
(i) if each number that makes up the 10G is present once and only once--- what should the interviewee do if the numbers in the given list are present multple times? should he assume that he should stop execting the programme and throw exception or should he assume that he should correct the mistake by removing the repeating number and replace it with another (this may actually be a costly excercise as this involves complete reshuffle of the number set)? correcting this is required to perform the second step in the question, i.e. to verify that the data is stored in the best possible way that it requires least possible passes.
(ii) When the interviewee was asked to only check if the 10G weight data set of numbers are stored in such a way that they require least paases to access any of those numbers;
what should the interviewee do? should he stop and throw exception the moment he finds an issue in the algorithm they were stored in, or correct the mistake and continue till all the elements are sorted in the order of least possible passes?
If the intension of the interviewer is to ask the interviewee to write an algorithm that finds the best combinaton of numbers that can be stored in 10GB, given 64 32 Bit registers; and also to write an algorithm to save these chosen set of numbers in the best possible way that require least number of passes to access each; he should have asked this directly, woudn't he?
I suppose the intension of the interviewer may be to only see how the interviewee is approaching the problem rather than to actually extract a working solution from the interviewee; wold any buy this notion?
Regards,
Samba