I represent SMS text messages sent by persons. To simplify the problem I have only 2 nodes (one Person, one Phone) and 3 relationships (the person has a phone and sent two text messages to their own phone). The graph was created as follows:
try ( Transaction tx = graphDb.beginTx() )
{
    Node aNode1 = graphDb.createNode();
    aNode1.addLabel(DynamicLabel.label("Person"));
    aNode1.setProperty("Name", "Juana");

    Node aNode2 = graphDb.createNode();
    aNode2.addLabel(DynamicLabel.label("Phone"));
    aNode2.setProperty("Number", "1111-1111");

    // (:Person) -[:has]->(:Phone)
    aNode1.createRelationshipTo(aNode2, RelationshipType.withName("has"));

    // finally SMS text sent at different moments
    // (:Phone) -[:sms]-> (:Phone)
    Relationship rel1 = aNode2.createRelationshipTo(aNode2, RelationshipType.withName("sms"));
    rel1.setProperty("Length", 100);
    Relationship rel2 = aNode2.createRelationshipTo(aNode2, RelationshipType.withName("sms"));
    rel2.setProperty("Length", 50);

    // Mark the transaction successful once; it commits when the try block closes.
    tx.success();
}
When I execute the following Cypher query:
MATCH (p1 :Person)-[:has]-> (n1 :Phone) -[r :sms]-(n2: Phone)<-[:has]-(p2 :Person)
RETURN p1, p2
I obtain zero tuples. I do not understand the result set, because there are two SMS texts between p1 and p2 (in this case the same person).
Surprisingly, if I remove the node p2 from the query:
MATCH (p1 :Person)-[:has]-> (n1 :Phone) -[r :sms]-(n2: Phone)
RETURN p1
I obtain Juana, as expected.
I cannot understand the result set (zero tuples) of my first query.
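Within a single MATCH pattern, Cypher will use a particular relationship only once, which among other things avoids infinite loops. Your query already uses the one (p1:Person)-[:has]->(n1:Phone) relationship, so the pattern cannot use it again to get back from the phone to Juana as p2, and the match fails. This restriction does not apply to nodes, which is why the same phone can appear as both n1 and n2 through the looped [:sms] relationship. Since the restriction is applied per MATCH clause, splitting the pattern across two MATCH clauses should let p2 bind to Juana as well.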
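To make the rule concrete, here is a minimal Python sketch that imitates the matching on a toy copy of the graph. It is not how Neo4j is implemented; the tuple layout and the `match_pattern` helper are made up for illustration. It only shows that the pattern has no match when a relationship may be used once, and does have matches when that rule is switched off.

```python
from itertools import product

# Toy copy of the graph from the question: (rel_id, start, type, end).
RELS = [
    (0, "Juana", "has", "Phone"),
    (1, "Phone", "sms", "Phone"),
    (2, "Phone", "sms", "Phone"),
]

def match_pattern(rels, unique_relationships=True):
    """Enumerate matches of (p1)-[:has]->(n1)-[:sms]-(n2)<-[:has]-(p2)."""
    has = [r for r in rels if r[2] == "has"]
    sms = [r for r in rels if r[2] == "sms"]
    matches = []
    for h1, s, h2 in product(has, sms, has):
        p1, n1 = h1[1], h1[3]
        p2, n2 = h2[1], h2[3]
        # The sms step is undirected, so it may be walked either way.
        if (s[1], s[3]) not in ((n1, n2), (n2, n1)):
            continue
        ids = [h1[0], s[0], h2[0]]
        if unique_relationships and len(set(ids)) != len(ids):
            continue  # the same relationship would be used twice
        matches.append((p1, p2))
    return matches

print(match_pattern(RELS))                                   # []
print(len(match_pattern(RELS, unique_relationships=False)))  # 2
```

With only one has relationship in the graph, both has steps of the pattern would need the same relationship, which is exactly what the uniqueness rule forbids.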
Continuing from my earlier question, "Neo4j Cypher manual relationship index, APOC trigger and data duplication", I have created a scenario that reproduces the issue:
CALL apoc.trigger.add('TEST_TRIGGER', "UNWIND keys({assignedRelationshipProperties}) AS key
UNWIND {assignedRelationshipProperties}[key] AS map
WITH map
WHERE type(map.relationship) = 'LIVES_IN'
CALL apoc.index.addRelationship(map.relationship, keys(map.relationship))
RETURN count(*)", {phase:'before'})
CREATE (p:Person) SET p.id = 1 return p
CREATE (p:Person) SET p.id = 2 return p
CREATE (c:City) return c
MATCH (p:Person), (c:City) WHERE p.id = 1 CREATE (p)-[r:LIVES_IN]->(c) SET r.time = 10 RETURN type(r)
MATCH (p:Person), (c:City) WHERE p.id = 2 CREATE (p)-[r:LIVES_IN]->(c) SET r.time = 20 RETURN type(r)
Now, let's try to select the person with r.time = 10:
MATCH (p:Person)-[r:LIVES_IN]->(c:City)
CALL apoc.index.in(c, 'LIVES_IN', 'time:10') YIELD node AS person
RETURN person
The query above correctly returns only one node.
Now, let's do the same but return the person count:
MATCH (p:Person)-[r:LIVES_IN]->(c:City)
CALL apoc.index.in(c, 'LIVES_IN', 'time:10') YIELD node AS person
RETURN count(person)
The query above returns count = 2.
Why does this query return count = 2 instead of 1?
Also, the following query:
MATCH (p:Person)-[r:LIVES_IN]->(c:City)
CALL apoc.index.relationships('LIVES_IN', 'time:10') YIELD rel
RETURN rel
returns 2 relationships:
{
"time": 10
}
{
"time": 10
}
but I expect only the single one in the manual index where time = 10.
What am I doing wrong?
The first query in your example also returns two rows. Apparently you are viewing the result as a graph, which shows the node only once. Either switch to table mode or change the query to this:
MATCH (p:Person)-[r:LIVES_IN]->(c:City)
CALL apoc.index.in(c, 'LIVES_IN', 'time:10') YIELD node AS person
RETURN ID(person)
You get two rows because two people live in the same city: the MATCH produces one row per relationship, and the index search runs once for each of those rows. Try this:
MATCH (p:Person)-[r:LIVES_IN]->(c:City)
WITH DISTINCT c
CALL apoc.index.in(c, 'LIVES_IN', 'time:10') YIELD node AS person
RETURN COUNT(DISTINCT person)
Your relationships query has the same problem: the leading MATCH makes the CALL run once per matched row. Use this:
CALL apoc.index.relationships('LIVES_IN', 'time:10') YIELD rel
RETURN rel
or this
MATCH (c:City)
CALL apoc.index.in(c, 'LIVES_IN', 'time:10') YIELD node AS person
RETURN count(person)
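The row multiplication can be imitated in a few lines of Python. This is only a sketch: `index_lookup` is a hypothetical stand-in for the index search, not an APOC call, and the data mirrors the two LIVES_IN relationships from the question.

```python
# Two LIVES_IN relationships pointing at the same city, as in the question.
lives_in = [
    {"person": 1, "city": "c", "time": 10},
    {"person": 2, "city": "c", "time": 20},
]

def index_lookup(city, time):
    """Hypothetical stand-in for the index search on one city."""
    return [r["person"] for r in lives_in
            if r["city"] == city and r["time"] == time]

# MATCH (p)-[r]->(c) yields one row per relationship; the lookup runs per row.
per_row = [p for row in lives_in for p in index_lookup(row["city"], 10)]

# WITH DISTINCT c collapses the rows to one per city before the lookup.
cities = {row["city"] for row in lives_in}
per_city = [p for city in cities for p in index_lookup(city, 10)]

print(per_row)   # [1, 1]
print(per_city)  # [1]
```

The lookup itself finds one person both times; the duplicate comes purely from running it once per incoming row.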
My data is organized as follows:
There are 1k teachers and 10k pupils, and every pupil has ~100 homeworks.
I need to get all homeworks of the pupils related to a teacher, either via classes or by a direct link between them. All vertices and edges have some attributes, and let's suppose all required indices are already built, or we can discuss them a bit later.
I can get all required pupil ids with these queries, which are fast enough:
$query1 = "FOR v1 IN 1..1 INBOUND @teacherId teacher_pupil FILTER v1.deleted == false RETURN DISTINCT v1._id";
$query2 = "FOR v2 IN 2..2 INBOUND @teacherId OUTBOUND teacher_class, INBOUND pupil_class FILTER v2.deleted == false RETURN DISTINCT v2._id";
$queryUnion = "FOR x IN UNION_DISTINCT (($query1), ($query2)) RETURN x";
Then I wrote the following:
$query = "
LET pupilIds = ($queryUnion)
FOR pupilId IN pupilIds
LET homeworks = (
FOR homework IN 1..1 ANY pupilId pupil_homework
return [homework._id, pupilId]
)
RETURN homeworks";
I get my homeworks, and I can even try to filter them, but the query is too slow. I believe this is the wrong approach.
Question 1: How can I do this without loading the huge number of homeworks into memory at once (LIMIT or whatever), while sorting and filtering homeworks by vertex attributes quickly and efficiently? I'm sure that limiting pupils, or pupil-related homeworks, in the query/subquery's FOR leads to incorrect sorting/pagination.
I made another attempt with a pure graph AQL query:
$query1 = "FOR v1 IN 2..2 INBOUND @teacherId pupil_teacher, OUTBOUND pupil_homework RETURN v1._id";
$query2 = "FOR v2 IN 3..3 INBOUND @teacherId teacher_class, pupil_class, OUTBOUND pupil_homework RETURN v2._id";
$query = "FOR x IN UNION_DISTINCT (($query1), ($query2)) LIMIT 500, 500 RETURN x";
It isn't much faster, and I don't know how to filter Teacher vertices by attributes.
Question 2: What is the best approach for building such AQL queries, and how can I access the vertices of a graph while filtering every part of the path by attributes? Can I paginate the result to save memory and speed up the query? How can I speed it up at all?
Thank you!
Assuming a teacher and a pupil are related to each other only via classes (2 outbound links) or directly (a single outbound link), you can do something like this:
FOR v IN 1..2 OUTBOUND "teacher_id" GRAPH "graph_name"
FILTER LIKE(v._id, "pupil_collection_name/%")
FOR homeworks IN 1 OUTBOUND v GRAPH "graph_name"
LIMIT lowerLimit,numberOfItems
RETURN homeworks
But if a teacher and a pupil can also be related through something other than a class, the query also has to filter on the edge being traversed:
FOR v, e IN 1..2 OUTBOUND "teacher_id" GRAPH "graph_name"
FILTER LIKE(v._id, "pupil_collection_name/%") && (e.name == "ClassPupil" || e.name == "TeacherPupil")
FOR homeworks IN 1 OUTBOUND v GRAPH "graph_name"
LIMIT lowerLimit,numberOfItems
RETURN homeworks
Note that since the same teacher can be related to a pupil both directly and via a class, the result can contain duplicate homeworks, so RETURN DISTINCT homeworks is suggested. If duplicates are not a problem, the query above should work as is.
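The interaction between duplicates and pagination can be illustrated with a small Python sketch. The `paginate_distinct` helper and the sample ids are invented for the example; the point is only that duplicates have to be removed before the page is cut, otherwise a page can contain the same homework twice.

```python
def paginate_distinct(homework_ids, offset, count):
    """Drop repeated ids (keeping first-seen order), then slice one page."""
    seen = set()
    unique = []
    for hid in homework_ids:
        if hid not in seen:
            seen.add(hid)
            unique.append(hid)
    return unique[offset:offset + count]

# A pupil reached both directly and through a class, so h1 and h2 appear twice.
rows = ["h1", "h2", "h1", "h2", "h3", "h4"]

print(rows[0:3])                      # ['h1', 'h2', 'h1']
print(paginate_distinct(rows, 0, 3))  # ['h1', 'h2', 'h3']
print(paginate_distinct(rows, 3, 3))  # ['h4']
```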
I am new to Neo4j and have the following situation.
The diagram above shows a node with label user whose sub-nodes have label shops. Each of these sub-nodes has sub-nodes with label items. Each items node has an attribute size, and for each shops node the items nodes are linked in descending order of size, as shown in the figure.
Question
I want to get two items nodes whose size is less than or equal to 17 from each shops node.
How can I do that? I tried, but it's not working the way I need.
Here is what I have tried:
match (a:user{id:20000})-[:follows]-(b:shops)
with b
match (b)-[:next*]->(c:items)
where c.size<=17
return b
limit 2
Note: these shops nodes can have thousands of items nodes. How can I find the desired nodes without traversing all of them?
Please help, thanks in advance.
Right now Cypher does not handle this case well, so I would probably write a Java-based unmanaged extension for this.
It would look like this:
public List<Node> findItems(Node shop, int size, int count) {
    RelationshipType next = RelationshipType.withName("next");
    List<Node> results = new ArrayList<>(count);
    Relationship rel = shop.getSingleRelationship(next, Direction.OUTGOING);
    // Walk the chain (sorted by descending size) and stop once enough items are found.
    while (rel != null && results.size() < count) {
        Node item = rel.getEndNode();
        if (((Number) item.getProperty("size")).intValue() <= size) results.add(item);
        rel = item.getSingleRelationship(next, Direction.OUTGOING);
    }
    return results;
}

// Collect up to `count` items from every shop the user follows.
List<Node> results = new ArrayList<>(count * 10);
for (Relationship rel : user.getRelationships(Direction.OUTGOING, RelationshipType.withName("follows"))) {
    Node shop = rel.getEndNode();
    results.addAll(findItems(shop, size, count));
}
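The traversal idea in this answer can be tried without a database. The following Python sketch models each shop's items as a singly linked list in descending order of size; the `Item` class and the helper names are assumptions made for the example, not part of any Neo4j API.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Item:
    size: int
    next: Optional["Item"] = None

def build_chain(sizes):
    """Link items in the given order and return the first one."""
    head = None
    for size in reversed(sizes):
        head = Item(size, head)
    return head

def find_items(head, max_size, count):
    """Walk the chain and stop as soon as `count` matching items are found."""
    found = []
    item = head
    while item is not None and len(found) < count:
        if item.size <= max_size:
            found.append(item.size)
        item = item.next
    return found

chain = build_chain([30, 25, 17, 12, 9, 4])
print(find_items(chain, 17, 2))  # [17, 12]
```

The walk still has to pass the items that are too large, but it never touches the part of the chain after the second hit.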
You can avoid having to traverse all items of each shop by grouping them according to size. In this approach, your graph looks like this
(:User)-[:follows]-(:Shop)-[:sells]-(:Category {size: 17})-[:next]-(:Item)
You could then find two items per shop using
match (a:User {id: 20000})-[:follows]-(s:Shop)-[:sells]-(c:Category)
where c.size <= 17
with *
match p = (c)-[:next*0..2]-()
with s, collect(tail(nodes(p))) AS allCatItems
return s, reduce(shopItems=allCatItems[..0], catItems in allCatItems | shopItems + catItems)[..2]
shopItems=allCatItems[..0] is a workaround for a type checking problem; it essentially initializes shopItems to an empty collection of nodes.
Sorry for the naff title, but I'm not really sure how to explain this, I'm one of the new generation whose SQL skills have degraded thanks to the active record patterns!
Basically I have three tables in PostgreSQL
Client (One Client has many maps)
- id
Maps (Map has one client and many layers)
- id
- client_id
Layer (Layer has one map)
- id
- map_id
I would like to write an SQL query that returns Client.id along with a count of how many maps that client has and the total number of layers the client has across all maps.
Is this possible with a single query? Speed isn't of concern as this is just for analytical purposes so will be run infrequently.
I'd use a pair of subqueries for this. Something like:
SELECT
id,
(
SELECT count(map.id)
FROM map
WHERE map.client_id = client.id
) AS n_maps,
(
SELECT count(layer.id)
FROM map INNER JOIN layer ON (layer.map_id = map.id)
WHERE map.client_id = client.id
) AS n_layers
FROM client;
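As a quick check of the subquery approach, here is a Python sketch that runs the same query against an in-memory SQLite database. SQLite stands in for PostgreSQL here, and the sample rows are invented; the table and column names follow the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE client (id INTEGER PRIMARY KEY);
    CREATE TABLE map (id INTEGER PRIMARY KEY, client_id INTEGER);
    CREATE TABLE layer (id INTEGER PRIMARY KEY, map_id INTEGER);
    INSERT INTO client (id) VALUES (1), (2);
    INSERT INTO map (id, client_id) VALUES (10, 1), (11, 1);
    INSERT INTO layer (id, map_id) VALUES (100, 10), (101, 10), (102, 11);
""")

# One correlated subquery per count, evaluated for every client row.
rows = conn.execute("""
    SELECT id,
           (SELECT count(map.id)
              FROM map
             WHERE map.client_id = client.id) AS n_maps,
           (SELECT count(layer.id)
              FROM map INNER JOIN layer ON layer.map_id = map.id
             WHERE map.client_id = client.id) AS n_layers
      FROM client
     ORDER BY id
""").fetchall()

print(rows)  # [(1, 2, 3), (2, 0, 0)]
```

Because the counts are computed per client row, a client without any maps still appears in the result with zeros.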
I'd do it like this, a single SQL Query inside a method in the Client model:
def self.statistics
Client.select("
clients.id AS client_id,
COUNT(DISTINCT(maps.id)) AS total_maps,
COUNT(layers.id) AS total_layers")
.joins(maps: :layers)
.group("clients.id")
end
In order for this to work, you need the associations declared between your models (Client has_many :maps, Map has_many :layers)
You can go deeper into ActiveRecord's query interface in the Rails guides.
I have a class Org, which has ParentId (which points to a Consumer) and Orgs properties, to enable a hierarchy of Org instances. I also have a class Customer, which has an OrgId property. Given any Org instance, named Owner, how can I retrieve all Customer instances for that org? That is, before LINQ I would do a 'manual' traversal of the Org tree with Owner as its root. I'm sure something simpler exists though.
Example: If I have a root level Org called 'Film', with Id '1', and sub-Org called 'Horror' with ParentId of '1', and Id of 23, I want to query for all Customers under Film, so I must get all customers with OrgId's of both 1 and 23.
LINQ won't help you with this, but SQL Server will.
Create a CTE to generate a flattened list of Org Ids, something like:
CREATE PROCEDURE [dbo].[OrganizationIds]
    @rootId int
AS
    WITH OrgCte AS
    (
        -- anchor: the root organization itself
        SELECT OrganizationId FROM Organizations WHERE OrganizationId = @rootId
        UNION ALL
        -- recursive step: organizations whose parent is already in the CTE
        SELECT child.OrganizationId FROM Organizations child
        INNER JOIN OrgCte parent ON child.Parent_OrganizationId = parent.OrganizationId
    )
    SELECT * FROM OrgCte
RETURN 0
Now add a function import to your context mapped to this stored procedure. This results in a method on your context (the returned values are nullable int since the original Parent_OrganizationId is declared as INT NULL):
public partial class TestEntities : ObjectContext
{
public ObjectResult<int?> OrganizationIds(int? rootId)
{
...
Now you can use a query like this:
// get all org ids for specific root. This needs to be a separate
// query or LtoE throws an exception regarding nullable int.
var ids = OrganizationIds(2);
// now find all customers
Customers.Where (c => ids.Contains(c.Organization.OrganizationId)).Dump();
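The recursive CTE idea can be tried outside SQL Server and Entity Framework. The Python sketch below runs an equivalent query on an in-memory SQLite database, which requires the WITH RECURSIVE spelling; the Customers table layout and the sample rows (Film = 1, Horror = 23, plus an unrelated org 40) are assumptions based on the example in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Organizations (
        OrganizationId INTEGER PRIMARY KEY,
        Parent_OrganizationId INTEGER
    );
    CREATE TABLE Customers (
        CustomerId INTEGER PRIMARY KEY,
        OrganizationId INTEGER
    );
    INSERT INTO Organizations VALUES (1, NULL), (23, 1), (40, NULL);
    INSERT INTO Customers VALUES (1, 1), (2, 23), (3, 40);
""")

def customers_under(root_id):
    """Return ids of customers attached to root_id or any org below it."""
    sql = """
        WITH RECURSIVE OrgCte(OrganizationId) AS (
            SELECT OrganizationId FROM Organizations WHERE OrganizationId = ?
            UNION ALL
            SELECT o.OrganizationId
              FROM Organizations o
              JOIN OrgCte c ON o.Parent_OrganizationId = c.OrganizationId
        )
        SELECT CustomerId FROM Customers
         WHERE OrganizationId IN (SELECT OrganizationId FROM OrgCte)
         ORDER BY CustomerId
    """
    return [row[0] for row in conn.execute(sql, (root_id,))]

print(customers_under(1))   # [1, 2]
print(customers_under(40))  # [3]
```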
Unfortunately, not natively in Entity Framework. You need to build your own solution. Probably you need to iterate up to the root. You can optimize this algorithm by asking EF to get a certain number of parents in one go like this:
...
select new { x.Customer, x.Parent.Customer, x.Parent.Parent.Customer }
You are limited to a statically fixed number of parents with this approach (here: 3), but it will save you 2/3 of the database roundtrips.
Edit: I think I did not get your data model right but I hope the idea is clear.
Edit 2: In response to your comment and edit I have adapted the approach like this:
var rootOrg = ...;
var orgLevels = new [] {
    (from o in db.Orgs where o == rootOrg select o), //level 0
    (from o in db.Orgs where o.ParentOrg == rootOrg select o), //level 1
    (from o in db.Orgs where o.ParentOrg.ParentOrg == rootOrg select o), //level 2
    (from o in db.Orgs where o.ParentOrg.ParentOrg.ParentOrg == rootOrg select o), //level 3
};
var setOfAllOrgsInSubtree = orgLevels.Aggregate((a, b) => a.Union(b)); //query for all org levels
var customers = from c in db.Customers where setOfAllOrgsInSubtree.Contains(c.Org) select c;
Notice that this only works for a bounded maximum tree depth. In practice, this is usually the case (like 10 or 20).
Performance will not be great but it is a LINQ-to-Entities-only solution.