Neo4j Cypher manual relationship index, APOC trigger and data duplication 2 - indexing

Continuing from my earlier question, "Neo4j Cypher manual relationship index, APOC trigger and data duplication", I have created a scenario that reproduces the issue:
CALL apoc.trigger.add('TEST_TRIGGER', "UNWIND keys({assignedRelationshipProperties}) AS key
UNWIND {assignedRelationshipProperties}[key] AS map
WITH map
WHERE type(map.relationship) = 'LIVES_IN'
CALL apoc.index.addRelationship(map.relationship, keys(map.relationship))
RETURN count(*)", {phase:'before'})
CREATE (p:Person) SET p.id = 1 return p
CREATE (p:Person) SET p.id = 2 return p
CREATE (c:City) return c
MATCH (p:Person), (c:City) WHERE p.id = 1 CREATE (p)-[r:LIVES_IN]->(c) SET r.time = 10 RETURN type(r)
MATCH (p:Person), (c:City) WHERE p.id = 2 CREATE (p)-[r:LIVES_IN]->(c) SET r.time = 20 RETURN type(r)
Now, let's try to select the person with r.time = 10:
MATCH (p:Person)-[r:LIVES_IN]->(c:City)
CALL apoc.index.in(c, 'LIVES_IN', 'time:10') YIELD node AS person
RETURN person
The query above correctly returns only one node.
Now, let's do the same but return the person count:
MATCH (p:Person)-[r:LIVES_IN]->(c:City)
CALL apoc.index.in(c, 'LIVES_IN', 'time:10') YIELD node AS person
RETURN count(person)
The query above returns count = 2.
Why does this query return count = 2 instead of 1?
Also, the following query:
MATCH (p:Person)-[r:LIVES_IN]->(c:City)
CALL apoc.index.relationships('LIVES_IN', 'time:10') YIELD rel
RETURN rel
returns 2 relationships:
{
"time": 10
}
{
"time": 10
}
but I expect only the single one in the manual index where time = 10.
What am I doing wrong?

The first query in your example also returns two rows. You are probably looking at the result in graph view. Either switch to table view, or change the query to this:
MATCH (p:Person)-[r:LIVES_IN]->(c:City)
CALL apoc.index.in(c, 'LIVES_IN', 'time:10') YIELD node AS person
RETURN ID(person)
You get two rows because two people live in the same city, and the index lookup runs once for each matched relationship. Try this:
MATCH (p:Person)-[r:LIVES_IN]->(c:City)
WITH DISTINCT c
CALL apoc.index.in(c, 'LIVES_IN', 'time:10') YIELD node AS person
RETURN COUNT(DISTINCT person)
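To see why the row count doubles, here is a minimal plain-Python sketch of the row semantics. It is not Neo4j or APOC code: `index_in` is a hypothetical stand-in for `apoc.index.in`, and the ids are made up. The call runs once per incoming row, so two matched rows give two result rows even though only one relationship has time = 10.

```python
# Toy model of the data: two LIVES_IN relationships into the same city.
relationships = [
    {"person": 1, "city": "c", "time": 10},
    {"person": 2, "city": "c", "time": 20},
]

def index_in(city, time):
    """Hypothetical stand-in for apoc.index.in: persons whose LIVES_IN
    relationship into `city` has the given time."""
    return [r["person"] for r in relationships
            if r["city"] == city and r["time"] == time]

# MATCH (p)-[r:LIVES_IN]->(c) produces one row per relationship.
match_rows = [r["city"] for r in relationships]

# The CALL runs once per row, so the single index hit is repeated.
per_row = [p for city in match_rows for p in index_in(city, 10)]

# WITH DISTINCT c collapses the rows before the call.
per_city = [p for city in sorted(set(match_rows)) for p in index_in(city, 10)]

print(len(per_row), len(per_city))  # 2 1
```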

Your example query is simply wrong. Use this:
CALL apoc.index.relationships('LIVES_IN', 'time:10') YIELD rel
RETURN rel
or this
MATCH (c:City)
CALL apoc.index.in(c, 'LIVES_IN', 'time:10') YIELD node AS person
RETURN count(person)

Related

Return results from more than one database table in Django

Suppose I have 3 hypothetical models;
class State(models.Model):
    name = models.CharField(max_length=20)

class Company(models.Model):
    name = models.CharField(max_length=60)
    state = models.ForeignKey(State, on_delete=models.CASCADE)

class Person(models.Model):
    name = models.CharField(max_length=60)
    state = models.ForeignKey(State, on_delete=models.CASCADE)
I want to be able to return results in a Django app, where the results, if using SQL directly, would be based on a query such as this:
SELECT a.name as 'personName',b.name as 'companyName', b.state as 'State'
FROM Person a, Company b
WHERE a.state=b.state
I have tried using the select_related() method as suggested here, but I don't think this is quite what I am after, since I am trying to join two tables that have a common foreign-key, but have no key-relationships amongst themselves.
Any suggestions?
Since a Person can have multiple Companys in the same state, it is not a good idea to do the JOIN at the database level. The database would (likely) return the same Company multiple times, making the output quite large.
We can prefetch the related companies, with:
qs = Person.objects.select_related('state').prefetch_related('state__company_set')
Then we can query the Companys in the same state with:
for person in qs:
    print(person.state.company_set.all())
You can use a Prefetch-object [Django-doc] to prefetch the list of related companies into a custom attribute. Note that to_attr is set on the object the lookup ends at (here the State), so it is reached through person.state:
from django.db.models import Prefetch
qs = Person.objects.prefetch_related(
    Prefetch('state__company_set', Company.objects.all(), to_attr='same_state_companies')
)
Then you can print the companies with:
for person in qs:
    print(person.state.same_state_companies)
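Conceptually, prefetching trades one wide JOIN for a second query whose rows are matched up in Python. A rough plain-Python sketch of that idea follows. No Django is involved, and the sample rows and the `companies_by_state` name are made up for illustration.

```python
from collections import defaultdict

# Rows as two separate queries might return them (illustrative data).
persons = [
    {"name": "Ann", "state_id": 1},
    {"name": "Bob", "state_id": 1},
    {"name": "Cid", "state_id": 2},
]
companies = [
    {"name": "Acme", "state_id": 1},
    {"name": "Initech", "state_id": 1},
    {"name": "Globex", "state_id": 3},
]

# Group the companies by state once...
companies_by_state = defaultdict(list)
for company in companies:
    companies_by_state[company["state_id"]].append(company["name"])

# ...then look them up per person, so each company row is fetched only once.
same_state = {
    p["name"]: companies_by_state.get(p["state_id"], []) for p in persons
}

for name, names in same_state.items():
    print(name, names)
```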

Cypher Query does not work

I represent SMS text messages sent by persons. To simplify the problem I have only 2 nodes (one Person, one Phone) and 3 relationships (the person has a phone and sent two text messages to himself). The graph was created as follows:
try ( Transaction tx = graphDb.beginTx() )
{
    Node aNode1 = graphDb.createNode();
    aNode1.addLabel(DynamicLabel.label("Person"));
    aNode1.setProperty("Name", "Juana");
    tx.success();
    Node aNode2 = graphDb.createNode();
    aNode2.addLabel(DynamicLabel.label("Phone"));
    aNode2.setProperty("Number", "1111-1111");
    tx.success();
    // (:Person) -[:has]->(:Phone)
    aNode1.createRelationshipTo(aNode2, RelationshipType.withName("has"));
    tx.success();
    // finally SMS text sent at different moments
    // (:Phone) -[:sms]-> (:Phone)
    Relationship rel1 = aNode2.createRelationshipTo(aNode2, RelationshipType.withName("sms"));
    rel1.setProperty("Length", 100);
    tx.success();
    Relationship rel2 = aNode2.createRelationshipTo(aNode2, RelationshipType.withName("sms"));
    rel2.setProperty("Length", 50);
    tx.success();
}
When I execute the following Cypher query:
MATCH (p1 :Person)-[:has]-> (n1 :Phone) -[r :sms]-(n2: Phone)<-[:has]-(p2 :Person)
RETURN p1, p2
I obtain zero tuples. I do not understand the result set, because there are two sms texts between p1 and p2 (in this case the same person).
Surprisingly, if I eliminate the node p2 from the query:
MATCH (p1 :Person)-[:has]-> (n1 :Phone) -[r :sms]-(n2: Phone)
RETURN p1
I obtain Juana, as expected.
I cannot understand the result set (zero tuples) of my first query.
Cypher traverses any given relationship at most once within a single matched pattern, which (among other things) avoids infinite loops. Your pattern already uses the (p1:Person)-[:has]->(n1:Phone) relationship, so that same relationship cannot be reused for (n2:Phone)<-[:has]-(p2:Person), and the match cannot get back to Juana from her phone. This restriction does not apply to nodes, which is why the same phone can appear as both n1 and n2 through the looped [:sms] relationship.
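The effect can be reproduced with a small plain-Python sketch. This is a toy pattern matcher over the graph from the question, not Neo4j's actual implementation. The `match` function and its `unique_rels` switch are made up for illustration, and the path counts it prints describe this toy model only.

```python
# Toy graph from the question: (id, start, type, end).
RELS = [
    (0, "juana", "has", "phone"),
    (1, "phone", "sms", "phone"),
    (2, "phone", "sms", "phone"),
]
LABELS = {"juana": "Person", "phone": "Phone"}

def match(start_label, steps, unique_rels=True):
    """Enumerate paths for a pattern given as (rel_type, direction, label)
    steps, where direction is 'out', 'in' or 'any'."""
    results = []

    def walk(node, i, used):
        if i == len(steps):
            results.append(tuple(used))
            return
        rel_type, direction, label = steps[i]
        for rid, a, t, b in RELS:
            # Skip wrong types and, if enabled, relationships already used.
            if t != rel_type or (unique_rels and rid in used):
                continue
            targets = set()
            if direction in ("out", "any") and a == node:
                targets.add(b)
            if direction in ("in", "any") and b == node:
                targets.add(a)
            for nxt in targets:
                if LABELS[nxt] == label:
                    walk(nxt, i + 1, used + [rid])

    for node, label in LABELS.items():
        if label == start_label:
            walk(node, 0, [])
    return results

# (p1:Person)-[:has]->(n1:Phone)-[:sms]-(n2:Phone)
short = [("has", "out", "Phone"), ("sms", "any", "Phone")]
# ... <-[:has]-(p2:Person)
full = short + [("has", "in", "Person")]

print(len(match("Person", short)))                     # 2
print(len(match("Person", full)))                      # 0
print(len(match("Person", full, unique_rels=False)))   # 2
```

With relationship uniqueness switched on, the only [:has] relationship is already consumed by the first step, so the last step has nothing left to match.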

Union between optional and non-optional tables

I have two queries that select records where a union needs to be taken, one of which is a left join and one of which is a regular (i.e. inner) join.
Here's the left join case:
def regularAccountRecords = for {
  (customer, account) <- customers joinLeft accounts on (_.accountId === _.accountId) // + some other special conditions
} yield (customer, account)
Here's the regular join case:
def specialAccountRecords = for {
  (customer, account) <- customers join accounts on (_.accountId === _.accountId) // + some other special conditions
} yield (customer, account)
Now I want to take a union of the two record sets:
regularAccountRecords ++ specialAccountRecords
Obviously this doesn't work because in the regular join case it returns Query[(Customer, Account),...] and in the left join case it returns Query[(Customer, Rep[Option[Account]]),...] and this results in a Type Mismatch error.
Now, if this were a regular column type (e.g. Rep[String]) I could convert it to an optional via the ? operator (i.e. record.?) and get Rep[Option[String]], but using it on a table (i.e. the accounts table) causes:
Error:(62, 85) value ? is not a member of com.test.Account
How do I work around this issue and do the union properly?
Okay, looks like this is what the '?' projection is for but I didn't realize it because I disabled the optionEnabled option in the Codegen. Here's what your codegen extension is supposed to look like:
class MyCodegen extends SourceCodeGenerator(inputModel) {
  override def TableClass = new TableClassDef {
    override def optionEnabled = true
  }
}
Alternatively, you can use implicit classes to tack this thing onto the generated TableClass yourself. Here is how that would look:
implicit class AccountExtensions(account:Account) {
def ? = (Rep.Some(account.id), account.name).shaped.<>({r=>r._1.map(_=> Account.tupled((r._2, r._1.get)))}, (_:Any) => throw new Exception("Inserting into ? projection not supported."))
}
NOTE: be sure to check the field ordering. Depending on how this projection is done, the union query might put the ID field in the wrong place in the output. Use println(query.result.statements.headOption) to debug the output SQL to be sure.
Once you do that, you will be able to use account.? in the yield statement:
def specialAccountRecords = for {
  (customer, account) <- customers join accounts on (_.accountId === _.accountId)
} yield (customer, account.?)
...and then you will be able to take the union of the two queries correctly:
regularAccountRecords ++ specialAccountRecords
I really wish the Slick people would put a note on how the '?' projection is useful in the documentation beyond the vague statement 'useful for outer joins'.

How do I flatten a hierarchy in LINQ to Entities?

I have a class Org, which has ParentId (which points to a Consumer) and Orgs properties, to enable a hierarchy of Org instances. I also have a class Customer, which has an OrgId property. Given any Org instance, named Owner, how can I retrieve all Customer instances for that org? That is, before LINQ I would do a 'manual' traversal of the Org tree with Owner as its root. I'm sure something simpler exists though.
Example: If I have a root level Org called 'Film', with Id '1', and a sub-Org called 'Horror' with ParentId of '1' and Id of 23, I want to query for all Customers under Film, so I must get all customers with OrgIds of both 1 and 23.
LINQ won't help you with this, but SQL Server will.
Create a CTE to generate a flattened list of Org Ids, something like:
CREATE PROCEDURE [dbo].[OrganizationIds]
    @rootId int
AS
    WITH OrgCte AS
    (
        SELECT OrganizationId FROM Organizations where OrganizationId = @rootId
        UNION ALL
        SELECT parent.OrganizationId FROM Organizations parent
        INNER JOIN OrgCte child ON parent.Parent_OrganizationId = child.OrganizationId
    )
    SELECT * FROM OrgCte
RETURN 0
Now add a function import to your context mapped to this stored procedure. This results in a method on your context (the returned values are nullable int since the original Parent_OrganizationId is declared as INT NULL):
public partial class TestEntities : ObjectContext
{
public ObjectResult<int?> OrganizationIds(int? rootId)
{
...
Now you can use a query like this:
// get all org ids for specific root. This needs to be a separate
// query or LtoE throws an exception regarding nullable int.
var ids = OrganizationIds(2);
// now find all customers
Customers.Where (c => ids.Contains(c.Organization.OrganizationId)).Dump();
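For experimenting with the recursive CTE outside SQL Server, here is a small sketch using Python's built-in sqlite3 module. SQLite is swapped in purely for illustration and needs the RECURSIVE keyword. The table and column names follow the procedure above, the sample rows are made up (ids 1 and 23 echo the question's example), and `subtree_org_ids` and `customers_under` are hypothetical helper names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Organizations (OrganizationId INTEGER PRIMARY KEY,
                                Parent_OrganizationId INTEGER);
    CREATE TABLE Customers (CustomerId INTEGER PRIMARY KEY,
                            OrganizationId INTEGER);
    INSERT INTO Organizations VALUES (1, NULL), (23, 1), (40, 23), (99, NULL);
    INSERT INTO Customers VALUES (1, 1), (2, 23), (3, 40), (4, 99);
""")

def subtree_org_ids(root_id):
    """Return the root's id plus the ids of all its descendants."""
    rows = conn.execute("""
        WITH RECURSIVE OrgCte(OrganizationId) AS (
            SELECT OrganizationId FROM Organizations
            WHERE OrganizationId = ?
            UNION ALL
            SELECT o.OrganizationId FROM Organizations o
            JOIN OrgCte c ON o.Parent_OrganizationId = c.OrganizationId
        )
        SELECT OrganizationId FROM OrgCte
    """, (root_id,))
    return sorted(r[0] for r in rows)

def customers_under(root_id):
    """Return the ids of all customers attached to the root's subtree."""
    ids = subtree_org_ids(root_id)
    marks = ",".join("?" * len(ids))  # one placeholder per org id
    rows = conn.execute(
        f"SELECT CustomerId FROM Customers WHERE OrganizationId IN ({marks}) "
        "ORDER BY CustomerId", ids)
    return [r[0] for r in rows]

print(subtree_org_ids(1))   # [1, 23, 40]
print(customers_under(1))   # [1, 2, 3]
```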
Unfortunately, not natively in Entity Framework. You need to build your own solution. Probably you need to iterate up to the root. You can optimize this algorithm by asking EF to get a certain number of parents in one go like this:
...
select new { x.Customer, x.Parent.Customer, x.Parent.Parent.Customer }
You are limited to a statically fixed number of parents with this approach (here: 3), but it will save you 2/3 of the database roundtrips.
Edit: I think I did not get your data model right but I hope the idea is clear.
Edit 2: In response to your comment and edit I have adapted the approach like this:
var rootOrg = ...;
var orgLevels = new [] {
    from o in db.Orgs where o == rootOrg select o, //level 0
    from o in db.Orgs where o.ParentOrg == rootOrg select o, //level 1
    from o in db.Orgs where o.ParentOrg.ParentOrg == rootOrg select o, //level 2
    from o in db.Orgs where o.ParentOrg.ParentOrg.ParentOrg == rootOrg select o, //level 3
};
var setOfAllOrgsInSubtree = orgLevels.Aggregate((a, b) => a.Union(b)); //query for all org levels
var customers = from c in db.Customers where setOfAllOrgsInSubtree.Contains(c.Org) select c;
Notice that this only works for a bounded maximum tree depth. In practice, this is usually the case (like 10 or 20).
Performance will not be great but it is a LINQ-to-Entities-only solution.

Raven DB Count Queries

I need to get a count of the documents in a particular collection.
There is an existing index Raven/DocumentCollections that stores the Count and Name of the collection paired with the actual documents belonging to the collection. I'd like to pick up the count from this index if possible.
Here is the Map-Reduce of the Raven/DocumentCollections index:
from doc in docs
let Name = doc["@metadata"]["Raven-Entity-Name"]
where Name != null
select new { Name , Count = 1}
from result in results
group result by result.Name into g
select new { Name = g.Key, Count = g.Sum(x=>x.Count) }
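As a rough illustration of what this map-reduce pair computes, here is a plain-Python sketch. It is not RavenDB code: the sample documents are made up, and `map_step` and `reduce_step` are hypothetical names mirroring the two LINQ expressions above.

```python
from collections import Counter

# Illustrative documents; only the metadata field matters here.
docs = [
    {"@metadata": {"Raven-Entity-Name": "Posts"}},
    {"@metadata": {"Raven-Entity-Name": "Posts"}},
    {"@metadata": {"Raven-Entity-Name": "Users"}},
    {"@metadata": {}},  # no entity name: skipped by the map step
]

def map_step(docs):
    """Emit a (Name, Count=1) entry per document that has an entity name."""
    for doc in docs:
        name = doc["@metadata"].get("Raven-Entity-Name")
        if name is not None:
            yield {"Name": name, "Count": 1}

def reduce_step(results):
    """Group the mapped entries by Name and sum their counts."""
    totals = Counter()
    for r in results:
        totals[r["Name"]] += r["Count"]
    return [{"Name": n, "Count": c} for n, c in totals.items()]

counts = reduce_step(map_step(docs))
print(counts)
```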
On a side note, var Count = DocumentSession.Query<Post>().Count(); always returns 0 for me, even though there are clearly 500-odd documents in my DB and at least 50 of them have "Raven-Entity-Name" set to "Posts" in their metadata. I have absolutely no idea why this Count query keeps returning 0. The Raven logs show this when the Count is done:
Request # 106: GET - 0 ms - TestStore - 200 - /indexes/dynamic/Posts?query=&start=0&pageSize=1&aggregation=None
For anyone still looking for the answer (this question was posted in 2011), the appropriate way to do this now is:
var numPosts = session.Query<Post>().Count();
To get the results from the index, you can use:
session.Query<Collection>("Raven/DocumentCollections")
.Where(x=>x.Name == "Posts")
.FirstOrDefault();
That will give you the result you want.