How to limit subnodes from each node in Neo4j Cypher

I am new to Neo4j and I have the following situation.
A node with label user has sub-nodes with label shops, and each of those sub-nodes has sub-nodes with label items. Each items node has a size attribute, and for each shops node the items nodes are kept in descending order by size, as represented in the figure.
Question
I want to get two items nodes whose size is less than or equal to 17 from each shops node.
How do I do that? I tried, but it's not working the way I need.
Here is what I have tried:
match (a:user{id:20000})-[:follows]-(b:shops)
with b
match (b)-[:next*]->(c:items)
where c.size<=17
return b
limit 2
Note: these shops nodes can have thousands of items nodes, so how do I find the desired nodes without traversing all of those thousands of items nodes?
Please help, thanks in advance.

Right now Cypher does not handle this case well enough; I would probably write a Java-based unmanaged extension for this.
It would look like this:
public List<Node> findItems(Node shop, int size, int count) {
    List<Node> results = new ArrayList<>(count);
    Relationship rel = shop.getSingleRelationship(RelationshipType.withName("next"), Direction.OUTGOING);
    // items are chained in descending size order, so we can stop as soon as we have enough matches
    while (rel != null && results.size() < count) {
        Node item = rel.getEndNode();
        if (((Number) item.getProperty("size")).intValue() <= size) results.add(item);
        rel = item.getSingleRelationship(RelationshipType.withName("next"), Direction.OUTGOING);
    }
    return results;
}
List<Node> results = new ArrayList<>(count * 10);
for (Relationship rel : user.getRelationships(Direction.OUTGOING, RelationshipType.withName("follows"))) {
    Node shop = rel.getEndNode();
    results.addAll(findItems(shop, size, count));
}
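For completeness, a rough sketch of how that logic could be exposed as an unmanaged extension endpoint; the resource class, URL layout and node lookup below are illustrative assumptions rather than part of the answer above, and the exact registration property depends on the server version:
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.QueryParam;
import javax.ws.rs.core.Context;
import javax.ws.rs.core.Response;

import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Transaction;

@Path("/recommendations")
public class ShopItemsResource {
    private final GraphDatabaseService db;

    // Neo4j injects the database into unmanaged extension resources via @Context
    public ShopItemsResource(@Context GraphDatabaseService db) {
        this.db = db;
    }

    @GET
    @Path("/{userId}/items")
    public Response items(@PathParam("userId") long userId,
                          @QueryParam("size") int size,
                          @QueryParam("count") int count) {
        try (Transaction tx = db.beginTx()) {
            Node user = db.getNodeById(userId); // or look the user up by its "id" property instead
            // walk the user's outgoing :follows relationships, call findItems(shop, size, count)
            // for each shop, and serialize the collected items into the response body
            tx.success();
        }
        return Response.ok().build();
    }
}
On a 2.x server the class would then be registered with something like org.neo4j.server.thirdparty_jaxrs_classes=com.example=/ext in neo4j-server.properties.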

You can avoid having to traverse all items of each shop by grouping them according to size. In this approach, your graph looks like this
(:User)-[:follows]-(:Shop)-[:sells]-(:Category {size: 17})-[:next]-(:Item)
You could then find two items per shop using
match (a:User {id: 20000})-[:follows]-(s:Shop)-[:sells]-(c:Category)
where c.size <= 17
with *
match p = (c)-[:next*0..2]-()
with s, collect(tail(nodes(p))) AS allCatItems
return s, reduce(shopItems=allCatItems[..0], catItems in allCatItems | shopItems + catItems)[..2]
shopItems = allCatItems[..0] is a workaround for a type-checking problem; it essentially initializes shopItems to an empty collection of nodes.
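To make that model concrete, a hedged write-path sketch: it assumes Shop nodes carry an id property and simply hangs each item directly off its size bucket (the [:next*0..2] in the query above also allows items to be chained behind one another):
MATCH (s:Shop {id: 1})
MERGE (s)-[:sells]->(c:Category {size: 17})
CREATE (c)-[:next]->(:Item {size: 17})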


Why are all my SQL queries being duplicated 4 times for Django using "Prefetch_related" for nested MPTT children?

I have a Child MPTT model that has a ForeignKey to itself:
class Child(MPTTModel):
    title = models.CharField(max_length=255)
    parent = TreeForeignKey(
        "self", on_delete=models.CASCADE, null=True, blank=True, related_name="children"
    )
I have a recursive Serializer as I want to show all levels of children for any given Child:
class ChildrenSerializer(serializers.HyperlinkedModelSerializer):
    url = HyperlinkedIdentityField(
        view_name="app:children-detail", lookup_field="pk"
    )

    class Meta:
        model = Child
        fields = ("url", "title", "children")

    def get_fields(self):
        fields = super(ChildrenSerializer, self).get_fields()
        fields["children"] = ChildrenSerializer(many=True)
        return fields
I am trying to reduce the number of duplicate/similar queries made when accessing a Child's DetailView.
The view below works for a depth of 2 - however, the "depth" is not always known or static.
class ChildrenDetailView(generics.RetrieveUpdateDestroyAPIView):
    queryset = Child.objects.prefetch_related(
        "children",
        "children__children",
        # A depth of 3 will additionally require "children__children__children",
        # A depth of 4 will additionally require "children__children__children__children",
        # etc.
    )
    serializer_class = ChildrenSerializer
    lookup_field = "pk"
Note: if I don't use prefetch_related and simply set the queryset to Child.objects.all(), every SQL query is duplicated four times, and I have no idea why.
How do I leverage a Child's depth (i.e. the Child's MPTT level field) to optimize prefetching? Should I be overwriting the view's get_object and/or retrieve?
Does it even matter if I add a ridiculous number of depths to the prefetch? E.g. children__children__children__children__children__children__children__children? It doesn't seem to increase the number of queries for Children objects that don't require that level of depth.
Edit:
Hm, I'm not sure why, but when I try to serialize any Child's top parent (i.e. MPTT's get_root), it duplicates the SQL query four times:
class Child(MPTTModel):
    ...

    @property
    def top_parent(self):
        return self.get_root()


class ChildrenSerializer(serializers.HyperlinkedModelSerializer):
    ...
    top_parent = ParentSerializer()
    fields = ("url", "title", "children", "top_parent")
Edit 2
Adding an arbitrary SerializerMethodField confirms it's being queried four times... for some reason? e.g.
class ChildrenSerializer(serializers.HyperlinkedModelSerializer):
    ...
    foo = serializers.SerializerMethodField()

    def get_foo(self, obj):
        print("bar")
        return obj.get_root().title
This will print "bar" four times. The SQL query is also repeated four times according to django-debug-toolbar:
SELECT ••• FROM "app_child" WHERE ("app_child"."parent_id" IS NULL AND "app_child"."tree_id" = '7') LIMIT 21
4 similar queries. Duplicated 4 times.
Are you using DRF's browsable API? It initializes the serializer three more times for the HTML forms, in rest_framework.renderers.BrowsableAPIRenderer.get_context.
If you do the same request with, say, Postman, "bar" should get printed only once.
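If you want to confirm that the browsable API is the cause, one option (a sketch assuming a standard DRF settings module) is to keep only the JSON renderer and repeat the request:
# settings.py - keep only the JSON renderer so BrowsableAPIRenderer
# (and its extra serializer instantiations for the HTML forms) is never used
REST_FRAMEWORK = {
    "DEFAULT_RENDERER_CLASSES": [
        "rest_framework.renderers.JSONRenderer",
    ],
}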

Java 8 Stream API - convert for loop over map & list iterator inside it

In the code below, I am trying to calculate the total price of a basket, where the basket is a HashMap containing the products as keys and the quantities as values. Promotions are available as a list of Promotion.
I am looping over every map entry and, for each of them, iterating over the promotions. If a promotion matches, I take the promotion price (promotion.computeDiscountedPrice()) and remove the promotion from the list (because a promotion applies to only one product, and each product is unique in the list).
If there is no promotion, we execute the block
if (!offerApplied) { /* .... */ }
Can you please help me do this same operation using the Java 8 Stream API?
BigDecimal basketPrice = new BigDecimal("0.0");
Map<String, Integer> basket = buildBasket(input);
List<Promotion> promotions = getOffersApplicable(basket);
for (Map.Entry<String, Integer> entry : basket.entrySet()) {
    boolean offerApplied = false;
    Iterator<Promotion> promotionIterator = promotions.iterator();
    while (promotionIterator.hasNext()) {
        Promotion promotion = promotionIterator.next();
        if (entry.getKey().equalsIgnoreCase(promotion.getProduct().getProductName())) {
            basketPrice = basketPrice.add(promotion.computeDiscountedPrice());
            offerApplied = true;
            promotionIterator.remove();
            break;
        }
    }
    if (!offerApplied) {
        basketPrice = basketPrice.add(Product.valueOf(entry.getKey()).getPrice()
                .multiply(new BigDecimal(entry.getValue())));
    }
}
return basketPrice;
The simplest and cleanest solution, with better performance than iterating the entire promotions list, is to start by creating a map of promotions keyed by the product name (in lower case or upper case, assuming no case collisions occur given the use of equalsIgnoreCase(..)).
Map<String, Promotion> promotionByProduct = promotions.stream()
.collect(Collectors.toMap(prom -> prom.getProduct()
.getProductName().toLowerCase(), Function.identity()));
This will avoid the need to iterate over the entire array when searching for promotions, it also avoids deleting items from it, which in case of being an ArrayList would need to shift to left the remaining elements each time the remove is used.
BigDecimal basketPrice = basket.keySet().stream()
.map(name -> Optional.ofNullable(promotionByProduct.get(name.toLowerCase()))
.map(Promotion::computeDiscountedPrice) // promotion exists
.orElseGet(() -> Product.valueOf(name).getPrice()) // no promotion
.multiply(BigDecimal.valueOf(basket.get(name))))
.reduce(BigDecimal.ZERO, BigDecimal::add);
It iterates over each product name in the basket and checks whether a promotion exists; if so, it uses the computeDiscountedPrice method, otherwise it looks up the product with Product.valueOf(..) and takes its price. It then multiplies this value by the quantity of that product in the basket, and finally the results are reduced (all values of the basket are added) with the BigDecimal.add() method.
An important thing to note is that in your code you don't multiply the result of promotion.computeDiscountedPrice() by the quantity (the code above does); I'm not sure whether that is a typo in your code or the way it should behave.
If it is in fact the way it should behave (you don't want to multiply promotion.computeDiscountedPrice() by the quantity), the code would be:
BigDecimal basketPrice = basket.keySet().stream()
.map(name -> Optional.ofNullable(promotionByProduct.get(name.toLowerCase()))
.map(Promotion::computeDiscountedPrice)
.orElseGet(() -> Product.valueOf(name).getPrice()
.multiply(BigDecimal.valueOf(basket.get(name)))))
.reduce(BigDecimal.ZERO, BigDecimal::add);
Here the only value multiplied by the quantity is the product price obtained with Product.valueOf(name).getPrice().
Finally, another option, all in one expression and without using the map (iterating over the promotions instead), using the first approach (multiplying by the quantity at the end):
BigDecimal basketPrice = basket.keySet().stream()
.map(name -> promotions.stream()
.filter(prom -> name.equalsIgnoreCase(prom.getProduct().getProductName()))
.findFirst().map(Promotion::computeDiscountedPrice) // promotion exists
.orElseGet(() -> Product.valueOf(name).getPrice()) // no promotion
.multiply(BigDecimal.valueOf(basket.get(name))))
.reduce(BigDecimal.ZERO, BigDecimal::add);
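One caveat worth adding to the map-based variants above: Collectors.toMap throws an IllegalStateException if two promotions resolve to the same lower-cased product name. If that can happen in your data, a sketch of the three-argument overload with a merge function (keeping the first promotion is just an illustrative choice):
Map<String, Promotion> promotionByProduct = promotions.stream()
        .collect(Collectors.toMap(
                prom -> prom.getProduct().getProductName().toLowerCase(),
                Function.identity(),
                (first, second) -> first)); // on duplicate keys, keep the first promotion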

ArangoDB AQL query

I have data organized the following way:
There are 1k teachers and 10k pupils, and every pupil has ~100 homeworks.
I need to get all homeworks of pupils related to a teacher, either via classes or by a direct link between them. All vertices and edges have some attributes; let's suppose all required indices are already built, or we can discuss them a bit later.
I can get all required pupil ids quickly enough with this query:
$query1 = "FOR v1 IN 1..1 INBOUND #teacherId teacher_pupil FILTER v1.deleted == false RETURN DISTINCT v1._id";
$query2 = "FOR v2 IN 2..2 INBOUND #teacherId OUTBOUND teacher_class, INBOUND pupil_class FILTER v2.deleted == false RETURN DISTINCT v2._id";
$queryUnion = "FOR x IN UNION_DISTINCT (($query1), ($query2)) RETURN x";
Then I wrote the following:
$query = "
LET pupilIds = ($queryUnion)
FOR pupilId IN pupilIds
LET homeworks = (
FOR homework IN 1..1 ANY pupilId pupil_homework
return [homework._id, pupilId]
)
RETURN homeworks";
I get my homeworks, and I can even try to filter them, but the query is too slow; that's the wrong approach, I believe.
Question 1: How can I do this without pulling the whole huge set of homeworks into memory at once (LIMIT or whatever), while sorting and filtering homeworks by vertex attributes quickly and efficiently? I'm sure that limiting pupils, or pupil-related homeworks, in the query/subquery's FOR leads to incorrect sorting/pagination.
I made another attempt with a pure graph AQL query:
$query1 = "FOR v1 IN 2..2 INBOUND #teacherId pupil_teacher, OUTBOUND pupil_homework RETURN v1._id";
$query2 = "FOR v2 IN 3..3 INBOUND #teacherId teacher_class, pupil_class, OUTBOUND pupil_homework RETURN v2._id";
$query = "FOR x IN UNION_DISTINCT (($query1), ($query2)) LIMIT 500, 500 RETURN x";
It isn't much faster, and I don't know how to filter Teacher vertices by attributes.
Question 2: What is the best approach for building such AQL queries, and how can I access the vertices of a graph while filtering all parts of the path by attributes? Can I paginate the result to save memory and speed up the query? How can I speed it up at all?
Thank you!
Assuming a teacher and a pupil are related to each other either via classes (two outbound links) or directly (a single outbound link), and in no other way, you can do something like this:
FOR v IN 1..2 OUTBOUND "teacher_id" GRAPH "graph_name"
  FILTER LIKE(v._id, "pupil_collection_name/%")
  FOR homeworks IN 1 OUTBOUND v GRAPH "graph_name"
    LIMIT lowerLimit, numberOfItems
    RETURN homeworks
But if there is a possibility that a teacher and a pupil can be related to each other by something other than a class, we would have to filter the query on the edges we traverse as well (note the additional e variable declared in the FOR):
FOR v, e IN 1..2 OUTBOUND "teacher_id" GRAPH "graph_name"
  FILTER LIKE(v._id, "pupil_collection_name/%") && (e.name == "ClassPupil" || e.name == "TeacherPupil")
  FOR homeworks IN 1 OUTBOUND v GRAPH "graph_name"
    LIMIT lowerLimit, numberOfItems
    RETURN homeworks
Note that since the same teacher can be related to a pupil both directly and via a class, we can get non-unique homeworks; hence using RETURN DISTINCT homeworks is suggested. But if duplicates are not a problem, the query above should work.
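Regarding filtering every part of the path by attributes (Question 2 above), a hedged sketch of one option is to also declare the edge and path variables in the traversal and filter on the path; the NONE array comparison operator needs a reasonably recent ArangoDB version, and the deleted flag is taken from the question:
FOR v, e, p IN 1..2 OUTBOUND "teacher_id" GRAPH "graph_name"
  FILTER LIKE(v._id, "pupil_collection_name/%")
  FILTER p.vertices[*].deleted NONE == true   // no deleted vertex anywhere on the path
  FOR homework IN 1..1 OUTBOUND v GRAPH "graph_name"
    FILTER homework.deleted == false
    RETURN homework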

Cypher Query does not work

I represent SMS text messages sent by persons. To simplify the problem I have only 2 nodes (one Person, one Phone) and 3 relationships (the person has a phone and sent two text messages to herself). The graph was created as follows:
try ( Transaction tx = graphDb.beginTx() )
{
    Node aNode1 = graphDb.createNode();
    aNode1.addLabel(DynamicLabel.label("Person"));
    aNode1.setProperty("Name", "Juana");
    tx.success();

    Node aNode2 = graphDb.createNode();
    aNode2.addLabel(DynamicLabel.label("Phone"));
    aNode2.setProperty("Number", "1111-1111");
    tx.success();

    // (:Person)-[:has]->(:Phone)
    aNode1.createRelationshipTo(aNode2, RelationshipType.withName("has"));
    tx.success();

    // finally, SMS texts sent at different moments
    // (:Phone)-[:sms]->(:Phone)
    Relationship rel1 = aNode2.createRelationshipTo(aNode2, RelationshipType.withName("sms"));
    rel1.setProperty("Length", 100);
    tx.success();

    Relationship rel2 = aNode2.createRelationshipTo(aNode2, RelationshipType.withName("sms"));
    rel2.setProperty("Length", 50);
    tx.success();
}
When I execute the following Cypher query:
MATCH (p1 :Person)-[:has]-> (n1 :Phone) -[r :sms]-(n2: Phone)<-[:has]-(p2 :Person)
RETURN p1, p2
I obtain zero tuples. I do not understand the result set because I have two sms texts between p1 and p2 (in this case the same person).
Surprisingly, if I eliminate the node p2 from the query:
MATCH (p1 :Person)-[:has]-> (n1 :Phone) -[r :sms]-(n2: Phone)
RETURN p1
I obtain Juana, as expected.
I cannot understand the result set (zero tuples) of my first query.
Cypher will only traverse a particular relationship once within a single MATCH pattern, to avoid infinite loops. Your query already traverses the (p1:Person)-[:has]->(n1:Phone) relationship, and since n1 and n2 end up being the same phone, the (n2)<-[:has]-(p2) part of the pattern would have to walk that same relationship a second time, so you can't get back to Juana from her phone. This restriction does not apply to nodes, however, which is why the shorter query can bind n1 and n2 to the same phone and still return Juana.
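If the goal is still to return the persons on both ends, one workaround (a sketch based on this behaviour, not taken from the original answer) is to split the pattern into two MATCH clauses, because relationship uniqueness is only enforced within a single MATCH:
MATCH (p1:Person)-[:has]->(n1:Phone)-[r:sms]-(n2:Phone)
MATCH (p2:Person)-[:has]->(n2)
RETURN p1, p2
With the data above this returns Juana for both p1 and p2.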

Using LINQ to pull collection until aggregate condition met

At a high level, I need a query that can pull a subset of records based on the sum of a column, just like Linq: How to query items from a collection until the sum reaches a certain value.
However, the key difference is that he's already got his records in an object, and I don't and can't. My table can have millions of records. If I build my query the way he did, I get this error:
"A lambda expression with a statement body
cannot be converted to an expression tree"
Which makes sense after researching it, LINQ can't turn the answer in the above referenced question into valid SQL.
I'm going to make a hypothetical table that represents my situation.
Order Id | Cookie Name    | Qty
1        | Sugar          | 5
2        | Snickerdoodle  | 4
3        | Chocolate chip | 8
4        | Snickerdoodle  | 10
5        | Snickerdoodle  | 5
Given this sample, I need to write a query that grabs the first X orders of Snickerdoodle until the summed Qty exceeds an input parameter (i.e. if the user chooses 13, it would return records 2 and 4).
I'm using NHibernate.Linq because I'm more comfortable in LINQ, but I'm completely open to ICriteria if the need arises.
As a side note, I'm interested in this as a concept as well as in the direct problem. Even though I need a Sum, there has to be a way to do something akin to a TakeWhile that executes until a condition is met.
A pragmatic approach:
int needed = ...;
int actual = 0;
int page = 0;
const int pagesize = 20; // set to some sensible value, e.g. the page size of the grid shown to the user

var results = new List<CookieOrder>();
while (actual < needed)
{
    var partialResults = session.Query<CookieOrder>()
        .Where(c => c.Name == "Snickerdoodle")
        .OrderBy(c => c.Id)
        .Skip(page * pagesize)
        .Take(pagesize)
        .ToList();

    if (partialResults.Count == 0)
        break; // no more matching orders; avoid looping forever

    for (int i = 0; i < partialResults.Count && actual < needed; i++)
    {
        results.Add(partialResults[i]);
        actual += partialResults[i].Quantity; // accumulate the quantity collected so far
    }
    page++;
}
return results;