I'm having problems understanding some Cypher concepts: naming nodes

I'm following a tutorial and I'm having a hard time understanding the very basics, so I need your help. The code is:
CREATE (u:User {name: "Alice"})-[:Likes]->(m:Software {name: "Memgraph"});
The explanation for this code is: The query above will create 2 nodes in the database, one labeled "User" with name "Alice" and the other labeled "Software" with name "Memgraph". It will also create a relationship expressing that "Alice" likes "Memgraph".
This part I get.
What I don't get is this:
To find created nodes and relationships, execute the following query:
MATCH (u:User)-[r]->(x) RETURN u, r, x;
If I created a node with a variable (or whatever u is called), why is the relationship referred to as r, and the software node as x? When the relationship was created, it was just defined as :Likes, and the software node was m.
Where do r and x come from? Is there any connection between CREATE and MATCH, or is only the order important and the names not important at all?

When you were creating nodes and a relationship between them, you used variables, but you could have done it like this:
CREATE (:User {name: "Alice"})-[:Likes]->(:Software {name: "Memgraph"});
The above query would also create nodes and a relationship. We use variables when we want to use certain objects in the other parts of one query. For example, if you wanted to also return the created nodes, you would execute the following query:
CREATE (u:User {name: "Alice"})-[:Likes]->(m:Software {name: "Memgraph"})
RETURN u, m;
You can see that you need the variables u and m in order to return them. Also, you can't return the relationship object, since you did not assign a variable to it.
Now to answer the other part of your question: variables are used within one query, and they are not remembered in any way for use in another query. So if you ran:
CREATE (u:User {name: "Alice"})-[:Likes]->(m:Software {name: "Memgraph"});
and then wanted to get u and m in some other query, those names would no longer refer to the Alice and Memgraph nodes. Hence, when you execute:
MATCH (u:User)-[r]->(x) RETURN u, r, x;
you get every node labeled User that has at least one outgoing relationship, regardless of the Alice from the previous query (though Alice will be included too, since that node is labeled User), along with the nodes those User nodes point to and the relationships between them. You choose your variable names, so it does not matter whether they are u, r and x or m, n and l.
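To make the point about variable scope concrete, here is a toy Python model (not Memgraph's API or query engine; the in-memory graph and the `match_outgoing` helper are hypothetical). It treats `MATCH (u:User)-[r]->(x) RETURN u, r, x` as a function whose result columns are named by whatever variables the caller passes in; only the stored data persists between calls, never the names.

```python
from typing import Any

# Hypothetical in-memory graph: what CREATE left behind is only data, no names.
NODES = {
    1: {"labels": {"User"}, "name": "Alice"},
    2: {"labels": {"Software"}, "name": "Memgraph"},
}
RELS = [{"type": "Likes", "start": 1, "end": 2}]


def match_outgoing(label: str, names: tuple[str, str, str]) -> list[dict[str, Any]]:
    """Toy model of MATCH (a:label)-[b]->(c) RETURN a, b, c.

    `names` plays the role of the query's variables: they only name the
    columns of this one result and are forgotten when the call returns.
    """
    start_var, rel_var, end_var = names
    rows = []
    for rel in RELS:
        start = NODES[rel["start"]]
        if label in start["labels"]:
            rows.append({start_var: start, rel_var: rel, end_var: NODES[rel["end"]]})
    return rows


rows_urx = match_outgoing("User", ("u", "r", "x"))
rows_mnl = match_outgoing("User", ("m", "n", "l"))
# Same data comes back; only the column names differ.
print(rows_urx[0]["x"]["name"], rows_mnl[0]["l"]["name"])  # Memgraph Memgraph
```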

Related

OrientDB graph query that matches a specific relationship

I am developing an application using OrientDB as a database. The database is already populated, and now I need to run some queries to obtain specific information.
There are 3 vertex classes and 3 edge classes involved. What I need to do is query the database to see whether a specific relationship exists. The relationship looks like this:
ParlamentarVertex --Realiza> TransacaoVertex --FornecidaPor> EmpresaFornecedoraVertex AND ParlamentarVertex --SocioDe> EmpresaFornecedoraVertex
The names ending in Vertex are vertices, of course, and the arrows are the edges between them.
I've tried to do this:
SELECT TxNomeParlamentar, SgPartido, SgUF FROM Parlamentar where ...
SELECT EXPAND( out('RealizaTransacao').out('FornecidaPor') ) FROM Parlamentar
But I do not know how to specify the relationships after the where clause.
I've also tried to use MATCH:
MATCH {class: Parlamentar, as: p} -Realiza-> {as:realiza}
But I am not sure how to express the AND condition, which is essential for my query.
Does anyone have any tips to point me in the right direction?
Thanks in advance!
EDIT 1
I've managed to use the query below:
SELECT EXPAND( out('RealizaTransacao').out('FornecidaPor').in('SocioDe') ) FROM Parlamentar
It almost works, but returns some incorrect relationships. It looks like a join in which I didn't bind the PK to the FK.
The easiest thing here is to use a MATCH as follows:
MATCH
{class:ParlamentarVertex, as:p} -Realiza-> {class:TransacaoVertex, as:t}
-FornecidaPor-> {class:EmpresaFornecedoraVertex, as:e},
{as:p} -SocioDe-> {as:e}
RETURN p, p.TxNomeParlamentar, p.SgPartido, p.SgUF, t, e
(or RETURN whatever you need)
As you can see, the AND is expressed by adding multiple patterns, separated by commas, that reuse the same aliases (p and e).
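Why the chained EXPAND "almost works" can be seen in a small Python model. The vertices P1, P2, T1, T2, E1, E2 and the edge lists are made up, and plain lists stand in for OrientDB's edge classes; this is not OrientDB's execution model, only the shape of the two queries. The chained traversal forgets which parlamentar it started from, while the MATCH version requires the same p and e in both patterns.

```python
# Hypothetical sample data: edge lists standing in for the OrientDB edge classes.
REALIZA = [("P1", "T1"), ("P2", "T2")]        # Parlamentar -Realiza-> Transacao
FORNECIDA_POR = [("T1", "E1"), ("T2", "E2")]  # Transacao -FornecidaPor-> Empresa
SOCIO_DE = [("P2", "E1"), ("P2", "E2")]       # Parlamentar -SocioDe-> Empresa


def out(edges, sources):
    """Follow edges forward from every vertex in `sources`."""
    return [dst for src in sources for (s, dst) in edges if s == src]


def in_(edges, targets):
    """Follow edges backward into every vertex in `targets`."""
    return [src for tgt in targets for (src, dst) in edges if dst == tgt]


def chained_traversal(p):
    # Shape of out('Realiza').out('FornecidaPor').in('SocioDe'): the last hop
    # returns every partner of the company, not only the starting parlamentar.
    return in_(SOCIO_DE, out(FORNECIDA_POR, out(REALIZA, [p])))


def match_with_shared_aliases():
    # Shape of the MATCH answer: p and e must bind to the same vertices
    # in both comma-separated patterns.
    results = []
    for p, t in REALIZA:
        for t2, e in FORNECIDA_POR:
            if t2 == t and (p, e) in SOCIO_DE:
                results.append((p, t, e))
    return results


print(chained_traversal("P1"))      # ['P2']: P2 is a partner of E1, but T1 was P1's
print(match_with_shared_aliases())  # [('P2', 'T2', 'E2')]
```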

REST solutions when not strictly referring to an actual resource

Take a simple table person:
CREATE TABLE person
(
id bigint,
name nvarchar(128),
age int
)
You can represent this in a REST interface:
GET /person
GET /person/5
POST /person
PUT /person/5
PATCH /person/5
DELETE /person/5
This interface would expect 2 parameters:
{
  "name": "Joe",
  "age": 16
}
You could then define an API that would expect those two parameters, and even make the age optional.
However, suppose you wanted to define a model on the client side that needed to do more complex things with this person table, such as pulling all teenagers. How would you best represent this?
I suppose I could do something like this: support only GET, and require different parameters that match the query's needs:
GET /person/teenager
However, I'm not sure this would cover all use cases. For example, I believe REST URLs should only contain nouns, and I am not sure how to put something like this into noun form:
GET /person/by-age
Any ideas / references / suggestions?
The most common way to constrain your person listing results is to use query parameters. You can define your query parameters in any way that is useful for your API. One example might be GET /person?age=13..20 to get only teenagers. Another example might be GET /person/?filter=age>=13,age<20.
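A minimal sketch of the server side of `GET /person?age=13..20`, using only Python's standard library. The `lo..hi` syntax is whatever your API defines; here it is assumed to be half-open (13 inclusive, 20 exclusive) so it agrees with the `age>=13,age<20` example, and the in-memory `PEOPLE` list is a hypothetical stand-in for the person table.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical rows standing in for the person table.
PEOPLE = [
    {"id": 1, "name": "Joe", "age": 16},
    {"id": 2, "name": "Ann", "age": 34},
    {"id": 3, "name": "Tim", "age": 13},
    {"id": 4, "name": "Sue", "age": 20},
]


def parse_range(value: str) -> tuple[int, int]:
    """Parse 'lo..hi' into (lo, hi); hi is treated as exclusive (an assumption)."""
    lo, sep, hi = value.partition("..")
    if not sep:
        raise ValueError(f"expected 'lo..hi', got {value!r}")
    return int(lo), int(hi)


def list_people(url: str) -> list[dict]:
    """GET /person with optional filters taken from the query string."""
    query = parse_qs(urlsplit(url).query)
    people = PEOPLE
    if "age" in query:
        lo, hi = parse_range(query["age"][0])
        people = [p for p in people if lo <= p["age"] < hi]
    return people


print([p["name"] for p in list_people("/person?age=13..20")])  # ['Joe', 'Tim']
```

The resource stays the noun `/person`; the query string only narrows the collection, so no verb-like path segment is needed.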

Neo4j Match Node Property OR Relationship Property

I'm trying to write a query that will return nodes that either match a node property or a relationship property.
For instance, I want all nodes whose name property is George OR that have a relationship whose status property is "good". I have two queries that each get one set of these nodes:
MATCH (n) where n.name = 'George' return n
MATCH (n)-[r]-() where r.status = 'good' return n
Is there a single query I could write to get these combined results? I thought I could use OPTIONAL MATCH (below), but I seem to have misunderstood the OPTIONAL MATCH clause, because I'm only getting nodes from the first query.
MATCH (n) where n.name = 'George'
Optional MATCH (n)-[r]-() where r.status = 'good' return distinct n
By the time the optional match happens, the only n nodes that are around to make the optional match from are the ones that already match the first criteria. You can do
MATCH (n)
WHERE n.name = 'George' OR (n)-[{status: "good"}]->()
RETURN n
but for larger graphs remember that this will not make efficient use of indices.
Another way would be
MATCH (n {name:"George"})
RETURN n
UNION
MATCH (n)-[{status:"good"}]->()
RETURN n
This should do better with indices for the first match, assuming you use a label and have the relevant index set up (but the second part would still potentially be very inefficient).
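A toy Python model of the three queries may make the difference in row semantics easier to see. The node ids, names and statuses are made up, and sets of node ids stand in for result rows; this is not how Neo4j evaluates queries internally. The point is that OPTIONAL MATCH can only extend rows that the first MATCH already produced, whereas OR and UNION combine both conditions.

```python
# Hypothetical graph: node id -> name, plus (start, end, properties) relationships.
NAMES = {1: "George", 2: "Mary", 3: "Paul", 4: "Anne"}
RELS = [(2, 3, {"status": "good"}), (3, 4, {"status": "bad"})]


def named_george():
    return {n for n, name in NAMES.items() if name == "George"}


def has_good_outgoing():
    return {start for start, _end, props in RELS if props.get("status") == "good"}


def optional_match_version():
    # MATCH (n) WHERE n.name = 'George'
    # OPTIONAL MATCH (n)-[r]-() WHERE r.status = 'good' RETURN DISTINCT n
    rows = []
    for n in named_george():
        matches = [props for start, end, props in RELS
                   if n in (start, end) and props.get("status") == "good"]
        # OPTIONAL MATCH keeps the row (with r = null) when nothing matches,
        # but it never adds rows for nodes the first MATCH did not produce.
        rows.extend((n, r) for r in matches or [None])
    return {n for n, _r in rows}


def or_version():
    # MATCH (n) WHERE n.name = 'George' OR (n)-[{status: "good"}]->() RETURN n
    return {n for n in NAMES if n in named_george() or n in has_good_outgoing()}


def union_version():
    # ... RETURN n UNION MATCH (n)-[{status:"good"}]->() RETURN n
    return named_george() | has_good_outgoing()


print(sorted(optional_match_version()))  # [1]
print(sorted(or_version()))              # [1, 2]
print(sorted(union_version()))           # [1, 2]
```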
Edit
Regarding the comment: relationship indexing would make that part faster, correct, but in my view it is more accurate to say it is slow because the pattern is underdetermined. The second match pattern does something like this:
- bind every node in the graph to (n)
- get all outgoing relationships (regardless of type) from (n)
- check each relationship for status="good"
You could improve performance with relationship indexing, but since a relationship exists only between the two nodes it relates, you can think of it instead as indexed by those nodes. That is, fix the first bullet point by excluding nodes whose relationships are not relevant. The two match clauses could look like
MATCH (n:Person {name:"George"})
// add label to use index
MATCH (n:Person)-[{status:"good"}]->()
// add label to limit (n) -no indexing, but better than unlimited (n)
MATCH (n:Person {name:"Curious"})-[{status:"good"}]->()
// add label to use index -now the relationships are sort-of-indexed
and/or type the relationship
MATCH (n)-[:REL {status:"good"}]->() // add type to speed up relationship retrieval
In fact, with untyped relationships and a relationship property, it would probably make sense (bullet point three) to turn the property into the relationship type instead, so
MATCH (n)-[:GOOD]->() // absent type, it would make sense to use the property as type instead
Your actual queries may look very different, and your question wasn't really about query performance at all :) oh well.

Django - finding the extreme member of each group

I've been playing around with the new aggregation functionality in the Django ORM, and there's a class of problem I think should be possible, but I can't seem to get it to work. The type of query I'm trying to generate is described here.
So, let's say I have the following models -
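import datetime

from django.db import models


class ContactGroup(models.Model):
    # ... whatever fields ...
    pass


class Contact(models.Model):
    group = models.ForeignKey(ContactGroup, on_delete=models.CASCADE)
    name = models.CharField(max_length=20)
    email = models.EmailField()
    # ... other fields ...


class Record(models.Model):
    # One row per saved version of a Contact.
    contact = models.ForeignKey(Contact, on_delete=models.CASCADE)
    group = models.ForeignKey(ContactGroup, on_delete=models.CASCADE)
    record_date = models.DateTimeField(default=datetime.datetime.now)
    # Copies of name, email, and the other fields that are in Contact:
    name = models.CharField(max_length=20)
    email = models.EmailField()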
So, each time a Contact is created or modified, a new Record is created that saves the information as it appears in the contact at that time, along with a timestamp. Now, I want a query that, for example, returns the most recent Record instance for every Contact associated with a ContactGroup. In pseudo-code:
group = ContactGroup.objects.get(...)
records_i_want = group.record_set.most_recent_record_for_every_contact()
Once I get this figured out, I just want to be able to throw a filter(record_date__lt=some_date) on the queryset, and get the information as it existed at some_date.
Anybody have any ideas?
edit: It seems I'm not really making myself clear. Using models like these, I want a way to do the following with pure django ORM (no extra()):
ContactGroup.record_set.extra(where=["history_date = (select max(history_date) from app_record r where r.id=app_record.id and r.history_date <= '2009-07-18')"])
Putting the subquery in the where clause is only one strategy for solving this problem, the others are pretty well covered by the first link I gave above. I know where-clause subselects are not possible without using extra(), but I thought perhaps one of the other ways was made possible by the new aggregation features.
It sounds like you want to keep records of changes to objects in Django.
Pro Django has a section in chapter 11 (Enhancing Applications) in which the author shows how to create a model that tracks inserts, deletes, and updates on another (client) model. The model is generated dynamically from the client definition and relies on signals. The code shows a most_recent() function, but you could adapt it to obtain the object's state on a particular date.
I assume it is the tracking in Django that is problematic, not the SQL to obtain this, right?
First of all, I'll point out that:
ContactGroup.record_set.extra(where=["history_date = (select max(history_date) from app_record r where r.id=app_record.id and r.history_date <= '2009-07-18')"])
will not get you the same effect as:
records_i_want = group.record_set.most_recent_record_for_every_contact()
The first query returns every record associated with a particular group (or associated with any of the contacts of a particular group) that has a record_date on or before the date/time specified in the extra(). Run this in the shell and then do this to review the query Django created:
from django.db import connection
connection.queries[-1]
which reveals:
'SELECT "contacts_record"."id", "contacts_record"."contact_id", "contacts_record"."group_id", "contacts_record"."record_date", "contacts_record"."name", "contacts_record"."email" FROM "contacts_record" WHERE "contacts_record"."group_id" = 1 AND record_date = (select max(record_date) from contacts_record r where r.id=contacts_record.id and r.record_date <= \'2009-07-18\')'
Not exactly what you want, right?
Now, the aggregation feature retrieves aggregated data, not the objects associated with that aggregated data. So if you're trying to use aggregation to minimize the number of queries executed when obtaining group.record_set.most_recent_record_for_every_contact(), you won't succeed.
Without using aggregation, you can get the most recent record for all contacts associated with a group using:
[x.record_set.all().order_by('-record_date')[0] for x in group.contact_set.all()]
Using aggregation, the closest I could get to that was:
from django.db.models import Max
group.record_set.values('contact').annotate(latest_date=Max('record_date'))
The latter returns a list of dictionaries like:
[{'contact': 1, 'latest_date': somedate }, {'contact': 2, 'latest_date': somedate }]
So: one entry for each contact in the given group, with the latest record date associated with it.
Anyway, the minimum number of queries is probably 1 + the number of contacts in the group. If you are interested in obtaining the result using a single query, that is also possible, but you'll have to structure your models differently. That is a separate aspect of your problem, though.
I hope this helps you understand how to approach the problem using aggregation or the regular ORM functions.
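For comparison, here is the kind of SQL the extra() in the question seems to be aiming for, run against an in-memory SQLite table through Python's built-in sqlite3 module. It is a sketch under assumptions: the table and column names mirror the contacts_record table in the generated SQL above, the sample rows are made up, and the subquery is correlated on contact_id rather than id, which is what makes it pick one record per contact as of the cutoff date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE contacts_record (
           id INTEGER PRIMARY KEY,
           contact_id INTEGER,
           group_id INTEGER,
           record_date TEXT,
           name TEXT,
           email TEXT)"""
)
conn.executemany(
    "INSERT INTO contacts_record (contact_id, group_id, record_date, name, email)"
    " VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, "2009-07-01", "Ann", "ann.v1@example.com"),
        (1, 1, "2009-07-15", "Ann", "ann.v2@example.com"),
        (1, 1, "2009-07-20", "Ann", "ann.v3@example.com"),
        (2, 1, "2009-07-10", "Bob", "bob@example.com"),
    ],
)

# Latest record per contact as of a cutoff date. The subquery is correlated
# on contact_id, so it finds each contact's own max(record_date) <= cutoff.
LATEST_AS_OF = """
    SELECT contact_id, record_date, email
    FROM contacts_record AS outer_r
    WHERE group_id = ?
      AND record_date = (
          SELECT max(r.record_date)
          FROM contacts_record AS r
          WHERE r.contact_id = outer_r.contact_id
            AND r.record_date <= ?)
    ORDER BY contact_id
"""
rows = conn.execute(LATEST_AS_OF, (1, "2009-07-18")).fetchall()
print(rows)
# [(1, '2009-07-15', 'ann.v2@example.com'), (2, '2009-07-10', 'bob@example.com')]
```

If two records of the same contact share the exact same record_date, this query returns both of them.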

Hierarchical tagging in SQL

I have a PHP web application which uses a MySQL database for object tagging, in which I've used the tag structure accepted as the answer to this SO question.
I'd like to implement a tag hierarchy, where each tag can have a unique parent tag. Searches for a parent tag T would then match all descendants of T (i.e. T, tags whose parent is T (children of T), grandchildren of T, etc.).
The easiest way of doing this seems to be to add a ParentID field to the tag table, which contains the ID of a tag's parent tag, or some magic number if the tag has no parent. Searching for descendants, however, then requires repeated full searches of the database to find the tags in each 'generation', which I'd like to avoid.
A (presumably) faster, but less normalised way of doing this would be to have a table containing all the children of each tag, or even all the descendants of each tag. This however runs the risk of inconsistent data in the database (e.g. a tag being the child of more than one parent).
Is there a good way to make queries to find descendants fast, while keeping the data as normalised as possible?
I implemented it using two columns. I'm simplifying it a little here, because I actually had to keep the tag name in a separate field/table so it could be localized for different languages:
tag
path
Look at these rows for example:
tag            path
---            ----
database       database/
mysql          database/mysql/
mysql4         database/mysql/mysql4/
mysql4-1       database/mysql/mysql4-1/
oracle         database/oracle/
sqlserver      database/sqlserver/
sqlserver2005  database/sqlserver/sqlserver2005/
sqlserver2008  database/sqlserver/sqlserver2008/
etc.
Using the like operator on the path field you can easily get all needed tag rows:
SELECT * FROM tags WHERE path LIKE 'database/%'
There are some implementation details like when you move a node in the hierarchy you have to change all children too etc., but it's not hard.
Also make sure the path column is long enough. In my case I didn't use the tag name in the path but another field, so that the paths wouldn't get too long.
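A runnable sketch of the path approach using Python's built-in sqlite3 module, assuming a single `tags(tag, path)` table holding the rows above (the answerer's real schema kept names in a separate table). `descendants` is the LIKE query; `move_subtree` is one hypothetical way to handle the "move a node, update all children" detail by rewriting the path prefix of every row under it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tags (tag TEXT PRIMARY KEY, path TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO tags (tag, path) VALUES (?, ?)",
    [
        ("database", "database/"),
        ("mysql", "database/mysql/"),
        ("mysql4", "database/mysql/mysql4/"),
        ("mysql4-1", "database/mysql/mysql4-1/"),
        ("oracle", "database/oracle/"),
        ("sqlserver", "database/sqlserver/"),
        ("sqlserver2005", "database/sqlserver/sqlserver2005/"),
        ("sqlserver2008", "database/sqlserver/sqlserver2008/"),
    ],
)


def descendants(prefix: str) -> list[str]:
    """All tags at or below `prefix`; the trailing '/' keeps 'mysql4/' from matching 'mysql4-1/'."""
    rows = conn.execute(
        "SELECT tag FROM tags WHERE path LIKE ? || '%' ORDER BY tag", (prefix,)
    )
    return [tag for (tag,) in rows]


def move_subtree(old_prefix: str, new_prefix: str) -> None:
    """Re-parent a subtree: swap the old path prefix for the new one on every row under it."""
    conn.execute(
        "UPDATE tags SET path = ? || substr(path, length(?) + 1) "
        "WHERE path LIKE ? || '%'",
        (new_prefix, old_prefix, old_prefix),
    )


print(descendants("database/mysql/"))  # ['mysql', 'mysql4', 'mysql4-1']
```

Two caveats: `%` and `_` inside tag names would be treated as LIKE wildcards, and SQLite's LIKE is case-insensitive for ASCII by default (MySQL's behavior depends on the column collation).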
Ali's answer has a link to Joe Celko's Trees and Hierarchies in SQL for Smarties, which confirms my suspicion - there isn't a simple database structure that offers the best of all worlds. The best for my purpose seems to be the "Frequent Insertion Tree" detailed in this book, which is like the "Nested Set Model" of Ali's link, but with non-consecutive indexing. This allows O(1) insertion (a la unstructured BASIC line numbering), with occasional index reorganisation as and when needed.
A few ways here
You could build what Kimball calls a Hierarchy Helper Table.
Say your hierarchy looks like this: A -> B | B -> C | C -> D
You'd insert records into a table that looks like this:
ParentID, ChildID, Depth, Highest Flag, Lowest Flag
A, A, 0, Y, N
A, B, 1, N, N
A, C, 2, N, N
A, D, 3, N, Y
B, B, 0, N, N
B, C, 1, N, N
B, D, 2, N, Y
C, C, 0, N, N
C, D, 1, N, Y
D, D, 0, N, Y
I think I have that correct... anyway. The point is that you still store your hierarchy properly; you just build this table FROM your proper table. THIS table queries like a banshee. Say you want to know all the nodes one level below B:
WHERE parentID = 'B' and Depth = 1
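A small Python sketch of building the helper table from the parent/child edges, assuming the hierarchy is a tree (each node has at most one parent, no cycles). The flags are filled in the way the table above uses them: Highest = Y when the child has no parent, Lowest = Y when the child has no children. In practice you would write these rows into a real table; a list of tuples stands in for it here.

```python
from collections import deque

# Parent -> child edges from the "proper" hierarchy table: A -> B -> C -> D.
EDGES = [("A", "B"), ("B", "C"), ("C", "D")]


def build_helper_table(edges):
    """Return (parent, child, depth, highest, lowest) rows: one per ancestor/descendant
    pair, plus each node paired with itself at depth 0. Assumes a tree (no cycles)."""
    children = {}
    for parent, child in edges:
        children.setdefault(parent, []).append(child)
    nodes = {node for edge in edges for node in edge}
    has_parent = {child for _parent, child in edges}

    rows = []
    for ancestor in sorted(nodes):
        queue = deque([(ancestor, 0)])  # breadth-first walk down from ancestor
        while queue:
            node, depth = queue.popleft()
            highest = "N" if node in has_parent else "Y"  # child is a root
            lowest = "N" if node in children else "Y"     # child is a leaf
            rows.append((ancestor, node, depth, highest, lowest))
            queue.extend((c, depth + 1) for c in children.get(node, []))
    return rows


HELPER = build_helper_table(EDGES)

# Equivalent of: WHERE ParentID = 'B' AND Depth = 1
first_level_below_b = [child for parent, child, depth, _hi, _lo in HELPER
                       if parent == "B" and depth == 1]
print(first_level_below_b)  # ['C']
```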
I would use some kind of array to store the children tags, this should be a lot faster than joining a table on itself (especially if you have a large number of tags). I had a look, and I can't tell if mysql has a native array data type, but you can emulate this by using a text column and storing a serialized array in it. If you want to speed things up further, you should be able to put a text search index on that column to find out which tags are related.
[Edit]
After reading Ali's article, I did some more hunting and found this presentation on a bunch of approaches for implementing hierarchies in postgres. Might still be helpful for explanatory purposes.