What query do I have to execute to get back all of the wanted results from Memgraph's database?

I have a node label called Page in Memgraph's database and a relationship type called HasLink.
What query do I have to execute to get back all of the pages that are connected with the HasLink relationship?

I tried running
MATCH (p1:Page)-[link:HasLink]->(p2:Page) RETURN p1, link, p2
And it worked. I just needed to check my node and relationship labels. Found out about this as well: the Memgraph docs have a Cypher manual with a lot of useful things for beginners.
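If you need to run the same query from an application, here is a minimal Python sketch over the Bolt protocol, which Memgraph speaks, using the neo4j driver; the URI and the empty credentials are assumptions for a local, unauthenticated instance:

from neo4j import GraphDatabase

# Assumed: a local Memgraph instance on the default Bolt port with no authentication.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("", ""))

query = "MATCH (p1:Page)-[link:HasLink]->(p2:Page) RETURN p1, link, p2"

with driver.session() as session:
    for record in session.run(query):
        # Each record holds the two Page nodes and the HasLink relationship between them.
        print(record["p1"], record["link"], record["p2"])

driver.close()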

Related

OrientDB does not show out relationships correctly when querying from vertex

I am having a strange issue when querying OrientDB for a list of outgoing relationships from a node. Imagine that I have a node #34:1 with 100 outgoing relationships of class CONTACT_OF. I can query OrientDB for these relationships using the vertex or using the edge class, like this:
Using the vertex
SELECT outE('CONTACT_OF') FROM #34:1
As a result OrientDB returns '[]'. This makes no sense at all, since the node has been connected with 100 contacts. I've tried with other kinds of relationships and it works as expected, but for some reason that I don't understand it returns '[]' when querying for CONTACT_OF.
Using the edge class
SELECT FROM CONTACT_OF WHERE out=#34:1
As a result OrientDB returns the 100 records of contacts.
The question is why when executing
SELECT outE('CONTACT_OF') FROM #34:1
the result is an empty array?
Any help would be appreciated, thanks.
EDIT: I am using OrientDB community 2.1-rc3
Here is a sample of the anomaly in OrientDB Studio: http://s8.postimg.org/5p1vxbk45/Captura.png

Edges records not showing up in OrientDB

I discovered OrientDB recently and I've been playing a little with this tool these past few weeks. However, I noticed today that something seemed to be wrong whenever I added an edge between two vertices. The edge record is not present if I run a query such as SELECT FROM E; it just returns an empty set. In spite of this, it is possible to see the relationship as a property in the nodes, and queries like SELECT IN() FROM V do work.
This poses an issue; if I can't access the edge record directly, I can't add more properties to it, and even if I could, I wouldn't be able to see the changes made. I thought this could be a design decision for some reason, but the GratefulDeadConcerts example database doesn't seem to have this problem.
I'll illustrate my question with an example:
Let's create a graph database in OrientDB from scratch and name it "Test". We'll create a couple of vertices:
CREATE VERTEX SET TEST=123
CREATE VERTEX SET TEST=456
Let's assume the #rids of these nodes are #9:0 and #9:1 respectively, as we haven't changed anything from the default settings. Let's create an edge between them:
CREATE EDGE FROM #9:0 TO #9:1
Now, let's take a look at the output of the query SELECT FROM V:
orientdb {Test}> SELECT FROM V
----+----+----+----+----
# |#RID|TEST|out_|in_
----+----+----+----+----
0 |#9:0|123 |#9:1|null
1 |#9:1|456 |null|#9:0
----+----+----+----+----
2 item(s) found. Query executed in 0.005 sec(s).
Everything looks right so far. However, the output of the query SELECT FROM E is simply 0 item(s) found. Query executed in 0.016 sec(s). If we execute SELECT IN() FROM V we get the following:
orientdb {Test}> SELECT IN() FROM V
----+-----+----
# |#RID |IN
----+-----+----
0 |#-2:1|[0]
1 |#-2:2|[1]
----+-----+----
2 item(s) found. Query executed in 0.005 sec(s).
From this, I assume that the edges are created in cluster number -2, even though the default cluster for the class E is 10, and I haven't added any other clusters. I suspect this has something to do with the problem, but I'm not sure how to fix it. I have tried adding new clusters to the class E and creating the edges in this new cluster, but to no avail; I keep getting the exact same result.
So my question is, how do I make edges records show up in OrientDB?
I'm using OrientDB Community 1.7-RC2 and have tried this on two different machines, one running Windows 7 and the other Debian Wheezy.
Extracted from https://github.com/orientechnologies/orientdb/wiki/Troubleshooting#why-i-cant-see-all-the-edges:
OrientDB, by default, manages edges as "lightweight" edges if they have no properties. This means that if an edge has no properties, it's not stored as a physical record. But don't worry, your edge is still there, just encoded in a separate data structure. For this reason, if you execute a SELECT FROM E, no edges (or fewer edges than expected) are returned. It's extremely rare to need the list of edges, but if this is your case you can disable this feature by issuing this command once (at the cost of a slowdown and a bigger database size):
alter database custom useLightweightEdges=false
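If you prefer to apply the setting and verify it from code, here is a minimal sketch using the third-party pyorient client; the host, port, credentials and record ids are assumptions, and note that the setting only affects edges created after it is changed:

import pyorient

# Assumed: a local OrientDB server on the default binary port with these credentials.
client = pyorient.OrientDB("localhost", 2424)
client.connect("root", "root_password")
client.db_open("Test", "admin", "admin")

# Disable lightweight edges so that new edges get physical records.
client.command("ALTER DATABASE custom useLightweightEdges=false")

# Edges created from now on will show up in SELECT FROM E.
client.command("CREATE EDGE FROM #9:0 TO #9:1")
print(client.command("SELECT FROM E"))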

neo4j count nodes performance on 200K nodes and 450K relations

We're developing an application based on neo4j and php with about 200k nodes, where every node has a property like type='user' or type='company' to denote a specific entity of our application. We need to get the count of all nodes of a specific type in the graph.
We created an index for each entity type, like users and companies, which holds the nodes of that type. So the users index holds 130K nodes, and the rest are in companies.
With Cypher we are querying like this:
START u=node:users('id:*')
RETURN count(u)
And the results are
Returned 1 row. Query took 4080 ms
The server is configured with the defaults plus a few tweaks, but 4 seconds is too slow for our needs. Consider that the database will grow by about 20K nodes a month, so we need this query to perform very well.
Is there any other way to do this, maybe with Gremlin, or with some other server plugin?
I'll cache those results, but I want to know if it is possible to tweak this.
Thanks a lot, and sorry for my poor English.
Finally, using Gremlin instead of Cypher, I found the solution.
g.getRawGraph().index().forNodes('NAME_OF_USERS_INDEX').query(
    new org.neo4j.index.lucene.QueryContext('*')
).size()
This method uses the Lucene index to get an "approximate" row count.
Thanks again to all.
Mmh, this is really about the performance of that Lucene index. If you just need this single query most of the time, why not keep an integer with the total count on some node somewhere, update it together with the index insertions, and, for good measure, run an update with the query above on it every night?
You could instead keep a property on a specific node up to date with the number of such nodes, where updates are guarded by write locks:
Transaction tx = db.beginTx();
try {
    ...
    ...
    // Take the write lock so concurrent increments don't lose updates.
    tx.acquireWriteLock( countingNode );
    countingNode.setProperty( "user_count",
        ((Integer) countingNode.getProperty( "user_count" )) + 1 );
    tx.success();
} finally {
    tx.finish();
}
If you want the best performance, don't model your entity categories as properties on the node. Instead, do it like this:
company1-[:IS_ENTITY]->companyentity
Or if you are using 2.0
company1:COMPANY
The second would also allow you to automatically update your index in a separate background thread, by the way; imo one of the best new features of 2.0.
The first method should also prove more efficient, since making a "hop" in general takes less time than reading a property from a node. It does, however, require you to create a separate index for the entities.
Your queries would look like this:
v2.0
MATCH (company:COMPANY)
RETURN count(company)
v1.9
START entity=node:entityindex(value='company')
MATCH company-[:IS_ENTITY]->entity
RETURN count(company)
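For reference, the label-based count can also be issued from application code; here is a minimal sketch with the official Python driver over Bolt (the URI, credentials and label name are assumptions, and the Bolt driver itself is newer than the 1.9/2.0 releases discussed above):

from neo4j import GraphDatabase

# Assumed: a local Neo4j instance with Bolt enabled and these credentials.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

with driver.session() as session:
    # Count all nodes carrying the COMPANY label and pull the single value.
    result = session.run("MATCH (company:COMPANY) RETURN count(company) AS n")
    print(result.single()["n"])

driver.close()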

django objects...values() select only some fields

I'm optimizing the memory load (~2GB, offline accounting and analysis routine) of this line:
l2 = Photograph.objects.filter(**(movie.get_selectors())).values()
Is there a way to convince django to skip certain columns when fetching values()?
Specifically, the routine obtains all rows of the table matching certain criteria (the db is optimized and performs it very quickly), but it is a bit too much for Python to handle: there is a long string referenced in each row, storing the URLs for thumbnails.
I only really need three fields from each row, but, if all the fields are included, each row suddenly consumes about 5 kB, which sadly pushes the RAM to the limit.
The values(*fields) function allows you to specify which fields you want.
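For example, with hypothetical field names on the Photograph model, just to show the call shape:

# Pull only the columns you actually need; each row comes back as a dict.
l2 = Photograph.objects.filter(**movie.get_selectors()).values('id', 'taken_at', 'rating')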
Check out the QuerySet method only(). When you declare that you only want certain fields to be loaded immediately, the QuerySet manager will not pull in the other fields of your object until you try to access them.
If you have to deal with ForeignKeys that must also be pre-fetched, then also check out select_related.
The two links above to the Django documentation have good examples that should clarify their use.
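A minimal sketch of both, again with hypothetical field and relation names:

# only(): defer everything except these columns; deferred fields load lazily on access.
photos = Photograph.objects.filter(**movie.get_selectors()).only('id', 'taken_at', 'rating')

# select_related(): follow a ForeignKey (here a hypothetical 'camera') in the same query.
photos = Photograph.objects.filter(**movie.get_selectors()).select_related('camera')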
Take a look at the Django Debug Toolbar; it comes with a debugsqlshell management command that allows you to see the SQL queries being generated, along with the time taken, as you play around with your models in a Django/Python shell.

What does the ISPALUSER function call in the Msmerge_*_VIEW views do?

I'm trying to understand how SQL Server 2005 replication works, and I'm looking at the views titled MSmerge_[Publication]_[Table]_VIEW. These views seem to define the merge filters and are pretty straightforward, except for one line of SQL in the WHERE clause:
AND ({fn ISPALUSER('1A381615-B57D-4915-BA4B-E16BF7A9AC58')} = 1)
What does the ISPALUSER function do? I cannot seem to find it anywhere under functions in Management Studio, nor really any mention of it on the web.
(The reason I'm looking at these views is that we have a performance issue when a client replicates up new records. SQL like if not exists (select 1 from [MSmerge_[Publication]_[Table]_VIEW] where [rowguid] = #rowguid) is running and taking 10+ seconds per row, which obviously kills performance when you have more than a couple of rows going up.)
Seems it's checking whether the user is in the special security role MSmerge_PAL_role, which appears to govern who has access to the replication functionality.
Therefore, ISPALUSER checks if the user is in that specific role.
Still not sure what PAL stands for.
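If you want to check from a client whether the account you connect with is in that role, here is a minimal sketch using pyodbc and the standard IS_MEMBER() function; the connection string details are assumptions, and this only checks role membership in general, not the ISPALUSER internals themselves:

import pyodbc

# Assumed: connecting to the publication database with Windows authentication.
conn = pyodbc.connect(
    "DRIVER={SQL Server};SERVER=myserver;DATABASE=MyPublicationDb;Trusted_Connection=yes;"
)
cursor = conn.cursor()

# IS_MEMBER returns 1 if the current user belongs to the database role, 0 otherwise.
cursor.execute("SELECT IS_MEMBER('MSmerge_PAL_role')")
print(cursor.fetchone()[0])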