Can't display paths for matched nodes with multiple incoming relationships in Cypher - cypher

I have a Neo4j graph based on this pattern: (:Entity)-[:HAS_VALUE]->(:Value)-[:HAS_SYNONYM]->(:Synonym).
Each Entity can have multiple Values, and each Value can have multiple Synonyms. I'm looking for ambiguous synonyms -- Synonym nodes with more than 1 Value node connected to them.
I can find the offending Synonym nodes as follows:
MATCH ()-[r:HAS_SYNONYM]->(n) WITH n, count(r) as num_values WHERE num_values > 1 RETURN n
However, when I try to display the paths (so that I can see the interconnections with the offending Values), the browser gives me "(no changes, no records)". Here are some of the queries I tried:
MATCH (n1:Value)-[r:HAS_SYNONYM]->(n2:Synonym) WITH n1, n2, r, count(r) as num_values WHERE num_values > 1 RETURN n1, r, n2
MATCH (n1)-[r:HAS_SYNONYM]->(n2) WITH n1, n2, r, count(r) as num_values WHERE num_values > 1 RETURN n1, r, n2
MATCH p = (n)-[r:HAS_SYNONYM]->() MATCH (n)-[r:HAS_SYNONYM]->() WITH p, n, count(r) as num_values WHERE num_values > 1 RETURN nodes(p)
Is my syntax wrong, or is it more fundamental?

The problem with your queries is the WITH clause. When you write
WITH n1, n2, r, count(r), the count is computed for each distinct combination of n1, n2, and r, which is always 1, hence the "(no changes, no records)" result.
Instead, calculate the count per n2 node first, filter to the synonyms with a count greater than 1, and then fetch the paths for those synonyms.
Like this:
MATCH (:Value)-[r:HAS_SYNONYM]->(n2:Synonym) WITH n2, count(r) as num_values WHERE num_values > 1
MATCH (n2)<-[rel:HAS_SYNONYM]-(v:Value)
RETURN v, rel, n2
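If you want the browser to draw the whole paths (so the interconnections with the offending Values are visible), a small variation on the query above captures them as path variables; treat this as a sketch along the same lines:
// Aggregate per synonym first, keep only the ambiguous ones,
// then match the incoming HAS_SYNONYM paths and return them whole.
MATCH (:Value)-[r:HAS_SYNONYM]->(n2:Synonym)
WITH n2, count(r) AS num_values
WHERE num_values > 1
MATCH p = (n2)<-[:HAS_SYNONYM]-(:Value)
RETURN p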

Related

How to check if all items in a group are contained in a string for many items? - RegEx

I'm planning on embedding RegEx in my SQL query so can't really use a loop for this.
Essentially, I'm trying to check a series of groups to see if the name of my column contains all the attributes of any of those individual groups (i.e. each group must be checked against independently, not all together).
For example,
group1 = l, w, q
group2 = o, l, d
If the column name contains all the items in group1 (e.g. low_quality), it should return true. Similarly, cold would also return true, since it matches every item in group2.
How would I go about validating whether or not my column name contains every item in any group? (items can be >1 character and I have around 40 groups to test against).
Could something along the following lines be modified?
SELECT column_name,
(CASE WHEN REGEXP_LIKE(column_name, ((?=l)(?=w)(?=q))|((?=o)(?=l)(?=d))) THEN true ELSE NULL
END) AS flag
FROM information_schema.columns;
I'm just not sure how to check against multiple groups independently.
You can use
REGEXP_LIKE(column_name, '^(?:(?=.*l)(?=.*w)(?=.*q)|(?=.*o)(?=.*l)(?=.*d))')
Details:
^ - start of string
(?: - a non-capturing group matching either of the two patterns:
(?=.*l)(?=.*w)(?=.*q) - lookaheads requiring l, w and q to each appear somewhere in the string, in any order
| - or
(?=.*o)(?=.*l)(?=.*d) - lookaheads requiring o, l and d to each appear somewhere in the string, in any order
) - end of the group.
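Plugged into the query shape from the question, it might look like the following. This is a sketch: whether REGEXP_LIKE accepts lookaheads, and whether a bare true literal is valid, both depend on the database engine, so adjust for yours. Extending it to ~40 groups is just a matter of appending one (?=.*item1)(?=.*item2)... alternative per group.
SELECT column_name,
       -- flag is true when the name contains every item of group1 (l, w, q)
       -- or every item of group2 (o, l, d), in any order
       (CASE WHEN REGEXP_LIKE(column_name,
                              '^(?:(?=.*l)(?=.*w)(?=.*q)|(?=.*o)(?=.*l)(?=.*d))')
             THEN true ELSE NULL
        END) AS flag
FROM information_schema.columns;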

max of a max of a function for subset of ids in single query?

How do I turn the following code into a single query?
The table is nodes(id, value). I want to get back the max of the max of a function applied for a subset of nodes against the whole table.
This is pseudocode; the DB is PostgreSQL.
#select the nodes, filtered by some criteria
Nodes = select id,value from nodes where ....
#for every node.value find the max of fun() applied to the whole table, collect it
FOR n IN Nodes :
Maxes.append(
select s.id, MAX(fun(n.value, s.value))
from nodes s
where s.id != n.id
)
#find the Max-score&Id of the collected Max scores
ID,score = MAX(Maxes)
In SQL you just call max(); it's an aggregate function and runs against the whole data set you feed it, so your query would be as simple as:
select max(fun(n1.value, n2.value))
from nodes n1
join nodes n2
on n1.id <> n2.id
where ....
Of course, you need to define the fun() function in your PostgreSQL database in advance.
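For completeness, here is a minimal sketch. The body of fun() is a made-up placeholder (substitute your real scoring logic), and the ORDER BY ... LIMIT 1 at the end is one way to also get back the ids behind the winning score, as in the ID, score pair of the pseudocode:
-- Hypothetical placeholder for fun(); substitute your real logic.
CREATE FUNCTION fun(a numeric, b numeric) RETURNS numeric AS $$
    SELECT abs(a - b);
$$ LANGUAGE sql IMMUTABLE;

-- Score every pair of distinct nodes and keep the best one.
SELECT n1.id AS id1, n2.id AS id2, fun(n1.value, n2.value) AS score
FROM nodes n1
JOIN nodes n2 ON n1.id <> n2.id
-- WHERE ...   (apply the same subset filter as in the pseudocode)
ORDER BY score DESC
LIMIT 1;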

Make a query that prints unique pairs?

Say you have a relation with the schema numbers(numtype, ccnum), where numtype and ccnum together form a key. How would you print all pairs of ccnums that have the same numtype? I have thought about something like this:
SELECT N1.ccnum AS cc1, N2.ccnum AS cc2
FROM numbers AS N1, numbers AS N2
WHERE N1.numtype = N2.numtype AND N1.ccnum <> N2.ccnum
that is, taking the product of two numbers relations on the condition given in the WHERE clause. The problem (there may be more, so if you see one, please point it out :) ) is that pairs would be printed twice, in the form (a, b) and (b, a). I only want one of those. How would you write the query?
Use < rather than <>:
SELECT N1.ccnum AS cc1, N2.ccnum AS cc2
FROM numbers N1 JOIN
numbers N2
ON N1.numtype = N2.numtype AND N1.ccnum < N2.ccnum;
Notice that I replaced the comma with JOIN and the WHERE with ON. I would advise you to learn modern, explicit, standard, readable JOIN syntax.
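To see why < removes the duplicates, here is a small self-contained example with made-up values (PostgreSQL-style VALUES syntax; adjust for your database):
-- Two ccnums sharing a numtype: <> would return both (111, 222) and (222, 111);
-- < keeps only the row where cc1 < cc2.
WITH numbers (numtype, ccnum) AS (
    VALUES ('visa', 111), ('visa', 222)
)
SELECT N1.ccnum AS cc1, N2.ccnum AS cc2
FROM numbers N1 JOIN
     numbers N2
     ON N1.numtype = N2.numtype AND N1.ccnum < N2.ccnum;
-- Result: a single row (111, 222)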

NTH in Legacy SQL in BigQuery doesn't work as expected

I have this query written in Legacy SQL:
select
nth(1, a) first_a,
nth(1, b) first_b
from (
select *
from
(select 12 a, null b),
(select null a, 54 b)
)
As a result I was expecting one row with values (12, null), but I got (12, 54) instead. In the documentation for NTH it says:
NTH(n, field)
Returns the nth sequential value in the scope of the function, where n is a constant. The NTH function starts counting at 1, so there is no zeroth term. If the scope of the function has less than n values, the function returns NULL.
There is nothing indicating that nulls would be ignored.
Is this a bug in BigQuery?
This is the important part in the documentation:
in the scope of the function
The scope is normally a "record" (in legacy SQL terms), where you fetch the nth value within a repeated field. As written, though, this query has the effect of using NTH as an aggregate function. The values in the group have no well-defined order, but it so happens that NULL is ordered after the non-null values, so NTH(1, ...) gives a non-null value. Try using 2 as the ordinal instead, for instance:
select
nth(2, a) first_a,
nth(2, b) first_b
from (
select *
from
(select 12 a, null b),
(select null a, 54 b)
)
This returns null, null as output.
With that said, to ensure well-defined semantics in your queries, the best option is to use standard SQL instead. Some standard SQL analogues of the NTH function are listed below, with a short sketch after the list:
The array bracket operator, e.g. array_column[OFFSET(0)] to get the first element in an array.
The NTH_VALUE window function, e.g. NTH_VALUE(x, 1) OVER (PARTITION BY y ORDER BY z). See also FIRST_VALUE and LAST_VALUE.
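For example, the original query could be rewritten in standard SQL with NTH_VALUE. This is just a sketch: the ord column is invented here to give the rows a defined order, something the legacy query never had.
-- Standard SQL: an explicit ordering column makes "first" well defined.
SELECT
  NTH_VALUE(a, 1) OVER (ORDER BY ord ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS first_a,
  NTH_VALUE(b, 1) OVER (ORDER BY ord ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS first_b
FROM (
  SELECT 1 AS ord, 12 AS a, NULL AS b UNION ALL
  SELECT 2, NULL, 54
)
LIMIT 1;
-- Every row carries the same windowed values, so LIMIT 1 returns (12, NULL).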

How to select only those rows which have more than one of given fields with values

Is there some elegant way to do that, without a big WHERE with lots of AND and OR? For example there are 4 columns: A, B, C, D. For each row the columns have random integer values. I need to select only those rows which have more than one column with a non-zero value. For example (1,2,3,4) and (3,4,0,0) should get selected, however (0,0,7,0) should not be selected (there are no rows that have zeros only).
PS. I know how this looks, but the funny thing is that this is not an exam question or something; it's a real query which I need to use in a real app :D
SELECT *
FROM mytable
WHERE (0, 0, 0) NOT IN ((a, b, c), (a, b, d), (a, c, d), (b, c, d))
This, I believe, is the shortest way, though not necessarily the most efficient. It works because a row fails the test only when some three of the four columns are all zero, i.e. when at most one column is non-zero.
There. No WHERE, no OR and no AND:
SELECT
`table`.*,
IF(`column1` != 0,1,0) +
IF(`column2` != 0,1,0) +
IF(`column3` != 0,1,0) +
IF(`column4` != 0,1,0) AS `results_sum`
FROM `table`
HAVING
`results_sum` > 1
Try
select *
from mytable t
where ( abs(sign(A))
      + abs(sign(B))
      + abs(sign(C))
      + abs(sign(D))
      ) > 1
Each abs(sign(x)) term is 1 when the column is non-zero and 0 when it is zero, so the sum counts the non-zero columns and > 1 keeps only rows with more than one of them.
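A more portable variant of the same counting idea, using standard CASE expressions instead of the MySQL-specific IF() (assuming the table is called mytable, as in the first answer):
-- Each CASE term contributes 1 for a non-zero column, 0 otherwise;
-- the row is kept when more than one column is non-zero.
SELECT *
FROM mytable
WHERE (CASE WHEN A <> 0 THEN 1 ELSE 0 END)
    + (CASE WHEN B <> 0 THEN 1 ELSE 0 END)
    + (CASE WHEN C <> 0 THEN 1 ELSE 0 END)
    + (CASE WHEN D <> 0 THEN 1 ELSE 0 END) > 1;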