How to count relationships from initial query - optimization

Hi I would like to make a query from a query (if that makes any sense)
My original solution is
PROFILE MATCH (q:Question)-[:TAGGED]-> (:Tag {name:"python"})
CALL { WITH q
MATCH (q:Question)-[:TAGGED]-> (t:Tag)
WITH q, count(t) as c
RETURN c}
RETURN max(c)
The aim is to find all questions q that have a TAGGED relationship to the python tag. For those q nodes, the second objective is to count the number of TAGGED relationships each one has. The goal is to find the maximum number of TAGGED relationships any such question has. The problem is that this is not optimized enough, as I am trying to limit the db hits. Another idea was the following:
MATCH (:Tag {name: 'python'}) <-[:TAGGED]- (q:Question)-[:TAGGED]->(t: Tag)
WITH q, count(t) + 1 AS c
RETURN max(c)
In the first case, I tried to find first the questions that had at least the tag python and then pipeline the questions to count the number of relationships the filtered questions had but this seemed to be worse compared to the second query.
In the second query, PROFILE shows a problem with the expansion: at the stage (q)-[anon_2:TAGGED]->(t) I incur too many db hits.
I'm confused as to how my first query doesn't work as well as the second.

I would try the following query:
MATCH (q:Question)
WHERE (q)-[:TAGGED]-> (:Tag {name:"python"})
WITH q,size((q)-[:TAGGED]->()) AS count
RETURN max(count)

Let's try this one:
PROFILE MATCH (q:Question)-[:TAGGED]-> (t:Tag)
WITH q, collect(t) AS tags
WHERE ANY(tag IN tags WHERE tag.name = 'python')
WITH q, size(tags) AS tagSize
RETURN max(tagSize)
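For reference, the result that all of the queries in this thread aim for can be sketched in plain Python. This is only an illustration on a made-up in-memory dictionary (the `tagged` data and both function names are hypothetical), not a statement about how Neo4j executes anything. It also shows one subtle difference: the single-pattern query with count(t) + 1 needs a second TAGGED relationship, so a question whose only tag is python produces no row there.

```python
# Hypothetical stand-in for the graph: question id -> set of tag names.
tagged = {
    "q1": {"python", "pandas", "numpy"},
    "q2": {"python"},
    "q3": {"java", "spring"},
    "q4": {"python", "django"},
}


def max_tags_filter_then_count(tagged, tag="python"):
    """Keep the questions carrying `tag`, then count all of their tags."""
    counts = [len(tags) for tags in tagged.values() if tag in tags]
    return max(counts, default=None)


def max_tags_two_hop(tagged, tag="python"):
    """Mimic the single-pattern query: count the other tags and add 1."""
    counts = []
    for tags in tagged.values():
        others = tags - {tag}
        # The two-hop pattern requires another TAGGED relationship, so a
        # question tagged only with `tag` contributes nothing here.
        if tag in tags and others:
            counts.append(len(others) + 1)
    return max(counts, default=None)


result_a = max_tags_filter_then_count(tagged)
result_b = max_tags_two_hop(tagged)
print(result_a, result_b)
```

The two approaches agree whenever at least one python question has a second tag, which is the usual case for a maximum.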

Related

Can I rewrite this Cypher query to be compatible with Redis Graph?

My use case is that I have some agents in an organisation structure. For a given agent (possibly me), I want to see the sum (amount of money) of all contracts that the agent's subordinates (and subordinates of their subordinates, and so on) created with clients, grouped by contract category.
The problem is that Redis Graph does not currently support the all predicate. But I need to filter the relationships between agents, because we have multiple "modules" with different organisation structures and I need a report from just one module at a time.
My current Cypher query is:
MATCH path = (:agent {id: 482})<-[:supervised *]-(b:agent)
WHERE all(rel IN relationships(path) WHERE
rel.module_id = 1
AND rel.valid_from < '2020-05-29'
AND '2020-05-29' < rel.valid_to)
WITH b as mediators
MATCH (mediators)-[:mediated]->(c:contract)
RETURN
c.category as category,
count(c) as contract_count,
sum(c.sum) as sum
ORDER BY sum DESC, category
This query works in Neo4j.
I don't even know if this query is correctly written for the type of result that I want.
My boss would really like to use Redis Graph instead of Neo4j for performance reasons, but I can't find any way to rewrite this query so that it works in Redis Graph. Is it even possible?
Edit 1: I was told that we will be using the graph just for currently valid data and just for one module, so I no longer need a working all predicate, but I am still interested in an answer.
The ALL function isn't supported at the moment, though we do intend to add it in the near future. An awkward way of achieving the same effect is a combination of UNWIND and count:
MATCH path = (:agent {id: 482})<-[:supervised *]-(b:agent)
WITH b AS b, relationships(path) AS edges, size(relationships(path)) AS edge_count
UNWIND edges AS r
WITH b AS b, edge_count AS edge_count, r AS r
WHERE r.module_id = 1 AND r.valid_from < '2020-05-29' AND '2020-05-29' < r.valid_to
WITH b AS b, edge_count AS edge_count, count(r) AS filter_edge_count
WHERE edge_count = filter_edge_count
....
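The idea behind the workaround is that "every edge passes the filter" is the same as "the number of edges that pass equals the total number of edges". A minimal Python sketch of that equivalence, assuming edges are plain dictionaries with the same three properties as in the query (the helper names are made up for illustration):

```python
def is_valid(edge, module_id=1, on="2020-05-29"):
    # ISO-formatted date strings compare correctly as plain strings.
    return (edge["module_id"] == module_id
            and edge["valid_from"] < on < edge["valid_to"])


def all_edges_valid(edges):
    """What the all(...) predicate expresses directly."""
    return all(is_valid(e) for e in edges)


def all_edges_valid_by_count(edges):
    """The same check, written the way the UNWIND/count workaround does it."""
    return len(edges) == sum(1 for e in edges if is_valid(e))


good_path = [
    {"module_id": 1, "valid_from": "2019-01-01", "valid_to": "2021-01-01"},
    {"module_id": 1, "valid_from": "2020-01-01", "valid_to": "2020-12-31"},
]
# Second edge belongs to another module, so the whole path is rejected.
bad_path = [
    {"module_id": 1, "valid_from": "2019-01-01", "valid_to": "2021-01-01"},
    {"module_id": 2, "valid_from": "2020-01-01", "valid_to": "2020-12-31"},
]

for path in (good_path, bad_path):
    print(all_edges_valid(path), all_edges_valid_by_count(path))
```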

SQL counting number of rows

I am looking for a way to search for a certain number of rows as a quality check. For example, we have tables that have a certain set of results that are needed.
Here is a quick table for an example:
ID          Name  Result  Reportable
ONE         A     10      X
TWO         B     12      X
THREE       C     1
FOUR        D     18      X
FOUR(redo)  D     11      X
So we are looking to double check results as there are people who accidentally report results multiple times (as in the case with ID FOUR). We have used having counts but we need the numbers to be specific and need a query to verify that number is satisfied.
In the table above we only want IDs ONE, TWO, and FOUR, however we have 4 results (one extra). Currently our check shows the count needed (i.e. 3) and the current result count (4) to show the mismatch, but we want a query that shows only the results needed. We need the redo result most of the time, so we take the latest date, but that doesn't help filter how many rows or results there are. I apologize if anything is confusing; I am not able to share the SQL query that we currently have. It's my first time posting, so if I need to clarify anything please let me know, as this seems to be very complicated. Thank you for your time.
EDIT: The details
We have one table (Table A) letting us know which results are reportable. The ones that are reportable go into another table (Table B). We have had issues in which people have made too many results reportable which overpopulates the Table B. Our old query had a count in Table B, but due to mistakes in people placing multiple reportables, samples which had many redos seem to be finished as they were all placed and met the count in Table B.
So now by using the Table A that helps tell us how many are Reportable, we want this to double check that the samples are indeed ready.
As I understand the question, you want ids that have multiple reportables. Assuming you really mean name, then:
select name
from t
where reportable = 'X'
group by name
having count(*) >= 2;
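To see what this returns on the sample data from the question, here is a small sketch using Python's built-in sqlite3 module. The table name t and the column names come from the answer; the column types and the use of NULL for the empty Reportable cell are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id TEXT, name TEXT, result INTEGER, reportable TEXT)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        ("ONE", "A", 10, "X"),
        ("TWO", "B", 12, "X"),
        ("THREE", "C", 1, None),  # not reportable (assumed to be NULL)
        ("FOUR", "D", 18, "X"),
        ("FOUR(redo)", "D", 11, "X"),
    ],
)

# Names that were marked reportable more than once.
duplicates = conn.execute(
    """
    SELECT name, COUNT(*) AS n
    FROM t
    WHERE reportable = 'X'
    GROUP BY name
    HAVING COUNT(*) >= 2
    """
).fetchall()
print(duplicates)
```

Only name D comes back, with a count of 2, which is the sample that was reported twice.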

Sum two counts in a new column without repeating the code

I have one maybe stupid question.
Look at the query :
select count(a) as A, count(b) as b, count(a)+count(b) as C
From X
How can I sum up the two columns without repeating the code:
Something like:
select count(a) as A, count(b) as b, A+B as C
From X
For the sake of completeness, using a CTE:
WITH V AS (
SELECT COUNT(a) as A, COUNT(b) as B
FROM X
)
SELECT A, B, A + B as C
FROM V
This can easily be handled by making the engine perform only two aggregate functions and a scalar computation. Try this.
SELECT A, B, A + B as C
FROM (
SELECT COUNT(a) as A, COUNT(b) as B
FROM X
) T
You can get the two individual counts from the same table and then add them, as below:
SELECT
(SELECT COUNT(a) FROM X )+
(SELECT COUNT(b) FROM X )
AS C
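A quick way to convince yourself that these forms describe the same result is to run them against a tiny table. The sketch below uses Python's sqlite3 module; the sample rows and the aliases cnt_a and cnt_b are made up for the illustration, and other database engines may behave differently with respect to alias handling.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE X (a INTEGER, b INTEGER)")
conn.executemany(
    "INSERT INTO X VALUES (?, ?)",
    [(1, 10), (2, None), (3, None), (None, 40)],
)

# Original form: the count expressions are repeated.
repeated = conn.execute(
    "SELECT COUNT(a), COUNT(b), COUNT(a) + COUNT(b) FROM X"
).fetchone()

# CTE form.
cte = conn.execute(
    """
    WITH V AS (SELECT COUNT(a) AS cnt_a, COUNT(b) AS cnt_b FROM X)
    SELECT cnt_a, cnt_b, cnt_a + cnt_b FROM V
    """
).fetchone()

# Derived-table (subquery) form.
derived = conn.execute(
    """
    SELECT cnt_a, cnt_b, cnt_a + cnt_b
    FROM (SELECT COUNT(a) AS cnt_a, COUNT(b) AS cnt_b FROM X) T
    """
).fetchone()

print(repeated, cte, derived)
```

COUNT(column) skips NULLs, so the three non-null values in a and the two in b give a total of 5 in every form.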
Let's agree on one point: SQL is not an Object-Oriented language. In fact, when we think of computer languages, we are thinking of procedural languages (you use the language to describe step by step how you want the data to be manipulated). SQL is declarative (you describe the desired result and the system works out how to get it).
When you program in a procedural language your main concerns are: 1) is this the best algorithm to arrive at the correct result? and 2) do these steps correctly implement the algorithm?
When you program in a declarative language your main concern is: is this the best description of the desired result?
In SQL, most of your effort will go into correctly forming the filtering criteria (the where clause) and the join criteria (any on clauses). Once that is done correctly, you're pretty much just down to aggregating and formatting (if applicable).
The first query you show is perfectly formed. You want the number of all the non-null values in A, the number of all the non-null values in B, and the total of both of those amounts. In some systems, you can even use the second form you show, which does nothing more than abstract away the count(x) text. This is convenient in that if you should have to change a count(x) to sum(x), you only have to make a change in one place rather than two, but it doesn't change the description of the data -- and that is important.
Using a CTE or nested query may allow you to mimic the abstraction not available in some systems, but be careful making cosmetic changes -- changes that do not alter the description of the data. If you look at the execution plan of the two queries as you show them, the CTE and the subquery, in most systems they will probably all be identical. In other words, you've painted your car a different color, but it's still the same car.
But since it now takes you two distinct steps in 4 or 5 lines to explain what it originally took only one step in one line to express, it's rather difficult to defend the notion that you have made an improvement. In fact, I'll bet you can come up with a lot more bullet points explaining why it would be better if you had started with the CTE or subquery and should change them to your original query than the other way around.
I'm not saying that what you are doing is wrong. But in the real world, we are generally short of the spare time to spend on strictly cosmetic changes.

SQLite design question

Similar to a feed reader, I'm storing a bunch of articles, each pertaining to a source (feed) and each feed can belong to a category. What I'm trying to do is:
Retrieve the articles of the feeds that belong to a certain category.
Group the articles. One scenario would be by date(published_time), so that I have groups, for example: (12.04.09 - 3 articles, 17.04.09 - 9 articles, and so on)
Loop through each group and display each article. Pseudo-code:
foreach (Group group in results)
{
    print(group.Name);
    foreach (Article article in group.Articles)
    {
        print(article.Title);
        print(article.Content);
    }
}
I thought something simple like:
SELECT group_concat(item_id, '#') FROM items GROUP BY date(published_time)
would work. But then I'd have to split the resulting rows and loop through that (and there is no group_concat(*) function)
I'm confused as to how I would group(2) the results so that I can iterate through each one, preserving the group name. I thought that a SQL query returns ONE big table, and so, it seems to be impossible to accomplish this with just one query.
I reckon this is more of a DB design question, I'm also new to SQLite (SQL for that matter), so I ask you, gurus, how would one get this done efficiently?
SELECT Title, Content, date(published_time) AS Date
FROM items
ORDER BY date(published_time);
Pseudocode:
last = None
for r in results:
    # Print a group header whenever the date changes
    if last is None or r.Date != last.Date:
        print("Group", r.Date)
    print(r.Title, r.Content)
    last = r
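The same "one sorted result set, group while iterating" idea can be sketched end to end with Python's sqlite3 module and itertools.groupby. The items table, its columns other than item_id and published_time, and the sample rows are assumptions made for the example; the category filter from the question is left out because the schema for feeds and categories was not shown.

```python
import sqlite3
from itertools import groupby

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items ("
    "item_id INTEGER PRIMARY KEY, title TEXT, content TEXT, "
    "published_time TEXT)"
)
conn.executemany(
    "INSERT INTO items (title, content, published_time) VALUES (?, ?, ?)",
    [
        ("First", "aaa", "2009-04-12 08:00:00"),
        ("Second", "bbb", "2009-04-12 17:30:00"),
        ("Third", "ccc", "2009-04-17 09:15:00"),
    ],
)

# One flat, sorted result set; groupby only works on sorted input.
rows = conn.execute(
    """
    SELECT date(published_time) AS day, title, content
    FROM items
    ORDER BY day, item_id
    """
).fetchall()

# Group consecutive rows that share the same day.
groups = {
    day: [(title, content) for _, title, content in grp]
    for day, grp in groupby(rows, key=lambda r: r[0])
}

for day, articles in groups.items():
    print(day, "-", len(articles), "articles")
    for title, content in articles:
        print("  ", title, content)
```

This keeps the query simple and leaves the grouping to the client, so no splitting of group_concat output is needed.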

LINQ exclusion

Is there a direct LINQ syntax for finding the members of set A that are absent from set B? In SQL I would write this
SELECT A.* FROM A LEFT JOIN B ON A.ID = B.ID WHERE B.ID IS NULL
See the MSDN documentation on the Except operator.
var results = from itemA in A
where !B.Any(itemB => itemB.Id == itemA.Id)
select itemA;
I believe your LINQ would be something like the following.
var items = A.Except(
from itemA in A
from itemB in B
where itemA.ID == itemB.ID
select itemA);
Update
As indicated by Maslow in the comments, this may well not be the most performant query. As with any code, it is important to carry out some level of profiling to remove bottlenecks and inefficient algorithms. In this case, chaowman's answer provides a better performing result.
The reasons can be seen with a little examination of the queries. In the example I provided, there are at least two loops over the A collection - 1 to combine the A and B list, and the other to perform the Except operation - whereas in chaowman's answer (reproduced below), the A collection is only iterated once.
// chaowman's solution only iterates A once and partially iterates B
var results = from itemA in A
where !B.Any(itemB => itemB.Id == itemA.Id)
select itemA;
Also, in my answer, the B collection is iterated in its entirety for every item in A, whereas in chaowman's answer, it is only iterated up to the point at which a match is found.
As you can see, even before looking at the SQL generated, you can spot potential performance issues just from the query itself. Thanks again to Maslow for highlighting this.
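For comparison, the iteration-count argument can be taken one step further for in-memory collections. The sketch below is in Python rather than LINQ and uses a different technique from both queries above: it collects the IDs of B into a hash set once and then makes a single pass over A. The Item class and the absent_from function are hypothetical names for the illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    id: int
    name: str


def absent_from(a_items, b_items):
    """Return the members of A whose id does not occur in B."""
    # B is iterated exactly once to build the lookup set.
    b_ids = {item.id for item in b_items}
    # A is iterated exactly once; each membership test is a hash lookup.
    return [item for item in a_items if item.id not in b_ids]


A = [Item(1, "one"), Item(2, "two"), Item(3, "three")]
B = [Item(2, "deux"), Item(4, "quatre")]

missing = absent_from(A, B)
print([item.id for item in missing])
```

Whether this matters in practice depends on the sizes of A and B and on whether the query runs in memory or is translated to SQL, so profiling is still the deciding factor.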