SQL Query - Check for Two Distinct Values - sql

Given the below data set I want to run a query to highlight any 'pairs' that do not consist of a 'left' and 'right'.
+---------+-----------+---------------+----------------------+
| Pair_Id | Pair_Name | Individual_Id | Individual_Direction |
+---------+-----------+---------------+----------------------+
| 1 | A | A1 | Left |
| 1 | A | A2 | Right |
| 2 | B | B1 | Right |
| 2 | B | B2 | Left |
| 3 | C | C1 | Left |
| 3 | C | C2 | Left |
| 4 | D | D1 | Right |
| 4 | D | D2 | Left |
| 5 | E | E1 | Left |
| 5 | E | E2 | Right |
+---------+-----------+---------------+----------------------+
In this instance Pair 3 'C' has two lefts. Therefore, I would look to display the following:
+---------+-----------+---------------+----------------------+
| Pair_Id | Pair_Name | Individual_Id | Individual_Direction |
+---------+-----------+---------------+----------------------+
| 3 | C | C1 | Left |
| 3 | C | C2 | Left |
+---------+-----------+---------------+----------------------+

You can simply use not exists:
select t.*
from t
where not exists (select 1
from t t2
where t2.pair_id = t.pair_id and
t2.Individual_Direction <> t.Individual_Direction
) ;
With an index on (pair_id, Individual_Direction), this should not only be the most concise solution but also the fastest.
If you want to be sure that there are pairs (the above returns singletons):
select t.*
from t
where not exists (select 1
from t t2
where t2.pair_id = t.pair_id and
t2.Individual_Direction <> t.Individual_Direction
) and
exists (select 1
from t t2
where t2.pair_id = t.pair_id and
t2.Individual_ID <> t.Individual_ID
);
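The difference between the two queries above is easiest to see on a small data set. The following is only a sketch: it assumes SQLite (through Python's sqlite3 module) rather than the asker's database, uses a subset of the question's rows, and adds a hypothetical single-row pair 6 'F' that is not in the question.

```python
import sqlite3

# Assumption: SQLite stands in for the asker's database; table name `t` follows the answer.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (Pair_Id INTEGER, Pair_Name TEXT, "
    "Individual_Id TEXT, Individual_Direction TEXT)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        (1, "A", "A1", "Left"), (1, "A", "A2", "Right"),
        (3, "C", "C1", "Left"), (3, "C", "C2", "Left"),
        (6, "F", "F1", "Right"),  # hypothetical singleton, not in the question
    ],
)

# No other row in the same pair has a different direction.
no_other_direction = """
    NOT EXISTS (SELECT 1 FROM t t2
                WHERE t2.Pair_Id = t.Pair_Id
                  AND t2.Individual_Direction <> t.Individual_Direction)
"""
# The pair contains at least one other individual.
has_partner = """
    EXISTS (SELECT 1 FROM t t2
            WHERE t2.Pair_Id = t.Pair_Id
              AND t2.Individual_Id <> t.Individual_Id)
"""

# First query: singletons are included.
first = conn.execute(
    f"SELECT Individual_Id FROM t WHERE {no_other_direction} ORDER BY Individual_Id"
).fetchall()

# Second query: singletons are filtered out.
second = conn.execute(
    f"SELECT Individual_Id FROM t WHERE {no_other_direction} AND {has_partner} "
    "ORDER BY Individual_Id"
).fetchall()

print(first)   # [('C1',), ('C2',), ('F1',)]
print(second)  # [('C1',), ('C2',)]
```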
You can also do this using window functions:
select t.*
from (select t.*,
             count(*) over (partition by pair_id) as cnt,
             min(Individual_Direction) over (partition by pair_id) as min_direction,
             max(Individual_Direction) over (partition by pair_id) as max_direction
      from t
     ) t
where cnt > 1 and min_direction = max_direction;

One option uses aggregation:
WITH cte AS (
SELECT Pair_Name
FROM yourTable
WHERE Individual_Direction IN ('Left', 'Right')
GROUP BY Pair_Name
HAVING MIN(Individual_Direction) = MAX(Individual_Direction)
)
SELECT *
FROM yourTable
WHERE Pair_Name IN (SELECT Pair_Name FROM cte);
The HAVING clause used above asserts that a matching pair has both a minimum and maximum direction which are the same. This implies that such a pair only has one direction.
As is the case with Gordon's answer, an index on (Pair_Name, Individual_Direction) might help performance:
CREATE INDEX idx ON yourTable (Pair_Name, Individual_Direction);
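To see why comparing MIN and MAX works, it can help to look at the per-pair values before the HAVING filter is applied. This is a sketch only, assuming SQLite via Python's sqlite3 module and the first three pairs from the question; the column aliases min_dir and max_dir are made up for the illustration.

```python
import sqlite3

# Assumption: SQLite stands in for the real database; rows are pairs A, B, C from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourTable (Pair_Name TEXT, Individual_Id TEXT, Individual_Direction TEXT)"
)
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?, ?)",
    [
        ("A", "A1", "Left"), ("A", "A2", "Right"),
        ("B", "B1", "Right"), ("B", "B2", "Left"),
        ("C", "C1", "Left"), ("C", "C2", "Left"),
    ],
)

# Per-pair minimum and maximum direction ('Left' sorts before 'Right').
per_pair = conn.execute(
    """
    SELECT Pair_Name,
           MIN(Individual_Direction) AS min_dir,
           MAX(Individual_Direction) AS max_dir
    FROM yourTable
    GROUP BY Pair_Name
    ORDER BY Pair_Name
    """
).fetchall()
print(per_pair)  # [('A', 'Left', 'Right'), ('B', 'Left', 'Right'), ('C', 'Left', 'Left')]

# The HAVING clause keeps only the pairs where both values coincide.
single_direction = conn.execute(
    """
    SELECT Pair_Name
    FROM yourTable
    GROUP BY Pair_Name
    HAVING MIN(Individual_Direction) = MAX(Individual_Direction)
    """
).fetchall()
print(single_direction)  # [('C',)]
```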

There should be a more elegant way of using window functions than what I wrote:
WITH ranked AS
(
SELECT *, RANK() OVER(ORDER BY Pair_Id, Pair_Name, Individual_Direction) AS r
FROM pairs
),
counted AS
(
SELECT Pair_Id, Pair_Name, Individual_Direction,r, COUNT(r) as times FROM ranked
GROUP BY Pair_Id, Pair_Name, Individual_Direction, r
HAVING COUNT(r) > 1
)
SELECT ranked.Pair_Id, ranked.Pair_Name, ranked.Individual_Id, ranked.Individual_Direction FROM ranked
RIGHT JOIN counted
ON ranked.Pair_Id=counted.Pair_Id
AND ranked.Pair_Name=counted.Pair_Name
AND ranked.Individual_Direction=counted.Individual_Direction

Related

Grouping data using PostgreSQL based on 2 fields

I have a problem grouping data in PostgreSQL. Let's say I have a table called my_table:
some_id | description | other_id
---------|-----------------|-----------
1 | description-1 | a
1 | description-2 | b
2 | description-3 | a
2 | description-4 | a
3 | description-5 | a
3 | description-6 | b
3 | description-7 | b
4 | description-8 | a
4 | description-9 | a
4 | description-10 | a
...
I would like to group the rows by some_id, then separate the groups in which every row has the same other_id from the groups that contain different other_id values.
I expect two queries: one returning the groups with a single other_id and one returning the groups with differing other_id values.
Expected result:
some_id | description | other_id
---------|-----------------|-----------
2 | description-3 | a
2 | description-4 | a
4 | description-8 | a
4 | description-9 | a
4 | description-10 | a
AND
some_id | description | other_id
---------|-----------------|-----------
1 | description-1 | a
1 | description-2 | b
3 | description-5 | a
3 | description-6 | b
3 | description-7 | b
I am open to suggestions using either Sequelize or a raw query.
Thank you.
One approach, using MIN and MAX as analytic functions:
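-- every other_id within a some_id is the same
WITH cte AS (
    SELECT *, MIN(other_id) OVER (PARTITION BY some_id) AS min_other_id,
              MAX(other_id) OVER (PARTITION BY some_id) AS max_other_id
    FROM yourTable
)
SELECT some_id, description, other_id
FROM cte
WHERE min_other_id = max_other_id;
-- at least two different other_id values within a some_id
-- (a CTE is scoped to a single statement, so it is repeated here)
WITH cte AS (
    SELECT *, MIN(other_id) OVER (PARTITION BY some_id) AS min_other_id,
              MAX(other_id) OVER (PARTITION BY some_id) AS max_other_id
    FROM yourTable
)
SELECT some_id, description, other_id
FROM cte
WHERE min_other_id <> max_other_id;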
You can also do this using exists and not exists:
-- all same
select t.*
from my_table t
where not exists (select 1
from my_table t2
where t2.some_id = t.some_id and t2.other_id <> t.other_id
);
-- any different
select t.*
from my_table t
where exists (select 1
from my_table t2
where t2.some_id = t.some_id and t2.other_id <> t.other_id
);
Note that this ignores NULL values. If you want them treated as a "different" value then use is distinct from rather than <>.
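The effect of NULL on the "all same" query can be checked with a small experiment. This sketch assumes SQLite via Python's sqlite3 module, where the null-safe comparison is spelled IS NOT (PostgreSQL's equivalent is IS DISTINCT FROM); the some_id 5 rows, one of them with a NULL other_id, are hypothetical and not part of the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (some_id INTEGER, description TEXT, other_id TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [
        (2, "description-3", "a"), (2, "description-4", "a"),
        (5, "description-11", "a"), (5, "description-12", None),  # hypothetical NULL row
    ],
)

def all_same(comparison):
    """Run the NOT EXISTS query with the given inequality operator."""
    sql = f"""
        SELECT t.some_id, t.description
        FROM my_table t
        WHERE NOT EXISTS (SELECT 1
                          FROM my_table t2
                          WHERE t2.some_id = t.some_id
                            AND t2.other_id {comparison} t.other_id)
        ORDER BY t.some_id, t.description
    """
    return conn.execute(sql).fetchall()

# With <>, a comparison against NULL is never true, so group 5 looks "all same".
with_ne = all_same("<>")
# With SQLite's null-safe IS NOT, NULL counts as a different value.
with_null_safe = all_same("IS NOT")

print(with_ne)
print(with_null_safe)
```

With `<>` all four rows come back; with the null-safe operator only the two rows of some_id 2 remain.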

Iterate over the rows of a second table to return resultset with cumulative sum

Yesterday, with the help of an SO user on the question "Iterate over the rows of a second table to return resultset", I was able to build combinations of rows with a self-join.
After some modifications to adapt it to my implementation, I am stuck on a new challenge: how do I compute an aggregate sum over a third column?
My issue is better explained in the image below:
Based on the code
SELECT
b1.table_a_id,
b1.label_x,
b2.label_y
FROM table_a a
INNER JOIN table_b b1
ON b1.table_a_id = a.table_a_id
INNER JOIN table_b b2
ON b2.table_a_id = b1.table_a_id AND
b2.label_y > b1.label_x
ORDER BY
b1.table_a_id,
b1.label_x,
b2.label_y;
I was able to acquire the combinations.
What should be the next step to get the cumulative sum based on a third column?
I couldn't think of a solution without using a second service, such as python with pandas, using a cumsum function.
To generate the expected resultset, you would need to join the table with itself with an inequality condition on the order column. Then, you can do a window sum:
select
t1.table_a_id,
t1.label_x,
t2.label_y,
sum(t2.value) over(
partition by t1.table_a_id, t1.label_x
order by t1."order", t2."order"
) agg_value
from
table_b t1
inner join table_b t2
on t1.table_a_id = t2.table_a_id
and t2."order" >= t1."order"
order by t1."order", t2."order"
Note: order is a reserved word, so it needs to be quoted; if your actual database column has a different name, you can remove the double quotes.
Demo on DB Fiddle:
TABLE_A_ID | LABEL_X | LABEL_Y | AGG_VALUE
---------: | :------ | :------ | --------:
1 | A | B | 1
1 | A | C | 3
1 | A | D | 6
1 | A | E | 10
1 | A | F | 15
1 | B | C | 2
1 | B | D | 5
1 | B | E | 9
1 | B | F | 14
1 | C | D | 3
1 | C | E | 7
1 | C | F | 12
1 | D | E | 4
1 | D | F | 9
1 | E | F | 5
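The result set above can be reproduced locally. The question's table definition is not shown, so the layout of table_b below (one row per label_x/label_y step with a value and an "order" column) is an assumption inferred from the output, and SQLite 3.25+ via Python's sqlite3 module is assumed in place of the original database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema: inferred from the demo output, not taken from the question.
conn.execute(
    'CREATE TABLE table_b (table_a_id INTEGER, label_x TEXT, label_y TEXT, '
    'value INTEGER, "order" INTEGER)'
)
conn.executemany(
    "INSERT INTO table_b VALUES (?, ?, ?, ?, ?)",
    [
        (1, "A", "B", 1, 1),
        (1, "B", "C", 2, 2),
        (1, "C", "D", 3, 3),
        (1, "D", "E", 4, 4),
        (1, "E", "F", 5, 5),
    ],
)

# Self-join on "same or later row", then a running sum per starting label.
result = conn.execute(
    """
    SELECT t1.table_a_id, t1.label_x, t2.label_y,
           SUM(t2.value) OVER (PARTITION BY t1.table_a_id, t1.label_x
                               ORDER BY t2."order") AS agg_value
    FROM table_b t1
    INNER JOIN table_b t2
        ON t1.table_a_id = t2.table_a_id
       AND t2."order" >= t1."order"
    ORDER BY t1."order", t2."order"
    """
).fetchall()

for row in result:
    print(row)
```

Each partition restarts the running total at its own starting row, which is why the A rows end at 15 while the E row is just 5.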
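You seem to want a cumulative sum:
SELECT b1.table_a_id, b1.label_x, b2.label_y,
       SUM(b2.value) OVER (PARTITION BY b1.table_a_id, b1.label_x
                           ORDER BY b2."order"
                          ) AS agg_value
FROM table_b b1
INNER JOIN table_b b2
    ON b2.table_a_id = b1.table_a_id AND
       b2."order" >= b1."order";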

Impala - Does impala allow multi GROUP_CONCAT in one query

For example, I have a table below
+----+---+----+
| Id | a | b  |
+----+---+----+
| 1  | 6 | 20 |
| 1  | 4 | 55 |
| 1  | 9 | 56 |
| 1  | 2 | 67 |
| 1  | 7 | 80 |
| 1  | 5 | 66 |
| 1  | 3 | 33 |
| 1  | 8 | 34 |
| 1  | 1 | 52 |
+----+---+----+
I want the output to look like this, using Impala:
+-----------+-------------------+-----------------------------+
| Id | a | b |
+-----------+-------------------+-----------------------------+
| 1 | 6,4,9,2,7,5,3,8,1 | 20,55,56,67,80,66,33,34,52 |
+-----------+-------------------+-----------------------------+
In Impala, I have used
SELECT Id,
group_concat(DISTINCT a) AS a,
group_concat(DISTINCT b) AS b
FROM table GROUP BY Id
This always produces a syntax error. Is it not allowed to use multiple group_concat calls in one query in Impala, or is it not allowed to use multiple DISTINCT in one query?
From the documentation for GROUP_CONCAT:
You cannot apply the DISTINCT operator to the argument of this function.
But, as a workaround, we can use two separate subqueries to find the distinct values:
WITH cte1 AS (
SELECT Id, GROUP_CONCAT(a) AS a
FROM (SELECT DISTINCT Id, a FROM yourTable) t
GROUP BY Id
),
cte2 AS (
SELECT Id, GROUP_CONCAT(b) AS b
FROM (SELECT DISTINCT Id, b FROM yourTable) t
GROUP BY Id
)
SELECT
t1.Id,
t1.a,
t2.b
FROM cte1 t1
INNER JOIN cte2 t2
ON t1.Id = t2.Id;
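The shape of this workaround can be tried without an Impala cluster. The sketch below assumes SQLite via Python's sqlite3 module as a stand-in (SQLite's group_concat differs from Impala's in that it does accept DISTINCT, so only the query structure carries over), and the three rows with duplicated values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (Id INTEGER, a INTEGER, b INTEGER)")
# Hypothetical rows: a=6 and b=20 each appear twice.
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?, ?)",
    [(1, 6, 20), (1, 6, 55), (1, 4, 20)],
)

row = conn.execute(
    """
    WITH cte1 AS (
        SELECT Id, GROUP_CONCAT(a) AS a
        FROM (SELECT DISTINCT Id, a FROM yourTable) t
        GROUP BY Id
    ),
    cte2 AS (
        SELECT Id, GROUP_CONCAT(b) AS b
        FROM (SELECT DISTINCT Id, b FROM yourTable) t
        GROUP BY Id
    )
    SELECT t1.Id, t1.a, t2.b
    FROM cte1 t1
    INNER JOIN cte2 t2
        ON t1.Id = t2.Id
    """
).fetchone()

id_, a_list, b_list = row
# The concatenation order is not guaranteed, so compare sorted values.
a_values = sorted(a_list.split(","))
b_values = sorted(b_list.split(","))
print(id_, a_values, b_values)  # 1 ['4', '6'] ['20', '55']
```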

How to select position based on some quantity in SQL Server

I have two tables:
Table #1 - Student:
+------+-------+
|Roll | Name |
+------+-------+
| 1 | A |
| 2 | B |
| 3 | C |
+------+-------+
Table #2 - Mark:
+------+------+
| Roll | Mark |
+------+------+
| 1 | 85 |
| 3 | 95 |
+------+------+
Output needs to be:
+-------+------+-------+---------+
| Roll | Name | Mark |Position |
+-------+------+-------+---------+
| 1 | A | 85 | 2 |
| 2 | B | 0 | 3 |
| 3 | C | 95 | 1 |
+-------+------+-------+---------+
What should the query be to get this output? I think the rank function should be used, but I don't know how to use it.
Use LEFT JOIN to join two tables and then use RANK().
Query
select *, Position = rank() over(
order by t.Mark desc
)
from(
select t1.Roll, t1.Name, coalesce(t2.Mark, 0) as Mark
from student t1
left join Mark t2
on t1.Roll = t2.Roll
)t
order by t.Name;
Use Left Join and Rank() function.
Query
select T1.Roll, T1.Name, isnull(T2.Mark, 0) as Mark,
       rank() over(order by isnull(T2.Mark, 0) desc) as Position
from STUDENT T1
left join MARK T2
    on T1.Roll = T2.Roll
order by T1.Roll;
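Both answers can be checked against the question's data. This sketch assumes SQLite 3.25+ via Python's sqlite3 module rather than SQL Server, so it uses COALESCE and an AS alias in place of the SQL Server-specific isnull and `Position = ...` forms.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (Roll INTEGER, Name TEXT)")
conn.execute("CREATE TABLE Mark (Roll INTEGER, Mark INTEGER)")
conn.executemany("INSERT INTO Student VALUES (?, ?)", [(1, "A"), (2, "B"), (3, "C")])
conn.executemany("INSERT INTO Mark VALUES (?, ?)", [(1, 85), (3, 95)])

# LEFT JOIN keeps students without a mark; COALESCE turns the missing mark into 0
# before RANK() orders the rows from highest to lowest mark.
ranked = conn.execute(
    """
    SELECT t.Roll, t.Name, t.Mark,
           RANK() OVER (ORDER BY t.Mark DESC) AS Position
    FROM (SELECT s.Roll, s.Name, COALESCE(m.Mark, 0) AS Mark
          FROM Student s
          LEFT JOIN Mark m
              ON s.Roll = m.Roll) t
    ORDER BY t.Roll
    """
).fetchall()

for row in ranked:
    print(row)
# (1, 'A', 85, 2)
# (2, 'B', 0, 3)
# (3, 'C', 95, 1)
```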

I need a specific output

I have to get a specific output format from my tables.
Let's say I have a simple table with 2 columns name and value.
table T1
+---------------+------------------+
| Name | Value |
+---------------+------------------+
| stuff1 | 1 |
| stuff1 | 1 |
| stuff2 | 2 |
| stuff3 | 1 |
| stuff2 | 4 |
| stuff2 | 2 |
| stuff3 | 4 |
+---------------+------------------+
I know the values are in the interval 1-4. I group it by name and value and count number of the same rows as Number and get the following table:
table T2
+---------------+------------------+--------+
| Name | Value | Number |
+---------------+------------------+--------+
| stuff1 | 1 | 2 |
| stuff2 | 2 | 2 |
| stuff3 | 1 | 1 |
| stuff3 | 4 | 1 |
+---------------+------------------+--------+
Here is the part where I need your help: what should I do to get the following format?
table T3
+---------------+------------------+--------+
| Name | Value | Number |
+---------------+------------------+--------+
| stuff1 | 1 | 2 |
| stuff1 | 2 | 0 |
| stuff1 | 3 | 0 |
| stuff1 | 4 | 0 |
| stuff2 | 1 | 0 |
| stuff2 | 2 | 2 |
| stuff2 | 3 | 0 |
| stuff2 | 4 | 0 |
| stuff3 | 1 | 1 |
| stuff3 | 2 | 0 |
| stuff3 | 3 | 0 |
| stuff3 | 4 | 1 |
+---------------+------------------+--------+
Thanks for any suggestions!
You start with a cross join to generate all possible combinations and then left-join in the results from your existing query:
select n.name, v.value, coalesce(nv.cnt, 0) as "Number"
from (select distinct name from T1) n cross join
     (select distinct value from T1) v left outer join
     (select name, value, count(*) as cnt
      from T1
      group by name, value
     ) nv
     on nv.name = n.name and nv.value = v.value;
A variation on the theme. Differences from the existing answers by Gordon Linoff and Owen:
I prefer GROUP BY to DISTINCT for getting the names; this may perform better in a case like this. (See Rob Farley's still-relevant article.)
I break the subqueries out into a series of CTEs for clarity.
I use table T2, as the question now labels the grouped result set, instead of showing it as a subquery.
WITH PossibleValue AS (
SELECT 1 Value
UNION ALL
SELECT Value + 1
FROM PossibleValue
WHERE Value < 4
),
Name AS (
SELECT Name
FROM T1
GROUP BY Name
),
NameValue AS (
SELECT Name
,Value
FROM Name
CROSS JOIN
PossibleValue
)
SELECT nv.Name
,nv.Value
,ISNULL(T2.Number,0) Number
FROM NameValue nv
LEFT JOIN
T2 ON nv.Name = T2.Name
AND nv.Value = T2.Value
Yet another solution, this time using a Table Value Constructor in a CTE to build a table of name value combinations.
WITH value AS
( SELECT DISTINCT t.name, v.value
FROM T1 AS t
CROSS JOIN (VALUES (1),(2),(3),(4)) AS v (value)
)
SELECT v.name AS 'Name', v.value AS 'Value', COUNT(t.name) AS 'Number'
FROM value AS v
LEFT JOIN T1 AS t ON t.value = v.value AND t.name = v.name
GROUP BY v.name, v.value, t.name;
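The cross-join-then-left-join idea can be run end to end on the rows of T1. This is a sketch that assumes SQLite via Python's sqlite3 module instead of SQL Server; the CTE names vals and names are made up for the illustration. Value 3 never occurs in T1, so the candidate values 1 to 4 are listed explicitly rather than taken from the data, and the counts are computed from the T1 rows as listed (which gives stuff2 one row with value 4).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T1 (Name TEXT, Value INTEGER)")
conn.executemany(
    "INSERT INTO T1 VALUES (?, ?)",
    [
        ("stuff1", 1), ("stuff1", 1), ("stuff2", 2), ("stuff3", 1),
        ("stuff2", 4), ("stuff2", 2), ("stuff3", 4),
    ],
)

counts = conn.execute(
    """
    WITH vals(Value) AS (VALUES (1), (2), (3), (4)),   -- every possible value
         names AS (SELECT DISTINCT Name FROM T1)       -- every name present
    SELECT n.Name, v.Value, COUNT(t.Name) AS Number
    FROM names n
    CROSS JOIN vals v                                   -- all name/value combinations
    LEFT JOIN T1 t
        ON t.Name = n.Name AND t.Value = v.Value        -- NULL where nothing matches
    GROUP BY n.Name, v.Value
    ORDER BY n.Name, v.Value
    """
).fetchall()

for row in counts:
    print(row)
```

COUNT(t.Name) counts only the matched rows, so combinations without a match report 0 rather than 1.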