Ratio from different conditions in a GROUP BY clause - sql

I have a table t with three columns: a, b, c. For every category in c, I want to divide the number of distinct a where b = 1 by the number of distinct a where b = 2. Some pseudocode (MySQL):
select count(distinct a) where b = 1 / count(distinct a) where b = 2
from t
group by c
but this won't work in SQL, since a WHERE condition cannot be applied separately to each count within a single GROUP BY query.

You don't mention which database you are using, so I'll assume it implements FULL OUTER JOIN.
Also, you don't say what to do in case a division by zero could happen. Anyway, this query will get you the separate sums, so you can compute the division as needed:
select
coalesce(x.c, y.c) as c,
coalesce(x.e, 0) as b1,
coalesce(y.f, 0) as b2,
case when y.f is null or y.f = 0 then -1 else x.e / y.f end as ratio
from (
select c, count(distinct a) as e
from t
where b = 1
group by c
) x
full join (
select c, count(distinct a) as f
from t
where b = 2
group by c
) y on x.c = y.c

You can do this in SQL Server, PostgreSQL, MySQL:
create table test (a int, b int, c varchar(10));
insert into test values
(1, 1, 'food'), (2, 1, 'food'), (3, 1, 'food'),
(1, 2, 'food'), (2, 2, 'food'),
(1, 1, 'drinks'), (2, 1, 'drinks'), (2, 1, 'drinks'),
(1, 2, 'drinks')
;
select cat.c, cast(sum(b1_count) as decimal)/sum(b2_count), sum(b1_count), sum(b2_count) from
(select distinct c from test) as cat
left join
(select c, count(distinct a) b1_count from test where b = 1 group by c) b1 on cat.c = b1.c
left join
(select c, count(distinct a) b2_count from test where b = 2 group by c) b2
on cat.c = b2.c
group by cat.c;
Result
c | (No column name) | (No column name) | (No column name)
:----- | ---------------: | ---------------: | ---------------:
drinks | 2.00000000000 | 2 | 1
food | 1.50000000000 | 3 | 2
Examples:
https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=2003e0baa46bfbb197152b829ea57d2d
https://dbfiddle.uk/?rdbms=postgres_11&fiddle=2003e0baa46bfbb197152b829ea57d2d
https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=2003e0baa46bfbb197152b829ea57d2d
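As a quick sanity check, the same query can be run against SQLite from Python (sqlite3 is used here purely as a convenient stand-in engine; the fiddles above use the real ones). SQLite has no decimal type, so `* 1.0` replaces the cast:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
create table test (a int, b int, c varchar(10));
insert into test values
(1, 1, 'food'), (2, 1, 'food'), (3, 1, 'food'),
(1, 2, 'food'), (2, 2, 'food'),
(1, 1, 'drinks'), (2, 1, 'drinks'), (2, 1, 'drinks'),
(1, 2, 'drinks');
""")
rows = con.execute("""
select cat.c,
       sum(b1_count) * 1.0 / sum(b2_count) as ratio,
       sum(b1_count), sum(b2_count)
from (select distinct c from test) as cat
left join (select c, count(distinct a) as b1_count
           from test where b = 1 group by c) b1 on cat.c = b1.c
left join (select c, count(distinct a) as b2_count
           from test where b = 2 group by c) b2 on cat.c = b2.c
group by cat.c
order by cat.c
""").fetchall()
print(rows)  # [('drinks', 2.0, 2, 1), ('food', 1.5, 3, 2)]
```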

You can use conditional aggregation:
select c, count(distinct case when b = 1 then a end) / count(distinct case when b = 2 then a end)
from t
group by c;
You don't mention your database, but some do integer division -- which can result in unexpected truncation. You might want * 1.0 / instead of / to force non-integer division.
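Here's a minimal sketch of the conditional-aggregation version with the `* 1.0` fix applied, again using sqlite3 from Python as a convenient stand-in engine:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
create table t (a int, b int, c varchar(10));
insert into t values
(1, 1, 'food'), (2, 1, 'food'), (3, 1, 'food'),
(1, 2, 'food'), (2, 2, 'food'),
(1, 1, 'drinks'), (2, 1, 'drinks'),
(1, 2, 'drinks');
""")
rows = con.execute("""
select c,
       count(distinct case when b = 1 then a end) * 1.0
       / count(distinct case when b = 2 then a end) as ratio
from t
group by c
order by c
""").fetchall()
print(rows)  # [('drinks', 2.0), ('food', 1.5)]
```

The `case` expressions yield NULL for non-matching rows, and `count(distinct ...)` ignores NULLs, which is what makes the single-pass version work.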

Related

How to Compare Multiple Columns and Rows of a Table

I have a big data table that looks something like this
ID Marker Value1 Value2
================================
1 A 10 11
1 B 12 13
1 C 14 15
2 A 10 11
2 B 13 12
2 C
3 A 10 11
3 C 12 13
I want to search this data by the following data, which is user input and not stored in a table:
Marker Value1 Value2
==========================
A 10 11
B 12 13
C 14 14
The result should be something like this:
ID Marker Value1 Value2 Match?
==========================================
1 A 10 11 true
1 B 12 13 true
1 C 14 15 false
2 A 10 11 true
2 B 13 12 true
2 C false
3 A 10 11 true
3 C 12 13 false
And ultimately this (the above table is not necessary, it should demonstrate how these values came to be):
ID Matches Percent
========================
1 2 66%
2 2 66%
3 1 33%
I'm searching for the most promising approach to get this to work in SQL (PostgreSQL to be exact).
My ideas:
Create a temporary table, join it with the above one and group the result
Use CASE WHEN or a temporary PROCEDURE to only use a single (probably bloated) query
I'm not satisfied with either approach, hence the question. How can I compare two tables like these efficiently?
The user input can be supplied using a VALUES clause in a common table expression and that can then be used in a left join with the actual table.
with user_input (marker, value1, value2) as (
values
('A', 10, 11),
('B', 12, 13),
('C', 14, 14)
)
select d.id,
count(*) filter (where (d.marker, d.value1, d.value2) is not distinct from (u.marker, u.value1, u.value2)),
100 * count(*) filter (where (d.marker, d.value1, d.value2) is not distinct from (u.marker, u.value1, u.value2)) / cast(count(*) as numeric) as pct
from data d
left join user_input u on (d.marker, d.value1, d.value2) = (u.marker, u.value1, u.value2)
group by d.id
order by d.id;
Returns:
id | count | pct
---+-------+------
1 | 2 | 66.67
2 | 2 | 66.67
3 | 1 | 50.00
Online example: https://rextester.com/OBOOD9042
Edit
If the order of the values isn't relevant (so (12,13) is considered the same as (13,12)), then the comparison gets a bit more complicated.
with user_input (marker, value1, value2) as (
values
('A', 10, 11),
('B', 12, 13),
('C', 14, 14)
)
select d.id,
count(*) filter (where (d.marker, least(d.value1, d.value2), greatest(d.value1, d.value2)) is not distinct from (u.marker, least(u.value1, u.value2), greatest(u.value1, u.value2)))
from data d
left join user_input u on (d.marker, least(d.value1, d.value2), greatest(d.value1, d.value2)) = (u.marker, least(u.value1, u.value2), greatest(u.value1, u.value2))
group by d.id
order by d.id;
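The least/greatest normalization can be sketched in plain Python to check the expected percentages (NULLs modeled as None, which never match; the divisor is the number of stored rows per id, so id 3 ends up at 50%, as in the output above):

```python
# Sample rows from the question; None stands for NULL.
data = [
    (1, 'A', 10, 11), (1, 'B', 12, 13), (1, 'C', 14, 15),
    (2, 'A', 10, 11), (2, 'B', 13, 12), (2, 'C', None, None),
    (3, 'A', 10, 11), (3, 'C', 12, 13),
]
user_input = [('A', 10, 11), ('B', 12, 13), ('C', 14, 14)]

def norm(marker, v1, v2):
    # (marker, least, greatest), mirroring the SQL; NULLs never match here
    if v1 is None or v2 is None:
        return None
    return (marker, min(v1, v2), max(v1, v2))

wanted = {norm(*row) for row in user_input}
counts = {}
for id_, m, v1, v2 in data:
    total, hits = counts.get(id_, (0, 0))
    counts[id_] = (total + 1, hits + (norm(m, v1, v2) in wanted))
pct = {id_: round(100 * hits / total, 2) for id_, (total, hits) in counts.items()}
print(pct)  # {1: 66.67, 2: 66.67, 3: 50.0}
```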
You can use a CTE to pre-compute the matches. Then a simple aggregation will do the trick. Assuming your parameters are:
Marker Value1 Value2
==========================
m1 x1 y1
m2 x2 y2
m3 x3 y3
You can do:
with x as (
select
id,
case when
marker = :m1 and (value1 = :x1 and value2 = :y1 or value1 = :y1 and value2 = :x1)
or marker = :m2 and (value1 = :x2 and value2 = :y2 or value1 = :y2 and value2 = :x2)
or marker = :m3 and (value1 = :x3 and value2 = :y3 or value1 = :y3 and value2 = :x3)
then 1 else 0 end as matches
from t
)
select
id,
sum(matches) as matches,
100.0 * sum(matches) / count(*) as percent
from x
group by id
Try this:
CREATE TABLE #Temp
(
Marker nvarchar(50),
Value1 nvarchar(50),
Value2 nvarchar(50)
)
INSERT INTO #Temp Values ('A', '10', '11')
INSERT INTO #Temp Values ('B', '12', '13')
INSERT INTO #Temp Values ('C', '14', '14')
SELECT m.Id, m.Marker, m.Value1, m.Value2,
(Select
CASE
WHEN COUNT(*) = 0 THEN 'False'
WHEN COUNT(*) <> 0 THEN 'True'
END
FROM #Temp t
WHERE t.Marker = m.Marker and t.Value1 = m.Value1 and t.Value2 = m.Value2) as Matches
FROM [Test].[dbo].[Markers] m
ORDER BY Matches DESC
Drop TABLE #Temp
If this is what you want, I can try to solve the second part of it as well.

postgres Finding the difference between numbers of the same column

I have a table like the below
| date       | key | value |
|------------|-----|-------|
| 01-01-2009 | a   | 25    |
| 01-01-2009 | b   | 25    |
| 01-01-2009 | c   | 10    |
I'm trying to come up with a query that computes (a+b)-c for each day - but my join is computing (a+b+c)-c:
with total as (
select
sum(value),
date
from
A
where
key in ('a',
'b')
group by
date )
select
sum(total.sum) as a, sum(A.value) as b,
sum(total.sum) - sum(A.value) as value,
A.date
from
A
join total on
total.date = A.date
where
A.key = 'c'
group by
A.date
This is giving me a value of 50 (it should be 40) - my c values are getting calculated as part of the total table during the join.
What am I doing wrong?
How about simply doing conditional aggregation?
select date,
sum(case when key in ('a', 'b') then value
when key in ('c') then - value
end) as daily_calc
from a
group by date;
A join doesn't seem very helpful for this calculation.
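A quick check of the conditional-aggregation approach against the sample day, using sqlite3 from Python as a stand-in for Postgres:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
create table A (date text, key text, value int);
insert into A values
('01-01-2009', 'a', 25),
('01-01-2009', 'b', 25),
('01-01-2009', 'c', 10);
""")
rows = con.execute("""
select date,
       sum(case when key in ('a', 'b') then value
                when key = 'c' then -value end) as daily_calc
from A
group by date
""").fetchall()
print(rows)  # [('01-01-2009', 40)]
```

Negating the 'c' value inside a single SUM is what avoids the double counting the join introduced.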
You can join three table expressions, as in:
select
a.date,
(a.value + b.value) - c.value
from (select * from A where key = 'a') a
join (select * from A where key = 'b') b on b.date = a.date
join (select * from A where key = 'c') c on c.date = a.date

Perform ranking depending on category

I have a table that looks like this:
RowNum category Rank4A Rank4B
-------------------------------------------
1 A
2 A
3 B
5 A
6 B
9 B
My requirement: based on the RowNum order, make two new ranking columns that depend on category. Rank4A works like DENSE_RANK() for category = A, but if a row is for category B, it takes the most recently appearing rank for category A in RowNum order. Rank4B has similar logic, but ordered by RowNum in DESC order. So the result would look like this (W means I don't care about that cell's value):
RowNum category Rank4A Rank4B
-------------------------------------------
1 A 1 W
2 A 2 W
3 B 2 3
5 A 3 2
6 B W 2
9 B W 1
One more requirement: CROSS APPLY or CURSOR is not allowed, since the dataset is large. Any neat solutions?
Edit: also no recursive CTE (due to the MAXRECURSION 32767 limit).
You can use the following query:
SELECT RowNum, category,
SUM(CASE
WHEN category = 'A' THEN 1
ELSE 0
END) OVER (ORDER BY RowNum) AS Rank4A,
SUM(CASE
WHEN category = 'B' THEN 1
ELSE 0
END) OVER (ORDER BY RowNum DESC) AS Rank4B
FROM mytable
ORDER BY RowNum
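Running this against the sample data in SQLite (window functions are available there since 3.25; used here purely as a convenient test engine) reproduces the expected table, including the don't-care cells:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
create table mytable (RowNum int, category char(1));
insert into mytable values
(1, 'A'), (2, 'A'), (3, 'B'), (5, 'A'), (6, 'B'), (9, 'B');
""")
rows = con.execute("""
select RowNum, category,
       sum(case when category = 'A' then 1 else 0 end)
           over (order by RowNum) as Rank4A,
       sum(case when category = 'B' then 1 else 0 end)
           over (order by RowNum desc) as Rank4B
from mytable
order by RowNum
""").fetchall()
for r in rows:
    print(r)
```

The running SUM over the ascending order counts A rows seen so far (which is exactly a dense rank carried forward onto B rows), and the descending SUM does the same for B.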
Giorgos Betsos' answer is better, please read it first.
Try this out. I believe each CTE is clear enough to show the steps.
IF OBJECT_ID('tempdb..#Data') IS NOT NULL
DROP TABLE #Data
CREATE TABLE #Data (
RowNum INT,
Category CHAR(1))
INSERT INTO #Data (
RowNum,
Category)
VALUES
(1, 'A'),
(2, 'A'),
(3, 'B'),
(5, 'A'),
(6, 'B'),
(9, 'B')
;WITH AscendentDenseRanking AS
(
SELECT
D.RowNum,
D.Category,
AscendentDenseRanking = DENSE_RANK() OVER (ORDER BY D.Rownum ASC)
FROM
#Data AS D
WHERE
D.Category = 'A'
),
LaggedRankingA AS
(
SELECT
D.RowNum,
AscendentDenseRankingA = MAX(A.AscendentDenseRanking)
FROM
#Data AS D
INNER JOIN AscendentDenseRanking AS A ON D.RowNum > A.RowNum
WHERE
D.Category = 'B'
GROUP BY
D.RowNum
),
DescendantDenseRanking AS
(
SELECT
D.RowNum,
D.Category,
DescendantDenseRanking = DENSE_RANK() OVER (ORDER BY D.Rownum DESC)
FROM
#Data AS D
WHERE
D.Category = 'B'
),
LaggedRankingB AS
(
SELECT
D.RowNum,
AscendentDenseRankingB = MAX(A.DescendantDenseRanking)
FROM
#Data AS D
INNER JOIN DescendantDenseRanking AS A ON D.RowNum < A.RowNum
WHERE
D.Category = 'A'
GROUP BY
D.RowNum
)
SELECT
D.RowNum,
D.Category,
Rank4A = ISNULL(RA.AscendentDenseRanking, LA.AscendentDenseRankingA),
Rank4B = ISNULL(RB.DescendantDenseRanking, LB.AscendentDenseRankingB)
FROM
#Data AS D
LEFT JOIN AscendentDenseRanking AS RA ON D.RowNum = RA.RowNum
LEFT JOIN LaggedRankingA AS LA ON D.RowNum = LA.RowNum
LEFT JOIN DescendantDenseRanking AS RB ON D.RowNum = RB.RowNum
LEFT JOIN LaggedRankingB AS LB ON D.RowNum = LB.RowNum
/*
Results:
RowNum Category Rank4A Rank4B
----------- -------- -------------------- --------------------
1 A 1 3
2 A 2 3
3 B 2 3
5 A 3 2
6 B 3 2
9 B 3 1
*/
This isn't a recursive CTE, so the 32k limit doesn't apply.

Finding all duplicate rows with the given conditions

I have a table called Fruit with two columns (id, cost). I want to select id and cost, finding all duplicate rows where the cost is the same but the id is different. How can I write this query?
I wrote this query:
SELECT id,cost
From Fruit a
WHERE (SELECT COUNT(*)
FROM Fruit b
WHERE a.cost = b.cost
) > 1
This works, but it gives me rows where the cost is the same even when the id is also the same; I only want results where the cost is the same but the id is different.
This is what you need:
SELECT DISTINCT F1.*
FROM Fruit F1
INNER JOIN Fruit F2 ON F1.id <> F2.id AND F1.cost = F2.cost
If you want repeated id-cost pairs to be listed too, just remove the DISTINCT.
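A minimal check of the self-join approach with sqlite3 (the sample ids and costs below are made up for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
create table Fruit (id int, cost int);
insert into Fruit values (1, 5), (2, 5), (3, 15), (4, 24), (5, 68);
""")
rows = con.execute("""
select distinct F1.id, F1.cost
from Fruit F1
join Fruit F2 on F1.id <> F2.id and F1.cost = F2.cost
order by F1.id
""").fetchall()
print(rows)  # [(1, 5), (2, 5)]
```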
You can add a simple condition where the ids are not equal.
SELECT id, cost FROM Fruit a WHERE (SELECT COUNT(*) FROM Fruit b WHERE a.cost = b.cost AND a.id <> b.id) > 1
Here <> is the operator for not equal.
I hope it will help you :)
You could select all the rows with a duplicated cost using GROUP BY with HAVING count(*) > 1, and then get all the rows that match:
select a.id, a.cost
from Fruit a
where cost in ( select b.cost
from fruit b
group by b.cost
having count(*) > 1
)
To avoid duplicated results you can add DISTINCT:
select distinct a.id, a.cost
from Fruit a
where cost in ( select b.cost
from fruit b
group by b.cost
having count(*) > 1
)
This works. Ran it in SQL Server because Oracle is broken on Fiddle, but should work on either system.
MS SQL Server 2014 Schema Setup:
CREATE TABLE ab
([id] int, [cost] int)
;
INSERT INTO ab
([id], [cost])
VALUES
(1, 5),
(2, 5),
(3, 15),
(3, 15),
(4, 24),
(5, 68),
(6, 13),
(7, 3)
;
Query 1:
with a1 as (
SELECT id
,cost
,rank () over (partition by cost order by id) dup
From ab
)
select * from a1 where dup > 1
Results:
| id | cost | dup |
|----|------|-----|
| 2 | 5 | 2 |
And then to return all values where there was a duplicate cost:
with a1 as (
SELECT id
,cost
,rank () over (partition by cost order by id) dup
From ab
)
,a2 as ( select * from a1 where dup > 1)
select * from ab
join a2 on ab.cost = a2.cost

Microsoft SQL Server : comparing a vertical table as a horizontal table for vector

I have a set of vectors where each vector has an element of a through z.
I would like a query that gives me the first vector on the left and the vector being compared against on the right.
Let's say the first vector is (a,b,c) and the other two vectors are (a,b) and (c).
What I really want is this:
v1 el1 v2 el2
-----------------
1 a 2 a
1 b 2 b
1 c 2 null
1 a 3 null
1 b 3 null
1 c 3 c
This way it would be easy to have another pass to calculate the metrics per vector as they relate to vector #1.
DROP TABLE #vector
CREATE TABLE #vector (v VARCHAR(10),el VARCHAR(10))
INSERT INTO #vector(v, el) VALUES ('1', 'a')
INSERT INTO #vector(v, el) VALUES ('1', 'b')
INSERT INTO #vector(v, el) VALUES ('1', 'c')
INSERT INTO #vector(v, el) VALUES ('2', 'a')
INSERT INTO #vector(v, el) VALUES ('2', 'b')
INSERT INTO #vector(v, el) VALUES ('3', 'c')
SELECT *
FROM #vector a
LEFT JOIN #vector b on a.el = b.el AND a.v <> b.v
WHERE a.v = '1'
I actually get this:
v el v el
--------------
1 a 2 a
1 b 2 b
1 c 3 c
I thought about PIVOT:
WITH vectors AS (
select *
from
( select v,el from #vector ) src
PIVOT (
count(el) for el in ([a],[b],[c])
) piv)
SELECT * FROM vectors a JOIN vectors b ON b.v <> a.v WHERE a.v=1
Which returns this:
v a b c v a b c
------------------------------
1 1 1 1 2 1 1 0
1 1 1 1 3 0 0 1
Which, admittedly, I can use, but it requires me to rewrite a simple summation query into one in which I must specify a through z.
SELECT
v, el, present
FROM
(SELECT *
FROM
(SELECT v, el FROM #vector) src
PIVOT (count(el) for el in ([a],[b],[c]) ) piv) foo
UNPIVOT (present FOR el IN (a,b,c)) AS up;
This returns:
v el present
------------------
1 a 1
1 b 1
1 c 1
2 a 1
2 b 1
2 c 0
3 a 0
3 b 0
3 c 1
So as a possible final answer:
SELECT v, el, present
INTO #vector2
FROM (select * FROM ( select v,el from #vector ) src PIVOT ( count(el) for el in ([a],[b],[c]) ) piv) foo
UNPIVOT (present FOR el IN (a,b,c)) AS up;
SELECT * FROM #vector2 a LEFT JOIN #vector2 b on a.el = b.el AND a.v <> b.v WHERE a.v='1'
ORDER BY a.v,b.v
Returns:
v el present v el present
1 a 1 2 a 1
1 b 1 2 b 1
1 c 1 2 c 0
1 c 1 3 c 1
1 b 1 3 b 0
1 a 1 3 a 0
So through the PIVOT and UNPIVOT, I can get the zeros filled in.
However, this seems like a complicated solution.
Is there an easier way?
One idea would be to alter #vector and add 'present' and populate the zero entries. But populating the other zero entries wastes space and it is non-trivial to determine which 0s to insert.
Thank you for your help. :)
Instead of PIVOT/UNPIVOT, use a UNION:
SELECT * FROM (
SELECT 2 as part, a.v as a_v, a.el as a_el, b.v as b_v, b.el as b_el
FROM #vector a
LEFT JOIN #vector b on b.v='2' and a.el=b.el
WHERE a.v='1'
UNION ALL
SELECT 3 as part, a.*, b.*
FROM #vector a
LEFT JOIN #vector b on b.v='3' and a.el=b.el
WHERE a.v='1') t
order by part, a_v, a_el
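The UNION query can be verified with sqlite3 (temp-table syntax adapted: a plain table `vector` stands in for `#vector`):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
create table vector (v varchar(10), el varchar(10));
insert into vector values
('1', 'a'), ('1', 'b'), ('1', 'c'),
('2', 'a'), ('2', 'b'),
('3', 'c');
""")
rows = con.execute("""
select * from (
    select 2 as part, a.v as a_v, a.el as a_el, b.v as b_v, b.el as b_el
    from vector a
    left join vector b on b.v = '2' and a.el = b.el
    where a.v = '1'
    union all
    select 3 as part, a.v, a.el, b.v, b.el
    from vector a
    left join vector b on b.v = '3' and a.el = b.el
    where a.v = '1'
) t
order by part, a_v, a_el
""").fetchall()
for r in rows:
    print(r)
```

Each branch of the UNION pins the right-hand side to one comparison vector, so the LEFT JOIN fills in NULLs for the missing elements, which is exactly the shape the question asks for.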