How to Compare Multiple Columns and Rows of a Table - sql

I have a big data table that looks something like this
ID   Marker   Value1   Value2
================================
1    A        10       11
1    B        12       13
1    C        14       15
2    A        10       11
2    B        13       12
2    C
3    A        10       11
3    C        12       13
I want to search this data by the following data, which is user input and not stored in a table:
Marker   Value1   Value2
==========================
A        10       11
B        12       13
C        14       14
The result should be something like this:
ID   Marker   Value1   Value2   Match?
==========================================
1    A        10       11       true
1    B        12       13       true
1    C        14       15       false
2    A        10       11       true
2    B        13       12       true
2    C                          false
3    A        10       11       true
3    C        12       13       false
And ultimately this (the above table is not necessary, it should demonstrate how these values came to be):
ID   Matches   Percent
========================
1    2         66%
2    2         66%
3    1         33%
I'm searching for the most promising approach to get this to work in SQL (PostgreSQL to be exact).
My ideas:
Create a temporary table, join it with the above one and group the result
Use CASE WHEN or a temporary PROCEDURE to only use a single (probably bloated) query
I'm not satisfied with either approach, hence the question. How can I compare two tables like these efficiently?

The user input can be supplied using a VALUES clause in a common table expression and that can then be used in a left join with the actual table.
with user_input (marker, value1, value2) as (
  values
    ('A', 10, 11),
    ('B', 12, 13),
    ('C', 14, 14)
)
select d.id,
       count(*) filter (where (d.marker, d.value1, d.value2) is not distinct from (u.marker, u.value1, u.value2)),
       100 * count(*) filter (where (d.marker, d.value1, d.value2) is not distinct from (u.marker, u.value1, u.value2)) / cast(count(*) as numeric) as pct
from data d
  left join user_input u on (d.marker, d.value1, d.value2) = (u.marker, u.value1, u.value2)
group by d.id
order by d.id;
Returns:
 id | count |  pct
----+-------+-------
  1 |     2 | 66.67
  2 |     2 | 66.67
  3 |     1 | 50.00
Online example: https://rextester.com/OBOOD9042
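For reference, here is a minimal sketch of the data table the queries above run against, populated with the sample rows from the question (the table and column names are taken from the query; the empty Value1/Value2 for ID 2, marker C are assumed to be NULL):
-- assumed table definition matching the question's sample data
create table data (
    id     integer,
    marker text,
    value1 integer,
    value2 integer
);

insert into data (id, marker, value1, value2) values
    (1, 'A', 10, 11),
    (1, 'B', 12, 13),
    (1, 'C', 14, 15),
    (2, 'A', 10, 11),
    (2, 'B', 13, 12),
    (2, 'C', null, null),
    (3, 'A', 10, 11),
    (3, 'C', 12, 13);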
Edit
If the order of the values isn't relevant (so (12,13) is considered the same as (13,12)), then the comparison gets a bit more complicated.
with user_input (marker, value1, value2) as (
  values
    ('A', 10, 11),
    ('B', 12, 13),
    ('C', 14, 14)
)
select d.id,
       count(*) filter (where (d.marker, least(d.value1, d.value2), greatest(d.value1, d.value2)) is not distinct from (u.marker, least(u.value1, u.value2), greatest(u.value1, u.value2)))
from data d
  left join user_input u on (d.marker, least(d.value1, d.value2), greatest(d.value1, d.value2)) = (u.marker, least(u.value1, u.value2), greatest(u.value1, u.value2))
group by d.id
order by d.id;
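The least()/greatest() pair normalizes each value pair into (smaller, larger) order before comparing, so (13, 12) and (12, 13) end up as the same tuple. A quick check:
select least(13, 12) as lo, greatest(13, 12) as hi;
-- lo | hi
-- 12 | 13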

You can use a CTE to pre-compute the matches. Then a simple aggregation will do the trick. Assuming your parameters are:
Marker   Value1   Value2
==========================
m1       x1       y1
m2       x2       y2
m3       x3       y3
You can do:
with x as (
  select
    id,
    case when
         marker = :m1 and (value1 = :x1 and value2 = :y1 or value1 = :y1 and value2 = :x1)
      or marker = :m2 and (value1 = :x2 and value2 = :y2 or value1 = :y2 and value2 = :x2)
      or marker = :m3 and (value1 = :x3 and value2 = :y3 or value1 = :y3 and value2 = :x3)
    then 1 else 0 end as matches
  from t
)
select
  id,
  sum(matches) as matches,
  100.0 * sum(matches) / count(*) as percent
from x
group by id
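For example, with the question's input (A 10 11, B 12 13, C 14 14) plugged into the :m/:x/:y placeholders, and t standing in for the data table, the query becomes:
with x as (
  select
    id,
    case when
         marker = 'A' and (value1 = 10 and value2 = 11 or value1 = 11 and value2 = 10)
      or marker = 'B' and (value1 = 12 and value2 = 13 or value1 = 13 and value2 = 12)
      or marker = 'C' and (value1 = 14 and value2 = 14)  -- both orderings coincide for (14, 14)
    then 1 else 0 end as matches
  from t
)
select
  id,
  sum(matches) as matches,
  100.0 * sum(matches) / count(*) as percent
from x
group by id;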

Try this:
CREATE TABLE #Temp
(
    Marker nvarchar(50),
    Value1 nvarchar(50),
    Value2 nvarchar(50)
)

INSERT INTO #Temp VALUES ('A', '10', '11')
INSERT INTO #Temp VALUES ('B', '12', '13')
INSERT INTO #Temp VALUES ('C', '14', '14')

SELECT m.Id, m.Marker, m.Value1, m.Value2,
       (SELECT CASE
                   WHEN COUNT(*) = 0 THEN 'False'
                   WHEN COUNT(*) <> 0 THEN 'True'
               END
        FROM #Temp t
        WHERE t.Marker = m.Marker AND t.Value1 = m.Value1 AND t.Value2 = m.Value2) AS Matches
FROM [Test].[dbo].[Markers] m
ORDER BY Matches DESC

DROP TABLE #Temp
If this is what you want, I can try to solve the second part of it as well.
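Since the question targets PostgreSQL, roughly the same approach can be sketched with a Postgres temporary table; markers below stands in for the [Test].[dbo].[Markers] table used above:
-- hypothetical PostgreSQL translation of the temp-table approach above
create temporary table temp_input (
    marker text,
    value1 int,
    value2 int
);

insert into temp_input values
    ('A', 10, 11),
    ('B', 12, 13),
    ('C', 14, 14);

select m.id, m.marker, m.value1, m.value2,
       exists (select 1
               from temp_input t
               where t.marker = m.marker
                 and t.value1 = m.value1
                 and t.value2 = m.value2) as matches
from markers m   -- stands in for the [Test].[dbo].[Markers] table above
order by matches desc;

drop table temp_input;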

Related

How to get specific records in Postgres

In Postgres I have two tables:
Table A { int keyA, Text name}
Table B { int keyB, int keyA, char mark, date start, date end}
mark in Table B can be 'X', 'Y', or 'Z'.
I want to get every 'X' record with its dates, but only one of the 'Y'/'Z' records. Also, if a keyA has 'X' records as well as 'Y'/'Z' records, I want only the 'X' ones.
From:
keyB   keyA   mark   start        end
1      1      X      15-01-2023   16-01-2023
2      1      X      17-01-2023   18-01-2023
3      1      Y      null         null
4      1      Z      null         null
5      2      Y      null         null
6      2      Z      null         null
7      2      Y      null         null
8      3      Z      null         null
9      3      Y      null         null
10     4      X      19-01-2023   20-01-2023
I want to get
keyB   keyA   mark   start        end
1      1      X      15-01-2023   16-01-2023
2      1      X      17-01-2023   17-01-2023
5      2      Y      null         null
8      3      Z      null         null
10     4      X      19-01-2023   20-01-2023
I tried:
1.
Select A.name,
       (select b2.start from B b2 where b2.keyA = A.keyA and b2.mark = 'X') as Start,
       (select b2.end from B b2 where b2.keyA = A.keyA and b2.mark = 'X') as End
from A order by name;
Order is important. I need to have name first.
There is a problem: the subqueries return more than one record, so I have to add LIMIT 1. But I want to get every 'X', not only one.
If I do this
Select A.name, B.start, B.end
from A inner join B on A.keyA = B.keyA
I'll have X, Y, Z and as I mentioned I want only X or one from Y or Z.
Any idea how should I solve this?
Use the ROW_NUMBER function with your join query, as follows:
select name, keyB, keyA, mark, start_dt, end_dt
from
(
  select A.name, B.*,
         row_number() over (partition by B.keyA
                            order by case when B.mark = 'X' then 1 else 2 end, B.keyb) rn
  from tableB B
  join tableA A on B.keyA = A.keyA
) T
where mark = 'X' or rn = 1
order by keyb
See demo
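To try this locally, a minimal sketch of the two tables with the sample rows from the question (column names follow the answer's query, which uses start_dt/end_dt instead of start/end; the name values are placeholders, since the question doesn't show them):
-- hypothetical setup reproducing the question's sample data
create table tableA (keyA int, name text);
create table tableB (keyB int, keyA int, mark char(1), start_dt date, end_dt date);

insert into tableA values (1, 'a1'), (2, 'a2'), (3, 'a3'), (4, 'a4');

insert into tableB values
    (1,  1, 'X', '2023-01-15', '2023-01-16'),
    (2,  1, 'X', '2023-01-17', '2023-01-18'),
    (3,  1, 'Y', null, null),
    (4,  1, 'Z', null, null),
    (5,  2, 'Y', null, null),
    (6,  2, 'Z', null, null),
    (7,  2, 'Y', null, null),
    (8,  3, 'Z', null, null),
    (9,  3, 'Y', null, null),
    (10, 4, 'X', '2023-01-19', '2023-01-20');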

ratio from different condition in Group By clause SQL

I have a table t with three columns: a, b, c. For every category in c, I want to calculate the count of a where b = 1 divided by the count of a where b = 2. Some pseudocode (MySQL):
select count(distinct a) where b = 1 / count(distinct a) where b = 2
from t
group by c
but this won't work in SQL, since a WHERE condition can't be applied separately to each count within the GROUP BY.
You don't mention which database you are using, so I'll assume it implements FULL OUTER JOIN.
Also, you don't say what should happen if a division by zero occurs. Anyway, this query will get you the separate counts, so you can compute the division as needed:
select
  coalesce(x.c, y.c) as c,
  coalesce(x.e, 0) as b1,
  coalesce(y.f, 0) as b2,
  case when y.f is null or y.f = 0 then -1 else x.e / y.f end as ratio
from (
  select c, count(distinct a) as e
  from t
  where b = 1
  group by c
) x
full join (
  select c, count(distinct a) as f
  from t
  where b = 2
  group by c
) y on x.c = y.c
You can do this in SQL Server, PostgreSQL, MySQL:
create table test (a int, b int, c varchar(10));
insert into test values
(1, 1, 'food'), (2, 1, 'food'), (3, 1, 'food'),
(1, 2, 'food'), (2, 2, 'food'),
(1, 1, 'drinks'), (2, 1, 'drinks'), (2, 1, 'drinks'),
(1, 2, 'drinks')
;
select cat.c,
       cast(sum(b1_count) as decimal) / sum(b2_count),
       sum(b1_count),
       sum(b2_count)
from (select distinct c from test) as cat
left join (select c, count(distinct a) b1_count from test where b = 1 group by c) b1
       on cat.c = b1.c
left join (select c, count(distinct a) b2_count from test where b = 2 group by c) b2
       on cat.c = b2.c
group by cat.c;
Result
c | (No column name) | (No column name) | (No column name)
:----- | ---------------: | ---------------: | ---------------:
drinks | 2.00000000000 | 2 | 1
food | 1.50000000000 | 3 | 2
Examples:
https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=2003e0baa46bfbb197152b829ea57d2d
https://dbfiddle.uk/?rdbms=postgres_11&fiddle=2003e0baa46bfbb197152b829ea57d2d
https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=2003e0baa46bfbb197152b829ea57d2d
You can use conditional aggregation:
select count(distinct case when b = 1 then a end) / count(distinct case when b = 2 then a end)
from t
group by c;
You don't mention your database, but some do integer division -- which can result in unexpected truncation. You might want a * 1.0 / instead of / to force non-integer division.
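For instance, attaching the multiplier to the first aggregate (a sketch; c is included in the output here so each ratio can be tied back to its category):
select c,
       count(distinct case when b = 1 then a end) * 1.0
         / count(distinct case when b = 2 then a end) as ratio
from t
group by c;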

Merging results from two subqueries Postgresql

I am using PostgreSQL. I created two subqueries that return results as follows:
firm_id   type_1   fee_1
1         2        100
2         4        300
5         1        100

firm_id   type_2   fee_2
1         3        200
2         3        200
3         2        150
4         5        300
I would like to produce a result like this:
firm_id   type_1   type_2   total_fee
1         2        3        300
2         4        3        500
3         0        2        150
4         0        5        300
5         1        0        100
Any help appreciated!
Use FULL JOIN and coalesce():
with q1 (firm_id, type_1, fee_1) as (
  values
    (1, 2, 100),
    (2, 4, 300),
    (5, 1, 100)
), q2 (firm_id, type_2, fee_2) as (
  values
    (1, 3, 200),
    (2, 3, 200),
    (3, 2, 150),
    (4, 5, 300)
)
select
  firm_id,
  coalesce(type_1, 0) type_1,
  coalesce(type_2, 0) type_2,
  coalesce(fee_1, 0) + coalesce(fee_2, 0) total_fee
from q1
full join q2 using (firm_id);
firm_id | type_1 | type_2 | total_fee
---------+--------+--------+-----------
1 | 2 | 3 | 300
2 | 4 | 3 | 500
3 | 0 | 2 | 150
4 | 0 | 5 | 300
5 | 1 | 0 | 100
(5 rows)
SELECT firm_id
    ,coalesce(t.type_1, 0) type_1
    ,coalesce(b.type_2, 0) type_2
    ,coalesce(t.fee_1, 0) + coalesce(b.fee_2, 0) total_fee
FROM (
    SELECT * --Your first select query
    FROM tablea
    ) t
FULL JOIN (
    SELECT * --Your second select query
    FROM tableb
    ) b using (firm_id)
FULL JOIN combines the results of both left and right outer joins: the joined table contains all records from both tables, with NULLs filled in for missing matches on either side.
COALESCE returns the first of its arguments that is not null; null is returned only if all arguments are null. It is often used to substitute a default value for null values when data is retrieved for display.
SELECT coalesce( t1."firm_id", t2."firm_id" ) as firm_id,
coalesce( t1."type_1", 0 ) as type_1,
coalesce( t2."type_2", 0 ) as type_2,
coalesce( t1."fee_1", 0 )
+
coalesce( t2."fee_2", 0 ) as total_fee
FROM table1 t1
FULL JOIN table2 t2
ON t1."firm_id" = t2."firm_id"
where table1 and table2 must be replaced by your subqueries
see a demo: http://sqlfiddle.com/#!15/6d391/2
select coalesce(sq1.firm_id, sq2.firm_id) as firm_id,
       coalesce(type_1, 0),
       coalesce(type_2, 0),
       coalesce(fee_1, 0) + coalesce(fee_2, 0) as total_fee
from <subquery1> as sq1
full join <subquery2> as sq2
  on sq1.firm_id = sq2.firm_id
If you have two subqueries, say a and b, that give you set A and set B, what you're looking for is a full join A x B where at least one firm_id in the full join is not null, with a calculated column total_fee = fee_1 + fee_2.
I'm not that familiar with postgresql syntax but it should be something like
select
  -- get rid of the columns you don't want
  a.firm_id, a.type_1, a.fee_1,
  b.firm_id, b.type_2, b.fee_2,
  coalesce(a.fee_1, 0) + coalesce(b.fee_2, 0) as total_fee
from ( subquery_1 here ) as a
full join ( subquery_2 here ) as b
  on b.firm_id = a.firm_id
where
  a.firm_id is not null or b.firm_id is not null

Using Scoring to find Best Match in SQL

Suppose I have a DATA table like:
ID | Col1 | Col2 | Col3
1  | a    | b    | 23
2  | a    | c    | 14
3  | f    | g    | 11
Suppose I have a POSSIBLE_MATCHES table like:
MatchID | Col1 | Col2 | Col3
101     | a    | a    | 11
102     | a    | b    | 11
103     | a    | b    | 14
104     | a    | c    | 23
105     | f    | a    | 1
Suppose I have a WEIGHTS table like this (for the sake of this discussion and simplicity you can assume all weights to be 1; I can adapt the solution later to incorporate the weights):
Col  | Weight
Col1 | 1
Col2 | 1.5
Col3 | 2
So for each possible match we would calculate a SCORE on each matching column.
Score = Col1 Weight * (CASE WHEN DATA.Col1 = POSSIBLE_MATCHES.Col1 THEN 1 ELSE 0 END) +
        Col2 Weight * (CASE WHEN DATA.Col2 = POSSIBLE_MATCHES.Col2 THEN 1 ELSE 0 END) +
        Col3 Weight * (CASE WHEN DATA.Col3 = POSSIBLE_MATCHES.Col3 THEN 1 ELSE 0 END)
So for example the BEST MATCH for the first row: Col1 = a, Col2 = b, Col3 = 23:
MatchID | Col1 | Col2 | Col3 | Score
101     | a    | a    | 11   | 1*1 + 1.5*0 + 2*0 = 1
102     | a    | b    | 11   | 1*1 + 1.5*1 + 2*0 = 2.5
103     | a    | b    | 14   | 1*1 + 1.5*1 + 2*0 = 2.5
104     | a    | c    | 23   | 1*1 + 1.5*0 + 2*1 = 3
105     | f    | a    | 1    | 1*0 + 1.5*0 + 2*0 = 0
So in this case the best match for ID:1 is MatchID:104. If the scores are the same then take the lowest MatchID.
Here's a sql fiddle if you wish to play around with this:
http://sqlfiddle.com/#!6/9df45/1
For each ID in DATA how would I find the BEST match in POSSIBLE MATCHES?
In this solution, we cross join the two tables to get all possible pairings and compute the score for each. Then we rank them from best to worst with ROW_NUMBER. Finally, we keep only the best match per ID with "WHERE Rank = 1".
SELECT *
FROM
(SELECT data.ID,
possible_matches.MatchID,
Score = (CASE WHEN data.Col1 = possible_matches.Col1 THEN 1 ELSE 0 END) * 1 +
(CASE WHEN data.Col2 = possible_matches.Col2 THEN 1 ELSE 0 END) * 1.5 +
(CASE WHEN data.Col3 = possible_matches.Col3 THEN 1 ELSE 0 END) * 2,
[Rank] = ROW_NUMBER() OVER(PARTITION BY data.ID ORDER BY (CASE WHEN data.Col1 = possible_matches.Col1 THEN 1 ELSE 0 END) * 1 +
(CASE WHEN data.Col2 = possible_matches.Col2 THEN 1 ELSE 0 END) * 1.5 +
(CASE WHEN data.Col3 = possible_matches.Col3 THEN 1 ELSE 0 END) * 2 DESC)
from data, possible_matches) AS AllScore
WHERE AllScore.[Rank] = 1
Try this:
DECLARE @d TABLE(ID INT, Col1 CHAR(1), Col2 CHAR(1), Col3 INT)
DECLARE @m TABLE(ID INT, Col1 CHAR(1), Col2 CHAR(1), Col3 INT)
INSERT INTO @d VALUES
(1, 'a', 'b', 23),
(2, 'a', 'c', 14),
(3, 'f', 'g', 11)
INSERT INTO @m VALUES
(101, 'a', 'a', 11),
(102, 'a', 'b', 11),
(103, 'a', 'b', 14),
(104, 'a', 'c', 23),
(105, 'f', 'a', 1)
SELECT DataID, MatchID FROM
(
SELECT d.ID AS DataID,
m.ID AS MatchID,
ROW_NUMBER() OVER(PARTITION BY d.ID ORDER BY
CASE WHEN d.Col1 = m.Col1 THEN 1 ELSE 0 END * 1 +
CASE WHEN d.Col2 = m.Col2 THEN 1 ELSE 0 END * 1.5 +
CASE WHEN d.Col3 = m.Col3 THEN 1 ELSE 0 END * 2 DESC) AS rn
FROM @d d
CROSS JOIN @m m
) t WHERE rn = 1
Output:
DataID   MatchID
1        104
2        103
3        102
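One requirement from the question: if the scores are the same, take the lowest MatchID. For data row 3 (f, g, 11), MatchIDs 101 and 102 both score 2, so ROW_NUMBER ordered by the score alone picks one of them arbitrarily. Adding the match ID as a secondary sort key makes the result deterministic; a sketch of the second query above, adjusted:
SELECT DataID, MatchID FROM
(
    SELECT d.ID AS DataID,
           m.ID AS MatchID,
           ROW_NUMBER() OVER(PARTITION BY d.ID ORDER BY
               CASE WHEN d.Col1 = m.Col1 THEN 1 ELSE 0 END * 1 +
               CASE WHEN d.Col2 = m.Col2 THEN 1 ELSE 0 END * 1.5 +
               CASE WHEN d.Col3 = m.Col3 THEN 1 ELSE 0 END * 2 DESC,
               m.ID ASC) AS rn  -- tiebreaker: lowest MatchID wins on equal scores
    FROM @d d
    CROSS JOIN @m m
) t WHERE rn = 1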

SQL query to group based on sum

I have a simple table with values that I want to chunk/partition into distinct groups based on the sum of those values (up to a certain limit on each group's sum total).
E.g., imagine a table like the following:
Key   Value
-----------
A     1
B     4
C     2
D     2
E     5
F     1
And I would like to group into sets such that no one grouping's sum will exceed some given value (say, 5).
The result would be something like:
Group   Key   Value
-------------------
1       A     1
        B     4
        --------
        Total: 5
2       C     2
        D     2
        --------
        Total: 4
3       E     5
        --------
        Total: 5
4       F     1
        --------
        Total: 1
Is such a query possible?
While I am inclined to agree with the comments that this is best done outside of SQL, here is some SQL which would seem to do roughly what you're asking:
with mytable AS (
select 'A' AS [Key], 1 AS [Value] UNION ALL
select 'B', 4 UNION ALL
select 'C', 2 UNION ALL
select 'D', 2 UNION ALL
select 'E', 5 UNION ALL
select 'F', 1
)
, Sums AS (
select T1.[Key] AS T1K
, T2.[Key] AS T2K
, (SELECT SUM([Value])
FROM mytable T3
WHERE T3.[Key] <= T2.[Key]
AND T3.[Key] >= T1.[Key]) AS TheSum
from mytable T1
inner join mytable T2
on T2.[Key] >= T1.[Key]
)
select S1.T1K AS StartKey
, S1.T2K AS EndKey
, S1.TheSum
from Sums S1
left join Sums S2
on (S1.T1K >= S2.T1K and S1.T2K <= S2.T2K)
and S2.TheSum > S1.TheSum
and S2.TheSum <= 5
where S1.TheSum <= 5
AND S2.T1K IS NULL
When I ran this code on SQL Server 2008 I got the following results:
StartKey   EndKey   Sum
A          B        5
C          D        4
E          E        5
F          F        1
It should be straightforward to construct the required groups from these results.
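Building on that, one way to turn those StartKey/EndKey ranges into the numbered groups from the question is to number the ranges and join them back to the rows they cover. A sketch only, reusing the sample data and the result rows shown above:
-- hypothetical follow-up: number the ranges and expand them back to rows
WITH ranges AS (
    -- the StartKey/EndKey/TheSum result produced by the query above
    SELECT 'A' AS StartKey, 'B' AS EndKey, 5 AS TheSum UNION ALL
    SELECT 'C', 'D', 4 UNION ALL
    SELECT 'E', 'E', 5 UNION ALL
    SELECT 'F', 'F', 1
), mytable AS (
    SELECT 'A' AS [Key], 1 AS [Value] UNION ALL
    SELECT 'B', 4 UNION ALL
    SELECT 'C', 2 UNION ALL
    SELECT 'D', 2 UNION ALL
    SELECT 'E', 5 UNION ALL
    SELECT 'F', 1
)
SELECT DENSE_RANK() OVER (ORDER BY r.StartKey) AS [Group],
       t.[Key],
       t.[Value]
FROM ranges r
JOIN mytable t
  ON t.[Key] BETWEEN r.StartKey AND r.EndKey
ORDER BY [Group], t.[Key];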
If you want each set to have at most two members, you can use the following query:
Select A.[Key] as K1,
       B.[Key] as K2,
       isnull(A.value, 0) as V1,
       isnull(B.value, 0) as V2,
       (A.value + B.value) as Total
from Table_1 as A
left join Table_1 as B
  on A.value + B.value <= 5 and A.[Key] <> B.[Key]
For finding sets having more members, you can continue to use joins.
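As a rough illustration of that last point, a three-member candidate list could add one more self-join (a sketch only; it uses < on the keys so each combination appears once, and, like the two-member version, it lists candidate sets rather than carving the table into disjoint groups):
-- sketch: candidate sets of three members whose total stays within the limit
Select A.[Key] as K1,
       B.[Key] as K2,
       C.[Key] as K3,
       A.value + B.value + C.value as Total
from Table_1 as A
join Table_1 as B on A.[Key] < B.[Key]
join Table_1 as C on B.[Key] < C.[Key]
where A.value + B.value + C.value <= 5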