SQL Server : most frequent value in each row - sql

How can I find the most frequent value in each row in SQL Server?
Example:
1 a d a a c a b --> a
2 b a c b b b d --> b
3 h a h h b c d --> h
4 d d c h g p m --> d
5 e e g h d e h --> e
In the first row, 'a' is the most frequent value, and so on.

Assuming these values are in separate columns, a solution using an UNPIVOT query would look something like this:
Test Data
CREATE TABLE #T (ID INT , Col1 varchar(1) , Col2 varchar(1) , Col3 varchar(1)
, Col4 varchar(1) , Col5 varchar(1) , Col6 varchar(1) , Col7 varchar(1));
Insert Into #T values
('1','a','d','a','a','c','a','b'),
('2','b','a','c','b','b','b','d'),
('3','h','a','h','h','b','c','d'),
('4','d','d','c','h','g','p','m'),
('5','e','e','g','h','d','e','h');
Query
WITH X AS (
Select ID , Val, COUNT(*)total
,ROW_NUMBER() OVER (PARTITION BY ID ORDER BY COUNT(*) DESC) rn
from #T
UNPIVOT (Val FOR N IN (Col1,Col2,Col3,Col4,Col5,Col6,Col7))up
GROUP BY ID , Val
)
Select t.* , Val
FROM X
INNER JOIN #T t ON x.ID = t.ID
WHERE rn = 1
Result Set
+----+------+------+------+------+------+------+------+-----+
| ID | Col1 | Col2 | Col3 | Col4 | Col5 | Col6 | Col7 | Val |
+----+------+------+------+------+------+------+------+-----+
| 1 | a | d | a | a | c | a | b | a |
| 2 | b | a | c | b | b | b | d | b |
| 3 | h | a | h | h | b | c | d | h |
| 4 | d | d | c | h | g | p | m | d |
| 5 | e | e | g | h | d | e | h | e |
+----+------+------+------+------+------+------+------+-----+
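The same unpivot, count and rank idea can be tried outside SQL Server. The sketch below is an assumption-laden port, not the answer's own code: it uses SQLite through Python's `sqlite3` module (window functions need SQLite 3.25 or later), replaces `UNPIVOT` with `UNION ALL`, and the table name `t` and the CTE names are made up for illustration. It also adds `val` as a tie-breaker, because `ROW_NUMBER()` ordered only by the count picks an arbitrary value when two values are equally frequent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (id INTEGER, col1 TEXT, col2 TEXT, col3 TEXT, col4 TEXT,
                col5 TEXT, col6 TEXT, col7 TEXT);
INSERT INTO t VALUES
 (1,'a','d','a','a','c','a','b'),
 (2,'b','a','c','b','b','b','d'),
 (3,'h','a','h','h','b','c','d'),
 (4,'d','d','c','h','g','p','m'),
 (5,'e','e','g','h','d','e','h');
""")

query = """
WITH unpivoted AS (      -- one row per (id, value); stands in for UNPIVOT
    SELECT id, col1 AS val FROM t UNION ALL
    SELECT id, col2 FROM t UNION ALL
    SELECT id, col3 FROM t UNION ALL
    SELECT id, col4 FROM t UNION ALL
    SELECT id, col5 FROM t UNION ALL
    SELECT id, col6 FROM t UNION ALL
    SELECT id, col7 FROM t
),
counted AS (             -- how often each value occurs within a row
    SELECT id, val, COUNT(*) AS total
    FROM unpivoted
    GROUP BY id, val
),
ranked AS (              -- rank values per id; val breaks ties deterministically
    SELECT id, val, total,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY total DESC, val) AS rn
    FROM counted
)
SELECT id, val, total FROM ranked WHERE rn = 1 ORDER BY id;
"""

most_frequent = conn.execute(query).fetchall()
for row in most_frequent:
    print(row)
```

If ties should return every tied value instead of one, `RANK()` could replace `ROW_NUMBER()` in the `ranked` step.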

Related

Full outer join not giving the answer I need

I am using PostgreSQL and am having difficulty with getting a series of queries that combine the data from two tables (t1, t2)
t1 is
studyida | gender | age
---------+--------+----
a        | M      | 1
a        | M      | 2
a        | M      | 3
b        | F      | 4
b        | F      | 5
b        | F      | 6
c        | M      | 13
c        | M      | 14
c        | M      | 15
and t2 is
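studyida | studyidb | gender | age
---------+----------+--------+----
a        | z        | M      | 3
a        | z        | M      | 4
a        | z        | M      | 5
NULL     | y        | F      | 7
NULL     | y        | F      | 8
NULL     | y        | F      | 9
c        | x        | M      | 10
c        | x        | M      | 11
c        | x        | M      | 12
NULL     | w        | F      | 7
NULL     | w        | F      | 8
NULL     | w        | F      | 9
NULL     | u        | M      | 7
NULL     | u        | M      | 8
NULL     | u        | M      | 9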
t1 and t2 are related via StudyIDA and gender. What I need is a comprehensive listing from both tables, including the ages. Sometimes the age in t1 equals the age in t2 (e.g. for StudyIDA=a, age=3) but most of the time it does not.
I am looking to create a table like this
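StudyIDA | StudyIDB | gender | ageA | ageB
---------+----------+--------+------+-----
a        | z        | M      | 1    |
a        | z        | M      | 2    |
a        | z        | M      | 3    | 3
a        | z        | M      |      | 4
a        | z        | M      |      | 5
b        | NULL     | F      | 4    |
b        | NULL     | F      | 5    |
b        | NULL     | F      | 6    |
NULL     | y        | F      |      | 7
NULL     | y        | F      |      | 8
NULL     | y        | F      |      | 9
c        | x        | F      | 13   |
c        | x        | F      | 14   |
c        | x        | F      | 15   |
c        | x        | F      |      | 10
c        | x        | F      |      | 11
c        | x        | F      |      | 12
NULL     | w        | F      |      | 7
NULL     | w        | F      |      | 8
NULL     | w        | F      |      | 9
NULL     | u        | M      |      | 7
NULL     | u        | M      |      | 8
NULL     | u        | M      |      | 9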
I was thinking that first a full outer join of t1 and t2 would give me what I want but it does not.
Then I was thinking I need a listing of all the individuals (let's call it t3), followed by a series of inserts (e.g. t1+t3 and also t2+t3) into a new table to 'construct' what I need. I am really stuck on the odd times when the age in t1 equals the age in t2 (e.g. for StudyIDA=a, age=3).
I am still not getting what I need. Here is my code so far
DROP TABLE IF EXISTS t1, t2, t3;
CREATE TEMPORARY TABLE t1 (StudyIDA VARCHAR, gender VARCHAR, age int);
INSERT INTO t1 VALUES
('a','M', 1),('a','M', 2),('a','M', 3),
('b','F', 4),('b','F', 5),('b','F', 6),
('c','M', 13),('c','M', 14),('c','M', 15);
SELECT * FROM t1;
CREATE TEMPORARY TABLE t2 (StudyIDA VARCHAR, StudyIDB varchar, gender VARCHAR, age int);
INSERT INTO t2 VALUES
('a','z','M', 3), ('a','z','M', 4), ('a','z','M', 5),
(NULL,'y','F', 7),(NULL,'y','F', 8),(NULL,'y','F', 9),
('c','x','M', 10),('c','x','M', 11),('c','x','M', 12),
(NULL,'w','F', 7),(NULL,'w','F', 8),(NULL,'w','F', 9),
(NULL,'u','M', 7),(NULL,'u','M', 8),(NULL,'u','M', 9);
SELECT * FROM t2;
CREATE TEMPORARY TABLE t3 (StudyIDA_t1 VARCHAR, gender_t1 VARCHAR, StudyIDA_t2 VARCHAR,StudyIDB varchar,
gender_t2 VARCHAR);
INSERT INTO t3
SELECT * FROM (SELECT DISTINCT StudyIDA, gender FROM t1) a FULL OUTER JOIN
(SELECT DISTINCT StudyIDA, StudyIDB, gender FROM t2) b
ON a.StudyIDA=b.StudyIDA AND a.gender=b.gender
ORDER BY a.StudyIDA;
SELECT * FROM t3 ORDER BY StudyIDA_t1;
SELECT 'IN t1', *
FROM t3 JOIN t1 on t1.StudyIDA=t3.StudyIDA_t1 AND t1.gender=t3.gender_t1
ORDER BY StudyIDA_t1, StudyIDB;
SELECT 'In t2',*
FROM t3 JOIN t2 on t3.StudyIDA_t1=t2.StudyIDA AND t3.gender_t1=t2.gender
ORDER BY StudyIDA_t1, t3.StudyIDB;
DROP TABLE IF EXISTS t1, t2, t3;
Maybe a full join that also matches on age?
And some COALESCE calls for the common fields.
SELECT DISTINCT
COALESCE(t1.StudyIDA, t2.StudyIDA) AS StudyIDA
, t2.StudyIDB
, COALESCE(t1.gender, t2.gender) AS gender
, t1.age as ageA
, t2.age as ageB
FROM t1
FULL JOIN t2
ON t2.StudyIDA is not distinct from t1.StudyIDA
AND t2.gender = t1.gender
AND t2.age = t1.age
ORDER BY StudyIDA, gender, ageA, ageB;
studyida | studyidb | gender | agea | ageb
:------- | :------- | :----- | ---: | ---:
a | null | M | 1 | null
a | null | M | 2 | null
a | z | M | 3 | 3
a | z | M | null | 4
a | z | M | null | 5
b | null | F | 4 | null
b | null | F | 5 | null
b | null | F | 6 | null
c | null | M | 13 | null
c | null | M | 14 | null
c | null | M | 15 | null
c | x | M | null | 10
c | x | M | null | 11
c | x | M | null | 12
null | w | F | null | 7
null | y | F | null | 7
null | w | F | null | 8
null | y | F | null | 8
null | w | F | null | 9
null | y | F | null | 9
null | u | M | null | 7
null | u | M | null | 8
null | u | M | null | 9
Your sample data indicates that only t2.studyida can be NULL and all other columns should really be declared as NOT NULL.
If so, I suggest this simpler query:
SELECT studyida, b.studyidb, gender, age
, CASE WHEN a.age IS NULL THEN 'b'
WHEN b.age IS NULL THEN 'a'
ELSE 'a and b' END as source
FROM t1 a
FULL JOIN t2 b USING (studyida, gender, age)
ORDER BY studyida, gender, age;
The USING clause is convenient for identically named join columns. Only a single instance of the joining column is in the result set, effectively what COALESCE(a.col, b.col) gives you otherwise. (You might just use SELECT *.)
You can still reference source columns with table-qualification, like a.age.
I reduced to a single age column and added source. You may or may not want that.
Either way, "age" is subject to bitrot, almost always the wrong choice for a table column, and should typically be replaced by "birthday" or similar.
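To check what the full join on (studyida, gender, age) produces for the sample data, here is a minimal sketch using SQLite through Python's `sqlite3` module. It rests on two assumptions that are not in the original: older SQLite versions have no `FULL JOIN`, so the join is emulated with a `LEFT JOIN` plus an anti-join over `UNION ALL`, and the `source` labels follow the `CASE` expression from the query above. Rows of t2 with a NULL studyida can never satisfy `=`, so they always land in the "only in b" branch.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1 (studyida TEXT, gender TEXT, age INTEGER);
INSERT INTO t1 VALUES
 ('a','M',1),('a','M',2),('a','M',3),
 ('b','F',4),('b','F',5),('b','F',6),
 ('c','M',13),('c','M',14),('c','M',15);
CREATE TABLE t2 (studyida TEXT, studyidb TEXT, gender TEXT, age INTEGER);
INSERT INTO t2 VALUES
 ('a','z','M',3),('a','z','M',4),('a','z','M',5),
 (NULL,'y','F',7),(NULL,'y','F',8),(NULL,'y','F',9),
 ('c','x','M',10),('c','x','M',11),('c','x','M',12),
 (NULL,'w','F',7),(NULL,'w','F',8),(NULL,'w','F',9),
 (NULL,'u','M',7),(NULL,'u','M',8),(NULL,'u','M',9);
""")

# Emulated FULL JOIN: every t1 row (matched or not), then the unmatched t2 rows.
query = """
SELECT a.studyida, b.studyidb, a.gender, a.age,
       CASE WHEN b.age IS NULL THEN 'a' ELSE 'a and b' END AS source
FROM t1 a
LEFT JOIN t2 b
       ON b.studyida = a.studyida AND b.gender = a.gender AND b.age = a.age
UNION ALL
SELECT b.studyida, b.studyidb, b.gender, b.age, 'b'
FROM t2 b
WHERE NOT EXISTS (SELECT 1 FROM t1 a
                  WHERE a.studyida = b.studyida
                    AND a.gender = b.gender
                    AND a.age = b.age);
"""

rows = conn.execute(query).fetchall()
per_source = Counter(r[4] for r in rows)
print(len(rows), dict(per_source))
```

With this data only (a, M, 3) exists in both tables, so the 9 + 15 input rows collapse to 23 output rows, the same count as the result listing earlier in this thread.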

sql - Deletion in closure table with multiple same paths

I have the following hierarchical structure:
A -> E -> C -> D
|
|
|-> B -> D
Here is the closure table I've come up with:
| Ancestor | Descendant | Depth |
| A | A | 0 |
| B | B | 0 |
| C | C | 0 |
| D | D | 0 |
| E | E | 0 |
| A | E | 1 |
| A | B | 1 |
| A | C | 2 |
| E | C | 1 |
| A | D | 3 |
| E | D | 2 |
| C | D | 1 |
| A | D | 2 |
| B | D | 1 |
I want to remove the link between B and D, and therefore I want to delete the link between A and D (the one of depth 2). The problem is that I don't want to delete the link between A and D of depth 3 since I didn't delete the link between C and D.
For the moment, here is the SQL statement to list the links I want to delete:
SELECT link.ancestor, link.descendant, link.depth
FROM closure_table p,
closure_table link,
closure_table c
WHERE p.ancestor = link.ancestor
AND c.descendant = link.descendant
AND p.descendant = 'B'
AND c.ancestor = 'D';
but this statement gives me rows I don't want to delete:
| Ancestor | Descendant | Depth |
| A | D | 2 |
| A | D | 3 | <- As said before, I want to keep this one
| B | D | 1 |
You can select the ancestor-descendant pair that has the minimum depth of all of those same ancestor-descendant pairs:
with edges(s, e) as (
-- the pairs to be removed
select 'A', 'D'
union all
select 'B', 'D'
),
n_l as (
select c.* from closure c where c.ancestor != c.descendant
)
select c.* from n_l c where exists (select 1 from edges e where e.s = c.ancestor and e.e = c.descendant)
and c.depth = (select min(c1.depth) from n_l c1 where c1.ancestor = c.ancestor and c1.descendant = c.descendant);
Output:
| ancestor | descendant | depth |
| A | D | 2 |
| B | D | 1 |
I think I’ve found the solution, for those who are interested:
declare @Descendant nchar(10) = 'D';
declare @Ancestor nchar(10) = 'B';
with cte as
(
select Ancestor, Depth
from closure_table
where Descendant = @Descendant
and Ancestor = @Ancestor
and Depth = 1
union all
select r.Ancestor, l.Depth + 1 as Depth
from cte as l
join closure_table as r on r.Descendant = l.Ancestor
where r.Depth = 1
)
delete closure_table
from closure_table
join cte on cte.Ancestor = closure_table.Ancestor and cte.Depth = closure_table.Depth
where closure_table.Descendant = @Descendant;
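Another way to look at it: a path that runs through the edge B -> D has a length equal to the depth of (ancestor -> B), plus 1 for the edge, plus the depth of (D -> descendant). Adding that condition to the three-way self-join from the question keeps the A -> D row of depth 3. The sketch below is one possible reading, not a tested production query: it uses SQLite through Python's `sqlite3` module, relies on SQLite's implicit `rowid` to target rows, and the name `PATHS_THROUGH_EDGE` is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE closure_table (ancestor TEXT, descendant TEXT, depth INTEGER);
INSERT INTO closure_table VALUES
 ('A','A',0),('B','B',0),('C','C',0),('D','D',0),('E','E',0),
 ('A','E',1),('A','B',1),('A','C',2),('E','C',1),
 ('A','D',3),('E','D',2),('C','D',1),('A','D',2),('B','D',1);
""")

# p: paths ending at the parent, c: paths starting at the child,
# link: candidate rows whose depth equals the path length through the edge.
PATHS_THROUGH_EDGE = """
FROM closure_table p
JOIN closure_table link ON link.ancestor = p.ancestor
JOIN closure_table c ON c.descendant = link.descendant
WHERE p.descendant = :parent
  AND c.ancestor = :child
  AND link.depth = p.depth + c.depth + 1
"""
edge = {"parent": "B", "child": "D"}

doomed = conn.execute(
    "SELECT link.ancestor, link.descendant, link.depth"
    + PATHS_THROUGH_EDGE + "ORDER BY link.ancestor", edge).fetchall()

conn.execute(
    "DELETE FROM closure_table WHERE rowid IN (SELECT link.rowid"
    + PATHS_THROUGH_EDGE + ")", edge)

remaining = conn.execute(
    "SELECT ancestor, descendant, depth FROM closure_table").fetchall()
print(doomed)
```

One limitation to keep in mind: if two different paths between the same pair of nodes happen to have the same length, their rows are identical and the depth test cannot tell them apart, so both would be deleted.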

Update multiple rows with values from a specific row based on same ID

I want to optimize my query, possibly with a CTE and window functions. I am updating existing rows with data from another row that has the same ID number. The code is for MS SQL.
LinkTable:
ID | TYPE | value1 | Value2 | Value3
-----------------------------------------------
234 | MAT | a | b | c
234 | PMS | null | null | null
234 | AMN | null | null | null
45 | MAT | x | m | n
45 | LKM | null | null | null
45 | DFG | null | null | null
3 | MAT | k | s | q
3 | LKM | null | null | null
The result should be:
ID | TYPE | value1 | Value2 | Value3
-----------------------------------------------
234 | MAT | a | b | c
234 | PMS | a | b | c
234 | AMN | a | b | c
45 | MAT | x | m | n
45 | LKM | x | m | n
45 | DFG | x | m | n
3 | MAT | k | s | q
3 | LKM | k | s | q
I used this code:
UPDATE m
SET m.[value1] = l.[value1]
, m.[value2] = l.[value2]
, m.[value2] = l.[value3]
FROM #LinkTable m
INNER JOIN #LinkTable l on l.[ID] = m.[ID]
WHERE l.[type] = 'MAT'
It also updates the main row from which I take the values.
Could anyone help?
Your code is basically fine, but I would add some filters:
UPDATE m
SET m.[value1] = l.[value1],
m.[value2] = l.[value2],
m.[value3] = l.[value3]
FROM #LinkTable m JOIN
#LinkTable l
ON l.[ID] = m.[ID]
WHERE l.[type] = 'MAT' AND
m.type <> 'MAT';
Note: You also have an error in the SET clause. The column value2 is set twice.
You can use window functions like this:
UPDATE m
SET m.[value1] = m.MATvalue1
, m.[value2] = m.MATvalue2
, m.[value3] = m.MATvalue3
FROM (
SELECT *,
MATvalue1 = MIN(CASE WHEN m.[type] = 'MAT' THEN m.value1 END) OVER (PARTITION BY m.ID),
MATvalue2 = MIN(CASE WHEN m.[type] = 'MAT' THEN m.value2 END) OVER (PARTITION BY m.ID),
MATvalue3 = MIN(CASE WHEN m.[type] = 'MAT' THEN m.value3 END) OVER (PARTITION BY m.ID)
FROM #LinkTable m
) m
WHERE m.[type] <> 'MAT';
Note that this may not necessarily be more performant than the self-join answer above.
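For a quick check that only the non-MAT rows change, the update can be replayed on the sample data. This is a sketch under assumptions: it runs on SQLite through Python's `sqlite3` module and swaps the T-SQL `UPDATE ... FROM` join for a correlated subquery with a row-value assignment (SQLite 3.15 or later), so it illustrates the filter `type <> 'MAT'` and is not a drop-in statement for MS SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE LinkTable (ID INTEGER, type TEXT,
                        value1 TEXT, value2 TEXT, value3 TEXT);
INSERT INTO LinkTable VALUES
 (234,'MAT','a','b','c'),(234,'PMS',NULL,NULL,NULL),(234,'AMN',NULL,NULL,NULL),
 (45,'MAT','x','m','n'),(45,'LKM',NULL,NULL,NULL),(45,'DFG',NULL,NULL,NULL),
 (3,'MAT','k','s','q'),(3,'LKM',NULL,NULL,NULL);
""")

# Copy the three values from the MAT row of the same ID into every other row.
cur = conn.execute("""
UPDATE LinkTable
SET (value1, value2, value3) = (
        SELECT l.value1, l.value2, l.value3
        FROM LinkTable l
        WHERE l.ID = LinkTable.ID AND l.type = 'MAT')
WHERE type <> 'MAT'
""")
updated = cur.rowcount  # rows touched by the UPDATE; MAT rows are excluded

rows = conn.execute(
    "SELECT ID, type, value1, value2, value3 FROM LinkTable ORDER BY rowid"
).fetchall()
for row in rows:
    print(row)
```

The sketch assumes exactly one MAT row per ID, as in the sample data. With several MAT rows per ID the subquery would pick one of them arbitrarily.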

How do I output a set of columns and values, with one column per row as key value pairs in presto?

I have data of the form
id | col1 | col2 | col3 | col4 | col5 | col6 |
----------------------------------------------
1 | a | b | c | d | e | f |
2 | a | b | c | d | e | f |
3 | a | b | c | d | e | f |
that I'm trying to get into the form
id | key | value |
------------------
1 | col1| a
1 | col2| b
1 | col3| c
1 | col4| d
1 | col5| e
1 | col6| f
2 | col1| a
2 | col2| b
2 | col3| c
2 | col4| d
2 | col5| e
2 | col6| f
3 | col1| a
3 | col2| b
3 | col3| c
3 | col4| d
3 | col5| e
3 | col6| f
and I can't for the life of me figure out how to do it. I can accomplish the opposite and turn a map into a single row based on a key with something like the following, but I'm not sure how to go from a single row to many rows based on the columns.
SELECT
id,
key['a'] AS col1,
key['b'] AS col2
FROM (
SELECT id, map_agg(key, value) key
FROM table_a
GROUP BY id
) temp
Is this something that is possible in presto?
You can zip arrays and unnest them using CROSS JOIN UNNEST: build one array of column names and one array of column values, then unnest both together.
with test (id,col1,col2,col3,col4,col5,col6) AS (
values
(1,'a','b','c','d','e','f'),
(2,'a','b','c','d','e','f'),
(3,'a','b','c','d','e','f')
)
select id, k, v
from test
cross join unnest(
array['col1', 'col2', 'col3', 'col4','col5', 'col6']
, array[col1, col2, col3, col4, col5, col6]
) as x(k, v)
I don't have Presto/Athena on hand to test this, but I think the approach is the following (UNNEST should expand an array of rows into one column per field):
select t.id, kv.key, kv.value
from (select t.*,
             array[row('col1', col1),
                   row('col2', col2),
                   row('col3', col3),
                   row('col4', col4),
                   row('col5', col5),
                   row('col6', col6)
                  ] as kv_ar
      from t
     ) t cross join
     unnest(kv_ar) as kv(key, value)

Joining two queries returns much more rows than expected?

I have two queries. They both return around 60 rows, but after joining them I get 900 rows. Is there a way to get the 60 rows while joining them?
Query 1:
SELECT
f.id_user,
f.topup_date,
f.topup_value,
LEAD(f.topup_date) OVER (PARTITION BY(f.id_user) ORDER BY f.topup_date DESC),
f.topup_date::timestamp - LEAD(f.topup_date::timestamp) OVER (PARTITION BY(f.id_user) ORDER BY f.topup_date DESC),
CASE WHEN f.topup_value >= 20 THEN 'Y' ELSE 'N' end,
CASE WHEN f.topup_value >= 20 THEN LEAD(f.topup_date) OVER (PARTITION BY (f.id_user) ORDER BY f.topup_date DESC) END
FROM topups AS f
Query 2:
SELECT
CAST(t2.topup_value as float)/CAST(t1.topup_value as float)
FROM (
SELECT
t1.id_user,
t1.topup_value,
ROW_NUMBER() OVER (PARTITION BY t1.id_user ORDER BY t1.topup_date ) AS rowrank
FROM topups t1
) AS t1
INNER JOIN topups t2 ON t1.id_user=t2.id_user
WHERE t1.rowrank = 1
GROUP BY
t2.id_user,
t2.topup_value,
t2.topup_date,
t1.topup_value,
t1.rowrank
ORDER BY
t2.id_user,
t2.topup_date DESC
Joined query:
SELECT
f.id_user,
f.topup_date,
f.topup_value,
LEAD(f.topup_date) OVER (PARTITION BY(f.id_user) ORDER BY f.topup_date DESC),
f.topup_date::timestamp - LEAD(f.topup_date::timestamp) OVER (PARTITION BY(f.id_user) ORDER BY f.topup_date DESC),
CASE WHEN f.topup_value >= 20 then 'Y' ELSE 'N' END,
CASE WHEN f.topup_value >= 20 THEN LEAD(f.topup_date) OVER (PARTITION BY (f.id_user) ORDER BY f.topup_date desc) END,
CAST(t2.topup_value AS float)/CAST(t1.topup_value AS float)
FROM (
SELECT
t1.id_user,
t1.topup_value,
ROW_NUMBER() OVER (PARTITION BY t1.id_user ORDER BY t1.topup_date ) AS rowrank
FROM topups t1
) AS t1
INNER JOIN topups t2 ON t1.id_user = t2.id_user
INNER JOIN topups f ON f.id_user = t2.id_user
WHERE t1.rowrank = 1
GROUP BY
f.id_user,
f.topup_date,
f.topup_value,
t2.topup_value,
t1.topup_value,
t2.id_user,
t2.topup_date
ORDER BY
t2.id_user,
t2.topup_date DESC,
f.id_user,
f.topup_date DESC
You want to join two query results. For each row in one query result you expect to find one row in the other query result. So, look at the first row in the first query result. You seem to want to join it with exactly one row in the second query result. Which row is this? Which columns do you compare in order to find this matching row?
Let's say these are your query results:
col1 | col4 | col7 | col6 | col3
-----+------+------+------+-----
A | B | 100 | 110 | E
A | B | 19 | 22 | E
F | G | 80 | 78 | H
F | I | 22 | 12 | J
and
col4 | col2 | col1 | col3 | col8
-----+------+------+------+-----
B | 333 | A | E | 89
B | 211 | A | E | 84
G | 815 | F | H | 77
I | 639 | F | J | 79
You want some result like this:
col1 | col4 | col7 | col6 | col3 | col4 | col2 | col1 | col3 | col8
-----+------+------+------+------+------+------+------+------+-----
A | B | 100 | 110 | E | B | 333 | A | E | 89
A | B | 19 | 22 | E | B | 211 | A | E | 84
F | G | 80 | 78 | H | G | 815 | F | H | 77
F | I | 22 | 12 | J | I | 639 | F | J | 79
but you are getting something like this instead:
col1 | col4 | col7 | col6 | col3 | col4 | col2 | col1 | col3 | col8
-----+------+------+------+------+------+------+------+------+-----
A | B | 100 | 110 | E | B | 333 | A | E | 89
A | B | 100 | 110 | E | B | 211 | A | E | 84
A | B | 19 | 22 | E | B | 333 | A | E | 89
A | B | 19 | 22 | E | B | 211 | A | E | 84
F | G | 80 | 78 | H | G | 815 | F | H | 77
F | G | 80 | 78 | H | I | 639 | F | J | 79
F | I | 22 | 12 | J | G | 815 | F | H | 77
F | I | 22 | 12 | J | I | 639 | F | J | 79
You are getting this result because you picked just one column to join the two query results (id_user in your case, col1 in mine). Look at the first row of the first query result above. It has col1 = 'A'. If I join the second query result on col1, then there are two matching rows, because the second query result has two rows with col1 = 'A'. I end up with many more matches than I want.
So, what are the columns we want to match? In my example it is col1, col3, and col4. Look at the first row of the first query result again. It has col1 = 'A' and col3 = 'E' and col4 = 'B'. The join must compare all of these columns so that each row finds only its own partner in the second result set. My query would hence be
select *
from (<query 1 here>) q1
join (<query 2 here>) q2 on q2.col1 = q1.col1 and q2.col3 = q1.col3 and q2.col4 = q1.col4;
Or I would rather explicitly say which columns I want to see in my result and remove duplicate columns:
select q1.col1, q2.col4, q1.col7, q1.col6, q1.col3, q2.col2, q2.col8
from (<query 1 here>) q1
join (<query 2 here>) q2 on q2.col1 = q1.col1 and q2.col3 = q1.col3 and q2.col4 = q1.col4
order by q1.col1, q2.col4, q1.col7;
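The row multiplication described above is easy to reproduce. The following sketch uses SQLite through Python's `sqlite3` module with a made-up `topups` table of six rows. It assumes that (id_user, topup_date) identifies a row uniquely, which the question does not state, and the helper `count` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE topups (id_user INTEGER, topup_date TEXT, topup_value REAL);
INSERT INTO topups VALUES
 (1,'2020-01-01',10),(1,'2020-01-05',20),(1,'2020-01-09',30),
 (2,'2020-02-01', 5),(2,'2020-02-03',15),(2,'2020-02-07',25);
""")

def count(sql):
    """Return the single number produced by a SELECT COUNT(*) query."""
    return conn.execute(sql).fetchone()[0]

# Joining on id_user alone pairs every row of a user with every other row
# of the same user: 3 x 3 combinations for each of the two users.
on_user_only = count("""
    SELECT COUNT(*)
    FROM topups q1
    JOIN topups q2 ON q2.id_user = q1.id_user
""")

# Joining on the full (assumed) key pairs each row with exactly one partner.
on_full_key = count("""
    SELECT COUNT(*)
    FROM topups q1
    JOIN topups q2 ON q2.id_user = q1.id_user
                  AND q2.topup_date = q1.topup_date
""")

print(on_user_only, on_full_key)
```

The join columns only prevent the blow-up if, taken together, they are unique on at least one side of the join. If they are not, rows that share the same values still multiply.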