How to get distinct values from multiple columns in a single row in Oracle SQL

I have a row of data like this
id first_cd sec_cd third_cd fourth_cd fifth_cd sixth_cd
1 A B null C C D
The output should be:
id first_cd sec_cd third_cd fourth_cd fifth_cd sixth_cd
1 A B C D D D
I need to get the distinct values from the columns and remove the nulls where there are any.
Here first_cd ... sixth_cd are columns on the same row, and 1 A B null C C D are their values.
Is there any way to do this in Oracle SQL?

This is a good place to use a lateral join (Oracle 12c and later):
select t.*, x.*
from t cross join lateral
     (select max(case when seqnum = 1 then cd end) as cd1,
             max(case when seqnum = 2 then cd end) as cd2,
             max(case when seqnum = 3 then cd end) as cd3,
             max(case when seqnum = 4 then cd end) as cd4,
             max(case when seqnum = 5 then cd end) as cd5,
             max(case when seqnum = 6 then cd end) as cd6
      from (-- keep each non-null code once, numbered by its first position
            select cd, row_number() over (order by min(n)) as seqnum
            from (select t.first_cd as cd, 1 as n from dual union all
                  select t.sec_cd, 2 from dual union all
                  select t.third_cd, 3 from dual union all
                  select t.fourth_cd, 4 from dual union all
                  select t.fifth_cd, 5 from dual union all
                  select t.sixth_cd, 6 from dual
                 ) v
            where cd is not null
            group by cd
           ) s
     ) x;
Note: This returns NULL for the unused trailing columns (rather than repeating D as in your sample output), which seems more in line with your problem.
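To try the unpivot, de-duplicate, renumber, pivot idea without an Oracle instance, here is a minimal sketch using Python's built-in sqlite3 module. SQLite has no LATERAL, so this sketch assumes it is fine to unpivot the whole table at once and partition by id instead. The table t and its columns come from the question; the helper distinct_codes is hypothetical. Rows whose six codes are all NULL produce no output row in this sketch.

```python
import sqlite3

COLS = ["first_cd", "sec_cd", "third_cd", "fourth_cd", "fifth_cd", "sixth_cd"]

SQL = """
with v as (            -- unpivot: one row per (id, code, original position)
  select id, first_cd as cd, 1 as n from t union all
  select id, sec_cd,    2 from t union all
  select id, third_cd,  3 from t union all
  select id, fourth_cd, 4 from t union all
  select id, fifth_cd,  5 from t union all
  select id, sixth_cd,  6 from t
),
d as (                 -- drop NULLs; keep each code once, at its first position
  select id, cd, min(n) as n
  from v
  where cd is not null
  group by id, cd
),
s as (                 -- renumber the surviving codes 1, 2, 3, ... per id
  select id, cd, row_number() over (partition by id order by n) as seqnum
  from d
)
select id,
       max(case when seqnum = 1 then cd end) as cd1,
       max(case when seqnum = 2 then cd end) as cd2,
       max(case when seqnum = 3 then cd end) as cd3,
       max(case when seqnum = 4 then cd end) as cd4,
       max(case when seqnum = 5 then cd end) as cd5,
       max(case when seqnum = 6 then cd end) as cd6
from s
group by id
order by id
"""


def distinct_codes(rows):
    """Hypothetical helper: rows are (id, first_cd, ..., sixth_cd) tuples.

    Returns {id: [cd1, ..., cd6]}; unused slots are None, and ids whose
    codes are all NULL are absent from the result.
    """
    con = sqlite3.connect(":memory:")
    try:
        con.execute(f"create table t (id, {', '.join(COLS)})")
        con.executemany("insert into t values (?, ?, ?, ?, ?, ?, ?)", rows)
        return {r[0]: list(r[1:]) for r in con.execute(SQL)}
    finally:
        con.close()


print(distinct_codes([(1, "A", "B", None, "C", "C", "D")]))
# {1: ['A', 'B', 'C', 'D', None, None]}
```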

Related

How to find the highest and second highest entry in SQL in a single query using `GROUP BY`?

Let this be the table that is provided.
PID | TID | Type | Freq
1   | 1   | A    | 3
1   | 1   | A    | 2
1   | 1   | A    | 1
1   | 1   | B    | 3
1   | 2   | A    | 4
1   | 2   | B    | 5
I want to write a query to get an output like this.
PID | TID | Type | Max_Freq_1 | Max_Freq_2
1   | 1   | A    | 3          | 2
1   | 1   | B    | 3          | NULL
1   | 2   | A    | 4          | NULL
1   | 2   | B    | 5          | NULL
That is, given a combination of PID, TID, Type, what are the highest and second-highest frequencies? If there aren't enough entries in the table, the second highest should be NULL.
If your database supports window functions, the top two Freq values can be calculated with the DENSE_RANK function.
SELECT PID, TID, Type
, MAX(CASE WHEN Rnk = 1 THEN Freq END) AS Max_Freq_1
, MAX(CASE WHEN Rnk = 2 THEN Freq END) AS Max_Freq_2
FROM
(
SELECT PID, TID, Type, Freq
, DENSE_RANK() OVER (PARTITION BY PID, TID, Type ORDER BY Freq DESC) AS Rnk
FROM YourTable t
) q
GROUP BY PID, TID, Type
ORDER BY PID, TID, Type
pid | tid | type | max_freq_1 | max_freq_2
1   | 1   | A    | 3          | 2
1   | 1   | B    | 3          | null
1   | 2   | A    | 4          | null
1   | 2   | B    | 5          | null
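The DENSE_RANK query can be checked against the sample table with a small sketch using Python's sqlite3 module (SQLite supports window functions from version 3.25). YourTable is the name used in the answer; the helper top2 is hypothetical.

```python
import sqlite3

# Sample rows from the question: (PID, TID, Type, Freq)
ROWS = [(1, 1, "A", 3), (1, 1, "A", 2), (1, 1, "A", 1),
        (1, 1, "B", 3), (1, 2, "A", 4), (1, 2, "B", 5)]

TOP2_SQL = """
SELECT PID, TID, Type
     , MAX(CASE WHEN Rnk = 1 THEN Freq END) AS Max_Freq_1
     , MAX(CASE WHEN Rnk = 2 THEN Freq END) AS Max_Freq_2
FROM (
    SELECT PID, TID, Type, Freq
         , DENSE_RANK() OVER (PARTITION BY PID, TID, Type ORDER BY Freq DESC) AS Rnk
    FROM YourTable
) q
GROUP BY PID, TID, Type
ORDER BY PID, TID, Type
"""


def top2(rows):
    """Hypothetical helper: load rows into YourTable and run the DENSE_RANK query."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE YourTable (PID, TID, Type, Freq)")
        con.executemany("INSERT INTO YourTable VALUES (?, ?, ?, ?)", rows)
        return con.execute(TOP2_SQL).fetchall()
    finally:
        con.close()


for row in top2(ROWS):
    print(row)  # missing second-highest values come back as None (SQL NULL)
```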
If window functions such as DENSE_RANK aren't available, try this self-join instead:
SELECT PID, TID, Type
, MAX(CASE WHEN Rnk = 1 THEN Freq END) AS Max_Freq_1
, MAX(CASE WHEN Rnk = 2 THEN Freq END) AS Max_Freq_2
FROM
(
SELECT t1.PID, t1.TID, t1.Type, t1.Freq
, COUNT(DISTINCT t2.Freq) AS Rnk
FROM YourTable t1
LEFT JOIN YourTable t2
ON t2.PID = t1.PID
AND t2.TID = t1.TID
AND t2.Type = t1.Type
AND t2.Freq >= t1.Freq
GROUP BY t1.PID, t1.TID, t1.Type, t1.Freq
) q
GROUP BY PID, TID, Type
ORDER BY PID, TID, Type
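The self-join version needs no window functions, so it also runs on SQLite builds older than 3.25. Here is a minimal sketch with Python's sqlite3 module, reusing the sample rows from the question (the helper top2_self_join is hypothetical). The second check adds a duplicated top Freq, invented for illustration, to show that COUNT(DISTINCT t2.Freq) ranks ties the way DENSE_RANK does.

```python
import sqlite3

# COUNT(DISTINCT t2.Freq) over rows with Freq >= t1.Freq numbers the
# distinct Freq values 1, 2, ... from the top, like DENSE_RANK.
SELF_JOIN_SQL = """
SELECT PID, TID, Type
     , MAX(CASE WHEN Rnk = 1 THEN Freq END) AS Max_Freq_1
     , MAX(CASE WHEN Rnk = 2 THEN Freq END) AS Max_Freq_2
FROM (
    SELECT t1.PID, t1.TID, t1.Type, t1.Freq
         , COUNT(DISTINCT t2.Freq) AS Rnk
    FROM YourTable t1
    LEFT JOIN YourTable t2
      ON t2.PID = t1.PID AND t2.TID = t1.TID
     AND t2.Type = t1.Type AND t2.Freq >= t1.Freq
    GROUP BY t1.PID, t1.TID, t1.Type, t1.Freq
) q
GROUP BY PID, TID, Type
ORDER BY PID, TID, Type
"""

SAMPLE = [(1, 1, "A", 3), (1, 1, "A", 2), (1, 1, "A", 1),
          (1, 1, "B", 3), (1, 2, "A", 4), (1, 2, "B", 5)]


def top2_self_join(rows):
    """Hypothetical helper: rows are (PID, TID, Type, Freq) tuples."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE YourTable (PID, TID, Type, Freq)")
        con.executemany("INSERT INTO YourTable VALUES (?, ?, ?, ?)", rows)
        return con.execute(SELF_JOIN_SQL).fetchall()
    finally:
        con.close()


print(top2_self_join(SAMPLE))
```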
This is what I came up with in PostgreSQL. Using a window function such as row_number is the easiest way to get the result you want.
with t as (
    select *, row_number() over (partition by pid, tid, "type" order by freq desc) as r
    from test_so
)
select pid, tid, "type",
       max(case when r = 1 then freq end) as "highest",
       max(case when r = 2 then freq end) as "second_highest"
from t
group by pid, tid, "type"
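One caveat worth checking: when the highest Freq appears twice in the same group, ROW_NUMBER reports it as both the highest and the second highest, while DENSE_RANK skips to the next distinct value. A small SQLite sketch (through Python's sqlite3 module) with invented tied rows shows the difference; the helper highest_two and its fn switch are hypothetical.

```python
import sqlite3

QUERY = """
with t as (
  select *, {fn}() over (partition by pid, tid, "type" order by freq desc) as r
  from test_so
)
select pid, tid, "type",
       max(case when r = 1 then freq end) as highest,
       max(case when r = 2 then freq end) as second_highest
from t
group by pid, tid, "type"
order by pid, tid, "type"
"""


def highest_two(rows, fn):
    """fn is 'row_number' or 'dense_rank'; rows are (pid, tid, type, freq)."""
    if fn not in ("row_number", "dense_rank"):
        raise ValueError(fn)
    con = sqlite3.connect(":memory:")
    try:
        con.execute('create table test_so (pid, tid, "type", freq)')
        con.executemany("insert into test_so values (?, ?, ?, ?)", rows)
        return con.execute(QUERY.format(fn=fn)).fetchall()
    finally:
        con.close()


# Invented rows: the top Freq (3) appears twice in the same group.
TIED = [(1, 1, "A", 3), (1, 1, "A", 3), (1, 1, "A", 2)]
print(highest_two(TIED, "row_number"))  # [(1, 1, 'A', 3, 3)]
print(highest_two(TIED, "dense_rank"))  # [(1, 1, 'A', 3, 2)]
```

Which one is right depends on whether "second highest" means the second row or the second distinct value.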

How to put a grouping variable into columns in SQL?

I have the following dataset
and want to get this:
How can I do it?
In SQL Server, you can use PIVOT, for example:
SELECT Time, [a],[b],[c]
FROM
(
SELECT time, [group],value
FROM dataset) d
PIVOT
(
SUM(value)
FOR [group] IN ([a],[b],[c])
) AS pvt
I changed the column names so they don't conflict with reserved words; otherwise you would have to put them in double quotes (single quotes make string literals, not identifiers).
WITH
-- the input
indata(grp,tm,val) AS (
SELECT 'a',1,44
UNION ALL SELECT 'a',2,22
UNION ALL SELECT 'a',3, 1
UNION ALL SELECT 'b',1, 1
UNION ALL SELECT 'b',2, 5
UNION ALL SELECT 'b',3, 6
UNION ALL SELECT 'c',1, 7
UNION ALL SELECT 'c',2, 8
UNION ALL SELECT 'c',3, 9
)
SELECT tm
, SUM(CASE grp WHEN 'a' THEN val END) AS a
, SUM(CASE grp WHEN 'b' THEN val END) AS b
, SUM(CASE grp WHEN 'c' THEN val END) AS c
FROM indata
GROUP BY tm
;
tm | a | b | c
----+----+---+---
1 | 44 | 1 | 7
2 | 22 | 5 | 8
3 | 1 | 6 | 9
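select * from
(
-- no GROUP BY needed: PIVOT aggregates by itself, and grouping on value
-- would collapse duplicate (time, group, value) rows before the SUM
select
time, [group], value
from yourTable
)
as src
pivot
(
sum([value])
for [group] in ([a],[b],[c])
) as p
order by time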
For Vertica:
SELECT time
, SUM(value) FILTER (WHERE "group" = 'a') a
, SUM(value) FILTER (WHERE "group" = 'b') b
, SUM(value) FILTER (WHERE "group" = 'c') c
FROM yourTable
GROUP BY time
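The two conditional-aggregation forms, SUM(CASE ...) and SUM(...) FILTER (WHERE ...), can be checked side by side in SQLite via Python's sqlite3 module. This sketch assumes the column names time, group and value from the question (so group is double-quoted as a reserved word) and reuses the numbers from the indata CTE. The FILTER clause on aggregates needs SQLite 3.30 or later, and the helper pivot is hypothetical.

```python
import sqlite3

# Sample rows taken from the indata CTE: (group, time, value)
DATA = [("a", 1, 44), ("a", 2, 22), ("a", 3, 1),
        ("b", 1, 1), ("b", 2, 5), ("b", 3, 6),
        ("c", 1, 7), ("c", 2, 8), ("c", 3, 9)]

# Portable pivot: one CASE per output column.
CASE_SQL = """
SELECT time
     , SUM(CASE "group" WHEN 'a' THEN value END) AS a
     , SUM(CASE "group" WHEN 'b' THEN value END) AS b
     , SUM(CASE "group" WHEN 'c' THEN value END) AS c
FROM yourTable GROUP BY time ORDER BY time
"""

# Same pivot with the standard FILTER clause; note the quoted identifier
# "group" and the string literals 'a', 'b', 'c'.
FILTER_SQL = """
SELECT time
     , SUM(value) FILTER (WHERE "group" = 'a') AS a
     , SUM(value) FILTER (WHERE "group" = 'b') AS b
     , SUM(value) FILTER (WHERE "group" = 'c') AS c
FROM yourTable GROUP BY time ORDER BY time
"""


def pivot(sql):
    """Hypothetical helper: load DATA into yourTable and run one pivot query."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute('CREATE TABLE yourTable ("group", time, value)')
        con.executemany("INSERT INTO yourTable VALUES (?, ?, ?)", DATA)
        return con.execute(sql).fetchall()
    finally:
        con.close()


print(pivot(CASE_SQL))    # [(1, 44, 1, 7), (2, 22, 5, 8), (3, 1, 6, 9)]
print(pivot(FILTER_SQL))  # same rows
```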

Count rows which have the max value less than a specified parameter

I want to find, in my table, the max value that is less than a specified parameter, and get the count of rows that have that max value. For example, my table has the values (4,1,3,1,4,4,10), and the list of parameters is the string "2,9,10,4", which I have to split into separate parameters. Based on these sample values I want to get something like this:
param | max value | count
2 | 1 | 2
9 | 4 | 3
10 | 4 | 3
4 | 3 | 1
Here is my sample query:
select
[param]
, max([val]) [max_value_by_param]
, max(count) [count]
from(
select
n.value as [param]
,a.val
, count(*) as [count]
from (--mock of table
select 1 as val union all
select 3 as val union all
select 4 as val union all
select 1 as val union all
select 3 as val union all
select 4 as val union all
select 4 as val union all
select 10 as val
) a
join (select [value] from string_split('2,9,10,4', ',')) n--list of params
on a.val < n.[value]
group by n.value, a.val
) tmp
group by [param]
Is it possible to do this in a better or easier way?
Here is a way to express this using apply:
select s.value as param, a.val, a.cnt
from string_split('2,9,10,4', ',') s outer apply
(select top (1) a.val, count(*) as cnt
from a
group by a.val
having a.val < s.value
order by a.val desc
) a;
But the fastest method is probably going to be:
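with av as (
      select a.val, count(*) as cnt
      from a
      group by a.val
      union all
      select s.value, null as cnt
      from string_split('2,9,10,4', ',') s
     )
select val as param, a_val, a_cnt
from (select av.*,
             -- every row in a grp shares the last data row (cnt not null) at or before it
             max(case when cnt is not null then val end) over (partition by grp) as a_val,
             max(cnt) over (partition by grp) as a_cnt
      from (select av.*,
                   -- running count of data rows; parameters sort before data rows
                   -- with the same val, so the comparison stays strictly "<"
                   count(cnt) over (order by val, (case when cnt is null then 1 else 2 end)
                                    rows unbounded preceding) as grp
            from av
           ) av
     ) av
where cnt is null;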
This only aggregates the data once and should return all parameters, even those with no preceding values in a.
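One pitfall with a running-window approach here: a running MAX over cnt returns the largest count seen so far, not the count of the largest smaller value. With the question's values (4,1,3,1,4,4,10) and parameter 4, the smaller values are 1 (twice) and 3 (once), so a running MAX would report 2 instead of 1. The following is a sketch of a single-pass variant that avoids this, run in SQLite through Python's sqlite3 module: a running COUNT of data rows puts each parameter in the same group as the largest value below it. The params table stands in for string_split, which SQLite lacks; the table and helper names are hypothetical.

```python
import sqlite3

SQL = """
with av as (
  select val, count(*) as cnt from a group by val   -- aggregate the data once
  union all
  select value, null from params                    -- parameters carry cnt = NULL
),
g as (
  -- grp = number of data rows up to this row; parameters sort before data
  -- rows with the same val, so the comparison stays strictly "<"
  select av.*,
         count(cnt) over (order by val, case when cnt is null then 1 else 2 end
                          rows between unbounded preceding and current row) as grp
  from av
)
select val as param, a_val, a_cnt
from (select g.*,
             max(case when cnt is not null then val end) over (partition by grp) as a_val,
             max(cnt) over (partition by grp) as a_cnt
      from g) x
where cnt is null
order by param
"""


def max_below(values, params):
    """Hypothetical helper: (param, largest value < param, its row count) per param."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("create table a (val integer)")
        con.execute("create table params (value integer)")
        con.executemany("insert into a values (?)", [(v,) for v in values])
        con.executemany("insert into params values (?)", [(p,) for p in params])
        return con.execute(SQL).fetchall()
    finally:
        con.close()


print(max_below([4, 1, 3, 1, 4, 4, 10], [2, 9, 10, 4]))
# [(2, 1, 2), (4, 3, 1), (9, 4, 3), (10, 4, 3)]
```

A parameter with no smaller value (for example 1 here) comes back with NULL for both a_val and a_cnt.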

How to calculate leave-out-median using PL/SQL analytic function

I'm trying to write an analytic function in PL/SQL that, when applied to a column within a table, returns for each row in the table, the median of the column excluding the given row.
An example to clarify: Suppose I have a table TABLE consisting of one column X that takes on the following values:
1
2
3
4
5
I want to define an analytic function LOOM() such that:
SELECT LOOM(X)
FROM TABLE
delivers the following:
3.5
3.5
3
2.5
2.5
i.e., for each row, the median of X, excluding the given row. I've been struggling to build the desired LOOM() function.
I'm not sure there is a "clever" way to do this, but you can do the calculation with a correlated subquery.
Assuming the x values are unique, as in your example:
with t as (
select 1 as x from dual union all
select 2 as x from dual union all
select 3 as x from dual union all
select 4 as x from dual union all
select 5 as x from dual
)
select t.*,
(select median(x)
from t t2
where t2.x <> t.x
) as loom
from t;
EDIT:
A more efficient method uses analytic functions but requires more direct calculation of the median. For instance:
with t as (
select 1 as x from dual union all
select 2 as x from dual union all
select 3 as x from dual union all
select 4 as x from dual union all
select 5 as x from dual
)
select t.*,
(case when mod(cnt, 2) = 0
then (case when x <= candidate_1 then candidate_2 else candidate_1 end)
else (case when x <= candidate_1 then (candidate_2 + candidate_3)/2
when x = candidate_2 then (candidate_1 + candidate_3)/2
else (candidate_1 + candidate_2) / 2
end)
end) as loom
from (select t.*,
max(case when seqnum = floor(cnt / 2) then x end) over () as candidate_1,
max(case when seqnum = floor(cnt / 2) + 1 then x end) over () as candidate_2,
max(case when seqnum = floor(cnt / 2) + 2 then x end) over () as candidate_3
from (select t.*,
row_number() over (order by x) as seqnum,
count(*) over () as cnt
from t
) t
) t
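The candidate-based formula is easy to get subtly wrong, so here is a sketch that runs the same idea in SQLite (through Python's sqlite3 module) and compares it with a brute-force leave-one-out median from statistics.median. FLOOR and MOD are only available in SQLite builds with math functions enabled, so this sketch assumes integer division (cnt / 2) and the % operator as stand-ins. Like the answer, it assumes the x values are unique; the helper names are hypothetical.

```python
import sqlite3
import statistics

LOOM_SQL = """
with s as (
  select x,
         row_number() over (order by x) as seqnum,
         count(*) over () as cnt
  from t
),
c as (
  -- the (up to) three sorted values around the middle; cnt / 2 is integer division
  select s.*,
         max(case when seqnum = cnt / 2     then x end) over () as c1,
         max(case when seqnum = cnt / 2 + 1 then x end) over () as c2,
         max(case when seqnum = cnt / 2 + 2 then x end) over () as c3
  from s
)
select x,
       case when cnt % 2 = 0                      -- odd number of values remain
            then case when x <= c1 then c2 else c1 end
            else case when x <= c1 then (c2 + c3) / 2.0   -- even number remain
                      when x = c2 then (c1 + c3) / 2.0
                      else (c1 + c2) / 2.0
                 end
       end as loom
from c
order by x
"""


def loom_sql(xs):
    """Leave-one-out median per row via the candidate formula (unique xs, len >= 2)."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("create table t (x integer)")
        con.executemany("insert into t values (?)", [(x,) for x in xs])
        return [loom for _, loom in con.execute(LOOM_SQL)]
    finally:
        con.close()


def loom_brute(xs):
    """Reference: median of all other values, in ascending order of x."""
    return [statistics.median([y for y in xs if y != x]) for x in sorted(xs)]


print(loom_sql([1, 2, 3, 4, 5]))  # [3.5, 3.5, 3.0, 2.5, 2.5]
```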

Selecting the highest count for a categorical variable when grouping

I have the following table:
custID Cat
1 A
1 B
1 B
1 B
1 C
2 A
2 A
2 C
3 B
3 C
4 A
4 C
4 C
4 C
What I need is the most efficient way to aggregate by CustID in such a manner that I obtain the most frequent category (cat), the second most frequent and the third. The output of the above should be
custID | most freq | 2nd most freq | 3rd most freq
1      | B         | A             | C
2      | A         | C             | Null
3      | B         | C             | Null
4      | C         | A             | Null
When there is a tie in the count, I do not really care which comes first and which second. For example, for customer 1 the 2nd and 3rd most frequent could be swapped because each of them occurs only once.
Any SQL would be fine, preferably Hive SQL.
Thank you
Try using group by twice, with row_number() to order the categories by their count. Since you don't care how ties are ordered, row_number() keeps tied categories in separate slots; dense_rank() would give them the same rank, so one of them would be lost. I'm not 100% sure, but I think it should work in Hive as well.
select custId,
max(case when t.rn = 1 then cat end) as [most freq],
max(case when t.rn = 2 then cat end) as [2nd most freq],
max(case when t.rn = 3 then cat end) as [3rd most freq]
from
(
select custId, cat, row_number() over (partition by custId order by count(*) desc) rn
from your_table
group by custId, cat
) t
group by custId
Based on the comments, here is a slightly modified solution that conforms to Hive SQL:
select custId,
max(case when t.rn = 1 then cat else null end) as most_freq,
max(case when t.rn = 2 then cat else null end) as 2nd_most_freq,
max(case when t.rn = 3 then cat else null end) as 3rd_most_freq
from
(
select custId, cat, row_number() over (partition by custId order by ct desc) rn
from (
select custId, cat, count(*) ct
from your_table
group by custId, cat
) your_table_with_counts
) t
group by custId
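Whether ties are ranked with ROW_NUMBER or DENSE_RANK changes the result here. The following sketch runs in SQLite via Python's sqlite3 module, using the question's table. With DENSE_RANK, categories tied on count share a rank, so MAX(CASE ...) keeps only one of them and a slot is lost (customer 3's B and C both get rank 1). With ROW_NUMBER every category gets its own slot and ties are broken arbitrarily, which the question says is acceptable. The helper top3 and its fn switch are hypothetical.

```python
import sqlite3

# Rows from the question: (custID, cat)
ROWS = [(1, "A"), (1, "B"), (1, "B"), (1, "B"), (1, "C"),
        (2, "A"), (2, "A"), (2, "C"),
        (3, "B"), (3, "C"),
        (4, "A"), (4, "C"), (4, "C"), (4, "C")]

QUERY = """
select custId,
       max(case when rn = 1 then cat end) as most_freq,
       max(case when rn = 2 then cat end) as second_most_freq,
       max(case when rn = 3 then cat end) as third_most_freq
from (
  select custId, cat, {fn}() over (partition by custId order by ct desc) as rn
  from (select custId, cat, count(*) as ct      -- count each category per customer
        from your_table
        group by custId, cat) c
) t
group by custId
order by custId
"""


def top3(fn):
    """fn is 'row_number' or 'dense_rank'; returns one row per customer."""
    if fn not in ("row_number", "dense_rank"):
        raise ValueError(fn)
    con = sqlite3.connect(":memory:")
    try:
        con.execute("create table your_table (custId, cat)")
        con.executemany("insert into your_table values (?, ?)", ROWS)
        return con.execute(QUERY.format(fn=fn)).fetchall()
    finally:
        con.close()


for row in top3("row_number"):
    print(row)  # tied categories (customer 1's A/C, customer 3's B/C) come out in arbitrary order
```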
SELECT journal, count(*) as frequency
FROM ${hiveconf:TNHIVE}
WHERE journal IS NOT NULL
GROUP BY journal
ORDER BY frequency DESC
LIMIT 5;