hive - Top N with multiple measurements in one query? - sql

The sample table:
value   measurement1  measurement2
-------|-------------|-------------
value1  1             **2**
value2  **3**         **3**
value3  **2**         1
I want to find the top 2 values by each measurement and get the output below:
top 2 by measurement1  top 2 by measurement2
---------------------|----------------------
value2                 value2
value3                 value1

You can do this using row_number() and a join. Rank the rows once per measurement, highest first, then match the two rankings on their sequence number:
select s1.value as col1,
       s2.value as col2
from (select s.*,
             row_number() over (order by measurement1 desc) as seqnum
      from sample s
     ) s1 join
     (select s.*,
             row_number() over (order by measurement2 desc) as seqnum
      from sample s
     ) s2
     on s1.seqnum = s2.seqnum
where s1.seqnum <= 2
order by s1.seqnum;
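To check the row_number() and join approach against the sample data, here is a minimal sketch that runs it through Python's built-in sqlite3 module instead of Hive. It assumes SQLite 3.25 or later (needed for window functions) and ranks each measurement in descending order so that sequence number 1 is the highest; the variable names are only for illustration.

```python
import sqlite3

# In-memory database holding the sample table from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sample (value TEXT, measurement1 INTEGER, measurement2 INTEGER)"
)
conn.executemany(
    "INSERT INTO sample VALUES (?, ?, ?)",
    [("value1", 1, 2), ("value2", 3, 3), ("value3", 2, 1)],
)

# Rank the rows once per measurement (highest first), then pair up
# the two rankings by their sequence number.
query = """
SELECT s1.value AS col1, s2.value AS col2
FROM (SELECT value,
             ROW_NUMBER() OVER (ORDER BY measurement1 DESC) AS seqnum
      FROM sample) s1
JOIN (SELECT value,
             ROW_NUMBER() OVER (ORDER BY measurement2 DESC) AS seqnum
      FROM sample) s2
  ON s1.seqnum = s2.seqnum
WHERE s1.seqnum <= 2
ORDER BY s1.seqnum
"""
top_rows = conn.execute(query).fetchall()
print(top_rows)
```

With ties in a measurement, row_number() picks an arbitrary order among the tied rows, so add a tie-breaker column to the ORDER BY if that matters.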

Related

How to show "value"[column] and "sum of all such values"[row wise] for a group of EmpId in the same row

I want to show each value (per column) and the sum of all such values (row-wise) for a group of EmpID in the same row.
SELECT EmpID, SUM(VALUE1), SUM(VALUE2)
FROM TableName
GROUP BY EmpID
The above query returns the sums, but I also want to show the individual values of Value1 and Value2 in the same row.
For example, I have the following input table:
EmpID VALUE1 VALUE2
==================
1 1 5
1 2 6
2 3 7
2 4 8
I want the following output table (grouped by EmpID), with Value1, Value2, and their sums in the same row:
EmpID Value1 VALUE2 total_Value1 total_Value2
===============================================
1 1 5 3 11
1 2 6 3 11
2 3 7 7 15
2 4 8 7 15
Using the SUM() OVER (PARTITION BY ...) window function, you can get your expected result:
SELECT EmpID, VALUE1, VALUE2,
SUM(VALUE1) OVER (PARTITION BY EmpID) AS total_Value1,
SUM(VALUE2) OVER (PARTITION BY EmpID) AS total_Value2
FROM TableName
Use window functions:
SELECT EmpID, SUM(SUM(VALUE1)) OVER (PARTITION BY EmpID),
       SUM(SUM(VALUE2)) OVER (PARTITION BY EmpID)
FROM T
GROUP BY EmpID;
However, I don't think aggregation is needed:
SELECT EmpID, VALUE1, VALUE2,
       SUM(VALUE1) OVER (PARTITION BY EmpID),
       SUM(VALUE2) OVER (PARTITION BY EmpID)
FROM T;
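As a quick check of the windowed SUM() idea, the sketch below loads the question's input table into an in-memory SQLite database through Python's sqlite3 module and runs the non-aggregated query. It assumes SQLite 3.25 or later; the ORDER BY is added only to make the result order deterministic.

```python
import sqlite3

# In-memory copy of the input table from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableName (EmpID INTEGER, VALUE1 INTEGER, VALUE2 INTEGER)")
conn.executemany(
    "INSERT INTO TableName VALUES (?, ?, ?)",
    [(1, 1, 5), (1, 2, 6), (2, 3, 7), (2, 4, 8)],
)

# Each row keeps its own values; the window sums repeat the
# per-EmpID totals on every row of that EmpID.
query = """
SELECT EmpID, VALUE1, VALUE2,
       SUM(VALUE1) OVER (PARTITION BY EmpID) AS total_Value1,
       SUM(VALUE2) OVER (PARTITION BY EmpID) AS total_Value2
FROM TableName
ORDER BY EmpID, VALUE1
"""
totals = conn.execute(query).fetchall()
for row in totals:
    print(row)
```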

Sql Group by Value1 having count(*) > 1 but with different value 2

Given an SQL table like this
id value1 value2
---------------
1 1 1
2 1 1
3 1 1
4 2 1
5 2 2
6 3 1
I want to find all the value1 values that are duplicated (i.e. using GROUP BY ... HAVING COUNT(*) > 1), but only if they have different values for value2.
So in this example I just want to return 2.
I'm using Postgres.
If I understand correctly, this is group by with a having clause:
select value1
from t
group by value1
having min(value2) <> max(value2)
Use:
select *
from (select *,
             ROW_NUMBER() OVER (PARTITION BY Value1 ORDER BY Value1, Value2 ASC) AS RowValue1,
             ROW_NUMBER() OVER (PARTITION BY Value1, Value2 ORDER BY Value1, Value2 ASC) AS RowValue2
      from Table_1
     ) As TableTmp
where TableTmp.RowValue1 <> TableTmp.RowValue2
Or:
select *
from Table_1
where value1 in (select value1
                 from Table_1
                 group by value1
                 having min(value2) <> max(value2))
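The HAVING MIN(value2) <> MAX(value2) test works because a group whose minimum and maximum differ must contain at least two rows with distinct value2 values, so no separate COUNT(*) > 1 check is needed. Here is a small sketch of that on the question's data, run through Python's sqlite3 module rather than Postgres (an assumption made only to keep the example self-contained):

```python
import sqlite3

# In-memory copy of the table from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, value1 INTEGER, value2 INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, 1, 1), (2, 1, 1), (3, 1, 1), (4, 2, 1), (5, 2, 2), (6, 3, 1)],
)

# A group qualifies only when it holds at least two different value2 values.
query = """
SELECT value1
FROM t
GROUP BY value1
HAVING MIN(value2) <> MAX(value2)
"""
mixed = [row[0] for row in conn.execute(query)]
print(mixed)
```

Rows with a NULL value2 are ignored by MIN() and MAX(), so they never make a group qualify on their own.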

How to not count NULL values in DENSE_RANK()?

Say I have the following table:
col
NULL
1
1
2
Then I select:
SELECT col, DENSE_RANK() OVER(ORDER BY col) as rnk from table
Then I get:
col rnk
NULL 1
1 2
1 2
2 3
What I want to get is this:
col rnk
NULL NULL
1 1
1 1
2 2
But if I query:
SELECT col, CASE WHEN col IS NOT NULL THEN DENSE_RANK() OVER(ORDER BY col) END as rnk from table
Then I get:
col rnk
NULL NULL
1 2
1 2
2 3
Is there a way to disregard NULLs when ranking, other than using a WHERE clause? I have some other columns whose rows cannot be omitted.
Use partition by:
SELECT col,
       (CASE WHEN col IS NOT NULL
             THEN DENSE_RANK() OVER (PARTITION BY (CASE WHEN col IS NOT NULL THEN 1 ELSE 2 END)
                                     ORDER BY col
                                    )
        END) as rnk
FROM table;
The query below is for BigQuery Legacy SQL:
SELECT col, CASE WHEN col IS NOT NULL THEN rnk END AS rnk
FROM (
  SELECT
    col, (col IS NULL) AS tmp,
    DENSE_RANK() OVER(PARTITION BY tmp ORDER BY col) AS rnk
  FROM table
)
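The PARTITION BY trick puts the NULL rows in their own partition, so the non-NULL rows are ranked starting from 1 and the outer CASE blanks out the rank for the NULLs. Below is a minimal sketch using Python's sqlite3 module (assuming SQLite 3.25 or later). The table is named t here because table is a reserved word, and the final ORDER BY is added only so the output order is predictable.

```python
import sqlite3

# In-memory table with one NULL and a duplicate value, as in the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(None,), (1,), (1,), (2,)])

# NULL rows land in partition 2 and never compete with the ranked rows.
query = """
SELECT col,
       CASE WHEN col IS NOT NULL
            THEN DENSE_RANK() OVER (
                     PARTITION BY (CASE WHEN col IS NOT NULL THEN 1 ELSE 2 END)
                     ORDER BY col)
       END AS rnk
FROM t
ORDER BY col
"""
ranked = conn.execute(query).fetchall()
print(ranked)
```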

Select new column based on existing value

I have sample data like below
create Table #Temp(id int, Data1 varchar(10), Data2 bigint)
Insert Into #Temp
Values(1,'Value1',109040774),
(2,'Value2',10000006099758),
(3,'Value3',10000006099758),
(4,'Value1',14538),
(5,'Value2',10000006097458),
(6,'Value3',10000006097458),
(7,'Value1',4454834)
I am trying to select a new column based on Data1, so that the output will be:
id Data1 NewColumn
1 Value1 109040774
2 Value2 109040774
3 Value3 109040774
4 Value1 14538 --reset here because same value of Data1 (Value 1 started repeating)
5 Value2 14538
6 Value3 14538
7 Value1 4454834 --reset here because same value of Data1 (Value 1 started repeating)
I was trying to use something like the query below, but it is not what I am after:
SELECT id, Data1,
FIRST_VALUE(Data2) OVER (Partition by Data1 ORDER BY Id ASC) AS NewCol
FROM #Temp
Order By Id
Any help is appreciated.
Here is my idea, assuming that the special value 'Value1' in Data1 marks the start of a new group:
Use a cumulative sum to count the number of 'Value1' rows on or before each row.
This count defines a grouping.
Within each group, use first_value().
Hence:
select t.id, t.data1,
       first_value(t.data2) over (partition by t.grp order by t.id) as newcolumn
from (select t.*,
             sum(case when data1 = 'Value1' then 1 else 0 end) over (order by id) as grp
      from #Temp t
     ) t;
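The cumulative-sum grouping can be tried out on the sample rows with the sketch below. It uses Python's sqlite3 module instead of SQL Server (assuming SQLite 3.25 or later), so the temp table #Temp is replaced by an ordinary table with the hypothetical name temp_data.

```python
import sqlite3

# In-memory stand-in for the #Temp table from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE temp_data (id INTEGER, Data1 TEXT, Data2 INTEGER)")
conn.executemany(
    "INSERT INTO temp_data VALUES (?, ?, ?)",
    [
        (1, "Value1", 109040774),
        (2, "Value2", 10000006099758),
        (3, "Value3", 10000006099758),
        (4, "Value1", 14538),
        (5, "Value2", 10000006097458),
        (6, "Value3", 10000006097458),
        (7, "Value1", 4454834),
    ],
)

# Inner query: running count of 'Value1' rows, which acts as a group id.
# Outer query: carry the first Data2 of each group across the group.
query = """
SELECT t.id, t.Data1,
       FIRST_VALUE(t.Data2) OVER (PARTITION BY t.grp ORDER BY t.id) AS NewColumn
FROM (SELECT d.*,
             SUM(CASE WHEN d.Data1 = 'Value1' THEN 1 ELSE 0 END)
                 OVER (ORDER BY d.id) AS grp
      FROM temp_data d) t
ORDER BY t.id
"""
filled = conn.execute(query).fetchall()
for row in filled:
    print(row)
```

The approach depends on id giving the intended row order and on every group starting with a 'Value1' row.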

Group Count in T/SQL

Source:
CREATE TABLE #TempTab (Value INT, Value1 varchar(10), Value2 varchar(10),
GRP varchar(10))
INSERT INTO #TempTab
SELECT 1,'One','One','One'
UNION ALL
SELECT 1,'One','One','One'
UNION ALL
SELECT 1,'One','One','Two'
UNION ALL
SELECT 2,'One','One','One'
UNION ALL
SELECT 2,'One','One','Two'
UNION ALL
SELECT 2,'One','One','Three'
UNION ALL
SELECT 3,'One','One','One'
UNION ALL
SELECT 3,'One','One','One'
Current query effort:
SELECT Value, Value1, Value2, GRP
, COUNT(1) OVER(PARTITION BY Value, Value1, Value2) CNT
, ROW_NUMBER() OVER(PARTITION BY Value, Value1, Value2, GRP ORDER BY Value) RN
, CASE
WHEN COUNT(*) OVER (PARTITION BY Value, Value1, Value2, GRP) > 1 THEN 1
ELSE 0
END IsMultiple
FROM #TempTab
DROP TABLE #TempTab
Current output:
Value Value1 Value2 GRP CNT RN IsMultiple
1 One One One 3 1 1
1 One One One 3 2 1
1 One One Two 3 1 0
2 One One One 3 1 0
2 One One Two 3 1 0
2 One One Three 3 1 0
3 One One One 2 1 1
3 One One One 2 2 1
Desired output:
Value Value1 Value2 GRP CNT RN IsMultiple NoUniqueGRPed
1 One One One 3 1 1 2
1 One One One 3 2 1 2
1 One One Two 3 1 0 2
2 One One One 3 1 0 3
2 One One Two 3 1 0 3
2 One One Three 3 1 0 3
3 One One One 2 1 1 1
3 One One One 2 2 1 1
Goal:
I am trying to derive a field called NoUniqueGRPed. This field is
basically count of unique grouped records based on Value, Value1, and
Value2 fields. i.e. Value = 1, Value1 = One, and Value2 = One has
three records but two unique GRP values (One and Two) so NoUniqueGRPed
should be 2.
I'm having trouble trying to figure out how to do the unique
aggregation/grouping.
You can try with CROSS APPLY, correlating on all three grouping columns:
SELECT ...,
       ca.NoUniqueGRPed
FROM #TempTab t1
CROSS APPLY (SELECT COUNT(DISTINCT GRP) AS NoUniqueGRPed
             FROM #TempTab t2
             WHERE t1.Value = t2.Value
               AND t1.Value1 = t2.Value1
               AND t1.Value2 = t2.Value2) ca
You can do this directly with window functions:
select tt.*,
count(distinct grp) over (partition by value, value1, value2) as NewColumn
from #TempTab tt
EDIT:
I thought that limitation had been fixed. Alas, SQL Server still does not allow DISTINCT in a windowed aggregate, so the query above fails. You can do this using a combination of sum() and row_number():
select tt.*,
       sum(case when seqnum = 1 then 1 else 0 end) over (partition by value, value1, value2) as NewColumn
from (select tt.*,
             row_number() over (partition by value, value1, value2, grp order by grp) as seqnum
      from #TempTab tt
     ) tt
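The sum() plus row_number() combination counts distinct GRP values because exactly one row per (Value, Value1, Value2, GRP) combination gets seqnum = 1. The sketch below checks that on the question's data using Python's sqlite3 module (assuming SQLite 3.25 or later), with an ordinary table named TempTab standing in for #TempTab.

```python
import sqlite3

# In-memory stand-in for #TempTab with the eight rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TempTab (Value INTEGER, Value1 TEXT, Value2 TEXT, GRP TEXT)")
conn.executemany(
    "INSERT INTO TempTab VALUES (?, ?, ?, ?)",
    [
        (1, "One", "One", "One"), (1, "One", "One", "One"), (1, "One", "One", "Two"),
        (2, "One", "One", "One"), (2, "One", "One", "Two"), (2, "One", "One", "Three"),
        (3, "One", "One", "One"), (3, "One", "One", "One"),
    ],
)

# seqnum = 1 marks the first row of each distinct GRP within a group,
# so summing those markers per group yields the distinct count.
query = """
SELECT tt.Value, tt.GRP,
       SUM(CASE WHEN tt.seqnum = 1 THEN 1 ELSE 0 END)
           OVER (PARTITION BY tt.Value, tt.Value1, tt.Value2) AS NoUniqueGRPed
FROM (SELECT t.*,
             ROW_NUMBER() OVER (PARTITION BY t.Value, t.Value1, t.Value2, t.GRP
                                ORDER BY t.GRP) AS seqnum
      FROM TempTab t) tt
"""
rows = conn.execute(query).fetchall()

# Collapse to one (Value, NoUniqueGRPed) pair per group for a compact check.
distinct_counts = sorted({(value, n) for value, _grp, n in rows})
print(distinct_counts)
```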