How do I find the Sum and Max value per Unique ID in HIVE?

Basically, how do I turn
id  name   quantity
1   Jerry  1
1   Jerry  2
1   Nana   1
2   Max    4
2   Lenny  3
into
id  name   quantity
1   Jerry  3
2   Max    4
in Hive?
I want to sum the quantity for each name within an ID, then keep the name with the highest total for each unique ID.

You can use window functions with aggregation:
select id, name, quantity
from (select id, name, sum(quantity) as quantity,
             row_number() over (partition by id order by sum(quantity) desc) as seqnum
      from t
      group by id, name
     ) t
where seqnum = 1;

You can first calculate the sum of quantity per (id, name) group, then number the groups within each id by descending total, and finally keep only the rows with row number 1. Hive requires an alias on every subquery, so each one is named here.
select
  id, name, quantity
from (
  select
    s.*,
    row_number() over (partition by id order by quantity desc) as rn
  from (
    select id, name, sum(quantity) as quantity
    from mytable
    group by id, name
  ) s
) r
where rn = 1;
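
The three steps described above (sum per group, number the groups within each id, keep the first) can be tried out locally. This is only a sketch: it assumes an in-memory SQLite database as a stand-in for Hive, reuses the table name mytable from the answer, and loads the sample rows from the question. Hive itself is not involved.

```python
import sqlite3

# Assumption: SQLite (3.25+ for window functions) stands in for Hive here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, name TEXT, quantity INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(1, "Jerry", 1), (1, "Jerry", 2), (1, "Nana", 1), (2, "Max", 4), (2, "Lenny", 3)],
)

query = """
SELECT id, name, quantity
FROM (
    -- Step 2: number the (id, name) totals within each id, largest first
    SELECT totals.*,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY quantity DESC) AS rn
    FROM (
        -- Step 1: total quantity per (id, name)
        SELECT id, name, SUM(quantity) AS quantity
        FROM mytable
        GROUP BY id, name
    ) AS totals
) AS ranked
WHERE rn = 1   -- Step 3: keep the top name per id
ORDER BY id
"""
rows = conn.execute(query).fetchall()
print(rows)
```

With the sample rows from the question this prints [(1, 'Jerry', 3), (2, 'Max', 4)]. Note that row_number() picks one row arbitrarily when two names tie for the top total within an id.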

Try the following:
with cte as
(
  select id, name, sum(quantity) as q
  from table_name
  group by id, name
)
select id, name, q
from cte t1
where t1.q = (select max(q) from cte t2 where t1.id = t2.id)

Related

Selecting rows that have row_number more than 1

I have a table as following (using bigquery):
id   year  month  sales  row_number
111  2020  11     1000   1
111  2020  12     2000   2
112  2020  11     3000   1
113  2020  11     1000   1
Is there a way in which I can select rows that have row numbers more than one?
For example, my desired output is:
id   year  month  sales  row_number
111  2020  11     1000   1
111  2020  12     2000   2
I don't want to just exclusively select rows with row_number = 2 but also row_number = 1 as well.
The original code block I used for the first table result is:
SELECT
  id,
  year,
  month,
  SUM(sales) AS sales,
  ROW_NUMBER() OVER (PARTITION BY id ORDER BY id ASC) AS row_number
FROM
  table
GROUP BY
  id, year, month

You can use window functions:
select t.* except (cnt)
from (select t.*,
             count(*) over (partition by id) as cnt
      from t
     ) t
where cnt > 1;
As applied to your aggregation query:
SELECT iym.* EXCEPT (cnt)
FROM (SELECT id, year, month,
             SUM(sales) AS sales,
             ROW_NUMBER() OVER (PARTITION BY id ORDER BY id ASC) AS row_number,
             COUNT(*) OVER (PARTITION BY id) AS cnt
      FROM table
      GROUP BY id, year, month
     ) iym
WHERE cnt > 1;
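
The count-per-partition idea can be checked against the sample data from the question. This is a rough sketch that assumes SQLite in place of BigQuery. SQLite has no SELECT * EXCEPT, so the columns are listed explicitly, and the table name sales_raw is made up for the example.

```python
import sqlite3

# Assumption: SQLite (3.25+) stands in for BigQuery; sales_raw is a hypothetical table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_raw (id INTEGER, year INTEGER, month INTEGER, sales INTEGER)")
conn.executemany(
    "INSERT INTO sales_raw VALUES (?, ?, ?, ?)",
    [(111, 2020, 11, 1000), (111, 2020, 12, 2000), (112, 2020, 11, 3000), (113, 2020, 11, 1000)],
)

query = """
WITH monthly AS (
    -- The aggregation from the question
    SELECT id, year, month, SUM(sales) AS sales
    FROM sales_raw
    GROUP BY id, year, month
)
SELECT id, year, month, sales
FROM (
    -- Attach the number of aggregated rows per id to every row
    SELECT monthly.*, COUNT(*) OVER (PARTITION BY id) AS cnt
    FROM monthly
) AS counted
WHERE cnt > 1   -- keep only ids that have more than one row
ORDER BY id, year, month
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Only id 111 has more than one row, so both of its rows are returned and ids 112 and 113 are dropped.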

You can wrap your query as in the example below:
select * except(flag) from (
  select *, countif(row_number > 1) over(partition by id) > 0 flag
  from (YOUR_ORIGINAL_QUERY)
)
where flag
With your query substituted in, it looks like this:
select * except(flag) from (
  select *, countif(row_number > 1) over(partition by id) > 0 flag
  from (
    SELECT id,
      year,
      month,
      SUM(sales) as sales,
      ROW_NUMBER() OVER(Partition by id ORDER BY id ASC) AS row_number
    FROM table
    GROUP BY id, year, month
  )
)
where flag
Applied to the sample data in your question, this returns both rows for id 111, which matches the desired output.

Try this:
with tmp as (
  SELECT id,
    year,
    month,
    SUM(sales) as sales,
    ROW_NUMBER() OVER(Partition by id ORDER BY id ASC) AS row_number
  FROM table
  GROUP BY id, year, month
)
select * from tmp a where exists (select 1 from tmp b where a.id = b.id and b.row_number = 2)
This is a straightforward case for an EXISTS subquery.

This is what I use. It is similar to @ElapsedSoul's answer, but as I understand it, IN is better than EXISTS for a static list. I'm not sure whether the performance difference, if any, is significant. See:
Difference between EXISTS and IN in SQL?
WITH T1 AS
(
SELECT
id,
year,
month,
SUM(sales) as sales,
ROW_NUMBER() OVER(PARTITION BY id ORDER BY id ASC) AS ROW_NUM
FROM table
GROUP BY id, year, month
)
SELECT *
FROM T1
WHERE id IN (SELECT id FROM T1 WHERE ROW_NUM > 1);

List the most up-to-date product of each category, PostgreSQL queries

user_id  product_id  category_id  date_added  date_update
1        2           1            2.3.2021    null
1        3           1            2.3.2020    2.4.2023
1        4           2            2.3.2020    null
1        5           2            2.3.2020    2.4.2023
2        5           2            2.3.2020    2.4.2023
2        4           1            2.3.2020    null
List the most up-to-date product of each category

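You can use row_number(). Order by date_update descending so the most recent row comes first, with nulls last so rows that were never updated do not win:
select * from
(
  select *, row_number() over (partition by user_id, category_id order by date_update desc nulls last) as rn
  from tablename
) A where rn = 1
Or you can use distinct on:
select distinct on (user_id, category_id) *
from tablename
order by user_id, category_id, date_update desc nulls last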

List the most up-to-date product of each category
You can use distinct on. Let me assume that if the update date is null, then you want the creation date:
select distinct on (category_id) t.*
from t
order by category_id, coalesce(date_update, date_added) desc;
If you wanted this per user/category combination, the logic would be:
select distinct on (user_id, category_id) t.*
from t
order by user_id, category_id, coalesce(date_update, date_added) desc;
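
The per-user, per-category logic can be illustrated on the sample data. This sketch assumes SQLite instead of PostgreSQL, and since SQLite has no DISTINCT ON, it uses row_number() as an equivalent. It also assumes the dates in the question are day.month.year and rewrites them as ISO strings so they sort correctly as text.

```python
import sqlite3

# Assumptions: SQLite (3.25+) stands in for PostgreSQL; dates are stored as ISO text,
# converted from the question's values on the assumption that they are day.month.year.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (user_id INTEGER, product_id INTEGER, category_id INTEGER, "
    "date_added TEXT, date_update TEXT)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?, ?)",
    [
        (1, 2, 1, "2021-03-02", None),
        (1, 3, 1, "2020-03-02", "2023-04-02"),
        (1, 4, 2, "2020-03-02", None),
        (1, 5, 2, "2020-03-02", "2023-04-02"),
        (2, 5, 2, "2020-03-02", "2023-04-02"),
        (2, 4, 1, "2020-03-02", None),
    ],
)

query = """
SELECT user_id, category_id, product_id
FROM (
    SELECT t.*,
           ROW_NUMBER() OVER (
               PARTITION BY user_id, category_id
               -- fall back to the creation date when the row was never updated
               ORDER BY COALESCE(date_update, date_added) DESC
           ) AS rn
    FROM t
) AS ranked
WHERE rn = 1
ORDER BY user_id, category_id
"""
rows = conn.execute(query).fetchall()
print(rows)
```

For user 1 this picks product 3 in category 1 and product 5 in category 2. User 2 has only one product per category, so those rows are returned as they are.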

Using a window function:
select * from (
  select u_id, c_id, p_id, coalesce(date_update, date_added) as date,
    rank() over (partition by u_id, c_id order by coalesce(date_update, date_added) desc) as r
  from inventory
) t where r = 1

How to get the last inserted value for every id in SQL Server 2008?

I have to count the unique statuses across multiple rows. Here is an example of my table:
Id Status OrderId
-------------------
1 1 43
2 2 43
3 1 44
Desired output:
It should give a count of 1 for Status '1' and a count of 1 for Status '2'. But when I use count, it gives 2 for Status '1'.

Use
count(DISTINCT status)
instead of
count(status)
to count the unique status values.
EDIT:
If you want to get (not count) the Status value of the last inserted record for every OrderId, you can do:
SELECT Status
FROM (
  SELECT Id, Status, OrderId,
         ROW_NUMBER() OVER (PARTITION BY OrderId
                            ORDER BY Id DESC) AS rn
  FROM mytable ) t
WHERE t.rn = 1

If you want to get the last status for each order:
with cte as (select *, row_number()
    over(partition by OrderID order by Id desc) as rn from TableName)
select * from cte where rn = 1
Or:
select * from (select *, row_number()
    over(partition by OrderID order by Id desc) as rn from TableName) t
where rn = 1

Get specific row from a subquery using aggregate function

I am trying to get a specific row from a subquery, but I cannot use an aggregate function in a WHERE clause. I have read that I should use a HAVING clause, but I have no idea where to start.
This is my current SQL statement:
SELECT *
FROM
(
  select ID, SUM(BALANCE) AS Balance FROM bankacc GROUP BY ID
) A
This returns:
ID | Balance
1  | 30
2  | 40
3  | 50
4  | 50
I need the rows with the MAX(Balance), but I have no idea where to start. Please help.

With a window function:
DECLARE @t TABLE ( ID INT, Amount MONEY )
INSERT INTO @t
VALUES ( 1, 10 ),
       ( 1, 10 ),
       ( 1, 10 ),
       ( 2, 5 ),
       ( 2, 20 ),
       ( 3, 50 )
SELECT ID ,
       Amount
FROM ( SELECT ID ,
              SUM(Amount) AS Amount ,
              RANK() OVER ( ORDER BY SUM(Amount) DESC ) AS rn
       FROM @t
       GROUP BY ID
     ) t
WHERE rn = 1
With TOP and TIES:
SELECT TOP 1 WITH TIES
       ID ,
       SUM(Amount) AS Amount
FROM @t
GROUP BY ID
ORDER BY Amount DESC
Both versions return every row whose sum equals the maximum, not just a single top row.
Output:
ID Amount
3  50.00

You can wrap it in a subquery:
SELECT q.id, max(q.b)
FROM
(
  select ID, SUM(BALANCE) b FROM bankacc GROUP BY ID
) q
group by q.id
Or sort them in descending order and take the first record:
select top 1 ID, SUM(BALANCE) b FROM bankacc GROUP BY ID order by b desc
In MySQL, use limit 1 at the end of the query instead of top 1.

I think this should be simple.
-- This returns only one record, even if two IDs share the same maximum amount
SELECT TOP 1 ID ,
       Amount
FROM ( SELECT ID ,
              SUM(Amount) AS Amount
       FROM Table
       GROUP BY ID
     ) t
ORDER BY Amount DESC, ID ASC
Using a window function: this returns every ID tied for the maximum, which is what you want.
SELECT ID ,
       Amount
FROM ( SELECT ID ,
              SUM(Amount) AS Amount ,
              RANK() OVER ( ORDER BY SUM(Amount) DESC ) AS rnk
       FROM Table
       GROUP BY ID
     ) t
WHERE rnk = 1
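
The tie-handling behaviour of RANK() can be seen with the balances from the question, where IDs 3 and 4 both total 50. This is a sketch only: it assumes SQLite rather than SQL Server, and the individual rows are invented so that the per-ID sums match the question (30, 40, 50, 50).

```python
import sqlite3

# Assumptions: SQLite (3.25+) stands in for SQL Server; the raw rows are made up
# so that the per-ID totals reproduce the question's output.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bankacc (ID INTEGER, BALANCE INTEGER)")
conn.executemany(
    "INSERT INTO bankacc VALUES (?, ?)",
    [(1, 10), (1, 20), (2, 40), (3, 50), (4, 25), (4, 25)],
)

query = """
SELECT ID, total
FROM (
    -- RANK() gives every ID with the same total the same rank
    SELECT ID, total, RANK() OVER (ORDER BY total DESC) AS rnk
    FROM (SELECT ID, SUM(BALANCE) AS total FROM bankacc GROUP BY ID) AS totals
) AS ranked
WHERE rnk = 1
ORDER BY ID
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Both tied IDs come back, which is the difference from a plain TOP 1 or LIMIT 1.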

Second maximum and minimum values

Given a table with multiple rows of an int field and the same identifier, is it possible to return the 2nd maximum and 2nd minimum value from the table?
The table consists of
ID | number
------------------------
1  | 10
1  | 11
1  | 13
1  | 14
1  | 15
1  | 16
The final result would be
ID | nMin | nMax
--------------------------------
1  | 11   | 15

You can use row_number to assign a ranking per ID. Then you can group by id and pick the rows with the ranking you're after. The following example picks the second lowest and third highest:
select id
, max(case when rnAsc = 2 then number end) as SecondLowest
, max(case when rnDesc = 3 then number end) as ThirdHighest
from (
  select ID
  , number
  , row_number() over (partition by ID order by number) as rnAsc
  , row_number() over (partition by ID order by number desc) as rnDesc
  from YourTable -- YourTable is a placeholder for your table name
) as SubQueryAlias
group by
  id
The max is just to pick out the one non-null value; you can replace it with min or even avg and it would not affect the outcome.
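
Applied to the table from the question, the same pattern with both rankings set to 2 gives the second lowest and second highest values. This is a sketch that assumes SQLite as the engine and a made-up table name, readings.

```python
import sqlite3

# Assumptions: SQLite (3.25+) as the engine; the table name readings is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (id INTEGER, number INTEGER)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [(1, 10), (1, 11), (1, 13), (1, 14), (1, 15), (1, 16)],
)

query = """
SELECT id,
       -- each CASE yields one non-null value per id; MAX just picks it out
       MAX(CASE WHEN rn_asc = 2 THEN number END) AS second_lowest,
       MAX(CASE WHEN rn_desc = 2 THEN number END) AS second_highest
FROM (
    SELECT id, number,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY number) AS rn_asc,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY number DESC) AS rn_desc
    FROM readings
) AS ranked
GROUP BY id
"""
rows = conn.execute(query).fetchall()
print(rows)
```

This prints [(1, 11, 15)], matching the desired result. If duplicate values should count once, dense_rank() in place of row_number() would be the usual substitute.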

This will work, but see caveats:
SELECT Id, number
INTO #T
FROM (
SELECT 1 ID, 10 number
UNION
SELECT 1 ID, 10 number
UNION
SELECT 1 ID, 11 number
UNION
SELECT 1 ID, 13 number
UNION
SELECT 1 ID, 14 number
UNION
SELECT 1 ID, 15 number
UNION
SELECT 1 ID, 16 number
) U;
WITH EX AS (
SELECT Id, MIN(number) MinNumber, MAX(number) MaxNumber
FROM #T
GROUP BY Id
)
SELECT #T.Id, MIN(number) nMin, MAX(number) nMax
FROM #T INNER JOIN
EX ON #T.Id = EX.Id
WHERE #T.number <> MinNumber AND #T.number <> MaxNumber
GROUP BY #T.Id
DROP TABLE #T;
If the maximum (or minimum) value appears more than once, every copy of it is excluded, so depending on your data this may not be the result you expect.

You could select the next minimum value by using the following method:
SELECT MAX(Number)
FROM
(
SELECT top 2 (Number)
FROM table1 t1
WHERE ID = {MyNumber}
order by Number
)a
This only works if you can restrict the inner query to a single ID with a WHERE clause.

This would be a better way. I put this together quickly, but if you combine the two queries, you will get exactly what you are looking for.
select *
from
(
  select
    myID,
    myNumber,
    row_number() over (order by myNumber) as myRowNumber
  from MyTable
) x
where x.myRowNumber = 2;
select *
from
(
  select
    myID,
    myNumber,
    row_number() over (order by myNumber desc) as myRowNumber
  from MyTable
) y
where y.myRowNumber = 2;

Let the table name be tblName.
select max(number) from tblName where number not in (select max(number) from tblName);
The same works for the minimum: just replace max with min.

As I learned just today, the solution is to use LIMIT. You order the results so that the highest values are on top and limit the result to 2. Then you select from that subselect, order it the other way round, and take only the first row. The subselect needs an alias in MySQL.
SELECT somefield FROM (
  SELECT somefield FROM table
  ORDER BY somefield DESC LIMIT 2) AS top_two
ORDER BY somefield ASC LIMIT 1
