Obtain corresponding value to max value of another column - hive

I need to find the corresponding value to the max value of another column.
My data is as below:
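| group | subgroup | subgroup_2 | value_a | value_b | date     |
|-------|----------|------------|---------|---------|----------|
| A     | 101      | 1          | 200     | 101     | 20220301 |
| A     | 102      | 1          | 105     | 90      | 20220301 |
| A     | 103      | 2          | 90      | 202     | 20220301 |
| A     | 211      | 2          | 75      | 107     | 20220301 |
| B     | 212      | 1          | 91      | 65      | 20220301 |
| B     | 213      | 1          | 175     | 101     | 20220301 |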
I would need to format the data like this:
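| group | subgroup_2 | max_value_a | value_b | date     |
|-------|------------|-------------|---------|----------|
| A     | 1          | 200         | 101     | 20220301 |
| A     | 2          | 90          | 202     | 20220301 |
| B     | 1          | 175         | 101     | 20220301 |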
I can get this shape fairly easily with a GROUP BY, but that forces me to aggregate value_b, so I don't get the value_b from the row that holds the max value_a.
I know I can use rank() over (partition by ...), but it doesn't seem to produce the format I require.
This is the query I used; however, it only returns the max for one subgroup_2 rather than the max for each:
select group, subgroup_2, max_value_a, value_b, date
from
(
select a.group, a.subgroup_2, a.max_value_a, a.value_b, a.date,
rank() over(partition by a.group, subgroup_2, a.date order by a.max_value_a desc) as rnk
from table_1 a
)s
where rnk=1

You want to use ROW_NUMBER here:
SELECT group, subgroup_2, value_a AS max_value_a, value_b, date
FROM
(
SELECT group, subgroup_2, value_a, value_b, date,
ROW_NUMBER() OVER (PARTITION BY group, subgroup_2 ORDER BY value_a DESC) rn
FROM table_1
) t
WHERE rn = 1;
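As a quick check, the same ROW_NUMBER query can be run against the sample rows from the question. The sketch below uses Python's built-in sqlite3 module as a stand-in for Hive (an assumption made only so the example is runnable anywhere), and renames the `group` and `date` columns to `grp` and `dt` to avoid quoting reserved words.

```python
import sqlite3

# Sample rows from the question. Columns "group" and "date" are renamed to
# grp and dt here (an assumption, to sidestep reserved-word quoting).
rows = [
    ("A", 101, 1, 200, 101, 20220301),
    ("A", 102, 1, 105, 90, 20220301),
    ("A", 103, 2, 90, 202, 20220301),
    ("A", 211, 2, 75, 107, 20220301),
    ("B", 212, 1, 91, 65, 20220301),
    ("B", 213, 1, 175, 101, 20220301),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_1 (grp TEXT, subgroup INT, subgroup_2 INT, "
    "value_a INT, value_b INT, dt INT)"
)
conn.executemany("INSERT INTO table_1 VALUES (?, ?, ?, ?, ?, ?)", rows)

# Number the rows of each (grp, subgroup_2) partition by value_a descending,
# then keep only the first row of each partition.
query = """
SELECT grp, subgroup_2, value_a AS max_value_a, value_b, dt
FROM (
    SELECT grp, subgroup_2, value_a, value_b, dt,
           ROW_NUMBER() OVER (PARTITION BY grp, subgroup_2
                              ORDER BY value_a DESC) AS rn
    FROM table_1
) t
WHERE rn = 1
ORDER BY grp, subgroup_2
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

On this data the three rows returned match the expected table in the question. In Hive itself, `group` would most likely need backtick quoting.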

Related

Query to pull data from column based off max value of second column

I have a table that has [Order], [Yield], [Scrap], [OpAc] columns. I need to pull the yield based on the max value of [OpAc].
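| Order | Yield | Scrap | OpAc |
|-------|-------|-------|------|
| 1234  | 140   | 0     | 10   |
| 1234  | 140   | 0     | 20   |
| 1234  | 130   | 10    | 30   |
| 1234  | 130   | 0     | 40   |
| 1234  | 125   | 5     | 50   |
| 1234  | 110   | 15    | 60   |
| 1235  | 140   | 0     | 10   |
| 1235  | 138   | 2     | 20   |
| 1235  | 138   | 0     | 30   |
| 1235  | 138   | 0     | 40   |
| 1235  | 138   | 0     | 50   |
| 1235  | 137   | 1     | 60   |
| 1235  | 137   | 0     | 70   |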
Expected Results
| Order | Yield |
|-------|-------|
| 1234  | 110   |
| 1235  | 137   |
The query that I have tried is
select [Order], [Yield], MAX([OpAc]) as Max_OpAc
from SCRAP
GROUP BY [Order], [Yield]
order by [order]
This produces
| Order | Yield | Max_OpAc |
|-------|-------|----------|
| 1234  | 110   | 60       |
| 1234  | 125   | 50       |
| 1234  | 130   | 40       |
| 1234  | 140   | 20       |
| 1235  | 137   | 70       |
| 1235  | 138   | 50       |
| 1235  | 140   | 10       |
I've tried setting up some CTE queries to break it down into separate functions but I keep getting caught at this step.
WITH CTE1 AS(
SELECT ROW_NUMBER() OVER(PARTITION BY [Order] ORDER BY [Order],[OpAc]) AS RN , *
FROM SAP_SCRAP
),
This proved to be redundant due to the fact that the [OpAc] field is sequential for each step.
Thanks in advance for any help
You almost got it!
WITH Orders_By_OpAc_Desc AS (
SELECT
[Order],
[Yield],
ROW_NUMBER() OVER (PARTITION BY [Order] ORDER BY OpAc DESC) AS [rn]
FROM
SCRAP
)
SELECT [Order],
[Yield]
FROM
Orders_By_OpAc_Desc
WHERE
rn = 1
The trick here is ROW_NUMBER() OVER (PARTITION BY [Order] ORDER BY OpAc DESC) AS [rn]. It might be confusing to understand in SQL, but when expressed in words it's a bit clearer.
This statement takes each group of rows with the same Order value (PARTITION BY [Order]), orders each group by OpAc in descending order so that the higher OpAc values end up "on top" of the group (ORDER BY OpAc DESC), and numbers each row in the group "top" to "bottom", starting with 1 (ROW_NUMBER()).
Meaning, each row with this number set to 1 has the highest OpAc value for its Order.
Wrap that into a CTE and then select just the rows with this number (rn) set to 1. Voilà.
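To make the numbering concrete, here is a small sketch that prints the rn assigned to every row before filtering. It uses Python's sqlite3 module rather than SQL Server (an assumption for portability), so `[Order]` is written as `"Order"`, and the table name `scrap_demo` is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# scrap_demo is a hypothetical table holding the question's sample data.
conn.execute('CREATE TABLE scrap_demo ("Order" INT, Yield INT, Scrap INT, OpAc INT)')
conn.executemany(
    "INSERT INTO scrap_demo VALUES (?, ?, ?, ?)",
    [
        (1234, 140, 0, 10), (1234, 140, 0, 20), (1234, 130, 10, 30),
        (1234, 130, 0, 40), (1234, 125, 5, 50), (1234, 110, 15, 60),
        (1235, 140, 0, 10), (1235, 138, 2, 20), (1235, 138, 0, 30),
        (1235, 138, 0, 40), (1235, 138, 0, 50), (1235, 137, 1, 60),
        (1235, 137, 0, 70),
    ],
)

# Show the row number each row receives inside its Order partition.
numbered = conn.execute(
    """
    SELECT "Order", Yield, OpAc,
           ROW_NUMBER() OVER (PARTITION BY "Order" ORDER BY OpAc DESC) AS rn
    FROM scrap_demo
    ORDER BY "Order", rn
    """
).fetchall()
for row in numbered:
    print(row)

# Keeping only rn = 1 leaves the row with the highest OpAc per Order.
top = [(order, yield_) for order, yield_, _, rn in numbered if rn == 1]
print(top)
```

The row with OpAc 60 gets rn 1 for order 1234 and the row with OpAc 70 gets rn 1 for order 1235, which is why filtering on rn = 1 yields the expected Yield values.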
You definitely want the OVER (PARTITION BY) but MAX() is also an option here. You want something like:
SELECT
*
FROM
(
SELECT
t3.*
, MAX(OpAc) OVER (PARTITION BY [Order]) max1
FROM
SCRAP t3
) a
WHERE
a.Max1 = a.OpAc
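Note that with MAX(), every row tied for the highest OpAc within an Order is returned, whereas ROW_NUMBER() keeps exactly one row per Order.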
If you are on SQL Server 2012 or later, you may be able to use FIRST_VALUE() as well, depending on your query needs:
SELECT
DISTINCT
t3.[Order],
FIRST_VALUE(Yield) OVER(PARTITION BY [Order] ORDER BY OpAc DESC) Yield
FROM
SCRAP t3
You were so close. Just missing an ORDER BY OpAc DESC in your ROW_NUMBER function.
SQL Fiddle
MS SQL Server 2017 Schema Setup:
CREATE TABLE orders (
[Order] int null
, Yield int null
, Scrap int null
, OpAc int null
);
INSERT INTO orders ([Order], Yield, Scrap, OpAc)
VALUES (1234,140,0,10)
, (1234,140,0,20)
, (1234,130,10,30)
, (1234,130,0,40)
, (1234,125,5,50)
, (1234,110,15,60)
, (1235,140,0,10)
, (1235,138,2,20)
, (1235,138,0,30)
, (1235,138,0,40)
, (1235,138,0,50)
, (1235,137,1,60)
, (1235,137,0,70)
;
Query 1:
WITH CTE1 AS (
SELECT *
, ROW_NUMBER() OVER(PARTITION BY [Order] ORDER BY OpAc DESC) as row_num
FROM orders
)
SELECT *
FROM CTE1 as c
WHERE c.row_num = 1
Results:
| Order | Yield | Scrap | OpAc | row_num |
|-------|-------|-------|------|---------|
| 1234 | 110 | 15 | 60 | 1 |
| 1235 | 137 | 0 | 70 | 1 |

MSSQL - Running sum with reset after gap

I have been trying to solve a problem for a few days now, but I just can't get it solved. Hence my question today.
I would like to calculate the running sum in the following table. My result so far looks like this:
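| PersonID | Visit_date | Medication_intake | Previous_date | Date_diff | Running_sum |
|----------|------------|-------------------|---------------|-----------|-------------|
| 1        | 2012-04-26 | 1                 |               |           |             |
| 1        | 2012-11-16 | 1                 | 2012-04-26    | 204       | 204         |
| 1        | 2013-04-11 | 0                 |               |           |             |
| 1        | 2013-07-19 | 1                 |               |           |             |
| 1        | 2013-12-05 | 1                 | 2013-07-19    | 139       | 343         |
| 1        | 2014-03-18 | 1                 | 2013-12-05    | 103       | 585         |
| 1        | 2014-06-24 | 0                 |               |           |             |
| 2        | 2014-12-01 | 1                 |               |           |             |
| 2        | 2015-03-09 | 1                 | 2014-12-01    | 98        | 98          |
| 2        | 2015-09-28 | 0                 |               |           |             |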
This is my desired result. So only the running sum over contiguous blocks (Medication_intake=1) should be calculated.
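| PersonID | Visit_date | Medication_intake | Previous_date | Date_diff | Running_sum |
|----------|------------|-------------------|---------------|-----------|-------------|
| 1        | 2012-04-26 | 1                 |               |           |             |
| 1        | 2012-11-16 | 1                 | 2012-04-26    | 204       | 204         |
| 1        | 2013-04-11 | 0                 |               |           |             |
| 1        | 2013-07-19 | 1                 |               |           |             |
| 1        | 2013-12-05 | 1                 | 2013-07-19    | 139       | 139         |
| 1        | 2014-03-18 | 1                 | 2013-12-05    | 103       | 242         |
| 1        | 2014-06-24 | 0                 |               |           |             |
| 2        | 2014-12-01 | 1                 |               |           |             |
| 2        | 2015-03-09 | 1                 | 2014-12-01    | 98        | 98          |
| 2        | 2015-09-28 | 0                 |               |           |             |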
I work with Microsoft SQL Server 2019 Express.
Thank you very much for your tips!
This is a gaps and islands problem, and one approach uses the difference in row numbers method:
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY PersonID
ORDER BY Visit_date) rn1,
ROW_NUMBER() OVER (PARTITION BY PersonId, Medication_intake
ORDER BY Visit_date) rn2
FROM yourTable
)
SELECT PersonID, Visit_date, Medication_intake, Previous_date, Date_diff,
CASE WHEN Date_diff IS NOT NULL AND Medication_intake = 1
THEN SUM(Date_diff) OVER (PARTITION BY PersonID, rn1 - rn2
ORDER BY Visit_date) END AS Running_sum
FROM cte
ORDER BY PersonID, Visit_date;
The CASE expression in the outer query computes the running sum of Date_diff within each island of consecutive records whose Medication_intake is 1. For all other records, or for records where Date_diff is null, the result is simply null.
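The difference-in-row-numbers idea is easier to follow with the island key visible. The sketch below runs the same query on the question's rows using Python's sqlite3 module (an assumption; the thread targets SQL Server 2019) and adds `rn1 - rn2` as an extra `island` column purely for inspection.

```python
import sqlite3

# Rows from the question; None marks the blank Previous_date / Date_diff cells.
rows = [
    (1, "2012-04-26", 1, None, None),
    (1, "2012-11-16", 1, "2012-04-26", 204),
    (1, "2013-04-11", 0, None, None),
    (1, "2013-07-19", 1, None, None),
    (1, "2013-12-05", 1, "2013-07-19", 139),
    (1, "2014-03-18", 1, "2013-12-05", 103),
    (1, "2014-06-24", 0, None, None),
    (2, "2014-12-01", 1, None, None),
    (2, "2015-03-09", 1, "2014-12-01", 98),
    (2, "2015-09-28", 0, None, None),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourTable (PersonID INT, Visit_date TEXT, "
    "Medication_intake INT, Previous_date TEXT, Date_diff INT)"
)
conn.executemany("INSERT INTO yourTable VALUES (?, ?, ?, ?, ?)", rows)

# rn1 counts all visits per person; rn2 counts visits per person and intake
# value. Their difference stays constant inside a run of equal intake values.
query = """
WITH cte AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY PersonID
                              ORDER BY Visit_date) AS rn1,
           ROW_NUMBER() OVER (PARTITION BY PersonID, Medication_intake
                              ORDER BY Visit_date) AS rn2
    FROM yourTable
)
SELECT PersonID, Visit_date, Medication_intake, rn1 - rn2 AS island,
       CASE WHEN Date_diff IS NOT NULL AND Medication_intake = 1
            THEN SUM(Date_diff) OVER (PARTITION BY PersonID, rn1 - rn2
                                      ORDER BY Visit_date)
       END AS Running_sum
FROM cte
ORDER BY PersonID, Visit_date
"""
result = conn.execute(query).fetchall()
islands = [r[3] for r in result]
running_sums = [r[4] for r in result]
for row in result:
    print(row)
```

For person 1 the island key is 0 for the first two visits and 1 for the three visits from 2013-07-19 onward, so the sum restarts at 139 instead of carrying over the earlier 204. On other data, rows with a different Medication_intake could in principle share the same `rn1 - rn2` value; adding Medication_intake to the PARTITION BY of the SUM would guard against that.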

Getting latest price of different products from control table

I have a control table, where Prices with Item number are tracked date wise.
id ItemNo Price Date
---------------------------
1 a001 100 1/1/2003
2 a001 105 1/2/2003
3 a001 110 1/3/2003
4 b100 50 1/1/2003
5 b100 55 1/2/2003
6 b100 60 1/3/2003
7 c501 35 1/1/2003
8 c501 38 1/2/2003
9 c501 42 1/3/2003
10 a001 95 1/1/2004
This is the query I am running.
SELECT pr.*
FROM prices pr
INNER JOIN
(
SELECT ItemNo, max(date) max_date
FROM prices
GROUP BY ItemNo
) p ON pr.ItemNo = p.ItemNo AND
pr.date = p.max_date
order by ItemNo ASC
I am getting below values
id ItemNo Price Date
------------------------------
10 a001 95 2004-01-01
6 b100 60 2003-01-03
9 c501 42 2003-01-03
My question is: is this query correct? It does return my desired result.
Your query does what you want, and is a valid approach to solve your problem.
An alternative option would be to use a correlated subquery for filtering:
select p.*
from prices p
where p.date = (select max(p1.date) from prices p1 where p1.itemno = p.itemno)
The upside of this query is that it can take advantage of an index on (itemno, date).
You can also use window functions:
select *
from (
select p.*, rank() over(partition by itemno order by date desc) rn
from prices p
) p
where rn = 1
I would recommend benchmarking the three options against your real data to assess which one performs better.
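Before benchmarking, it is worth confirming that the three formulations agree on the sample data. This is a minimal sketch using Python's sqlite3 module (an assumption; the question does not name a database), with the dates stored as ISO strings and the subquery alias written as `p1`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (id INT, ItemNo TEXT, Price INT, date TEXT)")
conn.executemany(
    "INSERT INTO prices VALUES (?, ?, ?, ?)",
    [
        (1, "a001", 100, "2003-01-01"), (2, "a001", 105, "2003-01-02"),
        (3, "a001", 110, "2003-01-03"), (4, "b100", 50, "2003-01-01"),
        (5, "b100", 55, "2003-01-02"), (6, "b100", 60, "2003-01-03"),
        (7, "c501", 35, "2003-01-01"), (8, "c501", 38, "2003-01-02"),
        (9, "c501", 42, "2003-01-03"), (10, "a001", 95, "2004-01-01"),
    ],
)

queries = {
    # Join against the per-item maximum date (the asker's query).
    "join": """
        SELECT pr.id, pr.ItemNo, pr.Price, pr.date
        FROM prices pr
        INNER JOIN (SELECT ItemNo, max(date) max_date
                    FROM prices GROUP BY ItemNo) p
          ON pr.ItemNo = p.ItemNo AND pr.date = p.max_date
        ORDER BY pr.ItemNo
    """,
    # Correlated subquery filter.
    "correlated": """
        SELECT p.id, p.ItemNo, p.Price, p.date
        FROM prices p
        WHERE p.date = (SELECT max(p1.date) FROM prices p1
                        WHERE p1.ItemNo = p.ItemNo)
        ORDER BY p.ItemNo
    """,
    # Window function: rank 1 is the latest date per item.
    "window": """
        SELECT id, ItemNo, Price, date
        FROM (SELECT p.*, rank() OVER (PARTITION BY ItemNo
                                       ORDER BY date DESC) rn
              FROM prices p) p
        WHERE rn = 1
        ORDER BY ItemNo
    """,
}
results = {name: conn.execute(sql).fetchall() for name, sql in queries.items()}
for name, rows in results.items():
    print(name, rows)
```

All three return ids 10, 6 and 9 here. They would also all return several rows for an item if two prices shared its latest date, since rank() keeps ties just as the max-date comparisons do.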

Row_Number Sybase SQL Anywhere change on multiple condition

I have a selection that returns
EMP DOC DATE
1 78 01/01
1 96 02/01
1 96 02/01
1 105 07/01
2 4 04/01
2 7 04/01
3 45 07/01
3 45 07/01
3 67 09/01
I want to add a row number (I'll use it as a primary ID) that restarts whenever EMP changes and does not increase when DOC is the same as in the previous row, like this:
EMP DOC DATE ID
1 78 01/01 1
1 96 02/01 2
1 96 02/01 2
1 105 07/01 3
2 4 04/01 1
2 7 04/01 2
3 45 07/01 1
3 45 07/01 1
3 67 09/01 2
In SQL Server I could use LAG to compare with the previous DOC, but I can't find a way to do this in Sybase SQL Anywhere. I'm using ROW_NUMBER partitioned by EMP, but it's not what I need:
SELECT EMP, DOC, DATE, ROW_NUMBER() OVER (PARTITION BY EMP ORDER BY EMP, DOC, DATE) ID -- <== THIS WILL CHANGE THE ROW NUMBER ON SAME DOC ON SAME EMP, SO WOULD NOT WORK.
Anyone have a direction for this?
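You seem to want dense_rank():
select
emp,
doc,
date,
dense_rank() over(partition by emp order by date, doc) id
from mytable
This numbers rows within groups having the same emp, and increments only when the (date, doc) pair changes, without gaps. Ordering by date alone would give DOC 4 and DOC 7 of EMP 2 the same id, because they share a date.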
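The choice of sort key matters for EMP 2, whose two documents share a date in the sample data. The sketch below compares dense_rank() ordered by date alone with dense_rank() ordered by date and doc, using Python's sqlite3 module as a stand-in for SQL Anywhere (an assumption). The `ids` helper is made up for this example.

```python
import sqlite3

# Sample rows from the question: (EMP, DOC, DATE). The dates are kept as the
# short strings shown there; they happen to sort correctly as text.
rows = [
    (1, 78, "01/01"), (1, 96, "02/01"), (1, 96, "02/01"), (1, 105, "07/01"),
    (2, 4, "04/01"), (2, 7, "04/01"),
    (3, 45, "07/01"), (3, 45, "07/01"), (3, 67, "09/01"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (emp INT, doc INT, date TEXT)")
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)


def ids(order_by):
    """Return the dense_rank ids, in (emp, date, doc) order, for a sort key."""
    sql = (
        "SELECT dense_rank() OVER (PARTITION BY emp ORDER BY "
        + order_by
        + ") AS id FROM mytable ORDER BY emp, date, doc"
    )
    return [r[0] for r in conn.execute(sql)]


by_date = ids("date")
by_date_doc = ids("date, doc")
print(by_date)
print(by_date_doc)
```

Only the second ordering reproduces the ID column the asker listed, because it separates DOC 4 from DOC 7 for EMP 2.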
If performance is not an issue in your case, you can try something like:
SELECT tx.EMP, tx.DOC, tx.DATE, y.ID
FROM table_xxx tx
JOIN (
SELECT EMP, DOC, ROW_NUMBER() OVER (PARTITION BY EMP ORDER BY DOC) ID
FROM (SELECT EMP, DOC FROM table_xxx GROUP BY EMP, DOC) x
) y ON tx.EMP = y.EMP AND tx.DOC = y.DOC

How to get latest records based on two columns of max

I have a table called Inventory with the below columns
item warehouse date sequence number value
111 100 2019-09-25 12:29:41.000 1 10
111 100 2019-09-26 12:29:41.000 1 20
222 200 2019-09-21 16:07:10.000 1 5
222 200 2019-09-21 16:07:10.000 2 10
333 300 2020-01-19 12:05:23.000 1 4
333 300 2020-01-20 12:05:23.000 1 5
Expected Output:
item warehouse date sequence number value
111 100 2019-09-26 12:29:41.000 1 20
222 200 2019-09-21 16:07:10.000 2 10
333 300 2020-01-20 12:05:23.000 1 5
For each item and warehouse, I need to pick the row with the latest date and, within that date, the latest sequence number.
I tried with below code
select item,warehouse,sequencenumber,sum(value),max(date) as date1
from Inventory t1
where
t1.date IN (select max(date) from Inventory t2
where t1.warehouse=t2.warehouse
and t1.item = t2.item
group by t2.item,t2.warehouse)
group by t1.item,t1.warehouse,t1.sequencenumber
It works for the latest date but not for the latest sequence number.
Can you please suggest how to write a query to get my expected output.
You can use row_number() for this:
select *
from (
select
t.*,
row_number() over(
partition by item, warehouse
order by date desc, sequence_number desc, value desc
) rn
from mytable t
) t
where rn = 1
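Here is a small runnable check of that query against the question's sample rows, again using Python's sqlite3 module as an assumed stand-in for the asker's database. The table name `mytable` and the column name `sequence_number` follow the answer rather than the question's `Inventory` and `sequencenumber`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (item INT, warehouse INT, date TEXT, "
    "sequence_number INT, value INT)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?, ?)",
    [
        (111, 100, "2019-09-25 12:29:41.000", 1, 10),
        (111, 100, "2019-09-26 12:29:41.000", 1, 20),
        (222, 200, "2019-09-21 16:07:10.000", 1, 5),
        (222, 200, "2019-09-21 16:07:10.000", 2, 10),
        (333, 300, "2020-01-19 12:05:23.000", 1, 4),
        (333, 300, "2020-01-20 12:05:23.000", 1, 5),
    ],
)

# Sort each (item, warehouse) partition by date, then sequence number, both
# descending, so rn = 1 is the latest sequence on the latest date.
query = """
SELECT item, warehouse, date, sequence_number, value
FROM (
    SELECT t.*,
           row_number() OVER (
               PARTITION BY item, warehouse
               ORDER BY date DESC, sequence_number DESC, value DESC
           ) AS rn
    FROM mytable t
) t
WHERE rn = 1
ORDER BY item
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Item 222 is the interesting case: both rows share a date, so the second sort key (sequence_number DESC) decides which one is kept.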