Picking up latest 2 records from table in hive


Team, I have a scenario here.
I need to pick the 2 latest records per employee through HQL.
I have tried row_number() but I do not seem to be getting the expected output:
Select
A.emp_ref_i,
A.last_updt_d,
A.start_date,
case when A.Last_updt_d=max(A.Last_updt_d) over (partition by A.emp_ref_i)
and A.start_date=max(a.start_date) over (partition by A.emp_ref_i)
then 'Y' else 'N' end as Valid_f,
a.CHANGE
from
(
select
distinct(emp_ref_i),
last_updt_d,
start_date,
CHANGE
from
PR) A
Currently getting output as:
    EMP_REF_I  LAST_UPDT_D  start_date  Valid_f  CHANGE
1   123        3/29/2020    2/3/2019    Y        CHG3
2   123        3/30/2019    2/4/2018    N        CHG2
3   123        3/29/2019    2/4/2018    N        CHG1
but required:
    EMP_REF_I  LAST_UPDT_D  start_date  Valid_f  CHANGE
1   123        3/29/2020    2/3/2019    Y        CHG3
2   123        3/30/2019    2/4/2018    N        CHG2

Use row_number() and filter on it:
select s.emp_ref_i,
       s.last_updt_d,
       s.start_date,
       case when rn=1 then 'Y' else 'N' end Valid_f,
       s.change
from
(
    Select
        A.*,
        row_number() over(partition by A.emp_ref_i order by a.Last_updt_d desc, a.start_date desc) rn
    from (...) A
) s
where rn<=2;
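The row_number() filter above can be tried end to end with a small sketch. This one runs the same pattern in SQLite through Python's sqlite3 module (not Hive), so it assumes SQLite 3.25 or later for window functions. The table name pr, the column name change_code, and the ISO-formatted dates are assumptions made for the demo; ISO strings are used so that text ordering matches date ordering.

```python
import sqlite3


def latest_two_per_employee(rows):
    """Return the two newest rows per emp_ref_i; the newest is flagged 'Y'."""
    con = sqlite3.connect(":memory:")
    # Hypothetical demo table; dates are ISO strings so they sort correctly as text.
    con.execute(
        "CREATE TABLE pr (emp_ref_i INTEGER, last_updt_d TEXT, "
        "start_date TEXT, change_code TEXT)"
    )
    con.executemany("INSERT INTO pr VALUES (?, ?, ?, ?)", rows)
    query = """
        SELECT emp_ref_i, last_updt_d, start_date,
               CASE WHEN rn = 1 THEN 'Y' ELSE 'N' END AS valid_f,
               change_code
        FROM (
            SELECT pr.*,
                   ROW_NUMBER() OVER (PARTITION BY emp_ref_i
                                      ORDER BY last_updt_d DESC, start_date DESC) AS rn
            FROM pr
        ) AS s
        WHERE rn <= 2          -- keep only the two newest rows per employee
        ORDER BY emp_ref_i, rn
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


sample_rows = [
    (123, "2020-03-29", "2019-02-03", "CHG3"),
    (123, "2019-03-30", "2018-02-04", "CHG2"),
    (123, "2019-03-29", "2018-02-04", "CHG1"),
]
top_two = latest_two_per_employee(sample_rows)
```

With the three sample rows from the question, top_two holds the CHG3 row flagged 'Y' followed by the CHG2 row flagged 'N'; the CHG1 row is filtered out.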

Related

Finding Latest First x among consecutive x from table

I am trying to write a query that finds, for each group, the first date of the latest run of consecutive 1's, as shown below. For example, for Group 1 it shouldn't be 1/2/2022, because a later run of 1's starts on 1/6/2022. It shouldn't be 1/7/2022 either, because that is not the first date of the run.
Please let me know if you have any ideas.
Thanks!
Table x (AsOfDate, Group_Id, Value)
AsOfDate Group_Id Value
1/1/2022 1 0
1/1/2022 2 1
1/2/2022 1 1
1/2/2022 2 1
1/3/2022 1 0
1/3/2022 2 0
1/4/2022 1 0
1/4/2022 2 0
1/5/2022 1 0
1/5/2022 2 1
1/6/2022 1 1
1/6/2022 2 0
1/7/2022 1 1
1/7/2022 2 0
Output
AsOfDate Group_Id
1/6/2022 1
1/5/2022 2

What you want is the earliest date of the last run of consecutive rows with Value = 1.
Use the LAG() window function to flag each row where Value differs from the previous row; a running sum of that flag (grp) numbers the consecutive runs.
Use dense_rank() ordered by grp descending to find the latest run (r = 1).
Use min() to get the first AsOfDate of that run.
select AsOfDate = min(AsOfDate),
       Group_Id
from
(
    select *, r = dense_rank() over (partition by Group_Id, Value
                                     order by grp desc)
    from
    (
        select *, grp = sum(g) over (partition by Group_Id order by AsOfDate)
        from
        (
            select *, g = case when Value <> lag(Value) over (partition by Group_Id
                                                              order by AsOfDate)
                               then 1
                               else 0
                          end
            from x
        ) x
    ) x
) x
where Value = 1
and r = 1
group by Group_Id
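As a rough check of the LAG / running-sum / dense_rank steps, here is a sketch that runs the same logic on the question's sample data. It uses SQLite via Python's sqlite3 (3.25 or later assumed) rather than SQL Server, so the `alias = expression` syntax is rewritten with AS and CTEs; the snake_case column names and ISO dates are assumptions for the demo.

```python
import sqlite3

# Sample data from the question, with dates rewritten as ISO strings.
SAMPLE_X = [
    ("2022-01-01", 1, 0), ("2022-01-01", 2, 1),
    ("2022-01-02", 1, 1), ("2022-01-02", 2, 1),
    ("2022-01-03", 1, 0), ("2022-01-03", 2, 0),
    ("2022-01-04", 1, 0), ("2022-01-04", 2, 0),
    ("2022-01-05", 1, 0), ("2022-01-05", 2, 1),
    ("2022-01-06", 1, 1), ("2022-01-06", 2, 0),
    ("2022-01-07", 1, 1), ("2022-01-07", 2, 0),
]

LATEST_RUN_QUERY = """
WITH flagged AS (      -- g = 1 where value differs from the previous row
    SELECT as_of_date, group_id, value,
           CASE WHEN value <> LAG(value) OVER (PARTITION BY group_id
                                               ORDER BY as_of_date)
                THEN 1 ELSE 0 END AS g
    FROM x
),
runs AS (              -- running sum of g numbers each consecutive run
    SELECT *, SUM(g) OVER (PARTITION BY group_id ORDER BY as_of_date) AS grp
    FROM flagged
),
ranked AS (            -- r = 1 marks the latest run for each (group_id, value)
    SELECT *, DENSE_RANK() OVER (PARTITION BY group_id, value
                                 ORDER BY grp DESC) AS r
    FROM runs
)
SELECT MIN(as_of_date) AS as_of_date, group_id
FROM ranked
WHERE value = 1 AND r = 1
GROUP BY group_id
ORDER BY group_id
"""


def first_date_of_latest_run(rows):
    """Return (as_of_date, group_id) for the start of each group's latest run of 1's."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE x (as_of_date TEXT, group_id INTEGER, value INTEGER)")
    con.executemany("INSERT INTO x VALUES (?, ?, ?)", rows)
    result = con.execute(LATEST_RUN_QUERY).fetchall()
    con.close()
    return result


latest_runs = first_date_of_latest_run(SAMPLE_X)
```

On the sample data this yields 2022-01-06 for Group 1 and 2022-01-05 for Group 2, matching the expected output in the question.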

return row where column value changed from last change

I have a table and I want to know the minimum date since the last change, grouped by 2 columns.
In the data below, I want the latest PartNumberID for each Location, together with the earliest date since that value last changed.
*The ExpectedRow column is not part of the table.
DATA:
Location  RecordAddedDate  PartNumberID  ExpectedRow
7         2022-06-23       1             I want this row
8         2022-06-23       1             I want this row
8         2022-06-24       1
8         2022-06-25       1
9         2022-06-23       1             I want this row
15        2022-06-23       1
15        2022-06-24       1
15        2022-06-25       2
15        2022-06-26       1             I want this row
15        2022-06-27       1
Expected output:
Location  RecordAddedDate  PartNumberID
7         2022-06-23       1
8         2022-06-23       1
9         2022-06-23       1
15        2022-06-26       1
I'm on SQL.
I have tried the following, but I don't know how to stop when the value changes:
with cte as (
select t.LocationID, t.RecordAddedDate, t.PartNumberID
FROM mytable t
INNER JOIN (select PL.LocationID, PL.RecordAddedDate, PL.PartNumberID
FROM mytable PL INNER JOIN
(SELECT PSCc.LocationID, MAX(PSCc.RecordAddedDate) AS DateSetup
FROM mytable PSCc
WHERE PSCc.RecordDeleted = 0
GROUP BY PSCc.LocationID) AS PSCc ON PSCc.LocationID = PL.LocationID AND PSCc.DateSetup = RecordAddedDate) as tt on t.RecordAddedDate<=tt.RecordAddedDate and t.LocationID= tt.LocationID and t.PartNumberID= tt.PartNumberID
)
select *
from cte c
where not exists(
select 1 from cte
where cte.LocationID = c.LocationID
and cte.PartNumberID=c.PartNumberID
and cte.RecordAddedDate<c.RecordAddedDate
)
order by LocationID,RecordAddedDate
Thank you

Use lag(), ordered by RecordAddedDate desc, to flag each change in PartNumberID.
A cumulative sum, sum(isChange), then gives related rows the same group number; grp = 0 will be the rows since the last change.
To get the row with the minimum RecordAddedDate within that group, use row_number().
with
cte1 as
(
    select *,
           isChange = case when PartNumberID
                              = isnull(lag(PartNumberID) over (partition by Location
                                                               order by RecordAddedDate desc),
                                       PartNumberID)
                           then 0
                           else 1
                      end
    from mytable
),
cte2 as
(
    select *, grp = sum(isChange) over (partition by Location order by RecordAddedDate desc)
    from cte1
),
cte3 as
(
    select *, rn = row_number() over (partition by Location order by RecordAddedDate)
    from cte2 t
    where t.grp = 0
)
select *
from cte3 t
where t.rn = 1
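To see the three CTEs at work on the question's data, the sketch below ports them to SQLite through Python's sqlite3 (3.25 or later assumed). This is not the SQL Server original: isnull() becomes COALESCE(), the `alias = expression` form becomes AS, and the snake_case column names are assumptions for the demo.

```python
import sqlite3

# Sample data from the question: (location, record_added_date, part_number_id).
SAMPLE_PARTS = [
    (7, "2022-06-23", 1),
    (8, "2022-06-23", 1), (8, "2022-06-24", 1), (8, "2022-06-25", 1),
    (9, "2022-06-23", 1),
    (15, "2022-06-23", 1), (15, "2022-06-24", 1), (15, "2022-06-25", 2),
    (15, "2022-06-26", 1), (15, "2022-06-27", 1),
]

LAST_CHANGE_QUERY = """
WITH cte1 AS (   -- walking newest to oldest, flag rows whose part differs from the newer row
    SELECT *,
           CASE WHEN part_number_id = COALESCE(
                         LAG(part_number_id) OVER (PARTITION BY location
                                                   ORDER BY record_added_date DESC),
                         part_number_id)
                THEN 0 ELSE 1 END AS is_change
    FROM mytable
),
cte2 AS (        -- grp = 0 covers every row since the most recent change
    SELECT *, SUM(is_change) OVER (PARTITION BY location
                                   ORDER BY record_added_date DESC) AS grp
    FROM cte1
),
cte3 AS (        -- within grp = 0, number rows oldest first
    SELECT *, ROW_NUMBER() OVER (PARTITION BY location
                                 ORDER BY record_added_date) AS rn
    FROM cte2
    WHERE grp = 0
)
SELECT location, record_added_date, part_number_id
FROM cte3
WHERE rn = 1
ORDER BY location
"""


def first_row_since_last_change(rows):
    """Return, per location, the earliest row carrying the current part number."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE mytable (location INTEGER, record_added_date TEXT, "
        "part_number_id INTEGER)"
    )
    con.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)
    result = con.execute(LAST_CHANGE_QUERY).fetchall()
    con.close()
    return result


since_last_change = first_row_since_last_change(SAMPLE_PARTS)
```

For Location 15 the descending scan sees part 1 on 06-27 and 06-26 before hitting part 2 on 06-25, so grp = 0 covers just those two rows and 2022-06-26 is returned.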

SQL Query to find the Row with first change of data

UniqueId  ITEM  DATE
1         A     2022-01-01
2         A     2022-01-02
3         B     2022-01-03
4         B     2022-01-04
5         A     2022-01-05
6         A     2022-01-06
7         B     2022-01-07
8         B     2022-01-08
9         A     2022-01-09
10        A     2022-01-10
11        A     2022-01-11
I have the above table, where the item changes from A to B and then from B back to A (and so on).
The most recent item in the table, based on the date, is A (the last row).
I need to find the date on which this last item (A) most recently came into effect.
In this case, item A has been in effect from 2022-01-09 onwards (UniqueId 9).
How can I find the UniqueId or the date at which item A last came into effect (row 9)?
Thank you.

with data as (
    select *,
        last_value(item) over (order by "date"
            rows between unbounded preceding and unbounded following) as last_item,
        lag(item) over (order by "date") as prev_item
    from T
)
select
    max(case when item = last_item and item <> prev_item then "date" end) as max_date
from data;
or
with data as (
    select *,
        case when item <> lag(item) over (order by "date")
              and item = last_value(item) over (order by "date"
                  rows between unbounded preceding and unbounded following)
             then 1 end as flag
    from T
)
select max("date") as last_transition_date
from data
where flag = 1;
https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=bd5f6398c0167d74c26a67fafac5225e
Supposing you need all the data:
with data as (
    select *,
        case when item <> lag(item) over (order by "date")
              and item = last_value(item) over (order by "date"
                  rows between unbounded preceding and unbounded following)
             then 1 end as flag
    from T
)
select *,
    max(case when flag = 1 then "date" end) over () as last_transition_date
from data;
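One detail worth checking with LAST_VALUE: when a window has an ORDER BY but no explicit frame, the default frame ends at the current row, so LAST_VALUE(item) simply returns the current row's item. Extending the frame to UNBOUNDED FOLLOWING makes it return the final item of the whole set. The sketch below shows the difference in SQLite via Python's sqlite3 (3.25 or later assumed, not SQL Server); the table name t and the column names unique_id and dt are assumptions for the demo.

```python
import sqlite3

# Sample data from the question: (unique_id, item, dt).
ITEMS = [
    (1, "A", "2022-01-01"), (2, "A", "2022-01-02"), (3, "B", "2022-01-03"),
    (4, "B", "2022-01-04"), (5, "A", "2022-01-05"), (6, "A", "2022-01-06"),
    (7, "B", "2022-01-07"), (8, "B", "2022-01-08"), (9, "A", "2022-01-09"),
    (10, "A", "2022-01-10"), (11, "A", "2022-01-11"),
]

FULL_FRAME = "ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING"


def _connect():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (unique_id INTEGER, item TEXT, dt TEXT)")
    con.executemany("INSERT INTO t VALUES (?, ?, ?)", ITEMS)
    return con


def compare_frames():
    """Return (item, LAST_VALUE with default frame, LAST_VALUE with full frame) per row."""
    con = _connect()
    rows = con.execute(f"""
        SELECT item,
               LAST_VALUE(item) OVER (ORDER BY dt) AS default_frame,
               LAST_VALUE(item) OVER (ORDER BY dt {FULL_FRAME}) AS full_frame
        FROM t
        ORDER BY dt
    """).fetchall()
    con.close()
    return rows


def last_transition_date():
    """Return the date on which the final item most recently came into effect."""
    con = _connect()
    (result,) = con.execute(f"""
        WITH data AS (
            SELECT *,
                   LAG(item) OVER (ORDER BY dt) AS prev_item,
                   LAST_VALUE(item) OVER (ORDER BY dt {FULL_FRAME}) AS last_item
            FROM t
        )
        SELECT MAX(dt) FROM data
        WHERE item = last_item AND item <> prev_item
    """).fetchone()
    con.close()
    return result


frames = compare_frames()
transition = last_transition_date()
```

In frames, the default-frame column mirrors each row's own item, while the full-frame column is 'A' on every row. The transition date is 2022-01-09 either way here, since the most recent change is by definition a change into the final item, but the explicit frame makes the comparison do what its name suggests.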

Getting a flag using a comparison of current item with previous item in time, using LAG() is indeed the way.
But it is sufficient to take the highest date and the highest UniqueId (both ascend together) among the rows where that flag is 1:
WITH
-- your input
indata(UniqueId,ITEM,DATE) AS (
SELECT 1,'A',DATE '2022-01-01'
UNION ALL SELECT 2,'A',DATE '2022-01-02'
UNION ALL SELECT 3,'B',DATE '2022-01-03'
UNION ALL SELECT 4,'B',DATE '2022-01-04'
UNION ALL SELECT 5,'A',DATE '2022-01-05'
UNION ALL SELECT 6,'A',DATE '2022-01-06'
UNION ALL SELECT 7,'B',DATE '2022-01-07'
UNION ALL SELECT 8,'B',DATE '2022-01-08'
UNION ALL SELECT 9,'A',DATE '2022-01-09'
UNION ALL SELECT 10,'A',DATE '2022-01-10'
UNION ALL SELECT 11,'A',DATE '2022-01-11'
)
-- real query starts here; replace following comma with "WITH"
,
w_change_ind AS (
SELECT
*
, CASE WHEN LAG(item) OVER(ORDER BY date) <> item
THEN 1
ELSE 0
END AS chg_ind
FROM indata
)
SELECT
MAX(uniqueid) AS uqid
, MAX(date) AS dt
FROM w_change_ind
WHERE chg_ind=1
;
-- out uqid | dt
-- out ------+------------
-- out 9 | 2022-01-09

Based on your description, this is one way to do what you want.
select top 1 * from table1
where item ='A'
order by uniqueid desc
If this is not what you want, then you will have to provide additional information.

First value in DATE minus 30 days SQL

I have a bunch of data from which I'm showing the ID, the max date and its corresponding values (user id, type, ...). I then need to take the MAX date for each ID, subtract 30 days, and show the first date within that period along with its corresponding values.
Example:
ID Date Name
1 01.05.2018 AAA
1 21.04.2018 CCC
1 05.04.2018 BBB
1 28.03.2018 AAA
expected:
ID max_date max_name previous_date previous_name
1 01.05.2018 AAA 05.04.2018 BBB
I have a working solution using subselects, but because the WHERE part is quite large, the refresh takes ages.
The subselect looks like this:
(SELECT MIN(N.name)
FROM t1 N
WHERE N.ID = T.ID
AND (N.date < MAX(T.date) AND N.date >= (MAX(T.date)-30))
AND (...)) AS PreviousName
How'd you write the select?
I'm using TSQL
Thanks

I can do this with 2 CTEs to build up the dates and names.
MS SQL Server 2017 Schema Setup:
CREATE TABLE t1 (ID int, theDate date, theName varchar(10)) ;
INSERT INTO t1 (ID, theDate, theName)
VALUES
( 1,'2018-05-01','AAA' )
, ( 1,'2018-04-21','CCC' )
, ( 1,'2018-04-05','BBB' )
, ( 1,'2018-03-27','AAA' )
, ( 2,'2018-05-02','AAA' )
, ( 2,'2018-05-21','CCC' )
, ( 2,'2018-03-03','BBB' )
, ( 2,'2018-01-20','AAA' )
;
Main Query:
;WITH cte1 AS (
SELECT t1.ID, t1.theDate, t1.theName
, DATEADD(day,-30,t1.theDate) AS dMinus30
, ROW_NUMBER() OVER (PARTITION BY t1.ID ORDER BY t1.theDate DESC) AS rn
FROM t1
)
, cte2 AS (
SELECT c2.ID, c2.theDate, c2.theName
, ROW_NUMBER() OVER (PARTITION BY c2.ID ORDER BY c2.theDate) AS rn
, COUNT(*) OVER (PARTITION BY c2.ID) AS theCount
FROM cte1
INNER JOIN cte1 c2 ON cte1.ID = c2.ID
AND c2.theDate >= cte1.dMinus30
WHERE cte1.rn = 1
GROUP BY c2.ID, c2.theDate, c2.theName
)
SELECT cte1.ID, cte1.theDate AS max_date, cte1.theName AS max_name
, cte2.theDate AS previous_date, cte2.theName AS previous_name
, cte2.theCount
FROM cte1
INNER JOIN cte2 ON cte1.ID = cte2.ID
AND cte2.rn=1
WHERE cte1.rn = 1
Results:
| ID | max_date | max_name | previous_date | previous_name |
|----|------------|----------|---------------|---------------|
| 1 | 2018-05-01 | AAA | 2018-04-05 | BBB |
| 2 | 2018-05-21 | CCC | 2018-05-02 | AAA |
cte1 numbers each ID's rows by date, newest first, using the ROW_NUMBER() window function, so its rn = 1 row carries max_date and max_name; it also computes each date minus 30 days. cte2 joins back to cte1 to collect every row dated within 30 days of that max date, then numbers those rows oldest first, so its rn = 1 row is the earliest date in the window. The outer query joins the two results, keeping only the rn = 1 row from each.
I'm not sure how well it will scale with your data, but the CTEs avoid a correlated subselect per row, so they should optimize reasonably well.
EDIT: For the additional requirement, I just added in another COUNT() window function to cte2.
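The two-step idea described above (find each ID's newest row, then the earliest row in the 30 days before it) can be sketched as follows. This is a variant written for SQLite via Python's sqlite3 (3.25 or later assumed), not the T-SQL above: DATEADD is replaced by SQLite's date(..., '-30 days'), the previous row is required to be strictly older than the max date as in the asker's subselect, and a LEFT JOIN keeps IDs that have no earlier row in the window. The snake_case column names are assumptions for the demo.

```python
import sqlite3

# Rows from the answer's schema setup: (id, the_date, the_name).
T1_ROWS = [
    (1, "2018-05-01", "AAA"), (1, "2018-04-21", "CCC"),
    (1, "2018-04-05", "BBB"), (1, "2018-03-27", "AAA"),
    (2, "2018-05-02", "AAA"), (2, "2018-05-21", "CCC"),
    (2, "2018-03-03", "BBB"), (2, "2018-01-20", "AAA"),
]

WINDOW_QUERY = """
WITH latest AS (      -- rn = 1 is the newest row per id
    SELECT id, the_date, the_name,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY the_date DESC) AS rn
    FROM t1
),
windowed AS (         -- rows in the 30 days before the newest row, oldest first
    SELECT p.id, p.the_date, p.the_name,
           ROW_NUMBER() OVER (PARTITION BY p.id ORDER BY p.the_date) AS rn
    FROM t1 AS p
    JOIN latest AS l ON l.id = p.id AND l.rn = 1
    WHERE p.the_date < l.the_date
      AND p.the_date >= date(l.the_date, '-30 days')
)
SELECT l.id, l.the_date, l.the_name, w.the_date, w.the_name
FROM latest AS l
LEFT JOIN windowed AS w ON w.id = l.id AND w.rn = 1
WHERE l.rn = 1
ORDER BY l.id
"""


def max_and_first_in_window(rows):
    """Return (id, max_date, max_name, previous_date, previous_name) per id."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t1 (id INTEGER, the_date TEXT, the_name TEXT)")
    con.executemany("INSERT INTO t1 VALUES (?, ?, ?)", rows)
    result = con.execute(WINDOW_QUERY).fetchall()
    con.close()
    return result


window_rows = max_and_first_in_window(T1_ROWS)
```

On the answer's sample rows this gives 2018-04-05 / BBB as the previous row for ID 1 and 2018-05-02 / AAA for ID 2. An ID with a single row comes back with NULL (None in Python) in the two previous columns.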

I would do:
select id,
       max(case when seqnum = 1 then date end) as max_date,
       max(case when seqnum = 1 then name end) as max_name,
       max(case when seqnum = 2 then date end) as prev_date,
       max(case when seqnum = 2 then name end) as prev_name
from (select e.*, row_number() over (partition by id order by date desc) as seqnum
      from example e
     ) e
group by id;

How to calculate unique rank in SQL Server (without any duplication)?

I want to calculate unique rankings but I get duplicate rankings
Here's my attempt:
SELECT
TG.EMPCODE,
DENSE_RANK() OVER (ORDER BY TS.COUNT_DEL DESC, TG.COUNT_TG DESC) AS YOUR_RANK
FROM
(SELECT
EmpCode,
SUM(CASE WHEN Tgenerate = 1 THEN 1 ELSE 0 END) AS COUNT_TG
FROM
TBLTGENERATE1
GROUP BY
EMPCODE) TG
INNER JOIN
(SELECT
EMP_CODE,
SUM(CASE WHEN STATUS = 'DELIVERED' THEN 1 ELSE 0 END) AS COUNT_DEL
FROM
TBLSTAT
GROUP BY
EMP_CODE) TS ON TG.EMPCODE = TS.EMP_CODE;
The output I get is like this:
EID Rank
---------
102 1
105 2
101 2
103 3
106 4
The same rank is given to both 105 and 101.
How do I calculate unique ranking?

Use ROW_NUMBER() instead of DENSE_RANK():
SELECT TG.EMPCODE,
ROW_NUMBER() OVER (ORDER BY TS.COUNT_DEL DESC, TG.COUNT_TG DESC) AS YOUR_RANK
Ties will then be given sequential rankings.
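A small side-by-side sketch of the two functions, run in SQLite through Python's sqlite3 (3.25 or later assumed, not SQL Server). The scores table and its count values are made up for illustration; they are chosen only so that 105 and 101 tie as in the question. The extra emp_code key in the ROW_NUMBER() ordering is an assumption added here, because without a tiebreaker the order among tied rows is not guaranteed.

```python
import sqlite3

# Hypothetical aggregated counts: (emp_code, count_del, count_tg).
# The values are invented so that 105 and 101 tie on both counts.
SCORES = [
    (102, 5, 3),
    (105, 4, 2),
    (101, 4, 2),
    (103, 3, 1),
    (106, 2, 1),
]

RANK_QUERY = """
SELECT emp_code,
       DENSE_RANK() OVER (ORDER BY count_del DESC, count_tg DESC) AS dense_rnk,
       -- emp_code is added as a tiebreaker so tied rows get a repeatable order
       ROW_NUMBER() OVER (ORDER BY count_del DESC, count_tg DESC, emp_code) AS row_num
FROM scores
ORDER BY row_num
"""


def rank_employees(rows):
    """Return (emp_code, dense_rank, row_number) ordered by row_number."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE scores (emp_code INTEGER, count_del INTEGER, count_tg INTEGER)"
    )
    con.executemany("INSERT INTO scores VALUES (?, ?, ?)", rows)
    result = con.execute(RANK_QUERY).fetchall()
    con.close()
    return result


rankings = rank_employees(SCORES)
```

Here DENSE_RANK() assigns 2 to both 101 and 105, while ROW_NUMBER() assigns them 2 and 3, so every employee ends up with a distinct rank.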
