Select prev value in Hive SQL - sql

I've been poking at this for a while and haven't had any luck. I have a table like the one below. I'm trying to get the col value immediately before the first text value for each user. I'm on Hive.
user ts col isnumber
1 1473811200 5 y
1 1473811205 10 y
1 1473811207 15 y
1 1473811212 text1 n
1 1473811215 text2 n
1 1473811225 30 y
2 1473811201 10 y
2 1473811205 text3 n
2 1473811207 20 y
2 1473811210 30 y
Output should be:
user col
1 15
2 10

Using windowed functions:
SELECT user_, prev
FROM (SELECT *, ROW_NUMBER() OVER(PARTITION BY user_ ORDER BY ts) AS rn
      FROM (SELECT *, CASE
                        WHEN isnumber = 'y' THEN NULL
                        WHEN LAG(isnumber,1) OVER(PARTITION BY user_ ORDER BY ts) = 'y'
                          THEN LAG(col,1) OVER(PARTITION BY user_ ORDER BY ts)
                      END AS prev
            FROM tab) sub
      WHERE prev IS NOT NULL) sub2
WHERE rn = 1;
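To try the idea without a Hive cluster, here is a minimal sketch that runs a variant of the query against the question's sample data. It assumes SQLite (3.25 or later, for window functions) through Python's sqlite3 module as a stand-in for Hive; the names prev_col, prev_isnumber, lagged, candidates and value_before_first_text are illustrative only and do not come from the answer.

```python
import sqlite3

# Sample rows from the question: (user_, ts, col, isnumber).
ROWS = [
    (1, 1473811200, "5", "y"),
    (1, 1473811205, "10", "y"),
    (1, 1473811207, "15", "y"),
    (1, 1473811212, "text1", "n"),
    (1, 1473811215, "text2", "n"),
    (1, 1473811225, "30", "y"),
    (2, 1473811201, "10", "y"),
    (2, 1473811205, "text3", "n"),
    (2, 1473811207, "20", "y"),
    (2, 1473811210, "30", "y"),
]

# Keep only text rows whose previous row (per user, by ts) is numeric,
# then take the earliest such row per user.
QUERY = """
SELECT user_, prev_col
FROM (SELECT user_, prev_col,
             ROW_NUMBER() OVER (PARTITION BY user_ ORDER BY ts) AS rn
      FROM (SELECT user_, ts, isnumber,
                   LAG(col)      OVER (PARTITION BY user_ ORDER BY ts) AS prev_col,
                   LAG(isnumber) OVER (PARTITION BY user_ ORDER BY ts) AS prev_isnumber
            FROM tab) AS lagged
      WHERE isnumber = 'n' AND prev_isnumber = 'y') AS candidates
WHERE rn = 1
ORDER BY user_
"""


def value_before_first_text(rows):
    """Load rows into an in-memory table and return (user_, prev_col) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tab (user_ INTEGER, ts INTEGER, col TEXT, isnumber TEXT)"
    )
    conn.executemany("INSERT INTO tab VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


print(value_before_first_text(ROWS))
```

Note that a user whose very first row is text has no previous value; in this sketch the result would then come from a later text row that does follow a number, which may or may not be what you want.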

Related

Finding Latest First x among consecutive x from table

I am trying to write a query that finds the first row of the latest run of consecutive 1's in each group, as shown below. For example, for Group 1 it shouldn't be 1/2/2022, because a 1 appears again later on 1/6/2022. It shouldn't be 1/7/2022 either.
Please let me know if you have any idea.
Thanks!
Table x (AsOfDate, Group_Id, Value)
AsOfDate Group_Id Value
1/1/2022 1 0
1/1/2022 2 1
1/2/2022 1 1
1/2/2022 2 1
1/3/2022 1 0
1/3/2022 2 0
1/4/2022 1 0
1/4/2022 2 0
1/5/2022 1 0
1/5/2022 2 1
1/6/2022 1 1
1/6/2022 2 0
1/7/2022 1 1
1/7/2022 2 0
Output
AsOfDate Group_Id
1/6/2022 1
1/5/2022 2
What you want is the earliest date of the last group of consecutive rows with Value = 1:
Use the LAG() window function to detect where Value changes, and a running sum of those changes to number each continuous group (grp).
Use dense_rank() ordered by grp descending to find the latest group (r = 1).
Use min() to get the "first" AsOfDate in that group.
select AsOfDate = min(AsOfDate),
       Group_Id
from
(
    select *, r = dense_rank() over (partition by Group_Id, Value
                                     order by grp desc)
    from
    (
        select *, grp = sum(g) over (partition by Group_Id order by AsOfDate)
        from
        (
            select *, g = case when Value <> lag(Value) over (partition by Group_Id
                                                              order by AsOfDate)
                               then 1
                               else 0
                          end
            from x
        ) x
    ) x
) x
where Value = 1
and r = 1
group by Group_Id
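The three steps can be checked against the sample data with a small sketch. It assumes SQLite (3.25+) via Python's sqlite3 rather than the engine used in the answer, so the column = expression aliases become AS aliases and the dates are stored as ISO strings so they sort correctly; the CTE names flagged, grouped and ranked and the function name are made up for illustration.

```python
import sqlite3

# Sample data from the question, with dates rewritten as sortable ISO strings.
ROWS = [
    ("2022-01-01", 1, 0), ("2022-01-01", 2, 1),
    ("2022-01-02", 1, 1), ("2022-01-02", 2, 1),
    ("2022-01-03", 1, 0), ("2022-01-03", 2, 0),
    ("2022-01-04", 1, 0), ("2022-01-04", 2, 0),
    ("2022-01-05", 1, 0), ("2022-01-05", 2, 1),
    ("2022-01-06", 1, 1), ("2022-01-06", 2, 0),
    ("2022-01-07", 1, 1), ("2022-01-07", 2, 0),
]

QUERY = """
WITH flagged AS (   -- g = 1 on each row whose Value differs from the previous row
    SELECT AsOfDate, Group_Id, Value,
           CASE WHEN Value <> LAG(Value) OVER (PARTITION BY Group_Id ORDER BY AsOfDate)
                THEN 1 ELSE 0 END AS g
    FROM x
),
grouped AS (        -- running sum of g numbers each run of equal values
    SELECT AsOfDate, Group_Id, Value,
           SUM(g) OVER (PARTITION BY Group_Id ORDER BY AsOfDate) AS grp
    FROM flagged
),
ranked AS (         -- r = 1 marks the latest run for each (Group_Id, Value)
    SELECT AsOfDate, Group_Id, Value,
           DENSE_RANK() OVER (PARTITION BY Group_Id, Value ORDER BY grp DESC) AS r
    FROM grouped
)
SELECT MIN(AsOfDate) AS AsOfDate, Group_Id
FROM ranked
WHERE Value = 1 AND r = 1
GROUP BY Group_Id
ORDER BY Group_Id
"""


def latest_run_start(rows):
    """Return (AsOfDate, Group_Id) for the start of each group's latest run of 1s."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE x (AsOfDate TEXT, Group_Id INTEGER, Value INTEGER)")
    conn.executemany("INSERT INTO x VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


print(latest_run_start(ROWS))
```

A group that never has Value = 1 simply produces no row in this sketch.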

find the consecutive values in impala

I have the data set below with ID, Date and Value. I want to flag each ID where three consecutive days have value 0.
id date value
1 8/10/2021 1
1 8/11/2021 0
1 8/12/2021 0
1 8/13/2021 0
1 8/14/2021 5
2 8/10/2021 2
2 8/11/2021 3
2 8/12/2021 0
2 8/13/2021 0
2 8/14/2021 6
3 8/10/2021 3
3 8/11/2021 4
3 8/12/2021 0
3 8/13/2021 0
3 8/14/2021 0
output
id date value Flag
1 8/10/2021 1 Y
1 8/11/2021 0 Y
1 8/12/2021 0 Y
1 8/13/2021 0 Y
1 8/14/2021 5 Y
2 8/10/2021 2 N
2 8/11/2021 3 N
2 8/12/2021 0 N
2 8/13/2021 0 N
2 8/14/2021 6 N
3 8/10/2021 3 Y
3 8/11/2021 4 Y
3 8/12/2021 0 Y
3 8/13/2021 0 Y
3 8/14/2021 0 Y
Thank you.
Using the count() window function, you can count the 0s in the frame [current row, 2 following] (ordered by date), that is, a frame of three consecutive rows evaluated for each row:
count(case when value=0 then 1 else null end) over(partition by id order by date_ rows between current row and 2 following ) cnt
If the count equals 3, three consecutive 0s were found, and the case expression produces Y for each row with cnt=3: case when cnt=3 then 'Y' else 'N' end.
To propagate the 'Y' flag to the whole id group, use max(...) over (partition by id).
Demo with your data example (tested on Hive):
with mydata as (--Data example, dates converted to sortable format yyyy-MM-dd
select 1 id,'2021-08-10' date_, 1 value union all
select 1,'2021-08-11',0 union all
select 1,'2021-08-12',0 union all
select 1,'2021-08-13',0 union all
select 1,'2021-08-14',5 union all
select 2,'2021-08-10',2 union all
select 2,'2021-08-11',3 union all
select 2,'2021-08-12',0 union all
select 2,'2021-08-13',0 union all
select 2,'2021-08-14',6 union all
select 3,'2021-08-10',3 union all
select 3,'2021-08-11',4 union all
select 3,'2021-08-12',0 union all
select 3,'2021-08-13',0 union all
select 3,'2021-08-14',0
) --End of data example, use your table instead of this CTE
select id, date_, value,
max(case when cnt=3 then 'Y' else 'N' end) over (partition by id) flag
from
(
select id, date_, value,
count(case when value=0 then 1 else null end) over(partition by id order by date_ rows between current row and 2 following ) cnt
from mydata
)s
order by id, date_ --remove ordering if not necessary
--added it to get result in the same order
Result:
id date_ value flag
1 2021-08-10 1 Y
1 2021-08-11 0 Y
1 2021-08-12 0 Y
1 2021-08-13 0 Y
1 2021-08-14 5 Y
2 2021-08-10 2 N
2 2021-08-11 3 N
2 2021-08-12 0 N
2 2021-08-13 0 N
2 2021-08-14 6 N
3 2021-08-10 3 Y
3 2021-08-11 4 Y
3 2021-08-12 0 Y
3 2021-08-13 0 Y
3 2021-08-14 0 Y
You can identify the ids by comparing lag()s. Then spread the value across all rows. The following gets the flag on the third 0:
select t.*,
(case when value = 0 and prev_value_date_2 = prev_date_2
then 'Y' else 'N'
end) as flag_on_row
from (select t.*,
lag(date, 2) over (partition by value, id order by date) as prev_value_date_2,
lag(date, 2) over (partition by id order by date) as prev_date_2
from t
) t;
The above logic uses lag() so it is easy to extend to longer streaks of 0s. The "2" is looking two rows behind, so if the lagged values are the same, then there are three rows in a row with the same value.
And to spread the value:
select t.*, max(flag_on_row) over (partition by id) as flag
from (select t.*,
(case when value = 0 and prev_value_date_2 = prev_date_2
then 'Y' else 'N'
end) as flag_on_row
from (select t.*,
lag(date, 2) over (partition by value, id order by date) as prev_value_date_2,
lag(date, 2) over (partition by id order by date) as prev_date_2
from t
) t
) t;
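The lag() comparison can be exercised on the question's data with a short sketch. It assumes SQLite (3.25+) through Python's sqlite3 instead of Impala, renames the date column to date_ with ISO-formatted values so it sorts correctly, and uses the made-up names lagged, flagged and flag_three_zeros.

```python
import sqlite3

# Sample data from the question: (id, date_, value), dates as ISO strings.
ROWS = [
    (1, "2021-08-10", 1), (1, "2021-08-11", 0), (1, "2021-08-12", 0),
    (1, "2021-08-13", 0), (1, "2021-08-14", 5),
    (2, "2021-08-10", 2), (2, "2021-08-11", 3), (2, "2021-08-12", 0),
    (2, "2021-08-13", 0), (2, "2021-08-14", 6),
    (3, "2021-08-10", 3), (3, "2021-08-11", 4), (3, "2021-08-12", 0),
    (3, "2021-08-13", 0), (3, "2021-08-14", 0),
]

# If the row two steps back among rows with the same value is also the row
# two steps back overall, the current row and the two before it all share
# that value. A NULL on either side makes the comparison NULL, hence 'N'.
QUERY = """
SELECT id, date_, value,
       MAX(flag_on_row) OVER (PARTITION BY id) AS flag
FROM (SELECT id, date_, value,
             CASE WHEN value = 0 AND prev_value_date_2 = prev_date_2
                  THEN 'Y' ELSE 'N' END AS flag_on_row
      FROM (SELECT id, date_, value,
                   LAG(date_, 2) OVER (PARTITION BY value, id ORDER BY date_)
                       AS prev_value_date_2,
                   LAG(date_, 2) OVER (PARTITION BY id ORDER BY date_)
                       AS prev_date_2
            FROM t) AS lagged) AS flagged
ORDER BY id, date_
"""


def flag_three_zeros(rows):
    """Return (id, date_, value, flag) with 'Y' on every row of a flagged id."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER, date_ TEXT, value INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for row in flag_three_zeros(ROWS):
    print(row)
```

Like the queries above, this counts consecutive rows rather than calendar days, so it relies on there being one row per id per day.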

Cumulative sum value 1 and reset sum when meet 0 SQL

I am trying the query below, but it's not working.
SELECT *,
CASE WHEN x = 1
THEN ROW_NUMBER() OVER(PARTITION BY id ORDER BY date)
ELSE 0 END AS y
Expected result :
x y
1 1
1 2
1 3
0 0
1 1
1 2
How can I achieve this? I still want to keep 0 in the y column.
Count the number of zeros up to each value and then use this to group. The final enumeration uses row_number():
select t.*,
(case when x = 0 then 0
else row_number() over (partition by x, grp order by date)
end) as y
from (select t.*, countif(x = 0) over (order by date) as grp
from t
) t
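Here is a minimal sketch of that grouping trick, run on the expected-result sequence from the question. It assumes SQLite (3.25+) via Python's sqlite3; countif() is not available there, so a SUM over a CASE expression stands in for it, and the column d (an ordering key in place of date) and the function name are invented for the example.

```python
import sqlite3

# (d, x): d is a stand-in ordering column, x is the 1/0 sequence from the question.
ROWS = [(1, 1), (2, 1), (3, 1), (4, 0), (5, 1), (6, 1)]

# grp = number of zeros seen so far, so it increases at every 0 and each run
# of 1s between zeros shares one grp value. ROW_NUMBER() then restarts per run.
QUERY = """
SELECT x,
       CASE WHEN x = 0 THEN 0
            ELSE ROW_NUMBER() OVER (PARTITION BY x, grp ORDER BY d)
       END AS y
FROM (SELECT d, x,
             SUM(CASE WHEN x = 0 THEN 1 ELSE 0 END) OVER (ORDER BY d) AS grp
      FROM t) AS s
ORDER BY d
"""


def running_count_with_reset(rows):
    """Return (x, y) pairs where y counts consecutive 1s and is 0 on each 0."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (d INTEGER, x INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


print(running_count_with_reset(ROWS))
```

If the counter should restart per id, as in the question's own attempt, id would need to be added to both PARTITION BY clauses.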

Random row from a range (1:n) within groups in SQL

I need to select a random row from a table per group, but the row's position within its group (ordered by time) must not exceed a constant (for example, const = 3).
What I mean:
id time x
1 10:20 1
1 11:21 9
1 16:14 4
1 08:13 8
2 01:20 2
2 21:13 0
For id=1 rows could be:
id time x
1 10:20 1
1 11:21 9
1 08:13 8
but not
1 16:14 4
because, ordered by time, its row number within the group is greater than 3.
For id = 2, any row is acceptable.
WITH cte as (
SELECT *, ROW_NUMBER() OVER (partition by id ORDER BY RANDOM()) as rn
FROM myTable
)
SELECT *
FROM cte
WHERE rn <= 3
Something like this:
SELECT DISTINCT ON (id) *
FROM (SELECT *,
             row_number() OVER (PARTITION BY id ORDER BY time) AS up_lim
      FROM tab1) AS a
WHERE up_lim <= 3
ORDER BY id, random();
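DISTINCT ON is PostgreSQL-specific. As a rough sketch of the same two-step idea for engines without it, the version below ranks rows by time, keeps the first three per id, and then picks one of them with a second, randomly ordered ROW_NUMBER(). It assumes SQLite (3.25+) via Python's sqlite3; the alias pick and the function name are hypothetical.

```python
import sqlite3

# Sample data from the question: (id, time, x). Times are zero-padded strings,
# so they sort correctly as text.
ROWS = [
    (1, "10:20", 1),
    (1, "11:21", 9),
    (1, "16:14", 4),
    (1, "08:13", 8),
    (2, "01:20", 2),
    (2, "21:13", 0),
]

QUERY = """
SELECT id, time, x
FROM (SELECT a.*,
             -- random order among the rows that survived the up_lim filter
             ROW_NUMBER() OVER (PARTITION BY id ORDER BY RANDOM()) AS pick
      FROM (SELECT t.*,
                   -- position of the row within its id, ordered by time
                   ROW_NUMBER() OVER (PARTITION BY id ORDER BY time) AS up_lim
            FROM tab1 AS t) AS a
      WHERE up_lim <= 3) AS b
WHERE pick = 1
ORDER BY id
"""


def pick_random_early_row(rows):
    """Return one random row per id, drawn from that id's first three rows by time."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tab1 (id INTEGER, time TEXT, x INTEGER)")
    conn.executemany("INSERT INTO tab1 VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


print(pick_random_early_row(ROWS))
```

The output changes from run to run, but the row for id 1 is never the 16:14 one, because that row is fourth in time order.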

SQL Random N rows for each distinct value in column

I have the following table:
Name Field
A 1
B 1
C 1
D 1
E 1
F 1
G 1
H 2
I 2
J 2
K 3
L 3
M 3
N 3
O 3
P 3
Q 3
R 3
S 3
T 3
I need a SQL query which will generate me a set with 5 random rows for each distinct value on column Field.
For example, results expected:
Name Field
A 1
B 1
D 1
E 1
G 1
J 2
I 2
H 2
M 3
Q 3
T 3
S 3
P 3
Is there an easy way to do this? Or should I split the table into several tables, generate random rows for each, and then union them?
You can do this with a CTE using a ROW_NUMBER() whilst PARTITIONing on the Field:
;With Cte As
(
Select Name, Field,
Row_Number() Over (Partition By Field Order By NewId()) RN
From YourTable
)
Select Name, Field
From Cte
Where RN <= 5
You can readily do this with row_number():
select name, field
from (select t.*,
row_number() over (partition by field order by newid()) as seqnum
from t
) t
where seqnum <= 5;
An enhancement to Gordon Linoff's code that really helped me, for when you need filter criteria in your query:
select *
from (select t.*,
row_number() over (partition by region order by newid()) as seqnum
from MyTable t
WHERE t.program = 'ACME'
) t
where seqnum <= 1500;