Find missing numbers from a sequence for each value in Hive SQL

I have a table like the one below:
user_id sequence
1 0
1 1
1 2
2 1
3 2
Here 0, 1 and 2 are the fixed sequence values; a user can have at most these three. I want a flag of N wherever a sequence value is missing for a user, otherwise the flag should be Y. The output should look like this:
user_id sequence flag
1 0 Y
1 1 Y
1 2 Y
2 0 N
2 1 Y
2 2 N
3 0 N
3 1 N
3 2 Y

Select the distinct user_id values and cross join them with the sequence (0, 1, 2) to get all user+sequence combinations. Then left join with your table to calculate the flag:
select us.user_id,
us.sequence,
case when t.user_id is null then 'N' else 'Y' end flag
from
(--all user sequence combinations
select u.user_id, s.sequence
from (select distinct user_id from mytable) u
cross join (select stack (3, 0, 1, 2) as sequence) s
) us --all user+sequence
left join mytable t on us.sequence=t.sequence and us.user_id=t.user_id
order by us.user_id, us.sequence;
Demo with your example data:
with
mytable as ( --use your table instead of this
select stack(5,
1, 0,
1, 1,
1, 2,
2, 1,
3, 2) as (user_id,sequence)
)
select us.user_id,
us.sequence,
case when t.user_id is null then 'N' else 'Y' end flag
from
(--all user sequence combinations
select u.user_id,
s.sequence
from (select distinct user_id from mytable) u
cross join (select stack (3, 0, 1, 2) as sequence) s
) us --all user+sequence
left join mytable t on us.sequence=t.sequence and us.user_id=t.user_id
order by us.user_id, us.sequence;
Result:
user_id sequence flag
1 0 Y
1 1 Y
1 2 Y
2 0 N
2 1 Y
2 2 N
3 0 N
3 1 N
3 2 Y
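You can check the all-combinations-then-left-join idea without a Hive cluster by running the same query shape through Python's built-in sqlite3 module. This is a sketch under that assumption. SQLite has no stack(), so a VALUES list stands in for it, and the in-memory table only mirrors mytable from the demo.

```python
import sqlite3

# In-memory SQLite stand-in for the Hive table "mytable".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (user_id INTEGER, sequence INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [(1, 0), (1, 1), (1, 2), (2, 1), (3, 2)],
)

QUERY = """
WITH seqs(sequence) AS (VALUES (0), (1), (2)),  -- replaces Hive's stack(3, 0, 1, 2)
us AS (                                         -- every user+sequence combination
    SELECT u.user_id, s.sequence
    FROM (SELECT DISTINCT user_id FROM mytable) u
    CROSS JOIN seqs s
)
SELECT us.user_id,
       us.sequence,
       CASE WHEN t.user_id IS NULL THEN 'N' ELSE 'Y' END AS flag
FROM us
LEFT JOIN mytable t
       ON us.sequence = t.sequence AND us.user_id = t.user_id
ORDER BY us.user_id, us.sequence
"""

flags = conn.execute(QUERY).fetchall()
for user_id, sequence, flag in flags:
    print(user_id, sequence, flag)
```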

Related

Query to flag a column based on multiple conditions

I want to flag a column based on interdependent conditions. My input data is as follows:
id status rnk
A Open 1
A Delay 2
A In 3
B In 1
B Out 2
B Delay 3
B count 4
C In 1
C Close 2
C out 3
D Close 1
D Open 2
D Delay 3
D In 4
My output should look like this:
id status rnk flag
A Open 1 N
A Delay 2 Y
A In 3 N
B In 1 N
B Out 2 N
B Delay 3 N
B count 4 N
C In 1 N
C Close 2 N
C out 3 N
D Close 1 N
D Open 2 N
D Delay 3 Y
D In 4 N
Logic: if the status is anything other than Delay, the flag is N.
If the status is Delay, and the records with a lower rnk than the Delay record within the same id have status 'Open' or 'Close', the flag is Y; otherwise it is N.
Example: for id 'A' the status 'Delay' has rank 2, so we need to check whether the status of A with rank < 2 is either 'Open' or 'Close'. If so, 'Delay' is flagged 'Y'.
Please note: the rnk column is already populated in the table based on different logic.
Below is the query I have tried, but I am getting flag 'N' for all the records:
SELECT
*,
CASE WHEN status != 'Delay' THEN 'N'
WHEN rnk < (COALESCE(MAX(CASE WHEN status = 'Delay' THEN rnk ELSE -1 END) OVER(PARTITION BY id)))
AND status IN ('Open','Close') THEN 'Y'
ELSE 'N'
END AS flag
FROM TABLE
A correlated subquery is more helpful here:
SELECT
t1.*,
CASE WHEN t1.status != 'Delay' THEN 'N'
WHEN EXISTS (SELECT 1 FROM Table1 ta1 WHERE ta1.id = t1.id
AND ta1.status IN ('Open','Close') AND ta1.rnk < t1.rnk) THEN 'Y'
ELSE 'N'
END AS flag
FROM Table1 t1
id status rnk flag
A Open 1 N
A Delay 2 Y
A In 3 N
B In 1 N
B Out 2 N
B Delay 3 N
B count 4 N
C In 1 N
C Close 2 N
C out 3 N
D Close 1 N
D Open 2 N
D Delay 3 Y
D In 4 N
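To sanity-check the correlated EXISTS version, here is a small sketch that runs it against the question's sample rows with Python's built-in sqlite3 module. It uses SQLite rather than the asker's engine, so treat it as an illustration of the logic, not of any particular dialect. The table name table1 follows the answer.

```python
import sqlite3

# Sample rows from the question: (id, status, rnk).
DATA = [
    ("A", "Open", 1), ("A", "Delay", 2), ("A", "In", 3),
    ("B", "In", 1), ("B", "Out", 2), ("B", "Delay", 3), ("B", "count", 4),
    ("C", "In", 1), ("C", "Close", 2), ("C", "out", 3),
    ("D", "Close", 1), ("D", "Open", 2), ("D", "Delay", 3), ("D", "In", 4),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id TEXT, status TEXT, rnk INTEGER)")
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", DATA)

QUERY = """
SELECT t1.id, t1.status, t1.rnk,
       CASE
         WHEN t1.status <> 'Delay' THEN 'N'
         -- correlated subquery: is there an earlier Open/Close row for this id?
         WHEN EXISTS (SELECT 1
                      FROM table1 ta1
                      WHERE ta1.id = t1.id
                        AND ta1.status IN ('Open', 'Close')
                        AND ta1.rnk < t1.rnk) THEN 'Y'
         ELSE 'N'
       END AS flag
FROM table1 t1
ORDER BY t1.id, t1.rnk
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(*row)
```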
Consider the simple approach below (BigQuery Standard SQL, where COUNTIF is available):
select *, if(
status = 'Delay' and
countif(status in ('Open', 'Close')) over(partition by id order by rnk) > 0,
'Y', 'N') as flag
from your_table
If applied to the sample data in your question, the output matches the expected output shown in the question.
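SQLite has no COUNTIF, so a sketch of the running-count idea in Python's sqlite3 has to spell it out as SUM(CASE ...) over the same window. This assumes a bundled SQLite of 3.25 or newer (for window functions). your_table is the placeholder name from the answer. The default frame covers the current row and all rows with a lower rnk (assuming rnk is unique within an id). A Delay row never counts itself, because its status is Delay.

```python
import sqlite3

# Sample rows from the question: (id, status, rnk).
DATA = [
    ("A", "Open", 1), ("A", "Delay", 2), ("A", "In", 3),
    ("B", "In", 1), ("B", "Out", 2), ("B", "Delay", 3), ("B", "count", 4),
    ("C", "In", 1), ("C", "Close", 2), ("C", "out", 3),
    ("D", "Close", 1), ("D", "Open", 2), ("D", "Delay", 3), ("D", "In", 4),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (id TEXT, status TEXT, rnk INTEGER)")
conn.executemany("INSERT INTO your_table VALUES (?, ?, ?)", DATA)

QUERY = """
SELECT id, status, rnk,
       CASE WHEN status = 'Delay' AND open_close_so_far > 0
            THEN 'Y' ELSE 'N' END AS flag
FROM (
    SELECT id, status, rnk,
           -- running count of Open/Close rows up to the current rnk
           -- (stand-in for BigQuery's COUNTIF(...) OVER (...))
           SUM(CASE WHEN status IN ('Open', 'Close') THEN 1 ELSE 0 END)
               OVER (PARTITION BY id ORDER BY rnk) AS open_close_so_far
    FROM your_table
) s
ORDER BY id, rnk
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(*row)
```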
Use a grouped common table expression holding the minimum rank and row count of all rows with status Open or Close, and left-join the base table with it. A Delay row then matches only if it comes after at least one Open or Close row of the same id:
-- your input, don't use in real query...
WITH
indata(id,status,rnk) AS (
SELECT 'A','Open',1
UNION ALL SELECT 'A','Delay',2
UNION ALL SELECT 'A','In',3
UNION ALL SELECT 'B','In',1
UNION ALL SELECT 'B','Out',2
UNION ALL SELECT 'B','Delay',3
UNION ALL SELECT 'B','count',4
UNION ALL SELECT 'C','In',1
UNION ALL SELECT 'C','Close',2
UNION ALL SELECT 'C','out',3
UNION ALL SELECT 'D','Close',1
UNION ALL SELECT 'D','Open',2
UNION ALL SELECT 'D','Delay',3
UNION ALL SELECT 'D','In',4
)
-- input ends here, real query starts below
-- replace following comma with "WITH" ...
,
prev_stats AS (
SELECT
id
, MIN(rnk) AS rnk -- earliest Open/Close row, so any earlier one qualifies
, COUNT(*) AS num
FROM indata
WHERE status IN ('Open','Close')
GROUP BY id
)
SELECT
indata.*
, CASE
WHEN status <> 'Delay' THEN 'N'
ELSE
CASE
WHEN prev_stats.num > 0 THEN 'Y'
ELSE 'N'
END
END AS flag
FROM indata
LEFT JOIN prev_stats ON indata.id = prev_stats.id
AND indata.rnk > prev_stats.rnk
;
Result:
id status rnk flag
A Open 1 N
A Delay 2 Y
A In 3 N
B In 1 N
B Out 2 N
B Delay 3 N
B count 4 N
C In 1 N
C Close 2 N
C out 3 N
D Close 1 N
D Open 2 N
D Delay 3 Y
D In 4 N
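The grouped-CTE idea can be sketched in plain Python as a stand-in for the SQL. The first loop plays the role of prev_stats: it keeps the minimum Open/Close rank per id, which is what makes "any earlier Open/Close row" count. The dictionary lookup plays the role of the left join. The function name flag_delays is hypothetical. The extra id E is also hypothetical; it shows a Close after the Delay, which a maximum-rank version would get wrong.

```python
# Sample rows from the question: (id, status, rnk).
DATA = [
    ("A", "Open", 1), ("A", "Delay", 2), ("A", "In", 3),
    ("B", "In", 1), ("B", "Out", 2), ("B", "Delay", 3), ("B", "count", 4),
    ("C", "In", 1), ("C", "Close", 2), ("C", "out", 3),
    ("D", "Close", 1), ("D", "Open", 2), ("D", "Delay", 3), ("D", "In", 4),
]


def flag_delays(rows):
    """Return (id, status, rnk, flag) tuples using a group-then-join approach."""
    # "prev_stats": earliest rank of an Open/Close row per id (GROUP BY id).
    first_open_close = {}
    for id_, status, rnk in rows:
        if status in ("Open", "Close"):
            first_open_close[id_] = min(rnk, first_open_close.get(id_, rnk))

    result = []
    for id_, status, rnk in rows:
        # "LEFT JOIN ... ON rnk > prev_stats.rnk": no match behaves like NULL.
        first = first_open_close.get(id_)
        has_earlier = first is not None and rnk > first
        flag = "Y" if status == "Delay" and has_earlier else "N"
        result.append((id_, status, rnk, flag))
    return result


flags = flag_delays(DATA)
for row in flags:
    print(*row)

# Hypothetical id E: the Delay has an earlier Open, so it is flagged Y
# even though a Close follows it.
extra = flag_delays([("E", "Open", 1), ("E", "Delay", 2), ("E", "Close", 3)])
```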

Find consecutive values in Impala

I have the data set below with ID, Date and Value. I want to flag the IDs that have value 0 on three consecutive days.
id date value
1 8/10/2021 1
1 8/11/2021 0
1 8/12/2021 0
1 8/13/2021 0
1 8/14/2021 5
2 8/10/2021 2
2 8/11/2021 3
2 8/12/2021 0
2 8/13/2021 0
2 8/14/2021 6
3 8/10/2021 3
3 8/11/2021 4
3 8/12/2021 0
3 8/13/2021 0
3 8/14/2021 0
Output:
id date value Flag
1 8/10/2021 1 Y
1 8/11/2021 0 Y
1 8/12/2021 0 Y
1 8/13/2021 0 Y
1 8/14/2021 5 Y
2 8/10/2021 2 N
2 8/11/2021 3 N
2 8/12/2021 0 N
2 8/13/2021 0 N
2 8/14/2021 6 N
3 8/10/2021 3 Y
3 8/11/2021 4 Y
3 8/12/2021 0 Y
3 8/13/2021 0 Y
3 8/14/2021 0 Y
Thank you.
Using the window count() function, you can count the 0s in the frame [current row, 2 following] (ordered by date), i.e. a frame of three consecutive rows calculated for each row. This treats consecutive rows as consecutive days, so it assumes one row per id per day:
count(case when value=0 then 1 else null end) over(partition by id order by date_ rows between current row and 2 following ) cnt
If the count equals 3, then 3 consecutive 0s were found, and the case expression produces Y for each row with cnt=3: case when cnt=3 then 'Y' else 'N' end.
To propagate the 'Y' flag to the whole id group, use max(...) over (partition by id); this works because 'Y' sorts after 'N'.
Demo with your example data (tested on Hive):
with mydata as (--Data example, dates converted to sortable format yyyy-MM-dd
select 1 id,'2021-08-10' date_, 1 value union all
select 1,'2021-08-11',0 union all
select 1,'2021-08-12',0 union all
select 1,'2021-08-13',0 union all
select 1,'2021-08-14',5 union all
select 2,'2021-08-10',2 union all
select 2,'2021-08-11',3 union all
select 2,'2021-08-12',0 union all
select 2,'2021-08-13',0 union all
select 2,'2021-08-14',6 union all
select 3,'2021-08-10',3 union all
select 3,'2021-08-11',4 union all
select 3,'2021-08-12',0 union all
select 3,'2021-08-13',0 union all
select 3,'2021-08-14',0
) --End of data example, use your table instead of this CTE
select id, date_, value,
max(case when cnt=3 then 'Y' else 'N' end) over (partition by id) flag
from
(
select id, date_, value,
count(case when value=0 then 1 else null end) over(partition by id order by date_ rows between current row and 2 following ) cnt
from mydata
)s
order by id, date_ --remove ordering if not necessary
--added it to get result in the same order
Result:
id date_ value flag
1 2021-08-10 1 Y
1 2021-08-11 0 Y
1 2021-08-12 0 Y
1 2021-08-13 0 Y
1 2021-08-14 5 Y
2 2021-08-10 2 N
2 2021-08-11 3 N
2 2021-08-12 0 N
2 2021-08-13 0 N
2 2021-08-14 6 N
3 2021-08-10 3 Y
3 2021-08-11 4 Y
3 2021-08-12 0 Y
3 2021-08-13 0 Y
3 2021-08-14 0 Y
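The frame-count approach can be tried outside Hive with Python's sqlite3 module. The sketch below assumes SQLite 3.25 or newer for window functions and reuses the demo's yyyy-MM-dd dates. Like the Hive version, it treats three consecutive rows as three consecutive days, so it assumes one row per id per day with no gaps.

```python
import sqlite3

# (id, date_, value); dates in sortable yyyy-MM-dd form, as in the Hive demo.
DATA = [
    (1, "2021-08-10", 1), (1, "2021-08-11", 0), (1, "2021-08-12", 0),
    (1, "2021-08-13", 0), (1, "2021-08-14", 5),
    (2, "2021-08-10", 2), (2, "2021-08-11", 3), (2, "2021-08-12", 0),
    (2, "2021-08-13", 0), (2, "2021-08-14", 6),
    (3, "2021-08-10", 3), (3, "2021-08-11", 4), (3, "2021-08-12", 0),
    (3, "2021-08-13", 0), (3, "2021-08-14", 0),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mydata (id INTEGER, date_ TEXT, value INTEGER)")
conn.executemany("INSERT INTO mydata VALUES (?, ?, ?)", DATA)

QUERY = """
SELECT id, date_, value,
       -- 'Y' > 'N', so MAX spreads a single Y to every row of the id
       MAX(CASE WHEN cnt = 3 THEN 'Y' ELSE 'N' END) OVER (PARTITION BY id) AS flag
FROM (
    SELECT id, date_, value,
           -- number of zeros in the 3-row frame starting at the current row
           COUNT(CASE WHEN value = 0 THEN 1 END)
               OVER (PARTITION BY id ORDER BY date_
                     ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING) AS cnt
    FROM mydata
) s
ORDER BY id, date_
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(*row)
```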
You can identify the ids by comparing two lag() values, then spread the flag across all rows of the id. The following sets the flag on the third consecutive 0:
select t.*,
(case when value = 0 and prev_value_date_2 = prev_date_2
then 'Y' else 'N'
end) as flag_on_row
from (select t.*,
lag(date, 2) over (partition by value, id order by date) as prev_value_date_2,
lag(date, 2) over (partition by id order by date) as prev_date_2
from t
) t;
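Because the logic uses lag(), it is easy to extend to longer streaks of 0s by changing the offset. The offset 2 looks two rows back. Suppose the date two rows back among the id's rows with the same value equals the date two rows back among all of the id's rows. Then the last three rows all have that value, and the case expression additionally requires it to be 0.
And to spread the flag to all rows of the id: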
select t.*, max(flag_on_row) over (partition by id) as flag
from (select t.*,
(case when value = 0 and prev_value_date_2 = prev_date_2
then 'Y' else 'N'
end) as flag_on_row
from (select t.*,
lag(date, 2) over (partition by value, id order by date) as prev_value_date_2,
lag(date, 2) over (partition by id order by date) as prev_date_2
from t
) t
) t;
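Here is a sketch of the lag() comparison plus the max() spread, run through Python's sqlite3 module (SQLite 3.25 or newer assumed, for LAG with an offset). The column name date_ and the yyyy-MM-dd values come from the other answer's demo rather than the question's date column. flag_on_row is kept in the output so you can see which row triggers the flag.

```python
import sqlite3

# (id, date_, value); dates in sortable yyyy-MM-dd form.
DATA = [
    (1, "2021-08-10", 1), (1, "2021-08-11", 0), (1, "2021-08-12", 0),
    (1, "2021-08-13", 0), (1, "2021-08-14", 5),
    (2, "2021-08-10", 2), (2, "2021-08-11", 3), (2, "2021-08-12", 0),
    (2, "2021-08-13", 0), (2, "2021-08-14", 6),
    (3, "2021-08-10", 3), (3, "2021-08-11", 4), (3, "2021-08-12", 0),
    (3, "2021-08-13", 0), (3, "2021-08-14", 0),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mydata (id INTEGER, date_ TEXT, value INTEGER)")
conn.executemany("INSERT INTO mydata VALUES (?, ?, ?)", DATA)

QUERY = """
SELECT id, date_, value, flag_on_row,
       MAX(flag_on_row) OVER (PARTITION BY id) AS flag  -- spread to the id
FROM (
    SELECT id, date_, value,
           CASE WHEN value = 0 AND prev_value_date_2 = prev_date_2
                THEN 'Y' ELSE 'N' END AS flag_on_row
    FROM (
        SELECT id, date_, value,
               -- date two rows back among this id's rows with the same value
               LAG(date_, 2) OVER (PARTITION BY value, id ORDER BY date_) AS prev_value_date_2,
               -- date two rows back among all of this id's rows
               LAG(date_, 2) OVER (PARTITION BY id ORDER BY date_) AS prev_date_2
        FROM mydata
    ) a
) b
ORDER BY id, date_
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(*row)
```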

SQL grouping grades

I have a table for subjects as follows:
id Subject Grade Ext
100 Math 6 +
100 Science 4 -
100 Hist 3
100 Geo 2 +
100 CompSi 1
I am expecting output per student in a class (id = 100) as follows:
Grade Ext StudentGrade
6 + 1
6 0
6 - 0
5 + 0
5 0
5 - 0
4 + 0
4 0
4 - 1
3 + 0
3 1
3 - 0
2 + 1
2 0
2 - 0
1 + 0
1 1
1 - 0
I would like this done in Oracle SQL rather than in the UI. Any input is appreciated.
You should generate the rows first, then join them with your table as below. I use the WITH clause here to generate the 18 rows in your sample.
with rws (grade, ext) as (
select ceil(level/3), decode(mod(level, 3), 0, '+', 1, '-', null)
from dual
connect by level <= 3 * 6
)
select r.grade, r.ext, nvl2(t.Grade, 1, 0) studentGrade -- t.Ext is null even for matched rows without an extension, so test t.Grade
from rws r
left join your_table t
on t.Grade = r.Grade and decode(t.Ext, r.Ext, 1, 0) = 1
order by 1 desc, decode(r.ext, null, 2, '-', 3, '+', 1)
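The generate-rows-then-join idea does not depend on Oracle. In this plain-Python sketch, itertools.product generates the 18 (grade, ext) rows in display order, and a set lookup plays the role of the left join. The function name grade_grid is hypothetical, and None stands in for a missing extension.

```python
from itertools import product

# Sample rows from the question: (id, subject, grade, ext); None = no extension.
SUBJECTS = [
    (100, "Math", 6, "+"),
    (100, "Science", 4, "-"),
    (100, "Hist", 3, None),
    (100, "Geo", 2, "+"),
    (100, "CompSi", 1, None),
]

EXT_ORDER = ["+", None, "-"]  # display order used by the answer's ORDER BY


def grade_grid(rows, student_id, max_grade=6):
    """Generate every (grade, ext) slot, then mark the ones the student has."""
    # The student's actual (grade, ext) pairs. Tuples compare None == None,
    # which mirrors the null-safe matching that decode() gives in Oracle.
    taken = {(grade, ext) for sid, _subject, grade, ext in rows if sid == student_id}
    # Generated rows: grades descending, extensions in display order.
    slots = product(range(max_grade, 0, -1), EXT_ORDER)
    # "Left join": each generated slot gets 1 if the student has it, else 0.
    return [(grade, ext, 1 if (grade, ext) in taken else 0) for grade, ext in slots]


grid = grade_grid(SUBJECTS, 100)
for grade, ext, student_grade in grid:
    print(grade, ext or "", student_grade)
```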
You could do something like this. In the WITH clause I generate two small "helper" tables (really, inline views): one for grades from 1 to 6 and one for "extensions" +, null and -. In the "extensions" view I also create an "ordering" column, which is used to order the final output.
Also in the WITH clause I included sample data - you will have to remove that and instead use your actual table name in the main query.
The idea is to cross-join "grades" and "extensions", and left-outer-join the result to your input data. Count the grades from the input data, grouped by grade and extension, after filtering for the desired id. The decode in the join condition is needed because, for the extension, we want to treat null as equal to null, which decode does nicely.
with
sample_inputs (id, subject, grade, ext) as (
select 100, 'Math' , 6, '+' from dual union all
select 100, 'Science', 4, '-' from dual union all
select 100, 'Hist' , 3, null from dual union all
select 100, 'Geo' , 2, '+' from dual union all
select 100, 'CompSi' , 1, null from dual
)
, g (grade) as (select level from dual connect by level <= 6)
, e (ord, ext) as (
select 1, '+' from dual union all
select 2, null from dual union all
select 3, '-' from dual
)
select g.grade, e.ext, count(t.grade) as studentgrade
from g cross join e left outer join sample_inputs t
on t.grade = g.grade and decode(t.ext, e.ext, 0) = 0
and t.id = 100 -- change this as needed!
group by g.grade, e.ext, e.ord
order by g.grade desc, e.ord
;
OUTPUT:
GRADE EXT STUDENTGRADE
----- --- ------------
6 + 1
6 0
6 - 0
5 + 0
5 0
5 - 0
4 + 0
4 0
4 - 1
3 + 0
3 1
3 - 0
2 + 1
2 0
2 - 0
1 + 0
1 1
1 - 0
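The helper-view version can be sketched in SQLite via Python's sqlite3 module. SQLite has neither CONNECT BY nor decode, so this sketch makes two swaps. A recursive CTE generates grades 1 to 6, and SQLite's IS operator provides the null-safe comparison that decode gives in Oracle. The id is passed as a bound parameter instead of the hard-coded 100.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample_inputs (id INTEGER, subject TEXT, grade INTEGER, ext TEXT)")
conn.executemany("INSERT INTO sample_inputs VALUES (?, ?, ?, ?)", [
    (100, "Math", 6, "+"), (100, "Science", 4, "-"), (100, "Hist", 3, None),
    (100, "Geo", 2, "+"), (100, "CompSi", 1, None),
])

QUERY = """
WITH RECURSIVE
  g(grade) AS (                      -- grades 1..6 (CONNECT BY LEVEL in Oracle)
    SELECT 1 UNION ALL SELECT grade + 1 FROM g WHERE grade < 6
  ),
  e(ord, ext) AS (                   -- extensions plus a display-order column
    VALUES (1, '+'), (2, NULL), (3, '-')
  )
SELECT g.grade, e.ext, COUNT(t.grade) AS studentgrade
FROM g
CROSS JOIN e
LEFT JOIN sample_inputs t
  ON t.grade = g.grade
 AND t.ext IS e.ext                  -- null-safe equality, like decode() in Oracle
 AND t.id = ?                        -- the student/class to report on
GROUP BY g.grade, e.ext, e.ord
ORDER BY g.grade DESC, e.ord
"""

report = conn.execute(QUERY, (100,)).fetchall()
for grade, ext, studentgrade in report:
    print(grade, ext or "", studentgrade)
```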
It looks like you want sparse data to be filled in as part of joining students and subjects.
Since Oracle 10g the correct way to do this has been with a "partition outer join".
The documentation has examples.
https://docs.oracle.com/en/database/oracle/oracle-database/21/sqlrf/SELECT.html#GUID-CFA006CA-6FF1-4972-821E-6996142A51C6

Filter out entire group based on item ranking in SQL

I have a table as shown below:
group item rank
1 A 1
1 B 2
1 C 3
2 A 2
2 B 1
3 A 1
3 C 2
I want only the data of the groups where item A has rank 1, as shown below:
group item rank
1 A 1
1 B 2
1 C 3
3 A 1
3 C 2
In group 2, A has rank 2, therefore not a part of output.
One way is using an IN clause (id here stands for the grouping column, which is called group in the question):
select *
from yourTable
where id in (select id from yourTable where item='A' and rank = 1)
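Here is a quick check of the IN approach with Python's sqlite3 module. The grouping column keeps its name from the question, group, which must be double-quoted because GROUP is a reserved word. your_table is a placeholder name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "group" is a reserved word, so it is double-quoted as an identifier.
conn.execute('CREATE TABLE your_table ("group" INTEGER, item TEXT, rank INTEGER)')
conn.executemany("INSERT INTO your_table VALUES (?, ?, ?)", [
    (1, "A", 1), (1, "B", 2), (1, "C", 3),
    (2, "A", 2), (2, "B", 1),
    (3, "A", 1), (3, "C", 2),
])

QUERY = """
SELECT *
FROM your_table
WHERE "group" IN (SELECT "group"            -- groups in which A has rank 1
                  FROM your_table
                  WHERE item = 'A' AND rank = 1)
ORDER BY "group", rank
"""

kept = conn.execute(QUERY).fetchall()
for row in kept:
    print(*row)
```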
You could use a subquery to get the involved ids and then join:
select * from my_table m
inner join (
select distinct id
from my_table
where item = 'A'
and rank = 1
) t on t.id = m.id
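The join variant should return the same rows as the IN variant. The DISTINCT in the derived table matters if a group could ever contain more than one (A, 1) row, since without it each such row would multiply the group's rows. The sketch below uses Python's sqlite3 with the question's data. It then adds one hypothetical duplicate row to exercise that case; my_table is the answer's placeholder name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "group" is a reserved word, so it is double-quoted as an identifier.
conn.execute('CREATE TABLE my_table ("group" INTEGER, item TEXT, rank INTEGER)')
conn.executemany("INSERT INTO my_table VALUES (?, ?, ?)", [
    (1, "A", 1), (1, "B", 2), (1, "C", 3),
    (2, "A", 2), (2, "B", 1),
    (3, "A", 1), (3, "C", 2),
])

IN_QUERY = """
SELECT * FROM my_table
WHERE "group" IN (SELECT "group" FROM my_table WHERE item = 'A' AND rank = 1)
ORDER BY "group", rank
"""

JOIN_QUERY = """
SELECT m.*
FROM my_table m
INNER JOIN (SELECT DISTINCT "group" AS grp   -- at most one row per qualifying group
            FROM my_table
            WHERE item = 'A' AND rank = 1) t
        ON t.grp = m."group"
ORDER BY m."group", m.rank
"""

via_in = conn.execute(IN_QUERY).fetchall()
via_join = conn.execute(JOIN_QUERY).fetchall()

# Hypothetical duplicate (1, 'A', 1) row: thanks to DISTINCT, the join still
# returns each stored group-1 row exactly once, matching the IN version.
conn.execute("INSERT INTO my_table VALUES (1, 'A', 1)")
dup_in = conn.execute(IN_QUERY).fetchall()
dup_join = conn.execute(JOIN_QUERY).fetchall()
```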

Adjusting table based on previous values in BigQuery

I have a table that looks like below:
ID|Date |X| Flag |
1 |1/1/16|2| 0
2 |1/1/16|0| 0
3 |1/1/16|0| 0
1 |2/1/16|0| 0
2 |2/1/16|1| 0
3 |2/1/16|2| 0
1 |3/1/16|2| 0
2 |3/1/16|1| 0
3 |3/1/16|2| 0
I'm trying to make it so that flag is populated if X=2 in the PREVIOUS month. As such, it should look like this:
ID|Date |X| Flag |
1 |1/1/16|2| 0
2 |1/1/16|0| 0
3 |1/1/16|0| 0
1 |2/1/16|2| 1
2 |2/1/16|1| 0
3 |2/1/16|2| 0
1 |3/1/16|2| 1
2 |3/1/16|1| 0
3 |3/1/16|2| 1
I use this in SQL:
`select ID, date, X, flag into Work_Table from t
(
Select ID, date, X, flag,
Lag(X) Over (Partition By ID Order By date Asc) As Prev into Flag_table
From Work_Table
)
Update [dbo].[Flag_table]
Set flag = 1
where prev = '2'
UPDATE t
Set t.flag = [dbo].[Flag_table].flag FROM T
JOIN [dbo].[Flag_table]
ON t.ID= [dbo].[Flag_table].ID where T.date = [dbo].[Flag_table].date`
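However, I cannot do this in BigQuery. Any ideas?
Below is for BigQuery Standard SQL. It orders by dt, so dt must sort chronologically. Strings like '1/1/16' only happen to do so for these sample rows, so in practice order by a DATE value (for example, one produced with PARSE_DATE).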
#standardSQL
SELECT id, dt, x,
IF(LAG(x = 2) OVER(PARTITION BY id ORDER BY dt), 1, 0) flag
FROM `project.dataset.work_table`
You can test and play with it using the dummy data from your question:
#standardSQL
WITH `project.dataset.work_table` AS (
SELECT 1 id, '1/1/16' dt, 2 x, 0 flag UNION ALL
SELECT 2, '1/1/16', 0, 0 UNION ALL
SELECT 3, '1/1/16', 0, 0 UNION ALL
SELECT 1, '2/1/16', 0, 0 UNION ALL
SELECT 2, '2/1/16', 1, 0 UNION ALL
SELECT 3, '2/1/16', 2, 0 UNION ALL
SELECT 1, '3/1/16', 2, 0 UNION ALL
SELECT 2, '3/1/16', 1, 0 UNION ALL
SELECT 3, '3/1/16', 2, 0
)
SELECT id, dt, x,
IF(LAG(x = 2) OVER(PARTITION BY id ORDER BY dt), 1, 0) flag
FROM `project.dataset.work_table`
ORDER BY dt, id
with the result:
Row id dt x flag
1 1 1/1/16 2 0
2 2 1/1/16 0 0
3 3 1/1/16 0 0
4 1 2/1/16 0 1
5 2 2/1/16 1 0
6 3 2/1/16 2 0
7 1 3/1/16 2 0
8 2 3/1/16 1 0
9 3 3/1/16 2 1
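To try the LAG(x = 2) trick locally, here is a sketch with Python's sqlite3 module (SQLite 3.25 or newer assumed). SQLite evaluates x = 2 to 1 or 0. COALESCE turns the NULL on each id's first row into 0, which plays the role of BigQuery's IF(..., 1, 0). The sketch assumes the dates are month/day/year and stores them as ISO strings so that ordering by dt is chronological.

```python
import sqlite3

# Assumption: '1/1/16' etc. are month/day/year, stored here as ISO dates so
# that ORDER BY dt is chronological.
DATA = [
    (1, "2016-01-01", 2), (2, "2016-01-01", 0), (3, "2016-01-01", 0),
    (1, "2016-02-01", 0), (2, "2016-02-01", 1), (3, "2016-02-01", 2),
    (1, "2016-03-01", 2), (2, "2016-03-01", 1), (3, "2016-03-01", 2),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE work_table (id INTEGER, dt TEXT, x INTEGER)")
conn.executemany("INSERT INTO work_table VALUES (?, ?, ?)", DATA)

QUERY = """
SELECT id, dt, x,
       -- x = 2 evaluates to 1/0 in SQLite; LAG gives NULL on an id's first row
       COALESCE(LAG(x = 2) OVER (PARTITION BY id ORDER BY dt), 0) AS flag
FROM work_table
ORDER BY dt, id
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(*row)
```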