t-sql repeat row numbers within group - sql

I need to create an ID for every time a name changes in the task history.
The rank needs to do restart with each task and step.
The closest I got to my goal is using the code below.
But it does not produce correct result for when a person appears again in the historical list of actions.
DENSE_RANK() OVER (ORDER BY TaskName, Person)
Thanks in advance

You can use lag() to see where a person changes. Then use a cumulative sum:
select t.*,
sum(case when prev_person = person then 0 else 1 end) over
(partition by task_name order by timestamp) as desired_output
from (select t.*,
lag(person) over (partition by task_name order by timestamp) as prev_person
from t
) t ;
Note: I am interpreting your question as your wanting the numbers separately for each task ("every time a name changes in the task history").
EDIT:
Based on your comment:
select t.*,
sum(case when prev_person = person and prev_stop_name = step_name then 0 else 1 end) over
(partition by task_name order by timestamp) as desired_output
from (select t.*,
lag(person) over (partition by task_name order by timestamp) as prev_person,
lag(step_name) over (partition by task_name order by timestamp) as prev_step_name
from t
) t ;

Related

Create partitions based on column values in sql

I am very new to sql and query writing and after alot of trying, I am asking for help.
As shown in the picture, I want to create partition of data based on is_late = 1 and show its count (that is 2) but at the same time want to capture the value of last_status where is_late = 0 to be displayed in the single row.
The task is to calculate how many time the rider was late and time taken by him from first occurrence of estimated time to the last_status.
Desired output:
You can use following query
SELECT
rider_id,
task_created_time,
expected_time_to_arrive,
is_late,
last_status,
task_count,
CONVERT(VARCHAR(5), DATEADD(MINUTE, DATEDIFF(MINUTE, expected_time_to_arrive, last_status), 0), 114) AS time_delayed
FROM
(SELECT
rider_id,
task_created_time,
expected_time_to_arrive,
is_late,
SUM(CASE WHEN is_late = 1 THEN 1 ELSE 0 END) OVER(PARTITION BY rider_id ORDER BY rider_id) AS task_count,
ROW_NUMBER() OVER(PARTITION BY rider_id ORDER BY rider_id) AS num,
MAX(last_status) OVER(PARTITION BY rider_id ORDER BY rider_id) AS last_status
FROM myTestTable) t
WHERE num = 1
db<>fiddle

How Can I Partition Rows in Cases When the Data is Greater than Other Values of the Same Category

I am using ROW NUMBER() OVER (PARTITION BY) to obtain a numerical index of the first occurring incident a customer purchased a product.
Using the SQL query of:
SELECT
ROW_NUMBER () OVER (PARTITION BY
[Customer Name]
ORDER BY
[Created Date] ) AS Partition,
[Customer Name],
[Created Date]
FROM Database
My data populates as such:
Current Table
My Question
I would like my data to partition additionally by the date. But only if the next date is greater than 60 days from the prior day. The numerical list would reset every 60 days. This Table would populate like this:
Ideal Data
Use lag() and a cumulative sum to define the groups:
select t.*,
sum(case when prev_createddate > dateadd(day, -60, createddate) then 0 else 1 end) over (partition by customername order by createddate) as grp
from (select t.*,
lag(createddate) over (partition by customername order by createddate) as prev_createddate
from t
) t;
Then use row_number() within each group:
select t.*,
row_number() over (partition by customername, grp order by createddate) as mypartition
from (select t.*,
sum(case when prev_createddate > dateadd(day, -60, createddate) then 0 else 1 end) over (partition by customername order by createddate) as grp
from (select t.*,
lag(createddate) over (partition by customername order by createddate) as prev_createddate
from t
) t
) t;
Note that partition is a very poor name for a column because it is a SQL key word.

Select a line equal to 'X' without TOP 'N' plus the previous line 'Y' in SQL Server?

I need to return in a query only the last lines with 'ProductStatus' equal 'Stop' and the previous line.
I have the table:
And need to get this result:
How do I do this in SQL Server?
One method uses window functions to calculate the last stop and then get the row before that:
select t.*
from (select t.*,
lead(seqnum_ps) over (partition by producttype order by datevalue) as next_seqnum_ps,
lead(status) over (partition by producttype order by datevalue) as next_status
from (select t.*,
row_number() over (partition by producttype, product_status order by datevalue desc) as seqnum_ps
from t
) t
) t
where (seqnum_ps = 1 and product_status = 'Stop') or
(next_seqnum_ps = 1 and next_product_status = 'Stop');
An alternative method gets the maximum stop time and uses that:
select t.*
from (select t.*,
max(case when product_status = 'Stop' then datevalue end) over (partition by producttype) as max_stop_dv,
lead(datevalue) over (partition by producttype order by datevalue) as next_dv
from t
) t
where datevalue = max_stop_dv or
next_dv = max_stop_dv;

How to generate session_id by sql?

My tracking system do not generate sessions IDS.
I have user_id & event_date_time.
I need a new session_id for each user's session that starts 30 minutes or more after last event_date_time of each user.
My final goal is to calculate median session time.
I tried to generate session_id=1 and session_id=2 once event_date_time-next_event_time>30 and guid=guid, but i'm stuck from here
select a.*,
case when (a.next_event_date-a.event_date)*24*60<30 and userID=next_userID
then 1
when (a.next_event_date-a.event_date)*24*60>=30 and userID=next_userID then
2
end session_id
from
(select f.userID,
lead(f.userID) over (partition by f.guid order by f.event_date)
next_guid,
f.event_date,
lead(f.event_date) over (partition by f.guid order by f.event_date)
next_event_date
from event_table f
)a
where next_event_date is not null
If I understood correctly you could generate ID's this way:
select id, guid, event_date,
sum(chg) over (partition by guid order by event_date) session_id
from (
select id, guid, event_date,
case when lag(guid) over (partition by guid order by event_date) = guid
and 24 * 60 * (event_date -lag(event_date)
over (partition by guid order by event_date) ) < 30
then 0 else 1
end chg
from event_table ) a
dbfiddle demo
Compare neighbouring rows, if there are different guids or time difference is greater than 30 minutes then assign 1. Then sum these values analytically.
I think you're on the right track using lead or lag. My recommendation would be to break this into steps and create a temp table to work against:
With the first query, assign every record its own unique ID, either a sequence number or GUID. You could also capture some of the lagged data in this step.
With a second query, find the overlaps (< 30 minutes) and make the overlapping records all the same -- either the same as the earliest or latest in that grouping, doesn't matter as long as it's consistent.
Something like this:
create table events_temp as (
select f.*,
row_number() over (partition by f.userID order by f.event_date) as user_row,
lag(f.userID) over (partition by f.userID order by f.event_date) as prev_userID,
lag(f.event_date) over (partition by f.userID order by f.event_date) as prev_event_date
from event_table f
order by f.userId, f.event_date
)
select a.*,
case when prev_userID = userID
and 24 * 60 * (event_date - prev_event_date) < 30
then lag(user_row) over (partition by userID order by user_row)
else user_row
end as session_id
from events_temp

How to return all the rows in the yellow census blocks?

Hey the schema is like this: for the whole dataset, we should order by machine_id first, then order by ss2k. after that, for each machine, we should find all the rows with at least consecutively 5 flag = 'census'. In this dataset, the result should be all the yellow rows..
I cannot return the last 4 rows of the yellow blocks by using this:
drop table if exists qz_panel_census_228_rank;
create table qz_panel_census_228_rank as
select t.*
from (select t.*,
count(*) filter (where flag = 'census') over (partition by machine_id, date order by ss2k rows between current row and 4 following) as census_cnt5,
count(*) filter (where flag = 'census') over (partition by machine_id, date) as count_census,
row_number() over (partition by machine_id, date order by ss2k) as seqnum,
count(*) over (partition by machine_id, date) as cnt
from qz_panel_census_228 t
) t
where census_cnt5 = 5
group by 1,2,3,4,5,6,7,8,9,10,11
DISTRIBUTED BY (machine_id);
You were close, but you need to search in both directions:
select t.*
from (select t.*,
case when count(*) filter (where flag = 'census')
over (partition by machine_id, date
order by ss2k
rows between 4 preceding and current row) = 5
or count(*) filter (where flag = 'census')
over (partition by machine_id, date
order by ss2k
rows between current row and 4 following) = 5
then 1
else 0
end as flag
from qz_panel_census_228 t
) t
where flag = 1
Edit:
This approach will not work unless you add an extra count for each possible 5 row window, e.g. 3 preceding and 1 following, 2 preceding and 2 following, etc. This results in ugly code and is not very flexible.
The common way to solve this gaps & islands problem is to assign consecutive rows to a common group first:
select *
from
(
select t2.*,
count(*) over (partition by machine_id, date, grp) as cnt
from
(
select t1.*
from (select t.*,
-- keep the same number for 'census' rows
sum(case when flag = 'census' then 0 else 1 end)
over (partition by machine_id, date
order by ss2k
rows unbounded preceding) as grp
from qz_panel_census_228 t
) t1
where flag = 'census' -- only census rows
) as t2
) t3
where cnt >= 5 -- only groups of at least 5 census rows
Wow, there has to be a better way of doing this, but the only way I could figure out was to create blocks of consecutive 'census' values. This looks awful but might be a catalyst to a better idea.
with q1 as (
select
machine_id, recorded, ss2k, flag, date,
case
when flag = 'census' and
lag (flag) over (order by machine_id, ss2k) != 'census'
then 1
else 0
end as block
from foo
),
q2 as (
select
machine_id, recorded, ss2k, flag, date,
sum (block) over (order by machine_id, ss2k) as group_id,
case when flag = 'census' then 1 else 0 end as census
from q1
),
q3 as (
select
machine_id, recorded, ss2k, flag, date, group_id,
sum (census) over (partition by group_id order by ss2k) as max_count
from q2
),
groups as (
select group_id
from q3
group by group_id
having max (max_count) >= 5
)
select
q2.machine_id, q2.recorded, q2.ss2k, q2.flag, q2.date
from
q2
join groups g on q2.group_id = g.group_id
where
q2.flag = 'census'
If you run each query within the with clauses in isolation, I think you will see how this evolves.