Create partitions based on column values in sql - sql

I am very new to SQL and query writing, and after a lot of trying I am asking for help.
As shown in the picture, I want to partition the data on is_late = 1 and show its count (which is 2), while also capturing the value of last_status where is_late = 0, all displayed in a single row.
The task is to calculate how many times the rider was late and the time he took from the first occurrence of the estimated time to the last_status.
Desired output:

You can use the following query:
SELECT
    rider_id,
    task_created_time,
    expected_time_to_arrive,
    is_late,
    last_status,
    task_count,
    CONVERT(VARCHAR(5), DATEADD(MINUTE, DATEDIFF(MINUTE, expected_time_to_arrive, last_status), 0), 114) AS time_delayed
FROM
    (SELECT
        rider_id,
        task_created_time,
        expected_time_to_arrive,
        is_late,
        -- number of late tasks per rider
        SUM(CASE WHEN is_late = 1 THEN 1 ELSE 0 END) OVER(PARTITION BY rider_id) AS task_count,
        -- order by creation time so that num = 1 is always the rider's first task
        ROW_NUMBER() OVER(PARTITION BY rider_id ORDER BY task_created_time) AS num,
        MAX(last_status) OVER(PARTITION BY rider_id) AS last_status
    FROM myTestTable) t
WHERE num = 1
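Since the result has one row per rider, the same figures can probably also be produced with a plain GROUP BY. The following is only a sketch, run through Python's sqlite3 module rather than SQL Server; the sample rows are made up, and the first_expected and delay_minutes columns and the julianday arithmetic are assumptions standing in for the CONVERT/DATEDIFF expression above.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the query above.
ROWS = [
    # rider_id, task_created_time, expected_time_to_arrive, is_late, last_status
    (1, "2021-01-01 10:00", "2021-01-01 10:30", 1, None),
    (1, "2021-01-01 10:40", "2021-01-01 11:00", 1, None),
    (1, "2021-01-01 11:10", "2021-01-01 11:30", 0, "2021-01-01 11:45"),
]


def late_summary(rows):
    """Return one row per rider: late count, first expected time, last status, delay in minutes."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE myTestTable (rider_id INT, task_created_time TEXT, "
        "expected_time_to_arrive TEXT, is_late INT, last_status TEXT)"
    )
    con.executemany("INSERT INTO myTestTable VALUES (?, ?, ?, ?, ?)", rows)
    query = """
        SELECT rider_id,
               SUM(CASE WHEN is_late = 1 THEN 1 ELSE 0 END) AS task_count,
               MIN(expected_time_to_arrive)                 AS first_expected,
               MAX(last_status)                             AS last_status,
               -- julianday() returns days, so multiply by 1440 to get minutes
               CAST(ROUND((julianday(MAX(last_status))
                           - julianday(MIN(expected_time_to_arrive))) * 1440) AS INTEGER)
                                                            AS delay_minutes
        FROM myTestTable
        GROUP BY rider_id
        ORDER BY rider_id
    """
    return con.execute(query).fetchall()
```

Calling late_summary(ROWS) yields a single row for rider 1 with a late count of 2.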

Related

How to increment a parent group number when the child window has incrementing values?

I am using Spark SQL 3.2.0
Please see the DB Fiddle link for a simplified example of my dataset and desired outcome.
In abstract, I have a dataset with a series of related events that can be grouped by their time order and event number. When ordering by time and event number, every time the event number resets to 1, you're looking at a new set of events.
I understand how to use row_number() or dense_rank() to increment event_group_number where sub_event_number = 1, but I'm uncertain how to make the rows where sub_event_number > 1 take on the correct event_group_number.
I'm currently doing the following:
case
when sub_event_number = 1 and is_event_type
then row_number() over (partition by context_id, event_id, sub_event_number order by is_event_type asc, start_time asc) - 1
else null
end as event_group_number
I'd be grateful for any help, and I'm happy to answer any questions.
It seems you're looking for a cumulative conditional sum:
SELECT context_id,
       event_id,
       start_time,
       NULLIF(
           SUM(CASE WHEN sub_event_number = 1 THEN 1 ELSE 0 END) OVER(
               PARTITION BY context_id, event_id
               ORDER BY is_event_type, start_time) - 1,
           0
       ) AS event_group_number
FROM foobar
ORDER BY context_id, event_id, is_event_type, start_time
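To see why a cumulative conditional sum carries the group number onto the rows where sub_event_number > 1, here is a reduced sketch using Python's sqlite3 module (it needs SQLite 3.25 or later for window functions). It is an illustration only: the table is cut down to two columns, the sample rows are invented, and the partitioning, the "- 1" offset and the NULLIF from the query above are left out.

```python
import sqlite3


def event_groups(rows):
    """rows: (start_time, sub_event_number) tuples for a single context/event pair."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE foobar (start_time INT, sub_event_number INT)")
    con.executemany("INSERT INTO foobar VALUES (?, ?)", rows)
    query = """
        SELECT start_time,
               sub_event_number,
               -- running count of the rows that start a new group
               SUM(CASE WHEN sub_event_number = 1 THEN 1 ELSE 0 END)
                   OVER (ORDER BY start_time) AS event_group_number
        FROM foobar
        ORDER BY start_time
    """
    return con.execute(query).fetchall()


# Hypothetical events: the sub-event counter resets to 1 at times 4 and 6.
SAMPLE = [(1, 1), (2, 2), (3, 3), (4, 1), (5, 2), (6, 1)]
```

Each row that resets the counter adds 1 to the running total, and every following row inherits that total until the next reset.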

t-sql repeat row numbers within group

I need to create an ID for every time a name changes in the task history.
The rank needs to restart with each task and step.
The closest I got to my goal is the code below, but it does not produce the correct result when a person appears again in the historical list of actions.
DENSE_RANK() OVER (ORDER BY TaskName, Person)
Thanks in advance.
You can use lag() to see where a person changes. Then use a cumulative sum:
select t.*,
       sum(case when prev_person = person then 0 else 1 end) over
           (partition by task_name order by timestamp) as desired_output
from (select t.*,
             lag(person) over (partition by task_name order by timestamp) as prev_person
      from t
     ) t ;
Note: I am interpreting your question as your wanting the numbers separately for each task ("every time a name changes in the task history").
EDIT:
Based on your comment:
select t.*,
       sum(case when prev_person = person and prev_step_name = step_name then 0 else 1 end) over
           (partition by task_name order by timestamp) as desired_output
from (select t.*,
             lag(person) over (partition by task_name order by timestamp) as prev_person,
             lag(step_name) over (partition by task_name order by timestamp) as prev_step_name
      from t
     ) t ;
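The lag() plus cumulative-sum idea can be tried out on a small table. This is a sketch under assumptions: it runs on SQLite (3.25 or later) through Python's sqlite3 module, the timestamp column is renamed to a hypothetical integer column ts, and the sample rows are invented. It covers the first query only, without the step_name comparison.

```python
import sqlite3


def change_ids(rows):
    """rows: (task_name, ts, person) tuples; returns each row with its change id."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (task_name TEXT, ts INT, person TEXT)")
    con.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    query = """
        SELECT task_name, ts, person,
               -- add 1 whenever the person differs from the previous row
               -- (the first row has a NULL prev_person, so it also counts as a change)
               SUM(CASE WHEN prev_person = person THEN 0 ELSE 1 END)
                   OVER (PARTITION BY task_name ORDER BY ts) AS change_id
        FROM (SELECT t.*,
                     LAG(person) OVER (PARTITION BY task_name ORDER BY ts) AS prev_person
              FROM t) AS x
        ORDER BY task_name, ts
    """
    return con.execute(query).fetchall()


# Hypothetical history: Ann hands task A to Bob and later takes it back.
HISTORY = [("A", 1, "Ann"), ("A", 2, "Ann"), ("A", 3, "Bob"), ("A", 4, "Ann"), ("B", 1, "Bob")]
```

Unlike DENSE_RANK() ordered by person, Ann's second stint on task A gets a new id (3) instead of reusing 1.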

how to select first and last row in 1 query after Filtering and then carry out calculation between the values of two values in one query

I'm using T-SQL 2014
Suppose I have a stock price chart as follows.
I want to write efficient code for a stored function that displays the Open price at the start, the Close price at the end, and the difference between Close and Open. Is it possible to do that in one query? The query seems easy, but it turned out to be extremely difficult. My first problem is displaying the first row and the last row in one query.
My attempt looks like this:
create function GetVolatilityRank(@from date, @to date)
returns table as
return(
    with Price_Selected_Time as (select * from Price where [date] between @from and @to)
    select
        (select top 1([Open]) from Price_Selected_Time) as 'Open',
        (select top 1([Close]) from Price_Selected_Time order by date desc) as 'Close',
        [Close] - [Open] as 'Difference'
);
I feel this code is very clumsy. It also does not compile, because 'Open' and 'Close' are not defined yet at the point where I subtract them.
Is there any way to do this in one select?
Thank you.
We can handle this via a regular query using ROW_NUMBER:
WITH cte AS (
    SELECT *, ROW_NUMBER() OVER (ORDER BY Date) rn_start,
              ROW_NUMBER() OVER (ORDER BY Date DESC) rn_end
    FROM Price
)
SELECT
    MAX(CASE WHEN rn_start = 1 THEN [Open] END) AS OpenStart,
    MAX(CASE WHEN rn_end = 1 THEN [Close] END) AS CloseEnd,
    MAX(CASE WHEN rn_end = 1 THEN [Close] END) -
    MAX(CASE WHEN rn_start = 1 THEN [Open] END) AS diff
FROM cte;
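The query above reads the whole Price table. If the date range from the question is needed, one option is to filter inside the CTE. The sketch below tries that on SQLite (3.25 or later) through Python's sqlite3 module; the column names price_date, open_price and close_price and the sample prices are hypothetical, and the two bound parameters play the role of @from and @to.

```python
import sqlite3


def open_close(rows, date_from, date_to):
    """Return (open at first date, close at last date, difference) within the range."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Price (price_date TEXT, open_price INT, close_price INT)")
    con.executemany("INSERT INTO Price VALUES (?, ?, ?)", rows)
    query = """
        WITH cte AS (
            SELECT open_price, close_price,
                   ROW_NUMBER() OVER (ORDER BY price_date)      AS rn_start,
                   ROW_NUMBER() OVER (ORDER BY price_date DESC) AS rn_end
            FROM Price
            WHERE price_date BETWEEN ? AND ?  -- restrict to the requested period
        )
        SELECT MAX(CASE WHEN rn_start = 1 THEN open_price END) AS open_start,
               MAX(CASE WHEN rn_end = 1 THEN close_price END)  AS close_end,
               MAX(CASE WHEN rn_end = 1 THEN close_price END)
                 - MAX(CASE WHEN rn_start = 1 THEN open_price END) AS diff
        FROM cte
    """
    return con.execute(query, (date_from, date_to)).fetchone()


# Hypothetical daily prices.
PRICES = [
    ("2020-01-01", 10, 11),
    ("2020-01-02", 11, 9),
    ("2020-01-03", 9, 14),
    ("2020-01-04", 14, 20),
]
```

For the range 2020-01-01 to 2020-01-03 the function returns the open of the first day, the close of the third day, and their difference.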

How to get the difference between (multiple) two different rows?

I have a data set containing these fields: month, customer_id, row_num (RANK), and verified_date.
The rank field indicates the first (1) and second (2) purchase of each customer. I would like to know the time difference between the first and second purchase for each customer, and show it only against the customer's first month (the month where row_num = 1).
https://i.ibb.co/PjJk5Y0/Capture.png
So my expected result is like below image:
https://i.ibb.co/y5Mww7k/Capture-2.png
I'm using Standard SQL in Google BigQuery.
row_num, verified_date
from table
GROUP BY 1, 2
We can try using a pivot query here, aggregating by the customer_id:
SELECT
    MAX(CASE WHEN row_num = 1 THEN month END) AS month,
    customer_id,
    1 AS row_num,
    DATE_DIFF(MAX(CASE WHEN row_num = 2 THEN verified_date END),
              MAX(CASE WHEN row_num = 1 THEN verified_date END), DAY) AS difference
FROM yourTable
GROUP BY
    customer_id;
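The pivot pattern can be checked on a couple of rows. This sketch runs on SQLite through Python's sqlite3 module, not on BigQuery, so DATE_DIFF is replaced by a julianday subtraction; that substitution, the text date format and the sample rows are all assumptions made for the illustration.

```python
import sqlite3


def first_to_second_gap(rows):
    """rows: (month, customer_id, row_num, verified_date) tuples."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE yourTable (month TEXT, customer_id INT, row_num INT, verified_date TEXT)"
    )
    con.executemany("INSERT INTO yourTable VALUES (?, ?, ?, ?)", rows)
    query = """
        SELECT MAX(CASE WHEN row_num = 1 THEN month END) AS month,
               customer_id,
               -- days between the second and the first purchase (NULL if there is no second)
               CAST(julianday(MAX(CASE WHEN row_num = 2 THEN verified_date END))
                    - julianday(MAX(CASE WHEN row_num = 1 THEN verified_date END))
                    AS INTEGER) AS difference
        FROM yourTable
        GROUP BY customer_id
        ORDER BY customer_id
    """
    return con.execute(query).fetchall()


# Hypothetical purchases: customer 1 bought twice, customer 2 only once.
PURCHASES = [
    ("2020-01", 1, 1, "2020-01-05"),
    ("2020-02", 1, 2, "2020-02-10"),
    ("2020-03", 2, 1, "2020-03-01"),
]
```

A customer without a second purchase ends up with a NULL difference, which Python reports as None.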

SQL Date intelligence: filtering data by seconds ran from last known valid result

Help! We're trying to create a new column (Is Valid?) to reproduce the logic below.
It is a binary result:
it is 1 if it is the first known value of an ID
it is 1 if it is 3 seconds or more after the previous "1" of that ID
Note 1: this is not the difference in seconds from the previous record
it is 0 if it is less than 3 seconds after the previous "1" of that ID
Note 2: there are many IDs in the data set
Note 3: original dataset has ID and Date
Attached a PoC of the data and the expected result.
You would have to do this using a recursive CTE, which is quite expensive:
with recursive tt as (
      select t.*, row_number() over (partition by id order by time) as seqnum
      from t
     ),
     cte as (
      select tt.*, time as grp_start
      from tt
      where seqnum = 1
      union all
      -- keep the current group start while within 3 seconds of it; otherwise start a new group
      select tt.*,
             (case when tt.time < cte.grp_start + interval '3 second'
                   then cte.grp_start
                   else tt.time
              end)
      from cte join
           tt
           on tt.id = cte.id and tt.seqnum = cte.seqnum + 1
     )
select cte.*,
       (case when grp_start = lag(grp_start) over (partition by id order by time)
             then 0 else 1
        end) as isValid
from cte;
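The rule is sequential by nature: each flag depends on the last row that was flagged 1, which is why a plain window function is not enough. As a cross-check for the SQL, here is the same rule as a procedural Python sketch; the function name, the tuple layout and the min_gap parameter are assumptions made for the illustration.

```python
from datetime import datetime, timedelta


def flag_valid(events, min_gap=timedelta(seconds=3)):
    """events: iterable of (id, datetime) pairs.

    Returns (id, datetime, is_valid) tuples sorted by id and time, where is_valid
    is 1 for the first event of an id and for any event at least min_gap after
    the previous valid event of the same id, and 0 otherwise.
    """
    last_valid = {}  # id -> time of the most recent event flagged 1
    result = []
    for key, ts in sorted(events):
        anchor = last_valid.get(key)
        if anchor is None or ts - anchor >= min_gap:
            last_valid[key] = ts  # this event becomes the new reference point
            result.append((key, ts, 1))
        else:
            result.append((key, ts, 0))
    return result
```

With events at seconds 0, 1, 2, 3, 4, 6 and 7 for one ID, the flags come out as 1, 0, 0, 1, 0, 1, 0: second 3 is valid because it is 3 seconds after second 0, and second 6 because it is 3 seconds after second 3.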