Writing an analytic function to mark the last date of some column - sql

Can you help me write an analytic function that marks the last date a client's service was stopped? For example, one client has 2-3 service stops, and I would like to count how many stops there are and mark the date of the last stop.
I'm using
SELECT column_name1, column_name2, column_name3, column_name4
, ROW_NUMBER() OVER(PARTITION BY column_name3 ORDER BY column_name4) AS Something
FROM ...
WHERE ...
ORDER BY ...
where column_name3 contains the status (whether the service is stopped) and column_name4 contains the date of the last stop.

I hope I understood you correctly:
select distinct column_name1, column_name2, column_name3,
       max(case when column_name3 = 'inactive' then column_name4 end) over (partition by column_name1) as last_date,
       count(case when column_name3 = 'inactive' then 1 end) over (partition by column_name1) as cnt
from ...
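A quick sketch of how this behaves on made-up data (the client_services table, its columns, and the values below are assumptions for illustration, not from the question):
-- Hypothetical client_services table:
--   client_id | status     | status_date
--   1001      | 'inactive' | 2020-01-05
--   1001      | 'active'   | 2020-02-01
--   1001      | 'inactive' | 2020-03-10
select distinct client_id,
       max(case when status = 'inactive' then status_date end)
           over (partition by client_id) as last_date,   -- 2020-03-10
       count(case when status = 'inactive' then 1 end)
           over (partition by client_id) as cnt           -- 2
from client_services;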

Related

How to increment a parent group number when the child window has incrementing values?

I am using Spark SQL 3.2.0
Please see the DB Fiddle link for a simplified example of my dataset and desired outcome.
In the abstract, I have a dataset with a series of related events that can be grouped by their time order and event number. When ordering by time and event number, every time the event number resets to 1, you're looking at a new set of events.
I understand how to use row_number() or dense_rank() to increment event_group_number where sub_event_number = 1, but I'm uncertain how to make the rows where sub_event_number > 1 take on the correct event_group_number.
I'm currently doing the following:
case
    when sub_event_number = 1 and is_event_type
        then row_number() over (partition by context_id, event_id, sub_event_number order by is_event_type asc, start_time asc) - 1
    else null
end as event_group_number
I'd be grateful for any help, and I'm happy to answer any questions.
It seems you're looking for a cumulative conditional sum:
SELECT context_id,
       event_id,
       start_time,
       NULLIF(
           SUM(CASE WHEN sub_event_number = 1 THEN 1 ELSE 0 END) OVER(
               PARTITION BY context_id, event_id
               ORDER BY is_event_type, start_time) - 1,
           0
       ) AS event_group_number
FROM foobar
ORDER BY context_id, event_id, is_event_type, start_time
db-fiddle
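As a sanity check, here is a self-contained Spark SQL sketch with invented rows (the context_id/event_id values, timestamps, and flags are assumptions) showing how the running SUM assigns group numbers:
WITH foobar AS (
    SELECT * FROM VALUES
        ('c1', 'e1', TIMESTAMP '2022-01-01 10:00:00', true, 1),
        ('c1', 'e1', TIMESTAMP '2022-01-01 10:01:00', true, 2),
        ('c1', 'e1', TIMESTAMP '2022-01-01 10:05:00', true, 1),
        ('c1', 'e1', TIMESTAMP '2022-01-01 10:06:00', true, 2)
        AS t(context_id, event_id, start_time, is_event_type, sub_event_number)
)
SELECT context_id, event_id, start_time, sub_event_number,
       NULLIF(
           SUM(CASE WHEN sub_event_number = 1 THEN 1 ELSE 0 END) OVER(
               PARTITION BY context_id, event_id
               ORDER BY is_event_type, start_time) - 1,
           0
       ) AS event_group_number   -- first pair of rows -> NULL, second pair -> 1
FROM foobar
ORDER BY context_id, event_id, is_event_type, start_time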

Create partitions based on column values in sql

I am very new to SQL and query writing, and after a lot of trying I am asking for help.
As shown in the picture, I want to partition the data based on is_late = 1 and show its count (which is 2), but at the same time capture the value of last_status where is_late = 0 so it can be displayed in a single row.
The task is to calculate how many times the rider was late and the time taken from the first occurrence of the estimated time to the last_status.
Desired output:
You can use the following query:
SELECT rider_id,
       task_created_time,
       expected_time_to_arrive,
       is_late,
       last_status,
       task_count,
       CONVERT(VARCHAR(5), DATEADD(MINUTE, DATEDIFF(MINUTE, expected_time_to_arrive, last_status), 0), 114) AS time_delayed
FROM (SELECT rider_id,
             task_created_time,
             expected_time_to_arrive,
             is_late,
             SUM(CASE WHEN is_late = 1 THEN 1 ELSE 0 END) OVER(PARTITION BY rider_id ORDER BY rider_id) AS task_count,
             ROW_NUMBER() OVER(PARTITION BY rider_id ORDER BY rider_id) AS num,
             MAX(last_status) OVER(PARTITION BY rider_id ORDER BY rider_id) AS last_status
      FROM myTestTable) t
WHERE num = 1
WHERE num = 1
db<>fiddle
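To make the time_delayed expression easier to follow, here it is in isolation with made-up timestamps: DATEDIFF counts whole minutes between the two values, DATEADD applies them to the zero date, and style 114 formats that as hh:mi:ss:mmm, which the VARCHAR(5) cast trims to hh:mi.
SELECT CONVERT(VARCHAR(5),
               DATEADD(MINUTE,
                       DATEDIFF(MINUTE, '2021-06-01 10:00', '2021-06-01 11:35'),
                       0),
               114) AS time_delayed;   -- returns '01:35'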

t-sql repeat row numbers within group

I need to create an ID for every time a name changes in the task history.
The rank needs to restart with each task and step.
The closest I got to my goal is using the code below,
but it does not produce the correct result when a person appears again in the historical list of actions.
DENSE_RANK() OVER (ORDER BY TaskName, Person)
Thanks in advance
You can use lag() to see where a person changes. Then use a cumulative sum:
select t.*,
sum(case when prev_person = person then 0 else 1 end) over
(partition by task_name order by timestamp) as desired_output
from (select t.*,
lag(person) over (partition by task_name order by timestamp) as prev_person
from t
) t ;
Note: I am interpreting your question as wanting the numbers separately for each task ("every time a name changes in the task history").
EDIT:
Based on your comment:
select t.*,
sum(case when prev_person = person and prev_step_name = step_name then 0 else 1 end) over
(partition by task_name order by timestamp) as desired_output
from (select t.*,
lag(person) over (partition by task_name order by timestamp) as prev_person,
lag(step_name) over (partition by task_name order by timestamp) as prev_step_name
from t
) t ;
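Here is a small self-contained check of the first query on invented rows (the task name, people, and timestamps below are assumptions), showing why the cumulative sum keeps counting when a person reappears:
with t as (
    select * from (values
        ('TaskA', 'Alice', cast('2021-01-01 09:00' as datetime)),
        ('TaskA', 'Alice', cast('2021-01-01 09:05' as datetime)),
        ('TaskA', 'Bob',   cast('2021-01-01 09:10' as datetime)),
        ('TaskA', 'Alice', cast('2021-01-01 09:15' as datetime))
    ) v(task_name, person, [timestamp])
)
select t.*,
       sum(case when prev_person = person then 0 else 1 end) over
           (partition by task_name order by [timestamp]) as desired_output
       -- Alice rows -> 1, 1; Bob -> 2; Alice again -> 3 (DENSE_RANK would collapse her back to 1)
from (select t.*,
             lag(person) over (partition by task_name order by [timestamp]) as prev_person
      from t
     ) t;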

how to select the first and last row in one query after filtering, and then carry out a calculation between the two values

I'm using T-SQL 2014
Suppose I have a stock price chart as follows.
I want to write efficient code for a stored function that displays the Open price at the start, the Close price at the end, and the difference between Close and Open. Is it possible to do that in one query? The query seems easy, but it turned out to be extremely difficult. My first problem is to display the first row and last row in one query.
My attempt is like this:
create function GetVolatilityRank(@from date, @to date)
returns table as
return(
    with Price_Selected_Time as (select * from Price where [date] between @from and @to)
    select
        (select top 1 ([Open]) from Price_Selected_Time) as 'Open',
        (select top 1 ([Close]) from Price_Selected_Time order by date desc) as 'Close',
        [Close] - [Open] as 'Difference'
);
I feel this code is very clumsy. It also won't compile, because 'Open' and 'Close' are not defined yet at that point.
Is there any way to query this in one select?
Thank you
We can handle this via a regular query using ROW_NUMBER:
WITH cte AS (
    SELECT *,
           ROW_NUMBER() OVER (ORDER BY Date) AS rn_start,
           ROW_NUMBER() OVER (ORDER BY Date DESC) AS rn_end
    FROM Price
)
SELECT MAX(CASE WHEN rn_start = 1 THEN [Open] END) AS OpenStart,
       MAX(CASE WHEN rn_end = 1 THEN [Close] END) AS CloseEnd,
       MAX(CASE WHEN rn_end = 1 THEN [Close] END) -
       MAX(CASE WHEN rn_start = 1 THEN [Open] END) AS diff
FROM cte;
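If the goal is still an inline table-valued function with the original @from/@to filter, the same idea can be wrapped up like this (a sketch; the function name GetOpenCloseDiff is made up, and the Price columns are assumed from the question):
create function GetOpenCloseDiff(@from date, @to date)
returns table as
return (
    select
        max(case when rn_start = 1 then [Open]  end) as [Open],
        max(case when rn_end   = 1 then [Close] end) as [Close],
        max(case when rn_end   = 1 then [Close] end) -
        max(case when rn_start = 1 then [Open]  end) as Difference
    from (select [Open], [Close],
                 row_number() over (order by [date] asc)  as rn_start,
                 row_number() over (order by [date] desc) as rn_end
          from Price
          where [date] between @from and @to) p
);
-- usage example: select * from dbo.GetOpenCloseDiff('2020-01-01', '2020-12-31');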

SQL rollup on sessions

I have an impression event table that has a bunch of timestamps and marked start/end boundaries. I am trying to roll it up into a metric that says "this session contains at least 1 impression with feature x". I'm not sure exactly how to do this. Any help would be appreciated. Thanks.
I want to roll this up into something that looks like:
account, session_start, session_end, interacted_with_feature
3004514, 2018-02-23 13:43:35.475, 2018-02-23 13:43:47.377, FALSE
where it is simple for me to say if this session had any interactions with the feature or not.
Perhaps aggregation does what you want:
select account, min(timestamp), max(timestamp), max(interacted_with_feature)
from t
group by account;
I was able to solve this with conditional cumulative sums to generate a session group ID for each row.
with cte as (
select *
, sum(case when session_boundary = 'start' then 1 else 0 end)
over (partition by account order by timestamp rows unbounded preceding)
as session_num
from raw_sessions
)
select account
, session_num
, min(timestamp) as session_start
, max(timestamp) as session_end
, bool_or(interacted_with_feature) as interacted_with_feature
from cte
group by account, session_num
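One note: bool_or is a PostgreSQL-style aggregate. If your engine lacks it, a conditional MAX over the same cte gives an equivalent 0/1 flag (a sketch with the same assumed columns, not part of the original solution):
select account
     , session_num
     , min(timestamp) as session_start
     , max(timestamp) as session_end
     , max(case when interacted_with_feature then 1 else 0 end) as interacted_with_feature   -- 1 = at least one interaction
from cte
group by account, session_num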