How to flag id's by a condition? - sql

I want to create a column that flags whether an id has a straight order process, i.e. ids which never have an order status of pending or info_required.
E.g. id a has a pending row, so is_straight should be false. b has neither pending nor info_required, so it should be true.
Here is the example data:
WITH t1 AS (
SELECT 'a' AS id, 'created' AS status, '2021-11-02 15:04:07'::timestamp AS created_at UNION ALL
SELECT 'a' AS id, 'created' AS status, '2021-11-03 13:23:34'::timestamp AS created_at UNION ALL
SELECT 'a' AS id, 'pending' AS status, '2021-11-07 04:04:46'::timestamp AS created_at UNION ALL
SELECT 'a' AS id, 'successful' AS status, '2021-11-07 13:25:05'::timestamp AS created_at UNION ALL
SELECT 'b' AS id, 'created' AS status, '2021-11-11 16:19:07'::timestamp AS created_at UNION ALL
SELECT 'b' AS id, 'successful' AS status, '2021-11-13 17:57:55'::timestamp AS created_at UNION ALL
SELECT 'c' AS id, 'created' AS status, '2021-11-15 01:09:23'::timestamp AS created_at UNION ALL
SELECT 'c' AS id, 'info_required' AS status, '2021-11-17 11:06:00'::timestamp AS created_at UNION ALL
SELECT 'c' AS id, 'successful' AS status, '2021-11-21 23:35:46'::timestamp AS created_at
)

Using windowed COUNT_IF:
SELECT *,
COUNT_IF(status IN ('pending', 'info_required')) OVER(PARTITION BY id) = 0
AS is_straight
FROM t1;
Output: is_straight is FALSE on every row of a and c, and TRUE on every row of b.
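On engines that lack COUNT_IF, the same flag can be built from a windowed SUM over a CASE expression. The following is a minimal sketch, not a drop-in replacement: it assumes the column is named status as in the sample CTE, and the trimmed rows are for illustration only.

```sql
-- Portable variant: count the offending statuses per id with SUM(CASE ...).
WITH t1 AS (
    SELECT 'a' AS id, 'created' AS status UNION ALL
    SELECT 'a', 'pending' UNION ALL
    SELECT 'b', 'created' UNION ALL
    SELECT 'b', 'successful' UNION ALL
    SELECT 'c', 'info_required'
)
SELECT id,
       status,
       -- zero offending rows in the partition means the id had a straight process
       SUM(CASE WHEN status IN ('pending', 'info_required') THEN 1 ELSE 0 END)
           OVER (PARTITION BY id) = 0 AS is_straight
FROM t1;
```

If only one row per id is needed, the same SUM(CASE ...) = 0 expression works with GROUP BY id and without the OVER clause. Engines that do not accept a bare boolean in the select list (SQL Server, for example) would need the comparison wrapped in a CASE.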

Related

How to get max status as a column?

I want to create a column that shows, as TRUE or FALSE, whether a row holds the latest order status for its id, based on created_at.
Is there a way to achieve this without a subquery in Snowflake?
Here is my example data:
WITH t1 AS (
SELECT 'A' AS id, 'created' AS status, '2021-05-18 18:30:00'::timestamp AS created_at UNION ALL
SELECT 'A' AS id, 'created' AS status, '2021-05-19 11:30:00'::timestamp AS created_at UNION ALL
SELECT 'A' AS id, 'pending' AS status, '2021-05-19 12:00:00'::timestamp AS created_at UNION ALL
SELECT 'A' AS id, 'successful' AS status, '2021-05-20 18:30:00'::timestamp AS created_at
)
Using windowed MAX:
WITH t1(id, status, created_at) AS (
SELECT 'A', 'created', '2021-05-18 18:30:00'::timestamp UNION ALL
SELECT 'A', 'created', '2021-05-19 11:30:00'::timestamp UNION ALL
SELECT 'A', 'pending', '2021-05-19 12:00:00'::timestamp UNION ALL
SELECT 'A', 'successful', '2021-05-20 18:30:00'::timestamp
)
SELECT *, created_at = MAX(created_at) OVER(PARTITION BY ID) AS is_final_order_status
FROM t1;
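Output: is_final_order_status is TRUE only for the latest row (successful, 2021-05-20 18:30:00) and FALSE for the other three rows.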
A CASE expression over ROW_NUMBER could also work:
SELECT id, status, created_at
, CASE
WHEN 1 = ROW_NUMBER() OVER (PARTITION BY id ORDER BY created_at DESC)
THEN 'TRUE'
ELSE 'FALSE'
END is_final_order_status
FROM t1
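The MAX comparison and the ROW_NUMBER approach differ when an id has several rows sharing the latest created_at: the MAX comparison flags all of them, while ROW_NUMBER flags exactly one, picked arbitrarily unless the ORDER BY includes a tiebreaker. A small sketch with a made-up tie (the id 'B' rows and the output column names are hypothetical, not from the question):

```sql
-- Two rows for the same id share the latest timestamp.
WITH t1 AS (
    SELECT 'B' AS id, 'pending' AS status,
           CAST('2021-05-20 18:30:00' AS timestamp) AS created_at UNION ALL
    SELECT 'B', 'successful',
           CAST('2021-05-20 18:30:00' AS timestamp)
)
SELECT id, status, created_at,
       -- TRUE on both tied rows
       created_at = MAX(created_at) OVER (PARTITION BY id) AS is_latest_by_max,
       -- TRUE on exactly one of the tied rows
       ROW_NUMBER() OVER (PARTITION BY id ORDER BY created_at DESC) = 1 AS is_latest_by_row_number
FROM t1;
```

Which behavior is wanted depends on the data; adding a second sort key to the ORDER BY makes the ROW_NUMBER choice deterministic.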

How to build a closed funnel of user steps in BigQuery?

Please help me with the BigQuery query. I need to build a closed funnel of user steps events in a mobile app for a week.
The table looks like this:
It is necessary to collect all unique users who have passed from step 1 to step 2 and so on to step 6 during this period. Between these steps, they could do something else, be distracted by other events. But what is important is the passage of each unique user through these steps in a given period of time.
Please tell me how to create such a funnel?
There are several ways to achieve this. Here is one approach on similar sample data; it is not the most efficient, but it is easy to follow:
with data as (
select 'a' as user_id, cast('2020-01-01 04:45:00' as timestamp) as event_timestamp, '1' as step_name
union all
select 'b' as user_id, cast('2020-01-01 04:50:00' as timestamp) as event_timestamp, '1' as step_name
union all
select 'a' as user_id, cast('2020-01-01 05:00:00' as timestamp) as event_timestamp, '2' as step_name
union all
select 'a' as user_id, cast('2020-01-01 05:15:00' as timestamp) as event_timestamp, '3' as step_name
union all
select 'b' as user_id, cast('2020-01-01 04:55:00' as timestamp) as event_timestamp, '2' as step_name
union all
select 'c' as user_id, cast('2020-01-01 04:58:00' as timestamp) as event_timestamp, '1' as step_name
union all
select 'a' as user_id, cast('2020-01-01 05:16:00' as timestamp) as event_timestamp, '4' as step_name
union all
select 'b' as user_id, cast('2020-01-01 05:16:00' as timestamp) as event_timestamp, '3' as step_name
),
data2 as (
select a.user_id, a.step_name step_1, b.step_name step_2, c.step_name step_3, d.step_name step_4
from (select user_id, event_timestamp, step_name from data where step_name = '1') a
left join data b on (a.user_id = b.user_id and a.event_timestamp < b.event_timestamp and b.step_name = '2')
left join data c on (b.user_id = c.user_id and b.event_timestamp < c.event_timestamp and c.step_name = '3')
left join data d on (c.user_id = d.user_id and c.event_timestamp < d.event_timestamp and d.step_name = '4')
)
select * from (
select 'step_1' as event_name, count(distinct user_id) as n_users from data2 where step_1 is not null
group by 1
union all
select 'step_2' as event_name, count(distinct user_id) as n_users from data2 where (step_1 is not null and step_2 is not null)
group by 1
union all
select 'step_3' as event_name, count(distinct user_id) as n_users from data2 where (step_1 is not null and step_2 is not null and step_3 is not null)
group by 1
union all
select 'step_4' as event_name, count(distinct user_id) as n_users from data2 where (step_1 is not null and step_2 is not null and step_3 is not null and step_4 is not null)
group by 1
)
order by 1
You can further optimize this based on your specific filters, conditions, etc.
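One possible optimization, sketched here under the same sample data, replaces the chain of left joins with one small CTE per step. Each CTE keeps the earliest event of its step that occurs after the user reached the previous step, so every step produces at most one row per user. The CTE names s1 to s4 and the column reached_at are made up for this sketch.

```sql
with data as (
  select 'a' as user_id, cast('2020-01-01 04:45:00' as timestamp) as event_timestamp, '1' as step_name union all
  select 'b', cast('2020-01-01 04:50:00' as timestamp), '1' union all
  select 'a', cast('2020-01-01 05:00:00' as timestamp), '2' union all
  select 'a', cast('2020-01-01 05:15:00' as timestamp), '3' union all
  select 'b', cast('2020-01-01 04:55:00' as timestamp), '2' union all
  select 'c', cast('2020-01-01 04:58:00' as timestamp), '1' union all
  select 'a', cast('2020-01-01 05:16:00' as timestamp), '4' union all
  select 'b', cast('2020-01-01 05:16:00' as timestamp), '3'
),
-- earliest step-1 event per user
s1 as (
  select user_id, min(event_timestamp) as reached_at
  from data
  where step_name = '1'
  group by user_id
),
-- earliest step-2 event that happens after the user's step 1
s2 as (
  select p.user_id, min(d.event_timestamp) as reached_at
  from s1 p
  join data d
    on d.user_id = p.user_id and d.step_name = '2' and d.event_timestamp > p.reached_at
  group by p.user_id
),
-- earliest step-3 event after the qualifying step 2
s3 as (
  select p.user_id, min(d.event_timestamp) as reached_at
  from s2 p
  join data d
    on d.user_id = p.user_id and d.step_name = '3' and d.event_timestamp > p.reached_at
  group by p.user_id
),
-- earliest step-4 event after the qualifying step 3
s4 as (
  select p.user_id, min(d.event_timestamp) as reached_at
  from s3 p
  join data d
    on d.user_id = p.user_id and d.step_name = '4' and d.event_timestamp > p.reached_at
  group by p.user_id
)
select 'step_1' as event_name, count(*) as n_users from s1 union all
select 'step_2', count(*) from s2 union all
select 'step_3', count(*) from s3 union all
select 'step_4', count(*) from s4
order by event_name;
```

On the sample rows this gives 3, 2, 2 and 1 users for steps 1 to 4. Taking the earliest qualifying event at each step is enough to decide whether a user can complete the chain in order. The one-week window from the question is not applied here; it would be a filter on event_timestamp inside the data CTE.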

How to do a union with where clause

I am still learning SQL, and the query below does not give me accurate results; it is also not optimized.
One of the main things I am trying to do is create 3 date types:
sales_shift,
refund,
sales_reg
in my data sets.
To achieve this I am doing a union.
All 3 data sets query the same source, sales_main. The problem is in my 2nd data set, with 'Refund' as date_type: it does not pull the expected rows because of the ColumnA = 'B' condition.
I would like this set to look only at the records that were pulled in the 1st data set (i.e. 'Sales_shift' as date_type) and then apply the condition ColumnA = 'B'. How do I do that?
My 3rd data set, with 'Sales_Reg' as date_type, should be the same as the 1st set except that transaction_date is not shifted. How do I do that? I was thinking of WHERE EXISTS but do not know how to apply it.
Any help would be awesome. Thanks much
create table sales_refund as select * from (
with table1 as (select 'Sales_shift' as date_type,
add_months(date_trunc('month',transaction_date),1) as event_date,
sku,
sum(sales) as Sales_total,
sum(refund) as Refund_total,
case when region in ('US','EU') then 'type A' else 'type B' end as Flag_field1
from sales_main where transaction_date >= add_months(current_date(),-6)
group by sku,
'Sales_shift',
add_months(date_trunc('month',transaction_date),1),
Flag_field1),
table2 as (select sku,
date_type,
event_date,
Sales_total,
Refund_total,
Flag_field1,
case when Flag_field1 = 'type B' or (Flag_field1 = 'type A' and Sales_total > 20000) then 'Yes' else 'No' end as Flag_field2
from table1 )
Select sku,
date_type,
event_date,
Sales_total,
Refund_total,
Flag_field1
from table2 where Flag_field2 = 'Yes' )
union all
select sku,
'Refund' as date_type,
date_trunc('month',refund_date) as event_date,
sum(sales) as Sales_total,
sum(refund) as Refund_total,
'X' as Flag_field1
from sales_main where transaction_date >= add_months(current_date(),-6) and ColumnA = 'B'
group by sku,
'Refund',
date_trunc('month',refund_date),
'X'
union all
select sku,
'Sales_Reg' as date_type,
date_trunc('month',transaction_date) as event_date,
sum(sales) as Sales_total,
sum(refund) as Refund_total,
'Y' as Flag_field1
from sales_main where transaction_date >= add_months(current_date(),-6)
group by sku,
'Sales_Reg',
date_trunc('month',transaction_date),
'Y'

SQL Query to get the Max Date of a certain Status and subract that from the Max Date of another Status

For example, say a ticket starts in New status. I want to get the MAX date of the New status and the MAX date of the Completed status, and calculate the difference between the two (MAX Completed minus MAX New).
ex.
SELECT t.ID,
MAX(update_date) WHERE t.status = 'New' start_time,
MAX(update_date) WHERE t.status = 'Completed' stop_time,
DATEDIFF(second, MAX(update_date), MAX(update_date)) elapsed_sec
FROM xxx.dbo t
GROUP BY t.ID;
Thank you so much,
P
SELECT
t.id
,DATEDIFF(second, start_time, stop_time) elapsed_sec
FROM (
SELECT DISTINCT
ID,
(SELECT MAX(update_date) from xxx.dbo WHERE status = 'New' AND ID=t2.ID) start_time,
(SELECT MAX(update_date) from xxx.dbo WHERE status = 'Completed' AND ID=t2.ID) stop_time
FROM xxx.dbo t2
) t
I would suggest doing this with conditional aggregation rather than correlated subqueries:
SELECT t.ID,
MAX(CASE WHEN t.status = 'New' THEN update_date END) as start_time,
MAX(CASE WHEN t.status = 'Completed' THEN update_date END) as stop_time,
DATEDIFF(second,
MAX(CASE WHEN t.status = 'New' THEN update_date END),
MAX(CASE WHEN t.status = 'Completed' THEN update_date END)
) as elapsed_sec
FROM xxx.dbo t
GROUP BY t.ID;

sql server select within select

Say I have the following query:
select ID, ActualDate, DueDate
from table1
What I need to do is add another field called Flag,
which will be marked as "Y" if ActualDate is greater than DueDate:
select ID, ActualDate, DueDate,
CASE
WHEN ActualDate > DueDate THEN 'Y'
ELSE 'N'
END as Flag
from table1
The above won't work: I get "Invalid column name ActualDate" and "Invalid column name DueDate".
What I need to do is a select within a select like this:
select ID, ActualDate, DueDate,
CASE
WHEN ActualDate > DueDate THEN 'Y'
ELSE 'N'
END as Flag
from
(select ID, ActualDate, DueDate
from table1) tbl1
If table1 actually contains these columns, the following should work without the need for a subquery:
select ID,
ActualDate,
DueDate,
CASE
WHEN ActualDate > DueDate
THEN 'Y'
ELSE 'N'
END as Flag
FROM table1
You can use a subquery but it is unnecessary:
select ID,
ActualDate,
DueDate,
CASE
WHEN ActualDate > DueDate
THEN 'Y'
ELSE 'N'
END as Flag
FROM
(
select ID, ActualDate, DueDate
from table1
) tbl1
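The "Invalid column name" errors in the question suggest that ActualDate and DueDate are not physical columns of table1. One common cause, assumed here, is that they are aliases computed in the same SELECT list; a CASE expression at that level cannot see them, so a derived table is then genuinely needed. A minimal sketch for SQL Server, in which the inline rows, the columns OrderedOn and ShippedOn, and the 30-day rule are all hypothetical:

```sql
SELECT ID,
       ActualDate,
       DueDate,
       CASE WHEN ActualDate > DueDate THEN 'Y' ELSE 'N' END AS Flag
FROM (
    -- The aliases are defined one level down, so the outer SELECT can use them.
    SELECT ID,
           ShippedOn AS ActualDate,
           DATEADD(day, 30, OrderedOn) AS DueDate
    FROM (VALUES (1, CAST('2021-01-01' AS date), CAST('2021-02-15' AS date)),
                 (2, CAST('2021-01-01' AS date), CAST('2021-01-10' AS date))
         ) AS table1(ID, OrderedOn, ShippedOn)   -- stand-in for the real table1
) AS tbl1;
```

With these made-up rows, ID 1 ships after its due date and gets 'Y', while ID 2 ships early and gets 'N'. If the columns really exist in table1, the plain CASE shown earlier is enough.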