I want to create a column that shows, as TRUE or FALSE, whether a row holds the latest order status for its id, based on created_at.
Is there a way to achieve this without a subquery in Snowflake?
Here is my example data:
WITH t1 AS (
SELECT 'A' AS id, 'created' AS status, '2021-05-18 18:30:00'::timestamp AS created_at UNION ALL
SELECT 'A' AS id, 'created' AS status, '2021-05-19 11:30:00'::timestamp AS created_at UNION ALL
SELECT 'A' AS id, 'pending' AS status, '2021-05-19 12:00:00'::timestamp AS created_at UNION ALL
SELECT 'A' AS id, 'successful' AS status, '2021-05-20 18:30:00'::timestamp AS created_at
)
Using windowed MAX:
WITH t1(id, status, created_at) AS (
SELECT 'A', 'created', '2021-05-18 18:30:00'::timestamp UNION ALL
SELECT 'A', 'created', '2021-05-19 11:30:00'::timestamp UNION ALL
SELECT 'A', 'pending', '2021-05-19 12:00:00'::timestamp UNION ALL
SELECT 'A', 'successful', '2021-05-20 18:30:00'::timestamp AS created_at
)
SELECT *, created_at = MAX(created_at) OVER(PARTITION BY ID) AS is_final_order_status
FROM t1;
A CASE over ROW_NUMBER could also work:
SELECT id, status, created_at
, CASE
WHEN 1 = ROW_NUMBER() OVER (PARTITION BY id ORDER BY created_at DESC)
THEN 'TRUE'
ELSE 'FALSE'
END is_final_order_status
FROM t1
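The two answers above only differ when several rows share the latest created_at. Here is a minimal sketch of that difference, run through Python's sqlite3 module as a stand-in for Snowflake (an assumption; it needs SQLite 3.25+ for window functions). The extra 'refunded' row is hypothetical and exists only to create a tie.

```python
import sqlite3

# The question's rows for id 'A', plus one hypothetical row that ties
# with the latest timestamp (added only to illustrate tie handling).
rows = [
    ("A", "created",    "2021-05-18 18:30:00"),
    ("A", "created",    "2021-05-19 11:30:00"),
    ("A", "pending",    "2021-05-19 12:00:00"),
    ("A", "successful", "2021-05-20 18:30:00"),
    ("A", "refunded",   "2021-05-20 18:30:00"),  # hypothetical tie
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (id TEXT, status TEXT, created_at TEXT)")
conn.executemany("INSERT INTO t1 VALUES (?, ?, ?)", rows)

# Compute both flags side by side; SQLite returns booleans as 1/0.
flags = conn.execute("""
    SELECT status,
           created_at = MAX(created_at) OVER (PARTITION BY id) AS by_max,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY created_at DESC) = 1
               AS by_row_number
    FROM t1
""").fetchall()

max_true = sum(r[1] for r in flags)  # rows flagged by windowed MAX
rn_true = sum(r[2] for r in flags)   # rows flagged by ROW_NUMBER
print(max_true, rn_true)  # 2 1
```

So windowed MAX flags every tied row, while ROW_NUMBER flags exactly one of them (which one is arbitrary unless the ORDER BY has a tie-breaker).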
Related
I want to create a column that flags whether an id has a straight order process, i.e. ids that never have order_status pending or info_required.
E.g. id a has pending, so is_straight will be false; b has no pending or info_required, so it should be true.
Here is the example data:
WITH t1 AS (
SELECT 'a' AS id, 'created' AS status, '2021-11-02 15:04:07'::timestamp AS created_at UNION ALL
SELECT 'a' AS id, 'created' AS status, '2021-11-03 13:23:34'::timestamp AS created_at UNION ALL
SELECT 'a' AS id, 'pending' AS status, '2021-11-07 04:04:46'::timestamp AS created_at UNION ALL
SELECT 'a' AS id, 'successful' AS status, '2021-11-07 13:25:05'::timestamp AS created_at UNION ALL
SELECT 'b' AS id, 'created' AS status, '2021-11-11 16:19:07'::timestamp AS created_at UNION ALL
SELECT 'b' AS id, 'successful' AS status, '2021-11-13 17:57:55'::timestamp AS created_at UNION ALL
SELECT 'c' AS id, 'created' AS status, '2021-11-15 01:09:23'::timestamp AS created_at UNION ALL
SELECT 'c' AS id, 'info_required' AS status, '2021-11-17 11:06:00'::timestamp AS created_at UNION ALL
SELECT 'c' AS id, 'successful' AS status, '2021-11-21 23:35:46'::timestamp AS created_at
)
Using windowed COUNT_IF:
SELECT *,
COUNT_IF(status IN ('pending', 'info_required')) OVER(PARTITION BY id) = 0
AS is_straight
FROM t1;
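For anyone who wants to check the logic outside Snowflake, here is a rough sketch using Python's sqlite3 module. SQLite has no COUNT_IF, so a windowed SUM(CASE ...) is used as a stand-in (that substitution is my assumption, not part of the answer); the data is the sample from the question.

```python
import sqlite3

# Sample rows from the question: (id, status, created_at).
rows = [
    ("a", "created",       "2021-11-02 15:04:07"),
    ("a", "created",       "2021-11-03 13:23:34"),
    ("a", "pending",       "2021-11-07 04:04:46"),
    ("a", "successful",    "2021-11-07 13:25:05"),
    ("b", "created",       "2021-11-11 16:19:07"),
    ("b", "successful",    "2021-11-13 17:57:55"),
    ("c", "created",       "2021-11-15 01:09:23"),
    ("c", "info_required", "2021-11-17 11:06:00"),
    ("c", "successful",    "2021-11-21 23:35:46"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (id TEXT, status TEXT, created_at TEXT)")
conn.executemany("INSERT INTO t1 VALUES (?, ?, ?)", rows)

# Count the "non-straight" statuses per id; zero such rows means straight.
# DISTINCT collapses the per-row flags to one row per id for display.
is_straight = dict(conn.execute("""
    SELECT DISTINCT id,
           SUM(CASE WHEN status IN ('pending', 'info_required')
                    THEN 1 ELSE 0 END) OVER (PARTITION BY id) = 0
               AS is_straight
    FROM t1
""").fetchall())

print(is_straight)  # {'a': 0, 'b': 1, 'c': 0} in some order
```

Only b comes out as straight, which matches the expectation stated in the question.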
Please help me with the BigQuery query. I need to build a closed funnel of user steps events in a mobile app for a week.
The table looks like this:
It is necessary to collect all unique users who have passed from step 1 to step 2 and so on to step 6 during this period. Between these steps, they could do something else, be distracted by other events. But what is important is the passage of each unique user through these steps in a given period of time.
Please tell me how to create such a funnel?
There are several ways to achieve this. Here is an approach using similar sample data; it is not the most efficient, but it is explicit and easy to follow. The sample covers four steps; steps 5 and 6 follow the same pattern:
with data as (
select 'a' as user_id, cast('2020-01-01 04:45:00' as timestamp) as event_timestamp, '1' as step_name
union all
select 'b' as user_id, cast('2020-01-01 04:50:00' as timestamp) as event_timestamp, '1' as step_name
union all
select 'a' as user_id, cast('2020-01-01 05:00:00' as timestamp) as event_timestamp, '2' as step_name
union all
select 'a' as user_id, cast('2020-01-01 05:15:00' as timestamp) as event_timestamp, '3' as step_name
union all
select 'b' as user_id, cast('2020-01-01 04:55:00' as timestamp) as event_timestamp, '2' as step_name
union all
select 'c' as user_id, cast('2020-01-01 04:58:00' as timestamp) as event_timestamp, '1' as step_name
union all
select 'a' as user_id, cast('2020-01-01 05:16:00' as timestamp) as event_timestamp, '4' as step_name
union all
select 'b' as user_id, cast('2020-01-01 05:16:00' as timestamp) as event_timestamp, '3' as step_name
),
data2 as (
select a.user_id, a.step_name step_1, b.step_name step_2, c.step_name step_3, d.step_name step_4 from ( select user_id, event_timestamp, step_name from data where step_name = '1') a
left join data b on (a.user_id = b.user_id and a.event_timestamp < b.event_timestamp and b.step_name = '2')
left join data c on (b.user_id = c.user_id and b.event_timestamp < c.event_timestamp and c.step_name = '3')
left join data d on (c.user_id = d.user_id and c.event_timestamp < d.event_timestamp and d.step_name = '4')
)
select * from (
select 'step_1' as event_name, count(distinct user_id) as n_users from data2 where step_1 is not null
group by 1
union all
select 'step_2' as event_name, count(distinct user_id) as n_users from data2 where (step_1 is not null and step_2 is not null)
group by 1
union all
select 'step_3' as event_name, count(distinct user_id) as n_users from data2 where (step_1 is not null and step_2 is not null and step_3 is not null)
group by 1
union all
select 'step_4' as event_name, count(distinct user_id) as n_users from data2 where (step_1 is not null and step_2 is not null and step_3 is not null and step_4 is not null)
group by 1
)
order by 1
You can further optimize this based on your specific filters, conditions, etc.
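As one possible simplification, the chained left joins can feed a single row of COUNT(DISTINCT ...) columns instead of one UNION ALL branch per step. This works because each join hangs off the previous step, so a non-null step 3 already implies steps 1 and 2. Below is a sketch of that variant, run in Python's sqlite3 as a stand-in for BigQuery (my assumption); the `events` table name and the `s1`..`s4` aliases are made up for the example, and the data is the sample above.

```python
import sqlite3

# Sample events from the answer above: (user_id, event_timestamp, step_name).
events = [
    ("a", "2020-01-01 04:45:00", "1"),
    ("b", "2020-01-01 04:50:00", "1"),
    ("a", "2020-01-01 05:00:00", "2"),
    ("a", "2020-01-01 05:15:00", "3"),
    ("b", "2020-01-01 04:55:00", "2"),
    ("c", "2020-01-01 04:58:00", "1"),
    ("a", "2020-01-01 05:16:00", "4"),
    ("b", "2020-01-01 05:16:00", "3"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (user_id TEXT, event_timestamp TEXT, step_name TEXT)"
)
conn.executemany("INSERT INTO events VALUES (?, ?, ?)", events)

# Each join requires the next step to happen strictly after the previous one.
# COUNT(DISTINCT ...) ignores NULLs, so users who dropped out are not counted.
funnel = conn.execute("""
    SELECT COUNT(DISTINCT s1.user_id) AS step_1,
           COUNT(DISTINCT s2.user_id) AS step_2,
           COUNT(DISTINCT s3.user_id) AS step_3,
           COUNT(DISTINCT s4.user_id) AS step_4
    FROM events s1
    LEFT JOIN events s2 ON s2.user_id = s1.user_id AND s2.step_name = '2'
                       AND s2.event_timestamp > s1.event_timestamp
    LEFT JOIN events s3 ON s3.user_id = s2.user_id AND s3.step_name = '3'
                       AND s3.event_timestamp > s2.event_timestamp
    LEFT JOIN events s4 ON s4.user_id = s3.user_id AND s4.step_name = '4'
                       AND s4.event_timestamp > s3.event_timestamp
    WHERE s1.step_name = '1'
""").fetchone()

print(funnel)  # (3, 2, 2, 1)
```

On this sample, three users start the funnel, two reach steps 2 and 3, and only user a reaches step 4.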
I am still learning to write queries like this, and the query below does not give me accurate results and is also not optimized.
So one of the main things that I am trying to do is create 3 date types:
sales_shift,
refund,
sales_reg
in my data sets.
To achieve this I am doing a union.
All 3 data sets query the same source, sales_main. The problem is that my 2nd data set, with 'Refund' as date_type, does not pull the right rows because of the ColumnA = 'B' condition.
I would like this set to look only at the records that were pulled in the 1st data set (i.e. 'Sales_shift' as date_type) and then apply the condition ColumnA = 'B'. How do I do that?
My 3rd data set, with 'sales_reg' as date_type, should be the same as the 1st set except with transaction_date not shifted. How do I do that? I was thinking of WHERE EXISTS but do not know how to apply it.
Any help would be awesome. Thanks much
create table sales_refund as select * from (
with table1 as (select 'Sales_shift' as date_type,
add_months(date_trunc('month',transaction_date),1) as event_date,
sku,
sum(sales) as Sales_total,
sum(refund) as Refund_total,
case when region in ('US','EU') then 'type A' else 'type B' end as Flag_field1
from sales_main where transaction_date >= add_months(current_date(),-6)
group by sku,
'Sales_shift',
add_months(date_trunc('month',transaction_date),1),
Flag_field1),
table2 as (select sku,
date_type,
event_date,
Sales_total,
Refund_total,
Flag_field1,
case when Flag_field1 = 'type B' or (Flag_field1 = 'type A' and Sales_total > 20000) then 'Yes' else 'No' end as Flag_field2
from table1 )
Select sku,
date_type,
event_date,
Sales_total,
Refund_total,
Flag_field1
from table2 where Flag_field2 = 'Yes' )
union all
select sku,
'Refund' as date_type,
date_trunc('month',refund_date) as event_date,
sum(sales) as Sales_total,
sum(refund) as Refund_total,
'X' as Flag_field1
from sales_main where transaction_date >= add_months(current_date(),-6) and ColumnA = 'B'
group by sku,
'Refund',
date_trunc('month',refund_date),
'X'
union all
select sku,
'Sales_Reg' as date_type,
date_trunc('month',transaction_date) as event_date,
sum(sales) as Sales_total,
sum(refund) as Refund_total,
'Y' as Flag_field1
from sales_main where transaction_date >= add_months(current_date(),-6)
group by sku,
'Sales_Reg',
date_trunc('month',transaction_date),
'Y'
Users table:
user_id name
1 john
2 mark
3 scott
4 piter
user_products table:
user_id product_id
1 2
1 4
1 5
2 4
2 5
2 7
3 1
3 5
3 4
3 2
4 1
As we can see, users 1, 2 and 3 all have products 4 and 5. So how do I select users who have at least 2 identical products?
One option is to use a self join:
SELECT
u.user_id,
u.name
FROM user_products up1
INNER JOIN user_products up2
ON up1.product_id = up2.product_id AND
up1.user_id <> up2.user_id
INNER JOIN Users u
ON up1.user_id = u.user_id
GROUP BY
u.user_id,
u.name
HAVING
COUNT(DISTINCT up1.product_id) > 1;
The idea here is to try to match each of the records for a given user to records from a different user, but having the same product. The aggregation step then checks if a given user still has at least two products after the inner join, which implies that in fact he does have at least two products in common with some other user.
The matching users here are: john, mark, and scott
A self join is the right approach, but I think the right logic is:
SELECT up1.user_id, up2.user_id
FROM user_products up1 JOIN
user_products up2
ON up1.product_id = up2.product_id AND
up1.user_id < up2.user_id
GROUP BY up1.user_id, up2.user_id
HAVING COUNT(DISTINCT up1.product_id) >= 2;
If you want the list of products, you can include array_agg(distinct up1.product_id).
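To see what the pairwise version returns on the sample data from the question, here is a quick sketch in Python's sqlite3 (used only as a convenient stand-in for your database). The `shared` column and the ORDER BY are additions of mine to make the output easier to read.

```python
import sqlite3

# user_products rows from the question: (user_id, product_id).
user_products = [
    (1, 2), (1, 4), (1, 5),
    (2, 4), (2, 5), (2, 7),
    (3, 1), (3, 5), (3, 4), (3, 2),
    (4, 1),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_products (user_id INTEGER, product_id INTEGER)")
conn.executemany("INSERT INTO user_products VALUES (?, ?)", user_products)

# up1.user_id < up2.user_id lists each pair once and never pairs a user
# with themselves; the HAVING keeps pairs sharing at least two products.
pairs = conn.execute("""
    SELECT up1.user_id, up2.user_id,
           COUNT(DISTINCT up1.product_id) AS shared
    FROM user_products up1
    JOIN user_products up2
      ON up1.product_id = up2.product_id
     AND up1.user_id < up2.user_id
    GROUP BY up1.user_id, up2.user_id
    HAVING COUNT(DISTINCT up1.product_id) >= 2
    ORDER BY up1.user_id, up2.user_id
""").fetchall()

print(pairs)  # [(1, 2, 2), (1, 3, 3), (2, 3, 2)]
```

Users 3 and 4 share only product 1, so that pair is filtered out, and user 4 does not appear at all.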
You can try this as well:
select distinct b.user_id from #UserProduct b
join (
select count(1) cnt,product_id from #UserProduct
group by product_id
having count(1) = 3) c
on b.product_id=c.product_id
I hope this one helps:
Select * into #User From (
Select '1' [user_id], 'john' [user_name] Union All
Select '2' [user_id], 'mark' [user_name] Union All
Select '3' [user_id], 'scott' [user_name] Union All
Select '4' [user_id], 'piter' [user_name]
) A
Select * into #UserProduct From (
Select '1' [user_id], '2' [product_id] union All
Select '1' [user_id], '4' [product_id] union All
Select '1' [user_id], '5' [product_id] union All
Select '2' [user_id], '4' [product_id] union All
Select '2' [user_id], '5' [product_id] union All
Select '2' [user_id], '7' [product_id] union All
Select '3' [user_id], '1' [product_id] union All
Select '3' [user_id], '5' [product_id] union All
Select '3' [user_id], '4' [product_id] union All
Select '3' [user_id], '2' [product_id] union All
Select '4' [user_id], '1' [product_id]
) A
Select U1.[user_id] From (
Select
A.[Product_id] Product_id1,
B.[Product_id] Product_id2
From (
Select [Product_id] From #UserProduct
Group By [Product_id]
) A
Left Join (
Select [Product_id] From #UserProduct
Group By [Product_id]
) B On 1 = 1
Where A.[Product_id] < B.[Product_id]
) Product2
Left Join (
Select [user_id] From #UserProduct
Group By [user_id]
) [User] On 1 = 1
Left Join #UserProduct U1 On U1.[user_id] = [User].[user_id] and U1.Product_id = Product_id1
Left Join #UserProduct U2 On U2.[user_id] = [User].[user_id] and U2.Product_id = Product_id2
Where (U1.[user_id] Is Not Null And U2.[user_id] Is Not Null)
Group By U1.[user_id]
Hope this helps:
Select * into #User From (
Select '1' [user_id], 'john' [user_name] Union All
Select '2' [user_id], 'mark' [user_name] Union All
Select '3' [user_id], 'scott' [user_name] Union All
Select '4' [user_id], 'piter' [user_name]
) A
Select * into #UserProduct From (
Select '1' [user_id], '2' [product_id] union All
Select '1' [user_id], '4' [product_id] union All
Select '1' [user_id], '5' [product_id] union All
Select '2' [user_id], '4' [product_id] union All
Select '2' [user_id], '5' [product_id] union All
Select '2' [user_id], '7' [product_id] union All
Select '3' [user_id], '1' [product_id] union All
Select '3' [user_id], '5' [product_id] union All
Select '3' [user_id], '4' [product_id] union All
Select '3' [user_id], '2' [product_id] union All
Select '4' [user_id], '1' [product_id]
) A
Select U.[user_id], U.[user_name], P.[qty_product]
From #User U
Left Join (
Select
[user_id], Count(*) [qty_product]
From #UserProduct
Group By [user_id]
Having Count(*) > 1
) P On P.[user_id] = U.[user_id]
For example, say a ticket is in New status. I want to get the MAX date of the New status and the MAX date of the Completed status, and calculate the difference between the two.
ex.
SELECT t.ID,
MAX(update_date) WHERE t.status = 'New' start_time,
MAX(update_date) WHERE t.status = 'Completed' stop_time,
DATEDIFF(second, MAX(update_date), MAX(update_date)) elapsed_sec
FROM xxx.dbo t
GROUP BY t.ID;
Thank you so much,
P
SELECT
t.id
,DATEDIFF(second, start_time, stop_time) elapsed_sec
FROM (
SELECT DISTINCT
ID,
(SELECT MAX(update_date) from xxx.dbo WHERE status = 'New' AND ID=t2.ID) start_time,
(SELECT MAX(update_date) from xxx.dbo WHERE status = 'Completed' AND ID=t2.ID) stop_time
FROM xxx.dbo t2
) t
I would suggest doing this using conditional aggregation rather than correlated subqueries:
SELECT t.ID,
MAX(CASE WHEN t.status = 'New' THEN update_date END) as start_time,
MAX(CASE WHEN t.status = 'Completed' THEN update_date END) as stop_time,
DATEDIFF(second,
MAX(CASE WHEN t.status = 'New' THEN update_date END),
MAX(CASE WHEN t.status = 'Completed' THEN update_date END)
) as elapsed_sec
FROM xxx.dbo t
GROUP BY t.ID;
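Here is a small sketch of the conditional aggregation pattern that you can run locally with Python's sqlite3. Everything specific in it is an assumption for illustration: the `tickets` table and its rows are invented, and since SQLite has no DATEDIFF, the elapsed seconds are computed from strftime('%s', ...) instead.

```python
import sqlite3

# Hypothetical ticket history: (id, status, update_date).
tickets = [
    (1, "New",       "2021-01-01 10:00:00"),
    (1, "New",       "2021-01-01 10:05:00"),
    (1, "Completed", "2021-01-01 10:15:00"),
    (2, "New",       "2021-01-01 09:00:00"),  # never completed
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tickets (id INTEGER, status TEXT, update_date TEXT)")
conn.executemany("INSERT INTO tickets VALUES (?, ?, ?)", tickets)

# CASE without ELSE yields NULL for non-matching rows, and MAX ignores NULLs,
# so each MAX only sees the dates of its own status.
rows = conn.execute("""
    SELECT id,
           MAX(CASE WHEN status = 'New' THEN update_date END) AS start_time,
           MAX(CASE WHEN status = 'Completed' THEN update_date END) AS stop_time,
           CAST(strftime('%s', MAX(CASE WHEN status = 'Completed'
                                        THEN update_date END)) AS INTEGER)
         - CAST(strftime('%s', MAX(CASE WHEN status = 'New'
                                        THEN update_date END)) AS INTEGER)
               AS elapsed_sec
    FROM tickets
    GROUP BY id
    ORDER BY id
""").fetchall()

for row in rows:
    print(row)
# (1, '2021-01-01 10:05:00', '2021-01-01 10:15:00', 600)
# (2, '2021-01-01 09:00:00', None, None)
```

A ticket with no Completed row gets NULL for stop_time and elapsed_sec, so you may want to filter or COALESCE those depending on what you need.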