SQL sum of column value, unique per user per day

I have a postgres table that looks like this:
id | user_id | state | created_at
The state can be any of the following:
new, paying, paid, completing, complete, payment_failed, completion_failed
I need a statement that returns a report with the following:
sum of all paid states by date
sum of all completed states by date
sum of all new, paying, completing states by date with only one per user per day to be counted
sum of all payment_failed, completion_failed by date with only one per user per day to be counted
So far I have this:
SELECT
DATE(created_at) AS date,
SUM(CASE WHEN state = 'complete' THEN 1 ELSE 0 END) AS complete,
SUM(CASE WHEN state = 'paid' THEN 1 ELSE 0 END) AS paid
FROM orders
WHERE created_at BETWEEN ? AND ?
GROUP BY DATE(created_at)
A sum of the in progress and failed states is easy enough by adding this to the select:
SUM(CASE WHEN state IN('new','paying','completing') THEN 1 ELSE 0 END) AS in_progress,
SUM(CASE WHEN state IN('payment_failed','completion_failed') THEN 1 ELSE 0 END) AS failed
But I'm having trouble figuring out how to count only one in_progress state and one failed state per user_id per day.
The reason I need this is to correct the failure rate in our stats: many users who trigger a failure or an incomplete order go on to trigger more, which inflates our failure rate.
Thanking you in advance.

SELECT created_at::date AS the_date
,SUM(CASE WHEN state = 'complete' THEN 1 ELSE 0 END) AS complete
,SUM(CASE WHEN state = 'paid' THEN 1 ELSE 0 END) AS paid
,COUNT(DISTINCT CASE WHEN state IN('new','paying','completing')
THEN user_id ELSE NULL END) AS in_progress
,COUNT(DISTINCT CASE WHEN state IN('payment_failed','completion_failed')
THEN user_id ELSE NULL END) AS failed
FROM orders
WHERE created_at BETWEEN ? AND ?
GROUP BY created_at::date
I use the_date as the alias because, although it is allowed, it is unwise to use the keyword date as an identifier.
You could use a similar technique for complete and paid, one is as good as the other there:
COUNT(CASE WHEN state = 'complete' THEN 1 ELSE NULL END) AS complete

Try something like:
SELECT
DATE(created_at) AS date,
SUM(CASE WHEN state = 'complete' THEN 1 ELSE 0 END) AS complete,
SUM(CASE WHEN state = 'paid' THEN 1 ELSE 0 END) AS paid,
COUNT(DISTINCT CASE WHEN state IN('new','paying','completing') THEN user_id ELSE NULL END) AS in_progress,
COUNT(DISTINCT CASE WHEN state IN('payment_failed','completion_failed') THEN user_id ELSE NULL END) AS failed
FROM orders
WHERE created_at BETWEEN ? AND ?
GROUP BY DATE(created_at);
The main idea: COUNT(DISTINCT ...) counts each unique user_id once and does not count NULL values.
Details: see the PostgreSQL documentation on aggregate functions and section 4.2.7, Aggregate Expressions.
The whole query, with COUNT used consistently and the CASE expressions simplified (a CASE without ELSE returns NULL):
SELECT
DATE(created_at) AS date,
COUNT(CASE WHEN state = 'complete' THEN 1 END) AS complete,
COUNT(CASE WHEN state = 'paid' THEN 1 END) AS paid,
COUNT(DISTINCT CASE WHEN state IN('new','paying','completing') THEN user_id END) AS in_progress,
COUNT(DISTINCT CASE WHEN state IN('payment_failed','completion_failed') THEN user_id END) AS failed
FROM orders
WHERE created_at BETWEEN ? AND ?
GROUP BY DATE(created_at);
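
To check the one-per-user-per-day behaviour on a small data set, here is a minimal sketch that runs the same query shape through Python's sqlite3 module. SQLite stands in for PostgreSQL here, and the sample rows, the daily_report helper and the date range are assumptions made up for illustration.

```python
import sqlite3

# Hypothetical sample rows: (user_id, state, created_at)
ROWS = [
    (1, "new", "2024-01-01 09:00:00"),
    (1, "paying", "2024-01-01 09:05:00"),          # same user, same day: counted once
    (2, "new", "2024-01-01 10:00:00"),
    (1, "payment_failed", "2024-01-01 09:10:00"),
    (1, "payment_failed", "2024-01-01 09:20:00"),  # repeated failure: counted once
    (2, "paid", "2024-01-01 10:30:00"),
    (2, "complete", "2024-01-01 11:00:00"),
    (1, "completion_failed", "2024-01-02 08:00:00"),
]

QUERY = """
SELECT DATE(created_at) AS the_date,
       COUNT(CASE WHEN state = 'complete' THEN 1 END) AS complete,
       COUNT(CASE WHEN state = 'paid' THEN 1 END) AS paid,
       COUNT(DISTINCT CASE WHEN state IN ('new', 'paying', 'completing')
                           THEN user_id END) AS in_progress,
       COUNT(DISTINCT CASE WHEN state IN ('payment_failed', 'completion_failed')
                           THEN user_id END) AS failed
FROM orders
WHERE created_at BETWEEN ? AND ?
GROUP BY DATE(created_at)
ORDER BY the_date
"""


def daily_report(rows, start, end):
    """Load rows into an in-memory orders table and run the report query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders ("
        "id INTEGER PRIMARY KEY, user_id INTEGER, state TEXT, created_at TEXT)"
    )
    conn.executemany(
        "INSERT INTO orders (user_id, state, created_at) VALUES (?, ?, ?)", rows
    )
    result = conn.execute(QUERY, (start, end)).fetchall()
    conn.close()
    return result


report = daily_report(ROWS, "2024-01-01 00:00:00", "2024-01-02 23:59:59")
for row in report:
    print(row)
```

User 1 has two in-progress rows and two failures on the first day, yet contributes only 1 to each of in_progress and failed for that date.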

Related

How to exclude 0 from count()? in sql?

I have the code below, with which I want to count the number of first purchases in a given period. My sales table has a column is_first_purchase, which is 0 if the buyer is not a first-time buyer.
For example:
buyer_id = 456391 is an existing buyer who made purchases on 2 different dates.
Hence the is_first_purchase column shows 0 for those purchases.
If I do a count() on is_first_purchase for buyer_id = 456391, it should return 0 instead of 2.
My query is as follows:
with first_purchases as
(select *,
case when is_first_purchase = 1 then 'Yes' else 'No' end as first_purchase
from sales)
select
count(case when first_purchase = 'Yes' then 1 else 0 end) as no_of_first_purchases
from first_purchases
where buyer_id = 456391
and date_id between '2021-02-01' and '2021-03-01'
order by 1 desc;
This does not return the intended output.
Appreciate if someone can help explain how to exclude is_first_purchase = 0 from the count, thanks.
COUNT counts every value that is not NULL, and 0 is not NULL. If you don't want a row to be counted, the CASE expression has to return NULL for it.
There are two ways to get the count you expect: use SUM, or keep COUNT but remove the else 0 part.
SUM(case when first_purchase = 'Yes' then 1 else 0 end) as no_of_first_purchases
COUNT(case when first_purchase = 'Yes' then 1 end) as no_of_first_purchases
From your question, I would combine CTE and main query as below
select
COUNT(case when is_first_purchase = 1 then 1 end) as no_of_first_purchases
from sales
where buyer_id = 456391
and date_id between '2021-02-01' and '2021-03-01'
order by 1 desc;
I think that you are using COUNT() when you want SUM().
with first_purchases as
(select *,
case when is_first_purchase = 1 then 'Yes' else 'No' end as first_purchase
from sales)
select
SUM(case when first_purchase = 'Yes' then 1 else 0 end) as no_of_first_purchases
from first_purchases
where buyer_id = 456391
and date_id between '2021-02-01' and '2021-03-01'
order by 1 desc;
You could simplify your query as:
SELECT COUNT(*) AS no_of_first_purchases
FROM sales
WHERE is_first_purchase = 1
AND buyer_id = 456391
AND date_id BETWEEN '2021-02-01' AND '2021-03-01'
ORDER BY 1 DESC;
It is better to avoid the use of functions like IF and CASE when it can be done with WHERE.
The simplest approach for Trino (f.k.a. Presto SQL) is to use an aggregate with a filter:
count(*) FILTER (WHERE first_purchase = 'Yes') AS no_of_first_purchases
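
The difference between the COUNT and SUM variants can be seen side by side in a minimal sketch using Python's sqlite3 module. The table layout and the three sample rows are assumptions; only the buyer_id and the date range come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (buyer_id INTEGER, date_id TEXT, is_first_purchase INTEGER)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        (456391, "2021-02-05", 0),  # existing buyer, two purchases
        (456391, "2021-02-20", 0),
        (111111, "2021-02-10", 1),  # hypothetical first-time buyer
    ],
)

row = conn.execute(
    """
    SELECT
        -- ELSE 0 yields a non-NULL value for every row, so COUNT counts all rows
        COUNT(CASE WHEN is_first_purchase = 1 THEN 1 ELSE 0 END) AS count_with_else_0,
        -- without ELSE the CASE yields NULL, which COUNT skips
        COUNT(CASE WHEN is_first_purchase = 1 THEN 1 END) AS count_without_else,
        -- SUM adds up the zeros instead of counting them
        SUM(CASE WHEN is_first_purchase = 1 THEN 1 ELSE 0 END) AS sum_with_else_0
    FROM sales
    WHERE buyer_id = ?
      AND date_id BETWEEN '2021-02-01' AND '2021-03-01'
    """,
    (456391,),
).fetchone()
conn.close()

print(row)
```

For this buyer the first column is 2 (the unwanted result), while the other two are 0.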

SQL - Dividing aggregated fields, very new to SQL

I have list of line items from invoices with a field that indicates if a line was delivered or picked up. I need to find a percentage of delivered items from the total number of lines.
SALES_NBR | Total | Deliveryrate
In the FULFILLMENT field, 1 = delivered and 0 = picked up.
SELECT SALES_NBR,
COUNT (ITEMS) as Total,
SUM (case when FULFILLMENT = '1' then 1 else 0 end) as delivered,
(SELECT delivered/total) as Deliveryrate
FROM Invoice_table
WHERE STORE IN '0123'
And SALE_DATE >='2020-02-01'
And SALE_DATE <='2020-02-07'
Group By SALES_NBR, Deliveryrate;
My query executes but never finishes for some reason. Is there any easier way to do this? Fulfillment field does not contain any NULL values.
Any help would be appreciated.
I need to find a percentage of delivered items from the total number of lines.
The simplest method is to use avg():
select SALES_NBR,
avg(fulfillment) as delivered_ratio
from Invoice_table
where STORE = '0123' and
SALE_DATE >='2020-02-01' and
SALE_DATE <='2020-02-07'
group by SALES_NBR;
I'm not sure if the group by sales_nbr is needed.
If you want to get a "nice" query, you can use subqueries like this:
select
qry.*,
1.0 * qry.delivered / qry.total as Deliveryrate -- 1.0 * avoids integer division
from (
select
SALES_NBR,
count(ITEMS) as Total,
sum(case when FULFILLMENT = '1' then 1 else 0 end) as delivered
from Invoice_table
where STORE = '0123'
and SALE_DATE >='2020-02-01'
and SALE_DATE <='2020-02-07'
group by SALES_NBR
) qry;
But I think this one, even though it is uglier, could perform faster:
select
SALES_NBR,
count(ITEMS) as Total,
sum(case when FULFILLMENT = '1' then 1 else 0 end) as delivered,
1.0 * sum(case when FULFILLMENT = '1' then 1 else 0 end) / count(ITEMS) as Deliveryrate -- 1.0 * avoids integer division
from Invoice_table
where STORE = '0123'
and SALE_DATE >='2020-02-01'
and SALE_DATE <='2020-02-07'
group by SALES_NBR;
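
One thing to watch for when dividing two aggregates is integer division: in several engines, dividing one integer count by another truncates the result. The sketch below illustrates this with Python's sqlite3 module. The column types and sample rows are assumptions, and other databases may behave differently (some return a decimal for `/`).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Invoice_table ("
    "SALES_NBR INTEGER, ITEMS TEXT, FULFILLMENT TEXT, STORE TEXT, SALE_DATE TEXT)"
)
conn.executemany(
    "INSERT INTO Invoice_table VALUES (?, ?, ?, ?, ?)",
    [
        (1, "a", "1", "0123", "2020-02-01"),
        (1, "b", "0", "0123", "2020-02-02"),
        (1, "c", "1", "0123", "2020-02-03"),
        (1, "d", "0", "0123", "2020-02-04"),
        (2, "e", "1", "0123", "2020-02-05"),
        (2, "f", "1", "9999", "2020-02-05"),  # other store, filtered out
    ],
)

rates = conn.execute(
    """
    SELECT SALES_NBR,
           COUNT(ITEMS) AS Total,
           SUM(CASE WHEN FULFILLMENT = '1' THEN 1 ELSE 0 END) AS delivered,
           -- integer / integer truncates in SQLite
           SUM(CASE WHEN FULFILLMENT = '1' THEN 1 ELSE 0 END) / COUNT(ITEMS) AS truncated_rate,
           -- multiplying by 1.0 first forces floating-point division
           1.0 * SUM(CASE WHEN FULFILLMENT = '1' THEN 1 ELSE 0 END) / COUNT(ITEMS) AS Deliveryrate
    FROM Invoice_table
    WHERE STORE = '0123'
      AND SALE_DATE >= '2020-02-01'
      AND SALE_DATE <= '2020-02-07'
    GROUP BY SALES_NBR
    ORDER BY SALES_NBR
    """
).fetchall()
conn.close()

for r in rates:
    print(r)
```

For sale 1, with 2 of 4 lines delivered, the truncated column is 0 while Deliveryrate is 0.5.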

SQL Sum Case when State equals

I'm looking for SQL coding that will sum the count when a certain state appears and have that sum for the state's particular row. I was able to create two columns to sum the count for a specific state but then that number is in every row.
For example, if there are 24 Arizona records then I want 24 to appear in every row for Arizona. And if there are 58 Oregon records then I want 58 to appear in every row for Oregon. And so on...
This is what I currently have
select appid, rcvddt, state,
sum(count(case when state = 'OR' then 1 else null end)) over () as ORcount,
sum(count(case when state = 'AZ' then 1 else null end)) over () as AZcount
from smbus.submissions
where (apprcvddt >= '2017-08-01' and apprcvddt <= '2018-08-31')
group by state, rcvddt, appid
order by (case when state is null then 1 else 0 end), state
I think you just want count() as a window function:
select s.*,
count(*) over (partition by state) as state_cnt
from smbus.submissions s
where apprcvddt >= '2017-08-01' and apprcvddt <= '2018-08-31'
order by (case when state is null then 1 else 0 end), state
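
A minimal sketch of the window-function approach, run through Python's sqlite3 module (window functions need SQLite 3.25 or later). The table is called submissions without the smbus schema prefix, and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE submissions (appid INTEGER, state TEXT, apprcvddt TEXT)")
conn.executemany(
    "INSERT INTO submissions VALUES (?, ?, ?)",
    [
        (1, "AZ", "2017-09-01"),
        (2, "AZ", "2018-01-15"),
        (3, "OR", "2017-10-10"),
        (4, "OR", "2016-01-01"),  # outside the date range, so not counted
        (5, None, "2018-02-02"),  # NULL state forms its own partition
    ],
)

per_state = conn.execute(
    """
    SELECT appid,
           state,
           -- every row keeps its identity and carries its state's row count
           COUNT(*) OVER (PARTITION BY state) AS state_cnt
    FROM submissions
    WHERE apprcvddt >= '2017-08-01' AND apprcvddt <= '2018-08-31'
    ORDER BY (CASE WHEN state IS NULL THEN 1 ELSE 0 END), state, appid
    """
).fetchall()
conn.close()

for r in per_state:
    print(r)
```

Because the WHERE clause is applied before the window is evaluated, the counts reflect only rows inside the date range.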

SQL select grouping and subtract

I have a table named source table with data like this:
I want a query that, for each product name, subtracts the rows with status minus from the rows with status plus, to give a result like this:
How can I do that in an SQL query? Thanks!
Group by the product and then use a conditional SUM()
select product,
sum(case when status = 'plus' then total else 0 end) -
sum(case when status = 'minus' then total else 0 end) as total,
sum(case when status = 'plus' then amount else 0 end) -
sum(case when status = 'minus' then amount else 0 end) as amount
from your_table
group by product
There is another method using join, which works for the particular data you have provided (which has one "plus" and one "minus" row per product):
select tplus.product, (tplus.total - tminus.total) as total,
(tplus.amount - tminus.amount) as amount
from t tplus join
t tminus
on tplus.product = tminus.product and
tplus.status = 'plus' and
tminus.status = 'minus';
Both this and the aggregation query work well for the data you have provided. In other words, there are multiple ways to solve this problem (each has its strengths).
You can query as below:
select product , sum (case when [status] = 'minus' then -Total else Total end) as Total
, sum (case when [status] = 'minus' then -Amount else Amount end) as SumAmount
from yourproduct
group by product
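
The signed conditional SUM can be tried out with a minimal sketch using Python's sqlite3 module. The table name source_table, the column types and the sample rows are assumptions, since the original data was not preserved.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE source_table (product TEXT, status TEXT, total INTEGER, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO source_table VALUES (?, ?, ?, ?)",
    [
        ("apple", "plus", 10, 100),
        ("apple", "minus", 3, 30),
        ("pear", "plus", 5, 50),
        ("pear", "minus", 5, 20),
    ],
)

net = conn.execute(
    """
    SELECT product,
           -- 'minus' rows are negated before summing, so one SUM does the subtraction
           SUM(CASE WHEN status = 'minus' THEN -total ELSE total END) AS total,
           SUM(CASE WHEN status = 'minus' THEN -amount ELSE amount END) AS amount
    FROM source_table
    GROUP BY product
    ORDER BY product
    """
).fetchall()
conn.close()

for r in net:
    print(r)
```

Unlike the join variant, this form also works when a product has several plus or minus rows, or only one of the two.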

Funnel query with Amazon Redshift / PostgreSQL

I'm trying to analyze a funnel using event data in Redshift and have difficulties finding an efficient query to extract that data.
For example, in Redshift I have:
timestamp         action        user id
----------------  ------------  -------
2015-05-05 12:00  homepage      1
2015-05-05 12:01  product page  1
2015-05-05 12:02  homepage      2
2015-05-05 12:03  checkout      1
I would like to extract the funnel statistics. For example:
homepage_count  product_page_count  checkout_count
--------------  ------------------  --------------
100             50                  25
Here homepage_count represents the distinct number of users who visited the homepage, product_page_count represents the distinct number of users who visited the product page after visiting the homepage, and checkout_count represents the number of users who checked out after visiting the homepage and the product page.
What would be the best query to achieve that with Amazon Redshift? Is it possible to do with a single query?
I think the best method might be to add flags to the data for the first visit of each type for each user and then use these for aggregation logic:
select sum(case when ts_homepage is not null then 1 else 0 end) as homepage_count,
sum(case when ts_productpage > ts_homepage then 1 else 0 end) as productpage_count,
sum(case when ts_checkout > ts_productpage and ts_productpage > ts_homepage then 1 else 0 end) as checkout_count
from (select userid,
min(case when action = 'homepage' then timestamp end) as ts_homepage,
min(case when action = 'product page' then timestamp end) as ts_productpage,
min(case when action = 'checkout' then timestamp end) as ts_checkout
from table t
group by userid
) t
The above answer is correct. I have adapted it for people using AWS Mobile Analytics with Redshift.
select sum(case when ts_homepage is not null then 1 else 0 end) as homepage_count,
sum(case when ts_productpage > ts_homepage then 1 else 0 end) as productpage_count,
sum(case when ts_checkout > ts_productpage and ts_productpage > ts_homepage then 1 else 0 end) as checkout_count
from (select client_id,
min(case when event_type = 'App Launch' then event_timestamp end) as ts_homepage,
min(case when event_type = 'SignUp Success' then event_timestamp end) as ts_productpage,
min(case when event_type = 'Start Quiz' then event_timestamp end) as ts_checkout
from awsma.v_event
group by client_id
) ts;
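
The first-timestamp-per-user approach can be checked against the four sample rows from the question. This is a minimal sketch using Python's sqlite3 module rather than Redshift; the table name events and the column names ts and user_id are assumptions.

```python
import sqlite3

# The sample rows from the question: (timestamp, action, user id)
EVENTS = [
    ("2015-05-05 12:00", "homepage", 1),
    ("2015-05-05 12:01", "product page", 1),
    ("2015-05-05 12:02", "homepage", 2),
    ("2015-05-05 12:03", "checkout", 1),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (ts TEXT, action TEXT, user_id INTEGER)")
conn.executemany("INSERT INTO events VALUES (?, ?, ?)", EVENTS)

funnel = conn.execute(
    """
    SELECT SUM(CASE WHEN ts_homepage IS NOT NULL THEN 1 ELSE 0 END) AS homepage_count,
           SUM(CASE WHEN ts_productpage > ts_homepage THEN 1 ELSE 0 END) AS productpage_count,
           SUM(CASE WHEN ts_checkout > ts_productpage
                     AND ts_productpage > ts_homepage THEN 1 ELSE 0 END) AS checkout_count
    FROM (
        -- one row per user holding the first time each step was reached
        SELECT user_id,
               MIN(CASE WHEN action = 'homepage' THEN ts END) AS ts_homepage,
               MIN(CASE WHEN action = 'product page' THEN ts END) AS ts_productpage,
               MIN(CASE WHEN action = 'checkout' THEN ts END) AS ts_checkout
        FROM events
        GROUP BY user_id
    ) AS per_user
    """
).fetchone()
conn.close()

print(funnel)
```

User 2 never reaches the product page, so the comparison against a NULL timestamp is not true and that user is counted only in homepage_count.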
For a more precise model, consider the case where the product page is opened twice: once before the home page and once after. This case should usually be counted as a conversion as well.
Redshift SQL query:
SELECT
COUNT(
DISTINCT CASE WHEN cur_homepage_time IS NOT NULL
THEN user_id END
) Step1,
COUNT(
DISTINCT CASE WHEN cur_homepage_time IS NOT NULL AND cur_productpage_time IS NOT NULL
THEN user_id END
) Step2,
COUNT(
DISTINCT CASE WHEN
cur_homepage_time IS NOT NULL AND cur_productpage_time IS NOT NULL AND cur_checkout_time IS NOT NULL
THEN user_id END
) Step3
FROM (
SELECT
user_id,
timestamp,
COALESCE(homepage_time,
LAG(homepage_time) IGNORE NULLS OVER(PARTITION BY user_id
ORDER BY timestamp)
) cur_homepage_time,
COALESCE(productpage_time,
LAG(productpage_time) IGNORE NULLS OVER(PARTITION BY user_id
ORDER BY timestamp)
) cur_productpage_time,
COALESCE(checkout_time,
LAG(checkout_time) IGNORE NULLS OVER(PARTITION BY user_id
ORDER BY timestamp)
) cur_checkout_time
FROM
(
SELECT
timestamp,
user_id,
(CASE WHEN event = 'homepage'
THEN timestamp END) homepage_time,
(CASE WHEN event = 'product page'
THEN timestamp END) productpage_time,
(CASE WHEN event = 'checkout'
THEN timestamp END) checkout_time
FROM events
WHERE timestamp > '2016-05-01' AND timestamp < '2017-01-01'
ORDER BY user_id, timestamp
) event_times
ORDER BY user_id, timestamp
) event_windows
This query fills each row's cur_homepage_time, cur_productpage_time and cur_checkout_time with the most recent timestamp at which the corresponding event occurred. A column is therefore non-NULL on a given row only if that event has already occurred for the user at or before that row's time.