Select customers who purchased BOTH this year and last

I'm fairly new to SQL and I'm using BigQuery to find customers who purchased in 2019 and 2018.
This is the query I'm using to find the customers who purchased in 2019.
SELECT DISTINCT contact_email
FROM (
SELECT *, ROW_NUMBER() OVER(PARTITION BY id) AS instance
FROM `table.orders`
) orders -- identify duplicate rows
WHERE
instance = 1
AND processed_at between '2019-01-01 00:00:00 UTC' AND '2020-01-01 00:00:00 UTC'
I'm struggling now with how to pull in distinct users who purchased this year AND last year. Can anyone point me in the correct direction? Thank you.

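Hmmm. I think I might do this as an aggregation query. Note that instance only exists in your subquery, so the deduplication has to stay, and BigQuery uses extract(year from ...) rather than year():
select o.contact_email
from (
select *, row_number() over (partition by id) as instance
from `table.orders`
) o -- identify duplicate rows, as in the question
where o.instance = 1 and
o.processed_at >= timestamp('2018-01-01') and
o.processed_at < timestamp('2020-01-01')
group by o.contact_email
having count(distinct extract(year from o.processed_at)) = 2;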

You can use aggregation:
select contact_email
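from (
select *, row_number() over (partition by id) as instance
from `table.orders`
) orders -- deduplicated rows, as in the question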
where
instance = 1
and processed_at >= timestamp('2018-01-01')
and processed_at < timestamp('2020-01-01')
group by contact_email
having
max(case
when processed_at >= timestamp('2019-01-01')
and processed_at < timestamp('2020-01-01')
then 1 end
) = 1
and max(case
when processed_at >= timestamp('2018-01-01')
and processed_at < timestamp('2019-01-01')
then 1 end
) = 1
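The aggregation idea from both answers can be tried out locally. This is only a minimal sketch: SQLite (via Python's sqlite3) stands in for BigQuery, the sample rows are made up, and strftime('%Y', ...) plays the role of extracting the year.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for BigQuery here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER, contact_email TEXT, processed_at TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, "a@example.com", "2018-03-01 10:00:00"),
        (2, "a@example.com", "2019-06-15 09:30:00"),
        (3, "b@example.com", "2019-02-02 12:00:00"),
        (4, "c@example.com", "2018-11-20 08:00:00"),
        (5, "c@example.com", "2018-12-24 18:45:00"),
    ],
)

# Keep only 2018 and 2019 orders, then require both years per customer.
query = """
SELECT contact_email
FROM orders
WHERE processed_at >= '2018-01-01' AND processed_at < '2020-01-01'
GROUP BY contact_email
HAVING COUNT(DISTINCT strftime('%Y', processed_at)) = 2
ORDER BY contact_email
"""
both_years = [row[0] for row in conn.execute(query)]
print(both_years)
```

Only the customer with at least one order in each year survives the HAVING filter; a customer with two orders in the same year does not.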

Related

How to create a Postgres query that will generate a series with calculated values

I have this query that I use to calculate returning customers (with more than one order)
SELECT COUNT(*)
FROM (SELECT customer_id, COUNT(*) as order_count
FROM orders
WHERE shop_id = #{shop_id}
AND orders.active = true
AND orders.created_at >= '#{from}'
AND orders.created_at < '#{to}'
GROUP BY customer_id
HAVING COUNT(orders) > 1
ORDER BY order_count) src;
And if I want new customers (that have only one order) I simply change this line:
HAVING COUNT(orders) = 1
Now, how can I generate a series between 2 given dates that will give me the number of new and returning customers for each day between the dates?
Expected result:
date       | new | returning
2022-01-01 | 2   | 3
2022-01-02 | 5   | 9
I have tried the query below, but it doesn't work at all (I get a syntax error near "from") and I'm not sure how to fix it. Any ideas?
select *, return_customers
from (select created_at, count(*) as order_count
from orders
where shop_id = 43
and created_at >= '2022-07-01'
and created_at < '2022-07-10'
group by customer_id
having count(orders) > 1
order by order_count) as return_customers from generate_series(timestamp '2007-01', timestamp '2022-07-11', interval '1 day')
as g(created_at)
left join (
select created_at::date,
count(*) as order_count
from orders
where shop_id 43
and created_at >= '2022-07-01'
and created_at < '2022-07-10'
group by customer_id
having count(orders) > 1
order by order_count
group by 1) o using (created_at)) sub
order by created_at desc;
This is based on your initial query, but without the HAVING clause and with conditional counts using FILTER instead. The ORDER BY inside src is redundant too.
SELECT src.order_date as "date",
COUNT(*) filter (where order_count > 1) as "returning",
COUNT(*) filter (where order_count = 1) as "new"
FROM
(
SELECT date_trunc('day', o.created_at)::date as order_date,
COUNT(*) as order_count
FROM orders o
WHERE o.shop_id = #{shop_id}
AND o.active
AND o.created_at >= '#{from}'
AND o.created_at < '#{to}'
GROUP BY o.customer_id, order_date
) as src
group by order_date;
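A small runnable sketch of the same conditional-count idea, under some assumptions: SQLite stands in for Postgres, the rows are invented, and SUM(CASE ...) is used in place of COUNT(*) FILTER (...) so that it also runs on older SQLite versions.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for Postgres here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, created_at TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [
        (1, "2022-01-01 09:00"),
        (1, "2022-01-01 15:00"),
        (2, "2022-01-01 10:00"),
        (3, "2022-01-02 11:00"),
        (3, "2022-01-02 12:00"),
        (3, "2022-01-02 13:00"),
        (1, "2022-01-02 08:00"),
        (4, "2022-01-02 16:00"),
    ],
)

# Inner query: orders per customer per day.
# Outer query: per day, count customers with exactly one order ("new")
# and customers with more than one order ("returning").
query = """
SELECT order_date,
       SUM(CASE WHEN order_count = 1 THEN 1 ELSE 0 END) AS new_customers,
       SUM(CASE WHEN order_count > 1 THEN 1 ELSE 0 END) AS returning_customers
FROM (
    SELECT date(created_at) AS order_date, customer_id, COUNT(*) AS order_count
    FROM orders
    GROUP BY customer_id, date(created_at)
) AS src
GROUP BY order_date
ORDER BY order_date
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Days without any orders do not appear in this result; to list them with zeros you would still need to left join against a generated series of dates.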

Based on today's date, how to get the date of the penultimate working day?

I'm trying to figure out how to get the penultimate working day relative to today's date.
In my query, I would like to add a where clause in which a specific date is <= today's date minus 2 working days.
Like:
SELECT
SalesAmount
,SalesDate
FROM mytable t
JOIN D_Calendar c ON t.Date = c.CAL_DATE
WHERE SalesDate <= GETDATE()- 2 workingdays
I have a calendar table with a column "isworkingDay" in my database, and I think I have to use it, but I don't know how.
The structure of this table is:
CAL_DATE   | DayIsWorkDay
2022-07-28 | 1
2022-07-29 | 1
2022-07-30 | 0
2022-07-31 | 0
2022-08-01 | 1
One example: Today is Monday, August 01, 2022. So based on today, I need to get Thursday, July 28 2022.
My desired result in the where clause should get me something like this:
where SalesDate<= Getdate() minus 2 workingdays
Thanks for your ideas!
You could use something like this:
SELECT t.SalesDate,
PreviousWorkingDay = d.CAL_DATE
FROM mytable t
CROSS APPLY
( SELECT c.CAL_DATE
FROM D_Calendar AS c
WHERE c.CAL_DATE < t.SalesDate
AND c.DayIsWorkDay = 1
ORDER BY c.CAL_DATE DESC OFFSET 1 ROWS FETCH NEXT 1 ROW ONLY
) AS d;
It uses OFFSET 1 ROWS within the CROSS APPLY to skip the most recent working day and return the penultimate one.
This is how I implemented the idea from @SMor:
SELECT
SalesAmount
,SalesDate
FROM mytable t
JOIN D_Calendar c ON t.Date = c.CAL_DATE
WHERE SalesDate <= (SELECT
MIN(t1.CAL_DATE) as MinDate
FROM
(SELECT TOP 2
[CAL_DATE]
FROM [DWH_PROD].[cbi].[D_Calendar]
WHERE CAL_DAYISWORKDAY = 1 AND CAL_DATE < DATEADD(dd,0,DATEDIFF(dd,0,GETDATE()))
ORDER BY CAL_DATE DESC
) t1)
Thank you for your ideas and recommendations!
You can use ROW_NUMBER() OVER(ORDER BY CAL_DATE desc) to number the top 2 rows and then take the row with number 2.
Example:
-- setup
Declare @D_Calendar as Table (CAL_DATE date, DayIsWorkDay bit)
insert into @D_Calendar values('2022-07-27', 1)
insert into @D_Calendar values('2022-07-28', 1)
insert into @D_Calendar values('2022-07-29', 1)
insert into @D_Calendar values('2022-07-30', 0)
insert into @D_Calendar values('2022-07-31', 0)
insert into @D_Calendar values('2022-08-01', 1)
Declare @RefDate DateTime = '2022-08-01 10:00'
-- example query
Select CAL_DATE
From
(Select top 2 ROW_NUMBER() OVER(ORDER BY CAL_DATE desc) AS BusinessDaysBack, CAL_DATE
from @D_Calendar
where DayIsWorkDay = 1
and CAL_DATE < Cast(@RefDate as Date)) as Data
Where BusinessDaysBack = 2
From there you can plug that into your where clause to get:
SELECT
SalesAmount
,SalesDate
FROM mytable t
WHERE SalesDate <= (Select CAL_DATE
From (Select top 2 ROW_NUMBER() OVER(ORDER BY CAL_DATE desc) AS BusinessDaysBack, CAL_DATE
from D_Calendar
where DayIsWorkDay = 1
and CAL_DATE < Cast(getdate() as Date)) as Data
Where BusinessDaysBack = 2)
Change the 2 to 3 (both in top 2 and in BusinessDaysBack = 2) to go three working days back, and so on.
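To check the "n working days back" logic against the example from the question, here is a minimal sketch. It assumes SQLite in place of SQL Server, so LIMIT 1 OFFSET n-1 takes the role of OFFSET ... FETCH, and working_days_back is a hypothetical helper name.

```python
import sqlite3

# Calendar rows from the answer's setup; SQLite stands in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE D_Calendar (CAL_DATE TEXT, DayIsWorkDay INTEGER)")
conn.executemany(
    "INSERT INTO D_Calendar VALUES (?, ?)",
    [
        ("2022-07-27", 1),
        ("2022-07-28", 1),
        ("2022-07-29", 1),
        ("2022-07-30", 0),
        ("2022-07-31", 0),
        ("2022-08-01", 1),
    ],
)


def working_days_back(conn, ref_date, n):
    """Return the n-th working day strictly before ref_date, or None."""
    row = conn.execute(
        """
        SELECT CAL_DATE
        FROM D_Calendar
        WHERE DayIsWorkDay = 1 AND CAL_DATE < ?
        ORDER BY CAL_DATE DESC
        LIMIT 1 OFFSET ?
        """,
        (ref_date, n - 1),  # skip the n-1 most recent working days
    ).fetchone()
    return row[0] if row else None


print(working_days_back(conn, "2022-08-01", 2))
```

For Monday 2022-08-01 this yields Thursday 2022-07-28, the date the question asks for; the result can then be compared against SalesDate in the where clause.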

Improve query to be less repetitive

Is there a way to improve this query? I see two problems here -
Repetitive code
Hard coded strings
The first CTE calculates count based on 18 months. The second CTE calculates count based on 12 months.
with month_18 as (
select proc_cd, count(*) as month_18 from
(
select distinct patient, proc_cd from
service
where proc_cd = '35'
and month_id >= (select month_id from annual)
and month_id <= '202009' --This month should be 18 months from the month above
and length(patient) > 1
) a
group by proc_cd
),
month_12 as
(
select proc_cd, count(*) as month_12 from
(
select distinct patient_id, proc_cd from
service
where proc_cd = '35'
and month_id >= '201910'
and month_id <= '202009' --This month should be 12 months from the month above
and length(patient) > 1
) a
group by proc_cd
)
select a.*, b.month_12 from
month_18 a
join month_12 b
on a.proc_cd = b.proc_cd
If I understand correctly, you can use conditional aggregation:
select proc_cd,
count(distinct patient) filter (where month_id >= (select month_id from annual) and month_id <= '202009') as month_18,
count(distinct patient) filter (where month_id >= '201910' and month_id <= '202009') as month_12
from service
where proc_cd = '35' and
length(patient) > 1
group by proc_cd;
If you have to deal with date arithmetic on the month ids, you can convert to a date, do the arithmetic and convert back to a string:
select to_char(to_date(month_id, 'YYYYMM') - interval '12 month', 'YYYYMM')
from (values ('202009')) v(month_id);
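If the month boundaries are computed in application code instead of SQL, the same arithmetic can be sketched in Python to avoid the hard-coded strings. shift_month_id is a hypothetical helper name, and the assumption is that month ids are always 'YYYYMM' strings.

```python
def shift_month_id(month_id: str, months: int) -> str:
    """Shift a 'YYYYMM' month id by a (possibly negative) number of months."""
    year, month = int(month_id[:4]), int(month_id[4:6])
    # Use a zero-based month index so divmod handles the year rollover.
    index = year * 12 + (month - 1) + months
    new_year, new_month0 = divmod(index, 12)
    return f"{new_year:04d}{new_month0 + 1:02d}"


end_month = "202009"
# Same result as the to_date / interval '12 month' expression above.
print(shift_month_id(end_month, -12))
# First month of an inclusive 12-month window ending at end_month.
print(shift_month_id(end_month, -11))
```

With an inclusive upper bound, a 12-month window starts 11 months before the end month, which matches the '201910' to '202009' range used in the question.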

How to define the filter in dates?

With the query, I basically want to compare avg_clicks at different time periods and set a filter according to the avg_clicks.
The below query gives us avg_clicks for each shop in January 2020. But I want to see the avg_clicks that is higher than 0 in January 2020.
Question 1: When I add the where avg_clicks > 0 in the query, I am getting the following error: Column 'avg_clicks' cannot be resolved. Where to put the filter?
SELECT AVG(a.clicks) AS avg_clicks,
a.shop_id,
b.shop_name
FROM
(SELECT SUM(clicks_on) AS clicks,
shop_id,
date
FROM X
WHERE site = 'com'
AND date >= CAST('2020-01-01' AS date)
AND date <= CAST('2020-01-31' AS date)
GROUP BY shop_id, date) as a
JOIN Y as b
ON a.shop_id = b.shop_id
GROUP BY a.shop_id, b.shop_name
Question 2: As I wrote, I want to compare two different times. And now, I want to see avg_clicks that is 0 in February 2020.
As a result, the desired output will show me the list of shops that had more than 0 clicks in January, but 0 clicks in February.
Hope I could explain my question. Thanks in advance.
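For Question 1, use a HAVING clause. WHERE is evaluated before aggregation, so the avg_clicks alias does not exist yet at that point; reading up on the logical execution order of a SQL statement will make it clearer why you are getting the avg_clicks error.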
SELECT AVG(a.clicks) AS avg_clicks,
a.shop_id,
b.shop_name
FROM
(SELECT SUM(clicks_on) AS clicks,
shop_id,
date
FROM X
WHERE site = 'com'
AND date >= '2020-01-01'
AND date <= '2020-01-31'
GROUP BY shop_id, date) as a
JOIN Y as b
ON a.shop_id = b.shop_id
GROUP BY a.shop_id, b.shop_name
HAVING AVG(a.clicks) > 0
For your Question 2, you can do something like this
SELECT
jan.shop_id,
b.shop_name,
jan_avg_clicks,
feb_avg_clicks
FROM
(
SELECT
AVG(clicks) AS jan_avg_clicks,
shop_id
FROM
(
SELECT
SUM(clicks_on) AS clicks,
shop_id,
date
FROM X
WHERE site = 'com'
AND date >= '2020-01-01'
AND date <= '2020-01-31'
GROUP BY
shop_id,
date
) as a
GROUP BY
shop_id
HAVING AVG(clicks) > 0
) jan
join
(
SELECT
AVG(clicks) AS feb_avg_clicks,
shop_id
FROM
(
SELECT
SUM(clicks_on) AS clicks,
shop_id,
date
FROM X
WHERE site = 'com'
AND date >= '2020-02-01'
AND date < '2020-03-01'
GROUP BY
shop_id,
date
) as a
GROUP BY
shop_id
HAVING AVG(clicks) = 0
) feb
on jan.shop_id = feb.shop_id
join Y as b
on jan.shop_id = b.shop_id
Start with conditional aggregation:
SELECT shop_id,
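-- divide by the number of distinct days in that month only, not in the whole range
SUM(CASE WHEN DATE_TRUNC('month', date) = '2020-01-01' THEN clicks_on END) / NULLIF(COUNT(DISTINCT CASE WHEN DATE_TRUNC('month', date) = '2020-01-01' THEN date END), 0) as avg_clicks_jan,
SUM(CASE WHEN DATE_TRUNC('month', date) = '2020-02-01' THEN clicks_on END) / NULLIF(COUNT(DISTINCT CASE WHEN DATE_TRUNC('month', date) = '2020-02-01' THEN date END), 0) as avg_clicks_feb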
FROM X
WHERE site = 'com' AND
date >= '2020-01-01' AND
date < '2020-03-01'
GROUP BY shop_id;
I'm not sure what comparison you want to make. But if you want to filter based on the aggregated values, use a HAVING clause.
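To illustrate filtering on aggregated values with HAVING, here is a minimal sketch of the "clicks in January but none in February" comparison. The assumptions: SQLite stands in for the actual engine, the rows are invented, and monthly totals are used instead of averages (for non-negative click counts, an average is above 0 exactly when the total is).

```python
import sqlite3

# Hypothetical sample data for table X; SQLite stands in for the real engine.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE X (shop_id INTEGER, date TEXT, clicks_on INTEGER)")
conn.executemany(
    "INSERT INTO X VALUES (?, ?, ?)",
    [
        (1, "2020-01-10", 5),
        (1, "2020-01-11", 3),
        (1, "2020-02-05", 0),
        (2, "2020-01-10", 4),
        (2, "2020-02-05", 2),
        (3, "2020-01-12", 0),
        (3, "2020-02-06", 0),
    ],
)

# WHERE restricts the rows that get aggregated; HAVING filters the
# aggregated per-shop values afterwards.
query = """
SELECT shop_id,
       SUM(CASE WHEN date <  '2020-02-01' THEN clicks_on ELSE 0 END) AS jan_clicks,
       SUM(CASE WHEN date >= '2020-02-01' THEN clicks_on ELSE 0 END) AS feb_clicks
FROM X
WHERE date >= '2020-01-01' AND date < '2020-03-01'
GROUP BY shop_id
HAVING SUM(CASE WHEN date <  '2020-02-01' THEN clicks_on ELSE 0 END) > 0
   AND SUM(CASE WHEN date >= '2020-02-01' THEN clicks_on ELSE 0 END) = 0
ORDER BY shop_id
"""
shops = conn.execute(query).fetchall()
print(shops)
```

Because the February total defaults to 0, this form also keeps shops that have no February rows at all, which an inner join between a January and a February subquery would drop.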

How to count continuous working days using Oracle SQL

My table looks like this:
WORKING_CALENDAR_TABLE
===================================================
EMPLOYEE ID | DATE | WORKING DAY (0: Holiday; 1: WORKING DAY)
===================================================
02661 2017/12/01 1
02661 2017/12/02 1
02661 2017/12/03 0
02661 2017/12/04 0
02661 2017/12/05 0
02661 2017/12/06 1
02661 2017/12/07 1
02661 2017/12/08 1
02661 2017/12/09 1
For the date 2017/12/10, my expected result is:
===================================================
EMPLOYEE ID | CONTINUOUS WORKING DAY
===================================================
02661 4
Can we get this result using Oracle SQL?
One way of doing it is:
1. Get the last day before your reference date (2017/12/10) on which the employee didn't work.
2. Count the rows after the date from step 1 and before your reference date. Each of those rows necessarily represents a working day for the employee.
Here's the code with some comments:
select employee_id, count(*) as continuous_days
from mytable
where employee_id = '02661'
and date > (select max(date)
from mytable
where employee_id = '02661' and working_day = 0 and date < date '2017-12-10')
and date < date '2017-12-10'
group by employee_id
/* (select max(date)
from mytable
where employee_id = '02661' and working_day = 0 and date < date '2017-12-10')
gets the last day before the reference date on which the employee didn't work. Each row with a date after max(date) represents a working day for the employee, because all of them have working_day = 1 */
Something to improve here:
If the employee had no holidays before the reference date, then
select max(date)
from mytable
where employee_id = '02661' and working_day = 0 and date < date '2017-12-10'
returns null, and the comparison date > null filters out every row. You can use COALESCE to substitute a fallback value in that case; I think a very early date would suffice here. You could use it this way:
COALESCE( (select max(date)
from mytable
where employee_id = '02661' and working_day = 0 and date < date '2017-12-10'), date '1900-01-01')
and the complete query would be:
select employee_id, count(*) as continuous_days
from mytable
where employee_id = '02661'
and date > COALESCE( (select max(date)
from mytable
where employee_id = '02661' and working_day = 0 and date < date '2017-12-10'), date '1900-01-01')
and date < date '2017-12-10'
group by employee_id
For a given date, a general solution is:
select employee_id, count(*)
from t
where employee_id = '02661' and
date < date '2017-12-10' and
date > (select max(t2.date)
from t t2
where t2.employee_id = t.employee_id and
t2.working_day = 0 and
t2.date < date '2017-12-10'
)
group by employee_id;
You can try this query:
SELECT T.EMPLOYEE_ID,
COUNT(0) AS CONTINUOUS_DAYS
FROM WORKING_CALENDAR_TABLE T
WHERE T.WORK_DATE BETWEEN (SELECT MAX(WORK_DATE) + 1
FROM WORKING_CALENDAR_TABLE I
WHERE I.EMPLOYEE_ID = T.EMPLOYEE_ID
AND I.WORK_DAY = 0)
AND DATE '2017-12-10' + 1
GROUP BY T.EMPLOYEE_ID
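The approach shared by these answers (find the last non-working day before the reference date, then count the rows after it) can be checked against the sample data from the question. This is only a sketch under assumptions: SQLite stands in for Oracle, dates are stored as ISO strings, and the table, column, and function names are illustrative.

```python
import sqlite3

# Sample rows from the question; SQLite stands in for Oracle.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE working_calendar "
    "(employee_id TEXT, work_date TEXT, working_day INTEGER)"
)
conn.executemany(
    "INSERT INTO working_calendar VALUES ('02661', ?, ?)",
    [
        ("2017-12-01", 1), ("2017-12-02", 1), ("2017-12-03", 0),
        ("2017-12-04", 0), ("2017-12-05", 0), ("2017-12-06", 1),
        ("2017-12-07", 1), ("2017-12-08", 1), ("2017-12-09", 1),
    ],
)


def working_streak(conn, employee_id, ref_date):
    """Count consecutive working days immediately before ref_date."""
    sql = """
    SELECT COUNT(*)
    FROM working_calendar
    WHERE employee_id = :emp
      AND work_date < :ref
      AND work_date > COALESCE(
            (SELECT MAX(work_date)
             FROM working_calendar
             WHERE employee_id = :emp
               AND working_day = 0
               AND work_date < :ref),
            '')  -- the empty string sorts before every ISO date string
    """
    return conn.execute(sql, {"emp": employee_id, "ref": ref_date}).fetchone()[0]


print(working_streak(conn, "02661", "2017-12-10"))
```

The COALESCE fallback covers reference dates with no earlier holiday, where MAX would otherwise return NULL and the comparison would filter out every row.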