How to get a timeline with zero observations for stores not selling - sql

I have an issue with a table I would like to create.
I have a table of accumulated sales for each store for each day on which the store had a sale. That means if a store didn't have a sale on a specific day, there is no row for that observation.
What I would like is a row for each store for each day, including the days on which the store didn't have any sales; in that case the daily sales would just be zero.
I've tried making a full outer join between a daily generate_series and the table mentioned above.
select
  timeline::date as date,
  store_rev.store_name,
  store_rev.store_daily_rev
FROM generate_series(
  '2017-03-01',
  now(),
  '1 day') AS timeline
FULL OUTER JOIN (select
    r.date,
    r.store_name,
    r.store_daily_rev
  FROM revenue r) store_rev ON timeline::date = store_rev.date
But this doesn't give me a row with zero sales for a store on the days it didn't sell anything.
Hope you guys can help me out. Thanks!

Are you sure that the date column in table revenue is of type DATE?
I modified your query; try this:
select
  timeline.date,
  store_rev.store_name,
  store_rev.store_daily_rev
from
(
  select
    timeline::date as date
  from
    generate_series('2017-03-01', now(), '1 day') AS timeline
) timeline
JOIN (select
    r.date,
    r.store_name,
    r.store_daily_rev
  FROM revenue r) store_rev ON timeline.date = store_rev.date::date;

I think you want a cross join followed by a left join. This will get all combinations:
select timeline.date, s.store_name,
coalesce(r.store_daily_rev, 0) as store_daily_rev
from generate_series('2017-03-01', now(), '1 day') as timeline(date) cross join
(select distinct r.store_name from revenue r) s left join
revenue r
on r.store_name = s.store_name and r.date = timeline.date;
The cross join generates all the rows (the combinations of dates and stores). The left join brings in the revenue values for the store/dates that have them.
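To see the cross join plus left join pattern end to end, here is a small sketch using Python's built-in sqlite3 module. It assumes a hypothetical in-memory revenue table with a few rows. SQLite has no generate_series by default, so a recursive CTE stands in for it and produces one row per day. Apart from that, the join logic mirrors the PostgreSQL query above.

```python
import sqlite3

# Hypothetical sample data: a row exists only for days on which a store sold something.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE revenue (date TEXT, store_name TEXT, store_daily_rev REAL);
    INSERT INTO revenue VALUES
        ('2017-03-01', 'A', 100.0),
        ('2017-03-03', 'A',  50.0),
        ('2017-03-02', 'B',  70.0);
""")

QUERY = """
WITH RECURSIVE timeline(day) AS (
    -- stand-in for generate_series: one row per day from :start to :end
    SELECT :start
    UNION ALL
    SELECT date(day, '+1 day') FROM timeline WHERE day < :end
)
SELECT t.day,
       s.store_name,
       COALESCE(r.store_daily_rev, 0) AS store_daily_rev
FROM timeline AS t
CROSS JOIN (SELECT DISTINCT store_name FROM revenue) AS s  -- every (day, store) pair
LEFT JOIN revenue AS r                                       -- attach sales where they exist
       ON r.store_name = s.store_name AND r.date = t.day
ORDER BY t.day, s.store_name
"""

rows = conn.execute(QUERY, {"start": "2017-03-01", "end": "2017-03-03"}).fetchall()
for row in rows:
    print(row)
```

Every (day, store) pair appears exactly once, and days without a sale come back as 0 instead of being missing. Because the store list is derived from revenue itself, a store with no sales at all in the range would still be absent.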

Related

Postgresql left join date_trunc with default values

I have 3 tables that I'm querying based on different conditions. I have from and to parameters, which I use to define the time range in which I look for data in those tables.
For instance, if from is '2020-07-01' and to is '2020-08-01', I expect to receive the rows of the tables grouped by week. If some weeks don't have any records, I want to return 0 for them; if several tables have records for the same week, I'd like to sum them.
Currently I have this:
SELECT d.day, COALESCE(t.total, 0)
FROM (
SELECT day::date
FROM generate_series(timestamp '2020-07-01',
timestamp '2020-08-01',
interval '1 week') day
) d
LEFT JOIN (
SELECT date AS day,
SUM(total) AS total
FROM table1
WHERE id = '1'
AND date BETWEEN '2020-07-01' AND '2020-08-01'
GROUP BY day
) t USING (day)
ORDER BY d.day;
I'm generating a series of weekly dates, and on top of that I'm adding a left join. For some reason, it only works if the dates match exactly; otherwise COALESCE(t.total, 0) returns 0 even if the SUM(total) for that week is not 0.
I'm applying other left joins to other tables in the same query in the same way, so I run into the same problem there.
Please see if this works for you. Whenever you find yourself aggregating more than once, ask yourself whether it is necessary.
Rather than try to match on discrete days, use time ranges.
with limits as (
select '2020-07-01'::timestamp as dt_start,
'2020-08-01'::timestamp as dt_end
), weeks as (
SELECT x.day::date as day, least(x.day::date + 7, dt_end::date) as day_end
FROM limits l
CROSS JOIN LATERAL
generate_series(l.dt_start, l.dt_end, interval '1 week') as x(day)
WHERE x.day::date != least(x.day::date + 7, dt_end::date)
), t1 as (
select w.day,
sum(coalesce(t.total, 0)) as t1total
from weeks w
left join table1 t
on t.id = 1
and t.date >= w.day
and t.date < w.day_end
group by w.day
), t2 as (
select w.day,
sum(coalesce(t.sum_measure, 0)) as t2total
from weeks w
left join table2 t
on t.something = 'whatever'
and t.date >= w.day
and t.date < w.day_end
group by w.day
)
select t1.day,
t1.t1total,
t2.t2total
from t1
join t2 on t2.day = t1.day;
You can keep adding tables like that with CTEs.
My earlier example, which used multiple left joins, was flawed: with no join conditions between the left-joined tables, it multiplies the rows.
There is an interesting corner case, for example 2019-02-01 to 2019-03-01, which returns an empty interval as the last week. I have updated the query to filter that out.
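Here is a runnable sketch of the range-join idea, again using Python's sqlite3 module with hypothetical table1/table2 rows. It makes three assumptions: dates are ISO-8601 text, SQLite's date() arithmetic replaces the PostgreSQL interval math, and a recursive CTE replaces generate_series. Each week is the half-open range [day, day_end). A row dated 2020-07-03 is therefore counted in the week starting 2020-07-01, whereas an equality join on the week's first day would miss it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, date TEXT, total INTEGER);
    CREATE TABLE table2 (something TEXT, date TEXT, sum_measure INTEGER);
    INSERT INTO table1 VALUES
        (1, '2020-07-03', 10),   -- mid-week: an equality join on the week start would miss it
        (1, '2020-07-09',  5),
        (2, '2020-07-09', 99);   -- different id, excluded by the join condition
    INSERT INTO table2 VALUES
        ('whatever', '2020-07-02', 7),
        ('whatever', '2020-07-30', 1);
""")

QUERY = """
WITH RECURSIVE series(day) AS (
    -- like generate_series(start, end, '1 week'): the end is inclusive
    SELECT :start
    UNION ALL
    SELECT date(day, '+7 days') FROM series WHERE date(day, '+7 days') <= :end
), weeks AS (
    -- half-open week [day, day_end), clipped at the end date
    SELECT day, MIN(date(day, '+7 days'), :end) AS day_end
    FROM series
    WHERE day < :end          -- drop the empty trailing week (the corner case)
), t1 AS (
    SELECT w.day, SUM(COALESCE(t.total, 0)) AS t1total
    FROM weeks w
    LEFT JOIN table1 t
           ON t.id = 1 AND t.date >= w.day AND t.date < w.day_end
    GROUP BY w.day
), t2 AS (
    SELECT w.day, SUM(COALESCE(t.sum_measure, 0)) AS t2total
    FROM weeks w
    LEFT JOIN table2 t
           ON t.something = 'whatever' AND t.date >= w.day AND t.date < w.day_end
    GROUP BY w.day
)
SELECT t1.day, t1.t1total, t2.t2total
FROM t1 JOIN t2 ON t2.day = t1.day
ORDER BY t1.day
"""

def weekly_totals(start, end):
    """Return (week_start, table1 total, table2 total) for every week in [start, end)."""
    return conn.execute(QUERY, {"start": start, "end": end}).fetchall()

rows = weekly_totals("2020-07-01", "2020-08-01")
for row in rows:
    print(row)
```

Each table gets its own CTE, aggregated once per week, before the CTEs are joined on the week start. This is what keeps rows from multiplying. Calling weekly_totals("2019-02-01", "2019-03-01") exercises the corner case and returns four weeks, not five.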

SQLite - Use a CTE to divide a query

Quick question for the SQL experts out there. I feel a bit stupid, because I have the feeling I am close to the solution but have not been able to reach it.
Given the two queries below, how can I use the first one to divide a column of the second one?
WITH month_usage AS
(SELECT strftime('%m', starttime) AS month, SUM(slots) AS total
FROM Bookings
GROUP BY month)
SELECT strftime('%m', b.starttime) AS month, f.name, SUM(slots) AS usage
FROM Bookings as b
LEFT JOIN Facilities as f
ON b.facid = f.facid
GROUP BY name, month
ORDER BY month
The first query computes the total for each month.
The second one has the usage column that I want to divide by the total of each month to get the percentage.
When I JOIN the two using month as the key, the results come out wrong. Any suggestions?
I want to divide the usage column by the total of each month to get the percentage
Just use window functions:
SELECT
strftime('%m', b.starttime) AS month,
f.name,
SUM(slots) AS usage,
1.0 * SUM(slots)
/ SUM(SUM(slots)) OVER (PARTITION BY strftime('%m', b.starttime)) AS ratio
FROM Bookings as b
LEFT JOIN Facilities as f
ON b.facid = f.facid
GROUP BY name, month
ORDER BY month
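To check the windowed ratio, here is a small sketch using Python's sqlite3 module with hypothetical Bookings/Facilities rows. Window functions need SQLite 3.25 or later. The inner SUM(b.slots) is the per-facility, per-month aggregate. The outer SUM(...) OVER (PARTITION BY month) adds those aggregates up across all facilities in the same month, so no second query or join is needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Facilities (facid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Bookings (facid INTEGER, starttime TEXT, slots INTEGER);
    INSERT INTO Facilities VALUES (0, 'Tennis Court 1'), (1, 'Squash Court');
    INSERT INTO Bookings VALUES
        (0, '2012-07-03 11:00:00', 3),
        (1, '2012-07-05 09:00:00', 1),
        (0, '2012-08-01 10:00:00', 2);
""")

QUERY = """
SELECT strftime('%m', b.starttime) AS month,
       f.name,
       SUM(b.slots) AS usage,
       -- inner SUM: per facility and month; outer SUM ... OVER: total for that month
       1.0 * SUM(b.slots)
           / SUM(SUM(b.slots)) OVER (PARTITION BY strftime('%m', b.starttime)) AS ratio
FROM Bookings AS b
LEFT JOIN Facilities AS f ON b.facid = f.facid
GROUP BY f.name, month
ORDER BY month, f.name
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

The 1.0 * factor forces floating-point division. Without it, integer division would turn every ratio below 1 into 0.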

How to Average Number of Chats per Day on LEFT JOIN table in Snowflake SQL?

In Snowflake SQL, how do I average the number of video chats per day using a field from a table I left joined to the rest of the query?
I'm thinking I have to use a SUM function to total the number of video chats, then aggregate the number of video chats for each date, and then divide by 30 days (the rolling date range I specified throughout my entire query).
Any help would be appreciated as deadlines are approaching. Thank you.
SELECT DISTINCT
t1."pid",
IFNULL(t2."VideoChats",0),
t3."SFUser",
t3."TotalProviders",
t4."dimaccount.practice_specialty",
t5."Account: CMRR",
t6."CreatedDate",
t7."stg_sf_case.Date_Time_Resolved__c",
t8."stg_sf_case.Closed_Date",
t9."pid"
FROM (SELECT "pid"
FROM "EDW_PROD"."PUBLIC"."STG_MYSQL_PROVIDERMODULES" AS a
WHERE a."active"
AND a."status" = 'PURCHASED'
AND a."module_id" = '14'
GROUP BY a."pid"
) t1
LEFT JOIN (SELECT "started_at",
"pid",
COUNT(*) AS "VideoChats"
FROM "EDW_PROD"."PUBLIC"."STG_MYSQL_VIDEOCHATROOM" AS b
LEFT JOIN "EDW_PROD"."PUBLIC"."DIMACCOUNT" AS dimaccount
ON b."pid" = dimaccount."PID"
WHERE b."started_at" >= DATE_TRUNC('month', CURRENT_DATE())
AND b."started_at" < DATEADD('month', 1, DATE_TRUNC('month', CURRENT_DATE()))
AND dimaccount."CurrentRow" = 'Y'
GROUP BY b."pid", b."started_at"
) t2 ON t1."pid" = t2."pid"
For a rolling average you probably want to use a window function. Something along these lines:
-- 29 preceding rows plus the current row = a 30-row window
SELECT AVG("VideoChats") over (partition by "pid" order by "started_at" rows between 29 preceding and current row) as "AvgVideoChats"
-- I saw a post about AVG not allowing a sliding window, so you may have to do this instead
SELECT SUM("VideoChats") over (partition by "pid" order by "started_at" rows between 29 preceding and current row) / 30.0 as "AvgVideoChats"
You may need to do this in a query wrapped around your t2 query, and adjust your date filters so that there are values available for averaging. Note that ROWS counts rows, not days, so this is a 30-day average only if there is exactly one row per pid per day. I'm not clear enough on what your query is doing with dates, or what results you are looking for, to be sure.
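Here is a sketch of the trailing-window idea using Python's sqlite3 module. It rests on three assumptions: a hypothetical videochat table, a 3-row window instead of 30 to keep the data short, and chats rolled up first to one row per pid per calendar day. The question's t2 groups by the raw started_at timestamp, which would give one row per timestamp instead. The query compares AVG over the frame with SUM over the frame divided by the window size. The two agree once the frame is full but differ for the first rows of each pid, where fewer rows exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE videochat (pid INTEGER, started_at TEXT)")

# Hypothetical chats: pid 1 has 1, 2, 3 and 6 chats on four consecutive days; pid 2 has 5 on one day.
daily_counts = {(1, "2021-03-01"): 1, (1, "2021-03-02"): 2, (1, "2021-03-03"): 3,
                (1, "2021-03-04"): 6, (2, "2021-03-01"): 5}
conn.executemany(
    "INSERT INTO videochat VALUES (?, ?)",
    [(pid, f"{day} 10:00:00") for (pid, day), n in daily_counts.items() for _ in range(n)],
)

QUERY = """
WITH daily AS (
    -- one row per pid per calendar day
    SELECT pid, date(started_at) AS day, COUNT(*) AS video_chats
    FROM videochat
    GROUP BY pid, date(started_at)
)
SELECT pid, day, video_chats,
       -- 2 preceding rows + the current row = a 3-row window (29 PRECEDING gives 30 rows)
       AVG(video_chats) OVER (PARTITION BY pid ORDER BY day
                              ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS avg_in_frame,
       SUM(video_chats) OVER (PARTITION BY pid ORDER BY day
                              ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) / 3.0 AS sum_over_n
FROM daily
ORDER BY pid, day
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

AVG divides by the number of rows actually in the frame, while SUM / N always divides by N. Which one is right depends on whether missing early days should count as zero.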

How can I adjust this query to produce a result that shows the average on a month-by-month basis over time

I'm having a hard time producing the desired result with one of my queries.
I'd like to display the average revenue generated per user on a rolling month-by-month basis, based on the following criterion:
User must belong to a particular cohort, defined as a user who has booked more than 20 times in the last 90 days (so, for example, a user only gets counted in the January cohort if they have booked more than 20 times across the months of November, December and January)
The below query is what I have now, which pulls the average revenue per user for the January cohort:
WITH bookings as (SELECT u.id as user_id, count(*) as bookings_last_90, sum(total)/100 as revenue_last_90
FROM revenue r
JOIN users u on r.user_id = u.id
WHERE (CAST(r.created_at AS date) BETWEEN CAST((NOW() + INTERVAL '-90 day') AS date)
AND CAST(now() AS date))
GROUP BY u.id
HAVING COUNT(*) >= 20)
SELECT avg(b.revenue_last_90)
FROM bookings b;
I essentially need to adapt the above query to pull the average revenue per cohort user on a rolling month-by-month basis, keeping intact the past-90-day window for the cohort definition.
The general approach when you've got a query that works with one timestamp is:
Generate a list of dates or timestamps to use in a table, view, CTE, etc
Join to the list of timestamps
Replace the timestamp you're using with the timestamp from the list
With no schema, I can't test it, but the query may look something like this:
WITH --first generate list of dates from the created_at field in revenue
month_list as (select date_trunc('month' , r.created_at) as m from revenue r group by 1 )
--then use that in the bookings query
, bookings as (SELECT u.id as user_id, m.m as cohort_month, count(*) as bookings_last_90, sum(total)/100 as revenue_last_90
FROM revenue r
JOIN users u on r.user_id = u.id
join month_list m on r.created_at >= m.m + interval '-60 day' and r.created_at < m.m + interval '1 month'
GROUP BY u.id , m.m
HAVING COUNT(*) >= 20)
--finally, use the date in the result query
SELECT avg(b.revenue_last_90), cohort_month
FROM bookings b group by cohort_month;
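Here is a sketch of the "generate the dates, join to them, replace the timestamp" recipe on hypothetical data, using Python's sqlite3 module. It departs from the query above in a few labeled ways:

- SQLite's date(m, '-60 days') and date(m, '+1 month') stand in for the interval arithmetic.
- The window is half-open, so the first instant of the next month is not counted twice.
- The users join is dropped, because only the user id is needed.
- The threshold is a parameter set to 2 instead of 20 to keep the data small.
- SUM(total) / 100.0 avoids the integer division that SUM(total)/100 performs when total is an integer column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE revenue (user_id INTEGER, created_at TEXT, total INTEGER);
    INSERT INTO revenue VALUES
        (1, '2021-01-10 09:00:00', 1000),
        (1, '2021-02-10 09:00:00', 2000),
        (1, '2021-03-10 09:00:00', 3000),
        (2, '2021-03-05 09:00:00',  500);
""")

QUERY = """
WITH month_list AS (
    -- step 1: one row per month that appears in the data
    SELECT DISTINCT date(created_at, 'start of month') AS m FROM revenue
), bookings AS (
    -- step 2: join each booking to every cohort month whose window contains it
    SELECT r.user_id,
           ml.m AS cohort_month,
           COUNT(*) AS bookings_last_90,
           SUM(r.total) / 100.0 AS revenue_last_90
    FROM revenue r
    JOIN month_list ml
      ON r.created_at >= date(ml.m, '-60 days')
     AND r.created_at <  date(ml.m, '+1 month')
    GROUP BY r.user_id, ml.m
    HAVING COUNT(*) >= :min_bookings
)
-- step 3: the month from the list replaces the single NOW()-based timestamp
SELECT cohort_month, AVG(revenue_last_90) AS avg_revenue
FROM bookings
GROUP BY cohort_month
ORDER BY cohort_month
"""

rows = conn.execute(QUERY, {"min_bookings": 2}).fetchall()
print(rows)
```

No user reaches the threshold in January, so that month is absent from the result rather than reported as 0.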

How can I make generate_series work for part of a month?

I'm using filled_months to fill blank months in and group data by months. The problem is I can't seem to make it work for querying partial months (e.g. 2016-09-01 to 2016-09-15), it always counts the full month. Can someone point me in the right direction?
with filled_months AS
(SELECT
month,
0 AS blank_count
FROM generate_series(date_trunc('month',date('2016-09-01')), date_trunc('month',date('2016-09-15')), '1 month') AS
month)
SELECT to_char(mnth.month, 'YYYY Mon') AS month_year,
count(distinct places.id)
FROM filled_months mnth
left outer join restaurants
ON date_trunc('month', restaurants.created_at) = mnth.month
left outer join places
ON restaurants.places_id = places.id
WHERE places.id IS NULL OR restaurants.id IS NULL
GROUP BY mnth.month
ORDER BY mnth.month
If I understand your problem correctly, I don't think you even need generate_series here. You can get by with a date-range filter in your WHERE clause and let GROUP BY handle the rest:
SELECT
to_char(r.created_at, 'YYYY Mon') AS month,
count(distinct p.id)
FROM
restaurants r
left join places p ON r.places_id = p.id
WHERE
-- half-open range: also catches rows created later on 9/15 if created_at is a timestamp
r.created_at >= '2016-09-01' and r.created_at < '2016-09-16'
GROUP BY month
ORDER BY month
I tested this on three records dated 9/1/16, 9/15/16 and 9/30/16, and it gave a count of two. When I expanded the range to 9/30, it correctly gave three.
DISCLAIMER: I didn't understand what this meant:
places.id IS NULL OR restaurants.id IS NULL
If this doesn't work, then perhaps you can add some sample data to your question, along with some expected results.
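If created_at is a timestamp rather than a date, a filter written as BETWEEN '2016-09-01' AND '2016-09-15' stops at the start of the 15th. Anything created later that day is silently dropped. The sketch below contrasts that with a half-open range (>= start AND < the day after the end). It uses Python's sqlite3 module with hypothetical restaurants/places rows, and strftime('%Y-%m', ...) stands in for PostgreSQL's to_char.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE places (id INTEGER PRIMARY KEY);
    CREATE TABLE restaurants (id INTEGER PRIMARY KEY, places_id INTEGER, created_at TEXT);
    INSERT INTO places VALUES (1), (2), (3);
    INSERT INTO restaurants VALUES
        (1, 1, '2016-09-01 08:00:00'),
        (2, 2, '2016-09-15 18:30:00'),   -- late on the last day of the range
        (3, 3, '2016-09-30 12:00:00');
""")

def count_by_month(where_sql, params=()):
    # where_sql is a fixed, trusted string in this demo; values go through params.
    query = f"""
        SELECT strftime('%Y-%m', r.created_at) AS month,
               COUNT(DISTINCT p.id) AS place_count
        FROM restaurants r
        LEFT JOIN places p ON r.places_id = p.id
        WHERE {where_sql}
        GROUP BY month
        ORDER BY month
    """
    return conn.execute(query, params).fetchall()

# The upper bound '2016-09-15' sorts before '2016-09-15 18:30:00', so that row is lost.
between_rows = count_by_month("r.created_at BETWEEN '2016-09-01' AND '2016-09-15'")
# Half-open range: everything from the start up to, but not including, the next day.
half_open_rows = count_by_month("r.created_at >= ? AND r.created_at < ?", ("2016-09-01", "2016-09-16"))
full_month_rows = count_by_month("r.created_at >= ? AND r.created_at < ?", ("2016-09-01", "2016-10-01"))
print(between_rows, half_open_rows, full_month_rows)
```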