30-day rolling/moving sum when current date is missing - sql

I have a table (view_of_referred_events) which stores the number of visitors for a given page.
date country_id referral product_id visitors
2016-04-01 216 pl 113759 1
2016-04-03 216 pl 113759 1
2016-04-06 216 pl 113759 13
2016-04-07 216 pl 113759 10
I want to compute the 30-day rolling/moving sum for this product, even for those days which are missing. So the end result should be something like the following:
date country_id referral product_id cumulative_visitors
2016-04-01 216 pl 113759 1
2016-04-02 216 pl 113759 1
2016-04-03 216 pl 113759 2
2016-04-04 216 pl 113759 2
2016-04-05 216 pl 113759 2
2016-04-06 216 pl 113759 15
2016-04-07 216 pl 113759 25
Now, this is a simplistic representation, because I have tens of different country_id, referral and product_id values. I can't pre-create a table with all possible combinations of {date, country_id, referral, product_id} because it would become unmanageable given the size of the table. I also don't want a row in the final table if that specific {date, country_id, referral, product_id} combination didn't exist before.
I was wondering whether there is an easy way to tell Impala to use the value of the previous row (the previous day) when view_of_referred_events has no visitors for that day.
I wrote this query, where list_of_dates is a table with a list of days from April 1st to April 7th.
select
t.`date`,
t.country_id,
t.referral,
t.product_id,
sum(visitors) over (partition by t.country_id, t.referral, t.product_id order by t.`date`
rows between 30 preceding and current row) as cumulative_sum_visitors
from (
select
d.`date`,
re.country_id,
re.referral,
re.product_id,
sum(visitors) as visitors
from list_of_dates d
left outer join view_of_referred_events re on d.`date` = re.`date`
and re.referral = "pl"
and re.product_id = "113759"
and re.country_id = "216"
group by d.`date`, re.country_id, re.referral, re.product_id
) t
order by t.`date` asc;
This returns something similar to what I want, but not exactly that.
date country_id referral product_id cumulative_visitors
2016-04-01 216 pl 113759 1
2016-04-02 NULL NULL NULL NULL
2016-04-03 216 pl 113759 2
2016-04-04 NULL NULL NULL NULL
2016-04-05 NULL NULL NULL NULL
2016-04-06 216 pl 113759 15
2016-04-07 216 pl 113759 25

I have added another subquery to get the value from the last row in the partition. I am not sure which version of Hive/Impala you are using; the syntax is last_value(column_name, ignore null values true/false).
I assume you are trying to find the cumulative counts over 30 days (a month). I recommend using a month field to group the rows. The month could come either from your dimension table list_of_dates or from substr(date, 1, 7); you can then get the cumulative count of visitors over rows between unbounded preceding and current row.
query:
select
`date`,
country_id,
referral,
product_id,
sum(visitors) over (partition by country_id, referral, product_id order by `date`
rows between 30 preceding and current row) as cumulative_sum_visitors
from (select
t.`date`,
-- get the last not null value from the partition window w for country_id, referral & product_id
last_value(t.country_id, true) over w as country_id,
last_value(t.referral, true) over w as referral,
last_value(t.product_id, true) over w as product_id,
-- "visitors = null" is never true, so test with "is null"
if(t.visitors is null, 0, t.visitors) as visitors
from (
select
d.`date`,
re.country_id,
re.referral,
re.product_id,
sum(visitors) as visitors
from list_of_dates d
left outer join view_of_referred_events re on d.`date` = re.`date`
and re.referral = "pl"
and re.product_id = "113759"
and re.country_id = "216"
group by d.`date`, re.country_id, re.referral, re.product_id
) t
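-- no "partition by" here: rows with NULL keys would form their own partition and never be filled
-- (only valid because the join above is filtered to a single combination)
window w as (order by t.`date`
rows between unbounded preceding and unbounded following)) t1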
order by `date` asc;

I'm not sure how good the performance will be, but you can do this by aggregating the data twice: the second aggregation shifts each date forward by 30 days and negates the count.
Something like this:
with t as (
select d.`date`, re.country_id, re.referral, re.product_id,
sum(visitors) as visitors
from list_of_dates d left outer join
view_of_referred_events re
on d.`date` = re.`date` and
re.referral = 'pl' and
re.product_id = 113759 and
re.country_id = 216
group by d.`date`, re.country_id, re.referral, re.product_id
)
select date, country_id, referral, product_id,
sum(sum(visitors)) over (partition by country_id, referral, product_id order by date) as visitors
from ((select date, country_id, referral, product_id, visitors
from t
) union all
(select date_add(date, 30), country_id, referral, product_id, -visitors
from t
)
) tt
group by date, country_id, referral, product_id;
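To make the "add it, then subtract it again N days later" idea easier to check, here is a minimal sketch using SQLite through Python's sqlite3 module rather than Impala (this needs SQLite 3.25+ for window functions). The table name events, the recursive calendar CTE standing in for list_of_dates, and the helper rolling_sum are all assumptions made for the illustration. It handles a single {country_id, referral, product_id} combination and assumes the calendar starts on or before the first event.

```python
import sqlite3

# Sample rows from the question: (date, visitors) for one combination.
sample = [
    ("2016-04-01", 1),
    ("2016-04-03", 1),
    ("2016-04-06", 13),
    ("2016-04-07", 10),
]

def rolling_sum(rows, start, end, window_days):
    """Return [(date, rolling_sum)] for every day in [start, end]."""
    con = sqlite3.connect(":memory:")
    con.execute("create table events (d text, visitors integer)")
    con.executemany("insert into events values (?, ?)", rows)
    sql = """
    with recursive calendar(d) as (      -- stands in for list_of_dates
        select :start
        union all
        select date(d, '+1 day') from calendar where d < :end
    ),
    deltas as (
        select d, visitors as delta from events
        union all                        -- cancel each count once it leaves the window
        select date(d, '+' || :n || ' days'), -visitors from events
    ),
    daily as (
        select d, sum(delta) as delta from deltas group by d
    )
    select c.d, sum(coalesce(daily.delta, 0)) over (order by c.d) as rolling
    from calendar c
    left join daily on daily.d = c.d
    order by c.d
    """
    params = {"start": start, "end": end, "n": window_days}
    return con.execute(sql, params).fetchall()

for row in rolling_sum(sample, "2016-04-01", "2016-04-07", 30):
    print(row)
```

With a 30-day window the week in the question never loses a value, so the result is just the running total from the desired output. Passing a smaller window_days (for example 3) makes the negative rows kick in, which is an easy way to sanity-check the logic.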

Related

SQL join type to have as many rows as each date for each customer

I have these two tables
date
2017-1
2017-2
2017-3
2017-4
2017-5
2017-6
and
date customer no_orders city_code
2017-1 156 1 DNZ
2017-3 156 5 LON
2017-5 156 4 DNZ
2017-6 156 2 YQB
How can I join these two tables to have one row for each customer for all the dates same as below?
If on a date, the customer has no order, its no_order should be 0 and its city_code should be the city_code of the previous date.
date customer no_orders city_code_2
2017-1 156 1 DNZ
2017-2 156 0 DNZ
2017-3 156 5 LON
2017-4 156 0 LON
2017-5 156 4 DNZ
2017-6 156 2 YQB
This code by @Tim Biegeleisen resolved part 1 of my question, but now I want to handle both parts together.
SELECT d.date, c.customer, COALESCE(t.no_orders, 0) AS no_orders
FROM dates d
CROSS JOIN (SELECT DISTINCT customer FROM customers) c
LEFT JOIN customers t
ON t.date = d.date AND
t.customer = c.customer
ORDER BY c.customer, d.date;
We can use the following calendar table approach:
SELECT d.date, c.customer, COALESCE(t.no_orders, 0) AS no_orders
FROM dates d
CROSS JOIN (SELECT DISTINCT customer FROM customers) c
LEFT JOIN customers t
ON t.date = d.date AND
t.customer = c.customer
ORDER BY c.customer, d.date;
This assumes that the first table is called dates and the second table customers. The query works by using a cross join to generate the set of all dates and customers. We then left join to the second table to bring in the number of orders for a given customer on a given date. Missing order counts are reported as zero.
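For the second part of the question (carrying city_code forward from the previous date), one possible extension is a correlated subquery that picks the most recent row at or before each date. The sketch below runs it on the sample data with SQLite via Python's sqlite3 module. It is only an illustration under a few assumptions: the table names dates and customers are the same guesses as above, and the date strings are assumed to sort chronologically as text (true for 2017-1 to 2017-6, but not once 2017-10 appears).

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
create table dates (date text);
create table customers (date text, customer integer, no_orders integer, city_code text);
insert into dates values ('2017-1'), ('2017-2'), ('2017-3'), ('2017-4'), ('2017-5'), ('2017-6');
insert into customers values
    ('2017-1', 156, 1, 'DNZ'),
    ('2017-3', 156, 5, 'LON'),
    ('2017-5', 156, 4, 'DNZ'),
    ('2017-6', 156, 2, 'YQB');
""")

filled = con.execute("""
select d.date,
       c.customer,
       coalesce(t.no_orders, 0) as no_orders,
       -- most recent city_code at or before this date for the same customer
       (select p.city_code
        from customers p
        where p.customer = c.customer and p.date <= d.date
        order by p.date desc
        limit 1) as city_code_2
from dates d
cross join (select distinct customer from customers) c
left join customers t
       on t.date = d.date and t.customer = c.customer
order by c.customer, d.date
""").fetchall()

for row in filled:
    print(row)
```

A date that comes before the customer's first order has nothing to carry forward, so city_code_2 would be NULL there.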

Select count & sum from order table, as well as count & sum from event table where order_id matches & event = expiry

I have 2 tables, one containing Order information and one containing Order Event information, example structure below:
Orders Table:
merchant_id order_id amount order_date
111111 123456 100 2021-07-01
111111 789012 50 2021-07-20
111111 642443 75 2021-08-12
Events Table:
merchant_id order_id event amount date
111111 789012 EXPIRY 50 2021-08-03
111111 642443 EXPIRY 75 2021-08-28
Desired Output:
I am trying to get a breakdown by Merchant Id and month of:
Order Count
Order Sum
Expiry Count (how many of the orders placed in that month have expired regardless of date expired)
Expiry Sum (value of the expiry count above)
Example Output:
merchant_id order_month order_count order_sum expiry_count expiry_sum
111111 7 3 150 1 50
111111 8 1 75 1 50
I have tried a few queries with no luck; the furthest I've gotten is:
select o.merchant_id, extract(month from o.order_date) as order_month, count(o.order_id) as order_count, sum(o.order_amount) as order_sum, count(e.order_id) as expiry_count, sum(e.amount) as expiry_sum
from orders o
left join events e on e.order_id = o.order_id
where o.merchant_id = '111111'
and o.order_date >= '2021-07-01'
group by o.merchant_id, order_month
order by o.merchant_id, order_month
However, that outputs the exact same values for order_count and expiry_count, as well as for order_sum and expiry_sum. Additionally, I only need events where event = 'EXPIRY', but I get no results when I add that filter.
Any help would be much appreciated!
Add the condition on event to the join (not the where):
select o.merchant_id, extract(month from o.order_date) as order_month, count(o.order_id) as order_count, sum(o.order_amount) as order_sum, count(e.order_id) as expiry_count, sum(e.amount) as expiry_sum
from orders o
left join events e on e.order_id = o.order_id
and e.event = 'EXPIRY'
where o.merchant_id = '111111'
and o.order_date >= '2021-07-01'
group by o.merchant_id, order_month
order by o.merchant_id, order_month
If you put a condition on an outer joined table in the where clause, you force the join to behave as an inner join (as if you deleted the left keyword).
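To see the difference on the sample rows from the question, here is a small sketch that runs the same query twice in SQLite (via Python's sqlite3 module), once with the event filter in the join condition and once in the where clause. Two things are assumptions of the sketch: the month is extracted with strftime because SQLite has no extract(), and the order amount column is taken to be amount, as in the sample table.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
create table orders (merchant_id integer, order_id integer, amount integer, order_date text);
create table events (merchant_id integer, order_id integer, event text, amount integer, date text);
insert into orders values
    (111111, 123456, 100, '2021-07-01'),
    (111111, 789012,  50, '2021-07-20'),
    (111111, 642443,  75, '2021-08-12');
insert into events values
    (111111, 789012, 'EXPIRY', 50, '2021-08-03'),
    (111111, 642443, 'EXPIRY', 75, '2021-08-28');
""")

QUERY = """
select o.merchant_id,
       cast(strftime('%m', o.order_date) as integer) as order_month,
       count(o.order_id) as order_count,
       sum(o.amount) as order_sum,
       count(e.order_id) as expiry_count,
       sum(e.amount) as expiry_sum
from orders o
left join events e on e.order_id = o.order_id {on_filter}
where o.merchant_id = 111111 {where_filter}
group by o.merchant_id, order_month
order by o.merchant_id, order_month
"""

def run(on_filter="", where_filter=""):
    """Run the report with the event filter placed in ON or in WHERE."""
    sql = QUERY.format(on_filter=on_filter, where_filter=where_filter)
    return con.execute(sql).fetchall()

in_on = run(on_filter="and e.event = 'EXPIRY'")
in_where = run(where_filter="and e.event = 'EXPIRY'")
print("filter in ON:   ", in_on)
print("filter in WHERE:", in_where)
```

With the filter in ON, the July order that never expired still counts toward order_count and order_sum. With the filter in WHERE, that order disappears from the result altogether, which is the inner-join behaviour described above.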

Find out per day the first trip duration and last trip duration of a bike

Find out per day first trip duration and last trip duration of a bike.
Table
trip_id bike-id trip_date trip_starttime trip_duration
1 1 2018-12-01 12:00:00.0000000 10
2 2 2018-12-01 14:00:00.0000000 25
3 1 2018-12-01 14:30:00.0000000 5
4 3 2018-12-02 05:00:00.0000000 12
5 3 2018-12-02 19:00:00.0000000 37
6 1 2018-12-02 20:30:00.0000000 20
Expected Result
trip_date bike-id first_trip_duration last_trip_duration
2018-12-01 1 10 5
2018-12-01 2 25 25
2018-12-02 1 20 20
2018-12-02 3 12 37
I tried it with below code,
select A.trip_date,A.[bike-id],A.trip_duration AS MinDuration,B.trip_duration AS MaxDuration from
(SELECT T1.trip_date,T1.[bike-id],T1.trip_duration FROM TRIP T1
INNER JOIN (
select trip_date,[bike-id] , min(trip_starttime) AS Mindate
from Trip group by trip_date,[bike-id] ) T2
oN T1.[bike-id]=T2.[bike-id] AND T1.trip_date=T2.trip_date AND t1.trip_starttime=t2.Mindate ) as A
inner join
(SELECT T1.trip_date,T1.[bike-id],T1.trip_duration FROM TRIP T1
INNER JOIN (
select trip_date,[bike-id] , MAX(trip_starttime) AS Maxdate
from Trip group by trip_date,[bike-id] ) T2
oN T1.[bike-id]=T2.[bike-id] AND T1.trip_date=T2.trip_date AND t1.trip_starttime=t2.Maxdate ) as B
ON A.[bike-id]=B.[bike-id] AND A.trip_date=B.trip_date
order by A.trip_date,A.[bike-id]
I would also like to see other ways to do this; please help.
First, determine for each date/bike the first and last trip.
Then, determine the duration of these trips.
Something like this might do it (I didn't test it though):
SELECT minmax.trip_date,
minmax.bike_id,
first.trip_duration AS first_trip_duration,
last.trip_duration AS last_trip_duration
FROM (SELECT trip_date,
bike_id,
MIN(trip_starttime) AS first_trip,
MAX(trip_starttime) AS last_trip
FROM trip_table
GROUP BY trip_date,
bike_id
) minmax
JOIN trip_table first
ON minmax.trip_date = first.trip_date
AND minmax.bike_id = first.bike_id
AND minmax.first_trip = first.trip_starttime
JOIN trip_table last
ON minmax.trip_date = last.trip_date
AND minmax.bike_id = last.bike_id
AND minmax.last_trip = last.trip_starttime
This assumes the necessary indexes exist on the table, preferably a unique index on (bike_id, trip_date, trip_starttime).
select trip_date,bike_id
,first_value(trip_duration) over(partition by trip_date,bike_id order by trip_starttime) as first_trip_duration
,first_value(trip_duration) over(partition by trip_date,bike_id order by trip_starttime desc) as last_trip_duration
from trip;
Assuming window functions are supported, this can be done with first_value.
select distinct
trip_date
,bike_id
,first_value(trip_duration) over(partition by trip_date,bike_id order by trip_starttime) as first_trip_duration
,first_value(trip_duration) over(partition by trip_date,bike_id order by trip_starttime desc) as last_trip_duration
from trip
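As a quick check of the first_value approach against the sample data, here is a sketch that runs it in SQLite through Python's sqlite3 module (SQLite 3.25+ is needed for window functions). The table is assumed to be named trip with a bike_id column, as in the answers above, and an explicit order by is added only so the output order is predictable.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
create table trip (trip_id integer, bike_id integer, trip_date text,
                   trip_starttime text, trip_duration integer);
insert into trip values
    (1, 1, '2018-12-01', '12:00:00', 10),
    (2, 2, '2018-12-01', '14:00:00', 25),
    (3, 1, '2018-12-01', '14:30:00', 5),
    (4, 3, '2018-12-02', '05:00:00', 12),
    (5, 3, '2018-12-02', '19:00:00', 37),
    (6, 1, '2018-12-02', '20:30:00', 20);
""")

per_day = con.execute("""
select distinct
       trip_date,
       bike_id,
       -- earliest start time in the partition comes first
       first_value(trip_duration) over (
           partition by trip_date, bike_id order by trip_starttime
       ) as first_trip_duration,
       -- reversing the order makes the latest trip the "first" value
       first_value(trip_duration) over (
           partition by trip_date, bike_id order by trip_starttime desc
       ) as last_trip_duration
from trip
order by trip_date, bike_id
""").fetchall()

for row in per_day:
    print(row)
```

Without distinct the query returns one row per trip rather than one row per day and bike, which is why the second version of the query adds it.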

MSSQL: How to display TOP 10 items from GROUP BY query?

I have a query which displays this as result:
Year-Month SN_NAME Raised Incidents
2015-11 A 14494
2015-11 B 8432
2015-11 D 5496
2015-11 G 4778
2015-11 H 4554
2015-11 C 4203
2015-11 X 3477
.......+ thousands more rows for 2015-11
2015-12 A 3373
2015-12 B 3322
2015-12 H 2814
2015-12 D 2745
......+ thousands more rows for 2015-12
......+ thousands more rows for 2016-01 - 2016-10
2016-11 B 2645
2016-11 C 2571
2016-11 E 2475
2016-11 D 2466
....+ thousands more rows for 2016-11
I need to select the TOP 10 SN_NAME values by Raised Incidents count from the last month and then show their counts for the previous 12 months.
The query I use to display above result is this one:
DECLARE @startOfCurrentMonth DATETIME
SET @startOfCurrentMonth = DATEADD(month, DATEDIFF(month, 0, CURRENT_TIMESTAMP), 0)
SELECT
CONVERT(char(7),IM.SN_SYS_CREATED_ON,121) as "Year-Month"
,CI.SN_NAME
,COUNT(IM.SN_NUMBER) as "Raised Incidents"
FROM [dbo].[tab_IM_Incident] IM
LEFT JOIN [dbo].[tab_SNOW_CMDB_CI] CI on IM.SN_CMDB_CI = CI.SN_SYS_ID
WHERE
IM.SN_SYS_CREATED_ON >= DATEADD(month, -13, @startOfCurrentMonth) AND IM.SN_SYS_CREATED_ON < @startOfCurrentMonth
AND (IM.SN_U_SUB_STATE <> 'Cancelled' OR IM.SN_U_SUB_STATE IS NULL)
GROUP BY
CONVERT(char(7),IM.SN_SYS_CREATED_ON,121)
, CI.SN_NAME
ORDER BY
CONVERT(char(7),IM.SN_SYS_CREATED_ON,121)
, COUNT(IM.SN_NUMBER) DESC
The problem is that I don't know how to limit each month's values to the top 10 only: the query returns around 200,000 rows in total, while it should return 13 x 10 = 130 rows.
The expected output is exactly as on top of the question, but limited to only top 10 rows per month for last 13 months.
Please advise.
If I understand correctly, you want the 10 SN_NAME values with the most incidents in the most recent month, and then their incident counts for all months in the data.
Here is one method:
with t as (
your query here
)
select t.*
from (select top 10 t.*
from t
order by YearMonth desc, RaisedIncidents desc
) top10 left join
t
on t.sn_name = top10.sn_name
order by t.YearMonth desc, t.RaisedIncidents desc;
Note that the top 10 is not filtering on the latest month. Instead, it orders by the latest month and then by RaisedIncidents. This assumes that there are at least 10 SN_NAME rows in the most recent month.
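Here is a scaled-down sketch of the same idea, using a handful of the figures from the question and a top 2 instead of a top 10. It runs on SQLite via Python's sqlite3 module, so limit stands in for SQL Server's top. The table name monthly and its column names are assumptions that play the role of the "your query here" result.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- stands in for the grouped result of the original query
create table monthly (year_month text, sn_name text, raised integer);
insert into monthly values
    ('2015-11', 'A', 14494), ('2015-11', 'B', 8432), ('2015-11', 'C', 4203),
    ('2015-12', 'A', 3373),  ('2015-12', 'B', 3322),
    ('2016-11', 'B', 2645),  ('2016-11', 'C', 2571),
    ('2016-11', 'E', 2475),  ('2016-11', 'D', 2466);
""")

def top_n_history(n):
    """History for the n names that rank highest in the latest month."""
    sql = """
    with top_n as (
        select sn_name
        from monthly
        order by year_month desc, raised desc   -- latest month first
        limit :n
    )
    select m.year_month, m.sn_name, m.raised
    from top_n
    join monthly m on m.sn_name = top_n.sn_name
    order by m.year_month desc, m.raised desc
    """
    return con.execute(sql, {"n": n}).fetchall()

for row in top_n_history(2):
    print(row)
```

Name A leads in the older months but is absent from the latest one, so it is left out. Only B and C, the top two of 2016-11, are carried back through the earlier months.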

SQL combining a left join with a where between

I have two queries that work perfectly.
SELECT fy.date_stop as pend
FROM account_fiscalyear fy
WHERE <any date> BETWEEN fy.date_start AND fy.date_stop
This returns the last date of the fiscal year in which <any date> can be found.
and
SELECT a.id as id, COALESCE(MAX(l.date),a.purchase_date) AS date
FROM account_asset_asset a
LEFT JOIN account_move_line l ON (l.asset_id = a.id)
WHERE a.id <some condition>
GROUP BY a.id, a.purchase_date
This returns results similar to the following, giving the asset id and the purchase date or last depreciation date for the asset.
61 2014-09-01
96 2014-09-01
115 2015-02-25
181 2015-11-27
122 2015-04-03
87 2014-09-01
67 2014-09-01
207 2016-09-09
54 2014-09-01
159 2015-08-25
163 2015-08-19
....
The result I want is the asset id but this time with the last day of the financial year that the purchase date or last depreciation date can be found in. I just don't seem to be able to find a way to combine the two queries.
Solved it.
SELECT a.id as id, COALESCE(MAX(l.date), a.purchase_date) as date
FROM
(SELECT ass.id as id, fy.date_stop as purchase_date
FROM account_fiscalyear fy, account_asset_asset ass
WHERE ass.purchase_date BETWEEN fy.date_start AND fy.date_stop) a
LEFT JOIN
(SELECT mvl.asset_id as asset_id, fy.date_stop as date
FROM account_move_line mvl, account_period per, account_fiscalyear fy
WHERE mvl.period_id = per.id AND per.fiscalyear_id = fy.id) l
ON (l.asset_id = a.id)
GROUP BY a.id, a.purchase_date
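For comparison, a different way to combine the two original queries is to keep the second one as a subquery and join its result to account_fiscalyear with BETWEEN as the join condition. This is not the query above, only an alternative sketch, run here on SQLite via Python's sqlite3 module. The table and column names come from the question, but the fiscal years, move lines and trimmed-down columns are made-up sample data for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
create table account_fiscalyear (date_start text, date_stop text);
create table account_asset_asset (id integer, purchase_date text);
create table account_move_line (asset_id integer, date text);
-- hypothetical fiscal years running July to June
insert into account_fiscalyear values
    ('2014-07-01', '2015-06-30'),
    ('2015-07-01', '2016-06-30'),
    ('2016-07-01', '2017-06-30');
insert into account_asset_asset values
    (61, '2014-09-01'), (115, '2015-02-25'), (207, '2016-09-09');
-- hypothetical depreciation line for asset 61 only
insert into account_move_line values (61, '2015-08-31');
""")

year_ends = con.execute("""
select d.id, fy.date_stop as pend
from (
    -- second query from the question: last depreciation date, else purchase date
    select a.id, coalesce(max(l.date), a.purchase_date) as date
    from account_asset_asset a
    left join account_move_line l on l.asset_id = a.id
    group by a.id, a.purchase_date
) d
join account_fiscalyear fy
  on d.date between fy.date_start and fy.date_stop
order by d.id
""").fetchall()

for row in year_ends:
    print(row)
```

Because this is an inner join, an asset whose date falls outside every fiscal year drops out of the result. Switching to a left join would keep it with a NULL year end.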