How can I aggregate values based on an arbitrary monthly cycle date range in SQL?

Given a table as such:
# SELECT * FROM payments ORDER BY payment_date DESC;
id | payment_type_id | payment_date | amount
----+-----------------+--------------+---------
4 | 1 | 2019-11-18 | 300.00
3 | 1 | 2019-11-17 | 1000.00
2 | 1 | 2019-11-16 | 250.00
1 | 1 | 2019-11-15 | 300.00
14 | 1 | 2019-10-18 | 130.00
13 | 1 | 2019-10-18 | 100.00
15 | 1 | 2019-09-18 | 1300.00
16 | 1 | 2019-09-17 | 1300.00
17 | 1 | 2019-09-01 | 400.00
18 | 1 | 2019-08-25 | 400.00
(10 rows)
How can I SUM the amount column based on an arbitrary date range, not simply a date truncation?
Taking the example of a date range beginning on the 15th of a month, and ending on the 14th of the following month, the output I would expect to see is:
payment_type_id | payment_date | amount
-----------------+--------------+---------
1 | 2019-11-15 | 1850.00
1 | 2019-10-15 | 230.00
1 | 2019-09-15 | 2600.00
1 | 2019-08-15 | 800.00
Can this be done in SQL, or is this something that's better handled in code? I would traditionally do this in code, but I'm looking to extend my knowledge of SQL (which at this stage isn't much!).

Click demo:db<>fiddle
You can use a combination of a CASE expression and the date_trunc() function:
SELECT
payment_type_id,
CASE
WHEN date_part('day', payment_date) < 15 THEN
date_trunc('month', payment_date) + interval '-1month 14 days'
ELSE date_trunc('month', payment_date) + interval '14 days'
END AS payment_date,
SUM(amount) AS amount
FROM
payments
GROUP BY 1,2
date_part('day', ...) returns the day of the month.
The CASE expression separates the dates before the 15th of the month from those on or after it.
date_trunc('month', ...) converts every date in a month to the first of that month.
So, if the date is before the 15th of the current month, it should be grouped to the 15th of the previous month (this is what + interval '-1month 14 days' calculates: +14, because date_trunc() truncates to the 1st of the month: 1 + 14 = 15). Otherwise it is grouped to the 15th of the current month.
After calculating these payment_date values, you can use them for simple grouping.

I would simply subtract 14 days, truncate the month, and add 14 days back:
select payment_type_id,
date_trunc('month', payment_date - interval '14 day') + interval '14 day' as month_15,
sum(amount)
from payments
group by payment_type_id, month_15
order by payment_type_id, month_15;
No conditional logic is actually needed for this.
Here is a db<>fiddle.
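The shift, truncate, shift back idea can also be checked outside the database. The following is a minimal Python sketch, assuming a cycle that starts on the 15th. cycle_start is a hypothetical helper name (not part of either answer), and the rows are the sample payments from the question.

```python
from collections import defaultdict
from datetime import date, timedelta
from decimal import Decimal


def cycle_start(d: date, start_day: int = 15) -> date:
    """Return the first day of the monthly cycle that contains d.

    Assumes start_day is between 1 and 28 so that it exists in every month.
    """
    offset = timedelta(days=start_day - 1)
    shifted = d - offset                      # move the cycle start onto the 1st
    return shifted.replace(day=1) + offset    # truncate to the month, shift back


# Sample rows from the question: (payment_date, amount)
payments = [
    (date(2019, 11, 18), Decimal("300.00")),
    (date(2019, 11, 17), Decimal("1000.00")),
    (date(2019, 11, 16), Decimal("250.00")),
    (date(2019, 11, 15), Decimal("300.00")),
    (date(2019, 10, 18), Decimal("130.00")),
    (date(2019, 10, 18), Decimal("100.00")),
    (date(2019, 9, 18), Decimal("1300.00")),
    (date(2019, 9, 17), Decimal("1300.00")),
    (date(2019, 9, 1), Decimal("400.00")),
    (date(2019, 8, 25), Decimal("400.00")),
]

totals = defaultdict(Decimal)
for payment_date, amount in payments:
    totals[cycle_start(payment_date)] += amount

for start in sorted(totals, reverse=True):
    print(start, totals[start])
```

With the sample data this reproduces the four totals the question expects (1850.00, 230.00, 2600.00 and 800.00).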

You can use the generate_series() function and an inner join that compares month and year, like this:
SELECT specific_date_on_month, SUM(amount)
FROM (SELECT generate_series('2015-01-15'::date, '2015-12-15'::date, '1 month'::interval) AS specific_date_on_month) AS months
INNER JOIN payments
ON (TO_CHAR(payment_date, 'yyyymm')=TO_CHAR(specific_date_on_month, 'yyyymm'))
GROUP BY specific_date_on_month;
The generate_series(<begin>, <end>, <interval>) function generates a series from begin to end, stepping by the given interval. Note that this join matches on the calendar month, so the ON condition and the series bounds need adjusting if each cycle should run from the 15th to the 14th.

Related

BigQuery: Repeat the same calculated value in multiple rows

I'm trying to get several simple queries into one new table using Google BigQuery. The final table contains the existing revenue data per day (which I can simply draw from another table). I then want to calculate the average revenue per day of the current month and repeat this value for every remaining day of the month. The final table is updated every day and includes both actual and forecasted data.
So far, I came up with the following, which generates an error message in combination: Scalar subquery produced more than one element
#This gives me the date, the revenue per day and the info that it's actual data
SELECT
date, sum(revenue), 'ACTUAL' as type from `project.dataset.table` where date >"2020-01-01" and date < current_date() group by date
union distinct
# This shall provide the remaining dates of the current month
SELECT
(select calendar_date FROM `project.dataset.calendar_table` where calendar_date >= current_date() and calendar_date <=DATE_SUB(DATE_TRUNC(DATE_ADD(CURRENT_DATE(), INTERVAL 1 MONTH), MONTH), INTERVAL 1 DAY)),
#This shall provide the average revenue per day so far and write this value for each day of the remaining month
(SELECT avg(revenue_daily) FROM
(select sum(revenue) as revenue_daily from `project.dataset.table` WHERE date > "2020-01-01" and extract(month from date) = extract (month from current_date()) group by date) as average_daily_revenue where calendar >= current_date()),
'FORECAST'
This is how I would like the final data to look:
+------------+------------+----------+
| date | revenue | type |
+------------+------------+----------+
| 01.04.2020 | 100 € | ACTUAL |
| … | 5.000 € | ACTUAL |
| 23.04.2020 | 200 € | ACTUAL |
| 24.04.2020 | 230,43 € | FORECAST |
| 25.04.2020 | 230,43 € | FORECAST |
| 26.04.2020 | 230,43 € | FORECAST |
| 27.04.2020 | 230,43 € | FORECAST |
| 28.04.2020 | 230,43 € | FORECAST |
| 29.04.2020 | 230,43 € | FORECAST |
| 30.04.2020 | 230,43 € | FORECAST |
+------------+------------+----------+
The forecast value is simply the sum of the actual revenue of the month divided by the number of days the month had so far.
Thanks for any hint on how to approach this.
I just figured something out, which creates the data I need. I'll still work on updating this every day automatically. But this is what I got so far:
select date, 'actual' as type, sum(revenue) as revenue
from `project.dataset.revenue`
where date >= "2020-01-01" and date < current_date()
group by date
union distinct
select calendar_date, 'forecast',
  (select avg(revenue_daily)
   from (select sum(revenue) as revenue_daily
         from `project.dataset.revenue`
         where extract(year from date) = extract(year from current_date())
           and extract(month from date) = extract(month from current_date())
         group by date
         order by date) as average_daily_revenue)
from `project.dataset.calendar`
where calendar_date >= current_date()
  and calendar_date <= DATE_SUB(DATE_TRUNC(DATE_ADD(CURRENT_DATE(), INTERVAL 1 MONTH), MONTH), INTERVAL 1 DAY)
order by date
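The logic described in the question (actual rows up to yesterday, then the month-to-date daily average repeated for the rest of the month) can be sketched in plain Python. This is only an illustration under assumptions: actual_and_forecast is a hypothetical function name, the input is a dict of already summed daily revenue, and the average is taken over the days of the current month that have data before today.

```python
import calendar
from datetime import date


def actual_and_forecast(daily_revenue, today):
    """Return (date, revenue, type) rows: actuals before today, forecast after."""
    rows = [(d, daily_revenue[d], "ACTUAL")
            for d in sorted(daily_revenue) if d < today]

    # Days of the current month that already have actual revenue.
    month_days = [d for d in daily_revenue
                  if (d.year, d.month) == (today.year, today.month) and d < today]
    if not month_days:
        return rows  # nothing to average yet, so no forecast rows

    average = sum(daily_revenue[d] for d in month_days) / len(month_days)

    # Repeat the same average for today through the last day of the month.
    last_day = calendar.monthrange(today.year, today.month)[1]
    for day in range(today.day, last_day + 1):
        rows.append((date(today.year, today.month, day), average, "FORECAST"))
    return rows


# Hypothetical data, not taken from the question.
revenue = {date(2020, 4, 1): 100, date(2020, 4, 2): 200, date(2020, 4, 3): 300}
for row in actual_and_forecast(revenue, today=date(2020, 4, 4)):
    print(row)
```

The key point mirrors the SQL fix: the average is computed once as a single value and then attached to each remaining calendar date, rather than being produced by a subquery that returns several rows.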

SQL null values not being shown in results

I'm having trouble getting the null values on a SQL Query. This is the description of the problem:
Gross income by week. Money is collected from guests when they leave.
For each Thursday in November and December 2016, show the total amount
of money collected from the previous Friday to that day, inclusive.
Here's the code I've written, which should return the weekly income from the previous Friday to each Thursday. The answer I get is partially correct: the weeks that have income are displayed correctly, while the weeks that don't have any income are not displayed at all. I've tried adding an IFNULL call, but that doesn't fix the problem.
SELECT DATE_ADD(MAKEDATE(2016, 7), INTERVAL WEEK(DATE_ADD(calendar.i, INTERVAL booking.nights - 5 DAY), 0) WEEK) AS Thursday, IFNULL(SUM(booking.nights * rate.amount) + SUM(e.amount),0) AS weekly_income
FROM booking
RIGHT OUTER
JOIN calendar ON booking.booking_date = calendar.i
JOIN rate ON (booking.occupants = rate.occupancy AND booking.room_type_requested = rate.room_type)
LEFT JOIN (
SELECT booking_id, IFNULL(SUM(amount),0) AS amount
FROM extra
GROUP BY booking_id
) AS e ON (e.booking_id = booking.booking_id)
GROUP BY Thursday;
For reference, this is a question found on SQLzoo Guesthouse section, question 15. This is the expected result:
+------------+---------------+
| Thursday | weekly_income |
+------------+---------------+
| 2016-11-03 | 0.00 |
| 2016-11-10 | 12608.94 |
| 2016-11-17 | 13552.56 |
| 2016-11-24 | 12929.69 |
| 2016-12-01 | 11685.14 |
| 2016-12-08 | 13093.79 |
| 2016-12-15 | 8975.87 |
| 2016-12-22 | 1395.77 |
| 2016-12-29 | 0.00 |
| 2017-01-05 | 0.00 |
+------------+---------------+
I get the same as above but the ones with weekly income of 0 don't show up.
Here is one way to get the Thursdays in Nov 2016 and Dec 2016:
SELECT i AS thursday, i - INTERVAL 6 DAY AS friday
FROM calendar
WHERE i >= '2016-11-01' AND i - INTERVAL 6 DAY <= '2016-12-31' AND DAYOFWEEK(i) = 5
Then left join your data to this, making sure that you join on the checkout date (booking_date + nights days):
SELECT
thursday, SUM(
COALESCE(booking.nights * rate.amount, 0) +
COALESCE(extras.total, 0)
) AS weekly_income
FROM (
SELECT i AS thursday, i - INTERVAL 6 DAY AS friday
FROM calendar
WHERE i >= '2016-11-01' AND i - INTERVAL 6 DAY <= '2016-12-31' AND DAYOFWEEK(i) = 5
) AS thursdays
LEFT JOIN (
booking
INNER JOIN rate ON booking.occupants = rate.occupancy AND booking.room_type_requested = rate.room_type
LEFT JOIN (
SELECT booking_id, SUM(amount) AS total
FROM extra
GROUP BY booking_id
) AS extras ON booking.booking_id = extras.booking_id
) ON booking.booking_date + INTERVAL booking.nights DAY BETWEEN friday AND thursday
GROUP BY thursday
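The reason the empty weeks appear is that the list of Thursdays drives the result and the bookings are only attached to it. A small Python sketch of the same idea follows; thursdays and weekly_income are hypothetical helper names, and the checkout amounts are made-up values rather than SQLzoo data.

```python
from datetime import date, timedelta


def thursdays(first, last):
    """Yield each Thursday on or after first whose preceding Friday is <= last."""
    day = first
    while day - timedelta(days=6) <= last:
        if day.weekday() == 3:  # Monday is 0, so Thursday is 3
            yield day
        day += timedelta(days=1)


def weekly_income(checkouts, first, last):
    """Sum checkout amounts from the previous Friday to each Thursday, inclusive."""
    result = {}
    for thursday in thursdays(first, last):
        friday = thursday - timedelta(days=6)
        # sum() of an empty sequence is 0, so weeks without checkouts still appear.
        result[thursday] = sum(amount for day, amount in checkouts
                               if friday <= day <= thursday)
    return result


# Hypothetical (checkout_date, amount) pairs.
checkouts = [(date(2016, 11, 8), 100), (date(2016, 11, 10), 50), (date(2016, 11, 11), 25)]
for thursday, income in weekly_income(checkouts, date(2016, 11, 1), date(2016, 12, 31)).items():
    print(thursday, income)
```

This produces the same ten Thursdays as the expected result, from 2016-11-03 through 2017-01-05, with 0 for every week that has no checkouts.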

PostgreSQL query group by two "parameters"

I've been trying to figure out the following PostgreSQL query with no success for two days now.
Let's say I have the following table:
| date | value |
-------------------------
| 2018-05-11 | 0.20 |
| 2018-05-11 | -0.12 |
| 2018-05-11 | 0.15 |
| 2018-05-10 | -1.20 |
| 2018-05-10 | -0.70 |
| 2018-05-10 | -0.16 |
| 2018-05-10 | 0.07 |
And I need to find out the query to count positive and negative values per day:
| date | positives | negatives |
------------------------------------------
| 2018-05-11 | 2 | 1 |
| 2018-05-10 | 1 | 3 |
I've been able to figure out the query to extract only positives or negatives, but not both at the same time:
SELECT to_char(table.date, 'DD/MM') AS date,
COUNT(*) AS negative
FROM table
WHERE table.date >= DATE(NOW() - '20 days' :: INTERVAL) AND
value < '0'
GROUP BY to_char(date, 'DD/MM'), table.date
ORDER BY table.date DESC;
Can someone please assist? This is driving me mad. Thank you.
Use a FILTER clause with the aggregate function.
SELECT to_char(table.date, 'DD/MM') AS date,
COUNT(*) FILTER (WHERE value < 0) AS negative,
COUNT(*) FILTER (WHERE value > 0) AS positive
FROM table
WHERE table.date >= DATE(NOW() - '20 days'::INTERVAL)
GROUP BY 1, DATE(table.date)
ORDER BY DATE(table.date) DESC
I would simply do:
select date_trunc('day', t.date) as dte,
sum( (value < 0)::int ) as negatives,
sum( (value > 0)::int ) as positives
from t
where t.date >= current_date - interval '20 days'
group by date_trunc('day', t.date)
order by dte desc;
Notes:
I prefer using date_trunc() to casting to a string for removing the time component.
You don't need to use now() and convert to a date. You can just use current_date.
Converting a string to an interval seems awkward, when you can specify an interval using the interval keyword.
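Both answers rely on conditional aggregation: one pass over the rows, with each row contributing to the positive or the negative counter for its day. A minimal Python sketch of that idea, using the sample rows from the question (the variable names are illustrative only):

```python
from collections import defaultdict
from datetime import date

# Sample rows from the question: (date, value)
rows = [
    (date(2018, 5, 11), 0.20),
    (date(2018, 5, 11), -0.12),
    (date(2018, 5, 11), 0.15),
    (date(2018, 5, 10), -1.20),
    (date(2018, 5, 10), -0.70),
    (date(2018, 5, 10), -0.16),
    (date(2018, 5, 10), 0.07),
]

counts = defaultdict(lambda: {"positives": 0, "negatives": 0})
for day, value in rows:
    if value > 0:
        counts[day]["positives"] += 1
    elif value < 0:
        counts[day]["negatives"] += 1
    # A value of exactly 0 is counted in neither column, as in the SQL above.

for day in sorted(counts, reverse=True):
    print(day, counts[day]["positives"], counts[day]["negatives"])
```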

Dynamically calculate how many months have passed - SQL Server

I am trying to calculate how many months ago the date in a date field was.
I have a table:
CREATE TABLE Date(
Date Date
);
INSERT INTO Date (Date)
VALUES ('05-01-18'),
('04-01-18'),
('03-01-18'),
('02-01-18'),
('01-01-18'),
('12-01-17'),
('11-01-17');
And a query
SELECT Date ,
MONTH(Date),
CASE WHEN MONTH(Date) = MONTH(GETDATE()) Then 'Current Month'
WHEN MONTH(Date) = MONTH(GETDATE()) -1 Then '1 Month Ago'
WHEN MONTH(Date) = MONTH(GETDATE()) -2 Then '2 Month Ago'
ELSE 'n/a' END AS [Months Ago]
FROM Date
Which gives me the correct result:
| Date | | Months Ago |
|------------|----|---------------|
| 2018-05-01 | 5 | Current Month |
| 2018-04-01 | 4 | 1 Month Ago |
| 2018-03-01 | 3 | 2 Month Ago |
| 2018-02-01 | 2 | n/a |
| 2018-01-01 | 1 | n/a |
| 2017-12-01 | 12 | n/a |
| 2017-11-01 | 11 | n/a |
But is there any way to generate this dynamically instead of having to keep writing CASE expressions? If anyone adds more dates in the future, it should just work without adding more cases.
datediff() is exactly what you want:
select datediff(month, date, getdate()) as num_months_ago
from [Date];
datediff() counts the number of month boundaries between two dates. So, Dec 31 is "one month before" Jan 1. This appears to be the behavior that you want.
I don't see an advantage to putting this in a string format.
In case you do want this to have a string format:
SELECT D.[Date],
DATEPART(MONTH,D.[Date]) AS [Month],
CASE WHEN V.DD = 0 THEN 'Current Month'
WHEN V.DD = 1 THEN '1 Month Ago'
ELSE CONVERT(varchar(4), V.DD) + ' Months ago' END AS MonthsAgo
FROM [Date] D
CROSS APPLY (VALUES(DATEDIFF(MONTH, D.[Date], GETDATE()))) V(DD);
I do, however, agree with Gordon: SQL Server isn't really the place to do that type of formatting. :)
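To make the "month boundaries" behavior concrete, here is a small Python sketch that mimics what DATEDIFF(month, ...) counts and then builds the label. months_ago and label are hypothetical names used only for this illustration.

```python
from datetime import date


def months_ago(d: date, today: date) -> int:
    """Count month boundaries crossed between d and today, ignoring the day."""
    return (today.year - d.year) * 12 + (today.month - d.month)


def label(n: int) -> str:
    """Turn the month difference into the text used in the question."""
    if n == 0:
        return "Current Month"
    if n == 1:
        return "1 Month Ago"
    return f"{n} Months Ago"


today = date(2018, 5, 15)  # assumed "current" date for the example
for d in [date(2018, 5, 1), date(2018, 4, 1), date(2017, 12, 1), date(2017, 11, 1)]:
    print(d, label(months_ago(d, today)))
```

Because only the year and month take part in the calculation, Dec 31 to Jan 1 counts as one month, and dates in an earlier year are handled without extra cases.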

How to write a SQL statement to sum data using group by the same day of every two neighboring months

I have a data table like this:
datetime data
-----------------------
...
2017/8/24 6.0
2017/8/25 5.0
...
2017/9/24 6.0
2017/9/25 6.2
...
2017/10/24 8.1
2017/10/25 8.2
I want to write a SQL statement that sums the data grouped by the 24th of every two neighboring months within a certain time range, such as from 2017/7/20 to 2017/10/25 as above.
How to write this SQL statement? I'm using SQL Server 2008 R2.
The expected results table is like this:
datetime_range data_sum
------------------------------------
...
2017/8/24~2017/9/24 100.9
2017/9/24~2017/10/24 120.2
...
One conceptual way to proceed here is to redefine a "month" as ending on the 24th of each normal month. Any date occurring after the 24th is assigned to the next month, and we then aggregate by this shifted month to obtain the sum of data. Shifting with DATEADD, rather than adding 1 to MONTH(), keeps December dates after the 24th in the correct year.
WITH cte AS (
    SELECT
        data,
        -- First day of the shifted month: dates after the 24th move to the next month.
        DATEADD(month,
                DATEDIFF(month, 0, datetime) + CASE WHEN DAY(datetime) > 24 THEN 1 ELSE 0 END,
                0) AS period_month
    FROM yourTable
)
SELECT
    -- Each period runs from the 25th of the previous month to the 24th of period_month.
    CONVERT(varchar(10), DATEADD(day, 24, DATEADD(month, -1, period_month)), 111) +
    '~' +
    CONVERT(varchar(10), DATEADD(day, 23, period_month), 111) AS datetime_range,
    SUM(data) AS data_sum
FROM cte
GROUP BY
    period_month;
Note that your suggested ranges seem to include the 24th on both ends, which does not make sense from an accounting point of view. I assume that the month includes and ends on the 24th (i.e. the 25th is the first day of the next accounting period).
Demo
I would suggest dynamically building some date range rows so that you can then join your data to those for aggregation, like this example:
+----+---------------------+---------------------+----------------+
| | period_start_dt | period_end_dt | your_data_here |
+----+---------------------+---------------------+----------------+
| 1 | 24.04.2017 00:00:00 | 24.05.2017 00:00:00 | 1 |
| 2 | 24.05.2017 00:00:00 | 24.06.2017 00:00:00 | 1 |
| 3 | 24.06.2017 00:00:00 | 24.07.2017 00:00:00 | 1 |
| 4 | 24.07.2017 00:00:00 | 24.08.2017 00:00:00 | 1 |
| 5 | 24.08.2017 00:00:00 | 24.09.2017 00:00:00 | 1 |
| 6 | 24.09.2017 00:00:00 | 24.10.2017 00:00:00 | 1 |
| 7 | 24.10.2017 00:00:00 | 24.11.2017 00:00:00 | 1 |
| 8 | 24.11.2017 00:00:00 | 24.12.2017 00:00:00 | 1 |
| 9 | 24.12.2017 00:00:00 | 24.01.2018 00:00:00 | 1 |
| 10 | 24.01.2018 00:00:00 | 24.02.2018 00:00:00 | 1 |
| 11 | 24.02.2018 00:00:00 | 24.03.2018 00:00:00 | 1 |
| 12 | 24.03.2018 00:00:00 | 24.04.2018 00:00:00 | 1 |
+----+---------------------+---------------------+----------------+
DEMO
declare @start_dt date;
set @start_dt = '20170424';
select
period_start_dt, period_end_dt, sum(1) as your_data_here
from (
select
dateadd(month,m.n,start_dt) period_start_dt
, dateadd(month,m.n+1,start_dt) period_end_dt
from (
select @start_dt start_dt ) seed
cross join (
select 0 n union all
select 1 union all
select 2 union all
select 3 union all
select 4 union all
select 5 union all
select 6 union all
select 7 union all
select 8 union all
select 9 union all
select 10 union all
select 11
) m
) r
-- LEFT JOIN YOUR DATA
-- ON yourdata.date >= r.period_start_dt and yourdata.date < r.period_end_dt
group by
period_start_dt, period_end_dt
Please don't be tempted to use BETWEEN when joining to your data. Follow the note above and use yourdata.date >= r.period_start_dt and yourdata.date < r.period_end_dt; otherwise you could double count rows, because BETWEEN is inclusive of both the lower and the upper boundary.
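The double-counting risk is easy to demonstrate. The Python sketch below builds the same twelve period rows and tests a date that falls exactly on a boundary; add_months is a hypothetical helper that assumes the day of month exists in every month (true for the 24th).

```python
from datetime import date


def add_months(d: date, n: int) -> date:
    """Add n months to d, assuming d.day exists in the target month."""
    index = d.year * 12 + (d.month - 1) + n
    return date(index // 12, index % 12 + 1, d.day)


start = date(2017, 4, 24)
# (period_start, period_end) pairs, one per month, as in the table above.
periods = [(add_months(start, n), add_months(start, n + 1)) for n in range(12)]

boundary = date(2017, 5, 24)

# Half-open comparison: start <= d < end. Each date matches exactly one period.
half_open = [p for p in periods if p[0] <= boundary < p[1]]

# Inclusive comparison (what BETWEEN does): a boundary date matches two periods.
inclusive = [p for p in periods if p[0] <= boundary <= p[1]]

print(len(half_open), len(inclusive))
```

With the inclusive comparison, a row dated 2017-05-24 would be summed into both the April and the May period.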
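I think the simplest way is to subtract 25 days and aggregate by the month:
select year(dateadd(day, -25, datetime)) as yr,
month(dateadd(day, -25, datetime)) as mon,
sum(data)
from t
group by year(dateadd(day, -25, datetime)), month(dateadd(day, -25, datetime));
Subtracting 25 days makes each period start on the 26th; subtract 24 days if the period should start on the 25th, or 23 days if it should start on the 24th. You can format yr and mon to get the dates for the specific ranges, but this does the aggregation (and the yr/mon columns might be sufficient).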
Step 0: Build a calendar table. Every database needs a calendar table eventually to simplify this sort of calculation.
In this table you may have columns such as:
Date (primary key)
Day
Month
Year
Quarter
Half-year (e.g. 1 or 2)
Day of year (1 to 366)
Day of week (numeric or text)
Is weekend (seems redundant now, but is a huge time saver later on)
Fiscal quarter/year (if your company's fiscal year doesn't start on Jan. 1)
Is Holiday
etc.
If your company starts its month on the 24th, then you can add a "Fiscal Month" column that represents that.
Step 1: Join on the calendar table
Step 2: Group by the columns in the calendar table.
Calendar tables sound weird at first, but once you realize that they are in fact tiny even if they span a couple hundred years they quickly become a major asset.
Don't try to cheap out on disk space by using computed columns. You want real columns because they are much faster and can be indexed if necessary. (Though honestly, usually just the PK index is enough for even wide calendar tables.)
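As a rough illustration of what such a table holds, the Python sketch below generates one row per day with a few of the columns listed above. It is only a sketch under assumptions: the column names are illustrative, and fiscal_month_start assumes the fiscal month begins on the 24th and that the start day is at most 28.

```python
from datetime import date, timedelta


def fiscal_month_start(d: date, start_day: int = 24) -> date:
    """First day of the fiscal month containing d (start_day must be 1..28)."""
    offset = timedelta(days=start_day - 1)
    return (d - offset).replace(day=1) + offset


def calendar_rows(first: date, last: date):
    """Yield one dict per day from first to last, inclusive."""
    day = first
    while day <= last:
        yield {
            "date": day,
            "year": day.year,
            "month": day.month,
            "quarter": (day.month - 1) // 3 + 1,
            "day_of_year": day.timetuple().tm_yday,
            "is_weekend": day.weekday() >= 5,  # Saturday is 5, Sunday is 6
            "fiscal_month_start": fiscal_month_start(day),
        }
        day += timedelta(days=1)


# Keyed by date, the way the table would be keyed by its primary key.
calendar_table = {row["date"]: row
                  for row in calendar_rows(date(2017, 1, 1), date(2017, 12, 31))}
print(calendar_table[date(2017, 9, 23)])
```

Joining the data to these rows by date and grouping by fiscal_month_start then gives the 24th-to-23rd totals without any date arithmetic in the query itself.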