SQL null values not being shown in results - sql

I'm having trouble getting the null values on a SQL Query. This is the description of the problem:
Gross income by week. Money is collected from guests when they leave.
For each Thursday in November and December 2016, show the total amount
of money collected from the previous Friday to that day, inclusive.
Here's the code that I've written that should return the weekly income from Thursday to previous Friday, the answer i get is partially correct as the weeks that have income are correctly displayed while the weeks that don't have any income are not displayed. I've tried adding a IFNULL clause but that's still not fixing the problem.
SELECT DATE_ADD(MAKEDATE(2016, 7), INTERVAL WEEK(DATE_ADD(calendar.i, INTERVAL booking.nights - 5 DAY), 0) WEEK) AS Thursday, IFNULL(SUM(booking.nights * rate.amount) + SUM(e.amount),0) AS weekly_ncome
FROM booking
RIGHT OUTER
JOIN calendar ON booking.booking_date = calendar.i
JOIN rate ON (booking.occupants = rate.occupancy AND booking.room_type_requested = rate.room_type)
LEFT JOIN (
SELECT booking_id, IFNULL(SUM(amount),0) AS amount
FROM extra
GROUP BY booking_id
) AS e ON (e.booking_id = booking.booking_id)
GROUP BY Thursday;
For reference, this is a question found on SQLzoo Guesthouse section, question 15. This is the expected result:
+------------+---------------+
| Thursday | weekly_income |
+------------+---------------+
| 2016-11-03 | 0.00 |
| 2016-11-10 | 12608.94 |
| 2016-11-17 | 13552.56 |
| 2016-11-24 | 12929.69 |
| 2016-12-01 | 11685.14 |
| 2016-12-08 | 13093.79 |
| 2016-12-15 | 8975.87 |
| 2016-12-22 | 1395.77 |
| 2016-12-29 | 0.00 |
| 2017-01-05 | 0.00 |
+------------+---------------+
I get the same as above but the ones with weekly income of 0 don't show up.

Here is one way to get the Thursdays in Nov 2016 and Dec 2016:
SELECT i AS thursday, i - INTERVAL 6 DAY AS friday
FROM calendar
WHERE i >= '2016-11-01' AND i - INTERVAL 6 DAY <= '2016-12-31' AND DAYOFWEEK(i) = 5
Just left join your data with this, make sure that you join with the checkout date (booking_date + nights days):
SELECT
thursday, SUM(
COALESCE(booking.nights * rate.amount, 0) +
COALESCE(extras.total, 0)
) AS weekly_income
FROM (
SELECT i AS thursday, i - INTERVAL 6 DAY AS friday
FROM calendar
WHERE i >= '2016-11-01' AND i - INTERVAL 6 DAY <= '2016-12-31' AND DAYOFWEEK(i) = 5
) AS thursdays
LEFT JOIN (
booking
INNER JOIN rate ON booking.occupants = rate.occupancy AND booking.room_type_requested = rate.room_type
LEFT JOIN (
SELECT booking_id, SUM(amount) AS total
FROM extra
GROUP BY booking_id
) AS extras ON booking.booking_id = extras.booking_id
) ON booking.booking_date + INTERVAL booking.nights DAY BETWEEN friday AND thursday
GROUP BY thursday

Related

How can I aggregate values based on an arbitrary monthly cycle date range in SQL?

Given a table as such:
# SELECT * FROM payments ORDER BY payment_date DESC;
id | payment_type_id | payment_date | amount
----+-----------------+--------------+---------
4 | 1 | 2019-11-18 | 300.00
3 | 1 | 2019-11-17 | 1000.00
2 | 1 | 2019-11-16 | 250.00
1 | 1 | 2019-11-15 | 300.00
14 | 1 | 2019-10-18 | 130.00
13 | 1 | 2019-10-18 | 100.00
15 | 1 | 2019-09-18 | 1300.00
16 | 1 | 2019-09-17 | 1300.00
17 | 1 | 2019-09-01 | 400.00
18 | 1 | 2019-08-25 | 400.00
(10 rows)
How can I SUM the amount column based on an arbitrary date range, not simply a date truncation?
Taking the example of a date range beginning on the 15th of a month, and ending on the 14th of the following month, the output I would expect to see is:
payment_type_id | payment_date | amount
-----------------+--------------+---------
1 | 2019-11-15 | 1850.00
1 | 2019-10-15 | 230.00
1 | 2019-09-15 | 2600.00
1 | 2019-08-15 | 800.00
Can this be done in SQL, or is this something that's better handled in code? I would traditionally do this in code, but looking to extend my knowledge of SQL (which at this stage, isnt much!)
Click demo:db<>fiddle
You can use a combination of the CASE clause and the date_trunc() function:
SELECT
payment_type_id,
CASE
WHEN date_part('day', payment_date) < 15 THEN
date_trunc('month', payment_date) + interval '-1month 14 days'
ELSE date_trunc('month', payment_date) + interval '14 days'
END AS payment_date,
SUM(amount) AS amount
FROM
payments
GROUP BY 1,2
date_part('day', ...) gives out the current day of month
The CASE clause is for dividing the dates before the 15th of month and after.
The date_trunc('month', ...) converts all dates in a month to the first of this month
So, if date is before the 15th of the current month, it should be grouped to the 15th of the previous month (this is what +interval '-1month 14 days' calculates: +14, because the date_trunc() truncates to the 1st of month: 1 + 14 = 15). Otherwise it is group to the 15th of the current month.
After calculating these payment_days, you can use them for simple grouping.
I would simply subtract 14 days, truncate the month, and add 14 days back:
select payment_type_id,
date_trunc('month', payment_date - interval '14 day') + interval '14 day' as month_15,
sum(amount)
from payments
group by payment_type_id, month_15
order by payment_type_id, month_15;
No conditional logic is actually needed for this.
Here is a db<>fiddle.
You can use the generate_series() function and make a inner join comparing month and year, like this:
SELECT specific_date_on_month, SUM(amount)
FROM (SELECT generate_series('2015-01-15'::date, '2015-12-15'::date, '1 month'::interval) AS specific_date_on_month)
INNER JOIN payments
ON (TO_CHAR(payment_date, 'yyyymm')=TO_CHAR(specific_date_on_month, 'yyyymm'))
GROUP BY specific_date_on_month;
The generate_series(<begin>, <end>, <interval>) function generate a serie based on begin and end with an specific interval.

Filling null values in timeseries data (for weekends) with previous day values

I have a view with dates, stock name and daily stock prices for weekdays. This excludes data for Saturdays and Sundays.
I want to fill data on Saturdays and Sundays with all the stock names and corresponding stock prices for previous day (Friday).
How can I run a SQL query to get the desired output?
Thank you for your help in resolving this query.
E.g.
Original data
Date Stock-Name Stock-Price
2019/06/30 null null
2019/06/29 null null
2019/06/28 Appl $200
2019/06/28 Goog $1100
2019/06/28 Tsla $300
2019/06/27 Appl $210
2019/06/27 Goog $1200
2019/06/27 Tsla $200
Expected Output
Date | Stock Name | Stock Price
--------------------------------------
2019/06/30 | Appl | $200
2019/06/30 | Goog | $1100
2019/06/30 | Tsla | $300
2019/06/29 | Appl | $200
2019/06/29 | Goog | $1100
2019/06/29 | Tsla | $300
2019/06/28 | Appl | $200
2019/06/28 | Goog | $1100
2019/06/28 | Tsla | $300
2019/06/27 | Appl | $210
2019/06/27 | Goog | $1200
2019/06/27 | Tsla | $200
Use a cross join to generate the rows and a left join to bring in values.
To get the preceding value, you can use lag() with ignore nulls for some similar mechanism (not all databases support this functionality).
So:
select d.date, s.stockname,
coalesce(t.price,
lag(t.price ignore nulls) over (partition by s.stockname order by d.date)
) as price
from (select distinct date from t
) d cross join
(select stockname from t where stockname is not null
) s left join
t
on t.date = d.date and t.stockname = s.stockname
order by d.date desc, s.stockname;
demo:db<>fiddle
For Postgres:
With the lead() window function you are able to get values from next records:
SELECT
mydate,
CASE
WHEN date_part('dow', mydate) = 6 THEN
lead(stock_name) OVER w
WHEN date_part('dow', mydate) = 0 THEN
lead(stock_name, 2) OVER w
ELSE
stock_name
END AS stock_name,
CASE
WHEN date_part('dow', mydate) = 6 THEN
lead(stock_price) OVER w
WHEN date_part('dow', mydate) = 0 THEN
lead(stock_price, 2) OVER w
ELSE
stock_price
END AS stock_price
FROM
stock
WINDOW w AS (ORDER BY mydate DESC)
This query checks for the current week day using the date_part() function. The parameter dow (day of week) gives out 0 for Sundays and 6 for Saturdays.
If it is Saturday, the lead() function only looks one record further and if it Sunday 2 days. Otherwise the original data is taken.
Note: "Date" is not a really good column name because it is a reserved keyword in Postgres. To avoid certain problems you should adjust the names a little bit.

Dynamically calculate how many months have passed - SQL Server

I am trying to calculate how many months ago the date field was
I have a table
CREATE TABLE Date(
Date Date
);
INSERT INTO Date (Date)
VALUES ('05-01-18'),
('04-01-18'),
('03-01-18'),
('02-01-18'),
('01-01-18'),
('12-01-17'),
('11-01-17');
And a query
SELECT Date ,
MONTH(Date),
CASE WHEN MONTH(Date) = MONTH(GETDATE()) Then 'Current Month'
WHEN MONTH(Date) = MONTH(GETDATE()) -1 Then '1 Month Ago'
WHEN MONTH(Date) = MONTH(GETDATE()) -2 Then '2 Month Ago'
ELSE 'n/a' END AS [Months Ago]
FROM Date
Which gives me the correct result:
| Date | | Months Ago |
|------------|----|---------------|
| 2018-05-01 | 5 | Current Month |
| 2018-04-01 | 4 | 1 Month Ago |
| 2018-03-01 | 3 | 2 Month Ago |
| 2018-02-01 | 2 | n/a |
| 2018-01-01 | 1 | n/a |
| 2017-12-01 | 12 | n/a |
| 2017-11-01 | 11 | n/a |
But is there anyway to create this dynamically instead of keep having to write case expressions. So if anyone add's more dates in the future this will just work without having to add more cases?
You exactly want datediff():
select datediff(month, date, getdate()) as num_months_ago
datediff() counts the number of month boundaries between two dates. So, Dec 31 is "one month before" Jan 1. This appears to be the behavior that you want.
I don't see an advantage to putting this in a string format.
In case you do want this to have a string format:
SELECT D.[Date],
DATEPART(MONTH,D.[Date]) AS [Month],
CASE WHEN V.DD = 0 THEN 'Current Month'
WHEN V.DD = 1 THEN '1 Month Ago'
ELSE CONVERT(varchar(4), V.DD) + ' Months ago' END AS MonthsAgo
FROM [Date] D
CROSS APPLY (VALUES(DATEDIFF(MONTH, D.[Date], GETDATE()))) V(DD);
I, however, agree with Gordon, SQL Server isn't really the palce to do that type of formatting. :)

How to write a SQL statement to sum data using group by the same day of every two neighboring months

I have a data table like this:
datetime data
-----------------------
...
2017/8/24 6.0
2017/8/25 5.0
...
2017/9/24 6.0
2017/9/25 6.2
...
2017/10/24 8.1
2017/10/25 8.2
I want to write a SQL statement to sum the data using group by the 24th of every two neighboring months in certain range of time such as : from 2017/7/20 to 2017/10/25 as above.
How to write this SQL statement? I'm using SQL Server 2008 R2.
The expected results table is like this:
datetime_range data_sum
------------------------------------
...
2017/8/24~2017/9/24 100.9
2017/9/24~2017/10/24 120.2
...
One conceptual way to proceed here is to redefine a "month" as ending on the 24th of each normal month. Using the SQL Server month function, we will assign any date occurring after the 24th as belonging to the next month. Then we can aggregate by the year along with this shifted month to obtain the sum of data.
WITH cte AS (
SELECT
data,
YEAR(datetime) AS year,
CASE WHEN DAY(datetime) > 24
THEN MONTH(datetime) + 1 ELSE MONTH(datetime) END AS month
FROM yourTable
)
SELECT
CONVERT(varchar(4), year) + '/' + CONVERT(varchar(2), month) +
'/25~' +
CONVERT(varchar(4), year) + '/' + CONVERT(varchar(2), (month + 1)) +
'/24' AS datetime_range,
SUM(data) AS data_sum
FROM cte
GROUP BY
year, month;
Note that your suggested ranges seem to include the 24th on both ends, which does not make sense from an accounting point of view. I assume that the month includes and ends on the 24th (i.e. the 25th is the first day of the next accounting period.
Demo
I would suggest dynamically building some date range rows so that you can then join you data to those for aggregation, like this example:
+----+---------------------+---------------------+----------------+
| | period_start_dt | period_end_dt | your_data_here |
+----+---------------------+---------------------+----------------+
| 1 | 24.04.2017 00:00:00 | 24.05.2017 00:00:00 | 1 |
| 2 | 24.05.2017 00:00:00 | 24.06.2017 00:00:00 | 1 |
| 3 | 24.06.2017 00:00:00 | 24.07.2017 00:00:00 | 1 |
| 4 | 24.07.2017 00:00:00 | 24.08.2017 00:00:00 | 1 |
| 5 | 24.08.2017 00:00:00 | 24.09.2017 00:00:00 | 1 |
| 6 | 24.09.2017 00:00:00 | 24.10.2017 00:00:00 | 1 |
| 7 | 24.10.2017 00:00:00 | 24.11.2017 00:00:00 | 1 |
| 8 | 24.11.2017 00:00:00 | 24.12.2017 00:00:00 | 1 |
| 9 | 24.12.2017 00:00:00 | 24.01.2018 00:00:00 | 1 |
| 10 | 24.01.2018 00:00:00 | 24.02.2018 00:00:00 | 1 |
| 11 | 24.02.2018 00:00:00 | 24.03.2018 00:00:00 | 1 |
| 12 | 24.03.2018 00:00:00 | 24.04.2018 00:00:00 | 1 |
+----+---------------------+---------------------+----------------+
DEMO
declare #start_dt date;
set #start_dt = '20170424';
select
period_start_dt, period_end_dt, sum(1) as your_data_here
from (
select
dateadd(month,m.n,start_dt) period_start_dt
, dateadd(month,m.n+1,start_dt) period_end_dt
from (
select #start_dt start_dt ) seed
cross join (
select 0 n union all
select 1 union all
select 2 union all
select 3 union all
select 4 union all
select 5 union all
select 6 union all
select 7 union all
select 8 union all
select 9 union all
select 10 union all
select 11
) m
) r
-- LEFT JOIN YOUR DATA
-- ON yourdata.date >= r.period_start_dt and data.date < r.period_end_dt
group by
period_start_dt, period_end_dt
Please don't be tempted to use "between" when it comes to joining to your data. Follow the note above and use yourdata.date >= r.period_start_dt and data.date < r.period_end_dt otherwise you could double count information as between is inclusive of both lower and upper boundaries.
I think the simplest way is to subtract 25 days and aggregate by the month:
select year(dateadd(day, -25, datetime)) as yr,
month(dateadd(day, -25, datetime)) as mon,
sum(data)
from t
group by dateadd(day, -25, datetime);
You can format yr and mon to get the dates for the specific ranges, but this does the aggregation (and the yr/mon columns might be sufficient).
Step 0: Build a calendar table. Every database needs a calendar table eventually to simplify this sort of calculation.
In this table you may have columns such as:
Date (primary key)
Day
Month
Year
Quarter
Half-year (e.g. 1 or 2)
Day of year (1 to 366)
Day of week (numeric or text)
Is weekend (seems redundant now, but is a huge time saver later on)
Fiscal quarter/year (if your company's fiscal year doesn't start on Jan. 1)
Is Holiday
etc.
If your company starts its month on the 24th, then you can add a "Fiscal Month" column that represents that.
Step 1: Join on the calendar table
Step 2: Group by the columns in the calendar table.
Calendar tables sound weird at first, but once you realize that they are in fact tiny even if they span a couple hundred years they quickly become a major asset.
Don't try to cheap out on disk space by using computed columns. You want real columns because they are much faster and can be indexed if necessary. (Though honestly, usually just the PK index is enough for even wide calendar tables.)

How can I identify groups of consecutive dates in SQL?

Im trying to write a function which identifies groups of dates, and measures the size of the group.
I've been doing this procedurally in Python until now but I'd like to move it into SQL.
for example, the list
Bill 01/01/2011
Bill 02/01/2011
Bill 03/01/2011
Bill 05/01/2011
Bill 07/01/2011
should be output into a new table as:
Bill 01/01/2011 3
Bill 02/01/2011 3
Bill 03/01/2011 3
Bill 05/01/2011 1
Bill 07/01/2011 1
Ideally this should also be able to account for weekends and public holidays - the dates in my table will aways be Mon-Fri (I think I can solve this by making a new table of working days and numbering them in sequence). Someone at work suggested I try a CTE. Im pretty new to this, so I'd appreciate any guidance anyone could provide! Thanks.
You can do this with a clever application of window functions. Consider the following:
select name, date, row_number() over (partition by name order by date)
from t
This adds a row number, which in your example would simply be 1, 2, 3, 4, 5. Now, take the difference from the date, and you have a constant value for the group.
select name, date,
dateadd(d, - row_number() over (partition by name order by date), date) as val
from t
Finally, you want the number of groups in sequence. I would also add a group identifier (for instance, to distinguish between the last two).
select name, date,
count(*) over (partition by name, val) as NumInSeq,
dense_rank() over (partition by name order by val) as SeqID
from (select name, date,
dateadd(d, - row_number() over (partition by name order by date), date) as val
from t
) t
Somehow, I missed the part about weekdays and holidays. This solution does not solve that problem.
The following query account the weekends and holidays. The query has a provision to include the holidays on-the-fly, though for the purpose of making the query clearer, I just materialized the holidays to an actual table.
CREATE TABLE tx
(n varchar(4), d date);
INSERT INTO tx
(n, d)
VALUES
('Bill', '2006-12-29'), -- Friday
-- 2006-12-30 is Saturday
-- 2006-12-31 is Sunday
-- 2007-01-01 is New Year's Holiday
('Bill', '2007-01-02'), -- Tuesday
('Bill', '2007-01-03'), -- Wednesday
('Bill', '2007-01-04'), -- Thursday
('Bill', '2007-01-05'), -- Friday
-- 2007-01-06 is Saturday
-- 2007-01-07 is Sunday
('Bill', '2007-01-08'), -- Monday
('Bill', '2007-01-09'), -- Tuesday
('Bill', '2012-07-09'), -- Monday
('Bill', '2012-07-10'), -- Tuesday
('Bill', '2012-07-11'); -- Wednesday
create table holiday(d date);
insert into holiday(d) values
('2007-01-01');
/* query should return 7 consecutive good
attendance(from December 29 2006 to January 9 2007) */
/* and 3 consecutive attendance from July 7 2012 to July 11 2012. */
Query:
with first_date as
(
-- get the monday of the earliest date
select dateadd( ww, datediff(ww,0,min(d)), 0 ) as first_date
from tx
)
,shifted as
(
select
tx.n, tx.d,
diff = datediff(day, fd.first_date, tx.d)
- (datediff(day, fd.first_date, tx.d)/7 * 2)
from tx
cross join first_date fd
union
select
xxx.n, h.d,
diff = datediff(day, fd.first_date, h.d)
- (datediff(day, fd.first_date, h.d)/7 * 2)
from holiday h
cross join first_date fd
cross join (select distinct n from tx) as xxx
)
,grouped as
(
select *, grp = diff - row_number() over(partition by n order by d)
from shifted
)
select
d, n, dense_rank() over (partition by n order by grp) as nth_streak
,count(*) over (partition by n, grp) as streak
from grouped
where d not in (select d from holiday) -- remove the holidays
Output:
| D | N | NTH_STREAK | STREAK |
-------------------------------------------
| 2006-12-29 | Bill | 1 | 7 |
| 2007-01-02 | Bill | 1 | 7 |
| 2007-01-03 | Bill | 1 | 7 |
| 2007-01-04 | Bill | 1 | 7 |
| 2007-01-05 | Bill | 1 | 7 |
| 2007-01-08 | Bill | 1 | 7 |
| 2007-01-09 | Bill | 1 | 7 |
| 2012-07-09 | Bill | 2 | 3 |
| 2012-07-10 | Bill | 2 | 3 |
| 2012-07-11 | Bill | 2 | 3 |
Live test: http://www.sqlfiddle.com/#!3/815c5/1
The main logic of the query is to shift all the dates two days back. This is done by dividing the date to 7 and multiplying it by two, then subtracting it from the original number. For example, if a given date falls on 15th, this will be computed as 15/7 * 2 == 4; then subtract 4 from the original number, 15 - 4 == 11. 15 will become the 11th day. Likewise the 8th day becomes the 6th day; 8 - (8/7 * 2) == 6.
Weekends are not in attendance(e.g. 6,7,13,14)
1 2 3 4 5 6 7
8 9 10 11 12 13 14
15
Applying the computation to all the weekday numbers will yield these values:
1 2 3 4 5
6 7 8 9 10
11
For holidays, you need to slot them on attendance, so to the consecutive-ness could be easily determined, then just remove them from the final query. The above attendance yields 11 consecutive good attendance.
Query logic's detailed explanation here: http://www.ienablemuch.com/2012/07/monitoring-perfect-attendance.html