How to create a table with dates in a range defined by a table of start-date inputs - sql

I am trying to create a dates table in SQL based on a set of inputs, but I haven't been able to figure it out.
I receive the following inputs in SQL:
This table:
Date Value
2022-01-01 5
2022-07-12 10
2022-11-15 3
A start date = 2022-01-01
A stop date = 2022-12-01
I need to produce a table like the one below, running from the start date to the stop date, where each date in that period is assigned the corresponding value from the input table:
Date Value
2022-01-01 5
2022-01-02 5
2022-01-03 5
2022-01-04 5
...
2022-07-09 5
2022-07-10 5
2022-07-11 5
2022-07-12 10
2022-07-13 10
2022-07-14 10
...
2022-11-13 10
2022-11-14 10
2022-11-15 3
2022-11-16 3
2022-11-17 3
2022-11-18 3
How can I do that?
Thanks.

You can use the window function lead() over() together with an ad-hoc tally table.
Example
Select Date = dateadd(DAY, N, A.Date)
      ,A.Value
 From (
        -- nDays = days until the next row's date; the last row runs through the stop date
        Select *
              ,nDays = datediff(DAY, Date, lead(Date, 1, dateadd(day, 1, '2022-12-01')) over (order by Date))
         From YourTable
      ) A
 Join (
        -- ad-hoc tally table: N = 0, 1, 2, ..., 999
        Select Top 1000 N = -1 + Row_Number() Over (Order By (Select NULL))
         From master..spt_values n1, master..spt_values n2
      ) B
   on N < nDays
 Order by Date
Results
Date Value
2022-01-01 5
2022-01-02 5
2022-01-03 5
2022-01-04 5
2022-01-05 5
...
2022-07-10 5
2022-07-11 5
2022-07-12 10
2022-07-13 10
2022-07-14 10
...
2022-11-12 10
2022-11-13 10
2022-11-14 10
2022-11-15 3
2022-11-16 3
2022-11-17 3
...
2022-11-30 3
2022-12-01 3
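To check the logic outside the database, here is a minimal Python sketch of the same idea. It is only an illustration, not the T-SQL above: the helper name expand_daily is hypothetical, and the loop over range(n_days) stands in for the join to the tally table.

```python
from datetime import date, timedelta

def expand_daily(rows, stop_date):
    """Repeat each (start_date, value) row for every day up to the next row's start date.

    The last row runs through stop_date inclusive, like the lead() default in the query.
    """
    rows = sorted(rows)
    result = []
    for i, (start, value) in enumerate(rows):
        # Equivalent of lead(Date, 1, stop_date + 1 day): the next start date,
        # or one day past the stop date for the last row.
        if i + 1 < len(rows):
            next_start = rows[i + 1][0]
        else:
            next_start = stop_date + timedelta(days=1)
        n_days = (next_start - start).days
        # Equivalent of joining to the tally table on N < nDays.
        result.extend((start + timedelta(days=n), value) for n in range(n_days))
    return result

inputs = [(date(2022, 1, 1), 5), (date(2022, 7, 12), 10), (date(2022, 11, 15), 3)]
daily = expand_daily(inputs, date(2022, 12, 1))
```

With the sample inputs, daily holds one (date, value) pair per day from 2022-01-01 through 2022-12-01.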

Related

Get data from exactly 30 days ago only in SQL (Big Query)

In BigQuery I am trying to extract data for one exact date, 30 days ago, so that whenever I pull or refresh the data it always covers that single day, no more and no less. However, the following query pulls in two dates:
SELECT FORMAT_DATE("%Y-%m-%d",createddatetime1) as dated, brand, orderid
FROM TABLE
WHERE createddatetime1 between TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY) AND TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 29 DAY)
I have tried different tactics, such as convert and cast, but I can't seem to pull data for one day only. createddatetime1 is formatted as "2022-08-02 23:53:57 UTC".
Example current output, you'll see two dates in there:
Row createddatetime1 brand orderid
1 2022-08-02 23:53:57 UTC ABC 1
2 2022-08-02 14:11:05 UTC ABC 2
3 2022-08-02 13:31:52 UTC ABC 3
4 2022-08-02 20:14:16 UTC ABC 4
5 2022-08-02 23:18:28 UTC ABC 5
6 2022-08-02 17:27:06 UTC ABC 6
7 2022-08-03 01:44:12 UTC ABC 7
8 2022-08-03 09:57:19 UTC ABC 8
9 2022-08-02 12:32:23 UTC ABC 9
10 2022-08-02 18:52:33 UTC ABC 10
Expected output:
Row createddatetime1 brand orderid
1 02/08/2022 ABC 1
2 02/08/2022 ABC 2
3 02/08/2022 ABC 3
4 02/08/2022 ABC 4
5 02/08/2022 ABC 5
6 02/08/2022 ABC 6
7 02/08/2022 ABC 7
8 02/08/2022 ABC 8
9 02/08/2022 ABC 9
10 02/08/2022 ABC 10
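You're getting data for both dates because the filter is a rolling 24-hour window measured from the current timestamp, so it usually spans two calendar dates. BETWEEN also includes both boundaries, i.e. both the start and end values are included. You need to extract the date from the timestamp column and use equality to filter the required rows.
This should work: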
where DATE(createddatetime1) = DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
This should work:
SELECT
Date(createddatetime1) as date, brand, orderid
FROM TABLE
where DATE(createddatetime1) = current_date()-30
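The difference between the two filters can be illustrated with a small Python sketch. The "current" time below is a hypothetical value chosen for illustration, and the helper names are made up; the two timestamps are taken from the sample output in the question.

```python
from datetime import datetime, timedelta, timezone

def rolling_window(now):
    """Bounds of BETWEEN now - 30 days AND now - 29 days (a rolling 24-hour window)."""
    return now - timedelta(days=30), now - timedelta(days=29)

def in_target_day(ts, now):
    """Equivalent of DATE(ts) = DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY), in UTC."""
    return ts.date() == now.date() - timedelta(days=30)

# Hypothetical "current" time, assumed only for this example.
now = datetime(2022, 9, 1, 10, 0, tzinfo=timezone.utc)
start, end = rolling_window(now)

timestamps = [
    datetime(2022, 8, 2, 23, 53, 57, tzinfo=timezone.utc),
    datetime(2022, 8, 3, 1, 44, 12, tzinfo=timezone.utc),
]

# Both rows fall inside the rolling window, but only the first is on the target date.
in_window = [start <= ts <= end for ts in timestamps]
on_target_day = [in_target_day(ts, now) for ts in timestamps]
```

Under these assumptions the window runs from 2022-08-02 10:00 to 2022-08-03 10:00 UTC, which is why rows from two dates come back, while the date comparison keeps only 2022-08-02.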

SQL lag to row which meets condition

I have a table which contains measures taken on random dates, partitioned by the site at which they were taken.
site date measurement
AB1234 2022-12-09 1
AB1234 2022-06-11 2
AB1234 2019-05-22 3
AB1234 2017-01-30 4
CD5678 2022-11-01 5
CD5678 2020-04-10 6
CD5678 2017-04-10 7
CD5678 2017-01-22 8
In order to calculate a year on year growth, I want to have an additional field for each record which contains the previous measurement at that site. The challenging part is that I only want the previous which occurred more than a year in the past.
Like so:
site date measurement previous_measurement
AB1234 2022-12-09 1 3
AB1234 2022-06-11 2 3
AB1234 2019-05-22 3 4
AB1234 2017-01-30 4 NULL
CD5678 2022-11-01 5 6
CD5678 2020-04-10 6 7
CD5678 2017-04-10 7 NULL
CD5678 2017-01-22 8 NULL
It feels like it should be possible with a window function, but I can't work it out.
Please help :(
Amazon Athena engine version 3 is based on Trino. If it fully supports the RANGE frame type for window functions, you can use that:
-- sample data
with dataset(site, date, measurement) as (
    values ('AB1234', date '2022-12-09', 1),
           ('AB1234', date '2022-06-11', 2),
           ('AB1234', date '2019-05-22', 3),
           ('AB1234', date '2017-01-30', 4),
           ('CD5678', date '2022-11-01', 5),
           ('CD5678', date '2020-04-10', 6),
           ('CD5678', date '2017-04-10', 7),
           ('CD5678', date '2017-01-22', 8)
)
-- query
select *,
       last_value(measurement) over (
           partition by site
           order by date
           RANGE BETWEEN UNBOUNDED PRECEDING AND interval '1' year PRECEDING)
from dataset;
Output:
site date measurement _col3
CD5678 2017-01-22 8 NULL
CD5678 2017-04-10 7 NULL
CD5678 2020-04-10 6 7
CD5678 2022-11-01 5 6
AB1234 2017-01-30 4 NULL
AB1234 2019-05-22 3 4
AB1234 2022-06-11 2 3
AB1234 2022-12-09 1 3
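The rule the window frame expresses (the latest measurement at the same site dated at least one year before the current row) can be sketched in Python as a sanity check. This is an illustration only, not the Trino query; the helper names are hypothetical, and the Feb 29 fallback is an assumption that the sample data never exercises.

```python
from datetime import date

def one_year_before(d):
    """Same calendar day one year earlier (Feb 29 falls back to Feb 28, an assumed convention)."""
    try:
        return d.replace(year=d.year - 1)
    except ValueError:
        return d.replace(year=d.year - 1, day=28)

def previous_measurements(rows):
    """For each (site, date, measurement) row, append the latest measurement
    at the same site dated at least one year earlier, or None if there is none."""
    result = []
    for site, day, value in rows:
        cutoff = one_year_before(day)
        older = [(d, m) for s, d, m in rows if s == site and d <= cutoff]
        previous = max(older)[1] if older else None
        result.append((site, day, value, previous))
    return result

rows = [
    ("AB1234", date(2022, 12, 9), 1),
    ("AB1234", date(2022, 6, 11), 2),
    ("AB1234", date(2019, 5, 22), 3),
    ("AB1234", date(2017, 1, 30), 4),
]
with_previous = previous_measurements(rows)
```

For the AB1234 rows this gives previous values 3, 3, 4 and None, matching the expected output in the question.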

CASE in WHERE Clause in Snowflake

I am trying to use a CASE statement within the WHERE clause in Snowflake, but I’m not quite sure how I should go about it.
What I’m trying to do is this: if the current month is January, the WHERE clause for the date should cover the start of the previous year up to today. Otherwise, it should cover the start of the current year up to today.
WHERE
CASE MONTH(CURRENT_DATE()) = 1 THEN DATE BETWEEN DATE_TRUNC(‘YEAR’, DATEADD(YEAR, -1, CURRENT_DATE())) AND CURRENT_DATE()
CASE MONTH(CURRENT_DATE()) != 1 THEN DATE BETWEEN DATE_TRUNC(‘YEAR’, CURRENT_DATE()) AND CURRENT_DATE()
END
Appreciate any help on this!
Use a CASE expression that returns -1 if the current month is January or 0 for any other month, so that you can get with DATEADD() a date of the previous or the current year to use in DATE_TRUNC():
WHERE DATE BETWEEN
DATE_TRUNC('YEAR', DATEADD(YEAR, CASE WHEN MONTH(CURRENT_DATE()) = 1 THEN -1 ELSE 0 END, CURRENT_DATE()))
AND
CURRENT_DATE()
I suspect that you don't even need to use CASE here:
WHERE
    (MONTH(CURRENT_DATE()) = 1 AND
     DATE BETWEEN DATE_TRUNC('YEAR', DATEADD(YEAR, -1, CURRENT_DATE())) AND
     CURRENT_DATE()) OR
    (MONTH(CURRENT_DATE()) != 1 AND
     DATE BETWEEN DATE_TRUNC('YEAR', CURRENT_DATE()) AND CURRENT_DATE())
The other answers are quite good, but the answer can be even simpler.
Here is a little table to break down what is happening:
select
row_number() over (order by null) - 1 as rn,
dateadd('day', rn * 5, date_trunc('year',current_date())) as pretend_current_date,
DATEADD(YEAR, -1, pretend_current_date) as pcd_sub1,
month(pretend_current_date) as pcd_month,
DATE_TRUNC(year, iff(pcd_month = 1, pcd_sub1, pretend_current_date)) as _from,
pretend_current_date as _to
from table(generator(ROWCOUNT => 30))
order by rn;
This shows:
RN PRETEND_CURRENT_DATE PCD_SUB1 PCD_MONTH _FROM _TO
0 2022-01-01 2021-01-01 1 2021-01-01 2022-01-01
1 2022-01-06 2021-01-06 1 2021-01-01 2022-01-06
2 2022-01-11 2021-01-11 1 2021-01-01 2022-01-11
3 2022-01-16 2021-01-16 1 2021-01-01 2022-01-16
4 2022-01-21 2021-01-21 1 2021-01-01 2022-01-21
5 2022-01-26 2021-01-26 1 2021-01-01 2022-01-26
6 2022-01-31 2021-01-31 1 2021-01-01 2022-01-31
7 2022-02-05 2021-02-05 2 2022-01-01 2022-02-05
8 2022-02-10 2021-02-10 2 2022-01-01 2022-02-10
9 2022-02-15 2021-02-15 2 2022-01-01 2022-02-15
10 2022-02-20 2021-02-20 2 2022-01-01 2022-02-20
11 2022-02-25 2021-02-25 2 2022-01-01 2022-02-25
12 2022-03-02 2021-03-02 3 2022-01-01 2022-03-02
13 2022-03-07 2021-03-07 3 2022-01-01 2022-03-07
14 2022-03-12 2021-03-12 3 2022-01-01 2022-03-12
15 2022-03-17 2021-03-17 3 2022-01-01 2022-03-17
16 2022-03-22 2021-03-22 3 2022-01-01 2022-03-22
17 2022-03-27 2021-03-27 3 2022-01-01 2022-03-27
18 2022-04-01 2021-04-01 4 2022-01-01 2022-04-01
19 2022-04-06 2021-04-06 4 2022-01-01 2022-04-06
20 2022-04-11 2021-04-11 4 2022-01-01 2022-04-11
21 2022-04-16 2021-04-16 4 2022-01-01 2022-04-16
22 2022-04-21 2021-04-21 4 2022-01-01 2022-04-21
23 2022-04-26 2021-04-26 4 2022-01-01 2022-04-26
24 2022-05-01 2021-05-01 5 2022-01-01 2022-05-01
25 2022-05-06 2021-05-06 5 2022-01-01 2022-05-06
26 2022-05-11 2021-05-11 5 2022-01-01 2022-05-11
27 2022-05-16 2021-05-16 5 2022-01-01 2022-05-16
28 2022-05-21 2021-05-21 5 2022-01-01 2022-05-21
29 2022-05-26 2021-05-26 5 2022-01-01 2022-05-26
Your logic asks: if the current date is in January, take the prior year and truncate to the year; otherwise take the current date and truncate to the year. The result is the start of a BETWEEN test.
This is the same as subtracting one month from the current date and truncating the result to the year.
Thus there is no need for any IFF or CASE:
WHERE date BETWEEN DATE_TRUNC(year, DATEADD(month,-1, CURRENT_DATE())) AND CURRENT_DATE()
And if you would like to drop some parentheses, CURRENT_DATE can be used if you leave it in upper case, so it can be even shorter:
WHERE date BETWEEN DATE_TRUNC(year, DATEADD(month,-1, CURRENT_DATE)) AND CURRENT_DATE
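The claimed equivalence can be checked with a short Python sketch that compares both ways of computing the range start for every day of 2022. The function names are hypothetical, and the month arithmetic only tracks the year, which is all that truncating to the year needs.

```python
from datetime import date, timedelta

def range_start_with_case(today):
    """January: start of the previous year. Any other month: start of the current year."""
    year = today.year - 1 if today.month == 1 else today.year
    return date(year, 1, 1)

def range_start_minus_one_month(today):
    """Step back one month, then truncate to the year."""
    # Count whole months since year 0, go back one month, and keep only the year.
    months = today.year * 12 + (today.month - 1) - 1
    return date(months // 12, 1, 1)

days_2022 = [date(2022, 1, 1) + timedelta(days=n) for n in range(365)]
mismatches = [
    d for d in days_2022
    if range_start_with_case(d) != range_start_minus_one_month(d)
]
```

An empty mismatches list means the two formulations agree on every day of the year tested.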

How can I join two tables on an ID and a DATE RANGE in SQL

I have 2 query result tables containing records for different assessments. There are RAssessments and NAssessments which make up a complete review.
The aim is to eventually determine which reviews were completed. I would like to join the two tables on the ID and on the date. However, the dates on which the two assessments were completed may not be identical and may be several days apart, and some IDs may have more RAssessments than NAssessments.
Therefore, I would like to join T1 on to T2 on ID & on T1Date(+ or - 7 days). There is no other way to match the two tables and to align the records other than using the date range, as this is a poorly designed database. I hope for some help with this as I am stumped.
Here is some sample data:
Table #1:
ID RAssessmentDate
1 2020-01-03
1 2020-03-03
1 2020-05-03
2 2020-01-09
2 2020-04-09
3 2022-07-21
4 2020-06-30
4 2020-12-30
4 2021-06-30
4 2021-12-30
Table #2:
ID NAssessmentDate
1 2020-01-07
1 2020-03-02
1 2020-05-03
2 2020-01-09
2 2020-07-06
2 2020-04-10
3 2022-07-21
4 2021-01-03
4 2021-06-28
4 2022-01-02
4 2022-06-26
I would like my end result table to look like this:
ID RAssessmentDate NAssessmentDate
1 2020-01-03 2020-01-07
1 2020-03-03 2020-03-02
1 2020-05-03 2020-05-03
2 2020-01-09 2020-01-09
2 2020-04-09 2020-04-10
2 NULL 2020-07-06
3 2022-07-21 2022-07-21
4 2020-06-30 NULL
4 2020-12-30 2021-01-03
4 2021-06-30 2021-06-28
4 2021-12-30 2022-01-02
4 NULL 2022-01-02
Try this (note that it pairs rows by position within each ID rather than by a date window):
SELECT
    COALESCE(a.ID, b.ID) ID,
    a.RAssessmentDate,
    b.NAssessmentDate
FROM (
    -- number each ID's rows in date order so the pairing is deterministic
    SELECT
        ROW_NUMBER() OVER (PARTITION BY ID ORDER BY RAssessmentDate) RowId, *
    FROM table1
) a
FULL OUTER JOIN (
    SELECT
        ROW_NUMBER() OVER (PARTITION BY ID ORDER BY NAssessmentDate) RowId, *
    FROM table2
) b ON a.ID = b.ID AND a.RowId = b.RowId
WHERE (a.RAssessmentDate BETWEEN '2020-01-01' AND '2022-01-02')
   OR (b.NAssessmentDate BETWEEN '2020-01-01' AND '2022-01-02')
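The question asks for a match on ID plus a date within 7 days. As a rough illustration of that matching rule (it is not the query above), here is a Python sketch that behaves like a full outer join on equal ID and a date difference of at most 7 days. The helper name join_within_days is hypothetical, and the rows are a subset of the sample data for IDs 2 and 4.

```python
from datetime import date

def join_within_days(r_rows, n_rows, max_days=7):
    """Full outer join of (id, date) rows on equal id and dates at most max_days apart."""
    result = []
    matched_n = set()
    for r_id, r_date in r_rows:
        hits = [
            (i, n_date)
            for i, (n_id, n_date) in enumerate(n_rows)
            if n_id == r_id and abs((n_date - r_date).days) <= max_days
        ]
        if not hits:
            # R row with no N row close enough: keep it with a missing N date.
            result.append((r_id, r_date, None))
        for i, n_date in hits:
            matched_n.add(i)
            result.append((r_id, r_date, n_date))
    # N rows that no R row matched: keep them with a missing R date.
    for i, (n_id, n_date) in enumerate(n_rows):
        if i not in matched_n:
            result.append((n_id, None, n_date))
    return result

r_rows = [
    (2, date(2020, 1, 9)), (2, date(2020, 4, 9)),
    (4, date(2020, 6, 30)), (4, date(2020, 12, 30)),
]
n_rows = [
    (2, date(2020, 1, 9)), (2, date(2020, 7, 6)), (2, date(2020, 4, 10)),
    (4, date(2021, 1, 3)),
]
joined = join_within_days(r_rows, n_rows)
```

If one assessment has several candidates inside the window, this sketch returns every pair, so a real query would still need a rule for picking the closest one.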

Select statement for overlapping dates

I need a SELECT query that returns the RoomIDs of rows in which the dates overlap each other. For example:
Client ID 10 and 6 arrive on different days, but they are assigned to the same room during their stay at the hotel.
RoomID ArrivalDate DepartureDate ClientID
2 2020-11-02 2021-11-10 10
2 2021-11-01 2021-11-11 6
4 2021-10-18 2021-10-20 4
4 2021-12-13 2021-12-21 11
4 2021-12-14 2021-12-21 12
8 2021-12-10 2021-12-19 8
9 2021-09-20 2021-09-25 2
9 2021-09-21 2021-09-25 1
9 2021-12-10 2021-12-15 7
10 2021-10-19 2021-10-26 5
11 2021-10-02 2021-10-10 3
11 2021-12-12 2021-12-18 9
12 2021-10-04 2021-10-09 2
CREATE DATABASE Hotel;
CREATE TABLE reservations (
    roomID INT NOT NULL,
    ArrivalDate DATE NOT NULL,
    DepartureDate DATE NOT NULL,
    clientID INT NOT NULL,
    PRIMARY KEY (roomID, ArrivalDate),
    CHECK (ArrivalDate <= DepartureDate)
);
I appreciate any help.
You can get overlaps using exists. Two stays overlap when each one starts before the other ends:
select t.*
from reservations t
where exists (select 1
              from reservations t2
              where t2.RoomID = t.RoomId and
                    t2.ClientID <> t.ClientId and
                    t2.ArrivalDate < t.DepartureDate and
                    t2.DepartureDate > t.ArrivalDate
             );
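The overlap test can be tried on a few of the sample rows with a small Python sketch. The function name overlapping is hypothetical; the condition mirrors the exists subquery, using strict comparisons so that a stay ending on the day another begins does not count as an overlap.

```python
from datetime import date

def overlapping(rows):
    """Return rows whose stay overlaps another client's stay in the same room.

    Each row is (room_id, arrival, departure, client_id).
    """
    return [
        a for a in rows
        if any(
            b[0] == a[0] and b[3] != a[3]      # same room, different client
            and b[1] < a[2] and b[2] > a[1]    # b arrives before a leaves, and leaves after a arrives
            for b in rows
        )
    ]

reservations = [
    (4, date(2021, 10, 18), date(2021, 10, 20), 4),
    (4, date(2021, 12, 13), date(2021, 12, 21), 11),
    (4, date(2021, 12, 14), date(2021, 12, 21), 12),
    (8, date(2021, 12, 10), date(2021, 12, 19), 8),
]
clients = [row[3] for row in overlapping(reservations)]
```

For these four rows, only clients 11 and 12 in room 4 are returned.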