Get data from exactly 30 days ago only in SQL (Big Query) - google-bigquery

In BigQuery I am trying to extract data from an exact date, 30 days ago, so that every day when I pull/refresh the data it is always 30 days ago - no more, no less. However, using the following, it pulls in two dates:
SELECT FORMAT_DATE("%Y-%m-%d", createddatetime1) as dated, brand, orderid
FROM TABLE
WHERE createddatetime1 BETWEEN TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY) AND TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 29 DAY)
I have tried different tactics, such as CONVERT and CAST, but I can't seem to pull data for one day only. createddatetime1 is formatted as "2022-08-02 23:53:57 UTC".
Example of the current output; you'll see two dates in there:
Row createddatetime1 brand orderid
1 2022-08-02 23:53:57 UTC ABC 1
2 2022-08-02 14:11:05 UTC ABC 2
3 2022-08-02 13:31:52 UTC ABC 3
4 2022-08-02 20:14:16 UTC ABC 4
5 2022-08-02 23:18:28 UTC ABC 5
6 2022-08-02 17:27:06 UTC ABC 6
7 2022-08-03 01:44:12 UTC ABC 7
8 2022-08-03 09:57:19 UTC ABC 8
9 2022-08-02 12:32:23 UTC ABC 9
10 2022-08-02 18:52:33 UTC ABC 10
Expected output:
Row createddatetime1 brand orderid
1 02/08/2022 ABC 1
2 02/08/2022 ABC 2
3 02/08/2022 ABC 3
4 02/08/2022 ABC 4
5 02/08/2022 ABC 5
6 02/08/2022 ABC 6
7 02/08/2022 ABC 7
8 02/08/2022 ABC 8
9 02/08/2022 ABC 9
10 02/08/2022 ABC 10

You're getting data for both dates because BETWEEN is inclusive on both boundaries, i.e. both the start and end values are included. You need to extract the date from the timestamp column and use equality to filter the required rows.
This should work:
where DATE(createddatetime1) = DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)

This should work:
SELECT
  DATE(createddatetime1) AS date, brand, orderid
FROM TABLE
WHERE DATE(createddatetime1) = CURRENT_DATE() - 30
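If you also want the dates rendered as in the expected output (dd/mm/yyyy), here is a sketch combining the filter with FORMAT_DATE, reusing the table and column names from the question:
SELECT
  FORMAT_DATE("%d/%m/%Y", DATE(createddatetime1)) AS dated,
  brand,
  orderid
FROM TABLE
WHERE DATE(createddatetime1) = DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)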

Related

Calculate year on year Run_time from cumulative Run_time

In SQL Server, I have maintained the following details.
S.No | Vehicle_ID | Start Date | Failed Date | Total Run years
-----+------------+------------+-------------+----------------
1    | 1          | 2011-01-01 | 2013-12-31  | 3
2    | 1          | 2014-01-01 | 2015-12-31  | 2
3    | 1          | 2016-01-01 | 2019-12-31  | 4
4    | 1          | 2020-01-01 | 2022-12-31  | 3
5    | 2          | 2011-01-01 | 2015-12-31  | 5
6    | 2          | 2016-01-01 | 2022-12-31  | 7
8    | 3          | 2013-01-01 | 2016-12-31  | 4
10   | 3          | 2017-01-01 | 2021-12-31  | 5
I would like to calculate year on year Run_time from cumulative Run_time
Required result like this:
Year | Run Years
-----+----------
2011 | 2
2012 | 2
2013 | 3
2014 | 3
2015 | 3
2016 | 3
2017 | 3
2018 | 3
2019 | 3
2020 | 3
2021 | 3
2022 | 2
The Year column's first year is the MIN year from Start Date.
The Year column's last year is the MAX year from Failed Date.
First of all, we can generate the list of relevant years using GENERATE_SERIES:
SELECT value [Year]
FROM GENERATE_SERIES(
(SELECT YEAR(MIN(StartDate)) FROM Cars),
(SELECT YEAR(MAX(EndDate)) FROM Cars)
)
After this we can JOIN the YEARS table with our data, group it by Year and get the count:
WITH YEARS AS (
SELECT value [Year]
FROM GENERATE_SERIES(
(SELECT YEAR(MIN(StartDate)) FROM Cars),
(SELECT YEAR(MAX(EndDate)) FROM Cars)
)
) SELECT [Year], COUNT(*)
FROM YEARS
JOIN Cars ON [Year] BETWEEN YEAR(StartDate) AND YEAR(EndDate)
GROUP BY [Year]
You can test the above queries at:
https://sqlize.online/sql/mssql2022/9ae8b354190c9cf2d19739bf9b9a9816/
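Note that GENERATE_SERIES is only available from SQL Server 2022 onwards. On earlier versions, a recursive CTE can produce the same list of years; a rough sketch under that assumption, reusing the same Cars table and column names:
WITH YEARS AS (
    SELECT YEAR(MIN(StartDate)) AS [Year], YEAR(MAX(EndDate)) AS MaxYear
    FROM Cars
    UNION ALL
    SELECT [Year] + 1, MaxYear
    FROM YEARS
    WHERE [Year] < MaxYear
)
SELECT [Year], COUNT(*) AS [Run Years]
FROM YEARS
JOIN Cars ON [Year] BETWEEN YEAR(StartDate) AND YEAR(EndDate)
GROUP BY [Year]
OPTION (MAXRECURSION 0); -- lift the default 100-level recursion limit for very wide year ranges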

SQL lag to row which meets condition

I have a table which contains measures taken on random dates, partitioned by the site at which they were taken.
site   | date       | measurement
-------+------------+------------
AB1234 | 2022-12-09 | 1
AB1234 | 2022-06-11 | 2
AB1234 | 2019-05-22 | 3
AB1234 | 2017-01-30 | 4
CD5678 | 2022-11-01 | 5
CD5678 | 2020-04-10 | 6
CD5678 | 2017-04-10 | 7
CD5678 | 2017-01-22 | 8
In order to calculate year-on-year growth, I want to have an additional field for each record which contains the previous measurement at that site. The challenging part is that I only want the previous measurement which occurred more than a year in the past.
Like so:
site   | date       | measurement | previous_measurement
-------+------------+-------------+---------------------
AB1234 | 2022-12-09 | 1           | 3
AB1234 | 2022-06-11 | 2           | 3
AB1234 | 2019-05-22 | 3           | 4
AB1234 | 2017-01-30 | 4           | NULL
CD5678 | 2022-11-01 | 5           | 6
CD5678 | 2020-04-10 | 6           | 7
CD5678 | 2017-04-10 | 7           | NULL
CD5678 | 2017-01-22 | 8           | NULL
It feels like it should be possible with a window function, but I can't work it out.
Please help :(
Amazon Athena engine version 3 is based on Trino. If it has incorporated full support for the RANGE frame type for window functions, you can use that:
-- sample data
with dataset(site, date, measurement) as (
values ('AB1234', date '2022-12-09', 1),
('AB1234', date '2022-06-11', 2),
('AB1234', date '2019-05-22', 3),
('AB1234', date '2017-01-30', 4),
('CD5678', date '2022-11-01', 5),
('CD5678', date '2020-04-10', 6),
('CD5678', date '2017-04-10', 7),
('CD5678', date '2017-01-22', 8)
)
-- query
select *,
last_value(measurement) over (
partition by site
order by date
RANGE BETWEEN UNBOUNDED PRECEDING AND interval '1' year PRECEDING)
from dataset;
Output:
site   | date       | measurement | _col3
-------+------------+-------------+------
CD5678 | 2017-01-22 | 8           | NULL
CD5678 | 2017-04-10 | 7           | NULL
CD5678 | 2020-04-10 | 6           | 7
CD5678 | 2022-11-01 | 5           | 6
AB1234 | 2017-01-30 | 4           | NULL
AB1234 | 2019-05-22 | 3           | 4
AB1234 | 2022-06-11 | 2           | 3
AB1234 | 2022-12-09 | 1           | 3
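To get the column name and row order shown in the expected output, you can alias the window expression and add an ORDER BY; a small variation of the same query (the alias name is just illustrative):
select site, date, measurement,
       last_value(measurement) over (
           partition by site
           order by date
           RANGE BETWEEN UNBOUNDED PRECEDING AND interval '1' year PRECEDING
       ) as previous_measurement
from dataset
order by site, date desc;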

How to Create table with Dates in range defined by table with start date inputs

I am trying to create a dates table in SQL based on a set of inputs, but I haven't been able to figure it out.
I am receiving inputs in SQL as below:
This table:
Date       | Value
-----------+------
2022-01-01 | 5
2022-07-12 | 10
2022-11-15 | 3
A Start Date = 2022-01-01
A Stop Date = 2022-12-01
I need to get a table as below, starting from the Start Date until the Stop Date, assigning the corresponding value from the initial table to each date in that period:
Date       | Value
-----------+------
2022-01-01 | 5
2022-01-02 | 5
2022-01-03 | 5
2022-01-04 | 5
...        | 5
2022-07-09 | 5
2022-07-10 | 5
2022-07-11 | 5
2022-07-12 | 10
2022-07-13 | 10
2022-07-14 | 10
...        | 10
2022-11-13 | 10
2022-11-14 | 10
2022-11-15 | 3
2022-11-16 | 3
2022-11-17 | 3
2022-11-18 | 3
How can I do that?
Thanks.
Using the window function lead() over() in concert with an ad-hoc tally table
Example
Select Date = dateadd(DAY,N,A.Date)
,A.Value
From (
Select *
,nDays = datediff(DAY,Date,lead(Date,1,dateadd(day,1,'2022-12-01')) over (order by date))
From YourTable
) A
Join ( Select Top 1000 N=-1+Row_Number() Over (Order By (Select NULL)) From master..spt_values n1, master..spt_values n2 ) B
on N<NDays
Order by Date
Results
Date Value
2022-01-01 5
2022-01-02 5
2022-01-03 5
2022-01-04 5
2022-01-05 5
...
2022-07-10 5
2022-07-11 5
2022-07-12 10
2022-07-13 10
2022-07-14 10
...
2022-11-12 10
2022-11-13 10
2022-11-14 10
2022-11-15 3
2022-11-16 3
2022-11-17 3
...
2022-11-30 3
2022-12-01 3
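If you are on SQL Server 2022 or later, GENERATE_SERIES can stand in for the ad-hoc spt_values tally table; a sketch under that assumption, reusing the same YourTable and stop date:
Select Date = dateadd(DAY, N.value, A.Date)
      ,A.Value
From (
      Select *
            ,nDays = datediff(DAY, Date, lead(Date, 1, dateadd(day, 1, '2022-12-01')) over (order by Date))
      From YourTable
     ) A
Join GENERATE_SERIES(0, 1000) N on N.value < A.nDays
Order by Date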

Get all dates for all date ranges in table using SQL Server

I have a table dbo.WorkSchedules(Id, From, To) where I store date ranges for work schedules. I want to create a view that will have all dates for all rows of WorkSchedules, so that I have one view with all dates for all schedules.
On the web I only found solutions for a single row, i.e. two parameters, start and end. My issue is different: I have multiple rows, each with a start and end of a range.
Example:
WorkSchedules
Id | From | To
---+------------+-----------
1 | 2018-01-01 | 2018-01-05
2 | 2018-01-08 | 2018-01-12
Desired result
1 | 2018-01-01
2 | 2018-01-02
3 | 2018-01-03
4 | 2018-01-04
5 | 2018-01-05
6 | 2018-01-08
7 | 2018-01-09
8 | 2018-01-10
9 | 2018-01-11
10| 2018-01-12
If you are regularly dealing with "jobs" and "schedules" then I propose that you need a permanent calendar table (a table where each row is a unique date). You can create rows for dates dynamically, but why do this many times when you can do it once and just reuse it?
A calendar table, even one spanning several decades, isn't "big", and when indexed it can be very fast as well. You can also store information about holidays and/or fiscal periods etc.
There are many scripts available to produce these tables, here's an answer with 2 scripts on this site: https://stackoverflow.com/a/5635628/2067753
Assuming you use the second (more comprehensive) script, then you can exclude weekends, or other conditions such as holidays, from query results.
Once you have a permanent Calendar table this style of query may be used:
CREATE TABLE WorkSchedules(
Id INTEGER NOT NULL PRIMARY KEY
,[From] DATE NOT NULL
,[To] DATE NOT NULL
);
INSERT INTO WorkSchedules(Id,[From],[To]) VALUES (1,'2018-01-01','2018-01-05');
INSERT INTO WorkSchedules(Id,[From],[To]) VALUES (2,'2018-01-08','2018-01-12');
with range as (
select min(ws.[From]) as dt_from, max(ws.[To]) dt_to
from WorkSchedules as ws
)
select c.*
from calendar as c
inner join range on c.date between range.dt_from and range.dt_to
where c.KindOfDay = 'BANKDAY'
order by c.date
and the result looks like this (note: "New Year's Day" has been excluded)
Date Year Quarter Month Week Day DayOfYear Weekday Fiscal_Year Fiscal_Quarter Fiscal_Month KindOfDay Description
---- --------------------- ------ --------- ------- ------ ----- ----------- --------- ------------- ---------------- -------------- ----------- -------------
1 02.01.2018 00:00:00 2018 1 1 1 2 2 2 2018 1 1 BANKDAY NULL
2 03.01.2018 00:00:00 2018 1 1 1 3 3 3 2018 1 1 BANKDAY NULL
3 04.01.2018 00:00:00 2018 1 1 1 4 4 4 2018 1 1 BANKDAY NULL
4 05.01.2018 00:00:00 2018 1 1 1 5 5 5 2018 1 1 BANKDAY NULL
5 08.01.2018 00:00:00 2018 1 1 2 8 8 1 2018 1 1 BANKDAY NULL
6 09.01.2018 00:00:00 2018 1 1 2 9 9 2 2018 1 1 BANKDAY NULL
7 10.01.2018 00:00:00 2018 1 1 2 10 10 3 2018 1 1 BANKDAY NULL
8 11.01.2018 00:00:00 2018 1 1 2 11 11 4 2018 1 1 BANKDAY NULL
9 12.01.2018 00:00:00 2018 1 1 2 12 12 5 2018 1 1 BANKDAY NULL
Without the where clause the full range is:
Date Year Quarter Month Week Day DayOfYear Weekday Fiscal_Year Fiscal_Quarter Fiscal_Month KindOfDay Description
---- --------------------- ------ --------- ------- ------ ----- ----------- --------- ------------- ---------------- -------------- ----------- ----------------
1 01.01.2018 00:00:00 2018 1 1 1 1 1 1 2018 1 1 HOLIDAY New Year's Day
2 02.01.2018 00:00:00 2018 1 1 1 2 2 2 2018 1 1 BANKDAY NULL
3 03.01.2018 00:00:00 2018 1 1 1 3 3 3 2018 1 1 BANKDAY NULL
4 04.01.2018 00:00:00 2018 1 1 1 4 4 4 2018 1 1 BANKDAY NULL
5 05.01.2018 00:00:00 2018 1 1 1 5 5 5 2018 1 1 BANKDAY NULL
6 06.01.2018 00:00:00 2018 1 1 1 6 6 6 2018 1 1 SATURDAY NULL
7 07.01.2018 00:00:00 2018 1 1 1 7 7 7 2018 1 1 SUNDAY NULL
8 08.01.2018 00:00:00 2018 1 1 2 8 8 1 2018 1 1 BANKDAY NULL
9 09.01.2018 00:00:00 2018 1 1 2 9 9 2 2018 1 1 BANKDAY NULL
10 10.01.2018 00:00:00 2018 1 1 2 10 10 3 2018 1 1 BANKDAY NULL
11 11.01.2018 00:00:00 2018 1 1 2 11 11 4 2018 1 1 BANKDAY NULL
12 12.01.2018 00:00:00 2018 1 1 2 12 12 5 2018 1 1 BANKDAY NULL
and weekends and holidays may be excluded using the column KindOfDay
See this as a demonstration (with build of calendar table) here: http://rextester.com/CTSW63441
OK, I worked this out for you, assuming you meant 2018-01-08 as the From date in the second row.
/*WorkSchedules
Id| From | To
1 | 2018-01-01 | 2018-01-05
2 | 2018-01-08 | 2018-01-12
*/
--DROP TABLE #WorkSchedules;
CREATE TABLE #WorkSchedules (
ID int,
[DateFrom] DATE,
[DateTo] DATE
)
INSERT INTO #WorkSchedules
SELECT 1, '2018-01-01', '2018-01-05'
UNION
SELECT 2, '2018-01-08', '2018-01-12'
;WITH CTEDATELIMITS AS (
SELECT [DateFrom], [DateTo]
FROM #WorkSchedules
)
,CTEDATES AS
(
SELECT [DateFrom] as [DateResult] FROM CTEDATELIMITS
UNION ALL
SELECT DATEADD(Day, 1, [DateResult]) FROM CTEDATES
JOIN CTEDATELIMITS ON CTEDATES.[DateResult] >= CTEDATELIMITS.[DateFrom]
AND CTEDATES.dateResult < CTEDATELIMITS.[DateTo]
)
SELECT [DateResult] FROM CTEDATES
ORDER BY [DateResult]
You would use a recursive CTE:
with dates as (
select from, to, from as date
from WorkSchedules
union all
select from, to, dateadd(day, 1, date)
from dates
where date < to
)
select row_number() over (order by date), date
from dates;
Note that from and to are reserved words in SQL. They are lousy names for identifiers. I have not escaped them because I assume they are not the actual names of the columns.
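If the columns really are named From and To, they would need to be escaped, for example with square brackets; the same query with escaping, purely as an illustration:
with dates as (
      select [from], [to], [from] as date
      from WorkSchedules
      union all
      select [from], [to], dateadd(day, 1, date)
      from dates
      where date < [to]
)
select row_number() over (order by date) as Id, date
from dates
option (maxrecursion 0); -- lift the default 100-level recursion limit for longer ranges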

How do I aggregate hourly data to daily, weekly, monthly and yearly values in PostgreSQL?

I have a table my_data in my PostgreSQL 9.5 database containing hourly data. The sample data is like:
ID Date hour value
1 01/01/2014 1 9.947484
2 01/01/2014 2 9.161652
3 01/01/2014 3 8.509986
4 01/01/2014 4 7.666654
5 01/01/2014 5 7.110822
6 01/01/2014 6 6.765822
7 01/01/2014 7 6.554989
8 01/01/2014 8 6.574156
9 01/01/2014 9 6.09499
10 01/01/2014 10 8.471653
11 01/01/2014 11 11.36581
12 01/01/2014 12 11.25081
13 01/01/2014 13 9.391651
14 01/01/2014 14 6.976655
15 01/01/2014 15 6.574156
16 01/01/2014 16 6.420823
17 01/01/2014 17 6.229156
18 01/01/2014 18 5.577491
19 01/01/2014 19 4.964159
20 01/01/2014 20 6.593323
21 01/01/2014 21 7.321654
22 01/01/2014 22 9.295818
23 01/01/2014 23 8.241653
24 01/01/2014 24 7.014989
25 02/01/2014 1 6.842489
26 02/01/2014 2 7.513321
27 02/01/2014 3 7.244988
28 02/01/2014 4 5.80749
29 02/01/2014 5 5.481658
30 02/01/2014 6 6.669989
.. .. .. ..
and so on. The data exists for many years in the same manner. The structure of the above table is: ID (integer serial not null), Date (date) (mm/dd/yyyy), hour (integer), value (numeric). For a large set of data like the above, how do I find daily, weekly, monthly and yearly averages in PostgreSQL?
You use aggregation. For instance:
select date, avg(value)
from t
group by date
order by date;
For the rest, use date_trunc():
select date_trunc('month', date) as yyyymm, avg(value)
from t
group by yyyymm
order by yyyymm;
This assumes that date is stored as a date data type. If it is stored as a string you should fix the data type in your data. You can convert it to a date using to_date().
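Weekly and yearly averages follow the same date_trunc() pattern; a sketch assuming the same table t and a proper date column:
select date_trunc('week', date) as week_start, avg(value)
from t
group by week_start
order by week_start;

select date_trunc('year', date) as year_start, avg(value)
from t
group by year_start
order by year_start;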