We are trying to find a syntax to generate the DAY|WEEK|MONTH options from the 3rd param of date functions.
DECLARE var_date_option STRING DEFAULT 'DAY';
select GENERATE_DATE_ARRAY('2019-01-01','2020-01-01',INTERVAL 1 WEEK)
dynamic param here -^^^
Do you know the proper syntax to use in DECLARE so that the variable is converted to valid SQL?
Below is for BigQuery Standard SQL
Those DAY|WEEK|MONTH are LITERALs and cannot be parametrized
And, as you know - dynamic SQL is also not available yet
So, unfortunately below is the only solution I can think of as of today
#standardSQL
DECLARE var_date_option STRING DEFAULT 'DAY';
DECLARE start_date, end_date DATE;
DECLARE date_array ARRAY<DATE>;
SET (start_date, end_date, var_date_option) = ('2019-01-01','2020-01-01', 'MONTH');
SET date_array = (
SELECT CASE var_date_option
WHEN 'DAY' THEN GENERATE_DATE_ARRAY(start_date, end_date, INTERVAL 1 DAY)
WHEN 'WEEK' THEN GENERATE_DATE_ARRAY(start_date, end_date, INTERVAL 1 WEEK)
WHEN 'MONTH' THEN GENERATE_DATE_ARRAY(start_date, end_date, INTERVAL 1 MONTH)
END
);
SELECT * FROM UNNEST(date_array) AS date_dt;
I'm trying to loop over a query result and combine the result.
I want to loop over the variable called rolling date, which gives out an array of dates with 30 day difference.
DECLARE rollingdate ARRAY<DATE>;
SET rollingdate = ( GENERATE_DATE_ARRAY(CURRENT_DATE(), DATE_SUB(CURRENT_DATE(), INTERVAL 365 DAY), INTERVAL -30 DAY) );
My table is partitioned by DATE, and I'd like to loop over two consecutive dates from the rolling date and union all the results
select *, rollingdate[0]
from table
where date > rollingdate[1] and date < rollingdate[0]
union all
select *, rollingdate[1]
from table
where date > rollingdate[2] and date < rollingdate[1]
How do I achieve this in BigQuery? I tried with BigQuery scripts, but they don't take subqueries.
You can try using EXECUTE IMMEDIATE:
DECLARE i INT64 DEFAULT 1;
DECLARE dsql STRING DEFAULT '';
DECLARE rollingdate ARRAY<DATE>;
SET rollingdate = ( GENERATE_DATE_ARRAY(CURRENT_DATE(), DATE_SUB(CURRENT_DATE(), INTERVAL 365 DAY), INTERVAL -30 DAY) );
WHILE i <= 2
DO
SET dsql = dsql || " select *, '" || CAST(rollingdate[ORDINAL(i)] AS STRING) || "' from table where date > '" || CAST(rollingdate[ORDINAL(i+1)] AS STRING) || "' and date < '" || CAST(rollingdate[ORDINAL(i)] AS STRING) || "' union all";
SET i = i + 1;
END WHILE;
SET dsql = SUBSTR(dsql, 1, LENGTH(dsql) - LENGTH(' union all'));
EXECUTE IMMEDIATE dsql;
An approach like this should be ok:
with
dates as (
select * from unnest(generate_date_array(current_date, date_sub(current_date(), interval 365 DAY), interval -30 day)) as date
),
date_start_end as (
select lag(date,1) over(order by date asc) as begin_date, date as end_date
from dates
)
select table.*, end_date
from table
inner join date_start_end on date between begin_date and end_date
You might need to make adjustments depending on whether your date ranges are meant to be inclusive or exclusive.
As mentioned in my comment, any date in table will only be in 1 of your rollingdate intervals. SQL is much more performant when you can do operations on a set and avoid loops.
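To see which intervals the dates / date_start_end CTEs produce, here is a rough sketch of the same pairing logic in Python rather than SQL, only so it can be run outside the database. The function names and the fixed "today" are made up for illustration.

```python
from datetime import date, timedelta


def rolling_dates(today: date, span_days: int = 365, step_days: int = 30) -> list[date]:
    """Dates from today back to today - span_days, stepping by step_days (newest first)."""
    out = []
    d = today
    stop = today - timedelta(days=span_days)
    while d >= stop:
        out.append(d)
        d -= timedelta(days=step_days)
    return out


def date_start_end(dates: list[date]) -> list[tuple[date, date]]:
    """Pair each date with the previous one in ascending order, like LAG(date, 1)."""
    asc = sorted(dates)
    # The earliest date has no predecessor (NULL begin_date in SQL), so it yields no pair.
    return list(zip(asc, asc[1:]))


# Hypothetical fixed "today" so the result is reproducible.
pairs = date_start_end(rolling_dates(date(2020, 6, 1)))
```

Each pair is one (begin_date, end_date) interval that a table row can be joined against.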
How can I convert a date integer to a date type? (20200531 into 5/31/2020)
My current table has a datadate formatted as YYYYMMDD (20200531, 20200430, etc.)
The datatype for the datadate is an int according to the Toad Data Point software I'm using. I believe it's using an Oracle SQL database.
As a result, when querying this data, I have to type in the where clause as below..
where datadate = '20200531'
My goal is to convert this integer datadate into a date format (5/31/2020) so I can apply the datadate to the where clause.
like..
WHERE datadate = dateadd(DD, -1, CAST(getdate() as date))
(Read below for my answer for if it's an int column)
Assuming it's a textual string:
Assuming that datadate is a string (character, text, etc) column and not a date/datetime/datetime2/datetimeoffset column, then use the CONVERT function with a style number. The undashed yyyyMMdd layout corresponds to style 112 (ISO); style 23 is the dashed yyyy-MM-dd form.
This page has a reference of style numbers: https://learn.microsoft.com/en-us/sql/t-sql/functions/cast-and-convert-transact-sql?view=sql-server-ver15
SELECT
*
FROM
(
SELECT
myTable.*,
CONVERT( date, datadate, 112 ) AS valueAsDate
FROM
myTable
) AS q
WHERE
q.valueAsDate = DATEADD( dd, -1, CAST( GETDATE() AS date ) )
Assuming it's an actual int column:
The quick-and-dirty way is to convert the int to varchar and then use the same code as above as if it were a textual field - but don't do this because it's slow:
SELECT
*
FROM
(
SELECT
myTable.*,
CONVERT( char(8), datadate ) AS valueAsChar,
CONVERT( date, CONVERT( char(8), datadate ), 112 ) AS valueAsDate
FROM
myTable
) AS q
WHERE
q.valueAsDate = DATEADD( dd, -1, CAST( GETDATE() AS date ) )
Assuming it's an actual int column (better answer):
We'll need to use DATEFROMPARTS and extract each component using Base-10 arithmetic (fun)!
If we have an integer representing a formatted date (the horror) such as 20200531 then:
We can get the day by performing MOD 100 (e.g. 19950707 MOD 100 == 7)
We can get the month by first dividing by 100 to remove the day part, and then MOD 100: (e.g. 20200531 / 100 == 202005, 202005 MOD 100 == 5)
We can get the year by dividing by 10,000, (e.g. 20200531 / 10000 == 2020).
Btw:
SQL Server uses % for the Modulo operator instead of MOD.
Integer division causes truncation rather than producing decimal or floating-point values (e.g. 5 / 2 == 2 and not 2.5).
Like so:
SELECT
q2.*
FROM
(
SELECT
q.*,
DATEFROMPARTS( q.[Year], q.MonthOfYear, q.DayOfMonth ) AS valueAsDate
FROM
(
SELECT
myTable.*,
( datadate % 100 ) AS DayOfMonth,
( ( datadate / 100 ) % 100 ) AS MonthOfYear,
( datadate / 10000 ) AS [Year]
FROM
myTable
) AS q
) AS q2
WHERE
q2.valueAsDate = DATEADD( dd, -1, CAST( GETDATE() AS date ) )
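If you want to sanity-check the base-10 arithmetic outside the database, here is a small sketch in Python (not T-SQL). It assumes the day is the last two decimal digits, the month the two before that, and the year the rest. The function names are made up for illustration.

```python
from datetime import date
from typing import Optional


def split_int_date(value: int) -> tuple[int, int, int]:
    """Split a yyyymmdd integer into (year, month, day) with integer arithmetic."""
    day = value % 100              # last two digits
    month = (value // 100) % 100   # drop the day, then take the last two digits
    year = value // 10000          # drop month and day
    return year, month, day


def int_to_date(value: int) -> Optional[date]:
    """Return the date for a yyyymmdd integer, or None if it is not a real date."""
    year, month, day = split_int_date(value)
    try:
        return date(year, month, day)
    except ValueError:
        return None
```

For example, split_int_date(20200531) gives (2020, 5, 31), and int_to_date(20209900) gives None because there is no 99th month.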
Obviously, having two nested subqueries is a pain to work with (SQL has terrible ergonomics, I don't understand how or why SQL doesn't allow expressions in a SELECT clause to be used by other expressions in the same query) - but we can convert this to a scalar UDF (SQL Server 2019 and later can inline many scalar UDFs, so the performance impact should be small).
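This function returns NULL if you process an invalid value like 20209900 (which isn't a real date as there isn't a 99th month with a 0th day in 2020). T-SQL does not allow TRY...CATCH inside a user-defined function, so the function checks the components itself before calling DATEFROMPARTS.
CREATE FUNCTION dbo.convertHorribleIntegerDate( @value int ) RETURNS date AS
BEGIN
DECLARE @dayOfMonth int = @value % 100;
DECLARE @monthOfYear int = ( @value / 100 ) % 100;
DECLARE @year int = @value / 10000;
-- Reject out-of-range components first so the DATEFROMPARTS calls below cannot raise an error.
IF @year NOT BETWEEN 1 AND 9999 OR @monthOfYear NOT BETWEEN 1 AND 12 OR @dayOfMonth < 1
RETURN NULL;
-- Reject days past the end of the month (e.g. 20200231).
IF @dayOfMonth > DAY( EOMONTH( DATEFROMPARTS( @year, @monthOfYear, 1 ) ) )
RETURN NULL;
-- DATEFROMPARTS takes its arguments in year, month, day order.
RETURN DATEFROMPARTS( @year, @monthOfYear, @dayOfMonth );
END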
Which we can use in a query like so:
SELECT
myTable.*,
dbo.convertHorribleIntegerDate( datadate ) AS valueAsDate
FROM
myTable
As SELECT cannot share expression results with other expressions in the same query, you'll still need to use an outer query to work with valueAsDate (or repeat the dbo.convertHorribleIntegerDate function call):
SELECT
*
FROM
(
SELECT
myTable.*,
dbo.convertHorribleIntegerDate( datadate ) AS valueAsDate
FROM
myTable
) AS q
WHERE
q.valueAsDate = DATEADD( dd, -1, CAST( GETDATE() AS date ) )
This answer assumes that you are running Oracle, as suggested in your question.
How can I convert a date integer to a date type? (20200531 into 5/31/2020)
In Oracle, you use to_date() to convert a string to a date. If you give it a number, it is implicitly converted to a string first. So in both cases, you would do:
to_date(datadate, 'yyyymmdd')
My goal is to convert this integer datadate into a date format (5/31/2020) so I can apply the datadate to the where clause.
Generally, you want to avoid applying a function to a column in a where predicate: it is not efficient, because the database needs to apply the function to every row before it is able to filter (and it typically cannot use a plain index on the column). If you want to filter on datadate as of yesterday, then I would recommend computing yesterday's date and putting it in the same format as the column that is filtered, so you can do a direct match against the existing column values.
If your column is a string:
where datadate = to_char(sysdate - 1, 'yyyymmdd')
If it's a number:
where datadate = to_number(to_char(sysdate - 1, 'yyyymmdd'))
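The idea of "format yesterday like the column, then compare directly" can be sketched outside the database too. This is Python, not Oracle SQL, and the function name is made up for illustration. It takes "today" as a parameter so the result is reproducible.

```python
from datetime import date, timedelta


def yesterday_as_int(today: date) -> int:
    """Return the day before `today` as a yyyymmdd integer, e.g. 20200531."""
    yesterday = today - timedelta(days=1)
    return int(yesterday.strftime("%Y%m%d"))
```

The value it returns is what the number predicate would be compared against, so the stored column is never passed through a function.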
Business days are Monday through Friday.
Given I have a datetime field scheduled_for, how can I find the next business date and return that in a column alias?
I've tried something from another SO answer but it doesn't work as intended.
EXTRACT(ISODOW FROM v.scheduled_for)::integer) % 7 as next_business_day,
Error:
Query 1 ERROR: ERROR: syntax error at or near ")"
LINE 3: EXTRACT(ISODOW FROM v.scheduled_for)::integer % 7) as next...
^
Edit:
Thanks for the suggestions, I've attempted this:
SELECT
v.id AS visit_id,
(IF extract(''dow'' from v.scheduled_for) = 0 THEN
return v.scheduled_for + 1::integer;
ELSIF extract(''dow'' from v.scheduled_for) = 6 THEN
return v.scheduled_for - 1::integer;
ELSE
return v.scheduled_for;
) as next_business_day,
'' as invoice_ref_code,
The error I get is:
Query 1 ERROR: ERROR: syntax error at or near ")"
LINE 1: ) as next_business_day,
^
To generalize you need to create a function to calculate the next business day from a given date.
create or replace function utl_next_business_day(date_in date default current_date)
returns date
language sql immutable leakproof strict
as $$
with cd as (select extract(isodow from date_in)::integer d)
select case when d between 1 and 4
then date_in + 1
else date_in + 1 + (7-d)
end
from cd;
$$;
--- any single date
select current_date, utl_next_business_day();
-- over time span (short)
select gdate::date for_date, utl_next_business_day(gdate::date) next_business_day
from generate_series( current_date, current_date + 14, interval '1 day') gdate;
-- around year end over a time span
with test_date (dt) as
( values (date '2019-12-31')
, (date '2020-12-31'), (date '2021-12-31'),(date '2022-12-31')
, (date '2021-01-01'), (date '2022-01-01'),(date '2023-01-01')
)
select dt, utl_next_business_day(dt) from test_date
order by dt;
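For anyone who wants to check the weekday arithmetic without a database, the same rule can be sketched in Python. This is an illustration of the logic only, not a replacement for the SQL function. Python's isoweekday() uses the same numbering as isodow (1 = Monday, 7 = Sunday).

```python
from datetime import date, timedelta


def next_business_day(d: date) -> date:
    """Next Monday-to-Friday date strictly after d (holidays are not considered)."""
    dow = d.isoweekday()  # 1 = Monday ... 7 = Sunday, same as isodow
    if 1 <= dow <= 4:
        # Monday to Thursday: the next day is a business day.
        return d + timedelta(days=1)
    # Friday, Saturday or Sunday: jump to the following Monday.
    return d + timedelta(days=1 + (7 - dow))
```

For example, Friday 2021-12-31 maps to Monday 2022-01-03.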
Alternatively, with the calendar table suggestion from @Eric, we get:
-- create and populate work table
create table bus_day_calendar ( bus_day date);
insert into bus_day_calendar (bus_day)
select utl_next_business_day(gdate::date)
from generate_series( date '2018-12-31', date '2023-01-01', interval '1 day') gdate
where extract(isodow from gdate)::integer not in (6,7) ;
--- Function to return next business day
create or replace function utl_next_cal_business_day(date_in date default current_date)
returns date
language sql stable leakproof strict
as $$
select min(bus_day)
from bus_day_calendar
where bus_day > date_in;
$$;
--- any single date
select current_date, utl_next_cal_business_day();
-- over time span (short)
select gdate::date for_date, utl_next_cal_business_day(gdate::date) next_business_day
from generate_series( current_date, current_date + 14, interval '1 day') gdate;
-- around year end over a time span
with test_date (dt) as
( values (date '2019-12-31')
, (date '2020-12-31'), (date '2021-12-31'),(date '2022-12-31')
, (date '2021-01-01'), (date '2022-01-01'),(date '2023-01-01')
)
select dt, utl_next_cal_business_day(dt) from test_date
order by dt;
Neither of these, as they currently stand, handles a non-business day that falls on Mon-Fri (e.g. a public holiday), but both can be modified to do so. Since the calendar table requires only deleting rows, I think it becomes the superior method if this is necessary.
I am working with BigQuery scripting, I have written a simple WHILE loop which iterates through daily Google Analytics tables and sums the visits, now I'd like to write these results out to a table.
I've gotten as far as creating the table, but I can't capture the value of visits from my SQL query to populate the table. Date works fine, because it is defined outside of the SQL. I tried to DECLARE the value of visits with a new variable, but again this does not work because it's not known outside of the statement.
SET vis = visits;
How can I correctly write my results out to a table?
DECLARE d DATE DEFAULT DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY);
DECLARE pfix STRING DEFAULT REGEXP_REPLACE(CAST(d AS STRING),"-","");
DECLARE vis INT64;
CREATE OR REPLACE TABLE test.looped_results (Date DATE, Visits INT64);
WHILE d > '2019-10-01' DO
SELECT d, SUM(totals.visits) AS visits
FROM `project.dataset.ga_sessions_*`
WHERE _table_suffix = pfix
GROUP BY Date;
SET d = DATE_SUB(d, INTERVAL 1 DAY);
SET vis = visits;
INSERT INTO test.looped_results VALUES (d, visits);
END WHILE;
Update: I also tried an alternative solution, assigning visits to its own variable, but this produces the same error:
WHILE d > '2019-10-01' DO
SET vis_count = (SELECT SUM(totals.visits) AS visits
FROM `mindful-agency-136314.43786551.ga_sessions_*`
WHERE _table_suffix = pfix);
INSERT INTO test.looped_results VALUES (d, vis_count);
SET d = DATE_SUB(d, INTERVAL 1 DAY);
END WHILE;
Results:
In my results I see the correct number of rows created, with the correct dates, but the value of visits for each is the value for the most recent day.
I would also move the INSERT INTO outside of the WHILE loop by collecting the results into an array variable (along with a few other minor changes), as in the example below:
DECLARE d DATE DEFAULT DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY);
DECLARE pfix STRING;
DECLARE vis_count INT64;
DECLARE result ARRAY<STRUCT<vis_date DATE, vis_count INT64>> DEFAULT [];
CREATE OR REPLACE TABLE test.looped_results (Date DATE, Visits INT64);
WHILE d > '2019-10-01' DO
SET pfix = REGEXP_REPLACE(CAST(d AS STRING),"-","");
SET vis_count = (SELECT SUM(totals.visits) AS visits
FROM `project.dataset.ga_sessions_*`
WHERE _table_suffix = pfix);
SET result = ARRAY_CONCAT(result, [STRUCT(d, vis_count)]);
SET d = DATE_SUB(d, INTERVAL 1 DAY);
END WHILE;
INSERT INTO test.looped_results SELECT * FROM UNNEST(result);
Note: I hope your example is for learning scripting and not for production; whenever possible, we should stick with set-based processing, which can easily be done in your case.
Here is a better way, which is faster and does not use a loop.
Basically, you form an array of suffixes and do the SELECT/INSERT in a single query:
DECLARE date_range ARRAY<DATE> DEFAULT
GENERATE_DATE_ARRAY(DATE '2019-10-01', DATE '2019-10-10', INTERVAL 1 DAY);
DECLARE suffix_array ARRAY<STRING>
DEFAULT (SELECT ARRAY_AGG(REGEXP_REPLACE(CAST(dates AS STRING),"-",""))
FROM UNNEST(date_range) dates);
CREATE OR REPLACE TABLE test.looped_results (Date DATE, Visits INT64);
INSERT INTO test.looped_results
SELECT PARSE_DATE('%Y%m%d', date), SUM(totals.visits) -- assumes the GA export's date column is a YYYYMMDD string
FROM `project.dataset.ga_sessions_*`
WHERE _table_suffix IN UNNEST(suffix_array)
GROUP BY 1;
Actually, you need to update the pfix variable inside the loop. Also, it is a good idea to initialize visits. Finally, your query doesn't need a GROUP BY if you are already restricting it to a single table with the pfix constraint.
This should do it:
DECLARE d DATE DEFAULT DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY);
DECLARE pfix STRING DEFAULT REGEXP_REPLACE(CAST(d AS STRING),'-','');
DECLARE visits int64;
SET visits = 0;
CREATE OR REPLACE TABLE project.dataset.looped_results (Date DATE, Visits INT64);
WHILE d > '2019-10-01' DO
SET visits = (SELECT SUM(totals.visits) FROM `project.dataset.ga_sessions_*` WHERE _table_suffix = pfix);
-- Insert before d is decremented, so each row pairs the date with its own visits.
INSERT INTO project.dataset.looped_results VALUES (d, visits);
SET d = DATE_SUB(d, INTERVAL 1 DAY);
SET pfix = REGEXP_REPLACE(CAST(d AS STRING),"-","");
END WHILE;
Hope it helps.
Having reviewed my code (several times!) I realized that, within the loop, I wasn't refreshing the variable which transforms the date into the table suffix.
Here is a working version of the script, where I set pfix at the end of the loop:
DECLARE d DATE DEFAULT DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY);
DECLARE pfix STRING DEFAULT REGEXP_REPLACE(CAST(d AS STRING),"-","");
DECLARE vis_count INT64;
CREATE OR REPLACE TABLE test.looped_results (Date DATE, Visits INT64);
WHILE d > '2019-10-01' DO
SET vis_count = (SELECT SUM(totals.visits) AS visits
FROM `project.dataset.ga_sessions_*`
WHERE _table_suffix = pfix);
INSERT INTO test.looped_results VALUES (d, vis_count);
SET d = DATE_SUB(d, INTERVAL 1 DAY);
SET pfix = REGEXP_REPLACE(CAST(d AS STRING),"-","");
END WHILE;
I would actually not use a while loop but rather a group by
SELECT date, SUM(totals.visits) AS visits
FROM `mindful-agency-136314.43786551.ga_sessions_*`
GROUP BY Date;
It will give you your results as the table that you want, you don't need to loop on your table.
Depending on your setup, you can also schedule the query to run every day, so that each new day's values are picked up.
I need to calculate the number of days between dates as detailed below using MSSQL
Each month should be considered as if it has 30 days (even if it doesn't)
The difference between 2 January, 2013 to 2 March, 2013 will be
(30-2) + 30 + 2 days
where (30-2) will be for January
30 will be for February
2 will be for March
create or replace function datediff( p_what in varchar2,
p_d1 in date,
p_d2 in date ) return number
as
l_result number;
begin
select (p_d2-p_d1) *
decode( upper(p_what),
'DAY', 1, 'SS', 24*60*60, 'MI', 24*60, 'HH', 24, NULL )
into l_result from dual;
return l_result;
end;
/
This is what I do in Oracle (Courtesy: ASKTOM).
I get either days, hours, minutes or seconds in difference.
In MS SQL, either
PRINT DATEDIFF(DAY, '1/1/2011', '3/1/2011')
This gives the number of times the midnight boundary is crossed between the two dates. You may need to add one to this if you're including both dates in the count, or subtract one if you don't want to include either date.
OR
DECLARE @startdate datetime2 = '2007-05-05 12:10:09.3312722';
DECLARE @enddate datetime2 = '2009-05-04 12:10:09.3312722';
SELECT DATEDIFF(day, @startdate, @enddate);
Using this you can manipulate.
Looks like you want to get a result similar to Oracle's MONTHS_BETWEEN in SQL Server.
This is a SQL function I wrote in Teradata; you probably just have to change EXTRACT to YEAR/MONTH/DAY(date).
REPLACE FUNCTION MONTHS_BETWEEN(date1 DATE, date2 DATE)
RETURNS FLOAT
SPECIFIC months_between_DT
RETURNS NULL ON NULL INPUT
CONTAINS SQL
DETERMINISTIC
COLLATION INVOKER
INLINE TYPE 1
RETURN
(EXTRACT(YEAR FROM date1) * 12 + EXTRACT(MONTH FROM date1))
- (EXTRACT(YEAR FROM date2) * 12 + EXTRACT(MONTH FROM date2))
+ CASE
WHEN EXTRACT(MONTH FROM date2) <> EXTRACT(MONTH FROM date2+1) AND
EXTRACT(MONTH FROM date1) <> EXTRACT(MONTH FROM date1+1)
THEN 0
ELSE (CAST(1 AS FLOAT))/31 * (EXTRACT(DAY FROM date1) - EXTRACT(DAY FROM date2))
END
;
Then you simply multiply the result * 30 and cast it to an INT.
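As a cross-check on the "every month has 30 days" rule from the question, here is a rough sketch in Python (not T-SQL or Teradata). The function name is made up, and clamping day 31 down to 30 is my own assumption, since the question doesn't say how the 31st should count. It does not treat the last day of February as day 30.

```python
from datetime import date


def days_between_30_day_months(start: date, end: date) -> int:
    """Day difference where every month counts as 30 days and every year as 360."""
    # Assumption: a day-of-month of 31 is treated as 30.
    d1 = min(start.day, 30)
    d2 = min(end.day, 30)
    return (
        (end.year - start.year) * 360
        + (end.month - start.month) * 30
        + (d2 - d1)
    )
```

For 2 January 2013 to 2 March 2013 this gives 60, matching (30-2) + 30 + 2 from the question.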