Issue with DATE_SUB and BETWEEN functions - google-bigquery

I came across this issue while creating a parameterized query. The intent of the query is to pull the past 5 months of data (excluding the current month) based on the date passed as a variable. The basic schema of Table A is as follows:
as_of_date Y X
2019-12-31 1 AB
2019-11-30 2 CD
2019-10-31 3 EF
2019-09-30 4 GH
2019-08-31 5 MN
2019-07-31 6 XYZ
2020-01-31 7 PQR
2020-02-29 8 AAA
Following is the query I wrote:
WITH date AS (
  SELECT CAST("2020-02-29" AS DATE) AS run_date
)
SELECT DISTINCT CAST(a.as_of_date AS DATE) AS as_of_date
FROM A AS a
WHERE CAST(a.as_of_date AS DATE)
  BETWEEN DATE_SUB((SELECT run_date FROM date), INTERVAL 5 MONTH)
      AND DATE_SUB((SELECT run_date FROM date), INTERVAL 1 MONTH)
This query runs fine when run_date is set to "2020-01-31" and returns the past 5 months of data, i.e. December, November, October, September, and August. But when run_date is set to "2020-02-29", it returns only 4 months of data.

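The upper bound is the problem: DATE_SUB(DATE "2020-02-29", INTERVAL 1 MONTH) returns 2020-01-29, so the month-end row 2020-01-31 falls outside the BETWEEN range. A simple fix is to apply DATE_TRUNC(..., MONTH) to both sides so that whole months are compared, as in the example below: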
WITH date_cte AS (
  SELECT CAST("2020-02-29" AS DATE) AS run_date
)
SELECT DISTINCT CAST(a.as_of_date AS DATE) AS as_of_date
FROM `project.dataset.tableA` AS a
WHERE DATE_TRUNC(CAST(a.as_of_date AS DATE), MONTH)
  BETWEEN DATE_TRUNC(DATE_SUB((SELECT run_date FROM date_cte), INTERVAL 5 MONTH), MONTH)
      AND DATE_TRUNC(DATE_SUB((SELECT run_date FROM date_cte), INTERVAL 1 MONTH), MONTH)
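To see the lost month outside BigQuery, here is a small Python sketch of the two filters. `sub_months` is a hypothetical helper that mimics BigQuery's documented DATE_SUB behavior for MONTH (when the target month is shorter, the day is clamped to its last day); `rows` are the as_of_date values from Table A above.

```python
import calendar
from datetime import date

def sub_months(d: date, n: int) -> date:
    """Hypothetical helper: subtract n months, clamping the day to the target month's length."""
    total = d.year * 12 + (d.month - 1) - n
    year, month0 = divmod(total, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(d.day, last_day))

def month_start(d: date) -> date:
    """Equivalent of DATE_TRUNC(d, MONTH)."""
    return d.replace(day=1)

# as_of_date values from Table A
rows = [date(2019, 12, 31), date(2019, 11, 30), date(2019, 10, 31), date(2019, 9, 30),
        date(2019, 8, 31), date(2019, 7, 31), date(2020, 1, 31), date(2020, 2, 29)]

def naive_filter(run_date: date) -> list:
    """Original query: compare raw dates with BETWEEN."""
    lo, hi = sub_months(run_date, 5), sub_months(run_date, 1)
    return sorted(d for d in rows if lo <= d <= hi)

def truncated_filter(run_date: date) -> list:
    """Fixed query: compare month starts on both sides of BETWEEN."""
    lo = month_start(sub_months(run_date, 5))
    hi = month_start(sub_months(run_date, 1))
    return sorted(d for d in rows if lo <= month_start(d) <= hi)

print(len(naive_filter(date(2020, 1, 31))), len(naive_filter(date(2020, 2, 29))))  # 5 4
print(len(truncated_filter(date(2020, 2, 29))))  # 5
```

The naive version drops 2020-01-31 because its upper bound is 2020-01-29; after truncation, every January row compares equal to the 2020-01-01 bound.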

Related

Get count of day types between two dates

I am trying to get the count of each day of the week between two dates, but I have not found a solution in BigQuery Standard SQL. I have tried the BigQuery date function DATE_DIFF(date_expression_a, date_expression_b, date_part) following published examples, but it did not give me the result I need.
For example, I have two dates 2021-02-13 and 2021-03-31 and my desired outcome would be:
MON TUE WED THUR FRI SAT SUN
6 6 6 6 7 7 7
Consider the approach below:
with your_table as (
select date
from unnest(generate_date_array("2021-02-13", "2021-03-30")) AS date
)
select * from your_table
pivot (count(*) for format_date('%a', date) in ('Mon','Tue','Wed','Thu','Fri','Sat','Sun'))
with output
Or you can simply do:
select
format_date('%a', date) day_of_week,
count(*) counts
from your_table
group by day_of_week
with output
You can do the following:
SELECT
  -- DAYOFWEEK returns 1 for Sunday through 7 for Saturday
  CASE EXTRACT(DAYOFWEEK FROM dates)
    WHEN 1 THEN 'SUN'
    WHEN 2 THEN 'MON'
    WHEN 3 THEN 'TUE'
    WHEN 4 THEN 'WED'
    WHEN 5 THEN 'THU'
    WHEN 6 THEN 'FRI'
    WHEN 7 THEN 'SAT'
  END AS day_of_week,
  COUNT(*) AS day_count
FROM UNNEST(GENERATE_DATE_ARRAY("2021-02-13", "2021-03-30")) AS dates
GROUP BY 1
The important part is the GENERATE_DATE_ARRAY function, which returns every date between the two dates you're interested in (both endpoints included). UNNEST then returns one row per date instead of one row holding the whole array.
From there, you can extract the day of the week with the BigQuery date functions and count the occurrences with GROUP BY day_of_week.
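The generate-then-count idea does not depend on BigQuery. This Python sketch mirrors it with the same date range as the queries above. Note that Python's `date.weekday()` numbers Monday as 0, unlike BigQuery's DAYOFWEEK (Sunday = 1), so the lookup list here starts at Monday.

```python
from collections import Counter
from datetime import date, timedelta

DAY_NAMES = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]  # index = date.weekday()

def generate_date_array(start: date, end: date) -> list:
    """Every date from start to end inclusive, like GENERATE_DATE_ARRAY with a 1-day step."""
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]

def count_by_weekday(start: date, end: date) -> Counter:
    """Equivalent of GROUP BY day_of_week with COUNT(*)."""
    return Counter(DAY_NAMES[d.weekday()] for d in generate_date_array(start, end))

counts = count_by_weekday(date(2021, 2, 13), date(2021, 3, 30))
for name in DAY_NAMES:
    print(name, counts[name])
```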
The above query gives the following result:

Big Query (SQL) Add one month to the date (Issue) - (Data Studio)

Currently, I'm adding an extra month this way:
DATE_ADD(date, INTERVAL 1 MONTH) AS pDate
I'm trying to compare two values month over month using the same date range, so I made another custom field with the date + 1 month. When I use it, days that fall on the 31st are missing.
For example, June 30 + 1 month = July 30, so no date maps to July 31, and because I use the custom field, rows dated the 31st are missing.
EDITING v1.0
I have a table covering one year, with one row per day.
For example:
01012018
....
31012018
01022018
...
28022018
I need to compare two time periods. To do that, I created a custom field that takes the date and adds 1 month, so that in Data Studio (or any other platform) I can compare 25-30 January with 25-30 February. The issue is that adding 1 month to 30012018 would have to give 30022018, which, as you know, does not exist.
I'm sticking with this idea for now, but are there other ways of doing this? To repeat: I need to compare the same days from different months, e.g. 15-20 January with 15-20 February, but again the problem appears around the 30th and 31st.
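The gaps come from how month arithmetic must handle short months. BigQuery documents that DATE_ADD with MONTH clamps the day to the last day of the target month when the original day does not exist there. The Python sketch below reproduces that rule with a hypothetical `add_months` helper to show which dates collide and which are never produced.

```python
import calendar
from datetime import date

def add_months(d: date, n: int) -> date:
    """Hypothetical helper: add n months, clamping the day to the target month's length."""
    total = d.year * 12 + (d.month - 1) + n
    year, month0 = divmod(total, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(d.day, last_day))

def month_days(year: int, month: int) -> list:
    """All dates of one month."""
    n = calendar.monthrange(year, month)[1]
    return [date(year, month, day) for day in range(1, n + 1)]

# January 30 and 31 both land on February 28 (2018 is not a leap year)
print(add_months(date(2018, 1, 30), 1), add_months(date(2018, 1, 31), 1))

# Shifting June by one month never produces July 31
shifted_june = {add_months(d, 1) for d in month_days(2018, 6)}
print(date(2018, 7, 31) in shifted_june)  # False
```

Because of this, a shifted date field cannot line up every day of two months; filtering both periods by day of month avoids the shift entirely.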
I need to compare two time period ...
Unfortunately, your question still does not show exactly what your use case is, so below is an attempt to give you an idea based on a generalization of what I see in the question.
In the query below (for BigQuery Standard SQL):
project.dataset.table is your real table with the dates and metrics you want to compare.
days_range and months_range let you set the range of days and months, respectively, without making any changes to the main SELECT statement.
#standardSQL
WITH days_range AS (
SELECT 15 start_day, 20 end_day
), months_range AS (
SELECT 1 start_month, 4 end_month
)
SELECT
CONCAT(CAST(MIN(day_date) AS STRING), ' - ', CAST(MAX(day_date) AS STRING)) interval_days,
SUM(metric) interval_metric
FROM `project.dataset.table`, days_range, months_range
WHERE EXTRACT(DAY FROM day_date) BETWEEN start_day AND end_day
AND EXTRACT(MONTH FROM day_date) BETWEEN start_month AND end_month
GROUP BY DATE_TRUNC(day_date, MONTH)
-- ORDER BY 1
To play with the query above, you can use the script below, which mimics your real table by generating every day of 2018 along with random metrics:
#standardSQL
WITH `project.dataset.table` AS (
SELECT day_date, CAST(100 * RAND() AS INT64) metric
FROM UNNEST(GENERATE_DATE_ARRAY('2018-01-01', '2018-12-31')) day_date
), days_range AS (
SELECT 15 start_day, 20 end_day
), months_range AS (
SELECT 1 start_month, 4 end_month
)
SELECT
CONCAT(CAST(MIN(day_date) AS STRING), ' - ', CAST(MAX(day_date) AS STRING)) interval_days,
SUM(metric) interval_metric
FROM `project.dataset.table`, days_range, months_range
WHERE EXTRACT(DAY FROM day_date) BETWEEN start_day AND end_day
AND EXTRACT(MONTH FROM day_date) BETWEEN start_month AND end_month
GROUP BY DATE_TRUNC(day_date, MONTH)
ORDER BY 1
with a result like the following (the metrics come from RAND(), so your numbers will differ):
Row interval_days interval_metric
1 2018-01-15 - 2018-01-20 244
2 2018-02-15 - 2018-02-20 235
3 2018-03-15 - 2018-03-20 204
4 2018-04-15 - 2018-04-20 355
If you want to check how the same script behaves for months with 28, 30, and 31 days, try the following:
#standardSQL
WITH `project.dataset.table` AS (
SELECT day_date, CAST(100 * RAND() AS INT64) metric
FROM UNNEST(GENERATE_DATE_ARRAY('2018-01-01', '2018-12-31')) day_date
), days_range AS (
SELECT 25 start_day, 31 end_day
), months_range AS (
SELECT 1 start_month, 4 end_month
)
SELECT
CONCAT(CAST(MIN(day_date) AS STRING), ' - ', CAST(MAX(day_date) AS STRING)) interval_days,
SUM(metric) interval_metric
FROM `project.dataset.table`, days_range, months_range
WHERE EXTRACT(DAY FROM day_date) BETWEEN start_day AND end_day
AND EXTRACT(MONTH FROM day_date) BETWEEN start_month AND end_month
GROUP BY DATE_TRUNC(day_date, MONTH)
ORDER BY 1
with a result like:
Row interval_days interval_metric
1 2018-01-25 - 2018-01-31 364
2 2018-02-25 - 2018-02-28 227
3 2018-03-25 - 2018-03-31 311
4 2018-04-25 - 2018-04-30 308
Hope this will help you to move forward
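The WHERE/GROUP BY logic of these queries can be sketched in Python to show why no date shifting is needed: rows are filtered by day of month and month, then grouped by month start. The data here is hypothetical (every day of 2018 with a metric of 1), so each total is simply the number of days kept.

```python
from datetime import date, timedelta

def interval_metrics(rows, start_day, end_day, start_month, end_month):
    """Keep rows whose day of month and month fall in the given ranges,
    group them by month (like DATE_TRUNC(day_date, MONTH)), and sum the metric."""
    groups = {}
    for day_date, metric in rows:
        if start_day <= day_date.day <= end_day and start_month <= day_date.month <= end_month:
            groups.setdefault(day_date.replace(day=1), []).append((day_date, metric))
    result = []
    for month in sorted(groups):
        days = [d for d, _ in groups[month]]
        total = sum(m for _, m in groups[month])
        result.append((f"{min(days)} - {max(days)}", total))
    return result

# Hypothetical data: every day of 2018 with a constant metric of 1
rows = [(date(2018, 1, 1) + timedelta(days=i), 1) for i in range(365)]
for interval, total in interval_metrics(rows, 25, 31, 1, 4):
    print(interval, total)
```

With the 25-31 window, February simply stops at 2018-02-28 (total 4), the same month-end behavior the SQL result shows.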

How do I compare a current partial month vs a previous partial month with postgres?

I'm building some basic reports and I want to see if I'm on track to surpass last month's metrics without waiting for the month to end. Basically I want to compare June 1 (start of current month) through June 23 (current_date) against May 1 (start of previous month) through May 23 (current_date - 1 month).
My goal is to show a count of distinct users that did event1 and event2.
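It can help to pin down the two date ranges before writing any SQL. Here is a Python sketch of the ranges described above. Clamping the previous period's end day when the previous month is shorter (e.g. March 31 compared with February 28) is an assumption about the desired behavior.

```python
import calendar
from datetime import date, timedelta

def mtd_ranges(today: date):
    """Return ((current_start, current_end), (previous_start, previous_end)), all inclusive."""
    current_start = today.replace(day=1)
    previous_start = (current_start - timedelta(days=1)).replace(day=1)
    # Same day of month in the previous month, clamped if that month is shorter
    last_day = calendar.monthrange(previous_start.year, previous_start.month)[1]
    previous_end = previous_start.replace(day=min(today.day, last_day))
    return (current_start, today), (previous_start, previous_end)

print(mtd_ranges(date(2017, 6, 23)))  # June 1-23 vs May 1-23
```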
Here's what I have so far:
CREATE VIEW events AS
(SELECT *
FROM public.event
WHERE TYPE in ('event1',
'event2')
AND created_at > now() - interval '1 months' );
CREATE VIEW MAU AS
(SELECT EXTRACT(DOW
FROM created_at) AS month,
DATE_TRUNC('week', created_at) AS week,
COUNT(*) AS total_engagement,
COUNT(DISTINCT user_id) AS total_users
FROM events
GROUP BY 2,
1
ORDER BY week DESC);
SELECT month,
week,
SUM(total_engagement) OVER (PARTITION BY month
ORDER BY week) AS total_engagment
FROM MAU
ORDER BY 1 DESC,
2
Here's an example of what that returns:
Month Week Unique Engagement
6 2017-05-22 00:00:00 165
6 2017-05-29 00:00:00 355
6 2017-06-05 00:00:00 572
6 2017-06-12 00:00:00 723
5 2017-05-22 00:00:00 757
5 2017-05-29 00:00:00 1549
5 2017-06-05 00:00:00 2394
5 2017-06-12 00:00:00 3261
5 2017-06-19 00:00:00 3592
Expected return
Month Day Total Engagement
6 1 50
6 2 100
6 3 180
5 1 89
5 2 213
5 3 284
5 4 341
Can you point out where I've got this wrong or if there's an easier way to do it?
You are confusing days, weeks, and months in your question, but from the expected output I assume that you want the month number, the week number within the month, and a count for each of those pairs.
SELECT
month,
week,
count(*) as total_engagement
FROM (
SELECT
extract(month from created_at) as month,
extract('day' from date_trunc('week', created_at::date) -
date_trunc('week', date_trunc('month', created_at::date))) / 7 + 1 as week
FROM public.event
WHERE type IN ('event1', 'event2')
AND created_at > now() - interval '1 month'
) t
GROUP BY 1,2
The most interesting part is probably getting the week number within a month; for that, you can check this answer.
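The week-number expression in the answer can be checked with a Python sketch: it takes the Monday of the row's week, subtracts the Monday of the week containing the 1st of the month, and turns the day difference into a 1-based week index. Postgres `date_trunc('week', ...)` starts weeks on Monday, which matches Python's `date.weekday()` (Monday = 0).

```python
from datetime import date, timedelta

def week_start(d: date) -> date:
    """Monday of d's week, like date_trunc('week', d) in Postgres."""
    return d - timedelta(days=d.weekday())

def week_of_month(d: date) -> int:
    """1-based week index within d's month; the partial first week counts as week 1."""
    first_week = week_start(d.replace(day=1))
    return (week_start(d) - first_week).days // 7 + 1

for d in (date(2017, 6, 1), date(2017, 6, 5), date(2017, 6, 23)):
    print(d, d.month, week_of_month(d))
```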

Group by Week Number and Week Ending

I am trying to write a SQL query that groups by week number and shows the week-ending date rather than the week-start date, but so far I have not been able to achieve this. How can I do this?
select extract(week from actual_sale_date) as week_number,
to_char(date_trunc('week', actual_sale_date), 'MM/dd/yyyy') as date, count(*)
from data
where project_id = 'ABC'
and actual_sale_date >= date_trunc('year',current_date)
group by rollup( (actual_sale_date))
Result:
week_number date count
1 01/02/2017 2
1 01/02/2017 1
2 01/09/2017 1
2 01/09/2017 1
2 01/09/2017 1
3 01/16/2017 3
3 01/16/2017 1
10
Requested:
week_number week_ending count
1 01/08/2017 3
2 01/15/2017 3
3 01/22/2017 4
10
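You were grouping by actual_sale_date, so the rows for a week weren't being aggregated into one row per week. To get the week-ending date, add 6 days to the start of the week: in Postgres, date_trunc('week', ...) returns the Monday of the ISO week, so adding 6 days gives the Sunday that ends it. Then use the week number and the week-ending date in the ROLLUP.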
select extract(week from actual_sale_date) as week_number,
to_char(date_trunc('week', actual_sale_date) + interval '6' day,'MM/dd/yyyy'),
count(*)
from data
where project_id = 'ABC'
and actual_sale_date >= date_trunc('year',current_date)
group by rollup((extract(week from actual_sale_date)
,to_char(date_trunc('week', actual_sale_date) + interval '6' day,'MM/dd/yyyy')))
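The same grouping can be sketched in Python to check the week-ending arithmetic: the week start is the Monday (as with Postgres date_trunc('week')), the week ending is that Monday plus 6 days, and `isocalendar()[1]` gives the ISO week number that Postgres EXTRACT(week ...) returns. The sale dates below are hypothetical.

```python
from collections import Counter
from datetime import date, timedelta

def week_ending(d: date) -> date:
    """Sunday that ends d's Monday-based week: week start + 6 days."""
    return d - timedelta(days=d.weekday()) + timedelta(days=6)

def weekly_counts(sale_dates):
    """Count rows per (ISO week number, week-ending date) plus a grand total, like ROLLUP."""
    counts = Counter((d.isocalendar()[1], week_ending(d).strftime("%m/%d/%Y")) for d in sale_dates)
    return sorted(counts.items()), sum(counts.values())

# Hypothetical sale dates
sales = [date(2017, 1, 2), date(2017, 1, 5), date(2017, 1, 8), date(2017, 1, 10)]
rows, total = weekly_counts(sales)
for (week_number, ending), count in rows:
    print(week_number, ending, count)
print(total)
```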

Postgres group by timestamp into 6 hourly buckets

I have the following simple table:
ID TIMESTAMP VALUE
4 2011-05-27 15:50:04 1253
5 2011-05-27 15:55:02 1304
6 2011-05-27 16:00:02 1322
7 2011-05-27 16:05:01 1364
I would like to average the VALUEs and group each TIMESTAMP day into 6-hour buckets, e.g. 00:00 to 06:00, 06:00 to 12:00, 12:00 to 18:00, and 18:00 to 00:00.
I am able to group by year, month, day & hour using the following query:
select avg(VALUE),
EXTRACT(year from TIMESTAMP) AS year,
EXTRACT(month from TIMESTAMP) AS month,
EXTRACT(day from TIMESTAMP) as day
from TABLE
group by year,month,day
But I am unable to group each day into the 4 periods defined above. Any help is most welcome.
Grouping by the integer part of the quotient (hour of your timestamp / 6) should do it. Try it and see if it helps.
Your group by should be something like
group by year, month, day, trunc(EXTRACT(hour from TIMESTAMP) / 6)
The logic behind this is that when the hour part of the timestamp is divided by 6, the integer part can only be:
0 - 0:00 - 5:59:59
1 - 6:00 - 11:59:59
2 - 12:00 - 17:59:59
3 - 18:00 - 23:59:59
Grouping using this should put your data into 4 groups per day, which is what you need.
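The hour / 6 idea can be sketched in Python using the sample rows from the question: integer division plays the role of trunc(EXTRACT(hour ...) / 6), and averaging per (day, bucket) mirrors the GROUP BY.

```python
from collections import defaultdict
from datetime import datetime

def bucket_averages(rows):
    """Average VALUE per (day, 6-hour bucket); bucket = hour // 6 gives 0..3."""
    groups = defaultdict(list)
    for ts, value in rows:
        groups[(ts.date(), ts.hour // 6)].append(value)
    return {key: sum(vals) / len(vals) for key, vals in sorted(groups.items())}

rows = [
    (datetime(2011, 5, 27, 15, 50, 4), 1253),
    (datetime(2011, 5, 27, 15, 55, 2), 1304),
    (datetime(2011, 5, 27, 16, 0, 2), 1322),
    (datetime(2011, 5, 27, 16, 5, 1), 1364),
]
for (day, bucket), avg in bucket_averages(rows).items():
    print(day, f"{bucket * 6:02d}:00-{bucket * 6 + 6:02d}:00", avg)
```

All four sample rows fall between 12:00 and 18:00 (bucket 2), so they average into a single row.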