Big Query (SQL) Add one month to the date (Issue) - (Data Studio) - sql

Currently, I'm adding an extra month this way:
DATE_ADD(date, INTERVAL 1 MONTH) AS pDate
I'm trying to compare two values by month, by using the same date range. So I made another custom field with date +1 month and when I use it...it missing days with 31.
June 30 +1 month = July 31 and as I using custom field... it missing fields where date with 31
EDITING v1.0
I have a database for a year, each day is presented.
As example:
01012018
....
31012018
01022018
...
28022018
I need to compare two time period and to solve this issue I create a custom field which takes the date and add +1 month, so after in Data studio (could be in any over platform) can compare 25-30January with 25-30February, the issue is when I add 1 month to the date 30012018 it becomes 30022018 (WHich as you know does not exist)
Anyway, I'm sticking with this idea, but maybe there are any other ways of doing this? WIll repeat again, I need to compare the same date but from different month - 15th January -20 January WITH 15th February - 20th February, but again issue where 30th-31st appears

I need to compare two time period ...
Unfortunately, you question still does not show what exactly your use case - so below is attempt to give you an idea based on generalization of what I see in question
In below (for BigQuery Standard SQL):
the project.dataset.table is your real table with dates and metrics you want to compare.
days_range and months_range - allow you to set range or days and months respectively without doing any changes in main SELECT statement
#standardSQL
WITH days_range AS (
SELECT 15 start_day, 20 end_day
), months_range AS (
SELECT 1 start_month, 4 end_month
)
SELECT
CONCAT(CAST(MIN(day_date) AS STRING), ' - ', CAST(MAX(day_date) AS STRING)) interval_days,
SUM(metric) interval_metric
FROM `project.dataset.table`, days_range, months_range
WHERE EXTRACT(DAY FROM day_date) BETWEEN start_day AND end_day
AND EXTRACT(MONTH FROM day_date) BETWEEN start_month AND end_month
GROUP BY DATE_TRUNC(day_date, MONTH)
-- ORDER BY 1
To play with above you can use below script that mimics your real table by generating days for year of 2018 along with random metrics
#standardSQL
WITH `project.dataset.table` AS (
SELECT day_date, CAST(100 * RAND() AS INT64) metric
FROM UNNEST(GENERATE_DATE_ARRAY('2018-01-01', '2018-12-31')) day_date
), days_range AS (
SELECT 15 start_day, 20 end_day
), months_range AS (
SELECT 1 start_month, 4 end_month
)
SELECT
CONCAT(CAST(MIN(day_date) AS STRING), ' - ', CAST(MAX(day_date) AS STRING)) interval_days,
SUM(metric) interval_metric
FROM `project.dataset.table`, days_range, months_range
WHERE EXTRACT(DAY FROM day_date) BETWEEN start_day AND end_day
AND EXTRACT(MONTH FROM day_date) BETWEEN start_month AND end_month
GROUP BY DATE_TRUNC(day_date, MONTH)
ORDER BY 1
with result as
Row interval_days interval_metric
1 2018-01-15 - 2018-01-20 244
2 2018-02-15 - 2018-02-20 235
3 2018-03-15 - 2018-03-20 204
4 2018-04-15 - 2018-04-20 355
if you want to check how same script will 'behave' for 28-30-31 days - try below
#standardSQL
WITH `project.dataset.table` AS (
SELECT day_date, CAST(100 * RAND() AS INT64) metric
FROM UNNEST(GENERATE_DATE_ARRAY('2018-01-01', '2018-12-31')) day_date
), days_range AS (
SELECT 25 start_day, 31 end_day
), months_range AS (
SELECT 1 start_month, 4 end_month
)
SELECT
CONCAT(CAST(MIN(day_date) AS STRING), ' - ', CAST(MAX(day_date) AS STRING)) interval_days,
SUM(metric) interval_metric
FROM `project.dataset.table`, days_range, months_range
WHERE EXTRACT(DAY FROM day_date) BETWEEN start_day AND end_day
AND EXTRACT(MONTH FROM day_date) BETWEEN start_month AND end_month
GROUP BY DATE_TRUNC(day_date, MONTH)
ORDER BY 1
with result
Row interval_days interval_metric
1 2018-01-25 - 2018-01-31 364
2 2018-02-25 - 2018-02-28 227
3 2018-03-25 - 2018-03-31 311
4 2018-04-25 - 2018-04-30 308
Hope this will help you to move forward

Related

Issue with Date_SUB And BETWEEN function

I came across this issue while creating a parametrised query; Then intent of the query is to pull past 5 months (excluding current month) data based on the date passed as a variable. Basic table schema for Table A is as follows:
as_of_date Y X
2019-12-31 1 AB
2019-11-30 2 CD
2019-10-31 3 EF
2019-09-30 4 GH
2019-08-31 5 MN
2019-07-31 6 XYZ
2020-01-31 7 PQR
2020-02-29 8 AAA
Following is the query I wrote:
WITH
date
AS
(
SELECT CAST("2020-02-29" AS Date) as run_date
)
SELECT DISTINCT CAST(a.as_of_date AS DATE) as_of_date,
FROM A as a
WHERE CAST(a.as_of_date AS DATE) BETWEEN DATE_SUB((SELECT run_date FROM date), INTERVAL 5 Month) AND DATE_SUB((SELECT run_date FROM date), INTERVAL 1 Month)
This query runs fine when run_date is set to "2020-01-31" and returns past 5 months data i.e. Dec, Nov, October, Sept, and August. But fails when date is set to "2020-02-29" it only returns 4 months data.
Simple "fix" is to add DATE_TRUNC(..., MONTH) as in below example
SELECT DISTINCT CAST(a.as_of_date AS DATE) as_of_date,
FROM `project.dataset.tableA` AS a
WHERE DATE_TRUNC(CAST(a.as_of_date AS DATE), MONTH)
BETWEEN DATE_TRUNC(DATE_SUB((SELECT run_date FROM date_cte), INTERVAL 5 Month), MONTH)
AND DATE_TRUNC(DATE_SUB((SELECT run_date FROM date_cte), INTERVAL 1 Month), MONTH)

Max value by ID, date and last x days

Supposed I have a table :
---------------
id | date | value
------------------
1 | Jan 1 | 10
1 | Jan 2 | 12
1 | Jan 3 | 11
2 | Jan 4 | 11
I need to get the max and median value of each id, each date, each for the past 90 days. Im using query :
select id, date, value
max(value) over (partition by id, date) as max_date,
median(value) over (partition by id, date) as med_date
from table
where date > date - interval '90 days'
I tried to export the data and check manually but the result is not correct. Any thing I missed? thanks
expected output is to get maximum value of since the last 90 days. for example the date is April 5th, then it will find the maximum value from Jan 5th (the last 90 days) until April 5th. and then the date moves to April 6th, then it will do again for jan 6th until April 6h and so on for each ID
So im assuming u can get several values for same ID and Date and right ? otherwise partitioning for both id and date makes no sense
SELECT id, date, max(value), avg(value) from table where date > date - interval '90 days'
group by id, value
'group by' does the partitioning
Why are you using window functions? This seems to do what you describe:
select id,
max(value) as max_date,
percentile_disc(0.5) within group (order by value) as median_value
from table
where date > date - interval '90 days';
If you want this per date, use window functions:
select t.*
from (select t.*,
max(value) over (order by date range between '89 day' preceding and current row) as running_max_value,
percentile_disc(0.5) within group (order by value) range between '89 day' preceding and current row) as running_median_value
from t
) t
where date > date - interval '90 days';
The filter is in the outer query so the preceding period can go back further in time.

How to extract random dates targets/sales data from monthly targets

I have monthly targets defined for the different category of items for the complete year.
Example:
January Target for A Category - 15,000
January Target for R Category - 10,000
January Target for O Category - 5,000
Actual Sales for A Category January - 18,400
Actual Sales for R Category January - 8,500
Actual Sales for O Category January - 3,821
The SQL query to compare actual sales with target will be simple as follows:
SELECT TO_CHAR (Sales_Date, 'MM') Sales_Month,
Sales_Category,
SUM (Sales_Value) Sales_Val_Monthly,
Target_Month,
Target_Category,
Target_Value
FROM Sales_Data, Target_Data
WHERE TO_CHAR (Sales_Date, 'MM') = Target_Month
AND Sales_Category = Target_Category
GROUP BY TO_CHAR (Sales_Date, 'MM'),
Target_Month,
Target_Category,
Sales_Category,
Target_Value;
Now I have a requirement that user will input FROM_DATE and TILL_DATE in the report parameter and the starting/ending date can be random, it will not represent a complete month or week, the start date can be 12/01/2018 and end date can be 15/01/2018, i.e., data for 4 days. The result should calculate the actual data for 4 days, calculate the target for 4 days considering the fact that there will be 6 working days (Sunday is a holiday) and if the date range includes Sunday, it should not be considered.
Also, the number of days in a month should be considered and the date parameters may contain some days from one month and some days from another month or maybe more than one month.
Target_Table (Target_Data)
Target_Year Target_Month Target_Category Target_Value
2018 01 A 15000
2018 02 A 8500
2018 03 A 9500
2018 01 R 15000
2018 02 R 8500
2018 03 R 9500
2018 01 O 15000
2018 02 O 8500
2018 03 O 9500
Sales Table (Sales_Data)
Inv_Txn Inv_No Sales_Date Item_Code Sales_Category Qty Rate Sales_Value Inv_Locn Inv_SM_ID
A21 2018000001 02/01/2018 XXXX A 2 5.5 11 O001 XXXX
R32 2018000001 27/02/2018 XXXX R 3 9.5 28.5 O305 XXXX
O98 2018000001 12/03/2018 XXXX O 12 12.5 150 O901 XXXX
U76 2018000001 18/01/2018 XXXX A 98 5.5 539 O801 XXXX
B87 2018000001 19/02/2018 XXXX R 2 9.5 19 O005 XXXX
A21 2018000002 13/03/2018 XXXX R 45 9.5 427.5 O001 XXXX
B87 2018000002 14/03/2018 XXXX O 12 12.5 150 O005 XXXX
Desired Output (From Date: 27/02/2018 Till Date: 06/03/2018)
Target_Category Target_Value Sales_Value
A 87.52 21.88
A 96.25 24.06
A 74.25 18.56
R 100.25 25.06
R 800.2 200.05
R 25.1 6.28
O 75.5 18.88
O 98.1 24.53
O 25.5 6.38
The first step might be to see whether we can get the number of Sundays in a given month. As it turns out, we can - and we don't have to use any SQL tricks or PL/SQL:
SELECT EXTRACT( DAY FROM LAST_DAY(SYSDATE) ) AS month_day_cnt
, CEIL( ( LAST_DAY(TRUNC(SYSDATE, 'MONTH')) - NEXT_DAY(TRUNC(SYSDATE, 'MONTH')-1, 'SUN') + 1 ) / 7 ) AS sunday_cnt
FROM dual;
This will give us the number of days in a given month as well as the number of Sundays. All we need to do is subtract the latter number from the former to get the number of working days. We can work that into your initial query (by the way, I suggest using TRUNC() instead of TO_CHAR() since your users might want a date range that spans more than one calendar year):
SELECT TRUNC(s.Sales_Date, 'MONTH') AS Sales_Month
, EXTRACT( DAY FROM LAST_DAY( TRUNC(s.Sales_Date, 'MONTH') ) ) - CEIL( ( LAST_DAY(TRUNC(s.Sales_Date, 'MONTH')) - NEXT_DAY(TRUNC(s.Sales_Date, 'MONTH')-1, 'SUN') + 1 ) / 7 ) AS working_day_cnt
, s.Sales_Category, SUM(s.Sales_Value) AS Sales_Val_Monthly
, t.Target_Value -- Target_Month and Target_Category are superfluous
FROM Sales_Data s INNER JOIN Target_Data t
ON TO_CHAR(s.Sales_Date, 'MM') = t.Target_Month
AND TO_CHAR(s.Sales_Date, 'YYYY') = t.Target_Year
AND s.Sales_Category = t.Target_Category
GROUP BY TRUNC(s.Sales_Date, 'MONTH'), Sales_Category, Target_Value;
Now given a start date and an end date, we can generate the number of working days for all the months in between those dates as follows:
SELECT TRUNC(range_dt, 'MONTH'), COUNT(*) FROM (
SELECT start_dt + LEVEL - 1 AS range_dt
FROM dual
CONNECT BY start_dt + LEVEL - 1 < end_dt
) WHERE TO_CHAR(range_dt, 'DY') != 'SUN'
GROUP BY TRUNC(range_dt, 'MONTH');
where start_dt and end_dt are parameters supplied by the user. Putting this all together, we'll have something like the following:
WITH rd ( range_month, range_day_cnt ) AS (
SELECT TRUNC(range_dt, 'MONTH'), COUNT(*) FROM (
SELECT start_dt + LEVEL - 1 AS range_dt
FROM dual
CONNECT BY start_dt + LEVEL - 1 < end_dt
) WHERE TO_CHAR(range_dt, 'DY') != 'SUN'
GROUP BY TRUNC(range_dt, 'MONTH')
)
SELECT range_month, Sales_Category, Sales_Val_Monthly
, range_day_cnt, working_day_cnt, Target_Value
, Target_Value*range_day_cnt/working_day_cnt AS prorated_target_value
FROM (
SELECT r.range_month, r.range_day_cnt
, EXTRACT( DAY FROM LAST_DAY( TRUNC(s.Sales_Date, 'MONTH') ) ) - CEIL( ( LAST_DAY(TRUNC(s.Sales_Date, 'MONTH')) - NEXT_DAY(TRUNC(s.Sales_Date, 'MONTH')-1, 'SUN') + 1 ) / 7 ) AS working_day_cnt
, s.Sales_Category, SUM(s.Sales_Value) AS Sales_Val_Monthly
, t.Target_Value -- Target_Month and Target_Category are superfluous
FROM rd INNER JOIN Sales_Data s
ON rd.range_month = TRUNC(s.Sales_Date, 'MONTH')
INNER JOIN Target_Data t
ON TO_CHAR(s.Sales_Date, 'MM') = t.Target_Month
AND TO_CHAR(s.Sales_Date, 'YYYY') = t.Target_Year
AND s.Sales_Category = t.Target_Category
WHERE s.Sales_Date >= TRUNC(start_dt)
AND s.Sales_Date < TRUNC(end_dt+1)
GROUP BY r.range_month, r.range_day_cnt, s.Sales_Category, t.Target_Value
) ORDER BY range_month;
If you have a table of public holidays, then those will have to be factored in somewhere as well - both in the rd common table expression and from the calculation of working days. If the above doesn't give you a start on that then I can take a look again in a bit and see how the other holidays might be worked in.
You can calculate the number of working days between two dates using below query. I added a nonworking date via a table named: holiday_dates and created a series of dates from 12/01/2018 to 15/01. I remove those dates that are either Sunday or holiday. Please let me know if it works for you. Thanks.
create table holiday_dates(holiday_dte date, holiday_desc varchar(100));
insert into holiday_dates values(TO_DATE('13/01/2018','DD-MM-YYYY'), 'Not a Working Day');
With tmp as (
select count(*) as num_of_working_days
from ( select rownum as rn
from all_objects
where rownum <= to_date('15/01/2018','DD-MM-YYYY') - to_date('12/01/2018','DD-MM-YYYY')+1 )
where to_char( to_date('12/01/2018','DD-MM-YYYY')+rn-1, 'DY' ) not in ( 'SUN' )
and not exists ( select null from holiday_dates where holiday_dte = trunc(to_date('12/01/2018','DD-MM-YYYY') + rn - 1)))
SELECT TO_CHAR (Sales_Date, 'MM') Sales_Month,
Sales_Category,
SUM (Sales_Value) Sales_Val_Monthly,
Target_Month,
Target_Category,
Target_Value,
tmp.num_of_working_days
FROM Sales_Data, Target_Data, tmp
WHERE Sales_Date between to_date('12/01/2018','DD-MM-YYYY') and to_date('15/01/2018','DD-MM-YYYY')
AND Sales_Category = Target_Category
GROUP BY TO_CHAR (Sales_Date, 'MM'),
Target_Month,
Target_Category,
Sales_Category,
Target_Value;

How do I compare a current partial month vs a previous partial month with postgres?

I'm building some basic reports and I want to see if I'm on track to surpass last month's metrics without waiting for the month to end. Basically I want to compare June 1 (start of current month) through June 23 (current_date) against May 1 (start of previous month) through May 23 (current_date - 1 month).
My goal is to show a count of distinct users that did event1 and event2.
Here's what I have so far:
CREATE VIEW events AS
(SELECT *
FROM public.event
WHERE TYPE in ('event1',
'event2')
AND created_at > now() - interval '1 months' );
CREATE VIEW MAU AS
(SELECT EXTRACT(DOW
FROM created_at) AS month,
DATE_TRUNC('week', created_at) AS week,
COUNT(*) AS total_engagement,
COUNT(DISTINCT user_id) AS total_users
FROM events
GROUP BY 2,
1
ORDER BY week DESC);
SELECT month,
week,
SUM(total_engagement) OVER (PARTITION BY month
ORDER BY week) AS total_engagment
FROM MAU
ORDER BY 1 DESC,
2
Here's an example of what that returns:
Month Week Unique Engagement
6 2017-05-22 00:00:00 165
6 2017-05-29 00:00:00 355
6 2017-06-05 00:00:00 572
6 2017-06-12 00:00:00 723
5 2017-05-22 00:00:00 757
5 2017-05-29 00:00:00 1549
5 2017-06-05 00:00:00 2394
5 2017-06-12 00:00:00 3261
5 2017-06-19 00:00:00 3592
Expected return
Month Day Total Engagement
6 1 50
6 2 100
6 3 180
5 1 89
5 2 213
5 3 284
5 4 341
Can you point out where I've got this wrong or if there's an easier way to do it?
You are confusing days, weeks and months in your question but from the expected output I assume that you want month number, week number within a month and a count of those pairs.
SELECT
month,
week,
count(*) as total_engagement
FROM (
SELECT
extract(month from created_at) as month,
extract('day' from date_trunc('week', created_at::date) -
date_trunc('week', date_trunc('month', created_at::date))) / 7 + 1 as week
FROM public.event
WHERE type IN ('event1', 'event2')
AND created_at > now() - interval '1 month'
) t
GROUP BY 1,2
The most interesting part could be getting the week number within a month and for that you can check this answer.

Calculating difference between daily sum and a average for the same day of the week in defined time range. SQL 10g Oracle

Hi I'm working with data depending mostly on the day of the week. Data is formatted in a table
Date - position - count/number.
There are multiple different positions.
I was able to sort my data for a each day of the week using.
select MOD(to_char(time, 'J'),7),
sum(COUNT))
from TABLE
where time > sysdate -x
group by to_char(time, 'J')
order by to_char(time, 'J');
This outputs daily sums according to day of the week.
Now I'm able to get an average for a single day of a week in a year.
This code outputs an average for only Sunday
SELECT AVG(asset_sums)
FROM (
select MOD(to_char(time, 'J'),7),
sum(COUNT)) as asset_sums
from table
where time > sysdate -365
and MOD(TO_CHAR(time, 'J'), 7) + 1 IN (7)
group by to_char(time, 'J')
order by to_char(time, 'J')
);
My goal is to be able to get a table with daily sum compared with yearly average for that particular day of the week.
For example yearly average number for Mondays is 57 , Tuesdays 60.
This week my Monday is 59 and Tuesday is 57. Output of the table is
Monday +2, Tuesday -3.
What is the easiest way / most efficient ?
Thanks for your help.
Edit : Format of my data
Date : yyyy-mm-dd | Place : xxxx | Number( of customers) 0 to 10000
2013-09-16 | AAAA | 1534
2013-09-16 | AAAB | 534
2013-09-17 | AAAA | 1434
2013-09-17 | AAAC | 834
2013-09-18 | AAAA | 134
2013-09-18 | AAAD | 183
Needed output
2013-09-16 | Day of the week | Sum | Average monday this year | Difference Sum-AVG
2013-09-16 | 1 (= Monday) | 2068 | 2015| 53
For clarity I will use subquery factoring. First, select the current weeks data. Next, subquery the sum for the day over the current week. Then, subquery the sum for each day over the past year. Then, average the daily sum of each day for each day of the week. Finally, join the two and display the difference.
with
this_week as (
select
time
from table
where time > x - 7
group by time
),
this_week_dly_sum as (
select
to_char(time, 'd') day,
sum(count) sum
from this_week
group by to_char(time, 'd')
),
this_year_dly_sum as (
select
time,
sum(count) sum
from table
where time > x - 365
group by time
),
this_year_dly_avg as (
select
to_char(day, 'd'),
avg(sum) avg
from this_year_dly_sum
group by to_char(day, 'd')
)
select
this_week.time,
to_char(this_week.time, 'day') day of week,
this_week_dly_sum.sum,
this_year_dly_avg.avg,
this_week_dly_sum.sum - this_year_dly_avg.avg difference
from this_week
inner join this_week_dly_sum
on to_char(this_week.time, 'd') = this_week_dly_sum.day
inner join this_year_dly_avg
on to_char(this_week.time, 'd').day = this_year_dly_avg.
group by time
;
You can use analytic function for this.
select date1, to_char(date1, 'd'),
sum(val) over(partition by to_char(date1, 'd')),
avg(val) over(partition by to_char(date1, 'd')),
sum(val) over(partition by to_char(date1, 'd'))-
avg(val) over(partition by to_char(date1, 'd'))
from table1
time > add_month(sysdate,-12);
This will give you daily counts for the last year:
SELECT TRUNC(time, 'DD') AS date,
SUM(count) AS asset_sum
FROM yourtable
WHERE time > SYSDATE - 365
GROUP BY TRUNC(time, 'DD')
You can modify it to additionally return averages per day of the week for the specified range:
SELECT TRUNC(time, 'DD') AS date,
SUM(count) AS asset_sum,
AVG(SUM(count)) OVER
(PARTITION BY TO_CHAR(TRUNC(time, 'DD'), 'D')) AS asset_sum_avg
FROM yourtable
WHERE time > SYSDATE - 365
GROUP BY TRUNC(time, 'DD')
At this point you have all the initial data you need but probably for more days than necessary. You can use the above query as a derived table to limit the rows to just those where date > SYSDATE - x:
WITH last_year_by_day AS
(
SELECT TRUNC(time, 'DD') AS date,
SUM(count) AS asset_sum,
AVG(SUM(count)) OVER
(PARTITION BY TO_CHAR(TRUNC(time, 'DD'), 'D')) AS asset_sum_avg
FROM yourtable
WHERE time > SYSDATE - 365
GROUP BY TRUNC(time, 'DD')
)
SELECT date,
TO_CHAR(TRUNC(time, 'DD'), 'D') AS day_of_week,
asset_sum,
asset_sum_avg,
asset_sum - asset_sum_avg AS asset_sum_diff
FROM last_year_by_day
WHERE date > SYSDATE - x
;
As some expressions are being repeated multiple times, it can be a good idea to re-factor the query to avoid the repetition. Here's one way:
WITH last_year AS
(
SELECT TRUNC(time, 'DD') AS date,
TO_CHAR(time, 'D') AS day_of_week,
count
FROM yourtable
WHERE time > SYSDATE - 365
),
last_year_by_day AS
(
SELECT date,
day_of_week,
SUM(count) AS asset_sum,
AVG(SUM(count)) OVER (PARTITION BY day_of_week) AS asset_sum_avg
FROM last_year
GROUP BY date, day_of_week
)
SELECT date,
day_of_week,
asset_sum,
asset_sum_avg,
asset_sum - asset_sum_avg AS asset_sum_diff
FROM last_year_by_day
WHERE date > SYSDATE - x
;
One last note is about TO_CHAR('D'), which is used to obtain the day_of_week values. Since you are using a different method for the same results, you may not be aware that the results of TO_CHAR('D') are affected by the NLS_TERRITORY setting. You may want to use an ALTER SESSION statement to set NLS_TERRITORY to the value that would cause TO_CHAR('D') to return 1 for Monday, 2 for Tuesday etc. Here is the list of territories supported.