How to separate data columns by year using basic SQL (BigQuery)

I am trying to create a visualization using BigQuery and Chartio. I want to display traffic volumes by day for each year on one viz, so the years can be compared and seasonality identified.
I can break down the traffic with a single column for traffic, another column for month, and one for year, but this data structure doesn't work when I try to build the viz in Chartio.
So what I am trying to do is create one column per year, with the traffic numbers laid out by month. I am not sure how to do this, but I probably need a union or a join here.
The code below combines the values, but doesn't give me what I want.
Thanks in advance for the help!
SELECT
EXTRACT(MONTH FROM date) AS month,
EXTRACT(YEAR FROM date) AS year,
SUM(CAST(traffic AS INT64)) AS traffic
FROM
data.source
GROUP BY month, year
This is the output I get:
month year traffic
1 2017 11991865
3 2019 3482067
8 2017 21345567
6 2016 85207567
3 2018 22010756
What I want is:
month traffic_2016 traffic_2017
1 233391865 11991865
2 1123465 3482067
3 11996545 21345567
4 119916655 85207567
5 34571865 22010756

Use conditional aggregation (IF or CASE WHEN inside SUM) with GROUP BY:
SELECT
EXTRACT(MONTH FROM date) AS month,
SUM(IF(EXTRACT(YEAR FROM date) = 2016, CAST(traffic AS INT64), 0)) AS traffic_2016,
SUM(IF(EXTRACT(YEAR FROM date) = 2017, CAST(traffic AS INT64), 0)) AS traffic_2017
FROM
data.source
GROUP BY month
Or with a join of one subquery per year:
SELECT
*
FROM
(SELECT
EXTRACT(MONTH FROM date) AS month,
SUM(CAST(traffic AS INT64)) AS traffic_2016
FROM
data.source
WHERE
EXTRACT(YEAR FROM date) = 2016
GROUP BY month)
JOIN
(SELECT
EXTRACT(MONTH FROM date) AS month,
SUM(CAST(traffic AS INT64)) AS traffic_2017
FROM
data.source
WHERE
EXTRACT(YEAR FROM date) = 2017
GROUP BY month)
USING(month)
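To see how the two approaches differ, here is a small sketch that runs both query shapes against an in-memory SQLite database from Python. SQLite is only a stand-in for BigQuery here (so `strftime` and `CASE WHEN` replace `EXTRACT` and `IF`), and the table name `source` and its rows are made-up sample data, not the asker's. The point it illustrates: conditional aggregation keeps a month that has data in only one year, while the inner join drops it.

```python
import sqlite3

# Hypothetical sample rows (date, traffic stored as text); February has no 2017 data.
rows = [
    ("2016-01-05", "100"), ("2016-01-20", "50"),
    ("2017-01-07", "70"),
    ("2016-02-11", "30"),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE source (date TEXT, traffic TEXT)")
con.executemany("INSERT INTO source VALUES (?, ?)", rows)

# Conditional aggregation: one row per month, one column per year.
pivot = con.execute("""
    SELECT CAST(strftime('%m', date) AS INTEGER) AS month,
           SUM(CASE WHEN strftime('%Y', date) = '2016'
                    THEN CAST(traffic AS INTEGER) ELSE 0 END) AS traffic_2016,
           SUM(CASE WHEN strftime('%Y', date) = '2017'
                    THEN CAST(traffic AS INTEGER) ELSE 0 END) AS traffic_2017
    FROM source
    GROUP BY month
    ORDER BY month
""").fetchall()

# Inner join of one subquery per year: months missing from either year disappear.
joined = con.execute("""
    SELECT a.month, a.traffic_2016, b.traffic_2017
    FROM (SELECT CAST(strftime('%m', date) AS INTEGER) AS month,
                 SUM(CAST(traffic AS INTEGER)) AS traffic_2016
          FROM source
          WHERE strftime('%Y', date) = '2016'
          GROUP BY month) AS a
    JOIN (SELECT CAST(strftime('%m', date) AS INTEGER) AS month,
                 SUM(CAST(traffic AS INTEGER)) AS traffic_2017
          FROM source
          WHERE strftime('%Y', date) = '2017'
          GROUP BY month) AS b
    USING (month)
    ORDER BY a.month
""").fetchall()

print(pivot)   # February appears with 0 for 2017
print(joined)  # February is missing
```

If every month exists in every year the two results match; otherwise a full outer join (or the conditional aggregation) is probably what you want for a chart.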

Below is a version for BigQuery Standard SQL that is less verbose and easier to read, maintain, and extend with more year columns:
#standardSQL
SELECT month,
SUM(IF(year = 2016, value, 0)) traffic_2016,
SUM(IF(year = 2017, value, 0)) traffic_2017,
SUM(IF(year = 2018, value, 0)) traffic_2018,
SUM(IF(year = 2019, value, 0)) traffic_2019
FROM `project.data.source`,
UNNEST([STRUCT(
EXTRACT(MONTH FROM `date`) AS month,
EXTRACT(YEAR FROM `date`) AS year,
CAST(traffic AS INT64) AS value
)])
GROUP BY month

Related

Use different fields based on condition

I log the daily produced energy of my solar panels. Now I want to create a SQL statement to get the sum of produced energy for each month but separate columns for each year.
I came up with the following SQL statement:
SELECT LPAD(extract (month from inverterlogs_summary_daily.bucket)::text, 2, '0') as month,
sum(inverterlogs_summary_daily."EnergyProduced") as a2022
from inverterlogs_summary_daily
WHERE
inverterlogs_summary_daily.bucket >= '01.01.2022' and inverterlogs_summary_daily.bucket < '01.01.2023'
group by month
order by 1;
This results in only getting the values from 2022:
month a2022
1 100
2 358
3 495
How could I change the SQL statement to get new columns for each year? Is this even possible?
The result should look like this (with a new column for each year; I wouldn't mind having to update the SQL statement every year):
month a2022 a2023
1 100 92
2 358 497
3 495 508
You can use conditional aggregation:
select extract(month from bucket) bucket_month,
sum("EnergyProduced") filter(where extract(year from bucket) = 2022) a_2022,
sum("EnergyProduced") filter(where extract(year from bucket) = 2021) a_2021
from inverterlogs_summary_daily
where bucket >= date '2021-01-01' and bucket < date '2023-01-01'
group by extract(month from bucket)
order by bucket_month
I assumed that bucket is of a timestamp-like datatype, and adapted the date arithmetic accordingly.
Side note: the expressions in the filter clause can probably be optimized with the lengthier:
sum("EnergyProduced") filter(
where bucket >= date '2022-01-01' and bucket < date '2023-01-01'
) a_2022,
You can add a condition to the SUM
SELECT to_char(bucket, 'mm') as month,
sum(CASE WHEN extract (YEAR from inverterlogs_summary_daily.bucket) = 2022 then inverterlogs_summary_daily."EnergyProduced" END) as a2022,
sum(CASE WHEN extract (YEAR from inverterlogs_summary_daily.bucket) = 2023 then inverterlogs_summary_daily."EnergyProduced" END) as a2023
from inverterlogs_summary_daily
WHERE
inverterlogs_summary_daily.bucket >= '01.01.2022' and inverterlogs_summary_daily.bucket < '01.01.2024'
group by month
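One detail worth knowing with either form: a `CASE` without `ELSE` (like `FILTER`) yields NULL, not 0, for a month that has no rows in that year. The sketch below shows this with an in-memory SQLite database driven from Python. SQLite stands in for PostgreSQL, the table `logs` and column `energy` are hypothetical names, and the daily rows are invented so that the monthly sums match the figures in the question.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE logs (bucket TEXT, energy INTEGER)")
con.executemany("INSERT INTO logs VALUES (?, ?)", [
    ("2022-01-10", 60), ("2022-01-25", 40),   # January 2022 sums to 100
    ("2023-01-15", 92),                        # January 2023
    ("2022-02-03", 358),                       # no February 2023 rows yet
])

result = con.execute("""
    SELECT strftime('%m', bucket) AS month,
           SUM(CASE WHEN strftime('%Y', bucket) = '2022' THEN energy END) AS a2022,
           SUM(CASE WHEN strftime('%Y', bucket) = '2023' THEN energy END) AS a2023,
           -- ELSE 0 turns "no rows for that year" into 0 instead of NULL
           SUM(CASE WHEN strftime('%Y', bucket) = '2023' THEN energy ELSE 0 END) AS a2023_zero
    FROM logs
    GROUP BY month
    ORDER BY month
""").fetchall()

for row in result:
    print(row)
```

Whether NULL or 0 is preferable depends on the chart: NULL usually leaves a gap, while 0 draws a point on the axis.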

Rolling 12 month filter criteria in SQL

I'm having an issue in a SQL script where I'm trying to filter on a rolling 12-month window using the day column, which is stored as text on the server.
The goal is to count sizes per product at each retail store location over the last 12 months from the current day. Currently my query filters on the year 2019, so it only counts the sizes for that year rather than for the rolling 12 months from the current date.
The CALENDARDAY column is a text field in the data set, and the data is stored in yyyymmdd format.
When I try to run the script below in Tableau with the GETDATE and DATEADD functions, it gives me a function error. I am querying a SAP HANA server with the query below.
Any help would be appreciated.
Select
SKU, STYLE_ID, Base_Style_ID, COLOR, SIZEKEY, STORE, Year,
count(SIZEKEY)over(partition by STYLE_ID,COLOR,STORE,Year) as SZ_CNT
from
(
select
a."RAW" As SKU,
a."STYLENUM" As STYLE_ID,
mat."BASENUM" AS Base_Style_ID,
a."COLORNUM" AS COLOR,
a."SIZE" AS SIZEKEY,
a."STORENUM" AS STORE,
substring(a."CALENDARDAY",1,4) As year
from PRTRPT_XRE as a
JOIN ZAT_SKU As mat On a."RAW" = mat."SKU"
where a."ORGANIZATION" = 'M20'
and a."COLORNUM" is not null
and substring(a."CALENDARDAY",1,4) = '2019'
Group BY
a."RAW",
a."STYLENUM",
mat."BASENUM",
a."ZCOLORCD",
a."SIZE",
a."STORENUM",
substring(a."CALENDARDAY",1,4)
)
I have never worked on that DB / server, so I don't have a way to test this.
But hopefully this will work (it keeps rows from exactly the 12 months before today's date; the format mask matches the yyyymmdd text described in the question):
AND ADD_MONTHS (TO_DATE (a."CALENDARDAY", 'YYYYMMDD'), 12) > CURRENT_DATE
or, if the server converts the text to a date implicitly:
AND ADD_MONTHS (a."CALENDARDAY", 12) > CURRENT_DATE
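Since I can't run this against HANA, here is the same window logic sketched in Python so the boundary behaviour can be checked. The helper names `rolling_cutoff` and `within_window` are made up for this illustration, and the month-end clamping is my assumption about what "12 months back" should mean; I have not verified that it matches ADD_MONTHS in every edge case. Because yyyymmdd text sorts in date order, the filter can compare the strings directly.

```python
import calendar
from datetime import date


def rolling_cutoff(today: date, months: int = 12) -> str:
    """Return the yyyymmdd string for the day `months` months before `today`."""
    # Work on a 0-based month index so year boundaries are handled by divmod.
    index = today.year * 12 + (today.month - 1) - months
    year, month0 = divmod(index, 12)
    month = month0 + 1
    # Clamp the day, e.g. 29 Feb 2020 minus 12 months becomes 28 Feb 2019.
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(today.day, last_day)).strftime("%Y%m%d")


def within_window(calendar_days, today: date, months: int = 12):
    """Keep the yyyymmdd strings that fall strictly after the cutoff."""
    cutoff = rolling_cutoff(today, months)
    # Plain string comparison is enough: yyyymmdd sorts chronologically.
    return [d for d in calendar_days if d > cutoff]


days = ["20190314", "20190315", "20190316", "20200301"]
print(rolling_cutoff(date(2020, 3, 15)))
print(within_window(days, date(2020, 3, 15)))
```

The same idea carries over to SQL: comparing the text column against a yyyymmdd cutoff string avoids converting every row to a date.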
The condition below, built from one of our calendar tables, gave the same result as the ADD_MONTHS condition in the answer above:
select distinct CALENDARDAY
from
(
select FISCALWEEK, CALENDARDAY, CNST, row_number()over(partition by CNST order by FISCALWEEK desc) as rnum
from
(
select distinct FISCALWEEK, CALENDARDAY, 'A' as CNST
from CALENDARTABLE
where CALENDARDAY < current_date
order by 1,2
)
) where rnum < 366

How to aggregate YTD measure dynamically

I have a table which has 2 fields timestamp and count. Table has data since 2016 November.
I have to set up a query which will daily aggregate the YTD sum(count) for all the years. I am not using calendar year definition but rather November-October (Next year). This shouldn't ideally change the logic
2017: 11/01/2016-10/31/2017;
2018: 11/01/2017-10/31/2018;
2019: 11/01/2018-10/31/2019;
2020: 11/01/2019-10/31/2020
I want a query that will calculate on any given day aggregate YTD with November 1st as the start date. I tried this query
select ytd_bucket
,sum(count_field) sum
from
(
select
timestamp_field,
count_field,
CASE
WHEN DATE(timestamp_field,"America/Los_Angeles") >= '2019-11-01' THEN '2020'
WHEN DATE(timestamp_field,"America/Los_Angeles") BETWEEN '2018-11-01' AND CAST(CONCAT('2019-',FORMAT_DATE('%m-%d', DATE(CURRENT_TIMESTAMP(),"America/Los_Angeles"))) AS DATE) THEN '2019'
WHEN DATE(timestamp_field,"America/Los_Angeles") BETWEEN '2017-11-01' AND CAST(CONCAT('2018-',FORMAT_DATE('%m-%d', DATE(CURRENT_TIMESTAMP(),"America/Los_Angeles"))) AS DATE) THEN '2018'
WHEN DATE(timestamp_field,"America/Los_Angeles") BETWEEN '2016-11-01' AND CAST(CONCAT('2017-',FORMAT_DATE('%m-%d', DATE(CURRENT_TIMESTAMP(),"America/Los_Angeles"))) AS DATE) THEN '2017'
ELSE NULL END as YTD_bucket
from table
)
group by 1
The above query does not aggregate the numbers at a YTD level. For the years prior to 2020 (ytd_bucket), the query aggregates the entire year's count.
Start by aggregating per day:
select date(timestamp_field, 'America/Los_Angeles') as dte,
count(*)
from table
group by dte;
Then, for the YTD, shift each date forward by two months so that November and December fall into the following year, and partition the running sum by that year:
select dte,
count(*),
sum(count(*)) over (partition by extract(year from date_add(dte, interval 2 month))
order by min(timestamp_field)
) as running_cnt
from (select t.*,
date(timestamp_field, 'America/Los_Angeles') as dte
from t
) t
group by dte;
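For the November-October years listed in the question, the year label is the calendar year of the date moved forward by two months, which is the same as adding 1 to the year when the month is November or December. A minimal Python sketch of that mapping and of a running total that restarts each fiscal year follows; the function names and the sample daily counts are hypothetical.

```python
from datetime import date


def fiscal_year(d: date) -> int:
    """Label for a November-to-October year, e.g. 2018-11-01 belongs to 2019."""
    return d.year + 1 if d.month >= 11 else d.year


def running_totals(daily):
    """Given (date, count) pairs, return (date, fiscal_year, running_total) rows.

    The running total restarts whenever the fiscal year changes.
    """
    totals = {}
    out = []
    for d, n in sorted(daily):
        fy = fiscal_year(d)
        totals[fy] = totals.get(fy, 0) + n
        out.append((d, fy, totals[fy]))
    return out


# Hypothetical daily counts straddling the 31 October / 1 November boundary.
daily = [
    (date(2018, 10, 30), 5),
    (date(2018, 10, 31), 3),
    (date(2018, 11, 1), 4),
    (date(2018, 11, 2), 6),
]
for row in running_totals(daily):
    print(row)
```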

SQL Server / SSRS: Calculating monthly average based on grouping and historical values

I need to calculate an average based on historical data for a graph in SSRS:
Current Month
Previous Month
2 Months ago
6 Months ago
This query returns the average for each month:
SELECT
avg_val1, month, year
FROM
(SELECT
(sum_val1 / count) as avg_val1, month, year
FROM
(SELECT
SUM(val1) AS sum_val1, SUM(count) AS count, month, year
FROM
(SELECT
COUNT(val1) AS count, SUM(val1) AS val1,
MONTH([SnapshotDate]) AS month,
YEAR([SnapshotDate]) AS year
FROM
[DC].[dbo].[KPI_Values]
WHERE
[SnapshotKey] = 'Some text here'
AND No = '001'
AND Channel = '999'
GROUP BY
[SnapshotDate]) AS sub3
GROUP BY
month, year, count) AS sub2
GROUP BY sum_val1, count, month, year) AS sub1
ORDER BY
year, month ASC
When I add the following WHERE clause I get the average for March (2 months ago):
WHERE month = MONTH(GETDATE())-2
AND year = YEAR(GETDATE())
Now the problem is when I want to retrieve data from 6 months ago; MONTH(GETDATE()) - 6 will output -1 instead of 11. I also have an issue with the fact that the year changes to 2016, and I am a bit unsure of how to implement that logic in my query.
I think I might be going about this wrong... Any suggestions?
Subtract the months from the date using the DATEADD function before you do your comparison. Ex:
WHERE SnapshotDate BETWEEN DATEADD(month, -6, GETDATE()) AND GETDATE()
MONTH(GETDATE()) returns an int, so subtracting from it can give 0 or a negative value. You need a scalar user-defined function to handle this, adding 12 (and stepping the year back by one) when the result is <= 0.
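The wrap-around arithmetic such a function needs is small. Here is a sketch in Python rather than T-SQL, just to show the logic; `months_ago` is a made-up name. It converts year and month into a single month index, subtracts, and converts back, so the year rolls over on its own.

```python
def months_ago(year: int, month: int, n: int) -> tuple[int, int]:
    """Return (year, month) for the month that is n months before (year, month)."""
    # 0-based month index counted from year 0; divmod handles the year rollover.
    index = year * 12 + (month - 1) - n
    new_year, month0 = divmod(index, 12)
    return new_year, month0 + 1


print(months_ago(2017, 5, 2))   # two months before May 2017
print(months_ago(2017, 5, 6))   # six months before May 2017 crosses into 2016
print(months_ago(2017, 1, 1))   # January wraps to December of the previous year
```

In SQL Server itself, `DATEADD(month, -6, GETDATE())` followed by `MONTH(...)` and `YEAR(...)` on the result gives the same pair without a custom function.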

SQL Query to show all results before current month

I have a table in Oracle with columns: [DATEID date, COUNT_OF_PHOTOS int]
This table basically represents how many photos were uploaded per day.
I have a query that summarizes the number of photos uploaded per month:
select extract(year from dateid) as year, extract(month from dateid) as month, count(1) as Photos
from picture_table
group by extract(year from dateid), extract(month from dateid)
order by 1, 2
This does what I want, but I would like to run this query at the beginning of each month, let's say 07-02-2012, and have all data EXCLUDING the current month. How would I add a WHERE clause that ignores all entries that have a date equal to the current year+month?
Here is one way:
where to_char(dateid, 'YYYY-MM') <> to_char(sysdate, 'YYYY-MM')
To preserve any indexing strategy you may have on dateid:
select extract(year from dateid) as year, extract(month from dateid) as month, count(1) as Photos
from picture_table
WHERE (dateid < TRUNC(SYSDATE,'MM') OR dateid >= ADD_MONTHS(TRUNC(SYSDATE,'MM'),1))
group by extract(year from dateid), extract(month from dateid)
order by 1, 2
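The range predicate above keeps everything outside the half-open interval from the first day of the current month up to (but not including) the first day of the next month. A small Python sketch of the same test, with a hypothetical helper name and invented rows, taking 2 July 2012 as an assumed run date:

```python
from datetime import date


def exclude_current_month(rows, today: date):
    """Drop (date, count) rows that fall in the same year and month as `today`."""
    month_start = today.replace(day=1)
    # First day of the following month, rolling the year over after December.
    if month_start.month == 12:
        next_month_start = date(month_start.year + 1, 1, 1)
    else:
        next_month_start = date(month_start.year, month_start.month + 1, 1)
    # Same shape as: dateid < TRUNC(SYSDATE,'MM') OR dateid >= ADD_MONTHS(TRUNC(SYSDATE,'MM'),1)
    return [(d, n) for d, n in rows if d < month_start or d >= next_month_start]


rows = [
    (date(2012, 5, 30), 4),
    (date(2012, 6, 30), 7),
    (date(2012, 7, 1), 2),
    (date(2012, 7, 2), 9),
]
kept = exclude_current_month(rows, today=date(2012, 7, 2))
print(kept)
```

Comparing the raw column against two boundaries, instead of wrapping it in a function, is what lets an index on the date column be used.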