How to GROUP BY Month-Year using BigQuery

I am trying to count the number of bus trips (with start and destinations) on a monthly basis, across several years, using a TIMESTAMP column. I can do this on a MONTH basis with TIMESTAMP_TRUNC(start_date, MONTH), but I would like to do it on a MONTH-YEAR basis. Any help is appreciated. Thanks

You can use Standard SQL:
SELECT
FORMAT_DATE('%b-%Y', created_date) mon_year,
COUNT(1) AS `count`
FROM `project.dataset.table`
GROUP BY mon_year
ORDER BY PARSE_DATE('%b-%Y', mon_year)
If you are using a TIMESTAMP, you have to cast it to a DATE first:
SELECT
FORMAT_DATE('%b-%Y', DATE(CURRENT_TIMESTAMP())) mon_year
will produce:
Sep-2020
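A note on why the first query orders by PARSE_DATE rather than by mon_year itself: labels such as Sep-2020 are strings, so sorting them directly is alphabetical, not chronological. Here is a minimal Python sketch of the same idea, with made-up labels and `datetime.strptime` standing in for PARSE_DATE (it assumes the default English month abbreviations):

```python
from datetime import datetime

# Hypothetical month-year labels, as FORMAT_DATE('%b-%Y', ...) would produce them.
labels = ["Sep-2019", "Jan-2020", "Jan-2019"]

# Sorting the raw strings compares characters, so both Januaries come first.
alphabetical = sorted(labels)

# Parsing each label back into a date (the PARSE_DATE step) restores calendar order.
chronological = sorted(labels, key=lambda s: datetime.strptime(s, "%b-%Y"))

print(alphabetical)   # ['Jan-2019', 'Jan-2020', 'Sep-2019']
print(chronological)  # ['Jan-2019', 'Sep-2019', 'Jan-2020']
```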
As per your example from the comments: you can't use COUNT in a WHERE clause. If you want to filter on an aggregate, you have to use HAVING (see the docs).
SELECT TIMESTAMP_TRUNC(start_date, MONTH) AS year_month,
start_station_name,
end_station_name,
count(start_station_name) AS count_start,
FROM `bigquery-public-data.san_francisco.bikeshare_trips`
WHERE start_station_name <> end_station_name
GROUP BY year_month,
start_station_name,
end_station_name
HAVING count(start_station_name) > 10
LIMIT 50
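To see the WHERE versus HAVING split on a toy dataset, here is a small sketch that runs locally with Python's built-in sqlite3. The table, station names and threshold are invented for illustration, and SQLite's `strftime('%Y-%m', ...)` stands in for TIMESTAMP_TRUNC, so treat it as an analogy rather than BigQuery code:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (start_date TEXT, start_station TEXT, end_station TEXT)")
conn.executemany(
    "INSERT INTO trips VALUES (?, ?, ?)",
    [
        ("2019-01-01", "A", "B"),
        ("2019-01-05", "A", "B"),
        ("2019-01-09", "A", "B"),
        ("2019-01-10", "A", "C"),  # only one trip on this route
        ("2019-01-11", "A", "A"),  # round trip, removed by WHERE
        ("2019-02-01", "A", "B"),  # only one trip in February
    ],
)

# WHERE filters individual rows before grouping;
# HAVING filters whole groups after COUNT(*) has been computed.
busy_routes = conn.execute(
    """
    SELECT strftime('%Y-%m', start_date) AS year_month,
           start_station,
           end_station,
           COUNT(*) AS trip_count
    FROM trips
    WHERE start_station <> end_station
    GROUP BY year_month, start_station, end_station
    HAVING COUNT(*) >= 2
    ORDER BY year_month
    """
).fetchall()

print(busy_routes)  # [('2019-01', 'A', 'B', 3)]
```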

Your code should do what you want:
select timestamp_trunc(start_date, month) as yyyymm, count(*)
from t
group by yyyymm;
This includes both the year and month, so Jan 2020 is different from Jan 2019.
If you wanted just by month of the year, then use extract():
select extract(month from start_date) as mon, count(*)
from t
group by mon;
This would treat Jan 2020 as the same as Jan 2019.
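The difference between the two groupings is easy to check on a few rows. The sketch below uses Python's sqlite3 with invented data; `strftime('%Y-%m', ...)` plays the role of timestamp_trunc(..., month) and `strftime('%m', ...)` the role of extract(month ...), so it only illustrates the behaviour and is not BigQuery syntax:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (start_date TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?)",
    [
        ("2019-01-15 08:00:00",),
        ("2019-01-20 09:30:00",),
        ("2020-01-03 07:45:00",),
        ("2019-09-09 18:10:00",),
    ],
)

# Year and month together: January 2019 and January 2020 are separate groups.
by_year_month = conn.execute(
    """
    SELECT strftime('%Y-%m', start_date) AS yyyymm, COUNT(*)
    FROM t
    GROUP BY yyyymm
    ORDER BY yyyymm
    """
).fetchall()

# Month of the year only: both Januaries fall into the same group.
by_month = conn.execute(
    """
    SELECT CAST(strftime('%m', start_date) AS INTEGER) AS mon, COUNT(*)
    FROM t
    GROUP BY mon
    ORDER BY mon
    """
).fetchall()

print(by_year_month)  # [('2019-01', 2), ('2019-09', 1), ('2020-01', 1)]
print(by_month)       # [(1, 3), (9, 1)]
```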

Related

Using () OVER or HAVING clause to get monthly aggregates of counts

I have a big dataset on ticket sales throughout a single year. The schema I am working with is:
ID
date_time_sale (Timestamp, yyyy-MM-dd hh-mm-ss)
weekday (varchar, Mon to Sun)
number_tickets (integer)
ticket_price (float)
total_price (float)
I am trying to get, for every month of the year, the weekday on which the highest number of tickets was sold. For example, the output would be:
year month weekday total_tickets
2015 01 SAT 5400
2015 02 SUN 4300
2015 03 SUN 6400
I tried using the following, but admittedly SQL is not my strongest skill:
SELECT DISTINCT EXTRACT(YEAR FROM date_time_sale) AS YEAR,
EXTRACT(MONTH FROM date_time_sale) AS MONTH,
week_day,
RANK () OVER (PARTITION BY YEAR, MONTH ORDER BY count(week_day) ASC) weekday_count
from ticket_sales
order by YEAR, MONTH
But I keep running into errors. I tried using a HAVING clause, but I couldn't get anywhere. Any tips on how to effectively use the RANK () OVER (PARTITION BY) clause to get this output, please? Or do I need to use COUNT () OVER?
The analysis exception says:
`cannot resolve '`YEAR`' given input columns: [ticket_sales.YEAR, ticket_sales.MONTH, weekday]; line 1 pos 292;\n'Sort ['YEAR ASC NULLS FIRST, 'MONTH ASC NULLS FIRST], true\n+- Project [YEAR#342, MONTH#358
but then it is quite a long error.
Update:
So I tried this code:
SELECT DISTINCT year,
month,
week_day,
COUNT (week_day) OVER (PARTITION BY year, month, week_day) AS weekday_count
from ticket_sales
order by year, month, weekday_count DESC
What that did is give the results for all weekdays in every month, so the output is 12*7 rows instead of 12. I still have a way to go with this, but at least I am getting somewhere.
Try this query and let me know if it returns the desired result.
I'm not sure whether the field name is number_tickets or total_tickets; I used number_tickets.
First I sum the number of tickets by year, month and weekday, then return one row per year and month with the weekday on which the most tickets were sold.
WITH total_by_day AS (SELECT EXTRACT(YEAR FROM date_time_sale) AS YEAR,
EXTRACT(MONTH FROM date_time_sale) AS MONTH,
week_day,
SUM(number_tickets) AS number_tickets
FROM ticket_sales
GROUP BY YEAR, MONTH, week_day)
SELECT DISTINCT
YEAR,
MONTH,
FIRST_VALUE(week_day) OVER (PARTITION BY YEAR, MONTH ORDER BY number_tickets DESC) AS week_day,
FIRST_VALUE(number_tickets) OVER (PARTITION BY YEAR, MONTH ORDER BY number_tickets DESC) AS total_tickets
FROM total_by_day
ORDER BY YEAR, MONTH;
In a PostgreSQL database I got the desired result.
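An alternative to the two FIRST_VALUE calls plus DISTINCT is to number the weekdays within each month and keep only the first. The sketch below is a hedged illustration in Python's sqlite3 (window functions need SQLite 3.25 or newer); the sample rows and the aliases yr, mon and rn are invented, and `strftime` replaces EXTRACT:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ticket_sales (date_time_sale TEXT, week_day TEXT, number_tickets INTEGER)"
)
conn.executemany(
    "INSERT INTO ticket_sales VALUES (?, ?, ?)",
    [
        ("2015-01-03 10:00:00", "SAT", 10),
        ("2015-01-10 11:00:00", "SAT", 5),
        ("2015-01-04 12:00:00", "SUN", 12),
        ("2015-02-01 09:00:00", "SUN", 7),
        ("2015-02-07 15:00:00", "SAT", 3),
    ],
)

top_weekdays = conn.execute(
    """
    WITH total_by_day AS (
        -- Step 1: total tickets per year, month and weekday.
        SELECT strftime('%Y', date_time_sale) AS yr,
               strftime('%m', date_time_sale) AS mon,
               week_day,
               SUM(number_tickets) AS total_tickets
        FROM ticket_sales
        GROUP BY yr, mon, week_day
    ),
    ranked AS (
        -- Step 2: number the weekdays inside each month, best seller first.
        SELECT total_by_day.*,
               ROW_NUMBER() OVER (
                   PARTITION BY yr, mon ORDER BY total_tickets DESC
               ) AS rn
        FROM total_by_day
    )
    -- Step 3: keep one row per month.
    SELECT yr, mon, week_day, total_tickets
    FROM ranked
    WHERE rn = 1
    ORDER BY yr, mon
    """
).fetchall()

print(top_weekdays)  # [('2015', '01', 'SAT', 15), ('2015', '02', 'SUN', 7)]
```

ROW_NUMBER picks one weekday arbitrarily when two are tied; RANK would return both.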

How to aggregate YTD measure dynamically

I have a table which has 2 fields timestamp and count. Table has data since 2016 November.
I have to set up a query that aggregates the YTD sum(count) daily for every year. I am not using the calendar-year definition but rather November to October of the next year. Ideally this shouldn't change the logic.
2017: 11/01/2016-10/31/2017;
2018: 11/01/2017-10/31/2018;
2019: 11/01/2018-10/31/2019;
2020: 11/01/2019-10/31/2020
I want a query that will calculate on any given day aggregate YTD with November 1st as the start date. I tried this query
select ytd_bucket
,sum(count_field) sum
from
(
select
timestamp_field,
count_field,
CASE
WHEN DATE(timestamp_field,"America/Los_Angeles") >= '2019-11-01' THEN '2020'
WHEN DATE(timestamp_field,"America/Los_Angeles") BETWEEN '2018-11-01' AND CAST(CONCAT('2019-',FORMAT_DATE('%m-%d', DATE(CURRENT_TIMESTAMP(),"America/Los_Angeles"))) AS DATE) THEN '2019'
WHEN DATE(timestamp_field,"America/Los_Angeles") BETWEEN '2017-11-01' AND CAST(CONCAT('2018-',FORMAT_DATE('%m-%d', DATE(CURRENT_TIMESTAMP(),"America/Los_Angeles"))) AS DATE) THEN '2018'
WHEN DATE(timestamp_field,"America/Los_Angeles") BETWEEN '2016-11-01' AND CAST(CONCAT('2017-',FORMAT_DATE('%m-%d', DATE(CURRENT_TIMESTAMP(),"America/Los_Angeles"))) AS DATE) THEN '2017'
ELSE NULL END as YTD_bucket
from table
)
group by 1
The above query does not aggregate the numbers at a YTD level. For the years prior to 2020 (ytd_bucket), the query aggregates the entire year's count.
Start by aggregating per day:
select date(timestamp_field, 'America/Los_Angeles') as dte,
sum(count_field) as day_total
from t
group by dte;
Then, for the YTD, shift each date forward two months so that the November-to-October year lines up with a calendar year, and take a running sum within that year:
select dte,
sum(count_field) as day_total,
sum(sum(count_field)) over (partition by extract(year from date_add(dte, interval 2 month))
order by dte
) as running_cnt
from (select t.*,
date(timestamp_field, 'America/Los_Angeles') as dte
from t
) t
group by dte;
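The year labels in the question (2017 = 11/01/2016 to 10/31/2017) correspond to moving each date forward by two months and taking the calendar year of the result. Here is a runnable sketch of that bucketing and the running total, using Python's sqlite3 (SQLite 3.25+ for window functions). The table name events, the columns ts and cnt, and the rows are all made up, and `date(dte, '+2 months')` stands in for DATE_ADD:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (ts TEXT, cnt INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [
        ("2016-11-01 09:00:00", 5),
        ("2016-11-01 17:00:00", 3),
        ("2017-10-31 12:00:00", 2),  # last day of fiscal 2017
        ("2017-11-01 08:00:00", 4),  # first day of fiscal 2018
        ("2017-12-15 10:00:00", 1),
    ],
)

ytd = conn.execute(
    """
    SELECT dte,
           strftime('%Y', date(dte, '+2 months')) AS fiscal_year,
           -- Running sum restarts whenever the fiscal year changes.
           SUM(day_total) OVER (
               PARTITION BY strftime('%Y', date(dte, '+2 months'))
               ORDER BY dte
           ) AS running_cnt
    FROM (
        -- One row per calendar day with that day's total.
        SELECT date(ts) AS dte, SUM(cnt) AS day_total
        FROM events
        GROUP BY dte
    )
    ORDER BY dte
    """
).fetchall()

for row in ytd:
    print(row)
# ('2016-11-01', '2017', 8)
# ('2017-10-31', '2017', 10)
# ('2017-11-01', '2018', 4)
# ('2017-12-15', '2018', 5)
```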

How to separate data columns by year using basic SQL (bigquery)

I am trying to create a visualization using bigquery and chartio. I want to display traffic volumes by day for each year to compare on one viz, to help identify seasonality.
I can break down the traffic by having a single column for traffic, another column for month and one for year, but this data structure doesn't work when I try to build the viz in chartio.
So what I am trying to do is to set a column for each year, where I have the traffic numbers set out by month. I am not sure of the way to do this, I know I probably need a union or a join here.
The code below combines the values, but doesn't get what I want.
Thanks in advance for the help!
SELECT
EXTRACT(MONTH FROM date) AS month,
EXTRACT(YEAR FROM date) AS year,
SUM(CAST(traffic AS INT64)) AS traffic
FROM
data.source
GROUP BY month, year
This is the output I get:
month year traffic
1 2017 11991865
3 2019 3482067
8 2017 21345567
6 2016 85207567
3 2018 22010756
What I want is:
month traffic_2016 traffic_2017
1 233391865 11991865
2 1123465 3482067
3 11996545 21345567
4 119916655 85207567
5 34571865 22010756
By using IF-ELSE / CASE WHEN statement with GROUP BY
SELECT
EXTRACT(MONTH FROM date) AS month,
SUM(IF(EXTRACT(YEAR FROM date) = 2016, CAST(traffic AS INT64), 0)) AS traffic_2016,
SUM(IF(EXTRACT(YEAR FROM date) = 2017, CAST(traffic AS INT64), 0)) AS traffic_2017,
FROM
data.source
GROUP BY month
Simply with Join
SELECT
*
FROM
(SELECT
EXTRACT(MONTH FROM date) AS month,
SUM(CAST(traffic AS INT64)) AS traffic_2016
FROM
data.source
WHERE
EXTRACT(YEAR FROM date) = 2016
GROUP BY month)
JOIN
(SELECT
EXTRACT(MONTH FROM date) AS month,
SUM(CAST(traffic AS INT64)) AS traffic_2017
FROM
data.source
WHERE
EXTRACT(YEAR FROM date) = 2017
GROUP BY month)
USING(month)
Below is a version for BigQuery Standard SQL that is less verbose, easier to read and maintain, and easier to extend with more columns:
#standardSQL
SELECT month,
SUM(IF(year = 2016, value, 0)) traffic_2016,
SUM(IF(year = 2017, value, 0)) traffic_2017,
SUM(IF(year = 2018, value, 0)) traffic_2018,
SUM(IF(year = 2019, value, 0)) traffic_2019
FROM `project.data.source`,
UNNEST([STRUCT(
EXTRACT(MONTH FROM `date`) AS month,
EXTRACT(YEAR FROM `date`) AS year,
CAST(traffic AS INT64) AS value
)])
GROUP BY month
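The conditional-aggregation pivot can be tried locally on a handful of rows. This is a sketch in Python's sqlite3 with invented numbers; `CASE WHEN` replaces BigQuery's IF, `strftime` replaces EXTRACT, and the table name source and column day are assumptions for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source (day TEXT, traffic INTEGER)")
conn.executemany(
    "INSERT INTO source VALUES (?, ?)",
    [
        ("2016-01-05", 10),
        ("2016-01-20", 5),
        ("2017-01-03", 7),
        ("2016-02-01", 2),
        ("2017-02-11", 9),
    ],
)

# One output row per month; each SUM only counts rows from its own year,
# which turns the year values into separate columns.
pivot = conn.execute(
    """
    SELECT CAST(strftime('%m', day) AS INTEGER) AS month,
           SUM(CASE WHEN strftime('%Y', day) = '2016' THEN traffic ELSE 0 END) AS traffic_2016,
           SUM(CASE WHEN strftime('%Y', day) = '2017' THEN traffic ELSE 0 END) AS traffic_2017
    FROM source
    GROUP BY month
    ORDER BY month
    """
).fetchall()

print(pivot)  # [(1, 15, 7), (2, 2, 9)]
```

Unlike the join version, a month that has data in only one of the years still appears, with 0 in the other column.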

GROUP BY month when selecting a date Teradata SQL assistant

SELECT EVENT_DT - ((EVENT_DT -DATE'1900-01-07') MOD 7) AS dates,
CLSFD_USER_ID AS user_id,
COUNT(DISTINCT CLSFD_USER_ID) AS number_of_user_ids,
COUNT(DISTINCT CLSFD_CAS_AD_ID) AS number_of_ads,
SUM(IMPRSN_CNT) AS number_of_impressions
FROM clsfd_access_views.CLSFD_CAS_AD_HST
WHERE CLSFD_SITE_ID = 3001
AND datum >= '2017-01-01'
GROUP BY 1,2
I want to have the total number of unique users during each month of the year 2017. I tried:
GROUP BY EXTRACT(MONTH FROM datum), 2
But this returns an error. What would be the most efficient code to retrieve the total number of user ids, ads, and impressions per month?
It doesn't make sense to me to be aggregating by users, since they are what you are trying to count. Try grouping by the month and year alone:
SELECT
EXTRACT(YEAR FROM EVENT_DT) || '-' || EXTRACT(MONTH FROM EVENT_DT) AS month,
COUNT(DISTINCT CLSFD_USER_ID) AS number_of_user_ids,
COUNT(DISTINCT CLSFD_CAS_AD_ID) AS number_of_ads,
SUM(IMPRSN_CNT) AS number_of_impressions
FROM clsfd_access_views.CLSFD_CAS_AD_HST
WHERE
CLSFD_SITE_ID = 3001 AND
datum >= '2017-01-01' AND datum < '2018-01-01'
GROUP BY
EXTRACT(YEAR FROM EVENT_DT) || '-' || EXTRACT(MONTH FROM EVENT_DT);
Note that I changed your restriction on datum to also exclude any year greater than 2017.
If you want these values to be included in the current query, then you should use analytical functions. For example, "total number of unique users during each month" would be something like:
select count(distinct user_id) over(partition by EXTRACT(MONTH FROM datum))
Be aware that those values will be repeated for each user.

SQL Query to show all results before current month

I have a table in Oracle with columns: [DATEID date, COUNT_OF_PHOTOS int]
This table basically represents how many photos were uploaded per day.
I have a query that summarizes the number of photos uploaded per month:
select extract(year from dateid) as year, extract(month from dateid) as month, count(1) as Photos
from picture_table
group by extract(year from dateid), extract(month from dateid)
order by 1, 2
This does what I want, but I would like to run this query at the beginning of each month, let's say 07-02-2012, and have all data EXCLUDING the current month. How would I add a WHERE clause that ignores all entries that have a date equal to the current year+month?
Here is one way:
where to_char(dateid, 'YYYY-MM') <> to_char(sysdate, 'YYYY-MM')
To preserve any indexing strategy you may have on dateid:
select extract(year from dateid) as year, extract(month from dateid) as month, count(1) as Photos
from picture_table
WHERE (dateid < TRUNC(SYSDATE,'MM') OR dateid >= ADD_MONTHS(TRUNC(SYSDATE,'MM'),1))
group by extract(year from dateid), extract(month from dateid)
order by 1, 2
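The range predicate can be checked on sample rows without an Oracle instance. The sketch below uses Python's sqlite3, where `date(:today, 'start of month')` stands in for TRUNC(SYSDATE,'MM') and the `'+1 month'` modifier for ADD_MONTHS. The run date is passed in as a parameter (here the 07-02-2012 from the question) so the result is reproducible, and the rows are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE picture_table (dateid TEXT, count_of_photos INTEGER)")
conn.executemany(
    "INSERT INTO picture_table VALUES (?, ?)",
    [
        ("2012-05-10", 4),
        ("2012-06-01", 2),
        ("2012-06-15", 6),
        ("2012-07-01", 1),  # current month, should be excluded
        ("2012-07-02", 3),  # current month, should be excluded
    ],
)

# Comparing dateid against the month boundaries leaves the column untouched,
# which is what allows an index on it to be used.
monthly = conn.execute(
    """
    SELECT CAST(strftime('%Y', dateid) AS INTEGER) AS year,
           CAST(strftime('%m', dateid) AS INTEGER) AS month,
           COUNT(*) AS photos
    FROM picture_table
    WHERE dateid < date(:today, 'start of month')
       OR dateid >= date(:today, 'start of month', '+1 month')
    GROUP BY year, month
    ORDER BY year, month
    """,
    {"today": "2012-07-02"},
).fetchall()

print(monthly)  # [(2012, 5, 1), (2012, 6, 2)]
```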