I am working in Postgres with a table like this:
mon yyyy weather
Apr 2014 78.45
Apr 2015 77.32
May 2014 79.56
May 2015 78.43
I would like to query results, ordered by "mon", where the weather values are divided year-on-year for each month.
In other words, I want the weather for Apr 2015 divided by the weather for Apr 2014.
However, I would like to write the query so that I do not have to specify the month or year: it should automatically divide the weather values as Apr 2015/Apr 2014, then May 2015/May 2014, and so on, without my having to key in every month and every year, which is laborious.
I have the following code, but this expands columns which is not what I want:
select (select "Weather" from yoy
where mon = 'Apr' and yyyy = '2015'
)/(select "Weather" from yoy
where mon = 'Apr' and yyyy = '2014'
) as "weather_apr",
(select "Weather" from yoy
where mon = 'May' and yyyy = '2015'
)/(select "Weather" from yoy
where mon = 'May' and yyyy = '2014'
) as "weather_may"
from yoy;
In my opinion this is the right scenario for an analytic window function. Here is the magic, without joins:
SELECT yyyy,
weather,
mon,
lead(weather) over (partition by mon order by yyyy desc) as prev_weather,
weather / lead(weather) over (partition by mon order by yyyy desc) as ratio
FROM yoy
I think you need a self join like in the below example:
SELECT j1."yyyy" As year,
j2."yyyy" As next_year,
j1."mon",
j1."weather",
j2."weather" As "weather from next year",
j2."weather"::float / j1."weather" As divide
FROM yoy j1
JOIN yoy j2
ON j1."yyyy" = j2."yyyy" - 1 AND j1."mon" = j2."mon"
demo: http://sqlfiddle.com/#!15/e02ec/1
I find conditional aggregation can be quite useful for this type of query:
select mon,
max(case when yyyy = 2014 then weather end) as weather_2014,
max(case when yyyy = 2015 then weather end) as weather_2015,
(max(case when yyyy = 2015 then weather end) /
max(case when yyyy = 2014 then weather end)
) as ratio
from yoy
group by mon
This assumes that you want the rows reduced to one per month. To get the previous value, just use lag():
select yoy.*,
lag(weather) over (partition by mon order by yyyy) as prev_weather
from yoy;
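For anyone who wants to try the lag() approach without a Postgres instance, here is a minimal sketch that runs the same idea against an in-memory SQLite database from Python. SQLite is only a stand-in (it needs version 3.25 or later for window functions), the sample rows are the ones from the question, and the alias yoy_ratio is made up for the illustration.

```python
import sqlite3

# In-memory SQLite stands in for Postgres; table and column names
# (yoy, mon, yyyy, weather) follow the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yoy (mon TEXT, yyyy INTEGER, weather REAL)")
conn.executemany(
    "INSERT INTO yoy VALUES (?, ?, ?)",
    [("Apr", 2014, 78.45), ("Apr", 2015, 77.32),
     ("May", 2014, 79.56), ("May", 2015, 78.43)],
)

# lag() fetches the previous year's value within each month; the outer
# query drops the first year, which has nothing to be compared against.
query = """
SELECT mon, yyyy, weather / prev_weather AS yoy_ratio
FROM (
    SELECT mon, yyyy, weather,
           lag(weather) OVER (PARTITION BY mon ORDER BY yyyy) AS prev_weather
    FROM yoy
) AS t
WHERE prev_weather IS NOT NULL
ORDER BY mon, yyyy
"""
rows = conn.execute(query).fetchall()
for mon, yyyy, ratio in rows:
    print(mon, yyyy, round(ratio, 4))
```

Note that ORDER BY mon sorts the month abbreviations alphabetically, which happens to work for Apr and May but will not give calendar order in general.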
Related
I am trying to create a visualization using BigQuery and Chartio. I want to display traffic volumes by day for each year so I can compare them on one viz, to help identify seasonality.
I can break down the traffic by having a single column for traffic, another column for month and one for year, but this data structure doesn't work when I try to build the viz in Chartio.
So what I am trying to do is to set up a column for each year, where I have the traffic numbers set out by month. I am not sure how to do this; I know I probably need a union or a join here.
The code below combines the values, but doesn't give me what I want.
Thanks in advance for the help!
SELECT
EXTRACT(MONTH FROM date) AS month,
EXTRACT(YEAR FROM date) AS year,
SUM(CAST(traffic AS INT64)) AS traffic
FROM
data.source
GROUP BY month, year
This is the output I get:
month year traffic
1 2017 11991865
3 2019 3482067
8 2017 21345567
6 2016 85207567
3 2018 22010756
What I want is:
month traffic_2016 traffic_2017
1 233391865 11991865
2 1123465 3482067
3 11996545 21345567
4 119916655 85207567
5 34571865 22010756
Using IF (or CASE WHEN) with GROUP BY:
SELECT
EXTRACT(MONTH FROM date) AS month,
SUM(IF(EXTRACT(YEAR FROM date) = 2016, CAST(traffic AS INT64), 0)) AS traffic_2016,
SUM(IF(EXTRACT(YEAR FROM date) = 2017, CAST(traffic AS INT64), 0)) AS traffic_2017
FROM
data.source
GROUP BY month
Simply with Join
SELECT
*
FROM
(SELECT
EXTRACT(MONTH FROM date) AS month,
SUM(CAST(traffic AS INT64)) AS traffic_2016
FROM
data.source
WHERE
EXTRACT(YEAR FROM date) = 2016
GROUP BY month)
JOIN
(SELECT
EXTRACT(MONTH FROM date) AS month,
SUM(CAST(traffic AS INT64)) AS traffic_2017
FROM
data.source
WHERE
EXTRACT(YEAR FROM date) = 2017
GROUP BY month)
USING(month)
Below is a version for BigQuery Standard SQL that is less verbose and easier to read, maintain, and extend with more columns:
#standardSQL
SELECT month,
SUM(IF(year = 2016, value, 0)) traffic_2016,
SUM(IF(year = 2017, value, 0)) traffic_2017,
SUM(IF(year = 2018, value, 0)) traffic_2018,
SUM(IF(year = 2019, value, 0)) traffic_2019
FROM `project.data.source`,
UNNEST([STRUCT(
EXTRACT(MONTH FROM `date`) AS month,
EXTRACT(YEAR FROM `date`) AS year,
CAST(traffic AS INT64) AS value
)])
GROUP BY month
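To check the pivot logic on a handful of rows without a BigQuery project, here is a small sketch using SQLite from Python. It is not BigQuery syntax: CASE WHEN replaces IF, strftime replaces EXTRACT, and the table name source plus the sample rows are invented for the illustration.

```python
import sqlite3

# In-memory SQLite stands in for BigQuery; the rows are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source (date TEXT, traffic TEXT)")
conn.executemany(
    "INSERT INTO source VALUES (?, ?)",
    [("2016-01-05", "100"), ("2016-01-20", "50"), ("2017-01-11", "70"),
     ("2016-02-03", "30"), ("2017-02-09", "40")],
)

# One output row per month, one SUM per year: rows from other years
# contribute 0 to a given year's column.
query = """
SELECT CAST(strftime('%m', date) AS INTEGER) AS month,
       SUM(CASE WHEN strftime('%Y', date) = '2016'
                THEN CAST(traffic AS INTEGER) ELSE 0 END) AS traffic_2016,
       SUM(CASE WHEN strftime('%Y', date) = '2017'
                THEN CAST(traffic AS INTEGER) ELSE 0 END) AS traffic_2017
FROM source
GROUP BY month
ORDER BY month
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(1, 150, 70), (2, 30, 40)]
```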
Anyone aware of a short, neat BQ query (#standardsql) to aggregate metrics (sessions / PVs / users etc.) by running 7d/14d/30d etc. buckets For ex.
16th-22nd April: 300K sessions
9th-15th April: 330K sessions
2nd-8th April: 270K sessions
OR, out-of-the box function that converts GA's date field (STRING) to days_since_epoch
I wrote a query, but it's very complicated:
- manually extract the YYYY, MM, DD components with REGEXP_EXTRACT()
- convert to days_since_epoch using UNIX_DATE
- divide by 7 to group each row into weekly observations
- use GROUP BY to aggregate & report
Any pointers to simplify this use case will be highly appreciated!
Cheers!
Anyone aware of a short, neat BQ query (#standardsql) to aggregate metrics (sessions / PVs / users etc.) by running 7d/14d/30d etc. buckets
See the 7d example below for BigQuery Standard SQL. You can apply this logic to whatever data you have, hopefully with light adjustments:
#standardSQL
WITH data AS (
SELECT
day, CAST(1000 * RAND() AS INT64) AS events
FROM UNNEST(GENERATE_DATE_ARRAY('2017-01-01', '2017-04-25')) AS day
)
SELECT
FORMAT_DATE('%U', day) as week,
FORMAT_DATE('%Y, %B %d', MIN(day)) AS start,
FORMAT_DATE('%Y, %B %d', MAX(day)) AS finish,
SUM(events) AS events
FROM data
GROUP BY week
ORDER BY week
It produces the output below, which can be used as a starting point for further tailoring to your desired layout:
week start finish events
01 2017, January 01 2017, January 07 3699
02 2017, January 08 2017, January 14 4008
03 2017, January 15 2017, January 21 3726
... ... ... ...
OR, out-of-the box function that converts GA's date field (STRING) to days_since_epoch
To convert a date expressed as a STRING into a value of DATE type, use PARSE_DATE as in the example below:
#standardSQL
SELECT PARSE_DATE('%Y%m%d', '20170425') AS date
The result is:
date
2017-04-25
Finally, below is an example/template for running 7d/14d/30d etc. buckets:
#standardSQL
WITH data AS (
SELECT
DAY, CAST(1000 * RAND() AS INT64) AS events
FROM UNNEST(GENERATE_DATE_ARRAY('2017-01-01', '2017-04-25')) AS DAY
)
SELECT
DAY,
SUM(CASE WHEN period = 7 THEN events END) AS days_07,
SUM(CASE WHEN period = 14 THEN events END) AS days_14,
SUM(CASE WHEN period = 30 THEN events END) AS days_30
FROM (
SELECT
dates.day AS DAY,
periods.period AS period,
SUM(events) AS events
FROM data AS activity
CROSS JOIN (SELECT DAY FROM data GROUP BY DAY) AS dates
CROSS JOIN (SELECT period FROM (SELECT 7 AS period UNION ALL
SELECT 14 AS period UNION ALL SELECT 30 AS period)) AS periods
WHERE dates.day >= activity.day
AND DATE_DIFF(dates.day, activity.day, DAY) < periods.period
GROUP BY 1,2
)
GROUP BY DAY
ORDER BY DAY DESC
with output like the following:
DAY days_07 days_14 days_30
2017-04-25 2087 4004 9700
2017-04-24 1947 4165 9611
2017-04-23 1666 4066 9599
2017-04-22 2121 4820 10014
2017-04-21 2885 5421 10192
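As an alternative to the cross joins, a running bucket can also be expressed with a window frame. This is a different technique from the query above, sketched here against SQLite from Python rather than BigQuery. It assumes exactly one row per day with no gaps (a ROWS frame counts rows, not calendar days), and the sample data (events 1 to 10 over ten days) is invented so that the sums are easy to check by hand.

```python
import sqlite3
from datetime import date, timedelta

# In-memory SQLite stands in for BigQuery; needs SQLite 3.25+ for window functions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (day TEXT, events INTEGER)")
start = date(2017, 4, 1)
conn.executemany(
    "INSERT INTO data VALUES (?, ?)",
    [((start + timedelta(days=i)).isoformat(), i + 1) for i in range(10)],
)

# Each frame covers the current row plus the N-1 rows before it,
# i.e. a trailing N-day window when there is one row per day.
query = """
SELECT day,
       SUM(events) OVER (ORDER BY day ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS days_07,
       SUM(events) OVER (ORDER BY day ROWS BETWEEN 13 PRECEDING AND CURRENT ROW) AS days_14
FROM data
ORDER BY day DESC
"""
rows = conn.execute(query).fetchall()
print(rows[0])  # ('2017-04-10', 49, 55)
```

If some days can be missing from the table, the row-based frame will silently reach further back in time, so the join-based template above is the safer choice in that case.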
I am using the query composer on Google BigQuery.
I want to output the months in the correct order, e.g. starting with January, ending with December.
Here is my query:
SELECT month, gender, SUM(cost) AS Cost
FROM [medicare.medicareTable]
GROUP BY month, gender
ORDER BY month, gender
Without the ORDER BY above, the months were in a completely random order. Now they are alphabetised, which is a little better but still not what I want.
Using the above query, the output looks like this: https://docs.google.com/spreadsheets/d/18r_HhY1jG3Edkj5Nk8gDM_eSQ_1fI6ePHSZuJuoAppE/edit?usp=sharing
Thanks to anyone who can help.
For BigQuery Standard SQL you can use PARSE_DATE(). See the Supported Format Elements for DATE for the format string.
WITH m AS (
SELECT 'January 01 2016' AS d UNION ALL
SELECT 'February 01 2016' AS d UNION ALL
SELECT 'March 01 2016' AS d
)
SELECT d, EXTRACT(month FROM PARSE_DATE('%B %d %Y', d)) AS month_number
FROM m
ORDER BY month_number
You can try to get the month number from the month name and sort in ascending order.
Syntax for SQL Server:
select DATEPART(MM,'january 01 2016') -- returns 1
So you can try something like this:
SELECT month, gender, SUM(cost) AS Cost
FROM [medicare.medicareTable]
GROUP BY month, gender
ORDER BY datepart(MM,month + ' 01 2016'),gender
Hope this helps
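If no date-parsing function is available in your dialect, another option is to join to a small lookup table that maps month names to numbers and sort on the number. The sketch below shows that idea with SQLite from Python. The tables medicare and month_order and the sample rows are hypothetical, and it assumes the month column holds full English month names.

```python
import sqlite3

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

# In-memory SQLite with made-up sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE medicare (month TEXT, gender TEXT, cost REAL)")
conn.executemany(
    "INSERT INTO medicare VALUES (?, ?, ?)",
    [("March", "F", 10), ("January", "M", 5), ("February", "F", 7),
     ("January", "F", 3), ("January", "F", 2)],
)

# Lookup table: month name -> calendar position (1..12).
conn.execute("CREATE TABLE month_order (name TEXT PRIMARY KEY, num INTEGER)")
conn.executemany(
    "INSERT INTO month_order VALUES (?, ?)",
    [(name, i + 1) for i, name in enumerate(MONTHS)],
)

# Group as before, but sort by the numeric position instead of the name.
query = """
SELECT t.month, t.gender, SUM(t.cost) AS cost
FROM medicare AS t
JOIN month_order AS m ON m.name = t.month
GROUP BY t.month, t.gender, m.num
ORDER BY m.num, t.gender
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```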
I'm running a query during the month of August, but I want to convert it to use the date_part function.
Here's the original query.
SELECT avg(CASE WHEN date = LEAST(current_date-1,'8/31/14') THEN bo ELSE NULL END) end_bo
From <table>
What I'm trying to do, is plug LEAST into the one below.
SELECT avg(CASE WHEN date_part('month', date) = 8 and date_part('year', date) = 2014 THEN bo ELSE NULL END) end_bo
From <table>
The problem is, I don't see where I can plug it in.
The first one checks which of the two dates is earlier, current_date - 1 or Aug 31 2014, and returns bo if date is equal to it.
The second one tests whether date falls in Aug 2014 and, if so, returns bo.
To apply the same logic to the second query, it might look like this:
SELECT avg(CASE
WHEN least(date_part('month', current_date), 8) = date_part('month', date)
and date_part('year', date) = 2014
THEN bo
ELSE NULL
END) end_bo
From <table>
This takes the current month, caps it at 8, compares it with the month of date, and returns bo if they are equal.
LEAST simply returns the lowest value in a list. So least(9, 8) (Sept, Aug) would return 8.
I have a data set which looks like this:
month year total_sales
01 2014 4567889
02 2014 5635627
03 2014 997673
04 2014 2134566
05 2014 2666477
My goal is to create a YTD function on the above dataset.
E.g., if I want month 01 to display, it should give the total sales for month 01. If I want month 02 to display, it should give me the total sales for months 01 + 02 combined, and so on for the other months.
The query I wrote is as follows:
select year, case when month in ('01') then 'JAN'
when month in ('01','02') then 'FEB'
-
-
-
when month in ('01','02','03','04','05') then 'MAY'
end as bucket, sum(total_sales) from <table_name>
group by year, case when month in ('01') then 'JAN'
when month in ('01','02') then 'FEB'
-
-
-
when month in ('01','02','03','04','05') then 'MAY'
end
The result set it fetches doesn't add up the total sales as a YTD function, but instead shows the total sales for that particular month only.
I can create a pivot table view for the required data set, but that is not how I need it, because I need to build reports on the data set.
Can someone help me with the concept, in case I am missing something?
Thanks in advance.
Perhaps a correlated subquery would help:
select t.*,
(select sum(total_sales)
from <table_name> t2
where t2.year = t.year and
t2.month <= t.month
) as YTD
from <table_name> t;
Here is another solution:
WITH months AS (
SELECT DISTINCT month, year
FROM <table_name>
)
SELECT m.month, m.year, SUM(t.total_sales)
FROM months m JOIN <table_name> t ON t.year=m.year AND t.month<=m.month
GROUP BY m.year, m.month
ORDER BY m.year, m.month;
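If your database supports window functions, a running SUM() partitioned by year is another way to get the YTD figure without a self join. This is a sketch of that alternative, run against SQLite from Python with the sample rows from the question. The table name sales is made up, SQLite 3.25 or later is assumed, and month is kept as zero-padded text so that it sorts correctly.

```python
import sqlite3

# In-memory SQLite with the rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (month TEXT, year INTEGER, total_sales INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("01", 2014, 4567889), ("02", 2014, 5635627), ("03", 2014, 997673),
     ("04", 2014, 2134566), ("05", 2014, 2666477)],
)

# With ORDER BY inside the window, SUM() accumulates from the first
# month of the year up to and including the current row's month.
query = """
SELECT year, month,
       SUM(total_sales) OVER (PARTITION BY year ORDER BY month) AS ytd
FROM sales
ORDER BY year, month
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
# (2014, '01', 4567889)
# (2014, '02', 10203516)
# (2014, '03', 11201189)
# (2014, '04', 13335755)
# (2014, '05', 16002232)
```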