BigQuery - SQL date function, group by month spanning years - sql

I have a field in a BigQuery table: 'created_date'. I need to output the count of records by 'created_date' per month, spanning the years 2014...2019, for example:
Desired output:
2014
Jan 1125
Feb 3308
2015
Jan 544
Feb 107
...
2016
...
2017
2018
...
2019
Jan 448
Feb 329
...
and so on.
or even:
Jan-2014 <count>
Feb-2014 <count>
Anything will do; I just need an inclusive count (aggregation) of all records by month.
I found several ways to do this on Stack Overflow for Oracle, PostgreSQL and MySQL, but none of the approaches work with BigQuery.
Has anyone successfully done this with BigQuery, and how? All responses are very much appreciated.

Below is for BigQuery Standard SQL.
It assumes the created_date field is of DATE data type.
#standardSQL
SELECT
FORMAT_DATE('%b-%Y', created_date) mon_year,
COUNT(1) AS `count`
FROM `project.dataset.table`
GROUP BY mon_year
ORDER BY PARSE_DATE('%b-%Y', mon_year)
The query above produces output like the following:
Row mon_year count
1 Jan-2014 1389
2 Feb-2014 1255
3 Mar-2014 1655
. . .
60 Dec-2018 1677
61 Jan-2019 1534
62 Feb-2019 588
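The ORDER BY in the query above matters: sorting on the mon_year string itself would be alphabetical (Apr before Feb before Jan), not chronological. Here is a small Python sketch, not BigQuery code, of the same grouping that orders on a (year, month) key and only formats the 'Mon-YYYY' label for display. The function name and sample dates are hypothetical.

```python
from collections import Counter
from datetime import date


def count_by_month(created_dates):
    """Count records per calendar month, labelled like 'Jan-2014'.

    Mirrors FORMAT_DATE('%b-%Y', created_date) + GROUP BY, but sorts
    chronologically on (year, month) instead of on the label text.
    """
    # Group on a (year, month) tuple so that sorting is chronological
    counts = Counter((d.year, d.month) for d in created_dates)
    result = []
    for year, month in sorted(counts):
        # Build the 'Mon-YYYY' label from the first day of that month
        label = date(year, month, 1).strftime("%b-%Y")
        result.append((label, counts[(year, month)]))
    return result


if __name__ == "__main__":
    # Hypothetical sample data
    sample = [date(2014, 1, 3), date(2014, 1, 20), date(2014, 2, 1), date(2015, 1, 9)]
    for label, n in count_by_month(sample):
        print(label, n)
```

The same idea is what PARSE_DATE in the ORDER BY achieves in BigQuery: turn the label back into a real date before sorting.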

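Use date_trunc(), which maps each date to the first day of its month, so yyyymm is a DATE and sorts chronologically:
select date_trunc(created_date, month) as yyyymm, count(*) as cnt
from t
group by yyyymm
order by yyyymm;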
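To try the truncate-to-month idea outside BigQuery, here is a minimal sketch using Python's built-in sqlite3 module. SQLite has no DATE_TRUNC, so this assumes dates are stored as ISO 'YYYY-MM-DD' text and uses strftime('%Y-%m-01', ...) as a stand-in; the table name t follows the answer, while the helper names and sample rows are hypothetical.

```python
import sqlite3


def demo_connection(dates):
    """Build an in-memory table t(created_date) from ISO date strings."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (created_date TEXT)")
    conn.executemany("INSERT INTO t VALUES (?)", [(d,) for d in dates])
    return conn


def monthly_counts(conn):
    # strftime('%Y-%m-01', ...) plays the role of DATE_TRUNC(created_date, MONTH):
    # every date collapses to the first day of its month, which also sorts correctly.
    sql = """
        SELECT strftime('%Y-%m-01', created_date) AS yyyymm,
               COUNT(*) AS cnt
        FROM t
        GROUP BY yyyymm
        ORDER BY yyyymm
    """
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    conn = demo_connection(["2014-01-03", "2014-01-20", "2014-02-01", "2019-02-11"])
    print(monthly_counts(conn))
```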

Related

Bigquery numeric datetime to string datetime

I have written a query in BigQuery like the one below:
SELECT date_trunc(dd.created, week) AS week,
COUNT(DISTINCT dd.user) AS total,
COUNT(dd.upload) AS info
FROM
local.detail dd
LEFT JOIN local.list du ON dd.id = du.id
WHERE
regexp_extract(du.email, r'@(.+)') != 'gmail.com'
GROUP BY
date_trunc(dd.created, week);
Output:
week total info
2020-02-02 00:00:00 625 382
2020-03-22 00:00:00 1059 329
I want the week column formatted like this (just month and day):
week total info
Feb 02 625 382
Mar 03 1059 329
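How can I write this in BigQuery to get this?
Use FORMAT_DATE for this. The %b specifier gives the abbreviated month name and %d the zero-padded day of the month (%a would give the weekday name instead, e.g. "Sun 02"). Since created holds date-time values, convert it with DATE() first, e.g.
FORMAT_DATE("%b %d", DATE_TRUNC(DATE(dd.created), WEEK))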
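Two details are easy to get wrong here: BigQuery's WEEK granularity starts weeks on Sunday, and %b (month name), not %a (weekday name), is the specifier that yields "Feb 02". The Python sketch below is an illustration rather than BigQuery code; it reproduces the Sunday-based truncation and the label, and the function names are hypothetical.

```python
from datetime import date, timedelta


def week_start_sunday(d):
    """Truncate a date to the Sunday that starts its week."""
    # Python's weekday() is Monday=0 ... Sunday=6, so this is "days since Sunday"
    return d - timedelta(days=(d.weekday() + 1) % 7)


def week_label(d):
    # "%b %d" -> abbreviated month name and zero-padded day, e.g. "Feb 02"
    return week_start_sunday(d).strftime("%b %d")


if __name__ == "__main__":
    print(week_label(date(2020, 2, 5)))   # a Wednesday, truncated to Sunday Feb 2
    print(week_label(date(2020, 3, 22)))  # already a Sunday
```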

How to compare same period of different date ranges in columns in BigQuery standard SQL

I have a hard time figuring out how to compare the same period (e.g. ISO week 48) from different years for a certain metric in different columns. I am new to SQL and haven't fully understood how PARTITION BY works, but I guess I'll need it for my desired output.
How can I sum the data from column "metric" and compare the same periods of different date ranges (e.g. YEAR) in a table?
current table
date iso_week iso_year metric
2021-12-01 48 2021 1000
2021-11-30 48 2021 850
...
2020-11-28 48 2020 800
2020-11-27 48 2020 950
...
2019-11-27 48 2019 700
2019-11-26 48 2019 820
desired output
iso_week metric_thisYear metric_prevYear metric_prev2Year
48 1850 1750 1520
...
Consider the simple approach below:
select * from (
select * except(date)
from your_table
)
pivot (sum(metric) as metric for iso_year in (2021, 2020, 2019))
Applied to the sample data in the question, this returns one row per iso_week, with one summed metric column for each of the years 2021, 2020 and 2019.
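PIVOT is BigQuery-specific syntax. The same result can be written with conditional aggregation (one SUM(CASE ...) per year), which also runs on engines without PIVOT. Here is a minimal sketch in Python's sqlite3, using only the sample rows visible in the question (the rows elided with ... are not included) and hard-coding the three years; the function name is hypothetical.

```python
import sqlite3

# Sample rows shown in the question: (date, iso_week, iso_year, metric)
ROWS = [
    ("2021-12-01", 48, 2021, 1000),
    ("2021-11-30", 48, 2021, 850),
    ("2020-11-28", 48, 2020, 800),
    ("2020-11-27", 48, 2020, 950),
    ("2019-11-27", 48, 2019, 700),
    ("2019-11-26", 48, 2019, 820),
]

# Conditional aggregation: one SUM per year column, grouped by the remaining key.
PIVOT_SQL = """
    SELECT iso_week,
           SUM(CASE WHEN iso_year = 2021 THEN metric END) AS metric_thisYear,
           SUM(CASE WHEN iso_year = 2020 THEN metric END) AS metric_prevYear,
           SUM(CASE WHEN iso_year = 2019 THEN metric END) AS metric_prev2Year
    FROM your_table
    GROUP BY iso_week
    ORDER BY iso_week
"""


def pivot_by_year(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        'CREATE TABLE your_table ("date" TEXT, iso_week INTEGER, '
        "iso_year INTEGER, metric INTEGER)"
    )
    conn.executemany("INSERT INTO your_table VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(PIVOT_SQL).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(pivot_by_year(ROWS))
```

The trade-off is that the year list is spelled out in the query, just as it is in the PIVOT ... IN (2021, 2020, 2019) clause.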

How do I retrieve the most recent record in different years with the date in a different table

I'm working with a database that isn't structured that well and need to retrieve the row with the latest month used in specific years. The main data is stored in the member table, which lists one row per member month. The date for the member month is not stored there directly but is linked through a foreign Date_Key to a Date table. That is where the Year and Month columns can be derived, based on the Date_Key specified in each table. Each row in the Date table represents one month of a year, and each of these rows has a unique sequential Date_Key.
I am using Microsoft SQL Server Studio as the environment
Member Table
MemberKey Member_ID Date_Key
100 1234 89
101 1234 96
102 1234 97
103 1236 96
104 1236 97
Date Table
Date_Key Year Month
89 2020 10
90 2020 11
91 2020 12
92 2021 1
93 2021 2
94 2021 3
95 2021 4
96 2021 5
97 2021 6
Looking for the following results:
Member_ID Year Month
1234 2020 10
1234 2021 6
1236 2021 6
2020/11 is NOT a date. It is a year/month pair. But it seems like a simple aggregate: select year, max(month) group by year. Since you join and include the member ID, add that column to the GROUP BY clause as well to get one row per member per year.
select mbr.Member_ID, dts.Year, max(dts.Month) as Month
from dbo.Members as mbr
inner join dbo.Dates as dts on mbr.Date_Key = dts.Date_Key
group by mbr.Member_ID, dts.Year
order by mbr.Member_ID, dts.Year
;
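To check the GROUP BY logic against the sample data, here is a sketch that runs the same query in SQLite through Python's sqlite3 module rather than in SQL Server. The dbo. prefix is dropped for SQLite, and the function name is hypothetical.

```python
import sqlite3

# Sample data from the question: (MemberKey, Member_ID, Date_Key)
MEMBERS = [(100, 1234, 89), (101, 1234, 96), (102, 1234, 97),
           (103, 1236, 96), (104, 1236, 97)]
# (Date_Key, Year, Month)
DATES = [(89, 2020, 10), (90, 2020, 11), (91, 2020, 12),
         (92, 2021, 1), (93, 2021, 2), (94, 2021, 3),
         (95, 2021, 4), (96, 2021, 5), (97, 2021, 6)]

LATEST_MONTH_SQL = """
    SELECT mbr.Member_ID, dts.Year, MAX(dts.Month) AS Month
    FROM Members AS mbr
    INNER JOIN Dates AS dts ON mbr.Date_Key = dts.Date_Key
    GROUP BY mbr.Member_ID, dts.Year
    ORDER BY mbr.Member_ID, dts.Year
"""


def latest_month_per_year(members, dates):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Members (MemberKey INTEGER, Member_ID INTEGER, Date_Key INTEGER)")
    conn.execute("CREATE TABLE Dates (Date_Key INTEGER, Year INTEGER, Month INTEGER)")
    conn.executemany("INSERT INTO Members VALUES (?, ?, ?)", members)
    conn.executemany("INSERT INTO Dates VALUES (?, ?, ?)", dates)
    # One row per member per year, keeping the highest month number seen
    result = conn.execute(LATEST_MONTH_SQL).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in latest_month_per_year(MEMBERS, DATES):
        print(row)
```

MAX works here because Month is stored as a number; a text month name would need a different ordering key.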

Query help : Running Avg for last 15 days

Please help me with a query to find the running average over every 15 days. I have used the query below, but I'm not sure how to display only the 15-day average.
Select Date,
Avg(Qty) OVER (ORDER BY Date ROWS BETWEEN 15 PRECEDING AND CURRENT ROW) AS RunningAvg
FROM Sample
Sample table (contains Qty for each day):
Date Qty
2014-10-01 4
2014-10-02 5
..
..
2014-12-31 4
Expected Result.
Date RunningAvg
2014-10-01 4
2014-10-15 XX
2014-11-01 XX
2014-11-15 XX
2014-12-01 XX
.
.
.
I'm a bit baffled by the question. Your results seem to suggest that you want the values on the 1st and 15th of the month -- and that has nothing to do with 15-day moving averages. For such filtering you can use:
select t.*
from t
where day(date) in (1, 15);
Keep in mind that months have anywhere from 28 to 31 days, so "every 15 days" does not line up with days of the month. Also, the 1st and the 15th are 14 days apart, not 15.
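If the goal is a trailing 15-day average that is reported only on the 1st and 15th, note that ROWS BETWEEN 15 PRECEDING AND CURRENT ROW spans 16 rows; 14 PRECEDING gives 15. Below is a Python sketch of that interpretation (an assumption about what the question wants), assuming exactly one row per day with no gaps, since a ROWS frame counts rows, not days. The data are synthetic and the function names hypothetical.

```python
from datetime import date, timedelta


def trailing_avg(series, window=15):
    """Average of the current row and the (window - 1) rows before it.

    Comparable to AVG(Qty) OVER (ORDER BY Date ROWS BETWEEN 14 PRECEDING
    AND CURRENT ROW) for window=15; series must be sorted by date.
    """
    out = {}
    for i, (d, _) in enumerate(series):
        # Early rows simply average over fewer values, as a SQL window does
        chunk = [q for _, q in series[max(0, i - window + 1): i + 1]]
        out[d] = sum(chunk) / len(chunk)
    return out


def on_1st_and_15th(averages):
    # Same filter as WHERE DAY(date) IN (1, 15)
    return {d: a for d, a in averages.items() if d.day in (1, 15)}


if __name__ == "__main__":
    start = date(2014, 10, 1)
    # Synthetic data: Qty = 1, 2, 3, ... on consecutive days
    series = [(start + timedelta(days=i), i + 1) for i in range(32)]
    print(on_1st_and_15th(trailing_avg(series)))
```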

How to perform multiple table calculation with joins and group by

I have two tables, client and grouping. They look like this:
Client (C_id, C_grouping_id, Month, Profit)
Grouping (Grouping_id, Month, Profit)
The client table contains monthly profit for every client and every client belongs to a specific grouping scheme specified by C_grouping_id.
The grouping table contains all the groups and their monthly profits.
I'm struggling with a query that essentially calculates the monthly residual for every subscriber:
Residual = (subscriber monthly profit - grouping monthly profit) * (average subscriber monthly profit over all months / average profit over all months for the grouping the subscriber belongs to)
I have come up with the following query so far but the results seem to be incorrect:
SELECT client.C_id, client.C_grouping_Id, client.Month,
((client.Profit - grouping.profit) * (avg(client.Profit)/avg(grouping.profit))) as "residual"
FROM client
INNER JOIN grouping
ON "C_grouping_id"="Grouping_id"
group by client.C_id, client.C_grouping_Id,client.Month, grouping.profit
I would appreciate it if someone can shed some light on what I'm doing wrong and how to correct it.
EDIT: Adding sample data and desired results
Client
C_id C_grouping_id Month Profit
001 aaa jul 10$
001 aaa aug 12$
001 aaa sep 8$
016 abc jan 25$
016 abc feb 21$
Grouping
Grouping_id Month Profit
aaa Jul 30$
aaa aug 50$
aaa Sep 15$
abc Jan 21$
abc Feb 27$
Query Result:
C_ID C_grouping_id Month Residual
001 aaa Jul (10-30)*(10/31.3)=-6.38
... and so on for every month for every client.
This can be done in a pretty straightforward way.
The main difficulty is that you are trying to deal with different levels of aggregation at once (the averages for the group and for the client, as well as the current record).
This is difficult and clumsy with a plain SELECT ... FROM ... GROUP BY statement.
But with analytic functions, a.k.a. window functions, it is easy.
Start with combining the tables and calculating the base numbers:
select c.c_id as client_id,
c.c_grouping_id as grouping_id,
c.month,
c.profit as client_profit,
g.profit as group_profit,
avg (c.profit) over (partition by c.c_id) as avg_client_profit,
avg (g.profit) over (partition by g.grouping_id) as avg_group_profit
from client c inner join grouping g
on c."C_GROUPING_ID"=g."GROUPING_ID"
and c. "MONTH" = g. "MONTH";
With this you already get the average profits by client and by grouping_id.
Be aware that I changed the data type of the currency column to DECIMAL (10,3) as a VARCHAR with a $ sign in it is just hard to convert.
I also fixed the data for MONTHS, as the test data contained different upper/lower case spellings, which prevented the join from working.
Finally, I turned all column names into upper case to make typing easier.
Anyhow, running this provides you with the following result set:
CLIENT_ID GROUPING_ID MONTH CLIENT_PROFIT GROUP_PROFIT AVG_CLIENT_PROFIT AVG_GROUP_PROFIT
16 abc JAN 25 21 23 24
16 abc FEB 21 27 23 24
1 aaa JUL 10 30 10 31.666
1 aaa AUG 12 50 10 31.666
1 aaa SEP 8 15 10 31.666
From here it's only one step further to the residual calculation.
You can either put this SQL into a view to make it reusable for other queries or use it as an inline view.
I chose to use it as a common table expression (CTE) aka WITH clause because it's nice and easy to read:
with p as
(select c.c_id as client_id,
c.c_grouping_id as grouping_id,
c.month,
c.profit as client_profit,
g.profit as group_profit,
avg (c.profit) over (partition by c.c_id) as avg_client_profit,
avg (g.profit) over (partition by g.grouping_id) as avg_group_profit
from client c inner join grouping g
on c."C_GROUPING_ID"=g."GROUPING_ID"
and c. "MONTH" = g. "MONTH")
select client_id, grouping_id, month,
client_profit, group_profit,
avg_client_profit, avg_group_profit,
round( (client_profit - group_profit)
* (avg_client_profit/avg_group_profit), 2) as residual
from p
order by grouping_id, month, client_id;
Notice how easy the whole statement is to read and how straightforward the residual calculation is.
The result is then this:
CLIENT_ID GROUPING_ID MONTH CLIENT_PROFIT GROUP_PROFIT AVG_CLIENT_PROFIT AVG_GROUP_PROFIT RESIDUAL
1 aaa AUG 12 50 10 31.666 -12
1 aaa JUL 10 30 10 31.666 -6.32
1 aaa SEP 8 15 10 31.666 -2.21
16 abc FEB 21 27 23 24 -5.75
16 abc JAN 25 21 23 24 3.83
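The residual arithmetic can be reproduced outside the database as a sanity check. This Python sketch mirrors the CTE: an inner join on (grouping_id, month), averages per client and per grouping taken over the joined rows (like AVG() OVER (PARTITION BY ...)), then the residual rounded to two places. It assumes the cleaned-up data (numeric profits, upper-case months); the helper names are hypothetical.

```python
from collections import defaultdict

# Cleaned sample data: (client_id, grouping_id, month, profit)
CLIENT = [("001", "aaa", "JUL", 10), ("001", "aaa", "AUG", 12), ("001", "aaa", "SEP", 8),
          ("016", "abc", "JAN", 25), ("016", "abc", "FEB", 21)]
# (grouping_id, month, profit)
GROUPING = [("aaa", "JUL", 30), ("aaa", "AUG", 50), ("aaa", "SEP", 15),
            ("abc", "JAN", 21), ("abc", "FEB", 27)]


def mean(xs):
    return sum(xs) / len(xs)


def residuals(client_rows, grouping_rows):
    # Index grouping profit by the join key (grouping_id, month)
    group_profit = {(g, m): p for g, m, p in grouping_rows}
    # Inner join: keep only client rows with a matching grouping row
    joined = [(c, g, m, cp, group_profit[(g, m)])
              for c, g, m, cp in client_rows if (g, m) in group_profit]
    # Window-style averages, computed over the joined rows
    by_client, by_group = defaultdict(list), defaultdict(list)
    for c, g, _, cp, gp in joined:
        by_client[c].append(cp)
        by_group[g].append(gp)
    return {
        (c, m): round((cp - gp) * (mean(by_client[c]) / mean(by_group[g])), 2)
        for c, g, m, cp, gp in joined
    }


if __name__ == "__main__":
    for key, value in sorted(residuals(CLIENT, GROUPING).items()):
        print(key, value)
```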
Cheers,
Lars