BigQuery numeric datetime to string datetime - sql

I have written a query in BigQuery like the one below:
SELECT date_trunc(dd.created, week) AS week,
COUNT(DISTINCT dd.user) AS total,
COUNT(dd.upload) AS info
FROM
local.detail dd
LEFT JOIN local.list du ON dd.id = du.id
WHERE
regexp_extract(du.email, r'@(.+)') != 'gmail.com'
GROUP BY
date_trunc(dd.created, week);
Output:
week total info
2020-02-02 00:00:00 625 382
2020-03-22 00:00:00 1059 329
I want the week column formatted like this (just month and day):
week total info
Feb 02 625 382
Mar 22 1059 329
How can I write this in BigQuery?

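Use FORMAT_DATE with the %b (abbreviated month name) and %d (day of month) format elements.
E.g. FORMAT_DATE('%b %d', DATE(date_trunc(dd.created, week)))
The DATE(...) cast assumes dd.created is a DATETIME or TIMESTAMP, as the 00:00:00 in your output suggests; drop it if the column is already a DATE.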
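A quick way to see what a format string yields is to try the same strftime-style directives outside BigQuery. The sketch below uses Python's datetime on the two week-start values from the question. It assumes Python's %a and %b mean the same as BigQuery's format elements (abbreviated weekday and month names) and that the default C locale is in effect. %b %d gives a month-and-day label, while %a %d gives the weekday name instead.

```python
from datetime import datetime

# Week-start values copied from the question's output.
weeks = [datetime(2020, 2, 2), datetime(2020, 3, 22)]

# %b = abbreviated month name, %d = zero-padded day of month.
month_day = [w.strftime("%b %d") for w in weeks]

# %a = abbreviated weekday name, shown for contrast.
weekday_day = [w.strftime("%a %d") for w in weeks]

print(month_day)    # ['Feb 02', 'Mar 22']
print(weekday_day)  # ['Sun 02', 'Sun 22']
```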

Related

How to compare same period of different date ranges in columns in BigQuery standard SQL

I have a hard time figuring out how to compare the same period (e.g. ISO week 48) across different years for a given metric, with each year in its own column. I am new to SQL and haven't fully understood how PARTITION BY works, but I guess I'll need it for my desired output.
How can I sum the data from column "metric" and compare the same periods of different date ranges (e.g. YEAR) in a table?
current table
date iso_week iso_year metric
2021-12-01 48 2021 1000
2021-11-30 48 2021 850
...
2020-11-28 48 2020 800
2020-11-27 48 2020 950
...
2019-11-27 48 2019 700
2019-11-26 48 2019 820
desired output
iso_week metric_thisYear metric_prevYear metric_prev2Year
48 1850 1750 1520
...
Consider the simple approach below:
select * from (
select * except(date)
from your_table
)
pivot (sum(metric) as metric for iso_year in (2021, 2020, 2019))
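Applied to the sample rows shown in your question, the output is a single row for iso_week 48 with metric_2021 = 1850, metric_2020 = 1750 and metric_2019 = 1520.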
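The per-year sums can be cross-checked outside BigQuery. This is a minimal Python sketch of the same pivot idea (sum metric per iso_week, one entry per iso_year), using only the rows visible in the question; the rows elided with "..." are not included, and the function name pivot_by_year is made up for illustration.

```python
from collections import defaultdict

# Sample rows from the question: (date, iso_week, iso_year, metric).
rows = [
    ("2021-12-01", 48, 2021, 1000),
    ("2021-11-30", 48, 2021, 850),
    ("2020-11-28", 48, 2020, 800),
    ("2020-11-27", 48, 2020, 950),
    ("2019-11-27", 48, 2019, 700),
    ("2019-11-26", 48, 2019, 820),
]

def pivot_by_year(rows):
    """Sum metric per (iso_week, iso_year), one dict of years per week."""
    out = defaultdict(lambda: defaultdict(int))
    for _date, iso_week, iso_year, metric in rows:
        out[iso_week][iso_year] += metric
    return {week: dict(years) for week, years in out.items()}

result = pivot_by_year(rows)
print(result)  # {48: {2021: 1850, 2020: 1750, 2019: 1520}}
```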

How to determine number of days in a month in Presto?

I have data with date, userid, and amount. I want to calculate sum(amount) divided by the total number of days in each month. The final result will be presented on a monthly basis.
The table I have looks like this
date userid amount
2019-01-01 111 10
2019-01-15 112 20
2019-01-20 113 10
2019-02-01 114 30
2019-02-15 111 20
2019-03-01 115 40
2019-03-23 155 50
desired result is like this
date avg_qty_sold
Jan-19 1.29
Feb-19 1.79
Mar-19 2.90
avg_qty_sold comes from sum(amount) / total days in the respective month
e.g. for Jan 2019 the summed amount is 40 and January has 31 days, so avg_qty_sold is 40/31
Currently I'm using CASE WHEN for this. Is there a better approach?
Since Presto 318, this is as easy as:
SELECT day(last_day_of_month(some_date))
See https://trino.io/docs/current/functions/datetime.html#last_day_of_month
Before Presto 318, you can combine date_trunc with EXTRACT:
date_trunc('month', date_value) gives the beginning of the month, while date_add('month', 1, date_trunc('month', date_value)) gives the beginning of the next month
subtracting the two date values returns an interval day to second
EXTRACT(DAY FROM interval) returns the day portion of the interval. You can also use the day convenience function instead of EXTRACT(DAY FROM ...). The EXTRACT syntax is more verbose but more standard.
presto:default> SELECT
-> date_value,
-> EXTRACT(DAY FROM (
-> date_add('month', 1, date_trunc('month', date_value)) - date_trunc('month', date_value)))
-> FROM (VALUES DATE '2019-01-15', DATE '2019-02-01') t(date_value);
date_value | _col1
------------+-------
2019-01-15 | 31
2019-02-01 | 28
(2 rows)
A less natural but slightly shorter alternative is to take the day number of the last day of the given month with day(date_add('day', -1, date_add('month', 1, date_trunc('month', date_value)))):
presto:default> SELECT
-> date_value,
-> day(date_add('day', -1, date_add('month', 1, date_trunc('month', date_value))))
-> FROM (VALUES DATE '2019-01-15', DATE '2019-02-01') t(date_value);
date_value | _col1
------------+-------
2019-01-15 | 31
2019-02-01 | 28
(2 rows)
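To check the arithmetic against the question's desired result, here is a small Python sketch of the same "first of next month minus first of this month" idea, applied to the sample rows. The helper name days_in_month is made up for illustration; this is a cross-check outside Presto, not a Presto query.

```python
from datetime import date, timedelta

def days_in_month(d):
    """Days in d's month: first of next month minus first of this month."""
    first = d.replace(day=1)
    # Adding 32 days to the 1st always lands in the next month.
    next_first = (first + timedelta(days=32)).replace(day=1)
    return (next_first - first).days

# Sample rows from the question: (date, userid, amount).
rows = [
    (date(2019, 1, 1), 111, 10),
    (date(2019, 1, 15), 112, 20),
    (date(2019, 1, 20), 113, 10),
    (date(2019, 2, 1), 114, 30),
    (date(2019, 2, 15), 111, 20),
    (date(2019, 3, 1), 115, 40),
    (date(2019, 3, 23), 155, 50),
]

# Sum amount per month, keyed by the first day of the month.
totals = {}
for d, _userid, amount in rows:
    month = d.replace(day=1)
    totals[month] = totals.get(month, 0) + amount

averages = {
    m.strftime("%b-%y"): round(total / days_in_month(m), 2)
    for m, total in totals.items()
}
print(averages)  # {'Jan-19': 1.29, 'Feb-19': 1.79, 'Mar-19': 2.9}
```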

BigQuery - SQL date function, group by month spanning years

I have a field in a BigQuery table: 'created_date'. I need to get the count of records by 'created_date', by month, spanning the years 2014...2019, for example:
Desired output:
2014
Jan 1125
Feb 3308
2015
Jan 544
Feb 107
...
2016
...
2017
2018
...
2019
Jan 448
Feb 329
...
and so on.
or even:
Jan-2014 <count>
Feb-2014 <count>
anything will do, just an inclusive count (aggregation) of all records by month.
I found several ways to do this on Stack Overflow for Oracle, PostgreSQL and MySQL, but none of the approaches work with BigQuery.
Has anyone successfully done this with BigQuery? (and how). All responses, very much appreciated.
Below is for BigQuery Standard SQL
It assumes the created_date field is of DATE data type
#standardSQL
SELECT
FORMAT_DATE('%b-%Y', created_date) mon_year,
COUNT(1) AS `count`
FROM `project.dataset.table`
GROUP BY mon_year
ORDER BY PARSE_DATE('%b-%Y', mon_year)
Above query will produce something like below
Row mon_year count
1 Jan-2014 1389
2 Feb-2014 1255
3 Mar-2014 1655
. . .
60 Dec-2018 1677
61 Jan-2019 1534
62 Feb-2019 588
Use date_trunc():
select date_trunc(created_date, month) as yyyymm, count(*)
from t
group by yyyymm
order by yyyymm;
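The first query above orders by PARSE_DATE('%b-%Y', mon_year) rather than by mon_year itself because the formatted label is a string, and strings sort alphabetically. A small Python sketch of the difference, using made-up labels in the same Mon-YYYY shape and assuming the default C locale for month names:

```python
from datetime import datetime

labels = ["Jan-2014", "Feb-2014", "Dec-2018", "Jan-2019"]

# Plain string sort: alphabetical, not chronological.
lexical = sorted(labels)

# Sort by the parsed date instead, as ORDER BY PARSE_DATE(...) does.
chronological = sorted(labels, key=lambda s: datetime.strptime(s, "%b-%Y"))

print(lexical)        # ['Dec-2018', 'Feb-2014', 'Jan-2014', 'Jan-2019']
print(chronological)  # ['Jan-2014', 'Feb-2014', 'Dec-2018', 'Jan-2019']
```

The date_trunc() variant avoids the issue because the grouping key stays a real date, which already sorts chronologically.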

Left join with nested selects and aggregate functions

Problem
I have one table of generated dates (s) which I want to join with another table (d) which is a list of dates where a specific occurrence has happened.
table s
Wednesday 23rd August 2017
Thursday 24th August 2017
Friday 25th August 2017
Saturday 26th August 2017
table d
day_created -------------------------------- count
Thursday 24th August 2017 ---------------- 45
Saturday 26th August 2017 ---------------- 32
I want to show rows where the occurrence does not take place, which I cannot do if I just have table d.
I want something that looks like:
day_created -------------------------------- count
Wednesday 23rd August --------------------- 0
Thursday 24th August 2017 ---------------- 45
Friday 25th August 2017 ------------------ 0
Saturday 26th August 2017 ---------------- 32
I've tried joining with a left join as follows:
SELECT day_created, COUNT(d.day_created) as total_per_day
FROM
(SELECT date_trunc('day', task_1.created_at) as day_created
FROM task_1
) d
LEFT JOIN (
SELECT (generate_series('2017-05-01', current_date, '1 day'::INTERVAL)) as standard_date
) s
ON d.day_created=s.standard_date
GROUP BY d.day_created
ORDER BY day_created DESC;
I don't get an error; however, the join isn't working (i.e. it doesn't return dates where the count is null). It returns the dates from table d with their counts, but not the dates in between where there are 0 occurrences.
I've been going round in circles and have understood that I need to make table s (I think!) the left table, but I'm getting confused as a newbie with the syntax.
This is all in PostgreSQL 9.5.8.
Basically, you had the LEFT JOIN backwards. This should work, with some other simplifications and performance optimizations:
SELECT s.standard_date, COUNT(d.created_at) AS total_per_day
FROM generate_series('2017-05-01', current_date, interval '1 day') s(standard_date)
LEFT JOIN task_1 d ON d.created_at >= s.standard_date
AND d.created_at < s.standard_date + interval '1 day'
GROUP BY 1
ORDER BY 1;
This counts rows in d, like you commented. Does not sum values.
Be aware that generate_series() still returns timestamp with time zone, even if you pass date values to it. You may want to cast to date or format with to_char() for display in the outer SELECT. (But rather group and order by the original timestamp value, not the formatted string.)
There may be corner cases depending on the current time zone setting and on the actual, undisclosed table definition.
Related:
How to avoid a subquery in FILTER clause?
"I have one table of generated dates (s)"
In real databases, we don't store a generated series. We just generate them when needed.
"which I want to join with another table (d) which is a list of dates where a specific occurrence has happened. [...] I want to show rows where the occurrence does not take place, which I cannot do if I just have table d."
Nah, you can do it.
CREATE TABLE d(day_created, count) AS VALUES
('24 August 2017'::date, 45),
('26 August 2017'::date, 32);
SELECT day_created, coalesce(count,0)
FROM (
SELECT d::date
FROM generate_series(
'2017-08-01'::timestamp without time zone,
'2017-09-01'::timestamp without time zone,
'1 day'
) AS gs(d)
) AS gs(day_created)
LEFT OUTER JOIN d USING(day_created)
ORDER BY day_created;
day_created | coalesce
-------------+----------
2017-08-01 | 0
2017-08-02 | 0
2017-08-03 | 0
2017-08-04 | 0
2017-08-05 | 0
2017-08-06 | 0
2017-08-07 | 0
2017-08-08 | 0
2017-08-09 | 0
2017-08-10 | 0
2017-08-11 | 0
2017-08-12 | 0
2017-08-13 | 0
2017-08-14 | 0
2017-08-15 | 0
2017-08-16 | 0
2017-08-17 | 0
2017-08-18 | 0
2017-08-19 | 0
2017-08-20 | 0
2017-08-21 | 0
2017-08-22 | 0
2017-08-23 | 0
2017-08-24 | 45
2017-08-25 | 0
2017-08-26 | 32
2017-08-27 | 0
2017-08-28 | 0
2017-08-29 | 0
2017-08-30 | 0
2017-08-31 | 0
2017-09-01 | 0
(32 rows)
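The zero-filling that coalesce(count, 0) performs after the LEFT JOIN can be pictured outside the database too. This is a minimal Python sketch over the four days from the question, assuming the per-day counts are already aggregated as in table d; the function name fill_days is made up for illustration.

```python
from datetime import date, timedelta

# Counts per day from table d in the question.
counts = {date(2017, 8, 24): 45, date(2017, 8, 26): 32}

def fill_days(start, end, counts):
    """Return (day, count) for every day in [start, end], 0 when absent."""
    n_days = (end - start).days + 1
    days = (start + timedelta(days=i) for i in range(n_days))
    return [(day, counts.get(day, 0)) for day in days]

filled = fill_days(date(2017, 8, 23), date(2017, 8, 26), counts)
for day, count in filled:
    print(day, count)
```

The generated day list plays the role of generate_series(), and counts.get(day, 0) plays the role of the outer join plus coalesce.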

Join against date range, aggregate by SUM

I need to gather the SUM of sales made on a certain category item, grouped by day for a selected date range (anywhere from one week to 12 weeks), and return 0 instead of NULL for days where no transactions have occurred.
My original idea was to use a pre-populated table called "calendar" (shown below) which has about 10yrs of dates which I could LEFT JOIN my "products" table against to get days when no sales occurred as a 0 SUM.
Result was too large to deal with, so I'm trying to first copy the selected range of dates to an empty table called "datetable" which shares the same column names as "calendar". So I have 3 tables:
"calendar" table. It has 10 years worth of dates with following column names:
IsoDate DayNameOfWeek
2012-01-01 Sun
2012-01-02 Mon
2012-01-03 Tue
2012-01-04 Wed
2012-01-05 Thu
2012-01-06 Fri
2012-01-07 Sat
2012-01-08 Sun
2012-01-09 Mon
2012-01-10 Tue
etc for 10yrs
"datetable" table (this is created empty with two columns to prefill from "calendar" table so the date range data for the LEFT JOIN is more compact):
IsoDate DayNameOfWeek
"products" table. It is where I'm storing sales for each ProductCat:
ExpDate ProductCat Amount
2012-01-03 28 232
2012-01-04 29 100
2012-01-04 29 1002
2012-01-06 12 12
2012-01-06 29 9
2012-01-07 10 100
2012-01-07 29 122
2012-01-07 29 17
The output I'm looking for based on a single "ProductCat" number, in this case 29:
IsoDate DayNameOfWeek AmountSummed
2012-01-01 Sun 0
2012-01-02 Mon 0
2012-01-03 Tue 0
2012-01-04 Wed 1102
2012-01-05 Thu 0
2012-01-06 Fri 9
2012-01-07 Sat 139
2012-01-08 Sun 0
2012-01-09 Mon 0
2012-01-10 Tue 0
My code is below. The initial insert works fine but I'm not sure of the syntax that will make the second part with the JOIN and the SUM work:
INSERT INTO datetable (IsoDate, DayNameOfWeek)
SELECT IsoDate, DayNameOfWeek
FROM calendar
WHERE IsoDate
BETWEEN '2012-07-01' AND '2012-07-10'
SELECT ExpDate, SUM(IFNULL(Amount, 0))
AS AmountSummed
FROM products
WHERE ProductCat = 29
AND ExpDate BETWEEN '2012-07-01' AND '2012-07-10'
LEFT JOIN products
ON datetable.IsoDate=products.ExpDate
GROUP BY datetable.IsoDate
EDIT
This is the code that works now:
SELECT C.IsoDate,IFNULL(SUM(P.Amount),0) AS AmountSummed
FROM calendar C LEFT OUTER JOIN products P ON C.IsoDate=P.ExpDate
AND P.ProductCat = 29
WHERE C.IsoDate BETWEEN '2012-07-01' AND '2012-07-10'
GROUP BY C.IsoDate, C.DayNameOfWeek
ORDER BY C.IsoDate
You've pretty much got what you need. However, you don't need the datetable.
Your query should look like this. Keep the ProductCat filter in the ON clause: in the WHERE clause it would discard the days that have no matching sales, which are exactly the rows the LEFT JOIN is meant to keep.
SELECT C.IsoDate, C.DayNameOfWeek, IFNULL(SUM(P.Amount),0) AS AmountSummed
FROM calendar C LEFT JOIN products P ON C.IsoDate=P.ExpDate
AND P.ProductCat = 29
WHERE C.IsoDate BETWEEN '2012-07-01' AND '2012-07-10'
GROUP BY C.IsoDate, C.DayNameOfWeek
ORDER BY C.IsoDate
If you really want to use your datetable, just substitute it for calendar and remove the C.IsoDate BETWEEN '2012-07-01' AND '2012-07-10' condition (assuming that the datetable was empty before you started), because datetable already has all the dates you are looking for.
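Where the ProductCat filter sits matters for a LEFT JOIN, and the effect is easy to check on a few of the question's sample rows. The sketch below runs both placements through Python's built-in sqlite3 module. SQLite is an assumption made only so the example is self-contained (the question appears to use MySQL), but LEFT JOIN, IFNULL and SUM behave the same way for this case.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE calendar (IsoDate TEXT, DayNameOfWeek TEXT);
CREATE TABLE products (ExpDate TEXT, ProductCat INTEGER, Amount INTEGER);
INSERT INTO calendar VALUES
  ('2012-01-03','Tue'), ('2012-01-04','Wed'),
  ('2012-01-05','Thu'), ('2012-01-06','Fri');
INSERT INTO products VALUES
  ('2012-01-03', 28, 232), ('2012-01-04', 29, 100),
  ('2012-01-04', 29, 1002), ('2012-01-06', 12, 12),
  ('2012-01-06', 29, 9);
""")

base = """
SELECT C.IsoDate, IFNULL(SUM(P.Amount), 0)
FROM calendar C LEFT JOIN products P ON C.IsoDate = P.ExpDate {on_filter}
{where_filter}
GROUP BY C.IsoDate ORDER BY C.IsoDate
"""

# Category filter inside ON: unmatched days survive with a 0 sum.
in_on = con.execute(
    base.format(on_filter="AND P.ProductCat = 29", where_filter="")
).fetchall()

# Category filter in WHERE: rows with NULL P.ProductCat are discarded.
in_where = con.execute(
    base.format(on_filter="", where_filter="WHERE P.ProductCat = 29")
).fetchall()

print(in_on)     # [('2012-01-03', 0), ('2012-01-04', 1102), ('2012-01-05', 0), ('2012-01-06', 9)]
print(in_where)  # [('2012-01-04', 1102), ('2012-01-06', 9)]
```

With the filter in ON, every calendar day is returned; with the filter in WHERE, the join behaves like an INNER JOIN and the zero days disappear.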