PIVOTing BigQuery tables on DATE type - google-bigquery

I'm trying to pivot a table with each row being a transaction with a date. E.g.
WITH Produce AS (
SELECT 'Kale' as product, 51 as sales, DATE('2020-01-01') as dates UNION ALL
SELECT 'Kale', 23, DATE('2020-01-02') UNION ALL
SELECT 'Kale', 45, DATE('2020-01-03') UNION ALL
SELECT 'Kale', 3, DATE('2020-01-04') UNION ALL
SELECT 'Apple', 70, DATE('2020-01-01') UNION ALL
SELECT 'Apple', 85, DATE('2020-01-02') UNION ALL
SELECT 'Apple', 77, DATE('2020-01-03') UNION ALL
SELECT 'Apple', 1, DATE('2020-01-04')
)
My query is something like this.
SELECT * FROM Produce
PIVOT(sum(sales) FOR dates IN (DATE('2020-01-01'), DATE('2020-01-02'), DATE('2020-01-03'), DATE('2020-01-04')))
However BigQuery returns the following error.
Generating an implicit alias for this PIVOT value is not supported; please provide an explicit alias at [14:41]
According to the docs, pivot columns with type of date should be able to generate alias implicitly, or is my understanding wrong?

From Rules for pivot_column:
A pivot_column must be a constant.
DATE('2020-01-01') is an expression, not a constant, so you need to use one of the following.
PIVOT(sum(sales) FOR dates IN (DATE '2020-01-01', ...)) -- explicit DATE literal
-- or
PIVOT(sum(sales) FOR dates IN ('2020-01-01', ...)) -- literal implicitly coerced to DATE type
-- or
PIVOT(sum(sales) FOR dates IN (DATE('2020-01-01') AS _2020_01_01, ...)) -- alias
Dynamic SQL Example
EXECUTE IMMEDIATE FORMAT("""
SELECT * FROM Produce
PIVOT (SUM(sales) FOR dates IN (%s))
""", TRIM(TO_JSON_STRING(GENERATE_DATE_ARRAY('2020-01-01', '2020-01-04')), '[]'));
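If you would rather build the IN list on the client side, the same idea can be sketched in Python. This is only an illustration, not part of the original answer: the helper name build_pivot_in_list is made up, and it assumes you want a DATE literal with an explicit alias for every day in the range.

```python
from datetime import date, timedelta


def build_pivot_in_list(start, end):
    """Return 'DATE ... AS alias' items for every day from start to end (inclusive)."""
    items = []
    current = start
    while current <= end:
        iso = current.isoformat()                # e.g. 2020-01-01
        alias = "_" + iso.replace("-", "_")      # e.g. _2020_01_01
        items.append(f"DATE '{iso}' AS {alias}")
        current += timedelta(days=1)
    return ", ".join(items)


# Hypothetical usage: paste the generated list into the PIVOT clause.
in_list = build_pivot_in_list(date(2020, 1, 1), date(2020, 1, 4))
query = f"SELECT * FROM Produce PIVOT(SUM(sales) FOR dates IN ({in_list}))"
print(query)
```

The generated string still has to be sent to BigQuery by whatever client you use; the sketch only shows how the list is assembled.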

Use below instead
SELECT * FROM Produce
PIVOT(sum(sales) FOR dates IN ('2020-01-01','2020-01-02', '2020-01-03', '2020-01-04'))

Is there a way to create a Group ID in standard SQL that changes based on some criteria in other columns?

I am working in Google Bigquery, and I am trying to calculate a column in standard SQL that would assign a Group ID to rows, based on some criteria. The criteria would be that a group ID, starting at 1, should be created per unique Variable value, and the group should be split into a new group if the time difference between the current and consecutive Time value is > 2 mins.
See image: Sample Data
I have added a column called LEAD_Time, allowing me to also calculate a Time_Diff column (mins). My desired result is the last column (GroupID). Note how variable C has been split into two groups between rows 23 and 24 due to the time difference being > 2 mins.
It is my understanding that I would need to partition by Variable, and also by some alteration of the TimeStamp_Diff column. I have however not been able to reproduce the last column as per the sample image.
Any help would be greatly appreciated!
Try the following
with sample_data as (
SELECT 1 as row, 'A' as variable, TIME '07:31:30' as time UNION ALL
SELECT 2, 'A', TIME '07:33:30' UNION ALL
SELECT 3, 'A', TIME '07:35:30' UNION ALL
SELECT 4, 'A', TIME '07:37:30' UNION ALL
SELECT 5, 'B', TIME '08:01:30' UNION ALL
SELECT 6, 'B', TIME '08:03:30' UNION ALL
SELECT 7, 'B', TIME '08:05:30' UNION ALL
SELECT 8, 'B', TIME '08:07:30' UNION ALL
SELECT 9, 'C', TIME '09:03:30' UNION ALL
SELECT 10, 'C', TIME '09:05:30' UNION ALL
SELECT 11, 'C', TIME '09:07:30' UNION ALL
SELECT 12, 'C', TIME '09:09:30' UNION ALL
SELECT 13, 'C', TIME '09:11:30' UNION ALL
SELECT 14, 'C', TIME '09:21:30' UNION ALL
SELECT 15, 'C', TIME '09:31:30' UNION ALL
SELECT 16, 'C', TIME '09:33:30' UNION ALL
SELECT 17, 'D', TIME '09:55:30'
),
time_diff_data as (
SELECT *
, LEAD(time) OVER (PARTITION BY variable ORDER BY time) as lead_time
, TIME_DIFF(LEAD(time) OVER (PARTITION BY variable ORDER BY time), time, minute) as time_diff
, TIME_DIFF(time, LAG(time) OVER (PARTITION BY variable ORDER BY time), minute) as prev_time_diff
FROM sample_data
)
select *
,(countif(prev_time_diff > 2) OVER (PARTITION BY variable ORDER BY time))+1 as group_id
from time_diff_data
I think the problem is that you need to look at the lag (the gap from the previous row) rather than the lead. From there, you can keep a running count of the rows where prev_time_diff exceeds your threshold; adding 1 to that count gives the group ID.
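To see the logic outside BigQuery, here is a rough Python sketch of the same running count. The function name assign_group_ids and the tuple layout are assumptions made for illustration, and the gap is measured in exact minutes, which matches TIME_DIFF(..., minute) for this sample data because every time ends in :30.

```python
from datetime import datetime


def assign_group_ids(rows, threshold_minutes=2):
    """rows: iterable of (variable, 'HH:MM:SS'). Returns (variable, time, group_id) tuples."""
    fmt = "%H:%M:%S"
    result = []
    prev_variable, prev_time, group_id = None, None, 0
    # Sorting by (variable, time) plays the role of PARTITION BY variable ORDER BY time.
    for variable, time_text in sorted(rows):
        current = datetime.strptime(time_text, fmt)
        if variable != prev_variable:
            group_id = 1                         # each variable restarts at group 1
        elif (current - prev_time).total_seconds() / 60 > threshold_minutes:
            group_id += 1                        # gap from the previous row is too large
        result.append((variable, time_text, group_id))
        prev_variable, prev_time = variable, current
    return result


sample = [
    ("C", "09:09:30"), ("C", "09:11:30"), ("C", "09:21:30"),
    ("C", "09:31:30"), ("C", "09:33:30"), ("D", "09:55:30"),
]
group_ids = [group_id for _, _, group_id in assign_group_ids(sample)]
print(group_ids)
```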

SQL query help using two WHERE clauses

I have a table with data spanning about two weeks. I want to see the average for the first 7 days and then the next 8.
I have tried various JOINS with no luck. I am new to SQL so I am probably missing something simple.
Basically these queries work. How do I combine them?
select count(Field)/8
from TABLE
WHERE Publish_date >= '04/05/19'
select count(Field)/7
from TABLE
WHERE Publish_date < '04/05/19'
If you really need to combine them, then you can do sub-queries:
SELECT
(
SELECT SUM(Field)/8
FROM TABLE
WHERE Publish_date >= '04/05/19'
) as date1,
(
SELECT SUM(Field)/7
FROM TABLE
WHERE Publish_date < '04/05/19'
) as date2
Note that you probably want SUM instead of COUNT here, because COUNT only returns the number of rows, not the sum of their values.
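A quick way to see the difference between COUNT and SUM is to run the combined sub-query form against a throwaway table. The sketch below uses Python's sqlite3 module purely for illustration; the table name articles, the sample rows, and the ISO date strings (assuming '04/05/19' means 5 April 2019) are all made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (publish_date TEXT, field INTEGER)")
conn.executemany(
    "INSERT INTO articles VALUES (?, ?)",
    [("2019-04-01", 10), ("2019-04-03", 20), ("2019-04-05", 30), ("2019-04-08", 50)],
)

# One row, four scalar sub-queries: COUNT and SUM for each side of the cutoff date.
query = """
SELECT
  (SELECT COUNT(field) FROM articles WHERE publish_date <  '2019-04-05') AS count_before,
  (SELECT SUM(field)   FROM articles WHERE publish_date <  '2019-04-05') AS sum_before,
  (SELECT COUNT(field) FROM articles WHERE publish_date >= '2019-04-05') AS count_after,
  (SELECT SUM(field)   FROM articles WHERE publish_date >= '2019-04-05') AS sum_after
"""
count_before, sum_before, count_after, sum_after = conn.execute(query).fetchone()
print(count_before, sum_before, count_after, sum_after)
```

Which of the two you divide by the number of days depends on whether you want the average number of rows per day or the average total per day.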
I think you can try the AVG() function. If your dates are correct, the number of days is taken care of automatically.
2 weeks => 14 days. How did you get 7 + 8 = 15 days?
If you need two different rows:
;with t (val, dt) as (
select 183, getdate()-6 union all
select 183, getdate()-5 union all
select 183, getdate()-4 union all
select 183, getdate()-3 union all
select 183, getdate()-2 union all
select 183, getdate()-1 union all
select 183, getdate() union all --< cutoff date
select 183, getdate()+1 union all
select 183, getdate()+2 union all
select 183, getdate()+3 union all
select 183, getdate()+4 union all
select 183, getdate()+5 union all
select 183, getdate()+6 union all
select 20, getdate()+7
)
select 'first-half' , AVG(val) averg from t where dt < getdate()
union all
select 'second-half' , AVG(val) averg from t where dt >= getdate()
Just use UNION ALL between them like this:
select count(Field)/8 from TABLE WHERE Publish_date >= '04/05/19'
UNION ALL
select count(Field)/7 from TABLE WHERE Publish_date < '04/05/19'

Fiscal Period Name from varchar column

I'm fairly new to SQL and learning so much from this forum. Thanks, all!
I have another one for which I seek expert advice.
I have a column which is basically a fiscal period. However, it's not a date column; it is of type varchar.
The intention is to manipulate the column value to get the name of the fiscal month out of it.
So in the example below, I have column Fiscal_Period available and I need column FiscalMonth.
Fiscal_Period FiscalMonth
----------------------------
2018001 Jan
2018002 Feb
2018003 Mar
2018004 Apr
2018005 May
2018006 Jun
2018007 Jul
2018008 Aug
2018009 Sep
2018010 Oct
2018011 Nov
2018012 Dec
Is there a more straightforward way of achieving this than the way I am currently trying?
My current hack approach is:
1. Convert the Fiscal_Period column value into YYYY-DD-MM format by using the SUBSTRING and CONCAT functions
2. Use DATETIME to convert it into a date and then extract the month out of it
3. Use the MONTH and DATENAME functions
My query:
SELECT
DATENAME(MONTH, (CONVERT(DATETIME, (CONCAT(SUBSTRING(FISCAL_PERIOD, 1, 4), '-', SUBSTRING(FISCAL_PERIOD, 6, LEN(FISCAL_PERIOD)), '-', '11'))))) AS PeriodName
FROM
table 1
Cheers!
This is slightly shorter:
datename(month, stuff(FISCAL_PERIOD, 5, 1, '-') + '-01')
If you only want the abbreviated month name, wrap the above expression in LEFT(..., 3).
As you only need a month name, neither the year nor the day needs to vary, hence you can use constants for those. In T-SQL a literal that equates to YYYYMMDD is safe to use as a date, so:
select datename(month,'2018' + right(fiscal_period,2) + '01')
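The idea behind both expressions is simply that the last two characters of the period are the month number. As a language-neutral illustration, here is a small Python sketch of that mapping; the function name fiscal_month and the hard-coded English abbreviations are assumptions made for the example.

```python
MONTH_ABBR = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
              "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def fiscal_month(fiscal_period):
    """Map a period such as '2018003' to its month abbreviation ('Mar')."""
    month = int(fiscal_period[-2:])              # last two digits hold the month number
    if not 1 <= month <= 12:
        raise ValueError(f"unexpected fiscal period: {fiscal_period!r}")
    return MONTH_ABBR[month - 1]


for period in ("2018001", "2018010", "2018012"):
    print(period, fiscal_month(period))
```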
This is basically the same as the answers already given; I just figured I'd finish typing it. It assumes the values have first been turned into a valid date format, but otherwise it works the same way.
;with src (dt) as
(
select convert(date, '20180101', 120) union all
select '20180201' union all
select '20180301' union all
select '20180401' union all
select '20180501' union all
select '20180601' union all
select '20180701' union all
select '20180801' union all
select '20180901' union all
select '20181001' union all
select '20181101' union all
select '20181201'
)
select
Dt,
left(datename(month, dt), 3)
from src

How to get all storeIds opened 12 months before a specific date with Google BigQuery?

I would like to retrieve all storeIds where dateOpened >=12 months before a specific date (e.g., 10/4/2017). What SQL statement can do it? The dateOpened field is a timestamp. I'm using Google BigQuery Legacy SQL.
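For BigQuery Legacy SQL you can use the query below (replace the placeholder [your_dataset.your_table] with your own table):
#legacySQL
SELECT storeId FROM [your_dataset.your_table]
WHERE dateOpened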
BETWEEN DATE_ADD(TIMESTAMP('2017-10-04'), -12, 'MONTH')
AND TIMESTAMP('2017-10-04')
You can test this with the "template" below, which uses dummy data:
#legacySQL
SELECT storeId FROM
(SELECT 1 AS storeId, CURRENT_TIMESTAMP() AS dateOpened),
(SELECT 2 AS storeId, TIMESTAMP('2017-02-10') AS dateOpened),
(SELECT 3 AS storeId, TIMESTAMP('2017-03-10') AS dateOpened),
(SELECT 4 AS storeId, TIMESTAMP('2016-03-10') AS dateOpened)
WHERE dateOpened
BETWEEN DATE_ADD(TIMESTAMP('2017-10-04'), -12, 'MONTH')
AND TIMESTAMP('2017-10-04')
As you might have noticed, the other answers you have received so far are for BigQuery Standard SQL.
The BigQuery team strongly suggests using Standard SQL; if you decide to follow that advice, see Migrating to Standard SQL.
In standard SQL, you can do this using dates (I don't see any reason why the time component would be important for this type of query):
where date(dateOpened) >= date_add(date(2017, 10, 4), interval -12 month)
This might work for you:
#standardSQL
WITH data AS(
SELECT 1 storeId, TIMESTAMP("2017-08-31") dateOpened UNION ALL
SELECT 2, TIMESTAMP("2017-07-31") UNION ALL
SELECT 3, TIMESTAMP("2017-05-31") UNION ALL
SELECT 4, TIMESTAMP("2017-03-31") UNION ALL
SELECT 5, TIMESTAMP("2017-01-31") UNION ALL
SELECT 6, TIMESTAMP("2016-12-31") UNION ALL
SELECT 7, TIMESTAMP("2016-09-30") UNION ALL
SELECT 8, TIMESTAMP("2016-05-31") UNION ALL
SELECT 9, TIMESTAMP("2016-02-28")
)
SELECT
storeId
FROM data
WHERE DATE(dateOpened) BETWEEN DATE_SUB(DATE("2017-08-01"), INTERVAL 12 MONTH) AND DATE("2017-08-01")
Here, 2017-08-01 is the reference date used in the filter.
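To check which rows of the sample data should come back, the same 12-month window can be sketched in plain Python. The helper months_back is a made-up name; it clamps the day to the end of the month when the target month is shorter.

```python
import calendar
from datetime import date


def months_back(d, months):
    """Return the date `months` months before d, clamping the day to the month's length."""
    total = d.year * 12 + (d.month - 1) - months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


# Same sample stores as in the query above.
stores = {
    1: date(2017, 8, 31), 2: date(2017, 7, 31), 3: date(2017, 5, 31),
    4: date(2017, 3, 31), 5: date(2017, 1, 31), 6: date(2016, 12, 31),
    7: date(2016, 9, 30), 8: date(2016, 5, 31), 9: date(2016, 2, 28),
}

reference = date(2017, 8, 1)
window_start = months_back(reference, 12)
opened_in_window = [store_id for store_id, opened in stores.items()
                    if window_start <= opened <= reference]
print(window_start, opened_in_window)
```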

Oracle SQL DATE datatype returns latest date as 31/12/99 despite younger records

This is hurting me.
Oracle RDBMS using SQL Developer v. 4
I have records in the table
EMP|DATE_STARTED
JOE|11/08/06
BOB|11/08/14
MAY|31/12/99
DATE_STARTED is DATE datatype.
My query below returns 31/12/99 as the latest date, but the table has later records (2006 and 2014).
select max(DATE_STARTED) from EMPLOYEE;
Why doesn't it return 11/08/14?
To understand why you are getting 31/12/99 as the latest date, see the demo below.
Your original query:
with sample_table (EMP,DATE_STARTED) as (
select 'JOE', '11/08/06' from dual
union all
select 'BOB', '11/08/14' from dual
union all
select 'MAY', '12/08/99' from dual
)
select max(DATE_STARTED)
from sample_table
Output
12/08/99
Now I add to_date and display the dates once again.
with sample_table (EMP,DATE_STARTED) as (
select 'JOE',to_date('11/08/06','dd-mm-yy')from dual
union all
select 'BOB', to_date('11/08/14','dd-mm-yy') from dual
union all
select 'MAY', to_date('12/08/99','dd-mm-yy') from dual
)
select DATE_STARTED-- max(DATE_STARTED)
from sample_table
Output:
11-08-2006
11-08-2014
12-08-2099
From the output of the above query you can see that the result Oracle returned is actually correct: the max date in that result set is 12-08-2099.
So the inference is that when you don't specify the year with four digits (yyyy), Oracle does an implicit conversion and fills in the century itself, depending on its system parameters.
To fix your issue, specify the full four-digit year, something like below.
with sample_table (EMP,DATE_STARTED) as (
select 'JOE', to_date('11/08/2006','dd-mm-yyyy') from dual
union all
select 'BOB', to_date('11/08/2014','dd-mm-yyyy') from dual
union all
select 'MAY', to_date('12/08/1999','dd-mm-yyyy') from dual
)
select max(DATE_STARTED)
from sample_table
Output:
11-08-2014
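The contrast between comparing text and comparing real dates is easy to reproduce outside Oracle. The Python sketch below is only an illustration with the same three sample values written with four-digit years; the variable names are made up.

```python
from datetime import datetime

raw = {"JOE": "11/08/2006", "BOB": "11/08/2014", "MAY": "12/08/1999"}

# Compared as text, '12/...' sorts after '11/...', so the 1999 row wins.
latest_as_text = max(raw.values())

# Parsed with an explicit four-digit year, the comparison is chronological.
parsed = {emp: datetime.strptime(text, "%d/%m/%Y").date() for emp, text in raw.items()}
latest_as_date = max(parsed.values())

print(latest_as_text)
print(latest_as_date.isoformat())
```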
Solved, thanks all.
Solution:
1. In SQL Developer, go to Tools\Preferences\Database\NLS Parameters\Date
2. Update DD/MON/RR hh24:mi:ss to DD/MON/RRRR hh24:mi:ss
and it delivers the expected results!