How to pivot in bigQuery using PIVOT? - google-bigquery

I am trying to pull rows as columns in bigquery.
This is how my data looks like now:
This is how I want my data to look like:
PS: While I have shown only 3 values in column SUB_CLASS_DESC actual count is in 100s. Hence, I am looking to use Procedural language as per documentation here. I followed the example shared here in towardsdatascience.com and wrote below code, but unfortunately that doesn't work:
DECLARE DEPT_CLASS_SUB_CLASS STRING;
SET DEPT_CLASS_SUB_CLASS = (SELECT CONCAT('("', STRING_AGG(DISTINCT DEPT_CLASS_SUB_CLASS, '", "'), '")')
FROM `analytics-mkt-cleanroom.Workspace.HS_AF_SG_R12_800K_SAMPLE_SALES_11_TEST`
);
EXECUTE IMMEDIATE FORMAT("""
CREATE OR REPLACE TABLE `analytics-mkt-cleanroom.Workspace.HS_AF_SG_R12_800K_SAMPLE_SALES_PIVOTED_12_TEST` AS
SELECT * FROM
(SELECT HH_ID,DEPT_CLASS_SUB_CLASS,SALE_AMT
FROM `analytics-mkt-cleanroom.Workspace.HS_AF_SG_R12_800K_SAMPLE_SALES_11_TEST`
)
PIVOT
(SUM(SALE_AMT)
,FOR DEPT_CLASS_SUB_CLASS IN %s
)""",DEPT_CLASS_SUB_CLASS);
Error I am getting:
Error message suggests to declare before the execute block, and I am doing exactly that, but I don't understand why the error still persists.
I tried declaring variables DEPT_CLASS_SUB_CLASS in different ways but not successful yet. Could anyone please point out where I might be making the mistake.
Much appreciated!

Consider below approach
execute immediate (select '''
select *
from your_table
pivot (any_value(sale_amt) for replace(sub_class_desc, ' ', '_') in (''' || list || '''))
'''
from (
select string_agg(distinct "'" || replace(sub_class_desc, ' ', '_') || "'") list
from your_table
)
)
if applied to dummy data as in your question - output is
How can I save these results into a new pivoted table? Specifically where can I put my CREATE OR REPLACE TABLE?
execute immediate (select '''
create or replace table `your_project.your_dataset.pivot_table` as
select *
from your_table
pivot (any_value(sale_amt) for replace(sub_class_desc, ' ', '_') in (''' || list || '''))
'''
from (
select string_agg(distinct "'" || replace(sub_class_desc, ' ', '_') || "'") list
from your_table
)
);

DEPT_CLASS_SUB_CLASS variable should be placed before any other statement, not just before an execute block being referenced.
From your error message, you seems to declare a variable at [411:1] which means at 411 line. Kindly move it to the top of your script at line 1 and test it again.
you have kind of a PIVOTing problem. I wrote down some test query which do PIVOTing and list columns in an alphabetical order at the same time.
DECLARE sample_data ARRAY<STRUCT<HH_ID STRING, SUB_CLASS_DESC STRING, SALE_AMT FLOAT64>> DEFAULT [
('HHH_001', 'K&B FIXTURE/PLUMBING', 139.),
('HHH_001', 'PULLDOWN KITCHEN FAUCETS', 129.),
('HHH_001', 'TUBULAR REPAIR & REPLACE', 0.)
];
CREATE TEMP TABLE data AS
SELECT r.* REPLACE(TRANSLATE(SUB_CLASS_DESC, ' &/', '___') AS SUB_CLASS_DESC)
FROM UNNEST(sample_data) r
;
EXECUTE IMMEDIATE FORMAT ("""
SELECT *
FROM data
PIVOT (SUM(SALE_AMT) AS sale_amt FOR SUB_CLASS_DESC IN ('%s'));
""", (SELECT STRING_AGG(DISTINCT SUB_CLASS_DESC, "','" ORDER BY SUB_CLASS_DESC ASC) FROM data)
);
Query Result

Related

Facing issue with Dynamic Unpivot in Google BigQuery

Desired Data
Original Data
I am using the script below to achieve this:
DECLARE myunpivot STRING;
SET myunpivot = (
SELECT CONCAT('(', STRING_AGG( column_name, ','), ')'),
From(
SELECT column_name FROM `ProjectA.ProjectA1.INFORMATION_SCHEMA.COLUMNS`
where table_name ="my_table"
));
EXECUTE IMMEDIATE format("""
SELECT * FROM `ProjectA.ProjectA1.table_name`
unpivot
(
columns
FOR value in %s
)
""", myunpivot);
I am getting this error- "Unexpected Keyword DECLARE" although I ensured that DECLARE statement is the first line in my code.
On removing the DECLARE statement, I am getting this error- "Executing OTHER_STATEMENT statements is not implemented"
I am not able to figure out what the issue is over here. Please Help!

Creating a table from a prior WITH clause in BigQuery

WITH LAYER AS (
SELECT
SPLIT(de_nest, '|')[OFFSET(1)] AS product,
....
FROM `table`,
UNNEST(SPLIT(LOWER(REPLACE(variable, '^', '-')), '-')) AS de_nest
)
-- Filter out empty products
CREATE OR REPLACE TABLE `newtable` AS
SELECT * FROM LAYER WHERE product is NOT NULL
This leads me to the following error.
Syntax error: Expected "(" or "," or keyword SELECT but got keyword CREATE at [25:1]
But I cannot seem to find a sensible way of resolving this. My first workload is doing the un-nesting of the first table and the second is doing some filtering on those columns generated from the un-nesting process.
You should try to put the CTE declaration after the CREATE statement:
CREATE OR REPLACE TABLE `new_table` AS
WITH layer AS ...
EDIT: a complete example
CREATE OR REPLACE TABLE
`your_project.your_dataset.your_table` AS
WITH
layer1 AS (
SELECT
'this is my CTE' AS txt),
another_cte AS (
SELECT
txt,
SPLIT(txt, ' ') AS my_array
FROM
layer1)
SELECT
*
FROM
another_cte
Creates the following table

How to join attributes in sql select statement?

I want to join few attributes in select statement as one for example
select id, (name + ' ' + surname + ' ' + age) as info from users
this doesn't work, how to do it?
I'm using postgreSQL.
Postgres uses || to concatenate strings; your query needs to be:
select id, (name || ' ' || surname || ' ' || age) as info from users
It should be the key just above the Enter key on a full keyboard, but on mine it's not a sold line - there's a break in the middle of it. You have to hold the Shift key to get the character (called a pipe, btw).
I believe the ANSI standard concatenation operator is: ||
SELECT id, name ||' ' || surname || ' ' || age AS info FROM users
It perfectly might be dependent of the database I would take a look for a concatenation function for the database you are running the select for.
Example. Mysql: CONCAT(name, surname, age).
You may need to cast the fields to a common type before concatenating. In T-SQL for example this would read.
Select id, Cast(name as varchar(50)) + ' ' + Cast(surname as varchar(50)) + ' ' +
Cast(age as varchar(3)) As info From Users
|| is used for this purpose.
Use it like
select name ||' ' ||surname ||' ' ||age as info from users
If you're using mysql or oracle then try CONCAT function:
SELECT id, CONCAT(name, ' ', surname, ' ', age) as info FROM users
That should work as is, but in general it is better not to do too much concatenation sql side if you can help it; rather return all the columns and concat them on the other end.
What are the data types you are using? You may need to CAST / CONVERT into (n)varchar
Make sure your data types are similar and convert any datatypes to string as necessary:
select id, (name + ' ' + surname + ' ' + convert(varchar(3),age)) as info from users
Works fine in most databases I know, although you probably have to CAST the age field to be TEXT. The exact method for doing this depends on the database you're using, which you did not specify.

SQL script to create insert script

A bit of a vague title, I will explain.
I am writing an SQL script to create an insert statement for each row of a table in my database, purely to be able to apply that data back to another database.
Here is what I have at the moment:
SELECT 'INSERT INTO products (id,name,description) VALUES ('||ID||','''||name||''','''||description||''');' FROM products
And it works great, outputting this:
INSERT INTO products (id,name,description) VALUES (1,'Lorem','Ipsum');
INSERT INTO products (id,name,description) VALUES (2,'Lorem','Ipsum');
INSERT INTO products (id,name,description) VALUES (3,'Lorem','Ipsum');
INSERT INTO products (id,name,description) VALUES (4,'Lorem','Ipsum');
The problem is if one of the fields is empty that row will fail to produce an update script, in the output file the line is just blank. Obviously as there are 20+ fields, some optional this means that hardly any of my scripts are generated.
Is there a way to solve this issue?
pg_dump -a -U user1 -t products -f products.copy database1
and then:
psql -U user2 -d database2 -f products.copy
and you're done. It's also safer and faster.
In the case of NULL fields you can do something like
Select COALESCE(Name, '') from...
The coalesce function returns the first nonnull value in the list.
For truly blank fields (empty nvarchar for instance) I believe your script above will work.
Use the quote_nullable() function new in PostgreSQL 8.4. In addition to permitting NULL values, it retains your data types and protects you from Bobby Tables (SQL injections):
SELECT 'INSERT INTO products (id,name,description) VALUES (' ||
quote_nullable(ID) || ',' || quote_nullable(name) || ',' ||
quote_nullable(description) || ');' FROM products;
In older versions, you get the same behavior with coalesce() and quote_literal():
SELECT 'INSERT INTO products (id,name,description) VALUES (' ||
coalesce(quote_literal(ID), 'null') || ',' ||
coalesce(quote_literal(name), 'null') || ',' ||
coalesce(quote_literal(description), 'null') || ',' ||
');' FROM products;
I wrote a python script based on #intgr answer to construct the select statement.
It takes comma separated list of columns from stdin (use -).
I wanted to use sqlparse but I couldn't understand how to use that lib.
import fileinput
names = ' '.join(fileinput.input())
ns = [x.strip() for x in names.split(',')]
quoted = ['quote_nullable(' + x + ')' for x in ns]
insert = "SELECT 'INSERT INTO <TABLE> ( " + (', ').join(ns) + " ) VALUES(' || " + (" || ',' || ").join(quoted) + " || ');' FROM <TABLE>"
print insert
A gist of the script is here: https://gist.github.com/2568047

SELECT * EXCEPT

Is there any RDBMS that implements something like SELECT * EXCEPT? What I'm after is getting all of the fields except a specific TEXT/BLOB field, and I'd like to just select everything else.
Almost daily I complain to my coworkers that someone should implement this... It's terribly annoying that it doesn't exist.
Edit: I understand everyone's concern for SELECT *. I know the risks associated with SELECT *. However, this, at least in my situation, would not be used for any Production level code, or even Development level code; strictly for debugging, when I need to see all of the values easily.
As I've stated in some of the comments, where I work is strictly a commandline shop, doing everything over ssh. This makes it difficult to use any gui tools (external connections to the database aren't allowed), etc etc.
Thanks for the suggestions though.
As others have said, it is not a good idea to do this in a query because it is prone to issues when someone changes the table structure in the future. However, there is a way to do this... and I can't believe I'm actually suggesting this, but in the spirit of answering the ACTUAL question...
Do it with dynamic SQL... this does all the columns except the "description" column. You could easily turn this into a function or stored proc.
declare #sql varchar(8000),
#table_id int,
#col_id int
set #sql = 'select '
select #table_id = id from sysobjects where name = 'MY_Table'
select #col_id = min(colid) from syscolumns where id = #table_id and name <> 'description'
while (#col_id is not null) begin
select #sql = #sql + name from syscolumns where id = #table_id and colid = #col_id
select #col_id = min(colid) from syscolumns where id = #table_id and colid > #col_id and name <> 'description'
if (#col_id is not null) set #sql = #sql + ','
print #sql
end
set #sql = #sql + ' from MY_table'
exec #sql
Create a view on the table which doesn't include the blob columns
Is there any RDBMS that implements something like SELECT * EXCEPT?
Yes, Google Big Query implements SELECT * EXCEPT:
A SELECT * EXCEPT statement specifies the names of one or more columns to exclude from the result. All matching column names are omitted from the output.
WITH orders AS(
SELECT 5 as order_id,
"sprocket" as item_name,
200 as quantity
)
SELECT * EXCEPT (order_id)
FROM orders;
Output:
+-----------+----------+
| item_name | quantity |
+-----------+----------+
| sprocket | 200 |
+-----------+----------+
EDIT:
H2 database also supports SELECT * EXCEPT (col1, col2, ...) syntax.
Wildcard expression
A wildcard expression in a SELECT statement. A wildcard expression represents all visible columns. Some columns can be excluded with optional EXCEPT clause.
EDIT 2:
Hive supports: REGEX Column Specification
A SELECT statement can take regex-based column specification in Hive releases prior to 0.13.0, or in 0.13.0 and later releases if the configuration property hive.support.quoted.identifiers is set to none.
The following query selects all columns except ds and hr.
SELECT `(ds|hr)?+.+` FROM sales
EDIT 3:
Snowflake also now supports: SELECT * EXCEPT (and a RENAME option equivalent to REPLACE in BigQuery)
EXCLUDE col_name EXCLUDE (col_name, col_name, ...)
When you select all columns (SELECT *), specifies the columns that should be excluded from the results.
RENAME col_name AS col_alias RENAME (col_name AS col_alias, col_name AS col_alias, ...)
When you select all columns (SELECT *), specifies the column aliases that should be used in the results.
and so does Databricks SQL (since Runtime 11.0)
star_clause
[ { table_name | view_name } . ] * [ except_clause ]
except_clause
EXCEPT ( { column_name | field_name } [, ...] )
and also DuckDB
-- select all columns except the city column from the addresses table
SELECT * EXCLUDE (city) FROM addresses;
-- select all columns from the addresses table, but replace city with LOWER(city)
SELECT * REPLACE (LOWER(city) AS city) FROM addresses;
-- select all columns matching the given regex from the table
SELECT COLUMNS('number\d+') FROM addresses;
DB2 allows for this. Columns have an attribute/specifier of Hidden.
From the syscolumns documentation
HIDDEN
CHAR(1) NOT NULL WITH DEFAULT 'N'
Indicates whether the column is implicitly hidden:
P Partially hidden. The column is implicitly hidden from SELECT *.
N Not hidden. The column is visible to all SQL statements.
Create table documentation As part of creating your column, you would specify the IMPLICITLY HIDDEN modifier
An example DDL from Implicitly Hidden Columns follows
CREATE TABLE T1
(C1 SMALLINT NOT NULL,
C2 CHAR(10) IMPLICITLY HIDDEN,
C3 TIMESTAMP)
IN DB.TS;
Whether this capability is such a deal maker to drive the adoption of DB2 is left as an exercise to future readers.
Is there any RDBMS that implements something like SELECT * EXCEPT
Yes! The truly relational language Tutorial D allows projection to be expressed in terms of the attributes to be removed instead of the ones to be kept e.g.
my_relvar { ALL BUT description }
In fact, its equivalent to SQL's SELECT * is { ALL BUT }.
Your proposal for SQL is a worthy one but I heard it has already been put to the SQL standard's committee by the users' group and rejected by the vendor's group :(
It has also been explicitly requested for SQL Server but the request was closed as 'won't fix'.
Yes, finally there is :) SQL Standard 2016 defines Polymorphic Table Functions
SQL:2016 introduces polymorphic table functions (PTF) that don't need to specify the result type upfront. Instead, they can provide a describe component procedure that determines the return type at run time. Neither the author of the PTF nor the user of the PTF need to declare the returned columns in advance.
PTFs as described by SQL:2016 are not yet available in any tested database.10 Interested readers may refer to the free technical report “Polymorphic table functions in SQL” released by ISO. The following are some of the examples discussed in the report:
CSVreader, which reads the header line of a CVS file to determine the number and names of the return columns
Pivot (actually unpivot), which turns column groups into rows (example: phonetype, phonenumber) -- me: no more harcoded strings :)
TopNplus, which passes through N rows per partition and one extra row with the totals of the remaining rows
Oracle 18c implements this mechanism. 18c Skip_col Polymorphic Table Function Example Oracle Live SQL and Skip_col Polymorphic Table Function Example
This example shows how to skip data based on name/specific datatype:
CREATE PACKAGE skip_col_pkg AS
-- OVERLOAD 1: Skip by name
FUNCTION skip_col(tab TABLE, col columns)
RETURN TABLE PIPELINED ROW POLYMORPHIC USING skip_col_pkg;
FUNCTION describe(tab IN OUT dbms_tf.table_t,
col dbms_tf.columns_t)
RETURN dbms_tf.describe_t;
-- OVERLOAD 2: Skip by type --
FUNCTION skip_col(tab TABLE,
type_name VARCHAR2,
flip VARCHAR2 DEFAULT 'False')
RETURN TABLE PIPELINED ROW POLYMORPHIC USING skip_col_pkg;
FUNCTION describe(tab IN OUT dbms_tf.table_t,
type_name VARCHAR2,
flip VARCHAR2 DEFAULT 'False')
RETURN dbms_tf.describe_t;
END skip_col_pkg;
and body:
CREATE PACKAGE BODY skip_col_pkg AS
/* OVERLOAD 1: Skip by name
* NAME: skip_col_pkg.skip_col
* ALIAS: skip_col_by_name
*
* PARAMETERS:
* tab - The input table
* col - The name of the columns to drop from the output
*
* DESCRIPTION:
* This PTF removes all the input columns listed in col from the output
* of the PTF.
*/
FUNCTION describe(tab IN OUT dbms_tf.table_t,
col dbms_tf.columns_t)
RETURN dbms_tf.describe_t
AS
new_cols dbms_tf.columns_new_t;
col_id PLS_INTEGER := 1;
BEGIN
FOR i IN 1 .. tab.column.count() LOOP
FOR j IN 1 .. col.count() LOOP
tab.column(i).pass_through := tab.column(i).description.name != col(j);
EXIT WHEN NOT tab.column(i).pass_through;
END LOOP;
END LOOP;
RETURN NULL;
END;
/* OVERLOAD 2: Skip by type
* NAME: skip_col_pkg.skip_col
* ALIAS: skip_col_by_type
*
* PARAMETERS:
* tab - Input table
* type_name - A string representing the type of columns to skip
* flip - 'False' [default] => Match columns with given type_name
* otherwise => Ignore columns with given type_name
*
* DESCRIPTION:
* This PTF removes the given type of columns from the given table.
*/
FUNCTION describe(tab IN OUT dbms_tf.table_t,
type_name VARCHAR2,
flip VARCHAR2 DEFAULT 'False')
RETURN dbms_tf.describe_t
AS
typ CONSTANT VARCHAR2(1024) := upper(trim(type_name));
BEGIN
FOR i IN 1 .. tab.column.count() LOOP
tab.column(i).pass_through :=
CASE upper(substr(flip,1,1))
WHEN 'F' THEN dbms_tf.column_type_name(tab.column(i).description)
!=typ
ELSE dbms_tf.column_type_name(tab.column(i).description)
=typ
END /* case */;
END LOOP;
RETURN NULL;
END;
END skip_col_pkg;
And sample usage:
-- skip number cols
SELECT * FROM skip_col_pkg.skip_col(scott.dept, 'number');
-- only number cols
SELECT * FROM skip_col_pkg.skip_col(scott.dept, 'number', flip => 'True')
-- skip defined columns
SELECT *
FROM skip_col_pkg.skip_col(scott.emp, columns(comm, hiredate, mgr))
WHERE deptno = 20;
I highly recommend to read entire example(creating standalone functions instead of package calls).
You could easily overload skip method for example: skip columns that does not start/end with specific prefix/suffix.
db<>fidde demo
Related: How to Dynamically Change the Columns in a SQL Query By Chris Saxon
Stay away from SELECT *, you are setting yourself for trouble. Always specify exactly which columns you want. It is in fact quite refreshing that the "feature" you are asking for doesn't exist.
I believe the rationale for it not existing is that the author of a query should (for performance sake) only request what they're going to look at/need (and therefore know what columns to specify) -- if someone adds a couple more blobs in the future, you'd be pulling back potentially large fields you're not going to need.
Temp table option here, just drop the columns not required and select * from the altered temp table.
/* Get the data into a temp table */
SELECT * INTO #TempTable
FROM
table
/* Drop the columns that are not needed */
ALTER TABLE #TempTable
DROP COLUMN [columnname]
SELECT * from #TempTable
declare #sql nvarchar(max)
#table char(10)
set #sql = 'select '
set #table = 'table_name'
SELECT #sql = #sql + '[' + COLUMN_NAME + '],'
FROM INFORMATION_SCHEMA.Columns
WHERE TABLE_NAME = #table
and COLUMN_NAME <> 'omitted_column_name'
SET #sql = substring(#sql,1,len(#sql)-1) + ' from ' + #table
EXEC (#sql);
I needed something like what #Glen asks for easing my life with HASHBYTES().
My inspiration was #Jasmine and #Zerubbabel answers. In my case I've different schemas, so the same table name appears more than once at sys.objects. As this may help someone with the same scenario, here it goes:
ALTER PROCEDURE [dbo].[_getLineExceptCol]
#table SYSNAME,
#schema SYSNAME,
#LineId int,
#exception VARCHAR(500)
AS
DECLARE #SQL NVARCHAR(MAX)
BEGIN
SET NOCOUNT ON;
SELECT #SQL = COALESCE(#SQL + ', ', ' ' ) + name
FROM sys.columns
WHERE name <> #exception
AND object_id = (SELECT object_id FROM sys.objects
WHERE name LIKE #table
AND schema_id = (SELECT schema_id FROM sys.schemas WHERE name LIKE #schema))
SELECT #SQL = 'SELECT ' + #SQL + ' FROM ' + #schema + '.' + #table + ' WHERE Id = ' + CAST(#LineId AS nvarchar(50))
EXEC(#SQL)
END
GO
It's an old question, but I hope this answer can still be helpful to others. It can also be modified to add more than one except fields. This can be very handy if you want to unpivot a table with many columns.
DECLARE #SQL NVARCHAR(MAX)
SELECT #SQL = COALESCE(#SQL + ', ', ' ' ) + name FROM sys.columns WHERE name <> 'colName' AND object_id = (SELECT id FROM sysobjects WHERE name = 'tblName')
SELECT #SQL = 'SELECT ' + #SQL + ' FROM ' + 'tblName'
EXEC sp_executesql #SQL
Stored Procedure:
usp_SelectAllExcept 'tblname', 'colname'
ALTER PROCEDURE [dbo].[usp_SelectAllExcept]
(
#tblName SYSNAME
,#exception VARCHAR(500)
)
AS
DECLARE #SQL NVARCHAR(MAX)
SELECT #SQL = COALESCE(#SQL + ', ', ' ' ) + name from sys.columns where name <> #exception and object_id = (Select id from sysobjects where name = #tblName)
SELECT #SQL = 'SELECT ' + #SQL + ' FROM ' + #tblName
EXEC sp_executesql #SQL
For the sake of completeness, this is possible in DremelSQL dialect, doing something like:
WITH orders AS
(SELECT 5 as order_id,
"foobar12" as item_name,
800 as quantity)
SELECT * EXCEPT (order_id)
FROM orders;
Result:
+-----------+----------+
| item_name | quantity |
+-----------+----------+
| foobar12 | 800 |
+-----------+----------+
There also seems to be another way to do it here without Dremel.
Your question was about what RDBMS supports the * EXCEPT (...) syntax, so perhaps, looking at the jOOQ manual page for * EXCEPT can be useful in the future, as that page will keep track of new dialects supporting the syntax.
Currently (mid 2022), among the jOOQ supported RDBMS, at least BigQuery, H2, and Snowflake support the syntax natively. The others need to emulate it by listing the columns explicitly:
-- ACCESS, ASE, AURORA_MYSQL, AURORA_POSTGRES, COCKROACHDB, DB2, DERBY, EXASOL,
-- FIREBIRD, HANA, HSQLDB, INFORMIX, MARIADB, MEMSQL, MYSQL, ORACLE, POSTGRES,
-- REDSHIFT, SQLDATAWAREHOUSE, SQLITE, SQLSERVER, SYBASE, TERADATA, VERTICA,
-- YUGABYTEDB
SELECT LANGUAGE.CD, LANGUAGE.DESCRIPTION
FROM LANGUAGE
-- BIGQUERY, H2
SELECT * EXCEPT (ID)
FROM LANGUAGE
-- SNOWFLAKE
SELECT * EXCLUDE (ID)
FROM LANGUAGE
Disclaimer: I work for the company behind jOOQ
As others are saying: SELECT * is a bad idea.
Some reasons:
Get only what you need (anything more is a waste)
Indexing (index what you need and you can get it more quickly. If you ask for a bunch of non-indexed columns, too, your query plans will suffer.