How to pivot on dynamic values in Snowflake - dynamic

I want to pivot a table based on a field which can contain "dynamic" values (not always known beforehand).
I can make it work by hard coding the values (which is undesirable):
SELECT *
FROM my_table
pivot(SUM(amount) FOR type_id IN (1,2,3,4,5,20,50,83,141,...));
But I can't make it work using a query to provide the values dynamically:
SELECT *
FROM my_table
pivot(SUM(amount) FOR type_id IN (SELECT id FROM types));
---
090150 (22000): Single-row subquery returns more than one row.
SELECT *
FROM my_table
pivot(SUM(amount) FOR type_id IN (SELECT ARRAY_AGG(id) FROM types));
---
001038 (22023): SQL compilation error:
Can not convert parameter 'my_table.type_id' of type [NUMBER(38,0)] into expected type [ARRAY]
Is there a way to accomplish this?

I don't think it's possible in native SQL, but I wrote an article and published some code showing how my team does this by generating the query from Python.
You can call the Python script directly, passing arguments similar to the options Excel gives you for pivot tables:
python generate_pivot_query.py \
--dbtype snowflake --database mydb \
--host myhost.url --port 5432 \
--user me --password myp4ssw0rd \
--base-columns customer_id \
--pivot-columns category \
--exclude-columns order_id \
--aggfunction-mappings amount=sum \
myschema orders
Or, if you're Airflow, you can use a CreatePivotTableOperator to create tasks directly.

I wrote a Snowflake stored procedure to get dynamics pivots inside Snowflake, check:
https://hoffa.medium.com/dynamic-pivots-in-sql-with-snowflake-c763933987c
3 steps:
Query
Call stored procedure call pivot_prev_results()
Find the results select * from table(result_scan(last_query_id(-2)))
The procedure:
create or replace procedure pivot_prev_results()
returns string
language javascript
execute as caller as
$$
var cols_query = `
select '\\''
|| listagg(distinct pivot_column, '\\',\\'') within group (order by pivot_column)
|| '\\''
from table(result_scan(last_query_id(-1)))
`;
var stmt1 = snowflake.createStatement({sqlText: cols_query});
var results1 = stmt1.execute();
results1.next();
var col_list = results1.getColumnValue(1);
pivot_query = `
select *
from (select * from table(result_scan(last_query_id(-2))))
pivot(max(pivot_value) for pivot_column in (${col_list}))
`
var stmt2 = snowflake.createStatement({sqlText: pivot_query});
stmt2.execute();
return `select * from table(result_scan('${stmt2.getQueryId()}'));\n select * from table(result_scan(last_query_id(-2)));`;
$$;

Inspired by my two predecessors, I created another stored proc that could be called to create even multi-grouped and multi-pivot emulated pivot queries:
create or replace procedure
test_pivot.public.get_full_pivot(
"source" varchar, // fully-qualified 'table/view_name' or full '(subquery)'
"row_headers" varchar, // comma-separated list of 1+ GROUP BY field names
"col_header1" varchar, // first (mandatory) PIVOT field name
"col_header2" varchar, // secondary (optional) PIVOT field name ('' if none)
"agg" varchar, // field name for the aggregate values
"aggf" varchar) // aggregate function (sum, avg, min, max, count...)
returns varchar
language javascript
as $$
// collect all distinct values for a column header field
function get_distinct_values(col_header) {
var vals = [];
if (col_header != '') {
var result = snowflake.execute(
{sqlText: `select distinct ${col_header}\n`
+ `from ${source}\n`
+ `order by ${col_header}`}); // nulls last!
while (result.next())
vals.push(result.getColumnValueAsString(1));
}
return vals;
}
var vals1 = get_distinct_values(col_header1);
var vals2 = get_distinct_values(col_header2);
// create and return the emulated pivot query, for one or two column header values
var query = `select ${row_headers}`;
if (vals2.length == 0)
for (const i in vals1) {
var cond1 = (vals1[i] == 'null'
? `${col_header1} is null` : `to_char(${col_header1})='${vals1[i]}'`);
query += `,\n ${aggf}(iff(${cond1}, ${agg}, null)) as "${vals1[i]}"`;
}
else
for (const i in vals1)
for (const j in vals2) {
var cond1 = (vals1[i] == 'null'
? `${col_header1} is null` : `to_char(${col_header1})='${vals1[i]}'`);
var cond2 = (vals2[j] == 'null'
? `${col_header2} is null` : `to_char(${col_header2})='${vals2[j]}'`);
query += `,\n ${aggf}(iff(${cond1} AND ${cond2}, ${agg}, null)) as "${vals1[i]}+${vals2[j]}"`;
}
query += `\nfrom ${source}\n`
+ `group by ${row_headers}\n`
+ `order by ${row_headers};`; // nulls last!
return query;
$$;
Call with:
call test_pivot.public.get_full_pivot(
'test_pivot.public.demographics',
'country, education', 'status', 'gender', 'sales', 'sum');
to generate the following SQL:
select country, education,
sum(iff(to_char(status)='divorced' AND to_char(gender)='F', sales, null)) as "divorced+F",
sum(iff(to_char(status)='divorced' AND to_char(gender)='M', sales, null)) as "divorced+M",
sum(iff(to_char(status)='married' AND to_char(gender)='F', sales, null)) as "married+F",
sum(iff(to_char(status)='married' AND to_char(gender)='M', sales, null)) as "married+M",
sum(iff(to_char(status)='single' AND to_char(gender)='F', sales, null)) as "single+F",
sum(iff(to_char(status)='single' AND to_char(gender)='M', sales, null)) as "single+M",
sum(iff(status is null AND to_char(gender)='F', sales, null)) as "null+F",
sum(iff(status is null AND to_char(gender)='M', sales, null)) as "null+M"
from test_pivot.public.demographics
group by country, education
order by country, education;
Which may return a result structured like:

Related

Declare variables in scheduled query BigQuery;

I am developing a scheduled query where I am using the WITH statement to join and filtrate several tables from BigQuery. To filtrate the dates, I would like to declare the following variables:
DECLARE initial, final DATE;
SET initial = DATE_TRUNC(DATE_TRUNC(CURRENT_DATE(), MONTH)+7,ISOWEEK);
SET final = LAST_DAY(DATE_TRUNC(CURRENT_DATE(), MONTH)+7, ISOWEEK);
However, when executing this query, I am getting two results; one for the variables declared (which I am not interested in having them as output), and the WITH statement that is selected at the end (which as the results that I am interested in).
The principal problem is that, whenever I try t connect this scheduled query to a table in Google Data Studio I get the following error:
Invalid value: configuration.query.destinationTable cannot be set for scripts;
How can I declare a variable without getting it as a result at the end?
Here you have a sample of the code I am trying work in:
DECLARE initial, final DATE;
SET initial = DATE_TRUNC(DATE_TRUNC(CURRENT_DATE(), MONTH)+7,ISOWEEK);
SET final = LAST_DAY(DATE_TRUNC(CURRENT_DATE(), MONTH)+7, ISOWEEK);
WITH HelloWorld AS (
SELECT shop_date, revenue
FROM fulltable
WHERE shop_date >= initial
AND shop_date <= final
)
SELECT * from HelloWorld;
with initial1 as ( select DATE_TRUNC(DATE_TRUNC(CURRENT_DATE(), MONTH)+7,ISOWEEK) as initial2),
final1 as ( select LAST_DAY(DATE_TRUNC(CURRENT_DATE(), MONTH)+7, ISOWEEK) as final2),
HelloWorld AS (
SELECT shop_date, revenue
FROM fulltable
WHERE shop_date >= (select initial2 from initial1) AND shop_date <= (select final2 from final1)
)
SELECT * from HelloWorld;
With config table having just 1 row and cross-joining it with your table, your query can be written like below.
WITH config AS (
SELECT DATE_TRUNC(DATE_TRUNC(CURRENT_DATE(), MONTH)+7,ISOWEEK) AS initial,
LAST_DAY(DATE_TRUNC(CURRENT_DATE(), MONTH)+7, ISOWEEK) AS final
),
HelloWorld AS (
SELECT * FROM UNNEST([DATE '2022-06-06']) shop_date, config
WHERE shop_date >= config.initial AND shop_date <= config.final
)
SELECT * FROM HelloWorld;
A few patterns I've used:
If you have many that have the same return type (STRING)
CREATE TEMP FUNCTION config(key STRING)
RETURNS STRING AS (
CASE key
WHEN "timezone" THEN "America/Edmonton"
WHEN "something" THEN "Value"
END
);
Then use config(key) to retrieve the value.
Or,
Create a function for each constant
CREATE TEMP FUNCTION timezone()
RETURNS STRING AS ("America/Edmonton");
Then use timezone() to get the value.
It would execute the function each time, so don't do something expensive in there (like SELECT from another table).

DB2 SQL function returning multiple values when I am expecting only one

I am trying to get the location of the last time an item was moved via sql function with the code below. Pretty basic, I'm just trying to grab the max date and time. If I run the sql as a regular select and hard code an item number in ATPRIM I get only one location. But if I create this function and then try to run it and then pass the function an item number I get every occurrence in the history file instead of just the MAX which would be the most recent. Also I have tried a Select Distinct and that did not do anything for me.
ATOGST = Item Location
ATPRIM = Item
ATDATE = Date
ATTIME = Time
CREATE FUNCTION ERPLXU/F#QAT1(AATPRIM VARCHAR(10))
RETURNS CHAR(50)
LANGUAGE SQL
NOT DETERMINISTIC
BEGIN DECLARE F#QAT1 CHAR(50) ;
SET F#QAT1 = ' ' ;
SELECT ATOGST
INTO F#QAT1 FROM ERPLXF/QAT as t1
WHERE ATPRIM = AATPRIM
AND ATDATE = (SELECT MAX(ATDATE) FROM ERPLXF/QAT AS T2
WHERE T2.ATPRIM = AATPRIM)
AND ATTIME = (SELECT MAX(ATTIME) FROM ERPLXF/QAT AS T3
WHERE T3.ATPRIM = AATPRIM
AND T3.ATDATE = T1.ATDATE) ;
RETURN F#QAT1 ;
END
EDIT:
So what I am trying to do is get that location and I got it to work on my iSeries in strsql but the problem is we use a web application called Web Object Wizard (WoW) which lets us use sql to make reports that are more user friendly. Below is what I was trying to get to work but the subquery in the select does not work in WoW so that is where I was trying create a function which we know works in other applications.
SELECT distinct t1.atprim, atdesc, dbtabl, dbdtin, dblife, dblpdp,
dbcost, dbbas, dbresv, dbyrdp, dbcurr,
(select atogst
from erplxf.qat as t2
where t1.atprim = t2.atprim and atdate = (select max(atdate) from
erplxf.qat as t3 where t2.atprim = t3.atprim) and attime = (select
max(attime) from erplxf.qat as t4 where t1.atprim = t4.atprim and
t1.atdate = t4.atdate)
) as #113_ToLoc
FROM erplxf.qat as t1 join erplxf.qdb on atassn = dbassn
where dbrcid = 'DB'
and dbcurr != 0
So instead of that subquery at the end of the select it would just be
, erplxu.f#qat1(atprim) as #113_ToLoc
Try this:
CREATE FUNCTION ERPLXU/F#QAT1(AATPRIM VARCHAR(10))
RETURNS CHAR(50)
LANGUAGE SQL
RETURN
SELECT ATOGST
FROM ERPLXF/QAT
WHERE ATPRIM = AATPRIM
ORDER BY ATDATE DESC, ATTIME DESC
FETCH FIRST 1 ROW ONLY;

Is it possible to invoke BigQuery procedures in python client?

Scripting/procedures for BigQuery just came out in beta - is it possible to invoke procedures using the BigQuery python client?
I tried:
query = """CALL `myproject.dataset.procedure`()...."""
job = client.query(query, location="US",)
print(job.results())
print(job.ddl_operation_performed)
print(job._properties) but that didn't give me the result set from the procedure. Is it possible to get the results?
Thank you!
Edited - stored procedure I am calling
CREATE OR REPLACE PROCEDURE `Project.Dataset.Table`(IN country STRING, IN accessDate DATE, IN accessId, OUT saleExists INT64)
BEGIN
IF EXISTS (SELECT 1 FROM dataset.table where purchaseCountry = country and purchaseDate=accessDate and customerId = accessId)
THEN
SET saleExists = (SELECT 1);
ELSE
INSERT Dataset.MissingSalesTable (purchaseCountry, purchaseDate, customerId) VALUES (country, accessDate, accessId);
SET saleExists = (SELECT 0);
END IF;
END;
If you follow the CALL command with a SELECT statement, you can get the return value of the function as a result set. For example, I created the following stored procedure:
BEGIN
-- Build an array of the top 100 names from the year 2017.
DECLARE
top_names ARRAY<STRING>;
SET
top_names = (
SELECT
ARRAY_AGG(name
ORDER BY
number DESC
LIMIT
100)
FROM
`bigquery-public-data.usa_names.usa_1910_current`
WHERE
year = 2017 );
-- Which names appear as words in Shakespeare's plays?
SET
top_shakespeare_names = (
SELECT
ARRAY_AGG(name)
FROM
UNNEST(top_names) AS name
WHERE
name IN (
SELECT
word
FROM
`bigquery-public-data.samples.shakespeare` ));
END
Running the following query will return the procedure's return as the top-level results set.
DECLARE top_shakespeare_names ARRAY<STRING> DEFAULT NULL;
CALL `my-project.test_dataset.top_names`(top_shakespeare_names);
SELECT top_shakespeare_names;
In Python:
from google.cloud import bigquery
client = bigquery.Client()
query_string = """
DECLARE top_shakespeare_names ARRAY<STRING> DEFAULT NULL;
CALL `swast-scratch.test_dataset.top_names`(top_shakespeare_names);
SELECT top_shakespeare_names;
"""
query_job = client.query(query_string)
rows = list(query_job.result())
print(rows)
Related: If you have SELECT statements within a stored procedure, you can walk the job to fetch the results, even if the SELECT statement isn't the last statement in the procedure.
# TODO(developer): Import the client library.
# from google.cloud import bigquery
# TODO(developer): Construct a BigQuery client object.
# client = bigquery.Client()
# Run a SQL script.
sql_script = """
-- Declare a variable to hold names as an array.
DECLARE top_names ARRAY<STRING>;
-- Build an array of the top 100 names from the year 2017.
SET top_names = (
SELECT ARRAY_AGG(name ORDER BY number DESC LIMIT 100)
FROM `bigquery-public-data.usa_names.usa_1910_2013`
WHERE year = 2000
);
-- Which names appear as words in Shakespeare's plays?
SELECT
name AS shakespeare_name
FROM UNNEST(top_names) AS name
WHERE name IN (
SELECT word
FROM `bigquery-public-data.samples.shakespeare`
);
"""
parent_job = client.query(sql_script)
# Wait for the whole script to finish.
rows_iterable = parent_job.result()
print("Script created {} child jobs.".format(parent_job.num_child_jobs))
# Fetch result rows for the final sub-job in the script.
rows = list(rows_iterable)
print("{} of the top 100 names from year 2000 also appear in Shakespeare's works.".format(len(rows)))
# Fetch jobs created by the SQL script.
child_jobs_iterable = client.list_jobs(parent_job=parent_job)
for child_job in child_jobs_iterable:
child_rows = list(child_job.result())
print("Child job with ID {} produced {} rows.".format(child_job.job_id, len(child_rows)))
It works if you have SELECT inside your procedure, given the procedure being:
create or replace procedure dataset.proc_output() BEGIN
SELECT t FROM UNNEST(['1','2','3']) t;
END;
Code:
from google.cloud import bigquery
client = bigquery.Client()
query = """CALL dataset.proc_output()"""
job = client.query(query, location="US")
for result in job.result():
print result
will output:
Row((u'1',), {u't': 0})
Row((u'2',), {u't': 0})
Row((u'3',), {u't': 0})
However, if there are multiple SELECT inside a procedure, only the last result set can be fetched this way.
Update
See below example:
CREATE OR REPLACE PROCEDURE zyun.exists(IN country STRING, IN accessDate DATE, OUT saleExists INT64)
BEGIN
SET saleExists = (WITH data AS (SELECT "US" purchaseCountry, DATE "2019-1-1" purchaseDate)
SELECT Count(*) FROM data where purchaseCountry = country and purchaseDate=accessDate);
IF saleExists = 0 THEN
INSERT Dataset.MissingSalesTable (purchaseCountry, purchaseDate, customerId) VALUES (country, accessDate, accessId);
END IF;
END;
BEGIN
DECLARE saleExists INT64;
CALL zyun.exists("US", DATE "2019-2-1", saleExists);
SELECT saleExists;
END
BTW, your example is much better served with a single MERGE statement instead of a script.

Selecting one out of many strings with concatenation in Redshift

I have to match the following URLs by writing a query in Amazon Redshift:
<some url>/www.abc.com/a/<more url>
<some url>/www.abc.com/b/<more url>
<some url>/www.abc.com/c/<more url>
<some url>/www.abc.com/d/<more url>
Here, obviously the "/www.abc.com/" is constant, but the text after '/' can change. It can take one of the many values that I have a list of (a,b,c,d in this case). How do I match this part that comes immediately after "/www.abc.com/"?
I can think of the following:
select text,
case
when text ilike '%/www.abc.com/' || <what should go here?> || '/%' then 'URLType1'
when <some other condition> then 'URLType2'
end as URLType
from table
I have to maintain the CASE structure.
Any help would be much appreciated.
THe options are:
1) put the list of values into a subquery and then join to this list like this:
with
value_list as (
select 'a' as val union select 'b' union select 'c' union select 'd'
)
select text
from table
join value_list
on text ilike '%/www.abc.com/' || val || '/%'
2) use OR:
select text,
case
when text ilike '%/www.abc.com/a/%'
or text ilike '%/www.abc.com/b/%'
or text ilike '%/www.abc.com/c/%'
or text ilike '%/www.abc.com/d/%'
then 'URLType1'
when <some other condition>
then 'URLType2'
end as URLType
from table
3) Write a Python UDF that takes the url and the list and returns true or false like this:
CREATE OR REPLACE FUNCTION multi_ilike(str varchar(max),arr varchar(max))
RETURNS boolean
STABLE AS $$
if str==None or arr==None:
return None
arr = arr.split(',')
str = str.lower()
for i in arr:
if i.lower() in str:
return True
return False
$$ LANGUAGE plpythonu;
select multi_ilike('<some url>/www.abc.com/a/<more url>','/www.abc.com/a/,/www.abc.com/b/,/www.abc.com/c/,/www.abc.com/d/'); -- returns true
select multi_ilike('<some url>/www.abc.com/F/<more url>','/www.abc.com/a/,/www.abc.com/b/,/www.abc.com/c/,/www.abc.com/d/'); -- returns false

Error creating function in DB2 with params

I have a problem with a function in db2
The function finds a record, and returns a number according to whether the first and second recorded by a user
The query within the function is this
SELECT
CASE
WHEN NUM IN (1,2) THEN 5
ELSE 2.58
END AS VAL
FROM (
select ROW_NUMBER() OVER() AS NUM ,s.POLLIFE
from LQD943DTA.CAQRTRML8 c
INNER JOIN LSMODXTA.SCSRET s ON c.MCCNTR = s.POLLIFE
WHERE s.NOEMP = ( SELECT NOEMP FROM LSMODDTA.LOLLM04 WHERE POLLIFE = '0010111003')
) AS T WHERE POLLIFE = '0010111003'
And works perfect
I create the function with this code
CREATE FUNCTION LIBWEB.BNOWPAPOL(POL CHAR)
RETURNS DECIMAL(7,2)
LANGUAGE SQL
NOT DETERMINISTIC
READS SQL DATA
RETURN (
SELECT
CASE
WHEN NUM IN (1,2) THEN 5
ELSE 2.58
END AS VAL
FROM (
select ROW_NUMBER() OVER() AS NUM ,s.POLLIFE
from LQD943DTA.CAQRTRML8 c
INNER JOIN LSMODXTA.SCSRET s ON c.MCCNTR = s.POLLIFE
WHERE s.NOEMP = ( SELECT NOEMP FROM LSMODDTA.LOLLM04 WHERE POLLIFE = POL)
) AS T WHERE POLLIFE = POL
)
The command runs executed properly
WARNING: 17:55:40 [CREATE - 0 row(s), 0.439 secs] Command processed.
No rows were affected
When I want execute the query a get a error
SELECT LIBWEB.BNOWPAPOL('0010111003') FROM DATAS.DUMMY -- dummy has only one row
I get
[Error Code: -204, SQL State: 42704] [SQL0204] BNOWPAPOL in LIBWEB
type *N not found.
I detect, when I remove the parameter the function works fine!
With this code
CREATE FUNCTION LIBWEB.BNOWPAPOL()
RETURNS DECIMAL(7,2)
LANGUAGE SQL
NOT DETERMINISTIC
READS SQL DATA
RETURN (
SELECT
CASE
WHEN NUM IN (1,2) THEN 5
ELSE 2.58
END AS VAL
FROM (
select ROW_NUMBER() OVER() AS NUM ,s.POLLIFE
from LQD943DTA.CAQRTRML8 c
INNER JOIN LSMODXTA.SCSRET s ON c.MCCNTR = s.POLLIFE
WHERE s.NOEMP = ( SELECT NOEMP FROM LSMODDTA.LOLLM04 WHERE POLLIFE = '0010111003')
) AS T WHERE POLLIFE = '0010111003'
)
Why??
This statement:
SELECT LIBWEB.BNOWPAPOL('0010111003') FROM DATAS.DUMMY
causes this error:
[Error Code: -204, SQL State: 42704] [SQL0204] BNOWPAPOL in LIBWEB
type *N not found.
The parm value passed into the BNOWPAPOL() function is supplied as a quoted string with no definition (no CAST). The SELECT statement assumes that it's a VARCHAR value since different length strings might be given at any time and passes it to the server as a VARCHAR.
The original function definition says:
CREATE FUNCTION LIBWEB.BNOWPAPOL(POL CHAR)
The function signature is generated for a single-byte CHAR. (Function definitions can be overloaded to handle different inputs, and signatures are used to differentiate between function versions.)
Since a VARCHAR was passed from the client and only a CHAR function version was found by the server, the returned error fits. Changing the function definition or CASTing to a matching type can solve this kind of problem. (Note that a CHAR(1) parm could only correctly handle a single-character input if a value is CAST.)