Facing issue with Dynamic Unpivot in Google BigQuery

Desired Data
Original Data
I am using the script below to achieve this:
DECLARE myunpivot STRING;
SET myunpivot = (
SELECT CONCAT('(', STRING_AGG( column_name, ','), ')'),
From(
SELECT column_name FROM `ProjectA.ProjectA1.INFORMATION_SCHEMA.COLUMNS`
where table_name ="my_table"
));
EXECUTE IMMEDIATE format("""
SELECT * FROM `ProjectA.ProjectA1.table_name`
unpivot
(
columns
FOR value in %s
)
""", myunpivot);
I am getting the error "Unexpected Keyword DECLARE", although I made sure that the DECLARE statement is the first line of my script.
After removing the DECLARE statement, I get a different error: "Executing OTHER_STATEMENT statements is not implemented".
I cannot figure out what the issue is here. Please help!

Related

How to pivot in bigQuery using PIVOT?

I am trying to pull rows as columns in BigQuery.
This is what my data looks like now:
This is what I want my data to look like:
PS: While I have shown only 3 values in column SUB_CLASS_DESC, the actual count is in the hundreds. Hence, I am looking to use the procedural language as per the documentation. I followed the example shared on towardsdatascience.com and wrote the code below, but unfortunately it doesn't work:
DECLARE DEPT_CLASS_SUB_CLASS STRING;
SET DEPT_CLASS_SUB_CLASS = (SELECT CONCAT('("', STRING_AGG(DISTINCT DEPT_CLASS_SUB_CLASS, '", "'), '")')
FROM `analytics-mkt-cleanroom.Workspace.HS_AF_SG_R12_800K_SAMPLE_SALES_11_TEST`
);
EXECUTE IMMEDIATE FORMAT("""
CREATE OR REPLACE TABLE `analytics-mkt-cleanroom.Workspace.HS_AF_SG_R12_800K_SAMPLE_SALES_PIVOTED_12_TEST` AS
SELECT * FROM
(SELECT HH_ID,DEPT_CLASS_SUB_CLASS,SALE_AMT
FROM `analytics-mkt-cleanroom.Workspace.HS_AF_SG_R12_800K_SAMPLE_SALES_11_TEST`
)
PIVOT
(SUM(SALE_AMT)
,FOR DEPT_CLASS_SUB_CLASS IN %s
)""",DEPT_CLASS_SUB_CLASS);
Error I am getting:
The error message suggests declaring the variable before the execute block, and I am doing exactly that, so I don't understand why the error persists.
I tried declaring the variable DEPT_CLASS_SUB_CLASS in different ways, but without success so far. Could anyone please point out where I might be making a mistake?
Much appreciated!
Consider the approach below:
execute immediate (select '''
select *
from your_table
pivot (any_value(sale_amt) for replace(sub_class_desc, ' ', '_') in (''' || list || '''))
'''
from (
select string_agg(distinct "'" || replace(sub_class_desc, ' ', '_') || "'") list
from your_table
)
)
If applied to the dummy data in your question, the output is:
How can I save these results into a new pivoted table? Specifically where can I put my CREATE OR REPLACE TABLE?
execute immediate (select '''
create or replace table `your_project.your_dataset.pivot_table` as
select *
from your_table
pivot (any_value(sale_amt) for replace(sub_class_desc, ' ', '_') in (''' || list || '''))
'''
from (
select string_agg(distinct "'" || replace(sub_class_desc, ' ', '_') || "'") list
from your_table
)
);
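The string assembly that the two statements above perform inside BigQuery can be sketched outside the database as well. This is only an illustration in Python, not part of the original answer: build_pivot_sql and its parameters are hypothetical names, the values are sorted just to make the output deterministic (the original string_agg(distinct ...) has no ORDER BY), and values containing single quotes are not handled.

```python
def build_pivot_sql(table, values, target=None):
    """Assemble a BigQuery-style PIVOT statement from raw category values."""
    # Mirror replace(sub_class_desc, ' ', '_') and string_agg(distinct ...):
    # normalise each value, drop duplicates, then quote and join them.
    names = sorted({v.replace(" ", "_") for v in values})
    in_list = ",".join("'" + n + "'" for n in names)
    select = (
        "select * from " + table + "\n"
        "pivot (any_value(sale_amt) "
        "for replace(sub_class_desc, ' ', '_') in (" + in_list + "))"
    )
    if target is None:
        return select
    # The CREATE OR REPLACE TABLE prefix simply goes in front of the SELECT.
    return "create or replace table `" + target + "` as\n" + select


if __name__ == "__main__":
    print(build_pivot_sql("your_table", ["A B", "C", "A B"]))
    print(build_pivot_sql("your_table", ["A B"], target="your_project.your_dataset.pivot_table"))
```

Printing the generated text before running it is a cheap way to check that the IN list is quoted and comma-separated as expected.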
The DECLARE for the DEPT_CLASS_SUB_CLASS variable should be placed before any other statement in the script, not just before the execute block that references it.
From your error message, you seem to declare the variable at [411:1], that is, at line 411. Please move it to the top of your script, at line 1, and test again.
You have a kind of pivoting problem. I wrote a test query that pivots and lists the columns in alphabetical order at the same time.
DECLARE sample_data ARRAY<STRUCT<HH_ID STRING, SUB_CLASS_DESC STRING, SALE_AMT FLOAT64>> DEFAULT [
('HHH_001', 'K&B FIXTURE/PLUMBING', 139.),
('HHH_001', 'PULLDOWN KITCHEN FAUCETS', 129.),
('HHH_001', 'TUBULAR REPAIR & REPLACE', 0.)
];
CREATE TEMP TABLE data AS
SELECT r.* REPLACE(TRANSLATE(SUB_CLASS_DESC, ' &/', '___') AS SUB_CLASS_DESC)
FROM UNNEST(sample_data) r
;
EXECUTE IMMEDIATE FORMAT ("""
SELECT *
FROM data
PIVOT (SUM(SALE_AMT) AS sale_amt FOR SUB_CLASS_DESC IN ('%s'));
""", (SELECT STRING_AGG(DISTINCT SUB_CLASS_DESC, "','" ORDER BY SUB_CLASS_DESC ASC) FROM data)
);
Query Result
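To see what the TRANSLATE call and the ordered STRING_AGG in the query above produce for the three sample descriptions, the same steps can be mimicked in Python. This is a rough sketch rather than BigQuery code: pivot_in_list is a hypothetical helper, and Python's default string ordering is assumed to match the ORDER BY for these plain ASCII values.

```python
# Map ' ', '&' and '/' to '_' one character at a time,
# as TRANSLATE(SUB_CLASS_DESC, ' &/', '___') does.
TO_UNDERSCORE = str.maketrans(" &/", "___")


def pivot_in_list(descriptions):
    """Return the distinct, cleaned names joined with "','" in sorted order."""
    cleaned = {d.translate(TO_UNDERSCORE) for d in descriptions}
    return "','".join(sorted(cleaned))


sample = [
    "K&B FIXTURE/PLUMBING",
    "PULLDOWN KITCHEN FAUCETS",
    "TUBULAR REPAIR & REPLACE",
]

# The format string supplies the outer quotes, exactly like IN ('%s').
clause = "IN ('%s')" % pivot_in_list(sample)

if __name__ == "__main__":
    print(clause)
```

Note that " & " becomes three underscores, because every character is replaced individually.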

how can I print details (table name, column name, data type) of each column/table in my db2 database?

In my previous question Mark suggested a good answer for displaying count on every table in my database. I would like to expand this procedure and - instead of counts - display the specific info (TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, DATA_TYPE) about each column in the database.
So far I have the following:
--#SET TERMINATOR #
CREATE OR REPLACE FUNCTION EXPORT_SCHEMAS()
RETURNS TABLE (P_TABSCHEMA VARCHAR(128), P_TABNAME VARCHAR(128), P_COLUM_NNAME VARCHAR(128), P_DATA_TYPE VARCHAR(128))
BEGIN
DECLARE L_STMT VARCHAR(256);
DECLARE L_ROWS VARCHAR(256);
FOR V1 AS
SELECT TABSCHEMA, TABNAME
FROM SYSCAT.TABLES
WHERE TYPE = 'T'
ORDER BY 1,2
DO
SET L_STMT = 'SET ? = (SELECT TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, DATA_TYPE FROM SYSIBM.COLUMNS where TABLE_NAME = "'||V1.TABNAME||'" AND TABLE_SCHEMA = "'||V1.TABSCHEMA||'")';
PREPARE S FROM L_STMT;
EXECUTE S INTO L_ROWS;
PIPE(L_ROWS);
END FOR;
RETURN;
END#
SELECT * FROM TABLE(EXPORT_SCHEMAS())#
but now when I run it:
db2 -ntd~ -f export_schemas.sql > dump.csv
I'm getting the error:
DB21034E The command was processed as an SQL statement because it was not a
valid Command Line Processor command. During SQL processing it returned:
SQL20019N The result type returned from the function body cannot be assigned
to the data type defined in the RETURNS clause. LINE NUMBER=17.
SQLSTATE=42866
Could you please help me and let me know what is wrong here and how could I fix it? Thanks!
If you use Db2 for LUW, then you shouldn't use SYSIBM schema in your queries on the system catalog. Use SYSCAT instead.
You don't have to use any functions to get what you want here. Use the following query instead:
SELECT TABSCHEMA, TABNAME, COLNAME, TYPENAME
FROM SYSCAT.COLUMNS
ORDER BY TABSCHEMA, TABNAME, COLNO;
As for your routine, there are a number of errors in it.
1) If you want to assign multiple values with a SET statement, you must use the corresponding number of parameter markers in the statement:
SET (?, ..., ?) = (SELECT COL1, ..., COLn FROM ...);
PREPARE S FROM L_STMT;
EXECUTE S INTO L_V1, ..., L_Vn;
2) RETURNS TABLE (...) and PIPE(...) must have the same number of columns.
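Point 1 can be illustrated by generating the statement text so that the marker count always follows the column list. This is a small Python sketch and not Db2 code: build_set_statement is a hypothetical helper, and the generated statement has no WHERE clause because it only demonstrates the matching counts.

```python
def build_set_statement(columns, source):
    """Build 'SET (?, ..., ?) = (SELECT ...)' with one marker per selected column."""
    markers = ", ".join("?" for _ in columns)
    return "SET (" + markers + ") = (SELECT " + ", ".join(columns) + " FROM " + source + ")"


columns = ["TABSCHEMA", "TABNAME", "COLNAME", "TYPENAME"]
stmt = build_set_statement(columns, "SYSCAT.COLUMNS")

if __name__ == "__main__":
    print(stmt)
```

The same column list could then drive the number of variables after EXECUTE ... INTO and the number of columns declared in RETURNS TABLE.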
You could directly query the tables SYSCAT.COLUMNS and SYSCAT.TABLES. The following returns the table schema and name followed by column name and their type. Column info is sorted by the column order:
select t.tabschema, t.tabname, c.colname, c.typename, c.colno
from syscat.columns c, syscat.tables t
where t.type='T' and t.tabname=c.tabname and t.tabschema=c.tabschema
order by 1,2,c.colno
BTW: Db2 has a tool db2look to export schema information.

How to save query result when using SQL in R?

In plain SQL, we can use the following code to save a query result in a temporary table, so that we can use the result later.
CREATE TABLE #TEMPTABLE
(
Column1 type1,
Column2 type2,
Column3 type3
)
INSERT INTO #TEMPTABLE
SELECT ...
SELECT *
FROM #TEMPTABLE ...
This code is from the answer to "How to save select query results within temporary table?"
However, I am using R to connect to HANA, and I need to run the SQL query from R to select data from HANA. I need a temporary table for my query result. My code looks like this:
sqlQuery(ch,paste('
CREATE TABLE #myTemp(
"/BIC/ZSALE_OFF" INT
)
INSERT INTO #myTemp
SELECT
"/BIC/ZSALE_OFF"
FROM
"SAPB1D"."/BIC/AZ_RT_A212"
'))
I have got the following error information:
[1] "42000 257 [SAP AG][LIBODBCHDB DLL][HDBODBC] Syntax error or access violation;257 sql syntax error: incorrect syntax near \"INSERT\": line 5 col 19 (at pos 124)"
[2] "[RODBC] ERROR: Could not SQLExecDirect '\n CREATE TABLE #myTemp(\n \"/BIC/ZSALE_OFF\" INT\n )\n INSERT INTO #myTemp\n SELECT\n \"/BIC/ZSALE_OFF\"\n FROM\n \"SAPB1D\".\"/BIC/AZ_RT_A212\"\n
Without the temporary table part, the plain query works:
sqlQuery(ch,paste('
SELECT
"/BIC/ZSALE_OFF"
FROM
"SAPB1D"."/BIC/AZ_RT_A212"
'))
I am not sure if the grammar is correct, or there is something else I do not understand.

Trouble Getting Columns Names to Variable in SSIS Execute SQL Task

I'm attempting to validate some column headings before the import of a monthly data set. I've set up an Execute SQL Task that's supposed to retrieve the column headings of the prior month's table and store it in Header_Row as a single string with the field names separated by commas. The query runs just fine in SQL Server, but when running in SSIS, it throws the following error:
"The type of the value (Empty) being assigned to variable 'User:Header_Row' differs from the current variable type (String)."
1) Does this mean that I'm not getting anything back from my query?
2) Is there another method I should be using in SSIS to get the query results I'm looking for?
3) Is there an issue with me using the variable reference in my query as a portion of a string? I think the answer is yes, but would like to confirm, as my variable was still empty after changing this.
Original Query:
SELECT DISTINCT
STUFF((
SELECT
',' + COLUMN_NAME
FROM
db_Analytics.INFORMATION_SCHEMA.COLUMNS aa
WHERE
TABLE_NAME = 'dt_table_?'
ORDER BY
aa.ORDINAL_POSITION
FOR
XML PATH('')
), 1, 1, '') AS Fields
FROM
db_Analytics.INFORMATION_SCHEMA.COLUMNS a;
EDIT: After changing the variable to cover the full table name, I have a new error saying "The value type (__ComObject) can only be converted to variables of the type Object."
Final Query:
SELECT DISTINCT
CAST(STUFF((
SELECT
',' + COLUMN_NAME
FROM
db_Analytics.INFORMATION_SCHEMA.COLUMNS aa
WHERE
TABLE_NAME = ?
ORDER BY
aa.ORDINAL_POSITION
FOR
XML PATH('')
), 1, 1, '') As varchar(8000)) AS Fields
FROM
db_Analytics.INFORMATION_SCHEMA.COLUMNS a;
You are attempting to parameterize your query. Proper query parameterization is useful for avoiding SQL Injection attacks and the like.
Your query is looking for a TABLE_NAME that is literally 'dt_table_?' That's probably not what you want.
For laziness, I'd just rewrite it as
DECLARE @tname sysname = 'dt_table_' + ?;
SELECT DISTINCT
STUFF((
SELECT
',' + COLUMN_NAME
FROM
db_Analytics.INFORMATION_SCHEMA.COLUMNS aa
WHERE
TABLE_NAME = @tname
ORDER BY
aa.ORDINAL_POSITION
FOR
XML PATH('')
), 1, 1, '') AS Fields
FROM
db_Analytics.INFORMATION_SCHEMA.COLUMNS a;
If that's not working, you might need to use an Expression to build out the query.
I'm really pretty sure that this is your problem:
TABLE_NAME = 'dt_table_?'
I'm guessing this is an attempt to parameterize the query, but placing the question mark inside the single quotes causes it to be taken literally.
Try like this instead:
TABLE_NAME = ?
And when you populate the variable that you use as the parameter value, include the 'dt_table_' part in the value of the variable.
EDIT:
Also in your ResultSet assignment, try changing "Fields" to "0" in the Result Name column.
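The difference between a quoted and an unquoted question mark can be reproduced in any driver that uses ? as its parameter marker. The sketch below uses Python's built-in sqlite3 module purely as a stand-in; it is not SSIS or SQL Server, and the cols table and its rows are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cols (table_name TEXT, column_name TEXT)")
conn.executemany(
    "INSERT INTO cols VALUES (?, ?)",
    [("dt_table_202401", "id"), ("dt_table_202401", "amount")],
)

# Inside quotes the ? is part of the string literal, so this searches for a
# table that is literally named dt_table_? and finds nothing.
literal = conn.execute(
    "SELECT column_name FROM cols WHERE table_name = 'dt_table_?'"
).fetchall()

# A bare ? is a real parameter marker; the full table name goes into the value.
bound = conn.execute(
    "SELECT column_name FROM cols WHERE table_name = ? ORDER BY column_name",
    ("dt_table_" + "202401",),
).fetchall()

if __name__ == "__main__":
    print(literal)
    print(bound)
```

The first query returns no rows, which is consistent with the empty value that SSIS complained about.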
There are two issues with the query above:
1) The query in the task was not properly parameterized. I fixed this by putting the full name of the prior month's table into the variable.
2) The default length of the result was MAX, which was causing an issue when SSIS would try to put it into my variable, Header_Row. I fixed this by casting the result of the query as varchar(8000).
Thanks for the help everyone.

Need help with OracleDB SQL Developer Regular Expression Query

I run this query in Oracle SQL Developer 1.5.3:
select
COLUMNNAME ,
REPLACE( COLUMNNAME, 'BEFORESTRING', 'AFTERSTRING' )
as COLUMNNAME
from
TABLENAME
;
This isn't working. Does anyone know what's wrong with the query? Or does the Oracle SQL Developer tool have a bug?
Update: I want to change the table, not only print out a regex match.
Try:
update tablename
set columnname = REPLACE( COLUMNNAME, 'BEFORESTRING', 'AFTERSTRING' ) ;
That will change all rows unless you add a WHERE clause. If there is a lot of data this would be more efficient:
update tablename
set columnname = REPLACE( COLUMNNAME, 'BEFORESTRING', 'AFTERSTRING' )
where columnname like '%BEFORESTRING%';
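The behaviour of the filtered UPDATE can be tried out on throwaway data. The sketch below uses Python's built-in sqlite3 module as a stand-in for Oracle, so it only illustrates the idea; the sample rows are invented, and SQLite's REPLACE and LIKE are assumed to behave like Oracle's for this simple case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (columnname TEXT)")
conn.executemany(
    "INSERT INTO tablename VALUES (?)",
    [("xx BEFORESTRING yy",), ("untouched",), ("BEFORESTRING",)],
)

# A SELECT with REPLACE only changes the query output, not the stored rows.
conn.execute(
    "SELECT REPLACE(columnname, 'BEFORESTRING', 'AFTERSTRING') FROM tablename"
).fetchall()
still_old = conn.execute(
    "SELECT COUNT(*) FROM tablename WHERE columnname LIKE '%BEFORESTRING%'"
).fetchone()[0]

# The UPDATE rewrites the data; the WHERE clause limits it to matching rows.
cur = conn.execute(
    "UPDATE tablename "
    "SET columnname = REPLACE(columnname, 'BEFORESTRING', 'AFTERSTRING') "
    "WHERE columnname LIKE '%BEFORESTRING%'"
)
updated_rows = cur.rowcount
rows = [r[0] for r in conn.execute("SELECT columnname FROM tablename ORDER BY rowid")]

if __name__ == "__main__":
    print(still_old, updated_rows, rows)
```

Only the two rows that contain the search string are touched by the UPDATE; the third is skipped by the WHERE clause.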