Snowflake SQL - OBJECT_CONSTRUCT from COUNT and GROUP BY

I'm trying to summarize data in a table:
counting the total rows
counting the values in specific fields
getting the distinct values of specific fields
and, more importantly, I'm struggling with:
getting the count of each value of a field, nested in an object
Given this data:
COL1  COL2
A     0
null  1
B     null
B     null
the expected result from this query would be:
with dummy as (
select 'A' as col1, 0 as col2
union all
select null, 1
union all
select 'B', null
union all
select 'B', null
)
select
count(1) as total
,count(col1) as col1
,array_agg(distinct col1) as dist_col1
--,object_construct(???) as col1_object_count
,count(col2) as col2
,array_agg(distinct col2) as dist_col2
--,object_construct(???) as col2_object_count
from
dummy
TOTAL  COL1  DIST_COL1   COL1_OBJECT_COUNT          COL2  DIST_COL2  COL2_OBJECT_COUNT
4      3     ["A", "B"]  {"A": 1, "B": 2, null: 1}  2     [0, 1]     {0: 1, 1: 1, null: 2}
I've tried several functions inside OBJECT_CONSTRUCT mixed with ARRAY_AGG, but they all failed.
OBJECT_CONSTRUCT can work with several columns, but only when given all of them (*); if you try to put a SELECT statement inside it, it fails.
Another issue is that analytical functions are not easily accepted by the object or array functions in Snowflake.

You could use Snowflake Scripting or Snowpark for this, but here's a solution that is somewhat flexible, so you can apply it to different tables and column sets.
Create test table/view:
Create or Replace View dummy as (
select 'A' as col1, 0 as col2
union all
select null, 1
union all
select 'B', null
union all
select 'B', null
);
Set session variables for the table and column names:
set tbname = 'DUMMY';
set colnames = '["COL1", "COL2"]';
Create a view that generates the required table_column_summary data:
Create or replace View table_column_summary as
with
-- Create table of required column names
cn as (
    select VALUE::VARCHAR CNAME
    from table(flatten(input => parse_json($colnames)))
)
-- Convert rows into objects
,ro as (
    select
        object_construct_keep_null(*) row_object
    -- using identifier on session variable to dynamically supply table/view name
    from identifier($tbname)
)
-- Flatten row objects into key/values
,rof as (
    select
        key col_name,
        ifnull(value,'null')::VARCHAR col_value
    from ro, lateral flatten(input => row_object), cn
    -- You will only need this filter if you need a subset
    -- of columns from the source table/query summarised
    where col_name = cn.cname
)
-- Get the column value distinct value counts
,cdv as (
    select
        col_name,
        col_value,
        sum(1) col_value_count
    from rof
    group by 1,2
)
-- and derive required column level stats and combine with cdv
,cv as (
    select
        -- the row total is taken from the dummy view here;
        -- swap in identifier($tbname) if you want this fully generic
        (select count(1) from dummy) total,
        col_name,
        object_construct('COL_COUNT', count(col_value),
                         'COL_DIST', array_agg(distinct col_value),
                         'COL_OBJECT_COUNT', object_agg(col_value, col_value_count)) col_values
    from cdv
    group by 1,2
)
-- Return result
Select * from cv;
Use this final query if you want a solution that works flexibly with any table/columns provided as input...
Select total, object_agg(col_name, col_values) col_values_obj
From table_column_summary
Group by 1;
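As a side note on the flexible form: once everything is folded into col_values_obj, individual stats can still be pulled back out with Snowflake's path notation. A small sketch only, reusing the aliases above (the output column names here are made up):
Select total,
       col_values_obj:COL1:COL_OBJECT_COUNT as col1_object_count,
       col_values_obj:COL2:COL_DIST as dist_col2
From (
    Select total, object_agg(col_name, col_values) col_values_obj
    From table_column_summary
    Group by 1
);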
Or use this final query if you want the fixed columns output as described in your question...
Select total,
COL1[0]:COL_COUNT COL1,
COL1[0]:COL_DIST DIST_COL1,
COL1[0]:COL_OBJECT_COUNT COL1_OBJECT_COUNT,
COL2[0]:COL_COUNT COL2,
COL2[0]:COL_DIST DIST_COL2,
COL2[0]:COL_OBJECT_COUNT COL2_OBJECT_COUNT
from table_column_summary
PIVOT ( ARRAY_AGG ( col_values )
FOR col_name IN ( 'COL1', 'COL2' ) ) as pt (total, col1, col2);
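Alternatively, if you only need the fixed set of columns from the question and not the generic view, a more direct single-query sketch is possible. This assumes Snowflake's OBJECT_AGG and keys NULLs as the string 'null', since object keys cannot be NULL; the per-value counts come from uncorrelated scalar subqueries over the same data:
with dummy as (
    select 'A' as col1, 0 as col2
    union all select null, 1
    union all select 'B', null
    union all select 'B', null
)
select
    count(1) as total
    ,count(col1) as col1
    ,array_agg(distinct col1) as dist_col1
    -- per-value counts for col1, with NULLs keyed as the string 'null'
    ,(select object_agg(ifnull(col1, 'null'), cnt::variant)
      from (select col1, count(1) cnt from dummy group by col1)) as col1_object_count
    ,count(col2) as col2
    ,array_agg(distinct col2) as dist_col2
    -- object keys must be VARCHAR, so cast col2 before keying
    ,(select object_agg(ifnull(to_varchar(col2), 'null'), cnt::variant)
      from (select col2, count(1) cnt from dummy group by col2)) as col2_object_count
from dummy;
This trades flexibility for brevity: each extra column needs its own scalar subquery.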

Related

How to unpivot a single row in Oracle 11?

I have a row of data and I want to turn this row into a column so I can use a cursor to run through the data one by one. I have tried to use
SELECT * FROM TABLE(PIVOT(TEMPROW))
but I get a
'PIVOT' invalid identifier error.
I have also tried the same syntax but with
('select * from TEMPROW')
Everything I see using PIVOT uses COUNT or SUM, but I just want this one single row of all VARCHAR2 values to turn into a column.
My row would look something like this:
ABC | 123 | aaa | bbb | 111 | 222 |
And I need it to turn into this:
ABC
123
aaa
bbb
111
222
My code is similar to this:
BEGIN
    OPEN C_1 FOR SELECT * FROM TABLE(PIVOT('SELECT * FROM TEMPROW'));
    LOOP
        FETCH C_1 INTO TEMPDATA;
        EXIT WHEN C_1%NOTFOUND;
        DBMS_OUTPUT.PUT_LINE(1);
    END LOOP;
    CLOSE C_1;
END;
You have to unpivot to convert a whole row into a single column. Note that the IN list of UNPIVOT takes column names, not values:
select * from temprow
unpivot
(col_value for col_name in (col1, col2, col3, col4, col5, col6))
or use UNION, but for that you need to add the column names manually, like:
Select * from ( Select col1 from temprow
union
Select col2 from temprow union ...
Select coln from temprow )
One option for unpivoting would be numbering the columns with decode() and cross joining with a query containing the column numbers:
select decode(myId, 1, col1,
2, col2,
3, col3,
4, col4,
5, col5,
6, col6 ) as result_col
from temprow
cross join (select level AS myId FROM dual CONNECT BY level <= 6 );
or use a query with the UNPIVOT keyword, keeping in mind that the common expression for the column (namely col in this case) must have the same datatype as each corresponding expression:
select result_col from
(
select col1, to_char(col2) as col2, col3, col4,
to_char(col5) as col5, to_char(col6) as col6
from temprow
)
unpivot (result_col for col in (col1,col2,col3,col4,col5,col6));
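To tie this back to the cursor loop in the question, here is a minimal sketch, assuming TEMPROW has columns col1 through col6 and that, as stated, they are all VARCHAR2 (UNPIVOT requires matching datatypes and skips NULLs by default):
DECLARE
    CURSOR c_1 IS
        SELECT col_value
        FROM temprow
        UNPIVOT (col_value FOR col_name IN (col1, col2, col3, col4, col5, col6));
BEGIN
    -- one fetched row per original column value
    FOR rec IN c_1 LOOP
        DBMS_OUTPUT.PUT_LINE(rec.col_value);
    END LOOP;
END;
/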

Using a case statement as an if statement

I am attempting to create an IF statement in BigQuery. I have built a concept that works, but it does not select the data from a table; I can only get it to display 1 or 0.
Example:
SELECT --AS STRUCT
CASE
WHEN (
Select Count(1) FROM ( -- If the records are the same, then return = 0, if the records are not the same then > 1
Select Distinct ESCO, SOURCE, LDCTEXT, STATUS,DDR_DATE, TempF, HeatingDegreeDays, DecaTherms
from `gas-ddr.gas_ddr_outbound.LexingtonDDRsOutbound_onchange_Prior_Filtered`
Except Distinct
Select Distinct ESCO, SOURCE, LDCTEXT, STATUS,DDR_DATE, TempF, HeatingDegreeDays, DecaTherms
from `gas-ddr.gas_ddr_outbound.LexingtonDDRsOutbound_onchange_Latest_Filtered`
)
)= 0
THEN
(Select * from `gas-ddr.gas_ddr_outbound.LexingtonDDRsOutbound_onchange_Latest`)
-- This does not work: "Scalar subquery cannot have more than one column unless using SELECT AS STRUCT to build STRUCT values at [16:4]"
END
SELECT --AS STRUCT
CASE
WHEN (
Select Count(1) FROM ( -- If the records are the same, then return = 0, if the records are not the same then > 1
Select Distinct ESCO, SOURCE, LDCTEXT, STATUS,DDR_DATE, TempF, HeatingDegreeDays, DecaTherms
from `gas-ddr.gas_ddr_outbound.LexingtonDDRsOutbound_onchange_Prior_Filtered`
Except Distinct
Select Distinct ESCO, SOURCE, LDCTEXT, STATUS,DDR_DATE, TempF, HeatingDegreeDays, DecaTherms
from `gas-ddr.gas_ddr_outbound.LexingtonDDRsOutbound_onchange_Latest_Filtered`
)
)= 0
THEN 1 --- This does work
Else
0
END
How can I get this query to return results from an existing table?
Your question is still a little generic, so my answer is as well - it just mimics your use case to the extent I can reverse engineer it from your comments.
So, in the code below, project.dataset.yourtable mimics your table, whereas
project.dataset.yourtable_Prior_Filtered and project.dataset.yourtable_Latest_Filtered mimic your respective views.
#standardSQL
WITH `project.dataset.yourtable` AS (
SELECT 'aaa' cols, 'prior' filter UNION ALL
SELECT 'bbb' cols, 'latest' filter
), `project.dataset.yourtable_Prior_Filtered` AS (
SELECT cols FROM `project.dataset.yourtable` WHERE filter = 'prior'
), `project.dataset.yourtable_Latest_Filtered` AS (
SELECT cols FROM `project.dataset.yourtable` WHERE filter = 'latest'
), check AS (
SELECT COUNT(1) > 0 changed FROM (
SELECT DISTINCT cols FROM `project.dataset.yourtable_Latest_Filtered`
EXCEPT DISTINCT
SELECT DISTINCT cols FROM `project.dataset.yourtable_Prior_Filtered`
)
)
SELECT t.* FROM `project.dataset.yourtable` t
CROSS JOIN check WHERE check.changed
the result is
Row cols filter
1 aaa prior
2 bbb latest
if you changed your table to
WITH `project.dataset.yourtable` AS (
SELECT 'aaa' cols, 'prior' filter UNION ALL
SELECT 'aaa' cols, 'latest' filter
) ......
the result will be
Row cols filter
Query returned zero records.
I hope this gives you the right direction.
Added more explanations:
I could be wrong, but per your question it looks like you have one table, project.dataset.yourtable, and two views, project.dataset.yourtable_Prior_Filtered and project.dataset.yourtable_Latest_Filtered, which represent the state of your table prior to and after some event.
So, the first three CTEs in the answer above just mimic the table and views you described in your question.
They are there so you can see the concept and play with it without any extra work before adjusting it to your real use case.
For your real use case you should omit them and use your real table and view names and whatever columns they have.
So the query for you to play with is:
#standardSQL
WITH check AS (
SELECT COUNT(1) > 0 changed FROM (
SELECT DISTINCT cols FROM `project.dataset.yourtable_Latest_Filtered`
EXCEPT DISTINCT
SELECT DISTINCT cols FROM `project.dataset.yourtable_Prior_Filtered`
)
)
SELECT t.* FROM `project.dataset.yourtable` t
CROSS JOIN check WHERE check.changed
It should be a very simple IF statement in any language.
Unfortunately, no - it cannot be done with just a simple IF. If you see fit, you can submit a feature request to the BigQuery team for whatever you think makes sense.
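That said, the change-detection gate itself does not require the CROSS JOIN. As a sketch only, using the same mimicked table and view names as above, the same check can be pushed into a WHERE EXISTS clause:
#standardSQL
WITH `project.dataset.yourtable` AS (
  SELECT 'aaa' cols, 'prior' filter UNION ALL
  SELECT 'bbb' cols, 'latest' filter
), `project.dataset.yourtable_Prior_Filtered` AS (
  SELECT cols FROM `project.dataset.yourtable` WHERE filter = 'prior'
), `project.dataset.yourtable_Latest_Filtered` AS (
  SELECT cols FROM `project.dataset.yourtable` WHERE filter = 'latest'
)
SELECT t.*
FROM `project.dataset.yourtable` t
WHERE EXISTS (
  SELECT * FROM (
    SELECT DISTINCT cols FROM `project.dataset.yourtable_Latest_Filtered`
    EXCEPT DISTINCT
    SELECT DISTINCT cols FROM `project.dataset.yourtable_Prior_Filtered`
  )
)
Either way, all rows are returned when the two views differ and none when they match.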

GBQ SQL: Return blank spaces if a record is not found in the table

I have a query as below. I would like SQL to return blank spaces if a key is not found in the table.
Select * from table_A where key in (1, 2, 3, 4)
Output:
1 x y
2 a b
'' '' ''
4 ds c
Assuming table_A has 3 columns and the record for key 3 is not in the table.
Instead of empty strings you should work with NULL values to be type-safe.
NULL indicates that there is no value present, in contrast to empty strings or zeros, which are still values of a certain type.
If you wanted to use empty strings you'd have to cast the key to a string on the fly - not very convenient.
The trick to get your result is to create an ideal key table containing all keys - I'm using GENERATE_ARRAY here from 1 to the max(key). Then left join your table to it and voila:
WITH test AS (SELECT * FROM UNNEST([
STRUCT(1 AS key, 'x' AS col1, 'y' AS col2),
STRUCT(2 AS key, 'a' AS col1, 'b' AS col2),
STRUCT(4 AS key, 'x' AS col1, 'y' AS col2)
])
)
SELECT
test.*
FROM UNNEST(GENERATE_ARRAY(1, (SELECT MAX(key) FROM test))) AS key
LEFT JOIN test USING(key)
gives you the rows of test, plus a row of NULLs where key 3 is missing.
If you wanted all keys, just SELECT * FROM ...
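If you really do want the empty strings from the question's sample output rather than NULLs, a sketch along the same lines (reusing the test data above and casting everything to STRING so '' is a valid value) could be:
WITH test AS (SELECT * FROM UNNEST([
  STRUCT(1 AS key, 'x' AS col1, 'y' AS col2),
  STRUCT(2 AS key, 'a' AS col1, 'b' AS col2),
  STRUCT(4 AS key, 'x' AS col1, 'y' AS col2)
])
)
SELECT
  IFNULL(CAST(test.key AS STRING), '') AS key,
  IFNULL(test.col1, '') AS col1,
  IFNULL(test.col2, '') AS col2
FROM UNNEST(GENERATE_ARRAY(1, (SELECT MAX(key) FROM test))) AS key
LEFT JOIN test USING(key)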

Find way for gathering data and replace with values from another table

I am looking for an Oracle SQL query to find specific patterns and replace them with values from another table.
Scenario:
Table 1:
No column1
-----------------------------------------
12345 user:12345;group:56789;group:6785;...
Note: column1 may have one or more patterns.
Table2 :
Id name type
----------------------
12345 admin user
56789 testgroup group
The result should be:
No column1
-----------------------------------
12345 user: admin;group:testgroup
Logic:
First split the concatenated string into individual rows using the CONNECT BY clause and regex.
Join the newly created table (split_tab) with Table2 (tab2).
Use the LISTAGG function to concatenate the data in the columns.
Query:
WITH tab1 AS
( SELECT '12345' NO
,'user:12345;group:56789;group:6785;' column1
FROM DUAL )
,tab2 AS
( SELECT 12345 id
,'admin' name
,'user' TYPE
FROM DUAL
UNION
SELECT 56789 id
,'testgroup' name
,'group' TYPE
FROM DUAL )
SELECT no
,listagg(category||':'||name,';') WITHIN GROUP (ORDER BY tab2.id) column1
FROM ( SELECT NO
,REGEXP_SUBSTR( column1, '(\d+)', 1, LEVEL ) id
,REGEXP_SUBSTR( column1, '([a-z]+)', 1, LEVEL ) CATEGORY
FROM tab1
CONNECT BY LEVEL <= regexp_count( column1, '\d+' ) ) split_tab
,tab2
WHERE split_tab.id = tab2.id
GROUP BY no
Output:
No Column1
12345 user:admin;group:testgroup
with t1 (no, col) as
(
-- start of test data
select 1, 'user:12345;group:56789;group:6785;' from dual union all
select 2, 'user:12345;group:56789;group:6785;' from dual
-- end of test data
)
-- the lookup table which has the substitute strings
-- nid : concatenation of type and id, as it appears in table t1, used for the lookup
-- tname : required substitute for each nid
, t2 (id, name, type, nid, tname) as
(
select t.*, type || ':' || id, type || ':' || name from
(
select 12345 id, 'admin' name, 'user' type from dual union all
select 56789, 'testgroup', 'group' from dual
) t
)
--select * from t2;
-- cte table calculates the indexes for the substrings (eg, user:12345)
-- no : sequence no in t1
-- col : the input string in t1
-- si : starting index of each substring in the 'col' input string that needs attention later
-- ei : ending index of each substring in the 'col' input string
-- idx : the order of substring to put them together later
,cte (no, col, si, ei, idx) as
(
select no, col, 1, case when instr(col,';') = 0 then length(col)+1 else instr(col,';') end, 1 from t1 union all
select no, col, ei+1, case when instr(col,';', ei+1) = 0 then length(col)+1 else instr(col,';', ei+1) end, idx+1 from cte where ei + 1 <= length(col)
)
,coll(no, col, sstr, idx, newstr) as
(
select
a.no, a.col, a.sstr, a.idx,
-- when a substitute is not found in t2, use the same input substring (eg. group:6785)
case when t2.tname is null then a.sstr else t2.tname end
from
(select cte.*, substr(col, si, ei-si) as sstr from cte) a
-- we don't want to miss if there is no substitute available in t2 for a substring
left outer join
t2
on (a.sstr = t2.nid)
)
select no, col, listagg(newstr, ';') within group (order by no, col, idx) from coll
group by no, col;
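Note that the first answer's inner join silently drops tokens that have no match in tab2 (group:6785 disappears from its output), which the longer answer above handles with its left outer join. If you prefer the shape of the shorter LISTAGG query but want unmatched tokens passed through unchanged, a sketch with a left join (same sample data, plus a LEVEL-based ordering column) could look like this:
WITH tab1 AS
 ( SELECT '12345' no, 'user:12345;group:56789;group:6785;' column1 FROM dual )
,tab2 AS
 ( SELECT 12345 id, 'admin' name, 'user' type FROM dual
   UNION ALL
   SELECT 56789, 'testgroup', 'group' FROM dual )
SELECT no
      ,LISTAGG(category || ':' || NVL(tab2.name, split_tab.id), ';')
         WITHIN GROUP (ORDER BY split_tab.rn) column1
FROM ( SELECT no
             ,LEVEL rn
             ,REGEXP_SUBSTR(column1, '(\d+)', 1, LEVEL) id
             ,REGEXP_SUBSTR(column1, '([a-z]+)', 1, LEVEL) category
       FROM tab1
       CONNECT BY LEVEL <= REGEXP_COUNT(column1, '\d+') ) split_tab
LEFT JOIN tab2
       ON split_tab.id = tab2.id
GROUP BY no;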

How to select a value from different row if the column is null in the current row?

I have a decode statement in my select SQL like this -
...
decode(instr(col1,'str1'), 0, 'STR1', 'STR2') as NAME,
...
The problem is that col1 could be null. So I thought I could use an inner decode like the following:
decode(instr(
decode(col1, null, (
select unique col1 from SAMETABLE st where st.pid = pid) as col2, col1), 'str1'), 0, 'STR1', 'STR2') as NAME,
But it failed.
Here is a possible snapshot of what is in the DB:
col1 pid
row1 null 1
row2 somevalue 1
I would like to use the value of col1 in row2 to replace the value in row1 when col1 is null in row1 and the two records' pid are equal.
Can anyone point out if I'm doing something impossible?
There are the following issues with your code:
You give the inner table an alias st and then do where st.pid = pid, but that is a self-reference, because the other pid is also taken from the table of the inner query. Instead, give the table in the main query an alias.
You give the outcome of the inner query an alias (as col2), but aliases are not allowed inside expressions, so that needs to be removed.
The inner query selects unique col1, but that can still give multiple results, which will raise an error. The inner query must return exactly one value at all times (when there are different non-null values, and even when there are none), so you should use an aggregate function, like min.
decode(a, null, b, a) is a long way to write nvl(a, b).
So you could use this:
select decode(
         instr(
           nvl(col1, (select min(col1) from mytable t2 where t2.pid = t1.pid)),
           'str1'
         ),
         0, 'STR1', 'STR2'
       ) as NAME
from mytable t1
I have tried this in Oracle 11g and it works pretty well. I have also tried changing the starting value of col1 and it still works. So I guess you have some other issue related to the field type, not to how DECODE works.
DECLARE
    col1 VARCHAR2(10);
    result VARCHAR2(10);
BEGIN
    col1 := null;
    select DECODE(
             instr(DECODE(col1, null, (select 'HELLO' from DUAL), col1), 'str1'),
             0, 'STR1', 'STR2') into result
    from DUAL;
    dbms_output.PUT_LINE(result);
END;
I guess you have to change the subquery:
select unique col1 from SAMETABLE st where st.pid = pid
with something like
select unique col1 from SAMETABLE st where st.pid = pid and col1 is not null
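As a quick, self-contained sanity check of the NVL-based query from the first answer, run against the two rows shown in the question (using the placeholder names mytable, pid and col1 from that answer):
with mytable (col1, pid) as (
  select cast(null as varchar2(20)), 1 from dual union all
  select 'somevalue',                1 from dual
)
select pid,
       decode(
         instr(
           nvl(col1, (select min(col1) from mytable t2 where t2.pid = t1.pid)),
           'str1'),
         0, 'STR1', 'STR2') as name
from mytable t1;
Both rows resolve col1 to 'somevalue', and since it does not contain 'str1', both return 'STR1'.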