UNION ALL versus CONNECT BY LEVEL for generating rows

I was wondering which is a better/faster/more efficient way of turning arbitrary strings into columns:
UNION ALL:
SELECT my_field,
CASE WHEN my_field = 'str1'
THEN ...
...
END,
...
FROM (
SELECT 'str1' AS my_field FROM DUAL
UNION ALL
SELECT 'str2' AS my_field FROM DUAL
UNION ALL
SELECT 'str3' AS my_field FROM DUAL
),
...
CONNECT BY LEVEL:
SELECT CASE WHEN rowno = 1
THEN 'str1'
...
END AS my_field,
CASE WHEN rowno = 1
THEN ...
...
END,
...
FROM (
SELECT ROWNUM rowno
FROM DUAL
CONNECT BY LEVEL <= 3
),
...
I'm inclined to go with the UNION ALL version if only because it makes the outermost SELECT simpler: I don't have to do a second CASE statement to get the desired string values. It also is more readable to see WHEN my_field = 'str1' rather than WHEN rowno = 1. The only reason I ask about the CONNECT BY LEVEL version is because it was suggested in Example of Data Pivots in SQL (rows to columns and columns to rows) (see the "From Two rows to Six rows (a column to row pivot)" section).
I have only SELECT access to the Oracle database I'm using, so I cannot run EXPLAIN PLAN. I have also tried using WITH ... AS, without luck.

I think you're confusing the purposes of the UNION ALL and CONNECT BY methods used in "Example of Data Pivots in SQL (rows to columns and columns to rows)".
The UNION ALL in your question is used to transform multiple rows with a single column into a single row with multiple columns:
label, 1, val1
label, 2, val2
label, 3, val3
into
label, val1, val2, val3
The CONNECT BY sub-query is used to transform a single row with multiple columns into multiple rows with a single column, so it serves as a generator sub-query that multiplies the existing data set:
label, val1, val2, val3
+
1
2
3
which results in:
label, 1, val1, val2, val3
label, 2, val1, val2, val3
label, 3, val1, val2, val3
transformed into:
label, 1, val1
label, 2, val2
label, 3, val3
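The two steps above (multiply the row with a generator, then keep one value per generated row) can be sketched outside the database. This is only a minimal Python model of the idea, not the linked article's code; the names `wide_rows`, `generator` and `unpivot` are made up for illustration.

```python
from itertools import product

# One wide row: (label, val1, val2, val3). Values are placeholders.
wide_rows = [("label", "val1", "val2", "val3")]

# Row generator, standing in for:
#   SELECT LEVEL FROM DUAL CONNECT BY LEVEL <= 3
generator = [1, 2, 3]


def unpivot(rows, gen):
    """Cross join every wide row with the generator, then keep the n-th value."""
    result = []
    for row, n in product(rows, gen):
        label, values = row[0], row[1:]
        # Equivalent of: CASE n WHEN 1 THEN val1 WHEN 2 THEN val2 ... END
        result.append((label, n, values[n - 1]))
    return result


long_rows = unpivot(wide_rows, generator)
for r in long_rows:
    print(r)
```

The source table is read once and the generator only contributes row numbers, which is why this pattern is attractive when the source is expensive to produce.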

I would use connect by for any but the most trivial number of rows. Not having explain plan is a pain though ... you're really having your hands tied there. I'd be really keen on knowing what the optimiser's estimate of cardinality is.

Related

Snowflake SQL - OBJECT_CONSTRUCT from COUNT and GROUP BY

I'm trying to summarize data in a table:
counting total rows
counting values on specific fields
getting the distinct values on specific fields
and, more importantly, I'm struggling with:
getting the count for each field nested in an object
given this data:
COL1  COL2
A     0
null  1
B     null
B     null
the expected result from this query would be:
with dummy as (
select 'A' as col1, 0 as col2
union all
select null, 1
union all
select 'B', null
union all
select 'B', null
)
select
count(1) as total
,count(col1) as col1
,array_agg(distinct col1) as dist_col1
--,object_construct(???) as col1_object_count
,count(col2) as col2
,array_agg(distinct col2) as dist_col2
--,object_construct(???) as col2_object_count
from
dummy
TOTAL  COL1  DIST_COL1   COL1_OBJECT_COUNT          COL2  DIST_COL2  COL2_OBJECT_COUNT
4      3     ["A", "B"]  {"A": 1, "B": 2, null: 1}  2     [0, 1]     {0: 1, 1: 1, null: 2}
I've tried several functions inside OBJECT_CONSTRUCT mixed with ARRAY_AGG, but all failed.
OBJECT_CONSTRUCT can work with several columns, but only when given all of them (*); if you put a SELECT statement inside it, it fails.
Another issue is that the object and array functions in Snowflake do not readily accept analytical functions.

You could use Snowflake Scripting or Snowpark for this but here's a solution that is somewhat flexible so you can apply it to different tables and column sets.
Create test table/view:
Create or Replace View dummy as (
select 'A' as col1, 0 as col2
union all
select null, 1
union all
select 'B', null
union all
select 'B', null
);
Set session variables for table and colnames.
set tbname = 'DUMMY';
set colnames = '["COL1", "COL2"]';
Create view that generates the required table_column_summary data:
Create or replace View table_column_summary as
with
-- Create table of required column names
cn as (
select VALUE::VARCHAR CNAME
from table(flatten(input => parse_json($colnames)))
)
-- Convert rows into objects
,ro as (
select
object_construct_keep_null(*) row_object
-- using identifier on session variable to dynamically supply table/view name
from identifier($tbname) )
-- Flatten row objects into key/values
,rof as (
select
key col_name,
ifnull(value,'null')::VARCHAR col_value
from ro, lateral flatten(input => row_object), cn
-- You will only need this filter if you need a subset
-- of columns from the source table/query summarised
where col_name = cn.cname)
-- Get the column value distinct value counts
,cdv as (
select col_name,
col_value,
sum(1) col_value_count
from rof
group by 1,2
)
-- and derive required column level stats and combine with cdv
,cv as (
select
(select count(1) from identifier($tbname)) total,
col_name,
object_construct('COL_COUNT', count(col_value) ,
'COL_DIST', array_agg(distinct col_value),
'COL_OBJECT_COUNT', object_agg(col_value,col_value_count)) col_values
from cdv
group by 1,2)
-- Return result
Select * from cv;
Use this final query if you want a solution that works flexibly with any table/columns provided as input...
Select total, object_agg(col_name, col_values) col_values_obj
From table_column_summary
Group by 1;
Or use this final query if you want the fixed columns output as described in your question...
Select total,
COL1[0]:COL_COUNT COL1,
COL1[0]:COL_DIST DIST_COL1,
COL1[0]:COL_OBJECT_COUNT COL1_OBJECT_COUNT,
COL2[0]:COL_COUNT COL2,
COL2[0]:COL_DIST DIST_COL2,
COL2[0]:COL_OBJECT_COUNT COL2_OBJECT_COUNT
from table_column_summary
PIVOT ( ARRAY_AGG ( col_values )
FOR col_name IN ( 'COL1', 'COL2' ) ) as pt (total, col1, col2);
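To check what any of these queries should return, it can help to compute the expected summary by hand. The following is a rough Python model of the target result for the four sample rows, not a translation of the Snowflake views above; `summarize` is a hypothetical helper, `None` stands in for SQL NULL, and the string "null" is used as the object key for null values.

```python
from collections import Counter

# Same four rows as the dummy view; None stands in for SQL NULL.
rows = [("A", 0), (None, 1), ("B", None), ("B", None)]
columns = ["COL1", "COL2"]


def summarize(rows, columns):
    """Per column: non-null count, distinct non-null values, and value counts."""
    summary = {"TOTAL": len(rows)}
    for i, name in enumerate(columns):
        values = [row[i] for row in rows]
        non_null = [v for v in values if v is not None]
        summary[name] = {
            "COL_COUNT": len(non_null),
            "COL_DIST": sorted(set(non_null)),
            # Keys are stringified so that NULL can be counted under "null".
            "COL_OBJECT_COUNT": dict(
                Counter("null" if v is None else str(v) for v in values)
            ),
        }
    return summary


result = summarize(rows, columns)
print(result)
```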

How to find each case of matching pattern within a string and return as rows

I'm trying to identify reference numbers contained in strings in a column. The table looks something like this:
col1 col2
1 fgREF1234fhjdREF1235hgkjREF1236
2 hREF1237hjdfREF1238djhfhs
Need to write an SQL query that identifies the 'REF' followed by the 4 digits and returns each in its own row.
The output should look like this:
col1 ref
1 REF1234
1 REF1235
1 REF1236
2 REF1237
2 REF1238
I have tried:
select
case when substr(substr(col2, instr(col2, 'REF'), 7), 1, 1) like 'R'
then substr(col2, instr(col2, 'R'), 7) else null end ref
from table
...but this will only identify the first match in the string.
I am using Oracle SQL but ideally the solution would be able to be converted to other SQL variants.
Any help would be much appreciated!

You can use regexp_substr, with the number of generated rows limited by connect by level <= regexp_count(col2,'REF') (the number of times the pattern REF appears in col2):
with t(col1,col2) as
(
select 1,'fgREF1234fhjdREF1235hgkjREF1236' from dual union all
select 2,'hREF1237hjdfREF1238djhfhs' from dual
)
select col1,
regexp_substr(col2,'REF[0-9]+',1,level) as ref
from t
connect by level <= regexp_count(col2,'REF')
and prior col1 = col1
and prior sys_guid() is not null;
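Since the question asks for something portable to other environments, here is the same extraction sketched in Python for comparison. It is only an illustration: `extract_refs` is a made-up name, and the pattern is tightened to exactly four digits as the question describes, whereas the query above uses `[0-9]+`.

```python
import re

# Sample data from the question: (col1, col2).
rows = [
    (1, "fgREF1234fhjdREF1235hgkjREF1236"),
    (2, "hREF1237hjdfREF1238djhfhs"),
]

# 'REF' followed by exactly four digits (assumption based on the question text).
pattern = re.compile(r"REF[0-9]{4}")


def extract_refs(rows):
    """Return one (col1, ref) pair per match, in order of appearance."""
    return [(col1, ref) for col1, col2 in rows for ref in pattern.findall(col2)]


matches = extract_refs(rows)
for col1, ref in matches:
    print(col1, ref)
```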

You can use the code below to get the desired result:
select x.col1, explode(x.ref) as ref from (
select col1,split(trim(regexp_replace(col2,'[^REF0-9]',' ')),' ') as ref
from inp
) x;

Find min max over all columns without listing down each column name in SQL

I have a SQL table (actually a BigQuery table) that has a huge number of columns (over a thousand). I want to quickly find the min and max value of each column. Is there a way to do that?
It is impossible for me to list all the columns. Looking for ways to do something like
SELECT MAX(*) FROM mytable;
and then running
SELECT MIN(*) FROM mytable;
I have been unable to Google a way of doing that. Not sure that's even possible.
For example, if my table has the following schema:
col1 col2 col3 .... col1000
the (say, max) query should return
Row col1 col2 col3 ... col1000
1 3 18 0.6 ... 45
and the min query should return (say)
Row col1 col2 col3 ... col1000
1 -5 4 0.1 ... -5
The numbers are just for illustration. The column names could be different strings and not easily scriptable.

See below example for BigQuery Standard SQL - it works for any number of columns and does not require explicit calling/use of columns names
#standardSQL
WITH `project.dataset.mytable` AS (
SELECT 1 AS col1, 2 AS col2, 3 AS col3, 4 AS col4 UNION ALL
SELECT 7,6,5,4 UNION ALL
SELECT -1, 11, 5, 8
)
SELECT
MIN(CAST(value AS INT64)) AS min_value,
MAX(CAST(value AS INT64)) AS max_value
FROM `project.dataset.mytable` t,
UNNEST(REGEXP_EXTRACT_ALL(TO_JSON_STRING(t), r'":(.*?)(?:,"|})')) value
with result
Row min_value max_value
1 -1 11
Note: if your columns are of STRING data type - you should remove CAST ... AS INT64
Or if they are of FLOAT64 - replace INT64 with FLOAT64 in the CAST function
Update
Below is an option to get the MIN/MAX for each column and present the result as a list of the respective values in the order of the columns
#standardSQL
WITH `project.dataset.mytable` AS (
SELECT 1 AS col1, 2 AS col2, 3 AS col3, 14 AS col4 UNION ALL
SELECT 7,6,5,4 UNION ALL
SELECT -1, 11, 5, 8
), temp AS (
SELECT pos, MIN(CAST(value AS INT64)) min_value, MAX(CAST(value AS INT64)) max_value
FROM `project.dataset.mytable` t,
UNNEST(REGEXP_EXTRACT_ALL(TO_JSON_STRING(t), r'":(.*?)(?:,"|})')) value WITH OFFSET pos
GROUP BY pos
)
SELECT 'min_values' stats, TO_JSON_STRING(ARRAY_AGG(min_value ORDER BY pos)) vals FROM temp UNION ALL
SELECT 'max_values', TO_JSON_STRING(ARRAY_AGG(max_value ORDER BY pos)) FROM temp
with result as
Row stats vals
1 min_values [-1,2,3,4]
2 max_values [7,11,5,14]
Hope this is something you can still apply to whatever your final goal is.
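The trick in both queries is to serialise each row to JSON, pull the values out with a regular expression, and then aggregate by position. A small Python model of that idea follows, using the sample rows from the update; it assumes integer values that contain no commas or quotes, and `row_values` is a hypothetical helper name.

```python
import json
import re

# Sample rows from the update above.
rows = [
    {"col1": 1, "col2": 2, "col3": 3, "col4": 14},
    {"col1": 7, "col2": 6, "col3": 5, "col4": 4},
    {"col1": -1, "col2": 11, "col3": 5, "col4": 8},
]

# Same pattern as in the queries: capture whatever follows '":' up to ',"' or '}'.
VALUE_PATTERN = re.compile(r'":(.*?)(?:,"|})')


def row_values(row):
    """Serialise the row compactly (like TO_JSON_STRING) and extract its values."""
    text = json.dumps(row, separators=(",", ":"))
    return [int(v) for v in VALUE_PATTERN.findall(text)]


per_row = [row_values(r) for r in rows]
by_position = list(zip(*per_row))  # regroup values by column position
min_values = [min(col) for col in by_position]
max_values = [max(col) for col in by_position]
print(min_values, max_values)
```

Because the values are matched by position rather than by name, this only works when every row serialises its columns in the same order, which holds for rows of a single table.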

Oracle SQL Min in Select Clause

Can someone please help me write a SQL query that applies the Oracle MIN function under the following conditions?
For example, for column values
0,0,0,0 then output should be 0
0,null,0,null then output should be 0
0,2,4,5,6 then output should be 2 (Note that we are excluding Zero here)
0,2,null,4,5 then output should be 2 (same here we are excluding zero)
null,null,null, null then output should be null.
I have already written a query that satisfies all the above cases, but it fails for the last case, when all the column values are null: instead of returning null, it returns 0. Can someone modify the query below to handle the last case as well?
select NVL(MIN(NULLIF(columnname,0)),0) from tablename;
Please also keep in mind that the query should be runnable in oracle as well as hsqldb as we are using hsql db for running junits.

If all 4 cases satisfied by your query then just a case will solve your problem.
SELECT CASE WHEN MIN(COLUMNNAME) IS NULL THEN NULL ELSE NVL(MIN(NULLIF(COLUMNNAME,0)),0) END FROM TABLENAME;
Note:- assuming all the cases satisfied by your query except 5th.

I will show below an input table with two columns, ID and VAL, to illustrate the various possibilities. You want a single result per ID (or even for the entire table), so this must be a job for GROUP BY and some aggregate function. You want to distinguish between three types of values: Greater than zero, zero, and null (in this order); you want to pick the "highest priority group" that exists for each ID (in this order of priority), and for that priority group only, you want to pick the min value. This is exactly what the aggregate FIRST/LAST function does. To order by the three "classes" of values, we use a CASE expression in the ORDER BY clause of the aggregate LAST function.
The WITH clause below is not part of the solution - I only include it to create test data (in your real life situation, use your actual table and column names and remove the entire WITH clause).
with
inputs ( id, val ) as (
select 1, 0 from dual union all
select 1, 0 from dual union all
select 1, 0 from dual union all
select 2, 0 from dual union all
select 2, null from dual union all
select 2, 0 from dual union all
select 3, 0 from dual union all
select 3, 2 from dual union all
select 3, 5 from dual union all
select 4, 0 from dual union all
select 4, 3 from dual union all
select 4, null from dual union all
select 5, null from dual union all
select 5, null from dual
)
select id,
min(val) keep (dense_rank last order by case when val > 0 then 2
when val = 0 then 1
else 0
end
) as min_val
from inputs
group by id
order by id
;
ID MIN_VAL
---------- ----------
1 0
2 0
3 2
4 3
5
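For unit tests (for example against HSQLDB, where KEEP (DENSE_RANK ...) is not available as far as I know), it may help to have the rule written down independently of any SQL dialect. Below is a minimal Python sketch of the priority logic, assuming the values are never negative; `min_excluding_zero` is a made-up name and `None` stands in for NULL.

```python
def min_excluding_zero(values):
    """Smallest positive value if there is one, else 0 if any zero exists, else None."""
    positives = [v for v in values if v is not None and v > 0]
    if positives:
        return min(positives)      # highest-priority group: values > 0
    if any(v == 0 for v in values):
        return 0                   # next group: zeros (None == 0 is False)
    return None                    # only NULLs, or no rows at all


# The five cases from the question.
print(min_excluding_zero([0, 0, 0, 0]))
print(min_excluding_zero([0, None, 0, None]))
print(min_excluding_zero([0, 2, 4, 5, 6]))
print(min_excluding_zero([0, 2, None, 4, 5]))
print(min_excluding_zero([None, None, None, None]))
```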

Oracle SQL -- select from two columns and combine into one

I have this table:
Vals
Val1 Val2 Score
A B 1
C 2
D 3
I would like the output to be a single column that is the "superset" of the Val1 and Val2 columns. It should also keep the "score" value associated with each value.
The output should be:
Val Score
A 1
B 1
C 2
D 3
Selecting from this table twice and then unioning is absolutely not a possibility because producing it is very expensive. In addition I cannot use a with clause because this query uses one in a sub-query and for some reason Oracle doesn't support two with clauses.
I don't really care about how repeat values are dealt with, whatever is easiest/fastest.
How can I generate my appropriate output?

Here is a solution without using UNPIVOT.
with columns as (
select level as colNum from dual connect by level <= 2
),
results as (
select case colNum
when 1 then Val1
when 2 then Val2
end Val,
score
from vals,
columns
)
select * from results where val is not null
Here is essentially the same query without the WITH clause:
select case colNum
when 1 then Val1
when 2 then Val2
end Val,
score
from vals,
(select level as colNum from dual connect by level <= 2) columns
where case colNum
when 1 then Val1
when 2 then Val2
end is not null
Or a bit more concisely
select *
from ( select case colNum
when 1 then Val1
when 2 then Val2
end Val,
score
from vals,
(select level as colNum from dual connect by level <= 2) columns
) results
where val is not null
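All three variants scan vals once, pair each row with the column numbers 1 and 2, and then drop the pairs whose selected value is null. A tiny Python model of that filter step, assuming C and D sit in Val1 with an empty Val2 (the layout in the question leaves this open):

```python
# (Val1, Val2, Score); None stands in for NULL. The placement of C and D is an assumption.
vals = [("A", "B", 1), ("C", None, 2), ("D", None, 3)]

# Stand-in for: select level as colNum from dual connect by level <= 2
col_nums = [1, 2]

# Cross join, pick Val1 or Val2 by colNum, and keep only non-null picks.
results = [
    (row[n - 1], row[2])
    for row in vals
    for n in col_nums
    if row[n - 1] is not None
]
print(results)
```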

Try this; it looks like you want to convert column values into rows:
select val1, score from vals where val1 is not null
union
select val2,score from vals where val2 is not null

If you're on Oracle 11, unPivot will help:
SELECT *
FROM vals
UNPIVOT ( val FOR origin IN (val1, val2) )
you can choose any names instead of 'val' and 'origin'.
See Oracle article on pivot / unPivot.
