Find min max over all columns without listing down each column name in SQL - sql

I have a SQL table (actually a BigQuery table) that has a huge number of columns (over a thousand). I want to quickly find the min and max value of each column. Is there a way to do that?
It is impossible for me to list all the columns. Looking for ways to do something like
SELECT MAX(*) FROM mytable;
and then running
SELECT MIN(*) FROM mytable;
I have been unable to Google a way of doing that. Not sure that's even possible.
For example, if my table has the following schema:
col1 col2 col3 .... col1000
the (say, max) query should return
Row col1 col2 col3 ... col1000
1 3 18 0.6 ... 45
and the min query should return (say)
Row col1 col2 col3 ... col1000
1 -5 4 0.1 ... -5
The numbers are just for illustration. The column names could be different strings and not easily scriptable.

The example below is for BigQuery Standard SQL. It works for any number of columns and does not require listing the column names explicitly.
#standardSQL
WITH `project.dataset.mytable` AS (
SELECT 1 AS col1, 2 AS col2, 3 AS col3, 4 AS col4 UNION ALL
SELECT 7,6,5,4 UNION ALL
SELECT -1, 11, 5, 8
)
SELECT
MIN(CAST(value AS INT64)) AS min_value,
MAX(CAST(value AS INT64)) AS max_value
FROM `project.dataset.mytable` t,
UNNEST(REGEXP_EXTRACT_ALL(TO_JSON_STRING(t), r'":(.*?)(?:,"|})')) value
with result
Row min_value max_value
1 -1 11
Note: if your columns are of STRING data type - you should remove CAST ... AS INT64
Or if they are of FLOAT64 - replace INT64 with FLOAT64 in the CAST function
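To see what the regular expression in the query is doing, here is a minimal Python sketch of the same idea. It assumes each row is serialized to compact JSON (no spaces), as TO_JSON_STRING does; the names rows, VALUE_PATTERN and extract_values are made up for this illustration and are not part of BigQuery.

```python
import json
import re

# Sample rows mirroring the WITH clause in the query above.
rows = [
    {"col1": 1, "col2": 2, "col3": 3, "col4": 4},
    {"col1": 7, "col2": 6, "col3": 5, "col4": 4},
    {"col1": -1, "col2": 11, "col3": 5, "col4": 8},
]

# Same pattern as in the query: capture whatever sits between '":'
# and the next ',"' or '}'.
VALUE_PATTERN = re.compile(r'":(.*?)(?:,"|})')


def extract_values(row):
    """Serialize one row to compact JSON and pull out the raw values as strings."""
    compact = json.dumps(row, separators=(",", ":"))
    return VALUE_PATTERN.findall(compact)


# Equivalent of MIN(CAST(value AS INT64)) / MAX(CAST(value AS INT64)) over all cells.
all_values = [int(v) for row in rows for v in extract_values(row)]
min_value, max_value = min(all_values), max(all_values)
print(min_value, max_value)
```

Because the pattern works on the serialized text, string values come back with their JSON quotes still attached, and a string containing ," or } would be cut short.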
Update
Below is an option to get the MIN/MAX for each column and present the result as a list of the respective values in the order of the columns
#standardSQL
WITH `project.dataset.mytable` AS (
SELECT 1 AS col1, 2 AS col2, 3 AS col3, 14 AS col4 UNION ALL
SELECT 7,6,5,4 UNION ALL
SELECT -1, 11, 5, 8
), temp AS (
SELECT pos, MIN(CAST(value AS INT64)) min_value, MAX(CAST(value AS INT64)) max_value
FROM `project.dataset.mytable` t,
UNNEST(REGEXP_EXTRACT_ALL(TO_JSON_STRING(t), r'":(.*?)(?:,"|})')) value WITH OFFSET pos
GROUP BY pos
)
SELECT 'min_values' stats, TO_JSON_STRING(ARRAY_AGG(min_value ORDER BY pos)) vals FROM temp UNION ALL
SELECT 'max_values', TO_JSON_STRING(ARRAY_AGG(max_value ORDER BY pos)) FROM temp
with result as
Row stats vals
1 min_values [-1,2,3,4]
2 max_values [7,11,5,14]
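The per-column variant relies on WITH OFFSET giving each extracted value its position in the row, then grouping by that position. A rough Python equivalent, under the same compact-JSON assumption (by_pos and the other names are illustrative only):

```python
import json
import re
from collections import defaultdict

# Sample rows mirroring the WITH clause in the query above.
rows = [
    {"col1": 1, "col2": 2, "col3": 3, "col4": 14},
    {"col1": 7, "col2": 6, "col3": 5, "col4": 4},
    {"col1": -1, "col2": 11, "col3": 5, "col4": 8},
]

VALUE_PATTERN = re.compile(r'":(.*?)(?:,"|})')

# Collect values by their position in the row, which is the role of
# "WITH OFFSET pos ... GROUP BY pos" in the query.
by_pos = defaultdict(list)
for row in rows:
    compact = json.dumps(row, separators=(",", ":"))
    for pos, raw in enumerate(VALUE_PATTERN.findall(compact)):
        by_pos[pos].append(int(raw))

# Equivalent of ARRAY_AGG(... ORDER BY pos).
min_values = [min(by_pos[pos]) for pos in sorted(by_pos)]
max_values = [max(by_pos[pos]) for pos in sorted(by_pos)]
print(min_values)
print(max_values)
```

The lists are ordered by column position, so the i-th entry belongs to the i-th column of the table.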
Hope this is something you can still apply to whatever your final goal is.

Related

Snowflake SQL - OBJECT_CONSTRUCT from COUNT and GROUP BY

I'm trying to summarize data in a table:
counting total rows
counting values on specific fields
getting the distinct values on specific fields
and, more importantly, I'm struggling with:
getting the count for each field nested in an object
given this data
COL1 COL2
A 0
null 1
B null
B null
the expected result from this query would be:
with dummy as (
select 'A' as col1, 0 as col2
union all
select null, 1
union all
select 'B', null
union all
select 'B', null
)
select
count(1) as total
,count(col1) as col1
,array_agg(distinct col1) as dist_col1
--,object_construct(???) as col1_object_count
,count(col2) as col2
,array_agg(distinct col2) as dist_col2
--,object_construct(???) as col2_object_count
from
dummy
TOTAL | COL1 | DIST_COL1 | COL1_OBJECT_COUNT | COL2 | DIST_COL2 | COL2_OBJECT_COUNT
4 | 3 | ["A", "B"] | {"A": 1, "B": 2, null: 1} | 2 | [0, 1] | {0: 1, 1: 1, null: 2}
I've tried several functions inside OBJECT_CONSTRUCT mixed with ARRAY_AGG, but all failed
OBJECT_CONSTRUCT can work with several columns but only given all (*), if you try a select statement inside, it will fail
another issue is that analytical functions are not easily taken by the object or array functions in Snowflake.
You could use Snowflake Scripting or Snowpark for this but here's a solution that is somewhat flexible so you can apply it to different tables and column sets.
Create test table/view:
Create or Replace View dummy as (
select 'A' as col1, 0 as col2
union all
select null, 1
union all
select 'B', null
union all
select 'B', null
);
Set session variables for table and colnames.
set tbname = 'DUMMY';
set colnames = '["COL1", "COL2"]';
Create view that generates the required table_column_summary data:
Create or replace View table_column_summary as
with
-- Create table of required column names
cn as (
select VALUE::VARCHAR CNAME
from table(flatten(input => parse_json($colnames)))
)
-- Convert rows into objects
,ro as (
select
object_construct_keep_null(*) row_object
-- using identifier on session variable to dynamically supply table/view name
from identifier($tbname) )
-- Flatten row objects into key/values
,rof as (
select
key col_name,
ifnull(value,'null')::VARCHAR col_value
from ro, lateral flatten(input => row_object), cn
-- You will only need this filter if you need a subset
-- of columns from the source table/query summarised
where col_name = cn.cname)
-- Get the column value distinct value counts
,cdv as (
select col_name,
col_value,
sum(1) col_value_count
from rof
group by 1,2
)
-- and derive required column level stats and combine with cdv
,cv as (
select
(select count(1) from identifier($tbname)) total,
col_name,
object_construct('COL_COUNT', count(col_value) ,
'COL_DIST', array_agg(distinct col_value),
'COL_OBJECT_COUNT', object_agg(col_value,col_value_count)) col_values
from cdv
group by 1,2)
-- Return result
Select * from cv;
Use this final query if you want a solution that works flexibly with any table/columns provided as input...
Select total, object_agg(col_name, col_values) col_values_obj
From table_column_summary
Group by 1;
Or use this final query if you want the fixed columns output as described in your question...
Select total,
COL1[0]:COL_COUNT COL1,
COL1[0]:COL_DIST DIST_COL1,
COL1[0]:COL_OBJECT_COUNT COL1_OBJECT_COUNT,
COL2[0]:COL_COUNT COL2,
COL2[0]:COL_DIST DIST_COL2,
COL2[0]:COL_OBJECT_COUNT COL2_OBJECT_COUNT
from table_column_summary
PIVOT ( ARRAY_AGG ( col_values )
FOR col_name IN ( 'COL1', 'COL2' ) ) as pt (total, col1, col2);
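To make the target shape concrete, here is a rough in-memory Python sketch of the same per-column summary. It follows the expected output in the question (the count excludes nulls, the per-value counts include them under the key 'null') and is not a translation of the Snowflake view; summarize and the key names are only chosen to mirror the ones used above.

```python
from collections import Counter

# The four sample rows from the question.
rows = [
    {"COL1": "A", "COL2": 0},
    {"COL1": None, "COL2": 1},
    {"COL1": "B", "COL2": None},
    {"COL1": "B", "COL2": None},
]


def summarize(rows, colnames):
    """Return total row count plus count / distinct / value counts per column."""
    summary = {"TOTAL": len(rows)}
    for name in colnames:
        values = [row[name] for row in rows]
        non_null = [v for v in values if v is not None]
        summary[name] = {
            "COL_COUNT": len(non_null),           # like count(col)
            "COL_DIST": sorted(set(non_null)),    # like array_agg(distinct col)
            # NULL is counted under the string key 'null'; other values are
            # stringified, as object keys have to be strings.
            "COL_OBJECT_COUNT": dict(
                Counter("null" if v is None else str(v) for v in values)
            ),
        }
    return summary


result = summarize(rows, ["COL1", "COL2"])
print(result)
```

Passing a different list of column names plays the same role as changing the colnames session variable.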

How to unpivot a single row in Oracle 11?

I have a row of data and I want to turn this row into a column so I can use a cursor to run through the data one by one. I have tried to use
SELECT * FROM TABLE(PIVOT(TEMPROW))
but I get
'PIVOT' Invalid Identifier error.
I have also tried that same syntax but with
('select * from TEMPROW')
Everything I see using pivot is always using count or sum but I just want this one single row of all varchar2 to turn into a column.
My row would look something like this:
ABC | 123 | aaa | bbb | 111 | 222 |
And I need it to turn into this:
ABC
123
aaa
bbb
111
222
My code is similar to this:
BEGIN
OPEN C_1 FOR SELECT * FROM TABLE(PIVOT( 'SELECT * FROM TEMPROW'));
LOOP
FETCH C_1 INTO TEMPDATA;
EXIT WHEN C_2%NOTFOUND;
DBMS_OUTPUT.PUT_LINE(1);
END LOOP;
CLOSE C_1;
END;
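You have to unpivot to convert the whole row into a single column. The IN list takes the column names, not the values; col1 ... col6 below stand in for your actual column names, and all listed columns must have the same datatype:
select val from temprow
UNPIVOT
(val for col in (
col1, col2, col3, col4, col5, col6
))
or use union all (plain union would drop duplicate values), but for that you need to add the column names manually, like
Select * from ( Select col1 from temprow
union all
select col2 from temprow union all ...
Select coln from temprow)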
One option for unpivoting would be numbering columns by decode() and cross join with the query containing the column numbers :
select decode(myId, 1, col1,
2, col2,
3, col3,
4, col4,
5, col5,
6, col6 ) as result_col
from temprow
cross join (select level AS myId FROM dual CONNECT BY level <= 6 );
Or use a query with the unpivot keyword. Note that every column listed in the IN clause must have the same datatype as the result column (result_col here), which is why the numeric columns are converted with to_char first:
select result_col from
(
select col1, to_char(col2) as col2, col3, col4,
to_char(col5) as col5, to_char(col6) as col6
from temprow
)
unpivot (result_col for col in (col1,col2,col3,col4,col5,col6));
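The decode() plus cross join idea is not Oracle-specific. As a hedged illustration, the sketch below runs the same pattern on SQLite through Python's sqlite3 module, with CASE standing in for decode() and a recursive CTE standing in for CONNECT BY LEVEL. The table contents are the sample row from the question; storing every column as TEXT is an assumption made for simplicity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE temprow "
    "(col1 TEXT, col2 TEXT, col3 TEXT, col4 TEXT, col5 TEXT, col6 TEXT)"
)
conn.execute("INSERT INTO temprow VALUES ('ABC', '123', 'aaa', 'bbb', '111', '222')")

unpivot_sql = """
WITH RECURSIVE nums(my_id) AS (      -- generates 1..6, one number per column
    SELECT 1
    UNION ALL
    SELECT my_id + 1 FROM nums WHERE my_id < 6
)
SELECT CASE my_id                    -- pick the column matching the number
         WHEN 1 THEN col1
         WHEN 2 THEN col2
         WHEN 3 THEN col3
         WHEN 4 THEN col4
         WHEN 5 THEN col5
         WHEN 6 THEN col6
       END AS result_col
FROM temprow
CROSS JOIN nums
ORDER BY my_id
"""

result = [row[0] for row in conn.execute(unpivot_sql)]
print(result)
```

The source row is read once and repeated six times by the cross join, so each output row carries exactly one of the original columns.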

Unable to write exact sql query to get result set

I have below type of data set:
Base Col1 Col2 Col3
1000 0 10 1100
1100 0 10 1210
1210 0 10 1331
For deriving col3, I will use formula like
col3 = (base - col1) * (1 + col2 / 100)
Notice that the col3 value of each row becomes the base value of the next row, and that the Col2 value is the same for all records.
My problem is that the col1 values (col1 is part of the formula) will be updated at a later point in time, and when that happens I need to recalculate the col3 values using the formula above.
See the data set below for an example: once the col1 values have been updated, the col3 values need to be recalculated with the formula (Col3=(base-col1)*(1+col2/100)) like this:
Base Col1 Col2 Col3
1000 10 10 1089
1089 20 10 1175.9
1175.9 30 10 1293.4
For getting above data set, I have tried like below.
SELECT
col1, col2,
col3 - SUM(col1 * (Power((1 + COL2 / 100.00), RNO)))
OVER(ORDER BY RNO ROWS UNBOUNDED PRECEDING)
FROM
(SELECT
row_number() OVER(ORDER BY col1) rno,
*
FROM
#TABLE1) A
But I am not getting the correct results.
Please use below script to create table and for populating data.
CREATE TABLE #Table1
(
[col1] INT,
[col2] INT,
[col3] INT
);
INSERT INTO #Table1
([col1],
[col2],
[col3])
VALUES (10,10, 1100),
(20,10,1210),
(30,10,1331);
Note: in my example the base value always depends on the previous row's col3 value.
Please help me.
You should not store calculation results in your table. This is redundant and can lead to wrong data, as you have noticed. Your table also lacks an order. So first thing: give the records a timestamp or a number. Then remove Col3 and Base. (Well, you must have the initial base value of course, so either keep the base column and make all values null except for the first one, or store the value somewhere else, or use a fixed value in your query.)
Rno Col1 Col2
1 0 10
2 0 10
3 0 10
To get the results you need a recursive query. Below query considers RNOs as adjacent (with a non-adjacent number or dates, you'd have to use row_number to number your rows first). Here I just use 1000 as the base. If this is variable, store it somewhere and take it from there.
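with cte(rno, base, col1, col2, col3) as
(
-- anchor row: start from the fixed base of 1000
-- 100.0 avoids integer division when col2 is an integer column; the cast to float
-- keeps the anchor and recursive column types identical (SQL Server requires this)
select rno, cast(1000 as float) as base, col1, col2, (cast(1000 as float) - col1) * (1 + col2/100.0) as col3
from mytable
where rno = 1
union all
-- recursive step: the previous row's col3 becomes this row's base
select m.rno, cte.col3 as base, m.col1, m.col2, (cte.col3 - m.col1) * (1 + m.col2/100.0)
from mytable m
join cte on m.rno = cte.rno + 1
)
select * from cte
order by rno;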
You can create a view for this of course.
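As a quick way to check the numbers, here is a sketch that runs the same recursive query on SQLite via Python's sqlite3 module. It assumes the restructured table described above (rno, col1, col2) with the updated col1 values 10, 20, 30 from the question, and a fixed starting base of 1000. SQLite needs the RECURSIVE keyword and 100.0 to avoid integer division; other engines may differ in such details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (rno INTEGER, col1 INTEGER, col2 INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(1, 10, 10), (2, 20, 10), (3, 30, 10)],
)

recursive_sql = """
WITH RECURSIVE cte(rno, base, col1, col2, col3) AS (
    -- anchor row: fixed starting base
    SELECT rno, 1000.0, col1, col2, (1000.0 - col1) * (1 + col2 / 100.0)
    FROM mytable
    WHERE rno = 1
    UNION ALL
    -- recursive step: previous col3 becomes the next base
    SELECT m.rno, cte.col3, m.col1, m.col2,
           (cte.col3 - m.col1) * (1 + m.col2 / 100.0)
    FROM mytable m
    JOIN cte ON m.rno = cte.rno + 1
)
SELECT rno, base, col1, col2, col3 FROM cte ORDER BY rno
"""

rows = conn.execute(recursive_sql).fetchall()
for rno, base, col1, col2, col3 in rows:
    # Round for display only; the stored values are floating point.
    print(rno, round(base, 2), col1, col2, round(col3, 2))
```

Since nothing derived is stored, changing a col1 value and rerunning the query is all that is needed to get the updated chain.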
When col1 changes you need to update col3 of same row,
When col3 changes you need to update Base of next row,
When Base changes you need to update col3 of same row..
and so on..
At every update of Base, col1, or col3 run this loop:
declare @i int = 1
while @i<>0 begin
update t set Col3 = newCol3
from (
select top 1 base, col1, col2, col3, (base - col1) * (1 + col2 / 100.0) newCol3
from #t
where col3 <> (base - col1) * (1 + col2 / 100.0)
order by base
) t
update t set base = newbase
from (
select top 1 base, col1, col2, col3, newbase
from (
select base, col1, col2, col3, LAG(col3,1,null) over (order by base) newbase
from #t
) t
where base <> newbase
order by base
) t
if @@ROWCOUNT=0 set @i=0
end
output
base col1 col2 col3
1000 10 10 1089
1089 20 10 1175,9
1175,9 30 10 1260,49 -- I think you have an error in your example

SQL - Group by numbers according to their difference

I have a table and I want to group rows that have at most x difference at col2.
For example,
col1 col2
abg 3
abw 4
abc 5
abd 6
abe 20
abf 21
After query I want to get groups such that
group 1: abg 3
abw 4
abc 5
abd 6
group 2: abe 20
abf 21
In this example difference is 1.
How can I write such a query?
For Oracle (or anything that supports window functions) this will work:
select col1, col2, sum(group_gen) over (order by col2) as grp
from (
select col1, col2,
case when col2 - lag(col2) over (order by col2) > 1 then 1 else 0 end as group_gen
from some_table
)
Check it on SQLFiddle.
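The window-function query boils down to: sort by col2, flag every row whose distance to the previous row exceeds the allowed gap, and take a running sum of the flags. A small Python sketch of that logic (assign_gap_groups and max_gap are names invented for this example):

```python
def assign_gap_groups(rows, max_gap=1):
    """rows: iterable of (col1, col2). Returns (col1, col2, grp) tuples ordered by col2."""
    ordered = sorted(rows, key=lambda r: r[1])   # order by col2
    result = []
    grp = 0          # running sum of the "new group" flags
    prev = None      # plays the role of lag(col2)
    for col1, col2 in ordered:
        # The first row has no predecessor, so it never starts a new group.
        if prev is not None and col2 - prev > max_gap:
            grp += 1
        result.append((col1, col2, grp))
        prev = col2
    return result


data = [("abg", 3), ("abw", 4), ("abc", 5), ("abd", 6), ("abe", 20), ("abf", 21)]
grouped = assign_gap_groups(data)
for row in grouped:
    print(row)
```

As in the SQL version, group numbers start at 0 and the gap is compared between neighbouring rows, so a long chain of small steps stays in one group.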
This should get what you need, and changing the gap to 5, or any other number, is a single change at the @lastVal +1 (vs whatever other difference). The prequery "PreSorted" is required to make sure the data is processed sequentially so you don't get out-of-order entries.
As each current row is processed, its column 2 value is stored in @lastVal for comparison with the next row, but remains available as the column "Col2". There is no "group by" because you just want a column identifying which group each row belongs to, not an aggregation.
select
@grp := if( PreSorted.col2 > @lastVal +1, @grp +1, @grp ) as GapGroup,
PreSorted.col1,
@lastVal := PreSorted.col2 as Col2
from
( select
YT.col1,
YT.col2
from
YourTable YT
order by
YT.col2 ) PreSorted,
( select @grp := 1,
@lastVal := -1 ) sqlvars
Try this query; you can use 1 and 2 as input to get your groups:
var grp number(5)
exec :grp :=1
select * from YourTABLE
where (:grp = 1 and col2 < 20) or (:grp = 2 and col2 > 6);

Oracle SQL -- select from two columns and combine into one

I have this table:
Vals
Val1 Val2 Score
A B 1
C 2
D 3
I would like the output to be a single column that is the "superset" of the Val1 and Val2 columns. It should also keep the "score" value associated with each value.
The output should be:
Val Score
A 1
B 1
C 2
D 3
Selecting from this table twice and then unioning is absolutely not a possibility because producing it is very expensive. In addition I cannot use a with clause because this query uses one in a sub-query and for some reason Oracle doesn't support two with clauses.
I don't really care about how repeat values are dealt with, whatever is easiest/fastest.
How can I generate my appropriate output?
Here is a solution without using unpivot.
with columns as (
select level as colNum from dual connect by level <= 2
),
results as (
select case colNum
when 1 then Val1
when 2 then Val2
end Val,
score
from vals,
columns
)
select * from results where val is not null
Here is essentially the same query without the WITH clause:
select case colNum
when 1 then Val1
when 2 then Val2
end Val,
score
from vals,
(select level as colNum from dual connect by level <= 2) columns
where case colNum
when 1 then Val1
when 2 then Val2
end is not null
Or a bit more concisely
select *
from ( select case colNum
when 1 then Val1
when 2 then Val2
end Val,
score
from vals,
(select level as colNum from dual connect by level <= 2) columns
) results
where val is not null
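The point of joining against a two-row helper is that the expensive source is read only once and each row is emitted once per column. A minimal Python sketch of that logic, assuming the blank cells in the question's sample table are NULLs in Val2 (stack_columns is a made-up name):

```python
# Sample data from the question; None marks the assumed NULLs in Val2.
vals = [("A", "B", 1), ("C", None, 2), ("D", None, 3)]


def stack_columns(rows):
    """Turn (val1, val2, score) rows into (val, score) rows, skipping NULLs."""
    out = []
    for val1, val2, score in rows:       # single pass over the source rows
        for col_num in (1, 2):           # the two-row "columns" helper
            val = val1 if col_num == 1 else val2   # CASE colNum WHEN 1 ... WHEN 2 ...
            if val is not None:          # WHERE val IS NOT NULL
                out.append((val, score))
    return out


stacked = stack_columns(vals)
print(stacked)
```

Duplicates are kept as they come, which matches the question's statement that repeated values do not matter.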
try this, looks like you want to convert column values into rows
select val1, score from vals where val1 is not null
union
select val2,score from vals where val2 is not null
If you're on Oracle 11, unPivot will help:
SELECT *
FROM vals
UNPIVOT ( val FOR origin IN (val1, val2) )
you can choose any names instead of 'val' and 'origin'.
See Oracle article on pivot / unPivot.