I have a table and I want to group rows whose col2 values differ by at most x.
For example,
col1 col2
abg 3
abw 4
abc 5
abd 6
abe 20
abf 21
After the query I want to get groups such that
group 1: abg 3
         abw 4
         abc 5
         abd 6
group 2: abe 20
         abf 21
In this example the difference is 1.
How can I write such a query?
For Oracle (or anything that supports window functions) this will work:
select col1, col2, sum(group_gen) over (order by col2) as grp
from (
  select col1, col2,
         case when col2 - lag(col2) over (order by col2) > 1 then 1 else 0 end as group_gen
  from some_table
)
Check it on SQLFiddle.
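If the allowed difference is some value x other than 1, only the comparison changes. A sketch of the same query with x = 5:
select col1, col2, sum(group_gen) over (order by col2) as grp
from (
  select col1, col2,
         -- start a new group when the gap to the previous value exceeds 5
         case when col2 - lag(col2) over (order by col2) > 5 then 1 else 0 end as group_gen
  from some_table
)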
This should get what you need; changing the gap to 5, or any other number, is a single change at the @lastVal + 1 comparison. The subquery "PreSorted" is required to make sure the data is processed sequentially, so you don't get out-of-order entries.
As each row is processed, its col2 value is stored in @lastVal for comparison against the next row, but it remains available as the column "Col2". There is no GROUP BY, since you just want a column identifying which group each row belongs to, not any aggregation.
select
  @grp := if(PreSorted.col2 > @lastVal + 1, @grp + 1, @grp) as GapGroup,
  PreSorted.col1,
  @lastVal := PreSorted.col2 as Col2
from
  ( select YT.col1, YT.col2
    from YourTable YT
    order by YT.col2 ) PreSorted,
  ( select @grp := 1,
    @lastVal := -1 ) sqlvars
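Run against the sample data above, this should yield two distinct groups. Note that because @lastVal starts at -1, the first row already triggers the increment, so numbering starts at 2:
GapGroup col1 Col2
2        abg  3
2        abw  4
2        abc  5
2        abd  6
3        abe  20
3        abf  21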
Try this query; you can use 1 and 2 as input and get your groups:
var grp number(5)
exec :grp :=1
select * from YourTABLE
where (:grp = 1 and col2 < 20) or (:grp = 2 and col2 > 6);
Suppose I have 10 columns in my table and I want to update each column, but one at a time for each row, across up to 10 rows.
If the table is like
1,2,3
4,5,6
7,8,9
I want to update it like
x,2,3
4,y,6
7,8,z
The columns can be of any count, so I need a dynamic approach. Also, I sometimes need to exclude some columns.
I tried to see whether I can update a row based on a row id, but no such option as a row id is available. I don't want to change the table design to include a counter column.
You can use a window function to assign a row id and update based on that:
with cte as (
  select *, row_number() over (order by id) rn
  from tablename
)
update t
set col1 = case when rn = 1 then <updatevalue> else col1 end
  , col2 = case when rn = 2 then <updatevalue> else col2 end
  , col3 = case when rn = 3 then <updatevalue> else col3 end
  , ...
from tablename t
join cte on cte.id = t.id
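To make that concrete, here is a minimal runnable sketch against the 3x3 example from the question, assuming SQL Server, an identity id column for ordering, and 'x'/'y'/'z' as the update values:
-- demo table matching the question's 3x3 example
CREATE TABLE #m (id INT IDENTITY(1,1), col1 VARCHAR(10), col2 VARCHAR(10), col3 VARCHAR(10));
INSERT INTO #m (col1, col2, col3) VALUES ('1','2','3'), ('4','5','6'), ('7','8','9');

WITH cte AS (
  SELECT *, ROW_NUMBER() OVER (ORDER BY id) AS rn FROM #m
)
UPDATE cte
SET col1 = CASE WHEN rn = 1 THEN 'x' ELSE col1 END,
    col2 = CASE WHEN rn = 2 THEN 'y' ELSE col2 END,
    col3 = CASE WHEN rn = 3 THEN 'z' ELSE col3 END;

SELECT col1, col2, col3 FROM #m ORDER BY id;
-- x, 2, 3
-- 4, y, 6
-- 7, 8, z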
The requirement "Columns can be of any count so need dynamic approach" looks like an attempt to implement a matrix as a table.
An alternative approach could be to use the ARRAY type and store the entire structure as a single "cell" in the table (Snowflake syntax below):
CREATE OR REPLACE TABLE t
AS
SELECT ARRAY_CONSTRUCT(ARRAY_CONSTRUCT(1,2,3),
ARRAY_CONSTRUCT(4,5,6),
ARRAY_CONSTRUCT(7,8,9)) c
UNION ALL
SELECT ARRAY_CONSTRUCT(ARRAY_CONSTRUCT(10,20,30),
ARRAY_CONSTRUCT(40,50,60),
ARRAY_CONSTRUCT(70,80,90)) c;
SELECT *
FROM t;
/*
C
[[1,2,3],[4,5,6],[7,8,9]]
[[10,20,30],[40,50,60],[70,80,90]]
*/
Accessing elements:
SELECT c[0][0], c[0][1], c[0][2],
c[1][0], c[1][1], c[1][2],
c[2][0], c[2][1], c[2][2]
FROM t;
/*
C[0][0] C[0][1] C[0][2] C[1][0] C[1][1] C[1][2] C[2][0] C[2][1] C[2][2]
1 2 3 4 5 6 7 8 9
10 20 30 40 50 60 70 80 90
*/
Update:
UPDATE t
SET c = ARRAY_CONSTRUCT(ARRAY_CONSTRUCT('x' , c[0][1], c[0][2])
,ARRAY_CONSTRUCT(c[1][0], 'y' ,c[1][2])
,ARRAY_CONSTRUCT(c[2][0], c[2][1] , 'z' )
);
SELECT * FROM t;
/*
C
[["x",2,3],[4,"y",6],[7,8,"z"]]
[["x",20,30],[40,"y",60],[70,80,"z"]]
*/
More robust transformations could be performed via user-defined functions.
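As a hedged sketch of that last point (hypothetical function name, untested): a Snowflake JavaScript UDF could rewrite the diagonal of each stored matrix in one call, here with a single value rather than distinct x/y/z:
CREATE OR REPLACE FUNCTION set_diagonal(m ARRAY, v VARIANT)
RETURNS ARRAY
LANGUAGE JAVASCRIPT
AS
$$
  // Snowflake exposes JavaScript UDF arguments in uppercase (M, V)
  for (var i = 0; i < M.length; i++) {
    if (i < M[i].length) {
      M[i][i] = V;   // overwrite the diagonal element of row i
    }
  }
  return M;
$$;

UPDATE t SET c = set_diagonal(c, 'x');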
I want to group identifiers that are related to each other across multiple columns and create/assign a unique group id.
Also, if we receive a new row, we should be able to assign the right id, respecting the group ids already assigned to the other rows.
For example:
Col1  Col2  Col3  Col4
AA    Null  33    12
BB    Null  45    12
AA    123   65    15
CC    123   NULL  42
DD    Null  10    42
EE    NULL  20    NULL
FF    145   33    NULL
GG    NULL  NULL  11
Desired result:
The group ID is 1 for rows 1 and 3 because col1 holds the same value (AA), and row 4 also gets ID 1 because in the second column the value for CC (123) matches the value for AA in row 3.
If there is any match between rows across the columns, we generate an id:
Col1  Col2  Col3  Col4  Group ID
AA    Null  33    12    1
BB    Null  45    12    1
AA    123   65    15    1
CC    123   NULL  42    1
DD    Null  10    42    1
EE    NULL  20    NULL  2
FF    145   33    NULL  1
GG    NULL  NULL  11    3
I've been doing some work on this and agree with Kashyap - I cannot find a way to do this in a single statement. You need either a recursive CTE or a loop. Synapse does not currently support recursive CTEs, which leaves using a loop to create the effect you want.
One concern came up while I was working on this: as you continue to add data, you'll have more and more overlaps and could eventually end up with just one group. That depends on your dataset - you might have something you can guarantee will have discrete divisions. The way the script below works, a new match will update group IDs even in existing data. You could modify it to set group IDs only for new rows, but then you could end up in a situation where one row matches multiple groups.
Certainly not the only option, but this is the script I pulled together. It depends on having a unique ID that remains the same in each iteration. Because the loop uses updates instead of inserts, prepping the data involves inserting it into your new table without the group, and you can create your ID at that time using auto-increment or otherwise. The script works best with an INT ID column, but should work with a GUID if necessary.
So the process is essentially this:
1. Do whatever initial prep you need to insert the data into the table and create an ID.
2. Join the table back onto itself, once for each column that could contain a match.
3. Update the group ID to be the minimum value across the IDs and current group IDs of that set of matches.
4. Check whether another round is needed. Because the minimum ID serves as the group number, each group will contain a row where ID = group ID.
CREATE TABLE #testtable
(
[id] INT NOT NULL,
[col1] INT NOT NULL,
[col2] INT NULL,
[col3] INT NULL,
[groupnumber] INT NULL
)
INSERT INTO #testTable (id, col1, col2, col3)
SELECT 1, 1, 5, 33 UNION ALL -- First
SELECT 2, 2, null, 45 UNION ALL -- Second
SELECT 3, 1, 123, 65 UNION ALL -- First
SELECT 4, 3, 123, null UNION ALL -- First
SELECT 5, 10, null, 10 UNION ALL -- Third
SELECT 6, 5, null, 45 UNION ALL -- Second
SELECT 7, 6, 145, 33 -- First
DECLARE @RemainingRows INT,
@LoopCounter INT, @MaxLoops INT -- To protect against an infinite loop
SET @RemainingRows = (SELECT COUNT([id]) FROM #testtable)
SET @LoopCounter = 0;
SET @MaxLoops = 10;
WHILE( @RemainingRows > 0
AND @LoopCounter < @MaxLoops )
BEGIN
WITH combineddata AS
(
SELECT
id,
col1,
col2,
col3,
groupnumber
FROM
#testtable
),
--Create a set a rows that contains all rows and all possible matches
matcheddata AS
(
SELECT
c1.id,
c1.col1 AS c1col1,
c1.col2 AS c1col2,
c1.col3 AS c1col3,
c1.groupnumber AS groupNumber1,
c2.id AS RowNum2,
c2.groupnumber AS groupNumber2,
c3.id AS RowNum3,
c3.groupnumber AS groupNumber3,
c4.id AS RowNum4,
c4.groupnumber AS groupNumber4
FROM
combineddata c1
LEFT JOIN
combineddata c2
ON c1.col1 = c2.col1
LEFT JOIN
combineddata c3
ON c1.col2 = c3.col2
LEFT JOIN
combineddata c4
ON c1.col3 = c4.col3
)
UPDATE #testtable
SET
groupnumber = NEW.groupnumber
FROM
(
SELECT
id,
c1col1,
c1col2,
c1col3,
MIN(groupnumber) AS GroupNumber
FROM
matcheddata CROSS apply (
SELECT
MIN(c) AS GroupNumber
FROM (VALUES
(id),
(RowNum2),
(RowNum3),
(RowNum4),
(groupNumber1),
(groupNumber2),
(groupNumber3),
(groupNumber4)
) AS v (C)
WHERE
c IS NOT NULL) g
GROUP BY
id,
c1col1,
c1col2,
c1col3
) NEW
INNER JOIN #testtable
ON NEW.id = #testtable.id
SET
@LoopCounter = @LoopCounter + 1
SET
@RemainingRows =
(
SELECT
COUNT(t1.id)
FROM
#testtable t1
LEFT JOIN
#testtable t2
ON t1.groupnumber = t2.[id]
WHERE
t2.id IS NULL
OR t2.id <> t2.groupnumber
)
PRINT 'Remaining Rows: ' + CAST(@RemainingRows AS VARCHAR)
PRINT 'Counter: ' + CAST(@LoopCounter AS VARCHAR);
END
SELECT * FROM #testtable
IF OBJECT_ID('tempdb..#testtable') IS NOT NULL
BEGIN
DROP TABLE #testtable
END
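If the loop behaves as intended on the seeded rows, it should converge after a couple of passes, with each group keyed by its minimum id:
id  groupnumber
1   1   -- First
2   2   -- Second
3   1   -- First
4   1   -- First
5   5   -- Third
6   2   -- Second
7   1   -- First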
I have a row of data and I want to turn this row into a column so I can use a cursor to run through the data one by one. I have tried to use
SELECT * FROM TABLE(PIVOT(TEMPROW))
but I get a 'PIVOT' Invalid Identifier error.
I have also tried the same syntax but with
('select * from TEMPROW')
Everything I see using pivot uses count or sum, but I just want this single row of VARCHAR2 values to turn into a column.
My row would look something like this:
ABC | 123 | aaa | bbb | 111 | 222 |
And I need it to turn into this:
ABC
123
aaa
bbb
111
222
My code is similar to this:
BEGIN
OPEN C_1 FOR SELECT * FROM TABLE(PIVOT( 'SELECT * FROM TEMPROW'));
LOOP
FETCH C_1 INTO TEMPDATA;
EXIT WHEN C_1%NOTFOUND;
DBMS_OUTPUT.PUT_LINE(1);
END LOOP;
CLOSE C_1;
END;
You have to unpivot to convert a whole row into a single column. Note that the IN list takes the column names, not the values - e.g. for columns col1 through col6 (cast any columns of differing datatypes to a common type first, as shown further below):
select * from TEMPROW
UNPIVOT
(val for col in (col1, col2, col3, col4, col5, col6))
or use union, but for that you need to add the column names manually, like:
select * from (
  select col1 from TEMPROW
  union all
  select col2 from TEMPROW
  union all ...
  select coln from TEMPROW
)
One option for unpivoting would be numbering the columns with decode() and cross joining with a query that generates the column numbers:
select decode(myId, 1, col1,
2, col2,
3, col3,
4, col4,
5, col5,
6, col6 ) as result_col
from temprow
cross join (select level AS myId FROM dual CONNECT BY level <= 6 );
or use a query with the unpivot keyword, keeping in mind that the common expression for the columns (namely col in this case) must have the same datatype as each corresponding expression:
select result_col from
(
select col1, to_char(col2) as col2, col3, col4,
to_char(col5) as col5, to_char(col6) as col6
from temprow
)
unpivot (result_col for col in (col1,col2,col3,col4,col5,col6));
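Putting the unpivot back into the question's cursor loop, a hedged PL/SQL sketch (assuming the six columns are named col1 through col6):
DECLARE
  CURSOR c_1 IS
    SELECT result_col
    FROM (
      SELECT col1, to_char(col2) AS col2, col3, col4,
             to_char(col5) AS col5, to_char(col6) AS col6
      FROM temprow
    )
    UNPIVOT (result_col FOR col IN (col1, col2, col3, col4, col5, col6));
  tempdata c_1%ROWTYPE;
BEGIN
  OPEN c_1;
  LOOP
    FETCH c_1 INTO tempdata;
    EXIT WHEN c_1%NOTFOUND;
    DBMS_OUTPUT.PUT_LINE(tempdata.result_col);  -- one value per line
  END LOOP;
  CLOSE c_1;
END;
/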
I have a SQL table (actually a BigQuery table) that has a huge number of columns (over a thousand). I want to quickly find the min and max value of each column. Is there a way to do that?
It is impossible for me to list all the columns. I'm looking for a way to do something like
SELECT MAX(*) FROM mytable;
and then running
SELECT MIN(*) FROM mytable;
I have been unable to Google a way of doing that. Not sure that's even possible.
For example, if my table has the following schema:
col1 col2 col3 .... col1000
the (say, max) query should return
Row col1 col2 col3 ... col1000
1 3 18 0.6 ... 45
and the min query should return (say)
Row col1 col2 col3 ... col1000
1 -5 4 0.1 ... -5
The numbers are just for illustration. The column names could be different strings and not easily scriptable.
See the example below for BigQuery Standard SQL - it works for any number of columns and does not require explicitly referencing column names
#standardSQL
WITH `project.dataset.mytable` AS (
SELECT 1 AS col1, 2 AS col2, 3 AS col3, 4 AS col4 UNION ALL
SELECT 7,6,5,4 UNION ALL
SELECT -1, 11, 5, 8
)
SELECT
MIN(CAST(value AS INT64)) AS min_value,
MAX(CAST(value AS INT64)) AS max_value
FROM `project.dataset.mytable` t,
UNNEST(REGEXP_EXTRACT_ALL(TO_JSON_STRING(t), r'":(.*?)(?:,"|})')) value
with result
Row min_value max_value
1 -1 11
Note: if your columns are of STRING data type, remove the CAST ... AS INT64.
If they are FLOAT64, replace INT64 with FLOAT64 in the CAST.
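For instance, the FLOAT64 variant described above would be:
#standardSQL
SELECT
  MIN(CAST(value AS FLOAT64)) AS min_value,
  MAX(CAST(value AS FLOAT64)) AS max_value
FROM `project.dataset.mytable` t,
UNNEST(REGEXP_EXTRACT_ALL(TO_JSON_STRING(t), r'":(.*?)(?:,"|})')) value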
Update
Below is an option to get the MIN/MAX for each column and present the result as a list of the respective values in the order of the columns
#standardSQL
WITH `project.dataset.mytable` AS (
SELECT 1 AS col1, 2 AS col2, 3 AS col3, 14 AS col4 UNION ALL
SELECT 7,6,5,4 UNION ALL
SELECT -1, 11, 5, 8
), temp AS (
SELECT pos, MIN(CAST(value AS INT64)) min_value, MAX(CAST(value AS INT64)) max_value
FROM `project.dataset.mytable` t,
UNNEST(REGEXP_EXTRACT_ALL(TO_JSON_STRING(t), r'":(.*?)(?:,"|})')) value WITH OFFSET pos
GROUP BY pos
)
SELECT 'min_values' stats, TO_JSON_STRING(ARRAY_AGG(min_value ORDER BY pos)) vals FROM temp UNION ALL
SELECT 'max_values', TO_JSON_STRING(ARRAY_AGG(max_value ORDER BY pos)) FROM temp
with result as
Row stats vals
1 min_values [-1,2,3,4]
2 max_values [7,11,5,14]
Hope this is something you can still apply to whatever your final goal is.
I have already gone through a number of questions and I couldn't find exactly what I am looking for.
Suppose I have a table as follows :
Col1 Col2 Col3
1,2,3 2,3,4,5,1 5,6
I need to get a result as follows using a select statement:
Col1 Col2 Col3
1,2,3 2,3,4,5,1 5,6
3 5 2
Note that the added row holds the count of comma-separated values in each column.
Finding the count for a single column is simple, but this seems difficult if not impossible.
Thanks in advance.
Per Count the number of elements in a comma separated string in Oracle, an easy way to do this is to count the number of commas and then add 1:
select
  col1,
  regexp_count(col1, ',') + 1 as col1count,
  col2,
  regexp_count(col2, ',') + 1 as col2count,
  col3,
  regexp_count(col3, ',') + 1 as col3count
from t
You just need the result unioned onto your original data. So, do that:
SQL> with the_data (col1, col2, col3) as (
2 select '1,2,3', '2,3,4,5,1', '5,6' from dual
3 )
4 select a.*
5 from the_data a
6 union all
7 select to_char(regexp_count(col1, ',') + 1)
8 , to_char(regexp_count(col2, ',') + 1)
9 , to_char(regexp_count(col3, ',') + 1)
10 from the_data;
COL1 COL2 COL
----- --------- ---
1,2,3 2,3,4,5,1 5,6
3 5 2
You need to convert the result to a character because you're unioning a character to a number, which Oracle will complain about.
It's worth noting that storing data in this manner violates the first normal form. This makes it far more difficult to manipulate and almost impossible to constrain to be correct. It's worth considering normalising your data model to make this, and other queries, simpler.
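For illustration, a normalised layout might look something like this (hypothetical table and column names); the count then becomes a plain aggregate:
CREATE TABLE t_normalised (
  row_id   NUMBER,        -- which original row the element came from
  col_name VARCHAR2(10),  -- 'COL1', 'COL2' or 'COL3'
  element  VARCHAR2(20)   -- one value per row instead of a comma-separated list
);

SELECT row_id, col_name, COUNT(*) AS element_count
FROM t_normalised
GROUP BY row_id, col_name;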
Finding the count for a single column is simple, but this seems difficult if not impossible.
So you don't want to look through each column manually? You want it done dynamically.
The design is actually flawed since it violates normalization. But if you are willing to stay with it, you could do it in PL/SQL using REGEXP_COUNT.
Something like,
SQL> CREATE TABLE t AS
2 SELECT '1,2,3' Col1,
3 '2,3,4,5,1' Col2,
4 '5,6' Col3
5 FROM dual;
Table created.
SQL>
SQL> DECLARE
2 cnt NUMBER;
3 BEGIN
4 FOR i IN
5 (SELECT column_name FROM user_tab_columns WHERE table_name='T'
6 )
7 LOOP
8 EXECUTE IMMEDIATE 'select regexp_count('||i.column_name||', '','') + 1 from t' INTO cnt;
9 dbms_output.put_line(i.column_name||' has cnt ='||cnt);
10 END LOOP;
11 END;
12 /
COL3 has cnt =2
COL2 has cnt =5
COL1 has cnt =3
PL/SQL procedure successfully completed.
SQL>
Probably, there will be an XML solution in SQL itself, without using PL/SQL.
In SQL -
SQL> WITH DATA AS
2 ( SELECT '1,2,3' Col1, '2,3,4,5,1' Col2, '5,6' Col3 FROM dual
3 )
4 SELECT regexp_count(col1, ',') + 1 cnt1,
5 regexp_count(col2, ',') + 1 cnt2,
6 regexp_count(col3, ',') + 1 cnt3
7 FROM DATA;
CNT1 CNT2 CNT3
---------- ---------- ----------
3 5 2
SQL>