Counting Combinations

Counting Combinations - sql

I have an Oracle table with multiple columns some populated with a variable, there a large number of possible variables the example below is not exhaustive.
ID Col1 Col2 Col3 Col4 Col5 Col6
-------------------------------------
1 X2 B2
2 C3 D1 R4
3 B2 X2
4 E4 T1 W2
5 X2 B2
6 R4 D1
7 D1 R4 C3
I need to identify the number of distinct combinations where row 1, row 3 and row 5 in the above example are considered the same combination and rows 2 and 7 are also considered the same. So the desired result would look like:
Col1 Col2 Col3 Col4 Col5 Col6 Count(*)
------------------------------------------------
B2 X2 3
C3 D1 R4 2
E4 T1 W2 1
D1 R4 1
But if I use this:
SELECT Col1, Col2, Col3, Col4, Col5, Col6, Count(*)
FROM MyTable
GROUP BY Col1, Col2, Col3, Col4, Col5, Col6
ORDER BY Count(*) DESC
Then row 3 in my data is considered unique. However, it has the same combination as rows 1 and row 5. Also row 2 and 7 are not considered the same and the result is:
Col1 Col2 Col3 Col4 Col5 Col6 Count(*)
------------------------------------------------
X2 B2 2
C3 D1 R4 1
B2 X2 1
E4 T1 W2 1
R4 D1 1
D1 R4 C3 1
It looks like I need to sort the col variables before comparing them. But is there an elegant solution to doing this for large record sets (3 million+ records) with up to 20 columns of data in Oracle?

There are two ways that come to my mind. First you can write a function accepting six or more strings and concatenating them in order. Then:
select colstring, count(*)
from
(
select id, concat_sorted(col1, col2, col3, col4, col5, col6) as colstring
from MyTable
)
group by colstring;
Another way would be to make each column a separate record and use listagg on them, provided you have Oracle 11g or higher available:
select colstring, count(*)
from
(
select id, listagg (colx, ',') within group (order by colx) as colstring
from
(
select id, col1 as colx from MyTable
union all
select id, col2 from MyTable
union all
select id, col3 from MyTable
union all
select id, col4 from MyTable
union all
select id, col5 from MyTable
union all
select id, col6 from MyTable
)
group by id
)
group by colstring

Try like this,
WITH t AS (
SELECT 1 ID, 'X2' col1, 'B2' col2, NULL col3, NULL col4, NULL col5, NULL col6 FROM dual
UNION
SELECT 2, 'C3', 'D1', 'R4', NULL, NULL, NULL FROM dual
UNION
SELECT 3, 'B2', 'X2', NULL, NULL, NULL, NULL FROM dual
UNION
SELECT 4, 'E4', 'T1', 'W2', NULL, NULL, NULL FROM dual
UNION
SELECT 5, 'X2', 'B2', NULL, NULL, NULL, NULL FROM dual
UNION
SELECT 6, 'R4', 'T1', NULL, NULL, NULL, NULL FROM dual
UNION
SELECT 7, 'D1', 'R4', 'C3', NULL, NULL, NULL FROM dual
)
SELECT col1, col2, col3, col4, col5, col6, tot_count
FROM (
SELECT col1, col2, col3, col4, col5, col6, cnt,
MAX(cnt) OVER (PARTITION BY val) AS tot_count,
row_number() OVER (PARTITION BY val ORDER BY cnt DESC) AS rn
FROM (
SELECT col1, col2, col3, col4, col5, col6, val, count(*) OVER (PARTITION BY val) cnt
FROM (
SELECT A.ID, col1, col2, col3, col4, col5, col6, val
FROM (SELECT ID, col1, col2, col3, col4, col5, col6
FROM t
) A,
(SELECT ID, listagg( val,',') WITHIN GROUP(ORDER BY val DESC) AS val
FROM (
SELECT ID, val
FROM t
unpivot ( val FOR origin IN (col1, col2, col3, col4, col5, col6))
)
GROUP BY ID
)b
WHERE A.ID = b.ID
)
ORDER BY val
)t1
)t2
WHERE tot_count = cnt
AND rn = 1
ORDER BY tot_count DESC;

Related

sort return data with SQL in SQLite order by row

I have a DB which has 8 columns, all are integers range 1~99:
Col1 Col2 Col3 Col4 Col5 Col6 Col7 Col8
1 13 24 18 35 7 50 88
13 4 33 90 78 42 26 57
22 18 30 3 57 90 71 8
...
When I perform "select Col1, Col2, Col3, Col5, Col6, Col7, Col8 from MyTable where Col4>10"
I would like the return data is sorted, e.g. the first row should return like this:
1,7,13,24,35,50,88
However, "order by" only work on "Column", is there anyway to preform this in SQL ? Or need a temp table/max() to perform this ? Thanks.
Regds
LAM Chi-fung

Your current design is not appropriate for this requirement.
Consider changing it to something like this:
CREATE TABLE tablename (
id INTEGER, -- corresponds to the rowid of your current table
col_id INTEGER NOT NULL, -- 1-8, corresponds to the number of each of the columns ColX
value INTEGER NOT NULL -- corresponds to the value of each of the columns ColX
);
You can populate it from your current table:
INSERT INTO tablename (id, col_id, value)
SELECT rowid, 1, Col1 FROM MyTable UNION ALL
SELECT rowid, 2, Col2 FROM MyTable UNION ALL
SELECT rowid, 3, Col3 FROM MyTable UNION ALL
SELECT rowid, 4, Col4 FROM MyTable UNION ALL
SELECT rowid, 5, Col5 FROM MyTable UNION ALL
SELECT rowid, 6, Col6 FROM MyTable UNION ALL
SELECT rowid, 7, Col7 FROM MyTable UNION ALL
SELECT rowid, 8, Col8 FROM MyTable
Now you can get the result that you want with GROUP_CONCAT() window function and aggregation:
SELECT result
FROM (
SELECT id, GROUP_CONCAT(value) OVER (PARTITION BY id ORDER BY value) result
FROM tablename
WHERE id IN (SELECT id FROM tablename WHERE col_id = 4 AND value > 10)
)
GROUP BY id
HAVING MAX(LENGTH(result))
See the demo.
Results:
result
1,7,13,18,24,35,50,88
4,13,26,33,42,57,78,90

Fix your data model! You should not be storing values in columns. You should be storing them in rows.
That is, SQL doesn't "sort columns". It deals with data in rows!
You can do what you want by unpivoting the data into rows, calculating the new order, and then reaggregating:
with t as (
select row_number() over () as id,
t.*
from mytable t
where col4 > 10
)
select max(case when seqnum = 1 then col end) as col1,
max(case when seqnum = 2 then col end) as col2,
max(case when seqnum = 3 then col end) as col3,
max(case when seqnum = 4 then col end) as col4,
max(case when seqnum = 5 then col end) as col5,
max(case when seqnum = 6 then col end) as col6,
max(case when seqnum = 7 then col end) as col7,
max(case when seqnum = 8 then col end) as col8
from (select row_number() over (partition by id order by col) as seqnum,
x.*
from (select id, col1 as col from t union all
select id, col2 as col from t union all
select id, col3 as col from t union all
select id, col4 as col from t union all
select id, col5 as col from t union all
select id, col6 as col from t union all
select id, col7 as col from t union all
select id, col8 as col from t
) x
) x
group by id;

Start calculation after the offset in LEAD

I've a weird scenario to work on. I have data in the way below.
Col1 col2 col3
a b 201921
a b 201923
a b 201924
a b 201925
a b 201927
Col1 and col2 etc are there for a partition and there are so many columns like those. I have a dynamic parameter which will feed the offset in LEAD function.
What LEAD doing is for every row, it will find the next value based on offset. But what I need is a little different. When the first row finds the offset, the next row should skip the offset number of rows.
For example, lead of 201921, 1 is 201923. So, the next calculation should happen from the row which has 201924. And then 201927 and further.
What I wrote currently is
Lead(col3,<dynamic param>,col3) over (partition by col1,col2 order by col3)
Is there such a thing to skip rows and continue from the next? I am a bit curious.
Expected output( for offset 1):
Col1 col2 col3 col4
a b 201921 201923
a b 201923 skip
a b 201924 201925
a b 201925 skip
a b 201927 201927
Expected output( for offset 2):
Col1 col2 col3 col4
a b 201921 201924
a b 201923 skip
a b 201924 skip
a b 201925 201925
a b 201927 201927

You can implement using a query determining COL4 using a CASE statement. This will be the base query. <dynamic param> is what will need to be replaced with your dymanic parameter.
SELECT col1,
col2,
col3,
CASE
WHEN ROW_NUMBER () OVER (PARTITION BY col1, col2 ORDER BY col3) + <dynamic param> >
COUNT (*) OVER (PARTITION BY col1, col2)
THEN
col3
WHEN MOD (ROW_NUMBER () OVER (PARTITION BY col1, col2 ORDER BY col3), <dynamic param> + 1) = 1
THEN
LEAD (col3, <dynamic param>) OVER (PARTITION BY col1, col2 ORDER BY col3)
END AS col4
FROM t;
Here are examples using the samples you provided
SQL> --offset of 1
SQL> WITH
2 t (col1, col2, col3)
3 AS
4 (SELECT 'a', 'b', 201921 FROM DUAL
5 UNION ALL
6 SELECT 'a', 'b', 201923 FROM DUAL
7 UNION ALL
8 SELECT 'a', 'b', 201924 FROM DUAL
9 UNION ALL
10 SELECT 'a', 'b', 201925 FROM DUAL
11 UNION ALL
12 SELECT 'a', 'b', 201927 FROM DUAL)
13 SELECT col1,
14 col2,
15 col3,
16 CASE
17 WHEN ROW_NUMBER () OVER (PARTITION BY col1, col2 ORDER BY col3) + 1 >
18 COUNT (*) OVER (PARTITION BY col1, col2)
19 THEN
20 col3
21 WHEN MOD (ROW_NUMBER () OVER (PARTITION BY col1, col2 ORDER BY col3), 1 + 1) = 1
22 THEN
23 LEAD (col3, 1) OVER (PARTITION BY col1, col2 ORDER BY col3)
24 END AS col4
25 FROM t;
COL1 COL2 COL3 COL4
_______ _______ _________ _________
a b 201921 201923
a b 201923
a b 201924 201925
a b 201925
a b 201927 201927
SQL> --offset of 2
SQL> WITH
2 t (col1, col2, col3)
3 AS
4 (SELECT 'a', 'b', 201921 FROM DUAL
5 UNION ALL
6 SELECT 'a', 'b', 201923 FROM DUAL
7 UNION ALL
8 SELECT 'a', 'b', 201924 FROM DUAL
9 UNION ALL
10 SELECT 'a', 'b', 201925 FROM DUAL
11 UNION ALL
12 SELECT 'a', 'b', 201927 FROM DUAL)
13 SELECT col1,
14 col2,
15 col3,
16 CASE
17 WHEN ROW_NUMBER () OVER (PARTITION BY col1, col2 ORDER BY col3) + 2 >
18 COUNT (*) OVER (PARTITION BY col1, col2)
19 THEN
20 col3
21 WHEN MOD (ROW_NUMBER () OVER (PARTITION BY col1, col2 ORDER BY col3), 2 + 1) = 1
22 THEN
23 LEAD (col3, 2) OVER (PARTITION BY col1, col2 ORDER BY col3)
24 END AS col4
25 FROM t;
COL1 COL2 COL3 COL4
_______ _______ _________ _________
a b 201921 201924
a b 201923
a b 201924
a b 201925 201925
a b 201927 201927

If I'm following your logic, you can use a case expression and analytic row_number() calculation to only populate col4 every offset row; something like (with n as the dynamic value):
select col1, col2, col3,
case when mod(row_number() over (partition by col1, col2 order by col3) + n, n + 1) = 0
then
lead(col3, n) over (partition by col1, col2 order by col3)
end as col4
from your_table
order by col1, col2, col3;
db<>fiddle
But that leaves the value in the last row for the partition null. Based on your second example you seem to actually want the last n rows to always have their own col3 value, which you could determine from lead() being null and then coalescing:
case when
mod(row_number() over (partition by col1, col2 order by col3) + n, n + 1) = 0
or lead(col3, n) over (partition by col1, col2 order by col3) is null
then
coalesce(lead(col3, n) over (partition by col1, col2 order by col3), col3)
end as col4
db<>fiddle
or using additional case branches:
case when
lead(col3, n) over (partition by col1, col2 order by col3) is null
then
col3
when
mod(row_number() over (partition by col1, col2 order by col3) + n, n + 1) = 0
then
lead(col3, n) over (partition by col1, col2 order by col3)
end as col4
db<>fiddle
If col3 can be nullable then you could always set the last n rows in the partition to their col3 answer instead of checking if the lead is null:
case when
row_number() over (partition by col1, col2 order by col3 desc) <= n
then
col3
when
mod(row_number() over (partition by col1, col2 order by col3) + n, n + 1) = 0
then
lead(col3, n) over (partition by col1, col2 order by col3)
end as col4
db<>fiddle

Listagg function with rep substr function

RAW DATA:
Col1 Col2 Col3 Col4
Ajay G1 B1 10.201.131.27
Ajay G1 B2 10.201.131.27
Ajay G1 B1 10.201.131.28
Ajay G1 B2 10.201.131.28
Ajay G1 B1 10.201.131.29
Ajay G1 B2 10.201.131.29
EXPECTED OUTPUT using Oracle 10g
Col1 Col2 Col3 Col4
Ajay G1 B1,B2 10.201.131.27,10.201.131.28, 10.201.131.29
Would be very glad if someone is able to help.
Query I used:
select * from (select
Col1,
Col2,
substr(regexp_replace(','||(LISTAGG(( CASE WHEN T1.COl3 IS NULL OR TRIM( T1.COl3 ) ='' THEN NULL ELSE T1.COl3 END), ',') WITHIN GROUP (ORDER BY T1.COl1 )), '(,[^,]+)(\1)+', '\1'),2) as COl3 ,
substr(regexp_replace(','||(LISTAGG(( CASE WHEN T1.COl4 IS NULL OR TRIM( T1.COl4 ) ='' THEN NULL ELSE T1.COl4 END), ',') WITHIN GROUP (ORDER BY T1.COl1 )), '(,[^,]+)(\1)+', '\1'),2) as COl4 ,
from T1
Group by
Col1
)abc
The OUTOPUT I GOT is below,
Col1 Col2 Col3 Col4
Ajay G1 B1,B2 10.201.131.27
Ajay G1 B1,B2 10.201.131.28
Ajay G1 B1,B2 10.201.131.29
Thanks in Advance.

To eliminate duplicates from LISTAGG you may use in Oracle 10 the row_number function to define the order of the duplication.
In the next step you pass to the LISTAGG function only the first duplicated (row_number = 1), all higher duplicates are reset to NULL that is ignored by LISTAGG.
Here the query
with t2 as (
select
COL1, COL2,
COL3,
row_number() over (partition by col1, col2, col3 order by null) as rn3,
COL4,
row_number() over (partition by col1, col2, col4 order by null) as rn4
from t)
select
COL1, COL2,
listagg(case when rn3 = 1 then COL3 end,',') within group (order by COL3) COL3,
listagg(case when rn4 = 1 then COL4 end,',') within group (order by COL4) COL4
from t2
group by COL1, COL2
result
COL1, COL2, COL3, COL4
Ajay G1 B1,B2 10.201.131.27,10.201.131.28,10.201.131.29
Note that this approach is far superior that the afterwards elimination with REGEXP as for non-trivial data you to often encounters ORA-01489: result of string concatenation is too long before you can start the elimination.
Note also, that you can upgrade to Oracle 19 (which can be considered overdue from the point of Oracle 10) and you may use the feature LISTAGG (DISTINCT with the same affect without the need of elimination duplicates. This version also elegantl yhandles the owerflow problems.

with t (Col1, Col2, Col3, Col4) as (
select 'Ajay', 'G1', 'B1', '10.201.131.27' from dual union all
select 'Ajay', 'G1', 'B2', '10.201.131.27' from dual union all
select 'Ajay', 'G1', 'B1', '10.201.131.28' from dual union all
select 'Ajay', 'G1', 'B2', '10.201.131.28' from dual union all
select 'Ajay', 'G1', 'B1', '10.201.131.29' from dual union all
select 'Ajay', 'G1', 'B2', '10.201.131.29' from dual)
, t1 as (
select Col1, Col2
, listagg(Col3, ',') within group (order by Col3) x
, listagg(Col4, ',') within group (order by Col4) y
from t
group by Col1, Col2
)
select Col1, Col2
, rtrim(regexp_replace(x || ',', '([^,]+,)\1+', '\1'), ',') Col3_
, rtrim(regexp_replace(y || ',', '([^,]+,)\1+', '\1'), ',') Col4_
from t1
;
COL1 CO COL3_ COL4_
---- -- ------------------------------ ------------------------------------------------------------
Ajay G1 B1,B2 10.201.131.27,10.201.131.28,10.201.131.29
I was showing it in two steps, but it can be in one step as well.
select Col1, Col2
, rtrim(regexp_replace(listagg(Col3, ',') within group (order by Col3) || ',', '([^,]+,)\1+', '\1'), ',') Col3_
, rtrim(regexp_replace(listagg(Col4, ',') within group (order by Col4) || ',', '([^,]+,)\1+', '\1'), ',') Col4_
from t
group by Col1, Col2
;

Pick minimum value and update all the rows SQL

I have a scenario where I need to pick up a minimum value of a priority column and take the product of those and put it in all the columns.
SD PL PRIO PRDT PNAME
1 29 10 MM CAR
1 LI 20 SS BRAKE
1 AA 30 AA ZZZZ
Since the Priority 10 is the minimum of gorup SD 1 MM should be replaced like below.
SD PL PRIO PRDT PNAME
1 29 10 MM CAR
1 LI 20 MM BRAKE
1 AA 30 MM ZZZZ
Could you please help with the select query.

You can use ROW_NUMBER:
SELECT
t.SD, t.PL, t.PRIO, t2.PRDT, t.PNAME
FROM YourTable t
INNER JOIN(
SELECT *,
ROW_NUMBER() OVER(PARTITION BY SD ORDER BY PRIO) AS rn
FROM YourTable
) t2
ON t.SD = t2.SD
WHERE t2.rn = 1
How about using a correlated subquery:
UPDATE YourTable t
SET PRDT = (
SELECT PRDT
FROM YourTable t2
WHERE
t2.SD = t.SD
AND t2.PRIO = (SELECT MIN(t3.PRIO) FROM YourTable t3 WHERE t3.SD = t.SD)
)

Try this if you want only SELECT Query.
SELECT COL1,
COL2,
MIN(COL3) OVER(PARTITION BY COL1 ORDER BY COL3 ASC) COL3,
COL4,
COL5
FROM
(SELECT 1 COL1, '29' COL2, 10 COL3, 'MM' COL4, 'CAR' COL5 FROM DUAL
UNION ALL
SELECT 1 COL1, 'LI' COL2, 20 COL3, 'SS' COL4, 'BRAKE' COL5 FROM DUAL
UNION ALL
SELECT 1 COL1, 'AA' COL2, 30 COL3, 'AA' COL4 , 'ZZZZ' COL5 FROM DUAL
UNION ALL
SELECT 2 COL1, '29' COL2, 10 COL3, 'MM' COL4, 'CAR' COL5 FROM DUAL
UNION ALL
SELECT 2 COL1, 'LI' COL2, 05 COL3, 'SS' COL4, 'BRAKE' COL5 FROM DUAL
UNION ALL
SELECT 2 COL1, 'AA' COL2, 30 COL3, 'AA' COL4 , 'ZZZZ' COL5 FROM DUAL
);

Getting the value of no grouping column

I know the basics in SQL programming and I know how to apply some tricks in SQL Server in order to get the result set, but I don't know all tricks in Oracle.
I have these columns:
col1 col2 col3
And I wrote this query
SELECT
col1, MAX(col3) AS mx3
FROM
myTable
GROUP BY
col1
And I need to get the value of col2 in the same row where I found the max value of col3, do you know some trick to solve this problem?

The easiest way to do this, IMHO, is not to use max, but the window function rank:
SELECT col1 , col2, col3
FROM (SELECT col1, col2, col3,
RANK() OVER (PARTITION BY col1 ORDER BY col3 DESC) rk
FROM myTable) t
WHERE rk = 1
BTW, the same syntax should also work for MS SQL-Server and most other modern databases, with MySQL being the notable exception.

A couple of different ways to do this:
In both cases I'm treating your initial query as either a common table expression or as an inline view and joining it back to the base table to get your added column. The trick here is that the INNER JOIN eliminates all the records not in your max query.
SELECT A.*,
FROM myTable A
INNER JOIN (SELECT col1 , MAX( col3 ) AS mx3 FROM myTable GROUP BY col1) B
on A.Col1=B.Col1
and B.mx3 = A.Col3
or
with CTE AS (SELECT col1 , MAX( col3 ) AS mx3 FROM myTable GROUP BY col1)
SELECT A.*
FROM MyTable A
INNER JOIN CTE
on A.col1 = B.Col1
and A.col3= cte.mx3

Here's an alternative that's just a slight extension of your existing group by query (ie. doesn't require querying the same table more than once):
with mytable as (select 1 col1, 1 col2, 1 col3 from dual union all
select 1 col1, 2 col2, 2 col3 from dual union all
select 1 col1, 1 col2, 3 col3 from dual union all
select 1 col1, 3 col2, 3 col3 from dual union all
select 2 col1, 10 col2, 1 col3 from dual union all
select 2 col1, 23 col2, 2 col3 from dual union all
select 2 col1, 12 col2, 2 col3 from dual)
SELECT
col1,
MAX(col2) keep (dense_rank first order by col3 desc) mx2,
MAX(col3) AS mx3
FROM
myTable
GROUP BY
col1;
COL1 MX2 MX3
---------- ---------- ----------
1 3 3
2 23 2

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Counting Combinations - sql

Related

sort return data with SQL in SQLite order by row

Start calculation after the offset in LEAD

Listagg function with rep substr function

Pick minimum value and update all the rows SQL

Getting the value of no grouping column

Categories

Resources