Merge Columns in Oracle with distinct values - sql

I need help merging columns in Oracle into a list of distinct values.
I have one table called TEST with the data below.
ID ID1 ID2 ID3
1 A B C
1 B P A
2 X Y Z
2 Y Z K
I need the output as follows:
ID MergedValues
1 A;B;C;P
2 X;Y;Z;K

This solution is close:
SELECT id, listagg(v, ';') WITHIN GROUP (ORDER BY v) AS MergedValues
FROM (
SELECT id, id1 AS v
FROM test
UNION
SELECT id, id2 AS v
FROM test
UNION
SELECT id, id3 AS v
FROM test
) t
GROUP BY id
It does not retain the order of encounter of MergedValues as you seem to have requested implicitly, but produces this:
| ID | MERGEDVALUES |
|----|--------------|
| 1 | A;B;C;P |
| 2 | K;X;Y;Z |

You can unpivot the columns into rows, and find the distinct values to remove duplicates:
select distinct id, val
from test
unpivot (val for pos in (id1 as 1, id2 as 2, id3 as 3));
And then apply listagg() to that:
select id,
listagg(val, ';') within group (order by val) as mergedvalues
from (
select distinct id, val
from test
unpivot (val for pos in (id1 as 1, id2 as 2, id3 as 3))
)
group by id
order by id;
With your sample data as a CTE:
with test (ID, ID1, ID2, ID3) as (
select 1, 'A', 'B', 'C' from dual
union all select 1, 'B', 'P', 'A' from dual
union all select 2, 'X', 'Y', 'Z' from dual
union all select 2, 'Y', 'Z', 'K' from dual
)
select id,
listagg(val, ';') within group (order by val) as mergedvalues
from (
select distinct id, val
from test
unpivot (val for pos in (id1 as 1, id2 as 2, id3 as 3))
)
group by id
order by id;
ID MERGEDVALUES
---------- ------------------------------
1 A;B;C;P
2 K;X;Y;Z
If the order within the list needs to match what you showed then it seems almost to be based on the first column the value was seen in, so you can do:
select id,
listagg(val, ';') within group (order by min_pos) as mergedvalues
from (
select id, val, min(pos) as min_pos
from test
unpivot (val for pos in (id1 as 1, id2 as 2, id3 as 3))
group by id, val
)
group by id
order by id;
ID MERGEDVALUES
---------- ------------------------------
1 A;B;P;C
2 X;Y;Z;K
which is closer but has C and P reversed; it isn't clear what should control that. Perhaps there is another column you haven't shown which implies a row order.

Here's my approach:
(Note: after posting, I see this resembles Alex Poole's approach, except that I order the input rows first.)
Order the input rows within each ID: you don't say how, I order by ID1,ID2,ID3
Unpivot the data, assigning numbers from 1 to 3 to the columns
Assign priorities to each value based on row order then column order
When a value appears more than once, keep only the minimum "priority"
Use LISTAGG, ordering by priority.
with data_with_rn as (
select t.*,
row_number() over(partition by id order by ID1,ID2,ID3) rn
from t
)
, unpivoted as (
select id, val,
row_number() over(partition by id order by rn, col) priority
from data_with_rn
unpivot(val for col in(ID1 as 1, ID2 as 2, ID3 as 3))
)
, grouped as (
select id, val, min(priority) priority
from unpivoted
group by id, val
)
select id, listagg(val, ';') within group(order by priority) vals
from grouped
group by id
order by id;
ID VALS
-- --------
1 A;B;C;P
2 X;Y;Z;K
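The steps listed in this answer can also be traced outside the database. Below is a minimal Python sketch of the same idea, not a replacement for the SQL: `rows` and `merge_distinct` are hypothetical names, and the rows are assumed to be ordered by ID1, ID2, ID3 within each ID, as the answer chose to do.

```python
# Sample rows from the question: (ID, ID1, ID2, ID3).
rows = [
    (1, "A", "B", "C"),
    (1, "B", "P", "A"),
    (2, "X", "Y", "Z"),
    (2, "Y", "Z", "K"),
]


def merge_distinct(rows):
    """Join the distinct values per ID in first-encounter order."""
    merged = {}
    # Assumption: within each ID, rows are ordered by (ID1, ID2, ID3),
    # mirroring the row_number() ordering used in the answer.
    for row_id, *values in sorted(rows):
        seen = merged.setdefault(row_id, [])
        for value in values:        # column order: ID1, ID2, ID3
            if value not in seen:   # keep only the first occurrence
                seen.append(value)
    return {row_id: ";".join(vals) for row_id, vals in merged.items()}


result = merge_distinct(rows)
print(result)
```

Keeping only the first occurrence plays the role of min(priority) in the SQL: a value's position in the list is decided by the earliest row and column it appears in.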

Related

Duplicate handling using case statement

I have a table tab1. Case 1: if there are no duplicates in col1, display the row as it is. Case 2: if col1 is duplicated, take the row with the max sr_no, considering only rows where data = 'XYZ'; other rows should be ignored.
Tab1 structure (not exact): Col1, Sr, Data
Could you please help me with the query? I tried a CASE expression but did not get the desired output.
For example
Col1 Sr Data
1234 1 ABC
1234 2 MNO
1234 3 XYZ
1234 4 ABC
2345 1 ABC
OUTPUT
Col1 Sr Data
1234 3 XYZ (as it is duplicated, select max of sr and data=XYZ)
2345 1 ABC (as it is unique, no checks for max and data=XYZ)
I think you want row_number() with a priority for XYZ:
select t.*
from (select t.*,
row_number() over (partition by col1 order by (case when data = 'XYZ' then 1 else 2 end), sr desc) as seqnum
from t
) t
where seqnum = 1;
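To see why that ORDER BY picks the right row, it may help to read it as a two-part sort key: XYZ rows first, then the highest sr. Here is a rough Python sketch of the same selection on the sample data; `rows` and `pick_rows` are names made up for illustration.

```python
# Sample rows from the question: (Col1, Sr, Data).
rows = [
    (1234, 1, "ABC"),
    (1234, 2, "MNO"),
    (1234, 3, "XYZ"),
    (1234, 4, "ABC"),
    (2345, 1, "ABC"),
]


def pick_rows(rows):
    """Keep one row per Col1: prefer Data = 'XYZ', then the highest Sr."""
    best = {}
    for col1, sr, data in rows:
        # Same ordering as the row_number() call: XYZ sorts first (0 < 1),
        # and negating sr makes the largest sr sort first.
        key = (0 if data == "XYZ" else 1, -sr)
        if col1 not in best or key < best[col1][0]:
            best[col1] = (key, (col1, sr, data))
    return [row for _, row in best.values()]


picked = pick_rows(rows)
print(picked)
```

A Col1 with a single row keeps that row whatever its Data is, which covers the "no duplicates" case without a separate branch.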
Your logic appears to be:
SELECT Col1, Sr, Data
FROM (
SELECT t.*,
CASE max_cnt
WHEN 1
THEN 1
ELSE ROW_NUMBER() OVER ( PARTITION BY Col1 ORDER BY Sr DESC )
END AS rn
FROM (
SELECT t.*,
MAX( cnt ) OVER ( PARTITION BY Col1 ) AS max_cnt
FROM (
SELECT t.*,
COUNT(*) OVER ( PARTITION BY Col1, Data ) AS cnt
FROM table_name t
) t
) t
WHERE max_cnt = 1
OR data = 'XYZ'
)
WHERE rn = 1;
Which, for the sample data:
CREATE TABLE table_name ( Col1, Sr, Data ) AS
SELECT 1234, 1, 'ABC' FROM DUAL UNION ALL
SELECT 1234, 2, 'MNO' FROM DUAL UNION ALL
SELECT 1234, 3, 'XYZ' FROM DUAL UNION ALL
SELECT 1234, 4, 'ABC' FROM DUAL UNION ALL
SELECT 2345, 1, 'ABC' FROM DUAL;
Outputs:
COL1 SR DATA
1234 3 XYZ
2345 1 ABC

how to get unique data from multiple columns in db2

I wanted to get data from 2 columns in below way:
Id1 id2 id3
1 1 2
2 3 null
2 4 null
Output
Id1 data
1 1,2
2 3,4
Here id1 is the pk, and id2 and id3 are fks to another table.
Try this as is:
WITH TAB (ID1, ID2, ID3) AS
(
VALUES
(1, 1, 2)
, (2, 3, NULL)
, (2, 4, NULL)
)
SELECT ID1, LISTAGG(DISTINCT ID23, ',') AS DATA
FROM
(
SELECT T.ID1, CASE V.ID WHEN 2 THEN T.ID2 ELSE T.ID3 END AS ID23
FROM TAB T
CROSS JOIN (VALUES 2, 3) V(ID)
)
WHERE ID23 IS NOT NULL
GROUP BY ID1;
This is a bit strange -- concatenating both within the same row and across multiple rows. One method is to unpivot and then aggregate:
select id1, listagg(id2, ',') within group (order by id2)
from (select id1, id2 from t union all
select id1, id3 from t
) t
where id2 is not null
group by id1;
Assuming that only id3 could be NULL, you can also express this as:
select id1,
listagg(concat(id2, coalesce(concat(',', id3), '')), ',') within group (order by id2)
from t
group by id1;
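For comparison, here is the unpivot-then-aggregate idea as a small Python sketch. It is only an illustration of the logic, with hypothetical names (`rows`, `collect_ids`), and it assumes duplicates should be dropped, as the LISTAGG(DISTINCT ...) answer does.

```python
# Sample rows from the question: (id1, id2, id3); None stands for NULL.
rows = [(1, 1, 2), (2, 3, None), (2, 4, None)]


def collect_ids(rows):
    """Gather the non-NULL id2/id3 values per id1 as a comma-separated string."""
    grouped = {}
    for id1, id2, id3 in rows:
        bucket = grouped.setdefault(id1, [])
        for value in (id2, id3):     # "unpivot": one value per column
            if value is not None and value not in bucket:
                bucket.append(value)
    # Sort within each group, like WITHIN GROUP (ORDER BY ...).
    return {id1: ",".join(str(v) for v in sorted(vals))
            for id1, vals in grouped.items()}


data = collect_ids(rows)
print(data)
```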

Is there a simple way to get an id associated with an aggregate like min and max?

I always brute force my way through the solution to the problem of getting an id associated with an aggregate operation like min and max through some ugly sql code. I am just wondering if there is a correct/clean way to solve this problem. Suppose you have the following:
SELECT 1 AS groupid, 1 AS id, 100 AS val
INTO #a
UNION
SELECT 1, 2, 50
UNION
SELECT 1, 3, 75
UNION
SELECT 2, 2, 120
UNION
SELECT 2, 4, 22
UNION
SELECT 2, 1, 45
NOTE#1: id is unique within a groupid
NOTE#2: val can have the same values so in that case the id column
will be the first id corresponding to val
Suppose I want the result to look like:
groupid | min_id | min_val | max_id | max_val
1 2 50 1 100
2 4 22 2 120
You can use conditional aggregation or window functions. For instance, you can use first_value():
select distinct groupid,
min(val) over (partition by groupid) as min_val,
first_value(id) over (partition by groupid order by val asc) as min_id,
max(val) over (partition by groupid) as max_val,
first_value(id) over (partition by groupid order by val desc) as max_id
from t;
Alas, SQL Server does not support first_value() as an aggregation function, so this uses the select distinct short-cut.
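Use GROUP BY and simple aggregate functions like this (note that MIN(id) and MAX(id) are computed independently of val here, so they are not necessarily the ids of the rows that hold the minimum and maximum val):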
SELECT groupid, MIN(id) AS min_id, MIN(val) as min_val, MAX(id) AS max_id, MAX(val) as max_val
FROM table
GROUP BY groupid
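What the question asks for is the id that travels with the minimum or maximum val, not the minimum or maximum id. A short Python sketch of that pairing follows; `rows` and `min_max_with_ids` are hypothetical names, and ties on val are assumed to resolve to the smallest id, which is one reading of NOTE#2.

```python
# Sample rows from the question: (groupid, id, val).
rows = [
    (1, 1, 100), (1, 2, 50), (1, 3, 75),
    (2, 2, 120), (2, 4, 22), (2, 1, 45),
]


def min_max_with_ids(rows):
    """Return, per groupid, the min and max val together with their ids."""
    groups = {}
    for groupid, row_id, val in rows:
        groups.setdefault(groupid, []).append((val, row_id))
    summary = {}
    for groupid, pairs in groups.items():
        # Assumption: ties on val resolve to the smallest id.
        low = min(pairs)                                # lowest val, then lowest id
        high = max(pairs, key=lambda p: (p[0], -p[1]))  # highest val, then lowest id
        summary[groupid] = {
            "min_id": low[1], "min_val": low[0],
            "max_id": high[1], "max_val": high[0],
        }
    return summary


summary = min_max_with_ids(rows)
print(summary)
```

With the sample rows as inserted, the id paired with val 22 in group 2 is 4.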

Rows Columns Traverse

I have data in the below format
id idnew
1 2
3 4
2
4 7
6 8
7
Result Should be something like this
ID should be followed by idnew
1
2
3
4
2
4
7
6
8
7
Thanks in advance
This should maintain the order:
SELECT id
FROM (
SELECT id, ROW_NUMBER() OVER (ORDER BY id) AS RowNumber
FROM myTable
UNION ALL
SELECT idnew, ROW_NUMBER() OVER (ORDER BY idnew) +
(SELECT COUNT(*) FROM dbo.myTable) AS RowNumber
FROM myTable
WHERE idnew IS NOT NULL
) a
ORDER BY RowNumber
I am assuming the id column is NOT NULL-able.
NOTE: If you want to keep the NULL values from the idnew column AND maintain the order, then remove the WHERE clause and use ORDER BY id in the second SELECT:
SELECT id
FROM (
SELECT id, ROW_NUMBER() OVER (ORDER BY id) AS RowNumber
FROM myTable
UNION ALL
SELECT idnew, ROW_NUMBER() OVER (ORDER BY id) +
(SELECT COUNT(*) FROM dbo.myTable) AS RowNumber
FROM myTable
) a
ORDER BY RowNumber
This is fully tested, try it here: https://rextester.com/DVZXO21058
Setting up the table as you described:
CREATE TABLE myTable (id INT, idnew INT);
INSERT INTO myTable (id, idnew)
VALUES (1, 2),
(3, 4),
(2, NULL),
(4, 7),
(6, 8),
(7, NULL);
SELECT * FROM myTable;
Here is the query to do the trick:
SELECT mixed_id FROM
(
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS row_num,
id,
idnew
FROM myTable
) AS x
UNPIVOT
(
mixed_id for item in (id, idnew)
) AS y
WHERE mixed_id IS NOT NULL
ORDER BY row_num, item;
To avoid complicating the query further, this takes advantage of the fact that 'id' sorts ahead of 'idnew' as a string, so within each row the id value comes before the idnew value. I believe string ordering is not the key issue here.
Using Cross Apply
;WITH CTE (id,idnew)
AS
(
SELECT 1,2 UNION ALL
SELECT 3,4 UNION ALL
SELECT 2,NULL UNION ALL
SELECT 4,7 UNION ALL
SELECT 6,8 UNION ALL
SELECT 7,NULL
)
SELECT New
FROM CTE
CROSS APPLY ( VALUES (id),(idnew))AS Dt (New)
WHERE dt.New IS NOT NULL
Result
New
---
1
2
3
4
2
4
7
6
8
7
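The same row-by-row flattening is easy to picture in Python, where a list keeps its order. This is just a sketch of the intended result with hypothetical names (`rows`, `flattened`); keep in mind that a SQL query without an ORDER BY does not guarantee any row order, so the SQL version may need an explicit ordering column.

```python
# Sample rows from the question: (id, idnew); None stands for NULL.
rows = [(1, 2), (3, 4), (2, None), (4, 7), (6, 8), (7, None)]

# For each row emit id, then idnew, skipping NULLs.
flattened = [value for row in rows for value in row if value is not None]
print(flattened)
```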

SQL Grouping by Ranges

I have a data set that has timestamped entries over various sets of groups.
Timestamp -- Group -- Value
---------------------------
1 -- A -- 10
2 -- A -- 20
3 -- B -- 15
4 -- B -- 25
5 -- C -- 5
6 -- A -- 5
7 -- A -- 10
I want to sum these values by the Group field, but parsed as it appears in the data. For example, the above data would result in the following output:
Group -- Sum
A -- 30
B -- 40
C -- 5
A -- 15
I do not want this, which is all I've been able to come up with on my own so far:
Group -- Sum
A -- 45
B -- 40
C -- 5
Using Oracle 11g, this is what I've cobbled together so far. I know that this is wrong, but I'm hoping I'm at least on the right track with RANK(). In the real data, entries with the same group could be 2 timestamps apart, or 100; there could be one entry in a group, or 100 consecutive. It does not matter, I need them separated.
WITH SUB_Q AS
(SELECT K_ID
, GRP
, VAL
-- GET THE RANK FROM TIMESTAMP TO SEPARATE GROUPS WITH SAME NAME
, RANK() OVER(PARTITION BY K_ID ORDER BY TMSTAMP) AS RNK
FROM MY_TABLE
WHERE K_ID = 123)
SELECT T1.K_ID
, T1.GRP
, SUM(CASE
WHEN T1.GRP = T2.GRP THEN
T1.VAL
ELSE
0
END) AS TOTAL_VALUE
FROM SUB_Q T1 -- MAIN VALUE
INNER JOIN SUB_Q T2 -- TIMESTAMP AFTER
ON T1.K_ID = T2.K_ID
AND T1.RNK = T2.RNK - 1
GROUP BY T1.K_ID
, T1.GRP
Is it possible to group in this way? How would I go about doing this?
I approach this problem by defining a group which is the difference of two row_number() values:
select grp, sum(val) as total_value
from (select t.*,
-- overall position minus position within the same grp
(row_number() over (order by tmstamp) -
row_number() over (partition by grp order by tmstamp)
) as island
from my_table t
) t
group by grp, island
order by min(tmstamp);
The difference of the two row numbers stays constant across consecutive rows that belong to the same group, and changes as soon as another group interrupts the run.
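To check the row-number difference by hand, here is a small Python sketch that computes both counters for the sample data. The names (`rows`, `sum_by_runs`) are made up for illustration; it only mirrors the logic of the query.

```python
from collections import defaultdict

# Sample rows from the question: (timestamp, group, value).
rows = [
    (1, "A", 10), (2, "A", 20), (3, "B", 15), (4, "B", 25),
    (5, "C", 5), (6, "A", 5), (7, "A", 10),
]


def sum_by_runs(rows):
    """Sum values per consecutive run of the same group."""
    per_group_rn = defaultdict(int)  # row_number() partitioned by group
    totals = {}                      # (group, difference) -> [first timestamp, sum]
    for overall_rn, (ts, grp, val) in enumerate(sorted(rows), start=1):
        per_group_rn[grp] += 1
        diff = overall_rn - per_group_rn[grp]  # constant within one run
        entry = totals.setdefault((grp, diff), [ts, 0])
        entry[1] += val
    # Order the runs by their first timestamp, like ORDER BY MIN(timestamp).
    ordered = sorted(totals.items(), key=lambda item: item[1][0])
    return [(grp, total) for (grp, _), (_, total) in ordered]


runs = sum_by_runs(rows)
print(runs)
```

For the sample data the differences come out as 0, 0, 2, 2, 4, 3, 3, so the two runs of A land in separate buckets (A, 0) and (A, 3).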
A solution using LAG and windowed analytic functions:
Oracle 11g R2 Schema Setup:
CREATE TABLE TEST ( "Timestamp", "Group", Value ) AS
SELECT 1, 'A', 10 FROM DUAL
UNION ALL SELECT 2, 'A', 20 FROM DUAL
UNION ALL SELECT 3, 'B', 15 FROM DUAL
UNION ALL SELECT 4, 'B', 25 FROM DUAL
UNION ALL SELECT 5, 'C', 5 FROM DUAL
UNION ALL SELECT 6, 'A', 5 FROM DUAL
UNION ALL SELECT 7, 'A', 10 FROM DUAL;
Query 1:
WITH changes AS (
SELECT t.*,
CASE WHEN LAG( "Group" ) OVER ( ORDER BY "Timestamp" ) = "Group" THEN 0 ELSE 1 END AS hasChangedGroup
FROM TEST t
),
groups AS (
SELECT "Group",
VALUE,
SUM( hasChangedGroup ) OVER ( ORDER BY "Timestamp" ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) AS grp
FROM changes
)
SELECT "Group",
SUM( VALUE )
FROM Groups
GROUP BY "Group", grp
ORDER BY grp
Results:
| Group | SUM(VALUE) |
|-------|------------|
| A | 30 |
| B | 40 |
| C | 5 |
| A | 15 |
This is a typical "start_of_group" problem (see here: https://timurakhmadeev.wordpress.com/2013/07/21/start_of_group/)
In your case, it would be as follows:
with t as (
select 1 timestamp, 'A' grp, 10 value from dual union all
select 2, 'A', 20 from dual union all
select 3, 'B', 15 from dual union all
select 4, 'B', 25 from dual union all
select 5, 'C', 5 from dual union all
select 6, 'A', 5 from dual union all
select 7, 'A', 10 from dual
)
select min(timestamp), grp, sum(value) sum_value
from (
select t.*
, sum(start_of_group) over (order by timestamp) grp_id
from (
select t.*
, case when grp = lag(grp) over (order by timestamp) then 0 else 1 end
start_of_group
from t
) t
)
group by grp_id, grp
order by min(timestamp)
;
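The start_of_group flag and its running sum can be followed step by step in a few lines of Python. This is only a sketch of the technique on the sample data, with hypothetical names (`rows`, `sum_by_start_of_group`).

```python
from itertools import accumulate

# Sample rows from the question: (timestamp, grp, value).
rows = [
    (1, "A", 10), (2, "A", 20), (3, "B", 15), (4, "B", 25),
    (5, "C", 5), (6, "A", 5), (7, "A", 10),
]


def sum_by_start_of_group(rows):
    """Flag each row that starts a new run, then sum values per run."""
    ordered = sorted(rows)  # order by timestamp
    # 1 when grp differs from the previous row (or there is none), else 0.
    flags = [
        1 if i == 0 or ordered[i][1] != ordered[i - 1][1] else 0
        for i in range(len(ordered))
    ]
    grp_ids = list(accumulate(flags))  # running sum of the flags
    totals = {}
    for (ts, grp, val), grp_id in zip(ordered, grp_ids):
        entry = totals.setdefault(grp_id, [ts, grp, 0])  # ts of first row = min
        entry[2] += val
    return [tuple(entry) for entry in totals.values()]


summed = sum_by_start_of_group(rows)
print(summed)
```

The flags for the sample are 1, 0, 1, 0, 1, 1, 0, giving run ids 1, 1, 2, 2, 3, 4, 4, which is what grp_id holds in the query.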