Merge multiple rows with same ID into one row - sql

How can I merge multiple rows with the same ID into one row?
Rows should be merged when, for each column, the values in the rows are the same, or when one row has a value and the other has NULL.
I don't want to merge when the values in the same column are different.
I have this table:
ID | A    | B    | C
1  | NULL | 31   | NULL
1  | 412  | NULL | 1
2  | 567  | 38   | 4
2  | 567  | NULL | NULL
3  | 2    | NULL | NULL
3  | 5    | NULL | NULL
4  | 6    | 1    | NULL
4  | 8    | NULL | 5
4  | NULL | NULL | 5
I want to get this table:
ID | A    | B    | C
1  | 412  | 31   | 1
2  | 567  | 38   | 4
3  | 2    | NULL | NULL
3  | 5    | NULL | NULL
4  | 6    | 1    | NULL
4  | 8    | NULL | 5
4  | NULL | NULL | 5

I think there's a simpler solution than the other answers (which are also correct). It builds the merged rows for every ID that can be merged inside a CTE, then appends the rows of the IDs that cannot be merged.
WITH CTE AS (
SELECT
ID,
MAX(A) AS A,
MAX(B) AS B,
MAX(C) AS C
FROM dbo.Records
GROUP BY ID
HAVING MAX(A) = MIN(A)
AND MAX(B) = MIN(B)
AND MAX(C) = MIN(C)
)
SELECT *
FROM CTE
UNION ALL
SELECT *
FROM dbo.Records
WHERE ID NOT IN (SELECT ID FROM CTE)
SQL Fiddle: http://www.sqlfiddle.com/#!6/29407/1/0
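To try the query above without a SQL Server instance, here is a minimal sketch that runs the same statement against the question's sample data. It assumes SQLite through Python's standard sqlite3 module as a stand-in database, so the dbo. schema prefix is dropped, and the ORDER BY is added only to make the output order predictable.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Records (ID INTEGER, A INTEGER, B INTEGER, C INTEGER)")
conn.executemany(
    "INSERT INTO Records VALUES (?, ?, ?, ?)",
    [
        (1, None, 31, None), (1, 412, None, 1),
        (2, 567, 38, 4), (2, 567, None, None),
        (3, 2, None, None), (3, 5, None, None),
        (4, 6, 1, None), (4, 8, None, 5), (4, None, None, 5),
    ],
)

query = """
WITH CTE AS (
    SELECT ID, MAX(A) AS A, MAX(B) AS B, MAX(C) AS C
    FROM Records
    GROUP BY ID
    HAVING MAX(A) = MIN(A) AND MAX(B) = MIN(B) AND MAX(C) = MIN(C)
)
SELECT * FROM CTE
UNION ALL
SELECT * FROM Records WHERE ID NOT IN (SELECT ID FROM CTE)
ORDER BY ID, A  -- added for a stable order; SQLite sorts NULL first
"""
merged = conn.execute(query).fetchall()
for row in merged:
    print(row)
```

IDs 1 and 2 come back as one row each, while the rows for IDs 3 and 4 are returned unchanged because their A values conflict.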

WITH Collapsed AS (
SELECT
ID,
A = Min(A),
B = Min(B),
C = Min(C)
FROM
dbo.MyTable
GROUP BY
ID
HAVING
EXISTS (
SELECT Min(A), Min(B), Min(C)
INTERSECT
SELECT Max(A), Max(B), Max(C)
)
)
SELECT
*
FROM
Collapsed
UNION ALL
SELECT
*
FROM
dbo.MyTable T
WHERE
NOT EXISTS (
SELECT *
FROM Collapsed C
WHERE T.ID = C.ID
);
See this working in a SQL Fiddle
This works by creating all the mergeable rows through the use of Min and Max--which should be the same for each column within an ID and which usefully exclude NULLs--then appending to this list all the rows from the table that couldn't be merged. The special trick with EXISTS ... INTERSECT allows for the case when a column has all NULL values for an ID (and thus the Min and Max are NULL and can't equal each other). That is, it functions like Min(A) = Max(A) AND Min(B) = Max(B) AND Min(C) = Max(C) but allows for NULLs to compare as equal.
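The NULL handling described above can be seen on a small case. This is only a sketch under assumptions: it uses SQLite through Python's sqlite3 module, and it swaps the EXISTS ... INTERSECT trick for SQLite's IS operator, which also treats two NULLs as equal. The ID 5 rows and the helper name mergeable_ids are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (ID INTEGER, A INTEGER, B INTEGER, C INTEGER)")
# Hypothetical data: column B is NULL in every row of ID 5.
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?, ?)",
    [(5, 7, None, 1), (5, 7, None, None)],
)

def mergeable_ids(op):
    """Return the IDs whose Min and Max agree in every column under `op`."""
    sql = f"""
        SELECT ID
        FROM MyTable
        GROUP BY ID
        HAVING MIN(A) {op} MAX(A)
           AND MIN(B) {op} MAX(B)
           AND MIN(C) {op} MAX(C)
    """
    return [row[0] for row in conn.execute(sql)]

plain = mergeable_ids("=")       # NULL = NULL is unknown, so the group is dropped
null_safe = mergeable_ids("IS")  # NULL IS NULL is true, so the group is kept
print(plain, null_safe)          # [] [5]
```

With plain equality the all-NULL column B makes the HAVING clause unknown and ID 5 is treated as not mergeable; the NULL-safe comparison keeps it.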
Here's a slightly different (earlier) solution I gave that may offer different performance characteristics. It is more complicated, which I like less, but it is a single flowing query (without a UNION), which I like more.
WITH Collapsible AS (
SELECT
ID
FROM
dbo.MyTable
GROUP BY
ID
HAVING
EXISTS (
SELECT Min(A), Min(B), Min(C)
INTERSECT
SELECT Max(A), Max(B), Max(C)
)
), Calc AS (
SELECT
T.*,
Grp = Coalesce(C.ID, Row_Number() OVER (PARTITION BY T.ID ORDER BY (SELECT 1)))
FROM
dbo.MyTable T
LEFT JOIN Collapsible C
ON T.ID = C.ID
)
SELECT
ID,
A = Min(A),
B = Min(B),
C = Min(C)
FROM
Calc
GROUP BY
ID,
Grp
;
This is also in the above SQL Fiddle.
This uses similar logic as the first query to calculate whether a group should be merged, then uses this to create a grouping key that is either the same for all rows within an ID or is different for all rows within an ID. With a final Min (Max would have worked just as well) the rows that should be merged are merged because they share a grouping key, and the rows that shouldn't be merged are not because they have distinct grouping keys over the ID.
Depending on your data set, indexes, table size, and other performance factors, either of these queries may perform better, though the second query has some work to do to catch up, with two sorts instead of one.

You can try something like this:
select
isnull(t1.A, t2.A) as A,
isnull(t1.B, t2.B) as B,
isnull(t1.C, t2.C) as C
from
table_name t1
join table_name t2 on t1.ID = t2.ID and .....
You mention the concepts of first and second. How do
you define this order? Place that order defining condition
in here: .....
Also, I assume you have exactly 2 rows for each ID value.

Related

Top n distinct values of one column in Oracle

I'm using a query where one part gets the top 3 values of a certain column.
It builds a subquery of the distinct values of the column, limited to 3 rows, and then uses those values to filter the main query.
WITH subquery AS (
SELECT col FROM (
SELECT DISTINCT col
FROM tbl
) WHERE ROWNUM <= 3
)
SELECT col
FROM tbl
WHERE tbl.col IN (SELECT col FROM subquery)
So the original table is like this:
col
-----
a
a
a
b
b
b
c
d
d
e
f
f
f
f
And the query returns the top 3 of the column (not the top 3 rows which would only be a):
col
-----
a
a
a
b
b
b
c
I'm trying to learn if there is a more correct way of doing this as the real query is big and duplicating its size with a subquery that looks almost the same just to get the top 3 is hard to work with and understand/modify.
Is there a better way to do the top first 3 distinct values of one column in Oracle?
Yes, you can use dense_rank and avoid duplicated code:
select col
from (select col, dense_rank() over (order by col) rnk from tbl)
where rnk <= 3
demo
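For anyone who wants to check the dense_rank approach locally, here is a minimal sketch on the question's sample column. It assumes SQLite (3.25 or later, for window functions) through Python's sqlite3 module rather than Oracle; the outer ORDER BY is added only to fix the output order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (col TEXT)")
# Same values as the question: a x3, b x3, c, d x2, e, f x4.
conn.executemany("INSERT INTO tbl VALUES (?)", [(c,) for c in "aaabbbcddeffff"])

rows = conn.execute("""
    SELECT col
    FROM (
        -- DENSE_RANK gives equal values the same rank, with no gaps
        SELECT col, DENSE_RANK() OVER (ORDER BY col) AS rnk
        FROM tbl
    )
    WHERE rnk <= 3
    ORDER BY col
""").fetchall()
top3 = [r[0] for r in rows]
print(top3)  # ['a', 'a', 'a', 'b', 'b', 'b', 'c']
```

Because every duplicate of a value shares its rank, filtering on rnk <= 3 keeps all rows of the first three distinct values, not just three rows.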

Sql in oracle to find out missing records from its distinct values

I am sorry, this one is not working. Maybe I should have clarified this earlier. The values A, B, C, D, etc. are the distinct values for Code in the table. There are several hundred IDs in the table, and each ID can have one to many Code values. In this example, assume that there are 5 distinct values of Code in table A. There are 4 IDs, and each ID is associated with codes in table A as follows:
ID Code
1 A
1 B
1 C
2 D
2 A
3 B
3 C
4 A
4 B
4 C
4 D
4 E
As you see above there are several IDs associated with different Code values. I need the result as follows
ID CODE
1 D
1 E
2 B
2 C
2 E
3 A
3 D
3 E
ID 4 should not return anything because it contains all possible codes (in this case A, B, C, D, E).
First, take the distinct values of each column in separate subqueries. Second, cross join them, which gives you all possible combinations.
Finally, exclude the combinations that are already present:
select *
from
(select distinct ID
from your_table) ytI, /* this sub-query will return all possible ID */
(select distinct code
from your_table) ytc /* this sub-query will return all possible code */
where (ytI.ID,ytc.Code) /* there will be cross-join as there are no join condition between first two tables*/
not in /* exclude those records which are already present */
(select id,code
from your_table yt_i)
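The three steps above (distinct IDs, distinct codes, cross join minus existing pairs) can be sketched on the question's data as follows. This is an illustration under assumptions: it runs on SQLite through Python's sqlite3 module instead of Oracle, and it uses NOT EXISTS in place of the row-value NOT IN, which gives the same result here and is not affected by NULLs in the subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (ID INTEGER, Code TEXT)")
# Sample data from the question: each ID with the codes it already has.
existing = {1: "ABC", 2: "DA", 3: "BC", 4: "ABCDE"}
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?)",
    [(id_, code) for id_, codes in existing.items() for code in codes],
)

missing = conn.execute("""
    SELECT ids.ID, codes.Code
    FROM (SELECT DISTINCT ID FROM your_table) AS ids         -- all IDs
    CROSS JOIN (SELECT DISTINCT Code FROM your_table) AS codes  -- all codes
    WHERE NOT EXISTS (                                        -- drop existing pairs
        SELECT 1
        FROM your_table yt
        WHERE yt.ID = ids.ID AND yt.Code = codes.Code
    )
    ORDER BY ids.ID, codes.Code
""").fetchall()
for row in missing:
    print(row)
```

ID 4 does not appear in the output because every (4, code) pair already exists in the table.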
Try this:
select T2.ID, T1.missing_value
from
(
select 'A' missing_value from dual UNION
select 'B' from dual UNION
select 'C' from dual UNION
select 'D' from dual UNION
select 'E' from dual
) T1,
(
select distinct id from MYTABLE
) T2
WHERE NOT EXISTS
(
SELECT * FROM MYTABLE M WHERE M.CODE = T1.missing_value and M.ID = T2.ID
)
ORDER BY T2.ID, T1.missing_value

GROUP BY one column to find MAX, but keep value from another column - SQL

A | B | num
----------------------
123 1 2
123 10 5
Result:
A | B | max_num
-------------------------
123 10 5
Let's say the table name is tab, currently I have
SELECT T.A, MAX(T.num) AS max_num
FROM tab T
GROUP BY T.A
However, the result will not contain the column B.
SELECT T.A, T.B... GROUP BY T.A, T.B
Will also not give the desired result, since max is found based on the A,B pair.
How can I choose the max of num grouped by only A, but then keep the value of B for the max row that is chosen?
1. Select the max num for each A.
2. Filter on it with an IN clause:
select * from tab t where
t.num in(
select MAX(num)
from tab
where A = t.A)
or
For SQL Server
You can use a window function, ROW_NUMBER(), to keep a single max row per A:
select * from (
select ROW_NUMBER() OVER (PARTITION BY A ORDER BY num desc) rn, *
from tab
) d where d.rn = 1
This should do the job:
Select t1.A, T1.B,T1.num from tab t1 where (T1.A,T1.num) in (
SELECT T.A, MAX(T.num) AS max_num
FROM tab T
GROUP BY T.A)
This selects the records where num equals the max(num) for the same A.
See the SQLFIDDLE
Do you mean you want the whole rows where c = the max(c) value for each a? This one will give both rows if it's a tie:
select a, b, c
from t as t1
where c = (select max(c) from t t2
where t1.a = t2.a)
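To see how the correlated-subquery form behaves on ties, here is a small sketch using the question's column names (A, B, num). It assumes SQLite through Python's sqlite3 module, and the rows for A = 456 are hypothetical, added only to produce a tie.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (A INTEGER, B INTEGER, num INTEGER)")
conn.executemany(
    "INSERT INTO tab VALUES (?, ?, ?)",
    [
        (123, 1, 2), (123, 10, 5),   # rows from the question
        (456, 7, 3), (456, 8, 3),    # hypothetical tie on num
    ],
)

winners = conn.execute("""
    SELECT t1.A, t1.B, t1.num
    FROM tab t1
    WHERE t1.num = (SELECT MAX(t2.num)   -- max num within the same A
                    FROM tab t2
                    WHERE t2.A = t1.A)
    ORDER BY t1.A, t1.B
""").fetchall()
print(winners)  # [(123, 10, 5), (456, 7, 3), (456, 8, 3)]
```

The row for A = 123 keeps B = 10 as wanted, and both tied rows for A = 456 are returned; a ROW_NUMBER-based query would be needed to keep exactly one row per A.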

How to select distinct rows with a specified condition

Suppose there is a table
_ _
a 1
a 2
b 2
c 3
c 4
c 1
d 2
e 5
e 6
How can I select distinct minimum value of all the rows of each group?
So the expected result here is:
_ _
a 1
b 2
c 1
d 2
e 5
EDIT
My actual table contains more columns and I want to select them all. The rows differ only in the last column (the second one in the example). I'm new to SQL, so my question may have been ill-formed in its initial version.
The actual schema is:
| day | currency ('EUR', 'USD') | diff (integer) | id (foreign key) |
There are duplicate pairs (day, currency) that differ by (diff, id). I want to see a table with unique pairs (day, currency), each with the minimum diff from the original table.
Thanks!
in your case it's as simple as this:
select column1, min(column2) as column2
from table
group by column1
for more than two columns I can suggest this:
select top 1 with ties
t.column1, t.column2, t.column3
from table as t
order by row_number() over (partition by t.column1 order by t.column2)
take a look at this post https://stackoverflow.com/a/13652861/1744834
You can use the ranking function ROW_NUMBER() with a CTE to do this. In particular, if there are more columns than these two, it still returns one distinct row per group, like so:
;WITH RankedCTE
AS
(
SELECT *, ROW_NUMBER() OVER(PARTITION BY column1 ORDER BY column2) rownum
FROM Table
)
SELECT column1, column2
FROM RankedCTE
WHERE rownum = 1;
This will give you:
COLUMN1 COLUMN2
a 1
b 2
c 1
d 2
e 5
SQL Fiddle Demo
SELECT ColOne, Min(ColTwo)
FROM Table
GROUP BY ColOne
ORDER BY ColOne
PS: I'm not in front of a machine, so this is untested, but please give the above a try.
select MIN(col2),col1
from dbo.Table_1
group by col1

get subset of a table in SQL

I want to get a subset of a table, here's the example:
1 A
2 A
3 B
4 B
5 C
6 D
7 D
8 D
I want to get the unique record, but with the smallest id:
1 A
3 B
5 C
6 D
How can I write the SQL in SQL Server? Thanks!
Use a common-table expression like this:
;WITH DataCTE AS
(
SELECT ID, OtherCol,
ROW_NUMBER() OVER(PARTITION BY OtherCol ORDER BY ID) 'RowNum'
FROM dbo.YourTable
)
SELECT *
FROM DataCTE
WHERE RowNum = 1
This "partitions" your data by the second column you have (A, B, C) and orders by the ID (1, 2, 3) - smallest ID first.
Therefore, for each "partition" (i.e. each value of your second column), the entry with RowNum = 1 is the one with the smallest ID for each value of the second column.
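The partitioning described above can be checked on the question's eight rows with a short sketch. It assumes SQLite (3.25 or later, for window functions) through Python's sqlite3 module instead of SQL Server, and reuses the guessed names YourTable, ID and OtherCol from this answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (ID INTEGER, OtherCol TEXT)")
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?)",
    [(1, "A"), (2, "A"), (3, "B"), (4, "B"),
     (5, "C"), (6, "D"), (7, "D"), (8, "D")],
)

smallest = conn.execute("""
    WITH DataCTE AS (
        SELECT ID, OtherCol,
               -- numbering restarts at 1 for each OtherCol value, smallest ID first
               ROW_NUMBER() OVER (PARTITION BY OtherCol ORDER BY ID) AS RowNum
        FROM YourTable
    )
    SELECT ID, OtherCol
    FROM DataCTE
    WHERE RowNum = 1
    ORDER BY ID
""").fetchall()
print(smallest)  # [(1, 'A'), (3, 'B'), (5, 'C'), (6, 'D')]
```

Unlike a plain MIN(ID) ... GROUP BY, this form still works when the table has more columns that should be returned alongside the smallest ID.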
select min(id), othercol
from thetable
group by othercol
and maybe with
order by othercol
... at the end if that's important
Try this:
SELECT MIN(Id) AS Id, Name
FROM MyTable
GROUP BY Name
select min(id), column2
from table
group by column2
It helps if you provide the table information in the question - I've just guessed at the column names...