I have a table with data like the following:
key | A | B | C
---------------------------
1 | x | 0 | 1
2 | x | 2 | 0
3 | x | NULL | 4
4 | y | 7 | 1
5 | y | 3 | NULL
6 | z | NULL | 4
I want to merge the rows on column A, taking for each column the non-NULL value from the row with the largest primary key (the 'tie breaker').
Expected result:
key | A | B | C
---------------------------
1 | x | 2 | 4
2 | y | 3 | 1
3 | z | NULL | 4
What would be the best way to achieve this, given that my real data has 40 columns and 1 million rows with an unknown amount of duplication?
Using ROW_NUMBER and conditional aggregation:
WITH cte AS(
SELECT *,
-- per column: rank non-NULL values first, then the highest key
rnB = ROW_NUMBER() OVER(PARTITION BY A ORDER BY CASE WHEN B IS NULL THEN 0 ELSE 1 END DESC, [key] DESC),
rnC = ROW_NUMBER() OVER(PARTITION BY A ORDER BY CASE WHEN C IS NULL THEN 0 ELSE 1 END DESC, [key] DESC)
FROM tbl
)
SELECT
[key] = ROW_NUMBER() OVER(ORDER BY A), -- renumber the merged rows
A,
-- keep only the value from the row ranked first for each column
B = MAX(CASE WHEN rnB = 1 THEN B END),
C = MAX(CASE WHEN rnC = 1 THEN C END)
FROM cte
GROUP BY A;
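A minimal sketch for trying this logic locally, assuming Python's built-in sqlite3 module (SQLite 3.25 or later, for window functions) as a stand-in for SQL Server. Because the real table has about 40 columns, the hypothetical helper `build_merge_query` generates one ROW_NUMBER() per value column instead of writing each one by hand; the table and column names (`tbl`, `key`, `A`, `B`, `C`) come from the sample data.

```python
import sqlite3


def build_merge_query(table, group_col, key_col, value_cols):
    """Hypothetical helper: build the ROW_NUMBER + conditional-aggregation query.

    For each value column, one ROW_NUMBER() ranks non-NULL values first and,
    among those, the row with the highest key first.
    """
    rn_exprs = ",\n    ".join(
        f'ROW_NUMBER() OVER (PARTITION BY "{group_col}" '
        f'ORDER BY CASE WHEN "{c}" IS NULL THEN 1 ELSE 0 END, "{key_col}" DESC) AS "rn_{c}"'
        for c in value_cols
    )
    # Keep only the value from the row ranked first for each column.
    picks = ",\n    ".join(
        f'MAX(CASE WHEN "rn_{c}" = 1 THEN "{c}" END) AS "{c}"' for c in value_cols
    )
    return (
        f'WITH cte AS (\n  SELECT *,\n    {rn_exprs}\n  FROM "{table}"\n)\n'
        f'SELECT ROW_NUMBER() OVER (ORDER BY "{group_col}") AS "{key_col}",\n'
        f'    "{group_col}",\n    {picks}\n'
        f'FROM cte\nGROUP BY "{group_col}"\nORDER BY "{group_col}"'
    )


def merge_rows(rows):
    """Load the rows into an in-memory SQLite table and run the generated merge."""
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE tbl ("key" INTEGER PRIMARY KEY, A TEXT, B INTEGER, C INTEGER)')
    conn.executemany("INSERT INTO tbl VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(build_merge_query("tbl", "A", "key", ["B", "C"])).fetchall()
    conn.close()
    return result


SAMPLE = [
    (1, "x", 0, 1), (2, "x", 2, 0), (3, "x", None, 4),
    (4, "y", 7, 1), (5, "y", 3, None), (6, "z", None, 4),
]

print(merge_rows(SAMPLE))  # [(1, 'x', 2, 4), (2, 'y', 3, 1), (3, 'z', None, 4)]
```

For the full table, the same helper would be called with all ~40 value column names. The generated SQL uses double-quoted identifiers, which SQL Server also accepts as long as QUOTED_IDENTIFIER is ON.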
I am not sure of the logic required to accomplish this, but I want to take a table like this...
+----+------+
| Id | Type |
+----+------+
| 10 | A |
| 10 | B |
| 10 | C |
| 20 | A |
| 20 | C |
+----+------+
...and end up with a table like this...
+----+------+---+---+---+
| Id | Type | A | B | C |
+----+------+---+---+---+
| 10 | A | 1 | 1 | 1 |
| 10 | B | 1 | 1 | 1 |
| 10 | C | 1 | 1 | 1 |
| 20 | A | 1 | 0 | 1 |
| 20 | C | 1 | 0 | 1 |
+----+------+---+---+---+
...where new columns consolidate the Type information for each Id into every row of that Id. Since Id 10 has rows of types A, B, and C, all rows with Id 10 should have a 1 (true) in the new columns A, B, and C.
I know how to do this on a per-row basis, but can't wrap my head around how to consolidate the information from multiple rows into each row of the same ID.
Try the logic below:
SELECT *,
(SELECT COUNT(DISTINCT Type) FROM your_table B WHERE B.ID = A.Id and B.Type = 'A') A,
(SELECT COUNT(DISTINCT Type) FROM your_table C WHERE C.ID = A.Id and C.Type = 'B') B,
(SELECT COUNT(DISTINCT Type) FROM your_table D WHERE D.ID = A.Id and D.Type = 'C') C
FROM your_table A
Another option, using window functions:
SELECT *,
SUM(CASE WHEN Type= 'A' THEN 1 ELSE 0 END) OVER(PARTITION BY Id) A,
SUM(CASE WHEN Type= 'B' THEN 1 ELSE 0 END) OVER(PARTITION BY Id) B,
SUM(CASE WHEN Type= 'C' THEN 1 ELSE 0 END) OVER(PARTITION BY Id) C
FROM your_table
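A minimal sketch of the window-function option, assuming Python's sqlite3 module as a stand-in database and keeping the placeholder table name `your_table`. It swaps SUM for MAX: the output is identical for the sample data, but MAX keeps the flags at 0/1 even if an Id has the same Type more than once, whereas SUM would return a count.

```python
import sqlite3

SAMPLE = [(10, "A"), (10, "B"), (10, "C"), (20, "A"), (20, "C")]

# MAX(...) OVER (PARTITION BY Id) is 1 if any row of that Id has the type, else 0.
FLAG_QUERY = """
SELECT Id, Type,
       MAX(CASE WHEN Type = 'A' THEN 1 ELSE 0 END) OVER (PARTITION BY Id) AS A,
       MAX(CASE WHEN Type = 'B' THEN 1 ELSE 0 END) OVER (PARTITION BY Id) AS B,
       MAX(CASE WHEN Type = 'C' THEN 1 ELSE 0 END) OVER (PARTITION BY Id) AS C
FROM your_table
ORDER BY Id, Type
"""


def type_flags(rows):
    """Run FLAG_QUERY against an in-memory copy of the rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE your_table (Id INTEGER, Type TEXT)")
    conn.executemany("INSERT INTO your_table VALUES (?, ?)", rows)
    result = conn.execute(FLAG_QUERY).fetchall()
    conn.close()
    return result


for row in type_flags(SAMPLE):
    print(row)
```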
I have four columns: a, b, c, d.
Sample data:
a | b | c | d
1 | 1 | 101 | 0
2 | 1 | 101 | 0
3 | 1 | 101 | 1
4 | 1 | 102 | 0
5 | 1 | 102 | 0
1 | 2 | 101 | 0
2 | 2 | 101 | 1
I need a SQL query that, for every combination of b and c, returns the row with the maximum a.
Expected output:
a | b | c | d
3 | 1 | 101 | 1
5 | 1 | 102 | 0
2 | 2 | 101 | 1
You can use a correlated subquery:
select t.*
from t
where t.a = (select max(t2.a) from t t2 where t2.b = t.b and t2.c = t.c);
With an index on t(b, c, a), this often has the best performance.
An alternative is window functions:
select t.*
from (select t.*, row_number() over (partition by b, c order by a desc) as seqnum
from t
) t
where seqnum = 1;
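A minimal sketch that runs both queries side by side, assuming Python's sqlite3 module as the database; the index name `t_b_c_a` is hypothetical. The two give the same rows here; they differ only when two rows tie on the maximum a within a (b, c) group, where the correlated subquery returns both and ROW_NUMBER() keeps one arbitrarily.

```python
import sqlite3

ROWS = [
    (1, 1, 101, 0), (2, 1, 101, 0), (3, 1, 101, 1),
    (4, 1, 102, 0), (5, 1, 102, 0),
    (1, 2, 101, 0), (2, 2, 101, 1),
]

# Correlated subquery: keep rows whose a equals the max a of their (b, c) group.
CORRELATED = """
SELECT t.* FROM t
WHERE t.a = (SELECT MAX(t2.a) FROM t t2 WHERE t2.b = t.b AND t2.c = t.c)
ORDER BY b, c
"""

# Window function: number rows within each (b, c) group, highest a first.
WINDOWED = """
SELECT a, b, c, d
FROM (SELECT t.*, ROW_NUMBER() OVER (PARTITION BY b, c ORDER BY a DESC) AS seqnum
      FROM t) AS ranked
WHERE seqnum = 1
ORDER BY b, c
"""


def run(query):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (a INTEGER, b INTEGER, c INTEGER, d INTEGER)")
    conn.execute("CREATE INDEX t_b_c_a ON t (b, c, a)")  # the index suggested above
    conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", ROWS)
    result = conn.execute(query).fetchall()
    conn.close()
    return result


print(run(CORRELATED))  # [(3, 1, 101, 1), (5, 1, 102, 0), (2, 2, 101, 1)]
print(run(WINDOWED))
```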
You don't mention the database you are using. In PostgreSQL you can do:
select distinct on (b, c) a, b, c, d
from t
order by b, c, a desc
Both of these tables already exist, so I'm not looking for a dynamic solution. The goal is to consolidate the data rows horizontally, filling each value into the leftmost available "Data" column. There will never be a 4th entry for an ID.
I am using Microsoft SQL Server
Table1:
ID|Data
--------
A | 1
A | 2
B | 3
C | 4
C | 5
C | 6
Table2:
ID | Data 1 | Data 2 | Data 3
------------------------------
A | | |
B | | |
C | | |
Desired Result of Table2:
ID | Data 1 | Data 2 | Data 3
------------------------------
A | 1 | 2 |
B | 3 | |
C | 4 | 5 | 6
You can use row_number:
select id,
max(case when rn = 1 then data end) as data_1,
max(case when rn = 2 then data end) as data_2,
max(case when rn = 3 then data end) as data_3
from (
select t.*,
row_number() over (
partition by id order by data
) as rn
from your_table t
) t
group by id;
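The query above returns the pivoted rows; since Table2 already exists, they still have to be written into it. A minimal sketch, assuming Python's sqlite3 module as the database and renaming the "Data 1" to "Data 3" columns to `data_1` to `data_3` (an assumption for this sketch). It fills Table2 with correlated subqueries against a temporary view; in SQL Server the same step is usually written as an UPDATE ... FROM that joins Table2 to the pivot subquery.

```python
import sqlite3

TABLE1 = [("A", 1), ("A", 2), ("B", 3), ("C", 4), ("C", 5), ("C", 6)]

# Number each id's values in ascending order, then spread rn 1..3 into columns.
PIVOT = """
SELECT id,
       MAX(CASE WHEN rn = 1 THEN data END) AS data_1,
       MAX(CASE WHEN rn = 2 THEN data END) AS data_2,
       MAX(CASE WHEN rn = 3 THEN data END) AS data_3
FROM (SELECT t.*, ROW_NUMBER() OVER (PARTITION BY id ORDER BY data) AS rn
      FROM Table1 t) AS t
GROUP BY id
"""


def fill_table2():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Table1 (id TEXT, data INTEGER)")
    conn.executemany("INSERT INTO Table1 VALUES (?, ?)", TABLE1)
    conn.execute("CREATE TABLE Table2 (id TEXT PRIMARY KEY, data_1 INTEGER, data_2 INTEGER, data_3 INTEGER)")
    conn.executemany("INSERT INTO Table2 (id) VALUES (?)", [("A",), ("B",), ("C",)])
    conn.execute(f"CREATE TEMP VIEW pivoted AS {PIVOT}")
    # Copy the pivoted values into the existing Table2 rows.
    conn.execute("""
        UPDATE Table2
        SET data_1 = (SELECT p.data_1 FROM pivoted p WHERE p.id = Table2.id),
            data_2 = (SELECT p.data_2 FROM pivoted p WHERE p.id = Table2.id),
            data_3 = (SELECT p.data_3 FROM pivoted p WHERE p.id = Table2.id)
    """)
    result = conn.execute("SELECT * FROM Table2 ORDER BY id").fetchall()
    conn.close()
    return result


print(fill_table2())  # [('A', 1, 2, None), ('B', 3, None, None), ('C', 4, 5, 6)]
```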
I have the following table that I want to group by type. When there are multiple rows with the same type (e.g., types A and B), I want to keep the 'value' from the row with the highest rank (i.e., primary > secondary > tertiary > ...).
rowid | type | rank | value
1 | A | primary | 1
2 | A | secondary | 2
3 | B | secondary | 3
4 | B | tertiary | 4
5 | C | primary | 5
So the resulting table should look like
rowid | type | rank | value
1 | A | primary | 1
3 | B | secondary | 3
5 | C | primary | 5
Any suggestions will be highly appreciated!
P.S. I'm working in MS SQL Server.
You can use row_number(). Here is a simple-ish method:
select t.*
from (select t.*,
row_number() over (partition by type
order by charindex(rank, 'primary,secondary,tertiary')
) as seqnum
from t
) t
where seqnum = 1;
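A minimal sketch of the same idea, assuming Python's sqlite3 module as the database. SQLite has no charindex(); its instr(haystack, needle) does the same job with the arguments swapped. The columns rowid and rank are renamed to row_id and rank_name here (an assumption) because rowid has a special meaning in SQLite.

```python
import sqlite3

SAMPLE = [
    (1, "A", "primary", 1), (2, "A", "secondary", 2),
    (3, "B", "secondary", 3), (4, "B", "tertiary", 4),
    (5, "C", "primary", 5),
]

# instr('primary,secondary,tertiary', rank_name) gives 1, 9 and 19, so the
# position in the string orders the ranks, like charindex() in SQL Server.
QUERY = """
SELECT row_id, type, rank_name, value
FROM (SELECT t.*,
             ROW_NUMBER() OVER (PARTITION BY type
                                ORDER BY instr('primary,secondary,tertiary', rank_name)) AS seqnum
      FROM t) AS ranked
WHERE seqnum = 1
ORDER BY type
"""


def best_per_type(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (row_id INTEGER, type TEXT, rank_name TEXT, value INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


print(best_per_type(SAMPLE))
```

The string trick relies on no rank name being a substring of another; an explicit CASE mapping avoids that dependency.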
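This uses charindex() to order the ranks: it returns each rank's position within the string 'primary,secondary,tertiary', so primary sorts before secondary, which sorts before tertiary.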
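Try this. Note that it orders by value, which reproduces the expected output only because, in the sample data, the higher-ranked row of each type also has the smaller value; to enforce primary > secondary > tertiary, order by the rank instead: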
;WITH CTE
AS (
SELECT *
,row_number() OVER (
PARTITION BY [type] ORDER BY value
) rn
FROM #t
)
SELECT *
FROM cte
WHERE rn = 1
Another way of doing this is with ROW_NUMBER and an ORDER BY that specifies your rule with CASE.
Schema:
CREATE TABLE #TAB(rowid INT, [type] VARCHAR(1), rankS VARCHAR(50) , value INT)
INSERT INTO #TAB
SELECT 1 , 'A' , 'primary' , 1
UNION ALL
SELECT 2 , 'A' , 'secondary', 2
UNION ALL
SELECT 3 , 'B' , 'secondary' , 3
UNION ALL
SELECT 4 , 'B' , 'tertiary' , 4
UNION ALL
SELECT 5 , 'C' , 'primary' , 5
Now apply the rank rule with ROW_NUMBER:
SELECT * FROM (
SELECT ROW_NUMBER() OVER(PARTITION BY [type] ORDER BY (CASE rankS
WHEN 'primary' THEN 1
WHEN 'secondary' THEN 2
WHEN 'tertiary' THEN 3 END )) AS SNO, * FROM #TAB
)A
WHERE SNO =1
Result:
+-----+-------+------+-----------+-------+
| SNO | rowid | type | rankS | value |
+-----+-------+------+-----------+-------+
| 1 | 1 | A | primary | 1 |
| 1 | 3 | B | secondary | 3 |
| 1 | 5 | C | primary | 5 |
+-----+-------+------+-----------+-------+
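A minimal sketch comparing the CASE rule with ordering by value, assuming Python's sqlite3 module as the database and renaming rowid/rankS to row_id/rank_s (an assumption). Two hypothetical rows for a type D, where the 'primary' row has the larger value, show where the two orderings disagree.

```python
import sqlite3

# Question's rows plus two hypothetical rows for type D.
ROWS = [
    (1, "A", "primary", 1), (2, "A", "secondary", 2),
    (3, "B", "secondary", 3), (4, "B", "tertiary", 4),
    (5, "C", "primary", 5),
    (6, "D", "secondary", 1), (7, "D", "primary", 9),
]

RANK_RULE = "CASE rank_s WHEN 'primary' THEN 1 WHEN 'secondary' THEN 2 WHEN 'tertiary' THEN 3 END"


def top_row_per_type(order_by):
    """Keep the first row of each type under the given ORDER BY expression."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tab (row_id INTEGER, type TEXT, rank_s TEXT, value INTEGER)")
    conn.executemany("INSERT INTO tab VALUES (?, ?, ?, ?)", ROWS)
    query = f"""
        SELECT row_id, type, rank_s, value
        FROM (SELECT ROW_NUMBER() OVER (PARTITION BY type ORDER BY {order_by}) AS sno, tab.*
              FROM tab) AS a
        WHERE sno = 1
        ORDER BY type
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


print(top_row_per_type(RANK_RULE))  # type D -> row 7 ('primary')
print(top_row_per_type("value"))    # type D -> row 6 (smallest value)
```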
I have a view in an Oracle DB that looks as follows:
id | type | numrows
----|--------|----------
1 | S | 2
2 | L | 3
3 | S | 2
4 | S | 2
5 | L | 3
6 | S | 2
7 | L | 3
8 | S | 2
9 | L | 3
10 | L | 3
The idea is: if TYPE is 'S' then return 2 rows (randomly), and if TYPE is 'L' then return 3 rows (randomly).
Example:
id | type | numrows
----|--------|----------
1 | S | 2
3 | S | 2
2 | L | 3
5 | L | 3
7 | L | 3
You need to tell Oracle how to pick 3 rows or 2 rows. One idea is to fabricate a row number:
select id, type, numrows
from
(select
id,
type,
numrows,
row_number() over (partition by type order by type) rnk --fabricated
from your_view)
where
(type = 'S' and rnk <= 2 )
or
(type = 'L' and rnk <= 3 );
You can order by anything you want in that analytic function. For example, you can order by dbms_random.random() for random choices.
If the numrows column holds exactly the number of rows you want for each type, the WHERE clause is simpler:
select id, type, numrows
from
(select
id,
type,
numrows,
row_number() over (partition by type order by dbms_random.random()) rnk --fabricated
from your_view)
where
rnk <= numrows;
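A minimal sketch of the numrows-driven version, assuming Python's sqlite3 module as a stand-in for Oracle: SQLite's random() plays the role of dbms_random.random(), and the table `v` is a hypothetical stand-in for the view. Each run shuffles the rows within a type and keeps the first numrows of them.

```python
import sqlite3

SAMPLE = [
    (1, "S", 2), (2, "L", 3), (3, "S", 2), (4, "S", 2), (5, "L", 3),
    (6, "S", 2), (7, "L", 3), (8, "S", 2), (9, "L", 3), (10, "L", 3),
]

# Shuffle rows within each type, then keep the first numrows of them.
QUERY = """
SELECT id, type, numrows
FROM (SELECT id, type, numrows,
             ROW_NUMBER() OVER (PARTITION BY type ORDER BY random()) AS rnk
      FROM v) AS sampled
WHERE rnk <= numrows
ORDER BY type, id
"""


def sample_rows(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE v (id INTEGER, type TEXT, numrows INTEGER)")
    conn.executemany("INSERT INTO v VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


print(sample_rows(SAMPLE))  # 3 random 'L' rows, then 2 random 'S' rows
```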