Aggregate data from multiple rows into single row

Aggregate data from multiple rows into single row - sql

In my table each row has some data columns Priority column (for example, timestamp or just an integer). I want to group my data by ID and then in each group take latest not-null column. For example I have following table:
id A B C Priority
1 NULL 3 4 1
1 5 6 NULL 2
1 8 NULL NULL 3
2 634 346 359 1
2 34 NULL 734 2
Desired result is :
id A B C
1 8 6 4
2 34 346 734
In this example table is small and has only 5 columns, but in real table it will be much larger. I really want this script to work fast. I tried do it myself, but my script works for SQLSERVER2012+ so I deleted it as not applicable.
Numbers: table could have 150k of rows, 20 columns, 20-80k of unique ids and average SELECT COUNT(id) FROM T GROUP BY ID is 2..5
Now I have a working code (thanks to #ypercubeᵀᴹ), but it runs very slowly on big tables, in my case script can take one minute or even more (with indices and so on).
How can it be speeded up?
SELECT
d.id,
d1.A,
d2.B,
d3.C
FROM
( SELECT id
FROM T
GROUP BY id
) AS d
OUTER APPLY
( SELECT TOP (1) A
FROM T
WHERE id = d.id
AND A IS NOT NULL
ORDER BY priority DESC
) AS d1
OUTER APPLY
( SELECT TOP (1) B
FROM T
WHERE id = d.id
AND B IS NOT NULL
ORDER BY priority DESC
) AS d2
OUTER APPLY
( SELECT TOP (1) C
FROM T
WHERE id = d.id
AND C IS NOT NULL
ORDER BY priority DESC
) AS d3 ;
In my test database with real amount of data I get following execution plan:

This should do the trick, everything raised to the power 0 will return 1 except null:
DECLARE #t table(id int,A int,B int,C int,Priority int)
INSERT #t
VALUES (1,NULL,3 ,4 ,1),
(1,5 ,6 ,NULL,2),(1,8 ,NULL,NULL,3),
(2,634 ,346 ,359 ,1),(2,34 ,NULL,734 ,2)
;WITH CTE as
(
SELECT id,
CASE WHEN row_number() over
(partition by id order by Priority*power(A,0) desc) = 1 THEN A END A,
CASE WHEN row_number() over
(partition by id order by Priority*power(B,0) desc) = 1 THEN B END B,
CASE WHEN row_number() over
(partition by id order by Priority*power(C,0) desc) = 1 THEN C END C
FROM #t
)
SELECT id, max(a) a, max(b) b, max(c) c
FROM CTE
GROUP BY id
Result:
id a b c
1 8 6 4
2 34 346 734

One alternative that might be faster is a multiple join approach. Get the priority for each column and then join back to the original table. For the first part:
select id,
max(case when a is not null then priority end) as pa,
max(case when b is not null then priority end) as pb,
max(case when c is not null then priority end) as pc
from t
group by id;
Then join back to this table:
with pabc as (
select id,
max(case when a is not null then priority end) as pa,
max(case when b is not null then priority end) as pb,
max(case when c is not null then priority end) as pc
from t
group by id
)
select pabc.id, ta.a, tb.b, tc.c
from pabc left join
t ta
on pabc.id = ta.id and pabc.pa = ta.priority left join
t tb
on pabc.id = tb.id and pabc.pb = tb.priority left join
t tc
on pabc.id = tc.id and pabc.pc = tc.priority ;
This can also take advantage of an index on t(id, priority).

previous code will work with following syntax:
with pabc as (
select id,
max(case when a is not null then priority end) as pa,
max(case when b is not null then priority end) as pb,
max(case when c is not null then priority end) as pc
from t
group by id
)
select pabc.Id,ta.a, tb.b, tc.c
from pabc
left join t ta on pabc.id = ta.id and pabc.pa = ta.priority
left join t tb on pabc.id = tb.id and pabc.pb = tb.priority
left join t tc on pabc.id = tc.id and pabc.pc = tc.priority ;

This looks rather strange. You have a log table for all column changes, but no associated table with current data. Now you are looking for a query to collect your current values from the log table, which is a laborious task naturally.
The solution is simple: have an additional table with the current data. You can even link the tables with a trigger (so either every time a record gets inserted in your log table you update the current table or everytime a change is written to the current table you write a log entry).
Then just query your current table:
select id, a, b, c from currenttable order by id;

Related

Select column with maximum value in another column but with aggregate SUM calculation

For each name, I need to output the category with the MAX net revenue and I am not sure how to do this. I have tried a bunch of different approaches, but it basically looks like this:
SELECT Name, Category, MAX(CatNetRev)
FROM (
SELECT Name, Category, SUM(Price*(Shipped-Returned)) AS "CatNetRev"
FROM a WITH (NOLOCK)
INNER JOIN b WITH (NOLOCK) ON b.ID = a.ID
...
-- (bunch of additional joins here, not relevant to question)
WHERE ... -- (not relevant to question)
GROUP BY Name, Category
) a GROUP BY Name;
This currently doesn't work because "Category" is not contained in an aggregate function or Group By (and this is obvious) but other approaches I have tried have failed for different reasons.
Each Name can have a bunch of different Categories, and Names can have the same Categories but the overlap is irrelevant to the question. I need to output just each unique Name that I have (we can assume they are already all unique) along with the "Top Selling Category" based on that Net Revenue calculation.
So for example if I have:
Name:
Category:
"CatNetRev":
A
1
100
A
2
300
A
3
50
B
1
300
B
2
500
C
1
40
C
2
20
C
3
10
I would want to output:
Name:
Category:
A
2
B
2
C
1
What's the best way to go about doing this?

Having to guess at your data schema a bit, as you didn't alias any of your columns, or define what table a vs b really was (as Gordon alluded). I'd use CROSS APPLY to get the max value, then bind the revenues in a WHERE clause, like so.
DECLARE #Revenue TABLE
(
Name VARCHAR(50)
,Category VARCHAR(50)
,NetRevenue DECIMAL(16, 9)
);
INSERT INTO #Revenue
(
Name
,Category
,NetRevenue
)
SELECT Name
,Category
,SUM(a.Price * (b.Shipped - b.Returned)) AS CatNetRev
FROM Item AS a
INNER JOIN ShipmentDetails AS b ON b.ID = a.ID
WHERE 1 = 1
GROUP BY
Name
,Category;
SELECT r.Name
,r.Category
FROM #Revenue AS r
CROSS APPLY (
SELECT MAX(r2.NetRevenue) AS MaxRevenue
FROM #Revenue AS r2
WHERE r.Name = r2.Name
) AS mr
WHERE r.NetRevenue = mr.MaxRevenue;

you can use window functions:
select * from
(
select * , rank() over (partition by Name order by CatNetRev desc) rn
from table
) t
where t.rn = 1

How to use multiple counts in where clause to compare data of a table in sql?

I want to compare data of a table with its other records. The count of rows with a specific condition has to match the count of rows without the where clause but on the same grouping.
Below is the table
-------------
id name time status
1 John 10 C
2 Alex 10 R
3 Dan 10 C
4 Tim 11 C
5 Tom 11 C
Output should be time = 11 as the count for grouping on time column is different when a where clause is added on status = 'C'
SELECT q1.time
FROM (SELECT time,
Count(id)
FROM table
GROUP BY time) AS q1
INNER JOIN (SELECT time,
Count(id)
FROM table
WHERE status = 'C'
GROUP BY time) AS q2
ON q1.time = q2.time
WHERE q1.count = q2.count
This is giving the desired output but is there a better and efficient way to get the desired result?

Are you looking for this :
select t.*
from table t
where not exists (select 1 from table t1 where t1.time = t.time and t1.status <> 'C');
However you can do :
select time
from table t
group by time
having sum (case when status <> 'c' then 1 else 0 end ) = 0;

If you want the times where the rows all satisfy the where clause, then in Postgres, you can express this as:
select time
from t
group by time
having count(*) = count(*) filter (where status = 'C');

Group parents with same children

EDIT: This is way harder to explain that I though, constatly editing based on comments. Thank you all for taking interest.
I have a table like this
ID Type ParentID
1 ChildTypeA 1
2 ChildTypeB 1
3 ChildTypeC 1
4 ChildTypeD 1
5 ChildTypeA 2
6 ChildTypeB 2
7 ChildTypeC 2
8 ChildTypeA 3
9 ChildTypeB 3
10 ChildTypeC 3
11 ChildTypeD 3
12 ChildTypeA 4
13 ChildTypeB 4
14 ChildTypeC 4
and I want to group parents that have same children - meaning same number of children of same type.
From parent point of view, there is a finite set of possible configurations (max 10).
If any parent has same set of children (by ChildType), I want to group them together (in what I call a configuration).
ChildTypeA-D = ConfigA
ChildTypeA-C = ConfigB
ChildTypeA, B, E, F = ConfigX
etc.
The output I need is parents grouped by Configurations.
Config Group ParentID
ConfigA 1
ConfigA 3
ConfigB 2
ConfigB 4
I have no idea where to even begin.

I named your table t. Please try if this is what you are looking for.
It's show matched and unmatched.
It's looking for parentids with the same number of rows (t1.cnt = t2.cnt) and that all the rows are matched (having COUNT(*) = t1.cnt).
You can try it here
;with t1 as (select parentid, type, id, count(*) over (partition by parentid order by parentid) cnt from t),
t3 as
(
select t1.parentid parentid1, t2.parentid parentid2, count(*) cn, t1.cnt cnt1, t2.cnt cnt2, ROW_NUMBER () over (order by t1.parentid) rn
from t1 join t1 as t2 on t1.type = t2.type and t1.parentid <> t2.parentid and t1.cnt = t2.cnt
group by t1.parentid, t2.parentid, t1.cnt, t2.cnt
having COUNT(*) = t1.cnt
),
notFound as (
select t1.parentid, ROW_NUMBER() over(order by t1.parentid) rn
from t1
where not exists (select 1 from t3 where t1.parentid = t3.parentid1)
group by t1.parentid
)
select 'Config'+char((select min(rn)+64 from t3 as t4 where t3.parentid1 in (t4.parentid1 , t4.parentid2))) config, t3.parentid1
from t3
union all
select 'Config'+char((select max(rn)+64+notFound.rn from t3)) config, notFound.parentid
from notFound
OUTPUT
config parentid1
ConfigA 1
ConfigA 3
ConfigB 2
ConfigB 4
If id 14 was ChildTypeZ then parentid 2 and 4 wouldn't match. This would be the output:
config parentid1
ConfigA 1
ConfigA 3
ConfigC 2
ConfigD 4

I have happen to have similar task. The data I'm working with is a bit bigger scale so I had to find an effective approach to this. Basically I've found 2 working approaches.
One is pure SQL - here's a core query. Basically it gives you smallest ParentID with same collection of children, which you can then use as a group id (you can also enumerate it with row_number). As a small note - I'm using cte here, but in real world I'd suggest to put grouped parents into temporary table and add indexes on the table as well.
;with cte_parents as (
-- You can also use different statistics to narrow the search
select
[ParentID],
count(*) as cnt,
min([Type]) as min_Type,
max([Type]) as max_Type
from Table1
group by
[ParentID]
)
select
h1.ParentID,
k.ParentID as GroupID
from cte_parents as h1
outer apply (
select top 1
h2.[ParentID]
from cte_parents as h2
where
h2.cnt = h1.cnt and
h2.min_Type = h1.min_Type and
h2.max_Type = h1.max_Type and
not exists (
select *
from (select tt.[Type] from Table1 as tt where tt.[ParentID] = h2.[ParentID]) as tt1
full join (select tt.[Type] from Table1 as tt where tt.[ParentID] = h1.[ParentID]) as tt2 on
tt2.[Type] = tt1.[Type]
where
tt1.[Type] is null or tt2.[Type] is null
)
order by
h2.[ParentID]
) as k
ParentID GroupID
----------- --------------
1 1
2 2
3 1
4 2
Another one is a bit trickier and you have to be careful when using it. But surprisingly, it works not so bad. The idea is to concatenate children into big string and then group by these strings. You can use any available concatenation method (xml trick or clr if you have SQL Server 2017). The important part is that you have to use ordered concatenation so every string will represent your group precisely. I have created a special CLR function (dbo.f_ConcatAsc) for this.
;with cte1 as (
select
ParentID,
dbo.f_ConcatAsc([Type], ',') as group_data
from Table1
group by
ParentID
), cte2 as (
select
dbo.f_ConcatAsc(ParentID, ',') as parent_data,
group_data,
row_number() over(order by group_data) as rn
from cte1
group by
group_data
)
select
cast(p.value as int) as ParentID,
c.rn as GroupID,
c.group_data
from cte2 as c
cross apply string_split(c.parent_data, ',') as p
ParentID GroupID group_data
----------- -------------------- --------------------------------------------------
2 1 ChildTypeA,ChildTypeB,ChildTypeC
4 1 ChildTypeA,ChildTypeB,ChildTypeC
1 2 ChildTypeA,ChildTypeB,ChildTypeC,ChildTypeD
3 2 ChildTypeA,ChildTypeB,ChildTypeC,ChildTypeD

Consolidate records

I want to consolidate a set of records
(id) / (referencedid)
1 10
1 11
2 11
2 10
3 10
3 11
3 12
The result of query should be
1 10
1 11
3 10
3 11
3 12
So, since id=1 and id=2 has same set of corresponding referenceids {10,11} they would be consolidated. But id=3 s corresponding referenceids are not the same, hence wouldnt be consolidated.
What would be good way to get this done?

Select id, referenceid
From MyTable
Where Id In (
Select Min( Z.Id ) As Id
From (
Select Z1.id, Group_Concat( Z1.referenceid ) As signature
From (
Select id, referenceid
From MyTable
Order By id, referenceid
) As Z1
Group By Z1.id
) As Z
Group By Z.Signature
)

-- generate count of elements for each distinct id
with Counts as (
select
id,
count(1) as ReferenceCount
from
tblReferences R
group by
R.id
)
-- generate every pairing of two different id's, along with
-- their counts, and how many are equivalent between the two
,Pairings as (
select
R1.id as id1
,R2.id as id2
,C1.ReferenceCount as count1
,C2.ReferenceCount as count2
,sum(case when R1.referenceid = R2.referenceid then 1 else 0 end) as samecount
from
tblReferences R1 join Counts C1 on R1.id = C1.id
cross join
tblReferences R2 join Counts C2 on R2.id = C2.id
where
R1.id < R2.id
group by
R1.id, C1.ReferenceCount, R2.id, C2.ReferenceCount
)
-- generate the list of ids that are safe to remove by picking
-- out any id's that have the same number of matches, and same
-- size of list, which means their reference lists are identical.
-- since id2 > id, we can safely remove id2 as a copy of id, and
-- the smallest id of which all id2 > id are copies will be left
,RemovableIds as (
select
distinct id2 as id
from
Pairings P
where
P.count1 = P.count2 and P.count1 = P.samecount
)
-- validate the results by just selecting to see which id's
-- will be removed. can also include id in the query above
-- to see which id was identified as the copy
select id from RemovableIds R
-- comment out `select` above and uncomment `delete` below to
-- remove the records after verifying they are correct!
--delete from tblReferences where id in (select id from RemovableIds) R

How to find "holes" in a table

I recently inherited a database on which one of the tables has the primary key composed of encoded values (Part1*1000 + Part2).
I normalized that column, but I cannot change the old values.
So now I have
select ID from table order by ID
ID
100001
100002
101001
...
I want to find the "holes" in the table (more precisely, the first "hole" after 100000) for new rows.
I'm using the following select, but is there a better way to do that?
select /* top 1 */ ID+1 as newID from table
where ID > 100000 and
ID + 1 not in (select ID from table)
order by ID
newID
100003
101029
...
The database is Microsoft SQL Server 2000. I'm ok with using SQL extensions.

select ID +1 From Table t1
where not exists (select * from Table t2 where t1.id +1 = t2.id);
not sure if this version would be faster than the one you mentioned originally.

SELECT (ID+1) FROM table AS t1
LEFT JOIN table as t2
ON t1.ID+1 = t2.ID
WHERE t2.ID IS NULL

This solution should give you the first and last ID values of the "holes" you are seeking. I use this in Firebird 1.5 on a table of 500K records, and although it does take a little while, it gives me what I want.
SELECT l.id + 1 start_id, MIN(fr.id) - 1 stop_id
FROM (table l
LEFT JOIN table r
ON l.id = r.id - 1)
LEFT JOIN table fr
ON l.id < fr.id
WHERE r.id IS NULL AND fr.id IS NOT NULL
GROUP BY l.id, r.id
For example, if your data looks like this:
ID
1001
1002
1005
1006
1007
1009
1011
You would receive this:
start_id stop_id
1003 1004
1008 1008
1010 1010
I wish I could take full credit for this solution, but I found it at Xaprb.

from How do I find a "gap" in running counter with SQL?
select
MIN(ID)
from (
select
100001 ID
union all
select
[YourIdColumn]+1
from
[YourTable]
where
--Filter the rest of your key--
) foo
left join
[YourTable]
on [YourIdColumn]=ID
and --Filter the rest of your key--
where
[YourIdColumn] is null

The best way is building a temp table with all IDs
Than make a left join.
declare #maxId int
select #maxId = max(YOUR_COLUMN_ID) from YOUR_TABLE_HERE
declare #t table (id int)
declare #i int
set #i = 1
while #i <= #maxId
begin
insert into #t values (#i)
set #i = #i +1
end
select t.id
from #t t
left join YOUR_TABLE_HERE x on x.YOUR_COLUMN_ID = t.id
where x.YOUR_COLUMN_ID is null

Have thought about this question recently, and looks like this is the most elegant way to do that:
SELECT TOP(#MaxNumber) ROW_NUMBER() OVER (ORDER BY t1.number)
FROM master..spt_values t1 CROSS JOIN master..spt_values t2
EXCEPT
SELECT Id FROM <your_table>

This solution doesn't give all holes in table, only next free ones + first available max number on table - works if you want to fill in gaps in id-es, + get free id number if you don't have a gap..
select numb + 1 from temp
minus
select numb from temp;

This will give you the complete picture, where 'Bottom' stands for gap start and 'Top' stands for gap end:
select *
from
(
(select <COL>+1 as id, 'Bottom' AS 'Pos' from <TABLENAME> /*where <CONDITION*/>
except
select <COL>, 'Bottom' AS 'Pos' from <TABLENAME> /*where <CONDITION>*/)
union
(select <COL>-1 as id, 'Top' AS 'Pos' from <TABLENAME> /*where <CONDITION>*/
except
select <COL>, 'Top' AS 'Pos' from <TABLENAME> /*where <CONDITION>*/)
) t
order by t.id, t.Pos
Note: First and Last results are WRONG and should not be regarded, but taking them out would make this query a lot more complicated, so this will do for now.

Many of the previous answer are quite good. However they all miss to return the first value of the sequence and/or miss to consider the lower limit 100000. They all returns intermediate holes but not the very first one (100001 if missing).
A full solution to the question is the following one:
select id + 1 as newid from
(select 100000 as id union select id from tbl) t
where (id + 1 not in (select id from tbl)) and
(id >= 100000)
order by id
limit 1;
The number 100000 is to be used if the first number of the sequence is 100001 (as in the original question); otherwise it is to be modified accordingly
"limit 1" is used in order to have just the first available number instead of the full sequence

For people using Oracle, the following can be used:
select a, b from (
select ID + 1 a, max(ID) over (order by ID rows between current row and 1 following) - 1 b from MY_TABLE
) where a <= b order by a desc;

The following SQL code works well with SqLite, but should be used without issues also on MySQL, MS SQL and so on.
On SqLite this takes only 2 seconds on a table with 1 million rows (and about 100 spared missing rows)
WITH holes AS (
SELECT
IIF(c2.id IS NULL,c1.id+1,null) as start,
IIF(c3.id IS NULL,c1.id-1,null) AS stop,
ROW_NUMBER () OVER (
ORDER BY c1.id ASC
) AS rowNum
FROM |mytable| AS c1
LEFT JOIN |mytable| AS c2 ON c1.id+1 = c2.id
LEFT JOIN |mytable| AS c3 ON c1.id-1 = c3.id
WHERE c2.id IS NULL OR c3.id IS NULL
)
SELECT h1.start AS start, h2.stop AS stop FROM holes AS h1
LEFT JOIN holes AS h2 ON h1.rowNum+1 = h2.rowNum
WHERE h1.start IS NOT NULL AND h2.stop IS NOT NULL
UNION ALL
SELECT 1 AS start, h1.stop AS stop FROM holes AS h1
WHERE h1.rowNum = 1 AND h1.stop > 0
ORDER BY h1.start ASC

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Aggregate data from multiple rows into single row - sql

Related

Select column with maximum value in another column but with aggregate SUM calculation

How to use multiple counts in where clause to compare data of a table in sql?

Group parents with same children

Consolidate records

How to find "holes" in a table

Categories

Resources