SQL Server : group by sum of column - sql

I need to aggregate data by one column which contains numeric data.
I have data like:
ID | Amount
---+-------
1 | 44
2 | 15
3 | 16
4 | 8
5 | 16
Result, which I expect is:
ID | Amount
---+-------
1 | 44
2 | 31
4 | 24
Query should group data ordered by ID column by Amount column in parts of max sum of amount 32. If amount is greater then 32 then it should be presented as one 'group'. Result should contain Min(ID) and SUM(Amount) which can't be greater than 32 when group more than one record.

The only way that I know how to accomplish this is using iteration (although in your case if you have enough single values over 32, then you might be able to use a more efficient approach).
Iteration in SQL Server queries is handled by recursive CTEs (once you forswear cursors):
with v as (
select *
from (values (1, 44), (2, 15), (3, 16), (4, 8), (5, 16) ) v(id, amount)
),
t as (
select v.*, row_number() over (order by id) as seqnum
),
cte as (
select seqnum, id, amount, id as grp
from t
where seqnum = 1
union all
select t.seqnum, t.id,
(case when t.amount + cte.amount > 32 then t.amount else t.amount + cte.amount end) as amount,
(case when t.amount + cte.amount > 32 then t.id else cte.grp end) as grp
from cte join
t
on cte.seqnum = t.seqnum + 1
)
select grp, max(amount)
from cte
group by grp;
I should note that the use of max(amount) in the outer query assumes that the values are never negative. A slight modification can handle that situation.
Also, the intermediate result using t is not strictly necessary for the data you have provided. It ensures that the columns used in the join actually have no gaps.

You can try this version with rownumbers assigned initially and each row is joined to the previous one in a recursive cte. And if the running sum > 32 a new group starts.
with rownums as (select t.*,row_number() over(order by id) as rnum from t)
,cte(rnum,id,amount,runningsum,grp) as (select rnum,id,amount,amount,1 from rownums where rnum=1
union all
select t.rnum,t.id,t.amount
,case when c.runningsum+t.amount > 32 then t.amount else c.runningsum+t.amount end
,case when c.runningsum+t.amount > 32 then t.id else c.grp end
from cte c
join rownums t on t.rnum=c.rnum+1
)
select grp as id,max(runningsum) as amount
from cte
group by grp
Sample Demo

Related

SELECT SQL Matching Number

I have millions of rows of data that have similar values ​​like this:
Id Reff Amount
1 a1 1000
2 a2 -1000
3 a3 -2500
4 a4 -1500
5 a5 1500
every data must have positive and negative values. the question is, how do I show only records that don't have a similar value? like a row Id 3. thanks for help
You can use not exists:
select t.*
from mytable t
where not exists (select 1 from mytable t1 where t1.amount = -1 * t.amount)
A left join antipattern would also get the job done:
select t.*
from mytable t
left join mytable t1 on t1.amount = -1 * t.amount
where t1.id is null
Demo on DB Fiddle:
Id | Reff | Amount
-: | :--- | -----:
3 | a3 | -2500
SQL Fiddle
MS SQL Server 2017 Schema Setup:
CREATE TABLE Test(
Id int
,Reff varchar(2)
,Amount int
);
INSERT INTO Test(Id,Reff,Amount) VALUES (1,'a1',1000);
INSERT INTO Test(Id,Reff,Amount) VALUES (2,'a2',-1000);
INSERT INTO Test(Id,Reff,Amount) VALUES (3,'a3',-2500);
INSERT INTO Test(Id,Reff,Amount) VALUES (4,'a4',-1500);
INSERT INTO Test(Id,Reff,Amount) VALUES (5,'a5',1500);
Query 1:
select t.*
from Test t
left join Test t1 on t1.amount =ABS(t.amount)
where t1.id is null
Results:
| Id | Reff | Amount |
|----|------|--------|
| 3 | a3 | -2500 |
Using a NOT EXISTS or a LEFT JOIN will work fine to find the amounts that don't have an opposite amount in the data.
But to really find the amounts that don't balance out with an Amount sorted by ID?
For such SQL puzzle it should be handled as a Gaps-And-Islands problem.
So the solution might appear a bit more complicated, but it's actually quite simple.
It first calculates a ranking per absolute value.
And based on that ranking it filters the last amount where the SUM per ranking isn't balanced out (not 0)
SELECT Id, Reff, Amount
FROM
(
SELECT *,
SUM(Amount) OVER (PARTITION BY Rnk) AS SumAmountByRank,
ROW_NUMBER() OVER (PARTITION BY Rnk ORDER BY Id DESC) AS Rn
FROM
(
SELECT Id, Reff, Amount,
ROW_NUMBER() OVER (ORDER BY Id) - ROW_NUMBER() OVER (PARTITION BY ABS(Amount) ORDER BY Id) AS Rnk
FROM YourTable
) AS q1
) AS q2
WHERE SumAmountByRank != 0
AND Rn = 1
ORDER BY Id;
A test on rextester here
If the sequence doesn't matter, and just the balance matters?
Then the query can be simplified.
SELECT Id, Reff, Amount
FROM
(
SELECT Id, Reff, Amount,
SUM(Amount) OVER (PARTITION BY ABS(Amount)) AS SumByAbsAmount,
ROW_NUMBER() OVER (PARTITION BY ABS(Amount) ORDER BY Id DESC) AS Rn
FROM YourTable
) AS q
WHERE SumByAbsAmount != 0
AND Rn = 1
ORDER BY Id;

SQL: count last equal values

I need to solve this problem in pure SQL:
I have to count all the records with a specific value:
In my table there is a column flag with values 0 or 1. I need to count all the 1 after last 0 and sum the amount column values of those records.
Example:
Flag | Amount
0 | 5
1 | 8
0 | 10
1 | 20
1 | 30
Output:
2 | 50
If last value is 0 I don't need to do anything.
I hasten that I need to perform a fast query (possibly accessing just one time).
I assumed that your example table is logically ordered by Amount. Then you can do this:
select
count(*) as cnt
,sum(Amount) as Amount
from yourTable
where Amount > (select max(Amount) from yourTable where Flag = 0)
If the biggest value is from a row where Flag = 0 then nothing will be returned.
If your table may not contain any zeros, then you are safer with:
select count(*) as cnt, sum(Amount) as Amount
from t
where Amount > all (select Amount from t where Flag = 0)
Or, using window functions:
select count(*) as cnt, sum(amount) as amount
from (select t.*, max(case when flag = 0 then amount end) as flag0_amount
from t
) t
where flag0_amount is null or amount > flag0_amount
I find the solution by myself:
select decode(lv,0,0,tot-prog) somma ,decode(lv,0,0,cnt-myrow) count
from(
select * from
(
select pan,dt,flag,am,
last_value(flag) over() lv,
row_number() OVER (order by dt) AS myrow,
count(*) over() cnt,
case when lead(flag) OVER (ORDER BY dt) != flag then rownum end AS change,
sum(am) over() tot,
sum(am) over(order by dt) prog
from test
where pan=:pan and dt > :dt and flag is not null
order by dt
) t
where change is not null
order by change desc
) where rownum =1

Efficient way to cluster a timeline OR reconstruct a batch number

I'm working on a large dataset (150k / day) of a tester database. Each row contains data about a specific test of the product. Each tester inserts the results of his test.
I want to do some measurements like pass-fail-rate over a shift per product and tester. The problem is there are no batch numbers assigned so I can't select this easy.
Considering the given subselect of the whole table:
id tBegin orderId
------------------------------------
1 2018-10-20 00:00:05 1
2 2018-10-20 00:05:15 1
3 2018-10-20 01:00:05 1
10 2018-10-20 10:03:05 3
12 2018-10-20 11:04:05 8
20 2018-10-20 14:15:05 3
37 2018-10-20 18:12:05 1
My goal is it to cluster the data to the following
id tBegin orderId pCount
--------------------------------------------
1 2018-10-20 00:00:05 1 3
10 2018-10-20 10:03:05 3 1
12 2018-10-20 11:04:05 8 1
20 2018-10-20 14:15:05 3 1
37 2018-10-20 18:12:05 1 1
A simple GROUP BY orderID won't do the trick, so I came upwith the following
SELECT
MIN(c.id) AS id,
MIN(c.tBegin) AS tBegin,
c.orderId,
COUNT(*) AS pCount
FROM (
SELECT t2.id, t2.tBegin, t2.orderId,
( SELECT TOP 1 t.id
FROM history t
WHERE t.tBegin > t2.tBegin
AND t.orderID <> t2.orderID
AND <restrict date here further>
ORDER BY t.tBegin
) AS nextId
FROM history t2
) AS c
WHERE <restrict date here>
GROUP BY c.orderID, c.nextId
I left out the WHEREs that select the correct date and tester.
This works, but it seams very inefficient. I have worked with small databases, but I'm new to SQL Server 2017.
I appreciate your help very much!
You can use window functions for this:
DECLARE #t TABLE (id INT, tBegin DATETIME, orderId INT);
INSERT INTO #t VALUES
(1 , '2018-10-20 00:00:05', 1),
(2 , '2018-10-20 00:05:15', 1),
(3 , '2018-10-20 01:00:05', 1),
(10, '2018-10-20 10:03:05', 3),
(12, '2018-10-20 11:04:05', 8),
(20, '2018-10-20 14:15:05', 3),
(37, '2018-10-20 18:12:05', 1);
WITH cte1 AS (
SELECT *, CASE WHEN orderId = LAG(orderId) OVER (ORDER BY tBegin) THEN 0 ELSE 1 END AS chg
FROM #t
), cte2 AS (
SELECT *, SUM(chg) OVER(ORDER BY tBegin) AS grp
FROM cte1
), cte3 AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY grp ORDER BY tBegin) AS rn
FROM cte2
)
SELECT *
FROM cte3
WHERE rn = 1
The first cte assigns a "change flag" to each row where the value changed
The second cte uses a running sum to convert 1s and 0s to a number which can be used to group rows
Finally you number rows within each group and select first row per group
Demo on DB Fiddle
You can use cumulative approach :
select min(id) as id, max(tBegin), orderid, count(*)
from (select h.*,
row_number() over (order by id) as seq1,
row_number() over (partition by orderid order by id) as seq2
from history h
) h
group by orderid, (seq1 - seq2)
order by id;

Finding Missing Numbers series when Data Is Grouped in sql server

I need to write a query that will calculate the missing numbers with their count in a sequence when the data is "grouped". The data are in multiple groups & each group is in sequence.
For Ex. I have number series like 1001-1050, 1245-1270, 4571-4590 and all numbers like 1001,1002,1003,....1050 is stored in Table1 and from that Table1 some numbers are stored in another table Table2. E.g. 1001,1002,1003,1004,1005.
I want to get output like this:
Utilized Numbers | Balance Numbers |
----------- -------------------------
1001 - 1005 = 5 | 1006 - 1050 = 45 |
1245 - 1251 = 7 | 1252 - 1270 = 19 |
4571 - 4573 = 3 | 4574 - 4590 = 17 |
The number of each series is single field which is stored in both tables.
You haven't really explained your data, but guessing that "Utilized" are the numbers found in both Table1 and Table2, and "Balance" are the numbers only in Table1.
You can get the result at least this way, it's a little bit messy, mostly because of formatting the results:
Edit: This is a new version that does not use lag.
select
min (case when C2 = 1 then MINID end), max (case when C2 = 1 then MAXID end), max(case when C2=1 then ROWS end),
min (case when C2 = 0 then MINID end), max (case when C2 = 0 then MAXID end), max(case when C2=0 then ROWS end)
from (
select min(ID) as MINID, max(ID) as MAXID, count(*) as ROWS, C2, row_number() over (partition by C2 order by min(ID)) as GRP3 from (
select *, ID - RN as GRP1, ID - RN2 as GRP2 from (
select
T1.ID, row_number() over (order by T1.ID) as RN,
case when T2.ID is NULL then 0 else 1 end as C2,
row_number() over (partition by case when T2.ID is NULL then 0 else 1 end order by T1.ID) as RN2,
T2.ID as ID2
from #Table1 T1
left outer join #Table2 T2 on T1.ID = T2.ID
) X
) Y
group by GRP1, GRP2, C2
) Z
group by GRP3
order by 1
The idea here is to have a row number ordered by Table1.ID, and it's compared to the Table1.ID, and if the difference changes, then it's a new group. The same logic is used second time, but now partitioned differently for rows that exist in Table2 to handle changes between "Utilized" and "Balance".
From those groupings you can get the min and max value + number of rows. There's one additional grouping with min/max and case to format the result into 2 columns.
See the demo.

SELECT records until new value SQL

I have a table
Val | Number
08 | 1
09 | 1
10 | 1
11 | 3
12 | 0
13 | 1
14 | 1
15 | 1
I need to return the last values where Number = 1 (however many that may be) until Number changes, but do not need the first instances where Number = 1. Essentially I need to select back until Number changes to 0 (15, 14, 13)
Is there a proper way to do this in MSSQL?
Based on following:
I need to return the last values where Number = 1
Essentially I need to select back until Number changes to 0 (15, 14,
13)
Try (Fiddle demo ):
select val, number
from T
where val > (select max(val)
from T
where number<>1)
EDIT: to address all possible combinations (Fiddle demo 2)
;with cte1 as
(
select 1 id, max(val) maxOne
from T
where number=1
),
cte2 as
(
select 1 id, isnull(max(val),0) maxOther
from T
where val < (select maxOne from cte1) and number<>1
)
select val, number
from T cross join
(select maxOne, maxOther
from cte1 join cte2 on cte1.id = cte2.id
) X
where val>maxOther and val<=maxOne
I think you can use window functions, something like this:
with cte as (
-- generate two row_number to enumerate distinct groups
select
Val, Number,
row_number() over(partition by Number order by Val) as rn1,
row_number() over(order by Val) as rn2
from Table1
), cte2 as (
-- get groups with Number = 1 and last group
select
Val, Number,
rn2 - rn1 as rn1, max(rn2 - rn1) over() as rn2
from cte
where Number = 1
)
select Val, Number
from cte2
where rn1 = rn2
sql fiddle demo
DEMO: http://sqlfiddle.com/#!3/e7d54/23
DDL
create table T(val int identity(8,1), number int)
insert into T values
(1),(1),(1),(3),(0),(1),(1),(1),(0),(2)
DML
; WITH last_1 AS (
SELECT Max(val) As val
FROM t
WHERE number = 1
)
, last_non_1 AS (
SELECT Coalesce(Max(val), -937) As val
FROM t
WHERE EXISTS (
SELECT val
FROM last_1
WHERE last_1.val > t.val
)
AND number <> 1
)
SELECT t.val
, t.number
FROM t
CROSS
JOIN last_1
CROSS
JOIN last_non_1
WHERE t.val <= last_1.val
AND t.val > last_non_1.val
I know it's a little verbose but I've deliberately kept it that way to illustrate the methodolgy.
Find the highest val where number=1.
For all values where the val is less than the number found in step 1, find the largest val where the number<>1
Finally, find the rows that fall within the values we uncovered in steps 1 & 2.
select val, count (number) from
yourtable
group by val
having count(number) > 1
The having clause is the key here, giving you all the vals that have more than one value of 1.
This is a common approach for getting rows until some value changes. For your specific case use desc in proper spots.
Create sample table
select * into #tmp from
(select 1 as id, 'Alpha' as value union all
select 2 as id, 'Alpha' as value union all
select 3 as id, 'Alpha' as value union all
select 4 as id, 'Beta' as value union all
select 5 as id, 'Alpha' as value union all
select 6 as id, 'Gamma' as value union all
select 7 as id, 'Alpha' as value) t
Pull top rows until value changes:
with cte as (select * from #tmp t)
select * from
(select cte.*, ROW_NUMBER() over (order by id) rn from cte) OriginTable
inner join
(
select cte.*, ROW_NUMBER() over (order by id) rn from cte
where cte.value = (select top 1 cte.value from cte order by cte.id)
) OnlyFirstValueRecords
on OriginTable.rn = OnlyFirstValueRecords.rn and OriginTable.id = OnlyFirstValueRecords.id
On the left side we put an original table. On the right side we put only rows whose value is equal to the value in first line.
Records in both tables will be same until target value changes. After line #3 row numbers will get different IDs associated because of the offset and will never be joined with original table:
LEFT RIGHT
ID Value RN ID Value RN
1 Alpha 1 | 1 Alpha 1
2 Alpha 2 | 2 Alpha 2
3 Alpha 3 | 3 Alpha 3
----------------------- result set ends here
4 Beta 4 | 5 Alpha 4
5 Alpha 5 | 7 Alpha 5
6 Gamma 6 |
7 Alpha 7 |
The ID must be unique. Ordering by this ID must be same in both ROW_NUMBER() functions.