SQL Group Range Values - sql

I have taken a look at several other questions/answers on here but I cannot apply those to my problem. I am trying to identify multiple sequential breaks based on a key column. Most examples I have found do not deal with multiple breaks in a sequence for the same key column.
Sample data:
Location Number
------------------------
300 15
300 16
300 17
300 18
300 21
300 22
300 23
400 10
400 11
400 14
400 16
Here is the result I am looking for:
Location StartNumber StartNumber
------------------------------------------
300 15 18
300 21 23
400 10 11
400 14 14
400 16 16

Here's as relatively portable SQL solution since you didn't specify the DB
Create Table SampleData (Location int, Number Int)
INSERT INTO SampleData VALUES (300, 15)
INSERT INTO SampleData VALUES (300, 16)
INSERT INTO SampleData VALUES (300, 17)
INSERT INTO SampleData VALUES (300, 18)
INSERT INTO SampleData VALUES (300, 21)
INSERT INTO SampleData VALUES (300, 22)
INSERT INTO SampleData VALUES (300, 23)
INSERT INTO SampleData VALUES (400, 10)
INSERT INTO SampleData VALUES (400, 11)
INSERT INTO SampleData VALUES (400, 14)
INSERT INTO SampleData VALUES (400, 16)
SELECT
t1.Location,
t1.Number AS startofgroup,
MIN(t2.Number) AS endofgroup
FROM (SELECT Number , Location
FROM SampleData tbl1
WHERE NOT EXISTS(SELECT *
FROM SampleData tbl2
WHERE tbl1.Number - tbl2.Number = 1
and tbl1.Location = tbl2.Location)) t1
INNER JOIN (SELECT Number , Location
FROM SampleData tbl1
WHERE NOT EXISTS(SELECT *
FROM SampleData tbl2
WHERE tbl2.Number - tbl1.Number = 1
and tbl1.Location = tbl2.Location)) t2
ON t1.Number <= t2.Number
and t1.Location = t2.Location
GROUP BY
t1.Location,
t1.Number
ORDER BY
Location,
startofgroup
Output
Location startofgroup endofgroup
----------- ------------ -----------
300 15 18
300 21 23
400 10 11
400 14 14
400 16 16
Its a modified version of Listing 2. A set-based solution for identifying islands. From Islands and Gaps in Sequential Numbers by Alexander Kozak
If you're looking for more options with SQL Server 2005 and later you should search for the phrase "Itzik Ben-Gan gaps and islands"

Well, if you're using an RDBMS that supports the lag() function, then this should tell you where the breaks are. You should then be able to use this, along with some case statements and careful use of the min() and max() functions, to get the query that you want.
select location, lag_number as startnumber, number as endnumber
from(select location, number, lag_number
from(
select location, number
, lag(number) over (partition by location order by number) as lag_number
from table
)a
where number is not null and lag_number is not null
)b
where number-lag_number>1 order by 1,2,3;

Related

Create groups for INSERT based on IDs of source table

I'm trying to regroup some Ids, and create unique ID in a grouping table (acting as a bridge table in my data warehouse).
I got in a the source table transactionId, UserId and OrganisationId.
What I want to do is to create "groups" of OrganisationId, depending on User association, in order to use this group in a joint.
See below:
Original datas
TransaID UserId OrgaId
---------- -------- --------
24011035 1 180
24011035 1 19
24011040 2 89
24011064 3 89
24011070 4 19
24011082 4 180
24011106 5 89
24011106 5 180
24011107 6 180
Desired output
OrgaGroupId OrgaId
------------- --------
1 180
1 19
2 89
3 180
3 89
4 180
I've created 1 group for combination of Orga 180 and 19, as 2 users got it.
Two simple groups for OrgaId 89 and 180, as they appear only once associated to a user.
And finally, another group for 180 and 89, as it's a new combination.
What would be the T-SQL statements to achieve output exposed above?
declare #t table(TransaID int, UserId int, OrgaId int);
insert into #t(TransaID, UserId, OrgaId)
values
(24011035, 1, 180),
(24011035, 1, 19),
(24011040, 2, 89),
(24011064, 3, 89),
(24011070, 4, 19),
(24011082, 4, 180),
(24011106, 5, 89),
(24011106, 5, 180),
(24011107, 6, 180);
select /*straggr,*/ OrgaId, dense_rank() over(order by min(UserId)) as grpid
from
(
select a.*,
(select distinct concat(b.OrgaId, ',') from #t as b where b.UserId = a.UserId order by concat(b.OrgaId, ',') for xml path('')) as straggr
from #t as a
) as t
group by straggr, OrgaId;

Distribute values to several rows in SQL Server

I need help with SQL Server on how to distribute a row value to several rows with the same id. To illustrate,
Id = ProductInventoryCode
Qty = QuantityInStock
ForDistribution:
Id | Qty | TotalNoOfBranchesWithId
---+--------+-------------------------
1 | 40 | 2
2 | 33 | 3
3 | 21 | 2
A table that will receive the distributed values
Id | BranchCode | Qty | QtyFromForDistributionTable
-------------------------------------------------------
1 101 13 20
1 102 8 20
2 101 10 11
2 102 2 10
2 103 3 12
3 101 1 11
3 102 12 10
As much as possible the distribution should be near equal for each id and branches.
I got something like below, but somewhat got confused and lost path.
with rs as
(
select
r.*, cume.cumequantity,
coalesce(s.shipped, 0) AS shipped
from
tmpForDistribution r
cross apply
(SELECT SUM([QuantityInStock]) AS cumequantity
FROM tmpForDistribution r2
WHERE r2.ProductInventoryCode = r.ProductInventoryCode) cume
left join
(SELECT ProductInventoryCode, COUNT(ProductInventoryCode) AS shipped
FROM tmpDistributed s
GROUP BY s.ProductInventoryCode) s ON r.ProductInventoryCode = s.ProductInventoryCode
)
select
rs.ProductInventoryCode, rs.cumequantity, rs.QuantityInStock,
***"how to distribute"***
from rs
I'm currently using SQL Server 2008
Here's a sample screen output
The upper result is 145 Branches, below we use to distribute the ForDistributionQty field which is 3130, I am ending up with a fraction (DistVal = 21.586) which is not correct for this problem, it should be a whole number such as 21, however, if its just 21, then 21 x 145 is just 3045 which is shy of 85 units.
Here we distribute the values, and then make a final "adjustment" to the record which has the largest quantity (arbitrary). But at the end of the day, the math works and the distributed values are square.
Note: Not sure why in your sample why ID 2 did not get an even distribution
Declare #Table table (Id int,BranchCode int,Qty int)
Insert Into #Table values
(1, 101, 13),
(1, 102, 8),
(2, 101, 10),
(2, 102, 2),
(2, 103, 3),
(3, 101, 1),
(3, 102, 12)
Declare #Dist table (ID int,Qty int)
Insert Into #Dist values
(1,40),
(2,33),
(3,49)
;with cte0 as (
Select A.*
,ToDist = cast(D.Qty as int)
,DistVal = cast(D.Qty as int)/C.Cnt
,RN = Row_Number() over (Partition By A.ID Order By cast(D.Qty as int)/C.Cnt Desc,A.Qty Desc)
From #Table A
Join (Select ID,Cnt=count(*) from #Table Group By ID) C on A.ID=C.ID
Join #Dist D on A.ID=D.ID )
, cte1 as (
Select ID,AdjVal=Sum(DistVal)-max(ToDist) From cte0 Group By ID
)
Select A.ID
,A.BranchCode
,A.Qty
,DistVal = DistVal - case when A.RN<=abs(AdjVal) then 1*sign(AdjVal) else 0 end
From cte0 A
Join cte1 B on (A.ID=B.Id)
Order By 1,2
Returns
ID BranchCode Qty DistVal
1 101 13 20
1 102 8 20
2 101 10 11
2 102 2 11
2 103 3 11
3 101 1 24
3 102 12 25
If you can tolerate decimal values, a subquery seems to give a better query plan (tested on SQL 2014, with some sensible keys in place, this avoids a table spool and some additional index scans):
Declare #Table table (Id int,BranchCode int,Qty int, primary key(id, branchcode))
Insert Into #Table values
(1, 101, 13),
(1, 102, 8),
(2, 101, 10),
(2, 102, 2),
(2, 103, 3),
(3, 101, 1),
(3, 102, 12)
Declare #Dist table (ID int primary key,Qty int)
Insert Into #Dist values
(1,40),
(2,33),
(3,21)
SELECT
t.id
,t.BranchCode
,t.Qty
,(d.Qty / CAST((SELECT COUNT(*) as cnt FROM #table t2 where t.id = t2.id) AS decimal(10,2))) as DistributedQty
FROM #Table t
INNER JOIN #Dist d
ON d.id = t.Id
outputs:
Id BranchCode Qty DistributedQty
1 101 13 20.00000000000
1 102 82 20.00000000000
2 101 10 11.00000000000
2 102 21 11.00000000000
2 103 31 11.00000000000
3 101 11 10.50000000000
3 102 12 10.50000000000
If you need DistributedQty to be an int and retain remainders then I can't think of a better solution than #John Cappelletti's, noting that uneven quantities may not be as exactly even as you might hope (e.g. 32 distributed by three would result in a 12/10/10 distribution instead of an 11/11/10 distribution).

Add column with row number

I want to add a column to my select showing a set of number from say 1 to 4.
Example:
Select * gives me
Id Transaction
1 10
2 11
3 12
4 13
5 14
6 15
I want to add a column called "Flow". The result should be like this.
Id Transaction Flow
1 10 1
2 11 2
3 12 3
4 13 4
5 14 1
6 15 2
In this example the flow is from 1-4. Could be 1-n.
No particular relation between Id and Flow is needed.
If you're using SQL Server or other DBMS that allows ROW_NUMBER, you could do this:
CREATE TABLE #Tbl(Id INT, [Transaction] INT);
INSERT INTO #Tbl VALUES
(1, 10), (2, 11), (3, 12), (4, 13), (5, 14), (6, 15);
DECLARE #N INT = 4;
SELECT *,
Flow = 1 + ((ROW_NUMBER() OVER(ORDER BY Id) - 1) % #N)
FROM #Tbl
DROP TABLE #Tbl;
If you are using mySql.
Query
set #r := 0;
select Id, `Transaction`,
#r := (#r % 4) + 1 as Flow
from your_table_name
order by Id;
Demo
EDIT
Following sql query can be used irrespective of rdbms.
Query
select *, (
select ((count(*) - 1) % 4) + 1 as Flow
from your_table_name t2
where t1.Id >= t2.Id
) as Flow
from your_table_name t1;

Running total with maximum limit

I have a simple question but its kinda difficult for me to solve it.
I would like to have sum up a specific column till it reached a limit and resets it self. (SQL 2012)
Lets say the limit is 50
- List item
- Value Total
- 10 10
- 20 30
- 30 60
- 40 50 (60-limit) + the current row value
- 2 2
- 3 5
- 10 15
- 25 40
- 15 55
- 5 10 (55-limit) + the current row value
Thank you very much.
This should do it if you have SQL Server 2012 or later:
DECLARE #Table TABLE (Id INT, ListItem INT);
INSERT INTO #Table VALUES (1, 10);
INSERT INTO #Table VALUES (2, 20);
INSERT INTO #Table VALUES (3, 30);
INSERT INTO #Table VALUES (4, 40);
INSERT INTO #Table VALUES (5, 2);
INSERT INTO #Table VALUES (6, 3);
INSERT INTO #Table VALUES (7, 10);
INSERT INTO #Table VALUES (8, 25);
INSERT INTO #Table VALUES (9, 15);
INSERT INTO #Table VALUES (10, 5);
WITH RunningTotal AS (
SELECT Id, ListItem, SUM(ListItem) OVER (ORDER BY Id) % 50 AS RT FROM #Table)
SELECT
rt.Id,
rt.ListItem,
CASE WHEN rt.RT < rt2.RT THEN rt.RT + 50 ELSE rt.RT END AS RunningTotal
FROM
RunningTotal rt
LEFT JOIN RunningTotal rt2 ON rt2.Id = rt.Id - 1
ORDER BY
rt.Id;
The tricky bit is allowing the numbers to overflow the 50 one time, otherwise this would be trivial.
Results are:
Id LI RunningTotal
1 10 10
2 20 30
3 30 60
4 40 50
5 2 2
6 3 5
7 10 15
8 25 40
9 15 55
10 5 10
create table running_totals
(
id int identity(1,1),
val int
)
insert into running_totals
select 1 union all
select 20 union all
select 10 union all
select 30 union all
select 50 union all
select 10 union all
select 11 union all
select 22 union all
select 40 union all
select 60 union all
select 20 union all
select 10 union all
select 15
declare cur_run_tot cursor for select id,val from running_totals order by id asc
declare #id int ,#val int,#runtot int
open cur_run_tot
create table #RunTot
(
id int,val int, runtot int
)
fetch next from cur_run_tot into #id,#val
while ##FETCH_STATUS = 0
begin
if #runtot is null or #runtot+#val > 50
set #runtot = #val
else
set #runtot = #runtot+ #val
insert into #RunTot
select #id,#val,#runtot
fetch next from cur_run_tot into #id,#val
end
select id as ID, val as Current_Value, runtot as Running_Total from #RunTot
drop table #RunTot
deallocate cur_run_tot

Finding Missing Numbers When Data Is Grouped In SQL Server

I need to to write a query that will calculate the missing numbers in a sequence when the data is "grouped". The data in each group is in sequence, but each individual group would have its own sequence. The data would look something like this:
Id| Number|
-----------
1 | 250 |
1 | 270 | <260 Missing
1 | 280 | <290 Missing
1 | 300 |
1 | 310 |
2 | 110 |
2 | 130 | <120 Missing
2 | 140 |
3 | 260 |
3 | 270 |
3 | 290 | <280 Missing
3 | 300 |
3 | 340 | <310, 320 & 330 Missing
I have found a solution based on this post from CELKO here:
http://bytes.com/topic/sql-server/answers/511668-query-find-missing-number
In essence to set up a demo run the following:
CREATE TABLE Sequence
(seq INT NOT NULL
PRIMARY KEY (seq));
INSERT INTO Sequence VALUES (1);
INSERT INTO Sequence VALUES (2);
INSERT INTO Sequence VALUES (3);
INSERT INTO Sequence VALUES (4);
INSERT INTO Sequence VALUES (5);
INSERT INTO Sequence VALUES (6);
INSERT INTO Sequence VALUES (7);
INSERT INTO Sequence VALUES (8);
INSERT INTO Sequence VALUES (9);
INSERT INTO Sequence VALUES (10);
CREATE TABLE Tickets
(buyer CHAR(5) NOT NULL,
ticket_nbr INTEGER DEFAULT 1 NOT NULL
PRIMARY KEY (buyer, ticket_nbr));
INSERT INTO Tickets VALUES ('a', 2);
INSERT INTO Tickets VALUES ('a', 3);
INSERT INTO Tickets VALUES ('a', 4);
INSERT INTO Tickets VALUES ('b', 4);
INSERT INTO Tickets VALUES ('c', 1);
INSERT INTO Tickets VALUES ('c', 2);
INSERT INTO Tickets VALUES ('c', 3);
INSERT INTO Tickets VALUES ('c', 4);
INSERT INTO Tickets VALUES ('c', 5);
INSERT INTO Tickets VALUES ('d', 1);
INSERT INTO Tickets VALUES ('d', 6);
INSERT INTO Tickets VALUES ('d', 7);
INSERT INTO Tickets VALUES ('d', 9);
INSERT INTO Tickets VALUES ('e', 10);
SELECT DISTINCT T1.buyer, S1.seq
FROM Tickets AS T1, Sequence AS S1
WHERE seq <= (SELECT MAX(ticket_nbr) -- set the range
FROM Tickets AS T2
WHERE T1.buyer = T2.buyer)
AND seq NOT IN (SELECT ticket_nbr -- get missing numbers
FROM Tickets AS T3
WHERE T1.buyer = T3.buyer);
CELKO does mention that this is for a small number of tickets, in my example my numbers table is limited to 200 rows with a single column which is a primary key with each row an increment of 10 as that is what I am interested in. I modified CELKOs query as follows (added in min range):
SELECT DISTINCT T1.buyer, S1.seq
FROM Tickets AS T1, Sequence AS S1
WHERE seq <= (SELECT MIN(ticket_nbr) -- set the MIN range
FROM Tickets AS T2
WHERE T1.buyer = T2.buyer)
AND seq <= (SELECT MAX(ticket_nbr) -- set the MAX range
FROM Tickets AS T2
WHERE T1.buyer = T2.buyer)
AND seq NOT IN (SELECT ticket_nbr -- get missing numbers
FROM Tickets AS T3
WHERE T1.buyer = T3.buyer)
ORDER BY buyer, seq;
The output would be those numbers that are missing:
buyer seq
a 1
b 1
b 2
b 3
e 1
e 2
e 3
e 4
e 5
e 6
e 7
e 8
e 9
This works exactly as I want, however, on my data set it is very slow (11 second run time at the moment - it appears to be the DISTINCT which slows things down tremendously and presumably will gt worse as the base data set grows). I have tried all manner of things to make it more efficient but sadly my ambition exceeds my knowledge. Is it possible to make the query above more efficient/faster. My only constraint is that the dataset I am making needs to be a SQL View (as it feeds a report) and will execute on SQL Azure.
Cheers
David
If my understanding is correct, you want to fill in the missing data from the table. The table would consist of ID and a Number which is incremented by 10.
CREATE TABLE Test(
ID INT,
Number INT
)
INSERT INTO Test VALUES
(1, 250), (1, 270), (1, 280), (1, 300), (1, 310),
(2, 110), (2, 130), (2, 140), (3, 260), (3, 270),
(3, 290), (3, 300), (3, 340);
You could do this by using a Tally Table and doing a CROSS JOIN on the Test table:
;WITH E1(N) AS(
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
)
,E2(N) AS(SELECT 1 FROM E1 a, E1 b)
,E4(N) AS(SELECT 1 FROM E2 a, E2 b)
,Tally(N) AS(
SELECT TOP (SELECT MAX(Number)/10 FROM Test)
(ROW_NUMBER() OVER(ORDER BY(SELECT NULL)) - 1) * 10
FROM E4
),
MinMax AS(
SELECT
ID,
Minimum = MIN(Number),
Maximum = MAX(Number)
FROM Test
GROUP BY ID
),
CrossJoined AS(
SELECT
m.ID,
Number = Minimum + t.N
FROM MinMax m
CROSS JOIN Tally t
WHERE
Minimum + t.N <= Maximum
)
SELECT * FROM CrossJoined c
ORDER BY c.ID, c.Number
RESULT
ID Seq
----------- --------------------
1 250
1 260
1 270
1 280
1 290
1 300
1 310
2 110
2 120
2 130
2 140
3 260
3 270
3 280
3 290
3 300
3 310
3 320
3 330
3 340
If you only want to find the missing Number from Test grouped by ID, just replace the final SELECT statement:
SELECT * FROM CrossJoined c
ORDER BY c.ID, c.Number
to:
SELECT c.ID, c.Number
FROM CrossJoined c
WHERE NOT EXISTS(
SELECT 1 FROM Test t
WHERE
t.ID = c.ID
AND t.Number = c.Number
)
ORDER BY c.ID, c.Number
RESULT
ID Number
----------- --------------------
1 260
1 290
2 120
3 280
3 310
3 320
3 330