Finding Missing Numbers When Data Is Grouped In SQL Server

I need to write a query that will calculate the missing numbers in a sequence when the data is "grouped". The data in each group is in sequence, but each individual group has its own sequence. The data looks something like this:
Id | Number |
---+--------+
 1 | 250    |
 1 | 270    | <- 260 missing
 1 | 280    | <- 290 missing
 1 | 300    |
 1 | 310    |
 2 | 110    |
 2 | 130    | <- 120 missing
 2 | 140    |
 3 | 260    |
 3 | 270    |
 3 | 290    | <- 280 missing
 3 | 300    |
 3 | 340    | <- 310, 320 & 330 missing
I have found a solution based on this post from CELKO here:
http://bytes.com/topic/sql-server/answers/511668-query-find-missing-number
In essence, to set up a demo, run the following:
CREATE TABLE Sequence
(seq INT NOT NULL,
PRIMARY KEY (seq));
INSERT INTO Sequence VALUES (1);
INSERT INTO Sequence VALUES (2);
INSERT INTO Sequence VALUES (3);
INSERT INTO Sequence VALUES (4);
INSERT INTO Sequence VALUES (5);
INSERT INTO Sequence VALUES (6);
INSERT INTO Sequence VALUES (7);
INSERT INTO Sequence VALUES (8);
INSERT INTO Sequence VALUES (9);
INSERT INTO Sequence VALUES (10);
CREATE TABLE Tickets
(buyer CHAR(5) NOT NULL,
ticket_nbr INTEGER DEFAULT 1 NOT NULL,
PRIMARY KEY (buyer, ticket_nbr));
INSERT INTO Tickets VALUES ('a', 2);
INSERT INTO Tickets VALUES ('a', 3);
INSERT INTO Tickets VALUES ('a', 4);
INSERT INTO Tickets VALUES ('b', 4);
INSERT INTO Tickets VALUES ('c', 1);
INSERT INTO Tickets VALUES ('c', 2);
INSERT INTO Tickets VALUES ('c', 3);
INSERT INTO Tickets VALUES ('c', 4);
INSERT INTO Tickets VALUES ('c', 5);
INSERT INTO Tickets VALUES ('d', 1);
INSERT INTO Tickets VALUES ('d', 6);
INSERT INTO Tickets VALUES ('d', 7);
INSERT INTO Tickets VALUES ('d', 9);
INSERT INTO Tickets VALUES ('e', 10);
SELECT DISTINCT T1.buyer, S1.seq
FROM Tickets AS T1, Sequence AS S1
WHERE seq <= (SELECT MAX(ticket_nbr) -- set the range
FROM Tickets AS T2
WHERE T1.buyer = T2.buyer)
AND seq NOT IN (SELECT ticket_nbr -- get missing numbers
FROM Tickets AS T3
WHERE T1.buyer = T3.buyer);
CELKO does mention that this is for a small number of tickets. In my example the numbers table is limited to 200 rows with a single column, which is the primary key, and each row is an increment of 10, as that is what I am interested in. I modified CELKO's query as follows (added in a MIN range):
SELECT DISTINCT T1.buyer, S1.seq
FROM Tickets AS T1, Sequence AS S1
WHERE seq <= (SELECT MIN(ticket_nbr) -- set the MIN range
FROM Tickets AS T2
WHERE T1.buyer = T2.buyer)
AND seq <= (SELECT MAX(ticket_nbr) -- set the MAX range
FROM Tickets AS T2
WHERE T1.buyer = T2.buyer)
AND seq NOT IN (SELECT ticket_nbr -- get missing numbers
FROM Tickets AS T3
WHERE T1.buyer = T3.buyer)
ORDER BY buyer, seq;
The output would be those numbers that are missing:
buyer seq
a 1
b 1
b 2
b 3
e 1
e 2
e 3
e 4
e 5
e 6
e 7
e 8
e 9
This works exactly as I want. However, on my data set it is very slow (an 11 second run time at the moment - it appears to be the DISTINCT which slows things down tremendously, and presumably it will get worse as the base data set grows). I have tried all manner of things to make it more efficient, but sadly my ambition exceeds my knowledge. Is it possible to make the query above more efficient/faster? My only constraint is that the dataset I am making needs to be a SQL view (as it feeds a report) and it will execute on SQL Azure.
Cheers
David

If my understanding is correct, you want to fill in the missing data in the table. The table consists of an ID and a Number which is incremented by 10.
CREATE TABLE Test(
ID INT,
Number INT
)
INSERT INTO Test VALUES
(1, 250), (1, 270), (1, 280), (1, 300), (1, 310),
(2, 110), (2, 130), (2, 140), (3, 260), (3, 270),
(3, 290), (3, 300), (3, 340);
You could do this by using a Tally Table and doing a CROSS JOIN on the Test table:
;WITH E1(N) AS(
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
)
,E2(N) AS(SELECT 1 FROM E1 a, E1 b)
,E4(N) AS(SELECT 1 FROM E2 a, E2 b)
,Tally(N) AS(
SELECT TOP (SELECT MAX(Number)/10 FROM Test)
(ROW_NUMBER() OVER(ORDER BY(SELECT NULL)) - 1) * 10
FROM E4
),
MinMax AS(
SELECT
ID,
Minimum = MIN(Number),
Maximum = MAX(Number)
FROM Test
GROUP BY ID
),
CrossJoined AS(
SELECT
m.ID,
Number = Minimum + t.N
FROM MinMax m
CROSS JOIN Tally t
WHERE
Minimum + t.N <= Maximum
)
SELECT * FROM CrossJoined c
ORDER BY c.ID, c.Number
RESULT
ID          Number
----------- --------------------
1 250
1 260
1 270
1 280
1 290
1 300
1 310
2 110
2 120
2 130
2 140
3 260
3 270
3 280
3 290
3 300
3 310
3 320
3 330
3 340
If you only want to find the missing Number from Test grouped by ID, just replace the final SELECT statement:
SELECT * FROM CrossJoined c
ORDER BY c.ID, c.Number
with:
SELECT c.ID, c.Number
FROM CrossJoined c
WHERE NOT EXISTS(
SELECT 1 FROM Test t
WHERE
t.ID = c.ID
AND t.Number = c.Number
)
ORDER BY c.ID, c.Number
RESULT
ID Number
----------- --------------------
1 260
1 290
2 120
3 280
3 310
3 320
3 330
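Since the original question needs the result exposed as a SQL view on SQL Azure, here is a minimal sketch of wrapping the same idea in a view. The view name dbo.MissingNumbers and the use of sys.all_objects as the row source for the tally are my assumptions (any numbers source with enough rows to cover the largest gap works); ORDER BY is not allowed in a view, so ordering is left to the consuming report:
CREATE VIEW dbo.MissingNumbers
AS
WITH Numbers(N) AS (
    -- 0, 10, 20, ... : one row per row in sys.all_objects
    SELECT (ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1) * 10
    FROM sys.all_objects
),
MinMax AS (
    SELECT ID, MIN(Number) AS Minimum, MAX(Number) AS Maximum
    FROM dbo.Test
    GROUP BY ID
)
SELECT m.ID, m.Minimum + n.N AS Number
FROM MinMax AS m
JOIN Numbers AS n
  ON m.Minimum + n.N <= m.Maximum
WHERE NOT EXISTS (
    SELECT 1
    FROM dbo.Test AS t
    WHERE t.ID = m.ID
      AND t.Number = m.Minimum + n.N
);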

Related

I want to output a list of every Case_Number with Code 1 that has a higher UniqueID value than its Code 2 counterpart's UniqueID value

I have a table which looks something like this:
Case_Number | Code | UniqueID
a 1 1372
a 2 1352
a 3 1325
b 1 1642
b 2 1651
b 3 1623
c 1 1743
c 2 1739
c 3 1720
... ... ...
From this table I want to output a list of every Case_Number where the UniqueID value of Code 1 is higher than the UniqueID value of Code 2 (but without outputting the Code 2 row, or any other Code x that might be in the table). This means that if the UniqueID value of Code 2 is higher than that of Code 1, which is the case with Case_Number b in the example above, it should not show up in the list.
So, querying the above table would result in this:
Case_Number | Code | UniqueID
a 1 1372
c 1 1743
Hmmm . . . You seem to want:
select t.*
from t
where t.code = 1 and
t.uniqueid > (select max(t2.uniqueid)
from t t2
where t2.case_number = t.case_number and t2.code = 2
);
The max() in the subquery is simply to handle the case where there is more than one matching value.
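For comparison, the same idea can be written with > ALL instead of MAX(); this is only a sketch against the same placeholder table t, and note that, unlike the MAX() version, > ALL is also true when a case has no Code 2 row at all, so the two forms differ for such cases:
select t.*
from t
where t.code = 1 and
      t.uniqueid > all (select t2.uniqueid
                        from t t2
                        where t2.case_number = t.case_number and t2.code = 2
                       );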
The query below gives you the expected result
CREATE TABLE CaseTab
(Case_Number VARCHAR(10),
Code INT,
UniqueID INT);
INSERT INTO CaseTab VALUES ('a', 1, 1372);
INSERT INTO CaseTab VALUES ('a', 2, 1352);
INSERT INTO CaseTab VALUES ('a', 3, 1325);
INSERT INTO CaseTab VALUES ('b', 1, 1642);
INSERT INTO CaseTab VALUES ('b', 2, 1651);
INSERT INTO CaseTab VALUES ('b', 3, 1623);
INSERT INTO CaseTab VALUES ('c', 1, 1743);
INSERT INTO CaseTab VALUES ('c', 2, 1739);
INSERT INTO CaseTab VALUES ('c', 3, 1720);
WITH v_code_gt_1 AS
(SELECT Case_Number, MAX(UniqueID) AS UniqueID
FROM CaseTab
WHERE Code > 1
GROUP BY Case_Number)
SELECT c1.Case_Number, c1.UniqueID
FROM CaseTab c1 JOIN
v_code_gt_1 c2
ON (c1.Case_Number = c2.Case_Number)
WHERE c1.UniqueID > c2.UniqueID
AND c1.Code = 1;
Basically the query gets the max UniqueID for all cases where Code is greater than 1 and compares it against the UniqueID for Code 1.
You haven't stated whether there can be cases with code = 1, but no other codes. If so, use LEFT JOIN as below.
WITH v_code_gt_1 AS
(SELECT Case_Number, MAX(UniqueID) AS UniqueID
FROM CaseTab
WHERE Code > 1
GROUP BY Case_Number)
SELECT c1.Case_Number, c1.UniqueID
FROM CaseTab c1 LEFT JOIN
v_code_gt_1 c2
ON (c1.Case_Number = c2.Case_Number)
WHERE c1.UniqueID > ISNULL(c2.UniqueID, 0)
AND c1.Code = 1;
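Another way to express the same comparison, sketched here against the CaseTab sample above, is a conditional windowed MAX, which avoids the join entirely (like the MAX() subquery variant, it drops cases that have no Code 2 row, because a comparison against NULL is not true):
SELECT Case_Number, Code, UniqueID
FROM (
    SELECT Case_Number, Code, UniqueID,
           MAX(CASE WHEN Code = 2 THEN UniqueID END)
               OVER (PARTITION BY Case_Number) AS MaxCode2ID
    FROM CaseTab
) x
WHERE Code = 1
  AND UniqueID > MaxCode2ID;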

Remove duplicate values when all values are the same

I am using SQL workbench/J connecting to amazon redshift.
I have the following data in a table (there are more columns that need to be kept, but they all have exactly the same values for each unique claim_id regardless of line number):
Member ID | Claim_ID | Line_Number |
1 100 1
1 100 2
1 100 1
1 100 2
2 101 13
2 101 13
2 101 13
2 101 13
3 102 12
3 102 12
1 103 2
1 103 2
I want it to become the following, which removes any duplicates based on claim_id (it does not matter which line number is kept):
Member ID | Claim_ID | Line_Number |
1 100 1
2 101 13
3 102 12
1 103 2
I have tried the following:
select er_main.member_id, er_main.claim_id, er_main.line_number,
temp.claim_id, temp.line_number
from OK_ER_30 er_main
inner join (
select claim_id, line_number, row_number() over (partition by claim_id order by line_number desc) as seqnum
from
OK_ER_30 temp) temp
ON er_main.claim_id = temp.claim_id and seqnum = 1
Order by er_main.claim_id, temp.line_number
and this:
select * from ok_er_30
where claim_id in
(select distinct claim_id
from ok_er_30
group by claim_id
)
order by claim_id desc
I have checked many other ways of pulling only one row per distinct claim_id but nothing has worked.
Try this:
select Member_ID, Claim_ID, max(Line_Number) as Line_Number from OK_ER_30 group by Member_ID, Claim_ID
Check out the following code.
declare @OK_ER_30 table(Member_ID int, Claim_ID int, Line_Number int);
insert @OK_ER_30 values
(1, 100, 1),
(1, 100, 2),
(1, 100, 1),
(1, 100, 2),
(2, 101, 13),
(2, 101, 13),
(2, 101, 13),
(2, 101, 13),
(3, 102, 12),
(3, 102, 12),
(1, 103, 2),
(1, 103, 2);
with
t as(
select *, row_number() over(
partition by Member_ID, Claim_ID order by (select 0)
) rn
from @OK_ER_30
)
delete from t where rn > 1;
select * from @OK_ER_30;
Try this:
select Member_ID, Claim_ID, max(Line_Number) as Line_Number from OK_ER_30 group by Member_ID, Claim_ID
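Since the question targets Amazon Redshift, where the table-variable and delete-through-a-CTE approach above will not run, a plain SELECT de-duplication may be closer to what is needed. This is only a sketch against the ok_er_30 table from the question, keeping one arbitrary line per claim_id:
select member_id, claim_id, line_number
from (
    select member_id, claim_id, line_number,
           row_number() over (partition by claim_id order by line_number) as seqnum
    from ok_er_30
) t
where seqnum = 1
order by claim_id;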

Distribute values to several rows in SQL Server

I need help with SQL Server on how to distribute a row value to several rows with the same id. To illustrate,
Id = ProductInventoryCode
Qty = QuantityInStock
ForDistribution:
Id | Qty | TotalNoOfBranchesWithId
---+--------+-------------------------
1 | 40 | 2
2 | 33 | 3
3 | 21 | 2
A table that will receive the distributed values
Id | BranchCode | Qty | QtyFromForDistributionTable
-------------------------------------------------------
1 101 13 20
1 102 8 20
2 101 10 11
2 102 2 10
2 103 3 12
3 101 1 11
3 102 12 10
As much as possible, the distribution should be near equal for each Id across its branches.
I got something like the query below, but I got confused and lost my way.
with rs as
(
select
r.*, cume.cumequantity,
coalesce(s.shipped, 0) AS shipped
from
tmpForDistribution r
cross apply
(SELECT SUM([QuantityInStock]) AS cumequantity
FROM tmpForDistribution r2
WHERE r2.ProductInventoryCode = r.ProductInventoryCode) cume
left join
(SELECT ProductInventoryCode, COUNT(ProductInventoryCode) AS shipped
FROM tmpDistributed s
GROUP BY s.ProductInventoryCode) s ON r.ProductInventoryCode = s.ProductInventoryCode
)
select
rs.ProductInventoryCode, rs.cumequantity, rs.QuantityInStock,
***"how to distribute"***
from rs
I'm currently using SQL Server 2008
Here's a sample screen output. The upper result set is 145 branches; below it we distribute the ForDistributionQty field, which is 3130. I am ending up with a fraction (DistVal = 21.586), which is not correct for this problem; it should be a whole number such as 21. However, if it is just 21, then 21 x 145 is only 3045, which is 85 units shy.
Here we distribute the values, and then make a final "adjustment" to the record which has the largest quantity (arbitrary). But at the end of the day, the math works and the distributed values are square.
Note: I'm not sure why, in your sample, ID 2 did not get an even distribution.
Declare @Table table (Id int, BranchCode int, Qty int)
Insert Into @Table values
(1, 101, 13),
(1, 102, 8),
(2, 101, 10),
(2, 102, 2),
(2, 103, 3),
(3, 101, 1),
(3, 102, 12)
Declare @Dist table (ID int, Qty int)
Insert Into @Dist values
(1,40),
(2,33),
(3,49)
;with cte0 as (
Select A.*
,ToDist = cast(D.Qty as int)
,DistVal = cast(D.Qty as int)/C.Cnt
,RN = Row_Number() over (Partition By A.ID Order By cast(D.Qty as int)/C.Cnt Desc,A.Qty Desc)
From @Table A
Join (Select ID,Cnt=count(*) from @Table Group By ID) C on A.ID=C.ID
Join @Dist D on A.ID=D.ID )
, cte1 as (
Select ID,AdjVal=Sum(DistVal)-max(ToDist) From cte0 Group By ID
)
Select A.ID
,A.BranchCode
,A.Qty
,DistVal = DistVal - case when A.RN<=abs(AdjVal) then 1*sign(AdjVal) else 0 end
From cte0 A
Join cte1 B on (A.ID=B.Id)
Order By 1,2
Returns
ID BranchCode Qty DistVal
1 101 13 20
1 102 8 20
2 101 10 11
2 102 2 11
2 103 3 11
3 101 1 24
3 102 12 25
If you can tolerate decimal values, a subquery seems to give a better query plan (tested on SQL 2014, with some sensible keys in place, this avoids a table spool and some additional index scans):
Declare @Table table (Id int, BranchCode int, Qty int, primary key(Id, BranchCode))
Insert Into @Table values
(1, 101, 13),
(1, 102, 8),
(2, 101, 10),
(2, 102, 2),
(2, 103, 3),
(3, 101, 1),
(3, 102, 12)
Declare @Dist table (ID int primary key, Qty int)
Insert Into @Dist values
(1,40),
(2,33),
(3,21)
SELECT
t.id
,t.BranchCode
,t.Qty
,(d.Qty / CAST((SELECT COUNT(*) as cnt FROM @Table t2 where t.id = t2.id) AS decimal(10,2))) as DistributedQty
FROM @Table t
INNER JOIN @Dist d
ON d.id = t.Id
outputs:
Id          BranchCode  Qty         DistributedQty
1           101         13          20.00000000000
1           102         8           20.00000000000
2           101         10          11.00000000000
2           102         2           11.00000000000
2           103         3           11.00000000000
3           101         1           10.50000000000
3           102         12          10.50000000000
If you need DistributedQty to be an int and retain remainders, then I can't think of a better solution than @John Cappelletti's, noting that uneven quantities may not be distributed as evenly as you might hope (e.g. 32 distributed across three rows could end up as a 12/10/10 split instead of an 11/11/10 split).
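For completeness, here is a sketch of the usual "base share plus remainder" split against the same @Table/@Dist sample data: every branch gets Qty / Cnt, and the first Qty % Cnt branches (ordered by BranchCode here, though any deterministic order works) get one extra unit, so 32 over three rows comes out as 11/11/10:
SELECT t.Id,
       t.BranchCode,
       t.Qty,
       d.Qty / x.Cnt
         + CASE WHEN ROW_NUMBER() OVER (PARTITION BY t.Id ORDER BY t.BranchCode)
                     <= d.Qty % x.Cnt
                THEN 1 ELSE 0 END AS DistVal
FROM @Table AS t
JOIN @Dist AS d
  ON d.ID = t.Id
CROSS APPLY (SELECT COUNT(*) AS Cnt
             FROM @Table AS t2
             WHERE t2.Id = t.Id) AS x
ORDER BY t.Id, t.BranchCode;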

How to select multi record depending on some column's condition?

Say there is a SQL Server table which contains 2 columns: ID and Value.
The sample data looks like this:
ID value
------------------
1 30
1 30
2 50
2 50
3 50
When I run this query:
select ID, value, NEWID()
from table1
order by ID
The result looks like this:
1 30 E152AD19-9920-4567-87FF-C4822FD9E485
1 30 54F28C58-ABA9-4DFB-9A80-CE9C4C390CBB
2 50 ........
2 50 ........
3 50 4E5A9E26-FEEC-4CC7-9AC5-96747053B6B2
But what I want is for the number of rows per ID to depend on the result of (sum of value / 30). For example, for ID 2 the sum of its values is 50 + 50 = 100, and 100 / 30 = 3, so ID 2 should appear in the query result three times.
The final result I want is like this:
1 E152AD19-9920-4567-87FF-C4822FD9E485
1 54F28C58-ABA9-4DFB-9A80-CE9C4C390CBB
2 4E5A9E26-FEEC-4CC7-9AC5-96747053B6B2
2 ....
2 ....
3 D861563E-E01A-4198-9E92-7BEB4678E5D1
Please note that ID 2 displays three times. Thanks for your help.
How about something like
CREATE TABLE Table1
([ID] int, [value] int)
;
INSERT INTO Table1
([ID], [value])
VALUES
(1, 30),
(1, 30),
(2, 50),
(2, 50),
(3, 50)
;
;WITH SummedVals AS (
SELECT ID,
SUM(value) / 30 Cnt
FROM Table1
GROUP BY ID
)
, Vals AS (
SELECT ID,
Cnt - 1 Cnt
FROM SummedVals
UNION ALL
SELECT ID,
Cnt - 1 Cnt
FROM Vals
WHERE Cnt > 0
)
SELECT ID,
NEWID()
FROM Vals
ORDER BY 1
SQL Fiddle DEMO
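One caveat: the recursive CTE recurses roughly SUM(value) / 30 times, and SQL Server's default recursion limit is 100, so if the largest sum can exceed about 3000 the final SELECT needs a MAXRECURSION hint, e.g.:
SELECT ID,
       NEWID()
FROM Vals
ORDER BY 1
OPTION (MAXRECURSION 0); -- 0 removes the limit; a sensible upper bound also works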

How can I group a set split by change in a field with respect to an order?

I have a set of records.
ID Value
1 a
2 b
3 b
4 b
5 a
6 a
7 b
8 b
And I would like to group them like so.
MIN(ID) MAX(ID) Value
1 1 a
2 4 b
5 6 a
7 8 b
I'm vaguely aware of Oracle's OVER() analytic function, which looks to be the right direction, but I don't know what this problem is called, much less how to solve it.
There is probably an easier way, but this may help to start. I ran it on Postgres, but it should work (maybe with a minor tweak) on Oracle. The innermost query puts the previous value on each row. We can use that to detect a grouping change (when value does not equal previous_value). Every time there is a group change, we flag it with a 1. Summing these flags gives a group id which increments every time there is a value change. Then we can perform our normal GROUP BY aggregation.
create table x(id int, value varchar(1));
insert into x values(1, 'a');
insert into x values(2, 'b');
insert into x values(3, 'b');
insert into x values(4, 'b');
insert into x values(5, 'a');
insert into x values(6, 'a');
insert into x values(7, 'b');
insert into x values(8, 'b');
SELECT MIN(id), MAX(id), value
FROM ( SELECT id
,value
,previous_value
,SUM( CASE WHEN value = previous_value THEN 0 ELSE 1 END ) OVER(ORDER BY id) AS group_id
FROM ( SELECT id
,value
,COALESCE( LAG(value) OVER(ORDER BY id), value ) previous_value
FROM x
ORDER BY id
) y
) z
GROUP BY group_id, value
ORDER BY 1, 2;
min | max | value
-----+-----+-------
1 | 1 | a
2 | 4 | b
5 | 6 | a
7 | 8 | b
(4 rows)
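As a side note, the same query needs one small tweak to run on SQL Server 2012 or later (which also supports LAG): ORDER BY is not allowed inside a derived table without TOP, so the inner ORDER BY id has to be dropped. A sketch of that form against the same x table:
SELECT MIN(id) AS min_id, MAX(id) AS max_id, value
FROM (
    SELECT id, value,
           SUM(CASE WHEN value = previous_value THEN 0 ELSE 1 END)
               OVER (ORDER BY id) AS group_id
    FROM (
        SELECT id, value,
               COALESCE(LAG(value) OVER (ORDER BY id), value) AS previous_value
        FROM x
    ) AS y
) AS z
GROUP BY group_id, value
ORDER BY 1, 2;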