Find MisMatch rows in same table in SQL Server

StudentID | StudentID NO
----------+-------------
1         | 111
1         | 211
1         | 111
2         | 444
2         | 444
2         | 444
5         | 555
5         | 555
5         | NULL
6         | 66
6         | 66
6         | 66
6         | 66
Output
1
5
The output is 1 and 5 because those StudentIDs have differing StudentID NO values, whereas 2 and 6 have the same value on every row. NULL must be treated as a value too, which is why 5 is included.
I need a SQL Server query that produces this output.

You can achieve this with a group by and having clause like this:
SELECT studentID
FROM YourTable
GROUP BY studentID
HAVING count(distinct isnull(studentID_NO,1)) > 1
It returns every student that has more than one distinct studentID_NO, with NULL counted as a value. This assumes 1 never appears as a real studentID_NO.
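The ISNULL wrapper matters because COUNT(DISTINCT ...) skips NULLs. A minimal sketch of that effect follows, run against SQLite through Python's sqlite3 module rather than SQL Server, so IFNULL stands in for ISNULL. The table name Students, the column name StudentIDNO and the '<null>' placeholder are assumptions made for the illustration.

```python
import sqlite3

# Sample rows from the question: (StudentID, StudentIDNO)
rows = [
    (1, "111"), (1, "211"), (1, "111"),
    (2, "444"), (2, "444"), (2, "444"),
    (5, "555"), (5, "555"), (5, None),
    (6, "66"), (6, "66"), (6, "66"), (6, "66"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (StudentID INTEGER, StudentIDNO TEXT)")
conn.executemany("INSERT INTO Students VALUES (?, ?)", rows)


def mismatched(count_expr):
    """Return StudentIDs whose COUNT(DISTINCT count_expr) exceeds 1."""
    sql = f"""
        SELECT StudentID
        FROM Students
        GROUP BY StudentID
        HAVING COUNT(DISTINCT {count_expr}) > 1
        ORDER BY StudentID
    """
    return [r[0] for r in conn.execute(sql)]


# NULL is ignored by COUNT(DISTINCT ...), so student 5 is missed here.
without_null_handling = mismatched("StudentIDNO")
# Mapping NULL to a placeholder makes it count as its own value.
with_null_handling = mismatched("IFNULL(StudentIDNO, '<null>')")

print(without_null_handling)  # [1]
print(with_null_handling)     # [1, 5]
```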

CREATE TABLE #Table1 (StudentID int, StudentIDNO varchar(10));
INSERT INTO #Table1
(StudentID, StudentIDNO)
VALUES
(1, '111'),
(1, '211'),
(1, '111'),
(2, '444'),
(2, '444'),
(2, '444'),
(5, '555'),
(5, '555'),
(5, NULL),
(6, '66'),
(6, '66'),
(6, '66'),
(6, '66')
;
-- Collapse to one row per (StudentID, StudentIDNO) pair; NULL forms its own group.
SELECT T.StudentID
FROM (
SELECT StudentID, StudentIDNO
FROM #Table1
GROUP BY StudentID, StudentIDNO
) T
GROUP BY T.StudentID
HAVING COUNT(T.StudentID) > 1

One way:
select StudentID from T
group by StudentID
having count(distinct isnull([StudentID NO], -1)) > 1
This assumes -1 never appears as a real [StudentID NO] value.
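If no safe sentinel can be guaranteed, one alternative is to count the non-NULL distinct values and add 1 when the group contains a NULL. The sketch below is not from the answers above; it runs on SQLite via Python's sqlite3 (IFNULL instead of ISNULL), and the table name T, the column name StudentIDNO and the extra student 7 are assumptions added to show where the sentinel fails.

```python
import sqlite3

rows = [
    (1, "111"), (1, "211"), (1, "111"),
    (2, "444"), (2, "444"),
    (5, "555"), (5, None),
    (7, "-1"), (7, None),  # hypothetical student whose real value equals the sentinel
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (StudentID INTEGER, StudentIDNO TEXT)")
conn.executemany("INSERT INTO T VALUES (?, ?)", rows)

# Sentinel approach: NULL becomes '-1' and collides with student 7's real value.
sentinel_sql = """
    SELECT StudentID FROM T
    GROUP BY StudentID
    HAVING COUNT(DISTINCT IFNULL(StudentIDNO, '-1')) > 1
    ORDER BY StudentID
"""

# Sentinel-free approach: distinct non-NULL values, plus 1 if any NULL exists.
null_flag_sql = """
    SELECT StudentID FROM T
    GROUP BY StudentID
    HAVING COUNT(DISTINCT StudentIDNO)
         + MAX(CASE WHEN StudentIDNO IS NULL THEN 1 ELSE 0 END) > 1
    ORDER BY StudentID
"""

sentinel_result = [r[0] for r in conn.execute(sentinel_sql)]
null_flag_result = [r[0] for r in conn.execute(null_flag_sql)]

print(sentinel_result)   # [1, 5]
print(null_flag_result)  # [1, 5, 7]
```

The CASE expression is standard SQL, so the same HAVING clause should carry over to SQL Server unchanged.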

Related

I want to fetch data in sequence of their branched data if exists

I have data in one table, Question:
QuestionID QuestionDescription
2 This is test
3 test is tst
4 3
6 5
17 6
18 7
19 8
20 9
5 4
and in a second table, QuestionBranching:
QuestionBranchingID QuestionID Response NextQuestionID ParentQuestionID
1 3 True 5 3
2 3 False 6 3
7 5 True 19 3
8 5 False 20 3
9 18 True 17 18
10 18 False 4 18
If a QuestionID exists in the QuestionBranching table, the SELECT/JOIN query should return rows in the branching sequence. For example:
If the QuestionID exists in QuestionBranching, its NextQuestionID comes next in the sequence.
If it does not, the normal order applies.
So the desired result is:
QuestionID
2
3 (it exists in QuestionBranching, so its NextQuestionID values come next, starting with 5)
5
6
19
20
18
17
4

Try this:
select isnull(b.NextQuestionID,q.QuestionID) as QuestionID
from Question q
left join QuestionBranching b on q.QuestionID=b.QuestionID

Maybe this can help you:
declare @question table (QuestionID int, QuestionDescription varchar(100))
declare @branching table (BranchingID int, QuestionID int, Response bit, NextQuestionID int)
insert into @question values (2, 'this is a test'), (3, 'test is tst'), (4, '3'), (6, '5'), (17, '6'), (18, '7'), (19, '8'), (20, '9'), (5, '4')
insert into @branching values (1, 3, 1, 5), (2, 3, 0, 6), (7, 5, 1, 19), (8, 5, 0, 20)
select t.nq
from ( select q.QuestionID, q.QuestionID as nq
from @question q
union all
select isnull((select b2.QuestionID from @branching b2 where b2.NextQuestionID = b.QuestionID), b.QuestionID) as QuestionID, b.NextQuestionID as nq
from @branching b
) t
order by t.QuestionID, t.nq
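One possible reading of the desired order is: walk the Question table in its stored order, skip any question that appears as a NextQuestionID, and after each remaining question emit the NextQuestionID values whose ParentQuestionID points at it, in QuestionBranchingID order. The Python sketch below only illustrates that interpretation, which is an assumption and not something the question states; the function name branched_order is hypothetical.

```python
# Question IDs in the order they appear in the Question table.
question_ids = [2, 3, 4, 6, 17, 18, 19, 20, 5]

# (QuestionBranchingID, QuestionID, Response, NextQuestionID, ParentQuestionID)
branching = [
    (1, 3, True, 5, 3),
    (2, 3, False, 6, 3),
    (7, 5, True, 19, 3),
    (8, 5, False, 20, 3),
    (9, 18, True, 17, 18),
    (10, 18, False, 4, 18),
]


def branched_order(question_ids, branching):
    """Order questions so branched follow-ups sit right after their parent."""
    # Questions reached through a branch are emitted with their parent only.
    next_ids = {row[3] for row in branching}

    # Parent -> follow-up questions, in QuestionBranchingID order.
    follow_ups = {}
    for row in sorted(branching):
        follow_ups.setdefault(row[4], []).append(row[3])

    order = []
    for qid in question_ids:
        if qid in next_ids:
            continue
        order.append(qid)
        order.extend(follow_ups.get(qid, []))
    return order


sequence = branched_order(question_ids, branching)
print(sequence)  # [2, 3, 5, 6, 19, 20, 18, 17, 4]
```

On the sample data this reproduces the sequence listed in the question; other data may need a different rule, for example a recursive walk when branches nest under different parents.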

How do I group the results by company?

I have the following sql fiddle:
CREATE TABLE companies (pk serial PRIMARY KEY, name text, max int);
INSERT INTO companies(name, max)
VALUES
('Company A', 3),
('Company B', 8),
('Company C', -1);
CREATE TABLE employees (pk serial PRIMARY KEY, company integer REFERENCES companies (pk),
name text, joined timestamp);
INSERT INTO employees (company, name, joined)
VALUES
(2, 'Jane', '2015-09-23 14:46:57'),
(2, 'Jack', '2015-09-23 14:46:57'),
(3, 'Frank', '2015-09-23 14:51:07'),
(2, 'Bob', '2015-09-23 14:56:11'),
(1, 'Carl', '2015-09-23 16:12:05'),
(1, 'Jason', '2015-09-23 16:15:35'),
(3, 'Fred', '2015-09-23 16:28:35'),
(2, 'Bruce', '2015-09-23 16:35:51'),
(1, 'Brian', '2015-09-23 16:36:17'),
(1, 'Ryan', '2015-09-23 16:36:22'),
(1, 'Peter', '2015-09-23 16:37:04'),
(3, 'Ed', '2015-09-23 16:37:11'),
(2, 'Jenny', '2015-09-23 16:37:15'),
(2, 'Jessica', '2015-09-24 09:52:46'),
(3, 'Anita', '2015-09-24 10:01:19'),
(3, 'Melanie', '2015-09-24 10:05:27'),
(3, 'Kathryn', '2015-09-24 10:05:29'),
(2, 'Ashely', '2015-09-24 10:19:46'),
(1, 'Valerie', '2015-09-24 14:49:05'),
(2, 'Jimmy', '2015-09-24 15:42:45'),
(3, 'Johnny', '2015-09-24 17:38:06'),
(1, 'Mick', '2015-09-25 14:49:10');
SELECT * -- choose the columns you want here
FROM (SELECT e.*, c.max,
row_number() over (partition by company order by joined desc) as rank
FROM employees e JOIN
companies c
on e.company = c.pk
) e
WHERE rank <= max or max = -1
This gives:
pk company name joined max rank
22 1 Mick 2015-09-25T14:49:10Z 3 1
19 1 Valerie 2015-09-24T14:49:05Z 3 2
11 1 Peter 2015-09-23T16:37:04Z 3 3
20 2 Jimmy 2015-09-24T15:42:45Z 8 1
18 2 Ashely 2015-09-24T10:19:46Z 8 2
14 2 Jessica 2015-09-24T09:52:46Z 8 3
13 2 Jenny 2015-09-23T16:37:15Z 8 4
8 2 Bruce 2015-09-23T16:35:51Z 8 5
4 2 Bob 2015-09-23T14:56:11Z 8 6
1 2 Jane 2015-09-23T14:46:57Z 8 7
2 2 Jack 2015-09-23T14:46:57Z 8 8
21 3 Johnny 2015-09-24T17:38:06Z -1 1
17 3 Kathryn 2015-09-24T10:05:29Z -1 2
16 3 Melanie 2015-09-24T10:05:27Z -1 3
15 3 Anita 2015-09-24T10:01:19Z -1 4
12 3 Ed 2015-09-23T16:37:11Z -1 5
7 3 Fred 2015-09-23T16:28:35Z -1 6
3 3 Frank 2015-09-23T14:51:07Z -1 7
How do I get the results grouped by company? I'd like 3 rows (one per company), each with an array of that company's employees. For instance, Company A would look like:
1 [{"name": "Mick", "joined": "2015-09-25T14:49:10Z", "rank": 1},{"name": "Valerie", "joined": "2015-09-24T14:49:05Z", "rank": 2},{"name": "Peter", "joined": "2015-09-23T16:37:04Z", "rank": 3}]
I have been trying various GROUP BY statements and keep running into errors such as invalid SQL.

Your sample output looks a bit like a JSON array (however, not a valid JSON value), so maybe you are looking for something like this:
select c.pk, jsonb_agg(to_jsonb(e))
from employees e
join companies c on e.company = c.pk
group by c.pk;
To get the three "most recent" employees you can use:
select c.pk, jsonb_agg(e3.emp)
from (
select company,
to_jsonb(e) as emp,
row_number() over (partition by company order by joined desc) as rn
from employees e
) e3
join companies c on e3.company = c.pk
where e3.rn <= 3
group by c.pk;
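The queries above do not use each company's max column. The grouping the question asks for (newest employees first, capped at max, with -1 meaning no cap) can be sketched in plain Python as follows. This illustrates the logic only and is not a PostgreSQL solution; the function name employees_by_company is hypothetical.

```python
import json
from collections import defaultdict

# Company pk -> max employees to return (-1 means no limit).
companies = {1: 3, 2: 8, 3: -1}

# (company, name, joined) rows from the question.
employees = [
    (2, "Jane", "2015-09-23 14:46:57"), (2, "Jack", "2015-09-23 14:46:57"),
    (3, "Frank", "2015-09-23 14:51:07"), (2, "Bob", "2015-09-23 14:56:11"),
    (1, "Carl", "2015-09-23 16:12:05"), (1, "Jason", "2015-09-23 16:15:35"),
    (3, "Fred", "2015-09-23 16:28:35"), (2, "Bruce", "2015-09-23 16:35:51"),
    (1, "Brian", "2015-09-23 16:36:17"), (1, "Ryan", "2015-09-23 16:36:22"),
    (1, "Peter", "2015-09-23 16:37:04"), (3, "Ed", "2015-09-23 16:37:11"),
    (2, "Jenny", "2015-09-23 16:37:15"), (2, "Jessica", "2015-09-24 09:52:46"),
    (3, "Anita", "2015-09-24 10:01:19"), (3, "Melanie", "2015-09-24 10:05:27"),
    (3, "Kathryn", "2015-09-24 10:05:29"), (2, "Ashely", "2015-09-24 10:19:46"),
    (1, "Valerie", "2015-09-24 14:49:05"), (2, "Jimmy", "2015-09-24 15:42:45"),
    (3, "Johnny", "2015-09-24 17:38:06"), (1, "Mick", "2015-09-25 14:49:10"),
]


def employees_by_company(companies, employees):
    """Return {company pk: ranked list of its newest employees, capped at max}."""
    grouped = defaultdict(list)
    for company, name, joined in employees:
        grouped[company].append({"name": name, "joined": joined})

    result = {}
    for pk, limit in companies.items():
        # Timestamps in this fixed format sort correctly as plain strings.
        newest_first = sorted(grouped[pk], key=lambda e: e["joined"], reverse=True)
        kept = newest_first if limit == -1 else newest_first[:limit]
        result[pk] = [dict(e, rank=i) for i, e in enumerate(kept, start=1)]
    return result


result = employees_by_company(companies, employees)
print(json.dumps(result[1]))
```

For Company A (pk 1) this prints the three most recent employees, Mick, Valerie and Peter, as a JSON array with rank values 1 to 3.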
Online Example: https://rextester.com/RPSI96409

Remove duplicates values when all values are the same

I am using SQL Workbench/J connected to Amazon Redshift.
I have the following data in a table (there are more columns that need to be kept but are all the exact same values for each unique claim_id regardless of line number):
Member ID | Claim_ID | Line_Number |
1 100 1
1 100 2
1 100 1
1 100 2
2 101 13
2 101 13
2 101 13
2 101 13
3 102 12
3 102 12
1 103 2
1 103 2
I want it to become the following which will remove any duplicates based on claim_id (it does not matter which line number is kept):
Member ID | Claim_ID | Line_Number |
1 100 1
2 101 13
3 102 12
1 103 2
I have tried the following:
select er_main.member_id, er_main.claim_id, er_main.line_number,
temp.claim_id, temp.line_number
from OK_ER_30 er_main
inner join (
select row_number() over (partition by claim_id order by line_number desc) as seqnum
from
OK_ER_30 temp) temp
ON er_main.claim_id = temp.claim_id and seqnum = 1
Order by er_main.claim_id, temp.line_number
and this:
select * from ok_er_30
where claim_id in
(select distinct claim_id
from ok_er_30
group by claim_id
)
order by claim_id desc
I have checked many other ways of pulling only one row per distinct claim_id but nothing has worked.

Try this:
select Member_ID, Claim_ID, max(Line_Number) as Line_Number from OK_ER_30 group by Member_ID, Claim_ID

Check out the following code.
declare @OK_ER_30 table(Member_ID int, Claim_ID int, Line_Number int);
insert @OK_ER_30 values
(1, 100, 1),
(1, 100, 2),
(1, 100, 1),
(1, 100, 2),
(2, 101, 13),
(2, 101, 13),
(2, 101, 13),
(2, 101, 13),
(3, 102, 12),
(3, 102, 12),
(1, 103, 2),
(1, 103, 2);
with
t as(
select *, row_number() over(
partition by Member_ID, Claim_ID order by (select 0)
) rn
from @OK_ER_30
)
delete from t where rn > 1;
select * from @OK_ER_30;

Try this:
select Member_ID, Claim_ID, max(Line_Number) as Line_Number from OK_ER_30 group by Member_ID, Claim_ID
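To check what the GROUP BY / MAX approach returns on the sample rows, here is a small sketch that runs it on SQLite through Python's sqlite3 module. SQLite is used only as a stand-in for Redshift, and the ORDER BY clause is added for a stable display order.

```python
import sqlite3

# Sample rows from the question: (Member_ID, Claim_ID, Line_Number)
rows = [
    (1, 100, 1), (1, 100, 2), (1, 100, 1), (1, 100, 2),
    (2, 101, 13), (2, 101, 13), (2, 101, 13), (2, 101, 13),
    (3, 102, 12), (3, 102, 12),
    (1, 103, 2), (1, 103, 2),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE OK_ER_30 (Member_ID INTEGER, Claim_ID INTEGER, Line_Number INTEGER)"
)
conn.executemany("INSERT INTO OK_ER_30 VALUES (?, ?, ?)", rows)

# One row per (Member_ID, Claim_ID); MAX picks an arbitrary-but-deterministic line.
deduped = conn.execute(
    """
    SELECT Member_ID, Claim_ID, MAX(Line_Number) AS Line_Number
    FROM OK_ER_30
    GROUP BY Member_ID, Claim_ID
    ORDER BY Claim_ID
    """
).fetchall()

for row in deduped:
    print(row)
```

This yields one row per claim. Claim 100 keeps line 2 rather than line 1, which is acceptable here because the question says it does not matter which line number is kept.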

Distribute values to several rows in SQL Server

I need help with SQL Server on how to distribute a row value to several rows with the same id. To illustrate,
Id = ProductInventoryCode
Qty = QuantityInStock
ForDistribution:
Id | Qty | TotalNoOfBranchesWithId
---+--------+-------------------------
1 | 40 | 2
2 | 33 | 3
3 | 21 | 2
A table that will receive the distributed values
Id | BranchCode | Qty | QtyFromForDistributionTable
-------------------------------------------------------
1 101 13 20
1 102 8 20
2 101 10 11
2 102 2 10
2 103 3 12
3 101 1 11
3 102 12 10
The distribution should be as close to equal as possible across the branches of each Id.
I got as far as the query below, but then got confused and lost my way.
with rs as
(
select
r.*, cume.cumequantity,
coalesce(s.shipped, 0) AS shipped
from
tmpForDistribution r
cross apply
(SELECT SUM([QuantityInStock]) AS cumequantity
FROM tmpForDistribution r2
WHERE r2.ProductInventoryCode = r.ProductInventoryCode) cume
left join
(SELECT ProductInventoryCode, COUNT(ProductInventoryCode) AS shipped
FROM tmpDistributed s
GROUP BY s.ProductInventoryCode) s ON r.ProductInventoryCode = s.ProductInventoryCode
)
select
rs.ProductInventoryCode, rs.cumequantity, rs.QuantityInStock,
***"how to distribute"***
from rs
I'm currently using SQL Server 2008.
Here's a sample screen output
The upper result has 145 branches. Below it we distribute the ForDistributionQty value, which is 3130. I end up with a fraction (DistVal = 21.586), which is not correct for this problem; it should be a whole number such as 21. However, 21 x 145 is only 3045, which is 85 units short.

Here we distribute the values, and then make a final "adjustment" to the record which has the largest quantity (arbitrary). But at the end of the day, the math works and the distributed values are square.
Note: I'm not sure why ID 2 did not get an even distribution in your sample.
Declare @Table table (Id int,BranchCode int,Qty int)
Insert Into @Table values
(1, 101, 13),
(1, 102, 8),
(2, 101, 10),
(2, 102, 2),
(2, 103, 3),
(3, 101, 1),
(3, 102, 12)
Declare @Dist table (ID int,Qty int)
Insert Into @Dist values
(1,40),
(2,33),
(3,49)
;with cte0 as (
Select A.*
,ToDist = cast(D.Qty as int)
,DistVal = cast(D.Qty as int)/C.Cnt
,RN = Row_Number() over (Partition By A.ID Order By cast(D.Qty as int)/C.Cnt Desc,A.Qty Desc)
From @Table A
Join (Select ID,Cnt=count(*) from @Table Group By ID) C on A.ID=C.ID
Join @Dist D on A.ID=D.ID )
, cte1 as (
Select ID,AdjVal=Sum(DistVal)-max(ToDist) From cte0 Group By ID
)
Select A.ID
,A.BranchCode
,A.Qty
,DistVal = DistVal - case when A.RN<=abs(AdjVal) then 1*sign(AdjVal) else 0 end
From cte0 A
Join cte1 B on (A.ID=B.Id)
Order By 1,2
Returns
ID BranchCode Qty DistVal
1 101 13 20
1 102 8 20
2 101 10 11
2 102 2 11
2 103 3 11
3 101 1 24
3 102 12 25

If you can tolerate decimal values, a subquery seems to give a better query plan (tested on SQL 2014, with some sensible keys in place, this avoids a table spool and some additional index scans):
Declare @Table table (Id int,BranchCode int,Qty int, primary key(id, branchcode))
Insert Into @Table values
(1, 101, 13),
(1, 102, 8),
(2, 101, 10),
(2, 102, 2),
(2, 103, 3),
(3, 101, 1),
(3, 102, 12)
Declare @Dist table (ID int primary key,Qty int)
Insert Into @Dist values
(1,40),
(2,33),
(3,21)
SELECT
t.id
,t.BranchCode
,t.Qty
,(d.Qty / CAST((SELECT COUNT(*) as cnt FROM @Table t2 where t.id = t2.id) AS decimal(10,2))) as DistributedQty
FROM @Table t
INNER JOIN @Dist d
ON d.id = t.Id
outputs:
Id | BranchCode | Qty | DistributedQty
---+------------+-----+----------------
1  | 101        | 13  | 20.00000000000
1  | 102        | 8   | 20.00000000000
2  | 101        | 10  | 11.00000000000
2  | 102        | 2   | 11.00000000000
2  | 103        | 3   | 11.00000000000
3  | 101        | 1   | 10.50000000000
3  | 102        | 12  | 10.50000000000
If you need DistributedQty to be an int and retain remainders, then I can't think of a better solution than @John Cappelletti's, noting that uneven quantities may not be split as evenly as you might hope (e.g. 32 distributed across three would result in a 12/10/10 distribution instead of an 11/11/10 distribution).
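The whole-number arithmetic behind an even split can be sketched in Python: integer-divide the quantity by the number of branches, then hand the remainder out one unit at a time. This is a minimal illustration, not the T-SQL from either answer, and it assumes non-negative quantities; the function name distribute is hypothetical.

```python
def distribute(total, branches):
    """Split total into `branches` whole numbers that differ by at most 1."""
    base, remainder = divmod(total, branches)
    # The first `remainder` branches each receive one extra unit.
    return [base + 1 if i < remainder else base for i in range(branches)]


print(distribute(40, 2))  # [20, 20]
print(distribute(33, 3))  # [11, 11, 11]
print(distribute(32, 3))  # [11, 11, 10]

# The 145-branch case from the question: 3130 = 145 * 21 + 85,
# so 85 branches get 22 units and the other 60 get 21.
shares = distribute(3130, 145)
print(shares.count(22), shares.count(21), sum(shares))  # 85 60 3130
```

In SQL the same idea maps to integer division for the base value plus a row number compared against the remainder (total % count) to decide which rows get the extra unit.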

Finding Missing Numbers When Data Is Grouped In SQL Server

I need to write a query that will calculate the missing numbers in a sequence when the data is "grouped". The data in each group is in sequence, but each group has its own sequence. The data would look something like this:
Id| Number|
-----------
1 | 250 |
1 | 270 | <260 Missing
1 | 280 | <290 Missing
1 | 300 |
1 | 310 |
2 | 110 |
2 | 130 | <120 Missing
2 | 140 |
3 | 260 |
3 | 270 |
3 | 290 | <280 Missing
3 | 300 |
3 | 340 | <310, 320 & 330 Missing
I have found a solution based on this post from CELKO here:
http://bytes.com/topic/sql-server/answers/511668-query-find-missing-number
In essence to set up a demo run the following:
CREATE TABLE Sequence
(seq INT NOT NULL
PRIMARY KEY (seq));
INSERT INTO Sequence VALUES (1);
INSERT INTO Sequence VALUES (2);
INSERT INTO Sequence VALUES (3);
INSERT INTO Sequence VALUES (4);
INSERT INTO Sequence VALUES (5);
INSERT INTO Sequence VALUES (6);
INSERT INTO Sequence VALUES (7);
INSERT INTO Sequence VALUES (8);
INSERT INTO Sequence VALUES (9);
INSERT INTO Sequence VALUES (10);
CREATE TABLE Tickets
(buyer CHAR(5) NOT NULL,
ticket_nbr INTEGER DEFAULT 1 NOT NULL
PRIMARY KEY (buyer, ticket_nbr));
INSERT INTO Tickets VALUES ('a', 2);
INSERT INTO Tickets VALUES ('a', 3);
INSERT INTO Tickets VALUES ('a', 4);
INSERT INTO Tickets VALUES ('b', 4);
INSERT INTO Tickets VALUES ('c', 1);
INSERT INTO Tickets VALUES ('c', 2);
INSERT INTO Tickets VALUES ('c', 3);
INSERT INTO Tickets VALUES ('c', 4);
INSERT INTO Tickets VALUES ('c', 5);
INSERT INTO Tickets VALUES ('d', 1);
INSERT INTO Tickets VALUES ('d', 6);
INSERT INTO Tickets VALUES ('d', 7);
INSERT INTO Tickets VALUES ('d', 9);
INSERT INTO Tickets VALUES ('e', 10);
SELECT DISTINCT T1.buyer, S1.seq
FROM Tickets AS T1, Sequence AS S1
WHERE seq <= (SELECT MAX(ticket_nbr) -- set the range
FROM Tickets AS T2
WHERE T1.buyer = T2.buyer)
AND seq NOT IN (SELECT ticket_nbr -- get missing numbers
FROM Tickets AS T3
WHERE T1.buyer = T3.buyer);
CELKO does mention that this is for a small number of tickets. In my case the numbers table is limited to 200 rows, with a single primary-key column in which each row is an increment of 10, as that is what I am interested in. I modified CELKO's query as follows (adding a MIN range):
SELECT DISTINCT T1.buyer, S1.seq
FROM Tickets AS T1, Sequence AS S1
WHERE seq <= (SELECT MIN(ticket_nbr) -- set the MIN range
FROM Tickets AS T2
WHERE T1.buyer = T2.buyer)
AND seq <= (SELECT MAX(ticket_nbr) -- set the MAX range
FROM Tickets AS T2
WHERE T1.buyer = T2.buyer)
AND seq NOT IN (SELECT ticket_nbr -- get missing numbers
FROM Tickets AS T3
WHERE T1.buyer = T3.buyer)
ORDER BY buyer, seq;
The output would be those numbers that are missing:
buyer seq
a 1
b 1
b 2
b 3
e 1
e 2
e 3
e 4
e 5
e 6
e 7
e 8
e 9
This works exactly as I want. However, on my data set it is very slow (an 11-second run time at the moment). The DISTINCT appears to slow things down tremendously, and presumably it will get worse as the base data set grows. I have tried all manner of things to make it more efficient, but sadly my ambition exceeds my knowledge. Is it possible to make the query above more efficient? My only constraint is that the dataset I am building needs to be a SQL view (as it feeds a report) and will execute on SQL Azure.
Cheers
David

If my understanding is correct, you want to fill in the missing data from the table. The table would consist of ID and a Number which is incremented by 10.
CREATE TABLE Test(
ID INT,
Number INT
)
INSERT INTO Test VALUES
(1, 250), (1, 270), (1, 280), (1, 300), (1, 310),
(2, 110), (2, 130), (2, 140), (3, 260), (3, 270),
(3, 290), (3, 300), (3, 340);
You could do this by using a Tally Table and doing a CROSS JOIN on the Test table:
;WITH E1(N) AS(
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
)
,E2(N) AS(SELECT 1 FROM E1 a, E1 b)
,E4(N) AS(SELECT 1 FROM E2 a, E2 b)
,Tally(N) AS(
SELECT TOP (SELECT MAX(Number)/10 FROM Test)
(ROW_NUMBER() OVER(ORDER BY(SELECT NULL)) - 1) * 10
FROM E4
),
MinMax AS(
SELECT
ID,
Minimum = MIN(Number),
Maximum = MAX(Number)
FROM Test
GROUP BY ID
),
CrossJoined AS(
SELECT
m.ID,
Number = Minimum + t.N
FROM MinMax m
CROSS JOIN Tally t
WHERE
Minimum + t.N <= Maximum
)
SELECT * FROM CrossJoined c
ORDER BY c.ID, c.Number
RESULT
ID Number
----------- --------------------
1 250
1 260
1 270
1 280
1 290
1 300
1 310
2 110
2 120
2 130
2 140
3 260
3 270
3 280
3 290
3 300
3 310
3 320
3 330
3 340
If you only want to find the missing Number from Test grouped by ID, just replace the final SELECT statement:
SELECT * FROM CrossJoined c
ORDER BY c.ID, c.Number
to:
SELECT c.ID, c.Number
FROM CrossJoined c
WHERE NOT EXISTS(
SELECT 1 FROM Test t
WHERE
t.ID = c.ID
AND t.Number = c.Number
)
ORDER BY c.ID, c.Number
RESULT
ID Number
----------- --------------------
1 260
1 290
2 120
3 280
3 310
3 320
3 330
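The same gap-finding logic can be checked outside the database. The Python sketch below walks each ID from its minimum to its maximum in steps of 10 and reports values that are absent. It illustrates the logic only and is not a replacement for the view; the function name missing_numbers and the step parameter are assumptions.

```python
from collections import defaultdict

# (ID, Number) rows from the Test table above.
rows = [
    (1, 250), (1, 270), (1, 280), (1, 300), (1, 310),
    (2, 110), (2, 130), (2, 140),
    (3, 260), (3, 270), (3, 290), (3, 300), (3, 340),
]


def missing_numbers(rows, step=10):
    """Return (ID, Number) pairs absent between each ID's min and max."""
    by_id = defaultdict(set)
    for id_, number in rows:
        by_id[id_].add(number)

    missing = []
    for id_ in sorted(by_id):
        present = by_id[id_]
        # Candidates run from the group's minimum to its maximum, inclusive.
        for candidate in range(min(present), max(present) + 1, step):
            if candidate not in present:
                missing.append((id_, candidate))
    return missing


gaps = missing_numbers(rows)
for gap in gaps:
    print(gap)
```

On the sample data this reports the same seven rows as the NOT EXISTS query: 260 and 290 for ID 1, 120 for ID 2, and 280, 310, 320 and 330 for ID 3.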
