Removing Duplicate Sets of Rows


Data
CREATE TABLE #tbl_LinkedInvoices(
InvoiceNbr varchar(50)
, AssociatedInvoiceNbr varchar(50)
, RowNbr int
, AssociatedRowNbr int
)
INSERT INTO #tbl_LinkedInvoices(
InvoiceNbr, AssociatedInvoiceNbr, RowNbr, AssociatedRowNbr)
VALUES
('A0001', 'A1001', 1, 4),
('A0002', 'A2002', 2, 5),
('A0002', 'A3002', 3, 6),
('A1001', 'A0001', 4, 1),
('A2002', 'A0002', 5, 2),
('A3002', 'A0002', 6, 3)
SELECT * FROM #tbl_LinkedInvoices
Challenge/Goal
tbl_LinkedInvoices records which AssociatedInvoiceNbr values (b) each InvoiceNbr (a) is linked to. Because a link is symmetric, (a, b) = (b, a), the same pair can appear twice in the table. To identify these reappearances, the RowNbr and AssociatedRowNbr fields were added so that each row points at its mirror row.
Given the identified duplicate rows, remove one row of each pair so that a single unique record remains in the table. The current script raises an error, and I expect there is a better way to write the query.
Script
Use a counter to check whether the duplicate row still exists; if it does, delete that row, and repeat until the check is FALSE.
DECLARE @RowCounter int
DECLARE @RemoveRow int
SET @RowCounter = 1
IF EXISTS (SELECT
RowNbr
FROM #tbl_LinkedInvoices WHERE RowNbr = (SELECT AssociatedRowNbr FROM #tbl_LinkedInvoices)
)
BEGIN
SET @RemoveRow = (SELECT RowNbr FROM #tbl_LinkedInvoices
WHERE RowNbr = (
SELECT AssociatedRowNbr FROM #tbl_LinkedInvoices WHERE RowNbr = @RowCounter ))
BEGIN
DELETE FROM #tbl_LinkedInvoices
WHERE
RowNbr = @RemoveRow
END
BEGIN
SET @RowCounter = @RowCounter + 1
END
END
Error
Msg 512, Level 16, State 1, Line 212
Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <= , >, >= or when the subquery is used as an expression.

If I followed you correctly, you can perform the deletion of "mirror" records in a single statement, without using the additional computed columns:
delete t
from #tbl_LinkedInvoices t
where exists (
select 1
from #tbl_LinkedInvoices t1
where
t1.AssociatedInvoiceNbr = t.InvoiceNbr
and t1.InvoiceNbr = t.AssociatedInvoiceNbr
and t1.AssociatedInvoiceNbr > t.AssociatedInvoiceNbr
)
This removes mirror records while retaining the one whose InvoiceNbr is smaller than AssociatedInvoiceNbr.
Demo on DB Fiddle with your sample data.
After the delete statement is executed, the content of the table is:
InvoiceNbr | AssociatedInvoiceNbr | RowNbr | AssociatedRowNbr
:--------- | :------------------- | -----: | ---------------:
A0001 | A1001 | 1 | 4
A0002 | A2002 | 2 | 5
A0002 | A3002 | 3 | 6
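Before running a destructive DELETE it can help to preview which rows would go. The sketch below is only an illustration: it loads the sample pairs into an in-memory SQLite database through Python's sqlite3 module (a stand-in chosen so the example runs anywhere, not SQL Server), and the table name LinkedInvoices without the # prefix is an assumption. It lists the mirror rows first, then deletes them with the same EXISTS idea as the statement above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE LinkedInvoices (InvoiceNbr TEXT, AssociatedInvoiceNbr TEXT)"
)
conn.executemany(
    "INSERT INTO LinkedInvoices VALUES (?, ?)",
    [
        ("A0001", "A1001"), ("A0002", "A2002"), ("A0002", "A3002"),
        ("A1001", "A0001"), ("A2002", "A0002"), ("A3002", "A0002"),
    ],
)

# A row is the redundant half of a pair when its mirror row exists and its
# InvoiceNbr sorts after its AssociatedInvoiceNbr.
MIRROR_FILTER = """
    InvoiceNbr > AssociatedInvoiceNbr
    AND EXISTS (
        SELECT 1
        FROM LinkedInvoices AS t1
        WHERE t1.InvoiceNbr = LinkedInvoices.AssociatedInvoiceNbr
          AND t1.AssociatedInvoiceNbr = LinkedInvoices.InvoiceNbr
    )
"""

# Step 1: preview the rows that the DELETE would remove.
to_delete = conn.execute(
    "SELECT InvoiceNbr, AssociatedInvoiceNbr FROM LinkedInvoices "
    "WHERE " + MIRROR_FILTER + " ORDER BY InvoiceNbr"
).fetchall()
print("would delete:", to_delete)

# Step 2: delete them using exactly the same filter.
conn.execute("DELETE FROM LinkedInvoices WHERE " + MIRROR_FILTER)
remaining = conn.execute(
    "SELECT InvoiceNbr, AssociatedInvoiceNbr FROM LinkedInvoices "
    "ORDER BY InvoiceNbr, AssociatedInvoiceNbr"
).fetchall()
print("remaining:", remaining)
```

Reusing one filter string for both the preview and the delete keeps the two steps from drifting apart.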

Related

Get Ids from constant list for which there are no rows in corresponding table

Let's say I have a table Vehicles(Id, Name) with the values below:
1 Car
2 Bike
3 Bus
and a constant list of Ids:
1, 2, 3, 4, 5
I want to write a query that returns the Ids from the above list for which there are no rows in the Vehicles table. In the above example it should return:
4, 5
But when I add a new row to the Vehicles table:
4 Plane
it should return only:
5
Similarly, when I remove the third row (3, Bus) from the first version of the Vehicles table, my query should return:
3, 4, 5
I tried the NOT EXISTS operator, but it doesn't give me the correct results:
select top v.Id from Vehicle v where Not Exists ( select v2.Id from Vehicle v2 where v.id = v2.id and v2.id in ( 1, 2, 3, 4, 5 ))

You need to treat your "list" as a dataset, and then use NOT EXISTS against the table:
SELECT V.I
FROM (VALUES(1),(2),(3),(4),(5))V(I) --Presumably this would be a table (type parameter),
--or a delimited string split into rows
WHERE NOT EXISTS (SELECT 1
FROM dbo.YourTable YT
WHERE YT.YourColumn = V.I);
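The three scenarios from the question can be checked with a small sketch. It assumes an in-memory SQLite database via Python's sqlite3 module as a stand-in for SQL Server, and the helper name missing_ids is hypothetical. The id list is turned into a VALUES rowset, and NOT EXISTS filters out the ids that already have a row.

```python
import sqlite3


def missing_ids(conn, wanted):
    """Return the ids in `wanted` that have no row in Vehicles, ascending."""
    if not wanted:
        return []
    # One "(?)" placeholder per id, so the list becomes a VALUES rowset.
    rows = ", ".join("(?)" for _ in wanted)
    sql = f"""
        WITH wanted(id) AS (VALUES {rows})
        SELECT w.id
        FROM wanted AS w
        WHERE NOT EXISTS (SELECT 1 FROM Vehicles AS v WHERE v.Id = w.id)
        ORDER BY w.id
    """
    return [r[0] for r in conn.execute(sql, list(wanted))]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Vehicles (Id INTEGER PRIMARY KEY, Name TEXT)")
conn.executemany(
    "INSERT INTO Vehicles VALUES (?, ?)",
    [(1, "Car"), (2, "Bike"), (3, "Bus")],
)
id_list = [1, 2, 3, 4, 5]

first = missing_ids(conn, id_list)            # original three vehicles
conn.execute("INSERT INTO Vehicles VALUES (4, 'Plane')")
after_plane = missing_ids(conn, id_list)      # Plane added
conn.execute("DELETE FROM Vehicles WHERE Id IN (3, 4)")
after_removal = missing_ids(conn, id_list)    # first version without Bus
print(first, after_plane, after_removal)
```

Because the query starts from the list rather than from the Vehicles table, ids that are absent from the table can still appear in the result, which is what the attempt in the question could not do.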

Please try the following solution.
It uses the EXCEPT set operator.
See: Set Operators - EXCEPT and INTERSECT (Transact-SQL)
SQL
-- DDL and sample data population, start
DECLARE @Vehicles TABLE (ID INT PRIMARY KEY, vehicleType VARCHAR(30));
INSERT INTO @Vehicles (ID, vehicleType) VALUES
(1, 'Car'),
(2, 'Bike'),
(3, 'Bus');
-- DDL and sample data population, end
DECLARE @vehicleList VARCHAR(20) = '1, 2, 3, 4, 5'
, @separator CHAR(1) = ',';
SELECT TRIM(value) AS missingID
FROM STRING_SPLIT(@vehicleList, @separator)
EXCEPT
SELECT ID FROM @Vehicles;
Output
+-----------+
| missingID |
+-----------+
| 4 |
| 5 |
+-----------+

In SQL we store our values in tables. We therefore store your list in a table.
It is then simple to work with it and we can easily find the information wanted.
I fully agree that other functions can solve the problem, but it is wiser to design the database so that basic SQL is enough. It will run faster, be easier to maintain, and scale to a table of a million rows without any problems. When we add the 4th mode of transport we don't have to modify anything else.
CREATE TABLE vehicules(
id int, name varchar(25));
INSERT INTO vehicules VALUES
(1 ,'Car'),
(2 ,'Bike'),
(3 ,'Bus');
CREATE TABLE ids (iid int)
INSERT INTO ids VALUES
(1),(2),(3),(4),(5);
CREATE VIEW unknownIds AS
SELECT iid unknown_id FROM ids
LEFT JOIN vehicules
ON iid = id
WHERE id IS NULL;
SELECT * FROM unknownIds;
| unknown_id |
| ---------: |
| 4 |
| 5 |
INSERT INTO vehicules VALUES (4,'Plane')
SELECT * FROM unknownIds;
| unknown_id |
| ---------: |
| 5 |
db<>fiddle here

Rule Table (With wild cards)

I have searched the web for something similar to my question, but have not found anything. I am looking for a way to apply "rules" with a SQL query.
My input data schema:
a: Int (NOT NULL)
b: Int (NOT NULL)
c: Int (NOT NULL)
The rule table schema:
a: Int (NULLABLE)
b: Int (NULLABLE)
c: Int (NULLABLE)
result: Int (NOT NULL)
There could be multiple rules which could "match" with the data. NULL represents a wildcard (could be any value). For example:
Input Data
a | b | c
1 | 2 | 3
Rules table:
a | b | c | result
1 | 2 | NULL| 99
1 | NULL| NULL| 101
1 | 2 | 3 | 203
When the rules are applied, it should match with the row which has the most matches (row 3 in this case).
I have come up with a query, which appears to be working, but it is not perfect. It can be slow if the "rules" table gets significant in size, and I'm worried there are edge cases I could be missing.
SELECT input.*,
COALESCE(rule.result, -1) as 'RuleResult'
FROM dbo.input input
OUTER APPLY (
SELECT TOP 1 result
FROM dbo.RuleTable rt
WHERE (input.a = rt.a OR rt.a IS NULL)
AND (input.b = rt.b OR rt.b IS NULL)
AND (input.c = rt.c OR rt.c IS NULL)
ORDER BY rt.a DESC, rt.b DESC, rt.c DESC
) rule
The idea is: The outer apply will run the query for each row in the input. The ORDER BY clause will set the priority of the rules, and will have the columns with the least number of NULL values at the top. The top row then becomes the result. The ORDER BY clause needs to align with the business need. If there is one rule with an 'a' value, and another with a 'b' value, an input row could match with two rules which only have a single matching condition. Then one still needs to be chosen.
My question: Is this query optimal? Am I missing anything? Are there resources about this out there I have not found? Is there a better way of doing this?
Update 1: After reading replies and discussing this with other people, here are some additional thoughts (still need to test these out):
Can we reverse the join? Basically join the rule table to the data
Can we eliminate input data that we know no rules apply to? Is this possible with NULLS (wildcard) values, or would this help?
Note: I'm still working through the thought process on this, so I may not be super clear yet.

Maybe the ORDER BY in the OUTER APPLY could use a small change, so that rules with more non-NULL columns sort first:
ORDER BY (iif(rt.a is null,0,1)
+iif(rt.b is null,0,1)
+iif(rt.c is null,0,1)) desc,
rt.a desc, rt.b desc, rt.c desc
Sample Data
create table input (a int, b int, c int);
insert into input values
(1,2,3),
(1,2,null),
(1,null,null),
(4,5,6);
create table RuleTable (a int, b int, c int, result int);
insert into RuleTable values
(1,2,null, 120),
(1,null,null, 100),
(1,2,3, 123),
(4,null,null, 400),
(null,5,6, 056);
Query
SELECT input.*
, COALESCE(ruled.result, -1) as RuleResult
FROM input input
OUTER APPLY (
SELECT TOP 1 result
FROM RuleTable as rt
WHERE (input.a = rt.a OR rt.a IS NULL)
AND (input.b = rt.b OR rt.b IS NULL)
AND (input.c = rt.c OR rt.c IS NULL)
ORDER BY (iif(rt.a is null,0,1)
+iif(rt.b is null,0,1)
+iif(rt.c is null,0,1)) desc,
rt.a desc, rt.b desc, rt.c desc
) ruled
ORDER BY a desc, b desc, c desc;
Result
a | b | c | RuleResult
4 | 5 | 6 | 56
1 | 2 | 3 | 123
1 | 2 | null | 120
1 | null | null | 100
Test on db<>fiddle here
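To see why counting the non-NULL columns matters, the following sketch compares the question's ordering with the adjusted one on the input row (4, 5, 6). It is an illustration under assumptions: it uses SQLite through Python's sqlite3 module rather than SQL Server, so OUTER APPLY with TOP 1 is replaced by a correlated scalar subquery with LIMIT 1, and iif(x is null,0,1) is written as (x IS NOT NULL), which evaluates to 0 or 1 in SQLite. The helper name best_rule is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE input (a INT, b INT, c INT);
    INSERT INTO input VALUES (4, 5, 6);
    CREATE TABLE RuleTable (a INT, b INT, c INT, result INT);
    INSERT INTO RuleTable VALUES (4, NULL, NULL, 400), (NULL, 5, 6, 56);
""")


def best_rule(order_by):
    """Return the result of the first matching rule under the given ordering."""
    sql = f"""
        SELECT COALESCE((
            SELECT rt.result
            FROM RuleTable AS rt
            WHERE (rt.a IS NULL OR rt.a = i.a)
              AND (rt.b IS NULL OR rt.b = i.b)
              AND (rt.c IS NULL OR rt.c = i.c)
            ORDER BY {order_by}
            LIMIT 1
        ), -1)
        FROM input AS i
    """
    return conn.execute(sql).fetchone()[0]


# Ordering from the question: column values only.
original = best_rule("rt.a DESC, rt.b DESC, rt.c DESC")

# Adjusted ordering: number of non-NULL (specific) columns first.
by_specificity = best_rule(
    "(rt.a IS NOT NULL) + (rt.b IS NOT NULL) + (rt.c IS NOT NULL) DESC, "
    "rt.a DESC, rt.b DESC, rt.c DESC"
)
print(original, by_specificity)
```

With the original ordering the rule (4, NULL, NULL) wins because its a column is not NULL, even though (NULL, 5, 6) matches on two columns. The specificity count puts the two-column match first.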

SQL updating multiple rows in 1 table

I have a question. I am using MS SQL Server Management Studio by the way.
I have a Dictionary table with a lot of translations. I need to copy a complete description from a languageID to another languageID.
Example below.
LanguageID | Description
2 | Some text
2 | More text
2 | Some more text
10 | *needs to be replaced
10 | *needs to be replaced
10 | *needs to be replaced
The result must be like this:
LanguageID | Description
2 | Some text
2 | More text
2 | Some more text
10 | Some text
10 | More text
10 | Some more text
The description of LanguageID 2 and 10 must be exactly the same.
My current Query runs into an error:
update tblDictionary
set Description = (Select Description from tblDictionary where
tblDictionary.LanguageID = 2)
where LanguageID = 10
Msg 512, Level 16, State 1, Line 1
Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <= , >, >= or when the subquery is used as an expression.
The statement has been terminated.

If all translations for LanguageID 10 must be exactly the same as those for LanguageID 2, then it is easier to delete all translations for LanguageID 10 and insert them again.
Something like this:
delete from tblDictionary where LanguageID = 10;
insert into tblDictionary (LanguageID, Description)
select 10, d.Description
from tblDictionary d
where d.LanguageID = 2
This method also has the advantage that if there are fewer records with LanguageID = 10 than there are with LanguageID = 2, this is corrected in the same process.
If you have more columns in tblDictionary, then you will of course need to modify the insert statement.
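Since the delete and the insert only make sense together, it is worth running them in one transaction. The sketch below illustrates that idea under assumptions: it uses an in-memory SQLite database through Python's sqlite3 module instead of SQL Server (where BEGIN TRANSACTION and COMMIT would play the same role), the helper name copy_language is hypothetical, and the two starting rows for LanguageID 10 are made up to show the row-count correction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblDictionary (LanguageID INT, Description TEXT)")
conn.executemany(
    "INSERT INTO tblDictionary VALUES (?, ?)",
    [
        (2, "Some text"), (2, "More text"), (2, "Some more text"),
        (10, "old text"), (10, "another old text"),  # only two rows for 10
    ],
)
conn.commit()


def copy_language(conn, source_id, target_id):
    """Replace all rows of target_id with copies of the rows of source_id."""
    # "with conn" commits if the block succeeds and rolls back if it raises,
    # so the delete is never left in place without the insert.
    with conn:
        conn.execute(
            "DELETE FROM tblDictionary WHERE LanguageID = ?", (target_id,)
        )
        conn.execute(
            "INSERT INTO tblDictionary (LanguageID, Description) "
            "SELECT ?, Description FROM tblDictionary WHERE LanguageID = ?",
            (target_id, source_id),
        )


copy_language(conn, 2, 10)

source_rows = sorted(
    r[0] for r in conn.execute(
        "SELECT Description FROM tblDictionary WHERE LanguageID = 2")
)
target_rows = sorted(
    r[0] for r in conn.execute(
        "SELECT Description FROM tblDictionary WHERE LanguageID = 10")
)
print(target_rows)
```

After the call, LanguageID 10 has three rows even though it started with two, which is the row-count correction described above.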

DECLARE @temp varchar(50)
DECLARE language_cursor CURSOR FOR
SELECT Description FROM tblDictionary
WHERE LanguageID = 2
ORDER BY Description;
OPEN language_cursor;
-- Perform the first fetch.
FETCH NEXT FROM language_cursor
into @temp;
-- Check @@FETCH_STATUS to see if there are any more rows to fetch.
WHILE @@FETCH_STATUS = 0
BEGIN
update TOP (1) tblDictionary
set Description = @temp
where Description = ''
and LanguageID = 10;
FETCH NEXT FROM language_cursor
into @temp;
END
CLOSE language_cursor;
DEALLOCATE language_cursor;
First set every Description for LanguageID 10 to an empty string, then loop over the descriptions for LanguageID 2 and write them into LanguageID 10 one at a time, until every empty description for LanguageID 10 is filled.

Now if you really want an update, something like this should work, even though I think the structure of the table needs to be improved.
WITH l2 AS
(SELECT *,
ROW_NUMBER() OVER(PARTITION BY LanguageId ORDER BY Description ASC) AS No FROM tblDictionary WHERE LanguageId=2),
l10 AS
(SELECT *,
ROW_NUMBER() OVER(PARTITION BY LanguageId ORDER BY Description ASC) AS No FROM tblDictionary WHERE LanguageId=10)
UPDATE l10 SET Description = l2.Description
FROM l10
INNER JOIN l2 ON l10.No = l2.No

Recursive view that sum value from double tree structure SQL Server

First, sorry for the numerous reposts of my question; I'm new around here and still getting used to asking questions properly and clearly.
I'm working on a recursive view that sums up values from a double tree structure.
I have researched around and found many questions about recursive sums, but none of their solutions seemed to work for my issue specifically.
As of now I have issues aggregating the values in the right cells. The logic is that I need the sum of each element per year in its parent, and also the sum of all the years for a given element.
Here is a fiddle of my tables and actual script:
SQL Fiddle
And here is a screenshot of the output I'm looking for:
My question is:
How can I get my view to aggregate the value from child to parent in this double tree structure?

If I understand your question correctly, you are trying to get an aggregation at 2 different levels to show in a single result set.
Clarification Scenario:
Below is an over-simplified sample data set for what I believe you are trying to achieve.
create table #agg_table
(
group_one int
, group_two int
, group_val int
)
insert into #agg_table
values (1, 1, 6)
, (1, 1, 7)
, (1, 2, 8)
, (1, 2, 9)
, (2, 3, 10)
, (2, 3, 11)
, (2, 4, 12)
, (2, 4, 13)
Given the sample data above, you want to see the following output:
+-----------+-----------+-----------+
| group_one | group_two | group_val |
+-----------+-----------+-----------+
| 1 | NULL | 30 |
| 1 | 1 | 13 |
| 1 | 2 | 17 |
| 2 | NULL | 46 |
| 2 | 3 | 21 |
| 2 | 4 | 25 |
+-----------+-----------+-----------+
This output can be achieved by making use of the GROUP BY GROUPING SETS syntax in SQL Server (example G in the linked documentation), as shown in the query below:
select a.group_one
, a.group_two
, sum(a.group_val) as group_val
from #agg_table as a
group by grouping sets
(
(
a.group_one
, a.group_two
)
,
(
a.group_one
)
)
order by a.group_one
, a.group_two
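One way to read the grouping sets query is as two ordinary GROUP BY queries glued together. The sketch below spells that out as a UNION ALL and checks it against the expected output above. It is an illustration under assumptions: it runs on an in-memory SQLite database through Python's sqlite3 module (SQLite does not support GROUPING SETS, so only the equivalent form is shown), and the table name agg_table without the # prefix is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE agg_table (group_one INT, group_two INT, group_val INT)"
)
conn.executemany(
    "INSERT INTO agg_table VALUES (?, ?, ?)",
    [
        (1, 1, 6), (1, 1, 7), (1, 2, 8), (1, 2, 9),
        (2, 3, 10), (2, 3, 11), (2, 4, 12), (2, 4, 13),
    ],
)

rows = conn.execute("""
    -- grouping set (group_one, group_two): the detail level
    SELECT group_one, group_two, SUM(group_val) AS group_val
    FROM agg_table
    GROUP BY group_one, group_two
    UNION ALL
    -- grouping set (group_one): the subtotal level, group_two shown as NULL
    SELECT group_one, NULL, SUM(group_val)
    FROM agg_table
    GROUP BY group_one
    ORDER BY group_one, group_two
""").fetchall()

for row in rows:
    print(row)
```

The NULL in group_two marks the subtotal rows, just as in the expected output table.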
What that means for your scenario, is that I believe your Recursive-CTE is not the issue. The only thing that needs to change is in the final select query from the entire CTE.
Answer:
with Temp (EntityOneId, EntityOneParentId, EntityTwoId, EntityTwoParentId, Year, Value)
as
(
SELECT E1.Id, E1.ParentId, E2.Id, E2.ParentId, VY.Year, VY.Value
FROM ValueYear AS VY
FULL OUTER JOIN EntityOne AS E1
ON VY.EntityOneId = E1.Id
FULL OUTER JOIN EntityTwo AS E2
ON VY.EntityTwoId = E2.Id
),
T (EntityOneId, EntityOneParentId, EntityTwoId, EntityTwoParentId, Year, Value, Levels)
as
(
Select
T1.EntityOneId,
T1.EntityOneParentId,
T1.EntityTwoId,
T1.EntityTwoParentId,
T1.Year,
T1.Value,
0 as Levels
From
Temp
As T1
Where
T1.EntityOneParentId is null
union all
Select
T1.EntityOneId,
T1.EntityOneParentId,
T1.EntityTwoId,
T1.EntityTwoParentId,
T1.Year,
T1.Value,
T.Levels +1
From
Temp
AS T1
join
T
On T.EntityOneId = T1.EntityOneParentId
)
Select
T.EntityOneId,
T.EntityOneParentId,
T.EntityTwoId,
T.EntityTwoParentId,
T.Year,
sum(T.Value) as Value
from T
group by grouping sets
(
(
T.EntityOneId,
T.EntityOneParentId,
T.EntityTwoId,
T.EntityTwoParentId,
T.Year
)
,
(
T.EntityOneId,
T.EntityOneParentId,
T.EntityTwoId,
T.EntityTwoParentId
)
)
order by T.EntityOneID
, T.EntityOneParentID
, T.EntityTwoID
, T.EntityTwoParentID
, T.Year
FYI - I believe the sample data did not have the records necessary to match the expected output completely, but the last 20 records in the SQL Fiddle match the expected output perfectly.

Updating fields in sql incrementing by 1 each time

I am trying to construct an SQL query that takes a set of rows and renumbers a field on every row that matches a given session ID.
e.g.
before change:
SessionID | LineNumber | LineContents
----------------------------------------------
74666 | 1 | example content
74666 | 2 | some other content
74666 | 3 | another line
74666 | 4 | final line
after change (the user has deleted line 2, so the 'LineNumber' values have been updated to reflect the new numbering, i.e. line '3' has now become line '2', etc.):
SessionID | LineNumber | LineContents
----------------------------------------------
74666 | 1 | example content
74666 | 2 | another line
74666 | 3 | final line
In pseudocode (not valid SQL), this would be something along these lines:
i = 0;
UPDATE tbl_name
SET LineNumber = i++;
WHERE SessionID = 74666;
I have searched a lot for this with no luck; any help would be great :)

Using Row_Number() function and CTE:
;WITH CTE AS (
SELECT SessionID, LineNumber, LineContents,
Row_Number() OVER(PARTITION BY SessionID ORDER BY LineNumber) Rn
FROM Table1
)
UPDATE CTE
SET LineNumber = Rn
WHERE SessionID = 74666;
Fiddle Demo

You can use ROW_NUMBER ( MS SQL ) or ROWNUM ( Oracle ) or similar inside your UPDATE statement.

You have 2 main ways to do that.
The first "low level" way is this one (SQL Fiddle here):
DELETE FROM TestTable
WHERE SESSIONID = 74666 AND LineNumber = 3;
UPDATE TestTable SET LineNumber = LineNumber-1
WHERE SESSIONID = 74666 AND LineNumber > 3
select * from TestTable -- check the result
Here we're assuming you know both LineNumber and SessionID.
The other way is through T-SQL triggers. This is a little more complex, but it helps if you don't know in advance which rows are being deleted. Give it a try.
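As a rough illustration of the trigger idea, the sketch below renumbers the remaining lines automatically whenever a single line is deleted. It is written under assumptions: it uses SQLite through Python's sqlite3 module, where triggers fire once per row and expose the deleted row as OLD. SQL Server triggers fire once per statement and expose the removed rows through the deleted pseudo-table, so a T-SQL version would look different. The trigger name renumber_after_delete and the extra row for session 80000 are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TestTable (SessionID INT, LineNumber INT, LineContents TEXT);
    INSERT INTO TestTable VALUES
        (74666, 1, 'example content'),
        (74666, 2, 'some other content'),
        (74666, 3, 'another line'),
        (74666, 4, 'final line'),
        (80000, 1, 'unrelated session');

    -- Runs once per deleted row; OLD holds the row that was just removed.
    CREATE TRIGGER renumber_after_delete
    AFTER DELETE ON TestTable
    BEGIN
        UPDATE TestTable
        SET LineNumber = LineNumber - 1
        WHERE SessionID = OLD.SessionID
          AND LineNumber > OLD.LineNumber;
    END;
""")

# The caller only deletes; the trigger closes the gap in the numbering.
conn.execute(
    "DELETE FROM TestTable WHERE SessionID = 74666 AND LineNumber = 2"
)
lines = conn.execute(
    "SELECT SessionID, LineNumber, LineContents FROM TestTable "
    "ORDER BY SessionID, LineNumber"
).fetchall()
for line in lines:
    print(line)
```

This sketch is only meant for deleting one line at a time. Deleting several lines of the same session in a single statement would need more careful handling.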

CREATE TABLE Trial (
-- ID INT,
SessionID INT ,
LineNumber INT ,
LineContent NVARCHAR(100)
)
INSERT INTO dbo.trial
VALUES ( 74666, 1, 'example content' ) ,
( 74666, 2, 'some other content' ),
( 74666, 4, 'another line' ),
( 74666, 5, 'final line' )
You can take the last deleted LineNumber value and use it in your update statement to renumber the rest of the lines in that session. For instance, here LineNumber 3 was deleted, so:
UPDATE dbo.trial SET LineNumber = LineNumber - 1 WHERE SessionID = 74666 AND LineNumber > 3

I would be really inclined to handle this differently if at all possible, and have the Linenumber generated on the fly, to avoid having to maintain a column, e.g.:
CREATE TABLE dbo.T
(
LineNumberID INT IDENTITY(1, 1) NOT NULL PRIMARY KEY,
SessionID INT NOT NULL,
LineContents NVARCHAR(MAX) NOT NULL
);
GO
INSERT dbo.T (SessionID, LineContents)
VALUES
(74666, 'example content'),
(74666, 'some other content'),
(74666, 'another line'),
(74666, 'final line');
GO
CREATE VIEW dbo.V
AS
SELECT LinenumberID,
SessionID,
Linenumber = ROW_NUMBER() OVER(PARTITION BY SessionID ORDER BY LinenumberID),
LineContents
FROM dbo.T;
GO
Here the view gives you what you need, and if I delete as follows:
SELECT *
FROM dbo.V;
DELETE dbo.V
WHERE SessionID = 74666
AND Linenumber = 3;
SELECT *
FROM dbo.V;
You get the output:
LINENUMBERID SESSIONID LINENUMBER LINECONTENTS
1 74666 1 example content
2 74666 2 some other content
3 74666 3 another line
4 74666 4 final line
LINENUMBERID SESSIONID LINENUMBER LINECONTENTS
1 74666 1 example content
2 74666 2 some other content
4 74666 3 final line
Example on SQL Fiddle
So you maintain your sequential linenumber field without actually having to update a field. This of course only works if you can rely on a field (such as CreatedDate, or an ID column) to order by. Otherwise you will have to maintain it using triggers, and update statements as suggested in other answers.
