compare primary/alias groups across two tables - sql

Gday,
We have two tables with the same structure. There are two columns, "PrimaryAddress" and "AliasAddress". These hold email addresses and their aliases. We want to find any records that need to be added to either side to keep the records in sync. The catch is that the primary name in one table might be listed as an alias in the other. The good news is that an address won't appear twice in the "AliasAddress" column.
TABLE A
PrimaryAddress~~~~~AliasAdress
chris#work~~~~~~~~~chris#home
chris#work~~~~~~~~~c#work
chris#work~~~~~~~~~theboss#work
chris#work~~~~~~~~~thatguy#aol
bob#test~~~~~~~~~~~test1#test
bob#test~~~~~~~~~~~charles#work
bob#test~~~~~~~~~~~chuck#aol
sally#mars~~~~~~~~~sally#nasa
sally#mars~~~~~~~~~sally#gmail
TABLE B
PrimaryAddress~~~~~AliasAdress
chris#home~~~~~~~~~chris#work
chris#home~~~~~~~~~c#work
chris#home~~~~~~~~~theboss#work
chris#home~~~~~~~~~thatguy#aol
bob#test~~~~~~~~~~~test1#test
bob#test~~~~~~~~~~~charles#work
sally#nasa~~~~~~~~~sally#mars
sally#nasa~~~~~~~~~sally#gmail
sally#nasa~~~~~~~~~ripley#nostromo
The expected result is to return the following missing records from both tables:
bob#test~~~~~~~~~~~chuck#aol
sally#nasa~~~~~~~~~ripley#nostromo
Note that the chris#* block is a total match because the sum of all the aliases (plus primary) is still the same regardless of which address is considered primary. It doesn't matter which address is primary as long as the sum of the entire primary group contains all entries in both tables.
I don't mind if this is run in two passes, A->B and B->A, but I just can't get my head around a solution.
Any help appreciated :)

drop TABLE #TABLEA
CREATE TABLE #TABLEA
([PrimaryAddress] varchar(10), [AliasAdress] varchar(12))
;
INSERT INTO #TABLEA
([PrimaryAddress], [AliasAdress])
VALUES
('chris#work', 'chris#home'),
('chris#work', 'c#work'),
('chris#work', 'theboss#work'),
('chris#work', 'thatguy#aol'),
('bob#test', 'test1#test'),
('bob#test', 'charles#work'),
('bob#test', 'chuck#aol'),
('sally#mars', 'sally#nasa'),
('sally#mars', 'sally#gmail')
;
drop TABLE #TABLEB
CREATE TABLE #TABLEB
([PrimaryAddress] varchar(10), [AliasAdress] varchar(15))
;
INSERT INTO #TABLEB
([PrimaryAddress], [AliasAdress])
VALUES
('chris#home', 'chris#work'),
('chris#home', 'c#work'),
('chris#home', 'theboss#work'),
('chris#home', 'thatguy#aol'),
('bob#test', 'test1#test'),
('bob#test', 'charles#work'),
('sally#nasa', 'sally#mars'),
('sally#nasa', 'sally#gmail'),
('sally#nasa', 'ripley#nostromo')
;
Try the following:
select a.PrimaryAddress, a.AliasAdress
from #TABLEA a
left join #TABLEB b on a.AliasAdress = b.AliasAdress or b.PrimaryAddress = a.AliasAdress
where b.PrimaryAddress is null
union all
select a.PrimaryAddress, a.AliasAdress
from #TABLEB a
left join #TABLEA b on a.AliasAdress = b.AliasAdress or b.PrimaryAddress = a.AliasAdress
where b.PrimaryAddress is null

So you want to compare tables A and B, and find rows which are unique to either table. How about an outer join, followed by looking for NULL values:
SELECT ta.*, tb.*
FROM table_a ta
FULL OUTER JOIN table_b tb ON tb.PrimaryAddress = ta.PrimaryAddress
AND tb.AliasAddress = ta.AliasAddress
WHERE ta.PrimaryAddress IS NULL
OR tb.PrimaryAddress IS NULL
If I understand the question correctly, this should return what you ask for.

Here's how I did it, with a bit of throwing-hands-up-in-the-air at the end.
Step one, identify the sets of items to be compared. This is:
For a “primary” value, all values found in Alias
Including the “primary” value as well (to cover that nasa/nostromo case)
A set in a table (A or B) is identified by its primary value. What really makes it hard is that the primary value is not shared across the two tables (sally#mars, sally#nasa). So we can compare sets, but we have to be able to “go back” to the primary on each table separately (e.g. the stand-out from table B may be sally#nasa / ripley#nostromo, but we have to add sally#mars / ripley#nostromo to table A).
Major problems arise if, in a table, a primary value appears as an alias for a different primary value (e.g. in table A, chris#work appears as an alias for bob#test). For the sake of sanity, I am going to assume this will not happen… but if it does, the problem becomes even harder.
This query works to add missing items in B that are not in A, where the PrimaryAddress is the same for both A and B:
;WITH setA (SetId, FullSet)
as (-- Complete sets in A
select PrimaryAddress, AliasAdress
from A
union select PrimaryAddress, PrimaryAddress
from A
)
,setB (SetId, FullSet)
as (-- Complete sets in B
select PrimaryAddress, AliasAdress
from B
union select PrimaryAddress, PrimaryAddress
from B
)
,NotInB (Missing)
as (-- What's in A that's not in B
select FullSet
from setA
except select FullSet -- This is the secret sauce. Definitely worth your time to read up on how EXCEPT works.
from setB
)
-- Take the missing values plus their primaries from A and load them into B
INSERT B (PrimaryAddress, AliasAdress)
select A.PrimaryAddress, nB.Missing
from NotInB nB
inner join A
on A.AliasAdress = nb.Missing
Run it again with the tables reversed (from “NotInB” on) to do the same for A.
HOWEVER
Doing so with your sample data for "in B not in A" will add (sally#nasa, ripley#nostromo) to A, and as that’s a different primary, it’d create a new set, and so does not solve the problem. It gets ugly quickly. Talking it out from here:
Takes two passes, one for A not in B, one for B not in A
For each pass, have to do two checks
First check is what’s above: what’s in A not in B where primary addresses match, and add it
Second check is ugly: what’s in A not in B where the primary address from A is NOT a primary address in B and, thus, must be an alias. Here, find A’s primary address in B’s alias list, get the primary address used for this set in B, and create the row(s) in B using those values.
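For what it's worth, both checks can be folded into one query if you locate the matching set on the other side through any shared member rather than through the primary. The sketch below is not the T-SQL above: it runs the idea in SQLite through Python's sqlite3 module so it can be tried anywhere. The helper missing_rows and the CTE names src_set, dst_set and peer are my own, and it assumes, as above, that a primary never appears as an alias of a different primary within one table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE A (PrimaryAddress TEXT, AliasAdress TEXT);
CREATE TABLE B (PrimaryAddress TEXT, AliasAdress TEXT);
INSERT INTO A VALUES
  ('chris#work','chris#home'), ('chris#work','c#work'),
  ('chris#work','theboss#work'), ('chris#work','thatguy#aol'),
  ('bob#test','test1#test'), ('bob#test','charles#work'), ('bob#test','chuck#aol'),
  ('sally#mars','sally#nasa'), ('sally#mars','sally#gmail');
INSERT INTO B VALUES
  ('chris#home','chris#work'), ('chris#home','c#work'),
  ('chris#home','theboss#work'), ('chris#home','thatguy#aol'),
  ('bob#test','test1#test'), ('bob#test','charles#work'),
  ('sally#nasa','sally#mars'), ('sally#nasa','sally#gmail'),
  ('sally#nasa','ripley#nostromo');
""")

# Hypothetical helper query: {src} is the table that has the rows,
# {dst} is the table that may be missing them.
QUERY = """
WITH src_set (SetId, Member) AS (   -- every alias plus the primary itself
    SELECT PrimaryAddress, AliasAdress FROM {src}
    UNION
    SELECT PrimaryAddress, PrimaryAddress FROM {src}
),
dst_set (SetId, Member) AS (
    SELECT PrimaryAddress, AliasAdress FROM {dst}
    UNION
    SELECT PrimaryAddress, PrimaryAddress FROM {dst}
)
SELECT DISTINCT d.SetId, s.Member   -- the destination's primary, the missing address
FROM src_set AS s
JOIN src_set AS peer ON peer.SetId = s.SetId    -- other members of the same source set
JOIN dst_set AS d    ON d.Member = peer.Member  -- the destination set sharing a member
WHERE s.Member NOT IN (SELECT Member FROM dst_set)
ORDER BY d.SetId, s.Member
"""

def missing_rows(src, dst):
    """Rows (primary as used in dst, address) present in src but absent from dst."""
    return conn.execute(QUERY.format(src=src, dst=dst)).fetchall()

missing_from_b = missing_rows("A", "B")
missing_from_a = missing_rows("B", "A")
print("add to B:", missing_from_b)
print("add to A:", missing_from_a)
```

Note that the row to add to A comes back with A's own primary (sally#mars), not the primary used in B, which is the "go back to the primary on each table" step described above.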

OK, this is how we did it. As it was becoming a pain, we ran a procedure that added the primary address of each entry as an alias (xx#xx -> xx#xx) so that all addresses were listed as aliases for each user. This is similar to what @Phillip Kelly did above. Then we ran the following code (it's messy but it works, and in one pass too):
SELECT 'Missing from B:' as Reason, TableA.[primary] as APrimary, TableA.[alias] as AAlias, TableB.[primary] as BPrimary,TableB.[alias] as BAlias into #A FROM dbo.TableA LEFT OUTER JOIN TableB ON TableB.alias = TableA.alias
SELECT 'Missing from A:' as Reason,TableA.[primary] as APrimary, TableA.[alias] as AAlias, TableB.[primary] as BPrimary,TableB.[alias] as BAlias into #B FROM dbo.TableB LEFT OUTER JOIN TableA ON TableA.alias = TableB.alias
select * from #A
select * from #B
UPDATE #A
SET #A.APrimary = #B.BPrimary
FROM #B INNER JOIN #A ON #A.APrimary = #B.BPrimary
WHERE #A.BPrimary IS NULL
UPDATE #B
SET #B.BPrimary = #A.APrimary
FROM #B INNER JOIN #A ON #B.BPrimary = #A.BPrimary
WHERE #B.APrimary IS NULL
select * from #A
select * from #B
select * into #result from (
select Reason, BPrimary as [primary], BAlias as [alias] from #B where APrimary IS NULL
union
select Reason, APrimary as [primary], AAlias as [alias] from #A where BPrimary IS NULL
) as tmp
select * from #result
drop table #A
drop table #B
drop table #result
GO

Related

Insert data from Table A to Table B, then Update Table A with Table B ID

I currently have two tables like the following:
Table A
TableAId | TableAPrivateField | CommonField1 | CommonField2 | CommonField.. | TableBGeneratedId
1 | datadatadata | datadatadata2 | datadatadata3 | d... | NULL
2 | datadatadata5 | datadatadata6 | datadatadata7 | d... | NULL
...
Table B
TableBId | CommonField1 | CommonField2 | CommonField..
...
What I want to do is insert into TableB some records fetched from TableA, and then update the column [TableBGeneratedId] of TableA with the corresponding new Id from the inserted record in TableB.
I tried declaring a table variable and using it with the OUTPUT clause, but I can't find a way to relate back to the original TableAId of the row that acted as the source for the insert.
Something like this:
DECLARE @InsertedTableB TABLE (
TableBId INT PRIMARY KEY
);
INSERT INTO TableB
OUTPUT inserted.TableBId INTO @InsertedTableB
SELECT CommonField1, CommonField2,..
FROM TableA
WHERE TableAPrivateField = 'MyCondition';
WITH NumberedTableA AS(
SELECT TableAId, ROW_NUMBER() OVER(ORDER BY TableAId) AS RowNum
FROM TableA
WHERE TableAPrivateField = 'MyCondition'
),
NumberedInsert AS(
SELECT TableBId, ROW_NUMBER() OVER(ORDER BY TableBId) AS RowNum
FROM @InsertedTableB
)
UPDATE TableA
SET GeneratedTableBId = NumberedInsert.TableBId
FROM TableA
JOIN NumberedTableA ON TableA.TableAId = NumberedTableA.TableAId
JOIN NumberedInsert ON NumberedTableA.RowNum = NumberedInsert.RowNum
My problem is that even though the query works, I have no guarantee that the order of the fetched records will be the same, so I would risk linking back the wrong Ids. I tried to figure out some different solutions, but the closest one I found was to temporarily add a column to TableB containing TableAId and then perform the update. I disliked it because this operation needs to be executed frequently and it would be too performance demanding. Adding the column permanently also isn't an acceptable solution, sadly.
Does anyone have a suggestion on how to solve this?
If you use MERGE rather than INSERT (but still only ever insert with the MERGE by using a condition that will never be met e.g. 1=0), you can capture both the ID from TableA, and the new ID from tableB in the OUTPUT clause and insert this to your table variable. Then use this table variable to update tableA:
DECLARE @InsertedTableB TABLE (TableBId INT PRIMARY KEY, TableAId INT NOT NULL);
MERGE INTO dbo.TableB AS b
USING
( SELECT TableAId, CommonField1, CommonField2
FROM dbo.TableA
WHERE TableAPrivateField = 'MyCondition'
) AS a
ON 1 = 0 -- <<<< Always false so will never match and only ever insert
WHEN NOT MATCHED THEN
INSERT (CommonField1, CommonField2)
VALUES (a.CommonField1, a.CommonField2)
OUTPUT inserted.TableBId, a.TableAId INTO @InsertedTableB (TableBId, TableAId);
UPDATE a
SET GeneratedTableBId = b.TableBId
FROM dbo.TableA AS a
INNER JOIN @InsertedTableB AS b
ON b.TableAId = a.TableAId;
Whenever I post any answer that in any way condones the use of MERGE, it is met with at least one comment highlighting all of the bugs with it, so to pre-empt that: there are a lot of issues with using MERGE in SQL Server, but I do not believe that any of those risks apply in this scenario if you are (a) forcing an insert and (b) using a table as the target. So while I will always avoid MERGE where I can by using multiple statements, this is one scenario where I don't avoid it, because I don't think there is a cleaner solution available without using MERGE. It is anecdotal, but I have used this method for years and have never once encountered an issue.
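If you want to see the underlying idea (record the source Id and the generated Id together, instead of pairing them up by row order afterwards) outside SQL Server, here is a rough sketch. It is deliberately not the MERGE technique: SQLite has no MERGE ... OUTPUT, so this swaps in a row-by-row insert driven from Python's sqlite3 module and reads cursor.lastrowid after each insert. The sample rows and the id_map dictionary are made up for illustration, and a loop like this will be slower than a set-based statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TableB (
    TableBId INTEGER PRIMARY KEY AUTOINCREMENT,
    CommonField1 TEXT,
    CommonField2 TEXT
);
CREATE TABLE TableA (
    TableAId INTEGER PRIMARY KEY,
    TableAPrivateField TEXT,
    CommonField1 TEXT,
    CommonField2 TEXT,
    GeneratedTableBId INTEGER
);
-- Illustrative data only: one pre-existing row in TableB, three rows in TableA.
INSERT INTO TableB (CommonField1, CommonField2) VALUES ('old', 'row');
INSERT INTO TableA VALUES
    (1, 'MyCondition', 'a1', 'a2', NULL),
    (2, 'Other',       'b1', 'b2', NULL),
    (3, 'MyCondition', 'c1', 'c2', NULL);
""")

# Read the source rows first so the loop below does not depend on an open cursor.
source_rows = conn.execute(
    "SELECT TableAId, CommonField1, CommonField2 FROM TableA "
    "WHERE TableAPrivateField = 'MyCondition' ORDER BY TableAId"
).fetchall()

id_map = {}  # TableAId -> newly generated TableBId
with conn:   # one transaction: either every link is written or none is
    for table_a_id, field1, field2 in source_rows:
        cur = conn.execute(
            "INSERT INTO TableB (CommonField1, CommonField2) VALUES (?, ?)",
            (field1, field2),
        )
        id_map[table_a_id] = cur.lastrowid  # id generated by this exact insert
    conn.executemany(
        "UPDATE TableA SET GeneratedTableBId = ? WHERE TableAId = ?",
        [(b_id, a_id) for a_id, b_id in id_map.items()],
    )

linked = conn.execute(
    "SELECT TableAId, GeneratedTableBId FROM TableA ORDER BY TableAId"
).fetchall()
print(linked)
```

Because each generated Id is captured at the moment its row is inserted, nothing depends on the two sides happening to come back in the same order.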

SQL-Server: Updating table from another table

I have two tables, one which represents some data and one that links two pieces of data together.
The first, Redaction, has three columns; ID, X, Y.
The second, LinkedRedactions, has two columns; PrimaryID, SecondaryID, which are the IDs of two of the rows from Redaction that are linked, and need to have the same X and Y value.
What I want to do is update the values of X and Y in Redaction for the SecondaryIDs if they are not already the same as the values for X and Y for the corresponding PrimaryID.
Unfortunately I cannot use a TRIGGER since the scripts will be running on kCura's Relativity platform, which doesn't allow them. A SQL script would be ideal, which would be run every few seconds by an agent.
I've tried declaring a table variable and updating from that, but that gives me the error
"must declare scalar variable @T"
DECLARE @T TABLE (
[ID] INT, [X] INT, [Y] INT
)
INSERT INTO @T
SELECT
[ID], [X], [Y]
FROM
[Redaction] AS R
WHERE
[ID] IN (
SELECT [PrimaryID] FROM [LinkedRedactions]
)
UPDATE
[Redaction]
SET
[X] = @T.[X], [Y] = @T.[Y]
WHERE
[Redaction].[ID] IN (
SELECT [ID] FROM @T
)
Disclaimer: This is only my second day of SQL, so more descriptive answers would be appreciated
The entire code can be simplified using inner joins.
UPDATE red
SET [X] = redPrimary.[X], [Y] = redPrimary.[Y]
FROM [Redaction] red
INNER JOIN [LinkedRedactions] redLnk ON red.[ID] = redLnk.SecondaryID
INNER JOIN [Redaction] redPrimary ON redLnk.PrimaryID = redPrimary.[ID]
Explanation:
[Redaction] red
[LinkedRedactions] redLnk
[Redaction] redPrimary
red, redLnk and redPrimary are called aliases and they're used to call the table by using a different name.
INNER JOIN
This is a type of join that only matches if the same column value exists on both the left and the right table.
UPDATE red
--SET statement
FROM [Redaction] red
This updates only the [Redaction] table via its alias 'red'.
INNER JOIN [LinkedRedactions] redLnk ON red.[ID] = redLnk.SecondaryID
This joins the Link table and the table to be updated by the secondary IDs and ID respectively.
INNER JOIN [Redaction] redPrimary ON redLnk.PrimaryID = redPrimary.[ID]
This joins the Link table and the [Redaction] table again but uses the Primary ID and ID columns respectively. This is a self join which allows us to update a set of values in a table with a different set of values from the same table.
No WHERE conditions are needed since the conditions are handled in the ON clauses.
You can use UPDATE FROM:
UPDATE [Redaction]
SET
[X] = T.[X],
[Y] = T.[Y]
FROM
@T T
WHERE
[Redaction].[ID] = T.[ID]
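As a rough, runnable illustration of copying X and Y from each primary row to its linked secondary row, and only where they differ, here is a sketch in SQLite via Python's sqlite3 module. It is not T-SQL: it uses correlated subqueries instead of UPDATE ... FROM, the sample rows are invented, and it assumes X and Y are never NULL (the <> comparison would skip NULLs).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Redaction (ID INTEGER PRIMARY KEY, X INTEGER, Y INTEGER);
CREATE TABLE LinkedRedactions (PrimaryID INTEGER, SecondaryID INTEGER);
-- Invented sample data: 2 is out of sync with 1, 4 already matches 3, 5 is unlinked.
INSERT INTO Redaction VALUES (1, 10, 20), (2, 0, 0), (3, 5, 5), (4, 5, 5), (5, 7, 7);
INSERT INTO LinkedRedactions VALUES (1, 2), (3, 4);
""")

# "Redaction" inside the subqueries refers to the row being updated;
# "p" is the linked primary row read from the same table.
SYNC = """
UPDATE Redaction
SET X = (SELECT p.X
         FROM LinkedRedactions AS l
         JOIN Redaction AS p ON p.ID = l.PrimaryID
         WHERE l.SecondaryID = Redaction.ID),
    Y = (SELECT p.Y
         FROM LinkedRedactions AS l
         JOIN Redaction AS p ON p.ID = l.PrimaryID
         WHERE l.SecondaryID = Redaction.ID)
WHERE EXISTS (
    SELECT 1
    FROM LinkedRedactions AS l
    JOIN Redaction AS p ON p.ID = l.PrimaryID
    WHERE l.SecondaryID = Redaction.ID
      AND (p.X <> Redaction.X OR p.Y <> Redaction.Y)  -- only rows that actually differ
)
"""

with conn:
    changed = conn.execute(SYNC).rowcount

rows = conn.execute("SELECT ID, X, Y FROM Redaction ORDER BY ID").fetchall()
print(changed, rows)
```

The EXISTS filter is what keeps an agent that runs every few seconds from rewriting rows that are already in sync.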

DELETE WITH INTERSECT

I have two tables with the same number of columns and no primary keys (I know, this is not my fault). Now I need to delete all rows from table A that exist in table B (they are equal, each one with 30 columns).
The most immediate way I thought of is to do an INNER JOIN and solve my problem. But writing conditions for all columns (and worrying about NULL) is not elegant (maybe because my tables are not elegant either).
I want to use INTERSECT, but I don't know how to do it. This is my first question.
I tried:
declare @A table (value int, username varchar(20))
declare @B table (value int, username varchar(20))
insert into @A values (1, 'User 1'), (2, 'User 2'), (3, 'User 3'), (4, 'User 4')
insert into @B values (2, 'User 2'), (4, 'User 4'), (5, 'User 5')
DELETE @A
FROM (SELECT * FROM @A INTERSECT SELECT * from @B) A
But all rows were deleted from table @A.
This led me to my second question: why does the command DELETE @A FROM @B delete all rows from table @A?
Try this:
DELETE a
FROM @A a
WHERE EXISTS (SELECT a.* INTERSECT SELECT * FROM @B)
Delete from @A where, for each record in @A, there is a match where the record in @A intersects with a record in @B.
This is based on Paul White's blog post using INTERSECT for inequality checking.
To answer your first question you can delete based on join:
delete a
from @a a
join @b b on a.value = b.value and a.username = b.username
The second case is really strange. I remember a similar case here and many complaints about this behaviour. I will try to find that question.
You can use Giorgi's answer to delete the rows you need.
As for the question regarding why all rows were deleted, that's because there is no limiting condition. Your FROM clause gets a table to process, but there is no WHERE clause (or join condition) tying it to @A, so nothing prevents every row from being deleted from @A.
1. Create a table (T) defining the primary keys
2. Insert all records from A into T (I will assume there are no duplicates in A)
3. Try to insert all records from B in T
3A. If an insert fails, delete it from B (it already exists)
4. Drop T (you really shouldn't !!!)
Giorgi's answer explicitly compares all columns, which you wanted to avoid.
It is possible to write code that doesn't list all columns explicitly.
EXCEPT produces the result set that you need, but I don't know a good way to use this result set to DELETE original rows from A without primary key. So, the solution below saves this intermediary result in a temporary table using SELECT * INTO. Then deletes everything from A and copies temporary result into A. Wrap it in a transaction.
-- generate the final result set that we want to have and save it in a temporary table
SELECT *
INTO #t
FROM
(
SELECT * FROM @A
EXCEPT
SELECT * FROM @B
) AS E;
-- copy temporary result back into A
DELETE FROM @A;
INSERT INTO @A
SELECT * FROM #t;
DROP TABLE #t;
-- check the result
SELECT * FROM @A;
result set
value username
1 User 1
3 User 3
The good side of this solution is that it uses * instead of the full list of columns. Of course, you can list all columns explicitly as well. It will still be easier to write and handle, than writing comparisons of all columns and taking care of possible NULLs.
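Here is the same rebuild-via-EXCEPT idea as a small runnable sketch, written for SQLite through Python's sqlite3 module rather than T-SQL (so CREATE TEMP TABLE ... AS stands in for SELECT * INTO). The extra (NULL, 'User 6') row is my own addition to show that EXCEPT treats NULLs as equal, which is the reason to prefer it over hand-written column comparisons; surviving_rows is just a name I picked.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE A (value INTEGER, username TEXT);
CREATE TABLE B (value INTEGER, username TEXT);
INSERT INTO A VALUES (1, 'User 1'), (2, 'User 2'), (3, 'User 3'), (4, 'User 4'),
                     (NULL, 'User 6');   -- added row: NULL in a compared column
INSERT INTO B VALUES (2, 'User 2'), (4, 'User 4'), (5, 'User 5'),
                     (NULL, 'User 6');
""")

with conn:  # the rebuild runs inside one transaction
    # Rows of A that have no identical row in B; no column list is needed.
    conn.execute(
        "CREATE TEMP TABLE surviving_rows AS "
        "SELECT * FROM A EXCEPT SELECT * FROM B"
    )
    conn.execute("DELETE FROM A")
    conn.execute("INSERT INTO A SELECT * FROM surviving_rows")
    conn.execute("DROP TABLE surviving_rows")

remaining = conn.execute("SELECT value, username FROM A ORDER BY value").fetchall()
print(remaining)
```

One caveat to keep in mind: EXCEPT returns distinct rows, so if A holds exact duplicates of a surviving row they are collapsed into one by this approach.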

How to combine three tables into a new table

I have three tables, all with the same column headings, and I would like to create one single table from all three.
I'd also, if it is at all possible, like to create a trigger so that when one of these three source tables is edited, the change is copied into the new combined table.
I would normally do this as a view, however due to constraints on the STSrid, I need to create a table, not a view.
Edit* Right, this is a bit ridiculous but anyhow.
I HAVE THREE TABLES
THERE ARE NO DUPLICATES IN ANY OF THE THREE TABLES
I WANT TO COMBINE THE THREE TABLES INTO ONE TABLE
CAN SOMEONE HELP PROVIDE THE SAMPLE SQL CODE TO DO THIS
ALSO IS IT POSSIBLE TO CREATE TRIGGERS SO THAT WHEN ONE OF THE THREE TABLES IS EDITED THE CHANGE IS PASSED TO THE COMBINED TABLE
I CAN NOT CREATE A VIEW DUE TO THE FACT THAT THE COMBINED TABLE NEEDS TO HAVE A DIFFERENT STSrid TO THE SOURCE TABLES, CREATING A VIEW DOES NOT ALLOW ME TO DO THIS, NOR DOES AN INDEXED VIEW.
Edit* I Have Table A,Table B and Table C all with columns ORN, Geometry and APP_NUMBER. All the information is different so
Table A (I'm not going to give an example geometry column)
ORN ID
123 14/0045/F
124 12/0002/X
Table B (I'm not going to give an example geometry column)
ORN ID
256 05/0005/D
989 12/0012/X
Table C (I'm not going to give an example geometry column)
ORN ID
043 13/0045/D
222 11/0002/A
I want one complete table of all info
Table D
ORN ID
123 14/0045/F
124 12/0002/X
256 05/0005/D
989 12/0012/X
043 13/0045/D
222 11/0002/A
Any help would be greatly appreciated.
Thanks
If the creation of the table is a one time thing you can use a select into combined with a union like this:
select * into TableD from
(
select * from TableA
union all
select * from TableB
union all
select * from TableC
) UnionedTables
As for the trigger, it should be easy to set up an AFTER INSERT trigger like this:
CREATE TRIGGER insert_trigger
ON TableA
AFTER INSERT AS
insert TableD (columns...) select (columns...) from inserted
Obviously you will have to change the columns... to match your structure.
I haven't checked the syntax though, so it might not be perfect and it could need some adjustment, but it should give you an idea, I hope.
If IDs are not duplicated it will be easy to achieve; otherwise you must add an OriginatedFrom column. You can also create a lot of INSTEAD OF triggers (not only for insert but for delete and update), but that's a lazy excuse for not refactoring the app.
Also pay attention to any references to the data: since it's a RELATIONAL model, it is likely that other tables are related to the table you are about to drop.
This is the code to create table D:
drop table D;
Select * into D from (select * from A Union all select * from B Union all select * from C) as U; -- the derived table needs an alias
It's rather simple. Just create Table_D first:
CREATE TABLE TABLE_D
(
ORN INT,
ID VARCHAR(20),
Column3 Datatype
)
GO
Use an INSERT statement to load this table, SELECTing from the other three tables with the UNION ALL operator.
INSERT INTO TABLE_D (ORN , ID, Column3)
SELECT ORN , ID, Column3
FROM Table_A
UNION ALL
SELECT ORN , ID, Column3
FROM Table_B
UNION ALL
SELECT ORN , ID, Column3
FROM Table_C
Trigger
You will need to create this trigger on all of the tables.
CREATE TRIGGER tr_Insert_Table_A
ON TABLE_A
FOR INSERT
AS
BEGIN
SET NOCOUNT ON;
INSERT INTO TABLE_D (ORN , ID, Column3)
SELECT i.ORN , i.ID, i.Column3
FROM Inserted i LEFT JOIN TABLE_D D
ON i.ORN = D.ORN
WHERE D.ORN IS NULL
END
Read here to learn more about SQL Server Triggers
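Since the question also asks about edits, not just inserts, here is a small runnable sketch of the whole idea with insert, update and delete triggers on each source table. It is written for SQLite via Python's sqlite3 module, so the trigger syntax differs from SQL Server: SQLite triggers fire per row and use NEW/OLD instead of the inserted/deleted tables. The trigger names are made up, ORN is stored as an integer, the geometry column is left out, and the sketch assumes ORN is unique across all three tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TableA (ORN INTEGER PRIMARY KEY, ID TEXT);
CREATE TABLE TableB (ORN INTEGER PRIMARY KEY, ID TEXT);
CREATE TABLE TableC (ORN INTEGER PRIMARY KEY, ID TEXT);
INSERT INTO TableA VALUES (123, '14/0045/F'), (124, '12/0002/X');
INSERT INTO TableB VALUES (256, '05/0005/D'), (989, '12/0012/X');
INSERT INTO TableC VALUES (43,  '13/0045/D'), (222, '11/0002/A');

-- One-off creation of the combined table.
CREATE TABLE TableD AS
    SELECT ORN, ID FROM TableA
    UNION ALL SELECT ORN, ID FROM TableB
    UNION ALL SELECT ORN, ID FROM TableC;
""")

# The same three triggers on every source table (names are illustrative).
for src in ("TableA", "TableB", "TableC"):
    conn.executescript(f"""
    CREATE TRIGGER {src}_ins AFTER INSERT ON {src} BEGIN
        INSERT INTO TableD (ORN, ID) VALUES (NEW.ORN, NEW.ID);
    END;
    CREATE TRIGGER {src}_upd AFTER UPDATE ON {src} BEGIN
        UPDATE TableD SET ORN = NEW.ORN, ID = NEW.ID WHERE ORN = OLD.ORN;
    END;
    CREATE TRIGGER {src}_del AFTER DELETE ON {src} BEGIN
        DELETE FROM TableD WHERE ORN = OLD.ORN;
    END;
    """)

with conn:
    conn.execute("INSERT INTO TableA VALUES (300, '15/0001/F')")
    conn.execute("UPDATE TableB SET ID = '05/0005/E' WHERE ORN = 256")
    conn.execute("DELETE FROM TableC WHERE ORN = 222")

combined = conn.execute("SELECT ORN, ID FROM TableD ORDER BY ORN").fetchall()
print(combined)
```

If ORN could repeat across the source tables, the update and delete triggers would touch the wrong rows, which is where an OriginatedFrom column would come in.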

How does transact sql know which table I'm referencing in this subquery?

This is a question about documentation on how t-sql decides which "column" is in scope for subqueries. I tried google-ing which turned up this link but it didn't explain it.
Here's a runnable example. The update statement sets the only entry in #a.a to null. Presumably this is because the subquery reference to alias a resolves to table #b which has no rows that match value 1, thus returning null to the outer update query.
if object_id('tempdb..#a') is not null
drop table #a
if object_id('tempdb..#b') is not null
drop table #b
create table #a (a int)
create table #b (a int)
insert into #a values (1)
insert into #b values (2)
update a
set a = (select a from #b as a where a.a = 1)
from #a as a
Is there documentation that indicates this design choice? It is otherwise ambiguous, because if I change the update statement to use a different alias, the final value in #a.a is 2:
update aa
set a = (select a from #b as a where aa.a = 1)
from #a as aa
This reference might do a better job of explaining it.
The idea is quite simple. A table alias is interpreted as the "first" table definition, starting with the current level of the subquery and then moving outward. A table alias in a subquery cannot be used in an outer query, so references can only move "inward".
In your example:
update a
set a = (select a from #b as a where a.a = 1)
from #a as a
The a.a is referring to column a of table a. In the subquery itself, table a is defined as #b. That is the reference.
In this query:
update aa
set a = (select a from #b as a where aa.a = 1)
from #a as aa;
The table alias is aa. This is not defined in the subquery. It is defined at the next level out, so it refers to #a.
In general, don't give different tables the same alias in a query (with the exception of aliases on subqueries that are essentially just a filtered/selected version of a specific table). That can just lead to confusion.
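The innermost-first rule is easy to check for yourself. The sketch below reproduces the two cases as plain SELECTs in SQLite through Python's sqlite3 module (not T-SQL, and with ordinary tables ta and tb standing in for #a and #b), so it only illustrates the scoping rule, not the UPDATE itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ta (a INTEGER);
CREATE TABLE tb (a INTEGER);
INSERT INTO ta VALUES (1);
INSERT INTO tb VALUES (2);
""")

# Case 1: the subquery defines its own alias "a" (for tb), so a.a means tb.a.
# No row in tb has a = 1, so the scalar subquery yields NULL.
shadowed = conn.execute(
    "SELECT (SELECT a FROM tb AS a WHERE a.a = 1) FROM ta AS a"
).fetchall()

# Case 2: "aa" is not defined inside the subquery, so it resolves one level out
# to ta. aa.a = 1 is true for the outer row, and the subquery returns tb.a = 2.
outer_ref = conn.execute(
    "SELECT (SELECT a FROM tb AS a WHERE aa.a = 1) FROM ta AS aa"
).fetchall()

print(shadowed, outer_ref)
```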
In your first example there is no relationship between the outer and inner query, and so you are setting the value of column 'a' to the result of the inner query for every row in table #a. The inner query returns null, as there are no rows in #b which have the value 1, so the column a in #a is set to null.
In your second example, you are still not providing a relationship between the inner and outer query. All the inner query is doing is selecting every value from #b, because for every row in #b, the value of #a.a is 1. You might just as well have (select a from #b) as your inner query.
The reason that #a.a gets set to 2 is that there is only 1 row in the #b table, and its value is 2. If there were multiple rows in #b (say one with value 2 and another with value 3), the scalar subquery would return more than one value, and SQL Server would raise an error rather than pick one of them.
Either way these are not very good pieces of SQL IMHO.