SQL-Server: Updating table from another table

I have two tables, one which represents some data and one that links two pieces of data together.
The first, Redaction, has three columns: ID, X, Y.
The second, LinkedRedactions, has two columns: PrimaryID and SecondaryID. These are the IDs of two rows from Redaction that are linked and need to have the same X and Y values.
What I want to do is update the values of X and Y in Redaction for each SecondaryID if they are not already the same as the X and Y values for the corresponding PrimaryID.
Unfortunately I cannot use a TRIGGER since the scripts will be running on kCura's Relativity platform, which doesn't allow them. A SQL script would be ideal, which would be run every few seconds by an agent.
I've tried declaring a temporary table and updating from that, but that gives me the error
"must declare scalar variable @T"
DECLARE @T TABLE (
[ID] INT, [X] INT, [Y] INT
)
INSERT INTO @T
SELECT
[ID], [X], [Y]
FROM
[Redaction] AS R
WHERE
[ID] IN (
SELECT [PrimaryID] FROM [LinkedRedactions]
)
UPDATE
[Redaction]
SET
[X] = @T.[X], [Y] = @T.[Y]
WHERE
[Redaction].[ID] IN (
SELECT [ID] FROM @T
)
Disclaimer: This is only my second day of SQL, so more descriptive answers would be appreciated

The entire code can be simplified using inner joins.
UPDATE red
SET [X] = redPrimary.[X], [Y] = redPrimary.[Y]
FROM [Redaction] red
INNER JOIN [LinkedRedactions] redLnk ON red.[ID] = redLnk.SecondaryID
INNER JOIN [Redaction] redPrimary ON redLnk.PrimaryID = redPrimary.[ID]
Explanation:
[Redaction] red
[LinkedRedactions] redLnk
[Redaction] redPrimary
red, redLnk and redPrimary are aliases: short names used to refer to a table in the rest of the statement.
INNER JOIN
This type of join returns a row only when the join condition finds a match in both the left and the right table.
UPDATE red
--SET statement
FROM [Redaction] red
This updates only the [Redaction] table via its alias 'red'.
INNER JOIN [LinkedRedactions] redLnk ON red.[ID] = redLnk.SecondaryID
This joins the link table to the table being updated, matching SecondaryID to ID.
INNER JOIN [Redaction] redPrimary ON redLnk.PrimaryID = redPrimary.[ID]
This joins the Link table and the [Redaction] table again but uses the Primary ID and ID columns respectively. This is a self join which allows us to update a set of values in a table with a different set of values from the same table.
No WHERE conditions are needed since the conditions are handled in the ON clauses.
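The question asks to touch only rows whose values differ from the primary row. A minimal sketch of that variant is below; it assumes X and Y are never NULL (a NULL on either side would need an extra IS NULL check), and the WHERE clause is an optional addition that only avoids rewriting rows that are already in sync.

```sql
-- Copy X and Y from each primary row to its linked secondary row,
-- skipping secondary rows that already match.
-- Assumption: [X] and [Y] are NOT NULL columns.
UPDATE red
SET    red.[X] = redPrimary.[X],
       red.[Y] = redPrimary.[Y]
FROM   [Redaction] AS red
INNER JOIN [LinkedRedactions] AS redLnk
        ON red.[ID] = redLnk.[SecondaryID]
INNER JOIN [Redaction] AS redPrimary
        ON redLnk.[PrimaryID] = redPrimary.[ID]
WHERE  red.[X] <> redPrimary.[X]
   OR  red.[Y] <> redPrimary.[Y];
```

Since the script is meant to run every few seconds, skipping unchanged rows keeps each run from issuing writes when nothing has changed.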

You can use UPDATE FROM
UPDATE [Redaction]
SET
[X] = T.[X],
[Y] = T.[Y]
FROM
@T T
WHERE
[Redaction].[ID] = T.[ID]

Related

compare primary/alias groups across two tables

Gday,
We have two tables with exactly the same structure. There are two columns, "PrimaryAddress" and "AliasAddress", which hold email addresses and their aliases. We want to find any records that need to be added to either side to keep the tables in sync. The catch is that the primary address in one table might be listed as an alias in the other. The good news is that an address won't appear twice in the "AliasAddress" column.
TABLE A
PrimaryAddress     AliasAdress
chris#work         chris#home
chris#work         c#work
chris#work         theboss#work
chris#work         thatguy#aol
bob#test           test1#test
bob#test           charles#work
bob#test           chuck#aol
sally#mars         sally#nasa
sally#mars         sally#gmail
TABLE B
PrimaryAddress     AliasAdress
chris#home         chris#work
chris#home         c#work
chris#home         theboss#work
chris#home         thatguy#aol
bob#test           test1#test
bob#test           charles#work
sally#nasa         sally#mars
sally#nasa         sally#gmail
sally#nasa         ripley#nostromo
The expected result is to return the following missing records from both tables:
bob#test           chuck#aol
sally#nasa         ripley#nostromo
Note that the chris#* block is a total match because the sum of all the aliases (plus the primary) is still the same regardless of which address is considered primary. It doesn't matter which address is primary as long as the entire primary group contains the same entries in both tables.
I don't mind if this is run in two passes, A->B and B->A, but I just can't get my head around a solution.
Any help appreciated :)
IF OBJECT_ID('tempdb..#TABLEA') IS NOT NULL DROP TABLE #TABLEA -- drop only if it already exists
CREATE TABLE #TABLEA
([PrimaryAddress] varchar(10), [AliasAdress] varchar(12))
;
INSERT INTO #TABLEA
([PrimaryAddress], [AliasAdress])
VALUES
('chris#work', 'chris#home'),
('chris#work', 'c#work'),
('chris#work', 'theboss#work'),
('chris#work', 'thatguy#aol'),
('bob#test', 'test1#test'),
('bob#test', 'charles#work'),
('bob#test', 'chuck#aol'),
('sally#mars', 'sally#nasa'),
('sally#mars', 'sally#gmail')
;
IF OBJECT_ID('tempdb..#TABLEB') IS NOT NULL DROP TABLE #TABLEB -- drop only if it already exists
CREATE TABLE #TABLEB
([PrimaryAddress] varchar(10), [AliasAdress] varchar(15))
;
INSERT INTO #TABLEB
([PrimaryAddress], [AliasAdress])
VALUES
('chris#home', 'chris#work'),
('chris#home', 'c#work'),
('chris#home', 'theboss#work'),
('chris#home', 'thatguy#aol'),
('bob#test', 'test1#test'),
('bob#test', 'charles#work'),
('sally#nasa', 'sally#mars'),
('sally#nasa', 'sally#gmail'),
('sally#nasa', 'ripley#nostromo')
;
Try the following:
select a.PrimaryAddress, a.AliasAdress
from #TABLEA a
left join #TABLEB b
on a.AliasAdress = b.AliasAdress or b.PrimaryAddress = a.AliasAdress
where b.PrimaryAddress is null
union all
select a.PrimaryAddress, a.AliasAdress
from #TABLEB a
left join #TABLEA b
on a.AliasAdress = b.AliasAdress or b.PrimaryAddress = a.AliasAdress
where b.PrimaryAddress is null
So you want to compare tables A and B and find rows which are unique to either table. How about an outer join, followed by looking for NULL values:
SELECT ta.*, tb.*
FROM table_a ta
FULL OUTER JOIN table_b tb ON tb.PrimaryAddress = ta.PrimaryAddress
AND tb.AliasAddress = ta.AliasAddress
WHERE ta.PrimaryAddress IS NULL
OR tb.PrimaryAddress IS NULL
If I understand the question correctly, this should return what you ask for.
Here's how I did it, with a bit of throwing-hands-up-in-the-air at the end.
Step one, identify the sets of items to be compared. This is:
For a “primary” value, all values found in Alias
Including the “primary” value as well (to cover that nasa/nostromo case)
A set in a table (A or B) is identified by its primary value. What really makes it hard is that the primary value is not shared across the two tables (sally#mars, sally#nasa). So we can compare sets, but we have to be able to "go back" to the primary on each table separately (e.g. the stand-out from table B may be sally#nasa / ripley#nostromo, but we have to add sally#mars / ripley#nostromo to table A).
Major problems arise if, in a table, a primary value appears as an alias for a different primary value (e.g. in table A, chris#work appears as an alias for bob#test). For the sake of sanity, I am going to assume this will not happen… but if it does, the problem becomes even harder.
This query works to add missing items in B that are not in A, where the PrimaryAddress is the same for both A and B:
;WITH setA (SetId, FullSet)
as (-- Complete sets in A
select PrimaryAddress, AliasAdress
from A
union select PrimaryAddress, PrimaryAddress
from A
)
,setB (SetId, FullSet)
as (-- Complete sets in B
select PrimaryAddress, AliasAdress
from B
union select PrimaryAddress, PrimaryAddress
from B
)
,NotInB (Missing)
as (-- What's in A that's not in B
select FullSet
from setA
except select FullSet -- This is the secret sauce. Definitely worth your time to read up on how EXCEPT works.
from setB
)
-- Take the missing values plus their primaries from A and load them into B
INSERT B (PrimaryAddress, AliasAdress)
select A.PrimaryAddress, nB.Missing
from NotInB nB
inner join A
on A.AliasAdress = nb.Missing
Run it again with the tables reversed (from “NotInB” on) to do the same for A.
HOWEVER
Doing so with your sample data for "in B not in A" will add (sally#nasa, ripley#nostromo) to A, and as that’s a different primary, it’d create a new set, and so does not solve the problem. It gets ugly quickly. Talking it out from here:
Takes two passes, one for A not in B, one for B not in A
For each pass, have to do two checks
First check is what’s above: what’s in A not in B where primary addresses match, and add it
Second check is ugly: what's in A not in B where the primary address from A is NOT a primary address in B and, thus, must be an alias. Here, find A's primary address in B's alias list, get the primary address used for this set in B, and create the row(s) in B using those values.
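The two checks above can be sketched as one query by pairing each group in A with the group in B that shares at least one address. This is only a sketch: it assumes the #TABLEA and #TABLEB temp tables from the question, it relies on the stated assumption that an address belongs to only one group per table, and GroupMap is a name made up for this example.

```sql
WITH setA AS (
    -- Every address in A (aliases plus the primary itself), keyed by A's primary
    SELECT PrimaryAddress AS SetId, AliasAdress AS Member FROM #TABLEA
    UNION
    SELECT PrimaryAddress, PrimaryAddress FROM #TABLEA
),
setB AS (
    -- Same for B
    SELECT PrimaryAddress AS SetId, AliasAdress AS Member FROM #TABLEB
    UNION
    SELECT PrimaryAddress, PrimaryAddress FROM #TABLEB
),
GroupMap AS (
    -- An A group and a B group are "the same" if they share any address
    SELECT DISTINCT a.SetId AS APrimary, b.SetId AS BPrimary
    FROM setA AS a
    INNER JOIN setB AS b ON b.Member = a.Member
)
-- Addresses present in A but absent from B, reported under B's primary
SELECT gm.BPrimary AS PrimaryAddress, a.Member AS AliasAdress
FROM setA AS a
INNER JOIN GroupMap AS gm ON gm.APrimary = a.SetId
WHERE NOT EXISTS (SELECT 1 FROM setB AS b WHERE b.Member = a.Member);
```

Swapping #TABLEA and #TABLEB gives the other direction. A group in A that shares no address at all with B has no entry in GroupMap and is not reported, so it would need separate handling.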
OK, this is how we did it. As it was becoming a pain, we ran a procedure that added the primary address of each entry as an alias (xx#xx -> xx#xx), so that all addresses were listed as aliases for each user. This is similar to what @Phillip Kelly did above. Then we ran the following code (it's messy but it works, and in one pass too):
SELECT 'Missing from B:' as Reason, TableA.[primary] as APrimary, TableA.[alias] as AAlias, TableB.[primary] as BPrimary, TableB.[alias] as BAlias
into #A
FROM dbo.TableA LEFT OUTER JOIN TableB ON TableB.alias = TableA.alias
SELECT 'Missing from A:' as Reason, TableA.[primary] as APrimary, TableA.[alias] as AAlias, TableB.[primary] as BPrimary, TableB.[alias] as BAlias
into #B
FROM dbo.TableB LEFT OUTER JOIN TableA ON TableA.alias = TableB.alias
select * from #A
select * from #B
UPDATE #A
SET #A.APrimary = #B.BPrimary
FROM #B INNER JOIN #A ON #A.APrimary = #B.BPrimary
WHERE #A.BPrimary IS NULL
UPDATE #B
SET #B.BPrimary = #A.APrimary
FROM #B INNER JOIN #A ON #B.BPrimary = #A.BPrimary
WHERE #B.APrimary IS NULL
select * from #A
select * from #B
select * into #result from (
select Reason, BPrimary as [primary], BAlias as [alias] from #B where APrimary IS NULL
union
select Reason, APrimary as [primary], AAlias as [alias] from #A where BPrimary IS NULL
) as tmp
select * from #result
drop table #A
drop table #B
drop table #result
GO

SQL - Stripping a string and using it in a condition

So I have a SQL query issue given to me which I'm struggling to resolve:
It currently brings back 6710445 rows, but I need to apply further conditions based on a particular string field.
SELECT
Table1.ExampleColumn1 -- (ID)
,Table1.ExampleColumn2
,Table2.ExampleColumn3
,Table2.ExampleColumn4
,Table3.ExampleColumn5
,Table3.ExampleColumn6
,Table1.StringField
FROM [Example Database].[dbo].[Table1] AS Table1
INNER JOIN [Example Database].[dbo].[Table2] AS Table2
ON Example = Example
INNER JOIN [Example Database].[dbo].[Table3] AS Table3
ON Example = Example
WHERE Month BETWEEN 201304 AND 201603
AND (Age < 19)
The above 'Table1.StringField' holds type codes as a single string in each row, for example: "||J183,Y752,J374,Y752."
I also have a reference table (call it 'Ref1') containing 514 of these codes, one per row; the table has no other fields.
What I need is to keep only the rows from the query above where 'Table1.StringField' contains at least one of the codes from 'Ref1', and to exclude every other row from the result set.
I tried stripping the commas and the "||" from the 'StringField' column, but it didn't work as well as I hoped and ended up bringing back over 30M rows.
Any ideas on how to do this? Preferably so that it's efficient and doesn't make the user wait 10 minutes just to query it.
Maybe this will get you half way there... I also agree with Sean Lange's comment about not storing delimited data to begin with but I'm assuming the OP already knows this. You can also pivot/unpivot this data to achieve this as well. This is probably the most brute force way of doing sort of what you're looking to do.
--DROP TABLE #Table
--DROP TABLE #Ref
CREATE TABLE #Table (Col VARCHAR(MAX))
CREATE TABLE #Ref (Code VARCHAR(10))
INSERT INTO #Table (Col) VALUES ('A123,B234,C345'),('A123'),('C345')
INSERT INTO #Ref (Code) VALUES ('A123'),('B234')
SELECT * FROM #Table
SELECT * FROM #Ref
SELECT DISTINCT t.Col
FROM #Table t
CROSS APPLY (
SELECT CASE WHEN CHARINDEX(r.Code, t.Col) > 0 THEN 1 ELSE 0 END AS [ItsHere] FROM #Ref r) oa
WHERE oa.ItsHere = 1
What you need to do is join your query to the Ref1 table on Table1.StringField = Ref1.Ref_1_value and then exclude the Table1 rows that don't match any Ref_1_value. Like this:
SELECT
Table1.ExampleColumn1 -- (ID)
,Table1.ExampleColumn2
,Table2.ExampleColumn3
,Table2.ExampleColumn4
,Table3.ExampleColumn5
,Table3.ExampleColumn6
,Table1.StringField
FROM [Example Database].[dbo].[Table1] AS Table1
INNER JOIN [Example Database].[dbo].[Table2] AS Table2
ON Example = Example
INNER JOIN [Example Database].[dbo].[Table3] AS Table3
ON Example = Example
INNER JOIN [Example Database].[dbo].[Ref1] as Ref1
ON Table1.StringField = Ref1.Ref_1_value
WHERE Month BETWEEN 201304 AND 201603
AND (Age < 19)
AND Ref1.Ref_1_value is not null

Right outer join with static values in DB2

I have a table named table and I'm trying to show all the rows whose id is in a certain list. If there is no row for an id, I would display null values.
Obviously, if the ids I want to select were in a different table, the solution would just be a right join.
The problem is that no such table exists; the user provides the list of ids as input.
I'm trying to solve this using the right join (values ..) on ..=.. statement, but I'm unable to give a column name to the nested statement (with static values), so I'm unable to write a valid on clause.
For instance, I have the table table:
id val
-- --
0 0.1
2 -0.5
7 1.1
Then the user dynamically selects a list of ids, which is not necessarily contained in the list of ids in the table. For instance, if the user selects 0,1,2,3, then I should display:
id val
-- --
0 0.1
1 null
2 -0.5
3 null
I'm trying to do something like
select * from table right outer join (
values (0),(1),(2),(3)
) as static_values on table.id = static_values[1]
Obviously static_values[1] is wrong, but I have to name the column to perform a join and I don't know how else to do it.
You should be able to do:
select *
from table
right outer join (values (0),(1),(2),(3)) as static_values(id)
on table.id = static_values.id
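With select *, the id column coming from the table is null for the ids that have no row, so the result carries two id columns. A small variation, sketched here with the hypothetical table name my_table standing in for the asker's table, takes the id from the value list so that the output matches the layout asked for in the question.

```sql
-- The value list drives the result; the real table only supplies val.
-- my_table is a placeholder name for the asker's table.
select s.id, t.val
from (values (0), (1), (2), (3)) as s(id)
left outer join my_table as t
  on t.id = s.id
order by s.id
```

This is the same join written from the other side: the value list is on the left, so a left outer join keeps every requested id.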

How does transact sql know which table I'm referencing in this subquery?

This is a question about documentation on how t-sql decides which "column" is in scope for subqueries. I tried google-ing which turned up this link but it didn't explain it.
Here's a runnable example. The update statement sets the only entry in #a.a to null. Presumably this is because the subquery reference to alias a resolves to table #b which has no rows that match value 1, thus returning null to the outer update query.
if object_id('tempdb..#a') is not null
drop table #a
if object_id('tempdb..#b') is not null
drop table #b
create table #a (a int)
create table #b (a int)
insert into #a values (1)
insert into #b values (2)
update a
set a = (select a from #b as a where a.a = 1)
from #a as a
Is there documentation that indicates this design choice? It is otherwise ambiguous, because if I change the update statement to use a different alias, the final value in #a.a is 2:
update aa
set a = (select a from #b as a where aa.a = 1)
from #a as aa
This reference might do a better job of explaining it.
The idea is quite simple. A table alias is interpreted as the "first" table definition, starting with the current level of the subquery and then moving outward. A table alias in a subquery cannot be used in an outer query, so references can only move "inward".
In your example:
update a
set a = (select a from #b as a where a.a = 1)
from #a as a
The a.a is referring to column a of table a. In the subquery itself, table a is defined as #b. That is the reference.
In this query:
update aa
set a = (select a from #b as a where aa.a = 1)
from #a as aa;
The table alias is aa. This is not defined in the subquery. It is defined at the next level out, so it refers to #a.
In general, don't give different tables the same alias in a query (with the exception of aliases on subqueries that are essentially just a filtered/selected version of a specific table). That can just lead to confusion.
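As an illustration of that advice, here is the same kind of update written with distinct aliases, so that no reference depends on the scoping rule. It assumes the #a and #b temp tables from the question; the aliases tgt and src and the "+ 1" condition are made up for the example.

```sql
-- tgt is the row being updated, src is the row looked up in #b.
-- Every column reference names its alias, so nothing is ambiguous.
update tgt
set tgt.a = (select src.a from #b as src where src.a = tgt.a + 1)
from #a as tgt;
```

With the question's data (#a holds 1, #b holds 2), the subquery finds the row in #b whose value is one more than the current row of #a.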
In your first example there is no relationship between the outer and inner query, and so you are setting the value of column 'a' to the results of the inner query for every row in table #a. The inner query returns null, as there are no rows in #b which have the value of 1, so the column a in #a is set to null
In your second example, you are still not providing a relationship between the inner and outer query. All the inner query is doing is selecting every value from #b, because for every row in #b, the value of #a.a is 1. You might just as well have (select a from #b) as your inner query.
The reason that #a.a gets set to 2 is that there is only 1 row in the #b table, and its value is 2. If there were multiple rows in #b, the statement would not execute: SQL Server raises an error when a subquery used as an expression returns more than one value.
Either way these are not very good pieces of SQL IMHO.

Multiple replacements in string in single Update Statement in SQL server 2005

I've a table 'tblRandomString' with following data:
ID ItemValue
1 *Test"
2 ?Test*
I've another table 'tblSearchCharReplacement' with following data
Original Replacement
* `star`
? `quest`
" `quot`
; `semi`
Now, I want to make a replacement in the ItemValues using these replacement.
I tried this:
Update T1
SET ItemValue = select REPLACE(ItemValue,[Original],[Replacement])
FROM dbo.tblRandomString T1
JOIN
dbo.tblSpecialCharReplacement T2
ON T2.Original IN ('"',';','*','?')
But it doesn't help me because only one replacement is done per update.
One solution is to use a CTE to perform multiple replacements if they exist.
Is there a simpler way?
Sample data:
declare @RandomString table (ID int not null,ItemValue varchar(500) not null)
insert into @RandomString(ID,ItemValue) values
(1,'*Test"'),
(2,'?Test*')
declare @SearchCharReplacement table (Original varchar(500) not null,Replacement varchar(500) not null)
insert into @SearchCharReplacement(Original,Replacement) values
('*','`star`'),
('?','`quest`'),
('"','`quot`'),
(';','`semi`')
And the UPDATE:
;With Replacements as (
select
ID,ItemValue,0 as RepCount
from
@RandomString
union all
select
ID,SUBSTRING(REPLACE(ItemValue,Original,Replacement),1,500),rs.RepCount+1
from
Replacements rs
inner join
@SearchCharReplacement scr
on
CHARINDEX(scr.Original,rs.ItemValue) > 0
), FinalReplacements as (
select
ID,ItemValue,ROW_NUMBER() OVER (PARTITION BY ID ORDER BY RepCount desc) as rn
from
Replacements
)
update rs
set ItemValue = fr.ItemValue
from
@RandomString rs
inner join
FinalReplacements fr
on
rs.ID = fr.ID and
rn = 1
Which produces:
select * from @RandomString
ID ItemValue
----------- -----------------------
1 `star`Test`quot`
2 `quest`Test`star`
What this does is it starts with the unaltered texts (the top select in Replacements), then it attempts to apply any valid replacements (the second select in Replacements). What it will do is to continue applying this second select, based on any results it produces, until no new rows are produced. This is called a Recursive Common Table Expression (CTE).
We then use a second CTE (a non-recursive one this time) FinalReplacements to number all of the rows produced by the first CTE, assigning lower row numbers to rows which were produced last. Logically, these are the rows which were the result of applying the last applicable transform, and so will no longer contain any of the original characters to be replaced. So we can use the row number 1 to perform the update back against the original table.
This query does do more work than strictly necessary - for small numbers of rows of replacement characters, it's not likely to be too inefficient. We could clear it up by defining a single order in which to apply the replacements.
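One way to apply the replacements in a single defined order is to walk the replacement table and run one UPDATE per pair. The sketch below gives up the single-statement form and is shown only as an alternative; it assumes the dbo.tblRandomString and dbo.tblSearchCharReplacement tables from the question, and the variable and cursor names are made up.

```sql
DECLARE @Original varchar(500), @Replacement varchar(500);

-- Walk the replacement pairs in a fixed order
DECLARE rep CURSOR LOCAL FAST_FORWARD FOR
    SELECT Original, Replacement
    FROM dbo.tblSearchCharReplacement
    ORDER BY Original;

OPEN rep;
FETCH NEXT FROM rep INTO @Original, @Replacement;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- Apply this pair to every row that still contains the original text
    UPDATE dbo.tblRandomString
    SET ItemValue = REPLACE(ItemValue, @Original, @Replacement)
    WHERE CHARINDEX(@Original, ItemValue) > 0;

    FETCH NEXT FROM rep INTO @Original, @Replacement;
END

CLOSE rep;
DEALLOCATE rep;
```

Each row is rewritten at most once per replacement pair. If a replacement text itself contains one of the original characters, a later pass would replace it again, so the order matters in that case.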
Will skipping the join table and nesting REPLACE functions work?
Or do you need to actually get the data from the other table?
-- perform 4 replaces in a single update statement
UPDATE dbo.tblRandomString
SET ItemValue = REPLACE(
REPLACE(
REPLACE(
REPLACE(ItemValue, '*', 'star'),
'?', 'quest'),
'"', 'quot'),
';', 'semi')
Note: I'm not sure if you need to escape any of the characters you're replacing