Reconciling a column across two tables in SQL Server

There are two databases. Database A has a table A with columns id, group, and flag. Database B has a table B with columns ID and flag. Table B is essentially the subset of table A where group = 'B'.
They are updated/created in odd ways that are outside my understanding at this time, and are beyond the scope of this question (this is not the time to fix the basic setup and practices of this client).
The problem is that when the flag in Table A is updated, it is not reflected in table B, but should be. This is not a time-critical problem, so it was suggested I create a job to handle this. Maybe because it's the end of the week, or maybe because I've never written more than the most basic stored procedure (I'm a programmer, not a DBA), but I'm not sure how to go about this.
At a simplistic level, the stored procedure would be something along the lines of
SELECT * FROM Table_A WHERE [group] = 'B'
Then loop through the result set and, for each id, update the flag.
But I'm not even sure how to loop in a stored procedure like this. Suggestions? Example code would be preferred.
Complication: Alright, this gets a little harder too. For every group, Table B is in a separate database, and we need to update this flag for all groups. So, we would have to set up a separate trigger for each group to handle each DB name.
And yes, inserts to Table B are already handled - this is just to update flag status.

Assuming that ID is a unique key, and that you can use linked servers or some such to run a query across servers, this SQL statement should work (it works for two tables on the same server).
UPDATE Table_B
SET Table_B.Flag = Table_A.Flag
FROM Table_A
INNER JOIN Table_B ON Table_A.id = Table_B.id
(Since Table_B already contains the subset of rows from Table_A where group = 'B', we don't have to include this condition in the query.)
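If the two databases are on different servers, the same statement can run across a linked server using four-part names, something like this (a sketch; the linked server name [ServerA] is an assumption):
UPDATE B
SET B.Flag = A.Flag
FROM [ServerA].DatabaseA.dbo.Table_A AS A
INNER JOIN DatabaseB.dbo.Table_B AS B
    ON A.id = B.id;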
If you can't use linked servers, then I might try to do it with some sort of SSIS package. Or I'd use the method described in the linked question (comments, above) to get the relevant data from Database A into a temp table etc. in Database B, and then run this query using the temp table.

UPDATE B
SET B.[Flag] = A.[Flag]
FROM DatabaseA.dbo.Table_A A
INNER JOIN DatabaseB.dbo.Table_B B
    ON A.id = B.id
Complication: for several groups, run one such UPDATE statement per group.
Note that you can use Flag without the brackets; I'm using them only because of the syntax coloring on Stack Overflow.
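If you also need the loop you asked about, a cursor over the group databases with dynamic SQL would do it (a sketch; dbo.GroupDatabases is a hypothetical table mapping each group to its database name):
DECLARE @db sysname;
DECLARE @sql nvarchar(max);

-- Hypothetical table mapping each group to the database holding its Table_B.
DECLARE group_dbs CURSOR FOR
    SELECT database_name FROM dbo.GroupDatabases;

OPEN group_dbs;
FETCH NEXT FROM group_dbs INTO @db;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- Run the same flag-sync UPDATE against each group's database.
    SET @sql = N'UPDATE B
                 SET B.Flag = A.Flag
                 FROM DatabaseA.dbo.Table_A A
                 INNER JOIN ' + QUOTENAME(@db) + N'.dbo.Table_B B ON A.id = B.id;';
    EXEC sp_executesql @sql;
    FETCH NEXT FROM group_dbs INTO @db;
END

CLOSE group_dbs;
DEALLOCATE group_dbs;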

Create an update trigger on table A that pushes the necessary changes to B as A is modified.
Basically something like the following (the syntax may not be exact; I can't check it right now). I seem to recall that the inserted table contains all of the updated rows on an update, but you may want to verify this. I think the trigger is the way to go, though.
create trigger update_b_trigger
on Table_A
for update
as
begin
    update Table_B
    set Table_B.flag = inserted.flag
    from inserted
    inner join Table_B
        on inserted.id = Table_B.id
        and inserted.[group] = 'B'
        and inserted.flag <> Table_B.flag
end
[EDIT] I'm assuming that inserts/deletes to Table B are already handled and it's just flag updates to Table B that need to be addressed.

Related

Is it possible to update one table from another table without a join in Vertica?

I have two tables A(i,j,k) and B(m,n).
I want to update the 'm' column of table B by taking sum(j) from table A. Is it possible to do this in Vertica?
The following code can be used in Teradata, but does Vertica have this kind of flexibility?
Update B from (select sum(j) as m from A)a1 set m=a1.m;
The Teradata SQL syntax won't work with Vertica, but the following query should do the same thing:
update B set m = (select sum(j) from A)
Depending on the size of your tables, this may not be an efficient way to update data. Vertica is a WORM (write once, read many times) store, and is not optimized for updates or deletes.
An alternate way would be to first move the data from the target table into another intermediate (but not temporary) table, then write a join query using that table to produce the desired result, and finally export that join query's result into the target table. Then drop the intermediate table. Of course, this assumes you have partitioned your table in a way suitable for your update logic.
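A minimal sketch of that approach (table names are assumptions, and the exact statements may vary by Vertica version):
-- Build the desired result into an intermediate table in a single write.
CREATE TABLE B_new AS
SELECT b.n, s.m
FROM B b
CROSS JOIN (SELECT SUM(j) AS m FROM A) s;

-- Swap the intermediate table in for the original.
DROP TABLE B;
ALTER TABLE B_new RENAME TO B;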

MS SQL - Conditional UPDATE return result

I have a stored procedure which is doing the following.
The populated target table data is checked against several similar source tables for a match (based on name and address data). If a match is found in the first source table, the target is updated with a flag identifying which source table the match came from. However, if no match is found, it needs to look in the next source table, and the next, until either a match is found or all source tables have been checked.
Is there an easy way for the UPDATE statement to provide some kind of return value I can query to say whether it updated the target table? I would like to use this return value so that I can skip checking subsequent source tables unnecessarily.
Otherwise, will I have to perform the conditional UPDATE and then run a separate query to determine whether the UPDATE actually updated the flag?
Probably the safest approach is to use the OUTPUT clause. This will return the modified rows into a new table.
You can check the table to see if any rows have been updated.
One advantage of the OUTPUT clause is that you can update multiple rows at the same time.
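A minimal sketch (target_table, first_table, and the match columns are assumptions):
-- Collect the keys of the rows the UPDATE actually touched.
DECLARE @updated TABLE (id int);

UPDATE t
SET t.flag = 'first_table'
OUTPUT inserted.id INTO @updated
FROM target_table t
INNER JOIN first_table s
    ON s.name = t.name
   AND s.address = t.address
WHERE t.flag IS NULL;

-- If nothing was captured, fall through to the next source table.
IF NOT EXISTS (SELECT 1 FROM @updated)
BEGIN
    PRINT 'no match in first_table; checking second_table next';
END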
I like the solution of Gordon, but I do not think you actually need it.
Simply run the updates in order:
UPDATE BASE_TABLE
SET FLAG = 'first_table'
WHERE FLAG IS NULL
  AND EXISTS (SELECT 1 FROM first_table f1 WHERE f1.ID = BASE_TABLE.ID);

UPDATE BASE_TABLE
SET FLAG = 'second_table'
WHERE FLAG IS NULL
  AND EXISTS (SELECT 1 FROM second_table f2 WHERE f2.ID = BASE_TABLE.ID);
...
And so on.
You don't need to check every row conditionally; that would be very slow.
You can put your update in a TRY/CATCH block and insert the result into another table.
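For example (a rough sketch combining this with the OUTPUT clause from above; dbo.update_log and dbo.error_log are hypothetical tables):
BEGIN TRY
    UPDATE BASE_TABLE
    SET FLAG = 'first_table'
    OUTPUT inserted.ID INTO dbo.update_log (ID)  -- record which rows were flagged
    WHERE FLAG IS NULL
      AND EXISTS (SELECT 1 FROM first_table f1 WHERE f1.ID = BASE_TABLE.ID);
END TRY
BEGIN CATCH
    -- Capture the failure instead of aborting the whole batch.
    INSERT INTO dbo.error_log (msg) VALUES (ERROR_MESSAGE());
END CATCH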

Copy records missing from one table to a new table

I managed to delete 4,000 rows from a table in my 129,000-row production database (Postgres 9.4 on Heroku), but only identified the problem a few days later.
I have a backup from before the loss, but only want to selectively restore the missing rows back to the table, preserving their id's. (A complete restore is not an option as new data has since been added to the table.)
Into a local testing database I have imported the backed-up table as articles_backups, alongside the actual articles table. I want to find all the rows in articles_backups that are missing from articles and copy them to a new table, articles_restores, which I will then restore to the production database, back into the articles table (preserving record ids).
This query successfully returns all the id's of the deleted records:
select articles_backups.id
from articles_backups
left outer join articles on (articles_backups.id = articles.id)
where articles.id is null
But I have not been able to copy the result to a new table. I have unsuccessfully tried:
select *
into articles_restores
from articles_backups
left outer join articles on (articles_backups.id = articles.id)
where articles.id is null;
Which gives:
ERROR: column "id" specified more than once
Basically your query with LEFT JOIN / IS NULL does what you are after:
Select rows which are not present in other table
You get the error because you select all columns from both tables, and there is an id column in both. It's not possible to create a new table with duplicate column names, and it's not what you want to begin with. Only select columns from articles_backups:
CREATE TABLE articles_restores AS
SELECT ab.*
FROM articles_backups ab
LEFT JOIN articles a USING (id)
WHERE a.id IS NULL;
While being at it I simplified your query syntax with table aliases. The USING clause is just for the convenience of shorter code. It folds the two id columns into one, but all other columns are still in there twice if you SELECT *.
Use CREATE TABLE AS. SELECT INTO is also defined by the SQL standard and implemented in Postgres, but its use is discouraged. It's used in PL/pgSQL functions for a different purpose. Details:
Creating temporary tables in SQL
You could use an EXCEPT to retrieve all the rows from articles_backups that are different from articles
(assuming both tables have the same columns in the same order).
You could also create a temp table with this result to make your repair statements easier:
create table temp_articles as
select * from articles_backups
except
select * from articles;
Step 1 - update rows from temp_articles that are present in articles.
This step needs attention: you will have to establish a rule for choosing between the data present in articles and the data present in temp_articles.
UPDATE articles a
SET col1 = b.col1,
    col2 = b.col2,
    (... other columns ...)
FROM temp_articles b
WHERE a.id = b.id
  AND /* your rule for data to be (or not) updated goes here */
Step 2 - insert rows from temp_articles that are not present in articles (your deleted records):
insert into articles
select * from temp_articles
where id not in (select id from articles);
Let us know if you need more help.

What's a good logic/design of a SQL script to incrementally update a table?

So there's this table of about 40,000 rows I am looking to update. A colleague said it's best to incrementally update the table instead of doing a complete delete and load.
So I've tried hashing out the design and logic of a script to do this, but my inexperience is getting to me. I just don't know what is and isn't needed to update a table incrementally and efficiently.
Currently, the warehouse looks like this: data comes from source into a table (let's call this T1) in Teradata. Then it's sent into another table (let's call this T2) in Teradata with some added fields such as timestamp. Lastly, a view is built on that last table for security reasons.
So with that laid out, I was thinking of creating a temp/volatile table with data from T1. This would have all the data up to the time the script is run, including new records. Then, go through the entire table checking whether each ID (primary index) already exists in T2, and if not, add it to another temp table. Then somehow combine the second temp table with T2, overwrite T2, and build a view on top of that.
Does this make any sense?
There's also the possibility of records being updated. So they would already exist in T2, but have updated data in a new version of T1. I think comparing the values of all the columns from T1 to T2 would be highly inefficient, but I can't think of another way to do this.
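For what it's worth, the snapshot step I have in mind would be something like this (a sketch; t1_snapshot is a placeholder name):
-- Take a point-in-time copy of T1 into a session-scoped volatile table.
CREATE VOLATILE TABLE t1_snapshot AS
(
    SELECT * FROM T1
)
WITH DATA
ON COMMIT PRESERVE ROWS;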
A 40,000 row delete and insert should be pretty painless for any modern database. Ditto for updates.
The real reason for doing an incremental delete/update/insert is so you can log the changes and timestamp rows in the permanent table with the date/time of insertion and/or last update. The usual technique goes something like this:
remove rows from the permanent table that don't exist in the temp table
update rows that exist in both tables
insert rows that exist in the temp table, but don't exist in the permanent table.
Looking at the Teradata docs, that would be something like this (no warranties about this being syntactically correct, since I don't have a Teradata instance to play with):
delete permanent p
where not exists ( select *
                   from temp t
                   where t.id = p.id );

update p
from permanent p,
     temp t
set ...
where t.id = p.id;

insert permanent
select ...
from temp t
where not exists ( select *
                   from permanent p
                   where p.id = t.id );
One might note that the deletes might get a little hairy if there are dependent foreign key constraints involved.
One might also note that on the update, the where clause might get a tad...complicated if you want to check for actual changes to column values: not much point in updating a row if nothing has changed.
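For example, the update with change detection might look like this (a sketch with hypothetical columns col1 and col2; the NULL checks would repeat for each nullable column):
update p
from permanent p,
     temp t
set col1 = t.col1,
    col2 = t.col2
where t.id = p.id
  -- only touch rows where something actually differs
  and (    p.col1 <> t.col1
        or (p.col1 is null and t.col1 is not null)
        or (p.col1 is not null and t.col1 is null)
        or p.col2 <> t.col2 );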
There's a Teradata MERGE command that you might find useful, check this post:
https://forums.teradata.com/forum/database/merge-syntax-simple-version
merge into merge_tmp as t
using (select 1 as a, 'stf' as b, 'uuj' as c) as s
    on t.a = s.a
when matched then update set c = s.c
when not matched then insert values (s.a, s.b, s.c);
If you need to match on more columns, simply put an AND in the ON clause.
Edit: If you want to use MERGE you might also need to use a delete statement like the one in nicholas' post.

Update data from one table to another (in a database)

DB gurus,
I am hoping someone can set me in the right direction.
I have two tables. Table A and Table B. When the system comes up, all entries from Table A are massaged and copied over to Table B (according to Table B's schema). Table A can have tens of thousands of rows.
While the system is up, Table B is kept in sync with Table A via DB change notifications.
If the system is rebooted, or my service restarted, I want to re-initialize Table B. However, I want to do this with the least possible DB updates. Specifically, I want to:
add any rows that are in Table A, but not in Table B, and
delete any rows that are not in Table A, but are in Table B
any rows that are common to Table A and Table B should be left untouched
Now, I am not a "DB guy", so I am wondering what is conventional way of doing this.
Use exists to keep processing to a minimum.
Something along these lines, modified so the joins are correct (also verify that I didn't do something stupid and get TableA and TableB backwards from your description):
insert into TableB
select *
from TableA a
where not exists (select 1 from TableB b where b.ID = a.ID);

delete from TableB
where not exists (select 1 from TableA a where a.ID = TableB.ID);
Informix's Enterprise Replication features would do all this for you. ER works by shipping the logical logs from one server to another, and rolling them forward on the secondary.
You can configure it to be as finely-grained as you need (ie just a handful of tables).
You use the term "DB change notifications" - are you already using ER or is this some trigger-based arrangement?
If for some reason ER can't work for your configuration, I would suggest rewriting the notifications model to behave asynchronously, ie:
write notifications to a table in server 'A' that contains a timestamp or serial field
create a table on server 'B' that stores the timestamp/serial value of the last processed record
run a daemon process on server 'B' that:
    compares the 'A' and 'B' timestamps/serials
    selects the 'A' records between the 'A' and 'B' timestamps
    processes those records into 'B'
    updates the 'B' timestamp/serial
    sleeps for an appropriate time period, and loops
So Server 'B' is responsible for ensuring its copy is in sync with 'A'. 'A' is not inconvenienced by 'B' being unavailable.
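One iteration of that daemon might look roughly like this (a sketch; sync_progress, notifications, and the dbname@server_a qualifier are placeholders for your setup):
-- Read the high-water mark kept on server 'B'.
SELECT last_serial FROM sync_progress;

-- Fetch unprocessed notifications from server 'A'
-- (Informix remote-table syntax: database@server:table).
SELECT *
FROM dbname@server_a:notifications
WHERE serial > 1042   -- the last_serial value read above
ORDER BY serial;

-- After applying those records locally, advance the high-water mark.
UPDATE sync_progress SET last_serial = 1097;  -- highest serial just processed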
A simple way would be to use a historic table where you record the changes from A that have happened since the last update, and use that table to sync table B instead of doing a direct copy from A to B. Once the sync is done, you delete the whole historic table and start anew.
What I don't understand is how table A can be updated and not B if your service or computer is not running. Are they in two different databases or on two different servers?
Join the data from both tables on the common columns; this gives you the rows that have a match in both tables, i.e. data that is in both A and B. Then use this set (let's call it M) with set operations, i.e. set-minus operations, to get the differences:
first requirement: A minus M
second requirement: B minus A
third requirement: M
Do you get the idea?
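In SQL, the first two differences could be computed like this (a sketch using the shared ID column; note that some Informix versions lack EXCEPT/MINUS support, in which case NOT EXISTS, as in the first answer, does the same job):
-- First requirement: rows to insert into B (in A but not in B).
SELECT ID FROM TableA
EXCEPT
SELECT ID FROM TableB;

-- Second requirement: rows to delete from B (in B but not in A).
SELECT ID FROM TableB
EXCEPT
SELECT ID FROM TableA;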
I am a SQL Server guy, but since SQL Server 2008, for this kind of operation, a feature called MERGE is available.
By using a MERGE statement we can perform insert, update, and delete operations in a single statement.
So I googled and found that Informix also supports the same MERGE statement, though I am not sure whether it takes care of the delete too; insert and update are covered. Moreover, this statement takes care of the transaction by itself.
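For illustration, the full sync in T-SQL would look something like this (a sketch; the flag column and the delete clause are assumptions, and Informix's supported MERGE clauses should be checked against its documentation):
MERGE TableB AS b
USING TableA AS a
    ON b.ID = a.ID
WHEN MATCHED THEN
    UPDATE SET b.flag = a.flag          -- row exists in both: sync the data
WHEN NOT MATCHED BY TARGET THEN
    INSERT (ID, flag) VALUES (a.ID, a.flag)  -- in A but not B: add it
WHEN NOT MATCHED BY SOURCE THEN
    DELETE;                             -- in B but not A: remove it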