Using SQL to detect changes between tables - sql

I want to create a SQL script that compares two of the same fields in two different tables. These tables may be on two different servers. I want to use this script to verify that when a field gets updated in one table/server, it is also updated in the other table/server. Any ideas on how to approach this?

The first thing you need to be sure of is that your servers are linked, otherwise you won't easily be able to compare the two. If the servers are linked and the tables are identical, you can use an EXCEPT query to identify the changes, e.g.
select * from [server1].[db].[schema].[table]
except
select * from [server2].[db].[schema].[table]
This query will return all rows from the table in server1 that don't appear in server2. From here you can either wrap this in a count, or insert/update the missing/changed rows from one table to the other.
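For example, a minimal sketch wrapping the EXCEPT in a count, using the same placeholder names:
select count(*) as changed_or_missing_rows
from (
    select * from [server1].[db].[schema].[table]
    except
    select * from [server2].[db].[schema].[table]
) as diff;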
Identifying whether the rows have changed or been inserted relies on having a primary key; with that you can join one table to the other and identify what needs updating using a query like so:
select *
from [server1].[db].[schema].[table] t1
inner join [server2].[db].[schema].[table] t2 on t1.id = t2.id
where ( t1.col1 <> t2.col1 or t1.col2 <> t2.col2 ... )
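Once the differing rows are identified, a rough sketch of pushing the values across (assuming server1 is the source of truth; updating through a four-part name requires the linked server to allow remote updates):
update t2
set t2.col1 = t1.col1,
    t2.col2 = t1.col2
from [server1].[db].[schema].[table] t1
inner join [server2].[db].[schema].[table] t2 on t1.id = t2.id
where t1.col1 <> t2.col1
   or t1.col2 <> t2.col2;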
Another way of tracking changes is to use a DML trigger and have this propagate changes from one table to another.
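A minimal sketch of such a trigger (assuming the trigger lives on server1 and server2 is linked; note that updating a linked server from inside a trigger will enlist a distributed transaction):
create trigger trg_propagate_changes
on [schema].[table]
after update
as
begin
    set nocount on;
    -- push the updated values to the copy on the linked server
    update t2
    set t2.col1 = i.col1,
        t2.col2 = i.col2
    from [server2].[db].[schema].[table] t2
    inner join inserted i on t2.id = i.id;
end;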
I was working on a SQL Server auditing tool that uses these principles; have a look through the code if you like, though it's not 100% working: https://github.com/simtbak/panko/blob/main/archive/Panko%20v003.sql

Related

INSERT INTO SELECT with LEFT JOIN not preventing duplicates for simultaneous hits

I have this SQL query that inserts records from one table to another without duplicates.
It works fine if I call this SQL query from one instance of my application. But in production, the application is horizontally scaled, with more than one instance of the application each calling the query below simultaneously. That is causing duplicate records for me. Is there any way to fix this query so it allows simultaneous hits?
INSERT INTO table1 (col1, col2)
SELECT DISTINCT TOP 10
    t2.col1,
    t2.col2
FROM
    table2 t2
LEFT JOIN
    table1 t1 ON t2.col1 = t1.col1
    AND t2.col2 = t1.col2
WHERE
    t1.col1 IS NULL
The corrective action here depends on the behavior you want. If you intend to allow just a single horizontal instance of your application to execute this query, then you need to create a critical section that only one instance is allowed to enter. Since you are already using SQL Server, you could implement this by forcing each instance to acquire a lock. Only the instance which gets the lock will execute the query, and the others will drop off.
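A minimal sketch of that critical section, using an application lock via sp_getapplock rather than locking a table itself (the resource name is arbitrary; @LockTimeout = 0 makes the losing instances give up immediately):
BEGIN TRANSACTION;

DECLARE @result INT;

-- try to take an exclusive, transaction-scoped application lock
EXEC @result = sp_getapplock
    @Resource = 'table1_dedup_insert',  -- arbitrary lock name (an assumption)
    @LockMode = 'Exclusive',
    @LockOwner = 'Transaction',
    @LockTimeout = 0;

IF @result >= 0
BEGIN
    -- only the lock holder runs the original INSERT ... SELECT
    INSERT INTO table1 (col1, col2)
    SELECT DISTINCT TOP 10 t2.col1, t2.col2
    FROM table2 t2
    LEFT JOIN table1 t1 ON t2.col1 = t1.col1 AND t2.col2 = t1.col2
    WHERE t1.col1 IS NULL;
END

COMMIT TRANSACTION;  -- releases the transaction-owned lock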
If, on the other hand, you really want each instance to execute the query, then you should use a serializable transaction. Using a serializable transaction will ensure that only one instance can do the insert on the table at a given time. It would not be possible for two or more instances to interleave and execute the same insert.
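A minimal sketch of the serializable approach (deadlocks are still possible under heavy contention, so retry logic may be needed):
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRANSACTION;

-- the range locks taken under SERIALIZABLE prevent another instance
-- from inserting the same rows until this transaction commits
INSERT INTO table1 (col1, col2)
SELECT DISTINCT TOP 10
    t2.col1,
    t2.col2
FROM
    table2 t2
LEFT JOIN
    table1 t1 ON t2.col1 = t1.col1
    AND t2.col2 = t1.col2
WHERE
    t1.col1 IS NULL;

COMMIT TRANSACTION;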

Trying to use cursor on one database using select from another db

So I'm trying to wrap my head around cursors. I have a task to transfer data from one database to another, but they have slightly different schemas. Let's say I have TableOne (Id, Name, Gold) and TableTwo (Id, Name, Lvl). I want to take all records from TableTwo and insert them into TableOne, but there may be duplicate data in the Name column. So if a record from TableTwo already exists in TableOne (comparing on the Name column), I want to skip it; if not, create a record in TableOne with a unique Id.
I was thinking about looping over each record in TableTwo and, for every record, checking whether it exists in TableOne. So, how do I make this check without making a call to the other database every time? I wanted to first select all records from TableOne, save them into a variable, and in the loop itself make the check against this variable. Is this even possible in SQL? I'm not so familiar with SQL; a code sample would help a lot.
I'm using Microsoft SQL Server Management Studio, if that matters. And of course, TableOne and TableTwo exist in different databases.
Try this
Insert into table1 (id, name, gold)
Select id, name, lvl from table2
Where table2.name not in (select t1.name from table1 t1)
If you want to add a new Id for every row, you can try
Insert into table1 (id, name, gold)
Select (select max(m.id) from table1 m) + row_number() over (order by t2.id), name, lvl
from table2 t2
Where t2.name not in (select t1.name from table1 t1)
It is possible, yes, but I would not recommend it. Looping (which is essentially what a cursor does) is usually not advisable in SQL when a set-based operation will do.
At a high level, you probably want to join the two tables together (the fact that they're in different databases shouldn't make a difference). You mention one table has duplicates. You can eliminate those in a number of ways, such as using a GROUP BY or a ROW_NUMBER (a sketch of the ROW_NUMBER approach follows at the end of this answer). Both approaches require understanding which rows you want to "pick" and which ones you want to "ignore". You could also do what another user posted in a comment, where you do an existence check against the target table using a correlated subquery. That essentially means that if the target table already contains rows matching the duplicates you're trying to insert, none of those duplicates will be put in.
As far as cursors are concerned, to do something like this you'd be doing essentially the same thing, except on each pass of the cursor you would be temporarily assigning and using variables instead of columns. This approach is sometimes called RBAR (for "Row By Agonizing Row"). On every pass of the cursor or loop, it has to re-open the table, figure out what data it needs, then operate on it. Even if that's efficient and it's only pulling back one row, there's still lots of overhead to running that query. So while, yes, you can force SQL to do what you've described, the database engine already has an operation for this (joins) which does it far faster than any loop you could conceivably write.
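To make that concrete, here is a minimal set-based sketch of the ROW_NUMBER approach mentioned above (DbOne and DbTwo are hypothetical database names, and the Id column in TableOne is assumed to be generated by the database):
-- deduplicate TableTwo on Name, keeping one arbitrary row per name
WITH src AS (
    SELECT Name, Lvl,
           ROW_NUMBER() OVER (PARTITION BY Name ORDER BY Id) AS rn
    FROM DbTwo.dbo.TableTwo
)
INSERT INTO DbOne.dbo.TableOne (Name, Gold)
SELECT s.Name, s.Lvl
FROM src s
WHERE s.rn = 1
  AND NOT EXISTS (SELECT 1 FROM DbOne.dbo.TableOne t WHERE t.Name = s.Name);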

Data between test and production environments are different

The count(*) of a table in both Test and Production returns the same value. However, a user of the table was doing some validation/testing and noticed that the sum of a column/field differs between the two environments. Being the better SQL user of the two of us, I'm trying to figure out how to find the discrepancies.
What's a good way to do so? This isn't that big of a table (~1 million rows), but I'd like to keep the query/statement rather small.
This is in Teradata
Alright, here is a framework for you to build off of, then. Since you are looking at sums and the like, you'll need to assemble most of the WHERE clause yourself, since I don't have enough information to know what you are summing. So, I'll write this to find discrepancies in the rows themselves...
SELECT t1.id
FROM Production.[schema].table1 t1
INNER JOIN Test.[schema].table1 t2 ON t1.id = t2.id
WHERE t1.column <> t2.column
....
Just push the columns you want to compare into the WHERE clause... this will line up the two tables from TEST and PROD and let you look for differences between columns. It will return a list of row ids where there is a mismatch.
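Since the mismatch showed up in a sum, here is a minimal sketch targeting that directly (assuming the summed column is called amount; the name is a placeholder):
SELECT t1.id,
       t1.amount AS prod_amount,  -- "amount" stands in for whatever column is being summed
       t2.amount AS test_amount
FROM Production.[schema].table1 t1
INNER JOIN Test.[schema].table1 t2 ON t1.id = t2.id
WHERE t1.amount <> t2.amount;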

Safely insert row data from one table to another - SQL

I need to move some data stored in one table to another using a script, taking into account existing records that may already be in the destination table as well as any relationships that may exist.
I am curious to know the best method of doing this that has a relatively low impact on performance and can be reversed if necessary.
At first I will be moving only one record to ensure the process runs smoothly but then it will be responsible for moving around 1650 rows.
What would be the best approach to take or is there a better alternative?
Edit:
My previous suggestion of using MERGE will not work, as I will be operating in a SQL Server 2005 environment, not 2008 as previously mentioned.
The question does not provide any details, so I can't provide actual real code, just this plan of attack:
step 1: write a query that will SELECT only the rows you need to copy. You will need to JOIN and/or filter (WHERE) this data to include only the rows that don't already exist in the destination table. Make the column list exactly the same as the destination table's columns, in column order and data type.
step 2: turn that SELECT statement into an INSERT by adding INSERT YourDestinationTable (col1, col2, col3..) before the SELECT.
step 3: if you only want to try a single row, add a TOP 1 to the SELECT part of the new INSERT-SELECT command. You can rerun this command as many times as necessary, with or without the TOP, because the JOINs and WHERE conditions in the SELECT should eliminate any rows you've already added.
in the end, you'll have something that looks like:
INSERT YourDestinationTable
(Col1, Col2, Col3, ...)
SELECT
s.Col1, s.Col2, s.Col3, ...
FROM YourSourceTable s
LEFT OUTER JOIN SomeOtherTable x ON s.Col4=x.Col4
WHERE NOT EXISTS (SELECT 1 FROM YourDestinationTable d WHERE s.PK=d.PK)
AND x.Col5='J'
I'm reading the question as only inserting missing rows from a source table into a destination table. If changes need to be migrated as well, then prior to the above steps you will need to do an UPDATE of the destination table, joining in the source table. This is hard to explain without more specifics of the actual tables, columns, etc.
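As a rough sketch of that UPDATE step, reusing the placeholder names above (T-SQL UPDATE ... FROM syntax, which works on SQL Server 2005):
UPDATE d
SET d.Col1 = s.Col1,
    d.Col2 = s.Col2
FROM YourDestinationTable d
INNER JOIN YourSourceTable s ON d.PK = s.PK
WHERE d.Col1 <> s.Col1  -- only touch rows whose values actually differ
   OR d.Col2 <> s.Col2;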
Yes, the MERGE statement is ideal for bulk imports if you are running SQL Server 2008.

Optimize query that compares two tables with similar schema in different databases

I have two tables with similar schemas in different databases. What is the best way to compare records between these two tables? I need to find:
records that exist in the first table whose corresponding record does not exist in the second table, filtering records from the first table with some WHERE clauses.
So far I have come up with this SQL construct:
Select t1_col1, t1_col2 from table1
where t1_col1=<condition> AND
      t1_col2=<condition> AND
      NOT EXISTS
      (SELECT * FROM table2
       WHERE t1_col1=t2_col1 AND
             t1_col2=t2_col2)
Is there a better way to do this?
The above query seems fine, but I suspect it is doing a row-by-row comparison without first evaluating the conditions in the first part of the query, even though those conditions would greatly reduce the result set. Is this happening?
Just use the EXCEPT keyword!
Select t1_col1, t1_col2 from table1
where t1_col1=<condition> AND
      t1_col2=<condition>
except
SELECT t2_col1, t2_col2 FROM table2
It returns any distinct values from the query to the left of the EXCEPT operand that are not also returned from the right query.
For more information, see MSDN.
If the data in both tables is expected to share the same primary key, you can use the NOT IN keyword to filter out rows that are not found in the other table. This could be the simplest way.
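A minimal sketch of that filter (assuming the shared primary key column is called id; note that NOT IN misbehaves if table2.id can be NULL, in which case NOT EXISTS is safer):
SELECT t1_col1, t1_col2
FROM table1
WHERE t1_col1 = <condition>
  AND t1_col2 = <condition>
  AND id NOT IN (SELECT id FROM table2);  -- id is the assumed shared key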
If you are open to third-party tools like Redgate Data Compare, you can try it; it's a very nice tool. Visual Studio 2010 Ultimate edition also has this feature.