Updating changed rows - SQL

I have a requirement to update a couple of thousand rows in a table based on whether any of the values have changed. At the moment I'm just updating all the values regardless, but I was wondering which is more efficient: should I check all the columns to see if there are any changes and only update then, or should I just update regardless? e.g.
update someTable Set
column1 = somevalue,
column2 = somevalue,
column3 = somevalue,
etc....
from someTable inner join sometable2 on
someTable.id = sometable2.id
where
someTable.column1 != sometable2.column1 or
someTable.column2 != sometable2.column2 or
someTable.column3 != sometable2.column3 or
etc etc......
What's faster, and what's best practice?

See two articles on Paul White's blog:
The Impact of Non-Updating Updates, for a discussion of the main issue.
Undocumented Query Plans: Equality Comparisons, for a less tedious way of doing the inequality comparisons, particularly if your columns are nullable (WHERE NOT EXISTS (SELECT someTable.* INTERSECT SELECT someTable2.*)).
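For example, applied to the tables in the question, an update that touches only changed rows could look like this (just a sketch, assuming the someTable/someTable2 names and the id join from the question):
UPDATE someTable
SET column1 = someTable2.column1,
    column2 = someTable2.column2,
    column3 = someTable2.column3
FROM someTable
INNER JOIN someTable2
    ON someTable.id = someTable2.id
WHERE NOT EXISTS (SELECT someTable.column1, someTable.column2, someTable.column3
                  INTERSECT
                  SELECT someTable2.column1, someTable2.column2, someTable2.column3);
Because INTERSECT treats NULLs as equal, the predicate stays correct for nullable columns without a pile of IS NULL checks.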

I believe this is the best way.
Tables and data:
declare @someTable1 table(id int, column1 int, column2 varchar(2))
declare @someTable2 table(id int, column1 int, column2 varchar(2))
insert @someTable1
select 1, 10, 'a3'
union all select 2, 20, 'a3'
union all select 3, null, 'a4'
insert @someTable2
select 1, 10, 'a3'
union all select 2, 19, 'a3'
union all select 3, null, 'a5'
Update:
UPDATE t1
set t1.column1 = t2.column1,
    t1.column2 = t2.column2
from @someTable1 t1
JOIN
 (select * from @someTable2
  EXCEPT
  select * from @someTable1) t2
 on t2.id = t1.id
Result:
select * from @someTable1
id          column1     column2
----------- ----------- -------
1           10          a3
2           19          a3
3           NULL        a5

I've found that explicitly including the WHERE clause that excludes no-op updates performs faster when working against large tables, but this is very much a YMMV type of question.
If possible, compare the two approaches side by side, against a realistic set of data. E.g. if your tables contain millions of rows, and the updates affect only 10, make sure your sample data affects just a few rows. Or likewise, if it's likely that most rows will change, make your sample data reflect that.
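If you want to measure that yourself in SQL Server, a rough harness could look like this (a sketch, reusing the someTable/someTable2 names from the question above; each variant is rolled back so both start from the same data, and STATISTICS TIME/IO print the numbers to compare):
SET STATISTICS TIME ON;
SET STATISTICS IO ON;

-- Variant 1: update every joined row, changed or not
BEGIN TRAN;
UPDATE someTable
SET column1 = someTable2.column1,
    column2 = someTable2.column2
FROM someTable
INNER JOIN someTable2 ON someTable.id = someTable2.id;
ROLLBACK;

-- Variant 2: skip rows where nothing actually changed
BEGIN TRAN;
UPDATE someTable
SET column1 = someTable2.column1,
    column2 = someTable2.column2
FROM someTable
INNER JOIN someTable2 ON someTable.id = someTable2.id
WHERE NOT EXISTS (SELECT someTable.column1, someTable.column2
                  INTERSECT
                  SELECT someTable2.column1, someTable2.column2);
ROLLBACK;

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;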

Related

UNION two SELECT queries but result set is smaller than one of them

In a SQL Server statement there is
SELECT id, book, acnt, prod, category from Table1 <where clause...>
UNION
SELECT id, book, acnt, prod, category from Table2 <where clause...>
The first query returned 131,972 rows of data; the 2nd one, 147,692 rows. I didn't notice any commonly shared rows between these two tables, so I expected the result set after UNION to be the same as the sum 131,972 + 147,692 = 279,384.
However, the result set after UNION is 133,857. Even if they have overlapping rows that I accidentally missed, the result should be at least as large as the larger of the two result sets. I can't figure out where the number 133,857 came from.
Is my understanding of SQL UNION correct? I am using SQL Server in this case.
To expand on the comment given under the question, which I think states what you already know:
UNION takes care of duplicates within one table as well.
Just take a look at an example:
SETUP:
create table tbl1 (col1 int, col2 int);
insert into tbl1 values
(1,2),
(3,4);
create table tbl2 (col1 int, col2 int);
insert into tbl1 values
(1,2),
(1,2),
(1,2),
(3,4);
Query
select * from tbl1
union
select * from tbl2;
will produce output
col1 | col2
-----|------
1 | 2
3 | 4
DB fiddle

How to verify if two queries contain exact same data

I have a table that maintains a "Gold Standard" set of data that another table should match if the table was processed correctly.
Both of these tables have almost 1,000,000 records of data.
For example. I have table (table1) that have PrimaryKey1, ColumnA, ColumnB, ColumnC, ColumnD, and Column E.
I have another table (table2) with ForeignKey1, ColumnF, ColumnG, ColumnH, ColumnI, ColumnJ.
I need to check that all the data in these two table are exactly the same except for a few columns.
What I mean by that is that ColumnA from table1 has to have all of the same as columnF in table2, and ColumnC from table1 has to matchup with ColumnI from table2 FOR THE SAME RECORD (lets call this primaryKey1). The other columns in the table do not matter.
Also, if there is a mismatch between the datasets, I need to know where the mismatch is.
I think your best bet is EXCEPT: SELECT x, y, z FROM A EXCEPT SELECT x, y, z FROM B. If it returns nothing, you're good to go.
Hope this helps!
A quick trick that I use is just comparing row counts. This will at least show you if you have a problem (it won't show you where the problem is).
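For instance, a minimal sketch using the table1/table2 names from the question:
-- If the two counts differ, the tables cannot contain the same data
SELECT (SELECT COUNT(*) FROM table1) AS table1_rows,
       (SELECT COUNT(*) FROM table2) AS table2_rows;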
A UNION query combines two queries and displays the combined result, with common rows collapsed into one. So, if the first query returns exactly 1 million rows and the tables match, the UNION query (both queries combined) should also return exactly 1 million rows. If it doesn't, there is a problem.
select ColumnA 'Col1'
, ColumnC 'Col2'
from Table1
UNION
select ColumnF 'Col1'
, ColumnI 'Col2'
from Table2
Something like:
select *
from gold_copy a
join my_copy b on a.primary_key = b.primary_key
where a.field1 <> b.field1
 or a.field_a <> b.field_f
 or a.field_c <> b.field_i
 or a.field_x <> b.field_y
I think the following will help you get the unmatched records:
select *
from table1 t1
where not exists (select 1
                  from table2 t2
                  where t2.ForeignKey1 = t1.PrimaryKey1
                    and t2.ColumnF = t1.ColumnA
                    and t2.ColumnI = t1.ColumnC);
Instead of all columns, you only compare the columns you need from the two tables; the column names do not have to match as long as you pair them up explicitly in the subquery.
Thank you.
You could use the symmetric difference for this:
select 'table1' as source, col
from (select col from table1
      EXCEPT
      select col from table2) only_in_table1
UNION ALL
select 'table2' as source, col
from (select col from table2
      EXCEPT
      select col from table1) only_in_table2
This query returns only those rows that are in exactly one table, and it says in which table each row was found.

SQL select from either one or other table

Assume I have a table A with a lot of records (> 100'000) and a table B which has the same columns as A and about the same amount of data.
Is there a possibility with one clever select statement that I can either get all records of table A or all records of table B?
I am not so happy with the approach I currently use because of the performance:
select
column1
,column2
,column3
from (
select 'A' as tablename, a.* from table_a a
union
select 'B' as tablename, b.* from table_b b
) x
where
x.tablename = 'A'
Offhand, your approach seems like the only approach in standard SQL.
You will improve performance considerably by changing the UNION to UNION ALL. The UNION must read in the data from both tables and then eliminate duplicates, before returning any data.
The UNION ALL does not eliminate duplicates. How much better this performs depends on the database engine and possibly on tuning parameters.
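For example, the query from the question with UNION ALL substituted (same table_a/table_b names, everything else unchanged):
select
 column1
,column2
,column3
from (
 select 'A' as tablename, a.* from table_a a
 union all
 select 'B' as tablename, b.* from table_b b
) x
where
 x.tablename = 'A'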
Actually, there is another possibility. I don't know how well it will work, but you can try it:
-- @tableName stands in for whatever value drives the choice between A and B
declare @tableName char(1) = 'A';

select *
from ((select const.tableName, a.*
       from A cross join
            (select 'A' as tableName where @tableName = 'A') const
      ) union all
      (select const.tableName, b.*
       from B cross join
            (select 'B' as tableName where @tableName = 'B') const
      )
     ) t
No promises. But the idea is to cross join to a table with either 1 or 0 rows. This will not work in MySQL, because it does not allow WHERE clauses without a FROM. In other databases, you might need a tablename such as dual. This gives the query engine an opportunity to optimize away the read of the table entirely, when the subquery contains no records. Of course, just because you give a SQL engine the opportunity to optimize does not mean that it will.
Also, the "*" is a bad idea particularly in union's. But I've left it in because that is not the focus of the question.
You can try the next solution; it selects only from table tmp1 ('A' = 'A'):
select
*
from
tmp1
where
'A' = 'A'
union all
select
*
from
tmp2
where
'B' = 'A'
SQL Fiddle demo here
check execution plan
Hard to tell exactly what you want without a little more context, but perhaps something like this could work?
DECLARE @TableName nvarchar(15);
DECLARE @Query nvarchar(50);
SELECT @TableName = YourField
FROM YourTable
WHERE ...
SET @Query = 'SELECT * FROM ' + @TableName
EXEC (@Query)
Syntax might differ a bit depending on what RDBMS you are using, and more specifically what you are trying to accomplish, but might be a push in the right direction.
The proper way to do this and maintain performance requires some modification to your physical table design.
If you can add a column to each table that holds your indicator column and add a check constraint on that column, you can achieve "partition" elimination on your query.
DDL:
create table table_a (
 c1 ...
,c2 ...
,c3 ...
,table_ind char(1) not null generated always as ('A')
,constraint ck_table_a_ind check (table_ind = 'A')
);
create table table_b (
 c1 ...
,c2 ...
,c3 ...
,table_ind char(1) not null generated always as ('B')
,constraint ck_table_b_ind check (table_ind = 'B')
);
create view v1 as (
select * from table_a
union all
select * from table_b
);
If you execute the query select c1,c2,c3 from v1 where table_ind = 'A' the DB2 optimizer will use the check constraint to recognize that no rows in table_b can match the table_ind = 'A' predicate, so it will completely eliminate the table from the access plan.
This was used (and still is in some cases) before DB2 for Linux/UNIX/Windows supported Range Partitioning. You can read more about this technique in this research paper [PDF] written by some of the IBM DB2 developers back in 2002.
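For reference, this is the query from the paragraph above written out against the view; the comment states the expected optimizer behaviour rather than anything the query itself proves:
-- The check constraint on table_b contradicts table_ind = 'A',
-- so DB2 can remove table_b from the access plan entirely
SELECT c1, c2, c3
FROM v1
WHERE table_ind = 'A';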

Display multiple queries with different row types as one result

In PostgreSQL 8.3 on Ubuntu, I have 3 tables, say T1, T2, T3, of different schemas.
Each of them contains (a few) records related to the object of the ID I know.
Using 'psql', I frequently do the 3 operations:
SELECT field-set1 FROM T1 WHERE ID='abc';
SELECT field-set2 FROM T2 WHERE ID='abc';
SELECT field-set3 FROM T3 WHERE ID='abc';
and just watch the results; for me it is enough to see them.
Is it possible to have a procedure/function/macro etc., with one parameter 'id',
just running the three SELECTs one after another,
displaying the results on the screen?
field-set1, field-set2 and field-set3 are completely different.
There is no reasonable way to JOIN the tables T1, T2, T3; these are unrelated data.
I do not want JOIN.
I want to see the three resulting sets on the screen.
Any hint?
Quick and dirty method
If the row types (data types of all columns in sequence) don't match, UNION will fail.
However, in PostgreSQL you can cast a whole row to its text representation:
SELECT t1::text AS whole_row_in_text_representation FROM t1 WHERE id = 'abc'
UNION ALL
SELECT t2::text FROM t2 WHERE id = 'abc'
UNION ALL
SELECT t3::text FROM t3 WHERE id = 'abc';
Only one ; at the end, and even that one is optional for a single statement.
A more refined alternative
But it also needs a lot more code. Pick the table with the most columns first, cast every individual column to text and give it a generic name. Add NULL values for the other tables with fewer columns. You can even insert headers between the tables:
SELECT '-t1-'::text AS c1, '---'::text AS c2, '---'::text AS c3 -- table t1
UNION ALL
SELECT '-col1-'::text, '-col2-'::text, '-col3-'::text -- 3 columns
UNION ALL
SELECT col1::text, col2::text, col3::text FROM t1 WHERE id = 'abc'
UNION ALL
SELECT '-t2-'::text, '---'::text, '---'::text -- table t2
UNION ALL
SELECT '-col_a-'::text, '-col_b-'::text, NULL::text -- 2 columns, 1 NULL
UNION ALL
SELECT col_a::text, col_b::text, NULL::text FROM t2 WHERE id = 'abc'
...
Put a UNION ALL in between and give all the columns the same alias:
SELECT field-set1 as fieldset FROM T1 WHERE ID='abc'
union all
SELECT field-set2 as fieldset FROM T2 WHERE ID='abc'
union all
SELECT field-set3 as fieldset FROM T3 WHERE ID='abc';
and execute it at once.

Comparison Query to Compare Two SQL Server Tables [duplicate]

This question already has answers here:
sql query to return differences between two tables
(14 answers)
Closed 6 years ago.
I would like to know how to compare the records of two different database tables. What I mean is: I will compare two tables which may have different column names but the same data, except that one of them may have more records than the other, so I want to see what the difference is between those two tables. How do I write the SQL query to do that? FYI: these two databases are under the same SQL Server instance.
Table1
+------+----------+
| name | lastname |
+------+----------+
| John | rose     |
| Demy | Sanches  |
+------+----------+
Table2
+-------+-----------+
| name2 | lastname2 |
+-------+-----------+
| John  | rose      |
| Demy  | Sanches   |
| Ruby  | Core      |
+-------+-----------+
Then, after comparing Table1 and Table2, it should return Ruby Core from Table2.
Select * from Table2
Except
Select * from Table1
It will show the records that are in Table2 but not in Table1 (Ruby Core in this example); swap the two table names to see records that exist only in Table1.
Late answer but can be useful to other readers of this thread
Besides other solutions, I can recommend a SQL comparison tool called ApexSQL Data Diff.
I know you'd prefer the solution not based on the software, but for other visitors, who may want to do this in an easier way, I strongly suggest reading this article: http://solutioncenter.apexsql.com/how-to-compare-sql-server-database-tables-with-different-names/
The article explains how to use the Object mapping feature in ApexSQL Data Diff, which is particularly useful in situations where two tables share the same name, but their column names are different.
To handle such a case - each column pair needs to be mapped manually in order for the data stored within them to be included when comparing SQL database tables for differences.
If you do an outer join from T1 to T2, you can find rows in the former that are not in the latter by looking for NULLs in the T2 values; similarly, an outer join of T2 to T1 will give you the rows only in T2. UNION the two together and you get the lot... something like:
SELECT 'Table1' AS TableName, name, lastname FROM
Table1 LEFT OUTER JOIN Table2 ON Table1.name = Table2.name2
AND Table1.lastname = Table2.lastname2
WHERE Table2.name2 IS NULL
UNION
SELECT 'Table2' AS TableName, name2 as name, lastname2 as lastname FROM
Table2 LEFT OUTER JOIN Table1 ON Table2.name2 = Table1.name
AND Table2.lastname2 = Table1.lastname
WHERE Table1.name IS NULL
That's off the top of my head - and I'm a bit rusty :)
If you are using SQL Server, use a FULL JOIN. It does exactly the same as Murph said, but in one command.
SELECT CASE WHEN Table1.name IS NULL THEN 'Table2' ELSE 'Table1' END AS TableName,
       COALESCE(Table1.name, Table2.name2) AS name,
       COALESCE(Table1.lastname, Table2.lastname2) AS lastname
FROM Table1
FULL JOIN Table2 ON Table1.name = Table2.name2
AND Table1.lastname = Table2.lastname2
WHERE Table1.name IS NULL OR Table2.name2 IS NULL
You could use the CHECKSUM function if you're confident that the data is expressed identically.
Example:
if not OBJECT_ID('Table1', 'U') is null drop table Table1
if not OBJECT_ID('Table2', 'U') is null drop table Table2
create table table1
( id int identity(0, 1),
name varchar(128),
lastname varchar(128)
)
create table table2
( id int identity(0, 1),
name varchar(128),
lastname varchar(128)
)
insert into table1 (name, lastname) values ('John', 'rose')
insert into table1 (name, lastname) values ('Demy', 'Sanches')
insert into table2 (name, lastname) values ('John', 'rose')
insert into table2 (name, lastname) values ('Demy', 'Sanches')
insert into table2 (name, lastname) values ('Ruby', 'Core')
select
table2.*
from table1
right outer join table2 on CHECKSUM(table1.name, table1.lastname) = CHECKSUM(table2.name, table2.lastname)
where table1.id is null
See the CHECKSUM MSDN topic for more information.
Try dbForge Data Compare for SQL Server. It can compare and synchronize any database data. Quick, easy, always delivering a correct result. See how it flies on your database!
create table #test
(
Sno INT IDENTITY(1,1),
ExpDate VARCHAR(50),
Amt INT,
Amt1 INT,
Amt2 INT,
SumoAmt INT
)
create table #test1
(
Sno INT IDENTITY(1,1),
ExpDate VARCHAR(50),
Amt INT,
Amt1 INT,
Amt2 INT,
SumoAmt INT
)
INSERT INTO #test(Expdate,Amt,Amt1,Amt2,SumoAmt) values ('30-07-2012',10,20,10,40)
INSERT INTO #test(Expdate,Amt,Amt1,Amt2,SumoAmt) values ('30-07-2012',10,20,20,50)
INSERT INTO #test(Expdate,Amt,Amt1,Amt2,SumoAmt) values ('30-07-2012',10,20,30,60)
INSERT INTO #test(Expdate,Amt,Amt1,Amt2,SumoAmt) values ('30-07-2012',NULL,20,40,70)
INSERT INTO #test1(Expdate,Amt,Amt1,Amt2,SumoAmt) values ('30-07-2012',10,20,10,40)
INSERT INTO #test1(Expdate,Amt,Amt1,Amt2,SumoAmt) values ('30-07-2012',10,20,20,50)
INSERT INTO #test1(Expdate,Amt,Amt1,Amt2,SumoAmt) values ('30-07-2012',10,20,30,60)
INSERT INTO #test1(Expdate,Amt,Amt1,Amt2,SumoAmt) values ('30-07-2012',NULL,20,40,70)
SELECT MIN(TableName) as TableName, Sno,Expdate,Amt,Amt1,Amt2,SumoAmt
FROM
(
SELECT '#test' as TableName,Sno,Expdate,Amt,Amt1,Amt2,SumoAmt
FROM #test
UNION ALL
SELECT '#test1' as TableName,Sno,Expdate,Amt,Amt1,Amt2,SumoAmt
FROM #test1
) tmp
GROUP BY Sno,Expdate,Amt,Amt1,Amt2,SumoAmt
HAVING COUNT(*) = 1
ORDER BY sno
If you want the differences from both tables:
(SELECT *, 'in Table1' AS Comments
FROM Table1
EXCEPT
SELECT * , 'in Table1' AS Comments
FROM Table2)
UNION
(SELECT *, 'in Table2' AS Comments
FROM Table2
EXCEPT
SELECT *, 'in Table2' AS Comments
FROM Table1)
Firefly will do exactly what you're looking for. It lets you build two SQL statements and then compare the results of the queries, showing missing rows and data differences. Each query can even come from a different database, like Oracle / SQL Server.
http://download.cnet.com/Firefly-Data-Compare-Tool/3000-10254_4-10633690.html?tag=mncol