I'm interested in T-SQL source code for synchronizing a table (or perhaps a subset of it) with data from another, similar table. The two tables could contain any columns; for example I could have
base table          source table
==========          ============
id  val             id  val
----------          ------------
0   1               0   3
1   2               1   2
2   3               3   4
or
base table           source table
===================  ==================
key  val1  val2      key  val1  val2
-------------------  ------------------
A    1     0         A    1     1
B    2     1         C    2     2
C    3     3         E    4     0
or any two tables containing similar columns with similar names. I'd like to be able to:

- check that the two tables have matching columns: the source table has exactly the same columns as the base table, and the datatypes match
- make a diff from the base table to the source table
- do the necessary updates, deletes and inserts to change the data in the base table to correspond to the source table
- optionally limit the diff to a subset of the base table

preferably with a stored procedure. Has anyone written a stored proc for this, or could you point me to a source?
SQL Server 2008 features the new merge statement. It's very flexible, if a bit complex to write out.
As an example, the following query would synchronize the #base and #source tables. It's limited to a subset of #base where id <> 2:
MERGE #base AS tgt
USING #source AS src
   ON tgt.id = src.id AND tgt.val = src.val
WHEN NOT MATCHED BY TARGET
    THEN INSERT (id, val) VALUES (src.id, src.val)
WHEN NOT MATCHED BY SOURCE AND tgt.id <> 2
    THEN DELETE;
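MERGE needs SQL Server 2008+, but the same diff semantics (delete base rows missing from the source, insert source rows missing from the base) can be sketched in portable SQL. A minimal illustration using Python's sqlite3, with the sample data from the question:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE base   (id INTEGER, val INTEGER);
    CREATE TABLE source (id INTEGER, val INTEGER);
    INSERT INTO base   VALUES (0, 1), (1, 2), (2, 3);
    INSERT INTO source VALUES (0, 3), (1, 2), (3, 4);
""")

# Delete base rows with no exact match in source
# (the MERGE example's WHEN NOT MATCHED BY SOURCE branch).
con.execute("""
    DELETE FROM base
    WHERE (id, val) NOT IN (SELECT id, val FROM source)
""")

# Insert source rows missing from base
# (the WHEN NOT MATCHED BY TARGET branch).
con.execute("""
    INSERT INTO base
    SELECT id, val FROM source
    EXCEPT
    SELECT id, val FROM base
""")
```

After both statements, base holds exactly the source rows; a subset filter like the `tgt.id <> 2` above would just become an extra condition in the DELETE's WHERE clause.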
Interesting question. You could start from EXCEPT / INTERSECT:
http://msdn.microsoft.com/en-us/library/ms188055.aspx
Here is a ready-made solution that may help you:
http://www.sqlservercentral.com/scripts/Miscellaneous/30596/
Not sure if it's of any use to your specific situation, but this kind of operation is usually and relatively easily done using external tools (SQL Workbench diff, SQL Compare etc.).
It can even be scripted, just probably not invokable from a T-SQL procedure.
Related
Imagine you have a table t where statusID is a calculated field based on status ('Live' = 1, 'Deleted' = 2 and so on). t is partitioned by statusID, meaning that all 1s go into one partition, all 2s into another, etc. Now let's query t. If I want all live records, can I use WHERE status = 'live', or must I use WHERE statusID = 1 to take advantage of the partitioning?
ID  val1  val2  status    statusID
--  ----  ----  --------  --------
1   abc   ABC   live      1
2   xyz   XYZ   deleted   2
3   foo   BAR   archived  3
We have some existing tables which are getting quite large, and code that leverages the status col, which is indexed. For most loads this is fine, but with the rows getting into the millions we are starting to see issues with joins.
I basically want to create a table like this
col1|col2
---------
1 1
1 2
1 3
2 1
3 1
2 2
1 4
where column 2 auto-increments, but its values are scoped not to the table as a whole but to each distinct value in column 1. Is this possible?
I thought I found a duplicate question, but it was for PostgreSQL. Apologies for temporarily marking your question as a duplicate. I've reversed that.
I don't know for certain whether this is possible in SQLite in an automated way, but one solution would be to do it in steps:
1. BEGIN a transaction and INSERT one row into the table with NULL for col2. This should acquire a RESERVED lock and prevent other concurrent processes from doing the same thing and causing a race condition.
2. SELECT MAX(col2) FROM mytable WHERE col1 = ? to get the greatest value inserted for the given group so far.
3. UPDATE mytable SET col2 = ? + 1 WHERE col1 = ? AND col2 IS NULL, using the MAX discovered in step 2.
4. COMMIT to write the changes to the file.
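A minimal sketch of those four steps using Python's sqlite3 (mytable, col1 and col2 are the names from the question):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE mytable (col1 INTEGER, col2 INTEGER)")

def insert_grouped(con, col1):
    """Insert a row whose col2 auto-increments per col1 group."""
    with con:  # BEGIN ... COMMIT; rolls back on error
        # Step 1: insert with a NULL placeholder for col2.
        con.execute("INSERT INTO mytable (col1, col2) VALUES (?, NULL)", (col1,))
        # Step 2: greatest col2 used in this group so far (MAX ignores the NULL).
        (prev,) = con.execute(
            "SELECT MAX(col2) FROM mytable WHERE col1 = ?", (col1,)
        ).fetchone()
        # Step 3: replace the placeholder with MAX + 1 (1 for a brand-new group).
        con.execute(
            "UPDATE mytable SET col2 = ? WHERE col1 = ? AND col2 IS NULL",
            ((prev or 0) + 1, col1),
        )

for g in [1, 1, 1, 2, 3, 2]:
    insert_grouped(con, g)
```

The `with con:` block is sqlite3's transaction context manager, so steps 1-3 commit together, matching step 4.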
I need to create a package to migrate a large amount of data from one database table into a different database table. The source table will keep receiving new data for the next 4-5 days, so I will run my package again and again.
I need to migrate all data from this table to the other table, but I don't want to re-migrate the rows I have already migrated. What kind of transformation do I need to use, or what SQL command do I need to write, to do this?
The usual way this is done is by having "audit" timestamps on the source table and migrating only the records updated or inserted after the last migration.
for example:
Table Sales:
    sale_id
    sale_date
    sale_amount
    ...
    dw_create_date
    dw_update_date
Your source extraction could be something along the lines of:

select sales.sale_id,
       sales.sale_date,
       ...
from sales
where sales.dw_update_date > {last_migration_date}
last_migration_date is usually read from a config file or table.
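A minimal sketch of that extraction pattern, using Python's sqlite3 with made-up dates and a hard-coded last_migration_date standing in for the config value:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE sales (sale_id INTEGER, sale_amount REAL, dw_update_date TEXT);
    INSERT INTO sales VALUES
        (1, 10.0, '2023-01-01'),
        (2, 20.0, '2023-01-05'),
        (3, 30.0, '2023-01-09');
""")

# Normally read from a config file or table and advanced after each run.
last_migration_date = "2023-01-03"

# Only rows touched since the last migration are extracted.
new_rows = con.execute(
    "SELECT sale_id FROM sales WHERE dw_update_date > ? ORDER BY sale_id",
    (last_migration_date,),
).fetchall()
```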
Other approaches
There are a few other approaches that you could use, but all of these have bigger performance problems as your data size grows.
1) Do a (source - target) diff to get the changed rows in the source (MINUS in Oracle; SQL Server uses EXCEPT):

select *
from source
minus
select * from target
You could do the same using a join between source and target:

select src.*
from source src
left join target tgt on (src.id = tgt.id)
where tgt.id is null  -- rows new in the source
   or src.column1 <> tgt.column1
   or src.column2 <> tgt.column2
   ............
Note that neither of these approaches takes care of deletes in the source. If you want the tables to stay in sync, the only way to do that is a (source - target) diff to get the inserted/updated rows and a (target - source) diff to get the deleted rows, and to apply both to the target.
2) Insert and ignore the primary key constraint errors:
This has serious issues if the data can change in the source and you want the updates propagated to the target. You'd also be querying the entire source each time. It is usually better to use a merge/upsert along with filtered source data instead.
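As a sketch of that merge/upsert alternative (SQLite's INSERT ... ON CONFLICT here, via Python's sqlite3; SQL Server would use MERGE instead):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE target (id INTEGER PRIMARY KEY, amount REAL);
    INSERT INTO target VALUES (1, 10.0), (2, 20.0);
""")

# Pre-filtered changed rows from the source: id 2 updated, id 3 new.
changed = [(2, 25.0), (3, 30.0)]

# Upsert: insert new rows, update existing ones instead of
# failing on the primary key constraint.
con.executemany(
    """
    INSERT INTO target (id, amount) VALUES (?, ?)
    ON CONFLICT(id) DO UPDATE SET amount = excluded.amount
    """,
    changed,
)
```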
I would assume both tables have some unique identifier, no?
Table A has:
1
2
3
4
You're moving that to Table B, but keeping the data in Table A at the same time, yes?
So you've run your job once. Now Table B has:
1
2
3
4
Table A gets updated. It now has:
1
2
3
4
5
6
7
You run your job again, but you only want to send over 5,6,7.
SELECT *
FROM TableA
LEFT OUTER JOIN TableB ON TableA.ID = TableB.ID
WHERE TableB.ID IS NULL
If you have some sample data it would help. Does this give you a good idea?
See joins: http://i.stack.imgur.com/1UKp7.png
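That anti-join can be tried out with Python's sqlite3 (note the IS NULL test; = NULL never matches anything in SQL):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE TableA (ID INTEGER PRIMARY KEY);
    CREATE TABLE TableB (ID INTEGER PRIMARY KEY);
    INSERT INTO TableA VALUES (1), (2), (3), (4), (5), (6), (7);
    INSERT INTO TableB VALUES (1), (2), (3), (4);  -- already migrated
""")

# Rows in TableA with no match in TableB are the ones still to send.
to_migrate = con.execute("""
    SELECT TableA.ID
    FROM TableA
    LEFT OUTER JOIN TableB ON TableA.ID = TableB.ID
    WHERE TableB.ID IS NULL
    ORDER BY TableA.ID
""").fetchall()
```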
I am a bit lost trying to insert my data in a specific scenario from an excel sheet into 4 tables, using SSIS.
Each row of my excel sheet needs to be split into 3 tables. The identity column value then needs to be inserted into a 4th mapping table to hold the relationship. How do I achieve this efficiently using SSIS 2008?
Note that in the example below, it is fixed that both col4 and col5 go into the 3rd table.
Here is a data example:

Excel
col1  col2  col3  col4  col5
a     b     c     d     3
a     x     c     y     5

Table1
PK  col
1   a
2   a

Table2
PK  col1  col2
1   b     c
2   x     c

Table3
PK  Col
1   d
2   3
3   y
4   5

Map_table
PK  Table1_ID  Table2_ID  Table3_ID
1   1          1          1
2   1          1          2
3   2          2          3
4   2          2          4
I am fine even if just a SQL-based approach is suggested, as I do not have any mandate to use SSIS only. An additional challenge: if the same data row already exists in Table2, I want to reuse that row's ID in the map table instead of inserting a duplicate row!
Multicast is the component you are looking for. This component takes an input source and duplicates it into as many outputs as you need. In this scenario, you can have an Excel source and duplicate the flow to insert the data into your Table1, Table2 and Table3.
Now, the tricky part is getting those identities back into your Map_table. Either you don't use IDENTITY and use some other means (like a GUID, or an incremental counter of your own that you set up as a derived column before the Multicast), or you use @@IDENTITY to retrieve the last inserted identity. Using @@IDENTITY sounds like a pain to me for your current scenario, but that's up to you. If the data is not that huge, I would go for a GUID.
@@IDENTITY doesn't work well with BULK operations: it retrieves only the last identity created. Also, keep in mind that I talked about @@IDENTITY, but you may want to use IDENT_CURRENT('TableName') instead to retrieve the last identity for a specific table. @@IDENTITY retrieves the last identity created within your session, whatever the scope. You can use SCOPE_IDENTITY() to retrieve the last identity within your scope.
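Outside SSIS, the underlying row-at-a-time pattern (insert, capture the generated identity, then write the mapping row) can be sketched with Python's sqlite3, where cursor.lastrowid plays the role of SCOPE_IDENTITY():

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Table1 (PK INTEGER PRIMARY KEY, col TEXT);
    CREATE TABLE Table2 (PK INTEGER PRIMARY KEY, col1 TEXT, col2 TEXT);
    CREATE TABLE Table3 (PK INTEGER PRIMARY KEY, Col TEXT);
    CREATE TABLE Map_table (PK INTEGER PRIMARY KEY,
                            Table1_ID INTEGER, Table2_ID INTEGER, Table3_ID INTEGER);
""")

def load_row(con, col1, col2, col3, col4, col5):
    """Split one spreadsheet row across three tables and record the mapping."""
    cur = con.cursor()
    cur.execute("INSERT INTO Table1 (col) VALUES (?)", (col1,))
    id1 = cur.lastrowid  # identity of the Table1 row just inserted
    cur.execute("INSERT INTO Table2 (col1, col2) VALUES (?, ?)", (col2, col3))
    id2 = cur.lastrowid
    # col4 and col5 each become a Table3 row, and each gets a mapping row.
    for v in (col4, col5):
        cur.execute("INSERT INTO Table3 (Col) VALUES (?)", (v,))
        cur.execute(
            "INSERT INTO Map_table (Table1_ID, Table2_ID, Table3_ID) VALUES (?, ?, ?)",
            (id1, id2, cur.lastrowid),
        )

load_row(con, "a", "b", "c", "d", "3")
load_row(con, "a", "x", "c", "y", "5")
```

The Table2 deduplication the question asks about would be a SELECT for an existing (col1, col2) pair before that INSERT, falling back to lastrowid only on a miss.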
I have a table that is similar to the following below:
id | cat        | one_above  | top_level
0  | 'printers' | 'hardware' | 'computers'
I want to be able to write a query, without using unions, that will return me a result set that transposes this table's columns into rows. What this means, is that I want the result to be:
id | cat
0  | 'printers'
0  | 'hardware'
0  | 'computers'
Is this possible in MySQL? I can not drop down to the application layer and perform this because I'm feeding these into a search engine that will index based on the id. Various other DBMS have something like PIVOT and UNPIVOT. I would appreciate any insight to something that I'm missing.
Mahmoud
P.S.
I'm considering re-normalization of the database as a last option, since this won't be a trivial task.
Thanks!
I got this out of the book The Art of SQL, pages 284-286:
Let's say your table name is foo.
First, create a table called pivot:
CREATE TABLE pivot (
    count INT
);
Insert into that table as many rows as there are columns that you want to pivot in foo. Since you have three columns in foo that you want to pivot, create three rows in the pivot table:
insert into pivot values (1);
insert into pivot values (2);
insert into pivot values (3);
Now do a Cartesian join between foo and pivot, using a CASE to select the correct column based on the count:

SELECT foo.id,
       CASE pivot.count
           WHEN 1 THEN cat
           WHEN 2 THEN one_above
           WHEN 3 THEN top_level
       END AS cat
FROM foo CROSS JOIN pivot;
This should give you what you want.
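As a sanity check, the same Cartesian-join trick runs unchanged in SQLite (via Python's sqlite3), with the sample row from the question:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE foo (id INTEGER, cat TEXT, one_above TEXT, top_level TEXT);
    INSERT INTO foo VALUES (0, 'printers', 'hardware', 'computers');
    CREATE TABLE pivot (count INTEGER);
    INSERT INTO pivot VALUES (1), (2), (3);
""")

# One output row per (foo row, pivot row) pair; CASE picks which column
# of foo becomes the value on each row.
rows = con.execute("""
    SELECT foo.id,
           CASE pivot.count
               WHEN 1 THEN cat
               WHEN 2 THEN one_above
               WHEN 3 THEN top_level
           END AS cat
    FROM foo CROSS JOIN pivot
    ORDER BY pivot.count
""").fetchall()
```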
After some fairly extensive digging I stumbled on this page, which may or may not contain your answer. It's difficult in MySQL, but from a conceptual point of view I can construct a query that would transpose like this using DESCRIBE (though it would probably perform horribly), so I'm sure we can figure out a way to do it the right way.