I am a bit lost trying to insert data from an Excel sheet into 4 tables in a specific scenario, using SSIS.
Each row of my Excel sheet needs to be split across 3 tables. The identity column values then need to be inserted into a 4th mapping table to hold the relationship. How do I achieve this efficiently using SSIS 2008?
Note that in the example below, it is fixed that both col4 and col5 go into the 3rd table.
Here is a data example:
Excel
col1  col2  col3  col4  col5
a     b     c     d     3
a     x     c     y     5

Table1
PK  col
1   a
2   a

Table2
PK  col1  col2
1   b     c
2   x     c

Table3
PK  Col
1   d
2   3
3   y
4   5
Map_table
PK  Table1_ID  Table2_ID  Table3_ID
1   1          1          1
2   1          1          2
3   2          2          3
4   2          2          4
I am fine even if just a SQL-based approach is suggested, as I do not have any mandate to use SSIS only. An additional challenge is that if the same data row already exists in Table2, I want to use that existing ID in the map table instead of inserting duplicate rows!
Multicast is the component you are looking for. This component takes an input source and duplicates it into as many outputs as you need. In that scenario, you can have an Excel source and duplicate the flow to insert the data into your Table1, Table2 and Table3.
Now, the tricky part is getting those identities back into your Map_Table. Either you don't use IDENTITY and use some other means (like a GUID, or an incremental counter of your own that you would set up as a derived column before the Multicast), or you use @@IDENTITY to retrieve the last inserted identity. Using @@IDENTITY sounds like a pain to me for your current scenario, but that's up to you. If the data is not that huge, I would go for a GUID.
@@IDENTITY doesn't work well with BULK operations: it will retrieve only the last identity created. Also, keep in mind that I talked about @@IDENTITY, but you may want to use IDENT_CURRENT('TableName') instead to retrieve the last identity for a specific table. @@IDENTITY retrieves the last identity created within your session, whatever the scope. You can use SCOPE_IDENTITY() to retrieve the last identity within your scope.
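If you do go the pure SQL route instead, one pattern (a sketch only: the #staging table, its row_id correlation key, and the column mapping are assumptions taken from the example above, and it assumes each PK is an IDENTITY column) is to bulk-load the Excel rows into a staging table, capture the generated identities with the OUTPUT clause, and then build Map_Table from the captured pairs. MERGE is used for the plain insert because, unlike INSERT ... OUTPUT, its OUTPUT list can reference source columns:

CREATE TABLE #staging (
    row_id INT IDENTITY(1,1),   -- our own correlation key for matching IDs back
    col1 VARCHAR(50), col2 VARCHAR(50), col3 VARCHAR(50),
    col4 VARCHAR(50), col5 VARCHAR(50)
);
-- (bulk-load the Excel rows into #staging here)

DECLARE @t1 TABLE (row_id INT, table1_id INT);

-- Table1: every staging row becomes a new row, so use a never-matching
-- MERGE purely to get src.row_id into the OUTPUT clause.
MERGE Table1 AS tgt
USING #staging AS src ON 1 = 0
WHEN NOT MATCHED THEN INSERT (col) VALUES (src.col1)
OUTPUT src.row_id, inserted.PK INTO @t1;

-- Table2 with the dedupe requirement: insert only the missing pairs ...
INSERT INTO Table2 (col1, col2)
SELECT DISTINCT s.col2, s.col3
FROM #staging s
WHERE NOT EXISTS (SELECT 1 FROM Table2 t
                  WHERE t.col1 = s.col2 AND t.col2 = s.col3);

-- ... then pick up every row's Table2 PK (new or pre-existing) by value:
SELECT s.row_id, t.PK AS table2_id
FROM #staging s
JOIN Table2 t ON t.col1 = s.col2 AND t.col2 = s.col3;

Table3 and Map_Table follow the same pattern; for Table3 you would first unpivot col4 and col5 into two rows per staging row.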
Related
I basically want to create a table like this
col1|col2
---------
1 1
1 2
1 3
2 1
3 1
2 2
1 4
where column 2 auto-increments, but its auto-increment values are scoped to column 1's value rather than to the table as a whole. Is this possible?
I thought I found a duplicate question, but it was for PostgreSQL. Apologies for temporarily marking your question as a duplicate. I've reversed that.
I don't know for certain if this is possible in SQLite in an automated way, but one solution would be to do it in steps:
1. BEGIN a transaction and INSERT one row into the table with a NULL for col2. This should acquire a RESERVED lock and prevent other concurrent processes from doing the same thing and causing a race condition.
2. SELECT MAX(col2) FROM mytable WHERE col1 = ? to get the greatest value inserted for the given group so far.
3. UPDATE mytable SET col2 = ?+1 WHERE col1 = ? AND col2 IS NULL, using the MAX discovered in step 2.
4. COMMIT to write the changes to the file.
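Put together, a minimal sketch of those steps (using the question's mytable, col1, col2 names, folding step 2's MAX into the UPDATE, and hard-coding group 1 for illustration):

BEGIN IMMEDIATE;  -- take the RESERVED lock up front instead of on the first write

INSERT INTO mytable (col1, col2) VALUES (1, NULL);

-- MAX ignores the NULL we just inserted; IFNULL covers the group's first row.
UPDATE mytable
SET col2 = (SELECT IFNULL(MAX(col2), 0) + 1
            FROM mytable
            WHERE col1 = 1)
WHERE col1 = 1 AND col2 IS NULL;

COMMIT;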
I have three tables xx_1, xx_2, and xx_3 such that:
xx_1
id  obj_version_num  location
1   x                ubudu
2   x                bali
3   x                india

xx_2
id  name  grade
1   abc   band 1
2   xyz   band 2
3   gdgd  band 3

xx_3
Name   details  col1  p_id
abc    A        HDHD  10
xyz    B        HDHD  20
gdgd   C        HDHD  30
smith  D        HDHD  40
I want to delete data from xx_1 and xx_2 if the name is 'smith' in xx_3.
Currently I am doing:
delete from xx_1
where id in (select distinct t.id
             from xx_2 t, xx_3 k
             where t.name = k.name
               and k.name = 'Smith')
and then
delete from xx_2
where name = 'Smith'
Is there any way I can delete data from both these tables together, without creating two separate scripts?
There is no way to delete from many tables with a single statement, but the better question is why do you need to delete from all tables at the same time? It sounds to me like you don't fully understand how transactions work in Oracle.
Let's say you log in and delete a row from table 1, but do not commit. As far as all other sessions are concerned, that row has not been deleted. If you open another connection and query for the row, it will still be there.
Then you delete from tables 2, 3 and then 4 in turn. You still have not committed the transaction, so all other sessions on the database can still see the deleted rows.
Then you commit.
All at the same time, the other sessions will no longer see the rows you deleted from the 4 tables, even though you did the deletes in 4 separate statements.
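In other words, a single transaction already gives you the "both together" behavior. For the question's tables, that is simply the two statements from the question committed as one unit:

DELETE FROM xx_1
WHERE id IN (SELECT t.id
             FROM xx_2 t, xx_3 k
             WHERE t.name = k.name
               AND k.name = 'Smith');

DELETE FROM xx_2
WHERE name = 'Smith';

COMMIT;  -- other sessions now see both deletes at once, or neither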
EDIT after edit in question:
You can define the foreign keys on the 3 child tables with ON DELETE CASCADE. Then when you delete from the parent table, all associated rows in the 3 child tables are also deleted.
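For the tables in the question, that could look something like this (a sketch only: it assumes xx_3.name is declared PRIMARY KEY or UNIQUE so it can be referenced, and the constraint name is made up):

ALTER TABLE xx_2
  ADD CONSTRAINT fk_xx2_xx3_name
  FOREIGN KEY (name) REFERENCES xx_3 (name)
  ON DELETE CASCADE;

-- With the constraint in place, one statement on the parent is enough;
-- matching xx_2 rows are removed automatically:
DELETE FROM xx_3 WHERE name = 'Smith';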
You cannot delete from multiple tables in a single statement, primary key or not.
In my Talend job I am copying data from Excel to a SQL table.
For this job, to maintain the foreign key constraint, I had to do a lookup before copying the data.
The task goes like this:
I have to copy data into Table_2 (id, keys, value).
My Excel sheet has data for the id and keys columns. Table_1 has two columns: id and value.
For the value column's data, I want to look at Table_1's corresponding entry with the id of the current record in Table_2. I have to copy the data from Table_1's value column to Table_2's value column.
Excel
id  keys
1   a
2   b
3   c

Table_1
id  value
1   123
2   456
3   789

Desired output, Table_2:
id  keys  value
1   a     123
2   b     456
3   c     789

Current output, Table_2:
id  keys  value
1   a     null
2   b     null
3   c     null
How do I properly map this?
You've set the job up exactly as it needs to be done really, so it's not your job layout that's the problem here.
If you're expecting every record in your Excel document to have a matching record in your database table then you should use an inner join condition in your tMap join.
This then allows you to add an output that grabs everything that isn't joining (which is your issue here).
That output should show you everything in your source (not lookup) flow that isn't matching. I suspect that everything is failing to match on your join condition (essentially WHERE ExcelDoc.id = Table.id), even if it looks like it should. This could be down to a mismatch of datatypes as they are read into Talend (the Java/Talend datatypes such as int/Integer or String, rather than the DB types) or because one of your id columns has extraneous whitespace padding.
If you find that your id column in one of your sources does in fact have any padding then you should be able to convert it in a previous step before your join, or even just make the transformation in the join itself.
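If it helps to reason about what the lookup does, it is essentially the following SQL (a sketch; excel_rows is a made-up name for the staged Excel data, and the TRIM/CAST calls stand in for whatever conversion your id columns actually need):

SELECT e.id, e.keys, t.value
FROM excel_rows e
LEFT JOIN Table_1 t
  ON LTRIM(RTRIM(CAST(e.id AS VARCHAR(20))))
   = LTRIM(RTRIM(CAST(t.id AS VARCHAR(20))));
-- Any row where t.value comes back NULL is one that failed the join.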
The only other thing I'd recommend is that you name your flows between components (just click on them to change the name), especially when you are joining anything in a tMap.
I am not sure if my question will be precise and understandable, so I apologize in advance.
I have an Excel 2013 file with rows which I should import into a database table. Now, that would be easy, but here is the catch: there are some multiple-value fields (I think listbox fields, but I am not sure; there are no properties or lists defined anywhere, those fields just have a few values to choose from and a little arrow box).
I need to break rows that have those multiple-value fields into a new row for EVERY value from the list. Every row in the sheet has different values, and some rows don't have multiple values; those should be ignored.
Example:

column1  column2  column3
1        2        multiple values: a, b or c

should be broken into 3 new rows:

column1  column2  column3
1        2        a
1        2        b
1        2        c
I don't know how to do this. Or at least, how do I import data like the above (all the multiple values, with the rest of the data just the same) into the database using SSIS? Thank you for every tip you have!
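If you can land the raw rows in a staging table first, one way to do the split on the SQL side is the XML trick that works on SQL Server 2008 (a sketch: the staging table and column names are assumptions from the example, and it assumes column3 arrives as a plain comma-separated string with no XML-special characters like & or <):

SELECT s.column1,
       s.column2,
       LTRIM(x.part.value('.', 'varchar(100)')) AS column3
FROM staging AS s
CROSS APPLY (SELECT CAST('<v>' + REPLACE(s.column3, ',', '</v><v>') + '</v>' AS XML)) AS t(doc)
CROSS APPLY t.doc.nodes('/v') AS x(part);

Inside the data flow itself, the usual SSIS equivalent is a Script Component that splits the string and emits one output row per value.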
I'm interested in T-SQL source code for synchronizing a table (or perhaps a subset of it) with data from another, similar table. The two tables could contain any columns; for example I could have
base table       source table
==========       ============
id  val          id  val
--  ---          --  ---
0   1            0   3
1   2            1   2
2   3            3   4

or

base table           source table
===============      ===============
key  val1  val2      key  val1  val2
---  ----  ----      ---  ----  ----
A    1     0         A    1     1
B    2     1         C    2     2
C    3     3         E    4     0
or any two tables containing similar columns with similar names. I'd like to be able to:

1. check that the two tables have matching columns: the source table has exactly the same columns as the base table, and the datatypes match
2. make a diff from the base table to the source table
3. do the necessary updates, deletes and inserts to change the data in the base table to correspond to the source table
4. optionally limit the diff to a subset of the base table,

preferably with a stored procedure. Has anyone written a stored proc for this, or could you point me to a source?
SQL Server 2008 features the new MERGE statement. It's very flexible, if a bit complex to write out.
As an example, the following query would synchronize the #base and #source tables. It's limited to a subset of #base where id <> 2:
MERGE #base AS tgt
USING #source AS src
  ON tgt.id = src.id AND tgt.val = src.val
WHEN NOT MATCHED BY TARGET
  THEN INSERT (id, val) VALUES (src.id, src.val)
WHEN NOT MATCHED BY SOURCE AND tgt.id <> 2
  THEN DELETE;
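If you'd rather update rows whose key matches but whose values differ (instead of the delete-and-reinsert that the value-based join above produces), a variant matching on the key alone would be (a sketch against the same temp tables):

MERGE #base AS tgt
USING #source AS src
  ON tgt.id = src.id
WHEN MATCHED AND tgt.val <> src.val
  THEN UPDATE SET val = src.val
WHEN NOT MATCHED BY TARGET
  THEN INSERT (id, val) VALUES (src.id, src.val)
WHEN NOT MATCHED BY SOURCE AND tgt.id <> 2
  THEN DELETE;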
Interesting question.
You could start with EXCEPT and INTERSECT:
http://msdn.microsoft.com/en-us/library/ms188055.aspx
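For instance, the two halves of the diff can be computed directly (a sketch against the #base/#source tables from the MERGE answer above):

-- Rows in #source that are missing or different in #base:
SELECT id, val FROM #source
EXCEPT
SELECT id, val FROM #base;

-- Rows in #base that are missing or different in #source:
SELECT id, val FROM #base
EXCEPT
SELECT id, val FROM #source;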
Here is a ready-made solution that may help you:
http://www.sqlservercentral.com/scripts/Miscellaneous/30596/
Not sure if it's of any use to your specific situation, but this kind of operation is usually and relatively easily done using external tools (SQL Workbench's diff, SQL Compare, etc.).
It can even be scripted; it's just probably not invokable from a T-SQL procedure.