OleDB Destination executes full rollback on error, Bulk Insert Task doesn't - sql

I'm using SSIS and BIDS to process a text file that contains lots (millions) of records. I decided to use the Bulk Insert Task and it worked great, but then the destination table needed an additional column with a default value on the insert operation, and the Bulk Insert Task stopped working. After that, I decided to use a Derived Column with the default value and an OLE DB Destination to insert the bulk data. That solved my last problem but created a new one: if there is an error when inserting the data in the OLE DB Destination, it executes a full rollback and no rows are added to my table, whereas with the Bulk Insert Task, rows were kept according to the BatchSize configuration. Let me explain it with a sample:
I used a text file with 5000 lines. The file contained a duplicate id (placed intentionally) between rows 3000 and 4000.
Before starting the DTS, the destination table was totally empty.
Using the Bulk Insert Task, after the error was raised (and the DTS stopped), the destination table had 3000 rows. I had set the BatchSize attribute to 1000.
Using the OLE DB Destination, after the error was raised, the destination table had 0 rows! I set the Rows per batch attribute to 1000 and the Maximum insert commit size to its maximum value, 2147483647. I tried changing the latter to 0, with no effect.
Is this the normal behavior of the OLE DB Destination? Can someone point me to a guide about working with these tasks? Should I forget about using these tasks and use BULK INSERT from T-SQL instead?
As a side note, I also tried following the instructions for KEEPNULLS in "Keep Nulls or Use Default Values During Bulk Import (SQL Server)" so I could avoid the OLE DB Destination, but it didn't work (maybe it's just me).
EDIT: Additional info about the problem.
Table structure (sample)
Table T
id int, name varchar(50), processed int default 0
CSV File (sample)
1, hello
2, world

There is no rolling back on bulk inserts; that's why they are fast.
Take a look at using format files:
http://msdn.microsoft.com/en-us/library/ms179250.aspx
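For the sample table above, a non-XML format file that maps only the two fields in the CSV lets BULK INSERT skip the processed column so it picks up its default value. This is only a rough sketch: the field lengths and file paths are assumptions, and the first line of the format file should match your SQL Server version (10.0 for SQL Server 2008).
T.fmt (maps the two CSV fields to the first two table columns; the processed column is omitted on purpose):
10.0
2
1   SQLCHAR   0   12   ","      1   id     ""
2   SQLCHAR   0   50   "\r\n"   2   name   SQL_Latin1_General_CP1_CI_AS
BULK INSERT dbo.T
FROM 'D:\import\data.csv'
WITH (FORMATFILE = 'D:\import\T.fmt');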

You could potentially place this in a transaction in SSIS (you'll need MSDTC running), or you could create a T-SQL script with a TRY...CATCH to handle any exceptions from the bulk insert (probably just a rollback or commit), as sketched below.
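A minimal sketch of that T-SQL approach, using the sample table and an assumed file path; with BATCHSIZE set and no outer transaction, batches that have already committed stay in the table (mirroring the Bulk Insert Task behavior), and the CATCH block decides what to do with the failure:
BEGIN TRY
    BULK INSERT dbo.T
    FROM 'D:\import\data.csv'
    WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', BATCHSIZE = 1000);
END TRY
BEGIN CATCH
    -- committed batches are kept; log the failure (or rethrow it) here
    SELECT ERROR_NUMBER() AS ErrorNumber, ERROR_MESSAGE() AS ErrorMessage;
END CATCH;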

Related

Data flow insert lock

I have an issue with my Data Flow Task locking. The task compares a couple of tables from the same server, and the result is inserted into one of the tables being compared. The table being inserted into is compared via a NOT EXISTS clause.
When performing a fast load, the task freezes without errors; when doing a regular insert, the task gives a deadlock error.
I have 2 other tasks that perform the same action on the same table and they work fine, but the amount of information being inserted is a lot smaller. I am not running these tasks in parallel.
I am considering using a NOLOCK hint to get around this because this is the only task that writes to a certain table partition; however, I am only coming to this conclusion because I cannot figure out anything else, aside from using a temp table or a hashed anti join.
You probably have a so-called deadlock situation. In your Data Flow Task (DFT) you have two separate connection instances to the same table. The first connection instance runs the SELECT and places a shared lock on the table; the second runs the INSERT and places a page or table lock.
A few words on the possible cause. The SSIS DFT reads table rows and processes them in batches. When the number of rows is small, the read completes within a single batch, and the shared lock is already gone when the insert takes place. When the number of rows is substantial, SSIS splits them into several batches and processes them sequentially. This lets the steps that follow the DFT data source run before the data source has finished reading.
Reading from and writing to the same table in the same Data Flow is not a good design because of the possible locking issue. Ways to work around it:
Move all the DFT logic into a single INSERT statement and get rid of the DFT. This might not be possible.
Split the DFT: move the data into an intermediate table, and then move it to the target table with a following DFT or SQL command. An additional table is needed.
Enable Read Committed Snapshot Isolation (RCSI) on the database and use Read Committed on the SELECT (see the sketch below). Applicable to MS SQL Server only.
The most universal way is the second one, with the additional table. The third is for MS SQL Server only.
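For the third option, enabling RCSI is a one-time database setting; this is a sketch where the database name is a placeholder, and WITH ROLLBACK IMMEDIATE disconnects other sessions so the change can take effect:
ALTER DATABASE YourDb
SET READ_COMMITTED_SNAPSHOT ON
WITH ROLLBACK IMMEDIATE;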

SQL Server: Query using data from a file

I need to run a query in SQL Server. I have a number of values stored individually on separate lines in a text file, and I need to check whether a value in a column of a table matches any one of the values stored in the txt file.
How should I go about doing this?
I am aware of how to formulate various types of queries in SQL Server; I'm just not sure how to run a query that depends on a file for its parameters.
EDIT:
Issue 1: I am not doing this via a program because the query I need to run traverses over 7 million data points, which causes the program to time out before it can complete; the only alternative I have left is to run the query in SQL Server itself without worrying about the timeout.
Issue 2: I do not have admin rights to the database I am accessing, so there is no way I could create a table, dump the file into it, and then perform a query joining the tables.
Thanks.
One option would be to use BULK INSERT and a temp table. Once in the temp table, you can parse the values. This is likely not the exact answer you need, but based on your experience, I'm sure you could tweak as needed.
Thanks...
SET NOCOUNT ON;
USE Your_DB;
GO
CREATE TABLE dbo.t (
i int,
n varchar(10),
d decimal(18,4),
dt datetime
);
GO
BULK INSERT dbo.t
FROM 'D:\import\data.txt'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n');
There are lots of approaches.
Mine would be to import the file to a table, do the comparison with a regular SQL query, and then delete the file-data table if you don't need it anymore.
Bulk import the data from the text file into a temporary table.
Execute the query to do the comparison between your actual physical table and the temporary table (a minimal sketch follows below).
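A sketch of those two steps, assuming one value per line in the file; the file path, table, and column names are placeholders, and BULK INSERT still needs bulk-load permissions plus a path the SQL Server service account can reach:
CREATE TABLE #FileValues (Value varchar(100));

BULK INSERT #FileValues
FROM 'D:\import\values.txt'
WITH (ROWTERMINATOR = '\n');

SELECT t.*
FROM dbo.YourTable AS t
WHERE t.YourColumn IN (SELECT Value FROM #FileValues);

DROP TABLE #FileValues;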

SSIS - Delete Existing Rows then Insert, Incomplete Result

I'm relatively new to SSIS and know that handling duplicates is an oft repeated question, so thank you in advance for reading through my wall of text and for any help with my complicated situation.
I have a small 18179-row table (we'll call it Destination) that needs to be updated with SSIS using a flat file. The 18179-row flat file I am testing contains only records that exist in Destination and have changed. Currently, I have a package that loads a staging table (we'll call it Stage) from the flat file, then moves to the Data Flow and Lookup described below.
This Data Flow takes Stage and does a Lookup (LKP_OrderID) from Stage against Destination, using the primary key OrderID, to see if the record exists.
If the OrderID does not exist in Destination, then it follows the New OrderID path and the record is inserted into Destination at DST_OLE_Dest.
Here is where I am having trouble: If the OrderID does exist in Destination, then it follows the Existing OrderID path. The CMD_Delete_Duplicates OLE DB Command executes:
DELETE d
FROM dbo.Destination d
INNER JOIN dbo.Stage s ON d.OrderID = s.OrderID
This should delete any records from Destination that exist in Stage. Then it should insert the updated version of those records from Stage at DST_OLE_Desti.
However, it seems to process the 18179 rows in 2 batches: in the first batch it processes 9972 rows.
Then, in the 2nd batch it processes the remaining 8207 rows. It displays that it inserted all 18179 rows to Destination, but I only end up with the last batch of 8207 rows in Destination.
I believe it deletes and inserts the 1st batch of 9972 rows, then runs the above delete from inner join SQL again for the 2nd batch of 8207 rows, inadvertently deleting the just-inserted 9972 rows and leaving me with the 8207.
I've found that maximizing DefaultBufferSize to 104857600 bytes and increasing DefaultBufferMaxRows in the Data Flow so that the package processes all 18179 rows at once correctly deletes and inserts all 18179, but once my data exceeds that 104857600-byte buffer size, this will again be an issue. I can also use the OLE DB Command transformation to run
DELETE FROM dbo.Destination WHERE OrderID = ?
This should pass OrderID from Stage and delete from Destination where there is a match, but this is time intensive and takes ~10 minutes for this small table. Are there any other solutions out there for this problem? How would I go about doing an Update rather than an Insert and Delete if that is a better option?
Yeah, you've got logic issues in there. Your OLE DB Command is firing that delete statement for EVERY row that flows through it.
Instead, you'd want to have that step be a precedent (Execute SQL Task) to the Data Flow. That would clear out the existing data in the target table before you begin loading it. Otherwise, you're going to back out the freshly loaded data, much as you've observed.
There are different approaches for handling this. If deletes work, then keep at it. Otherwise, people generally stage updates to a secondary table and then use an Execute SQL Task as a successor to the data flow task to perform a set-based update (a sketch of such an update follows).
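A sketch of that set-based update, run from an Execute SQL Task after the data flow has loaded Stage; the non-key column names are assumptions, so list whichever columns actually change:
UPDATE d
SET    d.SomeColumn    = s.SomeColumn,     -- assumed column names
       d.AnotherColumn = s.AnotherColumn
FROM   dbo.Destination AS d
       INNER JOIN dbo.Stage AS s ON d.OrderID = s.OrderID;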
You could use the Slowly Changing Dimension tool from the SSIS toolbox to update the rows (as opposed to a delete and re-insert). You only have 'Type 1' changes by the sounds of it, so you won't need to use the Historical Attributes Inserts Output.
It would automatically take care of both streams in your illustration: inserts and updates.

Retrieving the last inserted rows

I have a table that contains GUID and Name columns, and I need to retrieve the last inserted rows so I can load them into Table2.
But how would I find out the latest data in Table1? I seem to be lost on this; I have read similar posts posing the same question, but the answers don't seem to work for my situation.
I am using SQL Server 2008 and I upload my data using SSIS.
1 - One way to do this is with triggers. Check out my blog entry that shows how to copy data from one table to another on an insert.
Triggers to replicate data = http://craftydba.com/?p=1995
However, like most things in life, there is overhead with triggers. If you are bulk loading a ton of data via SSIS, this can add up.
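A minimal sketch of such a trigger for the table described in the question (the GUID and Name columns come from the question; Table1 and Table2 are placeholders):
CREATE TRIGGER trg_Table1_CopyInserted
ON dbo.Table1
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;
    -- copy the just-inserted rows into the second table
    INSERT INTO dbo.Table2 ([GUID], [Name])
    SELECT i.[GUID], i.[Name]
    FROM inserted AS i;
END;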
2 - Another way to do this is to add a modify date to your first table and modify your SSIS package.
ALTER TABLE [MyTable1]
ADD [ModifyDate] [datetime] NULL DEFAULT GETDATE();
Next, change your SSIS package. In the control flow, add an Execute SQL Task. Insert data from [MyTable1] into [MyTable2] using T-SQL.
INSERT INTO [MyTable2]
SELECT * FROM [MyTable1]
WHERE [ModifyDate] >= 'Start Date/Time Of Package';
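Rather than hard-coding the package start time, the same statement can be parameterized; this is a sketch assuming an OLE DB connection, where the ? placeholder is mapped to the System::StartTime variable on the Execute SQL Task's Parameter Mapping page:
INSERT INTO [MyTable2]
SELECT *
FROM [MyTable1]
WHERE [ModifyDate] >= ?;  -- ? mapped to System::StartTime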
Execute SQL Task =
http://technet.microsoft.com/en-us/library/ms141003.aspx
This will be quicker than a data flow or an OLE DB Command since you are working with the data on the server.

Why doesn't SSIS OLE DB Command transformation insert all the records into a table?

I have an SSIS package that takes data from tables in a SQL database and inserts (or updates existing rows) into a table in another database.
Here is my problem: after the lookup, I either insert or update the rows, but over half of the rows that go into the insert are not added to the table.
For the insert, I am using an OLE DB Command object in which I use an INSERT statement that I have tested. I have not found out why the package runs without any error notification but still does not insert all the rows into the table.
I have checked in SQL Profiler and it says the command was RPC:Completed, which I assume means it supposedly worked.
If I do the insert manually in SQL Server Management Studio with the data SQL Profiler gives me (the values it uses to execute the insert statement), it works. I have checked the data and everything seems fine (no illegal data in the rows that are not inserted).
I am totally lost as to how to fix this; does anyone have an idea?
Any specific reason to use OLE DB Command instead of OLE DB Destination to insert the records?
EDIT 1:
So, you are seeing x rows (say 100) sent from Lookup transformation match output to the OLE DB destination but only y rows (say 60) are being inserted. Is that correct? Instead of inserting into your actual destination table, try to insert into a dummy table to see if all the rows are being redirected correctly. You can create a dummy table by clicking on the New... button on the OLE DB destination. It will create a table for you matching the input columns. That might help to narrow down the issue.
EDIT 2:
What is the name of the table that you are trying to use? I don't think it matters; I am just curious whether the name is a reserved keyword. One other thing I can think of: are there any other processes that might trigger some action on your destination table (either from within the package or outside of it)? I suspect that some other process might be deleting the rows from the table.