Insertion into the Leaf table in SQL Server MDS is very slow - sql

I am using SSIS to move data from an existing database to an MDS database.
I am using the following control flow:
1. Truncate TableName_Leaf
2. Load data to stg
The second step has the following data flow:
1. Load data from the source database (around 90,000 records)
2. Apply a data conversion task to convert string datatypes to Unicode (as MDS only supports Unicode)
3. Specify TableName_Leaf as the OLE DB destination.
Steps 1 and 2 complete quickly, but the insertion into the Leaf table is extremely slow (it took 40 seconds to move 100 rows end to end, and around 6 minutes to move 1000 records).
I tried deleting extra constraints from the Leaf table, but that did not improve performance much either.
Is there any other, quicker or better way to insert data into MDS?

Using 'Table or view - fast load' as the data access mode in the OLE DB destination resolved the issue. I used a batch size of 1000 in my case and it worked fine.
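For completeness: in MDS, loading stg.TableName_Leaf only stages the members, so the staging batch still has to be processed afterwards (and the load itself needs to populate ImportType and a BatchTag). A minimal sketch, assuming the entity's auto-generated staging procedure follows the usual stg.udp_<EntityName>_Leaf naming; the version name and batch tag below are placeholders:

    -- Hedged sketch: process the MDS staging batch after the SSIS fast load completes.
    EXEC [stg].[udp_TableName_Leaf]
        @VersionName = N'VERSION_1',   -- target model version (placeholder)
        @LogFlag     = 1,              -- log transactions for the batch
        @BatchTag    = N'SSIS_LOAD_1'; -- must match the BatchTag written during the load

    -- Inspect the outcome; rows that failed validation keep a non-zero ErrorCode.
    SELECT ImportStatus_ID, ErrorCode, Code
    FROM [stg].[TableName_Leaf]
    WHERE BatchTag = N'SSIS_LOAD_1';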

Related

SSIS Incremental Load - 15 mins

I have two tables: the source table is on a linked server and the destination table is on another server.
I want my data load to happen in the following manner:
Every night I have a scheduled job that does a full dump, i.e. truncates the destination table and loads all the data from the source.
Every 15 minutes an incremental load runs, because data is ingested into the source every second and I need to replicate it to the destination.
For the incremental load I currently have scripts stored in a stored procedure, but in future we would like to implement this in SSIS.
The scripts work as follows:
I have an Inserted_Date column. I take the Max(Inserted_Date) at the destination, delete all destination rows that are greater than or equal to that value, and re-insert the matching rows from the source. This job runs every 15 minutes.
How to implement similar scenario in SSIS?
I have worked with SSIS using lookups and conditional splits on ID columns, but the tables I am working with have a lot of rows, so the lookup takes a lot of time and is not the right solution for my scenario.
Is there any way I can get the Max(Inserted_Date) logic into an SSIS solution too? My end goal is to remove the script-based approach and replicate it in SSIS.
The general control flow: an Execute SQL Task to capture Max(Inserted_Date) into a variable, an Execute SQL Task to delete the overlapping rows at the destination, then a data flow (or Execute SQL Task) to load the new rows; a sketch of the SQL is below.
There's plenty to go on here, but you may need to learn how to set variables from an Execute SQL Task and so on.
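A minimal sketch of the T-SQL behind that flow, assuming a hypothetical local table dbo.DestTable and a linked-server source [SourceServer].SourceDb.dbo.SourceTable. In the package, the first statement would run in an Execute SQL Task with a single-row result set mapped to a package variable, which is then passed to the later statements as a parameter:

    -- Hedged sketch of the 15-minute incremental step; all object names are placeholders.
    DECLARE @MaxInsertedDate DATETIME;

    -- Execute SQL Task 1: capture the destination's high-water mark.
    SELECT @MaxInsertedDate = MAX(Inserted_Date)
    FROM dbo.DestTable;

    -- Execute SQL Task 2: remove the overlapping tail at the destination.
    DELETE FROM dbo.DestTable
    WHERE Inserted_Date >= @MaxInsertedDate;

    -- Execute SQL Task 3 (or a data flow using this query as the OLE DB source):
    -- re-pull everything at or after the high-water mark from the linked server.
    INSERT INTO dbo.DestTable (Id, Payload, Inserted_Date)  -- illustrative column list
    SELECT Id, Payload, Inserted_Date
    FROM [SourceServer].SourceDb.dbo.SourceTable
    WHERE Inserted_Date >= @MaxInsertedDate;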

Database structure for avoiding data loss, deadlocks & poor performance

The image below shows my database structure.
I have 100+ sensors which send constant data to their respective machines.
I have 5-7 different machines, each with its own SQL Server Express database installed.
All the machines send their data to one central SERVER.
Every second, each machine sends 10 rows as bulk data to a server stored procedure.
PROBLEM: managing the large volume of data coming from every machine to a single server while avoiding deadlocks and performance delays.
Background Logic
Bulk data from the machines is stored in a temporary table,
and then, using that temp table, I loop through each record for processing.
Finally, sp_processed_filtered_data contains lots of inserts and updates, and there are nested SPs for processing that filtered data.
Current Logic:
Step 1:
Every machine sends data to the SP_Manage stored procedure as bulk data in XML format, which we convert into SQL table format (a sketch of this step follows below).
This is raw data, so we filter it.
Let's say that after filtering, 3 rows remain.
I now want to process each of those rows.
Step 2:
Since I have to process each row, I loop through the rows and send the data to SP_Process_Filetered_Data.
This SP contains complex logic.
I am looping through each record, and every machine sends data in parallel.
So I am afraid this will cause data loss or deadlocks.
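To make the description concrete, here is a minimal sketch of the Step 1 XML-to-table conversion; the payload shape, column names and filter predicate are all placeholders, and the real SP_Manage will differ:

    -- Hedged sketch of Step 1: shred the bulk XML into a temp table in one
    -- set-based statement. Payload shape and names are assumptions.
    CREATE PROCEDURE dbo.SP_Manage_Sketch
        @MachineId INT,
        @Payload   XML   -- e.g. <rows><row sensorId="" value="" readAt=""/></rows>
    AS
    BEGIN
        SET NOCOUNT ON;

        SELECT
            @MachineId                          AS MachineId,
            r.value('@sensorId', 'INT')         AS SensorId,
            r.value('@value', 'DECIMAL(18,4)')  AS SensorValue,
            r.value('@readAt', 'DATETIME')      AS ReadAt
        INTO #IncomingRows
        FROM @Payload.nodes('/rows/row') AS t(r);

        -- Filtering step (placeholder predicate).
        DELETE FROM #IncomingRows
        WHERE SensorValue IS NULL;

        -- The existing design then loops over #IncomingRows, calling
        -- SP_Process_Filetered_Data once per remaining row.
    END;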

SSIS. Inserting data from MS ACCESS into SQL Server 2014 Azure. Extremely Slow

I am new to SSIS. As part of my package I follow these steps:
1. Create the table
2. Open the Data Flow task
3. Connect to the Access database
4. Insert data into the SQL Server table that was created in step 1
I have just run this and found that the process is taking forever. I am only bringing in 3 columns with a total row count of 255,000:
Column A = INT
Column B = NVARCHAR (255)
Column C = NVARCHAR (255)
Yes, I have been lazy with the data conversion; however, with such a small number of records, I didn't think performance would be an issue at all.
After 10 minutes, only 3% of the data had been inserted. If I re-check the number of records in the table after about 10 seconds, the count only goes up by about 400 records.
I have other packages that import data from text files (much bigger) and they run in seconds, so I have a feeling this could be an MS Access issue.
If this is the case, are you aware whether I could use SSIS to trigger an MS Access job that could in turn export the data as CSV or text so that my SSIS package could pick it up? I don't want to manually open MS Access and run the job, as I am trying to get to as much of an automated solution as possible.
Thanks in advance
WOW! Figured it out...
Step 1: Went to my source connection and changed the data access mode from 'table or view' to 'SQL Command', then just wrote a select * from table name
Step 2: Went to destination and changed data access mode from 'table or view' to 'table or view - fast load'
Runs in seconds now

LDF file continues to grow very large during transaction phase - SQL Server 2005

We have a 6-step process where we copy tables from one database to another. Each step executes a stored procedure:
1. Remove tables from the destination database
2. Create tables in the destination database
3. Shrink the database log before the copy
4. Copy tables from source to destination
5. Shrink the database log
6. Back up the destination database
During step 4, our transaction log (LDF file) grows so large that we now have to keep increasing the max size on the SQL Server, and soon enough (in the far future) we believe it may eat up all the resources on our server. It was suggested that in our script we commit each transaction instead of waiting until the end to commit them all.
Any suggestions?
I'll make the assumption that you are moving large amounts of data. The typical solution to this problem is to break the copy up into batches of a smaller number of rows; this keeps the hit on the transaction log smaller. I think this will be the preferred answer.
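A minimal sketch of that batched copy, written against SQL Server 2005 syntax; the databases, tables, columns and key are placeholders:

    -- Hedged sketch: copy in fixed-size batches so each batch commits on its own
    -- and the per-transaction log hit stays small.
    DECLARE @BatchSize INT;
    DECLARE @Rows INT;
    SET @BatchSize = 50000;
    SET @Rows = 1;

    WHILE @Rows > 0
    BEGIN
        INSERT INTO DestDb.dbo.DestTable (Id, Col1, Col2)
        SELECT TOP (@BatchSize) s.Id, s.Col1, s.Col2
        FROM SourceDb.dbo.SourceTable AS s
        WHERE NOT EXISTS (SELECT 1 FROM DestDb.dbo.DestTable AS d WHERE d.Id = s.Id)
        ORDER BY s.Id;

        SET @Rows = @@ROWCOUNT;

        -- Under SIMPLE recovery the committed log space is reusable between batches;
        -- under FULL recovery it is frequent log backups that keep the LDF in check.
    END;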
The other answer I have seen is to use Bulk Copy, which writes the data out to a text file and imports it into your target DB using Bulk Copy. I've seen a lot of posts that recommend this; I haven't tried it.
If the schema of the target tables isn't changing, could you not just truncate the data in the target tables instead of dropping and recreating them?
Can you change the database recovery model to Bulk Logged for this process?
Then, instead of creating empty tables at the destination, do a SELECT INTO to create them. Once they are built, alter the tables to add indices and constraints. Doing bulk copies like this will greatly reduce your logging requirements.
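A minimal sketch of that approach; the database, table and constraint names are placeholders, and note that switching to BULK_LOGGED affects point-in-time restore for that log interval:

    -- Hedged sketch of the BULK_LOGGED + SELECT INTO approach.
    ALTER DATABASE DestDb SET RECOVERY BULK_LOGGED;

    -- SELECT INTO is minimally logged under BULK_LOGGED (and SIMPLE) recovery.
    SELECT *
    INTO DestDb.dbo.DestTable
    FROM SourceDb.dbo.SourceTable;

    -- Add indexes and constraints once the data is in place.
    ALTER TABLE DestDb.dbo.DestTable
        ADD CONSTRAINT PK_DestTable PRIMARY KEY CLUSTERED (Id);

    -- Switch back and take a backup afterwards to keep the log chain usable.
    ALTER DATABASE DestDb SET RECOVERY FULL;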

SSIS data import with resume

I need to push a large SQL table from my local instance to SQL Azure. The transfer is a simple, 'clean' upload - simply push the data into a new, empty table.
The table is extremely large (~100 million rows) and consists only of GUIDs and other simple types (no timestamp or anything).
I created an SSIS package using the Data Import / Export Wizard in SSMS. The package works great.
The problem is when the package is run over a slow or intermittent connection. If the internet connection goes down halfway through, then there is no way to 'resume' the transfer.
What is the best approach to engineering an SSIS package to upload this data in a resumable fashion, i.e. so it can recover from a connection failure, or be run only within specific time windows?
Normally, in a situation like that, I'd design the package to enumerate through batches of size N (1k rows, 10M rows, whatever) and log to a processing table what the last successfully transmitted batch was. However, with GUIDs you can't quite partition them out into buckets.
In this particular case, I would modify your data flow to look like Source -> Lookup -> Destination. In your lookup transformation, query the Azure side and only retrieve the keys (SELECT myGuid FROM myTable). Here, we're only going to be interested in rows that don't have a match in the lookup recordset as those are the ones pending transmission.
A full cache is going to cost about 1.5 GB of memory (100M rows * 16 bytes), assuming the Azure side were fully populated, plus the associated data transfer costs. That cost will still be less than truncating and re-transferring all the data, but I just want to make sure I call it out.
Just order by your GUID when uploading, and make sure you use the MAX(guid) from Azure as your starting point when recovering from a failure or restart.
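A minimal sketch of that restart logic, assuming hypothetical table and column names; the MAX lookup would run in an Execute SQL Task against Azure and feed an SSIS variable that parameterises the OLE DB source query:

    -- Hedged sketch; dbo.TargetTable, dbo.SourceTable and RowGuid are placeholders.

    -- Azure side: find the high-water mark of what has already arrived.
    -- If the target is still empty, substitute 00000000-0000-0000-0000-000000000000.
    SELECT MAX(RowGuid) AS LastUploadedGuid
    FROM dbo.TargetTable;

    -- Local side (OLE DB source query in the data flow): pull only the rows that
    -- have not been uploaded yet, in the same GUID order SQL Server uses for MAX().
    SELECT RowGuid, Col1, Col2
    FROM dbo.SourceTable
    WHERE RowGuid > ?          -- parameter mapped to the LastUploadedGuid variable
    ORDER BY RowGuid;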