Seems crazy to be doing this at this late date, but...
I am rebuilding some ETL infrastructure with a Rocket Software UniVerse source and an SQL Server destination. The old destination platform was SQL Server 2000 on Windows Server 2003; the new platform is SQL Server 2012 on Windows Server 2012. In both cases, an ODBC driver is used to connect to the source. Everything seems to work fine on the new platform, but package execution is dramatically slower. For example, one table with roughly 1.3 million rows and 28 columns takes about an hour using SQL 2000/DTS and over 3.5 hours using SQL 2012/SSIS. Both SQL Servers are virtualized on XenServer; the 2012 server has more RAM and more vCPUs, and neither machine has an advantage in disk infrastructure. No metrics (memory, disk IO, etc.) are red-lining (or really even coming close) on the 2012 server during package execution.
I have read several forum posts describing the same scenario, but none really seemed to have a solution that works for me. Since all of these posts were quite dated (most of these conversions from DTS to SSIS happened in the SQL 2005 days), I was curious if there was any fresher info out there.
The packages are very simple table copies, no transforms. I am using a "SELECT column, column,.. FROM sourcetable" for my source connection and 'Table or View - Fast Load' for my destination. The slowdown APPEARS to be on the source side of the equation, though I can't be certain.
Any help appreciated.
One option to investigate is lowering the buffer size in your data flow. By default, it's set at 10k rows. If you have a slow data source, it can take quite a while to fill up the "bucket" of data just to start sending a batch of information down to the destination. While it might seem counterintuitive, lowering that number can increase performance, as 5k, 1k, or even 100 rows of data fill the bucket much sooner. That data then gets shuffled through the data flow and lands in the destination while buckets 2, 3, etc. are being filled.
If you have a SQL Server source, you can optimize your query by hinting that you'd like the first N rows fast, with N aligned to your SSIS package's buffer row count.
See Rob Farley's article for more details about that.
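For illustration, a sketch of what that hint looks like (the table and column names are placeholders, and 10,000 simply matches the default buffer row count):

-- Hypothetical source query; align the FAST value with the data flow's buffer row count.
SELECT column1, column2, column3
FROM dbo.sourcetable
OPTION (FAST 10000);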
Related
I just migrated a production DB onto some new hardware (in a sandbox setting) because they were suffering from poor IO performance.
Testing a simple select * from TABLE query on one of the large tables (123m rows, 24 columns) takes about 20 minutes. I can see that the query maxes out a single core on the SQL server, but memory consumption and disk IO are non-existent.
In the resource monitor there are 0 waits, other than Network I/O which is at 700-800.
The query is being run from a local install of MSSQL Mgmt studio.
Data file I/O is 0 in the activity monitor.
Wait time in the query is about double the active CPU time.
I am not sure if this is a problem that I need to solve, or that's just the way it works.
I was actually testing the speed of the query directly on the server vs. it being called by my users' app - to diagnose whether an ODBC driver might be holding things up, as reading from the database took 98% of the script's time.
Ran a SELECT * query and was expecting it to complete much faster than it did.
EDIT: It's SQL Server 2017.
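For reference, one way to see which wait types the session running that SELECT is accumulating (sys.dm_exec_session_wait_stats is available on SQL Server 2016 and later; 53 is a placeholder session_id):

-- Replace 53 with the session_id of the query window running the SELECT.
SELECT wait_type, waiting_tasks_count, wait_time_ms, signal_wait_time_ms
FROM sys.dm_exec_session_wait_stats
WHERE session_id = 53
ORDER BY wait_time_ms DESC;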
The SSISDB database has become huge and a clean-up is required, but nothing seems to work
The size of my SSISDB log data is currently huge (112 GB). When investigating why it reached such a size, I realized that the SSIS maintenance job was not migrated during the server migration. I tried to clean up the log data by using the built-in stored procedure [internal].[cleanup_server_retention_window] and setting the retention_window to 7 days (the database has not been cleaned up for more than 3 months).
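Roughly, that sequence looks like this (a sketch using the standard SSISDB catalog objects):

USE SSISDB;
GO
-- Lower the retention window to 7 days.
EXEC [catalog].[configure_catalog] @property_name = N'RETENTION_WINDOW', @property_value = 7;
GO
-- Run the built-in cleanup, which deletes entries older than the retention window.
EXEC [internal].[cleanup_server_retention_window];
GO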
However, the stored procedure does not seem to decrease the size. Instead it took very long (many hours) to complete and made some tables larger, e.g. [internal].[event_message_context]. Does that mean that while cleaning up/deleting the log entries, the stored procedure also inserts new log rows into the table?
Other options (see below) that I have found on the internet do not seem to work either. They took very long to complete and the size does not seem to decrease.
http://cryptoknight.org/index.php?/archives/1-SSIS-Maintenance-Script.html
I'm expecting to find a solution that can help me drastically reduce the size of my log data and keep only 3 days of retention.
P.S.: I'm allowed to disable the SQL Server Agent during the clean-up.
It seems there is a problem in SQL Server itself. For version 2017, Microsoft has released a fix in CU17. The issue is mentioned here:
Issue
You can download the latest CU using the following link.
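To check which build is currently installed (and therefore whether CU17 or later is already applied), something like this should do:

-- ProductUpdateLevel reports the installed CU (e.g. 'CU17'); ProductVersion gives the full build number.
SELECT SERVERPROPERTY('ProductVersion')     AS product_version,
       SERVERPROPERTY('ProductUpdateLevel') AS update_level;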
I am wondering how much time it takes to complete a database copy on SQL Azure. I am considering a scenario where:
a single database is populated first, and then stays read-only.
a set of copies are created.
an embarrassingly parallel task is run against each copy (read-only).
copies are deleted to lower hosting cost.
Such a scenario makes sense if the database copy on SQL Azure is reasonably fast.
Does anyone have information concerning the latency to complete a copy of a SQL Azure database, perhaps as a function of the database size in GB (assuming that smaller databases get copied faster than big ones)?
Subsidiary question: if 10 copies of the DB are triggered at the same time, will it take 10x longer to complete the 10th copy, or does SQL Azure support some level of parallelism for such an operation?
I have no empirical numbers for how long it takes to copy DBs of various sizes, but in my experience the time is usually minutes. For the DBs that I regularly work with, which are less than 100 MB, I allow 5 minutes, but this is probably quite generous. I've occasionally copied larger databases and it doesn't seem to take much longer than that; I suspect a lot of the time is actually spent provisioning the new database rather than copying data.
I'm only guessing at what would happen if you initiated multiple copies, but given the SQL Azure infrastructure I'd be surprised if there was much of a slowdown when several are started at the same time.
I don't know how long you want the whole process to take, but I think it's basically a good idea. I'd highly recommend doing some of your own benchmarking though.
I've found it particularly slow. I just copied a tiny 3.45MB database, and it took in excess of 5 minutes. It started 6:42, finished 6:49.
This was just using the T-SQL CREATE DATABASE ... AS COPY OF command. E.g.:
CREATE DATABASE NewDB AS COPY OF OldDB;
I'm not sure what the deal was - whenever I went to check progress, it wasn't showing anything, then suddenly it was just done.
E.g.:
SELECT * FROM sys.dm_database_copies AS c
JOIN sys.databases AS d ON c.database_id = d.database_id
WHERE d.name = 'NewDB';
The percent_complete column was null each time I looked. I was actually concerned I'd done something wrong...
I have a query that runs on a data warehouse. I ran the report last month. It gave me some results in say x minutes. The same report when run on the same database without any modifications to the database returns the same results but in y minutes now.
y > x, and the difference between the two times is large.
The amount of data and the indexes are also the same. There is no difference in them.
Now clients ask me for a reason for this. What are the possible reasons?
You leave a lot of questions open:
Is the database running on a dedicated server?
Do you run the reports from clients or directly on the server?
Have there been changes to the physical network? Have some settings been changed?
Did they (by accident) change the protocol used to communicate with the server (TCP, named pipes, ...)?
Have you tried defragmenting?
Have you rebooted the server?
Do you have an execution plan from before and after?
Most likely the query plan has changed. Some minor difference in data has pushed the query optimiser's calculations onto a new, less optimal plan.
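One way to check is to compare the execution plan from before and after. If no saved plan from the earlier run exists, you can at least pull whatever is currently cached for the query, roughly like this (the LIKE filter is a placeholder; substitute a distinctive fragment of the report's SQL text):

SELECT qs.creation_time,
       qs.execution_count,
       qs.total_elapsed_time / qs.execution_count AS avg_elapsed_microseconds,
       qp.query_plan
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp
WHERE st.text LIKE N'%distinctive fragment of the report query%';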
Here are a few possible reasons:
The amount of data in the warehouse has changed.
Indexes might have been modified.
Your warehouse is split across different servers and there is connectivity lag between them...
Your database server is processing something else as well, leaving less memory and CPU for your reports to run.
We're running our application's database on dedicated box running only SQL Server 2005.
This DB server has 32 GB of RAM... and the database file itself is only 6 GB.
I'd like to force several of the heavily read/queried tables into the SQL Memory buffer to increase speed.
I understand that SQL server is really good about keeping necessary data cached in memory once it's read from disk... But our clients would probably prefer their query running quickly the FIRST time.
"Fastest Performance the Second Time" isn't exactly a product highlight.
Short of the old "Pin Table" DBCC command.. any thoughts?
I've written a "CacheTableToSQLMemory" Proc which Loops through all of a table's Indexes (Clustered & Non) , performing a "Select *" into a Temp table. I've scheduled SQL Agent to run a "cache lots of tables" Proc every 15 minutes in an attempt to keep pages in Memory.
It works to a large extent... but even after I cache all of a query's relevant tables, running the query still increases the count of cached pages for that table, and then it's faster the second time.
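(For anyone wanting to reproduce that measurement, one way to count cached pages per table in the current database is below; this is a generic query against the buffer descriptors DMV, not part of the proc above.)

-- Count buffer pool pages per table in the current database.
SELECT OBJECT_NAME(p.object_id) AS table_name,
       COUNT(*)                 AS cached_pages
FROM sys.dm_os_buffer_descriptors AS bd
JOIN sys.allocation_units AS au
    ON bd.allocation_unit_id = au.allocation_unit_id
JOIN sys.partitions AS p
    ON (au.type IN (1, 3) AND au.container_id = p.hobt_id)
    OR (au.type = 2       AND au.container_id = p.partition_id)
WHERE bd.database_id = DB_ID()
GROUP BY p.object_id
ORDER BY cached_pages DESC;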
thoughts?
We're running PAE & AWE. SQL is set to use between 8 & 20 GB of RAM.
The x86 bottleneck is your real issue. AWE can serve only data pages, as they can be mapped in and out of the AWE region, but every other memory allocation has to fit into the 2 GB of the process virtual address space. That includes every thread stack, all the code, all the data currently mapped 'in use' from AWE and, most importantly, every single cached plan, execution plan, cached security token, cached metadata and so on and so forth. And I'm not even counting CLR; I hope you don't use it.
Given that the system has 32 GB of RAM, you can't even try /3GB to see if that helps, because /3GB caps PAE-addressable memory at 16 GB, which would make half your RAM invisible...
You really, really have to move to x64. AWE can only help so much. You could collect performance counters from the Buffer Manager and Memory Manager objects and monitor sys.dm_os_memory_clerks to get a better picture of how the instance's memory is behaving (where the memory in use goes, who is consuming it, etc.). I don't expect that will help you solve the issue itself, but I do expect it will give you enough information to make a case for the upgrade to x64.
There is no way to pin tables in memory in SQL Server 2005. If SQL Server is dropping the tables from memory, it's because there is memory pressure from other parts of the system. Since your database is only 6GB, the database should stay in memory... provided that there are no other databases on the server.
There are a few things you can do to try to keep data in memory, though. Depending on the patch level and edition of your SQL Server installation, you might be able to make use of the lock pages in memory functionality to ensure that SQL Server's memory never gets paged out.
You can also change the memory allocation on the server to be a fixed size. Unless there's something else on your database server, you can set SQL Server's min and max memory to the same value. This won't necessarily prevent this from happening in the future (it's a function of how SQL Server is supposed to work) but it certainly won't hurt to set your SQL Server to use a fixed amount of memory (if you have no other memory concerns).
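For example (the 20480 MB figure is purely illustrative; pick whatever fits the box and leave headroom for the OS):

-- Fix min and max server memory to the same value (here 20 GB).
EXEC sys.sp_configure N'show advanced options', 1;
RECONFIGURE;
EXEC sys.sp_configure N'min server memory (MB)', 20480;
EXEC sys.sp_configure N'max server memory (MB)', 20480;
RECONFIGURE;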