SSIS custom transformation component memory usage - sql

I am moving large volumes of data from Oracle to SQL Server on a daily basis. The application using the Oracle system stores dates in non-standard formats, and these need to be converted to SQL Server dates.
I have created a custom SSIS transformation component to do this. My issue is that, when they run, the SSIS packages consume a huge amount of memory, often reaching multiple gigabytes, and the usage keeps ballooning while a package is running.
The issue is with the process "ISServerExec.exe" running on the server. Viewed in Task Manager, its memory usage increases constantly while the package runs. Several of these packages need to run at the same time each day, but with this memory ballooning the system can only manage two or three.
I have also followed numerous online examples, and they suffer from the same problem. One example is the simple component from Microsoft that converts a string to uppercase; with 6 input columns over 9 million rows, it consumed over 600 MB of RAM.
If I create similar (but simpler) transformations using derived columns, these consume less than 100 MB of memory.
I am using SQL Server 2012 SP2 (11.0.5058) and have tested on four separate machines running Windows 7 and Server 2008 R2, each with all updates installed through Windows Update. All programming is done in VB in Visual Studio 2013. Oracle connections use the Attunity source connector.
Is there a command you need to run at the end of ProcessInput to release memory, or is this level of memory usage expected?
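For context, the component's ProcessInput follows the usual row-by-row pattern; a stripped-down sketch (not the real code – the column indexes 0/1 and the date format are placeholders, and the real component resolves buffer column indexes in PreExecute) looks like this:

Imports System
Imports System.Globalization
Imports Microsoft.SqlServer.Dts.Pipeline

' Simplified sketch of the transformation, not the actual component.
' It reads a non-standard Oracle date string from one column and writes
' a parsed DateTime to another; no per-row state is cached anywhere.
<DtsPipelineComponent(ComponentType:=ComponentType.Transform, DisplayName:="OracleDateConvert")>
Public Class OracleDateConvert
    Inherits PipelineComponent

    Public Overrides Sub ProcessInput(ByVal inputID As Integer, ByVal buffer As PipelineBuffer)
        ' Walk each row in the incoming buffer and convert in place.
        While buffer.NextRow()
            If Not buffer.IsNull(0) Then
                Dim parsed As DateTime = DateTime.ParseExact(
                    buffer.GetString(0), "yyyyMMddHHmmss", CultureInfo.InvariantCulture)
                buffer.SetDateTime(1, parsed)
            End If
        End While
    End Sub
End Class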

Related

SQL database copied and updated

I have a main system that I use to add records to and run multiple routines on an SQL database using MS Access. Some routines take days to run.
I want to build a second PC whose copy of the database I can easily update, and then run the long routines there while continuing the day-to-day activities on the main system.
How easy (or feasible) is it to take a copy of a SQL database from one computer and update it on another?
Those processing times sound rather large. I would suggest building some server-side routines that run on SQL Server – they will run much faster and, more importantly, reduce if not eliminate most network traffic. Keep in mind that working with lots of data from Access against SQL Server can, and often will, run SLOWER than just using Access without SQL Server.
As for moving the SQL database to another computer? The concept is very much like moving an Access file to another computer. From SQL Server Management Studio on the first computer you simply create a backup file (it will be a single file – choose a device, then add a file location). You will also find that such .bak files zip REALLY well, so if you are transferring over FTP or the internet, zip the file before you transmit it.
Regardless, you can even transfer that .bak file with a jump drive or whatever. You then restore the database on the other computer – and you are off to the races (on the target computer, simply choose "restore" and restore the database from the .bak file you transferred to the computer running SQL Server). So the whole database, with its many tables etc., is turned into a single file – much like an Access file is.
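If you prefer to script it rather than click through the UI, the T-SQL equivalent is just a BACKUP on the source machine and a RESTORE on the target (database name, logical file names and paths below are placeholders):

-- On the main system: write the whole database to a single .bak file
BACKUP DATABASE MyDatabase
TO DISK = N'C:\Backups\MyDatabase.bak'
WITH INIT;

-- On the second PC, after copying MyDatabase.bak across:
RESTORE DATABASE MyDatabase
FROM DISK = N'C:\Backups\MyDatabase.bak'
WITH MOVE 'MyDatabase' TO N'C:\Data\MyDatabase.mdf',
     MOVE 'MyDatabase_log' TO N'C:\Data\MyDatabase_log.ldf',
     REPLACE;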
So moving and making copies of a SQL database is a common task, and once you get the hang of the process, it is not really much harder than moving an Access file between computers.
I would, however, question why the processes are taking so long. They may well be complex, but the use of stored procedures and pass-through queries would substantially speed up your application compared with processing the data in a local client like Access + VBA. So try adopting more T-SQL and stored procedures – they will run "many" times faster; often 1 hour can be cut down to, say, 1 minute or less. Moving the database is easy, but you might cut that whole "day" of processing down to a few minutes if you can run the routines server side rather than client side, which is what happens when Access is the client. (The Access client can certainly call T-SQL routines that run server side – the main trick is to get those processing routines running server side.)
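To give a rough idea (the table and column names below are invented), a routine that loops over a recordset in VBA can often be replaced by one set-based statement wrapped in a stored procedure, which Access then calls through a pass-through query:

-- One set-based UPDATE replaces a row-by-row VBA loop
CREATE PROCEDURE dbo.usp_RecalcAccountBalances
AS
BEGIN
    SET NOCOUNT ON;
    UPDATE a
       SET a.Balance = t.TotalAmount
      FROM dbo.Accounts AS a
      JOIN (SELECT AccountID, SUM(Amount) AS TotalAmount
              FROM dbo.Transactions
             GROUP BY AccountID) AS t
        ON t.AccountID = a.AccountID;
END;

-- The Access pass-through query then contains nothing but:
--   EXEC dbo.usp_RecalcAccountBalances;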
The last time I used MS Access it was just a file, like an Excel file, which you could simply copy to the other machine and do whatever you want with.
If you are fairly comfortable with SQL and administration, and only need a one-way copy (main system to second PC), then you could migrate MS Access to MySQL:
http://www.kitebird.com/articles/access-migrate.html
(Telling Microsoft Access to Export Its Own Tables)
This process should be fairly easy to automate if you need to do this regularly.

UnixODBC driver issues for .MDF databases. OR: is there a way to easily extract a bunch of tables without an SQL server?

Disclaimer: I am somewhat of a n00b when it comes to database programming, so bear with me.
I've been attempting to batch process a rather large amount (~20 GB) of data, all contained in .MDF SQL database files. The files contain meteorological data obtained through weather balloons, with each table consisting of ~1-second observations of winds, pressure, height, temperature, etc., and are created with our radiosonde tracking software on an unnetworked Windows machine. It is possible (and quite easy) to load the files using the associated software and export the tables as ASCII text files...however, this process involves manually loading each one. As I'm performing a study that requires as many soundings as possible (we have over 2000), doing this over and over for several years of twice-daily observations is extremely time-prohibitive.
I've been taking the files off of the computer and putting them on my laptop running Linux Mint, and consider myself to be fluent with Perl...I do most of my data analysis with Perl scripts. That said, I've had the darndest time trying to get into the database files!
I've tried to connect to one of the files using the DBI package using variants on
$dbh = DBI->connect("DBI:ODBC:$filename") or die "blahblahblah";
I have unixODBC installed and configured, have downloaded "libmyodbc.so" and "libodbcmyS.so", and keep getting the error
DBI connect('','',...) failed: [unixODBC][Driver Manager]Data source name not found, and no default driver specified (SQL-IM002) at dumpsql.pl line 6.
I've tried remedying this a number of ways over the past couple days, and I won't post them here for the sake of brevity. My odbcinst.ini file is as follows:
[MySQL]
Description = ODBC for MySQL
Driver = /usr/lib/x86_64-linux-gnu/odbc/libmyodbc.so
Setup = /usr/sib/x86_64-linux-gnu/odbc/libodbcmyS.so
FileUsage = 1
I'm seriously confused. I THINK I'm doing everything that various online tutorials are suggesting, but everyone else is connecting to servers and these files are all local and in the same directory! Could anyone attempt to point me in the right direction? All I want is to calculate meteorological values using vertical sounding data! Am I missing something totally obvious?
Any help would be greatly appreciated!
It seems the original database server was a Microsoft SQL Server (MDF files). I am afraid these files alone are useless on a Linux machine. You need a Microsoft SQL Server on a Windows machine to get access to the contained data.
You described that you are able to attach an MDF file to a SQL Server manually and then export the needed data as text files. Try to automate that. I'm not an MS SQL Server expert, but it should be possible.
For example, here is a tutorial on attaching and detaching an MDF file via T-SQL. My approach would be to write a script that iterates over the 2000 MDF files and, for each one, attaches it to the SQL Server, executes a query to export your data, and then detaches the MDF.
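A rough per-file sketch (database name, file path, table name and the export mechanism are placeholders, and FOR ATTACH_REBUILD_LOG assumes the log file is not available) could look like this, driven by an outer script that loops over the 2000 files:

-- Attach one sounding file under a temporary database name
CREATE DATABASE SoundingTmp
ON (FILENAME = N'C:\Soundings\sounding_20140101.mdf')
FOR ATTACH_REBUILD_LOG;
GO

-- Export the table(s) you need, e.g. from a command prompt with bcp:
--   bcp "SELECT * FROM SoundingTmp.dbo.Observations" queryout sounding_20140101.txt -c -T -S .\SQLEXPRESS
GO

-- Detach it again so the next file can be processed
EXEC sp_detach_db N'SoundingTmp';
GO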

Reason why data returned is different. SSMS 2012

I'm pretty new when it comes to memory issues with SQL. Our company currently has SSMS 2012 (we updated from SSMS 2008 R2).
I'm running a procedure in SSMS 2012 that has two cursors. Within the cursors, I have a dynamic SQL statement that is grabbing certain data from tables. When the procedure is done, I notice that the data grabbed from those tables is correct in certain instances and incorrect in others. This changes with every run (final result is VERY large). I know my code is correct because I have tested it with less data and my coworker has checked it as well.
I did just notice that the 2008 R2 version has a recommended RAM of 2.048 GB or more, while the 2012 version recommends at least 4 GB. Our company currently has 4 GB of RAM on our server (blame IT). Could this be a reason why one run gives me correct data and another run doesn't?
Any sort of explanation would be helpful as I am pretty new to this stuff.

DQS SSIS package hangs in VS2010

I have a package that is used for DQS cleansing. I have nearly 650,000 records to clean; however, after about 350,000 records have been processed I get a symptom that hangs my project up. For example, I will close my Visual Studio project/solution, but when I try to reopen the project I get the message "Visual Studio is waiting for an internal operation" in the lower right-hand corner. Once this happens I can't click or scroll anywhere in my project.
I am using SQL Server 2012 to move data from one table to another, but into a different database within the same SQL 2012 server/instance. In addition, I'm using the DQS client to clean and validate data for last name, state and country. My Visual Studio version is 2010. I'm running this all on a virtual machine that has 8 GB of RAM and 4 cores. I do have the cumulative service pack installed for SQL 2012.
At this point I have to kill VS2010 in Task Manager, but then I can't seem to work in my SSIS project any more. I have to delete all the records in my destination table before I can get into my project once more.
Thanks for any help or ideas,
Michael
DQS Cleansing is a VERY resource-intensive task. According to the Data Quality Services Performance Best Practices Guide, even when adhering to the hardware recommendations and best practices, DQS cleansing of 1 million rows can take between 2 and 3.5 hours.
Also, I agree with Pondlife's comments about running in BIDS vs. DTEXEC. BIDS/SSDT is 32-bit (limiting memory to 2–3 GB), while DTEXEC has a 64-bit version which can use far more memory.
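If the package itself is fine, one simple test is to run it outside the designer with the 64-bit dtexec that ships with SQL Server 2012 (the executable path below is the default install location; the package path is just an example):

"C:\Program Files\Microsoft SQL Server\110\DTS\Binn\DTExec.exe" /FILE "C:\SSIS\DQSCleansePackage.dtsx"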

What are my options - sql express or?

I have a client who has a VB 6.0 application with MS Access as the backend, but Access is now unable to take the load, so we are considering a shift to SQL Server Express Edition. We will also convert the VB 6.0 application to C#-based WinForms. My questions:
1) Can SQL Server Express support 10 users concurrently? If not SQL Express, then what other options are available?
2) Should I first convert the VB 6.0 application to C#? If I transfer the data to SQL Server, will the VB 6.0 application continue to work?
thanks
Yes, it can.
You don't need to convert your app, but Access and SQL Express are different database engines, so you will need to adapt your app to SQL Express.
Note that SQL Express prior to 2008 R2 can handle databases up to 4 GB, while 2008 R2 can handle up to 10 GB per database.
1) SQL Express allows over 32 thousand simultaneous users. The only real limit is database size, which is 10 gigabytes.
2) You'll need to at least modify the VB 6 application to use the correct connection string before it will work with SQL Server.
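For illustration only (server, instance and database names are placeholders), an ADO/OLE DB connection string pointed at SQL Server typically looks something like this:

Provider=SQLOLEDB;Data Source=MYSERVER\SQLEXPRESS;Initial Catalog=MyDatabase;Integrated Security=SSPI;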
I am curious though why you say that Access (the JET database engine) is unable to take the load. Usually 20 or even more simultaneous users are no problem.
If the product is for in-house use and doesn't generate cash, you can use Oracle. It's free to use unless your app is for commercial use.
One question to ask is: after how many users does the system slow down? If it is slow with one user, then this might be a software design issue and not necessarily the "load" on the server. There is also the issue of the type of connection, WAN or LAN. You can read about this issue in the following article:
http://www.kallal.ca//Wan/Wans.html
The above in a nutshell explains why the Access data engine does not work well on a WAN.
Also, migrating the data to SQL Server without determining what particular issue is causing the slowdown could very well result in a further slowdown. In other words, just up-sizing data to a server database engine often will not solve performance issues, and in some cases it can make them worse.
In fact, in many online Access forums we often see users complain of a slowdown when moving the back-end data file from Access to SQL Server. So just moving to SQL Server without taking advantage of SQL Server features does not always guarantee a performance boost.
The other issue you want to determine here is whether the VB6 program uses ADO or DAO. Either data object model is fine, but ADO would suggest LESS code will have to be modified than if the application is based on DAO.
Another issue is that you have not mentioned how large the tables are, or how many there are. Say 30 to 50 highly related tables, with a small number of rows (say 200,000) in some of them, should run just fine with 5 to 15 users. If your user count is only about 10 and your table row counts are small as noted, then performance should be OK; if it is not, then as noted you might be able to keep the application as is, and moving the data to SQL Server may not yield performance gains without further code modifications. And of course SOME code will have to be modified to work with SQL Server – how much will depend on the data object model used and how much code there is overall (more recordset code = more chance of needing code changes).
If you do decide to convert from Access to SQL Server Express, there is a migration wizard which can give you a quick start with that process. Here's the link