SSIS Designer and splitting Packages

SSIS Designer and splitting Packages - sql

I have a large solution which is currently in 1 big package. I've started to split the package into smaller packages but have noticed that more memory is being used on the SQL Server when running the solution.
Has anyone else seen this when using multiple packages?

SQL Server, sqlservr.exe, is generally run as a service and as you use it, it's going to continue consuming memory until it's sucked it all and windows forces it back down. That's by design as the database performs best when it has as much data as possible in memory vs having to read from disk. You should configure your SQL Server instance to have a maximum memory so that the OS can have room to breathe. How much memory should you reserve?
SQL Server Integration Services, SSIS, runs in its own address space - even though you can launch it from SQL Server you'll see it, dtexec.exe, handles asking for memory and in the event the process crashes, it does not bring down SQL Server. This is a very good thing, the separation. From a practical perspective, it means that if you are going to run SSIS packages on a machine, you need to leave enough memory for SSIS to run and guess what, SSIS is a blazing fast in-memory ETL solution. As much as it can, an SSIS data flow task is going to hold data in its memory so it can manipulate it (change data type, lookup, etc) in one big pass before writing it to destination as the IO is the most expensive part of ETL.
But, as you're developing these packages, you're running them from Visual Studio, devenv.exe. VS/SSDT needs memory to do its thing. And hey, when you run an SSIS package from Visual Studio, that's wrapped in the debugger call (can't recall the process name) and that too sucks memory to be able to provide debugging capability.
Sadly, a four gigabyte allocation of RAM for a developer machine is insufficient. And if this is a server, the licensing cost alone dwarfs what it'd cost to max that box out on memory.
Were it me, I'd cap SQL Server about 1.5 GB. Under a gig is usually not enough for SQL Server to do much of anything. Assume that Visual Studio and the debugger are going to be good for about 2 gigs when things get hot and heavy. That leaves .5 gigs reserved to the OS (and Outlook, Excel, Windows Explorer, Web browser pointed at StackOverflow and MSDN documentation and crap, we're out of memory)
To address memory usage by SSIS. I would think but have not tested, that 1 package with 10 data flow connected serially versus 10 packages with 1 data flow each, the monolithic package would consume more memory as it will validate all the data flows when it starts. Yes, there is startup overhead that is shared by the monolithic approach which will be allocated for each individual package but I can't imagine it will be of any significance. Plus, that memory is returned to the OS once the dtexec process completes. It is not like SQL Server which will hold on to the memory until the process cycles.

Related

Unable to control memory utilization on server increasing with ruby process

I have a Rails 3.2 application running on production server. The server has 8 GB of RAM and every other process works fine. But, there is a ruby process which keeps the memory utilization on the higher side. I have to manually login to the server console and type the TOP command and kill the process using the PID.
But, I am unable to figure out how to check which ruby process is taking so much of memory and also how to control it permanently.
Please suggest me a solution.
Thanks.

Could be so many things. Finding memory leaks is tough. What kind of application server are you using? If you're using Unicorn consider checking out Puma. It's actually really easy to switch over. We saw big gains in our app when we switched to Puma.
Also look through your app for n+1 queries. Optimizing some queries here and there would help tremendously.
Another thing you could consider trying is moving some longer running tasks to a background job with something like sidekiq.
Lots of performance monitoring services out there, like New Relic, that you could check out as well. Without more info it's a tough question to answer.

Write time to hard drive

I realize this number will change based on many factors, but in general, when I write data to a hard-drive (e.g. copy a file), how long does it take for that data to actually be written to the platter after Windows says the copy is done?
Could anyone point me in the right direction to discover more on this topic?

If you are looking for a hard number, that is pretty much unknowable. Generally it is the order of a tens to a few hundred milliseconds for the data to start reaching the disk platters, but can be as high as several seconds in a large server disk array with RAID and de-duplication.
The flow of events goes something like this.
The application calls a function like fwrite().
This call is handled by the filesystem layer in your Operating System, which has to figure out what specific disk sectors are to be manipulated.
The SATA/IDE driver in your OS will talk to the hard drive controller hardware. On a modern PC, it typically uses DMA to feed the data to the disk.
The data sits in a write cache inside the hard disk (RAM).
When the physical platters and heads have made it into position, it will begin to transfer the contents of cache onto the platters.
Steps 3-6 may repeat several times depending on how much data is to be written, where on the disk it is to be written. Additionally, there is usually filesystem metadata that must be updated (e.g. free space counters), which will trigger more writes to the disk.
The time it takes from steps 1-3 can be unpredictable in a general purpose OS like Windows due to task scheduling, background threads, and your disk write is probably queued up with a few dozen other processes. I'd say it is usually on the order of 10-100msec on a typical PC. If you go to the Windows Resource Monitor and click the Disk tab, you can get an idea of the average disk queue length. You can use the Performance Monitor to produce more finely-controlled graphs.
Steps 3-4 are largely controlled by the disk controller and disk interface (SATA, SAS, etc). In the server world, you can be talking about a SAN with FC or iSCSI network switches, which impose their own latencies.
Step 5 will be controlled by they physical performance of the disk. Many consumer-grade HDD manufacturers do not post average seek times anymore, but 10-20msec is common.
Interesting detail about Step 5: Some HDDs lie about flushing their write cache to get better benchmark scores.
Step 6 will depend on your filesystem and how much data you are writing.

You are right that there can be a delay between Windows indicating that data writing is finished and the last data actually written. Things to consider are:
Device Manager, Disk Drive, Properties, Policies - Options for disabling Write Caching.
You might be better off using Direct I/O so that Windows does not save it temporarily in File Cache.
If your program writes the data, you can log what has been copied.
If you are sending the data over a network, you are likely to have no control of when the remote system has finished.
To see what is happening, you can set up Perfmon logging. One of my examples of monitoring:
http://www.roylongbottom.org.uk/monitor1.htm#anchor2

SQL server query response time influenced by "outside" processes?

Can the performance (response time) of a query executed in a DBMS like SQL Server be influenced by whatever it's happening on the machine on which the server runs? To be more specific, is the response time expected to increase when running a couple of Windows processes that continuously check and clean the machine, and process data received from the network?
Thanks.

The four key resources for any program are available memory, processors, disk space, and disk usage.
Let's investigate each of these in turn. Available memory is well-managed in SQL Server (see here). The default behavior is to start with a bunch of memory and then increase it as necessary. If your query load is not changing, then SQL Server should hit a maximum amount of memory and stop growing. It sounds like your query load is consistent over time, so memory would not be a big issue. Also, many configurations of SQL server fix the memory size to avoid interference with other processors.
Processing power. This can be a big one. SQL Server requires processing power. The processors may be used by other Windows processes. This would slow down queries, particularly those that are processing (as opposed to I/O) constrained. However, this might be mitigated on a multi-processor machine. A given SQL Server instance might be assigned a certain number of processors. The rest could be used for Windows.
Disk space. This has little impact. In general, either disk that is needed is available or it is not (and the query needing it fails). One exception is temporary disk space, the availability of which can influence query execution plans. Often, temporary space is put on its own drive to avoid needless conflict with other processes.
I/O bandwidth. SQL Server needs to communicate through the file system to disk. This can be a real performance drag, and it can occur at multiple different levels. The operating system itself could be saturated with I/O calls, slowing down the database. The network between the CPUs and the disks could be saturated, slowing down reading and writing speeds. The disk system itself can be slow because of multiple concurrent actions -- even from different servers. And this can get all the more complicated in a virtual environment.
The answer is "yes". Windows processes can affect the performance of SQL queries. My best guess is that the affect would be either in terms of eating up processors or using up disk bandwidth.

Yes, the database server is sharing resources with anything else that runs on the machine, so any resource intensive process could affect the database performance noticeably.
One important resource is the memory. The SQL Server will by default use up all free memory if it has any use for it. If you are running any other processes on the server, you should limit the memory use of SQL Server so that it allocates a bit less, to allow room for the other proccesses, to reduce memory swapping.

What is the performance hit for using WCF Performance Counters (performanceCounters = "ALL")?

Does anyone have experience with using the WCF Performance Counters in a production system and running into any performance issues? I suspect if you are monitoring all Service, Endpoints, and Operations and log all counters to a file, sampling every second, then this is the worst case scenario. From what I gather, the hit comes when you actually sample, not when the counters are turned on. Any real-life experience out there using them in production?

I can't answer for the WCF ones in detail, but Performance Counters in general work by writing values to some shared memory all the time. So WCF always writes values to a memory mapped file or a shared section in a dll.
When the perfmon application wants to display them, it loads the shared memory and reads from it. That's not a performance hit particularly.
The problem comes when you want to do something with that counter data, like write it to a file or update a graph. That's when the performance starts to be noticeable. This goes double if the reader is running across the network.

Best Dual HD Set up for Development

I've got a machine I'm going to be using for development, and it has two 7200 RPM 160 GB SATA HDs in it.
The information I've found on the net so far seems to be a bit conflicted about which things (OS, Swap files, Programs, Solution/Source code/Other data) I should be installing on how many partitions on which drives to get the most benefit from this situation.
Some people suggest having a separate partition for the OS and/or Swap, some don't bother. Some people say the programs should be on the same physical drive as the OS with the data on the other, some the other way around. Same with the Swap and the OS.
I'm going to be installing Vista 64 bit as my OS and regularly using Visual Studio 2008, VMWare Workstation, SQL Server management studio, etc (pretty standard dev tools).
So I'm asking you--how would you do it?

If the drives support RAID configurations in your BIOS, you should do one of the following:
RAID 1 (Mirror) - Since this is a dev machine this will give you the fault tolerance and peace of mind that your code is safe (and the environment since they are such a pain to put together). You get better performance on reads because it can read from both/either drive. You don't get any performance boost on writes though.
RAID 0 - No fault tolerance here, but this is the fastest configuration because you read and write off both drives. Great if you just want as fast as possible performance and you know your code is safe elsewhere (source control) anyway.
Don't worry about mutiple partitions or OS/Data configs because on a dev machine you sort of need it all anyway and you shouldn't be running heavy multi-user databases or anything anyway (like a server).
If your BIOS doesn't support RAID configurations, however, then you might consider doing the OS/Data split over the two drives just to balance out their use (but as you mentioned, keep the programs on the system drive because it will help with caching). Up to you where to put the swap file (OS will give you dump files, but the data drive is probably less utilized).

If they're both going through the same disk controller, there's not going to be much difference performance-wise no matter which way you do it; if you're going to be doing lots of VM's, I would split one drive for OS and swap / Programs and Data, then keep all the VM's on the other drive.
Having all the VM's on an independant drive would let you move that drive to another machine seamlessly if the host fails, or if you upgrade.

Mark one drive as being your warehouse, put all of your source code, data, assets, etc. on there and back it up regularly. You'll want this to be stable and easy to recover. You can even switch My Documents to live here if wanted.
The other drive should contain the OS, drivers, and all applications. This makes it easy and secure to wipe the drive and reinstall the OS every 18-24 months as you tend to have to do with Windows.
If you want to improve performance, some say put the swap on the warehouse drive. This will increase OS performance, but will decrease the life of the drive.
In reality it all depends on your goals. If you need more performance then you even out the activity level. If you need more security then you use RAID and mirror it. My mix provides for easy maintenance with a reasonable level of data security and minimal bit rot problems.
Your most active files will be the registry, page file, and running applications. If you're doing lots of data crunching then those files will be very active as well.

I would suggest if 160gb total capacity will cover your needs (plenty of space for OS, Applications and source code, just depends on what else you plan to put on it), then you should mirror the drives in a RAID 1 unless you will have a server that data is backed up to, an external hard drive, an online backup solution, or some other means of keeping a copy of data on more then one physical drive.
If you need to use all of the drive capacity, I would suggest using the first drive for OS and Applications and second drive for data. Purely for the fact of, if you change computers at some point, the OS on the first drive doesn't do you much good and most Applications would have to be reinstalled, but you could take the entire data drive with you.
As for dividing off the OS, a big downfall of this is not giving the partition enough space and eventually you may need to use partitioning software to steal some space from the other partition on the drive. It never seems to fail that you allocate a certain amount of space for the OS partition, right after install you have several gigs free space so you think you are fine, but as time goes by, things build up on that partition and you run out of space.
With that in mind, I still typically do use an OS partition as it is useful when reloading a system, you can format that partition blowing away the OS but keep the rest of your data. Ways to keep the space build up from happening too fast is change the location of your my documents folder, change environment variables for items such as temp and tmp. However, there are some things that just refuse to put their data anywhere besides on the system partition. I used to use 10gb, these days I go for 20gb.
Dividing your swap space can be useful for keeping drive fragmentation down when letting your swap file grow and shrink as needed. Again this is an issue though of guessing how much swap you need. This will depend a lot on the amount of memory you have and how much stuff you will be running at one time.

For the posters suggesting RAID - it's probably OK at 160GB, but I'd hesitate for anything larger. Soft errors in the drives reduce the overall reliability of the RAID. See these articles for the details:
http://alumnit.ca/~apenwarr/log/?m=200809#08
http://permabit.wordpress.com/2008/08/20/are-fibre-channel-and-scsi-drives-more-reliable/
You can't believe everything you read on the internet, but the reasoning makes sense to me.
Sorry I wasn't actually able to answer your question.

I usually run a box with two drives. One for the OS, swap, typical programs and applications, and one for VMs, "big" apps (e.g., Adobe CS suite, anything that hits the disk a lot on startup, basically).
But I also run a cheap fileserver (just an old machine with a coupla hundred gigs of disk space in RAID1), that I use to store anything related to my various projects. I find this is a much nicer solution than storing everything on my main dev box, doesn't cost much, gives me somewhere to run a webserver, my personal version control, etc.
Although I admit, it really isn't doing much I couldn't do on my machine. I find it's a nice solution as it helps prevent me from spreading stuff around my workstation's filesystem at random by forcing me to keep all my work in one place where it can be easily backed up, copied elsewhere, etc. I can leave it on all night without huge power bills (it uses <50W under load) so it can back itself up to a remote site with a little script, I can connect to it from outside via SSH (so I can always SCP anything I need).
But really the most important benefit is that I store nothing of any value on my workstation box (at least nothing that isn't also on the server). That means if it breaks, or if I want to use my laptop, etc. everything is always accessible.

I would put the OS and all the applications on the first disk (1 partition). Then, put the data from the SQL server (and any other overflow data) on the second disk (1 partition). This is how I'd set up a machine without any other details about what you're building. Also make sure you have a backup so you don't lose work. It might even be worth it to mirror the two drives (if you have RAID capability) so you don't lose any progress if/when one of them fails. Also, backup to an external disk daily. The RAID won't save you when you accidentally delete the wrong thing.

In general I'd try to split up things that are going to be doing a lot of I/O (such as if you have autosave on VS going off fairly frequently) Think of it as sort of I/O multithreading

I've observed significant speedups by putting my virtual machines on a separate disk. Whenever Windows is doing something stupid in the VM (e.g., indexing yet again), it doesn't thrash my Mac's disk quite so badly.
Another issue is that many tools (Visual Studio comes to mind) break in frustrating ways when bits of them are on the non-primary disk.
Use your second disk for big random things.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas