Is having multiple data/log files a good thing even on the same LUN? - sql

I have read that it is a good idea to have one file per CPU/CPU core so that SQL Server can stream data to and from the disks more efficiently. OK, I can see the benefit if they are on different spindles, but what if I only have one spindle (4 drives in RAID 10) for my data files (.mdf and .ndf)? Will I still benefit from splitting the data files (from just the .mdf file to an .mdf and several .ndf files)? The same goes for the log file, although I see no benefit there, as the data has to be written serially and you're limited by the spindle's sequential write speed...
FYI, this is in regards to SQL Server 2005/2008...
Thanks.

The recommendation for multiple tempdb data files is definitely not about IOPS. It is about contention on the allocation pages (GAM, SGAM, PFS) in tempdb. SQL 2005+ doesn't place as heavy a load on these pages, but contention still occurs. Not all systems require a 1 file to 1 core mapping; most systems will perform well with 1 file per 2 or 4 cores. Having too many files adds overhead for managing the files. A good recommendation is to start at 1:4 or 1:2 and increase the number of files if contention continues. Don't go above 1:1.
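As a hedged starting point for picking a ratio, the sketch below compares the server's logical CPU count with the tempdb data files currently configured (file size is stored in 8KB pages, so dividing by 128 gives MB):

-- Logical CPUs visible to SQL Server.
SELECT cpu_count FROM sys.dm_os_sys_info;
-- Current tempdb data (ROWS) files.
SELECT name, physical_name, size / 128 AS size_mb
FROM tempdb.sys.database_files
WHERE type_desc = 'ROWS';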
For other databases, this is not recommended.
And yes, only 1 log file ... always.

8 Steps to better Transaction Log throughput:
Create only ONE transaction log file. Even though you can create multiple transaction log files, you only need one... SQL Server does NOT "stripe" across multiple transaction log files. Instead, SQL Server uses the transaction log files sequentially.
Misconceptions around TF 1118:
Why is the trace flag not required so much in 2005 and 2008? In SQL Server 2005, my team changed the allocation system for tempdb to reduce the possibility of contention. There is now a cache of temp tables. When a new temp table is created on a cold system (just after startup) it uses the same mechanism as for SQL 2000. When it is dropped though, instead of all the pages being deallocated completely, one IAM page and one data page are left allocated, and the temp table is put into a special cache. Subsequent temp table creations will look in the cache to see if they can just grab a pre-created temp table 'off the shelf'. If so, this avoids accessing the allocation bitmaps completely. The temp table cache isn't huge (I think it's 32 tables), but this can still lead to a big drop in latch contention in tempdb.
So the answer is NO to both questions. Log striping was never an issue, and one-NDF-per-CPU is largely a myth, one that will take a very long time to die out. Multiple files IMHO make sense only if you can stripe IO (separate LUNs). Multiple filegroups, though, do make sense, but not for IO reasons: they help for administrative purposes such as piecemeal restores and read-only archive filegroups.
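As a hedged illustration of the filegroup point above, here is a minimal T-SQL sketch that adds an archive filegroup and marks it read-only; the database name, filegroup name, file name and path are hypothetical, not taken from the thread:

-- Hypothetical database 'Sales' with an archive filegroup for old, static data.
ALTER DATABASE Sales ADD FILEGROUP Archive;
ALTER DATABASE Sales
ADD FILE (NAME = Sales_Archive1, FILENAME = 'D:\Data\Sales_Archive1.ndf', SIZE = 10GB)
TO FILEGROUP Archive;
-- Once historical data has been moved onto this filegroup, it can be made
-- read-only, which simplifies backups and piecemeal restores.
ALTER DATABASE Sales MODIFY FILEGROUP Archive READ_ONLY;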

Still good. This is not about IOPS; it is about SQL Server blocking access to a file's allocation structures for certain operations, mostly when extents are allocated to a table / index. If you do a lot of inserts / updates, multiple files basically mean another thread can work in another file instead of waiting on the first one.
So, this is not really about IOPS load, it is about blocking behavior.
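A quick, hedged way to check whether this kind of contention is actually happening is to look for PAGELATCH waits on tempdb's allocation pages (resource_description is dbid:fileid:pageid; 2:1:1, 2:1:2 and 2:1:3 are the PFS, GAM and SGAM pages of tempdb's first data file):

-- Sessions currently waiting on page latches in tempdb (database_id = 2).
SELECT session_id, wait_type, wait_duration_ms, resource_description
FROM sys.dm_os_waiting_tasks
WHERE wait_type LIKE 'PAGELATCH%'
  AND resource_description LIKE '2:%';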

Related

Is SELECT retrieving data from the WAL files?

To my understanding, the database can postpone writing to the table files to boost IO performance. When a transaction is COMMITted, the data is written to the WAL files.
I'm curious how long the writing to the table files can be delayed. In particular, when I use a simple SELECT, e.g.
SELECT * from myTable;
after a COMMIT, is it possible that the database has to retrieve data from the WAL files in addition to the table files?
The documentation talks about being able to postpone the flushing of data pages:
If we follow this procedure, we do not need to flush data pages to disk on every transaction commit.
What WAL files allow an RDBMS to do is keep "dirty" data pages in memory and flush them to disk at a later time. It does not mean that the data pages themselves are modified at a later time.
So the answer to your question is "No, a SELECT is always retrieving data from the data pages, not from the WAL files".
PostgreSQL does not read from WAL for this purpose during normal operations. It would only read WAL in order to apply changes to the data files after a crash (or during replication onto another server).
When data from ordinary data files is changed, the pages of the data files are kept in shared memory (in shared_buffers) until they are written to disk. Any other processes wanting to see that data will find it in that shared memory, and it will be in its changed form. They will always look for it in shared_buffers before they try to read it from disk, so no one will ever see the stale on-disk version of the data, except for the recovery process after a crash.
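To make that flow concrete, here is a small sketch using the asker's myTable (the column and value are hypothetical); the comments describe what PostgreSQL reads and writes at each step:

BEGIN;
-- The page is modified in shared_buffers and a WAL record is generated.
UPDATE myTable SET some_col = 42;
-- COMMIT flushes the WAL record to disk; the data page may still be dirty in memory.
COMMIT;
-- The SELECT reads the up-to-date page from shared_buffers (or the table file),
-- never from the WAL files.
SELECT * FROM myTable;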

Optimize tempdb log files

We have a dedicated DB server with 16 cores and 200+ GB of memory.
We use tempdb a lot, but it typically stays under 4 GB.
We recently added a dedicated SSD stripe for tempdb.
Based on this page, we should create multiple files:
Optimizing tempdb Performance
I understand the advice about multiple row files.
Here is my question:
Should I also create multiple tempdb log files?
It does say "create one data file for each CPU".
So my thought is that data means row (not log) files.
No, as with all databases, SQL Server can only use one log file at a time, so there is no benefit at all in having multiple log files.
The best thing you can really do with log files is keep them on separate drives from the data files, as they have different IO requirements. Pre-size them so they don't have to autogrow, and if they do have to autogrow, make sure they grow in sensible increments so the number of virtual log files (VLFs) created inside them stays manageable.
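As a hedged sketch of that pre-sizing advice (the database name, logical log file name and sizes are assumptions):

-- Pre-size the log and use a fixed growth increment instead of a percentage.
ALTER DATABASE MyDB
MODIFY FILE (NAME = MyDB_log, SIZE = 8192MB, FILEGROWTH = 1024MB);
-- DBCC LOGINFO returns one row per virtual log file (VLF); a very large row
-- count suggests the log grew in many small increments.
DBCC LOGINFO;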

Is there any logic in just maxing tempdb and never having it change size?

The reason I ask is that we have a dedicated RAID 10 array with ~150GB for tempdb (the "T" drive). It is only used for storing tempdb; the T drive isn't used by SQL Server or any other process for anything else.
Our DBA has tempdb set up with a 15GB initial size and 20% autogrow increments. Every time the server starts it is resized to 15GB, and over the course of the day it grows to ~80GB (on average). Now IT is looking into making the initial size larger, say 30 or 40GB, but given that the drive is ONLY used for tempdb, my thinking is: why not "max it" right away?
Is there any negative effect to simply creating 4 data files in the primary filegroup for tempdb, giving them each an initial size of 30GB (120GB total), turning autogrow off and being done with it?
Are there any limits on SQL Server's ability to span multiple tempdb data files in one query? i.e. will it cause problems if tempdb has, say, 70GB total free but the file used by one process is full (30 of 30GB used)?
I would size them to about 100GB and leave autogrow on; this way you don't have to wait for the files to grow every time. I would also add multiple files.
Is there any negative effect to simply creating 4 data files in the primary filegroup for tempdb, giving them each an initial size of 30GB, turning autogrow off and being done with it?
Sounds like a good plan to me; however, I would leave autogrow on just in case someone decides to do a sort operation on a big table which doesn't have an index on that column.
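A minimal T-SQL sketch of that layout, assuming the default logical name for the existing tempdb data file and a dedicated T: drive; the names, paths and sizes are illustrative only:

-- Resize the existing primary data file and add three more, all 30GB, with autogrow left on.
ALTER DATABASE tempdb
MODIFY FILE (NAME = tempdev, SIZE = 30GB, FILEGROWTH = 1GB);
ALTER DATABASE tempdb
ADD FILE (NAME = tempdev2, FILENAME = 'T:\tempdb2.ndf', SIZE = 30GB, FILEGROWTH = 1GB);
ALTER DATABASE tempdb
ADD FILE (NAME = tempdev3, FILENAME = 'T:\tempdb3.ndf', SIZE = 30GB, FILEGROWTH = 1GB);
ALTER DATABASE tempdb
ADD FILE (NAME = tempdev4, FILENAME = 'T:\tempdb4.ndf', SIZE = 30GB, FILEGROWTH = 1GB);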
See also here: http://technet.microsoft.com/en-us/library/cc966534.aspx
It is recommended to have .25 to 1 data files (per filegroup) for each CPU on the host server.
This is especially true for TEMPDB, where the recommendation is 1 data file per CPU.
Dual core counts as 2 CPUs; logical procs (hyperthreading) do not.
We have found it very useful to create large TempDB data and log files. Anything that avoids OS-level file operations, such as resizing TempDB on the fly, improves server efficiency. We have a 16 processor machine with 113 GB dedicated to TempDB data space. This machine is dedicated to large SSIS ETL processes, thus resulting in mass data operations.
The bulk of our ETL operations spawn up to 4 SQL threads. After initially configuring a TempDB file for each processor (16), we quickly realized via performance monitoring that our configuration was forcing SQL Server/Windows to unnecessarily span the multiple TempDB files. We settled on 5 larger TempDB data files and realized performance improvements. We have since moved on to a 24 processor box and are using 8 TempDB files.
Please note that this is a large data migration server; I'm sure transaction-oriented systems would still benefit from the recommended 1-to-1 processor to TempDB file configuration. It should also be noted that a large growth increment (%) on a TempDB file may force a critical transaction to wait on the file-growth operation, so it may not be appropriate for your specific application.

SQL 2005 Partitioning

I have a database with 200 million records and I need to support 200 write tps. How many partitions do you recommend to use?
One. Don't bother. Partitions will slow you down for writes.
It's far more important for writes to have a dedicated, fast volume for your transaction log file (the LDF file) for that database alone. Don't add log files either: one LDF on one volume only.
This is because of write-ahead logging: One and Two. Simply put, a data page may not be written to disk immediately, but the associated log entry must be confirmed as written for any given transaction.
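One hedged way to check whether log writes are actually the bottleneck is to look at the cumulative WRITELOG wait statistics; how much is "too much" depends entirely on your workload:

-- Time spent waiting for transaction log records to be hardened to disk.
SELECT wait_type, waiting_tasks_count, wait_time_ms, signal_wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type = 'WRITELOG';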

When should one use auto shrink on log files in SQL Server?

I have had a few problems with log files growing too big on my SQL Servers (2000). Microsoft doesn't recommend using auto shrink for log files, but since it is a feature it must be useful in some scenarios. Does anyone know when is proper to use the auto shrink property?
Your problem is not that you need to autoshrink periodically but that you need to back up the log files periodically. (We back ours up every 15 minutes.) Backing up the database itself is not sufficient; you must back up the log as well. If you do not back up the transaction log, it will grow until it takes up all the space on the drive. If you back it up, it frees the space to be reused (you will still probably need to shrink after the first backup to get the log down to a more reasonable size). If you don't need to be able to recover transactions (which you should need to be able to do unless your entire database consists of tables that are loaded from another source and can easily be re-loaded), then set your database to the simple recovery model.
One reason why autoshrinking isn't such a good idea is that you will be growing the transaction log frequently, which slows down performance. If you back up the log, then once it reaches a relatively stable size (the amount of space normally used by the transaction log in the time period between backups), it will only need to grow occasionally when there is an unusually heavy volume of transactions.
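A minimal sketch of such a scheduled log backup, with a hypothetical database name and backup path (in practice this would run from a SQL Agent job every 15 minutes or so):

-- Back up the transaction log; this marks the inactive portion of the log as
-- reusable, so the file stops growing once it reaches a steady-state size.
BACKUP LOG MyDB
TO DISK = 'E:\Backups\MyDB_log.trn';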
My take on this is that auto-shrink is useful when you have many fairly small databases that frequently get larger due to added data, and then have a lot of empty space afterwards. You also need to not mind that the files will be fragmented on the disk when they frequently grow and shrink. I'd never use auto-shrink on a critical database or one larger than 2 GB, as you never know when the shrink operation will kick in, and access to the database will be blocked until the shrink has completed.
You should never have autoshrink turned on. It causes performance degradation in several ways. The file-system and indexes become fragmented and it is very resource intensive. It is also not necessary if you manage your backups correctly.
Read this answer from Paul Randal on Server Fault and Just Say No To Auto-Shrink!!
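For completeness, auto-shrink is a per-database option; a one-line sketch with a hypothetical database name:

-- Disable the auto-shrink database option.
ALTER DATABASE MyDB SET AUTO_SHRINK OFF;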
I used to use it when we had a demo version of a huge database that took up a lot of space on the laptop, so we used it to keep the size down.
The key is to use it only when the data is basically throw away.
You should back up the transaction logs periodically (which truncates the inactive portion) as part of your backup strategy.