Recently I had a discussion with our Network and System team about putting SQL files on different SAN LUNs. They believed that now a days because of SAN EMC Management process it is wasting the time and energy to put SQL files (Data/Log/Lob/Index/Backups especially TLogs) on separate drives with different spindles. So, could you help me by participating and stating your idea and vision about this discussion please.
I would tend to agree with your SAN administrators on this one. Most SANs today are running RAID-10 or similar technologies, spanning many drives and handling very high IOPS. Physically separating spindles for SQL Server data and logs goes back to the days of local storage with low numbers of drives and low IOPS capabilities.
so - its definitely worth placing your SQL data on separate LUNs, if only for scaling purposes. Definitely don't partition a LUN into multiple filesystems, I have seen that - its a road to destruction.
Putting different volumes on different physical spindles - this depends on a lot of factors.
Whats the workload, OLTP or OLAP (transactional or analytics)?
Whats the storage array? Is it traditional RAID (LUNs are on raidgroups) or is it virtualized provisioning (LUNs on "extents" Pools, extents on raidgroups (example VNX, VMAX, Unity))?
Are you using thin provisioning?
How are you going to scale?
Measure what workload are you dishing out to your current storage devices. Main thing, measure IOPS but also IO block size. IOPS alone is a meaningless number, you want to know the size of IO operations to determine volume placement. Determine what technology are you using, traditional or virtualized. Use latency as the ultimate performance measure.
This should get the conversation with your storage guys started.
Related
Before spinning up an actual (MySQL, Postgres, etc) database, are there ways to estimate how many reads & writes per second the database can handle?
I'm assuming this is dependant on the CPU and memory (+ network if we're sharding), but is there a good best practice on how to put these variables together?
This is useful for estimating cost and understanding how much of a traffic spike can the db handle.
You can learn from others to gauge transactions per second you'll get from certain instances. For example, https://aiven.io/blog/postgresql-12-gcp-aws-performance gives you a good idea of how PostgreSQL 12 performs.
Percona has blogged about performance benchmarks also: https://www.percona.com/blog/2017/01/06/millions-queries-per-second-postgresql-and-mysql-peaceful-battle-at-modern-demanding-workloads/
Here's another benchmark with useful information: http://dimitrik.free.fr/blog/posts/mysql-performance-80-and-sysbench-oltp_rw-updatenokey.html about MySQL 8.0 and links to 5.7 performance.
There are several blogs about SQL Server performance such as https://storagehub.vmware.com/t/microsoft-sql-server-2017-database-on-vmware-vsan-tm-6-7-using-vmware-cloud-foundation-tm/performance-test-results/ that can also help you recognize the workloads these databases can handle.
Under 10K tps shouldn't be much of a problem with modern hardware. You can start with a most common configuration on the cloud or a standard sized server in your own environment. Use SSDs. Optimize your server settings to gain more speed and be ready to add more resources gradually. As Gordon mentions, benchmark your database after you have installed it. I'd start with 32G memory, 8 cores and SSDs to pull 10K tps as a thumbrule and adjust from there.
As you assumed, a lot depends on the # and type of CPU/memory/SSD, your workload, how you structure data, latency between your app and database, reporting happening against the database, master/slave configuration, types of transactions, storage engines etc.
Ok, let met try to explain this in more detail.
I am developing a diagnostic system for airplanes. Let imagine that airplanes has 6 to 8 on-board computers. Each computer has more than 200 different parameters. The diagnostic system receives all this parameters in binary formatted package, then I transfer data according to the formulas (to km, km/h, rpm, min, sec, pascals and so on) and must store it somehow in a database. The new data must be handled each 10 - 20 seconds and stored in persistence again.
We store the data for further analytic processing.
Requirements of storage:
support sharding and replication
fast read: support btree-indexing
NOSQL
fast write
So, I calculated an average disk or RAM usage per one plane per day. It is about 10 - 20 MB of data. So an estimated load is 100 airplanes per day or 2GB of data per day.
It seems that to store all the data in RAM (memcached-liked storages: redis, membase) are not suitable (too expensive). However, now I am looking to the mongodb-side. Since it can utilize as RAM and disk usage, it supports all the addressed requirements.
Please, share your experience and advices.
There is a helpful article on NOSQL DBMS Comparison.
Also you may find information about the ranking and popularity of them, by category.
It seems regarding to your requirements, Apache's Cassandra would be a candidate due to its Linear scalability, column indexes, Map/reduce, materialized views and powerful built-in caching.
With Microsoft SQL Server 2005, is it possible to combine the processing power of multiple physical servers into a single logical sql server? Is it possible on SQL Server 2008?
I'm thinking, if the database files were located on a SAN and somehow one of the sql servers acted as a kind of master, then processing could be spread out over multiple physical servers, for instance even allowing simultaneous updates where there was no overlap, and in the case of read-only queries on unlocked tables no limit.
We have an application that is limited by the speed of our sql server, and probably stuck with server 2005 for now. Is the only option to get a single more powerful physical server?
Sorry I'm not an expert, I'm not sure if the question is a stupid one.
TIA
Before rushing out and buying new hardware, find out where your bottlenecks really are. Many locking problems can be solved with the appropriate indexes for your workload.
For example, I've seen instances where placing tempDB on SSD solved performance issues and saved the client buying an expensive new server.
Analyse your workload: How Can I Log and Find the Most Expensive Queries?
With SQL Server 2008 you can utilise the Management Data Warehouse (MDW) to capture your workload.
White Paper: SQL Server 2008 Performance and Scale
Also: please be aware that a SAN solution is not necessarily a faster I/O solution than directly attached storage. It depends on the SAN, number of Physical disks in a LUN, LUN subscription and usage, the speed of the HBA's and several other hardware factors...
Optimizing the app may be a big job of going through all business logic and lines of code. But looking for the most expansive query can easily locate the bottleneck area. Maybe it only happens to a couple of the biggest tables, views or stored procedures. Add or fine tune an index may help right the way. If bumping up the RAM is possible try that option as well. That is cheap and easy configure.
Good luck.
You might want to google for "sql server scalable shared database". Yes you can store your db files on a SAN and use multiple servers, but you're going to have to meet some pretty rigid criteria for it to be a performance boost or even useful (high ratio of reads to writes, small enough dataset to fit in memory or a fast enough SAN, multiple concurrent accessors, etc, etc).
Clustering is complicated and probably much more expensive in the long run than a bigger server, and far less effective than properly optimized application code. You should definitely make sure your app is well optimized.
I have terabytes of files and database dumps that I need to backup off-site.
What's the best way to accomplish this?
I'm currently weighing rsyinc to Amazon EBS or getting an appliance (eg barracuda).
I called a buddy of mine, and he said he uses backula to get all the files on a single disk, then backs that disk up to tape, then sends the tapes off to iron mountain.
Still waiting to hear back from other sysadmins I've contacted. Will post results here.
One common solution to offsite backups that is worth considering is performing the backup onsite and then physically transporting the backup elsewhere, either via secure snail mail or with a service designed for that purpose. If bandwidth is an issue, this may be more practical.
Instead of tapes, I use hard drives that I physically swap out every week. It is less expensive than tape equipment, and easier to plug into another system when necessary.
Back in the late 80s I worked at a place where every week we received a box of tapes of various sorts every monday - we would do one set of weekly backups on the tapes on that box and send them off-site. Evidently they had two of these boxes, one that was in our office and the other they kept locked up somewhere. Then we got an Exabyte drive which had a single tape capacity greater than that whole box of TK-50s, QIC-40s and mag tapes, and it was just simpler to send a single tape home with one of the manager every week.
I'm sure there are still off-site backup systems like that, but I find it easier to keep cycling a couple of 500Gb drives from my home system to my desk at work.
Why not encrpyt it and actually upload to a third party vendor?
I am thinking of doing this with my data at home but have not found a vendor that will just let me do a dump...They all want to install client side apps...
Admittedly, I have not looked that hard...
We use a couple of solutions. We have an offsite backup with another company that we do. We also use several portable hard drives and swap them out each day. Neither solution really handles multiple terabytes of data. More like gigabytes.
In the future, however, we will probably be looking at going the tape router, or something else that is similarly permanent and storable. Terabytes of data is too much to transfer over the wire. When bluray discs become reasonably priced and commercially viable, it may be a good idea to look into the 400GB discs that were touted not long ago. Those would be extremely storage friendly (both in the physical sense and the file size sense), and depending on the longevity stats, may keep for a while, similar to tapes.
I would recommend using a local san from a company like EMC that provides compressed snapshot based replication to remote facilities. It's an expensive solution, but it works.
http://www.emc.com/products/family/emc-centera-family.htm
Over the weekend, I've heard back from a couple of my sysadmin buddies.
It seems the best practice is to backup all machines to a central large disk, then back that disk up to tape, then send the tapes off site (all have used Iron Mountain).
Tapes hold 400-800G and cost $30-$80 per tape.
A tape changer seems to go for $10k on up.
Not sure how much the off-site shipping costs.
I'm scared of tape. I think it gives a false sense of data security. In my own experience from backing up dozens of terrabytes across hundreds of tapes, we discovered that the data recovery rate after a few years fell to about 70%.
To be fair, that was with a now discontinued technology (AIT), but it pretty much put me off tape for life unless it sits on a 1" spool and is reassuringly expensive.
These days, multiple hard drives, multiple locations, and yes, a fall back into Amazon S3 or other cloud provider does no harm (apart from being a tad expensive).
I have a problem with a large database I am working with which resides on a single drive - this Database contains around a dozen tables with the two main ones are around 1GB each which cannot be made smaller. My problem is the disk queue for the database drive is around 96% to 100% even when the website that uses the DB is idle. What optimisation could be done or what is the source of the problem the DB on Disk is 16GB in total and almost all the data is required - transactions data, customer information and stock details.
What are the reasons why the disk queue is always high no matter the website traffic?
What can be done to help improve performance on a database this size?
Any suggestions would be appreciated!
The database is an MS SQL 2000 Database running on Windows Server 2003 and as stated 16GB in size (Data File on Disk size).
Thanks
Well, how much memory do you have on the machine? If you can't store the pages in memory, SQL Server is going to have to go to the disk to get it's information. If your memory is low, you might want to consider upgrading it.
Since the database is so big, you might want to consider adding two separate physical drives and then putting the transaction log on one drive and partitioning some of the other tables onto the other drive (you have to do some analysis to see what the best split between tables is).
In doing this, you are allowing IO accesses to occur in parallel, instead of in serial, which should give you some more performance from your DB.
Before buying more disks and shifting things around, you might also update statistics and check your queries - if you are doing lots of table scans and so forth you will be creating unnecessary work for the hardware.
Your database isn't that big after all - I'd first look at tuning your queries. Have you profiled what sort of queries are hitting the database?
If you disk activity is that high while your site is idle, I would look for other processes that might be running that could be affecting it. For example, are you sure there aren't any scheduled backups running? Especially with a large db, these could be running for a long time.
As Mike W pointed out, there is usually a lot you can do with query optimization with existing hardware. Isolate your slow-running queries and find ways to optimize them first. In one of our applications, we spent literally 2 months doing this and managed to improve the performance of the application, and the hardware utilization, dramatically.