Scaling cheaply: MySQL and MS SQL

How much cheaper can MySQL be than MS SQL when you have tons of data (with joins and search)? Consider a site like Stack Overflow that is already full of Q&As and has just been dugg.
My ASP.NET sites are currently on SQL Server Express, so I don't have any idea how the costs compare in the long run. After some quick research, though, I'm starting to envy the savings MySQL folks get.

MSSQL Standard Edition (32 or 64 bit) will cost around $5K per CPU socket. 64 bit will allow you to use as much RAM as you need. Enterprise Edition is not really necessary for most deployments, so don't worry about the $20K you would need for that license.
MySQL is only free if you forgo a lot of the useful tools offered with the licenses, and it's probably (at least as of 2008) going to be a little more work to get it to scale like SQL Server.
In the long run I think you will spend much more on hardware and people than you will on just the licenses. If you need to scale, then you will probably have the cash flow to handle $5K here and there.
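To make the licenses-versus-everything-else argument concrete, here is a rough Python sketch using the per-socket figures above. The server and socket counts, and the hardware/staff budget in the usage lines, are hypothetical, and the Enterprise price is assumed to be per socket as well; current Microsoft pricing is different.

```python
# Rough SQL Server license-cost sketch using the circa-2008 figures quoted above.
# ASSUMPTION: the $20K Enterprise figure is treated as per socket, like Standard.
STANDARD_PER_SOCKET = 5_000
ENTERPRISE_PER_SOCKET = 20_000


def license_cost(servers: int, sockets_per_server: int, per_socket: int) -> int:
    """Total license cost for a fleet of identical servers."""
    return servers * sockets_per_server * per_socket


def license_share(license_total: float, hardware_and_staff: float) -> float:
    """Fraction of total spend that goes to database licenses."""
    return license_total / (license_total + hardware_and_staff)


if __name__ == "__main__":
    # Hypothetical deployment: two dual-socket database servers.
    std = license_cost(servers=2, sockets_per_server=2, per_socket=STANDARD_PER_SOCKET)
    ent = license_cost(servers=2, sockets_per_server=2, per_socket=ENTERPRISE_PER_SOCKET)
    print(f"Standard: ${std:,}  Enterprise: ${ent:,}")
    # Hypothetical budget for hardware plus people over the same period.
    print(f"License share of spend (Standard): {license_share(std, 150_000):.0%}")
```

With these made-up inputs, Standard licensing is $20,000 out of roughly $170,000 in total, which illustrates the answer's point that the license is usually not the dominant cost.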

The performance benefits of MS SQL over MySQL are fairly negligible, especially if you offset them with server- and client-side optimizations such as server caching (in RAM), client caching (Cache-Control and Expires headers) and gzip compression.
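The two client-side techniques mentioned (caching/expiry headers and gzip) can be sketched with the Python standard library alone. This is a framework-agnostic illustration, not ASP.NET code; it assumes the client has sent "Accept-Encoding: gzip", and the one-hour max_age is an arbitrary example value.

```python
import gzip
import time
from email.utils import formatdate


def cacheable_response(body: bytes, max_age: int = 3600) -> tuple[dict, bytes]:
    """Return (headers, gzip-compressed body) for a response browsers may cache.

    Assumes the client sent 'Accept-Encoding: gzip'; a real server must check
    that header and fall back to the uncompressed body otherwise.
    """
    compressed = gzip.compress(body)
    headers = {
        "Content-Type": "text/html; charset=utf-8",
        "Content-Encoding": "gzip",
        "Content-Length": str(len(compressed)),
        # Client caching: browsers may reuse this response for max_age seconds.
        "Cache-Control": f"public, max-age={max_age}",
        # Older HTTP/1.0 equivalent, as an absolute date in RFC 1123 (GMT) format.
        "Expires": formatdate(time.time() + max_age, usegmt=True),
        # Tell intermediate caches that the body depends on Accept-Encoding.
        "Vary": "Accept-Encoding",
    }
    return headers, compressed


if __name__ == "__main__":
    page = b"<html><body>" + b"<p>question</p>" * 200 + b"</body></html>"
    headers, payload = cacheable_response(page)
    print(len(page), "->", len(payload), "bytes;", headers["Cache-Control"])
```

Repetitive HTML like this compresses very well, and every request a browser satisfies from its own cache is one the web server and database never see.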

I know that Stack Overflow has had problems with deadlocks from reads and writes arriving at odd intervals, but they say their architecture (MS SQL) is holding up fine. That was before the public beta, of course, and according to Jeff's Twitter feed earlier today:
"the range of top 32 newest/modified questions was about 20 minutes in the private beta; now it's about 2 minutes."
That the site hasn't crashed yet is a testament to the database (as well as good coding and testing).
But why not post some specific numbers about your site?

MySQL is extremely cheap when you have a distro (or the staff to build one) that ships MySQL Enterprise Edition. This is a high-availability version that offers multi-master replication across many servers.
The pros are low (license) costs once you have paid for the hardware (gigabytes of RAM needed!) and the time to set it up.
The drawbacks are suboptimal performance with many joins, no full-text indexing, no stored procedures (I think), and the need to replicate grants to every master node.
Yet it's easier to run than the replication/proxy balancing setup that's available for PostgreSQL.

Related

How to estimate the maximum number of reads and writes per second a RDBMS server can handle?

Before spinning up an actual database (MySQL, Postgres, etc.), are there ways to estimate how many reads and writes per second it can handle?
I'm assuming this depends on the CPU and memory (plus the network if we're sharding), but is there a good rule of thumb for putting these variables together?
This would be useful for estimating cost and understanding how large a traffic spike the database can handle.
You can learn from others' published benchmarks to gauge the transactions per second you'll get from certain instances. For example, https://aiven.io/blog/postgresql-12-gcp-aws-performance gives you a good idea of how PostgreSQL 12 performs.
Percona has blogged about performance benchmarks also: https://www.percona.com/blog/2017/01/06/millions-queries-per-second-postgresql-and-mysql-peaceful-battle-at-modern-demanding-workloads/
Here's another benchmark with useful information about MySQL 8.0, with links to 5.7 performance: http://dimitrik.free.fr/blog/posts/mysql-performance-80-and-sysbench-oltp_rw-updatenokey.html
There are several blogs about SQL Server performance such as https://storagehub.vmware.com/t/microsoft-sql-server-2017-database-on-vmware-vsan-tm-6-7-using-vmware-cloud-foundation-tm/performance-test-results/ that can also help you recognize the workloads these databases can handle.
Under 10K tps shouldn't be much of a problem with modern hardware. You can start with a common configuration in the cloud or a standard-sized server in your own environment. Use SSDs. Tune your server settings for more speed and be ready to add resources gradually. As Gordon mentions, benchmark your database after you have installed it. As a rule of thumb, I'd start with 32 GB of memory, 8 cores and SSDs to pull 10K tps, and adjust from there.
As you assumed, a lot depends on the number and type of CPUs, memory and SSDs, your workload, how you structure the data, latency between your app and the database, reporting that runs against the database, master/slave configuration, types of transactions, storage engines, etc.
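The "benchmark it yourself" advice boils down to timing a fixed number of operations and dividing. Below is a minimal Python harness for that, run against SQLite from the standard library purely so that it is self-contained; SQLite numbers say nothing about MySQL, PostgreSQL or SQL Server, for which tools such as sysbench or pgbench are the usual choice. The peak factor of 3 used to turn a daily volume into a peak rate, and the 5 million transactions per day, are assumptions rather than measured values.

```python
import sqlite3
import time

SECONDS_PER_DAY = 86_400


def required_peak_tps(daily_transactions: int, peak_factor: float = 3.0) -> float:
    """Average transactions/second over a day, scaled by an ASSUMED peak-to-average factor."""
    return daily_transactions / SECONDS_PER_DAY * peak_factor


def measure_tps(conn: sqlite3.Connection, n: int = 10_000) -> tuple[float, float]:
    """Crude single-connection throughput: (writes/second, reads/second)."""
    conn.execute("CREATE TABLE IF NOT EXISTS kv (id INTEGER PRIMARY KEY, val TEXT)")

    start = time.perf_counter()
    for i in range(n):
        conn.execute("INSERT OR REPLACE INTO kv (id, val) VALUES (?, ?)", (i, f"v{i}"))
        conn.commit()  # one transaction per write, as in a simple OLTP workload
    write_tps = n / (time.perf_counter() - start)

    start = time.perf_counter()
    for i in range(n):
        conn.execute("SELECT val FROM kv WHERE id = ?", (i,)).fetchone()
    read_tps = n / (time.perf_counter() - start)
    return write_tps, read_tps


if __name__ == "__main__":
    # Hypothetical traffic: 5 million transactions per day.
    need = required_peak_tps(5_000_000)
    writes, reads = measure_tps(sqlite3.connect(":memory:"))
    print(f"need ~{need:.0f} tps at peak; measured {writes:.0f} writes/s, {reads:.0f} reads/s")
```

A single connection on one thread understates what a server can do under concurrent load, so treat the result as a floor and compare it against the required peak rather than the daily average.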

Moving from SQL Server to Cassandra

I have a data-intensive project for which I recently wrote the code; the data and stored procedures live in an MS SQL database. My initial estimate is that the database will grow to 50 TB, after which its growth will level off. The final application will perform lots of row-level lookups and reads, with a very small percentage of writes back to the database.
With the above scenario in mind, it has been suggested that I look at a NoSQL option to scale to this large volume of data and transactions, and after a bit of research the road leads to Cassandra (with MongoDB as a second alternative).
I would appreciate your guidance with the following set of initial questions:
-Does Cassandra support the concept of stored procs?
-Would I be able to install and run the 50TB db on a single node (single Windows Server)?
-Does Cassandra support/leverage multiple CPUs in single server (ex: 4 CPUs)?
-Would the open-source version be able to support the 50TB db, or would I need to purchase the ENT version?
Regards,
-r
Does Cassandra support the concept of stored procs?
Cassandra does not support stored procedures. However, there is a feature called "prepared statements" which lets you submit a CQL query once and then have it executed multiple times with different parameters. What you can do with prepared statements is limited to regular CQL; in particular, you cannot use loops, conditional statements or other procedural logic. You do, however, get some measure of protection against injection attacks and avoid parsing the same query over and over.
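For illustration, this is roughly what prepare-once, execute-many looks like from Python with the DataStax driver (cassandra-driver). The users table, its columns, the contact point and the keyspace name are hypothetical. The function only needs an object with prepare() and execute(), so the connection code is shown as comments.

```python
def fetch_names(session, user_ids):
    """Prepare one CQL statement, then execute it many times with different values.

    'session' is expected to be a cassandra.cluster.Session (DataStax
    cassandra-driver); any object with prepare()/execute() works for testing.
    The users table and its columns are hypothetical.
    """
    # Sent to the cluster and parsed once; later executions only ship the
    # statement id plus bound values, which is also what protects against
    # CQL injection (values are never spliced into the query string).
    stmt = session.prepare("SELECT name FROM users WHERE user_id = ?")
    names = []
    for uid in user_ids:
        row = session.execute(stmt, (uid,)).one()  # .one() returns None if no row
        names.append(row.name if row is not None else None)
    return names


# Real usage (requires 'pip install cassandra-driver' and a running cluster;
# the contact point and keyspace name are hypothetical):
# from cassandra.cluster import Cluster
# session = Cluster(["127.0.0.1"]).connect("my_keyspace")
# print(fetch_names(session, [1, 2, 3]))
```

Note there is no branching or looping inside the statement itself; any logic like that lives in the client code, which is the limitation the answer describes.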
Would I be able to install and run the 50TB db on a single node (single Windows Server)?
I am not aware of anything that would prevent you from running a 50TB database on one node, but you may need lots of memory to keep things relatively smooth, as your RAM/storage ratio is likely to be very low, which limits your ability to cache disk data meaningfully. What is not recommended, however, is running a production setup on Windows. Cassandra uses some Linux-specific I/O optimizations and is tested much more thoroughly on Linux. Far-out setups like the one you're suggesting are especially likely to be untested on Windows.
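To put a number on "very low": the sketch below computes what fraction of a 50 TB data set could sit in RAM for a few hypothetical memory sizes, taking 1 TB as 1024 GB.

```python
def ram_to_data_ratio(ram_gb: float, data_tb: float) -> float:
    """Fraction of the on-disk data that could be cached in RAM (1 TB = 1024 GB)."""
    return ram_gb / (data_tb * 1024)


if __name__ == "__main__":
    for ram_gb in (128, 256, 512):  # hypothetical server memory sizes
        pct = ram_to_data_ratio(ram_gb, 50)
        print(f"{ram_gb:>4} GB RAM vs 50 TB of data: {pct:.2%} can be cached")
```

Even 512 GB covers only about 1% of the data, so unless access is heavily skewed toward a small hot set, most row lookups will have to go to disk.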
Does Cassandra support/leverage multiple CPUs in single server (ex: 4 CPUs)?
Yes
Would open source version be able to support the 50TB db? or would I need to purchase the ENT version?
The Apache distro does not have any usage limits baked into it (it makes little sense in an open source project, if you think about it). Neither does the free version from DataStax, the Community Edition.

What database strategy to choose for a large web application

I have to rewrite a large database application running on 32 servers. The hardware is up to date: each machine has two quad-core Xeons and 32 GB of RAM.
The database is multi-tenant; each customer has their own file of around 5 to 10 GB. I run around 50 databases on this hardware. The app is open to the web, so I have no control over the load. There are no really complex queries, so SQL is not required if there is a better solution.
The databases get updated via FTP every day at midnight; otherwise they are read-only.
C# is my favourite language and I want to use ASP.NET MVC.
I thought about the following options:
Use two big servers running SQL Server 2012 to serve data to the 32 servers, with IIS on the 32 servers providing REST services.
Denormalize the database and use Redis on each web server. Use BookSleeve as the Redis client.
Use a combination of SQL Server and Redis
Use SQL Server 2012 together with Hadoop
Use Hadoop without SQL Server
What is the best approach for a read-only database, to get the best performance without losing maintainability? Does MapReduce make sense at all in such a scenario?
The reason for the rewrite is that the old app, written in C++ with ISAM technology, is too slow, and its interfaces are old-fashioned and awkward to use from a website, especially with Ajax.
The app uses a relational data model with many tables, but it is possible to build one accelerator table on which all queries can be performed, with all other information from the other tables available through a simple key lookup.
A few questions. What problems have come up that make you want to rewrite this? What do the query patterns look like? It sounds like you would be most comfortable with SQL Server plus caching (memcached) to address whatever issues are causing you to rewrite this. Redis is good, but you won't need its data-structure features with the database handling queries, and you don't need persistence if it's only being used as a cache. Without knowing more about the problem, I'd probably look at MongoDB to handle data sharding, redundant storage and caching all in one solution. There are no special machines in this setup, redundancy can be configured, and the load should balance well.
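The SQL Server plus caching suggestion is usually implemented as cache-aside: check the cache, fall back to the database on a miss, then populate the cache. Below is a minimal in-process Python sketch of that pattern; a plain dict with a TTL stands in for memcached, and the loader function standing in for a SQL Server query is hypothetical. Since the data only changes at the nightly load, clearing the cache after that load is one simple invalidation strategy.

```python
import time
from typing import Any, Callable


class CacheAside:
    """In-process stand-in for memcached in a cache-aside setup."""

    def __init__(self, loader: Callable[[str], Any], ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self._loader = loader      # hypothetical: a function that queries SQL Server
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so expiry can be tested
        self._entries: dict[str, tuple[float, Any]] = {}

    def get(self, key: str) -> Any:
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]              # fresh hit: no database round trip
        value = self._loader(key)      # miss or expired: go to the database
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate_all(self) -> None:
        """Call after the nightly data load so readers see the new data."""
        self._entries.clear()


if __name__ == "__main__":
    cache = CacheAside(lambda customer_id: f"row for {customer_id}", ttl_seconds=300)
    print(cache.get("customer-42"))  # loads via the loader
    print(cache.get("customer-42"))  # served from the cache
```

A shared cache such as memcached adds a network hop but lets all web servers share one copy of each entry; the read path has the same shape either way.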
This question is almost an opinion piece. I'd personally prefer Oracle RAC with TimesTen for caching if performance is of the utmost importance and the volume of concurrent reads is high during the day.
There's a white paper here...
http://www.oracle.com/us/products/middleware/timesten-in-memory-db-504865.pdf
The specs of the disk subsystem and organization of indexes and data files across physical disks is probably the most important factor though.

Single logical SQL Server possible from multiple physical servers?

With Microsoft SQL Server 2005, is it possible to combine the processing power of multiple physical servers into a single logical sql server? Is it possible on SQL Server 2008?
I'm thinking that if the database files were located on a SAN and one of the SQL Servers somehow acted as a kind of master, processing could be spread across multiple physical servers, for instance allowing simultaneous updates where there was no overlap, and no limit at all for read-only queries on unlocked tables.
We have an application that is limited by the speed of our SQL Server, and we're probably stuck with SQL Server 2005 for now. Is the only option to get a single, more powerful physical server?
Sorry, I'm not an expert; I'm not sure whether this is a stupid question.
TIA
Before rushing out and buying new hardware, find out where your bottlenecks really are. Many locking problems can be solved with the appropriate indexes for your workload.
For example, I've seen instances where placing tempDB on SSD solved performance issues and saved the client buying an expensive new server.
Analyse your workload: How Can I Log and Find the Most Expensive Queries?
With SQL Server 2008 you can utilise the Management Data Warehouse (MDW) to capture your workload.
White Paper: SQL Server 2008 Performance and Scale
Also, please be aware that a SAN solution is not necessarily faster I/O than directly attached storage. It depends on the SAN, the number of physical disks in a LUN, LUN subscription and usage, the speed of the HBAs and several other hardware factors...
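On SQL Server 2005 and later, one common starting point for finding the most expensive queries is the plan-cache DMVs sys.dm_exec_query_stats and sys.dm_exec_sql_text. The sketch below keeps that T-SQL in a Python string and ranks the returned rows by average CPU per execution. The pyodbc connection string in the comments is hypothetical, and these statistics only cover plans that are still in the cache.

```python
# Plan-cache DMVs, available in SQL Server 2005 and later. Times are in microseconds.
TOP_QUERIES_SQL = """
SELECT qs.execution_count,
       qs.total_worker_time,     -- total CPU time
       qs.total_elapsed_time,    -- total wall-clock time
       qs.total_logical_reads,
       st.text AS query_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_worker_time DESC;
"""


def rank_by_avg_cpu(rows, n=10):
    """Rank (count, worker, elapsed, reads, text) rows by average CPU per execution."""
    scored = [
        (worker / count, count, text)
        for count, worker, _elapsed, _reads, text in rows
        if count  # skip rows with zero executions to avoid dividing by zero
    ]
    scored.sort(key=lambda r: r[0], reverse=True)
    return scored[:n]


# Usage with pyodbc (server name and driver string are hypothetical):
# import pyodbc
# conn = pyodbc.connect("DRIVER={SQL Server};SERVER=myserver;Trusted_Connection=yes")
# rows = conn.cursor().execute(TOP_QUERIES_SQL).fetchall()
# for avg_cpu_us, count, text in rank_by_avg_cpu(rows):
#     print(f"{avg_cpu_us:>12.0f} us avg x {count}: {text[:80]}")
```

Sorting by total CPU (as the T-SQL does) surfaces queries that cost the most overall, often cheap ones that run very often; ranking by average surfaces individually slow ones. Both views are worth checking before buying hardware.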
Optimizing the app may be a big job of going through all the business logic and lines of code, but looking for the most expensive queries can quickly locate the bottleneck. Maybe it only involves a couple of the biggest tables, views or stored procedures, and adding or fine-tuning an index may help right away. If bumping up the RAM is possible, try that option as well; it is cheap and easy to configure.
Good luck.
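The effect of adding an index can be seen even on a toy scale. This Python sketch uses SQLite from the standard library rather than SQL Server, so the plan wording differs from SQL Server's execution plans, and the orders table is hypothetical; it prints the query plan for the same lookup before and after creating an index.

```python
import sqlite3


def plan(conn, sql, params=()):
    """Return SQLite's EXPLAIN QUERY PLAN detail strings for a query."""
    # The detail text is the last column in both old and new SQLite versions.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 1000, i * 1.5) for i in range(50_000)],
)
query = "SELECT id, total FROM orders WHERE customer_id = ?"

before = plan(conn, query, (42,))
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after = plan(conn, query, (42,))
print("before:", before)  # expect a full scan of orders
print("after: ", after)   # expect a search using idx_orders_customer
```

The same idea applies on SQL Server: look at the execution plan of the expensive query, and if it scans a large table to find a few rows, a suitable index is often the cheapest fix.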
You might want to google for "sql server scalable shared database". Yes you can store your db files on a SAN and use multiple servers, but you're going to have to meet some pretty rigid criteria for it to be a performance boost or even useful (high ratio of reads to writes, small enough dataset to fit in memory or a fast enough SAN, multiple concurrent accessors, etc, etc).
Clustering is complicated and probably much more expensive in the long run than a bigger server, and far less effective than properly optimized application code. You should definitely make sure your app is well optimized.

Are there performance benefits when upgrading SQL2000 to SQL2005?

I've had a look at this question but there are no responses regarding performance.
Are there any performance benefits from doing a simple upgrade from SQL2000 to SQL2005?
I am thinking of a line-of-business OLTP database.
I am not using OLAP or FTI.
We found yes.
The query optimiser is updated and better.
We found a lot of query plans were different with absolutely no other changes.
Even our end users commented on the speed improvement and general responsiveness. I have the email to prove it :-D
At the same time, we rewrote a very few stored procs because they were worse. However, the same tweaks also improved response times on SQL 2000; the originals were simply poor code/ideas.
Not in my experience.
If you want to improve your database performance, I tend to throw more hardware at it in the form of more RAM and faster disks.
I still haven't found much on this but here is a load of marketing stuff that essentially says SQL2005 is a good thing:
http://www.microsoft.com/sqlserver/2005/en/us/white-papers.aspx#gen
and this white paper, "Why upgrade to SQLSERVER2005" (.doc), states:
Faster Response Times for End Users
Database query response times have improved 70-80 percent for many applications converted from SQL Server 2000 to SQL Server 2005. Quicker response times to queries help business users react faster and enable the database to be used for real-time decision making. Queries that might have previously taken several hours can now be completed in a matter of minutes. This provides a whole new level of functionality for end-users because analysis can be done on an ad hoc basis rather than exclusively in predefined reports. Companies can use this new accessibility to information to dynamically monitor processes.
On the flip side, there do seem to be a few problems when upgrading:
Why would upgrading from SQL Server 2000 to 2005 result in slower queries?
http://www.eggheadcafe.com/forumarchives/SQLServersetup/Feb2006/post26068566.asp