I have to decide whether to use Azure SQL Data Warehouse or a data warehouse built on Microsoft SQL Server running in a VM.
What I do not understand is the limit of 32 maximum concurrent queries. The equivalent limit for Azure SQL Database is 6400.
To be honest, if I want to use Azure Data Warehouse in an enterprise environment, 32 concurrent queries is laughable, or I am misunderstanding the limit.
Let's assume a company with 10,000 employees worldwide. I set up an Azure Data Warehouse for reporting, where, say, 250 employees are querying it continuously and an additional 250 are working with a business app that uses data from the Data Warehouse. How should this work without severe performance problems?
This isn't the issue that you might think.
First, the limit is now 128. (https://learn.microsoft.com/en-us/azure/sql-data-warehouse/memory-and-concurrency-limits#gen2-1)
Second, this is well above the concurrency of the next most concurrent single cluster warehouse. I've often wondered what marketing mistake was made by Microsoft that concurrency is seen as a limitation on ASDW, but rarely mentioned for less concurrent competitors.
Third, the best way to serve thousands of concurrent query users (i.e., dashboards) is through Power BI hybrid queries and (potentially) Azure Analysis Services. This gives extremely high concurrency and interactivity.
Perhaps the best evidence I can give is that I work with Azure SQL Data Warehouse customers on a daily basis. I often get questions like this when a customer is first exposed to ASDW, but I never get questions about concurrency by the time they're in production. In other words, the issue of "concurrency" just isn't important for most customers.
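As a rough illustration of why the number of users is not the same as the number of concurrent queries, here is a back-of-the-envelope sketch using Little's law (average queries in flight = arrival rate x average query duration). The 500 users are the 250 report users plus 250 app users from the question; the 60-second think time and 3-second query duration are made-up assumptions, not measurements.

```python
def concurrent_queries(active_users: int, think_time_s: float, query_time_s: float) -> float:
    """Estimate the average number of queries in flight (Little's law).

    Each user repeats a cycle: run a query (query_time_s), then read the
    result (think_time_s). Both durations are hypothetical inputs.
    """
    cycle_s = think_time_s + query_time_s
    arrival_rate = active_users / cycle_s   # queries per second across all users
    return arrival_rate * query_time_s      # average number of queries in flight


# 500 active users from the question; 60 s think time and 3 s queries are assumptions.
estimate = concurrent_queries(active_users=500, think_time_s=60.0, query_time_s=3.0)
print(f"{estimate:.1f} queries in flight on average")
```

Under these assumptions the average comes out at roughly 24 queries in flight. This is an average, so peaks will be higher, and the real numbers depend entirely on your own query durations and usage pattern.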
I'm currently running an instance of MS SQL Server 2014 (12.1.4100.1) on a dedicated machine I rent for $270/month with the following specs:
Intel Xeon E5-1660 processor (six physical 3.3 GHz cores + hyperthreading + turbo to 3.9 GHz)
64 GB registered DDR3 ECC memory
240GB Intel SSD
45000 GB of bandwidth transfer
I've been toying around with Azure SQL Database for a bit now, and have been entertaining the idea of switching over to their platform. I fired up an Azure SQL Database using their P2 Premium pricing tier on a V12 server (just to test things out), and loaded a copy of my existing database (from the dedicated machine).
I ran several sets of queries side-by-side, one against the database on the dedicated machine, and one against the P2 Azure SQL Database. The results were sort of shocking: my dedicated machine outperformed (in terms of execution time) the Azure db by a huge margin each time. Typically, the dedicated db instance would finish in under 1/2 to 1/3 of the time that it took the Azure db to execute.
Now, I understand the many benefits of the Azure platform. It's managed vs. my non-managed setup on the dedicated machine, they have point-in-time restore better than what I have, the firewall is easily configured, there's geo-replication, etc., etc. But I have a database with hundreds of tables with tens to hundreds of millions of records in each table, and sometimes need to query across multiple joins, etc., so performance in terms of execution time really matters. I just find it shocking that a ~$930/month service performs that poorly next to a $270/month dedicated machine rental. I'm still pretty new to SQL as a whole, and very new to servers/etc., but does this not add up to anyone else? Does anyone perhaps have some insight into something I'm missing here, or are those other, "managed" features of Azure SQL Database supposed to make up the difference in price?
Bottom line is I'm beginning to outgrow even my dedicated machine's capabilities, and I had really been hoping that Azure's SQL Database would be a nice, next stepping stone, but unless I'm missing something, it's not. I'm too small of a business still to go out and spend hundreds of thousands on some other platform.
Does anyone have any advice on whether I'm missing something, or is the performance I'm seeing in line with what you would expect? Do I have any other options that can produce better performance than the dedicated machine I'm running currently, but don't cost in the tens of thousands per month? Is there something I can do (configuration/setting) for my Azure SQL Database that would improve execution time? Again, any help is appreciated.
EDIT: Let me revise my question to maybe make it a little more clear: is what I'm seeing in terms of sheer execution time performance to be expected, where a dedicated server @ $270/month is well outperforming Microsoft's Azure SQL DB P2 tier @ $930/month? Ignore the other "perks" like managed vs. unmanaged, ignore intended use like Azure being meant for production, etc. I just need to know if I'm missing something with Azure SQL DB, or if I really am supposed to get MUCH better performance out of a single dedicated machine.
(Disclaimer: I work for Microsoft, though not on Azure or SQL Server).
"Azure SQL" isn't equivalent to "SQL Server" - and I personally wish that we did offer a kind of "hosted SQL Server" instead of Azure SQL.
On the surface the two are the same: they're both relational database systems with the power of T-SQL to query them (well, under the hood they both use the same DBMS).
Azure SQL is different in that the idea is that you have two databases: a development database using a local SQL Server (ideally 2012 or later) and a production database on Azure SQL. You (should) never modify the Azure SQL database directly, and indeed you'll find that SSMS does not offer design tools (Table Designer, View Designer, etc) for Azure SQL. Instead, you design and work with your local SQL Server database and create "DACPAC" files (or special "change" XML files, which can be generated by SSDT) which then modify your Azure DB such that it copies your dev DB, a kind of "design replication" system.
Otherwise, as you noticed, Azure SQL offers built-in resiliency, backups, simplified administration, etc.
As for performance, is it possible you were missing indexes or other optimizations? You also might notice slightly higher latency with Azure SQL compared to a local SQL Server. I've seen ping times (from an Azure VM to an Azure SQL host) of around 5-10ms, which means you should design your application to be less chatty, or to parallelise data retrieval operations, in order to reduce page load times (assuming this is a web application you're building).
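To show what parallelising data retrieval buys you when each round trip costs about 10ms, here is a minimal Python sketch. The `fetch` function is a hypothetical stand-in that only simulates network latency with a sleep; it is not a real database call, and the query names are invented.

```python
import time
from concurrent.futures import ThreadPoolExecutor

ROUND_TRIP_S = 0.01  # assumed ~10 ms round trip, the upper end of the ping times above


def fetch(query_name: str) -> str:
    """Hypothetical stand-in for one database call; it only simulates latency."""
    time.sleep(ROUND_TRIP_S)
    return f"rows for {query_name}"


def fetch_sequential(names):
    # One round trip after another: total latency grows with len(names).
    return [fetch(n) for n in names]


def fetch_parallel(names):
    # Issue the independent calls at once so their round trips overlap.
    with ThreadPoolExecutor(max_workers=len(names)) as pool:
        return list(pool.map(fetch, names))


names = [f"widget_{i}" for i in range(20)]

start = time.perf_counter()
sequential_rows = fetch_sequential(names)
sequential_s = time.perf_counter() - start

start = time.perf_counter()
parallel_rows = fetch_parallel(names)
parallel_s = time.perf_counter() - start

print(f"sequential: {sequential_s * 1000:.0f} ms, parallel: {parallel_s * 1000:.0f} ms")
```

The sequential version pays the round trip 20 times, while the parallel version pays it roughly once plus thread overhead. The "less chatty" alternative is to fold the 20 calls into one query so there is only a single round trip to begin with.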
Perf and availability aside, there are several other important factors to consider:
Total cost: your $270 rental cost is only one of many cost factors. Space, power and HVAC are other physical costs. Then there's the cost of administration. Think of the work you have to do each Patch Tuesday and whenever Windows or SQL Server ships a service pack or cumulative update. Even if you don't test them before rolling out, it still takes time and effort. If you do test, then you need a second machine, plus a duplicate of the production instance and workload to test against.
Security: there is a LOT written about how bad and dangerous and risky it is to store any data you care about in the cloud. Personally, I've seen way worse implementations and processes on security with local servers (even in banks and federal agencies) than I've seen with any of the major cloud providers (Microsoft, Amazon, Google). It's a lot of work getting things right then even more work keeping them right. Also, you can see and audit their security SLAs (See Azure's at http://azure.microsoft.com/en-us/support/trust-center/).
Scalability: not just raw scalability but the cost and effort to scale. Azure SQL DB recently released the huge P11 edition which has 7x the compute capacity of the P2 you tested with. Scaling up and down is not instantaneous but really easy and reasonably quick. Best part is (for me anyway), it can be bumped to some higher edition when I run large queries or reindex operations then back down again for "normal" loads. This is hard to do with a regular SQL Server on bare metal - either rent/buy a really big box that sits idle 90% of the time or take downtime to move. Slightly easier if in a VM; you can increase memory online but still need to bounce the instance to increase CPU; your Azure SQL DB stays online during scale up/down operations.
There is an alternative from Microsoft to Azure SQL DB:
“Provision a SQL Server virtual machine in Azure”
https://azure.microsoft.com/en-us/documentation/articles/virtual-machines-provision-sql-server/
A detailed explanation of the differences between the two offerings: “Understanding Azure SQL Database and SQL Server in Azure VMs”
https://azure.microsoft.com/en-us/documentation/articles/data-management-azure-sql-database-and-sql-server-iaas/
One significant difference between your standalone SQL Server and Azure SQL DB is that with SQL DB you are paying for high levels of availability, which is achieved by running multiple instances on different machines. This would be like renting 4 of your dedicated machines and running them in an AlwaysOn Availability Group, which would change both your cost and performance. However, as you never mentioned availability, I'm guessing this isn't a concern in your scenario. SQL Server in a VM may better match your needs.
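The four-machine comparison can be put into numbers with the prices from the question. This is only hardware rental arithmetic; it assumes nothing about licensing, administration time, or how the availability group would actually perform.

```python
DEDICATED_MONTHLY_USD = 270   # dedicated machine rental, from the question
AZURE_P2_MONTHLY_USD = 930    # Azure SQL DB P2 price, from the question
REPLICA_COUNT = 4             # the "4 dedicated machines" analogy above

# Monthly rental for a self-hosted cluster of four identical machines.
ha_cluster_usd = DEDICATED_MONTHLY_USD * REPLICA_COUNT
difference_usd = ha_cluster_usd - AZURE_P2_MONTHLY_USD

print(f"self-hosted cluster: ${ha_cluster_usd}/month (hardware rental only)")
print(f"Azure SQL DB P2:     ${AZURE_P2_MONTHLY_USD}/month")
print(f"difference:          ${difference_usd}/month")
```

On rental alone, four machines come to $1080/month, which is $150/month more than the P2 tier before any other costs are counted.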
SQL DB has built-in availability (which can impact performance), point-in-time restore capability and DR features. You have the option to scale your DB up or down based on your usage to reduce the cost. You can improve your query performance using Global query (shard data). SQL DB manages automatic upgrades and patching, which greatly improves the manageability story. You may need to pay a small premium for that. Application-level caching, evenly distributing the load, downgrading when cold, etc. may help improve your database performance and optimize the cost.
I read in a few places that SQL Azure data is automatically replicated and that the Azure platform provides redundant copies of the data. Therefore SQL Server high availability features such as database mirroring and failover clustering aren't needed.
Has anyone had a chance to look deeper into this? Are all those availability enhancements really not needed in Azure? Thanks!
To clarify, I'm talking about SQL as a service and not a VM hosted SQL.
The SQL Database service (database-as-a-service) is a multi-tenant database service, and your databases are triple-replicated within the data center, providing durable storage. The service itself, being large-scale, provides high availability (since there are many VMs running the service itself, along with replicated data). Nothing is needed in terms of mirroring or failover clusters. Having said that: If, say, your particular database became unavailable for a period of time, you'll need to consider how you'll handle that situation (perhaps sync'ing to another SQL Database, maybe even in another data center).
If you go with SQL Database (DBaaS), you'll still need to work out your backup strategy, and possibly syncing with another DC (or on-premises database server) for DR purposes.
More info on SQL Database fault tolerance is here.
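One common client-side piece of handling a database that is briefly unavailable is to retry with a growing delay before giving up and falling back to whatever DR plan you have. The sketch below illustrates that idea only; `TransientDatabaseError` and `flaky_query` are hypothetical stand-ins, not part of any real driver, and the delays are arbitrary.

```python
import time


class TransientDatabaseError(Exception):
    """Hypothetical error type standing in for a temporary outage."""


def run_with_retry(operation, attempts=5, base_delay_s=0.01, sleep=time.sleep):
    """Call operation(); on a transient error, wait and retry with a doubled delay."""
    for attempt in range(attempts):
        try:
            return operation()
        except TransientDatabaseError:
            if attempt == attempts - 1:
                raise  # out of retries: let the caller fail over or report
            sleep(base_delay_s * (2 ** attempt))


# Simulated query that fails twice before succeeding.
calls = {"count": 0}


def flaky_query():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientDatabaseError("database temporarily unavailable")
    return "ok"


result = run_with_retry(flaky_query)
print(result, "after", calls["count"], "attempts")
```

Retries only cover short interruptions. A longer outage still needs the backup or secondary-database strategy discussed above.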
The detail you are after is probably contained in this MSDN article on Business Continuity and Azure SQL Database (see: http://msdn.microsoft.com/en-us/library/windowsazure/hh852669.aspx). At the most basic level Azure SQL Database will keep three replicas of your database - one primary and two secondary.
While this helps with BCP / DR scenarios, you may also wish to investigate ways to back up your database so you have point-in-time restore capabilities. More information on backup / restore can be found here: http://msdn.microsoft.com/en-us/library/windowsazure/jj650016.aspx
I'd like to ask whether there is any sharding mechanism in SQL Server 2012 (like SQL Azure Federations in the cloud).
I've searched a lot but I couldn't find any appropriate solution that resembles Federations. There is AlwaysOn but it's not the same.
Thanks
No, this feature is not in the boxed product in SQL Server 2012. There are many ways to scale reads but merge replication and distributed partitioned views seem to still be the only viable out-of-the-box solutions for scaling writes. Note that the latter doesn't seem to be officially documented separately in SQL Server 2012 - it has all been condensed into the CREATE VIEW topic.
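The general idea behind a distributed partitioned view is that each member server holds one key range of the table, so a write goes to the member whose range contains the key. Here is a minimal Python sketch of just that routing idea. It is not SQL Server code, and the server names, key column, and range boundaries are all invented for illustration.

```python
import bisect

# Hypothetical layout: each shard owns the customer IDs below its upper bound.
SHARD_UPPER_BOUNDS = [100_000, 200_000, 300_000]   # exclusive upper bounds
SHARD_NAMES = ["server_a", "server_b", "server_c"]  # made-up server names


def shard_for(customer_id: int) -> str:
    """Return the shard whose key range contains customer_id."""
    if customer_id < 0 or customer_id >= SHARD_UPPER_BOUNDS[-1]:
        raise ValueError(f"customer_id {customer_id} is outside every shard range")
    # bisect_right counts the bounds that are <= customer_id, which is the shard index.
    index = bisect.bisect_right(SHARD_UPPER_BOUNDS, customer_id)
    return SHARD_NAMES[index]


for customer_id in (42, 100_000, 299_999):
    print(customer_id, "->", shard_for(customer_id))
```

Because the ranges do not overlap, every key maps to exactly one server, which is what lets writes scale across members instead of all landing on one machine.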
With HRD and BigTable, you are forced to deal with eventual consistency for all queries that are not ancestor queries. Your code has to be robust enough to cope with the fact that the results may be stale.
When Google launched Cloud SQL, they included a disclaimer (https://developers.google.com/cloud-sql/faq#hrapps):
"We recommend that you use Google Cloud SQL with High Replication App Engine applications. While you can use Google Cloud SQL with applications that do not use high replication, doing so might impact performance."
What does this mean? Does this mean that there are the same eventual consistency issues using SQL with HRD? There is no concept of entity groups in SQL, however could this mean that particular SQL queries in particular circumstances deliver stale results?
This would mean that Google's implementation of the SQL atomic transactional contract would be broken and SQL would not function as users of relational databases would expect.
If this is not the case, what are the concerns for having a master/slave or HRD model with SQL and why would Google give you the option of choosing a model with poorer performance?
(from forum)
The Cloud SQL and Data store systems are independent. You can use one or both as you see fit for your app.
We recommend using HRD apps because that type of app will be colocated with Cloud SQL. Master/slave apps are served from a different set of datacenters where Cloud SQL does not have a presence. It will work but it will be slower.
Quotes from documentation:
"Google Cloud SQL is, simply put, a MySQL instance that lives in the cloud. It has all the capabilities and functionality of MySQL"
To answer your question: the high replication/master-slave options for a relational DB have nothing to do with consistency. They concern other factors, like latency at peak loads and availability for writes during planned maintenance. With the high replication datastore, latency stays low even if load spikes, and it remains available for writes even during planned maintenance. Check the comparison at http://code.google.com/appengine/docs/java/datastore/hr/
As for the second part of the question, why Google would offer a master/slave option that is not foolproof: it is so that people who don't need complete uptime and want to try out GAE can use it.