As in the question, I want to increase number of queries per second on GCS. Currently, my application is on my local machine, when it runs, it repeatedly sends queries to and receives data back from the GCS server. More specifically, my location is in Vietnam, and the server (free tier though) is in Singapore. The maximum QPS I can get is ~80, which is unacceptable. I know I can get better QPS by putting my application on the cloud, same location with the SQL server, but that alone requires a lot of configuration and works. Are there any solutions for this?
Thank you in advance.
colocating your application front-end layer with the data persistence layer should be your priority: deploy your code to the cloud as well
use persistent connections/connection pooling to cut on connection establishment overhead
free tier instances for Cloud SQL do not exist. What are you referring to here? f1-micro GCE instances are not free in Singapore region either.
depending on the complexity of your queries, read/write pattern, size of your dataset, etc. performance of your DB could be I/O bound. Ensuring your instance is provisioned with SSD storage and/or increasing the data disk size can help lifting IOPS limits, further improving DB performance.
Side note: don't confuse commonly used abbreviation GCS (Google Cloud Storage) with Google Cloud SQL.
Related
I have performance issue with my .NET app hosted on an azure web app, connecting to Azure SQL DB with a custom connection string.
The more there are users, the more the app is slow. Therefore I am wondering if there are some improvements to perform at connection pool level.
How to check the pool size currently set ? How to detect sql issues when handling requests from different users ? And how to set pool size ?
Thank you for your help.
I think it's related to SQL Database resource limits for Azure SQL Database server.
The more there are users, the more the app is slow, one of the most important reason is database resource limits are reached.
Compute (DTUs and eDTUs / vCores)
When database compute utilization (measured by DTUs and eDTUs, or vCores) becomes high, query latency increases and can even time out.
Storage
When database space used reaches the max size limit, database inserts and updates that increase the data size fail and clients receive an error message. Database SELECTS and DELETES continue to succeed.
Sessions and workers (requests)
The maximum number of sessions and workers are determined by the service tier and compute size (DTUs and eDTUs). New requests are rejected when session or worker limits are reached, and clients receive an error message. While the number of connections available can be controlled by the application, the number of concurrent workers is often harder to estimate and control. This is especially true during peak load periods when database resource limits are reached and workers pile up due to longer running queries.
Fore details, please reference: What happens when database resource limits are reached.
If you Azure SQL DB is single database, you can reference these documents:
Azure SQL Database vCore-based purchasing model limits for a single database.
Resource limits for single databases using the DTU-based purchasing model.
Choose the most appropriate service tier.
About the performance issue, you also can use the Monitoring and performance tuning. It will help troubleshoot performance issue and improve the performance.
Hope this helps.
I am trying to copy 25 TB of data to Azure. Do we have any option to move the date?
Tried to copy but it has taken 1 hr for 1 GB Data, do we have any better solution so that I can do it more quickly?
The problem statement is very general. I would start with asking, how are you transferring the data?
The speed is dependent on so many factors, a few being:
1. Location of the data.
2. Location of the storage account you're writing to.
3. Network speed and bandwidth on the client side.
4. Network speed and bandwidth on the azure storage side. (expected to be good)
If you're writing the data to a Azure Storage account which is in a region closer to you, you're expected to get better speed.
As for the options to write the data:
1. Look at AzCopy.
https://azure.microsoft.com/en-us/documentation/articles/storage-use-azcopy/
Use Import\Export service.
https://azure.microsoft.com/en-us/pricing/details/storage-import-export/
The best way to upload large datasets into the cloud is still the sneakernet
Azure do a thing called the Azure Import/Export Service Basically you buy a SATA hard drive, encrypt it with a numerical bitlocker key, copy data to it, create an Azure import job, then ship the hard drive to them.
This ends up being considerably quicker than trying to upload.
An alternative you might want to look into, would be the AWS Import/Export Snowball for which they will ship you an appliance to copy the data to which you ship back to them when complete. It might be worth considering copying data into AWS via Snowball then copying it across their much faster internet pipes into Azure instead of buying the hardware required to transfer that much data.
If you open the target Storage account in the Azure Portal, there's now a calculator that will accept basic details (how much data etc) and then recommend the best options to you. Its under the heading "Data transfer".
I am running an Azure web role, which is storing very small blobs into Azure storage. (Blob upload is being done from the server, not from the browser.) I have searched stack overflow and the rest of the internet for tips on optimizing blob storage performance, and I believe I've checked and implemented all of the usual suspects: uploading async, allowing unlimited outgoing web connections (which now seems to be the default setting on web roles and no longer needs to be explicitly set in web.config or in code).
Tweaking the number of concurrent uploads I allow makes some difference, but regardless of what I've tried, I seem to max out at around 1,000 blob uploads per second. This is when running in the Azure web role, in the same region as the storage account (East US). My rate when running this from home over a good internet connection isn't much less, ~700 blobs/sec, which seems to tell me that it's not the network latency that's limiting the rate, it's the actual processing time of the storage service.
I wouldn't normally consider these rates horrible for this kind of a service, but I've read that Microsoft boasts a rate of ~20,000 storage transactions per second, so I've been a little disappointed with these results.
I'd like to get some feedback from those who have really tried to push the limits of blob storage. Does ~1000 small uploads per second sound about right? Or is there possibly something else I should be doing to improve this? I'll post the code if I need to, but I'd rather not receive speculative answers, I'd like to hear from developers who can either confirm that my results are reasonable, or that they've seen much higher throughput.
I should add that I'm currently running this in a small web role. I've tried it also in a medium web role, and didn't see any significant difference.
EDIT:
After a few days of development and testing, my upload rate seemed to suddenly increase. Not by a lot, but maybe by another ~200 per second. In looking around the web, I noticed a comment in the Azure documentation stating "A storage account scales automatically as usage increases." So I'm wondering if it really is capable of much higher rates, but will not automatically scale up until it sees sustained period of high volume. Some confirmation of that would also be greatly appreciated.
Depending on how small your requests are the problem might be caused by Nagle’s Algorithm is Not Friendly towards Small Requests - although usually I see that with queues / table operations. Try disabling Nagle's and let me know if that makes any difference. As an fyi, you have to disable it prior to establishing the connection otherwise the changes will not take effect.
Jason
I searched online for awhile about what is "Excessive resource usage" on SQL Azure, still cannot get an idea.
Some articles suggest query takes too long, too much memory etc will cause "Excessive resource usage". But If I use simple query, simple data structure, what will happen?
For example: I get a 1G SQL Azure as session state. Since session is a very small string, and save/delete all the time, I don't think it will grow to 1G for millions of session simultaneously. You can calculate, for 1 million session, 20 char each, only take 20M space, consider 20 minutes expire etc. Cannot even close to 1G. But the queries, should be lots and lots. Each query will be very simple and fast by index.
I wanna know, if this use will be consider as "Excessive resource usage"? Is there any hard number to limit you on the usage?
Btw, as example above, if all happen in same datacenter, so all cost is 1G database which is $10 a month, right?
Unfortunately the answer is 'it depends'. I think that probably the best reference (with guidance) on the SQL Azure Query Throttle is here: TechNet Article on SQL Azure Perormance This will povide details about the metrics that are monitored and the mechanism of the throttle.
The reason that I say it depends is that the throttle is non-deterministic for any given user. This is because the throttle will be activated based on the total load on the node (physical SQL Server in Azure DC). While the subscribers who will get throttled are the subscribers delivering the greatest load the level at which the throttle kicks in will depend on the total load on the node. SO if you are on a quiet node (where other tenant DBs are relatively inactive) then you will be able to put through a bunch more throughput than if you are on a busy node.
It is very appealing to use 1GB SQL Azure DBs for session state storage; you've identified the cost benefits. You are taking a risk though. One way to mitigate this risk is to partition across at least two SQL Azure 1GB DBs and adjust the load yourself based on whether one of the DBs starts hitting the throttle.
Another option if you want determinism for throughput is to use the WIndows Azure Cache to back your sesion state store. The Cache has hard pre-defined limits for query throughput so you can plan for it more easily Azure Caching FAQ including Limits. The Cache approach is probably a bit more expensive but with a lower risk of problems.
We have created a product that potentially will generate tons of requests for a data file that resides on our server. Currently we have a shared hosting server that runs a PHP script to query the DB and generate the data file for each user request. This is not efficient and has not been a problem so far but we want to move to a more scalable system so we're looking in to EC2. Our main concerns are being able to handle high amounts of traffic when they occur, and to provide low latency to users downloading the data files.
I'm not 100% sure on how this is all going to work yet but this is the idea:
We use an EC2 instance to host our admin panel and to generate the files that are being served to app users. When any admin makes a change that affects these data files (which are downloaded by users), we make a copy over to S3 using CloudFront. The idea here is to get data cached and waiting on S3 so we can keep our compute times low, and to use CloudFront to get low latency for all users requesting the files.
I am still learning the system and wanted to know if anyone had any feedback on this idea or insight in to how it all might work. I'm also curious about the purpose of projects like Cassandra. My understanding is that simply putting our application on EC2 servers makes it scalable by the nature of the servers. Is Cassandra just about keeping resource usage low, or is there a reason to use a system like this even when on EC2?
CloudFront: http://aws.amazon.com/cloudfront/
EC2: http://aws.amazon.com/cloudfront/
Cassandra: http://cassandra.apache.org/
Cassandra is a non-relational database engine and if this is what you need, you should first evaluate Amazon's SimpleDB : a non-relational database engine built on top of S3.
If the file only needs to be updated based on time (daily, hourly, ...) then this seems like a reasonable solution. But you may consider placing a load balancer in front of 2 EC2 images, each running a copy of your application. This would make it easier to scale later and safer if one instance fails.
Some other services you should read up on:
http://aws.amazon.com/elasticloadbalancing/ -- Amazons load balancer solution.
http://aws.amazon.com/sqs/ -- Used to pass messages between systems, in your DA (distributed architecture). For example if you wanted the systems that create the data file to be different than the ones hosting the site.
http://aws.amazon.com/autoscaling/ -- Allows you to adjust the number of instances online based on traffic
Make sure to have a good backup process with EC2, snapshot your OS drive often and place any volatile data (e.g. a database files) on an EBS block. EC2 doesn't fail often but when it does you don't have access to the hardware, and if you have an up to date snapshot you can just kick a new instance online.
Depending on the datasets, Cassandra can also significantly improve response times for queries.
There is an excellent explanation of the data structure used in NoSQL solutions that may help you see if this is an appropriate solution to help:
WTF is a Super Column