I have an Azure Function that writes to Azure SQL. It currently reads from a topic but could be changed to read from a queue. Is there a preference to read from a topic or a queue?
There are 200K messages per hour hitting the topic. I need to write the 200K messages per hour to Azure SQL. During processing I regularly get the error "The request limit for the database is 60 and has been reached.". I understand that I've hit the maximum number of DB connections. Is there a way to stop Azure from scaling up the number of Azure Function instances? What's the best way to share SQL connections?
Any other Azure Function to Azure SQL performance tips?
Thanks,
Richard
There is no well-defined way to achieve this with Service Bus. You may want to play with the host.json file and change the maxConcurrentCalls parameter:
"serviceBus": {
// The maximum number of concurrent calls to the callback the message
// pump should initiate. The default is 16.
"maxConcurrentCalls": XYZ,
}
but it only controls the number of parallel calls on a single instance.
I would suggest you look at Event Hubs. You get at least two bonuses:
You can switch to batches of events instead of one-by-one processing. This is usually a very effective way to insert a large amount of data into a SQL table (see the sketch below).
Max concurrency is limited by the number of Event Hub partitions, so you know the hard limit of concurrent calls.
On the downside, you would lose some Service Bus features such as dead-lettering, automatic retries, etc.
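As a rough sketch of the batch idea (the hub name, connection setting names, and target table are placeholders, and this assumes the C# class-library programming model with the Event Hubs trigger binding):

    using System;
    using System.Data;
    using System.Data.SqlClient;
    using System.Text;
    using Microsoft.Azure.EventHubs;
    using Microsoft.Azure.WebJobs;

    public static class BatchToSql
    {
        [FunctionName("BatchToSql")]
        public static void Run(
            // Receive a whole batch per invocation instead of one message at a time.
            [EventHubTrigger("my-hub", Connection = "EventHubConnection")] EventData[] batch)
        {
            var table = new DataTable();
            table.Columns.Add("Body", typeof(string));
            foreach (var evt in batch)
                table.Rows.Add(Encoding.UTF8.GetString(evt.Body.Array, evt.Body.Offset, evt.Body.Count));

            // One bulk insert per batch keeps both the request count and the
            // connection count on the SQL side low.
            using (var conn = new SqlConnection(Environment.GetEnvironmentVariable("SqlConnection")))
            {
                conn.Open();
                using (var bulk = new SqlBulkCopy(conn) { DestinationTableName = "dbo.Messages" })
                {
                    bulk.WriteToServer(table);
                }
            }
        }
    }

At 200K messages per hour (roughly 55/s), even a modest batch size cuts the SQL request rate dramatically compared with one insert per message.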
I have a performance issue with my .NET app hosted on an Azure web app, connecting to Azure SQL DB with a custom connection string.
The more users there are, the slower the app becomes, so I am wondering whether there are improvements to make at the connection pool level.
How do I check the pool size currently set? How do I detect SQL issues when handling requests from different users? And how do I set the pool size?
Thank you for your help.
I think it's related to the resource limits of your Azure SQL Database server.
The more users there are, the slower the app gets; one of the most important reasons is that database resource limits are being reached.
Compute (DTUs and eDTUs / vCores)
When database compute utilization (measured by DTUs and eDTUs, or vCores) becomes high, query latency increases and queries can even time out.
Storage
When the space used by a database reaches the max size limit, inserts and updates that increase the data size fail and clients receive an error message. Database SELECT and DELETE statements continue to succeed.
Sessions and workers (requests)
The maximum number of sessions and workers are determined by the service tier and compute size (DTUs and eDTUs). New requests are rejected when session or worker limits are reached, and clients receive an error message. While the number of connections available can be controlled by the application, the number of concurrent workers is often harder to estimate and control. This is especially true during peak load periods when database resource limits are reached and workers pile up due to longer running queries.
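For what it's worth, with ADO.NET the pool is configured per connection string, so capping connections from the application side could look like this (a minimal sketch; the server, database, and credentials are placeholders):

    using System.Data.SqlClient;

    // Pooling is on by default; Max Pool Size defaults to 100 when not set.
    var builder = new SqlConnectionStringBuilder
    {
        DataSource = "yourserver.database.windows.net",
        InitialCatalog = "yourdb",
        UserID = "youruser",
        Password = "yourpassword",
        Pooling = true,
        MinPoolSize = 5,    // keep a few connections warm
        MaxPoolSize = 100   // hard cap on connections from this app instance
    };

    using (var conn = new SqlConnection(builder.ConnectionString))
    {
        conn.Open(); // drawn from the pool, or created if under the cap
    }

To see how many sessions the app actually holds, you can count the rows in sys.dm_exec_sessions on the server for your login.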
For details, please see: What happens when database resource limits are reached.
If your Azure SQL DB is a single database, you can reference these documents:
Azure SQL Database vCore-based purchasing model limits for a single database.
Resource limits for single databases using the DTU-based purchasing model.
Choose the most appropriate service tier.
Regarding the performance issue itself, you can also use Monitoring and performance tuning. It will help you troubleshoot and improve performance.
Hope this helps.
My company uses IBM MQ's Multi-Instance Queues right now. We would like to replicate those queues to a different data center over the WAN for Disaster Recovery purposes. I'm skeptical it will work, simply because of all the message traffic; even a slight delay will cause the queues to fail.
What is the technical reason why this will not work?
Are you talking about storage replication? If so are you planning to use synchronous or asynchronous replication?
Async replication will not cause any delay on the replicating end, but there will be some delay before the receiving end receives the data, depending on the network distance. Your storage team should be able to tell you how many seconds the async replication delay could be.
With sync replication, the data is sent over the network by the replicating end's storage array, and a confirmation comes back over the network before the storage array tells the OS that the write was successful. To be usable, the two arrays have to be within 6 ms of each other. This type of replication adds a delay to each write equal to the network round trip in ms.
MQ applications can batch messages into a single unit of work to improve performance when sync replication is in place, but persistent message throughput will still be slower (see the sketch below).
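For illustration, batching puts under syncpoint could look like this with the IBM MQ .NET classes (a sketch only; the queue manager name, queue name, and batch size are placeholders):

    using IBM.WMQ;

    var qMgr = new MQQueueManager("QM1");
    var queue = qMgr.AccessQueue("APP.QUEUE", MQC.MQOO_OUTPUT + MQC.MQOO_FAIL_IF_QUIESCING);

    // MQPMO_SYNCPOINT makes each put part of the current unit of work.
    var pmo = new MQPutMessageOptions { Options = MQC.MQPMO_SYNCPOINT };

    for (int i = 0; i < 100; i++)
    {
        var msg = new MQMessage { Persistence = MQC.MQPER_PERSISTENT };
        msg.WriteString("payload " + i);
        queue.Put(msg, pmo);
    }

    // A single commit hardens the whole batch, so the replicated log
    // write is paid once per batch instead of once per message.
    qMgr.Commit();

    queue.Close();
    qMgr.Disconnect();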
Define "Slight delay" in your statement?
Async replication will cause a delay, and the RPO will not be zero. Your storage team can advise on the RPO value. If that is not acceptable, async replication is not an option for you.
Although it's a pragmatic choice from a cost and distance standpoint, it could cause duplicate or missing transactions.
For sync replication, the distance between data centers is limited (apart from the performance hit on the primary DC). Check with your storage team on the distance limit.
In our current project we use 8 worker role machines side by side that actually work a little differently than Azure may expect.
Short outline of the system:
each worker starts up to 8 processes that connect to a cloud queue and process messages
each process accesses three different cloud queues that collect messages for different purposes (delta recognition, backup, metadata)
each message leads to a WCF call to an ERP system to gather information, and the retrieved response is finally added to a Redis cache
This approach was chosen over many smaller machines due to cost and performance: while 24 one-core machines would make about 400 calls/s to the ERP system, 8 four-core machines with 8 processes each do over 800 calls/s.
Now to the question: when increasing the count of machines further to push performance to 1200 calls/s, we experienced outages of Cloud Queue. At the same moment in time, the processes on 80% of the machines stopped processing messages.
Here we have two problems:
Remote debugging is not possible for these processes, but it was possible to use Dile to get some information out.
We use the GetMessages method of Cloud Queue to get up to 4 messages from the queue. Cloud Queue always answers with 0 messages. Reconnecting to the cloud queue does not help.
Restarting the workers does help, but shortly leads to the same problem.
Are we hitting the natural end of Cloud Queue's scalability, and should we switch to Service Bus?
Update:
I have not been able to fully understand the problem; I described it in the post on the natural borders of Cloud Queue.
To summarize:
The count of TCP connections was impressive. Actually, too impressive (multiple hundreds).
Going back to the original memory size let the system operate normally again.
In my experience I have been able to get better raw performance out of Azure Cloud Queues than Service Bus, but Service Bus has better enterprise features (reliability, topics, etc.). Azure Cloud Queue should process up to 2K messages/second per queue.
https://azure.microsoft.com/en-us/documentation/articles/storage-scalability-targets/
You can also try partitioning across multiple queues if there is some natural partition key.
Make sure that your processes don't have some sort of thread deadlock that is the real culprit. You can test this by connecting to the queue when it appears hung and trying to pull messages from it. If that works, it is your process, not the queue.
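A quick probe with the classic storage SDK might look like this (assuming Microsoft.WindowsAzure.Storage; the connection string and queue name are placeholders):

    using System;
    using Microsoft.WindowsAzure.Storage;
    using Microsoft.WindowsAzure.Storage.Queue;

    // Run this from a separate machine/process while the workers appear hung.
    var account = CloudStorageAccount.Parse("DefaultEndpointsProtocol=https;AccountName=...;AccountKey=...");
    var queue = account.CreateCloudQueueClient().GetQueueReference("delta-recognition");

    queue.FetchAttributes();
    Console.WriteLine("Approximate message count: " + queue.ApproximateMessageCount);

    // If this returns messages, the queue is healthy and the stuck
    // processes are the culprit, not the queue.
    foreach (var msg in queue.GetMessages(4))
        Console.WriteLine(msg.AsString);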
Also take a look at this to set up some other monitors:
https://azure.microsoft.com/en-us/documentation/articles/storage-monitor-storage-account/
It took some time to solve this issue:
First a summarization of the usage of the storage account:
We used the blob storage once a day pretty heavily.
The "normal" diagonistics that Azure provides out of the box also used the same storage account.
Some controlling processes used small tables to store and read information once an hour for ca. 20 minutes
There may be up to 800 calls/s that try to increase a number to count calls to an ERP system.
When recognizing that the storage account is put under heavy load we split it up.
Now there are three physical storage accounts heaving 2 queues.
The original one still keeps up to 800/s calls for increasing counters
Diagnositics are still on the original one
Controlling information has been also moved
The system runs now for 2 weeks, working like a charm. There are several things we learned from that:
No, the infrastructure is "not just there" and it doesn't scale endlessly.
Even if we thought we didn't use "that much", summed up we used it quite heavily and in an uncontrolled way.
There are no "best practices" anywhere on the net that tell the complete story. Especially when starting to work with storage accounts, a guide from MS would be quite helpful.
Exception handling in storage is quite bad. Even if the storage account is overused, I would expect some kind of exception, not just zero messages returned without any surrounding information.
Read the complete story here: natural borders of cloud storage scalability
UPDATE:
Scalability has a lot of influences. You may be interested in Azure Service Bus: Massive count of listeners and senders to become aware of some more pitfalls.
I searched online for a while about what "excessive resource usage" means on SQL Azure, and I still cannot get an idea.
Some articles suggest that queries that take too long, use too much memory, etc. will cause "excessive resource usage". But if I use simple queries and a simple data structure, what will happen?
For example: I get a 1 GB SQL Azure database for session state. Since a session is a very small string, saved and deleted all the time, I don't think it will grow to 1 GB even for millions of simultaneous sessions. You can do the math: 1 million sessions at 20 chars each take only about 20 MB of space, and with a 20-minute expiry etc. it cannot even get close to 1 GB. But there would be lots and lots of queries, each very simple and fast by index.
I want to know whether this usage would be considered "excessive resource usage". Is there any hard number that limits you?
By the way, in the example above, if everything happens in the same datacenter, the only cost is the 1 GB database at $10 a month, right?
Unfortunately the answer is 'it depends'. Probably the best reference (with guidance) on the SQL Azure query throttle is here: TechNet Article on SQL Azure Performance. It provides details about the metrics that are monitored and the mechanism of the throttle.
The reason I say it depends is that the throttle is non-deterministic for any given user, because it is activated based on the total load on the node (the physical SQL Server in the Azure DC). While the subscribers who get throttled are the ones delivering the greatest load, the level at which the throttle kicks in depends on the total load on the node. So if you are on a quiet node (where the other tenants' DBs are relatively inactive), you will be able to put through a good deal more throughput than if you are on a busy node.
It is very appealing to use 1 GB SQL Azure DBs for session state storage; you've identified the cost benefits. You are taking a risk, though. One way to mitigate this risk is to partition across at least two 1 GB SQL Azure DBs and adjust the load yourself based on whether one of the DBs starts hitting the throttle (see the sketch below).
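A rough sketch of that partitioning idea (everything here is hypothetical: the shard connection strings and the hash are just one way to split the load):

    using System.Data.SqlClient;

    static class SessionShards
    {
        // Hypothetical shard list: two 1 GB SQL Azure DBs.
        static readonly string[] Shards =
        {
            "Server=tcp:shard0.database.windows.net;Database=Sessions0;...",
            "Server=tcp:shard1.database.windows.net;Database=Sessions1;..."
        };

        // Stable hash so a session always routes to the same DB
        // (string.GetHashCode can differ between processes).
        public static SqlConnection GetConnection(string sessionId)
        {
            int hash = 0;
            foreach (char c in sessionId)
                hash = (hash * 31 + c) & 0x7FFFFFFF;
            return new SqlConnection(Shards[hash % Shards.Length]);
        }
    }

If one shard starts hitting the throttle, you can grow the shard list and rebalance; since sessions are short-lived, the migration cost is low.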
Another option, if you want deterministic throughput, is to use the Windows Azure Cache to back your session state store. The Cache has hard pre-defined limits for query throughput, so you can plan for it more easily: Azure Caching FAQ including Limits. The Cache approach is probably a bit more expensive, but comes with a lower risk of problems.
I'm working on a real-time application and building it on Azure.
The idea is that every user reports something about himself, and all the other users should see it immediately (they poll the service every second or so for new info).
My approach so far has been a Web Role with a WCF REST Service where I do all the writing to the DB (SQL Azure) without a Worker Role, so that it is written immediately.
I've come to think that using a Worker Role and a queue to do the writing might be much more scalable, but it might interfere with the real-time side of the service (the Worker Role might not take the job from the queue immediately).
Is that true? How should I go about this issue?
Thanks
While it's true that the queue will add a bit of latency, you'll be able to scale out the number of Worker Role instances to handle the sheer volume of messages.
You can also optimize queue-reading by getting more than one message at a time. Since a single queue has a scalability target of 500 TPS, this lets you go well beyond 500 messages per second on reads.
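For example, with the storage SDK of that era you can pull up to 32 messages per round trip (a sketch; the connection string, queue name, and processing step are placeholders):

    using System;
    using Microsoft.WindowsAzure.Storage;
    using Microsoft.WindowsAzure.Storage.Queue;

    var connectionString = "DefaultEndpointsProtocol=https;AccountName=...;AccountKey=...";
    var queue = CloudStorageAccount.Parse(connectionString)
        .CreateCloudQueueClient()
        .GetQueueReference("user-updates");

    while (true)
    {
        // Up to 32 messages per call; each stays invisible for 5 minutes
        // so other Worker Role instances won't pick it up in parallel.
        foreach (var msg in queue.GetMessages(32, TimeSpan.FromMinutes(5)))
        {
            // ... write the update to SQL Azure ...
            queue.DeleteMessage(msg);
        }
    }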
You might look into a cache for buffering the latest user updates, so that when polling occurs, your service reads from the cache instead of SQL Azure. That may help as the volume of information increases.
You could have a look at SignalR. It does not support farm scenarios out of the box, but it should be able to work with either internal endpoint calls to update every instance, the Azure Service Bus, or the AppFabric Cache. This way you get a push scenario rather than a pull scenario, so you don't have to poll your endpoints for potential updates.