Stream Analytics job has validation errors: Job will exceed the maximum amount of Event Hub Receivers - azure-stream-analytics

I am trying to write a query in ASA (Azure Stream Analytics) with a lot of LEFT JOINs, and the job fails to start with the following error:
Stream Analytics job has validation errors: Job will exceed the maximum amount of Event Hub Receivers.

You are most likely hitting Event Hub's 5 readers limit. Take a look at this article.
Here is the relevant part:
Consumer groups
... When a job contains a self-join or multiple inputs, some input may be read by more than one reader downstream, which impacts the number of readers in a single consumer group. To avoid exceeding the Event Hub limit of 5 readers per consumer group per partition, it is a best practice to designate a consumer group for each Stream Analytics job.
...
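If you keep several jobs (or a job with many inputs and joins) on the same hub, giving each job its own consumer group is the usual fix. As a rough sketch, assuming the azure-mgmt-eventhub and azure-identity Python packages and placeholder subscription/resource/job names, dedicated consumer groups could be created like this:

from azure.identity import DefaultAzureCredential
from azure.mgmt.eventhub import EventHubManagementClient

# Placeholder names -- substitute your own subscription, resource group, namespace and hub.
client = EventHubManagementClient(DefaultAzureCredential(), "<subscription-id>")

# One consumer group per Stream Analytics job keeps each job's readers counted
# against its own allowance of 5 readers per consumer group per partition.
for job_name in ["asa-job-alarms", "asa-job-aggregates"]:
    client.consumer_groups.create_or_update(
        resource_group_name="my-rg",
        namespace_name="my-eventhub-namespace",
        event_hub_name="telemetry",
        consumer_group_name=job_name,
        parameters={},
    )

Each job's Event Hub input is then pointed at its own consumer group rather than $Default.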

How to achieve dynamic fair processing between batch tasks?

Our use case is that our system supports scheduling multiple multi-channel send jobs at any time. Multi-channel send means sending emails, notifications, SMS, etc.
How it currently works: we have one SQS queue per channel. Whenever a job is scheduled, it pushes all of its send records to the appropriate channel's queue. Any job scheduled later pushes its own send records to the same channel queue, and so on. This leads to starvation of later-scheduled jobs if the first scheduled job is high volume, as its records will be processed from the queue before the second job's records are reached.
On the consumer side, our processing rate is much lower than the incoming rate, since we can only do a fixed number of sends per hour. So a high-volume job can keep running for a long time after being scheduled.
To solve the starvation problem, our first idea was to create three queues per channel (low, medium and high volume), with jobs submitted to a queue according to their volume. The problem is that if two or more jobs of the same volume arrive, we still face the same issue.
The only guaranteed way to ensure fair processing with no starvation seems to be a queue per job, created dynamically. Consumers would process each queue at an equal rate, so the processing bandwidth gets divided between jobs. A high-volume job might take a long time to complete, but it won't choke processing for other jobs.
We could create the SQS queues dynamically for every scheduled job, but that would mean monitoring perhaps 50+ queues at some point. A better choice seemed to be a Kinesis stream with multiple shards, where we would need to ensure every shard contains only a single partition key identifying a single job; I am not sure whether that is possible, though.
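For illustration, here is a rough sketch of that queue-per-job idea with SQS and boto3 (the "job-" queue-name prefix, the per-queue batch size and the send_record stub are just assumptions):

import time
import boto3

sqs = boto3.client("sqs")

def job_queue_urls(prefix="job-"):
    # Discover the per-job queues dynamically instead of hard-coding them.
    return sqs.list_queues(QueueNamePrefix=prefix).get("QueueUrls", [])

def send_record(body):
    print("sending", body)        # stand-in for the actual channel send

def fair_poll(batch_per_queue=10):
    # Round-robin: take at most a fixed batch from each job's queue per pass,
    # so a single huge job cannot monopolise the hourly send budget.
    for url in job_queue_urls():
        resp = sqs.receive_message(QueueUrl=url,
                                   MaxNumberOfMessages=batch_per_queue,
                                   WaitTimeSeconds=0)
        for msg in resp.get("Messages", []):
            send_record(msg["Body"])
            sqs.delete_message(QueueUrl=url, ReceiptHandle=msg["ReceiptHandle"])

if __name__ == "__main__":
    while True:
        fair_poll()
        time.sleep(1)             # avoid a tight loop when all queues are empty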
Are there any better ways to achieve this, so we can do fair processing and not starve any job?
If this is not the right community for such questions, please let me know.

How does spring-cloud-stream Kafka Reactive partition assignment work with concurrent processing?

For example, I configure a topic to have 2 partitions, but in my application (a single instance) I use Flux.parallel(10) to consume the messages, and there is a 1000-message lag on that topic. What will happen?
Will it poll 10 messages at a time? From 2 partitions or from 1 partition?
Or only poll 2 messages, 1 from each partition?
I want to know how this works, so I can configure it correctly for high throughput while preserving consumption order.
BTW, I found this issue, but there is no answer there.
It's better to use multiple receivers instead.
Using parallel can cause problems with out-of-order offset commits.
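This question is about the spring-cloud-stream reactive binder, but the "multiple receivers" idea can be sketched with plain Kafka consumers. For illustration only, a Python sketch using the confluent-kafka client (the topic name, group id and broker address are made up):

from confluent_kafka import Consumer

def run_receiver():
    # Start one of these per desired receiver (e.g. two processes for a
    # 2-partition topic). Consumers in the same group split the partitions,
    # each partition is processed sequentially, and offsets are committed in
    # order -- which a parallel pipeline cannot guarantee.
    consumer = Consumer({
        "bootstrap.servers": "localhost:9092",
        "group.id": "order-processor",
        "auto.offset.reset": "earliest",
        "enable.auto.commit": False,       # commit only after processing
    })
    consumer.subscribe(["orders"])
    while True:
        msg = consumer.poll(1.0)
        if msg is None or msg.error():
            continue
        handle(msg.value())                # per-partition, in-order processing
        consumer.commit(message=msg)

def handle(value):
    print("processed", value)

if __name__ == "__main__":
    run_receiver()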

Need burst speed messages per second for devices at various times during a day with Azure IoT hub

While Azure Event Hubs can handle thousands (even millions?) of messages per second, Azure IoT Hub has surprisingly low limits here:
S1 has a speed of 12 msg/sec but allows 400,000 daily messages per unit
S2 has a speed of 120 msg/sec but allows 6,000,000 daily messages per unit
S3 has a speed of 6,000 msg/sec but allows 300,000,000 daily messages per unit.
Imagine an IoT solution where your devices normally send 1 message every hour, but can activate a short "realtime" mode that sends a message every second for about 2 minutes.
Example: 10,000 IoT devices:
Let's say 20% of these devices happen to start a realtime-mode session simultaneously 4 times a day (we have no control over when individual customers start them). That is 2,000 devices, and the burst speed needed is then 2,000 msg/second.
Daily message need:
Normal messages: 10,000 devices * 24 hours = 240,000 msg/day
Realtime messages: 2,000 devices * 120 msg (2 minutes at 1 msg/second) * 4 times a day = 960,000 msg/day
Total daily message count: 240,000 + 960,000 = 1,200,000 msg/day.
Needed Azure IoT Hub tier: S1 with 3 units gives 1,200,000 msg/day ($25 * 3 units = $75/month).
Burst speed needed:
2,000 devices sending simultaneously every second for a couple of minutes a few times a day: 2,000 msg/second. Needed Azure IoT Hub tier: S2 with 17 units gives 2,040 msg/second ($250 * 17 units = $4,250/month), or S3 with 1 unit, which gives 6,000 msg/second ($2,500/month).
The daily message count requires only a low IoT Hub tier because of the modest messages-per-day total, but the burst speed needed when realtime mode is activated requires a disproportionately high IoT Hub tier, which skyrockets the monthly cost (roughly 33x) and ruins the business case.
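To make the arithmetic explicit, here is the same calculation as a short script (per-unit limits and prices as listed above):

import math

DEVICES = 10_000
REALTIME_SHARE = 0.20          # 20% of devices in realtime mode at once
REALTIME_MSGS = 120            # 2 minutes at 1 msg/second
BURSTS_PER_DAY = 4

daily_msgs = DEVICES * 24 + int(DEVICES * REALTIME_SHARE) * REALTIME_MSGS * BURSTS_PER_DAY
burst_rate = int(DEVICES * REALTIME_SHARE)      # msg/second while a burst is active

print(f"daily messages: {daily_msgs:,}")        # 1,200,000
print(f"burst rate:     {burst_rate:,} msg/s")  # 2,000

# Per-unit figures from the tier list above: (msg/sec, msg/day, $/month)
tiers = {"S1": (12, 400_000, 25), "S2": (120, 6_000_000, 250), "S3": (6_000, 300_000_000, 2_500)}

for name, (rate, per_day, price) in tiers.items():
    by_volume = math.ceil(daily_msgs / per_day)   # units needed for daily volume
    by_burst = math.ceil(burst_rate / rate)       # units needed for burst speed
    print(f"{name}: {by_volume} unit(s) for volume (${by_volume * price}/mo), "
          f"{by_burst} unit(s) for burst (${by_burst * price}/mo)")

# S1: 3 units cover the daily volume ($75/mo), S2 needs 17 units for the burst
# ($4,250/mo), S3 needs 1 unit ($2,500/mo) -- roughly 33x the $75 baseline.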
Is it possible to allow temporary burst speeds at varying times during the day, as long as the total number of daily messages sent does not surpass the current tier's maximum limit?
I understood from a 2016 article by Nicole Berdy that the throttling on Azure IoT Hub is in place to avoid DDoS attacks and misuse. However, to be able to simulate realtime-mode functionality with Azure IoT Hub we need more Event Hub-like messages-per-second speed. Can this be opened up by contacting support or something? Will it help if the whole solution lives inside its own protected network bubble?
Thanks,
For real-time needs, definitely consider Azure IoT Edge and double-check whether it can be implemented in your scenario.
In the calculations above you state, for example, that S2 has a speed of 120 msg/sec. That is not entirely correct. Let me explain:
The throttle for device-to-cloud sends is applied only if you exceed 120 send operations/sec/unit.
Each message can be up to 256 KB, which is the maximum message size.
Therefore, the questions you need to answer to successfully implement your scenario with the lowest cost possible are:
What is the message size of my devices?
Do I need to display messages in near-real-time on the customer's cloud environment, or is my concern the level of detail of the sensors during a specific period?
When I enable "burst mode", am I leveraging the batch mode of the Azure IoT SDK? (See the batching sketch at the end of this answer.)
To your questions:
Is it possible to allow for temporary burst speeds at varying times
during a day as long as the total number of daily messages sent does
not surpass current tier max limit?
No, the limit (for example, for S2) is 120 device-to-cloud send operations/sec/unit.
Can this be opened up by contacting support or something? Will it help
if the whole solution is living inside its own protected network
bubble?
No. The only exception is when you need to increase the total number of devices plus modules that can be registered to a single IoT Hub beyond 1,000,000; in that case you should contact Microsoft Support.
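Regarding the batching question above: the throttle counts send operations, not bytes, and a single device-to-cloud message can be up to 256 KB, so packing several one-second readings into one message reduces the send operations per second a burst requires. A rough sketch with the azure-iot-device Python SDK (the connection string, payload shape and 30-second window are just illustrative assumptions):

import json
import time
from azure.iot.device import IoTHubDeviceClient, Message

client = IoTHubDeviceClient.create_from_connection_string("<device-connection-string>")
client.connect()

BATCH_SECONDS = 30     # assumed window: 30 one-second readings -> 1 send operation

def realtime_mode(read_sensor, duration_seconds=120):
    batch = []
    for _ in range(duration_seconds):
        batch.append({"ts": time.time(), "value": read_sensor()})
        time.sleep(1)
        if len(batch) >= BATCH_SECONDS:
            msg = Message(json.dumps(batch))     # one message carrying many readings
            msg.content_type = "application/json"
            msg.content_encoding = "utf-8"
            client.send_message(msg)
            batch = []
    if batch:
        client.send_message(Message(json.dumps(batch)))

# 2,000 bursting devices then generate ~67 send operations/second hub-wide
# instead of 2,000, at the cost of up to 30 seconds of extra latency.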

Azure Function to Azure SQL - Performance and Scaling

I have an Azure Function that writes to Azure SQL. It currently reads from a topic but could be changed to read from a queue. Is there a preference to read from a topic or a queue?
There are 200K messages per hour hitting the topic. I need to write the 200K messages per hour to Azure SQL. During processing I regularly get the error "The request limit for the database is 60 and has been reached.". I understand that I've hit the maximum number of DB connections. Is there a way to stop Azure from scaling up the number of Azure Function instances? What's the best way to share SQL connections?
Any other Azure Function to Azure SQL performance tips?
Thanks,
Richard
There is no well-defined way to achieve this with Service Bus. You may want to play with the host.json file and change the maxConcurrentCalls parameter:
"serviceBus": {
// The maximum number of concurrent calls to the callback the message
// pump should initiate. The default is 16.
"maxConcurrentCalls": XYZ,
}
but it only controls the number of parallel calls on a single instance.
I would suggest you look at Event Hubs. You get at least two bonuses:
You can switch to processing batches of events instead of one-by-one processing. This is usually a very effective way to insert large amounts of data into a SQL table (see the sketch below).
Max concurrency is limited by the number of Event Hub partitions, so you know the hard limit on concurrent calls.
On the downside, you would lose some Service Bus features like dead-lettering, automatic retries, etc.
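To illustrate the first bonus, here is a rough sketch of a batched insert from an Event Hub-triggered function, written as a Python function with pyodbc (the table name and connection-string setting are placeholders; the trigger binding must set cardinality to "many"):

import os
from typing import List

import azure.functions as func
import pyodbc

def main(events: List[func.EventHubEvent]) -> None:
    # With cardinality set to "many" on the Event Hub trigger binding, the
    # function receives a whole batch of events per invocation.
    rows = [(e.get_body().decode("utf-8"),) for e in events]

    conn = pyodbc.connect(os.environ["SQL_CONNECTION_STRING"])
    try:
        cursor = conn.cursor()
        cursor.fast_executemany = True      # single round trip for the batch
        cursor.executemany("INSERT INTO dbo.Messages (Payload) VALUES (?)", rows)
        conn.commit()
    finally:
        conn.close()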

Stream analytics small rules on high amount of device data

We have the following situation.
We have multiple devices sending data to an event hub (the interval is one second).
We have a lot of small Stream Analytics rules for alarm checks. The rules are applied to a small subset of the devices.
Example:
10,000 devices sending data every second.
Rules for roughly 10 devices.
Our problem:
Each Stream Analytics query processes all of the input data, although the job only needs a small subset of it. Each query filters on device id and discards most of the data. Thus we need a huge number of streaming units, which leads to high Stream Analytics costs.
Our first idea was to create an event hub for each query. However, each event hub has at least one throughput unit, which also leads to high costs.
What is the best solution in our case?
One possible solution would be to use IoT Hub and create a custom endpoint with a specific route for the devices you want to monitor.
Have a look at this blog post to see if this will work for your particular scenario: https://azure.microsoft.com/en-us/blog/azure-iot-hub-message-routing-enhances-device-telemetry-and-optimizes-iot-infrastructure-resources/
Then, in Azure Stream Analytics, you can use this specific endpoint as the input.
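As a purely illustrative sketch of the device side: the handful of monitored devices could tag their telemetry with an application property, a route whose condition matches that property would forward only those messages to a custom endpoint, and that endpoint becomes the Stream Analytics input. Using the azure-iot-device Python SDK (property name and values are assumptions):

import json
from azure.iot.device import IoTHubDeviceClient, Message

client = IoTHubDeviceClient.create_from_connection_string("<device-connection-string>")

def send_telemetry(reading, monitored):
    msg = Message(json.dumps(reading))
    if monitored:
        # A route whose condition is, e.g.,  alarm = 'true'  would forward only
        # these messages to the custom endpoint used as the ASA input.
        msg.custom_properties["alarm"] = "true"
    client.send_message(msg)

send_telemetry({"deviceId": "pump-17", "temperature": 81.4}, monitored=True)

That way the Stream Analytics job only pays to process the handful of monitored devices rather than the full 10,000-device stream.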
Thanks,
JS (Azure Stream Analytics team)