Is there a recommended way in the Azure ecosystem to join the JSON messages sent by two or more separate devices at approximately the same time, in order to run them through, for example, an Azure ML web service?
The goal of this would be running a real time analysis with data coming from multiple devices.
Thank you
Edit:
Perhaps I should have phrased my question better, but I am currently using Azure Stream Analytics in order to capture the data sent from a device to Azure ML, which works fine (from learn.microsoft.com/en-us/azure/iot-hub/…). Now I want to do the same thing but with multiple devices that each send part of the information that Azure ML needs.
I think what you are looking for is Azure Stream Analytics (ASA), which allows you to work on windows of time.
This article shows how to integrate ASA with Machine Learning.
And you can easily set the input of an ASA job to an IoT Hub.
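To make the windowing idea concrete, here is a minimal Python sketch of the logic only. It is not ASA query syntax, and in a real job the grouping would be expressed in the Stream Analytics query. The field names (`deviceId`, `ts`, `payload`), the 5-second window, and the device names are assumptions made for illustration.

```python
import json
from collections import defaultdict

WINDOW_SECONDS = 5  # assumed window size, not taken from the question


def window_start(ts, size=WINDOW_SECONDS):
    """Return the start of the tumbling window that contains timestamp ts."""
    return int(ts // size) * size


def join_by_window(messages, required_devices):
    """Merge the payloads of all required devices that reported in the same window."""
    windows = defaultdict(dict)
    for raw in messages:
        msg = json.loads(raw)
        # A later message from the same device in the same window
        # overwrites the earlier one.
        windows[window_start(msg["ts"])][msg["deviceId"]] = msg["payload"]

    joined = []
    for start in sorted(windows):
        by_device = windows[start]
        # Only emit a row when every required device contributed its part.
        if required_devices.issubset(by_device):
            merged = {"windowStart": start}
            for device in sorted(required_devices):
                merged.update(by_device[device])
            joined.append(merged)
    return joined


messages = [
    json.dumps({"deviceId": "dev-a", "ts": 100.2, "payload": {"temperature": 21.5}}),
    json.dumps({"deviceId": "dev-b", "ts": 101.9, "payload": {"humidity": 40}}),
    json.dumps({"deviceId": "dev-a", "ts": 106.0, "payload": {"temperature": 22.0}}),
]
rows = join_by_window(messages, {"dev-a", "dev-b"})
print(rows)
```

Only the first window produces a joined row here, because `dev-b` did not report in the second one. Each joined row is what would be passed on to the ML scoring step.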
I am trying to set up a Stream Analytics job that accepts input from an Event Hub, processes the input via an ML model, and sends the output, e.g., to a Power BI dashboard.
I deployed an ONNX model on an ACI (Azure Container Instance) instance following the documentation here: https://learn.microsoft.com/en-us/azure/machine-learning/how-to-deploy-and-where . This seems to be fine and I do get the automated swagger definition and can use the service via REST.
How can I connect to my ML deployment from within the Stream Analytics query? There is the "Functions" setting under "Job Topology" on the "Stream Analytics job" page, but I cannot figure out how to add it there. This ( https://learn.microsoft.com/en-us/azure/stream-analytics/stream-analytics-machine-learning-integration-tutorial ) suggests that it is possible, although it uses Azure Machine Learning Studio (as opposed to Azure Machine Learning without "Studio"). I'm quite new to Azure and don't know whether this matters, but I find it a bit confusing.
There is an ongoing limited preview that you can sign up for to get access to this functionality. You will then be able to use the ONNX model you have deployed on ACI in your Stream Analytics job. We expect to roll out this functionality more broadly in the coming weeks :)
Building in the Google Cloud ecosystem is really powerful. I really like how you can ingest files into Cloud Storage, have Dataflow enrich, transform and aggregate the data, and then finally store the results in BigQuery or Cloud SQL.
I have a couple of questions to help me have a better understanding.
Suppose you are building a big data product using the Google services.
When a front-end web application (perhaps built in React) submits a file to Cloud Storage, it may take some time before the file is completely processed. The client might want to view the status of the file in the pipeline, and might then want to do something with the result on completion. How are front-end clients expected to know when a file has been completely processed and is ready? Do they need to poll data from somewhere?
Suppose you currently have a microservice architecture in which each service does a different kind of processing. For example, one might parse a file and another might process messages. The services communicate using Kafka or RabbitMQ and store data in Postgres or S3.
If you adopt the Google services ecosystem, could you replace that microservice architecture with Cloud Storage, Dataflow, and Cloud SQL/Store?
Did you look at Cloud Pub/Sub (topic subscription/publication service)?
Cloud Pub/Sub brings the scalability, flexibility, and reliability of enterprise message-oriented middleware to the cloud. By providing many-to-many, asynchronous messaging that decouples senders and receivers, it allows for secure and highly available communication between independently written applications.
I believe Pub/Sub can mostly substitute for Kafka or RabbitMQ in your case.
How are front-end clients expected to know when a file has been completely processed and is ready? Do they need to poll data from somewhere?
For example, if you are using the Dataflow API to process the file, the Dataflow pipeline can publish its progress and status to a topic. Your front end (App Engine) just needs to subscribe to that topic to receive updates.
1)
Dataflow does not offer inspection of intermediate results. If a frontend wants more detail about the progress of an element being processed in a Dataflow pipeline, custom progress reporting will need to be built into the pipeline.
One idea is to write progress updates to a sink table, emitting a row to it at various points in the pipeline. I.e. have a BigQuery sink where you write rows like ["element_idX", "PHASE-1 DONE"]. Then a frontend can query for those results. (I would avoid overwriting old rows personally, but many approaches can work.)
You can do this by consuming the PCollection in both the new sink and your pipeline's next step.
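As a rough illustration of that pattern, the sketch below uses plain Python rather than the Dataflow/Beam API. `ProgressSink` is a hypothetical in-memory stand-in for an append-only table such as a BigQuery sink, and the two phases are made up. The point is only that each phase feeds both the progress sink and the next step, and that the frontend reads the most recent row.

```python
from dataclasses import dataclass, field


@dataclass
class ProgressSink:
    """Hypothetical in-memory stand-in for an append-only progress table."""
    rows: list = field(default_factory=list)

    def write(self, element_id, status):
        # Append only: old rows are never overwritten.
        self.rows.append((element_id, status))

    def latest_status(self, element_id):
        """What a frontend would query: the newest status row for an element."""
        for row_id, status in reversed(self.rows):
            if row_id == element_id:
                return status
        return None


def run_pipeline(elements, sink):
    # Phase 1: parse each element, then report progress to the sink.
    parsed = []
    for element_id, text in elements:
        parsed.append((element_id, text.split(",")))
        sink.write(element_id, "PHASE-1 DONE")

    # Phase 2: aggregate the same parsed output and report again.
    totals = {}
    for element_id, fields in parsed:
        totals[element_id] = sum(int(value) for value in fields)
        sink.write(element_id, "PHASE-2 DONE")
    return totals


sink = ProgressSink()
totals = run_pipeline([("element_id1", "1,2,3"), ("element_id2", "4,5")], sink)
print(sink.latest_status("element_id1"), totals)
```

Because rows are appended rather than updated, the frontend can also show the full history of an element, not just its latest phase.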
2)
Is your Microservice architecture using a "Pipes and filters" pipeline style approach? I.e. each service reads from a source (Kafka/RabbitMQ) and writes data out, then the next consumes it?
Probably the best way is to set up a few different Dataflow pipelines, output their results using a Pub/Sub or Kafka sink, and have the next pipeline consume that sink. You may also wish to sink the results to another location such as BigQuery/GCS, so that you can query them again if you need to.
There is also the option of using Cloud Functions instead of Dataflow; they have Pub/Sub and GCS triggers. A microservice system can be set up with several Cloud Functions.
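The chaining described above can be sketched in a few lines of Python. `InMemoryBroker` is a hypothetical, synchronous stand-in for a topic-based broker such as Pub/Sub or Kafka (real brokers deliver asynchronously), and the topic names and record format are invented for the example.

```python
from collections import defaultdict


class InMemoryBroker:
    """Hypothetical stand-in for a topic-based broker such as Pub/Sub or Kafka."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, message):
        # Deliver synchronously to every subscriber of the topic.
        for handler in self._subscribers[topic]:
            handler(message)


broker = InMemoryBroker()
archive = []  # stands in for a secondary sink such as BigQuery/GCS
results = []  # final output of the chain


def parse_stage(raw_line):
    """First stage: parse 'key=value;...' text and publish the record."""
    record = dict(pair.split("=") for pair in raw_line.split(";"))
    broker.publish("parsed", record)


def enrich_stage(record):
    """Second stage: consume parsed records and publish enriched ones."""
    enriched = {**record, "total": int(record["qty"]) * int(record["price"])}
    broker.publish("enriched", enriched)


broker.subscribe("raw", parse_stage)
broker.subscribe("parsed", enrich_stage)
broker.subscribe("parsed", archive.append)  # same output also lands in the archive
broker.subscribe("enriched", results.append)

broker.publish("raw", "sku=A1;qty=3;price=7")
print(results, archive)
```

Each stage only knows the topic it reads from and the topic it writes to, which is the same decoupling the existing Kafka/RabbitMQ services rely on.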
We have the following situation.
We have multiple devices sending data to an event hub (interval is one second).
We have a lot of small Stream Analytics rules for alarm checks. The rules are applied to a small subset of the devices.
Example:
10000 Devices sending data every second.
Rules for roughly 10 devices.
Our problem:
Each Stream Analytics query processes all of the input data, although the job only needs a small subset of it. Each query filters on device ID and discards most of the data. Thus we need a huge number of streaming units, which leads to high Stream Analytics costs.
Our first idea was to create an event hub for each query. However, each event hub needs at least one throughput unit, which also leads to high costs.
What is the best solution in our case?
One possible solution would be to use IoT Hub and create a separate endpoint with a specific route for the devices you want to monitor.
Have a look at this blog post to see if this will work for your particular scenario: https://azure.microsoft.com/en-us/blog/azure-iot-hub-message-routing-enhances-device-telemetry-and-optimizes-iot-infrastructure-resources/
Then in Azure Stream Analytics, you can use this specific Endpoint as input.
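The effect of routing before the Stream Analytics job can be illustrated with a small Python simulation. This is not the IoT Hub routing API or its query syntax. The `ROUTES` table, endpoint names, and device IDs are hypothetical and only show how little data reaches each job once the filter runs upstream.

```python
from collections import defaultdict

# Hypothetical routes: endpoint name -> device IDs whose telemetry it receives.
ROUTES = {
    "alarm-rule-1": {"device-0001", "device-0002"},
    "alarm-rule-2": {"device-0042"},
}


def route(messages, routes=ROUTES):
    """Copy each message to every endpoint whose device set contains its sender."""
    endpoints = defaultdict(list)
    for msg in messages:
        for endpoint, device_ids in routes.items():
            if msg["deviceId"] in device_ids:
                endpoints[endpoint].append(msg)
    return dict(endpoints)


# 10,000 devices send one reading each; rules cover only three of them.
telemetry = [{"deviceId": f"device-{i:04d}", "value": i} for i in range(10_000)]
routed = route(telemetry)

for endpoint, msgs in routed.items():
    print(endpoint, len(msgs), "of", len(telemetry), "messages")
```

With the filter applied at the hub, each job reads only the handful of messages its rule cares about instead of the full stream.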
Thanks,
JS (Azure Stream Analytics team)
I am developing a sensor-based mobile application for iOS and Android. The data produced by smartphone sensors will be stored in the cloud. At this point, I am wondering what I should test about the data transfer and storage. For example, I should test the scenario in which the connection drops before a GPS data transfer has finished. I am not looking for techniques or testing styles; I am trying to find possible failure points or test scenarios. I hope I have explained my point clearly.
Below are some of the things worth considering for your app:
Incomplete transfers when the connection drops (as you mentioned)
Cloud server size: how many requests can a single instance handle?
If you are considering cloud solutions, you should also consider where your users will be accessing your app from. The distance between users and the data center will also affect response time.
Format of the data to be stored. Choosing a file size that is fast for I/O will also help optimize the speed of the app.
Asynchronous/synchronous data transfer
Security measures in the cloud, perhaps using services like VPC if you are considering AWS
These are some things worth considering.
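As one possible shape for the first scenario, here is a small Python sketch of an interrupted transfer. `FlakyConnection` and `upload` are hypothetical and simulate the network in memory. The check is that after a mid-transfer failure and a resume, the stored data matches what was sent.

```python
import hashlib


class FlakyConnection:
    """Hypothetical transport that fails once after a given number of chunks."""

    def __init__(self, fail_after):
        self.fail_after = fail_after
        self.sent = 0

    def send(self, store, chunk):
        if self.sent == self.fail_after:
            self.fail_after = None  # fail only once
            raise ConnectionError("simulated connection loss")
        store.append(chunk)
        self.sent += 1


def upload(data, connection, store, chunk_size=4, max_retries=3):
    """Send data in chunks, resuming from the last stored chunk after a failure."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    retries = 0
    while len(store) < len(chunks):
        try:
            connection.send(store, chunks[len(store)])
        except ConnectionError:
            retries += 1
            if retries > max_retries:
                raise
    return retries


gps_payload = b"59.3293,18.0686;59.3300,18.0700"
server_store = []
retries = upload(gps_payload, FlakyConnection(fail_after=2), server_store)
received = b"".join(server_store)

# The scenario passes if the stored data matches what the phone sent.
same = hashlib.sha256(received).digest() == hashlib.sha256(gps_payload).digest()
print(retries, same)
```

Variations worth trying include failing on the first or last chunk, exceeding the retry limit, and checking that a failed upload leaves no partial record visible to readers.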
Thanks :)
I would like to create an app where the user can add and view data. Data could be added on a desktop, tablet, or phone and read from any of them. I would like the data store to be synced between any of the user's devices.
I'm starting to play with the trial of Azure, and it looks promising. It is probably a solid way to sync data through the cloud between a user's devices. Other than syncing between a user's devices, I have no need for cloud services currently.
I've seen some apps that do a 'Backup/Restore' model with the user's SkyDrive account. But this seems to be a manual process. I'd like to see it done seamlessly.
I've looked into Sync services, but that would be more like a hub/spoke solution. Again, I don't need a central database.
What are some options? At this point, I would be fine using just Windows 8 patterns/practices.
Because they are separate devices, you will need some service layer to do the store/forward for you. With that, you have two basic choices: use the end user's own storage (e.g. SkyDrive) or use your own storage (e.g. Windows Azure).
SkyDrive is fully supported through the Live SDKs and provides a secure way to allow a user to store their data and synchronize it across multiple devices. You get security, and there is no cost for the server-side storage on your part. The user owns their storage, not you. The limitation is that you may have issues sharing that same data across other devices or users where SkyDrive (or whatever file sync service you use) is not available.
With a service layer like Azure, you have a lot more flexibility, but you will also be responsible for maintaining (and paying for) that server-side storage and those services. Have you looked at "Windows Azure Mobile Services"? With your Azure account you get 10 free Azure Mobile Services. You will pay for the SQL data storage on the backend, and that cost will depend on the amount of data you store on the server side. You also need to make sure to architect your application in a way that protects an individual user's data, but that is actually pretty easy to do, well documented, and gives you a lot of options.
Lastly, you may consider what type of data you want to share. SkyDrive is great for "Files". Pics, Songs, Videos, etc. Windows Azure Mobile Services (WAMS) is great for "Data".
Neither model is right or wrong. It just depends on your goals.
Hope that helps you go through the thought process