Azure IoT Edge - Large messages - azure-iot-hub

What is the best practice for sending larger messages from an edge module to Azure? Should you use file upload in IoT Hub, or go directly to a storage account on the side?
/Jonas

File upload directly via the SDK and IoT Hub is only available for IoT (non-Edge) devices, as you already figured out.
You can use the Blob Storage module and deploy it on the edge device. https://learn.microsoft.com/en-us/azure/iot-edge/how-to-deploy-blob?view=iotedge-2020-11
To upload files, you then add them to the local blob storage, and the module takes care of sending them to the Azure Blob Storage account in the cloud.

I prefer to use either the SDK or the Blob Storage module, as authentication to blob storage is handled in a more secure way: the SDK generates a short-lived SAS token to connect to blob storage, and the edge module gets the blob storage connection string sent down as part of its module twin.
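As a rough sketch of how another edge module could write to that local endpoint with the azure-storage-blob Python SDK; the module name AzureBlobStorageonIoTEdge, port 11002, container name, and file path are assumptions you would adjust to match your own deployment manifest:

```python
# Sketch: upload a large file to the local Blob Storage on IoT Edge module.
# The module then forwards it to the cloud storage account according to its
# device-to-cloud upload settings from the deployment manifest.
import os
from azure.storage.blob import BlobServiceClient

LOCAL_ACCOUNT_NAME = os.environ["LOCAL_ACCOUNT_NAME"]  # local account from the manifest
LOCAL_ACCOUNT_KEY = os.environ["LOCAL_ACCOUNT_KEY"]

# The connection string points at the local module, not at Azure.
conn_str = (
    "DefaultEndpointsProtocol=http;"
    f"AccountName={LOCAL_ACCOUNT_NAME};"
    f"AccountKey={LOCAL_ACCOUNT_KEY};"
    f"BlobEndpoint=http://AzureBlobStorageonIoTEdge:11002/{LOCAL_ACCOUNT_NAME};"
)

service = BlobServiceClient.from_connection_string(conn_str)
container = service.get_container_client("uploads")
try:
    container.create_container()
except Exception:
    pass  # container already exists

with open("/data/large-measurement.bin", "rb") as f:
    container.upload_blob(name="large-measurement.bin", data=f, overwrite=True)
```

From there, the module's device-to-cloud upload settings decide when the blob is moved up to the cloud storage account.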

Related

Cloud and local application sync ideas

I have a situation where my central MySQL DB and file system (S3) run on an EC2 instance.
But one of my applications runs locally at my client's site on a Pi 3 device, which needs to look up data and files from both the DB and the file system in the cloud. The application in turn generates transactional records and needs to upload them to the DB and FS (maybe at day's end).
The catch is that sometimes the cloud may not be available due to connectivity issues (the site is in a remote area).
What could be the best strategies to accommodate this kind of a scenario?
Can AWS Greengrass help in here?
How to keep the lookup data (DB and FS) in sync with the local devices?
How to update/sync the transactional data generated by the local devices?
And finally, what could be the risks in such a deployment model?
Appreciate some help/suggestions.
How to keep the lookup data (DB and FS) in sync with the local devices?
You can create a Greengrass Group and include all of the devices in that group. Make the devices subscribe to a topic, e.g. DB/Cloud/update. Once a device receives a message on that topic, trigger an on-demand Lambda to download the latest information from the cloud. To make sure a device does not miss any update while offline, you can use a persistent session; it will make sure the device receives all the missed messages when it is back online.
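As a sketch of the device side with the paho-mqtt client (the endpoint, certificate paths, client ID, and topic below are placeholders), the key pieces are a fixed client ID, clean_session=False, and QoS 1, which is what makes the broker queue updates while the device is offline:

```python
# Sketch: subscribe with a persistent MQTT session so queued DB/Cloud/update
# messages are delivered after the device reconnects (paho-mqtt 1.x callback API).
import ssl
import paho.mqtt.client as mqtt

ENDPOINT = "your-ats-endpoint.iot.eu-west-1.amazonaws.com"  # placeholder AWS IoT endpoint
TOPIC = "DB/Cloud/update"

def on_connect(client, userdata, flags, rc):
    # Subscribe with QoS 1 so the broker queues messages while we are offline.
    client.subscribe(TOPIC, qos=1)

def on_message(client, userdata, msg):
    # Here you would trigger the on-demand Lambda / local refresh logic.
    print(f"Update received on {msg.topic}: {msg.payload[:100]}")

# A fixed client_id plus clean_session=False is what makes the session persistent.
client = mqtt.Client(client_id="pi3-site-42", clean_session=False)
client.tls_set(ca_certs="AmazonRootCA1.pem",
               certfile="device.pem.crt",
               keyfile="private.pem.key",
               tls_version=ssl.PROTOCOL_TLSv1_2)
client.on_connect = on_connect
client.on_message = on_message
client.connect(ENDPOINT, port=8883)
client.loop_forever()
```

Note that the cloud side must also publish the updates with QoS 1 for them to be queued for offline subscribers.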
How to update/sync the transactional data generated by the local devices?
You may try Stream Manager. https://docs.aws.amazon.com/greengrass/latest/developerguide/stream-manager.html
Right now it allows you to add a local Lambda to pre-process the data and sync it up with the cloud.
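A rough sketch of appending local transactional records to a stream that Stream Manager exports to Kinesis, using the Greengrass Stream Manager Python SDK; the stream name, Kinesis stream name, and record fields are placeholders, and the class names should be checked against the docs linked above:

```python
# Sketch: append local transactional records to a Stream Manager stream that
# is exported to a Kinesis data stream whenever connectivity is available.
import json
from stream_manager import (
    StreamManagerClient,
    MessageStreamDefinition,
    StrategyOnFull,
    ExportDefinition,
    KinesisConfig,
)

client = StreamManagerClient()

# Create the stream once; Stream Manager buffers data locally while offline.
client.create_message_stream(MessageStreamDefinition(
    name="TransactionalRecords",                       # placeholder stream name
    strategy_on_full=StrategyOnFull.OverwriteOldestData,
    export_definition=ExportDefinition(
        kinesis=[KinesisConfig(
            identifier="ToCloud",
            kinesis_stream_name="transactional-records",  # placeholder Kinesis stream
        )]
    ),
))

record = {"device": "pi3-site-42", "amount": 12.5}
client.append_message("TransactionalRecords", json.dumps(record).encode("utf-8"))
```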

Using Kubernetes Persistent Volume for Data Protection

To resolve a few issues we are running into with Docker and running multiple instances of some services, we need to be able to share values between running instances of the same Docker image. The original solution I found was to create a storage account in Azure (where we are running our Kubernetes instance that houses the containers) and a Key Vault in Azure, accessing both via the well-defined APIs that Microsoft has provided for Data Protection (detailed here).
Our architect instead wants to use Kubernetes Persistent Volumes, but he has not provided information on how to accomplish this (he just wants to save money on the Azure subscription by not having an additional storage account or key storage). I'm very new to Kubernetes and have no real idea how to accomplish this, and my searches so far have not come up with much usefulness.
Is there an extension method that should be used for Persistent Volumes? Would this just act like a shared file location and be accessible with the PersistKeysToFileSystem API for Data Protection? Any resources that you could point me to would be greatly appreciated.
A PersistentVolume with Kubernetes in Azure will not give you the same exact functionality as Key Vault in Azure.
PersistentVolume:
Data is stored locally on a volume mounted on a server
Volume can be encrypted
Volume moves with the pod.
If the pod starts on a different server, the volume moves.
Accessing the volume from other pods is not that easy (sharing it usually requires a ReadWriteMany-capable volume; see the sketch after this comparison).
You can control performance by assigning guaranteed IOPS to the volume (from the cloud provider)
Key Vault:
Store keys in a centralized location managed by Azure
Data is encrypted at rest and in transit.
You rely on a remote API rather than a local file system.
There might be a performance hit from going to an external service
I assume this is not a major problem within Azure.
Kubernetes pods can access the service from anywhere as long as they have network connectivity to the service.
Less maintenance time, since it's already maintained by Azure.
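If the PersistentVolume route is chosen for the Data Protection keys, the main requirement is a volume that several pods can mount read-write at once (on AKS that usually means an Azure Files backed storage class). A hedged sketch with the official kubernetes Python client; the claim name, namespace, and the azurefile storage class are assumptions that depend on your cluster:

```python
# Sketch: create a ReadWriteMany PersistentVolumeClaim that several replicas
# of the same service can mount and point PersistKeysToFileSystem at.
from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() when running inside the cluster
core = client.CoreV1Api()

pvc = client.V1PersistentVolumeClaim(
    metadata=client.V1ObjectMeta(name="dataprotection-keys"),
    spec=client.V1PersistentVolumeClaimSpec(
        access_modes=["ReadWriteMany"],      # required so multiple pods can share it
        storage_class_name="azurefile",      # assumption: Azure Files storage class on AKS
        resources=client.V1ResourceRequirements(
            requests={"storage": "1Gi"}
        ),
    ),
)

core.create_namespaced_persistent_volume_claim(namespace="default", body=pvc)
```

Each replica then mounts the claim at the same path and points PersistKeysToFileSystem at that path.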

S3 connectors to connect with Kafka for streaming data from on-premise to cloud

I want to stream data from on-premises to the cloud (S3) using Kafka, for which I would need to install Kafka on the source machine and also in the cloud. But I don't want to install it in the cloud. I need some S3 connector through which I can connect with Kafka and stream data from on-premises to the cloud.
If your data is in Avro or JSON format (or can be converted to those formats), you can use the S3 sink connector for Kafka Connect. See Confluent's docs on that.
Should you want to move actual (bigger) files via Kafka, be aware that Kafka is designed for small messages and not for file transfers.
There is also a kafka-connect-s3 project from Spreadfast, consisting of both a sink and a source connector, which can handle text format. Unfortunately it is not really updated anymore, but it works nevertheless.
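To make the first option concrete: the S3 sink connector runs inside your on-premises Kafka Connect worker and only needs AWS credentials plus outbound access to S3, so nothing has to be installed in the cloud. A hedged sketch that registers it via the Kafka Connect REST API; the Connect URL, topic, bucket, and region are placeholders, and the configuration keys should be checked against Confluent's docs:

```python
# Sketch: register an S3 sink connector on an on-premises Kafka Connect worker.
import json
import requests

CONNECT_URL = "http://localhost:8083"  # placeholder Kafka Connect REST endpoint

connector = {
    "name": "s3-sink-onprem",
    "config": {
        "connector.class": "io.confluent.connect.s3.S3SinkConnector",
        "tasks.max": "1",
        "topics": "sensor-data",               # placeholder topic
        "s3.bucket.name": "my-onprem-export",  # placeholder bucket
        "s3.region": "eu-west-1",
        "storage.class": "io.confluent.connect.s3.storage.S3Storage",
        "format.class": "io.confluent.connect.s3.format.json.JsonFormat",
        "flush.size": "1000",
    },
}

resp = requests.post(
    f"{CONNECT_URL}/connectors",
    headers={"Content-Type": "application/json"},
    data=json.dumps(connector),
)
resp.raise_for_status()
print(resp.json())
```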

Synchronizing local parse server to cloud parse server

Here is my situation: I have a local Parse Server to which all my client devices are primarily connected, due to the lack of consistent internet availability.
I intend to have a cloud-hosted Parse Server that the local Parse Server can sync data with (pull and push) whenever internet connectivity is available, which may be periodic. How can that be achieved?
The idea is to sync data between both Parse Server instances.
You know that parse-server by itself does not store your data.
What I am sure you are actually referring to is your database.
The best way to go is to export and import your Mongo data as needed between your local DB and your remote DB when you have internet connectivity.
You may also try ngrok.com for temporary network tunneling if that is what you need.
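As a rough sketch of that export/import idea with pymongo, pushing locally changed documents up whenever connectivity returns; the connection strings and collection name are placeholders, and the `_updated_at` watermark field is an assumption about how Parse Server stores objects in MongoDB:

```python
# Sketch: push documents changed since the last successful sync from the
# local MongoDB behind Parse Server to the cloud MongoDB (one direction shown).
from datetime import datetime, timezone
from pymongo import MongoClient, ReplaceOne

LOCAL_URI = "mongodb://localhost:27017"            # placeholder
REMOTE_URI = "mongodb+srv://user:pass@cloud-host"  # placeholder

local_db = MongoClient(LOCAL_URI)["parse"]
remote_db = MongoClient(REMOTE_URI)["parse"]

def push_changes(collection_name, last_sync):
    """Upsert every document modified locally since last_sync into the remote DB."""
    changed = local_db[collection_name].find({"_updated_at": {"$gt": last_sync}})
    ops = [ReplaceOne({"_id": doc["_id"]}, doc, upsert=True) for doc in changed]
    if ops:
        remote_db[collection_name].bulk_write(ops)
    return datetime.now(timezone.utc)

# Run this whenever connectivity is available, persisting last_sync between runs.
last_sync = datetime(1970, 1, 1, tzinfo=timezone.utc)
last_sync = push_changes("MyClass", last_sync)  # placeholder Parse class/collection
```

The pull direction works the same way in reverse; conflict handling (e.g. last-writer-wins per document) is something you would have to decide on for your data.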

Web Service - Efficient way to transfer file to cloud platform storage

I have a service that requires some input files from the client. This service is run from a number of different cloud platforms (Amazon, Azure, etc.). Further, it requires that these input files be stored in their respective cloud platform's persistent storage. What is the best way to transfer those input files from the client to the service without requiring that the client knows about the specific cloud platform's details (e.g. secret keys)?
Given these requirements, the simple approach of the client transferring these files directly to the platform's storage (e.g. S3) isn't possible.
I could have the client call a service method and pass it the file blob to transfer up to the service, but that would require that the service receive the file and then do its own file transfer to the cloud platform's storage (in other words, two file transfers for one file).
So, the question is, what is the best way to transfer a file to storage given that we don't know what cloud platform the service is running on?
In this setting your proposed proxy solution seems to be the only one, and although the double transfer seems bad, in practice it is not that costly, as the cloud storage is commonly in the same datacenter as the compute nodes and is therefore very fast to access.
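A minimal sketch of that proxy, assuming Flask and boto3 purely for illustration: the client PUTs the file to the service, and the service forwards it to whichever storage backend the deployment wires in, so no platform secrets ever reach the client:

```python
# Sketch: the service accepts the upload and forwards it to whichever cloud
# storage backend it was deployed with; the client never sees keys.
import abc
from flask import Flask, request

class StorageBackend(abc.ABC):
    """Thin abstraction over the platform's persistent storage."""
    @abc.abstractmethod
    def save(self, name: str, data: bytes) -> None: ...

class S3Backend(StorageBackend):
    def __init__(self, bucket: str):
        import boto3  # credentials come from the service's own environment/role
        self._s3 = boto3.client("s3")
        self._bucket = bucket

    def save(self, name: str, data: bytes) -> None:
        self._s3.put_object(Bucket=self._bucket, Key=name, Body=data)

app = Flask(__name__)
backend: StorageBackend = S3Backend("my-service-inputs")  # chosen per deployment

@app.route("/files/<name>", methods=["PUT"])
def upload(name):
    # The second transfer happens here, inside the provider's datacenter.
    backend.save(name, request.get_data())
    return "", 201
```

An Azure deployment would swap in an equivalent backend class built on its own storage SDK, while the client-facing endpoint stays identical.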