Cloud and local application sync ideas - amazon-s3

I've a situation where my central MySQL db and file system (S3) runs on a EC2.
But one of my application runs locally at my client site on a PI-3 device, which needs to look up data and files from both the DB and file system on cloud. The application generates transactional records in turn and need to upload the DB and FS (may be at day end).
The irony is that sometimes the cloud may not be available due to connectivity issues (being in a remote area).
What could be the best strategies to accommodate this kind of a scenario?
Can AWS Greengrass help in here?
How to keep the Lookup data (DB and FS)in sync with the local devices?
How to update/sync the transactional data generated by the local devices?
And finally, what could be the risks in such a deployment model?
Appreciate some help/suggestions.

How to keep the Lookup data (DB and FS)in sync with the local devices?
You can have a Greengrass Group and includes all of the devices in the that group. Make the devices subscribe to a topic e.g. DB/Cloud/update. Once device received the message on that topic, trigger a on-demand lambda to download the latest information from the Cloud. To make sure the device do not miss any update when offline, you can use persistent session, it will make sure device will receive all the missing message when it is back online.
How to update/sync the transactional data generated by the local devices?
You may try with the Stream Manager. https://docs.aws.amazon.com/greengrass/latest/developerguide/stream-manager.html
Right now, it is allowed you to add a local use lambda to pre-process the data and sync it up with the cloud

Related

Manage In-memory cache in multiple servers in aws

Once or twice a day some files are being uploaded to S3 Bucket. I want the uploaded data to be refreshed with the In-memory data of each server on every s3 upload.
Note there are multiple servers running and I want to store the same data in all the servers. Also, the servers are scaling based on the traffic(also on start-up of the new server goes up and older ones go down means server instances will not be the same always).
Like I want to keep updated data in the cache.
I want to build an architecture where auto-scaling of the server can be supported. I came across the FAN-OUT architecture of AWS by using the SNS and multiple SQS from which different servers can poll.
How can we handle the auto-scaling of the queue with respect to servers?
Or is there any other way to handle the scenario?
PS: I m totally new to the AWS environment.
It Will be a great help for any reference.
To me there are a few things that you need to have to make this work. These are opinions and, as with most architectural designs, there is certainly more than one way to handle this.
I start with the assumption that you've got an application running on an EC2 of some sort (Elastic Beanstalk, Fargate, Raw EC2s with auto scaling, etc.) and that you've solved for having the application installed and configured when a scale-up event occurs.
Conceptually I'd have this diagram:
The setup involves having the S3 bucket publish likely s3:ObjectCreated events to the SNS topic. These events will be published when an object in the bucket is updated or created.
Next:
During startup your application will pull the current data from S3.
As part of application startup create a queue named after the instance id of the EC2 (see here for some examples) The queue would need to subscribe to the SNS topic. If the queue already exists then that's not an error.
Your application would have a background thread or process that polls the SQS queue for messages.
If you get a message on the queue then that needs to tell the application to refresh the cache from S3.
When an instance is shut down there is an event from at least Elastic Beanstalk and the load balancers that your instance will be shut down. Remove the SQS queue tied to the instance at that time.
The only issue might be that a hard crash of an environment would leave orphan queues. It may be advisable to either manually clean these up or have a periodic task clean them up.

Using Kubernetes Persistent Volume for Data Protection

To resolve a few issues we are running into with docker and running multiple instances of some services, we need to be able to share values between running instances of the same docker image. The original solution I found was to create a storage account in Azure (where we are running our kubernetes instance that houses the containers) and a Key Vault in Azure, accessing both via the well defined APIs that microsoft has provided for Data Protection (detailed here).
Our architect instead wants to use Kubernetes Persitsent Volumes, but he has not provided information on how to accomplish this (he just wants to save money on the azure subscription by not having an additional storage account or key storage). I'm very new to kubernetes and have no real idea how to accomplish this, and my searches so far have not come up with much usefulness.
Is there an extension method that should be used for Persistent Volumes? Would this just act like a shared file location and be accessible with the PersistKeysToFileSystem API for Data Protection? Any resources that you could point me to would be greatly appreciated.
A PersistentVolume with Kubernetes in Azure will not give you the same exact functionality as Key Vault in Azure.
PesistentVolume:
Store locally on a mounted volume on a server
Volume can be encrypted
Volume moves with the pod.
If the pod starts on a different server, the volume moves.
Accessing volume from other pods is not that easy.
You can control performance by assigning guaranteed IOPs to the volume (from the cloud provider)
Key Vault:
Store keys in a centralized location managed by Azure
Data is encrypted at rest and in transit.
You rely on a remote API rather than a local file system.
There might be a performance hit by going to an external service
I assume this not to be a major problem in Azure.
Kubernetes pods can access the service from anywhere as long as they have network connectivity to the service.
Less maintenance time, since it's already maintained by Azure.

Synchronizing local parse server to cloud parse server

Here is my situation i have a local parse server to which all my client devices are primarily connected due to the lack of consistent internet resource availability.
I intend to have a cloud hosted parse server to which the local parse can data sync with (pull and push) whenever there is availability of internet resources which may be periodic . How can that be achieved ?
The idea is to sync data between both parse servrr instances
You know that parse-server by itself does not save your data.
What am sure you are trying to refer to is your database.
The best way to go is to export and import your mongo data as needed from your remote db source to your remote data source when you have internet connectivity.
You may also try ngrok.com for temporary network tunneling if that is what you need

Just how volatile is a Bluemix Virtual Server's own storage?

The Bluemix documentation leads a reader to believe that the only persistent storage for a virtual server is using Bluemix Block Storage. Also, the documentation leads you to believe that virtual server's own storage will not persist over restarts or failures. However, in practice, this doesn't seem to be the case at least as far as restarts are concerned. We haven't suffered any virtual server outages yet.
So we want a clearer understanding of the rationale for separating the virtual server's own storage from its attached Block Storage.
Use case: I am moving our Git server and a couple of small LAMP-based assets to a Bluemix Virtual Server as we simultaneously develop new mobile apps using Cloud Foundry. In our case, we don't anticipate scaling up the work that the virtual server does any time soon. We just want a reliable new home for an existing website.
Even if you separate application files and databases out into block storage, re-provisioning the virtual server in the event of its loss is not trivial even when the provisioning is automated with Ansible or the like. So, we are not expecting to have to be regularly provisioning the non-persistent storage of a Bluemix Virtual Server.
The Bluemix doc you reference is a bit misleading and is being corrected. The virtual server's storage on local disk does persist across restart, reboot, suspend/resume, and VM failure. If such was not the case then the OS image would be lost during any such event.
One of the key advantages of storing application data in a block storage volume is that the data will persist beyond the VM's lifecycle. That is, even if the VM is deleted, the block storage volume can be left in tact to persist data. As you mentioned, block storage volumes are often used to back DB servers so that the user data is isolated, which lends itself well to providing a higher class of storage specifically for application data, back up, recovery, etc.
In use cases where VM migration is desired the VMs can be set up to boot from a block storage volume, which enables one to more easily move the VM to a different hypervisor and simply point to the same block storage boot volume.
Based on your use case description you should be fine using VM local storage.

GCP - CDN Server

I'm trying to architect a system on GCP for scalable web/app servers. My initial intention was to have one disk per web server group hosting the OS, and another hosting the source code + imagery etc. My idea was to mount the OS disk on multiple VM instances so to have exact clones of the servers, with one place to store PHP session files (so moving in between different servers would be transparent and not cause problems).
The second idea was to mount a 2nd disk, containing the source code and media files, which would then be shared with 2 web servers, one configured as a CDN server and one with the main website and backend. The backend would modify/add/delete media files, and the CDN server would supply them to the browser when requested.
My problem arises when reading that the Persistent Disk Storage is only mountable on a single VM instance with read/write access, and if it's needed on multiple instances it can be mounted only in write access. I need to have one of the instances with read/write access with the others (possibly many) with read only access.
Would you be able to suggest ways or methods on how to implement such a system on the GCP, or if it's not possible at all?
Unfortunately, it's not possible.
But, you can create a Single-Node File Server and mount it as a read and write disk on other VMs.
GCP has documentation on how to create a single-Node File Server
An alternative to using persistent (which as you said, only alows a single RW mount or many read-only) is to use Cloud Storage - which can be mounted through FUSE.