Web Service - Efficient way to transfer file to cloud platform storage - wcf

I have a service that requires some input files from the client. This service is run from a number of different cloud platforms (Amazon, Azure, etc.). Further, it requires that these input files be stored in their respective cloud platform's persistent storage. What is the best way to transfer those input files from the client to the service without requiring that the client knows about the specific cloud platform's details (e.g. secret keys)?
Given these requirements, the simple approach of the client transferring these files directly to the platform's storage (e.g. S3) isn't possible.
I could have the client call a service method and pass it the file blob to transfer up to the service, but that would require that the service receive the file and then do its own file transfer to the cloud platform's storage (in other words, two file transfers for one file).
So, the question is, what is the best way to transfer a file to storage given that we don't know what cloud platform the service is running on?

In this setting, your proposed proxy solution seems to be the only option. Although the double transfer looks bad, in practice it isn't that costly: the cloud storage is commonly in the same datacenter as the compute nodes and is therefore very fast to access.
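For illustration, a minimal sketch of that proxy arrangement, assuming the service runs on AWS and uses the AWS SDK for .NET; the interface, bucket name, and wiring are hypothetical, and an Azure deployment would supply a blob-storage implementation behind the same interface:

```csharp
using System.IO;
using System.Threading.Tasks;
using Amazon.S3;
using Amazon.S3.Model;

// The WCF service depends only on this abstraction; the client never sees
// the platform details or any secret keys.
public interface IPlatformStorage
{
    Task SaveAsync(string name, Stream content);
}

// AWS-specific implementation; credentials and bucket stay on the server side.
public class S3PlatformStorage : IPlatformStorage
{
    private readonly IAmazonS3 _s3 = new AmazonS3Client(); // region/keys from server config
    private const string Bucket = "my-input-bucket";       // assumed bucket name

    public Task SaveAsync(string name, Stream content) =>
        _s3.PutObjectAsync(new PutObjectRequest
        {
            BucketName = Bucket,
            Key = name,
            InputStream = content
        });
}

// Inside the service operation that receives the client's upload:
//     await storage.SaveAsync(fileName, uploadedStream);
```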


Cloud and local application sync ideas

I have a situation where my central MySQL DB runs on EC2 and my file storage is on S3.
But one of my applications runs locally at my client's site on a Pi 3 device, and it needs to look up data and files from both the DB and the file system in the cloud. The application in turn generates transactional records that need to be uploaded to the DB and FS (maybe at day's end).
The catch is that sometimes the cloud may not be available due to connectivity issues (the site being in a remote area).
What could be the best strategies to accommodate this kind of a scenario?
Can AWS Greengrass help here?
How to keep the Lookup data (DB and FS) in sync with the local devices?
How to update/sync the transactional data generated by the local devices?
And finally, what could be the risks in such a deployment model?
Appreciate some help/suggestions.
How to keep the Lookup data (DB and FS) in sync with the local devices?
You can create a Greengrass group and include all of the devices in that group. Have the devices subscribe to a topic, e.g. DB/Cloud/update. Once a device receives a message on that topic, it can trigger an on-demand Lambda to download the latest information from the cloud. To make sure a device does not miss any updates while offline, you can use a persistent session, which ensures the device receives all the missed messages when it comes back online.
How to update/sync the transactional data generated by the local devices?
You may try Stream Manager: https://docs.aws.amazon.com/greengrass/latest/developerguide/stream-manager.html
Right now it allows you to add a local Lambda to pre-process the data and sync it up with the cloud.

Using Kubernetes Persistent Volume for Data Protection

To resolve a few issues we are running into with Docker and running multiple instances of some services, we need to be able to share values between running instances of the same Docker image. The original solution I found was to create a storage account in Azure (where we are running our Kubernetes instance that houses the containers) and a Key Vault in Azure, accessing both via the well-defined APIs that Microsoft has provided for Data Protection (detailed here).
Our architect instead wants to use Kubernetes Persistent Volumes, but he has not provided information on how to accomplish this (he just wants to save money on the Azure subscription by not having an additional storage account or key storage). I'm very new to Kubernetes and have no real idea how to accomplish this, and my searches so far have not turned up much of use.
Is there an extension method that should be used for Persistent Volumes? Would this just act like a shared file location and be accessible with the PersistKeysToFileSystem API for Data Protection? Any resources that you could point me to would be greatly appreciated.
A PersistentVolume with Kubernetes in Azure will not give you the exact same functionality as Key Vault in Azure; a comparison follows, with a configuration sketch after it.
PersistentVolume:
Stored locally on a volume mounted on a server.
The volume can be encrypted.
The volume moves with the pod: if the pod starts on a different server, the volume moves with it.
Accessing the volume from other pods is not that easy.
You can control performance by assigning guaranteed IOPS to the volume (from the cloud provider).
Key Vault:
Keys are stored in a centralized location managed by Azure.
Data is encrypted at rest and in transit.
You rely on a remote API rather than a local file system.
There might be a performance hit from going to an external service; I assume this is not a major problem within Azure.
Kubernetes pods can access the service from anywhere as long as they have network connectivity to it.
Less maintenance effort, since it's already maintained by Azure.
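To the PersistKeysToFileSystem question specifically: if the shared-volume route is chosen, the Data Protection key ring can simply be pointed at the volume's mount path. A minimal sketch, assuming an ASP.NET Core app; the mount path and application name are placeholders, and the volume would need an access mode all replicas can share (e.g. ReadWriteMany backed by Azure Files):

```csharp
using System.IO;
using Microsoft.AspNetCore.DataProtection;
using Microsoft.Extensions.DependencyInjection;

public class Startup
{
    public void ConfigureServices(IServiceCollection services)
    {
        // /mnt/dp-keys is assumed to be the mount point of the shared
        // PersistentVolumeClaim inside every pod of the deployment.
        services.AddDataProtection()
                .PersistKeysToFileSystem(new DirectoryInfo("/mnt/dp-keys"))
                .SetApplicationName("my-shared-app"); // same name on every instance
    }
}
```

Note that keys persisted to a file system are not encrypted at rest by Data Protection itself unless you also configure key encryption (for example with a certificate), which is part of what Key Vault gives you for free.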

GCP - CDN Server

I'm trying to architect a system on GCP for scalable web/app servers. My initial intention was to have one disk per web server group hosting the OS, and another hosting the source code, imagery, etc. My idea was to mount the OS disk on multiple VM instances so as to have exact clones of the servers, with one place to store PHP session files (so moving between different servers would be transparent and not cause problems).
The second idea was to mount a 2nd disk, containing the source code and media files, which would then be shared with 2 web servers, one configured as a CDN server and one with the main website and backend. The backend would modify/add/delete media files, and the CDN server would supply them to the browser when requested.
My problem arises from reading that a Persistent Disk is only mountable on a single VM instance with read/write access; if it's needed on multiple instances, it can be mounted only with read-only access. I need one of the instances to have read/write access, with the others (possibly many) having read-only access.
Would you be able to suggest ways or methods on how to implement such a system on the GCP, or if it's not possible at all?
Unfortunately, it's not possible.
But you can create a single-node file server and mount it as a read/write disk on the other VMs.
GCP has documentation on how to create a single-node file server.
An alternative to using a persistent disk (which, as you said, only allows a single read/write mount or many read-only mounts) is to use Cloud Storage, which can be mounted through FUSE (gcsfuse) or accessed through the client libraries.
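If a FUSE mount turns out to be awkward, the bucket can also be read and written directly through the Cloud Storage client libraries. A rough sketch with the .NET client (bucket and object names are placeholders; the PHP client library offers equivalent calls):

```csharp
using System.IO;
using Google.Cloud.Storage.V1;

var storage = StorageClient.Create();  // uses Application Default Credentials on GCE

// Backend server: write or update a media file.
using (var source = File.OpenRead("banner.jpg"))
{
    storage.UploadObject("my-media-bucket", "img/banner.jpg", "image/jpeg", source);
}

// CDN/front-end server: read the same object back.
using (var destination = File.Create("/var/cache/banner.jpg"))
{
    storage.DownloadObject("my-media-bucket", "img/banner.jpg", destination);
}
```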

Amazon web services: Where to start

I am a recent grad and wanted to learn about building web applications using AWS. I have gone through the documentation and ran their sample Travel Log application successfully.
But I am still not clear about the terminology used. Can anyone explain the differences between Amazon Simple Storage Service (Amazon S3), Amazon Elastic Compute Cloud (Amazon EC2), and Amazon SimpleDB in simple words?
I am looking to build a web app that has a sign-in page and lets people post some text. May I know which Amazon services would be required for me to build this app?
Thanks
Amazon Simple Storage Service (S3) is for storing static content: images, videos, or anything else you want to save. You can think of it like a hard drive for storage.
Amazon Elastic Compute Cloud (EC2) is basically your virtual operating system; you can install whatever OS you want (Debian, Ubuntu, Fedora, CentOS, Windows Server, SUSE Enterprise). If your application uses server-side processing, this will be its home.
Amazon SimpleDB is a NoSQL database system that you can use for your applications; Amazon provides it as a service. If you want something more, you can install your own database on EC2, or use RDS for a managed database server (MySQL, for example).
If you want to know more, there are books such as "Programming Amazon EC2", Amazon's screencasts at http://www.youtube.com/user/AmazonWebServices, and its presentations at http://www.slideshare.net/AmazonWebServices.
Amazon Simple Storage Service (Amazon S3)
Amazon S3 (Simple Storage Service) is a scalable, high-speed, low-cost web-based service designed for online backup and archiving of data and application programs. It allows you to upload, store, and download any type of file up to 5 TB in size. This service gives subscribers access to the same systems that Amazon uses to run its own web sites. The subscriber has control over the accessibility of data, i.e. whether it is privately or publicly accessible.
Amazon Elastic Compute Cloud (Amazon EC2)
Amazon Elastic Compute Cloud (Amazon EC2) provides scalable computing capacity in the Amazon Web Services (AWS) cloud. Using Amazon EC2 eliminates your need to invest in hardware up front, so you can develop and deploy applications faster. You can use Amazon EC2 to launch as many or as few virtual servers as you need, configure security and networking, and manage storage. Amazon EC2 enables you to scale up or down to handle changes in requirements or spikes in popularity, reducing your need to forecast traffic.
Amazon SimpleDB
Amazon SimpleDB is a highly available NoSQL data store that offloads the work of database administration. Developers simply store and query data items via web services requests and Amazon SimpleDB does the rest.
Unbound by the strict requirements of a relational database, Amazon SimpleDB is optimized to provide high availability and flexibility, with little or no administrative burden. Behind the scenes, Amazon SimpleDB creates and manages multiple geographically distributed replicas of your data automatically to enable high availability and data durability. The service charges you only for the resources actually consumed in storing your data and serving your requests. You can change your data model on the fly, and data is automatically indexed for you. With Amazon SimpleDB, you can focus on application development without worrying about infrastructure provisioning, high availability, software maintenance, schema and index management, or performance tuning.
For more information, go through these:
https://aws.amazon.com/simpledb/
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/concepts.html
https://www.tutorialspoint.com/amazon_web_services/amazon_web_services_s3.htm
Amazon S3 is used for storage of files. It is basically like the hard drives on your system, where you use C: or D: for your files. If you are developing an application you can use S3 for storing static files or any backup files.
Amazon EC2 is exactly like your physical machine, the only difference being that EC2 is in the cloud. You can install and run software and applications and store files exactly as you do on your physical machines.
Amazon SimpleDB is a database in the cloud. You can integrate it with your application and run queries against it.
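To make that concrete for the sign-in/posting app: the pages themselves would run on EC2, any images or attachments would go to S3, and each posted text item could be stored as a SimpleDB item. A rough sketch with the AWS SDK for .NET; the domain, item, and attribute names are made up for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Amazon.SimpleDB;
using Amazon.SimpleDB.Model;

public class PostStore
{
    private readonly IAmazonSimpleDB _sdb = new AmazonSimpleDBClient();

    // Save one text post as an item in a hypothetical "Posts" domain.
    public Task SavePostAsync(string user, string text) =>
        _sdb.PutAttributesAsync(new PutAttributesRequest
        {
            DomainName = "Posts",                 // assumed to exist already
            ItemName = Guid.NewGuid().ToString(),
            Attributes = new List<ReplaceableAttribute>
            {
                new ReplaceableAttribute { Name = "User", Value = user },
                new ReplaceableAttribute { Name = "Text", Value = text },
                new ReplaceableAttribute { Name = "PostedAt", Value = DateTime.UtcNow.ToString("o") }
            }
        });
}
```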

Which is a Better Solution in this scenario of WCF

I have a WCF service which monitors a particular drive and creates a new folder weekly, which I am using as document storage.
I have many drives configured for document storage, and I have to monitor which drive is active (only one drive can be active at a time). On a weekly basis I have to add a new folder on my active drive at a predefined path provided at configuration time.
The client can make any drive inactive, or a drive can become inactive if it is full, and I then need to make another drive active dynamically using a service, based on priority. For example, I have the following drives:
drive A, priority 1, active: yes
drive B, priority 2, active: no
If A becomes full, I have to make drive B active.
Now, should I implement a WCF service in IIS or as a Windows Service? My program has to perform many actions, like checking the drive size, making another drive active, and sending updates to the database.
Which is the better way, IIS or a Windows Service?
I need a service which gets the information about drive paths from the database. I also have a configuration Windows application which needs to communicate with this service to check the drive path and its size; if it is invalid, the application will not configure the drive path, and if it is valid it will keep the entry in the database. Any client can have multiple directories, but only one directory will be active at a time so that I can store documents in it.
What about performance? And can I configure WCF in IIS so that IIS does not recycle the application pool? I want my service to run periodically, say every 30 minutes.
It seems to me a better architecture would be to have a service responsible for persisting your Documents; it can then decide where (and how) to store them and where to read them from, based on who is requesting them, how much disk space is available, etc. This way all your persistence implementation details are hidden from consumers - they only need to care about Documents, not how they are persisted.
As to how to host it... there is plenty of useful information out there documenting both:
IIS : here
Windows Service: here
Both would be more than capable of hosting such a service.
I would go with a Windows Service in this case. Unless I misunderstand, you want this all to happen with no human intervention, correct? So I don't see a contract, which means it's not a good candidate for WCF.
As I see it, both a Windows Service and an IIS-hosted service will work well in your scenario. Having said that, I would go with the Windows Service. It is partly a matter of feeling, but I guess you get a little more configuration support out of the box: I believe it is easier to configure what to do if the service fails to start, configure the user you want the service to run as, and so on.
But as I said, it is partly a matter of feeling.
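Whichever host you choose, the periodic check itself is small. A hypothetical sketch of the worker a Windows Service could run on a timer (every 30 minutes, per the comment above); the free-space threshold, folder naming, and database step are placeholders:

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Linq;

public class DriveMonitor
{
    private const long MinimumFreeBytes = 5L * 1024 * 1024 * 1024; // assumed 5 GB threshold

    // driveRootsByPriority comes from the database, e.g. ["A:\\", "B:\\"].
    public void EnsureActiveDrive(string[] driveRootsByPriority, string documentPath)
    {
        // The first drive, in priority order, with enough free space becomes active.
        var active = driveRootsByPriority
            .Select(root => new DriveInfo(root))
            .FirstOrDefault(d => d.IsReady && d.AvailableFreeSpace > MinimumFreeBytes);

        if (active == null)
            throw new InvalidOperationException("No configured drive has enough free space.");

        // Create this week's folder, e.g. A:\Documents\2024-W07 (naming is illustrative).
        var week = ISOWeek.GetWeekOfYear(DateTime.UtcNow);
        var weeklyFolder = Path.Combine(active.RootDirectory.FullName, documentPath,
                                        $"{DateTime.UtcNow:yyyy}-W{week:D2}");
        Directory.CreateDirectory(weeklyFolder);

        // TODO: write the active-drive change and the new folder path back to the database.
    }
}
```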