Should I create a separate file upload server besides main graphql server? [closed] - file-upload

I'm creating a mobile application with React Native which is going to depend heavily on file uploading in the form of images and videos. Currently my GraphQL server handles all the database interaction, and I now want to add the functionality to upload images (right now only profile pictures, later videos). These files will be stored in cloud object storage.
It would be quite easy to use apollo-upload-client in the mobile application and graphql-upload with my Apollo server to handle the uploading of files. But I'm not sure whether I should create a separate server which only handles the interaction with files, so that my main server only needs to handle the DB jobs. File uploading would also add a huge amount of load to the main GraphQL server, which needs to stay fast and responsive since most of the application depends on it.
I would be interested to hear other opinions and tips on this topic, and whether it's worth creating a separate server for interaction with files.
Or should I even look into different languages like Elixir or Python to improve performance, since we would also need to process and compress the videos and images to reduce their size?

IMO, if your final destination is cloud-based storage, you're going to be better off (and pay less) if you upload directly to the cloud. What I generally recommend is a 3-step process:
1. Mutation to create a signed upload URL (or signed upload form)
2. Client uploads directly to the cloud (temporary location with a TTL)
3. Mutation to process the form which contained the upload metadata (process and move to the final location)
This especially gets interesting once you start handling multiple uploads and figuring out how to process them asynchronously while the user is filling out the rest of the form.
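A rough client-side sketch of that flow in TypeScript, assuming Apollo Client on the React Native side; the mutation names (createUploadUrl, finalizeUpload), their fields, and the image/jpeg content type are placeholders invented for illustration, not a prescribed API:

```typescript
import { ApolloClient, NormalizedCacheObject, gql } from "@apollo/client";

// Step 1: ask the GraphQL server for a signed upload URL (hypothetical mutation).
const CREATE_UPLOAD_URL = gql`
  mutation CreateUploadUrl($filename: String!, $contentType: String!) {
    createUploadUrl(filename: $filename, contentType: $contentType) {
      uploadUrl # signed URL with a short TTL
      key       # temporary object key in cloud storage
    }
  }
`;

// Step 3: tell the server the upload finished (hypothetical mutation).
const FINALIZE_UPLOAD = gql`
  mutation FinalizeUpload($key: String!) {
    finalizeUpload(key: $key) {
      id
      url
    }
  }
`;

async function uploadProfilePicture(
  client: ApolloClient<NormalizedCacheObject>,
  file: Blob,
  filename: string
) {
  // Step 1: get the signed URL from the GraphQL server.
  const { data } = await client.mutate({
    mutation: CREATE_UPLOAD_URL,
    variables: { filename, contentType: "image/jpeg" },
  });

  // Step 2: upload directly to the cloud, bypassing the GraphQL server.
  await fetch(data.createUploadUrl.uploadUrl, {
    method: "PUT",
    headers: { "Content-Type": "image/jpeg" },
    body: file,
  });

  // Step 3: let the server process the file and move it to its final location.
  return client.mutate({
    mutation: FINALIZE_UPLOAD,
    variables: { key: data.createUploadUrl.key },
  });
}
```

The key point is that the file body in step 2 never passes through the GraphQL server, so the main API stays responsive.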

Related

Cost effective solution to upload large files on S3 using application running in EC2 instance [closed]

I have an application running on an EC2 instance. Using that application, a user can upload large files (10 MB+) to an S3 bucket. The current mechanism copies the file onto the EC2 instance and then uploads it to the S3 bucket. It is costly because the file is copied twice (first to EC2, then to the S3 bucket). I have tried another solution in order to avoid copying the file twice.
API-gateway ==>> lambda function ==>> S3 bucket.
API Gateway has a payload limit of 10 MB, i.e. the file size must be less than 10 MB.
I thought of splitting the file into small pieces, but then Lambda has to reassemble and zip them (which again takes time).
The other solution is S3 pre-signed URLs, but that also seems costly. I need an effective solution for this problem.
The solutions I found were either not cost effective or too time consuming.
S3 pre-signed URLs are the way to go, as luk2302 mentioned.
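For illustration, generating such a pre-signed PUT URL with the AWS SDK for JavaScript v3 looks roughly like this (bucket name, region, and expiry are placeholders):

```typescript
import { S3Client, PutObjectCommand } from "@aws-sdk/client-s3";
import { getSignedUrl } from "@aws-sdk/s3-request-presigner";

const s3 = new S3Client({ region: "us-east-1" }); // placeholder region

export async function createPresignedUploadUrl(key: string): Promise<string> {
  const command = new PutObjectCommand({
    Bucket: "my-upload-bucket", // placeholder bucket name
    Key: key,
  });
  // The client PUTs the file straight to S3 with this URL, so the large
  // payload never passes through EC2, API Gateway or Lambda.
  return getSignedUrl(s3, command, { expiresIn: 900 }); // valid for 15 minutes
}
```

Generating the URL itself costs nothing beyond the normal S3 PUT request; the saving comes from the file no longer being copied onto EC2 first.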

AWS S3 to Glacier: did backup work? [closed]

I am experimenting with backing up the data in my Amazon S3 folders to Glacier using lifecycle management options. I chose one of the folders in the bucket for testing and created a lifecycle rule stating that objects with that prefix should be migrated to Glacier after 30 days. I created the rule today, but these files are all older than 30 days, so I expected them to be migrated right away. However, I am looking at that S3 folder and not noticing any changes. How do I find out whether a backup actually occurred?
The lifecycle management policy (LMP) you applied will affect all items matching it, whether they existed before you applied the policy or were created after you applied it. It takes time for the policy to synchronize across all of your items in S3. See Object Lifecycle Management just before and after "Before You Decide to Archive Objects".
The objects moved by a LMP are only visible through the S3 API, not via the Glacier API or console. You'll continue to see the objects listed in your S3 bucket, but the object's metadata will be updated to indicate that the x-amz-storage-class is Glacier. You should be able to see this through the S3 console, or by making a request for the object's metadata using the S3 API. See Object Key and Metadata for the System-Defined Metadata.
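As a sketch of that metadata check with today's AWS SDK for JavaScript v3 (which postdates the original answer; bucket and key are placeholders):

```typescript
import { S3Client, HeadObjectCommand } from "@aws-sdk/client-s3";

const s3 = new S3Client({ region: "us-east-1" }); // placeholder region

// Read the object's storage class from the S3 API; it reports "GLACIER"
// once the lifecycle rule has archived the object.
async function printStorageClass(bucket: string, key: string) {
  const head = await s3.send(
    new HeadObjectCommand({ Bucket: bucket, Key: key })
  );
  // StorageClass is typically omitted for STANDARD objects.
  console.log(`${key}: ${head.StorageClass ?? "STANDARD"}`);
}

printStorageClass("my-bucket", "backups/example.dat"); // placeholder names
```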

How to make a daily back up of my ec2 instance? [closed]

I have a community-AMI-based Linux EC2 instance in AWS. Now I want to take a daily backup of my instance and upload that image to S3.
Is that the correct way to back up my EC2 instance? Can anybody point out the correct method for backing up my EC2 instance?
Hopefully your instance is EBS backed.
If so, you can backup your instance by taking an EBS Snapshot. That can be done through aws.amazon.com (manually), using AWS Command Line Tools (which can be automated and scheduled in cron or Windows Task Scheduler as appropriate) or through the AWS API.
You want to ensure that no changes are made to the state of the database backup files during the snapshot process. When I used this strategy for MySQL running on Ubuntu, I used a script to ensure a consistent snapshot. That script uses a feature of the XFS file system to freeze the filesystem during the snapshot. In that deployment, the snapshot only took 2-3 seconds and was performed at a very off-peak time; any website visitors would experience a 2-3 second lag. For Windows, if the device cannot be rebooted for the snapshot (you have no maintenance window at night), I would instead create a separate EBS device (e.g. an "S:\" device for snapshots), use SQL Server backup tools to create a .bak file on that other device, then create an EBS snapshot of that separate EBS device.
For details on scripting the backup, see this related question:
Automating Amazon EBS snapshots anyone have a good script or solution for this on linux
If you have separate storage mounted e.g. for your database, be sure you back that up too!
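If it helps, here is a minimal sketch of creating a snapshot programmatically with the AWS SDK for JavaScript v3 rather than the command line tools mentioned above; the volume ID is a placeholder, scheduling (cron etc.) is up to you, and the filesystem-freeze caveat above still applies:

```typescript
import { EC2Client, CreateSnapshotCommand } from "@aws-sdk/client-ec2";

const ec2 = new EC2Client({ region: "us-east-1" }); // placeholder region

// Start an EBS snapshot of the given volume and log its ID.
async function snapshotVolume(volumeId: string) {
  const result = await ec2.send(
    new CreateSnapshotCommand({
      VolumeId: volumeId,
      Description: `Daily backup ${new Date().toISOString()}`,
    })
  );
  console.log(`Started snapshot ${result.SnapshotId} for ${volumeId}`);
}

snapshotVolume("vol-0123456789abcdef0"); // placeholder volume ID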
UPDATE
To create a snapshot manually,
Browse to https://console.aws.amazon.com/ec2/home?#s=Volumes
Right-click on the volume you want to backup (the instance the volume is attached to is in the column named 'Attachment Information')
Select Create Snapshot
To create an AMI image from the instance and launch other instances just like it (on instances with more resources, to balance load, etc.):
Browse to https://console.aws.amazon.com/ec2/home?#s=Instances
Right-click on the instance you want to create the AMI from
Select Create Image (EBS AMI)
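The rough SDK equivalent of those console steps, assuming the AWS SDK for JavaScript v3 (instance ID and image name are placeholders):

```typescript
import { EC2Client, CreateImageCommand } from "@aws-sdk/client-ec2";

const ec2 = new EC2Client({ region: "us-east-1" }); // placeholder region

// Create an AMI from a running instance, equivalent to "Create Image (EBS AMI)".
async function createAmi(instanceId: string) {
  const result = await ec2.send(
    new CreateImageCommand({
      InstanceId: instanceId,
      Name: `backup-${Date.now()}`, // placeholder image name
      NoReboot: true, // skip the reboot; see the consistency caveats above
    })
  );
  console.log(`Created AMI ${result.ImageId}`);
}

createAmi("i-0123456789abcdef0"); // placeholder instance ID
```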

How to detect changes in Amazon S3? [duplicate]

Possible Duplicate:
Notification of new S3 objects
Get notified when user uploads to an S3 bucket?
What's the most efficient way to detect changes in Amazon S3? A number of distributed boxes need to synchronize local files with S3. Each box needs to synchronize with a portion of an S3 bucket. Sometimes files get dropped into a bucket from an external source, and so the boxes won't know about it.
I could write a script that continually crawls all files on S3 and notifies the appropriate box when there is a change, but that will be slow and expensive. (There will be millions of files). I thought about enabling logging on the bucket, but it takes a long time for logs to get written, and I would like to get notified of changes fairly quickly.
Any other ideas?
Amazon provides a means of being notified of bucket events (as seen here), but the only event currently supported is s3:ReducedRedundancyLostObject.
I am afraid the only ways you can do what you want, today, are by either polling (or crawling, like you said) or modifying the clients who upload files to your bucket(s) (if you are in control of their code) in order to notify your boxes whenever stuff is uploaded/changed.
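A minimal sketch of the polling approach with today's AWS SDK for JavaScript v3 (the bucket and prefix are placeholders, and with millions of objects this is exactly the slow, expensive crawl the question wants to avoid):

```typescript
import { S3Client, ListObjectsV2Command } from "@aws-sdk/client-s3";

const s3 = new S3Client({ region: "us-east-1" }); // placeholder region
const seen = new Map<string, string>(); // object key -> last seen ETag

// List everything under a prefix and report objects whose ETag changed
// since the previous poll.
async function pollForChanges(bucket: string, prefix: string) {
  let token: string | undefined;
  do {
    const page = await s3.send(
      new ListObjectsV2Command({
        Bucket: bucket,
        Prefix: prefix,
        ContinuationToken: token,
      })
    );
    for (const obj of page.Contents ?? []) {
      if (obj.Key && obj.ETag && seen.get(obj.Key) !== obj.ETag) {
        console.log(`changed: ${obj.Key}`);
        seen.set(obj.Key, obj.ETag);
      }
    }
    token = page.NextContinuationToken;
  } while (token);
}

pollForChanges("my-bucket", "box-01/"); // placeholder bucket and prefix
```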

Server Load & Scalability for Massive Uploads

I want to let users upload millions of audio items to my server. The current app is designed to take the content, transcode it, and finally send it by FTP to storage servers. I want to know:
1. Can the app server bear the enormous user tasks like commenting, uploading, and transcoding after scaling to more servers (to carry the web app load)?
2. If the answer to the above question is yes, is that the correct and best approach? A good architecture would be to send transcoding to the storage servers, wait for the job to finish, and send the response back to the app server, but at the same time that has more complexity and insecurity.
3. What is the common method for this type of website?
4. If I send the upload and transcoding jobs to the storage servers, is that compatible with enterprise storage technologies for long-term scalability?
5. The current app is based on PHP. Is it possible to move the tmp folder to other servers to overcome the upload overload?
Thanks for the answers. Regarding the tmp folder in question 5: I mean Apache's tmp folder. I know that all uploaded files are stored in Apache's tmp folder before being moved to their final storage destination (e.g. the storage servers or any other solution). I was wondering whether this is a rule for Apache, i.e. whether all uploaded files must first land on the app server, and if so, how I can control, scale, and redirect this massive storage load to a temporary storage server. I mean a server or storage solution that acts like Apache's tmp folder, just hosting the uploaded files before they are sent to their final storage places. I have studied and designed everything around scaling the database, storage, load balancing, memcache, etc., but this is one of my unsolved questions: where will files newly arriving from users land in a scaled architecture, and what is the common solution for this? (In a one-box solution all files sit temporarily in Apache's tmp dir, but what about a massive amount of content in a scaled system?)
Regards
You might want to take a look at the Viddler architecture: http://highscalability.com/blog/2011/5/10/viddler-architecture-7-million-embeds-a-day-and-1500-reqsec.html
Since I don't feel I can answer this fully (I wanted to add a comment, but my text was too long), here are some thoughts:
If you are creating such a large system (as it sounds), you should run performance tests to see how many concurrent connections/uploads (or whatever) your architecture can handle. As I always say: if you don't know, the answer is "no, it can't".
I think the best way to deal with heavy load (that is, a lot of uploads requiring a lot of blocked threads on the app server) is not to use the app server to handle the file uploads at all. Perform all your heavy operations (transcoding) asynchronously, e.g. queue the uploaded files and process them afterwards. In any case the application server should not wait for the response of the transcoding system: just tell the user that their files are going to be processed and send them a message (or whatever) when it's finished. You can use something like Gearman for that.
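A very small, purely illustrative sketch of that queue-and-worker split in TypeScript; a real deployment would use a persistent queue (Gearman, SQS, RabbitMQ, ...) rather than the in-memory array and stand-in transcode() function used here:

```typescript
type UploadJob = { fileId: string; path: string };

const queue: UploadJob[] = []; // stand-in for a real persistent queue

// Called by the upload handler: enqueue and respond to the user immediately.
export function enqueueUpload(job: UploadJob): void {
  queue.push(job);
  console.log(`queued ${job.fileId}; tell the user it will be processed`);
}

// A separate worker process/loop drains the queue asynchronously,
// so the heavy work never blocks the request path.
export async function workerLoop(
  transcode: (job: UploadJob) => Promise<void>
) {
  while (true) {
    const job = queue.shift();
    if (!job) {
      await new Promise((r) => setTimeout(r, 1000)); // idle wait
      continue;
    }
    await transcode(job);
    console.log(`finished ${job.fileId}; notify the user`);
  }
}
```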
I would search for existing architectures that have to handle a lot of uploads/conversions too (e.g. Flickr); just go to SlideShare and search for "flickr" or "scalable web architecture".
I do not really understand this, but I would split servers based on their tasks (e.g. application servers, database servers, transcoding servers, storage, ...); each server should do what it can do best.
I am afraid I don't know what you are talking about when you say tmp folder.
Good luck