How do I replicate a huge amount of data for load balancing purposes?

I have a website with a huge amount of data (781 GB), and I'm now about to set up load balancing for it.
The problem is how to replicate the site's data to the other node. Would NFS or GlusterFS work in this case?
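For illustration, a minimal sketch of the GlusterFS route, assuming just two web nodes with hypothetical hostnames node1 and node2 and a brick directory at /data/brick1 (all names are placeholders, not from the question):

```bash
# On both nodes: install GlusterFS and start the management daemon.
sudo apt-get install -y glusterfs-server
sudo systemctl start glusterd

# On node1: add node2 to the trusted pool.
sudo gluster peer probe node2

# Create a 2-way replicated volume with one brick per node, then start it.
sudo gluster volume create webdata replica 2 \
    node1:/data/brick1/webdata node2:/data/brick1/webdata
sudo gluster volume start webdata

# On each web server: mount the replicated volume where the site lives.
sudo mount -t glusterfs node1:/webdata /var/www/html
```

Note that GlusterFS warns that a plain replica-2 volume is prone to split-brain; adding a third (arbiter) node is the usual remedy. Also budget time for the initial sync: replicating 781 GB onto a fresh brick is not instant.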

Related

Varnish x Apache CDN & HTTPS

I'm currently trying to set up a DIY CDN using Varnish, Nginx, and Apache.
This is the setup I have planned.
The following assumes:
1. The Varnish Origin server is on the same server as the web server (Apache in this case)
2. You have 2 Varnish Cache Servers located in different countries, one in NA, & one in EU
Example of an NA client attempting to retrieve data:
NA Client --> Varnish Origin Server --> NA Varnish Cache Server --> If result NOT IN cache --> Return Apache Origin Server results --> Input request data into NA Varnish Cache Server
Example of an EU client attempting to retrieve data:
EU Client --> Varnish Origin Server --> EU Varnish Cache Server --> If result IN cache --> Return EU Varnish Cache results
Any suggestions and/or mistakes? Where would I insert Nginx/HAProxy in order to terminate SSL, since Varnish doesn't accept HTTPS?
What you're suggesting is perfectly possible and has become an increasingly popular use case for us at Varnish Software.
Geolocation
First things first: I'm assuming all users, regardless of their location, will use the same hostname to connect to the CDN. Let's say the hostname is www.example.com.
US users should automatically connect to a Varnish edge node in the US, and EU users should be directed to one in the EU.
This geographical targeting requires some sort of GeoDNS approach. If your DNS provider can do this for you, things will be a lot easier.
If not, I know for a fact that AWS Route 53 does this. If you want to use open source technology, you can host the DNS zone yourself using https://github.com/abh/geodns/.
So if a US user does a DNS call on www.example.com, this should resolve to us.example.com. For EU users this will be eu.example.com.
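To illustrate the Route 53 variant, a hedged sketch using the AWS CLI; the hosted zone ID and the record values are placeholders:

```bash
# Sketch: geolocation routing in Route 53. Zone ID and names are placeholders.
cat > geo-records.json <<'EOF'
{
  "Changes": [
    {
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "www.example.com",
        "Type": "CNAME",
        "TTL": 60,
        "SetIdentifier": "north-america",
        "GeoLocation": { "ContinentCode": "NA" },
        "ResourceRecords": [ { "Value": "us.example.com" } ]
      }
    },
    {
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "www.example.com",
        "Type": "CNAME",
        "TTL": 60,
        "SetIdentifier": "europe",
        "GeoLocation": { "ContinentCode": "EU" },
        "ResourceRecords": [ { "Value": "eu.example.com" } ]
      }
    }
  ]
}
EOF
aws route53 change-resource-record-sets \
    --hosted-zone-id Z0000000EXAMPLE --change-batch file://geo-records.json
```

A production setup would also add a default record (GeoLocation CountryCode "*") so clients matching neither continent still resolve somewhere.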
Topology
Your setup connects from a local Varnish server to a remote Varnish server. That seems like one hop too many: if the geolocation works properly, the client will land directly on the Varnish server that is closest to it.
We call these geolocated servers "edge nodes". They will connect back to the origin server(s) in case requested content is not available in cache.
It's up to you to decide if one origin Apache will do, or if you want to duplicate your Apache servers in the different geographical regions.
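On an edge node, the origin is then just an ordinary backend in VCL; a minimal sketch with a placeholder hostname:

```bash
# Sketch: an edge node's VCL declaring the origin it fetches misses from.
cat > /etc/varnish/default.vcl <<'EOF'
vcl 4.1;

backend origin {
    .host = "origin.example.com";   # placeholder origin hostname
    .port = "80";
}
EOF
```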
SSL/TLS
My advice in terms of SSL/TLS termination: use Hitch. It's a dedicated TLS proxy that was developed by Varnish Software for use with Varnish. It's open source.
You can install Hitch on each Varnish server and accept HTTPS there. The connection between Hitch and Varnish can be done over Unix Domain Sockets, which further reduces latency.
Our tests show you can easily process 100 Gbps of TLS-terminated traffic on a single server with Hitch.
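A minimal sketch of that wiring, assuming Hitch 1.5+ (for Unix-socket backends) and Varnish 6.0+ (for Unix-socket listeners); the paths and certificate bundle are placeholders:

```bash
# Sketch: Hitch terminates TLS on :443 and hands requests to Varnish
# over a Unix domain socket, preserving the client IP via PROXY protocol.
cat > /etc/hitch/hitch.conf <<'EOF'
frontend = "[*]:443"
pem-file = "/etc/hitch/example.com.pem"   # private key + cert chain in one file
backend  = "/var/run/varnish.sock"        # Varnish's UDS listener
write-proxy-v2 = on
EOF

# Varnish listens on the matching socket (plus plain HTTP on :80).
varnishd -a :80 \
         -a /var/run/varnish.sock,PROXY,mode=666 \
         -f /etc/varnish/default.vcl
```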
Single tier Varnish or multi-tier Varnish
If your CDN requires a lot of storage, I'd advise you to setup a multi-tier Varnish setup in each geographical location:
The edge tier will be RAM-heavy and will cache the hot content in memory using the malloc stevedore in Varnish.
The storage tier will be disk-heavy and will cache long-tail content on disk using the file stevedore in Varnish.
Although the file stevedore is capable of caching terabytes of data, it is quite prone to disk fragmentation, which at very large scale will slow you down in the long run.
If you have tiered Varnish servers, you can tune each tier to its needs. Combined, the results will be quite good: although the file stevedore has its limitations, it will still be a lot faster than constantly accessing the origin when the cache of the edge servers is full.
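In varnishd terms, the difference between the two tiers is essentially which stevedore you pass to -s; a minimal sketch (the sizes, paths, and VCL file names are arbitrary):

```bash
# Edge tier: hot content in RAM via the malloc stevedore.
varnishd -a :80 -f /etc/varnish/edge.vcl -s malloc,48G

# Storage tier: long-tail content on disk via the file stevedore.
varnishd -a :80 -f /etc/varnish/storage.vcl \
         -s file,/var/lib/varnish/cache.bin,2T
```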
Varnish Software's DIY CDN solution
Varnish Software, the company behind the Varnish Cache project, has done many CDN integration projects for some of the world's biggest web platforms.
Varnish Cache, the open source project, is the foundation of these CDN solutions. However, typical CDN clients have some extra requirements that are not part of the open source solution.
That's why we developed Varnish Enterprise, to tackle these limitations.
Have a look at Varnish Software's DIY CDN solution to learn more, and at the docs describing the extra features of the product.
If you want to try these features without buying a license up front, you can experiment with the Varnish Enterprise images in the cloud.
We have an AWS image available on the AWS marketplace.
We have an Azure image available on the Azure marketplace.
We have a GCP image available on the GCP marketplace.
Our most significant CDN feature in Varnish Enterprise is the Massive Storage Engine. It was built specifically to address the limitations of the file stevedore, which is prone to disk fragmentation and is not persistent.
There's a lot of other cool stuff in Varnish Enterprise for CDN as well, but you'll find that on the docs pages I referred to.

AWS S3 and AWS ELB instead of AWS Elastic Beanstalk for an SPA Angular 6 application

I am creating an Angular 6 frontend application. My backend APIs are written in .NET. Assume the application is similar to https://www.amazon.com/.
My question relates only to deploying the frontend portion on AWS. A large number of users, with a highly variable traffic pattern, is expected on my portal. I thought of using AWS Elastic Beanstalk as a PaaS web server.
Can AWS S3/ELB be used instead of PaaS Beanstalk without any limitations?
I'm not 100% sure what you mean by combining an Elastic Load Balancer with S3. I think you may be confused as to the purpose of the ELB, which is to distribute requests across multiple servers (e.g. Node.js servers); it cannot be used with S3, which is already highly available.
There are numerous options when serving an Angular app:
You could serve the files from a Node.js app, but unless you are doing server-side rendering (using Angular Universal) I don't see the point, because you are just serving static files (files that don't get stitched together by a server, as they would with PHP). It is more complicated to deploy and maintain a server, even with Elastic Beanstalk, and it is probably difficult to match the performance of the other setups below.
What I suspect most people would do is configure an S3 bucket to host and serve the static files of your Angular app (https://docs.aws.amazon.com/AmazonS3/latest/dev/WebsiteHosting.html). You basically configure your domain name to resolve to the S3 bucket's URL. This is extremely cheap, as you are not paying for a server that runs constantly; you only pay the small storage cost plus a data-transfer fee proportional to your traffic.
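A hedged sketch of that setup with the AWS CLI; the bucket name and build directory are placeholders, and the bucket additionally needs a public-read bucket policy for website hosting:

```bash
# Sketch: host a built Angular app from an S3 website bucket.
aws s3 mb s3://my-angular-app

# For an SPA, returning index.html on 404 keeps client-side routing working.
aws s3 website s3://my-angular-app \
    --index-document index.html --error-document index.html

# Upload the production build.
aws s3 sync dist/my-app/ s3://my-angular-app --delete
```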
You can further improve on the S3 setup by creating a CloudFront distribution that uses your S3 bucket as its origin (the location it gets files from). When you configure your domain name to resolve to your CloudFront distribution, a user's request no longer goes straight to the S3 bucket (which could be in a region on the other side of the world, and so slower); instead it is directed to the closest "edge location", which is much nearer to your user and is checked for cached copies of the files first. It is basically a global content delivery network for your files. This is a bit more expensive than S3 on its own. See https://aws.amazon.com/premiumsupport/knowledge-center/cloudfront-serve-static-website/.
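Creating the distribution can be a one-liner (again a sketch; the bucket domain is a placeholder, and you would still point your DNS at the resulting distribution):

```bash
# Sketch: put CloudFront in front of the S3 bucket.
aws cloudfront create-distribution \
    --origin-domain-name my-angular-app.s3.amazonaws.com \
    --default-root-object index.html
```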

Indexing documents to a SolrCloud cluster inside an Amazon VPC

I am moving my SolrCloud machines into an Amazon VPC, and the machine from which I am going to index data into Solr is outside the VPC. As far as I know, Solr's bin/post utility doesn't provide a way to proxy requests to the Solr machines.
So how do I index my files into Solr from a machine that is outside the VPC network?
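One hedged workaround, not from the original thread: tunnel through a bastion host inside the VPC with SSH port forwarding and point bin/post at the local end of the tunnel. The hostnames and collection name below are placeholders.

```bash
# Sketch: forward local port 8983 through a bastion inside the VPC
# to a Solr node, then index through the tunnel.
ssh -N -L 8983:solr-node.internal:8983 user@bastion.example.com &

# bin/post accepts an explicit update URL, so aim it at the tunnel.
bin/post -url http://localhost:8983/solr/mycollection/update docs/
```

In SolrCloud, the node that receives the documents forwards them to the right shard leaders, so reaching a single node through the tunnel is enough.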

How can I create a shared /home dir across my Amazon EC2 servers

I have a cluster of EC2 servers spun up with Ubuntu 12.04. This will be a dev environment where several developers will be ssh-ing in. I would like to set it up where the /home directory is shared across all 4 of these servers. I want to do this to A) ease the deployment of the servers, and B) make it easier on the devs so that everything in their homedir is available to them on all servers.
I have seen this done in the past with a NetApp network attached drive, but I can't seem to figure out how to create the equivalent using AWS components.
Does anyone have an idea of how I can create this same setup using Amazon services?
You'll probably need to have one server host an NFS share to store the home directories. I'd try the approach described in this answer: https://serverfault.com/questions/19323/is-it-feasible-to-have-home-folder-hosted-with-nfs.
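A minimal sketch of that layout, assuming one instance acts as the NFS server (hypothetical private IP 10.0.0.10) and exports /home to the VPC subnet:

```bash
# On the NFS server: export /home (addresses are placeholders).
sudo apt-get install -y nfs-kernel-server
echo "/home 10.0.0.0/24(rw,sync,no_subtree_check)" | sudo tee -a /etc/exports
sudo exportfs -ra

# On each of the other servers: mount it at boot.
sudo apt-get install -y nfs-common
echo "10.0.0.10:/home /home nfs defaults 0 0" | sudo tee -a /etc/fstab
sudo mount -a
```

Keep UIDs and GIDs consistent across the four servers, otherwise file ownership in the shared home directories won't line up.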

404 redirect with cloud storage

I'm hoping to reach someone with experience using a service like Amazon's S3 with this question. On my site we have a dedicated image server, and on this server we have an automatic 404 redirect through Apache so that, if a user tries to access an image that doesn't exist, they see a snazzy "Image Not Available" image.
We're looking to move the hosting of these images to a cloud storage solution (S3 or Rackspace's CloudFiles), and I'm wondering if anyone's had any success replicating this behavior on a cloud storage service and if so how they did it.
The Amazon instances are just like normal hosted server instances once they are up and running, so your Apache configuration could presumably be identical to what you currently have.
Your only issue will be where to store the images. Amazon Elastic Block Store makes it easy to attach a persistent volume (whose snapshots are stored in S3 behind the scenes). You could store all your images on such a volume and use it with your Apache instance.
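For the Apache side, the 404-image behaviour the question describes comes down to a single directive; a minimal sketch on a modern Debian/Ubuntu Apache (paths and file names are placeholders):

```bash
# Sketch: serve a fallback image for any missing file under the image docroot.
cat > /etc/apache2/conf-available/image-404.conf <<'EOF'
<Directory "/var/www/images">
    ErrorDocument 404 /image-not-available.png
</Directory>
EOF
sudo a2enconf image-404
sudo systemctl reload apache2
```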