Google Compute Engine Load Balancer limits

I'm thinking of using Google Compute Engine to run a LOT of instances in a target pool behind a network load balancer. Each of those instances will end up processing many large data streams in real time, so at full scale and peak times multiple terabytes per second might go through.
Question:
Is there a quota or limit on the data you can push through those load balancers? Is there a limit on the number of instances you can have in a target pool? (The documentation does not seem to specify this.)
It seems like load balancers have a dedicated IP; does that mean it's a single machine?

There's no limit on the amount of data that you can push through a LB. As for instances, there are default quotas on CPUs and persistent or SSD disks; you can see those quotas in the Developers Console at 'Compute' > 'Compute Engine' > 'Quotas', and you can always request a quota increase. You can have as many instances as you need in a target pool. Take a look at the Compute Engine Autoscaler, which will help you spin up machines as your service needs them. The single IP provided for your LB is in charge of distributing incoming traffic across your multiple instances.
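As a sketch of that setup, with all resource names (my-pool, my-template, my-group) being placeholders rather than anything from the question, the target pool, managed instance group, and autoscaler can be wired together with gcloud roughly like this:

```
# Create the target pool the network LB forwards to.
gcloud compute target-pools create my-pool --region us-central1

# Create a managed instance group whose instances join that pool.
gcloud compute instance-groups managed create my-group \
    --zone us-central1-b --template my-template \
    --size 2 --target-pool my-pool

# Let the autoscaler add replicas as CPU load grows.
gcloud compute instance-groups managed set-autoscaling my-group \
    --zone us-central1-b --max-num-replicas 50 \
    --target-cpu-utilization 0.75
```

The autoscaler then grows the group toward the replica cap as utilization rises, and the target pool picks up the new instances automatically.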

Related

does ceph RBD have the ability to load balance?

I don't know much about Ceph. As far as I know, RBD is Ceph's distributed block storage device, and the same data should be stored on several computers that make up the Ceph cluster. So, does this distributed block device (Ceph RBD) have the ability to load balance? In other words, if multiple clients (in my situation, QEMU) use this RBD block storage and they read the same data at the same time, will Ceph RBD balance the traffic so that different computers in the cluster send data to the clients simultaneously, or will just one computer send its data to multiple clients? If I have a Ceph cluster composed of 6 computers and a Ceph cluster composed of 3 computers, is there any difference in the performance of their RBDs?
It's not load balancing per se, but the distributed nature of Ceph allows many clients to be served in parallel. If we focus on replicated pools with a size of 3, there are 3 different disks (on different hosts) storing the exact same object. But there's always a primary OSD which forwards write requests to the other copies. This makes write requests a little slower, but read requests are served only by the primary OSD, so reading is much faster than writing. And since clients "talk" directly to the OSDs (they get the addresses from the MONs), many clients can be served in parallel, especially because the OSDs don't store an RBD as a single object but split it into many objects grouped into "Placement Groups".
However, if you really mean the exact same object being read by multiple clients, you have to know that there are watchers on RBDs which lock them so only one client can change data. If you could describe your scenario in more detail, we could provide more information.
If I have a ceph cluster composed of 6 computers and a ceph cluster composed of 3 computers. Is there any difference in the performance of these RBD?
It depends on the actual configuration (a reasonable number of PGs, CRUSH rules, network, etc.), but in general the answer is yes: the more Ceph nodes you have, the more clients you can serve in parallel. Ceph may not have the best performance compared to other storage systems (depending on the actual setup, of course), but it scales so well that performance stays the same with an increasing number of clients.
https://ceph.readthedocs.io/en/latest/architecture/
does ceph RBD have the ability to load balance?
Yes, it does. For RBD there's the rbd_read_from_replica_policy option:
"… Policy for determining which OSD will receive read operations. If set to default, each PG’s primary OSD will always be used for read operations. If set to balance, read operations will be sent to a randomly selected OSD within the replica set. If set to localize, read operations will be sent to the closest OSD as determined by the CRUSH map
…
Note: this feature requires the cluster to be configured with a minimum compatible OSD release of Octopus. …"
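As a sketch, assuming your cluster already meets that Octopus requirement, the option can be set for all clients via the monitors' central config:

```
# Requires a minimum compatible OSD release of Octopus.
ceph config set client rbd_read_from_replica_policy balance
```

With "balance" set, read operations are spread across the OSDs in each replica set instead of always hitting the primary, which is the closest RBD gets to read load balancing.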

How to increase queries per minute of Google Cloud SQL?

As in the question, I want to increase the number of queries per second on GCS. Currently, my application runs on my local machine; it repeatedly sends queries to and receives data back from the GCS server. More specifically, my location is in Vietnam, and the server (free tier, though) is in Singapore. The maximum QPS I can get is ~80, which is unacceptable. I know I could get better QPS by putting my application on the cloud, in the same location as the SQL server, but that alone requires a lot of configuration and work. Are there any solutions for this?
Thank you in advance.
Colocating your application front-end layer with the data persistence layer should be your priority: deploy your code to the cloud as well.
Use persistent connections/connection pooling to cut down on connection establishment overhead.
Free tier instances for Cloud SQL do not exist. What are you referring to here? f1-micro GCE instances are not free in the Singapore region either.
Depending on the complexity of your queries, read/write pattern, size of your dataset, etc., the performance of your DB could be I/O bound. Ensuring your instance is provisioned with SSD storage and/or increasing the data disk size can help lift IOPS limits, further improving DB performance.
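A back-of-envelope model shows why colocation is the first item on the list: with a single connection issuing queries serially, each query pays at least one network round trip, so QPS is capped near 1/RTT no matter how fast the database is. The round-trip numbers below are illustrative assumptions, not measurements:

```python
def max_serial_qps(rtt_ms, server_ms=0.0):
    """Upper bound on queries/second for one connection that issues
    queries one at a time: each query costs one network round trip
    plus server processing time."""
    return 1000.0 / (rtt_ms + server_ms)

# Assumed ~12 ms round trip from Vietnam to Singapore:
print(round(max_serial_qps(12)))    # in the ballpark of the observed ~80 QPS
# Colocated in the same region, with sub-millisecond round trips:
print(round(max_serial_qps(0.5)))   # orders of magnitude more headroom
```

Connection pooling removes the extra round trips of TCP and auth handshakes per query, but only moving the application next to the database removes the per-query RTT itself.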
Side note: don't confuse commonly used abbreviation GCS (Google Cloud Storage) with Google Cloud SQL.

How to handle resource limits for apache in kubernetes

I'm trying to deploy a scalable web application on google cloud.
I have a Kubernetes deployment which creates multiple replicas of Apache+PHP pods. These have CPU/memory resources/limits set.
Let's say that the memory limit per replica is 2 GB. How do I properly configure Apache to respect this limit?
I can modify the maximum process count and/or maximum memory per process to prevent memory overflow (so the replicas will not be killed because of OOM). But this creates a new problem: these settings will also limit the maximum number of requests that my replica can handle. In the case of a DDoS attack (or just more traffic), the bottleneck could be the maximum process limit, not the memory/CPU limit. I think this could happen pretty often, as these limits are set for the worst-case scenario, not based on average traffic.
I want to configure the autoscaler so that it will create multiple replicas in case of such an event, not only when CPU/memory usage is near the limit.
How do I properly solve this problem? Thanks for help!
I would recommend doing the following instead of trying to configure Apache to limit itself internally:
Enforce resource limits on pods. i.e let them OOM. (but see NOTE*)
Define an autoscaling metric for your deployment based on your load.
Set up a namespace-wide resource quota. This enforces a cluster-wide limit on the resources pods in that namespace can use.
This way you can let your Apache+PHP pods handle as many requests as possible until they OOM, at which point they respawn and join the pool again, which is fine* (because hopefully they're stateless), and at no point does your overall resource utilization exceed the resource limits (quotas) enforced on the namespace.
* NOTE: This is only true if you're not doing fancy stuff like WebSockets or stream-based HTTP, in which case an OOMing Apache instance takes down the other clients that are holding an open socket to it. If you want, you can always enforce limits on Apache in terms of the number of threads/processes it runs anyway, but it's best not to unless you have a solid need for it. With this kind of setup, no matter what you do, you'll not be able to evade DDoS attacks of large magnitude. You're either going to have broken sockets (in the case of OOM) or request timeouts (not enough threads to handle requests). You'd need far more sophisticated networking/filtering gear to prevent "good" traffic from taking a hit.
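The autoscaling and quota pieces above can be sketched in YAML; every name, replica count, and limit below is a placeholder you would replace with your own values, not something taken from the question:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: apache-php          # placeholder name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: apache-php        # your Apache+PHP deployment
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: web-quota
  namespace: web            # placeholder namespace
spec:
  hard:
    requests.cpu: "20"
    requests.memory: 40Gi
    limits.cpu: "40"
    limits.memory: 80Gi
```

The HPA adds replicas under load (so traffic spikes scale out instead of queueing inside Apache), while the ResourceQuota caps what the namespace as a whole can consume even at maxReplicas.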

Uneven cache hits

I have integrated twemproxy into the web layer, and I have 6 ElastiCache nodes (1 master, 5 read replicas). The issue is that all replicas have the same keys and everything else is identical, but cache hits on one replica are far higher than on the others. I performed several load tests, and every test gives the same result. I have a separate data engine that writes to the master of this cluster, and the remaining 5 replicas sync with it. So I am using twemproxy only for reading data from ElastiCache, not for sharding. My simple question is: why am I getting 90% of hits on a single read replica? It should distribute the hits evenly among all read replicas, right?
Thank you in advance
Twemproxy hashes everything, as I recall. This means it will try to split keys among the masters you give it. If you have one master, this means it hashes everything to one server. Thus, as far as it is concerned, you have one server to send queries to. As such, it isn't helping you in this case.
If you want a single endpoint that distributes reads across a bank of identical slaves, you will need to put a TCP load balancer in front of the slaves and have your application talk to the IP:port of the load balancer. Common software options are Nginx and HAProxy; on AWS you could use their load balancer (though you could run into various resource limits out of your control there), and pretty much any hardware load balancer would work as well (though that is difficult if not impossible on AWS).
Which load balancer to use depends on your (or your personnel's) comfort and knowledge level with each option.
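As a minimal sketch of the HAProxy route, with the replica hostnames being placeholders for your actual ElastiCache read replica endpoints:

```
# TCP-level round-robin across identical read replicas.
listen redis-read
    bind *:6379
    mode tcp
    balance roundrobin
    # 'check' health-checks each replica and ejects dead ones.
    server replica1 replica1.cache.example.com:6379 check
    server replica2 replica2.cache.example.com:6379 check
    server replica3 replica3.cache.example.com:6379 check
```

Your application then connects to the HAProxy IP:port as if it were a single Redis endpoint, and reads are spread evenly across the replicas.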

How can I monitor the bandwidth of an object in the Amazon S3 service?

How can I programmatically monitor the bandwidth of an object in the AWS S3 service? I would like to do this to prevent excessive bandwidth usage by clients who are using our services and costing us more than we can afford. We would like to limit each object to 1 TB of bandwidth.
The detailed usage reports are per bucket only, not per object.
What you could do is enable logging and parse the logs once an hour or so. It's certainly not instant, but it would prevent people from going way over your usage limits.
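A sketch of that log-parsing approach in Python, assuming the documented S3 server access log field order (object key in the eighth field, bytes sent right after the HTTP status and error code); the sample line in the usage below is fabricated for illustration:

```python
import re
from collections import defaultdict

# One token per log field: bracketed timestamps and quoted strings
# count as single fields; everything else splits on whitespace.
TOKEN = re.compile(r'\[[^\]]*\]|"[^"]*"|\S+')

def bytes_per_key(log_lines):
    """Sum the 'bytes sent' field per object key across S3 server
    access log lines."""
    totals = defaultdict(int)
    for line in log_lines:
        fields = TOKEN.findall(line)
        if len(fields) < 13:
            continue  # truncated or unexpected line
        key, bytes_sent = fields[7], fields[11]
        if bytes_sent.isdigit():  # field is '-' when nothing was sent
            totals[key] += int(bytes_sent)
    return dict(totals)
```

You could run this hourly over newly delivered log files, keep a running total per key, and cut off access (e.g. via a bucket policy change) once a key crosses the 1 TB threshold.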
Also, s3stat is a good option up to a point. Once you start doing more than ~ 50 million requests per month, they have trouble crunching the data.