I know SSL has a performance hit on your HTTP communication in terms of speed but is there much of a difference in the amount of data transferred?
i.e., if a mobile device is paying a lot per KB, is there a huge difference? Does anyone have an estimate of how much of a difference?
Thanks for the help!
Matt
No, there is not much of a difference, neither in terms of "performance" nor in terms of bandwidth.
According to Google, a company one would hope is a reliable source on large-scale networking, the network-bandwidth overhead is less than 2%.
As Borealid pointed out, the overhead is usually small for an average request (and that extends to multi-megabyte files).
However, if you are calling something like a RESTful API, make sure a persistent connection is used; otherwise, with small request bodies, SSL adds significant overhead. I can't give exact numbers (they vary with certificate size and the number of certificates in the chain), but if you have to establish an SSL session to send a 200-byte request and receive a 2 KB response, the SSL handshake can easily add another 5-7 KB, so the overhead is obvious.
I just did a test using wireshark, downloading a 5-byte file from Amazon S3 over http and https to an iPad using a simple NSURLConnection request.
For http, total traffic was 1310 bytes.
For https, total traffic was 7099 bytes.
This was just for a single download in each case, and includes all back-and-forth over-the-wire traffic associated with the request, including DNS (about 200 bytes) and TCP handshaking (about 400 bytes for the http case).
Obviously the actual totals would change according to URL length and your particular SSL certificate; you could certainly have leaner headers than S3 delivers.
In theory, the SSL bandwidth overhead for a 1 MB file should be about the same as for a 1-byte file, i.e. about 5800 bytes in the above example, as encryption shouldn't increase the size of the data transmitted beyond the initial certificate and key exchange. So for large files it's negligible, but for small files it can be significant, as pointed out by Eugene.
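The arithmetic behind that conclusion can be sketched in a few lines of Python. This is only an illustration built on the two Wireshark totals above; it assumes one full handshake per request (no persistent connection or session reuse) and ignores the small per-record cost of encryption. The function name `overhead_ratio` is made up for this example.

```python
# Totals measured in the test above (bytes on the wire for a 5-byte file).
HTTP_TOTAL = 1310
HTTPS_TOTAL = 7099

# Assumption: the difference is a fixed per-connection cost of SSL.
FIXED_OVERHEAD = HTTPS_TOTAL - HTTP_TOTAL


def overhead_ratio(payload_bytes, fixed_overhead=FIXED_OVERHEAD):
    """Return the fixed SSL overhead as a fraction of the payload size."""
    return fixed_overhead / payload_bytes


if __name__ == "__main__":
    # 200-byte request + 2 KB response, a 100 KB page, and a 1 MB file.
    for size in (200 + 2048, 100_000, 1_000_000):
        print(f"{size:>9} bytes payload: {overhead_ratio(size):.2%} overhead")
```

With these numbers the fixed cost is larger than a small API exchange itself, while for a 1 MB file it is well under 1%.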
When a load balancer can use a round-robin algorithm to distribute incoming requests evenly across the nodes, why do we need consistent hashing to distribute the load? What are the best scenarios for using consistent hashing and for using round robin?
From this blog,
With traditional “modulo hashing”, you simply consider the request
hash as a very large number. If you take that number modulo the number
of available servers, you get the index of the server to use. It’s
simple, and it works well as long as the list of servers is stable.
But when servers are added or removed, a problem arises: the majority
of requests will hash to a different server than they did before. If
you have nine servers and you add a tenth, only one-tenth of requests
will (by luck) hash to the same server as they did before.
Then
there’s consistent hashing. Consistent hashing uses a more elaborate
scheme, where each server is assigned multiple hash values based on
its name or ID, and each request is assigned to the server with the
“nearest” hash value. The benefit of this added complexity is that
when a server is added or removed, most requests will map to the same
server that they did before. So if you have nine servers and add a
tenth, about 1/10 of requests will have hashes that fall near the
newly-added server’s hashes, and the other 9/10 will have the same
nearest server that they did before. Much better! So consistent
hashing lets us add and remove servers without completely disturbing
the set of cached items that each server holds.
In short, round robin suits the scenario where the list of servers is stable and the traffic reaching the load balancer is random. Consistent hashing suits the scenario where the backend servers need to scale out or scale in, and most requests should still map to the same server as before. Consistent hashing can also achieve a well-distributed, uniform load.
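To make the nine-to-ten-servers example concrete, here is a minimal Python sketch that measures how many keys change server under modulo hashing and under a simple hash ring. It is an illustration, not the blog's implementation: the server names, the MD5-based `stable_hash`, the 100 virtual nodes per server, and the "next point clockwise" lookup (one common reading of "nearest") are all assumptions.

```python
import bisect
import hashlib


def stable_hash(key):
    """Map a string to a 64-bit integer that is stable across runs."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


def modulo_lookup(key, servers):
    """Traditional modulo hashing: hash modulo the number of servers."""
    return servers[stable_hash(key) % len(servers)]


class HashRing:
    """Consistent hashing: each server owns several points on a ring."""

    def __init__(self, servers, vnodes=100):
        self._points = sorted(
            (stable_hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(vnodes)
        )
        self._hashes = [point for point, _ in self._points]

    def lookup(self, key):
        # First ring point after the key's hash, wrapping around at the end.
        i = bisect.bisect_right(self._hashes, stable_hash(key))
        return self._points[i % len(self._points)][1]


def moved_fraction(before, after, keys):
    """Fraction of keys whose server differs between two lookup functions."""
    return sum(1 for k in keys if before(k) != after(k)) / len(keys)


nine = [f"server-{i}" for i in range(9)]
ten = nine + ["server-9"]
keys = [f"request-{i}" for i in range(10_000)]

modulo_moved = moved_fraction(
    lambda k: modulo_lookup(k, nine), lambda k: modulo_lookup(k, ten), keys
)
ring9, ring10 = HashRing(nine), HashRing(ten)
ring_moved = moved_fraction(ring9.lookup, ring10.lookup, keys)

if __name__ == "__main__":
    print(f"modulo hashing: {modulo_moved:.1%} of keys moved")
    print(f"hash ring:      {ring_moved:.1%} of keys moved")
```

Under these assumptions, modulo hashing should move roughly nine-tenths of the keys and the ring roughly one-tenth, and every key the ring moves goes to the newly added server.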
Let's say we want to maintain user sessions on servers, so we want all requests from a user to go to the same server. Round robin won't help here, as it blindly forwards requests in a circular fashion among the available servers.
To achieve a stable mapping from each user to a single server, we need a hashing-based load balancer. Consistent hashing works on this idea, and it also elegantly handles the cases where we want to add or remove servers.
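A tiny Python sketch of the difference, using hypothetical server names and plain modulo hashing on the user ID (not consistent hashing) to keep it short:

```python
import hashlib
from itertools import cycle

SERVERS = ["server-a", "server-b", "server-c"]  # hypothetical backends


def round_robin_picker(servers):
    """Return a picker that ignores the user and cycles through servers."""
    rotation = cycle(servers)
    return lambda user_id: next(rotation)


def hash_picker(servers):
    """Return a picker that always maps the same user to the same server."""
    def pick(user_id):
        digest = hashlib.sha256(user_id.encode()).digest()
        return servers[int.from_bytes(digest[:8], "big") % len(servers)]
    return pick


if __name__ == "__main__":
    rr, hashed = round_robin_picker(SERVERS), hash_picker(SERVERS)
    print([rr("alice") for _ in range(4)])      # rotates through the servers
    print([hashed("alice") for _ in range(4)])  # one server, repeated
```

The hash picker keeps a session on one server only while the server list stays the same, which is the gap consistent hashing is meant to close.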
References: see Gaurav Sen's videos below for further explanation.
https://www.youtube.com/watch?v=K0Ta65OqQkY
https://www.youtube.com/watch?v=zaRkONvyGr8
For completeness, I want to point out one other important feature of Consistent Hashing that hasn't yet been mentioned: DOS mitigation.
If a load balancer is getting spammed with requests (from too many customers, an attack, or a haywire local service), a round-robin approach will spread the request spam evenly across all upstream services. Even spread out, this load might be too much for each service to handle. So what happens? Your load balancer, in trying to be helpful, has brought down your entire system.
If you use a modulus or consistent hashing approach, then only a small subset of services will be DOS'd by the barrage.
Being able to "limit the blast radius" in this manner is a critical feature of production systems.
Consistent hashing fits stateful systems well, i.e. systems where the context of the previous request is required to handle the current one. In a stateful system, if the previous and current requests land on different servers, the context is lost for the current request and the system cannot fulfil it. With consistent hashing we can route all requests from a particular user to the same server, which round robin cannot do. Round robin is a good fit for stateless systems.
I'm experimenting with WebRTC, and it seems there's an arbitrary limit on how many bytes can be sent in each message. The author of the example I used chose a limit of 100 (plus some) bytes. In my tests it seems to be close to 200 bytes. However, from reading about TCP and UDP, those protocols support packets of up to around 65 KB, and even taking the MTU of different network types into account, there should still be a lot more space available than ~200 bytes.
The only source I've found that mentions a hard limit is this WebRTC Data Channel Protocol draft but it only says TBD.
So my questions are:
Is there any source that specifies the current message size limit in any browser?
Can I assume that the limit is always the same, and if not, is there any way my app can be made aware of the limit?
The ShareFest project found a way around the rate throttling: you can modify the outgoing offer to change the bandwidth setting (per http://www.ietf.org/rfc/rfc2327.txt).
Details here: https://github.com/Peer5/ShareFest/blob/master/public/js/peerConnectionImplChrome.js#L201
From my own experience you're still limited to ~800 bytes per message.
I've been testing sending JPEGs to Chrome 57 over the data channel, and messages up to 64 KB seem to be reliable now.
The WebRTC data channel does have a reliability mechanism: it uses SCTP over DTLS (over UDP). SCTP lets you set reliability and ordering behaviour, but by default WebRTC uses ordered and reliable delivery, so you get semantics similar to TCP, except that message boundaries are preserved, at least in theory.
In practice, Chrome may deliver partial messages up to the JavaScript layer if it runs out of space, so it is best to check that you have a complete message before processing it.
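One way to cope with both a per-message size limit and possible partial delivery is to chunk at the application level and only process a message once every chunk has arrived. The sketch below shows the idea in Python; a browser app would do the equivalent in JavaScript. The 4-byte header layout (chunk index, total chunks), the 800-byte default, and the names `split_message` and `Reassembler` are assumptions for illustration, not part of WebRTC.

```python
import struct

# Hypothetical chunk header: (chunk index, total chunks), 2 bytes each.
CHUNK_HEADER = struct.Struct("!HH")


def split_message(data, max_chunk=800):
    """Split data into chunks of at most max_chunk bytes, header included."""
    body_size = max_chunk - CHUNK_HEADER.size
    if body_size <= 0:
        raise ValueError("max_chunk is too small to hold the header")
    total = max(1, -(-len(data) // body_size))  # ceiling division
    if total > 0xFFFF:
        raise ValueError("message needs more chunks than the header allows")
    return [
        CHUNK_HEADER.pack(i, total) + data[i * body_size:(i + 1) * body_size]
        for i in range(total)
    ]


class Reassembler:
    """Collect the chunks of one message at a time."""

    def __init__(self):
        self._parts = {}

    def add(self, chunk):
        """Store a chunk; return the full message once complete, else None."""
        index, total = CHUNK_HEADER.unpack_from(chunk)
        self._parts[index] = chunk[CHUNK_HEADER.size:]
        if len(self._parts) < total:
            return None
        message = b"".join(self._parts[i] for i in range(total))
        self._parts.clear()
        return message
```

The sender passes each element of `split_message(payload)` to the channel's send call, and the receiver feeds every incoming message to `Reassembler.add` and acts only when it returns something other than `None`. Interleaving several messages would need a message ID in the header as well.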
Would sending lots of small packets by UDP take more resources (CPU, zlib compression, etc.)? I read here that sending one big packet of ~65 KB by UDP would probably fail, so I thought that sending lots of smaller packets would succeed more often, but then comes the computational overhead of using more processing power (or at least that's what I'm assuming). The question is basically this: what is the best approach for sending the maximum number of successful packets while keeping computation to a minimum? Is there a specific size that works most of the time? I'm using Erlang for the server and ENet for the client (written in C++). I'm also using zlib compression, and I send the same packets to every client (broadcasting is the term, I guess).
The maximum size of a UDP payload that, most of the time, will not cause IP fragmentation is:
MTU of the host handling the PDU (in most cases 1500)
- size of the IP header (20 bytes)
- size of the UDP header (8 bytes)
1500 MTU - 20 IP hdr - 8 UDP hdr = 1472 bytes
@EJP talked about 534 bytes, but I would make it 508. This is the payload size that holds up under the most conservative assumptions: every IPv4 host must be able to accept a datagram of 576 bytes, and the IP header can be at most 60 bytes (508 = 576 - 60 IP - 8 UDP).
That said, I'd try to go with 1472 bytes, because 1500 is a standard-enough MTU.
Use 1492 instead of 1500 for calculation if you're passing through a PPPoE connection.
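The subtraction is simple enough to keep as a helper. A minimal Python sketch, assuming IPv4 and the header sizes quoted above (the function name is made up for this example):

```python
UDP_HEADER = 8  # bytes, fixed size


def max_udp_payload(mtu=1500, ip_header=20):
    """Largest UDP payload that fits in one IPv4 packet of the given MTU."""
    return mtu - ip_header - UDP_HEADER


if __name__ == "__main__":
    print(max_udp_payload())                 # typical Ethernet MTU
    print(max_udp_payload(mtu=1492))         # PPPoE
    print(max_udp_payload(576, ip_header=60))  # conservative case
```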
Would sending lots of small packets by UDP take more resources?
Yes, it would, definitely! I just did an experiment with a streaming app. The app sends 2000 frames of data each second, precisely timed. The data payload for each frame is 24 bytes. I used UDP with sendto() to send this data to a listener app on another node.
What I found was interesting. This level of activity brought my sending CPU to its knees! I went from having about 64% free CPU time to having about 5%! That was disastrous for my application, so I had to fix it. I decided to experiment with variations.
First, I simply commented out the sendto() call, to see what the packet assembly overhead looked like. About a 1% hit on CPU time. Not bad. OK... must be the sendto() call!
Then, I did a quick fakeout test... I called the sendto() API only once in every 10 iterations, but I padded the data record to 10 times its previous length, to simulate the effect of assembling a collection of smaller records into a larger one, sent less often. The results were quite satisfactory: 7% CPU hit, as compared to 59% previously. It would seem that, at least on my *NIX-like system, the operation of sending a packet is costly just in the overhead of making the call.
Just in case anyone doubts whether the test was working properly, I verified all the results with Wireshark observation of the actual UDP transmissions to confirm all was working as it should.
Conclusion: it uses MUCH less CPU time to send larger packets less often than to send the same amount of data as smaller packets more frequently. Admittedly, I do not know what happens if IP starts fragmenting an overly large UDP datagram; I mean, I don't know how much CPU overhead that adds. I will try to find out (I'd like to know myself) and update this answer.
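The batching idea from the experiment can be sketched as follows. This is Python rather than the original C-style sendto() code, and the 1472-byte default is borrowed from the MTU answer above rather than from this experiment, which batched 10 records at a time. The names `batch_records` and `send_batched` are hypothetical.

```python
import socket


def batch_records(records, max_payload=1472):
    """Group small records into payloads of at most max_payload bytes."""
    batch = bytearray()
    for record in records:
        if len(record) > max_payload:
            raise ValueError("record is larger than max_payload")
        if batch and len(batch) + len(record) > max_payload:
            yield bytes(batch)
            batch.clear()
        batch.extend(record)
    if batch:
        yield bytes(batch)


def send_batched(sock, address, records, max_payload=1472):
    """Send records with one sendto() per batch; return the number of calls."""
    calls = 0
    for payload in batch_records(records, max_payload):
        sock.sendto(payload, address)
        calls += 1
    return calls


if __name__ == "__main__":
    frames = [bytes([i % 256]) * 24 for i in range(2000)]  # 24-byte frames
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # Hypothetical listener address; nothing needs to be listening.
        print(send_batched(sock, ("127.0.0.1", 9999), frames), "sendto() calls")
```

Because the records here are a fixed 24 bytes, the receiver can split each datagram back into records by size. Variable-length records would need a length prefix per record. Batching also delays the first record in each batch, which matters for a precisely timed stream.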
534 bytes. That is required to be transmitted without fragmentation. It can still be lost altogether of course. The overheads due to retransmission of lost packets and the network overheads themselves are several orders of magnitude more significant than any CPU cost.
You're probably using the wrong protocol. UDP is almost always a poor choice for data you care about transmitting. You wind up layering sequencing, retry, and integrity logic atop it, and then you have TCP.
Hi, I am writing a program that sends a file from a client to a server over a UDP socket using different packet sizes, for example 512 B, 1 KB, and 2 KB, and I don't want to use a fixed buffer size in the receiver (server). I need some Java code that lets the server and client agree on a packet size before the transfer starts. Many thanks.
Don't forget that UDP packets may be fragmented, duplicated, and lost. There is a whole bunch of things to take care of, starting with retransmission of lost packets.
I hate to give a "don't do this" kind of answer, but for this one, just use TCP. And if you want some user-level "packets", you can have them with TCP too (prefix each one with its length; that's enough).
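Length-prefixed framing is short to write in any language. The sketch below uses Python rather than the Java the question asks for, and it only shows the framing logic, with no sockets; the 4-byte big-endian length header and the names `frame` and `FrameDecoder` are assumptions for this example.

```python
import struct

# Assumed header: message length as a 4-byte big-endian unsigned integer.
HEADER = struct.Struct("!I")


def frame(message):
    """Prefix a message with its length so the receiver can find its end."""
    return HEADER.pack(len(message)) + message


class FrameDecoder:
    """Rebuild whole messages from arbitrarily split TCP reads."""

    def __init__(self):
        self._buffer = bytearray()

    def feed(self, data):
        """Add received bytes; return the list of messages completed so far."""
        self._buffer.extend(data)
        messages = []
        while len(self._buffer) >= HEADER.size:
            (length,) = HEADER.unpack_from(self._buffer)
            end = HEADER.size + length
            if len(self._buffer) < end:
                break  # wait for the rest of this message
            messages.append(bytes(self._buffer[HEADER.size:end]))
            del self._buffer[:end]
        return messages
```

The sender writes `frame(chunk)` for each chunk of the file, and the receiver passes whatever each read returns to `feed`. Since every frame carries its own size, the two sides do not need to agree on a packet size beforehand.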
I am building an Arduino-based device that needs to send data over the internet to a remote server. It needs to do this as frequently as possible but also use as little bandwidth as possible. It will probably work over GSM/EDGE (cellular networking).
The data I'd like to send is about 40 bytes in size - really minimal. I'd like to send this packet to the server about once a minute, but also receive a packet of around that size in response once in a while.
The bandwidth on my server is no problem - the bandwidth on the device's internet connection is, i.e. the cellular data.
Do headers on mobile requests and responses count as part of the bandwidth?
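Yes, everything that is sent counts, so it is the total packet size that matters. Assuming TCP, you lose at least 20 bytes to the TCP header right from the start, plus at least another 20 bytes for the IPv4 header. If you get intimate with Wireshark you can see exactly what's happening.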
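For a rough feel of what that means for a 40-byte payload, here is a back-of-the-envelope Python sketch. It counts only the minimum IPv4 and transport headers and assumes one packet per minute in one direction. TCP connection setup, ACKs, replies, link-layer framing, and any HTTP headers are deliberately left out, so real usage will be higher. The function names are made up for this example.

```python
IPV4_HEADER_MIN = 20  # bytes, without IP options
TRANSPORT_HEADER_MIN = {"tcp": 20, "udp": 8}  # bytes


def on_wire_bytes(payload, transport="tcp"):
    """Payload plus minimum IPv4 and transport headers for one packet."""
    return payload + IPV4_HEADER_MIN + TRANSPORT_HEADER_MIN[transport]


def monthly_bytes(payload, transport="tcp", per_minute=1, days=30):
    """Bytes per month for a fixed send rate, one direction only."""
    return on_wire_bytes(payload, transport) * per_minute * 60 * 24 * days


if __name__ == "__main__":
    for transport in ("tcp", "udp"):
        print(transport, on_wire_bytes(40, transport), monthly_bytes(40, transport))
```

Even in this best case the headers are as large as the 40-byte payload itself when TCP is used.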