File upload/download using UDP

We have a web-based J2EE application which allows file upload/download. Due to latency issues, upload/download is slow for many users.
1) I read that sending data using UDP can improve data transfer speed. How can we send file data using UDP?
2) We are zipping files using GZIP before upload/download to reduce the amount of data transferred. Is there a better method available to improve data compression?

UDP is a protocol that does not guarantee the arrival of messages. You are most likely using a standard file transfer protocol like FTP, which should suit you fine. Are your issues with latency or with bandwidth? You might be better off investigating why the link has high latency or bandwidth issues, as this could prove to be an issue with other parts of your web application.
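To illustrate question 1: a minimal sketch of sending file data over UDP in Java (the host, port, and chunk size here are placeholder assumptions). Note that every datagram may be lost, duplicated, or reordered, which is exactly why a bare UDP transfer is not a drop-in speedup:

    import java.io.FileInputStream;
    import java.io.IOException;
    import java.net.DatagramPacket;
    import java.net.DatagramSocket;
    import java.net.InetAddress;

    public class UdpFileSender {
        public static void main(String[] args) throws IOException {
            InetAddress dest = InetAddress.getByName("example.com"); // hypothetical receiver
            int port = 9999;               // hypothetical port
            byte[] chunk = new byte[1400]; // keep each datagram under the typical MTU

            try (DatagramSocket socket = new DatagramSocket();
                 FileInputStream in = new FileInputStream("upload.dat")) {
                int read;
                while ((read = in.read(chunk)) != -1) {
                    // A real transfer must add sequence numbers, ACKs, and
                    // retransmission on top of this, which is what protocols
                    // like UDT provide.
                    socket.send(new DatagramPacket(chunk, read, dest, port));
                }
            }
        }
    }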
GZIP and other zipping tools are good for reducing the amount of data that is sent if you're willing to put up with the initial cost of compressing. These tools should have options so you can tweak the level of compression (i.e. take a long time and compress optimally, or compress quickly but get a larger zipped file). You will probably need to experiment and see what balance works best for you.
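As a concrete sketch of that trade-off in Java (file names are placeholders): GZIPOutputStream does not expose the compression level directly, but its underlying Deflater does:

    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.util.zip.Deflater;
    import java.util.zip.GZIPOutputStream;

    public class TunableGzip {
        // Deflater.BEST_SPEED (1) is fast with larger output;
        // Deflater.BEST_COMPRESSION (9) is slow with smaller output.
        static void compress(String src, String dst, int level) throws IOException {
            try (FileInputStream in = new FileInputStream(src);
                 GZIPOutputStream out = new GZIPOutputStream(new FileOutputStream(dst)) {
                     { def.setLevel(level); } // 'def' is the protected Deflater field
                 }) {
                in.transferTo(out); // Java 9+; use a read/write loop on older JDKs
            }
        }

        public static void main(String[] args) throws IOException {
            compress("data.bin", "data-fast.gz", Deflater.BEST_SPEED);
            compress("data.bin", "data-small.gz", Deflater.BEST_COMPRESSION);
        }
    }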

1) Are there protocols faster than TCP on high-latency links?
Yes. UDT is the primary example, but it is not a free trade-off; for instance, you would now need a custom front-end application to download files.
2) Is there better file compression than GZIP?
Yes; see the exhaustive list at http://www.maximumcompression.com/index.html. bzip2 and 7-zip are popular alternatives to gzip.
Note that for specific domains, such as text, photographic images, or scanned text, there are domain-specific codecs which are preferable.

Related

Fastest possible way to transfer a large directory over VPN

I have a problem. Every day I have to upload my whole source code (it is a directory with several directories and files) to a server over VPN. The size of the source code is around 250 MB. What I do every day is compress it (that reduces its size to around 100 MB), then transfer the zipped file over FTP to the server, and finally unzip it there. It takes me around 20 minutes to transfer it.
I am sure there has to be a better way of doing this. Please suggest either a better compression mechanism or a faster upload method.
If you could set up a version-control server, that would be great; Mercurial and Git are perfect for this.
The other option is using rsync, which is a synchronizing tool that only uploads the differences between the two versions, avoiding repetitive transmission of data.
I'm assuming a UNIX-like environment here, but on Windows the options are pretty much the same.
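For example, a typical rsync invocation over SSH (the user, host, and paths are placeholders) that compresses in transit and only sends files that changed since yesterday's upload:

    # -a preserves permissions/timestamps, -v is verbose, -z compresses in transit;
    # --delete removes files on the server that no longer exist locally.
    rsync -avz --delete ./source-code/ user@server:/path/to/source-code/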
PS: this question is better suited for Super User.

Should I use Gzip in this case?

I have a RESTful Java API that provides data to a Node.js client (which gzips data to users). The question is: if they are running on the same machine, should I gzip the data from the Java API to the Node.js application?
I'm asking because in this case I don't have to worry about network latency, but gzip compression may increase CPU utilization.
Is it worth using gzip in this situation?
If the objective is to increase the speed of the overall system, then using gzip to transfer across process boundaries would not be very useful, particularly if the message is small enough to fit within memory. If the message is too large to fit in memory and some paging overhead is incurred, the benefit of gzip may be greater, but still not anywhere near enough to justify using it. Gzip only makes sense when the speed of compression is significantly greater than the speed of communication. This is usually not the case with inter-process communication (even if it incurs page-fault overhead).
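One way to check this claim against your own payloads is to measure gzip throughput and compare it with how fast your IPC channel moves the same bytes uncompressed. A rough sketch (the 10 MB of random data is an illustrative worst case, since random bytes barely compress):

    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.util.Random;
    import java.util.zip.GZIPOutputStream;

    public class GzipThroughput {
        public static void main(String[] args) throws IOException {
            byte[] payload = new byte[10 * 1024 * 1024]; // 10 MB of sample data
            new Random(42).nextBytes(payload);

            long start = System.nanoTime();
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
                gz.write(payload);
            }
            double seconds = (System.nanoTime() - start) / 1e9;

            // If loopback IPC moves data at hundreds of MB/s or more, compression
            // only pays off when it is at least that fast AND shrinks the payload.
            System.out.printf("gzip: %.1f MB/s, ratio %.2f%n",
                    payload.length / 1e6 / seconds,
                    (double) buf.size() / payload.length);
        }
    }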

Turn Binary Data into Airborne Packets?

Are there any software libraries and/or wireless drivers that make it possible to turn a sequence of binary data into a wireless packet in the air? For example, if someone used AirPcap / Wireshark to capture a series of interesting packets, is there some library that can be fed that binary data in order to turn it back into 802.11 wireless packets for testing purposes? If so, can we then also make minor alterations to the values of the packet in order to generate a wide variety of testing scenarios? Is anyone aware of tools/libraries that enable or assist this scenario?
While there are many tools around that may be used to replay and send data, one of the most advanced and flexible is:
TCPReplay
http://tcpreplay.synfin.net/
You can edit the packets at different levels and then send them.
Excerpt from their website:
... You can ... classify traffic as client or server, rewrite Layer 2, 3 and 4 headers and finally replay ...
There are some alternatives, such as Bit-Twist and the WinPcap library.
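As a sketch of the TCPReplay workflow (the interface, MAC address, and file names are placeholder assumptions; check the flags against your installed version): rewrite headers with tcprewrite, then put the result back on the wire with tcpreplay:

    # Rewrite the destination MAC in a capture, then replay it on eth0.
    tcprewrite --enet-dmac=00:11:22:33:44:55 --infile=capture.pcap --outfile=rewritten.pcap
    tcpreplay --intf1=eth0 rewritten.pcap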
Most Wi-Fi tools are set up for cracking networks or stealing data, so you might be able to re-purpose an existing attacker's tool or library (like ettercap or aircrack-ng) for your testing purposes. Most tools I've encountered focus on Ethernet, TCP, and HTTP.
The following list of software might merit further investigation:
TCPReplay
Bit-Twist
aircrack-ng suite
Nemesis
Packet Editor
Bit-Twist and TCPReplay are your best bets if you're willing to settle for something higher up in the protocol stack.

To compress or not to compress?

Enabling compression (gzip/deflate) in the Apache server will reduce the size of the response but will add more CPU cycles. I will run a stress test with various response sizes, but
I wanted to ask: in terms of server load, is there any guidance on when I should turn compression on or off?
Thank you
In most cases web servers are limited by I/O (be it memory, network bandwidth, database, or hard drive) and have plenty of spare CPU cycles to use for compressing pages before serving them, especially since compression isn't really that CPU-intensive, while it provides a huge usability boost for your users and saves you bandwidth.
As long as the server has a powerful CPU, use compression. Speed is usually the most important feature a server should have, after security and stability.
It depends on what you want to achieve. Typically, turning deflate on won't add a very significant footprint to your CPU load, and if your websites include large text files (HTML, JS, CSS, etc.) it's likely to make an important difference in bandwidth usage and page-loading times. Of course, if what you want is to reduce system load and you don't care much about bandwidth, this wouldn't be the right choice for you.
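If you do turn it on, a minimal mod_deflate setup looks something like this (the type list and level are illustrative; images and archives are left alone because they are already compressed):

    <IfModule mod_deflate.c>
        # Compress only text-like responses.
        AddOutputFilterByType DEFLATE text/html text/plain text/css application/javascript application/json
        # 1 = fastest/largest, 9 = slowest/smallest.
        DeflateCompressionLevel 6
    </IfModule>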
Another option you might find useful is installing a lightweight web server/proxy like Nginx, lighttpd, or Varnish (I personally prefer the first one) and serving compressed static content with that (leaving the heavier Apache processes to handle only the dynamic content). That would also be likely to result in better overall performance of your server. But, again, this all depends on your scenario, what your website or web application is like, and what you want to achieve.
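With Nginx in front, for example, compressed static content is a few directives (values shown are common choices, not requirements):

    gzip on;
    gzip_comp_level 5;     # 1 = fastest, 9 = smallest
    gzip_types text/plain text/css application/javascript application/json;
    gzip_min_length 1024;  # don't bother compressing tiny responses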

Grand Unified Theory of logging

Is there a Grand Unified Theory of logging? Shall we develop one? Question (just to show this is not a discussion :), how can I improve on the following? (Note that I live mainly in the embedded world, but non-embedded suggestions are also welcome.)
How do you log, when do you log, what do you log, what do you do with log files?
How do you log - I generally have macros, #ifdef TESTING, sort of thing. They write to RAM, and a low-priority process writes them out when the system is idle (using UDP, since I do embedded systems); see the sketch after this question.
When do you log - same as voting, early and often. At every (in)significant program event, I log at varying levels. Events received, transaction succeed/fail, data updated, etc
What do you log - Fatal/Error/Warning/Info/Debug/Trace is covered in When to use the different log levels?
What do you do with log files - 1) keep them (in CVS), both pass and fail; 2) capture everything and filter later in case I can't repeat a problem. I have tools to filter the log by "level" (Fatal/Error/etc), process, file, etc., and to draw message sequence charts, dump data structures, and draw histograms of memory usage - what am I missing?
Hmmm, binary or ASCII log file format? ASCII is bulkier, but binary requires more processing. I have done both; currently I use ASCII.
Question - did I miss anything, and how can I improve on this?
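Roughly, the scheme looks like this (sketched in Java for brevity rather than my embedded C macros; the collector address, port, and buffer size are made-up examples):

    import java.net.DatagramPacket;
    import java.net.DatagramSocket;
    import java.net.InetAddress;
    import java.nio.charset.StandardCharsets;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    public class RamBufferedUdpLogger {
        // Bounded in-RAM buffer: log() never blocks the caller; when the
        // buffer is full the record is dropped, mirroring the embedded trade-off.
        private final BlockingQueue<String> buffer = new LinkedBlockingQueue<>(4096);

        public RamBufferedUdpLogger(String host, int port) throws Exception {
            DatagramSocket socket = new DatagramSocket();
            InetAddress dest = InetAddress.getByName(host);
            Thread flusher = new Thread(() -> {
                try {
                    while (true) {
                        byte[] data = buffer.take().getBytes(StandardCharsets.UTF_8);
                        socket.send(new DatagramPacket(data, data.length, dest, port));
                    }
                } catch (Exception e) {
                    // interrupted or socket closed: stop flushing
                }
            });
            flusher.setPriority(Thread.MIN_PRIORITY); // flush when otherwise idle
            flusher.setDaemon(true);
            flusher.start();
        }

        public void log(String msg) {
            buffer.offer(msg); // non-blocking; silently drops when full
        }
    }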
You could "instrument" your code in many different ways, everything from start-up/shut-down events to individual machine instruction execution (using a processor emulator). Of all the possibilities, what's worth doing? Don't just do it for the sake of completeness; have a specific goal in mind. A business case if you like, with a benefit you expect to receive. E.g.:
Insight into CPU task execution times/patterns to enable optimisation (if you need to improve performance).
Insight into other systems to resolve system integration issues (e.g. what messages is your VoIP box sending and receiving when it connects to a particular peer?)
Insight into the nature of errors (for field diagnostics)
Aid in development
Aid in validation testing
I imagine that there's no grand unified theory of logging, because what you do would depend on many details:
- Quantity of data
- Type of data
  - Events
  - Streamed audio/video
- Available storage
  - Storage speed
  - Storage capacity
- Available channels to extract data
  - Bandwidth
  - Cost
  - Availability
    - Internet connected 24×7
    - Site visit required
    - Need to unlock a rusty gate, climb a ladder onto a roof, to plug in a cable, after filling out OHS documentation
    - Need to wait until the Antarctic winter is over and the ice sheets thaw
- Random access vs linear access (e.g. if you compress it, do you need to read from the start to decompress and access some random point?)
- Need to survive error conditions
  - Watchdog reboots
  - Possible data corruption
    - Due to failing power supply
    - Due to unreliable storage media
  - Need to survive a plane crash
As for ASCII vs binary, I usually prefer to keep the logging simple, and put any nice presentation in a PC application that decodes the data. It's usually easier to create a user-friendly presentation in PC software (written in e.g. Python) rather than in the embedded system itself.
"did I miss anything, and how can I improve on this?"
Asynchronous logging.
Using multiple log files for the same process for different logging abstractions, e.g. the process's activities are logged in a normal log file, while periodic statistics you might be interested in go to a separate stats log file.
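A sketch of that split with java.util.logging (logger names and file paths are placeholders):

    import java.io.IOException;
    import java.util.logging.FileHandler;
    import java.util.logging.Logger;
    import java.util.logging.SimpleFormatter;

    public class SplitLogs {
        public static void main(String[] args) throws IOException {
            Logger activity = Logger.getLogger("app.activity");
            Logger stats = Logger.getLogger("app.stats");

            // Each logging abstraction gets its own handler, hence its own file.
            FileHandler activityFile = new FileHandler("activity.log");
            activityFile.setFormatter(new SimpleFormatter());
            activity.addHandler(activityFile);
            activity.setUseParentHandlers(false); // don't also echo to the console

            FileHandler statsFile = new FileHandler("stats.log");
            statsFile.setFormatter(new SimpleFormatter());
            stats.addHandler(statsFile);
            stats.setUseParentHandlers(false);

            activity.info("transaction 42 committed");
            stats.info("queue_depth=17 heap_used_mb=103");
        }
    }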
"Hmmm, binary or ASCII log file format? ASCII is bulkier, but binary requires more processing. I have done both, currently I use ASCII"
ASCII is good. More often than not, logs are meant to be used for debugging purposes, and a human-readable form eases and speeds this up.
However, if your logs are used mostly to record information for later analysis and report generation (e.g. stats or latencies), a binary format would be preferred. You can go one step further and use a custom format along with a DB service that does index-based sorting, where the index can be a tuple of time and event type.
--
One thing which may be helpful is to have a "maybeLogger" object which accepts log records for an operation which may or may not succeed, and then either ditches those records if the operation succeeds or fails in an uninteresting way, or logs them if it does something interesting. This is relatively easy to do in something like .NET. In an embedded system, it can only be done really easily if the amount of stuff to be logged is small enough to fit in free RAM, but one could probably use a garbage-collection-based approach to hold stuff in flash (have one 'stream' of data in flash for new log entries and another for ones that are confirmed to be interesting; periodically move data which is known to be good from the first stream to the second).
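A sketch of that idea in Java (rather than .NET; the class and method names here are made up for illustration):

    import java.util.ArrayList;
    import java.util.List;

    // Buffers log records for one operation; they only reach the real log
    // if the operation turns out to be interesting (e.g. it failed).
    public class MaybeLogger {
        private final List<String> pending = new ArrayList<>();

        public void log(String record) {
            pending.add(record); // held in RAM, not yet written anywhere
        }

        public void commit() { // the operation was interesting: keep everything
            pending.forEach(System.out::println); // stand-in for the real log sink
            pending.clear();
        }

        public void discard() { // the operation succeeded boringly: drop it all
            pending.clear();
        }
    }

The caller logs into it during the operation, then calls discard() on an uninteresting outcome or commit() on an interesting one.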
Here's my $0.02.
I only log when I'm having a problem and need to track down the source. Usually this has to do with a customer's environment, so I can't just attach the debugger. My solution is to enable the Telnet port and use that to print out statements showing where the program is and the values of variables.
I do ASCII only, because it's over Telnet.
Another aspect of Telnet is that it is pretty simple: it's a TCP port with text being thrown out, with very little processing other than the normal TCP headaches.
The log files are dumped as soon as I get them, because I have not tried to capture and save a Telnet session. I guess I could with Wireshark, but I don't need a history of that session. I just need to find the problem and verify a fix.