Does using gzip on the command line have the same result as using zlib programmatically?

I can compress a file with gzip on the command line or with zlib programmatically. Are the resulting compressed files equal? I can live with different headers.
Also, what is the relation between gzip's compression factor (1-9) and zlib's parameters (level, window bits, and mem level)?

No, they're not exactly equal, but the output of either one is compatible with either decompressor. In other words, gzip streams produced by either are interchangeable between them or with any other compliant decompressor.
zlib also has compression levels 1-9, which behave approximately the same as gzip's levels 1-9. The window bits and memory level parameters are not available in gzip, where they are effectively fixed at zlib's defaults. Those two parameters allow embedded zlib applications to use less memory at the cost of reduced compression effectiveness.
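For illustration, here is a minimal sketch (assuming zlib 1.2 or later) of producing a gzip-wrapped stream through zlib, where all three parameters show up in the deflateInit2() call; the wrapper name, the single-shot deflate, and the fixed output buffer are simplifications for this sketch, not how robust code would be written:

    /* Minimal sketch: gzip-wrapped output via zlib's deflate. */
    #include <string.h>
    #include <zlib.h>

    int gzip_compress(const unsigned char *in, size_t in_len,
                      unsigned char *out, size_t *out_len, int level)
    {
        z_stream strm;
        memset(&strm, 0, sizeof(strm));

        /* windowBits = 15 + 16 asks zlib for a gzip header/trailer instead
         * of the zlib wrapper; memLevel 8 is zlib's default, which is
         * effectively what gzip(1) uses. */
        if (deflateInit2(&strm, level, Z_DEFLATED, 15 + 16, 8,
                         Z_DEFAULT_STRATEGY) != Z_OK)
            return -1;

        strm.next_in   = (Bytef *)in;
        strm.avail_in  = (uInt)in_len;
        strm.next_out  = out;
        strm.avail_out = (uInt)*out_len;

        int ret = deflate(&strm, Z_FINISH);   /* single shot; real code loops */
        *out_len = strm.total_out;
        deflateEnd(&strm);
        return ret == Z_STREAM_END ? 0 : -1;  /* fails if out was too small */
    }

Calling this with level 6 should give roughly what gzip produces at its default level, modulo header fields such as the file name and timestamp.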

Related

Data compression for FPGA bitstream

I'm looking for a good compression algorithm to use for decompressing data from a flash chip to load to an FPGA (a Xilinx Spartan6-LX9, on the Mojo development board). It must be fast to decompress and not require a lot of working memory to do so, as the CPU (an ATmega16U4) is clocked at 8 MHz and has only 2 KiB of RAM and 16 KiB of program flash, some of which is already in use. Compression speed is not particularly important, as compression will only be run once on a computer, and the compression algorithm need not work on arbitrary inputs.
Here is an example bitstream. The format is documented in the Spartan-6 FPGA Configuration manual (starting on page 92).
Generally, the patterns present in the data fall into a few categories, and I'm not sure which of these will be easiest to exploit given the constraints I'm working with:
The data is organized overall into a set of packets of a known format. Certain parts of the bitstream are somewhat "stereotyped" (e.g., it will always begin and end by writing to certain registers), and other commands will appear in predictable sequences.
Some bytes are much more common than others. 00 and FF are by far the most frequent, but other bytes with few bits set (e.g., 80, 44, 02) are also quite common.
Runs of 00 and FF bytes are very frequent. Other patterns will sometimes appear on a local scale (e.g., a 16-byte sequence will be repeated a few times), but not globally.
What would be an appropriate compression algorithm (not a library, unless you're sure it'll fit!) for this task, given the constraints?
You should consider using the LZO compression library. It probably has one of the fastest decompressors in existence, and decompression requires no working memory. Compression, however, needs 64KB of memory (or 8KB for one of the compression levels). If you only need to decompress on the target, it might just work for you.
The LZO project even provides a special cut-down version of the library called miniLZO. According to the author, miniLZO compiles to less than 5KB of binary on i386. Since you have 16KB of flash, it might just fit within your constraints.
The LZO compressor is currently used by UPX (the Ultimate Packer for eXecutables).
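If it fits, the decompress-only path is small. A rough sketch using miniLZO's safe decompressor (the wrapper function and buffer handling below are illustrative, not part of miniLZO):

    /* Decompress-only use of miniLZO; compression would be done once on the
     * PC with lzo1x_1_compress() and its working buffer. */
    #include "minilzo.h"

    int unpack_bitstream(const unsigned char *packed, lzo_uint packed_len,
                         unsigned char *out, lzo_uint out_cap)
    {
        lzo_uint out_len = out_cap;

        if (lzo_init() != LZO_E_OK)
            return -1;

        /* The safe decompressor needs no working memory (the last argument
         * is unused) and never writes past out_cap. */
        if (lzo1x_decompress_safe(packed, packed_len, out, &out_len, NULL) != LZO_E_OK)
            return -1;

        return (int)out_len;  /* number of decompressed bytes */
    }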
From your description, I would recommend run-length encoding followed by Huffman coding the bytes and runs. You would need very little memory over the data itself, mainly for accumulating frequencies and building a Huffman tree in place. Less than 1K.
You should make a histogram of the lengths of the runs to help determine how many bits to allocate to the run lengths.
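As a sketch of the first pass only (the Huffman stage is not shown, and the function name and one-byte run cap are illustrative assumptions), collecting the run-length histogram could look like this:

    /* First pass: record run lengths of 0x00/0xFF so a histogram can guide
     * how many bits to spend on the length field; literal bytes are counted
     * as runs of length 1. */
    #include <stddef.h>
    #include <stdint.h>

    #define MAX_RUN 255                      /* assumed cap on run length */

    static uint32_t run_hist[MAX_RUN + 1];   /* histogram of run lengths */

    void scan_runs(const uint8_t *data, size_t len)
    {
        size_t i = 0;
        while (i < len) {
            uint8_t b = data[i];
            size_t run = 1;
            if (b == 0x00 || b == 0xFF) {
                while (i + run < len && data[i + run] == b && run < MAX_RUN)
                    run++;
            }
            run_hist[run]++;
            i += run;
        }
    }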
Have you tried the built-in bitstream compression? That can work really well on non-full devices. It's a bitgen option, and the FPGA supports it out of the box, so it has no resource impact on your micro.
The way the compression is achieved is described here:
http://www.xilinx.com/support/answers/16996.html
Other possibilities have been discussed on comp.arch.fpga:
https://groups.google.com/forum/?fromgroups#!topic/comp.arch.fpga/7UWTrS307wc
It appears that one poster implemented LZMA successfully on a relatively constrained embedded system. You could use 7zip to check what sort of compression ratio you might expect and see if it's good enough before committing to implementation of the embedded part.

Should I use Gzip in this case?

I have a RESTful Java API that provides data to a Node.js client (which gzips data to users). The question is: if they are running on the same machine, should I gzip the data sent from the Java API to the Node.js application?
I'm asking because in this case I don't have to worry about network latency, but gzip compression may increase CPU utilization.
Is it worth using gzip in this situation?
If the objective is to increase the speed of the overall system, then using gzip to transfer across process boundaries would not be very useful, particularly if the message is small enough to fit in memory. If the message is too large to fit in memory and some paging overhead is incurred, the benefit of gzip may be greater, but still not anywhere near enough to justify using it. Gzip only makes sense when the speed of compression is significantly greater than the speed of communication. This is usually not the case with inter-process communication (even if it incurs page-fault overhead): roughly speaking, gzip compresses on the order of tens of MB/s, while loopback or in-memory transfers between local processes move hundreds of MB/s or more, so the compressor itself becomes the bottleneck.

Zlib compression on MSP430

Has anyone attempted to use zlib compression on an MSP430? Do you have any advice on how to compile the library for use in an MSP430 project (I am using IAR Embedded Workbench)?
According to the MSP430 datasheets and the Wikipedia article, you don't have enough RAM (it has at most 16 KiB) even for the sliding window alone (32 KiB). So you cannot use any deflate implementation on the MSP430, and since zlib is just a deflate implementation, that applies to zlib too. Even writing your own deflate implementation will not help: deflate needs 32 KiB for its sliding dictionary plus some extra memory for its Huffman trees, and that is only the decompression side. For compression, you additionally need memory for the hash-chain match finder, which is about 7.5 × the dictionary size = 240 KiB (according to 7-Zip's deflate implementation). If you really need compression on such a small architecture, I advise looking at custom byte-coded LZSS compression algorithms. They are fast and lightweight, but not strong enough to compete with deflate, mainly because of the difference in entropy coding.
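For a sense of how small such a decoder can be, here is an illustrative byte-coded LZSS decompressor. The token layout (an 8-bit flag byte per group of eight items, then either a literal byte or a 2-byte offset/length pair copied from earlier output) is an assumption chosen for this sketch, not a standard format:

    /* Illustrative LZSS decoder: matches are copied from the already-written
     * output, so no separate window buffer is needed. */
    #include <stddef.h>
    #include <stdint.h>

    size_t lzss_decode(const uint8_t *in, size_t in_len,
                       uint8_t *out, size_t out_cap)
    {
        size_t ip = 0, op = 0;

        while (ip < in_len) {
            uint8_t flags = in[ip++];            /* 1 = literal, 0 = match */
            for (int bit = 0; bit < 8 && ip < in_len; bit++) {
                if (flags & (1u << bit)) {
                    if (op >= out_cap) return op;
                    out[op++] = in[ip++];        /* literal byte */
                } else {
                    if (ip + 1 >= in_len) return op;
                    uint16_t tok = ((uint16_t)in[ip] << 8) | in[ip + 1];
                    ip += 2;
                    size_t off = (size_t)(tok >> 5) + 1;   /* 11-bit offset */
                    size_t len = (size_t)(tok & 0x1F) + 3; /* 5-bit length  */
                    while (len-- > 0) {
                        if (op >= out_cap || off > op) return op;
                        out[op] = out[op - off];
                        op++;
                    }
                }
            }
        }
        return op;   /* decompressed size */
    }

The matching compressor would run once on the host, so its memory use does not matter on the MSP430.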
I used to build zlib as a test for processor development, but as the world started to transition to 64-bit, its haphazard mixing of unsigned long and unsigned int without careful typecasting would wreak havoc on the compilers. It may have settled down by now, but I walked away from using it.
It does need/want a ton of memory, and the MSP430 is particularly small on the RAM side compared to some of the competition.
I have an MSP430 simulator you can use at http://github.com/dwelch67/msp430sim, which is easy to configure with lots of RAM, more than you will find in a real chip. Zlib may still want the full 64K and not leave you with any, though; you will just need to see what happens. Maybe I will take this on and try it myself as a test for my simulator. On the above simulator, or maybe one of my others, I use a different compression tool that has a (relatively) small memory footprint. I am not sure whether you need zlib specifically or just some sort of decompression in general.
I have built it for a number of targets, not specifically MSP430, but that should not matter. It is all ISO C and dependent only on standard library calls. It uses dynamic memory allocation, so you'll need a heap.
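Note that zlib lets you redirect its dynamic allocation through the z_stream's zalloc/zfree/opaque fields, so the "heap" can be a static pool you control. A deliberately naive sketch (the bump allocator, pool size, and function names are illustrative assumptions):

    #include <stdint.h>
    #include <string.h>
    #include <zlib.h>

    /* Assumption: size the pool for inflate, which needs roughly
     * (1 << windowBits) bytes of window plus a few KB of internal state. */
    #define POOL_SIZE (48 * 1024)

    static uint8_t pool[POOL_SIZE];
    static size_t  pool_used;

    static voidpf my_alloc(voidpf opaque, uInt items, uInt size)
    {
        (void)opaque;
        size_t n = ((size_t)items * size + 7u) & ~(size_t)7;  /* keep alignment */
        if (pool_used + n > sizeof(pool))
            return Z_NULL;
        voidpf p = pool + pool_used;
        pool_used += n;
        return p;
    }

    static void my_free(voidpf opaque, voidpf address)
    {
        (void)opaque; (void)address;   /* bump allocator: frees are ignored */
    }

    int init_inflate(z_stream *strm)
    {
        memset(strm, 0, sizeof(*strm));
        strm->zalloc = my_alloc;
        strm->zfree  = my_free;
        strm->opaque = Z_NULL;
        /* windowBits can be lowered (down to 9) if the data was compressed
         * with a matching small window, shrinking inflate's memory needs. */
        return inflateInit2(strm, 15);
    }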

File upload/download using UDP

We have a web-based J2EE application that allows file upload/download. Due to latency issues, upload/download is slow for many users.
1) I read that sending data using UDP can improve data transfer speed. How can we send file data using UDP?
2) We are zipping files using GZIP before upload/download to reduce the amount of data transferred. Is there a better method available to improve data compression?
UDP is a protocol that does not guarantee the arrival of messages. You are most likely using a standard file transfer protocol like FTP, which should suit you fine. Are your issues with latency or with bandwidth? You might be better off investigating why the link has high latency or bandwidth problems, as this could prove to be an issue for other parts of your web application.
GZIP and other zipping tools are good for reducing the amount of data that is sent, if you're willing to put up with the initial cost of compressing. These tools should have options so you can tweak the level of compression (i.e., take a long time and compress optimally, or compress quickly but produce a larger file). You will probably need to experiment and see what balance works best for you.
1) Are there protocols faster than TCP on high latency links?
Yes, UDT is the primary example, but it is not a free trade-off; for instance, you now need a custom frontend application to download files.
2) Is there better file compression than GZIP?
Yes; see the exhaustive list at http://www.maximumcompression.com/index.html. bzip2 and 7-Zip are popular alternatives to gzip.
Note that for specific domains, such as text, photographic images, or scanned text, there are domain-specific codecs that are preferable.

vaadin cache.html size

Looking into the Chrome Developer Tools' Audits tab when launching my Vaadin-based web application, I was horrified to see that the cache.html file was over 4 MB! I thought that Vaadin's runtime was at worst a few hundred kilobytes. I need to enable gzip compression, but still... how is it even possible that such a huge file is meant to be sent to the browser?
4MB is too big. Make sure you are not using GWT's "draft compilation" as it makes the resulting widgetset huge.
The right size is around 400-600 kB (uncompressed). The size depends on what widgets are included in the set. Adding new widgets makes it a little bigger, while leaving out unused widgets makes it smaller. A realistic minimum size is 200-300 kB.
Most important is that you have enabled gzip encoding on your HTTP server. That way only 80 kB to 200 kB is actually transferred to the browser.
See also: http://vaadin.com/forum/-/message_boards/message/163146