How does the performance of Kakadu change with compression ratio? - jpeg2000

Is there any analysis of how the compression and decompression performance of Kakadu changes with compression ratio? E.g., if we change the compression ratio from 2 to 10, how much slower will Kakadu (de)compress the same image?
(Kakadu is a library for working with JPEG2000.)

It turns out that the performance doesn't depend on the compression ratio.
The only exception is lossless compression, which is faster because it skips the step of cutting off the "lost" info.


Which compression technique works better when using tools like Datameer, Presto and Spark?

We are working on choosing a better compression technique. We tried bzip2, but it's taking too much time to compress.
I think there will be no direct answer to your question. What will be better or right will depend on your infrastructure, requirements and data flow.
You may have a look into "Performance comparison of different file formats and storage engines in the Hadoop ecosystem" or "Hadoop Compression. Choosing compression codec.".
Just from the perspective of speed, Snappy might be a good try.
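If you want numbers for your own data rather than a general rule, a quick timing harness can help. The sketch below only uses codecs from the Python standard library (zlib, bz2, lzma) as stand-ins; Snappy is not in the standard library, so it is not measured here, and the sample payload is made up for illustration.

```python
import bz2
import lzma
import time
import zlib


def benchmark_codecs(data: bytes) -> dict:
    """Return {codec_name: (compress_seconds, compressed_size / original_size)}."""
    codecs = {
        "zlib-1": (lambda d: zlib.compress(d, 1), zlib.decompress),
        "zlib-9": (lambda d: zlib.compress(d, 9), zlib.decompress),
        "bz2": (bz2.compress, bz2.decompress),
        "lzma": (lzma.compress, lzma.decompress),
    }
    results = {}
    for name, (compress, decompress) in codecs.items():
        start = time.perf_counter()
        packed = compress(data)
        elapsed = time.perf_counter() - start
        # Sanity check: the round trip must reproduce the input exactly.
        assert decompress(packed) == data
        results[name] = (elapsed, len(packed) / len(data))
    return results


if __name__ == "__main__":
    # Hypothetical log-like sample; replace with a slice of your real data.
    sample = b"2024-01-01,user_42,click,/products/123\n" * 50_000
    for name, (seconds, ratio) in benchmark_codecs(sample).items():
        print(f"{name:8s} {seconds * 1000:8.1f} ms  ratio {ratio:.4f}")
```

Run it on a representative slice of your data; the ranking between codecs can change a lot with the content.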

What is the recommended compression for HDF5 for fast read/write performance (in Python/pandas)?

I have read several times that turning on compression in HDF5 can lead to better read/write performance.
I wonder what the ideal settings would be to achieve good read/write performance with:
data_df.to_hdf(..., format='fixed', complib=..., complevel=..., chunksize=...)
I'm already using fixed format (i.e. h5py) as it's faster than table. I have strong processors and do not care much about disk space.
I often store DataFrames of float64 and str types in files of approx. 2500 rows x 9000 columns.
There are a couple of possible compression filters that you could use.
Since HDF5 version 1.8.11 you can easily register 3rd party compression filters.
Regarding performance:
It probably depends on your access pattern: you want to choose chunk dimensions that align with it, otherwise your performance will suffer a lot. For example, if you know that you usually access one column and all rows, you should define your chunk shape accordingly, e.g. (2500, 1) for a dataset of 2500 rows x 9000 columns. See here, here and here for more information.
However, AFAIK pandas will usually end up loading the entire HDF5 file into memory unless you use read_table and an iterator (see here) or do the partial IO yourself (see here), and thus doesn't really benefit that much from a well-chosen chunk size.
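To see why the chunk shape matters, it helps to count how many chunks a read has to touch. The helper below is a rough sketch in plain Python, not part of any HDF5 API; `chunks_touched` is a hypothetical name, and the numbers assume the 2500 rows x 9000 columns layout from the question with chunk shapes given as (rows, columns).

```python
import math


def chunks_touched(chunk_shape, row_range, col_range):
    """Count the chunks that intersect the half-open row and column ranges."""
    chunk_rows, chunk_cols = chunk_shape
    (r0, r1), (c0, c1) = row_range, col_range
    # Index of the first chunk hit is start // size; one past the last is ceil(stop / size).
    rows = math.ceil(r1 / chunk_rows) - r0 // chunk_rows
    cols = math.ceil(c1 / chunk_cols) - c0 // chunk_cols
    return rows * cols


if __name__ == "__main__":
    all_rows = (0, 2500)
    one_column = (5, 6)
    for shape in [(2500, 1), (100, 100), (1, 9000)]:
        n = chunks_touched(shape, all_rows, one_column)
        print(f"chunk shape {shape}: {n} chunk(s) read for one column")
```

Every touched chunk has to be read and decompressed in full, so a shape that turns a single-column read into thousands of chunk reads is the case to avoid.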
Nevertheless you might still benefit from compression because loading the compressed data to memory and decompressing it using the CPU is probably faster than loading the uncompressed data.
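A back-of-the-envelope model makes that trade-off concrete. This is only a sketch: it assumes reading and decompressing happen one after the other (no overlap), and the disk throughput, compression ratio and decompression speed below are made-up illustrative values, not measurements.

```python
def read_seconds(size_mb, disk_mb_s, ratio=1.0, decompress_mb_s=None):
    """Estimate the time to get size_mb of (uncompressed) data into memory.

    ratio is uncompressed size / compressed size.
    decompress_mb_s is measured on the uncompressed output; None means no compression.
    """
    io_time = (size_mb / ratio) / disk_mb_s
    cpu_time = 0.0 if decompress_mb_s is None else size_mb / decompress_mb_s
    return io_time + cpu_time


if __name__ == "__main__":
    # 2500 x 9000 float64 values are 180,000,000 bytes, i.e. 180 MB.
    size_mb = 2500 * 9000 * 8 / 1e6
    for disk in (150, 3000):  # hypothetical slow disk vs. fast SSD, in MB/s
        plain = read_seconds(size_mb, disk)
        packed = read_seconds(size_mb, disk, ratio=2.0, decompress_mb_s=1000)
        print(f"disk {disk:4d} MB/s: uncompressed {plain:.2f} s, compressed {packed:.2f} s")
```

With these assumed numbers compression wins on the slow disk and loses on the fast one, which is why it is worth measuring on your own hardware.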
Regarding your original question:
I would recommend taking a look at Blosc. It is a multi-threaded meta-compressor library that supports various compression filters:
BloscLZ: internal default compressor, heavily based on FastLZ.
LZ4: a compact, very popular and fast compressor.
LZ4HC: a tweaked version of LZ4, produces better compression ratios at the expense of speed.
Snappy: a popular compressor used in many places.
Zlib: a classic; somewhat slower than the previous ones, but achieving better compression ratios.
These have different strengths and the best thing is to try and benchmark them with your data and see which works best.

Possibilities to compress an image (size)

I am implementing an application in which I need to find a way to compress images (reduce their size), because that would help a lot in saving space in the database (server). Please help me with this.
Thanks in advance,
Sekhar Behalam.
Your options are to reduce the dimensions of the images and/or reduce the quality by increasing the compression. Are the images photographic in nature (JPG is best) or simple solid colour graphics (use PNGs)?
If the images are JPG (lossy compression) you can simply load and re-save them with a higher compression setting. This can result in a large space saving.
The image quality will of course decline, but you can get away with quite a lot of compression in JPG before it is noticeable. What is acceptable of course is determined by the use of the images (which you have not stated).
Hope this helps.
Also consider pngcrush, which is a utility that is included with the SDK. In the Project Settings in Xcode, there's an option to "Compress PNG Images." Make sure to check that. Note that this only works for resource images (as far as I know)—but you haven't stated if these images will be user-created/instantiated, or brought into the app bundle directly.

Streaming Jpeg Resizer

Does anyone know of any code that does streaming JPEG resizing? What I mean by this is reading a chunk of an image (how much would obviously vary with the original source and destination size) and resizing it, allowing for lower memory consumption when resizing very large JPEGs. Obviously this wouldn't work for progressive JPEGs (or at least it would become much more complicated), but it should be possible for standard JPEGs.
The design of JPEG data allows simple resizing to 1/2, 1/4 or 1/8 size. Other variations are possible. These same size reductions are easy to do on progressive jpegs as well and the quantity of data to parse in a progressive file will be much less if you want a reduced size image. Beyond that, your question is not specific enough to know what you really want to do.
Another simple trick to reduce the data size by 33% is to render the image into a RGB565 bitmap instead of RGB24 (if you don't need the full color space).
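For reference, the RGB24 to RGB565 packing keeps the top 5 bits of red, 6 of green and 5 of blue, so 3 bytes per pixel become 2. A minimal sketch in plain Python follows; the function name and the little-endian byte order are my own choices for illustration, so check what your display or file format expects.

```python
def rgb888_to_rgb565(rgb: bytes) -> bytes:
    """Pack 3-byte RGB pixels into 2-byte little-endian RGB565 pixels."""
    if len(rgb) % 3:
        raise ValueError("input length must be a multiple of 3")
    out = bytearray()
    for i in range(0, len(rgb), 3):
        r, g, b = rgb[i], rgb[i + 1], rgb[i + 2]
        # Drop the low bits: 5 bits red, 6 bits green, 5 bits blue.
        value = ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3)
        out += value.to_bytes(2, "little")
    return bytes(out)


if __name__ == "__main__":
    pixels = bytes([255, 0, 0, 0, 255, 0, 0, 0, 255])  # red, green, blue
    packed = rgb888_to_rgb565(pixels)
    print(len(pixels), "bytes ->", len(packed), "bytes")
```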
I don't know of a library that can do this off the shelf, but it's certainly possible.
Let's say your JPEG is using 8x8 pixel MCUs (the units in which pixels are grouped). Let's also say you are reducing by a factor of 12 to 1. The first output pixel needs to be the average of the 12x12 block of pixels at the top left of the input image. To get to the input pixels with a y coordinate of 8 or more, you need to have decoded the start of the second row of MCUs. You can't really get to decode those pixels before decoding the whole of the first row of MCUs. In practice, that probably means you'll need to store two rows of decoded MCUs. Still, for a 12000x12000 pixel image (roughly 150 megapixels) you'd reduce the memory requirements by a factor of 12000/16 = 750. That should be enough for a PC. If you're looking at embedded use, you could horizontally resize the rows of MCUs as you read them, reducing the memory requirements by another factor of 12, at the cost of a little more code complexity.
I'd find a simple JPEG decoder library like Tiny Jpeg Decoder and look at the main loop in the JPEG decode function. In the case of Tiny Jpeg Decoder, the main loop calls decode_MCU. Modify from there. :-)
You've got a bunch of fiddly work to do to make the code work for non-8x8 MCUs, and a load more if you want to reduce by a non-integer factor. Sounds like fun though. Good luck.
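The row-at-a-time averaging itself is simple once you have decoded pixel rows. Here is a rough sketch of that part only, in plain Python: it assumes a hypothetical source that yields grayscale rows as lists of ints (a real decoder would hand them over one MCU row at a time), an integer reduction factor, and it drops any trailing rows or columns that don't fill a whole block. No JPEG decoding is done here.

```python
from typing import Iterable, Iterator, List


def stream_downscale(rows: Iterable[List[int]], width: int, factor: int) -> Iterator[List[int]]:
    """Box-average grayscale rows by an integer factor, one output row at a time.

    Only one accumulator row of width // factor sums is kept in memory.
    """
    out_width = width // factor
    acc = [0] * out_width
    rows_in_band = 0
    for row in rows:
        # Horizontal reduction happens immediately, as each input row arrives.
        for x in range(out_width):
            acc[x] += sum(row[x * factor:(x + 1) * factor])
        rows_in_band += 1
        if rows_in_band == factor:
            # A full factor x factor block has been summed for every output pixel.
            area = factor * factor
            yield [total // area for total in acc]
            acc = [0] * out_width
            rows_in_band = 0


if __name__ == "__main__":
    # Hypothetical row source: a generator, so the full image never exists in memory.
    source = ([(x + y) % 256 for x in range(24)] for y in range(24))
    for out_row in stream_downscale(source, width=24, factor=12):
        print(out_row)
```

This corresponds to the "horizontally resize as you read" variant: the working set is one accumulator row plus whatever the decoder needs for the current MCU row.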

gzipping server responses worse off

Following the advice of Yahoo's performance team, I decided to enable mod_deflate on Apache. In checking the results (using HTTPWatch), the gzipped responses took on average 100 milliseconds longer than the non-gzipped ones.
The server is under average load, using <5% of CPU. The compression level is at the minimum.
Have you guys experienced results like this or read about it? I very much appreciate any input. Thanks.
What kind of responses are you sending? You won't notice any benefits in compressing certain kinds of binary data, e.g. images, Flash animations and other such assets; GZip works best for text.
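You can check this quickly with Python's gzip module. The sketch below compares a repetitive HTML-like payload with random bytes; the random bytes are only a stand-in for already-compressed assets such as JPEGs, and both payloads are invented for illustration. Level 1 is used to mirror a minimum compression level.

```python
import gzip
import os


def gzip_ratio(payload: bytes, level: int = 1) -> float:
    """Return compressed size divided by original size (lower is better)."""
    return len(gzip.compress(payload, compresslevel=level)) / len(payload)


if __name__ == "__main__":
    html = b"<li class='item'><a href='/page'>Some link text</a></li>\n" * 2000
    noise = os.urandom(len(html))  # stands in for already-compressed binary data
    print(f"text:   {gzip_ratio(html):.3f}")
    print(f"binary: {gzip_ratio(noise):.3f}")
```

If most of what you serve looks like the second case, restricting compression to text content types is the usual approach.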
Also, compressing data will incur a slight performance overhead on both server and client, but you expected that, right?
I don't think Yahoo's point is that gzipping will be faster. It's that if you look at the marginal cost of bandwidth versus CPU power, you're better off using more CPU if it allows you to use less bandwidth.
I'd agree with Rob that you need to figure out whether the delay is due to Apache not serving the file as quickly because it has to go through compression, or whether it's something else. Just watching the HTTP response is not going to tell you WHY it's slower, just that it is.