What are good compression-oriented application programming interfaces (APIs)?
Do people still use the
1991 "data compression interface" draft standard or the
1991 "stream transformation algorithm interface" draft standard
(both draft standards by Ross Williams)?
Are there any alternatives to those draft standards?
(I'm particularly looking for C APIs, but links to compression-oriented APIs in C++ and other languages would also be appreciated).
I'm experimenting with some data compression algorithms.
Typically the compressed file I'm producing is composed of a series of blocks,
with a block header indicating which compression algorithm needs to be used to decompress the remaining data in that block -- Huffman, LZW, LZP, "stored uncompressed", etc.
The block header also indicates which filter(s) need to be used to convert the intermediate stream or buffer of data from the decompressor into a lossless copy of the original plaintext -- Burrows–Wheeler transform, delta encoding, XML end-tag restoration, "copy unchanged", etc.
Rather than using a huge switch statement that selects on the "compression type" and calls each decompression or filter procedure with its own special number and order of parameters,
it simplifies my code if every algorithm has exactly the same API -- the same number and order of parameters, etc.
Rather than waiting for the decompressor to run through the entire input stream before handing its output to the first filter,
it would be nice if the API supported decompressed output data coming out of the final filter "relatively quickly" (low latency) after relatively little compressed data has been fed into the initial decompressor.
It would be nice if the API could be used in systems that have only one thread or process.
Currently I'm kludging together my own internal API,
re-using existing compression algorithm implementations by
writing short wrapper functions to convert between my internal API and the special number and order of parameters used by each implementation.
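For illustration, here is a minimal sketch of the kind of uniform interface I have in mind (all the names here are hypothetical, my own invention, and not taken from any existing standard):

    #include <stddef.h>

    /* Hypothetical uniform signature that every decompressor and filter is
     * wrapped to: consume src, write into dst, return the number of bytes
     * written or a negative value on error. */
    typedef long (*transform_fn)(const unsigned char *src, size_t src_len,
                                 unsigned char *dst, size_t dst_cap);

    /* One table entry per "compression type" / filter id in the block header. */
    struct transform_entry {
        unsigned char id;     /* value stored in the block header */
        transform_fn  apply;  /* wrapper around the real implementation */
    };

    /* Example wrapper: the "stored uncompressed" case adapted to the common API. */
    static long stored_copy(const unsigned char *src, size_t src_len,
                            unsigned char *dst, size_t dst_cap)
    {
        if (src_len > dst_cap)
            return -1;                    /* output buffer too small */
        for (size_t i = 0; i < src_len; i++)
            dst[i] = src[i];
        return (long)src_len;
    }

A table of such entries then replaces the big switch statement: the id read from the block header selects the entry, and every wrapper is called in exactly the same way.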
Is there an already-existing API that I could use rather than designing my own from scratch?
Where can I find such an API?

I fear such an "API" does not exist.
In particular, a requirement such as "starting stage 2 while stage 1 is still ongoing and unfinished" is completely implementation-dependent and cannot be added later by an API layer.
By the way, Maciej Adamczyk recently tried the same thing as you.
He made an open-source benchmark comparing multiple compression algorithms in a block-compression scenario. The code can be consulted here:
http://encode.ru/threads/1371-Filesystem-benchmark?p=26630&viewfull=1#post26630
He was obliged to "encapsulate" all these different compressor interfaces in order to cope with their differences.
Now for the good news: most compressors tend to have relatively similar C interfaces when it comes to compressing a block of data.
As an example, they can be as simple as this one:
http://code.google.com/p/lz4/source/browse/trunk/lz4.h
So, in the end, the adaptation layer is not so heavy.

How to implement a dual-write system for an SQL database and Elasticsearch

I did my own research and found out that there are several ways to do this, but the most accurate is Change Data Capture (CDC). However, I don't see its benefits compared to the asynchronous method, for example:
Synchronous double-write: Elasticsearch is updated synchronously when the DB is updated. This technical solution is the simplest, but it faces the largest number of problems, including data conflicts, data overwriting, and data loss. Make your choice carefully.
Asynchronous double-write: When the DB is updated, an MQ message is recorded and used to notify the consumer. This allows the consumer to query the DB afterwards so that the data is ultimately updated to Elasticsearch. This technical solution is highly coupled with business systems. Therefore, you need to compile programs specific to the requirements of each business. As a result, rapid response is not possible.
Change Data Capture (CDC): Change data is captured from the DB, pushed to an intermediate program, and synchronously pushed to Elasticsearch by using the logic of the intermediate program. Based on the CDC mechanism, accurate data is returned at an extremely fast speed in response to queries. This solution is less coupled to application programs. Therefore, it can be abstracted and separated from business systems, making it suitable for large-scale use.
(Source: Alibabacloud.com)
Another article says that the asynchronous approach is also risky: if one data source is down, we cannot easily roll back.
https://thorben-janssen.com/dual-writes/
So my question is: should I use CDC to perform persistence operations across multiple data sources? Why is CDC better than the asynchronous approach, given that it is based on the same principle?

Is it better to open or to read large matrices in Julia?

I'm in the process of switching over to Julia from other programming languages, and one of the things that Julia will let you hang yourself on is memory. I think this is likely a good thing: a programming language where you actually have to think about some amount of memory management forces the coder to write more efficient code. This would be in contrast to something like R, where you can seemingly load datasets that are larger than the allocated memory. Of course, you can't actually do that, so I wonder how R gets around that problem.
Part of what I've done in other programming languages is work on large tabular datasets, often converted to an R data frame or a matrix. I think the way this is handled in Julia is to stream data in wherever possible, so my main question is this:
Is it better to use readline("my_file.txt") to access data or is it better to use open("my_file.txt", "w")? If possible, wouldn't it be better to access a large dataset all at once for speed? Or would it be better to always stream data?
I hope this makes sense. Any further resources would be greatly appreciated.
I'm not an extensive user of Julia's data-ecosystem packages, but CSV.jl offers the Chunks and Rows alternatives to File, and these might let you process the files incrementally.
While it may not be relevant to your use case, the mechanisms mentioned in @Przemyslaw Szufel's answer are used in other places as well. Two I'm familiar with are the TiffImages.jl and NRRD.jl packages, both of which are I/O packages mostly for loading image data into Julia. With these, you can load terabyte-sized datasets on a laptop. There may be more packages that use the same mechanism, and many package maintainers would probably be grateful to receive a pull request that supports optional memory-mapping when applicable.
In R you cannot have a data frame larger than memory. There is no magical buffering mechanism. However, when running R-based analytics you could use the disk.frame package for that.
Similarly, in Julia, if you want to process data frames larger than memory you need to use an appropriate package. The most reasonable and natural option in the Julia ecosystem is JuliaDB.
If you want a more low-level solution, have a look at:
Mmap, which provides memory-mapped I/O and exactly solves the issue of conveniently handling data too large to fit into memory;
SharedArrays, which offers a disk-mapped array with an implementation based on Mmap.
In conclusion, if your data is data-frame based, try JuliaDB; otherwise have a look at Mmap and SharedArrays (look at the filename parameter).

Analyzing bitstreams using IceStorm

I'm trying to understand the bitstreams generated by Yosys/arachne-pnr as described on http://www.clifford.at/icestorm/:
The recommended approach for learning how to use this documentation is to synthesize very simple circuits using Yosys and Arachne-pnr, run the icestorm tool icebox_explain on the resulting bitstream files, and analyze the results using the HTML export of the database mentioned above. icebox_vlog can be used to convert the bitstream to Verilog. The output file of this tool will also outline the signal paths in comments added to the generated Verilog code.
In order to understand the effect a change in the bitstream has, it would be helpful if I could change the .ex file and convert it back to an ASCII bitstream (instead of having to identify the bit manually) for uploading to the FPGA. Is there a way to do so?
I'm a bit concerned about damaging the FPGA with an invalid bitstream. Are there situations where this is known to happen? Is there a way to simulate a bitstream?
Also, it would be helpful to have some kind of “higher-level” explanation format which e.g. shows the IE/REN bits on the I/O blocks to which they correspond, not the one on which they have to be set in the bitstream. Is there such a format?
I know of the possibility to generate an equivalent Verilog circuit, but the problem with this is that it doesn't usually allow me a lossless round-trip back into a bitstream. Is there a way to generate an equivalent Verilog circuit which (e.g. by instantiating the blocks explicitly) yields the exact same bitstream when processed with Yosys/arachne-pnr?
I'm a bit concerned about damaging the FPGA with an invalid bitstream. Are there situations where this is known to happen? Is there a way to simulate a bitstream?
I have not damaged any FPGA so far. (I have, however, managed to damage the serial flash on one iCEstick after running some test that reprogrammed it in a loop.)
But this does not mean that you cannot damage your FPGA by programming it with an invalid bitstream. You could theoretically configure the FPGA in a way that produces a driver-driver conflict. I don't know how well the hardware deals with something like that. I have not run any experiments to find out.
Also, it would be helpful to have some kind of “higher-level” explanation format which e.g. shows the IE/REN bits on the I/O blocks to which they correspond, not the one on which they have to be set in the bitstream. Is there such a format?
icebox_vlog produces a higher-level output. But it does not output things like I/O blocks, so it might be too high-level for your needs.
I know of the possibility to generate an equivalent Verilog circuit, but the problem with this is that it doesn't usually allow me a lossless round-trip back into a bitstream. Is there a way to generate an equivalent Verilog circuit which (e.g. by instantiating the blocks explicitly) yields the exact same bitstream when processed with Yosys/arachne-pnr?
Not at the moment. But it should not be too hard to extend icebox_vlog to provide this functionality. So if you really need that, it might be something within your reach to add yourself.

QR code-like alternative with extremely low error rate and ability to read bent codes

I'm trying to find an alternative to QR codes (I'd also be willing to accept an entirely novel solution and implement it myself) that meets certain specifications.
First, the codes will often end up on thin pipes, and so need to be readable around a cylinder. The advantage to this is that the effect on the image from wrapping it around a cylinder is easy to express geometrically, and the codes will never be placed on a very irregular shape.
Second, read accuracy must be very high, as any read mistake would be extremely costly. If this means larger codes with more redundancy for better error correction, so be it.
Third, ability to be read by the average smartphone camera from a few inches out.
Fourth, storage space of around half a kilobyte per code.
Do you know of such a code?
The Data Matrix Rectangular Extension (DMRE) improves upon the standard set of rectangular Data Matrix symbol sizes in an algorithmically compatible manner, thus increasing the range of suitable applications with no real downsides.
Reliable cylindrical marking is a primary use case.
Regardless of symbology you will be unable to approach sufficient data density to achieve 0.5KB of binary data in a single compact, narrow symbol scanned using a standard camera phone. However, most 2D symbologies (DMRE included) support a feature called Structured Append that allows chaining of multiple symbols that can be scanned in any order to produce a single read when all components are accounted for.
If the data to be encoded is known to be highly structured (e.g. mostly numeric or alphanumeric) then the internal encoding process of Data Matrix will be more optimised than for general binary data. For example, the largest DMRE symbol (26×64) will provide up to 236 numeric characters, ~175 alphanumeric characters and only 116 bytes.
If the default error recovery rate is insufficient then including a checksum in the data may be appropriate.
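For example, a CRC-16 appended to the payload before encoding is a cheap end-to-end check on top of the symbology's built-in error correction. A sketch (the polynomial and initial value are the common CCITT choices, not anything mandated by Data Matrix):

    #include <stdint.h>
    #include <stddef.h>

    /* Bitwise CRC-16/CCITT (polynomial 0x1021, initial value 0xFFFF).
     * Append the two result bytes to the payload before encoding, then
     * recompute and compare after every scan. */
    static uint16_t crc16_ccitt(const uint8_t *data, size_t len)
    {
        uint16_t crc = 0xFFFF;
        for (size_t i = 0; i < len; i++) {
            crc ^= (uint16_t)data[i] << 8;
            for (int bit = 0; bit < 8; bit++)
                crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021)
                                     : (uint16_t)(crc << 1);
        }
        return crc;
    }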
DMRE has just been voted to be accepted as an ISO/IEC project and will likely become an international standard enjoying broad hardware and software support in due course.
Another option may be to investigate PDF417, which has a broader range of symbol sizes; however, its data density is somewhat less than Data Matrix.
DMRE references: AIM specification and explanatory notes.

What would be a good (de)compression routine for this scenario

I need a FAST decompression routine, optimized for a restricted-resource environment like embedded systems, working on binary (hex) data that has the following characteristics:
Data is 8-bit (byte) oriented (the data bus is 8 bits wide).
Byte values do NOT range uniformly from 0 to 0xFF, but follow a Poisson distribution (bell curve) in each data set.
The data set is fixed in advance (to be burnt into flash), and each set is rarely larger than 1-2 MB.
Compression can take as much time as required, but decompression of a byte should take at most 23 µs in the worst case, with a minimal memory footprint, as it will be done in a restricted-resource environment like an embedded system (3 MHz-12 MHz core, 2 KB of RAM).
What would be a good decompression routine?
Basic run-length encoding seems too wasteful - I can immediately see that adding a header section to the compressed data, putting unused byte values to use to represent oft-repeated patterns, would give phenomenal performance!
If I can see that after investing only a few minutes, surely much better algorithms must already exist from people who love this stuff?
I would like to have some "ready to go" examples to try out on a PC so that I can compare the performance vis-a-vis a basic RLE.
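For reference, by "basic RLE" I mean something as naive as the following sketch (my own toy format, each pair of input bytes being a count followed by a value, not any particular library's format):

    #include <stddef.h>

    /* Naive byte-oriented RLE decoder used as a baseline: each (count, value)
     * pair in the input expands to `count` copies of `value` in the output.
     * Returns the number of bytes written. */
    static size_t rle_decode(const unsigned char *in, size_t in_len,
                             unsigned char *out, size_t out_cap)
    {
        size_t written = 0;
        for (size_t i = 0; i + 1 < in_len; i += 2) {
            unsigned count = in[i];
            unsigned char value = in[i + 1];
            if (written + count > out_cap)
                return written;            /* output buffer full */
            while (count--)
                out[written++] = value;
        }
        return written;
    }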
The two solutions I use when performance is the only concern:
LZO: has a GPL license.
liblzf: has a BSD license.
miniLZO.tar.gz: this is LZO, just repacked into a 'minified' version that is better suited to embedded development.
Both are extremely fast when decompressing. I've found that LZO will create slightly smaller compressed data than liblzf in most cases. You'll need to do your own benchmarks for speeds, but I consider them to be "essentially equal". Both are light-years faster than zlib, though neither compresses as well (as you would expect).
LZO, in particular miniLZO, and liblzf are both excellent for embedded targets.
If you have a preset distribution of values, meaning the probability of each value is fixed over all data sets, you can create a Huffman encoding with fixed codes (the code tree does not have to be embedded into the data).
Depending on the data, I'd try Huffman with fixed codes or LZ77 (see Brian's links).
Well, the two main algorithms that come to mind are Huffman and LZ.
The first basically just creates a dictionary. If you restrict the dictionary's size sufficiently, it should be pretty fast... but don't expect very good compression.
The latter works by adding back-references to repeating portions of the output file. This would probably take very little memory to run, except that you would need to either use file I/O to read the back-references or store a chunk of the recently read data in RAM.
I suspect LZ is your best option if the repeated sections tend to be close to one another. Huffman works by having a dictionary of often-repeated elements, as you mentioned.
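To make the LZ idea concrete, the core of any LZ-style decoder is just a back-reference copy like the following sketch (independent of any particular LZ variant's bitstream format):

    #include <stddef.h>

    /* Copy `length` bytes starting `distance` bytes behind the current write
     * position. Copying byte by byte is deliberate: when distance < length,
     * bytes written earlier in this same copy are reused, which is how LZ
     * repeats a short pattern cheaply. */
    static unsigned char *lz_copy_match(unsigned char *out, size_t distance, size_t length)
    {
        const unsigned char *src = out - distance;
        while (length--)
            *out++ = *src++;
        return out;
    }

The decoder's only state is the output written so far, which is why LZ decompression suits small-RAM targets as long as the back-reference window stays within whatever you can keep addressable.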
Since this seems to be audio, I'd look at either differential PCM or ADPCM, or something similar, which will reduce it to 4 bits/sample without much loss in quality.
With the most basic differential PCM implementation, you just store a 4-bit signed difference between the current sample and an accumulator, add that difference to the accumulator, and move to the next sample. If the difference is outside of [-8, 7], you have to clamp the value, and it may take several samples for the accumulator to catch up. Decoding is very fast using almost no memory: just add each value to the accumulator and output the accumulator as the next sample.
A small improvement over basic DPCM, to help the accumulator catch up faster when the signal gets louder and higher-pitched, is to use a lookup table to decode the 4-bit values to a larger non-linear range, where they're still 1 apart near zero but increase in larger increments toward the limits. And/or you could reserve one of the values to toggle a multiplier; deciding when to use it is up to the encoder. With these improvements, you can either achieve better quality or get away with 3 bits per sample instead of 4.
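A decoder for the basic scheme above fits in a handful of lines. A sketch, assuming 8-bit unsigned samples, two 4-bit deltas packed per input byte (high nibble first), and an accumulator starting at mid-scale:

    #include <stdint.h>
    #include <stddef.h>

    /* Minimal 4-bit differential PCM decoder: each nibble is a signed delta
     * in [-8, 7] added to the accumulator, which is clamped to the 8-bit
     * sample range and emitted as the next sample. */
    static void dpcm_decode(const uint8_t *in, size_t in_len, uint8_t *out)
    {
        int acc = 128;                          /* accumulator starts at mid-scale */
        for (size_t i = 0; i < in_len; i++) {
            for (int shift = 4; shift >= 0; shift -= 4) {
                int delta = (in[i] >> shift) & 0x0F;
                if (delta > 7)
                    delta -= 16;                /* sign-extend the 4-bit value */
                acc += delta;
                if (acc < 0)   acc = 0;         /* clamp to the sample range */
                if (acc > 255) acc = 255;
                *out++ = (uint8_t)acc;
            }
        }
    }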
If your device has a non-linear μ-law or A-law ADC, you can get quality comparable to 11-12 bit with 8 bit samples. Or you can probably do it yourself in your decoder. http://en.wikipedia.org/wiki/M-law_algorithm
There might be inexpensive chips out there that already do all this for you, depending on what you're making. I haven't looked into any.
You should try different compression algorithms with either a compression software tool with command line switches or a compression library where you can try out different algorithms.
Use typical data for your application.
Then you will know which algorithm best fits your needs.
I have used zlib in embedded systems for a bootloader that decompresses the application image to RAM on start-up. The licence is nicely permissive, no GPL nonsense. It does make a single malloc call, but in my case I simply replaced this with a stub that returned a pointer to a static block, and a corresponding free() stub. I did this by monitoring its memory allocation usage to get the size right. If your system can support dynamic memory allocation, then it is much simpler.
http://www.zlib.net/
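The stub approach described above amounts to pointing the z_stream's zalloc/zfree hooks at a fixed buffer instead of the heap. A rough sketch (the arena size is illustrative and has to be tuned by observing zlib's actual allocation requests, exactly as described above):

    #include <string.h>
    #include "zlib.h"

    /* Fixed arena handed to zlib in place of malloc/free. */
    #define INFLATE_ARENA_SIZE (64u * 1024u)   /* illustrative; size by measurement */

    static unsigned char arena[INFLATE_ARENA_SIZE];
    static size_t        arena_used;

    static voidpf static_alloc(voidpf opaque, uInt items, uInt size)
    {
        size_t need = (size_t)items * size;
        need = (need + 7u) & ~(size_t)7u;      /* keep returned pointers aligned */
        (void)opaque;
        if (arena_used + need > sizeof(arena))
            return Z_NULL;                     /* arena too small: init will fail */
        {
            voidpf p = arena + arena_used;
            arena_used += need;
            return p;
        }
    }

    static void static_free(voidpf opaque, voidpf address)
    {
        (void)opaque;
        (void)address;                         /* whole arena is reclaimed by resetting arena_used */
    }

    int start_inflate(z_stream *strm)
    {
        memset(strm, 0, sizeof(*strm));
        arena_used   = 0;
        strm->zalloc = static_alloc;
        strm->zfree  = static_free;
        return inflateInit(strm);              /* Z_OK on success */
    }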