We're currently creating a device for a customer that will receive a block of data (say, 5-10 KB) from a PC application. This is a bit simplified, so assume that the data must be passed and uncompressed often, not just once a year. The communication channel is really, really slow, so we'd like to compress the data beforehand, pass it to the device, and let it uncompress the data into its internal flash. The device itself, however, runs on a microcontroller that is not very fast and does not have a lot of memory. It has enough flash memory to store the result and can uncompress the data block as it is received, but it may not have enough RAM to hold the entire compressed or uncompressed data block (let alone both!). And of course, it doesn't have an operating system or other such luxuries.
This means we need a sufficiently fast uncompression algorithm that does not use a lot of memory. The compression can be slow and ugly, since we're doing it on the PC side; C or .NET code is preferred for it, though, to make things easier. The uncompression code should be in C, since it's unlikely that anyone has an ASM-optimized version for our controller.
We found LZO, which would be almost perfect for us, but it has a so-called "free" license (GPL) by default, which makes it totally unusable for our customer. The author says that commercial licenses are available on request, but unfortunately he's currently unreachable (for non-technical reasons, as the news on his site says).
I found a few other libraries, including the puff.c from zlib, and we're still investigating, but I thought I'd ask for your experience:
Which compression algorithm and/or library do you recommend for embedded purposes, given that the decompression device has really limited resources and source code and a commercial license are required?
You might want to check out one of these which are not GPL and are fairly compact implementations:
fastlz - MIT license, fairly simple code
lzjb - Sun's CDDL, used in ZFS for compression, simple and very short
liblzf - BSD-style license, small, fast
lzfx - BSD-style license, based on liblzf, small, fast
These algorithms are all derived from the original Lempel–Ziv (LZ77) algorithm; that's the "LZ" they have in common. (Note that this is LZ77, not Lempel–Ziv–Welch/LZW, which is a different branch of the family.)
https://en.wikipedia.org/wiki/LZ77_and_LZ78
I have used LZSS. I used code from Haruhiko Okumura as a base. It uses the last portion of the uncompressed data (2 KB) as the dictionary. The code can be modified to not require a temporary ring buffer if you have no memory to spare. The licensing is not clear from his site, but some versions were released with a "Use, distribute, and modify this program freely" line included, and the code is used by commercial vendors.
Here is an implementation based on the same code that forms part of the Allegro game library. Allegro licensing is giftware or zlib.
Another option could be the lzfx library, which implements LZF. I have not used it yet, but it seems nice. It also uses previous output as its dictionary, so it has low memory requirements, and it is released under a BSD license.
One alternative could be the LZ77 coder/decoder in the Basic Compression Library.
Since it uses the unpacked data history for its dictionary, it uses no extra RAM except for the compressed and uncompressed data buffers. It should be ideal for your use case (zlib license, portable C). The entire decoder is just 70 lines of code (including comments), and really fast.
EDIT: Yet another alternative is the liblzg library, which is a refined version of the aforementioned LZ77 coder/decoder. It compresses better, is generally faster, and requires no memory for decompression. It is very, very free (zlib license).
I would recommend ZLIB.
From the wiki:
The library provides facilities for control of processor and memory use
There are also facilities for conserving memory. These are probably only useful in restricted memory environments such as some embedded systems.
zlib is also used in many embedded devices because the code is portable, liberally licensed, and has a relatively small memory footprint.
A lot depends on the nature of the data. If it is simple enough, you may not need anything very fancy. For example if the downloaded data was a simple image (for example something like a line graph), a simple run length encoding could cut the data down by a factor of ten and you would need trivial amounts of code and RAM to decode it.
Of course if the data is more complex, then this won't be of much use. But I'd start by exploring the data being sent and see if there are specific aspects which would allow you to compress it more effectively than using a general purpose algorithm.
You might want to check out Jørgen Ibsen's aPlib - a couple of excerpts from the product page:
The compression ratios achieved by aPLib combined with the speed and tiny footprint of the depackers (as low as 169 bytes!) makes it the ideal choice for many products.
aPLib is free to use even for commercial use, please check the included license for details.
The compression library is closed-source (yes, I know this could be a problem), but has precompiled libraries for a variety of compilers and operating systems, including both 32- and 64-bit editions. There's C and x86 assembly source code for the decompressor.
EDIT:
Jørgen also has a free (zlib license) BriefLZ library you could check out if not having compressor source is a big problem.
I've seen people use 7zip on an embedded system with memory in the tens of megabytes.
There is a custom version of zlib for microcontrollers based on ARM Cortex-M (M0, M0+, M1, M3, M4):
https://github.com/kuym/Zlib-ARM-Cortex-M
We used an LPC546xx-family microcontroller in our project; currently, at the initial stage, we are finalizing the software and hardware requirements. The basic firmware (which contains the RTOS, third-party stacks, libraries, etc.) is currently 480 KB. Once the full application is developed, the size will exceed the internal flash (512 KB), and in addition we need storage that can hold a firmware update image separately.
So we planned to use a 4 MB/8 MB SPI flash (S25LP064A-JBLE, http://www.issi.com/WW/pdf/IS25LP032-064-128.pdf, serial flash memory) to boot and run the firmware.
Is it recommended to run code from SPI flash? How can I map external flash memory directly into the CPU's memory space? Can anyone give an example that contains this memory mapping (linker script, etc.) or a demo application in which the LPC546xx runs from SPI flash?
Generally speaking it's not recommended, or differently put: the closer to the CPU the better. Both S25LP064A and LPC546xx however support XIP, so it is viable.
This is not a trivial issue, as many aspects are involved. In other words, the issue is best avoided and should really have been ironed out at the planning stage. Embedded systems are more about compromise than anything, and making the right/better choices takes skill and experience.
Same question with replies on the NXP forum: link
512 KB of NVRAM is huge. There is almost certainly room for optimisation, even if third-party libraries are used.
On a related note this discussion concerning XIP should give valuable insight: link.
I would strongly encourage use of a file system if you aren't using one already, and external storage is much better suited for that; the further the storage sits from the computational unit, the more relevant this becomes. That's not XIP, and the penalty is a copy-to-RAM either way you do it, i.e. performance will be slower. But in my experience, the need for speed has often not been thoroughly considered and is at least partially greatly overestimated.
Regarding your mentioning of RTOS and FW-upgrade:
Unless it's a poor RTOS, there's file-system awareness built in. Especially for FW upgrading (note: you'll need room for three images, factory reset included), unless that's already supported by the SoC vendor by some other means (OTA), a file system will make life much easier and less risky. If there's no FS awareness, it can be added.
FW upgrade requires a lot of extra storage, and more so if the scheme is simpler. Simpler is, however, also safer, which matters hugely for FW upgrades in particular. In the simplest case (a flat binary image), you'll need at least twice the amount of memory you're already consuming.
All-in-all: I think the direction you're going is viable and depending on the actual situation perhaps your only choice.
I'm looking for a good compression algorithm to use for decompressing data from a flash chip to load to an FPGA (a Xilinx Spartan6-LX9, on the Mojo development board). It must be fast to decompress and not require a lot of working memory to do so, as the CPU (an ATmega16U4) is clocked at 8 MHz and has only 2 KiB of RAM and 16 KiB of program flash, some of which is already in use. Compression speed is not particularly important, as compression will only be run once on a computer, and the compression algorithm need not work on arbitrary inputs.
Here is an example bitstream. The format is documented in the Spartan-6 FPGA Configuration manual (starting on page 92).
Generally, the patterns present in the data fall into a few categories, and I'm not sure which of these will be easiest to exploit given the constraints I'm working with:
The data is organized overall into a set of packets of a known format. Certain parts of the bitstream are somewhat "stereotyped" (e.g, it will always begin and end by writing to certain registers), and other commands will appear in predictable sequences.
Some bytes are much more common than others. 00 and FF are by far the most frequent, but other bytes with few bits set (e.g, 80, 44, 02) are also quite common.
Runs of 00 and FF bytes are very frequent. Other patterns will sometimes appear on a local scale (e.g, a 16-byte sequence will be repeated a few times), but not globally.
What would be an appropriate compression algorithm (not a library, unless you're sure it'll fit!) for this task, given the constraints?
You should consider using the LZO compression library. It has probably one of the fastest decompressors in existence, and decompression requires no extra memory. Compression, however, needs 64 KB of memory (or 8 KB for one of the compression levels). If you only need to decompress, it might just work for you.
The LZO project even provides a special cut-down version of the library called miniLZO. According to the author, miniLZO compiles to less than 5 KB of binary on i386. Since you have 16 KB of flash, it might just fit within your constraints.
The LZO compressor is currently used by UPX (the Ultimate Packer for eXecutables).
From your description, I would recommend run-length encoding followed by Huffman coding the bytes and runs. You would need very little memory over the data itself, mainly for accumulating frequencies and building a Huffman tree in place. Less than 1K.
You should make a histogram of the lengths of the runs to help determine how many bits to allocate to the run lengths.
Have you tried the built-in bitstream compression? It can work really well on non-full devices. It's a bitgen option, and the FPGA supports it out of the box, so it has no resource impact on your micro.
The way the compression is achieved is described here:
http://www.xilinx.com/support/answers/16996.html
Other possibilities have been discussed on comp.arch.fpga:
https://groups.google.com/forum/?fromgroups#!topic/comp.arch.fpga/7UWTrS307wc
It appears that one poster implemented LZMA successfully on a relatively constrained embedded system. You could use 7zip to check what sort of compression ratio you might expect and see if it's good enough before committing to implementation of the embedded part.
I am interested in knowing the best approach for bulk memory copies on the x86 architecture. I realize this depends on machine-specific characteristics; the main target is typical desktop machines made in the last 4-5 years.
I know that in the old days MOVSD with REP was nominally the fastest approach because you could move 4 bytes at a time, but I have read that nowadays MOVSB is just as fast and is simpler to write, so you may as well do a byte move and just forget about the complexities of a 4-byte move.
A related question is whether MOVxx instructions are worth it at all. If the CPU can run so much faster than the memory bus, then maybe it is pointless to use a CISC string move and you may as well use plain MOVs. That would be most attractive, because then I could use the same algorithms on other processor architectures like ARM. This brings up the analogous question of whether ARM's specialized instructions for bulk memory moves (which are totally different from Intel's) are worth it or not.
Note: I have read section 3.7.6 in the Intel Optimization Reference Manual so I am familiar with the basics. I am hoping someone can relate practical experience in the area beyond what is in this manual.
Modern Intel and AMD processors have optimisations for REP MOVSB that let it copy entire cache lines at a time when it can, making it one of the best (perhaps not the absolute fastest, but pretty close) methods of copying bulk data.
As for ARM, it depends on the architecture version, but in general using an unrolled loop would be the most efficient.
I am wondering: how large is GLib? Can it be used directly on an embedded system, or is it usually too large for embedded systems? Is there an embedded-systems version of GLib?
Thanks
"Embedded" barely means anything. The systems range from tiny (few registers of volatile memory, read-only code memory) to nearly as big as computers have ever been on this planet.
Glib isn't huge as desktop libraries go, but it's not a tiny microcontroller library either. There's obviously a vague line somewhere in that range below which it just won't fit, but without knowing the system there's no way to tell.
Based on a comment it seems your environment ranks "ginormous" in the embedded scale, so glib will probably fit if the system underneath makes porting it worthwhile. If you have a GPOS (like Linux or BSD), there might already be a port.
If you can fit GLibC then you can fit GLib.
GLib can be used for embedded systems and is quite small. However, newer versions of GLib require a lot more libraries to build. If what you need is a subset like async queues, threads, queues, lists, etc., you might consider getting the code and building GLib 1.2.x instead of the latest. I have been considering building a subset of GLib for embedded work that removes some of the newer features.
How much storage space do you have on your embedded device? There is no problem compiling GLib for an embedded device, provided you have enough storage. Since we don't know anything about your device, it's kind of hard to say, though.
I'm using the ARM7TDMI-S (NXP processor) and I need a file system capable of reading/writing to an SD card. There are so many available, what have people used and been happy with? One that requires the least amount of setup is best - so the less I have to do to get it started (i.e. write device drivers to NXP's hardware) the better.
I am currently using CMX's RTOS as the OS for this project.
I suggest that you use either EFSL or Chan's FAT File System Module (FatFs). I have used both on MMC/SD cards without problems. The choice between them may come down to the license terms and pre-existing ports to your target. Martin Thomas's ARM Projects site has examples for both libraries.
FAT is popular precisely because it's so simple. The main problems with FAT are performance (because of its simplicity, it's not very fast) and its limited volume size (2 GB for FAT16, though up to 2 TB for FAT32).