Zlib compression on MSP430 - embedded

Has anyone attempted using zlib compression on an MSP430? Do you have any advice on how to compile the library for use in an MSP430 project? (I am using IAR Embedded Workbench.)

According to the MSP430 datasheets and the Wikipedia article, you don't have enough RAM (at most 16 KiB) even for the sliding window alone (32 KiB). So you cannot use any deflate implementation on the MSP430, and since zlib is just a deflate implementation, the same holds for zlib. Even writing your own deflate implementation would not help: deflate needs 32 KiB for the sliding dictionary plus some extra memory for its Huffman trees, and that is just the decompression side. For compression you also need memory for the hash-chain match finder, about 7.5 × the dictionary size = 240 KiB (going by 7-Zip's deflate implementation). If you really need compression on such a small architecture, I advise looking at custom byte-coded LZSS compression algorithms. They're fast and lightweight, but not strong enough to compete with deflate, mainly because of the difference in entropy coding.

I used to build zlib as a test case for processor development, but as the world transitioned to 64-bit, its haphazard mixing of unsigned long and unsigned int without careful typecasting wreaked havoc on compilers. It may have settled down by now, but I walked away from using it.
It does need/want a lot of memory, and the MSP430 is particularly small on the RAM side compared to some of the competition.
I have an MSP430 simulator you can use, http://github.com/dwelch67/msp430sim, which is easy to configure with lots of RAM, more than you will find in any real chip. Although zlib may still want the full 64K and not leave you with any; you would just have to see what happens. Maybe I will take this on and try it myself as a test for my simulator. On the above simulator, or maybe one of my others, I have used a different compression tool with a very (relatively) small memory footprint. It is not clear whether you need zlib specifically or just some sort of decompression in general.

I have built it for a number of targets, not specifically the MSP430, but that should not matter. It is all ISO C and depends only on standard library calls. It uses dynamic memory allocation, so you'll need a heap.
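If a heap is unavailable or undesirable, note that zlib lets you supply your own allocator through hooks in z_stream instead of relying on malloc/free. A minimal sketch, assuming a crude static bump pool (the 12 KB pool size is a guess; size it for your windowBits and inflate state):

```c
#include <zlib.h>
#include <stddef.h>

/* Trivial bump allocator over a static buffer. Inflate allocates a fixed
 * set of blocks up front and frees them all at inflateEnd, so "free" can
 * be a no-op followed by resetting pool_used between streams. */
static unsigned char pool[12 * 1024];
static size_t pool_used;

static voidpf pool_alloc(voidpf opaque, uInt items, uInt size)
{
    size_t n = (size_t)items * size;
    (void)opaque;
    if (pool_used + n > sizeof pool)
        return Z_NULL;
    void *p = pool + pool_used;
    pool_used += n;
    return p;
}

static void pool_free(voidpf opaque, voidpf addr)
{
    (void)opaque; (void)addr;   /* reclaimed by resetting pool_used */
}

int start_inflate(z_stream *strm)
{
    strm->zalloc = pool_alloc;  /* zlib calls these instead of malloc/free */
    strm->zfree  = pool_free;
    strm->opaque = Z_NULL;
    return inflateInit(strm);
}
```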

Related

Is it recommended to use SPI flash to run code instead of internal flash, due to the memory limitation of the internal flash?

We are using an LPC546xx-family microcontroller in our project; currently, at the initial stage, we are finalizing the software and hardware requirements. The basic firmware (which contains the RTOS, third-party stacks, libraries, etc.) is currently 480 KB. Once the full application is developed, the size will exceed the internal flash (512 KB), and in addition we need storage that can hold a firmware-update image separately.
So we planned to use a 4 MB/8 MB SPI flash (S25LP064A-JBLE, http://www.issi.com/WW/pdf/IS25LP032-064-128.pdf, serial flash memory) to boot and run the firmware from.
Is it recommended to run code from SPI flash? How can I map external flash memory directly into the CPU memory space? Can anyone give an example that contains this memory mapping (linker script etc.), or a demo application in which the LPC546xx uses SPI flash?
Generally speaking it's not recommended; or, put differently: the closer to the CPU, the better. However, both the S25LP064A and the LPC546xx support XIP (execute in place), so it is viable.
This is not a trivial issue, as many aspects come into play; i.e., the issue is best avoided and should really have been ironed out at the planning stage. Embedded systems are more about compromising than anything else, and making the right/better choices takes skill and experience.
Same question with replies on the NXP forum: link
512 KB of NVRAM is huge. There is almost certainly room for optimisations, even if third-party libraries are used.
On a related note this discussion concerning XIP should give valuable insight: link.
I would strongly encourage the use of a file system if you aren't using one already, and external storage is much better suited to that; the further from the computational unit, the more relevant it becomes. That's not XIP, though, and the penalty is a copy-to-RAM whichever way you do it, i.e. performance will be lower. But in my experience the need for speed has often not been thoroughly considered, and is at least partially greatly overestimated.
Regarding your mention of the RTOS and FW upgrades:
Unless it's a poor RTOS, file-system awareness is built in. Especially for FW upgrades (note: you'll need room for three images, factory reset included), unless that is already supported by the SoC vendor through some other means (OTA), a file system will make life much easier and less risky. If there is no FS awareness, it can be added.
FW upgrade requires a lot of extra storage; more, the simpler the scheme. Simpler is, however, also safer, which matters hugely for FW upgrades in particular. In the simplest case (a flat binary image), you'll need at least twice the amount of memory you're already consuming.
All-in-all: I think the direction you're going is viable and depending on the actual situation perhaps your only choice.
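On the memory-mapping part of the question: with an XIP-capable part, the external flash simply appears as a region in the address map, and the linker needs a region for it. A hypothetical GNU ld fragment, just to show the shape of it (all origins, lengths, and section names here are placeholders; take the real XIP window address and sizes from the LPC546xx reference manual):

```
MEMORY
{
  IFLASH   (rx)  : ORIGIN = 0x00000000, LENGTH = 512K  /* internal flash */
  SPIFLASH (rx)  : ORIGIN = 0x10000000, LENGTH = 8M    /* XIP window: placeholder address */
  SRAM     (rwx) : ORIGIN = 0x20000000, LENGTH = 160K
}

SECTIONS
{
  /* Boot code, vectors, and the SPI-flash driver must stay in internal flash. */
  .text : { *(.isr_vector) *(.text.boot) } > IFLASH

  /* Code explicitly tagged for external flash, e.g. with
     __attribute__((section(".text_ext"))) in the sources. */
  .text_ext : { *(.text_ext*) } > SPIFLASH
}
```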

Data compression for FPGA bitstream

I'm looking for a good compression algorithm to use for decompressing data from a flash chip to load to an FPGA (a Xilinx Spartan6-LX9, on the Mojo development board). It must be fast to decompress and not require a lot of working memory to do so, as the CPU (an ATmega16U4) is clocked at 8 MHz and has only 2 KiB of RAM and 16 KiB of program flash, some of which is already in use. Compression speed is not particularly important, as compression will only be run once on a computer, and the compression algorithm need not work on arbitrary inputs.
Here is an example bitstream. The format is documented in the Spartan-6 FPGA Configuration manual (starting on page 92).
Generally, the patterns present in the data fall into a few categories, and I'm not sure which of these will be easiest to exploit given the constraints I'm working with:
The data is organized overall into a set of packets of a known format. Certain parts of the bitstream are somewhat "stereotyped" (e.g., it will always begin and end by writing to certain registers), and other commands will appear in predictable sequences.
Some bytes are much more common than others. 00 and FF are by far the most frequent, but other bytes with few bits set (e.g., 80, 44, 02) are also quite common.
Runs of 00 and FF bytes are very frequent. Other patterns will sometimes appear on a local scale (e.g., a 16-byte sequence will be repeated a few times), but not globally.
What would be an appropriate compression algorithm (not a library, unless you're sure it'll fit!) for this task, given the constraints?
You should consider using the LZO compression library. It has probably one of the fastest decompressors in existence, and decompression requires no extra memory. Compression, however, needs 64 KB of memory (or 8 KB for one of the compression levels). If you only need to decompress, it might just work for you.
The LZO project even provides a special cut-down version of the library called miniLZO. According to the author, miniLZO compiles to less than 5 KB of binary on i386. Since you have 16 KB of flash, it might just fit within your constraints.
The LZO compressor is currently used by UPX (the Ultimate Packer for eXecutables).
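For scale, the decompression side is essentially one call. A minimal sketch using miniLZO's public API (on entry, *out_len must hold the capacity of out; error handling trimmed):

```c
#include "minilzo.h"

/* Decompress one LZO1X block. lzo1x_decompress_safe needs no working
 * memory (the last argument is unused), which suits a 2 KiB-RAM AVR. */
int unpack_block(const unsigned char *in, lzo_uint in_len,
                 unsigned char *out, lzo_uint *out_len)
{
    if (lzo_init() != LZO_E_OK)          /* one-time library init */
        return -1;
    return lzo1x_decompress_safe(in, in_len, out, out_len, NULL);
}
```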
From your description, I would recommend run-length encoding followed by Huffman coding of the bytes and runs. You would need very little memory beyond the data itself, mainly for accumulating frequencies and building a Huffman tree in place: less than 1K.
You should make a histogram of the lengths of the runs to help determine how many bits to allocate to the run lengths.
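For the histogram step, a quick PC-side survey along these lines (a sketch; it only tallies the 00/FF runs the question highlights, and clamps runs at 255) would show where to cap the run-length field:

```c
#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

/* Histogram the lengths of 0x00/0xFF runs in a bitstream to help pick
 * how many bits to allocate to run lengths in the encoded format. */
void run_length_histogram(const uint8_t *data, size_t len)
{
    size_t hist[256] = {0};
    size_t i = 0;
    while (i < len) {
        uint8_t b = data[i];
        size_t run = 1;
        while (i + run < len && data[i + run] == b)
            run++;
        if (b == 0x00 || b == 0xFF)
            hist[run > 255 ? 255 : run]++;
        i += run;
    }
    for (int r = 1; r < 256; r++)
        if (hist[r])
            printf("run length %3d: %zu\n", r, hist[r]);
}
```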
Have you tried the built-in bitstream compression? That can work really well on non-full devices. It's a bitgen option, and the FPGA supports it out of the box, so it has no resource impact on your micro.
The way the compression is achieved is described here:
http://www.xilinx.com/support/answers/16996.html
Other possibilities have been discussed on comp.arch.fpga:
https://groups.google.com/forum/?fromgroups#!topic/comp.arch.fpga/7UWTrS307wc
It appears that one poster implemented LZMA successfully on a relatively constrained embedded system. You could use 7zip to check what sort of compression ratio you might expect and see if it's good enough before committing to implementation of the embedded part.

Optimal way to move memory in x86 and ARM?

I am interested in knowing the best approach for bulk memory copies on the x86 architecture. I realize this depends on machine-specific characteristics; the main target is typical desktop machines made in the last 4-5 years.
I know that in the old days REP MOVSD was nominally the fastest approach because it moves 4 bytes at a time, but I have read that nowadays REP MOVSB is just as fast and is simpler to write, so you may as well do a byte move and forget about the complexities of a 4-byte move.
A surrounding question is whether the MOVS instructions are worth using at all. If the CPU can run so much faster than the memory bus, then maybe it is pointless to use a CISC string move and you may as well use plain MOV loops. That would be most attractive, because then I could use the same algorithms on other processor architectures like ARM. This brings up the analogous question of whether ARM's specialized instructions for bulk memory moves (which are totally different from Intel's) are worth it or not.
Note: I have read section 3.7.6 in the Intel Optimization Reference Manual so I am familiar with the basics. I am hoping someone can relate practical experience in the area beyond what is in this manual.
Modern Intel and AMD processors optimise REP MOVSB to copy entire cache lines at a time where possible (the ERMSB enhancement), making it the best method of copying bulk data (it may not always be the fastest, but it is pretty close).
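For reference, this is what that looks like with GCC/Clang inline assembly on x86/x86-64; a minimal sketch, not a drop-in memcpy replacement (no overlap handling, and on pre-ERMSB CPUs a vectorized loop may still win):

```c
#include <stddef.h>

/* Copy n bytes with REP MOVSB. The constraints hand dst/src/n to
 * RDI/RSI/RCX, which the instruction consumes and updates as it runs;
 * "memory" tells the compiler the destination bytes changed. */
static void copy_rep_movsb(void *dst, const void *src, size_t n)
{
    __asm__ volatile ("rep movsb"
                      : "+D" (dst), "+S" (src), "+c" (n)
                      :
                      : "memory");
}
```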
As for ARM, it depends on the architecture version, but in general using an unrolled loop would be the most efficient.

Should I care about bit-endianness when making cross-platform data storing code?

When I save binary data to disk (or memory), I know I should care about byte endianness if the data must be portable across multiple platforms. But what about bit endianness? Is it fine to ignore?
I think it is fine to ignore, but there may be some pitfalls, so I would like to hear other opinions.
Bits are always arranged the same way within a byte, though there are (were) some exotic architectures where a byte was not 8 bits. In modern computing you can safely assume an 8-bit byte.
What varies (as you correctly noted) is the byte arrangement; that you should indeed take care of.
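The practical consequence: if you read and write multi-byte values through shifts rather than by overlaying structs or casting pointers, the host's byte order never enters the picture. A minimal sketch fixing the on-disk order to little-endian:

```c
#include <stdint.h>

/* Serialize a 32-bit value in a fixed (little-endian) byte order using
 * shifts, so the code behaves identically on any host. */
static void put_u32le(uint8_t *out, uint32_t v)
{
    out[0] = (uint8_t)(v);
    out[1] = (uint8_t)(v >> 8);
    out[2] = (uint8_t)(v >> 16);
    out[3] = (uint8_t)(v >> 24);
}

static uint32_t get_u32le(const uint8_t *in)
{
    return (uint32_t)in[0]        | ((uint32_t)in[1] << 8) |
           ((uint32_t)in[2] << 16) | ((uint32_t)in[3] << 24);
}
```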
Big-endian vs. little-endian mattered more when PowerPC was prevalent (due to its use in Macs). Now that the major OSes (Windows, OS X, iOS, Android) and hardware platforms (x86, x86-64, ARM) are all little-endian, it is not much of a concern.

Compact decompression library for embedded use

We're currently creating a device for a customer that will receive a block of data (say, 5-10 KB) from a PC application. This is a bit simplified, so assume that the data must be transferred and decompressed often, not just once a year. The communication channel is really, really slow, so we'd like to compress the data beforehand, send it to the device, and let it decompress the data into its internal flash. The device itself runs on a microcontroller that is not very fast and does not have much memory. It has enough flash to store the result, and can decompress the data block as it is received, but it may not have enough RAM to hold the entire compressed or uncompressed (let alone both!) data blocks. And of course, it has no operating system or other such luxuries.
This means we need a sufficiently fast decompression algorithm that does not use much memory. The compression can be slow and ugly, since we're doing it on the PC side; C or .NET code is preferred there, to make things easier. The decompression code should be in C, since it's unlikely that anyone has an assembly-optimized version for our controller.
We found LZO, which would be almost perfect for us, but its default license is the GPL (so-called "free"), which makes it totally unusable for our customer. The author says commercial licenses are available on request, but unfortunately he is currently unreachable (for non-technical reasons, as the news on his site says).
I found a few other libraries, including puff.c from zlib, and we're still investigating, but I thought I'd ask for your experience:
Which compression algorithm and/or library do you recommend for embedded purposes, given that the decompression device has really limited resources and source code and a commercial license are required?
You might want to check out one of these which are not GPL and are fairly compact implementations:
fastlz - MIT license, fairly simple code
lzjb - Sun CDDL, used in ZFS for compression, simple and very short
liblzf - BSD-style license, small, fast
lzfx - BSD-style, based on liblzf, small, fast
Those algorithms are all based on the original Lempel–Ziv (LZ77) dictionary-compression scheme (they all have "LZ" in common):
https://en.wikipedia.org/wiki/LZ77_and_LZ78
I have used LZSS, with code from Haruhiko Okumura as the base. It uses the last portion of the uncompressed data (2K) as the dictionary. The code can be modified to not require a temporary ring buffer if you have no memory to spare. The licensing is not clear from his site, but some versions were released with a "Use, distribute, and modify this program freely" line included, and the code is used by commercial vendors.
Here is an implementation based on the same code that forms part of the Allegro game library; Allegro's licensing is giftware or zlib.
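To show the scale of such a decoder, here is a minimal LZSS decompressor in the spirit of Okumura's code. The token layout (one flag bit per token, 11-bit window position, 5-bit length) is an illustrative assumption, not his exact format, and the input is assumed well-formed:

```c
#include <stdint.h>
#include <stddef.h>

#define WIN_SIZE  2048  /* sliding window, matching the 2K dictionary above */
#define MIN_MATCH 3     /* shortest match stored as a reference */

size_t lzss_decode(const uint8_t *in, size_t in_len,
                   uint8_t *out, size_t out_cap)
{
    static uint8_t window[WIN_SIZE];   /* ring buffer of recent output */
    size_t ip = 0, op = 0;
    unsigned wpos = 0, flags = 0, nbits = 0;

    while (ip < in_len && op < out_cap) {
        if (nbits == 0) { flags = in[ip++]; nbits = 8; }
        if (flags & 1) {                        /* literal byte */
            uint8_t c = in[ip++];
            out[op++] = window[wpos] = c;
            wpos = (wpos + 1) % WIN_SIZE;
        } else {                                /* (position, length) pair */
            unsigned b0 = in[ip++], b1 = in[ip++];
            unsigned pos = b0 | ((b1 & 0xE0u) << 3);   /* 11-bit position */
            unsigned len = (b1 & 0x1Fu) + MIN_MATCH;   /* 5-bit length   */
            while (len-- && op < out_cap) {
                uint8_t c = window[pos++ % WIN_SIZE];
                out[op++] = window[wpos] = c;
                wpos = (wpos + 1) % WIN_SIZE;
            }
        }
        flags >>= 1; nbits--;
    }
    return op;                                  /* bytes produced */
}
```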
Another option could be the lzfx library, which implements LZF. I have not used it yet, but it seems nice. It also uses previous results, so it has low memory requirements, and it is released under a BSD licence.
One alternative could be the LZ77 coder/decoder in the Basic Compression Library.
Since it uses the unpacked data history for its dictionary, it uses no extra RAM except for the compressed and uncompressed data buffers. It should be ideal for your use case (zlib license, portable C). The entire decoder is just 70 lines of code (including comments), and really fast.
EDIT: Yet another alternative is the liblzg library, which is a refined version of the aforementioned LZ77 coder/decoder. It compresses better, is generally faster, and requires no memory for decompression. It is very, very free (zlib license).
I would recommend ZLIB.
From the wiki:
The library provides facilities for control of processor and memory use
There are also facilities for conserving memory. These are probably only useful in restricted memory environments such as some embedded systems.
zlib is also used in many embedded devices because the code is portable, liberally-licensed
and has a relatively small memory footprint.
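On the memory-conservation point, the main knob is windowBits: if the PC side compresses with deflateInit2 and a reduced window, the device can inflate with a matching small window. A sketch (windowBits of 9 gives a 512-byte history, near zlib's minimum; whole-block rather than streaming use is assumed):

```c
#include <zlib.h>
#include <string.h>

/* Inflate a block compressed with deflateInit2(..., windowBits = 9, ...).
 * Returns the number of output bytes, or -1 on error. */
int inflate_small(const unsigned char *in, size_t in_len,
                  unsigned char *out, size_t out_cap)
{
    z_stream s;
    memset(&s, 0, sizeof s);
    if (inflateInit2(&s, 9) != Z_OK)   /* window must match the compressor's */
        return -1;
    s.next_in   = (unsigned char *)in;
    s.avail_in  = (uInt)in_len;
    s.next_out  = out;
    s.avail_out = (uInt)out_cap;
    int ret = inflate(&s, Z_FINISH);
    inflateEnd(&s);
    return ret == Z_STREAM_END ? (int)s.total_out : -1;
}
```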
A lot depends on the nature of the data. If it is simple enough, you may not need anything very fancy. For example if the downloaded data was a simple image (for example something like a line graph), a simple run length encoding could cut the data down by a factor of ten and you would need trivial amounts of code and RAM to decode it.
Of course if the data is more complex, then this won't be of much use. But I'd start by exploring the data being sent and see if there are specific aspects which would allow you to compress it more effectively than using a general purpose algorithm.
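As a sense of how trivial the decoder side of such a scheme can be: a byte-oriented RLE decoder is only a handful of lines. The (count, value) pair framing here is just an illustrative convention:

```c
#include <stdint.h>
#include <stddef.h>

/* Decode (count, value) pairs; a zero count is treated as 256 so one
 * pair can cover longer runs -- purely an illustrative choice. */
size_t rle_decode(const uint8_t *in, size_t in_len,
                  uint8_t *out, size_t out_cap)
{
    size_t op = 0;
    for (size_t ip = 0; ip + 1 < in_len; ip += 2) {
        size_t count = in[ip] ? in[ip] : 256;
        while (count-- && op < out_cap)
            out[op++] = in[ip + 1];
    }
    return op;
}
```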
You might want to check out Jørgen Ibsen's aPLib - a couple of excerpts from the product page:
The compression ratios achieved by aPLib combined with the speed and tiny footprint of the depackers (as low as 169 bytes!) makes it the ideal choice for many products.
aPLib is free to use even for commercial use, please check the included license for details.
The compression library is closed-source (yes, I know that could be a problem), but there are precompiled libraries for a variety of compilers and operating systems, including both 32- and 64-bit editions. There is C and x86 assembly source code for the decompressor.
EDIT:
Jørgen also has a free (zlib license) library called BriefLZ that you could check out if not having the compressor source is a big problem.
I've seen people use 7zip on an embedded system with memory in the tens of megabytes.
There is a specific custom version of zlib for microcontrollers, based on the ARM Cortex-M (M0, M0+, M1, M3, M4):
https://github.com/kuym/Zlib-ARM-Cortex-M