Should I care about bit-endianness when making cross-platform data storing code?

When I save some binary data on disk (or memory), I should care about byte-endianness if the data must be portable across multiple platforms. But how about bit-endianness? Is it fine to ignore?
I think it is fine to ignore, but there may be pitfalls, so I would like to hear other opinions.

Bits are always arranged the same way within a byte, though there are (or were) some exotic architectures where a byte was not 8 bits. In modern computing you can safely assume an 8-bit byte.
What varies (as you correctly noted) is the byte order - that is the part you do need to take care of.
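To make that concrete, here is a minimal sketch (the helper name store_u32_le is just an example, not something from the question) of writing a 32-bit value byte by byte in a fixed order, so the stored bytes come out the same regardless of the host CPU's endianness:

```c
#include <stdint.h>

/* Store a 32-bit value in a fixed little-endian byte order,
   independent of the endianness of the machine running this code. */
static void store_u32_le(uint8_t out[4], uint32_t v)
{
    out[0] = (uint8_t)(v);          /* least significant byte first */
    out[1] = (uint8_t)(v >> 8);
    out[2] = (uint8_t)(v >> 16);
    out[3] = (uint8_t)(v >> 24);
}
```

Note that bit order inside each byte never enters the picture: the shifts and masks behave identically on every platform.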

Big endian vs. little endian mattered more when PowerPC was prevalent (due to its use by Macs). Now that the major OSes (Windows, OS X, iOS, Android) and hardware platforms (x86, x86-64, ARM) are all little-endian in practice, it is not much of a concern.

Related

Skills needed in 8-bit, 16-bit, 32-bit

Are there any specific skillsets required with 8-bit, 16-bit and 32-bit processing for embedded developers?
Yes, there are specific skills expected and real differences between 8-bit and 32-bit processors. (Ignoring 16-bit, since there are so few of them available.)
8-bit processors and their tools are vastly different from the 32-bit variants (even excluding Linux-based systems), in particular regarding:
Processor architecture
Memory availability
Peripheral complexity
An 8051 is a strange beast, and plopping your average CS graduate in front of one and asking them to make a product is asking for something that only mostly works. Its multiple memory spaces, lack of a proper stack, constrained register file, and constrained memory really make "modern" computer science difficult.
Even an AVR, which is less of a strange beast, still has constraints that a 32-bit processor just doesn't have, particularly around memory.
And all of these are very different from writing code on an embedded Linux platform.
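As a hedged illustration of those memory-space constraints (this sketch assumes the SDCC compiler for the 8051 family; the variable names are made up), code for such parts has to say explicitly where every object lives - something flat 32-bit targets never ask for:

```c
#include <stdint.h>

/* SDCC (8051) memory-space qualifiers: each object is pinned to one of
   the several distinct address spaces of the 8051 architecture. */
__code  const char banner[] = "boot v1"; /* code/ROM space                    */
__data  uint8_t    flags;                /* fast, directly addressed RAM      */
__idata uint8_t    scratch[32];          /* indirectly addressed internal RAM */
__xdata uint8_t    rxbuf[256];           /* slower external data space        */

void note_byte(uint8_t b)
{
    static __xdata uint16_t count;       /* larger objects go off-chip        */
    rxbuf[count++ & 0xFF] = b;           /* wrap within the 256-byte buffer   */
    flags |= 1;
}
```

On avr-gcc the equivalent friction shows up as PROGMEM and pgm_read_byte() for anything that must stay in flash.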
In general, processors and microcontrollers using a 32-bit architecture tend to be more complex and used in more complex applications. As such, someone with only 8-bit device experience may not possess the skills or experience necessary for more complex projects.
So it is not specifically the bit width that is the issue; it is used simply as a shorthand or proxy for the complexity of systems. It is a very crude measure in any event, since architectures differ widely even within a bit-width classification; AVR, PIC and x51, for example, are very different, as are 68K, ARM and x86. Even within the ARM family, a Cortex-M device is very different from an A-class device.
Beware of any job spec that uses such broad skill classifications - something for you to challenge perhaps in the interview.

How do embedded board developers decide which endianness to support?

The ARM architecture supports both little endian and big endian. However, when manufacturers design microcontrollers, i.e. when they take an ARM core and add peripherals to it, they support either big endian or little endian. So my question is: how do manufacturers such as ST (STM32) or TI decide whether they want to support little or big endian, given that, as far as I understand, the ARM core itself supports both?
It's completely subjective.
The terms Big Endian and Little Endian are taken from the book Gulliver's Travels, where two nations fight a fierce, bloody war over whether one should open a boiled egg at the "big" end or the "little" end. That is, they were fighting over something completely pointless.
In the computer world in the 1970s-80s, the Big Endian camp consisted prominently of Motorola and IBM, and the Little Endian camp consisted prominently of Intel. All other manufacturers had to pick either side.
So mostly it is picked by tradition.
Regarding ARM specifically, all ARM Cortex are in practice Little Endian. Even Freescale, former Motorola, picked Little Endian for their Kinetis family. There are however various other 32 bit architectures that use Big Endian, including I believe some pre-Cortex ARMs.
Importantly, "network endianess" is almost always Big Endian, also out of tradition. But this has an actual objective and practical reason though, namely CRC calculations. In order to create a CRC calculator in pure digital logic with XOR gates, the data must be transmitted MS byte first. It is nowadays rare to implement CRC using digital gates, but that's the historical reason.
The software ecosystem for ARM Cortex MCUs is more-or-less exclusively little-endian. It would be unlikely that anyone would choose to use big-endian exclusively, or even mixed-endian, without a very good application-specific reason. So the choice is normally simple, and I doubt the designers think about it at all.
Unlike other ARM implementations, Cortex-M MCUs do not support "on-the-fly" changing of the endianness, and the choice of endianness is fixed by the silicon vendors. All popular (and maybe even non-popular ones) Cortex-M MCUs implement little endian, so that's the practical answer.
Little endian makes it easier to map a stream of bytes/ASCII characters into memory, so maybe that's one reason why it is more popular.
The choice of endianness is pretty immaterial. If you are writing standalone code in a high level language that does not exchange data with anyone else, then it really does not matter which endianness you choose. Even if you have to exchange data, then usually you would encapsulate the data access in some low level functions so by and large the endianness is still not too impactful.
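To illustrate that last point, here is a minimal sketch of what such a low-level accessor might look like (the name load_u32_be is made up for the example): all knowledge of the external data's byte order lives in one place, so the rest of the program never depends on the host's endianness.

```c
#include <stdint.h>

/* Read a 32-bit big-endian field from a buffer, regardless of the
   endianness of the CPU executing this code. */
static uint32_t load_u32_be(const uint8_t in[4])
{
    return ((uint32_t)in[0] << 24) |
           ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8)  |
            (uint32_t)in[3];
}
```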

Optimal way to move memory in x86 and ARM?

I am interested in knowing the best approach for bulk memory copies on an x86 architecture. I realize this depends on machine-specific characteristics. The main target is typical desktop machines made in the last 4-5 years.
I know that in the old days REP MOVSD was nominally the fastest approach because it moves 4 bytes at a time, but I have read that nowadays REP MOVSB is just as fast and simpler to write, so you may as well do a byte move and just forget about the complexities of a 4-byte move.
A surrounding question is whether the MOVSxx instructions are worth it at all. If the CPU can run so much faster than the memory bus, then maybe it is pointless to use a CISC move and you may as well use plain MOVs. This would be most attractive because then I could use the same algorithms on other processor architectures like ARM. This brings up the analogous question of whether ARM's specialized instructions for bulk memory moves (which are totally different from Intel's) are worth it or not.
Note: I have read section 3.7.6 in the Intel Optimization Reference Manual so I am familiar with the basics. I am hoping someone can relate practical experience in the area beyond what is in this manual.
Modern Intel and AMD processors have optimisations on REP MOVSB that let it copy entire cache lines at a time when it can, making it the best (maybe not the absolute fastest, but pretty close) method of copying bulk data.
As for ARM, it depends on the architecture version, but in general using an unrolled loop would be the most efficient.
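For the x86 side, a minimal sketch (GCC/Clang inline assembly; the helper name is made up) of the REP MOVSB approach described above might look like this. Note that on recent CPUs a plain memcpy() call usually ends up in an equally well-tuned path anyway:

```c
#include <stddef.h>

/* Copy n bytes with REP MOVSB. On CPUs with "Enhanced REP MOVSB" (ERMSB)
   the microcode moves whole cache lines at a time. x86/x86-64, GCC/Clang. */
static void copy_rep_movsb(void *dst, const void *src, size_t n)
{
    __asm__ volatile("rep movsb"
                     : "+D"(dst), "+S"(src), "+c"(n)   /* RDI, RSI, RCX  */
                     :                                  /* no pure inputs */
                     : "memory");                       /* touches memory */
}
```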

GLib for embedded Linux?

I am wondering:
How large is GLib? Can it be used directly on embedded system? Is it usually too large for embedded system?
Is there an embedded-system version of GLib?
Thanks
"Embedded" barely means anything. The systems range from tiny (few registers of volatile memory, read-only code memory) to nearly as big as computers have ever been on this planet.
Glib isn't huge as desktop libraries go, but it's not a tiny microcontroller library either. There's obviously a vague line somewhere in that range below which it just won't fit, but without knowing the system there's no way to tell.
Based on a comment it seems your environment ranks as "ginormous" on the embedded scale, so GLib will probably fit if the system underneath makes porting it worthwhile. If you have a GPOS (like Linux or BSD), there might already be a port.
If you can fit GLibC then you can fit GLib.
GLib can be used for embedded systems and is quite small. However, newer versions of GLib require a lot more libraries to build. If what you need is a subset like async queues, threads, queues, lists, etc., you might consider getting the code and building GLib 1.2.x instead of the latest. I have been considering building a subset of GLib for embedded work that removes some of the newer features.
How much storage space do you have on your embedded device? There is no problem compiling GLib for an embedded device, given that you have enough storage on your device. Since we don't know anything about your device, it's kind of hard to say, though.

Compact decompression library for embedded use

We're currently creating a device for a customer that will get a block of data (say, 5-10 KB) from a PC application. This is a bit simplified, so assume that the data must be passed and uncompressed a lot, not just once a year. The communication channel is really, really slow, so we'd like to compress the data beforehand, pass it to the device and let it uncompress the data to its internal flash. The device itself, however, runs on a microcontroller that is not really fast and does not have a lot of memory. It has enough flash memory to store the result, and can uncompress the data block as it is received, but it may not have enough RAM to store the entire compressed or uncompressed (or even both!) data blocks. And of course, it doesn't have an operating system or other luxury.
This means we need a sufficiently fast decompression algorithm that does not use a lot of memory. The compression can be slow and ugly, since we're doing it on the PC side; C or .NET code is preferred for it, though, to make things easier. The decompression code should be in C, since it's unlikely that someone has an ASM-optimized version for our controller.
We found LZO, which would be almost perfect for us, but it has a so-called "free" license (GPL) by default, which makes it totally unusable for our customer. The author says that commercial licenses are available on request, but unfortunately he is currently unreachable (for non-technical reasons, as the news on his site says).
I found a few other libraries, including puff.c from zlib, and we're still investigating, but I thought I'd ask for your experience:
Which compression algorithm and/or library do you recommend for embedded purposes, given that the decompression device has really limited resources and source code and a commercial license are required?
You might want to check out one of these which are not GPL and are fairly compact implementations:
fastlz - MIT license, fairly simple code
lzjb - Sun CDDL, used in ZFS for compression, simple and very short
liblzf - BSD-style license, small, fast
lzfx - BSD-style, based on liblzf, small, fast
Those algorithms are all based on the original Lempel–Ziv (LZ77) technique - that is the "LZ" they all have in common.
https://en.wikipedia.org/wiki/LZ77_and_LZ78
I have used LZSS. I used code from Haruhiko Okumura as a base. It uses the last portion of the uncompressed data (2 KB) as its dictionary. The code can be modified to not require a temporary ring buffer if you have no memory to spare. The licensing is not clear from his site, but some versions were released with a "Use, distribute, and modify this program freely" notice, and the code is used by commercial vendors.
Here is an implementation based on the same code that forms part of the Allegro game library. Allegro licensing is giftware or zlib.
Another option could be the lzfx library, which implements LZF. I have not used it yet but it seems nice. It also uses previously decompressed output as its dictionary, so it has low memory requirements, and it is released under a BSD licence.
One alternative could be the LZ77 coder/decoder in the Basic Compression Library.
Since it uses the unpacked data history for its dictionary, it uses no extra RAM except for the compressed and uncompressed data buffers. It should be ideal for your use case (zlib license, portable C). The entire decoder is just 70 lines of code (including comments), and really fast.
EDIT: Yet another alternative is the liblzg library, which is a refined version of the aforementioned LZ77 coder/decoder. It compresses better, is generally faster, and requires no memory for decompression. It is very, very free (zlib license).
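The "history as dictionary" idea those last few answers rely on is easy to see in code. The sketch below uses a deliberately toy format invented for illustration (one flag byte, then either a literal or an offset/length pair, well-formed input assumed) - it is not the actual encoding used by BCL, liblzg, LZSS or liblzf:

```c
#include <stddef.h>
#include <stdint.h>

/* Toy LZ77-style decoder: back-references copy bytes that were already
   written to the output buffer, so the only RAM needed is the output
   buffer itself - no separate dictionary or ring buffer. */
static size_t toy_lz_decompress(const uint8_t *in, size_t in_len,
                                uint8_t *out, size_t out_cap)
{
    size_t ip = 0, op = 0;
    while (ip + 1 < in_len && op < out_cap) {
        uint8_t flag = in[ip++];
        if (flag == 0) {                        /* literal byte follows   */
            out[op++] = in[ip++];
        } else {                                /* back-reference         */
            size_t len    = flag;               /* bytes to copy          */
            size_t offset = in[ip++];           /* distance back (>= 1)   */
            while (len-- && op < out_cap) {
                out[op] = out[op - offset];     /* read own history       */
                op++;
            }
        }
    }
    return op;                                  /* bytes produced         */
}
```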
I would recommend ZLIB.
From the wiki:
The library provides facilities for control of processor and memory use.
There are also facilities for conserving memory. These are probably only useful in restricted memory environments such as some embedded systems.
zlib is also used in many embedded devices because the code is portable, liberally-licensed and has a relatively small memory footprint.
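Those memory-conserving facilities are real zlib API knobs (the windowBits and memLevel parameters of deflateInit2). A hedged sketch of the PC-side setup - the function name and the specific values are just examples - could look like this; the embedded side would then call inflateInit2() with the same reduced windowBits so its decompression window stays small:

```c
#include <string.h>
#include <zlib.h>

/* Compress a buffer with a deliberately small LZ77 window (windowBits = 9,
   i.e. 512 bytes) and minimal internal state (memLevel = 1), so the
   matching inflate() on the embedded side needs only a tiny window. */
int compress_small_window(const unsigned char *in, unsigned long in_len,
                          unsigned char *out, unsigned long *out_len)
{
    z_stream s;
    memset(&s, 0, sizeof s);

    if (deflateInit2(&s, Z_BEST_COMPRESSION, Z_DEFLATED,
                     9 /* windowBits */, 1 /* memLevel */,
                     Z_DEFAULT_STRATEGY) != Z_OK)
        return -1;

    s.next_in   = (unsigned char *)in;
    s.avail_in  = (uInt)in_len;
    s.next_out  = out;
    s.avail_out = (uInt)*out_len;

    int rc = deflate(&s, Z_FINISH);   /* one-shot compression */
    *out_len = s.total_out;
    deflateEnd(&s);
    return rc == Z_STREAM_END ? 0 : -1;
}
```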
A lot depends on the nature of the data. If it is simple enough, you may not need anything very fancy. For example, if the downloaded data were a simple image (say, something like a line graph), a simple run-length encoding could cut the data down by a factor of ten, and you would need trivial amounts of code and RAM to decode it.
Of course if the data is more complex, then this won't be of much use. But I'd start by exploring the data being sent and see if there are specific aspects which would allow you to compress it more effectively than using a general purpose algorithm.
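To give a feel for how trivial such a decoder can be, here is a sketch using a made-up (count, value) pair encoding - real RLE formats such as PackBits differ in detail, but the code and RAM cost is similarly tiny:

```c
#include <stddef.h>
#include <stdint.h>

/* Decode (count, value) byte pairs: emit `value` exactly `count` times.
   No working memory beyond the output buffer is needed. */
static size_t rle_decode(const uint8_t *in, size_t in_len,
                         uint8_t *out, size_t out_cap)
{
    size_t op = 0;
    for (size_t ip = 0; ip + 1 < in_len; ip += 2) {
        uint8_t count = in[ip];
        uint8_t value = in[ip + 1];
        while (count-- && op < out_cap)
            out[op++] = value;
    }
    return op;   /* bytes produced */
}
```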
You might want to check out Jørgen Ibsen's aPLib - a couple of excerpts from the product page:
The compression ratios achieved by aPLib combined with the speed and tiny footprint of the depackers (as low as 169 bytes!) makes it the ideal choice for many products.
aPLib is free to use even for commercial use, please check the included license for details.
The compression library is closed-source (yes, I know this could be a problem), but has precompiled libraries for a variety of compilers and operating systems, including both 32- and 64-bit editions. There's C and x86 assembly source code for the decompressor.
EDIT:
Jørgen also has a free (zlib license) BriefLZ library you could check out if not having compressor source is a big problem.
I've seen people use 7zip on an embedded system with memory in the tens of megabytes.
There is a specific custom version of zlib for microcontrollers based on ARM Cortex-M (M0, M0+, M1, M3, M4):
https://github.com/kuym/Zlib-ARM-Cortex-M