ZyXEL ADPCM codec - modem

I have a ZyXEL USB Omni56K Duo modem and want to send and receive voice streams on it. To reach adequate quality I probably need to implement some "ZyXEL ADPCM" encoding, because plain PCM provides too low a sampling rate to transmit even medium-quality voice, and it doesn't work through USB either (probably because even that bitrate is too high for the USB-serial converter inside).
This mysterious codec appears in Microsoft's WAV-related libraries as one of many theoretically supported codecs, but I have found no implementations.
Can someone offer an implementation in any language or maybe some documentation? Writing a custom mu-law decoding algorithm won't be a problem for me.
Thanks.

I'm not sure how ZyXEL ADPCM varies from other flavors of ADPCM, but various ADPCM implementations can be found with some google searches.
However, the real reason for my post is to ask why you chose ADPCM. ADPCM is adaptive differential pulse-code modulation. This means that the data being passed is the difference between samples, not the current value (which is also why you see such great compression). In a clean environment with no bit loss (i.e., a disk drive), this is fine. However, in a streaming environment, it's generally assumed that bits may be periodically mangled. Any bit damage to the data and you'll be hearing static or other audio artifacts very quickly and, usually, fairly badly.
ADPCM's reset mechanism isn't frame-based, which means the audio problems can go on for an extended period of time depending on the encoder. The reset code is usually a run of zeros (16 comes to mind, but it's been years since I wrote my own ports).
ADPCM in the telephony environment usually converts a 12-bit PCM sample to a 4-bit ADPCM sample, which is a respectable 3:1 reduction. As for audio quality: not bad for phone conversations and the spoken word, but most people, in a blind test, can easily detect the quality drop.
In your last sentence, you throw a curve ball into the question: you start mentioning mu-law. Mu-law is not ADPCM but companded PCM: it takes a 12-bit linear sample and transforms it using a logarithmic scale into an 8-bit sample. This is the typical compression mechanism for TDM (phone) networks in North America (most of the rest of the world uses a similar algorithm called A-law).
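For reference, since you said writing a mu-law codec yourself wouldn't be a problem, here is a minimal mu-law encoder sketch in C following the classic public-domain G.711-style approach (bias, segment search, complement); treat it as an illustration, not a drop-in implementation:

    #include <stdint.h>

    #define MULAW_BIAS 0x84    /* 132, shifts the segment boundaries */
    #define MULAW_CLIP 32635

    /* Map a 16-bit linear PCM sample to an 8-bit mu-law code. */
    static uint8_t linear_to_mulaw(int16_t pcm)
    {
        int s = pcm;
        int sign = 0;
        if (s < 0) { s = -s; sign = 0x80; }
        if (s > MULAW_CLIP) s = MULAW_CLIP;
        s += MULAW_BIAS;

        /* Segment = position of the highest set bit above bit 7. */
        int exponent = 7;
        for (int mask = 0x4000; (s & mask) == 0 && exponent > 0; mask >>= 1)
            exponent--;

        int mantissa = (s >> (exponent + 3)) & 0x0F;

        /* G.711 transmits the code bit-inverted. */
        return (uint8_t)~(sign | (exponent << 4) | mantissa);
    }

As a sanity check, silence (sample 0) encodes to 0xFF, which is the canonical mu-law idle code.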
So, I'm confused what you are actually trying to find.
You also mentioned Microsoft and WAV implementations. You probably know this, but just in case: WAV is just a wrapper around the audio data that provides format, sampling information, channel count, size, and other useful information. Without WAV, AU, or other wrappers involved, mu-law and ADPCM are usually presented as raw data.
One other tip if you are implementing ADPCM. As I indicated, it uses 4 bits to represent a 12-bit sample. Both sides get away with this by sharing a multiplier (step size) table. Your position in the table changes based on the 4-bit value; in other words, the value is both multiplied against a step size and used to figure out the new step size. I've seen a variety of algorithms use slightly different tables (no idea why, but you typically see the sent and received signals slowly stray off the bias). One of the older, popular sound packages was different from what I typically saw from the telephony hardware vendors.
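To make the table mechanics concrete, here is a sketch of the per-nibble decode step as IMA/DVI ADPCM defines it - one common, well-documented flavor; whether ZyXEL's tables match it is exactly the open question:

    #include <stdint.h>

    /* IMA ADPCM decode state: a predicted sample plus an index into
     * the step-size table. Each 4-bit code adjusts both. */
    static const int step_table[89] = {
        7, 8, 9, 10, 11, 12, 13, 14, 16, 17, 19, 21, 23, 25, 28, 31,
        34, 37, 41, 45, 50, 55, 60, 66, 73, 80, 88, 97, 107, 118, 130,
        143, 157, 173, 190, 209, 230, 253, 279, 307, 337, 371, 408,
        449, 494, 544, 598, 658, 724, 796, 876, 963, 1060, 1166, 1282,
        1411, 1552, 1707, 1878, 2066, 2272, 2499, 2749, 3024, 3327,
        3660, 4026, 4428, 4871, 5358, 5894, 6484, 7132, 7845, 8630,
        9493, 10442, 11487, 12635, 13899, 15289, 16818, 18500, 20350,
        22385, 24623, 27086, 29794, 32767
    };
    static const int index_table[16] = {
        -1, -1, -1, -1, 2, 4, 6, 8, -1, -1, -1, -1, 2, 4, 6, 8
    };

    struct ima_state { int predictor; int index; };

    /* Decode one 4-bit ADPCM code into a 16-bit PCM sample. */
    static int16_t ima_decode_nibble(struct ima_state *s, uint8_t nibble)
    {
        int step = step_table[s->index];

        /* Reconstruct the difference: the three magnitude bits scale
         * the step size, the top bit is the sign. */
        int diff = step >> 3;
        if (nibble & 1) diff += step >> 2;
        if (nibble & 2) diff += step >> 1;
        if (nibble & 4) diff += step;
        if (nibble & 8) s->predictor -= diff;
        else            s->predictor += diff;

        /* Clamp the predictor, then walk the step-size table. */
        if (s->predictor > 32767)  s->predictor = 32767;
        if (s->predictor < -32768) s->predictor = -32768;
        s->index += index_table[nibble];
        if (s->index < 0)  s->index = 0;
        if (s->index > 88) s->index = 88;

        return (int16_t)s->predictor;
    }

Note how the nibble is "both multiplied against a step size and used to figure out the new step size": large codes push the index (and thus the step) up, small codes pull it back down.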
And, for more useless trivia, there are multiple flavors of ADPCM. The variants differ in the table, source sample size, and destination sample size, but I've never had a need to work with them; they're just documented flavors I came across when searching for specifications for the various audio formats used in telephony.

Piping your PCM through ffmpeg -f u16le -i - -f wav -acodec adpcm_ms - will likely work (you may need -ar and -ac to match your sample rate and channel count).
http://ffmpeg.org/

Related

What video encoding format supports variable bit rate streaming (quality) and a SEEK function to skip segments?

I'm looking for a multi-platform video format that allows a variable bit rate for streaming over high- or low-quality links, and also a seek function that accepts a byte-range offset (as in HTTP range requests) to fetch a missing or lower-than-desired-quality segment.
I think it is worth separating encoding from streaming to give some background.
Most streaming protocols will stream video contained in 'containers' e.g. MP4, WebM etc. The containers will include video tracks which can be encoded with different encoders, e.g. H.264, H.265, VP9 etc.
The term variable bit rate usually describes how the encoding is done: the encoder may compress the video to a fixed bit rate with varying quality, or try to maintain a given quality level with a varying bit rate.
I suspect what you may be more interested in is what is called Adaptive Bit Rate streaming - here the video is 'transcoded' into multiple copies, each with a different bit rate. The copies are all segmented at the same points, for example every two seconds.
A client can choose which bit rate to request for the next segment of the video, i.e. the next 2 second chunk, depending on its capabilities and on the network conditions at that time. Hence, the bit rate actually being streamed to the device can vary over time. See this answer for how you can view this in action in live examples: https://stackoverflow.com/a/42365034/334402
Assuming this meets your needs, then the two dominant ABR streaming formats at the moment are HLS and MPEG-DASH.
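To make that concrete, in HLS the renditions are advertised in a master playlist and the client picks one per segment fetch. A minimal, hypothetical example (the bandwidths, resolutions, and paths are made up):

    #EXTM3U
    #EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=640x360
    low/index.m3u8
    #EXT-X-STREAM-INF:BANDWIDTH=2500000,RESOLUTION=1280x720
    mid/index.m3u8
    #EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080
    high/index.m3u8

Each variant playlist then lists the individual segments, all cut at the same points, so the player can switch renditions at any segment boundary.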
Traditionally HLS uses TS segments while DASH uses fragmented MP4. Both are now converging on CMAF, which means that the bulk of the media will be in a single multi-platform format, as you are looking for. There is a good overview of CMAF here at the time of writing: https://developer.apple.com/documentation/http_live_streaming/about_the_common_media_application_format_with_http_live_streaming
One caveat is that if your content is encrypted then, at the moment, different devices support different encryption modes so you may need to have separate HLS and DASH encrypted media for the medium term, until device support evolves over time.

What characteristics should a .wav file produced by a TTS engine have in order to sound high quality?

I'm trying to generate a high-quality voice-over using the Microsoft Speech API. What values should I pass to this constructor to guarantee high-quality audio?
The .wav file will later be fed to FFmpeg, so the audio will be re-encoded to a more compact form afterwards. My main goal is to keep the voice as clear as I can, but I really don't know which values guarantee the best quality as perceived by humans.
First of all, just to let you know I haven't used this Speech API, I'll give you an answer based on my audio processing work.
You can choose EncodingFormat.Pcm for Pulse Code Modulation
samplesPerSecond is the sampling frequency. Because it is voice, you can cover it with 16000 Hz for sure. If you are a real perfectionist you can go with 22050, for example. The higher the value, the larger the audio file. If file size isn't a problem you can even go with 32000 or 44100, but there won't be much noticeable difference.
bitsPerSample - go with 16 if possible
channels: 1 or 2, mono or stereo; it won't affect the quality of the sound.
averageBytesPerSecond: this would be samplesPerSecond * bytesPerSample * numberOfChannels (for example 22050 * 2 for 16-bit mono).
blockAlign: this would be bytesPerSample * numberOfChannels (for example, with 16-bit PCM mono audio, 16 bits are 2 bytes and mono is 1 channel, so blockAlign is 2 * 1).
As for the last one, the byte array doesn't say much for itself and I'm not sure what it's for; I believe the first six arguments are enough for the audio to be generated.
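To put the arithmetic above into code, here is a small sketch of how the derived fields relate (the struct and function names are mine for illustration, not the Speech API's):

    #include <stdio.h>

    /* Derived WAV/PCM format fields, per the formulas above. */
    struct pcm_format {
        int samples_per_second;  /* e.g. 22050 */
        int bits_per_sample;     /* e.g. 16 */
        int channels;            /* 1 = mono, 2 = stereo */
    };

    static int block_align(const struct pcm_format *f)
    {
        return (f->bits_per_sample / 8) * f->channels;
    }

    static int average_bytes_per_second(const struct pcm_format *f)
    {
        return f->samples_per_second * block_align(f);
    }

    int main(void)
    {
        struct pcm_format f = { 22050, 16, 1 };  /* 16-bit mono, 22.05 kHz */
        printf("blockAlign = %d\n", block_align(&f));               /* 2 */
        printf("averageBytesPerSecond = %d\n",
               average_bytes_per_second(&f));                       /* 44100 */
        return 0;
    }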
I hope this was helpful
Cheers

Data compression for FPGA bitstream

I'm looking for a good compression algorithm to use for decompressing data from a flash chip to load to an FPGA (a Xilinx Spartan6-LX9, on the Mojo development board). It must be fast to decompress and not require a lot of working memory to do so, as the CPU (an ATmega16U4) is clocked at 8 MHz and has only 2 KiB of RAM and 16 KiB of program flash, some of which is already in use. Compression speed is not particularly important, as compression will only be run once on a computer, and the compression algorithm need not work on arbitrary inputs.
Here is an example bitstream. The format is documented in the Spartan-6 FPGA Configuration manual (starting on page 92).
Generally, the patterns present in the data fall into a few categories, and I'm not sure which of these will be easiest to exploit given the constraints I'm working with:
The data is organized overall into a set of packets of a known format. Certain parts of the bitstream are somewhat "stereotyped" (e.g, it will always begin and end by writing to certain registers), and other commands will appear in predictable sequences.
Some bytes are much more common than others. 00 and FF are by far the most frequent, but other bytes with few bits set (e.g, 80, 44, 02) are also quite common.
Runs of 00 and FF bytes are very frequent. Other patterns will sometimes appear on a local scale (e.g, a 16-byte sequence will be repeated a few times), but not globally.
What would be an appropriate compression algorithm (not a library, unless you're sure it'll fit!) for this task, given the constraints?
You should consider using the LZO compression library. It has probably one of the fastest decompressors in existence, and decompression requires no extra working memory. Compression, however, needs 64 KB of memory (or 8 KB for one of the compression levels). If you only need to decompress, it might just work for you.
The LZO project even provides a special cut-down version of the library called miniLZO. According to the author, miniLZO compiles to less than 5 KB of binary on i386. Since you have 16 KB of flash, it might just fit into your constraints.
The LZO compressor is currently used by UPX (the Ultimate Packer for eXecutables).
From your description, I would recommend run-length encoding followed by Huffman coding the bytes and runs. You would need very little memory over the data itself, mainly for accumulating frequencies and building a Huffman tree in place. Less than 1K.
You should make a histogram of the lengths of the runs to help determine how many bits to allocate to the run lengths.
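As a sketch of the run-length half (using an escape scheme I made up for illustration, exploiting the fact that long runs are almost always 00 or FF; the Huffman stage would then code the literals and run lengths):

    #include <stddef.h>
    #include <stdint.h>

    /* Decode a toy RLE stream aimed at the 00/FF runs in the bitstream.
     * Format (my own, for illustration): a 0x00 or 0xFF byte is followed
     * by a run-length byte (1-255); every other byte is copied literally. */
    static size_t rle_decode(const uint8_t *in, size_t in_len,
                             uint8_t *out, size_t out_cap)
    {
        size_t o = 0;
        for (size_t i = 0; i < in_len; ) {
            uint8_t b = in[i++];
            size_t run = 1;
            if ((b == 0x00 || b == 0xFF) && i < in_len)
                run = in[i++];               /* run length follows */
            if (o + run > out_cap)
                return 0;                    /* output buffer too small */
            while (run--)
                out[o++] = b;
        }
        return o;                            /* bytes written */
    }

The decoder needs no state beyond its output position, so the 1K budget is spent almost entirely on the Huffman side.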
Have you tried the built-in bitstream compression? That can work really well on non-full devices. It's a bitgen option, and the FPGA supports it out of the box, so it has no resource impact on your micro.
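If I remember the ISE flow correctly, it's just a flag on the bitgen command line (check the documentation for your version; the exact spelling here is from memory):

    bitgen -g compress design.ncd design.bit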
The way the compression is achieved is described here:
http://www.xilinx.com/support/answers/16996.html
Other possibilities have been discussed on comp.arch.fpga:
https://groups.google.com/forum/?fromgroups#!topic/comp.arch.fpga/7UWTrS307wc
It appears that one poster implemented LZMA successfully on a relatively constrained embedded system. You could use 7zip to check what sort of compression ratio you might expect and see if it's good enough before committing to implementation of the embedded part.

Accessing files which are currently being written

If a file is being written to, say a log file that gets written every 10 milliseconds, and I try to access it at the same time, will I damage or disturb the writing process?
Specifically I'm asking about video files, like if I start a recording process (using Windows Media Encoder) and at this time I would like to monitor the file if it is a blank file (black pixels everywhere) or there is a real content being recorded.
Sorry if my question is a newbie one, but I really really need to be sure about that.
Thanks in advance.
In general you can certainly read files as they are being written, without corrupting their content (a Windows-specific sketch follows these caveats). However:
It is possible to face an issue if your recording medium cannot deal with the combined data rate of both reading and writing. This can be a problem especially with slow-ish USB flash drives.
It is possible to face an issue on hard drives too, if the combination of reading and writing exceeds the rate of random seeks that the hard drive can handle. This can happen more easily on older drives (e.g. IDE) when dealing with HD video.
The end result is that if you have a real-time writer process, such as a TV recorder, it may be forced to drop some of the data - in the case of video a few frames.
Modern systems have quite fast disk subsystems, reasonably good I/O schedulers and large enough RAM capacities to allow for extensive data caching, which makes it quite unlikely that a single writer/reader combination would saturate the disk subsystem, unless you are doing something unusual like recording several video streams at once.
Keep in mind however, that:
The disk subsystem can also be saturated by unrelated processes reading/writing other files from the same drive.
If you are encoding video, you might also lose frames if something draws enough CPU resources that the encoding process is no longer able to keep up with the real-time requirements. Depending on the video file, test-playing it might be just enough to do that - at least HD reproduction can be quite demanding. So, watch your CPU load and experiment before relying on it to record your favourite show :-)
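One Windows-specific detail: whether a reader can open the file at all depends on the share mode the writer used when creating it. A minimal sketch of the reader side ("capture.wmv" is a placeholder name; error handling trimmed):

    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        /* The writer must have opened the file with FILE_SHARE_READ
         * for this open to succeed; most recorders do. */
        HANDLE h = CreateFileA("capture.wmv", GENERIC_READ,
                               FILE_SHARE_READ | FILE_SHARE_WRITE,
                               NULL, OPEN_EXISTING, 0, NULL);
        if (h == INVALID_HANDLE_VALUE) {
            printf("open failed: %lu\n", GetLastError());
            return 1;
        }

        char buf[4096];
        DWORD got = 0;
        /* Read whatever has been flushed so far; the file keeps growing. */
        if (ReadFile(h, buf, sizeof buf, &got, NULL))
            printf("read %lu bytes\n", got);

        CloseHandle(h);
        return 0;
    }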
EDIT:
If you are among the lucky ones that have SSD drives, seeks and data rate should normally be a non-issue. That leaves the CPU - you'd be surprised how easy it is to push it to the limit.
Above all, you should experiment to find out the limits of your system for each particular application. That way you won't have any nasty surprises...

Compact decompression library for embedded use

We're currently creating a device for a customer that will get a block of data (like, say, 5-10KB) from a PC application. This is a bit simplified, so assume that the data must be passed and uncompressed a lot, not just once a year. The communication channel is really, really slow, so we'd like to compress the data beforehand, pass to the device and let it uncompress the data to its internal flash. The device itself, however, runs on a micro controller that is not really fast and does not have a lot of memory. It has enough flash memory to store the result, and can uncompress the data block as it is received, but it may not have enough RAM to store the entire compressed or uncompressed (or even both!) data blocks. And of course, it doesn't have an operating system or other luxury.
This means we need a sufficiently fast uncompression algorithm that does not use a lot of memory. The compression can be slow and ugly, since we're doing it on the PC side. C or .NET code preferred though for compression, to make things easier. The uncompression code should be in C, since it's unlikely that someone has an ASM optimized version for our controller.
We found LZO, which would be almost perfect for us, but it has a so-called "free" license (GPL) by default, which makes it totally unusable for our customer. The author says that commercial licenses are available on request, but unfortunately he's currently unreachable (for non-technical reasons, as the news on his site says).
I found a few other libraries, including the puff.c from zlib, and we're still investigating, but I thought I'd ask for your experience:
Which compression algorithm and/or library do you recommend for embedded purposes, given that the decompression device has really limited resources and source code and a commercial license are required?
You might want to check out one of these which are not GPL and are fairly compact implementations:
fastlz - MIT license, fairly simple code
lzjb - Sun CDDL, used in ZFS for compression, simple and very short
liblzf - BSD-style license, small, fast
lzfx - BSD-style, based on liblzf, small, fast
Those algorithms are all members of the Lempel–Ziv (LZ77) family, which is what the LZ in their names refers to:
https://en.wikipedia.org/wiki/LZ77_and_LZ78
I have used LZSS. I used code from Haruhiko Okumura as a base. It uses the last portion of uncompressed data (2 KB) as the dictionary. The code can be modified to not require a temporary ring buffer if you have no memory. The licensing is not clear from his site, but some versions were released with a "Use, distribute, and modify this program freely" line included, and the code is used by commercial vendors.
Here is an implementation based on the same code that forms part of the Allegro game library. Allegro licensing is giftware or zlib.
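For a feel of how small the decoder is, here is a sketch following the structure of Okumura's widely circulated LZSS code (his classic version uses a 4 KB ring buffer rather than the 2 KB variant I mentioned; error handling omitted):

    #include <stdio.h>
    #include <string.h>

    /* LZSS decoder: a flag byte announces 8 items; a set bit means a
     * literal byte, a clear bit means an (offset, length) pair that
     * copies from the ring buffer of recent output. */
    #define N         4096   /* ring buffer / dictionary size */
    #define F         18     /* maximum match length */
    #define THRESHOLD 2      /* encode only matches longer than this */

    static unsigned char ring[N];

    static void lzss_decode(FILE *in, FILE *out)
    {
        int i, j, k, c;
        unsigned int flags = 0;
        int r = N - F;                      /* ring write position */

        memset(ring, ' ', N - F);           /* Okumura inits with spaces */
        for (;;) {
            if (((flags >>= 1) & 0x100) == 0) {
                if ((c = getc(in)) == EOF) break;
                flags = c | 0xFF00;         /* high byte counts 8 items */
            }
            if (flags & 1) {                /* literal */
                if ((c = getc(in)) == EOF) break;
                putc(c, out);
                ring[r++] = (unsigned char)c; r &= N - 1;
            } else {                        /* (offset, length) pair */
                if ((i = getc(in)) == EOF) break;
                if ((j = getc(in)) == EOF) break;
                i |= (j & 0xF0) << 4;       /* 12-bit position */
                j = (j & 0x0F) + THRESHOLD; /* 4-bit length */
                for (k = 0; k <= j; k++) {
                    c = ring[(i + k) & (N - 1)];
                    putc(c, out);
                    ring[r++] = (unsigned char)c; r &= N - 1;
                }
            }
        }
    }

If your output lands in flash anyway, the "no temporary ring buffer" modification reads the dictionary back from the already-written output instead.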
Another option could be the lzfx library, which implements LZF. I have not used it yet, but it seems nice. It also uses previously decompressed data as its dictionary, so it has low memory requirements, and it is released under a BSD licence.
One alternative could be the LZ77 coder/decoder in the Basic Compression Library.
Since it uses the unpacked data history for its dictionary, it uses no extra RAM except for the compressed and uncompressed data buffers. It should be ideal for your use case (zlib license, portable C). The entire decoder is just 70 lines of code (including comments), and really fast.
EDIT: Yet another alternative is the liblzg library, which is a refined version of the aforementioned LZ77 coder/decoder. It compresses better, is generally faster, and requires no memory for decompression. It is very, very free (zlib license).
I would recommend ZLIB.
From the wiki:
The library provides facilities for control of processor and memory use
There are also facilities for conserving memory. These are probably only useful in restricted memory environments such as some embedded systems.
zlib is also used in many embedded devices because the code is portable, liberally licensed, and has a relatively small memory footprint.
A lot depends on the nature of the data. If it is simple enough, you may not need anything very fancy. For example if the downloaded data was a simple image (for example something like a line graph), a simple run length encoding could cut the data down by a factor of ten and you would need trivial amounts of code and RAM to decode it.
Of course if the data is more complex, then this won't be of much use. But I'd start by exploring the data being sent and see if there are specific aspects which would allow you to compress it more effectively than using a general purpose algorithm.
You might want to check out Jørgen Ibsen's aPLib - a couple of excerpts from the product page:
The compression ratios achieved by aPLib combined with the speed and tiny footprint of the depackers (as low as 169 bytes!) makes it the ideal choice for many products.
aPLib is free to use even for commercial use, please check the included license for details.
The compression library is closed-source (yes, I know this could be a problem), but has precompiled libraries for a variety of compilers and operating systems, including both 32- and 64-bit editions. There's C and x86 assembly source code for the decompressor.
EDIT:
Jørgen also has a free (zlib license) BriefLZ library you could check out if not having compressor source is a big problem.
I've seen people use 7zip on an embedded system with memory in the tens of megabytes.
There is a custom version of zlib for microcontrollers based on ARM Cortex-M (M0, M0+, M1, M3, M4):
https://github.com/kuym/Zlib-ARM-Cortex-M