Creating a WAV file with an arbitrary bits per sample value? - vb.net

Do WAV files allow any arbitrary number of bitsPerSample?
I have failed to get it to work with anything less than 8. I am not sure how to define the blockAlign for one thing.
Dim ss As New Speech.Synthesis.SpeechSynthesizer
Dim info As New Speech.AudioFormat.SpeechAudioFormatInfo(AudioFormat.EncodingFormat.Pcm, 5000, 4, 1, 2500, 1, Nothing) ' FAILS
ss.SetOutputToWaveFile("TEST4bit.wav", info)
ss.Speak("I am 4 bit.")
My.Computer.Audio.Play("TEST4bit.wav")

AFAIK no: a 4-bit PCM format is undefined. It wouldn't make much sense to have only 16 volume levels of audio; the quality would be horrible.
While technically possible, I know of no decent software (e.g. WaveLab) that supports it, though your very own player could.
Formula: blockAlign = channels * (bitsPerSample / 8)
So for mono 4-bit it would be: blockAlign = 1 * ((double)4 / 8) = 0.5
Note that the cast to double is necessary so that integer division does not truncate the result to 0.
But if you look at the block-align definition below, an alignment of 0.5 bytes really does not make much sense: one would have to work at the bit level, which is painful and useless, because at this quality uncompressed PCM would just sound horrible:
wBlockAlign
The block alignment (in bytes) of the waveform data. Playback
software needs to process a multiple of wBlockAlign bytes of data at
a time, so the value of wBlockAlign can be used for buffer
alignment.
Reference:
http://www-mmsp.ece.mcgill.ca/Documents/AudioFormats/WAVE/Docs/riffmci.pdf page 59
Workaround:
If you really need 4-bit, switch to ADPCM format.

Related

how to drive a dotstar strip from C on a raspberry pi

I am trying to figure out how to drive a DotStar strip by calling write(handle, datap, len) on an SPI handle, from C, on a Raspberry Pi. I'm not quite clear on how to lay out the data.
Looking at https://cdn-shop.adafruit.com/datasheets/APA102.pdf#page=3 makes me think you start with 4 bytes of 0, a string of coded LED values (4 bytes per LED) and then 4 bytes of 1's. But that cannot be right; the final 4 bytes of 1's would be indistinguishable from a request to set an LED to full brightness white. So how could that terminate the data?
Insight welcome. Yes, I know there's a Python library out there for this, but I'm coding in C++ or C.
After much digging, I found the answer here:
https://cpldcpu.wordpress.com/2014/11/30/understanding-the-apa102-superled/
The end frame is more complex than the spec suggests, but the spec is correct if your string has 32 LEDs, and you must always specify values for all the LEDs in your string.
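A buffer layout along the lines of that write-up can be sketched as below. All names and sizes here are my own, and the end-frame sizing follows the cpldcpu article rather than the datasheet:

```c
#include <stdint.h>
#include <string.h>

#define NUM_LEDS 8
/* End frame: at least NUM_LEDS/2 extra clock pulses, rounded up to bytes. */
#define END_BYTES ((NUM_LEDS / 2 + 7) / 8)

static uint8_t buf[4 + 4 * NUM_LEDS + END_BYTES];

void fill_frame(const uint8_t rgb[][3], uint8_t brightness /* 0..31 */)
{
    memset(buf, 0x00, 4);                  /* start frame: 32 zero bits */
    for (int i = 0; i < NUM_LEDS; i++) {
        uint8_t *p = &buf[4 + 4 * i];
        p[0] = 0xE0 | (brightness & 0x1F); /* 111 marker + 5-bit brightness */
        p[1] = rgb[i][2];                  /* blue  */
        p[2] = rgb[i][1];                  /* green */
        p[3] = rgb[i][0];                  /* red   */
    }
    /* End frame: zero bytes work. The terminator isn't interpreted as
       data at all -- each LED delays the clock by half a cycle, so these
       extra clocks just push the last LED frame through the chain. That
       is why it can't be confused with a full-brightness white frame. */
    memset(&buf[4 + 4 * NUM_LEDS], 0x00, END_BYTES);
    /* then: write(spi_handle, buf, sizeof(buf)); */
}
```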

Static data-heavy Rust library seems bloated

I've been developing a Rust library recently to try to provide fast access to a large database (the Unicode character database, which as a flat XML file is 160MB). I also want it to have a small footprint so I've used various approaches to reduce the size. The end result is that I have a series of static slices that look like:
#[derive(Clone, Copy, Eq, PartialEq, Debug)]
pub enum UnicodeCategory {
    UppercaseLetter,
    LowercaseLetter,
    TitlecaseLetter,
    ModifierLetter,
    OtherLetter,
    NonspacingMark,
    SpacingMark,
    EnclosingMark,
    DecimalNumber,
    // ...
}

pub static UCD_CAT: &'static [((u8, u8, u8), (u8, u8, u8), UnicodeCategory)] =
    &[((0, 0, 0), (0, 0, 31), UnicodeCategory::Control),
      ((0, 0, 32), (0, 0, 32), UnicodeCategory::SpaceSeparator),
      ((0, 0, 33), (0, 0, 35), UnicodeCategory::OtherPunctuation),
      /* ... */];

// ...

pub static UCD_DECOMP_MAP: &'static [((u8, u8, u8), &'static [(u8, u8, u8)])] =
    &[((0, 0, 160), &[(0, 0, 32)]),
      ((0, 0, 168), &[(0, 0, 32), (0, 3, 8)]),
      ((0, 0, 170), &[(0, 0, 97)]),
      ((0, 0, 175), &[(0, 0, 32), (0, 3, 4)]),
      ((0, 0, 178), &[(0, 0, 50)]),
      /* ... */];
In total, all the data should only take up around 600kB max (assuming extra space for alignment etc), but the library produced is 3.3MB in release mode. The source code itself (almost all data) is 2.6MB, so I don't understand why the result would be more. I don't think the extra size is intrinsic as the size was <50kB at the beginning of the project (when I only had ~2kB of data). If it makes a difference, I'm also using the #![no_std] feature.
Is there any reason for the extra binary bloat, and is there a way to reduce the size? In theory I don't see why I shouldn't be able to reduce the library to a megabyte or less.
As per Matthieu's suggestion, I tried analysing the binary with nm.
Because all my tables were represented as borrowed slices, this wasn't very useful for calculating table sizes as they were all in anonymous _refs. What I could determine was the maximum address, 0x1208f8, which would be consistent with a filesize of ~1MB rather than 3.3MB. I also looked through the hex dump to see if there were any null blocks that might explain it, but there weren't.
To see if it was the borrowed slices that were the problem, I turned them into fixed-size arrays ([T; N] form). The filesize didn't change much, but now I could interpret the nm data quite easily. Weirdly, the tables took up exactly as much space as I expected (even more weirdly, they matched my lower bounds without accounting for alignment, and there was no space between the tables).
I also looked at the tables with nested borrowed slices, e.g. UCD_DECOMP_MAP above. When I removed all of these (about 2/3 of the data), the filesize was ~1MB when it should have only been ~250kB (by my calculations and the highest nm address, 0x3d1d0), so it doesn't look like these tables were the problem either.
I tried extracting the individual files from the .rlib (which is a simple ar-format archive). It turns out that 40% of the library is just metadata files, and the actual object file is 1.9MB. Further, when I do this to the library without the borrowed references, the object file is 261kB! I then went back to the original library and looked at the sizes of the individual _refs. For a table like UCD_DECOMP_MAP: &'static [((u8,u8,u8),&'static [(u8,u8,u8)])], each value of type ((u8,u8,u8),&'static [(u8,u8,u8)]) takes up 24 bytes: 3 bytes for the u8 triplet, 5 bytes of padding, and 16 bytes for the pointer. As a result these tables take up far more room than I had thought. I think I can now fully account for the filesize.
Of course, 3MB is still quite small, I just wanted to keep the file as small as possible!
Thanks to Matthieu M. and Chris Emerson for pointing me towards the solution. This is a summary of the updates in the question, sorry for the duplication!
It seems that there are two reasons for the supposed bloat:
The .rlib file output is not a pure object file but an ar archive. Usually such an archive would consist entirely of one or more object files, but Rust also includes metadata; part of the reason for this seems to be to obviate the need for separate header files. This accounted for around 40% of the final filesize.
My calculations turned out to not be accurate for some of the tables, which also happened to be the largest ones. Using nm I was able to find that for normal tables such as UCD_CAT: &'static [((u8,u8,u8), (u8,u8,u8), UnicodeCategory)], the size was 7 bytes for each item (which is actually less than I originally anticipated, assuming 8 bytes for alignment). The total of all these tables was about 230kB, and the object file including just these came in at 260kB (after extraction), so this was all consistent.
However, examining the nm output more closely for the other tables (such as UCD_DECOMP_MAP: &'static [((u8,u8,u8),&'static [(u8,u8,u8)])]) was more difficult because they appear as anonymous borrowed objects. Nevertheless, it turned out that each ((u8,u8,u8),&'static [(u8,u8,u8)]) actually takes up 24 bytes: 3 bytes for the first tuple, 5 bytes of padding, and an unexpected 16 bytes for the pointer. I believe this is because a slice reference is a fat pointer that stores the length of the referenced array alongside the data pointer. This added around a megabyte of bloat to the library, but it does seem to account for the entire filesize.

Structure Packing

I'm currently learning C# and my first project (as a learning experiment) is to create a DBF reader. I'm having some difficulty understanding "packing" according to this: http://www.developerfusion.com/pix/articleimages/dec05/structs1.jpg
If I specified a packing of 2, wouldn't all structure elements begin on a 2-byte boundary, and if I specified a packing of 4, wouldn't all structure elements begin on a 4-byte boundary, and also consume a minimum of 4 bytes each?
For instance, a byte element would be placed on a 4 byte boundary, and the element following it (in a sequential layout) would be located on the next 4-byte boundary (losing 3 bytes to padding)?
In the image shown, in the "pack=4" it shows a byte that is on a 2 byte boundary, following a short.
If I understand the picture correctly, pack equal to n means that a variable cannot be stored "across" two packs of length n. In other words, the bytes that compose a variable cannot cross a pack boundary. This only holds if the size of the variable is less than or equal to the size of a pack.
Let's take Pack = 4 as an example. Here, we can safely store a byte and a short in one pack, because they require 3 bytes of memory together. But since there is only one byte in the pack left, it requires one byte of padding to be able to store an int into the data structure, because what's left in the pack is too little to store the whole int.
I hope the explanation makes sense.
Looking at the picture again, I think it would be better if all data were aligned to the same side of a pack, either to bottom or top. This would make it clearer what's going on.

File (.wav) duration while writing PCM data #16KBps

I am writing some silent PCM data to a file at 16 KBps. The file is in .wav format. For this I have the following code:
#define DEFAULT_BITRATE 16000
long LibGsmManaged::addSilence()
{
    char silenceBuf[DEFAULT_BITRATE];
    if (fout) {
        for (int i = 0; i < DEFAULT_BITRATE; i++) {
            silenceBuf[i] = '\0';
        }
        fwrite(silenceBuf, sizeof(silenceBuf), 1, fout);
    }
    return ftell(fout);
}
Updated:
Here is how I write the header
void LibGsmManaged::write_wave_header()
{
    if (fout) {
        fwrite("RIFF", 4, 1, fout);
        total_length_pos = ftell(fout);
        write_int32(0);          // RIFF chunk size, filled in later
        fwrite("WAVE", 4, 1, fout);
        fwrite("fmt ", 4, 1, fout);
        write_int32(16);         // fmt chunk size
        write_int16(1);          // format: PCM
        write_int16(1);          // channels: mono
        write_int32(8000);       // sample rate
        write_int32(16000);      // byte rate
        write_int16(2);          // block align
        write_int16(16);         // bits per sample
        fwrite("data", 4, 1, fout);
        data_length_pos = ftell(fout);
        write_int32(0);          // data chunk size, filled in later
    }
    else {
        std::cout << "File pointer not correctly initialized";
    }
}

void LibGsmManaged::write_int32(int value)
{
    if (fout) {
        fwrite((const char*)&value, sizeof(value), 1, fout);
    }
    else {
        std::cout << "File pointer not correctly initialized";
    }
}
void LibGsmManaged::write_int32( int value)
{
if(fout) {
fwrite( (const char*)&value, sizeof(value), 1, fout);
}
else {
std::cout << "File pointer not correctly initialized";
}
}
I run this code on my iOS device using an NSTimer with a 1.0 s interval. So AFAIK, if I run it for 60 s, I should get a file.wav that shows 60 s as its duration when played (again AFAIK). But in actual tests it shows almost double that duration, i.e. about 2 minutes. I have also verified that when I change DEFAULT_BITRATE to 8000, the file duration is almost correct.
I am unable to identify what is going on here. Am I missing something bad here? I hope my code is not wrong.
What you're trying to do (write your own WAV files) should be totally doable. That's the good news. However, I'm a bit confused about your exact parameters and constraints, as are many others in the comments, which is why they have been trying to flesh out the details.
You want to write raw, uncompressed, silent PCM to a WAV file. Okay. How wide does the PCM data need to be? You are creating an array of chars that you are writing to the file. A char is an 8-bit byte. Is that what you want? If so, then you need to use a silent center point of 0x80 (128). 8-bit PCM in WAV files is unsigned, i.e., 0..255, and 128 is silent.
If you intend to store silent 16-bit data, that will be signed data, so the center point (between -32768 and 32767) is 0. Also, it will be stored in little endian byte format. But since it's silence (all 0s), that doesn't matter.
The title of your question indicates (and the first sentence reiterates) that you want to write data at 16 kbps. Are you sure you want raw 16 kbps audio? That's 16 kiloBITs per second, or 16000 bits per second. Depending on whether you are writing 8- or 16-bit PCM samples, that only allows for 2000 or 1000 Hz audio, which is probably not what you want. Did you mean 16 kHz audio? 16 kHz audio translates to 16000 audio samples per second, which more closely aligns with your code. Then again, your code mentions GSM (LibGsmManaged), so maybe you are looking for 16 kbps audio. But I'll assume we're proceeding along the raw PCM route.
Do you know in advance how many seconds of audio you need to write? That makes this process really easy. As you may have noticed, the WAV header needs length information in a few spots. You either write it in advance (if you know the values) or fill it in later (if you are writing an indeterminate amount).
Let's assume you are writing 2 seconds of raw, monophonic, 16000 Hz, 16-bit PCM to a WAV file. The center point is 0x0000.
WAV writing process:
Write 'RIFF'
Write 32-bit file size, which will be 36 (header size - first 8 bytes) + 64000 (see step 12 about that number)
Write 'WAVEfmt ' (with space)
Write 32-bit format header size (16)
Write 16-bit audio format (1 indicating raw PCM audio)
Write 16-bit channel count (1 because it's monophonic)
Write 32-bit sample rate (number of audio sample per second = 16000)
Write 32-bit byte rate (number of bytes per second = 32000)
Write 16-bit block alignment (2 bytes per sample * 1 channel = 2)
Write 16-bit bits per sample (16)
Write 'data'
Write 32-bit length of audio payload data (16000 samples/second * 2 bytes/sample * 2 seconds = 64000 bytes)
Write 64000 bytes, all 0 values
If you need to write a dynamic amount of audio data, leave the length field from steps 2 and 12 as 0, then seek back after you're done writing and fill those in. I'm not convinced that your original code was writing the length fields correctly. Some playback software might ignore those, others might not, so you could have gotten varying results.
Hope that helps! If you know Python, here's another question I answered which describes how to write a WAV file using Python's struct library (I referred to that code fragment a lot while writing the steps above).

Reading SWF Header with Objective-C

I am trying to read the header of an SWF file using NSData.
According to SWF format specification I need to access movie's width and height reading bits, not bytes, and I couldn't find a way to do it in Obj-C
Bytes 9 thru ?: Here is stored a RECT (bounds of movie). It must be read in binary form. First of all, we will transform the first byte to binary: "01100000"
The first 5 bits will tell us the size in bits of each stored value: "01100" = 12
So, we have 4 fields of 12 bits = 48 bits
48 bits + 5 bits (header of RECT) = 53 bits
Pad with zeroes until we reach a multiple of 8: 53 bits + 3 alignment bits = 56 bits (this RECT is 7 bytes long, 7 * 8 = 56)
I use this formula to determine all this stuff:
Where do I start?
ObjC is a superset of C: You can run C code alongside ObjC with no issues.
Thus, you could use a C-based library like libming to read bytes from your SWF file.
If you need to shuffle bytes into an NSData object, look into the +dataWithBytes:length: method.
Start by looking for code with a compatible license that already does what you want. C libraries can be used from Obj-C code simply by linking them in (or arranging for them to be dynamically linked in) and then calling their functions.
Failing that, start by looking at the Binary Data Programming Guide for Cocoa and the NSData Class Reference. You'd want to pull out the bytes that contain the bits you're interested in, then use bit-masking techniques to extract the bits you care about. You might find the BitTst(), BitSet(), and BitClr() functions and their friends useful, if they're still there in Snow Leopard; I'm not sure whether they ended up in the démodé parts of Carbon or not. There are also the POSIX setbit(), clrbit(), isset(), and isclr() macros defined in <sys/param.h>. Then, finally, there are the C bitwise operators: ^, |, &, ~, <<, and >>.
Failing that, start by looking at the Binary Data Programming Guide for Cocoa and NSData Class Reference. You'd want to pull out the bytes that contain the bits you're interested in, then use bit masking techniques to extract the bits you care about. You might find the BitTst(), BitSet(), and BitClr() functions and their friends useful, if they're still there in Snow Leopard; I'm not sure whether they ended up in the démodé parts of Carbon or not. There are also the Posix setbit(), clrbit(), isset(), and isclr() macros defined in . Then, finally, there are the C bitwise operators: ^, |, &, ~, <<, and >>.