Audio through CAN FD into headphones - embedded

I am trying to record audio using a 12 bit resolution ADC, take the sample buffer and send it through CAN FD to another device, which takes samples of this audio and creates a .wav and plays it. The problem is that I see the data of the microphone being sent through CAN FD to the other device, but I am not able to transform this data into a .wav file properly and hear what I say through the microphone. I only hear beeps.
I'm creating a new .wav every 4 CAN FD messages in order to make some kind of real time communication and decrease the delay, but I don't think this is possible or if I am thinking it the proper way.
In this thread I take the message sent by the CAN FD and concatenate it in a buffer in order to introduce it in a .wav file. I have tried bigger buffers but it doesn't change the outcome.
How could I be able to take the data from the CAN FD and hear it?
Clarification: I know using CAN FD to transmit audio isn't the proper way, but it is for a master project.
struct canfd_frame frame;
CAN_MSG msg;
int trama_can[72];
int nbytes;
while (status_libreria == 0)
;
unsigned char buffer[256];
// FILE * fPtr;
int i=0,x=0;
//fPtr = fopen("Test.txt", "w");
while (1) {
do {
nbytes = read(s, &frame, sizeof(struct canfd_frame));
} while (nbytes == 0);
msg.id.ext = frame.can_id;
msg.dlc = frame.len;
if (msg.dlc > 8)
msg.dlc = 8; //Protecci�n hasta adaptar AC3LIB a CANFD
Numas_memcpy(&(msg.data.bdata), &(frame.data), msg.dlc);
can_frame_2_ac3lib(&msg, BUS_VERTICAL);
for(x=0;x<64;x++) buffer[i*64+x] = frame.data[x];
printf("%d \r\n",frame.data[x]);
printf("i:%d \r\n",i);
// Copiar datos a fichero.wav y reproducirlo simultaneamente
if (i == 3) {
printf("Datos IN\r\n");
write_wav("prueba.wav",256 , (short int *)buffer, 16000);
//fwrite(buffer,1,sizeof(buffer),fPtr);
//fclose(fPtr);
system("aplay prueba.wav -f cd");
i = 0;
system("rm prueba.wav");
}
i++;
}
32 first bytes of the audio file being recorded
In the picture, as you can see, the data is being recorded. moreover, this data is the same data as in the ADC, but when I play it, I only hear noise.

Simplify the problem first. Make sure you can transmit known data from one end to the other first at low rates. I'm sure the suggestion below will sound far too trivial. But until you are absolutely confident you understand it all, I predict you sill have many struggles.
Slowly - one frame per second, or even slower.
Learn to send one 0x55 byte from one end to the other and verify at the receiver.
Learn to send a few 0x55 in one frame and verify.
Learn to send 0x12345678 - verify it ends up with the bytes in the right order at the other end
Learn to send a counter. Check it at the receiver, make sure you do not drop any data.
Now do it all again but 10x faster.
Continue until you can send a counter at 10x the rate you need to for the audio without dropping any frames at all, for minutes and then hours.
Stress the rest of the system to make sure it still works under stress.
Only now, can you start to learn about sending audio.
Trust me, you will learn a lot!

Related

Can't get my DAC(PT8211) to work correctly using a PIC32MX uc and SPI

I'm just trying to learn to use external ADC and DAC (PT8211) with my PIC32MX534f06h.
So far, my code is just about sampling a signal with my ADC every time a timer-interrupt is triggered, then sending then same signal out to the DAC.
The interrupt and ADC part works fine and have been tested independently, but the voltages that my DAC outputs don't make much sens to me and stay at 2,5V (it's powered at 0 - 5V).
I've tried to feed the DAC various values ranging from 0 to 65534 (16bits DAC so i guess it should be the expected range of the values to feed to it, right?) voltage stays at 2.5V.
I've tried changing the SPI configuration, using different SPIs (3 and 4) and DACs (I have one soldered to my pcb, soldered to SPI3, and one one breadboard, linked to SPI4 in case the one soldered on my board was defective).
I made sure that the chip selection line works as expected.
I couldn't see the data and clock that are transmissed since i don't have a scope yet.
I'm a bit out of ideas now.
Chip selection and SPI configuration settings
signed short adc_value;
signed short DAC_output_value;
int Empty_SPI3_buffer;
#define Chip_Select_DAC_Set() {LATDSET=_LATE_LATE0_MASK;}
#define Chip_Select_DAC_Clr() {LATDCLR=_LATE_LATE0_MASK;}
#define SPI4_CONF 0b1000010100100000 // SPI on, 16-bit master,CKE=1,CKP=0
#define SPI4_BAUD 100 // clock divider
DAC output function
//output to external DAC
void DAC_Output(signed int valueDAC) {
INTDisableInterrupts();
Chip_Select_DAC_Clr();
while(!SPI4STATbits.SPITBE); // wait for TX buffer to empty
SPI4BUF=valueDAC; // write byte to TX buffer
while(!SPI4STATbits.SPIRBF); // wait for RX buffer to fill
Empty_SPI3_buffer=SPI4BUF; // read RX buffer
Chip_Select_DAC_Set();
INTEnableInterrupts();
}
ISR sampling the data, triggered by Timer1. This works fine.
ADC_input inputs the data in the global variable adc_value (12 bits, signed)
//ISR to sample data
void __ISR( _TIMER_1_VECTOR, IPL7SRS) Test_data_sampling_in( void)
{
IFS0bits.T1IF = 0;
ADC_Input();
//rescale the signed 12 bit audio values to unsigned 16 bits wide values
DAC_output_value = adc_value + 2048; //first unsign the signed 12 bit values (between 0 - 4096, center 2048)
DAC_output_value = DAC_output_value *16; // the scale between 12 and 16 bits is actually 16=65536/4096
DAC_Output(DAC_output_value);
}
main function with SPI, IO, Timer configuration
void main() {
SPI4CON = SPI4_CONF;
SPI4BRG = SPI4_BAUD;
TRISE = 0b00100000;
TRISD = 0b000000110100;
TRISG = 0b0010000000;
LATD = 0x0;
SYSTEMConfigPerformance(80000000L); //
INTCONSET = _INTCON_MVEC_MASK; /* Set the interrupt controller for multi-vector mode */
//
T1CONbits.TON = 0; /* turn off Timer 1 */
T1CONbits.TCKPS = 0b11; /* pre-scale = 1:1 (T1CLKIN = 80MHz (?) ) */
PR1 = 1816; /* T1 period ~ ? */
TMR1 = 0; /* clear Timer 1 counter */
//
IPC1bits.T1IP = 7; /* Set Timer 1 interrupt priority to 7 */
IFS0bits.T1IF = 0; /* Reset the Timer 1 interrupt flag */
IEC0bits.T1IE = 1; /* Enable interrupts from Timer 1 */
T1CONbits.TON = 1; /* Enable Timer 1 peripheral */
INTEnableInterrupts();
while (1){
}
}
I would expect to see the voltage at the ouput of my DAC to mimic those I put at the input of my ADC, instead the DAC output value is always constant, no matter what I input to the ADC
What am i missing?
Also, when turning the SPIs on, should I still manually manage the IO configuration of the SDI SDO SCK pins using TRIS or is it automatically taken care of?
First of all I agree that the documentation I first found for PT8211 is rather poor. I found extended documentation here. Your DAC (PT8211) is actually an I2S device, not SPI. WS is not chip select, it is word select (left/right channel). In I2S, If you are setting WS to 0, that means the left channel. However it looks like in the extended datasheet I found that WS 0 is actually right channel (go figure).
The PIC you've chosen doesn't seem to have any I2S hardware so you might have to bit bash it. There is a lot of info on I2S though ,see I2S bus specification .
There are some slight differences with SPI and I2C. Notice that the first bit is when WS transitions from high to low is the LSB of the right channel. and when WS transitions from low to high, it is not the LSB of the left channel. Note that the output should be between 0.4v to 2.4v (I2S standard), not between 0 and 5V. (Max is 2.5V which is what you've been seeing).
I2S
Basically, I'd try it with the proper protocol first with a bit bashing algorithm with continuous flip flopping between a left/right channel.
First of all, thanks a lot for your comment. It helps a lot to know that i'm not looking at a SPI transmission and that explains why it's not working.
A few reflexions about it
I googled Bit bashing (banging?) and it seems to be CPU intensive, which I would definately try to avoid
I have seen a (successful) projet (in MikroC) where someone transmit data from that exact same PIC, to the same DAC, using SPI, with apparently no problems whatsoever So i guess it SHOULD work, somehow?
Maybe he's transforming the data so that it works? here is the code he's using, I'm not sure what happens with the F15 bit toggle, I was thinking that it was done to manage the LSB shift problem. Here is the piece of (working) MikroC code that i'm talking about
valueDAC = valueDAC + 32768;
valueDAC.F15 =~ valueDAC.F15;
Chip_Select_DAC = 0;
SPI3_Write(valueDAC);
Chip_Select_DAC = 1;
From my understanding, the two biggest differences between SPI and I2S is that SPI sends "bursts" of data where I2S continuously sends data. Another difference is that data sent after the word change state is the LSB of the last word.
So i was thinking that my SPI is triggered by a timer, which is always the same, so even if the data is not sent continuously, it will just make the sound wave a bit more 'aliased' and if it's triggered regularly enough (say at 44Mhz), it should not be SO different from sending I2S data at the same frequency, right?
If that is so, and I undertand correctly, the "only" problem left is to manage the LSB-next-word-MSB place problem, but i thought that the LSB is virtually negligible over 16bit values, so if I could just bitshift my value to the right and then just fix the LSB value to 0 or 1, the error would be small, and the format would be right.
Does it sounds like I have a valid 'Mc-Gyver-I2S-from-my-SPI' or am I forgetting something important?
I have tried to implement it, so far without success, but I need to check my SPI configuration since i'm not sure that it's configured correctly
Here is the code so far
SPI config
#define Chip_Select_DAC_Set() {LATDSET=_LATE_LATE0_MASK;}
#define Chip_Select_DAC_Clr() {LATDCLR=_LATE_LATE0_MASK;}
#define SPI4_CONF 0b1000010100100000
#define SPI4_BAUD 20
DAaC output function
//output audio to external DAC
void DAC_Output(signed int valueDAC) {
INTDisableInterrupts();
valueDAC = valueDAC >> 1; // put the MSB of ValueDAC 1 bit to the right (becase the MSB of what is transmitted will be seen by the DAC as the LSB of the last value, after a word select change)
//Left channel
Chip_Select_DAC_Set(); // Select left channel
SPI4BUF=valueDAC;
while(!SPI4STATbits.SPITBE); // wait for TX buffer to empty
SPI4BUF=valueDAC; // write 16-bits word to TX buffer
while(!SPI4STATbits.SPIRBF); // wait for RX buffer to fill
Empty_SPI3_buffer=SPI4BUF; // read RX buffer (don't know why we need to do this here, but we do)
//SPI3_Write(valueDAC); MikroC option
// Right channel
Chip_Select_DAC_Clr();
SPI4BUF=valueDAC;
while(!SPI4STATbits.SPITBE); // wait for TX buffer to empty
SPI4BUF=valueDAC; // write 16-bits word to TX buffer
while(!SPI4STATbits.SPIRBF); // wait for RX buffer to fill
Empty_SPI3_buffer=SPI4BUF;
INTEnableInterrupts();
}
The data I send here is signed, 16 bits range, I think you said that it's allright with this DAC, right?
Or maybe i could use framed SPI? the clock seems to be continous in this mode, but I would still have the LSB MSB shifting problem to solve.
I'm a bit lost here, so any help would be cool

Embedded System - Polling

I have about 6 sensors (GPS, IMU, etc.) that I need to constantly collect data from. For my purposes, I need a reading from each (within a small time frame) to have a complete data packet. Right now I am using interrupts, but this results in more data from certain sensors than others, and, as mentioned, I need to have the data matched up.
Would it be better to move to a polling-based system in which I could poll each sensor in a set order? This way I could have data from each sensor every 'cycle'.
I am, however, worried about the speed of polling because this system needs to operate close to real time.
Polling combined with a "master timer interrupt" could be your friend here. Let's say that your "slowest" sensor can provide data on 20ms intervals, and that the others can be read faster. That's 50 updates per second. If that's close enough to real-time (probably is close for an IMU), perhaps you proceed like this:
Set up a 20ms timer.
When the timer goes off, set a flag inside an interrupt service routine:
volatile uint8_t timerFlag = 0;
ISR(TIMER_ISR_whatever)
{
timerFlag = 1; // nothing but a semaphore for later...
}
Then, in your main loop act when timerFlag says it's time:
while(1)
{
if(timerFlag == 1)
{
<read first device>
<read second device>
<you get the idea ;) >
timerflag = 0;
}
}
In this way you can read each device and keep their readings synched up. This is a typical way to solve this problem in the embedded space. Now, if you need data faster than 20ms, then you shorten the timer, etc. The big question, as it always is in situations like this, is "how fast can you poll" vs. "how fast do you need to poll." Only experimentation and knowing the characteristics and timing of your various devices can tell you that. But what I propose is a general solution when all the timings "fit."
EDIT, A DIFFERENT APPROACH
A more interrupt-based example:
volatile uint8_t device1Read = 0;
volatile uint8_t device2Read = 0;
etc...
ISR(device 1)
{
<read device>
device1Read = 1;
}
ISR(device 2)
{
<read device>
device2Read = 1;
}
etc...
// main loop
while(1)
{
if(device1Read == 1 && device2Read == 1 && etc...)
{
//< do something with your "packet" of data>
device1Read = 0;
device2Read = 0;
etc...
}
}
In this example, all your devices can be interrupt-driven but the main-loop processing is still governed, paced, by the cadence of the slowest interrupt. The latest complete reading from each device, regardless of speed or latency, can be used. Is this pattern closer to what you had in mind?
Polling is a pretty good and easy to implement idea in case your sensors can provide data practically instantly (in comparison to your desired output frequency). It does get into a nightmare when you have data sources that need a significant (or even variable) time to provide a reading or require an asynchronous "initiate/collect" cycle. You'd have to sort your polling cycles to accommodate the "slowest" data source.
What might be a solution in case you know the average "data conversion rate" of each of your sources, is to set up a number of timers (per data source) that trigger at poll time - data conversion rate and kick in the measurement from those timer ISRs. Then have one last timer that triggers at poll timer + some safety margin that collects all the conversion results.
On the other hand, your apparent problem of "having too many measurements" from the "fast" data sources wouldn't bother me too much as long as you don't have anything reasonable to do with that wasted CPU/sensor load.
A last and easier approach, in case you have some cycles to waste, is: Simply sort the data sources from "slowest" to "fastest" and initiate a measurement in that order, then wait for results in the same order and poll.

Objective-C - Arduino communication stops

I am trying to control some stepper motors using an app I am writing in Objective-C and my Arduino board. I use popen to send a byte (the letter a) to my Arduino and count the steps taken (Xc):
-(IBAction)PlusX:(id)sender{
popen("echo a > /dev/tty.usbmodem621", "r");
Xc = Xc +1;
_lXc.stringValue = #(Xc);
[NSThread sleepForTimeInterval:0.16f];
}
My Arduino reads this in the void loop and makes a step.
if (Serial.available() > 0) {
// read the incoming byte:
incomingByte = Serial.read();
Serial.println(incomingByte);
if(incomingByte == 'a'){
MotorX->step(1, FORWARD, SINGLE);
}
}
This al works pretty much as expected. Except after byte/step/action 144 the Objective-C app keeps counting the steps correctly, however they don't appear in Arduino's serial monitor and the motor stops making steps. Is there anybody who knows why this keeps happening?
Thanks
you are opening many file (in linux everything is a file) but never closing it. Then you are creating "garbage" open file, because you can't use that file anymore, but it is there. And any OS give you a limit of maximum file open (file descriptor limit); in your case about 144.
when you "kill" your application, that descriptor get released, so you can use them again.
Also, because you don't check the descriptor is valid, you didn't debugged that after the 144 open it was giving error.
Solution (that does NOT check for error) is
pclose(popen("echo a > /dev/tty.usbmodem621", 'r'));

Arduino + OV7670 - Without FIFO - Reading Snapshot

I know that there is a lot in internet (http://forum.arduino.cc/index.php?topic=159557.0 for example) about OV7670 and I read a lot about it, but seems something is missing.
First of all I took a look into the way how can we read pixel by pixel from the camera to build the rectangular 600 X 480 image, and this was quite easy to understand considering HREF, VSYNCH and PCLOCK described on documentation here: http://www.voti.nl/docs/OV7670.pdf. I understand XCLOCK as an input I need to give to OV7670 as a kind of cycle controller and RESET would be something to reset it.
So at this point I thought that the functionality of such camera would be covered by wiring the following pins:
D0..D7 - for data (pixel) connected to arduino digital pins 0 to 7 as INPUT on arduino board
XCLK - for camera clock connected to arduino digital pin 8 as OUTPUT from arduino board
PCLK - for pixel clock connected to arduino digital pin 9 as INPUT on arduino board
HREF - to define when a line starts / ends connected to arduino digital pin 10 as INPUT on arduino board
VSYCH - to define when a frame starts / ends connected to arduino digital pin 11 as INPUT on arduino board
GRD - groud connected to arduino GRD
3V3 - 3,3 INPUT connected to arduino 3,3v
RESET - connected to arduino RESET
PWDN - connected to arduino GRD
The implementation for such approach from my point of view would be something like:
Code:
for each loop function do
write high to XCLK
if VSYNCH is HIGH
return;
if HREF is LOW
return;
if lastPCLOCK was HIGH and currentPCLOCK is LOW
readPixelFromDataPins();
end for
My readPixelFromDataPins() basically read just the first byte (as I'm just testing if I can even read something from the camera), and it is written as follows:
Code:
byte readPixelFromDataPins() {
byte result = 0;
for (int i = 0; i < 8; i++) {
result = result << 1 | digitalRead(data_p[i]);
}
return result;
}
In order to check if something is being read from the camera I just print it to the Serial 9600, the byte read from data pins as a number. But currently I'm receiving only zero values. The code I'm using to retrieve an image is stored here: https://gist.github.com/franciscospaeth/8503747.
Did somebody that makes OV7670 work with Arduino already figure out what am I doing wrong? I suppose I'm using the XCLOCK wrongly right? What shall I do to get it working?
I searched a lot and I didn't found any SSCCE (http://sscce.org/) for this camera using arduino, if somebody have it please let me know.
This question is present on arduino forum (http://forum.arduino.cc/index.php?topic=211741.0) too.
your idea is not bad but ...
the xclock need to be a clock (in your program is just a transition from 0 to 1 and is freezing there)
you need also to use I2C with SIOC and SIOD for configuring the camera (or you can use the default settings, but I am not sure if is the correct output format for you, 30F/s,VGA, YUV format ....)
your code execution is slower using the serial output in the same loop with reading data
I will recommend you to toggle the xclock pin and to move the pixel print in a if(). Also you will be able to read Data only in a very precise time, if you want to read only one byte, than after a transition from 0 to 1 of HREF you need to wait for a new transition from 0 to 1 of PCLK (you will be able to see only one 0-1 transition of HREF after 784x2 transitions of PCLK, (640 active pixels + 144 dead time for each line) x 2 (for YUV or RGB are 2 bytes received for each pixel) )
Hello I am Mr_Arduino from the arduino forums. Your issue is that you are reading pixels too slow please do not use digital read to do such a thing. Also if you insist on using a separate function just to read a byte make sure the function is being inlined. You can do this by declaring your function as static inline. Also as mentioned above how are you generating the clock. You can generate the XCLK using PWM on the arduino.
I have created a working example here:
https://github.com/ComputerNerd/arduino-camera-tft/blob/master/captureimage.c
Edit: a 3rd party has copied part but not all of the code from the above link into the answer here. However, the link must remain as the code posted below requires additional files from that source to actually work.
Edit 2: Removed irrelevant code. You will need to modify what you do with the data.
void capImg(void){
cli();
uint8_t w,ww;
uint8_t h;
w=160;
h=240;
tft_setXY(0,0);
CS_LOW;
RS_HIGH;
RD_HIGH;
DDRA=0xFF;
//DDRC=0;
#ifdef MT9D111
while (PINE&32){}//wait for low
while (!(PINE&32)){}//wait for high
#else
while (!(PINE&32)){}//wait for high
while (PINE&32){}//wait for low
#endif
while (h--){
ww=w;
while (ww--){
WR_LOW;
while (PINE&16){}//wait for low
PORTA=PINC;
WR_HIGH;
while (!(PINE&16)){}//wait for high
WR_LOW;
while (PINE&16){}//wait for low
PORTA=PINC;
WR_HIGH;
while (!(PINE&16)){}//wait for high
WR_LOW;
while (PINE&16){}//wait for low
PORTA=PINC;
WR_HIGH;
while (!(PINE&16)){}//wait for high
WR_LOW;
while (PINE&16){}//wait for low
PORTA=PINC;
WR_HIGH;
while (!(PINE&16)){}//wait for high
}
}
CS_HIGH;
sei();
}
You can also find it on github.
You can use my instruction: how to retrieve image from ov7670 It contains all the steps you need. There is also instuction to setup FrameGrabber: how to run framegrabber

File (.wav) duration while writing PCM data #16KBps

I am writing some silent PCM data on a file #16KBps. This file is of .wav format. For this I have the following code:
#define DEFAULT_BITRATE 16000
long LibGsmManaged:: addSilence ()
{
char silenceBuf[DEFAULT_BITRATE];
if (fout) {
for (int i = 0; i < DEFAULT_BITRATE; i++) {
silenceBuf[i] = '\0';
}
fwrite(silenceBuf, sizeof(silenceBuf), 1, fout);
}
return ftell(fout);
}
Updated:
Here is how I write the header
void LibGsmManaged::write_wave_header( )
{
if(fout) {
fwrite("RIFF", 4, 1, fout);
total_length_pos = ftell(fout);
write_int32(0);
fwrite("WAVE", 4, 1, fout);
fwrite("fmt ",4, 1, fout);
write_int32(16);
write_int16(1);
write_int16(1);
write_int32(8000);
write_int32(16000);
write_int16(2);
write_int16(16);
fwrite("data",4,1,fout);
data_length_pos = ftell(fout);
write_int32(0);
}
else {
std::cout << "File pointer not correctly initialized";
}
}
void LibGsmManaged::write_int32( int value)
{
if(fout) {
fwrite( (const char*)&value, sizeof(value), 1, fout);
}
else {
std::cout << "File pointer not correctly initialized";
}
}
I run this code on my iOS device using NSTimer with interval 1.0 sec. So AFAIK, if I run this for 60 sec, I should get a file.wav that when played should show 60 sec as its duration (again AFAIK). But in actual test it displays almost double duration i.e. 2 min. (approx). I have also tested that when I change the DEFAULT_BITRATE to 8000, then the file duration is almost correct.
I am unable to identify what is going on here. Am I missing something bad here? I hope my code is not wrong.
What you're trying to do (write your own WAV files) should be totally doable. That's the good news. However, I'm a bit confused about your exact parameters and constraints, as are many others in the comments, which is why they have been trying to flesh out the details.
You want to write raw, uncompressed, silent PCM to a WAV file. Okay. How wide does the PCM data need to be? You are creating an array of chars that you are writing to the file. A char is an 8-bit byte. Is that what you want? If so, then you need to use a silent center point of 0x80 (128). 8-bit PCM in WAV files is unsigned, i.e., 0..255, and 128 is silent.
If you intend to store silent 16-bit data, that will be signed data, so the center point (between -32768 and 32767) is 0. Also, it will be stored in little endian byte format. But since it's silence (all 0s), that doesn't matter.
The title of your question indicates (and the first sentence reiterates) that you want to write data at 16 kbps. Are you sure you want raw 16 kbps audio? That's 16 kiloBITs per second, or 16000 bits per second. Depending on whether you are writing 8- or 16-bit PCM samples, that only allows for 2000 or 1000 Hz audio, which is probably not what you want. Did you mean 16 kHz audio? 16 kHz audio translates to 16000 audio samples per second, which more closely aligns with your code. Then again, your code mentions GSM (LibGsmManaged), so maybe you are looking for 16 kbps audio. But I'll assume we're proceeding along the raw PCM route.
Do you know in advance how many seconds of audio you need to write? That makes this process really easy. As you may have noticed, the WAV header needs length information in a few spots. You either write it in advance (if you know the values) or fill it in later (if you are writing an indeterminate amount).
Let's assume you are writing 2 seconds of raw, monophonic, 16000 Hz, 16-bit PCM to a WAV file. The center point is 0x0000.
WAV writing process:
Write 'RIFF'
Write 32-bit file size, which will be 36 (header size - first 8 bytes) + 64000 (see step 12 about that number)
Write 'WAVEfmt ' (with space)
Write 32-bit format header size (16)
Write 16-bit audio format (1 indicating raw PCM audio)
Write 16-bit channel count (1 because it's monophonic)
Write 32-bit sample rate (number of audio sample per second = 16000)
Write 32-bit byte rate (number of bytes per second = 32000)
Write 16-bit block alignment (2 bytes per sample * 1 channel = 2)
Write 16-bit bits per sample (16)
Write 'data'
Write 32-bit length of audio payload data (16000 samples/second * 2 bytes/sample * 2 seconds = 64000 bytes)
Write 64000 bytes, all 0 values
If you need to write a dynamic amount of audio data, leave the length field from steps 2 and 12 as 0, then seek back after you're done writing and fill those in. I'm not convinced that your original code was writing the length fields correctly. Some playback software might ignore those, others might not, so you could have gotten varying results.
Hope that helps! If you know Python, here's another question I answered which describes how to write a WAV file using Python's struct library (I referred to that code fragment a lot while writing the steps above).