How can I convert wav to a float array buffer? - dtmf

I need to be able to read a WAV file and load the data into my buffer, so that I can convolve that array with another float array. How do I accomplish this?

Check out sox. You can use sox to read in a WAV file and output the samples as floating-point values between -1 and 1.
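If you'd rather stay in Python, here is a rough sketch using only the standard-library wave module. It assumes the common case of signed 16-bit PCM input (sox handles other encodings for you), and divides by 32768 to scale the samples into [-1.0, 1.0):

```python
import io
import struct
import wave

def wav_to_floats(wav_file):
    """Read 16-bit PCM WAV data and return samples scaled to [-1.0, 1.0)."""
    with wave.open(wav_file, "rb") as wf:
        assert wf.getsampwidth() == 2, "expected 16-bit samples"
        raw = wf.readframes(wf.getnframes())
    samples = struct.unpack("<%dh" % (len(raw) // 2), raw)
    return [s / 32768.0 for s in samples]

# Round-trip demo: build a tiny in-memory WAV with three known samples.
buf = io.BytesIO()
with wave.open(buf, "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)
    wf.setframerate(8000)
    wf.writeframes(struct.pack("<3h", -32768, 0, 16384))
buf.seek(0)
floats = wav_to_floats(buf)  # [-1.0, 0.0, 0.5]
```

The resulting list can be fed straight into a convolution with another float array.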

Related

Why is 32768 used as a constant to normalize the wav data in VGGish?

I'm trying to follow along with what the code is doing for VGGish and I came across a piece that I don't really understand. In vggish_input.py there is this:
def wavfile_to_examples(wav_file):
  """Convenience wrapper around waveform_to_examples() for a common WAV format.

  Args:
    wav_file: String path to a file, or a file-like object. The file
      is assumed to contain WAV audio data with signed 16-bit PCM samples.

  Returns:
    See waveform_to_examples.
  """
  wav_data, sr = wav_read(wav_file)
  assert wav_data.dtype == np.int16, 'Bad sample type: %r' % wav_data.dtype
  samples = wav_data / 32768.0  # Convert to [-1.0, +1.0]
  return waveform_to_examples(samples, sr)
Where does the constant of 32768 come from and how does dividing that convert the data to samples?
I found this for normalizing data between -1 and +1, but I'm not sure how to bridge it with 32768:
https://stats.stackexchange.com/questions/178626/how-to-normalize-data-between-1-and-1
32768 is 2^15. int16 has a range of -32768 to +32767. If you have int16 input and divide it by 2^15, you get a number between -1 and +1 (strictly, in [-1, +1), since the largest positive value is 32767/32768).
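You can see this directly with a hypothetical int16 array covering the extremes of the type's range:

```python
import numpy as np

# Hypothetical int16 samples at the extremes of the type's range.
wav_data = np.array([-32768, -1, 0, 1, 32767], dtype=np.int16)

samples = wav_data / 32768.0  # promotes to float64, maps into [-1.0, +1.0)
```

The minimum value -32768 maps exactly to -1.0, and the maximum 32767 maps to 32767/32768, just under +1.0.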

How do i save a numpy tensor to a file

I am trying to convert images to a 4-dimensional numpy tensor and I want to save it to a file, preferably a CSV file. I already have the image array in the dimensions I want; the problem now is saving it to a file. How do I go about doing this?
Numpy has a lot of options for IO of array data:
If binary format is Ok, you can use np.save to save the 4D tensor in a binary (".npy") format. The file can be read again with np.load. This is a very convenient way to save numpy data, and it works for numeric arrays of any number of dimensions.
np.savetxt can write a 1D or 2D array in CSV-like text format. You could use np.reshape to flatten your tensor down to 1D or 2D and then use np.savetxt. The downside is the file doesn't keep track of the full 4D shape, so you'll need to track that separately in some way.
If storing in text representation is important, a better option may be to convert the tensor to string with np.array2string, then write the string to file. This works even for arrays with more than 2 dimensions.
The .tofile method simply dumps the element data as a raw binary file. No shape or other metadata is preserved, but the binary file is easy to read into other programs.
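For instance, the np.save round trip preserves the 4-D shape and dtype exactly. A small sketch with a stand-in tensor (written to an in-memory buffer here, but a path like "tensor.npy" works the same way):

```python
import io
import numpy as np

# Stand-in for the 4-D image tensor.
tensor = np.arange(2 * 3 * 4 * 5, dtype=np.float32).reshape(2, 3, 4, 5)

buf = io.BytesIO()       # or a filename such as "tensor.npy"
np.save(buf, tensor)
buf.seek(0)
restored = np.load(buf)  # shape and dtype come back intact
```

This is why the binary ".npy" route is usually preferable to CSV for tensors with more than two dimensions.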

Converting numpy array (image) to pdf base64

I have an image, represented as a numpy array.
I want to avoid writing it out as a PDF and then reading the file back just to get the base64 representation. Is there an easier way to do this without writing a file?
My goal is to have the base64 representation of the output PDF file (without actually outputting one).
If I understand correctly, the base64 encoding is different for JPGs and PDFs; is that correct?
Using PIL's Image.fromarray function, one can convert all the numpy arrays to PIL images.
Then, again using PIL, save() can be used to save the images together as a PDF, writing them to a buffer:
buff = io.BytesIO()
pil_images[0].save(buff, "PDF", resolution=100.0, save_all=True, append_images=pil_images[1:])
buff.getvalue() returns the bytes (which is good enough for me, but it is also still possible to get the base64 representation)
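The base64 step then works directly on the in-memory bytes. A minimal sketch with a stand-in payload (in practice, the PIL save() call above fills the buffer instead):

```python
import base64
import io

buff = io.BytesIO()
# Stand-in for: pil_images[0].save(buff, "PDF", save_all=True, ...)
buff.write(b"%PDF-1.4 fake document body")

pdf_bytes = buff.getvalue()
pdf_b64 = base64.b64encode(pdf_bytes).decode("ascii")
```

No file ever touches disk; the base64 string is computed entirely from the buffer.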

TFRecord larger than the original data

I am dealing with many pictures taken from different videos, so I use tf.SequenceExample() to save them as different sequences, with their labels attached, into a TFRecord.
But after running my code, the generated TFRecord is 29 GB, while my original pictures are only 3 GB.
Is it normal for a TFRecord to be larger than the original data?
You may be storing the decoded images instead of the jpeg encoded ones. TFRecord has no concept of image formats so you can use any encoding you want. To keep the size the same, convert the original image file contents to a BytesList and store that without calling decode_image or using any image libraries or anything that understands image formats.
Another possibility is you might be storing the image as an Int64List full of bytes which would be 8x the size. Instead, store it as a BytesList containing a single Bytes.
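The 8x factor can be illustrated with a stdlib-only sketch that serializes the same pixel bytes both ways. (This packs each byte as a fixed-width 64-bit integer; TFRecord's protobuf encoding uses varints, so real overheads differ, but the order of magnitude is the point.)

```python
import struct

pixels = bytes(range(256))  # pretend image data, one byte per pixel

# BytesList-style: the raw bytes stored as a single blob.
as_bytes_blob = pixels

# Naive Int64List-style: one fixed-width 8-byte integer per pixel.
as_int64_blob = struct.pack("<%dq" % len(pixels), *pixels)

ratio = len(as_int64_blob) / len(as_bytes_blob)  # -> 8.0
```

Storing the encoded JPEG bytes as one BytesList entry avoids this blow-up entirely.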
Check the type of the data you load. I guess you load the images as pixel data. Every pixel is uint8 (8 bits) and is likely converted to float (32 bits). Hence you should expect it to grow to 4 times the original size (3 GB -> 12 GB).
Also, the original format might have (better) compression than TFRecords. (I'm not sure whether TFRecords can use compression.)

Read in 4-byte words from binary file in Julia

I have a simple binary file that contains 32-bit floats adjacent to each other.
Using Julia, I would like to read each number (i.e. each 32-bit word) and put them sequentially into an array of Float32.
I've tried a few different things through looking at the documentation, but all have yielded impossible values (I am using a binary file with known values as dummy input). It appears that:
Julia is reading the binary file one-byte at a time.
Julia is putting each byte into a Uint8 array.
For example, readbytes(f, 4) gives a 4-element array of unsigned 8-bit integers. read(f, Float32, DIM) also gives strange values.
Anyone have any idea how I should proceed?
I'm not sure of the best way of reading it in as Float32 directly, but given an array of 4*n Uint8s, I'd turn it into an array of n Float32s using reinterpret (doc link):
raw = rand(Uint8, 4*10) # i.e. a vector of Uint8 aka bytes
floats = reinterpret(Float32, raw) # now a vector of 10 Float32s
With output:
julia> raw = rand(Uint8, 4*2)
8-element Array{Uint8,1}:
0xc8
0xa3
0xac
0x12
0xcd
0xa2
0xd3
0x51
julia> floats = reinterpret(Float32, raw)
2-element Array{Float32,1}:
1.08951e-27
1.13621e11
(EDIT 2020: Outdated, see newest answer.) I found the issue. The correct way of importing binary data in single precision floating point format is read(f, Float32, NUM_VALS), where f is the file stream, Float32 is the data type, and NUM_VALS is the number of words (values or data points) in the binary data file.
It turns out that every time you call read(f, [...]) the data pointer iterates to the next item in the binary file.
This allows people to be able to read in data line-by-line simply:
f = open("my_file.bin")
first_item = read(f, Float32)
second_item = read(f, Float32)
# etc ...
However, I wanted to load in all the data in one line of code. As I was debugging, I had used read() on the same file pointer several times without re-declaring the file pointer. As a result, when I experimented with the correct operation, namely read(f, Float32, NUM_VALS), I got an unexpected value.
The Julia language has changed a lot in the five years since this question was asked. read() no longer has an API to specify the type and length simultaneously, and reinterpret() creates a view of a binary array instead of an array of the desired type. It seems that now the best way to do this is to pre-allocate the desired array and fill it with read!:
data = Array{Float32, 1}(undef, 128)
read!(io, data)
This fills data with the desired float values.