Why write to BinaryWriter twice? - vb.net

I'm implementing this tone-generator program and it works great:
https://social.msdn.microsoft.com/Forums/vstudio/en-US/c2b953b6-3c85-4eda-a478-080bae781319/beep-beep?forum=vbgeneral
What I can't figure out is why the code contains the following two lines:
BW.Write(Sample)
BW.Write(Sample)
One "write" makes sense, but why the second "write"?

The example is a bit cryptic, but the wave file is configured for 2 channels, so the two writes simply send the same audio data to both channels.
The wave header is this hardcoded bit:
Dim Hdr() As Integer = {&H46464952, 36 + Bytes, &H45564157, _
&H20746D66, 16, &H20001, 44100, _
176400, &H100004, &H61746164, Bytes}
Which decoded means:
&H46464952 = 'RIFF' (little endian)
36 + Bytes = length of header + length of data
&H45564157 = 'WAVE' (little endian)
&H20746D66 = 'fmt ' (little endian)
16 = length of fmt chunk (always 16)
&H20001 = 0x0001: PCM (low word), 0x0002: 2 channels (high word)
44100 = sample rate
176400 = byte rate = sampleRate * numChannels * bytesPerSample = 44100 * 2 * 2
&H100004 = 0x0004: block align = numChannels * bytesPerSample (low word), 0x0010: bits per sample, i.e. 16 (high word)
&H61746164 = 'data' (little endian)
Bytes = size of data chunk
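To make the interleaving concrete, here is a minimal Python sketch of the same scheme (the 440 Hz tone, amplitude, and file name are my own assumptions, not from the original thread). For a 2-channel 16-bit PCM WAV, each frame interleaves one left and one right sample, which is exactly why the sample is written twice:
import math
import struct

sample_rate = 44100
n_samples = sample_rate  # one second of audio
bytes_of_data = n_samples * 2 * 2  # samples * 2 channels * 2 bytes per sample

with open("tone.wav", "wb") as f:
    # The same header fields as the hardcoded VB.NET array, written field by field.
    f.write(b"RIFF")
    f.write(struct.pack("<I", 36 + bytes_of_data))
    f.write(b"WAVE")
    f.write(b"fmt ")
    f.write(struct.pack("<IHHIIHH",
                        16,                   # length of fmt chunk
                        1,                    # PCM
                        2,                    # 2 channels
                        sample_rate,          # 44100
                        sample_rate * 2 * 2,  # byte rate = 176400
                        4,                    # block align = channels * bytes per sample
                        16))                  # bits per sample
    f.write(b"data")
    f.write(struct.pack("<I", bytes_of_data))
    for i in range(n_samples):
        sample = int(32000 * math.sin(2 * math.pi * 440 * i / sample_rate))
        packed = struct.pack("<h", sample)
        f.write(packed)  # left channel
        f.write(packed)  # right channel: the "second write"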


Make numpy array which is hash of string array

I have a numpy array:
A = np.array(['abcd','bcde','cdef'])
I need a hash array B of A, computed with the function:
B[i] = ord(A[i][1]) * 256 + ord(A[i][2])
so that:
B = np.array([ord('b') * 256 + ord('c'), ord('c') * 256 + ord('d'), ord('d') * 256 + ord('e')])
How can I do it?
Based on the question, I assume the strings are ASCII and that all strings are at least 3 characters long.
You can start by converting the strings to ASCII for the sake of performance and simplicity (this creates a new temporary array). Then you can merge all the strings into one big array without any copy thanks to views (since NumPy strings are stored contiguously in memory), and you can convert the characters to integers at the same time (still without any copy). Finally, you can use the item stride to compute all the hashes in a vectorized way. Here is how:
ascii = A.astype('S')        # convert to fixed-width ASCII bytes (temporary array)
buff = ascii.view(np.uint8)  # flat view of all the characters as integers, no copy
result = buff[1::ascii.itemsize] * 256 + buff[2::ascii.itemsize]  # strided, vectorized hash
Congratulations! That gives about a four-fold speedup:
import time
import numpy as np

Iter = 1000000
A = np.array(['abcd','bcde','cdef','defg'] * Iter)

Ti = time.time()
B = np.zeros(A.size)
for i in range(A.size):
    B[i] = ord(A[i][1]) * 256 + ord(A[i][2])
DT1 = time.time() - Ti

Ti = time.time()
ascii = A.astype('S')
buff = ascii.view(np.uint8)
result = buff[1::ascii.itemsize]*256 + buff[2::ascii.itemsize]
DT2 = time.time() - Ti

print("Equal = %s" % np.array_equal(B, result))
print("DT1=%7.2f Sec, DT2=%7.2f Sec, DT1/DT2=%6.2f" % (DT1, DT2, DT1/DT2))
Output:
Equal = True
DT1= 3.37 Sec, DT2= 0.82 Sec, DT1/DT2= 4.11

How to properly select wanted data and discard unwanted data from binary files

I'm working on a project where I'm trying to convert old 16-bit binary data files into 32-bit data files for later use.
Straight conversion is no issue, but then I noticed I needed to remove the header data from the data files.
The data consists of 8206-byte frames; each frame consists of a 14-byte header and a data block of 4096 16-bit samples (8192 bytes). Depending on the file, there are either 70313 or 70312 frames in each file.
I couldn't find a neat way to find all the headers, remove them, and save only the data blocks to a new file, so here's what I did:
results_array = np.empty([0,1], np.uint16)
for filename in file_list:
    num_files += 1
    # read data from the file as 16 bits and save it as 32 bits
    data16 = np.fromfile(data_dir + "/" + filename, dtype=np.uint16)
    filesize = np.prod(data16.shape)
    if filesize == 288494239:
        total_frames = 70313
        #total_frames = 3000
    else:
        total_frames = 70312
        #total_frames = 3000
    frame_count = 0
    chunksize = 4103
    with open(data_dir + "/" + filename, 'rb') as file:
        while frame_count < total_frames:
            frame_count += 1
            read_data = file.read(chunksize)
            if not read_data:
                break
            data = read_data[7:4103]
            results_array = np.append(results_array, data)
            converted = np.frombuffer(results_array, np.uint16)
            print(str(frame_count) + "/" + str(total_frames))
converted = np.frombuffer(results_array, np.uint16)
data32 = converted.astype(dtype=np.uint32) * 256
It works (I think it does, at least), but it is very, very slow.
So the question is: is there a way to do the above much faster, maybe some built-in function in NumPy or something else?
Thanks in advance
Finally managed to crack this one, and it is 100x faster than the initial approach :) Reshaping the flat array so that each row is one frame and then slicing off the header columns avoids the repeated np.append, which copied the whole array on every iteration:
data = np.fromfile(read_dir + "/" + file, dtype=np.int16)
frames = len(data) // 4103  # frame length: 4103 16-bit words = 14-byte header + 4096 samples
# Reshape into an array such that each row is a frame
data = np.reshape(data[:frames * 4103], (frames, 4103))
# Remove the headers (the first 7 words of each row) and convert to int32
data = data[:, 7:].astype(np.int32) * 256
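If you then want to persist the result, the flattened 32-bit frames can be written straight back out; this one-liner is a sketch of my own, with write_dir as a hypothetical output directory:
data.tofile(write_dir + "/" + file)  # write_dir is hypothetical, not from the answer above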

Calculating the size of a const char array

I have this:
const char changedValue [] = {0xCA,0x06,0x03,0x80,0x01,0x00};
and I need to compute the checksum over those six bytes and append it to the end of the array.
The size of a byte array with six bytes is... six.
If you need to include a (byte-size?) checksum, it must be one byte larger.
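For illustration, here is a minimal Python sketch of the arithmetic (assuming a simple sum-modulo-256 checksum, which the question does not actually specify):
values = [0xCA, 0x06, 0x03, 0x80, 0x01, 0x00]
checksum = sum(values) % 256           # assumed: simple additive checksum
message = bytes(values + [checksum])   # seven bytes: six data bytes + one checksum byte
print(len(message), hex(checksum))     # -> 7 0x54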

Retrieve indices for rows of a PyTables table matching a condition using `Table.where()`

I need the indices (as numpy array) of the rows matching a given condition in a table (with billions of rows) and this is the line I currently use in my code, which works, but is quite ugly:
indices = np.array([row.nrow for row in the_table.where("foo == 42")])
It also takes half a minute, and I'm sure that the list creation is one of the reasons why.
I could not find an elegant solution yet and I'm still struggling with the PyTables docs, so does anybody know a magical way to do this more beautifully, and maybe also a bit faster? Maybe there is a special query keyword I am missing, since I have the feeling that PyTables should be able to return the matched rows' indices as a numpy array.
tables.Table.get_where_list() gives indices of the rows matching a given condition
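With the table and condition from the question, that one-liner becomes (a sketch, reusing the_table and the "foo == 42" condition from above):
indices = the_table.get_where_list("foo == 42")  # already a NumPy array of row indices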
I read the source of PyTables; where() is implemented in Cython, but it seems it is not fast enough. Here is a more involved method that can speed it up.
Create some data first:
from tables import *
import numpy as np

class Particle(IsDescription):
    name = StringCol(16)      # 16-character string
    idnumber = Int64Col()     # signed 64-bit integer
    ADCcount = UInt16Col()    # unsigned short integer
    TDCcount = UInt8Col()     # unsigned byte
    grid_i = Int32Col()       # 32-bit integer
    grid_j = Int32Col()       # 32-bit integer
    pressure = Float32Col()   # float (single-precision)
    energy = Float64Col()     # double (double-precision)

h5file = open_file("tutorial1.h5", mode="w", title="Test file")
group = h5file.create_group("/", 'detector', 'Detector information')
table = h5file.create_table(group, 'readout', Particle, "Readout example")
particle = table.row
for i in range(1001000):
    particle['name'] = 'Particle: %6d' % (i)
    particle['TDCcount'] = i % 256
    particle['ADCcount'] = (i * 256) % (1 << 16)
    particle['grid_i'] = i
    particle['grid_j'] = 10 - i
    particle['pressure'] = float(i*i)
    particle['energy'] = float(particle['pressure'] ** 4)
    particle['idnumber'] = i * (2 ** 34)
    # Insert a new particle record
    particle.append()
table.flush()
h5file.close()
Read the column in chunks, append the indices of the matches to a list, and finally concatenate the list into one array. You can change the chunk size according to your memory size:
h5file = open_file("tutorial1.h5")
table = h5file.get_node("/detector/readout")
size = 10000  # chunk size (rows per read)
col = "energy"
buf = np.zeros(size, dtype=table.coldtypes[col])
res = []
for start in range(0, table.nrows, size):
    length = min(size, table.nrows - start)
    data = table.read(start, start + length, field=col, out=buf[:length])
    tmp = np.where(data > 10000)[0]
    tmp += start
    res.append(tmp)
res = np.concatenate(res)

Enum bitwise masking limitations

I was trying to enumerate file types with bitmasks so they can be combined and distinguished quickly with bitwise OR:
typedef enum {
    FileTypeDirectory = 1,
    FileTypePIX = 2,
    FileTypeJPG = 4,
    FileTypePNG = 8,
    FileTypeGIF = 16,
    FileTypeHTML = 32,
    FileTypeXML = 64,
    FileTypeTXT = 128,
    FileTypePDF = 256,
    FileTypePPTX = 512,
    FileTypeAll = 1023
} FileType;
My OR operations worked up to 128 but failed beyond that. Are enums on 64-bit Mac OS X limited to byte-sized data types? (2^7 = 128)
All enum constants in C are of type int, not of the type of the enumeration itself. So the restriction is not the storage size of enum variables, but only the number of bits in an int.
I don't know much Objective-C (as this is also tagged), but it shouldn't deviate much from C.
I'm not quite sure how you used the OR operator, but it works fine for me with your typedef:
FileType _fileType = FileTypeGIF | FileTypePDF | FileTypePPTX;
NSLog(@"filetype is : %d", _fileType);
the result is:
filetype is : 784
which is correct, because 16 + 256 + 512 is precisely 784.
(It has been tested on a real device only.)
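The same flag pattern carries over to other languages; here is a minimal Python sketch of my own (the names mirror the enum above) showing that combining values above 128 works exactly as expected:
from enum import IntFlag

class FileType(IntFlag):
    # Mirrors the Objective-C enum above; each flag occupies one bit.
    DIRECTORY = 1
    PIX = 2
    JPG = 4
    PNG = 8
    GIF = 16
    HTML = 32
    XML = 64
    TXT = 128
    PDF = 256
    PPTX = 512
    ALL = 1023

file_type = FileType.GIF | FileType.PDF | FileType.PPTX
print(int(file_type))                  # 784 = 16 + 256 + 512
print(bool(file_type & FileType.PDF))  # True: test for a flag with bitwise AND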