TensorFlow Lite: Is tensor string buffer format ASCII or UTF-8? - tensorflow

Are the strings stored in a .tflite tensor buffer in ASCII or UTF-8 format?

The few TensorFlow Lite ops that deal with strings can handle UTF-8 and adhere to the format described in https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/lite/string_util.h#L17
That is:
4 bytes specifying the number of strings in the tensor
a section describing the length and location (offset) of each string inside the buffer, and the length of the buffer itself.
a section containing the actual strings.
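As an illustration, here is a minimal Python sketch of that layout. The little-endian int32 field widths are assumptions based on the comment in string_util.h, and pack_string_tensor is just a hypothetical helper for this example, not part of the TFLite API:
import struct

def pack_string_tensor(strings):
    # Encode each string as UTF-8; the buffer stores raw bytes, no terminators.
    payloads = [s.encode("utf-8") for s in strings]
    n = len(payloads)
    header_size = 4 * (n + 2)               # count + one offset per string + total length
    buf = struct.pack("<i", n)              # 4 bytes: number of strings in the tensor
    offset = header_size
    for p in payloads:
        buf += struct.pack("<i", offset)    # offset of this string inside the buffer
        offset += len(p)
    buf += struct.pack("<i", offset)        # length of the whole buffer
    return buf + b"".join(payloads)         # the actual string contents

packed = pack_string_tensor(["hello", "世界"])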

Related

How do I save a numpy tensor to a file

I am trying to convert images to a numpy tensor that is 4-dimensional, and I want to save it to a file, preferably a CSV file. I already have the image array in the dimensions I want, but now the problem is saving it to a file. How do I go about doing this?
Numpy has a lot of options for IO of array data (a quick sketch of each option follows after this list):
If a binary format is OK, you can use np.save to save the 4D tensor in a binary (".npy") format. The file can be read again with np.load. This is a very convenient way to save numpy data, and it works for numeric arrays of any number of dimensions.
np.savetxt can write a 1D or 2D array in CSV-like text format. You could use np.reshape to flatten your tensor down to 1D or 2D and then use np.savetxt. The downside is the file doesn't keep track of the full 4D shape, so you'll need to track that separately in some way.
If storing in text representation is important, a better option may be to convert the tensor to string with np.array2string, then write the string to file. This works even for arrays with more than 2 dimensions.
The .tofile method simply dumps the element data as a raw binary file. No shape or other metadata is preserved, but the binary file is easy to read into other programs.
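A rough sketch of those four options (the file names and the example shape are placeholders):
import numpy as np

tensor = np.random.rand(2, 3, 4, 5).astype(np.float32)   # example 4D tensor

# 1. Binary .npy: round-trips shape and dtype exactly.
np.save("tensor.npy", tensor)
restored = np.load("tensor.npy")

# 2. CSV-like text: flatten to 2D first; the 4D shape must be tracked separately.
np.savetxt("tensor.csv", tensor.reshape(-1, tensor.shape[-1]), delimiter=",")
restored_txt = np.loadtxt("tensor.csv", delimiter=",").reshape(2, 3, 4, 5)

# 3. Text representation of the full array via array2string.
with open("tensor.txt", "w") as fh:
    fh.write(np.array2string(tensor, threshold=tensor.size))

# 4. Raw binary dump: no shape or dtype metadata is kept.
tensor.tofile("tensor.raw")
raw = np.fromfile("tensor.raw", dtype=np.float32).reshape(2, 3, 4, 5)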

Converting numpy array (image) to pdf base64

I have an image, represented as a numpy array.
I want to avoid writing it out as a PDF and then reading the file back just to get the base64 representation. Is there an easier way to do this without writing a file?
My goal is to have the base64 representation of the output pdf file (without outputting one)
If I understand correctly, the base64 encoding is different for JPGs and PDFs. Is this correct?
Using PIL's Image.fromarray function, one can convert all the images to PIL images.
Then, again using PIL, save() can be used to save the images together as a PDF and write them to a buffer:
buff = io.BytesIO()
pil_images[0].save(buff, "PDF", resolution=100.0, save_all=True, append_images=pil_images[1:])
buff.getvalue() returns the bytes (which is good enough for me, but it is also still possible to get the base64 representation)
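Put together, a hedged sketch of that route including the base64 step (the input arrays here are made up for the example):
import base64
import io

import numpy as np
from PIL import Image

# Some stand-in images as uint8 RGB arrays.
arrays = [np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8) for _ in range(3)]
pil_images = [Image.fromarray(a) for a in arrays]

# Write all pages into an in-memory PDF.
buff = io.BytesIO()
pil_images[0].save(buff, "PDF", resolution=100.0, save_all=True,
                   append_images=pil_images[1:])

# base64 of the PDF bytes, without ever touching the filesystem.
pdf_base64 = base64.b64encode(buff.getvalue())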

How to change four bytes into float32 in tensorflow?

I use tf.FixedLengthRecordReader to read a file and get a list of uint8 tensors. I want to transform the first four bytes into one float32.
For example, if the first four bytes are 0xAA, 0xBB, 0xCC, 0xDD, I want to get 0xAABBCCDD and reinterpret it as a float32. We know that in C++ this is easy: we can just cast the pointer, e.g. *(float*)address. But how can I do this in TensorFlow?
You need to use tf.decode_raw, which has an out_type argument that specifies the cast to be done, e.g.
record_bytes = tf.decode_raw(value, tf.float32)
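A small TF 1.x-style sketch matching the question's setup (in TF 2.x the op lives at tf.io.decode_raw); the little_endian argument controls the byte order, which matters if 0xAA should end up as the most significant byte:
import tensorflow as tf

raw = tf.constant([b"\xAA\xBB\xCC\xDD"])                        # the four bytes as a string tensor
as_float = tf.decode_raw(raw, tf.float32, little_endian=False)  # reinterpret 0xAABBCCDD as a float32
with tf.Session() as sess:
    print(sess.run(as_float))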

Parsing HEVC Stream for non IDR frames

I parsed the HEVC stream with this code (after converting the bytes to hex format):
string[] NALunit_string = Regex.Split(fsStringASHex, @"000001|00000001");
but after looking at the NAL unit types, I can find some with reserved types. Is that normal?

Read in 4-byte words from binary file in Julia

I have a simple binary file that contains 32-bit floats adjacent to each other.
Using Julia, I would like to read each number (i.e. each 32-bit word) and put them sequentially into an array of Float32s.
I've tried a few different things by looking at the documentation, but all of them have yielded impossible values (I am using a binary file with known values as dummy input). It appears that:
Julia is reading the binary file one-byte at a time.
Julia is putting each byte into a Uint8 array.
For example, readbytes(f, 4) gives a 4-element array of unsigned 8-bit integers. read(f, Float32, DIM) also gives strange values.
Anyone have any idea how I should proceed?
I'm not sure of the best way of reading it in as Float32 directly, but given an array of 4*n Uint8s, I'd turn it into an array of n Float32s using reinterpret:
raw = rand(Uint8, 4*10) # i.e. a vector of Uint8 aka bytes
floats = reinterpret(Float32, raw) # now a vector of 10 Float32s
With output:
julia> raw = rand(Uint8, 4*2)
8-element Array{Uint8,1}:
0xc8
0xa3
0xac
0x12
0xcd
0xa2
0xd3
0x51
julia> floats = reinterpret(Float32, raw)
2-element Array{Float32,1}:
1.08951e-27
1.13621e11
(EDIT 2020: Outdated, see newest answer.) I found the issue. The correct way of importing binary data in single precision floating point format is read(f, Float32, NUM_VALS), where f is the file stream, Float32 is the data type, and NUM_VALS is the number of words (values or data points) in the binary data file.
It turns out that every time you call read(f, [...]) the data pointer iterates to the next item in the binary file.
This allows you to simply read in the data one value at a time:
f = open("my_file.bin")
first_item = read(f, Float32)
second_item = read(f, Float32)
# etc ...
However, I wanted to load in all the data in one line of code. As I was debugging, I had used read() on the same file pointer several times without re-declaring the file pointer. As a result, when I experimented with the correct operation, namely read(f, Float32, NUM_VALS), I got an unexpected value.
The Julia language has changed a lot in the five years since this question was asked. read() no longer has an API to specify the type and length simultaneously, and reinterpret() creates a view of a binary array instead of an array of the desired type. It seems that now the best way to do this is to pre-allocate the desired array and fill it with read!:
io = open("my_file.bin")              # any stream of raw Float32 data
data = Array{Float32, 1}(undef, 128)  # pre-allocate 128 Float32s
read!(io, data)                       # fill the array from the stream
This fills data with the desired float values.