PointCloud2 Storage Format - blob

I am trying to implement a ROS publisher node for PointCloud2 messages on an FPGA. As a first step, I have already implemented a publisher node on the FPGA that publishes strings. Now I am trying to do the same for the PointCloud2 message format.
It is easy to understand how a string is stored: each character is converted to its ASCII value and stored (as can be seen here). A PointCloud2, on the other hand, is a complex data type that is not so easy to understand.
I have made some progress in understanding how the metadata of a PointCloud2 is stored, but I am finding it very difficult to understand how the data part is stored. To simplify, I also tried a PointCloud2 with only one point, but I couldn't decode that either. I know that X, Y and Z are stored sequentially with 4 bytes each (the datatype is Float32), so I can isolate the 4 bytes corresponding to one coordinate. I assigned the values from 0 to 17 (in decimal) to the X coordinate. These are the bytes stored for each value (all in decimal):
1 = [0, 0, 128, 63] -> little-endian, so the most significant byte is 63, followed by 128, 0, 0
2 = [0, 0, 0, 64]
3 = [0, 0, 64, 64]
4 = [0, 0, 128, 64]
5 = [0, 0, 160, 64]
6 = [0, 0, 192, 64]
7 = [0, 0, 224, 64]
8 = [0, 0, 0, 65]
9 = [0, 0, 16, 65]
10 = [0, 0, 32, 65]
11 = [0, 0, 48, 65]
12 = [0, 0, 64, 65]
13 = [0, 0, 80, 65]
14 = [0, 0, 96, 65]
15 = [0, 0, 112, 65]
16 = [0, 0, 128, 65]
17 = [0, 0, 136, 65]
So my question is: how are the values stored? Supposedly the data part is stored as a binary blob, according to here. However, I don't understand what this means or how it works. I also have not found any concrete example of how to convert a decimal value to this representation.
For a PointCloud2 with X, Y and Z, my current understanding is the following (here is the corresponding data):
header:
    seq (4 bytes)
    stamp (8 bytes)
    frame_id (1 byte per character)
height: 4 bytes
width: 4 bytes
fields:
    number of fields (4 bytes)
    field 1
        dimension (4 bytes)
        name (1 byte)
        offset (4 bytes)
        datatype (1 byte)
        count (4 bytes)
    field 2
        dimension (4 bytes)
        name (1 byte)
        offset (4 bytes)
        datatype (1 byte)
        count (4 bytes)
    field 3
        dimension (4 bytes)
        name (1 byte)
        offset (4 bytes)
        datatype (1 byte)
        count (4 bytes)
is_bigendian: 1 byte
point_step: 4 bytes
row_step: 4 bytes
size: 4 bytes
data: size bytes
is_dense: 1 byte
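The layout above can be cross-checked with a short script. The following is a minimal sketch, assuming the ROS 1 wire format: little-endian fields, with every string and variable-length array preceded by a 4-byte length (the "dimension" entry in the list above; under this assumption frame_id carries such a prefix too). The function names and the frame_id value "map" are hypothetical and chosen only for illustration.

```python
import struct

def ros_string(text):
    """Serialize a string as a 4-byte little-endian length followed by its bytes."""
    raw = text.encode("ascii")
    return struct.pack("<I", len(raw)) + raw

def point_field(name, offset, datatype=7, count=1):
    """Serialize one PointField: name, offset (uint32), datatype (uint8), count (uint32)."""
    return ros_string(name) + struct.pack("<IBI", offset, datatype, count)

def pointcloud2_one_point(x, y, z, frame_id="map", seq=0, secs=0, nsecs=0):
    """Serialize a PointCloud2 holding a single x/y/z point stored as FLOAT32."""
    data = struct.pack("<fff", x, y, z)                 # 3 * 4 bytes, little-endian
    point_step = len(data)
    msg = struct.pack("<III", seq, secs, nsecs)         # header: seq + stamp
    msg += ros_string(frame_id)                         # header: frame_id
    msg += struct.pack("<II", 1, 1)                     # height, width
    msg += struct.pack("<I", 3)                         # number of fields
    for index, name in enumerate("xyz"):
        msg += point_field(name, offset=4 * index)
    msg += struct.pack("<?II", False, point_step, point_step)  # is_bigendian, point_step, row_step
    msg += struct.pack("<I", len(data)) + data          # size of data, then data
    msg += struct.pack("<?", True)                      # is_dense
    return msg

message = pointcloud2_one_point(1.0, 2.0, 3.0)
print(len(message), list(message[-13:-1]))
```

With these assumptions the message for one point is 99 bytes long, and the 12 bytes just before the final is_dense byte are the three float32 coordinates.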

Since I selected datatype 7 (FLOAT32), each value in the data part is stored as an IEEE 754 single-precision number (a widely used standard for storing floating-point numbers efficiently), with its four bytes in little-endian order. For example, 1.0 is 0x3F800000, which gives the bytes [0, 0, 128, 63] listed in the question. Thankfully, there is already a Xilinx IP (Floating-Point Operator) to deal with floating-point numbers, including conversions from other types such as INT32 and mathematical operations such as multiplication.
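The byte lists in the question can be reproduced on a PC with Python's struct module, which is handy for generating test vectors for the FPGA. This is a minimal sketch; the helper names are hypothetical.

```python
import struct

def float32_bytes(value):
    """Return the four little-endian bytes of an IEEE 754 single as a list of ints."""
    return list(struct.pack("<f", value))

def float32_fields(value):
    """Split an IEEE 754 single into (sign, biased exponent, 23-bit mantissa)."""
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    return sign, exponent, mantissa

for number in (1.0, 5.0, 17.0):
    print(number, float32_bytes(number), float32_fields(number))
```

For instance, 5.0 = 1.25 * 2^2, so the biased exponent is 127 + 2 = 129 and the mantissa encodes the fractional part 0.25, which yields the bytes [0, 0, 160, 64].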

multilabel classification with counts

I am trying to train a model for multi-output/multilabel classification where the labels are not binary but counts. Suppose the possible labels are A, B, C, D and E; for example, the y matrix could be
y = [[2, 0, 0, 0, 0], [0, 1, 1, 0, 0], [0, 0, 0, 1, 0], [0, 0, 0, 0, 3]]
I will have at most 5 labels, and the count values will be at most 5 (usually 2). Options:
1. Expanding the label space to {A,B,C,D,E} x {1,2,3,4,5}, thereby reducing the problem to binary labels. The number of combinations that actually occur may be less than 15, and my datasets have ~100k rows (not too big). Then using catboost multilabel binary classification with MultiLogloss.
2. Scaling all counts by the maximum count over all labels. The above would then become (dividing all numbers by 3) [[0.67,0,0,0,0], [0,0.33,0.33,0,0], [0,0,0,0.33,0], [0,0,0,0,1]]. Then using catboost MultiLogloss.
Which would be better? Is there any way of training a model using xgboost objective=count:poisson and sklearn.multioutput.MultiOutputClassifier? Sample code for that would really help.

Count number of unique colours in image [duplicate]

I am trying to count the number of unique colours in an image. I have some code that I think should work, but when I run it on an image it says I have 252 different colours out of a possible 16,777,216. That seems wrong: the image is BGR, so shouldn't there be many more distinct colours (thousands, not hundreds)?
import cv2
import imutils
import numpy as np

def count_colours(src):
    unique, counts = np.unique(src, return_counts=True)
    print(counts.size)
    return counts.size

src = cv2.imread('../../images/di8.jpg')
src = imutils.resize(src, height=300)
count_colours(src)  # outputs 252 different colours!? only?
Is that value correct? And if not how can I fix my function count_colours()?
Edit: is this correct?
def count_colours(src):
    unique, counts = np.unique(src.reshape(-1, src.shape[-1]), axis=0, return_counts=True)
    return counts.size
If you look at the uniques you are getting back, I'm pretty sure you'll find they are scalars.
You need to use the axis keyword:
>>> import numpy as np
>>> from scipy.datasets import face  # scipy.misc.face in older SciPy releases
>>>
>>> img = face()
>>> np.unique(img.reshape(-1, img.shape[-1]), axis=0, return_counts=True)
(array([[  0,   0,   5],
        [  0,   0,   7],
        [  0,   0,   9],
        ...,
        [255, 248, 255],
        [255, 249, 255],
        [255, 252, 255]], dtype=uint8), array([1, 2, 2, ..., 1, 1, 1]))
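To see the difference on something small enough to check by hand, here is a minimal sketch using a hypothetical 2x2 test image (the array `demo` and the function names are made up for illustration). Without `axis`, np.unique flattens the array and counts distinct channel values; with `axis=0` on the reshaped pixels it counts distinct colour triples.

```python
import numpy as np

def count_channel_values(img):
    """Count distinct scalar values across all channels (what np.unique does by default)."""
    return np.unique(img).size

def count_colours(img):
    """Count distinct colour triples by treating each pixel as one row."""
    pixels = img.reshape(-1, img.shape[-1])
    return len(np.unique(pixels, axis=0))

# Four pixels, three distinct colours, but only two distinct channel values (0 and 255).
demo = np.array([[[255, 0, 0], [0, 255, 0]],
                 [[255, 0, 0], [0, 0, 0]]], dtype=np.uint8)

print(count_channel_values(demo), count_colours(demo))
```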
The comment by @Edeki Okoh is correct: you need to find a way to take the color channels into account. There is probably a much cleaner solution, but a hacky way to do this would be something like the following. Each color channel has values from 0 to 255, so it fits in three decimal digits; we add 1 to green and red so that a zero channel still contributes after multiplication. Blue will represent the last three digits, green the middle three and red the first three. Now every value represents a unique color.
b, g, r = (c.astype(np.int32) for c in cv2.split(src))  # widen from uint8 so the arithmetic cannot overflow
shifted_im = b + 1000 * (g + 1) + 1000 * 1000 * (r + 1)
The resulting image should have one channel with each value representing a unique color combination.
I think you only counted distinct values of individual channels (e.g. the R value) rather than full RGB triples; that is why you get only 252 discrete values.
In theory, R, G and B can each take 256 discrete values:
256*256*256 = 16777216
which means there are 16777216 possible colors in total.
My suggestion is to pack the three uchar channels of the CV_8UC3 image into a single 32-bit integer per pixel (e.g. CV_32SC1).
Given an image as input:
# my small test image of written text, whose distinct colours I can count by hand
import cv2
import numpy as np
image = cv2.imread('/home/usr/naneDownloads/vuQ9y.png')  # change here
b, g, r = cv2.split(image)
# shift each channel by 8 bits; the parentheses are required because + binds tighter than <<
out_in_32U_2D = (np.int32(b) << 16) | (np.int32(g) << 8) | np.int32(r)
out_in_32U_1D = out_in_32U_2D.reshape(-1)  # convert to 1D; every packed value lies in 0 .. 16777215
len(np.unique(out_in_32U_1D))
37 # correct for my test image of written text when compared with my manual count
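When channels are packed with bit shifts, operator precedence deserves care: in Python (and therefore NumPy) `+` binds more tightly than `<<`, so `b << 16 + g << 8 + r` is parsed as `(b << (16 + g)) << (8 + r)`. A minimal sketch with plain integers shows the effect; the function names are hypothetical.

```python
def pack_bgr(b, g, r):
    """Pack three 8-bit channel values into one integer: b in bits 16-23, g in 8-15, r in 0-7."""
    return (b << 16) | (g << 8) | r

def pack_without_parentheses(b, g, r):
    """Same expression without parentheses; parsed as (b << (16 + g)) << (8 + r)."""
    return b << 16 + g << 8 + r

print(pack_bgr(1, 2, 3))                  # 66051, i.e. 0x010203
print(pack_without_parentheses(1, 2, 3))  # 536870912, i.e. 1 << 29
```

With the parenthesised form every packed value stays between 0 and 16777215, so negative entries or values above that range in the result of np.unique are a sign that the shifts were not grouped as intended.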
The code here should be able to provide you with what you needed

How to read the Numpy documentation

For example:
random.randint(low, high=None, size=None, dtype='l')
>>> np.random.randint(2, size=10)
array([1, 0, 0, 0, 1, 1, 0, 0, 1, 0])
The parameters are low, high, size and dtype, but which parameter does the '2' in the function call correspond to? It doesn't seem to correspond to any of them.
'2' corresponds to the low parameter.
You don't need to write 'low=' because the argument is passed positionally: the first positional argument binds to the first parameter, low.
In general, random.randint generates integers in [low, high). But if high=None, it generates integers in [0, low) instead.
Hence you get integers from 0 up to, but not including, 2.
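The binding of the positional argument can be checked directly. The sketch below reseeds the global generator before each call so that the three spellings can be compared; the seed value and variable names are arbitrary choices for illustration.

```python
import numpy as np

np.random.seed(0)
positional = np.random.randint(2, size=10)      # 2 is bound to low; high stays None

np.random.seed(0)
keyword = np.random.randint(low=2, size=10)     # the same call, spelled with a keyword

np.random.seed(0)
explicit = np.random.randint(0, 2, size=10)     # what high=None means: the range [0, low)

print(positional, keyword, explicit)
```

All three arrays contain only 0s and 1s, and with the same seed they come out identical.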

What is the difference between a Categorical Column and a Dense Column?

In Tensorflow, there are 9 different feature columns, arranged into three groups: categorical, dense and hybrid.
From reading the guide, I understand categorical columns are used to represent discrete input data with a numerical value. It gives the example of a categorical column called categorical identity column:
ID Represented using one-hot encoding
0 [1, 0, 0, 0]
1 [0, 1, 0, 0]
2 [0, 0, 1, 0]
3 [0, 0, 0, 1]
But you also have a dense column called indicator column, which 'wraps'(?) a categorical column to produce something that looks almost identical:
Category (from category column) Represented as...
0 [1, 0, 0, 0]
1 [0, 1, 0, 0]
2 [0, 0, 1, 0]
3 [0, 0, 0, 1]
So both 'categorical' and 'dense' columns seem to be able to represent discrete data, and both can use one-hot encoding, so that is not what distinguishes one from the other.
My question is: in principle, what is the difference between a 'categorical column' and a 'dense column'?
I just came across this before finding an answer on the DataScience StackExchange;
you can find the original answer here.
If I understood correctly, the answer is simply that while the categorical column will indeed encode the data as one-hot, the indicator column will encode it as multi-hot.
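To make the one-hot versus multi-hot distinction concrete, here is a simplified illustration in plain NumPy. It does not use the TensorFlow feature column API, and the function names and the category count of 4 are assumptions taken from the tables in the question.

```python
import numpy as np

NUM_CATEGORIES = 4  # matches the four IDs in the example tables

def one_hot(category_id):
    """Encode a single category ID: exactly one position is set."""
    vec = np.zeros(NUM_CATEGORIES, dtype=int)
    vec[category_id] = 1
    return vec

def multi_hot(category_ids):
    """Encode several category IDs for one example: one position per ID is set."""
    vec = np.zeros(NUM_CATEGORIES, dtype=int)
    vec[list(category_ids)] = 1
    return vec

print(one_hot(2))         # [0 0 1 0]
print(multi_hot([0, 3]))  # [1 0 0 1]
```

When every example has exactly one category, the two encodings coincide, which is why the two tables in the question look identical.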

tensorflow: transform a (structured) dense matrix to sparse when the number of rows is unknown

My task is to transform a specially structured dense matrix tensor into a sparse one. For example, take the input matrix M below (in each row, a dense sequence of positive integers followed by 0s as padding):
[[3 5 7 0]
 [2 2 0 0]
 [1 3 9 0]]
Additionally, the non-padding length of each row is given, e.g. by the tensor L =
[3, 2, 3].
The desired output would be sparse tensor S.
SparseTensorValue(indices=array([[0, 0],[0, 1],[0, 2],[1, 0],[1, 1],[2, 0],[2, 1], [2, 2]]), values=array([3, 5, 7, 2, 2, 1, 3, 9], dtype=int32), shape=array([3, 4]))
This is useful in models where objects are described by variable-sized descriptors (S is then used in embedding_lookup_sparse to connect the embeddings of the descriptors).
I am able to do it when the number of rows of M is known (with a Python loop and ops like slice and concat). However, the number of rows here is determined by the mini-batch size and can change (say, in the testing phase). Is there a good way to implement this? I have tried some control_flow_ops but haven't succeeded.
Thanks!!
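One way to think about the transformation is as a boolean mask built from L, which needs no loop over rows and so does not depend on knowing the row count in advance. The sketch below uses NumPy rather than TensorFlow and only illustrates the idea; the function name is hypothetical. TensorFlow provides tf.sequence_mask, tf.where and tf.gather_nd, which could plausibly be combined in the same way, though that is not tested here.

```python
import numpy as np

def dense_to_sparse_parts(M, L):
    """Return (indices, values, shape) of the non-padding entries of M, given row lengths L."""
    M = np.asarray(M)
    L = np.asarray(L)
    # mask[i, j] is True when column j lies inside the non-padding part of row i
    mask = np.arange(M.shape[1]) < L[:, None]
    indices = np.argwhere(mask)   # (row, column) pairs in row-major order
    values = M[mask]              # matching values, in the same order
    return indices, values, np.array(M.shape)

M = [[3, 5, 7, 0],
     [2, 2, 0, 0],
     [1, 3, 9, 0]]
L = [3, 2, 3]
indices, values, shape = dense_to_sparse_parts(M, L)
print(indices.tolist(), values.tolist(), shape.tolist())
```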