Do TensorFlow CNN filters really detect contrast in images with big white zones?

I'm having trouble understanding how a CNN filter can give a higher value to a perfectly matching patch when you have grayscale images with big, completely white zones.
For example, imagine that I have the following 3x3 filter:
0-1-0
0-1-0
0-1-0
This filter is applied to an image with big, completely white zones. For example, I could have a patch of that image like this:
255-255-255
255-255-255
255-255-255
and for this patch, the kernel would return (0*255 + 0*255 + 0*255) + (1*255 + 1*255 + 1*255) + (0*255 + 0*255 + 0*255) = 765
and if I apply the same filter to this patch image:
0-255-0
0-255-0
0-255-0
I would get the same value: (0*0 + 0*0 + 0*0) + (1*255 + 1*255 + 1*255) + (0*0 + 0*0 + 0*0) = 765
But the last image patch should get a much higher value from the kernel, since it actually matches the pattern, so I'm going crazy trying to understand how this really works.
Thanks in advance!

Well, after a few days of thinking about it, I found the answer to my question: use negative values in the kernel. After seeing so many kernel examples with only 1's and 0's, I hadn't realized that the values could be negative too.
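To illustrate the point (my own quick check, not part of the original answer): a kernel with negative values in the flanking columns penalizes the flat white patch and rewards the vertical-line patch.

import numpy as np

# Hypothetical vertical-line kernel with negative flanks instead of zeros
kernel = np.array([[-1, 1, -1],
                   [-1, 1, -1],
                   [-1, 1, -1]])

white_patch = np.full((3, 3), 255)        # completely white zone
line_patch = np.array([[0, 255, 0]] * 3)  # bright vertical line on black

print((kernel * white_patch).sum())  # -765: the flat white area is penalized
print((kernel * line_patch).sum())   #  765: the matching pattern scores highest

With the all-zero flanks of the original kernel both patches scored 765; with negative flanks they are clearly separated.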

Related

Efficiently implementing DXT1 texture decompression in hardware

DXT1 compression is designed to be fast to decompress in hardware, where it's used in texture samplers. The Wikipedia article says that under certain circumstances you can work out the coefficients of the interpolated colours as:
c2 = (2/3)*c0+(1/3)*c1
or rearranging that:
c2 = (1/3)*(2*c0+c1)
However you rearrange the above equation, you always end up having to multiply something by 1/3 (or divide by 3, which is the same deal, if not more expensive). And it seems weird to me that a texture format designed to be fast to decompress in hardware would require a multiplication or division. The FPGA I'm implementing my GPU on has only limited resources for multiplications, and I want to save those for where they're really required.
So am I missing something? Is there an efficient way of avoiding multiplying the colour channels by 1/3? Or should I just eat the cost of that multiplication?
This might be a bad way of imagining it, but could you implement it via the use of addition/subtraction of successive halves (shifts)?
As you have 16 bits, successive additions and subtractions can get you quite close.
A third could be represented as
a(n+1) = a(n) +/- A>>1, where the list [0, 0, 1, 0, 1, ...] shows whether to add or subtract the shifted result at each step.
I believe this is called fractional maths.
However, in FPGAs, it is difficult to know whether this is actually more power efficient than the native DSP blocks (e.g. DSP48E1) provided.
The best answer I can come up with is to use the identity:
x/3 = sum(n=1 to infinity) (x/2^(2n))
and then take the first n terms. Using 4 terms I get:
(x/4)+(x/16)+(x/64)+(x/256)
which equals
x*0.33203125
which is probably good enough.
This relies on multiplication by a fixed power of 2 being free in hardware, followed by 3 additions, of which I can run 2 in parallel.
Any better answer is appreciated though.
** EDIT **: Using a combination of this and #dyslexicgruffalo's answer, I made a simple C++ program which iterated over the various sequences, tried them all, and recorded the average/max errors.
I did this for 0 <= x <= 189 (as 189 is the value of 2*c0.g + c1.g when g, which is 6 bits, maxes out).
The shortest good sequence (max error of 2, average error of 0.62), at 4 ops, was:
1 + x/4 + x/16 + x/64.
The best sequence, with a max error of 1 and an average error of 0.32, but at 6 ops, was:
x/2 - x/4 + x/8 - x/16 + x/32 - x/64.
For the 5-bit values (red and blue) the maximum value is 31*3, and the above sequences are still good but not the best. The best ones are:
x/4 + x/8 - x/16 + x/32 [max error of 1, average 0.38]
and
1 + x/4 + x/16 [max error of 2, average of 0.68]
(And, luckily, none of the above sequences ever guesses an answer which is too big so no clamping is needed even though they're not perfect)
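For anyone wanting to reproduce these numbers, here is a rough sketch of the error measurement (in Python rather than the C++ program mentioned above; the exact max/average values depend on how the reference x/3 is rounded):

# Check the 4-op sequence 1 + x/4 + x/16 + x/64 over the 6-bit range 0..189.
max_err = 0
total_err = 0
for x in range(190):                  # 189 = 2*63 + 63, the 6-bit worst case
    approx = 1 + (x >> 2) + (x >> 4) + (x >> 6)
    err = abs(approx - round(x / 3))  # reference: x/3 rounded to nearest
    max_err = max(max_err, err)
    total_err += err
print(max_err, total_err / 190)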

What are the criteria for the class weights for DeepLab on my custom dataset?

I'm training DeepLab v3 on a custom dataset with three classes, including background.
My classes are background, panda, and bottle, and there are 1949 pictures.
I'm using a MobileNetV2 model.
segmentation_dataset.py has been modified as follows:
_MYDATA_INFORMATION = DatasetDescriptor(
    splits_to_sizes={
        'train': 975,     # num of samples in images/training
        'trainval': 1949,
        'val': 974,       # num of samples in images/validation
    },
    num_classes=3,
    ignore_label=0,
)
train.py has been modified as follows:
flags.DEFINE_boolean('initialize_last_layer', False,
                     'Initialize the last layer.')
flags.DEFINE_boolean('last_layers_contain_logits_only', True,
                     'Only consider logits as last layers or not.')
train_utils.py has not been modified.
not_ignore_mask = tf.to_float(tf.not_equal(scaled_labels, ignore_label)) * loss_weight
I get some results, but not perfect ones.
For example, the mask colors of the panda and the bottle are the same, or not distinct.
The result I want is the panda in red and the bottle in green.
So I concluded that there was a problem with the weights.
Based on other people's questions, train_utils.py was configured as follows:
ignore_weight = 0
label0_weight = 1
label1_weight = 10
label2_weight = 15

not_ignore_mask = (
    tf.to_float(tf.equal(scaled_labels, 0)) * label0_weight +
    tf.to_float(tf.equal(scaled_labels, 1)) * label1_weight +
    tf.to_float(tf.equal(scaled_labels, 2)) * label2_weight +
    tf.to_float(tf.equal(scaled_labels, ignore_label)) * ignore_weight)

tf.losses.softmax_cross_entropy(
    one_hot_labels,
    tf.reshape(logits, shape=[-1, num_classes]),
    weights=not_ignore_mask,
    scope=loss_scope)
I have a question here.
What are the criteria for choosing the weights?
As for my dataset's class distribution: it's generated automatically, so I don't know exactly which class has more samples, but the amounts are similar.
Another thing: I'm using the PASCAL colormap type, where the first color is black (background), the second is red, and the third is green.
I want pandas to be exactly red and bottles to be exactly green. What should I do?
I think you might have mixed up your label definition; maybe I can help you with that. Please check your segmentation_dataset.py again. Here, you define "0" as the ignored label. This means that all pixels labeled "0" are excluded from the training process (more specifically, excluded from the calculation of the loss function, so they have no influence on the updating of the weights). In light of this, it is crucial not to "ignore" the background class, as it is also a class you want to predict correctly. In train_utils.py you assign a weighting factor to the ignored class, which has no effect. Make sure you don't mix up your three training classes [background, panda, bottle] with the "ignored" tag.
In your case num_classes=3 should be correct, as it specifies the number of labels to predict (the model automatically assumes these labels are 0, 1, and 2). If you want to ignore certain pixels, you have to annotate them with a fourth label (just choose a number > 2 for that) and then assign this value to ignore_label. If you don't have pixels to be ignored, still set ignore_label=255 and it will not influence your training. ;)
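As a sketch of what that advice could look like, reusing the snippets from the question (a starting point rather than a drop-in patch; the weight values are simply the ones the question already uses):

# segmentation_dataset.py: keep background as class 0, reserve 255 for ignored pixels
_MYDATA_INFORMATION = DatasetDescriptor(
    splits_to_sizes={
        'train': 975,
        'trainval': 1949,
        'val': 974,
    },
    num_classes=3,      # background, panda, bottle
    ignore_label=255,   # do not reuse the background label (0) here
)

# train_utils.py: weight the three real classes; only pixels labeled 255 get zero weight
ignore_weight = 0
label0_weight = 1    # background
label1_weight = 10   # panda
label2_weight = 15   # bottle

not_ignore_mask = (
    tf.to_float(tf.equal(scaled_labels, 0)) * label0_weight +
    tf.to_float(tf.equal(scaled_labels, 1)) * label1_weight +
    tf.to_float(tf.equal(scaled_labels, 2)) * label2_weight +
    tf.to_float(tf.equal(scaled_labels, 255)) * ignore_weight)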

Custom windowed operator in tensorflow

I want to implement the following formula: Si = (x1 - w1) + ... + (xn - wn), analogous to a convolution Conv = x1*w1 + ... + xn*wn, for some window of X (the input tensor) and kernel W. Importantly, this operation should be repeated over all windows of X, sliding W with stride and padding parameters just like a simple convolution.
How can I do this?
I found a similar question on Stack Overflow a while ago, but it ended with a custom C++ implementation and compilation, or changing the CUDA source, or something like that.
Is there an easier way today?
If I understand you correctly, then you compute Si = (x1 + ... + xn) - (w1 + ... + wn)? The sum of the weights is a single number, so you don't have a "kernel" anymore. The first sum you can compute via tf.nn.conv2d and a filter that is initialized with tf.ones. But I don't think that's what you meant to do, so could you maybe specify your question further?
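A minimal sketch of that suggestion, in TF 1.x style (the input shape and the 3x3 kernel are just assumptions for illustration):

import tensorflow as tf

# Assumed shapes: x is [batch, height, width, 1], w is a 3x3 single-channel kernel.
x = tf.placeholder(tf.float32, [None, 28, 28, 1])
w = tf.get_variable('w', shape=[3, 3, 1, 1])

# Windowed sum of x: convolve with a filter of ones, using the usual stride/padding options.
ones_filter = tf.ones_like(w)
window_sums = tf.nn.conv2d(x, ones_filter, strides=[1, 1, 1, 1], padding='SAME')

# Since sum(w) is a single scalar, S_i = (sum of x over the window) - sum(w)
# is just the window sum shifted by a constant.
s = window_sums - tf.reduce_sum(w)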

combining LSA/LSI with Naive Bayes for document classification

I'm new to the gensim package and vector space models in general, and I'm unsure of what exactly I should do with my LSA output.
To give a brief overview of my goal, I'd like to enhance a Naive Bayes classifier using topic modeling to improve classification of reviews (positive or negative). Here's a great paper I've been reading that has shaped my ideas but left me still somewhat confused about implementation.
I've already got working code for Naive Bayes--currently, I'm just using a unigram bag of words as my features, and the labels are either positive or negative.
Here's my gensim code
from pprint import pprint  # pretty printer
import gensim as gs

# tutorial sample documents
docs = ["Human machine interface for lab abc computer applications",
        "A survey of user opinion of computer system response time",
        "The EPS user interface management system",
        "System and human system engineering testing of EPS",
        "Relation of user perceived response time to error measurement",
        "The generation of random binary unordered trees",
        "The intersection graph of paths in trees",
        "Graph minors IV Widths of trees and well quasi ordering",
        "Graph minors A survey"]

# stoplist removal, tokenization
stoplist = set('for a of the and to in'.split())

# for each document: lowercase it, split by whitespace, and keep the words not in the stoplist
texts = [[word for word in doc.lower().split() if word not in stoplist] for doc in docs]

# create dict
dict = gs.corpora.Dictionary(texts)

# create corpus
corpus = [dict.doc2bow(text) for text in texts]

# tf-idf
tfidf = gs.models.TfidfModel(corpus)
corpus_tfidf = tfidf[corpus]

# latent semantic indexing with 10 topics
lsi = gs.models.LsiModel(corpus_tfidf, id2word=dict, num_topics=10)
for i in lsi.print_topics():
    print i
Here's the output:
0.400*"system" + 0.318*"survey" + 0.290*"user" + 0.274*"eps" + 0.236*"management" + 0.236*"opinion" + 0.235*"response" + 0.235*"time" + 0.224*"interface" + 0.224*"computer"
0.421*"minors" + 0.420*"graph" + 0.293*"survey" + 0.239*"trees" + 0.226*"paths" + 0.226*"intersection" + -0.204*"system" + -0.196*"eps" + 0.189*"widths" + 0.189*"quasi"
-0.318*"time" + -0.318*"response" + -0.261*"error" + -0.261*"measurement" + -0.261*"perceived" + -0.261*"relation" + 0.248*"eps" + -0.203*"opinion" + 0.195*"human" + 0.190*"testing"
0.416*"random" + 0.416*"binary" + 0.416*"generation" + 0.416*"unordered" + 0.256*"trees" + -0.225*"minors" + -0.177*"survey" + 0.161*"paths" + 0.161*"intersection" + 0.119*"error"
-0.398*"abc" + -0.398*"lab" + -0.398*"machine" + -0.398*"applications" + -0.301*"computer" + 0.242*"system" + 0.237*"eps" + 0.180*"testing" + 0.180*"engineering" + 0.166*"management"
Any suggestions or general comments would be appreciated.
I just started working on the same problem, but with an SVM instead. AFAIK, after training your model you need to do something like this:
new_text = 'here is some document'
# doc2bow expects a list of tokens, not a raw string
text_bow = dict.doc2bow(new_text.lower().split())
# apply the same tf-idf transform used at training time (the LSI model above was fit on corpus_tfidf)
vector = lsi[tfidf[text_bow]]
Here vector is the topic distribution of your document, with length equal to the number of topics you chose for training, 10 in your case.
So you need to represent all your documents as topic distributions and then feed them to the classification algorithm.
P.S. I know it's kind of an old question, but I keep seeing it in Google results every time I search. :)
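To make that concrete, here is a rough sketch of the full pipeline on the toy corpus above (the labels are made up purely for illustration, and I've used Gaussian Naive Bayes from scikit-learn because LSI features can be negative, which rules out the multinomial variant):

from gensim import matutils
from sklearn.naive_bayes import GaussianNB

# Project every training document into topic space: shape (num_docs, num_topics).
X = matutils.corpus2dense(lsi[corpus_tfidf], num_terms=10).T
y = [1, 0, 1, 0, 1, 0, 1, 0, 1]  # hypothetical positive/negative labels, one per document

clf = GaussianNB()
clf.fit(X, y)

# A new document gets the same bow -> tf-idf -> LSI treatment before prediction.
new_bow = dict.doc2bow("human computer interaction".lower().split())
new_vec = matutils.corpus2dense([lsi[tfidf[new_bow]]], num_terms=10).T
print(clf.predict(new_vec))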

Rotate 3D Euler point using Quaternions to avoid gimbal lock

Firstly, I have done a lot of googling and checked other Stack Overflow posts about this, but cannot find a working reply or a snippet of working code. Maths is not my strength.
I need a routine that takes a camera point (CX,CY,CZ) and rotates it about a look-at point (LX,LY,LZ) by three rotation angles (RX,RY,RZ). Using Euler rotations leads to gimbal lock in some cases, which I need to avoid, so I heard about using quaternions.
I found this to convert the rotations into a quaternion
http://www.euclideanspace.com/maths/geometry/rotations/conversions/eulerToQuaternion/index.htm
and this to convert from a quaternion back to Euler XYZ rotations
http://www.euclideanspace.com/maths/geometry/rotations/conversions/quaternionToEuler/index.htm
They seem to work fine, but I need to know how to use the quaternion to rotate the CX,CY,CZ around LX,LY,LZ and then return the new CX,CY,CZ without issues of gimbal lock.
There is so much out there about this, that I am sure a good explanation and snippet of code will help not only me but many others in the future.
So please help if you can. Many thanks.
The short answer, if your quaternion is Q and the new camera point is C':
C' = Q*(C-L)*Q^-1 + L
where the points are augmented with Cw=0 and multiplication and inverse are according to quaternion rules.
Specifically, let D = C - L. Then we let F = Q*D:
Fw = Qw*0 - Qx*Dx - Qy*Dy - Qz*Dz
Fx = Qw*Dx + Qx*0 + Qy*Dz - Qz*Dy
Fy = Qw*Dy - Qx*Dz + Qy*0 + Qz*Dx
Fz = Qw*Dz + Qx*Dy - Qy*Dx + Qz*0
Finally, we get C' = F*Q^-1 + L:
Cw' = 0
Cx' = Fx*Qw - Fw*Qx + Fz*Qy - Fy*Qz + Lx
Cy' = Fy*Qw - Fw*Qy + Fx*Qz - Fz*Qx + Ly
Cz' = Fz*Qw - Fw*Qz + Fy*Qx - Fx*Qy + Lz
However, be aware that if you're creating the quaternion from an Euler representation, you'll still end up with gimbal lock. The gimbal lock is a property of the Euler representation and the quaternion will just represent the same transformation. To get rid of gimbal lock, you'll need to avoid the Euler representation altogether, unless I misunderstand how you're using it.
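For completeness, here is a small NumPy sketch of the formula above (my own illustration, not from the original answer), assuming a unit quaternion stored as (w, x, y, z):

import numpy as np

def quat_mul(a, b):
    # Hamilton product of quaternions given as (w, x, y, z).
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return np.array([
        aw*bw - ax*bx - ay*by - az*bz,
        aw*bx + ax*bw + ay*bz - az*by,
        aw*by - ax*bz + ay*bw + az*bx,
        aw*bz + ax*by - ay*bx + az*bw,
    ])

def rotate_about(c, l, q):
    # C' = Q * (C - L) * Q^-1 + L, with the point as a pure quaternion (w = 0).
    # Q is assumed to be a unit quaternion, so Q^-1 is just its conjugate.
    d = np.concatenate(([0.0], np.asarray(c, float) - np.asarray(l, float)))
    q_conj = q * np.array([1.0, -1.0, -1.0, -1.0])
    return quat_mul(quat_mul(q, d), q_conj)[1:] + np.asarray(l, float)

# Example: rotate (1, 0, 0) by 90 degrees about the z-axis around the origin -> roughly (0, 1, 0).
q = np.array([np.cos(np.pi / 4), 0.0, 0.0, np.sin(np.pi / 4)])
print(rotate_about([1.0, 0.0, 0.0], [0.0, 0.0, 0.0], q))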