I am following Andrew Ng's course on Deep Learning, in a programming assignment that uses the SIGN dataset. As far as I know, each image is 64 by 64 pixels in width and height, with a third dimension of 3 corresponding to the RGB channels.
According to the author, the value of:
n_x = num_px * num_px * 3 = 64 * 64 * 3 = 12288
and having the following data:
number of training examples = 1080
number of test examples = 120
X_train shape: (12288, 1080)
Y_train shape: (6, 1080)
The part that I do not understand is the weight initialization: the author says that the shape of W1 (an array of weights) is:
W1 : [25, 12288]
This part I do not get: why 25 as the number of rows? I understand that the number of columns corresponds to n_x, but what does the 25 refer to? Is it the number of neurons inside a hidden layer?
Thanks
It looks like 12288 is the number of input nodes and 25 is the number of nodes in the hidden layer.
Thus the number of weights is 25 * 12288 (each node in layer i is connected to each node in layer i+1), which gives the size of the matrix.
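A minimal sketch of the resulting shapes, assuming the layer sizes from the question (the variable names here are illustrative, not taken from the assignment):
import numpy as np

n_x = 64 * 64 * 3   # 12288 input features per image
n_h = 25            # neurons in the first hidden layer
m = 1080            # number of training examples

X = np.random.randn(n_x, m)             # one column per example
W1 = np.random.randn(n_h, n_x) * 0.01   # one row per hidden neuron
b1 = np.zeros((n_h, 1))

Z1 = W1 @ X + b1    # (25, 12288) @ (12288, 1080) -> (25, 1080)
print(Z1.shape)     # (25, 1080): one pre-activation per hidden neuron per example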
I am in the process of translating a Keras implementation to a PyTorch one. After the full conversion my model was not converging fast enough, although the loss did seem to be decreasing. As I was tracing back my steps, I noticed something a bit odd about my embedding layer. Let me explain the data:
I have 4 batches, each with a sequence length of 100, and a vocab size of 83. I am working with songs in ABC notation, so a song can contain 83 different symbols and is 100 symbols long.
So now I have an ndarray of shape (4, 100) which contains my 4 sequences of songs. Let's call it x.
Now if I pass x into an embedding layer in Keras:
tf.keras.layers.Embedding(83, 256, batch_input_shape=[4, None])(x).numpy()
I get a "narrower" set of values for each batch than I do in PyTorch; e.g. the minimum value in the first batch is -0.04999 and the maximum value is 0.04999. Does this affect my convergence?
Now if I pass the same x into my PyTorch embedding layer:
torch.nn.Embedding(4*100, 256)(torch.tensor(x)).detach().numpy()
I get a "wider" set of values for each batch. The maximum value is 3.3865 and the minimum value is -3.917.
My question is, should I be worried that this is a cause for my model not converging properly?
The two embedding layers are not initialized the same way, which is why you see different ranges even before any training. Keras's Embedding layer uses a uniform initializer in [-0.05, 0.05] by default, while PyTorch's nn.Embedding draws its weights from a standard normal distribution, so the much wider range you see in PyTorch is expected. On its own this should not prevent convergence, although you may need more training, or matching initializations, to compare the two fairly.
For example, you can follow the embedding with a Conv or LSTM layer that filters out what the later layers do not need, or see the game example below.
Embedding layer:
layer_1 = model.get_layer(name="embedding_layer")
### <keras.layers.embeddings.Embedding object at 0x000001AD42102A30>
print(layer_1)                          # <keras.layers.embeddings.Embedding object at ...>
print(layer_1.get_weights()[0].shape)   # (83, 256)
print('min: ' + str(np.min(layer_1.get_weights()[0])))  # min: -0.049991023
print('max: ' + str(np.max(layer_1.get_weights()[0])))  # max: 0.049998153
Output:
👉 the first time
<keras.layers.embeddings.Embedding object at 0x000001FA0BE74A30>
(83, 256)
min: -0.049991023
max: 0.049998153
👉 the second time
<keras.layers.embeddings.Embedding object at 0x00000214A1C34A30>
(83, 256)
min: -0.04999887
max: 0.049993087
👉 the third time
<keras.layers.embeddings.Embedding object at 0x00000283B20F3A30>
(83, 256)
min: -0.049999725
max: 0.049998928
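If you want the PyTorch layer to start in the same narrow range, you can re-initialize its weights to match Keras's default. A minimal sketch, assuming the vocab size of 83 and embedding dimension of 256 from the question:
import torch

emb = torch.nn.Embedding(83, 256)                 # default init: N(0, 1)
torch.nn.init.uniform_(emb.weight, -0.05, 0.05)   # match Keras's default uniform initializer
print(emb.weight.min().item(), emb.weight.max().item())  # both within [-0.05, 0.05]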
Sample of actions from limited inputs:
This demonstrates that the random actions work correctly with a few simple lines of code:
gameState = p.getGameState()
### {'player_x': 102, 'player_vel': 0.0, 'fruit_x': 30, 'fruit_y': -120}
player_x_array = gameState['player_x']
player_vel_array = gameState['player_vel']
fruit_x_array = gameState['fruit_x']
fruit_y_array = gameState['fruit_y']
### if the player's x is less than the fruit's x, go left
var_1 = player_x_array - fruit_x_array   ## right
var_2 = fruit_x_array - player_x_array   ## left
var_3 = fruit_y_array - (player_x_array - fruit_x_array)
print(str(var_1) + " " + str(var_2) + " " + str(var_3))
temp = tf.random.normal([len(posibility_actions)], 1, 0.2, tf.float32)  # noise ~ N(1, 0.2), one value per possible action
temp = np.asarray(temp) * np.asarray([ var_1, var_2, var_3 ])
temp = tf.nn.softmax(temp)
action = int(np.argmax(temp))
reward = p.act(posibility_actions[action])
print('random action: ' + str(posibility_actions[action]))
It should not be a problem once the values pass through multiple layers that filter out the information that is not needed; look at the inputs and outputs to see what task they generate.
I'm working on a simple classification problem. I proceeded through the example and created a model.
I arranged the label column as given below.
label 0 1 1 0 0 1
Then I wanted to test the system with samples, but it returns the value as a probability (a percentage).
I expect it to give one of two values, either 0 or 1.
Example code:
input_dict = {name: tf.convert_to_tensor([value]) for name, value in sample.items()}
predictions = reloaded_model.predict(input_dict)
prob = tf.nn.sigmoid(predictions[0])
print(
"This particular pet had a %.1f percent probability "
"of getting adopted." % (100 * prob)
)
What code will result in 0 or 1?
Thank you.
What to do depends on how your model was constructed. With only two labels you are doing binary classification. If the last dense layer in your model has 1 neuron, then it is set up for binary classification. In that case your loss function in model.compile should be
loss=BinaryCrossentropy
Model.predict in that case will produce a single probability value. You can just use an if statement to determine the class: if the probability is less than 0.5 it is one class, and if it is greater than or equal to 0.5 it is the other class. Alternatively, you may have constructed your model so that the last dense layer has 2 neurons. In that case your loss function should be either sparse_categorical_crossentropy if the labels were integers, or categorical_crossentropy if the labels were one-hot encoded. Model.predict will then produce two probabilities as output, and you want to select the index with the highest probability as the class.
You can do that with predicted_class = np.argmax(predictions) (note that class is a reserved word in Python and cannot be used as a variable name).
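A minimal sketch of both cases; the predictions value here is a stand-in for the output of reloaded_model.predict from the question:
import numpy as np
import tensorflow as tf

predictions = np.array([[0.8]])        # stand-in for reloaded_model.predict(input_dict)

# Binary case: last dense layer has 1 neuron, predictions[0] is a single logit
prob = tf.nn.sigmoid(predictions[0])   # probability of class 1
predicted_class = int(prob >= 0.5)     # 0 or 1
print(predicted_class)                 # 1

# Multiclass case: last dense layer has 2+ neurons, predictions[0] has one logit per class
# predicted_class = int(np.argmax(predictions[0]))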
https://keras.io/api/applications/#available-models
From the table given by Keras, we know Xception has 22,910,480 parameters in total, which is the number of weights and biases of the convolution and FC layers. How do we get the size of 88 MB from the number of parameters?
Every tf.float32 / tf.int32 value takes 4 bytes. So 22,910,480 parameters × 4 bytes ≈ 91.6 million bytes ≈ 87.4 MiB, which the table rounds to 88 MB. (There could also be some tf.float16 / tf.int16 values.)
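The arithmetic as a quick check, assuming every parameter is stored as a 4-byte float:
params = 22_910_480
size_bytes = params * 4        # 4 bytes per float32 parameter
print(size_bytes / 1e6)        # ~91.6 (megabytes, base 10)
print(size_bytes / 2**20)      # ~87.4 (mebibytes), reported as ~88 MB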
How can I force a certain dimensionality of the output of the conv2d_transpose layer? My problem is that I use it for upsampling and I want to match the dimensionality of my labels to the output of the NN. For example, if I have a feature map of shape Bx25x40xC, how can I make it Bx100x160xC (i.e. upsample by exactly 4x)?
It seems like the dimensions of the output can be calculated using
h = ((h_in - 1) * stride_h) + kernel_h - 2 * pad_h
w = ((w_in - 1) * stride_w) + kernel_w - 2 * pad_w
One can manipulate strides and kernels, but padding is controlled by the 'same'/'valid' algorithms, which, to my understanding, makes the padding (and hence the resulting output size) pretty much uncontrollable. For comparison, in Caffe one can at least force the padding explicitly in an attempt to match the desired output.
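One way out, sketched below under the assumption that Keras's Conv2DTranspose is acceptable: with padding='same', its output size is exactly input_size * stride, so strides=4 forces an exact 4x upsampling. (The lower-level tf.nn.conv2d_transpose also accepts an explicit output_shape argument.)
import tensorflow as tf

x = tf.zeros([1, 25, 40, 8])  # B x 25 x 40 x C
up = tf.keras.layers.Conv2DTranspose(
    filters=8, kernel_size=4, strides=4, padding='same')(x)
print(up.shape)  # (1, 100, 160, 8): exactly 4x upsampling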
I have the following model file from LIBSVM:
svm_type c_svc
kernel_type linear
nr_class 2
total_sv 3
rho 0.0666415
label 1 -1
nr_sv 2 1
SV
0.004439511653718091 1:4.5 2:0.5
0.07111595083031433 1:2 2:2
-0.07555546248403242 1:-0.5 2:-2.5
My question is how do I figure out the weight vector from this information?
The weights of the support vectors are the first numbers on each of the support vector lines (the last three). Despite using a linear kernel, libsvm is for general kernel SVMs, so it isn't storing a weight vector and bias explicitly.
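For a linear kernel you can recover the weight vector yourself: w = Σ coef_i · SV_i, where coef_i is the first number on each support vector line (it already includes the label sign), and the bias is b = -rho. A quick sketch with the numbers from the model file above:
import numpy as np

coef = np.array([0.004439511653718091, 0.07111595083031433, -0.07555546248403242])
svs = np.array([[4.5, 0.5],
                [2.0, 2.0],
                [-0.5, -2.5]])

w = coef @ svs      # sum of coef_i * SV_i -> roughly [0.2, 0.333]
b = -0.0666415      # bias is -rho
print(w, b)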
If you know you want a linear kernel, and you want that information, you can use liblinear (from the same folks as libsvm). Given this trivial data:
1 1:1 2:1
0 1:-1 2:-1
you can get this model, which has explicit weight and bias:
solver_type L2R_L2LOSS_SVC_DUAL
nr_class 2
label 1 0
nr_feature 2
bias -1
w
0.4327936
0.4327936
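With that model, prediction is just the sign of the dot product with w; in liblinear a bias value of -1 means no bias term was trained. A minimal sketch:
import numpy as np

w = np.array([0.4327936, 0.4327936])
x = np.array([1.0, 1.0])
decision = w @ x                   # no bias term, since the model file says "bias -1"
print(1 if decision > 0 else 0)    # per the model's "label 1 0" line: positive decision -> label 1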