I am currently training a neural network based on simulated data.
The standard architecture of the model is as follows:
Input layer (5 features)
Dense hidden layer (64 units)
Dense hidden layer (32 units)
Dense output layer (2 units)
To improve the model's performance, I implemented dropout for the Dense layers and tried different dropout rates. In my case, lower dropout rates (close to 10%) worked best.
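For reference, a minimal sketch of such a baseline, assuming the Keras Sequential API and ReLU activations (details not stated above); the 10% dropout rate is the hand-tuned value mentioned:
import tensorflow as tf

# Baseline sketch: 5 -> 64 -> 32 -> 2, with dropout as a fixed hyperparameter.
# ReLU activations are an assumption; 0.1 is the dropout rate that worked best.
baseline = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(5,)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dropout(0.1),
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.Dropout(0.1),
    tf.keras.layers.Dense(2),
])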
Now, I want to implement a Bayesian NN that learns the dropout rate itself, basically doing Variational Dropout. However, edward2's implementation of the DenseVariationalDropout layer does not implement the posterior distribution as a Bernoulli distribution, as done in the original dropout method, but (as far as I understand) as a Gaussian distribution N(mu, sd).
I already trained the BNN using the DenseVariationalDropout layer for the hidden layers (2 and 3) mentioned above.
However, when I print the learned weights/parameters of e.g. the first DenseVariationalDropout layer, the learned stddev is negative.
My question now is: how do I interpret the learned stddev parameters of the layer, and is there a way to reconstruct the dropout rate from them?
In the end, I want to compare the learned dropout rate to my previous experiments using the default Dropout layers where I input the dropout rate as a hyperparameter.
Here you can find the weights/parameter output of the second layer (the first hidden layer), showing only the results for the first input node; the other numbers look similar:
<tf.Variable 'dense_variational_dropout/kernel/mean:0' shape=(5, 64) dtype=float32, numpy=
array([[ 0.706813 , -0.70008826, 0.6587434 , 0.6841269 , 0.7477593 ,
-0.67054933, 0.6094059 , -0.66187423, -0.72358364, -0.71848536,
-0.72730917, -0.7543011 , -0.70441514, -0.6896457 , 0.6773513 ,
-0.7390508 , 0.71809 , 0.70859045, -0.7311058 , -0.6940158 ,
-0.6691812 , 0.7317551 , -0.71782017, -0.7040768 , 0.72268885,
0.686578 , -0.71496195, -0.74330693, 0.68203586, -0.71308035,
-0.7066761 , -0.7129957 , -0.7120187 , -0.7271574 , -0.6683003 ,
-0.69519144, 0.72729176, -0.7753829 , 0.727453 , -0.7636749 ,
-0.63404703, -0.69619894, 0.69890517, 0.69997096, 0.7178513 ,
-0.7171472 , 0.7051604 , 0.72245914, -0.75696194, 0.70270175,
-0.66893655, -0.7003819 , 0.7011036 , -0.7276705 , 0.7002035 ,
-0.7110728 , -0.7156996 , -0.7777113 , -0.7551749 , -0.7739159 ,
0.68988633, -0.6978364 , 0.6694619 , -0.6941327 ], ...
<tf.Variable 'dense_variational_dropout/kernel/stddev:0' shape=(5, 64) dtype=float32, numpy=
array([[-4.3544664, -4.369482 , -4.4166455, -4.3739653, -4.290909 ,
-4.410798 , -4.4918427, -4.408001 , -4.332697 , -4.329703 ,
-4.3302474, -4.2944455, -4.3497696, -4.376223 , -4.3921137,
-4.310383 , -4.3301587, -4.3463225, -4.3161445, -4.371665 ,
-4.3999367, -4.3134384, -4.355694 , -4.3571577, -4.3263173,
-4.374832 , -4.331994 , -4.308331 , -4.3812027, -4.3389053,
-4.3488293, -4.3432593, -4.335625 , -4.3250175, -4.410793 ,
-4.3587666, -4.3182616, -4.269835 , -4.3176365, -4.272719 ,
-4.4506063, -4.3802133, -4.3598847, -4.35998 , -4.33531 ,
-4.330978 , -4.3620105, -4.336859 , -4.2905126, -4.3617153,
-4.403421 , -4.350309 , -4.3518815, -4.3138742, -4.364904 ,
-4.344313 , -4.329627 , -4.2587113, -4.30761 , -4.2633414,
-4.369821 , -4.3550806, -4.4168577, -4.3864827], ...
Related
I'm implementing a DQN to do trading in the stock market (for educational purposes only).
I have this data and its shape. This is the state in a time series that I'm going to pass to a neural network. The first column is the closing price of a stock, and the second column is the volume (already normalized):
array([[[-0.39283217, 3.96508668],
[-0.39415516, 0.04931261],
[-0.38271683, -0.34029827],
[-0.39283217, -0.42384451],
[-0.4332384 , -0.11795849],
[-0.41201548, -0.47441503],
[-0.41739012, -0.51788375],
[-0.42210326, -0.60101319],
[-0.43660099, -0.596672 ],
[-0.43660099, -0.64244935]]])
(1, 10, 2)
Now I pass this data to a neural network. It's essentially a policy network, but to simplify the question I write it like this here (the loss is Q minus the target Q value):
model = keras.Sequential([
    keras.layers.Input(shape=(10, 2)),
    keras.layers.Dense(10, activation='relu'),
    keras.layers.Dense(3, activation='linear')
])
model.compile(loss=count_the_loss(),
              optimizer='adam',
              metrics='mse')
Now I get this by using the predict function:
array([[[-0.79352564, -0.22876596, 2.309589 ],
[-0.10996505, 0.01430818, 0.22286436],
[-0.17374574, 0.03645202, 0.10073717],
[-0.19824156, 0.07159233, 0.08594725],
[-0.12234195, 0.03734204, 0.19439939],
[-0.21589771, 0.088783 , 0.08315123],
[-0.22866695, 0.10703149, 0.07550874],
[-0.25188142, 0.1436682 , 0.05827002],
[-0.25386256, 0.13714936, 0.06612003],
[-0.26608405, 0.1581351 , 0.05540368]]], dtype=float32)
I'm supposed to get q(s,a1), q(s,a2), q(s,a3) (where a1, a2 and a3 stand for the actions short, flat and long respectively), and then find the q for the action sampled from the experience replay.
But now I get a 1x10x3 array.
My questions are:
How am I supposed to get the q?
And when this is done, it's time to find the target Q. It's similar to the process above. Suppose the above result is what I get by passing the next_state to a target network. I have to find the max q. How can I find the max q in a 1x10x3 array?
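For illustration, a minimal sketch of the kind of thing I think I need; the Flatten layer is just a guess on my side, and state/action below are placeholders:
import numpy as np
from tensorflow import keras

# Sketch: flatten the (10, 2) window so the output is one Q-vector of length 3
# per state (actions: short, flat, long).
model = keras.Sequential([
    keras.layers.Input(shape=(10, 2)),
    keras.layers.Flatten(),
    keras.layers.Dense(10, activation='relu'),
    keras.layers.Dense(3, activation='linear'),
])

state = np.zeros((1, 10, 2), dtype=np.float32)   # placeholder for a real state
action = 1                                       # placeholder sampled action

q = model.predict(state)                         # shape (1, 3) instead of (1, 10, 3)
q_of_action = q[0, action]                       # Q(s, a) for the sampled action
q_max = np.max(q, axis=-1)                       # max over actions, for the target Q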
I ran HAC (hierarchical agglomerative clustering) on a dataset containing 10 samples as follows:
X_train
>>array([[ 0.97699105, 0.22532681],
[-0.73247801, 0.60953553],
[-0.99434933, 0.03124842],
[-0.82325963, 0.57988328],
[ 0.50084964, -0.26616097],
[ 1.94969804, 0.42602413],
[ 1.0254459 , -0.54057545],
[-0.57115945, 0.8495053 ],
[ 1.39201222, -0.34835877],
[ 0.02372729, 0.52339387]])
Here is the result I get by applying HAC using the scipy library:
linkage(X_train, method='single')
>>array([[ 1. , 3. , 0.09550162, 2. ],
[ 7. , 10. , 0.2891525 , 3. ],
[ 6. , 8. , 0.41390592, 2. ],
[ 2. , 11. , 0.57469287, 4. ],
[ 4. , 12. , 0.59203425, 3. ],
[ 9. , 13. , 0.67840909, 5. ],
[ 0. , 14. , 0.6843032 , 4. ],
[15. , 16. , 0.92251969, 9. ],
[ 5. , 17. , 0.95429679, 10. ]])
Here is the resulting dendrogram
dendrogram(linkage(X_train, method='single'), labels=np.arange(X_train.shape[0]))
In the output matrix of the linkage(X_train, method='single'), the first two columns represent the children in our hierarchy.
I would like to know how these children are calculated.
For example :
the first fusion of the algorithm involves the singleton clusters containing points {1} and {3}, and as children we have [1, 3].
The second merge involves the previously formed cluster containing the points {1, 3} and the singleton cluster {7}, and as children we have [7, 10]. How was the value 10 obtained?
According to the docs, at the i-th iteration, clusters with indices Z[i, 0] and Z[i, 1] are combined to form cluster n+i, where n is the number of input samples and Z is the linkage matrix. https://docs.scipy.org/doc/scipy/reference/generated/scipy.cluster.hierarchy.linkage.html.
Thus 10 is just 10 + 0, where 10 is the total number of points (n) and 0 is the row (i) in which that cluster was formed.
In other words, all cluster indices i>=n actually refer to the cluster formed in Z[i - n].
If that's still unclear you can read the detailed description here https://joernhees.de/blog/2015/08/26/scipy-hierarchical-clustering-and-dendrogram-tutorial/#Perform-the-Hierarchical-Clustering
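A small sketch that makes this mapping explicit (using a random stand-in for X_train so it runs on its own): every index >= n in the first two columns is decoded back to the linkage row that created it.
import numpy as np
from scipy.cluster.hierarchy import linkage

X_train = np.random.rand(10, 2)      # stand-in for the 10 x 2 array in the question
Z = linkage(X_train, method='single')
n = X_train.shape[0]                 # 10 original points

for i, (a, b, dist, size) in enumerate(Z):
    a, b = int(a), int(b)
    a_desc = f"point {a}" if a < n else f"cluster from row {a - n}"
    b_desc = f"point {b}" if b < n else f"cluster from row {b - n}"
    print(f"row {i}: merges {a_desc} and {b_desc} -> new cluster index {n + i}")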
Why are there negative numbers when I create the array below?
a12 = np.random.randn(3, 5)
a12
Output:
array([[-1.43586215, 1.16316375, 0.01023306, -0.98150865, 0.46210347],
[ 0.1990597 , -0.60021688, 0.06980208, -0.3853136 , 0.11351735],
[ 0.66213067, 1.58601682, -1.2378155 , 2.13303337, -1.9520878 ]])
np.random.randn() draws samples from the standard normal distribution, i.e. N(0, 1). Passing in the dimensions returns an array of the given shape, i.e. np.random.randn(3, 5) will return an array of shape (3, 5) with all elements drawn from the standard normal distribution. Hence we can get negative numbers, and in fact any real number.
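A quick check of this, plus an alternative if only non-negative values are wanted (np.random.rand samples uniformly from [0, 1)):
import numpy as np

samples = np.random.randn(100_000)   # draws from the standard normal N(0, 1)
print((samples < 0).mean())          # roughly 0.5: about half the draws are negative

print(np.random.rand(3, 5))          # uniform on [0, 1), so never negative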
I am trying to create a sequential model which classifies random groups of vectors into classes. The model consistently classifies all groups into the same class.
creating data:
Each news item consists of 200 random vectors with a dimension of 300.
I want the model to classify each news group into a class.
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import Input, Dense, SimpleRNN

allnews = []
for j in range(50):
    news = []
    for i in range(200):
        news.append(np.random.random(300))
    allnews.append(np.array(news))

#allnews = tf.convert_to_tensor(allnews)
allnews = np.array(allnews)
print(np.shape(allnews))
allnews = allnews.reshape((allnews.shape[0], allnews.shape[1], 300))
print(np.shape(allnews))

lables = []
for j in range(20):
    lables.append(0)
for j in range(20):
    lables.append(1)
for d in range(10):
    lables.append(2)
lables = tf.convert_to_tensor(lables)
print(lables)
creating the model:
The model I am trying to create:
YourSequenceLenght = 200

model = tf.keras.Sequential()
model.add(Input(shape=(YourSequenceLenght, 300)))
model.add(Dense(300, use_bias=False, kernel_initializer='random_normal',
                kernel_regularizer=tf.keras.regularizers.l1(0.01), activation="linear"))
model.add(SimpleRNN(1, return_sequences=False, kernel_initializer='random_normal',
                    kernel_regularizer=tf.keras.regularizers.l1(0.01), use_bias=False,
                    recurrent_regularizer=tf.keras.regularizers.l1(0.01), activation="sigmoid"))
model.add(Dense(3, use_bias=False, kernel_initializer='random_normal',
                kernel_regularizer=tf.keras.regularizers.l1(0.01), activation="softmax"))
model.summary()

METRICS = [
    keras.metrics.TruePositives(name='tp'),
    keras.metrics.FalsePositives(name='fp'),
    keras.metrics.TrueNegatives(name='tn'),
    keras.metrics.FalseNegatives(name='fn'),
    keras.metrics.BinaryAccuracy(name='accuracy'),
    keras.metrics.Precision(name='precision'),
    keras.metrics.Recall(name='recall'),
    keras.metrics.AUC(name='auc'),
]

model.compile(optimizer='sgd', loss='categorical_crossentropy', metrics=METRICS)
training and predicting:
print(lables)
lables = keras.utils.to_categorical(y=lables, num_classes=3)
# y_train = np_utils.to_categorical(y=y_train, num_classes=10)
print(lables)

history = model.fit(allnews, lables, epochs=10)
res = model.predict(allnews)
print(np.shape(res))

import operator
for r in res:
    index, value = max(enumerate(r), key=operator.itemgetter(1))
    print(index)
    print(value)

for r in res:
    print(r)
the output from the print statements in the loops:
2
0.34069243
2
0.34070647
2
0.33907583
2
0.34005642
2
0.34013948
2
0.34007362
2
0.34028214
2
0.33997294
2
0.34018084
2
0.33995336
2
0.33998552
2
0.33882195
2
0.3401062
2
0.3418465
2
0.33978543
2
0.3396516
2
0.34062216
2
0.3419327
2
0.34114555
2
0.34119973
2
0.3404259
2
0.33981207
2
0.34035686
2
0.34139898
2
0.3398025
2
0.3391234
2
0.34051093
2
0.34120804
2
0.34140897
2
0.34064025
2
0.34133258
2
0.34019342
2
0.3404882
2
0.33930022
2
0.3416659
2
0.3406455
2
0.34054703
2
0.34057957
2
0.3391579
2
0.3395657
2
0.34069654
2
0.3400011
2
0.338789
2
0.34008256
2
0.34080264
2
0.34000066
2
0.340322
2
0.341806
2
0.34178147
2
0.34078327
EDIT:
clarification
I am trying to use a model which works as follows:
a sigmoid hidden layer (with recurrence) and a softmax projection
You are trying to learn something from random data. Your model is (randomly) initialized in such a way that it always predicts class 2, and the gradient updates don't steer the weights in any particular direction because the input is random, so they stay there. Try making your input data structured instead of random, e.g. random.random()*tf.one_hot(1, depth=200) for class 1, random.random()*tf.one_hot(2, depth=200) for class 2, and random.random()*tf.one_hot(3, depth=200) for class 3. Your values will still be random, but they will adhere to a structure.
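As a sketch of that suggestion (using depth=300 here so the one-hot signature matches the 300-dimensional feature vectors, rather than the depth=200 written above; adapt as needed):
import numpy as np
import tensorflow as tf

# Each class gets its own one-hot "signature"; the values stay random,
# but now there is structure in the input that correlates with the label.
def make_news(class_id, seq_len=200, dim=300):
    signature = tf.one_hot(class_id, depth=dim).numpy()
    return np.random.random((seq_len, dim)) * signature

allnews = np.array([make_news(c) for c in [0] * 20 + [1] * 20 + [2] * 10])
print(allnews.shape)   # (50, 200, 300)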
EDIT:
I took a look at your colab:
1) You can speed up the dataset construction by adding .numpy() after the tf.one_hot call, e.g. tf.one_hot(1, depth=200).numpy().
2) When I changed the model to:
model = tf.keras.Sequential()
model.add(Input(shape=(YourSequenceLenght, 300)))
model.add(tf.keras.layers.Flatten())
model.add(Dense(300, use_bias=False, kernel_initializer='random_normal',
                kernel_regularizer=tf.keras.regularizers.l1(0.01), activation="linear"))
# model.add(SimpleRNN(1, return_sequences=False, kernel_initializer='random_normal',
#                     kernel_regularizer=tf.keras.regularizers.l1(0.01), use_bias=False,
#                     recurrent_regularizer=tf.keras.regularizers.l1(0.01), activation="sigmoid"))
model.add(Dense(3, use_bias=False, kernel_initializer='random_normal',
                kernel_regularizer=tf.keras.regularizers.l1(0.01), activation="softmax"))
model.summary()
the accuracy quickly became 100% after 4 epochs. I think it's because with only 1 output neuron in the SimpleRNN you can't encode enough information about which class it should be, at least not with just 1 Dense layer afterwards.
3) You are using BinaryAccuracy in your metrics, which doesn't make a lot of sense here. You can just use the normal accuracy (as a string) for the accuracy metric: metrics = ["accuracy", tf.keras.metrics.TruePositives(...), ...].
I want to pass a layer, say 9 x 1, through a kernel of size, say, 2 x 1.
Now what I want to do is convolve the following values together ->
1 and 2, 2 and 3, 4 and 5, 5 and 6, 7 and 8, 8 and 9
and then, of course, pad it.
What you can see from this example is that I am trying to make the stride in the width dimension follow the pattern ->
1, 2, 1, 2, 1, 2, ...
and after every '1' I want to pad the result so that in the end the size doesn't change.
Put simply, I want to slice the main matrix into smaller matrices along a dimension, pass each of them separately through conv2d layers, pad them, and then concatenate them again along the same dimension, but I want to do all this without actually cutting the matrix up. I hope you understand what I am trying to ask. Is it possible?
Edit: Sorry, I should have mentioned this: I am using TensorFlow and I am talking about the tf.nn.conv2d function.
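For illustration, a minimal sketch of the slice-convolve-pad-concat behaviour I am describing, done purely with reshapes, assuming TensorFlow 2 and using a fixed averaging kernel only as a stand-in for a learned filter (the 9-vector is viewed as 3 groups of 3, so a 1 x 2 VALID convolution hits exactly the pairs 1 and 2, 2 and 3, 4 and 5, 5 and 6, 7 and 8, 8 and 9); whether this is the right way to do it is exactly my question:
import numpy as np
import tensorflow as tf

x = tf.constant(np.arange(1.0, 10.0, dtype=np.float32).reshape(1, 1, 9, 1))

# View the length-9 axis as 3 groups of 3; nothing is copied or cut, just reshaped.
x_grouped = tf.reshape(x, [1, 3, 3, 1])

# 1 x 2 kernel (averaging here, purely as a stand-in for a learned filter).
kernel = tf.constant([[[[0.5]], [[0.5]]]])                         # shape (1, 2, 1, 1)

y = tf.nn.conv2d(x_grouped, kernel, strides=1, padding='VALID')    # (1, 3, 2, 1)
y = tf.pad(y, [[0, 0], [0, 0], [0, 1], [0, 0]])                    # pad each group back to 3
y = tf.reshape(y, [1, 1, 9, 1])                                    # original layout, same size
print(y.numpy().reshape(-1))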