TensorFlow XOR implementation fails to achieve 100% accuracy - tensorflow

I am a newbie in machine learning and TensorFlow. I am trying to implement an XOR gate in TensorFlow and have come up with this code.
import numpy as np
import tensorflow as tf

tf.reset_default_graph()

learning_rate = 0.01
n_epochs = 1000
n_inputs = 2
n_hidden1 = 2
n_outputs = 2

arr1, target = [[0, 0], [0, 1], [1, 0], [1, 1]], [0, 1, 1, 0]
X_data = np.array(arr1).astype(np.float32)
y_data = np.array(target).astype(np.int)

X = tf.placeholder(tf.float32, shape=(None, n_inputs), name="X")
y = tf.placeholder(tf.int64, shape=(None), name="y")

with tf.name_scope("dnn_tf"):
    hidden1 = tf.layers.dense(X, n_hidden1, name="hidden1", activation=tf.nn.relu)
    logits = tf.layers.dense(hidden1, n_outputs, name="outputs")

with tf.name_scope("loss"):
    xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits)
    loss = tf.reduce_mean(xentropy, name="loss")

with tf.name_scope("train"):
    optimizer = tf.train.MomentumOptimizer(learning_rate, momentum=0.9)
    training_op = optimizer.minimize(loss)

with tf.name_scope("eval"):
    correct = tf.nn.in_top_k(logits, y, 1)
    accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))

init = tf.global_variables_initializer()

with tf.Session() as sess:
    init.run()
    for epoch in range(n_epochs):
        # train and evaluate on the full 4-sample XOR set each epoch
        sess.run(training_op, feed_dict={X: X_data, y: y_data})
        acc_train = accuracy.eval(feed_dict={X: X_data, y: y_data})
        if epoch % 100 == 0:
            print("Epoch: ", epoch, " Train Accuracy: ", acc_train)
The code runs fine, but I am getting different outputs on each run.
Run 1
Epoch: 0 Train Accuracy: 0.75
Epoch: 100 Train Accuracy: 1.0
Epoch: 200 Train Accuracy: 1.0
Epoch: 300 Train Accuracy: 1.0
Epoch: 400 Train Accuracy: 1.0
Epoch: 500 Train Accuracy: 1.0
Epoch: 600 Train Accuracy: 1.0
Epoch: 700 Train Accuracy: 1.0
Epoch: 800 Train Accuracy: 1.0
Epoch: 900 Train Accuracy: 1.0
Run 2
Epoch: 0 Train Accuracy: 1.0
Epoch: 100 Train Accuracy: 0.75
Epoch: 200 Train Accuracy: 0.75
Epoch: 300 Train Accuracy: 0.75
Epoch: 400 Train Accuracy: 0.75
Epoch: 500 Train Accuracy: 0.75
Epoch: 600 Train Accuracy: 0.75
Epoch: 700 Train Accuracy: 0.75
Epoch: 800 Train Accuracy: 0.75
Epoch: 900 Train Accuracy: 0.75
Run 3
Epoch: 0 Train Accuracy: 1.0
Epoch: 100 Train Accuracy: 0.5
Epoch: 200 Train Accuracy: 0.5
Epoch: 300 Train Accuracy: 0.5
Epoch: 400 Train Accuracy: 0.5
Epoch: 500 Train Accuracy: 0.5
Epoch: 600 Train Accuracy: 0.5
Epoch: 700 Train Accuracy: 0.5
Epoch: 800 Train Accuracy: 0.5
Epoch: 900 Train Accuracy: 0.5
I am unable to understand what I am doing wrong here and why my solution is not converging.

In theory it's possible to solve XOR with one hidden layer of two ReLU units, as you have in your code. However, there is always a crucial difference between a network being able to represent a solution and being able to learn it. I would assume that, because the network is so small, you run into the "dead ReLU" problem: due to unfortunate random initialization, one (or both) of your hidden units never activates for any input. Unfortunately, ReLU also has zero gradient where its activation is zero, so a unit that never activates cannot learn anything either.
Increasing the number of hidden units makes it less likely that this happens (i.e. you can have three dead units and the other two will still be enough to solve the problem), which could explain why you are more successful with five hidden units.
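For example, increasing n_hidden1 and/or switching the hidden activation makes the dead-ReLU failure much less likely. A minimal sketch of such a change, reusing X and n_outputs from the code above (five units and tanh are shown purely as an illustration, not a prescription):

# Illustrative tweak: more hidden units and/or a saturating activation such as
# tanh make it far less likely that the whole hidden layer is "dead" at init.
n_hidden1 = 5  # instead of 2

with tf.name_scope("dnn_tf"):
    hidden1 = tf.layers.dense(X, n_hidden1, name="hidden1",
                              activation=tf.nn.tanh)  # or keep relu with more units
    logits = tf.layers.dense(hidden1, n_outputs, name="outputs")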

You might want to check out the interactive TensorFlow Playground. They have an XOR dataset available. You can play around with the number of hidden layers, layer sizes, activation functions, etc., and visualise the decision boundaries the classifier learns as the number of epochs increases.

Related

Negative binomial, Poisson-gamma mixture in WinBUGS

WinBUGS trap error
model
{
  for (i in 1:5323) {
    Y[i] ~ dpois(mu[i])         # NB model as a Poisson-gamma mixture
    mu[i] ~ dgamma(b[i], a[i])  # NB model as a Poisson-gamma mixture
    a[i] <- b[i] / Emu[i]
    b[i] <- B * X[i]
    Emu[i] <- beta0 * pow(X[i], beta1)  # model equation
  }
  # Priors
  beta0 ~ dunif(0, 10)  # parameter
  beta1 ~ dunif(0, 10)  # parameter
  B ~ dunif(0, 10)      # over-dispersion parameter
}
X[] Y[]
1.5 0
2.9 0
1.49 0
0.39 0
3.89 0
2.03 0
0.91 0
0.89 0
0.97 0
2.16 0
0.04 0
1.12 1
2.26 0
3.6 1
1.94 0
0.41 1
2 0
0.9 0
0.9 0
0.9 0
0.1 0
0.88 1
0.91 0
6.84 2
3.14 3
End
This is just a sample of the data. The model comes from Ezra Hauer, The Art of Regression Modeling in Road Safety, section 8.3.2, and it produces the error **undefined real result**.
The aim is a fully Bayesian, one-step model that does not use empirical Bayes.
The results should be similar to the MLE estimates, where beta0 is 1.65, beta1 is 0.871, and the over-dispersion parameter is 0.531.
X is the only covariate and Y is the observed collision count, so X cannot be zero or negative, while Y cannot be lower than zero. If the model is fitted as a Poisson-gamma mixture using maximum likelihood it can be estimated, so how can I make this model work and resolve this error in WinBUGS?
The data are in Excel, and the model worked fine when I selected only the largest 1,000 observations.
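As a first sanity check (a minimal sketch in Python/pandas, assuming the Excel file is named collisions.xlsx with columns X and Y; both names are placeholders), one can verify that the full dataset actually satisfies these constraints before loading it into WinBUGS:

import pandas as pd

# Hypothetical file and column names; adjust to the actual Excel sheet.
df = pd.read_excel("collisions.xlsx")

bad_x = df[df["X"] <= 0]                          # X must be strictly positive
bad_y = df[(df["Y"] < 0) | (df["Y"] % 1 != 0)]    # Y must be a non-negative integer
print(len(bad_x), "rows with non-positive X")
print(len(bad_y), "rows with negative or non-integer Y")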

TensorFlow linear regression task - very high loss problem

I'm trying to build a linear regression model on my own.
import numpy as np
import tensorflow as tf

# Create features
X = np.array([-7.0, -4.0, -1.0, 2.0, 5.0, 8.0, 11.0, 14.0])
# Create labels
y = np.array([3.0, 6.0, 9.0, 12.0, 15.0, 18.0, 21.0, 24.0])

model = tf.keras.Sequential([
    tf.keras.layers.Dense(50, activation="elu", input_shape=[1]),
    tf.keras.layers.Dense(1)
])

model.compile(loss="mae",
              optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),
              metrics=["mae"])

model.fit(X, y, epochs=150)
When I train with the above X and y data, the loss value starts from a normal value.
experience salary
0 0 2250
1 1 2750
2 5 8000
3 8 9000
4 4 6900
5 15 20000
6 7 8500
7 3 6000
8 2 3500
9 12 15000
10 10 13000
11 14 18000
12 6 7500
13 11 14500
14 12 14900
15 3 5800
16 2 4000
But when I use a dataset like the one above, the initial loss value starts at 800 (with the same model, by the way).
What could be the reason for this?
Your learning rate is too high. You should opt for a much lower initial learning rate, such as 0.0001 or 0.00001.
Otherwise, you are already using the 'linear' activation on the last layer (the default) and an appropriate loss function and metric. Also note that when batch_size is not specified explicitly it defaults to 32.
UPDATE: as determined by the author of the question, underfitting was also fundamental to the problem; adding several more layers helped solve it.
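Putting those pieces together, a minimal sketch of the adjusted setup might look like the following (the extra layer and the exact sizes are illustrative, not the precise architecture the question's author ended up with):

import numpy as np
import tensorflow as tf

# Data from the question's table (years of experience vs. salary).
X = np.array([0, 1, 5, 8, 4, 15, 7, 3, 2, 12, 10, 14, 6, 11, 12, 3, 2], dtype=np.float32)
y = np.array([2250, 2750, 8000, 9000, 6900, 20000, 8500, 6000, 3500, 15000,
              13000, 18000, 7500, 14500, 14900, 5800, 4000], dtype=np.float32)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(50, activation="elu", input_shape=[1]),
    tf.keras.layers.Dense(50, activation="elu"),  # extra capacity against underfitting
    tf.keras.layers.Dense(1)                      # linear output for regression
])
model.compile(loss="mae",
              optimizer=tf.keras.optimizers.Adam(learning_rate=0.0001),  # much lower LR
              metrics=["mae"])
model.fit(X, y, epochs=150)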

Federated Averaging (FedAvg) with ResNet-18 that has batch normalization makes the same prediction after the first round, but not in any other round

I was trying to implement TensorFlow Federated's simple fedavg with the CIFAR-10 dataset and ResNet-18 (there is also a PyTorch implementation of the same setup). Just like the trainable parameters, I aggregated the non-trainable parameters of batch normalization on the server and averaged them. I used 5 clients, and the dataset was divided into 5 randomly, 50k/5 = 10k training samples per client, so there is no grossly skewed distribution. After training, I tested each client with the full test dataset (10k samples) that I also use to test the server. The problem is that after the first training round, despite each client having 20-25% accuracy, the server has 10% accuracy and makes essentially the same prediction for every input. This is only the case for the first round, since after that the server almost always has better accuracy than any client had in that round. For example:
Round 0 training loss: 3.0080783367156982
Round 0 client_id: 0 eval_score: 0.2287999987602234
Round 0 client_id: 1 eval_score: 0.2614000141620636
Round 0 client_id: 2 eval_score: 0.22040000557899475
Round 0 client_id: 3 eval_score: 0.24799999594688416
Round 0 client_id: 4 eval_score: 0.2565999925136566
Round 0 validation accuracy: 10.0
Round 1 training loss: 1.920640230178833
Round 1 client_id: 0 eval_score: 0.25220000743865967
Round 1 client_id: 1 eval_score: 0.32199999690055847
Round 1 client_id: 2 eval_score: 0.32580000162124634
Round 1 client_id: 3 eval_score: 0.3513000011444092
Round 1 client_id: 4 eval_score: 0.34689998626708984
Round 1 validation accuracy: 34.470001220703125
Round 2 training loss: 1.65810227394104
Round 2 client_id: 0 eval_score: 0.34369999170303345
Round 2 client_id: 1 eval_score: 0.3138999938964844
Round 2 client_id: 2 eval_score: 0.35580000281333923
Round 2 client_id: 3 eval_score: 0.39649999141693115
Round 2 client_id: 4 eval_score: 0.3917999863624573
Round 2 validation accuracy: 45.0
Round 3 training loss: 1.4956902265548706
Round 3 client_id: 0 eval_score: 0.46380001306533813
Round 3 client_id: 1 eval_score: 0.388700008392334
Round 3 client_id: 2 eval_score: 0.39239999651908875
Round 3 client_id: 3 eval_score: 0.43700000643730164
Round 3 client_id: 4 eval_score: 0.430400013923645
Round 3 validation accuracy: 50.62000274658203
Round 4 training loss: 1.3692104816436768
Round 4 client_id: 0 eval_score: 0.510200023651123
Round 4 client_id: 1 eval_score: 0.42739999294281006
Round 4 client_id: 2 eval_score: 0.4223000109195709
Round 4 client_id: 3 eval_score: 0.45080000162124634
Round 4 client_id: 4 eval_score: 0.45559999346733093
Round 4 validation accuracy: 54.83000183105469
To solve the issue with the first round I tried to repeat the dataset, but it didn't help. After that I tried to use all the CIFAR-10 training samples for each client, meaning that instead of creating 5 different datasets of 10k samples each, I used all 50k samples as the dataset for every client.
Round 0 training loss: 1.9335068464279175
Round 0 client_id: 0 eval_score: 0.4571000039577484
Round 0 client_id: 1 eval_score: 0.4514000117778778
Round 0 client_id: 2 eval_score: 0.4738999903202057
Round 0 client_id: 3 eval_score: 0.4560000002384186
Round 0 client_id: 4 eval_score: 0.4697999954223633
Round 0 validation accuracy: 10.0
Round 1 training loss: 1.4404207468032837
Round 1 client_id: 0 eval_score: 0.5945000052452087
Round 1 client_id: 1 eval_score: 0.5909000039100647
Round 1 client_id: 2 eval_score: 0.5864999890327454
Round 1 client_id: 3 eval_score: 0.5871999859809875
Round 1 client_id: 4 eval_score: 0.5684000253677368
Round 1 validation accuracy: 59.57999801635742
Round 2 training loss: 1.0174440145492554
Round 2 client_id: 0 eval_score: 0.7002999782562256
Round 2 client_id: 1 eval_score: 0.6953999996185303
Round 2 client_id: 2 eval_score: 0.6830999851226807
Round 2 client_id: 3 eval_score: 0.6682999730110168
Round 2 client_id: 4 eval_score: 0.6754000186920166
Round 2 validation accuracy: 72.41999816894531
Round 3 training loss: 0.7608759999275208
Round 3 client_id: 0 eval_score: 0.7621999979019165
Round 3 client_id: 1 eval_score: 0.7608000040054321
Round 3 client_id: 2 eval_score: 0.7390000224113464
Round 3 client_id: 3 eval_score: 0.7301999926567078
Round 3 client_id: 4 eval_score: 0.7303000092506409
Round 3 validation accuracy: 78.33000183105469
Round 4 training loss: 0.5893330574035645
Round 4 client_id: 0 eval_score: 0.7814000248908997
Round 4 client_id: 1 eval_score: 0.7861999869346619
Round 4 client_id: 2 eval_score: 0.7804999947547913
Round 4 client_id: 3 eval_score: 0.7694000005722046
Round 4 client_id: 4 eval_score: 0.758400022983551
Round 4 validation accuracy: 81.30000305175781
The clients obviously had the same initialization, but I guess due to GPU non-determinism there were some minor accuracy differences; still, each had 45+% accuracy. But as you can see, even this didn't help with the first round. When using a simple CNN, such as the one available in main.py, with suitable parameters, this problem doesn't exist. And using
learning_rate=0.01 or momentum=0
instead of
learning_rate=0.1 and momentum=0.9
reduces this problem for the first round, but it gives overall worse performance, and I am trying to reproduce a paper that used the latter parameters.
I have also tried the same with PyTorch and got very similar results (there is a Colab for the PyTorch code). The results for both are available on GitHub.
I am very confused by this, especially for the runs where I used the entire training dataset and each client had 45% accuracy. Also, why do I get good results in the following rounds? What changed between the first round and the others? Every time, the clients had the same initialization as each other, the same loss function, and the same optimizer with the same parameters. The only thing that changed between rounds is the actual initialization.
So is there a special initialization that solves this first-round problem, or am I missing something?
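For reference, the aggregation step I describe above amounts to something like the following framework-agnostic sketch (plain Keras/NumPy, not the exact TFF/PyTorch code in the repository): every variable, including the batch-norm moving mean and variance, is averaged element-wise across clients.

import numpy as np

def federated_average(client_models):
    # Average every weight tensor (trainable and non-trainable, i.e. including
    # the batch-norm moving statistics) element-wise across the clients.
    client_weights = [m.get_weights() for m in client_models]
    return [np.mean(np.stack(tensors, axis=0), axis=0)
            for tensors in zip(*client_weights)]

# server_model.set_weights(federated_average(client_models))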
Edit:
When the entire CIFAR-10 training set is used for each client and dataset.repeat is used to repeat the data:
Pre-training validation accuracy: 9.029999732971191
Round 0 training loss: 1.6472676992416382
Round 0 client_id: 0 eval_score: 0.5931000113487244
Round 0 client_id: 1 eval_score: 0.5042999982833862
Round 0 client_id: 2 eval_score: 0.5083000063896179
Round 0 client_id: 3 eval_score: 0.5600000023841858
Round 0 client_id: 4 eval_score: 0.6104999780654907
Round 0 validation accuracy: 10.0
What catches my attention here is that the client accuracy is actually very similar to the clients' second-round (round 1) accuracy when the dataset wasn't repeated (the previous results). So even though the server had 10% accuracy, it didn't affect the results of the next round much.
This is how it works with a simple CNN (defined in main.py on GitHub).
With the training set divided into 5:
Pre-training validation accuracy: 9.489999771118164
Round 0 training loss: 2.1234841346740723
Round 0 client_id: 0 eval_score: 0.30250000953674316
Round 0 client_id: 1 eval_score: 0.2879999876022339
Round 0 client_id: 2 eval_score: 0.2533999979496002
Round 0 client_id: 3 eval_score: 0.25999999046325684
Round 0 client_id: 4 eval_score: 0.2897999882698059
Round 0 validation accuracy: 31.18000030517578
With the entire training set for all the clients:
Pre-training validation accuracy: 9.489999771118164
Round 0 training loss: 1.636365532875061
Round 0 client_id: 0 eval_score: 0.47850000858306885
Round 0 client_id: 1 eval_score: 0.49470001459121704
Round 0 client_id: 2 eval_score: 0.4918000102043152
Round 0 client_id: 3 eval_score: 0.492900013923645
Round 0 client_id: 4 eval_score: 0.4043000042438507
Round 0 validation accuracy: 50.62000274658203
As we can see, when a simple CNN is used the server accuracy is better than the best client accuracy, and definitely better than the average, from the very first round. I am trying to understand why the ResNet fails to do that and makes the same predictions regardless of input. After the first round the predictions look like:
[[0.02677999 0.02175025 0.10807421 0.25275248 0.08478505 0.20601839
0.16497472 0.09307405 0.01779539 0.02399557]
[0.04087764 0.03603332 0.09987792 0.23636964 0.07425722 0.19982725
0.13649824 0.09779423 0.03454168 0.04392283]
[0.02448712 0.01900426 0.11061406 0.25295085 0.08886322 0.20792796
0.17296027 0.08762561 0.01570844 0.01985822]
[0.01790532 0.01536059 0.11237497 0.2519772 0.09357632 0.20954111
0.18946911 0.08571784 0.01004946 0.01402805]
[0.02116687 0.02263201 0.10294028 0.25523028 0.08544692 0.21299754
0.17604835 0.088608 0.01438032 0.02054946]
[0.01598492 0.01457187 0.10899033 0.25493488 0.09417254 0.20747423
0.19798534 0.08387674 0.0089481 0.01306108]
[0.01432306 0.01214803 0.11237216 0.25138852 0.09796435 0.2036258
0.20656979 0.08344456 0.00726837 0.01089529]
[0.01605278 0.0135905 0.11161591 0.25388476 0.09531546 0.20592561
0.19932476 0.08305667 0.00873495 0.01249863]
[0.02512863 0.0238647 0.10465285 0.24918261 0.08625458 0.21051233
0.16839236 0.09075507 0.01765386 0.02360307]
[0.05418856 0.05830322 0.09909651 0.20211859 0.07324574 0.18549475
0.11666768 0.0990423 0.05081367 0.06102907]]
They all return the same label (index 3, the column with probability around 0.25).
I guess loading model weights from a previously trained model would resolve the issue. See "How to initialize the model with certain weights?" for how to initialize the first-round model weights.
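A minimal sketch of that idea in plain Keras (the exact federated plumbing depends on the TFF version; create_resnet18 and the weight file name are hypothetical):

import tensorflow as tf

def build_initial_model(weights_path="pretrained_resnet18.h5"):
    # Build the same architecture the clients use and load weights saved from
    # a previously trained run, instead of starting from a random initialization.
    model = create_resnet18(num_classes=10)   # hypothetical constructor
    model.load_weights(weights_path)
    return model

# The returned weights can then be used as the server's round-0 model state.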

Which optimization techniques can I use for maximizing the sum of minimum distance of each point to other points in a unit hypercube?

Let's say I have the following unit hypercube with 9 points
My goal is to maximize the sum, over all points, of each point's minimum distance to the other points, i.e. maximize sum(i, min(j <> i, dist(i, j))).
In the image, Figure 1 is the original data, Figure 2 is computed using the function, and Figure 3 is the optimized function.
I want to know how can I reach to Figure 3 from Figure 1.
So far, I have tried using Simulated Annealing, but I am not able to do it in the correct way. Any other suggestions would be helpful!
You could model this as:
max sum(i, d[i])
d[i] ≤ sqrt( (x[i]-x[j])^2 + (y[i]-y[j])^2 )   for all j <> i
x[i], y[i] ∈ [0,1]
This is a non-convex problem and can be solved with a global solver such as Couenne or Baron. (Note: it will find good solutions quickly but proving global optimality is difficult and time-consuming).
This can also be attacked using a multi-start approach with a local solver (I used CONOPT in the test below). The algorithm would be:
bestobj = 0
for k = 1 to N (say N=50)
(x,y) = random points in [0,1]x[0,1]
solve NLP model
if obj > bestobj
save solution
bestobj = obj
end
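As a rough illustration, the same multi-start loop could be sketched in Python with SciPy's SLSQP as the local solver (this is an illustration only, not the GAMS/CONOPT model used for the results below):

import numpy as np
from scipy.optimize import minimize

n = 9
rng = np.random.default_rng(0)

def unpack(v):
    return v[:n], v[n:2*n], v[2*n:]              # x, y, d

def neg_obj(v):
    return -np.sum(unpack(v)[2])                 # maximize sum(i, d[i])

def make_constraints():
    cons = []
    for i in range(n):
        for j in range(n):
            if i != j:
                def g(v, i=i, j=j):
                    x, y, d = unpack(v)
                    return np.hypot(x[i] - x[j], y[i] - y[j]) - d[i]   # d[i] <= dist(i,j)
                cons.append({"type": "ineq", "fun": g})
    return cons

bounds = [(0.0, 1.0)] * (2 * n) + [(0.0, np.sqrt(2))] * n
best = None
for _ in range(50):                              # multi-start loop
    v0 = np.concatenate([rng.random(2 * n), np.zeros(n)])   # random points, d = 0
    res = minimize(neg_obj, v0, method="SLSQP",
                   bounds=bounds, constraints=make_constraints(),
                   options={"maxiter": 500})
    if res.success and (best is None or res.fun < best.fun):
        best = res

print("best objective:", -best.fun)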
Using both approaches (global solver, multistart approach) I get for 9 points:
---- VAR x x-coordinates
LOWER LEVEL UPPER MARGINAL
i1 . 0.5000 1.0000 EPS
i2 . 1.0000 1.0000 EPS
i3 . 0.5000 1.0000 EPS
i4 . . 1.0000 EPS
i5 . 0.5000 1.0000 EPS
i6 . . 1.0000 EPS
i7 . . 1.0000 EPS
i8 . 1.0000 1.0000 EPS
i9 . 1.0000 1.0000 EPS
---- VAR y y-coordinates
LOWER LEVEL UPPER MARGINAL
i1 . . 1.0000 EPS
i2 . . 1.0000 EPS
i3 . 0.5000 1.0000 EPS
i4 . 1.0000 1.0000 EPS
i5 . 1.0000 1.0000 EPS
i6 . 0.5000 1.0000 EPS
i7 . . 1.0000 EPS
i8 . 1.0000 1.0000 EPS
i9 . 0.5000 1.0000 EPS
---- VAR d min distances from point i
LOWER LEVEL UPPER MARGINAL
i1 . 0.5000 1.4142 EPS
i2 . 0.5000 1.4142 EPS
i3 . 0.5000 1.4142 EPS
i4 . 0.5000 1.4142 EPS
i5 . 0.5000 1.4142 EPS
i6 . 0.5000 1.4142 EPS
i7 . 0.5000 1.4142 EPS
i8 . 0.5000 1.4142 EPS
i9 . 0.5000 1.4142 EPS
LOWER LEVEL UPPER MARGINAL
---- VAR z -INF 4.5000 +INF .
z objective

How to use neural networks for new data?

I coded some neural networks, such as an image classifier, MNIST, and an NLP model, and got an accuracy of 98 percent on my GPU (NVIDIA GT 610). How can I feed new data (not training data) to my neural network and get the predictions?
Let's suppose:
Inputs Output
0 0 1 0
1 1 1 1
1 0 1 1
0 1 1 0
I got an accuracy of 98.7%. How can I give an input like [1, 1, 0] and predict the output? Is there any method in TensorFlow to do this?
If your output variable is y and your input placeholder is x:
sess.run(y, feed_dict={x: mnist.test.images})
See https://www.tensorflow.org/get_started/mnist/beginners
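For a single new example like the one in the question, a minimal sketch (assuming x and y are the trained graph's input placeholder and prediction op, and the session is still open) would be:

import numpy as np

new_input = np.array([[1, 1, 0]], dtype=np.float32)   # shape (1, 3): one new sample
prediction = sess.run(y, feed_dict={x: new_input})
print(prediction)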