I am doing a DNN regression analysis with TensorFlow's DNNRegressor, but I don't know how to tune the parameters to get a good neural network model.
regressor = tf.contrib.learn.DNNRegressor(
    feature_columns=feature_cols,
    # hidden_units=[10, 128],           # loss: 42.252525
    # hidden_units=[50, 320],           # loss: 7.66
    # hidden_units=[50, 640],           # loss: 22.162941
    # hidden_units=[100, 640],          # loss: 5.249118
    # hidden_units=[100, 320],          # loss: 6.54
    # hidden_units=[300, 640],          # loss: 41.01174
    # hidden_units=[300, 896],          # loss: 17.183
    # hidden_units=[50, 100, 640],      # loss: 17.760363
    # hidden_units=[50, 320, 640],      # loss: 16.38122
    # hidden_units=[50, 320, 128, 50],  # loss: 52.36839
    # hidden_units=[640, 100],          # loss: 53
    hidden_units=[100, 320, 640],       # loss: 22.162941
    model_dir='./models/dnnregressor',
    weight_column_name=None,
    optimizer=None,
    activation_fn=tf.nn.relu,
    dropout=None,
    gradient_clip_norm=None,
    enable_centered_bias=False,
    config=config,
    feature_engineering_fn=None,
    label_dimension=4,
    embedding_lr_multipliers=None,
    input_layer_min_slice_size=None)
My dataset looks like this.
df = conv2_dataframe(CmdS_X=CmdS_X, CmdS_Y=CmdS_Y, CmdS_Z=CmdS_Z, CmdV=CmdV, halfV=halfV,
ActS_X=ActS_X, ActS_Y=ActS_Y, ActS_Z=ActS_Z, ActV=ActV)
labels = ['ActS_X', 'ActS_Y', 'ActS_Z', 'ActV']
dnnRegressor(df, labels)
shape: (12686, 9)
Data description diagram
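(The question does not show how feature_cols is built. Assuming the five command columns are the inputs and the four ActS_*/ActV columns are the labels, which would match label_dimension=4, a minimal sketch with the TF 1.x contrib API would be:)
# Sketch only: feature_cols is not defined in the question; column names taken from conv2_dataframe above.
FEATURES = ['CmdS_X', 'CmdS_Y', 'CmdS_Z', 'CmdV', 'halfV']
feature_cols = [tf.contrib.layers.real_valued_column(k) for k in FEATURES]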
There is no simple answer; it depends on your data characteristics and the exact problem. This problem is generally referred to as "hyper-parameter tuning".
If you want a short answer: do a proper training/validation/test split and use whichever parameters give the best results on your validation set, keeping the test set for a final, unbiased evaluation.
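As a minimal sketch of that workflow (X and y stand for the feature and label arrays, build_regressor is a hypothetical helper wrapping DNNRegressor, and the split fractions and candidate layer sizes are only examples):
import numpy as np
from sklearn.model_selection import train_test_split

# 60/20/20 split into train, validation and test sets.
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

best_loss, best_units = np.inf, None
for units in ([50, 320], [100, 320], [100, 640]):    # candidate architectures
    model = build_regressor(hidden_units=units)      # hypothetical wrapper around DNNRegressor
    model.fit(X_train, y_train)
    loss = model.evaluate(X_val, y_val)              # select on the validation set...
    if loss < best_loss:
        best_loss, best_units = loss, units

final_model = build_regressor(hidden_units=best_units)
final_model.fit(np.concatenate([X_train, X_val]), np.concatenate([y_train, y_val]))
print(final_model.evaluate(X_test, y_test))          # ...and report once on the test set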
Related
I made a multi-input model in Keras which takes images of shape [N, 640, 480, 3] as well as numerical data of shape [N, 19] and predicts 12 classes.
Here is the model-definition part of the code:
# %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
# MODEL === CNN
# %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
base_model = keras.applications.ResNet50(
weights='imagenet', # Load weights pre-trained on ImageNet.
input_shape=(640, 480, 3),
include_top=False) # Do not include the ImageNet classifier at the top.
base_model.trainable = False
input_Cnn = keras.Input(shape=(640, 480, 3))
x = base_model(input_Cnn, training=False)
# Convert features of shape `base_model.output_shape[1:]` to vectors
x = keras.layers.GlobalAveragePooling2D()(x)
# A dense hidden layer on top of the pooled features
x1 = keras.layers.Dense(1024, activation="relu")(x)
out_Cnn = keras.layers.Dense(12, activation="relu")(x1)
# %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
# MODEL === NN
# %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
inp_num = keras.layers.Input(shape=(19,)) # no. of columns of the numerical data
fc1 = keras.layers.Dense(units=2 ** 6, activation="relu")(inp_num)
fc2 = keras.layers.Dense(units=2 ** 8, activation="relu")(fc1)
fc3 = keras.layers.Dense(units=2 ** 10, activation="relu")(fc2)
fc4 = keras.layers.Dense(units=2 ** 8, activation="relu")(fc3)
fc5 = keras.layers.Dense(units=2 ** 6, activation="relu")(fc4)
out_NN = keras.layers.Dense(12, activation="relu")(fc5)
# %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
# CONCATENATION
# %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
result = keras.layers.concatenate((out_Cnn, out_NN), axis=-1) # [N, 12] --- concatenate [N, 12] ==> [N, 24]
result = keras.layers.Dense(1024, activation='relu')(result)
result = keras.layers.Dense(units=12, activation="softmax")(result)
model = keras.Model([input_Cnn, inp_num], result)
print(model.summary())
The problem is that the CNN part (if trained independently) converges in fewer epochs, while the ANN part (if trained independently) takes longer (more epochs). But when the two are combined in this code, accuracy doesn't go beyond 10%. Is there any way to stop gradients from flowing into the CNN part after a certain number of epochs, so that from then on the model trains only the ANN part?
I'm not using Keras, but after a quick Google search this should be the answer:
You can freeze layers so that certain parameters are no longer learnable:
# this freezes the first N layers
for layer in model.layers[:N]:
    layer.trainable = False
where N is the number of convolutional layers you have.
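To address the original question of stopping gradients into the CNN branch only after a certain number of epochs, the simplest pattern is two-stage training: train the whole model for a few epochs, then mark the CNN-branch layers as non-trainable, re-compile (required for the change to take effect), and continue training. A rough sketch, assuming the two Dense layers of the CNN branch were given names (e.g. name='cnn_fc', name='cnn_out') when the model was built; images, numeric_data, y and the epoch counts are placeholders:
# Stage 1: train everything for a few epochs.
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit([images, numeric_data], y, epochs=5)

# Stage 2: freeze the CNN branch (base_model is already frozen), re-compile,
# and keep training only the numeric branch and the classification head.
model.get_layer('cnn_fc').trainable = False
model.get_layer('cnn_out').trainable = False
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit([images, numeric_data], y, epochs=15)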
I want to create a tensor which is some kind of transformation matrix (a rotation matrix, for instance).
My model predicts 2 parameters: x1 and x2,
so the output is a tensor of shape (B, 2), where B is the batch size.
However, when I write my loss I have to know this B, since I want to iterate over it:
def get_rotation_tensor(x):
    roll_mat = K.stack([[[1, 0, 0],
                         [0, K.cos(x[i, 0]), -K.sin(x[i, 0])],
                         [0, K.sin(x[i, 0]), K.cos(x[i, 0])]] for i in range(BATCH_SIZE)])
    pitch_mat = K.stack([[[K.cos(x[i, 1]), 0, K.sin(x[i, 1])],
                          [0, 1, 0],
                          [-K.sin(x[i, 1]), 0, K.cos(x[i, 1])]] for i in range(BATCH_SIZE)])
    return K.batch_dot(pitch_mat, roll_mat)
The only solution I could think of is to pre-define BATCH_SIZE in advance, but is there a way to write a general loss function that will work for every batch size?
Thanks.
I found a solution
def get_rotation_tensor(x):
    ones = K.ones_like(x[:, 0])
    zeros = K.zeros_like(x[:, 0])
    roll_mat = K.stack([[ones, zeros, zeros],
                        [zeros, K.cos(x[:, 0]), -K.sin(x[:, 0])],
                        [zeros, K.sin(x[:, 0]), K.cos(x[:, 0])]])
    pitch_mat = K.stack([[K.cos(x[:, 1]), zeros, K.sin(x[:, 1])],
                         [zeros, ones, zeros],
                         [-K.sin(x[:, 1]), zeros, K.cos(x[:, 1])]])
    return K.batch_dot(K.permute_dimensions(pitch_mat, (2, 0, 1)),
                       K.permute_dimensions(roll_mat, (2, 0, 1)))
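For a quick sanity check of the shapes (a sketch only; assumes TF 2.x eager execution, and the dummy batch size of 7 is arbitrary):
import tensorflow as tf
from tensorflow.keras import backend as K

x = tf.random.uniform((7, 2))   # dummy batch of predicted angles; B is only known at runtime
rot = get_rotation_tensor(x)
print(rot.shape)                # expected: (7, 3, 3) -- one rotation matrix per sample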
Perhaps I'm not fully understanding your issue, but can't you just determine the batch size from the shape of the tensors passed into the loss function? Below is an example that shows the idea. I hope this helps.
# Install TensorFlow
try:
    # %tensorflow_version only exists in Colab.
    %tensorflow_version 2.x
except Exception:
    pass
import tensorflow as tf
print(tf.__version__)
print(tf.executing_eagerly())
# Setup repro section from Keras FAQ with TF1 to TF2 adjustments
import numpy as np
import random as rn
# The below is necessary for starting Numpy generated random numbers
# in a well-defined initial state.
np.random.seed(42)
# The below is necessary for starting core Python generated random numbers
# in a well-defined state.
rn.seed(12345)
# Force TensorFlow to use single thread.
# Multiple threads are a potential source of non-reproducible results.
# For further details, see: https://stackoverflow.com/questions/42022950/
session_conf = tf.compat.v1.ConfigProto(intra_op_parallelism_threads=1,
inter_op_parallelism_threads=1)
# The below tf.set_random_seed() will make random number generation
# in the TensorFlow backend have a well-defined initial state.
# For further details, see:
# https://www.tensorflow.org/api_docs/python/tf/set_random_seed
tf.compat.v1.set_random_seed(1234)
sess = tf.compat.v1.Session(graph=tf.compat.v1.get_default_graph(), config=session_conf)
tf.compat.v1.keras.backend.set_session(sess)
# Rest of code follows ...
# Custom Loss
def my_custom_loss(y_true, y_pred):
    tf.print('inside my_custom_loss:')
    tf.print('y_true:')
    tf.print(y_true)
    tf.print('y_true column 0:')
    tf.print(y_true[:, 0])
    tf.print('y_true column 1:')
    tf.print(y_true[:, 1])
    tf.print('y_pred:')
    tf.print(y_pred)
    # get length/batch size
    batch_size = tf.shape(y_pred)[0]
    tf.print('batch_size:')
    tf.print(batch_size)
    y_zeros = tf.zeros_like(y_pred)
    y_mask = tf.math.greater(y_pred, y_zeros)
    res = tf.boolean_mask(y_pred, y_mask)
    logres = tf.math.log(res)
    finres = tf.math.reduce_sum(logres)
    return finres
# Define model
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Dense(1, activation='linear', input_dim=1, name="Dense1"))
model.compile(optimizer='rmsprop', loss=my_custom_loss)
print('model.summary():')
print(model.summary())
# Generate dummy data
data = np.array([[2.0],[1.0],[1.0],[3.0],[4.0]])
labels = np.array([[[2.0],[1.0]],
[[0.0],[3.0]],
[[0.0],[3.0]],
[[0.0],[3.0]],
[[0.0],[3.0]]])
# Train the model.
print('training the model:')
print('-----')
model.fit(data, labels, epochs=1, batch_size=3)
print('done training the model.')
print(data.shape)
print(labels.shape)
Basically, I've tried this code:
np.random.seed(7)
dataframe = read_csv('c:/data/suicides.csv', usecols=[1], engine='python')
dataset = dataframe.values
dataset = dataset.astype('float32')
# normalize the dataset
scaler = MinMaxScaler(feature_range=(0, 1))
dataset = scaler.fit_transform(dataset)
# split into train and test sets
train_size = int(len(dataset) * 0.67)
test_size = len(dataset) - train_size
train, test = dataset[0:train_size,:], dataset[train_size:len(dataset),:]
# reshape into X=t and Y=t+1
look_back = 1
trainX, trainY = create_dataset(train, look_back)
testX, testY = create_dataset(test, look_back)
# reshape input to be [samples, time steps, features]
trainX = np.reshape(trainX, (trainX.shape[0], trainX.shape[1], 1))
testX = np.reshape(testX, (testX.shape[0], testX.shape[1], 1))
# Initialising the RNN
#regressor = Sequential()
model = Sequential()
# Adding the first LSTM layer and some Dropout regularisation
model.add(CuDNNLSTM(units = 10, return_sequences = True, input_shape = (trainX.shape[1], 1)))
model.add(Dropout(0.1))
# Adding a second LSTM layer and some Dropout regularisation
model.add(CuDNNLSTM(units = 5, return_sequences = True))
model.add(Dropout(0.1))
# Adding a third LSTM layer and some Dropout regularisation
model.add(CuDNNLSTM(units = 4, return_sequences = True))
model.add(Dropout(0.1))
# Adding a fourth LSTM layer and some Dropout regularisation
model.add(CuDNNLSTM(units = 2))
model.add(Dropout(0.2))
# Adding the output layer
model.add(Dense(units = 1))
# Compiling the RNN
model.compile(optimizer = 'adam', loss = 'mean_squared_error')
# epoch = [10, 15, 20, 25, 30, 35, 40, 45, 50]
# Fitting the RNN to the Training set
model.fit(trainX, trainY, epochs = _epoch, batch_size = _batch)
# make predictions
trainPredict = model.predict(trainX)
testPredict = model.predict(testX)
Unfortunately, even though the series contains long runs of zeros, the model never predicts zero.
The min/max of the train/test data starts at 0, but the minimum of the predicted values is always around 5-6.
The train/test data ranges from 0 to roughly 40.
I've tried different settings (number of epochs, activation, optimizer, loss), but the minimum predicted value always ends up above ~15% of the maximum training value.
For some reason, a batch size > 1 on this time series was producing minimum predictions that were too high.
I solved it by changing the batch settings and the model training settings.
So if your predictions never reach 0, there is probably something wrong with the training settings.
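As an illustration only (the answer does not give the exact settings used), re-fitting with a batch size of 1 and inverting the MinMax scaling before comparing with the raw series could look like this:
# Hypothetical re-fit; _epoch is assumed to be defined as in the question.
model.fit(trainX, trainY, epochs=_epoch, batch_size=1, shuffle=False)

# Invert the scaling before comparing predictions with the original 0-40 range.
trainPredict = scaler.inverse_transform(model.predict(trainX).reshape(-1, 1))
testPredict = scaler.inverse_transform(model.predict(testX).reshape(-1, 1))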
Hi, I used my own dataset to train the model, but I get the error mentioned below. My dataset has 124 classes with labels from 0 to 123, the images are 60*60 grayscale, the batch size is 10, and the result is:
lables.eval() --> [ 1 101 101 103 103 103 103 100 102 1] -- len(lables.eval())= 10
orginal pic size -- > (?, 60, 60, 1)
First convolutional layer (?, 30, 30, 32)
Second convolutional layer. (?, 15, 15, 64)
flatten. (?, 14400)
dense .1 (?, 2048)
dense .2 (?, 124)
Error:
tensorflow.python.framework.errors_impl.InvalidArgumentError: logits and
labels must have the same first dimension, got logits shape [40,124] and
labels shape [10]
Code:
def model_fn(features, labels, mode, params):
# Reference to the tensor named "image" in the input-function.
x = features["image"]
# The convolutional layers expect 4-rank tensors
# but x is a 2-rank tensor, so reshape it.
net = tf.reshape(x, [-1, img_size, img_size, num_channels])
# First convolutional layer.
net = tf.layers.conv2d(inputs=net, name='layer_conv1',
filters=32, kernel_size=3,
padding='same', activation=tf.nn.relu)
net = tf.layers.max_pooling2d(inputs=net, pool_size=2, strides=2)
# Second convolutional layer.
net = tf.layers.conv2d(inputs=net, name='layer_conv2',
filters=64, kernel_size=3,
padding='same', activation=tf.nn.relu)
net = tf.layers.max_pooling2d(inputs=net, pool_size=2, strides=2)
# Flatten to a 2-rank tensor.
net = tf.contrib.layers.flatten(net)
# Eventually this should be replaced with:
# net = tf.layers.flatten(net)
# First fully-connected / dense layer.
# This uses the ReLU activation function.
net = tf.layers.dense(inputs=net, name='layer_fc1',
units=2048, activation=tf.nn.relu)
# Second fully-connected / dense layer.
# This is the last layer so it does not use an activation function.
net = tf.layers.dense(inputs=net, name='layer_fc_2',
units=num_classes)
# Logits output of the neural network.
logits = net
y_pred = tf.nn.softmax(logits=logits)
y_pred_cls = tf.argmax(y_pred, axis=1)
if mode == tf.estimator.ModeKeys.PREDICT:
spec = tf.estimator.EstimatorSpec(mode=mode,
predictions=y_pred_cls)
else:
cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels,
                                                               logits=logits)
loss = tf.reduce_mean(cross_entropy)
optimizer = tf.train.AdamOptimizer(learning_rate=params["learning_rate"])
train_op = optimizer.minimize(
loss=loss, global_step=tf.train.get_global_step())
metrics = \
{
"accuracy": tf.metrics.accuracy(labels, y_pred_cls)
}
spec = tf.estimator.EstimatorSpec(
mode=mode,
loss=loss,
train_op=train_op,
eval_metric_ops=metrics)
return spec
These labels come from here via TFRecords:
def input_fn(filenames, train, batch_size=10, buffer_size=2048):
# Args:
# filenames: Filenames for the TFRecords files.
# train: Boolean whether training (True) or testing (False).
# batch_size: Return batches of this size.
# buffer_size: Read buffers of this size. The random shuffling
# is done on the buffer, so it must be big enough.
# Create a TensorFlow Dataset-object which has functionality
# for reading and shuffling data from TFRecords files.
dataset = tf.data.TFRecordDataset(filenames=filenames)
# Parse the serialized data in the TFRecords files.
# This returns TensorFlow tensors for the image and labels.
dataset = dataset.map(parse)
if train:
# If training then read a buffer of the given size and
# randomly shuffle it.
dataset = dataset.shuffle(buffer_size=buffer_size)
# Allow infinite reading of the data.
num_repeat = None
else:
# If testing then don't shuffle the data.
# Only go through the data once.
num_repeat = 1
# Repeat the dataset the given number of times.
dataset = dataset.repeat(num_repeat)
# Get a batch of data with the given size.
dataset = dataset.batch(batch_size)
# Create an iterator for the dataset and the above modifications.
iterator = dataset.make_one_shot_iterator()
# Get the next batch of images and labels.
images_batch, labels_batch = iterator.get_next()
# The input-function must return a dict wrapping the images.
x = {'image': images_batch}
y = labels_batch
print(x, ' - ', y.get_shape())
return x, y
I generate labels via this code; for example, image name = math-1 gives label = 1:
def get_lable_and_image(path):
lbl = []
img = []
for filename in glob.glob(os.path.join(path, '*.png')):
img.append(filename)
lable = filename[41:].split()[0].split('-')[1]
lbl.append(int(lable))
lables = np.array(lbl)
images = np.array(img)
# print(images[1], lables[1])
return images, lables
I pass the images and labels into this code to create the TFRecords:
def convert(image_paths, labels, out_path):
# Args:
# image_paths List of file-paths for the images.
# labels Class-labels for the images.
# out_path File-path for the TFRecords output file.
print("Converting: " + out_path)
# Number of images. Used when printing the progress.
num_images = len(image_paths)
# Open a TFRecordWriter for the output-file.
with tf.python_io.TFRecordWriter(out_path) as writer:
# Iterate over all the image-paths and class-labels.
for i, (path, label) in enumerate(zip(image_paths, labels)):
# Print the percentage-progress.
print_progress(count=i, total=num_images-1)
# Load the image-file using matplotlib's imread function.
img = imread(path)
# Convert the image to raw bytes.
img_bytes = img.tostring()
# Create a dict with the data we want to save in the
# TFRecords file. You can add more relevant data here.
data = \
{
'image': wrap_bytes(img_bytes),
'label': wrap_int64(label)
}
# Wrap the data as TensorFlow Features.
feature = tf.train.Features(feature=data)
# Wrap again as a TensorFlow Example.
example = tf.train.Example(features=feature)
# Serialize the data.
serialized = example.SerializeToString()
# Write the serialized data to the TFRecords file.
writer.write(serialized)
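(The parse function used in input_fn is not shown in the question. For completeness, a typical counterpart to convert would be the sketch below; the dtype passed to decode_raw has to match the dtype of the array written with img.tostring(), otherwise each record decodes to more values than expected, and a batch of 10 images can then become 40 rows after the reshape in model_fn.)
def parse(serialized):
    # Sketch only: assumes imread() returned uint8 pixels in convert().
    features = {
        'image': tf.FixedLenFeature([], tf.string),
        'label': tf.FixedLenFeature([], tf.int64)
    }
    parsed = tf.parse_single_example(serialized=serialized, features=features)
    image = tf.decode_raw(parsed['image'], tf.uint8)   # dtype must match what tostring() wrote
    image = tf.cast(image, tf.float32)
    label = tf.cast(parsed['label'], tf.int32)
    return image, label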
We ran into the strange problem that our relatively simple model converges on the CPU, but not on the server with GPU. No modifications to the code are done whatsoever between the two runs. Nor does the code contain any explicit conditional statements to change the workflow on different architectures.
What could possibly be the reason?
How can this tensorflow model converge on a CPU but not on a GPU?
In the likely event that the code is too long for you to read, we would still be thankful for general speculations and hints.
#!/usr/bin/python
from __future__ import print_function
import tensorflow as tf
import os
import numpy as np
import input_data # copy from tensorflow/examples/tutorials/mnist/input_data.py
# wget https://raw.githubusercontent.com/tensorflow/tensorflow/master/tensorflow/examples/tutorials/mnist/input_data.py if needed
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)
force_gpu = False
debug = True # histogram_summary ...
# _cpu='/cpu:0'
default_learning_rate=0.001
tensorboard_logs = '/tmp/tensorboard-logs/'
# $(sleep 5; open http://0.0.0.0:6006) & tensorboard --debug --logdir=/tmp/tensorboard-logs/
class net():
def __init__(self,model,data,name=0,learning_rate=default_learning_rate,batch_size=64):
self.session=sess=session=tf.Session()
self.model=model
self.data=data # assigned to self.x=net.input via train
self.batch_size=batch_size
self.layers=[]
self.last_width=self.input_width(data)
self.learning_rate=learning_rate
self.generate_model(model)
def generate_model(self,model, name=''):
if not model: return self
with tf.name_scope('state'):
self.keep_prob = tf.placeholder(tf.float32) # 1 for testing! else 1 - dropout
self.train_phase = tf.placeholder(tf.bool, name='train_phase')
self.global_step = tf.Variable(0) # dont set, feed or increment global_step, tensorflow will do it automatically
with tf.name_scope('data'):
n_input=28*28
n_classes=10
self.x = x = self.input = tf.placeholder(tf.float32, [None, n_input])
self.last_layer=x
self.y = y = self.target = tf.placeholder(tf.float32, [None, n_classes])
if not force_gpu: tf.image_summary("mnist", tf.reshape(self.x, [-1, 28, 28, 1], "mnist_images"))
with tf.name_scope('model'):
model(self)
if(self.last_width!=n_classes): self.classifier() # 10 classes auto
def input_width(self,data):
return 28*28
def add(self, layer):
self.layers.append(layer)
self.last_layer = layer
self.last_shape = layer.get_shape()
def reshape(self,shape):
self.last_layer = tf.reshape(self.last_layer,shape)
self.last_shape = shape
self.last_width = shape[-1]
def batchnorm(self):
from tensorflow.contrib.layers.python.layers import batch_norm as batch_norm
with tf.name_scope('batchnorm') as scope:
input = self.last_layer
train_op=batch_norm(input, is_training=True, center=False, updates_collections=None, scope=scope)
test_op=batch_norm(input, is_training=False, updates_collections=None, center=False,scope=scope, reuse=True)
self.add(tf.cond(self.train_phase,lambda:train_op,lambda:test_op))
# Fully connected layer
def dense(self, hidden=1024, depth=1, act=tf.nn.tanh, dropout=False, parent=-1): #
if parent==-1: parent=self.last_layer
shape = self.last_layer.get_shape()
if shape and len(shape)>2:
self.last_width= int(shape[1]*shape[2]*shape[3])
print("reshapeing ",shape,"to",self.last_width)
parent = tf.reshape(parent, [-1, self.last_width])
width = hidden
while depth>0:
with tf.name_scope('Dense_{:d}'.format(hidden)) as scope:
print("Dense ", self.last_width, width)
nr = len(self.layers)
# if self.last_width == width:
# M = closest_unitary(np.random.rand(self.last_width, width) / (self.last_width + width))
# weights = tf.Variable(m, name="weights_dense_" + str(nr))
# else:
weights = tf.Variable(tf.random_uniform([self.last_width, width], minval=-1. / width, maxval=1. / width), name="weights_dense")
bias = tf.Variable(tf.random_uniform([width],minval=-1./width,maxval=1./width), name="bias_dense")
dense1 = tf.matmul(parent, weights, name='dense_'+str(nr))+ bias
tf.histogram_summary('dense_'+str(nr),dense1)
tf.histogram_summary('weights_'+str(nr),weights)
tf.histogram_summary('bias_'+str(nr),bias)
tf.histogram_summary('dense_'+str(nr)+'/sparsity', tf.nn.zero_fraction(dense1))
tf.histogram_summary('weights_'+str(nr)+'/sparsity', tf.nn.zero_fraction(weights))
if act: dense1 = act(dense1)
# if norm: dense1 = self.norm(dense1,lsize=1) # SHAPE!
if dropout: dense1 = tf.nn.dropout(dense1, self.keep_prob)
self.layers.append(dense1)
self.last_layer = parent = dense1
self.last_width = width
depth=depth-1
self.last_shape=[-1,width] # dense
# Convolution Layer
def conv(self,shape,act=tf.nn.relu,pool=True,dropout=False,norm=True,name=None): # True why dropout bad in tensorflow??
with tf.name_scope('conv'):
print("input shape ",self.last_shape)
print("conv shape ",shape)
width=shape[-1]
filters=tf.Variable(tf.random_normal(shape))
# filters = tf.Variable(tf.random_uniform(shape, minval=-1. / width, maxval=1. / width), name="filters")
_bias=tf.Variable(tf.random_normal([shape[-1]]))
# # conv1 = conv2d('conv', _X, _weights, _bias)
conv1=tf.nn.bias_add(tf.nn.conv2d(self.last_layer,filter=filters, strides=[1, 1, 1, 1], padding='SAME'), _bias)
if debug: tf.histogram_summary('conv_' + str(len(self.layers)), conv1)
if act: conv1=act(conv1)
if pool: conv1 = tf.nn.max_pool(conv1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
if norm: conv1 = tf.nn.lrn(conv1, depth_radius=4, bias=1.0, alpha=0.001 / 9.0, beta=0.75)
if debug: tf.histogram_summary('norm_' + str(len(self.layers)), conv1)
if dropout: conv1 = tf.nn.dropout(conv1,self.keep_prob)
print("output shape ",conv1.get_shape())
self.add(conv1)
def classifier(self,classes=10): # Define loss and optimizer
with tf.name_scope('prediction'):# prediction
if self.last_width!=classes:
# print("Automatically adding dense prediction")
self.dense(hidden=classes, act= False, dropout = False)
# cross_entropy = -tf.reduce_sum(y_*y)
with tf.name_scope('classifier'):
y_=self.target
manual=False # True
if classes>100:
print("using sampled_softmax_loss")
y=prediction=self.last_layer
self.cost = tf.reduce_mean(tf.nn.sampled_softmax_loss(y, y_)) # for big vocab
elif manual:
# prediction = y =self.last_layer=tf.nn.softmax(self.last_layer)
# self.cost = cross_entropy = -tf.reduce_sum(y_ * tf.log(y+ 1e-10)) # against NaN!
prediction = y = tf.nn.log_softmax(self.last_layer)
self.cost = cross_entropy = -tf.reduce_sum(y_ * y)
else:
y = prediction = self.last_layer
self.cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(y, y_)) # prediction, target
# if not gpu:
tf.scalar_summary('cost', self.cost)
# self.cost = tf.Print(self.cost , [self.cost ], "debug cost : ")
learning_scheme=self.learning_rate
# learning_scheme=tf.train.exponential_decay(self.learning_rate, self.global_step, decay_steps, decay_size)
self.optimizer = tf.train.AdamOptimizer(learning_scheme).minimize(self.cost)
# Evaluate model
correct_pred = tf.equal(tf.argmax(prediction, 1), tf.argmax(self.target, 1))
self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
if not force_gpu: tf.scalar_summary('accuracy', self.accuracy)
# Launch the graph
def next_batch(self,batch_size=10):
return self.data.train.next_batch(batch_size)
def train(self,steps=-1,dropout=None,display_step=10,test_step=200): #epochs=-1,
steps = 9999999 if steps==-1 else steps
session=self.session
# with tf.device(_cpu):
# import tensorflow.contrib.layers as layers
# t = tf.verify_tensor_all_finite(t, msg)
tf.add_check_numerics_ops()
self.summaries = tf.merge_all_summaries()
self.summary_writer = tf.train.SummaryWriter(tensorboard_logs, session.graph) #
if not dropout:dropout=1. # keep all
x=self.x
y=self.y
keep_prob=self.keep_prob
session.run([tf.initialize_all_variables()])
step = 0 # show first
while step < steps:
# print("step %d \r" % step)# end=' ')
batch_xs, batch_ys = self.next_batch(self.batch_size)
# tf.train.shuffle_batch_join(example_list, batch_size, capacity=min_queue_size + batch_size * 16, min_queue_size)
# Fit training using batch data
feed_dict = {x: batch_xs, y: batch_ys, keep_prob: dropout, self.train_phase: True}
loss,_= session.run([self.cost,self.optimizer], feed_dict=feed_dict)
if step % test_step == 0: self.test(step)
if step % display_step == 0:
# Calculate batch accuracy, loss
feed = {x: batch_xs, y: batch_ys, keep_prob: 1., self.train_phase: False}
acc , summary = session.run([self.accuracy,self.summaries], feed_dict=feed)
# self.summary_writer.add_summary(summary, step) # only test summaries for smoother curve
print("\rStep {:d} Loss= {:.6f} Accuracy= {:.3f}".format(step,loss,acc),end=' ')
if str(loss)=="nan": return print("\nLoss gradiant explosion, exiting!!!") #restore!
step += 1
print("\nOptimization Finished!")
self.test(step,number=10000) # final test
def inputs(self,data):
self.inputs, self.labels = load_data()#...)
def test(self,step,number=400):#256
session=sess=self.session
run_metadata = tf.RunMetadata()
run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
# Calculate accuracy for 256 mnist test images
test_labels = self.data.test.labels[:number]
test_images = self.data.test.images[:number]
feed_dict = {self.x: test_images, self.y: test_labels, self.keep_prob: 1., self.train_phase:False}
accuracy,summary= self.session.run([self.accuracy, self.summaries], feed_dict=feed_dict)
# accuracy,summary = session.run([self.accuracy, self.summaries], feed_dict, run_options, run_metadata)
print('\t'*3+"Test Accuracy:",accuracy)
# self.summary_writer.add_run_metadata(run_metadata, 'step #%03d' % step)
self.summary_writer.add_summary(summary,global_step=step)
def dense(net): # best with lr ~0.001
# type: (layer.net) -> None
# net.batchnorm() # start lower, else no effect
# net.dense(400,act=None)# # ~95% we can do better:
net.dense(400, act=tf.nn.tanh)# 0.996 YAY only 0.985 on full set, Step 5000 flat
return # 0.957% without any model!!
def alex(net):
# type: (layer.net) -> None
print("Building Alex-net")
net.reshape(shape=[-1, 28, 28, 1]) # Reshape input pictures
# net.batchnorm()
net.conv([3, 3, 1, 64])
net.conv([3, 3, 64, 128])
net.conv([3, 3, 128, 256])
net.dense(1024,act=tf.nn.relu)
net.dense(1024,act=tf.nn.relu)
# net=layer.net(dense,data=mnist, learning_rate=0.01 )#,'mnist' baseline
_net=net(alex,data=mnist, learning_rate=0.001)#,'mnist'
_net.train(50000,dropout=0.6,display_step=1,test_step=10)
In general, floating point computations can be a little nondeterministic when it comes to adding many numbers (and some GPUs are buggy). Did you try retuning the hyperparameters (varying learning rates and so on) to account for this?
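For instance, in the posted code that retuning could start with something as simple as lowering the learning rate passed to the net constructor (the value below is only an example):
# Illustration only: same call as in the question, with a 10x lower learning rate.
_net = net(alex, data=mnist, learning_rate=0.0001)
_net.train(50000, dropout=0.6, display_step=1, test_step=10)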
I faced the same problem in the past. I solved it by following the tutorial at this link: https://www.tensorflow.org/install/pip#windows-native
Apparently, the conda installation of TensorFlow is not recommended.
As an extra tip, if you're working with a GPU, avoid TensorFlow 2.3; apparently it has some installation issues.
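After reinstalling, a quick way to confirm that TensorFlow actually sees the GPU (a sketch, assuming TF 2.x) is:
import tensorflow as tf
print(tf.__version__)
print(tf.config.list_physical_devices('GPU'))   # should list at least one GPU device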