I've been following this tutorial:
https://blog.altoros.com/using-linear-regression-in-tensorflow.html
I'm aware there's better ways to do linear regression, but I'm using this as a base to do multi-variate regression and multi-variate non-linear regression to try to understand TensorFlow.
Without normalizing my data at all, I get nan with GradientDescentOptimizer. I'm curious why. Why is normalization so important that the model won't run at all without it? And what is it about subtracting the mean and dividing by the standard deviation that suddenly makes it work so well?
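My current guess at the mechanism: for a single sample, the gradient of the squared-error cost is dL/dW = (W * x + b - y) * x, so each update multiplies the residual by roughly (1 - lr * x^2) (up to the 1/n factor in the cost). With raw sizes x ~ 2000 and lr = 0.05, that factor is in the thousands, so W overshoots with exploding magnitude until it overflows to inf and the cost becomes nan; after standardization, x is of order 1 and the same learning rate is stable. I'd appreciate confirmation that this is the right way to think about it.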
After normalizing data, I'd like to recover the original value.
Each set of data seems to be normalized separately with its own stddev and mean parameters: the training data X, training data Y, test data X, and test data Y.
However, when I run the network on new data, I assume I have to normalize the input again. If so, how do I make sense of the predicted Y? Am I supposed to use the training data's standard deviation and mean to unnormalize, or the new data's? I'm also confused about what the model is actually fitting when I give it normalized training data, and how to interpret W and b. I originally wanted to fit Y = mx + b, and I want to know what m and b really are.
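For reference, here is the algebra I believe applies (assuming the training set's statistics are the right ones to use). If x_n = (x - mean_x) / std_x and y_n = (y - mean_y) / std_y, then fitting y_n = W * x_n + b implies

y = (W * std_y / std_x) * x + (mean_y + std_y * b - W * std_y * mean_x / std_x)

so the original-scale slope would be m = W * std_y / std_x and the intercept would be the second term. But I'm not sure that's the correct convention.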
Because I trained on the training data, I assumed I would need to store the training data's pre-normalization standard deviation and mean and unnormalize any results from the network with those values. But in fact, when I unnormalize with the new data's standard deviation and mean, I get more reasonable values. I don't think it's worth posting that code, because I clearly have a fundamental misunderstanding of what I need to do, but this is the basic code I'm using anyway.
import tensorflow as tf
import numpy
import matplotlib.pyplot as plt
# Train a data set
# X: size data
size_data = [ 2104, 1600, 2400, 1416, 3000, 1985, 1534, 1427,
1380, 1494, 1940, 2000, 1890, 4478, 1268, 2300,
1320, 1236, 2609, 3031, 1767, 1888, 1604, 1962,
3890, 1100, 1458, 2526, 2200, 2637, 1839, 1000,
2040, 3137, 1811, 1437, 1239, 2132, 4215, 2162,
1664, 2238, 2567, 1200, 852, 1852, 1203 ]
# Y: price data (set to 5x + 30)
price_data = [5*c + 30 for c in size_data]
size_data = numpy.asarray(size_data)
price_data = numpy.asarray(price_data)
# Test a data set
size_data_test = [ 1600, 1494, 1236, 1100, 3137, 2238 ]
price_data_test = [5*c + 30 for c in size_data_test]
size_data_test = numpy.asarray(size_data_test)
price_data_test = numpy.asarray(price_data_test)
def normalize(array):
    std = array.std()
    mean = array.mean()
    return (array - mean) / std, std, mean
# Normalize a data set
size_data_n, size_data_n_std, size_data_n_mean = normalize(size_data)
price_data_n, price_data_n_std, price_data_n_mean = normalize(price_data)
size_data_test_n, size_data_test_n_std, size_data_test_n_mean = normalize(size_data_test)
price_data_test_n, price_data_test_n_std, price_data_test_n_mean = normalize(price_data_test)
# Display a plot
#plt.plot(size_data, price_data, 'ro', label='Samples data')
#plt.legend()
#plt.draw()
samples_number = price_data_n.size
# TF graph input
X = tf.placeholder("float")
Y = tf.placeholder("float")
# Create a model
# Set model weights
W = tf.Variable(numpy.random.randn(), name="weight")
b = tf.Variable(numpy.random.randn(), name="bias")
# Set parameters
learning_rate = 0.05
training_iteration = 200
# Construct a linear model
model = tf.add(tf.mul(X, W), b)
# Minimize squared errors
cost_function = tf.reduce_sum(tf.pow(model - Y, 2))/(2 * samples_number) #L2 loss
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost_function) #Gradient descent
#optimizer = tf.train.AdagradOptimizer(learning_rate).minimize(cost_function)
# Initialize variables
init = tf.initialize_all_variables()
# Launch a graph
with tf.Session() as sess:
    sess.run(init)
    display_step = 20

    # Fit all training data
    for iteration in range(training_iteration):
        for (x, y) in zip(size_data_n, price_data_n):
            sess.run(optimizer, feed_dict={X: x, Y: y})
        # Display logs per iteration step
        if iteration % display_step == 0:
            print("Iteration:", '%04d' % (iteration + 1), "cost=", "{:.9f}".format(sess.run(cost_function, feed_dict={X: size_data_n, Y: price_data_n})),
                  "W=", sess.run(W), "b=", sess.run(b))

    tuning_cost = sess.run(cost_function, feed_dict={X: size_data_n, Y: price_data_n})
    print("Tuning completed:", "cost=", "{:.9f}".format(tuning_cost), "W=", sess.run(W), "b=", sess.run(b))

    # Validate the tuned model
    testing_cost = sess.run(cost_function, feed_dict={X: size_data_test_n, Y: price_data_test_n})
    print("Testing data cost:", testing_cost)
    Y_predicted = sess.run(model, feed_dict={X: size_data_test_n, Y: price_data_test_n})

    print("%-20s%-20s%-20s%-20s" % ("Test X", "Actual", "Target", "Error(%)"))
    print('Normalized')
    for i in range(len(size_data_test_n)):
        err = 100.0 * abs(Y_predicted[i] - price_data_test_n[i]) / abs(price_data_test_n[i])
        print("%-20f%-20f%-20f%-20f" % (size_data_test_n[i], Y_predicted[i], price_data_test_n[i], err))

    print('Unnormalized')
    for i in range(len(size_data_test_n)):
        orig_size_data_test_i = size_data_test_n[i] * size_data_test_n_std + size_data_test_n_mean
        orig_price_data_test_i = price_data_test_n[i] * price_data_test_n_std + price_data_test_n_mean
        # ??? which one is correct for getting unnormalized predicted Y?
        #orig_Y_predicted_i = Y_predicted[i] * price_data_n_std + price_data_n_mean
        orig_Y_predicted_i = Y_predicted[i] * price_data_test_n_std + price_data_test_n_mean
        orig_err = 100.0 * abs(orig_Y_predicted_i - orig_price_data_test_i) / abs(orig_price_data_test_i)
        print("%-20f%-20f%-20f%-20f" % (orig_size_data_test_i, orig_Y_predicted_i, orig_price_data_test_i, orig_err))

    # Display a plot
    plt.figure()
    plt.plot(size_data, price_data, 'ro', label='Samples')
    plt.plot(size_data_test, price_data_test, 'go', label='Testing samples')
    plt.plot(size_data_test, (sess.run(W) * size_data_test_n + sess.run(b)) * price_data_n_std + price_data_n_mean, label='Fitted test line')
    plt.legend()
    plt.show()
Suppose I have the following code, written in TensorFlow 1.x, where I define a custom loss function. I wish to remove .compat.v1., Session, placeholder, etc., and convert it to TensorFlow 2.x.
How can I do so?
import DGM
import tensorflow as tf
import numpy as np
import scipy.stats as spstats
import matplotlib.pyplot as plt
from tqdm.notebook import trange
# Option parameters
phi = 10
n = 0.01
T = 4
# Solution parameters (domain on which to solve PDE)
t_low = 0.0 - 1e-10
x_low = 0.0 + 1e-10
x_high = 1.0
# neural network parameters
num_layers = 3
nodes_per_layer = 50
# Training parameters
sampling_stages = 2500 # number of times to resample new time-space domain points
steps_per_sample = 20 # number of SGD steps to take before re-sampling
# Sampling parameters
nsim_interior = 100
nsim_boundary_1 = 50
nsim_boundary_2 = 50
nsim_initial = 50
x_multiplier = 1.1 # multiplier for oversampling i.e. draw x from [x_low, x_high * x_multiplier]
def sampler(nsim_interior, nsim_boundary_1, nsim_boundary_2, nsim_initial):
    ''' Sample time-space points from the function's domain; points are sampled
        uniformly on the interior of the domain, at the initial/terminal time points
        and along the spatial boundary at different time points.
    Args:
        nsim_interior: number of space points in the interior of U
        nsim_boundary_1: number of space points in the boundary of U
        nsim_boundary_2: number of space points in the boundary of U_x
        nsim_initial: number of space points at the initial time
    '''
    # Sampler #1: domain interior
    t_interior = np.random.uniform(low=t_low, high=T, size=[nsim_interior, 1])
    x_interior = np.random.uniform(low=x_low, high=x_high*x_multiplier, size=[nsim_interior, 1])

    # Sampler #2: spatial boundary 1
    t_boundary_1 = np.random.uniform(low=t_low, high=T, size=[nsim_boundary_1, 1])
    x_boundary_1 = np.ones((nsim_boundary_1, 1))

    # Sampler #3: spatial boundary 2
    t_boundary_2 = np.random.uniform(low=t_low, high=T, size=[nsim_boundary_2, 1])
    x_boundary_2 = np.zeros((nsim_boundary_2, 1))

    # Sampler #4: initial condition
    t_initial = np.zeros((nsim_initial, 1))
    x_initial = np.random.uniform(low=x_low, high=x_high*x_multiplier, size=[nsim_initial, 1])

    return (
        t_interior, x_interior,
        t_boundary_1, x_boundary_1,
        t_boundary_2, x_boundary_2,
        t_initial, x_initial
    )
def loss(
    model,
    t_interior, x_interior,
    t_boundary_1, x_boundary_1,
    t_boundary_2, x_boundary_2,
    t_initial, x_initial
):
    ''' Compute total loss for training.
    Args:
        model: DGM model object
        t_interior, x_interior: sampled time / space points in the interior of U
        t_boundary_1, x_boundary_1: sampled time / space points in the boundary of U
        t_boundary_2, x_boundary_2: sampled time / space points in the boundary of U_x
        t_initial, x_initial: sampled time / space points at the initial time
    '''
    # Loss term #1: PDE
    # compute function value and derivatives at current sampled points
    u = model(t_interior, x_interior)
    u_t = tf.gradients(ys=u, xs=t_interior)[0]
    u_x = tf.gradients(ys=u, xs=x_interior)[0]
    u_xx = tf.gradients(ys=u_x, xs=x_interior)[0]
    diff_u = u_t - u_xx + phi**2 * (tf.nn.relu(u) + 1e-10)**n

    # compute average L2-norm for the PDE
    L1 = tf.reduce_mean(input_tensor=tf.square(diff_u))

    # Loss term #2: first b. c.
    u = model(t_boundary_1, x_boundary_1)
    bc1_error = u - 1

    # Loss term #3: second b. c.
    u = model(t_boundary_2, x_boundary_2)
    u_x = tf.gradients(ys=u, xs=x_boundary_2)[0]
    bc2_error = u_x - 0

    # Loss term #4: initial condition
    u = model(t_initial, x_initial)
    init_error = u - 1

    # compute average L2-norm for the initial/boundary conditions
    L2 = tf.reduce_mean(input_tensor=tf.square(bc1_error + bc2_error + init_error))

    return L1, L2
# initialize DGM model (last input: space dimension = 1)
model = DGM.DGMNet(nodes_per_layer, num_layers, 1)
# tensor placeholders (_tnsr suffix indicates tensors)
# inputs (time, space domain interior, space domain at initial time)
t_interior_tnsr = tf.compat.v1.placeholder(tf.float32, [None,1])
x_interior_tnsr = tf.compat.v1.placeholder(tf.float32, [None,1])
t_boundary_1_tnsr = tf.compat.v1.placeholder(tf.float32, [None,1])
x_boundary_1_tnsr = tf.compat.v1.placeholder(tf.float32, [None,1])
t_boundary_2_tnsr = tf.compat.v1.placeholder(tf.float32, [None,1])
x_boundary_2_tnsr = tf.compat.v1.placeholder(tf.float32, [None,1])
t_initial_tnsr = tf.compat.v1.placeholder(tf.float32, [None,1])
x_initial_tnsr = tf.compat.v1.placeholder(tf.float32, [None,1])
# loss
L1_tnsr, L2_tnsr = loss(
model,
t_interior_tnsr, x_interior_tnsr,
t_boundary_1_tnsr, x_boundary_1_tnsr,
t_boundary_2_tnsr, x_boundary_2_tnsr,
t_initial_tnsr, x_initial_tnsr
)
loss_tnsr = L1_tnsr + L2_tnsr
# set optimizer
starting_learning_rate = 3e-4
global_step = tf.Variable(0, trainable=False)
lr = tf.compat.v1.train.exponential_decay(
learning_rate=starting_learning_rate,
global_step=global_step,
decay_steps=1e5,
decay_rate=0.96,
staircase=True,
)
optimizer = tf.compat.v1.train.AdamOptimizer(learning_rate=lr).minimize(loss_tnsr)
# initialize variables
init_op = tf.compat.v1.global_variables_initializer()
# open session
sess = tf.compat.v1.Session()
sess.run(init_op)
try:
    model.load_weights("checkpoint/")
    print("Loading from checkpoint.")
except:
    print("Checkpoint not found.")
# for each sampling stage
for i in trange(sampling_stages):
    # sample uniformly from the required regions
    t_interior, x_interior, \
    t_boundary_1, x_boundary_1, \
    t_boundary_2, x_boundary_2, \
    t_initial, x_initial = sampler(
        nsim_interior, nsim_boundary_1, nsim_boundary_2, nsim_initial
    )

    # for a given sample, take the required number of SGD steps
    for _ in range(steps_per_sample):
        # named loss_val etc. to avoid shadowing the loss() function above
        loss_val, L1_val, L2_val, _ = sess.run(
            [loss_tnsr, L1_tnsr, L2_tnsr, optimizer],
            feed_dict={
                t_interior_tnsr: t_interior,
                x_interior_tnsr: x_interior,
                t_boundary_1_tnsr: t_boundary_1,
                x_boundary_1_tnsr: x_boundary_1,
                t_boundary_2_tnsr: t_boundary_2,
                x_boundary_2_tnsr: x_boundary_2,
                t_initial_tnsr: t_initial,
                x_initial_tnsr: x_initial,
            }
        )

    if i % 10 == 0:
        print(f"Loss: {loss_val:.5f},\t L1: {L1_val:.5f},\t L2: {L2_val:.5f},\t iteration: {i}")

model.save_weights("checkpoint/")
I tried searching for how to implement custom loss functions that take the model as an argument, but I couldn't get it working.
For model.compile there is a loss argument to which you can pass the loss function. It may be a string (the name of a built-in loss function) or a tf.keras.losses.Loss instance. For example:
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
              loss=tf.keras.losses.BinaryCrossentropy())
If you have created your own custom loss function, you can also pass it to the loss argument by referencing the function itself. For example:
def my_loss_fn(y_true, y_pred):
    squared_difference = tf.square(y_true - y_pred)
    return tf.reduce_mean(squared_difference, axis=-1)

model.compile(optimizer='adam', loss=my_loss_fn)
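If you want to keep the full custom training loop from the question (with the model as an argument to the loss) rather than using compile/fit, the TF 2.x idiom is a GradientTape training step. A minimal sketch, assuming the model and loss from the question; the tf.gradients calls inside loss should still work because the function is traced in graph mode under @tf.function:

import tensorflow as tf

optimizer = tf.keras.optimizers.Adam(learning_rate=3e-4)

@tf.function
def train_step(t_int, x_int, t_b1, x_b1, t_b2, x_b2, t_init, x_init):
    with tf.GradientTape() as tape:
        L1, L2 = loss(model,
                      t_int, x_int,
                      t_b1, x_b1,
                      t_b2, x_b2,
                      t_init, x_init)
        total = L1 + L2
    grads = tape.gradient(total, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return total, L1, L2

# called directly with the numpy arrays from sampler(); no placeholders or Session

The learning-rate decay can likewise be replaced by tf.keras.optimizers.schedules.ExponentialDecay passed as the optimizer's learning_rate.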
Thank You.
I am trying to implement a custom loss function in Tensorflow 2.4 using the Keras backend.
The loss function is a ranking loss; I found the following paper with a log-likelihood-style loss: Chen et al., Single-Image Depth Perception in the Wild.
Similarly, I wanted to sample some (in this case 50) points from an image to compare the relative order between ground-truth and predicted depth maps using the NYU-Depth dataset. Being a fan of Numpy, I started working with that but came to the following exception:
ValueError: No gradients provided for any variable: [...]
I have learned that this is caused by the arguments not being filled with data when the loss function is called; instead, a compiled graph function is built and used later. So while I know the dimensions of my tensors (4, 480, 640, 1), I cannot work with the data directly and have to stick to the keras.backend functions, so that in the end (if I understood correctly) there is a path between the input tensors of the TF graph and the output tensor, which provides a gradient.
So my question now is: Is this a feasible loss function within keras?
I have already tried a few ideas and different approaches with different variations of my original code, which was something like:
def ranking_loss_function(y_true, y_pred):
    # Chen et al. loss
    y_true_np = K.eval(y_true)
    y_pred_np = K.eval(y_pred)
    if y_true_np.shape[0] != None:
        num_sample_points = 50
        total_samples = num_sample_points ** 2
        err_list = [0 for x in range(y_true_np.shape[0])]
        for i in range(y_true_np.shape[0]):
            sample_points = create_random_samples(y_true, y_pred, num_sample_points)
            for x1, y1 in sample_points:
                for x2, y2 in sample_points:
                    if y_true[i][x1][y1] > y_true[i][x2][y2]:
                        # image_relation_true = 1
                        err_list[i] += np.log(1 + np.exp(-1 * y_pred[i][x1][y1] + y_pred[i][x2][y2]))
                    elif y_true[i][x1][y1] < y_true[i][x2][y2]:
                        # image_relation_true = -1
                        err_list[i] += np.log(1 + np.exp(y_pred[i][x1][y1] - y_pred[i][x2][y2]))
                    else:
                        # image_relation_true = 0
                        err_list[i] += np.square(y_pred[i][x1][y1] - y_pred[i][x2][y2])
        err_list = np.divide(err_list, total_samples)
        return K.constant(err_list)
As you can probably tell, the main idea was to first create the sample points and then based on the existing relation between them in y_true/y_pred continue with the corresponding computation from the cited paper.
Can anyone help me and provide some more helpful information or tips on how to correctly implement this loss using keras.backend functions? Trying to include the ordinal relation information really confused me compared to standard regression losses.
EDIT: Just in case this causes confusion: create_random_samples() just creates 50 random sample points (x, y) coordinate pairs based on the shape[1] and shape[2] of y_true (image width and height)
EDIT(2): After finding this variation on GitHub, I tried out a version using only TF functions to retrieve data from the tensors and compute the output. The adjusted, and probably more correct, version still throws the same exception though:
def ranking_loss_function(y_true, y_pred):
    # In the Wild ranking loss
    y_true_np = K.eval(y_true)
    y_pred_np = K.eval(y_pred)
    if y_true_np.shape[0] != None:
        num_sample_points = 50
        total_samples = num_sample_points ** 2
        bs = y_true_np.shape[0]
        w = y_true_np.shape[1]
        h = y_true_np.shape[2]
        total_samples = total_samples * bs
        num_pairs = tf.constant([total_samples], dtype=tf.float32)
        output = tf.Variable(0.0)
        for i in range(bs):
            sample_points = create_random_samples(y_true, y_pred, num_sample_points)
            for x1, y1 in sample_points:
                for x2, y2 in sample_points:
                    y_true_sq = tf.squeeze(y_true)
                    y_pred_sq = tf.squeeze(y_pred)
                    d1_t = tf.slice(y_true_sq, [i, x1, y1], [1, 1, 1])
                    d2_t = tf.slice(y_true_sq, [i, x2, y2], [1, 1, 1])
                    d1_p = tf.slice(y_pred_sq, [i, x1, y1], [1, 1, 1])
                    d2_p = tf.slice(y_pred_sq, [i, x2, y2], [1, 1, 1])
                    d1_t_sq = tf.squeeze(d1_t)
                    d2_t_sq = tf.squeeze(d2_t)
                    d1_p_sq = tf.squeeze(d1_p)
                    d2_p_sq = tf.squeeze(d2_p)
                    if d1_t_sq > d2_t_sq:
                        # --> Image relation = 1
                        output.assign_add(tf.math.log(1 + tf.math.exp(-1 * d1_p_sq + d2_p_sq)))
                    elif d1_t_sq < d2_t_sq:
                        # --> Image relation = -1
                        output.assign_add(tf.math.log(1 + tf.math.exp(d1_p_sq - d2_p_sq)))
                    else:
                        output.assign_add(tf.math.square(d1_p_sq - d2_p_sq))
        return output / num_pairs
EDIT(3): This is the code for create_random_samples():
(FYI: because it was awkward to get the shape from y_true in this case, I hard-coded it here, since I know it for the dataset I am currently using.)
def create_random_samples(y_true, y_pred, num_points=50):
    y_true_shape = (4, 480, 640, 1)
    y_pred_shape = (4, 480, 640, 1)
    if y_true_shape[0] != None:
        num_samples = num_points
        population = [(x, y) for x in range(y_true_shape[1]) for y in range(y_true_shape[2])]
        sample_points = random.sample(population, num_samples)
        return sample_points
The attached model shows how to add a bias in the case of an unbalanced classification problem: initial_bias = np.log([pos/neg]). Is there a way to add a bias if you have multi-class classification with unbalanced data, say 5 classes with distribution (0.4, 0.3, 0.2, 0.08, 0.02)?
2) Also, how do I calculate and use class weights in such a case?
update 1
I found a way to apply class weights (a generic version of the computation follows the snippet below); I'm still not sure how to use the bias.
##### adding weights 20 Feb
weight_for_0 = (1 / 370) * (370 + 977 + 795) / 3
weight_for_1 = (1 / 977) * (370 + 977 + 795) / 3
weight_for_2 = (1 / 795) * (370 + 977 + 795) / 3
# array([0, 1, 2]), array([370, 977, 795])
class_weights_dict = {0: weight_for_0, 1: weight_for_1, 2: weight_for_2}
class_weights_dict
Dcnn.fit(train_dataset,
epochs=NB_EPOCHS,
callbacks=[MyCustomCallback()],verbose=2,validation_data=test_dataset, class_weight=class_weights_dict)
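For reference, I believe the same "balanced" weighting can be computed generically with scikit-learn; a minimal sketch (the y_train array here is a hypothetical stand-in for my real labels):

import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# hypothetical labels matching the counts above: 370 / 977 / 795
y_train = np.array([0] * 370 + [1] * 977 + [2] * 795)
classes = np.unique(y_train)

# 'balanced' uses n_samples / (n_classes * count), the same formula as above
weights = compute_class_weight('balanced', classes=classes, y=y_train)
class_weights_dict = dict(zip(classes, weights))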
Considering that you're using 'softmax':
softmax = exp(neurons) / sum(exp(neurons))
And that you want the results of the classes to be:
frequency = [0.4 , 0.3 , 0.2 , 0.08 , 0.02]
Biases should be given by the equation (elementwise):
frequency = exp(biases) / sum(exp(biases))
This forms a system of equations:
f1 = e^b1 / (e^b1 + e^b2 + ... + e^b5)
f2 = e^b2 / (e^b1 + e^b2 + ... + e^b5)
...
f5 = e^b5 / (e^b1 + e^b2 + ... + e^b5)
If you can solve this system of equations, you get the biases you want.
I used Excel and trial and error to determine that, for the frequencies you wanted, your biases should be, respectively:
[1.1 , 0.81 , 0.4 , -0.51 , -1.9]
I don't know an easy way to solve that system analytically, but you can keep experimenting in Excel or elsewhere until you reach the solution.
Adding the biases to the layer - Method 1
Use a name when defining the layer, like:
self.last_dense = layers.Dense(units=3, activation="softmax", name='last_layer')
You may need to build the model first, so:
dummy_predictions = model.predict(np.zeros((1,) + input_shape))
Then you get the weights:
weights_and_biases = model.get_layer('last_layer').get_weights()
w, b = weights_and_biases
new_biases = np.array([-0.45752, 0.51344, 0.30730])
model.get_layer('last_layer').set_weights([w, new_biases])
Method 2
def bias_init(bias_shape):
    return K.variable([-0.45752, 0.51344, 0.30730])

self.last_dense = layers.Dense(units=3, activation="softmax", bias_initializer=bias_init)
Just in addition to @Daniel Möller's answer: to solve the system of equations
f1 = e^b1 / (e^b1 + e^b2 + ... + e^b5)
...
f5 = e^b5 / (e^b1 + e^b2 + ... + e^b5)
You don't need Excel or anything. Just compute bi = ln(fi).
To verify that fi = e^bi / (sum of e^bj), note that fi/fj = e^(bi-bj). Suppose the lowest frequency is fk. You can set bk = 0 and then compute every other class bias with bi = bj + ln(fi/fj).
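A quick numerical check of the bi = ln(fi) shortcut (a minimal sketch):

import numpy as np

f = np.array([0.4, 0.3, 0.2, 0.08, 0.02])
b = np.log(f)                          # biases from the closed form
softmax = np.exp(b) / np.exp(b).sum()
print(np.allclose(softmax, f))         # True: softmax(ln(f)) recovers f

(These values differ from the trial-and-error ones above only by a constant shift, which softmax ignores.)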
A complete answer is here:
### To solve that set of nonlinear equations, use scipy fsolve
from scipy.optimize import fsolve
from math import exp

# define the frequency of different classes
f = (0.4, 0.3, 0.2, 0.08, 0.02)

# define the equations
def eqn(x, frequency):
    sum_exp = sum([exp(x_i) for x_i in x])
    return [exp(x[i]) / sum_exp - frequency[i] for i in range(len(frequency))]

# calculate the initial bias (args passes the frequencies through to eqn)
bias_init = fsolve(func=eqn,
                   x0=[0] * len(f),
                   args=(f,)).tolist()
bias_init
Putting it all together:
import numpy as np
import pandas as pd

def init_imbalanced_class_weight_bias(df: pd.DataFrame, label: str):
    """To handle imbalanced classification, provide an initial bias list and
    a class weight dictionary to 2 places in a tf classifier:
    1) In the last layer of the classifier:
       tf.keras.layers.Dense(..., bias_initializer=bias_init)
    2) model.fit(train_ds,  # x=dict(X_train), y=y_train,
                 batch_size=batch_size,
                 validation_data=valid_ds,  # (dict(X_test), y_test),
                 epochs=epochs,
                 callbacks=callbacks,
                 class_weight=class_weight,
                 )
    Args:
        df: pd.DataFrame, e.g. train_df
        label: str
    Returns:
        class_weight: dict, e.g. {0: 1.6282051282051282, 1: 0.7604790419161677, 2: 0.9338235294117647}
        bias_init: list, e.g. [0.3222079660508266, 0.1168690393701237, -0.43907701967633633]
    Examples:
        class_weight, bias_init = init_imbalanced_class_weight_bias(df=train_df, label=label)
    References:
        1. https://www.tensorflow.org/tutorials/structured_data/imbalanced_data
        2. https://stackoverflow.com/questions/60307239/setting-bias-for-multiclass-classification-python-tensorflow-keras#new-answer
    """
    from scipy.optimize import fsolve
    from math import exp

    # to deal with imbalanced classification, calculate class_weight
    d = dict(df[label].value_counts())
    m = np.mean(list(d.values()))
    class_weight = {k: m / v for (k, v) in d.items()}  # e.g. {0: 1.6282051282051282, 1: 0.7604790419161677, 2: 0.9338235294117647}

    # define the class frequency list
    frequency = [v / sum(d.values()) for v in d.values()]

    # define the equations to solve for the initial bias
    def eqn(x, frequency=frequency):
        sum_exp = sum([exp(x_i) for x_i in x])
        return [exp(x[i]) / sum_exp - frequency[i] for i in range(len(frequency))]

    # calculate the initial bias
    bias_init = fsolve(func=eqn,
                       x0=[0] * len(frequency)).tolist()

    return class_weight, bias_init

class_weight, bias_init = init_imbalanced_class_weight_bias(df=train_df, label=label)
I will post a Colab notebook if anyone is interested.
In case your tf classifier complains with ValueError: ('Could not interpret initializer identifier:', ...), wrap bias_init in tf.keras.initializers.Constant():
def init_imbalanced_class_weight_bias(...):
    ...
    return class_weight, tf.keras.initializers.Constant(bias_init)
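For completeness, a minimal sketch of wiring both returned values into a classifier (the layer size and names are illustrative, not from the original code):

import tensorflow as tf

n_classes = len(class_weight)

# final layer gets the computed initial bias
output_layer = tf.keras.layers.Dense(
    n_classes,
    activation='softmax',
    bias_initializer=tf.keras.initializers.Constant(bias_init))

# class_weight is then passed to training:
# model.fit(train_ds, epochs=epochs, class_weight=class_weight)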
I'm working on a seq2sql project and I successfully built a model, but I get an error when training. I'm not using any Keras embedding layer.
M = 13          # question length
d = 40          # dimension of the LSTM
C = 12          # number of table columns
batch_size = 9
inputs1 = Input(shape=(M, 100), name='question_token')
Hq = Bidirectional(LSTM(d, return_sequences=True), name='QuestionENC')(inputs1)  # this is Hq; shape is (num_samples, 13, 80)
inputs2 = Input(shape=(C, 3, 100), name='col_token')
col_lstm_layer = Bidirectional(LSTM(d, return_sequences=False), name='ColENC')

def hidd(te):
    t = tf.Variable(initial_value=1, dtype=tf.int32)
    for i in range(batch_size):
        t = tf.assign(t, i)
        Z = tf.nn.embedding_lookup(te, t)
        print(col_lstm_layer(Z))
        h = tf.reshape(col_lstm_layer(Z), [1, C, d * 2])
        if i == 0:
            # cols_last_hidden = tf.Variable(initial_value=h)
            cols_last_hidden = tf.stack(h)  # because it gives an error if we use tf.Variable here
        else:
            cols_last_hidden = tf.concat([cols_last_hidden, h], 0)  # shape is (num_samples, num_col, 80); 80 is the last encoding of each column
    return cols_last_hidden

cols_last_hidden = Lambda(hidd)(inputs2)
Hq=Dense(d*2,name='QuestionLastEncode')(Hq)
I=tf.Variable(initial_value=1,dtype=tf.int32)
J=tf.Variable(initial_value=1,dtype=tf.int32)
K=1
def get_col_att(tensors):
    global K, all_col_attention
    if K:
        t = tf.Variable(initial_value=1, dtype=tf.int32)
        for i in range(batch_size):
            t = tf.assign(t, i)
            x = tf.nn.embedding_lookup(tensors[0], t)
            # print("tensors[1]:", tensors[1])
            y = tf.nn.embedding_lookup(tensors[1], t)
            # print("x shape", x.shape, "y shape", y.shape)
            y = tf.transpose(y)
            # print("x shape", x.shape, "y", y.shape)
            Ecol = tf.reshape(tf.transpose(tf.tensordot(x, y, axes=1)), [1, C, M])
            if i == 0:
                # all_col_attention = tf.Variable(initial_value=Ecol, name="" + i)
                all_col_attention = tf.stack(Ecol)
            else:
                all_col_attention = tf.concat([all_col_attention, Ecol], 0)
    K = 0
    print("all_col_attention", all_col_attention)
    return all_col_attention
total_alpha_sel_lambda=Lambda(get_col_att,name="Alpha")([Hq,cols_last_hidden])
total_alpha_sel=Dense(13,activation="softmax")(total_alpha_sel_lambda)
# print("Hq",Hq," total_alpha_sel_lambda shape",total_alpha_sel_lambda," total_alpha_sel shape",total_alpha_sel.shape)
def get_EQcol(tensors):
    global K
    if K:
        t = tf.Variable(initial_value=1, dtype=tf.int32)
        global all_Eqcol
        for i in range(batch_size):
            t = tf.assign(t, i)
            x = tf.nn.embedding_lookup(tensors[0], t)
            y = tf.nn.embedding_lookup(tensors[1], t)
            Eqcol = tf.reshape(tf.tensordot(x, y, axes=1), [1, C, d * 2])
            if i == 0:
                # all_Eqcol = tf.Variable(initial_value=Eqcol, name="" + i)
                all_Eqcol = tf.stack(Eqcol)
            else:
                all_Eqcol = tf.concat([all_Eqcol, Eqcol], 0)
    K = 0
    print("all_Eqcol", all_Eqcol)
    return all_Eqcol
K=1
EQcol=Lambda(get_EQcol,name='EQcol')([total_alpha_sel,Hq])#total_alpha_sel(12x13) Hq(13xd*2)
EQcol=Dropout(.2)(EQcol)
L1=Dense(d*2,name='L1')(cols_last_hidden)
L2=Dense(d*2,name='L2')(EQcol)
L1_plus_L2=Add()([L1,L2])
pre=Flatten()(L1_plus_L2)
Psel=Dense(12,activation="softmax")(pre)
model=Model(inputs=[inputs1,inputs2],outputs=Psel)
model.compile(loss='categorical_crossentropy', optimizer='adam',metrics=['accuracy'])
model.summary()
earlyStopping=EarlyStopping(monitor='val_loss', patience=7, verbose=0, mode='auto')
history=model.fit([Equestion,Col_Embeddings],y_train,epochs=50,validation_split=.1,shuffle=False,callbacks=[earlyStopping],batch_size=batch_size)
The shapes of Equestion, Col_Embeddings, and y_train are (10, 12, 3, 100), (10, 13, 100), and (10, 12).
I searched for this error, but in all the cases I found, people had used an embedding layer incorrectly. Here I get this error even though I'm not using one.
indices = 2 is not in [0, 1)
[[{{node lambda_3/embedding_lookup_2}} = GatherV2[Taxis=DT_INT32, Tindices=DT_INT32, Tparams=DT_FLOAT, _class=["loc:#col_token_2"], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_col_token_2_0_1, lambda_3/Assign_2, lambda_3/embedding_lookup_2/axis)]]
The problem here was that the batch size is defined at the graph level. I used batch_size = 9 for the graph, and with the full data set of 10 samples and a validation split of 0.1, 9 samples go to training and only one sample is left for validation (10 * 0.1 = 1).
A batch of size 1 cannot be passed through the graph, because the graph requires a batch size of 9. That's why this error occurs.
As for the solution, I set batch_size = 1 and then it works fine; I also got good accuracy with batch_size = 1.
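Alternatively, a way to avoid baking the batch size into the graph at all might be to map over the batch dimension instead of looping over a hard-coded batch_size. A rough, untested sketch of the hidd Lambda rewritten with tf.map_fn:

import tensorflow as tf

def hidd(te):
    # te: (batch, C, 3, 100); encode each sample's columns independently,
    # so the graph accepts any batch size (including the leftover validation sample)
    return tf.map_fn(lambda z: col_lstm_layer(z), te, dtype=tf.float32)

cols_last_hidden = Lambda(hidd)(inputs2)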
Hope this helps someone.
Cheers.
For me, this error was due to badly formed input data. Double-check that the input data you feed in matches what your model's inputs expect.
I am trying to model an ordinal predicted variable using PyMC3 based on the approach in chapter 23 of Doing Bayesian Data Analysis. I would like to determine a good starting value using find_MAP, but am receiving an optimization error.
The model:
import pymc3 as pm
import numpy as np
import theano
import theano.tensor as tt
# Some helper functions
def cdf(x, location=0, scale=1):
    epsilon = np.array(1e-32, dtype=theano.config.floatX)
    location = tt.cast(location, theano.config.floatX)
    scale = tt.cast(scale, theano.config.floatX)
    div = tt.sqrt(2 * scale ** 2 + epsilon)
    div = tt.cast(div, theano.config.floatX)
    erf_arg = (x - location) / div
    return .5 * (1 + tt.erf(erf_arg + epsilon))

def percent_to_thresh(idx, vect):
    return 5 * tt.sum(vect[:idx + 1]) + 1.5

def full_thresh(thresh):
    idxs = tt.arange(thresh.shape[0] - 1)
    thresh_mod, updates = theano.scan(fn=percent_to_thresh,
                                      sequences=[idxs],
                                      non_sequences=[thresh])
    return tt.concatenate([[-1 * np.inf, 1.5], thresh_mod, [6.5, np.inf]])

def compute_ps(thresh, location, scale):
    f_thresh = full_thresh(thresh)
    return cdf(f_thresh[1:], location, scale) - cdf(f_thresh[:-1], location, scale)
# Generate data
real_ps = [0.05, 0.05, 0.1, 0.1, 0.2, 0.3, 0.2]
data = np.random.choice(7, size=1000, p=real_ps)
# Run model
with pm.Model() as model:
    mu = pm.Normal('mu', mu=4, sd=3)
    sigma = pm.Uniform('sigma', lower=0.1, upper=70)
    thresh = pm.Dirichlet('thresh', a=np.ones(5))
    cat_p = compute_ps(thresh, mu, sigma)
    results = pm.Categorical('results', p=cat_p, observed=data)

with model:
    start = pm.find_MAP()
    trace = pm.sample(2000, start=start)
When running this, I receive the following error:
Applied interval-transform to sigma and added transformed sigma_interval_ to model.
Applied stickbreaking-transform to thresh and added transformed thresh_stickbreaking_ to model.
Traceback (most recent call last):
File "cm_net_log.v1-for_so.py", line 53, in <module>
start = pm.find_MAP()
File "/usr/local/lib/python3.5/site-packages/pymc3/tuning/starting.py", line 133, in find_MAP
specific_errors)
ValueError: Optimization error: max, logp or dlogp at max have non-finite values. Some values may be outside of distribution support. max: {'thresh_stickbreaking_': array([-1.04298465, -0.48661088, -0.84326554, -0.44833646]), 'sigma_interval_': array(-2.220446049250313e-16), 'mu': array(7.68422528308479)} logp: array(-3506.530143064723) dlogp: array([ 1.61013190e-06, nan, -6.73994118e-06,
-6.93873894e-06, 6.03358122e-06, 3.18954680e-06])Check that 1) you don't have hierarchical parameters, these will lead to points with infinite density. 2) your distribution logp's are properly specified. Specific issues:
My questions:
How can I determine why dlogp is nan at certain points?
Is there a different way that I can express this model to avoid dlogp being nan?
Also worth noting:
This model runs fine if I skip find_MAP and use a Metropolis sampler. However, I'd like to have the flexibility of using other samplers as this model becomes more complex.
I have a suspicion that the issue is due to the relationship between the thresholds and the normal distribution, but I don't know how to disentangle them for the optimization.
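One direction I have considered but not verified: clipping the category probabilities away from 0 and 1 so that logp and dlogp stay finite during optimization:

import theano.tensor as tt

# untested idea: keep probabilities strictly inside (0, 1) so the
# threshold/normal interaction cannot produce a non-finite gradient
cat_p_raw = compute_ps(thresh, mu, sigma)
cat_p = tt.clip(cat_p_raw, 1e-9, 1 - 1e-9)
results = pm.Categorical('results', p=cat_p, observed=data)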
Regarding question 2: I expressed the model for the ordinal predicted variable (single group) differently; I used Theano's @as_op decorator for a function that calculates the probabilities of the outcomes. That also explains why I cannot use find_MAP() or gradient-based samplers: Theano cannot calculate a gradient for the custom function. (http://pymc-devs.github.io/pymc3/notebooks/getting_started.html#Arbitrary-deterministics)
# Number of outcomes
nYlevels = df.Y.cat.categories.size

thresh = [k + .5 for k in range(1, nYlevels)]
thresh_obs = np.ma.asarray(thresh)
thresh_obs[1:-1] = np.ma.masked

@as_op(itypes=[tt.dvector, tt.dscalar, tt.dscalar], otypes=[tt.dvector])
def outcome_probabilities(theta, mu, sigma):
    out = np.empty(nYlevels)
    n = norm(loc=mu, scale=sigma)
    out[0] = n.cdf(theta[0])
    out[1] = np.max([0, n.cdf(theta[1]) - n.cdf(theta[0])])
    out[2] = np.max([0, n.cdf(theta[2]) - n.cdf(theta[1])])
    out[3] = np.max([0, n.cdf(theta[3]) - n.cdf(theta[2])])
    out[4] = np.max([0, n.cdf(theta[4]) - n.cdf(theta[3])])
    out[5] = np.max([0, n.cdf(theta[5]) - n.cdf(theta[4])])
    out[6] = 1 - n.cdf(theta[5])
    return out

with pm.Model() as ordinal_model_single:
    theta = pm.Normal('theta', mu=thresh, tau=np.repeat(.5**2, len(thresh)),
                      shape=len(thresh), observed=thresh_obs, testval=thresh[1:-1])
    mu = pm.Normal('mu', mu=nYlevels/2.0, tau=1.0/(nYlevels**2))
    sigma = pm.Uniform('sigma', nYlevels/1000.0, nYlevels*10.0)
    pr = outcome_probabilities(theta, mu, sigma)
    y = pm.Categorical('y', pr, observed=df.Y.cat.codes.as_matrix())
http://nbviewer.jupyter.org/github/JWarmenhoven/DBDA-python/blob/master/Notebooks/Chapter%2023.ipynb