I have trained two ML models on two different datasets and saved them as model1.pkl and model2.pkl. There is a user input (not input data for the model) such as x=0 or x=1: if x=0 I have to use model1.pkl for prediction, otherwise model2.pkl. I can do this with an if condition, but my question is whether it is possible to save both models, together with this condition, back into a single model.pkl. If I can combine them and save them as one model, it will be easy to load in other environments.
You can create a class that exposes the minimal interface of a model, like this:
# create the test setup
import lightgbm as lgb
import pandas as pd
import pickle as pkl
from sklearn.linear_model import LinearRegression

data = {
    'x': [1.0, 1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0],
    'q': [0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 3.0, 3.0, 3.0],
    'b': [1.0, 2.0, 3.0, 1.0, 2.0, 3.0, 1.0, 2.0, 3.0, 1.0, 2.0, 3.0],
    'target': [0.0, 2.0, 1.5, 0.0, 5.1, 4.0, 0.0, 1.0, 2.0, 0.0, 2.1, 1.5]
}
df = pd.DataFrame(data)
X, y = df.iloc[:, :-1], df.iloc[:, -1]
X = X.astype('float32')
# create two models
model1 = LinearRegression()
model2 = lgb.LGBMRegressor(n_estimators=5, num_leaves=10, min_child_samples=1)
ser_model1 = X['x'] == 0.0
model1.fit(X[ser_model1], y[ser_model1])
model2.fit(X[~ser_model1], y[~ser_model1])
# define a class that mocks the model interface
class CombinedModel:
    def __init__(self, model1, model2):
        self.model1 = model1
        self.model2 = model2

    def predict(self, X, **kwargs):
        # route each row to the model that is responsible for it
        ser_model1 = X['x'] == 0.0
        return pd.concat([
            pd.Series(self.model1.predict(X[ser_model1]), index=X.index[ser_model1]),
            pd.Series(self.model2.predict(X[~ser_model1]), index=X.index[~ser_model1])
        ]).sort_index()
# create a model from the two trained submodels
# and pickle it
model = CombinedModel(model1, model2)
model.predict(X)
with open('model.pkl', 'wb') as fp:
    pkl.dump(model, fp)
model = model1 = model2 = None

# test load it
with open('model.pkl', 'rb') as fp:
    model = pkl.load(fp)
model.predict(X)
If you want, you can of course also implement a fit method in the class above that simply calls fit on the two models, as shown below. If you implement the necessary methods, you could even use this class in a sklearn pipeline.
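A fit method could look like this (a minimal sketch; it assumes the routing column is named x and the split condition is x == 0.0, exactly as in the example above):

class CombinedModel:
    def __init__(self, model1, model2):
        self.model1 = model1
        self.model2 = model2

    def fit(self, X, y, **kwargs):
        # send each training row to the model that is responsible for it
        ser_model1 = X['x'] == 0.0
        self.model1.fit(X[ser_model1], y[ser_model1])
        self.model2.fit(X[~ser_model1], y[~ser_model1])
        # returning self follows the sklearn convention for fit
        return self

Returning self is what makes chaining like CombinedModel(m1, m2).fit(X, y).predict(X) work, and it is one of the conventions sklearn expects from pipeline steps.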
You can use an ensemble VotingClassifier, which will consider the outputs of both models and produce an appropriate combined output.
Link: https://machinelearningmastery.com/voting-ensembles-with-python/
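As a rough illustration, a soft-voting ensemble in scikit-learn looks like this (a minimal sketch on synthetic data; the two estimators here are placeholders, in your case you would plug in your own two models and your own training set):

# minimal VotingClassifier sketch on synthetic data
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=100, n_features=4, random_state=0)
ensemble = VotingClassifier(
    estimators=[('lr', LogisticRegression()), ('dt', DecisionTreeClassifier())],
    voting='soft'  # average the predicted class probabilities instead of majority voting
)
ensemble.fit(X, y)
print(ensemble.predict(X[:5]))

The fitted ensemble can then be pickled as a single object, just like any other sklearn estimator.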
In the TensorFlow manual, the description of labels reads as follows:
labels: Each row labels[i] must be a valid probability distribution.
Does this mean that labels can look like the following, if I have the true probability distribution over the classes for each input?
[[0.1, 0.2, 0.05, 0.007 ... ]
[0.001, 0.2, 0.5, 0.007 ... ]
[0.01, 0.0002, 0.005, 0.7 ... ]]
And is it more efficient than one-hot encoded labels?
Thank you in advance.
In a word, yes, you can use probabilities as labels.
The documentation for tf.nn.softmax_cross_entropy_with_logits says you can:
NOTE: While the classes are mutually exclusive, their probabilities
need not be. All that is required is that each row of labels is
a valid probability distribution. If they are not, the computation of the
gradient will be incorrect.
If using exclusive labels (wherein one and only
one class is true at a time), see sparse_softmax_cross_entropy_with_logits.
Let's run a short example to make sure it works:
import numpy as np
import tensorflow as tf
labels = np.array([[0.2, 0.3, 0.5], [0.1, 0.7, 0.2]])
logits = np.array([[5.0, 7.0, 8.0], [1.0, 2.0, 4.0]])
sess = tf.Session()
ce = tf.nn.softmax_cross_entropy_with_logits(
    labels=labels, logits=logits).eval(session=sess)
print(ce) # [ 1.24901222 1.86984602]
# manual check
predictions = np.exp(logits)
predictions = predictions / predictions.sum(axis=1, keepdims=True)
ce_np = (-labels * np.log(predictions)).sum(axis=1)
print(ce_np) # [ 1.24901222 1.86984602]
And if you have exclusive labels, it is better to use integer class indices with tf.nn.sparse_softmax_cross_entropy_with_logits rather than tf.nn.softmax_cross_entropy_with_logits with an explicit probability representation like [1.0, 0.0, ...]. You get a more compact label representation that way.
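For example (a minimal sketch reusing logits and sess from above; the indices 2 and 1 are hypothetical class labels, i.e. the positions that one-hot rows like [0, 0, 1] and [0, 1, 0] would encode):

# sparse variant: one integer class index per sample instead of a one-hot row
sparse_labels = np.array([2, 1])
sparse_ce = tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=sparse_labels, logits=logits).eval(session=sess)
print(sparse_ce)  # same result as one-hot labels [0, 0, 1] and [0, 1, 0]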
This neural network trains on the inputs [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]] with the labelled outputs [[0.0], [1.0], [1.0], [0.0]]:
import numpy as np
import tensorflow as tf
sess = tf.InteractiveSession()
# a batch of inputs of 2 value each
inputs = tf.placeholder(tf.float32, shape=[None, 2])
# a batch of output of 1 value each
desired_outputs = tf.placeholder(tf.float32, shape=[None, 1])
# [!] define the number of hidden units in the first layer
HIDDEN_UNITS = 4
weights_1 = tf.Variable(tf.truncated_normal([2, HIDDEN_UNITS]))
biases_1 = tf.Variable(tf.zeros([HIDDEN_UNITS]))
# connect 2 inputs to every hidden unit. Add bias
layer_1_outputs = tf.nn.sigmoid(tf.matmul(inputs, weights_1) + biases_1)
print(layer_1_outputs)
NUMBER_OUTPUT_NEURONS = 1
biases_2 = tf.Variable(tf.zeros([NUMBER_OUTPUT_NEURONS]))
weights_2 = tf.Variable(tf.truncated_normal([HIDDEN_UNITS, NUMBER_OUTPUT_NEURONS]))
finalLayerOutputs = tf.nn.sigmoid(tf.matmul(layer_1_outputs, weights_2) + biases_2)
tf.global_variables_initializer().run()
logits = tf.nn.sigmoid(tf.matmul(layer_1_outputs, weights_2) + biases_2)
training_inputs = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
training_outputs = [[0.0], [1.0], [1.0], [0.0]]
error_function = 0.5 * tf.reduce_sum(tf.subtract(logits, desired_outputs) * tf.subtract(logits, desired_outputs))
train_step = tf.train.GradientDescentOptimizer(0.05).minimize(error_function)
for i in range(15):
    _, loss = sess.run([train_step, error_function],
                       feed_dict={inputs: np.array(training_inputs),
                                  desired_outputs: np.array(training_outputs)})
print(sess.run(logits, feed_dict={inputs: np.array([[0.0, 1.0]])}))
After training, this network returns [[ 0.61094815]] for the input [[0.0, 1.0]].
Is [[ 0.61094815]] the value with the highest probability that this network assigns to the input [[0.0, 1.0]] after training? Can the lower-probability values also be accessed, and not just the most probable one?
If I increase the number of training epochs I will get a better prediction, but in this case I just want to access all potential output values with their probabilities for a given input.
Update:
I have updated the code to use multi-class classification with softmax, but the prediction for [[0.0, 1.0, 0.0, 0.0]] is [array([0])]. Have I updated it correctly?
import numpy as np
import tensorflow as tf
sess = tf.InteractiveSession()
# a batch of inputs of 4 values each
inputs = tf.placeholder(tf.float32, shape=[None, 4])
# a batch of outputs of 3 values each
desired_outputs = tf.placeholder(tf.float32, shape=[None, 3])
desired_outputs = tf.placeholder(tf.float32, shape=[None, 3])
# [!] define the number of hidden units in the first layer
HIDDEN_UNITS = 4
weights_1 = tf.Variable(tf.truncated_normal([4, HIDDEN_UNITS]))
biases_1 = tf.Variable(tf.zeros([HIDDEN_UNITS]))
# connect 4 inputs to every hidden unit. Add bias
layer_1_outputs = tf.nn.softmax(tf.matmul(inputs, weights_1) + biases_1)
biases_2 = tf.Variable(tf.zeros([3]))
weights_2 = tf.Variable(tf.truncated_normal([HIDDEN_UNITS, 3]))
finalLayerOutputs = tf.nn.softmax(tf.matmul(layer_1_outputs, weights_2) + biases_2)
tf.global_variables_initializer().run()
logits = tf.nn.softmax(tf.matmul(layer_1_outputs, weights_2) + biases_2)
training_inputs = [[0.0, 0.0 , 0.0, 0.0], [0.0, 1.0 , 0.0, 0.0], [1.0, 0.0 , 0.0, 0.0], [1.0, 1.0 , 0.0, 0.0]]
training_outputs = [[0.0,0.0,0.0], [1.0,0.0,0.0], [1.0,0.0,0.0], [0.0,0.0,1.0]]
error_function = 0.5 * tf.reduce_sum(tf.subtract(logits, desired_outputs) * tf.subtract(logits, desired_outputs))
train_step = tf.train.GradientDescentOptimizer(0.05).minimize(error_function)
for i in range(15):
    _, loss = sess.run([train_step, error_function],
                       feed_dict={inputs: np.array(training_inputs),
                                  desired_outputs: np.array(training_outputs)})
prediction=tf.argmax(logits,1)
best = sess.run([prediction],feed_dict={inputs: np.array([[0.0, 1.0, 0.0, 0.0]])})
print(best)
Which prints [array([0])]
Update 2:
Replacing
prediction=tf.argmax(logits,1)
best = sess.run([prediction],feed_dict={inputs: np.array([[0.0, 1.0, 0.0, 0.0]])})
print(best)
With:
prediction=tf.nn.softmax(logits)
best = sess.run([prediction],feed_dict={inputs: np.array([[0.0, 1.0, 0.0, 0.0]])})
print(best)
appears to fix the issue.
So now full source is :
import numpy as np
import tensorflow as tf
sess = tf.InteractiveSession()
# a batch of inputs of 4 values each
inputs = tf.placeholder(tf.float32, shape=[None, 4])
# a batch of outputs of 3 values each
desired_outputs = tf.placeholder(tf.float32, shape=[None, 3])
# [!] define the number of hidden units in the first layer
HIDDEN_UNITS = 4
weights_1 = tf.Variable(tf.truncated_normal([4, HIDDEN_UNITS]))
biases_1 = tf.Variable(tf.zeros([HIDDEN_UNITS]))
# connect 4 inputs to every hidden unit. Add bias
layer_1_outputs = tf.nn.softmax(tf.matmul(inputs, weights_1) + biases_1)
biases_2 = tf.Variable(tf.zeros([3]))
weights_2 = tf.Variable(tf.truncated_normal([HIDDEN_UNITS, 3]))
finalLayerOutputs = tf.nn.softmax(tf.matmul(layer_1_outputs, weights_2) + biases_2)
tf.global_variables_initializer().run()
logits = tf.nn.softmax(tf.matmul(layer_1_outputs, weights_2) + biases_2)
training_inputs = [[0.0, 0.0 , 0.0, 0.0], [0.0, 1.0 , 0.0, 0.0], [1.0, 0.0 , 0.0, 0.0], [1.0, 1.0 , 0.0, 0.0]]
training_outputs = [[0.0,0.0,0.0], [1.0,0.0,0.0], [1.0,0.0,0.0], [0.0,0.0,1.0]]
error_function = 0.5 * tf.reduce_sum(tf.subtract(logits, desired_outputs) * tf.subtract(logits, desired_outputs))
train_step = tf.train.GradientDescentOptimizer(0.05).minimize(error_function)
for i in range(1500):
    _, loss = sess.run([train_step, error_function],
                       feed_dict={inputs: np.array(training_inputs),
                                  desired_outputs: np.array(training_outputs)})
prediction=tf.nn.softmax(logits)
best = sess.run([prediction],feed_dict={inputs: np.array([[0.0, 1.0, 0.0, 0.0]])})
print(best)
Which prints
[array([[ 0.49810624, 0.24845563, 0.25343812]], dtype=float32)]
Your current network does (logistic) regression, not really classification: given an input x, it tries to evaluate f(x) (where f(x) = x1 XOR x2 here, but the network does not know that before training), which is regression. To do so, it learns a function f1(x) and tries to make it as close as possible to f(x) on all your training samples. [[ 0.61094815]] is simply the value of f1([[0.0, 1.0]]). In this setting, there is no such thing as a "probability of being in a class", since there are no classes. There is only the user (you) choosing to interpret f1(x) as the probability of the output being 1. Since you have only 2 classes, that tells you that the probability of the other class is 1 - 0.61094815 (that is, you are doing classification with the output of the network, but the network itself is not really trained for that). Used this way, this method is a (widely used) trick to perform classification, but it only works if you have 2 classes.
A real network for classification would be built a bit differently: your logits would have shape (batch_size, number_of_classes), so (1, 2) in your case; you apply a softmax on them, and then the prediction is argmax(softmax), with probability max(softmax). You can then also get the probability of each class according to the network: probability(class i) = softmax[i]. Here the network really is trained to learn the probability of x being in each class.
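For instance, the classification head could look like this (a minimal sketch in the same TF 1.x style; it reuses layer_1_outputs and HIDDEN_UNITS from your code, while the one-hot label placeholder, the cross-entropy loss, and the learning rate are my choices for illustration):

# a minimal two-class classification head (TF 1.x style)
n_classes = 2
labels_oh = tf.placeholder(tf.float32, shape=[None, n_classes])  # one-hot labels
weights_out = tf.Variable(tf.truncated_normal([HIDDEN_UNITS, n_classes]))
biases_out = tf.Variable(tf.zeros([n_classes]))
logits = tf.matmul(layer_1_outputs, weights_out) + biases_out  # raw scores, no sigmoid here
probabilities = tf.nn.softmax(logits)     # per-class probabilities, each row sums to 1
prediction = tf.argmax(probabilities, 1)  # index of the most probable class
# the cross-entropy op expects raw logits, not softmax outputs
loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=labels_oh, logits=logits))
train_step = tf.train.GradientDescentOptimizer(0.05).minimize(loss)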
I am sorry if my explanation is obscure, or if the difference between regression between 0 and 1 and classification seems philosophical in a setting with 2 classes, but if you add more classes you will probably see what I mean.
EDIT
Answers to your two updates:
1. In your training samples, the labels (training_outputs) must be probability distributions, i.e. they must sum to 1 for each sample (99% of the time they are of the form (1, 0, 0), (0, 1, 0) or (0, 0, 1)), so your first output [0.0, 0.0, 0.0] is not valid. If you want to learn XOR on the first two inputs, then the first output should be the same as the last one: [0.0, 0.0, 1.0].
2. prediction=tf.argmax(logits,1) = [array([0])] is completely normal: logits contains your probabilities, and prediction is the predicted class, i.e. the class with the largest probability, which in your case is class 0. In your training set, [0.0, 1.0, 0.0, 0.0] is associated with the output [1.0, 0.0, 0.0], i.e. it is of class 0 with probability 1 and of the other classes with probability 0. After enough training, print(best) with prediction=tf.argmax(logits,1) on the input [1.0, 1.0, 0.0, 0.0] should give you [array([2])], 2 being the index of the class for this input in your training set.
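And to answer the earlier part of your question directly: yes, you can read out all the per-class probabilities, not only the argmax, by evaluating the softmax output itself (a minimal sketch reusing the names from your updated code, where the variable called logits already holds the softmax output):

# evaluate the full probability vector instead of only the argmax
probs = sess.run(logits, feed_dict={inputs: np.array([[0.0, 1.0, 0.0, 0.0]])})
print(probs)  # one row per input, one probability per class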
I have a batch of images as a tensor of size [batch_size, w, h].
I wish to get a histogram of the values in each column.
This is what I came up with, but it works only for the first image in the batch and it is also very slow:
global_hist = []
net = tf.squeeze(net)
for i in range(batch_size):
    global_hist.append([])
    for j in range(1024):
        hist = tf.histogram_fixed_width(tf.slice(net, [i, 0, j], [1, 1024, 1]),
                                        [0.0, 0.2, 0.4, 0.6, 0.8, 1.0], nbins=10)
        global_hist[i].append(hist)
Is there an efficient way to do this?
OK, so I found a solution (though it is rather slow and does not allow fixing the bin edges), but someone may find this useful.
nbins = 10
net = tf.squeeze(net)
for i in range(batch_size):
    local_hist = tf.expand_dims(tf.histogram_fixed_width(tf.slice(net, [i, 0, 0], [1, 1024, 1]),
                                                         [0.0, 1.0], nbins=nbins, dtype=tf.float32), [-1])
    for j in range(1, 1024):
        hist = tf.histogram_fixed_width(tf.slice(net, [i, 0, j], [1, 1024, 1]),
                                        [0.0, 1.0], nbins=nbins, dtype=tf.float32)
        hist = tf.expand_dims(hist, [-1])
        local_hist = tf.concat(1, [local_hist, hist])
    if i == 0:
        global_hist = tf.expand_dims(local_hist, [0])
    else:
        global_hist = tf.concat(0, [global_hist, tf.expand_dims(local_hist, [0])])
In addition, I found this link very useful:
https://stackoverflow.com/questions/41764199/row-wise-histogram/41768777#41768777
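Following the approach from that link, a fully vectorized version might look like this (a sketch in TF 1.x style, assuming the values lie in [0.0, 1.0) and net has shape [batch_size, w, h]; it turns every value into a bin index and sums one-hot encodings over the rows, so no Python loops are needed):

# vectorized per-column histograms: bucketize, one-hot, then sum over rows
nbins = 10
# map each value in [0, 1) to an integer bin index in [0, nbins)
bin_idx = tf.cast(tf.clip_by_value(tf.floor(net * nbins), 0, nbins - 1), tf.int32)
# one-hot over a new bin axis: shape [batch_size, w, h, nbins]
one_hot_bins = tf.one_hot(bin_idx, depth=nbins, dtype=tf.float32)
# sum over the row axis to get one histogram per column: [batch_size, h, nbins]
col_hists = tf.reduce_sum(one_hot_bins, axis=1)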