How to use model.predict on new data after training a TensorFlow model?

I followed the regression guide found here:
https://stackabuse.com/tensorflow-2-0-solving-classification-and-regression-problems/
using this dataset:
https://drive.google.com/file/d/1mVmGNx6cbfvRHC_DvF12ZL3wGLSHD9f_/view
and ended up with this code:
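import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model
data = pd.read_csv(r'path')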
X = data.iloc[:, 0:4].values
y = data.iloc[:, 4].values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
input_layer = Input(shape=(X.shape[1],))
dense_layer_1 = Dense(100, activation='relu')(input_layer)
dense_layer_2 = Dense(50, activation='relu')(dense_layer_1)
dense_layer_3 = Dense(25, activation='relu')(dense_layer_2)
output = Dense(1)(dense_layer_3)
model = Model(inputs=input_layer, outputs=output)
model.compile(loss="mean_squared_error" , optimizer="adam", metrics=["mean_squared_error"])
history = model.fit(X_train, y_train, batch_size=2, epochs=100, verbose=1, validation_split=0.2)
from sklearn.metrics import mean_squared_error
from math import sqrt
pred_train = model.predict(X_train)
print(np.sqrt(mean_squared_error(y_train,pred_train)))
pred = model.predict(X_test)
print(np.sqrt(mean_squared_error(y_test,pred)))
Everything works and the model gets trained, but how do I actually use it? I want to input 4 numbers and get a prediction in return. For example, take the array [9, 4554, 1950, 0.634] and get the predicted value. No matter what I do, the model won't accept the data I am using.
Thanks for the help!

As I understand it, the main problem is the dimension of your input. The array [9,...,0.634] has shape (4,), which is 1D, while model.predict expects 2D input shaped like X_train and X_test, with one row per sample. As per the documentation, you have to convert the 1D array to 2D.
How to convert it:
import numpy as np
X_test=[9,...,0.634]
X_test=np.array(X_test)
X_test=X_test.reshape(1,4)
model.predict(X_test)
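The reshape step can be checked on its own with the example values from the question. This is only a sketch: the final, commented-out line assumes that `sc` and `model` are the fitted StandardScaler and the trained model from the question's code. Since the model was trained on scaled features, a new sample would most likely need the same `sc.transform` before prediction.

```python
import numpy as np

# Example values taken from the question; a single sample starts out 1D.
sample = np.array([9, 4554, 1950, 0.634], dtype="float32")
print(sample.shape)  # (4,)

# Keras predicts on batches, so add a leading batch axis.
# reshape(1, -1) means "one row, infer the number of columns".
batch = sample.reshape(1, -1)
print(batch.shape)  # (1, 4)

# Assumption: `sc` and `model` come from the question's training code.
# prediction = model.predict(sc.transform(batch))
```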

Related

Facing errors when using gridsearch CV with Keras model

I need to run gridsearch CV on a Keras model but keep running into the following error:
TypeError: Only integers, slices (:), ellipsis (...), tf.newaxis (None) and scalar tf.int32/tf.int64 tensors are valid indices, got array([20000, 20001, 20002, ..., 59997, 59998, 59999])
on line grid_result = grid.fit(x_train, y_train)
The code to run the Gridsearch CV is as follows:
batch_size = 128
epochs = 20
model_CV = KerasClassifier(build_fn=create_model,epochs=epochs,batch_size=batch_size, verbose=0)
# define the grid search parameters
init_mode = ['uniform', 'normal', 'he_normal','he_uniform']
param_grid = dict(init_mode=init_mode)
grid = GridSearchCV(estimator=model_CV,param_grid=param_grid, cv=3)
grid_result = grid.fit(x_train, y_train)
create_model used above
def create_model(init_mode='uniform'):
    model = Sequential()
    model.add(Dense(64, kernel_initializer=init_mode,
                    activation=tf.nn.relu, input_dim=784))
    model.add(Dropout(rate=0.5))
    model.add(Dense(64, kernel_initializer=init_mode,
                    activation=tf.nn.relu))
    model.add(Dense(10, kernel_initializer=init_mode, activation=tf.nn.softmax))
    # compile model
    model.compile(loss='categorical_crossentropy',
                  optimizer=RMSprop(),
                  metrics=['accuracy'])
    return model
Data Source
mnist = keras.datasets.mnist
(x_train, y_train),(x_test, y_test) = mnist.load_data()
Data Preprocessing
flatten = tf.keras.layers.Flatten(input_shape=[28,28])
x_train = flatten(x_train)
x_train = x_train / 255
from tensorflow.keras.utils import to_categorical
y_train = to_categorical(y_train, num_classes = num_classes)
I tried changing y_train by flattening it or not running to_categorical on y_train but I still run into the same issue.
Is the problem with x_train or y_train and how can I fix it? Thank you for any help provided.
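One likely cause, judging from the error text: calling the Flatten layer directly returns a TensorFlow tensor, and tensors reject the integer index arrays that scikit-learn uses to select cross-validation folds. Below is a minimal sketch of flattening with NumPy instead, so that x_train stays an ndarray. The random array is a hypothetical stand-in for the MNIST images, and `fold_idx` only imitates the kind of index array GridSearchCV would pass.

```python
import numpy as np

# Hypothetical stand-in for the MNIST images: 60 samples of 28x28 pixels.
x_train = np.random.randint(0, 256, size=(60, 28, 28)).astype("uint8")

# Flatten with NumPy rather than a Keras layer so the result stays an ndarray.
x_flat = x_train.reshape(len(x_train), -1).astype("float32") / 255

# Cross-validation picks each fold with an integer index array;
# an ndarray supports this kind of indexing.
fold_idx = np.arange(20, 60)
fold = x_flat[fold_idx]

print(type(x_flat).__name__, x_flat.shape, fold.shape)
```

If this is indeed the cause, passing the NumPy-flattened array to grid.fit should avoid the TypeError.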

Different results between training and loading autokeras-model

I trained a regression model with autokeras, resulting in a model with an MAE of 0.2, using the following code, where x and y are the input and output dataframes:
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=1)
search = StructuredDataRegressor(max_trials=1000, loss='mean_squared_error', max_model_size=100000000000, overwrite = True)
search.fit(x=X_train, y=y_train, verbose=2, validation_data=(X_test, y_test))
model = search.export_model()
model.summary()
model.save('model_best')
Feeding my data back into the loaded model gives an MAE of about 30 with nonsensical predictions. My test output values are in the range of 3 to 10, while the predicted output values are in the range of -10 to 5.
model = load_model("model_best2", custom_objects=ak.CUSTOM_OBJECTS)
mae, _ = model.evaluate(x, y, verbose=2)
print('MAE: %.3f' % mae)
These results are reproducible with any model exported from autokeras. Do you have any idea why the training and evaluation results are so different?
I created a minimal example that gives similarly bad results, so you can try it yourself:
from numpy import asarray
from pandas import read_csv
from sklearn.model_selection import train_test_split
from autokeras import StructuredDataRegressor
import matplotlib.pyplot as plt
# load dataset
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/auto-insurance.csv'
dataframe = read_csv(url, header=None)
data = dataframe.values
data = data.astype('float32')
X, y = data[:, :-1], data[:, -1]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=1)
search = StructuredDataRegressor(max_trials=15, loss='mean_absolute_error')
search.fit(x=X_train, y=y_train, verbose=2)
mae, _ = search.evaluate(X_test, y_test, verbose=2)
print('MAE: %.3f' % mae)
predictions = search.predict(X)
miny = float(y.min())
maxy = float(y.max())
minp = float(min(predictions))
maxp = float(max(predictions))
plt.figure(figsize=(15,15))
plt.scatter(y, predictions, c='crimson',s=5)
p1 = max(maxp, maxy)
p2 = min(minp, miny)
plt.plot([p1, 0], [p1, 0], 'b-')
plt.xlabel('True Values', fontsize=15)
plt.ylabel('Predictions', fontsize=15)
plt.axis('equal')
plt.show()

LSTM for imbalanced time series classification

I wanted to fit a simple LSTM model to perform binary classification on multivariate time series data. Since my data is severely imbalanced, I used sklearn's class_weight utility to pass class weights to my model. However, I got a pretty high loss value, and it did not decrease from epoch to epoch. My F1 score was 0.018, which is extremely low as well. I would appreciate your suggestions!
Sample data:
sequence_length = 10
def generate_data(X, y, sequence_length = 10, step = 1):
    X_local = []
    y_local = []
    # iterate over the X argument rather than the global `data`
    for start in range(0, len(X) - sequence_length, step):
        end = start + sequence_length
        X_local.append(X[start:end])
        y_local.append(y[end-1])
    return np.array(X_local), np.array(y_local)
X_sequence, y = generate_data(data.loc[:, "V1":"V4"].values, data.Class)
model = keras.Sequential()
model.add(LSTM(100, input_shape = (10, 4)))
model.add(Dropout(0.5))
model.add(Dense(1, activation="sigmoid"))
model.compile(loss="binary_crossentropy",
              metrics=[keras.metrics.binary_accuracy],
              optimizer="adam")
model.summary()
training_size = int(len(X_sequence) * 0.7)
X_train, y_train = X_sequence[:training_size], y[:training_size]
X_test, y_test = X_sequence[training_size:], y[training_size:]
from sklearn.utils import class_weight
# recent scikit-learn versions require keyword arguments here
class_weights = dict(zip(np.unique(y_train), class_weight.compute_class_weight(class_weight='balanced', classes=np.unique(y_train), y=y_train)))
model.fit(X_train, y_train, batch_size=64, epochs=50,class_weight=class_weights)
model.evaluate(X_test, y_test)
y_test_prob = model.predict(X_test, verbose=1)
y_test_pred = np.where(y_test_prob > 0.5, 1, 0)
from sklearn.metrics import f1_score
f1_score(y_test, y_test_pred)
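For reference, scikit-learn's 'balanced' heuristic weights each class by n_samples / (n_classes * class_count). The sketch below reproduces that formula with plain NumPy on made-up labels (the 95/5 split is a hypothetical example, not the question's data), which can help when checking whether the weights passed to model.fit look sensible.

```python
import numpy as np

# Hypothetical labels: 95 negatives and 5 positives.
y_train = np.array([0] * 95 + [1] * 5)

# 'balanced' weighting: n_samples / (n_classes * count of each class)
classes, counts = np.unique(y_train, return_counts=True)
weights = len(y_train) / (len(classes) * counts)

# Keras expects a dict mapping class index to weight.
class_weights = {int(c): float(w) for c, w in zip(classes, weights)}
print(class_weights)
```

With this split the rare class gets a weight of 10 and the common class a weight of about 0.53, so each positive sample counts roughly 19 times as much in the loss.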

Why is this model giving me a "Value Error"?

It's giving me a value error that I don't understand. Here is what it says:
"Error when checking model input: the list of Numpy arrays that you are passing to your model is not the size the model expected. Expected to see 2 array(s), but instead got the following list of 1 arrays"
My data has 8 columns and I'm trying to predict the last 2 as output.
This is a ranking algorithm that I'm experimenting with on my own data:
import pandas as pd
import keras
import numpy as np
from sklearn.model_selection import train_test_split
from keras import backend
from keras.layers import Activation, Dense, Input, Subtract
from keras.models import Model
INPUT_DIM = 7
# Model.
h_1 = Dense(128, activation="relu")
h_2 = Dense(64, activation="relu")
h_3 = Dense(32, activation="relu")
s = Dense(1)
# Relevant document score.
rel_doc = Input(shape=(INPUT_DIM,), dtype="float32")
h_1_rel = h_1(rel_doc)
h_2_rel = h_2(h_1_rel)
h_3_rel = h_3(h_2_rel)
rel_score = s(h_3_rel)
# Irrelevant document score.
irr_doc = Input(shape=(INPUT_DIM,), dtype="float32")
h_1_irr = h_1(irr_doc)
h_2_irr = h_2(h_1_irr)
h_3_irr = h_3(h_2_irr)
irr_score = s(h_3_irr)
# Subtract scores.
diff = Subtract()([rel_score, irr_score])
# Pass difference through sigmoid function.
prob = Activation("sigmoid")(diff)
# Build model.
model = Model(inputs=[rel_doc, irr_doc], outputs=prob)
model.compile(optimizer="adadelta", loss="binary_crossentropy")
# data.
data=pd.read_csv('ranking_dataset_remastered.csv')
print (data.head())
X = data.iloc[:, 1:7]
y = data.iloc[:, 6:7]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
# Train model.
NUM_EPOCHS = 20
BATCH_SIZE = 512
history = model.fit(X_train, y_train, batch_size=BATCH_SIZE,
epochs=NUM_EPOCHS, verbose=1)
# Generate scores from document/query features.
get_score = backend.function([rel_doc], [rel_score])
get_score([X_train])
get_score([y_train])
When you defined your model with this line:
model = Model(inputs=[rel_doc, irr_doc], outputs=prob)
You created what Keras refers to as a multi-input model, which means that your model expects more than one input (in your case two: rel_doc and irr_doc).
However, during training you are passing just one input, X_train:
history = model.fit(X_train, y_train, batch_size=BATCH_SIZE,
epochs=NUM_EPOCHS, verbose=1)
To make it work, you need two arrays, one representing the relevant documents and one the irrelevant documents, and you need to feed both to the model during training like this:
history = model.fit([X_rel_train, X_irr_train], y_train, batch_size=BATCH_SIZE,
epochs=NUM_EPOCHS, verbose=1)
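How the two arrays are built depends on the dataset. As a rough sketch, assuming each document has a binary relevance label, every relevant row can be paired with every irrelevant row. The random features, the `relevant` labels, and the all-ones target are hypothetical choices for illustration; real ranking data would normally form pairs only within the same query.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 10 documents, 7 features each, binary relevance labels.
features = rng.normal(size=(10, 7)).astype("float32")
relevant = np.array([1, 0, 1, 0, 0, 1, 0, 0, 1, 0])

rel_rows = features[relevant == 1]  # 4 relevant documents
irr_rows = features[relevant == 0]  # 6 irrelevant documents

# Pair every relevant document with every irrelevant one.
rel_idx, irr_idx = np.meshgrid(
    np.arange(len(rel_rows)), np.arange(len(irr_rows)), indexing="ij"
)
X_rel_train = rel_rows[rel_idx.ravel()]
X_irr_train = irr_rows[irr_idx.ravel()]

# Target 1 means "the first input should outrank the second".
y_train = np.ones(len(X_rel_train), dtype="float32")

print(X_rel_train.shape, X_irr_train.shape, y_train.shape)
```

Both input arrays have the same number of rows, which is what a two-input model needs when they are passed as [X_rel_train, X_irr_train].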

Keras, IndexError: indices are out-of-bounds

I'm trying to implement this simple neural network with Keras (TensorFlow backend):
x_train = df_train[["Pclass", "Gender", "Age","SibSp", "Parch"]]
y_train = df_train ["Survived"]
x_test = df_test[["Pclass", "Gender", "Age","SibSp", "Parch"]]
y_test = df_test["Survived"]
y_train = y_train.values
y_test = y_test.values
But when I run this part:
model = Sequential()
model.add(Dense(input_dim=5, output_dim=1))
model.add(Activation("softmax"))
model.compile(loss='sparse_categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])
model.fit(x_train, y_train)
I get this error: IndexError: indices are out-of-bounds. I suppose it is related to the arguments in model.fit(x_train, y_train). I have tried to pass these as numpy arrays using .values, but I still get the same error.
Keras expects numpy arrays, not pandas objects, so you need to convert all of the data that you feed into the Keras APIs, not just y_train and y_test.
So:
x_train = x_train.values
y_train = y_train.values
x_test = x_test.values
y_test = y_test.values
Or
x_train = numpy.asarray(x_train)
y_train = numpy.asarray(y_train)
x_test = numpy.asarray(x_test)
y_test = numpy.asarray(y_test)
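A quick way to confirm the conversion worked is to check the type and shape of each array before calling model.fit. The small DataFrame below is a hypothetical stand-in for df_train with only two of the feature columns; `to_numpy` is the newer pandas spelling of `.values`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df_train from the question.
df_train = pd.DataFrame({
    "Pclass": [3, 1, 3],
    "Age": [22.0, 38.0, 26.0],
    "Survived": [0, 1, 1],
})

# Convert both features and labels to NumPy arrays.
x_train = df_train[["Pclass", "Age"]].to_numpy(dtype="float32")
y_train = df_train["Survived"].to_numpy()

print(type(x_train).__name__, x_train.shape)  # ndarray (3, 2)
print(type(y_train).__name__, y_train.shape)  # ndarray (3,)
```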