TensorFlow - ValueError: Failed to convert a NumPy array to a Tensor when creating a dataset from a split array

So I have this weird bug when I create a tf.data.Dataset from tensor slices, like so:
train = AB.copy()
test = train.sample(2000)
train = train[~(train.A.isin(test.A))]
x_train = train.A_image.to_numpy()
y_train = train.Label.to_numpy()
x_test = test.A_image.to_numpy()
y_test = test.Label.to_numpy()
dataset = tf.data.Dataset.from_tensor_slices(({'image_input': x_train, 'label_input': y_train}, y_train))
ds_test = tf.data.Dataset.from_tensor_slices(({'image_input': x_test, 'label_input': y_test}, y_test))
Creating dataset works, but creating ds_test throws the error in the title.
I checked that both are valid arrays with the same shape.
Sometimes, after I restart the runtime, the same code works.
What could be the issue here?
Thanks for looking at it!
Edit: converting x_test first to a list and then to an array works.
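A likely explanation, offered as an assumption because the DataFrame AB is not shown: if each cell of A_image holds its own image array, Series.to_numpy() returns a 1-D array of dtype object whose elements are arrays, and TensorFlow cannot convert such an array into a tensor. Going through a list (as in the edit) or calling np.stack builds a single numeric array instead. The sketch below builds the object array by hand as a stand-in for train.A_image.to_numpy(), with a hypothetical 4x4 image size:

```python
import numpy as np

# Hypothetical stand-in for train.A_image.to_numpy(): a 1-D object array
# whose elements are equally shaped image arrays (4x4 is an assumed size).
x_obj = np.empty(5, dtype=object)
for i in range(5):
    x_obj[i] = np.full((4, 4), i, dtype=np.float32)

print(x_obj.dtype, x_obj.shape)  # object (5,)

# np.stack joins the elements along a new first axis, giving one numeric
# array of shape (n, 4, 4) that tf.data.Dataset.from_tensor_slices can slice.
x = np.stack(x_obj)
print(x.dtype, x.shape)  # float32 (5, 4, 4)

# Equivalent to the workaround from the edit: list first, then array.
x_via_list = np.array(list(x_obj))
assert x_via_list.shape == x.shape
```

Printing x_test.dtype before building ds_test is a quick way to check whether this is what is happening.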

Related

XGBoost iterative training: Not having all 0,...,C labels in minibatch without erroring

When training XGBoost iteratively on data too large to fit in memory, one may want to use batches. The problem, however, is that a batch may not contain all of the labels 0, ..., C. This leads to the error ValueError: The label must consist of integer labels of form 0, 1, 2, ..., [num_class-1].
Is there a way to train XGBoost where we just have some subset of the labels, which may not contain zero?
The code has structure similar to this:
import xgboost as xgb
from sklearn.metrics import accuracy_score
train = module.trainloader
test = module.valloader
# Train on one minibatch to get started
sample = next(iter(train))
X = xgb.DMatrix(sample[0].numpy(), label=sample[1].numpy())
params = {
    'learning_rate': 0.007,
    'updater': 'refresh',
    'process_type': 'update',
}
# Get initial model training
model = xgb.train(params, dtrain=X)
for i, (trainsample, valsample) in enumerate(zip(train, test)):
    X_train, y_train = trainsample
    X_test, y_test = valsample
    X_train = xgb.DMatrix(X_train, label=y_train)
    X_test = xgb.DMatrix(X_test)
    model = xgb.train(params, dtrain=X_train, xgb_model=model)
    y_pred = model.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    print(accuracy)

Keras functional API input shape error: LSTM layer received 2D instead of 3D shape

I am using the Keras functional API, but I'm getting an error about the input shape of the model:
ValueError: Input 0 is incompatible with layer financial_model: expected shape=(None, 1, 62), found shape=(1, 62)
samples = np.array(samples, dtype=np.float64)
labels = np.array(labels, dtype=np.uint8)
x_train, x_test, y_train, y_test = train_test_split(samples, labels, test_size=0.33,
random_state=42)
min_max = MinMaxScaler()
x_train = min_max.fit_transform(x_train)
lstm_input = np.expand_dims(x_train, axis=1).shape
inputs = keras.Input(shape=(lstm_input[1],lstm_input[2]))
hidden = keras.layers.LSTM(lstm_input[2], activation='tanh')(inputs)
output = keras.layers.Dense(2)(hidden)
model = keras.Model(inputs=inputs, outputs=output, name="financial_model")
model.compile(
loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
optimizer=keras.optimizers.Adam(learning_rate=0.001),
metrics=["accuracy"],
)
model.summary()
history = model.fit(x_train, y_train, batch_size=1, epochs=5, validation_split=0.2)
I've learned from similar questions that the batch size is omitted from the input shape. How do I feed a 3-dimensional input into the LSTM layer when the batch size is left out of the Input object?
Since I have fewer than 50 reputation points, I cannot comment. I'm not sure about this, but as the error says, your input shape is wrong: you have to add another dimension to it. Try something like this:
inputs = keras.Input(shape=(lstm_input[1],lstm_input[2], 1))
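Reading the error message the other way round: the model already expects batches of shape (None, 1, 62), and it is the data that arrives as (1, 62), because the question computes np.expand_dims(x_train, axis=1).shape but still passes the 2-D x_train to model.fit. A sketch of reshaping the data instead of the Input, with random values standing in for the scaled features (62 columns, as in the error message):

```python
import numpy as np

# Random stand-in for the scaled training features: 10 samples x 62 features.
rng = np.random.default_rng(42)
x_train = rng.random((10, 62))

# LSTM layers take (batch, timesteps, features); add a timestep axis of length 1
# to the data itself, not only to the shape used to build keras.Input.
x_train_3d = np.expand_dims(x_train, axis=1)
print(x_train_3d.shape)  # (10, 1, 62)

# keras.Input(shape=(1, 62)) then matches, and the call would become
# model.fit(x_train_3d, y_train, ...). x_test needs the same treatment
# (min_max.transform, then np.expand_dims) before evaluation.
```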

How to use cross-validation after a train/test split

I used k-fold cross-validation on the training set after splitting the data into train and test sets, but this gives an error that I think is due to indexing after the train/test split. Below is the code I used. How do I reset the index after the train/test split? Any other suggestions for dealing with this error would be greatly appreciated. I have already tried df.reset_index(), but this gives the error AttributeError: 'numpy.ndarray' object has no attribute 'reset_index'.
Thank you.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=99)
# k-fold cross validation
scores = list()
kfold = KFold(n_splits=10, shuffle=True)
# enumerate splits
for train_ix, test_ix in kfold.split(X_train):
    train_X, test_X = X_train[train_ix], X_train[test_ix]
    train_y, test_y = y_train[train_ix], y_train[test_ix]
    # fit model
    model = LinearRegression()
    model.fit(train_X, train_y)
    # evaluate model
    yhat = model.predict(test_X)
    score = np.sqrt(metrics.mean_absolute_error(yhat, test_y))
    print('Fold score : {}'.format(score))
KeyError: "Passing list-likes to .loc or [] with any missing labels is no longer supported. The following labels were missing: Int64Index([ 3, 9, 10, 17, 19,\n ...\n 41050, 41056, 41060, 41101, 41120],\n dtype='int64', length=3708).
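A sketch of a likely fix, assuming y_train (and possibly X_train) is a pandas object: KFold.split yields positional indices 0..n-1, while after train_test_split the pandas index keeps the original, shuffled row labels, so y_train[train_ix] looks the numbers up as labels and some are missing. Selecting rows by position with .iloc avoids this (so does y_train.reset_index(drop=True) on the Series, which, unlike a NumPy array, has reset_index). The data is synthetic, and the metric is switched to RMSE (np.sqrt of mean_squared_error); that this is what the np.sqrt in the question intended is an assumption.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold, train_test_split

# Synthetic stand-in for the question's X and y.
rng = np.random.default_rng(99)
X = pd.DataFrame(rng.normal(size=(200, 3)), columns=["f1", "f2", "f3"])
y = 2.0 * X["f1"] - X["f3"] + rng.normal(scale=0.1, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=99)

scores = []
kfold = KFold(n_splits=10, shuffle=True, random_state=0)
for train_ix, test_ix in kfold.split(X_train):
    # KFold returns positions; .iloc selects rows by position, so the
    # shuffled index labels left by train_test_split no longer matter.
    train_X, test_X = X_train.iloc[train_ix], X_train.iloc[test_ix]
    train_y, test_y = y_train.iloc[train_ix], y_train.iloc[test_ix]
    model = LinearRegression().fit(train_X, train_y)
    yhat = model.predict(test_X)
    scores.append(np.sqrt(mean_squared_error(test_y, yhat)))  # per-fold RMSE

print("Mean RMSE:", np.mean(scores))
```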

ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type float)

I'm facing a problem with this line of code in Keras with the TensorFlow 2.0 backend:
loss_out = Lambda(function=ctc_lambda_func, name='ctc', output_shape=(1,))([y_pred, Y_train, X_train_length, label_length])
Y_train and X_train_length are numpy.ndarrays, while y_pred and label_length are of class 'tensorflow.python.framework.ops.Tensor'.
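You can use tf.convert_to_tensor(). For example, cast the NumPy inputs to float32 first (an object-dtype array cannot be converted) and pass the resulting tensors to the Lambda layer:
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Lambda
# np.asarray(..., np.float32) raises a ValueError if the rows have different lengths
Y_train_tf = tf.convert_to_tensor(np.asarray(Y_train, np.float32))
X_train_length_tf = tf.convert_to_tensor(np.asarray(X_train_length, np.float32))
loss = Lambda(function=ctc_lambda_func, name='ctc', output_shape=(1,))(
    [y_pred, Y_train_tf, X_train_length_tf, label_length])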
You can also create dummy Input layers for the NumPy arrays:
# you have defined the rest of your graph somewhere here
Y_train = Input(shape=...)
X_train_length = Input(shape=...)
loss = Lambda(function=ctc_lambda_func, name='ctc', output_shape=(1,))(
    [y_pred, Y_train, X_train_length, label_length])
# defining the model is slightly different with multiple inputs
training_model = Model(inputs=[image_input, Y_train, X_train_length], outputs=[loss])
When you train the model, pass x as a list of length 3, such as
x = [<images - np.ndarray shape (batch, h, w, c)>, <Y_train inputs - np.ndarray>,
<X_train_length inputs - np.ndarray>]
and, of course, dummy values for y, since the loss is computed inside the model:
y = np.zeros((batch, 1))
Training is then as simple as training_model.train_on_batch(x, y).
Alternatively, write a generator that yields x and y in the form described above and pass it to training_model.fit(data_generator) (training_model.fit_generator(data_generator) in older Keras versions; it is deprecated in newer TensorFlow 2 releases).
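On the error itself: "Unsupported object type float" typically means the NumPy array has dtype object, for example when Y_train holds label sequences of different lengths. Assuming that is the case here (the question does not show how Y_train is built), one way to get a numeric array is to pad every sequence to the same length and keep the true lengths separately, in the (samples, 1) layout that tf.keras.backend.ctc_batch_cost uses for label_length. The helper pad_sequences_np and the padding value 0.0 are hypothetical choices for this sketch; Keras also ships its own pad_sequences utility.

```python
import numpy as np

def pad_sequences_np(seqs, pad_value=0.0, dtype=np.float32):
    """Hypothetical helper: pad variable-length sequences into one numeric array."""
    lengths = np.array([len(s) for s in seqs], dtype=np.int64)
    padded = np.full((len(seqs), lengths.max()), pad_value, dtype=dtype)
    for i, s in enumerate(seqs):
        padded[i, : len(s)] = s  # copy the real labels, leave the rest padded
    # lengths reshaped to (n, 1), matching the label_length layout
    return padded, lengths.reshape(-1, 1)

# Ragged label sequences like these would otherwise end up in an object array.
ragged = [[3.0, 1.0, 4.0], [1.0, 5.0], [9.0]]
Y_train_padded, label_length = pad_sequences_np(ragged)
print(Y_train_padded.dtype, Y_train_padded.shape)  # float32 (3, 3)
print(label_length.ravel())
```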

Keras, IndexError: indices are out-of-bounds

I'm trying to implement this simple neural network with Keras (TensorFlow backend):
x_train = df_train[["Pclass", "Gender", "Age","SibSp", "Parch"]]
y_train = df_train["Survived"]
x_test = df_test[["Pclass", "Gender", "Age","SibSp", "Parch"]]
y_test = df_test["Survived"]
y_train = y_train.values
y_test = y_test.values
But when I run this part:
model = Sequential()
model.add(Dense(input_dim=5, output_dim=1))
model.add(Activation("softmax"))
model.compile(loss='sparse_categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])
model.fit(x_train, y_train)
I get this error: IndexError: indices are out-of-bounds. I suppose it is related to the arguments in model.fit(x_train, y_train). I have tried passing them as NumPy arrays via .values, but I still get the same error.
Keras expects NumPy arrays, not pandas objects, so you need to convert all of the data you feed into the Keras APIs, not just y_train and y_test.
So:
x_train = x_train.values
y_train = y_train.values
x_test = x_test.values
y_test = y_test.values
Or
import numpy
x_train = numpy.asarray(x_train)
y_train = numpy.asarray(y_train)
x_test = numpy.asarray(x_test)
y_test = numpy.asarray(y_test)
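Separately from the pandas conversion, the model definition itself can produce an out-of-bounds index: with a single output unit, softmax always returns 1.0, and sparse categorical cross-entropy reads the predicted probability at column label, so a label of 1 has no column to read. The NumPy sketch below mimics that lookup with random logits; the softmax helper is written here for illustration and is not Keras code. The usual remedies are Dense(2) with softmax, or Dense(1) with sigmoid and binary_crossentropy.

```python
import numpy as np

def softmax(z):
    # subtract the row max for numerical stability
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
labels = np.array([0, 1, 1, 0])  # binary target such as "Survived"
rows = np.arange(len(labels))

# One output unit, as in Dense(input_dim=5, output_dim=1) + softmax
probs_1 = softmax(rng.normal(size=(4, 1)))
print(probs_1.ravel())  # every entry is 1.0, whatever the input

# Sparse categorical cross-entropy reads probs[i, label_i]; column 1 does not exist
raised = False
try:
    probs_1[rows, labels]
except IndexError as err:
    raised = True
    print("IndexError:", err)

# Two output units: one column per class, so the lookup is valid
probs_2 = softmax(rng.normal(size=(4, 2)))
loss = -np.mean(np.log(probs_2[rows, labels]))
print("loss with 2 units:", loss)
```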