I am trying to implement a Keras regression model on a dataset for learning purposes. I took the data from the Kaggle Loan Default Prediction Challenge, and I am trying to predict whether a person will default on a loan.
The target column is imbalanced: the majority of observations have "0" as their value. I have tried the following approaches to overcome this imbalance: (a) downsampling the majority class, (b) upsampling the minority class, and (c) using the SMOTE algorithm. None of these approaches seems to help, and the model's predictions are still biased towards "0". I used the resample method from sklearn for the downsampling and upsampling.
What other approaches can I try to overcome this problem, achieve good accuracy on this data, and get realistic predictions from the model? I am sharing my code:
from keras.models import Sequential
from keras.layers import Dense
from keras.regularizers import L1L2
import pandas
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.impute import SimpleImputer
from sklearn.metrics import roc_auc_score
import statsmodels.api as sm
from sklearn import preprocessing as pre
train = pandas.read_csv('/train_v2.csv/train_v2.csv')
# Defining the target column
train_loss = train.loss
# Defining the features for the model
train = train[['f527','f528','f271']]
# Defining the imputer (SimpleImputer replaces the removed Imputer class; the default strategy is still the mean)
imp = SimpleImputer()
# Fitting the imputation function to the training dataset
imp.fit(train)
train = imp.transform(train)
train=pre.StandardScaler().fit_transform(train)
# Splitting the data into Training and Testing samples
X_train,X_test,y_train,y_test = train_test_split( train,
train_loss,test_size=0.3, random_state=42)
# logistic regression with L1 and L2 regularization
reg = L1L2(l1=0.01, l2=0.01)
model = Sequential()
model.add(Dense(13,kernel_initializer='normal', activation='relu',
kernel_regularizer=reg, input_dim=X_train.shape[1]))
model.add(Dense(6, kernel_initializer='normal', activation='relu'))
model.add(Dense(1, kernel_initializer='normal'))
# Compile model
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(X_train, y_train, epochs=10, validation_data=(X_test, y_test))
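The upsampling step described in the question can be sketched as follows. This is a minimal illustration in plain NumPy rather than the sklearn resample call the question used, and it assumes a binary 0/1 target. The function name `upsample_minority` and the toy arrays are hypothetical. Resampling is usually applied to the training split only, so that the test split keeps the original class ratio.

```python
import numpy as np

def upsample_minority(X, y, seed=42):
    """Resample the minority class with replacement until both classes
    have the same number of rows. Assumes y contains only 0 and 1."""
    rng = np.random.default_rng(seed)
    idx_pos = np.flatnonzero(y == 1)
    idx_neg = np.flatnonzero(y == 0)
    # The shorter index array is the minority class.
    minority, majority = sorted((idx_pos, idx_neg), key=len)
    extra = rng.choice(minority, size=len(majority) - len(minority), replace=True)
    keep = np.concatenate([majority, minority, extra])
    rng.shuffle(keep)
    return X[keep], y[keep]

# Hypothetical toy data: 95 negatives and 5 positives, 3 features.
X_toy = np.arange(300, dtype="float32").reshape(100, 3)
y_toy = np.array([0] * 95 + [1] * 5)

X_bal, y_bal = upsample_minority(X_toy, y_toy)
print(np.bincount(y_bal))  # class counts after balancing
```

With a balanced training set, accuracy alone is still a weak signal on the imbalanced test set; the roc_auc_score import already present in the question's code is one way to evaluate ranking quality instead.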
I am trying to solve the XOR problem using the following code:
import numpy as np
from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.layers import Dense, Input, Concatenate
from tensorflow.keras.utils import plot_model
from tensorflow.keras.optimizers import SGD, Adam
# input data
x = np.array([[0,0], [0,1], [1,0], [1,1]], 'float32')
y = np.array([[0], [1], [1], [0]], 'float32')
### Model
model = Sequential()
# add layers (architecture)
model.add(Dense(2, activation = 'relu'))
model.add(Dense(1, activation = 'sigmoid'))
# compile
model.compile(loss = 'mean_squared_error',
optimizer = SGD(learning_rate = 0.1, momentum=0.8),
metrics = ['accuracy'])
# train
model.fit(x, y, epochs = 25000, batch_size = 1)
# evaluate
ev = model.evaluate(x, y)
I already tested:
using different activation functions in the hidden layer (sigmoid and tanh)
using different learning rates and momentum
Also, I am running with a high number of epochs (25000). Still, it only predicts all outputs correctly some of the time. Most of the time the accuracy is equal to 0.5 or 0.75.
I have read that this is the minimum configuration to solve this problem. However, it also seems that the error surface presents a number of regions with local minima.
My question is:
Should I assume that the model is correct and can learn the problem, although it sometimes gets 'stuck' in a local minimum, OR do I still need to improve my model somehow to solve XOR more accurately and consistently?
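One way to separate "can the architecture represent XOR" from "does training find it" is to set the weights by hand. The sketch below uses plain NumPy and hand-picked weights (my own illustration, not the result of any training run) for the same 2-unit ReLU hidden layer and sigmoid output. If it reproduces the targets, the failed runs are more likely an optimization issue, for example an unlucky initialization or a ReLU unit that outputs zero on all four inputs and so receives no gradient, than a capacity issue.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype="float32")
y = np.array([[0], [1], [1], [0]], dtype="float32")

# Hand-picked weights (illustrative assumption, not learned).
W1 = np.array([[1.0, 1.0],
               [1.0, 1.0]])      # both hidden units receive x1 + x2
b1 = np.array([0.0, -1.0])       # h1 = relu(x1 + x2), h2 = relu(x1 + x2 - 1)
W2 = np.array([[1.0], [-2.0]])   # h1 - 2*h2 equals 0, 1, 1, 0 on the four inputs
b2 = np.array([-0.5])            # shift so the 0.5 threshold separates the classes
scale = 10.0                     # sharpens the sigmoid; any positive value works

hidden = relu(x @ W1 + b1)
prob = sigmoid(scale * (hidden @ W2 + b2))
pred = (prob > 0.5).astype("float32")

print(np.array_equal(pred, y))
```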
I have a one-hot-encoded sparse matrix which can't be converted to a dense matrix because of its size.
I would like to reduce the dimensions using an autoencoder. Currently I am trying to use Tensorflow and its Keras library for that.
The Tensorflow docs state that sparse tensors exist and that they can be used in Keras (see https://www.tensorflow.org/guide/sparse_tensor).
The problem is that none of the autoencoders I've found on the internet seem to work with sparse tensors.
I have prepared a small code example which stops after the first training epoch with the error message: "Failed to convert elements of SparseTensor to Tensor. Consider casting elements to a supported type.".
My questions are:
Do you have an idea for improving the code, or ideally an example I can look at?
If not: do you have other ideas on how to achieve this (e.g. another library, another method, etc.)?
Code Example:
#necessary imports
import tensorflow as tf
from keras.models import Model, Sequential
from keras.layers import Input, Dense, ActivityRegularization
from tensorflow.keras import backend as K
from tensorflow.keras import regularizers
#example one-hot-encoded matrix with 10 records with each one out of 4 distinct categories
sparse_tensor = tf.sparse.SparseTensor(indices=[[0,3], [1,3], [2,0], [3,1], [4,0], [5,2], [6,2], [7,1], [8,3], [9,1]],
values=[1 for i in range(10)],
dense_shape=[10, 4])
encoder = Sequential([
Input(shape=(4,), sparse=True),
Dense(1, activation = 'relu'),
ActivityRegularization(l1=1e-3)
])
decoder = Sequential([
Dense(4, activation = 'sigmoid', input_shape = (1, )),
])
autoencoder = Sequential([encoder, decoder])
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
autoencoder.fit(x=sparse_tensor, y=sparse_tensor, epochs=5, batch_size=5, shuffle=True)
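One possible workaround, sketched under the assumption that each record holds exactly one category (as in the ten-record example above): keep only the column index per record and build dense one-hot rows one batch at a time, so the full dense matrix never exists in memory. The function name `one_hot_batches` is hypothetical, and this does not address the SparseTensor error itself.

```python
import numpy as np

def one_hot_batches(category_idx, n_categories, batch_size):
    """Yield (inputs, targets) pairs of dense one-hot rows, one batch at a time."""
    n_rows = len(category_idx)
    for start in range(0, n_rows, batch_size):
        idx = category_idx[start:start + batch_size]
        batch = np.zeros((len(idx), n_categories), dtype="float32")
        batch[np.arange(len(idx)), idx] = 1.0
        # An autoencoder reconstructs its own input, so inputs and targets match.
        yield batch, batch

# Column indices of the ten records from the example above.
categories = np.array([3, 3, 0, 1, 0, 2, 2, 1, 3, 1])
batches = list(one_hot_batches(categories, n_categories=4, batch_size=5))
print(batches[0][0])
```

Keras `fit` accepts Python generators that yield (inputs, targets) tuples, so a generator along these lines could feed the autoencoder; this one makes a single pass over the data, so multi-epoch training would need it restarted or wrapped in a loop.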
Here is my code for distributed training via spark-tensorflow-distributor, which uses TensorFlow's MultiWorkerMirroredStrategy to train on multiple servers:
https://github.com/tensorflow/ecosystem/blob/master/spark/spark-tensorflow-distributor/spark_tensorflow_distributor/mirrored_strategy_runner.py
import sys
from spark_tensorflow_distributor import MirroredStrategyRunner
import mlflow.keras
mlflow.keras.autolog()
mlflow.log_param("learning_rate", 0.001)
import tensorflow as tf
import time
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_breast_cancer
def train():
    strategy = tf.distribute.experimental.MultiWorkerMirroredStrategy()
    #tf.distribute.experimental.CollectiveCommunication.NCCL
    model = None
    with strategy.scope():
        data = load_breast_cancer()
        X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, test_size=0.3)
        N, D = X_train.shape # number of observation and variables
        from sklearn.preprocessing import StandardScaler
        scaler = StandardScaler()
        X_train = scaler.fit_transform(X_train)
        X_test = scaler.transform(X_test)
        model = tf.keras.models.Sequential([
            tf.keras.layers.Input(shape=(D,)),
            tf.keras.layers.Dense(1, activation='sigmoid') # use sigmoid function for every epochs
        ])
        model.compile(optimizer='adam', # use adaptive momentum
                      loss='binary_crossentropy',
                      metrics=['accuracy'])
    # Train the Model
    r = model.fit(X_train, y_train, validation_data=(X_test, y_test))
    mlflow.keras.log_model(model, "mymodel")
MirroredStrategyRunner(num_slots=4, use_custom_strategy=True).run(train)
I noticed that saving via mlflow.keras.log_model produces 4 models in the Databricks experiment, and none of the 4 models is a good predictor.
If I change num_slots from 4 to 1, only 1 model is saved in the Databricks experiment, and that model is a good predictor during inference.
My question is:
Do I need an extra step to merge the 4 models into 1 model that predicts as well as the one trained with num_slots = 1? Or am I doing something wrong? I was expecting only the chief node to save a model.
So, you do not want to call log_model in all 4 of the Tensorflow workers. You want to log it in 1 of them. I believe you would use https://www.tensorflow.org/api_docs/python/tf/distribute/get_replica_context to figure out which worker you are, and perhaps only log if you are worker 0. That's what I do when using Horovod for a similar purpose.
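A rough sketch of the "only log on worker 0" idea follows. Note that it swaps in a different mechanism from the one linked above: instead of get_replica_context, it reads the TF_CONFIG environment variable, which TensorFlow multi-worker training uses to describe the cluster and the current task. That spark-tensorflow-distributor sets TF_CONFIG in each worker is my assumption, and the helper name `is_chief` is hypothetical.

```python
import json
import os

def is_chief(tf_config_json=None):
    """Return True if this process is the one that should log the model.

    Reads TF_CONFIG from the environment unless a JSON string is passed in.
    """
    raw = tf_config_json if tf_config_json is not None else os.environ.get("TF_CONFIG", "")
    if not raw:
        return True  # no cluster description: treat as single-process training
    config = json.loads(raw)
    task = config.get("task", {})
    task_type = task.get("type")
    task_index = int(task.get("index", 0))
    if "chief" in config.get("cluster", {}):
        # The cluster names an explicit chief, so worker 0 is not the chief.
        return task_type == "chief"
    return task_type == "worker" and task_index == 0

# Inside train(), the logging call could then be guarded (sketch only):
# if is_chief():
#     mlflow.keras.log_model(model, "mymodel")
```

One caveat, as far as I recall from the TensorFlow multi-worker tutorial: when saving with model.save directly, every worker calls save and the non-chief workers write to a temporary directory, so a guard like this may need adapting depending on how the logging call saves the model.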
You do not merge the models; they are the same model in all 4 replicas. That's the point of what this is doing.
If the model is 'worse' than with 1 replica, I would suspect other subtler issues are at play. For example, with 4 workers, your batch size has changed unless you compensate for that. See https://www.tensorflow.org/tutorials/distribute/multi_worker_with_keras#train_the_model for a discussion.
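The batch-size point can be made concrete with a little arithmetic. The sketch below assumes that the batch size given to fit is treated as the global batch and split evenly across replicas, and that the question's fit call, which passes no batch_size, falls back to the Keras default of 32. The helper names are hypothetical; in real code the replica count would come from strategy.num_replicas_in_sync.

```python
def per_replica_batch_size(global_batch, num_replicas):
    """Rows each replica sees per step when the global batch is split evenly."""
    return global_batch // num_replicas

def compensated_global_batch(per_replica_batch, num_replicas):
    """Global batch needed so each replica keeps its single-worker batch size."""
    return per_replica_batch * num_replicas

# With the default batch of 32 and num_slots=4, each replica would see 8 rows per step.
print(per_replica_batch_size(32, 4))    # → 8
# To keep 32 rows per replica, the global batch would need to be 128.
print(compensated_global_batch(32, 4))  # → 128
```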
I'm working on a project where I have 3 inputs (v, f, n) and 1 output (delta(t)).
I'm trying to test the effect of the inputs on the output and to figure out which input is the most effective in different situations, therefore I would like to predict new output values that depend on new inputs values.
I have been testing this system and I got the following data table:
This table contains 1000 rows.
I'm new to neural networks, so I don't know what the activation function, the loss function, etc. should be.
I've been trying to use some Keras models, but I get wrong predictions when I call model.predict() on some input values.
import numpy as np
import pandas as pd
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam
model = Sequential()
model.add(Dense(16, activation='relu', input_shape=(3,)))
model.add(Dense(16, activation='relu'))
model.add(Dense(1))
model.compile(optimizer=Adam(), loss='mse')
data = np.array(pd.read_excel(r'Data.xlsx'))
x = data[:, :3]
y = data[:, 3]
target = model.fit(x, y, validation_split=0.2, epochs=15000,
batch_size=256)
# check some predictions:
print(model.predict([[0.9, 840370875, 240]]))
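One thing that may be worth checking, judging only from the prediction input above: the three inputs differ by many orders of magnitude (0.9 versus 840370875), and the code feeds them to the network unscaled. Below is a minimal sketch of standardizing the inputs with statistics from the training data and applying the same transform to new points before predict. The first two rows are hypothetical placeholders, since the data table is not shown; only the last row comes from the question.

```python
import numpy as np

# Rows in (v, f, n) order. The first two rows are hypothetical placeholders.
x = np.array([[0.5, 6.0e8, 120.0],
              [0.7, 7.5e8, 180.0],
              [0.9, 840370875.0, 240.0]])

# Column statistics computed once, from the training inputs only.
mean = x.mean(axis=0)
std = x.std(axis=0)
std[std == 0] = 1.0  # guard against constant columns

def standardize(rows):
    """Apply the training-set mean and standard deviation to any input rows."""
    return (np.asarray(rows, dtype="float64") - mean) / std

x_scaled = standardize(x)                          # what would go into model.fit
new_point = standardize([[0.9, 840370875, 240]])   # what would go into model.predict
print(x_scaled.round(3))
```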
As a learning exercise, I'm trying to use an LSTM model with the Keras framework to predict the stock market based on multiple data points. The size of my input array is roughly [5000, 100]. Based on other questions on this site and articles online, the approach seems fairly standard: put the data in a numpy array, scale it, reshape it to 3 dimensions for the LSTM, split it into train and test sections, and feed it through the model. Running only the training portion of the model, I am consistently getting loss scores around 400,000,000. This is not changed by altering the batch size, the number of epochs, the number of layers, replacing the normalization with dropout layers, changing the sizes of each layer, or using different optimizers and loss functions. Any idea why the loss is so high and what I can do to fix that? Attached is the code. All advice is greatly appreciated.
import pandas as pd
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, losses, optimizers, Model, preprocessing
from keras.utils import plot_model
from sklearn.preprocessing import MinMaxScaler
import matplotlib.pyplot as plt
scaler = MinMaxScaler(feature_range=(0, 1))
features_df = pd.read_csv("dataset.csv")
features_np = np.array(features_df)
features_np.astype(np.float64)
scaler.fit_transform(features_np)
num_features=features_np.shape[1]
features = np.reshape(features_np, (features_np.shape[0], 1, features_np.shape[1]))
labels_np = np.array(pd.read_csv("output.csv"))
scaler.fit_transform(labels_np)
test_in = features_np[int(features_np.shape[0] * 0.75):]
test_in = np.reshape(test_in, (test_in.shape[0], 1, test_in.shape[1]))
test_out = labels_np[int(labels_np.shape[0] * 0.75):]
test_out = np.reshape(test_out, (test_out.shape[0], 1, test_out.shape[1]))
inputs = layers.Input(shape=(1, features.shape[2]))
x = layers.LSTM(5000, return_sequences=True)(inputs)
lstm1 = layers.LSTM(1000, return_sequences=True)(x)
norm1 = layers.BatchNormalization()(lstm1)
lstm2 = layers.LSTM(1000, return_sequences=True)(norm1)
lstm3 = layers.LSTM(1000, return_sequences=True)(lstm2)
norm2 = layers.BatchNormalization()(lstm3)
lstm4 = layers.LSTM(1000, return_sequences=True)(norm2)
lstm5 = layers.LSTM(1000)(lstm4)
dense1 = layers.Dense(1000, activation='relu')(lstm5)
dense2 = layers.Dense(1000, activation='sigmoid')(dense1)
outputs = layers.Dense(2)(dense2)
model = Model(inputs=inputs, outputs=outputs)
model.compile(optimizer='adam', loss='mean_squared_error')
model.fit(features, labels_np, epochs=1, batch_size=4)
evaluate = model.evaluate(test_in, test_out, verbose=2)
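A detail in the code above that may be relevant to the loss magnitude: both astype and fit_transform return a new array, and the code discards those return values, so features_np and labels_np appear to reach the model unscaled. The sketch below walks through the scale, split, and reshape steps from the question with the results assigned. It uses plain NumPy min-max scaling instead of MinMaxScaler so the arithmetic is visible, and the random array is a hypothetical stand-in for dataset.csv.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in for dataset.csv: 20 rows, 3 features on a large scale.
features_np = rng.uniform(1_000, 50_000, size=(20, 3))

# astype returns a new array, so the result has to be assigned.
features_np = features_np.astype(np.float64)

# Split first (75% train), then scale with statistics from the training rows.
split = int(features_np.shape[0] * 0.75)
train, test = features_np[:split], features_np[split:]

lo, hi = train.min(axis=0), train.max(axis=0)
span = np.where(hi > lo, hi - lo, 1.0)  # avoid dividing by zero on constant columns
train_scaled = (train - lo) / span
test_scaled = (test - lo) / span        # reuse the training statistics

# Reshape to (samples, timesteps=1, features) for the LSTM input.
train_3d = train_scaled.reshape(train_scaled.shape[0], 1, train_scaled.shape[1])
test_3d = test_scaled.reshape(test_scaled.shape[0], 1, test_scaled.shape[1])

print(train_3d.shape, test_3d.shape)
```

The labels would need the same treatment with their own minimum and maximum, kept separately so predictions can be mapped back to the original scale.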
While I have not solved the error, implementing the Sequential() model and using only two LSTM layers and a Dense layer changed the error: the training error is now very low while testing remains high. This now appears to be a (relatively) simple problem of overfitting rather than the more confusing error of high training loss. Hopefully, this helps anyone having a similar problem.
There are two things I notice and don't understand. The first is the dense2 layer with sigmoid activation. I don't think sigmoid activation is beneficial when solving a regression problem. Can you change it to relu and see what happens? The second is that your output layer has two units. You did not specify this, but I think you are predicting two values from the same inputs. If you are trying to predict just one value, you should change that to
outputs = layers.Dense(1)(dense2)