Difficulty with stacking MNIST and Fashion_MNIST - tensorflow

I know this is basic and probably too easy for you, but I'm a beginner and need your help.
I'm struggling to make a binary classifier with a CNN.
My final goal is to reach an accuracy above 0.99.
I import both MNIST and FASHION_MNIST to identify whether an image is a number or clothing.
So there are 2 categories. I want to categorize 0-60000 as 0, and 60001-120000 as 1.
I will use binary_crossentropy.
But I don't know how to start.
How can I use vstack or hstack to combine MNIST and FASHION_MNIST in the first place?
This is what I've tried so far:
import numpy as np
from keras.datasets import mnist
from keras.datasets import fashion_mnist
import keras
import tensorflow as tf
from keras.utils import to_categorical  # keras.utils.np_utils no longer exists in recent Keras
num_classes = 2
# train_images, test_images, train_labels and test_labels are not built yet:
# combining MNIST and FASHION_MNIST into them is what this question is about
train_images = train_images.astype("float32") / 255
test_images = test_images.astype("float32") / 255
train_images = train_images.reshape((-1, 784))
test_images = test_images.reshape((-1, 784))
train_labels = to_categorical(train_labels, num_classes)
test_labels = to_categorical(test_labels, num_classes)

First of all,
they're images, so it's better to treat them as images and not reshape them into vectors.
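A minimal sketch of keeping the 2-D image layout for a CNN, using a small random uint8 array as a hypothetical stand-in for the real MNIST/Fashion-MNIST images: instead of reshaping to (-1, 784), scale to [0, 1] and add a trailing channel axis, since Keras Conv2D layers (with the default channels_last format) expect input of shape (batch, height, width, channels).

```python
import numpy as np

# Hypothetical stand-in for the real data: 5 grayscale 28x28 images stored as uint8
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)

# Scale to [0, 1] but keep the 28x28 layout instead of flattening to 784
x = images.astype("float32") / 255

# Add a trailing channel axis: (batch, 28, 28) -> (batch, 28, 28, 1)
x = x[..., np.newaxis]  # same as np.expand_dims(x, -1)
print(x.shape)  # (5, 28, 28, 1)
```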
Now, the answer to the question. Suppose you have mnist_train_image and fashion_train_image, both with shape (60000, 28, 28).
What you want to do consists of 2 parts: combining the inputs and making the targets.
First, the inputs
As you already wrote in the question, you can use np.vstack like this:
>>> train_image = np.vstack((fashion_train_image, mnist_train_image))
>>> train_image.shape
(120000, 28, 28)
But as you may have noticed, remembering whether you need vstack, dstack or hstack is kind of a pain. I prefer to use np.concatenate instead:
>>> train_image = np.concatenate((fashion_train_image, mnist_train_image), axis=0)
>>> train_image.shape
(120000, 28, 28)
Now, instead of remembering what v, h and d mean, you just need to remember the axis (dimension) along which you want to concatenate; in this case it's the first axis, which is 0. This matters especially in cases like this one, where the "vertical" direction of each image is the second axis, because the array is a stack of images and the first axis is the batch.
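A quick check of the axis bookkeeping, with small zero/one arrays standing in for the image batches (the shapes are hypothetical): vstack and concatenate(axis=0) agree, while for 3-D arrays hstack joins along axis 1 and dstack along axis 2, so they would merge pairs of images into taller or wider images instead of appending more images to the batch.

```python
import numpy as np

a = np.zeros((3, 28, 28))  # stand-in for a small batch of Fashion-MNIST images
b = np.ones((2, 28, 28))   # stand-in for a small batch of MNIST images

stacked = np.vstack((a, b))
joined = np.concatenate((a, b), axis=0)
print(stacked.shape, np.array_equal(stacked, joined))  # (5, 28, 28) True

# hstack -> axis 1 (image height), dstack -> axis 2 (image width)
print(np.hstack((a, a)).shape)  # (3, 56, 28)
print(np.dstack((a, a)).shape)  # (3, 28, 56)
```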
Next, the labels
Since you want to categorize 0-60000 as 0 and 60001-120000 as 1, there are a lot of fancy ways to do this.
But in a nutshell, you can use np.zeros to create an array filled with 0, and np.ones to, you guessed it, create an array filled with 1. Both ones and zeros return float arrays by default, and I'm not sure whether that would become a problem, so I add .astype('uint8') at the end just in case. You can also pass dtype='uint8' to the function instead.
Use concatenate from above:
>>> train_labels = np.concatenate((np.zeros(60000), np.ones(60000))).astype('uint8')
>>> train_labels.shape
(120000,)
Or use ones or zeros for the whole size and subtract, add or reassign the rest:
>>> train_labels = np.zeros(120000).astype('uint8')
>>> train_labels[60000:] = 1
or:
>>> train_labels = np.ones(120000, dtype='uint8')
>>> train_labels[:60000] -= 1
Important!
There's a noticeable mistake in your example regarding the labels: indexing starts at 0, so the 60,000th element has index 59,999.
So what you actually want is to categorize 0-59999 as 0 and 60000-119999 as 1.
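A minimal check that the label constructions above agree and that the boundary falls where it should, assuming the images are stacked with Fashion-MNIST first as in np.concatenate((fashion_train_image, mnist_train_image), axis=0), so clothing gets label 0 and digits get label 1.

```python
import numpy as np

n_fashion, n_mnist = 60000, 60000  # sizes of the two training sets

# Option 1: concatenate a block of zeros and a block of ones
labels_a = np.concatenate((np.zeros(n_fashion), np.ones(n_mnist))).astype("uint8")

# Option 2: start from zeros and reassign the second half
labels_b = np.zeros(n_fashion + n_mnist, dtype="uint8")
labels_b[n_fashion:] = 1

# Option 3: start from ones and subtract 1 from the first half
labels_c = np.ones(n_fashion + n_mnist, dtype="uint8")
labels_c[:n_fashion] -= 1

print(np.array_equal(labels_a, labels_b), np.array_equal(labels_a, labels_c))  # True True
# Zero-based indexing: the last Fashion-MNIST label is at index 59999
print(labels_a[59999], labels_a[60000])  # 0 1
```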

Related

I am only getting `accuracy_score` instead of `roc_auc` for XGBClassifier in both GridSearch and cross validation

I am using XGBClassifier for the Rain in Australia dataset and trying to predict whether it will rain today or not. I wanted to tune the hyperparameters of the classifier with GridSearch and score it with ROC_AUC. Here is my code:
param_grid = {
"max_depth": [3, 4, 5, 7],
"gamma": [0, 0.25, 1],
"reg_lambda": [0, 1, 10],
"scale_pos_weight": [1, 3, 5],
"subsample": [0.8], # Fix subsample
"colsample_bytree": [0.5], # Fix colsample_bytree
}
import xgboost as xgb
from sklearn.model_selection import GridSearchCV
# Init the classifier (XGBClassifier takes verbosity, not verbose)
xgb_cl = xgb.XGBClassifier(objective="binary:logistic", verbosity=0)
# Init the estimator
grid_cv = GridSearchCV(xgb_cl, param_grid, scoring="roc_auc", n_jobs=-1)
# Fit
_ = grid_cv.fit(X, y)
When the search is finally done, I am getting the best score with .best_score_ but somehow only getting an accuracy score instead of ROC_AUC. I thought this was only the case with GridSearch, so I tried HalvingGridSearchCV and cross_val_score with scoring set to roc_auc but I got accuracy score for them too. I checked this by manually computing ROC_AUC with sklearn.metrics.roc_auc_score.
Is there anything I am doing wrong or what is the reason for this behavior?
Have you tried your own roc_auc scoring rule? It seems like you are passing labels instead of the probabilities that roc_auc originally needs.
The problem is described here:
Different result roc_auc_score and plot_roc_curve
Solutions for custom scorers:
Grid-Search finding Parameters for AUC
Update 2
Sorry, I noticed today that the introduction text from my notebook was missing.
When calculating roc_auc_score you have the option (it doesn't matter whether it is with or without grid search, with or without a pipeline) to pass it either labels (0/1) or probabilities (e.g. 0.995, 0.6655). The first is easily available if you just convert your probabilities to labels. However, that results in an ROC plot shaped like a straight reversed L, which sometimes looks ugly. The other option is to pass the predicted probabilities to roc_auc_score. That results in a staircase-shaped reversed L, which looks much better.
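To see the difference between the two plot shapes, here is a small hand-rolled NumPy sketch (hypothetical labels and probabilities, not the Rain in Australia data) that builds ROC points by sweeping a threshold over the distinct scores. Hard 0/1 predictions leave only one interior point, so the curve is two straight segments; probabilities give one point per distinct score, hence the staircase. In practice sklearn.metrics.roc_curve and roc_auc_score do this for you.

```python
import numpy as np


def roc_points(y_true, scores):
    """ROC curve points (fpr, tpr), one per distinct score used as a threshold."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    n_pos = np.sum(y_true == 1)
    n_neg = np.sum(y_true == 0)
    fpr, tpr = [0.0], [0.0]
    for t in np.unique(scores)[::-1]:  # thresholds from highest to lowest
        predicted_pos = scores >= t
        tpr.append(np.sum(predicted_pos & (y_true == 1)) / n_pos)
        fpr.append(np.sum(predicted_pos & (y_true == 0)) / n_neg)
    return np.array(fpr), np.array(tpr)


def area_under(fpr, tpr):
    """Trapezoidal area under the piecewise-linear ROC curve."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))


# Hypothetical ground truth and predicted probabilities for 6 samples
y_true = np.array([0, 0, 1, 1, 0, 1])
proba = np.array([0.1, 0.4, 0.35, 0.8, 0.7, 0.9])
labels = (proba >= 0.5).astype(int)  # hard labels, like predict() returns

fpr_l, tpr_l = roc_points(y_true, labels)
fpr_p, tpr_p = roc_points(y_true, proba)
print(len(fpr_l), len(fpr_p))  # 3 7
print(round(area_under(fpr_l, tpr_l), 4), round(area_under(fpr_p, tpr_p), 4))  # 0.6667 0.7778
```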
So what you should test first is whether you can get an ROC AUC score with labels, with and without the grid. If that works, you should then try to get probabilities. For that, I believe, you have to write your own scoring method, as roc_auc_score in the grid only receives labels, which results in high ROC AUC scores. I wrote something so you can see the label approach:
import xgboost as xgb
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split
cancer = load_breast_cancer()
X = cancer.data
y = cancer.target
xgb_model = xgb.XGBClassifier(objective="binary:logistic",
                              eval_metric="auc",
                              colsample_bytree=0.3,
                              learning_rate=0.1,
                              max_depth=5,
                              gamma=10,
                              n_estimators=10,
                              verbosity=None)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
xgb_model.fit(X_train, y_train)
preds = xgb_model.predict(X_test)  # hard 0/1 labels, not probabilities
print(confusion_matrix(y_test, preds))  # rows: true class, columns: predicted class
print('ROC AUC Score', roc_auc_score(y_test, preds))
Gives:
[[51  3]
 [ 2 87]]
ROC AUC Score 0.9609862671660424
Here you can see it is ridiculously high.
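One way to see why an AUC computed from hard labels behaves like an accuracy: with 0/1 scores the ROC curve has a single interior point (FPR, TPR), and the area reduces to (TPR + TNR) / 2, i.e. balanced accuracy. The sketch below rebuilds stand-in label arrays with the same counts as the confusion matrix above (TN=51, FP=3, FN=2, TP=87; not the real test split) and computes AUC with the rank (Mann-Whitney) formulation in plain NumPy.

```python
import numpy as np


def roc_auc_rank(y_true, scores):
    """ROC AUC as P(score_pos > score_neg), counting ties as 1/2 (Mann-Whitney U)."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    # Compare every positive score with every negative score
    greater = np.sum(pos[:, None] > neg[None, :])
    ties = np.sum(pos[:, None] == neg[None, :])
    return (greater + 0.5 * ties) / (len(pos) * len(neg))


# Stand-in arrays with the confusion-matrix counts: TN=51, FP=3, FN=2, TP=87
y_true = np.array([0] * 54 + [1] * 89)
y_pred = np.array([0] * 51 + [1] * 3 + [0] * 2 + [1] * 87)

auc_from_labels = roc_auc_rank(y_true, y_pred)
balanced_accuracy = (87 / 89 + 51 / 54) / 2  # (TPR + TNR) / 2
print(auc_from_labels, balanced_accuracy)  # both approximately 0.960986
```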
If you want to do it with the grid, get rid of this:
# Fit
_ = grid_cv.fit(X, y)
and just call grid_cv.fit(X, y): fit is a method applied to grid_cv, and the results are stored within grid_cv.
print(grid_cv.best_score_) should then deliver the AUC, since you have already defined it as the scoring.
See also: different roc_auc with XGBoost gridsearch scoring='roc_auc' and roc_auc_score?
But this should also be ridiculously high, as you will probably be serving labels instead of probabilities.
Beware also of:
What is the difference between cross_val_score with scoring='roc_auc' and roc_auc_score?
And nothing stops you from applying roc_auc_score to your grid results yourself...

How to visualize CIFAR10 images as matrices

I am currently trying to work with CIFAR10 images. I have the following snippet
import tensorflow as tf
from tensorflow.keras import datasets,layers,models
import matplotlib.pyplot as plt
(train_images,train_labels),(test_images,test_labels)=datasets.cifar10.load_data()
#train_images,test_images=train_images/,test_images
When I print print(train_images[0]) I get a 32*32*3 matrix, and when I print print(train_images[0][0]) I get a 32*3 matrix; however, I thought it should be a 32*32 matrix. How does slicing work with this image, and which dimension comes first? Any insight and recommendations on reading material will be highly appreciated.
The train_images variable holds a batch of images, and the images are NumPy arrays, so slicing works the same as for any NumPy array.
The dimensions are ordered as [batch, rows, columns, channels].
To see the shape of the first image, print(train_images[0].shape) outputs (32, 32, 3).
To get the first channel of the image, print(train_images[0, :, :, 0]), which outputs a (32, 32) array; likewise print(train_images[0, :, :, 1]) for the second channel and print(train_images[0, :, :, 2]) for the third.
Here ':' means all values along that axis.
train_images[0, 0] outputs the first row of the first image in the batch, with shape (32, 3).
More on: basic indexing, array indexing
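The indexing rules above can be checked without downloading CIFAR10 by using a hypothetical stand-in array with the same layout (4 RGB images of 32x32). The transpose at the end is one way to get the "one 32x32 matrix per channel" view the question expected.

```python
import numpy as np

# Hypothetical stand-in for datasets.cifar10.load_data(): [batch, rows, columns, channels]
train_images = np.arange(4 * 32 * 32 * 3).reshape(4, 32, 32, 3)

first_image = train_images[0]           # (32, 32, 3)
first_row = train_images[0][0]          # (32, 3): same as train_images[0, 0]
red_channel = train_images[0, :, :, 0]  # (32, 32): channel 0 (red) as a matrix

# Move the channel axis to the front: one 32x32 matrix per channel
channels_first = np.transpose(first_image, (2, 0, 1))  # (3, 32, 32)

print(first_image.shape, first_row.shape, red_channel.shape, channels_first.shape)
# (32, 32, 3) (32, 3) (32, 32) (3, 32, 32)
```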

Memory error while creating large one hot encoding for lstm

I am trying to build a character level lstm model using keras and for that I need to create one hot encoding for characters to feed in the model. And I have around 1000 characters in each line with around 160,000 lines.
I tried to create a NumPy array of zeros and set the corresponding entries to 1, but I get a memory error due to the large size of the matrix. Is there any other way to do this?
Sure:
Create batches. Only process, say, 10,000 entries (characters) at a time, computing them and feeding them into your neural network just before they're needed (say, by using a generator instead of a list). Keras has a fit_generator training function for this; in TensorFlow 2.x, model.fit accepts generators directly and fit_generator is deprecated.
Group chunks of data together. Say, instead of a line being a matrix of the one-hot encodings of its characters, use the sum/max of all those columns to produce a single vector for the line. Now each line is only a single vector, with dimensionality equal to the number of unique characters in your data set. E.g., instead of [[0, 0, 1], [0, 1, 0], [0, 0, 1]], use [0, 1, 1] (the max) to represent the entire line.
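A minimal NumPy sketch of both ideas on hypothetical toy data (integer-encoded lines of characters; the names one_hot_batches, lines and vocab_size are illustrative, not from any library). The generator only materializes the one-hot array for one batch at a time; a real training generator for Keras would also yield the targets and typically loop indefinitely. The second part shows the sum/max collapse of one line; note that it discards character order, which a character-level LSTM would otherwise learn from.

```python
import numpy as np


def one_hot_batches(encoded_lines, vocab_size, batch_size):
    """Yield one-hot arrays of shape (batch, seq_len, vocab_size), one batch at a time.

    encoded_lines: integer array of shape (num_lines, seq_len) with character indices.
    """
    identity = np.eye(vocab_size, dtype=np.float32)
    for start in range(0, len(encoded_lines), batch_size):
        batch = encoded_lines[start:start + batch_size]
        # Row i of the identity matrix is the one-hot vector for character index i
        yield identity[batch]


# Hypothetical data: 10 lines of 5 characters drawn from a 4-character vocabulary
rng = np.random.default_rng(0)
lines = rng.integers(0, 4, size=(10, 5))
shapes = [b.shape for b in one_hot_batches(lines, vocab_size=4, batch_size=4)]
print(shapes)  # [(4, 5, 4), (4, 5, 4), (2, 5, 4)]

# Collapsing one line's one-hot matrix into a single vector
line_one_hot = np.array([[0, 0, 1],
                         [0, 1, 0],
                         [0, 0, 1]])
print(line_one_hot.sum(axis=0))  # [0 1 2]  (character counts)
print(line_one_hot.max(axis=0))  # [0 1 1]  (which characters occur)
```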
Perhaps an easier and more intuitive solution is to add a custom one-hot encoding layer in your Keras model architecture.
def build_model(self, batch_size, print_summary=False):
X = Input(shape=(self.sequence_length,), batch_size=batch_size)
embedding = OneHotEncoding(num_classes=self.vocab_size+1,
sequence_length=self.sequence_length)(X)
encoder = Bidirectional(CuDNNLSTM(units=self.recurrent_units,
return_sequences=True))(embedding)
...
where we can define the OneHotEncoding layer as follows:
import tensorflow as tf
from tensorflow.keras.layers import Layer  # for creating custom layers
class OneHotEncoding(Layer):
    def __init__(self, num_classes=None, sequence_length=None, **kwargs):
        if num_classes is None or sequence_length is None:
            raise ValueError("Can't leave params #num_classes or #sequence_length empty")
        super(OneHotEncoding, self).__init__(**kwargs)
        self.num_classes = num_classes
        self.sequence_length = sequence_length
    def call(self, inputs):
        # Input() yields float32 by default, but one_hot needs integer indices
        indices = tf.cast(inputs, tf.int32)
        # (batch, sequence_length) -> (batch, sequence_length, num_classes),
        # computed per batch inside the model instead of for the whole dataset
        return tf.one_hot(indices, depth=self.num_classes)
Here we are utilizing the fact that the Keras model is fed the training samples in appropriately sized batches (with the standard fit function), so the one-hot tensors only exist one batch at a time, which avoids the MemoryError.
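A back-of-the-envelope check of why the full matrix fails while per-batch encoding works, using the figures from the question (160,000 lines of about 1,000 characters). The vocabulary size of 100 and batch size of 32 are hypothetical, since the question does not state them, and the one-hot values are assumed to be float32 (4 bytes).

```python
lines = 160_000
chars_per_line = 1_000
vocab_size = 100     # hypothetical: the question does not give the vocabulary size
batch_size = 32      # hypothetical batch size
bytes_per_value = 4  # float32 one-hot values, int32 indices

full_one_hot = lines * chars_per_line * vocab_size * bytes_per_value
int_indices = lines * chars_per_line * bytes_per_value  # what the layer is fed instead
one_batch = batch_size * chars_per_line * vocab_size * bytes_per_value

print(full_one_hot / 1e9, "GB for the whole one-hot matrix")  # 64.0
print(int_indices / 1e9, "GB for the integer indices")         # 0.64
print(one_batch / 1e6, "MB for one one-hot batch")             # 12.8
```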

One_Hot Encode and Tensorflow (Explain behind the scenes )

I am new to the deep learning world and to TensorFlow. TensorFlow is very complicated for me right now.
I was following a tutorial on the TF Layers API and ran into this issue with one-hot encoding. Here is my code:
import pandas as pd
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
wine_data = load_wine()
feat_data = wine_data['data']
labels = wine_data['target']
X_train, X_test, y_train, y_test = train_test_split(feat_data,
labels,
test_size=0.3,
random_state=101)
scaler = MinMaxScaler()
scaled_x_train = scaler.fit_transform(X_train)
scaled_x_test = scaler.transform(X_test)
# ONE HOT ENCODED
onehot_y_train = pd.get_dummies(y_train).to_numpy()  # .as_matrix() was removed in pandas 1.0
one_hot_y_test = pd.get_dummies(y_test).to_numpy()
num_feat = 13
num_hidden1 = 13
num_hidden2 = 13
num_outputs = 3
learning_rate = 0.01
import tensorflow as tf
from tensorflow.contrib.layers import fully_connected
X = tf.placeholder(tf.float32,shape=[None,num_feat])
y_true = tf.placeholder(tf.float32,shape=[None,3])
actf = tf.nn.relu
hidden1 = fully_connected(X,num_hidden1,activation_fn=actf)
hidden2 = fully_connected(hidden1,num_hidden2,activation_fn=actf)
output = fully_connected(hidden2,num_outputs)
loss = tf.losses.softmax_cross_entropy(onehot_labels=y_true, logits=output)
optimizer = tf.train.AdamOptimizer(learning_rate)
train = optimizer.minimize(loss)
init = tf.global_variables_initializer()
training_steps = 1000
with tf.Session() as sess:
sess.run(init)
for i in range(training_steps):
sess.run(train,feed_dict={X:scaled_x_train,y_true:y_train})
# Get Predictions
logits = output.eval(feed_dict={X:scaled_x_test})
preds = tf.argmax(logits,axis=1)
results = preds.eval()
When I run this code I get this error:
ValueError: Cannot feed value of shape (124,) for Tensor 'Placeholder_1:0', which has shape '(?, 3)'
After a little digging I found that changing the sess.run call to
sess.run(train, feed_dict={X: scaled_x_train, y_true: onehot_y_train})
(i.e. feeding onehot_y_train instead of y_train) made the code run.
I just want to know what is happening behind the scenes, and why the one-hot encoding is necessary in this code.
Your network is making a class prediction on 3 classes, class A, B, and C.
In defining a neural network to transform your 13 inputs to a representation that you can use to distinguish between these 3 classes you have a few choices.
You could output 1 number. Let's define a single-value output where <0 represents class A, an output in [0,1] is class B, and an output >1 is class C.
You could define this, use a loss function like squared error, and the network would learn to work under these assumptions and probably do halfway decently at it.
However, that was a rather arbitrary choice of values to define 3 classes, as I'm sure you can see. And it's certainly sub-optimal. Learning this representation is harder than it needs to be. Can we do better?
Let's pick a more reasonable approach. Instead of 1 output we have 3 outputs. We define each output to represent how strongly we believe in a particular class. To conform to the cross-entropy loss you use, we further constrain those values to the range [0,1]; the softmax_cross_entropy loss in your code does this with a softmax, which also makes the 3 values sum to 1. So great, we now have 3 values in the range [0,1] that each represent the belief that the input falls into each of our 3 classes.
You have labels for each of your inputs; you know for sure that these inputs are class A, B, or C. So for a given input that is, say, class C, your label would naturally be [0, 0, 1] (i.e. you know it's not A or B, so 0 in both of those cases, and 1 for C, which you know the class to be). Voila, you have the one-hot encoding!
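A quick NumPy illustration of that label encoding, and of why the feed failed: integer labels have shape (n,), while y_true was declared with shape [None, 3]. The labels below are hypothetical; indexing an identity matrix is one common way to one-hot encode, and it gives the same 0/1 rows as pd.get_dummies when all three classes 0, 1, 2 are present.

```python
import numpy as np

num_classes = 3
y = np.array([2, 0, 1, 2])  # hypothetical integer wine classes (A=0, B=1, C=2)
one_hot = np.eye(num_classes, dtype=np.float32)[y]

print(y.shape, one_hot.shape)  # (4,) (4, 3)
print(one_hot[0])              # [0. 0. 1.]  -> class C
```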
As you might imagine this is a much easier problem to solve than the first one I presented. Hence we choose to represent our problem this way because we end up with networks that perform better when we do. It's not that you couldn't represent it another way, you just want the best results possible and one-hot encoding typically performs above other representations you might dream up.
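To make the link to the loss concrete, here is a plain NumPy sketch of softmax cross-entropy. It mirrors what tf.losses.softmax_cross_entropy computes, averaged over the batch, but ignores TensorFlow's weighting and reduction options, and the logits are hypothetical. With a one-hot target, the sum over classes keeps only the log-probability of the true class, and the element-wise product is why the labels need the same [batch, 3] shape as the logits.

```python
import numpy as np


def softmax(logits):
    # Subtract the row max for numerical stability before exponentiating
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def softmax_cross_entropy(onehot_labels, logits):
    # Mean over the batch of -sum_k y_k * log(p_k)
    probs = softmax(logits)
    return -(onehot_labels * np.log(probs)).sum(axis=-1).mean()


logits = np.array([[2.0, 0.5, -1.0]])  # hypothetical network output for one wine sample
target = np.array([[1.0, 0.0, 0.0]])   # one-hot label: class A

loss = softmax_cross_entropy(target, logits)
# Only the true class's log-probability survives the sum
print(np.isclose(loss, -np.log(softmax(logits)[0, 0])))  # True
```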

Tensorflow tf.expand_dims

The original Tensorflow tutorial includes the following code:
batch_size = tf.size(labels)
labels = tf.expand_dims(labels, 1)
indices = tf.expand_dims(tf.range(0, batch_size, 1), 1)
concated = tf.concat(1, [indices, labels])
onehot_labels = tf.sparse_to_dense(concated, tf.pack([batch_size, NUM_CLASSES]), 1.0, 0.0)
The second line adds a dimension to the labels tensor. However, labels was fed in via a feed dictionary so it should already have shape [batch_size, NUM_CLASSES]. If so then why is expand_dims used here?
That tutorial is pretty old. You're referencing version 0.6, whereas TensorFlow is at 0.11 as of this post (11-20-2016), so many functions were different back in v0.6.
Anyway, to answer your question:
The labels in MNIST were just encoded as the digits 0-9. However, the loss function expected the labels to be encoded as one-hot vectors.
The labels are not already [batch_size, NUM_CLASSES] in that example; they are just [batch_size].
This could have been done with similar NumPy functions. They have also since provided functions to get the MNIST labels in TensorFlow as one-hot vectors, which do already have the shape you stated.
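A NumPy sketch of what the old tutorial snippet does, with hypothetical digit labels: expand_dims turns the [batch_size] label vector into a [batch_size, 1] column so it can be concatenated with the row indices into (row, class) pairs, and sparse_to_dense writes 1.0 at those pairs and 0.0 elsewhere. In current TensorFlow, tf.one_hot(labels, NUM_CLASSES) produces the same matrix in one call.

```python
import numpy as np

NUM_CLASSES = 10
labels = np.array([3, 0, 9, 3])  # hypothetical digit labels, shape [batch_size]

batch_size = labels.size
labels_col = np.expand_dims(labels, 1)                     # [batch_size, 1]
indices = np.expand_dims(np.arange(0, batch_size, 1), 1)   # [batch_size, 1]
concated = np.concatenate([indices, labels_col], axis=1)   # [batch_size, 2] (row, class) pairs

# Equivalent of sparse_to_dense(..., 1.0, 0.0): zeros, then 1.0 at each (row, class) pair
onehot_labels = np.zeros((batch_size, NUM_CLASSES))
onehot_labels[concated[:, 0], concated[:, 1]] = 1.0
print(onehot_labels.shape)  # (4, 10)
```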