Raw CIFAR-10 to CNN input with numpy and tensorflow - numpy

I'm new to NNs with Python and Tensorflow, and I'm trying to create the inputs for my CNN.
I have the CIFAR10 dataset as a 50000x3072 list in Python (a list containing 50000 lists of 3072 elements) for the training images, and I'm not using the CIFAR10 dataset from keras. The CNN is the same one used in the basic TF example:
https://www.tensorflow.org/tutorials/images/cnn
Each 3072-element list is organized as follows: the first 1024 elements are the first color channel, the second 1024 are the second color channel, and so on.
I want to organize this list as a numpy array in the same way keras does (an np array of 32 rows, each containing 32 np arrays of 3 values, one per color channel of the pixel).
I tried to use reshape and other basic functions, but I'm not sure what to do to obtain the result.

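To convert data of shape (50000, 3072) to the shape the CNN expects, you can use tf.reshape followed by tf.transpose. Because each row stores the three 1024-element color planes one after another, reshape to [-1, 3, 32, 32] first and then move the channel axis to the end. Reshaping directly to [-1, 32, 32, 3] gives the right shape but mixes the color planes together:
!pip install tensorflow==2.1
import tensorflow as tf
import numpy as np
tf.__version__
a = tf.constant(np.zeros((50000,3072)))
a.shape #TensorShape([50000, 3072])
b = tf.transpose(tf.reshape(a, [-1,3,32,32]), [0,2,3,1])
b.shape #TensorShape([50000, 32, 32, 3])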
In the first layer of the CNN, you can specify the input shape as shown below:
from tensorflow.keras import layers, models
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)))
For more information about tf.reshape, please refer to the TensorFlow documentation page.
For more information about CNNs on the CIFAR dataset, please refer to the CNN tutorial on the TensorFlow site.
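Since the question asks about NumPy, the same conversion can also be sketched without TensorFlow. This is a minimal sketch under the layout described in the question (each row holds three 1024-value color planes stored back to back, with each plane stored row by row). The function name `rows_to_images` and the tiny fake dataset are illustrative assumptions, not part of the original code.

```python
import numpy as np

def rows_to_images(rows):
    """Convert (N, 3072) channel-planar rows to (N, 32, 32, 3) images."""
    arr = np.asarray(rows)
    # Split each row into 3 planes of 32x32, then move the channel axis last.
    return arr.reshape(-1, 3, 32, 32).transpose(0, 2, 3, 1)

# Fake data: two "images" whose planes are filled with 0, 1 and 2,
# so every pixel should come out as [0, 1, 2].
fake_rows = [[0] * 1024 + [1] * 1024 + [2] * 1024 for _ in range(2)]
images = rows_to_images(fake_rows)

print(images.shape)     # (2, 32, 32, 3)
print(images[0, 5, 7])  # [0 1 2]

# A plain reshape gives the same shape but mixes the planes together.
naive = np.asarray(fake_rows).reshape(-1, 32, 32, 3)
print(naive[0, 0, 0])   # [0 0 0]
```

The resulting array can be passed to a model whose first layer uses input_shape=(32, 32, 3).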

Related

Using Sparse Tensors as Input for Autoencoders

I have a one-hot-encoded sparse matrix which can't be converted into a dense matrix due to its size.
I would like to reduce the dimensions using an autoencoder. Currently I am trying to use Tensorflow and its Keras library for that.
The Tensorflow docs state that sparse tensors exist and that they can be used in Keras (see https://www.tensorflow.org/guide/sparse_tensor).
The problem is that none of the autoencoders I've found on the internet seem to work with sparse tensors.
I have prepared a small code example which stops after the first training epoch with the error message: "Failed to convert elements of SparseTensor to Tensor. Consider casting elements to a supported type.".
My questions are:
Do you have an idea how to improve the code, or ideally an example I can look up?
If not: Do you have other ideas on how to do what I would like to do (e.g. another library, another method, etc.)?
Code Example:
#necessary imports
import tensorflow as tf
from keras.models import Model, Sequential
from keras.layers import Input, Dense, ActivityRegularization
from tensorflow.keras import backend as K
from tensorflow.keras import regularizers
#example one-hot-encoded matrix with 10 records with each one out of 4 distinct categories
sparse_tensor = tf.sparse.SparseTensor(indices=[[0,3], [1,3], [2,0], [3,1], [4,0], [5,2], [6,2], [7,1], [8,3], [9,1]],
values=[1 for i in range(10)],
dense_shape=[10, 4])
encoder = Sequential([
Input(shape=(4,), sparse=True),
Dense(1, activation = 'relu'),
ActivityRegularization(l1=1e-3)
])
decoder = Sequential([
Dense(4, activation = 'sigmoid', input_shape = (1, )),
])
autoencoder = Sequential([encoder, decoder])
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
autoencoder.fit(x=sparse_tensor, y=sparse_tensor, epochs=5, batch_size=5, shuffle=True)

Trying to understand Tensorflow input_shape

I am confused about the Tensorflow input_shape argument.
Suppose there are 3 documents (each row) in "doc" defined below, and the vocabulary has 4 words (each sublist in each row).
Further suppose that each word is represented by 2 numbers via word embedding.
The program only works when I specify input_shape=(3,4,2) for a Dense layer.
But when I use an LSTM layer, the program only works with input_shape=(4,2), not with input_shape=(3,4,2).
So how should I specify the input shape for such inputs, and how can I make sense of it?
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LSTM
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.metrics import categorical_crossentropy
doc=[
[[1,0],[0,0],[0,0],[0,0]],
[[0,0],[1,0],[0,0],[0,0]],
[[0,0],[0,0],[1,0],[0,0]]
]
model=Sequential()
model.add(Dense(2,input_shape=(3,4,2))) # model.add(LSTM(2,input_shape=(4,2)))
model.compile(optimizer=Adam(learning_rate=0.0001),loss="sparse_categorical_crossentropy",metrics=("accuracy"))
model.summary()
output=model.predict(doc)
print(model.weights)
print(output)
The input_shape argument of a keras.layers.LSTM layer expects the shape of a single sample, [timesteps, features], without the batch dimension. Your doc has the shape [batch_size, timesteps, features] and therefore has one dimension too many.
If you want to fix the batch size as well, you can use the batch_input_shape argument instead.
To do so, you just have to replace this line of your code:
model.add(LSTM(2,input_shape=(4,2)))
With this one:
model.add(LSTM(2,batch_input_shape=(3,4,2)))
If you set a specific batch size in your model and then feed a batch of any other size (anything other than 3 in your case), you will get an error. With input_shape you keep the flexibility to feed any batch size to the network.
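The difference between the per-sample shape and the batch dimension can be checked directly. The following is a minimal sketch, assuming TensorFlow 2.x with tf.keras. The all-zero batch of five documents is made up for illustration, and the layer weights are random, so only the output shapes are meaningful.

```python
import numpy as np
import tensorflow as tf

doc = np.array([
    [[1, 0], [0, 0], [0, 0], [0, 0]],
    [[0, 0], [1, 0], [0, 0], [0, 0]],
    [[0, 0], [0, 0], [1, 0], [0, 0]],
], dtype="float32")
print(doc.shape)  # (3, 4, 2) = (batch_size, timesteps, features)

# The model is told only the per-sample shape (timesteps, features);
# the batch dimension is left open.
model = tf.keras.Sequential([
    tf.keras.Input(shape=doc.shape[1:]),
    tf.keras.layers.LSTM(2),
])

out_three = model.predict(doc, verbose=0)
out_five = model.predict(np.zeros((5, 4, 2), dtype="float32"), verbose=0)
print(out_three.shape)  # (3, 2)
print(out_five.shape)   # (5, 2)
```

Because the batch dimension is not fixed, the same model accepts batches of 3 and of 5 documents.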

Problem with shapes of experimental Tensorflow dataset

I am trying to store numpy arrays in a Tensorflow dataset. The model fits correctly when using the numpy arrays as train and test data but not when I store the numpy arrays in a single Tensorflow dataset. The problem is with the dimensions of the dataset. Something is wrong even though shapes seem OK at first sight.
After trying multiple things to reshape my Tensorflow dataset, I am still unable to get it working. My code is the following:
train_x.shape
Out[54]: (7200, 40)
train_y.shape
Out[55]: (7200,)
dataset = tf.data.Dataset.from_tensor_slices((x,y))
print(dataset)
Out[56]: <TensorSliceDataset shapes: ((40,), ()), types: (tf.int32, tf.int32)>
model.compile(optimizer=optimizer, loss='sparse_categorical_crossentropy')
history = model.fit(dataset, epochs=EPOCHS, batch_size=256)
sparse_softmax_cross_entropy_with_logits
logits.get_shape()))
ValueError: Shape mismatch: The shape of labels (received (1,)) should equal the shape of logits except for the last dimension (received (40, 1351)).
I have seen this answer but I am sure it doesn't apply here. I must use sparse_categorical_crossentropy. I am following this example, where I want to store the train and test data in a Tensorflow dataset. I also want to store the arrays in a dataset because I will need it later.
You can't use batch_size with model.fit() when the input is a tf.data.Dataset, because Keras expects the dataset to be batched already. It treats each dataset element as a batch, which would explain why an unbatched row of shape (40,) shows up as 40 samples in the logits shape (40, 1351). Use tf.data.Dataset.batch() instead. Change your code as follows for it to work:
import numpy as np
import tensorflow as tf
# Some toy data
train_x = np.random.normal(size=(7200, 40))
train_y = np.random.choice([0,1,2], size=(7200))
dataset = tf.data.Dataset.from_tensor_slices((train_x,train_y))
dataset = dataset.batch(256)
#### - Define your model here - ####
model.compile(optimizer=optimizer, loss='sparse_categorical_crossentropy')
history = model.fit(dataset, epochs=EPOCHS)
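To see what batching changes, the element shapes of the dataset can be inspected before and after calling batch(). This is a minimal sketch, assuming TensorFlow 2.x and the same kind of toy data as above; the variable names `unbatched` and `batched` are illustrative.

```python
import numpy as np
import tensorflow as tf

train_x = np.random.normal(size=(7200, 40))
train_y = np.random.choice([0, 1, 2], size=7200)

unbatched = tf.data.Dataset.from_tensor_slices((train_x, train_y))
batched = unbatched.batch(256)

# Each unbatched element is one sample: features (40,), scalar label.
print(unbatched.element_spec[0].shape, unbatched.element_spec[1].shape)
# Each batched element stacks up to 256 samples: (None, 40) and (None,).
print(batched.element_spec[0].shape, batched.element_spec[1].shape)

# 7200 samples in batches of 256 give 29 batches; the last one holds 32.
batch_sizes = [int(x.shape[0]) for x, _ in batched]
print(len(batch_sizes), batch_sizes[-1])  # 29 32
```

The leading None in the batched shapes is the batch dimension that model.fit() expects to find in every dataset element.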

How to create an NLP processing pipeline with Keras

I regularly use scikit-learn pipelines to streamline model processing, and I'm wondering what the easiest way is to do something similar with Keras in Tensorflow 2.0.
What I'd like to do is deploy a Keras model as an API endpoint, then submit a piece of text in a numpy array to it and have it tokenized, padded and predicted. But I don't know the shortest path to do this.
Here's some sample code:
from tensorflow import keras
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from keras.models import Sequential
from keras.layers import Embedding, Dense, Flatten
import numpy as np
sample_words = [
'The sky is blue',
'The sky delivers us many gifts',
'Wise men appreciate gifts for what they are, not what they are not',
'Wherever you go, there you are',
'Don\'t pass judgment onto others, or you will quickly be judged yourself'
]
y = np.array([1, 0, 1, 1, 0])
tokenizer = Tokenizer(num_words=10)
tokenizer.fit_on_texts(sample_words)
train_sequences = tokenizer.texts_to_sequences(sample_words)
train_sequences = pad_sequences(train_sequences, maxlen=7)
mod = Sequential([
Embedding(10, 2, input_length=7),
Flatten(),
Dense(3, activation='relu'),
Dense(1, activation='sigmoid')
])
mod.compile(optimizer='adam', loss='binary_crossentropy')
mod.fit(train_sequences, y)
The idea is that if I have a web form and someone submits the words 'The sky is pretty today', I can wrap them in a numpy array, send it to the endpoint (which will be set up on Google Cloud), and have it tokenized, padded, and predicted.
In scikit-learn it would be as simple as pipe = make_pipeline(tokenizer, mod), and then go from there.
I have a feeling there are some solutions that involve tf.data.Dataset, but I was hoping keras had something more user friendly.
With Keras there is no need to build a pipeline explicitly.
A Keras model uses the Tensorflow backend to create a computation graph, which is loosely similar to a scikit-learn pipeline.
Thus your mod is itself equivalent to a pipeline with the operations Embedding -> Flatten -> Dense -> Dense. The mod.compile() method generates the Tensorflow computation graph.
Everything then comes together in the mod.fit() method, where you plug your inputs into your model (i.e. the pipeline) and the method trains on your data.
To make tokenization part of your model, you can use the TextVectorization layer.
This layer has basic options for managing text in a Keras model. It transforms a batch of strings (one sample = one string) into either a list of token indices (one sample = 1D tensor of integer token indices) or a dense representation (one sample = 1D tensor of float values representing data about the sample's tokens).
Code snapshot:
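import tensorflow as tf
from tensorflow.keras.layers import TextVectorization  # older TF 2.x: tensorflow.keras.layers.experimental.preprocessing
max_features = 5000  # example value: maximum vocabulary size
max_len = 4  # example value: pad or truncate each sequence to this length
vectorize_layer = TextVectorization(
max_tokens=max_features,
output_mode='int',
output_sequence_length=max_len
)
vectorize_layer.adapt(tf.data.Dataset.from_tensor_slices(["foo", "bar", "baz"]).batch(64))  # learn the vocabulary first
model = tf.keras.models.Sequential()
model.add(tf.keras.Input(shape=(1,), dtype=tf.string))
model.add(vectorize_layer)
input_data = [["foo qux bar"], ["qux baz"]]
model.predict(input_data)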
>>>
array([[2, 1, 4, 0],
[1, 3, 0, 0]])
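To make the 'int' output mode concrete, here is a toy, pure-Python imitation of what such a layer produces. It is not Keras's implementation: the helper `toy_vectorize` and the vocabulary mapping are hypothetical, chosen so that the result matches the output shown above, with 0 used for padding and 1 for out-of-vocabulary tokens.

```python
PAD, OOV = 0, 1
# Hypothetical vocabulary; indices 0 and 1 are reserved for PAD and OOV.
vocab = {"foo": 2, "baz": 3, "bar": 4}

def toy_vectorize(texts, vocab, max_len):
    """Lowercase, split on whitespace, map tokens to indices, pad or truncate."""
    rows = []
    for text in texts:
        ids = [vocab.get(tok, OOV) for tok in text.lower().split()]
        ids = ids[:max_len] + [PAD] * max(0, max_len - len(ids))
        rows.append(ids)
    return rows

print(toy_vectorize(["foo qux bar", "qux baz"], vocab, max_len=4))
# [[2, 1, 4, 0], [1, 3, 0, 0]]
```

The word "qux" is not in the vocabulary, so it maps to 1, and both rows are padded with 0 up to length 4.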

How to give multiple input at each time step of a sequential data to a recurrent neural network using tensorflow?

Suppose I have a data set with 1000 observations, where each observation is a sequence of fixed length (say 10) and each point in the sequence has 2 numerical features. How can we feed such data to an RNN in tensorflow?
Any small suggestions are also welcome. Thanks
According to your description, your dataset has shape 1000x10x2,
which looks something like this:
import numpy as np
data=np.random.randint(0,10,[1000,10,2])
Since your sequences have a fixed length, you don't need padding; you only have to decide on a batch size and the resulting number of iterations.
Suppose the batch size is 5:
batch_size=5
iterations=int(len(data)//batch_size)
Now feed your input to a tensorflow LSTM cell. Your model would be something like this.
Here is an example without batching:
import numpy as np
import tensorflow as tf  # TensorFlow 1.x API (tf.contrib, placeholders, sessions)
from tensorflow.contrib import rnn
data=np.random.randint(0,10,[1000,10,2])
input_x=tf.placeholder(tf.float32,[1000,10,2])
with tf.variable_scope('encoder') as scope:
    cell=rnn.LSTMCell(150)
    model=tf.nn.dynamic_rnn(cell,inputs=input_x,dtype=tf.float32)
    output_,(fs,fc)=model
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    output = sess.run(model, feed_dict={input_x: data})
    print(output)
If you want to feed batches, each batch must still be rank 3, i.e. of shape [batch_size, 10, 2], because the LSTM expects rank-3 input. Reshape your data accordingly, or use an embedding if your inputs are rank 2.
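The batching described above can be sketched in NumPy. This is a minimal sketch assuming the same random data and a batch size of 5; the generator name `iter_batches` is illustrative. Each yielded batch stays rank 3, which is the shape a recurrent layer expects.

```python
import numpy as np

data = np.random.randint(0, 10, [1000, 10, 2])
batch_size = 5
iterations = len(data) // batch_size  # 200 full batches

def iter_batches(array, batch_size):
    """Yield consecutive full batches taken along the first axis."""
    for start in range(0, len(array) - batch_size + 1, batch_size):
        yield array[start:start + batch_size]

batches = list(iter_batches(data, batch_size))
print(iterations, len(batches))  # 200 200
print(batches[0].shape)          # (5, 10, 2)
```

With the TensorFlow 1.x code above, feeding such batches would also require the first dimension of the placeholder to be None (or 5) instead of 1000.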