I am using Keras to build a CNN to work with the CIFAR-10 dataset. I am slightly confused by one of the last lines of an online tutorial. The tutorial takes 50,000 32x32 color images and processes them through 4 convolutional layers and one fully connected layer. The last part is accomplished by:
model.add(Flatten())
model.add(Dense(512))
model.add(Activation('relu'))
I am trying to understand why it is model.add(Dense(512)) and not some other number. For example, I thought 32x32 images could be flattened into a 1024-element vector. So why did they choose 512 here?
Thanks!
Actually it is not 32x32 but 32x32x3, because of the color channels. Flatten and Dense are different operations; in case the Keras calls are hard to follow, here is a low-level TensorFlow implementation of the same steps:
import tensorflow as tf  # TensorFlow 1.x style API; batch is the batch size

W1 = tf.Variable(tf.random_normal([32*32*3, 512]), name="W1")  # weight variable
x = tf.placeholder(tf.float32, [batch, 32, 32, 3])             # placeholder for inputs
flat = tf.reshape(x, [batch, 32*32*3])                         # model.add(Flatten())
mul1 = tf.matmul(flat, W1)                                     # model.add(Dense(512))
relu = tf.nn.relu(mul1)                                        # model.add(Activation('relu'))
flat has shape [batch, 32*32*3]
mul1 has shape [batch, 512]
Of course the layer width could be 1024 or 5000 instead of 512; it is a design choice, but a larger layer has more parameters and tends to be harder to optimize.
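If it helps, here is a minimal Keras sketch (the shapes are for illustration only and are not the tutorial's full model) showing that the 512 only sets the width of the Dense layer's weight matrix; model.summary() reports the resulting parameter count:

from tensorflow.keras import Sequential
from tensorflow.keras.layers import Flatten, Dense, Activation

# Toy model: flatten a 32x32x3 input, then a 512-unit Dense layer
model = Sequential()
model.add(Flatten(input_shape=(32, 32, 3)))  # 32*32*3 = 3072 values per image
model.add(Dense(512))                        # weight matrix of shape (3072, 512) plus 512 biases
model.add(Activation('relu'))
model.summary()                              # Dense layer parameters: 3072*512 + 512 = 1,573,376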
I am new to CNNs and have a basic question about the mapping between the input image and the neurons in the first convolutional layer.
My question is:
should the input image go to all neurons in the first convolutional layer (I mean the first hidden layer) or not?
For example: if the first hidden layer of my CNN has 8 neurons, is the complete input image passed to all 8 of them, or is only a subset of the input pixels passed to each neuron?
The number of neurons in a convolutional layer is usually not something you concern yourself with unless you are building your own low-level implementation of a CNN.
To answer your question: no. Each neuron in the first convolutional layer is connected only to the pixels in its own receptive field (given by the kernel size), and the same logic applies to the following convolutional layers, except that there the neurons are connected to the lower-layer neurons in their receptive field.
For example: if the first hidden layer of my CNN has 8 neurons
How do you know that it has 8 neurons? Unless you are doing some low-level CNN programming, you do not specify the number of neurons. The number of neurons you need is determined by a combination of the kernel size, the stride, the type of padding and the number of filters you have chosen to use. These 4 things (together with the input size of the image) tell you exactly how many neurons you need.
For example, in Keras (since you have tagged this question with tensorflow) you might see a convolutional layer like this one:
keras.layers.Conv2D(filters=128, kernel_size=3, strides=1, activation="relu",
                    input_shape=(100, 100, 1))
As you can see, you don't specify anything like the number of neurons here (at least not directly). With these settings, you end up with an output whose width and height are reduced by 2 (due to the default 'valid' padding), so the output shape of this layer is (98, 98, 128) (technically it has a shape of (None, 98, 98, 128), where None stands for the batch size). If you flattened this and fed the output into a single neuron (say, in a dense layer), you would end up with 128 * 98 * 98 = 1,229,312 weights (plus one bias) just between these two layers.
So, for the analogy between dense and convolutional layers, the above convolutional layer with 128 filters connected to one output neuron is similar to a dense layer with 1,229,312 neurons connected to one output neuron.
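If you want to verify this yourself, here is a small sketch (just an illustration built around the layer above, not code from the question) that prints the output shape and parameter count:

from tensorflow import keras

# Toy model: the Conv2D layer above, followed by Flatten and a single-unit Dense layer
model = keras.Sequential([
    keras.layers.Conv2D(filters=128, kernel_size=3, strides=1, activation="relu",
                        input_shape=(100, 100, 1)),
    keras.layers.Flatten(),
    keras.layers.Dense(1),
])
model.summary()
# With 'valid' padding: output size = (input - kernel) / stride + 1 = (100 - 3) / 1 + 1 = 98
# Conv2D output shape: (None, 98, 98, 128)
# Dense layer parameters: 128 * 98 * 98 weights + 1 bias = 1,229,313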
I am new to computer vision, and I do not really know what the layers do in Keras. What is the use of adding layers (Dense, Conv2D, etc.) in Keras? What do they add to the model?
A convolutional neural network has 4 main steps: convolution, pooling, flattening, and full connection.
Conv2D(), Conv3D(), etc. are convolution layers, used for feature extraction.
Pooling layers (MaxPool2D(), AvgPool2D(), etc.) are for feature extraction as well, although they perform a different operation.
Flattening layers (Flatten()) convert the extracted feature maps into a vector before it is fed into the fully connected layers (the Dense layers).
Dense layers implement the fully connected step and act as the classifier: the network classifies based on the features extracted by the convolution layers.
There are also layers such as Dropout(), BatchNormalization(), etc. that help with regularization and training.
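To make those four steps concrete, here is a minimal sketch (the layer sizes and the 10-class output are assumptions for illustration only):

from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv2D, MaxPool2D, Flatten, Dense, Dropout

model = Sequential([
    Conv2D(32, 3, activation='relu', input_shape=(32, 32, 3)),  # convolution: feature extraction
    MaxPool2D(),                                                # pooling: downsample the feature maps
    Flatten(),                                                  # flattening: feature maps -> vector
    Dropout(0.5),                                               # optional regularization layer
    Dense(64, activation='relu'),                               # fully connected step
    Dense(10, activation='softmax'),                            # classifier output (10 classes assumed)
])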
For more information, just open the keras documentation.
If you want to start learning Convolution neural network, this article may help.
A layer in an artificial neural network is a bunch of nodes bound together at a specific depth in the network. Keras is a high-level API that runs on top of backends such as TensorFlow or CNTK in order to simplify these tasks. A network built with Keras comprises 3 main kinds of layers:
Input Layer - contains the raw data
Hidden Layers - where the nodes learn various aspects of the raw input data; they act like levels of abstraction that together form the neural network.
Output Layer - consists of the output, which is usually a single node and can be used for classification.
Keras, as a whole, provides many different types of layers. A convolutional layer creates a kernel that is convolved with the input to produce a set of outputs. Pooling layers downsample the feature maps by summarizing patches of each map into single values; max pooling and average pooling are commonly used methods in a pooling layer.
Other commonly used layers in Keras are Embedding layers, Noise layers and Core layers. A single NN layer can only represent a linearly separable function. Most prediction problems are more complicated than that and require more than one layer, which is where the multi-layer concept comes in.
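A classic illustration of this (my own example, not from the question) is XOR, which is not linearly separable: a single Dense layer cannot fit it, while a network with one hidden layer can. A minimal sketch:

import numpy as np
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

# XOR truth table: not linearly separable, so a hidden layer is needed
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype="float32")
y = np.array([[0], [1], [1], [0]], dtype="float32")

model = Sequential([
    Dense(8, activation='relu', input_shape=(2,)),  # hidden layer makes the problem separable
    Dense(1, activation='sigmoid'),                 # output layer
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(X, y, epochs=2000, verbose=0)  # tiny dataset, so it may need many epochs to converge
print(model.predict(X).round())          # should approach [[0], [1], [1], [0]]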
I hope this clears your doubts; for any other queries you can look at https://www.tensorflow.org/api_docs/python/tf/keras
Neural networks are a great tool nowadays for automating classification problems. However, when it comes to computer vision, the amount of input data is too large to be handled efficiently by plain fully connected networks.
To reduce the network workload, the data needs to be preprocessed and certain features need to be identified. To find features in images we can use filters (like Sobel edge detection) that highlight the essential features needed for classification.
Again, the number of filters required to classify one image is too large to choose by hand, so the selection of those filters needs to be automated.
That's where the convolutional layer comes in.
We use a convolutional layer to generate multiple random (at first) filters that will highlight certain features in an image. While the network is training, those filters are optimized to do a better job at highlighting features.
In TensorFlow we use Conv2D() to add one of those layers. An example with parameters is Conv2D(64, 3, activation='relu'): 64 denotes the number of filters used, 3 denotes the size of the filters (in this case 3x3) and activation='relu' denotes the activation function.
After the convolutional layer we use a pooling layer to condense the features produced by the previous convolutional layer. In TensorFlow this is usually done with MaxPooling2D(), which slides a 2x2 window (by default) across the feature map in steps of 2 pixels, takes the maximum value in each 2x2 area and writes it into a new, smaller feature map.
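As a quick illustration (a toy example of my own, not from the answer), here is what MaxPooling2D() does to a small 4x4 input:

import numpy as np
import tensorflow as tf

# A single 4x4 "image" with one channel, shaped (batch, height, width, channels)
x = np.array([[1, 3, 2, 0],
              [4, 6, 1, 2],
              [7, 2, 9, 5],
              [0, 1, 3, 8]], dtype="float32").reshape(1, 4, 4, 1)

pooled = tf.keras.layers.MaxPooling2D(pool_size=(2, 2))(x)
print(pooled.numpy().reshape(2, 2))
# [[6. 2.]
#  [7. 9.]]  -> the maximum of each non-overlapping 2x2 block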
We can use this set of convolutional layer and pooling layers multiple times to make the image easier for the network to work with.
After we are done with those layers, we need to pass the output to a conventional (Dense) neural network.
To do that, we first need to flatten the image data from a multi-dimensional tensor into a 1D tensor (vector). This is done by adding a Flatten() layer.
Finally we need to add our Dense layers which are used to train on the flattened data. We do this by calling Dense(). An example of parameters is Dense(64, activation='relu')
where 64 is the number of nodes we are using.
Here is an example CNN structure I used recently:
# Build model
model = tf.keras.models.Sequential()
# Convolution and pooling layers
model.add(tf.keras.layers.Conv2D(64, 3, activation='relu', input_shape=(IMG_SIZE, IMG_SIZE, 1))) # Input layer
model.add(tf.keras.layers.MaxPooling2D())
model.add(tf.keras.layers.Conv2D(64, 3, activation='relu'))
model.add(tf.keras.layers.MaxPooling2D())
# Flattened layers
model.add(tf.keras.layers.Flatten())
# Dense layers
model.add(tf.keras.layers.Dense(64, activation='relu'))
model.add(tf.keras.layers.Dense(2, activation='softmax')) # Output layer
Of course this worked for a certain classification problem and the number of layers and method parameters differ depending on the problem.
The YouTube channel The Coding Train has a very helpful video explaining the convolutional and pooling layers.
Are there any methods that I may employ to identify 'false' 4K images? i.e. Images that have been upscaled to 4K from 720p/1080p.
I have tried searching, but I have mainly found only methods to upscale images, such as bilinear, bicubic, Lanczos, SRCNN and EDSR.
How may I then identify these images that have been upscaled from a lower resolution from 'truly 4K' images?
I currently have a dataset of 200 'true' 4K images that I will downscale and upscale again using one of the methods stated above. Is there a way I can train a model to differentiate these images in a given image dataset? This should give me at least 400 images to work with, across 2 categories: true 4K and upscaled 4K.
Is there a machine learning model I should use? I am new to the fields of computer vision, digital image processing and machine learning in general, and so far have only had experience with convolutional neural network image classifiers. Can a CNN be used to train a model to identify such images? Or is a machine learning approach not suitable in this case?
Thank you for your time.
EDIT: Following @CAFEBABE's suggestion, I have split these 4K images (real, Lanczos-upscaled and bicubic-upscaled from 1080p) into 51,200 images of 240x135 per category and put them through the CNN shown below.
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Activation, Dropout, Flatten, Dense

model = Sequential()
model.add(Conv2D(32, (3,3), input_shape = (135, 240, 3)))
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Conv2D(64, (3, 3)))
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Conv2D(64, (3, 3)))
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.25))
# 2 hidden layers
model.add(Flatten())
model.add(Dense(64))
model.add(Activation("relu"))
model.add(Dense(64))
model.add(Activation("relu"))
# The output layer with 3 neurons, for 3 classes
model.add(Dense(3))
model.add(Activation("softmax"))
# Compiling the model using some basic parameters
model.compile(loss="sparse_categorical_crossentropy",
              optimizer="adam",
              metrics=["accuracy"])
However, my model does not seem to be learning from the upscaling artefacts and instead appears to be categorising based on image content; I am getting accuracies of about 33%, which is chance level for 3 classes.
Can a CNN be used for this problem, or is there something I have missed in my model?
You should try.
Short answer: you can potentially do it with a CNN trained on the two-class problem upscaled/not upscaled. I would actually train it to identify the upscaling method as well, as that seems to be an easier problem. I suspect you will need more images, though. Secondly, training a CNN on such high-resolution images is a pain in the neck.
Hence I would follow this approach:
(Step 1) Build a dataset of lower-resolution patches taken from the large images. A 4096x2160 image basically consists of 16 patches of 1024x540, and so on. To make it realistically trainable, build up a dataset of patches with a resolution of around 227x240 from any source (see the patch-extraction sketch after this list).
(Step 2) Downscale and upscale these images the way you would with the high-res images. For this step I would not use the patches themselves but the original high-res images.
(Step 3) Train a NN to identify the upscaling.
(Step 4) Calculate for each image how well it helps to solve the problem (entropy of good vs. bad).
(Step 5) Build a segmentation model which selects from an image the best region(s) for solving the problem, i.e. which 227x240 patches out of a 4K image help you identify the upscaling. The segmentation model does not need to be trained on the full-resolution image. The assumption is that you will not be able to identify certain upscaling methods on uniformly coloured image regions.
(Loop) Repeat, but in Step 1 use the segmentation model to select the patches.
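As a rough sketch of Step 1 (the patch size, the PIL-based loading and the file name are my own assumptions, not from the answer), splitting an image into fixed-size patches might look like this:

import numpy as np
from PIL import Image

def extract_patches(image_path, patch_w=240, patch_h=135):
    # Split an image into non-overlapping patches of patch_w x patch_h pixels
    img = np.array(Image.open(image_path))
    h, w = img.shape[:2]
    patches = []
    for top in range(0, h - patch_h + 1, patch_h):
        for left in range(0, w - patch_w + 1, patch_w):
            patches.append(img[top:top + patch_h, left:left + patch_w])
    return patches

# e.g. a 3840x2160 frame yields (3840 // 240) * (2160 // 135) = 16 * 16 = 256 patches
patches = extract_patches("some_4k_image.png")  # hypothetical file name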
Try out a GAN for this. It will help you to solve this problem.
I want to predict an estimated wait time based on images using a CNN. So I imagine this would use a CNN to produce a regression-type output with an RMSE loss function, which is what I am using right now, but it is not working properly.
Can someone point out examples that use CNN image recognition to produce a scalar/regression output (instead of a class output), similar to wait time, so that I can use their techniques to get this to work? I haven't been able to find a suitable example.
All of the CNN examples that I found are for the MNIST data or for distinguishing between cats and dogs, which output a class, not a number/scalar output such as a wait time.
Can someone give me an example using tensorflow of a CNN giving a scalar or regression output based on image recognition.
Thanks so much! I am honestly super stuck and am getting no progress and it has been over two weeks working on this same problem.
Check out the Udacity self-driving-car models which take an input image from a dash cam and predict a steering angle (i.e. continuous scalar) to stay on the road...usually using a regression output after one or more fully connected layers on top of the CNN layers.
https://github.com/udacity/self-driving-car/tree/master/steering-models/community-models
Here is a typical model:
https://github.com/udacity/self-driving-car/tree/master/steering-models/community-models/autumn
...it uses tf.atan() or you can use tf.tanh() or just linear to get your final output y.
Use MSE for your loss function.
Here is another example in Keras (using the older Keras API)...
from keras import models, optimizers
from keras.layers import convolutional, pooling, core  # older Keras API module layout

model = models.Sequential()
model.add(convolutional.Convolution2D(16, 3, 3, input_shape=(32, 128, 3), activation='relu'))
model.add(pooling.MaxPooling2D(pool_size=(2, 2)))
model.add(convolutional.Convolution2D(32, 3, 3, activation='relu'))
model.add(pooling.MaxPooling2D(pool_size=(2, 2)))
model.add(convolutional.Convolution2D(64, 3, 3, activation='relu'))
model.add(pooling.MaxPooling2D(pool_size=(2, 2)))
model.add(core.Flatten())
model.add(core.Dense(500, activation='relu'))
model.add(core.Dropout(.5))
model.add(core.Dense(100, activation='relu'))
model.add(core.Dropout(.25))
model.add(core.Dense(20, activation='relu'))
model.add(core.Dense(1))
model.compile(optimizer=optimizers.Adam(lr=1e-04), loss='mean_squared_error')
The key difference from the MNIST examples is that instead of funneling down to an N-dim vector of logits fed into softmax with a cross-entropy loss, for your regression output you take it down to a 1-dim vector with an MSE loss. (You can also have a mix of multiple classification and regression outputs in the final layer, as in YOLO object detection.)
The key is to have NO activation function (i.e. a linear output) in your last fully connected (output) layer. Note that you must have at least 1 FC layer beforehand.
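To make that difference concrete, here is a minimal tf.keras sketch (my own illustration; the backbone layer sizes are placeholder assumptions) contrasting a classification head with a regression head on the same small CNN:

from tensorflow import keras

def build_model(output_layer, loss):
    # Same small CNN backbone, different output head (sizes are arbitrary assumptions)
    m = keras.Sequential([
        keras.layers.Conv2D(16, 3, activation='relu', input_shape=(64, 64, 3)),
        keras.layers.MaxPooling2D(),
        keras.layers.Flatten(),
        keras.layers.Dense(64, activation='relu'),
        output_layer,
    ])
    m.compile(optimizer='adam', loss=loss)
    return m

# Classification: N-dim softmax output with cross-entropy loss
classifier = build_model(keras.layers.Dense(10, activation='softmax'),
                         loss='sparse_categorical_crossentropy')

# Regression: single linear output (no activation) with MSE loss
regressor = build_model(keras.layers.Dense(1), loss='mean_squared_error')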
I am working with convolutional autoencoders. My autoencoder configuration has one convolutional layer with stride (2,2) or avg-pooling and ReLU activation, and one deconvolutional layer with stride (2,2) or avg-unpooling and ReLU activation.
I trained the autoencoder with the MNIST data set.
When I look at the feature maps after the first convolutional layer (20 filters with filter size 3), some of them are completely black, even though the learned filters themselves are not black. The same happens if I change the number of filters or the filter size.
I see this phenomenon with both TensorFlow and Theano autoencoders. (I have not tested other neural network software yet.)
Does anyone know why this happens?
I can avoid the black feature maps by adding an LRN layer, but I want to understand why they appear.
I found the same phenomenon.
After training a convolutional autoencoder with 7x7x3x6 filters on thousands of RGB images, two or three filters produce some output while the other filters produce all-zero outputs.
The error also stops decreasing when there are too many zero-output filters.
I changed the filter counts and sizes as well, but the results were almost the same.
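In case it helps to quantify the effect, here is a small diagnostic sketch (my own, assuming a Keras model and a batch of inputs called x_batch; the layer name is a placeholder) that counts feature maps whose activations are zero everywhere:

import numpy as np
from tensorflow import keras

def count_dead_feature_maps(model, layer_name, x_batch):
    # Build a probe model that outputs the activations of the chosen conv layer
    layer = model.get_layer(layer_name)
    probe = keras.Model(inputs=model.input, outputs=layer.output)
    acts = probe.predict(x_batch)               # shape: (batch, height, width, filters)
    per_filter_max = acts.max(axis=(0, 1, 2))   # maximum activation of each filter over the batch
    return int(np.sum(per_filter_max == 0))     # filters that never activate ("black" feature maps)

# Usage (names are assumptions): count_dead_feature_maps(autoencoder, "conv2d", x_batch)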