Are there any methods I can employ to identify 'false' 4K images, i.e. images that have been upscaled to 4K from 720p/1080p?
I have tried searching, but I have mainly found methods for upscaling images, such as Bilinear, Bicubic, Lanczos, SRCNN and EDSR.
How can I then distinguish images that have been upscaled from a lower resolution from 'truly 4K' images?
I currently have a dataset of 200 'true' 4K images that I will downscale and upscale again using one of the methods stated above. This should give me at least 400 images to work with, in 2 categories: true 4K and upscaled 4K. Is there a way I can train a model to differentiate these images in a given image dataset?
Is there a machine learning model I should use? I am new to the fields of computer vision, digital image processing and machine learning in general, and have only had experience with convolutional neural network image classifiers. Can a CNN be trained to identify such images, or is a machine learning approach not suitable in this case?
Thank you for your time.
EDIT: Following @CAFEBABE's suggestion, I've split these 4K images (real, Lanczos-upscaled and bicubic-upscaled from 1080p) into 51,200 images of 240x135 per category and fed them into the CNN shown below.
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Activation, Dropout, Flatten, Dense

model = Sequential()
model.add(Conv2D(32, (3, 3), input_shape=(135, 240, 3)))
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, (3, 3)))
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, (3, 3)))
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
# 2 hidden layers
model.add(Flatten())
model.add(Dense(64))
model.add(Activation("relu"))
model.add(Dense(64))
model.add(Activation("relu"))
# The output layer with 3 neurons, for 3 classes
model.add(Dense(3))
model.add(Activation("softmax"))
# Compiling the model using some basic parameters
model.compile(loss="sparse_categorical_crossentropy",
              optimizer="adam",
              metrics=["accuracy"])
However, my model does not seem to be learning the upscaling artifacts; it appears to be categorising based on what the image depicts instead, and I am getting an accuracy of about 33% (chance level for three classes).
Can a CNN be used for this problem, or is there something I missed in my model?
You should try. Short answer: you can potentially do it with a CNN trained on the two-class problem, upscaled vs. not upscaled. I would actually train it to identify the upscaling method as well, as that seems to be an easier problem. I suspect you will need more images, though. Secondly, training a CNN on such high-resolution images is a pain in the neck.
Hence I'd follow this approach:
(step 1) Build a dataset of lower-resolution patches from the large images. A 4096x2160 image basically yields 16 images of 1024x540, and so on. To make it realistically trainable, build up a dataset of 227x240 patches from any source.
(step 2) Downscale and upscale these images just as you would the high-resolution ones. For this step I would not use the patches themselves but the original high-resolution images (a sketch of steps 1 and 2 follows this list).
(step 3) Train a NN to identify the upscaling.
(step 4) Calculate, for each patch, how well it helps to solve the problem (entropy: good vs. bad).
(step 5) Build a segmentation model which selects the region(s) of an image best suited to solving the problem, i.e. which 227x240 patches out of a 4K image help you identify the upscaling. The segmentation model does not need to be trained on the full-resolution image. The assumption is that you will not be able to identify certain upscaling methods on uniformly coloured image regions.
(loop) Repeat, but in step 1 use the segmentation model to select patches.
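To illustrate steps 1 and 2, here is a minimal sketch using Pillow; the folder names, the 240x135 patch size and the scale factor are assumptions you would adapt to your own data:

# Sketch of steps 1-2: cut high-res images into patches, and build the
# "upscaled" class by downscaling then re-upscaling the originals first.
# Folder names, patch size and scale factor are assumptions, not fixed choices.
import os
from PIL import Image

PATCH_W, PATCH_H = 240, 135   # patch size
SCALE = 2                     # simulate a 1080p -> 4K upscale

def make_patches(img, out_dir, prefix):
    os.makedirs(out_dir, exist_ok=True)
    w, h = img.size
    for x in range(0, w - PATCH_W + 1, PATCH_W):
        for y in range(0, h - PATCH_H + 1, PATCH_H):
            patch = img.crop((x, y, x + PATCH_W, y + PATCH_H))
            patch.save(os.path.join(out_dir, "%s_%d_%d.png" % (prefix, x, y)))

for name in os.listdir("true_4k"):
    img = Image.open(os.path.join("true_4k", name))
    make_patches(img, "patches/true", name)          # genuine patches
    # downscale, then upscale back with the resampling filter under test
    small = img.resize((img.width // SCALE, img.height // SCALE), Image.BICUBIC)
    fake = small.resize(img.size, Image.LANCZOS)
    make_patches(fake, "patches/lanczos", name)      # upscaled patches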
Try out a GAN for this. Its discriminator is trained to distinguish real images from generated ones, which is essentially your problem, so it should help you solve it.
I am creating a machine learning model for classifying images of numbers. I have trained the model with TensorFlow and Keras using the built-in tf.keras.datasets.mnist dataset. The model works quite well on the test images from the MNIST dataset itself, but I would like to feed it images of my own. The images I am feeding this model are extracted from a Captcha, so they follow a similar pattern. I have included some examples of the images in this public Google Drive folder. When I feed these images, I noticed that the model is not very accurate, and I have some guesses as to why.
The background of the image creates too much noise in the picture.
The number is not centered.
The image is not strictly in the color format of the MNIST training set (black background, white text).
I wanted to ask how I can remove the background and centre the number, so that the noise in the image is reduced, allowing for better classifications.
Here is the model I am using:
import tensorflow as tf
from tensorflow import keras

mnist = keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Callback that stops training once 99% training accuracy is reached
class Stopper(keras.callbacks.Callback):
    def on_epoch_end(self, epoch, log={}):
        if log.get('acc') >= 0.99:
            self.model.stop_training = True
            print('\nReached 99% Accuracy. Stopping Training...')

model = keras.Sequential([
    keras.layers.Flatten(),
    keras.layers.Dense(1024, activation=tf.nn.relu),
    keras.layers.Dense(10, activation=tf.nn.softmax)])

model.compile(
    optimizer=tf.train.AdamOptimizer(),
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy'])

x_train, x_test = x_train / 255, x_test / 255
model.fit(x_train, y_train, epochs=10, callbacks=[Stopper()])
And here is my method of importing the image into tensorflow:
import numpy as np
from PIL import Image

img = Image.open("image_file_path").convert('L').resize((28, 28), Image.ANTIALIAS)
img = np.array(img)
model.predict(img[None, :, :])
I have also included some examples from the MNIST dataset here. I would like a script that converts my images to match the MNIST dataset format as closely as possible. Also, since I would have to do this for an indefinite number of images, I would appreciate it if you could provide a fully automated method for this conversion. Thank you very much.
You need to train with a dataset similar to the images you will be testing on. The MNIST data is hand-written numbers, which is not going to be similar to the computer-generated fonts of Captcha data.
What you need to do is build a catalog of Captcha data similar to what you're predicting on (preferably from the same source you will be feeding into the final model). Capturing the data is a painstaking task, and you'll probably need around 300-400 images per label before you start to get something useful.
A key note: your model will only ever be as good as the training data you supply it. Trying to make a good model with bad training data is an exercise in pure frustration.
To address some of your thoughts:
[the model is not very accurate because] the background of the image creates too much noise in the picture.
This is true. If the image data has noise and the neural net was not trained with any noise in the images, then it will not recognize a strong pattern when it encounters this type of distortion. One possible way to combat this is to take clean images and programmatically add noise (similar to what you see in the real Captcha) before sending them to be trained, as in the sketch below.
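For instance, a minimal sketch with NumPy; Gaussian noise is only an assumption here, so match the noise type and strength to the real Captchas:

import numpy as np

def add_noise(img, sigma=25.0):
    # img: uint8 array in [0, 255]; returns a noisy copy
    noise = np.random.normal(0.0, sigma, img.shape)
    noisy = img.astype(np.float32) + noise
    return np.clip(noisy, 0, 255).astype(np.uint8)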
[the model is not very accurate because] The number is not centered.
Also true, for the same reasons. If all the training data is centered, the model will be overtuned for this property and make incorrect guesses on off-center numbers. Follow a similar pattern to the one above if you don't have the capacity to manually capture and catalog a good sampling of data.
[the model is not very accurate because] The image is not striclty in the color format of MNIST training set (Black background white text).
You can get around this by applying a binary threshold to the data before processing, and/or by normalizing the color input before training. Depending on the amount of noise in the Captcha, you may get better results by letting the number and noise retain some of their color information (still convert to greyscale and normalize, just don't apply the threshold).
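A minimal sketch of such preprocessing with Pillow and NumPy; the inversion heuristic and the threshold value of 128 are assumptions to tune per Captcha source:

import numpy as np
from PIL import Image

def to_mnist_like(path, threshold=128, binarize=True):
    img = Image.open(path).convert('L').resize((28, 28), Image.ANTIALIAS)
    arr = np.array(img)
    if arr.mean() > 127:      # bright background: invert to MNIST polarity
        arr = 255 - arr
    if binarize:              # optional binary threshold
        arr = np.where(arr > threshold, 255, 0).astype(np.uint8)
    return arr / 255.0        # normalise to [0, 1], like x_train above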
Additionally, I'd recommend using a convolutional net rather than the linear network, as it is better at distinguishing 2D features like edges and corners, i.e. use keras.layers.Conv2D layers before flattening with keras.layers.Flatten.
See the great example found here: Trains a simple convnet on the MNIST dataset.
num_classes = 10           # as in the linked example: digits 0-9
input_shape = (28, 28, 1)  # as in the linked example: MNIST image dimensions

model = tf.keras.models.Sequential(
    [
        tf.keras.layers.Conv2D(
            32,
            kernel_size=(3, 3),
            activation=tf.nn.relu,
            input_shape=input_shape,
        ),
        tf.keras.layers.Conv2D(64, (3, 3), activation=tf.nn.relu),
        tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
        tf.keras.layers.Dropout(0.25),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation=tf.nn.relu),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(num_classes, activation=tf.nn.softmax),
    ]
)
I've used this setup for reading fonts in video gameplay footage, and with a test set of 10,000 images I'm achieving 99.98% accuracy, training on a random sampling of half the dataset and calculating accuracy on the full set.
I am using Keras to build a CNN to work with the CIFAR-10 dataset. I am slightly confused by one of the last lines of an online tutorial. They take 50,000 32x32 color images and process them through 4 convolutional layers and one fully connected layer. The last part is accomplished by:
model.add(Flatten())
model.add(Dense(512))
model.add(Activation('relu'))
I am trying to understand why it is model.add(Dense(512)) and not some other number. For example, I thought 32x32 images could be flattened to a 1024-length vector. But why did they choose 512 here?
Thanks!
Actually it's not 32x32 but 32x32x3, because of the color channels. Also, Flatten and Dense are different operations; I think you won't see this from the Keras code alone, so here is a low-level implementation:
import tensorflow as tf

batch = 32  # batch size (any value works; it is just the leading dimension)

W1 = tf.Variable(tf.random_normal([32 * 32 * 3, 512]), name="W1")  # weight variable
x = tf.placeholder(tf.float32, [batch, 32, 32, 3])                 # placeholder for inputs
flat = tf.reshape(x, [batch, 32 * 32 * 3])                         # model.add(Flatten())
mul1 = tf.matmul(flat, W1)                                         # model.add(Dense(512))
relu = tf.nn.relu(mul1)                                            # model.add(Activation('relu'))

Here flat has shape [batch, 32*32*3] = [batch, 3072], and mul1 has shape [batch, 512].
Of course it could have been 1024 or 5000; 512 is simply the chosen number of hidden units, a free hyperparameter, though a larger layer becomes harder to optimize.
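To see the shapes directly, here is a quick sanity check in Keras (a sketch; 512 is just the chosen hyperparameter):

from keras.models import Sequential
from keras.layers import Flatten, Dense, Activation

m = Sequential()
m.add(Flatten(input_shape=(32, 32, 3)))  # (32, 32, 3) -> (None, 3072)
m.add(Dense(512))                        # (None, 3072) -> (None, 512)
m.add(Activation('relu'))
m.summary()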
I'm a beginner in machine learning, and I am currently trying to predict the position of an object within an image that is part of a dataset I created.
This dataset contains about 300 images in total, across 2 classes (Ace and Two).
I created a CNN that predicts whether a card is an Ace or a Two with about 88% accuracy.
Since this was working well, I decided to try to predict the position of the card instead of its class. I read some articles, and from what I understood, all I had to do was take the same CNN I used to predict the class and change the last layer to a Dense layer with 4 nodes.
That's what I did, but apparently this isn't working.
Here is my model:
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Activation, Flatten, Dense

model = Sequential()
model.add(Conv2D(64, (3, 3), input_shape=(150, 150, 1)))
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size=2))
model.add(Conv2D(32, (3, 3)))
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size=2))
model.add(Dense(64))
model.add(Activation("relu"))
model.add(Flatten())
model.add(Dense(4))
model.compile(loss="mean_squared_error", optimizer='adam', metrics=[])
model.fit(X, y, batch_size=1, validation_split=0,
          epochs=30, verbose=1, callbacks=[TENSOR_BOARD])
What I feed to my model:
X: a grayscale image of 150x150 pixels, with each pixel rescaled to [0, 1].
y: the smallest X coordinate, the highest Y coordinate, and the width and height of the object (each of these values is in [0, 1]).
And here's an example of predictions it gives me:
[array([ 28.66145 , 41.278576, -9.568813, -13.520659], dtype=float32)]
but what I really wanted was:
[0.32, 0.38666666666666666, 0.4, 0.43333333333333335]
I knew something was wrong, so I decided to train and test my CNN on a single image (it should then overfit and predict the right bounding box for that single image). Even after overfitting on this single image, the predicted values were ridiculously high.
So my question is:
What am I doing wrong ?
EDIT 1
After trying @Matias's solution, which was to add a sigmoid activation function to the last layer, all of the output values are now within [0, 1].
But, even with this, the model still produces bad outputs.
For example, after training it for 10 epochs on the same image, it predicted this:
[array([0.0000000e+00, 0.0000000e+00, 8.4378130e-18, 4.2288357e-07],dtype=float32)]
but what I expected was:
[0.2866666666666667, 0.31333333333333335, 0.44666666666666666, 0.5]
EDIT 2
Okay, so, after experimenting for quite a while, I've come to the conclusion that the problem is either my model (the way it is built) or the lack of training data.
But even if it were caused by a lack of training data, I should have been able to overfit it on 1 image and get the right predictions for that one, right?
I created another post which asks about my last question since the original one has been answered and I don't want to completely re-edit the post since it would make the first answers kind of pointless.
Since your targets (the Y values) are normalized to the [0, 1] range, the output of the model should match this range. For this you should use a sigmoid activation at the output layer, so the output is constrained to the [0, 1] range:
model.add(Dense(4, activation='sigmoid'))
I want to predict an estimated wait time based on images using a CNN. I imagine this would use a CNN to produce a regression-type output with an RMSE loss function, which is what I am using right now, but it is not working properly.
Can someone point me to examples that use CNN image recognition to produce a scalar/regression output (instead of a class output), similar to a wait time, so that I can use their techniques to get this to work? I haven't been able to find a suitable example.
All of the CNN examples that I found are for the MNIST dataset and distinguishing between cats and dogs, which output a class, not a number/scalar like a wait time.
Can someone give me an example, using TensorFlow, of a CNN giving a scalar or regression output based on image recognition?
Thanks so much! I am honestly super stuck and am getting no progress and it has been over two weeks working on this same problem.
Check out the Udacity self-driving-car models, which take an input image from a dash cam and predict a steering angle (i.e. a continuous scalar) to stay on the road... usually using a regression output after one or more fully connected layers on top of the CNN layers.
https://github.com/udacity/self-driving-car/tree/master/steering-models/community-models
Here is a typical model:
https://github.com/udacity/self-driving-car/tree/master/steering-models/community-models/autumn
...it uses tf.atan(), or you can use tf.tanh() or just a linear output to get your final output y.
Use MSE for your loss function.
Here is another example in keras...
# Keras 1-style API, as in the original example
from keras import models, optimizers
from keras.layers import convolutional, core, pooling

model = models.Sequential()
model.add(convolutional.Convolution2D(16, 3, 3, input_shape=(32, 128, 3), activation='relu'))
model.add(pooling.MaxPooling2D(pool_size=(2, 2)))
model.add(convolutional.Convolution2D(32, 3, 3, activation='relu'))
model.add(pooling.MaxPooling2D(pool_size=(2, 2)))
model.add(convolutional.Convolution2D(64, 3, 3, activation='relu'))
model.add(pooling.MaxPooling2D(pool_size=(2, 2)))
model.add(core.Flatten())
model.add(core.Dense(500, activation='relu'))
model.add(core.Dropout(.5))
model.add(core.Dense(100, activation='relu'))
model.add(core.Dropout(.25))
model.add(core.Dense(20, activation='relu'))
model.add(core.Dense(1))   # single linear output for regression
model.compile(optimizer=optimizers.Adam(lr=1e-04), loss='mean_squared_error')
The key difference from the MNIST examples is that instead of funneling down to an N-dim vector of logits into softmax with cross-entropy loss, for a regression output you take it down to a 1-dim vector with MSE loss. (You can also have a mix of multiple classification and regression outputs in the final layer, as in YOLO object detection.)
The key is to have NO activation function in your last fully connected (output) layer. Note that you must have at least 1 FC layer beforehand.
I am running a CNN for left/right shoeprint classification. I have 190,000 training images and use 10% of them for validation. My model is set up as shown below. I get the paths of all the images, read them in and resize them. I normalize the images and then fit the model. My issue is that I am stuck at a training accuracy of 62.5% and a loss of around 0.6615-0.6619. Is there something wrong that I am doing? How can I stop this from happening?
Just some interesting points to note:
I first tested this on 10 images. I was having the same issue, but changing the optimizer to adam and the batch size to 4 worked.
I then tested on more and more images, but each time I would need to change the batch size to get improvements in accuracy and loss. With 10,000 images I had to use a batch size of 500 and the rmsprop optimizer. However, the accuracy and loss only really began to change after epoch 10.
I am now training on 190,000 images, and I cannot increase the batch size, as my GPU is at its max.
import os
import numpy as np
from scipy import misc as image   # assuming scipy.misc, which provides imread/imresize
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense
from keras.optimizers import SGD
from keras.utils import np_utils

imageWidth = 50
imageHeight = 150

def get_filepaths(directory):
    file_paths = []
    for root, dirs, files in os.walk(directory):
        for filename in files:
            filepath = os.path.join(root, filename)
            file_paths.append(filepath)  # Add it to the list.
    return file_paths

def cleanUpPaths(fullFilePaths):
    cleanPaths = []
    for f in fullFilePaths:
        if f.endswith(".png"):
            cleanPaths.append(f)
    return cleanPaths

def getTrainData(paths):
    trainData = []
    for i in range(1, 190000, 2):
        im = image.imread(paths[i])
        im = image.imresize(im, (150, 50))
        im = (im - 255) / float(255)
        trainData.append(im)
    trainData = np.asarray(trainData)
    right = np.zeros(47500)
    left = np.ones(47500)
    trainLabels = np.concatenate((left, right))
    trainLabels = np_utils.to_categorical(trainLabels)
    return (trainData, trainLabels)
# create the convnet
model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(imageWidth, imageHeight, 1), strides=1))
model.add(Conv2D(32, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Conv2D(64, (3, 3), activation='relu', strides=1))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(1, 3)))
model.add(Dropout(0.25))
model.add(Conv2D(64, (1, 2), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 1)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(256, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(2, activation='softmax'))

sgd = SGD(lr=0.01)
model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])

# prepare the training data
trainPaths = get_filepaths("better1/train")
trainPaths = cleanUpPaths(trainPaths)
(trainData, trainLabels) = getTrainData(trainPaths)
trainData = np.reshape(trainData, (95000, imageWidth, imageHeight, 1)).astype('float32')
trainData = (trainData - 255) / float(255)

# train the convnet
model.fit(trainData, trainLabels, batch_size=500, epochs=50, validation_split=0.2)

# save the model and weights
model.save('myConvnet_model5.h5')
model.save_weights('myConvnet_weights5.h5')
I've had this issue a number of times now, so I thought I'd make a little recap of it and possible solutions etc. to help people in the future.
Issue: Model predicts one of the 2 (or more) possible classes for all data it sees*
Confirming the issue is occurring: Method 1: the model's accuracy stays around 0.5 while training (or 1/n, where n is the number of classes). Method 2: get the counts of each class in the predictions and confirm it is predicting all one class (see the snippet below).
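A sketch of Method 2, assuming a trained model and a validation array x_val:

import numpy as np

preds = model.predict(x_val)   # shape (n_samples, n_classes)
print(np.unique(np.argmax(preds, axis=1), return_counts=True))
# if one class holds (almost) all the counts, the model has collapsed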
Fixes/Checks (in somewhat of an order):
Double Check Model Architecture: use model.summary(), inspect the model.
Check Data Labels: make sure the labelling of your train data hasn't got mixed up somewhere in the preprocessing etc. (it happens!)
Check Train Data Feeding Is Randomised: make sure you are not feeding your train data to the model one class at a time. For instance if using ImageDataGenerator().flow_from_directory(PATH), check that param shuffle=True and that batch_size is greater than 1.
Check Pre-Trained Layers Are Not Trainable:** If using a pre-trained model, ensure that any layers that use pre-trained weights are NOT initially trainable. For the first epochs, only the newly added (randomly initialised) layers should be trainable; for layer in pretrained_model.layers: layer.trainable = False should be somewhere in your code (see the sketch after this list).
Ramp Down Learning Rate: Keep reducing your learning rate by factors of 10 and retrying. Note that you will have to fully reinitialize the layers you are trying to train each time you try a new learning rate. (For instance, I had this issue that was only solved once I got down to lr=1e-6, so keep going!)
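A minimal sketch of checks 4 and 5 together, assuming a Keras VGG16 base as the pre-trained model (the base model, input size and head sizes are illustrative, not prescriptive):

from keras.models import Sequential
from keras.layers import Flatten, Dense
from keras.optimizers import Adam
from keras.applications import VGG16

base = VGG16(weights='imagenet', include_top=False, input_shape=(150, 150, 3))
for layer in base.layers:
    layer.trainable = False           # pre-trained layers NOT trainable at first

model = Sequential([
    base,
    Flatten(),
    Dense(256, activation='relu'),    # newly added, randomly initialised layers
    Dense(2, activation='softmax'),
])
model.compile(optimizer=Adam(lr=1e-4),  # retry 1e-5, 1e-6... if still stuck
              loss='categorical_crossentropy',
              metrics=['accuracy'])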
If any of you know of more fixes/checks that could possibly get the model training properly, then please do contribute and I'll try to update the list.
**Note that it is common to make more of the pretrained model trainable once the new layers have been trained "enough".
*Other names for the issue to help searches get here...
keras tensorflow theano CNN convolutional neural network bad training stuck fixed not static broken bug bugged jammed training optimization optimisation only 0.5 accuracy does not change only predicts one single class wont train model stuck on class model resetting itself between epochs keras CNN same output
You can try adding a BatchNormalization() layer after MaxPooling2D(). It works for me.
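For context, a sketch of the placement (the layer sizes follow the questioner's first block):

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, BatchNormalization

model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(50, 150, 1)))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(BatchNormalization())   # normalise activations between blocks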
I have just 2 more things to add to DBCerigo's great list.
Check activation functions: some layers have a linear activation function by default. If you do not insert some non-linearity into your model, it won't be able to generalise, so the net will try to learn how to linearly separate a feature space that is not linear. Making sure your non-linearities are set is a good checkpoint.
Check model complexity: if you have a relatively simple model and it learns only up to the 1st or 2nd epoch and then stalls, it may be trying to learn something too complex for it. Try making the model deeper. This usually happens when working with frozen models with only 1 or 2 layers unfrozen.
Although the second one may be obvious, I ran into this problem once and lost lots of time checking everything (data, batches, LR...) before figuring it out.
Hope this helps
I would try a couple of things. A lower learning rate should help with more data. Generally, adapting the optimizer should help. Additionally, your network seems really small; you might want to increase the capacity of the model by adding layers or increasing the number of filters in the layers.
A better description on how to apply deep learning in practice is given here.
In my case it was the optimizer that mattered: I changed from 'sgd' to 'adam'.