How to get samples per class for TensorFlow Dataset - tensorflow

I am using dataset from TensorFlow datasets.
Is there an easy way to access number of samples for each class in dataset? I was searching through keras api, and I did not found any ready to use function.
Ultimately I would like to plot a bar plot with number of samples at Y axis, and int indicating class id at X axis. The goal is to show how evenly is data distributed across classes.

With np.fromiter you can create a 1-D array from an iterable object.
import tensorflow_datasets as tfds
import numpy as np
import seaborn as sns
dataset = tfds.load('cifar10', split='train', as_supervised=True)
labels, counts = np.unique(np.fromiter(dataset.map(lambda x, y: y), np.int32),
return_counts=True)
plt.ylabel('Counts')
plt.xlabel('Labels')
sns.barplot(x = labels, y = counts)
Update: You can also count the labels like below:
labels = []
for x, y in dataset:
# Not one hot encoded
labels.append(y.numpy())
# If one hot encoded, then apply argmax
# labels.append(np.argmax(y, axis = -1))
labels = np.concatenate(labels, axis = 0) # Assuming dataset was batched.
Then you can plot them using the labels array.

Related

How to change xtick of Yellowbrick's Learning Curve visualizer?

I'm trying to change the xtick of Yellowbrick's learning curve figure from number of samples to normalized number(%) of samples. I googled a lot but couldn't find the way.
You need to change the xticks so that they are normalized to the number of training instances thus you need to specify in the percentformatter in my example the number of training instances(55000). I provide the before and after images.
from yellowbrick.model_selection import LearningCurve
from sklearn.naive_bayes import MultinomialNB
import numpy as np
from sklearn.preprocessing import OneHotEncoder, LabelEncoder
from yellowbrick.datasets import load_game
import matplotlib.pyplot as plt
# Create subplot
fig,ax = plt.subplots()
# Create the learning curve visualizer
sizes = np.linspace(0.3, 1.0, 10)
# Load a classification dataset
X, y = load_game()
# Encode the categorical data
X = OneHotEncoder().fit_transform(X)
y = LabelEncoder().fit_transform(y)
# Instantiate the classification model and visualizer
model = MultinomialNB()
visualizer = LearningCurve(
model, scoring='f1_weighted', ax=ax, train_sizes=sizes)
xticks = mtick.PercentFormatter(55000)
ax.xaxis.set_major_formatter(xticks)
visualizer.fit(X, y) # Fit the data to the visualizer
visualizer.show()

Linear regression with one feature from Pandas dataframe

I have tried the code below
import pandas as pd
from sklearn.linear_model import LinearRegression
import numpy as np
# Assign the dataframe to this variable.
# TODO: Load the data
bmi_life_data = pd.read_csv("bmi_and_life_expectancy.csv")
X= bmi_life_data['BMI'].values.reshape(-1,1)
y = bmi_life_data['Life expectancy'].values.reshape(-1,1)
# Make and fit the linear regression model
#TODO: Fit the model and Assign it to bmi_life_model
bmi_life_model = LinearRegression()
bmi_life_model.fit(X,y)
# Mak a prediction using the model
# TODO: Predict life expectancy for a BMI value of 21.07931
laos_life_exp = bmi_life_model.predict(21.07931)
but it gives me the error
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
Even after reshaping it. I have tried to not reshape it but it still gives me the same error.
The error was in the prediction line
laos_life_exp = bmi_life_model.predict(21.07931)
should be
laos_life_exp = bmi_life_model.predict([[21.07931]])
to be of appropriate dimension
Thanks to #onyambu

tensor slicing in tensorflow

I want to do the same numpy operation as follow to make a custom layer
img=cv2.imread('img.jpg') # img.shape =>(600,600,3)
mask=np.random.randint(0,2,size=img.shape[:2],dtype='bool')
img2=np.expand_dims(img,axis=0) #img.shape => (1,600,600,3)
img2[:,mask,:].shape # => (1, 204030, 3)
this is my first attemp but I failed. I can't do the same operation for for tensorflow tensors
class Sampling_layer(keras.layers.Layer):
def __init__(self,sampling_matrix):
super(Sampling_layer,self).__init__()
self.sampling_matrix=sampling_matrix
def call(self,input_img):
return input_img[:,self.sampling_matrix,:]
More Explanations:
I want to define a keras layer so that given a batch of images it use a sampling matrix and give me a batch of sampled vectors for the images.The sampling matrix is a random boolean matrix the same size as the image. The slicing operation I used is straight forward for numpy arrays and works perfectly. but I can't get it done with tensors in tensorflow. I tried to use loops to perform the operation I want manually but I failed.
You can do the following.
import numpy as np
import tensorflow as tf
# Batch of images
img=np.random.normal(size=[2,600,600,3]) # img.shape =>(600,600,3)
# You'll need to match the first 3 dimensions of mask with the img
# for that we'll repeat the first axis twice
mask=np.random.randint(0,2,size=img.shape[1:3],dtype='bool')
mask = np.repeat(np.expand_dims(mask, axis=0), 2, axis=0)
# Defining input layers
inp1 = tf.keras.layers.Input(shape=(600,600,3))
mask_inp = tf.keras.layers.Input(shape=(600,600))
# The layer you're looking for
out = tf.keras.layers.Lambda(lambda x: tf.boolean_mask(x[0], x[1]) )([inp1, mask])
model = tf.keras.models.Model([inp1, mask_inp], out)
# Predict on sample data
toy_out = model.predict([img, mask])
Note that both your images and mask needs to have the same batch size. I couldn't find a solution to make this work without repeating the mask on batch axis to match the batch size of images. This is the only possible solution that came to my mind, (assuming that your mask changes for every batch of data).

How to resize elements in a ragged tensor in TensorFlow

I would like to resize every element in a ragged tensor. For example, if I have a ragged tensor of various sized images, how can I resize each one so that the dimensions are the same?
For example,
digits = tf.ragged.constant([np.zeros((1,60,60,1)), np.zeros((1,46,75,1))])
resize_lambda = lambda x: tf.image.resize(x, (60,60))
res = tf.ragged.map_flat_values(resize_lambda, digits)
I wish res to be a tensor of shape (2,60,60,1). How can I achieve this?
To clarify, this would be useful if within a custom layer we wanted to slice or crop sections from a single image to batch for inference in the next layer. In my case, I am attempting to combine two models (a model to segment an image into multiple cropped images of varying size and a classifier to predict each sub-image). I am also using tf 2.0
You should be able to do the following.
import tensorflow as tf
import numpy as np
digits = tf.ragged.constant([np.zeros((1,60,60,1)), np.zeros((1,46,75,1))])
res = tf.concat(
[tf.image.resize(digits[i].to_tensor(), (60,60)) for i in tf.range(digits.nrows())],
axis=0)

How do I plot a non-linear model using matplotlib?

I'm a bit lost as to how to proceed to achieve this. Normally with a linear model, when I perform linear regressions, I simply take my training data (x) and and my output data (y) and plot them using matplotlib. Now I have 3 features with and my output/observation (y). Can anyone guide me as to how to graph this kind of model using matplotlib? My goal is to fit a polynomial model and graph a polynomial using matplotlib.
%matplotlib inline
import sframe as frame
import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets, linear_model
# Initalize SFrame
sales = frame.SFrame('kc_house_data.gl/')
# Separate data into test and training data
train_data,test_data = sales.random_split(.8,seed=0)
# Organize data into training and testing data
train_x = train_data[['sqft_living', 'bedrooms', 'bathrooms']].to_dataframe().values
train_y = train_data[['price']].to_dataframe().values
test_x = test_data[['sqft_living', 'bedrooms', 'bathrooms']].to_dataframe().values
test_y = test_data[['price']].to_dataframe().values
# Create a model using sklearn with multiple features
regr = linear_model.LinearRegression(fit_intercept=True, n_jobs=2)
# test predictions
regr.predict(train_x)
# Prepare to plot the data
Note:
The train_x variable contains my 3 features, and my train_y contains the output data. I use SFrame to contain the data. SFrame has the ability to convert itself into a dataframe (used in Pandas). Using the conversion I am able to grab the values.
Rather than plotting a non-linear model with multiple discrete features at once, I have found that simply observing each and every feature against my observation/output was better and easier for my research.