Median filter nullifies the numpy array value - numpy

I have an image that I converted to a NumPy array. Applying a median filter from SciPy to it changes every element of the ndarray to zero. I don't know why this is happening, and I don't think it should.
from PIL import Image
import numpy as np
from scipy.signal import medfilt
train = np.array(Image.open("26.jpg").getdata() ,dtype=float).reshape(176, 208, 1)
s = np.sum(train, axis =0)
print(s)
train = medfilt(train, kernel_size= 3)
s1 = np.sum(train, axis =0)
print(s1)
Because of this issue I can't continue with further image processing.

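medfilt effectively zero-pads at the boundaries. Because the third dimension has size 1, every pixel is sandwiched between two zeros along that axis. With kernel_size=3 the window is 3x3x3, so it holds 18 padded zeros against at most 9 real values, and the zeros outvote everything: the median is 0.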
Try omitting the third dimension
train = np.array(Image.open("26.jpg").getdata() ,dtype=float).reshape(176, 208)
and you should be fine.
You can add it after filtering if need be.
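To see the effect without the image file, the sketch below runs medfilt on a small synthetic, strictly positive array (a stand-in for 26.jpg, not the real data), once with a trailing axis of size 1 and once as a plain 2-D array. SciPy may also warn that the kernel size exceeds the volume extent along the size-1 axis.

```python
import numpy as np
from scipy.signal import medfilt

# Synthetic, strictly positive "image" standing in for 26.jpg
rng = np.random.default_rng(0)
img = rng.uniform(1, 255, size=(6, 8))

# With a trailing singleton axis the 3x3x3 window is mostly zero padding,
# so every median comes out as 0
filtered_3d = medfilt(img.reshape(6, 8, 1), kernel_size=3)
print(filtered_3d.sum())  # 0.0

# Filtering the 2-D array keeps real values away from the corners
filtered_2d = medfilt(img, kernel_size=3)
print((filtered_2d[1:-1, 1:-1] > 0).all())  # True

# Add the channel axis back after filtering if later code needs it
restored = filtered_2d[..., np.newaxis]
print(restored.shape)  # (6, 8, 1)
```

The 2-D result still has zeros in its four corners, because medfilt zero-pads in 2-D as well. If that matters, scipy.ndimage.median_filter, whose default mode is 'reflect', avoids it.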

Related

TensorFlow: Failed to convert a NumPy array to a Tensor (Unsupported object type int)

I am practicing on this Kaggle dataset for car price prediction (https://www.kaggle.com/hellbuoy/car-price-prediction). I don't know why I am receiving this error.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from tensorflow.keras import layers,models
cars_data=pd.read_csv('/content/CarPrice_Assignment.csv')
cars_data.head()
cars_data.info()
cars_data.describe()
train_data=cars_data.iloc[:103]
train_data=train_data.drop('price',axis=1)
train_data=np.asarray(train_data.values)
train_targets=cars_data.price.iloc[:103]
train_targets=np.asarray(train_targets)
test_data=cars_data.iloc[103:165]
test_data=test_data.drop('price',axis=1)
test_data=np.asarray(test_data.values)
test_targets=cars_data.price.iloc[103:165]
test_targets=np.asarray(test_targets)
val_data=cars_data.iloc[165:]
val_data=val_data.drop('price',axis=1)
val_data=np.asarray(val_data.values)
val_targets=cars_data.price.iloc[165:]
val_targets=np.asarray(val_targets)
model=models.Sequential()
model.add(layers.Dense(10,activation='relu',input_shape=(25,)))
model.add(layers.Dense(8,activation='relu'))
model.add(layers.Dense(6,activation='relu'))
model.add(layers.Dense(1))
model.compile(optimizer='rmsprop',loss='mse',metrics=['mae'])
model.fit(train_data,train_targets,epochs=20,batch_size=1)
There are 2 things you need to address in your code.
Categorical Variables
By printing the value of train_data, I can see there are still some categorical variables in the form of strings. TensorFlow cannot process that kind of data directly, so you need to encode the categorical variables first. See the answer to "Best way to deal with categorical variables in regression problem - python" as your starting point.
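A minimal sketch of one common approach, one-hot encoding with pandas.get_dummies, is below. The three-row frame is a hypothetical stand-in for a few columns of CarPrice_Assignment.csv, not the real data. It also shows why the original call fails: a frame that mixes strings and numbers converts to an object-dtype array, which TensorFlow cannot turn into a tensor.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for a few columns of CarPrice_Assignment.csv
cars = pd.DataFrame({
    "fueltype": ["gas", "diesel", "gas"],
    "horsepower": [111, 154, 102],
    "price": [13495.0, 16500.0, 13950.0],
})
features = cars.drop("price", axis=1)

# Mixed string/numeric columns become an object array: the source of the error
print(features.to_numpy().dtype)  # object

# One-hot encode the string columns; numeric columns pass through unchanged
encoded = pd.get_dummies(features, dtype=float)
X = encoded.to_numpy(dtype=np.float32)
print(encoded.columns.tolist())  # ['horsepower', 'fueltype_diesel', 'fueltype_gas']
print(X.shape, X.dtype)          # (3, 3) float32
```

Encoding changes the number of feature columns, so input_shape=(25,) would need to become (X.shape[1],). Encoding the whole frame before splitting it into train, test and validation sets keeps the columns identical across the splits.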
Target Shape
Your train_targets has shape (103,), which means it is a 1D array (iloc[:103] selects 103 rows). For a simple regression problem with a single output unit, the target should have shape (103, 1). Modify your code like this to reshape the values:
train_targets=np.asarray(train_targets).reshape(-1,1)

FFT of exponentially decaying sinusoidal function

I have a set of simulation data on which I want to perform an FFT. I am using scipy.fftpack for the transform and matplotlib for plotting. However, the FFT looks strange, so I don't know if I am missing something in my code. I would appreciate any help.
Original data: (plot of the time-varying data)
FFT: (plot of the FFT)
Code for the FFT calculation:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import scipy.fftpack as fftpack
data = pd.read_csv('table.txt',header=0,sep="\t")
fig, ax = plt.subplots()
mz_res=data[['mz ()']].to_numpy()
time=data[['# t (s)']].to_numpy()
ax.plot(time[:300],mz_res[:300])
ax.set_title("Time-varying mz component")
ax.set_xlabel('time')
ax.set_ylabel('mz amplitude')
fft_res=fftpack.fft(mz_res[:300])
power=np.abs(fft_res)
frequencies=fftpack.fftfreq(fft_res.size)
fig2, ax_fft=plt.subplots()
ax_fft.plot(frequencies[:150],power[:150]) # taking just half of the frequency range
I am only plotting the first 300 data points because the rest is not important.
Am I doing something wrong here? I was expecting single frequency peaks, not what I got. Thanks!
Link to the input file: Pastebin
EDIT
It turns out the mistake was in the conversion of the dataframe to a NumPy array. Selecting the column with double brackets, data[['mz ()']], gives a one-column DataFrame, and converting that yields a 2-D array of shape (n, 1), i.e., each element of the resulting array is itself an array of a single element. When I change the code to:
mz_res=data['mz ()'].to_numpy()
so that it is a conversion from a pandas series to a numpy array, then the FFT behaves as expected and I get single frequency peaks from the FFT.
So I just put this here in case someone else finds it useful. Lesson learned: the conversion from a pandas series to a numpy array yields a different result than the conversion from a pandas dataframe.
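The sketch below reproduces the effect with a synthetic decaying sinusoid (50 Hz, 0.1 s decay time, 1 ms sampling; all made-up stand-ins for table.txt). scipy.fftpack.fft transforms along the last axis by default, and for the (n, 1) array that axis has length 1, so each "FFT" is just the sample itself, which is why the first plot looked strange. The sketch also passes the sample spacing d to fftfreq so the frequency axis is in Hz rather than cycles per sample.

```python
import numpy as np
import pandas as pd
import scipy.fftpack as fftpack

# Synthetic stand-in for table.txt: 50 Hz sinusoid decaying with tau = 0.1 s,
# sampled every 1 ms (300 samples)
dt = 1e-3
t = np.arange(300) * dt
data = pd.DataFrame({"# t (s)": t,
                     "mz ()": np.exp(-t / 0.1) * np.sin(2 * np.pi * 50 * t)})

mz_2d = data[["mz ()"]].to_numpy()  # one-column DataFrame -> shape (300, 1)
mz_1d = data["mz ()"].to_numpy()    # Series -> shape (300,)
print(mz_2d.shape, mz_1d.shape)     # (300, 1) (300,)

# fft works along the last axis (length 1 for mz_2d), so nothing is transformed
print(np.allclose(fftpack.fft(mz_2d), mz_2d))  # True

# On the 1-D array the transform runs over time and the peak lands at 50 Hz
power = np.abs(fftpack.fft(mz_1d))
freqs = fftpack.fftfreq(mz_1d.size, d=dt)  # d = sample spacing, so freqs are in Hz
positive = freqs > 0
peak_hz = freqs[positive][np.argmax(power[positive])]
print(round(peak_hz, 1))  # 50.0
```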
Solution:
Using the conversion from pandas series to numpy array instead of pandas dataframe to numpy array.
Code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import scipy.fftpack as fftpack
data = pd.read_csv('table.txt',header=0,sep="\t")
fig, ax = plt.subplots()
mz_res=data['mz ()'].to_numpy() #series to array
time=data[['# t (s)']].to_numpy() #dataframe to array
ax.plot(time,mz_res)
ax.set_title("Time-varying mz component")
ax.set_xlabel('time')
ax.set_ylabel('mz amplitude')
fft_res=fftpack.fft(mz_res)
power=np.abs(fft_res)
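frequencies=fftpack.fftfreq(fft_res.size, d=time[1, 0]-time[0, 0]) # d = sample spacing in s (assumes uniform steps), so frequencies are in Hz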
indices=np.where(frequencies>0)
freq_pos=frequencies[indices]
power_pos=power[indices]
fig2, ax_fft=plt.subplots()
ax_fft.plot(freq_pos,power_pos) # taking just half of the frequency range
ax_fft.set_title("FFT")
ax_fft.set_xlabel('Frequency (Hz)')
ax_fft.set_ylabel('FFT Amplitude')
ax_fft.set_yscale('linear')
Yields: (plots of the time dependence and of the FFT)

What is the difference between doing a regression with a dataframe and ndarray?

I would like to know why I would need to convert my dataframe to an ndarray when doing a regression, since I get the same intercept and coef when I do not convert it.
import matplotlib.pyplot as plt
import pandas as pd
import pylab as pl
import numpy as np
from sklearn import linear_model
%matplotlib inline
# import data and create dataframe
!wget -O FuelConsumption.csv https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/ML0101ENv3/labs/FuelConsumptionCo2.csv
df = pd.read_csv("FuelConsumption.csv")
cdf = df[['ENGINESIZE','CYLINDERS','FUELCONSUMPTION_COMB','CO2EMISSIONS']]
# Split train/ test data
msk = np.random.rand(len(df)) < 0.8
train = cdf[msk]
test = cdf[~msk]
# Modeling
regr = linear_model.LinearRegression()
train_x = np.asanyarray(train[['ENGINESIZE']])
train_y = np.asanyarray(train[['CO2EMISSIONS']])
# If I use the dataframes train[['ENGINESIZE']] for x and train[['CO2EMISSIONS']] for y
# in the fit below instead, I get the same result
regr.fit(train_x, train_y)
# The coefficients
print ('Coefficients: ', regr.coef_)
print ('Intercept: ',regr.intercept_)
Thank you very much!
So df is the loaded dataframe, cdf is another frame with selected columns, and train is a selection of rows from cdf.
train[['ENGINESIZE']] is a one-column dataframe, whereas train['ENGINESIZE'] would be a pandas Series.
I believe the preferred syntax for getting an array from the dataframe is:
train[['ENGINESIZE']].values # or
train[['ENGINESIZE']].to_numpy()
though
np.asanyarray(train[['ENGINESIZE']])
is supposed to do the same thing.
Digging down through the regr.fit code, I see that it calls sklearn.utils.check_X_y, which in turn calls sklearn.utils.check_array. That takes care of converting the inputs to NumPy arrays, with some awareness of pandas dataframe peculiarities (such as multiple dtypes).
So it appears that if fit accepts your dataframes, you don't need to convert them ahead of time. But if you can get a clean array from the dataframe, there's no harm in doing that either. Either way, the fit is done with arrays derived from the dataframe.

Convert an image format from 32FC1 to 16UC1

I need to encode an image in 16UC1 format, but I receive the error:
cv_bridge.core.CvBridgeError:encoding specified as 16UC1, but image has incompatible type 32FC1
I tried to use the skimage function img_as_uint, but since my image values are not between -1 and 1 it doesn't work. I also tried to "normalize" my values by dividing all of them by the value obtained from np.amax, but the skimage function then only returns a blank image.
Is there a way of achieving this conversion?
This is the original 32FC1 image
With numpy you should be able to:
import numpy as np
img = np.random.normal(0, 1, (300, 300, 3)).astype(np.float32) # simulated image
uimg = img.astype(np.uint16)
You will probably want to normalize first if the values aren't already in the unsigned 16-bit range. Scale to 65535 rather than 65536, since 65536 does not fit in uint16. Probably something like:
img_normalized = (img-img.min())/(img.max()-img.min())*(2**16 - 1)
But your normalization strategy will depend on what you want to accomplish.
Thanks for sharing an image. I can visualize it as follows:
import numpy as np
import matplotlib.pyplot as plt
arr = np.load('32FC1_image.npz')
img = arr['arr_0']
img = np.squeeze(img) # this gets rid of the extra dimensions that are causing matplotlib to not recognize it as an image, the extra dimensions also may be causing your problems
img_normalized = (img-img.min())/(img.max()-img.min())*(2**16 - 1) # 65535 is the largest uint16 value
img_normalized = img_normalized.astype(np.uint16)
plt.imshow(img_normalized)
Try using the normalized 16 bit image.
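The same steps can be wrapped in a small helper; float_to_uint16 is a hypothetical name, and the 2x2 array is synthetic data, not the shared 32FC1_image.npz. The helper squeezes extra axes, min-max scales into [0, 65535], rounds, and guards against a constant image, which would otherwise divide by zero.

```python
import numpy as np

def float_to_uint16(img: np.ndarray) -> np.ndarray:
    """Hypothetical helper: min-max scale a float image into [0, 65535]."""
    img = np.squeeze(np.asarray(img, dtype=np.float64))  # drop size-1 axes
    lo, hi = img.min(), img.max()
    if hi == lo:  # constant image: avoid dividing by zero
        return np.zeros(img.shape, dtype=np.uint16)
    scaled = (img - lo) / (hi - lo) * np.iinfo(np.uint16).max
    return np.round(scaled).astype(np.uint16)

# Synthetic 32FC1-like data with an extra trailing axis
depth = np.array([[0.5, 1.0], [2.0, 3.0]], dtype=np.float32)[..., np.newaxis]
u16 = float_to_uint16(depth)
print(u16.dtype, u16.min(), u16.max())  # uint16 0 65535
print(u16.tolist())                     # [[0, 13107], [39321, 65535]]
```

The result could then be handed to cv_bridge with encoding="16UC1", for example through CvBridge().cv2_to_imgmsg. Min-max scaling discards the original units, though. If the 32FC1 image is a ROS depth image in metres and the consumer expects 16UC1 millimetres, multiplying by 1000 instead of normalizing would keep the values meaningful.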

Slicing the channels of image and storing the channels into numpy array(same size as image). Plotting the numpy array not giving the original image

I separated the 3 channels of a colour image. I created a new NumPy array of the same size as the image and stored the 3 channels in the 3 slices of that 3D array. When I plot the new array, the image is not the same as the original. Why is this happening?
Both img and new_img have the same elements, but the displayed image is different.
import matplotlib.image as mpimg
import matplotlib.pyplot as plt
import numpy as np
img=mpimg.imread('/storage/emulated/0/1sumint/kali5.jpg')
new_img=np.empty(img.shape)
new_img[:,:,0]=img[:,:,0]
new_img[:,:,1]=img[:,:,1]
new_img[:,:,2]=img[:,:,2]
plt.imshow(new_img)
plt.show()
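I expected the same image as the original.
The problem is that your new image is created with the default data type, float64, on this line:
new_img=np.empty(img.shape)
unless you specify a different dtype. For RGB data, plt.imshow expects floats in the range [0, 1] or integers in [0, 255], so a float64 array holding 0-255 values gets clipped and nearly every pixel comes out white.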
You can either (best) copy the original image's dtype like this:
new_img = np.empty(img.shape, dtype=img.dtype)
or use something like this:
new_img = np.zeros_like(img)
or (worst) specify one you happen to know matches your data, like this:
new_img = np.empty(img.shape, dtype=np.uint8)
I presume you have some reason for copying one channel at a time, but if not, you can avoid all the foregoing issues and just do:
new_img = np.copy(img)
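The dtype issue can be checked without plotting. In the sketch below a random uint8 array stands in for the JPEG (synthetic data, not kali5.jpg): the float64 copy holds exactly the same numbers, and only its dtype differs. If you do want a float array, dividing it by 255 before plotting brings it into the [0, 1] range imshow expects.

```python
import numpy as np

# Random 8-bit RGB array standing in for the image returned by mpimg.imread
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(4, 5, 3)).astype(np.uint8)

default_copy = np.empty(img.shape)                   # dtype defaults to float64
matched_copy = np.empty(img.shape, dtype=img.dtype)  # keeps uint8
for c in range(img.shape[2]):
    default_copy[:, :, c] = img[:, :, c]
    matched_copy[:, :, c] = img[:, :, c]

print(default_copy.dtype, matched_copy.dtype)        # float64 uint8
print(np.array_equal(default_copy, img))             # True: same values, different dtype
print(np.copy(img).dtype, np.zeros_like(img).dtype)  # uint8 uint8

# A float version imshow would display correctly: scale into [0, 1]
as_float = default_copy / 255.0
print(as_float.min() >= 0.0, as_float.max() <= 1.0)  # True True
```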