TensorFlow:Failed to convert a NumPy array to a Tensor (Unsupported object type int) - tensorflow

I am practicing on this kaggle dataset regarding car price prediction (https://www.kaggle.com/hellbuoy/car-price-prediction). I dont know why am I receiving this error.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from tensorflow.keras import layers,models
cars_data=pd.read_csv('/content/CarPrice_Assignment.csv')
cars_data.head()
cars_data.info()
cars_data.describe()
train_data=cars_data.iloc[:103]
train_data=train_data.drop('price',axis=1)
train_data=np.asarray(train_data.values)
train_targets=cars_data.price.iloc[:103]
train_targets=np.asarray(train_targets)
test_data=cars_data.iloc[103:165]
test_data=test_data.drop('price',axis=1)
test_data=np.asarray(test_data.values)
test_targets=cars_data.price.iloc[103:165]
test_targets=np.asarray(test_targets)
val_data=cars_data.iloc[165:]
val_data=val_data.drop('price',axis=1)
val_data=np.asarray(val_data.values)
val_targets=cars_data.price.iloc[165:]
val_targets=np.asarray(val_targets)
model=models.Sequential()
model.add(layers.Dense(10,activation='relu',input_shape=(25,)))
model.add(layers.Dense(8,activation='relu'))
model.add(layers.Dense(6,activation='relu'))
model.add(layers.Dense(1))
model.compile(optimizer='rmsprop',loss='mse',metrics=['mae'])
model.fit(train_data,train_targets,epochs=20,batch_size=1)

There are 2 things you need to address in your code.
Categorical Variables
By printing the value of train_data, I can see there are still some categorical variables in form of string. Tensorflow cannot process that kind of data directly, so you need to deal with categorical variables. See answer from Best way to deal with categorical variables in regression problem - python as your starting point.
target shape
Your train_targets shape is (107,) means that this is a 1D array. The correct shape for tensorflow input(for simple regression problem) is (107,1). Modify your code like this to reshape the value :
train_targets=np.asarray(train_targets).reshape(-1,1)

Related

How to apply pandas data on word2vec

I am trying to use W2V.
I saved my preprocessed data as a pandas dataframe, and I want to apply the word2vec algorithm to my preprocessed data.
This is my data.
http://naver.me/IFjLAHld
This is my code.
from gensim.models.word2vec import Word2Vec
import pandas as pd
import numpy as np
df = pd.read_excel('re_nlp0820.xlsx')
model = Word2Vec(df['nlp'],
sg=1,
window=3,
min_count=1,
workers=4,
iter=1)
model.init_sims(replace=True)
model_result1 = model.wv.most_similar('국민', topn =20)
print(model_result1)
Please, help me
First you need to convert the data you are passing to the Word2Vec instance into a nested list where each list contains the tokenized form of the text. You can do so by:
from gensim.models.word2vec import Word2Vec
import pandas as pd
import numpy as np
import nltk
df = pd.read_excel('re_nlp0820.xlsx')
nlp = [nltk.word_tokenize(i) for i in df['nlp']]
model = Word2Vec(nlp,
sg=1,
window=3,
min_count=1,
workers=4,
iter=1)
model.init_sims(replace=True)
model_result1 = model.wv.most_similar('국민', topn =20)
print(model_result1)
Gensim's Word2Vec needs as its training corpus a re-iterable sequence, where each item is a list-of-words.
You df['nlp'] is probably just a sequence of strings, so it's not in the right format. You should make sure each of its items is broken into a Python list that has your desired words as individual strings.
(Separately: min_count=1 is almost always a bad idea with this algorithm, which gives better results if rare words with few usage examples are discarded. And, you shouldn't need to call .init_sims() at all.)

MATLAB .mat in Pandas DataFrame to be used in Tensorflow

I have gone days trying to figure this out, hopefully someone can help.
I am uploading a .mat file into python using scipy.io, placing the struct into a dataframe, which will then be used in Tensorflow.
from scipy.io import loadmat
import pandas as pd
import numpy as p
import matplotlib.pyplot as plt
#import TF
path = '/home/anthony/PycharmProjects/Deep_Learning_MATLAB/circuit-data/for tinghao/template1-lib5-eqns-CR-RESULTS-SET1-FINAL.mat'
raw_data = loadmat(path, squeeze_me=True)
data = raw_data['Graphs']
df = pd.DataFrame(data, dtype=int)
df.pop('transferFunc')
print(df.dtypes)
The out put is:
A object
Ln object
types object
nz int64
np int64
dtype: object
Process finished with exit code 0
The struct is (43249x6). Each cell in the 'A' column is a different sized matrix, i.e. 18x18, or 16x16 etc. Each cell in "Ln" is a row of letters each in their own separate cell. Each cell in 'Types' contains 12 columns of numbers, and 'nz' and 'np' i have no issues with.
I want to put all columns into a dataframe, and use column A or LN or Types as the 'Labels' and nz and np as 'features', again i do not have issues with the latter. Can anyone help with this or have some kind of work around.
The end goal is to have tensorflow train on nz and np and give me either a matrix, Ln, or Type.
What type of data is your .mat file of ? Is your application very time critical?
If you can collect all your data in a struct you could give jsonencode a try, make the struct a json file and load it back into python via json (see json documentation on loading data).
Then you can create a pandas dataframe via
pd.df.from_dict()
Of course this would only be a workaround. Still you would have to ensure your data in the MATLAB struct is correctly orderer to be then imported and transferred to a df.
raw_data = loadmat(path, squeeze_me=True)
data = raw_data['Graphs']
graph_labels = pd.DataFrame()
graph_labels['perf'] = raw_data['Objective'][0:1000]
graph_labels['np'] = data['np'][0:1000]
The code above helped out. Its very simple and drawn out, but it got the job done. But, it does not work in tensorflow because tensorflow does not accept this format, and that was my main issue. I have to convert adjacency matrices to networkx graphs, then upload them into stellargraph.

FFT of exponentially decaying sinusoidal function

I have a set of simulation data to which I want to perform an FFT. I am using matplotlib to do this. However, the FFT is looking strange, so I don't know if I am missing something in my code. Would appreciate any help.
Original data:
time-varying data
FFT:
FFT
Code for the FFT calculation:
import numpy as np
import matplotlib.pyplot as plt
import scipy.fftpack as fftpack
data = pd.read_csv('table.txt',header=0,sep="\t")
fig, ax = plt.subplots()
mz_res=data[['mz ()']].to_numpy()
time=data[['# t (s)']].to_numpy()
ax.plot(time[:300],mz_res[:300])
ax.set_title("Time-varying mz component")
ax.set_xlabel('time')
ax.set_ylabel('mz amplitude')
fft_res=fftpack.fft(mz_res[:300])
power=np.abs(fft_res)
frequencies=fftpack.fftfreq(fft_res.size)
fig2, ax_fft=plt.subplots()
ax_fft.plot(frequencies[:150],power[:150]) // taking just half of the frequency range
I am just plotting the first 300 datapoints because the rest is not important.
Am I doing something wrong here? I was expecting single frequency peaks not what I got. Thanks!
Link for the input file:
Pastebin
EDIT
Turns out the mistake was in the conversion of the dataframe to a numpy array. For a reason I have yet to understand, if I convert a dataframe to a numpy array it is converted as an array of arrays, i.e., each element of the resulting array is itself an array of a single element. When I change the code to:
mz_res=data['mz ()'].to_numpy()
so that it is a conversion from a pandas series to a numpy array, then the FFT behaves as expected and I get single frequency peaks from the FFT.
So I just put this here in case someone else finds it useful. Lesson learned: the conversion from a pandas series to a numpy array yields a different result than the conversion from a pandas dataframe.
Solution:
Using the conversion from pandas series to numpy array instead of pandas dataframe to numpy array.
Code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import scipy.fftpack as fftpack
data = pd.read_csv('table.txt',header=0,sep="\t")
fig, ax = plt.subplots()
mz_res=data['mz ()'].to_numpy() #series to array
time=data[['# t (s)']].to_numpy() #dataframe to array
ax.plot(time,mz_res)
ax.set_title("Time-varying mz component")
ax.set_xlabel('time')
ax.set_ylabel('mz amplitude')
fft_res=fftpack.fft(mz_res)
power=np.abs(fft_res)
frequencies=fftpack.fftfreq(fft_res.size)
indices=np.where(frequencies>0)
freq_pos=frequencies[indices]
power_pos=power[indices]
fig2, ax_fft=plt.subplots()
ax_fft.plot(freq_pos,power_pos) # taking just half of the frequency range
ax_fft.set_title("FFT")
ax_fft.set_xlabel('Frequency (Hz)')
ax_fft.set_ylabel('FFT Amplitude')
ax_fft.set_yscale('linear')
Yields:
Time-dependence
FFT

What is the difference between doing a regression with a dataframe and ndarray?

I would like to know why would I need to convert my dataframe to ndarray when doing a regression, since I get the same result for intercept and coef when I do not convert it?
import matplotlib.pyplot as plt
import pandas as pd
import pylab as pl
import numpy as np
from sklearn import linear_model
%matplotlib inline
# import data and create dataframe
!wget -O FuelConsumption.csv https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/ML0101ENv3/labs/FuelConsumptionCo2.csv
df = pd.read_csv("FuelConsumption.csv")
cdf = df[['ENGINESIZE','CYLINDERS','FUELCONSUMPTION_COMB','CO2EMISSIONS']]
# Split train/ test data
msk = np.random.rand(len(df)) < 0.8
train = cdf[msk]
test = cdf[~msk]
# Modeling
regr = linear_model.LinearRegression()
train_x = np.asanyarray(train[['ENGINESIZE']])
train_y = np.asanyarray(train[['CO2EMISSIONS']])
**# if I use the dataframe, train[['ENGINESIZE']] for 'x', and train[['CO2EMISSIONS']] for 'y'
below, I get the same result**
regr.fit (train_x, train_y)
# The coefficients
print ('Coefficients: ', regr.coef_)
print ('Intercept: ',regr.intercept_)
Thank you very much!
So df is the loaded dataframe, cdf is another frame with selected columns, and train is selected rows.
train[['ENGINESIZE']] is a 1 column dataframe (I believe train['ENGINESIZE'] would be a pandas Series).
I believe the preferred syntax for getting an array from the dataframe is:
train[['ENGINESIZE']].values # or
train[['ENGINESIZE']].to_numpy()
though
np.asanyarray(train[['ENGINESIZE']])
is supposed to do the same thing.
Digging down through the regr.fit code I see that it calls sklearn.utils.check_X_y which in turn calls sklearn.tils.check_array. That takes care of converting the inputs to numpy arrays, with some awareness of pandas dataframe peculiarities (such as multiple dtypes).
So it appears that if fit accepts your dataframes, you don't need to convert them ahead of time. But if you can get a nice array from the dataframe, there's no harm in do that either. Either way the fit is done with arrays, derived from the dataframe.

Median filter nullifies the numpy array value

I have an image which I converted to numpy array and after that applying median filter using scipy library on that changes all elements of that ndarray to zero.I don't know why this is happening and I assume it should not happen.
from PIL import Image
import numpy as np
from scipy.signal import medfilt
train = np.array(Image.open("26.jpg").getdata() ,dtype=float).reshape(176, 208, 1)
s = np.sum(train, axis =0)
print(s)
train = medfilt(train, kernel_size= 3)
s1 = np.sum(train, axis =0)
print(s1)
Due to this issue I can't go for further image processing.
medfilt effectively zeropads at the boundaries. Since you have one dimension of size 1, in this direction every pixel is sandwiched between two zeros which outvote everything.
Try omitting the third dimension
train = np.array(Image.open("26.jpg").getdata() ,dtype=float).reshape(176, 208)
and you should be fine.
You can add it after filtering if need be.