pandas df.plot() not working - TypeError: float() argument must be a string or a number, not 'Period' - pandas

I am trying to use df.plot() and I keep getting the error below. I think it might be caused by my pandas version?
TypeError: float() argument must be a string or a number, not 'Period'
Plotting time series data:
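The question does not include the code or the data, so the following is only a guess at one way this error can arise: a column of Period values being handed to matplotlib, which cannot convert a Period to a float. A minimal sketch, assuming a hypothetical frame with a `month` column of monthly periods and a `sales` column, converts the periods to timestamps before plotting:

```python
import pandas as pd

# Hypothetical data: the column names and values are assumptions for illustration.
df = pd.DataFrame({
    "month": pd.period_range("2021-01", periods=4, freq="M"),
    "sales": [10.0, 12.5, 9.0, 14.0],
})

# Convert Period values to Timestamps (start of each period by default).
df["month"] = df["month"].dt.to_timestamp()

# Use the converted column as the index so the x axis is a DatetimeIndex.
plot_ready = df.set_index("month")

# plot_ready.plot()  # uncomment when matplotlib is installed
```

If the periods are in the index rather than in a column, `df.index.to_timestamp()` performs the same conversion.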

Related

How to properly tokenize column in pandas?

I am trying to solve a tokenization problem in my dataset of social media comments. I want to tokenize, lemmatize, and remove punctuation and stop words from a pandas column, but I am struggling to do this for each comment. I receive the following error when trying to get tokens:
import pandas as pd
import nltk
...
merged['message_tokens'] = merged.apply(lambda x: nltk.tokenize.word_tokenize(x['Clean_message']), axis=1)
TypeError: expected string or bytes-like object
When I try to tell pandas that I am passing it a string object, I get the following error message:
merged['message_tokens'] = merged.apply(lambda x: nltk.tokenize.word_tokenize(x['Clean_message'].str), axis=1)
AttributeError: 'str' object has no attribute 'str'
What am I doing wrong?
You can use astype to force the column type to string:
merged['Clean_message'] = merged['Clean_message'].astype(str)
If you want to see what is wrong in the original column, you can use:
m = merged['Clean_message'].apply(type).ne(str)
out = merged[m]
The out dataframe contains the rows where the value in the Clean_message column is not a string.
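The two steps in this answer can be tried on a small made-up frame. This is a sketch under assumptions: the sample rows are invented, and `str.split` stands in for `nltk.tokenize.word_tokenize` so the example runs without NLTK data. Note that `astype(str)` turns a missing value into the literal text 'nan', so the sketch fills missing values with an empty string first.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: one missing value and one integer among the strings.
merged = pd.DataFrame(
    {"Clean_message": ["good morning all", np.nan, 42, "see you soon"]}
)

# Rows whose value is not a Python str (the check suggested in the answer).
not_str = merged["Clean_message"].apply(type).ne(str)
bad_rows = merged[not_str]

# Fill missing values, then force everything to str before tokenizing.
cleaned = merged["Clean_message"].fillna("").astype(str)

# str.split is a stand-in tokenizer; swap in word_tokenize if NLTK is set up.
tokens = cleaned.apply(str.split)
```

Looking at `bad_rows` before casting is useful because it shows whether the offending values are missing data or some other type that should be cleaned upstream.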

unsupported operand type(s) for -: 'str' and 'str' - Implementation LimeTabularExplainer

My issue is that I get the error TypeError: unsupported operand type(s) for -: 'str' and 'str' when I try to follow the implementation given in the notebook
https://github.com/klemag/PyconUS_2019-model-interpretability-tutorial/blob/master/02-interpretability_LIME-solution.ipynb
by Kevin Lemagnen.
I have used his suggested way of preprocessing the data and converting them to the format needed by the LIME XAI technique.
I have used the following helper function:
def convert_to_lime_format(X, categorical_names, col_names=None, invert=False):
    """Converts data with categorical values as string into the right format
    for LIME, with categorical values as integers labels.
    It takes categorical_names, the same dictionary that has to be passed
    to LIME to ensure consistency.
    col_names and invert allow to rebuild the original dataFrame from
    a numpy array in LIME format to be passed to a Pipeline or sklearn
    OneHotEncoder
    """
    # If the data isn't a dataframe, we need to be able to build it
    if not isinstance(X, pd.DataFrame):
        X_lime = pd.DataFrame(X, columns=col_names)
    else:
        X_lime = X.copy()
    for k, v in categorical_names.items():
        if not invert:
            label_map = {
                str_label: int_label for int_label, str_label in enumerate(v)
            }
        else:
            label_map = {
                int_label: str_label for int_label, str_label in enumerate(v)
            }
        X_lime.iloc[:, k] = X_lime.iloc[:, k].map(label_map)
    return X_lime
How can I fix this issue? Any help would be greatly appreciated.
I have already looked around on Stack Overflow and googled the TypeError, and I found the following explanation:
The Python error TypeError: unsupported operand type(s) for -: 'str' and 'str' occurs when you try to subtract one string from another, even if both strings contain numbers. The minus operator ('-') is not supported between two str operands, and Python does not cast them automatically.
In Python, arithmetic operations can be used between valid numbers. For example, you can subtract one number from another, and an integer can be subtracted from a float. If you try to subtract a string from a string that contains a number, the error TypeError: unsupported operand type(s) for -: 'str' and 'str' will be thrown.
Objects other than numbers cannot be used in Python subtraction. If a number is stored as a string, it should be converted to an integer (or float) before subtracting. Otherwise, the error TypeError: unsupported operand type(s) for -: 'str' and 'str' will be shown.
However, I was not able to resolve the problem.
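The quoted explanation can be reproduced in a few lines. The sketch below is not a diagnosis of the LIME notebook. It only shows the error itself and one generic check: listing the columns of a frame that are still non-numeric, since a string column reaching numeric code is a plausible way to trigger this message. The frame `X` and its column names are hypothetical.

```python
import pandas as pd

# Subtracting two strings raises the TypeError from the question.
left, right = "5", "3"
try:
    left - right
except TypeError as exc:
    message = str(exc)

# Converting to numbers first makes the subtraction valid.
difference = int(left) - int(right)

# Hypothetical frame: one numeric column and one column left as strings.
X = pd.DataFrame({"age": [39, 50], "workclass": ["Private", "State-gov"]})

# Columns that are not numeric and would need encoding before numeric code.
non_numeric = X.select_dtypes(exclude="number").columns.tolist()
```

If `non_numeric` is not empty for the array that is handed to the explainer, those columns are the first place to look.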

TypeError converting from pandas data frame to numpy array

I get a TypeError when converting a pandas dataframe to a numpy array (after using pd.get_dummies, or after creating dummy variables from the dataframe with df.apply) if the columns are of mixed types int, str and float.
I do not get this error when the columns are only of types int and str.
code:
df = pd.DataFrame({'a':[1,2]*2, 'b':['m','f']*2, 'c':[0.2, .1, .3, .5]})
dfd = pd.get_dummies(df, drop_first=True, dtype=int)
dfd.values
Error: TypeError: '<' not supported between instances of 'str' and 'int'
I get the same error with dfd.to_numpy().
Even if I convert the dataframe dfd to int or float values using astype, dfd.to_numpy() still produces the error. I get the error even when I select only the columns that were not changed from df.
Goal:
I am encoding categorical features of the dataframe to one hot encoding, and then want to use SelectKBest with score_func=mutual_info_classif to select some features. The error produced by the code after fitting SelectKBest is same as the error produced by dfd.to_numpy() and hence I am assuming that the error is being produced when SelectKBest is trying to convert dataframe to numpy.
Besides, just using mutual_info_classif to get scores for corresponding features is working.
How should I debug it? Thanks.
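One way to start debugging is to inspect what the frame actually holds before converting it. The sketch below reuses the frame from the question and adds three checks that are assumptions about where to look, not a confirmed cause: the dtype of each column, the Python type of each column label (a mix of str and int labels can fail when something tries to sort them), and an explicit conversion with a single target dtype.

```python
import pandas as pd

# Frame from the question.
df = pd.DataFrame({"a": [1, 2] * 2, "b": ["m", "f"] * 2, "c": [0.2, 0.1, 0.3, 0.5]})
dfd = pd.get_dummies(df, drop_first=True, dtype=int)

# 1. Which dtype does each column have? Any 'object' column is suspicious.
column_dtypes = dfd.dtypes

# 2. Are all column labels the same Python type?
label_types = {type(label) for label in dfd.columns}

# 3. Ask for one numeric dtype explicitly instead of letting pandas infer it.
arr = dfd.to_numpy(dtype=float)
```

If these checks pass on this small frame but the real data still fails, comparing the output of steps 1 and 2 between the two frames should narrow down the difference.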

Text lemmatization in pyspark returns a TypeError: 'Column' object is not callable

I spent a lot of time trying to find the solution to this one. I am working with pyspark on a text column dataframe that I tokenized, and I am trying to lemmatize it using nltk but this gives back an error: TypeError: 'Column' object is not callable. My code is the following:
def lemmatize(fullCorpus):
    lemmatizer = nltk.stem.WordNetLemmatizer()
    lemmatized = fullCorpus['filtered'].apply(lambda row: list(list(map(lemmatizer.lemmatize,y)) for y in row))
    return lemmatized
The above function works fine until I want to apply it to my dataframe string column:
election.withColumn("filtered", lemmatize(col("filtered")))
Here election is my dataframe and "filtered" is the column I would like to lemmatize.
It returns the following error: AttributeError: 'DataFrame' object has no attribute 'col'
I tried several variations, such as using the f alias for the functions module, but in vain: election.withColumn("filtered", lemmatize(f.col("filtered")))
I also tried the map and apply functions on my dataframe, which threw this error: AttributeError: 'DataFrame' object has no attribute 'apply'
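The errors above come from treating Spark objects like pandas ones: a Spark Column is an expression, not a container with `.apply`, so row-level Python logic is usually wrapped in a UDF. A minimal sketch of that idea follows, assuming the "filtered" column holds a flat list of tokens per row. The executable part is plain Python with a hypothetical stand-in lemmatizer so it runs without Spark or NLTK; the Spark wrapping is shown in comments and is untested here.

```python
from typing import Callable, List, Optional


def lemmatize_tokens(
    tokens: Optional[List[str]], lemmatize_word: Callable[[str], str]
) -> List[str]:
    """Lemmatize one row's token list. Plain Python, no Spark objects."""
    if tokens is None:
        return []
    return [lemmatize_word(token) for token in tokens]


def toy_lemmatizer(word: str) -> str:
    """Hypothetical stand-in for WordNetLemmatizer().lemmatize."""
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


result = lemmatize_tokens(["votes", "candidates", "win"], toy_lemmatizer)

# With Spark and NLTK available, the wrapping would look roughly like this:
#   from pyspark.sql.functions import udf, col
#   from pyspark.sql.types import ArrayType, StringType
#   lemmatizer = nltk.stem.WordNetLemmatizer()
#   lemmatize_udf = udf(
#       lambda toks: lemmatize_tokens(toks, lemmatizer.lemmatize),
#       ArrayType(StringType()),
#   )
#   election = election.withColumn("filtered", lemmatize_udf(col("filtered")))
```

Keeping the row-level function free of Spark types makes it easy to test on ordinary lists before running it on the cluster.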

tf.py_func InvalidArgumentError

I'm trying to wrap a python function into tensorflow using tf.py_func() and getting an InvalidArgumentError which I don't understand.
I'm passing two 2-d tensors and function returns a float value.
It's hard to tell for sure without the code of your distcorr() function, but it seems that, as the error says, the function returns a double / float64 while you are telling tf.py_func() to expect a float32 (cf. the tf.float32 parameter).
Either modify your function to cast the result before returning it (e.g. your_result.astype(numpy.float32)), or change the output type passed to tf.py_func() (the Tout argument) to tf.float64.
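The dtype mismatch described in this answer can be seen with NumPy alone. In the sketch below, `toy_distance` is a hypothetical stand-in for the unseen distcorr() function. It shows that a NumPy reduction over default arrays yields float64, and that casting the result gives the float32 the wrapper was told to expect.

```python
import numpy as np


def toy_distance(a: np.ndarray, b: np.ndarray) -> np.floating:
    """Hypothetical stand-in for distcorr(): root mean squared difference."""
    return np.sqrt(np.mean((a - b) ** 2))


a = np.ones((2, 3))   # default dtype is float64
b = np.zeros((2, 3))

raw = toy_distance(a, b)          # NumPy scalar of dtype float64
as_float32 = np.float32(raw)      # cast before returning to TensorFlow
```

Casting inside the wrapped function keeps the TensorFlow side unchanged, while switching the declared output type to tf.float64 keeps full precision. Which is preferable depends on what the rest of the graph expects.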