I am getting the following error when trying to convert the output of a SQL query into a DataFrame in a Jupyter notebook. I have already checked other posts on similar topics, but this is a different error. Can someone please explain why this is happening?
Code:
import pandas as pd
k = %sql select * from table1
df = k.DataFrame()
Error: AttributeError: 'DataFrame' object has no attribute 'DataFrame'
k is already a DataFrame object.
Always check with
type(k)
This will tell you what type of object you have; based on that you can decide whether any conversion is needed.
Just a suggestion going forward: if you want to convert a variable into a DataFrame, use pd.DataFrame.
In your case df = pd.DataFrame(k) would be the way, if k were not already a DataFrame object.
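A minimal sketch of that check, where `k` stands in for the result of the %sql magic (simulated here with a plain DataFrame, since that is what the error message says it already is):

```python
import pandas as pd

# Stand-in for the %sql result; per the error, it is already a DataFrame.
k = pd.DataFrame({"id": [1, 2], "name": ["a", "b"]})

print(type(k))  # confirm what you are holding before converting

# Convert only when the object is not already a DataFrame:
df = k if isinstance(k, pd.DataFrame) else pd.DataFrame(k)
```

With this guard, `df` is usable whether the magic returned a DataFrame or a raw result set.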
I am trying to use pandas to read an Excel file, and I need to convert the pandas DataFrame to a PySpark DataFrame.
df = pd.read_excel(filepath, sheet_name="A", skiprows=12, usecols="B:AM", parse_dates=True)
pyspark_df= spark.createDataFrame(df)
When I do this, I get the error:
TypeError: Can not infer schema for type:
Even though I tried to specify the dtype for read_excel and define the schema, I still get the error:
df = pd.read_excel(filepath, sheet_name="A", skiprows=12, usecols="B:AM", parse_dates=True, dtype=dtypetest)
pyspark_df= spark.createDataFrame(df,schema)
Could you tell me how to solve it?
I am trying to solve a tokenization problem in my dataset of comments from social media. I want to tokenize, lemmatize, and remove punctuation and stop-words from a pandas column. I am struggling with how to do this for each comment. I receive the following error when trying to get tokens:
import pandas as pd
import nltk
...
merged['message_tokens'] = merged.apply(lambda x: nltk.tokenize.word_tokenize(x['Clean_message']), axis=1)
TypeError: expected string or bytes-like object
When I try to tell pandas that I am passing it a string object, it gives me the following error message:
merged['message_tokens'] = merged.apply(lambda x: nltk.tokenize.word_tokenize(x['Clean_message'].str), axis=1)
AttributeError: 'str' object has no attribute 'str'
What am I doing wrong?
You can use astype to force the column type to string:
merged['Clean_message'] = merged['Clean_message'].astype(str)
If you want to see what is wrong in the original column, you can use:
m = merged['Clean_message'].apply(type).ne(str)
out = merged[m]
The out dataframe contains the rows where the type of the Clean_message column is not string.
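A self-contained sketch of that diagnosis, using a toy column that mixes strings with a float NaN (a typical cause of the tokenizer error, since missing values in pandas are floats by default):

```python
import pandas as pd

# Toy column: a NaN (a float) hiding among strings triggers
# "expected string or bytes-like object" in word_tokenize.
merged = pd.DataFrame({"Clean_message": ["good post", float("nan"), "nice"]})

# Mask the rows whose value is not a str:
m = merged["Clean_message"].apply(type).ne(str)
out = merged[m]
print(out)  # shows the offending NaN row

# Force everything to str so tokenization cannot choke:
merged["Clean_message"] = merged["Clean_message"].astype(str)
```

Note that astype(str) turns NaN into the literal string "nan", so you may prefer fillna("") first if empty strings are more appropriate for your data.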
How do I eliminate the error below, produced by the lines of code I have outlined?
This is the code:
#import the libraries.
import streamlit as st
import pandas as pd
from PIL import Image
#Display the closing price.
st.header(company_name+" Close Price\n")
st.line_chart(df['Close'])
This is the error I am getting:
TypeError: 'NoneType' object is not subscriptable
If your DataFrame contains something like:
Id Open Close
0 AAA 12.15 13.22
1 BBB 24.11 25.11
then df['Close'] retrieves the respective column and the result is:
0 13.22
1 25.11
Name: Close, dtype: float64
(the left column contains the indices and the right column the values).
But when you run: df = None then df['Close'] yields just the
error you described.
So the probable cause is that your code somehow assigned None to df.
Maybe you attempted to read df from some source and this instruction
resulted in assignment of None to df.
Note that to get such error df variable must exist.
Otherwise you would have got another error, namely:
NameError: name 'df' is not defined.
How to cope with this: Make sure that df contains an actual DataFrame,
with the "wanted" column.
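The failure mode is easy to reproduce in a few lines (a sketch using made-up values matching the table above):

```python
import pandas as pd

df = pd.DataFrame({"Id": ["AAA", "BBB"],
                   "Open": [12.15, 24.11],
                   "Close": [13.22, 25.11]})
print(df["Close"])  # works: a float64 Series

df = None  # e.g. a failed read or a function without a return value
try:
    df["Close"]
except TypeError as exc:
    print(exc)  # 'NoneType' object is not subscriptable
```

A common way to end up here is assigning the result of an in-place method, e.g. `df = df.dropna(inplace=True)`, which returns None.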
Names=jobs[['Company Name']]
F = lambda x: x.split("\n")
Names.apply(F , axis=1)
AttributeError: 'Series' object has no attribute 'split'
When I run the following code, it works. Why am I facing this issue? I have never faced this kind of problem before. PS: I got this data from scraping websites, unlike before, so I am guessing it has something to do with that.
Names=jobs[['Company Name']]
F = lambda x: x.str.split("\n")
Names.apply(F , axis=1)
When I try it this way:
Ratings = jobs['Company Name'].apply(lambda x:x.split("\n")[1] , axis=1)
I get this error
TypeError: <lambda>() got an unexpected keyword argument 'axis'
You do not need the apply here; str.split is vectorized:
jobs['Company Name'].str.split('\n')
should do the job.
I cannot tell you why it did not work before, but I imagine it is due to the double brackets in [['Company Name']]. Single brackets would collapse that to a Series, while double brackets keep the 2-dimensional structure of the DataFrame. See e.g. Python pandas: Keep selected column as DataFrame instead of Series for more details.
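A short sketch of the difference, with made-up scraped strings where a company name and a rating are glued together by a newline:

```python
import pandas as pd

jobs = pd.DataFrame({"Company Name": ["Acme\n4.1", "Globex\n3.8"]})

# Double brackets keep a DataFrame, so apply(..., axis=1) hands the
# lambda a whole row as a Series, and a Series has no .split method.
print(type(jobs[["Company Name"]]))  # DataFrame

# Single brackets give a Series; the vectorized str accessor splits it
# and .str[1] picks the second piece of each split.
ratings = jobs["Company Name"].str.split("\n").str[1]
print(ratings.tolist())  # ['4.1', '3.8']
```

This also avoids the `axis` error from the last snippet: Series.apply takes no `axis` keyword, but with the str accessor no apply is needed at all.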
I spent a lot of time trying to find a solution to this one. I am working in pyspark with a text-column DataFrame that I tokenized, and I am trying to lemmatize it using nltk, but this gives back the error TypeError: 'Column' object is not callable. My code is the following:
def lemmatize(fullCorpus):
    lemmatizer = nltk.stem.WordNetLemmatizer()
    lemmatized = fullCorpus['filtered'].apply(lambda row: list(list(map(lemmatizer.lemmatize, y)) for y in row))
    return lemmatized
The above function works fine until I apply it to the string column of my dataframe:
election.withColumn("filtered", lemmatize(col("filtered")))
Where election is my dataframe and "filtered" is the column I would like to lemmatize.
It returns the following error: AttributeError: 'DataFrame' object has no attribute 'col'
I tried many ways, like using the f alias for pyspark.sql.functions, but in vain: election.withColumn("filtered", lemmatize(f.col("filtered")))
I also tried the map and apply functions on my dataframe, and that threw this error: AttributeError: 'DataFrame' object has no attribute 'apply'