How to resolve "first argument must be an iterable of pandas objects, you passed an object of type 'DataFrame'" in pandas

I am getting this error:
first argument must be an iterable of pandas objects, you passed an object of type "DataFrame".
My code:
for f in glob.glob("C:/Users/panksain/Desktop/aovaNALYSIS/CX AOV/Report*.csv"):
    data = pd.concat(pd.read_csv(f, header=None, names=("Metric Period", "")), axis=0, ignore_index=True)

pd.concat takes an iterable of DataFrames to concatenate, not a single DataFrame. Build the list first, then do one concat at the end:
import glob
import pandas as pd

dfs = []
for f in glob.glob("C:/Users/panksain/Desktop/aov aNALYSIS/CX AOV/Report*.csv"):
    dfs.append(pd.read_csv(f, header=None, names=("Metric Period", "")))
data = pd.concat(dfs, axis=0, ignore_index=True)
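Since pd.concat accepts any iterable of DataFrames, you can also do this in a single expression with a generator; a minimal sketch, assuming the same glob pattern and column names as above:
import glob
import pandas as pd

files = glob.glob("C:/Users/panksain/Desktop/aov aNALYSIS/CX AOV/Report*.csv")
# the generator yields one DataFrame per file; concat consumes it lazily
data = pd.concat((pd.read_csv(f, header=None, names=("Metric Period", ""))
                  for f in files),
                 axis=0, ignore_index=True)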

Related

Convert column names to string in pandas dataframe

Can somebody explain how this works?
df.columns = list(map(str, df.columns))
Your code works, but it is not the idiomatic way to convert column names to strings; prefer:
df.columns = df.columns.astype(str)
Your code:
df.columns = list(map(str, df.columns))
is equivalent to:
df.columns = [str(col) for col in df.columns]
map applies the function str to each item of the iterable df.columns. Because map returns a lazy iterator rather than a list, you have to call list explicitly to materialize the result.
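A quick demonstration that the two spellings agree, using a throwaway frame with integer column labels:
import pandas as pd

df = pd.DataFrame({0: [1, 2], 1: [3, 4]})  # integer column labels
print(list(map(str, df.columns)))    # ['0', '1'] -- list() drains the lazy map iterator
df.columns = df.columns.astype(str)  # idiomatic: vectorized conversion on the Index itself
print(df.columns)                    # Index(['0', '1'], dtype='object')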

Creating a new column in pandas using existing column values as a filter - .isin() fails with AttributeError

Error: AttributeError: 'int' object has no attribute 'isin'
Question: There are no null values, and each expression works in an individual code block. When I change the dtype of series R to object, the error becomes: 'str' object has no attribute 'isin'.
What am I missing?
Code:
X = [1, 2, 3, 4]
if dg['RFM_Segment'] == '111':
    return 'Core'
elif (dg['R'].isin(X) & dg['F'].isin([1]) & dg['M'].isin(X) & (dg['RFM_Segment'] != '111')).any():
    return 'Loyal'
elif (dg['R'].isin(X) & dg['F'].isin(X) & dg['M'].isin([1]) & (dg['RFM_Segment'] != '111')).any():
    return 'Whales'
elif (dg['R'].isin(X) & dg['F'].isin([1]) & dg['M'].isin([3,4])).any():
    return 'Promising'
elif (dg['R'].isin([1]) & dg['F'].isin([4]) & dg['M'].isin(X)).any():
    return 'Rookies'
elif (dg['R'].isin([4]) & dg['F'].isin([4]) & dg['M'].isin(X)).any():
    return 'Slipping'
else:
    return 'NA'

dg['user_segment'] = dg.apply(user_segment, axis=1)
I will assume that you accidentally cut off the top of your code snippet, in which you define user_segment.
The issue lies in how you use apply. With axis=1, apply passes each row to your function as a Series, not a DataFrame. Indexing into an element of that Series therefore does not give you a Series object (as indexing into a DataFrame would), but a scalar of the given column's type (int, str, etc.), and scalars have no isin method. An example:
import pandas as pd

X = ['a', 'c']
df = pd.DataFrame([['a', 'b'], ['c', 'd'], ['e', 'f']], columns=['col1', 'col2'])

df['col1'].isin(X)  # this works: `isin` is applied to the entire column, a Series

def test_apply(x):
    print(x['col1'].isin(X))  # x is a row Series, so x['col1'] is a plain `str`
    return x

df.apply(test_apply, axis=1)  # this doesn't work, because `isin` is
                              # called on a non-pandas object (`str` here)
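For the original problem, a minimal sketch of a row-wise version, assuming a hypothetical dg with columns R, F, M, and RFM_Segment: inside a function applied with axis=1, compare the scalars with == and the in operator instead of .isin (only the first two branches are shown):
import pandas as pd

# hypothetical sample data standing in for the question's dg
dg = pd.DataFrame({'R': [1, 4], 'F': [1, 4], 'M': [3, 2],
                   'RFM_Segment': ['111', '442']})
X = [1, 2, 3, 4]

def user_segment(row):
    # `row` is a Series, so row['R'] is a scalar: use `in`/`==`, not `.isin`
    if row['RFM_Segment'] == '111':
        return 'Core'
    elif row['R'] in X and row['F'] == 1 and row['M'] in X:
        return 'Loyal'
    # ...the remaining branches follow the same pattern...
    else:
        return 'NA'

dg['user_segment'] = dg.apply(user_segment, axis=1)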

Transform list of pyspark rows into pandas data frame through a dictionary

I am trying to convert a list of sorted PySpark rows to a pandas data frame via a dictionary, but it only works when I explicitly state each key and value of the desired dictionary.
row_list = sorted(data, key=lambda row: row['date'])
future_df = {'key': int(key),
             'date': map(lambda row: row["date"], row_list),
             'col1': map(lambda row: row["col1"], row_list),
             'col2': map(lambda row: row["col2"], row_list)}
And then converting it to Pandas with:
pd.DataFrame(future_df)
This operation lives inside the class ForecastByKey, which is invoked by:
rdd = df.select('*') \
    .rdd \
    .map(lambda row: ((row['key']), row)) \
    .groupByKey() \
    .map(lambda args: spark_ops.run(args[0], args[1]))
Up to this point everything works fine, that is, as long as the columns are indicated explicitly inside the dictionary future_df.
The problem arises when trying to convert the whole set of columns (700+) with something like:
future_df = {'key': int(key),
             'date': map(lambda row: row["date"], row_list)}
for col_ in columns:
    future_df[col_] = map(lambda row: row[col_], row_list)
pd.DataFrame(future_df)
Here columns contains the name of each column passed to the ForecastByKey class.
The result of this operation is a data frame with empty or close-to-zero columns.
I am using Python 3.6.10 and PySpark 2.4.5.
How should this iteration be written so that the resulting data frame contains the right information?
After some research, I realized this can be solved with:
row_list = sorted(data, key=lambda row: row['date'])

def f(x):
    # x is bound per call, so each map reads its own column; the loop's
    # lambda captured `col_` late, and every lazy map ended up reading
    # whatever column `col_` referred to once the loop had finished
    return map(lambda row: row[x], row_list)

pre_df = {col_: col_ for col_ in self.sdf_cols}
future_df = toolz.valmap(f, pre_df)
future_df['key'] = int(key)
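Equivalently, and without the toolz dependency, you can freeze the loop variable with a default argument so each lambda keeps its own column name; a minimal sketch, assuming the same key, columns, and row_list as in the question:
row_list = sorted(data, key=lambda row: row['date'])

future_df = {'key': int(key),
             'date': map(lambda row: row['date'], row_list)}
for col_ in columns:
    # `c=col_` captures the current value; a bare `col_` inside the lambda
    # would be looked up only when the lazy map is consumed, after the loop ends
    future_df[col_] = map(lambda row, c=col_: row[c], row_list)

pd.DataFrame(future_df)
Wrapping each column in list(...) instead of map would also sidestep the laziness entirely, at the cost of materializing every column up front.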

'numpy.int64' object is not iterable when iterating through a dictionary

I have a dictionary like this:
dd = {888202515573088257: tweepy.error.TweepError([{'code': 144,
          'message': 'No status found with that ID.'}]),
      873697596434513921: tweepy.error.TweepError([{'code': 144,
          'message': 'No status found with that ID.'}]),
      ....,
      680055455951884288: tweepy.error.TweepError([{'code': 144,
          'message': 'No status found with that ID.'}])}
I want to make a dataframe from this dictionary, like so:
df = pd.DataFrame(columns=['twid', 'msg'])
for k, v in dd:
    df = df.append({'twid': k, 'msg': v}, ignore_index=True)
But I get TypeError: 'numpy.int64' object is not iterable. Can someone help me solve this please?
Thanks!
By default, iterating over a dictionary yields only its keys, so for k, v in dd tries to unpack each integer key into two variables, which is what raises the TypeError. If you want (key, value) pairs, use dd.items().
In this case, it looks like you don't need the values, so the below should work.
df = pd.DataFrame(columns=['twid'])
for k in dd:
    df = df.append({'twid': k}, ignore_index=True)
Alternatively, you can just pass the keys in when creating the DataFrame.
df = pd.DataFrame(list(dd.keys()), columns=['twid'])
I did this and it works:
df = pd.DataFrame(list(dd.items()), columns=['twid', 'msg'])
df
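For reference, the unpacking behavior with a plain dictionary:
d = {1: 'a', 2: 'b'}
print([k for k in d])                  # [1, 2] -- plain iteration yields keys only
print([(k, v) for k, v in d.items()])  # [(1, 'a'), (2, 'b')]
# `for k, v in d:` would try to unpack the integer key 1 into (k, v) and raise a TypeError
Note also that DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so on current pandas the pd.DataFrame(list(dd.items()), ...) construction is the one to use.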

Concatenate DataFrames.DataFrame in Julia

I have a problem when I try to concatenate multiple DataFrames (a data structure from the DataFrames package!) with the same columns but different numbers of rows. Here's my code:
using DataFrames
DF = DataFrame()
DF[:x1] = 1:1000
DF[:x2] = rand(1000)
DF[:time] = append!([0], cumsum(diff(DF[:x1]) .< 0)) + 1
DF1 = DF[DF[:time] .== 1, :]
DF2 = DF[DF[:time] .== round(maximum(DF[:time])), :]
DF3 = DF[DF[:time] .== round(maximum(DF[:time])/4), :]
DF4 = DF[DF[:time] .== round(maximum(DF[:time])/2), :]
DF1[:T] = "initial"
DF2[:T] = "final"
DF3[:T] = "1/4"
DF4[:T] = "1/2"
DF = [DF1; DF2; DF3; DF4]
The last line gives me the error
MethodError: Cannot `convert` an object of type DataFrames.DataFrame to an object of type LastMain.LastMain.LastMain.DataFrames.AbstractDataFrame
This may have arisen from a call to the constructor LastMain.LastMain.LastMain.DataFrames.AbstractDataFrame(...),
since type constructors fall back to convert methods.
I don't understand this error message. Can you help me out? Thanks!
I just ran into this exact problem on Julia 0.5.0 x86_64-linux-gnu, DataFrames 0.8.5, with both hcat and vcat. The stacked LastMain.LastMain prefixes in the error are the giveaway: workspace() stashes the previous Main module as LastMain, so DataFrames created before a reload belong to a stale copy of the package and no longer match the freshly loaded DataFrame type.
Neither clearing the workspace nor reloading DataFrames solved the problem, but restarting the REPL fixed it immediately.