I have a dictionary like this:
dd = {888202515573088257: tweepy.error.TweepError([{'code': 144,
          'message': 'No status found with that ID.'}]),
      873697596434513921: tweepy.error.TweepError([{'code': 144,
          'message': 'No status found with that ID.'}]),
      ....,
      680055455951884288: tweepy.error.TweepError([{'code': 144,
          'message': 'No status found with that ID.'}])}
I want to make a DataFrame from this dictionary, like so:
df = pd.DataFrame(columns=['twid', 'msg'])
for k, v in dd:
    df = df.append({'twid': k, 'msg': v}, ignore_index=True)
But I get TypeError: 'numpy.int64' object is not iterable. Can someone help me solve this please?
Thanks!
By default, iterating over a dictionary will iterate over the keys. If you want to unpack the (key, value) pairs, you can use dd.items().
In this case, it looks like you don't need the values, so the below should work.
df = pd.DataFrame(columns=['twid'])
for k in dd:
    df = df.append({'twid': k}, ignore_index=True)
Alternatively, you can just pass the keys in when creating the DataFrame.
df = pd.DataFrame(list(dd.keys()), columns=['twid'])
I did this and it works:
df = pd.DataFrame(list(dd.items()), columns=['twid', 'msg'])
df
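Note that DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so the append-based loops above will not run on recent releases. A minimal sketch without append that keeps both columns (wrapping v in str() is optional; it stores the error text rather than the exception object):
records = [{'twid': k, 'msg': str(v)} for k, v in dd.items()]
df = pd.DataFrame(records, columns=['twid', 'msg'])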
I am getting this error:
first argument must be an iterable of pandas objects, you passed an object of type "DataFrame".
My code:
for f in glob.glob("C:/Users/panksain/Desktop/aovaNALYSIS/CX AOV/Report*.csv"):
    data = pd.concat(pd.read_csv(f, header=None, names=("Metric Period", "")), axis=0, ignore_index=True)
concat takes a list of DataFrames to concatenate. You can build the list first and then call concat once at the end:
dfs = []
for f in glob.glob("C:/Users/panksain/Desktop/aov aNALYSIS/CX AOV/Report*.csv"):
    dfs.append(pd.read_csv(f, header=None, names=("Metric Period", "")))
data = pd.concat(dfs, axis=0, ignore_index=True)
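If you prefer a one-liner, pd.concat also accepts a generator of DataFrames, so the same thing can be written without building the list explicitly; a sketch with the same path and read_csv arguments as above:
data = pd.concat(
    (pd.read_csv(f, header=None, names=("Metric Period", ""))
     for f in glob.glob("C:/Users/panksain/Desktop/aov aNALYSIS/CX AOV/Report*.csv")),
    axis=0, ignore_index=True)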
I have a series of web addresses that I want to split at the first '.'. For example, return 'google' if the web address is 'google.co.uk'.
d1 = {'id':['1', '2', '3'], 'website':['google.co.uk', 'google.com.au', 'google.com']}
df1 = pd.DataFrame(data=d1)
d2 = {'id':['4', '5', '6'], 'website':['google.co.jp', 'google.com.tw', 'google.kr']}
df2 = pd.DataFrame(data=d2)
df_list = [df1, df2]
I use enumerate to iterate over the DataFrame list:
for i, df in enumerate(df_list):
    df_list[i]['website_segments'] = df['website'].str.split('.', n=1, expand=True)
Received error: ValueError: Wrong number of items passed 2, placement implies 1
You are splitting the website, which gives you a list-like structure, think ['google', 'co.uk']. You just want the first element of that, so:
for i, df in enumerate(df_list):
    df_list[i]['website_segments'] = df['website'].str.split('.', n=1, expand=True)[0]
Another alternative is to use extract. It is also ~40% faster for your data:
for i, df in enumerate(df_list):
    df_list[i]['website_segments'] = df['website'].str.extract(r'(.*?)\.')
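For reference, both approaches give the same new column on the sample frames; a quick check (output spacing approximate):
print(df_list[0])
#   id        website website_segments
# 0  1   google.co.uk           google
# 1  2  google.com.au           google
# 2  3     google.com           google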
I am trying to convert a sorted list of PySpark rows to a pandas DataFrame built from a dictionary, but it only works when I explicitly state each key and value of the desired dictionary.
row_list = sorted(data, key=lambda row: row['date'])
future_df = {'key': int(key),
'date': map(lambda row: row["date"], row_list),
'col1': map(lambda row: row["col1"], row_list),
'col2': map(lambda row: row["col2"], row_list)}
And then converting it to Pandas with:
pd.DataFrame(future_df)
This operation lives inside the class ForecastByKey, which is invoked by:
rdd = df.select('*') \
    .rdd \
    .map(lambda row: (row['key'], row)) \
    .groupByKey() \
    .map(lambda args: spark_ops.run(args[0], args[1]))
Up to this point everything works fine, that is, when the columns are indicated explicitly inside the dictionary future_df.
The problem arises when trying to convert the whole set of columns (700+) with something like:
future_df = {'key': int(key),
             'date': map(lambda row: row["date"], row_list)}

for col_ in columns:
    future_df[col_] = map(lambda row: row[col_], row_list)

pd.DataFrame(future_df)
Where columns contains the name of each column passed to the ForecastByKey class.
The result of this operation is a data frame with empty or close-to-zero columns.
I am using Python 3.6.10 and PySpark 2.4.5
How is this iteration to be done in order to get a data frame with the right information?
After some research, I realized this can be solved with:
import toolz

row_list = sorted(data, key=lambda row: row['date'])

def f(x):
    return map(lambda row: row[x], row_list)

pre_df = {col_: col_ for col_ in self.sdf_cols}
future_df = toolz.valmap(f, pre_df)
future_df['key'] = int(key)
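The original loop most likely failed because map is lazy in Python 3 and the lambda captures col_ by name, so by the time pd.DataFrame actually consumed the iterators every lambda looked up the same, last value of col_. Passing the column name through a function parameter, as toolz.valmap does above, binds it per column. If you would rather avoid toolz, a sketch that builds each column eagerly with a plain comprehension (using self.sdf_cols as in the snippet above):
row_list = sorted(data, key=lambda row: row['date'])

# plain lists are built immediately, so no late-bound lambda is left to evaluate later
future_df = {col_: [row[col_] for row in row_list] for col_ in self.sdf_cols}
future_df['key'] = int(key)
pd.DataFrame(future_df)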
My project is composed of several lists that I put together in a pandas DataFrame and export to Excel.
But one of my lists contains sublists, and I don't know how to deal with that.
my_dataframe = pd.DataFrame({
"V1": list1,
"V2": list2,
"V3": list3
})
my_dataframe.to_excel("test.xlsx", sheet_name="Sheet 1", index=False, encoding='utf8')
Let's say that:
list1=[1,2,3]
list2=['a','b','c']
list3=['d',['a','b','c'],'e']
I would like my Excel file to end up with the sublist expanded, one row per element.
I really have no idea how to proceed, or if this is even possible.
Any help is welcome :) Thanks!
Try this before calling to_excel:
my_dataframe = (my_dataframe["V3"].apply(pd.Series)
.merge(my_dataframe.drop("V3", axis = 1), right_index = True, left_index = True)
.melt(id_vars = ['V1', 'V2'], value_name = "V3")
.drop("variable", axis = 1)
.dropna()
.sort_values("V1"))
credits to Bartosz
Hope this helps.
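On pandas 0.25 or newer, DataFrame.explode may be a simpler alternative; a sketch using the sample lists above (scalar values stay as single rows, list values become one row per element):
expanded = my_dataframe.explode("V3")
expanded.to_excel("test.xlsx", sheet_name="Sheet 1", index=False)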
I have a structured numpy array, in which one of the fields has subfields:
import numpy, string, random
dtype = [('name', 'a10'), ('id', 'i4'),
         ('size', [('length', 'f8'), ('width', 'f8')])]
a = numpy.zeros(10, dtype=dtype)
for idx in range(len(a)):
    a[idx] = (''.join(random.sample(string.ascii_lowercase, 10)), idx,
              numpy.random.uniform(0, 1, size=[1, 2]))
I can easily get it sorted by any of the fields, like this:
a.sort(order = ['name'])
a.sort(order = ['size'])
When I try to sort it by the structured field ('size' in this example), it effectively gets sorted by the first subfield ('length' in this example). However, I would like to have my elements sorted by 'width'. I tried something like this, but it does not work:
a.sort(order=['size[\'width\']'])
ValueError: unknown field name: size['width']
a.sort(order=['size', 'width'])
ValueError: unknown field name: width
Therefore, I wonder if there is a way to accomplish this task.
I believe this is what you want:
a[a["size"]["width"].argsort()]