I have a dictionary like this:
dd = {888202515573088257: tweepy.error.TweepError([{'code': 144,
          'message': 'No status found with that ID.'}]),
      873697596434513921: tweepy.error.TweepError([{'code': 144,
          'message': 'No status found with that ID.'}]),
      ....,
      680055455951884288: tweepy.error.TweepError([{'code': 144,
          'message': 'No status found with that ID.'}])}
I want to make a DataFrame from this dictionary, like so:
df = pd.DataFrame(columns=['twid', 'msg'])
for k, v in dd:
    df = df.append({'twid': k, 'msg': v}, ignore_index=True)
But I get TypeError: 'numpy.int64' object is not iterable. Can someone help me solve this please?
Thanks!
By default, iterating over a dictionary will iterate over the keys. If you want to unpack the (key, value) pairs, you can use dd.items().
In this case, it looks like you don't need the values, so the below should work.
df = pd.DataFrame(columns=['twid'])
for k in dd:
    df = df.append({'twid': k}, ignore_index=True)
Alternatively, you can just pass the keys in when creating the DataFrame.
df = pd.DataFrame(list(dd.keys()), columns=['twid'])
I did this and it works:
df = pd.DataFrame(list(dd.items()), columns=['twid', 'msg'])
df
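Note that DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so the append-based loops above will not run on recent releases. A minimal sketch without append that keeps both columns (wrapping v in str() is optional; it stores the error text rather than the exception object):
records = [{'twid': k, 'msg': str(v)} for k, v in dd.items()]
df = pd.DataFrame(records, columns=['twid', 'msg'])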
I am getting this error:
first argument must be an iterable of pandas objects, you passed an object of type "DataFrame".
My code:
for f in glob.glob("C:/Users/panksain/Desktop/aovaNALYSIS/CX AOV/Report*.csv"):
    data = pd.concat(pd.read_csv(f, header=None, names=("Metric Period", "")), axis=0, ignore_index=True)
concat takes a list of DataFrames to concatenate. You can build the list first and then call concat once at the end:
dfs = []
for f in glob.glob("C:/Users/panksain/Desktop/aov aNALYSIS/CX AOV/Report*.csv"):
    dfs.append(pd.read_csv(f, header=None, names=("Metric Period", "")))
data = pd.concat(dfs, axis=0, ignore_index=True)
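If you prefer a one-liner, pd.concat also accepts a generator of DataFrames, so the same thing can be written without building the list explicitly; a sketch with the same path and read_csv arguments as above:
data = pd.concat(
    (pd.read_csv(f, header=None, names=("Metric Period", ""))
     for f in glob.glob("C:/Users/panksain/Desktop/aov aNALYSIS/CX AOV/Report*.csv")),
    axis=0, ignore_index=True)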
I have a series of web addresses that I want to split at the first '.'. For example, return 'google' if the web address is 'google.co.uk'.
d1 = {'id':['1', '2', '3'], 'website':['google.co.uk', 'google.com.au', 'google.com']}
df1 = pd.DataFrame(data=d1)
d2 = {'id':['4', '5', '6'], 'website':['google.co.jp', 'google.com.tw', 'google.kr']}
df2 = pd.DataFrame(data=d2)
df_list = [df1, df2]
I use enumerate to iterate over the DataFrame list:
for i, df in enumerate(df_list):
    df_list[i]['website_segments'] = df['website'].str.split('.', n=1, expand=True)
Received error: ValueError: Wrong number of items passed 2, placement implies 1
You are splitting the website, which gives you a list-like structure, think ['google', 'co.uk']. You just want the first element of that, so:
for i, df in enumerate(df_list):
    df_list[i]['website_segments'] = df['website'].str.split('.', n=1, expand=True)[0]
Another alternative is to use extract. It is also ~40% faster for your data:
for i, df in enumerate(df_list):
    df_list[i]['website_segments'] = df['website'].str.extract(r'(.*?)\.')
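For reference, both approaches give the same new column on the sample frames; a quick check (output spacing approximate):
print(df_list[0])
#   id        website website_segments
# 0  1   google.co.uk           google
# 1  2  google.com.au           google
# 2  3     google.com           google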
I am trying to convert a sorted list of PySpark rows to a pandas DataFrame built from a dictionary, but it only works when I explicitly state each key and value of the desired dictionary.
row_list = sorted(data, key=lambda row: row['date'])
future_df = {'key': int(key),
'date': map(lambda row: row["date"], row_list),
'col1': map(lambda row: row["col1"], row_list),
'col2': map(lambda row: row["col2"], row_list)}
And then converting it to Pandas with:
pd.DataFrame(future_df)
This operation lives inside the class ForecastByKey, which is invoked by:
rdd = df.select('*') \
    .rdd \
    .map(lambda row: (row['key'], row)) \
    .groupByKey() \
    .map(lambda args: spark_ops.run(args[0], args[1]))
Up to this point everything works fine, that is, when the columns are indicated explicitly inside the dictionary future_df.
The problem arises when trying to convert the whole set of columns (700+) with something like:
future_df = {'key': int(key),
             'date': map(lambda row: row["date"], row_list)}

for col_ in columns:
    future_df[col_] = map(lambda row: row[col_], row_list)

pd.DataFrame(future_df)
Where columns contains the name of each column passed to the ForecastByKey class.
The result of this operation is a data frame with empty or close-to-zero columns.
I am using Python 3.6.10 and PySpark 2.4.5
How is this iteration to be done in order to get a data frame with the right information?
After some research, I realized this can be solved with:
import toolz

row_list = sorted(data, key=lambda row: row['date'])

def f(x):
    return map(lambda row: row[x], row_list)

pre_df = {col_: col_ for col_ in self.sdf_cols}
future_df = toolz.valmap(f, pre_df)
future_df['key'] = int(key)
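The original loop most likely failed because map is lazy in Python 3 and the lambda captures col_ by name, so by the time pd.DataFrame actually consumed the iterators every lambda looked up the same, last value of col_. Passing the column name through a function parameter, as toolz.valmap does above, binds it per column. If you would rather avoid toolz, a sketch that builds each column eagerly with a plain comprehension (using self.sdf_cols as in the snippet above):
row_list = sorted(data, key=lambda row: row['date'])

# plain lists are built immediately, so no late-bound lambda is left to evaluate later
future_df = {col_: [row[col_] for row in row_list] for col_ in self.sdf_cols}
future_df['key'] = int(key)
pd.DataFrame(future_df)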
My project is composed of several lists that I put together in a pandas DataFrame and export to Excel.
But one of my lists contains sublists, and I don't know how to deal with that.
my_dataframe = pd.DataFrame({
"V1": list1,
"V2": list2,
"V3": list3
})
my_dataframe.to_excel("test.xlsx", sheet_name="Sheet 1", index=False, encoding='utf8')
Let's say that:
list1=[1,2,3]
list2=['a','b','c']
list3=['d',['a','b','c'],'e']
I would like my Excel file to end up with the sublist expanded, one row per element.
I really have no idea how to proceed, or if this is even possible.
Any help is welcome :) Thanks!
Try this before calling to_excel:
my_dataframe = (my_dataframe["V3"].apply(pd.Series)
.merge(my_dataframe.drop("V3", axis = 1), right_index = True, left_index = True)
.melt(id_vars = ['V1', 'V2'], value_name = "V3")
.drop("variable", axis = 1)
.dropna()
.sort_values("V1"))
credits to Bartosz
Hope this helps.
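On pandas 0.25 or newer, DataFrame.explode may be a simpler alternative; a sketch using the sample lists above (scalar values stay as single rows, list values become one row per element):
expanded = my_dataframe.explode("V3")
expanded.to_excel("test.xlsx", sheet_name="Sheet 1", index=False)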
I have a structured numpy array, in which one of the fields has subfields:
import numpy, string, random
dtype = [('name', 'a10'), ('id', 'i4'),
         ('size', [('length', 'f8'), ('width', 'f8')])]
a = numpy.zeros(10, dtype=dtype)
for idx in range(len(a)):
    a[idx] = (''.join(random.sample(string.ascii_lowercase, 10)), idx,
              numpy.random.uniform(0, 1, size=[1, 2]))
I can easily get it sorted by any of the fields, like this:
a.sort(order = ['name'])
a.sort(order = ['size'])
When I try to sort it by the structured field ('size' in this example), it effectively gets sorted by the first subfield ('length' in this example). However, I would like to have my elements sorted by 'width'. I tried something like this, but it does not work:
a.sort(order=['size[\'width\']'])
ValueError: unknown field name: size['width']
a.sort(order=['size', 'width'])
ValueError: unknown field name: width
Therefore, I wonder if there is a way to accomplish this task.
I believe this is what you want:
a[a["size"]["width"].argsort()]