KeyError from new dataframe when plotting - pandas

KeyError: 0 after creating a new dataframe and then attempting to plot it.
Initially, the code plotted the original dataframe without problems.
A small number of rows (~5) were then removed and a new dataframe was created.
The new dataframe displays without issue; however, attempting to plot it raises KeyError: 0.
I have attempted to resolve the issue without success.
The following is the script for replacing and removing the missing data and creating the new dataframe:
import numpy as np
import pandas as pd

df_pre_orderset2_t = df_pre_orderset2.replace(0, np.nan)  # treat zeros as missing
df_pre_orderset2_top = df_pre_orderset2_t.dropna()        # drop rows with missing values
pd.set_option('display.max_colwidth', None)
df_pre_orderset2_to_10 = df_pre_orderset2_top.head(10)    # keep the first 10 rows
df_pre_orderset2_top10 = pd.DataFrame(df_pre_orderset2_to_10)
df_pre_orderset2_top10
The plot script is as follows:
import matplotlib.pyplot as plt

plt.figure(figsize=(9, 7))
ax = plt.gca()
x = df_pre_orderset2_top10['warning_status']
y = df_pre_orderset2_top10['count']
n = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
t = np.arange(10)
plt.title('Warning distribution versus order sets')
plt.ylabel('Warning count by order sets')
plt.xlabel('Warning alerts')
plt.scatter(x, y, c=t, s=100, alpha=1.0, marker='^')
plt.gcf().set_size_inches(13, 8)
# scatter labels
for i, txt in enumerate(n):
    ax.annotate(txt, (x[i], y[i]))
plt.show()
This returns an incomplete outline of the proposed plot and a KeyError: 0, as below.
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
~/opt/anaconda3/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
3079 try:
-> 3080 return self._engine.get_loc(casted_key)
3081 except KeyError as err:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()
KeyError: 0
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
<ipython-input-161-0cc4009cf4a7> in <module>
16 #scatter labels
17 for i, txt in enumerate(n):
---> 18 ax.annotate(txt, (x[i],y[i]))
19
20 plt.show()
~/opt/anaconda3/lib/python3.8/site-packages/pandas/core/series.py in __getitem__(self, key)
851
852 elif key_is_scalar:
--> 853 return self._get_value(key)
854
855 if is_hashable(key):
~/opt/anaconda3/lib/python3.8/site-packages/pandas/core/series.py in _get_value(self, label, takeable)
959
960 # Similar to Index.get_value, but we do not fall back to positional
--> 961 loc = self.index.get_loc(label)
962 return self.index._get_values_for_loc(self, loc, label)
963
~/opt/anaconda3/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
3080 return self._engine.get_loc(casted_key)
3081 except KeyError as err:
-> 3082 raise KeyError(key) from err
3083
3084 if tolerance is not None:
KeyError: 0

The line that's failing is ax.annotate(txt, (x[i], y[i])), and it's failing when i=0. Both x and y are Series objects taken as columns from df_pre_orderset2_top10, so my guess is that when you removed rows from that dataframe, the row with 0 as its index label was removed. You should be able to verify this by displaying the dataframe.
If this is the case, you can reset the index of that dataframe before you extract the x and y columns. Set drop=True to make sure the old index isn't added back to the dataframe as a new column.
df_pre_orderset2_top10.reset_index(drop=True, inplace=True)
That should fix the problem.
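For illustration, here is a minimal sketch of two ways to make the annotation loop safe, using the dataframe and column names from the question (the reset_index line is the fix above; the .iloc variant is an alternative that avoids relying on index labels entirely):

df_pre_orderset2_top10.reset_index(drop=True, inplace=True)  # index labels become 0..9 again
x = df_pre_orderset2_top10['warning_status']
y = df_pre_orderset2_top10['count']

# Alternative: keep the original index and use positional access instead of x[i]/y[i]
for i, txt in enumerate(n):
    ax.annotate(txt, (x.iloc[i], y.iloc[i]))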

Related

Dropping same rows in two pandas dataframe in python

I want to find the uncommon rows between two pandas dataframes, df1 and wildone_df. When I check their type, both are pandas.core.frame.DataFrame, but when I use the code below to omit their intersection:
o = pd.concat([wildone_df, df1]).drop_duplicates(subset=None, keep='first', inplace=False)
I face the following error:
TypeError Traceback (most recent call last)
<ipython-input-36-4e158c0eeb97> in <module>
----> 1 o = pd.concat([wildone_df,df1]).drop_duplicates(subset=None, keep='first', inplace=False)
5 frames
/usr/local/lib/python3.8/dist-packages/pandas/core/algorithms.py in factorize_array(values, na_sentinel, size_hint, na_value, mask)
561
562 table = hash_klass(size_hint or len(values))
--> 563 uniques, codes = table.factorize(
564 values, na_sentinel=na_sentinel, na_value=na_value, mask=mask
565 )
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.factorize()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable._unique()
TypeError: unhashable type: 'numpy.ndarray'
How can I solve this issue?!
Omitting the intersection of two dataframes
Either use inplace=True or re-assign your dataframe when using pandas.DataFrame.drop_duplicates or any other built-in function that has an inplace parameter. You can't use them both at the same time.
Returns (DataFrame or None)
DataFrame with duplicates removed or None if inplace=True.
Try this:
o = pd.concat([wildone_df, df1]).drop_duplicates() #keep="first" by default
Try this:
merged_df = merged_df.loc[:, ~merged_df.columns.duplicated()].copy()
See this post for more info
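As a small illustration of the "uncommon rows" goal itself, here is a toy sketch (made-up data, not the asker's dataframes): with keep=False, drop_duplicates removes every row that occurs in both frames, leaving only the rows unique to one of them. Note this only works once all cell values are hashable, so list- or array-valued cells would first need converting (e.g. to tuples), which is also what the TypeError above is pointing at.

import pandas as pd

df1 = pd.DataFrame({'a': [1, 2, 3], 'b': ['x', 'y', 'z']})
wildone_df = pd.DataFrame({'a': [2, 3, 4], 'b': ['y', 'z', 'w']})

# keep=False drops all duplicated rows, keeping only rows unique to one frame
uncommon = pd.concat([wildone_df, df1]).drop_duplicates(keep=False)
print(uncommon)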

Converting financial information to pandas data frame

I am trying to get stock data such as the balance sheet, income statement and cash flow for multiple stocks and convert it to a data frame for manipulation.
Here is the data-fetching part of the code:
import yahoo_fin.stock_info as yfs

tickers = ['AMZN', 'AAPL', 'MSFT', 'DIS', 'GOOG']
balance_sheet = []
income_statement = []
cash_flow = []
balance_sheet.append({ticker: yfs.get_balance_sheet(ticker) for ticker in tickers})
income_statement.append({ticker: yfs.get_income_statement(ticker) for ticker in tickers})
cash_flow.append({ticker: yfs.get_cash_flow(ticker) for ticker in tickers})
This part works well and returns a dictionary for each category. I then do this:
my_dict=cash_flow+balance_sheet+income_statement
dff=pd.DataFrame.from_dict(my_dict, orient='columns', dtype=None, columns=None)
Note that when I try orient='index', I get the following error message:
AttributeError Traceback (most recent call last)
in
1 my_dict=cash_flow+balance_sheet+income_statement
----> 2 dff=pd.DataFrame.from_dict(my_dict, orient='index', dtype=None, columns=None)
3 # dff=dff.set_index('endDate')
4 dff
5 # cash_flow
/opt/anaconda3/lib/python3.8/site-packages/pandas/core/frame.py in from_dict(cls, data, orient, dtype, columns)
1361 if len(data) > 0:
1362 # TODO speed up Series case
-> 1363 if isinstance(list(data.values())[0], (Series, dict)):
1364 data = _from_nested_dict(data)
1365 else:
AttributeError: 'list' object has no attribute 'values'
If someone could let me know what I'm doing wrong, that would be much appreciated! :)
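A quick sketch of why this fails (assuming, as the code above suggests, that each of the three lists holds a single ticker-keyed dict): adding the lists with + produces a plain Python list of dicts, and DataFrame.from_dict with orient='index' expects a mapping, so it ends up calling data.values() on a list, which raises the AttributeError seen in the traceback.

# cash_flow, balance_sheet and income_statement are lists, each holding one dict
my_dict = cash_flow + balance_sheet + income_statement
print(type(my_dict))   # <class 'list'>, not a dict
# pd.DataFrame.from_dict(my_dict, orient='index') then calls my_dict.values()
# internally, which a list does not have, hence the AttributeError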

How to make a dataframe from another dataframe, using the shape[] of the old dataframe

I have a dataframe with shape (18, 1); each cell of this dataframe holds an array with shape (27, 2000, 10). I want to make a new dataframe with shape (27, 18*10) in which each cell consists of 2000 data points.
I tried to do it like this, but I got an error.
[This is how my dataframe looks][1]
with shape (18, 1).
I tried this code to make the new dataframe:
for i in range(18):
    a = pd.DataFrame(((HC[i, 0]).shape[2]), (18 * HC[i, 0].shape[0]))
and I get this error:
KeyError Traceback (most recent call last)
<ipython-input-20-3138007415b4> in <module>()
19 HC
20 for i in range(18):
---> 21 a= pd.DataFrame(((HC[i,0]).shape[2]) ,(18* HC[i,0].shape[0]))
22
23 # print(a)
1 frames
/usr/local/lib/python3.6/dist-packages/pandas/core/indexes/range.py in get_loc(self, key, method, tolerance)
356 except ValueError as err:
357 raise KeyError(key) from err
--> 358 raise KeyError(key)
359 return super().get_loc(key, method=method, tolerance=tolerance)
360
KeyError: (0, 0)
Any idea how I can make the new dataframe?
[1]: https://i.stack.imgur.com/6cKqO.png
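One likely reason for the KeyError, sketched here under the assumption that HC is the (18, 1) dataframe shown in the image: HC[i, 0] indexes the columns by label, so pandas looks for a column literally named (0, 0), which does not exist. Positional access to a cell needs .iloc (or .iat):

# HC[i, 0] does a label lookup on the columns, hence KeyError: (0, 0)
# Positional access to the array stored in row i, column 0:
cell = HC.iloc[i, 0]   # or HC.iat[i, 0]
print(cell.shape)      # expected (27, 2000, 10) per the question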

How do I filter by date and count these dates using relational algebra (Reframe)?

I'm really stuck. I read the Reframe documentation (https://reframe.readthedocs.io/en/latest/), which is based on pandas, and I tried multiple things on my own, but it still doesn't work. I have a CSV file called weddings that looks like this:
Weddings, Date
Wedding1,20181107
Wedding2,20181107
And many more rows. As you can see, there are duplicates in the date column, but I don't think this matters. I want to count the number of weddings filtered by date, for example, the number of weddings after 5 October 2016 (20161005). So first I tried this:
Weddings = Relation('weddings.csv')
Weddings.sort(['Date']).query('Date > 20161005').project(['Weddings', 'Date'])
This seems logical to me, but I get a KeyError: 'Date' and don't know why. So I tried something simpler:
Weddings = Relation('weddings.csv')
Weddings.groupby(['Date']).count()
And this doesn't work either; I still get a KeyError: 'Date' and don't know why. Can someone help me?
Traceback:
KeyError Traceback (most recent call last)
<ipython-input-44-b358cdf55fdb> in <module>()
1
2 Weddings = Relation('weddings.csv')
----> 3 weddings.sort(['Date']).query('Date > 20161005').project(['Weddings', 'Date'])
4
5
~\Documents\Reframe.py in sort(self, *args, **kwargs)
110 """
111
--> 112 return Relation(super().sort_values(*args, **kwargs))
113
114 def intersect(self, other):
~\Anaconda3\lib\site-packages\pandas\core\frame.py in sort_values(self, by, axis, ascending, inplace, kind, na_position)
4416 by = by[0]
4417 k = self._get_label_or_level_values(by, axis=axis,
-> 4418 stacklevel=stacklevel)
4419
4420 if isinstance(ascending, (tuple, list)):
~\Anaconda3\lib\site-packages\pandas\core\generic.py in _get_label_or_level_values(self, key, axis, stacklevel)
1377 values = self.axes[axis].get_level_values(key)._values
1378 else:
-> 1379 raise KeyError(key)
1380
1381 # Check for duplicates
KeyError: 'Date'
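One thing worth checking, offered here only as a guess based on the CSV header shown above ("Weddings, Date", with a space after the comma): the column may have been read in as ' Date' rather than 'Date', which would produce exactly this KeyError. Since the traceback shows Relation calling super().sort_values, it appears to subclass a pandas DataFrame, so the column names can be inspected and stripped directly:

Weddings = Relation('weddings.csv')
print(Weddings.columns.tolist())          # does it show 'Date' or ' Date'?

# Strip stray whitespace from the headers before sorting/filtering
Weddings.columns = Weddings.columns.str.strip()
Weddings.sort(['Date']).query('Date > 20161005').project(['Weddings', 'Date'])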

Pandas: Creating a histogram from string counts

I need to create a histogram from a dataframe column that contains the values 'Low', 'Medium', or 'High'. When I try the usual df.column.hist(), I get the following error.
ex3.Severity.value_counts()
Out[85]:
Low 230
Medium 21
High 16
dtype: int64
ex3.Severity.hist()
TypeError Traceback (most recent call last)
<ipython-input-86-7c7023aec2e2> in <module>()
----> 1 ex3.Severity.hist()
C:\Users\C06025A\Anaconda\lib\site-packages\pandas\tools\plotting.py in hist_series(self, by, ax, grid, xlabelsize, xrot, ylabelsize, yrot, figsize, bins, **kwds)
2570 values = self.dropna().values
2571
->2572 ax.hist(values, bins=bins, **kwds)
2573 ax.grid(grid)
2574 axes = np.array([ax])
C:\Users\C06025A\Anaconda\lib\site-packages\matplotlib\axes\_axes.py in hist(self, x, bins, range, normed, weights, cumulative, bottom, histtype, align, orientation, rwidth, log, color, label, stacked, **kwargs)
5620 for xi in x:
5621 if len(xi) > 0:
->5622 xmin = min(xmin, xi.min())
5623 xmax = max(xmax, xi.max())
5624 bin_range = (xmin, xmax)
TypeError: unorderable types: str() < float()
ex3.Severity.value_counts().plot(kind='bar')
is what you actually want.
When you do:
ex3.Severity.value_counts().hist()
it gets the axes the wrong way round, i.e. it tries to partition your y axis (counts) into bins, and then plots the number of string labels in each bin.
Just an updated answer (as this comes up a lot): pandas has a nice module for styling dataframes in many ways, such as the case mentioned above.
ex3.Severity.value_counts().to_frame().style.bar()
...will print the dataframe with bars built in (as sparklines, to use Excel terminology). Nice for quick analysis in Jupyter notebooks.
see pandas styling docs
This is a matplotlib issue: it cannot order strings against numbers. However, you can achieve the desired result by labeling the x-ticks:
import matplotlib.pyplot as plt
import pandas as pd

# emulate your ex3.Severity.value_counts()
data = {'Low': 2, 'Medium': 4, 'High': 5}
df = pd.Series(data)
plt.bar(range(len(df)), df.values, align='center')
plt.xticks(range(len(df)), df.index.values, size='small')
plt.show()
You assumed that, because your data was composed of strings, calling plot() on it would automatically perform the value_counts(), but this is not the case, hence the error. All you needed to do was:
ex3.Severity.value_counts().plot(kind='bar')