Unable to label axis in Jupyter: TypeError - matplotlib

This is the code I used to plot (I have removed the API key):
import datetime
import pandas as pd
import numpy as np
from pandas_datareader import data as web
import matplotlib.pyplot as plt
from pandas import DataFrame
from alpha_vantage.foreignexchange import ForeignExchange
cc = ForeignExchange(key=' ',output_format='pandas')
data, meta_data = cc.get_currency_exchange_daily(from_symbol='USD',to_symbol='EUR', outputsize='full')
print(data)
data['4. close'].plot()
plt.tight_layout()
plt.title('Intraday USD/Eur')
If I insert ylabel('label'), I get the following error: TypeError: 'str' object is not callable
This problem was outlined here: if I restart my Jupyter kernel, the ylabel shows up once, but if I rerun the same code I get the same error again. Is this a bug, or is the problem on my end?
I am not sure if it is relevant, but the DataFrame looks like this:
Open High Low Close
date
2019-11-15 0.9072 0.9076 0.9041 0.9043
2019-11-14 0.9081 0.9097 0.9065 0.9070
2019-11-13 0.9079 0.9092 0.9071 0.9082
2019-11-12 0.9062 0.9085 0.9056 0.9079
2019-11-11 0.9071 0.9074 0.9052 0.9062
... ... ... ... ...
2014-11-28 0.8023 0.8044 0.8004 0.8028
2014-11-27 0.7993 0.8024 0.7983 0.8022
2014-11-26 0.8014 0.8034 0.7980 0.7993
2014-11-25 0.8037 0.8059 0.8007 0.8014
2014-11-24 0.8081 0.8085 0.8032 0.8036
1570 rows × 4 columns
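A common cause of exactly this error (an assumption here, since the code shown does not contain it) is a line such as plt.ylabel = 'label' somewhere in the notebook: it replaces the pyplot function with a string, so every later call fails until the kernel is restarted. The sketch below, using a small hypothetical stand-in for data['4. close'], checks that plt.ylabel is still callable and then labels the plot through the object-oriented Axes API, which does not depend on pyplot's module-level names.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for data['4. close'] from the question
close = pd.Series(
    [0.9043, 0.9070, 0.9082],
    index=pd.to_datetime(["2019-11-15", "2019-11-14", "2019-11-13"]),
    name="4. close",
)

# False here would mean plt.ylabel was overwritten (e.g. by plt.ylabel = 'label')
print(callable(plt.ylabel))

# Label via the Axes object instead of the pyplot state machine
fig, ax = plt.subplots()
close.plot(ax=ax)
ax.set_title("Intraday USD/EUR")
ax.set_ylabel("label")
fig.tight_layout()
```

Setting the title before calling tight_layout also lets the layout account for it.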

Related

Plotting closing price of SBIN (NSE), but it also plots 3:30 pm to 9:15 am

The DataFrame only has times from 9:15 am to 3:30 pm on each working day, but when it is plotted as a chart, matplotlib also plots the times between 3:30 pm and 9:15 am the next day. How can I fix this?
I can't figure out how to get a continuous figure. Here is the CSV.
I tried using:
import matplotlib.pyplot as plt
import pandas as pd
# data = the file read from the link
data = pd.read_csv('sbin.csv')
plt.plot(data['MA_50'], label='MA 50', color='red')
plt.plot(data['MA_10'], label='MA 10', color='blue')
plt.legend(loc='best')
plt.xlim(data.index[0], data.index[-1])
plt.xlabel('Time')
plt.ylabel('Price')
plt.show()
I expect 9:15 to follow directly after 3:30.
Have you tried using mplfinance?
Using the data you posted:
import mplfinance as mpf
import pandas as pd
df = pd.read_csv('sbin.csv', index_col=0, parse_dates=True)
mpf.plot(df, type='candle', ema=(10,50), style='yahoo')
The result:
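If you would rather stay with plain matplotlib, one common workaround is to plot against row position instead of timestamp, so the overnight gap takes up no horizontal space, and then label some ticks with the real timestamps. This is a sketch under assumptions: the synthetic 15-minute data below is a hypothetical stand-in for sbin.csv, and the column names MA_10 and MA_50 are taken from the question.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for sbin.csv: two sessions of 15-minute bars, 9:15 to 15:30
times = pd.DatetimeIndex(
    list(pd.date_range("2020-01-01 09:15", "2020-01-01 15:30", freq="15min"))
    + list(pd.date_range("2020-01-02 09:15", "2020-01-02 15:30", freq="15min"))
)
data = pd.DataFrame(
    {"MA_10": np.linspace(100, 110, len(times)),
     "MA_50": np.linspace(101, 108, len(times))},
    index=times,
)

# x is the row number, so consecutive rows are always one unit apart
x = np.arange(len(data))
fig, ax = plt.subplots()
ax.plot(x, data["MA_50"], label="MA 50", color="red")
ax.plot(x, data["MA_10"], label="MA 10", color="blue")

# Put the real timestamps back on a subset of the positions
step = max(1, len(data) // 6)
ticks = x[::step]
ax.set_xticks(ticks)
ax.set_xticklabels([data.index[i].strftime("%m-%d %H:%M") for i in ticks], rotation=45)

ax.set_xlim(x[0], x[-1])
ax.set_xlabel("Time")
ax.set_ylabel("Price")
ax.legend(loc="best")
fig.tight_layout()
```

The trade-off is that the x-axis is no longer proportional to wall-clock time, which is usually what you want for intraday trading data.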

ValueError for sklearn, problem maybe caused by float32/float64 dtypes?

So I want to check the feature importance in a dataset, but I get this error:
ValueError: Input contains NaN, infinity or a value too large for dtype('float32').
I checked the dataset and, sure enough, there were NaN values. So I added a line to drop all NaN rows. Now there are no NaN values, but when I re-ran the code I still got the same error. I checked .dtypes and, sure enough, everything was float64. So I added .astype(np.float32) to the columns that I pass to sklearn, but I still get the same error. I scrolled through the entire DataFrame manually and also used data.describe(), and all values are between 0 and 5, so far away from infinity or very large values.
What is causing the error here?
Here is my code:
import pandas as pd
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
import matplotlib.pyplot as plt
data = pd.read_csv("data.csv")
data.dropna(inplace=True) #dropping all nan values
X = data.iloc[:,8:42]
X = X.astype(np.float32) #converting data from float64 to float32
y = data.iloc[:,4]
y = y.astype(np.float32) #converting data from float64 to float32
# feature importance
model = ExtraTreesClassifier()
model.fit(X,y)
print(model.feature_importances_)
You are in the third case (value too large for float32), and after the downcast you end up in the second case (infinity):
Demo:
import numpy as np
a = np.array(np.finfo(np.float64).max)
# array(1.79769313e+308)
b = a.astype('float32')
# array(inf, dtype=float32)
How to debug? Suppose the following array:
a = np.array([np.finfo(np.float32).max, np.finfo(np.float64).max])
# array([3.40282347e+038, 1.79769313e+308])
a[a > np.finfo(np.float32).max]
# array([1.79769313e+308])
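The same check can be applied column by column to a DataFrame before fitting. Note that dropna() only removes NaN, not infinities, so both cases are counted separately here. A minimal sketch, assuming X is the feature frame from the question (replaced by a small hypothetical frame):

```python
import numpy as np
import pandas as pd

# Hypothetical feature frame; in the question this is data.iloc[:, 8:42]
X = pd.DataFrame({"a": [0.5, 1.0, np.inf], "b": [2.0, 1e300, 3.0]})

f32_max = np.finfo(np.float32).max

# Infinities survive dropna(), so count them explicitly per column
n_inf = np.isinf(X).sum()
# Finite values that would overflow to inf when cast to float32
too_big = (np.isfinite(X) & (X.abs() > f32_max)).sum()
print(n_inf)
print(too_big)

# Keep only rows where every value fits in float32 (NaN and inf also fail this test)
mask = (X.abs() <= f32_max).all(axis=1)
X_clean = X[mask].astype(np.float32)
# The target y would need the same mask, e.g. y[mask], to stay aligned with X_clean
```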

pandas.groupby --> DatetimeIndex --> groupby year

I come from JavaScript and am struggling. I need to sort the data by its DatetimeIndex and then group it by year.
The CSV looks like this (I shortened it because it has more than 1300 entries):
date,value
2016-05-09,1201
2017-05-10,2329
2018-05-11,1716
2019-05-12,10539
I wrote my code like this to throw away the first and last 2.5 percent of the dataframe:
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from pandas.plotting import register_matplotlib_converters
register_matplotlib_converters()
df = pd.read_csv( "fcc-forum-pageviews.csv", index_col="date", parse_dates=True).sort_values('value')
n = len(df)
df = df.iloc[int(round(n * 0.025)):int(round(n * 0.975))]  # rows are sorted by value: drop the lowest and highest 2.5%
df = df.sort_index()
Now I need to group the data by the year of the DatetimeIndex so I can plot it with matplotlib. This is where I'm stuck:
def draw_bar_plot():
    df_bar = df
    fig, ax = plt.subplots()
    fig.figure.savefig('bar_plot.png')
    return fig
I really don't know how to group by year.
Doing something like:
print(df_bar.groupby(df_bar.index).first())
leads to:
value
date
2016-05-19 19736
2016-05-20 17491
2016-05-26 18060
2016-05-27 19997
2016-05-28 19044
... ...
2019-11-23 146658
2019-11-24 138875
2019-11-30 141161
2019-12-01 142918
2019-12-03 158549
How do I group this by year? Could you also explain how to plot the data accurately as a bar chart with matplotlib?
This will group the data by year:
df_year_wise_sum = df.groupby([df.index.year]).sum()
This code will produce a bar plot:
df_year_wise_sum.plot(kind='bar')
plt.savefig('bar_plot.png')
plt.show()
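The same grouping can be slotted into the question's draw_bar_plot skeleton. In this sketch, passing df as a parameter is a hypothetical change to the skeleton, and the synthetic daily data stands in for fcc-forum-pageviews.csv. It uses the yearly mean rather than the sum, so partial years (the data starts in May 2016) are comparable; swap .mean() for .sum() if totals are what you want.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def draw_bar_plot(df):
    # Group rows by the year of the DatetimeIndex and average the page views
    yearly = df.groupby(df.index.year)["value"].mean()

    fig, ax = plt.subplots()
    # String labels give one evenly spaced bar per year
    ax.bar(yearly.index.astype(str), yearly.values)
    ax.set_xlabel("Year")
    ax.set_ylabel("Average page views")

    fig.savefig("bar_plot.png")
    return fig


# Usage with hypothetical data standing in for the CSV
idx = pd.date_range("2016-05-09", "2019-12-03", freq="D", name="date")
df = pd.DataFrame({"value": np.arange(len(idx))}, index=idx)
fig = draw_bar_plot(df)
```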

pandas DataReader is not working correctly

I am trying to import data from Yahoo Finance, but pandas does not seem to read the start date and end date correctly.
It is also reporting a pandas error that I don't understand.
This is the code I ran:
import numpy as np
import pandas as pd
from pandas_datareader import data as wb
import matplotlib.pyplot as plt
and this is what appears on the screen, although I can still use pandas:
/opt/anaconda3/lib/python3.7/site-packages/pandas_datareader/compat/__init__.py:7: FutureWarning: pandas.util.testing is deprecated. Use the functions in the public API at pandas.testing instead.
from pandas.util.testing import assert_frame_equal.
Then I ran this code:
acciones=["PG","BEI.DE"]
datos= pd.DataFrame()
for t in acciones:
    datos[t]=wb.DataReader(t,data_source="yahoo",start=2016-1-1,end=2019-1-1)["Adj Close"]
When I check the output, the dates are shifted back by two years and I don't know why:
datos.tail()
Date PG BEI.DE
2016-12-23 76.435783 78.406380
2016-12-27 76.111885 78.726517
2016-12-28 75.635086 78.600410
2016-12-29 75.886978 78.687721
2016-12-30 75.644073 78.192947
datos.head()
Date PG BEI.DE
2014-01-02 65.854416 68.331200
2014-01-03 65.780823 68.686317
2014-01-06 65.936180 68.405960
2014-01-07 66.573967 68.592857
2014-01-08 65.609123 68.004128
You're getting a FutureWarning that pandas.util.testing is deprecated, so you can still run your code, but it may break in the future. This issue has been resolved here.
Instead of the import statement from pandas.util.testing import assert_frame_equal, use this one:
from pandas.testing import assert_frame_equal
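Also, you should use the datetime library to create your start and end dates so that they have the correct type. Without quotes, Python evaluates start=2016-1-1 as the subtraction 2016 - 1 - 1 = 2014 (and end=2019-1-1 as 2017), and those integers are treated as years, which is why your data runs from 2014 to the end of 2016.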
import datetime
import pandas as pd
import pandas_datareader.data as wb
start_date = datetime.datetime(2016,1,1)
end_date = datetime.datetime(2019,1,1)
acciones=["PG","BEI.DE"]
datos= pd.DataFrame()
for t in acciones:
    datos[t]=wb.DataReader(t,data_source="yahoo",start=start_date,end=end_date)["Adj Close"]
Output:
>>> datos.head()
PG BEI.DE
Date
2016-01-04 68.264992 78.003090
2016-01-05 68.482758 78.849281
2016-01-06 67.820770 78.339645
2016-01-07 67.228455 76.426102
2016-01-08 66.174454 76.233788
>>> datos.tail()
PG BEI.DE
Date
2018-12-24 83.908928 NaN
2018-12-26 86.531075 NaN
2018-12-27 88.384834 89.214600
2018-12-28 87.578018 89.805695
2018-12-31 88.288788 NaN
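A quick way to see the cause without any network access: the unquoted dates are plain integer arithmetic. The sketch also shows that a datetime.datetime and a pandas Timestamp built from an ISO string refer to the same moment, so either is a safe way to write the date.

```python
import datetime

import pandas as pd

# Without quotes these are integer subtractions, not dates
start = 2016-1-1
end = 2019-1-1
print(start, end)

# Proper date objects for the same start date
start_date = datetime.datetime(2016, 1, 1)
start_ts = pd.Timestamp("2016-01-01")
print(start_date == start_ts)
```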

Can't convert pandas Series to DateTime

I'm trying to convert a pandas Series to datetime format, but I'm getting the following error:
AttributeError: 'DataFrame' object has no attribute 'to_datetime'
My current code is:
import pandas as pd
import numpy as np
from datetime import datetime
import matplotlib.pyplot as plt
import matplotlib.dates as dates
file = pd.ExcelFile('/Users/saif/Dropbox/Personal Health/Mega Sleep Data/Merged Database.xlsx')
Sleep= file.parse('Sheet 1')
Sleep.rename(columns={'levels/summary/deep/minutes': 'TotalDeepSleep', 'levels/summary/deep/thirtyDayAvgMinutes': 'AverageDeepSleep'}, inplace=True)
### converting dates to DateTime
The dates are currently stored as:
print(Sleep['dateOfSleep'].head())
0 2018-07-02
1 2018-07-03
2 2018-07-04
3 2018-07-05
4 2018-07-07
Name: dateOfSleep, dtype: object
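The AttributeError arises because to_datetime is a top-level pandas function, not a DataFrame method. A minimal sketch of the usual fix, using a hypothetical stand-in for Sleep built from the dates printed above: pass the column to pd.to_datetime and assign the result back.

```python
import pandas as pd

# Hypothetical stand-in for the Sleep DataFrame, using the dates shown above
Sleep = pd.DataFrame({"dateOfSleep": ["2018-07-02", "2018-07-03", "2018-07-04",
                                      "2018-07-05", "2018-07-07"]})

# Call the module-level function on the column and overwrite it with the result
Sleep["dateOfSleep"] = pd.to_datetime(Sleep["dateOfSleep"])
print(Sleep["dateOfSleep"].dtype)

# With real datetimes, the .dt accessor becomes available
print(Sleep["dateOfSleep"].dt.day_name().head())
```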