Removing Space between bars in seaborn barplot - matplotlib

I am trying to plot the following data. The duration is Jan to Dec, and Type varies from 1 to 7. The key point is that not all types exist for every month. These are not missing values; those types simply do not exist.
Month Type Coef
Jan 1 2.3
Jan 2 2.1
..
Code:
ax = sns.barplot(x='Month', y='Coef_E', hue='LCZ', data=df_E, palette=palette)
Result:
I want to remove the spaces marked by the arrows.
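Since the gaps come from hue levels that do not exist in a given month, one workaround is to lay the bars out manually with matplotlib, giving each month only as many bar slots as it has types. This is a minimal sketch, assuming df_E has the Month, Coef_E and LCZ columns used in the snippet above; the colour map and legend handling are illustrative choices, not from the question:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Patch

months = list(df_E['Month'].unique())
types = sorted(df_E['LCZ'].unique())
cmap = dict(zip(types, plt.cm.tab10.colors))   # one colour per type
group_width = 0.8

fig, ax = plt.subplots()
for i, month in enumerate(months):
    sub = df_E[df_E['Month'] == month]
    n = len(sub)
    width = group_width / n
    # centre only the bars that actually exist on this month's slot
    offsets = (np.arange(n) - (n - 1) / 2) * width
    ax.bar(i + offsets, sub['Coef_E'], width=width,
           color=[cmap[t] for t in sub['LCZ']])
ax.set_xticks(range(len(months)))
ax.set_xticklabels(months)
ax.legend(handles=[Patch(color=cmap[t], label=t) for t in types], title='LCZ')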

Related

How to set xticks for the index of string with hvplot

I have a dataframe region_cumulative_df_sel as below:
Month-Day regions RAIN_PERCENTILE_25 RAIN_PERCENTILE_50 RAIN_PERCENTILE_75 RAIN_MEAN RAIN_MEDIAN
07-01 1 0.0611691028 0.2811064720 1.9487996101 1.4330813885 0.2873695195
07-02 1 0.0945720226 0.8130480051 4.5959815979 2.9420840740 1.0614821911
07-03 1 0.2845511734 1.1912839413 5.5803232193 3.7756001949 1.1988518238
07-04 1 0.3402922750 3.2274529934 7.4262523651 5.2195668221 3.2781836987
07-05 1 0.4680584669 5.2418060303 8.6639881134 6.9092760086 5.3968687057
07-06 1 2.4329853058 7.3453550339 10.8091869354 8.7898645401 7.5020875931
... ...
06-27 1 382.7809448242 440.1162109375 512.6233520508 466.4956665039 445.0971069336
06-28 1 383.8329162598 446.2222900391 513.2116699219 467.9851379395 451.1973266602
06-29 1 385.7786254883 449.5384826660 513.4027099609 469.5671691895 451.2281188965
06-30 1 386.7952270508 450.6524658203 514.0201416016 471.2863159180 451.2484741211
The index "Month-Day" is a string, not a datetime; it covers a calendar year from its first day to its last.
I need to use hvplot to develop an interactive plot.
region_cumulative_df_sel.hvplot(width=900)
It is hard to view the labels on the x axis. How can I change the xticks to show only the 1st of each month, e.g. "07-01", "08-01", "09-01", ..., "06-01"?
I tried #Redox's code as below:
region_cumulative_df_sel['Month-Day'] = pd.to_datetime(region_cumulative_df_sel['Month-Day'],format="%m-%d") ##Convert to datetime
from bokeh.models.formatters import DatetimeTickFormatter
## Set format for showing x-axis ... you only need days, but in case counts change
formatter = DatetimeTickFormatter(days=["%m-%d"], months=["%m-%d"], years=["%m-%d"])
region_cumulative_df_sel.plot(x='Month-Day', xformatter=formatter,
                              y=['RAIN_PERCENTILE_25', 'RAIN_PERCENTILE_50', 'RAIN_PERCENTILE_75', 'RAIN_MEAN', 'RAIN_MEDIAN'],
                              width=900, ylabel="Rainfall (mm)", rot=90, title="Cumulative Rainfall")
This is what I have generated.
How can I shift the xticks on the x-axis to align with the Month-Day values? Also, the popup window shows "1900" as the year for the Month-Day column. Can the year segment be removed?
The x-axis data is in string format, so holoviews treats it as categorical and plots a tick for every row. You need to convert it to datetime, which will allow the plot to be formatted the way you need. I am taking a simple example and showing how to do this; it should work in your case as well.
import pandas as pd
import numpy as np
import hvplot.pandas  ## registers the .hvplot plotting API on DataFrames
from bokeh.models.formatters import DatetimeTickFormatter

## My Month-Day column is a string - 07-01 07-02 07-03 07-04 ... 12-31
df['Month-Day'] = pd.to_datetime(df['Month-Day'], format="%m-%d")  ## Convert to datetime
df['myY'] = np.random.randint(100, size=len(df))  ## Random Y data
## Set format for showing x-axis ... you only need days, but in case counts change
formatter = DatetimeTickFormatter(days=["%m-%d"], months=["%m-%d"], years=["%m-%d"])
## Plot graph (df.plot also works here if hvplot is registered as the pandas plotting backend)
df.hvplot(x='Month-Day', xformatter=formatter)  # .opts(xticks=4, xrotation=90)
#Redox is on the right track here. The issue is with the way the Month-Day column is converted to a datetime; pandas is assuming the year is 1900 for every row.
Essentially you need to attach a year to the Month-Day in some way.
See the example below; it takes the first month-day string, prepends "2021-", and generates sequential daily values for every row (but there are a few ways of doing this).
Code:
import pandas as pd
import numpy as np
import hvplot.pandas
from bokeh.models.formatters import DatetimeTickFormatter

dates = pd.date_range("2021-07-01", "2022-06-30", freq="D")
df = pd.DataFrame({
    "md": dates.strftime("%m-%d"),
    "ign": np.cumsum(np.random.normal(10, 5, len(dates))),
    "sup": np.cumsum(np.random.normal(20, 10, len(dates))),
    "imp": np.cumsum(np.random.normal(30, 15, len(dates))),
})
df["time"] = pd.date_range("2021-" + df.md[0], periods=len(df.index), freq="D")

formatter = DatetimeTickFormatter(days=["%m-%d"], months=["%m-%d"], years=["%m-%d"])
df.hvplot(x='time', xformatter=formatter, y=['ign', 'sup', 'imp'],
          width=900, ylabel="Index", rot=90, title="Cumulative ISI")
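The question also asked for ticks only on the 1st of each month. One option worth trying, as a hedged sketch on top of the plot above rather than part of the original answer, is to hand holoviews a bokeh MonthsTicker through the xticks option (the bokeh backend accepts a bokeh Ticker object there):
from bokeh.models import MonthsTicker

plot = df.hvplot(x='time', xformatter=formatter, y=['ign', 'sup', 'imp'],
                 width=900, ylabel="Index", rot=90, title="Cumulative ISI")
## MonthsTicker counts months from 0, so 0..11 puts a tick at the start of every month
plot.opts(xticks=MonthsTicker(months=list(range(12))))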

Matplotlib y axis not displaying from low values to high values

I'm a newbie in Spyder, especially matplotlib. Currently I am trying to display stock price data for the dates 2014-2019.
The unexpected error I have encountered is that the data is displayed incorrectly.
There are 2 flaws, and I have attempted workarounds for both.
Is the display error caused by the first index being the year 2019 instead of the year 2014?
Flaw 1: plt.yticks([0,140]) does not even display ticks from 0 to 140.
(Screenshot: ticks from 0 to 140 are not displayed.)
Flaw 2: If I remove the conversion of the Date column to a datetime format, the display error is half rectified, but a new problem appears because the x axis can no longer be displayed correctly.
(Screenshot: the data runs correctly from low to high, but the dates cannot be displayed because they are not in datetime format.)
In essence, the main flaws are that the y-axis ticks cannot be displayed and that the data needs to be presented in ascending order.
I've spent hours trying to fix this; my hair is falling out, I'm going to be bald soon.
The data is read from year 2019 to 2014: the first row is 2019 and the last row is 2014.
I understand that the data looks cramped in the display, but even changing the plot view to automatic rather than inline does not solve this issue.
The code is below:
import pandas as pd
import matplotlib.pyplot as plt

file = 'Microsoft.csv'
df = pd.read_csv(file)
## convert timestamp
# df['Date'] = pd.to_datetime(df['Date'], format = '%m/%d/%Y')
# df['Close'] = df['Close'].str.replace('$', '')
# df['Close'].astype('float')
##Not Necessary
# df['Close'].apply(lambda x: float(x))
# df.Close = float(df.Close)
# df['Close'] = df.Close.astype(float)
##Beginning of plot
plt.plot(df.Date, df.Close)
plt.yticks([0,140])
plt.ylim(0, 1511)
plt.suptitle('Stock Price')
plt.title('Microsoft', fontdict={'fontsize':15,'fontweight':'bold'})
plt.xlabel('Year')
plt.ylabel('Price in USD')
plt.show()
This is the original plot without adding the yticks.
(Screenshot: x-axis not converted to datetime; y-axis has no yticks.)
Any help would be appreciated. The goal is to display the plot in ascending order, with yticks covering the whole range, e.g. 0, 20, 40, 60, 80, 100, 120, 140.
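A minimal sketch of one way to get there, working from the assumptions already visible in the commented-out lines (Date is m/d/Y text and Close carries a leading '$'): parse the dates, clean Close, sort ascending, and pass plt.yticks the full list of ticks rather than just the two endpoints. The ylim value here is my own choice for a price that stays under 150.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('Microsoft.csv')
df['Date'] = pd.to_datetime(df['Date'], format='%m/%d/%Y')
df['Close'] = df['Close'].str.replace('$', '', regex=False).astype(float)
df = df.sort_values('Date')      # oldest (2014) first, newest (2019) last

plt.plot(df.Date, df.Close)
plt.yticks(range(0, 160, 20))    # 0, 20, 40, ..., 140
plt.ylim(0, 150)
plt.suptitle('Stock Price')
plt.title('Microsoft', fontdict={'fontsize': 15, 'fontweight': 'bold'})
plt.xlabel('Year')
plt.ylabel('Price in USD')
plt.show()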

How to plot only business hours and weekdays in pandas

I have hourly stock data.
I need a) to format it so that matplotlib ignores weekends and non-business hours and b) an hourly frequency.
The problem:
Currently, the graph looks crammed and I suspect it is because matplotlib is taking into account 24 hours instead of 8, and 7 days a week instead of business days.
How do I tell pandas to only take into account business hours, M-F?
How I am graphing the data:
I am looping through a list of price data dataframes, graphing each data frame:
mm = 0
for ii in df:
    Ddate = ii['Date']
    Pprice = ii['Price']
    d = Ddate.to_list()
    p = Pprice.to_list()
    dates = make_dt(d)
    prices = unstring(p)
    plt.figure()
    plt.plot(dates, prices)
    plt.title(stocks[mm])
    plt.grid(True)
    plt.xlabel('Dates')
    plt.ylabel('Prices')
    mm += 1
the graph:
To flag business days, you can use pd.bdate_range on each date:
df['IsBDay'] = df['date'].apply(lambda x: bool(len(pd.bdate_range(x, x))))
# The line above adds a new IsBDay column to the DF: True on business days, False on weekends.
Now create a new DF that keeps only the rows where IsBDay is True:
df = df[df.IsBDay]
Now your DF is ready for plotting.
Hope this helps.
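The step above only drops weekend days. If the Date column is already a pandas datetime, a hedged sketch for also dropping non-business hours could look like the following; the 09:30-16:00 window is an assumption, so substitute your market's trading hours.
import pandas as pd

df = df.set_index('Date')
df = df[df.index.dayofweek < 5]          # Monday=0 ... Friday=4, so this keeps weekdays only
df = df.between_time('09:30', '16:00')   # keep only rows inside trading hours
df = df.reset_index()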

plot score against timestamp in pandas

I have a dataframe in pandas:
date_hour score
2019041822 -5
2019041823 0
2019041900 6
2019041901 -5
where date_hour is in YYYYMMDDHH format, and score is an int.
When I plot, there is a long line connecting 2019041823 to 2019041900, with all the values in between treated as absent (i.e. there is no score for 2019041824-2019041899, because no hours correspond to those numbers).
Is there a way for these gaps/absent values to be ignored, so that the line is continuous? (Some of my data misses 2 days, so I get a long line which is misleading.)
The red circles show the gap between nights (i.e. between Apr 18 2300 and Apr 19 0000).
I used:
fig, ax = plt.subplots()
x=gpb['date_hour']
y=gpb['score']
ax.plot(x,y, '.-')
display(fig)
I believe it is because date_hour is an int. I tried to convert it to str, but was met with errors: ValueError: x and y must have same first dimension
Is there a way to plot so there are no gaps?
Try to convert date_hour to a timestamp with df.date_hour = pd.to_datetime(df.date_hour, format='%Y%m%d%H') before plotting.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({'date_hour': [2019041822, 2019041823, 2019041900, 2019041901],
                   'score': [-5, 0, 6, -5]})
df.date_hour = pd.to_datetime(df.date_hour, format='%Y%m%d%H')
df.plot(x='date_hour', y='score')
plt.show()
Output:
If you don't want to change your data, you can do
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'date_hour': [2019041822, 2019041823, 2019041900, 2019041901],
                   'score': [-5, 0, 6, -5]})
plt.plot(pd.to_datetime(df.date_hour, format='%Y%m%d%H'), df.score)
which gives:

How to go from relative dates to absolute dates in DataFrame columns

I have a pandas DataFrame containing forward prices for future maturities, quoted on multiple different trading months ('trade date'). Trade dates are given in absolute terms ('January'). The maturities are given in relative terms ('M+1').
How can I convert the maturities into an absolute format, i.e. for trade date 'January' the maturity 'M+1' should say 'February'?
Here is example data:
import pandas as pd
import numpy as np
data_keys = ['trade date', 'm+1', 'm+2', 'm+3']
data = {'trade date': ['jan', 'feb', 'mar', 'apr'],
        'm+1': np.random.randn(4),
        'm+2': np.random.randn(4),
        'm+3': np.random.randn(4)}
df = pd.DataFrame(data)
df = df[data_keys]
Starting data:
trade date m+1 m+2 m+3
0 jan -0.446535 -1.012870 -0.839881
1 feb 0.013255 0.265500 1.130098
2 mar 0.406562 -1.122270 -1.851551
3 apr -0.890004 0.752648 0.778100
Result:
Should have Feb, Mar, Apr, May, Jun, Jul in the columns. NaN will be shown in many instances.
The starting DataFrame:
trade date m+1 m+2 m+3
0 jan -1.350746 0.948835 0.579352
1 feb 0.011813 2.020158 -1.221110
2 mar -0.183187 -0.303099 1.323092
3 apr 0.081105 0.662628 -0.703152
Solution:
Define a list of all possible absolute dates you will encounter, in chronological order. Do the same for relative dates.
Create a function to act on groups coming from df.groupby. The function will convert the column names of each group appropriately to an absolute format.
Apply the function.
Pandas handles the clever concatenation of all groups.
Code:
abs_in_order = ['jan', 'feb', 'mar', 'apr', 'may', 'jun', 'jul', 'aug']
rel_in_order = ['m+0', 'm+1', 'm+2', 'm+3', 'm+4']

def rel2abs(group, abs_in_order, rel_in_order):
    abs_date = group['trade date'].unique()[0]
    l = len(rel_in_order)
    i = abs_in_order.index(abs_date)
    namesmap = dict(zip(rel_in_order, abs_in_order[i:i+l]))
    group.rename(columns=namesmap, inplace=True)
    return group

grouped = df.groupby(['trade date'])
df = grouped.apply(rel2abs, abs_in_order, rel_in_order)
Pandas may mess up the column order. Do this to get back to something in chronological order:
order = ['trade date'] + abs_in_order
cols = [e for e in order if e in df.columns]
df[cols]
Result:
trade date feb mar apr may jun jul
0 jan -1.350746 0.948835 0.579352 NaN NaN NaN
1 feb NaN 0.011813 2.020158 -1.221110 NaN NaN
2 mar NaN NaN -0.183187 -0.303099 1.323092 NaN
3 apr NaN NaN NaN 0.081105 0.662628 -0.703152
Your question doesn't contain enough information to answer it.
You say that the prices are quoted on dates given in absolute terms ('January').
January is not a date, but 2-Jan-2015 is.
What is your actual 'date', and what is its format (i.e. text, datetime.date, pd.Timestamp, etc.)? You can use type(date) to check, where date is whichever object holds the quote date.
The easiest solution is to get your trade dates into pd.Timestamps and then add an offset:
>>> trade_date = pd.Timestamp('2015-1-15')
>>> trade_date + pd.DateOffset(months=1)
Timestamp('2015-02-15 00:00:00')
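A hedged sketch of how that offset idea could be wired to the month labels from the question; the year 2015 is arbitrary and only used for the month arithmetic:
import pandas as pd

month_numbers = {'jan': 1, 'feb': 2, 'mar': 3, 'apr': 4}
trade_date = pd.Timestamp(2015, month_numbers['jan'], 1)
maturity = trade_date + pd.DateOffset(months=1)    # m+1
print(maturity.strftime('%b').lower())             # -> 'feb', the new column name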