How can I fix a mismatch when only one plot within the group is not showing?

The code plots the first seven locations but not the last one. When it does plot all the locations, all the bars look the same instead of each one reflecting how many respondents reported each length of stay at that location. I have been trying to fix it, but in vain.
print('Total Hours Spent in 2019 \n', aqua.groupby('Location')['In hours, what was your typical length of stay in 2019?'].count().sort_values(ascending = False))
location = ['Harry Stone', 'Lake Highlands', 'Crawford', 'Samuell Grand', 'Kidd Springs', 'Tietze', 'Fretz', 'Exline']
plt.figure(figsize = (15, 25))
for l in location:
    ind = location.index(l)
    plt.subplot(4, 2, ind + 1)
    aquatic = aqua[aqua['Location'] == l]
    count = aquatic['In hours, what was your typical length of stay in 2019?'].value_counts()
    Index = [0, 1, 2, 3, 4]
    plt.bar(Index, count, color = ['orange', 'yellow', 'green', 'cyan'])
    plt.xticks(Index, ['2', '4', '3', 'nan', '0'])
    plt.xlabel('How Many Hours Spent')
    plt.ylabel('Answers Count')
    plt.title('Which Location People Spent the most time ' + l)
    for i in range(len(Index)):
        plt.text(i, Index[0], count[i], ha = 'right', va = 'bottom')
[Plot of all location based on time spent in location](https://i.stack.imgur.com/696P3.png)
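One possible cause, offered as a sketch rather than a confirmed diagnosis: `value_counts()` returns only the answers that occur at a given location, ordered by frequency, so its length and order can differ from the fixed `Index` and tick labels. Reindexing the counts onto one fixed list of answers gives every subplot the same shape. The column name `hours`, the toy data, and the `hour_values` list below are hypothetical stand-ins for the survey data, and missing answers (NaN) are left out for brevity.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the survey DataFrame `aqua`.
aqua = pd.DataFrame({
    "Location": ["Fretz", "Fretz", "Fretz", "Exline", "Exline"],
    "hours": [2, 4, 2, 3, 3],
})
locations = ["Fretz", "Exline"]
hour_values = [0, 2, 3, 4]                      # fixed answer categories (assumed)
colors = ["orange", "yellow", "green", "cyan"]  # one colour per category

fig, axes = plt.subplots(1, len(locations), figsize=(8, 3), squeeze=False)
counts_by_location = {}
for ax, loc in zip(axes.ravel(), locations):
    # Count the answers, then force the same categories in the same order
    # for every location; answers that never occur get a count of 0.
    counts = (
        aqua.loc[aqua["Location"] == loc, "hours"]
        .value_counts()
        .reindex(hour_values, fill_value=0)
    )
    counts_by_location[loc] = counts

    positions = list(range(len(hour_values)))
    ax.bar(positions, counts.to_numpy(), color=colors)
    ax.set_xticks(positions)
    ax.set_xticklabels([str(h) for h in hour_values])
    ax.set_xlabel("How Many Hours Spent")
    ax.set_ylabel("Answers Count")
    ax.set_title(loc)
    # Label each bar with its own count, placed at the top of the bar.
    for pos, value in zip(positions, counts):
        ax.text(pos, value, str(value), ha="center", va="bottom")

fig.tight_layout()
```

Because the tick labels are generated from `hour_values`, the labels and the bar heights cannot drift apart from one subplot to the next.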

Related

How to expand bars over the month on the x-axis while being the same width?

for i in range(len(basin)):
    prefix = "URL here"
    state = "OR"
    basin_name = basin[i]
    df_orig = pd.read_csv(f"{prefix}/{basin_name}.csv", index_col=0)
    #---create date x-index
    curr_wy_date_rng = pd.date_range(
        start=dt(curr_wy-1, 10, 1),
        end=dt(curr_wy, 9, 30),
        freq="D",
    )
    if not calendar.isleap(curr_wy):
        print("dropping leap day")
        df_orig.drop(["02-29"], inplace=True)
    use_cols = ["Median ('91-'20)", f"{curr_wy}"]
    df = pd.DataFrame(data=df_orig[use_cols].copy())
    df.index = curr_wy_date_rng
    #--create EOM percent of median values-------------------------------------
    curr_wy_month_rng = pd.date_range(
        start=dt(curr_wy-1, 10, 1),
        end=dt(curr_wy, 6, 30),
        freq="M",
    )
    df_monthly_prec = pd.DataFrame(data=df_monthly_basin[basin[i]].copy())
    df_monthly_prec.index = curr_wy_month_rng
    df_monthly = df.groupby(pd.Grouper(freq="M")).max()
    df_monthly["date"] = df_monthly.index
    df_monthly["wy_date"] = df_monthly["date"].apply(lambda x: cal_to_wy(x))
    df_monthly.index = pd.to_datetime(df_monthly["wy_date"])
    df_monthly.index = df_monthly["date"]
    df_monthly["month"] = df_monthly["date"].apply(
        lambda x: calendar.month_abbr[x.month]
    )
    df_monthly["wy"] = df_monthly["wy_date"].apply(lambda x: x.year)
    df_monthly.sort_values(by="wy_date", axis=0, inplace=True)
    df_monthly.drop(
        columns=[i for i in df_monthly.columns if "date" in i], inplace=True
    )
    # df_monthly.index = df_monthly['month']
    df_merge = pd.merge(df_monthly,df_monthly_prec,how='inner', left_index=True, right_index=True)
    #---Subplots---------------------------------------------------------------
    fig, ax = plt.subplots(figsize=(8,4))
    ax.plot(df_merge.index, df_merge["Median ('91-'20)"], color="green", linewidth="1", linestyle="dashed", label = 'Median Snowpack')
    ax.plot(df_merge.index, df_merge[f'{curr_wy}'], color='red', linewidth='2',label='WY Current')
    #------Setting x-axis range to expand bar width for ax2
    ax.bar(df_merge.index,df_merge[basin[i]], color = 'blue', label = 'Monthly %')
    #n = n + 1
    #--format chart
    ax.set_title(chart_name[w], fontweight = 'bold')
    w = w + 1
    ax.set_ylabel("Basin Precipitation Index")
    ax.set_yticklabels([])
    ax.margins(x=0)
    ax.legend()
    #plt.xlim(0,9)
    #---Setting date format
    ax.xaxis.set_major_locator(mdates.MonthLocator())
    ax.xaxis.set_major_formatter(mdates.DateFormatter('%b'))
    #---EXPORT
    plt.show()
End result desired: plot the monthly DataFrame (df_monthly_prec) together with the daily DataFrame, of which only the monthly values (df_monthly) are currently charted. The bars for the monthly DataFrame should ideally span the whole month on the chart.
I have tried creating a secondary axis, but had trouble aligning the times on the primary and secondary axes. Ideally, I would like to plot df instead of df_monthly (showing all daily data instead of just the end-of-month values within the daily dataset).
Any assistance or pointers would be much appreciated! Apologies if additional clarification is needed.
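A minimal sketch of one way to make each bar span its month on a single axis, assuming the monthly values can be keyed by month start: Matplotlib date coordinates are measured in days, so a bar anchored at the first day of the month with `align="edge"` and a width equal to the number of days in that month covers the whole month. The series `daily` and `monthly_pct` below are made-up stand-ins for `df` and `df_monthly_prec`.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical daily series for a water year running October to June.
daily_index = pd.date_range("2022-10-01", "2023-06-30", freq="D")
daily = pd.Series(np.linspace(0.0, 30.0, len(daily_index)), index=daily_index)

# Hypothetical monthly values, keyed by the first day of each month.
month_starts = pd.date_range("2022-10-01", "2023-06-01", freq="MS")
monthly_pct = pd.Series(np.linspace(5.0, 25.0, len(month_starts)), index=month_starts)

# Left edge of each bar as a Matplotlib date number, width in days.
bar_left = mdates.date2num(month_starts.to_pydatetime())
bar_width = month_starts.days_in_month.to_numpy()

fig, ax = plt.subplots(figsize=(8, 4))
# Plotting the daily line first puts the x-axis in date units.
ax.plot(daily.index, daily.to_numpy(), color="red", linewidth=2, label="WY Current")
bars = ax.bar(
    bar_left,
    monthly_pct.to_numpy(),
    width=bar_width,
    align="edge",          # bar starts at the first of the month
    color="blue",
    alpha=0.4,
    edgecolor="black",
    label="Monthly %",
)
ax.margins(x=0)
ax.legend()
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b"))
```

Because both artists share one date axis, no secondary axis is needed and the daily line can be drawn at full resolution on top of the bars.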

geom_bar for total counts of binned continuous variable

I'm really struggling to achieve what feels like an incredibly basic geom_bar plot. I would like the sum of y to be represented by one solid bar (with a colour = black outline) in bins of 10 for x. I know that stat = "identity" is what is creating the unnecessary individual blocks in each bar, but I can't find an alternative that gets me to what is so close to my end goal. I cheated and made the desired plot below in Illustrator.
I don't really want to code x as a factor for the bins, as I want to keep the format of the axis ticks and text rather than having text such as "0 - 10", "10 - 20", etc. Is there a way to do this in ggplot without using the summarise or cut functions on the raw data? I am also aware of the geom_col and stat_count options but, again, can't achieve my desired outcome.
DF as below, where y = counts at various values of a continuous variable x. Also a factor variable of type.
y = c(1 ,1, 3, 2, 1, 1, 2, 1, 1, 1, 1, 1, 4, 1, 1,1, 2, 1, 2, 3, 2, 2, 1)
x = c(26.7, 28.5, 30.0, 34.8, 35.0, 36.4, 38.6, 40.0, 42.1, 43.7, 44.1, 45.0, 45.5, 47.4, 48.0, 57.2, 57.8, 64.2, 65.0, 66.7, 68.0, 74.4, 94.1)
type = c(rep("Type 1", 20), "Type 2", rep("Type 1", 2))
df<-data.frame(x,y,type)
Bar plot of total y count for each bin of x - trying to fill by total of type, but getting individual proportions as shown by line colour = black. Would like total for each type in each bar.
ggplot(df, aes(y = y, x = x)) +
  geom_bar(stat = "identity", color = "black", aes(fill = type)) +
  scale_x_binned(limits = c(20, 100)) +
  scale_y_continuous(expand = c(0, 0), breaks = seq(0, 10, 2)) +
  xlab("") +
  ylab("Total Count")
Or trying to just have the total count within each bin, but I don't want the internal lines in the bars, just the outer colour = black for each bar:
ggplot(df, aes(y = y, x = x)) +
  geom_col(fill = "#00C3C6", color = "black") +
  scale_x_binned(limits = c(20, 100)) +
  scale_y_continuous(expand = c(0, 0), breaks = seq(0, 10, 2)) +
  xlab("") +
  ylab("Total Count")
Here is one way to do it, transforming the data first and then using geom_col:
library(dplyr)
library(ggplot2)

# Sum y within each 10-wide bin of x, separately for each type.
df <- df |>
  mutate(bins = floor(x/10) * 10) |>
  group_by(bins, type) |>
  summarise(y = sum(y))
ggplot(data = df,
       aes(y = y,
           x = bins)) +
  geom_col(aes(fill = type),
           color = "black") +
  scale_x_continuous(breaks = seq(0, 100, 10)) +
  scale_y_continuous(expand = c(0, 0),
                     breaks = seq(0, 10, 2)) +
  xlab("") +
  ylab("Total Count")

Generating one NumPy array for each DataFrame row

I'm attempting to plot stock market trades against a plot of the particular stock using mplfinance.plot(). I keep a record of all my trades using JStock, which stores them as a CSV file:
"Code","Symbol","Date","Units","Purchase Price","Current Price","Purchase Value","Current Value","Gain/Loss Price","Gain/Loss Value","Gain/Loss %","Broker","Clearing Fee","Stamp Duty","Net Purchase Value","Net Gain/Loss Value","Net Gain/Loss %","Comment"
"ASO","Academy Sports and Outdoors, Inc.","Sep 13, 2021","25.0","45.85","46.62","1146.25","1165.5","0.769999999999996","19.25","1.6793893129770994","0.0","0.0","0.0","1146.25","19.25","1.6793893129770994",""
"ASO","Academy Sports and Outdoors, Inc.","Aug 26, 2021","15.0","41.3","46.62","619.5","699.3","5.32","79.79999999999995","12.881355932203384","0.0","0.0","0.0","619.5","79.79999999999995","12.881355932203384",""
"ASO","Academy Sports and Outdoors, Inc.","Jun 3, 2021","10.0","37.48","46.62","374.79999999999995","466.2","9.14","91.40000000000003","24.386339381003214","0.0","0.0","0.0","374.79999999999995","91.40000000000003","24.386339381003214",""
"RMBS","Rambus Inc.","Nov 24, 2021","2.0","26.99","26.99","53.98","53.98","0.0","0.0","0.0","0.0","0.0","0.0","53.98","0.0","0.0",""
I can get this data easily enough using
myportfolio = pd.read_csv(PORTFOLIO_LOCATION, parse_dates=[2])
But I need to create individual lists for each trade that match the day-by-day stock price:
Date,High,Low,Open,Close,Volume,Adj Close
2020-12-01,17.020000457763672,16.5,16.799999237060547,16.8799991607666,990900,16.8799991607666
2020-12-02,17.31999969482422,16.290000915527344,16.65999984741211,16.40999984741211,1200500,16.40999984741211
and I have a normal DataFrame containing this. So far this is what I have:
for i in myportfolio.groupby("Code"):
    (code, j) = i
    if code == "ASO": # just testing it against one stock
        simp = pd.DataFrame(columns=["Date", "Units", "Price"],
                            data=j[["Date", "Units", "Purchase Price"]].values, index=j[["Date"]])
        df = pd.read_csv("ASO-2020-12-01-2021-12-01.csv", index_col=0, parse_dates=True)
        # df.lookup(simp["Date"])
        df.insert(0, 'row_num', range(0,len(df)))
        k = df.loc[simp["Date"]]['row_num']
        trades = []
        for index, m in k.iteritems():
            t = np.zeros((df.shape[0], 1))
            t.fill(np.nan)
            t[m] = simp[index]["Price"]
            trades.append(t.to_list())
But I receive a KeyError: Timestamp('2021-09-17 00:00:00')
Any ideas of how to fix this?
Addendum 1:
import pandas as pd
trade_data = [['ASO', '5/5/21', 10], ['ASO', '5/7/21', 12], ['RBLX', '5/7/21', 15]]
trade_df = pd.DataFrame(trade_data, columns = ['Code', 'Date', 'Price'])
trade_df['Date'] = pd.to_datetime(trade_df['Date'])
trade_df
Code Date Price
0 ASO 2021-05-05 10
1 ASO 2021-05-07 12
2 RBLX 2021-05-07 15
aso_data = [['5/5/21', 12, 5, 10, 7], ['5/6/21', 15, 7, 13, 8], ['5/7/21', 17, 10, 15, 11]]
aso_df = pd.DataFrame(aso_data, columns = ['Date', 'High', 'Low', 'Open', 'Close'])
aso_df['Date'] = pd.to_datetime(aso_df['Date'])
aso_df
Date High Low Open Close
0 2021-05-05 12 5 10 7
1 2021-05-06 15 7 13 8
2 2021-05-07 17 10 15 11
So I want to create two NumPy arrays for ASO (one for each trade) and one for the RBLX trade. For ASO I should have two NumPy arrays that look like [10, NaN, NaN] and [NaN, NaN, 12].
You want a list of lists, right?
There is no need to loop:
df_list = df.values.tolist()
Just in case another novice such as myself surfs in with a similar problem, this:
# Grouping by the string "Code" (not the list ["Code"]) yields plain string keys.
for i in myportfolio.groupby("Code"):
    (code, j) = i
    if code == "ASO": # just testing it against one stock
        df = pd.read_csv("ASO-2020-12-01-2021-12-01.csv", index_col=0, parse_dates=True)
        df.insert(0, 'row_num', range(0,len(df)))
        k = df.loc[j["Date"]]['row_num']
        trades = []
        for index, m in j.iterrows():
            # One NaN-filled column per trade, with the purchase price on the trade date.
            t = np.zeros((df.shape[0], 1))
            t.fill(np.nan)
            t[int(df.loc[m["Date"]]['row_num'])] = m["Purchase Price"]
            asplot = mpf.make_addplot(t, type="scatter", color='red', marker="D")
            trades.append(asplot)
        mpf.plot(df, type='candle', addplot=trades)
produced an okay graph showing my entry points. Good luck!
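For reference, a minimal sketch of building one NaN-filled array per trade without the `row_num` helper column, using the toy data from the addendum. The function name `trade_arrays` is hypothetical. `Index.get_indexer` returns -1 for a date that is absent from the price index instead of raising, so trades that fall on a day with no price row are skipped here; whether skipping is the right behaviour is an assumption.

```python
import numpy as np
import pandas as pd

# Toy trades and price dates, following the addendum in the question.
trade_df = pd.DataFrame({
    "Code": ["ASO", "ASO", "RBLX"],
    "Date": pd.to_datetime(["2021-05-05", "2021-05-07", "2021-05-07"]),
    "Price": [10.0, 12.0, 15.0],
})
price_index = pd.DatetimeIndex(["2021-05-05", "2021-05-06", "2021-05-07"])


def trade_arrays(trades, price_index):
    """Return one array per trade: NaN everywhere except the trade date."""
    # Position of each trade date within the price index (-1 if missing).
    positions = price_index.get_indexer(trades["Date"])
    arrays = []
    for pos, price in zip(positions, trades["Price"]):
        if pos == -1:
            continue  # no price row for this date, so nothing to mark
        arr = np.full(len(price_index), np.nan)
        arr[pos] = price
        arrays.append(arr)
    return arrays


aso_arrays = trade_arrays(trade_df[trade_df["Code"] == "ASO"], price_index)
rblx_arrays = trade_arrays(trade_df[trade_df["Code"] == "RBLX"], price_index)
```

Each resulting array has the same length as the price data, which is the shape that a per-trade scatter overlay needs.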

'matplotlib.pyplot' has no attribute 'autofmt_xdate'

A project I previously submitted for a course worked as expected. I went back to run the code again and now get a Python traceback with an error message that didn't occur before:
'matplotlib.pyplot' has no attribute 'autofmt_xdate'
I loaded the weather station data files and ran all the code, which previously worked. Below is the code for the visualization plot:
plt.figure()
plt.plot(minmaxdf.loc[:,'Month-Day'], minmaxdf.loc[:,'min_tmps'] ,'-', c = 'cyan', linewidth=0.5, label = '10yr record lows')
plt.plot(minmaxdf.loc[:,'Month-Day'], minmaxdf.loc[:,'max_tmps'] , '-', c = 'orange', linewidth=0.5, label = '10yr record highs')
plt.gca().fill_between(range(len(minmaxdf.loc[:,'min_tmps'])), minmaxdf['min_tmps'], minmaxdf['max_tmps'], facecolor = (0.5, 0.5, 0.5), alpha = 0.5)
plt.scatter(minbreach15.loc[:,'Month-Day'], minbreach15.loc[:,'min_tmps_breach15'], s = 10, c = 'blue', label = 'Record low broken - 2015')
plt.scatter(maxbreach15.loc[:,'Month-Day'], maxbreach15.loc[:,'max_tmps_breach15'], s = 10, c = 'red', label = 'Record high broken - 2015')
plt.xlabel('Month')
plt.ylabel('Temperature (Tenths of Degrees C)')
plt.title('10yr Max/Min Temperature Range for Wilton CT 06897')
plt.gca().axis([0, 400, -500, 500])
plt.xticks(range(0, len(minmaxdf.loc[:,'Month-Day']), 30), minmaxdf.loc[:,'Month-Day'].index[range(0, len(minmaxdf.loc[:,'Month-Day']), 30)], rotation = '-45')
plt.xticks( np.linspace(0, 15 + 30*11 , num = 12), (r'Jan', r'Feb', r'Mar', r'Apr', r'May', r'Jun', r'Jul', r'Aug', r'Sep', r'Oct', r'Nov', r'Dec') )
plt.legend(loc = 4, frameon = False)
plt.gca().spines['top'].set_visible(False)
plt.gca().spines['right'].set_visible(False)
plt.autofmt_xdate()
plt.show()
Previously, this produced a chart of 10-year (2004-14) maximum and minimum temperatures by day of year, overlaid with scatter points for the 2015 maximums and minimums that exceeded those records.
autofmt_xdate() is a method of the Figure. The command hence needs to be
plt.gcf().autofmt_xdate()
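A small self-contained sketch of the same idea using an explicit Figure object; the dates and temperatures are made up for illustration. Keeping a reference to the figure returned by `plt.subplots()` avoids relying on `plt.gcf()`.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical daily temperatures (tenths of degrees C).
dates = pd.date_range("2015-01-01", periods=90, freq="D")
temps = np.sin(np.linspace(0, 3, len(dates))) * 100

fig, ax = plt.subplots()
ax.plot(dates, temps, "-", c="orange", linewidth=0.5, label="hypothetical highs")
ax.legend(loc=4, frameon=False)

# autofmt_xdate belongs to the Figure: it rotates and right-aligns
# the date tick labels on the bottom axes.
fig.autofmt_xdate(rotation=45)
```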

Plotting facetgrid plots in seaborn with smoothing

I have a pandas DataFrame, a snippet of which is shown below:
I wish to recreate the graphs shown below in Seaborn. These graphs were created in R using ggplot, but I am working with pandas/matplotlib/seaborn.
Essentially, the graphs summarize the variables (mi, steps, st, ...) grouped by sensor id, with time to the event on the x-axis. Additionally, and most importantly, smoothing is performed by stat_smooth() within ggplot. I have included a snippet of my ggplot code.
step.plot <- ggplot(data=cdays, aes(x=dfc, y=steps, col=legid)) +
  ggtitle('time to event') +
  labs(x="Days from event", y='Number of steps') +
  stat_smooth(method='loess', span=0.2, formula=y~x) +
  geom_vline(mapping=aes(xintercept=0), color='blue') +
  theme(legend.position="none")
Here is how I would do it. Bear in mind that I had to make assumptions about the structure of your data, so please review what I did before applying it.
Creating some simulated data
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

subject = np.repeat(np.repeat([1, 2, 3, 4, 5], 4), 31)
time = np.tile(np.repeat(np.arange(-15, 16, 1), 4), 5)
sensor = np.tile([1, 2, 3, 4], 31*5)
measure1 = subject*20 + time*(5-sensor) - time**2*(sensor-2)*0.1 + (time >= 0)*np.random.normal(100*(sensor-2), 10, 620) + np.random.normal(0, 10, 620)
measure2 = subject*10 + time*(2-sensor) - time**2*(sensor-4)*0.1 + (time >= 0)*np.random.normal(50*(sensor-1), 10, 620) + np.random.normal(0, 8, 620)
measure3 = time**2*(sensor-1)*0.1 + (time >= 0)*np.random.normal(50*(sensor-3), 10, 620) + np.random.normal(0, 8, 620)
measure4 = time**2*(sensor-1)*0.1 + np.random.normal(0, 8, 620)
Putting it in a long form dataset for plotting
df = pd.DataFrame(dict(subject=subject, time=time, sensor=sensor, measure1=measure1,
measure2=measure2, measure3=measure3, measure4=measure4))
df = pd.melt(df, id_vars=["sensor", "subject", "time"],
value_vars=["measure1", "measure2","measure3", "measure4"],
var_name="measure")
Creating the plot, without smoothing
g = sns.FacetGrid(data=df, col="measure", col_wrap=2)
g.map_dataframe(sns.tsplot, time="time", value="value", condition="sensor", unit="subject", color="deep")
g.add_legend(title="Sensor Number")
g.set_xlabels("Days from Event")
g.set_titles("{col_name}")
plt.show()
Plotted data, before smoothing
Now let's use statsmodels to smooth the data.
Please review this part, this is where I made assumptions about the sampling unit (I assume that the sampling unit is the subject, and therefore treat sensors and measure types as conditions).
from statsmodels.nonparametric.smoothers_lowess import lowess
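dfs = []
for sens in df.sensor.unique():
    for meas in df.measure.unique():
        # One independent smoothing per Sensor/Measure condition.
        # .copy() avoids assigning a new column into a view of df.
        df_filt = df.loc[(df.sensor == sens) & (df.measure == meas)].copy()
        # Frac is equivalent to span in R.
        # return_sorted=False returns the fitted values in the original row order,
        # so they line up with the rows of df_filt.
        df_filt["filteredvalue"] = lowess(df_filt.value, df_filt.time, frac=0.2, return_sorted=False)
        dfs.append(df_filt)
df = pd.concat(dfs)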
Plotted data, after smoothing
From there you can tweak your plot however you like. Tell me if you have any questions.