Matplotlib Plot X, Y Line Plot Multiple Columns Fixed X Axis - pandas

I'm trying to plot a DataFrame with the x axis forced to the order 12, 1, 2 (Dec, Jan, Feb), and I cannot see how to do this. Matplotlib keeps plotting the x axis in 1, 2, 12 order. A subset of the columns of my DataFrame (analogs_re) looks like this:
Month 2000 1995 2009 2014 1994 2003
0 12 -0.203835 0.580590 0.233124 0.490193 0.605808 0.016756
1 1 -0.947029 -1.239794 -0.977004 0.207236 0.436458 -0.501948
2 2 -0.059957 0.708626 0.111840 0.422534 1.051873 -0.149000
I need the y data plotted with the x axis in 12, 1, 2 order, as shown in the 'Month' column.
My code:
fig, ax = plt.subplots()
#for name, group in groups:
analogs_re.set_index('Month').plot(figsize=(10,5),grid=True)
analogs_re.plot(x='Month', y=analogs_re.columns[1:len(analogs_re.columns)])

When you set Month as the x-axis, its values are treated as numbers and placed at their numeric positions, so 12 ends up after 1 and 2. A numeric axis cannot run 12, then 1, then 2.
The trick is to use the original index (0, 1, 2) as the x-axis, then label those ticks with the month numbers:
fig, ax = plt.subplots()
analogs_re.drop(columns='Month').plot(figsize=(10,5), grid=True, ax=ax)
ax.set_xticks(analogs_re.index)
ax.set_xticklabels(analogs_re["Month"])
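As a self-contained sketch of the same idea, the snippet below rebuilds two of the columns from the question and labels the ticks with month abbreviations instead of numbers. The use of calendar.month_abbr and the Agg backend are assumptions made for illustration, not part of the original answer.

```python
import calendar

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Two of the columns from the question, rows already in Dec, Jan, Feb order
analogs_re = pd.DataFrame({
    "Month": [12, 1, 2],
    "2000": [-0.203835, -0.947029, -0.059957],
    "1995": [0.580590, -1.239794, 0.708626],
})

fig, ax = plt.subplots(figsize=(10, 5))

# Plot against the positional index (0, 1, 2) so the row order is preserved
analogs_re.drop(columns="Month").plot(grid=True, ax=ax)

# Put one tick per row and label it with the month name for that row
tick_labels = [calendar.month_abbr[m] for m in analogs_re["Month"]]
ax.set_xticks(list(analogs_re.index))
ax.set_xticklabels(tick_labels)
```

Because the data is plotted against the row position, the order of the rows in the DataFrame is what determines the order on the axis.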

Related

PyPlot line plot changing color by column value

I have a dataframe with a structure similar to the following example.
df = pd.DataFrame({'x': ['2008-01-01', '2008-01-02', '2008-01-03', '2008-01-04'], 'y': [1, 2, 3, 6],
'group_id': ['OBSERVED', 'IMPUTED', 'OBSERVED', 'IMPUTED'], 'color': ['blue', 'red', 'blue', 'red']})
df['x'] = pd.to_datetime(df['x'])
I.e. a dataframe where some of the values (y) are observed and others are imputed.
x y group_id color
0 2008-01-01 1 OBSERVED blue
1 2008-01-02 2 IMPUTED red
2 2008-01-03 3 OBSERVED blue
3 2008-01-04 6 IMPUTED red
How do I create a single line which changes color based on the group_id (the column color is uniquely determined by group_id, as in this example)?
I have tried the following two solutions (one of them is commented out):
df_grp = df.groupby('group_id')
fig, ax = plt.subplots(1)
for id, data in df_grp:
    #ax.plot(data['x'], data['y'], label=id, color=data['color'].unique().tolist()[0])
    data.plot('x', 'y', label=id, ax=ax)
plt.legend()
plt.show()
However, the resulting plot is neither a single line nor colored correctly for each segment.
You can use the code below to color each segment by the color of its starting point. The key is to get the data into the right shape in the DataFrame so that the plotting is easy; you can print(df) after the manipulation to see what was done. For every row except the last, I added the x and y of the following row as additional columns (x_end, y_end). I also draw a marker in the segment's color so that you can tell whether it is red or blue. Note that the dates in the x column should be in ascending order.
#Add x_end column to df from subsequent row - column X
end_date=df.iloc[1:,:]['x'].reset_index(drop=True).rename('x_end')
df = pd.concat([df, end_date], axis=1)
#Add y_end column to df from subsequent row - column y
end_y=df.iloc[1:,:]['y'].reset_index(drop=True).astype(int).rename('y_end')
df = pd.concat([df, end_y], axis=1)
#Add values for last row, same as x and y so the marker is of right color
df.iat[len(df)-1, 4] = df.iat[len(df)-1, 0]
df.iat[len(df)-1, 5] = df.iat[len(df)-1, 1]
for i in range(len(df)):
    plt.plot([df.iloc[i,0],df.iloc[i,4]],[df.iloc[i,1],df.iloc[i,5]], marker='o', c=df.iat[i,3])
Output plot
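The same segment-by-segment idea can be sketched more compactly with Series.shift, which avoids the positional column indices. This is a variation on the answer above, not the answer's own code; the column names x_end and y_end follow the answer, and the Agg backend is an assumption so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "x": ["2008-01-01", "2008-01-02", "2008-01-03", "2008-01-04"],
    "y": [1, 2, 3, 6],
    "group_id": ["OBSERVED", "IMPUTED", "OBSERVED", "IMPUTED"],
    "color": ["blue", "red", "blue", "red"],
})
df["x"] = pd.to_datetime(df["x"])
df = df.sort_values("x").reset_index(drop=True)  # segments assume ascending dates

# End point of each segment is the next row; the last row points at itself
df["x_end"] = df["x"].shift(-1).fillna(df["x"])
df["y_end"] = df["y"].shift(-1).fillna(df["y"])

fig, ax = plt.subplots()
for row in df.itertuples(index=False):
    # One short line per row, drawn in the color of its starting point
    ax.plot([row.x, row.x_end], [row.y, row.y_end], marker="o", color=row.color)
```

Each row produces one line artist, so a frame with many thousands of rows will be slow to draw this way.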

Pandas Scatter Plot Filtered by Multiple Row Conditions and X, Y Column Values

Thank you for your ideas. I have been trying to make scatter plots using a loop that filters rows on two conditions (site and hour) and then takes the x and y values from two columns of the matching rows. My data looks like this:
site_name power_1 wind_speed month year day hour power_2
A 50 5.5 1 2021 2 5 60
A 75 5.9 2 2021 8 17 70
A 40 7.3 5 2021 11 20 85
B 80 8.1 4 2021 1 4 90
B 84 8.2 7 2021 18 5 92
B 46 6.1 10 2021 23 11 41
I am trying to plot each site in a separate scatter plot with x = wind speed and y = power_1 and each hour a different color. Ultimately, I need 2 scatter plots (A, B) for wind speed and power and then 3 different color points for the x, y values. I hope this makes sense.
I have tried using a 2-loop structure - 1 outer loop for the sites (A, B) and an inner loop for the colors of the x, y values.
My actual code, which runs on a much larger dataset than the one shown above, resembles the following, and I get a blank plot when I use it:
#PLOT ALL HOURS OF THE MONTHS/YEARS - WIND SPEED vs POWER
sites = (dfc1.plant_name.unique())
sites = sites.tolist()
import matplotlib.patches
from scipy.interpolate import interp1d
levels, categories = pd.factorize(dfc1.hour.unique())
colors = [plt.cm.Paired(k) for k in levels]
handles = [matplotlib.patches.Patch(color=plt.cm.Paired(k), label=c) for k, c in enumerate(categories)]
#fig, ax = plt.subplots(figsize=(10,4))
for i in range(len(sites)):
    #fig = plt.figure()
    for j in np.arange(0,24): #24 HOURS AND 1 COLOR FOR EACH UNIQUE HOUR
        x = dfc1.loc[dfc1['plant_name']==sites[i]].groupby(['hour']).wind_speed_ms
        y = dfc1.loc[dfc1['plant_name']==sites[i]].groupby(['hour']).power_kwh
        plt.scatter(x,y, edgecolors=colors[0:j],marker='o',facecolors='none')
    site = str(sites[i])
    plt.title(site + (' ') + str(dfc1.columns[5]) + (' ') + ('vs') + (' ') + str(dfc1.columns[3]) )
    plt.xlabel('Wind Speed'); plt.ylabel('Power')
    plt.legend(handles=handles, title="Month",loc='center left', bbox_to_anchor=(1,0.5),edgecolor='black')
    #plt.plot(mwsvar.iloc[-1,4], mpvar.iloc[-1,4], c='orange',linestyle=(0,()),marker="o",markersize=7)
    plt.legend()
    plt.show()
I think you're very close. Here's a solution using matplotlib, which is somewhat long and unwieldy but should do what you want. After that I show the same plot with a different library, seaborn, which makes plots like this much easier.
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
df = pd.DataFrame({
'site_name': ['A', 'A', 'A', 'B', 'B', 'B'],
'power_1': [50, 75, 40, 80, 84, 46],
'wind_speed': [5.5, 5.9, 7.3, 8.1, 8.2, 6.1],
'month': [1, 2, 5, 4, 7, 10],
'year': [2021, 2021, 2021, 2021, 2021, 2021],
'day': [2, 8, 11, 1, 18, 23],
'hour': [5, 17, 20, 4, 5, 11],
'power_2': [60, 70, 85, 90, 92, 41],
})
#Matplotlib approach
cmap = mpl.colormaps['Blues'] #colormap registry (matplotlib >= 3.5); cm.get_cmap is deprecated in recent releases
hour_colors = {h: cmap((h + 1) / 24) for h in range(24)} #different color for each hour 0-23
for site_name, site_df in df.groupby('site_name'):
    fig, ax = plt.subplots()
    #one scatter call per hour so each hour gets its own color and legend entry
    for hour, hour_df in site_df.groupby('hour'):
        x = hour_df['wind_speed']
        y = hour_df['power_1']
        color = hour_colors[hour]
        ax.scatter(x, y, color=color, label=f'Hour {hour}')
    ax.legend()
    plt.title(f'Station {site_name}')
    plt.xlabel('Wind speed')
    plt.ylabel('Power 1')
    plt.show()
    plt.close()
#Seaborn approach (different library)
import seaborn as sns
sns.relplot(
x = 'wind_speed',
y = 'power_1',
col = 'site_name',
hue = 'hour',
data = df,
)
plt.show()
plt.close()

How to calculate the difference in days in Pandas timeseries and visualize?

I have financial data from January 2019 to July 2020. I would like to choose a date (let's say March 16, 2020) as day 0, compute the number of days relative to it within a ±30 day window, and visualize the result.
The x axis should have days from -30 to +30. Lastly draw horizontal line for the value at 0 days, like the one in the attached photo:
To create a Timestamp from a string, use pandas.Timestamp.
To add or subtract several days from a Timestamp, use pandas.DateOffset.
To plot something in Python, you can use matplotlib.pyplot; in your case, the plot function.
To change the x tick labels from timestamps to -30..30, use pyplot.xticks.
To draw a vertical line, use pyplot.vlines (pyplot.hlines draws a horizontal one).
Simple example:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
x0 = pd.Timestamp('2020-03-16')
x = pd.date_range(x0 - pd.DateOffset(days=30), x0 + pd.DateOffset(days=30), freq='D')
y = np.linspace(1, 10, len(x))
plt.plot(x, y)
plt.xticks(x[[0, 15, 30, 45, 60]], labels=[-30, -15, 0, 15, 30])
plt.show()
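An alternative to relabeling the ticks is to compute the day offsets directly and plot against them, then add the horizontal line at the day-0 value that the question asks for. The sketch below assumes a hypothetical daily series named prices standing in for the financial data; the names window and rel_days are likewise made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical daily series standing in for the real financial data
dates = pd.date_range("2019-01-01", "2020-07-31", freq="D")
prices = pd.Series(np.linspace(100.0, 150.0, len(dates)), index=dates)

day0 = pd.Timestamp("2020-03-16")

# Label-based slicing on a DatetimeIndex includes both end points
window = prices.loc[day0 - pd.Timedelta(days=30): day0 + pd.Timedelta(days=30)]

# Signed number of days between each date and day 0
rel_days = (window.index - day0).days

fig, ax = plt.subplots()
ax.plot(rel_days, window.to_numpy())
ax.axhline(window.loc[day0], color="gray", linestyle="--")  # value at day 0
ax.set_xlabel("Days relative to 2020-03-16")
```

With real data that has gaps (weekends, holidays), rel_days will skip those days, and day0 must be present in the index for window.loc[day0] to work.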

How to show axis ticks corresponding to plotted datapoints in a seaborn plot?

I am using seaborn to plot some numerical values. Each of those numbers corresponds to a textual value, and I want those textual values to be displayed on the axes. For example, if the numerical values progress as 0, 5, 10, ..., 30, each of those encoded numbers must be linked to a textual description. How can I do this?
Main Point:
Use:
ax = plt.gca()
ax.set_xticks([<the x components of your datapoints>]);
ax.set_yticks([<the y components of your datapoints>]);
More elaborate version below.
You can go back to matplotlib and it will do it for you.
Let's say you want to plot [0, 7, 14 ... 35] against [0, 2, 4, ... 10]. The two arrays can be created by:
stepy=7
stepx=2
[stepy*y for y in range(6)]
(returning [0, 7, 14, 21, 28, 35])
and
[stepx*x for x in range(6)]
(returning [0, 2, 4, 6, 8, 10]).
Plot these with seaborn:
sns.scatterplot(x=[stepx*x for x in range(6)], y=[stepy*y for y in range(6)])
(x and y are passed as keywords; recent seaborn versions do not accept them positionally).
Give current axis to matplotlib by ax = plt.gca(), finish using set_xticks and set_yticks:
ax.set_xticks([stepx*x for x in range(6)]);
ax.set_yticks([stepy*y for y in range(6)]);
The whole code together:
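import matplotlib.pyplot as plt
import seaborn as sns
stepy=7
stepx=2
# x and y must be keyword arguments in recent seaborn versions
sns.scatterplot(x=[stepx*x for x in range(6)], y=[stepy*y for y in range(6)])
ax = plt.gca()
ax.set_xticks([stepx*x for x in range(6)])
ax.set_yticks([stepy*y for y in range(6)])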
Yielding the plot:
I changed the example in the OP because with those numbers to plot, the plots already behave as desired.
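To show text instead of numbers at those tick positions, set_xticklabels can follow set_xticks. The sketch below uses plain matplotlib so it has no seaborn dependency; the same two calls should work on the Axes returned by plt.gca() after a seaborn plot. The codes, names, and values are invented for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical encoded values and the text each code stands for
codes = [0, 5, 10, 15, 20, 25, 30]
names = ["none", "very low", "low", "medium", "high", "very high", "extreme"]
values = [3, 1, 4, 1, 5, 9, 2]

fig, ax = plt.subplots()
ax.scatter(codes, values)

# Fix the tick positions at the codes, then replace the numbers with text
ax.set_xticks(codes)
ax.set_xticklabels(names, rotation=45, ha="right")
fig.tight_layout()
```

The labels list must have exactly one entry per tick position, in the same order as the positions passed to set_xticks.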

Subplot multiindex data by level

This is my multiindex data.
Month Hour Hi
1 9 84.39
10 380.41
11 539.06
12 588.70
13 570.62
14 507.42
15 340.42
16 88.91
2 8 69.31
9 285.13
10 474.95
11 564.42
12 600.11
13 614.36
14 539.79
15 443.93
16 251.57
17 70.51
I want to make one subplot per Month, where the x axis is Hour and the y axis is Hi for the respective month.
The following approach works nicely:
levels = df.index.levels[0]
fig, axes = plt.subplots(len(levels), figsize=(3, 25))
for level, ax in zip(levels, axes):
    df.loc[level].plot(ax=ax, title=str(level))
plt.tight_layout()
I want to make a 1x2 grid of subplots instead of the vertical arrangement above. Later, with larger data, I want a 3x4 grid or even larger.
How can I do it?
You can do it in pandas
df.Hi.unstack(0).fillna(0).plot(kind='line',subplots=True, layout=(1,2))
Alternatively, pass the number of rows and columns to plt.subplots:
levels = df.index.levels[0]
# The first argument is the number of rows, the second the number of columns
fig, axes = plt.subplots(1, len(levels), figsize=(6, 3))
for level, ax in zip(levels, axes):
    df.loc[level].plot(ax=ax, title=str(level))
plt.tight_layout()
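For grids with more than one row, such as the 3x4 case mentioned in the question, plt.subplots returns a 2-D array of axes, so the loop has to iterate over the flattened array. A minimal sketch, assuming a shortened copy of the data from the question and a hypothetical ncols setting:

```python
import math

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Shortened copy of the MultiIndex data from the question
index = pd.MultiIndex.from_tuples(
    [(1, 9), (1, 10), (1, 11), (2, 8), (2, 9), (2, 10)],
    names=["Month", "Hour"],
)
df = pd.DataFrame({"Hi": [84.39, 380.41, 539.06, 69.31, 285.13, 474.95]}, index=index)

levels = df.index.levels[0]
ncols = 2  # hypothetical choice; use 4 for a 3x4 grid of 12 months
nrows = math.ceil(len(levels) / ncols)

# squeeze=False keeps axes 2-D even when the grid has a single row
fig, axes = plt.subplots(nrows, ncols, figsize=(4 * ncols, 3 * nrows), squeeze=False)
flat_axes = axes.ravel()

for level, ax in zip(levels, flat_axes):
    df.loc[level].plot(ax=ax, title=str(level))

# Hide any cells left over when the grid has more slots than levels
for ax in flat_axes[len(levels):]:
    ax.set_visible(False)

fig.tight_layout()
```

With 12 months and ncols = 4, nrows works out to 3 and the same loop fills the grid row by row.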