Keeping tick locations even but values uneven in matplotlib

Is there a way in matplotlib to keep tick locations evenly spaced while giving the ticks uneven values, so that the data is squeezed in some intervals and stretched in others?
For example, the following code draws a sine wave with ticks at [0.0, 0.5, 1.0, 1.5, 2.0]:
import matplotlib.pyplot as plt
import numpy as np
t = np.arange(0.0, 2.0, 0.01)
s = 1 + np.sin(2*np.pi*t)
plt.plot(t, s)
plt.xlabel('time (s)')
plt.ylabel('voltage (mV)')
plt.title('About as simple as it gets, folks')
plt.grid(True)
ax = plt.gca()
ax.get_xaxis().get_major_formatter().set_useOffset(False)
plt.autoscale(False)
ax.xaxis.set_ticks([0.0,0.5,1.0,1.5,2.0])
plt.show()
I want to change the value 0.5 to 0.25 in ax.xaxis.set_ticks([0.0,0.5,1.0,1.5,2.0]) but keep that tick at the same location on the plot.

Apparently the following is not what OP is asking for. I will leave it here until the question is edited, such that people at least understand what is not desired.
You can add set_ticklabels to label the ticks differently.
ax.xaxis.set_ticks( [0.0, 0.50, 1.0,1.5,2.0])
ax.xaxis.set_ticklabels([0.0, 0.25, 1.0,1.5,2.0])
Complete example:
import matplotlib.pyplot as plt
import numpy as np
t = np.arange(0.0, 2.0, 0.01)
s = 1 + np.sin(2*np.pi*t)
plt.plot(t, s)
plt.xlabel('time (s)')
plt.ylabel('voltage (mV)')
plt.title('About as simple as it gets, folks')
plt.grid(True)
ax = plt.gca()
ax.get_xaxis().get_major_formatter().set_useOffset(False)
plt.autoscale(False)
ax.xaxis.set_ticks([0.0,0.5,1.0,1.5,2.0])
ax.xaxis.set_ticklabels([0.0,0.25,1.0,1.5,2.0])
plt.show()

I was working on something similar.
I think what you want is the following:
ax.set_xticks((0, 0.25, 1, 1.5, 2))  # major ticks at uneven values
ax.xaxis.set_minor_locator(plt.MultipleLocator(0.25))  # minor ticks at every multiple of 0.25, so they stay evenly spaced
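If the goal is for the data itself to be squeezed and stretched so that the tick at 0.25 really sits where 0.5 used to be, one option is a piecewise-linear custom scale. The following is only a sketch, assuming Matplotlib 3.1 or newer (where set_xscale('function', ...) is available); the names tick_values, even_positions, forward and inverse are made up for this illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

# Tick values we want to show, and the evenly spaced positions they should occupy.
tick_values = np.array([0.0, 0.25, 1.0, 1.5, 2.0])
even_positions = np.linspace(0.0, 2.0, len(tick_values))  # 0, 0.5, 1, 1.5, 2

def forward(x):
    # Map data values onto the evenly spaced positions (piecewise linear).
    return np.interp(x, tick_values, even_positions)

def inverse(p):
    # Map evenly spaced positions back to data values.
    return np.interp(p, even_positions, tick_values)

t = np.arange(0.0, 2.0, 0.01)
s = 1 + np.sin(2 * np.pi * t)

fig, ax = plt.subplots()
ax.plot(t, s)
ax.set_xscale('function', functions=(forward, inverse))
ax.set_xticks(tick_values)
ax.set_xlim(0.0, 2.0)
ax.grid(True)
fig.canvas.draw()  # render once; call plt.show() for an interactive window
```

With this scale the interval 0 to 0.25 is stretched and 0.25 to 1.0 is compressed, while the labels remain the true data values. Note that np.interp clamps values outside the range of tick_values, so the sketch only makes sense when the axis limits stay within the first and last tick.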

Related

Directly annotate matplotlib stacked bar graph

I would like to create an annotation to a bar chart that compares the value of the bar to two reference values. An overlay such as shown in the picture, a kind of staff gauge, is possible, but I'm open to more elegant solutions.
The bar chart is generated with the pandas API to matplotlib (e.g. data.plot(kind="bar")), so a plus would be if the solution is playing nicely with that.
You may use smaller bars for the target and benchmark indicators. Pandas cannot annotate bars automatically, but you can simply loop over the values and use matplotlib's pyplot.annotate instead.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
a = np.random.randint(5,15, size=5)
t = (a+np.random.normal(size=len(a))*2).round(2)
b = (a+np.random.normal(size=len(a))*2).round(2)
df = pd.DataFrame({"a":a, "t":t, "b":b})
fig, ax = plt.subplots()
df["a"].plot(kind='bar', ax=ax, legend=True)
df["b"].plot(kind='bar', position=0., width=0.1, color="lightblue",legend=True, ax=ax)
df["t"].plot(kind='bar', position=1., width=0.1, color="purple", legend=True, ax=ax)
for i, rows in df.iterrows():
    plt.annotate(rows["a"], xy=(i, rows["a"]), rotation=0, color="C0")
    plt.annotate(rows["b"], xy=(i+0.1, rows["b"]), color="lightblue", rotation=+20, ha="left")
    plt.annotate(rows["t"], xy=(i-0.1, rows["t"]), color="purple", rotation=-20, ha="right")
ax.set_xlim(-1,len(df))
plt.show()
As far as I am aware, there's no direct way to annotate a bar plot. Some time ago I needed to annotate one, so I wrote this; perhaps you can adapt it to your needs.
import matplotlib.pyplot as plt
import numpy as np
ax = plt.subplot(111)
ax.set_xlim(-0.2, 3.2)
# Pass the on/off flag positionally: the old `b` keyword was removed in recent Matplotlib releases.
ax.grid(True, which='major', color='k', linestyle=':', lw=.5, zorder=1)
# x,y data
x = np.arange(4)
y = np.array([5, 12, 3, 7])
# Define upper y limit leaving space for the text above the bars.
up = max(y) * .03
ax.set_ylim(0, max(y) + 3 * up)
ax.bar(x, y, align='center', width=0.2, color='g', zorder=4)
# Add text to bars
for xi, yi, l in zip(x, y, map(str, y)):
    ax.text(xi - len(l) * .02, yi + up, l,
            bbox=dict(facecolor='w', edgecolor='w', alpha=.5))
ax.set_xticks(x)
ax.set_xticklabels(['text1', 'text2', 'text3', 'text4'])
ax.tick_params(axis='x', which='major', labelsize=12)
plt.show()
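Newer Matplotlib releases also offer a built-in helper for this. The sketch below assumes Matplotlib 3.4 or later, where Axes.bar_label exists, and relies on pandas adding one bar container per column to ax.containers; the DataFrame values are hypothetical stand-ins for the question's actual, target and benchmark numbers.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical data: actual value "a", target "t" and benchmark "b".
df = pd.DataFrame({"a": [7, 12, 9], "t": [8.5, 11.0, 9.5], "b": [6.0, 13.5, 8.0]})

fig, ax = plt.subplots()
df.plot(kind="bar", ax=ax)

# One BarContainer per plotted column; label every bar with its height.
labels = [ax.bar_label(container, fmt="%g", padding=2) for container in ax.containers]
fig.canvas.draw()  # render once; call plt.show() for an interactive window
```

This does not reproduce the staff-gauge overlay from the question, it only removes the need for a manual annotation loop.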

Pandas histogram plot with Y axis or colorbar

In Pandas, I am trying to generate a Ridgeline plot for which the density values are shown (either as Y axis or color-ramp). I am using the Joyplot but any other alternative ways are fine.
So, first I created the Ridge plot to show the different distribution plot for each condition (you can reproduce it using this code):
import numpy as np
import pandas as pd
import joypy
import matplotlib
import matplotlib.pyplot as plt
df1 = pd.DataFrame({'Category1':np.random.choice(['C1','C2','C3'],1000),'Category2':np.random.choice(['B1','B2','B3','B4','B5'],1000),
'year':np.arange(start=1900, stop=2900, step=1),
'Data':np.random.uniform(0,1,1000),"Period":np.random.choice(['AA','CC','BB','DD'],1000)})
data_pivot=df1.pivot_table('Data', ['Category1', 'Category2','year'], 'Period')
fig, axes = joypy.joyplot(data_pivot, column=['AA', 'BB', 'CC', 'DD'], by="Category1", ylim='own', figsize=(14,10), legend=True, alpha=0.4)
This generates the figure, but without the Y axis I want. Based on this post, I could add a color ramp, but the result neither makes sense nor shows the differences between the distributions of the different categories on each line:
ar=df1['Data'].plot.kde().get_lines()[0].get_ydata() ## a workaround to get the probability values to set the colorramp max and min
norm = plt.Normalize(ar.min(), ar.max())
original_cmap = plt.cm.viridis
cmap = matplotlib.colors.ListedColormap(original_cmap(norm(ar)))
sm = matplotlib.cm.ScalarMappable(cmap=original_cmap, norm=norm)
sm.set_array([])
# plotting ....
fig, axes = joypy.joyplot(data_pivot,colormap = cmap , column=['AA', 'BB', 'CC', 'DD'], by="Category1", ylim='own', figsize=(14,10), legend=True, alpha=0.4)
fig.colorbar(sm, ax=axes, label="density")
But what I want is something like either of these figures (preferably with a color ramp):

Making a Scatter Plot from a DataFrame in Pandas

I have a DataFrame and need to make a scatter-plot from it.
I need to use 2 columns as the x-axis and y-axis and only need to plot 2 rows from the entire dataset. Any suggestions?
For example, my dataframe is below (50 states x 4 columns). I need to plot 'rgdp_change' on the x-axis vs 'diff_unemp' on the y-axis, and only need to plot for the states, "Michigan" and "Wisconsin".
So from the dataframe, you'll need to select the rows from a list of the states you want: ['Michigan', 'Wisconsin']
I also figured you would probably want a legend or some way to differentiate one point from the other. To do this, we create a colormap assigning a different color to each state. This way the code is generalizable for more than those two states.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import matplotlib.colors as colors
# generate a random df with the relevant rows, columns to your actual df
df = pd.DataFrame({'State':['Alabama', 'Alaska', 'Michigan', 'Wisconsin'], 'real_gdp':[1.75*10**5, 4.81*10**4, 2.59*10**5, 1.04*10**5],
'rgdp_change': [-0.4, 0.5, 0.4, -0.5], 'diff_unemp': [-1.3, 0.4, 0.5, -11]})
fig, ax = plt.subplots()
states = ['Michigan', 'Wisconsin']
colormap = cm.viridis
colorlist = [colors.rgb2hex(colormap(i)) for i in np.linspace(0, 0.9, len(states))]
# filter once, then draw one point per selected state
selected = df.loc[df["State"].isin(states)]
for i, c in enumerate(colorlist):
    x = selected.rgdp_change.values[i]
    y = selected.diff_unemp.values[i]
    legend_label = selected["State"].values[i]  # label taken from the filtered rows, so it always matches the point
    ax.scatter(x, y, label=legend_label, s=50, linewidth=0.1, color=c)
ax.legend()
plt.show()
Use the DataFrame plot method, but first filter the states you need with the index's isin method (this assumes the state names are the DataFrame's index):
states = ["Michigan", "Wisconsin"]
df[df.index.isin(states)].plot(kind='scatter', x='rgdp_change', y='diff_unemp')
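If the state names live in an ordinary column rather than in the index, the same idea works with a column filter. This is a small sketch on made-up numbers; the column name "State" and the variable subset are assumptions for illustration, not taken from the asker's real data.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up rows with the same column names as in the question.
df = pd.DataFrame({
    "State": ["Alabama", "Alaska", "Michigan", "Wisconsin"],
    "rgdp_change": [-0.4, 0.5, 0.4, -0.5],
    "diff_unemp": [-1.3, 0.4, 0.5, -11],
})

states = ["Michigan", "Wisconsin"]
subset = df[df["State"].isin(states)]  # keep only the requested rows

ax = subset.plot(kind="scatter", x="rgdp_change", y="diff_unemp")

# Write the state name next to each point so the two can be told apart.
for _, row in subset.iterrows():
    ax.annotate(row["State"], (row["rgdp_change"], row["diff_unemp"]),
                xytext=(5, 5), textcoords="offset points")

ax.figure.canvas.draw()  # render once; call plt.show() for an interactive window
```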

adding regression line in python using matplotlib

I have a question about drawing a regression line and determining the slope of that line. I am researching the water heights of inland lakes in Tibet with the help of satellite data. This script contains one year of data for one lake.
I want to determine the annual rise of the lake, both for the reference height and for the total beams. Could someone help me?
This is the link towards the excel file: https://drive.google.com/file/d/12wD2ByQC6ObNCWq_yIhkXiNsV3KfDpit/view?usp=sharing
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
# Graph in chronological order
heights = pd.read_excel ('Qinghai_dates_heights.xlsx')
dates = (heights.loc[:,'Date'])
strong_beams = (heights.loc[:,'Strong total'])
weak_beams = (heights.loc[:,'Weak total'])
total_beams = (heights.loc[:,'Total'])
# setting the reference data from Hydrolabs
reference_dates = (heights.loc[:,'Date.1'])
reference_heights = (heights.loc[:,'Hydrolabs'])
# Set the locator
locator = mdates.MonthLocator() # every month
# Specify the format - %b gives us Jan, Feb...
fmt = mdates.DateFormatter('%b')
#plt.plot(dates,strong_beams, label='Strong Beams', marker="o")
#plt.plot(dates,weak_beams, label='Weak Beams', marker="o")
plt.plot(dates, total_beams, label='Total Beams', marker="o")
plt.plot(reference_dates, reference_heights, label='Reference height (Hydrolabs)', marker="o")
X = plt.gca().xaxis
X.set_major_locator(locator)
# Specify formatter
X.set_major_formatter(fmt)
plt.xlabel('Date [months]')
plt.ylabel('elevation [m]')
plt.title("Water-Height Qinghai from November 2018 - November 2019 ")
plt.legend()
plt.show()
Does this help? I usually use sklearn for this.
import numpy as np
from matplotlib import pyplot as plt
from sklearn import linear_model
Generate a set of data:
X = np.linspace(0, 10)
line_X = X[:, np.newaxis]  # sklearn expects a 2-D feature array
Y = X + 0.2*np.random.normal(size=50)
Choose your regression model (there are plenty more, depending on your needs):
lr = linear_model.LinearRegression()
Here you actually do the fit:
lr.fit(line_X, Y)
Here you extract the parameters, since you seem to need them:
slope = lr.coef_[0]
intercept = lr.intercept_
And then you plot:
plt.plot(X, slope*X + intercept, ls='-', marker=' ')
plt.plot(X, Y)
plt.show()
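For dated measurements like the lake heights in the question, the dates first have to be turned into numbers before a line can be fitted. A minimal sketch with NumPy only, using synthetic heights because the Excel file is not reproduced here; the variable names and the numbers (base level, trend, noise) are invented for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# Synthetic stand-in for the 'Date' and 'Total' columns: one value per month.
dates = pd.date_range("2018-11-01", periods=13, freq="MS")
days = (dates - dates[0]).days.to_numpy(dtype=float)  # days since the first measurement
heights = 3195.0 + 0.0005 * days + rng.normal(scale=0.005, size=days.size)

# Least-squares straight line: heights ~ slope_per_day * days + intercept.
slope_per_day, intercept = np.polyfit(days, heights, 1)
annual_rise = slope_per_day * 365.25  # metres per year
fitted = slope_per_day * days + intercept

fig, ax = plt.subplots()
ax.plot(dates, heights, "o", label="Total beams (synthetic)")
ax.plot(dates, fitted, "-", label=f"fit: {annual_rise:.3f} m/year")
ax.set_ylabel("elevation [m]")
ax.legend()
fig.canvas.draw()  # render once; call plt.show() for an interactive window
```

The same steps could be repeated for the reference heights to compare the two annual rises; with only one year of data, the fitted slope also picks up any seasonal signal, so it should be read with care.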

How can I use matplotlib ticklabel_format to not use scientific notation on y axis labels

I'm creating a plot (in a colab worksheet) and want the y tick labels to not use scientific notation. The ticklabel_format doesn't make any difference to the final graph. The y axis labels are still shown as 10^3 instead of 1000. How do I format the y tick labels to not use scientific notation?
Here is my code
import matplotlib.pyplot as plt
plt.ticklabel_format(style='plain', axis='y')
plt.plot(Cd_rank,Cd_raw,linewidth=4)
plt.plot(Cd_rank,Cd_sed,linewidth=4)
plt.plot(Cd_rank,Cd_filter,linewidth=4)
plt.plot([0,1],[0.3,0.3],linewidth=4)
plt.plot([0,1],[5,5],linewidth=4)
plt.ylabel('Turbidez (UTN)')
plt.xlabel('Datos ordenados')
plt.yscale('log')
plt.legend(['Agua cruda','Decantada','Filtrada','Norma EPA','Norma ENACAL'])
The ScalarFormatter shows the tick labels in a default format. Note that depending on your concrete situation, matplotlib still might be using scientific notation:
When the numbers are too large or too small. The thresholds are powers of ten taken from the axes.formatter.limits rcParam (-5 and 6 by default); set_powerlimits((n, m)) can be used to change them.
When the numbers are very close together, matplotlib describes the range using an offset. That offset is placed at the top of the axis. It can be suppressed with the useOffset=False parameter of the formatter.
In some cases with a logarithmic scale, there are very few major ticks. Then some (but not all) minor ticks also get a label, and the formatter can be changed for these as well. One problem is that a plain ScalarFormatter may produce too many labels. Either suppress all minor labels with a NullFormatter, or write a custom formatter that returns empty strings for the minor tick labels that should be hidden.
A simple example:
from matplotlib import pyplot as plt
from matplotlib import ticker
import numpy as np
N = 50
Cd_rank = np.linspace(0, 100, N)
Cd_raw = np.random.normal(1, 20, N).cumsum() + 100
plt.plot(Cd_rank, Cd_raw, linewidth=4)
plt.plot([0, 1], [0.3, 0.3], linewidth=4)
plt.plot([0, 1], [5, 5], linewidth=4)
plt.yscale('log')
plt.gca().yaxis.set_major_formatter(ticker.ScalarFormatter())
plt.gca().yaxis.set_minor_formatter(ticker.NullFormatter())
plt.show()
And here is a more complicated example, with both minor (green) and major (red) ticks.
from matplotlib import pyplot as plt
from matplotlib import ticker
import numpy as np
N = 50
Cd_rank = np.linspace(0, 100, N)
Cd_raw = np.random.normal(10, 5, N).cumsum() + 80
plt.plot(Cd_rank, Cd_raw, linewidth=4)
plt.yscale('log')
mticker = ticker.ScalarFormatter(useOffset=False)
mticker.set_powerlimits((-6, 6))
ax = plt.gca()
ax.yaxis.set_major_formatter(mticker)
ax.yaxis.set_minor_formatter(mticker)
ax.tick_params(axis='y', which='major', colors='crimson')
ax.tick_params(axis='y', which='minor', colors='seagreen')
plt.show()
PS: When the ticks involve both powers of 10 larger than 1 and smaller than 1 (so, e.g. 100, 10, 1, 0.1, 0.01) the ScalarFormatter doesn't display the numbers smaller than 1 well (it displays 0.1 and 0.01 as 0). In that case, the StrMethodFormatter can be used instead:
plt.gca().yaxis.set_major_formatter(ticker.StrMethodFormatter("{x}"))
Here is code that turns off scientific notation and handles numbers smaller than 1 correctly. Thanks to @Johanc for this code.
from matplotlib import pyplot as plt
from matplotlib import ticker
import numpy as np
N = 50
x = np.linspace(0,1,N)
y = np.logspace(-3, 2, N)
plt.plot(x, y, linewidth=4)
plt.yscale('log')
plt.ylim(bottom=0.001,top=100)
# StrMethodFormatter prints each tick value as-is, so values below 1 are shown correctly
plt.gca().yaxis.set_major_formatter(ticker.StrMethodFormatter("{x}"))
plt.show()
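A possible refinement, offered as a sketch rather than as part of the answer above: the "{x}" template prints the raw float, so a tick at 100 is labelled 100.0. Using the general format spec "{x:g}" drops the trailing ".0". The variable name plain is arbitrary.

```python
from matplotlib import pyplot as plt
from matplotlib import ticker
import numpy as np

x = np.linspace(0, 1, 50)
y = np.logspace(-3, 2, 50)

fig, ax = plt.subplots()
ax.plot(x, y, linewidth=4)
ax.set_yscale('log')
ax.set_ylim(bottom=0.001, top=100)

# "{x:g}" formats 100.0 as "100" and 0.001 as "0.001".
plain = ticker.StrMethodFormatter("{x:g}")
ax.yaxis.set_major_formatter(plain)
ax.yaxis.set_minor_formatter(ticker.NullFormatter())  # hide minor tick labels
fig.canvas.draw()  # render once; call plt.show() for an interactive window
```

Keep in mind that the "g" format itself switches to exponent notation for values of 1e6 and above or below 1e-4, so for wider ranges a fixed-point spec or a FuncFormatter would be needed.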