How to plot a spectrum with plotly - plotly-python

I want to plot a spectrum that is given by an array of masses and intensities. For each pair I want to plot a thin line. When I zoom in, the width of the lines should not change. The bar plot does almost what I need.
import plotly.graph_objects as go
fig = go.Figure(data=[go.Bar(
    x=df['mz_array'],
    y=df['intensity'],
    width=1,
)])
fig.show()
However, when I zoom in, the bars change their widths (Plotly measures the bar width in x-axis units, not in pixels).

I modified the dataframe and then used px.line to plot the spectrum:
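import pandas as pd
import plotly.express as px

def plot_spectrum(df, annot_threshold=1e4, threshold=0):
    # Drop weak peaks; reset_index() keeps the old index in an 'index' column that identifies each peak
    df = df[df.intensity > threshold].reset_index()
    # Top end of each stick, labelled with its m/z only if the peak is intense enough
    df1 = df.copy()
    df1['Text'] = df1.mz_array.astype(str)
    df1.loc[df1.intensity < annot_threshold, 'Text'] = None
    # Bottom end of each stick: same m/z, zero intensity
    df2 = df.copy()
    df2['intensity'] = 0
    # Pair both ends of every peak so that each peak is drawn as its own vertical line
    df3 = pd.concat([df1, df2]).sort_values(['index', 'intensity'])
    fig = px.line(df3, x='mz_array', y='intensity', color='index', text='Text')
    fig.update_layout(showlegend=False)
    # Line width is given in pixels, so it stays the same when zooming
    fig.update_traces(line=dict(width=1, color='grey'), textposition='top center')
    return fig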

Related

Suppress stacked bar chart label if less than n

I have data with a lot of values. When plotting percentages, many segments show up as 0%, and those labels are still displayed in the plot. I do not want to show labels for values that are 0% or below some threshold n%.
This is the code that I use to produce the output:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
data = np.random.rand(5,10)
data = 10 + data*10
df = pd.DataFrame(data, columns=list('ABCDEFGHIJ'))
ax = df.plot(kind='bar', stacked=True)
for c in ax.containers:
    ax.bar_label(c, fmt='%.0f%%', label_type='center')
ax.legend(bbox_to_anchor=(1.0, 1.0), loc='upper left')
I know that I can hide the small labels like this:
ax = df.plot(kind='bar', stacked=True)
for c in ax.containers:
    labels = [v if v > 12 else "" for v in c.datavalues]
    ax.bar_label(c, labels=labels, label_type="center")
ax.legend(bbox_to_anchor=(1.0, 1.0), loc='upper left')
This way I can suppress values of 12 or less, but how can I also limit the number of decimals shown in each label, the way fmt='%.0f%%' does?
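Since labels= uses the given strings as-is, the formatting can be done while building them: format each value with an f-string-style pattern (here '{:.0f}%', matching fmt='%.0f%%') and use an empty string for values at or below the cut-off. A minimal sketch; the helper name label_stacked_bars and the small sample frame are made up for illustration.

```python
import pandas as pd
import matplotlib.pyplot as plt


def label_stacked_bars(ax, threshold=12, fmt='{:.0f}%'):
    """Put a centred label on every stacked segment whose value exceeds `threshold`."""
    all_labels = []
    for c in ax.containers:
        # c.datavalues holds each segment's own height, not the stacked total
        labels = [fmt.format(v) if v > threshold else '' for v in c.datavalues]
        ax.bar_label(c, labels=labels, label_type='center')
        all_labels.append(labels)
    return all_labels


# Small illustrative frame: two bars, each stacked from columns A and B
df = pd.DataFrame({'A': [5.2, 20.7], 'B': [15.4, 8.9]})
ax = df.plot(kind='bar', stacked=True)
labels = label_stacked_bars(ax)
ax.legend(bbox_to_anchor=(1.0, 1.0), loc='upper left')
```

Here the 5.2 and 8.9 segments stay unlabelled, while 20.7 and 15.4 are shown as 21% and 15%.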

How do I invert matplotlib bars at a specific point instead of when negative?

I'd like to invert the bars in this diagram when they are below 1, not when they are negative. Additionally, I'd like the ticks/steps on the y-axis to be evenly spaced.
Here is my current code:
import matplotlib.pyplot as plt
import numpy as np
labels = ['A','B','C']
Vals1 = [28.3232, 12.232, 9.6132]
Vals2 = [0.00456, 17.868, 13.453]
Vals3 = [0.0032, 1.234, 0.08214]
x = np.arange(len(labels))
width = 0.2
fig, ax = plt.subplots()
rects1 = ax.bar(x - width, Vals1, width, label='V1')
rects2 = ax.bar(x, Vals2, width, label='V2')
rects3 = ax.bar(x + width, Vals3, width, label='V3')
ax.set_xticks(x)
ax.set_xticklabels(labels)
plt.xticks(rotation=90)
ax.legend()
yScale = [0.0019531,0.0039063,0.0078125,0.015625,0.03125,0.0625,0.125,0.25,0.5,1,2,4,8,16,32]
ax.set_yticks(yScale)
plt.show()
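I believe I've stumbled upon the answer; here it is for anyone else looking for the solution. Add the argument bottom=1 (a number, not the string '1') to the ax.bar call, and then shift the values in the array so that each bar's height is measured from 1 instead of from 0 (repeat this for the other series):
for i in range(len(Vals1)):
    Vals1[i] = Vals1[i] - 1  # same as (1 - Vals1[i]) * -1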
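Applied to all three series at once, the same idea looks like this in plain matplotlib: subtract the baseline from every value and pass the baseline as bottom, so values above 1 grow upwards from 1 and values below 1 hang down from it. A sketch; the series dict and the baseline name are only for illustration, and the axhline marking the pivot is an optional extra.

```python
import numpy as np
import matplotlib.pyplot as plt

labels = ['A', 'B', 'C']
series = {
    'V1': [28.3232, 12.232, 9.6132],
    'V2': [0.00456, 17.868, 13.453],
    'V3': [0.0032, 1.234, 0.08214],
}
baseline = 1  # bars point up for values above 1 and down for values below 1

x = np.arange(len(labels))
width = 0.2
fig, ax = plt.subplots()
for offset, (name, vals) in zip((-width, 0, width), series.items()):
    heights = np.asarray(vals) - baseline  # height measured from the baseline
    ax.bar(x + offset, heights, width, bottom=baseline, label=name)
ax.axhline(baseline, color='black', linewidth=0.8)  # draw the pivot line
ax.set_xticks(x)
ax.set_xticklabels(labels)
ax.legend()
```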
As you mentioned, the key is the bottom param of Axes.bar:
bottom (default: 0): The y coordinate(s) of the bars bases.
But beyond that, you can simplify your plotting code using pandas:
Put your data into a DataFrame:
import pandas as pd
df = pd.DataFrame({'V1': Vals1, 'V2': Vals2, 'V3': Vals3}, index=labels)
# V1 V2 V3
# A 28.3232 0.00456 0.00320
# B 12.2320 17.86800 1.23400
# C 9.6132 13.45300 0.08214
Then use DataFrame.sub to subtract the offset and DataFrame.plot.bar with the bottom param:
bottom = 1
ax = df.sub(bottom).plot.bar(bottom=bottom)
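For the second part of the question (evenly spaced ticks), the values in yScale are powers of two, so they become equally spaced on a base-2 logarithmic y-axis. A sketch combining that with the bottom trick, assuming matplotlib 3.3 or later (the keyword is base=; older releases used basey=). Every value is positive, so bars reaching down from 1 to, say, 0.0032 can still be drawn on a log scale.

```python
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter

labels = ['A', 'B', 'C']
df = pd.DataFrame({'V1': [28.3232, 12.232, 9.6132],
                   'V2': [0.00456, 17.868, 13.453],
                   'V3': [0.0032, 1.234, 0.08214]}, index=labels)

bottom = 1
ax = df.sub(bottom).plot.bar(bottom=bottom)
# Powers of two are equally spaced on a base-2 log axis
ax.set_yscale('log', base=2)
ax.set_yticks([2.0 ** k for k in range(-9, 6)])  # 2**-9 (about 0.00195) up to 2**5 = 32, like yScale
# Show plain numbers instead of the default 2^k tick labels
ax.yaxis.set_major_formatter(FuncFormatter(lambda y, _: f'{y:g}'))
```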

How to increase the space between bars and the bar width in matplotlib

I am web-scraping a table directly from the Wikipedia website and plotting it. I want to increase the bar width, add space between the bars and make all bars visible. How can I do this? My code is below.
import requests
import pandas as pd
import matplotlib.pyplot as plt
from bs4 import BeautifulSoup

#########scraping#########
html = requests.get("https://en.wikipedia.org/wiki/COVID-19_pandemic_in_Nigeria")
bsObj = BeautifulSoup(html.content, 'html.parser')
states = []
cases = []
for items in bsObj.find("table", {"class": "wikitable sortable"}).find_all('tr')[1:37]:
    data = items.find_all(['th', {"align": "left"}, 'td'])
    states.append(data[0].a.text)
    cases.append(data[1].b.text)
########Dataframe#########
table = ["STATES", "CASES"]
tab = pd.DataFrame(list(zip(states, cases)), columns=table)
tab["CASES"] = tab["CASES"].replace('\n', '', regex=True)
tab["CASES"] = tab["CASES"].replace(',', '', regex=True)
tab['CASES'] = pd.to_numeric(tab['CASES'], errors='coerce')
tab["CASES"] = tab["CASES"].fillna(0)
tab["CASES"] = tab["CASES"].values.astype(int)
#######matplotlib########
x = tab["STATES"]
y = tab["CASES"]
plt.cla()
plt.locator_params(axis='y', nbins=len(y) // 4)
plt.bar(x, y, color="blue")
plt.xticks(fontsize=8, rotation='vertical')
plt.yticks(fontsize=8)
plt.show()
Use pandas.read_html and barh
.read_html reads every <table> tag on a web page and returns a list of DataFrames (it needs an HTML parser such as lxml to be installed).
barh draws horizontal instead of vertical bars, which is useful when there are many bars.
Make the figure taller if needed: in figsize (16.0, 10.0), increase the 10.
I'd recommend a log scale for x, because Lagos has far more cases than Kogi.
This doesn't put more space between the bars, but the formatted plot is more legible with its increased dimensions and horizontal bars.
.iloc[:36, :5] keeps only the first 36 rows and 5 columns, dropping the unneeded ones.
import pandas as pd
import matplotlib.pyplot as plt
# url
url = 'https://en.wikipedia.org/wiki/COVID-19_pandemic_in_Nigeria'
# create dataframe list
dataframe_list = pd.read_html(url) # this is a list of all the tables at the url as dataframes
# get the dataframe from the list
df = dataframe_list[2].iloc[:36, :5].copy()  # the table you want is at index 2 (at the time of writing; the page may change)
# replace the en dash '–' used for missing values with 0
df.replace('–', 0, inplace=True)
# set to int
for col in df.columns[1:]:
    df[col] = df[col].astype('int')
# plot a horizontal bar
plt.rcParams['figure.figsize'] = (16.0, 10.0)
plt.style.use('ggplot')
p = plt.barh(width='Cases', y='State', data=df, color='purple')
plt.xscale('log')
plt.xlabel('Number of Cases')
plt.show()
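On the original question of bar width and spacing: in barh the thickness of each bar is set by height (for vertical bar it is width), as a fraction of the one-unit slot each category gets, with a default of 0.8. A smaller value leaves more space between neighbouring bars, and sizing the figure in proportion to the number of categories keeps every label readable. The data below is made-up placeholder data, and 0.35 inch per bar is an arbitrary choice.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical placeholder data: 36 categories with widely varying counts
states = [f'State {i}' for i in range(1, 37)]
cases = np.geomspace(10, 50_000, num=len(states))

# Give each category a fixed amount of vertical space (0.35 inch per bar)
fig, ax = plt.subplots(figsize=(10, 0.35 * len(states)))
# height is the bar thickness as a fraction of the slot: 0.6 leaves a 0.4 gap
bars = ax.barh(states, cases, height=0.6, color='purple')
ax.set_xscale('log')
ax.margins(y=0.01)  # trim the empty space above the top bar and below the bottom one
ax.set_xlabel('Number of Cases')
fig.tight_layout()
```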
Plot all the data in df
df.set_index('State', inplace=True)
# DataFrame.plot opens its own figure, so pass figsize to it directly
df.plot.barh(figsize=(14, 14))
plt.xscale('log')
plt.show()
4 subplots
State as index
plt.figure(figsize=(14, 14))
for i, col in enumerate(df.columns, 1):
    plt.subplot(2, 2, i)
    df[col].plot.barh(label=col, color='green')
    plt.xscale('log')
    plt.legend()
plt.tight_layout()
plt.show()
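The subplot loop can also be written with explicit Axes objects from plt.subplots, handing each one to pandas through ax=, so nothing depends on which Axes is currently active. A sketch with a small placeholder frame standing in for the scraped table; the column names and numbers are hypothetical.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Placeholder for the scraped table: State as index, four numeric columns
df = pd.DataFrame(
    {'col 1': [120, 3400, 45], 'col 2': [10, 250, 4],
     'col 3': [100, 3000, 40], 'col 4': [10, 150, 1]},
    index=pd.Index(['State X', 'State Y', 'State Z'], name='State'),
)

fig, axes = plt.subplots(2, 2, figsize=(14, 14))
for ax, col in zip(axes.flat, df.columns):
    df[col].plot.barh(ax=ax, color='green', label=col)  # draw into this specific Axes
    ax.set_xscale('log')
    ax.legend()
fig.tight_layout()
```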

How to change the order of these plots using zorder?

I'm trying to get a line plot to be over the bar plot. But no matter what I do to change the zorder, it seems like it keeps the bar on top of the line. Nothing I do to try to change zorder seems to work. Sometimes the bar plot just doesn't show up if zorder is <= 0.
import pandas as pd
import matplotlib.pyplot as plt
def tail_plot(tail):
    plt.figure()
    # line plot
    ax1 = incidence[incidence['actual_inc'] != 0].tail(tail).plot(x='date', y=['R_t', 'upper 95% CI', 'lower 95% CI'], color=['b', '#808080', '#808080'])
    ax1.set_zorder(2)
    ax2 = ax1.twinx()
    inc = incidence[incidence['actual_inc'] != 0]['actual_inc'].tail(tail).values
    dates = incidence[incidence['actual_inc'] != 0]['date'].tail(tail).values
    # bar plot
    ax2.bar(dates, inc, color='red', zorder=1)
    ax2.set_zorder(1)
It keeps drawing the bars on top of the line.
The problem with the approach in the post is that ax1 has a white background which totally occludes the plot of ax2. To solve this, the background color can be set to 'none'.
Note that the plt.figure() in the example code of the post creates an empty plot because the pandas plot creates its own new figure (as no ax is given explicitly).
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({f'curve {i}': 20 + np.random.normal(.1, .5, 30).cumsum() for i in range(1, 6)})
# line plot
ax1 = df.plot()
ax1.set_zorder(2)
ax1.set_facecolor('none')
ax2 = ax1.twinx()
# bar plot
x = np.arange(30)
ax2.bar(x, np.random.randint(7 + x, 2 * x + 10), color='red', zorder=1)
ax2.set_zorder(1)
plt.show()
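Another way around it is to swap which Axes holds what: draw the bars on the primary Axes and the lines on the twin created by twinx(). The twin is added to the figure later and twinx() already hides its background patch, so the lines end up on top without any zorder or facecolor changes. The trade-off is that the lines' y-axis then sits on the right-hand side. A sketch with random data, as in the answer above.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = np.arange(30)
df = pd.DataFrame({f'curve {i}': 20 + rng.normal(.1, .5, 30).cumsum() for i in range(1, 6)})

fig, ax_bars = plt.subplots()
# Bars on the primary Axes, which is drawn first
ax_bars.bar(x, rng.integers(7 + x, 2 * x + 10), color='red')
# Lines on the twin: drawn after the primary Axes, with a transparent background
ax_lines = ax_bars.twinx()
df.plot(ax=ax_lines)
```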

Secondary y axis limit in pandas plot

Is there a way to set the limit for the secondary y-axis in pandas df.plot?
I have the following plotting statement. Is there a way to simply set the ylim for the secondary axis, something like secondary_ylim=(0,1)?
df[["Date","Col1","Col2"]].plot(x="Date",y=["Col1","Col2"],secondary_y="Col2",ylim = (0,1))
I don't know if there is a more direct way to get the axes for the secondary y-axis, but you could do it this way:
import numpy as np
import pandas as pd
df = pd.DataFrame({'Date':pd.date_range('2019-02-01', periods=10), 'Col1':np.random.randint(0,10,10), 'Col2':np.random.randint(100,500, 10)})
ax = df[["Date","Col1","Col2"]].plot(x="Date",y=["Col1","Col2"],secondary_y="Col2", ylim = ([0,5]))
ax.set_ylim(0,5)
fig = ax.get_figure()
axes = fig.get_axes()  # [primary axes, secondary axes]
axes[1].set_ylim(0,250)
Or, as @Stef points out, you can use the right_ax attribute of the returned axes:
ax = df[["Date","Col1","Col2"]].plot(x="Date",y=["Col1","Col2"],secondary_y="Col2", ylim = ([0,5]))
ax.right_ax.set_ylim(0,250)