How to plot (correctly) lineplot from pandas dataframe? - pandas

I'm plotting a line plot from a pandas DataFrame. However, the tick labels are bunched together on the right side of the x-axis instead of sitting under their corresponding markers on the line. What is missing?
Here is the full code and the picture:
#importing pandas package
import pandas as pd
import matplotlib.pyplot as plt
import csv
import seaborn as sns
# making data frame from csv file
dataset = pd.read_csv('curve.csv.csv')
df = pd.DataFrame(dataset.sort_values('Split')[['Split', 'Score']])
df.reset_index(drop=True, inplace=True)
print(df)
ax = df.plot.line(x='Split',y='Score',color='green',marker=".")
ax.set_xlim((0, 1))
ax.grid(True)
# set the tick marks for x axis
ax.set_xticks(df.Score)
ax.set_xticklabels(['.005','.010','.015','.020','.040','.060','.080','1','15','20','25','30','35','40','45','50','55','60'
,'65','70','75','80','85','90','95'])
ax.grid(True, linestyle='-.')
ax.tick_params(labelcolor='r', labelsize='medium', width=3)
plt.show()
My desired output would be to have all the labels on the x-axis aligned with their corresponding markers on the line.

You seem to be using the y-values (df.Score) as the positions of your x-ticks.
I assume you meant
ax.set_xticks(df['Split'])
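A minimal, self-contained sketch of that fix. The column names Split and Score come from the question, but the values below are invented to stand in for the asker's CSV file, and the labels are generated from the tick positions rather than typed by hand (an assumption about what the asker wants to display).

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data standing in for the asker's CSV file.
df = pd.DataFrame({
    "Split": [0.05, 0.20, 0.40, 0.60, 0.80, 0.95],
    "Score": [0.61, 0.70, 0.78, 0.83, 0.86, 0.88],
})

ax = df.plot.line(x="Split", y="Score", color="green", marker=".")
ax.set_xlim((0, 1))

# Tick positions come from the x column, one tick per data point.
ax.set_xticks(df["Split"])
# Labels are derived from the same values, so positions and labels always match.
ax.set_xticklabels([f"{v:g}" for v in df["Split"]])
ax.grid(True, linestyle="-.")

tick_positions = list(ax.get_xticks())
```

In an interactive session, drop the matplotlib.use("Agg") line and call plt.show() at the end. Generating the labels from df["Split"] also avoids a mismatch between the number of ticks and the number of hand-written labels.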

Related

Is there a way to draw shapes on a python pandas plot

I am creating shot plots for NHL games and I have succeeded in making the plot, but I would like to draw the lines that you see on a hockey rink on it. I basically just want to draw two circles and two lines on the plot like this.
Let me know whether this is possible and, if so, how I could do it.
A pandas plot is in fact a matplotlib plot: you can assign the returned Axes to a variable and modify it as needed (add horizontal and vertical lines, shapes, text, etc.).
# plot your data, but instead of only displaying it, assign the returned Axes to a variable
ax = df.plot()
ax.vlines(x, ymin, ymax, colors='k', linestyles='solid') # adjust to your needs
plt.show()
Working code sample:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn
from matplotlib.patches import Circle
from matplotlib.collections import PatchCollection
df = seaborn.load_dataset('tips')
ax = df.plot.scatter(x='total_bill', y='tip')
ax.vlines(x=40, ymin=0, ymax=20, colors='red')
patches = [Circle((50,10), radius=3)]
collection = PatchCollection(patches, alpha=0.4)
ax.add_collection(collection)
plt.show()
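Closer to the rink layout in the question, here is a rough sketch with two vertical lines and two unfilled circles. The shot coordinates and the positions and radii of the lines and circles are all made up for illustration; they are not real rink dimensions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.patches import Circle

# Hypothetical shot coordinates; real data would come from the asker's DataFrame.
shots = pd.DataFrame({"x": [-60, -30, 10, 45, 70], "y": [5, -20, 15, -10, 0]})

ax = shots.plot.scatter(x="x", y="y")

# Two vertical lines spanning the full height of the axes (positions are made up).
for x_pos in (-25, 25):
    ax.axvline(x=x_pos, color="blue", linewidth=2)

# Two unfilled circles; add_patch attaches each one directly to the axes.
for center in ((-69, 0), (69, 0)):
    ax.add_patch(Circle(center, radius=15, fill=False, edgecolor="red"))

ax.set_aspect("equal")  # keep the circles round instead of stretched
```

For a handful of shapes, add_patch is simpler than building a PatchCollection; the collection approach above pays off when many patches share the same styling.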

How to prevent seaborn from skipping years in xtick labels in a time series plot

I have included a screenshot of the plot. Is there a way to prevent seaborn from skipping the xtick labels in time series data?
Most seaborn functions return a matplotlib object, so you can control the number of major ticks displayed via matplotlib. By default, matplotlib chooses tick locations automatically, which is why some year labels are hidden. You can try setting a MaxNLocator.
Consider the following example:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# load data
df = sns.load_dataset('flights')
df.drop_duplicates('year', inplace=True)
df.year = df.year.astype('str')
# plot
fig, ax = plt.subplots(figsize=(5, 2))
sns.lineplot(x='year', y='passengers', data=df, ax=ax)
ax.xaxis.set_major_locator(plt.MaxNLocator(5))
This gives you a plot with a small number of year labels (image not reproduced here), while
ax.xaxis.set_major_locator(plt.MaxNLocator(10))
gives you a plot with more of them.
I agree with @steven's answer; I just want to add that tick methods such as plt.xticks or ax.xaxis.set_ticks seem more natural to me. Full details can be found here.
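A small sketch of the explicit-ticks approach, assuming numeric years on the x-axis. The data frame is an illustrative stand-in for the asker's time series, and plain matplotlib is used for the line so the example has no seaborn dependency; the same set_xticks call should work on the Axes returned by sns.lineplot.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative yearly series standing in for the asker's data.
df = pd.DataFrame({
    "year": list(range(1949, 1961)),
    "passengers": [112, 115, 145, 171, 196, 204, 242, 284, 315, 340, 360, 417],
})

fig, ax = plt.subplots(figsize=(8, 3))
ax.plot(df["year"], df["passengers"])

# One tick per year, so no label is skipped.
ax.set_xticks(df["year"])
ax.tick_params(axis="x", labelrotation=45)  # rotate to avoid overlapping labels
```

Setting the ticks explicitly trades automatic spacing for full control, so rotating the labels or widening the figure is usually needed once there are many years.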

Empty Plots - matplotlib only shows frame of plot but no data

From my imported Excel file, I want to plot specific entries (i.e. certain rows and columns), but the plt.plot command does not display the data; only a blank frame is shown. Please see the attached picture.
Maybe it has something to do with my code.
%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt
hpge = pd.read_excel('mypath\\filename.xlsx','Sheet3', skiprows=1,usecols='C:G,I,J')
x=[]
y=[]
x.append(hpge.E_KeV[2700:2900])# E_KeV is a column
y.append(hpge.Fcounts[2700:2900])# Fcounts is a column
x1=[]
y1=[]
x1.append(hpge.E[2700:2900])
y1.append(hpge.C[2700:2900])
#print(y1)
#print(x)
#plt.xlim(590,710)
#plt.yscale('log')
plt.plot(x, y, label='Cs')
plt.plot(x1,y1)
plt.show()
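No answer is recorded for this question. One likely cause, offered as an assumption since the spreadsheet is not available: x.append(...) places the whole column slice inside a one-element list, so plt.plot receives a 1 x N array and draws N separate lines of a single point each, which are invisible without markers. Passing the slices directly avoids this. A sketch with made-up data (the column names E_KeV and Fcounts are from the question; the values and the slice bounds are invented):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data in place of the asker's Excel sheet.
hpge = pd.DataFrame({
    "E_KeV": np.linspace(600.0, 700.0, 200),
    "Fcounts": np.random.default_rng(0).poisson(50, 200),
})

# What the question does: wrap each slice in a list before plotting.
x, y = [], []
x.append(hpge.E_KeV[50:150])
y.append(hpge.Fcounts[50:150])
fig_bad, ax_bad = plt.subplots()
lines_bad = ax_bad.plot(x, y)  # many one-point lines, so nothing is visible

# Passing the slices directly draws one connected line.
fig_ok, ax_ok = plt.subplots()
lines_ok = ax_ok.plot(hpge.E_KeV[50:150], hpge.Fcounts[50:150], label="Cs")
ax_ok.legend()
```

Comparing len(lines_bad) with len(lines_ok) makes the difference visible: the first call creates one Line2D per data point, the second creates a single line.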

Stacking multiple plots on a 2Y axis

I am trying to plot multiple plots in a 2Y plot.
I have the following code:
Has a list of files to get some data;
Gets the x and y components of data to plot in y-axis 1 and y-axis 2;
Plots data.
When the loop iterates, it plots on different figures. I would like to get all the plots in the same figure.
Can anyone give me some help on this?
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
file=[list of paths]
for i in files:
    # Loads Data from an excel file
    data = pd.read_excel(files[i],sheet_name='Results',dtype=float)
    # Gets x and y data from the loaded files
    x=data.iloc[:,-3]
    y1=data.iloc[:,-2]
    y12=data.iloc[:,-1]
    y2=data.iloc[:,3]
    fig1=plt.figure()
    ax1 = fig1.add_subplot(111)
    ax1.set_xlabel=('x')
    ax1.set_ylabel=('y')
    ax1.plot(x,y1)
    ax1.semilogy(x,y12)
    ax2 = ax1.twinx() # instantiate a second axes that shares the same x-axis
    ax2.plot(x,y2)
    fig1.tight_layout()
plt.show()
Create the figure and its axes once, outside the loop, and then plot onto them while iterating. That way you have a single figure with all the curves in it.
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
files = [...]  # list of paths, as in the question
# Create the figure and both y-axes once, before the loop
fig1 = plt.figure()
ax1 = fig1.add_subplot(111)
ax2 = ax1.twinx()  # second axes that shares the same x-axis
ax1.set_xlabel('x')  # set_xlabel is a method: call it, do not assign to it
ax1.set_ylabel('y')
for path in files:  # iterate over the paths themselves, not over indices
    # Load data from an Excel file
    data = pd.read_excel(path, sheet_name='Results', dtype=float)
    # Get x and y data from the loaded file
    x = data.iloc[:, -3]
    y1 = data.iloc[:, -2]
    y12 = data.iloc[:, -1]
    y2 = data.iloc[:, 3]
    ax1.plot(x, y1)
    ax1.semilogy(x, y12)
    ax2.plot(x, y2)
fig1.tight_layout()
plt.show()

Formatting Seaborn Factorplot y-labels to percentages [duplicate]

I have an existing plot that was created with pandas like this:
df['myvar'].plot(kind='bar')
The y axis is formatted as floats and I want to change it to percentages. All of the solutions I found use the ax.xyz syntax, and I can only place code below the line above that creates the plot (I cannot add ax=ax to that line).
How can I format the y axis as percentages without changing the line above?
Here is the solution I found, but it requires that I redefine the plot:
import matplotlib.pyplot as plt
import numpy as np
import matplotlib.ticker as mtick
data = [8,12,15,17,18,18.5]
perc = np.linspace(0,100,len(data))
fig = plt.figure(1, (7,4))
ax = fig.add_subplot(1,1,1)
ax.plot(perc, data)
fmt = '%.0f%%' # Format you want the ticks, e.g. '40%'
xticks = mtick.FormatStrFormatter(fmt)
ax.xaxis.set_major_formatter(xticks)
plt.show()
Link to the above solution: Pyplot: using percentage on x axis
This is a few months late, but I have created PR#6251 with matplotlib to add a new PercentFormatter class. With this class you just need one line to reformat your axis (two if you count the import of matplotlib.ticker):
import ...
import matplotlib.ticker as mtick
ax = df['myvar'].plot(kind='bar')
ax.yaxis.set_major_formatter(mtick.PercentFormatter())
PercentFormatter() accepts three arguments: xmax, decimals, and symbol. xmax allows you to set the value that corresponds to 100% on the axis. This is nice if you have data from 0.0 to 1.0 and you want to display it from 0% to 100%. Just do PercentFormatter(1.0).
The other two parameters allow you to set the number of digits after the decimal point and the symbol. They default to None and '%', respectively. decimals=None will automatically set the number of decimal places based on how much of the axis you are showing.
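To make the three parameters concrete, here is a short sketch with invented fractions between 0.0 and 1.0 (the column name myvar is from the question; the values are hypothetical). It requires Matplotlib 2.1.0 or later.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick
import pandas as pd

# Hypothetical fractions between 0.0 and 1.0.
df = pd.DataFrame({"myvar": [0.10, 0.25, 0.40, 0.85]})

ax = df["myvar"].plot(kind="bar")
# xmax=1.0 maps 1.0 to 100%; decimals=0 drops the fractional digits.
formatter = mtick.PercentFormatter(xmax=1.0, decimals=0, symbol="%")
ax.yaxis.set_major_formatter(formatter)

# format_pct shows how a single value is rendered; the second argument is
# the displayed axis range, which only matters when decimals is None.
label = formatter.format_pct(0.25, 1.0)
```

Here label is the string 25%, which is what the tick at 0.25 would show. Leaving decimals as None lets the formatter choose the precision from the visible axis range instead.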
Update
PercentFormatter was introduced into Matplotlib proper in version 2.1.0.
The pandas DataFrame plot call returns the Axes for you, and you can then manipulate it however you want.
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(100,5))
# you get ax from here
ax = df.plot()
type(ax) # matplotlib.axes._subplots.AxesSubplot
# manipulate
vals = ax.get_yticks()
ax.set_yticklabels(['{:,.2%}'.format(x) for x in vals])
Jianxun's solution did the job for me but broke the y value indicator at the bottom left of the window.
I ended up using FuncFormatter instead (and also stripped the unnecessary trailing zeroes as suggested here):
import pandas as pd
import numpy as np
from matplotlib.ticker import FuncFormatter
df = pd.DataFrame(np.random.randn(100,5))
ax = df.plot()
ax.yaxis.set_major_formatter(FuncFormatter(lambda y, _: '{:.0%}'.format(y)))
Generally speaking I'd recommend using FuncFormatter for label formatting: it's reliable, and versatile.
For those who are looking for the quick one-liner:
plt.gca().set_yticklabels([f'{x:.0%}' for x in plt.gca().get_yticks()])
this assumes
import: from matplotlib import pyplot as plt
Python >=3.6 for f-String formatting. For older versions, replace f'{x:.0%}' with '{:.0%}'.format(x)
I'm late to the game, but I just realized this: ax can be replaced with plt.gca() for those who are not working with an explicit Axes object.
Echoing @Mad Physicist's answer, using PercentFormatter it would be:
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick
plt.gca().yaxis.set_major_formatter(mtick.PercentFormatter(1))
# if you already have ticks in the 0 to 1 range; otherwise see their answer
I propose an alternative method using seaborn.
Working code:
import numpy as np
import pandas as pd
import seaborn as sns
data=np.random.rand(10,2)*100
df = pd.DataFrame(data, columns=['A', 'B'])
ax= sns.lineplot(data=df, markers= True)
ax.set(xlabel='xlabel', ylabel='ylabel', title='title')
# change the y-axis tick labels
y_value=['{:,.2f}'.format(x) + '%' for x in ax.get_yticks()]
ax.set_yticklabels(y_value)
You can do this in one line without importing anything beyond pyplot (as plt):
plt.gca().yaxis.set_major_formatter(plt.FuncFormatter('{}%'.format))
If you want integer percentages, you can do:
plt.gca().yaxis.set_major_formatter(plt.FuncFormatter('{:.0f}%'.format))
You can use either ax.yaxis or plt.gca().yaxis. FuncFormatter is still part of matplotlib.ticker, but you can also do plt.FuncFormatter as a shortcut.
Based on @erwanp's answer, you can use the formatted string literals of Python 3.6+,
x = '2'
percentage = f'{x}%' # 2%
inside the FuncFormatter() and combined with a lambda expression.
All wrapped:
ax.yaxis.set_major_formatter(FuncFormatter(lambda y, _: f'{y}%'))
Another one line solution if the yticks are between 0 and 1:
plt.yticks(plt.yticks()[0], ['{:,.0%}'.format(x) for x in plt.yticks()[0]])
Add one line of code (this assumes from matplotlib import ticker):
ax.yaxis.set_major_formatter(ticker.PercentFormatter())