I am changing the font-sizes in my python pandas dataframe plot. The only part that I could not change is the scaling of y-axis values (see the figure below).
Could you please help me with that?
Added:
Here is the simplest code to reproduce my problem:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
start = 10**12
finish = 1.1*10**12
y = np.linspace(start, finish)
pd.DataFrame(y).plot()
plt.tick_params(axis='x', labelsize=17)
plt.tick_params(axis='y', labelsize=17)
You will see that this results in a graph similar to the one above, with no change in the scaling label of the y-axis.
There are so many features that you can control with the plotting capabilities of pandas, which leverages matplotlib. I find that seaborn makes it a lot easier to produce pretty charts, and you have a lot more control over the parameters of your plots.
This is not the most elegant solution, but it works; however, it has a seaborn dependency:
%pylab inline
import pandas as pd
import seaborn as sns
import numpy as np
sns.set(style="darkgrid")
sns.set(font_scale=1.5)
start = 10**12
finish = 1.1*10**12
y = np.linspace(start , finish)
pd.DataFrame(y).plot()
plt.tick_params(axis='x', labelsize=17)
plt.tick_params(axis='y', labelsize=17)
I use Jupyter Notebook and that's why I use %pylab inline. The key element here is the use of
font_scale=1.5
which you can set to whatever value produces your desired result. This is what I get:
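Separately, if the only thing left to resize is the `1e12` multiplier drawn above the y-axis, a seaborn-free sketch is to change that text directly. This assumes the default `ScalarFormatter` is in use and that `tick_params` leaves the offset text alone, as the question describes; the `Agg` backend line is only there so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

y = np.linspace(10**12, 1.1 * 10**12)
ax = pd.DataFrame(y).plot()

# Tick labels on both axes
ax.tick_params(axis="both", labelsize=17)

# The "1e12" multiplier is a separate text artist, not a tick label
offset_text = ax.yaxis.get_offset_text()
offset_text.set_fontsize(17)

ax.figure.canvas.draw()  # render once so the offset text is populated
```

`Axis.get_offset_text()` returns an ordinary matplotlib `Text` object, so other text properties (color, weight) can be set the same way.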
Related
I managed to make a displot as I intended with seaborn and the only thing I want to change is the bars' outline width. Specifically, I want to make it thinner. Here's the code and a sample of how the dataframe is composed.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
data_final = pd.merge(data, data_filt)
q = sns.displot(data=data_final[data_final['cond_state'] == True], y='Brand', hue='Style', multiple='stack')
plt.title('Sample of brands and their offering of ramen styles')
I'm specifying that the plot should only use rows where the cond_state is True. Here is a sample of the data_final dataframe.
Here is how the plot currently looks.
I've tried various approaches published online, but most of them use the deprecated distplot instead of displot. There also doesn't seem to be a parameter for changing the bars' outline width in the seaborn documentation for displot and FacetGrid.
The documentation for the seaborn displot function doesn't have this parameter listed, but you can pass matplotlib axes arguments, such as linewidth = 0.25, to the seaborn.displot function to solve your problem.
I am using Python 3.8 on Windows 10 and trying to make a plot with about 700M points in it, for sound wave analysis. In this question: Interactive large plot with ~20 million sample points and gigabytes of data
Vaex was highly recommended. I am trying to use examples from the Vaex tutorial, but the graph does not appear. I could not find a good example on the Internet.
import vaex
import numpy as np
df = vaex.example()
df.plot1d(df.x, limits='99.7%');
The Vaex documentation doesn't mention that pyplot.show() should be used to display the plot. Also, plot1d plots a histogram. How do I plot just connected points?
I am pretty sure that the vaex documentation explains that the (now deprecated) method .plot1d(...) is a wrapper around matplotlib plotting routines.
If you would like to create custom plots using the binned data, you can take this approach (I also found it in their docs):
import vaex
import numpy as np
import pylab as plt
# Load example data
df = vaex.example()
# Do the binning yourself
counts = df.count(binby=df.x, shape=64, limits='99.7%')
# Take care of the x-axis
limits = df.limits_percentage(df.x, percentage=99.7)
xvals = np.linspace(limits[0], limits[1], num=64)
# Create your custom plot via matplotlib, plotly or your favorite tool
plt.plot(xvals, counts, marker='o', ms=5)
plt.show()
I am making a python script using the PyCharm IDE, and the idea is to display descriptive statistics and a box plot for each group in a DataFrame. The statistics displays, but the boxplot is nowhere to be seen...
I have tried Googling an answer, but it does not seem this question has been answered before.
import pandas as pd
import matplotlib as plt
(...)
for name, group in grouped:
    if len(group) > 3:
        print("\n\nNAME: {}".format(name))
        print("GROUP: {}".format(group))
        print("DESCRIPTIVE STATISTICS\n{}".format(group.distance2.describe()))
        print(group.distance2.plot.box())
        group.distance2.plot.box()
I do not get any error messages, the code runs and completes, but I do not know where the boxplot is supposed to display.
I think the code as it is does not create a matplotlib figure object. Try creating a test data object for group.distance2, then create a matplotlib boxplot object. I am assuming you are using the matplotlib library.
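import matplotlib.pyplot as plt
for name, group in grouped:
    if len(group) > 3:
        data = group.distance2
        # create a matplotlib figure with a single axes
        # (plt.subplots(1, 1) returns one Axes object, not an array)
        fig, ax = plt.subplots(1, 1)
        # basic plot
        ax.boxplot(data)
        ax.set_title('basic plot of group.distance2')
# open the figure windows; without this call nothing is displayed in a script
plt.show()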
If that works, you can try putting several groups' data into one figure (axes). Here is more information: https://matplotlib.org/3.1.0/gallery/statistics/boxplot_demo.html
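As a rough sketch of the "several groups in one axes" idea: the frame below is made up, and `name` is a hypothetical grouping column (only `distance2` and the `len(group) > 3` filter come from the question). The `Agg` line just lets the snippet run without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; omit it in a real script
import matplotlib.pyplot as plt
import pandas as pd

# Made-up data; "name" is a hypothetical grouping column
df = pd.DataFrame({
    "name": ["a"] * 5 + ["b"] * 4 + ["c"] * 2,
    "distance2": [1.0, 2.0, 2.5, 3.0, 9.0, 4.0, 4.5, 5.0, 6.0, 1.0, 2.0],
})
grouped = df.groupby("name")

# Keep only groups with more than 3 rows, as in the question
names, samples = [], []
for name, group in grouped:
    if len(group) > 3:
        names.append(name)
        samples.append(group["distance2"].to_numpy())

fig, ax = plt.subplots()
ax.boxplot(samples)        # one box per group, drawn at positions 1..n
ax.set_xticklabels(names)  # label each box with its group name
ax.set_ylabel("distance2")
fig.canvas.draw()          # in a script, call plt.show() here to open the window
```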
When plotting using matplotlib, I ran into an interesting issue where the y axis is scaled by a very inconvenient quantity. Here's a MWE that demonstrates the problem:
import numpy as np
import matplotlib.pyplot as plt
l = np.linspace(0.5,2,2**10)
a = (0.696*l**2)/(l**2 - 9896.2e-9**2)
plt.plot(l,a)
plt.show()
When I run this, I get a figure that looks like this picture
The y-axis clearly is scaled by a silly quantity even though the y data are all between 1 and 2.
This is similar to the question:
Axis numerical offset in matplotlib
I'm not satisfied with the answer to that question, in that it makes no sense to me why I need to go through the convoluted process of changing axis settings when the data are between 1 and 2 (EDIT: between 0 and 1). Why does this happen? Why does matplotlib use such a bizarre scaling?
The data in the plot are all between 0.696000000017 and 0.696000000273. For such cases it makes sense to use some kind of offset.
If you don't want that, you can use your own formatter:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker
l = np.linspace(0.5,2,2**10)
a = (0.696*l**2)/(l**2 - 9896.2e-9**2)
plt.plot(l,a)
fmt = matplotlib.ticker.StrMethodFormatter("{x:.12f}")
plt.gca().yaxis.set_major_formatter(fmt)
plt.show()
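To see why matplotlib reaches for an offset here, it can help to print the actual data range. This is just a quick numeric check on the same expression from the question; it should reproduce the two bounds quoted above.

```python
import numpy as np

l = np.linspace(0.5, 2, 2**10)
a = (0.696 * l**2) / (l**2 - 9896.2e-9**2)

# The whole curve lives in a window a few 1e-10 wide around 0.696
spread = a.max() - a.min()
print(f"min={a.min():.12f} max={a.max():.12f} spread={spread:.3e}")
```

With a spread that small, plain tick labels would all read 0.696, which is why the default formatter factors out a common offset.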
The following code when graphed looks really messy at the moment. The reason is I have too many values for 'fare'. 'Fare' ranges from [0-500] with most of the values within the first 100.
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
titanic = sns.load_dataset("titanic")
y = titanic.groupby([titanic.fare // 1, 'sex']).survived.mean().reset_index()
sns.set(style="whitegrid")
g = sns.factorplot(x='fare', y='survived', col='sex', kind='bar', data=y,
                   size=4, aspect=2.5, palette="muted")
g.despine(left=True)
g.set_ylabels("Survival Probability")
g.set_xlabels('Fare')
plt.show()
I would like to try slicing up the 'fare' of the plots into subsets, but would like to see all the graphs at the same time on one screen. I was wondering if this is possible without having to resort to groupby.
I will have to play around with the values of 'fare' to see what I would want each graph to represent, but for a sample let's break up the graph into these 'fare' values.
[0-18]
[18-35]
[35-70]
[70-300]
[300-500]
So the total would be 10 graphs on one page, because of the juxtaposition with the opposite sex.
Is it possible with Seaborn? Do I need to do a lot of configuring with matplotlib? Thanks.
Actually I wrote a little blog post about this a while ago. If you are plotting histograms you can use the by keyword:
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns  # seaborn.apionly is no longer available in current seaborn releases
sns.set() #rescue matplotlib's styles from the early '90s
data = sns.load_dataset('titanic')
data.hist(by='class', column = 'fare')
plt.show()
Otherwise if you're just plotting value-counts, you have to roll your own grid:
def categorical_hist(self, column, by, layout=None, legend=None, **params):
    from math import sqrt, ceil
    if layout is None:
        # square grid large enough for one subplot per distinct value of `column`
        s = ceil(sqrt(self[column].unique().size))
        layout = (s, s)
    return self.groupby(by)[column]\
        .value_counts()\
        .sort_index()\
        .unstack()\
        .plot.bar(subplots=True, layout=layout, legend=legend, **params)
categorical_hist(data, by='class', column='embark_town')
Edit: If you want survival rate by fare range, you could do something like this:
data.groupby(pd.cut(data.fare, 10)).apply(lambda x: x.survived.sum() / len(x))
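For the specific fare ranges listed in the question, a sketch along the same lines could pass explicit bin edges to `pd.cut` and split by sex. The rows below are made up and only borrow the titanic column names; note that with the question's upper edge of 500, any fare above 500 falls outside every bin.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; omit in a notebook
import pandas as pd

# Made-up rows that only borrow the titanic column names
df = pd.DataFrame({
    "fare": [7.25, 15.0, 8.05, 26.0, 52.0, 80.0, 263.0, 450.0],
    "sex": ["male", "female", "male", "male", "female", "female", "male", "female"],
    "survived": [0, 1, 0, 0, 1, 1, 0, 1],
})

# Bin edges taken from the ranges listed in the question
edges = [0, 18, 35, 70, 300, 500]
fare_range = pd.cut(df["fare"], bins=edges, include_lowest=True)

# Mean of the 0/1 survived flag = survival rate per (fare range, sex)
rates = (
    df.groupby([fare_range, "sex"], observed=False)["survived"]
    .mean()
    .unstack("sex")
)

# One panel per sex, one bar per fare range
axes = rates.plot.bar(subplots=True, layout=(1, 2), sharey=True,
                      figsize=(10, 4), legend=False)
```

This gives two panels with five bars each rather than ten separate graphs, which may already be readable enough; combinations with no rows show up as NaN and are simply left blank.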