How can I set the number of ticks in Julia using Pyplot? - matplotlib

I am struggling to 'translate' the instructions I find for Python to the use of PyPlot in Julia. This must be a simple question, but do you know how to set the number of ticks in a plot in Julia using PyPlot?

If you have
x = [1,2,3,4,5]
y = [1,3,6,8,11]
you can
PyPlot.plot(x,y)
which draws the plot
and then do
PyPlot.xticks([1,3,5])
for ticks at 1, 3 and 5 on the x-axis
PyPlot.yticks([1,6,11])
for ticks at 1, 6 and 11 on the y-axis
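Tick spacing
If you want, for example, 4 ticks, evenly spaced, and don't mind Floats, you can do
collect(linspace(x[1], x[end], 4))
(on Julia 1.0 and later, where linspace no longer exists, use collect(range(x[1], stop=x[end], length=4)) instead).
If you need the ticks to be integers and want roughly 4 of them, you can do
collect(x[1]:div(x[end],4):x[end])
Note that the step comes from an integer division, so the number of ticks is only approximate: for the x above this is 1:1:5, which gives 5 ticks.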
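For readers coming from the Python instructions, the same two recipes can be sketched in plain Python. This is only an illustration: the helper names even_ticks and integer_ticks are made up here and are not part of PyPlot or matplotlib.

```python
def even_ticks(first, last, count):
    """Return `count` evenly spaced floats from first to last (like linspace)."""
    if count < 2:
        raise ValueError("need at least two ticks")
    step = (last - first) / (count - 1)
    return [first + i * step for i in range(count)]


def integer_ticks(first, last, wanted):
    """Integer ticks with step last // wanted (for positive values this
    mirrors x[1]:div(x[end],4):x[end])."""
    step = max(1, last // wanted)  # guard against a zero step
    return list(range(first, last + 1, step))


x = [1, 2, 3, 4, 5]
float_ticks = even_ticks(x[0], x[-1], 4)
int_ticks = integer_ticks(x[0], x[-1], 4)
print(float_ticks)
print(int_ticks)
```

The float variant always returns exactly the requested number of ticks, while the integer variant returns however many steps fit into the range.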
Edit
Maybe this doesn't belong here, but at least you'll see it...
Whenever you're looking for a method that's supposed to be in a module X, you can find it by typing X. in the REPL and pressing the TAB key.
To clarify: if you want to search a module for a method you suspect starts with an x, like xticks, type this in the REPL (terminal/shell)
PyPlot.x
and press TAB twice, and you'll see
julia> PyPlot.x
xkcd xlabel xlim xscale xticks
If you're not sure exactly how the method works (for example, its arguments) and there isn't any help available, you can call
methods(PyPlot.xticks)
to see every "version" that method has.
Bonus
The module for all the standard methods, like maximum, vcat etc., is Base.

After some trying and searching, I found a way to do it. One can just set the number of bins that should be on each axis. Here is a minimal example:
using PyPlot
x = linspace(0, 10, 200)
y = sin(x)
fig, ax = subplots()
ax[:plot](x, y, "r-", linewidth=2, label="sine function", alpha=0.6)
ax[:legend](loc="upper center")
ax[:locator_params](axis="y", nbins=4)
The last line specifies the number of bins that should be used on the y-axis. nbins is an upper limit on the number of intervals between ticks, so the axis gets at most nbins + 1 ticks. Leaving the argument axis unspecified sets that option for both axes to the same value.

Related

Add more decimals to matplotlib chart?

A simple question; I tried a quick search before posting but could not find an answer. I am trying to draw a chart whose y axis shows price.
However, the y axis is scaled as in the attached image and shows only 1 decimal. How do I make the y axis more precise, with 2 decimals and more entries in increments of 0.01?
::Update with code::
# Make the plot
fig, ax = plt.subplots(figsize=(48,32))
ax.scatter(x=times, y=tidy['Price'], c=colors, s=tidy['Volume'] / 4000, alpha=0.4)
ax.ticklabel_format(axis='y', style='plain')
ax.set(
    xlabel='Time',
    xlim=(xmin, xmax),
    ylabel='Price'
)
ax.xaxis.set_major_formatter(DateFormatter('%H:%M'))
One method to increase the number of decimals is to use a formatter for your axis:
from matplotlib.ticker import FormatStrFormatter
ax.yaxis.set_major_formatter(FormatStrFormatter('%.2f'))
However, this method will not increase the number of ticks on your axis. You can set the y ticks at 0.01 increments using the following, but you might end up over-saturating the axis, in which case you may want to increase the increment size.
ax.set_yticks(np.arange(108.30,108.71,.01))
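Because np.arange with a float step can gain or lose its last element through rounding, one option is to build the tick values by counting steps and rounding each one. The sketch below is plain Python; tick_values and tick_labels are hypothetical helper names, and the 108.30 to 108.70 range is simply taken from the call above.

```python
def tick_values(start, stop, step, decimals=2):
    """Ticks from start to stop inclusive, rounded to avoid float drift."""
    count = round((stop - start) / step) + 1
    return [round(start + i * step, decimals) for i in range(count)]


def tick_labels(values, decimals=2):
    """Format each tick the way a '%.2f' format string would."""
    return ["%.*f" % (decimals, v) for v in values]


fine = tick_values(108.30, 108.70, 0.01)    # one tick per cent
coarse = tick_values(108.30, 108.70, 0.05)  # fewer ticks, less crowding
print(len(fine), len(coarse))
print(tick_labels(coarse))
```

Either list could then be passed to ax.set_yticks; the coarser one is the "increase the increment size" option.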

Matplotlib/Seaborn: Boxplot collapses on x axis

I am creating a series of boxplots in order to compare different cancer types with each other (based on 5 categories). For plotting I use seaborn/matplotlib. It works fine for most of the cancer types (see image, right); however, in some the x axis collapses slightly (see image, left) or strongly (see image, middle).
https://i.imgur.com/dxLR4B4.png
Looking at how seaborn plots a box/violin plot in https://github.com/mwaskom/seaborn/blob/36964d7ffba3683de2117d25f224f8ebef015298/seaborn/categorical.py (line 961),
violin_data = remove_na(group_data[hue_mask])
I realized that this happens when there are too many NaNs.
Is there any way to prevent this collapsing through code only?
I do not want to modify my dataframe (e.g. replace the NaNs with zero).
Below you find my code:
boxp_df=pd.read_csv(pf_in,sep="\t",skip_blank_lines=False)
fig, ax = plt.subplots(figsize=(10, 10))
sns.violinplot(data=boxp_df, ax=ax)
plt.xticks(rotation=-45)
plt.ylabel("label")
plt.tight_layout()
plt.savefig(pf_out)
The output is a plot whose size differs per cancer type
(depending on whether any category is entirely NaN).
I expect every plot to have the same width.
Update
trying to use the order parameter as suggested leads to the following output:
https://i.imgur.com/uSm13Qw.png
Maybe this toy example helps ?
|Cat1|Cat2|Cat3|Cat4|Cat5
|3.93| |0.52| |6.01
|3.34| |0.89| |2.89
|3.39| |1.96| |4.63
|1.59| |3.66| |3.75
|2.73| |0.39| |2.87
|0.08| |1.25| |-0.27
Update
Apparently, the problem is not the data but the length of the title:
https://github.com/matplotlib/matplotlib/issues/4413
Therefore I would close the question.
@Diziet, should I delete it, or might my issue help others?
Sorry for not including the line below in the code example:
ax.set_title("VERY LONG TITLE", fontsize=20)
It's hard to be sure without data to test it with, but I think you can pass the names of your categories/cancers to the order= parameter. This forces seaborn to use/display those, even if they are empty.
for instance:
tips = sns.load_dataset("tips")
ax = sns.violinplot(x="day", y="total_bill", data=tips, order=['Thur','Fri','Sat','Freedom Day','Sun','Durin\'s Day'])

matplotlib/pyplot: print only ticks once in scatter plot?

I am looking for a way to clean-up the ticks in my pyplot scatter plot.
To create a scatter plot from a Pandas dataset column with strings as elements, I followed the example in [2] and got a nice scatter plot:
The input is 10k data points, where the x axis has only ~200 unique 'names' that got mapped to scalars for plotting. Obviously, plotting all 10k ticks on the x axis is a bit clogged. So I am looking for a way to print each unique tick only once rather than once per data point.
My code looks like:
fig2 = plt.figure()
WNsUniques, WNs = numpy.unique(taskDataFrame['modificationhost'], return_inverse=True)
scatterWNs = fig2.add_subplot(111)
scatterWNs.scatter(WNs, taskDataFrame['cpuconsumptiontime'])
scatterWNs.set(xticks=range(len(WNsUniques)), xticklabels=WNsUniques)
plt.xticks(rotation='vertical')
plt.savefig("%s_WNs-CPUTime_scatter.%s" % (dfName,"pdf"))
Actually, I was hoping that setting the plot's x ticks to the unique names would be sufficient, but apparently not? It is probably something easy, but how do I reduce the ticks for my subplot to the unique ones (should they not already be unique, as returned by numpy.unique)?
Maybe someone has an idea for me?
Cheers and thanks,
Thomas
You can use the set_xticks method to accomplish this. Note that 200 axis ticks with labels are still quite a lot to force on a small plot like this, and this is what you might already be seeing with the above code. Without complete code to play with, I can't say for sure.
Additionally, what is the size of WNsUniques? That can easily be used to check if your call to unique is doing what you think.
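The size check suggested above can be sketched without pandas. This plain-Python version mimics what numpy.unique(..., return_inverse=True) returns, using a made-up list of host names in place of taskDataFrame['modificationhost']; encode_categories and thin_labels are hypothetical helpers, not library functions.

```python
def encode_categories(names):
    """Return (sorted unique names, integer code per input), like
    numpy.unique(names, return_inverse=True)."""
    uniques = sorted(set(names))
    index = {name: i for i, name in enumerate(uniques)}
    return uniques, [index[name] for name in names]


def thin_labels(labels, every):
    """Keep every `every`-th label and blank the rest to reduce clutter."""
    return [lab if i % every == 0 else "" for i, lab in enumerate(labels)]


# Hypothetical data: 8 rows but only 3 distinct hosts.
hosts = ["wn02", "wn01", "wn02", "wn03", "wn01", "wn01", "wn03", "wn02"]
uniques, codes = encode_categories(hosts)
print(len(hosts), len(uniques))  # rows vs. number of ticks needed
print(codes)
print(thin_labels(uniques, 2))
```

With xticks=range(len(uniques)) there is exactly one tick per unique name. If some 200 labels still overlap, a thinned list like the last one could be passed as xticklabels instead.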

Dotted line style from non-evenly distributed data

I'm new to Python and Matplotlib.
This is my first post on Stack Overflow. I've been unable to find the answer elsewhere and would be grateful for your help.
I'm using Windows XP, with Enthought Canopy v1.1.1 (32 bit).
I want to plot a dotted-style linear regression line through a scatter plot of data, where both x and y arrays contain random floating point data.
The dots in the resulting dotted line are not distributed evenly along the regression line, and are "smeared together" in the middle of the red line, making it look messy (see upper plot resulting from attached minimal example code).
This does not seem to occur if the items in the array of x values are evenly distributed (lower plot).
I'm therefore guessing that this is an issue with how Matplotlib renders dotted lines, or with how Canopy interfaces Python with Matplotlib.
Please could you tell me a workaround that will make the dots of the dotted line style appear evenly distributed, even if both x and y data are non-evenly distributed, whilst still using Canopy and Matplotlib?
(As a general point, I'm always keen to improve my coding skills - if any code in my example can be written more neatly or concisely, I'd be grateful for your expertise).
Many thanks in anticipation
Dave
(UK)
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats
#generate data
x1=10 * np.random.random_sample((40))
x2=np.linspace(0,10,40)
y=5 * np.random.random_sample((40))
slope, intercept, r_value, p_value, std_err = stats.linregress(x1,y)
line = (slope*x1)+intercept
plt.figure(1)
plt.subplot(211)
plt.scatter(x1,y,color='blue', marker='o')
plt.plot(x1,line,'r:',label="Regression Line")
plt.legend(loc='upper right')
slope, intercept, r_value, p_value, std_err = stats.linregress(x2,y)
line = (slope*x2)+intercept
plt.subplot(212)
plt.scatter(x2,y,color='blue', marker='o')
plt.plot(x2,line,'r:',label="Regression Line")
plt.legend(loc='upper right')
plt.show()
Welcome to SO.
You have already identified the problem yourself, but seem a bit surprised that a random x array results in the line being 'cluttered'. With unsorted x values, the dotted line is drawn back and forth over the same stretch many times, so it seems like normal behavior to me that it gets smeared where multiple dotted segments lie on top of each other.
If you don't want that, you can sort your array and use that to calculate the regression line and plot it. Since it's a linear regression, just using the min and max values would also work.
x1_sorted = np.sort(x1)
line = (slope * x1_sorted) + intercept
or
x1_extremes = np.array([x1.min(),x1.max()])
line = (slope * x1_extremes) + intercept
The last should be faster if x1 becomes very large.
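To see why two points are enough, here is a small self-contained sketch in plain Python. It fits the line with the textbook least-squares formulas rather than scipy.stats.linregress, and the data values are invented for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for paired samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Unsorted x values, as in the question (made-up numbers).
xs = [7.0, 1.0, 4.0, 9.0, 2.0]
ys = [15.0, 3.0, 9.0, 19.0, 5.0]  # these points lie on y = 2x + 1

slope, intercept = fit_line(xs, ys)

# Only the two extreme x values are needed to draw a straight line once.
x_ends = [min(xs), max(xs)]
y_ends = [slope * x + intercept for x in x_ends]
print(x_ends, y_ends)
```

Plotting x_ends against y_ends with the 'r:' style draws the dotted pattern over the range a single time, so the dots cannot pile up on each other.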
With regard to your last comment: in your example you use what's called the 'state-machine' environment for plotting. It means that the commands you issue are applied to the active figure and the active axes (subplots).
You can also consider the OO approach, where you get figure and axes objects. This means you can access any figure or axes at any time, not just the active one. It's useful when passing an axes to a function, for example.
In your example both would work equally well and it would be more a matter of taste.
A small example:
# create a figure with 2 subplots (2 rows, 1 column)
fig, axs = plt.subplots(2,1)
# plot in the first subplots
axs[0].scatter(x1,y,color='blue', marker='o')
axs[0].plot(x1,line,'r:',label="Regression Line")
# plot in the second
axs[1].plot()
etc...

Multiplot with matplotlib without knowing the number of plots before running

I have a problem with Matplotlib's subplots. I do not know the number of subplots I want to plot beforehand, but I know that I want them in two rows, so I cannot use
plt.subplot(212)
because I don't know the number that I should provide.
It should look like this:
Right now, I save all the plots into a folder and put them together with Illustrator, but there has to be a better way with Matplotlib. I can provide my code if I was unclear somewhere.
My understanding is that you only know the number of plots at runtime and hence are struggling with the shorthand syntax, e.g.:
plt.subplot(121)
Thankfully, to save you from doing some awkward math to build this number programmatically, there is another interface that takes the three values as separate arguments:
plt.subplot(n_rows, n_cols, plot_num)
Note that plot_num starts at 1, not 0. So in your case, given that you want n plots in two rows, you can do:
n_plots = 5  # (or however many you programmatically figure out you need)
n_rows = 2
n_cols = (n_plots + 1) // n_rows  # round up so an odd count still fits
for plot_num in range(1, n_plots + 1):
    ax = plt.subplot(n_rows, n_cols, plot_num)
    # ... do some plotting
Alternatively, there is also a slightly more pythonic interface which you may wish to be aware of:
fig, axes = plt.subplots(n_rows, n_cols)
for ax in axes.flat:
    # ... do some plotting
(Notice that this was subplots(), not the plain subplot().) Iterating over axes.flat visits every axes in the 2D grid, including any unused ones at the end. Although I must admit, I have never used this latter interface.
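The grid arithmetic can be checked on its own before any plotting. The sketch below is plain Python with two hypothetical helpers, grid_shape and cell_of; it assumes the usual matplotlib convention that subplot indices are 1-based and fill the grid row by row.

```python
import math


def grid_shape(n_plots, n_rows=2):
    """Rows and columns needed to fit n_plots into a fixed number of rows."""
    if n_plots < 1:
        raise ValueError("need at least one plot")
    n_cols = math.ceil(n_plots / n_rows)
    return n_rows, n_cols


def cell_of(plot_num, n_cols):
    """Zero-based (row, col) of a 1-based subplot index, filled row by row."""
    return divmod(plot_num - 1, n_cols)


n_rows, n_cols = grid_shape(5)
cells = [cell_of(k, n_cols) for k in range(1, 5 + 1)]
print((n_rows, n_cols))
print(cells)
```

For 5 plots this gives a 2 by 3 grid with the last cell left empty, which is the slot you may want to hide or remove afterwards.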
HTH