why does plt.show() change the plt.save()? - matplotlib

I have multiple dimensional data set (6D), I want to plot the dimension 2/2 and save them.
But I have a problem with the save:
if I use plt.show() I get a good plot in the save_file
if I don't use plt.show() I get an awfull flot with multiple superposition
I try to figure out why
I have tried searching for the effect of plt.show on the plt.savefig but didn't found any useful information
for x in range(6):
for y in range(6):
if x != y:
plt.scatter(data.T[x],data.T[y], s=50, linewidth=0, c=cluster_member_colors, alpha=0.25)
plt.xlabel(reading_axis[x])
plt.ylabel(reading_axis[y])
plt.title(str(chip_id)+'\n ' + str(chamber_ID) + '\n '+reading_axis[x]+"vs"+reading_axis[y])
plt.savefig(save_plot_path+'/'+reading_axis[x]+"vs"+ reading_axis[y])
plt.show()
#if i remove plt.show(), the plt.savefig will save a different plot
I haven't figure yet how to post screenshot or png on stack overflow yet
but to explain what I see:
I try with and without plt.show():
with I get 4 clusters easy to differentiate
without I get 14 clusters where 10 of them are different from the 4 previous one

Related

Is it possible to break x and y axis at the same time on lineplot?

I am working on drawing lineplots with matplotlib.
I checked several posts and could understand how the line break works on matplotlib (Break // in x axis of matplotlib)
However, I was wondering is it possible to break x and y axis all together at the same time.
My current drawing looks like below.
As shown on the graph, x-axis [2000,5000] waste spaces a lot.
Because I have more data that need to be drawn after 7000, I want to save more space.
Is it possible to split x-axis together with y-axis?
Or is there another convenient way to not to show specific region on lineplot?
If there is another library enabling this, I am willing to drop matplotlib and adopt others...
Maybe splitting the axis isn't your best choice. I would perhaps try inserting another smaller figure into the open space of your large figure using add_axes(). Here is a small example.
t = np.linspace(0, 5000, 1000) # create 1000 time stamps
data = 5*t*np.exp(-t/100) # and some fake data
fig, ax = plt.subplots()
ax.plot(t, data)
box = ax.get_position()
width = box.width*0.6
height = box.height*0.6
x = 0.35
y = 0.35
subax = fig.add_axes([x,y,width,height])
subax.plot(t, data)
subax.axis([0, np.max(t)/10, 0, np.max(data)*1.1])
plt.show()

How to display all the lables present on x and y axis in matplotlib [duplicate]

I'm playing around with the abalone dataset from UCI's machine learning repository. I want to display a correlation heatmap using matplotlib and imshow.
The first time I tried it, it worked fine. All the numeric variables plotted and labeled, seen here:
fig = plt.figure(figsize=(15,8))
ax1 = fig.add_subplot(111)
plt.imshow(df.corr(), cmap='hot', interpolation='nearest')
plt.colorbar()
labels = df.columns.tolist()
ax1.set_xticklabels(labels,rotation=90, fontsize=10)
ax1.set_yticklabels(labels,fontsize=10)
plt.show()
successful heatmap
Later, I used get_dummies() on my categorical variable, like so:
df = pd.get_dummies(df, columns = ['sex'])
resulting correlation matrix
So, if I reuse the code from before to generate a nice heatmap, it should be fine, right? Wrong!
What dumpster fire is this?
So my question is, where did my labels go, and how do I get them back?!
Thanks!
To get your labels back, you can force matplotlib to use enough xticks so that all your labels can be shown. This can be done by adding
ax1.set_xticks(np.arange(len(labels)))
ax1.set_yticks(np.arange(len(labels)))
before your statements ax1.set_xticklabels(labels,rotation=90, fontsize=10) and ax1.set_yticklabels(labels,fontsize=10).
This results in the following plot:

Matplotlib/Seaborn: Boxplot collapses on x axis

I am creating a series of boxplots in order to compare different cancer types with each other (based on 5 categories). For plotting I use seaborn/matplotlib. It works fine for most of the cancer types (see image right) however in some the x axis collapses slightly (see image left) or strongly (see image middle)
https://i.imgur.com/dxLR4B4.png
Looking into the code how seaborn plots a box/violin plot https://github.com/mwaskom/seaborn/blob/36964d7ffba3683de2117d25f224f8ebef015298/seaborn/categorical.py (line 961)
violin_data = remove_na(group_data[hue_mask])
I realized that this happens when there are too many nans
Is there any possibility to prevent this collapsing by code only
I do not want to modify my dataframe (replace the nans by zero)
Below you find my code:
boxp_df=pd.read_csv(pf_in,sep="\t",skip_blank_lines=False)
fig, ax = plt.subplots(figsize=(10, 10))
sns.violinplot(data=boxp_df, ax=ax)
plt.xticks(rotation=-45)
plt.ylabel("label")
plt.tight_layout()
plt.savefig(pf_out)
The output is a per cancer type differently sized plot
(depending on if there is any category completely nan)
I am expecting each plot to be in the same width.
Update
trying to use the order parameter as suggested leads to the following output:
https://i.imgur.com/uSm13Qw.png
Maybe this toy example helps ?
|Cat1|Cat2|Cat3|Cat4|Cat5
|3.93| |0.52| |6.01
|3.34| |0.89| |2.89
|3.39| |1.96| |4.63
|1.59| |3.66| |3.75
|2.73| |0.39| |2.87
|0.08| |1.25| |-0.27
Update
Apparently, the problem is not the data but the length of the title
https://github.com/matplotlib/matplotlib/issues/4413
Therefore I would close the question
#Diziet should I delete it or does my issue might help other ones?
Sorry for not including the line below in the code example:
ax.set_title("VERY LONG TITLE", fontsize=20)
It's hard to be sure without data to test it with, but I think you can pass the names of your categories/cancers to the order= parameter. This forces seaborn to use/display those, even if they are empty.
for instance:
tips = sns.load_dataset("tips")
ax = sns.violinplot(x="day", y="total_bill", data=tips, order=['Thur','Fri','Sat','Freedom Day','Sun','Durin\'s Day'])

How to get legend next to plot in Seaborn?

I am plotting a relplot with Seaborn, but getting the legend (and an empty axis plot) printed under the main plot.
Here is how it looks like (in 2 photos, as my screen isn't that big):
Here is the code I used:
fig, axes = plt.subplots(1, 1, figsize=(12, 5))
clean_df['tax_class_at_sale'] = clean_df['tax_class_at_sale'].apply(str)
sns.relplot(x="sale_price_millions", y='gross_sqft_thousands', hue="neighborhood", data=clean_df, ax=axes)
fig.suptitle('Sale Price by Neighborhood', position=(.5,1.05), fontsize=20)
fig.tight_layout()
fig.show()
Does someone has an idea how to fix that, so that the legend (maybe much smaller, but it's not a problem) is printed next to the plot, and the empty axis disappears?
Here is my dataset form (in 2 screenshot, to capture all columns. "sale_price_millions" is the target column)
Since you failed to provide a Minimal, Complete, and Verifiable example, no one can give you a final working answer because we can't reproduce your figure. Nevertheless, you can try specifying the location for placing the legend as following and see if it works as you want
sns.relplot(x="sale_price_millions", y='gross_sqft_thousands', hue="neighborhood", data=clean_df, ax=axes)
plt.legend(loc=(1.05, 0.5))

matplotlib/pyplot: print only ticks once in scatter plot?

I am looking for a way to clean-up the ticks in my pyplot scatter plot.
To create a scatter plot from a Pandas dataset column with strings as elements, I followed the example in [2] - and got me a nice scatter plot:
input are 10k data points where the X axis has only ~200 unique 'names', that got matched to scalars for plotting. Obviously, plotting all the 10k ticks on the x axis is a bit clocked. So, I am looking for a way, to print each unique tick only once and not for each data point?
My code looks like:
fig2 = plt.figure()
WNsUniques, WNs = numpy.unique(taskDataFrame['modificationhost'], return_inverse=True)
scatterWNs = fig2.add_subplot(111)
scatterWNs.scatter(WNs, taskDataFrame['cpuconsumptiontime'])
scatterWNs.set(xticks=range(len(WNsUniques)), xticklabels=WNsUniques)
plt.xticks(rotation='vertical')
plt.savefig("%s_WNs-CPUTime_scatter.%s" % (dfName,"pdf"))
actually, I was hoping that setting the plot x ticks to the unique names should be sufficient - but apparently not? Probably it is something easy, but how do I reduce the ticks for my subplot to unique once (should they not already be uniqueified as returned by numpy.unique?)?
Maybe someone has an idea for me?
Cheers ans thanks,
Thomas
You can use the set_xticks method to accomplish this. Note that 200 axis ticks with labels are still quite a lot to force on a small plot like this, and this is what you might already be seeing with the above code. Without complete code to play with, I can't say for sure.
Additionally, what is the size of WNsUniques? That can easily be used to check if your call to unique is doing what you think.