I'm trying to plot with matplotlib, but I want separate lines based on the value of a column that is neither x nor y.
For example this is my DF:
code reqs value
AGB 253319 57010.16528
ABC 242292 35660.58176
DCC 240440 36587.45336
CHB 172441 57825.83052
DEF 148357 34129.71166
Calling df.plot(x='reqs',y='value',figsize=(8,4)) yields this plot:
What I'm looking for is a plot with multiple lines, one line for each of the codes. Right now it draws just one line and ignores the code column.
I tried searching for an answer, but every question I found asks about multiple y's. I don't have multiple y's; I have the same y, split by a different column.
(Surely I'm using the wrong terms to describe what I'm trying to do; hopefully this example and image make sense.)
The result should look something like this:
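So I worked out how to do exactly that, in case anyone is curious:
import matplotlib.pyplot as plt
plt_df = df
fig, ax = plt.subplots()
# Draw one line per code on the same axes.
# Grouping by 'code' (a string, not a list) keeps each key a plain label.
for key, grp in plt_df.groupby('code'):
    ax = grp.plot(ax=ax, kind='line', x='reqs', y='value', label=key, figsize=(20,4), title="someTitle")
plt.show()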
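For anyone who wants to try the loop without the original DataFrame, here is a minimal self-contained sketch. The data is hypothetical (the sample table above has only one row per code, so each group would be a single point); it assumes several rows per code so that each code forms a visible line.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical data: several rows per code, so each code forms a real line.
df = pd.DataFrame({
    "code":  ["AGB", "AGB", "AGB", "ABC", "ABC", "ABC"],
    "reqs":  [100, 200, 300, 100, 200, 300],
    "value": [10.0, 14.0, 19.0, 7.0, 9.0, 8.0],
})

fig, ax = plt.subplots(figsize=(8, 4))
# groupby returns the groups in sorted key order: "ABC", then "AGB".
for key, grp in df.groupby("code"):
    # Sort by x so the line is drawn left to right.
    grp.sort_values("reqs").plot(ax=ax, kind="line", x="reqs", y="value", label=key)
ax.set_ylabel("value")
# plt.show()  # uncomment to display interactively
```

Each group is drawn onto the same Axes, so the legend ends up with one entry per code.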
I'm playing around with the abalone dataset from UCI's machine learning repository. I want to display a correlation heatmap using matplotlib and imshow.
The first time I tried it, it worked fine. All the numeric variables plotted and labeled, seen here:
fig = plt.figure(figsize=(15,8))
ax1 = fig.add_subplot(111)
plt.imshow(df.corr(), cmap='hot', interpolation='nearest')
plt.colorbar()
labels = df.columns.tolist()
ax1.set_xticklabels(labels,rotation=90, fontsize=10)
ax1.set_yticklabels(labels,fontsize=10)
plt.show()
successful heatmap
Later, I used get_dummies() on my categorical variable, like so:
df = pd.get_dummies(df, columns = ['sex'])
resulting correlation matrix
So, if I reuse the code from before to generate a nice heatmap, it should be fine, right? Wrong!
What dumpster fire is this?
So my question is, where did my labels go, and how do I get them back?!
Thanks!
To get your labels back, you can force matplotlib to use enough xticks so that all your labels can be shown. This can be done by adding
ax1.set_xticks(np.arange(len(labels)))
ax1.set_yticks(np.arange(len(labels)))
before your statements ax1.set_xticklabels(labels,rotation=90, fontsize=10) and ax1.set_yticklabels(labels,fontsize=10).
This results in the following plot:
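Put together, the fix might look like the sketch below. The DataFrame is a hypothetical random stand-in for the abalone data after get_dummies(), and taking the labels from the correlation matrix itself (rather than from df) is my own choice so that the labels always match the plotted rows and columns.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for the abalone data after get_dummies().
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(50, 10)),
                  columns=[f"col{i}" for i in range(10)])

corr = df.corr()
labels = corr.columns.tolist()

fig, ax1 = plt.subplots(figsize=(15, 8))
im = ax1.imshow(corr.to_numpy(), cmap="hot", interpolation="nearest")
fig.colorbar(im, ax=ax1)

# One tick per row/column, set before the labels are assigned.
ax1.set_xticks(np.arange(len(labels)))
ax1.set_yticks(np.arange(len(labels)))
ax1.set_xticklabels(labels, rotation=90, fontsize=10)
ax1.set_yticklabels(labels, fontsize=10)
# plt.show()  # uncomment to display interactively
```

Without the two set_xticks/set_yticks calls, matplotlib chooses its own tick positions, and with more columns it may place fewer ticks than there are labels.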
I am creating a series of boxplots in order to compare different cancer types with each other (based on 5 categories). For plotting I use seaborn/matplotlib. It works fine for most of the cancer types (see image right). However, for some of them the x axis collapses slightly (see image left) or strongly (see image middle).
https://i.imgur.com/dxLR4B4.png
Looking at how seaborn plots a box/violin plot in https://github.com/mwaskom/seaborn/blob/36964d7ffba3683de2117d25f224f8ebef015298/seaborn/categorical.py (line 961):
violin_data = remove_na(group_data[hue_mask])
I realized that this happens when there are too many NaNs.
Is there any way to prevent this collapsing in code only?
I do not want to modify my dataframe (e.g. replace the NaNs with zero).
Below is my code:
boxp_df=pd.read_csv(pf_in,sep="\t",skip_blank_lines=False)
fig, ax = plt.subplots(figsize=(10, 10))
sns.violinplot(data=boxp_df, ax=ax)
plt.xticks(rotation=-45)
plt.ylabel("label")
plt.tight_layout()
plt.savefig(pf_out)
The output is a plot whose size differs per cancer type
(depending on whether any category is completely NaN).
I expect every plot to have the same width.
Update
Trying to use the order parameter as suggested leads to the following output:
https://i.imgur.com/uSm13Qw.png
Maybe this toy example helps?
|Cat1|Cat2|Cat3|Cat4|Cat5
|3.93| |0.52| |6.01
|3.34| |0.89| |2.89
|3.39| |1.96| |4.63
|1.59| |3.66| |3.75
|2.73| |0.39| |2.87
|0.08| |1.25| |-0.27
Update
Apparently, the problem is not the data but the length of the title:
https://github.com/matplotlib/matplotlib/issues/4413
Therefore I would close the question.
@Diziet, should I delete it, or might my issue help others?
Sorry for not including the line below in the code example:
ax.set_title("VERY LONG TITLE", fontsize=20)
It's hard to be sure without data to test it with, but I think you can pass the names of your categories/cancers to the order= parameter. This forces seaborn to use/display those, even if they are empty.
For instance:
tips = sns.load_dataset("tips")
ax = sns.violinplot(x="day", y="total_bill", data=tips, order=['Thur','Fri','Sat','Freedom Day','Sun','Durin\'s Day'])
I'm trying to use seaborn dataframe functionality (e.g. passing column names to x, y and hue plot parameters) for my timeseries (in pandas datetime format) plots.
x should come from a timeseries column (converted from a pd.Series of strings with pd.to_datetime)
y should come from a float column
hue comes from a categorical column that I calculated.
There are multiple streams in the same series that I am trying to separate (and use the hue for separating them visually), and therefore they should not be connected by a line (like in a scatterplot)
I have tried the following plot types, each with a different problem:
sns.scatterplot: gets the plotting right and the labels right, but has problems with the x limits, and I could not set them correctly with plt.xlim() using data.Dates.min and data.Dates.max
sns.lineplot: gets the limits and the labels right, but I could not find a setting to disable the lines between the individual data points like in matplotlib. I tried setting the markers and the dashes parameters, to no avail.
sns.stripplot: my last try; it plotted the data points correctly and got the x limits right, but messed up the tick labels
Example input data for easy reproduction:
dates = pd.to_datetime(('2017-11-15',
'2017-11-29',
'2017-12-15',
'2017-12-28',
'2018-01-15',
'2018-01-30',
'2018-02-15',
'2018-02-27',
'2018-03-15',
'2018-03-27',
'2018-04-13',
'2018-04-27',
'2018-05-15',
'2018-05-28',
'2018-06-15',
'2018-06-28',
'2018-07-13',
'2018-07-27'))
values = np.random.randn(len(dates))
clusters = np.random.randint(3, size=len(dates))  # upper bound 3 is an assumption; randint(1, ...) returns only zeros
D = {'Dates': dates, 'Values': values, 'Clusters': clusters}
data = pd.DataFrame(D)
To each of the functions I am passing the same arguments:
sns.OneOfThePlottingFunctions(x='Dates',
y='Values',
hue='Clusters',
data=data)
plt.show()
So to recap, what I want is a plot that uses seaborn's pandas functionality and plots points (not lines) with correct x limits and readable x labels :)
Any help would be greatly appreciated.
ax = sns.scatterplot(x='Dates', y='Values', hue='Clusters', data=data)
ax.set_xlim(data['Dates'].min(), data['Dates'].max())
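A self-contained sketch of the same idea follows. It uses plain matplotlib in place of seaborn so that it runs with fewer dependencies; since sns.scatterplot returns a matplotlib Axes, the set_xlim call is the same in both cases. The data, the 7-day padding, and the date format are all assumptions of mine.

```python
import pandas as pd
import matplotlib.dates as mdates
import matplotlib.pyplot as plt

# Hypothetical data in the same shape as the question's example.
dates = pd.to_datetime(["2017-11-15", "2017-12-15", "2018-01-15", "2018-02-15"])
data = pd.DataFrame({"Dates": dates,
                     "Values": [0.3, -1.2, 0.8, 0.1],
                     "Clusters": [0, 1, 0, 1]})

fig, ax = plt.subplots()
# One scatter call per cluster gives each group its own colour and legend entry.
for cluster, grp in data.groupby("Clusters"):
    ax.scatter(grp["Dates"], grp["Values"], label=f"cluster {cluster}")

pad = pd.Timedelta(days=7)  # assumed margin so edge points are not clipped
lo, hi = data["Dates"].min() - pad, data["Dates"].max() + pad
ax.set_xlim(lo, hi)

ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m-%d"))
fig.autofmt_xdate()  # rotate the date labels so they stay readable
ax.legend()
# plt.show()  # uncomment to display interactively
```

Note that min and max are called as methods here; passing data.Dates.min without parentheses hands matplotlib the method object rather than a date.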
I am reading data from a simulation out of an Excel file. From this data I generated two DataFrames containing 200 values. Now I want to plot all the values from DataFrame one in blue and all values from DataFrame two in purple. For that I have the following code:
# sheet_name replaces the older sheetname keyword, which newer pandas no longer accepts
df = pd.read_excel("###CENSORED####.xlsx", sheet_name="Data")
unpatched = df["Unpatched"][:-800]
patched = df["Patched"][:-800]
x = range(0,len(unpatched))
fig = plt.figure(figsize=(10, 5))
plt.scatter(x, unpatched, zorder=10)
plt.scatter(x, patched, c="purple", zorder=19)
This results in following Graph:
But now I want to draw some lines that visualize the difference between the blue and purple dots. I thought about an orange line going from the blue dot at simulation run x to the purple dot at simulation run x. Since I'm pretty new to matplotlib, I tried to "cheat" with the following code:
scale_factor = 300
for a in x:
    plt.axvline(a, patched[a]/scale_factor, unpatched[a]/scale_factor, c="orange")
But this resulted in an inaccuracy, as seen below:
So is there a smarter way to do this? I've realized that the axvline documentation says ymin and ymax must be scalars. Can I somehow turn my values into fitting scalars?
Sorry if the post is not that good. It's my first one on Stack Overflow.
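One possible approach, sketched under assumptions: axvline interprets ymin and ymax as fractions of the axes height (0 to 1), which is why dividing by a scale factor only lands near the dots. Axes.vlines, by contrast, takes arrays of ymin and ymax in data coordinates. The random arrays below are hypothetical stand-ins for the "Unpatched" and "Patched" columns.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical stand-ins for the "Unpatched" and "Patched" columns.
rng = np.random.default_rng(0)
unpatched = rng.uniform(100, 300, size=20)
patched = rng.uniform(100, 300, size=20)
x = np.arange(len(unpatched))

fig, ax = plt.subplots(figsize=(10, 5))
# vlines takes ymin/ymax in data coordinates, one pair per x position.
segments = ax.vlines(x, patched, unpatched, colors="orange", zorder=5)
ax.scatter(x, unpatched, zorder=10)
ax.scatter(x, patched, c="purple", zorder=19)
# plt.show()  # uncomment to display interactively
```

Because the lines are drawn in data coordinates, they keep connecting the dots when the y limits change or the plot is zoomed.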
I have datasets with the following structure:
Revolution1 Position1 Temperature1 Revolution2 Position2 Temperature2
1/min mm C 1/min m C
datas....
I plot these against time. Now I want a new y axis for every different unit. So I looked at the matplotlib example and wrote something like this, where x holds the x values and d is the pandas DataFrame:
fig, host = plt.subplots()
fig.subplots_adjust(right=0.75)
par1 = host.twinx()
par2 = host.twinx()
uni_units = np.unique(units[1:])
# Offset the right spine of par2 so it does not overlap par1's.
par2.spines["right"].set_position(("axes", 1.2))
# make_patch_spines_invisible is the helper defined in the matplotlib example.
make_patch_spines_invisible(par2)
# Second, show the right spine.
par2.spines["right"].set_visible(True)
# Plot each column on the axis that matches its unit.
for i, v in enumerate(header[1:]):
    if d.loc[0,v] == uni_units[0]:
        y = d.loc[an:en,v].values
        host.plot(x, y, label=v)
    if d.loc[0,v] == uni_units[1]:
        # (v,ct_yax[1]))   <- fragment of a line truncated in the original post
        y = d.loc[an:en,v].values
        par1.plot(x, y, label=v)
    if d.loc[0,v] == uni_units[2]:
        y = d.loc[an:en,v].values
        par2.plot(x, y, label=v)
EDIT: Okay, I really forgot to ask the question (maybe I was nervous, because it was my first time posting here):
I actually wanted to ask why it does not work, since I only saw 2 plots. But after zooming in I saw that it actually plots every curve...
Sorry!
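For reference, a minimal runnable sketch of the three-axis layout. The signals are hypothetical, and colouring each axis to match its curve is my own addition: when the ranges differ this much, curves that share one colour are easy to miss, which may be what made some of them look absent.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical signals, one per unit (1/min, mm, C).
t = np.linspace(0, 10, 100)
series = {
    "Revolution1 [1/min]": 1500 + 100 * np.sin(t),
    "Position1 [mm]": 5 * t,
    "Temperature1 [C]": 20 + 0.5 * t,
}

fig, host = plt.subplots()
fig.subplots_adjust(right=0.75)  # leave room for the third axis
par1 = host.twinx()
par2 = host.twinx()
par2.spines["right"].set_position(("axes", 1.2))  # move the third axis outward

colors = ["tab:blue", "tab:red", "tab:green"]
for ax, (name, y), color in zip((host, par1, par2), series.items(), colors):
    ax.plot(t, y, color=color, label=name)
    ax.set_ylabel(name, color=color)
    ax.tick_params(axis="y", colors=color)  # tick colour matches the curve
# plt.show()  # uncomment to display interactively
```

Each twinx() call creates a new Axes that shares the x axis with host but scales its own y axis independently.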
If I understand correctly, what you want is to get subplots from the DataFrame.
You can achieve this with the subplots parameter of the DataFrame's plot function.
The toy sample below should give you a better idea of how to do this:
import matplotlib.pyplot as plt
import pandas as pd
df = pd.DataFrame({"y1":[1,5,3,2],"y2":[10,12,11,15]})
df.plot(subplots=True)
plt.show()
Which produces the figure below:
You may want to check the documentation on subplots for pandas DataFrame.