Multiple Axes and Plots - pandas

Sorry if this post is not that good; it's my first one on Stack Overflow.
I have datasets with the following structure:
Revolution1 Position1 Temperature1 Revolution2 Position2 Temperature2
1/min mm C 1/min m C
datas....
I plot these against time. Now I want a separate y-axis for each distinct unit, so I looked at the matplotlib example and wrote something like this, where x holds the x-values and d is the pandas DataFrame:
fig,host=plt.subplots()
fig.subplots_adjust(right=0.75)
par1 = host.twinx()
par2 = host.twinx()
uni_units = np.unique(units[1:])
par2.spines["right"].set_position(("axes", 1.2))
make_patch_spines_invisible(par2)
# Second, show the right spine.
par2.spines["right"].set_visible(True)
for i, v in enumerate(header[1:]):
    # Row 0 of d holds the unit of each column; the unit decides the axis.
    if d.loc[0, v] == uni_units[0]:
        y = d.loc[an:en, v].values
        host.plot(x, y, label=v)
    if d.loc[0, v] == uni_units[1]:
        # (v,ct_yax[1]))  # incomplete statement in the original post
        y = d.loc[an:en, v].values
        par1.plot(x, y, label=v)
    if d.loc[0, v] == uni_units[2]:
        y = d.loc[an:en, v].values
        par2.plot(x, y, label=v)
EDIT: Okay, I forgot to ask the actual question (maybe I was nervous, because it was my first time posting here):
I wanted to ask why it does not work, since I only saw 2 plots. But after zooming in I saw that it actually plots every curve...
Sorry!

If I understand correctly, you want to get subplots from the DataFrame.
You can do this with the subplots parameter of the DataFrame's plot function.
The toy example below gives a better idea of how to achieve this:
import matplotlib.pyplot as plt
import pandas as pd
df = pd.DataFrame({"y1":[1,5,3,2],"y2":[10,12,11,15]})
df.plot(subplots=True)
plt.show()
This produces a figure with one subplot per column.
See the pandas documentation on the subplots parameter of DataFrame.plot for details.
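If the goal is one y-axis per unit on a single plot, as in the question, rather than separate subplots, the twinx approach can be generalized by mapping each unit to its own axis. The following is only a minimal sketch under assumptions: the units mapping and the random data are made up for illustration, and the 0.2 spine offset is just a starting value to tune.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical column -> unit mapping and random data (not from the question).
units = {"Revolution1": "1/min", "Position1": "mm", "Temperature1": "C",
         "Revolution2": "1/min", "Position2": "mm", "Temperature2": "C"}
t = np.linspace(0, 10, 50)
rng = np.random.default_rng(0)
data = pd.DataFrame({name: rng.random(t.size) * (i + 1)
                     for i, name in enumerate(units)})

fig, host = plt.subplots()
fig.subplots_adjust(right=0.75)  # leave room for the extra right-hand axes

# One axis per distinct unit: the host first, then twinned axes.
axes_by_unit = {}
for n, unit in enumerate(dict.fromkeys(units.values())):  # unique, order kept
    ax = host if n == 0 else host.twinx()
    if n > 1:
        # Push every additional right-hand spine further outwards.
        ax.spines["right"].set_position(("axes", 1.0 + 0.2 * (n - 1)))
    ax.set_ylabel(unit)
    axes_by_unit[unit] = ax

# Plot each column on the axis that belongs to its unit.
lines = []
for k, (name, unit) in enumerate(units.items()):
    lines += axes_by_unit[unit].plot(t, data[name], color=f"C{k}", label=name)

# Collect the handles from all axes into a single legend.
host.legend(handles=lines, loc="upper left")
```

Passing an explicit color per column avoids every axis restarting the color cycle, which otherwise makes curves on different axes look identical.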

Related

How to plot Series with selective ticks?

I have a Series that I would like to plot as a bar chart: pd.Series([-4,2, 3,3, 4,5,9,20]).value_counts()
Since I have many bars, I only want to display some (equidistant) ticks.
However, unless I actively work against it, pyplot prints the wrong labels. E.g. if I leave out set_xticklabels in the code below, I get a plot
where the elements of the index are taken one after another and just displayed at the specified distance.
This code does what I want:
import numpy as np
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
s = pd.Series([-4,2, 3,3, 4,5,9,20]).value_counts().sort_index()
mi,ma = min(s.index), max(s.index)
s = s.reindex(range(mi,ma+1,1), fill_value=0)
distance = 10
a = s.plot(kind='bar')
condition = lambda t: int(t[1].get_text()) % distance == 0
ticks_,labels_=zip(*filter(condition, zip(a.get_xticks(), a.get_xticklabels())))
a.set_xticks(ticks_)
a.set_xticklabels(labels_)
plt.show()
But I still feel like I'm being unnecessarily clever here. Am I missing a function? Is this the best way of doing that?
Consider not using a pandas bar plot if you intend to plot numeric values, because pandas bar plots are categorical in nature.
If you use a matplotlib bar plot instead, which is numeric in nature, there is no need to tinker with the ticks at all.
s = pd.Series([-4,2, 3,3, 4,5,9,20]).value_counts().sort_index()
plt.bar(s.index, s)
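Since the x-axis of a matplotlib bar plot is numeric, equidistant ticks can then be requested from a locator instead of being filtered by hand. A small sketch, assuming a tick spacing of 10 as in the question:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove for interactive use
import matplotlib.pyplot as plt
from matplotlib.ticker import MultipleLocator
import pandas as pd

s = pd.Series([-4, 2, 3, 3, 4, 5, 9, 20]).value_counts().sort_index()

fig, ax = plt.subplots()
ax.bar(s.index, s.values)  # bars are placed at their numeric x positions
ax.xaxis.set_major_locator(MultipleLocator(10))  # one tick every 10 units
ticks = ax.get_xticks()
```

No reindexing with zero counts is needed here, because missing values simply leave empty space on the numeric axis.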
I think you overcomplicated it. You can simply use the following; you just need to find the relationship between the tick positions and the tick labels.
a = s.plot(kind='bar')
xticks = np.arange(0, ma + 1, 10)
# Bar i sits at position i and the reindexed index starts at mi, so value v sits at v - mi.
plt.xticks(xticks - mi, xticks)

matplot pandas plotting multiple y values on the same column

I'm trying to plot with matplotlib, drawing one line per value of a column that is neither x nor y.
For example, this is my DataFrame:
code reqs value
AGB 253319 57010.16528
ABC 242292 35660.58176
DCC 240440 36587.45336
CHB 172441 57825.83052
DEF 148357 34129.71166
df.plot(x='reqs',y='value',figsize=(8,4)) yields this plot:
What I'm looking for is a plot with multiple lines, one for each of the codes. Right now it draws just one line and ignores the code column.
I tried searching for an answer, but each one asks about multiple y's. I don't have multiple y's; I have the same y, split into different groups.
(Surely I'm using the wrong terms to describe what I'm trying to do; hopefully this example and image make sense.)
The result should look something like this:
So I worked out how to do exactly that, if anyone is curious:
plt_df = df
fig, ax = plt.subplots()
# Draw one line per code, all on the same axes.
for key, grp in plt_df.groupby('code'):
    ax = grp.plot(ax=ax, kind='line', x='reqs', y='value', label=key, figsize=(20, 4), title="someTitle")
plt.show()
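An alternative to the explicit loop is to reshape the data so that each code becomes its own column, after which a single plot call draws one line per code. This is a sketch with made-up sample data (several reqs values per code, unlike the excerpt in the question), and it assumes each (reqs, code) pair occurs only once, since pivot raises an error on duplicates:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove for interactive use
import pandas as pd

# Hypothetical data: two measurements per code.
df = pd.DataFrame({
    "code": ["AGB", "AGB", "ABC", "ABC", "DCC", "DCC"],
    "reqs": [100, 200, 100, 200, 100, 200],
    "value": [1.0, 2.0, 1.5, 2.5, 0.5, 3.0],
})

# Rows become reqs, columns become codes; DataFrame.plot draws one line per column.
wide = df.pivot(index="reqs", columns="code", values="value")
ax = wide.plot(figsize=(8, 4))
```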

Using pd.cut to create bins for a graph, but bin values are not coming out as expected

Here is the code I'm running:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
titanic = sns.load_dataset("titanic")
y =titanic.groupby([titanic.fare//1,'sex']).survived.mean().reset_index() #grouping by 'fare' rounded to an integer and 'sex' and then getting the survivability
x =pd.cut(y.fare, (0,17,35,70,300,515)) #I'm not sure if my format is correct but this is how I cut up the fare values
y['Fare_bins']= x # adding the newly created bins to a new column "Fare_bins' in original dataframe.
#graphing with seaborn
sns.set(style="whitegrid")
# catplot is the current name of factorplot; height replaces the old size argument
g = sns.catplot(x='Fare_bins', y='survived', col='sex', kind='bar', data=y,
                height=4, aspect=2.5, palette="muted")
g.despine(left=True)
g.set_ylabels("Survival Probability")
g.set_xlabels('Fare')
plt.show()
The problem I'm having is that the Fare_bins values show up as (0, 17].
The left side is a round bracket and the right side is a square bracket.
If possible I would like to have something like this:
(0-17) or [0-17]
Next, there seems to be a gap between the bars. I expected them to be adjoined. There are two graphs, so I don't expect all of the bars to be adjoined, but the first 5 bars (first graph) should be connected to each other, and likewise the last 5 bars (second graph).
How can I go about fixing these two issues?
It seems I can add labels.
Just by passing labels to pd.cut, I can display the fare bins the way I want:
x =pd.cut(y.fare, (0,17,35,70,300,515), labels = ('(0-17)', '(17-35)', '(35-70)', '(70-300)','(300-515)') )
As for the brackets shown around the fare groups,
according to the documentation:
right : bool, optional
Indicates whether the bins include the rightmost edge or not. If right == True (the default), then the bins [1,2,3,4] indicate (1,2], (2,3], (3,4].
I'm still not sure whether it's possible to join the bars, though.
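For what it's worth, the labels can be generated from the bin edges instead of typed by hand, and plain matplotlib bars touch each other when drawn with width=1.0. The following is a rough sketch with made-up fares and outcomes rather than the titanic data; it uses matplotlib's ax.bar directly and is not a seaborn setting:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove for interactive use
import matplotlib.pyplot as plt
import pandas as pd

edges = [0, 17, 35, 70, 300, 515]
# Build "(lo-hi)" labels from consecutive pairs of bin edges.
labels = [f"({lo}-{hi})" for lo, hi in zip(edges[:-1], edges[1:])]

fares = pd.Series([5.0, 17.0, 20.0, 60.0, 100.0, 512.0])  # hypothetical fares
survived = pd.Series([0, 1, 0, 1, 1, 1])                   # hypothetical outcomes

bins = pd.cut(fares, edges, labels=labels)  # right edge is included by default
rate = survived.groupby(bins, observed=False).mean()

fig, ax = plt.subplots()
# width=1.0 makes neighbouring bars share an edge, so no gap remains.
ax.bar(range(len(rate)), rate.values, width=1.0, edgecolor="white")
ax.set_xticks(range(len(rate)))
ax.set_xticklabels([str(label) for label in rate.index])
```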

Extra lane in heat map (pandas)

Here is my file
I plot heat map from it using the following code:
import pandas as pd
import matplotlib.pyplot as plt
new = pd.read_csv(r'path_to_file')
full_list=new.columns.values
new = new[full_list[1:]]
plt.pcolor(new, cmap='Blues')
plt.show()
The file has only 11 rows of values, but for some reason 12 rows show up. Do you know what is wrong?
Here is what the output looks like for me:
There is nothing wrong. First, this has nothing to do with pandas, so we can leave that out and consider the following example:
import matplotlib.pyplot as plt
import numpy as np
a = np.random.randint(0,10,size=(11, 2))
plt.pcolor(a, cmap='Blues')
plt.show()
We create an array with 11 rows and 2 columns and plot it. The plot also shows a 12th row.
The easiest solution is probably to just limit the axis to the number of rows:
plt.ylim([0,a.shape[0]])
which in this case is plt.ylim([0,11]).
However, we want to know more...
Is eleven special? Maybe, so let's find out by trying some other numbers.
1 to 10 work fine. 11 doesn't. 12 does, 13 doesn't.
What is special about those numbers is that matplotlib cannot easily find good axis tick marks when it is asked to plot 11, 13, ... entities, so the axis limits get rounded out to the next tick.
This is decided by the matplotlib locator.
The tricky part would now be to find a good locator for 11 entities. I think there is none, as
from matplotlib.ticker import MaxNLocator
plt.gca().yaxis.set_major_locator(MaxNLocator(nbins=11))
won't work here. But this may be a different question now.
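Put together, limiting both axes to the array shape could look like the sketch below. It uses the object-oriented interface instead of plt.ylim, which is a stylistic choice and not something the approach depends on:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove for interactive use
import matplotlib.pyplot as plt
import numpy as np

a = np.random.randint(0, 10, size=(11, 2))

fig, ax = plt.subplots()
ax.pcolor(a, cmap="Blues")
# Clamp the view to exactly the number of rows and columns in the array.
ax.set_ylim(0, a.shape[0])
ax.set_xlim(0, a.shape[1])
```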

Cutting up the x-axis to produce multiple graphs with seaborn?

The following code produces a really messy graph at the moment, because I have too many values for 'fare'. 'fare' ranges over [0-500], with most of the values within the first 100.
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
titanic = sns.load_dataset("titanic")
y =titanic.groupby([titanic.fare//1,'sex']).survived.mean().reset_index()
sns.set(style="whitegrid")
# catplot is the current name of factorplot; height replaces the old size argument
g = sns.catplot(x='fare', y='survived', col='sex', kind='bar', data=y,
                height=4, aspect=2.5, palette="muted")
g.despine(left=True)
g.set_ylabels("Survival Probability")
g.set_xlabels('Fare')
plt.show()
I would like to try slicing up the 'fare' axis of the plots into subsets, but see all the graphs at the same time on one screen. I was wondering if this is possible without having to resort to groupby.
I will have to play around with the 'fare' values to see what I want each graph to represent, but as a sample let's break up the graph into these 'fare' ranges:
[0-18]
[18-35]
[35-70]
[70-300]
[300-500]
So the total would be 10 graphs on one page, because of the juxtaposition with the opposite sex.
Is it possible with Seaborn? Do I need to do a lot of configuring with matplotlib? Thanks.
Actually I wrote a little blog post about this a while ago. If you are plotting histograms you can use the by keyword:
import matplotlib.pyplot as plt
import seaborn as sns
sns.set() #rescue matplotlib's styles from the early '90s
data = sns.load_dataset('titanic')
data.hist(by='class', column = 'fare')
plt.show()
Otherwise, if you're just plotting value counts, you have to roll your own grid:
def categorical_hist(self, column, by, layout=None, legend=None, **params):
    from math import sqrt, ceil
    if layout is None:
        # Default to the smallest square grid with room for one subplot per unique value.
        s = ceil(sqrt(self[column].unique().size))
        layout = (s, s)
    return self.groupby(by)[column]\
        .value_counts()\
        .sort_index()\
        .unstack()\
        .plot.bar(subplots=True, layout=layout, legend=legend, **params)
categorical_hist(data, by='class', column='embark_town')
Edit: If you want the survival rate by fare range, you could do something like this:
import pandas as pd
data.groupby(pd.cut(data.fare, 10)).apply(lambda x: x.survived.sum() / len(x))
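To get the fare ranges from the question next to each other for both sexes, the same idea can be extended by grouping on the fare bin and the sex, then unstacking. The sketch below uses randomly generated stand-in data (so it runs without downloading the titanic dataset), and the bin edges are the sample ranges from the question:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove for interactive use
import numpy as np
import pandas as pd

# Hypothetical stand-in for the titanic columns used here.
rng = np.random.default_rng(0)
n = 200
data = pd.DataFrame({
    "fare": rng.uniform(1, 500, n),
    "sex": rng.choice(["female", "male"], n),
    "survived": rng.integers(0, 2, n),
})

edges = [0, 18, 35, 70, 300, 500]
data["fare_bin"] = pd.cut(data["fare"], edges)

# Mean of the 0/1 survived column per (fare bin, sex) is the survival rate.
rate = (data.groupby(["fare_bin", "sex"], observed=False)["survived"]
            .mean()
            .unstack("sex"))

# One subplot per sex, with the fare bins along the x-axis.
axes = rate.plot.bar(subplots=True, layout=(1, 2), sharey=True,
                     figsize=(10, 4), legend=False)
```

This yields two panels with five bars each rather than ten separate graphs, which keeps the fare ranges directly comparable within each sex.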