Extra lane in heat map (pandas) - pandas

Here is my file
I plot heat map from it using the following code:
import pandas as pd
import matplotlib.pyplot as plt
new = pd.read_csv(r'path_to_file')
full_list=new.columns.values
new = new[full_list[1:]]
plt.pcolor(new, cmap='Blues')
plt.show()
File has only 11 rows of values, but for some reason 12 rows show up. Do you know what is wrong?
Here is how output looks for me:

There is nothing wrong. First, this has nothing to do with pandas, so we can leave that out and consider the following example:
import matplotlib.pyplot as plt
import numpy as np
a = np.random.randint(0,10,size=(11, 2))
plt.pcolor(a, cmap='Blues')
plt.show()
We create an array with 11 rows and 2 columns and plot it. It also shows a 12th row.
The easiest solution is probably to just limit the axis to the number of rows
plt.ylim([0,a.shape[0]])
in this case plt.ylim([0,11]).
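As a minimal sketch, the same fix can be written against the axes object so that both axes are clamped to the array shape. The array is random example data, and the Agg backend is only selected so the snippet also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to get a window
import matplotlib.pyplot as plt
import numpy as np

a = np.random.randint(0, 10, size=(11, 2))  # 11 rows, 2 columns

fig, ax = plt.subplots()
ax.pcolor(a, cmap="Blues")

# Clamp both axes to the array shape so no empty row or column is drawn.
ax.set_ylim(0, a.shape[0])
ax.set_xlim(0, a.shape[1])
```

Call plt.show() afterwards when using an interactive backend.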
However we want to know more...
Is eleven special? Maybe, so let's find out by putting some other numbers in.
Sizes 1 to 10 work fine. 11 does not, 12 does, 13 does not.
What is special about those numbers is that matplotlib cannot easily find good axis tick marks when it is asked to plot 11, 13, ... entities.
This is decided by the matplotlib locator.
The tricky part now is to find a good locator for 11 entities. I think there is none, as
plt.gca().yaxis.set_major_locator(MaxNLocator(nbins=11))
(with MaxNLocator imported from matplotlib.ticker) won't work here. But this may be a different question.
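To see what the locator proposes, one can ask it directly. This is a rough sketch that assumes the default AutoLocator is the locator in use. The helper name proposed_ticks is made up for illustration. Whether the view limits are then rounded out to the last tick depends on the matplotlib version and on the axes.autolimit_mode setting.

```python
from matplotlib.ticker import AutoLocator


def proposed_ticks(n_rows):
    """Return the ticks the default locator proposes for the range 0..n_rows."""
    return AutoLocator().tick_values(0, n_rows)


for n_rows in (10, 11, 12, 13):
    ticks = proposed_ticks(n_rows)
    # If the last tick lies beyond n_rows, limits rounded to ticks add empty space.
    print(n_rows, "rows -> last tick at", ticks[-1])
```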

Related

How to make bar charts for multiple groups?

I have a dataframe with stats for multiple groups in two classes, e.g.:
player position Points target_class
lebron sf 23 1
Magic pg 22 0
How do I make bar charts of the average points per position (there are 5 of them), split by class? That is, side-by-side plots in pandas.
Without more information it's hard to know what you are expecting. Using sns.barplot like so should get you close to what I think you want.
import seaborn as sns
import numpy as np
ax = sns.barplot(
data=df,
x="position",
y="Points",
hue="target_class",
estimator=np.mean,
)
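Since the question asks for pandas, here is a rough pandas-only sketch of the same grouped bar chart. The rows of df are hypothetical stand-ins for the real data; only the column names come from the question.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to get a window
import pandas as pd

# Hypothetical sample rows using the column names from the question.
df = pd.DataFrame({
    "player": ["a", "b", "c", "d"],
    "position": ["sf", "pg", "sf", "pg"],
    "Points": [23, 22, 17, 10],
    "target_class": [1, 0, 0, 1],
})

# Mean points per (position, class); unstack turns the classes into columns,
# so each position gets one bar per class, drawn side by side.
means = df.groupby(["position", "target_class"])["Points"].mean().unstack()
ax = means.plot.bar()
ax.set_ylabel("average Points")
```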

How can I plot two columns with string values in a dataset with Matplotlib?

I have the following dataset and I want to create a plot that compares two columns with each other.
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
ds=pd.read_csv('http://bit.ly/uforeports') #My DataSet
ds.head(5) # Only the first 5 rows to show
ds1= ds.head(4).drop(['Colors Reported','State'],axis=1) # Dropping unnecessary columns
print(ds1)
Now I wanna compare "City" and "Shape Reported" with help of plotting. I found something with Pandas but this is not so elegant!
x=ds.loc[0:100,['State']]
y=ds.loc[0:100,['Shape Reported']]
x.apply(pd.value_counts).plot(kind='bar', subplots=True)
y.apply(pd.value_counts).plot(kind='bar', subplots=True)
Do you know a better solution with Matplotlib to this problem?
This is what I want
It's not exactly clear how you want to compare them.
The simplest way of drawing a bar chart is:
df['State'].value_counts().plot.bar()
df['Shape Reported'].value_counts().plot.bar()
If you just want to do it for the first 100 rows as in your example, just add head(100):
df['State'].head(100).value_counts().plot.bar()
df['Shape Reported'].head(100).value_counts().plot.bar()
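If the goal is to see both columns in one figure with plain Matplotlib, a minimal sketch could look like this. The DataFrame below is a hypothetical stand-in for the UFO reports data; only the column names are taken from the question.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to get a window
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the UFO reports data.
df = pd.DataFrame({
    "State": ["NY", "NJ", "NY", "CA", "NY", "CA"],
    "Shape Reported": ["DISK", "LIGHT", "DISK", "CIRCLE", "LIGHT", "DISK"],
})

columns = ["State", "Shape Reported"]
fig, axes = plt.subplots(1, len(columns), figsize=(10, 4))
for ax, column in zip(axes, columns):
    counts = df[column].value_counts()  # sorted, most frequent first
    ax.bar(list(counts.index), counts.values)
    ax.set_title(column)
    ax.set_ylabel("count")
fig.tight_layout()
```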
EDIT:
To compare the two values you can plot a bivariate distribution plot. This is easily done with seaborn:
import seaborn as sns
sns.displot(df, x='State', y='Shape Reported', height=6, aspect=1.33)
Result:

Multiple Axes and Plots

Sorry if the post is not that good. It's my first one on Stack Overflow.
I have Datasets in the following structure:
Revolution1 Position1 Temperature1 Revolution2 Position2 Temperature2
1/min mm C 1/min m C
datas....
I plot these against time. Now I want a new y axis for every different unit. So I looked at the matplotlib example and wrote something like this. x holds the x values and d is the pandas dataframe:
fig, host = plt.subplots()
fig.subplots_adjust(right=0.75)
par1 = host.twinx()
par2 = host.twinx()
uni_units = np.unique(units[1:])
par2.spines["right"].set_position(("axes", 1.2))
make_patch_spines_invisible(par2)
# Second, show the right spine.
par2.spines["right"].set_visible(True)
for i, v in enumerate(header[1:]):
    if d.loc[0, v] == uni_units[0]:
        y = d.loc[an:en, v].values
        host.plot(x, y, label=v)
    if d.loc[0, v] == uni_units[1]:
        y = d.loc[an:en, v].values
        par1.plot(x, y, label=v)
    if d.loc[0, v] == uni_units[2]:
        y = d.loc[an:en, v].values
        par2.plot(x, y, label=v)
EDIT: Okay, I forgot to actually ask the question (maybe I was nervous, because it was my first time posting here):
I wanted to ask why it does not work, since I only saw 2 plots. But by zooming in I saw that it actually plots every curve...
Sorry!
If I understand correctly, what you want is to get subplots from the DataFrame.
You can achieve this with the subplots parameter of the DataFrame's plot function.
The toy sample below should give you a better idea of how to do this:
import matplotlib.pyplot as plt
import pandas as pd
df = pd.DataFrame({"y1":[1,5,3,2],"y2":[10,12,11,15]})
df.plot(subplots=True)
plt.show()
This produces the figure below:
You may check the pandas DataFrame documentation about subplots.
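If one shared plot with a separate y axis per unit is preferred over subplots, a rough sketch is to create one twin axis per distinct unit and route each column to the axis of its unit. The units mapping and the sample values below are assumptions for illustration; only the column names follow the question.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to get a window
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical unit per column and hypothetical measurements.
units = {"Revolution1": "1/min", "Position1": "mm", "Temperature1": "C",
         "Revolution2": "1/min", "Position2": "mm", "Temperature2": "C"}
df = pd.DataFrame({
    "Revolution1": [1000, 1200, 1100], "Position1": [1.0, 1.5, 2.0],
    "Temperature1": [20, 25, 30], "Revolution2": [900, 950, 1000],
    "Position2": [0.5, 0.7, 0.9], "Temperature2": [22, 24, 26],
})

fig, host = plt.subplots()
fig.subplots_adjust(right=0.75)  # leave room for the extra axes

# One axis per distinct unit: the host first, then twins with offset spines.
axes_by_unit = {}
for i, unit in enumerate(dict.fromkeys(units.values())):
    ax = host if i == 0 else host.twinx()
    if i > 1:
        ax.spines["right"].set_position(("axes", 1 + 0.2 * (i - 1)))
    ax.set_ylabel(unit)
    axes_by_unit[unit] = ax

# Plot every column on the axis that belongs to its unit.
for column, unit in units.items():
    axes_by_unit[unit].plot(df.index, df[column], label=column)
```

Curves that share a unit also share an axis, so curves with very different magnitudes can still look flat next to each other, which is the effect described in the question's edit.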

Basic axis malfunction in matplotlib

When plotting using matplotlib, I ran into an interesting issue where the y axis is scaled by a very inconvenient quantity. Here's a MWE that demonstrates the problem:
import numpy as np
import matplotlib.pyplot as plt
l = np.linspace(0.5,2,2**10)
a = (0.696*l**2)/(l**2 - 9896.2e-9**2)
plt.plot(l,a)
plt.show()
When I run this, I get a figure that looks like this picture
The y-axis clearly is scaled by a silly quantity even though the y data are all between 1 and 2.
This is similar to the question:
Axis numerical offset in matplotlib
I'm not satisfied with the answer to this question, in that it makes no sense to me why I need to go through the convoluted process of changing axis settings when the data are between 1 and 2 (EDIT: between 0 and 1). Why does this happen? Why does matplotlib use such a bizarre scaling?
The data in the plot are all between 0.696000000017 and 0.696000000273. For such cases it makes sense to use some kind of offset.
If you don't want that, you can use your own formatter:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker
l = np.linspace(0.5,2,2**10)
a = (0.696*l**2)/(l**2 - 9896.2e-9**2)
plt.plot(l,a)
fmt = matplotlib.ticker.StrMethodFormatter("{x:.12f}")
plt.gca().yaxis.set_major_formatter(fmt)
plt.show()
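A quick check of the data range makes the reason for the offset visible, and the offset can also be switched off without writing a formatter. This is a minimal sketch that assumes the default ScalarFormatter is still in place on the y axis.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to get a window
import matplotlib.pyplot as plt
import numpy as np

l = np.linspace(0.5, 2, 2**10)
a = (0.696 * l**2) / (l**2 - 9896.2e-9**2)

# The whole curve spans a tiny interval just above 0.696.
print("min:", repr(a.min()), "max:", repr(a.max()), "spread:", a.max() - a.min())

fig, ax = plt.subplots()
ax.plot(l, a)
# Ask the default formatter to print full tick values instead of an offset.
ax.ticklabel_format(axis="y", useOffset=False)
```

With such a small spread the full tick values may still be rounded to the same label, which is why the explicit format string above can be the more readable choice.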

Cutting up the x-axis to produce multiple graphs with seaborn?

The following code when graphed looks really messy at the moment. The reason is I have too many values for 'fare'. 'Fare' ranges from [0-500] with most of the values within the first 100.
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
titanic = sns.load_dataset("titanic")
y =titanic.groupby([titanic.fare//1,'sex']).survived.mean().reset_index()
sns.set(style="whitegrid")
g = sns.factorplot(x='fare', y= 'survived', col = 'sex', kind ='bar' ,data= y,
size=4, aspect =2.5 , palette="muted")
g.despine(left=True)
g.set_ylabels("Survival Probability")
g.set_xlabels('Fare')
plt.show()
I would like to try slicing up the 'fare' of the plots into subsets, but would like to see all the graphs at the same time on one screen. I was wondering if this is possible without having to resort to groupby.
I will have to play around with the values of 'fare' to see what I would want each graph to represent, but as a sample let's break up the graph into these 'fare' values.
[0-18]
[18-35]
[35-70]
[70-300]
[300-500]
So the total would be 10 graphs on one page, because of the juxtaposition with the opposite sex.
Is it possible with Seaborn? Do I need to do a lot of configuring with matplotlib? Thanks.
Actually I wrote a little blog post about this a while ago. If you are plotting histograms you can use the by keyword:
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
sns.set() #rescue matplotlib's styles from the early '90s
data = sns.load_dataset('titanic')
data.hist(by='class', column = 'fare')
plt.show()
Otherwise if you're just plotting value-counts, you have to roll your own grid:
def categorical_hist(self, column, by, layout=None, legend=None, **params):
    from math import sqrt, ceil
    if layout is None:
        # Square grid large enough for one subplot per distinct value.
        s = ceil(sqrt(self[column].unique().size))
        layout = (s, s)
    return self.groupby(by)[column]\
        .value_counts()\
        .sort_index()\
        .unstack()\
        .plot.bar(subplots=True, layout=layout, legend=legend, **params)
categorical_hist(data, by='class', column='embark_town')
Edit: If you want the survival rate by fare range, you could do something like this:
data.groupby(pd.cut(data.fare, 10)).apply(lambda x: x.survived.sum() / len(x))
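The same idea works with the fare ranges listed in the question instead of 10 equal-width bins. The sketch below uses a small hypothetical table in place of the titanic dataset so that it runs without a download. It still relies on groupby, with pd.cut doing the slicing.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to get a window
import pandas as pd

# Hypothetical stand-in for the titanic data (same column names).
data = pd.DataFrame({
    "fare": [5, 10, 20, 30, 50, 80, 400, 7],
    "sex": ["male", "female", "male", "female",
            "male", "female", "male", "female"],
    "survived": [0, 1, 0, 1, 1, 1, 0, 1],
})

# Bin edges taken from the ranges in the question.
bins = [0, 18, 35, 70, 300, 500]
fare_range = pd.cut(data["fare"], bins=bins, include_lowest=True)

# Mean of the 0/1 survived column is the survival rate per (range, sex).
rates = (data.groupby([fare_range, "sex"], observed=False)["survived"]
             .mean()
             .unstack())

# One panel per sex, one bar per fare range.
axes = rates.plot.bar(subplots=True, layout=(1, 2), legend=False)
```

Combinations with no passengers come out as NaN and are drawn as missing bars.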