How to group a Box plot by the column names of a data frame in Seaborn? [duplicate]

This question already has answers here:
Boxplot of Multiple Columns of a Pandas Dataframe on the Same Figure (seaborn)
(4 answers)
Closed 24 days ago.
I'm a beginner trying to learn data science, and this is my first time using the seaborn and matplotlib libraries. I have this practice dataset and data frame:
that I want to turn into a box plot. I want the x-axis to show all of the column names and the y-axis to range from 0 to 700, but I'm not sure what to do.
I tried using: random_variable = sms.boxplot(data = df, x = ?, y = 'TAX')
which gives me a y-axis close to what I am looking for, but I don't know what the x-axis should be set to.
I thought maybe I could use the keys of the dataframe, but all I got was this mess that doesn't work:
random_variable = sms.boxplot(x = df.keys(), y = df['TAX'])
I want it to look like this, but I'm really lost on how to do it:
I apologize if this is an easy fix, but I would appreciate any help.

If you just want to display your data like that, go with:
import seaborn as sns
sns.boxplot(data=df)

Related

Cannot plot a histogram from a Pandas dataframe

I've used pandas.read_csv to generate a 1000-row dataframe with 32 columns. I'm looking to plot a histogram or bar chart (depending on data type) of each column. For columns of type 'int64', I've tried doing matplotlib.pyplot.hist(df['column']) and df.hist(column='column'), as well as calling matplotlib.pyplot.hist on df['column'].values and df['column'].to_numpy(). Weirdly, they all take a really long time (>30s), and when I've allowed them to complete, I get unit-height bars in multiple colors, as if there's some sort of implicit grouping and they're all being separated into different groups. Any ideas about what I can do to get a normal histogram? Unfortunately I closed the charts so I can't show you an example right now.
Edit - this seems to be a much bigger problem with Int columns, and casting them to float fixes the problem.
Follow these two steps:
import the pyplot module from the Matplotlib library
call its hist function, which accepts a dataframe column as its argument
import matplotlib.pyplot as plt
plt.hist(df['column'], color='blue', edgecolor='black', bins=int(45/1))
Here's the source.
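The edit in the question suggests the column's dtype or shape matters. Without the original data the real cause cannot be confirmed, but one known way to get slow, unit-height, multi-colored bars is handing plt.hist a 2-D input with many columns, since each column is then treated as its own dataset. A defensive sketch, assuming a column literally named column and using made-up integer data, is to check the dtype and coerce to a plain 1-D float array before plotting:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up stand-in for one int64 column of the 1000-row frame.
rng = np.random.default_rng(0)
df = pd.DataFrame({"column": rng.integers(0, 45, size=1000)})

print(df["column"].dtype)  # check what read_csv actually inferred

# Coerce to a 1-D float array; entries that cannot be parsed become NaN
# and are dropped before plotting.
values = pd.to_numeric(df["column"], errors="coerce").dropna().to_numpy(dtype=float)

fig, ax = plt.subplots()
counts, edges, _ = ax.hist(values, bins=45, color="blue", edgecolor="black")
```

If values.ndim is not 1 or the dtype printed is object, that would be the first thing to investigate in the real data.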

Plot data using facet-grid in seaborn [duplicate]

This question already has answers here:
How to change the number or rows and columns in my catplot
(2 answers)
Seaborn multiple barplots
(2 answers)
subplotting with catplot
(1 answer)
Closed 4 months ago.
I have this dataset table
and I want to plot the profit made by each sub_category in each region.
Right now I am using this code to make a plot with seaborn:
sns.barplot(data=sub_category_profit,x="sub_category",y="profit",hue="region")
I am getting an extremely large plot like this:
Is there any way I can get subplots of this, like a facet grid, with one subplot per sub_category? I have tried the FacetGrid function, but it is not working properly either.
g=sns.FacetGrid(data=sub_category_profit,col="sub_category")
g.map(sns.barplot(data=sub_category_profit,x="region",y="profit"))
I am getting the following output:
As you can see, in the facet grid output the plots are very small and the bar graph appears on only one of the facets.
See the docs on seaborn.FacetGrid, particularly the posted example: do not pass the data again in the map call; pass only the plot function and the x and y variable names, so that each facet draws its own subset. In your code, sns.barplot(...) is called immediately and draws a single time on the current axes, which is why the bars show up on only one facet.
Also, since you do not specify row, consider the col_wrap argument to avoid a very wide plot.
g=sns.FacetGrid(data=sub_category_profit, col="sub_category", col_wrap=4)
g.map_dataframe(sns.barplot, x="region", y="profit")
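A self-contained sketch of that approach is below. The sub-category names, region names, and profit values are invented for illustration; only the column names come from the question. Passing order explicitly keeps the bars in the same position on every facet. The last call shows sns.catplot, which builds the same kind of grid in one step and may be the simpler option.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import pandas as pd
import seaborn as sns

# Invented data with the same column names as in the question.
regions = ["East", "West", "North", "South"]
sub_categories = ["Chairs", "Phones", "Paper", "Tables", "Binders"]
rows = [
    {"sub_category": s, "region": r, "profit": 10 * i + j}
    for i, s in enumerate(sub_categories)
    for j, r in enumerate(regions)
]
sub_category_profit = pd.DataFrame(rows)

# One facet per sub_category, wrapped after 3 columns.
g = sns.FacetGrid(data=sub_category_profit, col="sub_category",
                  col_wrap=3, height=2.5)
g.map_dataframe(sns.barplot, x="region", y="profit", order=regions)
g.set_titles("{col_name}")

# Figure-level shortcut that creates the grid and draws the bars together.
g2 = sns.catplot(data=sub_category_profit, kind="bar",
                 x="region", y="profit",
                 col="sub_category", col_wrap=3, height=2.5)
```

The height argument controls the size of each facet, which addresses the "very small plots" complaint.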

Unable to obtain desired line graph through dataframe.plot()

years = list(map(str,range(1980,2014)))
df_can.loc[['Haiti'],years].plot(kind='line')
plt.title('Immigration from Haiti')
plt.ylabel('Number of immigrants')
plt.xlabel('Years')
plt.show()
This is the plot I'm getting from the above code
https://i.stack.imgur.com/nqM5F.png instead of a line graph. I have tried several different methods and am still not able to get the desired line graph.
I don't have your data, so I can't recreate the graph, but I'm pretty sure you're just missing a for loop.
years = list(map(str,range(1980,2014)))
for i in years:
    df_can.loc[['Haiti'],i].plot(kind='line')
plt.title('Immigration from Haiti')
plt.ylabel('Number of immigrants')
plt.xlabel('Years')
plt.show()
Right now as your code stands you're trying to plot all of the years, and all of the data, on one line. It doesn't intuitively know to break it up.
A line graph needs the years along the index (one row per year), but the selection above is a single row with the years 1980 to 2013 as columns, and DataFrame.plot draws one line per column. So we transpose the selection to get one column of immigrant counts indexed by year, and since the years are column names stored as strings, we convert them to integers.
years = list(map(str,range(1980,2014)))
# transpose first, so that each year becomes a row
df_canada=df_can.loc[['Haiti'],years].transpose()
df_canada.index= df_canada.index.map(int)
df_canada.plot(kind='line')
plt.title('Immigration from Haiti')
plt.ylabel('Number of immigrants')
plt.xlabel('Years')
plt.show()
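A self-contained variant of the same idea, for anyone without the original data: the frame below is a made-up stand-in for df_can with one row per country and one string-named column per year. Selecting the row with single brackets returns a Series indexed by year, which plots as a single line without an explicit transpose.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

years = list(map(str, range(1980, 2014)))

# Made-up stand-in for df_can: countries as rows, years as columns.
rng = np.random.default_rng(0)
df_can = pd.DataFrame(
    rng.integers(0, 5000, size=(2, len(years))),
    index=["Haiti", "Iceland"],
    columns=years,
)

# Single brackets give a Series (index = years) instead of a 1-row DataFrame.
haiti = df_can.loc["Haiti", years]
haiti.index = haiti.index.map(int)  # '1980' -> 1980 so the x-axis is numeric

ax = haiti.plot(kind="line")
ax.set_title("Immigration from Haiti")
ax.set_ylabel("Number of immigrants")
ax.set_xlabel("Years")
```

In an interactive session, finish with plt.show() as in the original code.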

Multiple Axes and Plots

Sorry if this post is not that good; it's my first one on Stack Overflow.
I have Datasets in the following structure:
Revolution1 Position1 Temperature1 Revolution2 Position2 Temperature2
1/min mm C 1/min m C
datas....
I plot these against time. Now I want a separate y-axis for each distinct unit, so I looked at the matplotlib example and wrote something like this. Here x holds the x values and d is the pandas DataFrame:
fig,host=plt.subplots()
fig.subplots_adjust(right=0.75)
par1 = host.twinx()
par2 = host.twinx()
uni_units = np.unique(units[1:])
par2.spines["right"].set_position(("axes", 1.2))
make_patch_spines_invisible(par2)
# Second, show the right spine.
par2.spines["right"].set_visible(True)
for i,v in enumerate(header[1:]):
    if d.loc[0,v] == uni_units[0]:
        y=d.loc[an:en,v].values
        host.plot(x,y,label=v)
    if d.loc[0,v] == uni_units[1]:
        y=d.loc[an:en,v].values
        par1.plot(x,y,label=v)
    if d.loc[0,v] == uni_units[2]:
        y=d.loc[an:en,v].values
        par2.plot(x,y,label=v)
EDIT: Okay, I forgot to actually ask the question (maybe I was nervous, since this was my first time posting here):
I wanted to ask why it does not work, since I only saw 2 plots. But after zooming in I saw that it actually plots every curve...
Sorry!
If I understand correctly, what you want is to get subplots from the DataFrame.
You can do this with the subplots parameter of the DataFrame's plot method.
The toy sample below gives a better idea of how this works:
import matplotlib.pyplot as plt
import pandas as pd
df = pd.DataFrame({"y1":[1,5,3,2],"y2":[10,12,11,15]})
df.plot(subplots=True)
plt.show()
This produces a figure with one subplot per column.
You may check documentation about subplots for pandas Dataframe.
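For the question's original goal, one y-axis per unit rather than one subplot per column, a twinx-based sketch might look like the following. The data, the column-to-unit mapping, and the 0.2 spine offset are all made up for illustration; the idea is to create one axis per distinct unit and send each column to the axis that matches its unit.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Made-up data: column name -> (unit, values).
x = np.linspace(0, 10, 50)
series = {
    "Revolution1": ("1/min", 1500 + 100 * np.sin(x)),
    "Position1": ("mm", 5 * x),
    "Temperature1": ("C", 20 + 0.5 * x),
    "Revolution2": ("1/min", 1400 + 80 * np.cos(x)),
    "Temperature2": ("C", 25 + 0.3 * x),
}
# Distinct units in first-seen order.
units = list(dict.fromkeys(unit for unit, _ in series.values()))

fig, host = plt.subplots()
fig.subplots_adjust(right=0.75)  # leave room for the extra right-hand axes

axes_by_unit = {}
for k, unit in enumerate(units):
    ax = host if k == 0 else host.twinx()
    if k > 1:
        # Push each additional right-hand spine outward so they do not overlap.
        ax.spines["right"].set_position(("axes", 1.0 + 0.2 * (k - 1)))
    ax.set_ylabel(unit)
    axes_by_unit[unit] = ax

handles = []
for i, (name, (unit, y)) in enumerate(series.items()):
    # Each twin axis restarts the color cycle, so choose colors explicitly.
    (line,) = axes_by_unit[unit].plot(x, y, color=f"C{i}", label=name)
    handles.append(line)

# One combined legend for the lines from all axes.
host.legend(handles=handles, loc="upper left")
```

Because every axis autoscales independently, curves with very different magnitudes stay visible without zooming.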

Hide matplotlib descriptions in jupyter notebook [duplicate]

This question already has answers here:
Disable the output of matplotlib pyplot
(5 answers)
Closed 6 years ago.
I am not sure what the correct term for this is, but here is what I see when I plot something:
The plot is actually what I want to see, but Jupyter notebook also outputs some text: <matplotlib.axes._subplots.AxesSubplot at 0x1263354d0>, <matplotlib.figure.Figure at 0x1263353d0> which I am trying to get rid of.
After some searching, the only thing I was able to find is plt.ioff(), which didn't help me. Is there a way to get rid of the text?
You can end the corresponding (matplotlib) line with a semicolon (;).
This is a bit of a workaround, but it should work consistently. Any one of the following three options is enough:
1. Assign the return value of the plotting call to a variable (which can also be useful if you need to access plot elements later on)
plt.figure(figsize=(3, 3))
plot = plt.plot(range(10),
                [x*x for x in range(10)],
                'o-')
2. Add a "pass" at the bottom of the cell (or an equivalent operation with no consequence)
plt.figure(figsize=(3, 3))
plt.plot(range(10),
         [x*x for x in range(10)],
         'o-')
pass
3. Add a semicolon at the end of the last statement
plt.figure(figsize=(3, 3))
plt.plot(range(10),
         [x*x for x in range(10)],
         'o-');
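The reason all three options work is that a notebook echoes the value of the last expression in a cell, and matplotlib plotting calls return objects (plt.plot returns a list of Line2D artists). The short sketch below, with variable names chosen only for illustration, makes that return value visible and shows the assignment form of the workaround.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(3, 3))

# The plotting call returns the artists it created.
lines = ax.plot(range(10), [x * x for x in range(10)], "o-")
print(type(lines), type(lines[0]))

# Assigning the last call's return value (here to a throwaway name) means the
# cell no longer ends in a bare expression, so no repr is echoed.
_ = ax.set_title("squares")
```

In an interactive notebook, ending the cell with plt.show() has the same effect, since it returns None.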