matplotlib: make a pie chart for select column in Python - pandas

Dataset: https://dl.dropboxusercontent.com/s/v9gmgxupkypn5dw/train-data.csv
I want to make a pie chart for Fuel_Type.
Divide Fuel_Type into three parts: Diesel, Petrol, and Others (anything that is not Diesel or Petrol).
Make a pie chart of those parts and show the percentages on the chart.
My code is below:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('car_train_data.csv',sep= ",",index_col= 0)
temp = df.groupby("Fuel_Type", as_index=False).size()
cutoff = temp["size"].sum() * 0.05
temp_idx = temp["size"] < cutoff
other_sum = temp.loc[temp_idx, "size"].sum()
But I am really confused about how to define "not Diesel and Petrol". Can anyone help me, or show me an example that I can adapt to my code? Thanks a lot!

You can filter your dataframe based on the desired value for a specific column. Try something like:
petrol = df[df['Fuel_Type'] == 'Petrol']
diesel = df[df['Fuel_Type'] == 'Diesel']
others = df[(df['Fuel_Type'] != 'Petrol') & (df['Fuel_Type'] != 'Diesel')]
This creates the three portions you need. Then pass the size of each one (for example len(petrol), len(diesel) and len(others)) to the pie chart.
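A minimal sketch of the whole flow, from grouping to the chart, could look like this. It uses a small made-up DataFrame in place of the real CSV (the fuel types and row counts are illustrative assumptions), collapses every value other than Diesel and Petrol into "Others" with Series.where, and lets autopct print the percentages on the wedges.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up stand-in for the real dataset; only the Fuel_Type column matters here.
df = pd.DataFrame({
    "Fuel_Type": ["Diesel"] * 5 + ["Petrol"] * 3 + ["CNG", "LPG"],
})

# Keep Diesel and Petrol as they are; replace every other value with "Others".
fuel = df["Fuel_Type"].where(df["Fuel_Type"].isin(["Diesel", "Petrol"]), "Others")

# Count the rows per category and put the categories in a fixed order.
counts = fuel.value_counts().reindex(["Diesel", "Petrol", "Others"], fill_value=0)

fig, ax = plt.subplots()
# autopct formats each wedge's share of the total as a percentage label.
ax.pie(counts, labels=counts.index, autopct="%1.1f%%")
ax.set_title("Fuel_Type")
plt.show()
```

With the real data, only the DataFrame construction would be replaced by the pd.read_csv call from the question.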

Related

Seaborn hue with loc condition

I'm facing the following problem: I'd like to create an lmplot with seaborn and distinguish the colors based not on an existing column but on a condition applied to a column.
Given the following df for a rental price prediction:
area | rental price | year build | ...
40 | 400 | 1990 | ...
60 | 840 | 1995 | ...
480 | 16 | 1997 | ...
... | ... | ... | ...
sns.lmplot(x="area", y="rental price", data=df, hue = df.loc[df['year build'] > 1992])
The call above does not work. I know I can add a column representing this condition and pass that column to hue, but is there no way to give seaborn a condition for hue directly?
Thanks in advance!
You could add a new column with the boolean information and use that for the hue, for example data['at least from eighties'] = data['model_year'] >= 80. This will create a legend with the column name as its title and False and True as its entries. If you map the values to strings, those strings will appear in the legend instead. Here is an example using one of seaborn's demo datasets:
import matplotlib.pyplot as plt
import seaborn as sns
df = sns.load_dataset('mpg')
df['decenium'] = (df['model_year'] >= 80).map({False: "seventies", True: "eighties"})
sns.lmplot(x='weight', y='mpg', data=df, hue='decenium')
plt.tight_layout()
plt.show()

Bokeh grouped bar chart, changing data presented

I'm experimenting with the grouped bar chart example for Bokeh using Pandas, shown below. I am trying to see if I can get it to display the data differently; for example, I wanted the bar graph to show a count of the rows in each group. I tried that by replacing instances of 'mpg_mean' with 'mpg_count' but just got an invalid column error. I also experimented with having the graph show a sum, again using 'mpg_sum', with the same error. I'm assuming the calculation for 'mpg_mean' occurs in the groupby, but how do I get it to display the count or the sum? It's not clear from this example where any calculations happen.
Thanks in advance for any help!
from bokeh.io import output_file, show
from bokeh.palettes import Spectral5
from bokeh.plotting import figure
from bokeh.sampledata.autompg import autompg_clean as df
from bokeh.transform import factor_cmap
output_file("bar_pandas_groupby_nested.html")
df.cyl = df.cyl.astype(str)
df.yr = df.yr.astype(str)
group = df.groupby(by=['cyl', 'mfr'])
index_cmap = factor_cmap('cyl_mfr', palette=Spectral5, factors=sorted(df.cyl.unique()), end=1)
p = figure(width=800, height=300, title="Mean MPG by # cylinders and manufacturer",
           x_range=group, toolbar_location=None,
           tooltips=[("MPG", "@mpg_mean"), ("Cyl, Mfr", "@cyl_mfr")])
p.vbar(x='cyl_mfr', top='mpg_mean', width=1, source=group,
       line_color="white", fill_color=index_cmap)
p.y_range.start = 0
p.x_range.range_padding = 0.05
p.xgrid.grid_line_color = None
p.xaxis.axis_label = "Manufacturer grouped by # Cylinders"
p.xaxis.major_label_orientation = 1.2
p.outline_line_color = None
show(p)
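As far as I know, Bokeh's documentation says that a ColumnDataSource created from a pandas GroupBy stores the result of group.describe(), with the column levels joined by an underscore. That would be where 'mpg_mean' comes from, and it would also explain the missing 'mpg_sum', since describe() does not compute sums. The pandas-only sketch below (made-up data, not the autompg sample) shows which column names describe() produces and one way to compute a count and a sum yourself with agg.

```python
import pandas as pd

# Made-up data with the same column names as the example above.
df = pd.DataFrame({
    "cyl": ["4", "4", "4", "6", "6"],
    "mfr": ["ford", "ford", "honda", "ford", "ford"],
    "mpg": [30.0, 32.0, 35.0, 20.0, 22.0],
})

group = df.groupby(by=["cyl", "mfr"])

# describe() yields (column, statistic) pairs; joining them with "_"
# gives names such as mpg_count, mpg_mean, mpg_std, mpg_min, mpg_max.
stats = group.describe()
stats.columns = ["_".join(col) for col in stats.columns]
print(list(stats.columns))

# Statistics that describe() lacks, such as the sum, can be computed explicitly.
summary = group["mpg"].agg(["count", "sum", "mean"]).add_prefix("mpg_")
print(summary)
```

The summary frame is the kind of table you would then plot from, with 'mpg_count' or 'mpg_sum' as the bar height.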

Plotly base values are in percentage

I have a table in which one of my columns holds percentage values:
ID TYPE PERCENTAGE
1 gold 15%
2 silver 71.4%
3 platinum 20%
4 copper 88.88%
But Plotly doesn't like that.
Do you know how I could tell it "hey, these data are percentages, please show me a percentage graph"?
I assume the answer should use Plotly, so I created the chart in Plotly. I converted the percentages in the existing data frame to decimal format, then set the Y-axis tick format to '%'.
import plotly.express as px
df['PERCENTAGE'] = df['PERCENTAGE'].apply(lambda x:float(str(x).strip('%')) / 100)
fig = px.bar(df, x='TYPE', y='PERCENTAGE')
fig.update_layout(yaxis_tickformat='%')
fig.show()
Does this work for you?
import pandas as pd
import matplotlib.pyplot as plt
df.PERCENTAGE = df.PERCENTAGE.str.replace('%', '')  # remove the % sign
df.PERCENTAGE = pd.to_numeric(df.PERCENTAGE)  # convert to numeric
plt.bar(df.TYPE, df.PERCENTAGE)  # plot
plt.ylabel('Percentage')
plt.show()
Note that you can always check the types of your data with df.dtypes.

Creating a barplot in python seaborn with error bars showing standard deviation

I am new to Python.
I am analyzing a dataset and need some help plotting a barplot with error bars showing the SD.
An example dataset is available at the following link: https://drive.google.com/file/d/10JDr7d_vhEocWzChg-sfBEumsWVghFS8/view?usp=sharing
Here is the code that I am using:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
df = pd.read_excel('Sample_data.xlsx')
#Adding a column 'Total' by adding all cell counts in each row
#This will give the cells counted in each sample
df['Total'] = df['Cell1'] + df['Cell2'] + df['Cell3'] + df['Cell4']
df
# Creating a pivot table based on Timepoint and cell types
phenotype = df.pivot_table(index=['Timepoint'],
                           values=['Cell1', 'Cell2', 'Cell3', 'Cell4'],
                           aggfunc=np.sum,
                           margins=False)
phenotype
# plot different cell types grouped according to the timepoint and error bars = SD
sns.barplot(data = phenotype)
Now I am stuck on plotting the cell types grouped by the Timepoint column, with error bars showing the SD.
Any help is much appreciated.
Thanks.
If you swap the rows and columns of the pivot table, you get the format you want. Does this fit the intent of your question?
phenotype = df.pivot_table(index=['Time point'],
                           values=['Cell1', 'Cell2', 'Cell3', 'Cell4'],
                           aggfunc=np.sum,
                           margins=False)
# Swap rows and columns so that each time point becomes a column
phenotype = phenotype.stack().unstack(level=0)
phenotype
Time point 48 72 96
Cell1 54 395 57
Cell2 33 35 39
Cell3 1 3 9
Cell4 2 6 3
sns.boxplot(data = phenotype)
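Since the question asks for bars with SD error bars rather than a box plot, one possible sketch is to compute the mean and the standard deviation per time point yourself and hand both to the pandas bar plot through yerr. The data below is made up (the column names follow the question, the values are assumptions). In seaborn itself, the long-form equivalent would be sns.barplot with errorbar="sd" (seaborn 0.12 or later; ci="sd" in older versions).

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up replicate counts: three samples per time point.
df = pd.DataFrame({
    "Timepoint": [48, 48, 48, 72, 72, 72],
    "Cell1": [10, 12, 14, 20, 22, 24],
    "Cell2": [5, 5, 5, 7, 9, 11],
})

cells = ["Cell1", "Cell2"]
grouped = df.groupby("Timepoint")[cells]
means = grouped.mean()  # bar heights
sds = grouped.std()     # sample standard deviation (ddof=1)

# One group of bars per time point, one bar per cell type, SD as error bars.
ax = means.plot.bar(yerr=sds, capsize=4, rot=0)
ax.set_ylabel("Mean cell count")
plt.show()
```

Note that this needs the individual samples: once the pivot table has summed them per time point, the spread between samples is no longer available.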

Align Pandas barchart style

I went through the documentation at:
https://pandas.pydata.org/pandas-docs/stable/style.html
section "Bar Charts", but when I try to use the same code with this simple dataframe, something seems to go wrong.
My code is the following:
import pandas as pd
df = pd.DataFrame([[0,0,19],[0,-3,16],[1,0,21]], columns = ["5D Net Chg","20D Net Chg","60D Net Chg"] )
df.style.bar(subset=["5D Net Chg","20D Net Chg","60D Net Chg"], align='mid', color=['#d65f5f', '#5fba7d'])
If you run this code, you should see what I see, that is:
* the -3 cell is fully red, whereas I would expect to see a red bar starting from the middle of the cell
* the cell with value 1 is fully green. I would expect to see a tiny green bar starting from the middle of the cell, given the max value in the dataframe is 21
What could I do to make it look more like the "Conditional Formatting" data bars we have in Excel?
Thanks
What you want is to align the value 0 at the middle of the cell, so you should choose align='zero'. Note that the alignment works column-wise.
import pandas as pd
df = pd.DataFrame([[0,0,-5],[0,-3,16],[1,2,21]], columns = [1,2,3] )
df.style.bar(subset=[1,2,3], align='zero', color=['#d65f5f', '#5fba7d'])
I think everything is correct; it's just the values that you have. Changing the values to the following:
import pandas as pd
df = pd.DataFrame([[0,2,19],[0,-3,16],[1,-4,21]],
                  columns = ["5D Net Chg","20D Net Chg","60D Net Chg"] )
I obtain this
df.style.bar(subset=["5D Net Chg","20D Net Chg","60D Net Chg"], align='mid', color=['#d65f5f', '#5fba7d'])