How to display negative x values on the left side for barplot? - matplotlib

I have a question about seaborn's barplot.
I have a dataset returned from BigQuery and converted to a DataFrame, shown below.
Sample data from `df.sort_values(by=['dep_delay_in_minutes']).to_csv(csv_file)`:
,dep_delay_in_minutes,arrival_delay_in_minutes,numflights
1,-50.0,-38.0,2
2,-49.0,-59.5,4
3,-46.0,-28.5,4
4,-45.0,-44.0,4
5,-43.0,-53.0,4
6,-42.0,-35.0,6
7,-40.0,-26.0,4
8,-39.0,-33.5,4
9,-38.0,-21.5,4
10,-37.0,-37.666666666666664,12
11,-36.0,-35.0,2
12,-35.0,-32.57142857142857,14
13,-34.0,-30.0,18
14,-33.0,-26.200000000000003,10
15,-32.0,-34.8,10
16,-31.0,-28.769230769230766,26
17,-30.0,-34.93749999999999,32
18,-29.0,-31.375000000000004,48
19,-28.0,-24.857142857142854,70
20,-27.0,-28.837209302325583,86
I wrote the code below, but the negative values are plotted on the right-hand side.
import matplotlib.pyplot as plt
import seaborn as sb
import pandas as pd
import numpy as np
import google.datalab.bigquery as bq
import warnings
# Disable warnings
warnings.filterwarnings('ignore')
sql="""
SELECT
DEP_DELAY as dep_delay_in_minutes,
AVG(ARR_DELAY) AS arrival_delay_in_minutes,
COUNT(ARR_DELAY) AS numflights
FROM flights.simevents
GROUP BY DEP_DELAY
ORDER BY DEP_DELAY
"""
df = bq.Query(sql).execute().result().to_dataframe()
df = df.sort_values(['dep_delay_in_minutes'])
ax = sb.barplot(data=df, x='dep_delay_in_minutes', y='numflights', order=df['dep_delay_in_minutes'])
ax.set_xlim(-50, 0)
How can I display the x axis in numeric order, with negative values on the left-hand side?
I would appreciate any advice.

Specifying both the left and right limits with ax.set_xlim() does not work here. The plot displayed correctly when only one limit was specified.
import matplotlib.pyplot as plt
import seaborn as sb
import pandas as pd
import numpy as np
df = df.sort_values(['dep_delay_in_minutes'])
ax = sb.barplot(x='dep_delay_in_minutes', y='numflights', data=df, order=df['dep_delay_in_minutes'])
ax.set_xlim(0.0)
labels = ax.get_xticklabels()
plt.setp(labels, rotation=45)
plt.show()
The same call can also be written as:
ax.set_xlim(0.0,)
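A likely reason two-sided limits misbehave is that seaborn's barplot treats x as categorical by default and draws the bars at positions 0, 1, 2, and so on, so ax.set_xlim(-50, 0) is read in bar positions rather than in minutes. If a true numeric axis is wanted, one option is to draw the bars with matplotlib's ax.bar instead. The following is a minimal sketch, assuming a small hand-typed DataFrame (the first rows of the sample data) in place of the BigQuery result:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Assumption: a hand-typed stand-in for the BigQuery result
# (first rows of the sample data in the question)
df = pd.DataFrame({
    "dep_delay_in_minutes": [-50.0, -49.0, -46.0, -45.0, -43.0],
    "numflights": [2, 4, 4, 4, 4],
})

fig, ax = plt.subplots()
# ax.bar places each bar at its numeric x value, so the x axis is a real
# number line and negative delays end up on the left
ax.bar(df["dep_delay_in_minutes"], df["numflights"], width=0.8)
ax.set_xlim(-51, 0)  # limits are in minutes here, not in bar positions
ax.set_xlabel("dep_delay_in_minutes")
ax.set_ylabel("numflights")
```

Call plt.show() afterwards to display the figure. Gaps in the data (for example no flights at -48 minutes) show up as empty space, which a categorical axis would hide.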

Related

Matplotlib tick formatter for large numbers

When I run this
import numpy as np
import matplotlib.pyplot as plt
plt.plot(np.arange(1e6, 3 * 1e7, 1e6))
I get a plot (plot_before) where the y-axis looks a bit odd: there is a 1e7 at the top.
So, I am trying to fix this. I came up with a solution using FuncFormatter,
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker
plt.plot(np.arange(1e6, 3 * 1e7, 1e6))
def y_fmt(x, y):
    # x is the tick value, y is the tick position (unused)
    if x == 0:
        return r'$0$'
    r, p = "{:.1e}".format(x).split('e+')
    r = r[:-2] if r[-1] == '0' else r
    p = p[1:] if p[0] == '0' else p
    return r'${:}\times10^{:}$'.format(r, p)
plt.gca().get_yaxis().set_major_formatter(matplotlib.ticker.FuncFormatter(y_fmt))
Here is the result (plot_after).
My question is: is there a better way of doing this, maybe using LogFormatterSciNotation? Or is it possible to tell matplotlib not to put 1e7 at the top?
UPDATE:
I didn’t know that there is such a thing as
plt.ticklabel_format(useOffset=False)
but it seems that this is not doing anything for the data I used above (np.arange(1e6, 3 * 1e7, 1e6)). I don’t know if this is a bug or if there is something I don’t understand about this function...
You may want ScalarFormatter together with ticklabel_format. Some say you need both; others say ScalarFormatter alone is enough and ticklabel_format is unnecessary. I'm not entirely sure about this behaviour, but the combination below works.
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as mt
fig, ax = plt.subplots(1,1)
ax.plot(np.arange(1e6, 3 * 1e7, 1e6))
ax.yaxis.set_major_formatter(mt.ScalarFormatter(useMathText=True))
ax.ticklabel_format(style="sci", axis="y", scilimits=(0,2))
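On the UPDATE in the question: useOffset only controls the additive offset that matplotlib sometimes factors out of the tick labels, whereas the 1e7 here is the multiplier from scientific notation, which is why useOffset=False appears to do nothing. If the goal is simply to have no 1e7 at the top, switching the axis to plain notation may be enough. A minimal sketch using the same data as the question:

```python
import numpy as np
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot(np.arange(1e6, 3 * 1e7, 1e6))
# style="plain" turns off scientific notation for the default ScalarFormatter,
# so tick labels are written out in full and no exponent is shown
ax.ticklabel_format(style="plain", axis="y")
```

The trade-off is longer tick labels, so this suits cases where the full numbers are still readable.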

Plotting a dataframe with NAs with linearly joined points

I have a dataframe where each column has many missing values. How can I make a plot where the datapoints in each column are joined with lines, i.e. NAs are ignored, instead of having a choppy plot?
import numpy as np
import pandas as pd
pd.options.plotting.backend = "plotly"
d = pd.DataFrame(data = np.random.choice([np.nan] + list(range(7)), size=(10,3)))
d.plot(markers=True)
One way is to use this for each column:
import plotly.graph_objects as go
fig = go.Figure()
fig.add_trace(go.Scatter(x=x, y=y, name="linear",
                         line_shape='linear'))
Are there any better ways to accomplish this?
You can use pandas interpolate. I have demonstrated this using Plotly Express, chaining the calls so the underlying data is not changed.
Following the comments on this post, the answer has been amended so that markers are not shown for interpolated points.
import numpy as np
import pandas as pd
import plotly.express as px
d = pd.DataFrame(data=np.random.choice([np.nan] + list(range(7)), size=(10, 3)))
px.line(d).update_traces(mode="lines+markers").add_traces(
    px.line(d.interpolate(limit_direction="both")).update_traces(showlegend=False).data
)
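To see what interpolate contributes on its own, here is a small sketch on a hand-made Series (the values are made up for illustration). With the default linear method, interior gaps are filled along a straight line, and limit_direction="both" also fills a leading gap with the nearest valid value:

```python
import numpy as np
import pandas as pd

# Hypothetical column with a leading gap and an interior gap
s = pd.Series([np.nan, 1.0, np.nan, np.nan, 4.0])

# interpolate returns a new Series; s itself is left unchanged
filled = s.interpolate(limit_direction="both")
print(filled.tolist())  # [1.0, 1.0, 2.0, 3.0, 4.0]
```

Because the original Series still contains the NaNs, it can be used for the markers while the interpolated copy is used for the connecting lines.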

How to use IF to plot the values higher than x

I have an Excel file and I want to plot all values higher than 200000 (for example) in the column Verkaufspreis.
# Load the Excel file
import pandas as pd
df = pd.read_excel("wohnungspreise.xlsx")
# Plot the data
%matplotlib inline
import matplotlib.pyplot as plt
plt.scatter(df["Quadratmeter"], df["Verkaufspreis"])
if(df["Verkaufspreis"] > 200000){
plt.show()
}
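The braces above are not Python syntax, and even a Python-style `if df["Verkaufspreis"] > 200000:` would not filter anything: the comparison yields one True/False per row, and using that Series in an `if` raises a ValueError. The usual pandas approach is to filter the rows with a boolean mask before plotting. A minimal sketch, assuming a few made-up rows in place of wohnungspreise.xlsx:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Assumption: hypothetical rows standing in for wohnungspreise.xlsx
df = pd.DataFrame({
    "Quadratmeter": [50, 80, 120, 65],
    "Verkaufspreis": [150000, 240000, 410000, 200000],
})

# Boolean mask: True for rows whose price is strictly above the threshold
mask = df["Verkaufspreis"] > 200000
expensive = df[mask]

# Plot only the rows that passed the filter
fig, ax = plt.subplots()
ax.scatter(expensive["Quadratmeter"], expensive["Verkaufspreis"])
ax.set_xlabel("Quadratmeter")
ax.set_ylabel("Verkaufspreis")
```

Use `>=` in the mask if rows at exactly 200000 should be included.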

Distribution probabilities for each column data frame, in one plot

I am creating probability distributions for each column of my data frame with sns.distplot() from the seaborn library. For one plot I do
x = df['A']
sns.distplot(x);
I am trying to use FacetGrid and map to produce the plots for all columns at once, as follows, but it doesn't work at all.
g = sns.FacetGrid(df, col = 'A','B','C','D','E')
g.map(sns.distplot())
I think you need to use melt to reshape your dataframe to long format, see this MCVE:
df = pd.DataFrame(np.random.random((100,5)), columns = list('ABCDE'))
dfm = df.melt(var_name='columns')
g = sns.FacetGrid(dfm, col='columns')
g = (g.map(sns.distplot, 'value'))
From seaborn 0.11.2 it is not recommended to use FacetGrid directly. Instead, use sns.displot for figure-level plots.
np.random.seed(2022)
df = pd.DataFrame(np.random.random((100,5)), columns = list('ABCDE'))
dfm = df.melt(var_name='columns')
g = sns.displot(data=dfm, x='value', col='columns', col_wrap=3, common_norm=False, kde=True, stat='density')
You're getting this wrong on two levels.
Python syntax: FacetGrid(df, col = 'A','B','C','D','E') is invalid. col gets set to 'A', and the remaining strings are interpreted as further positional arguments. Because positional arguments may not follow a keyword argument, this is invalid Python syntax.
Seaborn concepts: Seaborn expects a single column name as input for the col or row argument. This means that the dataframe needs to be in a format that has one column which determines to which column or row of the grid the respective datum belongs.
You also do not call the function to be used by map yourself. The idea is that map calls it.
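To make the "one column which determines the facet" idea concrete, here is a small sketch of what melt does to a wide frame. The two-column frame and the var_name "columns" are made up for illustration:

```python
import pandas as pd

# Hypothetical wide frame: one column per variable
wide = pd.DataFrame({"A": [0.1, 0.2], "B": [0.3, 0.4]})

# Long format: one row per (variable, value) pair
long = wide.melt(var_name="columns")
print(long)
```

The resulting frame has a "columns" column holding the original column names and a "value" column holding the data, which is the shape that the col argument expects.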
Solutions:
Loop over columns:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.DataFrame(np.random.randn(14,5), columns=list("ABCDE"))
fig, axes = plt.subplots(ncols=5)
for ax, col in zip(axes, df.columns):
    sns.distplot(df[col], ax=ax)
plt.show()
Melt dataframe
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.DataFrame(np.random.randn(14,5), columns=list("ABCDE"))
g = sns.FacetGrid(df.melt(), col="variable")
g.map(sns.distplot, "value")
plt.show()
You can use the following:
# listing dataframes types
list(set(df.dtypes.tolist()))
# include only float and integer
df_num = df.select_dtypes(include = ['float64', 'int64'])
# display what has been selected
df_num.head()
# plot
df_num.hist(figsize=(16, 20), bins=50, xlabelsize=8, ylabelsize=8);
I think the easiest approach is to just loop the columns and create a plot.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame(np.random.random((100,5)), columns = list('ABCDE'))
for col in df.columns:
    hist = df[col].hist(bins=10)
    print("Plotting for column {}".format(col))
    plt.show()

Matplotlib float values on the axis instead of integers

I have the following code that shows the following plot. I can't get the fiscal year to display correctly on the x axis; the values are shown as if they were floats. I tried astype(int) and it didn't work. Any ideas on what I am doing wrong?
p1 = plt.bar(list(asset['FISCAL_YEAR']),list(asset['TOTAL']),align='center')
plt.show()
This is the plot:
In order to make sure only integer locations obtain a ticklabel, you may use a matplotlib.ticker.MultipleLocator with an integer number as argument.
To then format the numbers on the axes, you may use a matplotlib.ticker.StrMethodFormatter.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker
df = pd.DataFrame({"FISCAL_YEAR" : np.arange(2000,2017),
'TOTAL' : np.random.rand(17)})
plt.bar(df['FISCAL_YEAR'],df['TOTAL'],align='center')
locator = matplotlib.ticker.MultipleLocator(2)
plt.gca().xaxis.set_major_locator(locator)
formatter = matplotlib.ticker.StrMethodFormatter("{x:.0f}")
plt.gca().xaxis.set_major_formatter(formatter)
plt.show()
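If a fixed tick step is not needed, another option is matplotlib.ticker.MaxNLocator with integer=True, which chooses tick locations automatically but restricts them to integers. A minimal sketch, assuming made-up totals in place of the asset dataframe:

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import MaxNLocator

# Assumption: hypothetical data standing in for the asset dataframe
years = np.arange(2000, 2017)
totals = np.linspace(1.0, 2.0, len(years))

fig, ax = plt.subplots()
ax.bar(years, totals, align="center")
# integer=True keeps automatically chosen ticks on whole numbers
ax.xaxis.set_major_locator(MaxNLocator(integer=True))
xticks = ax.get_xticks()
```

Compared with MultipleLocator(2), this adapts the step to the axis range, at the cost of less control over exactly which years are labelled.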