I am new to machine learning and want to compare predicted and actual values. I would like to plot both sets of data together to see whether the values are the same.
data:
[-0.26112159, 1.84683522, 2.23912728, 1.58848056, 1.28589823,
2.01355579, -0.144594 , 0.8845673 , -0.19764173, 0.00837658,
1.3515489 , 0.18876488, 1.07088203, 1.11333346, 0.99854107,
1.67141781, 1.74938417, 1.17907989, 1.57017018, 2.04269495,
-0.10662102, 0.96283466, -0.01117658, 0.01610438, 1.31111783,
-0.08608504, -0.09535655, -0.0227967 , 1.82867539, 1.4492189 ]
This is my data sample for both the A and B datasets.
I want to plot it like this:
I prefer using seaborn.
Seaborn provides elaborate functions for displaying data and internally depends heavily on matplotlib. As the requested plot doesn't fall into the categories Seaborn excels at, it seems more appropriate to use matplotlib's scatter directly:
import numpy as np
from matplotlib import pyplot as plt
A = [-0.26112159, 1.84683522, 2.23912728, 1.58848056, 1.28589823,
2.01355579, -0.144594, 0.8845673, -0.19764173, 0.00837658,
1.3515489, 0.18876488, 1.07088203, 1.11333346, 0.99854107,
1.67141781, 1.74938417, 1.17907989, 1.57017018, 2.04269495,
-0.10662102, 0.96283466, -0.01117658, 0.01610438, 1.31111783,
-0.08608504, -0.09535655, -0.0227967, 1.82867539, 1.4492189]
B = A
x = np.arange(len(A))
plt.scatter(x - 0.2, A, marker='o', color='tomato', label='Dataset A')
plt.scatter(x + 0.2, B, marker='o', color='deepskyblue', label='Dataset B')
plt.legend()
plt.show()
To have crosses as markers, use marker='x' or marker='+'.
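Another common way to compare predictions with actual values is to plot one against the other and add a diagonal reference line; points on the diagonal are exact matches. The following is only a minimal sketch: the helper name `plot_predicted_vs_actual` and the noisy sample data are made up for illustration and are not part of the question.

```python
import numpy as np
from matplotlib import pyplot as plt


def plot_predicted_vs_actual(actual, predicted, ax=None):
    """Scatter predicted against actual values with a y = x reference line."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    if ax is None:
        _, ax = plt.subplots(figsize=(5, 5))
    ax.scatter(actual, predicted, marker='x', color='tomato', label='predictions')
    # Points lying on this diagonal are perfect predictions.
    lo = min(actual.min(), predicted.min())
    hi = max(actual.max(), predicted.max())
    ax.plot([lo, hi], [lo, hi], color='gray', linestyle='--', label='predicted = actual')
    ax.set_xlabel('actual')
    ax.set_ylabel('predicted')
    ax.legend()
    return ax


# Hypothetical data: "predicted" is "actual" plus a little noise.
rng = np.random.default_rng(0)
actual = rng.normal(1.0, 0.8, size=30)
predicted = actual + rng.normal(0.0, 0.1, size=30)

ax = plot_predicted_vs_actual(actual, predicted)
# A single number summarising how far apart the two arrays are.
max_abs_error = np.max(np.abs(predicted - actual))
# Call plt.show() to display the figure when running interactively.
```

The further a point sits from the dashed line, the larger the error for that sample, which can be easier to read than two interleaved scatter series.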
To draw a swarmplot (very similar to a stripplot) via Seaborn:
import numpy as np
from matplotlib import pyplot as plt
import seaborn as sns
# A, B = ...
sns.swarmplot(x=np.repeat(['Dataset A', 'Dataset B'], len(A)), y=np.concatenate([A, B]))
plt.show()
A kde plot can be used to compare the statistical distribution. Here is an example with some noise added to make both sets a little different:
import numpy as np
from matplotlib import pyplot as plt
import seaborn as sns
# A = ...
A = np.array(A)
B = A + np.random.normal(0,.1, len(A))
sns.kdeplot(A, label='Dataset A')
sns.kdeplot(B, label='Dataset B')
plt.legend()  # the labels only appear once a legend is drawn
plt.show()
If you insist on using seaborn, one way is the following. However, stripplot fits such a case very well, as you don't have to pass the x-values explicitly.
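import numpy as np
import seaborn as sns
sns.set()
# A is the list from the question; recent seaborn versions require x= and y= as keywords
sns.scatterplot(x=np.arange(len(A)), y=A, marker='x', color='orange', label='Dataset A')
# convert A to an array so that the 0.1 offset also works when A is a plain list
sns.scatterplot(x=np.arange(len(A)), y=np.asarray(A) + 0.1, marker='x', color='blue', label='Dataset B')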
I just figured out a way to do this. Please let me know if there is a better method for comparing two arrays in a plot.
fig,_ax=plt.subplots(1,2,figsize=(15,10))
sns.stripplot(y=predictions,ax=_ax[0])
sns.stripplot(y=y_test,ax=_ax[1],color='red')
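One possible variation, sketched under assumptions: putting both arrays on a single axes makes them share one y-scale, which can make differences easier to spot than two independent subplots. The helper `to_long_frame` and the random stand-ins for `y_test` and `predictions` are hypothetical.

```python
import numpy as np
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt


def to_long_frame(actual, predicted):
    """Stack two arrays into a long-format DataFrame with a 'dataset' label."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return pd.DataFrame({
        'dataset': np.repeat(['actual', 'predicted'], [len(actual), len(predicted)]),
        'value': np.concatenate([actual, predicted]),
    })


# Hypothetical stand-ins for y_test and predictions.
rng = np.random.default_rng(1)
y_test = rng.normal(1.0, 0.8, size=30)
predictions = y_test + rng.normal(0.0, 0.1, size=30)

long_df = to_long_frame(y_test, predictions)
fig, ax = plt.subplots(figsize=(6, 4))
# One strip per dataset, both drawn against the same y-axis.
sns.stripplot(data=long_df, x='dataset', y='value', ax=ax)
# Call plt.show() to display the figure when running interactively.
```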
I have two DataFrames that have time-series data of BTC. I want to display the graphs side by side to analyze them.
display(data_df.plot(figsize=(15,20)))
display(model_df.plot(figsize=(15,20)))
When I plot them like this, they stack on top of each other vertically. I want them side by side so they look like this.
Here's one way that might work using subplots (I'm guessing you want a total figsize=30x20):
import matplotlib.pyplot as plt
fig,(ax0,ax1) = plt.subplots(nrows=1,ncols=2, figsize=(30,20))
data_df.plot(ax=ax0)
model_df.plot(ax=ax1)
You can use matplotlib.pyplot.subplots :
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
data_df = pd.DataFrame(np.random.randint(0,100,size=(15, 2)), columns=list('AB'))
model_df = pd.DataFrame(np.random.randint(0,100,size=(15, 2)), columns=list('AB'))
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(12,4))
for col, ax in zip(data_df, axes):
    data_df[col].plot(ax=ax, label=f"data_df ({col})")
    model_df[col].plot(ax=ax, label=f"model_df ({col})")
    ax.legend()
I'm trying to transform the scale on the y-axis to log values. For example, if one of the numbers on y is 0.01, I want to get -2 (which is log10(0.01)). How should I do this in matplotlib (or any other library)?
Thanks,
Without plt.yscale('log'), few of the visible y-ticks will have a round number as their logarithm. You can change the "formatter" to a function that only shows the exponent. Also note that in recent seaborn versions distplot has been deprecated in favor of histplot(..., kde=True) or kdeplot(...).
Here is an example:
import matplotlib.pyplot as plt
from matplotlib.ticker import LogFormatterExponent
import numpy as np
import seaborn as sns
x = np.random.randn(10, 1000).cumsum(axis=1).ravel()
ax = sns.histplot(x, kde=True, stat='density', color='purple')
ax.set_yscale('log')
ax.yaxis.set_major_formatter(LogFormatterExponent(base=10.0, labelOnlyBase=True))
ax.set_ylabel(ax.get_ylabel() + ' (exponent)')
ax.margins(x=0)
plt.show()
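For comparison, here is a small sketch of two ways to get "-2 for 0.01" on the axis, using made-up values: either transform the data with `np.log10` before plotting, or keep the data, switch to a log scale and relabel the ticks. The function name `exponent_label` is an illustrative choice, not part of matplotlib.

```python
import numpy as np
from matplotlib import pyplot as plt
from matplotlib.ticker import FuncFormatter

y = np.array([0.01, 0.1, 1.0, 10.0, 100.0])  # made-up values
x = np.arange(len(y))

fig, (ax_data, ax_ticks) = plt.subplots(1, 2, figsize=(8, 3))

# Option 1: transform the data, so the axis shows log10(y) directly.
log_y = np.log10(y)  # 0.01 -> -2, 0.1 -> -1, ...
ax_data.plot(x, log_y, marker='o')
ax_data.set_ylabel('log10(y)')


# Option 2: keep the data, use a log scale, and label ticks with log10 of the tick value.
def exponent_label(value, pos):
    return f'{np.log10(value):g}'


ax_ticks.plot(x, y, marker='o')
ax_ticks.set_yscale('log')
ax_ticks.yaxis.set_major_formatter(FuncFormatter(exponent_label))
ax_ticks.set_ylabel('log10(y)')
# Call plt.show() to display the figure when running interactively.
```

Option 1 changes the plotted numbers themselves, while option 2 only changes how the ticks are labelled, so anything that reads the data back from the axes still sees the original values.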
I am trying to create a grouped bar graph using Seaborn but I am getting a bit lost in the weeds. I actually have it working but it does not feel like an elegant solution. Seaborn only seems to support clustered bar graphs when there is a binary option such as Male/Female. (https://seaborn.pydata.org/examples/grouped_barplot.html)
It does not feel right having to fall back onto matplotlib so much - using the subplots feels a bit dirty :). Is there a way of handling this completely in Seaborn?
Thanks,
Andrew
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib import rcParams
sns.set_theme(style="whitegrid")
rcParams.update({'figure.autolayout': True})
dataframe = pd.read_csv("https://raw.githubusercontent.com/mooperd/uk-towns/master/uk-towns-sample.csv")
dataframe = dataframe.groupby(['nuts_region']).agg({'elevation': ['mean', 'max', 'min'],
'nuts_region': 'size'}).reset_index()
dataframe.columns = list(map('_'.join, dataframe.columns.values))
# We need to melt our dataframe down into a long format.
tidy = dataframe.melt(id_vars='nuts_region_').rename(columns=str.title)
# Create a subplot. A Subplot makes it convenient to create common layouts of subplots.
# https://matplotlib.org/3.3.3/api/_as_gen/matplotlib.pyplot.subplots.html
fig, ax1 = plt.subplots(figsize=(6, 6))
# https://stackoverflow.com/questions/40877135/plotting-two-columns-of-dataframe-in-seaborn
g = sns.barplot(x='Nuts_Region_', y='Value', hue='Variable', data=tidy, ax=ax1)
plt.tight_layout()
plt.xticks(rotation=45, ha="right")
plt.show()
I'm not sure why you need seaborn. Your data is wide format, so pandas does it pretty well without the need for melting:
from matplotlib import rcParams
sns.set(style="whitegrid")
rcParams.update({'figure.autolayout': True})
fig, ax1 = plt.subplots(figsize=(12,6))
dataframe.plot.bar(x='nuts_region_', ax=ax1)
plt.tight_layout()
plt.xticks(rotation=45, ha="right")
plt.show()
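To make the wide versus long distinction concrete, here is a small self-contained sketch with made-up numbers standing in for the aggregated elevation table (the column names and values are assumptions, not the real dataset):

```python
import pandas as pd
from matplotlib import pyplot as plt

# Made-up numbers standing in for the aggregated elevation table.
wide = pd.DataFrame({
    'region': ['North', 'South', 'East'],
    'elevation_mean': [120.0, 60.0, 35.0],
    'elevation_max': [400.0, 210.0, 90.0],
    'elevation_min': [5.0, 2.0, 1.0],
})

# Wide format: pandas draws one group of bars per row, one bar per column.
fig, ax = plt.subplots(figsize=(6, 4))
wide.plot.bar(x='region', ax=ax, rot=45)

# Long ("tidy") format: the shape that seaborn's hue= argument expects.
tidy = wide.melt(id_vars='region', var_name='variable', value_name='value')
# Call plt.show() to display the figure when running interactively.
```

With three regions and three value columns, `tidy` has nine rows, one per bar, whereas `wide` keeps one row per region.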
I'd like to use matplotlib to display a horizontal histogram similar to the one below:
The code below works fine for vertical histograms:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.DataFrame({'A':['Male'] * 10 + ['Female'] * 5})
plt.hist(df['A'])
plt.show()
The orientation='horizontal' parameter makes the bars horizontal, but clobbers the horizontal scale.
plt.hist(df['A'],orientation='horizontal')
The following works, but feels like a lot of work. Is there a better way?
fig=plt.figure()
ax=fig.add_subplot(1,1,1)
ax.set_xticks([0,5,10])
ax.set_xticklabels([0,5,10])
ax.set_yticks([0,1])
ax.set_yticklabels(['Male','Female'])
df['A'].hist(ax=ax,orientation='horizontal')
fig.tight_layout() # Improves appearance a bit.
plt.show()
plt.hist(df['A']) only works by coincidence. I would recommend not using plt.hist for non-numeric or categorical data; it's not meant for that.
Also, it's often a good idea to separate data aggregation from visualization. So, using pandas plotting,
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'A':['Male'] * 10 + ['Female'] * 5})
df["A"].value_counts().plot.barh()
plt.show()
Or using matplotlib plotting,
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'A':['Male'] * 10 + ['Female'] * 5})
counts = df["A"].value_counts()
plt.barh(counts.index, counts)
plt.show()
I would like to configure subplot sizes using GridSpec, as explained in this question:
Python/Matplotlib - Change the relative size of a subplot
How do I do this if I want to use the pandas DataFrame plot function? Is it possible at all?
You can do it this way.
import pandas as pd
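# pandas.io.data was removed from pandas; the separate pandas-datareader package provides DataReader
import pandas_datareader.data as web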
import matplotlib.pyplot as plt
aapl = web.DataReader('AAPL', 'yahoo', '2014-01-01', '2015-05-31')
# 3 x 1 grid, position at (1st row, 1st col), take two rows
ax1 = plt.subplot2grid((3, 1), (0, 0), rowspan=2)
# plot something on this ax1
aapl['Adj Close'].plot(style='r-', ax=ax1)
# plot the exponentially weighted moving average on the same ax1 (pd.ewma was removed; use .ewm)
aapl['Adj Close'].ewm(span=20).mean().plot(style='k--', ax=ax1)
# get the other ax
ax2 = plt.subplot2grid((3, 1), (2, 0), rowspan=1)
ax2.bar(aapl.index, aapl.Volume)
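A similar layout can probably also be reached through the `gridspec_kw` argument of `plt.subplots`, again handing the resulting axes to pandas via `ax=`. This is only a sketch with synthetic data (the `price` and `volume` series are invented, not real quotes), so it runs without a network connection:

```python
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

# Synthetic price and volume series (hypothetical data, not real quotes).
rng = np.random.default_rng(2)
price = pd.Series(100 + rng.normal(0, 1, 250).cumsum(), name='price')
volume = pd.Series(rng.integers(1_000, 5_000, 250), name='volume')

# Two stacked axes; the top one is twice as tall as the bottom one.
fig, (ax_price, ax_volume) = plt.subplots(
    nrows=2, ncols=1, sharex=True, figsize=(8, 6),
    gridspec_kw={'height_ratios': [2, 1]},
)

# pandas draws onto whichever axes is passed in via ax=.
price.plot(ax=ax_price, style='r-', label='price')
price.ewm(span=20).mean().plot(ax=ax_price, style='k--', label='EWMA (span=20)')
ax_price.legend()
ax_volume.bar(volume.index, volume.to_numpy())
# Call plt.show() to display the figure when running interactively.
```

The relative sizes come from matplotlib's grid; pandas only needs to be told which axes to draw on.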
I don't think you can use gridspec with pandas plot. The pandas plot module is a wrapper around matplotlib's pyplot and does not necessarily implement all of its functionality.
If you inspect the pandas source on GitHub and search for gridspec, you will notice that plot offers no options to configure gridspec:
https://github.com/pydata/pandas