Adjusting the space between datapoints on a seaborn swarm/scatter plot - matplotlib

I am searching for a way to adjust the space between data points (red arrows) and between the x-ticks (green arrows) on a seaborn strip- or swarm-plot.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
data = {'Days': np.full((48, 5), [4,7, 8, 9, 10]).reshape(-1),
'Group': np.full((80, 3), ["Group1", "Group2", "Group3"]).reshape(-1),
'Value': np.random.rand(240)}
df = pd.DataFrame(data=data)
fig, ax = plt.subplots(figsize=(20, 10), dpi=80)
sns.stripplot(x=df.Days, y=df.Value, jitter=0, size=5, ax=ax, linewidth=1,
dodge=True, hue=df.Group, palette="Set1", data=df)
plt.show()

A strip plot has a jitter parameter that spreads the points along the categorical axis, which makes overlapping values easier to see. To change the spacing between points, set jitter to a value greater than zero. The value is measured in units of the category spacing, so a small float such as 0.2 (or jitter=True for the default amount) is usually appropriate; large values push points into the neighbouring categories.
sns.stripplot(x=df.Days, y=df.Value, jitter=0.2, data=df)
For more background on strip plots, see this video: https://www.youtube.com/watch?v=wNgxdH02hrw
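To see what the jitter setting does to the point positions, here is a minimal sketch that draws the same data twice, once with jitter=0 and once with jitter=0.2. The column names follow the question; the random seed, the side-by-side layout, and the helper x_offsets are assumptions made for illustration only.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data shaped like the question's DataFrame (seed chosen arbitrarily)
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Days": np.tile([4, 7, 8, 9, 10], 48),
    "Value": rng.random(240),
})

fig, (ax_flat, ax_jit) = plt.subplots(1, 2, figsize=(12, 5), sharey=True)
sns.stripplot(data=df, x="Days", y="Value", jitter=0, size=5, ax=ax_flat)
sns.stripplot(data=df, x="Days", y="Value", jitter=0.2, size=5, ax=ax_jit)
ax_flat.set_title("jitter=0")
ax_jit.set_title("jitter=0.2")


def x_offsets(ax):
    """Collect the x positions of every drawn point, in category units."""
    return np.concatenate([np.asarray(c.get_offsets())[:, 0] for c in ax.collections])


flat_x = x_offsets(ax_flat)
jit_x = x_offsets(ax_jit)
# Largest horizontal distance of any point from its category position
spread = np.abs(jit_x - np.round(jit_x)).max()
# In an interactive session, call plt.show() here to display the figure.
```

With jitter=0 every point sits exactly on its category position; with jitter=0.2 the points are spread at most 0.2 category units to either side.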

Related

Scale Y axis of matplotlib plot in jupyter notebook

I want to scale the Y axis so that I can see the values. The code below plots nothing but a thin black line, and changing the plot height doesn't expand the plot.
import numpy as np
import matplotlib.pyplot as plt
data=np.random.random((4,10000))
plt.rcParams["figure.figsize"] = (20,100)
#or swap line above with one below, still no change in plot height
#fig=plt.figure(figsize=(20, 100))
plt.matshow(data)
plt.show()
One way to do this is to repeat the values and then plot the result, but I would have thought it possible to just scale the height of the plot?
data_repeated = np.repeat(data, repeats=1000, axis=0)
You can do it like this:
import numpy as np
import matplotlib.pyplot as plt
data=np.random.random((4, 10000))
plt.figure(figsize=(40, 10))
plt.matshow(data, fignum=1, aspect='auto')
plt.show()
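The reason the original plot collapses into a thin line is that matshow and imshow draw square cells by default (aspect='equal'), so a 4 x 10000 array is 2500 times wider than it is tall no matter how large the figure is. As a rough sketch of the same fix through the object-oriented interface (the figure size here is an arbitrary choice):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

data = np.random.random((4, 10000))

fig, ax = plt.subplots(figsize=(20, 5))
# aspect="auto" lets the cells stretch to fill the axes instead of staying square
im = ax.imshow(data, aspect="auto", interpolation="nearest")
# In an interactive session, call plt.show() here to display the figure.
```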

Is there a way to draw shapes on a python pandas plot

I am creating shot plots for NHL games and I have succeeded in making the plot, but I would like to draw the lines that you see on a hockey rink on it. I basically just want to draw two circles and two lines on the plot like this.
Let me know if this is possible/how I could do it
A pandas plot is in fact a matplotlib plot: you can assign the returned Axes to a variable and modify it as needed (add horizontal and vertical lines, shapes, text, etc.).
# plot your data, but instead of displaying it right away, keep the returned Axes
ax = df.plot()
fig = ax.get_figure()
# x, ymin and ymax are placeholders; adjust to your needs
ax.vlines(x, ymin, ymax, colors='k', linestyles='solid')
plt.show()
Working code sample:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn
from matplotlib.patches import Circle
from matplotlib.collections import PatchCollection
df = seaborn.load_dataset('tips')
ax = df.plot.scatter(x='total_bill', y='tip')
ax.vlines(x=40, ymin=0, ymax=20, colors='red')
patches = [Circle((50,10), radius=3)]
collection = PatchCollection(patches, alpha=0.4)
ax.add_collection(collection)
plt.show()
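Closer to the rink layout from the question, two lines and two unfilled circles can be added in the same way. This is only a sketch: the shot coordinates are random stand-ins, and the line positions, circle centres and radius are made-up values, not real rink dimensions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.patches import Circle

# Hypothetical shot locations standing in for the real data
rng = np.random.default_rng(1)
shots = pd.DataFrame({
    "x": rng.uniform(-100, 100, 50),
    "y": rng.uniform(-42.5, 42.5, 50),
})

# The pandas plot call returns a matplotlib Axes that can be drawn on further
ax = shots.plot.scatter(x="x", y="y", figsize=(10, 4.25))

# Two vertical lines (positions are illustrative)
for x_line in (-25, 25):
    ax.axvline(x_line, color="blue", linewidth=2)

# Two unfilled circles (centres and radius are illustrative)
for center_x in (-69, 69):
    ax.add_patch(Circle((center_x, 0), radius=15, fill=False, edgecolor="red"))

ax.set_aspect("equal")  # keep the circles round
# In an interactive session, call plt.show() here to display the figure.
```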

How to plot only y-axis using seaborn

I am new to machine learning and I want to compare predicted and actual values. I want to plot both sets of data to see whether the values are the same or not.
data:
[-0.26112159, 1.84683522, 2.23912728, 1.58848056, 1.28589823,
2.01355579, -0.144594 , 0.8845673 , -0.19764173, 0.00837658,
1.3515489 , 0.18876488, 1.07088203, 1.11333346, 0.99854107,
1.67141781, 1.74938417, 1.17907989, 1.57017018, 2.04269495,
-0.10662102, 0.96283466, -0.01117658, 0.01610438, 1.31111783,
-0.08608504, -0.09535655, -0.0227967 , 1.82867539, 1.4492189 ]
This is my data sample for both the A and B datasets.
I want a plot like this:
I prefer using seaborn.
Seaborn provides some elaborate functions to display data, and internally it depends heavily on matplotlib. As the requested plot doesn't fall into the categories seaborn excels at, it seems more appropriate to use matplotlib's scatter directly:
import numpy as np
from matplotlib import pyplot as plt
A = [-0.26112159, 1.84683522, 2.23912728, 1.58848056, 1.28589823,
2.01355579, -0.144594, 0.8845673, -0.19764173, 0.00837658,
1.3515489, 0.18876488, 1.07088203, 1.11333346, 0.99854107,
1.67141781, 1.74938417, 1.17907989, 1.57017018, 2.04269495,
-0.10662102, 0.96283466, -0.01117658, 0.01610438, 1.31111783,
-0.08608504, -0.09535655, -0.0227967, 1.82867539, 1.4492189]
B = A
x = np.arange(len(A))
plt.scatter(x - 0.2, A, marker='o', color='tomato', label='Dataset A')
plt.scatter(x + 0.2, B, marker='o', color='deepskyblue', label='Dataset B')
plt.legend()
plt.show()
To have crosses as markers, use marker='x' or marker='+'.
To draw a swarmplot (very similar to a stripplot) via Seaborn:
import numpy as np
from matplotlib import pyplot as plt
import seaborn as sns
# A, B = ...
sns.swarmplot(x=np.repeat(['Dataset A', 'Dataset B'], len(A)), y=np.concatenate([A, B]))
plt.show()
A kde plot can be used to compare the statistical distribution. Here is an example with some noise added to make both sets a little different:
import numpy as np
from matplotlib import pyplot as plt
import seaborn as sns
# A = ...
A = np.array(A)
B = A + np.random.normal(0,.1, len(A))
sns.kdeplot(A, label='Dataset A')
sns.kdeplot(B, label='Dataset B')
plt.show()
If you insist on using seaborn, one way is the following. However, stripplot fits such a case very well, as you don't have to pass the x-values explicitly.
import numpy as np
import seaborn as sns
sns.set()
A = np.asarray(A)  # A = ... as above; an array allows A + 0.1
x = np.arange(len(A))
# pass x and y by keyword; recent seaborn versions reject positional x, y
sns.scatterplot(x=x, y=A, marker='x', color='orange', label='Dataset A')
sns.scatterplot(x=x, y=A + 0.1, marker='x', color='blue', label='Dataset B')
I just figured out a way to do so; please let me know if there is a better method for comparing two arrays.
fig,_ax=plt.subplots(1,2,figsize=(15,10))
sns.stripplot(y=predictions,ax=_ax[0])
sns.stripplot(y=y_test,ax=_ax[1],color='red')

changing the size of subplots with matplotlib

I am trying to plot multiple rgb images with matplotlib
the code I am using is:
import numpy as np
import matplotlib.pyplot as plt
for i in range(0, images):
    test = np.random.rand(1080, 720,3)
    plt.subplot(images,2,i+1)
    plt.imshow(test, interpolation='none')
the subplots appear tiny though as thumbnails
How can I make them bigger?
I have seen solutions using
fig, ax = plt.subplots()
syntax before but not with plt.subplot ?
plt.subplots creates a whole grid of subplots at once, while plt.subplot adds a single subplot. So the difference is whether you want to set up your plot layout right away or fill it in over time. Since it seems that you know beforehand how many images to plot, I would also recommend going with subplots.
Also notice that plt.subplot(images, 2, i+1) defines a grid of images x 2 cells but fills only the first images of them, so half of the figure stays empty, which is another reason the subplots are so small.
import numpy as np
import matplotlib.pyplot as plt
images = 4
fig, axes = plt.subplots(images, 1, # Puts subplots in the axes variable
figsize=(4, 10), # Use figsize to set the size of the whole plot
dpi=200, # Further refine size with dpi setting
tight_layout=True) # Makes enough room between plots for labels
for i, ax in enumerate(axes):
    y = np.random.randn(512, 512)
    ax.imshow(y)
    ax.set_title(str(i), fontweight='bold')
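If a two-column layout like the one in the question is wanted, a possible variation is to size the grid from the number of images and hide any cell that is left over. The names n_images, n_cols and n_rows, the figure size, and the small random arrays standing in for the real RGB images are all assumptions for this sketch.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

n_images = 4
n_cols = 2
n_rows = int(np.ceil(n_images / n_cols))  # just enough rows, no empty half of the grid

# squeeze=False keeps axes two-dimensional even for a single row
fig, axes = plt.subplots(n_rows, n_cols, figsize=(8, 6 * n_rows), squeeze=False)

for i, ax in enumerate(axes.flat):
    if i >= n_images:
        ax.axis("off")  # hide a leftover cell when n_images is odd
        continue
    # Small random RGB array as a stand-in for a real image
    ax.imshow(np.random.rand(108, 72, 3), interpolation="none")
    ax.set_title(str(i), fontweight="bold")

fig.tight_layout()
# In an interactive session, call plt.show() here to display the figure.
```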

How to use Gridspec with Pandas plot

I would like to configure the subplots size using Gridspec as explained in this question.
Python/Matplotlib - Change the relative size of a subplot
How do I do this if I want to use the pandas DataFrame plot function? Is it possible at all?
You can do it this way.
import pandas as pd
import pandas_datareader.data as web  # pandas.io.data was moved out of pandas into the pandas-datareader package
import matplotlib.pyplot as plt
aapl = web.DataReader('AAPL', 'yahoo', '2014-01-01', '2015-05-31')
# 3 x 1 grid, position at (1st row, 1st col), take two rows
ax1 = plt.subplot2grid((3, 1), (0, 0), rowspan=2)
# plot something on this ax1
aapl['Adj Close'].plot(style='r-', ax=ax1)
# plot moving average again on this ax1
aapl['Adj Close'].ewm(span=20).mean().plot(style='k--', ax=ax1)  # pd.ewma was removed from pandas
# get the other ax
ax2 = plt.subplot2grid((3, 1), (2, 0), rowspan=1)
ax2.bar(aapl.index, aapl.Volume)
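A similar two-panel layout can also be sketched with a GridSpec object, since the pandas plot methods accept any existing Axes through the ax argument. The example below uses synthetic prices and volumes instead of downloaded data, so the column names, the seed and the height ratios are assumptions for illustration only.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the downloaded stock data
idx = pd.date_range("2014-01-01", periods=250, freq="B")
rng = np.random.default_rng(2)
prices = pd.DataFrame({
    "Close": 100 + rng.normal(0, 1, len(idx)).cumsum(),
    "Volume": rng.integers(1_000, 5_000, len(idx)),
}, index=idx)

fig = plt.figure(figsize=(10, 6))
# Two rows; the upper panel gets twice the height of the lower one
gs = fig.add_gridspec(nrows=2, ncols=1, height_ratios=[2, 1])
ax1 = fig.add_subplot(gs[0, 0])
ax2 = fig.add_subplot(gs[1, 0])

# pandas draws onto the Axes it is given and returns that same Axes
returned_ax = prices["Close"].plot(style="r-", ax=ax1)
prices["Close"].ewm(span=20).mean().plot(style="k--", ax=ax1)
prices["Volume"].plot(ax=ax2)
# In an interactive session, call plt.show() here to display the figure.
```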
I don't think you can use gridspec with pandas plot. The pandas plot module is a wrapper around matplotlib's pyplot and does not necessarily implement all of its functionality.
If you inspect the pandas source on GitHub and search for gridspec, you will notice that plot offers no options to configure gridspec:
https://github.com/pydata/pandas