Using two parameters in seaborn size - dataframe

I'm trying to pass two columns to the size parameter of a seaborn scatterplot. Is it possible to do this?
I am using the iris DataFrame:
sns.scatterplot(data=iris, x='sepal_length', y='petal_length', hue='species', size='petal_width', 'sepal_width')
That line does not work, but if I pass just one column to size, it produces a plot with dot sizes based on that one column.
What I am trying to do is show both sepal width and petal width in the same graph, so that their magnitudes can be read from the dot sizes. Should I add more hues?
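Because size accepts a single column, one workaround is to reshape the data to long form so that both widths live in one column, then use style to tell the two measurements apart. The sketch below assumes that approach; the helper names melt_widths and plot_two_sizes and the column names "measurement" and "width" are hypothetical choices, not part of seaborn. Note that each flower then appears twice at the same (x, y) position, once per measurement, so alpha is lowered to keep overlapping markers visible.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def melt_widths(iris):
    """Stack petal_width and sepal_width into one 'width' column (long form)."""
    return iris.melt(
        id_vars=["sepal_length", "petal_length", "species"],
        value_vars=["petal_width", "sepal_width"],
        var_name="measurement",  # which width this row holds
        value_name="width",      # the width value itself, mapped to marker size
    )


def plot_two_sizes(iris, ax=None):
    """Scatter with hue=species, size=width and marker style=measurement."""
    long_df = melt_widths(iris)
    ax = sns.scatterplot(
        data=long_df,
        x="sepal_length",
        y="petal_length",
        hue="species",
        size="width",
        style="measurement",
        alpha=0.6,  # overlapping markers share the same position
        ax=ax,
    )
    return ax, long_df
```

Usage: plot_two_sizes(sns.load_dataset("iris")) followed by plt.show(). An alternative with the same caveat about overlap is to call sns.scatterplot twice on the same Axes, once per width column, with different markers.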

Related

Auto-resize Figure in Seaborn

I am looking for an option to automatically resize the figures I generate with seaborn (bar plots, count plots, box plots). I create all the plots in one pass, but in some graphs the labels and bars are tightly packed because some columns have too many categorical values. I am using the following code:
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# dff, target_col_name and threshold are defined elsewhere in my script
for col in dff.drop(target_col_name, axis=1).columns:
    if (dff[col].nunique() / len(dff[col])) < threshold:
        ax = sns.countplot(x=dff[col], hue=dff[target_col_name])
        ax.set_xticklabels(ax.get_xticklabels(), rotation=90)
        plt.tight_layout()
        plt.show()
        pd.crosstab(index=dff[col],
                    columns=dff[target_col_name], normalize='index').plot.bar()
        plt.tight_layout()
        plt.show()
    elif dff[col].dtype == 'int64' or dff[col].dtype == 'float64':
        # recent seaborn versions require x/y as keyword arguments
        sns.boxplot(x=dff[target_col_name], y=dff[col])
        plt.tight_layout()
        plt.show()
One solution is to increase figsize for all figures, or to add another if condition that targets the columns with many categorical values and enlarges only those figures.
But I am looking for a more flexible solution, so that every figure is resized automatically based on its content.
I have used the matplotlib (pyplot) function figure(), which you can use to set the size of a chart. All you need to do is call it right before the code for each chart.
For instance, plt.figure(figsize=(12,5)) sets the width and height of the figure to 12 and 5 inches, respectively.
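To make the size depend on the content, as the question asks, one option is to compute figsize from the number of categories before creating each figure. This is a minimal sketch under that assumption; auto_figsize, countplot_autosized and the default constants (0.4 inches per category, minimum width 6, height 5) are hypothetical choices, not a seaborn or matplotlib feature.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def auto_figsize(n_categories, inches_per_category=0.4, min_width=6.0, height=5.0):
    """Hypothetical helper: width grows linearly with the number of categories."""
    return (max(min_width, n_categories * inches_per_category), height)


def countplot_autosized(series, hue=None):
    """Count plot whose figure width scales with series.nunique()."""
    fig, ax = plt.subplots(figsize=auto_figsize(series.nunique()))
    sns.countplot(x=series, hue=hue, ax=ax)
    ax.tick_params(axis="x", labelrotation=90)  # vertical tick labels
    fig.tight_layout()
    return fig, ax
```

Inside the loop from the question, the countplot call could become countplot_autosized(dff[col], hue=dff[target_col_name]), and the same auto_figsize(dff[col].nunique()) value could be passed as figsize to the crosstab bar plot.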

Why is the point size with sns.lmplot different from plt.scatter?

I want to make a scatterplot of x and y variables, where the point size depends on a numeric variable and the color of each point depends on a categorical variable.
First, I tried this with plt.scatter:
Graph 1
Then I tried the same thing with lmplot, but the point sizes differ from those in the first graph.
I think the two graphs should be identical. Why aren't they?
The point size is different in each graph.
Graph 2
Your question is not very descriptive, but I guess you want to control the size of the markers. Here is more documentation.
Here is a starting point for you.
A numeric variable can also be assigned to size to apply a semantic mapping to the areas of the points:
import seaborn as sns
tips = sns.load_dataset("tips")
sns.scatterplot(data=tips, x="total_bill", y="tip", hue="size", size="size")
For seaborn scatterplot:
df = sns.load_dataset("anscombe")
sp = sns.scatterplot(x="x", y="y", hue="dataset", data=df)
To change the size of all points, use the s parameter, which is passed through to matplotlib's scatter:
sp = sns.scatterplot(x="x", y="y", hue="dataset", data=df, s=100)
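If the goal is for both graphs to show identical per-point areas, one approach is to pass the same array of areas to plt.scatter's s and to seaborn's regression plot through scatter_kws (lmplot also accepts scatter_kws and forwards it to the underlying regplot). matplotlib treats s as marker area in points squared and uses the values as given. This is a sketch for a single group; compare_marker_sizes is a hypothetical helper, and it assumes no rows are dropped for missing values, since the areas array must line up point for point. With hue or facets, each subset has fewer points, so a full-length array would not match.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns


def compare_marker_sizes(df, x, y, size):
    """Draw the same points with plt.scatter and sns.regplot using identical areas."""
    areas = df[size].to_numpy(dtype=float)  # marker areas in points^2, used as-is
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4), sharey=True)
    ax1.scatter(df[x], df[y], s=areas)
    # regplot forwards scatter_kws to matplotlib's scatter; fit_reg=False skips the line
    sns.regplot(x=x, y=y, data=df, fit_reg=False, scatter_kws={"s": areas}, ax=ax2)
    return fig, ax1, ax2
```

Keep in mind that seaborn's size= semantic in scatterplot works differently: it rescales the data into the range given by sizes (normalized by size_norm), so its areas are not the raw column values.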

3d seaborn lmplot using variable marker size

I have a pandas dataframe with three columns (A,B,C). I have drawn a regression line of A vs B using
sns.lmplot(x='A', y='B', data = df, x_bins=10, ci=None)
I am using 10 bins and no confidence interval because I have a large number of data points (~5 million).
I would like to show the value of C on this plot. C has nothing to do with the regression of A against B. I would just like to show C by making the marker size of each bin equal to the average value of C within that bin.
It seems seaborn doesn't have a markersize parameter that can be set equal to a column of the dataframe. Is this even possible?
I came across this Stack Exchange post, which suggests using scatter_kws={"s": 100} to set the marker size. However, when I tried scatter_kws={"s": df['C']}, it threw an error.
If this is not possible in seaborn, are there any alternative solutions?
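A likely reason for the error is that with x_bins the scatter layer draws one point per bin, so a 5-million-long column of sizes cannot line up with 10 points. One workaround is to compute the bins and the per-bin means yourself, draw only the regression line with seaborn, and add the binned points with matplotlib. This is a sketch under stated assumptions: it bins A into quantiles with pd.qcut, which will not necessarily place points exactly where x_bins does, and the linear mapping of mean C onto areas between 20 and 200 points squared is an arbitrary, hypothetical choice. binned_means and regplot_with_sized_bins are illustrative names.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns


def binned_means(df, x="A", y="B", size="C", n_bins=10):
    """Mean of x, y and size within n_bins quantile bins of x (via pd.qcut)."""
    bins = pd.qcut(df[x], q=n_bins, duplicates="drop").rename("bin")
    return df.groupby(bins, observed=True)[[x, y, size]].mean()


def regplot_with_sized_bins(df, x="A", y="B", size="C", n_bins=10, sizes=(20, 200)):
    """Regression line from seaborn plus one marker per bin, area set by mean C."""
    fig, ax = plt.subplots()
    # Regression line only (no raw scatter); ci=None as in the question
    sns.regplot(x=x, y=y, data=df, scatter=False, ci=None, ax=ax)
    means = binned_means(df, x, y, size, n_bins)
    # Map the per-bin mean of C linearly onto marker areas (points^2)
    lo, hi = means[size].min(), means[size].max()
    areas = np.interp(means[size], (lo, hi), sizes)
    ax.scatter(means[x], means[y], s=areas, zorder=3)
    return fig, ax, means
```

If the markers should instead be exactly proportional to mean C, s=means[size] * k for some scale factor k would also work, at the cost of choosing k by hand.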

Seaborn time series plotting: a different problem for each function

I'm trying to use seaborn's DataFrame functionality (e.g. passing column names to the x, y and hue plot parameters) for my time series plots (dates in pandas datetime format).
x should come from a time series column (converted from a pd.Series of strings with pd.to_datetime)
y should come from a float column
hue comes from a categorical column that I calculated.
There are multiple streams in the same series that I am trying to separate (using hue to distinguish them visually), so they should not be connected by a line (as in a scatterplot).
I have tried the following plot types, each with a different problem:
sns.scatterplot: gets the plotting and the labels right, but has problems with the x limits, and I could not set them correctly with plt.xlim() using data.Dates.min() and data.Dates.max()
sns.lineplot: gets the limits and the labels right, but I could not find a setting to disable the lines between the individual data points, as is possible in matplotlib. I tried setting the markers and dashes parameters, to no avail.
sns.stripplot: my last attempt, plotted the data points correctly and got the x limits right, but messed up the tick labels
Example input data for easy reproduction:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

dates = pd.to_datetime(('2017-11-15',
'2017-11-29',
'2017-12-15',
'2017-12-28',
'2018-01-15',
'2018-01-30',
'2018-02-15',
'2018-02-27',
'2018-03-15',
'2018-03-27',
'2018-04-13',
'2018-04-27',
'2018-05-15',
'2018-05-28',
'2018-06-15',
'2018-06-28',
'2018-07-13',
'2018-07-27'))
values = np.random.randn(len(dates))
clusters = np.random.randint(0, 2, size=len(dates))  # two clusters; randint(1, ...) would return only zeros
D = {'Dates': dates, 'Values': values, 'Clusters': clusters}
data = pd.DataFrame(D)
To each of the functions I am passing the same arguments:
sns.OneOfThePlottingFunctions(x='Dates',
y='Values',
hue='Clusters',
data=data)
plt.show()
So to recap, what I want is a plot that uses seaborn's pandas functionality and plots points (not lines) with correct x limits and readable x labels :)
Any help would be greatly appreciated.
ax = sns.scatterplot(x='Dates', y='Values', hue='Clusters', data=data)
ax.set_xlim(data['Dates'].min(), data['Dates'].max())
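A self-contained version of this answer, extended with a date locator and formatter so the x tick labels stay readable, might look like the sketch below. The choice of matplotlib's AutoDateLocator with ConciseDateFormatter is an assumption (any date formatter would do), and make_example_data rebuilds the question's dates with a fixed random seed and two example clusters purely for illustration.

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns


def make_example_data(seed=0):
    """The question's dates, with reproducible random values and two clusters."""
    dates = pd.to_datetime([
        "2017-11-15", "2017-11-29", "2017-12-15", "2017-12-28",
        "2018-01-15", "2018-01-30", "2018-02-15", "2018-02-27",
        "2018-03-15", "2018-03-27", "2018-04-13", "2018-04-27",
        "2018-05-15", "2018-05-28", "2018-06-15", "2018-06-28",
        "2018-07-13", "2018-07-27",
    ])
    rng = np.random.default_rng(seed)
    return pd.DataFrame({
        "Dates": dates,
        "Values": rng.standard_normal(len(dates)),
        "Clusters": rng.integers(0, 2, size=len(dates)),
    })


def plot_dated_points(data):
    """Points only (no connecting lines), tight x limits, compact date labels."""
    ax = sns.scatterplot(x="Dates", y="Values", hue="Clusters", data=data)
    ax.set_xlim(data["Dates"].min(), data["Dates"].max())
    locator = mdates.AutoDateLocator()
    ax.xaxis.set_major_locator(locator)
    ax.xaxis.set_major_formatter(mdates.ConciseDateFormatter(locator))
    return ax
```

Usage: plot_dated_points(make_example_data()) followed by plt.show(). Setting the limits exactly to the first and last date puts those points on the plot edges; widening them by a few days is a matter of taste.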

Plot axvline from Point to Point in Matplotlib Python 3.6

I am reading data from a simulation out of an Excel file. From this data I extracted two series containing 200 values each. Now I want to plot all the values from the first series in blue and all the values from the second in purple. I have the following code:
import matplotlib.pyplot as plt
import pandas as pd

# sheet_name replaces the old sheetname argument in current pandas
df = pd.read_excel("###CENSORED####.xlsx", sheet_name="Data")
unpatched = df["Unpatched"][:-800]
patched = df["Patched"][:-800]
x = range(0, len(unpatched))
fig = plt.figure(figsize=(10, 5))
plt.scatter(x, unpatched, zorder=10)
plt.scatter(x, patched, c="purple", zorder=19)
This results in the following graph:
But now I want to draw some lines that visualize the difference between the blue and purple dots. I thought about an orange line going from the blue dot at simulation run x to the purple dot at simulation run x. I've tried to "cheat" with the following code, since I'm pretty new to matplotlib.
scale_factor = 300
for a in x:
plt.axvline(a, patched[a]/scale_factor, unpatched[a]/scale_factor, c="orange")
But this resulted in inaccurate lines, as seen below:
So is there a smarter way to do this? I've noticed that the axvline documentation says ymin and ymax can only be scalars. Can I somehow turn my values into suitable scalars?
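The inaccuracy comes from the fact that axvline's ymin and ymax are fractions of the axes height (0 is the bottom of the plot, 1 the top), not data values, so dividing by a fixed scale factor only approximates the positions. Matplotlib's Axes.vlines takes x, ymin and ymax in data coordinates and accepts arrays, which fits this case directly. A minimal sketch, with plot_with_difference_lines as a hypothetical helper name:

```python
import matplotlib.pyplot as plt
import numpy as np


def plot_with_difference_lines(unpatched, patched):
    """Blue/purple dots per simulation run, joined by an orange vertical line."""
    unpatched = np.asarray(unpatched, dtype=float)
    patched = np.asarray(patched, dtype=float)
    x = np.arange(len(unpatched))
    fig, ax = plt.subplots(figsize=(10, 5))
    # vlines takes ymin/ymax in data coordinates and accepts whole arrays
    lines = ax.vlines(x, patched, unpatched, colors="orange", zorder=5)
    ax.scatter(x, unpatched, zorder=10)
    ax.scatter(x, patched, c="purple", zorder=19)
    return fig, ax, lines
```

With the question's data this would be called as plot_with_difference_lines(unpatched, patched); the zorder values keep the dots drawn on top of the lines.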