Matplotlib - Displaying a scatterplot by coloring according to a column [duplicate] - dataframe

This question already has answers here:
How to change scatter plot color according to certain rule
Matplotlib scatterplot; color as a function of a third variable
plot different color for different categorical levels using matplotlib
The matplotlib scatter method plots every single point on the graph.
The parameter c is for the color and the parameter s is for size.
These parameters can either be a single element (e.g. c = 'red') or a list of the same size as the number of points (e.g. c = ['red', 'green', 'blue']).
When working with dataframes, you can color the points according to the values in a column by passing that column as the argument to c.
I want to display a scatterplot of the price against the living area, colored according to the quality of the house, and I am given this list of colors:
color = ['black', 'blue', 'red', 'green', 'purple', 'pink', 'brown', 'yellow', 'cyan']
Should the parameter c be the dataframe column that corresponds to the quality? If so, how do I use the color list?
I would write:
plt.scatter(df['LivingArea'], df['Price'], c=df['Quality'])
Could you explain to me how we use the given list of colors?
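A minimal sketch of one way to use the given list, assuming Quality holds a small number of discrete levels (at most nine, one per color). The DataFrame is hypothetical toy data, and quality_to_color and point_colors are illustrative names. Each distinct quality level is paired with one entry of color, and the per-point color names are passed to c, which accepts a sequence of colors of length n:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import pandas as pd

color = ['black', 'blue', 'red', 'green', 'purple', 'pink', 'brown', 'yellow', 'cyan']

# Hypothetical toy data; Quality is assumed to be a discrete rating
df = pd.DataFrame({
    'LivingArea': [850, 1200, 1500, 2100, 2600, 3000],
    'Price': [120000, 160000, 210000, 280000, 350000, 420000],
    'Quality': [3, 5, 5, 7, 8, 9],
})

# Pair each distinct quality level (sorted ascending) with one color from the list
levels = sorted(df['Quality'].unique())
quality_to_color = {q: color[i] for i, q in enumerate(levels)}

# One color name per point, looked up from that point's quality
point_colors = df['Quality'].map(quality_to_color)

fig, ax = plt.subplots()
ax.scatter(df['LivingArea'], df['Price'], c=point_colors.tolist())
ax.set_xlabel('LivingArea')
ax.set_ylabel('Price')
```

By contrast, c=df['Quality'] passes numbers, which matplotlib maps through a colormap (cmap), so the list of color names is not used in that case.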

Related

make specific data points in scatter plot seaborn more visible [duplicate]

I have a Seaborn scatterplot and am trying to control the plotting order with 'hue_order', but it is not working as I would have expected (I can't get the blue dot to show on top of the gray).
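import pandas as pd
import seaborn as sns

x = [1, 2, 3, 1, 2, 3]
cat = ['N','Y','N','N','N']  # zip() stops at the shorter list, so only 5 rows are built
test = pd.DataFrame(list(zip(x,cat)),
columns =['x','cat']
)
print(test)  # display(test) in a Jupyter notebook
colors = {'N': 'gray', 'Y': 'blue'}
sns.scatterplot(data=test, x='x', y='x',
hue='cat', hue_order=['Y', 'N', ],
palette=colors,
)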
Flipping hue_order to hue_order=['N', 'Y', ] doesn't change the plot. How can I get the 'Y' category to plot on top of the 'N' category? My actual data has duplicate (x, y) coordinates that are differentiated by the category column.
The reason this is happening is that, unlike most plotting functions, scatterplot doesn't (internally) iterate over the hue levels when it's constructing the plot. It draws a single scatterplot and then sets the color of the elements with a vector. It does this so that you don't end up with all of the points from the final hue level on top of all the points from the penultimate hue level on top of all the ... etc. But it means that the scatterplot z-ordering is insensitive to the hue ordering and reflects only the order in the input data.
So you could use your desired hue order to sort the input data:
import numpy as np
import seaborn as sns

hue_order = ["N", "Y"]
colors = {'N': 'gray', 'Y': 'blue'}
sns.scatterplot(
# sort rows by their position in hue_order, so the last level is drawn last (on top)
data=test.sort_values('cat', key=np.vectorize(hue_order.index)),
x='x', y='x',
hue='cat', hue_order=hue_order,
palette=colors, s=100, # Embiggen the points to see what's happening
)
There may be a more efficient way to do that "sort by list of unique values" built into pandas; I am not sure.
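One pandas option, sketched here under the assumption that every value in cat appears in hue_order: give sort_values a key that maps each category to its position in hue_order through a dict, so each row costs a dictionary lookup rather than a list.index search. The rank dict is an illustrative name, and kind='stable' keeps rows of the same category in their original order:

```python
import pandas as pd

# Same toy data as in the question (zip() keeps only 5 rows)
test = pd.DataFrame({'x': [1, 2, 3, 1, 2], 'cat': ['N', 'Y', 'N', 'N', 'N']})
hue_order = ['N', 'Y']  # the last level ends up last in the data, i.e. on top

# Position of each category in hue_order
rank = {level: i for i, level in enumerate(hue_order)}

# Sort by that position; 'stable' keeps the original order within each level
test_sorted = test.sort_values('cat', key=lambda s: s.map(rank), kind='stable')
print(test_sorted)
```

test_sorted can then be passed as data to sns.scatterplot with the same hue_order and palette.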
TLDR: Before plotting, sort the data so that the dominant color appears last in the data. Here, it could just be:
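test = test.sort_values('cat')  # ascending by default, so the 'Y' rows come last
With that ordering, the blue 'Y' point is drawn after the gray 'N' point and appears on top of it.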
It seems that hue_order doesn't affect the order (or z-order) in which things are plotted. Rather, it affects how colors are assigned. For example, if you don't specify an explicit mapping of categories to colors (i.e. you just use a list of colors or a color palette), this parameter determines whether 'N' or 'Y' gets the first color of the palette and which gets the second. There is an example of this behavior in the hue_order section of the documentation. When you already have a dict linking categories to colors (colors = {'N': 'gray', 'Y': 'blue'}), it seems to affect only the order of the labels in the legend, as you have probably seen.
So the key is to make sure the color you want on top is plotted last (and thus "on top"). I would have also assumed the hue_order parameter would do as you expected, but apparently not!
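If you need explicit control over which category sits on top regardless of row order, a different technique (plain matplotlib rather than seaborn's hue handling) is to draw each category with its own scatter call and give later categories a higher zorder. This is a sketch using the question's toy data; draw_order is an illustrative name:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import pandas as pd

test = pd.DataFrame({'x': [1, 2, 3, 1, 2], 'cat': ['N', 'Y', 'N', 'N', 'N']})
colors = {'N': 'gray', 'Y': 'blue'}
draw_order = ['N', 'Y']  # later entries are drawn on top

fig, ax = plt.subplots()
for z, level in enumerate(draw_order):
    subset = test[test['cat'] == level]
    # Each category gets its own collection with an increasing zorder
    ax.scatter(subset['x'], subset['x'], color=colors[level], s=100,
               label=level, zorder=2 + z)
ax.legend(title='cat')
```

Because every category is a separate artist, the stacking no longer depends on how the rows happen to be ordered in the DataFrame.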

Plot data using facet-grid in seaborn [duplicate]

This question already has answers here:
How to change the number or rows and columns in my catplot
Seaborn multiple barplots
subplotting with catplot
I have a dataset table, sub_category_profit, with sub_category, region and profit columns.
I want to plot the profit made by each sub_category in each region.
Currently I am using this seaborn code:
sns.barplot(data=sub_category_profit,x="sub_category",y="profit",hue="region")
This produces an extremely large, crowded plot.
Is there any way to split it into subplots, like a facet grid, with one subplot per sub_category? I have tried the FacetGrid function, but it is not working properly either:
g=sns.FacetGrid(data=sub_category_profit,col="sub_category")
g.map(sns.barplot(data=sub_category_profit,x="region",y="profit"))
In the resulting facet grid the plots are very small, and the bars appear in only one of the facets.
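See the docs on seaborn.FacetGrid, particularly the examples there. Your g.map(sns.barplot(...)) call runs sns.barplot immediately on the whole dataset, so all the bars land on a single axes, and map then receives that call's return value instead of a plotting function. Pass the plotting function itself plus the x and y variables instead; FacetGrid then calls the function once per facet with that facet's subset of the data.
Also consider the col_wrap argument: since you do not specify row, all the facets are placed in a single row, and col_wrap wraps them onto several rows, which avoids the very wide figure.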
g=sns.FacetGrid(data=sub_category_profit, col="sub_category", col_wrap=4)
g.map_dataframe(sns.barplot, x="region", y="profit")
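An alternative sketch uses the figure-level sns.catplot, which builds the FacetGrid and draws the bars in one call and accepts the same col and col_wrap arguments. catplot determines the category order from the whole dataset, so every facet uses the same x order. The data below is a hypothetical stand-in that only reuses the question's column names:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for sub_category_profit (same column names as the question)
sub_categories = ['Phones', 'Chairs', 'Binders', 'Tables', 'Paper']
regions = ['Central', 'East', 'South', 'West']
sub_category_profit = pd.DataFrame({
    'sub_category': [s for s in sub_categories for _ in regions],
    'region': regions * len(sub_categories),
    'profit': [120, 95, 60, -15, -30, 40, 10, 5, 80, 70, 45, 30,
               -60, -20, -10, 15, 25, 35, 20, 10],
})

# One bar chart per sub_category, wrapped onto rows of 4 facets
g = sns.catplot(data=sub_category_profit, x='region', y='profit',
                col='sub_category', col_wrap=4, kind='bar', height=3)
```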

How to plot different line style for each column in the dataset using seaborn

I have the following dataset df:
A B C D E
1 2 5 6 9
7 9 10 11 13
6 10 11 23 87
I want to create a seaborn line plot in which every column is drawn in the same color but with a different linestyle, and I want to choose each linestyle myself. However, I don't know how to proceed.
I tried the following, which already gives each column a different linestyle (and a different color), but I want to choose the linestyle for each column manually:
sns.lineplot(data=df)
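Use palette to set all columns to the same color. With wide-form data, seaborn already assigns each column to the style variable and gives each one a different dash pattern (dashes=True is the default); markers=True additionally gives each column its own marker: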
sns.lineplot(data=df, markers=True, palette=['blue'] * df.columns.size)
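Note that markers controls the marker shapes, not the linestyles; the linestyles come from the dashes parameter. Both accept True (let seaborn choose), a list, or a dictionary mapping each column to a value, so you can specify your own. The documentation for markers reads: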
markers : boolean, list, or dictionary
Object determining how to draw the markers for different levels of the style variable. Setting to True will use default markers, or you can pass a list of markers or a dictionary mapping levels of the style variable to markers. Setting to False will draw marker-less lines. Markers are specified as in matplotlib.

Option c and option s in pandas dataframe plot

I saw the following snippet of code in a book I am studying:
housing.plot(kind="scatter", x="longitude", y="latitude", alpha=0.4,
s=housing["population"]/100, label="population", figsize=(10,7),
c="median_house_value", cmap=plt.get_cmap("jet"), colorbar=True,
)
plt.legend()
housing here is a pandas dataframe. I checked the documentation for pandas.DataFrame.plot at https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.plot.html, but nowhere on that page does it say what c and s mean. I can infer their meaning from the resulting figure, but why doesn't the documentation show this?
You should look at the matplotlib documentation. The default pandas plotting backend (the plotting.backend option) is matplotlib, and DataFrame.plot passes extra keyword arguments through to the matplotlib plotting method, which for kind="scatter" is scatter, so that is where c and s are documented. (The pandas page for DataFrame.plot.scatter also describes s and c.) The matplotlib documentation says:
s: float or array-like, shape (n, ), optional
The marker size in points**2. Default is rcParams['lines.markersize'] ** 2.
c: array-like or list of colors or color, optional
The marker colors. Possible values:
A scalar or sequence of n numbers to be mapped to colors using cmap and norm.
A 2-D array in which the rows are RGB or RGBA.
A sequence of colors of length n.
A single color format string.
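A small sketch of how pandas passes c and s through, using a hypothetical three-row housing frame with the book's column names. Because c is given as a column name, pandas looks up that column and hands its values to matplotlib, which maps them through cmap; s is an array of sizes in points**2, one per point:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import pandas as pd

# Hypothetical toy data with the book's column names
housing = pd.DataFrame({
    'longitude': [-122.2, -121.5, -118.3],
    'latitude': [37.9, 38.6, 34.1],
    'population': [1200, 3400, 560],
    'median_house_value': [150000, 320000, 90000],
})

ax = housing.plot(kind="scatter", x="longitude", y="latitude", alpha=0.4,
                  s=housing["population"] / 100,  # marker area in points**2
                  c="median_house_value",         # column name, mapped via cmap
                  cmap="jet", colorbar=True)

points = ax.collections[0]  # the PathCollection created by scatter
print(points.get_sizes())   # the s values, one per point
print(points.get_array())   # the c values that get colormapped
```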

map the colors of a boxplot with values using seaborn

I am making a boxplot with seaborn based on some values, and I want to map the box colors to other values. This is what I am doing:
plt.figure(figsize=(20,12))
sns.boxplot(y='name',x='value',data=df,showfliers=False,orient="h")
The result is boxplots with arbitrary colors. I want each box's color to be determined by the value of a third column in the dataframe. The only option I could find is hue, but it splits the data into more boxplots, which is not what I want.
You can indeed specify the color with hue:
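sns.boxplot(x='name', y='value', data=df,
hue='name',  # same as `x`, so each box gets its own palette color
palette={'A':'r','B':'b'},  # specify the color for each name
dodge=False,  # don't shift the boxes sideways for the hue levels
)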
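To let the third column drive the colors, one option, sketched under the assumption that each name belongs to exactly one value of that column, is to build the palette dict from the third column and keep hue='name'. The column group, its values and the color choices are hypothetical:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical data: 'group' stands in for the third column
df = pd.DataFrame({
    'name': ['A'] * 4 + ['B'] * 4 + ['C'] * 4,
    'value': [1, 2, 3, 4, 2, 3, 4, 5, 5, 6, 7, 8],
    'group': ['low'] * 4 + ['high'] * 4 + ['low'] * 4,
})
group_colors = {'low': 'tab:blue', 'high': 'tab:red'}  # hypothetical color choice

# One color per box: look up each name's group (assumes one group per name)
name_to_group = df.drop_duplicates('name').set_index('name')['group']
palette = name_to_group.map(group_colors).to_dict()

plt.figure(figsize=(8, 4))
ax = sns.boxplot(y='name', x='value', data=df, hue='name', palette=palette,
                 dodge=False, showfliers=False, orient='h')
if ax.get_legend() is not None:
    ax.get_legend().remove()  # the legend would only repeat the y labels
```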