Make colors in seaborn based on column names - pandas

I have a table with three columns: Names, A, B that I use to create a plot with the following code:
import seaborn as sns
sns.scatterplot(data=df, x="B", y="A")
How can I make the dots two different colors based on column names (i.e. A - red, B - green)?

In a scatterplot, each point represents a pair of values relating two sets of data, in your case A and B. Since each point on the graph is a pair, you can't colour an individual point according to 'A' or 'B'.
What you can do is set a different colour based on your Names column, using the hue argument.
Below is an example using seaborn's tips dataset.
import seaborn as sns
tips = sns.load_dataset("tips")  # example dataset shipped with seaborn
sns.scatterplot(data=tips, x="total_bill", y="tip", hue="time")
In your case try something like:
sns.scatterplot(data=df, x="B", y="A", hue="Names")
https://seaborn.pydata.org/generated/seaborn.scatterplot.html
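If the goal really is one colour per column (A in red, B in green), one option is to reshape the table to long form so that the column name becomes a variable seaborn can map to hue. This is only a sketch: it assumes the Names column holds labels that make sense on the x-axis, and the sample values and the names `long_df`, `column` and `value` are made up for illustration.

```python
import pandas as pd
import seaborn as sns

# Made-up sample data with the same columns as in the question
df = pd.DataFrame({
    "Names": ["n1", "n2", "n3", "n4"],
    "A": [1.0, 2.5, 3.1, 4.2],
    "B": [2.2, 1.8, 3.9, 3.0],
})

# Long form: one row per (Names, column) pair, with the original
# column name stored in "column" and its value in "value"
long_df = df.melt(id_vars="Names", value_vars=["A", "B"],
                  var_name="column", value_name="value")

# The column name is now ordinary data, so hue can colour by it
ax = sns.scatterplot(data=long_df, x="Names", y="value", hue="column",
                     palette={"A": "red", "B": "green"})
```

Each name then gets two dots, a red one for its A value and a green one for its B value.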


How to create Correlation Heat Map of All Measure in Tableau?

I have a query with 10 measures. I am able to draw a correlation heat map in Python using the code below:
import pandas as pd
import seaborn as sn
import matplotlib.pyplot as plt
df = pd.read_sql('select statement')
sn.heatmap(df.corr(), annot=True)
plt.show()
How can I make a similar correlation heat map in Tableau?
The general way to make a heatmap in Tableau is to put a discrete field on rows and a discrete field on columns. Select the square mark type. Under Format, make a square cell size, and adjust the cell size to be as large as you prefer.
Then put a continuous field on the color shelf. Click on the color button to choose the color palette you like, and possibly turn on a border. Click on the size button to adjust the mark size to match the cell size.
There are a lot of good examples on Tableau Public.
https://public.tableau.com/app/search/vizzes/correlation%20matrix
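The layout described above needs one row of data per cell: a discrete field for the row, a discrete field for the column, and a continuous field for the colour. One way to get there, assuming it is acceptable to compute the correlations in pandas before handing the data to Tableau, is to melt the correlation matrix into long form. The sample data and the names `row_measure`, `col_measure` and `correlation` are made up for this sketch.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the query result: three numeric measures
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(50, 3)), columns=["m1", "m2", "m3"])

# Square correlation matrix, as used for the seaborn heatmap
corr = df.corr()

# Long form: one row per (row measure, column measure) cell
corr_long = (
    corr.rename_axis("row_measure")
        .reset_index()
        .melt(id_vars="row_measure",
              var_name="col_measure",
              value_name="correlation")
)
```

Saving the result with `corr_long.to_csv(...)` gives a file where `row_measure` can go on rows, `col_measure` on columns, and `correlation` on the color shelf.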

Enforcing Incoming X-Axis Data to map with Static X-Axis - Plotly

I am trying to plot a multi-axes line graph in Plotly, and my data is percentage (y-axis) versus date (x-axis).
The x and y values come from a database via pandas.
Since Plotly doesn't understand the order of the date strings on the x-axis, it arranges them automatically.
I am looking for a way to keep the x-axis static, with the dates in order, so that each graph is plotted on top of that axis at the positions matching its dates.
static_x_axis = ['02-11-2021', '03-11-2021', '04-11-2021', '05-11-2021', '06-11-2021', '07-11-2021', '08-11-2021', '09-11-2021', '10-11-2021', '11-11-2021', '12-11-2021', '13-11-2021', '14-11-2021', '15-11-2021', '16-11-2021', '17-11-2021', '18-11-2021', '19-11-2021', '20-11-2021', '21-11-2021', '22-11-2021', '23-11-2021']
The list above determines the x-axis mapping.
I tried using range, but it seems that it either does not support static mapping or maps all graphs from the 0th point.
Overall, I am looking for a way that either follows a static date range or does not break the current order of dates, as happened in the graph above.
Thanks in advance for your help.
From your question, your data is:
x: a date as a string representation (i.e. categorical)
y: a number between 0 and 1 (a percentage)
three traces
You describe x as unordered at the source, and you need it to be sorted on the x-axis.
The code below simulates a figure in this way, then applies categorical axis sorting.
import pandas as pd
import numpy as np
import plotly.graph_objects as go
s = pd.Series(pd.date_range("2-nov-2021", periods=40).strftime("%d-%m-%Y"))
fig = go.Figure(
    [
        go.Scatter(
            x=s.sample(10).sort_index().values,
            y=np.linspace(n/4, n/3, 10),
            mode="lines+markers+text",
        )
        for n in range(1,4)
    ]
).update_traces(texttemplate="%{y:.2f}", textposition="top center")
fig.show()
fig.update_layout(xaxis={"categoryorder": "array", "categoryarray": s.values})
fig.show()

Pandas dataframe rendered with bokeh shows no marks

I am attempting to create a simple hbar() chart on two columns [project, bug_count]. Sample dataframe follows:
df = pd.DataFrame({'project': ['project1', 'project2', 'project3', 'project4'],
'bug_count': [43683, 31647, 27494, 24845]})
When attempting to render any chart: scatter, circle, vbar etc... I get a blank chart.
This very simple code snippet shows an empty viz. This example shows a f.circle() just for demonstration, I'm actually trying to implement a f.hbar().
from bokeh.io import show, output_notebook
from bokeh.plotting import figure
f = figure()
f.circle(df['project'], df['bug_count'],size = 10)
show(f)
The values of df['project'] are strings, i.e. categorical values, not numbers. Categorical ranges must be provided explicitly, since only you know what order the arbitrary factors should appear in on the axis. Something like
p = figure(x_range=sorted(set(df['project'])))
There is an entire chapter in the User's Guide devoted to Handling Categorical Data, with many complete examples (including many bar charts) that you can refer to.
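For the hbar() chart the question is actually after, the categories go on the y-axis, so the factors are passed as `y_range` instead. A minimal sketch using the question's sample data; sorting by bug count and the bar height of 0.6 are choices made here, not requirements.

```python
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

df = pd.DataFrame({'project': ['project1', 'project2', 'project3', 'project4'],
                   'bug_count': [43683, 31647, 27494, 24845]})

# Factors are drawn bottom to top, so sort ascending to put
# the project with the most bugs at the top of the chart
df = df.sort_values('bug_count')

# The categorical range must list the factors explicitly
p = figure(y_range=list(df['project']), title="Bugs per project")

# One horizontal bar per project, extending from 0 to bug_count
p.hbar(y='project', right='bug_count', height=0.6,
       source=ColumnDataSource(df))
```

In a notebook, `output_notebook()` followed by `show(p)` displays it, as in the question's snippet.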

map the colors of a boxplot with values using seaborn

I am making a boxplot with seaborn based on some values, and I want to map the box colors to some other values. I am doing this:
plt.figure(figsize=(20,12))
sns.boxplot(y='name',x='value',data=df,showfliers=False,orient="h")
The result is boxplots with random colors. I want the colors to be defined according to the value of a third column in the dataframe. The only thing I could find is the use of hue, but it divides the data into more boxplots, which is not what I want.
You can indeed specify the color with hue:
sns.boxplot(x='name', y='value', data=df,
            hue='name',  # same as `x`
            palette={'A':'r','B':'b'},  # specify color
            )
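To take the colours from a third column, as the question asks, the palette dictionary can be built from the data. The sketch below assumes every name belongs to exactly one value of that third column; the column name `group`, the sample data and the colour choices are hypothetical. Passing `dodge=False` keeps one full-width box per name even on seaborn versions that would otherwise split the boxes by hue.

```python
import pandas as pd
import seaborn as sns

# Made-up data: "group" stands in for the third column
df = pd.DataFrame({
    "name":  ["A", "A", "A", "B", "B", "B", "C", "C", "C"],
    "value": [1, 2, 3, 2, 4, 6, 3, 5, 9],
    "group": ["x", "x", "x", "y", "y", "y", "x", "x", "x"],
})

# Colour chosen for each value of the third column
group_colors = {"x": "tab:blue", "y": "tab:orange"}

# One colour per name, looked up through that name's group
palette = (
    df.drop_duplicates("name")
      .set_index("name")["group"]
      .map(group_colors)
      .to_dict()
)

# hue repeats the categorical variable, so no extra boxes are created
ax = sns.boxplot(data=df, x="value", y="name", hue="name",
                 palette=palette, dodge=False, showfliers=False)
```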

hist() - how to force equal bins width?

Assuming I have the following array: [1,1,1,2,2,40,60,70,75,80,85,87,95] and I want to create a histogram out of it based on the following bins - x<=2, [3<=x<=80], [x>=81].
If I do the following: arr.hist(bins=(0,2,80,100)) I get the bins to be at different widths (based on their x range). I want them to represent different size ranges but appear in the histogram at the same width. Is it possible in an elegant way?
I can think of adding a new column for this (holding the bin id that will be calculated based on the boundaries I want) but don't really like this solution..
Thanks!
It sounds like you want a bar graph. You could use bar:
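import numpy as np
import matplotlib.pyplot as plt
arr = np.array([1,1,1,2,2,40,60,70,75,80,85,87,95])
# np.histogram bins are half-open [a, b), so the edges sit between the
# integers to keep 2 in the first bin and 80 in the second
counts, edges = np.histogram(arr, bins=(0, 2.5, 80.5, 100))
plt.bar(range(3), counts, width=1)  # equal-width bars, one per bin
xlab = ['x<=2', '3<=x<=80', 'x>=81']
plt.xticks(range(3), xlab)  # bars are centred on 0, 1, 2
plt.show()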
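Since the question calls `arr.hist`, the data is presumably a pandas Series. In that case a similar result is possible without adding a column to anything, by binning with `pd.cut` and plotting the counts as bars. This is a sketch of that alternative, not the method from the answer above. `pd.cut` intervals are closed on the right by default, so the edges 2 and 80 fall into the first and second bins, which matches the labels for integer data.

```python
import numpy as np
import pandas as pd

arr = pd.Series([1, 1, 1, 2, 2, 40, 60, 70, 75, 80, 85, 87, 95])
labels = ["x<=2", "3<=x<=80", "x>=81"]

# Intervals: (-inf, 2], (2, 80], (80, inf)
binned = pd.cut(arr, bins=[-np.inf, 2, 80, np.inf], labels=labels)

# Count per bin, kept in bin order rather than sorted by frequency
counts = binned.value_counts().sort_index()

# Every bar has the same width, whatever range its bin covers
ax = counts.plot.bar(width=1.0, edgecolor="black")
```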