Interactively annotating points in scatter plot using Bokeh - pandas

I'm trying to use Bokeh to build an interactive tool that allows a user to select a subset of points from a scatter plot and to subsequently label or annotate those points. Ideally, the user-provided input would update a "label" field for that sample's row in a dataframe.
The code below lets the user select points, but how can they then label the selected points from a text-input widget, e.g. text = TextInput(value="default", title="Label:"), and in so doing change the "label" field for those samples in the dataframe?
import pandas as pd
import numpy as np
from bokeh.plotting import figure, output_file, show, ColumnDataSource
from bokeh.models import HoverTool
from bokeh.models.widgets import TextInput
data = pd.DataFrame()
data["x"] = np.random.randn(100)
data["y"] = np.random.randn(100)
data["label"] = "other"
x=data.x.values
y=data.y.values
label=data.label.values
output_file("toolbar.html")
source = ColumnDataSource(
    data=dict(
        x=x,
        y=y,
        _class=label,
    )
)
hover = HoverTool(
    tooltips=[
        ("index", "$index"),
        ("(x,y)", "($x, $y)"),
        ("class", "@_class"),  # "@" looks up a column of the data source
    ]
)
# note: Bokeh 3.x renamed plot_width/plot_height to width/height
p = figure(plot_width=400, plot_height=400, tools=[hover, "lasso_select", "crosshair"],
           title="Mouse over the dots")
p.circle('x', 'y', size=5, source=source)
show(p)
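The dataframe side of this can be sketched without Bokeh: writing a label into the rows at a set of selected positions. This is only a minimal sketch under assumptions. It assumes the positions come from the plot selection (recent Bokeh versions expose them as source.selected.indices) and that the text comes from the TextInput widget's value. apply_label is a hypothetical helper name, not part of Bokeh or pandas.

```python
import numpy as np
import pandas as pd

def apply_label(df, selected_indices, text, column="label"):
    """Write `text` into `column` for the rows at the given positions.

    Hypothetical helper: `selected_indices` are positional row numbers,
    such as the ones a scatter-plot selection would report.
    """
    col_pos = df.columns.get_loc(column)
    df.iloc[list(selected_indices), col_pos] = text
    return df

rng = np.random.default_rng(0)
data = pd.DataFrame({"x": rng.standard_normal(100),
                     "y": rng.standard_normal(100)})
data["label"] = "other"

# pretend the user lassoed points 3, 10 and 42 and typed "outlier"
apply_label(data, [3, 10, 42], "outlier")
print(data["label"].value_counts())
```

In a Bokeh server app, a callback registered on the text widget (for example with text.on_change("value", ...)) could call such a helper and then copy the updated column back into source.data["_class"] so the hover tooltip reflects it. A static HTML file written with output_file has no Python process running behind it, so this approach would need the server.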

Related

Equivalent of Hist()'s Layout hyperparameter in Sns.Pairplot?

I am trying to find an equivalent of hist()'s figsize and layout parameters for sns.pairplot().
I have a pairplot that gives me nice scatterplots between the X's and y. However, it is oriented horizontally and there is no equivalent layout parameter to make them vertical to my knowledge. 4 plots per row would be great.
This is my current sns.pairplot():
sns.pairplot(X_train,
x_vars = X_train.select_dtypes(exclude=['object']).columns,
y_vars = ["SalePrice"])
This is what I would like it to look like: Source
num_mask = train_df.dtypes != object
num_cols = train_df.loc[:, num_mask[num_mask == True].keys()]
num_cols.hist(figsize = (30,15), layout = (4,10))
plt.show()
What you want to achieve isn't currently supported by sns.pairplot, but you can use one of the other figure-level functions (sns.displot, sns.catplot, ...). sns.lmplot creates a grid of scatter plots. For this to work, the dataframe needs to be in "long form".
Here is a simple example. sns.lmplot has parameters to leave out the regression line (fit_reg=False), to set the height of the individual subplots (height=...), to set its aspect ratio (aspect=..., where the subplot width will be height times aspect ratio), and many more. If all y ranges are similar, you can use the default sharey=True.
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
# create some test data with different y-ranges
np.random.seed(20230209)
X_train = pd.DataFrame({"".join(np.random.choice([*'uvwxyz'], np.random.randint(3, 8))):
np.random.randn(100).cumsum() + np.random.randint(100, 1000) for _ in range(10)})
X_train['SalePrice'] = np.random.randint(10000, 100000, 100)
# convert the dataframe to long form
# 'SalePrice' will get excluded automatically via `melt`
compare_columns = X_train.select_dtypes(exclude=['object']).columns
long_df = X_train.melt(id_vars='SalePrice', value_vars=compare_columns)
# create a grid of scatter plots
g = sns.lmplot(data=long_df, x='SalePrice', y='value', col='variable', col_wrap=4, sharey=False)
g.set(ylabel='')
plt.show()
Here is another example, with histograms of the mpg dataset:
import matplotlib.pyplot as plt
import seaborn as sns
mpg = sns.load_dataset('mpg')
compare_columns = mpg.select_dtypes(exclude=['object']).columns
mpg_long = mpg.melt(value_vars=compare_columns)
g = sns.displot(data=mpg_long, kde=True, x='value', common_bins=False, col='variable', col_wrap=4, color='crimson',
facet_kws={'sharex': False, 'sharey': False})
g.set(xlabel='')
plt.show()
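To make the "long form" mentioned above concrete, here is a small sketch of what melt does to a wide dataframe. The column names LotArea and YearBuilt and all the values are made up for illustration.

```python
import pandas as pd

# made-up wide data: one column per variable
wide = pd.DataFrame({
    "SalePrice": [100, 200, 300],
    "LotArea": [10, 20, 30],
    "YearBuilt": [1990, 2000, 2010],
})

# one row per (SalePrice, variable) pair; the former column names
# end up in 'variable' and their contents in 'value'
long_df = wide.melt(id_vars="SalePrice", value_vars=["LotArea", "YearBuilt"])
print(long_df)
```

The 'variable' column is what col='variable' splits on, so each former column gets its own subplot and col_wrap decides how many subplots go in one row.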

Add a slider to plotly that dynamically changes a column of data frame that is displayed

Minimal working example:
import pandas as pd
import plotly.express as px
A = [10,20,30,40,50,60]
B = [40,50,60,10,20,30]
data = pd.DataFrame({"A":A,"B":B})
alpha=0.5
data["Parameter"]= alpha*data["A"] +(1-alpha)*data["B"]
fig = px.scatter(
data, x="A",y="B",color="Parameter"
)
fig.show()
I would like to have a slider for alpha in plotly graph. I looked at the documentation but only found a slider for a fixed column with constant values.
fig = px.scatter(data, x="A", y="B", color="Parameter", animation_frame='Parameter')
fig["layout"].pop("updatemenus")
fig.update_xaxes(range=[0, 100])
fig.show('browser')
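Since animation_frame only steps through values that already exist in a column, one possible way to get a slider over alpha is to precompute one copy of the data per alpha value. The sketch below does only that preparation step with pandas. The choice of alpha values (0.0 to 1.0 in steps of 0.1) and the column name alpha are assumptions.

```python
import pandas as pd

A = [10, 20, 30, 40, 50, 60]
B = [40, 50, 60, 10, 20, 30]
base = pd.DataFrame({"A": A, "B": B})

# candidate alpha values for the slider (assumed: 0.0, 0.1, ..., 1.0)
alphas = [round(i / 10, 1) for i in range(11)]

# one copy of the data per alpha, with Parameter recomputed each time
frames = pd.concat(
    [base.assign(alpha=a, Parameter=a * base["A"] + (1 - a) * base["B"])
     for a in alphas],
    ignore_index=True,
)
print(frames.head())
```

Passing this frame to px.scatter(frames, x="A", y="B", color="Parameter", animation_frame="alpha", range_color=[10, 60]) should then give one slider step per alpha value. Fixing range_color keeps the colour scale the same across steps.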

Plotly chart percentage with smileys

I would like to add a plot figure based on smileys like this one:
The data will come from a pandas dataframe: dataframe.value_counts(normalize=True)
Can someone give me some clues?
use colorscale in the normal way for a heatmap
use annotation_text to assign an emoji to a value
import plotly.figure_factory as ff
import plotly.graph_objects as go
import pandas as pd
import numpy as np
df = pd.DataFrame([[j*10+i for i in range(10)] for j in range(10)])
e=["😃","🙂","😐","☹️"]
fig = go.Figure(ff.create_annotated_heatmap(
z=df.values, colorscale="rdylgn", reversescale=False,
annotation_text=np.select([df.values>75, df.values>50, df.values>25, df.values>=0], e),
))
fig.update_annotations(font_size=25)
# allows emoji to use background color
fig.update_annotations(opacity=0.7)
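The threshold-to-emoji mapping done by np.select above can be looked at on its own. This is a small sketch with made-up values. The explicit default="" is an addition here so that every possible result is a string.

```python
import numpy as np

values = np.array([[80, 60], [30, 10]])  # made-up percentages
faces = ["😃", "🙂", "😐", "🙁"]

# conditions are checked in order; the first one that is True wins
conditions = [values > 75, values > 50, values > 25, values >= 0]
labels = np.select(conditions, faces, default="")
print(labels)
```

Because the first matching condition wins, the thresholds have to be listed from highest to lowest.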
Update: coloured emoji
fundamentally you need emoji icons that can accept colour styling
for this I switched to Font Awesome. This in turn requires switching to dash, plotly's cousin, so that external CSS can be used (to load Font Awesome)
then build a dash HTML table, applying styling logic to pick the emoticon and colour
from jupyter_dash import JupyterDash
import dash_html_components as html
import pandas as pd
import branca.colormap
# Load Data
df = pd.DataFrame([[j*10+i for i in range(10)] for j in range(10)])
external_stylesheets = [{
'href': 'https://use.fontawesome.com/releases/v5.8.1/css/all.css',
'rel': 'stylesheet', 'crossorigin': 'anonymous',
'integrity': 'sha384-50oBUHEmvpQ+1lW4y57PTFmhCaXp0ML5d60M1M7uH2+nqUivzIebhndOJK28anvf',
}]
# a different library could possibly be used for this - a simple way to map a value to a colormap
cm = branca.colormap.LinearColormap(["red","yellow","green"], vmin=0, vmax=100, caption=None)
def mysmiley(v):
    # icons ordered from happiest to saddest; v // 25 picks the quartile
    sm = ["far fa-grin", "far fa-smile", "far fa-meh", "far fa-frown"]
    return html.Span(className=sm[3-(v//25)], style={"color":cm(v),"font-size": "2em"})
# Build App
app = JupyterDash(__name__, external_stylesheets=external_stylesheets)
app.layout = html.Div([
html.Table([html.Tr([html.Td(mysmiley(c)) for c in r]) for r in df.values])
])
# Run app and display result inline in the notebook
app.run_server(mode='inline')

How to plot frequency distribution graph using Matplotlib?

I trust you are doing well. I am using a data frame with two columns: screen and its frequency. I am trying to find the relationship between each screen and how often it appears. Now I want to see the frequencies of all screens together in a kind of summary graph. Imagine putting all of those frequencies into an array and wanting to study the distribution in that array. Below is the code I have tried so far:
data = pd.read_csv('frequency_list.csv')
new_columns = data.columns.values
new_columns[1] = 'frequency'
data.columns = new_columns
import matplotlib.pyplot as plt
%matplotlib inline
dataset = data.head(10)
dataset.plot(x = "screen", y = "frequency", kind = "bar")
plt.show()
col_one_list = unpickled_df['screen'].tolist()
col_one_arr = unpickled_df['screen'].head(10).to_numpy()
plt.hist(col_one_arr) #gives you a histogram of your array 'a'
plt.show() #finishes out the plot
Below is the screenshot of my data frame containing screen as one column and frequency as another. Can you help me to find out a way to plot a frequency distribution graph? Thanks in advance.
Will a bar plot work? Here's an example:
import pandas as pd
import matplotlib.pyplot as plt
freq = [102,98,56,117]
screen = ['A','B','C','D']
df = pd.DataFrame(list(zip(screen, freq)), columns=['screen', 'freq'])
plt.bar(df.screen,df.freq)
plt.xlabel('x')
plt.ylabel('count')
plt.show()
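If the goal is the distribution of the frequency values themselves rather than one bar per screen, the values can be binned. This is a minimal sketch using the same made-up data as the bar plot above. The number of bins (3) is an arbitrary choice.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"screen": ["A", "B", "C", "D"],
                   "freq": [102, 98, 56, 117]})

# bin the frequency values themselves; bins=3 is an arbitrary choice
counts, edges = np.histogram(df["freq"], bins=3)
for lo, hi, n in zip(edges[:-1], edges[1:], counts):
    print(f"{lo:6.1f} to {hi:6.1f}: {n} screen(s)")
```

plt.hist(df["freq"], bins=3) would draw the same bins as bars.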

Update data point labels in bokeh plot

I use bokeh in an ipython notebook and would like to have a button next to a plot to switch the labels of the data points on or off. I found a solution using IPython.html.widgets.interact, but this solution resets the plot on each update, including zooming and panning.
This is the minimal working code example:
from numpy.random import random
from bokeh.plotting import figure, show, output_notebook
from ipywidgets import interact  # formerly IPython.html.widgets
def plot(label_flag):
    p = figure()
    N = 10
    x = random(N)+2
    y = random(N)+2
    labels = [str(i) for i in range(N)]
    p.scatter(x, y)
    # only draw the labels when the flag is set
    if label_flag:
        p.text(x, y, labels)
    output_notebook()
    show(p)
interact(plot, label_flag=True)
p.s. If there is an easy way to do this in matplotlib I would also switch back again.
By using bokeh.models.ColumnDataSource to store and change the plot's data, I was able to achieve what I wanted.
One caveat is that I found no way to make it work without a refresh other than calling output_notebook twice, in two different cells. If I remove either of the two output_notebook calls, the GUI of the tool buttons looks broken, or changing a setting again resets the plot.
from numpy.random import random
from bokeh.plotting import figure, show, output_notebook
from ipywidgets import interact  # formerly IPython.html.widgets
from bokeh.models import ColumnDataSource
output_notebook()
## <-- new cell -->
p = figure()
N = 10
x_data = random(N)+2
y_data = random(N)+2
labels = [str(i) for i in range(N)]
source = ColumnDataSource(
    data={
        'x': x_data,
        'y': y_data,
        'desc': labels
    }
)
p.scatter('x', 'y', source=source)
p.text('x', 'y', 'desc', source=source)
output_notebook()
def update_plot(label_flag=True):
    # swap the label column in place instead of rebuilding the figure
    if label_flag:
        source.data['desc'] = [str(i) for i in range(N)]
    else:
        source.data['desc'] = ['']*N
    show(p)
interact(update_plot, label_flag=True)