Visualizing longitudinal patient data: adding icons or symbols to certain cells in a time series heatmap to indicate events/outcomes

I am currently involved in a clinical study. We are trying to visualize patients' blood work over time using seaborn cluster maps (for example, patient CRP levels). For reference: we have some 200 patients and up to 60 days of observed data, so the cells in the plot are pretty small.
During the observation period, some patients either died or developed an outcome of interest. We would love to mark these key events with some form of symbol or icon. I am imagining something like this:
In addition to its color coding, the cell at the date of death gets a big dot right in the middle, or a symbolic cross or some other symbol.
Things that might work, but that I do not know how to do:
I am using lines to separate cells. Changing the width and color of the cell borders at the date an event occurred might work.
Things that don't work:
The cells in my heatmap are too small for custom annotations.
import pandas as pd
import seaborn as sns
df = pd.read_excel('data.xlsx')
# Mask cells whose value is exactly 0 (a comparison, ==, not an assignment)
heatmap = sns.clustermap(df, col_cluster=False, row_cluster=False, cmap='YlOrRd',
                         mask=(df == 0), vmax=10, vmin=0, linewidths=1, linecolor='black',
                         figsize=(20, 16), cbar_pos=(0.1, 0.2, .02, .6))
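One possible approach, offered as a sketch and not as a tested solution for the study data: draw the event symbols on top of the heatmap with a scatter call placed at the cell centres. In a seaborn heatmap, the cell in row i and column j spans [j, j+1] by [i, i+1] in data coordinates, so its centre is at (j + 0.5, i + 0.5). The data, the `events` mapping, and the marker style below are all made up for illustration. The sketch uses `sns.heatmap` for brevity; with `sns.clustermap` the same scatter call would go to the grid's `ax_heatmap` axes, and if clustering reorders rows or columns the positions would have to follow the reordered order.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical data: 5 patients (rows) observed over 8 days (columns)
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.uniform(0, 10, size=(5, 8)),
    index=[f"patient_{i}" for i in range(5)],
    columns=range(1, 9),
)

# Hypothetical events: patient label -> day on which the event occurred
events = {"patient_1": 4, "patient_3": 7}

fig, ax = plt.subplots(figsize=(8, 4))
sns.heatmap(df, cmap="YlOrRd", vmin=0, vmax=10,
            linewidths=1, linecolor="black", ax=ax)

# Translate labels into cell-centre coordinates (position + 0.5)
xs = [df.columns.get_loc(day) + 0.5 for day in events.values()]
ys = [df.index.get_loc(patient) + 0.5 for patient in events]

# Draw the markers above the coloured cells; s is the marker area in points^2
markers = ax.scatter(xs, ys, marker="x", s=60, c="black", zorder=3)
```

Because the marker size `s` is independent of the cell size, it can be tuned until the symbol stays legible in a 200 x 60 grid. Different outcomes could be drawn with separate scatter calls using different `marker` values.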

Related

Annotating numeric values on grouped bars chart in pyplot

Good evening all,
I have a pd.DataFrame called plot_eigen_vecs_df with shape (3, 11), and I am plotting each column's values grouped by row on a bar chart. I am using the following code:
plot_eigen_vecs_df.plot(kind='bar', figsize=(12, 8),
title='First 3 PCs factor loadings',
xlabel='Evects', legend=True)
The result is a grouped bar chart.
I would like to keep the grouped chart exactly as it is, but I need to show the numeric value above each bar.
I tried the add_label method, but the version of pyplot I am using is not the most recent, so .add_label doesn't work for me. Could you please help with this?
Thank you
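For reference, the convenience method in recent Matplotlib is `Axes.bar_label` (available from Matplotlib 3.4). On older versions, one workaround is to loop over the bar rectangles in `ax.patches` and call `ax.annotate` for each. The sketch below assumes made-up loadings with the same (3, 11) shape; the row and column labels are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line for interactive use
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real loadings: 3 PCs x 11 features
rng = np.random.default_rng(0)
plot_eigen_vecs_df = pd.DataFrame(
    rng.uniform(-1, 1, size=(3, 11)).round(2),
    index=["PC1", "PC2", "PC3"],
    columns=[f"feature_{i}" for i in range(11)],
)

ax = plot_eigen_vecs_df.plot(kind="bar", figsize=(12, 8),
                             title="First 3 PCs factor loadings",
                             xlabel="Evects", legend=True)

# Each bar is a Rectangle in ax.patches; label it at the end of the bar
for bar in ax.patches:
    height = bar.get_height()
    ax.annotate(
        f"{height:.2f}",
        xy=(bar.get_x() + bar.get_width() / 2, height),
        xytext=(0, 3 if height >= 0 else -3),  # nudge away from the bar end
        textcoords="offset points",
        ha="center",
        va="bottom" if height >= 0 else "top",
        fontsize=7,
    )
```

Negative loadings get their label below the bar so the text does not overlap the rectangle.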

How to create Correlation Heat Map of All Measure in Tableau?

I have a query with 10 measures. I am able to draw a correlation heat map in Python using the code below:
import pandas as pd
import seaborn as sn
import matplotlib.pyplot as plt
# con is a placeholder for an existing database connection, which read_sql requires
df = pd.read_sql('select statement', con)
sn.heatmap(df.corr(), annot=True)
plt.show()
How can I make a similar correlation heat map in Tableau?
The general way to make a heatmap in Tableau is to put a discrete field on rows and a discrete field on columns. Select the square mark type. Under Format, make a square cell size, and adjust the cell size to be as large as you prefer.
Then put a continuous field on the color shelf. Click on the color button to choose the color palette you like, and possibly turn on a border. Click on the size button to adjust the mark size to match the cell size.
There are a lot of good examples on Tableau Public.
https://public.tableau.com/app/search/vizzes/correlation%20matrix

How to make a Scatter Plot for a Dataset with 4 Attributes and 5th attribute being the Cluster

I have a dataset with four attributes, and a fifth column (which I added myself) that gives the cluster each row belongs to.
I want to build something like a scatter plot for this dataset, but I am unable to do so. I have tried searching, and the best I could find was the following question on Stack Overflow:
How to make a 4d plot with matplotlib using arbitrary data
Using this, I was able to make a scatter plot, but only for three attributes, with the fourth dimension being the cluster of each row.
Can anyone help me figure out how to make a scatter plot for a dataset like mine?
I would recommend something like seaborn's pairplot:
import seaborn as sns
sns.pairplot(df, hue="cluster")
See the images in the seaborn documentation for what it looks like.
This creates a grid of pairwise scatterplots, one for each pair of attributes, instead of trying to make a 3D plot and arbitrarily flatten one of the dimensions.

Pandas dataframe rendered with bokeh shows no marks

I am attempting to create a simple hbar() chart on two columns [project, bug_count]. Sample dataframe follows:
df = pd.DataFrame({'project': ['project1', 'project2', 'project3', 'project4'],
'bug_count': [43683, 31647, 27494, 24845]})
When attempting to render any chart (scatter, circle, vbar, etc.), I get a blank chart.
This very simple code snippet shows an empty viz. The example uses f.circle() just for demonstration; I'm actually trying to implement f.hbar().
from bokeh.io import show, output_notebook
from bokeh.plotting import figure
f = figure()
f.circle(df['project'], df['bug_count'],size = 10)
show(f)
The values of df['project'] are strings, i.e. categorical values, not numbers. Categorical ranges must be provided explicitly, since you are the only person who knows in what order the arbitrary factors should appear on the axis. Something like
p = figure(x_range=sorted(set(df['project'])))
There is an entire chapter in the User's Guide devoted to Handling Categorical Data, with many complete examples (including many bar charts) that you can refer to.
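As a sketch of how this could look for the horizontal bar chart the question is after: for `hbar` the categories sit on the y axis, so the factor list goes into `y_range` instead of `x_range`. Ordering the factors by bug count is an assumption made here for readability, not something the question specifies.

```python
import pandas as pd
from bokeh.plotting import figure

df = pd.DataFrame({'project': ['project1', 'project2', 'project3', 'project4'],
                   'bug_count': [43683, 31647, 27494, 24845]})

# Explicit factor order for the categorical axis (smallest count first,
# so the largest bar ends up at the top of the chart)
factors = df.sort_values('bug_count')['project'].tolist()

p = figure(y_range=factors, title="Bugs per project")

# One horizontal bar per project, extending from 0 to its bug count
bars = p.hbar(y=df['project'].tolist(),
              right=df['bug_count'].tolist(),
              height=0.8)

# In a notebook, call output_notebook() and then show(p) to render the chart.
```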

How do I create a bar chart that starts and ends in a certain range

I created a computer model (just for fun) to predict soccer match results. I ran a simulation to predict how many points each team will gain, so I have a list of simulation results for each team.
I want to plot something like a confidence interval, but using a bar chart.
I considered the following options:
I considered using matplotlib's candlestick, but this is not Forex price data.
I also considered using matplotlib's errorbar, especially since it turns out I can combine a bar chart with error bars, but it's not really what I am aiming for. I am actually aiming for something like Nate Silver's 538 election prediction result.
Nate Silver's chart is too complex: he colors the distribution and varies the size of the percentage. I just want a simple bar chart that spans a certain range.
I don't want to resort to stacking bars like shown here
Matplotlib's barh (or bar) is probably suitable for this:
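import numpy as np
import matplotlib.pyplot as plt
x_mean = np.array([1, 3, 6])      # centre of each bar
x_std = np.array([0.3, 1, 0.7])   # half-width of each bar
y = np.array([0, 1, 2])           # vertical position of each bar
plt.figure()
# Each bar starts at mean - std and is 2*std wide
plt.barh(y, width=2*x_std, left=x_mean-x_std)
plt.show()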
The bars have a horizontal width of 2*x_std and start at x_mean-x_std, so the center denotes the mean value.
It's not very pretty (yet), but highly customizable:
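One possible customization, as a sketch and not the original answer's code: derive the start and end of each bar directly from the simulation results using percentiles, and mark the median with a tick. The team names, the simulated point totals, and the choice of a 5th to 95th percentile range are all assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical simulation output: total points per team over 1000 runs
rng = np.random.default_rng(42)
simulations = {
    "Team A": rng.normal(70, 5, 1000),
    "Team B": rng.normal(62, 8, 1000),
    "Team C": rng.normal(55, 4, 1000),
}

teams = list(simulations)
low = np.array([np.percentile(simulations[t], 5) for t in teams])
high = np.array([np.percentile(simulations[t], 95) for t in teams])
median = np.array([np.percentile(simulations[t], 50) for t in teams])
y = np.arange(len(teams))

fig, ax = plt.subplots(figsize=(6, 3))
# Each bar starts at the 5th percentile and ends at the 95th
bars = ax.barh(y, width=high - low, left=low, height=0.5, color="lightsteelblue")
# A vertical tick marks the median inside each bar
ax.plot(median, y, "k|", markersize=14)
ax.set_yticks(y)
ax.set_yticklabels(teams)
ax.set_xlabel("Simulated points (5th to 95th percentile)")
# Use fig.savefig(...) or plt.show() to view the result
```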