Seaborn: how do I create a box plot of only particular attributes in a DataFrame? - pandas

I would like to create two box plots to visualize different attributes within my data, splitting the attributes up based on their scale. I currently have this:
# box plots to show the distributions of attributes
sns.boxplot(data=df)
[Image: box plot with all attributes included]
I would like it to be like the images below with the attributes in different box plots based on their scale but with the attribute labels below each boxplot (not the current integers).
# box plots to show the distributions of attributes
sns.boxplot(data=[df['mi'],df['steps'],df['Standing time'],df['lying time']])
[Image: box plot by scale 1]

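You can subset a pandas DataFrame by indexing with a list of column names. Because the result is still a DataFrame, seaborn labels each box with its column name rather than an integer position: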
sns.boxplot(data=df[['mi', 'steps', 'Standing time', 'lying time']])

Related

How to create Correlation Heat Map of All Measure in Tableau?

I have a query with 10 measures, and I am able to draw a correlation heat map in Python using the code below:
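import pandas as pd
import seaborn as sn
import matplotlib.pyplot as plt
# con is an open database connection (not shown in the original snippet)
df = pd.read_sql('select statement', con)
# annotate each cell with its correlation coefficient
sn.heatmap(df.corr(), annot=True)
plt.show()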
How can I make a similar correlation heat map in Tableau?
The general way to make a heatmap in Tableau is to put a discrete field on rows and a discrete field on columns. Select the square mark type. Under Format, make a square cell size, and adjust the cell size to be as large as you prefer.
Then put a continuous field on the color shelf. Click on the color button to choose the color palette you like, and possibly turn on a border. Click on the size button to adjust the mark size to match the cell size.
There are a lot of good examples on Tableau Public.
https://public.tableau.com/app/search/vizzes/correlation%20matrix
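The layout described above (one discrete field on rows, one on columns, a continuous field on color) expects long-form data: one record per pair of measures, with the correlation as the value. One possible way to prepare that is to compute the correlations in pandas and reshape them before loading the result into Tableau. This is a sketch under assumptions: the random data stands in for the query result, and the names `measure_row`, `measure_col` and `correlation` are hypothetical.

```python
import numpy as np
import pandas as pd

# Random stand-in for the query result (three measures instead of ten).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(100, 3)), columns=["m1", "m2", "m3"])

# Square correlation matrix, as used by the seaborn heat map.
corr = df.corr()

# Reshape to long form: one row per (measure_row, measure_col) pair.
long_corr = (
    corr.rename_axis("measure_row")
        .reset_index()
        .melt(id_vars="measure_row",
              var_name="measure_col",
              value_name="correlation")
)

# The long table could then be exported for Tableau, for example:
# long_corr.to_csv("correlations.csv", index=False)
```

In Tableau, `measure_row` and `measure_col` would then be the two discrete fields and `correlation` the continuous field on the color shelf.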

Is there a way to plot multiple lines using hvplot.line from an xarray array?

I have data for multiple y traces in an xarray array.
A subset of traces can be selected with:
t=s_xr_all.sel(trace_index=slice(0,2,1),xy='y')
# trace_index and xy are dimension names and above selects subset of 3 traces (lines) into t
t.name='t'
t.hvplot.line(x='point_index',y='t')
The above creates a line plot with a slider widget that scrolls through the lines, displaying a single line at a time.
I would like to plot all lines at once without creating the slider widget. The hvplot documentation is sparse on how to do that.
t.hvplot.line(x='point_index',y='t').overlay()
Chaining .overlay() onto the call eliminates the slider, and all the lines in the xarray array are displayed together.

How to make a Scatter Plot for a Dataset with 4 Attributes and 5th attribute being the Cluster

I have a dataset which looks like this,
It has four attributes and the fifth column (which I added by myself) is the cluster of each row to which the row belongs.
I want to build something like a scatter plot for this dataset, but I am unable to do so. I have tried searching, and the best I could find was the following question on Stack Overflow:
How to make a 4d plot with matplotlib using arbitrary data
Using this, I was able to make a scatter plot, but only for three attributes, with the fourth being the cluster of each row.
Can anyone help me figure out how to make a scatter plot for a dataset like mine?
I would recommend something like seaborn's pairplot:
import seaborn as sns
sns.pairplot(df, hue="cluster")
See the images in the link for what it looks like.
This creates several pairwise scatterplots instead of trying to make a 3D plot and arbitrarily flatten one of the dimensions.

Pandas dataframe rendered with bokeh shows no marks

I am attempting to create a simple hbar() chart on two columns [project, bug_count]. Sample dataframe follows:
df = pd.DataFrame({'project': ['project1', 'project2', 'project3', 'project4'],
'bug_count': [43683, 31647, 27494, 24845]})
When attempting to render any chart: scatter, circle, vbar etc... I get a blank chart.
This very simple code snippet shows an empty viz. This example shows a f.circle() just for demonstration, I'm actually trying to implement a f.hbar().
from bokeh.io import show, output_notebook
from bokeh.plotting import figure
f = figure()
f.circle(df['project'], df['bug_count'],size = 10)
show(f)
The values of df['project'] are strings, i.e. categorical values, not numbers. Categorical ranges must be provided explicitly, since you are the only person who knows what order the arbitrary factors should appear in on the axis. Something like:
p = figure(x_range=sorted(set(df['project'])))
There is an entire chapter in the User's Guide devoted to Handling Categorical Data, with many complete examples (including many bar charts) that you can refer to.
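For the hbar() chart the question is ultimately after, the categories sit on the y axis, so the explicit categorical range goes in y_range rather than x_range. Here is a minimal sketch using the sample dataframe from the question. Keeping the factors in dataframe order is an assumption; any ordering you prefer can be passed instead.

```python
import pandas as pd
from bokeh.plotting import figure

# Sample dataframe from the question.
df = pd.DataFrame({'project': ['project1', 'project2', 'project3', 'project4'],
                   'bug_count': [43683, 31647, 27494, 24845]})

# Categorical factors for the y axis, in the order they should appear.
projects = df['project'].tolist()

# Passing a list of strings as y_range makes the axis categorical.
p = figure(y_range=projects, title="Bug count by project")

# One horizontal bar per project; right= is the bar length.
p.hbar(y=projects, right=df['bug_count'].tolist(), height=0.8)

# In a notebook, display with:
# from bokeh.io import show, output_notebook
# output_notebook(); show(p)
```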

Holoviews Polygons inputs

I have been able to make a choropleth map in Bokeh using multiple lists (latitudes, longitudes, county names, value to display, color to display). I wanted to use Holoviews with Bokeh to get their color legend as I prefer it over Bokeh's disjoint grouping one.
In general, I have been unable to find good documentation on structuring a dataframe so that Holoviews can pull data from it. I found mentions of it on their GeoViews documentation, and tried to replicate the Choropleths example they give but cannot get it to work. How do dataframes need to be formatted for Holoviews?
If you want to render polygons from dataframes in HoloViews/GeoViews, you have two options:
1) Use geopandas dataframes, which will work out of the box. Just pass your geopandas dataframe to the Polygons element and it will display itself.
2) Pass in a list of dataframes, one for each polygon. In the following example we create a list of dataframes by creating Box elements and calling dframe on them. This list of dataframes can then be passed to the Polygons element:
list_of_dfs = [hv.Box(0, 0, i/10.).dframe() for i in range(10, 1, -1)]
hv.Polygons(list_of_dfs)