How to change legend labels in scatter matrix - pandas

I have a scatter matrix whose labels I want to change. In the legend on the right, I want the blue color 1 to say Mystery and the red color 2 to say Science. I also want each subplot's axis labels to read Spicy, Savory, and Sweet instead of the column names. I tried using a dict to relabel, but then my charts came out wrong.
import plotly.express as px
fig = px.scatter_matrix(
    df,
    dimensions=["Q12_Spicy", "Q12_Sav", "Q12_Sweet"],
    color="Q11_Ans",
)
fig.show()

You can create a new column called Q11_Labels that maps 1 to Mystery and 2 to Science from the Q11_Ans column, and pass color='Q11_Labels' to the px.scatter_matrix function. If you still want the legend title to show the original column name, you can pass a dictionary to the labels parameter of px.scatter_matrix with labels={"Q11_Labels":"Q11_Ans"}.
You can then extend this dictionary with the other column-name-to-display-name mappings, so that [Spicy, Savory, Sweet] are displayed instead of [Q12_Spicy, Q12_Sav, Q12_Sweet].
import numpy as np
import pandas as pd
import plotly.express as px
# recreate random data with the same columns
np.random.seed(42)
df = pd.DataFrame(
    np.random.randint(0, 100, size=(100, 3)),
    columns=["Q12_Spicy", "Q12_Sav", "Q12_Sweet"]
)
df["Q11_Ans"] = np.random.randint(1, 3, size=100)
df["Q11_Ans"] = df["Q11_Ans"].astype("category")
df = df.sort_values(by="Q11_Ans")
# remap the values of 1 and 2 to their meanings, then pass this as the color
df["Q11_Labels"] = df["Q11_Ans"].map({1: "Mystery", 2: "Science"})
# pass a dictionary to the labels parameter
fig = px.scatter_matrix(
    df,
    dimensions=["Q12_Spicy", "Q12_Sav", "Q12_Sweet"],
    color="Q11_Labels",
    labels={"Q12_Spicy": "Spicy", "Q12_Sav": "Savory", "Q12_Sweet": "Sweet", "Q11_Labels": "Q11_Ans"},
)
fig.show()
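One thing worth checking before plotting: Series.map returns NaN for any value that is missing from the dictionary, which could be one reason a dict-based relabel makes a chart come out wrong. A minimal sketch of that check, using made-up codes (the value 3 is a hypothetical stray answer, not something from the question):

```python
import pandas as pd

# Hypothetical answer codes; 3 stands in for a value with no label.
codes = pd.Series([1, 2, 2, 1, 3], name="Q11_Ans")
label_map = {1: "Mystery", 2: "Science"}

# map() leaves NaN wherever a code is missing from the dictionary.
labels = codes.map(label_map)

# Codes that did not receive a label and would drop out of the legend.
unmapped = codes[labels.isna()].unique().tolist()
print(unmapped)
```

If the list is empty, every row has a label and the new column is safe to pass as color.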

How to build columns in Plotly with multiple values sorted by value?

I have a dataframe (code below) with three columns: date, system, and number. When I build a bar chart in Plotly, I get two stacked bars whose segments I cannot sort by value; they are automatically sorted by name.
import pandas as pd
import numpy as np
data = [('2022-10-01','Pay1',644), ('2022-10-01','Pay2',1460), ('2022-10-01','Pay3',1221), ('2022-10-01','Pay4',1623),
        ('2022-10-01','Pay5',1904), ('2022-10-01','Pay6',1853), ('2022-10-01','Pay7',1826), ('2022-10-01','Pay8',247),
        ('2022-10-01','Pay9',713), ('2022-10-01','Pay10',1159), ('2022-10-02','Pay1',755), ('2022-10-02','Pay2',786),
        ('2022-10-02','Pay3',623), ('2022-10-02','Pay4',1766), ('2022-10-02','Pay5',1141), ('2022-10-02','Pay6',362),
        ('2022-10-02','Pay7',1097), ('2022-10-02','Pay8',655), ('2022-10-02','Pay9',1569), ('2022-10-02','Pay10',796)]
data = pd.DataFrame(data, columns=['date','system','number'])
import plotly.express as px
fig = px.bar(data, x='date', y='number',
             color='system')
fig.show()
I want each bar's segments to be sorted by value, from smallest to largest.
The expected graph is a stacked bar chart that uses the same color for each system in every bar, with the segments of each bar ordered by numerical value. To keep the colors consistent, create a dictionary that maps each system name to a color from the default discrete color sequence, and add a color column to each per-date data frame. Then extract the rows for each date, sort them by number, and loop through them row by row, adding one trace per row. Traces are stacked in the order they are added, so the sort direction determines which value sits at the bottom of each bar.
import plotly.graph_objects as go
import plotly.express as px
# map each system name to one color from the default discrete sequence
colors = px.colors.qualitative.Plotly
system_name = data['system'].unique()
colors_dict = {k: v for k, v in zip(system_name, colors)}
# print(colors_dict)
fig = go.Figure()
# first date: sort by value, then add one trace per row
dff = data.query('date =="2022-10-01"')
dff = dff.sort_values('number', ascending=False)
dff['color'] = dff['system'].map(colors_dict)
for row in dff.itertuples():
    fig.add_trace(go.Bar(x=[row.date], y=[row.number], name=row.system, marker_color=row.color))
fig.update_layout(barmode='stack')
# second date: same steps
dfm = data.query('date =="2022-10-02"')
dfm = dfm.sort_values('number', ascending=False)
dfm['color'] = dfm['system'].map(colors_dict)
for row in dfm.itertuples():
    fig.add_trace(go.Bar(x=[row.date], y=[row.number], name=row.system, marker_color=row.color))
fig.update_layout(barmode='stack')
# show each system only once in the legend
names = set()
fig.for_each_trace(
    lambda trace:
        trace.update(showlegend=False)
        if (trace.name in names) else names.add(trace.name))
fig.show()
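The two preparation steps (one color per system, and a per-date sort) can be checked without drawing anything. The sketch below uses a three-system subset of the question's data; the hex codes are placeholder colors chosen for illustration, and ascending=True is an assumption made here so that the smallest value is listed first.

```python
import pandas as pd

# Small stand-in for the question's data (three systems, two dates).
data = pd.DataFrame(
    [("2022-10-01", "Pay1", 644), ("2022-10-01", "Pay2", 1460),
     ("2022-10-01", "Pay3", 1221), ("2022-10-02", "Pay1", 755),
     ("2022-10-02", "Pay2", 786), ("2022-10-02", "Pay3", 623)],
    columns=["date", "system", "number"],
)

# Placeholder hex colors; any list at least as long as the number of systems works.
palette = ["#636EFA", "#EF553B", "#00CC96"]
color_by_system = dict(zip(data["system"].unique(), palette))

# Sort within each date; ascending=True lists the smallest value first.
ordered = data.sort_values(["date", "number"], ascending=[True, True])
ordered["color"] = ordered["system"].map(color_by_system)

# Order in which the traces would be added for each date.
stack_order = ordered.groupby("date")["system"].apply(list).to_dict()
print(stack_order)
```

Each system keeps the same color on both dates, while the order of the systems differs per date.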

label pandas header with an index name

I am trying to run this code to generate a plot with Plotly Express.
import pandas as pd
import plotly.express as px
df = pd.DataFrame([[1,1,1,0,0], [1,1,1,0,1],
                   [1,1,0,1,0], [0,1,1,1,1]])
## example by plotly
# https://plotly.com/python/facet-plots/
# fig = px.line(df, facet_col="company", facet_col_wrap=2)
fig = px.area(df, facet_col_wrap=2)  # works but not as expected
# fig = px.area(df, facet_col="???", facet_col_wrap=2)  # should be the solution, but "???" is missing
fig.show(renderer="browser")
The Plotly example (https://plotly.com/python/facet-plots/) has a labeled column ("company") that is passed to facet_col. I don't know how to give my df's columns such a label. I expect to plot the dataframe as in the Plotly example.
You need to get your data frame into a structure that Plotly Express can use:
unstack() moves the column labels into the index, giving one row per value
reset_index() turns the index levels into ordinary columns (level_0 and level_1), and set_index("level_0") puts one of them back as the index
now you have a column (level_1) that you can pass as facet_col to px.area()
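import pandas as pd
import plotly.express as px
df = pd.DataFrame([[1,1,1,0,0], [1,1,1,0,1],
                   [1,1,0,1,0], [0,1,1,1,1]])
# long format: level_0 and level_1 hold the original labels, column 0 holds the values
fig = px.area(df.unstack().to_frame().reset_index().set_index("level_0"), facet_col="level_1", facet_col_wrap=2)
fig.show()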
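To see what the reshaping chain produces before it is handed to Plotly, it can help to inspect the long-format frame on its own. This is a small sketch; the names "column", "row" and "value" are introduced here for readability and are not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame([[1, 1, 1, 0, 0], [1, 1, 1, 0, 1],
                   [1, 1, 0, 1, 0], [0, 1, 1, 1, 1]])

# unstack() yields a Series indexed by (column label, row label).
long_df = df.unstack().to_frame(name="value").reset_index()

# reset_index() names the unnamed index levels level_0 and level_1.
long_df = long_df.rename(columns={"level_0": "column", "level_1": "row"})
print(long_df.head())
```

With named columns, the facet argument becomes self-describing, for example facet_col="row".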

Making a Scatter Plot from a DataFrame in Pandas

I have a DataFrame and need to make a scatter-plot from it.
I need to use 2 columns as the x-axis and y-axis and only need to plot 2 rows from the entire dataset. Any suggestions?
For example, my dataframe is below (50 states x 4 columns). I need to plot 'rgdp_change' on the x-axis vs 'diff_unemp' on the y-axis, and only need to plot for the states, "Michigan" and "Wisconsin".
So from the dataframe, you'll need to select the rows from a list of the states you want: ['Michigan', 'Wisconsin']
I also figured you would probably want a legend or some way to differentiate one point from the other. To do this, we create a colormap assigning a different color to each state. This way the code is generalizable for more than those two states.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import matplotlib.colors as colors
# generate a small df with the relevant rows and columns of your actual df
df = pd.DataFrame({'State': ['Alabama', 'Alaska', 'Michigan', 'Wisconsin'], 'real_gdp': [1.75*10**5, 4.81*10**4, 2.59*10**5, 1.04*10**5],
                   'rgdp_change': [-0.4, 0.5, 0.4, -0.5], 'diff_unemp': [-1.3, 0.4, 0.5, -11]})
fig, ax = plt.subplots()
states = ['Michigan', 'Wisconsin']
# one color per state, sampled from the viridis colormap
colormap = cm.viridis
colorlist = [colors.rgb2hex(colormap(i)) for i in np.linspace(0, 0.9, len(states))]
for state, c in zip(states, colorlist):
    # select the row by state name so the legend label always matches the point
    row = df.loc[df["State"] == state]
    ax.scatter(row["rgdp_change"], row["diff_unemp"], label=state, s=50, linewidth=0.1, color=c)
ax.set_xlabel('rgdp_change')
ax.set_ylabel('diff_unemp')
ax.legend()
plt.show()
Use the DataFrame plot method, but first filter the states you need with the index's isin method (this assumes the state names are the index):
states = ["Michigan", "Wisconsin"]
df[df.index.isin(states)].plot(kind='scatter', x='rgdp_change', y='diff_unemp')
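The two answers filter differently because one treats State as a column and the other as the index. A minimal sketch of both selections, reusing the sample values from the first answer (whether State is a column or the index in the asker's real data is not known):

```python
import pandas as pd

df = pd.DataFrame({
    "State": ["Alabama", "Alaska", "Michigan", "Wisconsin"],
    "rgdp_change": [-0.4, 0.5, 0.4, -0.5],
    "diff_unemp": [-1.3, 0.4, 0.5, -11],
})
states = ["Michigan", "Wisconsin"]

# State stored as a regular column: build a boolean mask with isin().
by_column = df[df["State"].isin(states)]

# State stored as the index: select the labels directly with .loc.
by_index = df.set_index("State").loc[states]

print(by_column)
print(by_index)
```

Either result can then be passed to .plot(kind='scatter', x='rgdp_change', y='diff_unemp').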

Creating dataframe boxplot from dataframe with row and column multiindex

I have the following Pandas data frame and I'm trying to create a boxplot of the "dur" value for both client and server, organized by qdepth (qdepth on the x-axis, duration on the y-axis, with two variables, client and server). It seems like I need to get client and server as columns. I haven't been able to figure this out by trying combinations of unstack and reset_index.
Here's some dummy data I recreated since you didn't post yours aside from an image:
qdepth,mode,runid,dur
1,client,0x1b7bd6ef955979b6e4c109b47690c862,7.0
1,client,0x45654ba030787e511a7f0f0be2db21d1,30.0
1,server,0xb760550f302d824630f930e3487b4444,19.0
1,server,0x7a044242aec034c44e01f1f339610916,95.0
2,client,0x51c88822b28dfa006bf38603d74f9911,15.0
2,client,0xd5a9028fddf9a400fd8513edbdc58de0,49.0
2,server,0x3943710e587e3932adda1cad8eaf2aeb,30.0
2,server,0xd67650fd984a48f2070de426e0a942b0,93.0
Load the data: df = pd.read_clipboard(sep=',', index_col=[0,1,2])
Option 1:
df.unstack(level=1).boxplot()
Option 2:
df.unstack(level=[0,1]).boxplot()
Option 3:
Using seaborn:
import seaborn as sns
sns.boxplot(x="qdepth", hue="mode", y="dur", data=df.reset_index())
Update:
To answer your comment, here's a very approximate way (could be used as a starting point) to recreate the seaborn option using only pandas and matplotlib:
import matplotlib.pyplot as plt
from matplotlib.patches import Polygon
fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(12, 6))
#bp = df.unstack(level=[0,1])['dur'].boxplot(ax=ax, return_type='dict')
bp = df.reset_index().boxplot(column='dur', by=['qdepth', 'mode'], ax=ax, return_type='dict')['dur']
# Now fill the boxes with desired colors
boxColors = ['darkkhaki', 'royalblue']
numBoxes = len(bp['boxes'])
for i in range(numBoxes):
    box = bp['boxes'][i]
    # collect the five points that trace the outline of this box
    boxX = []
    boxY = []
    for j in range(5):
        boxX.append(box.get_xdata()[j])
        boxY.append(box.get_ydata()[j])
    boxCoords = list(zip(boxX, boxY))
    # Alternate between Dark Khaki and Royal Blue
    k = i % 2
    boxPolygon = Polygon(boxCoords, facecolor=boxColors[k])
    ax.add_patch(boxPolygon)
plt.show()
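Options 1 and 2 work because unstack turns the values of an index level into columns, which is the layout DataFrame.boxplot expects. A small sketch of what the frame looks like after Option 1's unstack, using four made-up rows (the runid values r1 to r4 are placeholders, not the real IDs):

```python
import pandas as pd

rows = [
    (1, "client", "r1", 7.0),
    (1, "server", "r2", 19.0),
    (2, "client", "r3", 15.0),
    (2, "server", "r4", 30.0),
]
df = pd.DataFrame(rows, columns=["qdepth", "mode", "runid", "dur"])
df = df.set_index(["qdepth", "mode", "runid"])

# Move the "mode" level from the row index into the columns.
wide = df.unstack(level="mode")
print(wide)

# Each run belongs to only one mode, so the other column is NaN in that row.
client = wide[("dur", "client")].dropna().tolist()
print(client)
```

Because every runid appears under only one mode, each row has one value and one NaN.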

matplotlib: varying color of line to capture natural time parameterization in data

I am trying to vary the color of a line plotted from data in two arrays (e.g. ax.plot(x,y)). The color should vary as the index into x and y increases. I am essentially trying to capture the natural 'time' parameterization of the data in arrays x and y.
In a perfect world, I want something like:
fig = pyplot.figure()
ax = fig.add_subplot(111)
x = myXdata
y = myYdata
# length of x and y is 100
ax.plot(x,y,color=[i/100,0,0]) # where i is the index into x (and y)
to produce a line with color varying from black to dark red and on into bright red.
I have seen examples that work well for plotting a function explicitly parameterized by some 'time' array, but I can't get it to work with raw data...
The second example is the one you want... I've edited it to fit your example, but more importantly read my comments to understand what is going on:
import numpy as np
from matplotlib import pyplot as plt
from matplotlib.collections import LineCollection
x = myXdata  # assumed to be 1-D NumPy arrays of equal length
y = myYdata
t = np.linspace(0, 1, x.shape[0])  # your "time" variable
# set up a list of (x,y) points
points = np.array([x, y]).transpose().reshape(-1, 1, 2)
print(points.shape)  # Out: (len(x), 1, 2)
# set up a list of segments
segs = np.concatenate([points[:-1], points[1:]], axis=1)
print(segs.shape)  # Out: (len(x)-1, 2, 2)
# see what we've done here -- we've mapped our (x,y)
# points to an array of segment start/end coordinates.
# segs[i,0,:] == segs[i-1,1,:]
# make the collection of segments
lc = LineCollection(segs, cmap=plt.get_cmap('jet'))
lc.set_array(t)  # color the segments by our parameter
# plot the collection
plt.gca().add_collection(lc)  # add the collection to the plot
plt.xlim(x.min(), x.max())  # line collections don't auto-scale the plot
plt.ylim(y.min(), y.max())
plt.show()
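The segment construction is the step that tends to be hardest to follow, so here it is in isolation on a tiny array. The data (four points on y = x squared) is made up for illustration only.

```python
import numpy as np

# Four sample points; the real data would be the asker's x and y arrays.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = x ** 2

# Shape (N, 1, 2): one (x, y) pair per point.
points = np.column_stack([x, y]).reshape(-1, 1, 2)

# Pair each point with its successor: shape (N-1, 2, 2).
segs = np.concatenate([points[:-1], points[1:]], axis=1)

print(segs.shape)
print(segs[0])  # start and end coordinates of the first segment
```

Each segment ends where the next one begins, which is why the collection drawn from segs looks like one continuous line.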