Plotly scatterplot using pandas groupby for traces - pandas

I run into this pattern quite often. I want my traces to be the results of a groupby operation.
import pandas as pd
data = dict(
    time = [1,1,1,2,2,2,3,3,3],
    satellite_ID = [3,24,9,3,24,9,3,24,9],
    satellite_type = ['gps','glonass','galileo']*3,
    snr = [28,34,26,27,35,25,28,36,24])
df = pd.DataFrame(data)
The x-axis is time, the y-axis is SNR, and each line+marker trace is a unique satellite ID. There should be 3 traces, one per satellite, each with points at times 1, 2, and 3. A nice addition would be to have each satellite_type in a different color and visible on mouse hover.

I think I figured it out from the documentation.
import plotly.express as px
import pandas as pd
data = dict(
    time = [1,1,1,2,2,2,3,3,3],
    satellite_ID = [3,24,9,3,24,9,3,24,9],
    satellite_type = ['gps','glonass','galileo']*3,
    snr = [28,34,26,27,35,25,28,36,24])
df = pd.DataFrame(data)
fig = px.line(df, x="time", y="snr", color='satellite_ID',
              hover_data=['satellite_type'])
fig.update_traces(mode="markers+lines")
"color" selects the traces, and additional hover data can be added using the "hover_data" argument to px.line.

Related

Add a slider to plotly that dynamically changes a column of data frame that is displayed

Minimal working example:
import pandas as pd
import plotly.express as px
A = [10,20,30,40,50,60]
B = [40,50,60,10,20,30]
data = pd.DataFrame({"A":A,"B":B})
alpha=0.5
data["Parameter"]= alpha*data["A"] +(1-alpha)*data["B"]
fig = px.scatter(
data, x="A",y="B",color="Parameter"
)
fig.show()
I would like to have a slider for alpha in the Plotly graph. I looked at the documentation but only found a slider for a fixed column with constant values.
fig = px.scatter(data, x="A", y="B", color="Parameter", animation_frame='Parameter')
fig["layout"].pop("updatemenus")
fig.update_xaxes(range=[0, 100])
fig.show('browser')

Pandas Plotly -- Secondary graph needs to be RSI

In Plotly, I am able to make this graph (picture attached at the end) with the code below.
(The data is one year of stock market data in CSV format. Please use any OHLC data with about 200 to 300 rows.)
import pandas as pd
import ta
import plotly.graph_objects as go
df = pd.read_csv("Trial.csv")
df["rsi"] = ta.momentum.rsi(df["Close"], window=14, fillna=False)
dfff = df.tail(180)
layoutt = go.Layout(autosize=False, width=4181, height=1597)
fig_001 = go.Figure(data=[go.Candlestick(x=dfff['Date'], open=dfff['Open'], high=dfff['High'], low=dfff['Low'], close=dfff['Close'])], layout=layoutt)
fig_001.write_image("fig_001.jpeg")
As you can see in the attached picture below, Plotly generates two charts by default (with a smaller, duplicated chart below the main one)...
How can I change the secondary graph, outlined in green, to an RSI graph? It currently shows the same candlestick data as the chart outlined in red.
Plotly is not generating two charts. It is one chart with a range slider, which in interactive mode can be used to zoom along the x-axis. In the code below I have:
hidden the range slider
created an additional trace for the RSI and set it to use a second y-axis
configured the y-axes to use domains, so it has the same visual effect
import pandas as pd
import ta
import plotly.graph_objects as go
# df = pd.read_csv("Trial.csv")
# use plotly OHLC sample data
df = pd.read_csv(
"https://raw.githubusercontent.com/plotly/datasets/master/finance-charts-apple.csv"
)
df = df.rename(columns={c: c.split(".")[1] for c in df.columns if "." in c})
df["rsi"] = ta.momentum.rsi(df["Close"], window=14, fillna=False)
dfff = df.tail(180)
# The question used a fixed size: go.Layout(autosize=False, width=4181, height=1597).
# autosize=True makes the figure fit on my screen instead.
layoutt = go.Layout(autosize=True)
fig_001 = go.Figure(
    data=[
        go.Candlestick(
            x=dfff["Date"],
            open=dfff["Open"],
            high=dfff["High"],
            low=dfff["Low"],
            close=dfff["Close"],
            name="OHLC",
        ),
        # RSI trace drawn against the second y-axis
        go.Scatter(
            x=dfff["Date"], y=dfff["rsi"], mode="markers+lines", name="RSI", yaxis="y2"
        ),
    ],
    layout=layoutt,
).update_layout(
    yaxis_domain=[0.3, 1],         # candlesticks use the top 70% of the figure
    yaxis2={"domain": [0, 0.20]},  # RSI uses the bottom 20%
    xaxis_rangeslider_visible=False,
    showlegend=False,
)
fig_001

How to plot frequency distribution graph using Matplotlib?

I am using a data frame with two columns: screen and its frequency. I am trying to find the relationship between a screen and how often it appears. Now I want to see the frequencies for all screens in a summary graph. Imagine putting all of those frequencies into an array and wanting to study the distribution of that array. Below is the code that I have tried so far:
data = pd.read_csv('frequency_list.csv')
new_columns = data.columns.values
new_columns[1] = 'frequency'
data.columns = new_columns
import matplotlib.pyplot as plt
%matplotlib inline
dataset = data.head(10)
dataset.plot(x = "screen", y = "frequency", kind = "bar")
plt.show()
col_one_list = unpickled_df['screen'].tolist()
col_one_arr = unpickled_df['screen'].head(10).to_numpy()
plt.hist(col_one_arr) #gives you a histogram of your array 'a'
plt.show() #finishes out the plot
Below is the screenshot of my data frame containing screen as one column and frequency as another. Can you help me to find out a way to plot a frequency distribution graph? Thanks in advance.
Will a bar plot work? Here's an example:
import pandas as pd
import matplotlib.pyplot as plt
freq = [102,98,56,117]
screen = ['A','B','C','D']
df = pd.DataFrame(list(zip(screen, freq)), columns=['screen', 'freq'])
plt.bar(df.screen,df.freq)
plt.xlabel('x')
plt.ylabel('count')
plt.show()
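If the goal is the distribution of the frequency values themselves rather than one bar per screen, a histogram of the frequency column may be closer to what the question describes. The sketch below uses made-up data as a stand-in for frequency_list.csv, and the bin count of 5 is an arbitrary assumption.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for frequency_list.csv (values invented for illustration).
df = pd.DataFrame(
    {
        "screen": list("ABCDEFGH"),
        "frequency": [102, 98, 56, 117, 12, 15, 101, 60],
    }
)

fig, ax = plt.subplots()
# Each bar counts how many screens fall into that range of frequencies.
counts, bin_edges, _ = ax.hist(df["frequency"], bins=5, edgecolor="black")
ax.set_xlabel("frequency")
ax.set_ylabel("number of screens")
ax.set_title("Distribution of screen frequencies")
# plt.show()
```

With real data, `df["frequency"].plot.hist()` from pandas is an equivalent shortcut.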

Python pie chart / Show several columns combined

I have dataframe with 2 columns:
Col1- managers' name
Col2 - their profit
I want to plot a pie chart that shows the 5 most profitable managers separately and the rest combined in one slice labeled 'others'.
How about this, with automatic labeling of the pie pieces using the autopct argument:
import pandas as pd
import matplotlib.pyplot as plt
data = {'managers': ['mike1','mike2','mike3','mike4','mike5','mike6','mike7'],
        'profit': [110,60,40,30,10,5,5],
        }
df = pd.DataFrame(data)
df = df.sort_values(by='profit', ascending=False)
top_5 = df.iloc[:5]
# sum the profit of everyone outside the top 5
others = df.iloc[5:]['profit'].sum()
df2 = pd.DataFrame([['others', others]], columns=['managers', 'profit'])
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
all_data = pd.concat([top_5, df2], ignore_index=True)
all_data.index = all_data['managers']
# function to label the pieces with the rounded percentage
def auto_func(val):
    return round(val)
all_data.plot.pie(y='profit', autopct=auto_func)
# ax = plt.gca()
plt.show()

Time series plot of categorical or binary variables in pandas or matplotlib

I have data that represent a time series of categorical variables. I want to display the transitions in categories below a traditional line plot of related continuous time series to show off context as time evolves. I'd like to know the best way to do this. My attempt was in terms of Rectangles. The appearance is a bit weird, and importantly the axis labels for the x axis don't render as dates.
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib as mpl
import numpy as np
from pandas.plotting import register_matplotlib_converters
import matplotlib.dates as mdates
register_matplotlib_converters()
t0 = pd.DatetimeIndex(["2017-06-01 00:00","2017-06-17 00:00","2017-07-03 00:00","2017-08-02 00:00","2017-08-09 00:00","2017-09-01 00:00"])
t1 = pd.DatetimeIndex(["2017-06-01 00:00","2017-08-15 00:00","2017-09-01 00:00"])
df0 = pd.DataFrame({"cat":[0,2,1,2,0,1]},index = t0)
df1 = pd.DataFrame({"op":[0,1,0]},index=t1)
# Create new plot
fig,ax = plt.subplots(1,figsize=(8,3))
data_layout = {
"cat" : {0: ('bisque','Low'),
1: ('lightseagreen','Medium'),
2: ('rebeccapurple','High')},
"op" : {0: ('darkturquoise','Open'),
1: ('tomato','Close')}
}
vars =("cat","op")
dfs = [df0,df1]
all_ticks = []
leg = []
for j,(v,d) in enumerate(zip(vars,dfs)):
    dvals = d[v][:].astype("d")
    normal = mpl.colors.Normalize(vmin=0, vmax=2.)
    # as_matrix() was removed from pandas; to_numpy() replaces it
    colors = plt.cm.Set1(0.75*normal(dvals.to_numpy()))
    handles = []
    # one rectangle per interval between consecutive timestamps
    for i in range(len(d)-1):
        s = d[v].index.to_pydatetime()
        level = d[v].iloc[i]
        base = d[v].index[i]
        w = s[i+1] - s[i]
        patch=mpl.patches.Rectangle((base,float(j)),width=w,color=data_layout[v][level][0],height=1,fill=True)
        ax.add_patch(patch)
    # one legend entry per category level
    for lev in data_layout[v]:
        print(data_layout[v][lev])
        handles.append(mpl.patches.Patch(color=data_layout[v][lev][0],label=data_layout[v][lev][1]))
    all_ticks.append(j+0.5)
    leg.append( plt.legend(handles=handles,loc = (3-3*j+1)))
plt.axhline(y=1.,linewidth=3,color="gray")
plt.xlim(pd.Timestamp(2017,6,1).to_pydatetime(),pd.Timestamp(2017,9,1).to_pydatetime())
plt.ylim(0,2)
ax.add_artist(leg[0]) # two legends on one axis
ax.format_xdata = mdates.DateFormatter('%Y-%m-%d') # This fails
plt.yticks(all_ticks,vars)
plt.show()
This produces a plot with no dates on the x-axis and jittery lines. How do I fix this? Is there a better way entirely?
Here is a way to display dates on the x-axis. In your code, replace the line that fails with this one:
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m-%d'))
I don't remember what the result should look like, though. Can you show us the end result again?
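As for the "better way" part of the question, one alternative to hand-built Rectangle patches is Matplotlib's broken_barh, which draws a row of horizontal spans from (start, width) pairs. The sketch below is only one possible approach, not the asker's or answerer's method. It reuses the data from the question, converts the timestamps with mdates.date2num, and assumes each value holds until the next timestamp, so the last timestamp only closes the final interval. The single combined legend is a simplification chosen for this example.

```python
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from matplotlib.patches import Patch

t0 = pd.DatetimeIndex(["2017-06-01", "2017-06-17", "2017-07-03",
                       "2017-08-02", "2017-08-09", "2017-09-01"])
t1 = pd.DatetimeIndex(["2017-06-01", "2017-08-15", "2017-09-01"])
series = {
    "cat": pd.Series([0, 2, 1, 2, 0, 1], index=t0),
    "op": pd.Series([0, 1, 0], index=t1),
}
data_layout = {
    "cat": {0: ("bisque", "Low"), 1: ("lightseagreen", "Medium"), 2: ("rebeccapurple", "High")},
    "op": {0: ("darkturquoise", "Open"), 1: ("tomato", "Close")},
}

fig, ax = plt.subplots(figsize=(8, 3))
handles = []
for row, (name, s) in enumerate(series.items()):
    # Timestamps as Matplotlib date numbers (floats, in days).
    starts = mdates.date2num(s.index.to_pydatetime())
    # (start, width) for every interval between consecutive timestamps.
    xranges = list(zip(starts[:-1], starts[1:] - starts[:-1]))
    colors = [data_layout[name][int(level)][0] for level in s.iloc[:-1]]
    ax.broken_barh(xranges, (row, 1), facecolors=colors)
    handles += [Patch(color=c, label=f"{name}: {lab}") for c, lab in data_layout[name].values()]

ax.axhline(y=1.0, linewidth=3, color="gray")
ax.set_ylim(0, 2)
ax.set_yticks([0.5, 1.5])
ax.set_yticklabels(list(series))
ax.xaxis_date()  # treat x values as dates
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m-%d"))
ax.legend(handles=handles, loc="upper left", bbox_to_anchor=(1.01, 1))
fig.autofmt_xdate()
fig.tight_layout()
# plt.show()
```

Because the spans are computed from exact start and width values, adjacent bars share their edges, which may help with the jittery boundaries mentioned in the question.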