How to change labels of curves in plotly.express.bar - plotly-python

I have the following figure:
I need to change the labels in the legend.
I read the documentation for plotly.express.bar and I think color_discrete_map is relevant, but I don't know how to apply it.
As you can see, there are 6 colors in the figure and I want to assign a text to each color.
def update_figure_profits(df0,
                          label,
                          xlabel=None,
                          ylabel=None,
                          title=None,
                          percent_fmt=False):
    c = ['#636EFA', '#EF553B', '#00CC96',
         '#AB63FA', '#FFA15A', '#19D3F3'] * 3
    df = df0[[label, 'name']].copy()
    fig = px.bar(df,
                 x=range(len(df)),
                 y=label,
                 color=c,
                 )
    fig.update_layout(xaxis=dict(
        tickmode='array',
        tickvals=list(range(18)),
        ticktext=['', '', 'کامل', '', '', ''] +
                 ['', '', 'متغیر', '', '', ''] +
                 ['', '', 'فرامتغیر', '', '', '']
    ))
    fig.update_layout(xaxis_title=xlabel,
                      yaxis_title=ylabel)
    fig.update_layout(title_text=title, title_x=0.5)
    return fig
and the data look like this:
id name cost sell profit profit_margin efficiency_on_fixed_asset costing_type
0 1000 سنگِ آهن 4.794900e+06 13759200 8.964300e+06 0.651513 0.664022 full
1 2000 کنسانتره 1.155342e+07 22464000 1.091058e+07 0.485692 0.808191 full
2 3000 گندله 1.470069e+07 32993999 1.829331e+07 0.554443 1.355060 full
3 4000 آهنِ اسفنجی 2.669225e+07 70200000 4.350775e+07 0.619768 3.222796 full
4 5000 شمش 5.577847e+07 140400000 8.462153e+07 0.602717 6.268262 full
5 6000 نوردِ گرم 6.269642e+07 155984400 9.328798e+07 0.598060 6.910220 full
6 1000 سنگِ آهن 3.863700e+06 13759200 9.895500e+06 0.719192 0.733000 variable
7 2000 کنسانتره 8.976960e+06 22464000 1.348704e+07 0.600385 0.999040 variable
8 3000 گندله 1.097908e+07 32993999 2.201492e+07 0.667240 1.630735 variable
9 4000 آهنِ اسفنجی 1.932668e+07 70200000 5.087332e+07 0.724691 3.768394 variable
10 5000 شمش 3.871631e+07 140400000 1.016837e+08 0.724243 7.532125 variable
11 6000 نوردِ گرم 4.260937e+07 155984400 1.133750e+08 0.726836 8.398151 variable
12 1000 سنگِ آهن 1.123200e+06 13759200 1.263600e+07 0.918367 0.936000 supervariable
13 2000 کنسانتره 4.164883e+06 22464000 1.829912e+07 0.814597 1.355490 supervariable
14 3000 گندله 5.850284e+06 32993999 2.714371e+07 0.822686 2.010646 supervariable
15 4000 آهنِ اسفنجی 1.052448e+07 70200000 5.967552e+07 0.850079 4.420409 supervariable
16 5000 شمش 2.459279e+07 140400000 1.158072e+08 0.824838 8.578312 supervariable
17 6000 نوردِ گرم 2.535339e+07 155984400 1.306310e+08 0.837462 9.676371 supervariable
Thanks for any guidance.

Sorry, I didn't read the whole code, but what you've done in the second line of the function body is wrong: the color argument of the bar function doesn't take the colors themselves, as you've written.
See the definition from the plotly website:
color (str or int or Series or array-like) – Either a name of a column in data_frame, or a pandas Series or array_like object. Values from this column or array_like are used to assign color to marks.
So in your case, color would be "costing_type" or "name".


DataFrame % column by grouping

I am working on a forecast accuracy report which measures the deviation between the actual and the previous projection. The measure would be = 1 - ('Actual' - 'M-1') / 'Actual'.
The measure needs to be grouped at different granularities, say 'Product Category' / 'Line' / 'Product'. However, the df.groupby('Product Category').sum() function can't support the percentage calculation. Does anyone have an idea how it should be fixed? Thanks!
data = {
    "Product Category": ['Drink', 'Drink', 'Drink', 'Food', 'Food', 'Food'],
    "Line": ['Water', 'Water', 'Wine', 'Fruit', 'Fruit', 'Fruit'],
    "Product": ['A', 'B', 'C', 'D', 'E', 'F'],
    "Actual": [100, 50, 40, 20, 70, 50],
    "M-1": [120, 40, 10, 20, 80, 50],
}
df = pd.DataFrame(data)
df['M1 Gap'] = df['Actual'] - df['M-1']
df['Error_Per'] = 1 - df['M1 Gap'] / df['Actual']
The expected output would be these measures aggregated at each grouping level (see the attached screenshot).
You can also create a custom function and apply it to every row of a pandas data frame as follows. Just note that I set the axis argument to 1 so that the custom function is applied to each row, i.e. across columns:
import pandas as pd

def func(row):
    row['M1 Gap'] = row['Actual'] - row['M-1']
    row['Error_Per'] = 1 - (row['M1 Gap'] / row['Actual'])
    return row

df.groupby('Product Category').sum().apply(func, axis=1)
Actual M-1 M1 Gap Error_Per
Product Category
Drink 190.0 170.0 20.0 0.894737
Food 140.0 150.0 -10.0 1.071429
You should group BEFORE calculating the percentage:
data = {
    "Product Category": ['Drink', 'Drink', 'Drink', 'Food', 'Food', 'Food'],
    "Line": ['Water', 'Water', 'Wine', 'Fruit', 'Fruit', 'Fruit'],
    "Product": ['A', 'B', 'C', 'D', 'E', 'F'],
    "Actual": [100, 50, 40, 20, 70, 50],
    "M-1": [120, 40, 10, 20, 80, 50],
}
df = pd.DataFrame(data)
df['M1 Gap'] = df['Actual'] - df['M-1']

df_line = df.groupby('Line').sum()
df_line['Error_Per'] = df_line['M1 Gap'] / df_line['Actual']
print(df_line)

df_prod = df.groupby('Product Category').sum()
df_prod['Error_Per'] = df_prod['M1 Gap'] / df_prod['Actual']
print(df_prod)
Output:
Actual M-1 M1 Gap Error_Per
Line
Fruit 140 150 -10 -0.071429
Water 150 160 -10 -0.066667
Wine 40 10 30 0.750000
Actual M-1 M1 Gap Error_Per
Product Category
Drink 190 170 20 0.105263
Food 140 150 -10 -0.071429
Note: your expected outcome from the screenshot doesn't match the dictionary in your code (which is what I used).
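The same group-then-divide idea can be written as one chain (a sketch using the dictionary from the question and the gap/actual ratio from the answer above):

```python
import pandas as pd

data = {
    "Product Category": ['Drink', 'Drink', 'Drink', 'Food', 'Food', 'Food'],
    "Actual": [100, 50, 40, 20, 70, 50],
    "M-1": [120, 40, 10, 20, 80, 50],
}
df = pd.DataFrame(data)

# Sum first, then derive the percentage from the grouped totals
out = (
    df.groupby('Product Category')[['Actual', 'M-1']].sum()
      .assign(**{'M1 Gap': lambda d: d['Actual'] - d['M-1']})
      .assign(Error_Per=lambda d: d['M1 Gap'] / d['Actual'])
)
```

Because each assign works on the already-grouped totals, the ratio is computed from sums instead of summing ratios.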

Using a for loop to create a new column in pandas dataframe

I have been trying to create a web crawler to scrape data from a website called Baseball Reference. When defining my crawler I realized that the different players have a unique id at the end of their URL containing the first 6 letters of their last name, three zeroes, and the first 3 letters of their first name.
I have a pandas dataframe already containing columns 'first' and 'last' with each player's first and last names, along with a lot of other data that I downloaded from this same website.
My def for my crawler function is as follows so far:
def bbref_crawler(ID):
    url = 'https://www.baseball-reference.com/register/player.fcgi?id=' + str(ID)
    source_code = requests.get(url)
    page_soup = soup(source_code.text, features='lxml')
And the code that I have so far trying to obtain the player id's is as follows:
for x in nwl_offense:
    while len(nwl_offense['last']) > 6:
        id_last = len(nwl_offense['last']) - 1
    while len(nwl_offense['first']) > 3:
        id_first = len(nwl_offense['first']) - 1
    nwl_offense['player_id'] = (str(id_first) + '000' + str(id_last))
When I run the for/while loop it just never stops running, and I am not sure how else to achieve my goal of automating the player id into another column of the dataframe, so I can easily use the crawler to obtain more information on the players that I need for a project.
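Note that len(nwl_offense['last']) is the number of rows in that column, not the length of any single name, so the while conditions never change and the loop runs forever. If the id pattern described above did hold (real ids may have quirks), a vectorized sketch on a hypothetical two-row frame would be:

```python
import pandas as pd

# Hypothetical sample with the 'first'/'last' columns described above
nwl_offense = pd.DataFrame({'first': ['Brian', 'Drew'],
                            'last': ['Baker', 'Beazley']})

# id = first 6 letters of the last name (padded with '-'), then '000',
# then the first 3 letters of the first name
nwl_offense['player_id'] = (
    nwl_offense['last'].str.lower().str[:6].str.ljust(6, '-')
    + '000'
    + nwl_offense['first'].str.lower().str[:3]
)
```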
This is what the first 5 rows of the dataframe, nwl_offense look like:
print(nwl_offense.head())
Rk Name Age G ... WRC+ WRC WSB OWins
0 1.0 Brian Baker 20.0 14.0 ... 733.107636 2.007068 0.099775 0.189913
1 2.0 Drew Beazley 21.0 46.0 ... 112.669541 29.920766 -0.456988 2.655892
2 3.0 Jarrett Bickel 21.0 33.0 ... 85.017293 15.245547 1.419822 1.502232
3 4.0 Nate Boyle 23.0 21.0 ... 1127.591556 1.543534 0.000000 0.139136
4 5.0 Seth Brewer* 22.0 12.0 ... 243.655365 1.667671 0.099775 0.159319
As stated in the comments, I wouldn't try to create a function to make the ids, as there will likely be some "quirky" ones in there that might not follow that logic.
If you just go through each letter section the site divides the search by, you can get the id directly from each player's url.
import pandas as pd
import requests
from bs4 import BeautifulSoup

url = 'https://www.baseball-reference.com/register/player.fcgi'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

player_register_search = {}
searchLinks = soup.find('div', {'id': 'div_players'}).find_all('li')
for each in searchLinks:
    links = each.find_all('a', href=True)
    for link in links:
        print(link)
        player_register_search[link.text] = 'https://www.baseball-reference.com/' + link['href']

tot = len(player_register_search)
playerIds = {}
for count, (k, link) in enumerate(player_register_search.items(), start=1):
    print(f'{count} of {tot} - {link}')
    response = requests.get(link)
    soup = BeautifulSoup(response.text, 'html.parser')
    kLower = k.lower()
    playerSection = soup.find('div', {'id': f'all_players_{kLower}'})
    h2 = playerSection.find('h2').text
    #print('\t', h2)
    player_links = playerSection.find_all('a', href=True)
    for player in player_links:
        playerName = player.text.strip()
        playerId = player['href'].split('id=')[-1].strip()
        if playerName not in playerIds.keys():
            playerIds[playerName] = []
        #print(f'\t{playerName}: {playerId}')
        playerIds[playerName].append(playerId)

df = pd.DataFrame({'Player': list(playerIds.keys()),
                   'id': list(playerIds.values())})
Output:
print(df)
Player id
0 Scott A'Hara [ahara-000sco]
1 A'Heasy [ahease001---]
2 Al Aaberg [aaberg001alf]
3 Kirk Aadland [aadlan001kir]
4 Zach Aaker [aaker-000zac]
... ...
323628 Mike Zywica [zywica001mic]
323629 Joseph Zywiciel [zywici000jos]
323630 Bobby Zywicki [zywick000bob]
323631 Brandon Zywicki [zywick000bra]
323632 Nate Zyzda [zyzda-000nat]
[323633 rows x 2 columns]
TO GET JUST THE PLAYERS FROM YOUR DATAFRAME:
THIS IS JUST AN EXAMPLE OF YOUR DATAFRAME. DO NOT INCLUDE THIS IN YOUR CODE
# Sample of the dataframe (using the 'Name' column referenced below)
nwl_offense = pd.DataFrame({'Name': ['Evan Albrecht', 'Kelby Golladay']})
Use this:
# YOUR DATAFRAME - GET LIST OF NAMES
player_interest_list = list(nwl_offense['Name'])
nwl_players = df.loc[df['Player'].isin(player_interest_list)]
Output:
print(nwl_players)
Player id
3095 Evan Albrecht [albrec001eva, albrec000eva]
108083 Kelby Golladay [gollad000kel]

convert TB to GB on specific index

I wrote the code below. I wanted to strip the TB and GB suffixes from the columns and keep a single integer; for example, if a cell has 2 TB, this code deletes the TB and keeps it as 2. The program works fine. What I now want to do is to convert 2 TB to 2048 GB so that I can sum all the column values. Is there any way to remove the TB and do the calculation on the specific row at the same time?
def removeend():
    df = pd.read_csv('ExportList.csv')
    if df["Used Space"].str.contains("GB | TB").any() or df["Memory Size"].str.contains("GB | TB").any() or df["Host CPU"].str.contains("Hz|MHz|GHz").any():
        df['Used Space'] = df['Used Space'].str.replace(r'GB|TB', '', regex=True)
        df["Memory Size"] = df["Memory Size"].str.replace(r'GB|TB', '', regex=True)
        df['Host CPU'] = df['Host CPU'].str.replace(r'MHz|Hz|GHz', '', regex=True)
        df = df.convert_dtypes()
        df["Used Space"] = pd.to_numeric(df["Used Space"])
        df["Memory Size"] = pd.to_numeric(df["Memory Size"])
        df["Host CPU"] = pd.to_numeric(df["Host CPU"])
    else:
        print("Error occurred!!!")
    return df
Define/create a custom function:
def converter(x):
    try:
        return pd.eval(x)
    except Exception:
        return x
Finally:
cols = ["Used Space", "Memory Size"]
df[cols] = df[cols].replace({'GB': '', 'TB': '*1024'}, regex=True).applymap(converter)
df["Host CPU"] = df["Host CPU"].replace({'MHz': '', 'GHz': '*0.001', 'Hz': '*0.000001'}, regex=True).map(converter)
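For instance, on a small made-up frame, the replace-then-eval trick turns '2 TB' into 2048 so the column sums in GB (Series.map is used here instead of the deprecated DataFrame.applymap, with the same effect):

```python
import pandas as pd

def converter(x):
    try:
        return pd.eval(x)  # '2 *1024' -> 2048, '500 ' -> 500
    except Exception:
        return x

# Made-up sample data
df = pd.DataFrame({'Used Space': ['2 TB', '500 GB'],
                   'Memory Size': ['1 TB', '64 GB']})

cols = ['Used Space', 'Memory Size']
df[cols] = (df[cols]
            .replace({'GB': '', 'TB': '*1024'}, regex=True)
            .apply(lambda col: col.map(converter)))

total_gb = df['Used Space'].sum()  # 2048 + 500
```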

how to change dataframe in apply functions pandas

I want to use apply to dynamically modify the content of my dataframe, the table is like:
index price signal stoploss
0 0 1000 True 990.0
1 1 1010 False 990.0
2 2 1020 True 1010.0
3 3 1000 False 1010.0
4 4 990 False 1010.0
5 5 980 False 1010.0
6 6 1000 False 1010.0
7 7 1020 True 1010.0
8 8 1030 False 1010.0
9 9 1040 False 1010.0
my code is :
def test(row, dd):
    if row.signal:
        dd['inorder'] = True
        row['stoploss'] = 1

df = pd.DataFrame({'index': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
                   'price': [1000, 1010, 1020, 1000, 990, 980, 1000, 1020, 1030, 1040],
                   'signal': [True, False, True, False, False, False, False, True, False, False]})

if __name__ == '__main__':
    df['stoploss'] = df.loc[df['signal'], 'price'] - 10
    df['stoploss'].ffill(inplace=True)
    xx = dict(inorder=False)
    df.apply(lambda row: test(row, xx), axis=1)
    print(df)
print(df)
When I trace into the function test, I can see that the value is indeed changed to 1, but outside the scope of test it seems to have no effect on the dataframe.
I tried another way to modify the content of the dataframe:
for k, row in df.iterrows():
    if row.signal:
        xx['inorder'] = True
        df.loc[k, 'stoploss'] = 1
this one works, but obviously it's a lot slower than apply.
The correct result I expect is :
index price signal stoploss
0 0 1000 True 1.0
1 1 1010 False 990.0
2 2 1020 True 1.0
3 3 1000 False 1010.0
4 4 990 False 1010.0
5 5 980 False 1010.0
6 6 1000 False 1010.0
7 7 1020 True 1.0
8 8 1030 False 1010.0
9 9 1040 False 1010.0
How to achieve that assignment in apply please?
Thanks
If you look at the docs for apply, you'll notice that apply does not change the DataFrame in place, but rather returns a new dataframe where the function has been applied.
So, in your second-to-last line, you can try
df = df.apply(lambda row: test(row, xx), axis=1)
Edit:
IMO, this isn't very well documented, but the call
df.apply(func, axis=1) will apply func to each row and, in the returned dataframe, set each row to the return value of func.
As written, your example won't work because the function you're applying doesn't return anything. The following minimal example works the way you intend.
df = pd.DataFrame({'index': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
                   'price': [1000, 1010, 1020, 1000, 990, 980, 1000, 1020, 1030, 1040],
                   'signal': [True, False, True, False, False, False, False, True, False, False]})
df['stoploss'] = df.loc[df['signal'], 'price'] - 10
df['stoploss'].ffill(inplace=True)

def test(row):
    row.stoploss = 1 if row.signal else row.stoploss
    return row

modified_df = df.apply(test, axis=1)
As an aside, I don't think you actually need to use apply to get the result you want. Have you tried something like
df.loc[df['signal'] == True, 'stoploss'] = 1
That would be a much simpler and faster way to get your target output.
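Indeed, on the frame from the question, the single vectorized assignment reproduces the expected output (a quick sketch):

```python
import pandas as pd

df = pd.DataFrame({'price': [1000, 1010, 1020, 1000, 990, 980, 1000, 1020, 1030, 1040],
                   'signal': [True, False, True, False, False, False, False, True, False, False]})

# Initial stoploss: price - 10 at signal rows, forward-filled elsewhere
df['stoploss'] = df.loc[df['signal'], 'price'] - 10
df['stoploss'] = df['stoploss'].ffill()

# One vectorized assignment instead of apply/iterrows
df.loc[df['signal'], 'stoploss'] = 1
```

Rows where signal is True get stoploss 1.0; all other rows keep the forward-filled value, matching the expected table above.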

Matplotlib table with double headers

Hi, is it possible to make a matplotlib table have a "double header" like this
(mind the dashed lines)?
----------------------------------------
| Feb Total | YTD Total |
----------------------------------------
| 2014|2015 | 2014/2015| 2015/2016 |
--------------------------------------------------
|VVI-ID | 12 | 20 | 188 | 169 |
--------------------------------------------------
|TDI-ID | 34 | 45 | 556 | 456 |
You can do this by using additional tables with no data as headers. That is, you create empty tables whose column labels will serve as the headers for your table. Consider this demo example. First, add the tables header_0 and header_1. Second, adjust the headers' and the table's bbox argument to position all the tables correctly. Since the tables overlap, the table with the data should be the last one.
import numpy as np
import matplotlib.pyplot as plt

data = [[ 66386, 174296,  75131, 577908,  32015],
        [ 58230, 381139,  78045,  99308, 160454],
        [ 89135,  80552, 152558, 497981, 603535],
        [ 78415,  81858, 150656, 193263,  69638],
        [139361, 331509, 343164, 781380,  52269]]
columns = ('Freeze', 'Wind', 'Flood', 'Quake', 'Hail')
rows = ['%d year' % x for x in (100, 50, 20, 10, 5)]
values = np.arange(0, 2500, 500)
value_increment = 1000

# Get some pastel shades for the colors
colors = plt.cm.BuPu(np.linspace(0, 0.5, len(rows)))
n_rows = len(data)
index = np.arange(len(columns)) + 0.3
bar_width = 0.4

# Initialize the vertical-offset for the stacked bar chart.
y_offset = np.array([0.0] * len(columns))

# Plot bars and create text labels for the table
cell_text = []
for row in range(n_rows):
    plt.bar(index, data[row], bar_width, bottom=y_offset, color=colors[row])
    y_offset = y_offset + data[row]
    cell_text.append(['%1.1f' % (x / 1000.0) for x in y_offset])

# Reverse colors and text labels to display the last value at the top.
colors = colors[::-1]
cell_text.reverse()

# Add headers and a table at the bottom of the axes
header_0 = plt.table(cellText=[[''] * 2],
                     colLabels=['Extra header 1', 'Extra header 2'],
                     loc='bottom',
                     bbox=[0, -0.1, 0.8, 0.1]
                     )
header_1 = plt.table(cellText=[['']],
                     colLabels=['Just Hail'],
                     loc='bottom',
                     bbox=[0.8, -0.1, 0.2, 0.1]
                     )
the_table = plt.table(cellText=cell_text,
                      rowLabels=rows,
                      rowColours=colors,
                      colLabels=columns,
                      loc='bottom',
                      bbox=[0, -0.35, 1.0, 0.3]
                      )

# Adjust layout to make room for the table:
plt.subplots_adjust(left=0.2, bottom=-0.2)
plt.ylabel("Loss in ${0}'s".format(value_increment))
plt.yticks(values * value_increment, ['%d' % val for val in values])
plt.xticks([])
plt.title('Loss by Disaster')
plt.show()
If the extra header is symmetric, i.e. it spans an equal number of "normal" header columns, all you need to do is add one extra header table and correct the bbox of the data table, like this (the same example with a deleted column):
header = plt.table(cellText=[[''] * 2],
                   colLabels=['Extra header 1', 'Extra header 2'],
                   loc='bottom'
                   )
the_table = plt.table(cellText=cell_text,
                      rowLabels=rows,
                      rowColours=colors,
                      colLabels=columns,
                      loc='bottom',
                      bbox=[0, -0.35, 1.0, 0.3]
                      )