pandas xlsxwriter stacked barchart - pandas

I am trying to add a grouped bar chart to an Excel file, but I can't seem to find a way to do so.
Here is my code:
bar_chart2 = workbook.add_chart({'type':'column'})
bar_chart2.add_series({
    'name':'Month over month product',
    'categories':'=Month over month!$H$2:$H$6',
    'values':'=Month over month!$I$2:$J$6',
})
bar_chart2.set_legend({'none': True})
worksheet5.insert_chart('F8',bar_chart2)
However, I get that.

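Using your data, I reworked the column chart example from the XlsxWriter docs by jmcnamara to suit what you're looking for. The main change is one add_series() call per data column: each series should reference a single column of values, so a two-column range such as $I$2:$J$6 cannot be plotted as one series.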
Full Code:
import pandas as pd
import xlsxwriter
headings = [' ', 'Apr 2017', 'May 2017']
data = [
    ['NGN', 'UGX', 'KES', 'TZS', 'CNY'],
    [5816, 1121, 115, 146, 1],
    [7089, 1095, 226, 120, 0],
]
# opening workbook
workbook = xlsxwriter.Workbook("test.xlsx")
worksheet5 = workbook.add_worksheet('Month over month')
worksheet5.write_row('H1', headings)
worksheet5.write_column('H2', data[0])
worksheet5.write_column('I2', data[1])
worksheet5.write_column('J2', data[2])
# beginning of OP snippet
bar_chart2 = workbook.add_chart({'type': 'column'})
bar_chart2.add_series({
    'name': "='Month over month'!$I$1",
    'categories': "='Month over month'!$H$2:$H$6",
    'values': "='Month over month'!$I$2:$I$6",
})
bar_chart2.add_series({
    'name': "='Month over month'!$J$1",
    'categories': "='Month over month'!$H$2:$H$6",
    'values': "='Month over month'!$J$2:$J$6",
})
bar_chart2.set_title({'name': 'Month over month product'})
bar_chart2.set_style(11)
# I took the liberty of leaving the legend in there - it was commented in originally
# bar_chart2.set_legend({'none': True})
# end of OP snippet
worksheet5.insert_chart('F8', bar_chart2)
workbook.close()
Output:
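Since the question is tagged pandas, here is a minimal sketch of the same chart driven from a DataFrame through pd.ExcelWriter. The DataFrame layout, the temporary output path and the loop over columns are my own assumptions, not part of the original answer. It uses XlsxWriter's list syntax for ranges, so the sheet name does not need manual quoting.

```python
import os
import tempfile

import pandas as pd

# Same numbers as the example above; the DataFrame layout is an assumption.
df = pd.DataFrame(
    {"Apr 2017": [5816, 1121, 115, 146, 1],
     "May 2017": [7089, 1095, 226, 120, 0]},
    index=["NGN", "UGX", "KES", "TZS", "CNY"],
)

sheet = "Month over month"
path = os.path.join(tempfile.mkdtemp(), "grouped_chart.xlsx")  # hypothetical output path

with pd.ExcelWriter(path, engine="xlsxwriter") as writer:
    # The index lands in column A, the two months in columns B and C.
    df.to_excel(writer, sheet_name=sheet)
    workbook = writer.book
    worksheet = writer.sheets[sheet]

    # Plain 'column' gives grouped bars; add 'subtype': 'stacked' to stack them.
    chart = workbook.add_chart({"type": "column"})
    last_row = len(df)  # header is row 0, data occupies rows 1..len(df)
    for col in range(1, len(df.columns) + 1):
        # One series per data column, each pointing at a single column of values.
        chart.add_series({
            "name": [sheet, 0, col],
            "categories": [sheet, 1, 0, last_row, 0],
            "values": [sheet, 1, col, last_row, col],
        })
    chart.set_title({"name": "Month over month product"})
    worksheet.insert_chart("F8", chart)
```

If you want the stacked variant mentioned in the title, changing the chart options to {'type': 'column', 'subtype': 'stacked'} should be the only edit needed.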

Related

Format several worksheets with xlsxwriter

I have been struggling with this problem for days: I have a large script in which a function exports tables to an Excel workbook, each table to a different worksheet. Additionally, I want to format the worksheets using the xlsxwriter engine. I use the context manager form, with pd.ExcelWriter() as writer.
This works fine for an Excel workbook with a single sheet: the table is exported with to_excel and I immediately format it with an ad hoc function I created.
Code structure:
Global Excel_formatting function that gives format (input: table, sheet name, text strings)
Script function_tables function (input: dataframe, pathfile) that creates subtables from input dataframe, and uses instance pd.ExcelWriter to:
-- export the Excel worksheets
-- call Excel_formatting function to format the worksheets
At the top level, the script calls function_tables.
See below the complete code:
# Global function to format the output tables
def Excel_formatting(table_input, sheet_name_input, title_in, remark_in, start_row_input):
    # Assign WorkBook and worksheet
    workbook = writer.book
    worksheet = writer.sheets[sheet_name_input]
    start_column = 0
    # Title and remark
    worksheet.write(0, start_column, title_in,
                    workbook.add_format({'bold': True,
                                         'color': '#8B0000',
                                         'size': 18,
                                         'align':'left'}))
    worksheet.write(1, start_column+1, remark_in,
                    workbook.add_format({'italic': True,
                                         'size': 11,
                                         'align':'left'}))
    # Format header (on top of existing header)
    header_format = workbook.add_format({'bold': True,
                                         'text_wrap': False,
                                         'fg_color': '#FF8B8B',
                                         'border': 1,
                                         'align':'center'})
    for col_num, value in enumerate(table_input.columns.values):
        worksheet.write(start_row_input, col_num, value, header_format)
    # Freeze panes / Can also be done with to_excel
    worksheet.freeze_panes(start_row_input+1, 0)
    # Set column width
    end_column = len(table_input.columns)
    worksheet.autofit()
    # Add autofilter to header
    worksheet.autofilter(start_row_input, 0, start_row_input, end_column-1)
    # Add logo (if present, to avoid script error)
    figure_path = 'Logo.JPG'
    if (os.path.exists(figure_path) == True):
        worksheet.insert_image(0, start_column+5, figure_path, {'x_scale': 0.1, 'y_scale': 0.08, 'decorative': True})
    # End of function
    return workbook.close()
def function_tables(x, filename):
    # Here the function creates subtables from input dataframe
    df = x
    Table_1 = df.groupby(['Feature 1'])['Deviation'].sum().reset_index()
    Table_2 = df.groupby(['Feature 2'])['Deviation'].sum().reset_index()
    # ...
    Table_N = df.groupby(['Feature N'])['Deviation'].sum().reset_index()
    # Export tables adding new sheets to the same Excel workbook
    with pd.ExcelWriter(filename, engine='xlsxwriter', mode='w') as writer:
        start_row = 2
        Table_1.to_excel(writer, index=True, header=True, sheet_name='Overview Feat.1', startrow=start_row)
        Table_2.to_excel(writer, index=True, header=True, sheet_name='Overview Feat.2', startrow=start_row)
        # ...
        Table_N.to_excel(writer, index=True, header=True, sheet_name='Overview Feat.N', startrow=start_row)
        # Formatting the worksheets calling the global function
        title_input_1 = 'Title for overview table 1'
        remark_input_1 = 'Remark Table 1'
        Excel_formatting(Table_2, 'Overview Feat.2', title_input_1, remark_input_1, start_row)
        title_input_2 = 'Title for overview table 2'
        remark_input_2 = 'Remark Table 2'
        # ...
        Excel_formatting(Table_2, 'Overview Feat.N', title_input_2, remark_input_2, start_row)
        title_input_N = 'Title for overview table N'
        remark_input_N = 'Remark Table N'
        Excel_formatting(Table_1, 'Overview Feat.N', title_input_N, remark_input_N, start_row)
# Call section of script
function_tables(df_input, Path_filename)
I also tried openpyxl, looping through the tables using a dictionary for the input, and defining the formatting function inside the writer block instead of globally, but all of these failed with the same error:
worksheet = writer.sheets[sheet_name_input]
KeyError: 'Overview Feat.1'
It looks like it cannot find the sheet name. Any help? A poor man's alternative would be to create N Excel workbooks and then merge them all, but I would prefer not to. There must be a more Pythonic way to do this, right?
A million thanks!
There are a few issues in the code: the writer object needs to be passed to the Excel_formatting() function, the writer shouldn't be closed in that function, and there are some typos in the titles, captions and variable names.
Here is a working example with those issues fixed. I've added sample data frames; you can replace them with your groupby() code.
import pandas as pd
import os
# Global function to format the output tables
def Excel_formatting(table_input, writer, sheet_name_input, title_in, remark_in, start_row_input):
    # Assign WorkBook and worksheet
    workbook = writer.book
    worksheet = writer.sheets[sheet_name_input]
    start_column = 0
    # Title and remark
    worksheet.write(0, start_column, title_in,
                    workbook.add_format({'bold': True,
                                         'color': '#8B0000',
                                         'size': 18,
                                         'align': 'left'}))
    worksheet.write(1, start_column + 1, remark_in,
                    workbook.add_format({'italic': True,
                                         'size': 11,
                                         'align': 'left'}))
    # Format header (on top of existing header)
    header_format = workbook.add_format({'bold': True,
                                         'text_wrap': False,
                                         'fg_color': '#FF8B8B',
                                         'border': 1,
                                         'align': 'center'})
    for col_num, value in enumerate(table_input.columns.values):
        worksheet.write(start_row_input, col_num, value, header_format)
    # Freeze panes / Can also be done with to_excel
    worksheet.freeze_panes(start_row_input + 1, 0)
    # Set column width
    end_column = len(table_input.columns)
    worksheet.autofit()
    # Add autofilter to header
    worksheet.autofilter(start_row_input, 0, start_row_input, end_column - 1)
    # Add logo (if present, to avoid script error)
    figure_path = 'Logo.JPG'
    if os.path.exists(figure_path):
        worksheet.insert_image(0, start_column + 5, figure_path, {'x_scale': 0.1, 'y_scale': 0.08, 'decorative': True})
def function_tables(x, filename):
    Table_1 = pd.DataFrame({'Data': [11, 12, 13, 14]})
    Table_2 = pd.DataFrame({'Data': [11, 12, 13, 14]})
    # ...
    Table_N = pd.DataFrame({'Data': [11, 12, 13, 14]})
    # Export tables adding new sheets to the same Excel workbook
    with pd.ExcelWriter(filename, engine='xlsxwriter', mode='w') as writer:
        start_row = 2
        Table_1.to_excel(writer, index=True, header=True, sheet_name='Overview Feat.1', startrow=start_row)
        Table_2.to_excel(writer, index=True, header=True, sheet_name='Overview Feat.2', startrow=start_row)
        # ...
        Table_N.to_excel(writer, index=True, header=True, sheet_name='Overview Feat.N', startrow=start_row)
        # Formatting the worksheets calling the global function
        title_input_1 = 'Title for overview table 1'
        remark_input_1 = 'Remark Table 1'
        Excel_formatting(Table_1, writer, 'Overview Feat.1', title_input_1, remark_input_1, start_row)
        title_input_2 = 'Title for overview table 2'
        remark_input_2 = 'Remark Table 2'
        Excel_formatting(Table_2, writer, 'Overview Feat.2', title_input_2, remark_input_2, start_row)
        title_input_N = 'Title for overview table N'
        remark_input_N = 'Remark Table N'
        Excel_formatting(Table_N, writer, 'Overview Feat.N', title_input_N, remark_input_N, start_row)
# Call section of script
function_tables(None, 'test.xlsx')
Output:
However, to make it more generic it would be best to handle the main function in a loop like this:
def function_tables(x, filename):
    writer = pd.ExcelWriter(filename, engine='xlsxwriter')
    Table_1 = pd.DataFrame({'Data': [11, 12, 13, 14]})
    Table_2 = pd.DataFrame({'Data': [11, 12, 13, 14]})
    # ...
    Table_N = pd.DataFrame({'Data': [11, 12, 13, 14]})
    # In a real case you would probably append() these in a loop.
    dfs = [Table_1, Table_2, Table_N]
    for i, df in enumerate(dfs, 1):
        start_row = 2
        df.to_excel(writer, index=True, header=True, sheet_name=f'Overview Feat.{i}', startrow=start_row)
        # Formatting the worksheets calling the global function
        title_input = f'Title for overview table {i}'
        remark_input = f'Remark Table {i}'
        Excel_formatting(df, writer, f'Overview Feat.{i}', title_input, remark_input, start_row)
    writer.close()
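As a side note on the KeyError itself: writer.sheets only knows about sheets that have already been written through that particular writer. Here is a small sketch that shows this, assuming a temporary output path and a guard message of my own choosing (neither is from the original code).

```python
import os
import tempfile

import pandas as pd

path = os.path.join(tempfile.mkdtemp(), "sheets_demo.xlsx")  # hypothetical path

with pd.ExcelWriter(path, engine="xlsxwriter") as writer:
    before = list(writer.sheets)  # no worksheets registered yet
    pd.DataFrame({"Data": [11, 12]}).to_excel(writer, sheet_name="Overview Feat.1")
    after = list(writer.sheets)   # the sheet now exists on this writer

    # Guarding the lookup gives a clearer message than a bare KeyError.
    wanted = "Overview Feat.2"
    if wanted not in writer.sheets:
        message = f"{wanted!r} has not been written to this writer yet"
```

So looking up a sheet name on a different writer object, or before the matching to_excel() call, is expected to fail in the way the question describes.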

GroupBy Function Not Applying

I am trying to groupby for the following specializations but I am not getting the expected result (or any for that matter). The data stays ungrouped even after this step. Any idea what's wrong in my code?
cols_specials = ['Enterprise ID','Specialization','Specialization Branches','Specialization Type']
specials = pd.read_csv(agg_specials, engine='python')
specials = specials.merge(roster, left_on='Enterprise ID', right_on='Enterprise ID', how='left')
specials = specials[cols_specials]
specials = specials.groupby(['Enterprise ID'])['Specialization'].transform(lambda x: '; '.join(str(x)))
specials.to_csv(end_report_specials, index=False, encoding='utf-8-sig')
Please try using agg:
import pandas as pd
df = pd.DataFrame(
    [
        ['john', 'eng', 'build'],
        ['john', 'math', 'build'],
        ['kevin', 'math', 'asp'],
        ['nick', 'sci', 'spi']
    ],
    columns=['id', 'spec', 'type']
)
df.groupby(['id'])[['spec']].agg(lambda x: ';'.join(x))
results in:
If you need to keep the original number of rows, use transform instead. It returns one value per input row, so the result can be assigned as a new column:
df['spec_grouped'] = df.groupby(['id'])[['spec']].transform(lambda x: ';'.join(x))
df
results in:
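To make the difference concrete, here is a small self-contained sketch comparing the two calls, plus what the original '; '.join(str(x)) does. The variable names are my own, and the data is the toy frame from this answer rather than the asker's CSV.

```python
import pandas as pd

df = pd.DataFrame({
    "id": ["john", "john", "kevin", "nick"],
    "spec": ["eng", "math", "math", "sci"],
})

# agg collapses each group to one row:
# john -> 'eng; math', kevin -> 'math', nick -> 'sci'
per_group = df.groupby("id")["spec"].agg("; ".join)

# transform keeps one value per original row, repeating the group's result.
per_row = df.groupby("id")["spec"].transform(lambda s: "; ".join(s))

# str(series) is the printed representation of the whole Series, so joining it
# inserts '; ' between every single character instead of between the values.
broken = "; ".join(str(df["spec"]))
```

Note also that transform on a single column returns only that column, so the other columns selected in the question are lost when the result is assigned back to specials.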

adding href link to a pandas data frame

I have sample dataframe
Date Announcement href
Apr 9, 2020 Hello World https://helloworld.com/
data = {'Date': ['c' , 'Apr 8,2010'], 'Announcement': ['Hello World A', 'Hello World B'], 'href': ['https://helloworld.com', 'https://helloworldb.com']}
df = pd.DataFrame(data, columns=['Date', 'Announcement', 'href'])
df.to_excel("announce.xls", engine='xlsxwriter')
I am trying to figure out how to get the following output in the xls file: each cell in the Announcement column should link to the corresponding href.
Date Announcement
Apr 9, 2020 Hello World
https://helloworld.com/
Updated to embed the URL in the cell. The trick is to use the *.xlsx format, as opposed to the 1997 *.xls format:
import pandas as pd
data = {
    'Date': ['c' , 'Apr 8,2010'],
    'Announcement': [
        '=HYPERLINK("http://helloworld.com", "Hello World A")',
        '=HYPERLINK("http://helloworldb.com", "Hello World B")',
    ],
}
df = pd.DataFrame(data, columns=['Date', 'Announcement'])
df.to_excel('announce.xlsx')
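If you would rather keep the href column in the DataFrame and not build formula strings by hand, one alternative is XlsxWriter's write_url(), which stores a real hyperlink with display text. This is only a sketch: the date strings, the sheet name and the temporary path are assumptions of mine.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({
    "Date": ["Apr 9, 2020", "Apr 8, 2010"],  # assumed date strings
    "Announcement": ["Hello World A", "Hello World B"],
    "href": ["https://helloworld.com", "https://helloworldb.com"],
})

path = os.path.join(tempfile.mkdtemp(), "announce.xlsx")  # hypothetical path

with pd.ExcelWriter(path, engine="xlsxwriter") as writer:
    # Write only the visible columns; href is used for the link targets below.
    df[["Date", "Announcement"]].to_excel(writer, sheet_name="Sheet1", index=False)
    worksheet = writer.sheets["Sheet1"]
    # Row 0 holds the header, so data row i is worksheet row i + 1;
    # Announcement is column 1.
    for i, (text, url) in enumerate(zip(df["Announcement"], df["href"])):
        worksheet.write_url(i + 1, 1, url, string=text)
```

The loop overwrites the plain-text Announcement cells that to_excel() wrote, so the cell text stays the same and only the link is added.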

Overwrite one sheet by another Python

I need to write specific data in two excel sheets, the first sheet will be filled by the first and last date, while the second sheet will contain the time difference between two specific rows(only calculate the time difference when df.iloc[i, 1] == '[1]->[0]' and df.iloc[i + 1, 1] == '[0]->[1]').
This is my code:
import xlsxwriter
import pandas as pd
df = pd.DataFrame({'Time': ['2019/01/03 15:02:07', '2019/01/03 15:16:55', '2019/01/03 15:17:20', '2019/01/03 15:28:58', '2019/01/03 15:32:28', '2019/01/03 15:38:54'],
                   'Payload': ['[0]->[1]', '[1]->[0]', '[0]->[1]', '[0]->[1]', '[1]->[0]', '[0]->[1]']})
workbook = xlsxwriter.Workbook('Results.xlsx')
ws = workbook.add_worksheet("Rapport détaillé")
# wsNavco = workbook.add_worksheet("Délai reconnexion NAVCO")
ws.set_column(0, 1, 30)
ws.set_column(1, 2, 25)
# Add a format. Light yellow fill with dark red text.
format1 = workbook.add_format({'bg_color': '#fffcad',
                               'font_color': '#0a0a0a'})
# Add a format. Green fill with dark green text.
format2 = workbook.add_format({'bg_color': '#e7fabe',
                               'font_color': '#0a0a0a'})
# Write a conditional format over a range.
ws.conditional_format('A1:A24', {'type': 'cell',
                                 'criteria': '>=',
                                 'value': 50,
                                 'format': format1})
ws.conditional_format('B1:B24', {'type': 'cell',
                                 'criteria': '>=',
                                 'value': 50,
                                 'format': format2})
parametres = (
    ['Parametres', 'Valeurs'],
    ['1ere date ', str(df['Time'].iloc[0])],
    ['Derniere date ', str(df['Time'].iloc[len(df)-1])],
)
# Start from the first cell. Rows and
# columns are zero indexed.
row = 0
col = 0
# Iterate over the data and write it out row by row.
for name, parametres in (parametres):
    ws.write(row, col, name)
    ws.write(row, col + 1, parametres)
    row += 1
workbook.close()
df= df.sort_values(by='Time')
df.Time = pd.to_datetime(df.Time, format='%Y/%m/%d %H:%M:%S')
print('df\n',df)
diff = []
for i in range(len(df) - 1):
    if df.iloc[i, 1] == '[1]->[0]' and df.iloc[i + 1, 1] == '[0]->[1]':
        time_diff = df.iloc[i + 1, 0] - df.iloc[i, 0]
    else:
        time_diff = 0
    diff.append(time_diff)
diff.append(0) # to fill the last value
df['Difference'] = diff
print(df['Difference'])
print('df1\n',df)
workbook = xlsxwriter.Workbook('Results.xlsx')
wsNavco = workbook.add_worksheet('Délai reconnexion NAVCO')
# wsNavco = wb.worksheets[1]
wsNavco.set_column(0, 1, 25)
wsNavco.set_column(1, 2, 55)
# Add a format. Light yellow fill with dark red text.
format1 = workbook.add_format({'bg_color': '#fffcad',
                               'font_color': '#0a0a0a'})
# Add a format. Green fill with dark green text.
format2 = workbook.add_format({'bg_color': '#e7fabe',
                               'font_color': '#0a0a0a'})
# Write a conditional format over a range.
wsNavco.conditional_format('A1:A24', {'type': 'cell',
                                      'criteria': '>=',
                                      'value': 50,
                                      'format': format1})
wsNavco.conditional_format('B1:B24', {'type': 'cell',
                                      'criteria': '>=',
                                      'value': 50,
                                      'format': format2})
for i in range (len(df)-1):
    wsNavco.set_column(1, 1, 15)
    wsNavco.write('A'+str(3),'Payload')
    wsNavco.write('A'+str(i+4), str((df.iloc[i,1])))
    wsNavco.write('B'+str(3),'Délai reconnexion NAVCO')
    wsNavco.write('B'+str(i+4), str((df.iloc[i,2])))
workbook.close()
The problem is that it creates, names and fills the first sheet, but then the second sheet overwrites it.
My question is: how can I save both sheets?
You cannot append to an existing workbook with XlsxWriter; you need to perform all the writes before closing the workbook. In your case this should be fine: just remove the lines that close and reopen the workbook between 'Rapport détaillé' and 'Délai reconnexion NAVCO'.
If you prepare your data as DataFrames beforehand, it becomes very simple.
import pandas as pd
# Create some Pandas dataframes from some data.
df1 = pd.DataFrame({'Data': [11, 12, 13, 14]})
df2 = pd.DataFrame({'Data': [21, 22, 23, 24]})
df3 = pd.DataFrame({'Data': [31, 32, 33, 34]})
# Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter('pandas_multiple.xlsx', engine='xlsxwriter')
workbook = writer.book
# Write each dataframe to a different worksheet.
df1.to_excel(writer, sheet_name='Sheet1')
df2.to_excel(writer, sheet_name='Sheet2')
df3.to_excel(writer, sheet_name='Sheet3')
# Define formats.
format1 = workbook.add_format({'bg_color': '#fffcad',
                               'font_color': '#0a0a0a'})
format2 = workbook.add_format({'bg_color': '#e7fabe',
                               'font_color': '#0a0a0a'})
# Format worksheets.
worksheet = writer.sheets['Sheet1']
worksheet.conditional_format('A1:A24', {'type': 'cell',
                                        'criteria': '>=',
                                        'value': 50,
                                        'format': format1})
worksheet = writer.sheets['Sheet2']
worksheet.conditional_format('B1:B24', {'type': 'cell',
                                        'criteria': '>=',
                                        'value': 50,
                                        'format': format2})
# Close the Pandas Excel writer and output the Excel file.
# (writer.save() was removed in pandas 2.0; close() works across versions.)
writer.close()
There are alternative engines like Openpyxl that support appending. See this answer for details.
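Applied to the data in the question, the "prepare DataFrames first, then write everything in one go" approach could look roughly like the sketch below. The vectorised shift() comparison, the column name 'Difference (s)', the summary frame and the temporary path are my own choices, not the asker's code.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({
    "Time": ["2019/01/03 15:02:07", "2019/01/03 15:16:55", "2019/01/03 15:17:20",
             "2019/01/03 15:28:58", "2019/01/03 15:32:28", "2019/01/03 15:38:54"],
    "Payload": ["[0]->[1]", "[1]->[0]", "[0]->[1]", "[0]->[1]", "[1]->[0]", "[0]->[1]"],
})
df["Time"] = pd.to_datetime(df["Time"], format="%Y/%m/%d %H:%M:%S")
df = df.sort_values("Time").reset_index(drop=True)

# A delay counts only when a '[1]->[0]' row is directly followed by a '[0]->[1]' row.
is_gap = df["Payload"].eq("[1]->[0]") & df["Payload"].shift(-1).eq("[0]->[1]")
seconds = (df["Time"].shift(-1) - df["Time"]).dt.total_seconds()
df["Difference (s)"] = seconds.where(is_gap, 0.0)

summary = pd.DataFrame({
    "Parametres": ["1ere date", "Derniere date"],
    "Valeurs": [str(df["Time"].iloc[0]), str(df["Time"].iloc[-1])],
})

path = os.path.join(tempfile.mkdtemp(), "Results.xlsx")  # hypothetical path

# Both sheets go through the same writer, so neither overwrites the other.
with pd.ExcelWriter(path, engine="xlsxwriter") as writer:
    summary.to_excel(writer, sheet_name="Rapport détaillé", index=False)
    df[["Payload", "Difference (s)"]].to_excel(
        writer, sheet_name="Délai reconnexion NAVCO", index=False)
    sheet_names = list(writer.sheets)
```

Formatting calls such as set_column() or conditional_format() can then be added inside the same with block via writer.sheets[...], before the writer closes.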

missing data in pandas profiling report

I am using Python 2.7 and Pandas Profiling to generate a report out of a dataframe. Following is my code:
import pandas as pd
import pandas_profiling
# the actual dataset is very large, just providing the two elements of the list
data = [{'polarity': 0.0, 'name': u'danesh bhopi', 'sentiment': 'Neutral', 'tweet_id': 1049952424818020353, 'original_tweet_id': 1049952424818020353, 'created_at': Timestamp('2018-10-10 14:18:59'), 'tweet_text': u"Wouldn't mind aus 120 all-out but before that would like to see a Finch \U0001f4af #PakVAus #AUSvPAK", 'source': u'Twitter for Android', 'location': u'pune', 'retweet_count': 0, 'geo': '', 'favorite_count': 0, 'screen_name': u'DaneshBhope'}, {'polarity': 1.0, 'name': u'kamal Kishor parihar', 'sentiment': 'Positive', 'tweet_id': 1049952403980775425, 'original_tweet_id': 1049952403980775425, 'created_at': Timestamp('2018-10-10 14:18:54'), 'tweet_text': u'#the_summer_game What you and Australia think\nPlay for\n win \nDraw\n or....! #PakvAus', 'source': u'Twitter for Android', 'location': u'chembur Mumbai ', 'retweet_count': 0, 'geo': '', 'favorite_count': 0, 'screen_name': u'kaluparihar1'}]
df = pd.DataFrame(data) #data is a python list containing python dictionaries
pfr = pandas_profiling.ProfileReport(df)
pfr.to_file("df_report.html")
The screenshot of the part of the df_report.html file is below:
As you can see in the image, the Unique(%) field in all the variables is 0.0 although the columns have unique values.
Apart from this, the chart in the 'location' variable is broken. There is no bar for the values 22, 15, 4 and the only bar is for the maximum value only. This is happening in all the variables.
Any help would be appreciated.
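One way to cross-check the numbers in the report, independent of pandas_profiling, is to compute the distinct counts directly with pandas. This is only a sketch for comparison, using three fields from the two sample records above; it does not explain why the report shows 0.0.

```python
import pandas as pd

# Three fields taken from the two sample records in the question.
df = pd.DataFrame({
    "location": ["pune", "chembur Mumbai "],
    "source": ["Twitter for Android", "Twitter for Android"],
    "retweet_count": [0, 0],
})

distinct = df.nunique()                   # number of distinct values per column
distinct_pct = distinct / len(df) * 100   # share of distinct values in percent
location_counts = df["location"].value_counts()  # data behind a per-value bar chart
```

If these figures look right on the full dataset while the report still shows 0.0, that points at the report generation rather than at the data.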