xlsxwriter modifying worksheet font

I need assistance modifying the font and size of an Excel spreadsheet from the default to Arial 9. I'm able to modify the header row, but I can't get the body of the spreadsheet to do the same. I'm using xlsxwriter; I'm sure it's something simple, I'm just not that familiar with xlsxwriter. See the code below, any assistance greatly appreciated.
# Write the data body starting below the header row
Pipeline_details.to_excel(writer1,
                          sheet_name='Pipeline_Details',
                          startrow=1,
                          startcol=0,
                          header=False,
                          index=False)

workbook = writer1.book
worksheet = writer1.sheets['Pipeline_Details']

(max_row, max_col) = Pipeline_details.shape
worksheet.autofilter(0, 0, max_row, max_col - 1)
worksheet.hide_gridlines(2)

# Format for the header row
header_format = workbook.add_format({
    'font_name': 'Arial',
    'font_size': 9,
    'bold': False,
    'text_wrap': True})

# Write the header row manually with the header format
for col_num, value in enumerate(Pipeline_details.columns.values):
    worksheet.write(0, col_num, value, header_format)

# Format intended for the body (not yet applied to any cells)
cell_format = workbook.add_format({'font_name': 'Arial', 'font_size': 9})

writer1.save()

The docs for formatting while using pandas with xlsxwriter state the following:
XlsxWriter and Pandas provide very little support for formatting the output data from a dataframe apart from default formatting such as the header and index cells and any cells that contain dates or datetimes. In addition it isn’t possible to format any cells that already have a default format applied.
If you require very controlled formatting of the dataframe output then you would probably be better off using Xlsxwriter directly with raw data taken from Pandas.
That being said, it is possible to format any other, non-date/datetime column data using set_column().
For example, using A1 notation, you'd execute this after writing your data to the worksheet:
worksheet.set_column('A:Z', None, format)
I'd try that first to see if you get anywhere with it. Otherwise I'd suggest writing your rows yourself and adding your format that way.
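For instance, a minimal sketch of that approach reusing the names from the question (whether the data cells pick the format up depends on them not already having an explicit format, per the docs quoted above):

# Body format: Arial 9, as already defined in the question
cell_format = workbook.add_format({'font_name': 'Arial', 'font_size': 9})

# Apply it to every data column; the width is left as None to keep the defaults.
# Cells written by pandas without an explicit format pick up the column format.
worksheet.set_column(0, max_col - 1, None, cell_format)

# Save after applying the column format, as in the question
writer1.save()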

Related

Pandas DataFrame column name unusable in Plotly Dash dropdown when the column references another Google Sheets sheet

I am currently busy with a Plotly Dash web application, creating a dropdown that references a column from a pandas DataFrame I read in from a CSV file.
The issue is that it is not able to read the column, and I have seen this is because the column is actually a reference to another sheet, i.e. =RawData!A1.
I have managed to print the column, so I know it exists in the DataFrame and all the data prints correctly; however, Plotly Dash does not want to populate the dropdown with the labels and values. My current line of code is:
options=[{'label': i, 'value': i} for i in df.CategoryName.unique()],
CategoryName in Google Sheets refers to =RawData!A1.
What I have tested:
Amended my sheet to read directly from my RawData sheet and it works fine. This is not the solution I want, though; it led me to see that the issue was with reading from the referenced column.
Attempted using column index instead:
options=[{'label': i, 'value': i} for i in df.iloc[:,1].unique()],
Again, this worked for printing but did not populate the dropdown in Plotly Dash.
Any advice will be greatly appreciated!
Adding some data cleaning in pandas to remove rows at the bottom of my dataset fixed my issue.
I added a NaN removal based on my column CategoryName, and by doing that my dropdown worked:
df = df[df['CategoryName'].notna()]
The reason it worked makes sense: I set up my copy formula as =RawData!A:A, and my RawData at the moment only comprises 123 rows, so by row 125 my reference sheet was pointing at blank cells, causing the dropdown to error on a reference to something that does not exist. A funny but logical error. Not sure if this will help many people, but hopefully it will assist somebody!
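Putting it together, a minimal sketch (the CSV filename is a placeholder; the column name is the one from the question):

import pandas as pd

df = pd.read_csv('data.csv')  # placeholder filename

# Drop rows where the referenced Google Sheets cells were blank
df = df[df['CategoryName'].notna()]

# Build the dropdown options from the cleaned column
options = [{'label': i, 'value': i} for i in df['CategoryName'].unique()]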

How to maintain the MultiIndex view in pandas while converting a pandas DataFrame to a CSV file?

I have a pandas DataFrame with a MultiIndex (pic 1), but when I convert it to CSV it does not show as a MultiIndex (pic 2).
I am using to_csv(). Is there any parameter I need to pass to get the right format?
pic 1:
pic 2:
Tried as per the suggestion; below is the pic.
If you're not bothered about getting a CSV as output, the way I do this is by putting the data in an XLSX file.
# Create the workbook to save the data within
workbook = pd.ExcelWriter(xlsx_filename, engine='xlsxwriter')
# Create sheets in excel for data
df.to_excel(workbook, sheet_name='Sheet1')
# save the changes
workbook.save()
Can you try this and see if it formats how you want?
Maybe this can be helpful for you:
pd.DataFrame(df.columns.tolist()).T.to_csv("dataframe.csv", mode="w", header=False, index=False)
df.to_csv("dataframe.csv", mode="a", header=False, index=False)
I guess you are using an older version of pandas. If you are on a version below 0.21.0, there is a tupleize_cols parameter you can play with. If you are on a newer version, just save with to_csv; by default it writes each column index level on its own row.
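For illustration, a minimal sketch with made-up data showing that a MultiIndex on the columns is written as one header row per level by default:

import pandas as pd

# A small DataFrame with a two-level column MultiIndex
columns = pd.MultiIndex.from_tuples([('A', 'x'), ('A', 'y'), ('B', 'x')])
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=columns)

df.to_csv("dataframe.csv")
# The file should start with two header rows, one per column level:
# ,A,A,B
# ,x,y,x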

Pandas: printing to csv also creates an unlabelled column with row indices

I use the following code to write a DataFrame to a CSV file, based on the answers to this question:
How to avoid Python/Pandas creating an index in a saved csv?
y_insincere = [classify_text(text,trimmed_posterior_dict)<0 for text in X_test]
X_output = pd.DataFrame()
X_output['number'] = number
X_output['CASE'] = case
X_output.to_csv('submission.csv',header=True,columns = ['id','case'],index='False')
However, when I look at the CSV file it has an extra column, without a header, containing the row indices. I tried other fixes from the above question, but nothing worked. I am stuck; any help appreciated.
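One thing worth checking (my reading of the snippet, not confirmed in the original post): index='False' passes a non-empty string rather than the boolean False, and a non-empty string is truthy, so pandas still writes the index column. A minimal sketch with a boolean instead, using the column names actually created above:

X_output.to_csv('submission.csv',
                header=True,
                columns=['number', 'CASE'],  # the columns created above
                index=False)                 # boolean False, not the string 'False'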

Sorting Excel with openpyxl

My English is not good, sorry about that.
I am trying to sort a sheet using openpyxl and Python. I have read the documentation and I don't quite understand this page. I want to sort my Excel file by several columns: first by column A, then by column B, then by column C and last by column F. My problem is how to use add_sort_condition and add_filter_column; just the simple way would be great. An example would help me a lot!
wb_read = openpyxl.load_workbook(filename)
sheet_read = wb_read.get_active_sheet()
sheet_read.auto_filter.ref = 'A3:Q40000'
"""sheet_read.auto_filter.add_filter.column(2,['id'],blank=False) Didn't work
sheet_read.auto_filter.add_sort_condition('A') How and Where to use condition?
sheet_read.auto_filter.add_sort_condition('A3:Q40000') It seems didn't work again"""
wb_read.save('sorted.xlsx')
By the way, if I want to sort, do I have to use auto_filter?
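As far as I know, add_sort_condition only records a sort definition in the saved file; openpyxl does not reorder the cells itself, so the usual approach is to sort the rows in Python and write them back. A minimal sketch under that assumption, taking the headers to end at row 3 and the data to span columns A to Q as in the range above (it also assumes the sort columns contain no blank cells):

import openpyxl

wb = openpyxl.load_workbook(filename)
ws = wb.active

# Read the data rows (row 4 onwards, columns A to Q) as plain values
rows = list(ws.iter_rows(min_row=4, max_col=17, values_only=True))

# Sort by column A, then B, then C, then F (0-based positions 0, 1, 2, 5)
rows.sort(key=lambda r: (r[0], r[1], r[2], r[5]))

# Write the sorted values back into the same cells
for i, row in enumerate(rows, start=4):
    for j, value in enumerate(row, start=1):
        ws.cell(row=i, column=j, value=value)

wb.save('sorted.xlsx')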

grab and filter from more than 255 columns from a huge closed workbook

I have a huge workbook (0.6 million rows) with 315 columns whose column names I need to grab into an array. Due to the huge size, I don't want to open and close the workbook just to copy the first row of the range. Also, I only want to grab the columns from the first row whose names begin with the word "Global ".
Can anyone help with a short code example on how to go about doing this? Please note I have tried ADOX, ADO etc., but both show the 255-column limitation. I also don't want to open the workbook, but pull the required "Global " columns from the 315 columns into an array.
Any help is most appreciated.
You can copy the first row of your target by opening a new workbook and putting this formula in A1:
='C:\PATH_TO_TARGET\[TARGET_FILE_NAME.xlsx]WORKSHEET_NAME'!A1
Note that PATH+FILENAME+WORKSHEET is enclosed in single quotes, the FILENAME is enclosed in square brackets, and an exclamation mark separates the cell reference.
Then copy/paste or fill right to get the next 314 columns. Note: this formula will return zero for empty target cells.
Once you have the column headings you can copy/paste-special-values if you want to destroy the links to the closed workbook.
Hope that helps.
You could use the Python programming language.
While Python does not natively work with XLSX files, you just have to install the openpyxl external module from here: https://pypi.python.org/pypi/openpyxl
(You will also have to install Python, of course; just download it from www.python.org.)
It will make working with your data in an interactive Python session a piece of cake, and the time to open the workbook, without having to load the Excel interface, should be a fraction of what you are expecting. (I think the data will have to fit in your memory, though.)
But this is all I had to type in an interactive Python 2 session to open a workbook and retrieve the column names that start with "bl":
import openpyxl
a = openpyxl.load_workbook("bla.xlsx")
[cell.value for cell in a.worksheets[0].rows[0] if cell.value.startswith("bl")]
output:
Out[8]: [u'bla', u'ble', u'bli', u'blo', u'blu']
The last input line requires one to know Python to be understood, so here is a summary of what happens: Python is a language very fond of working with sequences, and the openpyxl library gives you your workbook as just that:
an object which is a sequence of worksheets, each worksheet having a rows attribute which is a sequence of all the rows in the sheet, and each row being a sequence of cells. Each cell has a value attribute which is the text within it.
The inline for statement is the compact form, but it could be written as a multi-line statement:
In [10]: for cell in a.worksheets[0].rows[0]:
....: if cell.value.startswith("bl"):
....: print cell.value
....:
bla
ble
bli
blo
blu
Keep in mind that by exploring Python a bit deeper, you can programmatically manipulate your data in a way that will be easier than working interactively, given a data set this size. You can even use Python itself to drop selected contents into an SQL database (including its built-in, single-file database, sqlite), where sophisticated indexes and queries can make working with your data a breeze.
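A side note on memory: in newer openpyxl versions, rows is a generator rather than a list, and there is a read-only mode that streams the sheet instead of loading it, which sidesteps the memory concern for a 0.6-million-row file. A minimal sketch under that assumption, with a placeholder filename and the "Global " prefix from the question:

import openpyxl

# read_only=True streams the worksheet instead of loading it into memory
wb = openpyxl.load_workbook("huge_workbook.xlsx", read_only=True)  # placeholder filename
ws = wb.worksheets[0]

# Grab just the first row and keep the headers that start with "Global "
first_row = next(ws.iter_rows(min_row=1, max_row=1, values_only=True))
global_columns = [v for v in first_row if isinstance(v, str) and v.startswith("Global ")]

print(global_columns)
wb.close()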