How to set spacing in html export of a pandas dataframe? - pandas

I'm trying to modify this answer and get more spacing between columns.
import pandas as pd
df = pd.DataFrame({'A': [1,10],
'B': ['B','BBBBBB'],
'C': [0,1000],
'D': ['D','DDDDDD']})
#https://stackoverflow.com/a/5667535/3014199
spacing = dict(selector="table",props=[('border-collapse', 'separate'),
('border-spacing', '100px 500px')])
# Style
result=df.style.set_properties(subset=df.columns[[0,2]], **{'text-align':'right'})\
.set_properties(subset=df.columns[[1,3]], **{'text-align':'left'})\
.set_table_styles([spacing])
print(result.render(),file=open('test.html','w'))
But despite ridiculous values, the columns don't seem any further apart.
Adding e.g. 'padding-right':'10px' in set_properties seems to work, but I want to do things right.
(Also, how can I suppress the index? With .to_html it was index=False, but where do I put it here?)

You have to drop selector="table" to assign properties to the <table></table> element itself.
With selector="table" the rule targets a table nested inside the table (<table><table></table></table>), so it matches nothing here.
You can use
result.render(head=[])
to skip the headers, but there is still a <thead>, which shifts the other elements when you use 'border-spacing'.
Using
dict(selector="thead", props=[('display', 'none')])
you can hide the <thead>.
You can also skip head=[]: the headers then stay in the file, but you will not see them.
import pandas as pd
import webbrowser
df = pd.DataFrame({
    'A': [1, 10],
    'B': ['B', 'BBBBBB'],
    'C': [0, 1000],
    'D': ['D', 'DDDDDD'],
})
styles = [
    dict(
        props=[
            ('border-collapse', 'separate'),
            ('border-spacing', '10px 50px'),
        ]
    ),
    dict(
        selector="thead",
        props=[('display', 'none')]
    ),
]
result = df.style.set_properties(subset=df.columns[[0, 2]], **{'text-align': 'right'})\
    .set_properties(subset=df.columns[[1, 3]], **{'text-align': 'left'})\
    .set_table_styles(styles)
with open('test.html', 'w') as f:
    f.write(result.render(head=[]))
webbrowser.open('test.html')
BTW: I checked the source code: Styler.render() uses the template html.tpl, while DataFrame.to_html() uses a much more complex method to render HTML (i.e. it uses the class HTMLFormatter).

I prefer to use Bootstrap for the spacing:
import pandas as pd

# Creates the dataframe
df = pd.DataFrame(myData)
# Renders the dataframe as an HTML table with two Bootstrap classes ("table" and "table-hover") assigned.
tables = [df.to_html(classes='table table-hover', header=True)]
In this example, I use the Bootstrap classes "table" and "table-hover".
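The Bootstrap classes only take effect if the page that displays the table also loads the Bootstrap stylesheet. A minimal sketch of a standalone page, where the CDN URL and Bootstrap version are assumptions (use whatever your site already loads) and the small sample frame stands in for myData:

```python
import pandas as pd

# Hypothetical sample data standing in for myData.
df = pd.DataFrame({"A": [1, 10], "B": ["B", "BBBBBB"]})

# classes= appends these to the default "dataframe" class on <table>;
# index=False drops the index column.
table_html = df.to_html(classes="table table-hover", index=False)

# Assumed CDN location of the Bootstrap 5 stylesheet.
BOOTSTRAP_CSS = "https://cdn.jsdelivr.net/npm/bootstrap@5.3.3/dist/css/bootstrap.min.css"

page = f"""<!doctype html>
<html>
<head>
<link rel="stylesheet" href="{BOOTSTRAP_CSS}">
</head>
<body>
{table_html}
</body>
</html>
"""

with open("table.html", "w") as f:
    f.write(page)
```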

Related

How to change pandas display font of index column

import pandas as pd
data = {
'X': [3, 2, 0, 1],
'Y': [0, 3, 7, 2]
}
df = pd.DataFrame(data, index=['A', 'B', 'C', 'D'])
df.style.set_properties(**{
'font-family':'Courier New'
})
df
The index column is displayed in bold. Is it possible to change the font of the index column?
You have to use table styles. In this example I set "font-weight": "normal" for both the index and the column headers:
Let's define some test data:
import pandas as pd
df = pd.DataFrame({'A':[1,2,3,4],
'B':[5,4,3,1]})
We define style customization to use:
styles = [
dict(selector="th", props=[("font-weight","normal"),
("text-align", "center")])]
We pass the style variable as the argument for set_table_styles():
html = (df.style.set_table_styles(styles))
html
See the pandas Styling documentation for more details.

Pandas exporting and importing different dictionaries

I have two functions to export dicts to CSV and import them back, because I have many dictionaries of different kinds.
def export_dict(dict, filename):
    df = pd.DataFrame.from_dict(dict, orient="index")
    df.to_csv(CSV_PATH + filename)

def import_dict(filename):
    df = pd.read_csv(filename, header=None, keep_default_na=False)
    d = df.to_dict('split')
    d = dict(d["data"])
    return d
The dicts look like this:
d1 = {'123456': 8, '654321': 90, '123654': 157483}
d2 = {'ouwejfw': [4], 'uwlfhsn0': [3, 1, 89], 'afwefwe': [3, 4], 'a3rz0dsd': []}
So basically, the first case is a simple one, where a string is the key (but it could be an int) and a number is the value.
The second case is where a string is the key and a list of varying size is the value.
I can write and read the first one without problems, but the second one breaks everything because of the differently sized lists. I cannot shorten them, and I cannot afford the space of adding columns, because I have many millions of entries.
Can somebody help me read and write both dicts correctly?
Thank you!
You can treat each dictionary value as a single item when storing it in the DataFrame:
import pandas as pd

def export_dict(d, filename):
    # One row per key; each value (a number or a whole list) is stored as one cell.
    df = pd.DataFrame({"keys": list(d.keys()), "values": list(d.values())})
    # CSV_PATH is the directory prefix from the question.
    df.to_csv(CSV_PATH + filename)

d2 = {'ouwejfw': [4], 'uwlfhsn0': [3, 1, 89], 'afwefwe': [3, 4], 'a3rz0dsd': []}
export_dict(d2, "your_filename.csv")
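The answer only covers writing. When the file is read back, each list arrives as a string such as "[3, 1, 89]", so it has to be parsed again. A minimal sketch of a round trip, assuming the values are Python literals (numbers or lists of numbers) and that the keys should stay strings. It uses ast.literal_eval as a read_csv converter and writes to an in-memory buffer instead of CSV_PATH so that it is self-contained:

```python
import ast
import io

import pandas as pd


def export_dict(d, path_or_buf=None):
    # One row per key; each value (a number or a whole list) is a single cell.
    df = pd.DataFrame({"keys": list(d.keys()), "values": list(d.values())})
    # With path_or_buf=None, to_csv returns the CSV text instead of writing a file.
    return df.to_csv(path_or_buf, index=False)


def import_dict(path_or_buf):
    df = pd.read_csv(
        path_or_buf,
        dtype={"keys": str},                      # keep '123456' as a string key
        converters={"values": ast.literal_eval},  # "[3, 1, 89]" -> [3, 1, 89]
    )
    return dict(zip(df["keys"], df["values"]))


d1 = {'123456': 8, '654321': 90, '123654': 157483}
d2 = {'ouwejfw': [4], 'uwlfhsn0': [3, 1, 89], 'afwefwe': [3, 4], 'a3rz0dsd': []}

restored1 = import_dict(io.StringIO(export_dict(d1)))
restored2 = import_dict(io.StringIO(export_dict(d2)))
```

Because each list stays in one cell, the file needs no extra columns for the longer lists; the cost is parsing each cell on the way back in.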

why pandas dataframe style lost when saved with "to_excel"?

According to this example, the to_excel method should save the Excel file with the background color. However, my saved Excel file does not have any color in it.
I tried to write using both openpyxl and xlsxwriter engines. In both cases, the Excel file was saved, but the cell color/style was lost.
I can read the file back and reformat with openpyxl, but if this to_excel method is supposed to work, why doesn't it?
Here is the sample code.
import pandas as pd # version 0.24.2
dict = {'A': [1, 1, 1, 1, 1], 'B':[2, 1, 2, 1, 2], 'C':[1, 2, 1, 2, 1]}
df = pd.DataFrame(dict)
df_styled = df.style.apply(lambda x: ["background: #ffa31a" if x.iloc[0] < v else " " for v in x], axis=1)
df_styled
# In my Jupyter notebook, this displays the dataframe with the background color where the condition is met (all the 2s are highlighted).
# Save the styled dataframe to Excel using to_excel
df_styled.to_excel('example_file_openpyxl.xlsx', engine='openpyxl')
df_styled.to_excel('example_file_xlsxwriter.xlsx', engine='xlsxwriter')
I stumbled across this myself, and as far as I'm aware, exporting styles written like this to Excel isn't supported yet. I've adjusted your code to match the to_excel example in the documentation.
This is the documentation's to_excel example:
df.style.\
applymap(color_negative_red).\
apply(highlight_max).\
to_excel('styled.xlsx', engine='openpyxl')
This is your code adjusted:
import numpy as np
import pandas as pd

data = {'A': [1, 1, 1, 1, 1], 'B': [2, 1, 2, 1, 2], 'C': [1, 2, 1, 2, 1]}
df = pd.DataFrame(data)

def highlight(df, color="yellow"):
    attr = 'background-color: {}'.format(color)
    # True where a cell is greater than the first value in its row
    df_bool = pd.DataFrame(df.apply(lambda x: [True if x.iloc[0] < v else False for v in x], axis=1).apply(pd.Series),
                           index=df.index)
    df_bool.columns = df.columns
    # CSS string for highlighted cells, empty string (no style) elsewhere
    return pd.DataFrame(np.where(df_bool, attr, ""),
                        index=df.index, columns=df.columns)

df.style.\
    apply(highlight, axis=None).\
    to_excel("styled.xlsx", engine="openpyxl")
Inside the highlight function, I create a boolean DataFrame from the condition in the list comprehension. Then I assign the styling based on that DataFrame: the background-color CSS string where it is True and an empty string elsewhere.

pandas: .equals should evaluate to True?

I have 2 pandas DataFrames that appear to be exactly the same. However, when I test using the .equals method I get False. Any idea what the potential inconsistency may be? Is there something I'm not checking?
print(df1.values.tolist()==df2.values.tolist())
print(df1.columns.tolist()==df2.columns.tolist())
print(df1.index.tolist()==df2.index.tolist())
print(df1.equals(df2))
# True
# True
# True
# False
One possibility is different dtypes that compare as equal in Python space, e.g.
df1 = pd.DataFrame({'a': [1, 2.0, 3]})
df2 = pd.DataFrame({'a': [1,2,3]})
df1.values.tolist() == df2.values.tolist()
Out[45]: True
df1.equals(df2)
Out[46]: False
To chase this down, you can use the assert_frame_equal function.
from pandas.testing import assert_frame_equal
assert_frame_equal(df1, df2)
AssertionError: Attributes are different
Attribute "dtype" are different
[left]: float64
[right]: int64
In versions of pandas before 0.20.1, the import is: from pandas.util.testing import assert_frame_equal
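Once you know the dtypes differ, you can either align them before comparing or relax the check. A minimal sketch using the same two frames: astype accepts the other frame's dtypes (a Series mapping column name to dtype), and assert_frame_equal with check_dtype=False compares the values only.

```python
import pandas as pd
from pandas.testing import assert_frame_equal

df1 = pd.DataFrame({'a': [1, 2.0, 3]})  # the 2.0 makes the column float64
df2 = pd.DataFrame({'a': [1, 2, 3]})    # int64 column

# Compare the dtypes directly to spot the mismatch.
dtypes_match = df1.dtypes.equals(df2.dtypes)

# Option 1: cast df1 to df2's dtypes, then use the strict .equals check.
aligned_equal = df1.astype(df2.dtypes).equals(df2)

# Option 2: compare values only and ignore dtype differences.
try:
    assert_frame_equal(df1, df2, check_dtype=False)
    values_equal = True
except AssertionError:
    values_equal = False
```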

How to skip key error in pandas?

I have a dictionary and a list. For each key in the list, I want to plot the associated values with that key.
I have the following code in pandas:
import math
import matplotlib.pyplot as plt
import numpy as np; np.random.seed(22)
import seaborn as sns; sns.set(color_codes=True)
window = int(math.ceil(5000.0 / 100))
xticks = range(-2500,2500,window)
sns.tsplot([mydictionary[k] for k in mylist],time=xticks,color="g")
plt.legend(['blue'])
However, I get KeyError: xxxx
I can manually remove all problematic keys in my list, but that will take a long time. Is there a way I can skip this key error?
If you just want to swallow the KeyError, use try/except. However, cleaning up the data in advance would be much more elegant.
Example:
mydictionary = {
'a': 1,
'b': 2,
'c': 3,
}
mylist = ['a', 'b', 'c', 'd']
result = []
for k in mylist:
    try:
        result.append(mydictionary[k])
    except KeyError:
        pass
print(result)
# [1, 2, 3]
You need to build the list before using it in your seaborn plot. Afterwards, pass the list in the call:
sns.tsplot(result, time=xticks, color="g")
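Cleaning the data up front, as suggested above, can be done with a membership test instead of try/except. The sketch below also collects the keys that were skipped, which helps to track down where the bad keys come from. It reuses the toy mydictionary and mylist from the example above.

```python
mydictionary = {'a': 1, 'b': 2, 'c': 3}
mylist = ['a', 'b', 'c', 'd']

# Keep only keys that exist in the dictionary, preserving the order of mylist.
result = [mydictionary[k] for k in mylist if k in mydictionary]

# Keys that would have raised KeyError.
missing = [k for k in mylist if k not in mydictionary]

print(result)   # [1, 2, 3]
print(missing)  # ['d']
```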