How to change pandas display font of index column

import pandas as pd
data = {
    'X': [3, 2, 0, 1],
    'Y': [0, 3, 7, 2]
}
df = pd.DataFrame(data, index=['A', 'B', 'C', 'D'])
# The Styler must be the last expression in the cell to be displayed
df.style.set_properties(**{
    'font-family':'Courier New'
})
The index column is displayed in bold. Is it possible to change the font of the index column?

You need to use set_table_styles(). In this example I set "font-weight" to "normal" for the index and the column headers.
Let's define some test data:
import pandas as pd
df = pd.DataFrame({'A':[1,2,3,4],
'B':[5,4,3,1]})
We define style customization to use:
styles = [
dict(selector="th", props=[("font-weight","normal"),
("text-align", "center")])]
We pass the style variable as the argument for set_table_styles():
html = (df.style.set_table_styles(styles))
html
The index and column headers are now rendered with normal weight and centered text.
See the Styling section of the pandas documentation for more details.

Related

How to count No. of rows with special characters in all columns of a PySpark DataFrame?

Assume that I have a PySpark DataFrame. Some of the cells contain only special characters.
Sample dataset:
import pandas as pd
data = {'ID': [1, 2, 3, 4, 5, 6],
'col_1': ['A', '?', '<', ' ?', None, 'A?'],
'col_2': ['B', ' ', '', '?>', 'B', '\\B']
}
pdf = pd.DataFrame(data)
df = spark.createDataFrame(pdf)
I want to count the number of rows which contain only special characters (except blank cells). Values like 'A?' and '\B' and blank cells are not counted.
The expected output will be:
{'ID': 0, 'col_1': 3, 'col_2': 1}
Is there any way to do that?
Taking your sample dataset, this should do:
import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, when
spark = SparkSession.builder.getOrCreate()
data = {'ID': [1, 2, 3, 4, 5, 6],
        'col_1': ['A', '?', '<', ' ?', None, 'A?'],
        'col_2': ['B', ' ', '', '?>', 'B', '\\B']
        }
pdf = pd.DataFrame(data)
df = spark.createDataFrame(pdf)
res = {}
for col_name in df.columns:
    # Match cells with at least one special character and no alphanumeric character
    df = df.withColumn('matched', when((col(col_name).rlike(r'[^A-Za-z0-9\s]')) & ~(col(col_name).rlike(r'[A-Za-z0-9]')), True).otherwise(False))
    res[col_name] = df.select('ID').where(df.matched).count()
print(res)
The trick is to combine two regular-expression conditions: the cell must contain at least one character that is neither alphanumeric nor whitespace, and it must not contain any alphanumeric character.
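To see the two conditions in isolation, here is a small sketch that applies the same patterns with Python's re module instead of Spark. The helper only_special is a hypothetical name introduced for illustration, and treating None as "not counted" is an assumption that mirrors how the null cell is excluded above.

```python
import re

has_special = re.compile(r'[^A-Za-z0-9\s]')  # neither alphanumeric nor whitespace
has_alnum = re.compile(r'[A-Za-z0-9]')       # any alphanumeric character

def only_special(value):
    """True if the cell has a special character and no alphanumeric one."""
    if value is None:
        return False
    return bool(has_special.search(value)) and not has_alnum.search(value)

col_1 = ['A', '?', '<', ' ?', None, 'A?']
col_2 = ['B', ' ', '', '?>', 'B', '\\B']

counts = {
    'col_1': sum(only_special(v) for v in col_1),
    'col_2': sum(only_special(v) for v in col_2),
}
print(counts)  # {'col_1': 3, 'col_2': 1}
```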

Can you change the caption font size using Pandas styling?

I am trying to change the font size of a caption using the pandas styling API. Is this possible?
Here is what I have so far:
import pandas as pd
data = {'col1': [1, 2], 'col2': [3, 4]}
df = pd.DataFrame(data=data)
df.style.set_caption("Some Caption")
Appreciate any input.
Try this:
df.style.set_caption("Some Caption").set_table_styles([{
'selector': 'caption',
'props': [
('color', 'red'),
('font-size', '16px')
]
}])

Why is the pandas DataFrame style lost when saved with "to_excel"?

Per this example the to_excel method should save the Excel file with background color. However, my saved Excel file does not have any color in it.
I tried to write using both openpyxl and xlsxwriter engines. In both cases, the Excel file was saved, but the cell color/style was lost.
I can read the file back and reformat with openpyxl, but if this to_excel method is supposed to work, why doesn't it?
Here is the sample code.
import pandas as pd # version 0.24.2
dict = {'A': [1, 1, 1, 1, 1], 'B':[2, 1, 2, 1, 2], 'C':[1, 2, 1, 2, 1]}
df = pd.DataFrame(dict)
df_styled = df.style.apply(lambda x: ["background: #ffa31a" if x.iloc[0] < v else " " for v in x], axis=1)
df_styled
''' in my jupyter notebook, this displayed my dataframe with background color when condition is met, (all the 2s highlighted)'''
'''Save the styled data frame to excel using to_excel'''
df_styled.to_excel('example_file_openpyxl.xlsx', engine='openpyxl')
df_styled.to_excel('example_file_xlsxwriter.xlsx', engine='xlsxwriter')
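I ran into this myself. As far as I can tell, the Excel export only translates a limited set of CSS properties: background-color is supported, while the shorthand background used in your code appears to be ignored. I've adjusted your code to follow the Excel export example in the documentation.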
This is the documentation output to excel method.
df.style.\
applymap(color_negative_red).\
apply(highlight_max).\
to_excel('styled.xlsx', engine='openpyxl')
This is your code adjusted:
import numpy as np
import pandas as pd
data = {'A': [1, 1, 1, 1, 1], 'B': [2, 1, 2, 1, 2], 'C': [1, 2, 1, 2, 1]}
df = pd.DataFrame(data)
def highlight(df, color="yellow"):
    attr = 'background-color: {}'.format(color)
    # True where a value is greater than the first value in its row
    df_bool = pd.DataFrame(df.apply(lambda x: [x.iloc[0] < v for v in x], axis=1).apply(pd.Series),
                           index=df.index)
    df_bool.columns = df.columns
    return pd.DataFrame(np.where(df_bool, attr, ""),
                        index=df.index, columns=df.columns)
df.style. \
    apply(highlight, axis=None).\
    to_excel("styled.xlsx", engine="openpyxl")
Inside the highlight function, I create a boolean dataframe based on the conditions applied in the list comprehension above. Then, I assign styling based on the result of this dataframe.
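The boolean frame can probably be built more directly. The sketch below is one possible variant, not the code from the answer: it assumes that DataFrame.gt with axis=0 compares every column against the first column row by row, and highlight_gt_first is a hypothetical name.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 1, 1, 1],
                   'B': [2, 1, 2, 1, 2],
                   'C': [1, 2, 1, 2, 1]})

def highlight_gt_first(frame, color="yellow"):
    """Return a frame of CSS strings marking values greater than the row's first value."""
    attr = f"background-color: {color}"
    # Compare each column with the first column, aligned on the row index
    mask = frame.gt(frame.iloc[:, 0], axis=0)
    return pd.DataFrame(np.where(mask, attr, ""),
                        index=frame.index, columns=frame.columns)

styles = highlight_gt_first(df)
print(styles)
```

A function like this can be passed to df.style.apply(..., axis=None) in the same way as highlight above.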

How to set spacing in html export of a pandas dataframe?

I'm trying to modify this answer and get more spacing between columns.
import pandas as pd
df = pd.DataFrame({'A': [1,10],
'B': ['B','BBBBBB'],
'C': [0,1000],
'D': ['D','DDDDDD']})
#https://stackoverflow.com/a/5667535/3014199
spacing = dict(selector="table",props=[('border-collapse', 'separate'),
('border-spacing', '100px 500px')])
# Style
result=df.style.set_properties(subset=df.columns[[0,2]], **{'text-align':'right'})\
.set_properties(subset=df.columns[[1,3]], **{'text-align':'left'})\
.set_table_styles([spacing])
print(result.render(),file=open('test.html','w'))
But despite the ridiculous values, the columns don't seem to be any further apart.
Adding e.g. 'padding-right':'10px' in set_properties seems to work, but I want to do things right.
(Also, how can I suppress the index? It was index=False for .to_html, but where do I put it here?)
To assign properties to the <table> element itself, you have to omit selector="table".
With selector="table", the rule targets a table nested inside the styled table (<table><table></table></table>), so it has no effect here.
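To illustrate why the selector matters, here is a rough sketch of how a table style could turn into a CSS rule. It assumes that each selector is prefixed with the id of the generated table (written here as the hypothetical id T_demo); build_rule is an illustrative helper, not a pandas function.

```python
def build_rule(table_id, selector, props):
    """Mimic a CSS rule scoped to one table: '#<id> <selector> { ... }'."""
    target = f"#{table_id} {selector}".strip()
    body = " ".join(f"{name}: {value};" for name, value in props)
    return f"{target} {{ {body} }}"

props = [('border-collapse', 'separate'), ('border-spacing', '10px 50px')]

# With a selector, the rule looks for a <table> nested inside the styled table
nested_rule = build_rule("T_demo", "table", props)
# Without a selector, the rule applies to the styled table itself
table_rule = build_rule("T_demo", "", props)

print(nested_rule)
print(table_rule)
```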
You can use
result.render(head=[])
to skip the header rows, but an empty <thead> element remains, and it shifts the other elements when you use 'border-spacing'.
Using
dict(selector="thead", props = [('display', 'none')])
you can hide the <thead>.
You can also skip head=[]; the headers are then kept in the file, but you will not see them.
import pandas as pd
import webbrowser
df = pd.DataFrame({
'A': [1, 10],
'B': ['B', 'BBBBBB'],
'C': [0, 1000],
'D': ['D', 'DDDDDD'],
})
styles = [
dict(
props=[
('border-collapse', 'separate'),
('border-spacing', '10px 50px')
]
),
dict(
selector="thead",
props = [('display', 'none')]
)
]
result = df.style.set_properties(subset=df.columns[[0,2]], **{'text-align':'right'})\
.set_properties(subset=df.columns[[1,3]], **{'text-align':'left'})\
.set_table_styles(styles)
with open('test.html', 'w') as f:
    f.write(result.render(head=[]))
webbrowser.open('test.html')
BTW: I checked the source code: render() uses the template html.tpl, while to_html() uses much more complex methods to render HTML (i.e. it uses the class HTMLFormatter).
I prefer to use Bootstrap for the spacing:
# Creates the dataframe
df = pd.DataFrame(myData)
# Renders the dataframe as an HTML table with two Bootstrap classes assigned.
tables=[df.to_html(classes='table table-hover', header="true")]
In this example, I use the classes "table" and "table-hover" from Bootstrap.

How to skip key error in pandas?

I have a dictionary and a list. For each key in the list, I want to plot the associated values with that key.
I have the following code in pandas:
import math
import matplotlib.pyplot as plt
import numpy as np; np.random.seed(22)
import seaborn as sns; sns.set(color_codes=True)
window = int(math.ceil(5000.0 / 100))
xticks = range(-2500,2500,window)
sns.tsplot([mydictionary[k] for k in mylist],time=xticks,color="g")
plt.legend(['blue'])
However, I get KeyError: xxxx
I can manually remove all problematic keys in my list, but that will take a long time. Is there a way I can skip this key error?
If you are looking for a way to just swallow the key error, use a try & except. However, cleaning up the data in advance would be much more elegant.
Example:
mydictionary = {
'a': 1,
'b': 2,
'c': 3,
}
mylist = ['a', 'b', 'c', 'd']
result = []
for k in mylist:
    try:
        result.append(mydictionary[k])
    except KeyError:
        pass
print(result)
# [1, 2, 3]
You will need to construct the list prior to using it in your seaborn plot. Afterwards, pass the list with the call:
sns.tsplot(result ,time=xticks,color="g")
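As a sketch of the "clean up in advance" route, the missing keys can also be filtered out with a membership test instead of try and except. This reuses the example names from above; skipped is an extra, hypothetical variable that records which keys were dropped.

```python
mydictionary = {
    'a': 1,
    'b': 2,
    'c': 3,
}
mylist = ['a', 'b', 'c', 'd']

# Keep only the keys that exist in the dictionary
result = [mydictionary[k] for k in mylist if k in mydictionary]
# Record the keys that were skipped, so they can be inspected later
skipped = [k for k in mylist if k not in mydictionary]

print(result)   # [1, 2, 3]
print(skipped)  # ['d']
```

Keeping the skipped keys around makes it easier to check whether they are typos or genuinely missing data.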