Generating a mouse heatmap with X, Y coordinates - dataframe

I'm trying to use Python to generate a mouse heatmap from a large set of X, Y coordinates. I've imported the CSV using Pandas; here are the first few rows to give an idea of what it looks like:
X Y
0 2537 638
1 2516 637
2 2451 644
3 2317 652
4 2147 658
5 1999 647
I tried Matplotlib without much success, so I switched to Seaborn to try generating the heatmap that way. For reference, this is what I'm hoping to produce (with a different image in the background):
https://imgur.com/s5qiBsB
This is what my current code looks like:
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import numpy as np
df = pd.read_csv(r'C:\Users\Jen\Desktop\mp.csv')
df[["x", "y"]] = pd.DataFrame.to_numpy(df)
matrix = np.zeros((df.x.max()+1, df.y.max()+1))
matrix[df.x, df.y] = df.index
sns.heatmap(matrix, cmap='jet')
plt.show()
With the following as a result:
https://imgur.com/12dMBsk
Obviously, this isn't exactly what I'm going for. First off, my x and y axes are swapped. What do I need to do to make my result look more like the example I provided? How do I create that blob effect around the different points?
More than happy to try anything at this point. This dataset is about 13,000 rows but I anticipate it will be even larger in the future.
(For reference, these were captured using 2 monitors, each at a resolution of 1650x1050, hence the large x values)
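A common way to get the blob effect is to count the points in a 2-D grid with `np.histogram2d` and then blur that grid with a Gaussian kernel. Passing Y as the first argument makes Y the row axis, so X runs horizontally, which also fixes the swapped axes. The sketch below assumes a 3300x1050 screen (two 1650x1050 monitors side by side), a 10-pixel bin size and a blur `sigma` of 3 bins; all three are guesses to tune, and `mouse_heatmap` is a hypothetical helper name. The blur is done in plain NumPy so no extra dependency is needed.

```python
import numpy as np


def mouse_heatmap(x, y, width=3300, height=1050, bin_size=10, sigma=3.0):
    """Bin mouse positions into a grid and blur it into a smooth density map.

    Hypothetical helper. Returns an array of shape
    (height // bin_size, width // bin_size): rows follow Y and columns follow X,
    like the screen itself. Assumes width and height are multiples of bin_size.
    """
    x_edges = np.arange(0, width + bin_size, bin_size)
    y_edges = np.arange(0, height + bin_size, bin_size)
    # histogram2d bins its first argument along axis 0 (rows), so pass Y first:
    # this keeps X on the horizontal axis and avoids the swapped axes.
    counts, _, _ = np.histogram2d(y, x, bins=[y_edges, x_edges])

    # 1-D Gaussian kernel, truncated at 3 sigma and normalised to sum to 1.
    radius = int(3 * sigma)
    offsets = np.arange(-radius, radius + 1)
    kernel = np.exp(-offsets**2 / (2 * sigma**2))
    kernel /= kernel.sum()

    # A 2-D Gaussian blur is separable: blur every row, then every column.
    def blur(v):
        return np.convolve(v, kernel, mode="same")

    smoothed = np.apply_along_axis(blur, 1, counts)
    smoothed = np.apply_along_axis(blur, 0, smoothed)
    return smoothed
```

With the DataFrame from the question, `heat = mouse_heatmap(df["X"], df["Y"])` followed by `plt.imshow(heat, cmap="jet", alpha=0.6, extent=(0, 3300, 1050, 0))` should draw it with the origin in the top-left corner, as in screen coordinates; drawing a screenshot first (for example `plt.imshow(plt.imread("screenshot.png"), extent=(0, 3300, 1050, 0))`, with a hypothetical file name) puts it underneath. Points outside the assumed screen size are silently dropped by `np.histogram2d`, and a larger `sigma` gives bigger, softer blobs.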

Related

How to make bar charts for multiple groups?

I have a dataframe with several groups of stats for two classes, e.g.:
player position Points target_class
lebron sf 23 1
Magic pg 22 0
How do I make a bar chart of the average points per position (there are 5 positions), split by class, so that the classes appear side by side, in pandas?
Without more information it's hard to know what you are expecting. Using sns.barplot like so should get you close to what I think you want.
import seaborn as sns
import numpy as np
ax = sns.barplot(
data=df,
x="position",
y="Points",
hue="target_class",
estimator=np.mean,
)
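If you would rather stay in pandas, as the question asks, a similar chart can be built by averaging Points per position and class, then unstacking the class into columns. The sketch below uses a small made-up DataFrame: only the first two rows come from the question, the rest are invented for illustration.

```python
import pandas as pd

# Made-up sample; only the first two rows come from the question.
df = pd.DataFrame({
    "player": ["lebron", "Magic", "durant", "curry", "paul"],
    "position": ["sf", "pg", "sf", "pg", "pg"],
    "Points": [23, 22, 27, 30, 20],
    "target_class": [1, 0, 0, 1, 1],
})

# Mean points for each (position, class) pair, with one column per class.
means = df.groupby(["position", "target_class"])["Points"].mean().unstack("target_class")
print(means)
```

Calling `means.plot.bar()` (with matplotlib installed) then puts the positions on the x-axis with one bar per class next to each other, which is the pandas counterpart of the `hue=` argument above.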

How to turn Pandas output into an image for download? [duplicate]

I constructed a pandas DataFrame of results. This DataFrame acts as a table. There are MultiIndexed columns and each row represents a name, i.e. index=['name1','name2',...] when creating the DataFrame. I would like to display this table and save it as a PNG (or any graphic format, really). At the moment, the closest I can get is converting it to HTML, but I would like a PNG. Similar questions have been asked, such as How to save the Pandas dataframe/series data as a figure?
However, the accepted solution converts the DataFrame into a line plot (not a table), and the other solution relies on PySide, which I would like to stay away from simply because I cannot pip install it on Linux. I would like this code to be easily portable. I really expected rendering a table to PNG to be easy with Python. All help is appreciated.
Pandas allows you to plot tables using matplotlib (details here).
Usually this plots the table directly onto a plot (with axes and everything) which is not what you want. However, these can be removed first:
import matplotlib.pyplot as plt
import pandas as pd
from pandas.plotting import table  # older pandas: pandas.tools.plotting (see deprecation warnings below)
ax = plt.subplot(111, frame_on=False) # no visible frame
ax.xaxis.set_visible(False) # hide the x axis
ax.yaxis.set_visible(False) # hide the y axis
table(ax, df) # where df is your data frame
plt.savefig('mytable.png')
The output might not be the prettiest but you can find additional arguments for the table() function here.
Also thanks to this post for info on how to remove axes in matplotlib.
EDIT:
Here is a (admittedly quite hacky) way of simulating multi-indexes when plotting using the method above. If you have a multi-index data frame called df that looks like:
first second
bar one 1.991802
two 0.403415
baz one -1.024986
two -0.522366
foo one 0.350297
two -0.444106
qux one -0.472536
two 0.999393
dtype: float64
First reset the indexes so they become normal columns
df = df.reset_index()
df
first second 0
0 bar one 1.991802
1 bar two 0.403415
2 baz one -1.024986
3 baz two -0.522366
4 foo one 0.350297
5 foo two -0.444106
6 qux one -0.472536
7 qux two 0.999393
Remove all duplicates from the higher order multi-index columns by setting them to an empty string (in my example I only have duplicate indexes in "first"):
df.loc[df.duplicated('first'), 'first'] = ''  # originally df.ix (see deprecation warnings below)
df
first second 0
0 bar one 1.991802
1 two 0.403415
2 baz one -1.024986
3 two -0.522366
4 foo one 0.350297
5 two -0.444106
6 qux one -0.472536
7 two 0.999393
Change the column names of your "index" columns to the empty string:
new_cols = list(df.columns)
new_cols[:2] = ['', '']  # since my index columns are the two left-most on the table
df.columns = new_cols
Now call the table function but set all the row labels in the table to the empty string (this makes sure the actual indexes of your plot are not displayed):
table(ax, df, rowLabels=['']*df.shape[0], loc='center')
et voila:
Your not-so-pretty but totally functional multi-indexed table.
EDIT: DEPRECATION WARNINGS
As pointed out in the comments, the import statement for table:
from pandas.tools.plotting import table
is now deprecated in newer versions of pandas in favour of:
from pandas.plotting import table
EDIT: DEPRECATION WARNINGS 2
The ix indexer has since been deprecated and was removed in pandas 1.0, so use the loc indexer instead. Replace:
df.ix[df.duplicated('first') , 'first'] = ''
with
df.loc[df.duplicated('first') , 'first'] = ''
There is actually a Python library called dataframe_image.
Install it with:
pip install dataframe_image
Do the imports:
import pandas as pd
import numpy as np
import dataframe_image as dfi
df = pd.DataFrame(np.random.randn(6, 6), columns=list('ABCDEF'))
and style your table if you want by:
df_styled = df.style.background_gradient() #adding a gradient based on values in cell
and finally:
dfi.export(df_styled,"mytable.png")
The best solution to your problem is probably to first export your dataframe to HTML and then convert it using an HTML-to-image tool.
The final appearance could be tweaked via CSS.
Popular options for HTML-to-image rendering include:
WeasyPrint
wkhtmltopdf/wkhtmltoimage
Let us assume we have a dataframe named df.
We can generate one with the following code:
import string
import numpy as np
import pandas as pd
np.random.seed(0) # just to get reproducible results from `np.random`
rows, cols = 5, 10
labels = list(string.ascii_uppercase[:cols])
df = pd.DataFrame(np.random.randint(0, 100, size=(rows, cols)), columns=labels)
print(df)
# A B C D E F G H I J
# 0 44 47 64 67 67 9 83 21 36 87
# 1 70 88 88 12 58 65 39 87 46 88
# 2 81 37 25 77 72 9 20 80 69 79
# 3 47 64 82 99 88 49 29 19 19 14
# 4 39 32 65 9 57 32 31 74 23 35
Using WeasyPrint
This approach uses a pip-installable package, which will allow you to do everything using the Python ecosystem.
One shortcoming of weasyprint is that it does not seem to provide a way of adapting the image size to its content.
Anyway, removing some background from an image is relatively easy in Python / PIL, and it is implemented in the trim() function below (adapted from here).
One would also need to make sure that the image is large enough, which can be done with CSS's @page size property.
The code follows:
import weasyprint as wsp  # HTML.write_png() exists only in WeasyPrint < 53
from PIL import Image, ImageChops
def trim(source_filepath, target_filepath=None, background=None):
    if not target_filepath:
        target_filepath = source_filepath
    img = Image.open(source_filepath)
    if background is None:
        # assume the top-left pixel has the background colour
        background = img.getpixel((0, 0))
    border = Image.new(img.mode, img.size, background)
    diff = ImageChops.difference(img, border)
    bbox = diff.getbbox()  # bounding box of everything that differs from the background
    img = img.crop(bbox) if bbox else img
    img.save(target_filepath)
img_filepath = 'table1.png'
css = wsp.CSS(string='''
@page { size: 2048px 2048px; padding: 0px; margin: 0px; }
table, td, tr, th { border: 1px solid black; }
td, th { padding: 4px 8px; }
''')
html = wsp.HTML(string=df.to_html())
html.write_png(img_filepath, stylesheets=[css])
trim(img_filepath)
Using wkhtmltopdf/wkhtmltoimage
This approach uses an external open-source tool, which needs to be installed before the image can be generated.
There is also a Python package, pdfkit, that serves as a front-end to it (it does not spare you from installing the core software yourself), but I will not use it.
wkhtmltoimage can be simply called using subprocess (or any other similar means of running an external program in Python).
One would also need to output to disk the HTML file.
The code follows:
import subprocess
df.to_html('table2.html')
subprocess.call(
'wkhtmltoimage -f png --width 0 table2.html table2.png', shell=True)
and its aspect could be further tweaked with CSS similarly to the other approach.
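As a sketch of that CSS tweaking, one option is to wrap `df.to_html()` in a small HTML page with an embedded `<style>` element before handing the file to `wkhtmltoimage` (or to WeasyPrint). The helper name `write_styled_html` and the style rules are illustrative assumptions; the `classes="report"` argument of `to_html` only adds a CSS class to the generated `<table>` tag so the rules can target it.

```python
import pandas as pd

# Illustrative style rules; adjust to taste.
CSS = """
table.report { border-collapse: collapse; font-family: sans-serif; }
table.report th, table.report td { border: 1px solid black; padding: 4px 8px; }
table.report th { background-color: #dddddd; }
"""


def write_styled_html(df, path):
    """Write df as a standalone HTML page with embedded CSS (hypothetical helper)."""
    html = (
        "<html><head><meta charset='utf-8'><style>"
        + CSS
        + "</style></head><body>"
        + df.to_html(classes="report")
        + "</body></html>"
    )
    with open(path, "w", encoding="utf-8") as f:
        f.write(html)
    return html
```

For example, `write_styled_html(df, 'table2.html')` followed by the same `wkhtmltoimage` call shown earlier should render the styled table instead of the default one.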
Although I am not sure this is the result you expect, you can save your DataFrame as a PNG by plotting it as a Seaborn heatmap with annotations turned on, like this:
http://stanford.edu/~mwaskom/software/seaborn/generated/seaborn.heatmap.html#seaborn.heatmap
It works right away with a Pandas Dataframe. You can look at this example: Efficiently ploting a table in csv format using Python
You might want to change the colormap so it displays a white background only.
Hope this helps.
Edit:
Here is a snippet that does this:
import matplotlib.colors
import seaborn as sns
def save_df_as_image(df, path):
    # Set background to white: a colormap that maps every value to white
    norm = matplotlib.colors.Normalize(-1, 1)
    colors = [[norm(-1.0), "white"],
              [norm(1.0), "white"]]
    cmap = matplotlib.colors.LinearSegmentedColormap.from_list("", colors)
    # Make plot
    plot = sns.heatmap(df, annot=True, cmap=cmap, cbar=False)
    fig = plot.get_figure()
    fig.savefig(path)
The solution of @bunji works for me, but the default options don't always give a good result.
I added some useful parameters to tweak the appearance of the table.
import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import table
import numpy as np
dates = pd.date_range('20130101',periods=6)
df = pd.DataFrame(np.random.randn(6,4),index=dates,columns=list('ABCD'))
df.index = [item.strftime('%Y-%m-%d') for item in df.index] # Format date
fig, ax = plt.subplots(figsize=(12, 2)) # set size frame
ax.xaxis.set_visible(False) # hide the x axis
ax.yaxis.set_visible(False) # hide the y axis
ax.set_frame_on(False) # no visible frame, uncomment if size is ok
tabla = table(ax, df, loc='upper right', colWidths=[0.17]*len(df.columns)) # where df is your data frame
tabla.auto_set_font_size(False) # Activate set fontsize manually
tabla.set_fontsize(12) # if you increase the font size, increase colWidths too
tabla.scale(1.2, 1.2) # change size table
plt.savefig('table.png', transparent=True)
The result:
I had the same requirement for a project I am working on, but none of the answers met it elegantly. Here is something that finally helped me and might be useful in this case:
from bokeh.io import export_png, export_svgs
from bokeh.models import ColumnDataSource, DataTable, TableColumn
def save_df_as_image(df, path):
    source = ColumnDataSource(df)
    # ColumnDataSource stores an unnamed index under the column name "index"
    df_columns = [df.index.name or "index"]
    df_columns.extend(df.columns.values)
    columns_for_table = []
    for column in df_columns:
        columns_for_table.append(TableColumn(field=column, title=column))
    data_table = DataTable(source=source, columns=columns_for_table, height_policy="auto", width_policy="auto", index_position=None)
    export_png(data_table, filename=path)  # needs selenium and a browser driver
There is a Python library called df2img available at https://pypi.org/project/df2img/ (disclaimer: I'm the author). It's a wrapper/convenience function using plotly as backend.
You can find the documentation at https://df2img.dev.
import pandas as pd
import df2img
df = pd.DataFrame(
data=dict(
float_col=[1.4, float("NaN"), 250, 24.65],
str_col=("string1", "string2", float("NaN"), "string4"),
),
index=["row1", "row2", "row3", "row4"],
)
Saving a pd.DataFrame as a .png-file can be done fairly quickly. You can apply formatting, such as background colors or alternating the row colors for better readability.
fig = df2img.plot_dataframe(
df,
title=dict(
font_color="darkred",
font_family="Times New Roman",
font_size=16,
text="This is a title",
),
tbl_header=dict(
align="right",
fill_color="blue",
font_color="white",
font_size=10,
line_color="darkslategray",
),
tbl_cells=dict(
align="right",
line_color="darkslategray",
),
row_fill_color=("#ffffff", "#d7d8d6"),
fig_size=(300, 160),
)
df2img.save_dataframe(fig=fig, filename="plot.png")
If you're okay with the formatting as it appears when you display the DataFrame in your coding environment, then the absolute easiest way is to take a screenshot and crop it with basic image-editing software.
Here's how it turned out for me using Jupyter Notebook, and Pinta Image Editor (Ubuntu freeware).
As jcdoming suggested, use Seaborn heatmap():
import seaborn as sns
import matplotlib.pyplot as plt
fig = plt.figure(facecolor='w', edgecolor='k')
sns.heatmap(df.head(), annot=True, cmap='viridis', cbar=False)
plt.savefig('DataFrame.png')
The easiest and fastest way to convert a pandas DataFrame into a PNG image in the Anaconda Spyder IDE: just double-click the DataFrame in the Variable Explorer, and the table will appear, nicely packaged with automatic formatting and a color scheme. Then use a snipping tool to capture the table for use in your reports, saved as a PNG:
This saves me lots of time, and is still elegant and professional.
The following would need extensive customisation to format the table correctly, but the bones of it work:
import numpy as np
from PIL import Image, ImageDraw, ImageFont
import pandas as pd
df = pd.DataFrame({'A': 1.,
                   'B': pd.Series(1, index=list(range(4)), dtype='float32'),
                   'C': np.array([3] * 4, dtype='int32'),
                   'D': pd.Categorical(["test", "train", "test", "train"]),
                   'E': 'foo'})
class DrawTable():
    def __init__(self, _df):
        self.rows, self.cols = _df.shape
        img_size = (300, 200)
        self.border = 50
        self.bg_col = (255, 255, 255)
        self.div_w = 1
        self.div_col = (128, 128, 128)
        self.head_w = 2
        self.head_col = (0, 0, 0)
        self.image = Image.new("RGBA", img_size, self.bg_col)
        self.draw = ImageDraw.Draw(self.image)
        self.draw_grid()
        self.populate(_df)
        self.image.show()  # opens the image in the default viewer
    def draw_grid(self):
        width, height = self.image.size
        row_step = (height - self.border * 2) / self.rows
        col_step = (width - self.border * 2) / self.cols
        # thin divider lines between rows and between columns
        for row in range(1, self.rows + 1):
            self.draw.line((self.border - row_step // 2, self.border + row_step * row, width - self.border, self.border + row_step * row), fill=self.div_col, width=self.div_w)
        for col in range(1, self.cols + 1):
            self.draw.line((self.border + col_step * col, self.border - col_step // 2, self.border + col_step * col, height - self.border), fill=self.div_col, width=self.div_w)
        # thicker lines under the header row and to the right of the index column
        self.draw.line((self.border - row_step // 2, self.border, width - self.border, self.border), fill=self.head_col, width=self.head_w)
        self.draw.line((self.border, self.border - col_step // 2, self.border, height - self.border), fill=self.head_col, width=self.head_w)
        self.row_step = row_step
        self.col_step = col_step
    def text_width(self, text, font):
        # textbbox (recent Pillow) replaces font.getsize(), which Pillow 10 removed
        left, _, right, _ = self.draw.textbbox((0, 0), text, font=font)
        return right - left
    def populate(self, _df2):
        font = ImageFont.load_default()
        for row in range(self.rows):
            # index label at the left of each row
            self.draw.text((self.border - self.row_step // 2, self.border + self.row_step * row), str(_df2.index[row]), font=font, fill=(0, 0, 128))
            for col in range(self.cols):
                text = str(_df2.iloc[row, col])
                # right-align each value in its cell
                x_pos = self.border + self.col_step * (col + 1) - self.text_width(text, font)
                y_pos = self.border + self.row_step * row
                self.draw.text((x_pos, y_pos), text, font=font, fill=(0, 0, 128))
        for col in range(self.cols):
            text = str(_df2.columns[col])
            x_pos = self.border + self.col_step * (col + 1) - self.text_width(text, font)
            y_pos = self.border - self.row_step // 2
            self.draw.text((x_pos, y_pos), text, font=font, fill=(0, 0, 128))
    def save(self, filename):
        try:
            self.image.save(filename)
            print(filename, " Saved.")
        except OSError as err:
            print("Error saving:", filename, err)
table1 = DrawTable(df)
table1.save('C:/Users/user/Pictures/table1.png')
The output looks like this:
People who use Plotly for data visualization:
You can easily convert the dataframe to go.Table.
You can save the dataframe with columns names.
You can format the dataframe through go.Table.
You can save the dataframe as pdf, jpg, or png with different scales and high resolution.
import plotly.express as px
import plotly.graph_objects as go
df = px.data.medals_long()
fig = go.Figure(data=[
    go.Table(
        header=dict(values=list(df.columns), align='center'),
        cells=dict(values=df.values.transpose(),
                   fill_color=[["white", "lightgrey"] * df.shape[0]],  # alternate row colours
                   align='center'
                   )
    )
])
fig.write_image('image.png', scale=6)  # static image export requires the kaleido package
Note: the image is saved to the current working directory.
Output:
I really like the way Jupyter notebooks format the DataFrame and this library exports it in the same format:
import dataframe_image as dfi
dfi.export(df, "df.png")
There is also a dpi argument in case you want to increase the quality of the image. I'd recommend 300 for OK quality, 600 for excellent quality and 1200 for near-perfect quality; anything higher is probably too much.
import dataframe_image as dfi
dfi.export(df, "df.png", dpi = 600)

KeyError while plotting a graph in matplotlib

I am trying to plot a simple graph for the dataframe below
indeces Zeitstempel Ergebnis
0 382 16.04.2020 16:12:07 PASS
1 383 16.04.2020 16:13:07 PASS
2 392 16.04.2020 16:13:20 FAIL
3 382 16.04.2020 16:13:22 PASS
4 383 16.04.2020 16:14:22 PASS
It has three columns. The x-axis should be Zeitstempel and the y-axis should be indeces, and I would also like to show the values in the Ergebnis column (maybe color-coded green for PASS, red for FAIL and grey for BLOCKED) to indicate which index is passing, failing or blocked at what time. The actual DataFrame has 1172 rows × 3 columns, but above I have only shown a few.
The code I am trying is as below but somehow I am not able to figure out how to plot all the 3 as required.
times = pd.date_range('2020-04-16 04:12 AM', '2020-04-16 11:00 PM', freq='1H')
fig, ax = plt.subplots(1)
fig.autofmt_xdate()
df.plot(kind='line',x='times',y='Index',ax=ax)
xfmt = mdates.DateFormatter('%d-%m-%y %H:%M')
ax.xaxis.set_major_formatter(xfmt)
ax = plt.gca()
plt.show()
times holds the Zeitstempel values and Index holds the indeces values. This gives me a KeyError. Is there a simpler way to do this? I am new to matplotlib and running out of ideas. Please suggest.
Take a look at the answer I posted at: How to read a dataframe in np.genfromtxt instead of a file in matplotlib. It shows how to load data from a csv file with np.genfromtxt() then generate the desired color coded plot (similar to what you want to do). If you can map your Pandas data to a NumPy recarray, the rest of the process will still work the same.
I'm not familiar with Pandas, so I can only sketch it. It will look something like this. Replace this call to np.genfromtxt() (which reads the csv data from a file):
csv_arr = np.genfromtxt(csv, # Data to be read
...)
With the following lines to create the recarray. (I reused the csv_arr name to simplify. Feel free to use any name you like):
nrows = len(df)
csv_dt = np.dtype([ ('indeces', '<i4'), ('Zeitstempel', 'O'), ('Ergebnis', '<U7') ])
csv_arr = np.empty(shape=(nrows,), dtype=csv_dt)
# assumes Zeitstempel holds strings such as 16.04.2020 16:12:07; parse them into datetime objects
csv_arr['Zeitstempel'] = pd.to_datetime(df['Zeitstempel'], format='%d.%m.%Y %H:%M:%S').dt.to_pydatetime()
csv_arr['indeces'] = df['indeces'].to_numpy()
csv_arr['Ergebnis'] = df['Ergebnis'].to_numpy()
After you add your Pandas data to csv_arr, the rest of the code should work the same and create the same plot shown in the referenced answer. Good luck.
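As for the KeyError itself: `DataFrame.plot(x=..., y=...)` expects column names, and the DataFrame has no columns called `times` or `Index`, so pandas raises a KeyError. Staying in pandas, a sketch (not the referenced answer's method) is to parse Zeitstempel into real timestamps and map each Ergebnis value to a color using the green/red/grey scheme from the question. The helper name `prepare_plot_data` and the black fallback for unknown results are assumptions.

```python
import pandas as pd

# Colour scheme taken from the question; any other result falls back to black (an assumption).
RESULT_COLORS = {"PASS": "green", "FAIL": "red", "BLOCKED": "grey"}


def prepare_plot_data(df):
    """Return timestamps, index values and one colour per row (hypothetical helper)."""
    # Zeitstempel uses day.month.year hour:minute:second, e.g. 16.04.2020 16:12:07.
    times = pd.to_datetime(df["Zeitstempel"], format="%d.%m.%Y %H:%M:%S")
    colors = df["Ergebnis"].map(RESULT_COLORS).fillna("black")
    return times, df["indeces"], colors


# First rows of the DataFrame from the question.
df = pd.DataFrame({
    "indeces": [382, 383, 392, 382, 383],
    "Zeitstempel": ["16.04.2020 16:12:07", "16.04.2020 16:13:07", "16.04.2020 16:13:20",
                    "16.04.2020 16:13:22", "16.04.2020 16:14:22"],
    "Ergebnis": ["PASS", "PASS", "FAIL", "PASS", "PASS"],
})
times, indices, colors = prepare_plot_data(df)
```

The result can then be drawn as colored points, for example `fig, ax = plt.subplots()`, `ax.scatter(times, indices, c=colors.tolist())` and `fig.autofmt_xdate()`. A scatter plot is used here because a line plot cannot easily give each point its own PASS/FAIL color.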

Extra lane in heat map (pandas)

Here is my file
I plot a heat map from it using the following code:
import pandas as pd
import matplotlib.pyplot as plt
new = pd.read_csv(r'path_to_file')
full_list=new.columns.values
new = new[full_list[1:]]
plt.pcolor(new, cmap='Blues')
plt.show()
The file has only 11 rows of values, but for some reason 12 rows show up. Do you know what is wrong?
Here is how output looks for me:
There is nothing wrong. First, this has nothing to do with pandas, so we can leave that out and consider the following example
import matplotlib.pyplot as plt
import numpy as np
a = np.random.randint(0,10,size=(11, 2))
plt.pcolor(a, cmap='Blues')
plt.show()
We create an array with 11 rows and 2 columns and plot it. It also shows a 12th row.
The easiest solution is probably to just limit the axis to the number of rows:
plt.ylim([0,a.shape[0]])
in this case plt.ylim([0,11]).
However we want to know more...
Is eleven special? Maybe, so let's find out by trying some other numbers.
1 to 10 work fine, 11 doesn't, 12 does, 13 doesn't.
What is special about these numbers is that matplotlib cannot easily find good axis tick marks when asked to plot 11, 13, ... entities.
This is decided by the matplotlib locator.
The tricky part would now be to find a good locator for 11 entities. I think there is none, as
from matplotlib.ticker import MaxNLocator
plt.gca().yaxis.set_major_locator(MaxNLocator(nbins=11))
won't work here. But that may be a different question.
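A minimal sketch of the ylim workaround from this answer, using the object-oriented interface. The non-interactive Agg backend, the random seed and the output file name are assumptions made so the example runs without a display. After `set_ylim`, the y-axis spans exactly the 11 data rows regardless of which tick locator matplotlib picks.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line to get an interactive window
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
a = rng.integers(0, 10, size=(11, 2))  # 11 rows and 2 columns, as in the example above

fig, ax = plt.subplots()
ax.pcolor(a, cmap="Blues")
# Clamp both axes to the data so no empty 12th row is drawn.
ax.set_ylim(0, a.shape[0])
ax.set_xlim(0, a.shape[1])
fig.savefig("pcolor_11_rows.png")
```

Setting the limits explicitly also turns off autoscaling for that axis, so later plotting calls on the same axes will not bring the extra row back.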

Cutting up the x-axis to produce multiple graphs with seaborn?

The plot produced by the following code looks really messy at the moment, because there are too many values for 'fare'. 'Fare' ranges from 0 to 500, with most of the values under 100.
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
titanic = sns.load_dataset("titanic")
y =titanic.groupby([titanic.fare//1,'sex']).survived.mean().reset_index()
sns.set(style="whitegrid")
g = sns.catplot(x='fare', y='survived', col='sex', kind='bar', data=y,
                height=4, aspect=2.5, palette="muted")  # factorplot with size= in older seaborn
g.despine(left=True)
g.set_ylabels("Survival Probability")
g.set_xlabels('Fare')
plt.show()
I would like to try slicing up the 'fare' axis of the plots into subsets, but I would like to see all the graphs at the same time on one screen. I was wondering if this is possible without having to resort to groupby.
I will have to play around with the 'fare' values to see what I want each graph to represent, but as an example, let's break up the graph into these 'fare' ranges:
[0-18]
[18-35]
[35-70]
[70-300]
[300-500]
So there would be 10 graphs in total on one page, since each range is shown for both sexes side by side.
Is it possible with Seaborn? Do I need to do a lot of configuring with matplotlib? Thanks.
Actually I wrote a little blog post about this a while ago. If you are plotting histograms you can use the by keyword:
import matplotlib.pyplot as plt
import seaborn as sns  # seaborn.apionly no longer exists in current seaborn
sns.set() #rescue matplotlib's styles from the early '90s
data = sns.load_dataset('titanic')
data.hist(by='class', column = 'fare')
plt.show()
Otherwise if you're just plotting value-counts, you have to roll your own grid:
def categorical_hist(self, column, by, layout=None, legend=None, **params):
    # self is the DataFrame passed in as the first argument
    from math import sqrt, ceil
    if layout is None:
        # roughly square grid with one panel per distinct value of column
        s = ceil(sqrt(self[column].unique().size))
        layout = (s, s)
    return self.groupby(by)[column]\
        .value_counts()\
        .sort_index()\
        .unstack()\
        .plot.bar(subplots=True, layout=layout, legend=legend, **params)
categorical_hist(data, by='class', column='embark_town')
Edit: if you want the survival rate by fare range, you could do something like this:
import pandas as pd
data.groupby(pd.cut(data.fare, 10), observed=False)['survived'].mean()  # same as survived.sum() / len(x) for each fare bin
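The `pd.cut` call above uses 10 equal-width bins. To use the specific ranges listed in the question instead, `pd.cut` also accepts explicit bin edges. The sketch below runs on a few made-up rows standing in for the titanic dataset (which `sns.load_dataset` downloads). The open-ended last bin `(300, inf]` is an assumption, chosen because some Titanic fares exceed 500 and would otherwise become NaN; `add_fare_range` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

# Bin edges from the question; the last edge is left open so fares above 500 are not dropped.
FARE_BINS = [0, 18, 35, 70, 300, np.inf]


def add_fare_range(df):
    """Return a copy of df with a categorical 'fare_range' column (hypothetical helper)."""
    out = df.copy()
    # include_lowest=True keeps fares of exactly 0 in the first bin.
    out["fare_range"] = pd.cut(out["fare"], bins=FARE_BINS, include_lowest=True)
    return out


# Made-up sample rows standing in for the titanic dataset.
sample = pd.DataFrame({
    "fare": [0.0, 7.25, 71.28, 8.05, 512.33, 26.0],
    "sex": ["male", "female", "female", "male", "male", "female"],
    "survived": [0, 1, 1, 0, 1, 1],
})
binned = add_fare_range(sample)
# Survival rate for every (fare range, sex) pair that actually occurs.
rates = binned.groupby(["fare_range", "sex"], observed=True)["survived"].mean()
print(rates)
```

With the real data, the binned column can drive the facets directly, e.g. `sns.catplot(data=binned, x="fare_int", y="survived", row="fare_range", col="sex", kind="bar", sharex=False)` after adding a hypothetical `binned["fare_int"] = binned["fare"] // 1` column, which should give the 5 × 2 = 10 panels the question describes. This is a sketch; how `sharex=False` handles the categorical fare values per panel may differ between seaborn versions.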