Image as array in pandas dataframe - pandas

PNG images stored in a variable (as arrays) are not being displayed in a pandas DataFrame.
The commands:
display(HTML('df.html', df.to_html(escape=False ,formatters=format_dict)))
or
display(HTML(df.to_html(escape=False ,formatters=format_dict)))
should work with the following IPython imports:
from IPython.display import display,HTML
from IPython.display import IFrame
However, these commands fail with the error: can only concatenate str (not "PngImageFile") to str. I also found on Stack Overflow (How to write PNG image to string with the PIL?) that this issue can be solved by using a PNG file, but I need to add that PNG file to a pandas DataFrame, and that does not work either: it fails because a 'PngImageFile' object is not callable.
How can I solve this issue?
Here is an example:
https://colab.research.google.com/drive/163jW-GoXKW4TckuIVK3_wjXB9wP11u0g?usp=sharing

Related

Generate graphs for all the columns in an Excel file with Pandas in Google Colab

I'm trying to generate graphs for all the columns of an Excel file, but I'm getting an error. My goal is to save these graphs as PNG files and then download them. Some context: I read the file and loop over its columns; for each column I call .value_counts() and create a graph, and once the graph is generated I want to save it to a PNG file named after the column's index. My code is:
import pandas as pd
from google.colab import files
from matplotlib import pyplot as plt
df = pd.read_excel('columns.xlsx')
for i in df.columns:
    print(i)
    #i.value_counts().plot(kind="bar", figsize=(15,7), color="#61d199")
    df[i].value_counts().plot(kind="bar", figsize=(15,7), color="#61d199")
    #plt.savefig('viz_movies.png')
Error in this code line:
df[i].value_counts().plot(kind="bar", figsize=(15,7), color="#61d199")
Error:
index 0 is out of bounds for axis 0 with size 0
I also want to use the loop index in the file names, something like this:
for index, number in enumerate(numbers):
    plt.savefig('index.png') #use the index as a name
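A minimal sketch of the loop described above, assuming the error comes from a column whose value_counts() result is empty (for example a column containing only missing values). The sample DataFrame and the saved_files list are hypothetical stand-ins for the data read from columns.xlsx.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data standing in for pd.read_excel('columns.xlsx').
df = pd.DataFrame({
    "genre": ["drama", "comedy", "drama"],
    "empty": [None, None, None],
})

saved_files = []
for index, column in enumerate(df.columns):
    counts = df[column].value_counts()
    if counts.empty:
        # Nothing to plot for this column, so skip it.
        continue
    fig, ax = plt.subplots(figsize=(15, 7))
    counts.plot(kind="bar", ax=ax, color="#61d199")
    ax.set_title(str(column))
    filename = f"{index}.png"  # use the loop index as the file name
    fig.savefig(filename)
    plt.close(fig)  # free the figure before the next iteration
    saved_files.append(filename)

print(saved_files)
```

Creating a new figure per column keeps the bars of one column from being drawn on top of the previous one. In Colab, each saved file could then be passed to files.download.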

how to save Correlation map generated in pandas [duplicate]

The code below when run in jupyter notebook renders a table with a colour gradient format that I would like to export to an image file.
The resulting 'styled_table' object that notebook renders is a pandas.io.formats.style.Styler type.
I have not been able to find a way to export the Styler to an image.
I hope someone can share a working example of an export, or give me some pointers.
import pandas as pd
import seaborn as sns
data = {('count', 's25'):
        {('2017-08-11', 'Friday'): 88.0,
         ('2017-08-12', 'Saturday'): 90.0,
         ('2017-08-13', 'Sunday'): 93.0},
        ('count', 's67'):
        {('2017-08-11', 'Friday'): 404.0,
         ('2017-08-12', 'Saturday'): 413.0,
         ('2017-08-13', 'Sunday'): 422.0},
        ('count', 's74'):
        {('2017-08-11', 'Friday'): 203.0,
         ('2017-08-12', 'Saturday'): 227.0,
         ('2017-08-13', 'Sunday'): 265.0},
        ('count', 's79'):
        {('2017-08-11', 'Friday'): 53.0,
         ('2017-08-12', 'Saturday'): 53.0,
         ('2017-08-13', 'Sunday'): 53.0}}
table = pd.DataFrame.from_dict(data)
table.sort_index(ascending=False, inplace=True)
cm = sns.light_palette("seagreen", as_cmap=True)
styled_table = table.style.background_gradient(cmap=cm)
styled_table
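As mentioned in the comments, you can use the render() method to obtain the HTML of the styled table (in pandas 1.4 and later, use styled_table.to_html() instead, since render() was deprecated in 1.4 and removed in 2.0):
html = styled_table.render()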
You can then use a package that converts html to an image. For example, IMGKit: Python library of HTML to IMG wrapper. Bear in mind that this solution requires the installation of wkhtmltopdf, a command line tool to render HTML into PDF and various image formats. It is all described in the IMGKit page.
Once you have that, the rest is straightforward:
import imgkit
imgkit.from_string(html, 'styled_table.png')
You can use dexplo's dataframe_image from https://github.com/dexplo/dataframe_image. After installing the package, it also lets you save styler objects as images like in this example from the README:
import numpy as np
import pandas as pd
import dataframe_image as dfi
df = pd.DataFrame(np.random.rand(6,4))
df_styled = df.style.background_gradient()
dfi.export(df_styled, 'df_styled.png')

Loading a csv file with no header on my Colab with Pandas read_csv and Numpy loadtxt gave me different results

This is the image of the error on my Colab when I used pd.dtype to pd_data
This is the image of the error on my Colab when I used np.dtype to pd_data
I have loaded one csv file into my Colab notebook in two different ways: with pd.read_csv() and with np.loadtxt(). I assigned the results to pd_data and nd_data, respectively. After that I printed the shape of each. At this point I got two different shapes even though I loaded the same csv file.
My question is why I got two different shapes when loading the same data.
This is the link to the ThoraricSurgery.csv file which I used.
from google.colab import drive
drive.mount('/content/drive')
import pandas as pd
pd_data = pd.read_csv('/content/drive/MyDrive/딥러닝과실습1/ThoraricSurgery.csv')
print(pd_data.shape)
print(type(pd_data))
import numpy as np
nd_data = np.loadtxt('/content/drive/MyDrive/딥러닝과실습1/ThoraricSurgery.csv', delimiter=",")
print(nd_data.shape)
print(type(nd_data))
This is the result mentioned above.
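A likely explanation, given that the file has no header: pd.read_csv() treats the first row as column names by default, while np.loadtxt() reads every row as data, so the DataFrame ends up one row shorter. The sketch below reproduces this with a small in-memory CSV (the csv_text content is made up for illustration) and shows that header=None makes the shapes match.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical header-less CSV with 3 rows and 3 columns.
csv_text = "1,2,3\n4,5,6\n7,8,9\n"

# Default: the first row is consumed as the header.
pd_default = pd.read_csv(io.StringIO(csv_text))

# header=None: every row is treated as data.
pd_no_header = pd.read_csv(io.StringIO(csv_text), header=None)

# loadtxt never looks for a header unless skiprows is given.
nd_data = np.loadtxt(io.StringIO(csv_text), delimiter=",")

print(pd_default.shape)    # (2, 3)
print(pd_no_header.shape)  # (3, 3)
print(nd_data.shape)       # (3, 3)
```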

How to convert PDF to excel using tabula-py into dataframe of several tables?

I have a PDF file that contains several tables. For example:
Table from PDF File
I learned that I have to use tabula-py, which depends on Java (Note: I'm working in Jupyter Notebook).
So I wrote this:
import pandas as pd
import numpy as np
import tabula
from tabula import read_pdf
pdf_path = r"..\PDFs\pobreza2.pdf"  # Path to the file (raw string so the backslashes are kept literally)
df = tabula.read_pdf(pdf_path, pages="all", stream=True, guess=False, multiple_tables=True)  # The PDF has many pages with several tables
And I get this:
Output of the code
The result looks like a list, not a DataFrame.
So, how can I get these tables into a DataFrame? The tables contain string and float values.
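With multiple_tables=True, tabula.read_pdf returns a list with one DataFrame per detected table, which would explain the output. The sketch below does not call tabula (so it runs without Java); the tables list is a hypothetical stand-in for the returned value, and it assumes the tables share the same columns when they are combined.

```python
import pandas as pd

# Hypothetical stand-in for: tables = tabula.read_pdf(pdf_path, pages="all", ...)
tables = [
    pd.DataFrame({"region": ["A", "B"], "rate": [10.5, 12.0]}),
    pd.DataFrame({"region": ["C", "D"], "rate": [8.2, 9.9]}),
]

# Each element of the list is already a DataFrame.
first_table = tables[0]

# If the tables share the same columns, they can be stacked into one.
combined = pd.concat(tables, ignore_index=True)

print(type(first_table))
print(combined)
```

If the tables have different layouts, keeping them as separate DataFrames (for example writing each to its own Excel sheet) is probably safer than concatenating them.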

png file shows bluish image when using plt.imshow()

I'm trying to plot a PNG file using matplotlib.pyplot.imshow(), but it shows a bluish image (see below). It works for a JPEG file but not for the PNG.
This is the code:
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image
im = Image.open('apple_logo.png')
im.save('test.png') #test.png is same as original
data = np.array(im)
print(data)
plt.imshow(data) #shows a bluish image of the logo
The image I'm using:
Bluish image:
Python 3.8.2
matplotlib 3.3.0
Pillow 7.2.0
numpy 1.19.0
OS: Windows 10
The original PNG image is an indexed PNG file. That is, it has a palette (i.e. a lookup table for the colors), and the array of data that makes up the image is an array of indices into the lookup table. When you convert im to a numpy array with data = np.array(im), data is the array of indices into the palette, instead of the array of actual colors.
Use the convert() method before passing the image through numpy.array:
data = np.array(im.convert())
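To see the difference described above, the following sketch builds a small palette image in memory (a hypothetical stand-in for apple_logo.png) and compares the two arrays. A 2-D array of indices is what imshow colours with its default colormap, which would account for the bluish look.

```python
import numpy as np
from PIL import Image

# Hypothetical stand-in for an indexed PNG: a solid red image in mode "P".
rgb_image = Image.new("RGB", (4, 4), (255, 0, 0))
indexed = rgb_image.convert("P")

# Converting directly gives a 2-D array of palette indices.
indices = np.array(indexed)

# Converting to RGB first gives the actual colours, one triple per pixel.
colors = np.array(indexed.convert("RGB"))

print(indexed.mode, indices.shape)
print(colors.shape)
```

Passing the mode explicitly, as in convert("RGB"), makes the resulting array shape predictable regardless of how the PNG was stored.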