RDKit - Export pandas data frame with mol image - pandas

I would like to know whether it is possible to export a pandas DataFrame with molecule images directly to an Excel file.
Thanks in advance,

RDKit's PandasTools module has a function for this, SaveXlsxFromFrame.
http://www.rdkit.org/Python_Docs/rdkit.Chem.PandasTools-module.html#SaveXlsxFromFrame
XlsxWriter must be installed.
import pandas as pd
from rdkit import Chem
from rdkit.Chem import PandasTools
smiles = ['c1ccccc1', 'c1ccccc1O', 'c1cc(O)ccc1O']
df = pd.DataFrame({'ID':['Benzene', 'Phenol', 'Hydroquinone'], 'SMILES':smiles})
df['Mol Image'] = [Chem.MolFromSmiles(s) for s in df['SMILES']]
PandasTools.SaveXlsxFromFrame(df, 'test.xlsx', molCol='Mol Image')
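To check that the pictures actually ended up in the workbook without opening Excel, you can use the fact that an .xlsx file is a ZIP package whose embedded images are stored under xl/media/ (the layout XlsxWriter produces). A minimal sketch under that assumption; count_embedded_images is a hypothetical helper name:

```python
import zipfile


def count_embedded_images(xlsx_path):
    """Count image files stored under xl/media/ in an .xlsx (ZIP) package.

    Assumes the standard OOXML layout where pictures live in xl/media/.
    Accepts a path or a binary file-like object.
    """
    with zipfile.ZipFile(xlsx_path) as zf:
        return sum(1 for name in zf.namelist() if name.startswith("xl/media/"))
```

For the three-molecule frame above, count_embedded_images('test.xlsx') should report one image per row.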

Related

How to create multiple pandas profiling reports for multiple csv files in a directory? The report name should match the file name

I tried this:
import glob
import pandas as pd
from pandas_profiling import ProfileReport
files = glob.glob(r"D:\home_health_services_current_data\*.csv")
frames = []
for f in files:
    frames.append(pd.read_csv(f))
# All CSVs are concatenated, so this builds a single combined report
df = pd.concat(frames)
profile = ProfileReport(df, title="Profiling Report", explorative=True)
profile.to_file(r"D:\proj_report\profilerep\prof_report.html")
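To get one report per file, named after that file, the CSVs can be profiled one at a time instead of being concatenated. A minimal sketch, assuming the profiling package is importable either under its newer name ydata_profiling or as pandas_profiling; report_path_for and profile_each_csv are hypothetical helper names:

```python
from pathlib import Path

import pandas as pd


def report_path_for(csv_path, out_dir):
    """Map <dir>/name.csv to <out_dir>/name.html so the report matches the file name."""
    return Path(out_dir) / (Path(csv_path).stem + ".html")


def profile_each_csv(in_dir, out_dir):
    """Write one HTML profiling report per CSV file found in in_dir."""
    # Imported lazily so report_path_for works even without the profiling package.
    try:
        from ydata_profiling import ProfileReport  # newer package name
    except ImportError:
        from pandas_profiling import ProfileReport  # older package name
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for csv_path in sorted(Path(in_dir).glob("*.csv")):
        df = pd.read_csv(csv_path)
        profile = ProfileReport(df, title=f"Profiling Report: {csv_path.stem}", explorative=True)
        profile.to_file(report_path_for(csv_path, out_dir))
```

Called as profile_each_csv(r"D:\home_health_services_current_data", r"D:\proj_report\profilerep"), each input file gets its own .html report with the same base name.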

Adding a frame border to a dataframe in Excel

It seems like a simple task, but I've been having a hard time adding a border frame to my table written to Excel (using the xlsxwriter engine). The only way I could do it is by getting the size of my df and the starting row/column, then looping over each cell and formatting it, which is redundant. Is there a solution I'm not seeing? I tried the styleframe module, in vain.
Reproducible example:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randint(0, 100, size=(100, 2)), columns=list('AB'))
styled = df.style.set_properties(**{'text-align': 'center'})
with pd.ExcelWriter("Test.xlsx", engine='xlsxwriter') as writer:
    styled.to_excel(writer, sheet_name='Random', index=False)
    # Get the xlsxwriter workbook and worksheet objects from the writer
    workbook = writer.book
    worksheet = writer.sheets['Random']
    format_x = workbook.add_format({'border': 2})
    worksheet.set_column('A:B', 20, format_x)
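One way to avoid formatting each cell yourself is XlsxWriter's conditional formatting: a rule of type 'no_errors' matches every non-error cell in a range, so one rule per edge can draw the frame. A sketch, assuming the table starts at A1 with a header row and no index (index=False); frame_ranges and to_excel_with_frame are hypothetical helper names, and the thin border style 1 is used because Excel only offers thin border styles in conditional formats:

```python
import pandas as pd


def frame_ranges(n_rows, n_cols, first_row=0, first_col=0, style=1):
    """Return (first_row, first_col, last_row, last_col, border) for each edge
    of a table covering n_rows x n_cols cells (0-based, inclusive)."""
    last_row = first_row + n_rows - 1
    last_col = first_col + n_cols - 1
    return [
        (first_row, first_col, first_row, last_col, {"top": style}),   # top edge
        (last_row, first_col, last_row, last_col, {"bottom": style}),  # bottom edge
        (first_row, first_col, last_row, first_col, {"left": style}),  # left edge
        (first_row, last_col, last_row, last_col, {"right": style}),   # right edge
    ]


def to_excel_with_frame(df, path, sheet_name="Sheet1"):
    """Write df without its index and draw a border around the whole table."""
    with pd.ExcelWriter(path, engine="xlsxwriter") as writer:
        df.to_excel(writer, sheet_name=sheet_name, index=False)
        workbook = writer.book
        worksheet = writer.sheets[sheet_name]
        # Header row + data rows, one column per DataFrame column.
        for r1, c1, r2, c2, border in frame_ranges(len(df) + 1, len(df.columns)):
            fmt = workbook.add_format(border)
            # One rule per edge instead of one format call per cell.
            worksheet.conditional_format(r1, c1, r2, c2, {"type": "no_errors", "format": fmt})
```

For the example above this would be to_excel_with_frame(df, "Test.xlsx", sheet_name="Random"). Corner cells are covered by two rules (e.g. top and left); Excel should apply both since the formats do not conflict.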

Reading a CSV from a buffer gives EmptyDataError?

I need to read a string with CSV content using pandas, but pandas raises an error and I don't know why. Can anyone help me?
import pandas as pd
import io
s = ',测试项,信息,结果\r\n0,软件测试机型805,软件测试机型805,PASS\r\n1,软件当前版本1,软件当前版本1,FAIL\r\n2,软件测试机型805,软件测试机型805,PASS\r\n3,软件当前版本1,软件当前版本1,FAIL\r\n4,软件测试机型805,软件测试机型805,PASS\r\n5,软件当前版本1,软件当前版本1,FAIL\r\n'
buf = io.StringIO()
buf.write(s)
df = pd.read_csv(buf)
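I get this error: EmptyDataError: No columns to parse from file
Here you go, buddy. buf.write() leaves the stream position at the end of the buffer, so read_csv finds nothing left to read. Rewind with buf.seek(0) before reading: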
import pandas as pd
import io
s = ',测试项,信息,结果\r\n0,软件测试机型805,软件测试机型805,PASS\r\n1,软件当前版本1,软件当前版本1,FAIL\r\n2,软件测试机型805,软件测试机型805,PASS\r\n3,软件当前版本1,软件当前版本1,FAIL\r\n4,软件测试机型805,软件测试机型805,PASS\r\n5,软件当前版本1,软件当前版本1,FAIL\r\n'
buf = io.StringIO()
buf.write(s)
buf.seek(0)
df = pd.read_csv(buf)
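Alternatively, pass the string straight to the io.StringIO constructor: a buffer created with an initial value starts at position 0, so no seek() is needed. A sketch using the header and first two data rows of the string above:

```python
import io

import pandas as pd

s = ',测试项,信息,结果\r\n0,软件测试机型805,软件测试机型805,PASS\r\n1,软件当前版本1,软件当前版本1,FAIL\r\n'

# StringIO(s) is positioned at the start of the text, so read_csv sees all of it.
# index_col=0 turns the unnamed first column into the index.
df = pd.read_csv(io.StringIO(s), index_col=0)
print(df)
```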

Blank Bokeh plot when reading data from dataframe and using time on x-axis

I am unable to display a plot using Bokeh. I am reading the data from a DataFrame. Here's a snippet of my Python code.
I am new to Bokeh. I tried following some of the examples from the User Guide, but I can't figure out what's going wrong here. Please advise.
import datetime
import pandas as pd
from bokeh.plotting import figure, show, output_file, output_notebook
from bokeh.models import ColumnDataSource
PATH_TO_CSV = "Sample_Data.csv"
output_notebook()
data = pd.read_csv(PATH_TO_CSV, index_col=False)
data['timestamp'] = pd.to_datetime(data['timestamp']).dt.strftime("%H:%M:%S")
source = ColumnDataSource(data)
p = figure(plot_width=400, plot_height=400, x_axis_type="datetime")
p.line('timestamp', 'event_msg', source=source)
show(p)
Here's a sample .csv:
event_msg,timestamp
Created,2019-03-02 13:19:44.164562-0700
Created,2019-03-02 13:20:32.212323-0700
Created,2019-03-02 13:20:56.582761-0700
Modified,2019-03-02 13:21:48.021752-0700
Deleted,2019-03-02 13:22:16.938382-0700
Modified,2019-03-02 13:22:22.139714-0700
Permission changed,2019-03-02 13:24:20.195975-0700
Deleted,2019-03-02 13:33:53.049900-0700
Modified,2019-03-02 13:33:56.266113-0700
Deleted,2019-03-02 13:33:59.757584-0700
I am seeing a completely blank plot. Ideally, I would like to plot separate lines for the different event messages.
The .dt.strftime("%H:%M:%S") call turns your timestamps back into plain strings, which a datetime x-axis cannot place. Convert the column to datetimes and leave it that way:
data['timestamp'] = pd.to_datetime(data['timestamp'])
Because event_msg holds strings, the y-axis also needs a categorical range; passing the unique messages as y_range provides one. So your code should look like this (tested with Bokeh v1.1.0):
import pandas as pd
from bokeh.plotting import figure, show, output_notebook
from bokeh.models import ColumnDataSource
PATH_TO_CSV = "Sample_Data.csv"
output_notebook()
data = pd.read_csv(PATH_TO_CSV, index_col=False)
# Keep the timestamps as datetimes so the datetime x-axis can position them
data['timestamp'] = pd.to_datetime(data['timestamp'])
source = ColumnDataSource(data)
# Categorical y-axis: one level per distinct event message
p = figure(plot_width=400, plot_height=400, x_axis_type="datetime", y_range=data['event_msg'].unique())
p.line('timestamp', 'event_msg', source=source)
show(p)
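To see why the strftime version breaks the datetime axis, compare the column after each conversion. A small pandas-only check, using two rows copied from the sample CSV:

```python
import io

import pandas as pd

# Two rows copied from the sample CSV in the question.
csv_text = (
    "event_msg,timestamp\n"
    "Created,2019-03-02 13:19:44.164562-0700\n"
    "Modified,2019-03-02 13:21:48.021752-0700\n"
)

# Question's version: parse, then format back to "HH:MM:SS" text.
as_strings = pd.read_csv(io.StringIO(csv_text))
as_strings["timestamp"] = pd.to_datetime(as_strings["timestamp"]).dt.strftime("%H:%M:%S")

# Answer's version: parse and keep real datetime values (with the -07:00 offset).
as_datetimes = pd.read_csv(io.StringIO(csv_text))
as_datetimes["timestamp"] = pd.to_datetime(as_datetimes["timestamp"])

print(as_strings["timestamp"].tolist())  # strings, not datetimes
print(as_datetimes["timestamp"].dtype)   # a timezone-aware datetime dtype
```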

Wordcloud using matplotlib is not showing

My code is below:
# Word cloud of the TF-IDF words
import matplotlib.pyplot as plt
from wordcloud import WordCloud
import pandas as pd
tf = pd.DataFrame(columns=['word'])
tf['word'] = ['federalist', 'wrexham', 'remy', 'islamic', 'hegseth', 'hereford', 'gbt', 'sharenet', 'cpr', 'vegas', 'krvn', 'bandidos', 'nginx', 'manafort' , 'roth', 'kennedy' ,'pence', 'quantum']
wordcloud10 = WordCloud().generate(' '.join(tf['word']))
plt.imshow(wordcloud10)
plt.axis("off")
plt.title("")
plt.show()
display()
In Databricks, you have to call display() to see a chart. Despite the code above, I don't see a word cloud. The output I see is:
Out[1106]: <matplotlib.image.AxesImage at 0x7fae917806d0>
Out[1106]: (-0.5, 399.5, 199.5, -0.5)
Out[1106]: <matplotlib.text.Text at 0x7faeb31a6110>
Thanks for your help.
Screenshot from Databricks - No Chart
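As far as I know, Databricks' display() renders the object you pass to it, so calling it with no arguments gives it nothing to draw. A sketch that builds an explicit matplotlib Figure, without relying on pyplot's global state, that can be handed to display(); image_to_figure is a hypothetical helper, and how display() renders figures may vary by Databricks runtime:

```python
from matplotlib.figure import Figure


def image_to_figure(image, title=""):
    """Wrap an image array (e.g. WordCloud.to_array()) in a standalone Figure."""
    fig = Figure(figsize=(8, 4))
    ax = fig.add_subplot(1, 1, 1)
    ax.imshow(image, interpolation="bilinear")
    ax.axis("off")  # hide ticks and frame, as in the question
    if title:
        ax.set_title(title)
    return fig
```

In the notebook this would be used as wc = WordCloud().generate(' '.join(tf['word'])) followed by display(image_to_figure(wc.to_array())).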