Bokeh Server: import .csv file with FileInput widget and pass it to ColumnDataSource - dataframe

I have a csv file with my data to plot (x, y, and other fields) and want to import it using the new FileInput widget. I don't have sufficient knowledge to manipulate the "base64" strings coming from FileInput to pass it to a ColumnDataSource of dataframe.
from bokeh.io import curdoc
from bokeh.models.widgets import FileInput
def update_cds(attr, old, new):
#A code here to extract column names and data
#from file_input and pass it to a ColumnDataSource or a DataFrame
file_input = FileInput(accept=".csv")
file_input.on_change('value', update_cds)
doc=curdoc()
doc.add_root(file_input)
Thanks for your help!

Here is a working solution: the code will upload the csv file on the server side in a 'data' folder (to be created before). Then it is easy to open the csv and pass it to a ColumnDataSource for instance.
#widget
file_input = FileInput(accept=".csv")
def upload_csv_to_server(attr, old, new):
#decode base64 format (from python base24 website)
base64_message = file_input.value
base64_bytes = base64_message.encode('ascii')
message_bytes = base64.b64decode(base64_bytes)
message = message_bytes.decode('ascii')
#convert string to csv and save it on the server side
message_list = message.splitlines()
with open('data/' + file_input.filename, 'w', newline='') as file:
writer = csv.writer(file)
for i in range(len(message_list)):
writer.writerow(message_list[i].split(','))
file_input.on_change('value', upload_csv_to_server)
If you see a nicer way please let me know. It is easy with the csv structure to do that way but what about any other file format?

Python has a built in base64 standard library module:
import base64
data = base64.b64decode(encoded)

Related

Python Dill or Pickle gives error when use in a new file

I need help with my code. I have built a recommendation system using cosine similarity on a colab and used pickle to serialized it. when I deserialized it inside a colab file, it works perfectly fine but when I deserialize it in a new colab file. it gives me an error
name 'data' is not defined
data is a variable that is initialized with my dataset which is outside of the class InstaPost.
import pandas as pd
import numpy as np
from sklearn.feature_extraction import text
from sklearn.metrics.pairwise import cosine_similarity
import dill as pickle
data = pd.read_csv("/content/instaData.txt")
data
data = data[["Caption", "Hashtags"]]
captions = data["Caption"].tolist()
uni_tfidf = text.TfidfVectorizer(input=captions, stop_words="english")
uni_matrix = uni_tfidf.fit_transform(captions)
uni_sim = cosine_similarity(uni_matrix)
def recommend_post(x):
return ", ".join(data["Caption"].loc[x.argsort()[-7:-1]])
data["Recommended Post"] = [recommend_post(x) for x in uni_sim]
class InstaPost:
def Post(number):
count = 0
wordy = (data["Recommended Post"][number])
sentence = wordy.split(',')
for i in sentence:
count=count+1
print(count," ",i)
obj = InstaPost
obj.Post(1)
pickle_out = open("modelREC", "wb")
pickle.dump(obj, pickle_out)
pickle_out.close()
pickle_in = open("modelREC", "rb")
exe = pickle.load(pickle_in)
print(exe.Post(10))
NOTE: on a different file
print(exe.Post)
works
and give output
<function InstaPost.Post at 0x7efc0b4c3f70>
if I need to give the reference of the data than please guide me how should I do it. It will be a great help to me
Thanks in advance

Importing a gziped csv into pandas

I have an url: https://api.kite.trade/instruments
And this is my code to fetched data from url and write into excel
import pandas as pd
url = "https://api.kite.trade/instruments"
df = pd.read_json(url)
df.to_excel("file.xlsx")
print("Program executed successfully")
but, when I run this program I'm getting error like this_
AttributeError: partially initialized module 'pandas' has no attribute 'read_json' (most likely due to a circular import)
It's not a json, it's csv. So you need to use read_csv. Can you please try this?
import pandas as pd
url = "https://api.kite.trade/instruments"
df = pd.read_csv(url)
df.to_excel("file.xlsx",index=False)
print("Program excuted successfully")
I added an example how I converted the text to a csv dump on your local drive.
import requests
url = "https://api.kite.trade/instruments"
filename = 'test.csv'
f = open(filename, 'w')
response = requests.get(url)
f.write(response.text)

Accessing methods within a class from bokeh FileInput widget

I am working on a Bokeh serve UI and am running into trouble interfacing a class (and its methods) with the FileInput widget. I am using a class (in this example, called "EIS_data") which, when instantiated, loads a file using pd.read_csv. The EIS_data class also has a method to plot the data in a particular way, and I'd like to be able to load the pandas dataframe and call and manipulate the data using the methods already in place in the class.
So far, I have been able to load the data successfully using the FileInput widget, but I can't figure out how to access the dataframe again once it's loaded in. In a standalone Jupyter notebook, I could run d = EIS_data("filename") and then ```d.plot''' to load the data into a pandas dataframe and then plot it according to the method defined in the EIS_data class, but I can't figure out how to replicate this in the UI code once the data are loaded using the FileInput widget.
Is there a way I can interface this with Bokeh widgets, such that I could simply add d.plot() to curdoc()? I have found a workaround using ColumnDataSource, but it seems a shame to redefine plotting methods and data handling when they are already defined in the class. Below are minimal working examples of the UI code and the class definition.
UI Code:
import numpy as np
import pandas as pd
from eis_analysis_trimmed import EIS_data
import bokeh
from bokeh.io import curdoc
from bokeh import layouts
from bokeh.layouts import column,row,gridplot
from bokeh.plotting import figure
from bokeh.models import *
import base64
import io
## Instantiate the EIS_data class for loading data
def load_data(f):
return EIS_data(f)
## updater function called to load data with FileInput widget
## Must be decoded using base64
def load_file(attr, old, new):
decoded = base64.b64decode(new)
d = io.BytesIO(decoded)
dat = load_data(d)
print(dat.df)
print(dat)
print("EIS Data Uploaded Successfully")
return dat
f_load = Paragraph(text="""Load Data""",height=15)
f = FileInput()
f.on_change('value',load_file)
curdoc().add_root(column(f))
and here is the EIS_data class:
import numpy as np
from scipy.optimize import curve_fit
from sklearn.metrics import r2_score
from bokeh.plotting import figure, show
from bokeh.models import LinearAxis, Range1d
from bokeh.resources import INLINE
import bokeh.io
#locally include javascript dependencies in html
bokeh.io.output_notebook(INLINE)
class EIS_data:
def __init__(self, file_name, delimiter='\t',
header=0, f_low=None, f_high=None):
#load eis data into a pandas dataframe
eis_data = pd.read_csv(file_name, delimiter=delimiter, header=header)
#iterate through all of the columns and check to see
#if all of the values in that column are null
#if they are, then remove that column
for c in eis_data.columns:
if eis_data[c].isnull().all():
eis_data = eis_data.drop([c], axis=1)
#make sure that the data are imported as floats and not strings
eis_data = eis_data[['freq/Hz', 'Re(Z)/Ohm', '-Im(Z)/Ohm']]
eis_data['freq/Hz'] = pd.to_numeric(eis_data['freq/Hz'])
eis_data['Re(Z)/Ohm'] = pd.to_numeric(eis_data['Re(Z)/Ohm'])
eis_data['-Im(Z)/Ohm'] = pd.to_numeric(eis_data['-Im(Z)/Ohm'])
self.df = eis_data.sort_values(by='freq/Hz')
def plot(self, fit_vals = None):
plot = figure(title="Nyquist Plot",
x_axis_label='Re(Z) Ohm',
y_axis_label='-Im(Z) Ohm',
plot_width=600,
plot_height=600)
plot.circle(self.df['Re(Z)/Ohm'], self.df['-Im(Z)/Ohm'],
size=7, color='navy', name='Data')
return plot
EDIT: Adding the workaround using ColumnDataSource
from bokeh.layouts import column
from bokeh.plotting import figure
from bokeh.models import *
from bokeh.models.widgets import FileInput
import base64
import io
from eis_analysis2 import EIS_data
# Instantiate the EIS_data class for loading data
def load_data(data):
return EIS_data(data)
# updater function called to load data with FileInput widget
# Must be decoded using base64
def load_file(attr, old, new):
decoded = base64.b64decode(new)
d = io.BytesIO(decoded)
dat = load_data(d)
dat_df = dat.df
# Replace plot data with data from newly-loaded file
source.data = dict(freq=dat_df[dat_df.columns[0]], reZ=dat_df[dat_df.columns[1]], imZ=dat_df[dat_df.columns[2]])
#phase,mag = bode_calc(reZ,imZ)
print(dat_df)
print("EIS Data Uploaded Successfully")
# Create Column Data Source that will be used by the plot
source = ColumnDataSource(data=dict(freq=[], reZ=[], imZ=[]))
##Make the nyquist plot
nyq_plot = figure(title="Nyquist Plot",
x_axis_label='Re(Z) Ohm',
y_axis_label='-Im(Z) Ohm',
plot_width=600,
plot_height=600)
nyq_plot.circle(x="reZ", y="imZ",source=source,size=7, color='navy', name='Data')
f = FileInput()
f.on_change('value', load_file)
layout = column(f, nyq_plot)
curdoc().add_root(layout)

Python- Exporting a Dataframe into a csv

I'm trying to write a dataframe file to a csv using pandas. I'm getting the following error AttributeError: 'list' object has no attribute 'to_csv'. I believe I'm writing the syntax correctly, but could anyone point out where my syntax is incorrect in trying to write a dataframe to a csv?
This is link the link of the file: https://s22.q4cdn.com/351912490/files/doc_financials/quarter_spanish/2018/2018.02.25_Release-4Q18_ingl%C3%A9s.pdf
Thanks for your time!
import tabula
from tabula import read_pdf
import pandas as pd
from pandas import read_json, read_csv
a = read_pdf(r"C:\Users\Emege\Desktop\micro 1 true\earnings_release.pdf",\
multiple_tables= True, pages = 15, output_format = "csv",\
)
print(a)
a.to_csv("a.csv",header = False, index = False, encoding = "utf-8")
enter image description here

Arranging data into lists from url

The following code is written in python 2. How can I write it in python 3? thanks
import urllib2
import sys
#read data from uci data repository
target_url = ("https://archive.ics.uci.edu/ml/machine-learning-databases/undocumented/connectionist-bench/sonar/sonar.all-data")
data = urllib2.urlopen(target_url)
#arrange data into list for labels and list of lists for attributes
xList = []
labels = []
for line in data:
#split on comma
row = line.strip().split(",")
xList.append(row)
You can use the requests library of Python 3.
import requests
data = requests.get("https://archive.ics.uci.edu/ml/machine-learning-databases/undocumented/connectionist-bench/sonar/sonar.all-data").text
for line in data.split('\n'):
row = line.strip().split(",")
xList.append(row)