Edit pandas dataframe in flask html page - pandas

What is the best way to edit a pandas dataframe in flask?
I have a dataframe which I want to output on an HTML page with Flask (there are many examples of how to do this). However, I don't just want to output it; I want to make it editable. Ideally each field should be an HTML input field.
I would like to avoid having to create a form manually and then convert it back to a dataframe. Is there an elegant solution to that? Does pandas or any other package offer functionality that would simplify this task?

You can make use of df.style (a Styler instance) to render a DataFrame as a grid of HTML inputs.
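import numpy as np
import pandas as pd
np.random.seed(0)
df = pd.DataFrame(np.random.randint(0, 100, (3, 3)))
# to_html() replaces the deprecated Styler.render() (pandas >= 1.3)
df.style.format('<input name="df" value="{}" />').to_html()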
This will render as an HTML table in which every cell is an input field.
If you wrap the result in a <form> and submit it to some endpoint, the request query string (or POST body) will look like
df=44&df=47&df=64&df=67&df=67&df=9&df=83&df=21&df=36
Note that these are the cells of the data frame in row-major order. At this point, you can re-create the data frame using
df = pd.DataFrame(np.asarray(request.values.getlist('df'), dtype=int).reshape((3, 3)))
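To see the row-major round trip end to end without a running server, here is a minimal sketch. It assumes the standard library's urllib.parse.parse_qs as a stand-in for Flask's request.values.getlist('df'), and it assumes the 3x3 shape is known in advance; the variable names are illustrative.

```python
from urllib.parse import parse_qs

import numpy as np
import pandas as pd

# The query string from the example above, as a browser would submit it.
query = "df=44&df=47&df=64&df=67&df=67&df=9&df=83&df=21&df=36"

# parse_qs stands in for request.values.getlist('df') (assumption: no Flask here).
raw_values = parse_qs(query)["df"]  # list of strings, in row-major order

# Form values arrive as strings, so convert them before reshaping.
cells = np.asarray([int(v) for v in raw_values])
restored = pd.DataFrame(cells.reshape((3, 3)))  # the shape must be known in advance

print(restored)
```

Because the shape is not part of the submitted data, the server has to know it some other way, for example from the original frame or a hidden form field.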
As you suggested in your comment, another approach is to name HTML inputs with the column name, to avoid having to reshape the data.
def html_input(c):
    return '<input name="{}" value="{{}}" />'.format(c)

df.style.format({c: html_input(c) for c in df.columns}).to_html()
The data sent to the server will then look like
0=44&1=47&2=64&0=67&1=67&2=9&0=83&1=21&2=36
and you can restore the data frame (form values arrive as strings, hence the cast) using
df = pd.DataFrame(dict(request.values.lists())).astype(int)
This is more elegant than the above, apart from the need to create the formatter dictionary {c: html_input(c) for c in df.columns}. Unfortunately, the formatter function is only passed the value, and none of the index information.
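The column-named variant can be sketched without a server as well. Here urllib.parse.parse_qs is an assumed stand-in for request.values.lists(); note that both the column labels and the values come back as strings, so the sketch casts the values to int.

```python
from urllib.parse import parse_qs

import pandas as pd

# Submitted data when every input is named after its column.
query = "0=44&1=47&2=64&0=67&1=67&2=9&0=83&1=21&2=36"

# Maps each column name to the list of its values, top to bottom.
columns = parse_qs(query)

# Values are strings after parsing; the column labels stay strings ('0', '1', '2').
restored = pd.DataFrame(columns).astype(int)

print(restored)
```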

Related

Append data to a dataframe using root.after() from tkinter

First I'm going to explain what my script does, and then the problem. I'm trying to automate a task that runs every 5 minutes and involves pandas, Tkinter and Matplotlib. I'll attach a simplified version of my code to help explain. First, I run a big task that initializes some software programs (petroleum ones), opens files and then works with them. Second, I create a Treeview window and a plot window with Tkinter, and I need both to be updated every 5 minutes. The Treeview is updated as expected, but the main problem is that I can't append the data generated in each loop to an initially empty dataframe; I need this new data to update the plot every 5 minutes. I tried append as shown in the code, but it's not working. Thanks in advance.
import pandas as pd
import tkinter as tk

## big task here

# create an empty dataframe
dfoil = pd.DataFrame(columns=['Date', 'Oil Rate Cal', 'Oil Rate Mesu'])
root = tk.Tk()

def update_item(df, df0, df01, df02):
    # df, df0, df01, df02 are dataframes that are updated and working correctly
    # another big task here where I get the desired results and can see a Treeview updating every 5 mins
    # .........
    # .........
    # time2 comes from a working dataframe
    dfoil.append({'Date': time2, 'Oil Rate Cal': dff1.iat[3, 11], 'Oil Rate Mesu': dff1.iat[3, 12]}, ignore_index=True)
    root.after(1000*60*5, update_item, df, df0, df01, df02)

update_item(my_df, raiserdf, separetordf, compressorsdf)
root.mainloop()
dfoil is the dataframe that ends up empty after every loop.
You need to add inside the function:
global dfoil
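This makes assignments to dfoil inside the function rebind the global (module-level) dataframe instead of creating a local variable, so the change 'exists' outside the function.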
Another option is at the end of the function:
return dfoil
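and change the line that calls the function to:
dfoil = update_item(my_df,raiserdf,separetordf,compressorsdf)
Note that this only captures the result of the first, direct call; the return values of calls scheduled through root.after() are discarded.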
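Also, append does not operate 'inplace', so you need to assign the result back:
dfoil = dfoil.append({'Date':time2, 'Oil Rate Cal':dff1.iat[3,11],'Oil Rate Mesu':dff1.iat[3,12]},ignore_index=True)
(DataFrame.append was removed in pandas 2.0; on newer versions use pd.concat instead.)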
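Putting the two points together, here is a minimal sketch that uses pd.concat in place of append and leaves out the Tkinter scheduling. The helper record_sample, the COLUMNS constant and the sample values are hypothetical and only serve the illustration.

```python
import pandas as pd

COLUMNS = ["Date", "Oil Rate Cal", "Oil Rate Mesu"]
dfoil = pd.DataFrame(columns=COLUMNS)  # starts out empty


def record_sample(date, rate_cal, rate_mesu):
    """Add one row to the module-level dfoil (hypothetical helper)."""
    global dfoil  # rebind the module-level name instead of creating a local one
    new_row = pd.DataFrame([[date, rate_cal, rate_mesu]], columns=COLUMNS)
    # pd.concat returns a new frame, so the result has to be assigned back.
    if dfoil.empty:
        dfoil = new_row
    else:
        dfoil = pd.concat([dfoil, new_row], ignore_index=True)


# Made-up readings standing in for two consecutive 5-minute updates.
record_sample("2021-01-01 00:00", 10.5, 10.1)
record_sample("2021-01-01 00:05", 11.0, 10.8)

print(dfoil)
```

In the original script the body of record_sample would sit inside update_item, just before the root.after(...) call.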

Dataframe column not showing in one line

I have a dataframe imported from a CSV file, and I have set the option ('display.max_colwidth', None), but the values of the column named 'Week' (which are strings) still wrap onto two lines. What should I write so that each value is displayed on one line?
One observation: if I display fewer columns, the values do fit on one line (in the picture above I had set the option to display all columns, although they are not visible in the truncated screenshot). This can be seen in the picture below.
pd.set_option sets a single option at a time. If you want to set multiple options, you have to call it multiple times.
In your case that would be:
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', None)
Alternatively you can set all these options using a for loop:
options = {'display.max_rows': None,
           'display.max_columns': None,
           'display.max_colwidth': None}
for option, value in options.items():
    pd.set_option(option, value)
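If the options are only needed while displaying one frame, pd.option_context accepts several option/value pairs and restores the previous settings on exit. A small sketch with a made-up frame (the column contents are illustrative only):

```python
import pandas as pd

# Made-up data with a long string in the 'Week' column.
df = pd.DataFrame({"Week": ["2021-01-04 to 2021-01-10"], "Sales": [123]})

default_colwidth = pd.get_option("display.max_colwidth")

# The options apply only inside the with block.
with pd.option_context(
    "display.max_rows", None,
    "display.max_columns", None,
    "display.max_colwidth", None,
):
    colwidth_inside = pd.get_option("display.max_colwidth")
    text = repr(df)

# Outside the block the earlier setting is back in effect.
colwidth_after = pd.get_option("display.max_colwidth")

print(text)
```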
What worked for me was this answer. It modifies the CSS through the dataframe's style property.
I'm putting the code here for quick reference, in case the link breaks in the future.
# @RomuloPBenedetti's answer, check out the link
df.style.set_table_styles([{'selector': 'td', 'props': 'white-space: nowrap !important;'}])

How to export Pandas styled dataframe as an image to Databricks dbfs?

Context: I am writing a bot on Databricks using Python that will send an image of a pandas dataframe table to a Slack channel. The table was formatted using .style to make it quicker for people to spot the most important numbers.
I have two problems:
how can I save as an image a pandas dataframe that went through the .style method?
how can I open that image in another Databricks notebook?
Step 1 - OK: generating a sample dataframe.
import pandas as pd
my_df = pd.DataFrame({'fruits':['apple','banana'], 'count': [1,2]})
Step 2 - OK: then, I save a new variable in the following way to add to the table several formatting modifications that I need:
my_df_styled = (my_df.style
    .set_properties(**{'text-align': 'center', 'padding': '15px'})
    .hide_index()  # removed in pandas 2.0; use .hide(axis='index') on pandas >= 1.4
    .set_caption('My Table')
    .set_table_styles([{'selector': 'caption',
                        'props': [('text-align', 'bottom'),
                                  ('padding', '10px')
                                  ]}])
)
Step 3 - Problem: trying to save the new variable as an image. Here I have not been able to get it working. I tried to follow what was mentioned here, but that approach uses matplotlib to save the table, which I don't want to do because I don't want to lose the formatting of my table.
my_df_styled.savefig('/dbfs/path/figure.png')
But I get the following error:
AttributeError: 'Styler' object has no attribute 'savefig'
Step 4 - Problem: opening the image in a different notebook. Not sure how to do this. I tried the following using another image:
opening_image = open('/dbfs/path/otherimage.png')
opening_image
But instead of getting the image, I get:
Out[#]: <_io.TextIOWrapper name='/dbfs/path/otherimage.png' mode='r' encoding='UTF-8'>
For the first question: savefig() is a Matplotlib method, so it will certainly not work if you call it on a pandas object, as in df.savefig().
Instead, you should use a wrapper (matplotlib, or another library from the link below) that takes the dataframe and converts it into an image with the desired style.
https://stackoverflow.com/a/69250659/4407905
For the second question: I have not tried Databricks myself, but I would guess it is better to export the data to a text-based format with a method such as to_csv(), to_excel() or to_json().
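As for the output shown in Step 4: it is the file object that open() returns in text mode, not the image itself. A minimal sketch of reading the raw bytes in binary mode instead; it uses a temporary placeholder file rather than a real path on DBFS, and the file content is made up and is not a valid image.

```python
import os
import tempfile

# First eight bytes of every PNG file.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

# Placeholder file standing in for '/dbfs/path/otherimage.png' (assumption).
path = os.path.join(tempfile.mkdtemp(), "otherimage.png")
with open(path, "wb") as f:
    f.write(PNG_SIGNATURE + b"placeholder, not real image data")

# Binary mode ('rb') returns bytes; text mode would try to decode them as UTF-8.
with open(path, "rb") as f:
    data = f.read()

print(type(data), len(data))
```

In a notebook, a real PNG could then be rendered with something like IPython.display.Image(filename=path); this has not been tested on Databricks.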

Jupyterlab Table dynamic output (sorting, filtering, ...)

Good evening everyone,
is there a way to dynamically display the output of a table (pandas dataframe) so that you can sort by a column in the output or filter a column?
I would have thought that this should be included in Jupyter by default, but I can't find a setting.
Maybe I just can't find such a setting, so I'm curious about your answers. :-)
There is an extension called qgrid.
Qgrid is a Jupyter notebook widget which uses SlickGrid to render pandas DataFrames within a Jupyter notebook. This allows you to explore your DataFrames with intuitive scrolling, sorting, and filtering controls, as well as edit your DataFrames by double clicking cells.
You can display your dataframe with the widget and sort, filter or even edit your data interactively.
import qgrid
qgrid_widget = qgrid.show_grid(dataframe, show_toolbar=True)
qgrid_widget
As of March 2021, this works with both JupyterLab 3.0.10 and Jupyter Notebook 6.2.0.

How do I pre-select rows in a DataTable based on the value in a column?

Situation:
I have a pandas dataframe which I convert into an html table via df.to_html(). I then add the DataTables class to the table. This DataTables-table has the following columns:
ID | X | Y | Val |...More columns...| Selection_Criteria |...More columns...
The values in Selection_Criteria can be either 1 or 0. I know that with:
$('#ProductList').DataTable( {
...
"fnInitComplete": function(oSettings, json) { $('#ProductList tbody tr:eq(0)').click(); }
});
(Source: http://code.datatables.net/forums/discussion/38171/automatic-select-of-the-first-row-on-reload)
..it is theoretically possible to select the first row. (In reality, I have not been able to simulate a click for the first row.)
But my question goes more towards: How do I automatically pre-select ALL rows where the value is 1 in Selection_Criteria? What is the best approach? Should this be done client/server side?
In pandas, the term "selecting" means filtering out whatever was not selected. I know that in a table on a web page, "selected" can mean highlighted to stand out from the other rows. There are a couple of ways you can do this on the server side. You could display two tables, one for each value of Selection_Criteria. This would save you the hassle of trying to select individual rows out of a table in the first place (which would be done with JavaScript, not pandas). While pandas has the ability to add a class to the resulting HTML, the class is applied to the <table> element, not to individual rows.
If you are using jQuery, you are going to use the pieces below. As you haven't posted example data, I can't be exact.
Replace x in the next line with the 1-based position of the Selection_Criteria column in the table:
$( "tr td:nth-child(x):contains('1')" ).addClass('selected');
There are solutions on the backend using BeautifulSoup and CSS selectors, or lxml.etree with XPath selectors. But jQuery is going to be the most concise for this problem.
@Aliester: Thank you for the pointer!
This helped me find the solution to my own question. What I did:
1.) Identify row index that I want to select when the table loads.
2.) Pass the index to js.
3.) Loop over the indices and apply the following command to each index entry:
table.row(':eq('+hit_index_row+')').select();
So I am using the API to select each individual row. This works for me and hopefully could be helpful to others as well. It may be a bit hacky, so more elegant suggestions are welcome!
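Steps 1 and 2 can be sketched in pandas as follows. The frame and its values are made up, and the sketch assumes the table rows are rendered in the same order as the dataframe, so that zero-based row positions match the ':eq(n)' selectors.

```python
import json

import numpy as np
import pandas as pd

# Made-up data; only the Selection_Criteria column matters here.
df = pd.DataFrame({
    "ID": [101, 102, 103, 104],
    "Val": [0.5, 1.2, 3.4, 0.1],
    "Selection_Criteria": [1, 0, 1, 0],
})

# Step 1: zero-based row positions where Selection_Criteria equals 1.
hit_rows = np.flatnonzero(df["Selection_Criteria"].to_numpy() == 1).tolist()

# Step 2: serialize the positions so they can be handed to the page's JavaScript.
hit_rows_json = json.dumps(hit_rows)

print(hit_rows_json)
```

The JSON string can then be embedded in the page (for example through the template) and looped over on the client side, as in step 3.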
You can do this by providing a function for the "rowCallback" option when initializing the DataTable. https://datatables.net/reference/option/rowCallback
Also it is generally better to use the API methods to select rows instead of just changing the class. I found that the DataTable + Select libraries keep an internal collection of selected row indexes (just current page if serverside processing is on) instead of using the class to resolve selected items.
So if you just change the class, the display will look right, but any API method you later use to get the selected items will not see those rows. Additionally, changing only the class on the row will not fire any of the "select" events on the table, so you can't rely on those either.