How to turn a column in pandas dataframe into headers - dataframe

I have a pandas dataframe like this:
[image: original dataframe]
I would like to turn it into something like:
[image: desired dataframe]
In general, I want to turn the "Location" column into headers.
I tried pd.pivot, but it doesn't give what I want:
[image: result of pd.pivot]
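Since the screenshots are not available, here is only a minimal sketch of the usual approach, assuming a long table with one identifier column and one value column. The "Date" and "Value" column names and all the data are hypothetical; only "Location" comes from the question.

```python
import pandas as pd

# Hypothetical data: only the "Location" column name comes from the question.
df = pd.DataFrame({
    "Date": ["2021-01", "2021-01", "2021-02", "2021-02"],
    "Location": ["A", "B", "A", "B"],
    "Value": [1, 2, 3, 4],
})

# Each distinct Location becomes a column header; Date identifies the rows.
wide = df.pivot(index="Date", columns="Location", values="Value")

# Remove the leftover "Location" label on the column axis,
# then turn Date back into an ordinary column.
wide.columns.name = None
wide = wide.reset_index()
print(wide)
```

If the same Date/Location pair can occur more than once, pivot raises a ValueError; pivot_table with an aggregation function is the usual alternative in that case.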

Related

I want to change the shape of the data frame

I have this type of data.
pd.DataFrame({'ID':[1684318],
'1':[5],
'2':[6],
'3':[7],
'4':[8],
'5':[9]})
I'd like to change this data to the following form. What can I do?
pd.DataFrame({'ID':[1684318,1684318,1684318,1684318,1684318],
'1':[5,6,7,8,9]})
I want to know how to change the format of the data.
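One way to get from the first form to the second is melt, sketched here on the exact frame from the question. The intermediate column names "variable" and "value" are the pandas defaults; renaming the result column to '1' only mirrors the desired output shown above.

```python
import pandas as pd

df = pd.DataFrame({'ID': [1684318],
                   '1': [5], '2': [6], '3': [7], '4': [8], '5': [9]})

# melt keeps ID as the identifier and stacks the other columns into rows.
# The old column names land in "variable", the cell values in "value".
long_df = df.melt(id_vars='ID')

# Drop the old column names and call the value column '1', as in the target.
long_df = long_df.drop(columns='variable').rename(columns={'value': '1'})
print(long_df)
```

The rename happens after melt because melt does not accept a value_name that matches an existing column label such as '1'.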

Why do I get this error with ggbetweenstats?

I am trying to label outliers in a boxplot via ggbetweenstats, but I get this error:
[image: error message]
My data is from here: https://travelsmartcampaign.org/wp-content/uploads/Travel-Smart-Ranking-2022.xlsx-Travel-Smart-Ranking-2022-8May.pdf. (I have added "_" to the variables Total Score and Company Name)
My code is:
[image: code]

How to export Pandas styled dataframe as an image to Databricks dbfs?

Context: I am writing a bot on Databricks using Python that will send an image of a pandas dataframe table to a Slack channel. The table is formatted using .style so that people can spot the most important numbers quickly.
I have two problems:
1. How can I save a pandas dataframe that went through the .style method as an image?
2. How can I open that image in another Databricks notebook?
Step 1 - OK: generating a sample dataframe.
import pandas as pd
my_df = pd.DataFrame({'fruits':['apple','banana'], 'count': [1,2]})
Step 2 - OK: then, I save a new variable in the following way to add to the table several formatting modifications that I need:
my_df_styled = (my_df.style
    .set_properties(**{'text-align': 'center', 'padding': '15px'})
    .hide_index()  # deprecated since pandas 1.4; newer versions use .hide(axis='index')
    .set_caption('My Table')
    .set_table_styles([{'selector': 'caption',
                        'props': [('text-align', 'bottom'),
                                  ('padding', '10px')]}])
)
Step 3 - Problem: trying to save the new variable as an image. This is where I get stuck. I tried to follow what was mentioned here, but they use matplotlib to save the table, which I don't want to do because I don't want to lose the formatting on my table.
my_df_styled.savefig('/dbfs/path/figure.png')
But I get the following error:
AttributeError: 'Styler' object has no attribute 'savefig'
Step 4 - Problem: opening the image in a different notebook. Not sure how to do this. I tried the following using another image:
opening_image = open('/dbfs/path/otherimage.png')
opening_image
But instead of getting the image, I get:
Out[#]: <_io.TextIOWrapper name='/dbfs/path/otherimage.png' mode='r' encoding='UTF-8'>
For the first question, savefig() is a Matplotlib method, so it will not work on a Styler (or on a dataframe, as in df.savefig()).
Instead, you should pass the dataframe to a wrapper (matplotlib or one of the other libraries in the link below) that can convert it into an image with the desired style.
https://stackoverflow.com/a/69250659/4407905
For the second question, I have not tried Databricks myself, but my guess is that it would be better to export the data in a text-based format with to_csv(), to_excel(), to_json(), etc.
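On the output seen in Step 4: open() without a mode returns a text-mode file handle, not the image content. If the image itself is needed, it has to be read as bytes. A minimal sketch, assuming the file is a PNG at a path readable from the notebook; read_png_bytes is a hypothetical helper name, not part of any library.

```python
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every PNG file


def read_png_bytes(path):
    """Hypothetical helper: return the raw bytes of a PNG file."""
    data = Path(path).read_bytes()  # binary read, unlike open(path) in text mode
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError(f"{path} does not look like a PNG file")
    return data


# In a notebook, the bytes could then be displayed with IPython, for example:
#   from IPython.display import Image, display
#   display(Image(data=read_png_bytes('/dbfs/path/otherimage.png')))
```

Whether the /dbfs path is visible from the second notebook depends on the Databricks setup, which this sketch does not cover.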

Data in legend doesnt match the displayed data

I have code that plots tabular data. Each time, the code chooses different rows (observations) to plot and displays the data together with a legend showing the names of the observations.
My problem is that even if I change the displayed data using iloc (i.e. change which rows are displayed), I still get the same legend.
For example:
If I use this code, which is supposed to display rows 0-10:
SavitzkyGolay(db_plants.iloc[:10,5:],25,2).T.plot(title='25/06/2019 17:00',figsize=(17,10))
plt.legend(db_plants['plant'])
The result I get is this:
But when I change the iloc:
SavitzkyGolay(db_plants.iloc[12:22,5:],25,2).T.plot(title='25/06/2019 17:00',figsize=(17,10))
plt.legend(db_plants['plant'])
I get the same legend:
*I can't share the original dataframe
*The observations names are different for sure
My end goal: to have the correct observations displayed in the legend
EDIT: I also tried using iloc on the legend labels:
SavitzkyGolay(db_plants.iloc[12:22,5:],25,2).T.plot(title='25/06/2019 17:00',figsize=(17,10))
plt.legend(db_plants['plant'].iloc[12:22,5:])
But I still get an error:
IndexingError: Too many indexers
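A likely explanation for both symptoms, sketched on made-up data: plt.legend(db_plants['plant']) always receives the full column, so the first names are assigned to the plotted lines regardless of which rows were selected, and db_plants['plant'] is a Series with a single axis, so .iloc[12:22,5:] has one indexer too many. The frame below is a hypothetical stand-in for db_plants, and the SavitzkyGolay call is left out because its definition is not shown.

```python
import pandas as pd

# Hypothetical stand-in for db_plants: a 'plant' name column plus measurements.
db_plants = pd.DataFrame({
    "plant": [f"plant_{i}" for i in range(30)],
    "m1": range(30),
    "m2": range(30, 60),
})

rows = slice(12, 22)

# A Series has one axis, so .iloc takes a single indexer (no column part).
labels = db_plants["plant"].iloc[rows].tolist()

# The same slice selects the rows that get plotted, keeping both in step.
subset = db_plants.iloc[rows, 1:]

# With matplotlib imported as plt, the plotting step would then be roughly:
#   subset.T.plot(figsize=(17, 10))
#   plt.legend(labels)
print(labels)
```

Reusing one slice object for the data and the labels avoids the two drifting apart when the row range changes.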

Edit pandas dataframe in flask html page

What is the best way to edit a pandas dataframe in flask?
I have a dataframe which I want to output on a HTML page with flask (there are many examples how to do this). However, I don't just want to output it, but I want to make it editable. Ideally each field should be a html input field.
I would like to avoid having to create a form manually and then convert it back to a dataframe. Is there an elegant solution to that? Does pandas or any other package offer functionality that could simplify this task?
You can make use of df.style (a Styler instance) to render a DataFrame as a grid of HTML inputs.
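import numpy as np
import pandas as pd
np.random.seed(0)
df = pd.DataFrame(np.random.randint(0, 100, (3, 3)))
# Styler.to_html() replaces .render(), which is deprecated since pandas 1.4
df.style.format('<input name="df" value="{}" />').to_html()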
This will render as a table whose cells are input fields.
If you wrap the result in a <form> and submit it to some endpoint, the request query string (or POST body) will look like
df=44&df=47&df=64&df=67&df=67&df=9&df=83&df=21&df=36
Note that these are the cells of the data frame in row-major order. At this point, you can re-create the data frame using
df = pd.DataFrame(np.asarray(request.values.getlist('df'), dtype=int).reshape((3, 3)))
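The reshape step can be tried without a running Flask app. In this sketch, urllib.parse.parse_qs stands in for request.values.getlist (an assumption made only so the example is self-contained), and the query string is the one shown above.

```python
from urllib.parse import parse_qs

import numpy as np
import pandas as pd

# Query string copied from the example above.
query = "df=44&df=47&df=64&df=67&df=67&df=9&df=83&df=21&df=36"

# Form values arrive as strings, so convert them before building the array.
values = [int(v) for v in parse_qs(query)["df"]]

# reshape fills rows first, matching the row-major order of the submitted cells.
df = pd.DataFrame(np.asarray(values).reshape((3, 3)))
print(df)
```

The (3, 3) shape has to be known on the server side, which is the main drawback of sending every cell under the same field name.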
As you suggested in your comment, another approach is to name HTML inputs with the column name, to avoid having to reshape the data.
def html_input(c):
    # One format string per column; {{}} leaves a {} placeholder for the cell value
    return '<input name="{}" value="{{}}" />'.format(c)
df.style.format({c: html_input(c) for c in df.columns}).to_html()
The data sent to the server will then look like
0=44&1=47&2=64&0=67&1=67&2=9&0=83&1=21&2=36
and you can restore the data frame using
df = pd.DataFrame(dict(request.values.lists()))
This is more elegant than the above, apart from the need to create the formatter dictionary {c: html_input(c) for c in df.columns}. Unfortunately, the formatter function is only passed the value, and none of the index information.
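The restore step for the column-named inputs can also be checked outside Flask. This is a sketch under the same assumption as before: urllib.parse.parse_qs stands in for the request object, and the query string is the one shown above. Both the cell values and the column names come back as strings, so they are converted explicitly.

```python
from urllib.parse import parse_qs

import pandas as pd

# Query string copied from the example above.
query = "0=44&1=47&2=64&0=67&1=67&2=9&0=83&1=21&2=36"

# parse_qs groups repeated field names: one list of values per column name.
columns = parse_qs(query)

# Values are strings at this point; convert cells and column labels to int.
df = pd.DataFrame(columns).astype(int)
df.columns = df.columns.astype(int)
print(df)
```

No reshape is needed here because each field name already identifies its column, and the order of values within a field gives the row order.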