Is there a way to update a pandas table with to_html() keeping the links intact? - pandas

Newbie to pandas here.
I am reading an HTML table into a pandas DataFrame using read_html. This table contains a column (column 3) with clickable URLs.
I want to append a new row to this table, but after appending and calling table.to_html, only the newly added row's URL is clickable. All the previous URLs become plain, unclickable text.
table = pd.read_html(page_content)
table = table[0]
I want to append a row with a clickable hyperlink and then write out the HTML.
table.loc[len(table.index)] = [sys.argv[1], report, make_clickable(jiraurl, iss), sys.argv[3]]
newHTML = table.to_html(index=False, render_links=True, escape=False)
But this only makes the newly added link clickable; all the previous links in the table become unclickable after table.to_html.
Am I missing something?
The make_clickable function is as follows:
def make_clickable(url, name):
    return '<a href="{}">{}</a>'.format(url, name)
I hope I made my issue clear.
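The likely cause: read_html parses the table and keeps only each cell's text, so the anchor tags from the original HTML are discarded, and only the row appended as a raw HTML string survives escape=False. A minimal sketch of re-wrapping the whole URL column before writing (the column names and URL here are made up for illustration):

```python
import pandas as pd

def make_clickable(url, name):
    # Wrap the URL in an anchor tag so to_html(escape=False) renders it as a link
    return '<a href="{}">{}</a>'.format(url, name)

# Stand-in for the frame produced by read_html: only the link *text* survives parsing
table = pd.DataFrame({
    "component": ["auth"],
    "report": ["nightly"],
    "issue": ["https://example.com/browse/JIRA-1"],
})

# Re-wrap every existing URL, not just the newly appended row
table["issue"] = table["issue"].apply(lambda u: make_clickable(u, u.rsplit("/", 1)[-1]))

newHTML = table.to_html(index=False, escape=False)
```

With this, every row's anchor markup is present in the frame before to_html runs, so all links render as clickable.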

Related

How to create hyperlinks in VBA for multiple cells?

I have to create hyperlinks for a list of variables to navigate easily to the data in a database. My database contains summaries of many variables coming from different sources. My problem is that I have to skip multiple rows filled with duplicates and blank cells, and the number of rows to skip differs for every variable. I also want my hyperlinks to carry the name of the variable from the database (for example: Customer_Since). Is there a way to create a loop that skips the right number of rows and creates the hyperlink? Example: if I click on the link named Customer_Since, it takes me to the row that contains the summary of the Customer_Since variable. Can somebody help me?
To create a hyperlink you can use this code:
ThisWorkbook.Sheets("SheetName").Hyperlinks.Add Anchor:=Range("A10"), Address:="", SubAddress:="Sheet2!B5", TextToDisplay:="Link"
where:
Anchor - the cell where the hyperlink will be placed;
SubAddress - the range to navigate to;
TextToDisplay - the text shown for the link.
You can use additional VBA code with your own rules to set the Anchor range correctly.
Or please provide more information and examples of the input and output data to get help.

Pandas Dataframe unable to reference column name in Plotly Dash Dropdown when set to reference in google sheets column

I am currently working on a Plotly Dash web application, creating a dropdown that references a column from a pandas DataFrame I am reading in from a CSV file.
The issue is that Dash is not able to read the column, and I have seen that it is because the column is actually a reference to another sheet, i.e. =RawData!A1.
I have managed to print the column, so I know it exists in the DataFrame and all the data prints correctly; however, Plotly Dash does not populate the dropdown with the labels and values. My current line of code is:
options=[{'label': i, 'value': i} for i in df.CategoryName.unique()],
CategoryName in Google Sheets refers to =RawData!A1.
What I have tested:
Amended my sheet to read directly from my RawData sheet, and it works fine. This is not a solution I want, though; it led me to see that the issue is with reading from the referenced column.
Attempted using column index instead:
options=[{'label': i, 'value': i} for i in df.iloc[:,1].unique()],
Again, this worked for printing but not for populating the dropdown in Plotly Dash.
Any advice will be greatly appreciated!
Adding some data cleaning in pandas to remove rows at the bottom of my dataset fixed the issue.
I removed NaN rows based on my CategoryName column, and after that my dropdown worked:
df = df[df['CategoryName'].notna()]
The reason it worked makes sense: I set up my copy formula as =RawData!A:A, but my RawData sheet currently holds only 123 rows, so by row 125 the reference sheet was pointing at blank cells. That made the dropdown error out by referencing values that do not exist. A funny but logical error. Not sure if this will help many people, but hopefully it will assist somebody!
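Putting the fix together, a minimal sketch with made-up data: drop the blank rows that the over-wide =RawData!A:A reference pulls in, then build the dropdown options:

```python
import pandas as pd

# Made-up frame standing in for the CSV export; the trailing None row comes
# from the =RawData!A:A reference extending past the real data
df = pd.DataFrame({"CategoryName": ["Food", "Travel", None, "Food"]})

# Remove the blank rows before building the Dash dropdown options
df = df[df["CategoryName"].notna()]
options = [{"label": i, "value": i} for i in df["CategoryName"].unique()]
```

With the NaN rows gone, every label/value pair references a real category and the dropdown populates normally.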

Create a drop-down list for an entire column in Excel using openpyxl (Python)

I would like to create a drop-down list in an Excel sheet for an entire column, matching the number of lines in the input file, using openpyxl. I am able to create it for one cell in a column, but not for the entire column.
here is the link for similar ques:
OpenPyXL: Is it possible to create a dropdown menu in an excel sheet?
my code:
import csv
from openpyxl.worksheet.datavalidation import DataValidation

with open('input.csv') as f:
    reader = csv.reader(f, delimiter=',')
    for row in reader:
        ws.append(row)

# for a literal list, openpyxl expects the options wrapped in double quotes
data_val = DataValidation(type="list", formula1='"waiver1,waiver2,waiver3"')
ws.add_data_validation(data_val)
data_val.add(ws['H1'])
Instead of adding the drop-down list only to cell "H1", I want to add it to column "H" up to where the data (here, input.csv) is present. How can I do that?
Please help me.
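One way to do it (a sketch, with stand-in rows replacing the input.csv read): build the range string from ws.max_row so the validation covers column H for exactly as many rows as were appended:

```python
from openpyxl import Workbook
from openpyxl.worksheet.datavalidation import DataValidation

wb = Workbook()
ws = wb.active

# Stand-in for the rows read from input.csv
for row in [["a"], ["b"], ["c"]]:
    ws.append(row)

# For a literal list, openpyxl expects the options wrapped in double quotes
data_val = DataValidation(type="list", formula1='"waiver1,waiver2,waiver3"')
ws.add_data_validation(data_val)

# Cover column H from row 1 down to the last appended row
data_val.add("H1:H{}".format(ws.max_row))
```

Because ws.max_row reflects whatever was appended, the validation range grows with the input file instead of being pinned to a single cell.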

iterating through a dataframe column to see if individual cell value is in a list of file paths

I have a list of shapefile paths (sf_list), and I'm using a nested for loop to iterate through the list and a DataFrame (df). For each row, I check whether the value in the name column appears in the path; if it does, I want to write that path into a new column of the DataFrame (sf_path). This is what I have now:
for sf in sf_list:
    for row in df.iterrows():
        df.loc[df['name'].isin(sf), 'sf_path'] = [sf]
The script runs, but the new column is empty. The list is populated with all of the paths I need, and the name column contains text that appears in the path I want to populate that row of the new column with. Any direction appreciated.
UPDATE:
Alright now I have:
for sf in sf_list:
    for row in dlrules_df.iterrows():
        dlrules_df.loc[dlrules_df['dl_foldername'] in sf, 'sf_path'] = sf
Error returned:
TypeError: 'in <string>' requires string as left operand, not Series
Can you give this a try? apply isn't recommended, but it has become quite a habit for me. I would like to spend more time to give you a more efficient solution, but it's already bedtime here and this popped out off the top of my head.
sf = [list_of_folder_paths]
dlrules_df['sf_path'] = dlrules_df['dl_foldername'].apply(lambda x: sf[sf.index(x)] if x in sf else None)
PS: Not tested, so it may break somewhere but I hope it gives you some idea.
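Since the goal is substring matching (the name appears somewhere in the path) rather than exact equality, here is a sketch with made-up paths that scans the list for the first match per row:

```python
import pandas as pd

# Hypothetical shapefile paths and names, standing in for the real data
sf_list = [
    "/data/shapefiles/roads/roads.shp",
    "/data/shapefiles/rivers/rivers.shp",
]
df = pd.DataFrame({"name": ["roads", "rivers", "parks"]})

def find_path(name):
    # Return the first path that contains the name, or None if nothing matches
    return next((p for p in sf_list if name in p), None)

df["sf_path"] = df["name"].apply(find_path)
```

This avoids the nested loop entirely: apply visits each name once, and rows with no matching path get None instead of raising a TypeError.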

Text File Input ignore line of tabs

I have a job in Pentaho with a Text File Input step reading from a tab-delimited text file. Sometimes the file we are given has lines that are empty of data, but the row is filled with tabs because empty lines were copied from Excel. Below is a screenshot of the 'empty' rows in Notepad++.
Is there a way to ignore lines like this? I tried adding a filter entry with Filter string = a run of tabs, Filter position = 0, Stop on filter = Y, Positive match = Y. This filter didn't seem to have any effect.
When the job runs, it treats all of these as NULL records, which makes sense, but this then causes the next step, a Table Output, to fail. If there is no way to fix this with Text File Input, is there another step that can easily clean up the bad records?
You can check one or more field values using Filter Rows.
Your transformation would look like: Text File Input -> Filter Rows -> Table Output.
When I did more debugging, I found that the Filter tab did have the logic to achieve what I was looking for. Instead of Filter string = a run of tabs, Filter position = 0, Stop on filter = Y, Positive match = Y, it needed Positive match = N. After this change it started working correctly.
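If pre-cleaning the file outside Pentaho is also an option, a tiny Python sketch can drop lines that contain only tabs or whitespace before the job runs (the sample data here is made up; in practice you would read from and write to real files):

```python
import io

# Stand-in for the tab-delimited file; the middle line is 'empty' but full of tabs
raw = "a\tb\tc\n\t\t\t\nx\ty\tz\n"

# Keep only lines that have some non-whitespace content
cleaned = "".join(line for line in io.StringIO(raw) if line.strip())
```

line.strip() removes tabs, spaces, and the newline, so a row made entirely of tabs becomes an empty string and is filtered out, while real data rows pass through untouched.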