BeautifulSoup: replace colspan=2 with a single column

I'm trying to parse data from rows that occasionally contain a colspan=2 cell, which spoils my ability to target the data I want to extract.
What I'd like to do is remove the 'colspan=2' from the td element every time it occurs:
#replace
<td colspan="2" class="time">10:00 AM</td>
#with
<td>635</td>
Is this possible? And can I work it into a conditional if then else?
Here's a more verbose example:
<table>
<tr class="playerRow even">
<td class="pos">1</td>
<td><span class="rank"></span> -</td>
<td class="player"><p class="playerName">John doe</p></td>
<td class="background">X</td>
<td>345</td> #THIS ELEMENT FREQUENT
<td></td>
<td></td>
<td></td>
<td></td>
<td style=""></td>
</tr>
<tr class="playerRow odd">
<td class="pos">1</td>
<td><span class="rank"></span> -</td>
<td class="player"><p class="playerName">John doe</p></td>
<td class="background">X</td>
<td colspan="2" class="myClass" style="">3:15 PM</td> #THIS ELEMENT OCCASIONAL
<td></td>
<td></td>
<td></td>
<td></td>
<td style=""></td>
</tr>
<tr class="playerRow odd">
<td class="pos">1</td>
<td><span class="rank"></span> -</td>
<td class="player"><p class="playerName">John doe</p></td>
<td class="background">X</td>
<td>22</td> #THIS ELEMENT FREQUENT
<td></td>
<td></td>
<td></td>
<td></td>
<td style=""></td>
</tr>
</table>
So whenever I come across the colspan I'd like to replace it with a plain td, so it doesn't shunt the row elements across and mess up my count.

This will convert:
<td colspan="2" class="myClass" style="">3:15 PM</td>
to:
<td>3:15 PM</td>
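from bs4 import BeautifulSoup

# html is assumed to hold the page source; the parser is named explicitly.
bs = BeautifulSoup(html, "html.parser")
for x in bs.find_all("td"):
    # Clear every attribute (colspan, class, style) on cells that span columns.
    if "colspan" in x.attrs:
        x.attrs = {}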
Do you want it to remove the value also?
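On the if/else part of the question, here is a minimal sketch of branching on whether a cell carries colspan. To run without extra dependencies it uses the standard library's html.parser rather than BeautifulSoup, so it is a swapped-in technique, not the answer's method. The names CellCollector and describe are hypothetical, and the rule that spanned cells hold a time while plain cells hold a number is an assumption read off the example rows.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collect each row as a list of (text, attrs) pairs, one pair per <td>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None    # cells of the open <tr>, or None outside a row
        self._cell = None   # (text_parts, attrs) of the open <td>, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = ([], dict(attrs))

    def handle_data(self, data):
        if self._cell is not None:
            self._cell[0].append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            parts, attrs = self._cell
            # A colspan cell still counts as exactly one entry in the row.
            self._row.append(("".join(parts).strip(), attrs))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def describe(cell):
    """Assumption: spanned cells hold a time, plain non-empty cells a number."""
    text, attrs = cell
    if "colspan" in attrs:
        return ("time", text)
    elif text:
        return ("score", int(text))
    else:
        return ("empty", None)


sample = """
<table>
<tr class="playerRow even"><td class="pos">1</td><td class="background">X</td><td>345</td><td></td></tr>
<tr class="playerRow odd"><td class="pos">1</td><td class="background">X</td><td colspan="2" class="myClass" style="">3:15 PM</td><td></td></tr>
</table>
"""

parser = CellCollector()
parser.feed(sample)
parser.close()

# The third cell sits at index 2 in both rows, with or without colspan.
results = [describe(row[2]) for row in parser.rows]
print(results)
```

Because every td becomes one list entry regardless of its colspan, the index of the target cell stays the same in every row, and the branch decides how to interpret it.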

Related

Replicating a table

I'm having trouble replicating this table. I'm confused with the rowspan and colspan. I would really appreciate it if someone helps.
Here you go:
<table>
<tr>
<td colspan="7">The Error Rate on Different Forms</td>
</tr>
<tr>
<td rowspan="2">Form name</td>
<td rowspan="2">A</td>
<td rowspan="2">B</td>
<td rowspan="2">Total Fields (X = A x B)</td>
<td rowspan="2">Fields with errors (Y)</td>
<td colspan="2">Error rate* (Y/X) x 100 (%)</td>
</tr>
<tr>
<td>X/Y</td>
<td>%age</td>
</tr>
<tr>
<td colspan="7">High Risk Errors</td>
</tr>
<!-- normal tr td -->
<tr>
<td colspan="7">Low Risk Error</td>
</tr>
<!-- normal tr td -->
</table>

rowSpan hides rows

<table>
<tr> <td rowspan="2">1</td> <td>2</td> </tr>
<tr> <td rowspan="2">3</td> </tr>
<tr> <td>4</td> </tr>
</table>
seemingly only displays two rows:
The second row [1 3] is hidden because the cells with text 1 and 3 are reduced in height. Is there a way to ensure that the second row is visible in the display (not only in the DOM)?
The problem gets clearer, if you look at the same table with an additional column:
<table>
<tr> <td rowspan="2">1</td> <td>2</td> <td>0</td> </tr>
<tr> <td rowspan="2">3</td> <td>0</td> </tr>
<tr> <td>4</td> <td>0</td> </tr>
</table>
which is displayed like:
You can add a height property to the row:
<table border=1>
<tr>
<td rowspan="2">1</td>
<td>2</td>
</tr>
<tr style="height: 1.5em">
<td rowspan="2">3</td>
</tr>
<tr>
<td>4</td>
</tr>
</table>
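To see which row is at risk of collapsing, the sketch below places the cells on a grid and flags rows in which every cell belongs to a multi-row span. It is written in Python only for illustration; layout and rows_without_own_cell are hypothetical helper names, and the idea that a row made up entirely of spanning cells has nothing to size it is an assumption about the rendering, not a statement of the table layout rules.

```python
def layout(rows):
    """Place (label, rowspan) cells on a grid the way an HTML table does."""
    grid = {}  # (row, col) -> (label, rowspan)
    for r, cells in enumerate(rows):
        c = 0
        for label, rowspan in cells:
            # Skip columns still occupied by a rowspan from an earlier row.
            while (r, c) in grid:
                c += 1
            for dr in range(rowspan):
                grid[(r + dr, c)] = (label, rowspan)
            c += 1
    return grid


def rows_without_own_cell(rows):
    """Indexes of rows in which every cell belongs to a multi-row span."""
    grid = layout(rows)
    flagged = []
    for r in range(len(rows)):
        spans = [span for (row, _), (_, span) in grid.items() if row == r]
        if spans and all(span > 1 for span in spans):
            flagged.append(r)
    return flagged


# The two tables from the question, as (text, rowspan) pairs per row.
two_columns = [[("1", 2), ("2", 1)], [("3", 2)], [("4", 1)]]
three_columns = [
    [("1", 2), ("2", 1), ("0", 1)],
    [("3", 2), ("0", 1)],
    [("4", 1), ("0", 1)],
]

collapsed = rows_without_own_cell(two_columns)
with_extra_column = rows_without_own_cell(three_columns)
print(collapsed, with_extra_column)
```

In the two-column table only the middle row is flagged; once a column of ordinary cells is added, no row is flagged, which is consistent with that table displaying all three rows.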
One suboptimal option could be to add an empty column:
<table>
<tr> <td rowspan="2">1</td> <td>2</td> <td class="void"></td> </tr>
<tr> <td rowspan="2">3</td> <td class="void"></td> </tr>
<tr> <td>4</td> <td class="void"></td> </tr>
</table>
CSS:
table,td {border:1px solid}
.void {height:1em;padding:0;border:0}
However, the spacing between columns leads to unnecessary space for the added column:
That extra space could be removed with padding-left on the td elements and a cellspacing of 0 on the table, but then the solution would not be general enough, so I'm still waiting for a good idea.

Finding Duplicate Pairs in Excel

So I have this summary sheet. It contains data from multiple workbooks going across.
It's not like this question, because what I'm trying to do is find all the inconsistent pairs of data going across this worksheet and highlight them.
Here is a fiddle that explains what I want to accomplish. I have a large worksheet, and would like to compare the first 2 rows with the next 2 rows etc. throughout the worksheet. Below is an HTML representation of what I am trying to accomplish.
<table class="tg">
<tr>
<th class="tg-031e">#INT1</th>
<th class="tg-031e">#INT1</th>
<th class="tg-031e">#INT2</th>
<th class="tg-031e">#INT2</th>
<th class="tg-031e">#INT3</th>
<th class="tg-031e">#INT3</th>
</tr>
<tr>
<td class="tg-031e">Apples</td>
<td class="tg-031e">Y</td>
<td class="tg-031e">Bananas</td>
<td class="tg-031e">Y</td>
<td class="tg-031e">Apples</td>
<td class="tg-031e">Y</td>
</tr>
<tr>
<td class="tg-031e">Bananas</td>
<td class="tg-031e">Y</td>
<td class="tg-031e">Peppers</td>
<td class="tg-031e">Y</td>
<td class="tg-031e">Bananas</td>
<td class="tg-031e">Y</td>
</tr>
<tr>
<td class="tg-031e">Peppers</td>
<td class="tg-031e">Y</td>
<td class="tg-031e">Pomegranite</td>
<td class="tg-031e">Y</td>
<td class="tg-031e">Peppers</td>
<td class="tg-031e">Y</td>
</tr>
<tr>
<td class="tg-031e">Pomegranite</td>
<td class="tg-031e">Y</td>
<td class="tg-031e">Nuts</td>
<td class="tg-031e">YYYYYNN</td>
<td class="tg-031e">Pomegranite</td>
<td class="tg-031e">Y</td>
</tr>
<tr>
<td class="tg-031e">Nuts</td>
<td class="tg-031e">YYYYYYNN</td>
<td class="tg-031e">Smango</td>
<td class="tg-031e">NNNYYNNN</td>
<td class="tg-031e">Nuts</td>
<td class="tg-031e">NNNYNNNN</td>
</tr>
<tr>
<td class="tg-zl7m">Oranges</td>
<td class="tg-zl7m">YYYYNNNN</td> <!-- this oranges is different from... -->
<td class="tg-031e">Blackberries</td>
<td class="tg-031e">NNNYYNNNN</td>
<td class="tg-zl7m">Oranges</td>
<td class="tg-zl7m">NNNYYNNN</td> <!-- ...this one -->
</tr>
<tr>
<td class="tg-031e">Smango</td>
<td class="tg-031e">NNNYYNNN</td>
<td class="tg-031e">Berries</td>
<td class="tg-031e">YYNYNNNN</td>
<td class="tg-031e">Smango</td>
<td class="tg-031e">Y</td>
</tr>
<tr>
<td class="tg-031e">Skiwi</td>
<td class="tg-031e">NNNYNNNN</td>
<td class="tg-031e">Beer</td>
<td class="tg-031e">NNNYNNNN</td>
<td class="tg-031e">Steaks</td>
<td class="tg-031e">Y</td>
</tr>
<tr>
<td class="tg-031e">Steaks</td>
<td class="tg-031e">Y</td>
<td class="tg-031e">Blueberries</td>
<td class="tg-031e">YNNYNNNN</td>
<td class="tg-031e">Steaksauce</td>
<td class="tg-031e">NNNYNNNN</td>
</tr>
<tr>
<td class="tg-zl7m">Steaksauce</td>
<td class="tg-zl7m">YYNYNNNN</td>
<td class="tg-031e">Blucheese</td>
<td class="tg-031e">NNNYNNNN</td>
<td class="tg-zl7m">Apricot</td>
<td class="tg-zl7m">YYYYNNNN</td>
</tr>
<tr>
<td class="tg-031e">Apricot</td>
<td class="tg-031e">YYYYNNNN</td>
<td class="tg-031e">Blackberries</td>
<td class="tg-031e">NNNYNNNN</td>
<td class="tg-031e">Milkshake</td>
<td class="tg-031e">NNNYNNNN</td>
</tr>
</table>
I have tried VBA solutions and also conditional formatting. Any solution that will make this work is greatly appreciated.
Thank you.
I think this array formula should work:-
=SUM(ISODD(COLUMN())*($A$2:$E$12=A2)*($B$2:$F$12<>B2))
If the table starts in A1, this can be applied as conditional formatting from A2 to E12 and will highlight the left-hand (fruit) cell of each inconsistent pair of cells.
Then you can use a similar formula to highlight the right-hand cell of each pair:-
=SUM(ISEVEN(COLUMN())*($A$2:$E$12=A2)*($B$2:$F$12<>B2))
Apply this from B2 to F12.
Note that the Smango cells are highlighted because they are in an inconsistent group (although they are also in a consistent group).
Here is the alternative approach (as suggested) of highlighting the consistent groups:-
The formulae are
=SUM(ISODD(COLUMN())*($A$2:$E$12=A2)*($B$2:$F$12=B2))>1
and
=SUM(ISEVEN(COLUMN())*($A$2:$E$12=A2)*($B$2:$F$12=B2))>1
to be applied as before.
The sum this time will always be at least one because each pair of cells will match with itself, so the '>' sign is to find if there are any matches with other pairs of cells.
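For readers who want to check the logic outside Excel, the same test can be sketched in Python: a name is inconsistent if it appears with more than one distinct value anywhere in the sheet. The function name inconsistent_names and the list-of-rows input are assumptions for illustration; the sample rows are a subset of the table in the question.

```python
from collections import defaultdict


def inconsistent_names(rows):
    """Return names paired with more than one distinct value.

    Each row alternates name, value, name, value, ... like the worksheet.
    """
    seen = defaultdict(set)
    for row in rows:
        # Even positions hold names, odd positions hold their values.
        for name, value in zip(row[0::2], row[1::2]):
            seen[name].add(value)
    return {name for name, values in seen.items() if len(values) > 1}


rows = [
    ["Apples", "Y", "Bananas", "Y", "Apples", "Y"],
    ["Nuts", "YYYYYYNN", "Smango", "NNNYYNNN", "Nuts", "NNNYNNNN"],
    ["Oranges", "YYYYNNNN", "Blackberries", "NNNYYNNNN", "Oranges", "NNNYYNNN"],
    ["Smango", "NNNYYNNN", "Berries", "YYNYNNNN", "Smango", "Y"],
]

flagged = inconsistent_names(rows)
print(sorted(flagged))
```

As with the conditional formatting, Smango is flagged even though two of its three pairs agree, because one pair carries a different value.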

Selenium ide change values at runtime

I want to change the value of global param inside while loop.
For some reason the value isn't changing although I'm inserting a new value.
<tr>
<td>store</td>
<td>1</td>
<td>CallTime</td>
</tr>
<tr>
<td>while</td>
<td>${OnCall}==true</td>
<td></td>
</tr>
<tr>
<td>storeAttribute</td>
<td>callStateLabel_16@text</td>
<td>ElapsedTime</td>
</tr>
<tr>
<td>echo</td>
<td>${ElapsedTime}</td>
<td></td>
</tr>
<tr>
<td>store</td>
<td>storedVars['CallTime']=${ElapsedTime}</td>
<td>CallTime</td>
</tr>
<tr>
<td>echo</td>
<td>${CallTime}</td>
<td></td>
</tr>
The last echo gives 1 while the elapsed time is 00:35. How should it be done?
Instead of "store" you should use "storeEval", and leave the "value" field empty.
Try:
<tr>
<td>storeEval</td>
<td>storedVars['CallTime']=${ElapsedTime}</td>
<td></td>
</tr>

Cannot execute a Selenium IDE test case for a Pop Window

Hi, can anyone please help me with this script?
I cannot run a test case for a pop-up window with Selenium IDE.
Here is the script I am using to run the test case.
<tr>
<td>open</td>
<td>/car-insurance</td>
<td></td>
</tr>
<tr>
<td>assertTitle</td>
<td>Car Insurance | Netpig Insurance</td>
<td></td>
</tr>
<tr>
<td>clickAt</td>
<td>//img[@alt='Get an insurance quote']</td>
<td></td>
</tr>
<tr>
<td>selectPopUpAndWait</td>
<td>GetaCarInsurancequote</td>
<td>30000</td>
</tr>
<tr>
<td>selectWindow</td>
<td>null</td>
<td></td>
</tr>
<tr>
<td>assertTitle</td>
<td>Car Insurance | Netpig Insurance</td>
<td></td>
</tr>
<tr>
<td>type</td>
<td>form1:txt_4_3_0_Policy_CoverDate</td>
<td>26</td>
</tr>
<tr>
<td>type</td>
<td>form1:txt_4_4_0_Policy_CoverDate</td>
<td>05</td>
</tr>
<tr>
<td>type</td>
<td>form1:txtRegLookup</td>
<td>VN05XVO</td>
</tr>
<tr>
<td>click</td>
<td>form1:imgGetVehicle</td>
<td></td>
</tr>
<tr>
<td>select</td>
<td>form1:cboVehicleYearOfManufacture</td>
<td>label=2006</td>
</tr>
<tr>
<td>select</td>
<td>form1:cboVehicleModified</td>
<td>label=Select</td>
</tr>
<tr>
<td>select</td>
<td>form1:cboVehicleModified</td>
<td>label=No</td>
</tr>
<tr>
<td>type</td>
<td>form1:txtPurchaseDateDay</td>
<td>10</td>
</tr>
<tr>
<td>type</td>
<td>form1:txtPurchaseDateMonth</td>
<td>02</td>
</tr>
<tr>
<td>type</td>
<td>form1:txtPurchaseDateYear</td>
<td>2009</td>
</tr>
<tr>
<td>type</td>
<td>form1:txtVehicleEstimatedValue</td>
<td>2001</td>
</tr>
<tr>
<td>select</td>
<td>form1:cboVehicleNightLocation</td>
<td>label=Car Park</td>
</tr>
<tr>
<td>type</td>
<td>form1:txtOvernightPostCode</td>
<td>wr51dh</td>
</tr>
<tr>
<td>select</td>
<td>form1:cboVehicleCoverType</td>
<td>label=Third Party Only</td>
</tr>
<tr>
<td>select</td>
<td>form1:cboVolExcess</td>
<td>label=£300</td>
</tr>
<tr>
<td>select</td>
<td>form1:cboNCBYears</td>
<td>label=9</td>
</tr>
<tr>
<td>select</td>
<td>form1:cboNCBProtected</td>
<td>label=No</td>
</tr>
<tr>
<td>select</td>
<td>form1:cboNCBType</td>
<td>label=Motorcycle</td>
</tr>
If anyone has a solution, please email me at dhanunjayakumar#gmial.com
<tr>
<td>selectWindow</td>
<td>null</td>
<td></td>
</tr>
This piece looks problematic. It should be like:
<tr>
<td>selectWindow</td>
<td>name=NameOfPopupWindow</td>
<td></td>
</tr>
[error] Permission denied for http://www.netpig.co.uk to call method Location.toString on http://quotes.netpig.co.uk
It's a Same Origin Policy issue.
Your popup window on http://www.netpig.co.uk is not allowed by the browser to modify the DOM on http://quotes.netpig.co.uk because it's a different domain.
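As a rough illustration of the rule, two URLs share an origin only when scheme, host, and port all match. The sketch below checks that with Python's urllib.parse; the helper names origin and same_origin and the DEFAULT_PORTS table are assumptions for illustration, and real browsers apply further details that this ignores.

```python
from urllib.parse import urlsplit

# Ports implied by the scheme when the URL does not name one.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url):
    """Return the (scheme, host, port) triple that identifies an origin."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)


def same_origin(a, b):
    """True when both URLs have the same scheme, host, and port."""
    return origin(a) == origin(b)


main_site = "http://www.netpig.co.uk"
quote_site = "http://quotes.netpig.co.uk"

print(same_origin(main_site, quote_site))
print(same_origin(main_site, "http://www.netpig.co.uk:80/car-insurance"))
```

The two hosts in the error message differ in the subdomain, so they count as different origins even though they share the netpig.co.uk domain.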