Beautifulsoup Table Scraping table navigation

I am trying to learn BeautifulSoup to scrape HTML and have run into a difficult challenge.
The HTML I am trying to scrape is not well formatted, and with my limited knowledge of BeautifulSoup I am kind of stuck.
The HTML I am trying to scrape is below:
<table>
<tr>
<td><b>Value 1<b/>HiddenValue1</td>
<td>Value 2</td>
</tr>
<tr>
<td>NoValue</td>
</tr>
<tr>
<td><b>Value 3<b/>HiddenValue2</td>
<td>Value 4</td>
</tr>
</table>
What I want is to extract all rows that have exactly two td tags.
That would extract the first and the last tr.
Once I have them, I need to arrange the text in the b tag, the plain text after it, and the text of the second td into a dictionary.
My desired outcome is a list of dictionaries:
[
{'tdb': 'Value 1', 'tdHidden': 'HiddenValue1', 'tdSecond': 'Value 2'},
{'tdb': 'Value 3', 'tdHidden': 'HiddenValue2', 'tdSecond': 'Value 4'},
]
I am trying to use the find_all() function, but I don't know how to check the number of child td tags, and I'm also not sure how to navigate to the first td and the second td.
Thanks in advance for your help!
EDIT:
Could you please also help with how to get "GetThisValue" and "Current" from within the td tag below?
<td align="left" valign="top">
<b>Value1</b>
<br>
<font>
<b>Current</b>
</font>
<br>
GetThisValue
</td>

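The following code should work:
from bs4 import BeautifulSoup
# html holds the markup from the question; assumes each <b> is closed with </b>
soup = BeautifulSoup(html, 'html.parser')
trs = soup.find('table').find_all('tr')
# keep only the rows that contain exactly two td cells
trs = [tr for tr in trs if len(tr.find_all('td')) == 2]
results = []
for tr in trs:
    tds = tr.find_all('td')
    d = {
        'tdb': tds[0].b.text,  # text inside <b>
        'tdHidden': str(tds[0].b.next_sibling).strip(),  # text node right after </b>
        'tdSecond': tds[1].get_text(strip=True),
    }
    results.append(d)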
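If BeautifulSoup is not available, the same idea (keep only rows with exactly two cells, then split the first cell into its bold text and the text after it) can be sketched with the standard library's html.parser. This is a swapped-in technique, not BeautifulSoup; it assumes the bold text is closed with </b>, and the `RowCollector` class and `scrape` function are hypothetical names.

```python
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Collect every <tr> as a list of cells; each cell keeps its bold text and its other text."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._cell = None
        self._in_b = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = {"bold": "", "rest": ""}
        elif tag == "b":
            self._in_b = True

    def handle_endtag(self, tag):
        if tag == "b":
            self._in_b = False
        elif tag == "td" and self._cell is not None:
            self._row.append(self._cell)
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # text outside a td (e.g. newlines between tags) is ignored
        if self._cell is not None:
            self._cell["bold" if self._in_b else "rest"] += data


def scrape(html):
    collector = RowCollector()
    collector.feed(html)
    collector.close()
    results = []
    for cells in collector.rows:
        if len(cells) != 2:  # skip rows such as the single-cell "NoValue" row
            continue
        first, second = cells
        results.append({
            "tdb": first["bold"].strip(),
            "tdHidden": first["rest"].strip(),
            "tdSecond": (second["bold"] + second["rest"]).strip(),
        })
    return results


html = """<table>
<tr><td><b>Value 1</b>HiddenValue1</td><td>Value 2</td></tr>
<tr><td>NoValue</td></tr>
<tr><td><b>Value 3</b>HiddenValue2</td><td>Value 4</td></tr>
</table>"""
print(scrape(html))
```

The printed list matches the desired output in the question; the single-cell row is dropped by the `len(cells) != 2` check.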

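For the EDIT part (here soup is built from the EDIT markup):
# GetThisValue: the text node after the second <br>, with surrounding newlines stripped
soup.find('td').find_all('br')[1].next_sibling.strip()
# Current: the <b> nested inside <font>
soup.find('td').find('font').b.text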
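A standard-library sketch of the EDIT part, using html.parser instead of BeautifulSoup (a swapped-in technique): it counts the <br> tags seen so far and tracks whether it is inside <font><b>. It assumes the input is a single td like the one in the question; `CellTextParser` is a hypothetical name.

```python
from html.parser import HTMLParser


class CellTextParser(HTMLParser):
    """Grab the text inside <font><b>...</b></font> and the text after the second <br>."""

    def __init__(self):
        super().__init__()
        self.br_count = 0
        self.in_font = False
        self.in_b = False
        self.in_td = False
        self.current = ""
        self.after_second_br = ""

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.in_td = True
        elif tag == "br":  # <br> is a void element, so no end tag follows
            self.br_count += 1
        elif tag == "font":
            self.in_font = True
        elif tag == "b":
            self.in_b = True

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_td = False
        elif tag == "font":
            self.in_font = False
        elif tag == "b":
            self.in_b = False

    def handle_data(self, data):
        if not self.in_td:
            return
        if self.in_font and self.in_b:
            self.current += data
        elif self.br_count >= 2:
            self.after_second_br += data


cell = """<td align="left" valign="top">
<b>Value1</b>
<br>
<font>
<b>Current</b>
</font>
<br>
GetThisValue
</td>"""
parser = CellTextParser()
parser.feed(cell)
parser.close()
print(parser.current.strip())          # Current
print(parser.after_second_br.strip())  # GetThisValue
```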

Related

How to apply a filter on each column of a jQuery DataTable? [duplicate]

I'm trying to filter table rows in an intelligent way (as opposed to just tons of code that get the job done eventually) but am rather short of inspiration.
I have 5 columns in my table. At the top of each there is either a dropdown or a textbox with which the user may filter the table data (basically, hide the rows that don't apply).
There are plenty of table filtering plugins for jQuery, but none that work quite like this, and that's the complicated part :|

Here is a basic filter example http://jsfiddle.net/urf6P/3/
It uses the jquery selector :contains('some text') and :not(:contains('some text')) to decide if each row should be shown or hidden. This might get you going in a direction.
EDITED to include the HTML and javascript from the jsfiddle:
$(function() {
$('#filter1').change(function() {
$("#table td.col1:contains('" + $(this).val() + "')").parent().show();
$("#table td.col1:not(:contains('" + $(this).val() + "'))").parent().hide();
});
});

As a slight enhancement to the accepted solution posted by Jeff Treuting, the filtering can be made case-insensitive. I take no credit for the original solution or even the enhancement; the idea was lifted from a solution posted by Highway of Life on a different SO post.
Here it goes:
// Define a custom selector icontains instead of overriding the existing expression contains
// A global js asset file will be a good place to put this code
$.expr[':'].icontains = function(a, i, m) {
return $(a).text().toUpperCase()
.indexOf(m[3].toUpperCase()) >= 0;
};
// Now perform the filtering as suggested by Jeff
$(function() {
$('#filter1').on('keyup', function() { // changed 'change' event to 'keyup'. Add a delay if you prefer
$("#table td.col1:icontains('" + $(this).val() + "')").parent().show(); // Use our new selector icontains
$("#table td.col1:not(:icontains('" + $(this).val() + "'))").parent().hide(); // Use our new selector icontains
});
});
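To make the difference between the built-in :contains and the custom :icontains concrete, here is a plain-Python model of the same show/hide logic (an analogy only, not jQuery code). It assumes each row is a list of cell strings and that the filter targets one column, like td.col1 above; `split_rows` is a hypothetical helper name.

```python
def split_rows(rows, query, column=0, case_insensitive=True):
    """Return (shown, hidden) rows, mimicking :icontains or :contains on one column."""
    shown, hidden = [], []
    for row in rows:
        cell = row[column]
        if case_insensitive:
            # :icontains compares upper-cased text, like the custom selector above
            matched = query.upper() in cell.upper()
        else:
            # the built-in :contains is a case-sensitive substring test
            matched = query in cell
        (shown if matched else hidden).append(row)
    return shown, hidden


rows = [["Rock", "Californication"], ["Jazz", "Kind of Blue"], ["rock", "Santana Live"]]
print(split_rows(rows, "rock", case_insensitive=False)[0])  # only the lower-case "rock" row
print(split_rows(rows, "rock")[0])                          # both rock rows
```

An empty query matches every row in both modes, so clearing the textbox shows the whole table again.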

This may not be the best way to do it, and I'm not sure about the performance, but one option is to tag each cell (every column in every row) with an id that starts with a column identifier followed by a unique number, such as a record identifier.
For example, if you had a Product Name column and the record ID was 763, I would do something like the following:
<table id="table1">
<thead>
<tr>
<th>Artist</th>
<th>Album</th>
<th>Genre</th>
<th>Price</th>
</tr>
</thead>
<tbody>
<tr>
<td id="artist-127">Red Hot Chili Peppers</td>
<td id="album-195">Californication</td>
<td id="genre-1">Rock</td>
<td id="price-195">$8.99</td>
</tr>
<tr>
<td id="artist-59">Santana</td>
<td id="album-198">Santana Live</td>
<td id="genre-1">Rock</td>
<td id="price-198">$8.99</td>
</tr>
<tr>
<td id="artist-120">Pink Floyd</td>
<td id="album-183">Dark Side Of The Moon</td>
<td id="genre-1">Rock</td>
<td id="price-183">$8.99</td>
</tr>
</tbody>
</table>
You could then use jQuery to filter based on the start of the id.
For example, if you wanted to filter by the Artist column:
var regex = /Hot/;
$('#table1').find('tbody').find('[id^=artist]').each(function() {
if (!regex.test(this.innerHTML)) {
this.parentNode.style.backgroundColor = '#ff0000';
}
});
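The attribute-prefix idea can be sketched in plain Python as well (an analogy, not jQuery). It assumes each row is a list of (id, text) pairs taken from the table above; `rows_to_highlight` is a hypothetical helper that returns the indexes of the rows the script would paint red. Like JavaScript's `regex.test`, `re.search` matches anywhere in the text, not only at the start.

```python
import re

# (id, text) pairs per row, copied from the answer's table (genre and price omitted)
rows = [
    [("artist-127", "Red Hot Chili Peppers"), ("album-195", "Californication")],
    [("artist-59", "Santana"), ("album-198", "Santana Live")],
    [("artist-120", "Pink Floyd"), ("album-183", "Dark Side Of The Moon")],
]


def rows_to_highlight(rows, column_prefix, pattern):
    """Indexes of rows whose cell id starts with column_prefix and whose text does not match."""
    regex = re.compile(pattern)
    flagged = []
    for index, cells in enumerate(rows):
        for cell_id, text in cells:
            # same idea as the [id^=artist] attribute-prefix selector
            if cell_id.startswith(column_prefix) and not regex.search(text):
                flagged.append(index)
    return flagged


print(rows_to_highlight(rows, "artist", r"Hot"))  # [1, 2]
```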

You can filter a specific column by adding children[column number] to the jQuery filter. Normally, the filter looks for the keyword in all the columns of every row. To filter only ColumnB in the table below, add children[1] to the filter, as in the script below. indexOf returns -1 when there is no match; any value above -1 makes the whole row visible.
ColumnA | ColumnB | ColumnC
John | Doe | 1968
Jane | Doe | 1975
Mike | Nike | 1990
$("#myInput").on("change", function () {
var value = $(this).val().toLowerCase();
$("#myTable tbody tr").filter(function () {
$(this).toggle($(this.children[1]).text().toLowerCase().indexOf(value) > -1)
});
});
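A plain-Python model of this column-specific filter (an analogy, not jQuery). Rows are assumed to be tuples of cell strings, column index 1 plays the role of children[1] (ColumnB), and `str.find` returns -1 on no match just like `indexOf`; `visible_rows` is a hypothetical name.

```python
def visible_rows(rows, query, column=1):
    """Keep a row only if the chosen column contains the query, ignoring case."""
    value = query.lower()
    # str.find returns -1 when there is no match, like JavaScript's indexOf
    return [row for row in rows if row[column].lower().find(value) > -1]


table = [("John", "Doe", "1968"), ("Jane", "Doe", "1975"), ("Mike", "Nike", "1990")]
print(visible_rows(table, "doe"))   # the John and Jane rows
print(visible_rows(table, "john"))  # [] because "John" is in ColumnA, not ColumnB
```

Searching for "john" hides every row because that name only appears in ColumnA, which is exactly the effect of restricting the filter to one column.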

Step 1: write the following in an .html file:
<input type="text" id="myInput" onkeyup="myFunction()" placeholder="Search for names..">
<table id="myTable">
<tr class="header">
<th style="width:60%;">Name</th>
<th style="width:40%;">Country</th>
</tr>
<tr>
<td>Alfreds Futterkiste</td>
<td>Germany</td>
</tr>
<tr>
<td>Berglunds snabbkop</td>
<td>Sweden</td>
</tr>
<tr>
<td>Island Trading</td>
<td>UK</td>
</tr>
<tr>
<td>Koniglich Essen</td>
<td>Germany</td>
</tr>
</table>
Step 2: write the following in a .js file:
function myFunction() {
// Declare variables
var input, filter, table, tr, td, i;
input = document.getElementById("myInput");
filter = input.value.toUpperCase();
table = document.getElementById("myTable");
tr = table.getElementsByTagName("tr");
// Loop through all table rows and hide those that don't match the search query
for (i = 0; i < tr.length; i++) {
td = tr[i].getElementsByTagName("td")[0];
if (td) {
if (td.innerHTML.toUpperCase().indexOf(filter) > -1) {
tr[i].style.display = "";
} else {
tr[i].style.display = "none";
}
}
}
}

grabTextFrom finds anticipated string that assertion using Locator::contains cannot

I have an interesting case with the following HTML and a Codeception acceptance test using PhpBrowser, and I have not been able to find a similar issue on StackOverflow.
I am looking to assert that the content of a P tag (as part of the only table on the page) contains an anticipated phrase.
// representative code
$anticipatedValue = "updated text description";
$xpath = "//table/tbody/tr[position() = 1]/td[position() = 2]/descendant::p";
$val = $I->grabTextFrom($xpath); // this equals the value of $anticipatedValue;
echo "the value is[" .$val."][".$anticipatedValue."]"; // these match perfectly, no trimming needed
$I->see(Locator::contains($xpath,$anticipatedValue)); //this fails
$I->see(Locator::contains($xpath,$val)); //so does this
The fail HTML output does not display anything out of the ordinary. I am hoping someone with more experience of XPath in PhpBrowser can point out what I am missing.
The HTML portion I am looking at is below. This is a legacy application I am adding tests to, so the HTML is at present, as you can see, not optimal.
<table id="calendar" class="unitable" cellspacing=0 cellpadding=0>
<thead>
<tr>
<th>Time</th>
<th></th>
</tr>
</thead>
<tbody>
<tr class="past"><td>10 Dec 00:00</td>
<td class="entry" rowspan=2>
<div class="time">Tue 10 09:00 - Tue 10 17:00</div>
<div class="stage">new stage B</div>
<p class="description">updated text description</p>
</td>
</tr>
<tr class="past">
<td>10 Dec 12:00</td>
</tr>
<tr class="past">
<td>11 Dec 00:00</td>
<td class="entry" rowspan=1>
<div class="time">Wed 11 09:00 - Wed 11 11:30</div>
<div class="stage">new stage C</div>
<p class="description">this is new stage C</p>
</td>
</tr>
<tr class="past">
<td>11 Dec 12:00</td>
<td></td>
</tr>
<tr class="past">
<td>12 Dec 00:00</td>
<td></td>
</tr>
<tr class="past">
<td>12 Dec 12:00</td>
<td class="entry" rowspan=1>
<div class="time">Thu 12 13:30 - Thu 12 17:00</div>
<div class="stage">new stage D</div>
<p class="description">this is new stage D</p>
</td>
</tr>
</tbody>
<tfoot></tfoot>
</table>
and the failure report reads:
Failed asserting that on page ...
{short snippet of upper page HTML - not from the area under scrutiny, followed by}
[Content too long to display. See complete response in '/var/www/public/vhosts/system/tests/_output/' directory]
--> contains "//table/tbody/tr[position() = 1]/td[position() = 2]/descendant::p[contains(., 'updated text description')]".
Many thanks for your consideration of this issue.

Your problem is that you pass the result of Locator::contains to the $I->see() method.
see() expects the text to look for as its first parameter and searches the page for that exact text, so here it is searching for the XPath string itself.
Start with
$I->see('updated text description');
If you want to check if the text is displayed in specific location, pass XPath expression as a second parameter:
$I->see('updated text description', '//table/tbody/tr[position() = 1]/td[position() = 2]/descendant::p');
as documented at https://codeception.com/docs/modules/PhpBrowser#see

How to get all direct (immediate) rows from a WebElement table

From the table below, I need the immediate row elements, using XPath, a CSS selector, or the Selenium API (element.findElements). Please help.
<table id ="Main">
<tbody>
<tr id="row_1">
<tr id="row_1_1">
<tr id="row_1_1_1">
</tr>
</tr>
</tr>
<tr id="row_2">
</tr>
<tr id="row_3">
<tr id="row_3_1">
</tr>
</tr>
</tbody>
</table>
Expected output:
[<tr id="row_1">,<tr id="row_2">,<tr id="row_3">]
Important: I am looking for a generic solution. Sometimes tbody won't be present in the table. I already have the table as a WebElement.

You can use the XPath union operator (|) to combine multiple XPath expressions, e.g. one to handle the case when tbody exists and another for the case when it doesn't:
//table[@id='Main']/tbody/tr | //table[@id='Main']/tr
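The union idea can be tried offline with Python's standard library. This sketch treats the markup as XML (nested tr elements are not valid HTML, so a browser's parser would restructure them into siblings) and uses ElementTree, which has no `|` operator, so the two branches are evaluated separately and concatenated. `direct_rows` is a hypothetical helper. In Selenium, the analogous move would be passing a relative expression such as `./tbody/tr | ./tr` to `find_elements(By.XPATH, ...)` on the table WebElement; the leading `.` keeps the search scoped to that element.

```python
import xml.etree.ElementTree as ET


def direct_rows(table):
    """Return the tr elements that are direct children of the table or of its tbody."""
    # ElementTree has no XPath union, so both branches of
    # "./tbody/tr | ./tr" are evaluated and the results concatenated
    return table.findall("./tbody/tr") + table.findall("./tr")


with_tbody = ET.fromstring(
    '<table id="Main"><tbody>'
    '<tr id="row_1"><tr id="row_1_1"><tr id="row_1_1_1"></tr></tr></tr>'
    '<tr id="row_2"></tr>'
    '<tr id="row_3"><tr id="row_3_1"></tr></tr>'
    '</tbody></table>'
)
without_tbody = ET.fromstring('<table id="Main"><tr id="row_1"></tr><tr id="row_2"></tr></table>')

print([tr.get("id") for tr in direct_rows(with_tbody)])     # ['row_1', 'row_2', 'row_3']
print([tr.get("id") for tr in direct_rows(without_tbody)])  # ['row_1', 'row_2']
```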

Use the locator below:
By.cssSelector("table#Main > tbody > tr")
Or
By.xpath("//table[@id='Main']/tbody/tr")
List<WebElement> allRows = driver.findElements(By.cssSelector("table#Main > tbody > tr"));
for(WebElement ele : allRows) {
//do your operation with that row
//ele.getText();
}

How to get the class name through Selenium

<table >
<tr class="odd First"><td>1one Cell</td><td>2one Cell</td><td>3one Cell</td><td>4one Cell</td> </tr>
<tr class="even Second"><td>Two Cell</td><td>2Two Cell</td><td>3Two Cell</td><td>4Two Cell</td></tr>
<tr class="odd Thrid"><td>1Three Cell</td><td>2Three Cell</td><td>3Three Cell</td><td>4Three Cell</td></tr>
<tr class="even Fourth"><td>1Five Cell</td><td>2Five Cell</td><td>3Five Cell</td><td>4Five Cell</td></tr>
</table>
How can I get the class names of the tr elements? Please advise.

To get the class names of all the tr tags using Java:
List<WebElement> list = driver.findElements(By.tagName("tr"));
for(WebElement ele:list){
String className = ele.getAttribute("class");
System.out.println("Class name = "+className);
}
This prints the class name of every tr tag on the page to the console.
With the older Selenium RC API, the following might get you the class name of the first tr tag:
String className = selenium.getAttribute("//html/body/table/tbody/tr[1]/@class");
Let me know if this works.

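from selenium.webdriver.common.by import By
# assumes driver is an existing WebDriver with the page loaded
class_names = []
# //table//tr also matches rows inside the tbody that browsers insert automatically
rows = driver.find_elements(By.XPATH, '//table//tr')
for row in rows:
    class_names.append(row.get_attribute('class'))  # e.g. 'odd First'
print(class_names)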
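To see what the class attribute looks like without a browser, here is a standard-library sketch using html.parser (a swapped-in technique, not Selenium). It assumes a shortened copy of the table from the question; `RowClassCollector` is a hypothetical name. The class attribute comes back as one space-separated string, so `split()` is needed to get the individual class names.

```python
from html.parser import HTMLParser


class RowClassCollector(HTMLParser):
    """Collect the class attribute of every <tr> start tag."""

    def __init__(self):
        super().__init__()
        self.classes = []

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            # attrs is a list of (name, value) pairs; a missing class becomes ""
            self.classes.append(dict(attrs).get("class") or "")


html = (
    '<table>'
    '<tr class="odd First"><td>1one Cell</td></tr>'
    '<tr class="even Second"><td>Two Cell</td></tr>'
    '</table>'
)
collector = RowClassCollector()
collector.feed(html)
print(collector.classes)                       # ['odd First', 'even Second']
print([c.split() for c in collector.classes])  # [['odd', 'First'], ['even', 'Second']]
```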

PHPTAL and specific table

I have to create a specific table in PHPTAL.
I have an array like this:
$tab = array('item1', 'item2', 'item3', 'item4');
The final table should look like this:
<table>
<tr>
<td>Item1</td>
<td>Item2</td>
</tr>
<tr>
<td>Item3</td>
<td>Item4</td>
</tr>
</table>
So I was trying to use tal:condition with "repeat/item/odd" and "repeat/item/even" to put the <tr> tag in the right place, but it doesn't work the way I want.
Do you have any ideas?

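Split the array into pairs with array_chunk, repeat the tr over the pairs, and repeat the td over the items in each pair:
<tr tal:repeat="row php:array_chunk(tab, 2)"><td tal:repeat="item row" tal:content="item">Item</td></tr>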
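array_chunk splits the array into groups of two, and each group becomes one tr. The same grouping in Python, as a rough analogy of what the template does rather than PHPTAL itself (the `chunk` and `render_table` helpers are hypothetical, and `html.escape` is used because the cell text is inserted into markup):

```python
import html


def chunk(items, size):
    """Split items into consecutive lists of at most `size` elements, like PHP's array_chunk."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def render_table(items, per_row=2):
    """Build one <tr> per chunk and one <td> per item in that chunk."""
    rows = []
    for row in chunk(items, per_row):
        cells = "".join(f"<td>{html.escape(str(item))}</td>" for item in row)
        rows.append(f"<tr>{cells}</tr>")
    return "<table>" + "".join(rows) + "</table>"


tab = ["item1", "item2", "item3", "item4"]
print(chunk(tab, 2))  # [['item1', 'item2'], ['item3', 'item4']]
print(render_table(tab))
```

With an odd number of items the last chunk is shorter, so the last row simply gets one cell.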