Get text with quotes inside one tag using BeautifulSoup - beautifulsoup

I am trying to parse a web page using BeautifulSoup.
Case 1:
<div class="a">
<div class="b">abc def</div>
<div class="c">123 456</div>
</div>
Case 2:
<div class="a">
<div class="b">
"abc "
"def"
</div>
<div class="c">123 456</div>
</div>
I want to get the text from the class "c" div using this code:
c = soup.find('div', class_='b', text='abc def').next_sibling.text
In Case 1 it works well, but in Case 2 it doesn't. For Case 2 I also tried:
c = soup.find('div', class_='b', text='"abc ""def"').next_sibling.text
In both cases
soup.find('div', class_='b').text
gives me the same value:
abc def
What is the right way to work with Case 2?
[EDIT #1]
I need to do it this way because there are several divs with the same class:
<div class="a">
<div class="b">abc def1</div>
<div class="c">123</div>
</div>
<div class="a">
<div class="b">abc def2</div>
<div class="c">456</div>
</div>
<div class="a">
<div class="b">
"abc "
"def3"
</div>
<div class="c">789</div>
</div>
[EDIT #2]
I saved response.text to a file and saw that it looks like this:
<div class="b">abc <!-- -->def3</div>
But in Chrome it looks like:
<div class="b">
"abc "
"def3"
</div>
Also, I can't match the text with re.compile if the text inside the tag looks like this:
<div class="b">abc m<sup>2</sup></div>

You can find the next element by tag name and then print its value.
You can also use the re module to search the text.
Here are five examples.
Example 1:
import bs4
htmldoc='''<html><div class="a">
<div class="b">
"abc "
"def"
</div>
<div class="c">123 456</div>
</div></html>'''
soup = bs4.BeautifulSoup(htmldoc, 'html.parser')
c = soup.find('div', class_='b').findNext('div').contents[0]
print(c)
output:
123 456
Example 2:
import bs4
import re
htmldoc='''<html><div class="a">
<div class="b">
"abc "
"def"
</div>
<div class="c">123 456</div>
</div></html>'''
soup = bs4.BeautifulSoup(htmldoc, 'html.parser')
d = soup.find('div', text=re.compile('def')).findNext('div').contents[0]
print(d)
output:
123 456
Example 3:
import bs4
htmldoc='''<html><div class="a">
<div class="b">abc def1</div>
<div class="c">123</div>
</div>
<div class="a">
<div class="b">abc def2</div>
<div class="c">456</div>
</div>
<div class="a">
<div class="b">
"abc "
"def3"
</div>
<div class="c">789</div>
</div></html>'''
soup = bs4.BeautifulSoup(htmldoc, 'html.parser')
c = soup.find_all('div', class_='b')
for d in c:
    text = d.findNext('div').contents[0]
    print(text)
Output:
123
456
789
Example 4:
import bs4
htmldoc='''<html><div class="a">
<div class="b">abc def1</div>
<div class="c">123</div>
</div>
<div class="a">
<div class="b">abc def2</div>
<div class="c">456</div>
</div>
<div class="a">
<div class="b">
"abc "
"def3"
</div>
<div class="c">789</div>
</div></html>'''
soup = bs4.BeautifulSoup(htmldoc, 'html.parser')
d=soup.find(lambda tag:tag.name=="div" and "abc " in tag.text and "def3" in tag.text).findNext('div').findNext('div').contents[0]
print(d)
output:
789
Example 5:
import bs4
htmldoc='''<html><div class="a">
<div class="b">abc def1</div>
<div class="c">123</div>
</div>
<div class="a">
<div class="b">abc def2</div>
<div class="c">456</div>
</div>
<div class="a">
<div class="b">
"abc "
"def3"
</div>
<div class="c">789</div>
</div></html>'''
soup = bs4.BeautifulSoup(htmldoc, 'html.parser')
c = soup.find_all('div', class_='b')
for d in c:
    if 'abc ' in d.text and 'def3' in d.text:
        textc = d.findNext('div').contents[0]
        print(textc)
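For the markup from EDIT #2, where response.text contains abc <!-- -->def3, matching with text= fails because the div then holds several child nodes (two strings and a comment), so its .string is None. The same applies to the <sup> case. Below is a minimal sketch that compares the tag's combined text instead. The helper name value_for, the sample HTML, and the value 42 for the <sup> row are made up for illustration.

```python
from bs4 import BeautifulSoup

# Sample markup modelled on the question; the third block and its value 42 are invented.
html = '''
<div class="a"><div class="b">abc def1</div><div class="c">123</div></div>
<div class="a"><div class="b">abc <!-- -->def3</div><div class="c">789</div></div>
<div class="a"><div class="b">abc m<sup>2</sup></div><div class="c">42</div></div>
'''

soup = BeautifulSoup(html, 'html.parser')


def value_for(soup, label):
    """Return the text of the class "c" div that follows the class "b" div whose text equals label."""

    def is_label(tag):
        if tag.name != 'div' or 'b' not in tag.get('class', []):
            return False
        # get_text() joins all text pieces and skips comments;
        # split/join collapses runs of whitespace and newlines.
        return ' '.join(tag.get_text().split()) == label

    b = soup.find(is_label)
    if b is None:
        return None
    c = b.find_next_sibling('div', class_='c')
    return c.get_text(strip=True) if c is not None else None


print(value_for(soup, 'abc def1'))
print(value_for(soup, 'abc def3'))
print(value_for(soup, 'abc m2'))
```

Using find_next_sibling with a tag name, rather than next_sibling, also avoids landing on a whitespace text node between the two divs.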
Hope this helps.

Related

Selenium css and xpath selector

I am trying to select an element using CSS selector (and have tried XPath also).
<div data-sheetstartpoint="1">
<div class="booking">
<div class="startsheet two">
<div class="title">
<div class="time">Time</div>
<div class="cols">
<div class="hidden-xs two">Team 1</div>
<div class="hidden-xs two">Team 2</div>
<div class="visible-xs two">Teams</div>
</div>
</div>
<div class="slot" data-startpointno="1">
<div class="time" id="timelabel"> 07:30 </div>
<div class="cols">
<div class="timeslot " data-roundteamid="" data-slotno="1" data-time="17/04/2021 06:30:00" data-startpointno="1" data-offset="0" data-bookingname="">
<span class="available-to-book">Available </span>
</div>
<div class="timeslot " data-roundteamid="" data-slotno="2" data-time="17/04/2021 06:30:00" data-startpointno="1" data-offset="0" data-bookingname="">
<span class="available-to-book">Available </span>
</div>
</div>
</div>
I have built a CSS selector (and have also checked it with ChroPath), but I am getting an "element not found" error.
The selectors I am using:
FindElementByCss(".timeslot[data-slotno*='1'][data-time*='17/04/2021 06:30:00']")
FindElementByXPath("//*[@id='hdidbookingcontent']/div[2]/div[1]/div/div[3]/div[2]/div[1]")
I see further up the page there is a class "container" - is this something that would impact selection?
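One way to narrow this down is to test the CSS selector offline against the markup. The sketch below does that with BeautifulSoup's select() on a trimmed copy of the slot markup from the question, wrapped in a hypothetical "container" div. It only shows whether the selector matches that markup; it cannot show why the lookup fails in the browser. An ancestor class such as "container" does not by itself stop a selector from matching. If the selector matches here, possible causes to check (assumptions, not confirmed by the question) are an iframe or content that loads after the lookup runs.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the question's markup; the outer "container" div is an assumption.
html = '''
<div class="container">
<div class="slot" data-startpointno="1">
<div class="cols">
<div class="timeslot " data-slotno="1" data-time="17/04/2021 06:30:00"><span class="available-to-book">Available </span></div>
<div class="timeslot " data-slotno="2" data-time="17/04/2021 06:30:00"><span class="available-to-book">Available </span></div>
</div>
</div>
</div>
'''

soup = BeautifulSoup(html, 'html.parser')

# Same selector string as in the question.
selector = ".timeslot[data-slotno*='1'][data-time*='17/04/2021 06:30:00']"
matches = soup.select(selector)

print(len(matches))
print(matches[0]['data-slotno'])
```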

Bootstrap 3 - thumbnail images alignment problem

How can I align them in a row with equal height?
<div id="about-page-contain">
<div class="">
<div class="row equal">
<div class="wwd">
<div class="col-md-12">
<div class="col-md-2">
<div class="thumbnail">
<a href="hyderabad/plots">
<img src="/public/uploadfiles/images/be71f72e-b2e6-4dab-8dd4-8bc4d48365b8.jpg" alt="PLOTS" style="">
<div class="caption" style="text-align:center;">
<p>PLOTS</p>
</div>
</a>
</div>
</div>
<div class="col-md-2">
<div class="thumbnail">
<a href="hyderabad/flats">
<img src="/public/uploadfiles/images/ff74dcc3-71b4-4d35-b53b-77c6c36a0947.jpg" alt="FLATS" style="">
<div class="caption" style="text-align:center;">
<p>FLATS</p>
</div>
</a>
</div>
</div>
<div class="col-md-2">
<div class="thumbnail">
<a href="hyderabad/farm-land">
<img src="/public/uploadfiles/images/06fdde52-b169-45eb-a96a-ec1dd07f15e5.jpg" alt="FARM LAND" style="">
<div class="caption" style="text-align:center;">
<p>FARM LAND</p>
</div>
</a>
</div>
</div>
<div class="col-md-2">
<div class="thumbnail">
<a href="hyderabad/development-sites">
<img src="/public/uploadfiles/images/45d18935-9220-41ed-8f58-4aa3a16be8a8.jpg" alt="DEVELOPMENT SITES" style="">
<div class="caption" style="text-align:center;">
<p>DEVELOPMENT SITES</p>
</div>
</a>
</div>
</div>
<div class="col-md-2">
<div class="thumbnail">
<a href="hyderabad/development-lands-for-plots">
<img src="/public/uploadfiles/images/d85c0a59-2d35-4f24-830b-8cfe606f8caf.png" alt="DEVELOPMENT LANDS FOR PLOTS" style="">
<div class="caption" style="text-align:center;">
<p>DEVELOPMENT LANDS FOR PLOTS</p>
</div>
</a>
</div>
</div>
<div class="col-md-2">
<div class="thumbnail">
<a href="hyderabad/independent-houses">
<img src="/public/uploadfiles/images/5d265aad-31ef-4285-94a3-3aaeb0111e7a.jpg" alt="INDEPENDENT HOUSES" style="">
<div class="caption" style="text-align:center;">
<p>INDEPENDENT HOUSES</p>
</div>
</a>
</div>
</div>
<div class="col-md-2">
<div class="thumbnail">
<a href="hyderabad/outrate-lands-for-appt-&-plots">
<img src="/public/uploadfiles/images/8d3cfcec-5415-422c-8426-665271216009.jpg" alt="OUTRATE LANDS FOR APPT & PLOTS" style="">
<div class="caption" style="text-align:center;">
<p>OUTRATE LANDS FOR APPT & PLOTS</p>
</div>
</a>
</div>
</div>
<div class="col-md-2">
<div class="thumbnail">
<a href="hyderabad/commercial-properties">
<img src="/public/uploadfiles/images/707cc088-4383-4423-8172-2fb2d9efa46d.jpg" alt="COMMERCIAL PROPERTIES" style="">
<div class="caption" style="text-align:center;">
<p>COMMERCIAL PROPERTIES</p>
</div>
</a>
</div>
</div>
<div class="col-md-2">
<div class="thumbnail">
<a href="hyderabad/independent-villas-in-prime-locations">
<img src="/public/uploadfiles/images/c2a9b09c-94bd-43ee-92ae-db4c587dd8eb.jpg" alt="INDEPENDENT VILLAS IN PRIME LOCATIONS" style="">
<div class="caption" style="text-align:center;">
<p>INDEPENDENT VILLAS IN PRIME LOCATIONS</p>
</div>
</a>
</div>
</div>
<div class="col-md-2">
<div class="thumbnail">
<a href="hyderabad/construction-contract">
<img src="/public/uploadfiles/images/5f253c56-7f4d-4e61-9359-f5b2f7748443.jpg" alt="CONSTRUCTION CONTRACT" style="">
<div class="caption" style="text-align:center;">
<p>CONSTRUCTION CONTRACT</p>
</div>
</a>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
There are a few issues here:
You have a class of wwd that we cannot see. Is that applying some styles of some sort?
Each image has an empty style="" attribute. Why is that there?
If I remove the link to the bootstrap styles I do not see any difference in applied styles.
The sizes of your images can affect the layout, and the example has no working links to the images, which makes the problem difficult to reproduce. Include the full links in your example to make it easier to reproduce.
I created a codepen with a fake image but I don't see the issue you are seeing.
<div class="col-md-2">
<div class="thumbnail">
<a href="hyderabad/flats">
<img src="https://external-content.duckduckgo.com/iu/?u=https%3A%2F%2Ftse1.mm.bing.net%2Fth%3Fid%3DOIP.r9MVqtg3zlXjdKLjkrZ04QHaE8%26pid%3DApi&f=1" alt="DEVELOPMENT SITES" style="" width="100px">
<div class="caption" style="text-align:center;">
<p>FLATS</p>
</div>
</a>
</div>
</div>
How about this example?
<div class = "row">
<div class = "col-sm-6 col-md-3">
<a href = "#" class = "thumbnail">
<img src="https://external-content.duckduckgo.com/iu/?u=https%3A%2F%2Ftse1.mm.bing.net%2Fth%3Fid%3DOIP.r9MVqtg3zlXjdKLjkrZ04QHaE8%26pid%3DApi&f=1" alt = "Generic placeholder thumbnail" width="100px">
</a>
</div>
<div class = "col-sm-6 col-md-3">
<a href = "#" class = "thumbnail">
<img src="https://external-content.duckduckgo.com/iu/?u=https%3A%2F%2Ftse1.mm.bing.net%2Fth%3Fid%3DOIP.r9MVqtg3zlXjdKLjkrZ04QHaE8%26pid%3DApi&f=1" alt = "Generic placeholder thumbnail" width="100px">
</a>
</div>
<div class = "col-sm-6 col-md-3">
<a href = "#" class = "thumbnail">
<img src="https://external-content.duckduckgo.com/iu/?u=https%3A%2F%2Ftse1.mm.bing.net%2Fth%3Fid%3DOIP.r9MVqtg3zlXjdKLjkrZ04QHaE8%26pid%3DApi&f=1" alt = "Generic placeholder thumbnail" width="100px">
</a>
</div>
<div class = "col-sm-6 col-md-3">
<a href = "#" class = "thumbnail">
<img src="https://external-content.duckduckgo.com/iu/?u=https%3A%2F%2Ftse1.mm.bing.net%2Fth%3Fid%3DOIP.r9MVqtg3zlXjdKLjkrZ04QHaE8%26pid%3DApi&f=1" alt = "Generic placeholder thumbnail" width="100px">
</a>
</div>
</div>

beautifulsoup find unique div tag

Working with bs4. When trying to find the HTML inside <div data-reactroot>, I am never able to locate it.
Inside <div data-reactroot> I am going to have to loop through each text entry and split it. I get how to find normal tags, but data-reactroot seems to be giving me a lot of trouble.
<div data-reactroot="">
<div>
<div class="_hgs47m">
<div class="_1thk0tsb">
<div style="margin-right:8px">
<div class="_1ciyl4o"><span class="_8tbpu3" aria-hidden="true">󱀁</span></div>
</div>
</div>
<div class="_n5lh69r">
<div class="_1p3joamp">Entire guest suite</div>
<div class="_36rlri">
<div class="_36rlri" style="margin-right:24px">
<div class="_czm8crp">2 guests</div>
</div>
<div class="_36rlri" style="margin-right:24px">
<div class="_czm8crp">1 bedroom</div>
</div>
<div class="_36rlri" style="margin-right:24px">
<div class="_czm8crp">1 bed</div>
</div>
<div class="_36rlri" style="margin-right:0">
<div class="_czm8crp">1 bath</div>
</div>
</div>
</div>
</div>
<div style="margin-top:16px">
<div class="_hgs47m">
<div class="_1thk0tsb">
<div style="margin-right:8px">
<div class="_1ciyl4o"><span class="_8tbpu3" aria-hidden="true">󰄄</span></div>
</div>
</div>
<div class="_n5lh69r">
<div><span class="_1p3joamp">Self check-in</span></div>
<div class="_czm8crp">Check yourself in with the keypad.</div>
</div>
</div>
</div>
<div style="margin-top:16px">
<div class="_hgs47m">
<div class="_1thk0tsb">
<div style="margin-right:8px">
<div class="_1ciyl4o"><span class="_8tbpu3" aria-hidden="true">󰀢</span></div>
</div>
</div>
<div class="_n5lh69r">
<div><span class="_1p3joamp">Sparkling clean</span></div>
<div class="_czm8crp">9 recent guests said this place was sparkling clean.</div>
</div>
</div>
</div>
<div style="margin-top:16px">
<div class="_hgs47m">
<div class="_1thk0tsb">
<div style="margin-right:8px">
<div class="_1ciyl4o"><span class="_8tbpu3" aria-hidden="true">󰀃</span></div>
</div>
</div>
<div class="_n5lh69r">
<div><span class="_1p3joamp">Michael & Tammy is a Superhost</span></div>
<div class="_czm8crp">Superhosts are experienced, highly rated hosts who are committed to providing great stays for guests.</div>
</div>
</div>
</div>
<div style="margin-top:24px;margin-bottom:24px">
<div class="_7qp4lh"></div>
</div>
</div>
</div>
I have tried:
data_soup.find_all("div data-reactroot")
data_soup.get('data-reactroot')
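Neither attempt can work: find_all("div data-reactroot") searches for a tag literally named "div data-reactroot", and data_soup.get('data-reactroot') reads an attribute of the soup object itself. A minimal sketch of matching on the attribute instead, using a shortened and partly invented sample of the markup above:

```python
from bs4 import BeautifulSoup

# Shortened sample; the first div is invented to show that it is skipped.
html = '''
<div id="other"><div class="_czm8crp">not wanted</div></div>
<div data-reactroot="">
<div class="_1p3joamp">Entire guest suite</div>
<div class="_czm8crp">2 guests</div>
<div class="_czm8crp">1 bedroom</div>
</div>
'''

data_soup = BeautifulSoup(html, 'html.parser')

# True matches any div that has the attribute, whatever its value (here an empty string).
root = data_soup.find('div', attrs={'data-reactroot': True})

# Collect every non-empty text entry inside that div, then split each one on whitespace.
texts = list(root.stripped_strings)
pairs = [entry.split() for entry in texts]

print(texts)
print(pairs)
```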

Selenium WebDriver click child element

I'm using Selenium and I want to click the child element whose value is 2.
This is the full code:
<div class="dialer-keypad">
<div class="dialpad-row">
<div class="key">
<div class="value">1</div>
<div class="letters"></div>
</div>
<div class="key">
<div class="value">2</div>
<div class="letters">ABC</div>
</div>
<div class="key">
<div class="value">3</div>
<div class="letters">DEF</div>
</div>
</div>
<div class="dialpad-row">
<div class="key">
<div class="value">4</div>
<div class="letters">GHI</div>
</div>
<div class="key">
<div class="value">5</div>
<div class="letters">JKL</div>
</div>
<div class="key">
<div class="value">6</div>
<div class="letters">MNO</div>
</div>
</div>
</div>
So my question is: how can I click this element?
<div class="value">2</div>
You should be able to do this quite succinctly with XPath:
//*[contains(@class, 'value') and text()='2']
Alternatively, assuming that the markup is static, you could target the element by position with a CSS selector. For example:
.dialpad-row:first-child .key:nth-child(2) .value
Simply use XPath:
//div[contains(@class,'value') and contains(text(),'2')]
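As a rough sketch of how such an XPath could be used from Python, assuming a Selenium 4 WebDriver is already open on the page: the helper names key_xpath and click_key are made up for illustration, and the locator strategy is passed as the plain string "xpath", which is the value held by Selenium's By.XPATH constant.

```python
def key_xpath(digit):
    """Build an XPath for the dialpad's "value" div that shows the given digit."""
    return f"//div[contains(@class, 'value') and text()='{digit}']"


def click_key(driver, digit):
    """Find the key for digit and click it; driver is assumed to be a Selenium 4 WebDriver."""
    # "xpath" is the string behind selenium.webdriver.common.by.By.XPATH
    driver.find_element("xpath", key_xpath(digit)).click()


print(key_xpath(2))
```

With a real driver this would be called as click_key(driver, 2) once the keypad has been rendered.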

Bootstrap 3 - style a site with 3 columns per row on medium & large devices and two columns on small

I would like to style a site with 3 columns of info per row on medium & large devices, just two columns on small devices and one on mobile.
// This is the less I used:
.site_info_box_group {
.make-row();
.site_info_box {
.make-sm-column(6);
.make-md-column(4);
}
}
The problem is that I have to use different markup for the different layouts.
Is it possible to do this sort of design using Less? My feeling is that it would work best if I put all the .site_info_box divs into a single .site_info_box_group and then styled every nth div to force it on to a new line, but I'm not sure how to get this to work.
I have considered a JavaScript solution, but I want to understand if this is possible using pure Less.
For three columns I'd need this markup
<div class="site_info_box_group">
<div class="site_info_box"> ... some markup here... </div>
<div class="site_info_box"> ... some markup here... </div>
<div class="site_info_box"> ... some markup here... </div>
</div>
<div class="site_info_box_group">
<div class="site_info_box"> ... some markup here... </div>
<div class="site_info_box"> ... some markup here... </div>
<div class="site_info_box"> ... some markup here... </div>
</div>
... etc
But for two columns I'd need this markup:
<div class="site_info_box_group">
<div class="site_info_box"> ... some markup here... </div>
<div class="site_info_box"> ... some markup here... </div>
</div>
<div class="site_info_box_group">
<div class="site_info_box"> ... some markup here... </div>
<div class="site_info_box"> ... some markup here... </div>
</div>
<div class="site_info_box_group">
<div class="site_info_box"> ... some markup here... </div>
<div class="site_info_box"> ... some markup here... </div>
</div>
... etc
To fix your problem you will need to add all your columns in the same row. For example:
<div class="container">
<div class="row">
<div class="col-md-4 col-sm-6">1</div>
<div class="col-md-4 col-sm-6">2</div>
<div class="col-md-4 col-sm-6">3</div>
<div class="col-md-4 col-sm-6">4</div>
<div class="col-md-4 col-sm-6">5</div>
<div class="col-md-4 col-sm-6">6</div>
<div class="col-md-4 col-sm-6">7</div>
<div class="col-md-4 col-sm-6">8</div>
</div>
</div>
See also the "Responsive column resets" section of the Bootstrap 3 documentation. In your case you don't seem to need a clearfix.
When the heights of your columns vary, you will have to use responsive column resets (#grid-responsive-resets):
<div class="container">
<div class="row">
<div class="col-md-4 col-sm-6">1</div>
<div class="col-md-4 col-sm-6" style="height:50px;">2</div>
<div class="clearfix visible-sm"></div>
<div class="col-md-4 col-sm-6">3</div>
<div class="clearfix visible-md visible-lg"></div>
<div class="col-md-4 col-sm-6">4</div>
<div class="clearfix visible-sm"></div>
<div class="col-md-4 col-sm-6">5</div>
<div class="col-md-4 col-sm-6">6</div>
<div class="clearfix visible-sm visible-md visible-lg"></div>
<div class="col-md-4 col-sm-6">7</div>
<div class="col-md-4 col-sm-6">8</div>
</div>
</div>
Thanks to https://stackoverflow.com/users/1596547/bass-jobsen .
This is the Less I used to make this solution work for me:
.site_info_box_group {
.make-row();
.site_info_box {
.make-sm-column(6);
.make-md-column(4);
}
.colsplit2{
.clearfix;
.visible-sm;
}
.colsplit3{
.clearfix;
.visible-md;
.visible-lg;
}
}
And this is the (simplified) code I used to draw the divs:
<div class = "site_info_box_group">
<?php
function draw_div($content){
static $count=0;
print "<div class='site_info_box'>{$content}</div>";
$class ="";
if (0 == (($count+1) % 2)) {
$class .= " colsplit2 ";
};
if (0 == (($count+1) % 3)){
$class .= " colsplit3 ";
};
if (!empty($class)){
print "<div class='{$class}'></div>";
}
$count++;
}
foreach ($mycontent as $content){
draw_div($content);
}
?>
</div>
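The counting logic of the PHP above can also be sketched in Python to make it easier to see where the reset divs land. This is an illustrative port, not the author's code; the function name render_boxes is hypothetical, and the class names colsplit2 and colsplit3 are taken from the Less above.

```python
def render_boxes(contents):
    """Return the group markup with a reset div after every 2nd and every 3rd box."""
    parts = ['<div class="site_info_box_group">']
    for position, content in enumerate(contents, start=1):
        parts.append(f'<div class="site_info_box">{content}</div>')
        classes = []
        if position % 2 == 0:
            classes.append('colsplit2')  # end of a row in the two-column (sm) layout
        if position % 3 == 0:
            classes.append('colsplit3')  # end of a row in the three-column (md/lg) layout
        if classes:
            class_attr = ' '.join(classes)
            parts.append(f'<div class="{class_attr}"></div>')
    parts.append('</div>')
    return '\n'.join(parts)


print(render_boxes(['one', 'two', 'three', 'four', 'five', 'six']))
```

For six boxes this places a colsplit2 reset after boxes 2 and 4, a colsplit3 reset after box 3, and a single div carrying both classes after box 6.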