Soup parser not able to extract class information

When I run this query:
soup.find_all('div')
I get these results:
<div class="class-link">
<a class="btn btn-primary" href="abc" style="text-decoration: underline">
See all</a>
</div>, <div class="sota-table-link">
<a class="btn btn-primary" href="abc" style="text-decoration: underline">
See all</a>
</div>, <div class="class-link">
However, when I run soup.find_all('div', _class='class-link'), I get an empty list.
What causes this, and how can I select the correct div?

Pass the class as a key-value pair through the attrs parameter instead, like this:
soup.find_all('div', {'class': 'class-link'})
Result:
[<div class="class-link">
<a class="btn btn-primary" href="abc" style="text-decoration: underline">
See all</a>
</div>, <div class="class-link"></div>]

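According to the Beautiful Soup 4 documentation, the keyword argument must be class_, not _class. Because class is a reserved word in Python, Beautiful Soup uses a trailing underscore; _class is treated as a filter on an attribute literally named _class, so nothing matches.
So your code must be changed to: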
print(soup.find_all("div", class_="class-link"))

Personally, I find CSS selectors a lot cleaner:
soup.select('div.class-link')
where the . is the CSS class selector.
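The three answers above can be compared side by side. The following is a minimal sketch, assuming bs4 is installed and using a shortened version of the markup from the question (the exact HTML here is an assumption, since the original output is truncated):

```python
from bs4 import BeautifulSoup

# Shortened, assumed version of the markup shown in the question.
html = """
<div class="class-link"><a class="btn btn-primary" href="abc">See all</a></div>
<div class="sota-table-link"><a class="btn btn-primary" href="abc">See all</a></div>
<div class="class-link"></div>
"""

soup = BeautifulSoup(html, "html.parser")

# _class is not special: it filters on an attribute literally named "_class",
# which none of the divs have.
wrong = soup.find_all("div", _class="class-link")

# Three equivalent ways to match on the CSS class.
by_keyword = soup.find_all("div", class_="class-link")
by_attrs = soup.find_all("div", {"class": "class-link"})
by_css = soup.select("div.class-link")

print(len(wrong), len(by_keyword), len(by_attrs), len(by_css))  # 0 2 2 2
```

All three working variants return the same two div elements, so the choice between them is mostly a matter of taste.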

Related

How to clean up pulled data from BeautifulSoup, Pandas, Python

Hello everyone, I have pulled the information I want using BeautifulSoup, but I can't seem to get it printed out correctly to send to pandas and Excel.
html_f ='''
<li class="list-group-item">
<div>
<div class="tyler-toggle-controller open">
<p class="text-primary">
07/01/2022 Date
<span class="caret"> </span>
</p>
</div>
<div class="tyler-toggle-container row-buff" style="display: block; overflow: hidden;">
<p class="col-sm-12 col-md-12">
<span class="text-muted">Comment</span><br>
[1] Comments
</p>
</div>
</div>
</li>'''
The code I use to pull the data I want:
soup = BeautifulSoup(html_f,'html.parser')
for child in soup.findAll('li',class_='list-group-item')[0]:
    print (child.text)
Here is what it pulls, but it prints with a lot of extra spacing:
07/01/2022 Date
Comment
[1] Comments
Ideally, I only need the top portion (date and File Date) printed out, but at the very least I need help getting it into a list format like:
07/01/2022 Date
Comment
[1] Comments

To print the information in the format shown in your question, you could iterate over the element's stripped_strings:
for e in soup.find_all('li',class_='list-group-item'):
    for t in e.stripped_strings:
        print(t)
Note: In new code use find_all() instead of old syntax findAll().
Example
html='''
<li class="list-group-item">
<div>
<div class="tyler-toggle-controller open">
<p class="text-primary">
07/01/2022 Date
<span class="caret">
</span>
</p>
</div>
<div class="tyler-toggle-container row-buff" style="display: block; overflow: hidden;">
<p class="col-sm-12 col-md-12">
<span class="text-muted">
Comment
</span>
<br/>
[1] Comments
</p>
</div>
</div>
</li>
'''
from bs4 import BeautifulSoup
# Name the parser explicitly to avoid the "no parser specified" warning
soup = BeautifulSoup(html, 'html.parser')
for e in soup.find_all('li',class_='list-group-item'):
    for t in e.stripped_strings:
        print(t)
Output
07/01/2022 Date
Comment
[1] Comments
Since you also mention pandas, you could instead pick out each piece of information, clean it up, and append it to a list of dicts:
import pandas as pd
data = []
for e in soup.find_all('li',class_='list-group-item'):
    data.append({
        'date': e.p.text.strip().replace(' Date',''),
        'comment': e.select_one('.tyler-toggle-container br').next_sibling.strip()
    })
pd.DataFrame(data)
or
data = [{
    'date': soup.select_one('li.list-group-item .text-primary').text.strip().replace(' Date',''),
    'comment': soup.select_one('li.list-group-item .tyler-toggle-container br').next_sibling.strip()
}]
Output
date        comment
07/01/2022  [1] Comments

Here is my attempt:
doc='''
<li class="list-group-item">
<div>
<div class="tyler-toggle-controller open">
<p class="text-primary">
07/01/2022 Date
<span class="caret">
</span>
</p>
</div>
<div class="tyler-toggle-container row-buff" style="display: block; overflow: hidden;">
<p class="col-sm-12 col-md-12">
<span class="text-muted">
Comment
</span>
<br/>
[1] Comments
</p>
</div>
</div>
</li>
'''
from bs4 import BeautifulSoup
soup = BeautifulSoup(doc, 'html.parser')
text=[' '.join(child.get_text(strip=True).split(' ')).replace(' DateComment[1]',',') for child in soup.find_all('li',class_='list-group-item')]
print(text)
Output:
['07/01/2022, Comments']
Either of these variations should also work:
text=' '.join([' '.join(child.get_text(strip=True).split(' ')).replace(' DateComment[1]',',') for child in soup.find_all('li',class_='list-group-item')]).strip()
#Or
text= [' '.join(child.get_text(strip=True).split(' ')).replace(' DateComment[1]',',') for child in soup.find_all('li',class_='list-group-item')]
parts = text[0].split(', ')  # ['07/01/2022', 'Comments']
final_text = parts[0] + ',' + parts[1]
final_list = parts  # if you want a list instead of a string

Selenium Automation - Need xpath to locate 2 UI elements with almost same properties located in different HTML tags

I need to assert that the status of a given job name has changed to Completed. The issue is that on the UI page the job status HTML element (title="Completed") is identical for all the different job names (e.g. title="Job1").
Below is the sample HTML code:
<div class="flex-primary"><i title="Completed" class="fa fa-cube provider-logo hwx-secondary orange"></i><span class="hwx-title" title="Job1">Job1</span> </div>
<span><button class="btn btn-icon btn-transparent inline-overview-actions" eventkey="1" title="Completed"><i class="fa fa-play-circle hwx-secondary inline-actions-overview no-select"></i></button></span>
<div class="flex-primary"><i title="Completed" class="fa fa-cube provider-logo hwx-secondary orange"></i><span class="hwx-title" title="Job2">Job2</span> </div>
<span><button class="btn btn-icon btn-transparent inline-overview-actions" eventkey="1" title="Completed"><i class="fa fa-play-circle hwx-secondary inline-actions-overview no-select"></i></button></span>
<div class="flex-primary"><i title="Completed" class="fa fa-cube provider-logo hwx-secondary orange"></i><span class="hwx-title" title="Job3">Job3</span> </div>
<span><button class="btn btn-icon btn-transparent inline-overview-actions" eventkey="1" title="Completed"><i class="fa fa-play-circle hwx-secondary inline-actions-overview no-select"></i></button></span>
I want a locator that uniquely points to a given job title in the Completed state, i.e. an XPath that combines the results of the two XPaths below:
//span[@title='Job1'] and //button[@title='Completed']
NOTE: This is a follow-up to the answer I received for "Selenium Automation - Need to combine 1 or more xpath locators to form a single locator".

To uniquely identify each button, locate the job's span, step to the first span that follows it with following::span[1], and then select the button inside that span:
//span[@title='Job1']/following::span[1]/button[@title='Completed']
//span[@title='Job2']/following::span[1]/button[@title='Completed']
//span[@title='Job3']/following::span[1]/button[@title='Completed']
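To see what following::span[1] does, the logic can be emulated in plain Python. This is only an illustrative sketch: the standard library's xml.etree.ElementTree does not support the following:: axis, so the hypothetical helper status_button walks the elements in document order instead. The markup is a trimmed copy of the question's HTML wrapped in an assumed root element, and the "Running" status for Job2 is invented to make the difference visible.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's markup, wrapped in a root element so it
# parses as XML. Job2's "Running" status is an assumption for illustration.
markup = """
<root>
  <div class="flex-primary"><i title="Completed"></i><span class="hwx-title" title="Job1">Job1</span></div>
  <span><button title="Completed"><i></i></button></span>
  <div class="flex-primary"><i title="Completed"></i><span class="hwx-title" title="Job2">Job2</span></div>
  <span><button title="Running"><i></i></button></span>
</root>
"""

def status_button(root, job_title):
    """Emulate //span[@title=job_title]/following::span[1]/button."""
    in_order = list(root.iter())  # every element, in document order
    for i, elem in enumerate(in_order):
        if elem.tag == "span" and elem.get("title") == job_title:
            # The job span has no child elements, so every later element
            # lies on its "following" axis.
            next_span = next((e for e in in_order[i + 1:] if e.tag == "span"), None)
            return None if next_span is None else next_span.find("button")
    return None

root = ET.fromstring(markup)
for job in ("Job1", "Job2"):
    print(job, status_button(root, job).get("title"))
```

In Selenium itself the XPath expressions above do this in one step; the sketch only shows why anchoring on the job's span makes the otherwise identical buttons distinguishable.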

Cypress - get an element in iframe

I solved the problem of getting into the iframe, but now I can't get my element. Maybe I'm searching the wrong way, but it has already taken me too much time and I don't know what to do next.
Source code:
<div id="ctl00_Telo_Dock_1005_C_ctl00_MainPage1_myPageVozidlo_inpDruhVozidla_ADX" class="inputCell" style="visibility:visible;display:inherit;">
<span id="ctl00_Telo_Dock_1005_C_ctl00_MainPage1_myPageVozidlo_lblDruhVozidla_ADX" class="labels labelC1_n W270">Druh vozidla:
</span>
<div id="ctl00_Telo_Dock_1005_C_ctl00_MainPage1_myPageVozidlo_cmbDruhVozidla_ADX" tabindex="13" class="RadDropDownList RadDropDownList_CMS_Black RadComboBoxInput" style="width:216px;height:23px;font-weight:bold;font-size:10pt;font-family:Arial;color:#396170;border-width:1px;border-style:Solid;border-color:#FDC267;background-color:#F9FBFC;">
<span class="rddlInner">
<span class="rddlFakeInput"></span>
<span class="rddlIcon"><!-- --></span>
</span>
<div class="rddlSlide" id="ctl00_Telo_Dock_1005_C_ctl00_MainPage1_myPageVozidlo_cmbDruhVozidla_ADX_DropDown" style="display:none;">
<div class="rddlPopup rddlPopup_CMS_Black">
<ul class="rddlList">
<li class="rddlItem rddlItemSelected"></li>
<li class="rddlItem">Osobní automobily</li>
<li class="rddlItem">Motocykly</li>
<li class="rddlItem">Užitkové automobily</li>
</ul>
</div>
</div>
<input id="ctl00_Telo_Dock_1005_C_ctl00_MainPage1_myPageVozidlo_cmbDruhVozidla_ADX_ClientState" name="ctl00_Telo_Dock_1005_C_ctl00_MainPage1_myPageVozidlo_cmbDruhVozidla_ADX_ClientState" type="hidden" />
</div>
</div>
My get function:
cy.get('#iframe-id')
.iframe('body #elementToFind')
.should('exist')
Thank you all for helping me.

Unfortunately, Cypress has some open issues regarding interaction with iframes. Here is a fairly straightforward workaround: https://github.com/cypress-io/cypress/issues/136#issuecomment-328100955.
That said, I believe this can work only if the outer page and the iframe are served from the same domain, due to the same-origin limitation.

Unable to click the element "Search Intimation View-Details": throws NoSuchElementException

I am new to Selenium. Please help me click the element "Search Intimation View-Details". I can't use the ID because it is a generated number, and the class name does not point exactly to that button. Please guide me, I'm stuck. I tried:
driver.findElementByXPath(" //div[span='Search Intimation View-Details'] ").click();
//
driver.findElementByClassName("v-tree-node v-tree-node-expanded v-tree-node-root v-tree-node-last ").click();
Below is the HTML:
<div class="v-tree-node v-tree-node-expanded v-tree-node-last" id="gwt-uid-36" role="treeitem" aria-selected="false" aria-labelledby="gwt-uid-35" aria-level="2" aria-expanded="true">
<div class="v-tree-node-caption">
<div id="gwt-uid-35" for="gwt-uid-36">
<span>Intimations</span>
</div>
</div>
<div class="v-tree-node-children v-tree-node-children-last" role="group">
<div class="v-tree-node v-tree-node-leaf v-tree-node-leaf-last" id="gwt-uid-38" role="treeitem" aria-selected="true" aria-labelledby="gwt-uid-37" aria-level="3">
<div class="v-tree-node-caption v-tree-node-selected">
<div id="gwt-uid-37" for="gwt-uid-38">
<span>Search Intimation View-Details</span>
</div>
</div>
<div class="v-tree-node-children v-tree-node-children-last" role="group"></div>
</div>
</div>
</div>

I believe you are using the Java Selenium binding, so use this code:
driver.findElement(By.xpath("//span[normalize-space()='Search Intimation View-Details']")).click();

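If the class, id, or XPath locators do not work for you, a CSS selector is another option in Java. In the posted HTML, the tree node that contains the target text has the id gwt-uid-38 (gwt-uid-36 is its parent node, Intimations). Since the question indicates these ids are generated, this only works if the id stays the same between page loads.
Use:
driver.findElement(By.cssSelector("div[role=treeitem][id^='gwt-uid-38']")).click();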

How to use indexes in XPath

I have a popup with three dropdowns; their ids are regenerated each time the popup is created.
The first element:
<a aria-required="true" class="select" aria-disabled="false" aria-describedby="5715:0-label" aria-haspopup="true" tabindex="0" role="button" title="" href="javascript:void(0);" data-aura-rendered-by="5733:0" data-interactive-lib-uid="10">Stage 1 - Needs Assessment</a>
I can identify the element above with the simple XPath //*[@class='select'][1], but the other two, which look the same to me (example below), can't be identified by index with //*[@class='select'][2]. I also tried 'following' without success, but my syntax may be wrong.
Example of dropdown element I'm unable to locate..
<a aria-required="false" class="select" aria-disabled="false" aria-describedby="6280:0-label" aria-haspopup="true" tabindex="0" role="button" title="" href="javascript:void(0);" data-aura-rendered-by="6290:0" data-interactive-lib-uid="16">--None--</a>
Any ideas what I am missing, apart from advanced XPath knowledge?
Thank you!

//*[@class='select'][2] returns the required node only if both links are children of the same parent, because the [2] predicate is evaluated among the children of each parent, e.g.
<div>
<a class="select">Stage 1 - Needs Assessment</a>
<a class="select">--None--</a>
</div>
If links are children of different parents, e.g.
<div>
<a class="select">Stage 1 - Needs Assessment</a>
</div>
<div>
<a class="select">--None--</a>
</div>
you should wrap the expression in parentheses so that the index applies to the whole result set. Use
(//*[@class='select'])[1]
for the first and
(//*[@class='select'])[2]
for the second.
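The difference between the two forms can be sketched with Python's standard library. This is an approximation under stated assumptions: xml.etree.ElementTree implements only a subset of XPath and does not accept the parenthesized form, so indexing the Python list of matches stands in for (//a[@class='select'])[2]. The markup is the "different parents" example from this answer wrapped in an assumed root element.

```python
import xml.etree.ElementTree as ET

# The "different parents" case from the answer, wrapped in a root element.
root = ET.fromstring("""
<root>
  <div><a class="select">Stage 1 - Needs Assessment</a></div>
  <div><a class="select">--None--</a></div>
</root>
""")

# Here [2] is applied per parent: "an <a> that is the second <a> child of
# its parent". Each div has only one <a>, so nothing matches.
per_parent = root.findall(".//a[@class='select'][2]")

# Stand-in for (//a[@class='select'])[2]: collect every match in document
# order first, then take the second item of the combined list.
all_matches = root.findall(".//a[@class='select']")
second = all_matches[1]

print(len(per_parent))  # 0
print(second.text)      # --None--
```

A full XPath engine, such as the one Selenium uses in the browser, evaluates the parenthesized expression directly, so no list indexing is needed there.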
