BeautifulSoup tag element .contents / .strip() throws TypeError: 'NoneType' object is not callable

Case 1:
<li class="chapters">
<i>In</i>
<i>vitro</i>
blahblah1
<i>in</i>
<i>vitro</i>
blahblah2
View details
</li>
Case 2:
<li class="chapters">
blahblah2
View details
</li>
I have two problems:
Problem 1: when I use .contents[0].strip() to get the blahblahs, case 2 works, but case 1 throws TypeError: 'NoneType' object is not callable.
In case 1, .contents[0] is a tag: <i>In</i>. Is that a NoneType? It's a tag, not None.
Problem 2: how can I handle both cases in one or two lines? My guess is that case 1 exists because of input errors on the website.
By the way, I use BeautifulSoup with lxml to parse the HTML.

Select the a tag, then get the text before it using .previous_sibling:
texts = soup.select('.chapters a')
for t in texts:
    print(t.previous_sibling.strip())
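As for why the TypeError happens: on a Tag, an unknown attribute like .strip is treated by BeautifulSoup as a search for a child tag named strip, which returns None, and calling None raises the TypeError. A minimal sketch of both cases (assuming "View details" is an <a> element, as the selector above implies):

```python
from bs4 import BeautifulSoup

html = """
<li class="chapters">
  <i>In</i> <i>vitro</i>
  blahblah1
  <i>in</i> <i>vitro</i>
  blahblah2
  <a>View details</a>
</li>
<li class="chapters">
  blahblah2
  <a>View details</a>
</li>
"""
soup = BeautifulSoup(html, "html.parser")

# On a Tag, .strip is looked up as a child tag named "strip" -> None,
# so tag.strip() raises TypeError: 'NoneType' object is not callable.
print(soup.select_one(".chapters i").strip)  # None

# Walking back from the <a> works for both cases:
texts = [a.previous_sibling.strip() for a in soup.select(".chapters a")]
print(texts)  # ['blahblah2', 'blahblah2']
```

In case 2 .contents[0] is a NavigableString (a str subclass), so str.strip() works; in case 1 it is a Tag, and the attribute lookup silently produces None instead of a string method.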

Click on parent element based on two conditions in child elements Selenium Driver

Using Selenium 4.8 in .NET 6, I have the following HTML structure to parse.
<ul class="search-results">
<li>
<a href=//to somewhere>
<span class="book-desc">
<div class="book-title">some title</div>
<span class="book-author">some author</span>
</span>
</a>
</li>
</ul>
I need to find and click the right li where the book-title matches my variable input (ideally ignoring case too) AND the book-author also matches my variable input. So far I'm not getting the XPath syntax right. I've tried different variations along these lines:
var matchingBooks = driver.FindElements(By.XPath($"//li[.//span[@class='book-author' and text()='{b.Authors}' and @class='book-title' and text()='{b.Title}']]"));
Then I check whether matchingBooks has a length before clicking the first element. But matchingBooks always comes back empty.
class="book-author" belongs to a span, while class="book-title" belongs to a div child element.
There can also be extra spaces around the text, so it's better to use contains instead of an exact-equality check.
So, instead of "//li[.//span[@class='book-author' and text()='{b.Authors}' and @class='book-title' and text()='{b.Title}']]", please try this:
"//li[.//span[@class='book-author' and contains(text(),'{b.Authors}')] and .//div[@class='book-title' and contains(text(),'{b.Title}')]]"
UPD
The following XPath should work. This is an example of a specific XPath I tried, and it worked: "//li[.//span[@class='book-author' and contains(text(),'anima')] and .//div[@class='book-title' and contains(text(),'Coloring')]]" for a "blood of the fold" search input.
Also, I guess you should click the a element inside the li, not the li itself. So try clicking the following element:
"//li[.//span[@class='book-author' and contains(text(),'{b.Authors}')] and .//div[@class='book-title' and contains(text(),'{b.Title}')]]//a"

Unable to loop through navigable string with BeautifulSoup CSS Selector

I would like to extract the content inside the p tag below.
<section id="abstractSection" class="row">
<h3 class="h4">Abstract<span id="viewRefPH" class="pull-right hidden"></span>
</h3>
<p> Variation of the (<span class="ScopusTermHighlight">EEG</span>), has functional and. behavioural effects in sensory <span class="ScopusTermHighlight">EEG</span>. We can interpret our. Individual <span class="ScopusTermHighlight">EEG</span> text to extract <span class="ScopusTermHighlight">EEG</span> power level.</p>
</section>
A one-line Selenium call, as below,
document_abstract = WebDriverWait(self.browser, 20).until(
    EC.visibility_of_element_located((By.XPATH, '//*[@id="abstractSection"]/p'))).text
can extract easily the p tag content and provide the following output:
Variation of the EEG, has functional and. behavioural effects in sensoryEEG. We can interpret our. Individual EEG text to extract EEG power level.
Nevertheless, I would like to use BeautifulSoup for speed reasons.
The following bs4 code, referring to the CSS selector (i.e., #abstractSection), was tested:
url = r'scopus_offilne_specific_page.html'
with open(url, 'r', encoding='utf-8') as f:
    page_soup = soup(f, 'html.parser')
home = page_soup.select_one('#abstractSection').next_sibling
for item in home:
    for a in item.find_all("p"):
        print(a.get_text())
However, it raises the following error:
AttributeError: 'str' object has no attribute 'find_all'
Also, since Scopus requires a login ID, the problem above can be reproduced using the offline HTML accessible via this link.
May I know where I went wrong? I'd appreciate any insight.
Thanks to this OP, the problem above can apparently be solved simply as below:
document_abstract=page_soup.select('#abstractSection > p')[0].text
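To sketch why the original loop failed and why the one-liner works (HTML abbreviated from the question): .next_sibling of the section is the whitespace text node that follows it, and iterating a string yields plain one-character strings, which have no find_all, hence the AttributeError.

```python
from bs4 import BeautifulSoup

html = """
<section id="abstractSection" class="row">
  <h3 class="h4">Abstract<span id="viewRefPH" class="pull-right hidden"></span></h3>
  <p>Variation of the <span class="ScopusTermHighlight">EEG</span> has functional
  and behavioural effects in sensory <span class="ScopusTermHighlight">EEG</span>.</p>
</section>
"""
page_soup = BeautifulSoup(html, "html.parser")

# next_sibling of the section is the trailing newline text node, not a Tag:
print(type(page_soup.select_one("#abstractSection").next_sibling))  # NavigableString

# Selecting the <p> directly and joining its strings gives the clean text:
abstract = page_soup.select_one("#abstractSection > p").get_text(" ", strip=True)
print(abstract)
```

get_text(" ", strip=True) joins the text nodes around the highlight spans with single spaces, which avoids the run-together words like "sensoryEEG" seen in the Selenium output.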

How to get the content of this html code in selenium

How can I get the value "https://connect.facebook.net...", which is present in the li tag? I tried the options below, but they return an empty string in Selenium.
HTML code as below:
<ul>
<li>Script</li>
<li>https://connect.facebook.net/?v=2.8.1:0</li>
</ul>
Code tried:
driver.findElement(By.xpath("//*/li[2][contains(@text,'')]")).getText()
driver.findElement(By.xpath("//ul/li[2]")).getText()
driver.findElement(By.xpath("//ul/li[2]")).getAttribute("value")
All of the above return an empty value.
I got it using the code below.
driver.findElement(By.xpath("//ul/li[2]")).getAttribute("innerText")
What are the different values that can be expected or used with the .getAttribute method? It would be a great help if anyone could share the details.

Error if I don't check whether {{object.field}} exists

I have a question about checking whether a field exists on an object.
I want to print all the categories a user has, so I'm doing something like this:
<ul *ngIf="user.categories.length > 0" *ngFor="#category of user.categories">
<li>
{{category.name}}
</li>
</ul>
The reason? All the data is printed properly, but I'm getting an error in the web console like this:
Cannot read property 'name' of null
But when I do something like:
<ul *ngIf="user.categories.length > 0" *ngFor="#category of user.categories">
<li *ngIf="category">
{{category.name}}
</li>
</ul>
Then everything is okay.
Am I doing something wrong, or do I have to check this every time? Have you ever had a problem like this?
basic usage
Use the safe-navigation operator
{{category?.name}}
then name is only read when category is not null.
array
This only works for the . (dereference) operator.
For an array you can use
{{records && records[0]}}
See also Angular 2 - Cannot read property '0' of undefined error with context ERROR CONTEXT: [object Object]
async pipe
With async pipe it can be used like
{{(chapters | async)?.length}}
ngModel
With ngModel currently it needs to be split into
[ngModel]="details?.firstname" (ngModelChange)="details.firstname = $event"
See also Data is not appending to template in angular2
*ngIf
An alternative is to wrap that part of the view in *ngIf="data" so it isn't rendered at all until the data is available, which prevents the dereference error.

jQuery selector, IE <P><FORM> selector behavior

If you prepend a FORM element with a P element, elements below the DIV in the example will not be selected!
<P><FORM id=f ...
<INPUT ...>
<DIV><INPUT (this element is not selectable)
</DIV>
</FORM>
No $('#f INPUT') events will fire in IE for the second input above.
Try the testcase at: http://jsfiddle.net/jorese/Bzc7M/
In IE you will get alert=3; remove the P element in front of the FORM element and you get the expected alert=5. In Chrome and Firefox you get alert=5 as expected.
Can somebody explain this?
Your HTML code is not valid; it contains a few errors. The reason some browsers render it anyway is that they tolerate invalid code to some extent, trying to guess what the developer originally meant.
The div element can be used to group almost any elements together. Indeed, it can contain almost any other element, unlike p, which can only contain inline elements.
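The implied-end-tag rule is easy to demonstrate outside the browser. A sketch with lxml's HTML parser, which applies the same error recovery browsers do (a p element is closed as soon as a block element such as div or form starts):

```python
from lxml import html

# The <div> is written "inside" the <p>, but a p element cannot contain
# block content, so the parser closes the <p> before the <div> starts.
doc = html.fromstring("<div id='wrap'><p>before<div>inner</div></p></div>")

p = doc.find(".//p")
print(p.find("div"))                # None: the div is not a child of the p
print(doc.find("div") is not None)  # True: it became a sibling inside the wrapper
```

The same restructuring is what moves the second INPUT out of the selector's scope in the IE example above.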
Use div instead:
http://jsfiddle.net/mshMX/
Sitepoint reference: http://reference.sitepoint.com/html/p
W3 reference http://www.w3.org/TR/html4/sgml/dtd.html
A former StackOverflow question about the same problem: Why <p> tag can't contain <div> tag inside it?