XPath select following until some condition? - scrapy

I'm having trouble selecting the products that follow each order node. Here's the HTML:
<div>
<p>Order ID 1</p>
<p style="display:none"></p>
<p>product 1</p>
<p>Order ID 2</p>
<p style="display:none"></p>
<p>product 1</p>
<p>product 2</p>
<p>Order ID 3</p>
<p style="display:none"></p>
<p>product 1</p>
<p>product 2</p>
<p>Order ID 4</p>
<p style="display:none"></p>
<p>product 1</p>
<p>product 2</p>
<p>product 3</p>
<p>Order ID 5</p>
<p style="display:none"></p>
<p>product 1</p>
</div>
I selected the Order ID elements with the following expression:
//div/p[@style="display:none"]/preceding-sibling::p[1]
Is there any way to select the products? The expression I tried:
//div/p[@style="display:none"]/following::p[not(@style="display:none")]
Result:
<p>product 1</p>
<p>Order ID 2</p>
<p>product 1</p>
<p>product 2</p>
<p>Order ID 3</p>
<p>product 1</p>
<p>product 2</p>
<p>Order ID 4</p>
<p>product 1</p>
<p>product 2</p>
<p>product 3</p>
<p>Order ID 5</p>
<p>product 1</p>
How can I exclude the Order ID elements?

You can filter on the text() content, as follows:
//div/p[contains(text(), 'product')]/text()
or
//div/p[not(contains(text(), 'Order'))]/text()
Using Scrapy in Python, the output of extract() is:
['product 1', 'product 1', 'product 2', 'product 1', 'product 2', 'product 1', 'product 2', 'product 3', 'product 1']

I. Use:
/div/p[@style='display:none']
/following-sibling::p[not(@style)]
[not(following-sibling::p[1][@style='display:none'])]
II. Explanation
In simple words this XPath expression instructs the XPath engine to do the following:
Take every p child of the top-level div whose style attribute equals "display:none", and select those of its following p siblings that have no style attribute themselves and are not immediately followed by a p sibling whose style attribute equals "display:none".
III. XSLT - based verification:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="/">
<xsl:copy-of select=
"/div/p[@style='display:none']
/following-sibling::p[not(@style)]
[not(following-sibling::p[1][@style='display:none'])]
"/>
</xsl:template>
</xsl:stylesheet>
When this transformation is applied on the provided XML document:
<div>
<p>Order ID 1</p>
<p style="display:none"></p>
<p>product 1</p>
<p>Order ID 2</p>
<p style="display:none"></p>
<p>product 1</p>
<p>product 2</p>
<p>Order ID 3</p>
<p style="display:none"></p>
<p>product 1</p>
<p>product 2</p>
<p>Order ID 4</p>
<p style="display:none"></p>
<p>product 1</p>
<p>product 2</p>
<p>product 3</p>
<p>Order ID 5</p>
<p style="display:none"></p>
<p>product 1</p>
</div>
The XPath expression is evaluated and its (wanted, correct) result is copied to the output:
<p>product 1</p>
<p>product 1</p>
<p>product 2</p>
<p>product 1</p>
<p>product 2</p>
<p>product 1</p>
<p>product 2</p>
<p>product 3</p>
<p>product 1</p>
[Screenshot: this XPath expression evaluated in the XPath Visualizer]

You can select the p elements whose next p is not the hidden marker. This excludes every Order ID element, because each one is immediately followed by a p with that style.
scrapy shell
In [1]: from scrapy import Selector
In [2]: html=""" <div>
...: <p>Order ID 1</p>
...: <p style="display:none"></p>
...: <p>product 1</p>
...:
...: <p>Order ID 2</p>
...: <p style="display:none"></p>
...: <p>product 1</p>
...: <p>product 2</p>
...:
...: <p>Order ID 3</p>
...: <p style="display:none"></p>
...: <p>product 1</p>
...: <p>product 2</p>
...:
...: <p>Order ID 4</p>
...: <p style="display:none"></p>
...: <p>product 1</p>
...: <p>product 2</p>
...: <p>product 3</p>
...:
...: <p>Order ID 5</p>
...: <p style="display:none"></p>
...: <p>product 1</p>
...:
...: </div>"""
In [3]: sel = Selector(text=html)
In [4]: sel.xpath('//div/p[@style="display:none"]/following::p[not(following::p[1][@style="display:none"])]/text()').getall()
Out[4]:
['product 1',
'product 1',
'product 2',
'product 1',
'product 2',
'product 1',
'product 2',
'product 3',
'product 1']
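If the goal is to keep each order's products grouped instead of flattened into one list, the same "follow the siblings until the next marker" idea can be sketched in plain Python. This is only a minimal sketch: it uses the standard library's xml.etree.ElementTree rather than Scrapy, the helper name group_products and the shortened sample markup are assumptions made for illustration, and it assumes the fragment parses as well-formed XML.

```python
import xml.etree.ElementTree as ET

# Shortened sample in the same shape as the question's markup (assumption).
SAMPLE = """<div>
<p>Order ID 1</p>
<p style="display:none"></p>
<p>product 1</p>
<p>Order ID 2</p>
<p style="display:none"></p>
<p>product 1</p>
<p>product 2</p>
</div>"""


def group_products(markup):
    """Map each order label to the product labels that follow it."""
    paragraphs = ET.fromstring(markup).findall("p")
    orders = {}
    current = None
    for index, p in enumerate(paragraphs):
        if p.get("style") == "display:none":
            continue  # the hidden marker itself carries no data
        nxt = paragraphs[index + 1] if index + 1 < len(paragraphs) else None
        if nxt is not None and nxt.get("style") == "display:none":
            # A visible <p> directly before a hidden marker is an order label.
            current = p.text
            orders[current] = []
        elif current is not None:
            # Any other visible <p> belongs to the most recent order.
            orders[current].append(p.text)
    return orders


result = group_products(SAMPLE)
print(result)
```

The loop applies the same test as the XPath predicates above (is the next p the hidden marker?), but it remembers which order is current, which a single flat node-set cannot express.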

Related

I am learning Selenium on my own and have the following interview question: write an XPath using sibling

I have the following question from an interview: write an XPath to access the Hotels link by traversing from the Flights link.
I have highlighted the Flights link in the browser's developer tools, but I am not sure how to proceed from there.
For this HTML:
<ul class="makeFlex font12">
<li data-cy="menu_Flights" class="menu_Flights">
<a href="https://www.makemytrip.com/flights/" class="active makeFlex hrtlCenter column
">
<span class="chNavIcon appendBottom2 chSprite chFlights active"></span>
<span class="false chNavText darkGreyText">
<!-- --> <!-- -->Flights
</span>
</a>
</li>
<li data-cy="menu_Hotels" class="menu_Hotels">
<a href="https://www.makemytrip.com/hotels/" class="makeFlex hrtlCenter column">
<span class="chNavIcon appendBottom2 chSprite chHotels"></span>
<span class="false chNavText darkGreyText">
<!-- --> <!-- -->Hotels
</span>
</a>
</li>
<li data-cy="menu_Homestays" class="menu_Homestays">
<a href="https://www.makemytrip.com/homestays/" class="makeFlex hrtlCenter column">
<span class="chNavIcon appendBottom2 chSprite chHomestays"></span>
<span class="false chNavText darkGreyText">
<!-- --> <!-- -->Homestays
</span>
</a>
</li>
<li data-cy="menu_Holidays" class="removeItemMargin menu_Holidays">
<a href="https://www.makemytrip.com/holidays-india/" class="makeFlex hrtlCenter column">
<span class="chNavIcon appendBottom2 chSprite chHolidays"></span>
<span class="false chNavText darkGreyText">
<!-- --> <!-- -->Holiday Packages
</span>
</a>
</li>
<li data-cy="menu_Trains" class="menu_Trains">
<a href="https://www.makemytrip.com/railways/" class="makeFlex hrtlCenter column">
<span class="chNavIcon appendBottom2 chSprite chTrains"></span>
<span class="false chNavText darkGreyText">
<!-- --> <!-- -->Trains
</span>
</a>
</li>
<li data-cy="menu_Buses" class="menu_Buses">
<a href="https://www.makemytrip.com/bus-tickets/" class="makeFlex hrtlCenter column">
<span class="chNavIcon appendBottom2 chSprite chBuses"></span>
<span class="false chNavText darkGreyText">
<!-- --> <!-- -->Buses
</span>
</a>
</li>
<li data-cy="menu_Cabs" class="menu_Cabs">
<a href="https://www.makemytrip.com/cabs/" class="makeFlex hrtlCenter column">
<span class="chNavIcon appendBottom2 chSprite chCabs"></span>
<span class="false chNavText darkGreyText">
<!-- --> <!-- -->Cabs
</span>
</a>
</li>
<li data-cy="menu_Visa" class="menu_Visa">
<a href="https://www.makemytrip.com/visa/" class="makeFlex hrtlCenter column">
<span class="chNavIcon appendBottom2 chSprite chVisa"></span>
<span class="false chNavText darkGreyText">
<!-- --> <!-- -->Visa
</span>
</a>
</li>
<li data-cy="menu_Charters" class="menu_Charters">
<a href="https://www.makemytrip.com/charter-flights/" class="makeFlex hrtlCenter column">
<span class="chNavIcon appendBottom2 chSprite chCharterFlights"></span>
<span class="reduceMenuSpacing chNavText darkGreyText">
<!-- --> <!-- -->Charter Flights
</span>
</a>
</li>
<li data-cy="menu_Activities" class="menu_Activities">
<a href="https://www.makemytrip.com/activities/" class="makeFlex hrtlCenter column">
<span class="chNavIcon appendBottom2 chSprite chActivities"></span>
<span class="false chNavText darkGreyText">
<!-- --> <!-- -->Activities
</span>
</a>
</li>
</ul>
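A simple XPath to reach the Hotels entry by traversing from the Flights link:
//a[contains(@href,'/flights/')]/../following-sibling::li
Explanation:
//a[contains(@href,'/flights/')]
targets the Flights link, /.. steps up to its parent li, and following-sibling::li then selects the li elements that come after it. This matches every later menu item, with Hotels first in document order; to select only the Hotels link itself, use //a[contains(@href,'/flights/')]/../following-sibling::li[1]/a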
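To make the traversal concrete, here is a rough standard-library sketch that mimics the same three steps (find the Flights link, step up to its li, take the next li) without Selenium. ElementTree's limited XPath has no contains() or following-sibling axis, so the steps are written out by hand. The helper name next_link_after and the trimmed-down menu markup are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

# Trimmed version of the menu from the question (assumption: spans removed).
MENU = """<ul class="makeFlex font12">
  <li data-cy="menu_Flights"><a href="https://www.makemytrip.com/flights/">Flights</a></li>
  <li data-cy="menu_Hotels"><a href="https://www.makemytrip.com/hotels/">Hotels</a></li>
  <li data-cy="menu_Homestays"><a href="https://www.makemytrip.com/homestays/">Homestays</a></li>
</ul>"""


def next_link_after(markup, href_fragment):
    """Return the href of the link in the <li> right after the matching one."""
    items = list(ET.fromstring(markup))  # the <li> children, in document order
    for index, li in enumerate(items):
        link = li.find("a")
        # Equivalent of //a[contains(@href, href_fragment)]
        if link is not None and href_fragment in link.get("href", ""):
            following = items[index + 1:]  # following-sibling::li
            if following:
                sibling_link = following[0].find("a")  # li[1]/a
                if sibling_link is not None:
                    return sibling_link.get("href")
    return None


hotels_href = next_link_after(MENU, "/flights/")
print(hotels_href)
```

In Selenium itself the XPath string would be passed to the driver directly; the sketch only shows what each step of the expression selects.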

Get text with quotes inside one tag using BeautifulSoup

I am trying to parse web page using BeautifulSoup.
Case 1:
<div class="a">
<div class="b">abc def</div>
<div class="c">123 456</div>
</div>
Case 2:
<div class="a">
<div class="b">
"abc "
"def"
</div>
<div class="c">123 456</div>
</div>
I want to get the text from class c using this code:
c = soup.find('div', class_='b', text='abc def').next_sibling.text
In Case 1 it works well. But in Case 2 it doesn't work. For Case 2 I also tried:
c = soup.find('div', class_='b', text='"abc ""def"').next_sibling.text
In both cases
soup.find('div', class_='b').text
gives me the same value:
abc def
What is the right way to work with Case 2?
[EDIT #1]
I need to do this way because there are several div with the same class:
<div class="a">
<div class="b">abc def1</div>
<div class="c">123</div>
</div>
<div class="a">
<div class="b">abc def2</div>
<div class="c">456</div>
</div>
<div class="a">
<div class="b">
"abc "
"def3"
</div>
<div class="c">789</div>
</div>
[EDIT #2]
I saved response.text to a file and saw that it looks like this:
<div class="b">abc <!-- -->def3</div>
But in Chrome it looks like:
<div class="b">
"abc "
"def3"
</div>
Also, I can't get text by re.compile if the text inside tag is like:
<div class="b">abc m<sup>2</sup></div>
You can find the next element by tag name and then print its value.
You can also use the re module to search the text.
Here are some examples.
Example 1:
import bs4
htmldoc='''<html><div class="a">
<div class="b">
"abc "
"def"
</div>
<div class="c">123 456</div>
</div></html>'''
soup = bs4.BeautifulSoup(htmldoc, 'html.parser')
c = soup.find('div', class_='b').findNext('div').contents[0]
print(c)
output:
123 456
Example 2:
import bs4
import re
htmldoc='''<html><div class="a">
<div class="b">
"abc "
"def"
</div>
<div class="c">123 456</div>
</div></html>'''
soup = bs4.BeautifulSoup(htmldoc, 'html.parser')
d = soup.find('div', text=re.compile('def')).findNext('div').contents[0]
print(d)
output:
123 456
Example 3:
import bs4
htmldoc='''<html><div class="a">
<div class="b">abc def1</div>
<div class="c">123</div>
</div>
<div class="a">
<div class="b">abc def2</div>
<div class="c">456</div>
</div>
<div class="a">
<div class="b">
"abc "
"def3"
</div>
<div class="c">789</div>
</div></html>'''
soup = bs4.BeautifulSoup(htmldoc, 'html.parser')
c = soup.find_all('div', class_='b')
for d in c:
    text = d.findNext('div').contents[0]
    print(text)
Output:
123
456
789
Example 4:
import bs4
htmldoc='''<html><div class="a">
<div class="b">abc def1</div>
<div class="c">123</div>
</div>
<div class="a">
<div class="b">abc def2</div>
<div class="c">456</div>
</div>
<div class="a">
<div class="b">
"abc "
"def3"
</div>
<div class="c">789</div>
</div></html>'''
soup = bs4.BeautifulSoup(htmldoc, 'html.parser')
d=soup.find(lambda tag:tag.name=="div" and "abc " in tag.text and "def3" in tag.text).findNext('div').findNext('div').contents[0]
print(d)
output:
789
Example 5:
import bs4
htmldoc='''<html><div class="a">
<div class="b">abc def1</div>
<div class="c">123</div>
</div>
<div class="a">
<div class="b">abc def2</div>
<div class="c">456</div>
</div>
<div class="a">
<div class="b">
"abc "
"def3"
</div>
<div class="c">789</div>
</div></html>'''
soup = bs4.BeautifulSoup(htmldoc, 'html.parser')
c = soup.find_all('div', class_='b')
for d in c:
    if 'abc ' in d.text and 'def3' in d.text:
        textc = d.findNext('div').contents[0]
        print(textc)
Hope this helps.
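The saved response in EDIT #2 shows why the exact-text match fails: the label is split into two text nodes by a comment, so it only compares equal once the pieces are concatenated. As a hedged illustration of that idea, here is a small sketch that uses only the standard library's html.parser instead of BeautifulSoup; the class name PairCollector and the sample document are assumptions, and it only handles the flat div.b / div.c layout from the question.

```python
from html.parser import HTMLParser

# Sample shaped like the saved response from EDIT #2 (assumption).
DOC = """
<div class="a"><div class="b">abc def1</div><div class="c">123</div></div>
<div class="a"><div class="b">abc def2</div><div class="c">456</div></div>
<div class="a"><div class="b">abc <!-- -->def3</div><div class="c">789</div></div>
"""


class PairCollector(HTMLParser):
    """Pair the text of each div.b with the text of the div.c after it."""

    def __init__(self):
        super().__init__()
        self.current = None  # "b" or "c" while inside such a div, else None
        self.buffer = []     # text pieces seen inside the current div
        self.last_b = None   # most recent div.b label
        self.pairs = {}

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "div" and cls in ("b", "c"):
            self.current = cls
            self.buffer = []

    def handle_data(self, data):
        # Comments never reach handle_data, so "abc " and "def3" are simply
        # collected one after the other.
        if self.current:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "div" and self.current:
            text = " ".join("".join(self.buffer).split())  # normalize spaces
            if self.current == "b":
                self.last_b = text
            elif self.last_b is not None:
                self.pairs[self.last_b] = text
            self.current = None


parser = PairCollector()
parser.feed(DOC)
pairs = parser.pairs
print(pairs["abc def3"])
```

Because inline tags such as sup contribute their text to the same buffer, a label like abc m<sup>2</sup> would be collected as "abc m2" under the same approach.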

How to make a QWeb report widget?

I'm trying to create a simple Qweb widget. My code is like this
<template id="contact_name">
<address t-ignore="true" class="mb0" itemscope="itemscope" itemtype="http://schema.org/Signature">
<div t-att-class="'name' not in fields and 'css_non_editable_mode_hidden'">
<span itemprop="name" t-esc="name"/>
</div>
</address>
</template>
Then I called the widget like this
<address t-field="o.partner_id" t-field-options='{"widget": "contact_name", "fields": ["name"], "no_marker": true}' />
It did not print the name; instead it printed the object res.partner(703,)
How can I print the name? Is it not enough to use <span itemprop="name" t-esc="name"/>?
<div class="page">
<div class="oe_structure"/>
<div class="col-xs-6 pull-left">
<h2 style="color:red">
<span>Plan Order :
<span style="color:Red" t-field='doc.name'/>
</span>
</h2>
</div>
<div colspan="4" class="col-xs-6 text-right">
<span>
<img t-att-src="'/report/barcode/?type=%s&amp;value=%s&amp;width=%s&amp;height=%s' % ('Code128', doc.name, 500, 50)"
style="width:100%;height:50px"/>
</span>
</div>

Bootstrap tab content won't open after clicking tab link

The code looks fine to me; I basically used the example code from the docs. jQuery is also referenced before bootstrap.js. What could be the issue? When I click any tab that is not active, only the first tab's content is displayed.
<section id="how-it-works">
<div class="container">
<div class="wizard">
<div class="wizard-inner">
<div class="connecting-line"></div>
<ul class="nav nav-tabs">
<li class="active">
<a href="#tab1" data-toggle="tab" aria-controls="tab1" role="tab" title="Step 1">
<span class="round-tab">
<i class="glyphicon glyphicon-folder-open"></i>
</span>
</a>
</li>
<li>
<a href="#tab2" data-toggle="tab" aria-controls="tab2" role="tab" title="Step 2">
<span class="round-tab">
<i class="glyphicon glyphicon-pencil"></i>
</span>
</a>
</li>
<li>
<a href="#tab3" data-toggle="tab" aria-controls="tab3" role="tab" title="Step 3">
<span class="round-tab">
<i class="glyphicon glyphicon-picture"></i>
</span>
</a>
</li>
<li>
<a href="#tab4" data-toggle="tab" aria-controls="tab4" role="tab" title="That's It!">
<span class="round-tab">
<i class="glyphicon glyphicon-ok"></i>
</span>
</a>
</li>
</ul>
</div>
<div class="tab-content">
<div class="tab-pane active" id="#tab1" role="tabpanel">
<h3>Step 1</h3>
<p>This is step 1</p>
</div>
<div class="tab-pane" id="#tab2" role="tabpanel">
<h3>Step 2</h3>
<p>This is step 2</p>
</div>
<div class="tab-pane" id="#tab3" role="tabpanel">
<h3>Step 3</h3>
<p>This is step 3</p>
</div>
<div class="tab-pane" id="#tab4" role="tabpanel">
<h3>Step 4</h3>
<p>That's It!</p>
</div>
</div>
</div>
</div>
</section>
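Try removing the # from the id of each pane under tab-content. The href="#tab4" on the link is used as a selector for the element whose id is tab4, so an id that literally contains # never matches.
id="#tab4" becomes id="tab4"
Also, you tagged Bootstrap 3, but the code you are using looks like it is from Bootstrap 4.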
https://v4-alpha.getbootstrap.com/components/navs/
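A mismatch like this is easy to miss by eye, so a quick script can flag it. The following is a minimal sketch using only Python's standard html.parser; the names TabTargetChecker and missing_panes and the two one-line sample fragments are assumptions for illustration, and it only checks links that carry data-toggle="tab".

```python
from html.parser import HTMLParser


class TabTargetChecker(HTMLParser):
    """Collect tab link targets and every id that appears in the markup."""

    def __init__(self):
        super().__init__()
        self.targets = []  # href values of data-toggle="tab" links
        self.ids = set()   # all id attribute values

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if "id" in attributes:
            self.ids.add(attributes["id"])
        if tag == "a" and attributes.get("data-toggle") == "tab":
            self.targets.append(attributes.get("href", ""))


def missing_panes(markup):
    """Return the tab hrefs that have no element with the matching id."""
    checker = TabTargetChecker()
    checker.feed(markup)
    return [href for href in checker.targets
            if href.lstrip("#") not in checker.ids]


BROKEN = '<a href="#tab1" data-toggle="tab"></a><div class="tab-pane" id="#tab1"></div>'
FIXED = '<a href="#tab1" data-toggle="tab"></a><div class="tab-pane" id="tab1"></div>'

print(missing_panes(BROKEN))
print(missing_panes(FIXED))
```

An empty list means every tab link points at an existing pane; any href that is returned has no pane with the corresponding id.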

Bootstrap centering odd thumbnail

I'm trying to center some generated thumbnails in a row. The problem is that I don't know how many thumbnails I will get, so I need a flexible solution. I tried the center-block class, but it isn't working.
Here is a snippet that shows an example of my problem (try to center the 4th thumbnail under the first row):
http://www.bootply.com/663prFgvlL
And my HTML below :
<div class="col-md-3 col-sm-6 thumb">
<div class="view view-first">
<a href="/photologue/photo/mdl-5/">
<img class="img-portfolio img-responsive" title="MDL 5" src="/media/photologue/photos/cache/MDL_6_thumbnail.jpg" alt="portfolio MDL 5">
</a>
<div class="mask">
<h4>MDL 5</h4>
<p class="muted"><small>Publiée le 13 avril 2016 16:53</small></p>
</div>
</div>
</div>
<div class="col-md-3 col-sm-6 thumb">
<div class="view view-first">
<a href="/photologue/photo/mdl-4/">
<img class="img-portfolio img-responsive" title="MDL 4" src="/media/photologue/photos/cache/MDL_5_thumbnail.jpg" alt="portfolio MDL 4">
</a>
<div class="mask">
<h4>MDL 4</h4>
<p class="muted"><small>Publiée le 13 avril 2016 16:52</small></p>
</div>
</div>
</div>
<div class="col-md-3 col-sm-6 thumb">
<div class="view view-first">
<a href="/photologue/photo/mdl-3/">
<img class="img-portfolio img-responsive" title="MDL 3" src="/media/photologue/photos/cache/MDL_4_thumbnail.jpg" alt="portfolio MDL 3">
</a>
<div class="mask">
<h4>MDL 3</h4>
<p class="muted"><small>Publiée le 13 avril 2016 16:51</small></p>
</div>
</div>
</div>
<div class="col-md-3 col-sm-6 thumb">
<div class="view view-first">
<a href="/photologue/photo/mdl-2/">
<img class="img-portfolio img-responsive" title="MDL 2" src="/media/photologue/photos/cache/MDL_3_thumbnail.jpg" alt="portfolio MDL 2">
</a>
<div class="mask">
<h4>MDL 2</h4>
<p class="muted"><small>Publiée le 13 avril 2016 16:51</small></p>
</div>
</div>
</div>
<div class="col-md-3 col-sm-6 thumb">
<div class="view view-first">
<a href="/photologue/photo/mdl-1/">
<img class="img-portfolio img-responsive" title="MDL 1" src="/media/photologue/photos/cache/MDL_1_thumbnail.jpg" alt="portfolio MDL 1">
</a>
<div class="mask">
<h4>MDL 1</h4>
<p class="muted"><small>Publiée le 13 avril 2016 16:50</small></p>
</div>
</div>
</div>
<div class="col-md-3 col-sm-6 thumb">
<div class="view view-first">
<a href="/photologue/photo/plan-mdl/">
<img class="img-portfolio img-responsive" title="Plan MDL" src="/media/photologue/photos/cache/Plan_amenagement_couleurs_thumbnail.jpg" alt="portfolio Plan MDL">
</a>
<div class="mask">
<h4>Plan MDL</h4>
<p class="muted"><small>Publiée le 13 avril 2016 16:50</small></p>
</div>
</div>
</div>
I only want to center the last thumbnail in the grid. I think this is pretty simple, but I'm stuck!
Thanks in advance for your help!
OK, I should have searched a little more before asking!
I found my answer here:
http://www.minimit.com/articles/solutions-tutorials/bootstrap-3-responsive-centered-columns
You only have to do this:
<div class="container">
<div class="row row-centered">
<div class="col-xs-6 col-centered"></div>
<div class="col-xs-6 col-centered"></div>
<div class="col-xs-3 col-centered"></div>
<div class="col-xs-3 col-centered"></div>
<div class="col-xs-3 col-centered"></div>
</div>
</div>
And CSS :
/* centered columns styles */
.row-centered {
text-align:center;
}
.col-centered {
display:inline-block;
float:none;
/* reset the text-align */
text-align:left;
/* inline-block space fix */
margin-right:-4px;
}