How to extract full prices with scrapy? - scrapy

Hi, I am trying to scrape an e-commerce page, but I can't get the prices.
The page contains these lines:
<span class="price">255,<sup>99</sup>€</span>
<span class="price">255 €</span>
But I can't extract the full price into one line.
I tried:
response.xpath('//span[@class="price"]/text()').extract()
But it ignores the text in the <sup> tag...
What am I doing wrong? Please help.

You need to add another slash before text(), so that it selects ALL descendant text nodes, not just the direct children of the span.
response.xpath('//span[@class="price"]//text()').extract()
Text='255,'
Text='99'
Text='€'
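To see why the single slash misses the <sup> content, here is a minimal sketch that uses Python's standard-library ElementTree as a stand-in for Scrapy's selectors (an assumption made only so the example runs without Scrapy; the variable names are hypothetical):

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the first price element from the question.
span = ET.fromstring('<span class="price">255,<sup>99</sup>€</span>')

# Roughly what /text() selects: text nodes that are direct children of <span>.
direct_text = [t for t in [span.text, *(child.tail for child in span)] if t]

# Roughly what //text() selects: every descendant text node, in document order.
all_text = list(span.itertext())

print(direct_text)
print(all_text)
```

Here direct_text is ['255,', '€'] and all_text is ['255,', '99', '€'], which matches the three text nodes listed above.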

You should use a double slash instead of a single one.
response.xpath('//span[@class="price"]//text()').extract()
This statement returns all text under the specified tag as a list.
Note that the returned list may contain useless elements, such as empty strings or carriage returns.
So you can use a regex if you want to extract only the price information.
response.xpath('//span[@class="price"]//text()').re(r'[\d.,]+')
The currency symbol is dropped:
['255,', '99', '255']
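Finally, if you want to get 255.99 from the page:
''.join(response.xpath('//span[@class="price"][1]//text()').re(r'[\d.,]+')).replace(",", ".")
In practice you would select all products first and then extract the price relative to each one. Note the leading dot in the inner XPath, which keeps the search inside the current product instead of the whole page.
Final code:
products = response.xpath('//*[@class="catalog-table"]//td')
for prod in products:
    price = ''.join(prod.xpath('.//span[@class="price"][1]//text()').re(r'[\d.,]+')).replace(",", ".")
    print(price)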
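The join, regex and replace steps above can be tried without Scrapy. The following is a small sketch that works on plain lists of text nodes; the helper name normalize_price is hypothetical and not part of Scrapy:

```python
import re


def normalize_price(text_nodes):
    """Join the numeric fragments of a price and use a dot as the decimal separator."""
    fragments = []
    for node in text_nodes:
        # Keep only runs of digits, dots and commas; currency symbols are skipped.
        fragments.extend(re.findall(r'[\d.,]+', node))
    return ''.join(fragments).replace(',', '.')


print(normalize_price(['255,', '99', '€']))
print(normalize_price(['255 €']))
```

This is only a sketch for the two price formats shown in the question. A price with a thousands separator, such as 1.255,99, would need extra handling, because every comma is simply turned into a dot.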

Check the source HTML. It contains a meta element with itemprop="price" whose content attribute holds the price.
I searched for an answer to this same question for a whole day, and this one worked perfectly:
response.xpath('//meta[@itemprop="price"]/@content').get()

Related

Replacing commas with spaces for classes in Shopify

I am using product tags as a way to filter what users see depending on if they are commercial or consumer customers. I have managed to get the tags into the class of each product displayed in a collection using the following code. The {{itemTags}} is being used to pass the value over into the HTML.
itemHtml = itemHtml.replace(/{{itemTags}}/g, data.tags);
<div class="{{itemTags}}">
The only issue is that all of the tags end up in a single string, separated by commas with no spaces. This results in a class like the following, as an example.
class="accessories,Consumer,DJI Air 2S,live,pre-order,preorder"
Is it possible to remove the commas and add spaces in between each tag?
Use the replace filter in Liquid:
itemHtml = itemHtml.replace(/{{itemTags | replace: "," , " " }}/g, data.tags);

Extract portion of HTML from website?

I'm trying to use VBA in Excel to navigate a site with Internet Explorer and download an Excel file for each day.
After looking through the HTML code of the site, it looks like each day's page has a similar structure, but one portion of the link seems completely random. This random part stays constant, however, and does not change each time you load the page.
The following portion of the HTML code contains the unique string:
<a href="#" onClick="showZoomIn('222698519','b1a9134c02c5db3c79e649b7adf8982d', event);return false;
The part starting with "b1a" is what is used in the website link. Is there any way to extract this part of the page and assign it as a variable that I then can use to build my website link?
Since you don't show your code, I will answer in general terms:
1) Get all the link elements (<a>) with Set allLinks = ie.document.getElementsByTagName("a"). This returns a collection containing every link in the document.
2) Find the specific link containing the information you want. Let's imagine it's the 4th one (you can inspect the properties of each link to check which one it is, in case the position is dynamic):
Set myLink = allLinks(3) '<- 4th : index = 3 (starts from zero)
3) You get your token with a simple split function:
myToken = Split(myLink.onClick, "'")(3)
Of course, you can be more concise if the position of the link containing the token is always the same, for example always the 4th link:
myToken = Split(ie.document.getElementsByTagName("a")(3).onClick,"'")(3)
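The split logic itself is easy to check outside VBA. Here is a minimal sketch in Python, assuming the onClick attribute has exactly the text shown in the question (the variable names are hypothetical):

```python
# The onClick text from the question, copied as a plain string.
onclick = "showZoomIn('222698519','b1a9134c02c5db3c79e649b7adf8982d', event);return false;"

# Splitting on the single quote puts the second quoted argument at index 3:
# index 0 is "showZoomIn(", index 1 is the first argument, index 2 is the comma between them.
token = onclick.split("'")[3]

print(token)
```

If the site ever changes the order or number of arguments, the index would have to change too, so checking the result (for example its length) before building the link may be worthwhile.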

@DbLookup and formatting on web

I have been developing a web application using Domino, in which I do a DbLookup on a field from the Notes client. This works fine, but the formatting of the value is lost when it is used on the web.
For example, in the Lotus Notes client the field value is formatted as below:
I am one, I am two, I am one , I am two, labbblallalalalalalalalalalalalalalalalalalaallllal
Labbbaalalalallalalalalalaalallaal
Hello there, labblalalallalalalllaalalalalalalalalalalalalalalalalalalalalalalala
Now when I retrieve the value of the field on the web, the second value follows immediately after the first, and so on. I was expecting a line feed between them, which is not happening.
The field above is a multi-valued field. On the web I have used computed text, which does the DbLookup from the Notes client.
Please help with what else I could try, or suggest an alternative solution for this case.
Thanks
HD
Your multi-valued field has display options associated with it and the Notes client honors those. Obviously, your options are set up to display entries separated by newlines.
The computed text that you are using for the web does not have options like that, and the field options are irrelevant because you aren't displaying the field. Your code has to insert the @NewLines. That's pretty easy because @DbLookup returns a list, and if you concatenate a list and a scalar, the scalar will be appended to each element of the list. (Look at the third example under "concatenation, pairwise" in the documentation to see what I mean.)
The way you've worded your question is a little unclear to me, but what you need in your computed text formula is either something like this:
list := @DbLookup(etc., etc.);
list + @NewLine;
Or something like this:
multiValueFieldContainingListWithDbLookupResult + @NewLine;
I used @Implode(Dblookupreturnedvalue; "");
thanks All :)

In Shopify, am I able to change how the variant title is displayed?

My variant title is "pack/4 bottles:4/PK".
"4/PK" is needed so the shipping company can identify the specific item.
However, it looks ugly when "4/PK" is displayed on the page.
Is there a way to hide it? Which Liquid template should I edit?
Should I use
{{variant.title|move:'4/PK'}}
Where should I put this code?
While this sounds more like something that should be assigned as an option for your variants rather than put in the title, you can hide the part of the variant title that you don't want by using split and first.
https://help.shopify.com/themes/liquid/filters/string-filters#split
split divides a string (in this case your variant.title) into an array, using a delimiter that you specify.
So you could do something along the lines of
{{variant.title | split: ':' | first }}
In your case, the output of the above would be: pack/4 bottles.
As for which Liquid templates you will need to edit, it will depend on your store. However, some common places would be:
product.liquid
cart.liquid
I highly recommend you read the Shopify Liquid documentation.
Also, make sure to back up your theme before making any Liquid changes that you are unsure of.
Hope this helps!

Netsuite PDF Templating: get number of pages as attribute

I am templating PDFs in Netsuite using Freemarker and I want to display the footer only on the last page. I have been doing some research but couldn't find a solution (it looks like the environment does not allow me to include or import libs), so I thought that comparing the current page number with the total number of pages in an if tag would be a nice and easy workaround. I already know how to display the numbers by using the <pagenumber/> and <totalpages/> tags, but I still cannot get them as values so that I can use them like this:
<#if (pagenumber == totalpages) >
... footer html...
</#if>
Any ideas on how or where I can get those values from?
The approach you are trying won't work, because you are mixing BFO and Freemarker syntax. Netsuite uses two different "engines" to process PDF Templates. The first step is Freemarker, which merges the record fields with your template and produces an XML file, which is then converted by BFO into a PDF file. The <totalpages/> element is meaningless to Freemarker, as it is only converted into a number by BFO later.
Unfortunately, adding a footer to only the last page of a document is currently not supported by BFO, as per the BFO FAQ:
At the moment we do not have a facility for explicitly assigning a
footer or header to the last page in a document when the number of
pages is unknown.
You CAN add it after a page break, with the page break placed at the end of the body:
<pbr footer="nlfooter" footer-height="25%"></pbr>
</body>
The issue here is that even a one-page output will come out as at least 2 pages, because this always ADDS a page for the disclaimer / footer.