Schema.org price and SPARQL

I have the following HTML with Schema.org RDFa:
<li class="product" typeof="s:Product">
<a href="item.php?id=227">
<img property="s:img" src="http://www.test.com/pictures/227.jpg"></a>
<h2 property="s:name">Example name</h2>
<div property="s:brand">Examplebrand</div>
<div property="s:model">Examplemodel</div>
<div rel="s:offers">
<div class="description" typeof="s:Offer">
<div property="s:price">79,00</div>
<div property="s:priceCurrency" content="EUR"></div>
</div>
</div>
<div property="s:productID" content="NQ==">
<div rel="s:seller">
<div class="description" typeof="s:Organization">
<div property="s:name">Shop1</div>
</div>
</div>
</div>
</li>
After loading the page I want to use SPARQL to select all the products which are (for example) > €70,00.
But this only gives back NULL:
PREFIX s: <http://schema.org/>
SELECT ?a ?price
WHERE {
?a s:price ?price.
FILTER (?price > 70).
}
I think it's not interpreting the price as a number (float). What am I doing wrong?

The XHTML fragment alone isn't enough for us to extract the corresponding RDF data from the RDFa, so I've filled it out into the complete document below. Note that I've made the s prefix http://schema.org/, based on your SPARQL query. If those prefixes don't line up in your data, that would be an easy place for things to break down.
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.1//EN" "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-2.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
version="XHTML+RDFa 1.1"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:s="http://schema.org/"
xsi:schemaLocation="http://www.w3.org/1999/xhtml
http://www.w3.org/MarkUp/SCHEMA/xhtml-rdfa-2.xsd"
lang="en"
xml:lang="en">
<head><title>Some title</title></head>
<body>
<li class="product" typeof="s:Product">
<a href="item.php?id=227">
<img property="s:img" src="http://www.test.com/pictures/227.jpg"/></a>
<h2 property="s:name">Example name</h2>
<div property="s:brand">Examplebrand</div>
<div property="s:model">Examplemodel</div>
<div rel="s:offers">
<div class="description" typeof="s:Offer">
<div property="s:price">79,00</div>
<div property="s:priceCurrency" content="EUR"></div>
</div>
</div>
<div property="s:productID" content="NQ==">
<div rel="s:seller">
<div class="description" typeof="s:Organization">
<div property="s:name">Shop1</div>
</div>
</div>
</div>
</li>
</body>
</html>
Putting that into the W3C's RDFa distiller, we can get this RDF:
<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
xmlns:s="http://schema.org/"
xmlns:xhv="http://www.w3.org/1999/xhtml/vocab#"
xmlns:xml="http://www.w3.org/XML/1998/namespace"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>
<rdf:Description rdf:about="http://www.test.com/pictures/227.jpg">
<s:img xml:lang="en"></s:img>
</rdf:Description>
<s:Product>
<s:seller>
<s:Organization>
<s:name xml:lang="en">Shop1</s:name>
</s:Organization>
</s:seller>
<s:productID xml:lang="en">NQ==</s:productID>
<s:model xml:lang="en">Examplemodel</s:model>
<s:offers>
<s:Offer>
<s:priceCurrency xml:lang="en">EUR</s:priceCurrency>
<s:price xml:lang="en">79,00</s:price>
</s:Offer>
</s:offers>
<s:name xml:lang="en">Example name</s:name>
<s:brand xml:lang="en">Examplebrand</s:brand>
</s:Product>
</rdf:RDF>
Looking at the RDF, it's easy to see why the price is being interpreted as a string:
<s:price xml:lang="en">79,00</s:price>
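The property value is a plain string, and a string with a language tag at that! A language-tagged string cannot be compared numerically, so the FILTER never matches. You can specify the datatype easily, however, by declaring the xsd namespace (http://www.w3.org/2001/XMLSchema#) on the html element and adding a datatype attribute: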
<html xmlns="http://www.w3.org/1999/xhtml"
version="XHTML+RDFa 1.1"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
...>
...
<div property="s:price" datatype="xsd:float">79,00</div>
...
</html>
However, the comma notation isn't legal for the xsd:float type, so you'll also need to supply the value in dot notation through a content attribute, as in:
<div property="s:price" datatype="xsd:float" content="79.00">79,00</div>
After those changes, you'll get this RDF:
<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
xmlns:s="http://schema.org/"
xmlns:xhv="http://www.w3.org/1999/xhtml/vocab#"
xmlns:xml="http://www.w3.org/XML/1998/namespace"
xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>
<s:Product>
<s:productID xml:lang="en">NQ==</s:productID>
<s:model xml:lang="en">Examplemodel</s:model>
<s:brand xml:lang="en">Examplebrand</s:brand>
<s:offers>
<s:Offer>
<s:priceCurrency xml:lang="en">EUR</s:priceCurrency>
<s:price rdf:datatype="http://www.w3.org/2001/XMLSchema#float">79.00</s:price>
</s:Offer>
</s:offers>
<s:name xml:lang="en">Example name</s:name>
<s:seller>
<s:Organization>
<s:name xml:lang="en">Shop1</s:name>
</s:Organization>
</s:seller>
</s:Product>
<rdf:Description rdf:about="http://www.test.com/pictures/227.jpg">
<s:img xml:lang="en"></s:img>
</rdf:Description>
</rdf:RDF>
After those changes, your query works just fine with no modifications:
$ arq --data data3.rdf --query query.sparql
------------------------------------------------------------
| a | price |
============================================================
| _:b0 | "79.00"^^<http://www.w3.org/2001/XMLSchema#float> |
------------------------------------------------------------
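If the pages are generated by a script, the content value can be derived from the displayed price instead of being typed by hand. The following is only a minimal sketch in Python: the helper names to_xsd_decimal and price_div are hypothetical, and it assumes prices always use "," as the decimal separator and "." as an optional thousands separator.

```python
from decimal import Decimal
from html import escape


def to_xsd_decimal(display_price: str) -> str:
    """Turn a comma-decimal price such as '79,00' into '79.00'.

    Assumption: '.' is only ever a thousands separator and ',' is the
    decimal separator, as in '1.234,50'.
    """
    normalized = display_price.strip().replace(".", "").replace(",", ".")
    # Decimal validates the number and keeps the trailing zeros.
    return str(Decimal(normalized))


def price_div(display_price: str) -> str:
    """Build the RDFa price element with a machine-readable content value."""
    content = to_xsd_decimal(display_price)
    return (
        f'<div property="s:price" datatype="xsd:float" '
        f'content="{content}">{escape(display_price)}</div>'
    )


print(price_div("79,00"))
```

The human-readable text stays as written, while the content attribute carries the value that ends up as the typed literal in the RDF.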

Related

Why am I not able to scrape just this particular P tag?

I am using scrapy shell to make sure the selectors for my spider are correct. I am able to get every other section I need except the one p tag that contains the cross-reference part numbers. I am scraping from this particular page here
When I try response.css('div.col-1-2-2' > div.rpr-help m-chm > div > p::text').extract() it returns blank
When I try response.css('div > p::text').extract() the results have the section I am looking for plus a bunch of data I do not want.
I have a feeling this is going to be a super easy answer, but I have no idea what I am missing here.
This is a snippet of the HTML of the page I am trying to scrape. The element I want is the last 'p' tag, the one starting with "Part Number":
<div class="col-1-2-2">
<div id="img-detail" style="text-align:center;">
<div id="img-detail-main">
<a id="ctl00_cphMain_imgenlarge" rel="nofollow" href="/detail-img.aspx?id=3094537&i=" class="cboxElement"><img id="ctl00_cphMain_iMain" src="https://cdn.appliancepartspros.com/images/product/cache/whirlpool-clutch-assembly-285785-ap3094537_01_l.jpg" style="border-width:0px;outline:none;">
<div class="img-overlay" style="display:none;"><img src="/images/play.png" style="height:107px;"></div>
<div id="main-text-overlay" style="display:none;"></div>
</a>
</div>
<div class="img-help">Click image to open expanded view</div>
<div id="img-detail-thumb">
<div class="a-button a-active">
<img id="ctl00_cphMain_rImgTh_ctl01_imgTh" src="https://cdn.appliancepartspros.com/images/product/cache/whirlpool-clutch-assembly-285785-ap3094537_01_tt.jpg" style="border-width:0px;">
</div>
<div class="a-button">
<img id="ctl00_cphMain_rImgTh_ctl02_imgTh" src="https://cdn.appliancepartspros.com/images/product/cache/whirlpool-clutch-assembly-285785-ap3094537_02_tt.jpg" style="border-width:0px;">
</div>
<div class="a-button">
<img id="ctl00_cphMain_rImgTh_ctl03_imgTh" src="https://cdn.appliancepartspros.com/images/product/cache/whirlpool-clutch-assembly-285785-ap3094537_03_tt.jpg" style="border-width:0px;">
</div>
<div class="a-button">
<img id="ctl00_cphMain_rImgTh_ctl04_imgTh" src="https://cdn.appliancepartspros.com/images/product/cache/whirlpool-clutch-assembly-285785-ap3094537_04_tt.jpg" style="border-width:0px;">
</div>
<div class="a-button">
<img id="ctl00_cphMain_rImgTh_ctl05_imgTh" src="https://cdn.appliancepartspros.com/images/product/cache/whirlpool-clutch-assembly-285785-ap3094537_05_tt.jpg" style="border-width:0px;">
</div>
<div class="a-button">
<img id="ctl00_cphMain_rImgTh_ctl06_imgTh" class="diagram" data-dcmt="Clutch assembly AP3094537 is number 5 on this diagram. This is to give you an idea of the appearance and the location of the part. Your appliance model may be slightly different." src="https://483cda5f439700fab03b-6195bc77e724f6265ff507b1dc015ddb.ssl.cf1.rackcdn.com/0029384112_4.gif" style="border-width:0px;">
</div>
<div class="a-button">
<img id="ctl00_cphMain_rImgTh_ctl07_imgTh" class="video" src="https://img.youtube.com/vi/7RS1l6t8efc/hqdefault.jpg" style="border-width:0px;">
<div class="img-overlay"><img src="/images/play.png"></div>
</div>
</div>
</div>
<div class="rpr-help m-chm">
<div class="header">
<h2 class="h6">Repair Help</h2>
</div><!-- /end .header -->
<div class="inner m-bsc">
<ul>
<li>Repair Video</li>
<li>Repair Q&A</li>
</ul>
</div>
<div>
<br>
<span class="h4">Cross Reference Information</span><br>
<p>Part Number 285785 (AP3094537) replaces 2670, 285331, 285380, 285422, 285540, 285761, 285785VP, 3350015, 3350114, 3350115, 3351342, 3351343, 387888, 388948, 388949, 3946794, 3946847, 3951311, 3951312, 62699, 63174, 63765, 64176, AH334641, EA334641, J27-662, LP326, PS334641.
<br>
</p>
</div>
</div>
</div>
Hope this works:
response.xpath('//div[@class="col-1-2-2"]//p/text()').extract_first()
You can also try this: response.xpath('(//div[@class="rpr-help m-chm"]//p//text())[1]').get()
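As for why the original CSS selector came back empty: in CSS, "div.rpr-help m-chm" means "an m-chm element inside div.rpr-help", so it never matches. An element carrying both classes is written div.rpr-help.m-chm. The XPath versions above avoid this by comparing the whole class attribute. The sketch below illustrates that comparison using only Python's standard library rather than Scrapy itself; the snippet is a shortened, XML-well-formed copy of the HTML from the question, so it is an illustration and not the real page.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the markup from the question (assumption: trimmed and
# made well-formed so the standard-library XML parser accepts it).
SNIPPET = """
<div class="col-1-2-2">
  <div id="img-detail">
    <div class="img-help">Click image to open expanded view</div>
  </div>
  <div class="rpr-help m-chm">
    <div class="header"><h2 class="h6">Repair Help</h2></div>
    <div>
      <span class="h4">Cross Reference Information</span>
      <p>Part Number 285785 (AP3094537) replaces 2670, 285331.</p>
    </div>
  </div>
</div>
"""

root = ET.fromstring(SNIPPET)

# The predicate compares the complete attribute value, so both class
# names must appear exactly as in the markup.
help_box = root.find(".//div[@class='rpr-help m-chm']")
paragraph = help_box.find(".//p")
cross_reference = paragraph.text.strip()

print(cross_reference)
```

Because the match is on the exact attribute string, it breaks if the site reorders or adds classes; the CSS form div.rpr-help.m-chm does not have that limitation.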

How to return data from html tag in scrapy

I need to extract data from the HTML tags that have class="review card".
My HTML source is:
<session class="full-reviews">
<div class="feature-reviews">
<div class="review card"></div>
<div class="review card"></div>
<div class="review card"></div>
</div>
<div class="review card"></div>
<div class="review card"></div>
</session>
How can I select only the tags with class review card that are outside the feature-reviews div?
One possible way with XPath is to select only the direct div children of the session element:
//session[@class="full-reviews"]/div[@class="review card"]
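The single slash is what excludes the nested cards: it matches direct children only, whereas a double slash would match at any depth. A small sketch with Python's standard library shows the difference; the text inside each div (A to E) is made up purely so the results can be told apart.

```python
import xml.etree.ElementTree as ET

# Same structure as the question; the letters are hypothetical content.
SNIPPET = """
<session class="full-reviews">
  <div class="feature-reviews">
    <div class="review card">A</div>
    <div class="review card">B</div>
    <div class="review card">C</div>
  </div>
  <div class="review card">D</div>
  <div class="review card">E</div>
</session>
"""

session = ET.fromstring(SNIPPET)

# "./div" looks at direct children of <session> only.
outside = [d.text for d in session.findall("./div[@class='review card']")]

# ".//div" descends to every level, so the nested cards are included.
everywhere = [d.text for d in session.findall(".//div[@class='review card']")]

print(outside)
print(everywhere)
```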

Rails 5 JavaScript pipeline not working

I'm trying to use the jQuery Etalage plugin in my Rails 5 app.
I copied all the asset files to the 'app/assets' folder and removed the CSS and JavaScript links from the HTML head. My CSS works just fine, but the JavaScript does not. It only works if I add the script link in the body section of my 'html.erb' file.
My 'html.erb' file is below:
<!DOCTYPE html>
<html>
<head>
<title>Pedal House | Single :: w3layouts</title>
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<script type="application/x-javascript"> addEventListener("load", function() { setTimeout(hideURLbar, 0); }, false); function hideURLbar(){ window.scrollTo(0,1); } </script>
<!--webfont-->
<link href='http://fonts.googleapis.com/css?family=Roboto:500,900,100,300,700,400' rel='stylesheet' type='text/css'>
<!--webfont-->
<!-- dropdown -->
<!--js-->
</head>
<body>
<!--banner-->
<script>
$(function () {
$("#slider").responsiveSlides({
auto: false,
nav: true,
speed: 500,
namespace: "callbacks",
pager: true,
});
});
</script>
<div class="banner-bg banner-sec">
<div class="container">
<div class="header">
<div class="logo">
<img src="assets/logo.png" alt=""/>
</div>
<div class="top-nav">
<label class="mobile_menu" for="mobile_menu">
<span>Menu</span>
</label>
<input id="mobile_menu" type="checkbox">
<ul class="nav">
<li class="dropdown1"><a>BIKES</a>
<ul class="dropdown2">
<li>NEW BIKES</li>
<li>VELOCE</li>
<li>TREK</li>
<li>PHOENIX</li>
<li>GIANT</li>
<li>GHOST</li>
<li>BINACHI</li>
<li>CORE</li>
<li>MUSTANG</li>
<li>OTHERS</li>
</ul>
</li>
<li class="dropdown1">KIDS ITEM
</li>
<li class="dropdown1">PARTS
</li>
<li class="dropdown1">ACCESSORIES
</li>
<li class="dropdown1">ABOUT US
</li>
</ul>
</div>
<div class="clearfix"></div>
</div>
</div>
</div>
<!--/banner-->
<div class="product">
<div class="container">
<div class="ctnt-bar cntnt">
<div class="content-bar">
<div class="single-page">
<!--Include the Etalage files-->
<script src="assets/jquery.etalage.min.js"></script>
<script>
jQuery(document).ready(function($){
$('#etalage').etalage({
thumb_image_width: 400,
thumb_image_height: 400,
source_image_width: 800,
source_image_height: 1000,
show_hint: true,
click_callback: function(image_anchor, instance_id){
alert('Callback example:\nYou clicked on an image with the anchor: "'+image_anchor+'"\n(in Etalage instance: "'+instance_id+'")');
}
});
});
</script>
<!--//details-product-slider-->
<div class="details-left-slider">
<div class="grid images_3_of_2">
<ul id="etalage">
<li>
<a href="optionallink.html">
<img class="etalage_thumb_image" src="assets/m1.jpg" class="img-responsive" />
<img class="etalage_source_image" src="assets/m1.jpg" class="img-responsive" title="" />
</a>
</li>
</ul>
</div>
</div>
<div class="details-left-info">
<h3>SCOTT SPARK</h3>
<h4></h4>
<p><label>$</label> 300 </p>
<h5>Description ::</h5>
<p class="desc">The first mechanically-propelled, two-wheeled vehicle may have been built by Kirkpatrick MacMillan, a Scottish blacksmith, in 1839, although the claim is often disputed. He is also associated with the first recorded instance of a cycling traffic offense, when a Glasgow newspaper in 1842 reported an accident in which an anonymous "gentleman from Dumfries-shire... bestride a velocipede... of ingenious design" knocked over a little girl in Glasgow and was fined five
The word bicycle first appeared in English print in The Daily News in 1868, to describe "Bysicles and trysicles" on the "Champs Elysées and Bois de Boulogne.</p>
</div>
<div class="clearfix"></div>
</div>
</div>
</div>
</div>
</div>
<!---->
<div class="footer">
<div class="container wrap">
<div class="logo2">
<p class="copyright">2017 | Developed By Hussain & Zaman</p>
</div>
<div class="ftr-menu">
<ul>
<li>BIKES</li>
<li>KIDS ITEM</li>
<li>PARTS</li>
<li>ACCESSORIES</li>
</ul>
</div>
<div class="clearfix"></div>
</div>
</div>
<!---->
</body>
</html>
My 'application.js' file:
// This is a manifest file that'll be compiled into application.js, which will include all the files
// listed below.
//
// Any JavaScript/Coffee file within this directory, lib/assets/javascripts, or any plugin's
// vendor/assets/javascripts directory can be referenced here using a relative path.
//
// It's not advisable to add code directly here, but if you do, it'll appear at the bottom of the
// compiled file. JavaScript code in this file should be added after the last require_* statement.
//
// Read Sprockets README (https://github.com/rails/sprockets#sprockets-directives) for details
// about supported directives.
//= require jquery
//= require jquery_ujs
//= require jquery.easydropdown
//= require jquery.etalage.min
//= require jquery.min
//= require responsiveslides.min
//= require rails-ujs
//= require turbolinks
//= require_tree .
I'm new to Rails and have spent a lot of time trying to fix this problem. I tried many approaches, but none worked. Could someone more experienced please help me fix this issue? Thanks in advance.
Have you added gem 'jquery-rails', '~> 4.3', '>= 4.3.1' to your Gemfile? As of Rails 5.1, jQuery is no longer included by default.

Schema Tag: The property priceSpecification is not recognized by Google

Google reports an error for the priceSpecification property. How can I resolve it? The website has classified listings for used cars posted by users.
Here is the schema markup:
<div class="pos-rel" itemprop="itemOffered" itemscope itemtype="http://schema.org/Car" >
<h3 itemprop="name">Toyota Vitz F 1.0 for Sale</h3>
<div class="price-details generic-dark-grey mb5 mt10" itemprop="priceSpecification" itemscope itemtype="http://schema.org/UnitPriceSpecification">
<meta itemprop="priceCurrency" content="PKR">
<meta itemprop="price" content="1585000">
<span class='pkr'>PKR</span> 15.9 <span>lacs</span>
</div>
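Read the docs at http://schema.org/Car. At the bottom of the page, click the Microdata tab of the example. Notice that the http://schema.org/Offer there is attached to the Person item. priceSpecification is a property of Offer, not of Car, which is why it is not recognized inside the Car item. You could do it in two sections like their example, or use something like this: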
<!-- Car Details -->
<div id="product" itemprop="itemOffered" itemscope itemtype="http://schema.org/Car">
<h3 itemprop="name">Toyota Vitz F 1.0 for Sale</h3>
<!-- Seller Details -->
<div itemscope itemtype="http://schema.org/Person">
<strong>Contact Name: </strong> <span itemprop="name givenName">Brent</span>
<div itemprop="makesOffer" itemscope itemtype="http://schema.org/Offer" itemref="product">
<span itemprop="priceCurrency" content="PKR">PKR</span>
<span itemprop="price" content="1585000">15.9</span>
</div>
</div>
</div>
Note that this code validates here: https://search.google.com/structured-data/testing-tool

Is there a Microformat for the Hours a Business is open?

I was wondering if there was yet a Microformat for a business's hours of operation.
If not, who do I submit a standard to?
After submitting the same question to the Microformats mailing list, I received a reply from someone named Martin Hepp who apparently has come up with a specification for this.
He provided me with the following links:
The GoodRelations vocabulary provides a standard way for business hours of operation, see:
http://www.ebusiness-unibw.org/wiki/Rdfa4google
http://www.ebusiness-unibw.org/wiki/GoodRelations_and_Yahoo_SearchMonkey
the full spec and other materials are at
http://www.ebusiness-unibw.org/wiki/GoodRelations
This is used e.g. by Bestbuy to expose the opening hours of their 1000k stores in the US.
Best
Martin
The most widely used markup for opening hours on the Web is GoodRelations.
Here is an example:
<div xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
xmlns="http://www.w3.org/1999/xhtml"
xmlns:foaf="http://xmlns.com/foaf/0.1/"
xmlns:gr="http://purl.org/goodrelations/v1#"
xmlns:vcard="http://www.w3.org/2006/vcard/ns#"
xmlns:xsd="http://www.w3.org/2001/XMLSchema#">
<div about="#store" typeof="gr:LocationOfSalesOrServiceProvisioning">
<div property="rdfs:label" content="Pizzeria La Mamma"></div>
<div rel="vcard:adr">
<div typeof="vcard:Address">
<div property="vcard:country-name" content="Germany"></div>
<div property="vcard:locality" content="Munich"></div>
<div property="vcard:postal-code" content="85577"></div>
<div property="vcard:street-address" content="1234 Main Street"></div>
</div>
</div>
<div property="vcard:tel" content="+33 408 970-6104"></div>
<div rel="foaf:depiction" resource="http://www.pizza-la-mamma.com/image_or_logo.png"></div>
<div rel="vcard:geo">
<div>
<div property="vcard:latitude" content="48.08" datatype="xsd:float"></div>
<div property="vcard:longitude" content="11.64" datatype="xsd:float"></div>
</div>
</div>
<div rel="gr:hasOpeningHoursSpecification">
<div about="#mon_fri" typeof="gr:OpeningHoursSpecification">
<div property="gr:opens" content="08:00:00" datatype="xsd:time"></div>
<div property="gr:closes" content="18:00:00" datatype="xsd:time"></div>
<div rel="gr:hasOpeningHoursDayOfWeek" resource="http://purl.org/goodrelations/v1#Friday"></div>
<div rel="gr:hasOpeningHoursDayOfWeek" resource="http://purl.org/goodrelations/v1#Thursday"></div>
<div rel="gr:hasOpeningHoursDayOfWeek" resource="http://purl.org/goodrelations/v1#Wednesday"></div>
<div rel="gr:hasOpeningHoursDayOfWeek" resource="http://purl.org/goodrelations/v1#Tuesday"></div>
<div rel="gr:hasOpeningHoursDayOfWeek" resource="http://purl.org/goodrelations/v1#Monday"></div>
</div>
</div>
<div rel="gr:hasOpeningHoursSpecification">
<div about="#sat" typeof="gr:OpeningHoursSpecification">
<div property="gr:opens" content="08:30:00" datatype="xsd:time"></div>
<div property="gr:closes" content="14:00:00" datatype="xsd:time"></div>
<div rel="gr:hasOpeningHoursDayOfWeek" resource="http://purl.org/goodrelations/v1#Saturday"></div>
</div>
</div>
<div rel="foaf:page" resource=""></div>
</div>
</div>
Note that the Microformats suggestion from Ton does not explicitly model the times as opening hours, so a client cannot do much with it. GoodRelations markup is supported by many major companies. For example, BestBuy uses GoodRelations on all of its 1000+ store pages to indicate opening hours.
An HTML microformat could look like this:
<ol class="business_hours">
<li class="monday">Maandag <span class="dtstart" title="08:00:00+01">9.00</span> - <span class="dtend" title="17:00:00+01">18.00</span> uur</li>
<li class="tuesday">Dinsdag <span class="dtstart" title="08:00:00+01">9.00</span> - <span class="dtend" title="17:00:00+01">18.00</span> uur</li>
<li class="wednesday">Woensdag <span class="dtstart" title="08:00:00+01">9.00</span> - <span class="dtend" title="17:00:00+01">18.00</span> uur</li>
<li class="thursday">Donderdag <span class="dtstart" title="08:00:00+01">9.00</span> - <span class="dtend" title="17:00:00+01">18.00</span> uur</li>
<li class="friday">Vrijdag <span class="dtstart" title="08:00:00+01">9.00</span> - <span class="dtend" title="17:00:00+01">18.00</span> uur</li>
<li class="saturday">Zaterdag <span class="dtstart" title="08:00:00+01">9.00</span> - <span class="dtend" title="15:00:00+01">16.00</span> uur</li>
<li>Zondag Gesloten</li>
</ol>
Excuse my Dutch :)
My 2 cents.
The Microformats community has updated its wiki with a suggested way of implementing operating hours based on hCalendar.
http://microformats.org/wiki/operating-hours
See https://schema.org/openingHours
Schema.org is an initiative launched on 2 June 2011 by Bing, Google and Yahoo.
An example:
<strong>Opening Hours:</strong>
<time itemprop="openingHours" datetime="Tu,Th 16:00-20:00">
Tuesdays and Thursdays 4-8pm
</time>
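The datetime value follows the pattern described on the schema.org page: two-letter day codes (Mo, Tu, We, Th, Fr, Sa, Su) separated by commas, then a 24-hour time range. If the hours live in a database, the string can be generated. Here is a rough sketch in Python; the function names opening_hours and time_element are hypothetical, and only comma-separated day lists are handled, not day ranges such as Mo-Fr.

```python
from datetime import time
from html import escape

# Two-letter day codes used by the schema.org openingHours property.
DAY_CODES = {
    "monday": "Mo", "tuesday": "Tu", "wednesday": "We", "thursday": "Th",
    "friday": "Fr", "saturday": "Sa", "sunday": "Su",
}


def opening_hours(days, opens, closes):
    """Format e.g. (['Tuesday', 'Thursday'], 16:00, 20:00) as 'Tu,Th 16:00-20:00'."""
    codes = ",".join(DAY_CODES[day.lower()] for day in days)
    return f"{codes} {opens:%H:%M}-{closes:%H:%M}"


def time_element(days, opens, closes, label):
    """Wrap the machine-readable value and a human-readable label in <time>."""
    value = opening_hours(days, opens, closes)
    return f'<time itemprop="openingHours" datetime="{value}">{escape(label)}</time>'


print(time_element(["Tuesday", "Thursday"], time(16, 0), time(20, 0),
                   "Tuesdays and Thursdays 4-8pm"))
```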
Perhaps http://microformats.org/ may be of use...
If it is still useful, you should submit it to the microformats community using their wiki: microformats.org.
The wiki describes the process for proposing a new microformat specification.
Hope that helps.