Hiding complex microdata structures - semantic-web

To embed microdata that should be hidden or isn't provided as text you can use meta elements. Here is an example for non-visible properties using meta elements. Is there a similar way to hide instances of types?
For example I have a page with a table that lists events of a single performer. The performer is implicit and is not repeatedly shown for every entry, so I hide it in a meta element. The performer property should be of the type Person, which has additional attributes that I also want to hide. I'm trying to achieve something like this:
<meta itemprop="performer" itemscope itemtype="http://schema.org/Person">
<meta itemprop="name" content="Some performer"/>
</meta >
Of course this won't work, the meta element must be empty. Using other elements and hiding them with CSS would work but probably isn't very nice for screen readers. Is there any recommended way to do this?

In this case the person scope could be a <span> tag? That tag has no semantic value and if there are only meta tags inside it, it shouldn't be visible in your site.
You could also look into itemref and add the Person only once to the page and reference that id multiple times. However not all testing tools support itemref, so testing if it's correctly set up is quite hard at the moment.
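To make the span suggestion concrete, here is a minimal sketch. The event row, its values, and the `PerformerScan` helper are hypothetical, and the helper is only a quick check written with Python's standard `html.parser`, not a microdata processor. It shows that a span holding nothing but meta elements still carries the nested Person properties while contributing no visible text.

```python
from html.parser import HTMLParser

# Hypothetical event row: the Person scope lives on a <span>, and the span
# holds only <meta> elements, so it contributes no rendered text.
EVENT_ROW = """
<tr itemscope itemtype="http://schema.org/Event">
  <td itemprop="name">Some event</td>
  <td>
    <span itemprop="performer" itemscope itemtype="http://schema.org/Person">
      <meta itemprop="name" content="Some performer">
    </span>
  </td>
</tr>
"""


class PerformerScan(HTMLParser):
    """Records every itemprop and any text found inside the performer span."""

    def __init__(self):
        super().__init__()
        self.props = []            # (itemprop, content attribute or None)
        self.in_performer = False
        self.performer_text = []   # non-blank text inside the span

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemprop" in attrs:
            self.props.append((attrs["itemprop"], attrs.get("content")))
        if tag == "span" and attrs.get("itemprop") == "performer":
            self.in_performer = True

    def handle_endtag(self, tag):
        if tag == "span":
            self.in_performer = False

    def handle_data(self, data):
        if self.in_performer and data.strip():
            self.performer_text.append(data.strip())


def scan(markup):
    """Parse the markup and return the populated scanner."""
    parser = PerformerScan()
    parser.feed(markup)
    parser.close()
    return parser
```

Calling `scan(EVENT_ROW)` gives a scanner whose `performer_text` list stays empty, which is the property the answer relies on. Whether a given consumer accepts the hidden values is a separate question that this check does not answer.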

Related

Exclude menu from content extraction with tika

I generate html documents that contain a menu and a content part. Then I want to extract the content of these document to feed it to a lucene index. However, I would like to exclude the menu from the content extraction and thus only index the content.
<div class="menu">my menu goes here</div>
<div class="content">my content goes here</div>
what is the simplest way to achieve this with apache tika?
As a more general solution (not just for your specific menu), I would advise looking at boilerpipe, which deals with removing uninteresting parts from pages (menus, navigation, etc.).
I know it can be integrated in Solr/Tika; have a look and you can probably integrate it in your scenario.
Have a look at this post, which shows how to handle DIVs during the HTML parse by specifying whether they are safe to parse or not; unsafe ones are ignored. For your problem, you could put some logic in the overridden methods that ignores only DIV elements with the class attribute value "menu" (i.e. tells the Tika parser this DIV is unsafe to parse).
You can parse the HTML with a parser into an XHTML DOM object and remove the div tag containing the attribute class="menu".
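The last suggestion (parse the HTML and drop the menu div before indexing) can be sketched as follows. This is only an illustration using Python's standard `html.parser` as a stand-in for a pre-processing step; it is not Tika's API, and the names `ContentOnly` and `extract_content` are made up for the example.

```python
from html.parser import HTMLParser


class ContentOnly(HTMLParser):
    """Collects text, skipping everything inside <div class="menu">."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0   # > 0 while inside the menu div (counts nested divs)
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        classes = (dict(attrs).get("class") or "").split()
        # Enter skip mode on the menu div, and keep counting divs nested in it.
        if self.skip_depth or "menu" in classes:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "div" and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth and data.strip():
            self.chunks.append(data.strip())


def extract_content(html):
    """Return the document text with the menu removed."""
    parser = ContentOnly()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

The returned string is what you would hand to the indexer. The depth counter is there so that divs nested inside the menu do not end the skipping too early.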

How to find XPath of an ExtJS element with dynamic ID

All the elements in the application I am testing have dynamic IDs. The test always passes when I replay it without refreshing the page, but as soon as I refresh the page the test fails, because the IDs of all the elements change randomly and Selenium cannot match the recorded IDs with the new ones.
I tried to use XPath position. It works for some objects, but for dropdown lists and buttons it doesn't work!
Can anyone please tell me how to find the XPath of an object (methods in Java or Selenese), or how to create a new locator finder for dropdown lists and buttons?
Below are the properties (inspected with Firebug) of the dropdown that is giving me trouble.
Properties of the dropdown:
<div id="ext-gen1345" class="x-trigger-index-0 x-form-trigger x-form-arrow-trigger x-form-trigger-last x-unselectable" role="button" style="-moz-user-select: none;"></div>
Properties of the dropdown choice:
<ul>
<li class="x-boundlist-item" role="option">Rescue</li>
</ul>
Please search before posting, I have been answering this over and over.
ExtJS pages are hard to test, especially on finding elements.
Here are some of the tips I consider useful:
Don't ever use dynamically generated IDs, like (:id, 'ext-gen1345').
Don't ever use absolute/meaningless XPath, like //*[@class='someclass']/li/ul/li[2]/ul/li[2]/table/tbody/tr/td[2]/div
Take advantage of meaningful auto-generated partial ids and class names. (You need to show more HTML in your example so that I can make suggestions.)
For example, for this ExtJS grid example, (:css, '.x-grid-view .x-grid-table') would be handy. If there are multiple grids, try indexing them or locating an identifiable ancestor, like (:css, '#something-meaningful .x-grid-view .x-grid-table'). In your case, (:css, '#something-meaningful .x-form-arrow-trigger').
Take advantage of the button's text.
For example, for this ExtJS example you can use the XPath .//li[contains(@class, 'x-boundlist-item') and text()='Rescue']. However, this method is not suitable for CSS selectors or multi-language applications.
The best way to test is to create meaningful class names in the source code. If you don't have access to the source code, please talk to your manager; using Selenium against an ExtJS application should really be a developer's job. ExtJS provides cls and tdCls for custom class names, so you can add cls:'testing-btn-foo' in your source code, and Selenium can get it by (:css, '.x-panel .testing-btn-foo').
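To illustrate why the tips above favour stable attributes over generated ids, here is a small sketch using Python's standard `xml.etree.ElementTree` in place of a browser. The fragment, the `something-meaningful` wrapper id, and the second generated id are assumptions made for the example. ElementTree supports only a subset of XPath (no `contains()`), so the locators use exact matches.

```python
import xml.etree.ElementTree as ET


def render(generated_id):
    """Build a hypothetical page fragment; only the ext-gen id varies per load."""
    return ET.fromstring(f"""
<body>
  <div id="something-meaningful">
    <div id="{generated_id}" class="x-form-trigger x-form-arrow-trigger" role="button"></div>
    <ul>
      <li class="x-boundlist-item" role="option">Rescue</li>
    </ul>
  </div>
</body>""")


# Brittle: tied to an id that ExtJS regenerates.
BY_GENERATED_ID = ".//div[@id='ext-gen1345']"
# Sturdier: anchored on a stable ancestor id plus a stable attribute.
BY_ANCESTOR = ".//div[@id='something-meaningful']/div[@role='button']"
# Sturdier: stable class plus visible text (exact match only).
BY_TEXT = ".//li[@class='x-boundlist-item'][.='Rescue']"

first_load = render("ext-gen1345")
after_refresh = render("ext-gen2071")   # same page, new generated id
```

Against `after_refresh`, `find(BY_GENERATED_ID)` returns `None`, while the other two locators still resolve. In a real Selenium test the same idea applies, with the full XPath or CSS syntax available.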
Other answers I made on this topic:
How to find ExtJS elements with dynamic id
How to find unique selectors for elements on pages with ExtJS for use with Selenium?
How to click on elements in ExtJS using Selenium?
Using class names in Watir
how to click on checkboxes on a pop-up window which doesn't have name, label
I would suggest you build an XPath starting from one of the parents of your DIV. You may run into trouble if no nearby parent node has a stable id or class.
For example:
//*[@id='parentof div']/div
//*[@class='grand parent of div']/div/div
I even did something like this:
//*[@class='someclass']/li/ul/li[2]/ul/li[2]/table/tbody/tr/td[2]/div
But still, it's not encouraged to do so.

many buttons with the same id

I am using Selenium to test a page with multiple portlets made by Liferay.
Every portlet has a save button with the same id; it uses the iframe id of the portlet to differentiate between the buttons.
How can I write code in Selenium that understands which button I mean?
You need to use driver.switchTo().frame(IFrameElement). You need to switch in and out of any kind of IFrame.
https://stackoverflow.com/a/9943605/1769273
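A rough sketch of the switch-in/switch-out pattern, written against the Selenium Python bindings (where the Java call above corresponds to `driver.switch_to.frame(...)`). The helper names and the `save` button id are assumptions for illustration. The string `"id"` is the value of Selenium's `By.ID` and is used directly so the snippet has no import of its own.

```python
from contextlib import contextmanager


@contextmanager
def inside_frame(driver, frame_ref):
    """Switch into an iframe and always switch back out afterwards."""
    driver.switch_to.frame(frame_ref)
    try:
        yield driver
    finally:
        # Return to the top-level document even if the lookup fails.
        driver.switch_to.default_content()


def click_save(driver, portlet_frame_id, button_id="save"):
    """Click the save button that belongs to one specific portlet iframe."""
    with inside_frame(driver, portlet_frame_id):
        driver.find_element("id", button_id).click()
```

With a real WebDriver instance you would call, for example, `click_save(driver, "some-portlet-iframe-id")`, where the iframe id comes from your page. Because the lookup runs inside the frame, the duplicate ids in the other frames are never considered.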
You can use XPath or CSS selectors to find children relative to their parents.
Paste your HTML and we can provide examples.
Does this mean your portlets are all embedding iframes? Typically portlets just render HTML snippets into the same document. In this case, your implementation would be considered flawed: portlets must not use IDs that can conflict. E.g. you should not render
<input type="submit" id="save"/>
but
<input type="submit" id="<portlet:namespace/>save"/>
or similar - make sure the id is unique, as it ends up in the same HTML-DOM which - by specification - assumes ids are unique.
There are other methods to create unique ids, but keep in mind: If you come up with the prefix yourself, per portlet, someone might add the same portlet to the page twice and you can end up with the same id even though all different portlets have unique ids.
If you are indeed rendering many different iframes from your portlets, you can disregard this answer or take it as a suggestion to make better use of the portal environment by changing the implementation.
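The caveat about hand-picked prefixes can be shown with a few lines of Python. The portlet names and namespace strings below are invented for the example; the real value is whatever the portlet namespace tag emits for each instance.

```python
from collections import Counter


def duplicate_ids(ids):
    """Return the ids that occur more than once on the page."""
    return sorted(i for i, n in Counter(ids).items() if n > 1)


# Hypothetical page: the "news" portlet is placed twice, "weather" once.
# Each tuple is (portlet name, per-instance namespace).
instances = [
    ("news", "_news_INSTANCE_a1_"),
    ("news", "_news_INSTANCE_b2_"),
    ("weather", "_weather_INSTANCE_c3_"),
]

# A prefix chosen per portlet type collides when the portlet appears twice.
per_type_ids = [f"{name}_save" for name, _ in instances]
# A per-instance namespace keeps every id unique.
per_instance_ids = [f"{namespace}save" for _, namespace in instances]
```

Here `duplicate_ids(per_type_ids)` reports `news_save`, while `duplicate_ids(per_instance_ids)` is empty.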

What HTML5 Tag Should be Used for a "Call to Action" Div?

I am new to HTML5 and am wondering which HTML5 tag should be used on a Call to Action div that sits in a column next to the main content on the home page.
Option 1:
<aside>
//call to action
</aside>
Option 2:
<article>
<section>
//call to action
</section>
</article>
The reason I ask is because I don't see either option as being a perfect fit. Perhaps I am missing something. Thanks!
My HTML for the Call to Action:
<section class="box">
<hgroup>
<h1 class="side">Call Now</h1>
<h2 class="side">To Schedule a Free Pick-Up!</h2>
<ul class="center">
<li>Cleaning</li>
<li>Repair</li>
<li>Appraisals</li>
</ul>
<h3 class="side no-bottom">(781) 729-2213</h3>
<h4 class="side no-top no-bottom">Ask for Bob!</h4>
</hgroup>
<img class="responsive" src="img/satisfaction-guarantee.png" alt="100% Satisfaction Guarantee">
<p class="side">We guarantee you will be thrilled with our services or your money back!</p>
</section>
This is a box on the right column of a three column layout. The content in the large middle column gives a summary of the company's services. If you wanted to use those services, you would have to schedule a pick-up, hence the call to action.
Does anyone object to this use of HTML5, or have a better way?
My take is that the best practices for the new HTML5 structural elements are still being worked out, and the forgiving nature of the new HTML5 economy means that you can establish the conventions that make the most sense for your application.
In my applications, I have separate considerations for markup that reflects the layout of the view (that is, the template that creates the overall consistency from page to page) versus the content itself (usually any function or query results that receive additional markup before being inserted into the various regions in the layout). The distinction matters because the layout element semantics (like header, footer, and aside) don't really help with differentiation of the content during search since that markup is usually repeated from page to page. I particularly favor using the semantic distinctions in HTML5 to describe the content the user is actually searching on. For example I generally use article to wrap the primary content and nav to wrap any associated list of links. Widget wrappers are usually tied to the page layout, so I'd go with the convention of the template for that guideline.
Whenever I have to decide on semantic vs generic names, my general heuristic is:
If there is a possible precedent already in the page template, follow that precedent;
If the element in question is a new part of the page layout (vs a content query that is rendered into a region in the layout) and there is no guiding pattern in the template already, div is fine for associating that page layout behavior to;
If the content is created dynamically (that is, anything that gets instanced into the layout at request time--posts, navigation, most widgets), wrap it in a semantic wrapper that best describes what that item is (vs how you think it should appear);
Whenever authoring or generating content, use semantic HTML5 markup as appropriate within that content (hgroup to bracket hierarchical headings, section to organize chunks within the article, etc.). This is future-proof enrichment for search.
According to all this, div would be fine as a wrapper for your widget unless your page template already establishes a different widget wrapper. Also, your use of heading elements for creating large, bold appearance within the widget is using markup for appearance rather than for semantics. Since your particular usage is appearance-motivated, it would be better to use divs or spans with CSS classes that can let you specify sizes, spacing, and other adornments as needed for that non-specific text rather than having to override the browser defaults for the heading elements. I'd save the heading elements for the page heading, for widget headings, and for headings within the primary content region of the page. There can be SEO ranking issues for misuse of headings that are not part of the main content.
I hope these ideas help in your consideration of HTML5 markup usage.
So far as the semantics of the markup go, Don's advice makes sense. (As you said your CTA was visually beside the main content and secondary to that content, I would favor aside, but there's no single correct answer.)
However, you've tagged your question with "seo," so I take it you're interested in the SEO benefits of using the right markup. At this time, Google doesn't give special weight to having nice, semantic markup---they don't care about the difference between things like aside, section, and div. This may be partly because the meaning of these tags is still being defined (by the practice of Web devs), but they even seem to ignore tags that are clearly relevant to search results (like nav, which will almost always be irrelevant to a page's description in the search results).
Instead, they heavily favor using microdata for marking up rich semantics. In the short term, marking your page up using the Schema.org WebPage microdata will likely provide greater benefit. You can mark your CTA as a relatedLink or significantLink, and keep it outside the mainContent of the page. If you're looking to optimize your page for search, this is a great way to do it---in my experience, Google very rarely shows text outside of your mainContent block in the search results description.
Proper markup depends on the actual content, which you have not provided.
That said, wrapping everything in a div is fine (although perhaps unnecessary) no matter what your content is as the <div> tag has no semantic value. Your two examples are probably not correct, unless your "call to action" is literally an entire article (which I doubt is the case).
The call to action might occur within an <aside>, but it's not likely that the call to action is the aside itself. Once again, that depends on the content (what it is) and context (where it is in relation to other content).
Typically "call to action" is a link somewhere, so the obvious answer to me is using an anchor, <a>.
It's just a link to another page. Use a div.

RDFa Snippet Generator from GoodRelations

I've created an RDFa snippet to use on a client's website using the GoodRelations tool. The generated code creates the tags as expected, but there's no text between the divs, for instance:
<div typeof="vcard:Address">
<div property="vcard:locality" content="Yorba Linda"></div>
</div>
I'm assuming that this is OK, and that I am expected to put descriptive text for humans between the 'locality' divs without any adverse effects (in relation to SEO.) Correct?
As William says: In most cases, it is impractical to reuse visible content for publishing meta-data, because they differ in sequence or structure. In that case, it is better to put all meta-data in a single block of <div> elements without visible content. This is called "RDFa in Snippet Style", see
http://www.ebusiness-unibw.org/tools/rdf2rdfa/
Hepp, Martin; García, Roberto; Radinger, Andreas: RDF2RDFa: Turning RDF into Snippets for Copy-and-Paste, Technical Report TR-2009-01, 2009. PDF at http://www.heppnetz.de/files/RDF2RDFa-TR.pdf
Google is consuming such markup, despite a general preference for marking up visible content. Many big shops are using this approach with good results, e.g. http://www.rachaelraystore.com/Product/detail/Rachael-Ray-Stoneware-2-pc-Bubble-Brown-Baker-Set-Eggplant/316398
So if you can integrate the visible content and the RDFa constructs, then use
<div typeof="vcard:Address">
<div property="vcard:locality">Yorba Linda</div>
</div>
If you cannot, then use
<div typeof="vcard:Address">
<div property="vcard:locality" content="Yorba Linda"></div>
</div>
...
<div>
<div>Yorba Linda</div>
</div>
But the divs with invisible content must be close to the visible content, and they are better placed before the visible markup than after it.
From an RDFa point of view, it is fine (I am assuming you are using braces because you don't know how to escape greater than / less than characters).
The only thing you need to think about is how adding this fragment of HTML to your HTML document will affect the rendering. Because you are using the content attribute, this fragment is destined to remain hidden, so you should think about it in relation to your CSS architecture. My advice would be to create a specific CSS class for annotations.
Having spoken to the author of GoodRelations, his advice would be to put this fragment before any other HTML element in the body of your document. Generally, the Rich Snippets team indicates that they ignore hidden RDFa, but it doesn't actually matter, and in the long run it enables the publishing of RDF to anyone (not only Google) who wants to consume it.