Is the <wbr> element semantic HTML? What about in a microdata context? - semantic-web

In short, this is bad web development and UX:
But solving it by using CSS3 word breaking (code & demo) can lead to an 'awkward whitespace' situation, and strange cut-offs — here's an example of both:
Maybe it's not such a big deal, and the UX perspective of it is here, but let's look at the semantics of one of the solutions:
You could ... use the <wbr> element to indicate an optional word
break opportunity. This will tell the browser to insert a line break
as necessary to flow onto a new line inside the container.
The first question: is using <wbr> semantic HTML? (Does it at least degrade gracefully?)
In either case, it seems that being un-semantic in the general sense is a small price to pay for good UX functionality.
However, the second question is about the big picture:
Are there any (microdata/RDFa) ramifications to consider when using <wbr> to split up an email address? Will it still be valid there?

The wbr element is defined in the HTML5 spec, so it's fine to use it. If it's used right (= according to the definition in the spec), you may also call it "semantic use".
I don't think that there would be any problems in combination with microdata/RDFa. Usually you'd provide the URL in an attribute anyway, which can't contain wbr elements, unlike element content such as: foo<wbr>@example<wbr>.com.
For element content I'd guess (didn't check though) that microdata/RDFa parsers should use the text content without markup, or at least understand what is markup and what is text; otherwise e.g. a FOAF name would be <abbr>Dr.</abbr> Foo instead of Dr. Foo.
So you can bet that microdata/RDFa parsers know HTML ;), and therefore it shouldn't be a problem to use its elements.
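To illustrate the guess above about parsers reading text rather than markup, here is a minimal Python sketch that computes an element's text content with the standard library's html.parser. The TextContent class, the text_content helper and the itemprop markup are assumptions made up for this example; a real microdata or RDFa processor follows the rules of its own spec, which this does not claim to reproduce.

```python
from html.parser import HTMLParser


class TextContent(HTMLParser):
    """Collects character data and ignores all tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        # Only text between tags arrives here; <wbr> and <abbr> do not.
        self.parts.append(data)


def text_content(markup):
    parser = TextContent()
    parser.feed(markup)
    parser.close()
    return "".join(parser.parts)


email = text_content('<span itemprop="email">foo<wbr>@example<wbr>.com</span>')
name = text_content('<span itemprop="name"><abbr>Dr.</abbr> Foo</span>')
print(email)
print(name)
```

Under these assumptions the wbr elements simply drop out of the extracted value, which is the behaviour the answer expects from a parser that knows HTML.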


When does relative XPath fail? Is a Selenium XPath locator reliable?

I'm doing automated tests. And I use SelectorHub to find elements in a website. In some cases I get very long Relative Xpath as you see below:
As I understand it, this will fail if the website changes in the future, because it has too many "DIV" steps. Why, then, is relative XPath said to be reliable? I could not manually create a shorter, more reliable path.
Any XPath that works today on a particular HTML page H1 may or may not produce the same result when applied (in the future) to a different HTML page H2. If you want it to have the best chance of returning the same result, then you want to minimise its dependencies, and in particular, you want to avoid having dependencies on the properties of H1 that are most likely to change. That, of course, is entirely subjective. It can be said that the longer your path expression is, the more dependencies it has (that is, the greater the number of changes that might cause it to break). But that's not universally true: the expression (//*)[842] is probably the shortest XPath expression to locate a particular element, but it's also highly fragile: it's likely to break if the HTML changes. Expressions using id attributes (such as //p[@id='Introduction']) are often considered reasonably stable, but they break too if the id values change.
The bottom line is that this is entirely subjective. Writing XPath expressions that are resilient to change in the HTML content is an art, not a science. It can only be done by reading the mind of the person who designed the HTML page.
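The dependency argument can be sketched in a few lines of Python. The two page versions, the paths and the locate helper below are all invented for illustration, and the standard library's xml.etree.ElementTree only supports a subset of XPath, so treat this as a toy model rather than what Selenium does.

```python
import xml.etree.ElementTree as ET

# Hypothetical page before and after a redesign that adds one wrapper element.
PAGE_V1 = "<html><body><div><div><p id='Introduction'>Hello</p></div></div></body></html>"
PAGE_V2 = (
    "<html><body><div><section><div><p id='Introduction'>Hello</p>"
    "</div></section></div></body></html>"
)

LONG_PATH = "./body/div/div/p"           # depends on every ancestor step
ID_PATH = ".//p[@id='Introduction']"     # depends only on the id value


def locate(page, path):
    """Return the text of the first match, or None if nothing matches."""
    node = ET.fromstring(page).find(path)
    return None if node is None else node.text


for label, page in (("v1", PAGE_V1), ("v2", PAGE_V2)):
    print(label, locate(page, LONG_PATH), locate(page, ID_PATH))
```

Here the long path stops matching once the wrapper is added, while the id-based path survives. Had the redesign renamed the id instead, the outcome would be reversed, which is the sense in which the choice remains a judgment call.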

How to style parts of i18n messages when using thymeleaf

I'm not sure this is the right place to ask this. I would like to know how best to style parts of messages from l10n properties files. For example, my client want this message and formatting in a help window:
This is a self-assessment and comparison application.
The simplest solution would be to include the HTML tags in the entry for this label. The problem with that is that the 40 translators who will process them are bound to make mistakes like deleting the <, translating the attributes or styles of the HTML markup, etc. It also makes maintaining the markup and styling difficult for the devs.
Any better way to do this?
The solution I've seen typically done just uses th:utext with HTML tags in the .properties files. I would opine it does create a maintenance hassle as you mention and should be kept to a minimum.
One workaround is to create separate strings in some cases, like:
<span th:text="#{thisIsA}">This is a </span><strong><span th:text="#{selfAssessment}">self-assessment</span></strong>
However, this is error-prone since certain languages may change the order of the words. So that's not a great option.
If the HTML tags specifically are an issue, another way albeit somewhat ugly could be:
thisIsASelfAssessment=This is a {0}self-assessment{1}.
Or even
thisIsA=This is a {0}.
But that might be confusing for the next developer reading it and may introduce the same issue you have with the 40 translators looking at it since you have curly braces. It also all becomes very tedious and generates more lines.
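For what the placeholder variant amounts to, here is a small sketch in Python, whose str.format happens to use the same {0}/{1} syntax. The render helper is hypothetical and this is not Thymeleaf's API; it only shows the idea of keeping the markup on the developer side and escaping whatever the translators wrote.

```python
from html import escape

# Hypothetical entry, as it would appear in the .properties file.
MESSAGE = "This is a {0}self-assessment{1}."


def render(message, open_tag, close_tag):
    # Escape the translator-supplied text first, so a stray < or & stays
    # harmless, then substitute the developer-controlled tags for {0} and {1}.
    return escape(message).format(open_tag, close_tag)


html_out = render(MESSAGE, "<strong>", "</strong>")
print(html_out)
```

The trade-off is the one already mentioned: translators no longer see HTML, but they still have to leave the curly-brace placeholders intact.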
So in the end, you're likely best going with the simplest solution of utext.
Project-wise, you could have the initial translation done without the markup and add the markup in after they are done with a first pass at translating it. The issue may arise in the future when you need to change strings, but doing this would minimize some headache. It could make sense to keep these strings in a separate block in the .properties file so you can target them later.
Good question as I've had this issue myself.

Losing Aria/accessibility when converting from HTML to PDF

I am using ABCpdf to generate a collection of PDFs from HTML markup, and am struggling with making it fully accessible.
The HTML pages include several graphs which are created by CSS, and which are completely ignored by the screenreader.
I have tried using aria-label to give a written explanation of the graphs, but it is lost in the conversion. I have tried configuring the Gecko engine within ABCpdf in numerous ways, including scaling back security options, altering markup options, and adding special tags to explicitly include an element. The PDF is tagged and is rated as fully accessible by our evaluation program.
I haven't been able to find a way to include "hidden" text in the PDF for the purpose of screenreaders. Any help is appreciated!
EDIT: Due to security concerns, I am unable to display the actual data behind the graphs. Manual steps are also not an option due to the sheer number of generated PDFs, and a short timeline.
HTML-to-PDF conversion utilities are usually pretty basic and typically don't handle complex CSS very well at all. You may be better off taking a screen capture and then using alt-text to describe the intent of the graph. Sometimes the simplest approach is the most reliable.
Another way of approaching the issue would be to present the complete data set to users via a data table. That way, they can "see" everything contained in the graph, and it won't matter if the graph itself is inaccessible. If placing a giant data table in the middle of your document doesn't fit with your formatting, you can also include the data set in an appendix with a note or hyperlink in the text directing readers where they can go to access the entirety of information.
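If the data table route is an option, the markup could be generated alongside each graph. The sketch below is only an assumption about how that might look: the data_table helper and the sample rows are invented, and whether a given HTML-to-PDF converter carries the caption and header cells through to the PDF tags would need to be checked against the actual output.

```python
from html import escape

# Hypothetical values standing in for the data behind one graph.
ROWS = [("Q1", 12), ("Q2", 18)]


def data_table(caption, headers, rows):
    """Build a plain HTML table with a caption and column header cells."""
    head = "".join(f'<th scope="col">{escape(h)}</th>' for h in headers)
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(cell))}</td>" for cell in row) + "</tr>"
        for row in rows
    )
    return (
        f"<table><caption>{escape(caption)}</caption>"
        f"<thead><tr>{head}</tr></thead><tbody>{body}</tbody></table>"
    )


table_html = data_table("Results by quarter", ["Quarter", "Score"], ROWS)
print(table_html)
```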

Xpath selector difference: pros and cons

I have two xpath selectors that find exactly the same element, but I wonder which one is better from code, speed of execution, readability points of view etc.
First Xpath :
//*[@id="some_id"]/table/tbody/tr[td[contains(., "Stuff_01")]]//ancestor-or-self::td/input[@value="Stuff_02"]
Second Xpath:
The argument is, for example, that if the code of the page changes and some "tbody" is moved, the first one won't work. Is that true?
So, which variant is better, and why?
I would appreciate an elaborate answer, because this is crucial to the workflow.
It is possible that neither XPath is ideal. Seeing the targeted HTML and a description of the selection goal would be needed to decide or to offer another alternative.
Also, as with all performance matters, measure first.
That said, performance is unlikely to matter, especially if you use an @id or other anchor point to hone in on a reduced subtree before further restraining the selection space.
For example, if there's only one elem with id of 1234 in the document, by using //elem[@id="1234"]/rest-of-xpath, you've eliminated the rest of the document as a performance/readability/robustness concern. As long as the subtree below elem is relatively tame (and it usually will be), you'll be fine regarding those concerns.
Also, yes, table//td is a fine way to abstract over whether tbody is present or not.
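The tbody point is easy to check with a small sketch. The two fragments and the cells helper below are made up for the example, and Python's xml.etree.ElementTree implements only part of XPath (no contains(), for instance), so this covers just the child-step versus descendant-step difference.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragments: the same table with and without an explicit tbody.
WITH_TBODY = "<div><table><tbody><tr><td>Stuff_01</td></tr></tbody></table></div>"
WITHOUT_TBODY = "<div><table><tr><td>Stuff_01</td></tr></table></div>"

STRICT = "./table/tbody/tr/td"   # requires tbody to be present
LOOSE = "./table//td"            # any td below table, at any depth


def cells(markup, path):
    """Return the text of every td matched by path."""
    return [td.text for td in ET.fromstring(markup).findall(path)]


for markup in (WITH_TBODY, WITHOUT_TBODY):
    print(cells(markup, STRICT), cells(markup, LOOSE))
```

The strict path returns nothing once tbody is absent, while the descendant step matches in both cases.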

What does the word "semantic" mean in Computer Science context?

I keep coming across the use of this word and I never understand its use or the meaning being conveyed.
Phrases like...
"add semantics for those who read"
"HTML5 semantics"
"semantic web"
"semantically correctly way to..."
... confuse me and I'm not just referring to the web. Is the word just another way to say "grammar" or "syntax"?
Semantics are the meaning of various elements in the program (or whatever).
For example, let's look at this code:
int width, numberOfChildren;
Both of these variables are integers. From the compiler's point of view, they are exactly the same. However, judging by the names, one is the width of something, while the other is a count of some other things.
numberOfChildren = width;
Syntactically, this is 100% okay, since you can assign integers to each other. However, semantically, this is totally wrong, since the width and the number of children (probably) don't have any relationship. In this case, we'd say that this is semantically incorrect, even if the compiler permits it.
Syntax is structure. Semantics is meaning. Each different context will give a different shade of meaning to the term.
HTML 5, for example, has new tags that are meant to provide meaning to the data that is wrapped in the tags. The <aside> tag conveys that the data contained within is tangentially-related to the information around itself. See, it is meaning, not markup.
Take a look at this list of HTML 5's new semantic tags. Contrast them against the older and more familiar HTML tags like <b>, <em>, <pre>, <h1>. Each one of those will affect the appearance of HTML content as rendered in a browser, but they can't tell us why. They contain no information of meaning.
The word ‘semantic’ as an adjective simply means ‘meaningful’, which is closely related to the term 'high level' in computer science.
For instance:
Semantic data model:
a data model that is semantic, that is meaningful and understood by anyone regardless of his background or expertise.
C++ is less semantic than Java, because Java uses meaningful words for its classes, methods and fields.
HTML5 semantics: refer to the tags that describe themselves such , , and so on.
It means "meaning", what you've got left when you've already accounted for syntax and grammar. For example, in C++ i++; is defined by the grammar as a valid statement, but says nothing about what it does. "Increment i by one" is semantics.
HTML5 semantics is what a well-formed HTML5 description is supposed to put on the page. "Semantic web" is, generally, a web where links and searches are on meaning, not words. The semantically correct way to do something is how to do it so it means the right thing.
It is not just Computer Science terminology, and if you ask,
What is the meaning behind this Computer Science lingo?
then I'm afraid we'll get in a recursive loop just like this.
In the HTML world, "semantic" is used to talk about the meaning of tags, rather than just considering how the output looks. For example, it's common to italicize foreign words, and it's also common to italicize emphasized words. You could simply wrap all foreign or emphasized words in <i>..</i> tags, but that only describes how they look, it doesn't describe why they look that way.
A better tag to use for emphasized word is <em>..</em>, because it conveys the semantics of emphasis. The browser (or your stylesheet) can then render them in italics, and other consumers of the page will know the word is emphasized. For example, a screen-reader could properly read it as an emphasized word.
From my view, it's almost like looking at syntax in a grammatical way. I can't speak to semantics in the broad sense, but when people talk about semantics on the web, they are normally referring to the idea that if you stripped away all of the CSS, JavaScript, etc., what was left (the bare-bones HTML) would still make sense when read.
It also takes into account using the correct tags for correct markup. This stems from the old table-based layouts (tables should only be used for tabular data), and using lists to present list-like content.
You wouldn't use an h1 for something that was less important than an h2. That wouldn't make sense.
The below is syntactically different but semantically the same:
C, C++, C#, Java, JavaScript, Python, Ruby, etc.
x += y
Perl, PHP
$x += $y