List of docx4j supported XHTML tags - docx4j

Is there a list of the XHTML tags and CSS properties supported by the docx4j XHTML importer?
Thanks.

*Disclosure: I wrote the docx4j code in question*
There is no definitive list. However, there is support for at least:
p, div, li
table
h1 to h3
img
span
a
br
There is no support for font color right now, nor u (underline).
Support is a 2 phase affair:
1. Flying Saucer (XHTMLRenderer) needs to support it. FS supports pretty much all of CSS 2.1; see "What XHTML CSS features does Flying Saucer currently support".
2. docx4j needs to convert the relevant Flying Saucer construct to WordML.
If in doubt, just try it on the XHTML of interest to you.
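Before trying a document, it can help to see which of its tags are not among those named above. The following is only a rough sketch in Python (not part of docx4j): the names `NAMED_IN_ANSWER`, `SCAFFOLDING` and `tags_to_try` are made up for this example, and a tag being reported only means it is worth testing, not that it is unsupported.

```python
import xml.etree.ElementTree as ET

# Tags the answer above names as having "at least" some support.
# This is not a definitive list; absence here only means "try it".
NAMED_IN_ANSWER = {"p", "div", "li", "table", "h1", "h2", "h3",
                   "img", "span", "a", "br"}

# Document scaffolding skipped by this sketch (an assumption of the example).
SCAFFOLDING = {"html", "head", "title", "body"}


def local_name(tag):
    """Drop a namespace prefix such as '{http://www.w3.org/1999/xhtml}p'."""
    return tag.rsplit("}", 1)[-1]


def tags_to_try(xhtml):
    """Return the tags in well-formed XHTML that the answer does not name."""
    root = ET.fromstring(xhtml)
    found = {local_name(el.tag) for el in root.iter()}
    return sorted(found - NAMED_IN_ANSWER - SCAFFOLDING)


sample = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    "<h1>Title</h1>"
    "<p>Some <u>underlined</u> and <span>plain</span> text.</p>"
    "<ul><li>item</li></ul>"
    "</body></html>"
)
print(tags_to_try(sample))
```

For the sample, the tags reported are u and ul; those are the ones to check in the generated docx first.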

Related

Are there any differences between these 4 HTML elements?

Can I clarify that these 4 elements below actually do the same job by rendering the text in italic and there is no difference in using each of them except to differentiate the type of content?
<i>
<em>
<address>
<cite>
Yes, you can. Do this:
<html>
<body>
<i>italic text</i>
<em>italic text</em>
<address>italic text</address>
<cite>italic text</cite>
</body>
</html>
Put this into a file called <filename>.html and open it in a browser (e.g. Chrome). If the text looks the same, it looks the same!
As you can see, only <address> starts on a new line, because it is a block-level element; <i>, <em> and <cite> are inline. Otherwise there is no visible difference. If you'd like to change the styles yourself, you can create a CSS file.
You are right: these 4 elements all render their content in italics by default. They differ only in the kind of content they mark up.
The <i> tag defines a part of text in an alternate voice or mood.
The <em> tag is used to define emphasized text.
The <address> tag defines the contact information for the author/owner of a document or an article.
The <cite> tag defines the title of a creative work (e.g. a book, a poem, a song, a movie, a painting, a sculpture, etc.).
Yes, they all have the same default style which sets the content to italic.
And yes, a particular tag will be chosen according to the content.
Reference: for more information, refer to the links below:
What is the difference between <cite>, <em>, and <i> tags of HTML?
What's the difference between <b> and <strong>, <i> and <em>?

Custom live template does not show up in xhtml JSF file

I am using IntelliJ 2017.2 ultimate edition and I have created a live template to surround text in xhtml (JSF) by a tag and attribute.
#{myvalue}
should become
<h:outputText value="#{myvalue}" />
I created a live template named "swot", applicable in XML: XML Text.
<h:outputText value="$SELECTION$" />
Unfortunately, it does not show up when I try to surround a selection in an XHTML file. I only get the standard live templates.
Update with solution
As the file in which I want to apply the template is a JSF XHTML file, I just had to make it applicable to "JSP".
Setting the applicability to HTML should solve your issue.

Converting HTML with custom CSS fonts (@font-face) to image

I need to convert HTML with custom CSS fonts (@font-face) to an image. I have evaluated html2image and flying-saucer, but both fail to render the custom CSS font.
Below is the CSS font declaration in the HTML.
@font-face{font-family:'FuturaBT';src:url(../fonts/FuturaBT.eot?#iefix) format("embedded-opentype")
Flying Saucer requires TTF fonts. I am not sure whether EOT is supported.

Is it possible to use Selenium WebDriver for xHtml Web Pages?

I want to select a portion of text in a web page. When I checked the XPath for that text, I found that this is an XHTML page. Can we use Selenium WebDriver for this?
According to https://groups.google.com/forum/#!topic/selenium-developers/U6A5FktzLMQ it is only partially supported. The problem is with the namespaces that you can define in XHTML. It's a good discussion, though it might be outdated.

Should I use <![CDATA[...]]> in HTML5?

I'm pretty sure <![CDATA[...]]> sections can be used in XHTML5, but what about HTML5?
The CDATA structure isn't really for HTML at all, it's for XML.
People sometimes use them in XHTML inside script tags because it removes the need for them to escape <, > and & characters. It's unnecessary in HTML though, since script tags in HTML are already parsed like CDATA sections.
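A minimal sketch of that difference, using Python's standard library as a stand-in for the two kinds of parser (an assumption for illustration; browsers use their own parsers). `html.parser` treats script content as raw text, while `xml.etree.ElementTree` applies XML rules. The helper names `ScriptText`, `html_script_text` and `xml_parses` are made up for this example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

JS = "if (a < b && c > d) { run(); }"


class ScriptText(HTMLParser):
    """Collects the character data found inside <script> elements."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        if self.in_script:
            self.chunks.append(data)


def html_script_text(markup):
    """Return the script content as the HTML parser reports it."""
    parser = ScriptText()
    parser.feed(markup)
    parser.close()
    return "".join(parser.chunks)


def xml_parses(markup):
    """Return True if the markup is well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


plain = "<script>" + JS + "</script>"
wrapped = "<script><![CDATA[" + JS + "]]></script>"

print(html_script_text(plain) == JS)      # HTML rules: content passes through untouched
print(xml_parses(plain))                  # XML rules: bare < and & are not well-formed
print(ET.fromstring(wrapped).text == JS)  # XML rules with CDATA: content preserved
```

The unwrapped script is fine for the HTML parser but is rejected as XML; with the CDATA wrapper the XML parser hands back the same JavaScript text.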
Edit: This is where we open that really mouldy old can of worms from 2002 over whether you're sending XHTML as text/html or as application/xhtml+xml like you’re “supposed” to :-)
From the same page @pst linked to:
Element-specific parsing for script and style tags, Guidance for XHTML-HTML compatibility: "The following code with escaping can ensure script and style elements will work in both XHTML and HTML, including older browsers."
Maximum backwards compatibility:
<script type="text/javascript"><!--//--><![CDATA[//><!--
...
//--><!]]></script>
Simpler version, sort of incompatible with "much older browsers":
<script>//<![CDATA[
...
//]]></script>
So, CDATA can be used in HTML5, and it's recommended in the official Guidance for XHTML-HTML compatibility.
This is useful for polyglot HTML/XML/XHTML pages, which are served as strict application/xml XML during development but as text/html HTML5 in production for better cross-browser compatibility. Polyglot pages have their benefits; I've used this myself, as it's much easier to debug XML/XHTML5. Google Chrome, for example, will throw an error for invalid XML/XHTML5 (including, for example, character escaping), whereas the same page served as HTML5 will "just work", also known as "probably work".
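To see why the simpler pattern quoted above suits a polyglot page, here is a small sketch that feeds the same script element to an HTML parser and to an XML parser. It uses Python's standard library purely as a stand-in for browser parsers, and the names `RawText`, `seen_by_html` and `seen_by_xml` are invented for the example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

JS = "if (a < b) { run(); }"
PAGE_SCRIPT = "<script>//<![CDATA[\n" + JS + "\n//]]></script>"


class RawText(HTMLParser):
    """Accumulates all character data the HTML parser reports."""

    def __init__(self):
        super().__init__()
        self.text = ""

    def handle_data(self, data):
        self.text += data


def seen_by_html(markup):
    """Script text as an HTML parser passes it to the script engine."""
    parser = RawText()
    parser.feed(markup)
    parser.close()
    return parser.text


def seen_by_xml(markup):
    """Script text as an XML parser reports it, CDATA markers removed."""
    return ET.fromstring(markup).text


html_view = seen_by_html(PAGE_SCRIPT)
xml_view = seen_by_xml(PAGE_SCRIPT)
print(repr(html_view))
print(repr(xml_view))
```

In the HTML view the CDATA markers are still present but sit behind `//`, so JavaScript reads them as comments. In the XML view the markers are consumed by the parser and only the bare `//` lines remain. The JavaScript statement itself is an intact line in both.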
The spec seems to clear up this issue. script and style tags are considered to be "raw text elements." CDATA is not needed or allowed for them. CDATA is only used with "foreign content" - i.e. MathML and SVG. Note that there are some restrictions to what can go in the script tag -- basically you can't put something like var x = '</script>' in there because it will close the tag and needs to be split like pst noted in his answer. http://www.w3.org/TR/html5/syntax.html#cdata-rcdata-restrictions
HTML5-supporting browsers (and most older browsers going back to 2001) already read the content inside <style> and <script> tags as CDATA (character data). That means you generally do not need to add CDATA sections inside those elements for HTML browsers built in the past 20 years: they will accept any special characters that might pop up in the CSS and JavaScript between the tags.
However, you do need to add a CDATA section inside <style> and <script> if you want your HTML5 page to be compatible with XHTML and XML browsers and parsers, which do not treat those elements specially. For that reason, I do recommend you use CDATA in HTML5 <style> and <script> tags, but please read on. If you do not do this right, you will break your website!
Note: A CDATA section tells an XML parser to treat special characters (such as < or &) as plain text rather than as markup, which would otherwise break the document. In modern HTML parsers only <style> and <script> have this behavior built in: their content is not read or parsed as markup. Without that built-in behavior, your web page, styles, and scripts could break!
XML and XHTML parsers read <style> and <script> content as they do every other element, as PCDATA: the contents are parsed as markup and can break on special characters. Wrapping the content in a CDATA section prevents those characters from being interpreted as markup or character references.
The problem is that HTML4/HTML5 browsers and parsers do not recognize CDATA sections inside those tags; the markers are passed through to the CSS or JavaScript engine as literal text. So if you add CDATA sections for XHTML/XML support, the markers have to be hidden inside CSS or JavaScript comments for those agents.
Also note that HTML comment markers (<!-- and -->) inside those tags are ignored by HTML parsers but honored by XHTML ones, so in XHTML they comment out the CSS or JavaScript they enclose. Many people used to add comment markers inside those tags to hide styles and scripts from very old browsers that did not understand CSS or JavaScript (pre-1998 browsers). That strategy fails in XHTML without additional code.
So how do you combine all that inside <style> and <script> tags, and should you care?
I am a purist and like my HTML5 content to still be XML/XHTML-friendly, regardless of what markup recommendation I am using. I also like my pages to work in browsers that know CSS and older browsers that do not. So here are two solutions to support all those scenarios and still display your styles and scripts in modern browsers without error. They are totally safe to use in modern HTML5 browsers:
STYLE
<style type="text/css">
<!--/*--><![CDATA[/*><!--*/
...put your styles here
/*]]>*/-->
</style>
SCRIPT
<script type="text/javascript">
<!--//--><![CDATA[//><!--
...put your scripts here
//--><!]]>
</script>
ADDITIONAL NOTES
These two code blocks do not change anything in modern HTML5 browsers.
These two code blocks will allow your CSS and JavaScript to work normally as before in HTML5 browsers but hide CSS and JavaScript from very old browsers (pre-2001) that do not support those technologies.
XHTML browsers will now parse your CSS and JavaScript as before but not allow special characters like <, >, and & to be interpreted as markup or entities/escaped characters which would generate parsing errors. They are CDATA now.
XML parsers of your page will not understand your CSS and JavaScript, of course, but will accept any type of text you add in there and not try and parse them as markup. They are CDATA now.
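One way to check these notes for the STYLE block is to run it through an HTML parser and an XML parser and compare what is left for the CSS engine. The sketch below uses Python's standard library as a stand-in for real browsers. `effective_css` is a deliberately rough approximation written for this example (it strips `/* */` comments and the `<!--` / `-->` tokens that CSS tolerates), not a real CSS parser.

```python
import re
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

RULE = "p > a { color: red; }"
STYLE = "\n".join([
    '<style type="text/css">',
    "<!--/*--><![CDATA[/*><!--*/",
    RULE,
    "/*]]>*/-->",
    "</style>",
])


class RawText(HTMLParser):
    """Accumulates all character data the HTML parser reports."""

    def __init__(self):
        super().__init__()
        self.text = ""

    def handle_data(self, data):
        self.text += data


def effective_css(text):
    """Rough approximation of what a CSS engine keeps from the text."""
    text = re.sub(r"/\*.*?\*/", "", text, flags=re.S)
    return text.replace("<!--", "").replace("-->", "").strip()


html_parser = RawText()
html_parser.feed(STYLE)
html_parser.close()
html_view = html_parser.text

# The XML parser drops the first comment and unwraps the CDATA section.
xml_view = ET.fromstring(STYLE).text

print(repr(html_view))
print(repr(xml_view))
print(effective_css(html_view) == effective_css(xml_view) == RULE)
```

The two parsers hand over different raw text, but once the comments are discounted the same rule survives in both, including its `>` character.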
HOW THE EXAMPLES WORK
For modern HTML5-supporting browsers, script and style elements act like CDATA, so everything inside them is treated as characters rather than markup, and the comment markers <!-- and --> inside them are ignored. Older browsers (pre-2001) that do not know scripts or CSS do not treat script and style content this way. They recognize the HTML comment markers and so comment out all the CSS and JavaScript between them. Note that some browsers do know CSS and scripts but also read the HTML comments, so we close the first comment (<!--/*-->); they are then forced to read the <![CDATA[/*> block (used for XHTML and XML parsers), which to them is an empty unknown element and so is ignored. The HTML comment that follows is designed to hide all the CSS and scripts from there to the end of the block. The final <!]]> is another ignored empty element that closes the unknown CDATA markup for those that still read it.
For XHTML, parsers read everything inside these tags as markup; they do not treat the content as CDATA by default. They therefore need a CDATA section wrapped around all the CSS and JavaScript in the block so that they behave like HTML5 browsers. XHTML parsers do read the HTML comment markers, but the first comment is closed immediately (<!--/*--> or <!--//-->), so it hides nothing. The <![CDATA[ marker is then read and starts the CDATA section, which is standard XML syntax. It wraps all the styles and scripts inside the tags until ]]> ends it, creating a true CDATA wrapper that hides XML special characters. Everything inside the CDATA section is passed on as normal CSS and scripts, as in HTML5 parsers, and the XHTML parser no longer recognizes markup inside it. Because old and new XHTML browsers all know CSS and JavaScript, they still process that code correctly, ignoring XML reserved characters.
XML parsers handle the block the same way: the first comment closes immediately and the CDATA section is read as plain character data. They know nothing about CSS and JavaScript comments, but since they do not need to parse the CSS or JavaScript code, that does not matter.
Your HTML5 page is now cross-compatible with modern HTML5 and XHTML5 browsers, older HTML/XHTML browsers, very old 1990s browsers without CSS or script support, and various XML parsers, old and new! Enjoy.
Perhaps see: http://wiki.whatwg.org/wiki/HTML_vs._XHTML
<![CDATA[...]]> is a bogus comment.
In HTML, <script> is already protected -- this is why sometimes it must be written as a = "<" + "/script>", to avoid confusing the browser. Note that the code is valid outside a CDATA in HTML.
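A small sketch of why the split is needed, again using Python's `html.parser` as a stand-in for a browser's HTML parser (an assumption for illustration). The class name `FirstScript` and the helper `first_script_text` are made up for the example.

```python
from html.parser import HTMLParser


class FirstScript(HTMLParser):
    """Records the text of the first <script> element only."""

    def __init__(self):
        super().__init__()
        self.state = "before"  # before -> inside -> after
        self.text = ""

    def handle_starttag(self, tag, attrs):
        if tag == "script" and self.state == "before":
            self.state = "inside"

    def handle_endtag(self, tag):
        if tag == "script" and self.state == "inside":
            self.state = "after"

    def handle_data(self, data):
        if self.state == "inside":
            self.text += data


def first_script_text(markup):
    """Return what the HTML parser considers the first script's content."""
    parser = FirstScript()
    parser.feed(markup)
    parser.close()
    return parser.text


naive = "<script>var a = '</script>';</script>"
split = "<script>var a = '<' + '/script>';</script>"

print(repr(first_script_text(naive)))  # script is cut off inside the string literal
print(repr(first_script_text(split)))  # the whole statement survives
```

With the naive form the parser ends the element at the first `</script>` it sees, leaving a truncated statement; splitting the string keeps that sequence out of the source text.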