dompdf absolutely positioned elements page break - css-position

I have a dynamically generated page made up of some divs, with tables and other elements inside those divs, all absolutely positioned. The lower divs can potentially hold more content, such as comments or a description, so they can be longer than a page.
The problem is that dompdf doesn't insert a page break; the content just runs to the end of the first page and the rest of my HTML gets cut off.
Obviously page-break-before/after: always is not going to work, since the content is dynamic: it may or may not span multiple pages, depending on the entry.
Does anyone know of a simpler way to make it behave, apart from measuring the content height and inserting page breaks with JS before the HTML is generated and sent over to dompdf?

The issue was a large div containing all the elements within the body. Dompdf is not able to break that div into pieces across pages. Once the wrapper div was gone, dompdf arranged the separate elements within the body tag just fine.

The reason is that dompdf doesn't insert an auto page break in elements with position:absolute. Here's the bug report: Auto page break in "position:absolute" elements not working
So without knowing the details of your CSS, I would say that the problem is not about "breaking the div into pieces" but about dompdf treating absolute as literally absolute on the current page.

bootstrap collapse not collapsing after second click

I am using Twitter Bootstrap's collapse plugin. The problem is that after the second click on the a tag (which should open and then hide the div), the div is not hidden.
On the first click, the 'in' class disappears from the div; after that it stays on, which keeps the div visible.
I tried copying the exact code from the Bootstrap site, without any changes, and the problem persists.
I read some posts on Stack Overflow saying the problem comes from including more than one copy of the Bootstrap JS or CSS. I checked, and that is not my case.
Something is blocking the collapse.
http://twitteer_D.com/javascript/#collapse/Twitteer_D

Exclude menu from content extraction with tika

I generate HTML documents that contain a menu and a content part. I then want to extract the content of these documents to feed it to a Lucene index. However, I would like to exclude the menu from the content extraction and thus index only the content.
<div class="menu">my menu goes here</div>
<div class="content">my content goes here</div>
What is the simplest way to achieve this with Apache Tika?
As a more general solution (not just for your specific menu), I would advise looking at boilerpipe, which deals with removing uninteresting parts from pages (menus, navigation, etc.).
I know it can be integrated into Solr/Tika; have a look, and you can probably integrate it into your scenario.
Have a look at this post, which describes how to handle DIVs during the HTML parse by specifying whether they are safe to parse or not; if not, the DIV is ignored. For your problem, you could put some logic in the overridden methods to ignore only DIV elements with the class attribute value "menu" (i.e. tell the Tika parser that this DIV is unsafe to parse).
You can parse the HTML with a parser into an XHTML DOM object and remove the div tag containing the attribute class="menu".
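The "parse and drop the menu div" idea can be sketched without Tika at all. The block below is only an illustration in Python's standard-library html.parser, not Tika's API; the names MenuStripper and extract_content are made up for this example, and it assumes the menu is always a div whose class list contains "menu".

```python
from html.parser import HTMLParser


class MenuStripper(HTMLParser):
    """Collects text content while skipping any <div class="menu"> subtree.

    Hypothetical helper for illustration; this is not part of Apache Tika.
    """

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # > 0 while we are inside a menu div
        self.chunks = []     # text pieces found outside the menu

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.skip_depth or "menu" in classes:
            # Count nested divs so we know which </div> closes the menu.
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "div" and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self.skip_depth:
            self.chunks.append(text)


def extract_content(html):
    """Return the document text with every menu div left out."""
    parser = MenuStripper()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


if __name__ == "__main__":
    page = ('<div class="menu">my menu goes here</div>\n'
            '<div class="content">my content goes here</div>')
    print(extract_content(page))
```

The resulting string is what would be handed to the indexer. In a Tika pipeline the same filtering would live in a custom content handler or in a pre-processing step before the parser runs.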

What HTML5 Tag Should be Used for a "Call to Action" Div?

I am new to HTML5 and am wondering which HTML5 tag should be used on a Call to Action div that sits in a column next to the main content on the home page.
Option 1:
<aside>
//call to action
</aside>
Option 2:
<article>
<section>
//call to action
</section>
</article>
The reason I ask is because I don't see either option as being a perfect fit. Perhaps I am missing something. Thanks!
My HTML for the Call to Action:
<section class="box">
<hgroup>
<h1 class="side">Call Now</h1>
<h2 class="side">To Schedule a Free Pick-Up!</h2>
<ul class="center">
<li>Cleaning</li>
<li>Repair</li>
<li>Appraisals</li>
</ul>
<h3 class="side no-bottom">(781) 729-2213</h3>
<h4 class="side no-top no-bottom">Ask for Bob!</h4>
</hgroup>
<img class="responsive" src="img/satisfaction-guarantee.png" alt="100% Satisfaction Guarantee">
<p class="side">We guarantee you will be thrilled with our services or your money back!</p>
</section>
This is a box on the right column of a three column layout. The content in the large middle column gives a summary of the company's services. If you wanted to use those services, you would have to schedule a pick-up, hence the call to action.
Does anyone object to this use of HTML5, or have a better way?
My take is that the best practices for the new HTML5 structural elements are still being worked out, and the forgiving nature of the new HTML5 economy means that you can establish the conventions that make the most sense for your application.
In my applications, I have separate considerations for markup that reflects the layout of the view (that is, the template that creates the overall consistency from page to page) versus the content itself (usually any function or query results that receive additional markup before being inserted into the various regions in the layout). The distinction matters because the layout element semantics (like header, footer, and aside) don't really help with differentiation of the content during search since that markup is usually repeated from page to page. I particularly favor using the semantic distinctions in HTML5 to describe the content the user is actually searching on. For example I generally use article to wrap the primary content and nav to wrap any associated list of links. Widget wrappers are usually tied to the page layout, so I'd go with the convention of the template for that guideline.
Whenever I have to decide on semantic vs generic names, my general heuristic is:
If there is a possible precedent already in the page template, follow that precedent;
If the element in question is a new part of the page layout (vs a content query that is rendered into a region in the layout) and there is no guiding pattern in the template already, div is fine for associating that page layout behavior to;
If the content is created dynamically (that is, anything that gets instanced into the layout at request time--posts, navigation, most widgets), wrap it in a semantic wrapper that best describes what that item is (vs how you think it should appear).
Whenever authoring or generating content, use semantic HTML5 markup as appropriate within that content (hgroup to bracket hierarchical headings, section to organize chunks within the article, etc.). This is future-proof enrichment for search.
According to all this, div would be fine as a wrapper for your widget unless your page template already establishes a different widget wrapper. Also, your use of heading elements for creating large, bold appearance within the widget is using markup for appearance rather than for semantics. Since your particular usage is appearance-motivated, it would be better to use divs or spans with CSS classes that can let you specify sizes, spacing, and other adornments as needed for that non-specific text rather than having to override the browser defaults for the heading elements. I'd save the heading elements for the page heading, for widget headings, and for headings within the primary content region of the page. There can be SEO ranking issues for misuse of headings that are not part of the main content.
I hope these ideas help in your consideration of HTML5 markup usage.
So far as the semantics of the markup go, Don's advice makes sense. (As you said your CTA was visually beside the main content and secondary to that content, I would favor aside, but there's no single correct answer.)
However, you've tagged your question with "seo," so I take it you're interested in the SEO benefits of using the right markup. At this time, Google doesn't give special weight to having nice, semantic markup; they don't care about the difference between things like aside, section, and div. This may be partly because the meaning of these tags is still being defined (by the practice of Web devs), but they even seem to ignore tags that are clearly relevant to search results (like nav, whose content will almost always be irrelevant to a page's description in the search results).
Instead, they heavily favor using microdata for marking up rich semantics. In the short term, marking your page up using the Schema.org WebPage microdata will likely provide greater benefit. You can mark your CTA as a relatedLink or significantLink, and keep it outside the mainContent of the page. If you're looking to optimize your page for search, this is a great way to do it: in my experience, Google very rarely shows text outside of your mainContent block in the search results description.
Proper markup depends on the actual content, which you have not provided.
That said, wrapping everything in a div is fine (although perhaps unnecessary) no matter what your content is as the <div> tag has no semantic value. Your two examples are probably not correct, unless your "call to action" is literally an entire article (which I doubt is the case).
The call to action might occur within an <aside>, but it's not likely that the call to action is the aside itself. Once again, that depends on the content (what it is) and context (where it is in relation to other content).
Typically "call to action" is a link somewhere, so the obvious answer to me is using an anchor, <a>.
It's just a link to another page. Use a div.

Placing two inline <div>s side by side in DTCoreText in Objective-C?

I have used DTCoreText (through OCPDFGenerator) in Objective-C for converting HTML to PDF. Everything works fine except placing two divs side by side.
(I cannot use tables as DTCoreText does not support table rendering as of now -
https://github.com/Cocoanetics/DTCoreText/issues/144 )
EX:
Left Right
There seems to be no way to do this: no matter what attributes we pass in style, the two pieces always run together, because internally they are merged into a single text.
The code I am using is -
<div style='float:left;position:relative;width:100%;'>
<span style='position:absolute;top:0;left:0;'>Left</span>
<span style='position:absolute;top:0;right:0;'>Right</span>
</div>
The output in the rendered PDF file is shown below. (The HTML renders correctly on a web page, and float:left behaves similarly, so the correctness of the HTML is not in doubt.)
LeftRight
The same issue applies to laying out many such text tabs side by side with predefined spacing.
One workaround was to append spaces to the first text string until it matches the given width, but this breaks with multi-line text, so I could not go ahead with it.
You should probably create another NSAttributedString and place it where you want. It seems that DTCoreText doesn't support multi-column layouts, so you have to do this in Objective-C (not HTML).

Alter Rendered Page in Webbrowser Control

Is there a way to alter the rendered HTML page in a WebBrowser control? I need to alter the rendered page in my WebBrowser control to highlight selected text.
What I did was use a WebClient and call WebClient.DownloadString() to get the source of the page, highlight the specific text, then write the result back into the WebBrowser. The problem is that the page's images do not appear, since they are referenced by relative paths.
Is there a way to solve this problem? Is there a way to detect images in a WebBrowser control?
Not sure why you need to change the HTML to highlight text; why not use IHighlightRenderingServices?
To specify a base url when loading HTML string you need to use the document's IPersistMoniker interface and specify a url in your IMoniker implementation.
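A different and simpler technique than the IPersistMoniker route is to put a standard HTML <base href="..."> element into the downloaded markup, so that relative image paths resolve against the original site. The sketch below is in Python rather than .NET and only illustrates the string manipulation; add_base_href is a hypothetical helper name, and whether the embedded browser honors the tag for a string-loaded document is an assumption to verify.

```python
import re
from html import escape


def add_base_href(html, base_url):
    """Insert <base href="..."> right after the opening <head> tag.

    Hypothetical helper for illustration. If the document has no <head>,
    the tag is prepended and the browser is left to sort it into place.
    """
    base_tag = '<base href="%s">' % escape(base_url, quote=True)
    # \b keeps <header> from matching; count=1 touches only the first <head>.
    new_html, count = re.subn(
        r"(<head\b[^>]*>)",
        lambda m: m.group(1) + base_tag,
        html,
        count=1,
        flags=re.IGNORECASE,
    )
    return new_html if count else base_tag + html


if __name__ == "__main__":
    page = '<html><head><title>t</title></head><body><img src="a.png"></body></html>'
    print(add_base_href(page, "http://example.com/dir/"))
```

The base URL passed in would be the address the page was downloaded from; the rest of the markup is left untouched.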
I suggest you do it a different way: download the page and replace the text using the WebBrowser control itself; this way your links will work. All you do is replace whatever is in the search TextBox with highlighted markup. Say the search term is "hello"; then you replace all occurrences of hello with the following:
<font color="yellow">hello</font>
Of course, this HTML can be replaced with the SPAN tag (an inline counterpart of the DIV tag, so your lines won't break with SPAN but will with DIV). In either case, both tags have a style attribute, where you can use CSS to change the color or any number of other CSS properties, as follows:
<SPAN style="background-color: yellow;">hello</SPAN>
There are many other ways to change color using HTML; feel free to search the web for more if you want.
Now, you can use the .Replace() function in .NET to replace the searched text; it's very easy. Get the whole document as a string using .DocumentText, replace all occurrences with .Replace(), and assign the result back to .DocumentText. You probably don't want the replacement to touch text inside the HTML tags themselves, though, so instead you can loop through all the elements on the page with a For Each loop like the one below:
For Each someElement as HTMLElement in WebBrowser1.Document.All
And each element will have a .InnerText/.InnerHTML and .OuterText/.OuterHTML that you can Get (read from) and Set (overwrite with replaced text).
For your needs, you'd probably just want to replace and overwrite the .InnerText and/or the .OuterText.
If you need more help, let me know. Either way, I'd like to know how it worked out for you, or whether there is anything more any of us can do to help with your problem. Cheers.
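The point about not replacing text inside the HTML tags themselves can be illustrated with a small sketch. It is written in Python with the standard-library html.parser rather than against the WebBrowser control, so Highlighter and highlight are hypothetical names and none of this is the .NET API. It wraps matches in the yellow SPAN shown above, but only in text nodes, leaving attributes, scripts, and styles alone.

```python
import re
from html.parser import HTMLParser

SPAN = '<span style="background-color: yellow;">%s</span>'


class Highlighter(HTMLParser):
    """Re-emits the document, wrapping matches found in text nodes only."""

    def __init__(self, term):
        # convert_charrefs=False so entities such as &amp; pass through as-is.
        super().__init__(convert_charrefs=False)
        self.pattern = re.compile(re.escape(term), re.IGNORECASE)
        self.in_raw = False  # True while inside <script> or <style>
        self.out = []

    def handle_starttag(self, tag, attrs):
        self.out.append(self.get_starttag_text())  # original tag text
        if tag in ("script", "style"):
            self.in_raw = True

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.out.append("</%s>" % tag)
        if tag in ("script", "style"):
            self.in_raw = False

    def handle_data(self, data):
        if self.in_raw:
            self.out.append(data)  # never touch script or style bodies
        else:
            self.out.append(self.pattern.sub(lambda m: SPAN % m.group(0), data))

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)

    def handle_comment(self, data):
        self.out.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.out.append("<!%s>" % decl)


def highlight(html, term):
    """Return html with every text-node occurrence of term highlighted."""
    parser = Highlighter(term)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    print(highlight('<p title="hello">hello world</p>', "hello"))
```

Note that the title attribute in the usage line is left unchanged, which a plain string replace over the whole document would not guarantee.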