What is the naming convention used when mapping a pdf template within nested repeat fields?

According to the instructions at http://wiki.orbeon.com/forms/doc/user-guide/form-builder-user-guide/pdf-generation#TOC-Multi-line-text, fields are mapped from Orbeon Forms to PDF templates using the pattern my-section$my-field. This works perfectly. For repeats, our research showed that the naming convention changes slightly to include a "$1" suffix (i.e. my-section$my-field$1, my-section$my-field$2).
We have a form with a nested repeat. We tried to apply the same naming convention and it did not work. We also experimented with other combinations, without success.
Please let us know how to map fields from nested repeats to pdf templates.
Thanks!

The syntax you mention, using $1, $2… at the end of the field id, looks correct to me. You can see this used in the DMV 14 example; you'll find it linked from the examples home page, and you can look at the source for that example on GitHub. If you open the PDF with Acrobat, you can see the ids, which follow the syntax you mentioned.
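As a minimal sketch of the pattern discussed above, the helper below builds field names of the form my-section$my-field, with an optional $n suffix for a single level of repeat. The function name and its parameters are hypothetical, and the sketch deliberately does not cover nested repeats, since the thread does not establish what the convention is in that case.

```python
from typing import Optional


def pdf_field_name(section: str, field: str, iteration: Optional[int] = None) -> str:
    """Build a PDF field name following the section$field[$n] pattern.

    `iteration` is the 1-based position inside a (non-nested) repeat;
    leave it as None for fields that are not inside a repeat.
    """
    name = f"{section}${field}"
    if iteration is not None:
        if iteration < 1:
            raise ValueError("repeat iterations are numbered from 1")
        name = f"{name}${iteration}"
    return name


# Names for the first two iterations of a repeated field.
names = [pdf_field_name("my-section", "my-field", i) for i in (1, 2)]
print(names)
```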

How to Implement org.eclipse.wst.sse.ui.semanticHighlighting

I'm trying to implement content-assist and some custom highlighting as a plugin for Eclipse. After a lot of research I found this Eclipse document.
I got content-assist working for XML documents. The problem is the SemanticHighlighting part: I didn't find any information about how to implement this extension point, and I'm a bit lost. The only info I found is the XSD for the extension point.
I'm trying to make some custom expressions in XML get a different color, for example:
<span>%%colored_text%%</span>
Where can I get more information about this org.eclipse.wst.sse.ui.semanticHighlighting and how to implement it?
I don't think there's a lot of documentation on semantic highlighting for SSE. The document that you found is a little light on details. For an example, the XSL project implemented semantic highlighting using the extension point.
The basic idea behind the semantic highlighting extension point is that when a change occurs, each implementor is asked whether it can 'consume' a region of the document. If it can, it returns an array of Positions that can be highlighted by that particular highlighter. A highlighter can apply only one style, so it ends up being very specific. For example, you wouldn't be able to say 'color this part of the region blue and this other part of the region red'. You would need two separate highlighters to accomplish that.
The highlighter obtains style information for the highlight by using a preference store that you return from getPreferenceStore(). You'll then need to set up keys that the highlighter will use to look up styles from that preference store. If you use the styleStringKey on the extension point, the only key of importance from the semantic highlighting implementation is the one returned from getEnabledPreferenceKey(). This is the condensed way to declare a style, as it only takes 2 preferences to get going. The semantic highlighting framework knows how to parse the string value returned by the preference store for the styleStringKey into the appropriate style components. Just follow the format as defined in the New Help for Old Friends document that you linked to.
Now, if you want to keep all the style components separate, the other get*PreferenceKey() methods become important. You'll have to define keys for each of them, and then add default values for each of those keys in your preference initializer.
org.eclipse.wst.xsl.ui.internal.preferences.XSLUIPreferenceInitializer has examples of both ways to define these style defaults.
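The 'consume' idea described above can be modelled outside Eclipse. The sketch below is only a conceptual illustration in Python, not the SSE API (which is Java): the function name consume and its arguments are hypothetical stand-ins. Given a region of the document, it returns the (offset, length) positions of %%...%% expressions that a single-style highlighter would claim.

```python
import re
from typing import List, Tuple

# Matches the custom %%...%% expressions from the question (non-greedy).
EXPRESSION = re.compile(r"%%.+?%%")


def consume(region_offset: int, region_text: str) -> List[Tuple[int, int]]:
    """Return (document offset, length) pairs this highlighter wants styled.

    An empty list means the highlighter does not consume the region.
    Every returned position would get the same single style.
    """
    return [
        (region_offset + match.start(), match.end() - match.start())
        for match in EXPRESSION.finditer(region_text)
    ]


positions = consume(0, "<span>%%colored_text%%</span>")
print(positions)
```

A second style (for example, a different color for another kind of expression) would need a second function of the same shape with its own pattern, mirroring the one-style-per-highlighter restriction.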

Mechanical Turk: Categorization project via Requester UI difficulties

I am a newbie in MTurk, and I am trying to create a very simple Categorization Project via their Requester UI (rather than the API).
Each batch I use has 10 items (a question and a possible answer). I have searched their documentation and forums with no luck, so I have several questions:
1. When I use their Standard Categorization template, I have no option for modifying the HTML and layout (as shown for the "Tagging of an image" project). The only formatting options are for the categories, instructions and includes/excludes. Is there a way to edit the HTML of the standard template they provide?
2. In the Standard Categorization template, my input data file (a CSV file) contains 10 items, but only 5 are shown (I tried with 6, and still only 5 are displayed in the preview). Is there a way to change this limit?
3. When I try to use "Create HITs Individually" (rather than the standard template, as explained above), I have the "Design Layout" options, but I cannot find a way to make the questions in the "form" required (which is possible via the API). Is there a way to achieve this?
If you stick to the standard project templates, you can't modify them. That's the reason to create HITs individually (through the RUI or via the API).
You'll have to show us your CSV file, because it's not really clear from your description what the issues could be.
Your third question is unclear, but basically for creating HITs individually, you simply do standard HTML markup and put in ${variablename} placeholders wherever you want one of your CSV upload variables to be placed.
If your project is at all large, I would definitely recommend going through the API. It's simply much more flexible than the RUI for creating any kind of customized design.
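To make the ${variablename} placeholder idea concrete, here is a minimal local sketch of how one CSV row fills one HTML layout. It only imitates the substitution for preview purposes; MTurk performs its own substitution when a batch is published. The column names question and answer and the sample rows are made up for illustration, and Python's string.Template is used simply because it happens to share the ${name} syntax.

```python
import csv
import io
from string import Template

# Hypothetical HIT layout: ordinary HTML with ${...} placeholders
# named after the CSV column headers.
LAYOUT = Template(
    "<p>${question}</p>\n"
    "<p>Suggested answer: ${answer}</p>"
)

# Hypothetical upload file with one row per HIT.
CSV_DATA = "question,answer\nIs the sky blue?,Yes\nIs grass red?,No\n"


def render_rows(layout: Template, csv_text: str) -> list:
    """Substitute each CSV row into the layout, returning one HTML string per row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [layout.substitute(row) for row in reader]


pages = render_rows(LAYOUT, CSV_DATA)
print(pages[0])
```

Because substitute() raises KeyError when a placeholder has no matching column, a quick local run like this can catch header typos before uploading.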

Extracting the actual in-text title from a PDF

There seem to be a lot of questions about extracting a title from a PDF (using its metadata). However, the large majority of the titles do not seem to exist in the metadata. I found this out when using pypdf.
Is there any way to actually retrieve the in-text title from a PDF? I tried to export to a text file and then search, but there is no consistent formatting. Is there any way to export the PDF to a document with its formatting, then check for a font size >= 14?
This is a very good question. Applications that create PDFs don't seem to do anything useful with the available metadata fields.
Take pdflatex as an example: even when one sets the \title{...} and \author{...} in the preamble, this information is not reflected in the metadata. After a quick search, the solution appears to be to introduce a block in the preamble which is read only by pdflatex [1]:
\pdfinfo
{
/Title (...)
/Author (...)
...
}
...which is then placed in the relevant metadata fields of the PDF. It is strange that this is necessary, though.
I cannot speak for word processors like Word or Writer. One presumes such metadata fields have to be set manually by the user.
Perhaps a heuristic approach is the only way you can approach your problem if your PDFs are not generated by you. [2] seems like it does something similar to what you want, but I guess it depends how well published the PDFs are -- this tool seems to be scientific-paper oriented.
I hope that is at least some help.
[1] http://wlug.org.nz/PdfLatexNotes
[2] http://www.molspaces.com/d_cb2bib-metadata.php
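The font-size heuristic suggested in the question could look roughly like the sketch below. It assumes the text spans of the first page have already been extracted as (text, font size) pairs by some PDF library; that extraction step is not shown, and the names Span and guess_title are hypothetical. The 14-point threshold comes from the question and will not suit every document.

```python
from typing import List, Optional, Tuple

Span = Tuple[str, float]  # (text, font size in points), in reading order


def guess_title(spans: List[Span], min_size: float = 14.0) -> Optional[str]:
    """Guess the title as the text set in the largest font at or above min_size.

    Returns None when no span reaches the threshold.
    """
    candidates = [
        (text.strip(), size)
        for text, size in spans
        if text.strip() and size >= min_size
    ]
    if not candidates:
        return None
    largest = max(size for _, size in candidates)
    # A title may be split over several spans; join those sharing the largest size.
    return " ".join(text for text, size in candidates if size == largest)


first_page = [
    ("Journal of Things", 9.0),
    ("A Study of", 18.0),
    ("PDF Titles", 18.0),
    ("Abstract", 12.0),
]
title = guess_title(first_page)
print(title)
```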

Understanding the PDF DOM

I am writing an application that has to read and interpret data stored in some PDF files. The reading part is done but I am only able to get a dump of all the words on a page and not the format of the words. What I mean is that if I have to extract a table, I am getting the numbers in the table but not the markup which defines the table.
Further, there is some formatting used which displays a few of these numbers within parentheses (meaning that those numbers are negative) but the parentheses themselves are not part of the text. Hence, I am not able to distinguish between positive and negative numbers present in the PDF table!
How do you get the PDF markup along with the text? Is a PDF similar in structure to an XML with tags used to markup tables etc.? If not, then, is there a resource which describes the salient features of the PDF DOM?
I am using VBA and the Acrobat library (AcroExch etc.)
There is no such thing as "PDF markup" in the sense of HTML etc. A table in PDF cannot be distinguished from line art, other than by using OCR, which can be error-prone if the layout is complex. It is simply drawn using geometrical shapes, like in a vector-based graphics program.
"Is a PDF similar in structure to an XML with tags used to markup tables etc.?"
No, not at all.
And there is no such thing as a 'DOM' either. Google for a file named *PDF32000_2008.pdf*. The current PDF specification for v1.7 (ISO spec) is that file. You should be able to locate it on the Adobe website.
As omz stated, text inside a PDF does not really have a structure. You can take a look at the specification here. However, for some very specific files, there is something called PDF Tags, or PDF Marked Content, which is fairly new and aims to give PDF documents some kind of structure. If you target this kind of file specifically, you might be able to achieve something. Take a look at chapter 10 (Document Interchange) of Adobe's specification for further details.
Maybe what you want to achieve can be done faster and with less effort by using TET, the Text Extraction Toolkit made by the fine folks from pdflib.com ( http://www.pdflib.com/products/tet/ )?
AFAIR, the TET has some (limited) support for table detection as well....

Stopping doxygen searching for (and assuming) non-existent variables in source code

I'm using doxygen outside of its design, but well within its capability. I have a bunch of what are essentially text files, annotated with some doxygen tags, and I am successfully generating doxygen output. However, doxygen occasionally discovers what it assumes to be a variable and proceeds to document it using the surrounding text, which produces a lot of confusing documentation. I can't see any direct relationship between these anomalies, only that they reproduce the same output on each run, and that at least some of them are next to a ';' or a '='.
I only want doxygen to document what I've manually tagged. I am hoping to remove any occurrence of these anomalies, however I cannot alter existing text. I can only add doxygen tags, or alter the configuration file. Any ideas?
Many thanks.
In my particular case I do not need any automatically generated documentation, only what I have tagged with doxygen tags, so setting
EXCLUDE_SYMBOLS = *
removes every instance of doxygen "finding" and documenting variables. This may also prevent doxygen from finding class declarations, namespaces, or functions, but that is acceptable for me.