I am writing a PDF in which it is convenient for me to have a ToC that mixes numbered and unnumbered section headers, but I am struggling to find a way to do this automatically (e.g., just adding a * to the section name or something similar).
\section* removes the numbering, but it also removes the section from the ToC.
Probably quite a niche question, but I believe in the power of a big community: is it possible to set up jEdit so that it automatically inserts a comment character (//, #, ... depending on the edit mode) at the beginning of a new line if the line before the wrap was a comment?
Sample:
# This is a comment spanning multiple lines. If I continue to type here, it
# wraps around automatically, but I have to manually add a `#` to each line.
If I continue typing after the period, the third line should start with # automatically. I searched the plugin repository but could not find anything related.
Background: jEdit has the concept of soft and hard wrap. Soft wrap only breaks lines visually at a character limit and does not insert line breaks into the file. Hard wrap, on the other hand, inserts \n into the file at the desired character count.
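To make the desired behaviour concrete: a hard wrap that keeps a comment going has to repeat the comment marker on every line it creates. The following Java sketch wraps a comment's text at a given width and starts each resulting line with the marker. The class and method names are hypothetical; this is not jEdit's API or a jEdit macro, only an illustration of the logic.

```java
import java.util.ArrayList;
import java.util.List;

public class PrefixWrap {
    /**
     * Hard-wraps `text` so that no line exceeds `width` characters (unless a single
     * word is longer), starting every line with `prefix`, e.g. "# " or "// ".
     */
    public static List<String> wrapWithPrefix(String text, String prefix, int width) {
        List<String> lines = new ArrayList<>();
        int available = Math.max(1, width - prefix.length());
        StringBuilder current = new StringBuilder();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // only happens for blank input
            }
            // Start a new line if this word would overflow the current one.
            if (current.length() > 0 && current.length() + 1 + word.length() > available) {
                lines.add(prefix + current);
                current.setLength(0);
            }
            if (current.length() > 0) {
                current.append(' ');
            }
            current.append(word);
        }
        if (current.length() > 0) {
            lines.add(prefix + current);
        }
        return lines;
    }

    public static void main(String[] args) {
        String comment = "This is a comment spanning multiple lines. If I continue to type here, "
                + "it wraps around automatically.";
        for (String line : wrapWithPrefix(comment, "# ", 40)) {
            System.out.println(line);
        }
    }
}
```

Every returned line already starts with "# ", which is what the question asks jEdit to do while typing.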
This is not exactly what you want, but I use the macro Enter_with_Prefix.bsh to automatically insert the prefix (e.g., #, //) at the beginning of the new line.
Description copied from Enter_with_Prefix.bsh:
Enter_with_Prefix.bsh - a Beanshell macro for jEdit
that starts a new line continuing any recognized
sequence that started the previous. For example,
if the previous line begins with "1." the next will
be prefixed with "2.". It supports alpha lists (a., b., etc.),
bullet lists (+, =, *, etc.), comments, Javadocs,
Java import statements, e-mail replies (>, |, :),
and is easy to extend with new sequence types. Suggested
shortcut for this macro is S+ENTER (SHIFT+ENTER).
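The core idea of such a macro, looking at the previous line and deriving the prefix for the next one, can be sketched in Java. This is not the code of Enter_with_Prefix.bsh (which is a BeanShell macro); the class name and the set of recognized prefixes are assumptions for illustration, covering numbered lists, alpha lists, and repeated markers such as #, //, bullets and e-mail quotes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NextLinePrefix {
    // "1. ", "  12. ": numbered list item; indentation and spacing are kept.
    private static final Pattern NUMBERED = Pattern.compile("^(\\s*)(\\d{1,9})\\.(\\s+)");
    // "a. ": alpha list item (a to y, so a next letter always exists).
    private static final Pattern ALPHA = Pattern.compile("^(\\s*)([a-y])\\.(\\s+)");
    // Comment, bullet and e-mail reply markers that are simply repeated.
    private static final Pattern REPEATED = Pattern.compile("^(\\s*(?://|#|[-+*]|>)+\\s*)");
    // Leading indentation only.
    private static final Pattern INDENT = Pattern.compile("^\\s*");

    /** Returns the text that should start the line following `previous`. */
    public static String prefixForNextLine(String previous) {
        Matcher m = NUMBERED.matcher(previous);
        if (m.find()) {
            int next = Integer.parseInt(m.group(2)) + 1;
            return m.group(1) + next + "." + m.group(3);
        }
        m = ALPHA.matcher(previous);
        if (m.find()) {
            char next = (char) (m.group(2).charAt(0) + 1);
            return m.group(1) + next + "." + m.group(3);
        }
        m = REPEATED.matcher(previous);
        if (m.find()) {
            return m.group(1);
        }
        // No known sequence: keep just the indentation.
        m = INDENT.matcher(previous);
        return m.find() ? m.group() : "";
    }

    public static void main(String[] args) {
        System.out.println("[" + prefixForNextLine("# This is a comment") + "]");
        System.out.println("[" + prefixForNextLine("1. first item") + "]");
    }
}
```

Supporting another sequence type means adding another pattern and checking it before the catch-all marker rule.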
I am using docx4j to load a template and replace specific keywords inside it (load the template, marshal, then unmarshal).
My problem is that leading spaces in the text I insert into the template are ignored when I open the generated file.
I found some examples that suggest using xml:space="preserve", but since I generate my report from a template with the XML marshal/unmarshal method, I don't know if there is a way to add this property.
So, is there any way to make my reports recognize leading spaces?
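One possible approach, assuming the keyword replacement happens on the document XML string before it is unmarshalled, is to post-process that XML and add xml:space="preserve" to every w:t element whose text starts or ends with whitespace. The Java sketch below uses only the JDK's DOM API; the class name is hypothetical. If you work with docx4j objects instead, the org.docx4j.wml.Text class appears to expose the same attribute through setSpace("preserve"), but check the Javadoc of your docx4j version.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class PreserveSpaces {
    // WordprocessingML namespace of the w:t (text) elements in document.xml.
    public static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Adds xml:space="preserve" to every w:t whose text starts or ends with whitespace. */
    public static String preserveSpaces(String documentXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for getElementsByTagNameNS
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(documentXml)));

            NodeList texts = doc.getElementsByTagNameNS(W_NS, "t");
            for (int i = 0; i < texts.getLength(); i++) {
                Element t = (Element) texts.item(i);
                String value = t.getTextContent();
                if (!value.equals(value.strip())) {
                    t.setAttributeNS(XMLConstants.XML_NS_URI, "xml:space", "preserve");
                }
            }

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not process document XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<w:p xmlns:w=\"" + W_NS + "\"><w:r><w:t>   indented value</w:t></w:r></w:p>";
        System.out.println(preserveSpaces(xml));
    }
}
```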
I need to parse a CSV file containing blocks of text that are processed in different ways according to certain rules, e.g.:
userone,columnone,columntwo
userthirteen,columnone,columntwo
usertwenty,columnone,columntwo
customerone,columnone
customertwo,columntwo
singlevalueone
singlevaluetwo
singlevalueone_otherruleapplies
singlevaluethree_otherruleapplies
Rows are grouped into blocks, so the first three rows are parsed using one set of rules, and so on. Notice that the last two groups have only a single column, but each group must be handled differently.
I have the chance to propose the file format to the customer, so I'm thinking of proposing the following:
[group 1]
userone,columnone,columntwo
userthirteen,columnone,columntwo
usertwenty,columnone,columntwo
[group N]
rowN
That is, a kind of section layout like the INI files of some years ago. However, I'd like to hear your comments, because I think there must be a better way to handle this.
I proposed XML, but the customer prefers text files.
Any suggestions are welcome.
m0dest0.
P.S. I'm using VB.NET and VS 2008.
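The proposed INI-like layout is straightforward to parse line by line: a line of the form [name] starts a new group, and every following non-blank line belongs to that group until the next header. Here is a minimal Java sketch (the asker uses VB.NET, but the logic carries over directly; class and method names are hypothetical) that collects the rows of each group so each group's rules can then be applied separately.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SectionedCsv {
    /**
     * Splits the text into groups introduced by "[name]" header lines.
     * Each data row is split on commas; blank lines are ignored.
     * Groups keep the order in which they first appear.
     */
    public static Map<String, List<String[]>> parse(String text) {
        Map<String, List<String[]>> groups = new LinkedHashMap<>();
        List<String[]> current = null;
        for (String raw : text.split("\\R")) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue;
            }
            if (line.startsWith("[") && line.endsWith("]")) {
                String name = line.substring(1, line.length() - 1).trim();
                current = groups.computeIfAbsent(name, k -> new ArrayList<>());
            } else if (current == null) {
                throw new IllegalArgumentException("Row before any [group] header: " + line);
            } else {
                current.add(line.split(",", -1));
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        String input = String.join("\n",
                "[users]",
                "userone,columnone,columntwo",
                "userthirteen,columnone,columntwo",
                "",
                "[single values]",
                "singlevalueone",
                "singlevaluetwo");
        parse(input).forEach((group, rows) -> {
            // Each group name would select its own set of processing rules here.
            System.out.println(group + ": " + rows.size() + " row(s)");
        });
    }
}
```

Splitting on commas this way does not handle quoted fields that themselves contain commas; if the customer's data can contain them, the rows need a real CSV parser.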
You can use regular expression groups, applying the pattern line by line if every line has the same format, or to the whole text (with the appropriate RegexOptions value, e.g. RegexOptions.Multiline) if the format is not constrained to a single line. In the multi-line case you can include \n in your pattern to match across line boundaries. If the pattern fits on a single line, you don't need to include \n in your pattern at all. (Note that \n is the line feed; Windows line endings are \r\n, carriage return plus line feed.)
VB.NET, like many other modern programming languages, has extensive support for grouping: you can use numbered (index) groups or named groups.
A named group is written as (?<myname>...), where myname is any name you choose, such as header1.
See this link for more info: How do I access named capturing groups in a .NET Regex?.
Good luck.
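As an illustration of named groups, here is a Java sketch. Java writes a named group as (?<name>...), the same syntax .NET uses, and Pattern.MULTILINE plays the role of .NET's RegexOptions.Multiline (^ and $ match at each line boundary). The patterns and class name are assumptions based on the [group] header and three-column row layout from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NamedGroups {
    // A header line such as "[group 1]"; the text inside the brackets is captured as "name".
    static final Pattern HEADER = Pattern.compile("^\\[(?<name>[^\\]\\r\\n]+)\\]$", Pattern.MULTILINE);
    // A three-column row such as "userone,columnone,columntwo".
    static final Pattern ROW = Pattern.compile("^(?<user>[^,]+),(?<first>[^,]*),(?<second>[^,]*)$");

    /** Finds every header in a multi-line text; MULTILINE lets ^ and $ match at each line. */
    public static List<String> headerNames(String text) {
        List<String> names = new ArrayList<>();
        Matcher m = HEADER.matcher(text);
        while (m.find()) {
            names.add(m.group("name"));
        }
        return names;
    }

    /** Matches a single row and returns its fields by group name, or null if it does not match. */
    public static String[] rowFields(String line) {
        Matcher m = ROW.matcher(line.trim());
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group("user"), m.group("first"), m.group("second") };
    }

    public static void main(String[] args) {
        System.out.println(headerNames("[group 1]\nuserone,columnone,columntwo\n[group 2]\nrow"));
        System.out.println(String.join(" | ", rowFields("userone,columnone,columntwo")));
    }
}
```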
I am using PDFBox to extract text from PDF.
The PDF has a simple tabular structure, and the columns are spaced very widely apart.
This works really well, except that every kind of horizontal space gets converted into a single space character, so I can no longer tell the columns apart (the space between words within a column looks just like the space between columns).
I appreciate that a general solution is very hard, but in this case the columns are really far apart so that having a simple differentiation between "long spaces" and "space between words" would be enough.
Is there a way to tell PDFBox to turn horizontal whitespace of more than x inches into something other than a single space? A proportional approach (x inches become y spaces) would also work.
The pdftotext C library/tool has a '-layout' switch that tries to preserve the layout. Basically, if I can emulate that with PDFBox, that would be perfect.
There does not seem to be a setting for this, but I was able to modify the source of the PDFTextStripper tool to output a column separator (|) when a "long" space was encountered. In the code that builds the output line, you can look at the x positions of the current and previous characters and, if the gap is large enough, do something special. PDFTextStripper has lots of protected methods, but it turned out not to be all that extensible: I ended up having to copy the whole class to change a private method.
Looking at the code in there, I consider myself lucky that this simple approach worked for this particular PDF. A more general solution seems very tricky.
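The gap test described above can be sketched independently of PDFBox. The Glyph record below is a hypothetical stand-in for the per-character positions that PDFTextStripper works with (in PDFBox these come from TextPosition objects), and the two thresholds are assumptions you would tune for the specific PDF. Glyphs are assumed to be the visible characters of one line, in left-to-right order.

```java
import java.util.List;

public class GapSeparator {
    /** Hypothetical positioned character: x and width in PDF points. */
    public record Glyph(String text, float x, float width) {}

    /**
     * Joins one line of glyphs (left to right). A gap wider than columnGap becomes
     * the column separator; a gap wider than spaceGap becomes a single space.
     */
    public static String joinLine(List<Glyph> glyphs, float spaceGap, float columnGap, String separator) {
        StringBuilder sb = new StringBuilder();
        Glyph prev = null;
        for (Glyph g : glyphs) {
            if (prev != null) {
                // Distance from the right edge of the previous glyph to this one.
                float gap = g.x() - (prev.x() + prev.width());
                if (gap > columnGap) {
                    sb.append(separator);
                } else if (gap > spaceGap) {
                    sb.append(' ');
                }
            }
            sb.append(g.text());
            prev = g;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Glyph> line = List.of(
                new Glyph("a", 0f, 5f), new Glyph("b", 5f, 5f),
                new Glyph("c", 12f, 5f), new Glyph("d", 60f, 5f));
        System.out.println(joinLine(line, 1f, 20f, "|")); // ab c|d
    }
}
```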
PDF text extraction is difficult.
If the text was output as one big string separated by spaces, such as:
PDFTextOut(" Column 1 Column 2 Column 3");
and you are using a fixed-width font such as Courier, then you could theoretically calculate the number of spaces between items of text, because every character has the same width. If the font is proportional, such as Arial, the calculation is harder.
In reality, most PDFs are generated by placing each piece of text individually at its position. Technically, therefore, there is no space character or any other character between columns; the text is simply placed at an absolute position on the page.
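For a fixed-width font the calculation is a single division: the number of spaces is the gap width divided by the character width. In the standard Courier metrics every glyph is 600/1000 of the font size wide, so at 10 pt each character advances about 6 points. A minimal Java sketch with hypothetical names, which also gives the proportional "x inches become y spaces" mapping asked for in the question:

```java
public class FixedWidthSpaces {
    static final float POINTS_PER_INCH = 72f;

    /** Courier glyphs are 600/1000 of the font size wide (standard Courier metrics). */
    public static float courierCharWidth(float fontSize) {
        return 0.6f * fontSize;
    }

    /** Number of spaces that best fills a gap of `gapPoints`, rounded to the nearest whole space. */
    public static int spacesForGap(float gapPoints, float charWidth) {
        if (gapPoints <= 0f) {
            return 0;
        }
        return Math.round(gapPoints / charWidth);
    }

    /** Proportional variant: a gap measured in inches becomes a number of spaces. */
    public static int spacesForGapInches(float gapInches, float charWidth) {
        return spacesForGap(gapInches * POINTS_PER_INCH, charWidth);
    }

    public static void main(String[] args) {
        float width = courierCharWidth(10f); // about 6 points per character at 10 pt
        System.out.println(spacesForGap(30f, width));                        // 5
        System.out.println(spacesForGapInches(1f, courierCharWidth(12f)));  // 10
    }
}
```

For a proportional font such as Arial you would have to substitute an assumed average character width, which is exactly why the result becomes less reliable.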
PDFMoveTo(100,100);
PDFTextOut("Column 1");
PDFMoveTo(250,100);
PDFTextOut("Column 2");
In order to extract data from PDF documents, you have to do a bit more work: find and match column data using pixel locations, as you mentioned, make some assumptions, and have a little bit of luck.
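Following the PDFMoveTo example above, here is a minimal Java sketch of that extra work: fragments placed at absolute coordinates are grouped into rows by their y position and assigned to columns by comparing their x position with known column start positions. The Fragment record, the column starts and the y tolerance are all assumptions; in practice you would derive the column positions from the particular document.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class ColumnExtractor {
    /** Hypothetical text fragment placed at absolute page coordinates (PDF points, y grows upward). */
    public record Fragment(float x, float y, String text) {}

    /**
     * Groups fragments into rows (same y within yTolerance) and assigns each
     * fragment to a column by comparing its x with the ascending columnStarts.
     */
    public static List<String[]> toTable(List<Fragment> fragments, float[] columnStarts, float yTolerance) {
        List<Fragment> sorted = new ArrayList<>(fragments);
        // Top of the page first (higher y), then left to right.
        sorted.sort(Comparator.comparingDouble((Fragment f) -> -f.y()).thenComparingDouble(Fragment::x));

        List<String[]> rows = new ArrayList<>();
        String[] row = null;
        float rowY = 0f;
        for (Fragment f : sorted) {
            if (row == null || Math.abs(f.y() - rowY) > yTolerance) {
                row = new String[columnStarts.length];
                rows.add(row);
                rowY = f.y();
            }
            int col = columnFor(f.x(), columnStarts);
            row[col] = row[col] == null ? f.text() : row[col] + " " + f.text();
        }
        return rows;
    }

    /** Index of the last column whose start is <= x (column 0 if x is left of all starts). */
    static int columnFor(float x, float[] columnStarts) {
        int col = 0;
        for (int i = 0; i < columnStarts.length; i++) {
            if (x >= columnStarts[i]) {
                col = i;
            }
        }
        return col;
    }

    public static void main(String[] args) {
        List<Fragment> page = List.of(
                new Fragment(100f, 100f, "Column 1"), new Fragment(250f, 100f, "Column 2"),
                new Fragment(100f, 80f, "a"), new Fragment(250f, 80f, "b"));
        for (String[] r : toTable(page, new float[] {90f, 240f}, 2f)) {
            System.out.println(Arrays.toString(r));
        }
    }
}
```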
I'm using the hyperref package in my document. Among other things, it creates bookmarks in my PDF based on the table of contents. Some section titles contain a citation:
\section{Some title \citep{BibTeXkey}}
The label of the bookmark then looks like
Some title BibTeXkey
But I would like it to be
Some title (Author, year)
just as it is displayed in the text and the table of contents. Only the bookmarks are messed up.
I used the sequence pdflatex, bibtex, pdflatex, pdflatex to compile the document.
How do I change the bookmark label to use the same format as in the table of contents?
Whenever I have an issue with PDF bookmarks not working properly, the solution is usually \texorpdfstring. It lets a section title contain some non-text material (like a link or some symbols) while specifying what should appear in the PDF bookmark, which cannot contain symbols. The input
\section{The section with \texorpdfstring{LaTeX symbols}{plain text version}}
produces the section title "The section with LaTeX symbols", but the pdf bookmark for the section is "The section with plain text version".
In your case, the easiest thing to do is probably
\section{Some title \texorpdfstring{\citep{BibTeXkey}}{(Author, year)}}
Unfortunately, this means that you have to type "(Author, year)" in by hand, which is a little annoying, but not a big deal if your bibliography entry doesn't change (which it probably shouldn't) and you don't change your citation conventions.
If you really want to avoid typing "(Author, year)" by hand, you can try using the \show command to figure out how \citep produces its output. But I warn you that this approach is not for the faint of heart: in this case, I think you'll end up digging through the aux file, not to mention the blg, brf, and bbl files.