I'd like to embed a block of code like this:
foreach my $n (1..10) {
print "$n\n";
}
in a groff 'man' document and not have it reformat it into a single line. I'd like to suppress all formatting, and not have groff recognize and process any special characters. This would be like the <pre> and </pre> tags in HTML. Is this possible to do in groff? Thanks.
You can turn off fill-mode with the .nf request, and turn off escape processing with .eo; they are correspondingly restored with .fi and .ec. groff will process these correctly, but it is possible that some alternate man-page formatters will not.
Example:
.TH "Manpage Test" SO
.SH NAME
Manpage test \- test manpage formatting
.SH DETAILS
Here is the example.
.RS
.nf
.eo
foreach my $n (1..10) {
print "$n\n";
}
.ec
.fi
.RE
.P
And there you have the example.
The code snippet should be rendered as "pre-formatted", and
following text should be back to filled/adjusted.
.SH AUTHOR
Evil Otto
Renders as:
Manpage Test(SO) Manpage Test(SO)
NAME
Manpage test - test manpage formatting
DETAILS
Here is the example.
foreach my $n (1..10) {
print "$n\n";
}
And there you have the example. The code snippet should be rendered as
"pre-formatted", and following text should be back to filled/adjusted.
AUTHOR
Evil Otto
Manpage Test(SO)
Few now remember that originally man pages were typeset for printing and not only terminal viewing. It's still possible by outputting postscript (man -Tps) or better dvi intermediate:
man -Tdvi ... > /tmp/man.dvi && dvipdfmx /tmp/man.dvi -o /tmp/man.pdf && xdg-open /tmp/man.pdf
and it's a somewhat good test for "semantic correctness" of formatting that otherwise looks same on terminal.
evil otto's answer is a good start but keeps the regular variable-width font:
You probably want a monospaced Courier font:
.RS
.ft CR
.nf
.eo
foreach my $n (1..10) {
print "$n\n";
}
.ec
.fi
.ft R
.RE
which can be shortened with .EX...EE macro for typesetting examples (plus, it saves and restores previous font family which is cleaner than hardcoding .ft R):
.RS
.EX
.eo
foreach my $n (1..10) {
print "$n\n";
}
.ec
.EE
.RE
.EX is an "extension", no idea what that means for its portability 🤷
groff documentation claims .EX takes an optional indent parameter (which could replace the .RS....RE) but it didn't work for me (and I don't see any such logic in the implementation).
Related
My workflow includes making figures in Inkscape, which are then converted to PDF and included into LaTeX documents. In these figures, I often have to include mathematical formulas. For that, I use TexText. For font consistency and simplicity, when I want to add some plain text to my figure, I also use TexText. When the resulting SVG is converted to PDF, the TexText-generated text is not searchable.
How can I make a PDF from the SVG such that it is searchable while remaining a vector PDF?
I know I could rasterize the figure and then use e.g. Tesseract to create a searchable PDF. But the resulting PDF will of course contain a rasterized version of my figure. I would like the figure itself to remain vector graphics.
I am guessing there has to be a way that would go something like this: indeed rasterize the PDF and use Tesseract to extract the text. But then take the output of Tesseract and somehow add it to the original vector PDF. Unfortunately, I don't know how to do this.
It turns out a question that is directly relevant to mine was answered on another StackExchange, here. The script that answers my actual question is svgToSearchablePDF.sh, and I post it below. It uses, as the key element, the script pdf-merge-text.sh from the accepted answer to that other question. For completeness, I will repost pdf-merge-text.sh in this answer.
The solution
Note that perhaps you will need to magnify the SVG file before converting it to a searchable PDF: larger image sizes help the OCR process. To magnify, in Inkscape, select the entire image, then go to Object -> Transform… . In the Transform tab, select Scale. Then select 'Scale proportionally', and finally, in either 'Width' or 'Height', enter something like '300' (make sure % is selected in the menu immediately to the right of the 'Width' field). Next, File->Document Properties…. In the 'Document Properties' tab, under Custom size, click on Resize page to content. Make sure that either nothing is selected or else that the entire image is selected. Then click the button 'Resize page to drawing or selection'. Save the SVG.
The script svgToSearchablePDF.sh uses an SVG file as input and produces a searchable vector PDF file as output. It is assumed that all of the following are installed: Tesseract, Inkscape, and Ghostscript.
For example, assume that we used Inkscape to create the file mygraphics.svg. Then the following command will produce a searchable PDF file mygraphics.pdf:
svgToSearchablePDF.sh mygraphics.svg
The scripts
First, svgToSearchablePDF.sh:
#!/bin/bash
filename="$1"
inkscape ${filename%.*}.svg -o ${filename%.*}_auxfile.png
inkscape ${filename%.*}.svg -o ${filename%.*}_auxfileVCT.pdf
tesseract ${filename%.*}_auxfile.png ${filename%.*}_auxfileTXT -l eng pdf
pdf-merge-text.sh ${filename%.*}_auxfileTXT.pdf ${filename%.*}_auxfileVCT.pdf ${filename%.*}.pdf
rm -f ${filename%.*}_auxfile.png ${filename%.*}_auxfileVCT.pdf ${filename%.*}_auxfileTXT.pdf
As I said, that script uses the script pdf-merge-text.sh from here. For completeness, here it is:
#!/usr/bin/env bash
set -eu
pdf_merge_text() {
local txtpdf; txtpdf="$1"
local imgpdf; imgpdf="$2"
local outpdf; outpdf="${3--}"
if [ "-" != "${txtpdf}" ] && [ ! -f "${txtpdf}" ]; then echo "error: text PDF does not exist: ${txtpdf}" 1>&2; return 1; fi
if [ "-" != "${imgpdf}" ] && [ ! -f "${imgpdf}" ]; then echo "error: image PDF does not exist: ${imgpdf}" 1>&2; return 1; fi
if [ "-" != "${outpdf}" ] && [ -e "${outpdf}" ]; then echo "error: not overwriting existing output file: ${outpdf}" 1>&2; return 1; fi
(
local txtonlypdf; txtonlypdf="$(TMPDIR=. mktemp --suffix=.pdf)"
trap "rm -f -- '${txtonlypdf//'/'\\''}'" EXIT
gs -o "${txtonlypdf}" -sDEVICE=pdfwrite -dFILTERIMAGE "${txtpdf}"
pdftk "${txtonlypdf}" multistamp "${imgpdf}" output "${outpdf}"
)
}
pdf_merge_text "$#"
I have started using Markdown to write my Latex PDFs, and so far I am impressed by the amount of boilerplate it takes away.
However, I find Markdown not as expressive as Tex, and therefore in some situations would like to write the document in Markdown, convert to tex, then add some Latex-only stuff and only then convert to PDF.
However, converting .md to .tex with Pandoc does not yield an compilable file: it only contains the body of the file, not the "document setup".
Example, the following .md file:
```haskell
data Expr = I Int
```
Converts to:
\begin{Shaded}
\begin{Highlighting}[]
\KeywordTok{data} \DataTypeTok{Expr} \FunctionTok{=} \DataTypeTok{I} \DataTypeTok{Int}
\end{Highlighting}
\end{Shaded}
Obviously this is missing some stuff like the document class, start of document and the imported packages. Is there any way to generate this complete file instead of just the body? Or if not, can anyone at least tell me what package the Shaded, Highlighting, KeywordTok, DataTypeTok and FunctionTok commands are pulled from? Then I can add these imports myself.
Pandoc creates small snippets by default. Invoke it with the --standalone (or -s) command line flag to get a full document.
I have a latex file which needed to include snippets of Lua code (for display, not execution), so I used the minted package. It requires latex to be run with the latex -shell-escape flag.
I am trying to upload a PDF submission to arXiv. The site requires these to be submitted as .tex, .sty and .bbl, which they will automatically compile to PDF from latex. When I tried to submit to arXiv, I learned that there was no way for them to activate the -shell-escape flag.
So I was wondering if any of you knew a way to highlight Lua code in latex without the -shell-escape flag. I tried the listings package, but I can't get it to work for Lua on my Ubuntu computer.
You can set whichever style you want inline using listings. It's predefined Lua language has all the keywords and associated styles identified, so you can just change it to suit your needs:
\documentclass{article}
\usepackage{listings,xcolor}
\lstdefinestyle{lua}{
language=[5.1]Lua,
basicstyle=\ttfamily,
keywordstyle=\color{magenta},
stringstyle=\color{blue},
commentstyle=\color{black!50}
}
\begin{document}
\begin{lstlisting}[style=lua]
-- defines a factorial function
function fact (n)
if n == 0 then
return 1
else
return n * fact(n-1)
end
end
print("enter a number:")
a = io.read("*number") -- read a number
print(fact(a))
\end{lstlisting}
\end{document}
Okay so lhf found a good solution by suggesting the GNU source-hightlight package. I basically took out each snippet of lua code from the latex file, put it into an appropriately named [snippet].lua file and ran the following on it to generate a [snippet]-lua.tex :
source-highlight -s lua -f latex -i [snippet].lua -o [snippet]-lua.tex
And then I included each such file into the main latex file using :
\input{[snippet]-lua}
The result really isn't as nice as that of the minted package, but I am tired of trying to convince the arXiv admin to support minted...
Currently, to represent a newline in go programs, I use \n. For example:
package main
import "fmt"
func main() {
fmt.Printf("%d is %s \n", 'U', string(85))
}
... will yield 85 is U followed by a newline.
However, this doesn't seem all that cross-platform. Looking at other languages, PHP represents this with a global constant ( PHP_EOL ). Is \n the right way to represent newlines in a cross-platform specific manner in go / golang?
I got curious about this so decided to see what exactly is done by fmt.Println. http://golang.org/src/pkg/fmt/print.go
If you scroll to the very bottom, you'll see an if addnewline where \n is always used. I can't hardly speak for if this is the most "cross-platform" way of doing it, and go was originally tied to linux in the early days, but that's where it is for the std lib.
I was originally going to suggest just using fmt.Fprintln and this might still be valid as if the current functionality isn't appropriate, a bug could be filed and then the code would simply need to be compiled with the latest Go toolchain.
You can always use an OS specific file to declare certain constants. Just like _test.go files are only used when doing go test, the _[os].go are only included when building to that target platform.
Basically you'll need to add the following files:
- main.go
- main_darwin.go // Mac OSX
- main_windows.go // Windows
- main_linux.go // Linux
You can declare a LineBreak constant in each of the main_[os].go files and have your logic in main.go.
The contents of you files would look something like this:
main_darwin.go
package somepkg
const LineBreak = "\n"
main_linux.go
package somepkg
const LineBreak = "\n"
main_windows.go
package somepkg
const LineBreak = "\r\n"
and simply in your main.go file, write the code and refer to LineBreak
main.go
package main
import "fmt"
func main() {
fmt.Printf("%d is %s %s", 'U', string(85), LineBreak)
}
Having the OS determine what the newline character is happens in many contexts to be wrong. What you really want to know is what the "record" separator is and Go assumes that you as the programmer should know that.
Even if the binary runs on Windows, it may be consuming a file from a Unix OS.
Line endings are determined by what the source of the file or document said was a line ending, not the OS the binary is running in.
You can use os.PathSeparator
func main() {
var PS = fmt.Sprintf("%v", os.PathSeparator)
var LineBreak = "\n"
if PS != "/" {
LineBreak = "\r\n"
}
fmt.Printf("Line Break %v", LineBreak)
}
https://play.golang.com/p/UTnBbTJyL9c
I want to convert PDF to SVG please suggest some libraries/executable that will be able to do this efficiently. I have written my own java program using the apache PDFBox and Batik libraries -
PDDocument document = PDDocument.load( pdfFile );
DOMImplementation domImpl =
GenericDOMImplementation.getDOMImplementation();
// Create an instance of org.w3c.dom.Document.
String svgNS = "http://www.w3.org/2000/svg";
Document svgDocument = domImpl.createDocument(svgNS, "svg", null);
SVGGeneratorContext ctx = SVGGeneratorContext.createDefault(svgDocument);
ctx.setEmbeddedFontsOn(true);
// Ask the test to render into the SVG Graphics2D implementation.
for(int i = 0 ; i < document.getNumberOfPages() ; i++){
String svgFName = svgDir+"page"+i+".svg";
(new File(svgFName)).createNewFile();
// Create an instance of the SVG Generator.
SVGGraphics2D svgGenerator = new SVGGraphics2D(ctx,false);
Printable page = document.getPrintable(i);
page.print(svgGenerator, document.getPageFormat(i), i);
svgGenerator.stream(svgFName);
}
This solution works great but the size of the resulting svg files in huge.(many times greater than the pdf). I have figured out where the problem is by looking at the svg in a text editor. it encloses every character in the original document in its own block even if the font properties of the characters is the same. For example the word hello will appear as 6 different text blocks. Is there a way to fix the above code? or please suggest another solution that will work more efficiently.
Inkscape can also be used to convert PDF to SVG. It's actually remarkably good at this, and although the code that it generates is a bit bloated, at the very least, it doesn't seem to have the particular issue that you are encountering in your program. I think it would be challenging to integrate it directly into Java, but inkscape provides a convenient command-line interface to this functionality, so probably the easiest way to access it would be via a system call.
To use Inkscape's command-line interface to convert a PDF to an SVG, use:
inkscape -l out.svg in.pdf
Which you can then probably call using:
Runtime.getRuntime().exec("inkscape -l out.svg in.pdf")
http://download.oracle.com/javase/1.4.2/docs/api/java/lang/Runtime.html#exec%28java.lang.String%29
I think exec() is synchronous and only returns after the process completes (although I'm not 100% sure on that), so you shoudl be able to just read "out.svg" after that. In any case, Googling "java system call" will yield more info on how to do that part correctly.
Take a look at pdf2svg (also on on github):
To use
pdf2svg <input.pdf> <output.svg> [<pdf page no. or "all" >]
When using all give a filename with %d in it (which will be replaced by the page number).
pdf2svg input.pdf output_page%d.svg all
And for some troubleshooting see:
http://www.calcmaster.net/personal_projects/pdf2svg/
pdftocairo can be used to convert pdf to svg. pdfcairo is part of poppler-utils.
For example to convert 2nd page of a pdf, following command can be run.
pdftocairo -svg -f 1 -l 1 input.pdf
pdftk 82page.pdf burst
sh to-svg.sh
contents of to-svg.sh
#!/bin/bash
FILES=burst/*
for f in $FILES
do
inkscape -l "$f.svg" "$f"
done
I have encountered issues with the suggested inkscape, pdf2svg, pdftocairo, as well as the not suggested convert and mutool when trying to convert large and complex PDFs such as some of the topographical maps from the USGS. Sometimes they would crash, other times they would produce massively inflated files. The only PDF to SVG conversion tool that was able to handle all of them correctly for my use case was dvisvgm. Using it is very simple:
dvisvgm --pdf --output=file.svg file.pdf
It has various extra options for handling how elements are converted, as well as for optimization. Its resulting files can further be compacted by svgcleaner if necessary without perceptual quality loss.
inkscape (#jbeard4) for me produced svgs with no text in them at all, but I was able to make it work by going to postscript as an intermediary using ghostscript.
for page in $(seq 1 `pdfinfo $1.pdf | awk '/^Pages:/ {print $2}'`)
do
pdf2ps -dFirstPage=$page -dLastPage=$page -dNoOutputFonts $1.pdf $1_$page.ps
inkscape -z -l $1_$page.svg $1_$page.ps
rm $1_$page.ps
done
However this is a bit cumbersome, and the winner for ease of use has to go to pdf2svg (#Koen.) since it has that all flag so you don't need to loop.
However, pdf2svg isn't available on CentOS 8, and to install it you need to do the following:
git clone https://github.com/dawbarton/pdf2svg.git && cd pdf2svg
#if you dont have development stuff specific to this project
sudo dnf config-manager --set-enabled powertools
sudo dnf install cairo-devel poppler-glib-devel
#git repo isn't quite ready to ./configure
touch README
autoreconf -f -i
./configure && make && sudo make install
It produces svgs that actually look nicer than the ghostscript-inkscape one above, the font seems to raster better.
pdf2svg $1.pdf $1_%d.svg all
But that installation is a bit much, too much even if you don't have sudo. On top of that, pdf2svg doesn't support stdin/stdout, so the readily available pdftocairo (#SuperNova) worked a treat in these regards, and here's an example of "advanced" use below:
for page in $(seq 1 `pdfinfo $1.pdf | awk '/^Pages:/ {print $2}'`)
do
pdftocairo -svg -f $page -l $page $1.pdf - | gzip -9 >$1_$page.svg.gz
done
Which produces files of the same quality and size (before compression) as pdf2svg, although not binary-identical (and even visually, jumping between output of the two some pixels of letters shift, but neither looks wrong/bad like inkscape did).
Inkscape does not work with the -l option any more. It said "Can't open file: /out.svg (doesn't exist)". The long form that option is in the man page as --export-plain-svg and works but shows a deprecation warning. I was able to fix and update the command by using the -o option on Inkscape 1.1.2-3ubuntu4:
inkscape in.pdf -o out.svg