How to remove a single output cell only in Jupyter notebook using nbconvert - pdf

Sorry if this is a silly question, but it really confuses me.
I want to use nbconvert in Jupyter Notebook to export PDF and HTML files for sharing. I must keep the output cells (mostly matplotlib subplots) and hide the input cells (code). However, I cannot hide the output of the cell that runs the nbconvert command itself.
For example, if I run:
!jupyter nbconvert --to pdf --no-input grmd4203.ipynb
the following output appears:
[NbConvertApp] WARNING | Config option `dpi` not recognized by `PDFExporter`.
[NbConvertApp] Converting notebook grmd4203.ipynb to pdf
[NbConvertApp] Support files will be in grmd4203_files\
[NbConvertApp] Making directory .\grmd4203_files
[NbConvertApp] Making directory .\grmd4203_files
[NbConvertApp] Making directory .\grmd4203_files
[NbConvertApp] Making directory .\grmd4203_files
[NbConvertApp] Writing 25260 bytes to notebook.tex
[NbConvertApp] Building PDF
[NbConvertApp] Running xelatex 1 time: ['xelatex', 'notebook.tex', '-quiet']
[NbConvertApp] Running bibtex 1 time: ['bibtex', 'notebook']
[NbConvertApp] WARNING | b had problems, most likely because there were no citations
[NbConvertApp] PDF successfully created
[NbConvertApp] Writing 922426 bytes to grmd4203.pdf
at the end of the page.
Can I hide this part while keeping my graphs visible? I can delete it manually from the PDF file, but I cannot remove it from the generated HTML file. That's really sad.

Related

PNG attachment downloads corrupted on some systems

I am trying to serve a PNG file as an attachment. This works fine on my CentOS 8 development machine, but when I deploy it to a RedHat 7 machine the downloaded file contains extra bytes. For example, viewing the PNG file in an emacs buffer, the original file shows:
\211PNG^M
but the downloaded file shows
\302\211PNG^M
and there are \302 entries throughout the downloaded file.
Again, this corruption occurs only on the RedHat 7 machines.
I checked the file's byte count in the server process and it has the correct value, so it appears these \302 entries are being added by the server.
The server process is a Perl script, and I'm using a regular print statement to output the image file contents.
The UTF-8 encoding of Code Point 0211 (0x89) is 0302 0211 (0xC2 89). You are encoding the image using UTF-8 for some reason. Don't :)
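To see the effect described above, here is a minimal sketch in Python (used only for illustration; the server in the question is a Perl script). It assumes the corruption comes from each byte being treated as a character and then encoded as UTF-8 on output, which turns the byte 0x89 (octal 211) into the two bytes 0xC2 0x89 (octal 302 211).

```python
# First bytes of the PNG signature: 0x89 'P' 'N' 'G' '\r'
png_header = b"\x89PNG\r"

# Treat each byte as a code point (Latin-1), then encode as UTF-8.
# Every byte >= 0x80 becomes a two-byte sequence; 0x89 becomes 0xC2 0x89.
corrupted = png_header.decode("latin-1").encode("utf-8")

print(png_header)  # b'\x89PNG\r'
print(corrupted)   # b'\xc2\x89PNG\r'
```

If this is indeed the cause, writing the image through a handle in binary mode should avoid it; in Perl, binmode on the output handle is the usual way to do that.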

Matplotlib's LaTeX run directory

Where is the LaTeX run directory for matplotlib? Are the LaTeX log files kept at all?
I use the pdflatex backend to generate a ".pgf" plot that I can insert into my LaTeX document. Unfortunately, the Python traceback shows only a small part of the log file, which is not enough to solve the issue, so I would like to look at the full log. The traceback tells me the following:
! Dimension too large.
<to be read again>
\relax
l.995 \Gm@process
! ==> Fatal error occurred, no output PDF file produced!
Transcript written on figure.log.
and the following error:
shell returned 1
Since "\Gm" appears nowhere in my Python code, I need to look at the .tex file and the log to figure out what is going on. I've searched my system for the file "figure.log", but it does not exist.

Running tesseract 4.1 with openjpeg2 - cannot produce pdf output

I have installed on my RedHat machine:
(py36_maw) [rvp@lib-archcoll box]$ tesseract -v
tesseract 4.1.0
leptonica-1.78.0
libjpeg 6b (libjpeg-turbo 1.2.90) : libpng 1.5.13 : libtiff 4.0.3 : zlib 1.2.7 : libopenjp2 2.3.1
Found SSE
Following what docs I could find, I tried this to produce PDF output:
(py36_maw) [rvp@lib-archcoll box]$ time tesseract test.jp2 out -l eng PDF
read_params_file: Can't open PDF
Tesseract Open Source OCR Engine v4.1.0 with Leptonica
Warning: Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 275
That takes 10 seconds and produces the file out.txt, with a good OCR-to-text conversion.
However, it tries to read a file called PDF, and I cannot figure out how to get PDF output.
I have read various docs. The most promising advice seems to be to edit the config file, but the only docs I could find (by googling 'tesseract 4.1 config') list many config variable names for older versions of tesseract, and none of them seems to indicate that I can request PDF output, much less specifically for tesseract 4.1.
How can I invoke tesseract 4.1 (using libopenjp2 2.3.1) via the CLI to produce PDF output from my jp2 input file? Bonus question: how can I get it to produce both txt and PDF output in one run?
Robert
After more surfing and digging, here are the steps that worked for me. I assume the reader has also done some digging and knows what tesseract uses TESSDATA_PREFIX for.
Download the pdf.ttf file from: https://github.com/tesseract-ocr/tesseract/blob/master/tessdata/pdf.ttf
Copy pdf.ttf to your directory $TESSDATA_PREFIX and make sure that variable is exported to your shell.
TIP: Use command: tesseract --print-parameters # to discover defined variable names you can use in your own config file
Go to the directory containing the test.jp2 file and create a file named config with these lines:
tessedit_create_pdf 1 Write .pdf output file
tessedit_create_txt 1 Write .txt output file
(Note: or you may be able to put the config file in the TESSDATA_PREFIX directory as well and let it always be the default. Not tested.)
Run in that dir:
$ tesseract test.jp2 outputbase -l eng config
Verify your success: it runs and produces files outputbase.txt and outputbase.pdf. The txt file looks good and the searchable pdf looks and works OK in a pdf viewer, that is, you can search and find text strings.
Hope this helps someone else!
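The steps above can be sketched as a small Python script. This is only an illustration under a few assumptions: tesseract is on the PATH, pdf.ttf is already in $TESSDATA_PREFIX, and the config file contains just the two parameter names and values (the trailing descriptions are left out). The helper names build_ocr_command and run_ocr are hypothetical.

```python
import subprocess
from pathlib import Path

# Config lines from the steps above: request both .pdf and .txt output.
CONFIG_TEXT = "tessedit_create_pdf 1\ntessedit_create_txt 1\n"


def build_ocr_command(image, output_base, config_name="config", lang="eng"):
    """Return the tesseract command line as a list of arguments."""
    return ["tesseract", str(image), str(output_base), "-l", lang, config_name]


def run_ocr(image, output_base, workdir="."):
    """Write the config file into workdir, then run tesseract from there."""
    (Path(workdir) / "config").write_text(CONFIG_TEXT)
    cmd = build_ocr_command(image, output_base)
    # check=True raises CalledProcessError if tesseract exits non-zero.
    subprocess.run(cmd, check=True, cwd=workdir)
    return cmd
```

Calling run_ocr("test.jp2", "outputbase") from the directory that holds test.jp2 corresponds to the command line shown in the steps and should leave outputbase.txt and outputbase.pdf next to the image.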

Tesseract searchable pdf creation doesn't work

I'm running Tesseract 4.0.0 and I tried the following command to create a searchable PDF, but it doesn't seem to work:
tesseract input output pdf
It gives an error :
can't open file "\Program Files\...//pdf.ttf"!
error during processing
The PDF file gets created but it cannot be opened.
I tried it with different image formats (jpg, tif, png) with no success.
It does work. I'm not sure which OS you are using, but I found that on Linux a full install was necessary to make it work:
sudo apt install tesseract-ocr
sudo apt install tesseract-ocr-all
then, for a German document for example, originally a multipage tif:
tesseract multipage-tiff.tif out pdf -l deu
The manual is useful: https://github.com/tesseract-ocr/tesseract/wiki

error Converting PDF to PNG - Python 3.6 and GhostScript

I am having a lot of trouble getting code to convert a PDF file to PNG on Python 3.6, Windows 10.
I know what you are going to say: google it!
But nearly everything I've found was for Python 2.7, and some packages haven't been updated.
From what I've seen so far, the best way to do it is with Wand, right? (I installed ImageMagick beforehand.)
from wand.image import Image
# Converting first page into JPG
with Image(filename='0.pdf') as img:
    img.save(filename="/temp.jpg")
# Resizing this image
Here was my second error:
wand.exceptions.DelegateError: PDFDelegateFailed
`The system cannot find the file specified.' @ error/pdf.c/ReadPDFImage/809
So I read that I need Ghostscript. I installed the Python package, but it is for Python 2.7 and it doesn't work. Then I found python3-ghostscript 0.5.0: https://pypi.python.org/pypi/python3-ghostscript/0.5.0
New error :
RuntimeError: Can not find Ghostscript DLL in registry
So I needed to install Ghostscript 9:
https://www.ghostscript.com/download/gsdnld.html
First of all, it's not under a GPL license. It's also not a package but a program, and I don't know how I can use it in my future Python code.
And there is still an error:
RuntimeError: Can not find Ghostscript DLL in registry
I can't find anything about it.
Ghostscript is licensed under the AGPL; the licence can be found in /Program Files (x86)/gs/gs9.21/doc. If you want the sources, they are available from the Ghostscript Git repository. Note that I'm assuming you are running on Windows, since you refer to the Registry.
If you install the prebuilt binary, it will create an entry in the Windows Registry. I assume that's what your Python code is looking for, but I can't be sure. Make sure you install the word size (32 or 64 bit) that Python requires, if it cares.
You can, of course, simply run Ghostscript to render a PDF file and produce PNG output:
gswin32c -sDEVICE=png16m -sOutputFile=out%d.png input.pdf
This will create one file per page of the input PDF file. Use gswin64c for the 64-bit version.
You can alter the resolution of the output with the -r switch, e.g. -r300.
I presume you can simply fork a process from Python. Otherwise you'll have to get someone to tell you what the Python script is looking for in the Registry. Perhaps it's looking for a specific version of Ghostscript, or the 32-bit version, or something.
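Running Ghostscript as a child process from Python could look roughly like the sketch below. It assumes the Ghostscript executable (gswin64c here) is on the PATH. The helper names build_gs_command and pdf_to_png are hypothetical, and the -dNOPAUSE and -dBATCH switches are additions not in the command above; they stop Ghostscript from waiting for input between pages and at the end of the job.

```python
import subprocess


def build_gs_command(pdf_path, out_pattern="out%d.png", resolution=300,
                     gs_exe="gswin64c"):
    """Build a Ghostscript command that renders each PDF page to a PNG."""
    return [
        gs_exe,
        "-dNOPAUSE",  # do not pause after each page
        "-dBATCH",    # exit once the input file has been processed
        "-sDEVICE=png16m",
        f"-r{resolution}",
        f"-sOutputFile={out_pattern}",
        str(pdf_path),
    ]


def pdf_to_png(pdf_path, **kwargs):
    """Run Ghostscript; raises CalledProcessError on a non-zero exit."""
    cmd = build_gs_command(pdf_path, **kwargs)
    subprocess.run(cmd, check=True)
    return cmd
```

Calling pdf_to_png("input.pdf") would write out1.png, out2.png, and so on, one per page. Because the command is passed as a list rather than through a shell, the %d in the output pattern needs no escaping.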