How can I extract music notation from a PDF?

I am trying to render a PDF using a different embedded font for the musical notation, and I don't know how to do it.
I am trying to rasterize a music staff in a PDF, and I realized that the PDF renders the musical notation badly because of the embedded font. So I want to try another font.
I can extract the text with Ghostscript, or convert the PDF to PS and edit the .ps, but I believe that if I could understand how to re-render the PDF...
gs -dBATCH -dNOPAUSE -sDEVICE=txtwrite -sOutputFile=Betlem_pdf.txt Betlem.pdf
iconv.exe -f MACROMAN -t UTF-8 Betlem_pdf.txt > Betlem_pdf_txt_utf8.txt
enscript.exe -f Petrucci40 Betlem_pdf_txt_utf8.txt -o Betlem_pdf_txt_utf8.ps
gs -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -sOutputFile=Betlem_2.pdf Betlem_pdf_txt_utf8.ps
The expected result is the same staff as in the original PDF, but with another font. But I don't know what I am doing...
http://www.xn--estudiantladolaina-lvb.com/partitures/baixa/pdf/26

The PDF you linked does not use fonts for the music notes. The notes are PDF shape/image objects, so swapping the font will not change them. The best you can do here is run OCR (optical character recognition) on the PDF and hope that the OCR tool supports music notes.
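One way to check this for yourself is to list what the PDF actually embeds. The sketch below assumes the Poppler command-line tools (pdffonts and pdfimages) are installed; the function name inspect_pdf is only illustrative. If no music font appears in the first listing, the notes are not text. Note that pdfimages only reports raster images, so notes drawn as vector shapes will not show up in either listing.

```shell
# List the fonts and raster images embedded in a PDF.
# Assumes Poppler's pdffonts and pdfimages are on PATH.
# inspect_pdf is a hypothetical helper name, not part of any tool.
inspect_pdf() {
    printf '%s\n' "== Fonts in $1 =="
    pdffonts "$1"
    printf '%s\n' "== Raster images in $1 =="
    pdfimages -list "$1"
}

# Usage:
#   inspect_pdf Betlem.pdf
```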

Related

Converting pdf to eps without rasterizing or changing fonts

I have been trying to convert a pdf vector graphic to eps. I tried two commands from the following answer: https://stackoverflow.com/a/44737018/5661667
The inkscape command inkscape input.pdf --export-eps=output.eps or rather, since --export-eps is deprecated now,
inkscape input.pdf --export-filename=output.eps
nicely converts to a vectorized eps. However, it strangely converts my Times New Roman fonts (the graphic was originally created using matplotlib) to some sans serif font (looks like Arial or something).
The ghostscript version of the conversion from the linked answer
gs -q -dNOCACHE -dNOPAUSE -dBATCH -dSAFER -sDEVICE=eps2write -sOutputFile=output.eps input.pdf
keeps my fonts nicely. However, the eps seems to be rasterized despite the -dNOCACHE option.
Is there any way to get one of these to just convert my pdf to eps without modifying it?
Further info: I am using Mac OS. For the first part, my suspicion is that I only have an Arial Unicode.ttf installed in /Library/Fonts/. I tried installing some other fonts, but that did not fix the conversion.
I had the same problem when trying to convert a powerpoint generated pdf to eps format using inkscape.
After trying with gs and disabling the transparency I noticed some areas turned black after eps conversion.
gs -q -dNOCACHE -dNOPAUSE -dBATCH -dSAFER -dNOTRANSPARENCY -sDEVICE=eps2write -sOutputFile=output.eps input.pdf
Coming back to inkscape I noticed that Powerpoint added some transparent objects in these areas that turned black. So I manually removed them using inkscape and when converting to eps again the result was perfect!
In short: if there are transparent elements in your pdf, the fonts will probably be rasterized during eps conversion. So, you need to remove these elements.
Maybe there is an easier way to identify them in inkscape.
In my case I was able to use Find/Replace (Ctrl+F) to search objects with string "clipPath" and with 'Search option = Properties'. Then I open the Objects Tab (Menu Object->Objects...) and use that to delete each transparent object generated by Powerpoint.

Convert pdf to pdfx maintaining vector graphics / without rasterisation (e.g. using ghostscript)

I would like to save a pdf to a pdf/x.
The pdf contains a vector graphic.
When I convert it using ghostscript v9.53.3 on windows 10 and using...
gswin64c -dPDFX -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -sColorConversionStrategy=CMYK -sProcessColorModel=DeviceCMYK -sOutputFile=document-pdfx.pdf document.pdf
...graphics and text are rasterized.
What can I do to prevent this?
If I do the same using Adobe Acrobat DC Pro my graphics remain as vector graphics.
I could not really find anything helpful at https://www.ghostscript.com/doc/current/VectorDevices.htm

Ghostscript - Convert vector pdf to the raster pdf

I would like to convert a vector PDF to a raster PDF using Ghostscript (i.e. rasterize the vector PDF). But I cannot find the appropriate parameters to do so, even when I add the resolution parameter -r300.
The command I used is -dSAFER -dBATCH -dNOPAUSE -dPDFSETTINGS=/screen -dGraphicsAlphaBits=1 -sDEVICE=pdfwrite -r300 -sOutputFile="output-raster.pdf" "input-vector.pdf"
Does anyone know how to rasterize the PDF?
You can use pdftocairo from the Poppler library. It can convert a PDF to a raster image format like PNG or JPEG. Then use any image viewer or imagemagick to convert the image to a PDF file if you need a PDF as output.
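The two steps described above might look like the following sketch. It assumes Poppler's pdftocairo and ImageMagick's convert are installed (on ImageMagick 7 the command is usually magick instead); the function name rasterize_pdf and the 300 dpi default are illustrative choices, not anything prescribed by either tool. The -density given to convert is meant to keep the page size of the result close to the original.

```shell
# Rasterize each page of a PDF to PNG, then wrap the PNGs in a new PDF.
# Assumes pdftocairo (Poppler) and convert (ImageMagick) are on PATH.
# rasterize_pdf is a hypothetical helper; the body runs in a subshell
# so its variables do not leak into the caller.
rasterize_pdf() (
    src=$1; dst=$2; dpi=${3:-300}
    tmp=$(mktemp -d) || exit 1
    # pdftocairo writes one file per page: page-1.png, page-2.png, ...
    pdftocairo -png -r "$dpi" "$src" "$tmp/page" &&
        convert -density "$dpi" "$tmp"/page-*.png "$dst"
    status=$?
    rm -rf "$tmp"
    exit "$status"
)

# Usage:
#   rasterize_pdf input-vector.pdf output-raster.pdf 300
```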

Conversion of EPS to PDF or Tiff does not maintain transparency

I am trying to convert an EPS file to a PDF or TIFF file using Ghostscript, but I am having trouble keeping it transparent. When I convert it to PNG, transparency is maintained, but I need PDF or TIFF for printing.
I am using the arguments below.
For PDF:
-dNOPAUSE -dBATCH -sDEVICE=pdfwrite -dEPSCrop -sOutputFile=C:\temp\test.pdf C:\temp\test.eps;
For TIFF:
-dNOPAUSE -dBATCH -sDEVICE=tiff32nc -r300 -dEPSCrop -sOutputFile=C:\temp\test.tiff C:\temp\test.eps;
Is there something I am missing, or is it not possible to maintain transparency?
EPS cannot contain transparency; it's not part of the standard, so I don't really see how it can fail to 'maintain' it....

How to convert PDF to low-resolution (but good quality) JPEG?

When I use the following ghostscript command to generate jpg thumbnails from PDFs, the image quality is often very poor:
gs -q -dNOPAUSE -dBATCH -sDEVICE=jpeggray -g465x600 -dUseCropBox -dPDFFitPage -sOutputFile=pdf_to_lowres.jpg test.pdf
By contrast, if I use ghostscript to generate a high-resolution png, and then use mogrify to convert the high-res png to a low-res jpg, I get pretty good results.
gs -q -dNOPAUSE -dBATCH -sDEVICE=pnggray -g2550x3300 -dUseCropBox -dPDFFitPage -sOutputFile=pdf_to_highres.png test.pdf
mogrify -thumbnail 465x600 -format jpg -write pdf_to_highres_to_lowres.jpg pdf_to_highres.png
Is there any way to achieve good results while bypassing the intermediate pdf -> high-res png step? I need to do this for a large number of pdfs, so I'm trying to minimize the compute time.
Here are links to the images referenced above:
test.pdf
pdf_to_lowres.jpg
pdf_to_highres.png
pdf_to_highres_to_lowres.jpg
One option that seems to improve the output a lot: -dDOINTERPOLATE. I ran the same command as you, with only the -dDOINTERPOLATE option added.
I'm not sure what interpolation method this uses but it seems pretty good, especially in comparison to the results without it.
P.S. Consider outputting PNG images (-sDEVICE=pnggray) instead of JPEG. For most PDF documents (which tend to have just a few solid colors) it's a more appropriate choice.
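For reference, the suggestion amounts to the question's original command with one extra switch. The sketch below wraps it in a function so it can be applied to many PDFs in one pass; the name pdf_thumbnail and the loop in the usage comment are illustrative, and every flag other than -dDOINTERPOLATE is copied unchanged from the question.

```shell
# Render a PDF straight to a low-resolution grayscale JPEG thumbnail,
# with image interpolation switched on.
# pdf_thumbnail is a hypothetical helper name.
pdf_thumbnail() {
    gs -q -dNOPAUSE -dBATCH -sDEVICE=jpeggray -g465x600 \
       -dUseCropBox -dPDFFitPage -dDOINTERPOLATE \
       -sOutputFile="$2" "$1"
}

# Usage, for a directory of PDFs:
#   for f in *.pdf; do pdf_thumbnail "$f" "${f%.pdf}.jpg"; done
```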
Your PDF looks like it is just a wrapper around a JPEG already.
Try using the pdfimages program from xpdf to extract the actual image rather than rendering to a file.
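A minimal sketch of that extraction, assuming a pdfimages binary (from xpdf or Poppler) is on PATH. The -j switch asks it to write embedded JPEG data out as .jpg files instead of converting it; the exact output file names depend on which pdfimages you have. The function name and the default "image" prefix are illustrative.

```shell
# Pull the embedded images out of a PDF without re-rendering the page.
# With -j, JPEG-encoded images are saved as .jpg files.
# extract_pdf_images is a hypothetical helper name.
extract_pdf_images() {
    pdfimages -j "$1" "${2:-image}"
}

# Usage:
#   extract_pdf_images test.pdf page
```

The extracted JPEG can then be scaled down with the same mogrify -thumbnail step shown in the question, which skips the high-resolution render entirely.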