Saving plotly plots in one pdf file - pandas

I am trying to save plotly plots generated within a for loop into one PDF file, but here it says we need to pay for it.
Are there any updates on this feature? Do we really need to pay to save as PDF?

For anyone else still looking for a quick answer 2 years later:
It is possible to export static plotly figures as PDF. Produce a plotly figure, say fig; it has a .write_image method that exports to several formats. Simply do:
fig.write_image("your_image.pdf")
NOTE: you may need to install kaleido (plotly uses it to convert to static images).
pip install -U kaleido
Then:
fig.write_image("your_image.pdf", engine="kaleido")
Credits and references: plotly explaining it; kaleido

I believe you do need to pay in order to save as a pdf:
py.plotly.image.save_as(fig, filename='file.pdf')
PlotlyRequestError: Hi there! Accounts on the Community Plan can only download PNG and JPEG (raster) images.
To download publication-quality vector images (SVG, PDF, and EPS), please upgrade your account.
UPGRADE HERE: https://plot.ly/products/cloud
You can, however, save them as JPEG or PNG:
py.plotly.image.save_as(fig, filename='file.png')

Related

Extracting embedded PNG byte streams from PDF

I am programming in Python, but if some tool/library exists in another language that would help me considerably, I am open to suggestions.
I have a large collection of pdf pages that live in a database, and I am trying to automate the collection of those pages to build some image recognition models with them.
These "PDFs" are actually just PNG images in a PDF wrapper (presumably so they can be opened by PDF readers like Adobe Acrobat). I need the PDFs in image format to feed into the image recognition model pipeline. I am assuming they are PNG images because when I save the images from the browser (i.e., right-click and "Save image as"), the resulting file is a PNG file.
After reading this question from 2010, and checking out this blog post from 2007, I've concluded that there must be a way to just extract the PNG byte array from the PDF instead of re-converting the PDF into a new image. Oddly though, I couldn't find the PNG file header with
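# Python 3.6
header = bytes([137, 80, 78, 71, 13, 10, 26, 10])
# the resulting header looks like this: b'\x89PNG\r\n\x1a\n'
with open('document.pdf', 'rb') as f:  # placeholder path
    data = f.read()
data.find(header)  # returns -1: the header is never found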
Does that mean that the embedded image is not in fact a PNG image?
If there is no easy way to extract the embedded image byte array, what tool might I use to automate the conversion of each PDF file to some image format (preferably JPEG, PNG, or TIFF)?
Edit: I know tools like ImageMagick exist for format conversions, but I'd really rather do the extraction method for the sake of learning more about these file formats.
pip install pdf2image
pip install pillow
pip install numpy
pip install opencv-python
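pdf2image also needs the poppler utilities (pdftoppm) installed on the system. Then:
import numpy as np
import cv2
from pdf2image import convert_from_path

# first page of the pdf as a numpy array, to play around with in OpenCV or PIL
img = np.asarray(convert_from_path('path to the pdf file')[0])
# pdf2image yields RGB images but OpenCV expects BGR, so swap the channels before saving
cv2.imwrite('path to save the image with the file extension', cv2.cvtColor(img, cv2.COLOR_RGB2BGR))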
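On why the PNG header never turns up: a PDF does not embed PNG files. An image is stored as an image XObject stream whose bytes are encoded by a filter. A /DCTDecode stream is a complete JPEG file, while a /FlateDecode stream holds zlib-compressed raw pixel rows with no PNG header. The sketch below is a deliberately naive, stdlib-only scanner written under those assumptions (it is not a real PDF parser): it ignores compressed object streams, encryption, /DecodeParms predictors, palette or ICC color spaces, and multi-filter chains. All function names are hypothetical.

```python
import re
import struct
import zlib

# "N G obj << dict >> stream<EOL>", without letting the dict run past an "endobj".
STREAM_RE = re.compile(rb"\d+\s+\d+\s+obj\s*<<((?:(?!endobj).)*?)>>\s*stream\r?\n", re.S)


def _num(key, d):
    # Direct integers only; an indirect reference such as "/Length 9 0 R" yields None.
    m = re.search(rb"/" + key + rb"\s+(\d+)\b(?!\s+\d+\s+R)", d)
    return int(m.group(1)) if m else None


def _name(key, d):
    # First name after the key, also inside an array: "/Filter [/FlateDecode]".
    m = re.search(rb"/" + key + rb"\s*\[?\s*/(\w+)", d)
    return m.group(1).decode() if m else None


def _chunk(tag, payload):
    crc = zlib.crc32(tag + payload) & 0xFFFFFFFF
    return struct.pack(">I", len(payload)) + tag + payload + struct.pack(">I", crc)


def _flate_to_png(d, data):
    width, height = _num(b"Width", d), _num(b"Height", d)
    bpc, channels = _num(b"BitsPerComponent", d), {"DeviceRGB": 3, "DeviceGray": 1}.get(_name(b"ColorSpace", d))
    if bpc != 8 or channels is None or not width or not height:
        return None  # palettes, ICC profiles, 1-bit masks etc. are out of scope
    raw = zlib.decompressobj().decompress(data)  # ignores any trailing bytes
    stride = width * channels
    if len(raw) < stride * height:
        return None
    # PNG wants every scanline prefixed with a filter-type byte; 0 means "no filter".
    rows = b"".join(b"\x00" + raw[y * stride:(y + 1) * stride] for y in range(height))
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 2 if channels == 3 else 0, 0, 0, 0)
    return (b"\x89PNG\r\n\x1a\n" + _chunk(b"IHDR", ihdr)
            + _chunk(b"IDAT", zlib.compress(rows)) + _chunk(b"IEND", b""))


def extract_images(pdf):
    """Return (extension, bytes) for each image XObject this simple scanner understands."""
    found = []
    for m in STREAM_RE.finditer(pdf):
        d, start = m.group(1), m.end()
        if not re.search(rb"/Subtype\s*/Image", d):
            continue
        length = _num(b"Length", d)
        if length is not None:
            data = pdf[start:start + length]
        else:  # indirect /Length: cut at "endstream" and drop one trailing EOL
            data = re.sub(rb"\r?\n\Z", b"", pdf[start:pdf.index(b"endstream", start)])
        filt = _name(b"Filter", d)
        if filt == "DCTDecode":
            found.append(("jpg", data))  # the stream bytes are already a JPEG file
        elif filt == "FlateDecode" and b"/DecodeParms" not in d:
            png = _flate_to_png(d, data)
            if png is not None:
                found.append(("png", png))
    return found
```

Usage: pass the raw bytes of a PDF file (open(path, "rb").read()) and write each returned blob to disk with its extension. For files this scanner cannot handle, a full PDF library is the safer route.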

Is it possible to convert fabricjs svg output to pdf without rasterizing?

We are building a web app where the user creates a design with fabric.js, and at the end should receive a PDF file of that work.
At first, we tried jsPDF because we preferred a client-side solution. However, pdf.addImage(canvas.toDataURL(),...) rasterizes the design.
Next, we tried a server-side solution with wkhtmltopdf, sending it the output of canvas.toSVG(), but there are issues with font and shape rendering.
The designs are complex: they can contain text, shapes, images and SVG.
We also tried Inkscape (inkscape --without-gui --export-pdf ...), mPDF and MuPDF without good results. ImageMagick is not a solution, as it also rasterizes the design.
The main goal is a vector PDF that can be scaled up and in which the elements of the design are selectable; if possible, the PDF should also be print-ready (300 dpi and CMYK).
Yes, it's possible using the TCPDF library.
See its ImageSVG API for converting SVG to PDF:
https://tcpdf.org/examples/example_058/
Export the canvas to SVG and use PDFlib to create the PDF.
You can find an example here: https://www.pdflib.com/pdflib-cookbook/graphics/starter-svg/

Tesseract cannot recognize my image correctly

I am developing an Android app that needs to recognize captchas from a website.
I use tess-two to recognize the captchas and followed the TrainingTesseract3 instructions to train my own traineddata (using jTessBoxEditor to correct characters), but it misrecognizes the text or recognizes nothing at all.
The TIFF image below is what I use to train Tesseract; I collected many captchas and merged them into one image.
TIFF image
The image that I want to recognize
For example, the expected result of the above image should be k8666, but the actual result is only 66.
Can anyone help? Thanks.
I tried your images with a .NET wrapper for tesseract-ocr (Tesseract-ocr .Net Wrapper by charlesw).
I got somewhat better results (K8EEE, K8656). I think you need to enlarge the text and make it bold. I saved the image in TIFF format at 96 DPI; with those changes you should get better results than mine.
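The advice above (larger, bolder text, saved as TIFF) can be sketched as an image-preprocessing step. This is a Python/Pillow illustration under assumptions, not the answerer's code; on Android the same steps would be done with Bitmap operations before calling tess-two. preprocess_captcha, its default scale of 3 and threshold of 128 are hypothetical choices to tune per captcha source.

```python
from PIL import Image, ImageFilter


def preprocess_captcha(src, scale=3, threshold=128):
    """Upscale, binarize and thicken dark glyphs before handing the image to Tesseract."""
    gray = src.convert("L")
    # Enlarging the glyphs approximates "increase the text font".
    big = gray.resize((gray.width * scale, gray.height * scale), Image.LANCZOS)
    # Pure black text on a pure white background.
    bw = big.point(lambda p: 255 if p > threshold else 0)
    # A 3x3 minimum filter takes the darkest neighbour, so dark strokes grow by
    # one pixel on each side: a crude way to make the text bolder.
    return bw.filter(ImageFilter.MinFilter(3))
```

Usage: preprocess_captcha(Image.open("captcha.png")).save("captcha.tif", dpi=(96, 96)) writes the result as a 96 DPI TIFF (file names are placeholders).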

Viewing graphs in Jupyter or IPython

I am trying to use both bokeh and matplotlib in my IPython notebook... Neither work perfectly.
Attached is a screenshot of Bokeh. The Matplotlib explanation is below.
Here are my system specs:
-Windows 7 with Vagrant
-Jupyter/IPython
BOKEH -- buttons are static images; there is no resizing, yet the graph is interactive
It should look like the plots on this page: http://docs.bokeh.org/en/latest/docs/quickstart.html
MATPLOTLIB -- only static shots appear when it should be zoomable, etc (like bokeh)
1) Read the documentation: you only need to replace the call to output_file() with a call to output_notebook(), which you do not seem to have done above.
2) Why should it be? What did you do to make it zoomable?
3) Try not to post two unrelated questions at the same time.
You appear to be using an older version of Bokeh. The issue with the CSS problems (button appearance) has been fixed for some time. As mentioned above, you will need to execute output_notebook() to load Bokeh for IPython notebook usage. For future reference, questions like this greatly benefit from providing as much information as possible (e.g., Bokeh and browser versions, platform information, etc.) Without that information it is impossible to diagnose problems with any certainty.
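A minimal notebook sketch of the output_notebook() fix, assuming a reasonably recent Bokeh release. make_plot is a hypothetical helper; the tools string lists standard Bokeh tool names that provide the pan and zoom behaviour the question expects.

```python
from bokeh.plotting import figure


def make_plot(xs, ys):
    """Build an interactive line plot with pan, zoom, reset and save tools."""
    p = figure(title="Example", tools="pan,wheel_zoom,box_zoom,reset,save")
    p.line(xs, ys, line_width=2)
    return p
```

Usage in a notebook cell: from bokeh.io import output_notebook, show; call output_notebook() once to load BokehJS into the notebook (instead of output_file(), which writes an HTML file), then show(make_plot([1, 2, 3, 4], [4, 1, 3, 2])).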

PDFBox : Converting to image : Quality loss when converting PDF containing scanned documents

My use case is pretty simple: I need to convert PDFs to images. I tried Apache PDFBox, and I am having trouble converting PDFs that contain scanned images: when I convert a scanned image, clarity is lost due to compression/scaling. So I tried to extract the image data from the PDF and store it directly. The problem is that I may also get PDF files containing both images and text, in which case I would need to fall back to image conversion mode. How can I differentiate pages/documents that contain only an image from those with composite content? I thought I could use the ProcSet definition for this purpose, but it is marked as obsolete and unreliable in the PDF specification. Another possibility is to check all the objects linked to a page and see whether it contains anything other than images. Please let me know if there is an easier way of doing this.
Thanks
If your intention is to convert PDFs to images, it is better to use ImageMagick. It has a lot of options for controlling the quality of the resulting image, and converting a PDF to an image with it is simple.
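A sketch of the ImageMagick route driven from Python, assuming ImageMagick and Ghostscript (which ImageMagick delegates PDF rendering to) are installed. For PDF input the main quality option is -density, which has to appear before the input file because it sets the resolution at which the page is rasterized. The helper names and the 300 dpi default are hypothetical.

```python
import shutil
import subprocess


def magick_pdf_to_png_cmd(pdf_path, out_pattern, dpi=300):
    """Build an ImageMagick command that rasterizes every page of a PDF."""
    # ImageMagick 7 provides a `magick` binary; version 6 uses `convert` instead.
    exe = "magick" if shutil.which("magick") else "convert"
    # -density is a read setting, so it must precede the input file to affect
    # the resolution Ghostscript renders the pages at.
    return [exe, "-density", str(dpi), pdf_path, out_pattern]


def convert_pdf(pdf_path, out_pattern="page-%03d.png", dpi=300):
    """Run the conversion; %03d in the output name numbers one file per page."""
    subprocess.run(magick_pdf_to_png_cmd(pdf_path, out_pattern, dpi), check=True)
```

Raising dpi trades larger files and slower rendering for sharper output, which is the knob that matters most for scanned pages.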