I have a huge pdf file (152M) containing embedded videos. I reckon most of its size must come from the videos themselves. I want to make a lighter version out of it so that it's easier to share and send around, so I would like to remove the videos.
Is there a simple way to do it? In particular, using free tools? Possibly a one-off solution without needing to remove the files one by one?
I thought of using ImageMagick for it, but I can't find out how, and I'm not even sure it's possible.
Well, I could not find a way to do it. However, my original problem involved a pdf exported from a pptx presentation, so I managed to remove the videos from the presentation.
This does not answer the question as asked, but I had opened a parallel question on askubuntu.com. I'll put the link here in case anybody is interested: https://askubuntu.com/questions/1453755/stripping-a-pptx-presentation-from-videos-in-libreoffice-impress/
I have a 300+ page PDF document which needs to have internal page links added to it to reference other pages in the document. The document is created in Visio, which does not support consistent hyperlink generation in PDF export, so the link generation needs to be done on the PDF itself, not up the chain. This is an annual need, and regularly takes over a week due to the amount of manual labor, time, and checking needed.
The text which is hyperlinked has the same format in every case (e.g., "See Section 8.18 - How to Hyperlink"), and I'm certain this can be automated, as there are commercial plugins which can do this, but they cost hundreds of dollars, and are not able to be used in this case due to restrictions imposed by my employer. Example: https://www.evermap.com/ABAddingHyperlinks.asp
I've been looking through the Acrobat Plugin SDK and it seems doable, but I know there is also a higher level scripting language available for Acrobat. Does anyone have experience working with PDFs or with the Acrobat scripting / SDK tools? Are there open source methods for doing this? I've looked everywhere! Willing to learn. I've looked at Ghostscript (Adding internal hyperlink to a pdf) but what I need is way more than just a Table of Contents, and links can appear in many places on the page with line breaks, so consistency is a challenge.
EDIT: I found a solution! Bluebeam software's Revu Extreme works pretty darn well, and can be used as a 30 day free trial of all features. Only limitation is that links which extend across a line break (multiple lines of text) do not properly work in Edge or Chrome's PDF viewer, as they don't properly support hyperlinks with multiple click regions. I've submitted a ticket requesting a feature be added to Revu that fixes this, but for now those links need to be manually fixed following the batch link. The process is described here: https://support.bluebeam.com/online-help/revu2018/Content/RevuHelp/Menus/Batch/Link/Batch-Link--T.htm
You can add hyperlinks to a document with Ghostscript, but you would need to know the location of the text to hyperlink and the destination in advance. You cannot automate it, or in fact write any reasonably simple code to automate the task, using Ghostscript. You'd need to modify chunks of the PDF interpreter, which is written in PostScript, and that is not a task for anyone who is not a PostScript expert.
You could probably do it with MuPDF, and probably using MuJS to script it, but I don't know enough to be certain. It would still require some coding effort, but it would probably be easier to use JavaScript at least.
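Whichever tool ends up writing the link annotations, the text-matching half of the job can be sketched on its own. Below is a minimal Python sketch, assuming the page text has already been extracted as a plain string by some other tool. The function names and the `section_pages` mapping are hypothetical, and the pattern is only a guess based on the single example "See Section 8.18 - How to Hyperlink". It matches the "See Section N.N -" prefix only, since the end of the title cannot be known from the text alone, and it tolerates a line break between the words.

```python
import re

# Assumed reference format, taken from the one example in the question.
# \s+ allows the reference to wrap across a line break; the separator may be
# a plain hyphen or an en dash (the en dash is an assumption about PDF text).
SECTION_REF = re.compile(r"See\s+Section\s+(\d+(?:\.\d+)*)\s+[-\u2013]\s+")


def find_section_refs(page_text):
    """Return (section_number, start, end) for each reference prefix found."""
    return [(m.group(1), m.start(), m.end())
            for m in SECTION_REF.finditer(page_text)]


def build_link_targets(refs, section_pages):
    """Pair each reference with its destination page.

    `section_pages` is a hypothetical dict such as {"8.18": 120}.
    References to sections missing from the dict are skipped.
    """
    return [(num, section_pages[num])
            for num, _, _ in refs if num in section_pages]
```

The character offsets returned by `find_section_refs` would still have to be translated into page coordinates by whatever library draws the link rectangles, which is the part that needs real PDF tooling.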
I am having trouble in trying to find the solution for the below described problem.
Annotate the PDF file when the user clicks on a specific location in the PDF, and then finally save the PDF so that in future it opens at the annotated location.
How to approach this?
What I have tried.
I have looked for libraries in any programming language (the language is not a constraint) and found a few relevant ones, such as minipdf in Python and pdfbox in Java. I finally selected pdfbox since it seemed mature enough to get close to a solution.
The main hurdle now is how to get the location the user clicked. Once I have that location, I can annotate at that point and then save the PDF so that it reopens at the same location.
It seems I would have to write PDF JavaScript for this, but again, how do I do so?
I had a similar problem and solved it a different way. In my case I am not opening the PDF in Adobe Reader, but in a browser. So what I did was convert the PDF to HTML using Python libraries (let me know if you are interested and I will share the library names with their pros and cons).
That HTML can then be edited easily. We can put hyperlinks, highlights, and anything else there, since we have the source code.
This workaround may be applicable to you if your front end is web based.
PS: I wanted to post this workaround as a comment, but couldn't because my reputation is too low as of now. I hope it won't be downvoted :)
According to their website (http://www.gdpicture.com/products/managed-pdf/) you have the ability to extract fonts from a PDF file. However, I can't seem to find the functionality to do this. I have encountered several methods to add them, but none to extract them (and they don't show as embedded files). Has anyone tried to do this, or have experience with GdPicture?
Version: 14 (Current)
Disclosure: I am part of the ORPALIS technical staff that edits the GdPicture.NET SDK, that's why I know there's an ongoing communication about this already.
It is my understanding that you have a support case open for a merging issue relative to fonts and as you know, our development team is currently working on a fix that will solve it so I strongly recommend that you wait for them to finish.
At the moment there is no way to extract an embedded font as you might expect, but the development team is also working on that. We will let you know as soon as it is available (it should be very soon).
You can get information about (already) embedded fonts using the GetFontCount, IsFontEmbedded, GetFontName and GetFontType methods.
You can also add new embedded fonts (of different types) using the AddFontFromFileU, AddStandardFont, AddTrueTypeFont, AddTrueTypeFontFromFile, AddTrueTypeFontFromFileU and AddTrueTypeFontU methods.
I used to use the method outlined by Obie Fernandez to download files:
http://www.therailsway.com/2009/2/22/file-downloads-done-right/
But this approach is a bit outdated, so I'm curious what is better. My fear is that if multiple clients download large files at once, too much memory will be eaten up.
I need to hide the S3 URL and control who can download what. Perhaps this method is still fine, or maybe streaming is the way to go.
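To make the memory concern concrete, here is a minimal sketch of the streaming idea, written in Python rather than Rails purely for illustration. It assumes the file is available as a binary file-like object. `CHUNK_SIZE`, `stream_chunks`, `authorized_stream`, and the owner-only rule are all hypothetical and stand in for whatever access policy the app really uses.

```python
import io

CHUNK_SIZE = 64 * 1024  # assumed chunk size; tune for the deployment


def stream_chunks(source, chunk_size=CHUNK_SIZE):
    """Yield a binary file-like object's contents in fixed-size chunks.

    Only one chunk is held in memory at a time, regardless of file size.
    """
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        yield chunk


def authorized_stream(user, file_owner, source):
    """Check access first, then hand back the chunk generator.

    The owner-only rule is a placeholder for a real permission check.
    """
    if user != file_owner:
        raise PermissionError("user may not download this file")
    return stream_chunks(source)
```

Because the client only ever talks to the app, the storage URL is never exposed, and per-download memory stays near one chunk instead of the whole file.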
Thanks in advance.
I'm trying to solve a specific problem (but this could benefit others) which from googling around doesn't seem to have a definitive solution. I think there are probably several partial solutions out there, I'd like to find the best of those (or a combination) that does the trick most of the time.
My specific example is: users in my app can send videos to each other and I'm going to allow them to save videos they have received to their camera roll. I would like to prevent them from forwarding the video on to others. I don't need to identify a particular video, just that it was originally saved from my app.
I have achieved a pretty good solution for images by saving some EXIF metadata that I can use to identify that the image was saved from my app and reject any attempts to forward it on, but the same solution doesn't work for videos.
I'm open to any ideas. So far I've seen suggested:
Using ALAssetRepresentation in some way to save a filename and then compare it when reading in, but I've read that upgrading iOS wipes these names out
Saving metadata. Not possible.
MD5. I suspect iOS would modify the video in some way on saving which would invalidate this.
I've had a thought about appending a frame or two to the start of the video, perhaps an image which is a solid block of colour, magenta for example. Then when reading in, get the first frame, do some kind of processing to identify this. Is this practical or even possible?
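To make the marker-frame idea concrete, here is a minimal Python sketch of the detection step only. It assumes the first frame has already been decoded into a sequence of (R, G, B) tuples by some other means. The function name, the tolerance, and the 95% threshold are all made-up illustration values. A tolerance is used because video compression is unlikely to preserve an exact colour.

```python
MAGENTA = (255, 0, 255)


def is_marker_frame(pixels, target=MAGENTA, tolerance=40, min_fraction=0.95):
    """Return True if nearly all pixels are close to the target colour.

    A pixel counts as close when every channel is within `tolerance` of the
    target. `tolerance` and `min_fraction` are assumed values, not measured.
    """
    pixels = list(pixels)
    if not pixels:
        return False
    close = sum(
        1 for p in pixels
        if all(abs(c - t) <= tolerance for c, t in zip(p, target))
    )
    return close / len(pixels) >= min_fraction
```

For example, a frame of slightly shifted magenta with a few stray pixels would still pass, while an ordinary video frame almost certainly would not.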
What are your thoughts on these, and/or can you suggest anything better?
Thanks!
Steven
There are 2 approaches you could try. Both solutions only work under iOS 5.
1) Save the url returned by [ALAssetRepresentation url]. Under iOS 5 this URL contains a CoreData objectID and should be persistent.
2) Use the customMetadata property of ALAsset to append custom info to any asset you saved yourself.
Cheers,
Hendrik