I have a URL on my website that serves a PDF loaded from Google Drive via https://docs.google.com/document/d/[document id]/export?format=pdf, using a simple PHP readfile call. However, the generated PDF does not have metadata such as Title, Author, and Description.
What is the best way to serve the PDF with the updated metadata?
Caveats:
The website runs on shared cPanel web hosting.
I cannot install Perl modules.
The host doesn't provide native PHP PDF support, and I don't have access to Composer.
The only supported server-side languages are Perl and PHP.
The only acceptable solution I found was ConvertApi, which has a very limited free account (1500 seconds). However, I can get around that by caching the PDF and retrieving a new copy either when it has been more than a day since it was last updated or when I pass an argument that forces a re-cache.
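To make that caching idea concrete, here is a rough sketch of the check (written in Python purely to illustrate the flow; the real version would have to be PHP on this host, and the cache path and fetch callback are placeholders):

    # Illustrative sketch of the cache-or-refresh logic only; the production version
    # would be PHP on the shared host. CACHE_PATH is a placeholder, and `fetch` stands
    # in for whatever call retrieves a freshly converted PDF (e.g. from ConvertApi).
    import os
    import time

    CACHE_PATH = "cache/document.pdf"   # hypothetical cache location
    MAX_AGE = 24 * 60 * 60              # one day, in seconds

    def cached_pdf(fetch, force_refresh=False):
        """Return the path to a cached PDF, refreshing it at most once a day."""
        stale = (
            force_refresh
            or not os.path.exists(CACHE_PATH)
            or time.time() - os.path.getmtime(CACHE_PATH) > MAX_AGE
        )
        if stale:
            os.makedirs(os.path.dirname(CACHE_PATH), exist_ok=True)
            with open(CACHE_PATH, "wb") as fh:
                fh.write(fetch())       # fetch() must return the converted PDF as bytes
        return CACHE_PATH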
Do you recommend any other solutions? I would much rather have a set-it-and-forget-it solution.
Or is 1500 seconds enough for a file that is rarely going to be used?
Related
I want to automatically download PDF files from a pool of sites like these:
https://www.wfp.org/publications?f%5B0%5D=topics%3A2234
https://www.unhcr.org/search?comid=4a1d3b346&cid=49aea93a6a&scid=49aea93a39&tags=evaluation%20report
https://www.unicef.org/evaluation/reports#/
I then want to upload them onto my own site.
Can I use Python to build a script for this function? I'd need to scrape the websites periodically so that, as soon as a new file is uploaded, the file is automatically downloaded to my server.
Lastly, assuming I'm sharing these on my own website for non-profit purposes, is this legal?
You can use the Python modules requests and beautifulsoup4 to periodically scrape the websites and download the PDFs, as shown in Download files using requests and BeautifulSoup.
Then you can save them under your server's web path and display them dynamically.
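A minimal sketch of that approach, assuming the listing page links to the PDFs directly (the sites above are partly JavaScript-driven, so they may need extra handling); the listing URL and download directory are placeholders:

    # Sketch: find links ending in .pdf on a listing page and download any we don't have yet.
    # LISTING_URL and DOWNLOAD_DIR are placeholders, not the real site structure.
    import os
    from urllib.parse import urljoin

    import requests
    from bs4 import BeautifulSoup

    LISTING_URL = "https://example.org/publications"   # placeholder listing page
    DOWNLOAD_DIR = "downloads"

    def download_new_pdfs():
        os.makedirs(DOWNLOAD_DIR, exist_ok=True)
        html = requests.get(LISTING_URL, timeout=30).text
        soup = BeautifulSoup(html, "html.parser")
        for link in soup.find_all("a", href=True):
            href = link["href"]
            if not href.lower().endswith(".pdf"):
                continue
            pdf_url = urljoin(LISTING_URL, href)
            target = os.path.join(DOWNLOAD_DIR, os.path.basename(href))
            if os.path.exists(target):              # already downloaded on an earlier run
                continue
            with open(target, "wb") as fh:
                fh.write(requests.get(pdf_url, timeout=60).content)

    if __name__ == "__main__":
        download_new_pdfs()

Running it from cron (or any scheduler) gives you the periodic check; new files are picked up because existing ones are skipped.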
I'm not a lawyer, but I don't think this is legal. It's like secretly recording a movie in the cinema and then sharing it online, which is very much not legal.
I have a log server where users upload archives and view their content online when needed. Currently the server unzips files right after receiving them. Unfortunately, my peers have consumed all the drive space I had. I could free up a lot of space if there were a way to store the ZIP archives but present them to users as an HTML page (like Apache's default file browser).
I know there are solutions relying on JS, like:
http://gildas-lormeau.github.io/zip.js/demos/demo2.html
https://stuk.github.io/jszip/
or I could unzip them on demand on the server side and provide a link to a temporary folder. However, some time ago I heard that a browser can view an archive's contents if the proper headers are sent from Apache/nginx. Apache's mod_deflate doesn't help much here, and I can't find any other documentation. Perhaps it's not possible after all?
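For reference, the unzip-on-demand option could be kept very light, since the archive never has to be extracted to disk; here is a rough sketch using Python's zipfile module (the paths and the query-string convention are placeholders, and any server-side language with a ZIP library could do the same):

    # Sketch: build an HTML listing of a ZIP archive and read a single member on demand,
    # without extracting the whole archive to disk. Paths and markup are placeholders.
    import html
    import zipfile

    def archive_index(zip_path):
        """Return a simple HTML page listing the archive's members, Apache-index style."""
        with zipfile.ZipFile(zip_path) as zf:
            rows = "".join(
                f"<li><a href='?member={html.escape(info.filename)}'>"
                f"{html.escape(info.filename)}</a> ({info.file_size} bytes)</li>"
                for info in zf.infolist()
                if not info.is_dir()
            )
        return f"<html><body><h1>{html.escape(zip_path)}</h1><ul>{rows}</ul></body></html>"

    def read_member(zip_path, member):
        """Read one member's contents straight out of the archive."""
        with zipfile.ZipFile(zip_path) as zf:
            with zf.open(member) as fh:
                return fh.read()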
Cheers.
I am developing a web application to upload .mp3 files and need to play them. I successfully upload the files and save them in the C:/uploads folder. I understand that, since it's a web application, we need to save them on the Apache web server itself, but I am not sure where to save them.
Thanks,
Serenity.
You can use a content repository to store uploaded data; I think this is a common approach. For instance, take a look at Apache JackRabbit. With it, you won't be hunting for uploaded files on the hard drive; instead you get a web interface, along with other tools that can connect to the repository and show you the files stored there.
As an alternative to JackRabbit, you can try Alfresco CMS; they both implement JCR. Other implementations are listed here (you will find them at the bottom of that page).
We have a FogBugz 6 installation, with a good deal of wiki content in place. We're transitioning to use Atlassian products (JIRA and Confluence), so we'd like to get that wiki content into Confluence. How would you approach this?
Unfortunately, FogBugz doesn't appear to provide any kind of wiki export functionality, and Confluence doesn't provide any FogBugz wiki import.
FogBugz does have an API, but it's a little light on the details regarding access to wiki content. We don't really care about past revisions of pages (just content, links, and images/attachments), so it's not clear that the API gets us any further than scraping the FogBugz wikis with wget or something, and working with the HTML and images/attachments from there.
Confluence has a pretty full-featured content import utility that supports a number of source wikis:
TWiki
PmWiki
DokuWiki
Mediawiki
MoinMoin
Jotspot
Tikiwiki
Jspwiki
Sharepoint
SWiki
Vqwiki
XWiki
Trac
No FogBugz option there, but if we could export the FogBugz wiki content into one of the above wikis, then we could likely use the Confluence multi-wiki importer from there.
Alternatively, we could use wget to scrape the FogBugz wiki content, and then find a way to get static HTML + images + attachments into either Confluence or into one of the above other wikis as a stepping stone to Confluence.
Thoughts?
A colleague ended up figuring this one out, and the process turned out to be generally applicable to other web content we wanted to pull into Confluence as well. In broad strokes, the process involved:
Using wget to suck all of the content out of FogBugz (configured so that images and attachments were downloaded properly, and links to them and to other pages were properly relativized).
Using a simple XSLT transform to strip away the "template" content (e.g. logos, control/navigation links, etc) that surrounded the body of each page.
(optionally) Using a Perl module to convert the resulting HTML fragments into Confluence's markup format.
Using the Confluence command line interface to push up all of the page, image, and attachment data.
Note that I said "optionally" in #3 above. That is because the Confluence CLI has two relevant options: it can be used to create new pages directly, in which case it expects Confluence markup already, or it can be used to create new pages from HTML, which it converts to Confluence markup itself. In some cases the Confluence CLI converted the HTML just fine; for other data sources, we needed to use the Perl module.
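For anyone repeating this, step 2 is only a few lines once the stylesheet exists; here is a rough sketch using Python's lxml (the stylesheet name, the mirror directory, and the output naming are all placeholders, and the actual XSLT has to be written against the specific page template being stripped):

    # Sketch: apply a template-stripping XSLT to every page in the wget mirror, writing
    # body-only fragments next to the originals. strip_template.xsl is a placeholder
    # stylesheet that would select just the page body out of the surrounding chrome.
    from pathlib import Path
    from lxml import etree

    transform = etree.XSLT(etree.parse("strip_template.xsl"))
    html_parser = etree.HTMLParser()

    for page in Path("fogbugz-mirror").rglob("*.html"):      # the wget output directory
        doc = etree.parse(str(page), html_parser)
        fragment = transform(doc)
        out = page.with_name(page.stem + "-fragment.html")
        out.write_bytes(etree.tostring(fragment))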
What I'm looking for is some sort of proxy tool that will allow me to specify a local file to load in place of one referenced by the web page being browsed. I have tried Burp Suite, which almost works: it allows us to intercept a file and replace it by pasting the contents of the replacement file into an input field. The file content is compiled code (Flash content), so we are pasting in bytecode, but something isn't working.
The reason is that we are a third-party software developer without access to our client's development or testing environments. Our content must interact correctly with the rest of the content on their web page (there are elements on their page that communicate with our content), and testing any change we make means a turnaround of several hours to get our files uploaded to their servers. So what we need is some sort of hacking tool that lets us test our work against their web pages, hence the requirement to swap a file referenced by a web page with a local version.
The autoresponder feature in Fiddler Web Debugging Proxy might do what you need, if it's only static content.
I've been using HTTP::Proxy for a long time, and it has always helped me fiddle with things on the fly.
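If a scriptable proxy in Python is more convenient, the same kind of on-the-fly replacement can be sketched as a mitmproxy addon (the target URL and local file path below are placeholders; run it with mitmproxy -s swap.py and point the browser at the proxy):

    # swap.py - mitmproxy addon sketch: when the browser requests the target URL,
    # serve the local build instead. TARGET_URL and LOCAL_FILE are placeholders.
    from mitmproxy import http

    TARGET_URL = "https://client.example.com/content/widget.swf"   # remote file to intercept
    LOCAL_FILE = "build/widget.swf"                                 # local replacement

    def response(flow: http.HTTPFlow) -> None:
        if flow.request.pretty_url == TARGET_URL:
            with open(LOCAL_FILE, "rb") as fh:
                flow.response.content = fh.read()   # replace the body with the local bytes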
You might be able to do this with Greasemonkey but I'm not sure if the tests will be totally reliable.
http://diveintogreasemonkey.org/patterns/replace-element.html
And if Greasemonkey seems plain wrong for you, I would take it as the perfect excuse to try out mouseHole. Now, I have to admit that I've never tried it, but since _why also made Hpricot, I expect it to be fun, productive, and different.