Adobe Reader online doesn't read all PDFs? - header

As the title says, I made a script to serve PDF files, but only specific files can be opened: every file last modified up to 29-09-2008 opens fine, while everything newer does not.
Here is my code:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Stienser Omroeper</title>
</head>
<body>
<?php
$file = 'E:/Omrop/'.$_GET['y'].'/'.$_GET['f'];
$filename = $_GET['f'];
header('Content-type: application/pdf');
header('Content-Disposition: inline; filename="' . $filename . '"');
header('Content-Transfer-Encoding: binary');
header('Content-Length: ' . filesize($file));
header('Accept-Ranges: bytes');
#readfile($file);
?>
</body>
</html>
The $_GET contains y (the year, for the directory structure) and f (the filename). If I echo $file and open that path directly on my PC, it works perfectly. In the browser I get the message "This file is broken and can't be repaired."
Any ideas?

This code contains a filesystem traversal vulnerability. You are performing no validation of the arguments that lead to the file. Files on disk are blindly opened and fed to the client.
What if you were on a Unix system? What would happen if someone submitted ?y=&f=../../../etc/passwd?
That doesn't even touch the fact that you aren't doing any sort of sanitization on the user's desired filename for the file. The user could submit entirely bogus data there and get an entirely bogus filename.
This code performs no error checking, and the one call that actually sends the file (readfile) is either commented out entirely or, if your real code uses @readfile, has its errors expressly silenced. This is the root of your problem: nobody has any idea what's going wrong.
So, we can fix this.
First things first, you're going to want to do some validation on y and f. You mentioned that y is a year, so
$year = (int)$_GET['y'];
should do the trick. By forcing it into an integer, you remove any horribleness there.
f is going to be a bit more tricky. You haven't given us an idea about what the files are named. You're going to want to add some pattern matching validation to ensure that only valid filenames are looked for. For example, if all the PDFs are named "report_something_0000.pdf", then you'd want to validate against, say
$file = null;
if (preg_match('/^report_something_\d{4}\.pdf$/', $_GET['f'])) {
    $file = $_GET['f'];
}
Now that we've got a valid filename and a valid year directory, the next step is making sure the file exists.
$path = 'E:/Omrop/' . $year . '/' . $file;
if (!$file || !file_exists($path) || !is_readable($path)) {
    header('HTTP/1.0 404 File Not Found', true, 404);
    header('Content-type: text/html');
    echo "<h1>404 File Not Found</h1>";
    exit;
}
If $file ended up not being set because the pattern match failed, or if the resulting file path wasn't found, then the script will bail with an error message.
I'm going to guess that your problems opening older PDFs are caused by the files not existing or having bad permissions. You're feeding Adobe Reader the right headers and then no data.
You'll also want to perform the same kind of sanity checking on the user-supplied desired filename. Again, I don't know your requirements here, but make sure that nothing bogus can sneak in.
Next, get rid of the # in front of readfile. In PHP, # comments the whole line out, so as posted the file is never sent at all; and if your real code uses @readfile instead, the @ suppresses any actual errors, which you're going to want to see. Because you probably don't want to see them in the output, make sure to set up an error log instead.
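One way to do that, as a rough sketch (the log path is an assumption, pick whatever suits your server):

<?php
// Keep errors out of the PDF output, but record them somewhere you can read.
// The log path below is an assumption; point it anywhere PHP can write.
ini_set('display_errors', '0');
ini_set('log_errors', '1');
ini_set('error_log', 'E:/Omrop/php-errors.log');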
Finally... how is this code even working? You're emitting headers in the middle of HTML! Not only that, you're giving explicit content-lengths while doing so. You should be getting a hell of a lot of "headers already sent" errors from this. Are you sure that you didn't accidentally copy/paste some code wrong here? Maybe you forgot a section at the top where you're calling ob_start()? Regardless, ditch everything before the opening <?php tag, and the HTML after the closing ?> as well.
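Putting it all together, the whole script could end up looking roughly like this sketch (the report_something pattern is still just the placeholder from above, and the E:/Omrop path is taken from your original code):

<?php
// Rough sketch of the repaired script: validate input, 404 on failure,
// then send the PDF headers and the file itself.
$year = isset($_GET['y']) ? (int)$_GET['y'] : 0;

$file = null;
if (isset($_GET['f']) && preg_match('/^report_something_\d{4}\.pdf$/', $_GET['f'])) {
    $file = $_GET['f'];
}

$path = 'E:/Omrop/' . $year . '/' . $file;
if (!$file || !file_exists($path) || !is_readable($path)) {
    header('HTTP/1.0 404 File Not Found', true, 404);
    header('Content-type: text/html');
    echo "<h1>404 File Not Found</h1>";
    exit;
}

header('Content-type: application/pdf');
header('Content-Disposition: inline; filename="' . $file . '"');
header('Content-Length: ' . filesize($path));
readfile($path); // no # or @ in front, so failures reach the error log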

Related

How to access pdf file outside of public_html in joomla site?

Actually I want to edit a module to fetch a PDF file from outside public_html.
I have already tried changing the permissions of the file I want to fetch to 777.
I am trying to fetch the PDF with the following code:
$baseurl = JURI::base();
$outside_baseurl = $baseurl.'../pdf/name.pdf';
It shows this error:
Cannot access file!
https://mysitedomain.com/../pdf/name.pdf
It's really not safe to expose a file outside the scope of your public folder in the open like that; it has the potential to open serious security holes, and a URL such as the one above won't reach a path above the document root anyway. If you are trying to modify or use the PDF for something inside PHP, you can open it by its filesystem path. If you are trying to send it to a user for download or preview, you might want to try fpassthru(). Something like the example below:
<?php
// Filesystem path to the PDF (outside public_html), not a URL
$path = 'path/to/file.pdf';
$public_name = basename($path);

// Detect the MIME type from the file itself
$finfo = finfo_open(FILEINFO_MIME_TYPE);
$mime_type = finfo_file($finfo, $path);
finfo_close($finfo);

// Send the download headers, then stream the file to the client
header("Content-Disposition: attachment; filename=\"$public_name\"");
header("Content-Type: $mime_type");
header('Content-Length: ' . filesize($path));

$fop = fopen($path, 'rb');
fpassthru($fop);
exit;
This should serve your purpose.
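If you want the PDF to preview in the browser rather than download, swap attachment for inline in the Content-Disposition header:
header("Content-Disposition: inline; filename=\"$public_name\"");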

MSDN Microsoft Translator API

I used the PHP script exactly from MSDN (with my own Id) and it works fine (urlencoding my text!). I can hear my text!
So far, I'm happy, but... the script overwrites my own page, leaving just a tab for playing the text.
How can I capture the response in an mp3 file, from within this PHP script?
Hope someone can help me out!
In the meantime I've solved it:
I removed
header('Content-Type: audio/mp3');
and put this in its place:
$mp3 = $translatorObj->curlRequest($url, $authHeader); // raw MP3 bytes from the API
$file = md5($text); // file name derived from the text
$file = "audio/" . $file . ".mp3";
file_put_contents($file, $mp3); // save the audio to disk
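With the MP3 saved to disk, the page no longer has to be replaced by the audio stream; it can simply reference the file instead. A minimal sketch, assuming the audio/ directory is reachable from the browser relative to the current page:
echo '<audio controls src="' . htmlspecialchars($file) . '"></audio>';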

SEO, Google Webmaster Tools - How can I generate a 404 crawl error report for bad URLs that are in the sitemap?

I have an automatically generated sitemap for a large website which contains a number of URLs that are causing 404 errors and which I need to remove. I need to generate a report based only on the URLs that are in the sitemap, not on crawl errors caused by bad links on the site. I cannot see any way of filtering the crawl error reports to include only these URLs. Does anyone know of a way I can achieve this?
Thanks
I'm not sure you can do it easily from Webmaster Tools, but it is trivial to check them all yourself. Here is a Perl program that will accept a sitemap file and check each line, printing each URL along with its HTTP status.
#!/usr/bin/perl
use strict;
use warnings;
require LWP::UserAgent;

my $ua = LWP::UserAgent->new;
while (my $line = <>) {
    if ($line =~ /\<loc\>(.*?)\<\/loc\>/) {
        my $url = $1;
        my $response = $ua->get($url);
        my $status = $response->status_line;
        $status =~ s/ .*//g;    # keep only the numeric status code
        print "$status $url\n";
    }
}
I save it as /tmp/checksitemap.pl and use it like this:
$ /tmp/checksitemap.pl /tmp/sitemap.xml
200 http://example.com/
404 http://example.com/notfound.html
Nothing natively within WMT. You'll want to do some Excel.
Download the list of busted links
Get your list of sitemap links.
Put them side by side.
Use a VLOOKUP to match columns (http://www.techonthenet.com/excel/formulas/vlookup.php)
As a bonus, use some conditional formatting to make it easier to see if they match. Then, sort by colour.
You can also import the sitemap.xml into A1 Website Analyzer and let it scan them. See:
http://www.microsystools.com/products/website-analyzer/help/crawl-website-pages-list/
After that, you can filter the scan results by e.g. 404 response code and export them to CSV if need be (including, if you want, the pages they are linked from).

php using "Content-Disposition:attachment; and Content-type:application/octet stream, gives "/n"

$file_name="test.key"; $key="1111111";
header("Content-disposition: attachment; filename=$file_name");
header("Content-type: application/octet-stream"); echo trim($key);
The above is my piece of code, but when it is called and the downloaded document is opened, I get two new lines before "1111111".
How can I avoid the extra new lines? Is this an issue with my code, or am I using the wrong headers?
The scenario is that I get the key from the database and offer that key as a download.
Thanks in advance
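For what it's worth, a common cause of this symptom (an assumption on my part, nothing in the question confirms it) is stray whitespace or blank lines outside the PHP tags, either before the opening <?php or after a closing ?> in an included file; with output buffering on, those characters end up at the start of the download. A minimal sketch with nothing outside the tags:

<?php
// Nothing, not even a blank line, may appear before this opening tag,
// here or in any included file; stray whitespace would be sent with the key.
$file_name = "test.key";
$key = "1111111";

header("Content-Disposition: attachment; filename=\"$file_name\"");
header("Content-Type: application/octet-stream");
echo trim($key);
// The closing ?> tag is omitted on purpose, so trailing whitespace can't sneak in.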

wget without any headers

I would like to download the files without headers. I have tried many things, like
wget --header="" http://xxxxx.xxxxxx.xx
How can I download files without any headers?
This doesn't quite answer the question, but I got here by looking up "remove default header wget" so I'll put in my 2 cents.
You can remove the User-Agent header with -U "". This was useful for me because the Geometry Dash servers will reject your request if it has a user agent.
Could you assign the output of wget to a string, then use something else to process it to drop headers (or parse them out of the text)?
For example, using bash and grep, you can store the html from a webpage as a string, then use grep to extract the text in the <body> section:
w1=$(wget --quiet --output-document - www.example.com)
echo $w1 | grep --only-matching "<body>.*</body>"
which gives the output below (I have added some newlines to improve how it displays here):
<body> <div>
<h1>Example Domain</h1> <p>
This domain is established to be used for illustrative examples in documents.
You may use this domain in examples without prior coordination or asking for
permission.
</p> <p>
More information...</p>
</div> </body>