wget without any headers

I would like to download files without sending any headers. I have tried several things, like
wget --header="" http://xxxxx.xxxxxx.xx
How can I download files without sending any headers?

This doesn't quite answer the question, but I got here by looking up "remove default header wget" so I'll put in my 2 cents.
You can remove the User-Agent header with -U "". This was useful for me because the Geometry Dash servers will reject your request if it has a user agent.
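For example (the URL here is just a placeholder):
wget -U "" http://example.com/some-file
With an empty value, wget omits the User-Agent header from the request entirely rather than sending an empty one.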

Could you assign the output of wget to a string, then use something else to process it to drop headers (or parse them out of the text)?
For example, using bash and grep, you can store the HTML from a webpage in a string, then use grep to extract the text in the <body> section:
w1=$(wget --quiet --output-document - www.example.com)
echo $w1 | grep --only-matching "<body>.*</body>"
Leaving $w1 unquoted is deliberate here: word splitting collapses the newlines, so the pattern can match across the whole document. This gives the output below (I have added some newlines to improve how it displays here):
<body> <div>
<h1>Example Domain</h1> <p>
This domain is established to be used for illustrative examples in documents.
You may use this domain in examples without prior coordination or asking for
permission.
</p> <p>
More information...</p>
</div> </body>
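A variation on the same idea that skips the intermediate variable (a sketch against the same example URL): pipe wget straight into sed and print the <body> range:
wget --quiet --output-document - www.example.com | sed -n '/<body>/,/<\/body>/p'
This relies on the opening and closing tags sitting on their own lines, which may not hold for arbitrary pages.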

Converting Linux commands to URI/CGI encoded format. A better way?

I am testing some PHP apps for injectable commands. I have to convert my commands to a URI/CGI encoded format. I am wondering if there is a better way to do it.
When I want to include a ping (to test if the app is, in fact, executing from an injection) I am converting it as follows.
hURL -X --esc ";ping localhost -c 1" | sed -e 's/\\x/%/g'
Here is the output.
%3b%20%70%69%6e%67%20%6c%6f%63%61%6c%68%6f%73%74%20%2d%63%20%31
Works perfectly. The code is injected and the logs show it being handled as expected.
QUESTION: Is there a better way to convert to the above? I think I am overcomplicating things.
You could possibly use an out-of-the-box library to do the escaping, which may be a little easier on the eye ...
$ echo ';ping localhost -c 1' | perl -ne 'use URI::Escape; print(uri_escape($_) . "\n");'
%3Bping%20localhost%20-c%201%0A
Obviously this output does not escape legitimate url chars so not sure this entirely answers your question ...
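If the trailing %0A (the newline echo appends) is unwanted, a chomp first removes it; this is a minor variation on the same one-liner:
$ echo ';ping localhost -c 1' | perl -ne 'use URI::Escape; chomp; print(uri_escape($_) . "\n");'
%3Bping%20localhost%20-c%201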

Find and replace multiple line string using SSH

I'm only a light user of SSH and am finding this one a struggle.
I have spammy code littered through my header.php and footer.php files in multiple directories, containing something like the following multi-line string:
<div style="position:absolute;filter:alpha(opacity=0);opacity:0.001;z-index:10;”>
awful spammy shoes or whatever online
blah blah outlet
</div>
I'm looking to find and replace or delete the code from the files.
I'm not 100% sure what Linux tools are available (e.g. perl), but I'm happy to give recommendations a try.
I've come up with an answer with help from a mate.
If there is a standard beginning and a standard ending then sed comes to the rescue.
With an opening string of:
<div style="position:absolute;filter:alpha(opacity=0);opacity:0.001;z-index:10;">
and a closing string of:
</div>
Then the following finds and removes the opening string, closing string and anything contained within:
find . -type f -name "*.php" -exec sed -i '/<div style="position:absolute;filter:alpha(opacity=0);opacity:0.001;z-index:10;">/,/<\/div>/d' {} \;
Tested, works like a charm!
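Since sed -i edits files in place, it may be worth keeping backups the first time you run it. A small variation, assuming GNU sed (the .bak suffix is arbitrary):
find . -type f -name "*.php" -exec sed -i.bak '/<div style="position:absolute;filter:alpha(opacity=0);opacity:0.001;z-index:10;">/,/<\/div>/d' {} \;
Also note that the /start/,/end/ range deletes through the first </div> after the opening string, so it is worth spot-checking a few files to confirm nothing legitimate sits inside that range.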

Split a batch of text files using pattern

I have a directory of almost a thousand HTML files. Each file needs to be split into multiple text files, based on a recurring pattern (a heading). I am on a Windows machine, using GnuWin32 tools.
I've found a way to do this, for a single file:
csplit 1.html -b "%04d.txt" /"Words in heading"/ {*}
But I don't know how to repeat this operation over the entire set of HTML files. This:
csplit *.html -b "%04d.txt" /"Words in heading"/ {*}
doesn't work, and neither does this:
for %i in (*.html) do csplit *.html -b "%04d.txt" /"Words in heading"/ {*}
Both result in an invalid pattern error. Help would be much appreciated!
The options/arguments order is important with csplit. And it won’t accept multiple files. Its help gets you there:
% csplit --help
Usage: csplit [OPTION]... FILE PATTERN...
I’m surprised your first example works for the single file. It really should be changed to:
% csplit -b "%04d.txt" 1.html "/Words in heading/" "{*}"
         ^^^^^^^^^^^^^ ^^^^^^ ^^^^^^^^^^^^^^^^^^^^^^^^^^
         OPTS/ARGS     FILE   PATTERNS
Notice also that I changed your quoting to be around the arguments. You probably also need to quote your last "{*}".
I’m not sure what shell you’re using, but if that for-loop syntax is appropriate, then the fixed command should work in the loop.
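For completeness, that loop might look like this in cmd.exe (a sketch, untested; %~ni expands to the source file's name without its extension, and csplit's -f option gives each file its own output prefix so successive runs don't overwrite each other's 0000.txt files):
for %i in (*.html) do csplit -f "%~ni-" -b "%04d.txt" "%i" "/Words in heading/" "{*}"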

How do I add a heroku deployhook:email using API?

I have been able to add a deploy hook (for email and IRC) using the Heroku CLI. I would like to be able to add or update the email values (recipient, etc.) using the API. Is this possible? When I try a curl request to add a hook, I get an error message noting the need for extra data: recipient, body, etc.
You can find a bit more detail about the different arguments here: https://devcenter.heroku.com/articles/deploy-hooks#email
Once you know what values you want/need you should be able to send it using this:
curl -n -g -X POST 'https://api.heroku.com/apps/MYAPP/addons/deployhooks%3Aemail?config[recipient]=me@example.com&config[subject]="MYAPP%20Deployed"&config[body]="{{user}}%20deployed%20app"'
The arguments:
-n reads credentials from netrc (this should be set up by the toolbelt/CLI)
-g tells curl not to try to interpret the [] and {} in the URL
-X POST makes it a POST rather than a GET request
Beyond that, it was just a matter of encoding the params properly. I believe that of those values, recipient is the only required one (the others have reasonable defaults).
You can only have one deployhook per type and they don't appear to allow updates. So if you need to change it, you'll want to remove the old one and then add another with the updated attributes. You can remove the old one like this:
curl -n -X DELETE 'https://api.heroku.com/apps/MYAPP/addons/deployhooks%3Aemail'
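As an aside, if hand-encoding the parameters gets tedious, curl can do the encoding itself: with -G, each --data-urlencode value is appended to the URL as a query string, and -X POST keeps the request a POST. A sketch of the same create request as above (assuming the endpoint treats this identically to the hand-encoded version):
curl -n -X POST -G 'https://api.heroku.com/apps/MYAPP/addons/deployhooks%3Aemail' \
  --data-urlencode 'config[recipient]=me@example.com' \
  --data-urlencode 'config[subject]=MYAPP Deployed' \
  --data-urlencode 'config[body]={{user}} deployed app'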

Adobe Reader online doesn't read all PDFs?

As the title says, I made a script to serve PDF files, but only specific files can be opened: all files last modified up to 29-09-2008 open fine, while files modified after that date don't.
Here is my code:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Stienser Omroeper</title>
</head>
<body>
<?php
$file = 'E:/Omrop/'.$_GET['y'].'/'.$_GET['f'];
$filename = $_GET['f'];
header('Content-type: application/pdf');
header('Content-Disposition: inline; filename="' . $filename . '"');
header('Content-Transfer-Encoding: binary');
header('Content-Length: ' . filesize($file));
header('Accept-Ranges: bytes');
@readfile($file);
?>
</body>
</html>
The $_GET contains y (the year, for the directory structure) and f (the filename). If I echo $file and open the resulting path directly on my PC, it works perfectly. In the browser I get the message "This file is broken and can't be repaired.".
Any ideas?
This code contains a filesystem traversal vulnerability. You are performing no validation of the arguments that lead to the file. Files on disk are blindly opened and fed to the client.
What if you were on a Unix system? What would happen if someone submitted ?y=&f=../../../etc/passwd?
That doesn't even touch the fact that you aren't doing any sort of sanitization on the user's desired filename for the file. The user could submit entirely bogus data there and get an entirely bogus filename.
This code performs no error checking, and even expressly turns errors off when throwing the file at the user using readfile. This is the root of your problem. Nobody has any idea what's going wrong.
So, we can fix this.
First things first, you're going to want to do some validation on y and f. You mentioned that y is a year, so
$year = (int)$_GET['y'];
should do the trick. By forcing it into an integer, you remove any horribleness there.
f is going to be a bit more tricky. You haven't given us an idea about what the files are named. You're going to want to add some pattern matching validation to ensure that only valid filenames are looked for. For example, if all the PDFs are named "report_something_0000.pdf", then you'd want to validate against, say
$file = null;
if (preg_match('/^report_something_\d{4}\.pdf$/', $_GET['f'])) {
    $file = $_GET['f'];
}
Now that we've got a valid filename and a valid year directory, the next step is making sure the file exists.
$path = 'E:/Omrop/' . $year . '/' . $file;
if (!$file || !file_exists($path) || !is_readable($path)) {
    header('HTTP/1.0 404 File Not Found', true, 404);
    header('Content-type: text/html');
    echo "<h1>404 File Not Found</h1>";
    exit;
}
If $file ended up not being set because the pattern match failed, or if the resulting file path wasn't found, then the script will bail with an error message.
I'm going to guess that your problems opening the newer PDFs are caused by the files not existing at the expected path or having bad permissions. You're feeding Adobe Reader the right headers and then no data.
You'll also want to perform the same kind of sanity checking on the user-supplied desired filename. Again, I don't know your requirements here, but make sure that nothing bogus can sneak in.
Next, get rid of the @ in front of readfile. It's suppressing any actual errors, and you're going to want to see them. Because you probably don't want to see them in the output, make sure to set up an error log instead.
Finally... how is this code even working? You're emitting headers in the middle of HTML! Not only that, you're giving explicit content-lengths while doing so. You should be getting a hell of a lot of errors from this. Are you sure that you didn't accidentally copy/paste some code wrong here? Maybe you forgot a section at the top where you're calling ob_start()? Regardless, ditch everything before the opening <?php tag.
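Putting those pieces together, a minimal sketch of the corrected script (still assuming the hypothetical report_something_0000.pdf naming from above) could look like this:
<?php
// Validate the year: casting to int strips anything dangerous.
$year = (int)$_GET['y'];

// Validate the filename against the expected (hypothetical) naming pattern.
$file = null;
if (preg_match('/^report_something_\d{4}\.pdf$/', $_GET['f'])) {
    $file = $_GET['f'];
}

$path = 'E:/Omrop/' . $year . '/' . $file;
if (!$file || !file_exists($path) || !is_readable($path)) {
    header('HTTP/1.0 404 File Not Found', true, 404);
    header('Content-type: text/html');
    echo "<h1>404 File Not Found</h1>";
    exit;
}

// No HTML around this, no @ suppression: just the headers and the bytes.
header('Content-type: application/pdf');
header('Content-Disposition: inline; filename="' . $file . '"');
header('Content-Length: ' . filesize($path));
readfile($path);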