I'm learning to become a security researcher so my question is solely to understand/learn what happens and not for doing bad stuff.
I'm currently looking at file upload. In this specific case the upload is an HTTP POST request whose Content-Type header is set to the actual MIME type of the file (e.g. image/png) rather than multipart/form-data. The HTTP request body contains the raw file content without any encoding, at least as far as I can see (when I save the raw HTTP request body to a file and compare it with the original file, they are identical).
I would like to modify the HTTP request body to include the content of another file. The question is how can I obtain the actual content of this second file and replace the HTTP request body with this in a correct way (formatting, encoding,...).
I've already tried many things (cat + copy/paste, a text editor, xclip and alternatives, piping, etc.), but for some reason the content always changes (formatting, encoding, etc.) and the request is rejected by the server as invalid. This happens even if I replace the body with the content of the same file: when I save the raw request body and compare, the two are different.
I know my issue comes down to a lack of basic knowledge about file content and file handling, so I've already spent hours searching the web, but it seems I'm missing the correct terminology to find useful resources.
So, more than just a solution for this specific use case, I would like to understand what is happening in the process. Please feel free to point me to any resources that could help me understand the underlying concepts better.
I really appreciate any help, info. Many thanks in advance.
Best regards,
I have collected all the requests made by a set of websites, with the aim of identifying the third parties each website contacts. I used Selenium and WebDriver to do that.
These requests can be made by the JavaScript present in the source code of the website or can be dynamically called by the web-page from the advertisements or can be initiated by Google or DoubleClick or Facebook. These requests help to track the data that is being shared by these websites with or without the user consent.
You can see an example of the requests when the browser wants to load this website: www.focuscamera.com/ in this excel file:
https://drive.google.com/file/d/16wNA0dFUehrjPww31TAIj8GZUZ05LsIU/view?usp=sharing
My questions are:
1- Which HTTP header fields can I use for my analysis if I intend to gather information about third parties? My goal is to distinguish and differentiate third-party behavior.
For example, the Content-Length field in a request indicates the size of the entity body. Does a request with a higher Content-Length mean that the third party received and collected more data?
2- What exactly does Content-Length indicate? What exactly does the "HTTP request body data" contain?
3- Are there any other HTTP header fields I can use to distinguish and differentiate third-party behavior? (A list of the fields I collect can be found in Sheet1 of the Excel file I shared above.)
4- Is there any other information on the internet that I can use for this purpose? For example, I use cookiepedia.co.uk to find out what kind of services third parties provide: functionality, performance, or targeting/advertising.
It sounds like you may be reinventing the wheel here. Take a look at https://webbkoll.dataskydd.net; it provides lots of security and privacy analysis for any site you like. You can also generate nice visual request maps using https://requestmap.webperf.tools.
Try using that tool on sites like wired.com and forbes.com to see how spectacularly bad it can get!
To answer your questions specifically:
1- Headers are not massively useful, as they are present in every request (it's the request itself that's more interesting), but the important ones from a privacy perspective are Referer (sent with the request) and Set-Cookie (sent with the response). Content-Length does indeed tell you how big the request body is. A GET request normally has no body, so the header is usually omitted there. Large POST requests indicate that more data is being transmitted, but that may be down to inefficiency rather than anything else.
2- Content-Length gives the length, in bytes, of the data in the body of a request such as a POST. An HTTP request body can contain any kind of data: text, images, video, audio, formatted data.
3- There are some, but most headers are functional rather than semantic, concerned with making the request actually work. It's more interesting that requests happen at all than what they contain.
4- You can't necessarily tell what kind of service a third party is providing from the requests themselves, but the domains they are going to are more interesting. For example, anything going to doubleclick.com is going to be ad and tracking related because of what that domain is known to be used for (Webbkoll cites these as "known trackers"), so you're correct that sites like cookiepedia can help you find out what a particular service does. The divisions between functional/performance/profiling are mostly made up by ad companies to excuse their behaviour. You can't tell what they are using data for, only whether they are receiving data and what data they are receiving (because you can see what's in the requests using browser developer tools). To clarify: a site could receive your full name and address but do absolutely nothing with it, and you can't tell that from looking at the data that's sent. In privacy terms, it's always best to assume the worst (because ad companies absolutely cannot be trusted!), so if they are receiving data, assume it will be abused.
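As a rough illustration of the points above (the destination domain matters most, and Content-Length is just a byte count), here is a minimal Python sketch that groups captured requests by host and totals the body bytes sent to each third party. The capture format (a list of URL/body pairs), the example hosts other than focuscamera.com, and the naive "same host or subdomain" first-party test are all assumptions made for the example; a real analysis would want a public-suffix-aware comparison.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def summarize_third_parties(first_party, requests):
    """Count requests and request-body bytes per third-party host.

    `requests` is an iterable of (url, body) pairs, where body is bytes.
    """
    summary = defaultdict(lambda: {"requests": 0, "body_bytes": 0})
    for url, body in requests:
        host = urlsplit(url).hostname or ""
        # Naive first-party test (assumption): same host or a subdomain of it.
        if host == first_party or host.endswith("." + first_party):
            continue
        summary[host]["requests"] += 1
        # Content-Length counts bytes, not characters.
        summary[host]["body_bytes"] += len(body)
    return dict(summary)


# Hypothetical capture; the third-party hosts are placeholders.
captured = [
    ("https://www.focuscamera.com/", b""),
    ("https://ads.example.net/pixel?id=1", b""),
    ("https://ads.example.net/collect", "café=1".encode("utf-8")),
    ("https://tracker.example.org/event", b'{"page": "/"}'),
]

for host, stats in summarize_third_parties("focuscamera.com", captured).items():
    print(host, stats)
```

A host that receives many requests or unusually large bodies is worth a closer look, but as noted above, size alone does not say what the data is used for.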
My index page is tri-lingual... in this scenario, W3 informs us that the original 'ID solution' was dropped, without a replacement......
W3 does suggest the use of HTTP headers, but fails to explain how this is accomplished.
Can stackoverflow solve this problem?
Background
W3 suggests that this code is not good/should not be used:
<meta http-equiv="Content-Language" content="de, fr, en">
However, they then say that there is nothing to replace it:
One implication of HTML5 dropping the meta element for declaring
language is that there is now no obvious way to provide metadata about
the document inside the document itself.
That's a painful statement, but... they then go on to suggest that "content-language" should be specified in a HTTP header.
This information is associated with a particular page by settings on
the server or by server-side scripting.
Fantastic... they even show a typical example... great!
HTTP/1.1·200·OK
Date:·Sat,·23·Jul·2011·07:28:50·GMT
Server:·Apache/2
Content-Location:·qa-http-and-lang.en.php
Vary:·negotiate,accept-language,Accept-Encoding
TCN:·choice
P3P:·policyref="http://www.w3.org/2001/05/P3P/p3p.xml"
Connection:·close
Transfer-Encoding:·chunked
Content-Type:·text/html; charset=utf-8
Content-Language:·en
But where is this file... and why is this character "·" used?
Why not use comma separated en, fr, de ?
Rant (after hours of researching):
If website programmers are advised not to use in-doc programming, it would be better if we were told exactly how to edit the HTTP header for any given page.
Therefore the question is simple:
Using cPanel, or FileZilla (and perhaps Notepad++)... how do I modify the HTTP header for index.html to show that it contains English, French and German?
Note: I am currently using the bad code PLUS 'lang tags' eg:
<li lang="fr">
I'm trying to do what is right, but after looking on 'HTTP header help-sites', I never once found a statement re:
Exact file location
Filename and extension
Can anybody help solve this mystery?
In case you didn't manage to find this: HTTP headers are what you are after, as they describe the language(s) you expect your target audience to use, and there can be more than one language. HTTP headers are set on your web server and, when configured as below, apply to every page in your website.
If you are using Apache, find the .htaccess file and add something along the lines of the following (this requires mod_headers to be enabled):
Header set Content-Language "en"
If you are using IIS then:
navigate to your website in the IIS GUI
double-click 'Http Response headers'
click 'Add...'
the name is Content-Language; the value is the language you want to use (for example, en for any kind of English). Use commas to separate multiple languages
Click OK
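To see what "setting the header on the server" amounts to, here is a minimal, self-contained Python sketch. It is not the Apache or IIS mechanism described above, only an illustration: a throwaway local server sends Content-Language with a comma-separated list of languages, and a client reads it back. The handler name, page body and port choice are assumptions made for the example.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

LANGUAGES = "en, fr, de"  # comma-separated language tags for a tri-lingual page


class TrilingualHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"<!DOCTYPE html><title>demo</title><p>Hello / Bonjour / Hallo</p>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Language", LANGUAGES)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_content_language():
    # Port 0 asks the operating system for any free port.
    server = HTTPServer(("127.0.0.1", 0), TrilingualHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection(
            "127.0.0.1", server.server_address[1], timeout=5
        )
        conn.request("GET", "/index.html")
        response = conn.getresponse()
        response.read()
        value = response.getheader("Content-Language")
        conn.close()
        return value
    finally:
        server.shutdown()
        server.server_close()


print(fetch_content_language())
```

The same client half (request the page, read the Content-Language header) can be pointed at a real site to check whether an .htaccess or IIS change has taken effect.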
I got most of my info from here:
https://www.w3.org/International/questions/qa-html-language-declarations#metadata
Here is a list of the subtags you can use:
http://www.iana.org/assignments/language-subtag-registry/language-subtag-registry
thickguru supplied the .htaccess solution above, many thanks, his answer is here:
Language not declared Ideally
We have a website that requires a username and password. Once logged in, the user can select a link to a PDF in the web browser. Once this has happened they are able to see the full URL path of the PDF, they could copy and paste the path into a different browser without logging in, or send the address to someone else to look at.
I am asking this for a co-worker so I am not too sure on what is needed, but they want to change it from say "documents/customerlist.pdf" to "documents/info.asp" (not sure what the file type should be, maybe just "documents/info"?) I think that is what the goal is. Is this possible? If someone could point me in the right direction we might be able to figure it out!
I should think you can do this in ASP. You'll need to deliver the PDF dynamically via an ASP page, which detects the user's session and only serves the data if they are suitably authenticated (so copying the URL to a different browser/machine will result in a 404 or access denied, as you wish). You'll need to read the data from file and binary-write it to the browser, and set HTTP headers for mime-type, content length etc.
I'd start off with serving it on a pdf.asp?file=customerlist URL, but you can later experiment with changing this to something more readable (docs/customerlist.php). You'll need to look into URL rewriting here.
So, that's the general approach. If you do a web-search around these topics ("ASP serve binary file", "ASP URL rewriting") you are sure to get plenty of examples.
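The approach is the same in any server-side language. As a language-neutral illustration (Python rather than ASP, and not tied to any web framework), here is a minimal sketch of the gatekeeping logic: check the session, read the file in binary mode, and return the status, headers and bytes to send. The session dict, its "authenticated" key and the function name are hypothetical.

```python
import tempfile
from pathlib import Path


def serve_document(session, name, docs_dir):
    """Return (status, headers, body) for a document request.

    `session` is a hypothetical dict describing the logged-in user;
    `name` is the value of the ?file= parameter, without extension.
    """
    if not session.get("authenticated"):
        return 403, {"Content-Type": "text/plain"}, b"Access denied"

    # Accept plain names only, so the parameter cannot point outside docs_dir.
    path = Path(docs_dir) / f"{name}.pdf"
    if not name.isalnum() or not path.is_file():
        return 404, {"Content-Type": "text/plain"}, b"Not found"

    data = path.read_bytes()  # binary read: the bytes are sent unchanged
    headers = {
        "Content-Type": "application/pdf",
        "Content-Length": str(len(data)),
    }
    return 200, headers, data


# Demo with a throwaway directory standing in for the protected folder.
with tempfile.TemporaryDirectory() as docs:
    Path(docs, "customerlist.pdf").write_bytes(b"%PDF-1.4 demo")
    print(serve_document({"authenticated": True}, "customerlist", docs)[0])
    print(serve_document({}, "customerlist", docs)[0])
```

For this to help, the PDF files themselves must live somewhere the web server does not serve directly; otherwise the original URL still works without logging in.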
Is it necessary to specify the Content-Type in the HTTP header while uploading a file? I tried this using C#. I set it to "image/png" while uploading a PDF file, and when I downloaded the uploaded file, the PDF was perfect; it didn't get corrupted.
So what is the role of specifying Content-Type in the HTTP header?
Can it be null or some other wrong value?
I ask because in the application I am making, the user will just supply the file and I just need to upload it.
Any help is highly appreciated. Thanks in advance.
Weird that nobody answered this question.
You should always set the Content-Type; some software (servers) may break when it's omitted.
You could give it any value that the target system can handle.
Since you can easily fake an HTTP request, you can also fake the headers.
So your target system (in this case an upload processor) should only accept Content-Type values it's able to handle, and you MUST validate that the given Content-Type actually matches the data that is sent in the body (the uploaded file itself).
You can never trust the Content-Type value until you validate it somehow.
As a PHP developer I always check any file upload against a MIME-type validator to be sure I got what I expected. For example, I use getimagesize() to detect whether it's an image and, if it is, to get its file format (PNG).
Your upload succeeded because the body is transferred byte for byte whatever the header says; the declared Content-Type does not alter the data.
The mismatch went unnoticed either because you coded it that way, or because the target system falls back on default settings or does some checking of its own.
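To illustrate the kind of validation described in this answer (in Python here, not the PHP getimagesize() call mentioned above), a minimal sketch can compare the declared Content-Type with the file's leading "magic" bytes. Only PNG and PDF are covered, and the function names are made up for the example; a real upload processor would likely use a fuller detection library.

```python
# Leading byte signatures for the two formats discussed in this thread.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"%PDF-": "application/pdf",
}


def sniff_mime_type(data: bytes):
    """Guess the MIME type from the first bytes, or return None if unknown."""
    for magic, mime in SIGNATURES.items():
        if data.startswith(magic):
            return mime
    return None


def accept_upload(declared_type: str, data: bytes) -> bool:
    """Accept only if the declared Content-Type matches the detected type."""
    declared = declared_type.split(";", 1)[0].strip().lower()
    detected = sniff_mime_type(data)
    return detected is not None and detected == declared


pdf_bytes = b"%PDF-1.7\n..."
print(accept_upload("image/png", pdf_bytes))        # a PDF declared as PNG
print(accept_upload("application/pdf", pdf_bytes))  # a PDF declared as PDF
```

With a check like this in place, the PDF uploaded as "image/png" in the question would be rejected instead of silently stored.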
I am trying to do the following -
1) Access POST submission data in Apache
2) Create a file name by using some of the parameters in POST data from 1)
I would then like to read the file from disk instead of using the back-end application to regenerate the page.
Any help will be greatly appreciated.
Your methodology is fundamentally flawed.
A POST should only be used where you want to change the state of the server. If you're serving up a cached file instead, then you're not changing the state.
If the content generated after the POST can be cached, then the way to implement this would be to return a redirect (302/303/307) from the POST target, pointing to the page which should be rendered (and can be cached). Note that 303 is the status that tells the client to fetch the target with GET, whereas 307 makes it repeat the POST at the new location.
C.
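A minimal sketch of the redirect approach described in this answer, in Python rather than Apache configuration: the POST handler derives a file name from the submitted parameters and answers with a 303 redirect to that cacheable page. The parameter names ("category", "item") and the /cache/ path layout are assumptions made for the example.

```python
from urllib.parse import parse_qs, quote


def handle_post(body: bytes):
    """Return (status, headers) redirecting the client to a cacheable page."""
    params = parse_qs(body.decode("utf-8"))
    # Hypothetical parameter names; fall back to "unknown" when missing.
    category = params.get("category", ["unknown"])[0]
    item = params.get("item", ["unknown"])[0]
    # Percent-encode the values so they cannot add extra path segments.
    location = f"/cache/{quote(category, safe='')}-{quote(item, safe='')}.html"
    # 303 See Other: the client follows up with a GET to `location`.
    return 303, {"Location": location}


print(handle_post(b"category=lenses&item=42"))
```

The follow-up GET for the /cache/ page can then be served straight from disk or from a cache, without involving the back-end application.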