Restrict unauthenticated access to files with mod_rewrite and scripting language - apache

I have searched online for answers, but none seem similar to what I am trying to achieve, so I hope the gurus at Stack Overflow can help me out.
What is it that I am trying to accomplish?
I want to restrict access to content for non-authorized users. Accessible content to non-authorized users will be specified in a white list. All other content is blacklisted.
What is my environment?
I am running Apache in conjunction with a scripting language very similar to PHP. Few people will know it, but it is Fazzt (in case you do know it and can infer how it differs from PHP: there are no pointers/memory management, decimal values, or binary data). I have to use this environment due to the nature of the project.
What is happening on the site?
The site authenticates users and stores the authentication state in sessions. An unauthenticated user is presented with a styled webpage (containing images, CSS, JS, etc.). Hence, I need to white-list all of the static images, CSS, and JS files so that the client browser can download them. Once signed in, the user is presented with a broader range of dynamic content (as such, anything that is not white-listed is automatically black-listed).
How did I plan to solve the problem?
This is silly, but I guess the obvious is not always seen. My approach involved using mod_rewrite to send all requests for existing files that are not .fzt or .fsp pages to a script that checks the requested file against the white list. If the file is in the list, the request would be routed directly to the file (yes, silly me... it would get mod_rewritten again >_<). If it is not in the list, the user's authentication would be checked: an unauthenticated user would get an HTTP 404 "File not found", while an authenticated user's request would be redirected to the file and served (same folly).
As you can see, the approach is greatly flawed. However, I am sure something of this nature should be possible... yet I have not found any proof of it. What do you think? Is mod_rewrite plus a script completely the wrong way to perform this task? How would you do it otherwise? Note that I cannot simply slap on an .htaccess file, because access is determined by user authentication that is tracked by Fazzt (read above: a scripting language similar to PHP).
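Because Fazzt is not widely known, here is a minimal Python sketch of the gatekeeper script described above. The white-list patterns, the function name `gatekeep` and the document-root layout are all hypothetical. One way around the double rewrite is to have the script read the file and return its bytes itself, instead of redirecting to it. That way the rewrite rule never sees a second request for the same file:

```python
import fnmatch
import mimetypes
import posixpath
from pathlib import Path

# Hypothetical white list of static assets that anonymous visitors may fetch.
PUBLIC_PATTERNS = ["/css/*", "/js/*", "/images/*", "/favicon.ico"]


def gatekeep(path: str, authenticated: bool, doc_root: Path):
    """Decide the response for a static-file request routed here by mod_rewrite.

    Returns (status, content_type, body). The file is read and returned here
    instead of redirecting to it, so the rewrite rule is not triggered again.
    """
    # Normalize first so "/css/../private/x" cannot pass as a /css/* file.
    clean = posixpath.normpath("/" + path.lstrip("/"))
    allowed = any(fnmatch.fnmatchcase(clean, p) for p in PUBLIC_PATTERNS)
    if not (allowed or authenticated):
        return 404, "text/plain", b"File not found"

    root = doc_root.resolve()
    target = (root / clean.lstrip("/")).resolve()
    # Refuse anything that resolves outside the document root or is missing.
    if root not in target.parents or not target.is_file():
        return 404, "text/plain", b"File not found"

    ctype = mimetypes.guess_type(target.name)[0] or "application/octet-stream"
    return 200, ctype, target.read_bytes()
```

In Fazzt, the same checks would live in the script that the rewrite rule points to. The rule only has to keep excluding .fzt and .fsp requests so the script is never rewritten to itself.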
Any suggestions or thoughts would be greatly appreciated!

Related

How Safe is an Obscure File Download Link?

Here's what I'm trying to do:
I want to distribute my vCard (.vcf) file by hosting it on my personal website (this part is a rigid requirement). People will access it from a QR code on my business card; however, no links to the file will exist on my webpages.
I want to make the file publicly accessible, while ensuring that it doesn't get scraped by a bot. It will be contained in a folder disallowed from "normal" bots via robots.txt, and I will disable directory listings in Apache.
I do NOT want to introduce additional steps such as captchas or authentication.
My idea is something like how Google Drive does public sharing: a 44-character random string that represents the file. So...
http://mywebsite.com/private/34599771831821330576336168849178778047996955.vcf
My questions are:
1) How safe is this? Presumably, as long as I disable directory listing in Apache, the only way a bot can stumble on the file without a direct link is by random guessing. Do bots really bother trying to do such a thing?
2) If it's safe, presumably string length is key. Just how long does the string need to be to make it "safe"?
3) Is there a better way to do this than filename obscurity?
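Here is some arithmetic for question 2. It assumes the name is produced by a cryptographically secure random generator; this sketch uses Python's `secrets` module, and `token_bits` and `make_token` are hypothetical helpers. A name of length L drawn uniformly from an alphabet of A symbols carries L x log2(A) bits of entropy. The 44-digit example therefore has about 146 bits, and an alphabet of letters plus digits reaches similar strength with fewer characters:

```python
import math
import secrets
import string


def token_bits(length: int, alphabet_size: int) -> float:
    """Entropy in bits of a token whose characters are chosen uniformly at random."""
    return length * math.log2(alphabet_size)


def make_token(length: int = 44, alphabet: str = string.digits) -> str:
    """Random file name from a cryptographically secure source (not the random module)."""
    return "".join(secrets.choice(alphabet) for _ in range(length))


# The 44-digit name from the question.
print(round(token_bits(44, 10), 1))    # 146.2
# Characters needed for about the same strength with [A-Za-z0-9] (62 symbols).
print(math.ceil(146 / math.log2(62)))  # 25
```

The calculation only holds if every character really is random. A name derived from a timestamp or a counter is far easier to guess than its length suggests.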
Yes, there is a better way. It is called reCAPTCHA.
The idea is to present the user with the captcha and, if he/she/it solves it correctly, proceed to the download.
https://www.google.com/recaptcha/intro/index.html

Hiding/changing the virtual path in classic ASP

We have a website that requires a username and password. Once logged in, the user can select a link to a PDF in the web browser. When they do, they can see the full URL path of the PDF. They could then copy and paste that path into a different browser without logging in, or send the address to someone else to look at.
I am asking this for a co-worker, so I am not too sure what is needed, but I think the goal is to change it from, say, "documents/customerlist.pdf" to "documents/info.asp" (not sure what the file type should be; maybe just "documents/info"?). Is this possible? If someone could point me in the right direction, we might be able to figure it out!
I should think you can do this in ASP. You'll need to deliver the PDF dynamically via an ASP page, which detects the user's session and only serves the data if they are suitably authenticated (so copying the URL to a different browser/machine will result in a 404 or access denied, as you wish). You'll need to read the data from file and binary-write it to the browser, and set HTTP headers for mime-type, content length etc.
I'd start off by serving it from a pdf.asp?file=customerlist URL, but you can later experiment with changing this to something more readable (docs/customerlist). You'll need to look into URL rewriting here.
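Here is a minimal sketch of that flow, written in Python rather than ASP. The session key `user`, the `SAFE_NAME` rule and the function `serve_pdf` are hypothetical. It assumes the PDFs are moved out of the public documents/ folder into a directory the web server does not serve directly; otherwise the old URL keeps working:

```python
import re
from pathlib import Path

# Hypothetical naming rule: plain names such as "customerlist" only, which also
# keeps values like "../web.config" from reaching the file system.
SAFE_NAME = re.compile(r"[A-Za-z0-9_-]+")


def serve_pdf(session: dict, name: str, docs_dir: Path):
    """Return (status, headers, body) for a pdf.asp?file=<name> style request."""
    text = {"Content-Type": "text/plain"}
    if not session.get("user"):              # hypothetical session key
        return 403, text, b"Access denied"
    if not SAFE_NAME.fullmatch(name):
        return 404, text, b"Not found"
    path = docs_dir / f"{name}.pdf"
    if not path.is_file():
        return 404, text, b"Not found"
    data = path.read_bytes()                 # read the data from file,
    headers = {
        "Content-Type": "application/pdf",   # set the MIME type
        "Content-Length": str(len(data)),    # and the content length,
        "Content-Disposition": f'inline; filename="{name}.pdf"',
    }
    return 200, headers, data                # then write the bytes out
```

In classic ASP, the last step corresponds to setting Response.ContentType and calling Response.BinaryWrite with the file's bytes.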
So, that's the general approach. If you do a web-search around these topics ("ASP serve binary file", "ASP URL rewriting") you are sure to get plenty of examples.

remote image embeds: how to handle ones that require authentication?

I manage a large and active forum, and we're being plagued by a very serious problem. We allow users to embed remote images, much like Stack Overflow does with imgur images. However, we don't restrict them to a specific set of hosts, so images can be embedded from any host with the following code:
[img]http://randomsource.org/image.png[/img]
This works fine and dandy, except that users can embed an image that requires authentication. The image causes an authentication pop-up to appear. Because the text of that pop-up can be customized, they put something like "please enter your [sitename] username and password here", and unfortunately our users have been falling for it.
What is the correct response to this? I have been considering the following:
1) Each page load has a piece of JavaScript execute that checks each image on the page and its status
2) Have an authorised list of image hosts
3) Disable remote embedding completely
The problem is that I've NEVER seen this happen anywhere else, yet we're plagued with it. How do we prevent this?
It's more than the password problem. You are also allowing some of your users to carry out CSRF attacks against other users. For example, a user can set up his profile image as [img]http://my-active-forum.com/some-dangerous-operation?with-some-parameters[/img].
The best solution is to:
Download the image server side and store it on the file system/database. Keep a reasonable maximum file size; otherwise an attacker can make your server download gigabytes of data, hogging network and disk resources.
Optionally, verify the file is actually an image
Serve the image using a throwaway domain or IP address. It is possible to create images that masquerade as a JAR or applet; serving all files from a throwaway domain protects you from such malicious activity.
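Here is a minimal Python sketch of the first two steps. The 2 MB cap, the PNG/JPEG/GIF signature list and the function names are assumptions. Because the fetch happens on the server, an image that answers 401 simply fails to download instead of showing a login pop-up to your users:

```python
import urllib.request

MAX_BYTES = 2 * 1024 * 1024  # hypothetical 2 MB limit

# Leading bytes ("magic numbers") of common image formats.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
}


def read_capped(stream, max_bytes: int = MAX_BYTES) -> bytes:
    """Read at most max_bytes + 1 bytes and refuse anything larger than max_bytes."""
    data = stream.read(max_bytes + 1)
    if len(data) > max_bytes:
        raise ValueError("image exceeds size limit")
    return data


def sniff_image(data: bytes):
    """Return the format name if data starts with a known image signature, else None."""
    for magic, kind in SIGNATURES.items():
        if data.startswith(magic):
            return kind
    return None


def fetch_remote_image(url: str, timeout: float = 10.0):
    """Download an embedded image once, on the server (hypothetical helper).

    A 401 or any other HTTP error raises urllib.error.HTTPError, so images
    behind authentication are never stored. The caller saves the returned
    bytes and serves them from a separate throwaway domain.
    """
    if not url.startswith(("http://", "https://")):
        raise ValueError("only http(s) URLs are accepted")
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        data = read_capped(resp)
    kind = sniff_image(data)
    if kind is None:
        raise ValueError("not a PNG, JPEG or GIF image")
    return kind, data
```

Checking the leading bytes is a cheap first filter, not a full validation. Re-encoding the image with an imaging library would be a stricter option.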
If you cannot download the images server side, create a white list of allowed URL patterns (not just domains) on the server side. Discard any URLs that don't match these patterns.
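Here is a sketch of such a server-side pattern check, assuming Python regular expressions. The imgur pattern and the name `is_allowed_image_url` are hypothetical examples. Matching the whole URL, not just the host, means a different path or a query string on an allowed host is rejected:

```python
import re

# Hypothetical allow-list: each entry must match the entire URL.
ALLOWED_URL_PATTERNS = [
    re.compile(r"https://i\.imgur\.com/[A-Za-z0-9]+\.(?:png|jpe?g|gif)"),
]


def is_allowed_image_url(url: str) -> bool:
    """Accept the URL only if some pattern matches it completely (fullmatch)."""
    return any(p.fullmatch(url) for p in ALLOWED_URL_PATTERNS)
```

Using `fullmatch` instead of `match` or `search` matters here. A prefix match would accept "https://i.imgur.com/abc.png?anything" and similar variants.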
You MUST NOT perform any checks in JavaScript. Performing checks in JS solves your immediate problem but does not protect you from CSRF. You are still making a request to an attacker-controlled URL from your users' browsers, and that is risky. Besides, the performance impact of that approach is prohibitive.
I think you mostly answered your own question. Personally, I would have gone for a mix of option 1 and option 2: create client-side JavaScript that first checks image embed URLs against a set of white-listed hosts. For each embedded URL that is not in that list, send a HEAD request and check that the server does not return the 401 status code.
This way there is a balance between latency (we attempt to minimize duplicate requests via the HEAD method and domain whitelists) and security.
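The decision logic of that mixed approach can be sketched in Python. The answer itself runs it as client-side JavaScript, and `head_status`, `should_embed` and the trusted-host set are hypothetical names. The probe is passed in as a parameter so the logic can be tried without a network. Note that a remote server can answer HEAD and GET differently, so this narrows the risk rather than removing it:

```python
import urllib.error
import urllib.request
from urllib.parse import urlsplit


def head_status(url: str, timeout: float = 5.0) -> int:
    """Status code of a HEAD request (urllib raises HTTPError for 4xx/5xx).

    Network failures (urllib.error.URLError) are left for the caller to handle.
    """
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


def should_embed(url: str, trusted_hosts: set, probe=head_status) -> bool:
    """Trusted hosts skip the probe; other URLs are embedded only if they do
    not answer 401, the status that makes the browser show a login pop-up."""
    host = urlsplit(url).hostname or ""
    if host in trusted_hosts:
        return True
    return probe(url) != 401
```

The host white list is checked first so that common hosts cost no extra request. This mirrors the latency argument in the answer.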
Having said that, option 2 is the safest one, if your users can accept it.

How do I implement a secure upload/download area?

I've been asked to create a solution where people log in and are able to upload files to, and download files from, our work server. For example, John uploads a photo and Jen can download it. They also have to authenticate themselves.
Can someone give me a rough overview of how to implement this? I'm familiar enough with MySQL, C#, and JavaScript.
The rough overview
This should just be a matter of planning out the pieces.
At the very top of the page, put some code that checks whether the user is logged in. If they are, show the rest of the page. If not, show a login form (or redirect to one), check the credentials once the form is submitted, and on success set a SESSION cookie or something similar.
Once the user is logged in, the homepage might have a file-upload form and a listing of existing files. How you style it would depend on how many files you expect to have. To keep things extremely simple, you could simply iterate through whatever files are in the upload directory; if you expect many files, you may consider using a database.
Handle a file upload by sanitizing the filename (checking the file type/size if you want to limit those) and putting the file into the directory.
Force users to download the files (instead of letting the browser decide what to do with them) for security purposes. Applying this only to certain file types may also be acceptable.
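Here is a minimal Python sketch of the upload and download rules above. The extension list, the size limit and the function names are hypothetical, and the same checks translate directly to C#. On download, the Content-Disposition: attachment header makes the browser save the file instead of deciding what to do with it:

```python
import os
import re

ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".pdf", ".txt"}  # hypothetical policy
MAX_UPLOAD_BYTES = 20 * 1024 * 1024                              # hypothetical 20 MB


def sanitize_filename(raw: str) -> str:
    """Drop any client-supplied directory part and unusual characters."""
    name = os.path.basename(raw.replace("\\", "/"))   # "..\\..\\x.png" -> "x.png"
    name = re.sub(r"[^A-Za-z0-9._-]", "_", name).lstrip(".")
    if not name:
        raise ValueError("empty filename")
    return name


def check_upload(raw_name: str, size: int) -> str:
    """Return the safe name to store under, or raise ValueError."""
    name = sanitize_filename(raw_name)
    ext = os.path.splitext(name)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"file type {ext or '(none)'} not allowed")
    if size > MAX_UPLOAD_BYTES:
        raise ValueError("file too large")
    return name


def download_headers(name: str) -> dict:
    """Headers asking the browser to save the file rather than render or run it."""
    return {
        "Content-Type": "application/octet-stream",
        "Content-Disposition": f'attachment; filename="{name}"',
        "X-Content-Type-Options": "nosniff",
    }
```

Two uploads with the same name would still overwrite each other here. Storing files under a generated ID, with the original name kept in the database, avoids that.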
Other thoughts
You probably would not want users to be able to execute any files, so keeping the file directory hidden would be a good idea.
Keeping track of who uploaded and downloaded what is also doable, but would add another layer of complication to the script.

Figure out if a website has restricted/password protected area

I have a big list of websites and I need to know if they have areas that are password protected.
I am thinking about doing this: downloading all of them with httrack and then writing a script that looks for keywords like "Log In" and "401 Unauthorized". But the problem is that these websites differ (some are static and some dynamic: HTML, CGI, PHP, Java applets...), and most of them won't use the same keywords...
Do you have any better ideas?
Thanks a lot!
Looking for password fields will get you so far, but won't help with sites that use HTTP authentication. Looking for 401s will help with HTTP authentication, but won't get you sites that don't use it, or ones that don't return 401. Looking for links like "log in" or "username" fields will get you some more.
I don't think that you'll be able to do this entirely automatically and be sure that you're actually detecting all the password-protected areas.
You'll probably want to take a library that is good at web automation and write a little program yourself. It should read the list of target sites from a file, check each one, and write the results to two files: "these are definitely passworded" and "these are not". Then you might want to check the ones that are not by hand and modify your program to accommodate them. httrack is great for grabbing data, but it's not going to help with detection. If you write your own "check for password-protected area" program in a general-purpose high-level language, you can do more checks, and you can avoid generating more requests per site than necessary to determine that a password-protected area exists.
You may need to ignore robots.txt.
I recommend using the Python port of Perl's mechanize, or whatever nice web automation library your preferred language has. Almost all modern languages have a nice library for opening and searching through web pages and looking at HTTP headers.
If you are not capable of writing this yourself, you're going to have a rather difficult time using httrack, wget or similar tools and then searching through the responses.
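Here is a minimal Python sketch of that driver program. `classify_sites` is a hypothetical name, and `check` stands for whatever detection you write, such as a 401 probe or a password-field scan. Sites whose check fails with an error go into the "not" file so they also get the manual review:

```python
from pathlib import Path


def classify_sites(sites_file: Path, check, yes_file: Path, no_file: Path) -> None:
    """Split a list of sites (one URL per line, '#' for comments) into two files.

    check(url) -> bool is the hypothetical detection function you supply.
    """
    stripped = (line.strip() for line in sites_file.read_text().splitlines())
    urls = [s for s in stripped if s and not s.startswith("#")]
    yes, no = [], []
    for url in urls:
        try:
            (yes if check(url) else no).append(url)
        except Exception as err:  # network failures etc.: keep for manual review
            no.append(f"{url}\t# error: {err}")
    yes_file.write_text("".join(u + "\n" for u in yes))
    no_file.write_text("".join(u + "\n" for u in no))
```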
Look for forms with password fields.
You may need to scrape the site to find the login page. Look for links with phrases like "log in", "login", "sign in", "signin", or scrape the whole site (needless to say, be careful here).
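Both checks can be sketched with Python's standard `html.parser`. The class name `LoginHintParser` is hypothetical, and the phrase list is taken from the answer. The parser counts password inputs and collects links whose text or href looks like a login link:

```python
from html.parser import HTMLParser

# Phrases suggested in the answer; extend as needed.
LOGIN_PHRASES = ("log in", "login", "sign in", "signin")


class LoginHintParser(HTMLParser):
    """Counts <input type="password"> fields and collects login-looking links."""

    def __init__(self):
        super().__init__()
        self.password_fields = 0
        self.login_links = []
        self._href = None      # href of the <a> currently open, if any
        self._text = []        # text seen inside that <a>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "input" and (attrs.get("type") or "").lower() == "password":
            self.password_fields += 1
        elif tag == "a":
            self._href = attrs.get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            label = " ".join("".join(self._text).split()).lower()
            href = self._href.lower()
            if any(p in label or p in href for p in LOGIN_PHRASES):
                self.login_links.append(self._href)
            self._href = None
```

Usage: call `p = LoginHintParser()`, then `p.feed(html)`. A page counts as positive if `p.password_fields > 0`, and `p.login_links` gives candidate login pages to fetch next.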
I would use httrack with several limits and then search the downloaded files for password fields.
Typically, a login form could be found within two links of the home page. Almost all ecommerce sites, web apps, etc. have login forms that are accessed just by clicking on one link on the home page, but another layer or even two of depth would almost guarantee that you didn't miss any.
I would also limit the speed at which httrack downloads, tell it not to download any non-HTML files, and prevent it from downloading external links. I'd also limit the number of simultaneous connections to the site to 2 or even 1. This should work for just about all of the sites you are looking at, and it should keep you off the hosts.deny list.
You could just use wget and do something like:
wget -A html,php,jsp,htm -S -r -o output_yoursite.txt http://www.yoursite.com
This will cause wget to download the entire site recursively, but only download files with the endings listed in the -A option; in this case, that avoids heavy files.
The log, including the server response headers printed by -S, will be written to output_yoursite.txt (wget sends these to standard error, so a plain > redirect would not capture them). You can then parse that file for the status code 401, which means that part of the site requires authentication, and also parse the downloaded files according to Konrad's recommendation.
Looking for 401 codes won't reliably catch them, as sites might not produce links to anything you don't have privileges for: until you are logged in, they won't show you anything you need to log in for. On the other hand, some sites (ones with all-static content, for example) manage to pop up a login dialog box for some pages, so looking for password input tags would also miss things.
My advice: find a spider program that you can get the source for, add whatever tests (plural) you plan on using, and make it stop at the first positive result. Look for a spider that can be throttled way back, can ignore non-HTML files (maybe by making HEAD requests and looking at the MIME type), and can work with more than one site independently and simultaneously.
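Here is a minimal Python sketch of the spider described in the last few answers. It follows links to a depth of about two from the home page, stays on the same host, skips non-HTML responses, sleeps between requests, and stops at the first positive page. `find_first_positive` is a hypothetical name, and `fetch`, `extract_links` and `is_positive` are functions you supply, for example the password-field check:

```python
import time
from collections import deque
from urllib.parse import urldefrag, urljoin, urlsplit


def find_first_positive(start_url, fetch, extract_links, is_positive,
                        max_depth=2, delay=1.0):
    """Breadth-first crawl of one site; return the first positive URL or None.

    fetch(url) -> (content_type, text) or None; extract_links(text) -> hrefs;
    is_positive(url, text) -> bool. All three are supplied by the caller.
    """
    host = urlsplit(start_url).hostname
    queue = deque([(start_url, 0)])
    seen = {start_url}
    while queue:
        url, depth = queue.popleft()
        result = fetch(url)
        time.sleep(delay)                      # throttle: one request at a time
        if result is None:
            continue
        content_type, text = result
        if not content_type.startswith("text/html"):
            continue                           # ignore non-HTML responses
        if is_positive(url, text):
            return url                         # stop at the first hit
        if depth >= max_depth:
            continue
        for href in extract_links(text):
            link = urldefrag(urljoin(url, href))[0]
            if urlsplit(link).hostname == host and link not in seen:
                seen.add(link)                 # same site only, no external links
                queue.append((link, depth + 1))
    return None
```

A single-threaded loop with a delay keeps the load on each site low. Several sites can still be crawled side by side by running one such loop per site.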
You might try using cURL and just attempting to connect to each site in turn (possibly put them in a text file and read each line, try to connect, repeat).
You can set up one of the callbacks to check the HTTP response code and do whatever you need from there.
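The same response-code check can be sketched with Python's standard urllib instead of cURL. The function name `auth_required` is hypothetical. It reports HTTP authentication only: a 401 status plus the WWW-Authenticate challenge that makes browsers show a login dialog. Sites that use an HTML login form will not be caught by it:

```python
import urllib.error
import urllib.request


def auth_required(url: str, timeout: float = 10.0) -> bool:
    """True if the URL answers 401 with a WWW-Authenticate challenge."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return False
    except urllib.error.HTTPError as err:
        return err.code == 401 and "WWW-Authenticate" in err.headers
```

Network failures other than HTTP error statuses raise urllib.error.URLError and are left for the caller. For example, a driver looping over your list of sites can record them separately.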