We have a website that requires a username and password. Once logged in, the user can select a link to a PDF in the web browser. At that point they can see the full URL path of the PDF, so they could copy and paste the path into a different browser without logging in, or send the address to someone else to look at.
I am asking this for a co-worker so I am not too sure on what is needed, but they want to change it from say "documents/customerlist.pdf" to "documents/info.asp" (not sure what the file type should be, maybe just "documents/info"?) I think that is what the goal is. Is this possible? If someone could point me in the right direction we might be able to figure it out!
I should think you can do this in ASP. You'll need to deliver the PDF dynamically via an ASP page, which detects the user's session and only serves the data if they are suitably authenticated (so copying the URL to a different browser/machine will result in a 404 or access denied, as you wish). You'll need to read the data from file and binary-write it to the browser, and set HTTP headers for mime-type, content length etc.
I'd start off with serving it on a pdf.asp?file=customerlist URL, but you can later experiment with changing this to something more readable (docs/customerlist.php). You'll need to look into URL rewriting here.
So, that's the general approach. If you do a web-search around these topics ("ASP serve binary file", "ASP URL rewriting") you are sure to get plenty of examples.
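The same flow, sketched in Python rather than ASP so it can be run anywhere. Treat it as an illustration under assumptions, not the ASP implementation: the function name serve_document, the "authenticated" session key, and the rule that documents are stored as <name>.pdf under one base directory are all made up for the example.

```python
from pathlib import Path


def serve_document(session, name, base_dir):
    """Return (status, headers, body) for a request such as pdf.asp?file=<name>."""
    # Refuse anyone whose session is not marked as logged in
    # ("authenticated" is a hypothetical session key).
    if not session.get("authenticated"):
        return 403, {"Content-Type": "text/plain"}, b"Access denied"

    # Accept only a bare document name, so input like "../secret"
    # cannot escape the documents directory.
    if not name or not name.replace("-", "").replace("_", "").isalnum():
        return 404, {"Content-Type": "text/plain"}, b"Not found"

    path = Path(base_dir) / (name + ".pdf")
    if not path.is_file():
        return 404, {"Content-Type": "text/plain"}, b"Not found"

    # Read the file as bytes and describe it with the usual HTTP headers.
    body = path.read_bytes()
    headers = {
        "Content-Type": "application/pdf",
        "Content-Length": str(len(body)),
        "Content-Disposition": 'inline; filename="%s"' % path.name,
    }
    return 200, headers, body
```

The documents directory itself should live outside the web root (or be blocked from direct access), otherwise the original URL still works.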
Related
User uploads files, lets say image files on the remote server.
For example if user1 uploads an image to this path
http://somedomain.com/uploads/123.jpg
And I display image on the web page for the logged in user using the above url.
Suppose the user logs out, and some other user comes to know about the above URL. He can then access the image. How can I prevent this?
I will just demonstrate an idea for you. Then you can search in more detail for a solution. There are many solutions, many of them complicated, so it depends on how far you are willing to go. I think this one is a pretty simple solution (depends on your programming skills of course). So,
User clicks the link http://somedomain.com/uploads/123.jpg to open the image.
You have an htaccess file, that will take that url and do a conversion (behind the scenes).
That htaccess file will actually call, for example, the images.php file.
images.php file will get the name of the image and will check if a user is logged in or not.
If the user is logged in, it will grab the image file with the real name, let's say, up-image-123.jpg
The htaccess rewrite means the real name up-image-123.jpg is never revealed; the browser only ever sees 123.jpg (which is not a real file name that someone else could access directly)
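The check that images.php would perform in the steps above can be sketched like this. It is written in Python purely as an illustration; the STORED_NAMES table and the resolve_upload function are hypothetical, and a real site would more likely look the name up in a database.

```python
# Hypothetical mapping from the public name in the URL to the stored file name.
STORED_NAMES = {"123.jpg": "up-image-123.jpg"}


def resolve_upload(public_name, logged_in, stored_names=STORED_NAMES):
    """Return the stored file name to send, or None if the request is refused."""
    # Anonymous visitors get nothing, even if they know the public URL.
    if not logged_in:
        return None
    # Unknown names (including the real stored name) resolve to None.
    return stored_names.get(public_name)
```

Because only the public name is accepted as a key, knowing the stored name does not help an outsider either.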
I have scavenged for the answers online but none seem to be similar to what I am trying to achieve. As such, I hope that gurus at stackoverflow can help me out.
What is it that I am trying to accomplish?
I want to restrict access to content for non-authorized users. Accessible content to non-authorized users will be specified in a white list. All other content is blacklisted.
What is my environment?
I am running Apache in conjunction with a scripting language very similar to that of PHP. The scripting language will not be known by many but it is Fazzt ( in case you do know and are able to infer the differences of it as compared to PHP... there are no pointers / memory management, decimal values, and binary data ). I have to use this environment due to the nature of the project.
What is happening on the site?
The site authenticates users and stores authentication in sessions. An unauthenticated user is presented with a styled ( contains images, css, js, etc ) webpage. Hence, I need to white-list all of the static images, css, js files in order for them to be available for download by the client browser. Once signed in, broader range of dynamic content is presented ( as such, anything that is not white-listed is automatically black-listed ).
How did I plan to solve the problem?
This is silly but I guess obvious is not always seen. My approach involved mod_rewriting all requests to existing files that do not match .fzt and .fsp pages. The rewrite would go to a scripting file that would check the requested file against the white list. If the file is present in the list, request would get routed directly to the file ( yes, silly me... it would get mod_rewritten again >_< ). If it's not in the list, user's authentication would be checked. If the user is not authenticated, "File not found" HTTP would be returned. Otherwise, the request would be redirected to the file and served ( same folly ).
As you can see, the approach is greatly flawed. However, I am sure something of the nature should be possible... yet, I have not found any proof just yet. What do you think? Is the mod_rewrite / script a completely wrong way of performing this task? How would you do it otherwise? Note that I cannot simply slap on an .htaccess, as the access is determined by user authentication that is tracked by Fazzt ( read above, scripting language similar to that of PHP ).
Any suggestions or thoughts would be greatly appreciated!
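To make the intended decision logic concrete, here it is as a small Python sketch (Python only because few people can read Fazzt). The WHITELIST patterns and the decide function are assumptions for illustration. The point is that the script returns a decision and would then send the file itself, instead of redirecting back to a URL that gets rewritten again.

```python
import posixpath
from fnmatch import fnmatchcase

# Hypothetical white list of static content available to everyone.
WHITELIST = ["/css/*", "/js/*", "/images/*"]


def decide(path, authenticated, whitelist=WHITELIST):
    """Return "serve" or "not_found" for a requested URL path."""
    # Collapse "../" segments first so "/css/../private" cannot slip through.
    path = posixpath.normpath(path)
    if any(fnmatchcase(path, pattern) for pattern in whitelist):
        return "serve"
    # Everything else is black-listed unless the user is signed in.
    return "serve" if authenticated else "not_found"
```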
I manage a large and active forum and we're being plagued by a very serious problem. We allow users to embed remote images, much like how Stack Overflow handles images (imgur); however, we don't have a specific set of hosts. Images can be embedded from any host with the following code:
[img]http://randomsource.org/image.png[/img]
and this works fine and dandy... except users can embed an image that requires authentication. The image causes an authentication pop-up to appear, and because the text of that pop-up can be edited, they put something like "please enter your [sitename] username and password here". Unfortunately, our users have been falling for it.
What is the correct response to this? I have been considering the following:
1. Each page load has a piece of Javascript execute that checks each image on the page and its status
2. Have an authorised list of image hosts
3. Disable remote embedding completely
The problem is I've NEVER seen this happen anywhere else, yet we're plagued with it, how do we prevent this?
It's more than the password problem. You are also allowing some of your users to carry out CSRF attacks against other users. For example, a user can set up his profile image as [img]http://my-active-forum.com/some-dangerous-operation?with-some-parameters[/img].
The best solution is to -
Download the image server side and store it on the file system/database. Keep a reasonable maximum file size, otherwise the attacker can download tons of GBs of data onto your servers to hog n/w and disk resources.
Optionally, verify the file is actually an image
Serve the image using a throw-away domain or IP address. It is possible to create images that masquerade as a jar or applet; serving all files from a throwaway domain protects you from such malicious activity.
If you cannot download the images on the server side, create a white list of allowed url patterns (not just domains) on the server side. Discard any urls that don't match this URL pattern.
You MUST NOT perform any checks in javascript. Performing checks in JS solves your immediate problems, but does not protect you from CSRF. You are still making a request to an attacker-controlled URL from your user's browser, and that is risky. Besides, the performance impact of that approach is prohibitive.
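A rough server-side sketch of those checks, in Python. Everything named here is an assumption for illustration: the single imgur pattern, the 2 MB limit, and the helper names. The signature check only looks at the first bytes for PNG, JPEG, or GIF, so it is a sanity check and not a full validation.

```python
import re

# Hypothetical white list: full URL patterns, not just host names.
ALLOWED_URL_PATTERNS = [
    re.compile(r"https?://i\.imgur\.com/[A-Za-z0-9]+\.(?:png|jpe?g|gif)"),
]
MAX_IMAGE_BYTES = 2 * 1024 * 1024  # assumed limit
IMAGE_SIGNATURES = (b"\x89PNG\r\n\x1a\n", b"\xff\xd8\xff", b"GIF87a", b"GIF89a")


def is_allowed_url(url):
    """True only if the whole URL matches one of the allowed patterns."""
    return any(p.fullmatch(url) for p in ALLOWED_URL_PATTERNS)


def read_capped(stream, limit=MAX_IMAGE_BYTES):
    """Read a file-like object, giving up once it exceeds the size limit."""
    chunks, total = [], 0
    while True:
        chunk = stream.read(64 * 1024)
        if not chunk:
            break
        total += len(chunk)
        if total > limit:
            raise ValueError("image exceeds size limit")
        chunks.append(chunk)
    return b"".join(chunks)


def looks_like_image(data):
    """Cheap check of the leading magic bytes (PNG, JPEG, GIF only)."""
    return data.startswith(IMAGE_SIGNATURES)
```

Using fullmatch means a URL with an extra query string or path tacked on is rejected, which is what you want when the pattern is the white list.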
I think you mostly answered your own question. Personally I would have gone for a mix between option 1 and option 2: i.e. create a client-side Javascript which first checks image embed URLs against a set of white-listed hosts. For each embedded URL which is not in that list, do something along these lines, while checking that the server does not return the 401 status code.
This way there is a balance between latency (we attempt to minimize duplicate requests via the HEAD method and domain whitelists) and security.
Having said that, option 2 is the safest one, if your users can accept it.
I have recently placed an ad in a weekly publication that sends out a PDF file. My ad is directly linked so that the reader can click on it and go to my website. The PDF file is hosted on a different server, but is, in fact, a PDF file that has to be downloaded and viewed on that site, not emailed or shared that way. I have Google Analytics and a couple other stats tracking programs installed and I can't see the referring URL from this other site at all, in anything. Is there something I can ask the designer of the PDF file to include in her links to make them trackable? Or is this simply not possible?
Use Google Analytics Campaign Tagging.
This tool will help set it up. You'll want to classify the variables such that the source and the medium are set, at minimum.
http://www.google.com/support/analytics/bin/answer.py?hl=en&answer=55578
So, for example, if your URL is http://example.com, you could set the parameters as such:
utm_source: BlahNews
utm_medium: newsletter
utm_campaign: july10issue
Your resulting URL would be http://example.com/?utm_source=BlahNews&utm_medium=newsletter&utm_campaign=july10issue
Google Analytics would track these hits under that Campaign, Source and medium.
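If there are many links to tag, the URLs can be generated rather than typed by hand. A small Python sketch; the tag_url helper is just an illustrative name, and only the three utm_ parameters from the example above are used.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def tag_url(base_url, source, medium, campaign):
    """Append campaign parameters to a URL, keeping any existing query string."""
    parts = urlsplit(base_url)
    query = urlencode({
        "utm_source": source,
        "utm_medium": medium,
        "utm_campaign": campaign,
    })
    if parts.query:
        query = parts.query + "&" + query
    # Use "/" when the URL has no path, as in http://example.com
    return urlunsplit((parts.scheme, parts.netloc, parts.path or "/", query, parts.fragment))


print(tag_url("http://example.com", "BlahNews", "newsletter", "july10issue"))
```

urlencode also takes care of escaping spaces and other characters in the values.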
If the URL is displayed raw and you want to avoid showing an ugly URL, you could set up an internal redirect to that URL. It looks like you're using WordPress, and there are a few free plugins that manage redirects like this (I happen to like 'Redirection').
So, you could tell the plugin to redirect
http://example.com/blahnews TO http://example.com/?utm_source=BlahNews&utm_medium=newsletter&utm_campaign=july10issue
Can you ask them to put some token in the query string of the URL to the site?
I have a big list of websites and I need to know if they have areas that are password protected.
I am thinking about doing this: downloading all of them with httrack and then writing a script that looks for keywords like "Log In" and "401 Unauthorized". But the problem is that these websites are all different, some static and some dynamic (html, cgi, php, java-applets...), and most of them won't use the same keywords...
Do you have any better ideas?
Thanks a lot!
Looking for password fields will get you so far, but won't help with sites that use HTTP authentication. Looking for 401s will help with HTTP authentication, but won't get you sites that don't use it, or ones that don't return 401. Looking for links like "log in" or "username" fields will get you some more.
I don't think that you'll be able to do this entirely automatically and be sure that you're actually detecting all the password-protected areas.
You'll probably want to take a library that is good at web automation, and write a little program yourself that reads the list of target sites from a file, checks each one, and writes to one file of "these are definitely passworded" and another of "these are not". Then you might want to go manually check the ones that are not, and make modifications to your program to accommodate. Using httrack is great for grabbing data, but it's not going to help with detection -- if you write your own "check for password protected area" program with a general purpose HLL, you can do more checks, and you can avoid generating more requests per site than would be necessary to determine that a password-protected area exists.
You may need to ignore robots.txt
I recommend using the Python port of Perl's mechanize, or whatever nice web automation library your preferred language has. Almost all modern languages will have a nice library for opening and searching through web pages, and looking at HTTP headers.
If you are not capable of writing this yourself, you're going to have a rather difficult time using httrack or wget or similar and then searching through responses.
Look for forms with password fields.
You may need to scrape the site to find the login page. Look for links with phrases like "log in", "login", "sign in", "signin", or scrape the whole site (needless to say, be careful here).
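As a starting point, both checks can be done with Python's standard html.parser. This is only a sketch: the LoginDetector class, the scan helper, and the phrase list are illustrative, and pages that build their login form with JavaScript will not be caught.

```python
from html.parser import HTMLParser

LOGIN_PHRASES = ("log in", "login", "sign in", "signin")


class LoginDetector(HTMLParser):
    """Collects password inputs and links whose text looks like a login link."""

    def __init__(self):
        super().__init__()
        self.has_password_field = False
        self.login_links = []
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text seen inside that <a>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "input" and (attrs.get("type") or "").lower() == "password":
            self.has_password_field = True
        elif tag == "a":
            self._href = attrs.get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Normalise case and whitespace before matching phrases.
            text = " ".join("".join(self._text).lower().split())
            if any(phrase in text for phrase in LOGIN_PHRASES):
                self.login_links.append(self._href)
            self._href = None


def scan(html):
    """Return (has_password_field, login_links) for one page of HTML."""
    detector = LoginDetector()
    detector.feed(html)
    detector.close()
    return detector.has_password_field, detector.login_links
```

The links returned by scan are the pages worth fetching next to look for the actual password field.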
I would use httrack with several limits and then search the downloaded files for password fields.
Typically, a login form could be found within two links of the home page. Almost all ecommerce sites, web apps, etc. have login forms that are accessed just by clicking on one link on the home page, but another layer or even two of depth would almost guarantee that you didn't miss any.
I would also limit the speed that httrack downloads, tell it not to download any non-HTML files, and prevent it from downloading external links. I'd also limit the number of simultaneous connections to the site to 2 or even 1. This should work for just about all of the sites you are looking at, and it should keep you off the hosts.deny list.
You could just use wget and do something like:
wget -A html,php,jsp,htm -S -r -o output_yoursite.txt http://www.yoursite.com
This will cause wget to crawl the site recursively, but only keep files whose extensions are listed with the -A option, which in this case avoids heavy files.
wget prints the server headers requested by -S to its log (standard error) rather than standard output, so the -o option is used to write that log to output_yoursite.txt. You can then parse that file for the status code 401, which means that part of the site requires authentication, and parse the downloaded files according to Konrad's recommendation as well.
Looking for 401 codes won't reliably catch them as sites might not produce links to anything you don't have privileges for. That is, until you are logged in, it won't show you anything you need to log in for. OTOH some sites (ones with all static content for example) manage to pop a login dialog box for some pages so looking for password input tags would also miss stuff.
My advice: find a spider program that you can get source for, add in whatever tests (plural) you plan on using, and make it stop on the first positive result. Look for a spider that can be throttled way back, can ignore non-HTML files (maybe by making HEAD requests and looking at the mime type) and can work with more than one site independently and simultaneously.
You might try using cURL and just attempting to connect to each site in turn (possibly put them in a text file and read each line, try to connect, repeat).
You can set up one of the callbacks to check the HTTP response code and do whatever you need from there.
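The same idea without cURL callbacks, as a sketch using Python's standard urllib. The probe and is_http_auth_challenge names are made up for the example. Note that urllib raises HTTPError for a 401, so the status code has to be read from the exception; connection failures (URLError) are left for the caller to handle.

```python
import urllib.error
import urllib.request


def is_http_auth_challenge(status, headers):
    """True if the response is a 401 that carries a WWW-Authenticate header."""
    names = {name.lower() for name in headers}
    return status == 401 and "www-authenticate" in names


def probe(url, timeout=10):
    """Send a HEAD request and return (status, headers) even for error codes."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, dict(response.headers.items())
    except urllib.error.HTTPError as error:
        # 4xx/5xx responses arrive as exceptions but still carry code and headers.
        return error.code, dict(error.headers.items())


def check_sites(urls):
    """Yield (url, needs_auth) for each URL, e.g. each line read from a text file."""
    for url in urls:
        status, headers = probe(url)
        yield url, is_http_auth_challenge(status, headers)
```

This only detects HTTP authentication; sites that use a login form return 200 and need the HTML checks described in the other answers.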