remote image embeds: how to handle ones that require authentication? - authentication

I manage a large and active forum and we're being plagued by a very serious problem. We allow users to embed remote images, much like how stackoverflow handles image (imgur) however we don't have a specific set of hosts, images can be embedded from any host with the following code:
[img]http://randomsource.org/image.png[/img]
and this works fine and dandy... except users can embed an image that require authentication, the image causes a pop-up to appear and because authentication pop-ups can be edited they put something like "please enter your [sitename] username and password here" and unfortunately our users have been falling for it.
What is the correct response to this? I have been considering the following:
Each page load has a piece of Javascript execute that checks each image on the page and its status
Have an authorised list of image hosts
Disable remote embedding completely
The problem is I've NEVER seen this happen anywhere else, yet we're plagued with it, how do we prevent this?

Its more than the password problem. You are also allowing some of your users to carry out CSRF attacks against other users. For example, a user can set up his profile image as [img]http://my-active-forum.com/some-dangerous-operation?with-some-parameters[/img].
The best solution is to -
Download the image server side and store it on the file system/database. Keep a reasonable maximum file size, otherwise the attacker can download tons of GBs of data onto your servers to hog n/w and disk resources.
Optionally, verify the file is actually an image
Serve the image using a throw-away domain or ip address. It is possible to create images that masquerade as a jar or applet; serving all files from a throwaway domain protects you
from such malicious activity.
If you cannot download the images on the server side, create a white list of allowed url patterns (not just domains) on the server side. Discard any urls that don't match this URL pattern.
You MUST NOT perform any checks in javascript. Performing checks in JS solves your immediate problems, but does not protect your from CSRF. You are still making a request to an attacker-controlled url from your users browser, and that is risky. Besides, the performance impact of that approach is prohibitive.

I think you mostly answered your own question. Personally I would have gone for a mix between option 1 and option 2: i.e. create a client-side Javascript which first checks image embed URLs against a set of white-listed hosts. For each embedded URL which is not in that list, do something along these lines, while checking that the server does not return the 401 status code.
This way there is a balance between latency (we attempt to minimize duplicate requests via the HEAD method and domain whitelists) and security.
Having said that, option 2 is the safest one, if your users can accept it.

Related

What is preventing people from using someone else's CAPTCHA as their own?

Why (other than moral reasons) don't more people use the CAPTCHAs of other sites as their own while selling the solving of said CAPTCHAs?
To me, such a system seems like it would be simple to implement:
set up a script that does something on another website that requires a CAPTCHA to be completed through the use of a proxy service
when a user on your site performs a task that requires the completion of a CAPTCHA, simply serve them the CAPTCHA that the other
site asks you to solve
when the user solves the CAPTCHA, your script can perform the desired action on the other site that is the source of the CAPTCHA,
and the user on your site is also verified through this process
Is this commonplace? If not, why not? What, if anything, could be done to prevent this?
Fetching the captcha. Assuming one could easily fetch the exact visual of the captcha from the foreign host. To do this, you have to pass the referral check (most browsers (navigated by humans) allow to send the http_referer). You also would have to save the session_id and the secret from the hidden input.
Checking the result. The foreign host must link the saved variables with the ones associated with the session of your first request, which requires you to implement tricky cURL methods. You would have to handle multiple parallel requests, all from your single ip.
Your server will probably use more resources when hacking a captcha on a foreign host than if it generates a captcha on its own.
Prevents
http_referer check
limit requests for single IP to e.g. 5 / minute
good session handling and tricky cookies
it's not impossible to reverse engineer javascript, but the more complicated your javascript is, ...
you have to find a pattern that recognizes the result on the foreign host. the easiest signature may be the Location header field, leading either to /path/success.html or /path/tryagain.php
Challenge:
I took a moment to prepare an example: http://woisteinebank.de/test/
In this example, I attach keys to the session_id(); and save it in the database.
Through session_regenerate_id(); I have a fresh session on every request.
In check.php, I compare the database values to the $_GET values.
Try to find a way to get leech this captcha, I'll try to defend. Everytime you sucessfully use my captcha on your site, I try to defend it.

Restrict unauthenticated access to files with mod_rewrite and scripting language

I have scavenged for the answers online but none seem to be similar to what I am trying to achieve. As such, I hope that gurus at stackoverflow can help me out.
What is it that I am trying to accomplish?
I want to restrict access to content for non-authorized users. Accessible content to non-authorized users will be specified in a white list. All other content is blacklisted.
What is my environment?
I am running Apache in conjunction with a scripting language very similar to that of PHP. The scripting language will not be known by many but it is Fazzt ( in case you do know and are able to infer the differences of it as compared to PHP... there are no pointers / memory management, decimal values, and binary data ). I have to use this environment due to the nature of the project.
What is happening on the site?
The site authenticates users and stores authentication in sessions. An unauthenticated user is presented with a styled ( contains images, css, js, etc ) webpage. Hence, I need to white-list all of the static images, css, js files in order for them to be available for download by the client browser. Once signed in, broader range of dynamic content is presented ( as such, anything that is not white-listed is automatically black-listed ).
How did I plan to solve the problem?
This is silly but I guess obvious is not always seen. My approach involved mod_rewriting all requests to existing files that do not match .fzt and .fsp pages. The rewrite would go to a scripting file that would check the requested file against the white list. If the file is present in the list, request would get routed directly to the file ( yes, silly me... it would get mod_rewritten again >_< ). If it's not in the list, user's authentication would be checked. If the user is not authenticated, "File not found" HTTP would be returned. Otherwise, the request would be redirected to the file and served ( same folly ).
As you can see, the approach is greatly flawed. However, I am sure something of the nature should be possible... yet, I have not found any proof just yet. What do you think? Is the mod_rewrite / script a completely wrong way of performing this task? How would you do it otherwise? Note that I cannot simply slap .htaccess as the access determined by user authentication that is tracked by Fazzt ( read above, scripting language similar to that of PHP ).
Any suggestions or thoughts would be greatly appreciated!

How do I implement a secure upload/download area?

I've been asked to create a solution where people log in and are able to upload and download off of our work server. So John uploads a photo, and Jen can download it, for example. They also have to authenticate themselves.
Can someone give me a rough overview of how to implement this? I'm familiar enough with MySQL, C#, and JavaScript.
The rough overview
This should just be a matter of planning out the pieces.
at the very top of the page, put some code that checks if a user is logged in. If not, show a login form (or redirect to...). If they are logged in, show the rest of the page. If not, you'll need some logic to show a form, and then check it once it's submitted for authentication, and set a SESSION cookie or something similar.
Once the user is logged in, on the homepage, you might have an file-upload form and a listing of existing files. How you would style would depend on how many files you might expect to have. To keep things extremely simple, you could simple iterate through whatever files are in the upload directory. If you expect many more files than that, you may consider using a db.
Handle a file upload by sanitizing filenames (checking for filetype/filesize if you want to limit those) and putting the file into the directory.
Force the users to download the files (instead of having the browser decide what to do with them) for security purposes. Implementing this on certain filetypes may also be acceptable.
Other thoughts
You probably would not want the users to be able to excecute any files, so keeping the file directory hidden would be a good idea.
Keeping track of who uploaded and downloaded what is also doable, but would add another layer of complication to the script.

Figure out if a website has restricted/password protected area

I have a big list of websites and I need to know if they have areas that are password protected.
I am thinking about doing this: downloading all of them with httrack and then writing a script that looks for keywords like "Log In" and "401 Forbidden". But the problem is these websites are different/some static and some dynamic (html, cgi, php,java-applets...) and most of them won't use the same keywords...
Do you have any better ideas?
Thanks a lot!
Looking for password fields will get you so far, but won't help with sites that use HTTP authentication. Looking for 401s will help with HTTP authentication, but won't get you sites that don't use it, or ones that don't return 401. Looking for links like "log in" or "username" fields will get you some more.
I don't think that you'll be able to do this entirely automatically and be sure that you're actually detecting all the password-protected areas.
You'll probably want to take a library that is good at web automation, and write a little program yourself that reads the list of target sites from a file, checks each one, and writes to one file of "these are definitely passworded" and "these are not", and then you might want to go manually check the ones that are not, and make modifications to your program to accomodate. Using httrack is great for grabbing data, but it's not going to help with detection -- if you write your own "check for password protected area" program with a general purpose HLL, you can do more checks, and you can avoid generating more requests per site than would be necessary to determine that a password-protected area exists.
You may need to ignore robots.txt
I recommend using the python port of perls mechanize, or whatever nice web automation library your preferred language has. Almost all modern languages will have a nice library for opening and searching through web pages, and looking at HTTP headers.
If you are not capable of writing this yourself, you're going to have a rather difficult time using httrack or wget or similar and then searching through responses.
Look for forms with password fields.
You may need to scrape the site to find the login page. Look for links with phrases like "log in", "login", "sign in", "signin", or scrape the whole site (needless to say, be careful here).
I would use httrack with several limits and then search the downloaded files for password fields.
Typically, a login form could be found within two links of the home page. Almost all ecommerce sites, web apps, etc. have login forms that are accessed just by clicking on one link on the home page, but another layer or even two of depth would almost guarantee that you didn't miss any.
I would also limit the speed that httrack downloads, tell it not to download any non-HTML files, and prevent it from downloading external links. I'd also limit the number of simultaneous connections to the site to 2 or even 1. This should work for just about all of the sites you are looking at, and it should be keep you off the hosts.deny list.
You could just use wget and do something like:
wget -A html,php,jsp,htm -S -r http://www.yoursite.com > output_yoursite.txt
This will cause wget to download the entire site recursively, but only download endings listed with the -A option, in this case try to avoid heavy files.
The header will be directed to file output_yoursite.txt which you then can parse for the header value 401, which means that the part of the site requires authentication, and parse the files accordingly to Konrad's recommendation also.
Looking for 401 codes won't reliably catch them as sites might not produce links to anything you don't have privileges for. That is, until you are logged in, it won't show you anything you need to log in for. OTOH some sites (ones with all static content for example) manage to pop a login dialog box for some pages so looking for password input tags would also miss stuff.
My advice: find a spider program that you can get source for, add in whatever tests (plural) you plan on using and make it stop of the first positive result. Look for a spider that can be throttled way back, can ignore non HTML files (maybe by making HEAD requests and looking at the mime type) and can work with more than one site independently and simultaneously.
You might try using cURL and just attempting to connect to each site in turn (possibly put them in a text file and read each line, try to connect, repeat).
You can set up one of the callbacks to check the HTTP response code and do whatever you need from there.

Refresh browser via cron(or not) to a different page on remote request?

I need to display pages in a tutorial fashion. I looked in to netsupport, beamyourscreen and other possibilities but, I do not want the viewers to download anything. I cannot use gd / send screenshots due to audio / video instructions embedded in some of the pages.
Basically, I need the ability to "refresh" a users browser window to a different page via an interface on my end. Whether via a form submission, javascript or any other type of "controller" that allows me to change the page on the viewers browser. PERL preferred but, PHP / javascript whatever works and is cross browser. I set up a simple javascript page forward timer that "works" but, page load times and conversation interruptions are a huge factor.
The entire tutorial website will be developed around this ability.
I was looking in to curl / cron / wget methods but, found little information.
I have seen forum and chat scripts that basically perform a similar task but, there must be a simple(ish) solution in leau of hacking up another script to suit my needs.
I do not want others to control the pages either. The site really, only needs to be accessable during the tutorial however, It "could" remain web accessable as long as user interaction was normal unless (being controlled).
The initial site concept is based on instructing people how to properly introduce new pets into a home. Will be operated by a veteranarian that saved my pets life. I wanted to give something back.
Possible? I really appreciate simple examples etc...
You have no other way but to keep polling the server for "instructions" using javascript. No, you can't send nothing to the end user browser, neither curl nor wget.
Mainly, you'll have to set up a simple request/response protocol between the browser and the server.
If you want to go deeper, you can use something like cometd/meteord/etc. If not, a hidden iframe that reloads himself and receives pages with javascript code for the needed actions can do the trick.
Another alternative.
With javascript dopolling and single character flatfile. Have a simple one character flatfile with a single var. Write it in perl (it is faster and uses less resources than php). The parent script calls a javascript variable in a flatfile. It hits the flatfile and goes wherever the var sets it. The flatfile is written to by the controller. Done.
I guess you could also rename an empty flatfile and use that as the controller. I am usure which is faster, open and read a specific file or hit the directory and return the file name. On the controller side, opening and writing to a file vs renaming a file. Maybe they counter each other in resources and time?
This way the site can act as a normal site. When you want to have remote users see a "presentation" (automatically being shown the site pages at the controllers pace), the controller activates polling and tells the viewers to push a start button. This allows a remote instructor to load pages for the viewers at his leisure.
It is a simple solution that works with nothing really sophisticated going on. No frames are needed either. Just need javascript enabled.
Any better suggestions are welcome!
It occurred to me that what you might want to use is HTML Push technology. Check out the wiki, they have several links. I have never used it myself