invisible captcha - captcha

I'm using the following security(invisble captcha) for my site's form submission to prevent auto submission:
generate the result of md5 with a fixed salt on number x and render it
inside the form as a hidden field
generate 2 hidden fields a and b where a + b = x, a and b are
unencrypted
upon submission, use javascript to add another plain hidden field c
where c=a+b
on server side apply md5 on c with the salt, compare it with encrypted
x
However such system is cracked in production, one person was able to auto-submit thousands of forms successfully. Any idea how?
One way to do it is, the hacker already knows that the operation is + (simple to find out by observation of javascript), read the form and add a and b, create a new form with the extra c field where c=a+b. He has to first read a form, then create one for submission.
My questions are:
Is the hypothesis I presented above the likely way to break my system?
If so, what should I do to prevent this kind of hack?
What are other alternative hacks the hacker might use?
I don't want to use real captcha because it degrades user experience. All suggestions are welcome.

Alternatively, the hacker could just execute your javascript themselves.
If you want to validate that the user isn't a robot, you'll have to get the user to do something a robot can't. It's really that simple.

A further step would be to increase the amount of computation required; make it infeasible to submit the forms too rapidly. Try looking at HashCash.

I can't give advice in your specific case, but Django has some nice approaches, how spam in comment fields could be supressed without captchas: Nice approaches here.

Your system is not working because the attacker(s) are just executing your JavaScript themselves. If you want to use a somewhat similar scheme that will prevent automated submissions you need to put a workload factor on the client. This will not stop the automated software from being able to submit to your site but it will slow them down and increase the cost of an attack. The goal is to increase the cost and slow them down enough that the attack is just not worthwhile. Instead of trying to build it yourself try using this proof of work service.

Related

Should I store basic information (like table sort or page number) in query string or session storage?

I am developing a web application (a dashboard of some sort), and on some pages I have a couple of data tables. Each data table has its own page number, filter and sort. Although the data is fetched asynchronously in the background (hence the page does not need to reload), but I need to store these information (page, filter, sort) so it can be persistent for example during a page refresh. As far as I know, there are two ways to store these information:
In the session storage (local storage is not an option, because I don't want these information to be persistent between sessions)
In the query string
Up until now I used to store these information in the query string because I thought it has some advantages: e.g. the user can copy, bookmark, or share the URL with a colleague to discuss some data which they find interesting (for example on the 10th page of a table after performing some filtering and sorting). With the session storage I cannot do this
But now I am extending the abilities of the filter function, and now the information is quite long to use in the query string, and may even exceed the limit (I think 2048 characters, right?). And also there may be more than a couple of tables on each page, and therefore the query string would even become longer.
So first I wanted to know what is the best practice in this situation
And second, is that feature (being able to copy/bookmark/share the page as is) really that important, or not?
Note: please note that the information that I'm talking about is nothing secret or sensitive. It's just table page number, table filter and table sort
The 2048 limit seems that's not really an actual limit, see What is the maximum possible length of a query string?
About the best approach - I've personally always hated that most "new websites" do not support the feature you care about, so I'd personally encourage you to support it!
And finally about the exact mechanism, firstly the query string approach will keep working for a lot longer than 2048 characters, but I can see that copy&pasting it might be unwieldy and depending on the media to share the URL it can introduce mistakes.
So, from the user's perspective, I think the best experience would be given by storing those searches on the backend side and enabling a shorter URL for permalinking/sharing/bookmarking.
This new URL could be obtained by the user via a specific UI button (Share/Permalink) so you save the search in that moment and return the URL, or (best experience but harder and costlier to implement) you can be saving it continuously and sending back to the UI the generated URL and use Javascript to replace the URL for the nicer version (either always or just when people copy it).
Also consider: it may be good enough to just keep using the query string :-)

Counting the amount of users or executions of an application.

I made a program that gets the data from the clipboard and saves it in a string variable. Then it looks for specific words in that string and generates several URLs. Afterwards it open the browser and shows each URL in an own tab.
Some of my friends already use this program frequently and I want to have some statistics about how often. I simple counter variable would be enough but I need to get access to it.
I came up with two options that could work:
I could send an email to a specific adress every time my app is executed. Then I can track the amount of uses by manually or automaticly counting the amount of emails in the postbox. I think this would be a Vers dirty solution.
I could create and publish a website containing a counter. This counter could be refreshed by my application. This solution is a bit better I think but a lot more work for just one single counter.
Do you have better ideas to solve my problem or is one of mine already a good one?
Thank you in advace!
You can use Measurement Protocol Overview. This provides you statistics of usage your application compared with Google Analytics. You can see even a geo statistic, version distribution, crash reports. It is easy to use it from .net. It is just about requesting http request to google.

how to get the data from captcha in selenium webdriver

I'm using Selenium webdriver (Java).
I need to test the registration form but before submitting, image box (captcha) is appearing but everytime of execution it is going to be changed. I want to know how to get the data from image (captcha).
Anyone can help me?
If the captcha is coming from an environment under your control, you will likely need to implement some sort of method indicating you are in a test environment and have the captcha system return a known value or some indicator of what the expected value is.
If, on the other hand, the captcha is coming from another source out of your control, you are probably our of luck. At that point, you are essentially in the same boat as the spammers who are in a constant arms race to write software that can visually parse a captcha.
UPDATE
I feel the need to add some clarification to the ideas put forth in the question, answer and comments. Essentially you are dealing with one of the following situations (note that when I say 'your', I am referring to you, your company, client, etc):
1) Your form, Your captcha system: If this is the case, your best solution is to work with your developers to add a 'test' mode to your captchas, returning either a known value, or additional information in the page that indicates what the expected value should be. If you are able to make use of a tool, either written by you, or by another, that can successfully 'read' the captcha image, your system is broken. If you can do it in test mode, what is to stop anyone else (spammer, hacker, etc) from bypassing your captcha in exactly the same manner.
2) Your form, 3rd Party captcha system: If this is the case, your best solution is again to see if the system has some 'test' mode that you can make use of. I have no experiance with these systems myself but in general would guess that test methods exist for the major systems out there. A Google search of {Captcha System Name} automated testing should return some good hints as to how to go about testing with the system. If nothing good comes from that, your next bet would be to implement your own, internal, test only, dummy captcha system that works with some known value and make your captcha provider configurable so that you can point to your test system in test/dev/etc and your real system in production.
3) Another Form, Unknown captcha system: I am going to make a leap of faith here and assume this is not your case, but just for completeness I will include it. If this is your case, your not testing anything at all and are simply asking for help bypassing someone else's security mechanisms for your own reasons. If that is the case, please seek your assistance on less scrupulous sites.
Captcha code was introduced in order to prevent from the robot or automation codes. There is no option for automating the Captcha code.
1 . You can give a wait time for the automation, so that the user can enter the captcha code.
2. If the project is in testing url means, you can request your system admin and developer to disable the captcha validation.
May be this can help you, but i din't try on this..
Developers will generate a random value for captcha, and they will convert the value into image as well as they will store the value in session for comparing the entered input is matching with the captcha code.
So If possible, you can take that session value and give as the input.

How to Verify whether a Robot is Entering Information

I have a web form which the users fill and the info send to server and stored on a database. I am worried that Robots might just fill in the form and I will end up with a database full of useless records. How can I prevent Robots from filling in my forms? I am thinking maybe something like Stackoverflow's robot detection, where if it thinks you are a robot, it asks you to verify that you are not. Is there a server-side API in Perl, Java or PHP?
There are several solutions.
Use a CAPTCHA. SO uses reCAPTCHA as far as I know.
Add an extra field to your form and hide it with CSS (display:none). A normal user would not see this field and therefore will not fill it. You check at the submission if this field is empty. If not, then you are dealing with a robot that has carefully filled out all form fields. This technique is usually referred to as a "honeypot".
Add a JavaScript timer function. At the page load it starts a value at zero and then increases it as time passes. A normal user would read and fill out your form for some time and only then submit it. A robot would just fill out and submit the form immediately upon receiving it. You check if the value has gone much from zero at the submission. If it has, then it is likely a real user. If you see just a couple of seconds (or even no value at all due to the robots not executing JavaScript) then it is likely a robot. This will however only work if you decide you will require your users have JavaScript on in order to perform "write" operations.
There are other techniques for sure. But these are quite simple and effective.
You can use reCAPTCHA (same as stackoverflow) - they have libraries for a number of programming languages.
I've always preferred Honeypot captcha (article by phil haack), as its less invasive to the user.
Captchas bring accessibility problems and will be ultimately defeated by software recognition.
I recommand the reading of this short article about bot traps, which include hidden fields, as Matthew Vines and New in town already suggested.
Anyway, you are still free to use both captcha and bot traps.
CAPTCHA is great. The other thing you can do that will prevent 99% of your robot traffic yet not annoy your users is to validate fields.
My site, I check for text in fields like zip code and phone number. That has removed all of the non-targeted robot misinformation.
You could create a two-step system in which a user fills the form, but then must reply to an e-mail to "activate" the record within a set period of time - say 24 hours.
In the back end, instead of populating your current table with all the form submissions, you could put them into a temporary table that automatically deletes any row that is older than your time allotment. Unless you have a serious bot problem, then I would think that the table wouldn't get that big, especially if the first form is just a few fields.
A benifit of this approach is that you don't have to use captcha or some other technology like that that might create some accessibility problems.

Which multilingual web design solution is fastest for the user, if this is indeed an issue?

Context:
I'm in the design phase of what I'm hoping will be a big website (lots of traffic, lots of users reading and writing to database).
I want to offer this website in the three languages I speak myself (English, French, and by the time I finish the website, I will hopefully have learned enough Spanish to offer that too)
Dilemma:
I'm wondering how I should go about offering these various languages (and perhaps more in the future).
Criteria:
Many methods exist for designing multi-language websites. I'm looking for the technique that will result in a faster browsing experience for the user.
Choices:
Currently, I can think of (and have read about) the following choices. They are sorted in order of preference up to now.
Store all language-specific strings
in a database and fetch the good one
depending on prefered-language
(members can choose which language
they prefer),
browser-default-language and which
language is selected during the
current session, in that order.
Pros:
Most of the time, a single
test at the beggining of the session
confirms which language to use for
the remainder of the session (stored
in a SESSION variable). Otherwise, a
user logging in also fetches the
right language and keeps it until
he/she logs out (no further tests). So the testing part should be
pretty fast.
Cons:
I'm afraid that accessing the
database all the time would be quite
time-consuming (longer page load for
the user), especially considering
that lots of users could also be
accessing the database at the same
time for the same reason (getting the website text in the correct language), but also
for posting comments and the such.
Strings which include variables
(e.g. "Hello " + user.name + ", how
are you?") are harder to
store because the variable (e.g.
user name) changes for each user.
A direct link to a portal for a specific language would be ugly (e.g. www.site.com?lang=es)
Store all language-specific strings
in a text file and fetch the good one
depending on prefered-language
(members can choose which language
they prefer),
browser-default-language and which
language is selected during the
current session, in that order.
Pros:
Most of the time, a single
test at the beggining of the session
confirms which language to use for
the remainder of the session (stored
in a SESSION variable). Otherwise, a
user logging in also fetches the
right language and keeps it until
he/she logs out (no further tests). So the testing part should be
pretty fast.
Cons:
I'm afraid that accessing the
text file all the time would be quite
time-consuming (longer page load for
the user), especially considering
that lots of users could also be
accessing the file at the same
time for the same reason (getting the website text in the correct language).
Strings which include variables
(e.g. "Hello " + user.name + ", how
are you?") are harder to
store because the variable (e.g.
user name) changes for each user.
I don't think multiple users could access the text file concurrently, though I may be wrong. If that's the case though, every user loading a page would have to wait for his/her turn to access the text file.
Fetching the very last string of the text file could be pretty long...
A direct link to a portal for a specific language would be ugly (e.g. www.site.com?lang=es)
Creating multiple versions of the website in seperate folders, where each version is in a different language.
Pros:
No extra-treatment is needed for handling languages, so no extra waiting time.
Cons:
Maintaining the website will be like going to school: painfull, long, makes you stupid after doing the same thing over and over again.
ugly url (e.g. www.site.com/es/ instead of www.site.com)
Additionnaly, the coices above could be combined with one or more of the following techniques:
Caching certain frequently requested pages (in a singleton or static PHP function?). Certain sentences could also be cached for every language.
Pros
Quicker access for frequently-requested pages.
Which pages need caching can be determined dynamically, with time.
Cons
I'm not sure about this one, but would this end up bloating the server's RAM?
Rewritting the url could be used for many things.
A user looking for direct access to one language could do so using www.site.com/fr/somefile and would be redirected to www.site.com/somefile, but with the language selected beign stored in a session variable.
Pros
Search engines like this because they have two different pages to show for two different languages
Cons
Bookmarking a page doesn't mean you'll en up with the right language when you come back, unless I put the language information in the url (www.site.com/somefile?lang=fr)
A little more info
I usually user the following technologies to make a website:
PHP
SQL
XHTML
CSS
Javascript (and AJAX)
This being said, if a solution requires that I learn a new language or something, I'm very open to doing so. I have no deadline for this project and I do intend to learn a lot from doing it!
Conclusion:
What I'm looking for is a method that allows me to offer multiple languages while not increasing page load time and not going crazy when trying to maintain the website. If you guys/gals have other ideas I should consider, I will try adding them to my list. Another possibility is that I'm overdoing this. Maybe I won't gain enough time with these methods for this all to be worth it, I just don't know how to verify if I need to worry about this or not.. so if you have any ideas for that, it would also help me.
Whether you use a database or a filesystem to store the translations, you should be loading the text all at once and then serving it from memory. Most applications will typically not have so much text that this becomes a problem. In Java or .Net this could be accomplished by storing the text in a singleton or static object. Then all the strings are in RAM and do not need to be loaded or parsed. If your platform does not have a convenient way to store data in ram, you could run a separate caching application such as memcached.
The rest of your concerns can be mitigated by hiding the details. Build or find a framework that lets you load your translations and then look them up by some key. If you decide to switch to files or a database later, the rest of your code is unaffected. In the short term do whichever is easier for you. I've found that it's best to have a mix: it's easier to manage application text along with the source code in a version control system. But some text changes often, or needs to change without requiring a build+deployment cycle, and that text should be in the DB.
Finally, don't build strings with substitutions in them. Use some kind of format string, because otherwise your translators will go crazy trying to translate sentence fragments.
(Warning: Java code sample)
//WRONG
String msg = "Hello, " + username + ", welcome back.";
//RIGHT
String fmt = "Hello, %s, welcome back."; // in real code: load this string from a file or the db
String msg = fmt.format(username);
Another person mentioned encoding the language in the URL. This is the preferred way to do it if you care what a search engine thinks of your site. Google recommends using different hostnames or a different subdirectory. This means that the language headers sent by the user can't be used for anything, except perhaps initially sending them to one landing page or another. You will need to determine the language for each request based on the incoming URL (this actually simplifies your code a lot later on). In Java I'd store the language code in the Request and just grab it whenever I need it.
The easiest way to handle language codes in the URL is to use re-writing. A client sends a request for www.yoursite.com/de/somepage and internally you re-write the request to www.yoursite.com/somepage and store the language identifier somewhere. In Java each request has an HttpServletRequest object where you can store attributes for the lifecycle of the request. If your framework doesn't have anything like that you can just add a parameter to the url: www.yoursite.com/de/somepage => www.yoursite.com/somepage?lang=de. If you are using hostname-based languages you can use hostnames such as de.yoursite.com or www.yoursite.de. There are pros and cons to using this approach. For one thing, using country-code TLDs means registering new TLDs and trying to figure out whether a country code is appropriate to represent a language (it's often not). Using differnet hostnames/domains means you have to consider under what domains cookies are stored. If you want a cookie-free subdomain you need to plan this carefully. But from the coding side a language-based hostname doesn't need any additional re-writing; you can read the hostname (it's the Host header in the HTTP request) and parse that to determine the language.
Offer the initial page in a language depending on the Accept-Language HTTP header.
Let the user set the language in the current session and, if they're authenticated, in their user profile.
In your code and templates, mark strings as "translatable." You should have tools that gather all the strings from your codebase and let your translaters translate them.
Have a layer which loads the translations from the database either individually or as a bundle, and apply them to the page which is loading. Cache these parts to make them fast -- every page load shouldn't make a hundred calls to the database for every translatable string.
Checkout how Django does it -- it should be enlightening.
"I'm afraid that accessing [the database/text file] all the time would be quite time-consuming"
It would be, but that's why you'd likely be using caching to some extent. Nearly all large sites are accessing data stored outside the HTML page itself and, as such, utilize caching techniques as needed.
Your question regarding speed really is irrelevant to having multiple languages. It's an issue of storing data (content) so it's easy to maintain and present to the user. Whether it's one language or 10 the problem is the same.
Create the most generic form of the site as you can. Import the translation from a database, with fall back (i.e. an order of languages, if a translation does not exist then use the next best langauge (For German: German, Dutch, English etc).
You would solve performance issues by keeping caches of the dynamically created pages. [Check the dependent data and update if necessary]
The perfered language that a user would like is passed along in the HTTP request headers. Having a select language+query string would often be unnecessary.
Resource files would be one way to go. It is easier to send to translators. However it can be difficult to resuse amongst multiple websites.
Databases are convient because it is the first thing that should be backed up on a website. It also has the benefit of being fast. However, if you have an extremely database focused project, you may not want to add additional strain on your database.
For my solutions I want this:
The language should be indicated in the URL, it works better with google indexing the page and people following the links in google's search result.
As much pre-generated translations as possible, for faster page-serving.
The first is quite easily done by having an URL like http://example.com/fr/and-so-on. URL rewriting can turn that into http://example.com/and-so-on?lang=fr which is potentially easier to handle.
For pre-generating translations, it is good to use a html template framework so you can generate translated templates from one set of source templates. A blunt approach is to generate a sed-script from a language key-value files, and run that sed script on each template to get a translated version.
What remains then is to translate the dynamically generated parts of the pages. There are a few tools for that java has bundles, gnu gettext is a quite nice tool.