Proof of work login instead of captcha

I've just seen this genius script on github:
https://github.com/jsavoie/proof-of-work-login
My question is: Why is POW login not a world standard right now in 2018? It's absolute genius!
Why are old fashioned captchas and recaptchas still so widespread?

The current going rate for reCaptcha solves is upwards of $3 per 1000. Meanwhile, the going rate for a t2.large spot instance is 2.78 cents per hour, for two cores with burst capability. So, for a proof of work to carry the same attack cost as reCaptcha, the POW would have to take well over a minute, maybe two, on a single 3 GHz core. That means possibly as much as 5 minutes on an iPhone 6. Most users would rather click signs.
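For concreteness, here is what a generic hashcash-style proof of work looks like: the server hands out a random, single-use challenge, the client grinds nonces until a SHA-256 digest has enough leading zero bits, and verification costs the server a single hash. This is a sketch of the general technique, not necessarily how the linked repo implements it; the difficulty value is illustrative.

```python
# Generic hashcash-style proof of work (a sketch of the technique, not
# necessarily what the linked repo does). The server issues a random,
# single-use challenge and a difficulty in leading zero bits; the client
# grinds nonces; verification costs the server only one hash.
import hashlib
import os

def leading_zero_bits(digest: bytes) -> int:
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()  # zeros inside the first non-zero byte
        break
    return bits

def make_challenge() -> bytes:
    return os.urandom(16)              # server side

def solve(challenge: bytes, difficulty: int) -> int:
    nonce = 0                          # client side: ~2**difficulty hashes expected
    while True:
        digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
        if leading_zero_bits(digest) >= difficulty:
            return nonce
        nonce += 1

def verify(challenge: bytes, nonce: int, difficulty: int) -> bool:
    digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
    return leading_zero_bits(digest) >= difficulty

challenge = make_challenge()
nonce = solve(challenge, difficulty=16)    # ~65k hashes on average; demo value only
assert verify(challenge, nonce, difficulty=16)
```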

Real-world data shows that the difficulty levels associated with Proof-of-Work would mean that significant numbers of legitimate users would be unable to continue their current levels of activity.
It seems to be a classic example of security over usability.

The main purpose of reCaptcha is to prevent automated bots.
Preventing spam is an obvious outcome of reCaptcha, since bots are usually used for spamming. Though proof of work can slow spamming down (by making the process expensive), it doesn't actually stop bots. It might be used as a separate mechanism, but it is by no means a replacement for reCaptcha, whose whole point is to detect humans.

Related

Concurrent page request comparisons

I have been hoping to find out what different server setups equate to in theory for concurrent page requests, and the answer always seems to be soaked in voodoo and sorcery. What is the approximate maximum number of concurrent page requests for the following setups?
apache+php+mysql(1 server)
apache+php+mysql+caching(like memcached or similar (still one server))
apache+php+mysql+caching+dedicated Database Server (2 servers)
apache+php+mysql+caching+dedicatedDB+loadbalancing(multi webserver/single dbserver)
apache+php+mysql+caching+dedicatedDB+loadbalancing(multi webserver/multi dbserver)
+distributed (amazon cloud elastic) -- I know this one is "as much as you can afford" but it would be nice to know when to move to it.
I appreciate any constructive criticism. I am just trying to figure out when it's time to move from one implementation to the next, because each comes with its own implementation effort, whether programming-wise or setup-wise.
In your question you talk about caching, and this is probably one of the most important factors in a web architecture with regard to performance and capacity.
Memcache is useful, but before that you should be ensuring proper HTTP cache directives on your server responses. This does two things: it reduces the number of requests and speeds up server response times (if you have Apache configured correctly). This can be improved further by using an HTTP accelerator like Varnish and a CDN.
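As a minimal illustration (assuming a plain WSGI app; in practice you would often set these headers in the Apache configuration instead), the application can mark responses as cacheable so browsers, Varnish, or a CDN can reuse them:

```python
# Minimal WSGI sketch: serve responses with cache headers so browsers and
# intermediaries (Varnish, a CDN) can reuse them instead of hitting the
# application again. Header values are illustrative, not recommendations.
from wsgiref.simple_server import make_server

def app(environ, start_response):
    body = b"<html><body>hello</body></html>"
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
        # Cacheable by shared caches for a day; tune per resource type.
        ("Cache-Control", "public, max-age=86400"),
    ]
    start_response("200 OK", headers)
    return [body]

if __name__ == "__main__":
    make_server("127.0.0.1", 8000, app).serve_forever()
```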
Another factor to consider is whether your system is stateless. Stateless usually means that it doesn't store sessions on the server and reference them with every request. A good systems architecture relies on state as little as possible: the less state, the more horizontally scalable the system. Most people introduce state when confronted with personalisation, i.e. serving up different content for different users. In such cases you should first investigate using HTML5 session storage (i.e. store the complete user data in JavaScript on the client, obviously over HTTPS), or, if the data set is smaller, secure JavaScript cookies. That way you can still serve up cached resources and then personalise with JavaScript on the client.
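A server-side variant of the same idea is to sign the per-user data and hand it to the client, so no session store is needed. The sketch below is illustrative only; the key handling and field names are made up:

```python
# Sketch: keep personalisation data on the client, signed so it cannot be
# tampered with, instead of in a server-side session store.
import base64
import hashlib
import hmac
import json

SECRET_KEY = b"replace-with-a-real-secret"   # illustrative only

def sign_state(state: dict) -> str:
    payload = base64.urlsafe_b64encode(
        json.dumps(state, separators=(",", ":")).encode()
    )
    mac = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest().encode()
    return (payload + b"." + mac).decode()

def verify_state(token: str) -> dict | None:
    try:
        payload, mac = token.encode().rsplit(b".", 1)
    except ValueError:
        return None
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest().encode()
    if not hmac.compare_digest(mac, expected):
        return None
    return json.loads(base64.urlsafe_b64decode(payload))

token = sign_state({"user": "alice", "theme": "dark"})
assert verify_state(token) == {"user": "alice", "theme": "dark"}
```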
Finally, your stack includes a database tier, another potential bottleneck for performance and capacity. If you are only reading data from the system, then again it should be quite easy to scale horizontally. If there are both reads and writes, it's typically better to separate them: send writes to one database and serve reads from another (such as a replica). You can then use more targeted methods to scale each.
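A sketch of what that routing can look like in application code (the connection targets are placeholders, and the replication mechanism plus the schema are assumed to exist already):

```python
# Sketch: route writes to the primary database and reads to replicas.
# File names stand in for real connection strings; sqlite3 stands in for a
# real driver such as a MySQL client.
import random
import sqlite3

PRIMARY = "primary.db"
REPLICAS = ["replica1.db", "replica2.db"]

def connection(for_write: bool) -> sqlite3.Connection:
    # Writes always go to the primary; reads are spread across replicas.
    target = PRIMARY if for_write else random.choice(REPLICAS)
    return sqlite3.connect(target)

def record_visit(user_id: int) -> None:
    with connection(for_write=True) as conn:
        conn.execute("INSERT INTO visits (user_id) VALUES (?)", (user_id,))

def visit_count(user_id: int) -> int:
    with connection(for_write=False) as conn:
        row = conn.execute(
            "SELECT COUNT(*) FROM visits WHERE user_id = ?", (user_id,)
        ).fetchone()
        return row[0]
```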
These setups do not spit out a single answer that you can then compare to each other. The answer will vary on way more factors than you have listed.
Even if they did spit out a single answer, it would be just one metric out of dozens. What makes this the most important metric?
Even worse, none of these alternatives is free. There is engineering effort and maintenance overhead in each of them, which cannot be analysed without understanding your organisation, your app, and your cost/revenue structures.
Options like AWS not only involve development effort but may "lock you in" to a solution so you also need to be aware of that.
I know this response is not complete, but I am pointing out that this question touches on a large complicated area that cannot be reduced to a single metric.
I suspect you are approaching this from exactly the wrong end. Do not go looking for technologies and then figure out how to use them. Instead profile your app (measure, measure, measure), figure out the actual problem you are having, and then solve that problem and that problem only.
If you understand the problem and you understand the technology options then you should have an answer.
If you have already done this and the problem is concurrent page requests then I apologise in advance, but I suspect not.

Can this online highscore scheme be abused?

Background
One problem with games using online highscore lists is that they often can be abused. The game sends the current score to the server and a cunning user can analyze the protocol/scheme and send bogus scores. That is why some highscore lists are topped with 999999 scores.
A common solution to this problem is to encrypt the score in some way, and on top of that put other mechanisms to recognize false scores. But even if you do this, it's the client that sends the score and the client is living in the user's computer and can be reverse-engineered.
My idea
I am designing/thinking about a game (that I will complete, yeah right :) ) where you configure your player/robot with instructions on how to perform a task (and when these instructions are to be carried out). When a "Go" button is pressed the game runs the instructions. Finally a result and, if successful, a score, is obtained.
So, how about this: Instead of submitting the score, the actual instructions are sent to the server, where they are run, using the same implementation. Then the server calculates the score and places the user on the highscore list.
The question
Are there ways this idea can be abused to get a false score?
I understand that this probably is not a new idea. But if it works, it wouldn't be impossible to extend it to other games too, where it is possible to record all user actions.
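To make the idea concrete, here is a toy sketch of the server-side replay; the instruction set, scoring rule, and limits are invented purely for illustration:

```python
# Sketch of the replay idea: the client submits only its instruction list,
# and the server re-runs the same deterministic simulation to compute the
# score. The toy instruction set and scoring rule are made up.
ALLOWED = {"forward", "turn", "collect"}
MAX_INSTRUCTIONS = 1000   # cap replay cost so submissions can't overload the server

def run_simulation(instructions: list[str]) -> int:
    if len(instructions) > MAX_INSTRUCTIONS:
        raise ValueError("program too long")
    score = 0
    for op in instructions:
        if op not in ALLOWED:
            raise ValueError(f"unknown instruction: {op}")
        if op == "collect":
            score += 10           # deterministic: same input, same score
    return score

def submit_to_highscores(user: str, instructions: list[str], board: dict) -> None:
    # The server never trusts a client-reported score; it derives it here.
    board[user] = run_simulation(instructions)

board: dict[str, int] = {}
submit_to_highscores("alice", ["forward", "collect", "turn", "collect"], board)
assert board["alice"] == 20
```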
People will always find a way to cheat, but this seems like a reasonable countermeasure. You'll have to consider your intended traffic levels, as your scheme will require more server resources than if it were just recording the high score sent by the client.
But, as an aside - this game sounds an awful lot like my job (giving instructions to a machine so it performs some task). No high-score board though (although, that would be awesome).
As long as the robot program's behavior doesn't depend on the speed of the computer, it'll be fine, and if the programs are quite small (at most a few kilobytes) this would work well. The only ways I can see to cheat it are if someone cloned the workspace and ran a program to search for the optimal program for the robot, then submitted that, or if someone posted the solutions and people used those. Both of these issues can be solved with randomization.
(A note about speed-dependent games: it's fine for the game to uniformly slow down if the computer can't run it at full speed, but if the physics time step depends on the frame rate, you can get problems like the jump height varying with the frame rate.)
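To illustrate the fix, here is a minimal fixed-timestep loop (the gravity toy and the numbers are illustrative): physics always advances in constant steps, so the outcome is the same however uneven the frame times are.

```python
# Fixed-timestep update loop: physics advances in constant dt steps, so the
# simulation result does not depend on the frame rate. Rendering and the
# frame clock are stubbed out for illustration.
DT = 1.0 / 60.0          # fixed physics step (seconds)

def physics_step(state: dict, dt: float) -> None:
    state["velocity"] += -9.8 * dt          # toy gravity
    state["height"] += state["velocity"] * dt

def run_frame(state: dict, frame_time: float, accumulator: float) -> float:
    # However long the frame took, consume it in fixed-size physics steps.
    accumulator += frame_time
    while accumulator >= DT:
        physics_step(state, DT)
        accumulator -= DT
    return accumulator

state = {"height": 10.0, "velocity": 0.0}
acc = 0.0
for frame_time in (0.016, 0.05, 0.016, 0.1):   # uneven frame times
    acc = run_frame(state, frame_time, acc)
```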

Acceptable load time

I am working on a point-of-sale (POS vending machine) project that shows many images on screen, and the customer is expected to browse almost all of them. Here are my questions:
Can you please suggest test cases for testing the load time of images?
What is an acceptable load time for these images on screen?
Are there any standards for testing this kind of acceptable load time?
"What is an acceptable loading time?" is a very broad question, one that has been studied as a research question for human computer interaction issues. In general the answer depends on:
How predictable the loading time is. (Does it vary, e.g. according to time of day, from 9am to 2am? Unpredictability is usually the single most annoying thing about waiting.)
How good the feedback to the user is. (Does it look broken, or is there a nice progress bar during the wait? Knowing it's nearly there can help ease the pain, even when the loading times are consistent.)
Who the users are and what other systems they have used previously. If it was all written in a book before, then waiting 2 minutes for images is going to feel positively slow. If you're replacing something that took 3 minutes, then it's pretty fast.
Ancillary input issues, e.g. does it buffer presses whilst loading and also move items around on the display so people press before it's finished and accidentally press the wrong thing? Does it annoyingly eat input soon after you've started to input it so you have to type/scan it again?
In terms of testing I'm assuming you're not planning on observing users and asking "how hard would you like to hurt this proxy for frustration?" What you can realistically test is how it copes under realistic loads and how accurate the predictions are.
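As a starting point for the measurable part, you could script the image fetches and report median and worst-case times under a realistic load profile; the URLs and run count below are placeholders:

```python
# Sketch of a load-time measurement: fetch each image several times and
# report median and worst-case latencies. URLs and run count are
# placeholders to be replaced with real assets and requirements.
import statistics
import time
import urllib.request

IMAGE_URLS = ["http://example.com/item1.png", "http://example.com/item2.png"]
RUNS = 10

for url in IMAGE_URLS:
    timings = []
    for _ in range(RUNS):
        start = time.perf_counter()
        with urllib.request.urlopen(url) as response:
            response.read()
        timings.append(time.perf_counter() - start)
    print(f"{url}: median={statistics.median(timings):.3f}s max={max(timings):.3f}s")
```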

How can I find out my web site's maximum capacity for simultaneous visitors?

How can I find out my web site's maximum capacity for simultaneous visitors?
- a kind of stress test for unexpected situations -
You are correct to think in terms of a stress test. You need to be able to reproduce the number of users you are expecting in order to know precisely how many concurrent users your application will be able to handle.
You start with a low number of users and then increase it until you reach the point where your app stops answering in an acceptable amount of time.
I'm afraid there is no simple answer to this, but the simplest approach I can think of is to write a simple script that makes GET/POST requests (maybe even using wget) and run it on a farm on Amazon EC2 or something like that, so you can truly reach the max capacity of your infrastructure.
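For example, a minimal sketch of such a script in Python (the URL, worker count, and request count are placeholders; a dedicated load-testing tool is still the better option for serious numbers):

```python
# Minimal concurrent-GET sketch: fire simultaneous requests at a URL and
# report how many succeed and how long they take. Increase WORKERS until
# latency or the error rate becomes unacceptable.
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

URL = "http://example.com/"
WORKERS = 50          # simulated concurrent visitors
REQUESTS = 500

def fetch(_):
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(URL, timeout=10) as response:
            response.read()
        return time.perf_counter() - start, True
    except Exception:
        return time.perf_counter() - start, False

with ThreadPoolExecutor(max_workers=WORKERS) as pool:
    results = list(pool.map(fetch, range(REQUESTS)))

ok = [t for t, success in results if success]
print(f"success: {len(ok)}/{REQUESTS}, avg latency: {sum(ok)/max(len(ok), 1):.3f}s")
```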
If your site is primarily static content, then you will most likely be limited by bandwidth. In this case, an estimate of the capacity can be easily calculated for a given set of expected user activities.
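For the bandwidth-limited case, the estimate really is back-of-envelope arithmetic; every figure below is an illustrative assumption:

```python
# Back-of-envelope capacity estimate for bandwidth-limited static content.
# All figures are illustrative assumptions, not measurements.
link_mbps = 100                 # server uplink
avg_page_bytes = 500 * 1024     # average page weight including assets
page_views_per_visit = 3

bytes_per_second = link_mbps * 1_000_000 / 8
pages_per_second = bytes_per_second / avg_page_bytes
visits_per_second = pages_per_second / page_views_per_visit
print(f"~{pages_per_second:.0f} pages/s, ~{visits_per_second:.0f} visits/s at saturation")
```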
If your site is built on common software, you might be able to find benchmarks of that software that will give you a rough estimate of the capacity you can expect.
If this is a critical site or it is a hand-built or highly customized application, then there is no substitute for testing. You need "web performance load testing" software - google for it. This type of software will simulate many browsers visiting your site at the same time. There is a wide variety of choices, from free to $$$$$$$$s.

Which SEO practices are likely to be responsible for SO questions appearing so quickly in Google searches?

Does anyone have some idea as to how questions posted here on SO show up so quickly on Google?
Sometimes submitted questions appear among the first 10 entries or so, on the first page, within 30 minutes of being submitted. Pray tell, what sort of magic is being wielded here?
Anybody have ideas or suggestions? My first thought is that they have info in their sitemap that tells Google's robots to crawl every N minutes or so - is that what's going on?
BTW, I am aware that simply instructing Googlebots to scan your site every N minutes will not work if you don't have quality information (that is constantly being updated) on your site.
I'd just like to know if there is something else that SO may be doing right (apart from the marvelous content of course)
To put it simply, more popular websites with more quality content and more frequent changes are ranked higher with Google's algorithm, and are indexed and cached more frequently than sites that are less popular or change less frequently.
Broadly speaking, it's only content that does it. The size and quality of the content have reached Google's threshold for "spider as fast as the site will permit". SO has to actively throttle the Googlebot; Jeff has said on Coding Horror that they were getting more than 50,000 requests per day from Google, and that was over a year ago.
If you scan through non-news sites from the Alexa top 500 you will find virtually all of them have results in Google that are just minutes old. (e.g. type site:archive.org into Google and choose "Latest" in the menu on the left)
So there's nothing practical you can do to your own site to speed up spidering, except to increase the amount of traffic to your site...
It is really simple.
SO is a PageRank 6 site that gives the world new information.
Google has a strong bias toward new information. It will crawl the site many times a day and immediately add new pages to its index. It will favor a page (top 10) for, say, a specific query for a small period of time (a few days), and then it will stop favoring that page and rank it as normal.
This is standard Google procedure and it happens with many, many sites.
As you might guess, grayhat/blackhat SEO uses that fact in many ways.
It also helps that SO provides an RSS feed; I think Google likes feeds from reliable sources.