Can you work on your own Mechanical Turk task without getting paid (and without paying Amazon)?

Is it possible to work on your own Mechanical Turk task without getting paid, and thus without paying Amazon (to avoid the ~20% fee)? I would like to crowdsource my task while still being able to work on it myself in my free time.

Technically, you can deploy a copy of your project to the Amazon Mechanical Turk Developer Sandbox and work on it in the Worker Sandbox for free.
However, you do need to pay attention to the guidelines and policies of these sandbox sites. The only banned action I have found so far is:
Load or performance testing your MTurk application using the MTurk Developer Sandbox is not allowed. If you have a special need, please contact us here to specify your special circumstance. You will be contacted by a member of our team to discuss your requirements.
Therefore, I assume that as long as you don't flood the sandbox sites with bots or a large number of workers, it should be fine to work on your own project yourself.
You may still want to confirm this with MTurk Support, though.

Considerations for Creating Industrial Applications (Native/Web)

What considerations are needed when creating a web app that is intended to be used in an industrial plant setting for a company? My specific use case is an industrial facility with several different production plants that would each have its own device for the application interface.
How do companies enforce the use of such apps on a monitor/tablet? For example, could I prevent operators from using anything else on the tablet?
Importantly, how would security work? They'd share a device. There may be multiple operators using the app in a given shift. Would they all use the same authentication session (this is not preferable, as I'd like to uniquely identify the active user)? Obviously I could use standard usernames/passwords with token-based sessions that expire; however, this leaves a lot of room for account hijacking. Ideally, they'd be able to log on very quickly (PIN, perhaps?) and their session would end when they are done.
As long as there is an internet connection, I would presume that there isn't much difference between native applications and web-based or progressive web apps. Is this assumption correct?
What's the best way of identifying which device the application is being run on?
Is this a common thing to do in general? What other technologies are used to create software that obtains input from industrial operators?
--
Update: this is a good higher-level view of the question; however, it has become apparent why focused, specific questions are helpful. As such, I will follow up with these more specific questions:
Identifying the Area/Device a Web Application is Accessed On
Enforcing Specific Application Use on Tablets
Best Practices for Web App Authentication in Industrial Settings
I'm not able to answer everything in great detail, but here are a few pointers. In an environment like the one you describe, we usually see one of two options: 1) you tell them what you need (internet access, security, whether they give you the devices and how those will be configured), or 2) they tell you exactly what you need to deliver.
1. I do not think you can prevent it 100%. We did it by providing the tablets (well, laptops in our case), and the OS configuration took care of that; the downside was that we then had those devices to support. You seem to imply that there is always an internet connection, so I guess you could collect information about the system and have it sent back to you daily?
2. We were allowed to "tap" into their attendance software: once you had entered the facility, you could log in with your 4-digit PIN, and if you were off the premises, you could not log in at all. I can imagine the following: you log in with your username and password, which does full verification, and after that you can use a 4-digit PIN to log in for the next n hours.
3. Maybe, kind of; it depends on what you are doing. Does the browser have all the features you need? Our system needs multicast to perform really fast, so we have a native app.
4. I touched on this in 1. You could also use a device enrolment process. You can also require contractually that only your software is installed, and that installing anything else may invalidate the support contract. It really depends on your creativity. My favourite (and it works): just tell them that only my software will be installed, and if anything else is, they will pay me double for support. I have only seen one customer install some crap on the device after being told not to.
5. It really depends on what industry you are talking about; every industry is different. We almost always build a custom solution.
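The scheme suggested above has two steps. First, a full username/password login does the real verification. After that, a 4-digit PIN works for the next n hours. Below is a minimal, in-memory Python sketch of that flow. Several details are my own assumptions rather than anything from the answer: the 8-hour window, the 3-failure lockout, the PBKDF2 iteration count, and the names `full_login` and `pin_login`. A real deployment would persist this state server-side and tie it to the shared device's session.

```python
import hashlib
import hmac
import os
import time
from dataclasses import dataclass
from typing import Optional

PIN_WINDOW_SECONDS = 8 * 3600   # hypothetical "n hours" after a full login
MAX_PIN_FAILURES = 3            # hypothetical lockout threshold
PBKDF2_ITERATIONS = 100_000


def hash_secret(secret: str, salt: bytes) -> bytes:
    """Slow, salted hash so stored passwords/PINs are not kept in plain text."""
    return hashlib.pbkdf2_hmac("sha256", secret.encode(), salt, PBKDF2_ITERATIONS)


@dataclass
class Operator:
    username: str
    pw_salt: bytes
    pw_hash: bytes
    pin_salt: bytes
    pin_hash: bytes
    full_login_at: Optional[float] = None  # time of the last full verification
    failed_pins: int = 0


def make_operator(username: str, password: str, pin: str) -> Operator:
    pw_salt, pin_salt = os.urandom(16), os.urandom(16)
    return Operator(username, pw_salt, hash_secret(password, pw_salt),
                    pin_salt, hash_secret(pin, pin_salt))


def full_login(op: Operator, password: str, now: Optional[float] = None) -> bool:
    """Username/password check; on success, opens the PIN window."""
    now = time.time() if now is None else now
    if hmac.compare_digest(hash_secret(password, op.pw_salt), op.pw_hash):
        op.full_login_at = now
        op.failed_pins = 0
        return True
    return False


def pin_login(op: Operator, pin: str, now: Optional[float] = None) -> bool:
    """Quick PIN login, only valid inside the window opened by full_login."""
    now = time.time() if now is None else now
    if op.full_login_at is None or now - op.full_login_at > PIN_WINDOW_SECONDS:
        return False  # window closed: a full login is required
    if op.failed_pins >= MAX_PIN_FAILURES:
        return False  # too many wrong PINs: a full login is required
    if hmac.compare_digest(hash_secret(pin, op.pin_salt), op.pin_hash):
        op.failed_pins = 0
        return True
    op.failed_pins += 1
    return False
```

A 4-digit PIN has only 10,000 possible values, so hashing it does little against offline guessing. What actually limits abuse is the expiring window plus the lockout, and a streak of wrong PINs forces the operator back to a full login.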
1. Enforcement of device/app usage depends on the customer. If the customer asks for help with enforcement, you can provide guides, training, and workshops. If the customer is serious about enforcement, it will be a policy adopted by the whole organization from the top down. Senior staff usually resist a workflow change more than juniors, so top management/executives should deal with that. A real-life story: an SAP team took 6 months to transform a major newspaper's workflow, and during that time a few senior staff were fired because they refused to adapt to the change.
2. Security shouldn't handicap the users. Usually, in an industrial environment, the network is isolated, or at least restricted, with a VPN connecting multiple sites (plants, in your case). Regarding the active user: we usually provide guides, training, and workshops for the users and inform them that using a colleague's account or device will prevent the system from tracking their accomplishments and tasks, so each user is responsible for making sure the active account/device is the one assigned to them.
3. It depends. With native you have more control than with web, but if the app just does monitoring, most apps today use the web for it, and the common way to receive input is through REST APIs (even if the industrial devices don't support a REST API, middleware could be written to transform their output). If you need more depth on native vs. web, you should ask a new question with more detail about the requirements.
4. It depends on the technology you are using (native or web) and on the things I mentioned in point 2: you can use a whitelist of devices that are allowed to run the app. Overall, there are many good ways to track down the device.
5. How common is it in general? I think that kind of information could only be obtained by a survey; the world is full of variations. And something being common does not mean it is safe or best; our industry keeps changing at all levels. So to stay in the loop, we must keep learning and self-updating without a reboot.
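The answers above mention device enrolment and a whitelist of devices allowed to run the app. One common way to do both, and to identify which device a request comes from, is to give each device its own random token at enrolment. The sketch below assumes that approach. The `DeviceRegistry` class, its methods, and the idea of keeping the token on the tablet (for example in a cookie or local storage for a web app) are hypothetical. Neither answer specified them.

```python
import hashlib
import secrets
from typing import Dict, Optional


class DeviceRegistry:
    """Hypothetical server-side whitelist mapping enrolled devices to locations."""

    def __init__(self) -> None:
        self._devices: Dict[str, str] = {}  # sha256(token) -> location label

    @staticmethod
    def _digest(token: str) -> str:
        return hashlib.sha256(token.encode()).hexdigest()

    def enroll(self, location: str) -> str:
        """Create a token for a new device; it is shown once and stored on the device."""
        token = secrets.token_urlsafe(32)
        self._devices[self._digest(token)] = location
        return token

    def identify(self, token: str) -> Optional[str]:
        """Return the device's location label, or None if it is not whitelisted."""
        return self._devices.get(self._digest(token))

    def revoke(self, token: str) -> None:
        """Remove a lost or replaced device from the whitelist."""
        self._devices.pop(self._digest(token), None)
```

The registry stores only a hash of each token, so a leaked registry does not let anyone impersonate a device. The location label answers the "which plant/area is this?" question on every request.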

Instagram Automation without API allowed?

My two partners and I are about to create software that automates liking, commenting, and following on Instagram using browser simulation (that is, we log into the user's account through a browser, like Google Chrome).
Is that kind of automation allowed by Instagram? And if not, is there any way to get it approved?
Yes, it's against their terms. I wouldn't bother or risk it. Instagram is actively suing bot services. Look at the biggest bot service, Instagress: it mysteriously shut down entirely.
They're also penalizing accounts that use bots. I run an agency and have seen my clients' engagement mysteriously drop by 50-90% for a seemingly endless amount of time after using bots.
I imagine the purpose of doing it with "browser simulation" like Chrome is to try to avoid detection? Good luck. Instagram is smart and of course has some of the best programmers in the world who know how to combat this type of stuff.
I would say that such an operation goes against Instagram's Terms of Use. Under "General Description", section 10:
We prohibit crawling, scraping, caching or otherwise accessing any content on the Service via automated means, including but not limited to, user profiles and photos (except as may be the result of standard search engine protocols or technologies used by a search engine with Instagram's express consent).
Since you will be accessing content (and performing actions) via automated means, I would interpret that as a violation of this section.

Protect from bots creating multiple free accounts and uploading files

I am developing a website for my university where users can create an account and upload images. Images are private and can only be seen by the person who uploaded them; in other words, it is like a cloud file system.
Each user has a free account with 500 MB. I am using Amazon S3 to store the images, so storage costs money.
How can I prevent bots from uploading millions of MB? How can I prevent a bot from creating millions of new accounts and uploading 500 MB per account, without affecting the user experience?
On the one hand, I definitely don't want to put a CAPTCHA in the registration form, because it negatively affects the conversion rate. On the other, I don't want to pay thousands of dollars because a bot uploaded millions of dummy images.
Does anyone know whether Dropbox, Google Drive, etc. suffer from this (content uploaded by bots)? It seems not to be a problem, because I couldn't find anything about it. All the spam-related problems I could find covered only spam in forums. That also makes sense: spam in forums can be read by other users, while spam in a service like Dropbox or Google Drive reaches no one. Nonetheless, I have to protect against it to avoid cost surprises.
As far as I can see, you can do the following without using CAPTCHAs:
Set up monitoring systems that warn about specific abuse patterns (e.g. the same IP uploading lots of data and repeatedly creating new accounts).
Throttle users that follow those patterns; this will hopefully make the process worthless to them. If this fails, disable those accounts and have their owners email or talk to you to explain what's happening.
Since you say it's a system for your university, make users provide proof of enrollment (e.g. a university e-mail address) in case of abuse.
Make this forbidden usage explicit in your terms of use.
Of course, a smart enough bot can work around all those problems.
For a more advanced solution, you might try some machine learning or AI that learns about normal and abnormal usage patterns, then applies that information to judge a possible abuser.
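The monitoring step above looks for the same IP creating accounts repeatedly. It could start as a sliding-window counter like the one below. This is an in-memory Python sketch. The class name and the default thresholds (5 signups per hour) are assumptions, and a real service would keep these counters in shared storage.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional


class SignupMonitor:
    """Flags IP addresses that create too many accounts within a sliding window."""

    def __init__(self, max_signups: int = 5, window_seconds: float = 3600.0) -> None:
        self.max_signups = max_signups
        self.window_seconds = window_seconds
        self._events: Dict[str, Deque[float]] = defaultdict(deque)

    def record_signup(self, ip: str, now: Optional[float] = None) -> bool:
        """Record one signup from `ip`; return True if the IP is now over the limit."""
        now = time.time() if now is None else now
        events = self._events[ip]
        events.append(now)
        # Drop signups that have fallen out of the window.
        while events and now - events[0] > self.window_seconds:
            events.popleft()
        return len(events) > self.max_signups
```

On a university network, many students may share the same NAT address, so the threshold would need tuning. A flag should probably trigger throttling or review rather than an automatic ban.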
I would recommend that you:
make users register using their email
don't allow multiple accounts for a single email
send them a registration confirmation email, and deactivate "unconfirmed" accounts after a short amount of time (e.g. 3 days)
AFAIK, Drupal provides this kind of control out of the box or with little effort (and no programming).
This won't solve all your problems, but it will reduce the risk of bot exploits.
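Three of the recommendations above fit together in one place: one account per email, confirmation by email, and deactivation after about 3 days. The sketch below shows that logic. Actually sending the email is left out. The `Registry` class, the token length, and the lower-casing of addresses before comparing them are illustrative assumptions.

```python
import secrets
import time
from dataclasses import dataclass
from typing import Dict, List, Optional

CONFIRM_WINDOW_SECONDS = 3 * 24 * 3600  # the "e.g. 3 days" from the answer


@dataclass
class Account:
    email: str
    created_at: float
    token: str
    confirmed: bool = False
    active: bool = True


class Registry:
    def __init__(self) -> None:
        self.by_email: Dict[str, Account] = {}

    @staticmethod
    def _key(email: str) -> str:
        return email.strip().lower()  # assumption: treat addresses case-insensitively

    def register(self, email: str, now: Optional[float] = None) -> Optional[str]:
        """Create an unconfirmed account; return the token to put in the email link."""
        now = time.time() if now is None else now
        key = self._key(email)
        if key in self.by_email:
            return None  # one account per email address
        token = secrets.token_urlsafe(24)
        self.by_email[key] = Account(key, now, token)
        return token

    def confirm(self, email: str, token: str, now: Optional[float] = None) -> bool:
        """Mark the account confirmed if the token matches and has not expired."""
        now = time.time() if now is None else now
        acct = self.by_email.get(self._key(email))
        if acct is None or not acct.active:
            return False
        if now - acct.created_at > CONFIRM_WINDOW_SECONDS:
            return False
        if not secrets.compare_digest(acct.token, token):
            return False
        acct.confirmed = True
        return True

    def deactivate_stale(self, now: Optional[float] = None) -> List[str]:
        """Deactivate accounts still unconfirmed after the window; return their emails."""
        now = time.time() if now is None else now
        stale = []
        for acct in self.by_email.values():
            if acct.active and not acct.confirmed and now - acct.created_at > CONFIRM_WINDOW_SECONDS:
                acct.active = False
                stale.append(acct.email)
        return stale
```

Running `deactivate_stale` periodically, for example from a scheduled job, is what cleans up bot-created accounts that never confirm.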
Since you said you need registration, there are two ways to tackle this problem: make sure no bots register, and/or limit the amount of uploads.
Personally, I would do both. For user signup, design a registration form where users have to enter their email address, send them an email with a link in it, and activate their account only after they click this link. Or let the user solve a simple math question on signup.
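The "simple math question" alternative can work without server-side state if the server signs the expected answer. This is one possible sketch, not the answer author's design. The HMAC-signed token, the 10-minute expiry, and the key handling are all assumptions.

```python
import hashlib
import hmac
import random
import secrets
import time
from typing import Optional, Tuple

SERVER_KEY = secrets.token_bytes(32)  # hypothetical; load from config in practice
CHALLENGE_TTL = 600                   # seconds a challenge stays valid (assumption)


def _sign(answer: int, expires: int) -> str:
    """HMAC over the expected answer and expiry, so the client cannot forge either."""
    msg = f"{answer}:{expires}".encode()
    return hmac.new(SERVER_KEY, msg, hashlib.sha256).hexdigest()


def make_challenge(now: Optional[float] = None) -> Tuple[str, str]:
    """Return (question shown on the form, token sent back in a hidden field)."""
    now = time.time() if now is None else now
    a, b = random.randint(1, 9), random.randint(1, 9)
    expires = int(now) + CHALLENGE_TTL
    return f"What is {a} + {b}?", f"{expires}:{_sign(a + b, expires)}"


def check_answer(answer: str, token: str, now: Optional[float] = None) -> bool:
    """Accept the signup only if the answer matches the signed value and is unexpired."""
    now = time.time() if now is None else now
    try:
        expires_str, sig = token.split(":", 1)
        expires = int(expires_str)
        value = int(answer.strip())
    except ValueError:
        return False
    if now > expires:
        return False
    return hmac.compare_digest(sig, _sign(value, expires))
```

Any bot written specifically against your site can solve a + b, and a token can be replayed until it expires. This only filters out generic, untargeted bots, which matches the answer's own caveat that these methods won't block everything.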
For the second point, you can store the number of uploaded bytes per user and time. You can then set a quota on allowed uploads per time period; for example, no more than 10 MB per hour. If a user hits this limit more than n times, you can deactivate their account.
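The per-user hourly quota with a strike count might look like the sketch below. The 10 MB/hour figure comes from the answer. The strike limit, the `UploadQuota` name, and calling it before accepting or presigning an S3 upload are assumptions.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Set, Tuple

MB = 1024 * 1024


class UploadQuota:
    """Per-user upload budget per time window, with strikes and deactivation."""

    def __init__(self, limit_bytes: int = 10 * MB, window_seconds: float = 3600.0,
                 max_strikes: int = 3) -> None:
        self.limit_bytes = limit_bytes
        self.window_seconds = window_seconds
        self.max_strikes = max_strikes          # the "n" from the answer
        self._uploads: Dict[str, Deque[Tuple[float, int]]] = defaultdict(deque)
        self._strikes: Dict[str, int] = defaultdict(int)
        self.deactivated: Set[str] = set()

    def try_upload(self, user: str, size: int, now: float) -> bool:
        """Return True and record the upload if it fits the user's current budget."""
        if user in self.deactivated:
            return False
        recent = self._uploads[user]
        # Forget uploads older than the window.
        while recent and now - recent[0][0] >= self.window_seconds:
            recent.popleft()
        used = sum(nbytes for _, nbytes in recent)
        if used + size > self.limit_bytes:
            self._strikes[user] += 1
            if self._strikes[user] > self.max_strikes:  # "more than n times"
                self.deactivated.add(user)
            return False
        recent.append((now, size))
        return True
```

This limits only the upload rate. The 500 MB total per account from the question would be a separate check.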
Also, set up an alerting and monitoring system. For example, monitor the number of non-activated users, the amount of uploads, etc., and set up alerts if these exceed a certain threshold.
The methods above may not be perfect and probably won't block all bots, but they will at least make it much harder for bots to upload unwanted data. These methods are also quite simple, so you can start off with your project and see whether this is really a problem. And if bots do upload data, you will at least receive alerts and can devise a better solution afterwards.

How can I download information from bank accounts?

There are a number of free finance tracking sites out there, like mint.com, wesabe.com, etc.
I've tried all of them, and all seem to miss the mark in one way or another. I'm interested in creating my own website, or possibly just a standalone Windows program, for tracking my finances in ASP.NET or C#.NET.
I'm assuming the answer is no, but is there any way for a personal developer to download transactions from financial websites the way these sites do? I know that once you log in to most financial sites, you can download a CSV or Quicken file. Yet I really like how I can log in to my Mint.com account and update all my accounts with one click.
Popular applications (like Quicken) and most major US banks support Open Financial Exchange (OFX). If a bank can connect to Quicken, it probably supports OFX (though not guaranteed).
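Suppose you can get an OFX file, whether downloaded manually or from a bank's OFX server if it offers one. Reading the transactions out of it is then straightforward. The sketch below only parses a file; it does not connect to any bank. It assumes the common OFX 1.x SGML layout, where each `<STMTTRN>` block contains leaf tags such as `<DTPOSTED>`, `<TRNAMT>`, `<FITID>`, and `<NAME>`, and those leaf tags are often left unclosed. It is a regex shortcut, not a full OFX parser.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from decimal import Decimal
from typing import List

# One <STMTTRN>...</STMTTRN> aggregate per transaction.
_TXN_RE = re.compile(r"<STMTTRN>(.*?)</STMTTRN>", re.S | re.I)
# Leaf elements: "<TAG>value", with or without a closing tag (OFX 1.x often omits it).
_FIELD_RE = re.compile(r"<([A-Z0-9.]+)>([^<\r\n]*)", re.I)


@dataclass
class Transaction:
    posted: datetime
    amount: Decimal
    name: str
    fitid: str


def parse_ofx_transactions(ofx_text: str) -> List[Transaction]:
    txns = []
    for block in _TXN_RE.findall(ofx_text):
        fields = {tag.upper(): value.strip() for tag, value in _FIELD_RE.findall(block)}
        # DTPOSTED looks like 20240105 or 20240105120000[-5:EST]; keep the date part.
        posted = datetime.strptime(fields["DTPOSTED"][:8], "%Y%m%d")
        txns.append(Transaction(posted, Decimal(fields["TRNAMT"]),
                                fields.get("NAME", ""), fields.get("FITID", "")))
    return txns
```

`FITID` is the bank's identifier for each transaction, so it is a natural key for skipping duplicates when the same period is downloaded twice.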
I doubt very many banks have public APIs for this. More likely than not, you will need to send HTTPS requests to the various banking websites, and you will probably have to have custom code for each bank that you wish to support, tailored to the structure of their websites and their form elements.

How would you go about making an application that automatically retrieves your bank account balance twice a day?

I'm building a utility that will hopefully keep my wife in tune with how much money we have available.
I need a simple secure way of logging into my bank account and retrieving the balance.
Something like mechanize is the only method I can think of. I'm not even sure if that would work given the properly authenticated https that banks use.
Any ideas?
Write a Perl script using LWP::UserAgent. It supports HTTPS connections. The only issue might be if the site requires JavaScript.
Web Client Programming with Perl has a few examples to get you started if you're not too familiar with Perl.
If you really want to go there, get these extensions for Firefox: Live HTTP Headers, Firebug, FireCookie, and HttpFox. Also download cURL and a scripting language that can run cURL command-line tasks (or a scripting language like PHP or Perl that has access to cURL libraries directly).
I've started down this road for some idempotent GET tasks, like getting PDFs of the S&P reports (for the stocks I track) from my online brokerage and downloading the check images for my bank account. Both are slow, repetitive ways of getting data onto my computer, and the financial institutions don't provide any way to make them easier.
Here's why you shouldn't: (as a shortcut I'm going to call the archetypal large bank, brokerage, or other financial institution "BloatBank")
BloatBank is not likely to publish its API for accessing this kind of information, so it can change at any time and all your hard work will be for naught. Whenever they change their mechanism, you'll have to adapt.
If BloatBank finds out you've been using automatic scripting to try to access your account information, they may ban you because you've violated their terms of service.
You might screw up, and the interaction between the hodgepodge of scripts on BloatBank's server, and your scripts that access your account, might cause a Bad Thing like closing your account. Testing this kind of script is tremendously difficult because you don't have any documentation about how their online service works, and you don't have a test account you can mess with.
(a variant of the above) You think you're safe because you're issuing GET requests. But BloatBank is just a crazy bank that doesn't know anything about REST, so there are some GET requests that can mess up your account.
If someone else does use your script to maliciously sniff your online password or mess with your account, any liability coverage from BloatBank may disappear because you've opened a security hole.
Why don't you teach your wife how to login to the bank herself? Or use Quicken (or Mint, etc) and teach her how to use the auto-download feature?
Have you checked out Watir? It is fantastic for automating web-browser actions. And since it's written in Ruby, you can take the results and store them in a DB (or email them to yourself) if needed.
If you are open to AIR, I'd say build an AIR app. I have worked with mechanize and I think it's cool. AIR gives you similar features with a richer GUI (see HTMLLoader and DOM manipulation of webpage).
If I were you, I'd simply pull the page and manipulate the DOM to suit my visual needs.
Please, if you find this easy to do for your bank please post your bank's name. If I have the same one I'll be closing my account.
More to your question: loading a web page inside your code rather than in a browser can be a black art, especially if there is any JavaScript involved. Your best bet would probably be to embed the IE Web Browser control in your app and then simulate keystrokes and mouse clicks to arrive at your balance page. Then scrape the HTML for the balance.
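The last step, scraping the balance out of the page's HTML, can be sketched with Python's standard `html.parser`. The sketch does not log in, run JavaScript, or drive a browser. It assumes you already have the HTML of the balance page, and that the balance sits in an element with a known id. The default id `available-balance` is purely hypothetical, since every bank's markup differs and can change at any time.

```python
import re
from decimal import Decimal, InvalidOperation
from html.parser import HTMLParser
from typing import List, Optional

# Elements that never get a closing tag, so they must not change nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TextById(HTMLParser):
    """Collects all text inside the element whose id attribute equals target_id."""

    def __init__(self, target_id: str) -> None:
        super().__init__()
        self.target_id = target_id
        self.depth = 0                 # > 0 while inside the target element
        self.parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def extract_balance(html: str, target_id: str = "available-balance") -> Optional[Decimal]:
    """Return the amount shown in the target element, e.g. "$1,234.56" -> 1234.56."""
    parser = TextById(target_id)
    parser.feed(html)
    parser.close()
    cleaned = re.sub(r"[^0-9.\-]", "", "".join(parser.parts))  # drop "$", commas, spaces
    if not cleaned:
        return None
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        return None
```

The depth counting assumes reasonably well-nested markup. Formats such as parentheses for negative amounts are not handled.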
I could try paying for Quicken and letting it do the balance downloading. Then I'd just need to find a way to get the number out of the software automatically.
This way I'm not violating any terms of service and I'm also reducing security risk since all "hacking" goes on locally.