In Amazon Mechanical Turk, Batch HITs remain in the mTurk website UI Manager after approving HITs with the API

I am currently creating a Batch Project via the Amazon Mechanical Turk website (http://requester.mturk.com). After all the HITs have been completed, I download the CSV and approve or reject the HITs.
I am then iterating through the CSV and using the mTurk API to call ApproveAssignment or RejectAssignment on the AssignmentId for each row.
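For reference, here is a minimal sketch of that loop, assuming boto3 and the column names from the standard batch results CSV (the file name is a placeholder; adjust if your file uses different conventions):

import csv
import boto3

mturk = boto3.client("mturk", region_name="us-east-1")

with open("batch_results.csv", newline="") as f:
    for row in csv.DictReader(f):
        # In a reviewed batch file, the Approve column is marked with "x"
        # and the Reject column holds a rejection reason (an assumption:
        # adjust to however your reviewed CSV is filled in).
        if row.get("Approve", "").strip():
            mturk.approve_assignment(AssignmentId=row["AssignmentId"])
        elif row.get("Reject", "").strip():
            mturk.reject_assignment(
                AssignmentId=row["AssignmentId"],
                RequesterFeedback=row["Reject"],
            )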
When viewing the HITs as a Worker (I completed my own HITs for testing), I can see that they have been properly approved or rejected. However, when viewing as the Requester, it appears that none of the assignments have been approved or rejected, and the batch project still looks like it is waiting to be reviewed.
Any thoughts? Any help would be greatly appreciated.
Thanks!

This is a known issue. Once you operate on a HIT via the API, the batch interface no longer works correctly.
The Manage HITs Individually page should still be updated correctly, though.

Related

Cannot see my MTurk HIT (as a worker) created from the API using boto3 and Python

I am planning to run a large-scale crowdsourcing experiment on MTurk and would therefore like to do this using the API and Python, since I am very interested in the ReviewPolicies feature. I tested this (not only in the sandbox, but also in the marketplace) and can't find my created test HIT (reward set to 0.01).
What could be the reason for this? I also read in some prior questions that HITs created with the API are not visible in the developer interface. But they must be visible to workers on the website interface, mustn't they? If not, how will these HITs be found by workers on the dashboard/marketplace?
I published the HIT successfully on the marketplace via the API, and I can see the HIT in the response. I expected to also find this specific HIT on the dashboard (signed in as a worker).
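One way to track a HIT down is to print a worker-facing preview URL from the create-HIT response. A minimal sketch, assuming boto3; the question file and all parameters are placeholders, and the URL format is an assumption based on the current worker.mturk.com site:

import boto3

mturk = boto3.client("mturk", region_name="us-east-1")

with open("question.xml") as f:  # QuestionForm or ExternalQuestion XML
    question_xml = f.read()

response = mturk.create_hit(
    Title="Test HIT",
    Description="A small test task",
    Keywords="test",
    Reward="0.01",
    MaxAssignments=1,
    LifetimeInSeconds=3600,
    AssignmentDurationInSeconds=600,
    Question=question_xml,
)

hit = response["HIT"]
print("HITId:", hit["HITId"])
print("Preview: https://worker.mturk.com/projects/%s/tasks" % hit["HITGroupId"])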

"Invalid state transition" response when switching from test to live

I have a problem with the YouTube Live Streaming API, and it only occurs on one single account.
The CMS I support has a live to YouTube function that automatically schedules and delivers a livestream from our studios to YouTube as a parallel channel to our website. We support multiple teams who all authenticate their accounts against our application to do this.
About 6 weeks ago we had a single group report that they are no longer seeing their content streaming live to YouTube. All the other accounts, as well as our test channels, are working fine.
With the account in question, we can see the livestream get created, the broadcast get created, and the two get bound together. Once the encoders are started, we are able to successfully transition the stream to "TESTING" without a problem, approximately 10 minutes prior to the scheduled start time. Where we see the problem is in the final step, where we transition the stream from "TESTING" to "LIVE" at the starting time of the broadcast: we get a "(#100) Status transition not allowed" response when we attempt to transition to live. Prior to this step we retrieve the lifeCycleStatus value, and it shows as "TESTING".
If a user logs into YouTube Studio at this point, they are able to manually transition the stream to live.
Given that this works with multiple other accounts, all using a common code base and app, I am concerned that there is something about the account itself that is causing this issue. I have not been able to find any significant differences in the account settings when comparing them with our test account.
Is there any way I can get further information about why the transition is failing, or something I should be specifically looking for as a potential problem?
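For what it's worth, here is a hedged sketch using google-api-python-client that logs the lifeCycleStatus just before the transition and dumps the full error body when the call fails; the error body often carries a more specific reason code than the summary message (broadcast_id and creds are placeholders):

from googleapiclient.discovery import build
from googleapiclient.errors import HttpError

# creds: authorized OAuth2 user credentials for the affected channel (not shown).
youtube = build("youtube", "v3", credentials=creds)

broadcast_id = "YOUR_BROADCAST_ID"  # placeholder

# Check what YouTube currently reports as the broadcast's state.
item = youtube.liveBroadcasts().list(
    part="status", id=broadcast_id
).execute()["items"][0]
print("lifeCycleStatus before transition:", item["status"]["lifeCycleStatus"])

try:
    youtube.liveBroadcasts().transition(
        broadcastStatus="live", id=broadcast_id, part="status"
    ).execute()
except HttpError as e:
    # The response body usually includes a reason beyond the summary text.
    print(e.resp.status, e.content)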

Amazon Mechanical Turk: Created a Job using website UI, but would like to accept/reject jobs using the Python API

I created a data collection job (HITs) using the Mechanical Turk website.
I would like to reject/approve assignments using the Python API, because that would accelerate the process.
I can approve assignments using the Python API, but that doesn't update the assignment status on the website. Does anyone have any idea about this?
Thanks.
Unfortunately, HITs created through the website are not visible or manageable via the API (and vice-versa). That's why you can't see or operate on them through API calls. You'll need to create the HITs via the API if you want to manage them that way.
You can reuse the layouts created on the website, though. Check out this article: https://blog.mturk.com/tutorial-using-the-mturk-requester-website-together-with-python-and-boto-4a7ef0264b7e
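The gist of that approach is that a project built on the requester website exposes a HITLayoutId you can pass to the API. A minimal sketch, assuming boto3; the layout ID, parameter names, and values are all placeholders:

import boto3

mturk = boto3.client("mturk", region_name="us-east-1")

mturk.create_hit(
    HITLayoutId="3MLVX...EXAMPLE",  # copied from your project on requester.mturk.com
    HITLayoutParameters=[
        # These fill in the ${...} placeholders in the website layout.
        {"Name": "image_url", "Value": "https://example.com/photo.jpg"},
    ],
    Title="Describe the image",
    Description="Write one sentence about the image",
    Keywords="image, description",
    Reward="0.05",
    MaxAssignments=3,
    LifetimeInSeconds=86400,
    AssignmentDurationInSeconds=600,
)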

Protect from bots creating multiple free accounts and uploading files

I am developing a web application for my university where users can create an account and upload images. Images are private and can only be seen by the person who uploaded them. In essence, it is like a cloud file system.
Each user has a free account with 500MB. I am using Amazon S3 to store the images, so storage implies costs.
How can I prevent bots from uploading millions of MB? How can I prevent a bot from creating millions of new accounts and uploading 500MB per account, without affecting the user experience?
On one hand, I definitely don't want to put a CAPTCHA in the registration form because it negatively affects the conversion rate. On the other, I don't want to pay thousands of dollars because a bot uploaded millions of dummy images.
Does anyone know whether Dropbox, Google Drive, etc. suffer from this (content uploaded by bots)? It seems it is not a problem, because I couldn't find anything about it. All the spam-related problems I could read about only covered spam in forums, which also makes sense: spam in forums can be read by other users, while spam in a service like Dropbox or Google Drive reaches no one. Nonetheless, I have to protect against it to avoid cost surprises.
As far as I can see, without using CAPTCHAs this can be done:
Set up monitoring systems that warn about specific abuse patterns (e.g. the same IP uploading lots of data and creating new accounts repeatedly).
Throttle users who follow those patterns; this will hopefully make them realize the abuse is worthless. If that fails, disable those accounts and have their owners mail/talk to you in order to explain what's happening.
Since you say it's a system for your university, make users provide proof of enrollment (e.g. a university e-mail address) in case of abuse.
Make this forbidden usage explicit in your terms of use.
Of course, a smart enough bot can work around all those problems.
For a more advanced solution, you might try some machine learning or AI that learns about normal and abnormal usage patterns, then applies that information to judge a possible abuser.
I would recommend to:
make users register using their email address
disallow multiple accounts for a single email address
send them a registration confirmation email, and deactivate "unconfirmed" accounts after a short amount of time (e.g. 3 days)
AFAIK, Drupal ships with these kinds of controls out-of-the-box or with little effort (and no programming).
This won't solve all your problems, but it will reduce the risk of bot exploits.
As you said you need registration, there are two points at which to tackle this problem: make sure no bots register, and/or limit the number of uploads.
I would personally use both. For the user signup, design a registration form where the user has to enter their email address, send them a mail with a link in it, and activate their account only after they click this link. Or have the user solve a simple math question on signup.
For the second point, you can store the number of uploaded bytes per user and time period. You can then set a quota on allowed upload volume per time period, for example no more than 10MB per hour. If a user hits this limit more than n times, you can deactivate their account; a sketch of this check follows below.
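A minimal sketch of that quota check in Python; the in-memory storage, the limits, and the deactivation helper are all assumptions (a real system would persist this in a database):

import time
from collections import defaultdict

UPLOAD_LIMIT_BYTES = 10 * 1024 * 1024   # e.g. 10MB ...
WINDOW_SECONDS = 3600                   # ... per hour
MAX_VIOLATIONS = 3                      # deactivate after n violations

uploads = defaultdict(list)      # user_id -> [(timestamp, size_bytes), ...]
violations = defaultdict(int)    # user_id -> number of quota violations

def try_upload(user_id, size_bytes):
    now = time.time()
    # Drop uploads that have aged out of the current window.
    uploads[user_id] = [(t, b) for t, b in uploads[user_id]
                        if now - t < WINDOW_SECONDS]
    used = sum(b for _, b in uploads[user_id])
    if used + size_bytes > UPLOAD_LIMIT_BYTES:
        violations[user_id] += 1
        if violations[user_id] >= MAX_VIOLATIONS:
            deactivate_account(user_id)  # hypothetical helper in your app
        return False                     # refuse the upload
    uploads[user_id].append((now, size_bytes))
    return True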
Also, set up an alerting and monitoring system: for example, monitor the number of non-activated users, monitor the amount of uploads, etc., and set up alerts if these exceed a certain threshold.
The methods mentioned above may not be perfect and probably won't block out all bots, but they will at least make it much harder for bots to upload unwanted data. These methods are also quite simple, so you can start off with your project and see whether this is really a problem. And if bots do upload data, you will at least receive alerts and can devise a better solution afterwards.

Running MTurk HITs on external website

I am implementing a website on which recruited MTurk workers will perform tasks. I plan to recruit workers through MTurk HITs, which will redirect them to an external website for the actual work. I have the following questions about this plan.
Are there any foreseeable problems with this approach to running HITs? If so, how can we mitigate them?
How should I implement the authentication procedure on my external site? For example, how can I make sure the people who come to the website to perform a specific task are indeed the same group of people recruited earlier for this particular task on MTurk?
When the workers finish the task, how should I integrate the payment procedure with MTurk based on their performance? For example, say a worker is owed $3 after finishing the task on my external site; is it possible for me to tell MTurk to pay him/her this amount programmatically?
The external site will be built using Python, if such detail matters.
Any suggestions and comments based on your experiences and insights in using MTurk would be much appreciated!
I am thinking through this for a similar project of mine, and I've experimented as a worker myself. Here is my plan; I hope it is of use to you. (I have not implemented it yet; it is based on an academic HIT I participated in as a worker.)
A. Create a template that has language something like:
1. Please open this web site in a new browser window:
http://your-url.xyz.blah/tasks/${token}
2. Read and follow the instructions there.
3. After completing the task, you will receive a confirmation code. Paste
it here: [________]
B. Create some random tokens for your Mechanical Turk data file:
1A1B43B327015141
09F49F2D47823E0C
B5C49A18B3DB56F4
4E93BB63B0938728
CCE7FA60BFEB3198
...
(Generate these tokens from your app; it needs to cross-reference them.)
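A minimal sketch of step B in Python, assuming the secrets module; the CSV file name is a placeholder, and the column header must match the ${token} placeholder in the template:

import csv
import secrets

def make_token():
    # 16 hex characters, like the examples above; one-time use.
    return secrets.token_hex(8).upper()

tokens = [make_token() for _ in range(100)]

with open("mturk_input.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["token"])              # header must match ${token}
    writer.writerows([t] for t in tokens)

# Also persist the tokens server-side so step C can cross-reference them.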
C. Your app extracts the token from the URL, looks up the task, and does whatever it needs to do. I personally don't worry about people stumbling onto a URL, since it is a one-time-use token.
D. After a user completes the task on the external web site, the external app gives a confirmation code. The confirmation code should be random and opaque. Only your application will know if any particular code corresponds to a correct or incorrect answer. In fact, if you want, the correctness may not even be determined in real time -- it could be the result of an aggregation and/or comparison across multiple submissions.
E. Write some code to interact with MTurk programmatically, as in the sketch below. Take the token and confirmation code supplied in the MTurk result and make sure they match in your external app. If they don't match, reject the HIT. If they match, check the correctness in your external app and approve or reject accordingly. You might also consider a bonus pay structure.
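A hedged sketch of step E with boto3; lookup_confirmation_code is a hypothetical helper backed by your external app's database, and the bonus amount is a placeholder:

import boto3

mturk = boto3.client("mturk", region_name="us-east-1")

def review(assignment_id, worker_id, token, submitted_code):
    expected = lookup_confirmation_code(token)  # hypothetical; None if token unknown
    if expected is None or submitted_code != expected:
        mturk.reject_assignment(
            AssignmentId=assignment_id,
            RequesterFeedback="Confirmation code did not match.",
        )
        return
    mturk.approve_assignment(AssignmentId=assignment_id)
    # Optional bonus pay structure: base reward plus a performance bonus.
    mturk.send_bonus(
        WorkerId=worker_id,
        AssignmentId=assignment_id,
        BonusAmount="3.00",                  # placeholder amount
        Reason="Performance bonus for the external task.",
    )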
So, to answer your particular questions:
I don't anticipate problems with the approach I described. That said, Mechanical Turk is both an art and a science. Perhaps more art. Writing good questions and paying Turkers appropriately is something you have to figure out with a combination of common sense, market research, and experimentation.
See (C) above. A token is designed to only be used once. Use long enough tokens and the probability of collision becomes very low.
See (E) above. The Mechanical Turk Developer Guide is a good place to start.
Please share your results back. Or have the Turkers send StackOverflow hundreds of postcards. :)
Notes:
I'm currently exploring qualification tests. I suspect they can be very useful.
I want to get a Turker's Worker ID in my external application, but I haven't figured that part out yet. I'm reading up on it; for example: Getting workerId by assignmentId
I am thinking about using the ExternalQuestion feature from the API: "... you can host the questions on your own web site using an "external" question. ... A HIT with an external question displays a web page from your web site in a frame in the Worker's web browser. Your web page displays a form for the Worker to fill out and submit. The Worker submits results using your form, and your form submits the results back to Mechanical Turk. Using your web site to display the form gives your web site control over how the question appears and how answers are collected."
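Note that with an ExternalQuestion, MTurk loads your ExternalURL with assignmentId, hitId, and workerId appended as query parameters, which also addresses the Worker ID question above. A minimal sketch, assuming boto3; the URL and HIT parameters are placeholders:

import boto3

external_question = """<ExternalQuestion xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2006-07-14/ExternalQuestion.xsd">
  <ExternalURL>https://your-url.xyz.blah/tasks</ExternalURL>
  <FrameHeight>600</FrameHeight>
</ExternalQuestion>"""

mturk = boto3.client("mturk", region_name="us-east-1")

mturk.create_hit(
    Title="External task",
    Description="Complete a short task on our site",
    Keywords="external, task",
    Reward="0.50",
    MaxAssignments=1,
    LifetimeInSeconds=86400,
    AssignmentDurationInSeconds=1800,
    Question=external_question,  # the worker's browser loads your page in a frame
)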
You might also find PsiTurk to be useful: "PsiTurk is an open platform for conducting custom behavioral experiments on Amazon's Mechanical Turk. ... It is intended to provide most of the backend machinery necessary to run your experiment. It uses AMT's External Question HIT type, meaning that you can collect data using any website. As long as you can turn your experiment into a website, you can run it with PsiTurk!"