How do I retrieve actual worker answers to an MTurk qualification test?

I have a qualification test that I'd like to tune. If there are questions that are frequently missed, I will consider swapping them out or adding clarification.
The test is part of a Qualification Type. The qualification is granted automatically after the worker finishes the test, with a value equal to the number of correct answers. My HITs require a minimum score on the qualification type before workers can accept them.
I don't see a view in the requester UI to see actual worker responses.
GetQualificationsForQualificationType only returns the status and value (score), even though the example response in the documentation includes answers. I've tried both the Java SDK and web-service calls. The SDK also has a getQualificationRequests() method, which returns an array of QualificationRequest; that class has getTest() and getAnswer() methods, but the call returns null for this qualification type, so I can't see what they would return. It seems to me that qualification requests only exist for qualification types without tests, which are granted manually.
Does anyone know a way to get the actual answers? Thanks!

There's no way to do this if you're using an AnswerKey. There are two general strategies to get around this:
Do not set the qualification to be auto-granted (i.e., do not use an AnswerKey). Then you can use the GetQualificationRequests operation to see actual answers to individual questions. This may not work if you want to qualify lots of workers quickly, although you could write a script that polls for new requests and approves them based on the answers, while saving the qualification test answers locally. (They are no longer available from MTurk once the qualification has been granted.)
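As an illustration of that polling approach, here is a minimal sketch using the boto3 MTurk client (not the legacy Java SDK mentioned in the question); the qualification type ID and the save_answers_locally/grade helpers are hypothetical placeholders you would supply yourself:

import boto3

QUAL_TYPE_ID = "YOUR_QUALIFICATION_TYPE_ID"  # placeholder

mturk = boto3.client("mturk", region_name="us-east-1")

def poll_and_grant():
    resp = mturk.list_qualification_requests(
        QualificationTypeId=QUAL_TYPE_ID, MaxResults=100
    )
    for req in resp.get("QualificationRequests", []):
        # Save the raw answer XML locally *before* granting; it is no longer
        # retrievable from MTurk once the qualification has been granted.
        save_answers_locally(req["WorkerId"], req["Answer"])  # hypothetical helper
        score = grade(req["Answer"])  # hypothetical scoring function
        mturk.accept_qualification_request(
            QualificationRequestId=req["QualificationRequestId"],
            IntegerValue=score,
        )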
You can specify your qualification scores in such a way that each score uniquely identifies a pattern of answers. A simple way to do this is something like a three-question qualification test that scores each question with a different power of ten:
a. Q1: correct = 1, incorrect = 0
b. Q2: correct = 10, incorrect = 0
c. Q3: correct = 100, incorrect = 0
Then scores of 1, 10, or 100 indicate one correct answer; 11, 101, or 110 indicate two correct answers; and a score of 111 indicates three correct answers. You can then use the In comparator for your QualificationRequirements to require, for example, that a worker has a score that is "In" 11, 101, 110, or 111 if you want them to have two or more correct answers.
There are obviously other scoring patterns that would similarly produce uniquely identifiable patterns of scores.
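For instance, a small Python sketch (assuming the 1/10/100 weighting above) that decodes a granted score back into per-question correctness and builds the list of scores to use with the In comparator:

from itertools import combinations

def decode_score(score):
    """Recover per-question correctness from a score built with weights 1/10/100."""
    return {
        "Q1": (score // 1) % 10 == 1,
        "Q2": (score // 10) % 10 == 1,
        "Q3": (score // 100) % 10 == 1,
    }

# decode_score(101) -> {"Q1": True, "Q2": False, "Q3": True}

# Scores that mean "at least two questions correct", usable with the In comparator:
at_least_two = sorted(sum(c) for r in (2, 3) for c in combinations((1, 10, 100), r))
# -> [11, 101, 110, 111]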

Related

Query time for a specific entity is 10000 times higher [closed]

We've run into a problem: a SELECT with a filter on a certain id takes a very long time. For every other id it takes about 5 ms; for this one, 10 seconds.
Here are the EXPLAIN plans (screenshot): the left one is the normal query, the right one is the slow one. It is exactly the same SQL query; the only difference is the value in 'where id = ...'.
It is striking that a Filter appears in the right-hand plan but not in the left-hand one, along with a huge 'Rows Removed' count. Such a number could only be obtained by multiplying the row counts of the joined tables. Once again, the SQL query is identical except for the entity id, and the amount of data retrieved per entity is comparable.
One of the tables also uses a btree index. The only thing special about this id is that it comes after a gap in the numbering (22, 23, 24, 30, for example), but I was not able to reproduce the problem based on that.
Unfortunately, I cannot show the code, but I hope this information is enough to suggest something.
Update:
I found the reason. For some reason Postgres expects one of the tables to return only 1 row, when the real return is 10k+ rows, and it therefore chooses the wrong join algorithm. For other entity ids it estimates correctly and chooses better algorithms. How does Postgres estimate the row counts in a plan? What could be the problem?
If I understand correctly, your problem is the data histogram (column statistics). It is hard to support you concretely because you cannot provide example code, but briefly: one of your tables has an id column whose values are very unevenly distributed. For example, your table has 1 billion records and most ids have about 500 records each, yet a few ids have, say, 20 or 200 million records. If you search for one of these highly non-selective values, the database optimizer will not help you.
Check your data histogram!
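For example, a minimal sketch (in Python with psycopg2; the connection string, table name, and column name are placeholders) that inspects the statistics the planner uses for its row estimates and then refreshes them:

import psycopg2

conn = psycopg2.connect("dbname=mydb user=me")  # placeholder connection string

with conn, conn.cursor() as cur:
    # pg_stats shows the sampled histogram and most-common-values list the
    # planner uses to estimate how many rows a filter will match.
    cur.execute(
        """
        SELECT n_distinct, most_common_vals, most_common_freqs
        FROM pg_stats
        WHERE tablename = %s AND attname = %s
        """,
        ("my_table", "id"),
    )
    print(cur.fetchone())

    # If the statistics are stale or too coarse for a skewed column,
    # widen the sample and re-analyze.
    cur.execute("ALTER TABLE my_table ALTER COLUMN id SET STATISTICS 1000")
    cur.execute("ANALYZE my_table")

If the problematic id does not appear in most_common_vals, the planner may fall back to a poor estimate; increasing the statistics target and re-analyzing often helps in that case.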

Limiting the result of intersection between two sets in Redis efficiently

I have an exam software system, and one of its features is to show students random questions from a huge set, given that the question has never been shown to the student before. I'm using Redis to implement it, so I made two types of sets in my Redis DB: the first one is the question bank, and then each user has his own set of previously viewed questions that gets updated after the user sees a question in an exam.
However, in order to meet the requirement, I need to find 10 questions from the question bank for each exam that the user has never seen before. I thought of using:
SDIFFSTORE nextQuestionsToShow questionBank userQuestionsSet
SRANDMEMBER nextQuestionsToShow 10
and after processing the result, I delete the produced set nextQuestionsToShow.
However, I think this is inefficient (time- and memory-wise), since it's an online exam system available to users throughout the day, and the question bank has a huge number of questions per category (some categories have over 100K questions). This means the difference is a huge set, per user, that has to be stored only to select 10 random questions. So is there a more efficient way to select 10 random questions from the question bank that the user hasn't answered before? Thanks a lot in advance.
Instead of using a SET to store userQuestionsSet and questionBank, you can use bitmaps (Redis STRINGs) to store these two sets. Then you can use BITOP to efficiently get the difference between the two bitmaps.
UPDATE
First of all, you need to assign each question a unique number. Then use one bitmap to store userQuestionsSet and another for questionBank. Say you have the following questions in the bank: 1: question1, 2: question2, 3: question3, 4: question4, 5: question5. And the user has already viewed question3:
// initialize question bank (bit offsets 1-5 set): 01111100
SETBIT question-bank 1 1
SETBIT question-bank 2 1
...
SETBIT question-bank 5 1
// user has viewed question3 (bit offset 3 set): 00010000
SETBIT user 3 1
Get the difference between the question bank and the user's viewed questions:
// XOR gives the difference here because the user's bitmap is a subset of the bank's:
// 01111100 XOR 00010000 = 01101100
BITOP XOR result question-bank user
// bits set in result -> questions not viewed: 1, 2, 4, 5
GET result
When you GET the binary string stored in result, you can scan the string and randomly get 10 questions for the user.
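A rough sketch of that flow in Python with redis-py (the key names and question-numbering scheme are assumptions):

import random
import redis

r = redis.Redis()

def pick_unseen_questions(user_id, bank_key="question-bank", count=10):
    # XOR works as a set difference here because the user's bitmap only ever
    # contains questions that are also in the question bank.
    user_key = "user:%s:seen" % user_id
    diff_key = "tmp:diff:%s" % user_id
    r.bitop("XOR", diff_key, bank_key, user_key)
    raw = r.get(diff_key) or b""
    r.delete(diff_key)

    # Collect the offsets of the set bits (offset 0 is the most significant
    # bit of the first byte, matching SETBIT's numbering).
    unseen = [
        byte_index * 8 + bit
        for byte_index, byte in enumerate(raw)
        for bit in range(8)
        if byte & (0x80 >> bit)
    ]
    return random.sample(unseen, min(count, len(unseen)))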
NOTE
Be careful: SETBIT can be an expensive operation when it has to grow the underlying string, so you'd better pre-allocate memory for these bitmaps. See the WARNING part of the SETBIT documentation for details.

Can PROC SQL embedded in SAS macros dynamically merge two data sets, simulating residential treatment placement decisions for troubled youth?

Good afternoon and happy Friday, folks
I'm trying to automate a placement simulation of youth into residential treatment where they will have the highest likelihood of success. Success is operationalized as "not recidivating" within 3 years of entering treatment. Equations predicting recidivism have been generated for each location, and the equations have been applied to each individual in the scenario (based on youth characteristics like risk, age, LOS, etc.). Each youth has predicted success rates for every location, which throws in a wrench: youth are not qualified for all of the treatment facilities for which they have predicted success rates. Indeed, treatment locations have differing, yet overlapping, qualifications.
Let's take a made-up example. Johnny (ID # 5, below) is a 15-year-old boy with drug charges. He could have "predicted success rates" of 91% for location A, 88% for location B, 50% for location C, and 75% for location D. Johnny is most likely to be successful (i.e., not recidivate within three years of entering treatment) if he is treated at location A; unfortunately, location A only accepts youth who are 17 years old or older, so Johnny would not qualify for treatment there. For Johnny, location B is the next best location. Let us assume that Johnny is qualified for location B, but that all of location B's beds are filled; so we must now look to location D, as it is now Johnny's "best available" option at 75%.
The score so far: we are matching youth to available beds in the location for which they qualify and might enjoy the greatest likelihood of success. Unfortunately, each location only has a certain number of available beds, and the number of available beds differs across locations. The qualifications for entry into treatment facilities differ, yet overlap (e.g., 12-17 year-olds vs. 14-20 year-olds).
In order to simulate what placement decisions might look like based on success rates, I went through the scenario described above for over 400 youth, by hand, in Excel. It took me about a week. I'd like to use PROC SQL embedded in a SAS macro to automate these placement scenarios with the ultimate goals of a) obtaining the ability to bootstrap iterations in order to examine effect sizes across distributions, b) saving time, and c) preventing further brain damage from banging my head against desk and wall in frustration whilst doing this by hand. Whilst never having had the necessity -- nay, the privilege -- of using SQL in my typical role as a researcher, I believe that time has now come to pass and I'm excited about it! Honestly. I believe it has the capacity I'm looking for. Unfortunately, it is beating the devil out of me!
Here’s what I’ve got cookin’ so far: I want to create and automate the placement simulation with the clever use of merging/joining/switching/or something like that.
I have two datasets (tables). The first dataset contains all of the youth information (one row per youth; several columns with demographics and location ranks, which correspond to the predicted success rates). The order of rows in the youth dataset was/will be randomly generated (to simulate the randomness with which youth enter the system and are subsequently placed into treatment). Note that I will be "cleaning" the youth dataset prior to merging such that rank-column cells will only be populated for programs for which the respective youth qualifies. This should take the "does the youth even qualify for the program" problem out of the equation.
However, it still leaves the issue of availability left to be contended with in the scenario.
The second dataset contains the treatment facility beds, with each row corresponding to an available bed in one of the treatment locations; two columns contain bed numbers and location names. Each bed (row) has only one location cell populated, but each location populates several rows.
Thus, in descending order, I want to merge each youth row with the available bed that represents his/her best chance of success, and so the merge/join/switch/thing should take place
on youth.Rank1= distinct TF.Location,
and if youth.Rank1≠ TF.location then
merge on youth.Rank2= TF.location,
if youth.Rank2≠ TF.location then merge at
youth.Rank3 = TF.location, etc.
Put plainly: "Merge on rank1 unless the rank1 location is no longer available, then merge on rank2, unless the rank2 location is no longer available, and on down the line, until all options are exhausted and foster care (i.e., alternative services) is the only option."
I’ve had no success getting this to work. I haven’t even been successful getting the union function to work. About the only successful thing I’ve done in SQL so far is create a view of a single dataset. It’s pretty sad. I’ve been following this guidance, but I get hung up around the “where” command:
proc sql;            /* Calls the SQL procedure */
create table x as    /* Tells SAS to create a table called x */
select ...           /* Specifies the column(s) to be selected */
from ...             /* Specifies the table(s) (data sets) to be queried */
where ...            /* Subsets the data based on a condition */
group by ...         /* Classifies the data into groups based on the specified column(s) */
order by ...         /* Sorts the resulting rows (observations) by the specified column(s) */
;
quit;                /* Ends the PROC SQL procedure */
Frankly, I'm stuck and I could use some advice. The greenhorn in me is in way over his head.
I appreciate any help or guidance anyone might lend.
Cheers!
P
The process you describe (and to be honest I skipped to the end, so I might have missed something) does not lend itself to SQL, because each step can affect the results of the next one. However, you want to get the best results for the most kids. (I think a lot of that text was to convince us how important it is to help out.) You don't actually give us anything we can really use to help, since you don't give any details of your data model, your data, or expected results, so there is really no way to answer this question precisely. But I don't care -- I'm going to go forward with some suggestions, because it is a Friday and I've never done a stream-of-consciousness answer to a stream-of-consciousness question before. I suggest you don't formulate your solution purely in SQL, but instead use a higher-level program and engage in a process like the one described below (a rough sketch in code follows the steps) -- and because this is a DB question, I've noted the places where the DB might be involved.
1. Generate a list of kids (this can be in a table -- call it NEEDY-KID).
2. Have a list of locations to assign (this can also be a table, LOCATION).
3. Run your matching for best fit from kid to location -- at this point don't worry about assigning more than one kid to a location; there can be duplicates (put this in a table called KID2LOC using a query).
4. Check KID2LOC for locations assigned twice -- use some method to remove the duplicates so each location is only assigned once (remove from KID2LOC using a query).
5. Prune the LOCATION list to remove assigned locations (once again, a query).
6. If kids exist without a location, go to step 3 with the new, pruned location list.
7. Done.
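Not SAS, but to make the idea concrete, here is a rough sketch of the sequential "best available bed" logic in Python; the preference lists and bed counts are made-up placeholders, and each youth's ranked list is assumed to already be filtered to locations they qualify for:

# prefs[youth_id] = locations ordered by predicted success, best first,
#                   already limited to locations the youth qualifies for.
# beds[location]  = number of open beds at that location.
prefs = {
    5: ["A", "B", "D", "C"],
    6: ["B", "A", "C", "D"],
}
beds = {"A": 1, "B": 1, "C": 2, "D": 1}

placements = {}
for youth_id, ranked in prefs.items():       # youths in (random) arrival order
    for location in ranked:                   # best remaining option first
        if beds.get(location, 0) > 0:
            placements[youth_id] = location
            beds[location] -= 1
            break
    else:
        placements[youth_id] = "alternative services"   # no qualified bed left

print(placements)

The same loop could be bootstrapped by reshuffling the arrival order on each iteration, which is the kind of simulation the question describes.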

How can I set a minimum passing score for a qualification test?

I want to create a qualification test using AMT's command-line interface. This test will only use question types that can be automatically scored by AMT. I can set the individual score for each question, but I cannot find any documentation on how to set a minimum passing score for the test. For example, if a worker scores below 80, I would like them to fail the test; if they score 80 or above, I would like them to pass.
The Qualification test doesn't work as pass/fail; it assigns a QualificationScore. You can use QualificationValueMapping in the AnswerKey to translate a test score into a particular qualification value (see: http://docs.aws.amazon.com/AWSMechTurk/latest/AWSMturkAPI/ApiReference_AnswerKeyDataStructureArticle.html). The pass/fail element has to be part of your QualificationRequirement for a particular HIT - i.e., "pass" should mean a score above the QualificationRequirement's threshold and "fail" a score below it. Mapping would then allow you to translate total points into, e.g., values above and below the requirement's threshold.
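To make the requirement side concrete, here is a minimal sketch using the boto3 client rather than the legacy command-line tools the question mentions; the qualification type ID, the HIT fields, and question.xml are placeholders:

import boto3

mturk = boto3.client("mturk", region_name="us-east-1")

QUAL_TYPE_ID = "YOUR_QUALIFICATION_TYPE_ID"  # placeholder

# "Passing" is enforced on the HIT: only workers whose qualification value
# is >= 80 may accept it.
hit = mturk.create_hit(
    Title="Example HIT",
    Description="Requires a qualification score of at least 80",
    Reward="0.10",
    MaxAssignments=1,
    AssignmentDurationInSeconds=600,
    LifetimeInSeconds=86400,
    Question=open("question.xml").read(),  # placeholder QuestionForm XML
    QualificationRequirements=[
        {
            "QualificationTypeId": QUAL_TYPE_ID,
            "Comparator": "GreaterThanOrEqualTo",
            "IntegerValues": [80],
            "ActionsGuarded": "Accept",
        }
    ],
)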

Rough estimate of test cases

I'm curious how many test cases others have for a site similar to mine. It's your basic CRUD with business workflow website. 3 user roles, a couple input pages, a couple search pages, a business rule engine, etc. Maybe 50k lines of .NET code (workflow and persistence altogether). DB with about 10 main tables plus about 100 supporting tables (lookups, logs, etc.). The main UI for entering data is quite big, around 100 data fields, multiple grids, about 5 action/submit type buttons.
I know this is vague and I'm only hoping for order of magnitude figures. I'm also thinking of basic test cases, not code coverage type cases. But like if I told you we had 25 test cases I'm sure you'd say way WAY not enough. So I'm just looking for ballpark figures.
TIA
I would have as many test cases as it takes to ensure a high level of confidence in the system.
The number of tables, rules, lines of code, etc is actually immaterial.
You should have the appropriate unit tests to ensure your domain objects and business rules are firing correctly. You should have tests to ensure your queries execute appropriately (this is a harder one).
You might even want to have test cases for paths through the software. In other words: click here, get this page, click there, edit a field, save the page, go back... This type is the most difficult, as the tests are usually recorded and have to be re-recorded when the pages change (i.e., a field is added or removed).
Generally speaking, it's more about coverage than the number of tests. You want your tests to cover as much of the application's functionality as is feasible. Note that I didn't say possible. You can cover an entire application (100%) with test cases, but for every little change, bug fix, etc., you'll have to recode those tests. This is more desirable for a mature app. For newer apps you don't want to hamstring your developers and QA team that way, as they'll spend inordinate amounts of time fixing/changing unit tests...
For any system, you could easily spend as much time developing your automated tests as you do the system itself. In some cases, even more.
As for our group, we tend to have lots of unit tests. However, for testing paths through the system we only record those once a particular area has moved into a "maintenance" type of mode. Meaning we expect little change for quite a while in that area and the path test is simply to ensure no one jacked it up.
UPDATE: the comments here led me to the following:
Going a little further: Let's examine 1 small piece of code:
Int32 AddNumbers(Int32 a, Int32 b) {
return a+b;
}
On the face of it you could get away with a single test:
Int32 result = AddNumbers(1,2);
Assert.Equals(result, 3);
However, that probably isn't enough. What happens if you do this:
Int32 result = AddNumbers(Int32.MaxValue, 1);
Assert.Equals(result, (Int32.MaxValue+1));
Now we have a failure. Here's another one:
Int32 result = AddNumbers(Int32.MinValue, -1);
Assert.Equals(result, (Int32.MinValue-1));
So, we have an extremely simple method that requires at least 3 tests: the initial one to see if it gives the expected result, then 2 for bounds checking. That's 3 tests for essentially 2 lines of code (the method definition and the one-line computation).
As your code becomes more complex, things get really dicey:
Decimal DivideThis(Decimal a, Decimal b) {
    return Decimal.Divide(a, b);
}
This slight change introduces yet another exception condition beyond bounds: DivideByZero. So now we are up to 4 tests required for 2 lines of code.
Now, let's simplify it a bit:
String AppendData(String data, String toAppend) {
return String.Format("{0}{1}", data, toAppend);
}
Our test case here is:
String result = AppendData("Hello", "World");
Assert.Equals(result, "HelloWorld");
That's just one test case for the code block, with no others really needed.
What does this tell us? For starters, 2 lines of code might require between 1 and 4 test cases. You mentioned 50k lines... Using that logic (1 to 4 cases per 2 lines), you would need somewhere between 25,000 and 100,000 test cases...
Of course, life is rarely so simple. In those 50k lines of code you have, there are going to be large blocks of code that have very limited inputs. For example a mortgage interest calculator might take 3 parameters, and return 1 value (the APR). The code itself might run 100 lines or so (been awhile, just work with me). The number of test cases for this is going to be determined by edge cases along the lines of making sure you properly handle rounding.
So, let's say it's 5 cases: which brings us to 20 lines of code = 1 case. Calculating that out your 50k lines might result in 2,500 test cases. Obviously much smaller than what we expected above.
Finally, I'm going to throw another wrinkle into the mix. Some test systems can handle inputs and your assertions coming from a data file. Considering our first example, we could have a data file with a line for each parameter combination we want to test. In this scenario, we only need 1 test case to cover 3 (or more) possible conditions.
The test case might look like (pseudo code):
read input file.
parse expected result, parameter 1, parameter 2
run method
assert method result = parsed result
repeat for each line of the file
With that capability, we are down to 1 test case per scenario. I would say 1 per method, but the reality is that most methods are rarely standalone and it's entirely possible that numerous methods are implicitly tested through explicit testing of others; therefore not requiring their own individual tests.
This leads me to this: It is impossible to determine the right number of test cases without a full understanding of your code base. 5 cases that are at the UI level might be enough for complete coverage depending on the complexity of the tests; or it might take thousands. Therefore it's much better to base it on code coverage. What percentage of the code, and branching logic, are you testing?
If I asked a car salesman for a rough price of a car and he gave me a price straight away, I wouldn't buy my car there, because he forgot to ask me some important questions: what kind of car do you want? Which extras do you want on the car? Etc.
The same goes for the number of test cases. If a hiring manager asked me that question, I would probably give him the following answer:
#test cases = between #Requirements*2 and #Requirements*infinite (some requirements can lead to billions of possibilities)
I would also say that, based on my experience, the number would realistically be #Requirements*5 (the number I use at the initial phase, for projects with new, changed, and omitted functionality),
where the following error margin has to be applied, depending on the phase in which I am making the estimate:
Initiation phase : error margins = 400%
...
Testing phase : error margin = 10%
By the time you start the testing phase, detailed requirements/specs are available, the volatility of requirements has stabilized, requirements creep is almost zero, etc.
At that time I also will be able to give better estimates ...