How can I set a minimum passing score for a qualification test? - mechanicalturk

I want to create a qualification test using AMT's command-line interface. This test will only use question types that can be automatically scored by AMT. I can set the individual score for each question, but cannot find any documentation on how to set a minimum passing score for the test. For example, if a worker scores below 80, I would like them to fail the test. If they score above 80, I would like them to pass the test.

A Qualification test doesn't work on a pass/fail basis; it assigns a QualificationScore. You can use QualificationValueMapping in the AnswerKey to translate a test score into a particular set of qualification scores (see: http://docs.aws.amazon.com/AWSMechTurk/latest/AWSMturkAPI/ApiReference_AnswerKeyDataStructureArticle.html). The pass/fail element has to be part of your QualificationRequirement for a particular HIT - i.e., "pass" means a score above the QualificationRequirement's threshold and "fail" means a score below it. The mapping then lets you translate total points into, e.g., values above and below the requirement's threshold.
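For illustration, here is a rough sketch of where the threshold actually lives, using the boto3 MTurk client (the qualification type ID and HIT parameters are placeholders; the same idea applies to the legacy command-line tools' QualificationRequirement structure):

    import boto3

    # Sketch only: qualification type ID and HIT parameters are placeholders.
    mturk = boto3.client("mturk", region_name="us-east-1")

    # The AnswerKey on the qualification type produces a QualificationScore.
    # The "pass/fail" part lives here, on the HIT, not on the test:
    requirement = {
        "QualificationTypeId": "YOUR_QUALIFICATION_TYPE_ID",  # placeholder
        "Comparator": "GreaterThanOrEqualTo",
        "IntegerValues": [80],            # minimum passing score (80 or above passes)
        "ActionsGuarded": "Accept",
    }

    mturk.create_hit(
        Title="Example HIT",
        Description="Only workers scoring 80 or above can accept this HIT.",
        Reward="0.10",
        AssignmentDurationInSeconds=600,
        LifetimeInSeconds=86400,
        Question=open("question.xml").read(),   # placeholder question form
        QualificationRequirements=[requirement],
    )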

Related

How does score multiplier operator in Watson Discovery service work?

I have a set of JSON documents uploaded to my WDS instance. I want to understand the importance of the score multiplier operator (^). The documentation just says, "Increases the score value of the search term". I tried a simple query on one field, and it multiplies the score by the number specified.
If I specify two fields and I want Watson Discovery to know which of the two fields is more important for the search, is the score multiplier applicable in this case? With two fields and the score multiplier applied to one of them, I could not identify any difference. Also, on what data types is this allowed? It didn't work with a number.
I found this out through some further experiments. The score multiplier is used when you want to increase the relative importance of a field in the query. So, for example, if you want to give more importance to Name.LastName in the example below:
Name.FirstName:"ABC",Name.LastName:"DEF"^3
Here, LastName is given more relevance, and the search results are ordered accordingly.
Hope this is useful for someone.
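As a rough illustration only, this is how the boosted query might be issued with the ibm-watson V1 Python SDK (the API key, service URL, version string, and environment/collection IDs are placeholders; adjust to your plan and SDK version):

    from ibm_watson import DiscoveryV1
    from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

    # Sketch only: credentials, service URL, and IDs are placeholders.
    discovery = DiscoveryV1(
        version="2019-04-30",
        authenticator=IAMAuthenticator("YOUR_APIKEY"),
    )
    discovery.set_service_url("https://api.us-south.discovery.watson.cloud.ibm.com")

    # Boost matches on Name.LastName by a factor of 3 relative to Name.FirstName.
    response = discovery.query(
        environment_id="YOUR_ENVIRONMENT_ID",
        collection_id="YOUR_COLLECTION_ID",
        query='Name.FirstName:"ABC",Name.LastName:"DEF"^3',
    ).get_result()

    for result in response["results"]:
        print(result["result_metadata"]["score"])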

Data Science: Using Inferential Statistics to label train dataset

A lack of high schools in remote areas is a problem for students in developing countries. Students in some locations are better than those in others, so I have to find those locations. Now, the main problem is defining "BETTER". I have made some rules that define the profile of a location.
Right now, I am concerned with the good students.
So, what I have done is:
1. Used some inferential statistics and made some rules to come to the conclusion that locations A, B, C, etc. are the most promising locations for high schools, because according to my rules these locations contain quality students.
I did all of the above to label the data, because I needed to define "BETTER" so that I can now use a machine learning algorithm to learn the factors that make a location a potential location; then, given a data point from the test data, the model will instantly tell whether the location is better or not.
Overview of the method:
For each location, I have these four pieces of information:
total_students_staying_for_high_school_education(A),
total_students_leaving_for_high_school_education_in_another_place(B),
mean_grade_point_of_students_of_type_B,
ratio (calculated as B/A).
For each location whose ratio > 1, I applied a chi-squared significance test to get a statistic telling me whether students are leaving that place in significantly greater numbers than staying. I then used ANOVA followed by a Tukey test to compare mean grade points and find the pairs of locations whose means differ, and which one is greater than the other.
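A minimal sketch of those tests with scipy/statsmodels (all numbers here are made up; substitute your own counts and grade samples):

    from scipy.stats import chisquare, f_oneway
    from statsmodels.stats.multicomp import pairwise_tukeyhsd

    # Hypothetical counts for one location: B students leaving, A students staying.
    leaving, staying = 120, 40
    chi2, p_leave = chisquare([leaving, staying])  # H0: leaving and staying equally likely
    print("chi2 =", chi2, "p =", p_leave)

    # Hypothetical mean-grade samples for three locations (students of type B).
    grades_a = [3.2, 3.5, 3.8, 3.6]
    grades_b = [2.1, 2.4, 2.0, 2.6]
    grades_c = [3.1, 3.0, 3.3, 2.9]
    f_stat, p_anova = f_oneway(grades_a, grades_b, grades_c)
    print("ANOVA p =", p_anova)

    # Tukey's HSD to see which pairs of locations actually differ.
    scores = grades_a + grades_b + grades_c
    labels = ["A"] * 4 + ["B"] * 4 + ["C"] * 4
    print(pairwise_tukeyhsd(scores, labels))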
I then wrote a Python program with a custom comparator that first checks whether the mean grades of two locations differ and returns the one with the greater mean. If the means don't differ, the comparator returns the location whose chi-squared value is greater.
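A sketch of that comparator (the location records and the means_differ helper are hypothetical stand-ins for the output of the statistics step):

    from functools import cmp_to_key

    # Hypothetical location records produced by the statistics step.
    locations = [
        {"name": "A", "mean_grade": 3.5, "chi2": 18.0},
        {"name": "B", "mean_grade": 2.3, "chi2": 25.0},
        {"name": "C", "mean_grade": 3.5, "chi2": 30.0},
    ]

    def means_differ(x, y):
        # Placeholder: in the real program this would come from the Tukey test result.
        return abs(x["mean_grade"] - y["mean_grade"]) > 0.1

    def compare(x, y):
        # Prefer the location with the greater mean grade when the means differ;
        # otherwise fall back to the larger chi-squared statistic.
        if means_differ(x, y):
            return -1 if x["mean_grade"] > y["mean_grade"] else 1
        if x["chi2"] == y["chi2"]:
            return 0
        return -1 if x["chi2"] > y["chi2"] else 1

    ranked = sorted(locations, key=cmp_to_key(compare))
    print([loc["name"] for loc in ranked])  # best candidates first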
This is how the whole process comes up with a few suggested locations, and I call those locations "BETTER".
What I am concerned about is:
1. How do I verify whether my rules are valid? Or do I even need to verify them?
2. Most importantly, is mingling statistics with machine learning as described above an appropriate approach? Is there any major leakage in the method? Can anyone suggest a more general method?

Time complexity of zadd when value has score greater than highest score present in the targeted sorted set

If every value one adds to a sorted set (redis) is one with the highest score, will the time complexity be O(log(N)) for each zadd?
Or, for such an edge case, does Redis perform an optimization (e.g., when the score is higher than the highest score already in the set, simply place the value at the top spot)?
Practically, I ask because I keep a global sorted set in my app where values are zadded with the time since epoch as the score, and I'm wondering whether this will still be O(log(N)), or whether it would be faster.
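For concreteness, the pattern described above looks roughly like this with redis-py (the key and member names are placeholders):

    import time
    import redis

    r = redis.Redis()  # assumes a local Redis instance

    # Each new member gets the current epoch time as its score, so it always
    # lands at (or near) the top of the sorted set.
    r.zadd("global_events", {"event:1234": time.time()})

    # Read back the most recent entries.
    latest = r.zrevrange("global_events", 0, 9, withscores=True)
    print(latest)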
Once a Sorted Set has grown over the thresholds set by the zset-max-ziplist-* configuration directives, it is encoded as a skip list. Optimizing insertion for this edge case seems impossible due to the need to maintain the skip list's upper levels. A cursory review of the source code shows that, as expected, this isn't handled in any special way.

How do I retrieve actual worker answers to an mturk qualification test?

I have a qualification test that I'd like to tune. If there are questions that are frequently missed, I will consider swapping them out or adding clarification.
The test is part of a Qualification Type. The worker is automatically granted the qualification after finishing the test, with a score based on the correct answers. My HITs require a minimum score on the qualification type to accept.
I don't see a view in the requester UI to see actual worker responses.
GetQualificationsForQualificationType only returns the status and value (score), though the example response in the documentation includes answers. I've tried both the Java SDK and web-service calls. The SDK also has a getQualificationRequests(), but it returns null for this qualification type. It is supposed to return an array of QualificationRequest; that class has getTest() and getAnswer() methods, but I don't see what they would return. It seems to me that qualification requests only exist for qualification types without tests, which are granted manually.
Anyone know a way to get the actuals? Thanks!
There's no way to do this if you're using an AnswerKey. There are two general strategies to get around this:
Do not set the qualification to be auto-granted (i.e., do not use an AnswerKey). Then you can use the GetQualificationRequests operation to see the actual answers to individual questions. This may not work if you want to quickly qualify lots of workers, although you could write a script to poll for new requests and approve them based on the answers, while saving the qualification test answers locally (they are no longer available from MTurk once the qualification has been granted) - see the sketch at the end of this answer.
You can specify your qualification scores in such a way that each score uniquely identifies a pattern of answers. A simple way to do this is something like a three-question qualification test that scores each question separately by a factor of ten:
a. Q1: correct = 1, incorrect = 0
b. Q2: correct = 10, incorrect = 0
c. Q3: correct = 100, incorrect = 0
Then scores of 1, 10, or 100 indicate one correct answer; 11, 101, or 110 indicate two correct answers; and a score of 111 indicates three correct answers. You can then use the In comparator for your QualificationRequirements to require, for example, that a worker has a score that is "In" 11, 101, 110, or 111 if you want them to have two or more correct answers.
There are obviously other scoring patterns that would similarly produce uniquely identifiable patterns of scores.
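A rough sketch of both strategies with the boto3 MTurk client (the qualification type ID is a placeholder, and my_scoring_function is a hypothetical helper for parsing your test's QuestionFormAnswers XML):

    import boto3

    mturk = boto3.client("mturk", region_name="us-east-1")
    QUAL_TYPE_ID = "YOUR_QUALIFICATION_TYPE_ID"  # placeholder

    def my_scoring_function(answer_xml):
        # Hypothetical: parse the QuestionFormAnswers XML and compute a score.
        return 0

    # Strategy 1: no AnswerKey/auto-grant. Poll for requests, archive the raw
    # answers, then grant a score yourself. The answers are no longer available
    # once the qualification has been granted.
    for req in mturk.list_qualification_requests(
            QualificationTypeId=QUAL_TYPE_ID)["QualificationRequests"]:
        raw_answer_xml = req["Answer"]                # archive this somewhere
        score = my_scoring_function(raw_answer_xml)
        mturk.accept_qualification_request(
            QualificationRequestId=req["QualificationRequestId"],
            IntegerValue=score,
        )

    # Strategy 2: encode per-question results in the score (1/10/100) and use
    # the "In" comparator to require at least two correct answers.
    requirement = {
        "QualificationTypeId": QUAL_TYPE_ID,
        "Comparator": "In",
        "IntegerValues": [11, 101, 110, 111],
        "ActionsGuarded": "Accept",
    }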

How should test cases look?

Let us say I want to make a boundary-value test case. I prepare and do the testing, but how do I write the test cases in a nice way? Is there any standardized way of doing this?
When writing test cases for boundary value analysis, always consider test data on the boundary, just below the boundary, and just above the boundary.
Let us take an example:
Suppose the allowed number range is 1 to 1000. Then the following conditions should be taken into consideration:
1) Test cases with test data exactly at the boundaries of the input domain, i.e. values 1 and 1000 in our case.
2) Test data with values just below the extreme edges of the input domain, i.e. values 0 and 999.
3) Test data with values just above the extreme edges of the input domain, i.e. values 2 and 1001.
Or else you could also take 0, 1, 999, 1000, and 1001.
Boundary value analysis is often considered a part of stress and negative testing.
Note: There is no hard-and-fast rule to test only one value from each equivalence class you created for input domains. You can select multiple valid and invalid values from each equivalence class according to your needs and previous judgments.
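For instance, a minimal sketch in Python with pytest (the accepts function is a stand-in for whatever input validation you are actually testing against the 1-1000 range):

    import pytest

    def accepts(n):
        # Stand-in for the system under test: allowed range is 1 to 1000 inclusive.
        return 1 <= n <= 1000

    # Boundary value analysis: on the boundaries, just below, and just above.
    @pytest.mark.parametrize("value,expected", [
        (0, False),     # just below the lower boundary
        (1, True),      # lower boundary
        (2, True),      # just above the lower boundary
        (999, True),    # just below the upper boundary
        (1000, True),   # upper boundary
        (1001, False),  # just above the upper boundary
    ])
    def test_boundaries(value, expected):
        assert accepts(value) == expected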