good method for approving mechanical turk participants? - mechanicalturk

Does anyone have a good method for approving people who took part in a survey on SurveyMonkey and who were recruited through Mechanical Turk?
I filter out people who did not pay attention during the survey by asking questions with obvious answers: if people get 'n' of them wrong, I exclude them from payment.
After I download the .csv from Mechanical Turk, I paste two columns at the end of it: the MTurk ID and, next to it, a 1 or 0 indicating whether that worker will be paid.
How can I write a function that will search through the two columns containing MTurk IDs (the one that came in the .csv and the one that I pasted in) and then return whether the MTurk ID has a 1 or 0 next to it? This would make approving and rejecting so much easier.

I assume you are using a spreadsheet program since you mention "Adding two columns"? Why don't you just sort by the column with the one or zero in it to group the approved turk IDs together?
Here is how to accomplish this with a vlookup:
Assume you have your list of Turk IDs and the 1/0 approval code in columns A and B (A contains the Turk IDs and B contains a 1 or 0). Also assume you have the ID to test in column C and you are going to put the result of the vlookup test in column D:
    A - Turk ID   B - Approval   C - ID to test   D - Result
    -----------   ------------   --------------   ----------
1   ABC12345      0              DEF46253
2   ERF78878      1              HFH36251
3   HFH36251      1              ERF78878
4   DEF46253      0              ABC12345
Set the formula of cell D1 to =VLOOKUP(C1,$A$1:$B$4,2,FALSE)
Paste that into D2..D4 (obviously your list will be larger)
It will find the Turk ID in Col A and fill in the corresponding Approval value in Col D.
If you want to know what the arguments to the VLOOKUP function are: the first is the value to look for (the ID you want to check); the second is the entire range of values to check (use the $'s in front of the cell references to make them absolute, so they don't change when you paste the formula into new cells); the third is the column of that range to pull (column 2 of the range is the approval number); and the last argument is FALSE, which forces an exact match of ID to ID.
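If you want to sanity-check the spreadsheet result outside Excel, the same exact-match lookup can be sketched in Python. This is only an illustration under stated assumptions: the function name `lookup_approval` is made up for this example, and the data is the four-row example table above.

```python
# Sketch of what =VLOOKUP(C1,$A$1:$B$4,2,FALSE) does: an exact-match lookup.
# Columns A and B of the example: Turk ID -> approval flag (1 = pay, 0 = do not pay).
approvals = {
    "ABC12345": 0,
    "ERF78878": 1,
    "HFH36251": 1,
    "DEF46253": 0,
}

# Column C of the example: the IDs to test, in worksheet order.
ids_to_test = ["DEF46253", "HFH36251", "ERF78878", "ABC12345"]


def lookup_approval(turk_id, table):
    """Return the approval flag for turk_id, or None when the ID is absent
    (VLOOKUP would show #N/A in that case)."""
    return table.get(turk_id)


# Column D: one result per tested ID.
results = [lookup_approval(turk_id, approvals) for turk_id in ids_to_test]
print(results)
```

A dictionary is used because, like VLOOKUP with FALSE, it only ever returns a value for an exact key match.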
Hope that helps.

Use of Index and Match function with Count/Counta/Countif to count the results on a 3rd workbook

I have had great success with guidance from the experts on Stack. I need guidance once more. :)
Workbook 1 has names in column A and email addresses in column B. Workbook 2 has names in column A, email addresses in column B, and the data I need in column C.
In workbook 3 I would like to build a table of matched and unmatched counts using the formula I need. I show it below.
Workbook 1 pic:
Workbook 2 pic:
Workbook 3 pic:
Ultimately, I want the indexed, matched and counted totals to show up in the basic table shown in workbook 3. As you can see, there are 4 email addresses that match between workbook 1 and workbook 2, but there are also 4 email addresses that do not match. Once the matches and non-matches are found, I want the formula to give me the count of each value in column C of workbook 2, separately for the matched and the unmatched rows.
So if you just use your eyes and count, you will see that in rows 2 through 4 the email addresses match between workbook 1 and workbook 2. That would give me a total of 2 drinks expenses and 2 food expenses from the matches, and 2 tickets expenses and 2 parking expenses from the non-matches.
For the Matched:
=SUMPRODUCT((COUNTIFS('Sheet1'!A:A,'Sheet2'!$A$2:$A$100,'Sheet1'!B:B,'Sheet2'!$B$2:$B$100)>0)*('Sheet2'!$C$2:$C$100=$B3))
For the unmatched:
=SUMPRODUCT((COUNTIFS('Sheet1'!A:A,'Sheet2'!$A$2:$A$100,'Sheet1'!B:B,'Sheet2'!$B$2:$B$100)=0)*('Sheet2'!$C$2:$C$100=$B7))
These are untested, as I did not want to retype that much data, but in theory they should work.
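Since the formulas are untested, one way to cross-check their logic is a small Python sketch of the same idea: a row of workbook 2 counts as matched when its name and email pair also appears in workbook 1, and the column C values are then tallied per group. All names, emails and expense rows below are hypothetical stand-ins, because the original screenshots are not available, and `count_expenses` is a name invented for this sketch.

```python
from collections import Counter

# Hypothetical stand-in for workbook 1: (name, email) pairs.
workbook1 = [("Ann", "ann@example.com"), ("Bob", "bob@example.com")]

# Hypothetical stand-in for workbook 2: (name, email, expense type).
workbook2 = [
    ("Ann", "ann@example.com", "Drinks"),
    ("Bob", "bob@example.com", "Food"),
    ("Cy", "cy@example.com", "Tickets"),
    ("Di", "di@example.com", "Parking"),
    ("Ann", "ann@example.com", "Food"),
]


def count_expenses(book1, book2):
    """Tally expense types in book2, split by whether (name, email) is in book1.

    Mirrors COUNTIFS(...)>0 versus COUNTIFS(...)=0 in the formulas above.
    Note: Python compares strings case-sensitively; COUNTIFS does not.
    """
    known = set(book1)
    matched, unmatched = Counter(), Counter()
    for name, email, expense in book2:
        target = matched if (name, email) in known else unmatched
        target[expense] += 1
    return matched, unmatched


matched, unmatched = count_expenses(workbook1, workbook2)
print(dict(matched), dict(unmatched))
```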

How to copy to cells from a range if the columns to the right match?

I looked for a solution to this, both with VLOOKUP and programmatically, and I couldn't find one. I hope you guys can help.
I have two spreadsheets with the same headers and similar data. One is complete, the other is not. The first column (let's call it "ID") of the completed spreadsheet is messed up.
I want to copy the values from the "ID" column of the incomplete version to the new version wherever the cell to the right of each (let's call it "Names") matches.
To clarify, the algorithm or formula has to look through the "Names" column of the OLD (incomplete) version and, if it finds a match in the NEW version, copy the ID into the cell to its left.
I cannot just sort alphabetically and copy and paste, because the completed worksheet has some duplicates that may be needed.
EDIT: EXAMPLE OF MY DATA:
Sheet1            Sheet 2
ID    NAME        ID    Name   Age
112   John        156   Dog    11
113   Bob         1xx   Bob    15
156   Dog         1xx   Bob    16
                  1xx   John   18
Since the ID column is messed up (it got corrupted when exporting from Google Fusion Tables), I need to copy the IDs from the OLD version into the NEW file. This is just a simple example; I have over 200 000 rows of data.
Assuming ID is in A1 on both sheets and that the xx indicate IDs to be replaced, please add a new column A in Sheet 2 and enter this in A2 there:
=IF(ISNUMBER(B2),B2,INDEX(Sheet1!A:A,MATCH(C2,Sheet1!B:B,0)))
copied down to suit.
The first part, ISNUMBER(B2), tests for a numeric ID in the data set, which is a mixture of sound and corrupt values. If an ID is numeric but still corrupt, there may be no way to identify the corruption from the information provided.
So if that test is passed, accept the value from the corrupted sheet (i.e. B2).
If, however, the test fails, then MATCH finds the relevant name's location in the incomplete sheet (for row 2 the relevant name is Dog, which is in row 4 there) and INDEX looks up the value associated with Dog (to its left) in the incomplete sheet.
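The same keep-or-look-up logic can be sketched in Python using the example data from the question. This is an illustration, not a replacement for the formula: `repaired_id` is a name invented here, and the dictionary assumes each name appears only once in the old sheet (MATCH would take the first occurrence if there were duplicates).

```python
# Sheet1 (old, incomplete, but with sound IDs): name -> ID.
old_ids = {"John": 112, "Bob": 113, "Dog": 156}

# Sheet 2 (new, complete, but with some corrupted IDs): (id, name, age).
new_rows = [
    (156, "Dog", 11),
    ("1xx", "Bob", 15),
    ("1xx", "Bob", 16),
    ("1xx", "John", 18),
]


def repaired_id(current_id, name, lookup):
    """Mirror IF(ISNUMBER(B2), B2, INDEX(Sheet1!A:A, MATCH(C2, Sheet1!B:B, 0)))."""
    if isinstance(current_id, (int, float)):
        return current_id        # numeric ID: accept it as it is
    return lookup.get(name)      # otherwise look the name up; None ~ #N/A


repaired = [repaired_id(row_id, name, old_ids) for row_id, name, _age in new_rows]
print(repaired)
```

Duplicate names in the new sheet (the two Bob rows) simply receive the same ID, which is what the copied-down formula does as well.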
Assuming the new spreadsheet is called "New" and the old is called "Old," and assuming that the names appear in column A of each spreadsheet, starting in row 1, then use this formula in column B of the new spreadsheet:
=iferror(vlookup(New!A1,Old!$A:$A,1,false),"?????")
So your spreadsheet looks like this:
    A            B
1   Coke         =iferror(vlookup(New!A1,Old!$A:$A,1,false),"?????")
2   Pepsi        =iferror(vlookup(New!A2,Old!$A:$A,1,false),"?????")
3   Sprite       =iferror(vlookup(New!A3,Old!$A:$A,1,false),"?????")
4   asdfvasdl    =iferror(vlookup(New!A4,Old!$A:$A,1,false),"?????")
5   Dr. Pepper   =iferror(vlookup(New!A5,Old!$A:$A,1,false),"?????")
and it should display like this:
    A            B
1   Coke         Coke
2   Pepsi        Pepsi
3   Sprite       Sprite
4   asdfvasdl    ?????
5   Dr. Pepper   Dr. Pepper

Trailing Average Using AverageIf in Excel

I am trying to find the average for the last 3 instances only. I am using AVERAGEIF, which calculates the average for the entire range, but I need it to calculate only for the last 3 instances it finds (or fewer if fewer than 3 are available). I need the entire columns G and H to hold the average for the last 3 games that the team played.
This is what I have:
=AVERAGEIF(B3:C17,B17,D3:E17)
You can do this with array formulas (they have to be entered with Ctrl+Shift+Enter).
Basic steps are:
1. Find the row (at or above the current one) with the third-highest row number containing the team name (or use row 1 otherwise).
2. Use INDIRECT ranges in your AVERAGEIF, from B<that_row> to C<current_row> and from D<that_row> to E<current_row>.
So in cell F17 you would have the formula
{=AVERAGEIF(INDIRECT("B"&LARGE(IF(--($B$3:B17=B17)+($C$3:C17=B17),ROW($B$3:B17),1),3)&":"&CELL("address",C17)),B17,INDIRECT("D"&LARGE(IF(--($B$3:B17=B17)+($C$3:C17=B17),ROW($B$3:B17),1),3)&":"&CELL("address",E17)))}
We repeat some of the logic, because we have two ranges (criteria range and average range).
IF(--($B$3:B17=B17)+($C$3:C17=B17),ROW($B$3:B17),1) means: if column B or (using +) column C holds the value in B17, give me the row number, otherwise 1 (our fewer-than-3 case; we could make this 3, the first row of team names)
LARGE(...,3) will give us the third highest of this array --> the third highest row number having our team name
INDIRECT("B"&...&":"&CELL("address",C17)) is going to give us the range using our third highest row number to the current row, columns B and C
then we do exactly the same thing as you were doing in AVERAGEIF but using this INDIRECT range and the equivalent for columns D and E
Fun question! Good luck. And remember to use Ctrl+Shift+Enter to enter it!
EDIT The above was giving an #NUM! error for the first two rows - that was because the LARGE function was trying to get the third largest in an array of 2! Also noticed that there were some cases where the column letter needed to be absolute (i.e. $) for copying to the Away column. So the updated formula:
{=AVERAGEIF(INDIRECT("B"&LARGE(IF(--($B$3:$B17=B17)+($C$3:$C17=B17),ROW($B$3:$B17),1),MIN(3,ROW()-2))&":"&CELL("address",$C17)),B17,INDIRECT("D"&LARGE(IF(--($B$3:$B17=B17)+($C$3:$C17=B17),ROW($B$3:$B17),1),MIN(3,ROW()-2))&":"&CELL("address",$E17)))}
Replaced the 3 with MIN(3,ROW()-2) so that we get 3 when at least three data rows are available, but 1 or 2 if we are in one of the first two data rows
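To make the intent of the array formula easier to verify, here is a small Python sketch of the same trailing average. It assumes the layout described above (home team, away team, home stat, away stat, corresponding to columns B to E); the team names and numbers are invented, and `trailing_average` is a hypothetical helper name.

```python
def trailing_average(games, team, current, n=3):
    """Average the team's stat over its last n appearances, looking only at
    games[0] .. games[current] (the current row and the rows above it).

    Each game is (home_team, away_team, home_stat, away_stat).
    Returns None when the team has not appeared yet.
    """
    stats = []
    for home, away, home_stat, away_stat in games[: current + 1]:
        if home == team:
            stats.append(home_stat)
        if away == team:
            stats.append(away_stat)
    recent = stats[-n:]  # fewer than n entries early on, like MIN(3, ROW()-2)
    return sum(recent) / len(recent) if recent else None


# Invented example data, oldest game first.
games = [
    ("Lions", "Bears", 10, 20),
    ("Bears", "Tigers", 30, 5),
    ("Tigers", "Lions", 7, 14),
    ("Lions", "Tigers", 21, 3),
    ("Bears", "Lions", 12, 28),
]

print(trailing_average(games, "Lions", current=4))
```

Slicing with `stats[-n:]` handles the early rows naturally, which is the case the MIN(3,ROW()-2) change addresses in the spreadsheet version.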
OK, I posted this prematurely and attempted to delete it when I realised it wouldn't work. It should work now, provided you add another condition, which is the game dates in column A. Remember that this is an array formula, so hit Ctrl+Shift+Enter. Dates in column A; teams in column B; stats in column D. This formula can reside somewhere permanent on the sheet so you can enter the team name (in F13 here) to get the average of the three most recent stats.
=AVERAGE(VLOOKUP(LARGE(IF(B3:B24=F13,A3:A24),1),A3:D24,4),VLOOKUP(LARGE(IF(B3:B24=F13,A3:A24),2),A3:D24,4),VLOOKUP(LARGE(IF(B3:B24=F13,A3:A24),3),A3:D24,4))

Extract substring of list based on another list

I have two lists: one consisting of names with added information in various forms (list 1, see the example below) and one consisting of the cleanly formatted names, i.e. with no added information (list 2).
List 1
--------
Netto City | Value
Imerco City | value
Bilka Suburb | value
Bauhaus, City | Value
City FDB Superb | Value
List 2
------
Netto
Imerco
Bilka
Bauhaus
FDB Super
What I am trying to do is create a filter so that, no matter what the first column of my source data (list 1) looks like, I will be able to sum the values based on list 2.
Something similar to this: Excel - extracting data based on another list
I tried using VLOOKUP, but that does not search for substrings, so then I tried using
=IF(COUNTIF(A$4:A$9;"*"&D5&"*")>0;
INDIRECT(ADDRESS(MATCH("*"&D5&"*";A$4:A$9;0);4));"not found")
But that appears to do the opposite: it searches list 1 for a single cell value from list 2.
I can't quite work out whether this would do just as well; I haven't been able to get it to work anyway, hence my search for the other direction: search list 2 for each item from list 1.
Ultimately, what I am trying to accomplish is to create a list from the source data that I can use to categorize each item in list 1, based on list 3.
List 3
Bilka | Cat1
Imerco | Cat2
FDB Super | Cat1
etc.
For that to work, I need a clean list of the source data, without all the extra information that comes with it.
I use the following SUMIFS
=SUMIFS($F$3:$F$703;$B$3:$B$703;
"="&$H4;$D$3:$D$703;">="&I$2;$D$3:$D$703;"<="&I$3)
to sum all amounts belonging to a particular item in list 3 (which I created manually) between two dates.
The purpose of this is to create a sheet that contains all expenditures at a particular store or category of one's own choosing; for instance, the ones listed in list 1 are primarily food stores.
Edit - Clarification.
What I am proposing to do is a multistage process.
Stage 1:
Insert original source data (done)
Stage 2:
Filter source data for unique values (done)
Stage 3:
Create list of approved names for each item in source data
- I.e., Bilka Suburb into Bilka, Netto City into Netto
Here 'Netto' and 'Bilka' are approved names, which are created manually to allow for grouping in stage 4. I am looking to automate this step.
Stage 4:
Group each item from the list of Stage 3 by name and date interval (weekly, monthly, whatever). This is done, if I could only get Stage 3 to work; it already works on my manually corrected data.
Stage 5:
Select the appropriate category and type for each item in the resulting list from Stage 3:
Bilka is a food place, so it would get the category 'food', same as Netto, whereas Bauhaus would get the category 'Building Supplies'. Each of these items would get the type 'expense', whereas, say, wages would get the type 'income' (done).
The solution to stage 5 is just a VLOOKUP, based on the category, into a table that lists each category with a type, so that is simple enough.
Final Solution: Requires that the list to iterate over is in column G, and outputs the list of approved names in column H. There is the problem of it not being able to tell the difference between items such as "Super" and "SU"; I don't know how to fix that. If anyone has any suggestions on that, I am all ears.
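Sub LoopCells()
    Dim ws As Worksheet
    Dim LRApproved As Long, LRsource As Long
    Dim approvedcell As Range, sourcecell As Range

    Set ws = Worksheets("RawData")
    ' Last used row of the approved names (column H) and of the source items (column G)
    LRApproved = ws.Cells(ws.Rows.Count, "H").End(xlUp).Row
    LRsource = ws.Cells(ws.Rows.Count, "G").End(xlUp).Row

    For Each approvedcell In ws.Range("H2:H" & LRApproved).Cells 'Approved stores entered by users
        For Each sourcecell In ws.Range("G2:G" & LRsource).Cells 'items found from bank statement export
            ' Case-insensitive test: does the source item contain the approved name?
            If InStr(UCase(sourcecell.Value), UCase(approvedcell.Value)) <> 0 Then
                ' Write the approved name two columns to the right of the source item
                sourcecell.Offset(0, 2).Value = approvedcell.Value
            End If
        Next sourcecell
    Next approvedcell
End Sub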
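On the "Super" versus "SU" ambiguity, one possible approach (not something from this thread, just a sketch) is to collect every approved name that occurs in the source item and keep the longest one, on the assumption that the longer name is the more specific match. The idea is shown in Python for brevity; `best_approved_name` and the sample names are hypothetical, and the same rule could be written into the VBA loop by comparing lengths before overwriting.

```python
def best_approved_name(source, approved_names):
    """Return the longest approved name contained in source (case-insensitive),
    or None when no approved name occurs in it.

    Assumption: when several approved names match, the longest is the intended one.
    """
    haystack = source.upper()
    hits = [name for name in approved_names if name and name.upper() in haystack]
    return max(hits, key=len) if hits else None


approved = ["SU", "Super", "Netto"]  # hypothetical approved names
print(best_approved_name("City FDB Superb", approved))
```

This does not help when a short name such as "SU" happens to occur inside an unrelated word and no longer name matches; that case would still need a stricter rule, such as matching on whole words.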
Thanks for all the help.
Edit: Added final solution and VBA tag.
This works for me:
=SUM(B$3:B$7*NOT(ISERROR(SEARCH(A11,A$3:A$7))))
This assumes that your example list 1 is in range A3:B7 and your list 2 in A11:B15. Paste the above formula in cell B11 and press Ctrl+Shift+Enter to enter it as an array formula. Then you can drag-copy it all the way down to B15.
Explanation: SEARCH for e.g. "Netto" in the cells of List 1. For cells that do not contain that string, SEARCH returns an error. So we're looking for cells that do not return an error. We now have an array of booleans indicating this. Multiply it element-by-element by the array of values. In this multiplication, TRUE is interpreted as 1 and FALSE as zero, so you're screening out the values that don't correspond to "Netto".
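The masking idea in that explanation can be sketched in Python as follows. The labels are list 1 and list 2 from the question; the amounts are invented, since the question only shows "Value", and `sum_matching` is a hypothetical helper name.

```python
# List 1 from the question, with invented amounts in place of "Value".
list1 = [
    ("Netto City", 100),
    ("Imerco City", 50),
    ("Bilka Suburb", 75),
    ("Bauhaus, City", 20),
    ("City FDB Superb", 30),
]

# List 2 from the question: the clean names.
list2 = ["Netto", "Imerco", "Bilka", "Bauhaus", "FDB Super"]


def sum_matching(rows, name):
    """Sum the values of all rows whose label contains name.

    Like SEARCH, the test ignores case. Unlike SEARCH, it does not treat
    ? and * in name as wildcards.
    """
    needle = name.lower()
    return sum(value for label, value in rows if needle in label.lower())


totals = {name: sum_matching(list1, name) for name in list2}
print(totals)
```

The generator expression plays the role of the TRUE/FALSE array multiplied by the values: rows that do not contain the name contribute nothing to the sum.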
Here's a screenshot of my setup:
Perhaps I've misunderstood but can't you use SUMIF?
=SUMIF(A$4:A$9;"*"&D5&"*";B$4:B$9)
Instead of going with VBA, you can extract this with a simple, small formula:
=Index(List2!A2:A10,Match(1,Countif(List1!A2,"*"&List2!A2:A10&"*"),0))
(Press Ctrl+Shift+Enter.) This assumes you want to extract the matching list 2 name into list 1.

Excel countif Pulling apart a cell to do different things

Excel 2007
I have a row of cells with a variation of numbers and letters (which all mean something; they are not random).
It's basically a timesheet. If they take a sick day they put in S; if they take a partial sick day they put in PS. The problem is that they also put in the hours they did work, in this format: (number)/PS.
Now if it were just letters I could do =countif(range,"S") to keep track of how many S / PS cells there are. How would I keep track of the PS cells, where there is also a number separated from the PS by a slash? I also still need to be able to use that number to add to a total. Is it even possible, or will I have to format things differently to be able to keep track of all this?
Assuming this is something like what your data looks like:
    A   B   C   D      E
1   1   2   S   4/PS   8
...then you could do this:
1- add a column that just totals the "S" entries with a COUNTIF function.
2- add a hidden row beneath each real data row that copies only the numerical part of the PS entries, with this function in each column:
=IF(RIGHT(B1,2)="PS",IF(ISERROR(LEFT(B1,SEARCH("/",B1)-1)),"",INT(LEFT(B1,SEARCH("/",B1)-1))),"")
3- add another column to the right that totals the "PS" entries by summing the hidden row from step 2.
4- add another column that totals everything by just summing the data row. That will ignore the text entries automagically.
5- have a grand total column that adds those three columns up
If you don't want to see the "S" and "PS" total columns, you can of course just hide them.
So in the end, the sheet would look like this:
    A   B   C   D      E   F   G   H   I    J
1   1   2   S   4/PS   8       1   4   11   16
2               4                              <--- hidden row
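As a cross-check of the three totals, the same classification can be sketched in Python. This is a sketch under assumptions: `summarize_row` is a made-up name, cells are treated as "S", "<hours>/PS" or a plain number, and anything else is skipped.

```python
def summarize_row(cells):
    """Return (sick_day_count, partial_sick_hours, plain_hours) for one row."""
    s_count = 0      # number of full sick days ("S")
    ps_hours = 0.0   # hours worked on partial sick days ("<hours>/PS")
    worked = 0.0     # hours from plain numeric cells
    for cell in cells:
        text = str(cell).strip().upper()
        if text == "S":
            s_count += 1
            continue
        is_partial = text.endswith("/PS")
        number = text[:-3] if is_partial else text  # drop the "/PS" suffix
        try:
            hours = float(number)
        except ValueError:
            continue  # unknown code or malformed entry: skip it
        if is_partial:
            ps_hours += hours
        else:
            worked += hours
    return s_count, ps_hours, worked


# The example row from the answer above.
print(summarize_row([1, 2, "S", "4/PS", 8]))
```

Using `float` rather than an integer conversion means entries such as 7.5/PS keep their fractional hours.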
HTH...
My quick take on this is:
pass the cell value into a CSTR function, so that no matter what is entered you will be working with a string.
parse the information. Look for S, PS, or any other code you deem to be valid. Use the Left or Right functions if you need to look at part of the string.
check for a number by testing the ASCII value, or by trying a CINT function, which will only work if the string can be converted to an integer.
If you can show a sample of your cells with the variation of numbers and letters, I can give you more help. Hope this works out.
-- Mike