Combining condition files - psychopy

I am working on an experiment and have built parts of it in Coder and parts in Builder, but I am stuck at one point. The experiment presents two different lists of words to people (list A and list B), and each word in its respective list is randomly paired with a number at the beginning of the experiment. PsychoPy shows both the word and the number to participants, and it is important that, once randomly paired, the word-number pairs stay yoked for the remainder of the experiment. I have used Coder to randomize the pairing and construct a conditions file for each of the two word lists on the fly. Builder then uses these constructed conditions files to present the stimuli (words and numbers).
This is the part where I'm stuck. After the two word lists and their paired numbers are presented, I need to present a subset of both lists A and B as well as a third list of word-number pairs that was not previously presented. So, for example, a person might see something like this during the experiment:
First presentation:
List A:
frank - 1, susan - 3
List B:
shoe - 2, dingy - 1
Second presentation:
frank - 1, shoe - 2, hotel - 4
The beginning of the experiment is where Coder is used to create the word and number lists and to write the two lists' conditions files. That code is below:
import random
import csv
studylista = ["shoe", "bear", "balls", "dingy"]
pointslista = [1, 2, 3, 4]
listaRand = random.sample(studylista, len(studylista))
listapointsRand = random.sample(pointslista, len(pointslista))
# Python 3: open CSV files in text mode with newline='' ('wb' only works in Python 2)
with open('WordsandPointslista.csv', 'w', newline='') as w:
    writer = csv.writer(w)
    writer.writerow(['studylista', 'pointslista'])
    for i in range(len(listaRand)):
        writer.writerow([listaRand[i], listapointsRand[i]])
studylistb = ["frank", "robert", "daniel", "susan"]
pointslistb = [1, 2, 3, 4]
listbRand = random.sample(studylistb, len(studylistb))
listbpointsRand = random.sample(pointslistb, len(pointslistb))
with open('WordsandPointslistb.csv', 'w', newline='') as w:
    writer = csv.writer(w)
    writer.writerow(['studylistb', 'pointslistb'])
    for i in range(len(listbRand)):
        writer.writerow([listbRand[i], listbpointsRand[i]])
I need a random subset of the two previously presented lists along with an additional list that has not been presented to be seen all together by the participant. The previous word-number pairings for the already seen lists also need to be preserved. I cannot seem to discover how to do this.
I currently have the two word-number lists presented in separate routines with loops around each one. I am trying to figure out how to create a third routine that will show only some of the previously seen word-number pairs along with some new word-number pairs.
Thanks.

In your code above, don't create two separate two-column CSV files, but combine them into a single four-column file. This file can be used in multiple loops. If I understand your design correctly, it would be used first in a loop to present all the 'A' word/number pairs, and then again in a second loop to present all the 'B' word/number pairs. Lastly, use it in a final loop to present just a subset of the 'A' & 'B' pairs. This subsetting is applied via the "Selected rows" field in the loop dialog. Randomisation is optional in the first two loops, as you have already shuffled the rows, but would likely be necessary in the third loop to avoid presenting rows in the same order as in the first two loops.
Then there is the question of how to handle the third set of word/number pairs. The easiest approach would be to create them at the same time as the A & B sets and put them in the same CSV file. In that case, though, you would need as many new words and numbers as there are rows, and some of them wouldn't be presented, because the final loop only runs through a subset. The alternative is a second code component before the third loop that reads in the existing file, shuffles the rows, subsets them, and then adds the new columns. That is, it does much of what the Builder loop would otherwise do for you, but it lets you avoid 'wasting' words, if that is important to you.
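A minimal sketch of that second code component, assuming the single four-column file described above. The demo file names, the new column names studylistc/pointslistc, and the third list (hotel, lamp) are all hypothetical. Each output row holds one previously seen A pair, one previously seen B pair and one new pair, as in the "frank - 1, shoe - 2, hotel - 4" example:

```python
import csv
from random import shuffle  # in Builder, the already-imported numpy shuffle works the same way on lists

AB_FIELDS = ['studylista', 'pointslista', 'studylistb', 'pointslistb']

def build_test_conditions(src, dst, new_words, new_points):
    """Write dst with a random subset of src's A/B rows plus one new (C) pair per row.

    The number of test rows equals len(new_words), so no new words are wasted.
    """
    with open(src, newline='') as f:
        rows = list(csv.DictReader(f))
    shuffle(rows)                    # whole rows move, so each word stays yoked to its points
    rows = rows[:len(new_words)]     # random subset of the previously presented pairs
    new_points = list(new_points)
    shuffle(new_points)              # randomly pair the new words with points
    for row, word, points in zip(rows, new_words, new_points):
        row['studylistc'] = word
        row['pointslistc'] = points
    with open(dst, 'w', newline='') as f:
        writer = csv.DictWriter(f, fieldnames=AB_FIELDS + ['studylistc', 'pointslistc'])
        writer.writeheader()
        writer.writerows(rows)
    return rows

# Demo with a small, hypothetical source file in the four-column layout
with open('demo_WordsandPoints.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(AB_FIELDS)
    writer.writerows([['shoe', 2, 'frank', 1], ['dingy', 1, 'susan', 3],
                      ['bear', 4, 'robert', 2], ['balls', 3, 'daniel', 4]])

test_rows = build_test_conditions('demo_WordsandPoints.csv', 'demo_TestConditions.csv',
                                  ['hotel', 'lamp'], [4, 2])
```

The written file could then be used as the final loop's conditions file; because its rows are already shuffled and subset, that loop would not need "Selected rows". Note that if there are more new words than rows, zip() silently drops the extras.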
Lastly, I also simplified your code above. Builder already imports the numpy.random.shuffle function, which is simpler than doing all the sampling, and then you don't need to import the standard random library.
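A sketch of that simplification, writing one four-column file instead of two two-column files. The output name WordsandPoints.csv is an assumption, and the fallback to the standard library's random.shuffle is only there so the sketch also runs outside PsychoPy:

```python
import csv

try:
    # Builder scripts already have numpy's shuffle available
    from numpy.random import shuffle
except ImportError:
    # Fallback so the sketch also runs outside PsychoPy
    from random import shuffle

studylista = ["shoe", "bear", "balls", "dingy"]
pointslista = [1, 2, 3, 4]
studylistb = ["frank", "robert", "daniel", "susan"]
pointslistb = [1, 2, 3, 4]

# Shuffling each list in place randomly pairs words with points;
# writing the rows to the file then fixes that pairing for the whole session
for lst in (studylista, pointslista, studylistb, pointslistb):
    shuffle(lst)

# One four-column conditions file: each row holds one A pair and one B pair
with open('WordsandPoints.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['studylista', 'pointslista', 'studylistb', 'pointslistb'])
    writer.writerows(zip(studylista, pointslista, studylistb, pointslistb))
```

The same file can then be named as the conditions file in all three loops, with "Selected rows" restricting the last one to a subset.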

Related

BigQuery regexp pull from 2 different patterns

I'm trying to pull product search colors from URLs in my BigQuery data, but the URL format changed at one point, so I'm trying to pull them from two different formats.
The first one is like /someproduct/color/blue/color/red and the second one is like /someproduct/?colors=blue,red.
In both of these examples, the person filtered on multiple colors, and I would like the output of the extraction to be 'blue,red', all in one column.
I can pull the first one with:
REGEXP_EXTRACT_all(url,r'color/([A-Za-z]+)')
and the second one with:
REGEXP_EXTRACT(url,r'colors=(\w*,\w*)')
The first one produces an array, and I don't know how to get all the outputs to show as one item in one row. A person can filter on up to 20 colors, so there can be a lot of repetition in both URL types.
The second one could also be improved: I have to add an additional \w*, for each additional color filtered on. I have a case statement that handles that, but I don't think it's the most efficient way to do it.
Consider the approach below:
select url,
coalesce(regexp_extract(url, r'colors=([\w,]+)'), array_to_string(regexp_extract_all(url, r'color/([A-Za-z]+)'), ',')) colors
from your_table
If applied to the sample data in your question, the output is 'blue,red' for both URLs.
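To check the two patterns outside BigQuery, the same coalesce logic can be mirrored in Python: try the colors= format first, otherwise join every color/ match with commas, as array_to_string does. This is an illustrative sketch only; BigQuery uses RE2 rather than Python's re module, though for ASCII color names these simple patterns behave the same:

```python
import re

def extract_colors(url):
    """Return the selected colors as one comma-separated string ('' if none)."""
    # Newer format: /someproduct/?colors=blue,red ; [\w,]+ takes any number of colors
    match = re.search(r'colors=([\w,]+)', url)
    if match:
        return match.group(1)
    # Older format: /someproduct/color/blue/color/red ; collect every match, then join
    return ','.join(re.findall(r'color/([A-Za-z]+)', url))

for url in ['/someproduct/color/blue/color/red', '/someproduct/?colors=blue,red']:
    print(url, '->', extract_colors(url))
```

Because [\w,]+ stops at the first character that is neither a word character nor a comma, a following query parameter such as &page=2 is not included.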

Separating columns (array of arrays) - Advanced SQL looping

I tried using a title that more accurately describes my question, but a message said I am limited to 150 characters.
Looking for assistance from someone who has advanced SQL skills. Ideally I want to do it in SQL to let the computer do the work; too much manual manipulation is prone to mistakes.
I've already searched for user groups within Google, but all my emails are being returned saying the address no longer exists.
What I am using appears to be a proprietary version of Dremel SQL / Google SQL, however, someone experienced in Dremel SQL will probably be able to guide me in the right direction.
BACKGROUND INFO:
Pulling a column that is an array column which holds another array (a notes column). I think maybe an array of arrays?
I have not figured a way to do what I am trying to do with Google or Dremel SQL yet.
So for now, I am doing it the hard way.
As originally pulled, the data looks like this: [{Array of arrays}, {Array of arrays}, {Array of arrays}, etc.]
More specifically: [{4 or more text fields, which could also hold numbers, separated by commas}, {another set of fields}, {another set of fields}, ...]
For example (this is all in just one column of data, across hundreds of rows):
[
{"created":"1540236216969","notes":"blah... blah... blah","original_text_length":534,"User_email":"someone#emailaddress.com","user_shortname":"someone"},
{"created":"1540236216969","notes":"blah... blah... blah","original_text_length":1224,"User_email":"someone#emailaddress.com","user_shortname":"someone"},
{"created":"1540236216969","notes":"blah... blah... blah","original_text_length":1664,"User_email":"someone#emailaddress.com","user_shortname":"someone"}
...
]
The number of these differs from row to row, and each row has a specific ID #.
A typical row of data is:
ID #, start_date, end_date, some other fields, notes_(the array field)
WHAT I AM DOING NOW is:
SQL data pull,
exporting to Google Sheets,
making separate tabs for the different array columns,
copying the notes column (the array column holding arrays) to a separate tab in Google Sheets, then
using Split Text To Columns with the first curly brace "{" as the separator.
Here is my dilemma.
Once pulled, I need to split all of those columns again to separate the individual elements in each array. I can't use Split Text to Columns again with all of them highlighted. I can use it one column at a time, but that will be a real pain if I have to do it individually for each column and every row (hundreds of rows). I need a way to automate this.
I will also need to convert each of the Unix dates to calendar dates within each array, PLUS add rows to the spreadsheet depending on the number of columns from the first split. The number of columns differs for each row depending on how many notes have been added.
OR... do it with SQL (which appears to be a proprietary type of SQL, similar to NoSQL but not the same). I have tried using syntaxes for IBM SQL, Oracle SQL, SQL Server, and others found online, but none work.
OR... do it with a looping function within Google Sheets.
Possibly re-add it to the database as a new table once both sets of arrays are completely split up.
END RESULT
ID#, date1, date 2, first created date (right now a unix date), first note, first other field, etc...
Then add a new row with
Same ID# from above, date1 from row above, date 2 from row above, next (2nd) created date (right now a unix date), 2nd note, 2nd other field, etc...
Add a new row...
3rd set of notes etc.
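If the notes column can be exported as valid JSON text (an assumption; the sample above is anonymized), the END RESULT layout, one output row per note with the parent ID and dates repeated and the millisecond Unix "created" value converted to a calendar date, can be sketched in Python and written to a CSV for import into Google Sheets. The output column names, the file name, and the use of UTC are assumptions:

```python
import csv
import json
from datetime import datetime, timezone

HEADER = ["id", "start_date", "end_date", "created", "notes",
          "original_text_length", "user_email", "user_shortname"]

def explode_notes(rows):
    """Yield one row per note, repeating the parent row's ID and dates."""
    for row_id, start_date, end_date, notes_json in rows:
        for note in json.loads(notes_json):
            # 'created' is a Unix timestamp in milliseconds, stored as a string
            created = datetime.fromtimestamp(int(note["created"]) / 1000, tz=timezone.utc)
            yield [row_id, start_date, end_date,
                   created.strftime("%Y-%m-%d %H:%M:%S"),
                   note.get("notes", ""),
                   note.get("original_text_length"),
                   note.get("User_email"),
                   note.get("user_shortname")]

def write_csv(rows, path):
    """Write the flattened rows, with a header, to a CSV file."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(HEADER)
        writer.writerows(explode_notes(rows))

# Hypothetical input: one parent row whose notes column holds two notes
sample = [("123", "2018-10-01", "2018-10-31",
           '[{"created":"1540236216969","notes":"first note","original_text_length":534,'
           '"User_email":"someone#emailaddress.com","user_shortname":"someone"},'
           '{"created":"1540236216969","notes":"second note","original_text_length":1224,'
           '"User_email":"someone#emailaddress.com","user_shortname":"someone"}]')]
write_csv(sample, "notes_flat.csv")
```

Each parent row expands into as many output rows as it has notes, so the varying number of notes per ID needs no special handling.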

Dynamically creating a pivot table using fuzzy matching

So, I'm constantly being given data in new and different formats. I'm on a crusade to get my work to standardize data for easy use, and if I managed to convince the powers that be to standardize data, this problem becomes entirely moot. Until then, I have the following problem:
I get data in a variety of ways. Sometimes my gross sales are called total sales. Sometimes gross sales before discounts, total sales before discounts, Gross_Sales, etc. Discounts, deductions, exempt amounts, etc. form another column. So on and so forth. I'd like to be able to do the following:
1) Figure out what columns I want,
2) Turn those columns into a pivot table.
For part 1, I have two options, and I'm wondering if there are any more. The first is to use Microsoft's fuzzy-matching add-in to help me match; I'd have a separate tab dedicated to fuzzy matching each column I need. The second is to generate a long list of all the variants, test each one until I find a hit, assign it, and move on to testing the next one.
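The second option (a list of known variants per column) can be sketched in Python rather than VBA. The alias lists and standard names below are hypothetical and would need to be filled in from the headers you actually receive; normalizing case, underscores and spacing first means 'Gross_Sales' and 'gross sales' count as the same variant:

```python
# Hypothetical alias table: standard column name -> header variants seen so far
ALIASES = {
    "gross_sales": ["Gross Sales", "Total Sales", "Gross Sales Before Discounts",
                    "Total Sales Before Discounts"],
    "discounts": ["Discounts", "Deductions", "Exempt Amounts"],
}

def normalize(header):
    """Lower-case, treat underscores as spaces and collapse runs of whitespace."""
    return " ".join(header.replace("_", " ").lower().split())

# Reverse lookup built once: normalized variant -> standard name
LOOKUP = {normalize(v): std for std, variants in ALIASES.items() for v in variants}

def map_headers(headers):
    """Return {original header: standard name} for every header with a known variant."""
    mapping = {}
    for header in headers:
        std = LOOKUP.get(normalize(header))
        if std is not None and std not in mapping.values():  # first hit wins
            mapping[header] = std
    return mapping

print(map_headers(["Gross_Sales", "Region", "Deductions"]))
```

Headers with no known variant (here "Region") are simply left out, which makes it easy to spot new variants that need adding to the alias table.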
The second part is turning all of this into a pivot table. The resources I have so far are https://www.thespreadsheetguru.com/blog/2014/9/27/vba-guide-excel-pivot-tables and How to Create a Pivot Table in VBA.
Is there a better method? Is there another way?
Edit: Slightly better method - Grab the data columns, place them into a table, and pivot everything off of that table - it removes the need to re-create pivot tables, just need to move the data over.
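With the columns identified, the "standard table" idea amounts to copying just those columns under fixed names and aggregating from there. A plain-Python sketch of that step, with a hypothetical header mapping and sample rows; pivot_sum stands in for a one-field pivot table that sums a value per row label:

```python
from collections import defaultdict

def to_standard_table(rows, header_map):
    """Copy only the mapped columns, renamed to their standard names."""
    return [{std: row[src] for src, std in header_map.items()} for row in rows]

def pivot_sum(table, row_field, value_field):
    """Sum value_field for each distinct row_field, like a one-field pivot table."""
    totals = defaultdict(float)
    for record in table:
        totals[record[row_field]] += float(record[value_field])
    return dict(totals)

# Hypothetical source headers and rows from one incoming file
header_map = {"Region": "region", "Total Sales": "gross_sales"}
rows = [{"Region": "East", "Total Sales": "100", "Notes": "x"},
        {"Region": "East", "Total Sales": "50", "Notes": ""},
        {"Region": "West", "Total Sales": "20", "Notes": "y"}]
table = to_standard_table(rows, header_map)
print(pivot_sum(table, "region", "gross_sales"))
```

Since the pivot only ever reads the standard names, a new file format needs only a new header mapping, not a new pivot.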
Having the same problem, I use a mix of your two methods.
My data consists of a bunch of logs for rejected x-ray images, and the reject reason is a free text field. My solution was to create a table where the first column contains my desired output categories, and then each subsequent column contains a different variation of it.
For example, a row might have (column one/output first entry):
Positioning, POS, Positioning Error, Patient Positioning
Note that these are all fairly different from each other. This is where the fuzzy matching comes in: it captures all the smaller differences and misspellings around those other columns. When the fuzzy matching decides a given reason matches a column's entry, the reason is replaced with the appropriate desired output from column 1 of the table. In my example, a reason of 'Possitioning Err' [sic] would match column 3 (Positioning Error) and then get converted to Positioning.
Then wash, rinse, repeat over the rest of your data as needed. This approach was super useful and fairly flexible in helping standardize my data. It is also computationally more expensive, but I guess you'd only need to run the matching portion once.
As for the actual mechanics: I use Excel 2010, so there is no built-in functionality. I run the fuzzy matching code on a temporary worksheet until the best percentage matches are found, and then overwrite the actual source data afterwards.
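The same table-plus-fuzzy-matching idea can be sketched with Python's difflib standing in for the Excel fuzzy-matching code (a swapped-in technique, not the poster's add-in). The variant table uses the Positioning row from the example plus a hypothetical Motion row, and the 0.6 cutoff is an assumed threshold:

```python
from difflib import get_close_matches

# Desired output category -> known variations (Motion row is hypothetical)
VARIANTS = {
    "Positioning": ["Positioning", "POS", "Positioning Error", "Patient Positioning"],
    "Motion": ["Motion", "Patient Motion", "Movement"],
}

# Flatten to lower-cased variation -> category so every variation is a match target
LOOKUP = {v.lower(): category for category, variants in VARIANTS.items() for v in variants}

def standardize(reason, cutoff=0.6):
    """Map a free-text reject reason to its category, or None if nothing is close enough."""
    best = get_close_matches(reason.lower(), LOOKUP.keys(), n=1, cutoff=cutoff)
    return LOOKUP[best[0]] if best else None

for raw in ["Possitioning Err", "patient positioning", "Blurry"]:
    print(raw, "->", standardize(raw))
```

Matching against every variation, rather than only the category names, is what lets 'Possitioning Err' land on 'Positioning Error' and then be reported as Positioning; reasons that return None are candidates for new rows in the table.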

Comparing two datasets in SSRS

I'm looking to compare two datasets with each other. Ideally, I'd like it to show a green item if the data matches between the two. I have created two separate GDocs files to share the code, to prevent SO from dinging me on formatting.
The first dataset comes from our program itself: it pulls everything from our application and displays the information based on company code. The second dataset is from an external source that requires validation. The main fields I am matching are "NPI Number (Type 1)" from DS1 vs. "NPI" from DS2. If there is a match, I want to highlight the row in green on both sides of the data.
Dataset 1
Dataset 2
You may need to use the Lookup function and set it as an expression for the background color of a text box or a table row.
Sample Expression: =iif(Len(Lookup(Fields!NPI.Value, Fields!NPI.Value, Fields!ProviderName.Value, "DS1"))>0,"Green","Red")
I have created a sample here. Download the entire content and run it.
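Lookup returns Nothing when no row in the other dataset has a matching key, so the color expression is essentially a per-row membership test. Applied to both datasets, that rule can be sketched in Python to check the matching logic outside SSRS; the field names come from the question, and the sample rows are hypothetical:

```python
def match_colors(ds1_rows, ds2_rows, key1="NPI Number (Type 1)", key2="NPI"):
    """Return Green/Red per row for each dataset: Green when the NPI appears in the other one."""
    npis1 = {row[key1] for row in ds1_rows}
    npis2 = {row[key2] for row in ds2_rows}
    colors1 = ["Green" if row[key1] in npis2 else "Red" for row in ds1_rows]
    colors2 = ["Green" if row[key2] in npis1 else "Red" for row in ds2_rows]
    return colors1, colors2

# Hypothetical sample rows
ds1 = [{"NPI Number (Type 1)": "111"}, {"NPI Number (Type 1)": "222"}]
ds2 = [{"NPI": "222"}, {"NPI": "333"}]
print(match_colors(ds1, ds2))
```

In the report itself, each side would need its own expression, with the source and destination fields swapped and the other dataset's name as the last Lookup argument.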

How to apply a single section across multiple columns in Business Intelligence

I do a lot of reporting out of our Electronic Health Record using a Business Objects product, and one thing I run into frequently is records for which most of the columns are the same, but a few may have multiple different values.
For instance, a report I'm working on has 8 columns, mostly static information about the patient/encounter, some lab values, and a column for the consulting physician. All the columns will have only a single value per patient/encounter, except for consulting physician which may have multiple. I'd like to somehow set the table to show only a single row for the data that is unchanged, so they don't end up seeing the FIN, MRN, and lab values over and over.
However, as far as I've been able to tell with my fiddling around, I can only apply a section or break to a single column. Creating multiple sections or breaks nests them. Does anybody know of a way to treat multiple columns as sort of a composite section?
Edit: I did try pulling the consulting physician column out into its own table and then setting the room number as a section, but it still caused repeated rows of the other data for any patients with multiple consulting physicians.
Additional edit: As requested, here's a mockup of approximately what I'd like to see. This is mostly how it looks already when I tell BO to use the room number (the number in blue, top left of each row) as a section; however, in the case of the third room, it would repeat the information in the first 5 columns for each consulting physician listed.
There are a couple of ways to do it, but putting a break on each column is what I would do.
So, starting from "FIN" and working through to "Attending", add a break on each column. This adds a summary row for each break, so it will look like:
Then select the summary rows, right-click, and Delete: