Spaced repetition (SRS) for learning - sql

A client has asked me to add a simple spaced repetition algorithm (SRS) to an online learning site. But before throwing myself into it, I'd like to discuss it with the community.
Basically, the site asks the user a bunch of questions (by automatically selecting, say, 10 out of 100 total questions from a database), and the user gives either a correct or incorrect answer. The user's results are then stored in a database, for instance:
userid  questionid  correctlyanswered  dateanswered
1       123         0 (no)             2010-01-01 10:00
1       124         1 (yes)            2010-01-01 11:00
1       125         1 (yes)            2010-01-01 12:00
Now, to maximize a user's ability to learn all the answers, I should apply an SRS algorithm so that the next time a user takes the quiz, he receives incorrectly answered questions more often than correctly answered ones. Also, questions that were previously answered incorrectly but have recently been answered correctly several times should occur less often.
Has anyone implemented something like this before? Any tips or suggestions?
These are the best links I've found:
http://en.wikipedia.org/wiki/Spaced_repetition
http://www.mnemosyne-proj.org/principles.php
http://www.supermemo.com/english/ol/sm2.htm

What you want is a number X_i for each question i. You can normalize these numbers (make their sum 1) and pick one question at random with the corresponding probability.
If N is the number of different questions and M is the average number of times each question has been answered, then you can compute X in O(M*N) time like this:
Create an array X[N] initialized to 0.
Run through the data, and every time you see question i answered wrong, increase X[i] by f(t), where t is the answering time and f is an increasing function.
Because f is increasing, a question answered wrong a long time ago has less impact than one answered wrong yesterday. You can experiment with different f to get a nice behaviour.
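For illustration, here is a minimal sketch of that scheme in Python; the particular f (an exponential recency weight), the half-life, and the small base weight are all assumptions you would experiment with:
import math
import random
import time

def recency_weight(answered_at, now, half_life_days=7.0):
    # An assumed increasing f(t): the more recent the wrong answer,
    # the larger its contribution to the question's weight.
    age_days = (now - answered_at) / 86400.0
    return math.exp(-age_days / half_life_days)

def pick_question(history, question_ids):
    # history: list of (question_id, correct, answered_at_unix) tuples
    now = time.time()
    x = {q: 0.0 for q in question_ids}
    for qid, correct, answered_at in history:
        if not correct:
            x[qid] += recency_weight(answered_at, now)
    # Small base weight so questions never answered wrong still get picked.
    weights = [x[q] + 0.1 for q in question_ids]
    return random.choices(question_ids, weights=weights, k=1)[0]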
The smarter way
A faster way is not to regenerate X[] every time you choose questions, but to save it in a database table.
You won't be able to apply f with this solution. Instead, just add 1 every time a question is answered wrongly, and then run through the table regularly - say every midnight - and multiply all X[i] by a constant - say 0.9.
Update: Actually you should base your data on correct answers, not wrong ones. Otherwise, questions that have not been answered at all for a long time will have a smaller chance of being chosen, when it should be the opposite.
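A minimal sketch of that table-based variant, here using sqlite3 for concreteness (the database file, table, and column names are all assumptions):
import sqlite3

conn = sqlite3.connect('quiz.db')  # assumed database; schema is illustrative
conn.execute("""CREATE TABLE IF NOT EXISTS question_weight (
                    userid INT, questionid INT, x REAL DEFAULT 0,
                    PRIMARY KEY (userid, questionid))""")

def record_wrong_answer(userid, questionid):
    # Bump the weight whenever a question is answered incorrectly.
    conn.execute("""INSERT INTO question_weight (userid, questionid, x)
                    VALUES (?, ?, 1)
                    ON CONFLICT (userid, questionid) DO UPDATE SET x = x + 1""",
                 (userid, questionid))
    conn.commit()

def nightly_decay(factor=0.9):
    # Run regularly (say every midnight): old mistakes lose influence.
    conn.execute("UPDATE question_weight SET x = x * ?", (factor,))
    conn.commit()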

Anki is an open source program implementing spaced repetition.
Being open source, you can browse the source for libanki, a spaced repetition library for Anki.
As of January 2013, Anki version 2 sources can be browsed here.
The sources are in Python, the executable pseudo-code language.
Reading the source to understand the algorithm may be feasible. The data model is defined using SQLAlchemy, the Python SQL toolkit and Object Relational Mapper that gives application developers the full power and flexibility of SQL.

Here is a spaced repetition algorithm that is well documented and easy to understand.
Features
- Introduces sub-decks for efficiently learning large decks (super useful!)
- Intuitive variable names and algorithm parameters
- Fully open source, with human-readable examples
- Easily configurable parameters to accommodate different users' memorization abilities
- Computationally cheap selection of the next card; no need to run a computation on every card in the deck
https://github.com/Jakobovski/SaneMemo.
Disclaimer: I am the author of SaneMemo.
import datetime
import random

# The number of consecutive correct (EASY) answers needed before the card
# is removed from the current sub-deck.
CONSECUTIVE_CORRECT_TO_REMOVE_FROM_SUBDECK_WHEN_KNOWN = 2
CONSECUTIVE_CORRECT_TO_REMOVE_FROM_SUBDECK_WHEN_WILL_FORGET = 3

# The number of cards in the sub-deck
SUBDECK_SIZE = 15
REMINDER_RATE = 1.6

class Deck(object):
    def __init__(self):
        self.cards = []
        # Used to make sure we don't display the same card twice in a row
        self.last_card = None

    def add_card(self, card):
        self.cards.append(card)

    def get_next_card(self):
        self.cards = sorted(self.cards)  # Sorted by next_practice_time
        sub_deck = self.cards[0:min(SUBDECK_SIZE, len(self.cards))]
        card = random.choice(sub_deck)
        # In case card == last_card, select again; we don't want to show the
        # same card two times in a row (guard against a one-card sub-deck).
        while card == self.last_card and len(sub_deck) > 1:
            card = random.choice(sub_deck)
        self.last_card = card
        return card

class Card(object):
    def __init__(self, front, back):
        self.front = front
        self.back = back
        self.next_practice_time = datetime.datetime.utcnow()
        self.consecutive_correct_answer = 0
        self.last_time_easy = datetime.datetime.utcnow()

    def update(self, performance_str):
        """Updates the card after the user has seen it and answered how
        difficult it was. The user can provide one of three options:
        [KNOW_IT, KNOW_BUT_WILL_FORGET, DONT_KNOW].
        """
        if performance_str == "KNOW_IT":
            self.consecutive_correct_answer += 1
            if self.consecutive_correct_answer >= CONSECUTIVE_CORRECT_TO_REMOVE_FROM_SUBDECK_WHEN_KNOWN:
                days_since_last_easy = (datetime.datetime.utcnow() - self.last_time_easy).days
                days_to_next_review = (days_since_last_easy + 2) * REMINDER_RATE
                self.next_practice_time = datetime.datetime.utcnow() + datetime.timedelta(days=days_to_next_review)
                self.last_time_easy = datetime.datetime.utcnow()
            else:
                self.next_practice_time = datetime.datetime.utcnow()
        elif performance_str == "KNOW_BUT_WILL_FORGET":
            self.consecutive_correct_answer += 1
            if self.consecutive_correct_answer >= CONSECUTIVE_CORRECT_TO_REMOVE_FROM_SUBDECK_WHEN_WILL_FORGET:
                self.next_practice_time = datetime.datetime.utcnow() + datetime.timedelta(days=1)
            else:
                self.next_practice_time = datetime.datetime.utcnow()
        elif performance_str == "DONT_KNOW":
            self.consecutive_correct_answer = 0
            self.next_practice_time = datetime.datetime.utcnow()

    def __lt__(self, other):
        """Comparator for sorting cards by next_practice_time."""
        return self.next_practice_time < other.next_practice_time

Related

pyomo: minimal production time / BIG M

I am looking for a way to map a minimum necessary duty cycle in an optimization model.
After several attempts, however, I have now reached the end of my knowledge and hope for some inspiration here.
The idea is that a variable (binary) mdl.ontime is set so that the sum of successive ontime values is greater than or equal to the minimum duty cycle:
def ontime(mdl, t):
    min_on_time = 3  # minimum on time in h
    if t < min_on_time:
        return mdl.ontime[t] == 0
    return sum(mdl.ontime[t-i] for i in range(min_on_time)) >= min_on_time
That works so far, except that the variable mdl.ontime is not taken into account anywhere else in the model.
Then I tried three different constraints; unfortunately they all gave the same result: CPLEX only finds infeasible solutions.
The first variant was:
def flag(mdl, t):
    return mdl.ontime[t] + (mdl.production[t] >= 0.1) >= 2
So if mdl.ontime is 1 and mdl.production is greater than or equal to 0.1 (the assumption is exact enough), the sum should be greater than or equal to 2: a logical addition term.
The second attempt was quite similar to the first:
def flag(mdl, t):
    return mdl.ontime[t] >= (mdl.production[t] >= 0.1)
If mdl.ontime is 1, it should be greater than or equal to the result of comparing mdl.production with 0.1.
And the third with a big M variable:
def flag(mdl, t):
    bigM = 10**6
    return mdl.ontime[t] * bigM >= mdl.production[t]
bigM should be large enough in my case...
None of them work at all, and I have no idea why CPLEX reports that there is only an infeasible solution.
Basically, the model runs if I leave out the ontime integration.
Do you guys have any more ideas how I could implement this?
Many greetings,
Mathias
It isn't really clear what the desired relationship is between your variables/constraints. That said, I don't think this is legal. I'm surprised that it isn't popping an error... and if it's not popping an error, I'm pretty sure it isn't doing what you think:
def flag(mdl, t):
    return mdl.ontime[t] + (mdl.production[t] >= 0.1) >= 2
You are essentially burying an inferred binary variable in there with the test on mdl.production, which isn't going to work, I believe. You probably need to introduce another variable or such.
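For example, here is a minimal sketch of the standard big-M linking constraint plus a minimum up-time constraint in Pyomo; the 24-hour horizon, the production bound of 100, and the 3-hour up-time are assumptions, and bigM should be a tight upper bound on production rather than an arbitrary 10**6:
import pyomo.environ as pyo

mdl = pyo.ConcreteModel()
mdl.T = pyo.RangeSet(0, 23)  # assumed 24-hour horizon
mdl.production = pyo.Var(mdl.T, within=pyo.NonNegativeReals, bounds=(0, 100))
mdl.ontime = pyo.Var(mdl.T, within=pyo.Binary)

# Big-M link: production can only be nonzero when ontime is 1.
# Use a tight M (here the assumed production bound, 100).
def link_rule(mdl, t):
    return mdl.production[t] <= 100 * mdl.ontime[t]
mdl.link = pyo.Constraint(mdl.T, rule=link_rule)

# Minimum up-time of 3 h: if the unit switches on at t, it must stay on
# for the following periods as well.
def min_uptime_rule(mdl, t):
    if t == mdl.T.first():
        return pyo.Constraint.Skip
    horizon = [tt for tt in (t + 1, t + 2) if tt in mdl.T]
    if not horizon:
        return pyo.Constraint.Skip
    switch_on = mdl.ontime[t] - mdl.ontime[t - 1]
    return sum(mdl.ontime[tt] for tt in horizon) >= len(horizon) * switch_on
mdl.min_uptime = pyo.Constraint(mdl.T, rule=min_uptime_rule)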

Cannot make sense of keras.datasets.imdb

I have two problems:
First off, the documentation for tf.keras.datasets.imdb.get_word_index says
Retrieves the dictionary mapping word indices back to words.
While in fact it's the contrary,
print(tf.keras.datasets.imdb.get_word_index())
{'fawn': 34701, 'tsukino': 52006, 'nunnery': 52007
I tried to run this in TensorFlow 2.0
import tensorflow as tf

(train_data_raw, train_labels), (test_data_raw, test_labels) = tf.keras.datasets.imdb.load_data()
words2idx = tf.keras.datasets.imdb.get_word_index()
idx2words = {idx: word for word, idx in words2idx.items()}
train_ex = [idx2words[x] for x in train_data_raw[0]]
train_ex = ' '.join(train_ex)
print(train_ex)
This results in a nonsense string:
the as you with out themselves powerful lets loves their [...]
Shouldn't I get a valid movie review?
I did a bit of digging and found that there are a few "offsets" in the processing which need to be undone in order to recover sensible review text. I modified your line to subtract 3 from each index in the raw sequence (since the default index_from is 3, real words start at index 3); also, the first token is a dummy start marker (set to 1), so the real text starts at the second position (index 1 in Python).
train_ex = [idx2words[x-3] for x in train_data_raw[0][1:]]
Using the above modification gives me the following for the review you originally selected:
this film was just brilliant casting location scenery story direction everyone's really suited the part they played ...
It appears that some punctuation and capitalization is removed etc, but this seems to return sensible reviews.
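Putting it together, a minimal decoding sketch under the default load_data() settings (start_char=1, oov_char=2, index_from=3); the marker strings are placeholders of my choosing:
import tensorflow as tf

(train_data_raw, train_labels), _ = tf.keras.datasets.imdb.load_data()
word_index = tf.keras.datasets.imdb.get_word_index()

# word_index maps word -> rank; the sequences are shifted by index_from=3,
# so id 4 is the most frequent word. Ids 0-2 are reserved markers.
idx2words = {idx + 3: word for word, idx in word_index.items()}
idx2words[0] = '<PAD>'
idx2words[1] = '<START>'
idx2words[2] = '<UNK>'

review = ' '.join(idx2words.get(i, '<UNK>') for i in train_data_raw[0])
print(review)  # '<START> this film was just brilliant casting ...'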
I hope this helps.

Optimizing code for better performance and quality

I have this calculation method that calculates 6 fields and a total.
It works.
The question is how I can optimize it, performance-wise and code-quality-wise.
Just want to get some suggestions on how to write better code.
def _ocnhange_date(self):
    date = datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    self.date = date
    self.drawer_potential = self.drawer_id.product_categ_price * self.drawer_qty
    self.flexible_potential = self.flexible_id.product_categ_price * self.flexible_qty
    self.runner_potential = self.runner_id.product_categ_price * self.runner_qty
    self.group_1_potential = self.group_1_id.product_categ_price * self.group_1_qty
    self.group_2_potential = self.group_2_id.product_categ_price * self.group_2_qty
    self.group_3_potential = self.group_3_id.product_categ_price * self.group_3_qty
    total = [self.drawer_potential, self.flexible_potential, self.runner_potential,
             self.group_1_potential, self.group_2_potential, self.group_3_potential]
    self.total_potentail = sum(total)
First things first: you should worry about performance mostly on batch operations. Your case is an onchange method, which means:
it will be triggered manually by user interaction.
it will only affect a single record at a time.
it will not perform database writes.
So, basically, this one will not be a critical bottleneck in your module.
However, you're asking how it could get better, so here it goes. It's just an idea, in some points merely different (not better), but maybe you'll see an approach you prefer somewhere:
def _onchange_date(self):
    # Use this alternative method, easier to maintain
    self.date = fields.Datetime.now()
    # Less code here, although almost equal performance (possibly less)
    total = 0
    for field in ("drawer", "flexible", "runner",
                  "group_1", "group_2", "group_3"):
        potential = self["%s_id" % field].product_categ_price * self["%s_qty" % field]
        total += potential
        self["%s_potential" % field] = potential
    self.total_potential = total
I only see two things you can improve here:
Use Odoo's Datetime class to get "now", because it already takes Odoo's datetime format into consideration. That's more maintainable in the end: if Odoo decides to change the format system-wide, you won't have to change your method.
Try to avoid so many assignments and instead use methods which allow a combined update of some values. For onchange methods this would be update() and for other value changes it's obviously write().
def _onchange_date(self):
    self.update({
        'date': fields.Datetime.now(),
        'drawer_potential': self.drawer_id.product_categ_price * self.drawer_qty,
        'flexible_potential': self.flexible_id.product_categ_price * self.flexible_qty,
        # and so on
    })

PsychoPy: "Conditional" but random selection from list across two dimensions

In my experiment, I am presenting images (faces) that are different across 2 dimensions: face identity and emotion.
There are 5 faces displaying 5 different emotional expressions; making 25 unique stimuli in total. These only need to be presented once (so 25 trials).
After I present one of the faces, the next face has to be different on only the emotion OR the identity, but the same on the other.
Example:
Face 1, emotion 1 -> face 3, emotion 1 -> face 3, emotion 4 -> ... etc.
1: Is PsychoPy up to this task? I have mostly worked with the Builder so far, except for some data-logging code, but I'd be happy to get more experienced with the Coder.
My hunch is that I would need to add two columns to the list of trials, one for identity and one for emotion. Then use the getEarlierTrial call somehow, but I pretty much get lost at this point.
2: Would anyone be willing to point me in the right direction please?
Many thanks in advance.
This is difficult to implement in Builder's normal mode of operation, which is to drive trials from a fixed list of conditions. Although the order of rows can be randomised across subjects, the pairings of values across columns remains constant.
The standard answer to this is what you allude to in your comment above: in code, shuffle the conditions file at the beginning of each experiment, so each subject is in essence having their trials driven by a unique conditions file.
You seem happy to do this in Matlab. That would work fine, as this stuff can be done before PsychoPy even launches. But it could also very easily be implemented in Python code. That way you could do everything in PsychoPy, and in this case there would be no need to abandon Builder. You'd just insert a code component with some code to run at the beginning of the experiment that customises a conditions file.
You'll need to create three lists, not two; i.e. you also need a list of pseudo-random choices to alternate between preserving either face or emotion from trial to trial: if you do this fully randomly, the design will become unbalanced and you'll exhaust one of the attributes before the other.
from numpy.random import shuffle

# make a list of 25 dictionaries of unique face/emotion pairs:
pairsList = []
for face in ['1', '2', '3', '4', '5']:
    for emotion in ['1', '2', '3', '4', '5']:
        pairsList.append({'faceNum': face, 'emotionNum': emotion})
shuffle(pairsList)

# a list of whether to alternate between preserving face or emotion across trials:
attributes = ['faceNum', 'emotionNum'] * 12  # length 24
shuffle(attributes)

# need to create an initial selection before cycling through the
# next 24 randomised but balanced choices:
pair = pairsList.pop()
currentFace = pair['faceNum']
currentEmotion = pair['emotionNum']
images = ['face_' + currentFace + '_emotion_' + currentEmotion + '.jpg']

for attribute in attributes:
    if attribute == 'faceNum':
        selection = currentFace
    else:
        selection = currentEmotion
    # find another pair with the same selected attribute:
    for pair in pairsList:
        if pair[attribute] == selection:
            # store the combination for this trial:
            currentFace = pair['faceNum']
            currentEmotion = pair['emotionNum']
            images.append('face_' + currentFace + '_emotion_' + currentEmotion + '.jpg')
            # remove this combination so it can't be used again, and stop
            # looking for further matches on this trial:
            pairsList.remove(pair)
            break

images.reverse()
print(images)
Then just write the images list to a single column .csv file to use as a conditions file.
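For instance, a minimal sketch for writing that file (the column name imageFile is an assumption; it just needs to match the variable name you refer to in Builder):
import csv

with open('conditions.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['imageFile'])  # the header row becomes the variable name in Builder
    for image in images:
        writer.writerow([image])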
Remember to set the loop in Builder to be in a fixed order, not randomised, as the list itself has the randomisation built in.

Is this considered memoisation?

In optimising some code recently, we ended up performing what I think is a "type" of memoisation but I'm not sure we should be calling it that. The pseudo-code below is not the actual algorithm (since we have little need for factorials in our application, and posting said code is a firing offence) but it should be adequate for explaining my question. This was the original:
def factorial(n):
    if n == 1:
        return 1
    return n * factorial(n - 1)
Simple enough, but we added fixed points so that large numbers of calculations could be avoided for larger numbers, something like:
def factorial(n):
    if n == 1: return 1
    if n == 10: return 3628800
    if n == 20: return 2432902008176640000
    if n == 30: return 265252859812191058636308480000000
    if n == 40: return 815915283247897734345611269596115894272000000000
    # And so on.
    return n * factorial(n - 1)
This, of course, meant that 12! was calculated as 12 * 11 * 3628800 rather than the less efficient 12 * 11 * 10 * 9 * 8 * 7 * 6 * 5 * 4 * 3 * 2 * 1.
But I'm wondering whether we should be calling this memoisation since that seems to be defined as remembering past results of calculations and using them. This is more about hard-coding calculations (not remembering) and using that information.
Is there a proper name for this process or can we claim that memoisation extends back not just to calculations done at run-time but also those done at compile-time and even back to those done in my head before I even start writing the code?
I'd call it pre-calculation rather than memoization. You're not really remembering any of the calculations you've done in the process of calculating a final answer for a given input, rather you're pre-calculating some fixed number of answers for specific inputs. Memoization as I understand it is really more akin to "caching" a set of results as you calculate them for later reuse. If you were to store each value calculated so that you didn't need to recalculate it again later, that would be memoization. Your solution differs in that you never store any "calculated" results from your program, only the fixed points that have been pre-calculated. With memoization if you reran the function with an input different than one of the pre-calculated ones it would not have to recalculate the result, it would simply reuse it.
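To illustrate the distinction, here is a small sketch of what a genuinely memoised factorial would look like, caching results as they are computed at run-time (the module-level cache dict is illustrative):
_cache = {1: 1}

def factorial(n):
    # Memoisation proper: each result is stored the first time it is
    # computed, so no value is ever recalculated on a later call.
    if n not in _cache:
        _cache[n] = n * factorial(n - 1)
    return _cache[n]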
Whether or not you are hard-coding the results in, this is still memoization, because you have already calculated results that you are expecting to calculate again. This may happen at run-time or at compile-time, but either way, it's memoization.
Memoization is done at run-time. You are optimizing at compile-time. So, it is not memoization.
See for example ... Wikipedia
Or ...
Memoization
The term memoization was coined by Donald Michie (1968) to refer to the process by which a function is made to automatically remember the results of previous computations. The idea has become more popular in recent years with the rise of functional languages; Field and Harrison (1988) devote a whole chapter to it. The basic idea is just to keep a table of previously computed input/result pairs.
Peter Norvig
University of California
(the bold is mine)
Link
def memoisation(f):
    dct = {}
    def myfunction(x):
        if x not in dct:
            dct[x] = f(x)
        return dct[x]
    return myfunction

@memoisation
def fibonacci(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fibonacci(n-1) + fibonacci(n-2)

def nb_appels(n):
    # number of recursive calls the naive (unmemoised) version would make
    if n == 0 or n == 1:
        return 0
    else:
        return 1 + nb_appels(n-1) + 1 + nb_appels(n-2)

print(fibonacci(13))
print('nbappel', nb_appels(13))