Weighted random letter in Objective-C - objective-c

I need a simple way to randomly select a letter from the alphabet, weighted on the percentage I want it to come up. For example, I want the letter 'E' to come up in the random function 5.9% of the time, but I only want 'Z' to come up 0.3% of the time (and so on, based on the average occurrence of each letter in the alphabet). Any suggestions? The only way I see is to populate an array with, say, 10000 letters (590 'E's, 3 'Z's, and so on) and then randomly select an letter from that array, but it seems memory intensive and clumsy.

Not sure if this would work, but it seems like it might do the trick:
Take your list of letters and frequencies and sort them from
smallest frequency to largest.
Create a 26 element array where each element n contains the sum of all previous weights and the element n from the list of frequencies. Make note of the sum in the
last element of the array
Generate a random number between 0 and the sum you made note of above
Do a binary search of the array of sums until you reach the element where that number would fall
That's a little hard to follow, so it would be something like this:
if you have a 5 letter alphabet with these frequencies, a = 5%, b = 20%, c = 10%, d = 40%, e = 25%, sort them by frequency: a,c,b,e,d
Keep a running sum of the elements: 5, 15, 35, 60, 100
Generate a random number between 0 and 100. Say it came out 22.
Do a binary search for the element where 22 would fall. In this case it would be between element 2 and 3, which would be the letter "b" (rounding up is what you want here, I think)

You've already acknowledged the tradeoff between space and speed, so I won't get into that.
If you can calculate the frequency of each letter a priori, then you can pre-generate an array (or dynamically create and fill an array once) to scale up with your desired level of precision.
Since you used percentages with a single digit of precision after the decimal point, then consider an array of 1000 entries. Each index represents one tenth of one percent of frequency. So you'd have letter[0] to letter[82] equal to 'a', letter[83] to letter[97] equal to 'b', and so on up until letter[999] equal to 'z'. (Values according to Relative frequencies of letters in the English language)
Now generate a random number between 0 and 1 (using whatever favourite PRNG you have, assuming uniform distribution) and multiply the result by 1000. That gives you the index into your array, and your weighted-random letter.

Use the method explained here. Alas this is for Python but could be rewritten for C etc.
https://stackoverflow.com/a/4113400/129202

First you need to make a NSDicationary of the letters and their frequencies;
I'll explain it with an example:
let's say your dictionary is something like this:
{#"a": #0.2, #"b", #0.5, #"c": #0.3};
So the frequency of you letters covers the interval of [0, 1] this way:
a->[0, 0.2] + b->[0.2, 0.7] + c->[0.7, 1]
You generate a random number between 0 and 1. Then easily by checking that this random belongs to which interval and returning the corresponding letter you get what you want.
you seed the random function at the beginning of you program: srand48(time(0));
-(NSSting *)weightedRandomForDicLetters:(NSDictionary *)letterFreq
{
double randomNumber = drand48();
double endOfInterval = 0;
for (NSString *letter in dic){
endOfInterval += [[letterFreq objectForKey:letter] doubleValue];
if (randomNumber < endOfInterval) {
return letter;
}
}
}

Related

How to find most similar numerical arrays to one array, using Numpy/Scipy?

Let's say I have a list of 5 words:
[this, is, a, short, list]
Furthermore, I can classify some text by counting the occurrences of the words from the list above and representing these counts as a vector:
N = [1,0,2,5,10] # 1x this, 0x is, 2x a, 5x short, 10x list found in the given text
In the same way, I classify many other texts (count the 5 words per text, and represent them as counts - each row represents a different text which we will be comparing to N):
M = [[1,0,2,0,5],
[0,0,0,0,0],
[2,0,0,0,20],
[4,0,8,20,40],
...]
Now, I want to find the top 1 (2, 3 etc) rows from M that are most similar to N. Or on simple words, the most similar texts to my initial text.
The challenge is, just checking the distances between N and each row from M is not enough, since for example row M4 [4,0,8,20,40] is very different by distance from N, but still proportional (by a factor of 4) and therefore very similar. For example, the text in row M4 can be just 4x as long as the text represented by N, so naturally all counts will be 4x as high.
What is the best approach to solve this problem (of finding the most 1,2,3 etc similar texts from M to the text in N)?
Generally speaking, the most widely standard technique of bag of words (i.e. you arrays) for similarity is to check cosine similarity measure. This maps your bag of n (here 5) words to a n-dimensional space and each array is a point (which is essentially also a point vector) in that space. The most similar vectors(/points) would be ones that have the least angle to your text N in that space (this automatically takes care of proportional ones as they would be close in angle). Therefore, here is a code for it (assuming M and N are numpy arrays of the similar shape introduced in the question):
import numpy as np
cos_sim = M[np.argmax(np.dot(N, M.T)/(np.linalg.norm(M)*np.linalg.norm(N)))]
which gives output [ 4 0 8 20 40] for your inputs.
You can normalise your row counts to remove the length effect as you discussed. Row normalisation of M can be done as M / M.sum(axis=1)[:, np.newaxis]. The residual values can then be calculated as the sum of the square difference between N and M per row. The minimum difference (ignoring NaN or inf values obtained if the row sum is 0) is then the most similar.
Here is an example:
import numpy as np
N = np.array([1,0,2,5,10])
M = np.array([[1,0,2,0,5],
[0,0,0,0,0],
[2,0,0,0,20],
[4,0,8,20,40]])
# sqrt of sum of normalised square differences
similarity = np.sqrt(np.sum((M / M.sum(axis=1)[:, np.newaxis] - N / np.sum(N))**2, axis=1))
# remove any Nan values obtained by dividing by 0 by making them larger than one element
similarity[np.isnan(similarity)] = similarity[0]+1
result = M[similarity.argmin()]
result
>>> array([ 4, 0, 8, 20, 40])
You could then use np.argsort(similarity)[:n] to get the n most similar rows.

How do I calculate the sum efficiently?

Given an integer n such that (1<=n<=10^18)
We need to calculate f(1)+f(2)+f(3)+f(4)+....+f(n).
f(x) is given as :-
Say, x = 1112222333,
then f(x)=1002000300.
Whenever we see a contiguous subsequence of same numbers, we replace it with the first number and zeroes all behind it.
Formally, f(x) = Sum over all (first element of the contiguous subsequence * 10^i ), where i is the index of first element from left of a particular contiguous subsequence.
f(x)=1*10^9 + 2*10^6 + 3*10^2 = 1002000300.
In, x=1112222333,
Element at index '9':-1
and so on...
We follow zero based indexing :-)
For, x=1234.
Element at index-'0':-4,element at index -'1':3,element at index '2':-2,element at index 3:-1
How to calculate f(1)+f(2)+f(3)+....+f(n)?
I want to generate an algorithm which calculates this sum efficiently.
There is nothing to calculate.
Multiplying each position in the array od numbers will yeild thebsame number.
So all you want to do is end up with 0s on a repeated number
IE lets populate some static values in an array in psuedo code
$As[1]='0'
$As[2]='00'
$As[3]='000'
...etc
$As[18]='000000000000000000'```
these are the "results" of 10^index
Given a value n of `1234`
```1&000 + 2&00 +3 & 0 + 4```
Results in `1234`
So, if you are putting this on a chip, then probably your most efficient method is to do a bitwise XOR between each register and the next up the line as a single operation
Then you will have 0s in all the spots you care about, and just retrive the values in the registers with a 1
In code, I think it would be most efficient to do the following
```$n = arbitrary value 11223334
$x=$n*10
$zeros=($x-$n)/10```
Okay yeah we can just do bit shifting to get a value like 100200300400 etc.
To approach this problem, it could help to begin with one digit numbers and see what sum you get.
I mean like this:
Let's say, we define , then we have:
F(1)= 45 # =10*9/2 by Euler's sum formula
F(2)= F(1)*9 + F(1)*100 # F(1)*9 is the part that comes from the last digit
# because for each of the 10 possible digits in the
# first position, we have 9 digits in the last
# because both can't be equal and so one out of ten
# becomse zero. F(1)*100 comes from the leading digit
# which is multiplied by 100 (10 because we add the
# second digit and another factor of 10 because we
# get the digit ten times in that position)
If you now continue with this scheme, for k>=1 in general you get
F(k+1)= F(k)*100+10^(k-1)*45*9
The rest is probably straightforward.
Can you tell me, which Hackerrank task this is? I guess one of the Project Euler tasks right?

Ranking Big O Functions By Complexity

I am trying to rank these functions — 2n, n100, (n + 1)2, n·lg(n), 100n, n!, lg(n), and n99 + n98 — so that each function is the big-O of the next function, but I do not know a method of determining if one function is the big-O of another. I'd really appreciate if someone could explain how I would go about doing this.
Assuming you have some programming background. Say you have below code:
void SomeMethod(int x)
{
for(int i = 0; i< x; i++)
{
// Do Some Work
}
}
Notice that the loop runs for x iterations. Generalizing, we say that you will get the solution after N iterations (where N will be the value of x ex: number of items in array/input etc).
so This type of implementation/algorithm is said to have Time Complexity of Order of N written as O(n)
Similarly, a Nested For (2 Loops) is O(n-squared) => O(n^2)
If you have Binary decisions made and you reduce possibilities into halves and pick only one half for solution. Then complexity is O(log n)
Found this link to be interesting.
For: Himanshu
While the Link explains how log(base2)N complexity comes into picture very well, Lets me put the same in my words.
Suppose you have a Pre-Sorted List like:
1,2,3,4,5,6,7,8,9,10
Now, you have been asked to Find whether 10 exists in the list. The first solution that comes to mind is Loop through the list and Find it. Which means O(n). Can it be made better?
Approach 1:
As we know that List of already sorted in ascending order So:
Break list at center (say at 5).
Compare the value of Center (5) with the Search Value (10).
If Center Value == Search Value => Item Found
If Center < Search Value => Do above steps for Right Half of the List
If Center > Search Value => Do above steps for Left Half of the List
For this simple example we will find 10 after doing 3 or 4 breaks (at: 5 then 8 then 9) (depending on how you implement)
That means For N = 10 Items - Search time was 3 (or 4). Putting some mathematics over here;
2^3 + 2 = 10 for simplicity sake lets say
2^3 = 10 (nearly equals --- this is just to do simple Logarithms base 2)
This can be re-written as:
Log-Base-2 10 = 3 (again nearly)
We know 10 was number of items & 3 was the number of breaks/lookup we had to do to find item. It Becomes
log N = K
That is the Complexity of the alogorithm above. O(log N)
Generally when a loop is nested we multiply the values as O(outerloop max value * innerloop max value) n so on. egfor (i to n){ for(j to k){}} here meaning if youll say for i=1 j=1 to k i.e. 1 * k next i=2,j=1 to k so i.e. the O(max(i)*max(j)) implies O(n*k).. Further, if you want to find order you need to recall basic operations with logarithmic usage like O(n+n(addition)) <O(n*n(multiplication)) for log it minimizes the value in it saying O(log n) <O(n) <O(n+n(addition)) <O(n*n(multiplication)) and so on. By this way you can acheive with other functions as well.
Approach should be better first generalised the equation for calculating time complexity. liken! =n*(n-1)*(n-2)*..n-(n-1)so somewhere O(nk) would be generalised formated worst case complexity like this way you can compare if k=2 then O(nk) =O(n*n)

Number of possible binary strings of length k

One of my friends was asked this question recently:
You have to count how many binary strings are possible of length "K".
Constraint: Every 0 has a 1 in its immediate left.
This question can be reworded:
How many binary sequences of length K are posible if there are no two consecutive 0s, but the first element should be 1 (else the constrains fails). Let us forget about the first element (we can do it bcause it is always fixed).
Then we got a very famous task that sounds like this: "What is the number of binary sequences of length K-1 that have no consecutive 0's." The explanation can be found, for example, here
Then the answer will be F(K+1) where F(K) is the K`th fibonacci number starting from (1 1 2 ...).
∑ From n=0 to ⌊K/2⌋ of (K-n)Cn; n is the number of zeros in the string
The idea is to group every 0 with a 1 and find the number of combinations of the string, for n zeros there will be n ones grouped to them so the string becomes (k-n) elements long. There can be no more than of K/2 zeros as there would not have enough ones to be to the immediate left of each zero.
E.g. 111111[10][10]1[10] for K = 13, n = 3

Recognizing when to use the modulus operator

I know the modulus (%) operator calculates the remainder of a division. How can I identify a situation where I would need to use the modulus operator?
I know I can use the modulus operator to see whether a number is even or odd and prime or composite, but that's about it. I don't often think in terms of remainders. I'm sure the modulus operator is useful, and I would like to learn to take advantage of it.
I just have problems identifying where the modulus operator is applicable. In various programming situations, it is difficult for me to see a problem and realize "Hey! The remainder of division would work here!".
Imagine that you have an elapsed time in seconds and you want to convert this to hours, minutes, and seconds:
h = s / 3600;
m = (s / 60) % 60;
s = s % 60;
0 % 3 = 0;
1 % 3 = 1;
2 % 3 = 2;
3 % 3 = 0;
Did you see what it did? At the last step it went back to zero. This could be used in situations like:
To check if N is divisible by M (for example, odd or even)
or
N is a multiple of M.
To put a cap of a particular value. In this case 3.
To get the last M digits of a number -> N % (10^M).
I use it for progress bars and the like that mark progress through a big loop. The progress is only reported every nth time through the loop, or when count%n == 0.
I've used it when restricting a number to a certain multiple:
temp = x - (x % 10); //Restrict x to being a multiple of 10
Wrapping values (like a clock).
Provide finite fields to symmetric key algorithms.
Bitwise operations.
And so on.
One use case I saw recently was when you need to reverse a number. So that 123456 becomes 654321 for example.
int number = 123456;
int reversed = 0;
while ( number > 0 ) {
# The modulus here retrieves the last digit in the specified number
# In the first iteration of this loop it's going to be 6, then 5, ...
# We are multiplying reversed by 10 first, to move the number one decimal place to the left.
# For example, if we are at the second iteration of this loop,
# reversed gonna be 6, so 6 * 10 + 12345 % 10 => 60 + 5
reversed = reversed * 10 + number % 10;
number = number / 10;
}
Example. You have message of X bytes, but in your protocol maximum size is Y and Y < X. Try to write small app that splits message into packets and you will run into mod :)
There are many instances where it is useful.
If you need to restrict a number to be within a certain range you can use mod. For example, to generate a random number between 0 and 99 you might say:
num = MyRandFunction() % 100;
Any time you have division and want to express the remainder other than in decimal, the mod operator is appropriate. Things that come to mind are generally when you want to do something human-readable with the remainder. Listing how many items you could put into buckets and saying "5 left over" is good.
Also, if you're ever in a situation where you may be accruing rounding errors, modulo division is good. If you're dividing by 3 quite often, for example, you don't want to be passing .33333 around as the remainder. Passing the remainder and divisor (i.e. the fraction) is appropriate.
As #jweyrich says, wrapping values. I've found mod very handy when I have a finite list and I want to iterate over it in a loop - like a fixed list of colors for some UI elements, like chart series, where I want all the series to be different, to the extent possible, but when I've run out of colors, just to start over at the beginning. This can also be used with, say, patterns, so that the second time red comes around, it's dashed; the third time, dotted, etc. - but mod is just used to get red, green, blue, red, green, blue, forever.
Calculation of prime numbers
The modulo can be useful to convert and split total minutes to "hours and minutes":
hours = minutes / 60
minutes_left = minutes % 60
In the hours bit we need to strip the decimal portion and that will depend on the language you are using.
We can then rearrange the output accordingly.
Converting linear data structure to matrix structure:
where a is index of linear data, and b is number of items per row:
row = a/b
column = a mod b
Note above is simplified logic: a must be offset -1 before dividing & the result must be normalized +1.
Example: (3 rows of 4)
1 2 3 4
5 6 7 8
9 10 11 12
(7 - 1)/4 + 1 = 2
7 is in row 2
(7 - 1) mod 4 + 1 = 3
7 is in column 3
Another common use of modulus: hashing a number by place. Suppose you wanted to store year & month in a six digit number 195810. month = 195810 mod 100 all digits 3rd from right are divisible by 100 so the remainder is the 2 rightmost digits in this case the month is 10. To extract the year 195810 / 100 yields 1958.
Modulus is also very useful if for some crazy reason you need to do integer division and get a decimal out, and you can't convert the integer into a number that supports decimal division, or if you need to return a fraction instead of a decimal.
I'll be using % as the modulus operator
For example
2/4 = 0
where doing this
2/4 = 0 and 2 % 4 = 2
So you can be really crazy and let's say that you want to allow the user to input a numerator and a divisor, and then show them the result as a whole number, and then a fractional number.
whole Number = numerator/divisor
fractionNumerator = numerator % divisor
fractionDenominator = divisor
Another case where modulus division is useful is if you are increasing or decreasing a number and you want to contain the number to a certain range of number, but when you get to the top or bottom you don't want to just stop. You want to loop up to the bottom or top of the list respectively.
Imagine a function where you are looping through an array.
Function increase Or Decrease(variable As Integer) As Void
n = (n + variable) % (listString.maxIndex + 1)
Print listString[n]
End Function
The reason that it is n = (n + variable) % (listString.maxIndex + 1) is to allow for the max index to be accounted.
Those are just a few of the things that I have had to use modulus for in my programming of not just desktop applications, but in robotics and simulation environments.
Computing the greatest common divisor
Determining if a number is a palindrome
Determining if a number consists of only ...
Determining how many ... a number consists of...
My favorite use is for iteration.
Say you have a counter you are incrementing and want to then grab from a known list a corresponding items, but you only have n items to choose from and you want to repeat a cycle.
var indexFromB = (counter-1)%n+1;
Results (counter=indexFromB) given n=3:
`1=1`
`2=2`
`3=3`
`4=1`
`5=2`
`6=3`
...
Best use of modulus operator I have seen so for is to check if the Array we have is a rotated version of original array.
A = [1,2,3,4,5,6]
B = [5,6,1,2,3,4]
Now how to check if B is rotated version of A ?
Step 1: If A's length is not same as B's length then for sure its not a rotated version.
Step 2: Check the index of first element of A in B. Here first element of A is 1. And its index in B is 2(assuming your programming language has zero based index).
lets store that index in variable "Key"
Step 3: Now how to check that if B is rotated version of A how ??
This is where modulus function rocks :
for (int i = 0; i< A.length; i++)
{
// here modulus function would check the proper order. Key here is 2 which we recieved from Step 2
int j = [Key+i]%A.length;
if (A[i] != B[j])
{
return false;
}
}
return true;
It's an easy way to tell if a number is even or odd. Just do # mod 2, if it is 0 it is even, 1 it is odd.
Often, in a loop, you want to do something every k'th iteration, where k is 0 < k < n, assuming 0 is the start index and n is the length of the loop.
So, you'd do something like:
int k = 5;
int n = 50;
for(int i = 0;i < n;++i)
{
if(i % k == 0) // true at 0, 5, 10, 15..
{
// do something
}
}
Or, you want to keep something whitin a certain bound. Remember, when you take an arbitrary number mod something, it must produce a value between 0 and that number - 1.