"Weighted" random number generator - objective-c

I'm currently using the below code to grab a random element from an array. How would I go about changing the code so that it returns an element weighted on the percentage that I want it to come up? For example, I want the element at index 0 to come up 27.4% of the time, but the element at index 7 to come up only 5.9% of the time.
NSArray *quoteArray = @[ @"quote1", @"quote2", @"quote3", @"quote4", @"quote5", @"quote6", @"quote7", @"quote8", ];
NSString *quoteString;
int r = arc4random() % [quoteArray count];
if(r<[rewardTypeArray count])
quoteString = [quoteArray objectAtIndex:r];

I would use an array of floats (wrapped in NSNumber objects).
Each object represents a percentage. In this case you would have an array of 8 objects:
Object 1: @27.4 ;
...
Object 8: @5.9 .
Then you get a random number between 0 and 100. If you want more precision, you can also give the random number a fractional part; the extra precision affects neither the running time nor the memory used.
Once you have the number, you iterate through the array, keeping track of the index and a running total of the percentages seen so far. You use a float to accumulate that total, and you stop as soon as the running total is greater than or equal to the random number.
Example
NSArray *percentages = @[ @27.4 , ... , @5.9 ];
// Random value from 0.01 to 100.00 in steps of 0.01
float randomNumber = (arc4random_uniform(10000) + 1) / 100.0f;
NSUInteger n = 0;
float totalPercentage = 0.0f;
for (NSUInteger i = 0; i < percentages.count; i++)
{
    totalPercentage += [percentages[i] floatValue];
    if (totalPercentage >= randomNumber) // In this case we don't care about
                                         // the comparison precision
    {
        break;
    }
    n++;
}
// Now n is the index that you want (assuming the percentages sum to 100)

The easiest way would be to generate a random number based on how fine-grained you want the percentage to be. To calculate to the tenth of a percent, you could generate an integer from 0 to 999, and 274 of the values you could randomly generate would be the first element. 59 values would correspond to element 7.
For example:
0-273 = index 1 27.4%
274-300 = index 2 2.7%
301-501 = index 3 20.1%
502-547 = index 4 4.6%
548-696 = index 5 14.9%
697-936 = index 6 24%
937-995 = index 7 5.9%
These example percentages only add up to 99.6%, so the values 996-999 are left unassigned here, but you get the point.
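The range-table idea can be sketched as follows. Java is used only for illustration (the question itself is Objective-C), and the names `WeightedPicker`, `indexForRoll`, and `pick` are hypothetical. Weights are whole numbers in tenths of a percent, so a set that covers everything sums to 1000.

```java
import java.util.Random;

// Hypothetical helper: picks an index according to integer weights.
final class WeightedPicker {
    // Maps a roll in [0, sum of weights) to the index whose bucket contains it.
    static int indexForRoll(int[] weights, int roll) {
        int cumulative = 0;
        for (int i = 0; i < weights.length; i++) {
            cumulative += weights[i];
            if (roll < cumulative) {
                return i;
            }
        }
        throw new IllegalArgumentException("roll is outside the total weight");
    }

    // Draws a uniform roll and returns the matching index.
    static int pick(int[] weights, Random rng) {
        int total = 0;
        for (int w : weights) {
            total += w;
        }
        return indexForRoll(weights, rng.nextInt(total));
    }
}
```

For instance, with the made-up weights `{274, 100, 150, 120, 97, 100, 100, 59}` (only the first and last come from the question), rolls 0 to 273 select index 0 and rolls 941 to 999 select index 7.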

You can make another array of counters that tracks how many times each of your elements has been generated. If the counter for the index in r is below its target, accept that index; otherwise, regenerate.


Objective-C NSIndexSet / NSArray - Selecting the "Best" Index from Set using Standard Dev

I have a question about using standard deviation,
and whether I'm using it properly for my case as laid out below.
The indexes are all unique.
Here are a few questions I have about standard deviation:
1) Since I'm using all of the data, should I be using a population standard deviation or
a sample standard deviation?
2) Does it matter what the length (range) of the full playlist is (1...15)?
I have a program which takes a playlist of songs and gets recommendations
for each song from Spotify.
Say the playlist has a length of 15.
Each track gets an array of about 30 suggested tracks.
In the end my program filters all of the suggestions down to
create a new playlist of only 15 tracks.
Duplicates are often recommended.
I have devised a method for finding these duplicates and
putting their indexes into an NSIndexSet.
In my example there is a duplicate track that was suggested for the tracks
in the original playlist at indexes 4, 6, 7, 12.
I'm trying to calculate which of the duplicates is the best one to pick.
The NSSet methods etc. were not going to help me and would not take
into account "where" the duplicates were placed. To me it makes sense
that the more often a track was suggested within a "zone", the more
sense it makes to "use" it at that location in the final suggested playlist.
Originally I was just selecting the index closest to the mean (7.25).
But I would think that 6 would be a better choice than 7.
The 12 seems to throw it off.
So I started investigating standard deviation and figured it could help me out.
What do you think of my approach here?
NSMutableIndexSet* dupeIndexsSet; // contains indexes 4,6,7,12
// I have an extension on NSIndexSet to create a NSArray from it
NSArray* dupeIndexSetArray = [dupeIndexsSet indexSetAsArray];
// @[@4, @6, @7, @12]
NSUInteger dupeIndexsCount = [dupeIndexSetArray count]; // 4
NSUInteger dupeIndexSetFirst = [dupeIndexsSet firstIndex]; // 4
NSUInteger dupeIndexSetLast = [dupeIndexsSet lastIndex]; // 12
// I have an extension on NSArray to calculate the mean
NSNumber* dupeIndexsMean = [dupeIndexSetArray meanOf]; // 7.25;
the populationSD is 2.9475
the populationVariance is 8.6875
the sampleSD is 3.4034
the sampleVariance is 11.5833
Which SD should I use?
Or does it not matter?
I learned that the SD describes a range around the mean,
so I figured I would calculate what those values are.
double mean = [dupeIndexsMean doubleValue];
double dev = [populationSD doubleValue];
NSUInteger stdDevRangeStart = MAX(round(mean - dev), dupeIndexSetFirst);
// 7.25 - 2.9475 = 4.3025, round 4, dupeIndexSetFirst = 4
// stdDevRangeStart = 4;
NSUInteger stdDevRangeEnd = MIN(round(mean + dev), dupeIndexSetLast);
// 7.25 + 2.9475 = 10.1975, round 10, dupeIndexSetLast = 12
// stdDevRangeEnd = 10;
NSUInteger stdDevRangeLength1 = stdDevRangeEnd - stdDevRangeStart; // 6
NSUInteger stdDevRangeLength2 = MAX(round(dev * 2), stdDevRangeLength1);
// 2.9475 * 2 = 5.895, round 6, stdDevRangeLength1 = 6
// stdDevRangeLength2 = 6;
NSRange dupeStdDevRange = NSMakeRange(stdDevRangeStart, stdDevRangeLength2);
// startIndex = 4, length 6
So I figured this new range would give me a better range, one that
fits the standard deviation more closely and does not include the 12.
I create a new index set from the original one that only includes the indexes
that fall within my new dupeStdDevRange.
NSMutableIndexSet* stdDevIndexSet = [NSMutableIndexSet new];
// Serial enumeration (options:0): the block mutates stdDevIndexSet, which is
// not safe to do from concurrently running blocks
[dupeIndexsSet enumerateIndexesInRange:dupeStdDevRange
                               options:0
                            usingBlock:^(NSUInteger idx, BOOL * _Nonnull stop)
{
    [stdDevIndexSet addIndex:idx];
}];
// stdDevIndexSet has indexes = 4, 6, 7
The new stdDevIndexSet now includes indexes 4, 6, 7.
12 was not included, which is great because I thought it was throwing everything off.
Now I check this new "stdDevIndexSet" against the original index set.
If the stdDevIndexSet count is less, then I reload this new index set into
the whole process and calculate everything again.
if ([stdDevIndexSet count] < dupeIndexsCount)
{
    [self loadDupesIndexSet:stdDevIndexSet];
}
else {
    doneTrim = YES;
}
The count is different, so I start the whole process again with the index set that
includes 4, 6, 7.
Updated calculations:
dupeIndexsMean = 5.6667;
populationSD = 1.2472;
populationVariance = 1.5556;
sampleSD = 1.5275;
sampleVariance = 2.3333;
stdDevRangeStart = 4;
stdDevRangeEnd = 7;
The newly trimmed index set now "fits" the standard deviation range.
So I use the new mean, rounded to 6.
My best index to use is 6 from the original (4, 6, 7, 12),
which now makes sense to me.
So the big question: am I approaching this correctly?
Do things like the original size (length) of the "potential" range matter?
I.e., if the original playlist length was 20 tracks as compared to 40 tracks?
(I'm thinking not.)
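For reference, the two variances differ only in the divisor: the population form divides by n and is the usual choice when the data are the complete set of interest, while the sample form divides by n - 1. The sketch below, in Java for illustration only, reproduces the figures above and the trimming loop as described. The names `IndexTrimmer`, `variance`, and `bestIndex` are hypothetical, the input is assumed to be non-empty, and choosing the surviving index closest to the final mean is an assumption that stands in for rounding the mean.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the trimming process described in the question.
final class IndexTrimmer {
    static double mean(List<Integer> xs) {
        double sum = 0;
        for (int x : xs) {
            sum += x;
        }
        return sum / xs.size();
    }

    // Population variance divides by n; sample variance divides by n - 1.
    static double variance(List<Integer> xs, boolean sample) {
        double m = mean(xs);
        double squares = 0;
        for (int x : xs) {
            squares += (x - m) * (x - m);
        }
        return squares / (sample ? xs.size() - 1 : xs.size());
    }

    // Repeatedly keeps only the indexes within one population SD of the mean
    // (bounds rounded to whole numbers), then returns the surviving index
    // that is closest to the final mean.
    static int bestIndex(List<Integer> indexes) {
        List<Integer> current = new ArrayList<>(indexes);
        while (true) {
            double m = mean(current);
            double sd = Math.sqrt(variance(current, false));
            long low = Math.round(m - sd);
            long high = Math.round(m + sd);
            List<Integer> kept = new ArrayList<>();
            for (int x : current) {
                if (x >= low && x <= high) {
                    kept.add(x);
                }
            }
            if (kept.size() == current.size()) {
                break; // nothing was trimmed, so the set is stable
            }
            current = kept;
        }
        double m = mean(current);
        int best = current.get(0);
        for (int x : current) {
            if (Math.abs(x - m) < Math.abs(best - m)) {
                best = x;
            }
        }
        return best;
    }
}
```

With the indexes 4, 6, 7, 12 this trims the 12 on the first pass and settles on 6, matching the walkthrough above.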

Returning the smallest value within an Array List

I need to write a method that returns the smallest distance (which is a whole number value) within an ArrayList called "babyTurtles". There are 5 turtles within this array list and they all move a random distance each time the program is run.
I've been trying to figure out how to do it for an hour and all I've accomplished is making myself frustrated and coming here.
p.s.
In my class we wrote this code to find the average distance moved by the baby turtles:
public double getAverageDistanceMovedByChildren() {
    if (this.babyTurtles.size() == 0) {
        return 0;
    }
    double sum = 0;
    for (Turtle currentTurtle : this.babyTurtles) {
        sum = sum + currentTurtle.getDistanceMoved();
    }
    double average = sum / this.babyTurtles.size();
    return average;
}
That's all I've got to work on, but I just can't seem to find out how to do it.
I'd really appreciate it if you could assist me.
This will give you the index in the array list of the smallest number (here distanceList is assumed to be a List<Integer> holding each turtle's distance):
int lowestIndex = distanceList.indexOf(Collections.min(distanceList));
You can then get the value using this:
int lowestDistance = distanceList.get(lowestIndex);
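Since the question holds Turtle objects rather than a list of integers, a loop in the style of the averaging method may be closer to what is needed. This is a minimal sketch: the `Turtle` class here is a hypothetical stand-in with only `getDistanceMoved()`, `TurtleStats` is an invented holder class, and returning 0 for an empty list is an assumption that mirrors the averaging method.

```java
import java.util.List;

// Hypothetical stand-in for the Turtle class from the question.
class Turtle {
    private final int distanceMoved;

    Turtle(int distanceMoved) {
        this.distanceMoved = distanceMoved;
    }

    int getDistanceMoved() {
        return distanceMoved;
    }
}

class TurtleStats {
    // Returns the smallest distance moved, or 0 for an empty list
    // (the same empty-list convention as the averaging method).
    static int smallestDistance(List<Turtle> babyTurtles) {
        if (babyTurtles.isEmpty()) {
            return 0;
        }
        int smallest = babyTurtles.get(0).getDistanceMoved();
        for (Turtle currentTurtle : babyTurtles) {
            if (currentTurtle.getDistanceMoved() < smallest) {
                smallest = currentTurtle.getDistanceMoved();
            }
        }
        return smallest;
    }
}
```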

NSString constrainedToSize method?

Not to be confused with the NSString sizeWithFont method that returns a CGSize, what I'm looking for is a method that returns an NSString constrained to a certain CGSize. The reason I want to do this is so that when drawing text with Core Text, I can append an ellipsis (...) to the end of the string. I know NSString's drawInRect method does this for me, but I'm using Core Text, and kCTLineBreakByTruncatingTail truncates the end of each line rather than the end of the string.
There's this method that I found that truncates a string to a certain width, and it's not that hard to change it to make it work for a CGSize, but the algorithm is unbelievably slow for long strings, and is practically unusable. (It took over 10 seconds to truncate a long string). There has to be a more "computer science"/mathematical algorithm way to do this faster. Anyone daring enough to try to come up with a faster implementation?
Edit: I've managed to turn this into a binary search:
-(NSString*)getStringByTruncatingToSize:(CGSize)size string:(NSString*)string withFont:(UIFont*)font
{
    int min = 0, max = string.length, mid;
    while (min < max) {
        mid = (min+max)/2;
        NSString *currentString = [string substringWithRange:NSMakeRange(min, mid - min)];
        CGSize currentSize = [currentString sizeWithFont:font constrainedToSize:CGSizeMake(size.width, MAXFLOAT)];
        if (currentSize.height < size.height){
            min = mid + 1;
        } else if (currentSize.height > size.height) {
            max = mid - 1;
        } else {
            break;
        }
    }
    NSMutableString *finalString = [[string substringWithRange:NSMakeRange(0, min)] mutableCopy];
    if(finalString.length < self.length)
        [finalString replaceCharactersInRange:NSMakeRange(finalString.length - 3, 3) withString:@"..."];
    return finalString;
}
The problem is that this sometimes cuts the string too short when it has room to spare. I think this is where that last condition comes in to play. How do I make sure it doesn't cut off too much?
Good news! There is a "computer science/mathematical way" to do this faster.
The example you link to does a linear search: it just chops one character at a time from the end of the string until it's short enough. So, the amount of time it takes will scale linearly with the length of the string, and with long strings it will be really slow, as you've discovered.
However, you can easily apply a binary search technique to the string. Instead of starting at the end and dropping off one character at a time, you start in the middle:
THIS IS THE STRING THAT YOU WANT TO TRUNCATE
                       ^
You compute the width of "THIS IS THE STRING THAT". If it is too wide, you move your test point to the midpoint of the space on the left. Like this:
THIS IS THE STRING THAT YOU WANT TO TRUNCATE
           ^           |
On the other hand, if it isn't wide enough, you move the test point to the midpoint of the other half:
THIS IS THE STRING THAT YOU WANT TO TRUNCATE
                       |         ^
You repeat this until you find the point that is just under your width limit. Because you're dividing your search area in half each time, you'll never need to compute the width more than log2 N times (where N is the length of the string) which doesn't grow very fast, even for very long strings.
To put it another way, if you double the length of your input string, that's only one additional width computation.
Starting with Wikipedia's binary search sample, here's an example. Note that since we're not looking for an exact match (you want largest that will fit) the logic is slightly different.
int binary_search(NSString *A, float max_width, int imin, int imax)
{
    // continue searching while [imin,imax] is not empty
    while (imax >= imin)
    {
        /* calculate the midpoint for roughly equal partition */
        int imid = (imin + imax) / 2;
        // determine which subarray to search
        float width = ComputeWidthOfString([A substringToIndex:imid]);
        if (width < max_width)
            // change min index to search upper subarray
            imin = imid + 1;
        else if (width > max_width )
            // change max index to search lower subarray
            imax = imid - 1;
        else
            // exact match found at index imid
            return imid;
    }
    // Normally, this is the "not found" case, but we're just looking for
    // the best fit, so we return something here.
    return imin;
}
You need to do some math or testing to figure out what's the right index at the bottom, but it's definitely imin or imax, plus or minus one.
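One way to sidestep the off-by-one question is to search directly for the longest prefix that fits. The following is a sketch in Java rather than Objective-C, under two assumptions: the width never decreases as the prefix grows, and the empty prefix always fits. `Truncation`, `longestFittingPrefix`, and the `measure` callback are hypothetical names, with `measure` standing in for a real text-measurement call.

```java
import java.util.function.ToDoubleFunction;

final class Truncation {
    // Returns the largest n such that the first n characters of text fit
    // within maxWidth. Assumes width never decreases as the prefix grows
    // and that the empty prefix fits.
    static int longestFittingPrefix(String text, double maxWidth,
                                    ToDoubleFunction<String> measure) {
        int lo = 0;             // a prefix of length lo is known to fit
        int hi = text.length(); // the answer is never longer than hi
        while (lo < hi) {
            // Round the midpoint up so the range shrinks on every pass.
            int mid = lo + (hi - lo + 1) / 2;
            if (measure.applyAsDouble(text.substring(0, mid)) <= maxWidth) {
                lo = mid;       // mid fits, so try longer prefixes
            } else {
                hi = mid - 1;   // mid is too wide, so try shorter ones
            }
        }
        return lo;
    }
}
```

Because `lo` always marks a prefix that fits and `hi` an upper bound on the answer, the loop ends with `lo == hi` at the best length, and no adjustment is needed afterwards.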

undo sorting of nsarray

Here is an example of what I want to do with an NSArray that contains NSNumbers.
1. This is the NSArray "score" I want to edit.
score[0]=30,
score[1]=10,
score[2]=20
score[3]=0
2. Sort the array in descending order.
score[0]=30 //[0] This number shows the index before sorting the array
score[1]=20 //[2]
score[2]=10 //[1]
score[3]=0 //[3]
3. Edit the array (in this case, 4th gives 1st 10 points, and 3rd gives 2nd 5 points).
score[0]=40 //[0]
score[1]=25 //[2]
score[2]=5 //[1]
score[3]=-10 //[3]
4. Sort the array back to the order it was in.
score[0]=40
score[1]=5
score[2]=25
score[3]=-10
I have a problem with step 4 in the list. Can someone give me some ideas for this?
Thanks in advance.
You don't actually need to swap the values themselves, you can set up an extra level of indirection and use that.
Before the sort, you have the indexes initialised to point to the corresponding scores:
index[0] = 0 score[0] = 30
index[1] = 1 score[1] = 10
index[2] = 2 score[2] = 20
index[3] = 3 score[3] = 0
When you sort, you actually sort the indexes based on the scores they point to rather than the scores themselves. So, instead of the following comparison in your (descending) sort:
if score[i] < score[i+1] then swap score[i], score[i+1]
you instead use:
if score[index[i]] < score[index[i+1]] then swap index[i], index[i+1]
Following the sort, you then have:
index[0] = 0 score[0] = 30
index[1] = 2 score[1] = 10 \ These two indexes have been swapped
index[2] = 1 score[2] = 20 / but NOT the scores.
index[3] = 3 score[3] = 0
Then, to move points, you use the indirect indexes rather than the direct values:
score[index[0]] += 10; score[index[3]] -= 10;
score[index[1]] += 5; score[index[2]] -= 5;
Then you throw away the indexes altogether, the original array doesn't need restoration to its original order, simply because its order was never changed.
Make an additional array initialized as such:
index[0] = 0
index[1] = 1
index[2] = 2
:
And every time your sorting algorithm swaps two indices in score, you also swap the same indices in index.
If your sorting algorithm is built in, so that you have no control over it, you will have to replace each score with a tuple (an object, a two-element array, whichever is easier in Objective-C), where one element is the score and the other element is its original index. When you sort, you would pass a custom comparator to the sorting function, so that only the score is used for comparison. That would sort your score-index tuples, so that you can use the index to restore them to their original order.
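The tuple approach might look like this. It is sketched in Java for illustration, with each tuple held as a two-element array {score, originalIndex}. `TupleSortDemo` and `adjustAndRestore` are hypothetical names, and the point transfers are hard-coded from the question's example, so the input is assumed to have at least four scores.

```java
import java.util.Arrays;

final class TupleSortDemo {
    // Sorts {score, originalIndex} rows by score, applies the example edit
    // from the question, then writes the scores back in original order.
    static int[] adjustAndRestore(int[] scores) {
        int[][] rows = new int[scores.length][];
        for (int i = 0; i < scores.length; i++) {
            rows[i] = new int[] {scores[i], i};
        }
        // Descending by score; only element 0 takes part in the comparison.
        Arrays.sort(rows, (a, b) -> Integer.compare(b[0], a[0]));

        // Example edit: 4th gives 1st 10 points, 3rd gives 2nd 5 points.
        rows[0][0] += 10;
        rows[3][0] -= 10;
        rows[1][0] += 5;
        rows[2][0] -= 5;

        // Restore the original order using the stored index.
        int[] result = new int[scores.length];
        for (int[] row : rows) {
            result[row[1]] = row[0];
        }
        return result;
    }
}
```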
[This code was typed directly into the answer, so check it before use!]
I assume your score array is actually mutable as you intend to alter it:
NSMutableArray *score = ...;
Create another array the same size and initialized to 0..n-1:
NSMutableArray *indices = [NSMutableArray arrayWithCapacity:[score count]];
// add the numbers 0..[score count]-1 to indices
for (NSUInteger i = 0; i < [score count]; i++)
    [indices addObject:@(i)];
Now sort the indices array using a custom comparator which looks up the score array:
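[indices sortUsingComparator:^NSComparisonResult(NSNumber *a, NSNumber *b)
{
    // Compare b's score against a's so the highest score comes first,
    // matching the descending order in the question
    return [[score objectAtIndex:[b integerValue]]
            compare:[score objectAtIndex:[a integerValue]]];
}];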
Now you can modify your original array via the indices array, e.g. to modify the "4th" element after the sort:
[score replaceObjectAtIndex:[[indices objectAtIndex:3] integerValue] withObject:...];
Now you don't need to "unsort" score at all, your step 4 is "do nothing".
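The same index-array idea, sketched in Java for illustration: the scores are never reordered, only a separate array of positions is sorted. `IndirectSortDemo`, `rankedIndices`, and `transfer` are hypothetical names introduced for this example.

```java
import java.util.Arrays;

final class IndirectSortDemo {
    // Returns positions into scores, ordered from highest score to lowest.
    // The scores array itself is left untouched.
    static Integer[] rankedIndices(int[] scores) {
        Integer[] indices = new Integer[scores.length];
        for (int i = 0; i < indices.length; i++) {
            indices[i] = i;
        }
        Arrays.sort(indices, (a, b) -> Integer.compare(scores[b], scores[a]));
        return indices;
    }

    // Moves points from the entry ranked `from` to the entry ranked `to`
    // (ranks are zero-based, so rank 0 is the highest score).
    static void transfer(int[] scores, Integer[] ranked, int from, int to, int points) {
        scores[ranked[from]] -= points;
        scores[ranked[to]] += points;
    }
}
```

For the question's data, `rankedIndices` on {30, 10, 20, 0} yields the positions 0, 2, 1, 3. Calling `transfer(scores, ranked, 3, 0, 10)` and then `transfer(scores, ranked, 2, 1, 5)` leaves the array as {40, 5, 25, -10}, already in its original order.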

Generate combinations ordered by an attribute

I'm looking for a way to generate combinations of objects ordered by a single attribute. I don't think lexicographical order is what I'm looking for... I'll try to give an example. Let's say I have a list of objects A,B,C,D with the attribute values I want to order by being 3,3,2,1. This gives A3, B3, C2, D1 objects. Now I want to generate combinations of 2 objects, but they need to be ordered in a descending way:
A3 B3
A3 C2
B3 C2
A3 D1
B3 D1
C2 D1
Generating all combinations and sorting them is not acceptable because the real-world scenario involves large sets and millions of combinations (a set of 40, combinations of 8), and I need only the combinations above a certain threshold.
Actually I need the count of combinations above a threshold, grouped by the sum of a given attribute, but I think that is far more difficult to do, so I'd settle for generating all combinations above a threshold and counting them. If that's possible at all.
EDIT - My original question wasn't very precise... I don't actually need these combinations ordered; I just thought it would help to isolate combinations above a threshold. To be more precise, in the above example, given a threshold of 5, I want to learn that the given set produces 1 combination with a sum of 6 (A3 B3) and 2 with a sum of 5 (A3 C2, B3 C2). I don't actually need the combinations themselves.
I was looking into the subset-sum problem, but if I understood correctly, the dynamic programming solution only tells you whether a given sum exists, not how many combinations produce it.
Thanks
Actually, I think you do want lexicographic order, but descending rather than ascending. In addition:
It's not clear to me from your description that A, B, ... D play any role in your answer (except possibly as the container for the values).
I think your question example is simply "For each integer at least 5, up to the maximum possible total of two values, how many distinct pairs from the set {3, 3, 2, 1} have sums of that integer?"
The interesting part is the early bailout, once no possible solution can be reached (remaining achievable sums are too small).
I'll post sample code later.
Here's the sample code I promised, with a few remarks following:
public class Combos {
    /* permanent state for instance */
    private int values[];
    private int length;
    /* transient state during single "count" computation */
    private int n;
    private int limit;
    private Tally<Integer> tally;
    private int best[][]; // used for early-bail-out

    private void initializeForCount(int n, int limit) {
        this.n = n;
        this.limit = limit;
        best = new int[n+1][length+1];
        for (int i = 1; i <= n; ++i) {
            for (int j = 0; j <= length - i; ++j) {
                best[i][j] = values[j] + best[i-1][j+1];
            }
        }
    }

    private void countAt(int left, int start, int sum) {
        if (left == 0) {
            tally.inc(sum);
        } else {
            for (
                int i = start;
                i <= length - left
                && limit <= sum + best[left][i]; // bail-out-check
                ++i
            ) {
                countAt(left - 1, i + 1, sum + values[i]);
            }
        }
    }

    public Tally<Integer> count(int n, int limit) {
        tally = new Tally<Integer>();
        if (n <= length) {
            initializeForCount(n, limit);
            countAt(n, 0, 0);
        }
        return tally;
    }

    public Combos(int[] values) {
        this.values = values;
        this.length = values.length;
    }
}
Preface remarks:
This uses a little helper class called Tally, that just isolates the tabulation (including initialization for never-before-seen keys). I'll put it at the end.
To keep this concise, I've taken some shortcuts that aren't good practice for "real" code:
This doesn't check for a null value array, etc.
I assume that the value array is already sorted into descending order, required for the early-bail-out technique. (Good production code would include the sorting.)
I put transient data into instance variables instead of passing them as arguments among the private methods that support count. That makes this class non-thread-safe.
Explanation:
An instance of Combos is created with the (descending ordered) array of integers to combine. The value array is set up once per instance, but multiple calls to count can be made with varying population sizes and limits.
The count method triggers a (mostly) standard recursive traversal of unique combinations of n integers from values. The limit argument gives the lower bound on sums of interest.
The countAt method examines combinations of integers from values. The left argument is how many integers remain to make up n integers in a sum, start is the position in values from which to search, and sum is the partial sum.
The early-bail-out mechanism is based on computing best, a two-dimensional array that specifies the "best" sum reachable from a given state. The value in best[n][p] is the largest sum of n values beginning in position p of the original values.
The recursion of countAt bottoms out when the correct population has been accumulated; this adds the current sum (of n values) to the tally. If countAt has not bottomed out, it sweeps the values from the start-ing position to increase the current partial sum, as long as:
enough positions remain in values to achieve the specified population, and
the best (largest) subtotal remaining is big enough to make the limit.
A sample run with your question's data:
int[] values = {3, 3, 2, 1};
Combos mine = new Combos(values);
Tally<Integer> tally = mine.count(2, 5);
for (int i = 5; i < 9; ++i) {
    int n = tally.get(i);
    if (0 < n) {
        System.out.println("found " + tally.get(i) + " sums of " + i);
    }
}
produces the results you specified:
found 2 sums of 5
found 1 sums of 6
Here's the Tally code:
// Needs java.util.Collection, java.util.HashMap and java.util.Map.
// Declared static because it is meant to be nested inside another class.
public static class Tally<T> {
    private Map<T,Integer> tally = new HashMap<T,Integer>();

    public Tally() {/* nothing */}

    public void inc(T key) {
        Integer value = tally.get(key);
        if (value == null) {
            value = Integer.valueOf(0);
        }
        tally.put(key, (value + 1));
    }

    public int get(T key) {
        Integer result = tally.get(key);
        return result == null ? 0 : result;
    }

    public Collection<T> keys() {
        return tally.keySet();
    }
}
I have written a class to handle common functions for working with the binomial coefficient, which is the type of problem that your problem falls under. It performs the following tasks:
Outputs all the K-indexes in a nice format for any N choose K to a file. The K-indexes can be substituted with more descriptive strings or letters. This method makes solving this type of problem quite trivial.
Converts the K-indexes to the proper index of an entry in the sorted binomial coefficient table. This technique is much faster than older published techniques that rely on iteration. It does this by using a mathematical property inherent in Pascal's Triangle. My paper talks about this. I believe I am the first to discover and publish this technique, but I could be wrong.
Converts the index in a sorted binomial coefficient table to the corresponding K-indexes.
Uses Mark Dominus method to calculate the binomial coefficient, which is much less likely to overflow and works with larger numbers.
The class is written in .NET C# and provides a way to manage the objects related to the problem (if any) by using a generic list. The constructor of this class takes a bool value called InitTable that when true will create a generic list to hold the objects to be managed. If this value is false, then it will not create the table. The table does not need to be created in order to perform the 4 above methods. Accessor methods are provided to access the table.
There is an associated test class which shows how to use the class and its methods. It has been extensively tested with 2 cases and there are no known bugs.
To read about this class and download the code, see Tablizing The Binomial Coeffieicent.
Check out this question in stackoverflow: Algorithm to return all combinations
I also just used the Java code below to generate all permutations, but it could easily be adapted to generate a unique combination from a given index.
public static <E> E[] permutation(E[] s, int num) {//s is the input elements array and num is the number which represents the permutation
    int factorial = 1;
    for(int i = 2; i < s.length; i++)
        factorial *= i;//calculates the factorial of (s.length - 1)
    if (num/s.length >= factorial)// Optional. if the number is not in the range of [0, s.length! - 1]
        return null;
    for(int i = 0; i < s.length - 1; i++){//go over the array
        int tempi = (num / factorial) % (s.length - i);//calculates the next cell from the cells left (the cells in the range [i, s.length - 1])
        E temp = s[i + tempi];//Temporarily saves the value of the cell needed to add to the permutation this time
        for(int j = i + tempi; j > i; j--)//shift all elements to "cover" the "missing" cell
            s[j] = s[j-1];
        s[i] = temp;//put the chosen cell in the correct spot
        factorial /= (s.length - (i + 1));//updates the factorial
    }
    return s;
}
I am extremely sorry (after all those clarifications in the comments) to say that I could not find an efficient solution to this problem. I tried for the past hour with no results.
The reason (I think) is that this problem is very similar to problems like the traveling salesman problem. Unless you try all the combinations, there is no way to know which attributes will add up to the threshold.
There seems to be no clever trick that can solve this class of problems.
Still there are many optimizations that you can do to the actual code.
Try sorting the data according to the attributes. You may be able to avoid processing some values from the list when you find that a higher value cannot satisfy the threshold (so all lower values can be eliminated).
If you're using C# there is a fairly good generics library here. Note though that the generation of some permutations is not in lexicographic order
Here's a recursive approach to count the number of these subsets: We define a function count(minIndex,numElements,minSum) that returns the number of subsets of size numElements whose sum is at least minSum, containing elements with indices minIndex or greater.
As in the problem statement, we sort our elements in descending order, e.g. [3,3,2,1], and call the first index zero, and the total number of elements N. We assume all elements are nonnegative. To find all 2-subsets whose sum is at least 5, we call count(0,2,5).
Sample Code (Java):
int count(int minIndex, int numElements, int minSum)
{
    int total = 0;
    if (numElements == 1)
    {
        // just count number of elements >= minSum
        for (int i = minIndex; i <= N-1; i++)
            if (a[i] >= minSum) total++; else break;
    }
    else
    {
        if (minSum <= 0)
        {
            // any subset will do (n-choose-k of them)
            if (numElements <= (N-minIndex))
                total = nchoosek(N-minIndex, numElements);
        }
        else
        {
            // add element a[i] to the set, and then consider the count
            // for all elements to its right
            for (int i = minIndex; i <= (N-numElements); i++)
                total += count(i+1, numElements-1, minSum-a[i]);
        }
    }
    return total;
}
Btw, I've run the above with an array of 40 elements, and size-8 subsets and consistently got back results in less than a second.
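The snippet above relies on `a`, `N`, and `nchoosek` being defined elsewhere. A self-contained variant could look like the sketch below; the wrapper class `SubsetCounter`, the `long` return type, and the `nchoosek` implementation are assumptions added here, not part of the original answer. As in the answer, the array must be sorted in descending order, contain only nonnegative values, and `numElements` must be at least 1.

```java
final class SubsetCounter {
    private final int[] a; // sorted in descending order, all values >= 0

    SubsetCounter(int[] sortedDescending) {
        this.a = sortedDescending;
    }

    // Binomial coefficient; each intermediate value is itself a binomial
    // coefficient, so the division is always exact.
    static long nchoosek(int n, int k) {
        if (k < 0 || k > n) {
            return 0;
        }
        long result = 1;
        for (int i = 1; i <= k; i++) {
            result = result * (n - k + i) / i;
        }
        return result;
    }

    // Counts subsets of size numElements, drawn from indices >= minIndex,
    // whose sum is at least minSum.
    long count(int minIndex, int numElements, int minSum) {
        int n = a.length;
        long total = 0;
        if (numElements == 1) {
            // Sorted descending, so stop at the first element that is too small.
            for (int i = minIndex; i < n && a[i] >= minSum; i++) {
                total++;
            }
        } else if (minSum <= 0) {
            // Every remaining subset qualifies.
            total = nchoosek(n - minIndex, numElements);
        } else {
            // Fix a[i] as the next element, then count completions to its right.
            for (int i = minIndex; i <= n - numElements; i++) {
                total += count(i + 1, numElements - 1, minSum - a[i]);
            }
        }
        return total;
    }
}
```

For the question's data, `new SubsetCounter(new int[] {3, 3, 2, 1}).count(0, 2, 5)` gives 3: one pair summing to 6 and two summing to 5.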