Is there really no predefined dynamic 2d container in Smalltalk? Do I have to make my own? - smalltalk

I need a dynamic 2D container, and I was surprised that I could not find anything useful in the Collections. So I made my own in old-school fashion, but somehow I feel like there must be something I'm missing. The whole concept in Smalltalk/Pharo is based on using their stuff instead of having to build your own.

OK, so you want to have a collection of objects (morphs in your case) arranged by rows and columns. Here is one way to do this
Initialization: Create an instance variable in your class for holding the objects, and initialize it as:
morphs := OrderedCollection new
Addition: Place new objects in your collection by means of a method like this one
placeMorph: morph atRow: i column: j
| row |
row := morphs at: i ifAbsentPut: [OrderedCollection new].
j - row size timesRepeat: [row add: nil].
row at: j put: morph
Note that adding nil exactly j - row size times (which could be <= 0, in which case nothing is added) ensures the existence of a slot at row i column j.
Retrieval: Get the object at a given position in the grid or nil
morphAtRow: i column: j
| row |
row := morphs at: i ifAbsent: [^nil].
^row at: j ifAbsent: [nil]
Another possibility would be to use a Dictionary, which could make sense if the grid is large and sparse. In that case you could do the following
Initialization
morphs := Dictionary new
Addition
placeMorph: morph atRow: i column: j
morphs at: i -> j put: morph
Retrieval
morphAtRow: i column: j
^morphs at: i -> j ifAbsent: [nil]
Note that I've used associations i -> j for the keys. Another possibility would have been to use pairs {i.j}.
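For readers who want to see the sparse-dictionary idea outside Smalltalk, here is a minimal Python sketch of the same approach. The class name `SparseGrid` and its method names are made up for this illustration; only the idea of keying a dictionary by a (row, column) pair comes from the answer above.

```python
class SparseGrid:
    """A growable 2D container backed by a dict keyed by (row, column)."""

    def __init__(self):
        self._cells = {}

    def place(self, item, row, column):
        # No resizing is needed: only occupied cells take up space.
        self._cells[(row, column)] = item

    def at(self, row, column):
        # Mirrors morphAtRow:column: by returning None for an empty slot.
        return self._cells.get((row, column))


grid = SparseGrid()
grid.place("morph-a", 3, 7)
print(grid.at(3, 7))  # the stored item
print(grid.at(1, 1))  # None, nothing was placed there
```

A tuple plays the role of the association i -> j here, since it is hashable and compares by value.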

Pharo has a Matrix class. That's pretty much a 2D container, unless you are talking about something else I do not understand :)

Related

For loop for array in Pharo Smalltalk

I'm trying to make an array with random numbers (just 0 or 1), but when I run it, it just prints this: End of statement list encountered ->
This is my code:
GenList
| lista |
lista := Array new: 31.
1 to: 30 do: [ :i | lista at: i put: 2 atRandom - 1]
^lista
What can I do?
Some interesting things to consider:
1. The method selector doesn't start with a lowercase letter
It is a tradition for selectors to start with a lowercase letter. In this sense, genLista would be more correct than GenLista.
2. The method selector includes the abbreviated word 'gen'
For instance, genLista could be renamed to genereLista or listaAlAzar (if you decide to use Spanish)
3. The Array named lista has 31 elements, not 30
The result of Array new: 31 is an array of 31 elements. However, the code below it only fills 30 of them, leaving the last one uninitialized (i.e., nil). Possible solution: lista := Array new: 30.
4. A dot is missing causing a compilation error
The code
1 to: 30 do: [ :i | lista at: i put: 2 atRandom - 1]
^lista
does not compile because there is no dot separating the two statements. Note that the error happens at compilation time (i.e., when you save the method) because the return token ^ must start a statement (i.e., it cannot be inlined inside a statement).
There are other cases where a missing dot will not prevent the code from compiling. Instead, an error will happen at runtime. Here is a (typical) example:
1 to: 10 do: [:i | self somethingWith: i] "<- missing dot here"
self somethingElse
the missing dot will generate the runtime error self not understood by block.
5. There is a more expressive way of generating 0s and 1s at random
The calculation 2 atRandom - 1 is ok. However, it forces the reader to mentally do the math. A better way to reveal your intention would have been
#(0 1) atRandom
6. When playing with random numbers don't forget to save the seed
While it is ok to use atRandom, such a practice should only be used with "toy" code. If you are developing a system or a library, the recommended practice is to save the seed somewhere before generating any random data. This will allow you to reproduce the generation of random quantities later on for the sake of debugging or confirmation. (Note however, that this will not suffice for making your program deterministically reproducible because unordered (e.g. hashed) collections could form differently in successive executions.)
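The seed advice is language independent. As a rough illustration, here is a Python sketch that generates 30 random 0/1 values from an explicitly seeded generator. The function name `random_bits` and the seed value are assumptions made for the example, not part of the original question.

```python
import random


def random_bits(count, seed):
    """Return `count` random 0/1 values from a generator seeded with `seed`."""
    rng = random.Random(seed)  # private generator; global random state is untouched
    return [rng.choice((0, 1)) for _ in range(count)]


seed = 12345  # hypothetical value; in practice, record whichever seed was used
first = random_bits(30, seed)
second = random_bits(30, seed)
print(first == second)  # the same seed reproduces the same sequence
```

Storing the seed next to the results is what makes a later debugging run repeatable.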

Conditional swapping of items in an array

I have a collection of items of type A, B, and C.
I would like to process the collection and swap all A and B pairs, but if there is C (which is also a collection), I want to process that recursively.
So
#(A1 A2 B1 B2 A3 #(A4 A5 B3) )
would be translated into
#(A1 B1 A2 B2 A3 #(A4 B3 A5) )
The swap isn't transitive so #(A1 B1 B2) will be translated into #(B1 A1 B2) and not #(B1 B2 A1).
I wanted to use overlappingPairsDo: but the problem is that the second element is always processed twice.
Can this be achieved somehow with Collection API without resorting to primitive forloops?
I am looking for readable, not performant solution.
I think my solution below should do what you're after, but a few notes up front:
The "requirements" seem a bit artificial - as in, I'm having a hard time imagining a use case where you'd want this sort of swapping. But that could just be my lack of imagination, of course, or due to your attempt to simplify the problem.
A proper solution should, in my opinion, create the objects required so that code can be moved where it belongs. My solution just plunks it (mostly) in a class-side method for demonstration purposes.
You're asking for this to be "achieved somehow with [the] Collection API without resorting to primitive for[-]loops" - I wouldn't be so quick to dismiss going down to the basics. After all, if you look at the implementation of, say, #overlappingPairsDo:, that's exactly what they do, and since you're asking your question within the pharo tag, you're more than welcome to contribute your new way of doing something useful to the "Collections API" so that we can all benefit from it.
To help out, I've added a class SwapPairsDemo with two class-side methods. The first one is just a helper, since, for demonstration purposes, we're using the Array objects from your example, and they contain ByteSymbol instances as your A and B types which we want to distinguish from the C collection type - only, ByteSymbols are of course themselves collections, so let's pretend they're not just for the sake of this exercise.
isRealCollection: anObject
^anObject isCollection
and: [anObject isString not
and: [anObject isSymbol not]]
The second method holds the code to show swapping and to allow recursion:
swapPairsIn: aCollection ifTypeA: isTypeABlock andTypeB: isTypeBBlock
| shouldSwapValues wasJustSwapped |
shouldSwapValues := OrderedCollection new: aCollection size - 1 withAll: false.
aCollection overlappingPairsWithIndexDo: [:firstItem :secondItem :eachIndex |
(self isRealCollection: firstItem)
ifTrue: [self swapPairsIn: firstItem ifTypeA: isTypeABlock andTypeB: isTypeBBlock]
ifFalse: [
shouldSwapValues at: eachIndex put: ((self isRealCollection: secondItem) not
and: [(isTypeABlock value: firstItem)
and: [isTypeBBlock value: secondItem]])
]
].
(self isRealCollection: aCollection last)
ifTrue: [self swapPairsIn: aCollection last ifTypeA: isTypeABlock andTypeB: isTypeBBlock].
wasJustSwapped := false.
shouldSwapValues withIndexDo: [:eachBoolean :eachIndex |
(eachBoolean and: [wasJustSwapped not])
ifTrue: [
aCollection swap: eachIndex with: eachIndex + 1.
wasJustSwapped := true
]
ifFalse: [wasJustSwapped := false]
]
That's a bit of a handful, and I'd usually refactor a method this big, plus you might want to take care of nil, empty lists, etc., but hopefully you get the idea for an approach to your problem. The code consists of three steps:
Build a collection (size one less than the size of the main collection) of booleans to determine whether two items should be swapped by iterating with overlappingPairsWithIndexDo:.
This iteration doesn't handle the last element by itself, so we need to take care of this element possibly being a collection in a separate step.
Finally, we use our collection of booleans to perform the swapping, but we don't swap again if we've just swapped the previous time (I think this is what you meant by the swap not being transitive).
To run the code, you need to supply your collection and a way to tell whether things are type "A" or "B" - I've just used your example, so I just ask them whether they start with those letters - obviously that could be substituted by whatever fits your use case.
| collection target isTypeA isTypeB |
collection := #(A1 A2 B1 B2 A3 #(A4 A5 B3) ).
target := #(A1 B1 A2 B2 A3 #(A4 B3 A5) ).
isTypeA := [:anItem | anItem beginsWith: 'A'].
isTypeB := [:anItem | anItem beginsWith: 'B'].
SwapPairsDemo swapPairsIn: collection ifTypeA: isTypeA andTypeB: isTypeB.
^collection = target
Inspecting this in a workspace returns true, i.e. the swaps on the collection have been performed so that it is now the same as the target.
Here's a solution without recursion that uses a two step approach:
result := OrderedCollection new.
#(1 3 4 6 7)
piecesCutWhere: [ :a :b | a even = b even ]
do: [ :run |
result addAll: ((result isEmpty or: [ result last even ~= run first even ])
ifTrue: [ run ]
ifFalse: [ run reverse ]) ].
result asArray = #(1 4 3 6 7) "--> true"
So first we split the collection wherever we see the possibility for swapping. Then, in the second step, we only swap if the last element of the result collection still allows for swapping.
Adding recursion to this should be straightforward.
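For comparison, here is a compact Python sketch of the same swapping rule, including the recursion into nested collections. It is not a translation of either answer: it returns a new list instead of swapping in place, and it treats any Python `list` as the "C" type, which is an assumption made for this example.

```python
def swap_pairs(items, is_a, is_b):
    """Return a copy of `items` with each adjacent (A, B) pair swapped once."""
    result = []
    i = 0
    while i < len(items):
        current = items[i]
        if isinstance(current, list):
            # Type C: a nested collection, processed recursively.
            result.append(swap_pairs(current, is_a, is_b))
            i += 1
            continue
        has_next = i + 1 < len(items)
        if (has_next and not isinstance(items[i + 1], list)
                and is_a(current) and is_b(items[i + 1])):
            # Emit the pair swapped and step past both elements, so the A
            # that just moved is never swapped again (the non-transitive rule).
            result.extend([items[i + 1], current])
            i += 2
        else:
            result.append(current)
            i += 1
    return result


def is_a(item):
    return item.startswith("A")


def is_b(item):
    return item.startswith("B")


swapped = swap_pairs(["A1", "A2", "B1", "B2", "A3", ["A4", "A5", "B3"]], is_a, is_b)
print(swapped)  # ['A1', 'B1', 'A2', 'B2', 'A3', ['A4', 'B3', 'A5']]
```

Advancing the index by two after a swap is what replaces the wasJustSwapped flag in the Smalltalk version.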

Optimal Solution: Get a random sample of items from a data set

So I recently had this as an interview question and I was wondering what the optimal solution would be. Code is in Objective-c.
Say we have a very large data set, and we want to get a random sample
of items from it for testing a new tool. Rather than worry about the
specifics of accessing things, let's assume the system provides these
things:
// Return a random number from the set 0, 1, 2, ..., n-2, n-1.
int Rand(int n);
// Interface to implementations other people write.
@interface Dataset : NSObject
// YES when there is no more data.
- (BOOL)endOfData;
// Get the next element and move forward.
- (NSString*)getNext;
@end
// This function reads elements from |input| until the end, and
// returns an array of |k| randomly-selected elements.
- (NSArray*)getSamples:(unsigned)k from:(Dataset*)input
{
// Describe how this works.
}
Edit: So you are supposed to randomly select items from a given array. So if k = 5, then I would want to randomly select 5 elements from the dataset and return an array of those items. Each element in the dataset has to have an equal chance of getting selected.
This seems like a good time to use Reservoir Sampling. The following is an Objective-C adaptation for this use case:
NSMutableArray* result = [[NSMutableArray alloc] initWithCapacity:k];
int i,j;
for (i = 0; i < k; i++) {
[result setObject:[input getNext] atIndexedSubscript:i];
}
for (i = k; ![input endOfData]; i++) {
j = Rand(i + 1); // uniform over 0..i, so element i is kept with probability k/(i+1)
NSString* next = [input getNext];
if (j < k) {
[result setObject:next atIndexedSubscript:j];
}
}
return result;
The code above is not the most efficient reservoir sampling algorithm because it generates a random number for every input element past the first k. Slightly more complex algorithms exist under the general category "reservoir sampling". This is an interesting read on an algorithm named "Algorithm Z". I would be curious if people find newer literature on reservoir sampling, too, because this article was published in 1985.
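As a cross-check of the basic technique, here is a minimal Python sketch of simple one-pass reservoir sampling (often called Algorithm R). The function name and the optional `rng` parameter are assumptions for the example. Unlike the Objective-C snippet, it also returns a shorter result when the input has fewer than k elements.

```python
import random


def reservoir_sample(stream, k, rng=random):
    """Return k items chosen uniformly at random from an iterable of unknown length."""
    reservoir = []
    for index, item in enumerate(stream):
        if index < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # index + 1 items have been seen; keep this one with probability k / (index + 1).
            j = rng.randrange(index + 1)
            if j < k:
                reservoir[j] = item
    return reservoir


sample = reservoir_sample(range(1000), 5, random.Random(42))
print(sample)
```

The random index has to range over all items seen so far, including the current one. That is why the bound is `index + 1` rather than `index`.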
Interesting question, but as there is no count or similar method in Dataset and you are not allowed to iterate more than once, I can only come up with this solution to get good random samples (no handling of k > data size):
- (NSArray *)getSamples:(unsigned)k from:(Dataset*)input {
NSMutableArray *source = [[NSMutableArray alloc] init];
while(![input endOfData]) {
[source addObject:[input getNext]];
}
NSMutableArray *ret = [[NSMutableArray alloc] initWithCapacity:k];
int count = [source count];
while ([ret count] < k) {
int index = Rand(count);
[ret addObject:[source objectAtIndex:index]];
[source removeObjectAtIndex:index];
count--;
}
return ret;
}
This is not the answer I did in the interview but here is what I wish I had done:
Store pointer to first element in dataset
Loop over dataset to get count
Reset dataset to point at first element
Create NSMutableDictionary for storing random indexes
Do for loop from i=0 to i=k. Each iteration, generate a random value, check if value exists in dictionary. If it does, keep generating a random value until you get a fresh value.
Loop over dataset. If the current index is in the dictionary, add the element to the array of random subset values.
Return array of random subsets.
There are multiple ways to do this, the first way:
1. use input parameter k to dynamically allocate an array of numbers
unsigned * numsArray = (unsigned *)malloc(sizeof(unsigned) * k);
2. run a loop that gets k random numbers and stores them into the numsArray (must be careful here to check each new random to see if we have gotten it before, and if we have, get another random, etc...)
3. sort numsArray
4. run a loop beginning at the beginning of DataSet with your own incrementing counter dataCount and another counter numsCount both beginning at 0. whenever dataCount is equal to numsArray[numsCount], grab the current data object and add it to your newly created random list then increment numsCount.
5. The loop in step 4 can end when either numsCount reaches k or when dataCount reaches the end of the dataset.
6. The only other step that may need to be added here is before any of this to use the next command of the object type to count how large the dataset is to be able to bound your random numbers and check to make sure k is less than or equal to that.
The 2nd way to do this would be to run through the actual list MULTIPLE times.
// one must assume that once we get to the end, we can start over within the set again
1. run a while loop that checks for endOfData
a. count up a count variable that is initialized to 0
2. run a loop from 0 through k-1
a. generate a random number that you constrain to the list size
b. run a loop that moves through the dataset until it hits the rand element
c. compare that element with all other elements in your new list to make sure it isnt already in your new list
d. store the element into your new list
There may be ways to speed up the 2nd method by storing the current list location; that way, if you generate a random that is past the current pointer, you don't have to move through the whole list again to get back to element 0 and then to the element you wish to retrieve.
A potential 3rd way to do this might be to:
1. run a loop from 0 through k-1
a. generate a random
b. use the generated random as a skip count, move skip count objects through the list
c. store the current item from the list into your new list
The problem with this 3rd method is that without knowing how big the list is, you don't know how to constrain the random skip count. Further, even if you did, chances are that it wouldn't truly look like a randomly grabbed subset, since it would be statistically unlikely that you would ever reach the last element in the list (i.e., not every element is given an equal shot of being selected).
Arguably the FASTEST way to do this is method 1, where you generate the random numbers first, then traverse the list only once (yes, it's actually twice: once to get the size of the dataset, then again to grab the random elements).
We need a little probability theory. Like others, I will ignore the case n < k. The probability that the n'th item will be selected into the set of size k is just C(n-1, k-1) / C(n, k) where C is the binomial coefficient. A bit of math shows that this is just k/n. For the rest, note that the selection of the n'th item is independent of all other selections. In other words, "the past doesn't matter."
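For completeness, the "bit of math" can be written out by expanding both binomial coefficients as factorials:

\[
\frac{\binom{n-1}{k-1}}{\binom{n}{k}}
= \frac{(n-1)!}{(k-1)!\,(n-k)!} \cdot \frac{k!\,(n-k)!}{n!}
= \frac{k!}{(k-1)!} \cdot \frac{(n-1)!}{n!}
= \frac{k}{n}.
\]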
So an algorithm is:
S = set of up to k elements
n = 0
while not end of input
v = next value
n = n + 1
if |S| < k add v to S
else if random(0,1) < k/n replace a randomly chosen element of S with v
I will let the coders code this one! It's pretty trivial. All you need is an array of size k and one pass over the data.
If you care about efficiency (as your tags suggest) and the number of items in the population is known, don't use reservoir sampling. That would require you to loop through the entire data set and generate a random number for each item.
Instead, pick five values in the range 0 to n-1. In the unlikely case that there is a duplicate among the five indexes, replace the duplicate with another random value. Then use the five indexes to do a random-access lookup of the i-th element in the population.
This is simple. It uses a minimum number of calls to the random number generator. And it accesses memory only for the relevant selections.
If you don't know the number of data elements in advance, you can loop-over the data once to get the population size and proceed as above.
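A minimal Python sketch of the known-size approach follows, assuming the population supports random access by index. The names `distinct_indices` and `population` are hypothetical, and the redraw-on-duplicate loop is just one way to implement "replace the duplicate with another random value".

```python
import random


def distinct_indices(n, k, rng=random):
    """Pick k distinct indices from range(n), redrawing on the rare duplicate."""
    if k > n:
        raise ValueError("cannot pick more indices than there are items")
    chosen = set()
    while len(chosen) < k:
        chosen.add(rng.randrange(n))  # a duplicate leaves the set unchanged
    return sorted(chosen)


population = [f"item-{i}" for i in range(1000)]  # hypothetical data set
indices = distinct_indices(len(population), 5, random.Random(7))
sample = [population[i] for i in indices]
print(sample)
```

When k is small relative to n, duplicates are rare, so the loop makes close to k calls to the generator. Python's standard `random.sample(range(n), k)` gives the same kind of result in one call.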
If you aren't allowed to iterate over the data more than once, use a chunked form of reservoir sampling: 1) Choose the first five elements as the initial sample, each having a probability of 1/5th. 2) Read in a large chunk of data and choose five new samples from the new set (using only five calls to Rand). 3) Pairwise, decide whether to keep the new sample item or the old sample element (with odds proportional to the probabilities for each of the two sample groups). 4) Repeat until all the data has been read.
For example, assume there are 1000 data elements (but we don't know this in advance).
Choose the first five as the initial sample: current_sample = read(5); population=5.
Read a chunk of n datapoints (perhaps n=200 in this example):
subpop = read(200);
m = len(subpop);
new_sample = choose(5, subpop);
loop-over the two samples pairwise:
for (a, b) in (current_sample and new_sample): if random(0 to population + m) < population, then keep a, otherwise keep b
population += m
repeat

how to optimize search difference between array / list of object

Premise:
I am using ActionScript with two arraycollections containing objects with values to be matched...
I need a solution for this (if in the framework there is a library that does it better) otherwise any suggestions are appreciated...
Let's assume I have two lists of elements A and B (no duplicate values) and I need to compare them and remove all the elements present in both, so at the end I should have
in A all the elements that are in A but not in B
in B all the elements that are in B but not in A
now I do something like that:
for (var i:int = 0 ; i < a.length ;)
{
var isFound:Boolean = false;
for (var j:int = 0 ; j < b.length ;)
{
if (a.getItemAt(i).nome == b.getItemAt(j).nome)
{
isFound = true;
a.removeItemAt(i);
b.removeItemAt(j);
break;
}
j++;
}
if (!isFound)
i++;
}
I cycle through both arrays, and if I find a match I remove the items from both arrays (and don't increment the loop index, so the for loop progresses correctly).
I was wondering if (and I'm sure there is) there is a better (and less CPU consuming) way to do it...
If you must use a list, and you don't need the abilities of ArrayCollection, I suggest simply converting it to AS3 Vectors. The performance increase, according to this (http://www.mikechambers.com/blog/2008/09/24/actioscript-3-vector-array-performance-comparison/), is 60% compared to Arrays. I believe Arrays are already 3x faster than ArrayCollections, from some article I once read. Unfortunately, this solution is still O(n^2) in time.
As an aside, the reason why Vectors are faster than ArrayCollections is because you provide type-hinting to the VM. The VM knows exactly how large each object is in the collection and performs optimizations based on that.
Another optimization on the vectors is to sort the data first by nome before doing the comparisons. You add another check to break out of the loop if the nome of list b simply wouldn't be found further down in list A due to the ordering.
If you want to do MUCH faster than that, use an associative array (Object in AS3). Of course, this may require more refactoring effort. I am assuming object.nome is a unique string/id for the objects. Simply use the value of nome as the key in objectA and objectB. By doing it this way, you might not need to loop through each element in each list to do the comparison.
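To make the associative-array idea concrete, here is a short Python sketch that removes the items common to both lists in roughly linear time. It assumes, as the answer does, that `nome` is a unique key. The function name `remove_common` and the sample data are invented for the illustration.

```python
def remove_common(a, b, key):
    """Return (only_in_a, only_in_b), comparing items by key(item)."""
    keys_a = {key(item) for item in a}
    keys_b = {key(item) for item in b}
    # Membership tests on a set are constant time on average, so each
    # list is scanned once instead of once per element of the other list.
    only_in_a = [item for item in a if key(item) not in keys_b]
    only_in_b = [item for item in b if key(item) not in keys_a]
    return only_in_a, only_in_b


a = [{"nome": "x"}, {"nome": "y"}, {"nome": "z"}]
b = [{"nome": "y"}, {"nome": "w"}]
only_a, only_b = remove_common(a, b, key=lambda item: item["nome"])
print(only_a)  # [{'nome': 'x'}, {'nome': 'z'}]
print(only_b)  # [{'nome': 'w'}]
```

The same structure should carry over to ActionScript by using a plain Object (or Dictionary) as the lookup table.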

Algorithm for 'Syncing' 2 Arrays

Array 1 | Array 2
=================
1 | 2
2 | 3
3 | 4
5 | 5
| 6
What is a good algorithm to 'sync' or combine Array 2 into Array 1? The following needs to happen:
Integers in Array 2 but not in Array 1 should be added to Array 1.
Integers in both Arrays can be left alone.
Integers in Array 1 but not in Array 2 should be removed from Array 1.
I'll eventually be coding this in Obj-C, but I'm really just looking for a pseudo-code representation of an efficient algorithm to solve this problem so feel free to suggest an answer in whatever form you'd like.
EDIT:
The end result I need is bit hard to explain without giving the backstory. I have a Cocoa application that has a Core Data entity whose data needs to be updated with data from a web service. I cannot simply overwrite the contents of Array 1 (the core data entity) with the content of Array 2 (the data parsed from the web into an array) because the Array 1 has relationships with other core data entities in my application. So basically it is important that integers contained in both Arrays are not overwritten in array one.
Array1 = Array2.Clone() or some equivalent might be the simplest solution, unless the order of elements is important.
I'm kind of guessing since your example leaves some things up in the air, but typically in situations like this I would use a set. Here's some example code in Obj-C.
NSMutableSet *set = [NSMutableSet set];
[set addObjectsFromArray:array1];
[set addObjectsFromArray:array2];
NSArray *finalArray = [[set allObjects] sortedArrayUsingSelector:@selector(compare:)];
(Assuming this is not a simple Array1 = Array2 question,) if the arrays are sorted, you're looking at a single O(n+m) pass over both arrays. Point to the beginning of both arrays, then advance the pointer containing the smaller element. Keep comparing the elements as you go and add/delete elements accordingly. The efficiency of this might be worth the cost of sorting the arrays, if they aren't already such.
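The single pass over two sorted arrays can be sketched in Python as follows. The function name `sorted_diff` and the choice to return two lists (what to add, what to remove) are assumptions for the example. Both inputs are assumed to be sorted and free of duplicates.

```python
def sorted_diff(current, incoming):
    """Given two sorted lists without duplicates, return (to_add, to_remove)."""
    to_add, to_remove = [], []
    i = j = 0
    while i < len(current) and j < len(incoming):
        if current[i] == incoming[j]:
            # Present in both: leave it alone.
            i += 1
            j += 1
        elif current[i] < incoming[j]:
            # Only in the current array: schedule it for removal.
            to_remove.append(current[i])
            i += 1
        else:
            # Only in the incoming array: schedule it for addition.
            to_add.append(incoming[j])
            j += 1
    # Whatever is left over on either side has no counterpart.
    to_remove.extend(current[i:])
    to_add.extend(incoming[j:])
    return to_add, to_remove


to_add, to_remove = sorted_diff([1, 2, 3, 5], [2, 3, 4, 5, 6])
print(to_add)     # [4, 6]
print(to_remove)  # [1]
```

Returning the two change lists, rather than mutating the array, keeps the pass at O(n+m) and lets the caller apply inserts and deletes in whatever way suits the data store.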
In my approach, you will need a Set data structure. I hope you can find an implementation in Obj-C.
Add all elements of Array1 to Set1, and do the same for Array2 to Set2.
Loop through the elements of Array1. Check whether each one is contained in Set2 (using the provided method). If it is not, remove the element from Set1.
Loop through the elements of Array2. If an element does not exist in Set1 yet, add it to Set1.
All elements of Set1 are now your "synced" data.
The algorithm complexity of "contains","delete", and "add" operation of "Set" on some good implementation, such as HashSet, would give you the efficiency you want.
EDIT: Here is a simple implementation of a Set, assuming that the integers are in a limited range of 0 - 100 and every element is initialized to 0, just to give a clearer idea about Set.
You first need to define an array bucket of size 101. And then for ..
contains(n) - check if bucket[n] is 1 or not.
add(n) - set bucket[n] to 1.
delete(n) - set bucket[n] to 0.
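The set-based steps can be sketched in Python like this. Updating Array 1 in place, so that entries present in both arrays are kept rather than recreated, is an assumption based on the Core Data constraint described in the question. The function name `sync` is just for the example.

```python
def sync(array1, array2):
    """Make array1 hold the same values as array2, keeping shared entries as they are."""
    wanted = set(array2)
    present = set(array1)
    # Drop values that no longer exist in array2; survivors are the same objects.
    array1[:] = [value for value in array1 if value in wanted]
    # Append values that are new in array2.
    array1.extend(value for value in array2 if value not in present)
    return array1


array1 = [1, 2, 3, 5]
sync(array1, [2, 3, 4, 5, 6])
print(array1)  # [2, 3, 5, 4, 6]
```

Each membership test is constant time on average, so the whole sync is linear in the combined length of the two arrays.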
You say:
What is a good algorithm to 'sync' or combine Array 2 into Array 1? The following needs to happen:
Integers in Array 2 but not in Array 1 should be added to Array 1.
Integers in both Arrays can be left alone.
Integers in Array 1 but not in Array 2 should be removed from Array 1.
Here's a literal translation of the algorithm to help you (Python):
def sync(a, b):
    # a is array 1
    # b is array 2
    c = a + b  # a new list, so modifying a inside the loop is safe
    for el in c:
        if el in b and el not in a:
            a.append(el)  # add to array 1
        elif el in a and el not in b:
            a.remove(el)  # remove from array 1
        # el is in both arrays, skip
    return a  # done
Instead of "here's what needs to happen", try describing the requirements in terms of
"here's the required final condition". From that perspective it appears that the desired end-state is for array1 to contain exactly the same values as array2.
If that's the case, then why not the equivalent of this pseudocode (unless your environment has a clone or copy method)?
array1 = new int[array2.length]
for (int i in 0 .. array2.length - 1) {
array1[i] = array2[i]
}
If order, retention of duplicates, etc. are issues, then please update the question and we can try again.
Well, if the order doesn't matter, you already have your algorithm:
Integers in Array 2 but not in Array 1 should be added to Array 1.
Integers in both Arrays can be left alone.
Integers in Array 1 but not in Array 2 should be removed from Array 1.
If the order of the integers matters, what you want is a variation of the algorithms that determine the "difference" between two strings. The Levenshtein algorithm should suit you well.
However, I suspect you actually want the first part. In that case, what exactly is the question? How to find the integers in an array? Or ... what?
Your question says nothing about order, and if you examine your three requirements, you'll see that the postcondition says that Array2 is unchanged and Array1 now contains exactly the same set of integers that is in Array2. Unless there's some requirement on the order that you're not telling us about, you may as well just make a copy of Array2.