I wrote the following code:
val src = (0 until 1000000).toList()
val dest = ArrayList<Double>(src.size / 2 + 1)
for (i in src) {
    if (i % 2 == 0) dest.add(Math.sqrt(i.toDouble()))
}
IntelliJ (in my case Android Studio) asks me whether I want to replace the for loop with operations from the stdlib. This results in the following code:
val src = (0 until 1000000).toList()
val dest = ArrayList<Double>(src.size / 2 + 1)
src.filter { it % 2 == 0 }
    .mapTo(dest) { Math.sqrt(it.toDouble()) }
Now I must say, I like the changed code. I find it easier to write than for loops when I come across similar situations. However, upon reading what the filter function does, I realized that this code is a lot slower than the for loop. The filter function creates a new list containing only the elements from src that match the predicate. So the stdlib version of the code creates one more list and runs one more loop. Of course, for small lists this might not matter, but in general it does not sound like a good alternative. Especially if one chains more methods like this, you can get a lot of additional loops that could be avoided by writing a for loop.
My question is: what is considered good practice in Kotlin? Should I stick to for loops, or am I missing something and it does not work the way I think it does?
If you are concerned about performance, what you need is a Sequence. For example, your code above becomes:
val src = (0 until 1000000).toList()
val dest = ArrayList<Double>(src.size / 2 + 1)
src.asSequence()
    .filter { it % 2 == 0 }
    .mapTo(dest) { Math.sqrt(it.toDouble()) }
In the above code, filter returns another Sequence, which represents an intermediate step. Nothing is really created yet: no object or array creation (except a new Sequence wrapper). Only when mapTo, a terminal operation, is called are the elements actually processed and added to the resulting collection.
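A minimal sketch of that laziness, in case it helps. The helper name countPredicateCalls is made up for illustration; it simply counts how often the filter lambda has run before and after the terminal mapTo call.

```kotlin
// Hypothetical helper (not from the original answer): returns how many times the
// filter predicate has run (a) after only building the pipeline and
// (b) after the terminal mapTo call.
fun countPredicateCalls(src: List<Int>): Pair<Int, Int> {
    var calls = 0
    val dest = ArrayList<Double>()

    // Intermediate step: only wraps the source, nothing is evaluated yet.
    val evens = src.asSequence().filter { calls++; it % 2 == 0 }
    val beforeTerminal = calls

    // Terminal step: elements are pulled through filter one by one.
    evens.mapTo(dest) { Math.sqrt(it.toDouble()) }
    val afterTerminal = calls

    return beforeTerminal to afterTerminal
}
```

For listOf(1, 2, 3, 4) this returns (0, 4): the predicate is not touched until the terminal operation asks for elements.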
If you have learned Java 8 streams, you may find the above explanation somewhat familiar. In fact, Sequence is roughly the Kotlin equivalent of the Java 8 Stream. They share a similar purpose and similar performance characteristics. The main difference is that Sequence isn't designed to work with ForkJoinPool, which makes it a lot easier to implement.
When there are multiple steps involved or the collection may be large, it's advisable to use a Sequence instead of plain .filter {...}.mapTo {...}. I also suggest using the Sequence form instead of your imperative form because it's easier to understand. The imperative form may become complex, and thus hard to understand, when there are 5 or more steps involved in the data processing. If there is just one step, you don't need a Sequence, because it just creates garbage and gives you nothing useful.
You're missing something. :-)
In this particular case, you can use an IntProgression:
val progression = 0 until 1_000_000 step 2
You can then create your desired list of square roots in various ways:
// may make the list larger than necessary
// its internal array is copied each time the list grows beyond its capacity
// code is very straightforward
progression.map { Math.sqrt(it.toDouble()) }
// will make the list the exact size needed
// no copies are made
// code is more complicated
progression.mapTo(ArrayList(progression.last / 2 + 1)) { Math.sqrt(it.toDouble()) }
// will make the list the exact size needed
// a single intermediate list is made
// code is minimal and makes sense
progression.toList().map { Math.sqrt(it.toDouble()) }
My advice would be to choose whichever coding style you prefer. Kotlin is both an object-oriented and a functional language, meaning both of your approaches are valid.
Usually, functional constructs favor readability over performance; however, in some cases, procedural code will also be more readable. You should try to stick with one style as much as possible, but don't be afraid to switch some code if you feel like it's better suited to your constraints, either readability, performance, or both.
The converted code does not need the manual creation of the destination list, and can be simplified to:
val src = (0 until 1000000).toList()
val dest = src.filter { it % 2 == 0 }
    .map { Math.sqrt(it.toDouble()) }
And as mentioned in the excellent answer by @glee8e, you can use a sequence to do lazy evaluation. The simplified code using a sequence:
val src = (0 until 1000000).toList()
val dest = src.asSequence() // change to lazy
    .filter { it % 2 == 0 }
    .map { Math.sqrt(it.toDouble()) }
    .toList() // create the final list
Note that the toList() at the end changes the sequence back into a final list, which is the one copy made during the processing. You can omit that step to remain as a sequence.
It is important to highlight the comments by @hotkey saying that you should not always assume that another iteration or a copy of a list causes worse performance than lazy evaluation. @hotkey says:
Sometimes several loops, even if they copy the whole collection, show good performance because of good locality of reference. See: Kotlin's Iterable and Sequence look exactly same. Why are two types required?
And excerpted from that link:
... in most cases it has good locality of reference thus taking advantage of CPU cache, prediction, prefetching etc. so that even multiple copying of a collection still works good enough and performs better in simple cases with small collections.
@glee8e says that there are similarities between Kotlin sequences and Java 8 streams; for detailed comparisons see: What Java 8 Stream.collect equivalents are available in the standard Kotlin library?
I start with a list of integers from 1 to 1000 in listOfRandoms.
I would like to left join on random from the createDatabase list.
I am currently using a find{} call within a loop to do this but feel like this is too heavy. Is there not a better (quicker) way to achieve the same result?
Pseudo code
data class DatabaseRow(
    val refKey: Int,
    val random: Int
)
fun main() {
    val createDatabase = (1..1000).map { i -> DatabaseRow(i, Random()) }
    val listOfRandoms = (1..1000).map { j ->
        val lookup = createDatabase.find { it.refKey == j }
        lookup.random
    }
}
As mentioned in comments, the question seems to be mixing up database and programming ideas, which isn't helping.
And it's not entirely clear which parts of the code are needed, and which can be replaced. I'm assuming that you already have the createDatabase list, but that listOfRandoms is open to improvement.
The ‘pseudo’ code compiles fine except that:
You don't give an import for Random(), but none of the likely ones return an Int. I'm going to assume that should be kotlin.random.Random.nextInt().
And because lookup is nullable, you can't simply call lookup.random; a quick fix is lookup!!.random, but it would be safer to handle the null case properly with e.g. lookup?.random ?: -1. (That's irrelevant, though, given the assumption above.)
I think the general solution is to create a Map. This can be done very easily from createDatabase, by calling associate():
val map = createDatabase.associate{ it.refKey to it.random }
That should take time roughly proportional to the size of the list. Looking up values in the map is then very efficient (approx. constant time):
map[someKey]
In this case, that takes rather more memory than needed, because both keys and values are integers and will be boxed (stored as separate objects on the heap). Also, most maps use a hash table, which takes some memory.
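As a rough sketch of the Map approach applied to the whole lookup (the function name leftJoinRandoms is made up, and DatabaseRow is repeated only so the snippet stands alone):

```kotlin
data class DatabaseRow(val refKey: Int, val random: Int)

// Hypothetical helper: builds the index once (one pass over the rows), then answers
// each lookup in roughly constant time. Keys with no matching row yield null,
// which is what a left join would give you.
fun leftJoinRandoms(rows: List<DatabaseRow>, keys: Iterable<Int>): List<Int?> {
    val randomByKey: Map<Int, Int> = rows.associate { it.refKey to it.random }
    return keys.map { key -> randomByKey[key] }
}
```

Compared with calling find{} inside the loop, this does one pass to build the map and one cheap lookup per key, instead of scanning the list for every key.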
Since the key is (according to comments) “an ascending list starting from a random number, like 18123..19123”, in this particular case it can instead be stored in an IntArray without any boxing. As you say, array indexes start from 0, so using the key directly would need a huge array and use only the last few cells. But if you know the start key, you could simply subtract it from the key each time to get the array index.
Creating such an array would be a bit more complex, for example:
val minKey = createDatabase.minOf{ it.refKey }
val maxKey = createDatabase.maxOf{ it.refKey }
val array = IntArray(maxKey - minKey + 1)
for (row in createDatabase)
    array[row.refKey - minKey] = row.random
You'd then access values with:
array[someKey - minKey]
…which is also constant-time.
Some caveats with this approach:
If createDatabase is empty, then minOf() will throw a NoSuchElementException.
If it has ‘holes’, omitting some keys inside that range, then the array will hold its default value of 0 for them. (You can change that by using the alternative IntArray constructor which also takes a lambda giving the initial value.)
Trying to look up a value outside that range will give an ArrayIndexOutOfBoundsException.
Whether it's worth the extra complexity to save a bit of memory will depend on things like the size of the ‘database’, and how long it's in memory for; I wouldn't add that complexity unless you have good reason to think memory usage will be an issue.
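If you do go the array route, the caveats above could be wrapped up along these lines. This is only a sketch: the class name OffsetIntLookup, the (refKey, random) pair input and the -1 sentinel are all assumptions made for illustration.

```kotlin
// Hypothetical wrapper around the offset-array idea; `entries` holds (refKey, random) pairs.
class OffsetIntLookup(entries: List<Pair<Int, Int>>, private val missing: Int = -1) {
    // minOfOrNull/maxOfOrNull avoid the NoSuchElementException on an empty list.
    private val minKey: Int = entries.minOfOrNull { it.first } ?: 0
    private val values: IntArray = run {
        val maxKey = entries.maxOfOrNull { it.first }
        if (maxKey == null) IntArray(0)
        else IntArray(maxKey - minKey + 1) { missing } // holes keep the sentinel
    }

    init {
        for ((key, value) in entries) values[key - minKey] = value
    }

    // Out-of-range keys return the sentinel instead of throwing.
    operator fun get(key: Int): Int {
        val index = key - minKey
        return if (index in values.indices) values[index] else missing
    }
}
```

Usage would look like val lookup = OffsetIntLookup(listOf(18123 to 7, 18125 to 9)) followed by lookup[18124], which gives the sentinel because that key is a hole.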
Currently, I am looking into Kotlin and have a question about Sequences vs. Collections.
I read a blog post about this topic, and there you can find these code snippets:
List implementation:
val list = generateSequence(1) { it + 1 }
    .take(50_000_000)
    .toList()
measure {
    list
        .filter { it % 3 == 0 }
        .average()
}
// 8644 ms
Sequence implementation:
val sequence = generateSequence(1) { it + 1 }
    .take(50_000_000)
measure {
    sequence
        .filter { it % 3 == 0 }
        .average()
}
// 822 ms
The point here is that the Sequence implementation is about 10x faster.
However, I do not really understand WHY that is. I know that with a Sequence you do "lazy evaluation", but I cannot find any reason why that helps reduce the processing in this example.
However, here I know why a Sequence is generally faster:
val result = sequenceOf("a", "b", "c")
    .map {
        println("map: $it")
        it.toUpperCase()
    }
    .any {
        println("any: $it")
        it.startsWith("B")
    }
Because with a Sequence you process the data "vertically", when the first element starts with "B", you don't have to map for the rest of the elements. It makes sense here.
So, why is it also faster in the first example?
Let's look at what those two implementations are actually doing:
The List implementation first creates a List in memory with 50 million elements. This will take a bare minimum of 200MB, since an integer takes 4 bytes.
(In fact, it's probably far more than that. As Alexey Romanov pointed out, since it's a generic List implementation and not an IntList, it won't be storing the integers directly, but will be ‘boxing’ them, i.e. storing references to Int objects. On a typical 64-bit JVM, each reference takes 4 or 8 bytes (depending on whether compressed references are in use), and each boxed Int typically takes 16, giving 1GB or more. Also, depending how the List gets created, it might start with a small array and keep creating larger and larger ones as the list grows, copying all the values across each time, using more memory still.)
Then it has to read all the values back from the list, filter them, and create another list in memory.
Finally, it has to read all those values back in again, to calculate the average.
The Sequence implementation, on the other hand, doesn't have to store anything! It simply generates the values in order, and as it does each one it checks whether it's divisible by 3 and if so includes it in the average.
(That's pretty much how you'd do it if you were implementing it ‘by hand’.)
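For comparison, a hand-written version might look roughly like this (the function name and the upTo parameter are made up for the sketch):

```kotlin
// Hand-written equivalent of the Sequence pipeline: one pass, no list, no boxing.
fun averageOfMultiplesOf3(upTo: Int): Double {
    var sum = 0L // Long, so the running total cannot overflow an Int
    var count = 0
    for (i in 1..upTo) {
        if (i % 3 == 0) {
            sum += i
            count++
        }
    }
    // Mirror average() on an empty sequence, which yields NaN.
    return if (count == 0) Double.NaN else sum.toDouble() / count
}
```

Calling averageOfMultiplesOf3(50_000_000) would do the same work as the Sequence version from the question, just without the Sequence wrappers.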
You can see that in addition to the divisibility checking and average calculation, the List implementation is doing a massive amount of memory access, which will take a lot of time. That's the main reason it's far slower than the Sequence version, which doesn't!
Seeing this, you might ask why we don't use Sequences everywhere… But this is a fairly extreme example. Setting up and then iterating the Sequence has some overhead of its own, and for smallish lists that can outweigh the memory overhead. So Sequences only have a clear advantage in cases when the lists are very large, are processed strictly in order, there are several intermediate steps, and/or many items are filtered out along the way (especially if the Sequence is infinite!).
In my experience, those conditions don't occur very often. But this question shows how important it is to recognise them when they do!
Leveraging lazy evaluation avoids creating intermediate objects that are irrelevant from the point of view of the end goal.
Also, the benchmarking method used in the mentioned article is not super accurate. Try to repeat the experiment with JMH.
Initial code produces a list containing 50_000_000 objects:
val list = generateSequence(1) { it + 1 }
    .take(50_000_000)
    .toList()
then iterates through it and creates another list containing a subset of its elements:
.filter { it % 3 == 0 }
... and then proceeds with calculating the average:
.average()
Using sequences allows you to avoid doing all those intermediate steps. The below code doesn't produce 50_000_000 elements, it's just a representation of that 1...50_000_000 sequence:
val sequence = generateSequence(1) { it + 1 }
    .take(50_000_000)
Adding a filter to it doesn't trigger the calculation either, but derives a new sequence from the existing one (3, 6, 9...):
.filter { it % 3 == 0 }
and eventually, a terminal operation is called that triggers the evaluation of the sequence and the actual calculation:
.average()
Some relevant reading:
Kotlin: Beware of Java Stream API Habits
Kotlin Collections API Performance Antipatterns
Say I have a list of size 30k elements, and I would like to perform an operation on all possible pairs within a list. So I had:
list.asSequence().flatMap { i ->
    list.asSequence().map { j -> /* perform operation here */ }
}
Question 1:
Is there anything that I can use as an alternative? (Such as applicative functors).
I also noticed that this flatMap-map operation is significantly slower than the imperative loop version. (perhaps due to closures?)
for (i in list) {
    for (j in list) {
    }
}
Question 2: Is there a way to improve the performance of the flatMap/map version?
Some alternatives with performance impacts:
com.google.common.collect.Sets.cartesianProduct(java.util.Set...): "Returns every possible list that can be formed by choosing one element from each of the given sets in order; the 'n-ary Cartesian product' of the sets."
This requires your list elements to be unique. If they're not then you'd have to wrap each element in a unique object so that they can all be added to the input set.
In my testing, however, I've found it to be slower than the flatMap/map solution. :-(
forEach/forEach: As you simply want to perform an operation on each pair then you don't actually need to use flatMap or map to transform the list so you can use forEach/forEach instead:
list.forEach { i ->
    list.forEach { j -> /* perform operation here */ }
}
In my testing I've found this to be slightly faster than the for/for solution. :-)
If you do need to transform the list, then your flatMap/map solution appears to be the best option.
To answer question 2: we're considering adding a flatMap overload which doesn't create a closure for each element in the outer collection/sequence: https://youtrack.jetbrains.com/issue/KT-8602
But if you want to perform some side effects on each pair, rather than transforming the sequence, I'd advise sticking with for loops or inlined forEach lambdas, which are effectively the same.
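If the nested loops get repetitive, they could be wrapped in a small inline helper. This is just a sketch, and forEachPair is a made-up name rather than something from the standard library:

```kotlin
// Hypothetical helper. Because it is `inline`, the lambda body is copied into the
// two nested loops at the call site, so it compiles down to plain for loops.
inline fun <T> List<T>.forEachPair(action: (T, T) -> Unit) {
    for (i in this) {
        for (j in this) {
            action(i, j)
        }
    }
}
```

You would call it as list.forEachPair { i, j -> /* perform operation here */ }. Whether it is measurably faster than the flatMap/map version on your data is something to benchmark rather than assume.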
Let's say I'd like to iterate through a generic iterator in reverse, without knowing anything about the internals of the iterator, without cheating via untyped magic, and assuming this could be any type of iterable that serves an iterator. Can we optimise the reversal of an iterator at runtime, or even via macros?
Forwards
var a = [1, 2, 3, 4].iterator();
// Actual iteration below
for (i in a) {
    trace(i);
}
Backwards
var a = [1, 2, 3, 4].iterator();
// Actual reverse iteration below
var s = [];
for (i in a) {
    s.push(i);
}
s.reverse();
for (i in s) {
    trace(i);
}
I would assume that there has to be a simpler way, or at least a faster way, of doing this. We can't know the size up front because the Iterator class doesn't carry one, so we can't invert the push onto the temp array. But we can remove the reverse, because we do know the size of the temp array.
var a = [1,2,3,4].iterator();
// Actual reverse iteration below
var s = [];
for (i in a) {
    s.push(i);
}
var total = s.length;
var totalMinusOne = total - 1;
for (i in 0...total) {
    trace(s[totalMinusOne - i]);
}
Are there any more optimisations that could be used to remove the need for the array?
It bugs me that you have to duplicate the list, though... that's nasty. I mean, the data structure would ALREADY be an array, if that was the right data format for it. A better thing (less memory fragmentation and reallocation) than an Array (the "[]") to copy it into might be a linked List or a Hash.
But if we're using arrays, then Array Comprehensions (http://haxe.org/manual/comprehension) are what we should be using, at least in Haxe 3 or better:
var s = [for (i in a) i];
Ideally, at least for large iterators that are accessed multiple times, s should be cached.
To read the data back out, you could instead do something a little less wordy, but quite nasty, like:
for (i in 1-s.length ... 1) {
    trace(s[-i]);
}
But that's not very readable and if you're after speed, then creating a whole new iterator just to loop over an array is clunky anyhow. Instead I'd prefer the slightly longer, but cleaner, probably-faster, and probably-less-memory:
var i = s.length;
while (--i >= 0) {
    trace(s[i]);
}
First of all, I agree with Dewi Morgan that duplicating the output generated by an iterator in order to reverse it somewhat defeats its purpose (or at least some of its benefits). Sometimes it's okay, though.
Now, about a technical answer:
By definition a basic iterator in Haxe can only compute the next iteration.
On the why iterators are one-sided by default, here's what we can notice:
if all iterators could run backwards and forwards, the Iterator classes would take more time to write.
not all iterators run on collections or series of numbers.
E.g. 1: an iterator running on the standard input.
E.g. 2: an iterator running on a parabolic or more complicated trajectory for a ball.
E.g. 3: slightly different but think about the performance problems running an iterator on a very large single-linked list (eg the class List). Some iterators can be interrupted in the middle of the iteration (Lambda.has() and Lambda.indexOf() for instance return as soon as there is a match, so you normally don't want to think of what's iterated as a collection but more as an interruptible series or process iterated step by step).
While this doesn't mean you shouldn't define two-way iterators if you need them (I've never done it in Haxe but it doesn't seem impossible), in absolute terms having two-way iterators isn't that natural, and forcing Iterators to be like that would complicate coding one.
An intermediate and more flexible solution is to simply have a ReverseXxIter where you need one, for instance ReverseIntIter, or Array.reverseIter() (with using a custom ArrayExt class). So it's left to every programmer to write their own answers, and I think that's a good balance: while it takes more time and frustration in the beginning (everybody probably had the same kind of questions), you end up knowing the language better, and in the end there are just benefits for you.
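To make the ReverseXxIter idea concrete, here is a rough sketch of the same pattern written in Kotlin rather than Haxe. The names ReverseListIterator and reverseIterator are made up, and it assumes the underlying collection is an indexed list, which is exactly the case where reversing without a copy is possible.

```kotlin
// Hypothetical analogue of a ReverseXxIter: walks an indexed list from the last
// element to the first without copying it.
class ReverseListIterator<T>(private val list: List<T>) : Iterator<T> {
    private var index = list.size

    override fun hasNext(): Boolean = index > 0

    override fun next(): T {
        if (index <= 0) throw NoSuchElementException()
        return list[--index]
    }
}

// Extension in the spirit of Array.reverseIter().
fun <T> List<T>.reverseIterator(): Iterator<T> = ReverseListIterator(this)
```

With this in place, for (x in listOf(1, 2, 3, 4).reverseIterator()) visits 4, 3, 2, 1. For a source that is only a forward iterator (standard input, say), there is still no way around buffering the elements first.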
Complementing the post of Dewi Morgan, you can use for(let i = a.length; --i >= 0;) i; if you wish to simplify the while() method. If you really need the index values, I think for(let i=a.length, k=keys(a); --i in k;) a[k[i]]; is the best you can do while keeping the performance. There is also for(let i of keys(a).reverse()) a[i]; which is cleaner to write, but costs one extra pass over the keys (an additional n iterations) because of .reverse().
This question is coded in pseudo-PHP, but I really don't mind what language I get answers in (except for Ruby :-P), as this is purely hypothetical. In fact, PHP is quite possibly the worst language to be doing this type of logic in. Unfortunately, I have never done this before, so I can't provide a real-world example. Therefore, hypothetical answers are completely acceptable.
Basically, I have lots of objects performing a task. For this example, let's say each object is a class that downloads a file from the Internet. Each object will be downloading a different file, and the downloads are run in parallel. Obviously, some objects may finish downloading before others. The actual grabbing of data may run in threads, but that is not relevant to this question.
So we can define the object as such:
class DownloaderObject {
    var $url = '';
    var $downloading = false;
    function DownloaderObject($v) { // constructor
        $this->url = $v;
        start_downloading_in_the_background(url=$this->url, callback=$this->finished);
        $this->downloading = true;
    }
    function finished() {
        save_the_data_somewhere();
        $this->downloading = false;
        $this->destroy(); // actually destroys the object
    }
}
Okay, so we have lots of these objects running:
$download1 = new DownloaderObject('http://somesite.com/latest_windows.iso');
$download2 = new DownloaderObject('http://somesite.com/kitchen_sink.iso');
$download3 = new DownloaderObject('http://somesite.com/heroes_part_1.rar');
And we can store them in an array:
$downloads = array($download1, $download2, $download3);
So we have an array full of the downloads:
array(
    1 => $download1,
    2 => $download2,
    3 => $download3
)
And we can iterate through them like this:
print('Here are the downloads that are running:');
foreach ($downloads as $d) {
    print($d->url . "\n");
}
Okay, now suppose download 2 finishes, and the object is destroyed. Now we should have two objects in the array:
array(
    1 => $download1,
    3 => $download3
)
But there is a hole in the array! Key #2 is being unused. Also, if I wanted to start a new download, it is unclear where to insert the download into the array. The following could work:
$i = 0;
while ($i < count($downloads) - 1) {
    if (!is_object($downloads[$i])) {
        $downloads[$i] = new DownloaderObject('http://somesite.com/doctorwho.iso');
        break;
    }
    $i++;
}
However, that is terribly inefficient (and while $i++ loops are nooby). So, another approach is to keep a counter.
function add_download($url) {
    global $downloads;
    static $download_counter;
    $download_counter++;
    $downloads[$download_counter] = new DownloaderObject($url);
}
That would work, but we still get holes in the array:
array(
    1 => DownloaderObject,
    3 => DownloaderObject,
    7 => DownloaderObject,
    13 => DownloaderObject
)
That's ugly. However, is that acceptable? Should the array be "defragmented", i.e. the keys rearranged to eliminate blank spaces?
Or is there another programmatic structure I should be aware of? I want a structure that I can add stuff to, remove stuff from, refer to keys in a variable, iterate through, etc., that is not an array. Does such a thing exist?
I have been coding for years, but this question has bugged me for very many of those years, and I am still not aware of an answer. This may be obvious to some programmers, but is extremely non-trivial to me.
The problem with PHP's "associative arrays" is that they aren't arrays at all, they're Hashmaps. Having holes there is perfectly fine. You might look at a linked list, as well, but a Hashmap seems perfectly suited to what you're doing.
What is maintaining your array of downloaders?
If you encapsulate the array in a class that is notified by the downloader when it is finished you won't have to worry about stale references to destroyed objects.
This class can manage the organisation of the array internally and present an interface to its users that looks more like an iterator than an array.
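A rough sketch of such a wrapper, written in Kotlin for brevity. The class name DownloadRegistry and its methods are hypothetical, and it stores plain URL strings where real code would hold downloader objects:

```kotlin
// Hypothetical registry: owns the collection and is told when a download finishes.
class DownloadRegistry : Iterable<String> {
    private val active = LinkedHashMap<Int, String>() // id -> url, in insertion order
    private var nextId = 0

    // Registers a download and returns the id it can later be referred to by.
    fun add(url: String): Int {
        val id = ++nextId
        active[id] = url
        return id
    }

    // Called by the downloader when it is done; gaps in the ids are harmless.
    fun finished(id: Int): Boolean = active.remove(id) != null

    // Callers iterate over the URLs still in progress and never see the gaps.
    override fun iterator(): Iterator<String> = active.values.toList().iterator()
}
```

Because the registry hands out ids and hides the underlying map, users of the class never have to think about holes, and a finished download leaves no stale entry behind.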
"$i++ loops" are nooby, but only because the code becomes much clearer if you use a for loop:
$i = 0;
while ($i < count($downloads) - 1) {
    if (!is_object($downloads[$i])) {
        $downloads[$i] = new DownloaderObject('http://somesite.com/doctorwho.iso');
        break;
    }
    $i++;
}
Becomes
for($i=0;$i<count($downloads)-1;++$i){
    if (!is_object($downloads[$i])) {
        $downloads[$i] = new DownloaderObject('http://somesite.com/doctorwho.iso');
        break;
    }
}
Coming from a C# perspective, my first thought would be that you need a different data structure to an array - you need to think about the problem using a higher-level data structure. Perhaps a Queue, List or Stack would suit your purposes better?
The short answer to your question is that in PHP arrays are used for almost everything and you rarely end up using other data structures. Having holes in your array indexes isn't anything to worry about. In other programming languages such as Java you have a more diverse set of data structures to choose from: Sets, Hashes, Lists, Vectors and more. It seems that you would also need a closer interaction between the array and the DownloaderObject class. Even though the object $download2 has "destroyed" itself, the array will still hold a reference to that object.
Some good answers to this question, which reflect the relative experience of the answerers. Thank you very much; they proved very educational.
I posted this question nearly three years ago. In hindsight, I can see my knowledge in that area was severely lacking. The biggest problem I had was that I was coming from a PHP perspective, which does not have the ability to arbitrarily pop elements. Something the other answers to this question helped me to discover was that a fundamentally superior model is 'linked lists'.
For C, I wrote a blog post about linked lists which contains code samples (too numerous to post here) but would neatly fill the original question's use case.
For PHP, a linked list implementation appears here, which I have never tried, but imagine it would also be the right way to deal with the above.
Interestingly, Python lists contain the pop() method which, unlike PHP's array_pop(), can pop arbitrary elements and keep everything in order. For example:
>>> x = ['baa', 'ram', 'ewe'] # our starting point
>>> x[1] # making sure element 1 is 'ram'
'ram'
>>> x.pop(1) # let's arbitrarily pop an element in the middle
'ram'
>>> x # the one we popped ('ram') is now gone
['baa', 'ewe']
>>> x[1] # and there are no holes: item 2 has become item 1
'ewe'