I've been reading a fair bit about the performance of using LINQ rather than using a for each loop and from what I understand using a LINQ query would be a little bit slower but generally worth it for convenience and expressiveness. However I am a bit confused about how much slower it is if you were to use the results of the query in a for loop.
Let's say that I have a set called 'Locations' and a set of objects called 'Items'. Each 'item' can only belong to one 'location'. I want to link items that are under the same location to each other. If I were to do this using a normal 'For Each' loop it would be something like this:
For Each it As Item In Items
    If it.Location.Equals(Me.Location) Then
        Me.LinkedItems.Add(it)
    End If
Next
However, if I were to use LINQ it would instead be this:
For Each it As Item In Items.Where(Function(i) i.Location.Equals(Me.Location))
    Me.LinkedItems.Add(it)
Next
Now my question is, is the second (LINQ) option going to loop once through the entire 'Items' set to complete the query, then loop through the results to add them to the list, resulting in essentially two loops, or will it do the one loop like the first (For Each) option? If the answer is the former, I assume then that it would be silly to use LINQ in this situation.
It will do one loop - it's lazily evaluated.
However, you may be able to do better than this. What's the type of LinkedItems? If it has an appropriate AddRange method, you should be able to do:
Me.LinkedItems.AddRange(Items.Where(Function(i) i.Location.equals(Me.Location)))
More on lazy evaluation
Basically Where maintains an iterator, and only finds the next matching item when you ask for it. In C#, the implementation would be something like:
// Error handling omitted
public static IEnumerable<T> Where<T>(this IEnumerable<T> source,
                                      Func<T, bool> predicate)
{
    foreach (T element in source)
    {
        if (predicate(element))
        {
            yield return element;
        }
    }
}
It's the use of yield return here which would make it lazily evaluated. If you're not familiar with C# iterator blocks, you might want to look at these articles which explain them in more detail.
Of course Where could have been implemented "manually" instead of using an iterator block, but the above implementation is sufficient to show the lazy evaluation.
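To make the single pass visible, here is a small sketch using the built-in Enumerable.Where. The class name LazyWhereDemo, the Trace helper and the sample numbers are made up for illustration; the log simply records when the predicate runs and when the loop body runs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LazyWhereDemo
{
    // Hypothetical helper: records the order of events so the interleaving is visible.
    public static List<string> Trace(IEnumerable<int> items)
    {
        var log = new List<string>();

        // The predicate logs every time Where tests an element.
        var query = items.Where(i =>
        {
            log.Add($"test {i}");
            return i % 2 == 0;
        });

        // Nothing has been tested yet: the query only runs when it is enumerated.
        log.Add("query built");

        foreach (int match in query)
        {
            log.Add($"use {match}");
        }

        return log;
    }

    public static void Main()
    {
        foreach (string line in Trace(new[] { 1, 2, 3, 4 }))
        {
            Console.WriteLine(line);
        }
    }
}
```

The "test" and "use" entries come out interleaved (each match is used before the next element is tested), which is what you would expect from one loop rather than two.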
It will run the query once, since you are iterating over the result of Items.Where. In your case that is a sequence already filtered by the condition you want, so you should really go with LINQ.
I have a list of Longs in Kotlin and I want to make them strings for UI purposes with maybe some prefix or altered in some way. For example, adding "$" in the front or the word "dollars" at the end.
I know I can simply iterate over them all like:
val myNewStrings = ArrayList<String>()
longValues.forEach { myNewStrings.add("$it dollars") }
I guess I'm just getting nitpicky, but I feel like there is a way to inline this or change the original long list without creating a new string list?
EDIT/UPDATE: Sorry for the initial confusion of my terms. I meant writing the code in one line and not inlining a function. I knew it was possible, but couldn't remember kotlin's map function feature at the time of writing. Thank you all for the useful information though. I learned a lot, thanks.
You are looking for map. It takes a lambda and creates a list from the lambda's results:
val myNewStrings = longValues.map { "$it dollars" }
map is an extension function with 2 generic types: the first is the type being iterated over and the second is the type that will be returned. The lambda we pass as an argument is actually transform: (T) -> R, so you can see it has to be a function that receives a T, which is the source type, and returns an R, which is the lambda result. Lambdas don't need an explicit return because the last expression is the return value by default.
You can use the map function on List. It creates a new list by applying a function to every element.
Like this:
val myNewStrings = longValues.map { "$it dollars" }
In Kotlin inline is a keyword that refers to the compiler substituting a function call with the contents of the function directly. I don't think that's what you're asking about here. Maybe you meant you want to write the code on one line.
You might want to read over the Collections documentation, specifically the Mapping section.
The mapping transformation creates a collection from the results of a function on the elements of another collection. The basic mapping function is map(). It applies the given lambda function to each subsequent element and returns the list of the lambda results. The order of results is the same as the original order of elements.
val numbers = setOf(1, 2, 3)
println(numbers.map { it * 3 })
For your example, this would look as the others said:
val myNewStrings = longValues.map { "$it dollars" }
I feel like there is a way to inline this or change the original long list without creating a new string list?
No. You have Longs, and you want Strings. The only way is to create new Strings. You could avoid creating a new List by changing the type of the original list from List<Long> to MutableList<Any> and editing it in place, but that would be overkill and make the code overly complex, harder to follow, and more error-prone.
Like people have said, unless there's a performance issue here (like a billion strings where you're only using a handful) just creating the list you want is probably the way to go. You have a few options though!
Sequences are lazily evaluated: when there's a long chain of operations they complete the chain on each item in turn, instead of creating an intermediate full list for every operation in the chain. So that can mean less memory use, and more efficiency if you only need certain items, or you want to stop early. They have overhead though, so you need to be sure it's worth it. For your use-case (turning a list into another list) there are no intermediate lists to avoid, and I'm guessing you're using the whole thing. Probably better to just make the String list, once, and then use it?
Your other option is to make a function that takes a Long and makes a String - whatever function you're passing to map, basically, except use it when you need it. If you have a very large number of Longs and you really don't want every possible String version in memory, just generate them whenever you display them. You could make it an extension function or property if you like, so you can just go
fun Long.display() = "$this dollars"
val Long.dollaridoos: String get() = "$this dollars"
print(number.display())
print(number.dollaridoos)
or make a wrapper object holding your list and giving access to a stringified version of the values. Whatever's your jam
Also the map approach is more efficient than creating an ArrayList and adding to it, because it can allocate a list with the correct capacity from the get-go - arbitrarily adding to an unsized list will keep growing it when it gets too big, then it has to copy to another (larger) array... until that one fills up, then it happens again...
Hi, I've got a list of 1330 objects and would like to apply a method to each of them and obtain a set as the result.
val result = listOf1330
    .asSequence()
    .map {
        someMethod(it)
    }
val resultSet = result.toSet()
It works fine without toSet, but with it the execution time is about 10 times longer.
I used a sequence to make it faster, and it is, but as a result I need a collection without duplicates (a set).
Simply: what is the most efficient way to convert a sequence to a set?
val result = listOf1330.mapTo(HashSet()) { someMethod(it) }
It makes less sense to use streams or sequences to implement the transformation, since you need all elements from the collection, not just a few. The mapTo (and map) functions are inline in Kotlin, which means their code is substituted into the call site, so no lambda object is created and invoked for each element. We use mapTo to avoid the second copy of the collection that the toSet() function would make.
.parallelStream() may add more performance if you would like to run the computation in several threads. It is still a good idea to measure how well the load is balanced between threads. The performance may depend on the implementation class of the collection on which you call it.
If your someObject has a slow implementation of equals() or hashCode(), or gives the same hash code for many objects, then that could account for the delay, and you may be able to improve it.
Otherwise, if the objects are big, the delay may be mostly due to the amount of memory that must be accessed to store them all; if so, that's the price you'll have to pay if you want a set with all those objects in memory.
Sequence.toSet() uses a LinkedHashSet. You could try providing another Set instance, using e.g. toCollection(HashSet()), to see if that's any faster. (You wouldn't get the same iteration order, though.)
I agree with gidds' answer on HashSet and LinkedHashSet performance.
LinkedHashSet is more expensive for insertions than HashSet. However, in the above use case, I think we can leverage parallelStream to improve the performance. Under the hood, Kotlin uses the Java parallelStream.
val result: Set<String> = listOf("sdgds", "fdgdfsg", "dsfgsdfg")
    .parallelStream()
    .map {
        someMethod(it)
    }.collect(Collectors.toSet())
Collectors.toSet() uses a HashSet, so we should be OK from an insertion-performance perspective.
Use distinct or distinctBy.
val result = sequenceOf("a", "b", "a", "c").distinct()
// -> "a", "b", "c"
// for more complex cases, use a key selector function
val resultByName = getMyObjectsSequence().distinctBy { it.name }
This approach lets you keep using a sequence without involving explicit Iterables (List, Set, etc.).
Nevertheless, there is no magic: "distinct" still uses a HashSet under the hood, and for a really huge sequence it may cause significant memory usage. Keep that in mind when applying this function.
In C# if I have the following object:
IEnumerable<Product> products;
and if I want to get how many elements it contains I use:
int productCount = products.Count();
but it looks like there is no such method in VB.NET. Does anybody know how to achieve the same result in VB.NET?
Count is available in VB.NET:
Dim x As New List(Of String)
Dim count As Integer
x.Add("Item 1")
x.Add("Item 2")
count = x.Count
http://msdn.microsoft.com/en-us/library/bb535181.aspx#Y0
In later versions of .NET, there is an extension method called Count() associated with IEnumerable<T>, which will use the Count property of ICollection<T> (which IList<T> inherits) or of the non-generic ICollection if the underlying collection implements either of those, and will otherwise count the items by iterating over them.
An important caveat not always considered with this: while an IEnumerable<DerivedType> may generally be substituted for an IEnumerable<BaseType>, a type which implements IList<DerivedType> but does not implement ICollection may be efficiently counted when used as an IEnumerable<DerivedType>, but not when cast to IEnumerable<BaseType>. Even though the class would support an IList<DerivedType>.Count property which would return the correct result, the system wouldn't look for that--it would look for ICollection<BaseType> instead, which would not be implemented.
In general, IEnumerable won't have a Count unless the underlying collection supports it (e.g. List).
Think about what needs to happen for a generic IEnumerable to implement a Count method. Since the IEnumerable only executes when data is requested, in order to perform a Count, it needs to iterate through till the end keeping track of how many elements it has found.
Generally, this iteration will come to an end, but you can set up a query that loops forever. Count is either very costly time-wise or dangerous with IEnumerable.
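As a rough illustration of that cost, the sketch below counts how many elements a lazy sequence has to produce before Count() can answer. The names CountDemo, Numbers, Forever and CountLazily are invented for this example, and it runs against plain in-memory LINQ only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CountDemo
{
    // Hypothetical lazy source: yields 'count' numbers and reports each one it produces.
    public static IEnumerable<int> Numbers(int count, Action onYield)
    {
        for (int i = 0; i < count; i++)
        {
            onYield();
            yield return i;
        }
    }

    // Hypothetical endless source: calling Count() directly on this would never return.
    public static IEnumerable<int> Forever()
    {
        int i = 0;
        while (true)
        {
            yield return i++;
        }
    }

    // Returns the count together with how many elements had to be produced to get it.
    public static (int Count, int Steps) CountLazily(int n)
    {
        int steps = 0;
        int count = Numbers(n, () => steps++).Count();
        return (count, steps);
    }

    public static void Main()
    {
        var list = new List<string> { "Item 1", "Item 2" };
        Console.WriteLine(list.Count);              // property on the list, no iteration needed

        var (count, steps) = CountLazily(1000);
        Console.WriteLine($"{count} items counted, {steps} items produced");

        // Bounding the sequence first keeps Count() finite.
        Console.WriteLine(Forever().Take(3).Count());
    }
}
```

For a lazy sequence the number of items produced equals the count, so the whole sequence is walked just to get a number; for the endless source, only the Take(3) bound makes counting safe.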
We are implementing some EF data repositories, and we have some queries which would include TOP 1.
I have read many posts suggesting to use .Take(1).
The code I'm reviewing uses .First().
I understand that both of these produce the same result for the object assignment, but do they both actually resolve to the same query? When the DB is queried, will it actually be with TOP 1 for both requests? Or will they execute the query in full into the enumerable, then simply take the first entry in the collection?
Furthermore, if we used .FirstOrDefault() then why should we expect any different behavior? I know that when using an IEnumerable, calling .First() on an empty collection will throw, but if this is actually only changing the query to include TOP 1 then I should expect absolutely no functional difference between .First() and .FirstOrDefault().... right?
Alternatively, is there some better method than these Enumerable extensions for making the query execute TOP 1?
From LINQPad:
C#:
age_Centers.Select(c => c.Id).First();
age_Centers.Select(c => c.Id).FirstOrDefault();
age_Centers.Select(c => c.Id).Take(1).Dump();
SQL:
SELECT TOP (1) [t0].[Id]
FROM [age_Centers] AS [t0]
GO
SELECT TOP (1) [t0].[Id]
FROM [age_Centers] AS [t0]
GO
SELECT TOP (1) [t0].[Id]
FROM [age_Centers] AS [t0]
*Note that Take(1) enumerates and returns an IQueryable.
Redirect the DataContext Log property to Console.Out or a TextFile and see what query each option produces.
How .First() works:
If the collection is of type IList, then the first element is accessed by index position which is different depending on the collection implementation. Otherwise, an iterator returns the first element.
And .Take(int count) always iterates.
If there's any gain, it happens if the collection implements IList and the speed to access the first element by index is higher than that of returning an iterator. I don't believe it will be significant. ;)
Sources:
http://www.hookedonlinq.com/FirstOperator.ashx
http://www.hookedonlinq.com/TakeOperator.ashx
First will query with Take 1, so there is no difference in the query. Calling FirstOrDefault is a one-step statement, whereas Take returns an IEnumerable, so you will need to call First anyway.
First will throw an exception if there are no results, so FirstOrDefault is always preferred.
And of course the people who wrote the EF query converter were smart enough to issue Take 1 instead of executing the entire result set and returning the first item.
You can verify this using SQL Profiler.
First() operates on a collection of any number of objects and returns the first object. Take(1) operates on a collection of any number of objects and returns a collection containing the first object.
You can also use Single
Single() operates on a collection of exactly one object and simply returns the object.
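The differences between these operators are easiest to see on empty and multi-element input. The following is only a sketch against in-memory lists (LINQ to Objects), so it says nothing about the SQL that EF generates; FirstTakeSingleDemo and Describe are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FirstTakeSingleDemo
{
    // Hypothetical helper: summarizes how each operator behaves for the given input.
    public static string Describe(List<int> source)
    {
        // Take(1) never throws: it yields a sequence with zero or one element.
        int taken = source.Take(1).Count();

        // FirstOrDefault returns default(int), which is 0, when the sequence is empty.
        int firstOrDefault = source.FirstOrDefault();

        // First throws InvalidOperationException when the sequence is empty.
        string first;
        try { first = source.First().ToString(); }
        catch (InvalidOperationException) { first = "throws"; }

        // Single throws unless the sequence contains exactly one element.
        string single;
        try { single = source.Single().ToString(); }
        catch (InvalidOperationException) { single = "throws"; }

        return $"Take(1).Count()={taken}, FirstOrDefault()={firstOrDefault}, " +
               $"First()={first}, Single()={single}";
    }

    public static void Main()
    {
        Console.WriteLine(Describe(new List<int>()));
        Console.WriteLine(Describe(new List<int> { 7 }));
        Console.WriteLine(Describe(new List<int> { 7, 8 }));
    }
}
```

So even if the generated query is the same, the choice still matters for what happens when zero rows, or more than one row, come back.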
Not long ago I discovered that the new dynamic keyword doesn't work well with C#'s foreach statement:
using System;

sealed class Foo {
    public struct FooEnumerator {
        int value;
        public bool MoveNext() { return true; }
        public int Current { get { return value++; } }
    }

    public FooEnumerator GetEnumerator() {
        return new FooEnumerator();
    }

    static void Main() {
        foreach (int x in new Foo()) {
            Console.WriteLine(x);
            if (x >= 100) break;
        }

        foreach (int x in (dynamic)new Foo()) { // :)
            Console.WriteLine(x);
            if (x >= 100) break;
        }
    }
}
I expected that iterating over a dynamic variable would work exactly as if the type of the collection variable were known at compile time. I've discovered that the second loop actually looks like this when compiled:
foreach (object x in (IEnumerable) /* dynamic cast */ (object) new Foo()) {
    ...
}
and every access to the x variable results in a dynamic lookup/cast, so C# ignores the fact that I've specified the correct type for x in the foreach statement. That was a bit surprising to me... Also, the C# compiler completely ignores that the collection in the dynamically typed variable may implement the IEnumerable<T> interface!
The full behavior of the foreach statement is described in section 8.8.4, "The foreach statement", of the C# 4.0 specification.
But... it's perfectly possible to implement the same behavior at runtime! It's possible to add an extra CSharpBinderFlags.ForEachCast flag and correct the emitted code to look like:
foreach (int x in (IEnumerable<int>) /* dynamic cast with the CSharpBinderFlags.ForEachCast flag */ (object) new Foo()) {
    ...
}
And add some extra logic to CSharpConvertBinder:
Wrap IEnumerable collections and IEnumerators in IEnumerable<T>/IEnumerator<T>.
Wrap collections that don't implement IEnumerable<T>/IEnumerator<T> so that they implement these interfaces.
So today the foreach statement iterates over dynamic completely differently from how it iterates over a statically known collection variable, and completely ignores the type information specified by the user. All of that results in different iteration behavior (collections implementing IEnumerable<T> are iterated as if they implemented only IEnumerable) and a more than 150x slowdown when iterating over dynamic. A simple fix results in much better performance:
foreach (int x in (IEnumerable<int>) dynamicVariable) {
But why should I write code like this?
It's very nice to see that sometimes C# 4.0 dynamic works exactly the same as if the type were known at compile time, but it's very sad to see that dynamic works completely differently where IT CAN work the same as statically typed code.
So my question is: why does foreach over dynamic work differently from foreach over anything else?
First off, to explain some background to readers who are confused by the question: the C# language actually does not require that the collection of a "foreach" implement IEnumerable. Rather, it requires either that it implement IEnumerable, or that it implement IEnumerable<T>, or simply that it have a GetEnumerator method (and that the GetEnumerator method returns something with a Current and MoveNext that matches the pattern expected, and so on.)
That might seem like an odd feature for a statically typed language like C# to have. Why should we "match the pattern"? Why not require that collections implement IEnumerable?
Think about the world before generics. If you wanted to make a collection of ints, you'd have to use IEnumerable. And therefore, every call to Current would box an int, and then of course the caller would immediately unbox it back to int. Which is slow and creates pressure on the GC. By going with a pattern-based approach you can make strongly typed collections in C# 1.0!
Nowadays of course no one implements that pattern; if you want a strongly typed collection, you implement IEnumerable<T> and you're done. Had a generic type system been available to C# 1.0, it is unlikely that the "match the pattern" feature would have been implemented in the first place.
As you've noted, instead of looking for the pattern, the code generated for a dynamic collection in a foreach looks for a dynamic conversion to IEnumerable (and then does a conversion from the object returned by Current to the type of the loop variable of course.) So your question basically is "why does the code generated by use of the dynamic type as a collection type of foreach fail to look for the pattern at runtime?"
Because it isn't 1999 anymore, and even when it was back in the C# 1.0 days, collections that used the pattern also almost always implemented IEnumerable too. The probability that a real user is going to be writing production-quality C# 4.0 code which does a foreach over a collection that implements the pattern but not IEnumerable is extremely low. Now, if you're in that situation, well, that's unexpected, and I'm sorry that our design failed to anticipate your needs. If you feel that your scenario is in fact common, and that we've misjudged how rare it is, please post more details about your scenario and we'll consider changing this for hypothetical future versions.
Note that the conversion we generate to IEnumerable is a dynamic conversion, not simply a type test. That way, the dynamic object may participate; if it does not implement IEnumerable but wishes to proffer up a proxy object which does, it is free to do so.
In short, the design of "dynamic foreach" is "dynamically ask the object for an IEnumerable sequence", rather than "dynamically do every type-testing operation we would have done at compile time". This does in theory subtly violate the design principle that dynamic analysis gives the same result as static analysis would have, but in practice it's how we expect the vast majority of dynamically accessed collections to work.
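A compact way to see that design in action is sketched below. PatternOnly and SumDynamically are hypothetical names, and the example assumes the Microsoft.CSharp runtime binder is available (as it is by default on current .NET): a List<int> converts to IEnumerable at runtime, while a type that only matches the GetEnumerator pattern does not.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.CSharp.RuntimeBinder;

// Hypothetical type that matches the foreach pattern but does not implement IEnumerable.
public sealed class PatternOnly
{
    public List<int>.Enumerator GetEnumerator()
    {
        return new List<int> { 1, 2, 3 }.GetEnumerator();
    }
}

public static class DynamicForeachDemo
{
    // Sums the items using a dynamic foreach. Returns null when the runtime
    // conversion of the collection to IEnumerable fails.
    public static int? SumDynamically(object collection)
    {
        dynamic d = collection;
        try
        {
            int sum = 0;
            foreach (int x in d)
            {
                sum += x;
            }
            return sum;
        }
        catch (RuntimeBinderException)
        {
            return null;
        }
    }

    public static void Main()
    {
        // Static foreach: the compiler finds GetEnumerator by pattern.
        int staticSum = 0;
        foreach (int x in new PatternOnly())
        {
            staticSum += x;
        }
        Console.WriteLine(staticSum);

        // Dynamic foreach: works for a type that implements IEnumerable...
        Console.WriteLine(SumDynamically(new List<int> { 1, 2, 3 }));

        // ...but not for a type that only matches the pattern.
        Console.WriteLine(SumDynamically(new PatternOnly()) == null);
    }
}
```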
But why should I write code like this?
Indeed. And why would the compiler write code like that? You've removed any chance it might have had to guess that the loop could be optimized. Btw, you seem to interpret the IL incorrectly: it is rebinding to obtain IEnumerable.Current, while the MoveNext() call is direct and GetEnumerator() is called only once. Which I think is appropriate, since the next element might or might not cast to an int without problems. It could be a collection of various types, each with their own binder.