Given a datatable containing two columns like this:
Private Function CreateDataTable() As DataTable
    Dim customerTable As New DataTable("Customers")
    customerTable.Columns.Add(New DataColumn("Id", GetType(System.Int32)))
    customerTable.Columns.Add(New DataColumn("Name", GetType(System.String)))

    Dim row1 = customerTable.NewRow()
    row1.Item("Id") = 1
    row1.Item("Name") = "Customer 1"
    customerTable.Rows.Add(row1)

    Dim row2 = customerTable.NewRow()
    row2.Item("Id") = 2
    row2.Item("Name") = "Customer 2"
    customerTable.Rows.Add(row2)

    Dim row3 = customerTable.NewRow()
    row3.Item("Id") = 3
    row3.Item("Name") = "Customer 3"
    customerTable.Rows.Add(row3)

    Return customerTable
End Function
Would you use this snippet to retrieve a List(Of Integer) containing all the Ids:
Dim table = CreateDataTable()
Dim list1 As New List(Of Integer)
For i As Integer = 0 To table.Rows.Count - 1
    list1.Add(CType(table.Rows(i)("Id"), Integer))
Next
Or rather this one:
Dim list2 = (From r In table.AsEnumerable _
Select r.Field(Of Integer)("Id")).ToList()
This is not a question about how to cast the Id column to Integer (whether by .Field(Of Integer), CType, CInt, DirectCast or whatever), but generally about whether you choose LINQ over for loops, as the subject implies.
For those who are interested: I ran some iterations with both versions which resulted in the following performance graph:
Performance graph: http://dnlmpq.blu.livefilestore.com/y1pOeqhqQ5neNRMs8YpLRlb_l8IS_sQYswJkg17q8i1K3SjTjgsE4O97Re_idshf2BxhpGdgHTD2aWNKjyVKWrQmB0J1FffQoWh/analysis.png?psid=1
The vertical axis shows the milliseconds it took the code to convert the rows' Ids into a generic list; the horizontal axis shows the number of rows. The blue line resulted from the imperative approach (for loop), the red line from the declarative code (LINQ).
Whatever way you generally choose: Why do you go that way and not the other?
Whenever possible I favor the declarative style of programming over the imperative one. With a declarative approach the CLR can optimize the code based on the characteristics of the machine: for example, if it has multiple cores, execution could be parallelized, whereas an imperative for loop basically locks out that possibility. Today there may be no big difference, but I think that in the future more and more extensions like PLINQ will appear, allowing better optimization.
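For illustration only (not part of the original answer), a minimal PLINQ sketch of that idea: opting the same declarative query into parallel execution is just a matter of adding AsParallel().

' Hedged sketch: parallelizing the declarative version with PLINQ (.NET 4.0).
' AsOrdered() preserves the row order; without it the results may come back in any order.
Dim ids = (From r In table.AsEnumerable().AsParallel().AsOrdered() _
           Select r.Field(Of Integer)("Id")).ToList()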
I avoid linq unless it helps readability a lot, because it completely destroys edit-and-continue.
When they fix that, I will probably start using it more, because I do like the syntax a lot for some things.
For almost everything I've done I've come to the conclusion that LINQ is optimized enough. If I handcrafted a for loop it would have better performance, but in the grand scheme of things we are usually talking milliseconds. Since I rarely have a situation where those milliseconds will make any kind of impact, I find it's much more important to have readable code with clear intentions. I would much rather have a call that is 50ms slower than have someone come along and break it altogether!
ReSharper has a cool feature that will flag loops and convert them into LINQ expressions. I will flip to the LINQ version and see whether that hurts or helps readability. If the LINQ expression more clearly communicates the intent of the code, I will go with it. If it is unreadable, I will flip back to the foreach version.
Most of the performance issues don't really compare with readability for me.
Clarity trumps cleverness.
In the above example, I would go with the LINQ version since it clearly explains the intent and also locks out people accidentally adding side effects in the loop.
I recently found myself wondering whether I've been totally spoiled by LINQ. Yes, I now use it all the time to pick all sorts of things out of all sorts of collections.
I started to, but found that in some cases I saved time by using this approach:
for (var i = 0, len = list.Count; i < len; i++) { .. }
Not necessarily in all cases, but in some. Most extension methods enumerate the collection the way a foreach does.
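For reference (not from the original answer), the VB counterpart doesn't even need the manual length caching: in a For...Next loop the limit expression is evaluated only once, before the loop starts, so the straightforward form is equivalent:

' The limit (list.Count - 1) is evaluated once at loop entry, not on every iteration.
For i As Integer = 0 To list.Count - 1
    ' ...
Next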
I try to follow these rules:
Whenever I'm just querying (filtering, projecting, ...) collections, use LINQ.
As soon as I'm actually 'doing' something with the result (i.e., introducing side effects), I'll use a for loop.
So in this example, I'll use LINQ.
Also, I always try to split up the 'query definition' from the 'query evaluation':
Dim query = From r In table.AsEnumerable()
            Select r.Field(Of Integer)("Id")

Dim result = query.ToList()
This makes it clear when that (in this case in-memory) query will be evaluated.
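A small illustration (hypothetical continuation of the example above) of why that split matters: the query is only a definition until something enumerates it, so a row added before the ToList() call still shows up in the result.

Dim query = From r In table.AsEnumerable() _
            Select r.Field(Of Integer)("Id")

' The query has not run yet; add another row before evaluating it.
Dim row4 = table.NewRow()
row4("Id") = 4
row4("Name") = "Customer 4"
table.Rows.Add(row4)

Dim result = query.ToList() ' evaluated here: the result contains 1, 2, 3 and 4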
Related
I'm just looking to be able to sort the results of a BatchedJoinBlock (http://msdn.microsoft.com/en-us/library/hh194683.aspx) so that the different results of the different targets stay together. I will explain! Example in some pseudo-code:
Dim batchedJoin = New BatchedJoinBlock(Of String, object)(4)
batchedJoin.Target1.Post("String1Target1")
batchedJoin.Target2.Post(CType(BuildIt, StringBuilder1))
batchedJoin.Target1.Post("String1Target2")
batchedJoin.Target2.Post(CType(BuildIt, StringBuilder2))
Dim results = batchedJoin.Receive()
'This sorts one result...
Dim SortByResult = results.Item1.OrderBy(Function(item) item.ToString, New NaturalStringComparer)
Basically I've got a string and an object. The SortByResult variable above sorts the strings exactly as I'd like them sorted. I'm looking for a way to get the objects that used to be at the same index in target2 into the same order; e.g. if "String1Target1" changes position, I'd like to somehow reliably keep it paired with "StringBuilder1". The actual end result just needs to be that the objects (target2) are sorted in the order dictated by sorting the strings (target1). Something like:
Dim EndResult = results.Item2.OrderBy(strings in target1)
but I'll gladly take an intermediate solution! I've also tried using a dictionary (results.Item2.ToDictionary) with the string as a key (which would also be a fine solution), but using lambda expressions in the proper context is a bit beyond my ken. I could realistically do this in several steps with a list or something, but I'm trying to get something more efficient and learn something; it seems like there are a lot of default options on the results of the join block that I'm just not experienced enough to use. Thanks in advance for any help you can provide!
To me, it looks like you don't actually want BatchedJoinBlock, because the two pieces of data always come together. A better option for that would be a BatchBlock of Tuple<string, object>. When you have that, you can then use LINQ directly to sort each batch:
results.OrderBy(Function(tuple) tuple.Item1)
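A rough sketch of that idea (illustrative only; builder1/builder2 are placeholder names, and it assumes the TPL Dataflow package is referenced and that your NaturalStringComparer implements IComparer(Of String)):

' Requires Imports System.Threading.Tasks.Dataflow
Dim block = New BatchBlock(Of Tuple(Of String, Object))(4)

' Post the string and its companion object together, so they can never drift apart.
block.Post(Tuple.Create("String1Target1", CType(builder1, Object)))
block.Post(Tuple.Create("String1Target2", CType(builder2, Object)))
' ...

Dim batch = block.Receive() ' a Tuple(Of String, Object)() array

' Sorting the tuples by their string keeps each object glued to its string.
Dim sorted = batch.OrderBy(Function(t) t.Item1, New NaturalStringComparer).ToList()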
Using VB.Net, I'm looking for best practices in dealing with the following idiom:
For i As Integer = 0 To o1.Count - 1
    o1(i).x = o2(i).x
Next
What I really want is something VB.Net doesn't offer -- additional and simultaneous iterators on For Each statements.
For Each m1 As c1 In o1, m2 As c2 In o2
    m1.x = m2.x
Next
I'm interested in both LINQ and non-LINQ recommendations, comments about Copy method design, and comparisons to C# or other languages.
I'm sure your actual real world situation is more complicated than the example you have shown, but I thought it was worth mentioning that, if all you are doing is trying to copy an entire list of items from one list to another, you can simply do something like this:
Dim list2 = New List(Of Object)(list1)
I'm not entirely sure what you are trying to do with this. We could probably help you better if you elaborate a little more on your problem.
If both lists are of the same Type you could append one list to the other and then use a for each on just one list.
I submitted a request to extend the For Each statement to implement additional and simultaneous iterators.
http://visualstudio.uservoice.com/forums/121579-visual-studio/suggestions/2923571-allow-additional-clauses-on-for-each-similar-to-c-
I'm still interested in other solutions but until then, this is the answer.
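For completeness, one LINQ-flavoured workaround not mentioned above (illustrative sketch only) is to pair the two lists with Enumerable.Zip from .NET 4.0 and iterate the pairs:

' Zip stops at the shorter of the two lists, so it also guards against length mismatches.
For Each pair In o1.Zip(o2, Function(m1, m2) New With {.M1 = m1, .M2 = m2})
    pair.M1.x = pair.M2.x
Next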
If m_Batchs IsNot Nothing Then
    For Each Batch In m_Batchs
        newListItem = lstWsJobs.Items.Add(Batch.Id.ToString)
        With newListItem
            .Name = Batch.Id.ToString()
            .SubItems.Add(Batch.JobId.ToString)
            .SubItems.Add(Batch.Complete.ToString)
            .SubItems.Add(Batch.User)
            .SubItems.Add(Batch.Time.ToString)
        End With
    Next
End If
I have this ListView (which is working fine) and I want to find an efficient way of populating it in a specific order, i.e. by date, by identity, etc.
I know I can use LINQ, but as I understand it that is inefficient: if m_Batchs is a large list of objects then I will be looping through this list many, many times (as LINQ behind the scenes loops through the object collection).
Any ideas?
LINQ is not inefficient in general, and it is almost always easier to read and faster to implement, change and extend. Also, does it really matter if one approach is 1 millisecond faster over 1000 iterations?
So I assume that Batch is a custom type and m_Batchs is a List<Batch>:
// order by date
var query = m_Batchs.OrderBy(b => b.Time);
// order by identity
query = m_Batchs.OrderBy(b => b.ID);
// ...
Measure the difference between this simple LINQ query and your custom implementation.
Edit: Sorry, that was C#. In VB:
Dim batchByTime = m_Batchs.OrderBy(Function(b) b.Time)
Dim batchByID = m_Batchs.OrderBy(Function(b) b.ID)
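Wiring that back into the existing ListView code is then just a matter of iterating the ordered sequence instead of m_Batchs (sketch):

For Each Batch In m_Batchs.OrderBy(Function(b) b.Time)
    ' ...add the ListViewItem exactly as in the existing loop...
Next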
I've been using LINQ so much in the last couple of weeks that when I had to write a one line function to remove < and > from a string, I found that I had written it as a LINQ query:
Public Function StripLTGT(text As String) As String
    Return String.Join("", (From i As Char In text.ToCharArray Where i <> "<"c Where i <> ">"c).ToArray)
End Function
My question is, is it better to do it with LINQ as above or with StringBuilder as I've always done, as below:
Public Function StripLTGT(text As String) As String
    Dim a As New StringBuilder(text)
    a = a.Replace("<", "")
    a = a.Replace(">", "")
    Return a.ToString
End Function
Both work; the second one is easier to read, but the first one is designed for executing queries against arrays and other enumerables.
Regex.Replace(text, "[<>]", "")
is much more straightforward.
Or:
myString = myString.Replace("<", "").Replace(">", "")
Whether or not option A, B or C is faster than the others is hard to say because option A may be better on small strings while option B may be better on long strings, etc.
Either one should really be fine in terms of functionality. The first one is not efficient as is. The ToArray call is doing far more work than necessary (if you're on .NET 4.0, it is not needed anyway), and the ToCharArray call is not needed. Basically the characters in the input string are being iterated a lot more than they need to be, and extra arrays are allocated superfluously.
I wouldn't say this particularly matters; but you asked about efficiency, so that's why I mention it.
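To make that concrete, here is one way the LINQ version could be tightened (a sketch assuming .NET 4.0, where String.Join accepts an IEnumerable(Of Char) and a String can be enumerated directly, so neither ToCharArray nor ToArray is needed):

Public Function StripLTGT(text As String) As String
    ' The characters are enumerated once, with no intermediate arrays.
    Return String.Join("", From c As Char In text Where c <> "<"c AndAlso c <> ">"c)
End Function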
The second one seems fine to me. Note that if you wanted to go the one-line route, you could still do so with a StringBuilder and I think still have something more concise than the LINQ version (though I haven't counted characters). Whether or not this even outperforms the more direct String.Replace option is kind of unclear to me, though:
' StringBuilder.Replace version:
Return New StringBuilder(text).Replace("<", "").Replace(">", "").ToString()
' String.Replace version:
Return text.Replace("<", "").Replace(">", "")
I'm working on my first project using LINQ (in mvc), so there is probably something very simple that I missed. However, a day of searching and experimenting has not turned up anything that works, hence the post.
I'm trying to write a LINQ query (LINQ to SQL) that will contain a variable number of conditions in the where clause, separated by OR or AND. We don't know how many conditions are going to be in the query until runtime. This is for a search filter control, where the user can select multiple criteria to filter by.
SELECT * FROM table
WHERE table.col = 1
   OR table.col = 2
   OR table.col = 7
   -- ...any number of other conditions
Before, I would just construct the SQL query as a string while looping over all the conditions. However, it seems like there should be a nicer way of doing this in LINQ.
I have tried looking at expression trees, but they seem a bit over my head for the moment. Another idea was to execute a lambda function inside the Where statement, like so:
For Each value In values
    matchingRows = matchingRows.Where(Function(row) row.col = value)
Next
However, this only works for AND conditions. How do I do ORs?
I would use PredicateBuilder for this. It makes dynamic WHERE clauses very easy.
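To make that concrete, a rough sketch of how PredicateBuilder (from LINQKit) is typically used for an OR chain; MyRow, db.MyRows and col are placeholder names, and the extra local copy of the loop variable avoids the usual closure pitfall:

' Requires the LINQKit PredicateBuilder
Dim predicate = PredicateBuilder.False(Of MyRow)()
For Each value In values
    Dim v = value ' copy the loop variable so each clause captures its own value
    predicate = predicate.Or(Function(row) row.col = v)
Next
Dim matchingRows = db.MyRows.Where(predicate)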
AND is easy - you can just call Where in a loop. OR is much trickier. You mention SQL, so I'm assuming this is something like LINQ-to-SQL, in which case one way I've found to do this involves building custom Expression trees at runtime - like so (the example is C#, but let me know if you need help translating it to VB; my VB isn't fantastic any more, so I'll let you try first... you can probably read C# better than I can write VB).
Unfortunately, this won't work with EF in 3.5SP1 (due to the Expression.Invoke), but I believe this is fixed in 4.0.
Something like this should work (forgive my VB):
' Requires Imports System.Linq.Expressions
Dim filter As Expression = Nothing
Dim rowParam As ParameterExpression = Expression.Parameter(GetType(Something), "row")
For Each value In values
    ' Build "row.col = value" for this iteration.
    Dim filterPart As Expression = Expression.Equal( _
        Expression.Property(rowParam, "col"), _
        Expression.Constant(value))
    If filter Is Nothing Then
        filter = filterPart
    Else
        ' Combine with the conditions built so far using OR.
        filter = Expression.OrElse(filter, filterPart)
    End If
Next
If filter IsNot Nothing Then
    matchingRows = matchingRows.Where( _
        Expression.Lambda(Of Func(Of Something, Boolean))(filter, rowParam))
End If
No guarantees, however, my VB is a little rusty :-)
But PredicateBuilder might be a better solution if you want to do more complicated stuff than just Ands and Ors.