Create a List of elements from a DataTable LINQ Column - vb.net

I would like to know how I can convert elements of a column of a DataTable to a list of type string, grouping the elements to avoid repetition.
For example my DataTable would look like this
DataTable
and I want to make a list containing the elements of only "User" without repeating itself using LINQ.
The code I was trying to use is
InvoiceList = InvoiceDT.AsEnumerable().GroupBy(Function(r) r("User").ToString).ToList(Function(g) g.ToList())
But it doesn't work for me since I am new to LINQ and still have problems forming the structures.

I'd use this:
InvoiceList = InvoiceDT.AsEnumerable().Select(Function(r) r("User").ToString()).Distinct().ToList()
If you wanted a GroupBy solution it's
InvoiceList = InvoiceDT.AsEnumerable().GroupBy(Function(r) r("User").ToString()).Select(Function(g) g.Key).ToList()
Where your code went wrong was in trying to pass a delegate to ToList; it doesn't take one (and you wouldn't ToList the g either, as it's a list of data rows with all varying properties).
To reshape the IGroupings produced by the GroupBy (each one is something like a list of objects that all share the same Key, which is a property of the grouping itself) into a sequence of string Keys, we Select the Key and then ToList that.
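As a minimal sketch of what the GroupBy version produces, the console program below builds a small in-memory table and prints each group's Key before projecting the keys into a list. The sample rows and the Amount column are invented for illustration; only the User column comes from the question.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Data
Imports System.Linq

Module GroupByKeysDemo
    Sub Main()
        ' Hypothetical sample data standing in for InvoiceDT.
        Dim invoiceDT As New DataTable()
        invoiceDT.Columns.Add("User", GetType(String))
        invoiceDT.Columns.Add("Amount", GetType(Decimal))
        invoiceDT.Rows.Add("Ann", 10D)
        invoiceDT.Rows.Add("Bob", 20D)
        invoiceDT.Rows.Add("Ann", 30D)

        ' Each group is an IGrouping(Of String, DataRow): a Key plus the rows that share it.
        Dim groups = invoiceDT.AsEnumerable().GroupBy(Function(r) r("User").ToString())

        For Each grp In groups
            Console.WriteLine("{0}: {1} row(s)", grp.Key, grp.Count())
        Next

        ' Projecting each group to its Key yields one entry per distinct user.
        Dim invoiceList As List(Of String) = groups.Select(Function(g) g.Key).ToList()
        Console.WriteLine(String.Join(", ", invoiceList))
    End Sub
End Module
```

On .NET Framework, AsEnumerable needs a reference to System.Data.DataSetExtensions.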
There is a lot of back and forthing between developers over things like ToList vs ToArray - some people universally use ToList because, for collections of an unknown number of elements, both list and array will grow and resize repeatedly in the same way but using ToArray requires one additional resizing step at the end to trim off any unused slots. Mostly that's trivial in terms of an overall performance consideration and should be weighed against the benefit of releasing the memory with the trim. Getting into finer details is way beyond the scope of this answer but you can read some huge blog posts about it.
I personally think it's more important to generate sensible code by calling the method that results in the relevant type depending on what you plan to do with it. I ToList if I need List functionality (add/insert/remove). I prefer ToArray if an array suits the follow-on purposes (read/write/random access, no insert or delete). If I'll only ever enumerate it I don't To... anything at all; I just ForEach the result of the query, which can give a bigger performance boost than anything else because it means I may not have to enumerate the entire set (if I stop early) or allocate memory all at once for doing so (if I'm writing to a socket or file).
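A rough sketch of the "enumerate directly and stop early" point, assuming a plain in-memory array of made-up user names: the counter shows how many source elements the query actually touched in each case.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Linq

Module LazyEnumerationDemo
    Sub Main()
        ' Hypothetical sample data.
        Dim users As String() = {"Ann", "Bob", "Ann", "Cid", "Bob"}
        Dim inspected As Integer = 0

        ' Nothing runs yet: the query is only a description of the work.
        Dim distinctUsers = users.
            Select(Function(u)
                       inspected += 1
                       Return u
                   End Function).
            Distinct()

        ' Enumerate directly and stop at the first match.
        For Each name In distinctUsers
            If name = "Bob" Then Exit For
        Next
        Console.WriteLine("Elements inspected with early exit: {0}", inspected)

        ' ToList forces the whole source to be read and stored.
        inspected = 0
        Dim asList As List(Of String) = distinctUsers.ToList()
        Console.WriteLine("Elements inspected by ToList: {0}", inspected)
    End Sub
End Module
```

With this data the early exit reads two elements, while ToList reads all five.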
On the use of ToString; it's worth avoiding if you think you'll fall into a pattern where you do it on every column just to get a string. If the column is already a string it's an acceptable way to get the object that DataRow.Item gives you, into a string. If the column is another type it's better to cast it:
DirectCast(r("Age"), Integer)
r.Field(Of Integer)("Age")
Thing is, it's verbose, and ugly, and intellisense doesn't help you out with writing Age or knowing it's an Int. LINQ in VB is bad enough for verbosity without pouring gas on that fire. If you're working with datatables of a known structure, it's a lot nicer if you make strongly typed ones:
Add a new file of type DataSet to your project
Open it so the design surface appears. In the properties grid call it something reasonable, such as AccountsDataSet
Right click, Add Table, call it Invoices
Right click the empty table, Add Column, call it User
Then use it like:
Dim dt as new AccountsDataSet.InvoicesDataTable
Populate it like:
dt.AddInvoicesRow("John Smith", ... other properties here)
Query it like:
dt.Select(Function(r) r.User).Distinct()
Much nicer than accessing column names by string, and having them be objects that need casting.
Consider the dataset generator as a way to quickly, visually, create POCO classes with named, typed properties.

Try this
Dim list As List(Of String) = InvoiceDT.Rows.
    Cast(Of DataRow)().
    Select(Function(r) r("User").ToString()).
    Distinct().
    ToList()
Here you cast the Rows collection to IEnumerable(Of DataRow); the rest is trivial.

Related

VB.NET "For each" versus ".GetUpperBound(0)"

I would like to know what is preferred...
Dim sLines() As String = s.Split(NewLine)
For each:
For Each sLines_item As String In sLines
.GetUpperBound:
For i As Integer = 0 To sLines.GetUpperBound(0)
I have no idea why the "For Each" was introduced for such cases. Until now I have only used .GetUpperBound, and I don't see any PRO for the "For Each".
Thank you
ps: When I use ."GetUpperBound(0)", I do know that I am iterating over the vector.
The "For Each" in contrast sounds like "I don't care in which order the vector is given to me". But that is just personal gusto, I guess.
Short answer: Do not use GetUpperBound(). The only advantage of GetUpperBound() is that it works for multi-dimensional arrays, where Length doesn't work. However, even that usage is outdated since there is Array.GetLength() available that takes the dimension parameter. For all other uses, For i = 0 to Array.Length - 1 is better and probably the fastest option.
It's largely a personal preference.
If you need to alter the elements of the array, you should use For i ... because changing sLines_item will not affect the corresponding array element.
If you need to delete elements of the array, you can iterate For i = ubound(sLines) to 0 step -1 (or the equivalent).
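A small sketch of both points, using made-up data. Arrays cannot shrink in place, so the removal part assumes a List(Of String) built from the array.

```vb
Imports System
Imports System.Collections.Generic

Module LoopMutationDemo
    Sub Main()
        Dim sLines As String() = {"a", "b", "c"}

        ' Assigning to the For Each variable changes only the local copy.
        For Each sLines_item As String In sLines
            sLines_item = sLines_item.ToUpper()
        Next
        Console.WriteLine(String.Join(",", sLines))   ' still a,b,c

        ' Writing through the index changes the array itself.
        For i As Integer = 0 To sLines.Length - 1
            sLines(i) = sLines(i).ToUpper()
        Next
        Console.WriteLine(String.Join(",", sLines))   ' A,B,C

        ' Arrays have a fixed size, so removal is shown on a List(Of String).
        ' Walking backwards keeps the indexes of unvisited items valid.
        Dim lines As New List(Of String)(sLines)
        For i As Integer = lines.Count - 1 To 0 Step -1
            If lines(i) = "B" Then lines.RemoveAt(i)
        Next
        Console.WriteLine(String.Join(",", lines))    ' A,C
    End Sub
End Module
```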
Short answer
You should always use For Each on IEnumerable types unless you have no other choice.
Long answer
Contrary to the popular understanding, For Each is not a syntactic sugar on top of For Next. It will not necessarily iterate over every element of its source. It is a syntactic sugar on top of IEnumerable.GetEnumerator(). For Each will first get an enumerator to its source then loop until it cannot enumerate further. Basically, it will be replaced by the following code. Keep in mind that this is an oversimplification.
' Ask the source for a way to enumerate its content in a forward only manner.
Dim enumerator As IEnumerator = sLines.GetEnumerator()
' Loop until there is no more element in front of us.
While enumerator.MoveNext()
    ' Invoke back the content of the for each block by passing
    ' the currently enumerated element.
    forEachContent.Invoke(enumerator.Current)
End While
The major difference between this and a classical For Next loop is that it does not depend on any length. This fixes two limitations in modern .NET languages. The first one has to do with the Count method. LINQ provides a Count extension method for IEnumerable, but the implementation might not be able to keep track of the actual number of elements it stores. Because of this, calling Count might cause the source to be iterated over just to count the elements it contains. VB evaluates the end value of a For Next loop only once, before the loop starts, so this costs one extra pass over the source; in languages that re-check the loop condition on every iteration, such as a C# for loop, the count would be repeated for every element, which is very slow. Here is an illustration of this process:
For i As Integer = 0 To source.Count() - 1 ' Count() may have to walk the whole
                                           ' source just to find the end value.
    DoSomething(source(i))
Next
The use of For Each fixes this by never requesting the length of the source.
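To make the cost of Count visible, here is a sketch with a hypothetical lazy source (the Lines iterator and the produced counter are invented for this example). It counts how many elements the source had to generate for each operation.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Linq

Module CountCostDemo
    Private produced As Integer = 0

    ' Hypothetical lazy source: it has no stored length.
    Private Iterator Function Lines() As IEnumerable(Of String)
        For i As Integer = 1 To 3
            produced += 1
            Yield "line " & i.ToString()
        Next
    End Function

    Sub Main()
        Dim source = Lines()

        ' Count() has to walk the whole sequence to learn its length.
        Dim n As Integer = source.Count()
        Console.WriteLine("Count = {0}, elements produced so far = {1}", n, produced)

        ' For Each never asks for a length; it just advances until the source ends.
        produced = 0
        For Each line In source
            Console.WriteLine(line)
        Next
        Console.WriteLine("Elements produced by For Each = {0}", produced)
    End Sub
End Module
```

Calling Count and then looping walks the source twice; For Each alone walks it once.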
The second limitation it fixes is the lack of a concept for arrays with an infinite number of elements. An example would be an array containing every digit of pi, where each digit is calculated only when you request it. This is where LINQ makes its entrance and really shines, because it enables you to write the following code:
Dim piWith10DigitPrecision = From d In InfinitePiSource
                             Take 10
Dim piWith250DigitPrecision = From d In InfinitePiSource
                              Take 250
Dim infinite2PiSource = From d In InfinitePiSource
                        Select d * 2
Now, in an infinite source, you cannot depend on a length to iterate over all of its elements. It has an infinite length, which would make a traditional For Next loop an infinite loop. This does not change anything for the first two examples I have given with pi, because we explicitly provide the number of elements we want, but it does for the third one. When would you stop iterating? For Each, when combined with Yield (used by the Take operator), makes sure that you never iterate until you actually request a specific value.
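A runnable sketch of the idea, with a hypothetical Naturals iterator standing in for InfinitePiSource (computing digits of pi is out of scope here). The source never ends on its own, yet the loop finishes because Take stops requesting values.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Linq

Module InfiniteSourceDemo
    ' Hypothetical endless source standing in for InfinitePiSource:
    ' it yields 1, 2, 3, ... and never finishes on its own.
    Private Iterator Function Naturals() As IEnumerable(Of Integer)
        Dim n As Integer = 1
        Do
            Yield n
            n += 1
        Loop
    End Function

    Sub Main()
        ' Still infinite: Select only describes the work.
        Dim doubled = From d In Naturals() Select d * 2

        ' Take stops asking for values after the fifth one.
        For Each value In doubled.Take(5)
            Console.WriteLine(value)   ' 2, 4, 6, 8, 10
        Next
    End Sub
End Module
```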
You might have already figured it out by now, but these two things mean that For Each effectively has no concept of bounds because it simply does not require them. The only use for GetLowerBound and GetUpperBound is with non-zero-indexed arrays. For instance, you might have an array that indexes values from 1 instead of zero. Even then, you only need GetLowerBound and Length. Obviously, this is only if the position of the element in the source actually matters. If it does not, you can still use For Each to iterate over all elements as it is bound agnostic.
Also, as already mentioned, GetLength should be used for zero-indexed multi-dimensional arrays, again, only if the position of the element matters and not just the element itself.
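The sketch below illustrates the three cases with made-up data: GetLength on a zero-indexed two-dimensional array, For Each when position is irrelevant, and GetLowerBound plus Length on a one-based array (created here through Array.CreateInstance, since VB array syntax always starts at zero).

```vb
Imports System

Module BoundsDemo
    Sub Main()
        ' Zero-indexed two-dimensional array: GetLength takes the dimension.
        Dim grid(,) As Integer = {{1, 2, 3}, {4, 5, 6}}
        For row As Integer = 0 To grid.GetLength(0) - 1
            For col As Integer = 0 To grid.GetLength(1) - 1
                Console.Write(grid(row, col).ToString() & " ")
            Next
            Console.WriteLine()
        Next

        ' Position does not matter: For Each visits every element regardless of bounds.
        Dim total As Integer = 0
        For Each cell As Integer In grid
            total += cell
        Next
        Console.WriteLine("Total = {0}", total)   ' Total = 21

        ' One-dimensional array whose indexes start at 1.
        Dim oneBased As Array = Array.CreateInstance(GetType(String), New Integer() {3}, New Integer() {1})
        Dim first As Integer = oneBased.GetLowerBound(0)
        For i As Integer = first To first + oneBased.Length - 1
            oneBased.SetValue("item " & i.ToString(), i)
        Next
        For Each item As String In oneBased
            Console.WriteLine(item)
        Next
    End Sub
End Module
```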

Using Orderby on BatchedJoinBlock(Of T1, T2) - Dataflow (Task Parallel Library)

I'm just looking to be able to sort the results of a BatchedJoinBlock (http://msdn.microsoft.com/en-us/library/hh194683.aspx) so that the different results of the different targets stay together. I will explain! Example in some pseudo-code:
Dim batchedJoin = New BatchedJoinBlock(Of String, object)(4)
batchedJoin.Target1.Post("String1Target1")
batchedJoin.Target2.Post(CType(BuildIt, StringBuilder1))
batchedJoin.Target1.Post("String1Target2")
batchedJoin.Target2.Post(CType(BuildIt, StringBuilder2))
Dim results = batchedJoin.Receive()
'This sorts one result...
Dim SortByResult = results.Item1.OrderBy(Function(item) item.ToString, New NaturalStringComparer)
Basically I've got a string and an object, the SortByResult variable above sorts the strings exactly as I'd like them to sort. I'm looking for a way to get the objects that used to be at the same index number in target2 into the same order. e.g. if "String1Target1" changes order I'd like to somehow reliably refer to/pair it together with "StringBuilder1". The actual end result just needs to be that the objects (target2) are sorted in the order that is dictated by the strings being sorted (target1). Something like:
Dim EndResult = results.Item2.OrderBy(strings in target1)
but I'll gladly take an intermediate solution! I've also tried using a dictionary (results.Item2.ToDictionary) with the string as a key (which would also be a fine solution) but it's a bit beyond my ken using lambda expressions in the proper context. I can realistically do this in several steps with a list or something, but I'm trying to get something more efficient/learn something, and it seems like there's a lot of default options with the results of the join block that I'm just not experienced enough to use. Thanks in advance for any help you can provide!
To me, it looks like you don't actually want BatchedJoinBlock, because the two pieces of data always come together. A better option for that would be a BatchBlock of Tuple<string, object>. When you have that, you can then use LINQ directly to sort each batch:
results.OrderBy(Function(tuple) tuple.Item1)
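A minimal sketch of that suggestion, assuming the System.Threading.Tasks.Dataflow package is referenced. The posted values are invented, and StringComparer.Ordinal is only a stand-in for the question's own NaturalStringComparer class.

```vb
Imports System
Imports System.Linq
Imports System.Text
Imports System.Threading.Tasks.Dataflow

Module BatchBlockDemo
    Sub Main()
        ' Each posted item carries the string and its companion object together.
        Dim batch = New BatchBlock(Of Tuple(Of String, Object))(2)
        batch.Post(Tuple.Create(Of String, Object)("String2", New StringBuilder("second")))
        batch.Post(Tuple.Create(Of String, Object)("String1", New StringBuilder("first")))

        ' Receive returns one batch as an array of tuples.
        Dim results As Tuple(Of String, Object)() = batch.Receive()

        ' Sorting by Item1 keeps every object attached to its string.
        ' StringComparer.Ordinal is a stand-in for the question's NaturalStringComparer.
        Dim sorted = results.OrderBy(Function(t) t.Item1, StringComparer.Ordinal)

        For Each pair In sorted
            Console.WriteLine("{0} -> {1}", pair.Item1, pair.Item2)
        Next
    End Sub
End Module
```

Because the pair travels as one message, no second lookup or dictionary is needed to keep the objects in step with the sorted strings.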

Linq to Entities Query Only Recognize Integers

My app is EF 5.0 Code First using DbContext. I've been trying to manipulate (filter & sort) using Linq. I'm first trying to load a context to Local like:
context.mytable.orderby(function a a.somestring).thenby(function b d.someint).load
I don't get any error but the string ordering is ignored. The int sort works. Similarly when I try to set my binding source the linq portions comparing strings or dates doesn't work but when I use integers it works. I assume these are not converting to SQL query correctly but I can't figure out how to fix them. I would appreciate a point in the right direction.
Load() doesn't create anything; it only takes a bunch of items and loads them from the underlying source into your context. You can't impose ordering on context.mytable because mytable doesn't hold the actual data; it holds part of the query you'll be using to get the actual data.
To get ordering in your UI you need to either order the list you'll be binding to, or use an unordered list and wrap it in an object whose job is to apply ordering. Since you're using Local, go with the latter and use CollectionViewSource.
Here's an example of CollectionViewSource in WPF. It's bound to the property StateOrProvinces of the DataContext that is the unordered list I want displayed in order:
<UserControl.Resources>
    <CollectionViewSource x:Key="stateOrProvinceViewSource" Source="{Binding Path=StateOrProvinces}" >
        <CollectionViewSource.SortDescriptions>
            <scm:SortDescription PropertyName="Name" />
        </CollectionViewSource.SortDescriptions>
    </CollectionViewSource>
</UserControl.Resources>
with scm being:
xmlns:scm="clr-namespace:System.ComponentModel;assembly=WindowsBase"
The same thing in code:
CollectionViewSource orderedView = new CollectionViewSource()
{
Source = StateOrProvinces,
};
orderedView.SortDescriptions.Add(new SortDescription("Name", ListSortDirection.Ascending));
Using an extra layer to apply ordering has a few advantages. It lets you display the same list with different ordering in different places and you add/remove items from the underlying list without having to manually apply ordering every time.
You should also look at the CollectionView class, which has some extra bells and whistles, but I won't go into it, to cut down on info overload.

sorting and getting uniques

i have a string that looks like this
"apples,fish,oranges,bananas,fish"
i want to be able to sort this list and get only the uniques. how do i do it in vb.net? please provide code
A lot of your questions are quite basic, so rather than providing the code I'm going to provide the thought process and let you learn from implementing it.
Firstly, you have a string that contains multiple items separated by commas, so you're going to need to split the string at the commas to get a list. You can use String.Split for that.
You can then use some of the extension methods for IEnumerable<T> to filter and order the list. The ones to look at are Enumerable.Distinct and Enumerable.OrderBy. You can either write these as normal methods, or use Linq syntax.
If you need to get it back into a comma-separated string, then you'll need to re-join the strings using the String.Join method. Note that before .NET 4 this needs an array, so Enumerable.ToArray will be useful in conjunction.
You can do it using LINQ, like this:
Dim input = "apples,fish,oranges,bananas,fish"
Dim strings = input.Split(","c).Distinct().OrderBy(Function(s) s)
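For completeness, a sketch of the full round trip back to a comma-separated string, using the same input. ToArray is included so it also fits the older String.Join overload that takes an array.

```vb
Imports System
Imports System.Linq

Module SortUniqueDemo
    Sub Main()
        Dim input = "apples,fish,oranges,bananas,fish"

        ' Split on commas, drop duplicates, sort, and materialise as an array.
        Dim items As String() = input.Split(","c).
            Distinct().
            OrderBy(Function(s) s).
            ToArray()

        ' Re-join into a single comma-separated string.
        Dim output As String = String.Join(",", items)
        Console.WriteLine(output)   ' apples,bananas,fish,oranges
    End Sub
End Module
```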
I'm not a VB.NET programmer, but I can give you a suggestion:
Split the string into an array
Create a second array
Cycle through the first array, adding any value that is not in the second.
Upon completion, your second array will have only unique values.
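The steps above could be sketched in VB.NET roughly as follows. A List(Of String) is assumed in place of the "second array" because it can grow as values are added, and the final Sort is added to cover the sorting half of the question.

```vb
Imports System
Imports System.Collections.Generic

Module ManualUniqueDemo
    Sub Main()
        Dim input As String = "apples,fish,oranges,bananas,fish"
        Dim parts As String() = input.Split(","c)

        ' A List stands in for the "second array" because it can grow as values are added.
        Dim uniques As New List(Of String)()
        For Each part As String In parts
            If Not uniques.Contains(part) Then uniques.Add(part)
        Next

        uniques.Sort()
        Console.WriteLine(String.Join(",", uniques))   ' apples,bananas,fish,oranges
    End Sub
End Module
```

Contains scans the list each time, so for large inputs a HashSet(Of String) would likely be the better container.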

What is the advised way to make an empty array in VB.NET?

What is the best way to take an array in VB.NET which can either be Nothing or initialised and give it a length of zero?
The three options I can think of are:
ReDim oBytes(-1)
oBytes = New Byte(-1) {}
oBytes = New Byte() {}
The first example is what most of the developers in my company (we used to do VB 6) have always used. I personally prefer the third example as it is the easiest to understand what is happening.
So what are the positives and negative to each approach (option 2 and 3 are very similar I know)?
EDIT
So does anyone know of a reason to avoid ReDim other than because it is a holdover from the VB days?
Not that I won't accept that as the answer if that is all anyone has!
I recommend: oBytes = New Byte() {}
You should try to avoid "classic VB-isms" like Redim, and other holdovers from the classic VB days. I would recommend the third option.
Edit
To provide some more information about why to avoid it, see this MSDN page. While the page doesn't specifically advise against it, you can see that Redim suffers from shortcomings (and potential for confusion) that the other syntax does not.
Redim can only be used on existing arrays. Even so, it is semantically equivalent to declaring a new array. Redim releases the old array and creates a new one (so it isn't as if Redim has the ability to "tack on" or "chop off" elements). Additionally, it is destructive unless the Preserve keyword is used, even though there is no visual indication that an assignment is taking place.
Because Redim cannot create an array (but can only work on existing arrays), it can only be used within a procedure; at the class level you're forced to use the New Byte() {} method, leaving you with two visually distinct patterns for assigning new arrays, even though they're semantically identical.
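A short sketch comparing the three options from the question, plus the destructive behaviour of ReDim without Preserve. The variable names other than oBytes are invented for the example.

```vb
Imports System

Module EmptyArrayDemo
    ' At class or module level only the initialiser form is available.
    Private oShared As Byte() = New Byte() {}

    Sub Main()
        Dim oBytes As Byte() = Nothing

        ReDim oBytes(-1)
        Console.WriteLine(oBytes.Length)      ' 0

        oBytes = New Byte(-1) {}
        Console.WriteLine(oBytes.Length)      ' 0

        oBytes = New Byte() {}
        Console.WriteLine(oBytes.Length)      ' 0

        ' ReDim replaces the array; without Preserve the old contents are lost.
        Dim values As Integer() = {1, 2, 3}
        ReDim Preserve values(3)
        Console.WriteLine(values(0))          ' 1
        ReDim values(3)
        Console.WriteLine(values(0))          ' 0
    End Sub
End Module
```

All three forms end with a zero-length array, so the choice comes down to readability and to whether the statement must also work at class level.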