Using OrderBy on BatchedJoinBlock(Of T1, T2) - Dataflow (Task Parallel Library) - vb.net

I'm just looking to be able to sort the results of a BatchedJoinBlock (http://msdn.microsoft.com/en-us/library/hh194683.aspx) so that the different results of the different targets stay together. I will explain! Example in some pseudo-code:
Dim batchedJoin = New BatchedJoinBlock(Of String, object)(4)
batchedJoin.Target1.Post("String1Target1")
batchedJoin.Target2.Post(CType(BuildIt, StringBuilder1))
batchedJoin.Target1.Post("String1Target2")
batchedJoin.Target2.Post(CType(BuildIt, StringBuilder2))
Dim results = batchedJoin.Receive()
'This sorts one result...
Dim SortByResult = results.Item1.OrderBy(Function(item) item.ToString, New NaturalStringComparer)
Basically I've got a string and an object, the SortByResult variable above sorts the strings exactly as I'd like them to sort. I'm looking for a way to get the objects that used to be at the same index number in target2 into the same order. e.g. if "String1Target1" changes order I'd like to somehow reliably refer to/pair it together with "StringBuilder1". The actual end result just needs to be that the objects (target2) are sorted in the order that is dictated by the strings being sorted (target1). Something like:
Dim EndResult = results.Item2.OrderBy(strings in target1)
but I'll gladly take an intermediate solution! I've also tried using a dictionary (results.Item2.ToDictionary) with the string as a key (which would also be a fine solution), but using lambda expressions in the proper context is a bit beyond my ken. I can realistically do this in several steps with a list or something, but I'm trying to get something more efficient and learn something, and it seems like there are a lot of default options with the results of the join block that I'm just not experienced enough to use. Thanks in advance for any help you can provide!

To me, it looks like you don't actually want BatchedJoinBlock, because the two pieces of data always come together. A better option for that would be a BatchBlock of Tuple(Of String, Object). When you have that, you can then use LINQ directly to sort each batch:
results.OrderBy(Function(tuple) tuple.Item1)
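As a rough sketch of that idea: the array below merely stands in for one batch that a BatchBlock(Of Tuple(Of String, Object)) would hand back, so no Dataflow package is needed to try it. The module name, the PayloadsInKeyOrder helper and the sample keys are made up for illustration, and StringComparer.Ordinal is only a placeholder where the question's NaturalStringComparer would go.

```vbnet
Imports System
Imports System.Linq
Imports System.Text

Module BatchSortSketch
    ' Sorts a batch of (key, payload) pairs by key and returns the payloads in that order.
    ' StringComparer.Ordinal is a stand-in; swap in your own comparer as needed.
    Public Function PayloadsInKeyOrder(batch As Tuple(Of String, Object)()) As Object()
        Return batch.OrderBy(Function(pair) pair.Item1, StringComparer.Ordinal).
                     Select(Function(pair) pair.Item2).
                     ToArray()
    End Function

    Sub Main()
        ' Hypothetical batch: each string travels together with its object.
        Dim batch = {
            Tuple.Create("b-key", CType(New StringBuilder("second"), Object)),
            Tuple.Create("a-key", CType(New StringBuilder("first"), Object))
        }
        For Each payload In PayloadsInKeyOrder(batch)
            Console.WriteLine(payload)
        Next
    End Sub
End Module
```

Because each string is stored next to its object, sorting the pairs keeps them together and no index bookkeeping between two separate collections is needed.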

Related

Create a List of elements from a DataTable LINQ Column

I would like to know how I can convert elements of a column of a DataTable to a list of type string, grouping the elements to avoid repetition.
For example my DataTable would look like this
DataTable
and I want to make a list containing the elements of only "User" without repeating itself using LINQ.
The code I was trying to use is
InvoiceList = InvoiceDT.AsEnumerable().GroupBy(Function(r) r("User").ToString).ToList(Function(g) g.ToList())
But it doesn't work for me since I am new to LINQ and still have problems forming the structures.
I'd use this:
InvoiceList = InvoiceDT.AsEnumerable().Select(Function(r) r("User").ToString()).Distinct().ToList()
If you wanted a GroupBy solution it's
InvoiceList = InvoiceDT.AsEnumerable().GroupBy(Function(r) r("User").ToString()).Select(Function(g) g.Key).ToList()
Where your code went wrong was in trying to pass a delegate to ToList; it doesn't take one (and you wouldn't ToList the g either, as it's a list of data rows with all varying properties).
To reshape our IGrouping (something like a list of objects that all share the same Key, which is a property of the list that the IGrouping represents) produced by the groupby into a sequence of string Keys we Select the Key, and then ToList that
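To compare the two queries side by side, here is a small self-contained sketch. The table built by BuildInvoices is a made-up stand-in for InvoiceDT, and the module and function names are hypothetical. It assumes AsEnumerable is available, which on .NET Framework means referencing System.Data.DataSetExtensions.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Data
Imports System.Linq

Module DistinctUsersSketch
    ' Builds a small stand-in for InvoiceDT; the rows are invented for illustration.
    Public Function BuildInvoices() As DataTable
        Dim table As New DataTable("Invoices")
        table.Columns.Add("User", GetType(String))
        table.Rows.Add("Ann")
        table.Rows.Add("Bob")
        table.Rows.Add("Ann")
        Return table
    End Function

    ' Project the column to strings, then drop repeats.
    Public Function UsersViaDistinct(table As DataTable) As List(Of String)
        Return table.AsEnumerable().
                     Select(Function(r) r("User").ToString()).
                     Distinct().
                     ToList()
    End Function

    ' Group rows by user, then keep only each group's Key.
    Public Function UsersViaGroupBy(table As DataTable) As List(Of String)
        Return table.AsEnumerable().
                     GroupBy(Function(r) r("User").ToString()).
                     Select(Function(g) g.Key).
                     ToList()
    End Function

    Sub Main()
        Dim table = BuildInvoices()
        Console.WriteLine(String.Join(",", UsersViaDistinct(table)))
        Console.WriteLine(String.Join(",", UsersViaGroupBy(table)))
    End Sub
End Module
```

Both functions should return the same two names for this table; the Distinct version simply skips building the groups.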
There is a lot of back and forthing between developers over things like ToList vs ToArray - some people universally use ToList because, for collections of an unknown number of elements, both list and array will grow and resize repeatedly in the same way but using ToArray requires one additional resizing step at the end to trim off any unused slots. Mostly that's trivial in terms of an overall performance consideration and should be weighed against the benefit of releasing the memory with the trim. Getting into finer details is way beyond the scope of this answer but you can read some huge blog posts about it.
I personally think it's more important to generate sensible code by calling the method that results in the relevant type depending on what you plan to do with it. I ToList if I need List functionality (add/insert/remove). I prefer ToArray if an array suits the follow-on purposes (read/write/random access, no insert or delete), and if I'll only ever enumerate it I don't To... anything at all - I just ForEach the result of the query, which can give a bigger performance boost than anything else because it means I may not have to enumerate the entire set (if I stop early) or allocate memory all at once for doing so (if I'm writing to a socket or file).
On the use of ToString; it's worth avoiding if you think you'll fall into a pattern where you do it on every column just to get a string. If the column is already a string it's an acceptable way to get the object that DataRow.Item gives you, into a string. If the column is another type it's better to cast it:
DirectCast(r("Age"), Integer)
r.Field(Of Integer)("Age")
Thing is, it's verbose, and ugly, and intellisense doesn't help you out with writing Age or knowing it's an Int. LINQ in VB is bad enough for verbosity without pouring gas on that fire. If you're working with datatables of a known structure, it's a lot nicer if you make strongly typed ones:
Add a new file of type DataSet to your project
Open it so the design surface appears. In the properties grid call it something reasonable, such as AccountsDataSet
Right click, Add Table, call it Invoices
Right click the empty table, Add Column, call it User
Then use it like:
Dim dt As New AccountsDataSet.InvoicesDataTable
Populate it like:
dt.AddInvoicesRow("John Smith", ... other properties here)
Query it like:
dt.Select(Function(r) r.User).Distinct()
Much nicer than accessing column names by string, and having them be objects that need casting..
Consider the dataset generator as a way to quickly, visually, create poco classes with named, typed properties
Try this
Dim list As List(Of String) = InvoiceDT.Rows.
    Cast(Of DataRow)().
    Select(Function(r) r("User").ToString()).
    Distinct().
    ToList()
Here you cast the Rows collection to IEnumerable(Of DataRow); the rest is trivial.

Two queries related in UniObjects for .NET

Context
I have an interface in VB.NET that extracts data from UniVerse using UniObjects for .NET.
Problem
From the COB file I need to get all keys where the FEC.COB field is equal to a specific date and the field SEC is equal to 04.
An expert in the UniVerse database told me that I can run the following queries:
SELECT COB WITH FEC.COB > "31/10/2013"
SELECT.ID 1 2 04
But I don't know how can I do that with UniObjects library. Can anyone help me?
I don't use UniObjects, as my shop normally gets data out of UniVerse via ODBC. Also my VB is bad, so I don't have much metacode for you, but the basic idea would be to do something like this.
1.) Create a UV Session. Hopefully you have that much worked out as I can be of next to no help there.
2.) Once the session is established Execute your query by doing something like this
session.Command.Text = "SELECT COB WITH FEC.COB > '31/10/2013'"
session.Command.Exec
(I converted your double quotes to single quotes and Universe won't mind).
3.) If you just need the IDs, you can get them by iterating through the select list that your query returns. A command line query will always return to list 0 unless you specify otherwise in your UV query. In most cases your results will be in session.SelectList(0)
' VB.NET has no Set keyword; assign the object reference directly.
Dim objSelect As Object = session.SelectList(0)
4.) It looks like the SelectList object has a ReadList method which returns a Dynamic Array Object, which you should be able to iterate through using normal array looping. Additionally you can use a while loop and next to do what you need to do.
Dim someObject As Object
someObject = objSelect.Next ' Get first ID
Do While Not objSelect.LastRecordRead
    ' Do something here with someObject. Maybe ToString it or something
    someObject = objSelect.Next ' Get next ID
Loop
Hope that is somewhat helpful.

VB.Net - Looking for Design Pattern - For Each Instead of For with index

Using VB.Net, I'm looking for best practices in dealing with the following idiom:
For i as Integer = 0 To o1.Count - 1
o1(i).x = o2(i).x
Next
What I really want is something VB.Net doesn't offer -- additional and simultaneous iterators on For Each statements.
For Each m1 As c1 In o1, m2 As c2 In o2
m1.x = m2.x
Next
I'm interested in both Linq and non-Linq recommendations, comments about Copy method design, and comparisons to C# or other languages.
I'm sure your actual real world situation is more complicated than the example you have shown, but I thought it was worth mentioning that, if all you are doing is trying to copy an entire list of items from one list to another, you can simply do something like this:
Dim list2 = New List(Of Object)(list1)
I'm not entirely sure what you are trying to do with this. We could probably help you better if you elaborate a little more on your problem.
If both lists are of the same Type you could append one list to the other and then use a for each on just one list.
I submitted a request to extend the For Each statement to implement additional and simultaneous iterators.
http://visualstudio.uservoice.com/forums/121579-visual-studio/suggestions/2923571-allow-additional-clauses-on-for-each-similar-to-c-
I'm still interested in other solutions but until then, this is the answer.
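One LINQ option that none of the replies above mention is Enumerable.Zip, which pairs up two sequences position by position. A minimal sketch, assuming both element types are classes (so the assignment reaches the original objects) and using a hypothetical Item class in place of c1/c2 from the question:

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq

Module ZipSketch
    ' Hypothetical element type standing in for c1/c2 from the question.
    Public Class Item
        Public Property X As Integer
    End Class

    ' Copies X from each source element to the target at the same position.
    ' Zip stops at the end of the shorter sequence.
    Public Sub CopyX(targets As IEnumerable(Of Item), sources As IEnumerable(Of Item))
        For Each pair In targets.Zip(sources, Function(t, s) New With {.Target = t, .Source = s})
            pair.Target.X = pair.Source.X
        Next
    End Sub

    Sub Main()
        Dim o1 As New List(Of Item) From {New Item With {.X = 0}, New Item With {.X = 0}}
        Dim o2 As New List(Of Item) From {New Item With {.X = 10}, New Item With {.X = 20}}
        CopyX(o1, o2)
        Console.WriteLine(o1(0).X & "," & o1(1).X)
    End Sub
End Module
```

This is not the multi-iterator For Each syntax asked for, only an approximation of it. With structures instead of classes the assignment would change a copy, so the indexed For loop remains the safer choice there.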

Do you choose Linq over Forloops?

Given a datatable containing two columns like this:
Private Function CreateDataTable() As DataTable
Dim customerTable As New DataTable("Customers")
customerTable.Columns.Add(New DataColumn("Id", GetType(System.Int32)))
customerTable.Columns.Add(New DataColumn("Name", GetType(System.String)))
Dim row1 = customerTable.NewRow()
row1.Item("Id") = 1
row1.Item("Name") = "Customer 1"
customerTable.Rows.Add(row1)
Dim row2 = customerTable.NewRow()
row2.Item("Id") = 2
row2.Item("Name") = "Customer 2"
customerTable.Rows.Add(row2)
Dim row3 = customerTable.NewRow()
row3.Item("Id") = 3
row3.Item("Name") = "Customer 3"
customerTable.Rows.Add(row3)
Return customerTable
End Function
Would you use this snippet to retrieve a List(Of Integer) containing all Id's:
Dim table = CreateDataTable()
Dim list1 As New List(Of Integer)
For i As Integer = 0 To table.Rows.Count - 1
list1.Add(CType(table.Rows(i)("Id"), Integer))
Next
Or rather this one:
Dim list2 = (From r In table.AsEnumerable _
Select r.Field(Of Integer)("Id")).ToList()
This is not a question about whether to type cast the Id column to Integer by using .Field(Of Integer), CType, CInt, DirectCast or whatever but generally about whether or not you choose Linq over forloops as the subject implies.
For those who are interested: I ran some iterations with both versions which resulted in the following performance graph:
graph http://dnlmpq.blu.livefilestore.com/y1pOeqhqQ5neNRMs8YpLRlb_l8IS_sQYswJkg17q8i1K3SjTjgsE4O97Re_idshf2BxhpGdgHTD2aWNKjyVKWrQmB0J1FffQoWh/analysis.png?psid=1
The vertical axis shows the milliseconds it took the code to convert the rows' ids into a generic list with the number of rows shown on the horizontal axis. The blue line resulted from the imperative approach (forloop), the red line from the declarative code (linq).
Whatever way you generally choose: Why do you go that way and not the other?
Whenever possible I favor the declarative way of programming over the imperative one. A declarative query states what you want and leaves the execution strategy to the library, so that strategy can change without rewriting the logic. For example, on a machine with multiple cores the query could be parallelized (today you have to opt in explicitly, e.g. with PLINQ's AsParallel), whereas an imperative for loop basically locks out that possibility. Today there may be no big difference, but I think that in the future more and more extensions like PLINQ will appear, allowing better optimization.
I avoid linq unless it helps readability a lot, because it completely destroys edit-and-continue.
When they fix that, I will probably start using it more, because I do like the syntax a lot for some things.
For almost everything I've done I've come to the conclusion that LINQ is optimized enough. If I handcrafted a for loop it would have better performance, but in the grand scheme of things we are usually talking milliseconds. Since I rarely have a situation where those milliseconds will make any kind of impact, I find it's much more important to have readable code with clear intentions. I would much rather have a call that is 50ms slower than have someone come along and break it altogether!
Resharper has a cool feature that will flag and convert loops into Linq expressions. I will flip it to the Linq version and see if that hurts or helps readability. If the Linq expression more clearly communicates the intent of the code, I will go with that. If the Linq expression is unreadable, I will flip back to the foreach version.
Most of the performance issues don't really compare with readability for me.
Clarity trumps cleverness.
In the above example, I would go with the Linq version since it clearly explains the intent and also locks out people accidentally adding side effects in the loop.
I recently found myself wondering whether I've been totally spoiled by LINQ. Yes, I now use it all the time to pick all sorts of things out of all sorts of collections.
I started to, but found out in some cases, I saved time by using this approach:
for (var i = 0, len = list.Count; i < len; i++) { .. }
Not necessarily in all cases, but some. Most extension methods use the foreach approach of querying.
I try to follow these rules:
Whenever I'm just querying (filtering, projecting, ...) collections, use LINQ.
As soon as I'm actually 'doing' something with the result (i.e, introduce side effects), I'll use a for loop.
So in this example, I'll use LINQ.
Also, I always try to split up the 'query definition' from the 'query evaluation':
Dim query = From r In table.AsEnumerable()
Select r.Field(Of Integer)("Id")
Dim result = query.ToList()
This makes it clear when that (in this case in-memory) query will be evaluated.
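A small sketch of why that split matters, using a plain List(Of Integer) instead of the DataTable so it runs on its own (the module and function names are made up). The query is only a description until ToList is called, so an item added after the definition still shows up in the result:

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq

Module DeferredQuerySketch
    ' Defines a query, changes the source, and only then evaluates the query.
    Public Function IdsSeenAtEvaluation() As List(Of Integer)
        Dim ids As New List(Of Integer) From {1, 2, 3}

        ' Query definition: nothing is enumerated yet.
        Dim query = From id In ids
                    Where id > 1
                    Select id

        ' The source changes after the query was defined.
        ids.Add(4)

        ' Query evaluation: the list is read now, including the late addition.
        Return query.ToList()
    End Function

    Sub Main()
        Console.WriteLine(String.Join(",", IdsSeenAtEvaluation()))
        ' prints 2,3,4
    End Sub
End Module
```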

sorting and getting uniques

I have a string that looks like this:
"apples,fish,oranges,bananas,fish"
I want to be able to sort this list and get only the unique values. How do I do it in VB.NET? Please provide code.
A lot of your questions are quite basic, so rather than providing the code I'm going to provide the thought process and let you learn from implementing it.
Firstly, you have a string that contains multiple items separated by commas, so you're going to need to split the string at the commas to get a list. You can use String.Split for that.
You can then use some of the extension methods for IEnumerable<T> to filter and order the list. The ones to look at are Enumerable.Distinct and Enumerable.OrderBy. You can either write these as normal methods, or use Linq syntax.
If you need to get it back into a comma-separated string, then you'll need to re-join the strings using the String.Join method. Note that before .NET 4 this needs an array, so Enumerable.ToArray will be useful in conjunction; from .NET 4 on, String.Join also accepts an IEnumerable(Of String) directly.
You can do it using LINQ, like this:
Dim input = "apples,fish,oranges,bananas,fish"
Dim strings = input.Split(","c).Distinct().OrderBy(Function(s) s)
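Putting the split, distinct, sort and re-join steps together might look like the sketch below. The module and function names are made up, and OrderBy with no comparer uses the default string comparison, which may or may not be the ordering you want.

```vbnet
Imports System
Imports System.Linq

Module UniqueSortedSketch
    ' Splits on commas, removes duplicates, sorts, and joins back into one string.
    Public Function SortedUnique(input As String) As String
        Dim items = input.Split(","c).
                          Distinct().
                          OrderBy(Function(s) s)
        ' ToArray keeps this working on frameworks where Join only takes an array.
        Return String.Join(",", items.ToArray())
    End Function

    Sub Main()
        Console.WriteLine(SortedUnique("apples,fish,oranges,bananas,fish"))
        ' prints apples,bananas,fish,oranges
    End Sub
End Module
```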
I'm not a VB.NET programmer, but I can give you a suggestion:
Split the string into an array
Create a second array
Cycle through the first array, adding any value that is not in the second.
Upon completion, your second array will have only unique values.
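The manual steps above could be sketched in VB.NET roughly as follows. A List(Of String) is used for the "second array" because arrays do not grow, the final Sort call is an addition to cover the sorting half of the question, and all names are made up for illustration.

```vbnet
Imports System
Imports System.Collections.Generic

Module ManualUniqueSketch
    ' Collects each value once, in first-seen order, then sorts the result.
    Public Function UniqueSorted(input As String) As List(Of String)
        Dim first As String() = input.Split(","c)
        Dim second As New List(Of String)

        For Each item As String In first
            ' Only add values that are not already in the second collection.
            If Not second.Contains(item) Then
                second.Add(item)
            End If
        Next

        second.Sort(StringComparer.Ordinal)
        Return second
    End Function

    Sub Main()
        Console.WriteLine(String.Join(",", UniqueSorted("apples,fish,oranges,bananas,fish")))
        ' prints apples,bananas,fish,oranges
    End Sub
End Module
```

Contains scans the list on every call, so for large inputs a HashSet(Of String) would likely be a better fit than this direct translation.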