Using Linq for Object Dataset Processing - vb.net

I have a collection (IList(Of Sample)) of the following class:
Public Class Sample
Public sampleNum As String
Public volume As Integer
Public initial As Single
Public final As Single
End Class
This collection is filled from a regex that gets passed over a file.
I want to use Linq to produce one record for each unique sampleNum, using the following conditions.
For each sampleNum:
Pick the record with the highest volume where the final is greater than one
If the sample has multiple records for this volume then pick the one with the highest final
If the previous steps leave us with no records, pick the record with the highest final, ignoring volume
I am extremely new to Linq and just can't get my head around this. For now I have solved this using For Each loops and temporary lists, but I am interested in how this would be handled using pure Linq.
Sample Data:
samplenum | volume | initial | final
1 | 50 | 8.47 | 6.87
1 | 300 | 8.93 | 3.15
2 | 5 | 8.28 | 6.48
2 | 10 | 8.18 | 5.63
2 | 5 | 8.33 | 6.63
2 | 10 | 8.26 | 5.58
3 | 1 | 8.31 | 0.75
3 | 5 | 8.19 | 0.03
4 | 50 | 8.28 | 6.55
4 | 300 | 7.19 | 0.03

This should hopefully solve your problems:
Dim source As IEnumerable(Of Sample)
' Get the data...
Dim processed = source _
.GroupBy(Function(s) s.sampleNum) _
.Select(Function(s) Process(s))
Dim array = processed.ToArray()
Console.ReadLine()
The Process function:
Private Function Process(ByVal sequence As IEnumerable(Of Sample)) As Sample
Dim filtered = (
From s In sequence
Where s.final > 1
Order By
s.volume Descending,
s.final Descending
)
' If we don't have any elements after the filtering,
' return the one with the highest final.
' Otherwise, return the first element.
If Not filtered.Any() Then
Return (From s In sequence Order By s.final Descending).FirstOrDefault()
Else
Return filtered.First()
End If
End Function
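The filter-then-fall-back logic above can also be folded into a single ordering, which may be closer to the "pure Linq" the question asks for. The following is only a sketch under that assumption: PickPerSample and the PureLinqSketch module are hypothetical names, and the Sample class is repeated from the question so the block compiles on its own.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq

Public Class Sample
    Public sampleNum As String
    Public volume As Integer
    Public initial As Single
    Public final As Single
End Class

Module PureLinqSketch
    ' Hypothetical helper: returns one record per sampleNum.
    ' Ordering keys, in priority order:
    '   1. records with final > 1 come first (True sorts after False, hence Descending)
    '   2. among those, the highest volume (volume is ignored for the rest)
    '   3. the highest final
    Function PickPerSample(source As IEnumerable(Of Sample)) As List(Of Sample)
        Return source _
            .GroupBy(Function(s) s.sampleNum) _
            .Select(Function(g) g _
                .OrderByDescending(Function(s) s.final > 1) _
                .ThenByDescending(Function(s) If(s.final > 1, s.volume, 0)) _
                .ThenByDescending(Function(s) s.final) _
                .First()) _
            .ToList()
    End Function

    Sub Main()
        ' A subset of the sample data from the question
        Dim data As New List(Of Sample) From {
            New Sample With {.sampleNum = "1", .volume = 50, .initial = 8.47F, .final = 6.87F},
            New Sample With {.sampleNum = "1", .volume = 300, .initial = 8.93F, .final = 3.15F},
            New Sample With {.sampleNum = "3", .volume = 1, .initial = 8.31F, .final = 0.75F},
            New Sample With {.sampleNum = "3", .volume = 5, .initial = 8.19F, .final = 0.03F},
            New Sample With {.sampleNum = "4", .volume = 50, .initial = 8.28F, .final = 6.55F},
            New Sample With {.sampleNum = "4", .volume = 300, .initial = 7.19F, .final = 0.03F}
        }
        For Each s In PickPerSample(data)
            Console.WriteLine(s.sampleNum & ": volume " & s.volume.ToString())
        Next
    End Sub
End Module
```

With this subset, the sketch should pick volume 300 for sample 1, volume 1 for sample 3 (no record has a final above one, so the highest final wins) and volume 50 for sample 4. Whether one ordering reads better than an explicit fallback branch is largely a matter of taste.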

Try this. I haven't tested it, but it should do what you want. There is probably a better way of doing this:
' For each distinct sample number in the list
For Each el In (From p In lst Select p.sampleNum).Distinct()
    ' Capturing the loop variable in a lambda can cause odd results in some cases,
    ' so always copy it into another variable first
    Dim var As String = el
    ' Records for this sample number whose final is greater than one
    Dim res As IEnumerable(Of Sample) = lst.Where(Function(p) p.final > 1 AndAlso p.sampleNum = var)
    Dim sam As Sample ' the result
    If res.Any() Then
        ' We have results, so take the highest volume, breaking ties by the highest final
        sam = res.OrderByDescending(Function(p) p.volume).ThenByDescending(Function(p) p.final).First()
    Else
        ' We have no results, so fall back to the full list for this sample number
        ' and take the highest final, ignoring volume
        sam = lst.Where(Function(p) p.sampleNum = var).OrderByDescending(Function(p) p.final).First()
    End If
    '
    ' do whatever with the sample here
    '
Next

Related

How to eval a string containing column names?

As I cannot attach conditional formatting to a table, I need an abstract function to check whether a set of records, or all records, contain errors, and to show these errors in forms and/or reports.
In the 'standard' approach I would have to define the rule for a field every time I use that field in a control or report. That means repeating the same thing an annoying number of times, not to mention introducing errors and creating a maintenance nightmare.
So my idea is to define all the checks for all the tables and their rows in a CheckError table, like the following fragment related to the table 'Persone':
TableName | FieldName | TestNumber | TestCode | TestMessage | ErrorType
Persone | CAP | 4 | len([CAP]) = 0 or isnull([cap]) | CAP mancante | warning
Persone | Codice Fiscale | 1 | len([Codice Fiscale]) < 16 | Codice fiscale nullo o mancante | error
Persone | Data di nascita | 2 | (now() - [Data di nascita]) < 18 * 365 | Minorenne | info
Persone | mail | 5 | len([mail]) = 0 or isnull([mail]) | email mancante | warning
Persone | mail | 6 | (len([mail]) = 0 or isnull([mail])) | richiesto l'invio dei referti via e-mail, | error
| | | and [modalità ritiro referti] = ""e-mail"" | ma l'indirizzo e-mail è mancante |
Persone | Via | 3 | len([Via]) = 0 or isnull([Via]) | Indirizzo mancante | warning
Now, in each form or report that uses the table Persone, I want to set an 'onload' property to a function:
' to validate all fields in all rows and set the appropriate bg and fg color
Private Sub Form_Open(Cancel As Integer)
Call validazione.validazione(Form, "Persone", 0)
End Sub
' to validate all fields in the row identified by ID and set the appropriate bg and fg color
Private Sub Codice_Fiscale_LostFocus()
Call validazione.validazione(Form, "Persone", ID)
End Sub
So the function validazione, at a certain point, has exactly one row of the table Persone and the set of expressions described in the column [TestCode] above.
Now I need to logically evaluate each test string against the table row, to obtain a true or a false.
If true, I'll set the fg and bg color of the field as normal
If false, I'll set the fg and bg color as per error, info or warning, as defined by the column [ErrorType] above.
All of the above is easy, ready, and running, except for the evaluation step:
How can I evaluate the teststring against the table row, to obtain a result?
Thank you
Paolo

Excel/VBA/Conditional Formatting: Dictionary of Dictionaries

I've got an Excel workbook that obtains data from an MS SQL database. One of the sheets is used to check the data against requirements and to highlight faults. In order to do that, I've got a requirements sheet where the requirement is in a named range; after updating the data I copy the conditional formatting of the table header to all data rows. That works pretty nicely so far. The problem comes when I have more than one set of requirements:
An (admittedly silly) example could be car racing, where requirements may exist for the driver's license and min/max horsepower. When looking at the example, please imagine there are a few thousand rows and 71 columns at present...
+-----+--------+----------------+------------+---------+
| Car | Race | RequirementSet | Horsepower | License |
+-----+--------+----------------+------------+---------+
| 1 | Monaco | 2 | 200 | A |
+-----+--------+----------------+------------+---------+
| 2 | Monaco | 2 | 400 | B |
+-----+--------+----------------+------------+---------+
| 3 | Japan | 3 | 200 | C |
+-----+--------+----------------+------------+---------+
| 4 | Japan | 3 | 300 | A |
+-----+--------+----------------+------------+---------+
| 5 | Japan | 3 | 350 | B |
+-----+--------+----------------+------------+---------+
| 6 | Mexico | 1 | 200 | A |
+-----+--------+----------------+------------+---------+
The individual data now needs to be checked against the requirements set in another sheet:
+-------------+---------------+---------------+---------+
| Requirement | MinHorsepower | MaxHorsepower | License |
+-------------+---------------+---------------+---------+
| 1 | 200 | 250 | A |
+-------------+---------------+---------------+---------+
| 2 | 250 | 500 | B |
+-------------+---------------+---------------+---------+
| 3 | 250 | 400 | A |
+-------------+---------------+---------------+---------+
In order to relate back to my present situation, I am only looking at either the Monaco, Japan or Mexico Race, and there is only 1 record in the requirements sheet, where the value in e.g. Cell B2 is always the MinHorsepower and the value in C2 is always the MaxHorsepower. So these cells are a named range that I can access in my data sheet.
Now however I would like to obtain all races at once, and refer conditional formatting formulas to the particular requirement.
Focussing on "Horsepower" in Monaco (requirement set 2), I can now find out that the min Horsepower is 250 and the max is 500 - so I will colour that column for car 1 as red and for car 2 as green.
The formula is copied programmatically from the header row (the first conditional format rule is: if ROW(D1) = 1 then do nothing).
I can't decide what the best approach to the problem is. Ideally, the formula is readable, something like `AND(D2 >= MinHorsepower; D2 <= MaxHorsepower)`. I cannot imagine it being maintainable if I had to combine VLOOKUP with INDIRECT and MATCH to find the column header in the requirements sheet for that particular requirement, especially when combining criteria such as the min and max in the horsepower example above.
I am wondering if I should read the requirements table into a dictionary or something in VBA, and then use a function like
Public Function check(requirementId As Long, requirement As String)
which then in Excel I could use like =D2 >= check(c2, "MinHorsepower")
Playing around with this a little bit it appears to be pretty slow as opposed to the previous system where I could only have one requirement. It would be fantastic if you could help me out with a fresh approach to this problem. I'll update this question as I go along; I'm not sure if I managed to illustrate the example really well but the actual data wouldn't mean anything to you.
In any case, thanks for hanging in until here!
Edit 29 October 2016
I have found a solution as basis for mine. Using the following code I can add my whole requirements table to a dictionary, and access the requirement.
Using a class clsRangeToDictionary (based on Tim Williams's clsMatrix):
Option Explicit
Private m_array As Variant
Private dictRows As Object
Private dictColumns As Object
Public Sub Init(vArray As Variant)
Dim i As Long
Set dictRows = CreateObject("Scripting.Dictionary")
Set dictColumns = CreateObject("Scripting.Dictionary")
'add the row keys and positions. Skip the first row as it contains the column key
For i = LBound(vArray, 1) + 1 To UBound(vArray, 1)
dictRows.Add vArray(i, 1), i
Next i
'add the column keys and positions, skipping the first column
For i = LBound(vArray, 2) + 1 To UBound(vArray, 2)
dictColumns.Add vArray(1, i), i
Next i
' store the array for future use
m_array = vArray
End Sub
Public Function GetValue(rowKey, colKey) As Variant
If dictRows.Exists(rowKey) And dictColumns.Exists(colKey) Then
GetValue = m_array(dictRows(rowKey), dictColumns(colKey))
Else
Err.Raise 1000, "clsRangeToDictionary:GetValue", "The requested row key " & CStr(rowKey) & " or column Key " & CStr(colKey) & " does not exist"
End If
End Function
' return a zero-based array of RowKeys
Public Function RowKeys() As Variant
RowKeys = dictRows.Keys
End Function
' return a zero-based array of ColumnKeys
Public Function ColumnKeys() As Variant
ColumnKeys = dictColumns.Keys
End Function
I can now read the whole RequirementSet table into a dictionary and write a helper to obtain the particular requirement, roughly like so:
myDictionaryObject.GetValue(table1's RequirementSet, "MinHorsepower")
If someone could help me figure out how to put this into an answer giving the credit due to Tim Williams that'd be great.

VB.NET - Combine rows in DataTable based on shared value

I'm trying to combine rows in a DataTable based on their shared ID. The table data looks something like this:
Member | ID | Assistant | Content
---------------------------------------------
16 | 1234 | jkaufman | 1/1/2015 - stuff1
16 | 1234 | jkaufman | 1/2/2015 - stuff2
16 | 4321 | mhatfield | 1/3/2015 - stuff3
16 | 4321 | mhatfield | 1/4/2015 - stuff4
16 | 4321 | mhatfield | 1/5/2015 - stuff5
16 | 5678 | psmith | 1/6/2015 - stuff6
I want to combine rows based on matching IDs. There are two steps I could use some clarification on. The first is merging the rows. The second is combining the Content columns so that the contents aren't lost. For the example above, here's what I want:
Member | ID | Assistant | Content
-------------------------------------------------------------------------------------------
16 | 1234 | jkaufman | 1/1/2015 - stuff1 \r\n 1/2/2015 - stuff2
16 | 4321 | mhatfield | 1/3/2015 - stuff3 \r\n 1/4/2015 - stuff4 \r\n 1/5/2015 - stuff5
16 | 5678 | psmith | 1/6/2015 - stuff6
My eventual goal is to copy the DataTable to an Excel spreadsheet, so I'm not sure if \r\n is the correct newline character, but that's the least of my concerns at this point.
Here's my code right now (EDIT: updated to current code):
Dim tmpRow As DataRow
dtFinal = dt.Clone()
Dim i As Integer = 0
While i < dt.Rows.Count
tmpRow = dtFinal.NewRow()
tmpRow.ItemArray = dt.Rows(i).ItemArray.Clone()
Dim j As Integer = i + 1
While j <= dt.Rows.Count
If j = dt.Rows.Count Then 'if we've iterated off the end of the dataset
i = j
Exit While
End If
If dt.Rows(i).Item("ID") = dt.Rows(j).Item("ID") Then 'if we've found another entry for this id
'append change to tmpRow
tmpRow.Item("Content") = tmpRow.Item("Content").ToString & Environment.NewLine & dt.Rows(j).Item("Content").ToString
Else 'if we've run out of entries to combine
i = j
Exit While
End If
j += 1
End While
'add our combined row to the final result
dtFinal.ImportRow(tmpRow)
End While
When I export the final table to Excel, the spreadsheet is blank so I'm definitely doing something wrong.
Any help would be fantastic. Thanks!

I see various problems with your approach (in both versions, although the second one seems better). That is why I have written a complete working example instead, to get my ideas across clearly.
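' Build an empty table with the same columns as the source
Dim dtFinal As DataTable = New DataTable
For Each col As DataColumn In dt.Columns
    dtFinal.Columns.Add(col.ColumnName, col.DataType)
Next
Dim oldRow As Integer = -1 ' index into the source table
Dim row As Integer = -1    ' index into the result table
While oldRow < dt.Rows.Count - 1
    dtFinal.Rows.Add()
    row = row + 1
    oldRow = oldRow + 1
    Dim curID As String = dt.Rows(oldRow)(1).ToString()
    Dim lastCol As String = ""
    ' Collect the Content of every consecutive row sharing this ID
    While (oldRow < dt.Rows.Count AndAlso dt.Rows(oldRow)(1).ToString() = curID)
        ' Separate entries with a newline, without leaving one at the end
        If lastCol <> "" Then lastCol = lastCol & Environment.NewLine
        lastCol = lastCol & dt.Rows(oldRow)(3).ToString()
        oldRow = oldRow + 1
    End While
    ' Step back to the last row that belonged to this ID
    oldRow = oldRow - 1
    ' Copy Member, ID and Assistant, then the combined Content
    For i As Integer = 0 To 2
        dtFinal.Rows(row)(i) = dt.Rows(oldRow)(i)
    Next
    dtFinal.Rows(row)(3) = lastCol
End While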
Note that aiming for the most "elegant" solution, or for maximum use of built-in functionality, is not always the best way to approach a problem. For the one you describe, I think it is better to go step by step, and to reduce code size or improve elegance only once a properly working version is in place. That is the kind of code I have tried to write here: something simple that delivers what is expected. I believe this is exactly the functionality you want, but bear in mind that it is deliberately simplistic and meant only to help you understand the point.

I find the VB syntax clunky compared to how it would be in C#, but you may prefer this Linq solution with grouping:
Dim merge = (From rw In dt.Rows.OfType(Of DataRow)()
             Group rw By fld1 = rw(0), fld2 = rw(1), fld3 = rw(2) Into Group).
            Select(Function(x) New With {.Member = x.fld1,
                                         .ID = x.fld2,
                                         .Assistant = x.fld3,
                                         .Content = String.Join(Environment.NewLine,
                                                                x.Group.Select(Function(y) y(3).ToString()))})
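Since the stated goal is to export a DataTable, it may help to see the grouping written so that the result is itself a DataTable rather than a sequence of anonymous objects. This is a minimal sketch, not a definitive implementation: MergeById and MergeRowsSketch are hypothetical names, and it assumes the columns are named Member, ID, Assistant and Content, in that order, as in the sample table.

```vbnet
Imports System
Imports System.Data
Imports System.Linq

Module MergeRowsSketch
    ' Hypothetical helper: returns a new table with one row per ID, where the
    ' Content values for that ID are joined with newlines.
    ' Assumes columns Member, ID, Assistant, Content in that order.
    Function MergeById(dt As DataTable) As DataTable
        ' Clone copies the schema (columns) but none of the rows
        Dim result As DataTable = dt.Clone()

        ' Group on the ID column; LINQ to Objects keeps groups in the order
        ' in which each key first appears
        Dim groups = dt.Rows.Cast(Of DataRow)().GroupBy(Function(r) r("ID").ToString())

        For Each g In groups
            Dim first As DataRow = g.First()
            Dim contents = g.Select(Function(r) r("Content").ToString())
            ' Rows.Add attaches the new row to the table immediately
            result.Rows.Add(first("Member"), first("ID"), first("Assistant"),
                            String.Join(Environment.NewLine, contents))
        Next

        Return result
    End Function
End Module
```

Called as dtFinal = MergeById(dt), this should give three rows for the sample data. Unlike the loop-based versions, it does not rely on rows with the same ID being adjacent. It may also be relevant that DataTable.ImportRow is generally reported to skip rows that are still detached (created with NewRow but never added to a table), which could be why the original code produced an empty result; Rows.Add avoids that situation.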

VBA Excel 2010: Inserting values from dictionary into new row

I'm having trouble with some VBA for Excel 2010. I have a list of names that have different serial numbers associated with them. The following code will look at a name in column A, look it up in the names dictionary for an array of serial numbers associated with this name, and print out each number in a new column.
Names Dictionary:
Names("Jane B") = [111112, 22222]
Output:
|Joe A | 11111
|Jane B | 111112| 22222 |
|Jim C | 11111 | 121212 | 1122112
Code:
Dim name, counter
For i = 2 To Worksheets("Contact").UsedRange.Rows.Count
name = Worksheets("Contact").Cells(i, 1)
counter = 0
If names.Exists(name) Then
For Each serial In names(name)
Worksheets("Contact").Cells(i, 2+counter).Value = serial
counter = counter + 1
Next serial
End If
Next i
So far, so good. But the output format isn't good for inputting into Access. Instead, I'd like to have the following format:
|Joe A | 11111
|Jane B | 111112
|Jane B | 22222
|Jim C | 11111
|Jim C | 121212
|Jim C | 1122112
Here's my code:
Dim name, counter
For i = 2 To Worksheets("Contact").UsedRange.Rows.Count
name= Worksheets("Contact").Cells(i, 1)
counter = 0
If names.Exists(name) Then
For Each serial In names(name)
Worksheets("Contact").Cells(i + counter, 2).Value = serial
Worksheets("Contact").Cells(i + counter, 1).Value = name
Worksheets("Contact").Cells(i + counter + 1, 1).EntireRow.Insert
counter = counter + 1
Next serial
End If
Next i
This is where I run into a problem. My output looks like this:
|Joe A | 11111
|Joe A | 1700
|Joe A | 1700
|Joe A | 1700
|Joe A | 1700
|Joe A | 1700
|Joe A | 1700
While the numbers are all made up, 1700 really is the value being output, although it doesn't relate to any serial number (???).
Can anyone spot what's off in my code?
Thank you all for your time and consideration.
With gratitude,
Zac

Try this: write the output to a new sheet (for example "NewContactSheet") instead of inserting rows into the current contact sheet.
Inserting rows into the sheet you are scanning means that the next row you read is the one you just inserted, so you end up processing and inserting it again and again.
Instead, scan the contact sheet one row at a time and look the name up in the dictionary exactly as you do now. Then, for each serial belonging to that name, write the name and the serial to cells 1 and 2 of the new sheet and increment the output row.
I have no dictionary to test with, so this is based on the original post saying "So far, so good"...
Sub SerialNameMover()
    Dim name As String
    Dim serial As Variant       ' For Each over an array needs a Variant
    Dim lastContactRow As Long  ' Long avoids overflow on large sheets
    Dim newSheet As String
    Dim nRow As Long            ' next free row on the new sheet
    Dim i As Long
    newSheet = "NewContactSheet"
    nRow = 2
    lastContactRow = Worksheets("Contact").UsedRange.Rows.Count
    For i = 2 To lastContactRow
        name = Sheets("Contact").Cells(i, 1)
        If Names.Exists(name) Then
            ' Write one output row per serial number for this name
            For Each serial In Names(name)
                Sheets(newSheet).Cells(nRow, 1) = name
                Sheets(newSheet).Cells(nRow, 2) = serial
                nRow = nRow + 1
            Next serial
        End If
    Next i
End Sub

Dynamically select a column from a generic list

I have a table that is 200 columns wide and need to return the data of a specific row and column but I won't know the column until runtime. I can easily get the row I want into either a list, an individual strongly typed object, or an Array through LINQ but I can't for the life of me figure out how to find the column I need.
So For instance (on a smaller scale) my table looks like this
GrowerKey | day1 | day2 | day3 | day4 |
-----------------------------------------
3 | 1 | 3 | 2 | 2 |
4 | 6 | 1 | 9 | 1 |
5 | 8 | 8 | 2 | 4 |
and I can get the row I want with something simple like this
Dim CleanRecord As List(Of Grower_Clean_Schedule) = (From key In eng.Grower_Clean_Schedules
Where key.Grower_Key = Grower_Key).ToList
How do I then return only the value of a specific column of that row (say, the value stored in "day2") when I won't know which column until runtime?

Something like this (starting with CleanRecord, which you defined in your question):
Dim matchingRow = CleanRecord.First()
' All public instance properties of the row type (requires Imports System.Reflection)
Dim props = matchingRow.GetType().GetProperties( _
    BindingFlags.Instance Or BindingFlags.Public)
' Pick the property whose name matches the column and read its value
Dim myReturnVal = (From prop In props _
    Where prop.Name = "day2" _
    Select prop.GetValue(matchingRow, Nothing)).FirstOrDefault()
Return myReturnVal
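When the column name arrives as a string at runtime, the same reflection idea can be reduced to a single GetProperty lookup. The sketch below is only an illustration: GrowerRow is a hypothetical stand-in for the generated Grower_Clean_Schedule class, and GetColumnValue is a made-up helper name.

```vbnet
Imports System
Imports System.Reflection

' Hypothetical stand-in for the generated Grower_Clean_Schedule class
Public Class GrowerRow
    Public Property Grower_Key As Integer
    Public Property day1 As Integer
    Public Property day2 As Integer
End Class

Module ColumnLookupSketch
    ' Returns the value of the named public property,
    ' or Nothing if the object has no such property.
    Function GetColumnValue(row As Object, columnName As String) As Object
        ' GetProperty(String) looks up a public property by its exact, case-sensitive name
        Dim prop As PropertyInfo = row.GetType().GetProperty(columnName)
        If prop Is Nothing Then Return Nothing
        Return prop.GetValue(row, Nothing)
    End Function

    Sub Main()
        Dim row As New GrowerRow With {.Grower_Key = 4, .day1 = 6, .day2 = 1}
        Dim columnName As String = "day2" ' only known at runtime in the real case
        Console.WriteLine(GetColumnValue(row, columnName)) ' prints 1
    End Sub
End Module
```

Reflection is comparatively slow, so if the lookup runs for many rows it may be worth fetching the PropertyInfo once and reusing it.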