VB LINQ - Take one random row from each group - vb.net

I'm trying to get one random row from each group of rows. I'm trying to use LINQ, but I'm not sure if that's the right approach. I'd like a Dictionary of Key/Name pairs.
My table is as such:
AnswerGroup  AnswerKey  AnswerName
-----------  ---------  -------------------------
1            1          Yes
1            2          No
2            1          Never
2            2          A little bit
2            3          Mostly
2            4          Always
3            1          White
3            2          African American
3            3          Hispanic
3            4          Asian or Pacific Islander
For each AnswerGroup I need to choose a random Key/Name pair.
I have the beginnings of a LINQ query, but frankly I'm lost because I don't understand LINQ grouping or how to apply Enumerable.Take(1) to each group.
Dim answerGroup As String = "AnswerGroup"
Dim answerKey As String = "AnswerKey"
Dim answerName As String = "AnswerName"
Dim query = _
From rows As DataRow In surveyAnswerKeys.Rows _
Order By rows(answerGroup) _
Group By questionSortKey = rows(answerGroup) _
Into questionGroups = Group
Any help would be appreciated. Thanks!
Edit:
I can expand the following query in the debugger and see an in-memory query that produces a series of DataRows. When I hover over questionGroups, it says it's an IEnumerable(Of Object). When I try to materialize that query into a list or DataTable, I get this error:
"Public member 'ToTable' on type 'WhereSelectEnumerableIterator(Of VB$AnonymousType_0(Of Object,IEnumerable(Of Object)),Object)' not found."
Dim answerGroup As String = "QuestionSortKey"
Dim answerNo As String = "AnswerNo"
Dim surveyDefinitionNo As String = "Pk_SurveyDefinitionNo"
Dim query = _
From rows In surveyAnswerKeys.Rows _
Where rows(answerNo) IsNot Nothing _
Order By Guid.NewGuid() _
Group By questionSortKey = rows(answerGroup) _
Into questionGroups = Group _
Select questionGroups.First()
Dim randomAnswerNos As DataTable = query.ToTable

One quick way to shuffle items is to sort by a "random" value; Guid.NewGuid() usually works well enough for this. Then just pull the first row from each group:
Dim query = _
From rows As DataRow In surveyAnswerKeys.Rows _
Order By Guid.NewGuid() _
Group By questionSortKey = rows(answerGroup) _
Into questionGroups = Group _
Select questionGroups.First()
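For reference, here is a self-contained sketch of the same Guid-shuffle query against a small in-memory DataTable built from part of the sample data. The module and function names (RandomAnswerPerGroup, BuildSampleTable, PickRandomAnswers) are hypothetical. Because AnswerKey values repeat from group to group, the sketch keys the resulting Dictionary by AnswerGroup and stores the chosen Key/Name pair as the value. On the error in the question's edit: there is no ToTable method on a query result; the DataSetExtensions method CopyToDataTable() does that job, but only for a sequence of DataRow, which is why the range variable should be declared As DataRow (in the edit it is untyped, so the elements come back as Object). On .NET Framework, Field(Of T) and CopyToDataTable need a reference to System.Data.DataSetExtensions.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Data
Imports System.Linq

Module RandomAnswerPerGroup

    ' Hypothetical stand-in for surveyAnswerKeys, using part of the question's data.
    Function BuildSampleTable() As DataTable
        Dim table As New DataTable("SurveyAnswerKeys")
        table.Columns.Add("AnswerGroup", GetType(Integer))
        table.Columns.Add("AnswerKey", GetType(Integer))
        table.Columns.Add("AnswerName", GetType(String))
        table.Rows.Add(1, 1, "Yes")
        table.Rows.Add(1, 2, "No")
        table.Rows.Add(2, 1, "Never")
        table.Rows.Add(2, 2, "A little bit")
        table.Rows.Add(2, 3, "Mostly")
        table.Rows.Add(2, 4, "Always")
        Return table
    End Function

    ' Shuffles by sorting on a new Guid, groups by AnswerGroup and keeps the
    ' first row of each group. Grouping preserves the shuffled order inside each
    ' group, so "first" is a random row. The result is keyed by AnswerGroup
    ' because AnswerKey values repeat from group to group.
    Function PickRandomAnswers(table As DataTable) As Dictionary(Of Integer, KeyValuePair(Of Integer, String))
        Dim picks = From row As DataRow In table.Rows
                    Order By Guid.NewGuid()
                    Group By groupId = row.Field(Of Integer)("AnswerGroup")
                    Into answers = Group
                    Select groupId, chosen = answers.First()

        Return picks.ToDictionary(
            Function(p) p.groupId,
            Function(p) New KeyValuePair(Of Integer, String)(
                p.chosen.Field(Of Integer)("AnswerKey"),
                p.chosen.Field(Of String)("AnswerName")))
    End Function

    Sub Main()
        For Each entry In PickRandomAnswers(BuildSampleTable())
            Console.WriteLine($"Group {entry.Key}: {entry.Value.Key} = {entry.Value.Value}")
        Next
    End Sub

End Module
```

If a DataTable is preferred over a Dictionary, ending the typed query with Select questionGroups.First() and calling query.CopyToDataTable() on it returns the chosen rows with the original columns.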

LINQ has no built-in operator for pulling a random row. I'd suggest you store all of the rows in a table and loop through each group manually. Then, based on the number of rows in each group, generate a random index and pick that row. Use LINQ only to run the query and retrieve the results.
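A minimal sketch of that loop-based approach, assuming the rows are already in a DataTable with the question's column names: it collects the rows of each AnswerGroup into a list and then uses Random.Next(count) to pick an index in each list. The names RandomIndexPerGroup, BuildSampleTable and PickRandomRowPerGroup are illustrative, not from the answer.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Data

Module RandomIndexPerGroup

    ' One shared generator for all picks.
    Private ReadOnly rng As New Random()

    ' Hypothetical sample data with the question's column names.
    Function BuildSampleTable() As DataTable
        Dim table As New DataTable()
        table.Columns.Add("AnswerGroup", GetType(Integer))
        table.Columns.Add("AnswerKey", GetType(Integer))
        table.Columns.Add("AnswerName", GetType(String))
        table.Rows.Add(1, 1, "Yes")
        table.Rows.Add(1, 2, "No")
        table.Rows.Add(3, 1, "White")
        table.Rows.Add(3, 2, "African American")
        table.Rows.Add(3, 3, "Hispanic")
        table.Rows.Add(3, 4, "Asian or Pacific Islander")
        Return table
    End Function

    ' Collects the rows of each AnswerGroup in a list, then picks one row per
    ' list using a random index from 0 to Count - 1.
    Function PickRandomRowPerGroup(table As DataTable) As Dictionary(Of Integer, DataRow)
        Dim buckets As New Dictionary(Of Integer, List(Of DataRow))
        For Each row As DataRow In table.Rows
            Dim groupId As Integer = CInt(row("AnswerGroup"))
            If Not buckets.ContainsKey(groupId) Then
                buckets(groupId) = New List(Of DataRow)
            End If
            buckets(groupId).Add(row)
        Next

        Dim result As New Dictionary(Of Integer, DataRow)
        For Each bucket In buckets
            ' Random.Next(maxValue) returns a value >= 0 and < maxValue.
            result(bucket.Key) = bucket.Value(rng.Next(bucket.Value.Count))
        Next
        Return result
    End Function

    Sub Main()
        For Each pick In PickRandomRowPerGroup(BuildSampleTable())
            Console.WriteLine("Group " & pick.Key & ": " &
                              CStr(pick.Value("AnswerKey")) & " = " & CStr(pick.Value("AnswerName")))
        Next
    End Sub

End Module
```

The single shared Random instance is deliberate: on .NET Framework, New Random() seeds from the system tick count, so creating one per group in a tight loop can yield identical seeds and therefore identical picks.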

Related

Get SUM of Unique Value in a collection

I'm looking to get the SUM of unique values in an Excel worksheet using VB.NET.
I am using a Collection.
So far my code gets me the Distinct Values, however I'm stumped on the Count side of things.
I feel like I'm close, but something is missing...
My data could look like:
Apple
Apple
Peach
Cherry
I'm looking for Results to be:
Apple 2
Peach 1
Cherry 1
This is where I am:
MySub:
Dim c, r As Range
Dim i As Integer
Dim dc As New Collection
Dim s As String
For Each c In r
dc.Add(c.Value, c.Value)
Next c
For i = 1 To dc.Count
s = dc.Item(i)
Next i
This produces my distinct list of values, but I'm not seeing how to obtain the SUM of those values.
Thanks for any pointers.
Assuming this really is VB.Net, you could use a Dictionary(Of String, Integer) like this:
Dim counts As New Dictionary(Of String, Integer)
For Each c In r
    ' Convert the cell value once so the dictionary key is always a String
    Dim key As String = CStr(c.Value)
    If Not counts.ContainsKey(key) Then
        counts.Add(key, 0)
    End If
    counts(key) += 1
Next c
For Each pair In counts
    Debug.Print(pair.Key & " " & pair.Value)
Next
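The same counting idea as a self-contained function, assuming the cell values have already been read into a String array (a stand-in for the Excel Range, so the sketch runs without Excel). CountOccurrences is a hypothetical name; it uses TryGetValue, which does a single lookup instead of the ContainsKey/Item pair.

```vbnet
Imports System
Imports System.Collections.Generic

Module CountDistinctValues

    ' Counts how often each distinct value occurs.
    Function CountOccurrences(values As IEnumerable(Of String)) As Dictionary(Of String, Integer)
        Dim counts As New Dictionary(Of String, Integer)
        For Each value In values
            Dim current As Integer = 0
            ' TryGetValue sets current to 0 when the value has not been seen yet.
            counts.TryGetValue(value, current)
            counts(value) = current + 1
        Next
        Return counts
    End Function

    Sub Main()
        ' Stand-in for the cell values read from the worksheet range.
        Dim cells As String() = {"Apple", "Apple", "Peach", "Cherry"}
        For Each pair In CountOccurrences(cells)
            Console.WriteLine(pair.Key & " " & pair.Value)
        Next
    End Sub

End Module
```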

SQL statement that selects array values

I am working on a Visual Basic project with an .mdb (Access) database connected to it. I want to add a SELECT query that finds the rows whose values are in an array that I pass in from my program.
I have tried to write a statement like this:
SELECT kodu, adi_soyadi, sectigi_ders_say
FROM ogrenciler
WHERE kodu IN ?
But it does not work. In my code I have an array, and I want to find the rows in the "ogrenciler" table where "kodu" is in my array.
Well, you could send that array to a temp table in Access, but that would prevent more than one user from using the software at the same time (or you could add a user name column to the temp table). However, if the array of choices is small, say 50 items at most, you can build the SQL string instead.
For example:
Dim MySQL As String = "SELECT * from tblHotels WHERE ID IN("
Dim IdList(5) As Integer
Dim i As Integer
For i = 1 To 5
IdList(i) = i
Next
Dim MyList As String = ""
For i = 1 To 5
If MyList <> "" Then MyList = MyList & ","
MyList = MyList & IdList(i)
Next
MySQL = MySQL & MyList & ")"
Using MyCon2 As New OleDbConnection(My.Settings.OLESQL)
Dim da As New OleDbDataAdapter(MySQL, MyCon2)
Dim rstDat As New DataTable()
da.Fill(rstDat)
For i = 0 To rstDat.Rows.Count - 1
Debug.Print(rstDat.Rows(i).Item("HotelName"))
Next ' etc etc. etc.
End Using
So you can use the SQL format of:
SELECT * FROM tblHotels where ID IN (1,2,3)
And thus build up the "list". The downside to this approach is that the SQL string has a length limit, so if your list is larger than, say, 50 or so items, you have to adopt a different approach.
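A different technique, named plainly since it is not the answerer's method: instead of concatenating the values into the SQL text, generate one ? placeholder per array element and bind each value as an OleDb parameter. OleDb parameters are positional, so the order of the Parameters.AddWithValue calls must match the order of the placeholders. The table and column names come from the question; the Integer type of kodu and the names BuildPlaceholders and CreateInCommand are assumptions. On .NET Framework System.Data.OleDb is built in; on .NET Core/.NET 5+ it comes from the System.Data.OleDb NuGet package and runs on Windows only.

```vbnet
Imports System
Imports System.Data
Imports System.Data.OleDb
Imports System.Linq

Module InListQuery

    ' Returns "(?, ?, ?)" for count = 3.
    Function BuildPlaceholders(count As Integer) As String
        If count < 1 Then Throw New ArgumentException("At least one value is required.", NameOf(count))
        Return "(" & String.Join(", ", Enumerable.Repeat("?", count)) & ")"
    End Function

    ' Builds the SELECT with one positional parameter per array element
    ' instead of pasting the values themselves into the SQL text.
    Function CreateInCommand(connection As OleDbConnection, codes As Integer()) As OleDbCommand
        Dim sql As String = "SELECT kodu, adi_soyadi, sectigi_ders_say FROM ogrenciler WHERE kodu IN " &
                            BuildPlaceholders(codes.Length)
        Dim command As New OleDbCommand(sql, connection)
        For i As Integer = 0 To codes.Length - 1
            ' OleDb matches ? markers by position; the name only labels the parameter.
            command.Parameters.AddWithValue("@p" & i, codes(i))
        Next
        Return command
    End Function

    Sub Main()
        Console.WriteLine(BuildPlaceholders(3))
    End Sub

End Module
```

The returned command can be passed to New OleDbDataAdapter(command) and used to fill a DataTable, in the same way as the string-built query above.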

UDF with Intersect runs slow

So I am creating a function to replace some manual INDEX/MATCH formulas. The function works; my problem is speed. I have a PivotTable with 6 columns and approx. 200,000 rows, and I want the function to find a value in it. (I don't use the pivot functions, meaning this is just a table in pivot format.) I found that this runs faster than looking the value up in a regular data table. Both would be imported from a SQL table.
A single instance of this formula calculates instantly, but performance degrades when I have a few hundred of them on the same sheet.
So any ideas on how to speed this up?
Function getnum2(ByVal Comp As String, Period As String, Measure As String, Optional BU As String, _
Optional Country As String, Optional Table As String, Optional TableSheet As String) As Double
Dim pTable As PivotTable, wTableSheet As Worksheet
If BU = "" Then
BU = "Group"
End If
If Country = "" Then
Country = "Total"
End If
If TableSheet = "" Then
Set wTableSheet = Worksheets("Data")
Else
Set wTableSheet = Worksheets(TableSheet)
End If
If Table = "" Then
Set pTable = wTableSheet.PivotTables("PivotTable1")
Else
Set pTable = wTableSheet.PivotTables(Table)
End If
'Find match
If Intersect(pTable.PivotFields("Bank").PivotItems(Comp).DataRange.EntireRow, _
pTable.PivotFields("Date").PivotItems(Period).DataRange.EntireRow, _
pTable.PivotFields("Business Unit").PivotItems(BU).DataRange.EntireRow, _
pTable.PivotFields("Country").PivotItems(Country).DataRange.EntireRow, _
pTable.PivotFields("Name").PivotItems(Measure).DataRange) Is Nothing Then
getnum2 = "No match"
ElseIf Intersect(pTable.PivotFields("Bank").PivotItems(Comp).DataRange.EntireRow, _
pTable.PivotFields("Date").PivotItems(Period).DataRange.EntireRow, _
pTable.PivotFields("Business Unit").PivotItems(BU).DataRange.EntireRow, _
pTable.PivotFields("Country").PivotItems(Country).DataRange.EntireRow, _
pTable.PivotFields("Name").PivotItems(Measure).DataRange).Count > 1 Then
getnum2 = "More than 1 match"
Else
getnum2 = Intersect(pTable.PivotFields("Bank").PivotItems(Comp).DataRange.EntireRow, _
pTable.PivotFields("Date").PivotItems(Period).DataRange.EntireRow, _
pTable.PivotFields("Business Unit").PivotItems(BU).DataRange.EntireRow, _
pTable.PivotFields("Country").PivotItems(Country).DataRange.EntireRow, _
pTable.PivotFields("Name").PivotItems(Measure).DataRange)
End If
End Function
Rather than evaluating the same Intersect expression up to three times, compute it once and store the result in a variable:
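' Declared As Variant (not As Double) so the function can also return the
' text "No match" or "More than 1 match" without a type-mismatch error
Function getnum2(ByVal Comp As String, Period As String, Measure As String, Optional BU As String, _
Optional Country As String, Optional Table As String, Optional TableSheet As String) As Variant
Dim pTable As PivotTable, wTableSheet As Worksheet
Dim rgResult As Range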
If BU = "" Then
BU = "Group"
End If
If Country = "" Then
Country = "Total"
End If
If TableSheet = "" Then
Set wTableSheet = Worksheets("Data")
Else
Set wTableSheet = Worksheets(TableSheet)
End If
If Table = "" Then
Set pTable = wTableSheet.PivotTables("PivotTable1")
Else
Set pTable = wTableSheet.PivotTables(Table)
End If
'Find match
Set rgResult = Intersect(pTable.PivotFields("Bank").PivotItems(Comp).DataRange.EntireRow, _
pTable.PivotFields("Date").PivotItems(Period).DataRange.EntireRow, _
pTable.PivotFields("Business Unit").PivotItems(BU).DataRange.EntireRow, _
pTable.PivotFields("Country").PivotItems(Country).DataRange.EntireRow, _
pTable.PivotFields("Name").PivotItems(Measure).DataRange)
If rgResult Is Nothing Then
getnum2 = "No match"
ElseIf rgResult.Count > 1 Then
getnum2 = "More than 1 match"
Else
getnum2 = rgResult.Value
End If
End Function
One very simple way to achieve this is by using two PivotTables.
In PivotTable 1, put all fields but the numeric one you want to return in the ROWS area, and put the field that you want to return in the VALUES area with aggregation set to COUNT.
In PivotTable 2, put all fields but the numeric one you want to return in the ROWS area, and put the field that you want to return in the VALUES area with aggregation set to SUM, MIN or MAX (it doesn't matter which).
Then you can use a parameterized GETPIVOTDATA function to check PivotTable 1 to see whether the thing you're looking up is unique (i.e. COUNT = 1) and, if so, look up the SUM/MIN/MAX of that item in PivotTable 2. Given that the item is unique, SUM/MIN/MAX operates on only one number and so returns it unchanged.
Here's how that looks, using simplified data:
I've added conditional formatting to the two Pivots to highlight multiple occurrences, where we want to return the text 'Multiple Items'. As you can see, the formula populating column 6 of the Lookup table returns only unique items, as per your requirements.
Here's the formula, using Table notation as my Lookup range has been turned into an Excel Table:
=IF(GETPIVOTDATA("6",$A$3,"1",[#1],"2",[#2],"3",[#3],"4",[#4],"5",[#5])=1,GETPIVOTDATA("6",$H$3,"1",[#1],"2",[#2],"3",[#3],"4",[#4],"5",[#5]),"Multiple Items")
If I randomise the input cells in the Lookup table, you can see what happens when some items aren't in the PivotTable:
This approach works because the field you want to return is a numeric one, meaning you can add it to the VALUES pane of the Pivot. But you could still use this to return strings, by adding a unique ID to the source data, such as the row number, and putting that in the VALUES field, then retrieving it with the double GETPIVOTDATA lookup and using it to retrieve the associated string in the source data.
Another approach is to simply concatenate your columns into a primary key using a suitable delimiter, such as the pipe character |, and then use that as your lookup key. If you did a binary search on sorted data, this would be lightning fast. (I discuss this at http://dailydoseofexcel.com/archives/2015/04/23/how-much-faster-is-the-double-vlookup-trick/ ) The downside is that you wouldn't be warned in the event that there were multiple items. But it would be possible to do a second lookup, using the match position returned by the first, to see whether you get another result, and if so return "Multiple Items". This would be super-fast.
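The concatenated-key idea can also be sketched in VB.NET with a hash lookup (a Dictionary) rather than the binary search mentioned above. This swaps in a different lookup technique but keeps the same "No match" / "Multiple Items" outcomes. The sample rows, module and function names are hypothetical, and the sketch assumes no field value contains the pipe delimiter.

```vbnet
Imports System
Imports System.Collections.Generic

Module ConcatKeyLookup

    ' Joins the lookup columns with a pipe, e.g. "BankA|2015-12|Group|Total|Revenue".
    Function MakeKey(ParamArray parts As String()) As String
        Return String.Join("|", parts)
    End Function

    ' Maps each concatenated key to its row index, or to -1 when the key
    ' occurs more than once.
    Function BuildIndex(rows As IList(Of String())) As Dictionary(Of String, Integer)
        Dim index As New Dictionary(Of String, Integer)(StringComparer.Ordinal)
        For i As Integer = 0 To rows.Count - 1
            Dim key As String = MakeKey(rows(i))
            If index.ContainsKey(key) Then
                index(key) = -1
            Else
                index(key) = i
            End If
        Next
        Return index
    End Function

    ' Returns the row index (as text) for a unique key, or a message for
    ' missing and duplicated keys.
    Function Lookup(index As Dictionary(Of String, Integer), key As String) As String
        Dim position As Integer = 0
        If Not index.TryGetValue(key, position) Then Return "No match"
        If position = -1 Then Return "Multiple Items"
        Return position.ToString()
    End Function

    Sub Main()
        ' Hypothetical rows: Bank, Date, Business Unit, Country, Name.
        Dim rows As New List(Of String()) From {
            New String() {"BankA", "2015-12", "Group", "Total", "Revenue"},
            New String() {"BankA", "2015-12", "Group", "Total", "Costs"},
            New String() {"BankA", "2015-12", "Group", "Total", "Costs"}
        }
        Dim index = BuildIndex(rows)
        Console.WriteLine(Lookup(index, MakeKey("BankA", "2015-12", "Group", "Total", "Revenue")))  ' 0
        Console.WriteLine(Lookup(index, MakeKey("BankA", "2015-12", "Group", "Total", "Costs")))    ' Multiple Items
        Console.WriteLine(Lookup(index, MakeKey("BankB", "2015-12", "Group", "Total", "Revenue")))  ' No match
    End Sub

End Module
```

Building the index costs one pass over the data; after that each lookup is a single hash probe, which is the property that matters when hundreds of lookups share one table.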
Here's the fastest way to do this: using Binary Match on a sorted lookup table.
On the left I have 5 columns x 1048575 rows of random numbers between 1 and 10. These have been concatenated in column G to make a non-unique key, and then sorted ascending on that key.
(Because the concatenated key is text, it gets sorted alphabetically from left to right, which is why 1|1|1|1|10 appears between 1|1|1|1|1 and 1|1|1|1|2.)
I gave the data in Column G the named range of Concat to simplify the formula. My lookup formula in J2 returns the row number of the lookup item if and only if that item is unique to the dataset. The formula is:
=IF(OR( AND(INDEX(Concat,MATCH(I2,Concat,1))=I2, MATCH(I2,Concat,1)=1), AND(INDEX(Concat,MATCH(I2,Concat,1))=I2,INDEX(Concat,MATCH(I2,Concat,1)-1)<>I2)),MATCH(I2,Concat,1),NA())
This executes in 0.01 milliseconds for one instance, for a lookup table of 1048576 rows. My double GETPIVOTDATA approach above took 6 milliseconds. So there you have it: a complex formula that gives a 600 times efficiency boost.
I can explain the formula later if needed, but note that some of the complexity is due to the edge case where a unique item appears in row 1. If I leave out that edge case, the formula is as follows:
=IF( AND(INDEX(Concat,MATCH(I3,Concat,1))=I3,INDEX(Concat,MATCH(I3,Concat,1)-1)<>I3),MATCH(I3,Concat,1),NA())
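To make the formula's logic explicit, here is a VB.NET sketch of the same check: an approximate match (position of the last element less than or equal to the key, as MATCH with match_type 1 returns on ascending data) found by binary search, followed by the "previous element differs" test, including the first-row edge case. It assumes ordinal string comparison; Excel's MATCH uses its own case-insensitive text ordering, so whatever comparison the keys are sorted with must also be the one used to search. The names are illustrative.

```vbnet
Imports System

Module SortedKeyMatch

    ' Mirrors MATCH(key, Concat, 1) on ascending data: returns the 0-based
    ' position of the last element <= key, or -1 if every element is > key.
    Function MatchLastLessOrEqual(sortedKeys As String(), key As String) As Integer
        Dim low As Integer = 0
        Dim high As Integer = sortedKeys.Length - 1
        Dim found As Integer = -1
        While low <= high
            Dim middle As Integer = low + (high - low) \ 2
            If String.CompareOrdinal(sortedKeys(middle), key) <= 0 Then
                found = middle      ' candidate; keep looking to the right
                low = middle + 1
            Else
                high = middle - 1
            End If
        End While
        Return found
    End Function

    ' Returns the position only when key occurs exactly once: the match must
    ' equal key, and the element before it (if there is one) must differ.
    ' Returns -1 otherwise, playing the role of NA() in the formula.
    Function UniquePosition(sortedKeys As String(), key As String) As Integer
        Dim pos As Integer = MatchLastLessOrEqual(sortedKeys, key)
        If pos < 0 OrElse sortedKeys(pos) <> key Then Return -1
        If pos > 0 AndAlso sortedKeys(pos - 1) = key Then Return -1
        Return pos
    End Function

    Sub Main()
        Dim keys As String() = {"1|1|1|1|2", "1|1|1|1|1", "1|1|1|1|10", "1|1|1|1|2"}
        ' Sort with the same ordinal comparison that the search uses.
        Array.Sort(keys, StringComparer.Ordinal)
        ' keys is now 1|1|1|1|1, 1|1|1|1|10, 1|1|1|1|2, 1|1|1|1|2
        Console.WriteLine(UniquePosition(keys, "1|1|1|1|1"))   ' 0 (unique, first row)
        Console.WriteLine(UniquePosition(keys, "1|1|1|1|10"))  ' 1
        Console.WriteLine(UniquePosition(keys, "1|1|1|1|2"))   ' -1 (appears twice)
    End Sub

End Module
```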

How to count string occurrences in a List(Of String)

I am looking to dynamically count how many times each item occurs in a list. I can do it, as shown below, if I specify the value I am looking for, but what I really want is to iterate through my list, count occurrences, and total them. My current code is below:
Dim itemlist As New List(Of String)
itemlist.add("VALUE1")
itemlist.add("VALUE2")
itemlist.add("VALUE3")
Dim count As Integer = 0
For Each value In itemlist
If value.Equals("VALUE1") Then count += 1
Next
Msgbox(count.tostring)
So my goal is, instead of searching for a specific value, to let the app total them up and display the count of each value to the user, similar to COUNTIF in Excel. I can't find much on this that doesn't use LINQ. Thanks
You can do this very easily with LINQ:
Msgbox(itemlist.Where(Function(value) value = "VALUE1").Count)
To count duplicates, once again it's easy with LINQ:
Dim itemlist As New List(Of String)
itemlist.Add("RED")
itemlist.Add("RED")
itemlist.Add("RED")
itemlist.Add("GREEN")
Dim groups = itemlist.GroupBy(Function(value) value)
For Each grp In groups
    ' grp.Key is the distinct value; grp.Count() is how many times it occurs
    Console.WriteLine(grp.Key & " - " & grp.Count())
Next
Output:
RED - 3
GREEN - 1

VBA - Access 03 - Iterating through a list box, with an if statement to evaluate

So I have one list box with values like DeptA, DeptB, DeptC and DeptD. I have a method that automatically populates this list box with whichever of these values apply. In other words, if a value appears in this list box, I want the resulting logic to set the corresponding Boolean field in the table to "Yes".
So to accomplish this I am trying to use this example of iteration to cycle through the list box first of all, and it works great:
dim i as integer
dim myval as string
For i = 0 To Me.lstResults.ListCount - 1
myVal = lstResults.itemdata(i)
Next i
If I Debug.Print myval, I get the list of data items that I want from the list box. So now I am trying to evaluate that list so that I can build an UPDATE SQL statement that updates the table the way I need.
So, I know this is a mistake, but this is what I tried (I'm giving it as an example so you can see what I am trying to get to here):
dim sql as string
dim i as integer
dim myval as string
dim db as database
sql = "UPDATE tblMain SET "
for i = 0 to me.lstResults.listcount - 1
myval = lstResults.itemdata(i)
If MyVal = "DeptA" Then
sql = sql & "DeptA = Yes"
ElseIF myval = "DeptB" Then
sql = sql & "DeptB = Yes"
ElseIf MyVal = "DeptC" Then
sql = sql & "DeptC = Yes"
ElseIf MyVal = "DeptD" Then
sql = sql & "DeptD = Yes"
End If
Next i
debug.print (sql)
sql = sql & ";"
set db= currentdb
db.execute(sql)
msgbox "Good Luck!"
So you can see why this is going to cause problems: the list box that these values (DeptA, DeptB, etc.) automatically populate is dynamic. There is rarely just one value in it, and the list of values changes per OrderID (the form I am using this on displays the information for one OrderID, a unique instance).
I am looking for something that will evaluate this list one item at a time (i.e. iterate through the list of values and look for "DeptA"; if it is found, add Yes to the SQL string, and if not, add No, then march on to the next iteration). Even though the list box populates dynamically, the possible values are fixed, meaning I know what could end up in it.
Thanks for any help,
Justin
I don't understand what you're trying to accomplish. However, I suspect your UPDATE statement needs a WHERE clause. ('WHERE OrderID = X', with X replaced by the OrderID of the row you're editing)
I suppose you could create a dictionary object with values initially set to False.
Dim dict As Object
Set dict = CreateObject("Scripting.Dictionary")
dict.Add "DeptA", False
dict.Add "DeptB", False
' .. etc. '
Then go through the items in your listbox, changing the dict value to True.
dict(myval) = True
Finally, build your UPDATE statement based on the dictionary values.
But that all seems like too much work to me. So now I'm wondering about your table structure. Is tblMain set up like this?
OrderID DeptA DeptB DeptC DeptD
------- ----- ----- ----- -----
127 True False False True
If so, consider a related table for the Dept information.
OrderID Which_Department
------- ----------------
127 DeptA
127 DeptD
The rule of thumb governing this is "columns are expensive; rows are cheap".
Edit: Seems to me you have two sets of items: SetA is all possible items; SetB is a subset of SetA. You want to produce a True for each item in SetB and a False for each SetA item which is not in SetB. Is that correct when you substitute dict (the dictionary object) for SetA and lstResults for SetB?
What I was trying to suggest is load dict with all the possible "DeptX" keys and assign them as False. Then iterate your lstResults and change each of those (in dict) to True. Afterward, build your SQL statement from dict.
Dim varKeys As Variant
Dim i As Integer
Dim strFragment As String
varKeys = dict.keys()
For i = LBound(varKeys) To UBound(varKeys)
strFragment = strFragment & ", " & varKeys(i) & " = " & dict(varKeys(i))
Next i
strFragment = Mid(strFragment, 3)
sql = sql & strFragment & " WHERECLAUSE" ' replace WHERECLAUSE with your filter, e.g. WHERE OrderID = X
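A VB.NET sketch of the same dictionary technique that can run outside Access: every possible department starts as False, the list box selections flip to True, and the SET list is built in the fixed order of the full department list. The OrderID filter stands in for the WHERECLAUSE placeholder, following the earlier suggestion of 'WHERE OrderID = X'; BuildUpdateSql and the sample values are hypothetical.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq

Module DeptUpdateBuilder

    ' Starts every possible department at False, flips the ones present in the
    ' list box to True, then builds the SET list in the order of allDepts so
    ' departments missing from the list box are written as False.
    Function BuildUpdateSql(allDepts As String(), selected As IEnumerable(Of String),
                            orderId As Integer) As String
        Dim flags As New Dictionary(Of String, Boolean)
        For Each dept In allDepts
            flags(dept) = False
        Next
        For Each dept In selected
            ' Ignore anything that is not a known department column.
            If flags.ContainsKey(dept) Then flags(dept) = True
        Next

        Dim assignments = allDepts.Select(Function(d) d & " = " & If(flags(d), "True", "False"))
        Return "UPDATE tblMain SET " & String.Join(", ", assignments) &
               " WHERE OrderID = " & orderId.ToString() & ";"
    End Function

    Sub Main()
        ' Hypothetical list box contents for OrderID 127.
        Dim sql = BuildUpdateSql({"DeptA", "DeptB", "DeptC", "DeptD"}, {"DeptA", "DeptD"}, 127)
        Console.WriteLine(sql)
        ' UPDATE tblMain SET DeptA = True, DeptB = False, DeptC = False, DeptD = True WHERE OrderID = 127;
    End Sub

End Module
```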