Count unique values in CSV file - vb.net

I am trying to write a VB program (.NET Framework) that reads a CSV file and counts how many unique values a column contains. For example, the CSV would look like this:
Source Barcode Name,Source ID,Destination Barcode Name,Destination ID
BARCODE_0006,A,Barcode_0001,F
BARCODE_0002,B,Barcode_0001,G
BARCODE_0003,C,Barcode_0001,H
BARCODE_0004,D,Barcode_0001,I
BARCODE_0005,E,Barcode_0001,J
The script would return 5 in this example. Note, the number of unique values in column 1 can change.

Start with this:
Imports Microsoft.VisualBasic.FileIO ' At the top of the file; needed for TextFieldParser

Public Iterator Function ReadCSV(filePath As String, Optional delimiter As String = ",") As IEnumerable(Of String())
    Using parser As New TextFieldParser(filePath)
        parser.Delimiters = New String() {delimiter}
        While Not parser.EndOfData
            Yield parser.ReadFields()
        End While
    End Using
End Function
And then use it to solve your problem like this:
Dim count As Integer =
ReadCSV("MyFilePath.csv"). ' Open the file
Skip(1). ' Skip the header
Select(Function(r) r(0)). ' Just the first column in each row
Distinct(). ' Unique entries only
Count() ' How many there are
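If LINQ is not wanted, the same count can be sketched with a HashSet that is filled while the file is read. This is only an illustration: `CountUnique` and its `columnIndex` parameter are hypothetical names, not part of the answer above, and the comparison is assumed to be case-sensitive.

```vbnet
Imports System
Imports System.Collections.Generic
Imports Microsoft.VisualBasic.FileIO

Module UniqueCounter
    ' Hypothetical helper: counts the distinct values in one column of a CSV file.
    Function CountUnique(filePath As String, columnIndex As Integer) As Integer
        ' Ordinal comparison is an assumption; it treats "A" and "a" as different values.
        Dim seen As New HashSet(Of String)(StringComparer.Ordinal)
        Using parser As New TextFieldParser(filePath)
            parser.TextFieldType = FieldType.Delimited
            parser.SetDelimiters(",")
            ' Skip the header row, if the file has any rows at all.
            If Not parser.EndOfData Then parser.ReadFields()
            While Not parser.EndOfData
                Dim fields As String() = parser.ReadFields()
                ' Ignore rows that are too short to contain the requested column.
                If columnIndex < fields.Length Then seen.Add(fields(columnIndex))
            End While
        End Using
        Return seen.Count
    End Function
End Module
```

With the sample file from the question, `CountUnique("MyFilePath.csv", 0)` would return 5, because the HashSet keeps each barcode only once.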

Related

Refined list sorting by substring integer after alphabetical sorting

I have some information in a list (called listLines). Each line below is in a List(Of String).
1|This is just a header
3|This is just a footer
2|3456789|0000000|12312312313|BLUE|1|35.00
2|7891230|0000000|45645645655|BLUE|1|22.00
2|7891230|0000000|45645645658|RED|2|13.00
2|3456789|0000000|12312312316|RED|2|45.00
2|3456789|0000000|12312312317|YELLOW|5|-9.00
2|3456789|0000000|12312312315|ORANGE|3|15.00
2|7891230|0000000|45645645659|YELLOW|5|32.00
2|3456789|0000000|12312312314|GREEN|4|-20.00
2|7891230|0000000|45645645656|GREEN|4|39.00
2|7891230|0000000|45645645657|ORANGE|3|-18.50
I'm doing a listLines.sort() on the list to sort it alphabetically. Below is what I get after the .sort().
1|This is just a header
2|3456789|0000000|12312312313|BLUE|1|35.00
2|3456789|0000000|12312312314|GREEN|4|-20.00
2|3456789|0000000|12312312315|ORANGE|3|15.00
2|3456789|0000000|12312312316|RED|2|45.00
2|3456789|0000000|12312312317|YELLOW|5|-9.00
2|7891230|0000000|45645645655|BLUE|1|22.00
2|7891230|0000000|45645645656|GREEN|4|39.00
2|7891230|0000000|45645645657|ORANGE|3|-18.50
2|7891230|0000000|45645645658|RED|2|13.00
2|7891230|0000000|45645645659|YELLOW|5|32.00
3|This is just a footer
With that said, I need to output this information to a file. I'm able to do this ok. I still have a problem though. There is a sequence number in the above data at position 5 just after the listed colors (RED, BLUE, ETC..) that you can see. It's just before the last value which is a decimal type.
I need to further sort this list, keeping it in alphabetical order since position 2 is an account number and I want to keep the account numbers grouped together. I just want them to be resorted in sequential order based on the sequence number.
I was looking at another thread trying to figure out how I can do this. I found a piece of code like listLines.OrderBy(Function(q) q.Substring(35)).ToArray. I think this would probably help me if this was a fixed length file, it isn't however. I was thinking I can do some kind of .split() to get the 5th piece of information and sort it but then it's going to unalphabetize and mix the lines back up because I don't know how to specify to still keep it alphabetical.
Right now I'm outputting my alphabetical list like below so I can format it with commas and double quotes.
For Each listLine As String In listLines
    strPosition = Split(listLine, "|")
    ' Start at 1 to skip the first field (index 0).
    Dim i As Integer = 1
    Dim iBound As Integer = UBound(strPosition)
    Do While (i <= iBound)
        strOutputText = strOutputText & Chr(34) & strPosition(i) & Chr(34) & ","
        i += 1
    Loop
Next
My main question is how do I re-sort after .sort() to then get each account (position1) in sequential order (position 5)? OR EVEN BETTER, how can I do both at the same time?
The List(Of T) class has an overload of the Sort method that takes a Comparison(Of T) delegate. I would suggest that you use that. It allows you to write a method or lambda expression that will take two items and compare them any way you want. In this case, you could do that like this:
Dim items = New List(Of String) From {"1|This Is just a header",
"3|This Is just a footer",
"2|3456789|0000000|12312312313|BLUE|1|35.00",
"2|7891230|0000000|45645645655|BLUE|1|22.00",
"2|7891230|0000000|45645645658|RED|2|13.00",
"2|3456789|0000000|12312312316|RED|2|45.00",
"2|3456789|0000000|12312312317|YELLOW|5|-9.00",
"2|3456789|0000000|12312312315|ORANGE|3|15.00",
"2|7891230|0000000|45645645659|YELLOW|5|32.00",
"2|3456789|0000000|12312312314|GREEN|4|-20.00",
"2|7891230|0000000|45645645656|GREEN|4|39.00",
"2|7891230|0000000|45645645657|ORANGE|3|-18.50"}
items.Sort(Function(x, y)
Dim xParts = x.Split("|"c)
Dim yParts = y.Split("|"c)
'Compare by the first column first.
Dim result = xParts(0).CompareTo(yParts(0))
If result = 0 Then
'Compare by the second column next.
result = xParts(1).CompareTo(yParts(1))
End If
If result = 0 Then
'Compare by the sixth column last.
result = xParts(5).CompareTo(yParts(5))
End If
Return result
End Function)
For Each item In items
Console.WriteLine(item)
Next
If you prefer a named method then do this:
Private Function CompareItems(x As String, y As String) As Integer
Dim xParts = x.Split("|"c)
Dim yParts = y.Split("|"c)
'Compare by the first column first.
Dim result = xParts(0).CompareTo(yParts(0))
If result = 0 Then
'Compare by the second column next.
result = xParts(1).CompareTo(yParts(1))
End If
If result = 0 Then
'Compare by the sixth column last.
result = xParts(5).CompareTo(yParts(5))
End If
Return result
End Function
and this:
items.Sort(AddressOf CompareItems)
Just note that this is rather inefficient because it splits both items on each comparison. That's not a big deal for a small list but, if there were a lot of items, it would be better to split each item once and then sort based on those results.
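The "split once, then sort" idea could be sketched with LINQ as below. This is an assumption-laden illustration rather than the answerer's code: `SortLines`, `FieldAt` and `SequenceOf` are hypothetical names. It also parses the sequence number as an Integer, since a string comparison would place "10" before "2", and it treats lines that lack a sixth field (the header and footer) as sequence 0 instead of indexing past the end of the array.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq

Module SortOnce
    ' Hypothetical helper: returns the field at index, or "" when the line has fewer fields.
    Private Function FieldAt(parts As String(), index As Integer) As String
        Return If(index < parts.Length, parts(index), "")
    End Function

    ' Hypothetical helper: the sixth field as a number; lines without one sort as 0.
    Private Function SequenceOf(parts As String()) As Integer
        Dim n As Integer
        Return If(Integer.TryParse(FieldAt(parts, 5), n), n, 0)
    End Function

    Function SortLines(lines As IEnumerable(Of String)) As List(Of String)
        ' Each line is split exactly once; the parts travel with the line through the sort.
        Return lines.
            Select(Function(line) New With {.Line = line, .Parts = line.Split("|"c)}).
            OrderBy(Function(e) FieldAt(e.Parts, 0), StringComparer.Ordinal).
            ThenBy(Function(e) FieldAt(e.Parts, 1), StringComparer.Ordinal).
            ThenBy(Function(e) SequenceOf(e.Parts)).
            Select(Function(e) e.Line).
            ToList()
    End Function
End Module
```

Calling `SortLines(items)` on the sample list keeps the header first and the footer last, groups the lines by account number, and orders each account's lines by sequence number 1 to 5.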

Can I use a method that returns a list of strings in SSRS report code as the headers in a tablix?

I have a table that needs to contain 50 columns, one for each half hour in the day (48, plus 2 for daylight saving time). So each column will be HH1, HH2, HH3... HH50.
I have written this piece of code in the report properties code section.
Function GetHH() As List(Of String)
Dim headers As List(Of String) = new List(Of String)
For index As Integer = 1 to 50
headers.Add("HH" & index)
Next
return headers
End Function
Is there a way to use the output of this function as the headers of my tablix? Or will I need to add the headers to some sort of dataset in the database and add it from there?
The column group functionality would be well suited for this. As you mentioned, you would need to write a SQL statement to return these values in a dataset. Then you can set your column group to group on these values. This way your table always gets the right number of columns and you don't have to add them manually.

Using LINQ to find updated rows in DataTable

I'm building an application in VB.NET where I am pushing data from one database to another. The source database is SQL Server and the target is MySQL.
What I am doing is first creating DataTables for each table in each database which I use to do a comparison. I've written the queries in such a way so that the source and target DataTables contain exactly the same columns and values to make the comparison easier.
This side of the application works fine. What I do next is find rows which do not exist in the target database by finding PKs which do not exist. I then insert these new rows into the target database with no problem.
The Problem
What I now need to do is find rows in each table that have been updated, i.e. are not identical to the corresponding rows in the target DataTable. I have tried using Except() as per the example below:
Public Function GetUpdates(ByVal DSDataSet As MSSQLQuery, ByVal AADataSet As MySQLQuery, Optional ByVal PK As String = Nothing) As List(Of DataRow)
' Determines records to be updated in the AADB and returns list of new Rows
' Param DSDataSet - MSSQLQuery Object for source table
' Param AADataSet - MySQLQuery Object for destination table
' Optional Param PK - String of name common columns to treat as PK
' Returns List(Of DataRow) containing rows to update in table
Dim orig = DSDataSet.GetDataset()
Dim origTable = orig.Tables(0).AsEnumerable()
Dim destination = AADataSet.GetDataset()
Dim destinationTable = destination.Tables(0).AsEnumerable()
' Get Records which are not in destination table
Dim ChangedRows = Nothing
If IsNothing(PK) Then
ChangedRows = destinationTable.AsEnumerable().Except(origTable.AsEnumerable(), DataRowComparer.Default)
End If
Dim List As New List(Of DataRow)
For Each addRow In ChangedRows
List.Add(addRow)
Next
Return List
End Function
The trouble is that it ends up simply returning the entire set of source rows.
How can I check for these changed rows? I could always hardcode queries to return what I want but this introduces problems because I need to make comparisons for 15 tables so it would be a complete mess.
Ideally I need a solution where it will take into account the variable number columns from the source tables for comparison against what is essentially an identical target table and simply compare the DataRows for equality.
There should be a corresponding row in the target tables for every source row since the addition of new rows is performed prior to this check for updated rows.
I am also open to using methods other than LINQ to achieve this.
Solution
In the end I implemented a custom comparer to use in the query, as shown below. It first checks whether the first column value (the PK in my case) matches; if it does, it then checks column by column that everything else matches.
Any discrepancy will set the flag value to FALSE which we return. If there aren't any issues then TRUE will be returned. In this case I used = to compare equality between values rather than Equals() since I'm not concerned about a strict equality.
The resulting set of DataRows is used to UPDATE the database using the first column value (PK) in the WHERE clause.
Imports System.Data
Class MyDataRowComparer
Inherits EqualityComparer(Of DataRow)
Public Overloads Overrides Function Equals(x As DataRow, y As DataRow) As Boolean
If x.Item(0).ToString().Equals(y.Item(0).ToString()) Then
' If PK matches then check column-wise.
Dim Flag As Boolean = True
For Counter As Integer = 0 To x.ItemArray.Count - 1
If Not x.Item(Counter) = y.Item(Counter) Then
Flag = False
End If
Next
Return Flag
Else
' Otherwise don't bother and just skip.
Return False
End If
End Function
...
End Class
class MyDataRowComparer : IEqualityComparer<DataRow>
{
public bool Equals(DataRow x, DataRow y)
{
return x["ColumnName"].Equals(y["ColumnName"]);
// Can add more columns to the Comparison
}
public int GetHashCode(DataRow obj)
{
return obj["ColumnName"].GetHashCode();
// Can add more columns to calculate HashCode
}
}
Now the Except statement will look like this (note that Except needs an instance of the comparer, not the type name):
ChangedRows = destinationTable.AsEnumerable().
Except(origTable.AsEnumerable(), New MyDataRowComparer())
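For reference, a VB comparer for Except needs both Equals and GetHashCode, because Except builds a hash set internally and only calls Equals on rows whose hash codes match. The sketch below is one possible shape, not the poster's elided code: `RowValueComparer` is a hypothetical name, the first column is assumed to be the primary key, and both tables are assumed to return the same .NET types for each column.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Data

' Hypothetical comparer: two rows are equal when every column value matches.
Public Class RowValueComparer
    Implements IEqualityComparer(Of DataRow)

    Public Overloads Function Equals(x As DataRow, y As DataRow) As Boolean _
        Implements IEqualityComparer(Of DataRow).Equals
        Dim columnCount As Integer = x.Table.Columns.Count
        If columnCount <> y.Table.Columns.Count Then Return False
        For i As Integer = 0 To columnCount - 1
            ' Object.Equals copes with DBNull, but a boxed Integer never equals a boxed Long.
            ' That is why identical column types on both sides are assumed here.
            If Not Object.Equals(x(i), y(i)) Then Return False
        Next
        Return True
    End Function

    Public Overloads Function GetHashCode(row As DataRow) As Integer _
        Implements IEqualityComparer(Of DataRow).GetHashCode
        ' Hash on the first column only (assumed to be the PK), so rows that share
        ' a key land in the same bucket and are then compared column by column.
        Return row(0).ToString().GetHashCode()
    End Function
End Class
```

Used as `origTable.Except(destinationTable, New RowValueComparer()).ToList()`, with the source table on the left, this would return the source rows that have no identical row in the target, which are the candidates for an UPDATE.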

Convert text data file to csv format

My text data file is like this:
{1000}xxx{1200}xxx{3000}xxxxxx{5000}
{1000}xx{1500}xxxxxx{4000}xx{6000}
{1000}xxxx{1600}xxx{3000}xxx{6000}
...
I need to convert this data file to a CSV or Excel file so that I can analyze it. I tried Excel and other conversion software, but none of it worked.
Can I use VB to do that? I have not used VB for a long time (over 10 years).
I am sorry, I did not make it clear.
The number in curly brackets is the field name. The records do not all have the same fields. The converted result should look like this:
(header line) 1000 1200 1500 1600 3000 4000 5000 6000
(record line) xxx xxx xxx xxx
. xxx xxx xxx xxx
. xxx xxx xxx xxx
We get a text data file like this every day (10 - 20 records). Although the data is not big, we would not need to re-type it into an Excel file if we could convert it to a CSV file. This would save us a lot of time.
You almost definitely could use a programming language (like VB) to make this change. I'm not sure you need to do that though.
If you are trying to write a program to convert the same type of file over and over, it might make sense to build a program in VB.net.
FYI, it's hard to advise you further without understanding more about what you need to do: for example, the size of the file, how often you will need to do it, what the target format will be, etc...
... but the answer I provided did answer the question you asked! ... and I am seeking rep points ;)
In light of your explanation of how the data is structured:
Imports System.IO
Imports System.Text
Imports System.Text.RegularExpressions
Module Module1
Class Cell
Property ColumnName As String
Property Value As String
' To help with debugging/general usage
Public Overrides Function ToString() As String
Return String.Format("Col: {0} Val: {1}", ColumnName, Value)
End Function
End Class
Dim table As New List(Of List(Of Cell))
Sub Main()
Dim src As String = "C:\temp\sampledata.txt"
Dim dest = "C:\temp\sampledata.csv"
Dim colNames As New List(Of String)
' This regex will look for zero or more characters ".*" surrounded by braces "\{ \}" and
' collect the zero or more characters in a group "( )". The "?" makes it non-greedy.
' The second capture group "( )" gets all the characters up to but not including
' the next "\{" (if it is present).
Dim cellSelector = New Regex("\{(.*?)\}([^\{]*)")
' Read in the cells and record the column names.
Using inFile = New StreamReader(src)
While Not inFile.EndOfStream
Dim line = inFile.ReadLine
Dim rowContent As New List(Of Cell)
For Each m As Match In cellSelector.Matches(line)
rowContent.Add(New Cell With {.ColumnName = m.Groups(1).Value, .Value = m.Groups(2).Value})
If Not colNames.Contains(m.Groups(1).Value) Then
colNames.Add(m.Groups(1).Value)
End If
Next
table.Add(rowContent.OrderBy(Function(c) c.ColumnName).ToList)
End While
End Using
colNames.Sort()
' add the header row of the column names
Dim sb As New StringBuilder(String.Join(",", colNames) & vbCrLf)
' output the data in csv format
For Each r In table
Dim col = 0
Dim cellNo = 0
While cellNo < r.Count AndAlso col < colNames.Count
' If this row has a cell with the appropriate column name then
' add the value to the output.
If r(cellNo).ColumnName = colNames(col) Then
sb.Append(r(cellNo).Value)
cellNo += 1
End If
' add a separator if is not the last item in the row
If col < colNames.Count - 1 Then
sb.Append(","c)
End If
col += 1
End While
sb.AppendLine()
Next
File.WriteAllText(dest, sb.ToString)
End Sub
End Module
From your sample data, the output is
1000,1200,1500,1600,3000,4000,5000,6000
xxx,xxx,,,xxxxxx,,,
xx,,xxxxxx,,,xx,,
xxxx,,,xxx,xxx,,,
I notice that none of the final columns have data in them. Is that just a copy-and-paste error or intentional?
EDIT: I use Option Infer On, which is why some of the type declarations are missing.
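A variation on the same idea stores each record in a Dictionary keyed by field name, so every output row gets one entry per column no matter which fields the record contains. This is a sketch under assumptions: `ToCsv` is a hypothetical helper, it reuses the regex from the answer above, and it does not quote values, so it assumes no value contains a comma.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module BraceRecords
    ' Hypothetical helper: turns "{name}value{name}value..." lines into CSV text.
    Function ToCsv(lines As IEnumerable(Of String)) As String
        Dim cellSelector As New Regex("\{(.*?)\}([^\{]*)")
        ' A SortedSet keeps the column names unique and in ordinal order.
        Dim colNames As New SortedSet(Of String)(StringComparer.Ordinal)
        Dim rows As New List(Of Dictionary(Of String, String))

        For Each line As String In lines
            Dim parsed As New Dictionary(Of String, String)
            For Each m As Match In cellSelector.Matches(line)
                parsed(m.Groups(1).Value) = m.Groups(2).Value
                colNames.Add(m.Groups(1).Value)
            Next
            rows.Add(parsed)
        Next

        Dim output As New List(Of String)
        output.Add(String.Join(",", colNames))
        For Each rowCells As Dictionary(Of String, String) In rows
            Dim fields As New List(Of String)
            For Each colName As String In colNames
                Dim value As String = Nothing
                rowCells.TryGetValue(colName, value)
                ' A missing field becomes an empty CSV cell.
                fields.Add(If(value, ""))
            Next
            output.Add(String.Join(",", fields))
        Next
        Return String.Join(Environment.NewLine, output)
    End Function
End Module
```

It could be called as `File.WriteAllText(dest, ToCsv(File.ReadLines(src)))`, with `src` and `dest` as in the program above. A record whose last field falls in an early column still produces a full-width row here.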

Creating an Array from a CSV text file and selecting certain parts of the array. in VB.NET

I've searched the internet over and over for my issue, but I haven't been able to find it or word my searches correctly...
My issue is that I have a comma-separated value file in .txt format. Simply put, it's a bunch of data delimited by commas, with double quotes ("") as the text qualifier.
For example:
"So and so","1234","Blah Blah", "Foo","Bar","","","",""
"foofoo","barbar","etc.."
Wherever there is a carriage return it signifies a new row, and every comma separates one column from the next.
My next step is to go into VB.NET and create an array from these values, with the commas serving as the delimiter, and somehow turn the array into a table whose layout matches the text file's format (I hope I'm explaining myself correctly :/ )
After that array has been created, I need to select only certain parts of that array and store the value into a variable for later use....
And that's where my trouble comes in... I can't seem to get the logic right for building the array and selecting certain values out of it.
If any
You might perhaps give a more detailed problem description, but I gather you're looking for something like this:
Sub Main()
Dim fileOne As String = "a1,b1,c1,d1,e1,f1,g1" + Environment.NewLine + _
"a2,b2,c2,d2,e2,f2,g2"
Dim table As New List(Of List(Of String))
' Process the file
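' Split on the whole newline string; passing a bare String can split on CR and LF separately.
For Each line As String In fileOne.Split({Environment.NewLine}, StringSplitOptions.None)
Dim row As New List(Of String)
For Each value In line.Split(","c)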
row.Add(value)
Next
table.Add(row)
Next
' Search the "table" using LINQ (for example)
Dim v = From c In table _
Where c(2) = "c1"
Console.WriteLine("Rows containing 'c1' in the 3rd column:")
For Each x As List(Of String) In v
Console.WriteLine(x(0)) ' printing the 1st column only
Next
' *** EDIT: added this after clarification
' Fetch value in row 2, column 3 (remember that lists are zero-indexed)
Console.WriteLine("Value of (2, 3): " + table(1)(2))
End Sub
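Since the file in the question uses double quotes as a text qualifier, a plain Split on commas would break any quoted value that itself contains a comma. A minimal sketch using TextFieldParser from Microsoft.VisualBasic.FileIO is shown below; `ParseQuoted` is a hypothetical helper name, and the input is assumed to be text that is already in memory.

```vbnet
Imports System.Collections.Generic
Imports System.IO
Imports Microsoft.VisualBasic.FileIO

Module QuotedCsv
    ' Hypothetical helper: parses quoted, comma-delimited text into rows of fields.
    Function ParseQuoted(text As String) As List(Of String())
        Dim rows As New List(Of String())
        Using parser As New TextFieldParser(New StringReader(text))
            parser.TextFieldType = FieldType.Delimited
            parser.SetDelimiters(",")
            ' Treat "..." as one field, even when it contains a comma.
            parser.HasFieldsEnclosedInQuotes = True
            While Not parser.EndOfData
                rows.Add(parser.ReadFields())
            End While
        End Using
        Return rows
    End Function
End Module
```

After `Dim rows = ParseQuoted(File.ReadAllText(path))`, where `path` is a placeholder for the file's location, `rows(1)(2)` is the value in row 2, column 3, and a quoted value such as "Smith, John" stays a single field with the quotes removed.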