Convert text data file to csv format - vb.net

My text data file is like this:
{1000}xxx{1200}xxx{3000}xxxxxx{5000}
{1000}xx{1500}xxxxxx{4000}xx{6000}
{1000}xxxx{1600}xxx{3000}xxx{6000}
...
I need to convert this data file to a csv file or an Excel file so I can analyze it. I tried Excel and other conversion software, but it did not work.
Can I use VB to do that? I have not used VB for a long time (over 10 years).
Sorry, I did not make it clear.
The number in curly brackets is the field name, and each record does not have the same fields. After conversion, the result should look like this:
(header line) 1000 1200 1500 1600 3000 4000 5000 6000
(record line) xxx xxx xxx xxx
. xxx xxx xxx xxx
. xxx xxx xxx xxx
We receive a new text data file every day (10 - 20 records). Although the data is not large, if we can convert it to a csv file we will not need to re-type it into Excel. This would save us a lot of time.

You almost certainly could use a programming language (like VB) to make this change. I'm not sure you need to, though.
If you are trying to write a program to convert the same type of file over and over, it might make sense to build a program in VB.net.
FYI, it's hard to advise you further without understanding more about what you need to do: for example, the size of the file, how often you will need to do it, what the target format will be, etc.
... but the answer I provided did answer the question you asked! ... and I am seeking rep points ;)

In light of your explanation of how the data is structured:
Option Infer On
Imports System.Collections.Generic
Imports System.IO
Imports System.Linq
Imports System.Text
Imports System.Text.RegularExpressions

Module Module1

    Class Cell
        Property ColumnName As String
        Property Value As String

        ' To help with debugging/general usage
        Public Overrides Function ToString() As String
            Return String.Format("Col: {0} Val: {1}", ColumnName, Value)
        End Function
    End Class

    Dim table As New List(Of List(Of Cell))

    Sub Main()
        Dim src As String = "C:\temp\sampledata.txt"
        Dim dest = "C:\temp\sampledata.csv"
        Dim colNames As New List(Of String)

        ' This regex looks for zero or more characters ".*" surrounded by braces "\{ \}" and
        ' collects them in a group "( )". The "?" makes it non-greedy, so it stops at the first "}".
        ' The second capture group "( )" gets all the characters up to but not including
        ' the next "\{" (if it is present).
        Dim cellSelector = New Regex("\{(.*?)\}([^\{]*)")

        ' Read in the cells and record the column names.
        Using inFile = New StreamReader(src)
            While Not inFile.EndOfStream
                Dim line = inFile.ReadLine
                ' Skip blank lines (e.g. a trailing newline) so they don't become empty rows
                If String.IsNullOrWhiteSpace(line) Then Continue While
                Dim rowContent As New List(Of Cell)
                For Each m As Match In cellSelector.Matches(line)
                    rowContent.Add(New Cell With {.ColumnName = m.Groups(1).Value, .Value = m.Groups(2).Value})
                    If Not colNames.Contains(m.Groups(1).Value) Then
                        colNames.Add(m.Groups(1).Value)
                    End If
                Next
                table.Add(rowContent.OrderBy(Function(c) c.ColumnName).ToList)
            End While
        End Using

        ' This is a text sort: it matches numeric order only while all names have the same number of digits
        colNames.Sort()

        ' Add the header row of the column names
        Dim sb As New StringBuilder(String.Join(",", colNames) & vbCrLf)

        ' Output the data in csv format
        For Each r In table
            Dim cellNo = 0
            ' Visit every column so each row has the same number of fields,
            ' even when the row's last cell comes before the last column
            For col = 0 To colNames.Count - 1
                ' If this row has a cell with the appropriate column name then
                ' add the value to the output.
                If cellNo < r.Count AndAlso r(cellNo).ColumnName = colNames(col) Then
                    sb.Append(r(cellNo).Value)
                    cellNo += 1
                End If
                ' Add a separator if this is not the last item in the row
                If col < colNames.Count - 1 Then
                    sb.Append(","c)
                End If
            Next
            sb.AppendLine()
        Next

        File.WriteAllText(dest, sb.ToString)
    End Sub

End Module
From your sample data, the output is
1000,1200,1500,1600,3000,4000,5000,6000
xxx,xxx,,,xxxxxx,,,
xx,,xxxxxx,,,xx,,
xxxx,,,xxx,xxx,,,
I notice that none of the final columns have data in them. Is that just a copy-and-paste error or intentional?
EDIT: I use Option Infer On, which is why some of the type declarations are missing.
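For comparison, here is a compact sketch of the same conversion written as a function that returns the CSV text instead of writing a file, so it is easy to try on a few sample lines. The name ConvertBracedFieldsToCsv is hypothetical, and the sketch assumes every field name is an integer: the answer above sorts column names as text, which only matches numeric order while every name has the same number of digits ("500" would sort after "1000"). Fields missing from a record become empty values, so every row has one entry per column.

```vbnet
Option Strict On
Option Infer On

Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Text.RegularExpressions

Module BracedFieldCsv
    ' Same pattern as the answer above: group 1 = field name, group 2 = value.
    Private ReadOnly CellPattern As New Regex("\{(.*?)\}([^\{]*)")

    ' Hypothetical helper: converts lines such as "{1000}xxx{1200}xxx" into CSV text.
    ' Assumes every field name is an integer, so columns are ordered numerically.
    Public Function ConvertBracedFieldsToCsv(lines As IEnumerable(Of String)) As String
        Dim rows As New List(Of Dictionary(Of String, String))
        Dim colNames As New HashSet(Of String)

        For Each line In lines
            If String.IsNullOrWhiteSpace(line) Then Continue For
            ' Map field name -> value for this record
            Dim row As New Dictionary(Of String, String)
            For Each m As Match In CellPattern.Matches(line)
                row(m.Groups(1).Value) = m.Groups(2).Value
                colNames.Add(m.Groups(1).Value)
            Next
            rows.Add(row)
        Next

        ' Sort the names as numbers, so "500" would come before "1000"
        Dim ordered = colNames.OrderBy(Function(name) Integer.Parse(name)).ToList()

        Dim output As New List(Of String) From {String.Join(",", ordered)}
        For Each record In rows
            ' A field missing from this record becomes an empty string,
            ' so every CSV row has one value per column
            output.Add(String.Join(",", ordered.Select(
                Function(name) If(record.ContainsKey(name), record(name), ""))))
        Next
        Return String.Join(Environment.NewLine, output)
    End Function
End Module
```

Usage would be something like File.WriteAllText(dest, ConvertBracedFieldsToCsv(File.ReadLines(src))), with src and dest as in the answer above. Like that answer, it does not quote values, so it assumes the values themselves contain no commas or double quotes.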

Related

How to populate column based on other columns

I need to understand why this does not work in MS Access:
UPDATE main_records
SET main_records.rece = Str(main_records.Nr) & "," & Str(main_records.Pag);
The intent is to populate the rece column (a 63-character string column) in all records of main_records with the contents of Nr and Pag, converted to strings and concatenated.
It looks so easy but ...
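"the contents of Nr and Pag (converted to string and concatenated)"
If that is what you intend to do, use a string conversion function such as CStr(expression):
UPDATE main_records
SET main_records.rece = CStr(main_records.Nr) & "," & CStr(main_records.Pag);
Note that Str() reserves a leading space for the sign of a positive number, so Str(12) returns " 12" rather than "12".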
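Access queries use the VBA versions of these functions, and VB.NET's Microsoft.VisualBasic namespace has Str and CStr functions with the same leading-space behaviour, so a small VB.NET sketch can show the difference. The helper names JoinWithStr and JoinWithCStr are hypothetical, and Nr = 12, Pag = 3 are made-up sample values.

```vbnet
Option Strict On

Imports System
Imports Microsoft.VisualBasic

Module StrVersusCStr
    ' Concatenates the way the corrected UPDATE does, using CStr.
    Public Function JoinWithCStr(nr As Integer, pag As Integer) As String
        Return CStr(nr) & "," & CStr(pag)
    End Function

    ' Same concatenation with Str: each positive number gets a leading space.
    Public Function JoinWithStr(nr As Integer, pag As Integer) As String
        Return Str(nr) & "," & Str(pag)
    End Function
End Module
```

JoinWithStr(12, 3) returns " 12, 3", while JoinWithCStr(12, 3) returns "12,3".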

Count unique values in CSV file

I am trying to write a VB (.NET Framework) program that reads a csv file and counts how many unique values there are in a column. For example, the csv could look like this:
Source Barcode Name,Source ID,Destination Barcode Name,Destination ID
BARCODE_0006,A,Barcode_0001,F
BARCODE_0002,B,Barcode_0001,G
BARCODE_0003,C,Barcode_0001,H
BARCODE_0004,D,Barcode_0001,I
BARCODE_0005,E,Barcode_0001,J
The script would return 5 in this example. Note that the number of unique values in column 1 can change.
Start with this (TextFieldParser needs Imports Microsoft.VisualBasic.FileIO at the top of the file):
Public Iterator Function ReadCSV(filePath As String, Optional delimiter As String = ",") As IEnumerable(Of String())
Using parser As New TextFieldParser(filePath)
parser.Delimiters = New String() {delimiter}
While Not parser.EndOfData
Yield parser.ReadFields()
End While
End Using
End Function
And then use it to solve your problem like this:
Dim count As Integer =
ReadCSV("MyFilePath.csv"). ' Open the file
Skip(1). ' Skip the header
Select(Function(r) r(0)). ' Just the first column in each row
Distinct(). ' Unique entries only
Count() ' How many there are
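If you would rather pick the column by its header text than by position, or read from something other than a file on disk, a sketch along these lines should work. CountDistinctInColumn is a hypothetical name; it uses the TextFieldParser constructor that accepts a TextReader, so you can pass New StreamReader("MyFilePath.csv") or a StringReader for testing. Like Distinct(), the HashSet compares values case-sensitively, so "BARCODE_0001" and "Barcode_0001" would count as two different values.

```vbnet
Option Strict On
Option Infer On

Imports System
Imports System.Collections.Generic
Imports System.IO
Imports Microsoft.VisualBasic.FileIO

Module CsvDistinct
    ' Hypothetical helper: counts distinct values in the column whose header is columnName.
    Public Function CountDistinctInColumn(reader As TextReader, columnName As String,
                                          Optional delimiter As String = ",") As Integer
        Using parser As New TextFieldParser(reader)
            parser.TextFieldType = FieldType.Delimited
            parser.SetDelimiters(delimiter)
            If parser.EndOfData Then Return 0

            ' Find the column index from the header row
            Dim header = parser.ReadFields()
            Dim index = Array.IndexOf(header, columnName)
            If index < 0 Then Throw New ArgumentException("Column not found: " & columnName)

            Dim seen As New HashSet(Of String)
            While Not parser.EndOfData
                Dim fields = parser.ReadFields()
                ' Skip short rows instead of throwing IndexOutOfRangeException
                If fields IsNot Nothing AndAlso fields.Length > index Then seen.Add(fields(index))
            End While
            Return seen.Count
        End Using
    End Function
End Module
```

For the sample above, CountDistinctInColumn(New StreamReader("MyFilePath.csv"), "Source Barcode Name") should return 5.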

Why split seems not to work properly when separator is more than one char?

I'm pretty new to VB.NET and I don't understand why Split works differently if the separator is more than one character.
I've tried this on .NET Fiddle and I was surprised by the result:
Dim Txt as string = "123_|_ABC_|_sd"
Dim c() as string
c= Txt.Split("_|_")
console.WriteLine(c.length)
For i = 0 to c.length -1
console.WriteLine(c(i))
Next
Txt = "123|ABC|sd"
c= Txt.Split("|")
console.WriteLine(c.length)
For i = 0 to c.length -1
console.WriteLine(c(i))
Next
The result for the first part of the code was:
5
123
|
ABC
|
sd
For the second part of the code:
3
123
ABC
sd
My questions are:
Why does this happen?
Is there a way to get the result of the 2nd part of the code with a separator of multiple characters?
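The simple String.Split() overload (ParamArray separator() As Char) takes characters, not strings: any one of the characters you pass is treated as a delimiter. On .NET Framework there is no overload that takes a single String.
So, in this line of code:
c= Txt.Split("_|_")
the string "_|_" is not used as a delimiter. Because Option Strict is Off by default, VB silently converts the String to a single Char, which keeps only its first character. The line therefore says "use _ as a delimiter", which is exactly why the | characters come back as separate elements in your output. With Option Strict On, the same line is a compile-time error instead of a silent conversion.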
The most straightforward way to get the behavior that you are looking for is to use the Split function from the Microsoft.VisualBasic namespace (Strings.Split), which treats the whole delimiter string as one separator. It is often sneered at, but there are cases where it simply does things better than the corresponding .NET classes.
So, import the Microsoft.VisualBasic namespace (VB projects import it by default), then...
c = Split(Txt,"_|_")
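If you would rather stay with the .NET String class, String.Split also has an overload that takes an array of strings plus a StringSplitOptions value, and it treats each string in the array as a whole separator. A minimal sketch, where SplitOnString is a hypothetical wrapper name; Option Strict On is included because it would have flagged the silent String-to-Char conversion from the question:

```vbnet
Option Strict On

Imports System

Module MultiCharSplit
    ' Hypothetical wrapper: splits text on a whole multi-character separator
    ' using String.Split(String(), StringSplitOptions).
    Public Function SplitOnString(text As String, separator As String) As String()
        Return text.Split(New String() {separator}, StringSplitOptions.None)
    End Function
End Module
```

SplitOnString("123_|_ABC_|_sd", "_|_") returns the three elements 123, ABC and sd.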
You can also use Regex.Split() with the pattern
"_\|_"
(the | is escaped with a backslash because on its own it means "or" in a regular expression).
Example:
Imports System.Text.RegularExpressions
Module Module1
Sub Main()
Dim Txt As String = "123_|_ABC_|_sd"
Dim c() As String
c = Regex.Split(Txt, "_\|_")
For i = 0 To c.Length - 1
Console.WriteLine(c(i))
Next
Console.ReadLine()
End Sub
End Module
Results:
123
ABC
sd
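When the separator is not a fixed literal you control (for example, it comes from user input or a settings file), escaping it by hand is error-prone. Regex.Escape does the escaping for you; a minimal sketch with a hypothetical helper name:

```vbnet
Option Strict On

Imports System
Imports System.Text.RegularExpressions

Module RegexSplitHelper
    ' Hypothetical helper: splits text on a literal separator using Regex.Split.
    ' Regex.Escape turns characters such as "|" or "." into their escaped forms,
    ' so the separator is matched literally instead of as regex syntax.
    Public Function SplitOnLiteral(text As String, separator As String) As String()
        Return Regex.Split(text, Regex.Escape(separator))
    End Function
End Module
```

SplitOnLiteral("a.b.c", ".") gives a, b and c, whereas an unescaped "." would match any character and split at every position.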

Read complex tab separated file(multiple column lines and line breaks) into the objects

I have a tab-separated data file. Objects are separated from each other by 2 empty lines, and the first and third rows of each object hold the column names.
My Tab Separated File
ID [TAB] NAME
001 [TAB] Croline
DATE [TAB] DOC
30/06/2010 [TAB] 101435
2 x EMPTY LINE
ID [TAB] NAME
002 [TAB] Grek
DATE [TAB] DOC
30/06/2010 [TAB] 101437
2 x EMPTY LINE
...........
...........
My Object Class
Public Class MyObject
Public Property Id As String
Public Property Name As String
Public Property Date As String
Public Property Doc As String
End Class
How can I read this file into the MyObjects?
It's hard to help you understand how to do this without knowing, more specifically, which part of this task you are having trouble with, but perhaps a simple working example will help you get started.
If you define your data class like this:
Public Class MyObject
Public Property Id As String
Public Property Name As String
Public Property [Date] As String ' Note that "Date" must be surrounded with brackets since it is a keyword in VB
Public Property Doc As String
End Class
Then you can load it like this:
' Create a list to hold the loaded objects
Dim objects As New List(Of MyObject)()
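' Read all of the lines from the file into an array of strings
' (File is in the System.IO namespace, so this needs Imports System.IO)
Dim lines() As String = File.ReadAllLines("test.txt")
' Loop through the array of lines from the file. Each object takes 6 lines
' (header, values, header, values, then 2 empty lines), so stepping by 6
' makes the current value of "i", at each iteration, the index of the
' first line of each object
For i As Integer = 0 To lines.Length - 1 Step 6
' Only build an object if its fourth line (index i + 3) exists
If i + 3 < lines.Length Then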
' Create a new object to store the data for the current item in the file
Dim o As New MyObject()
' Get the values from the second line
Dim fields() As String = lines(i + 1).Split(ControlChars.Tab)
o.Id = fields(0)
o.Name = fields(1)
' Get the values from the fourth line
fields = lines(i + 3).Split(ControlChars.Tab)
o.Date = fields(0)
o.Doc = fields(1)
' Add this item to the list
objects.Add(o)
End If
Next
The code to load it is very basic. It does no extra validation to ensure that the data in the file is correctly formatted, but, given a valid file, it will work to load the data into a list of objects.
The solution will be something like (pseudocode):
Create an empty list of MyObjects
Open file for reading
While there are lines left to read:
create a MyObject instance i
read a line and ignore it.
read a line into s1
split s1 at tab character into a1 and b1
set i.Id to a1
set i.Name to b1
read a line and ignore it
read a line into s2
split s2 at tab character into a2 and b2
set i.Date to a2
set i.Doc to b2
add i to your list
read a line and ignore it.
read a line and ignore it.
Translating this into vb.net is left as an exercise for the reader.
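A rough translation of that pseudocode into VB.NET, as a sketch: LoadObjects is a hypothetical name, the MyObject class is repeated from the question (with [Date] bracketed, since Date is a keyword), and it reads from a TextReader so you can pass New StreamReader("test.txt") for a real file. Like the pseudocode, it does no validation beyond stopping at a truncated record; a value line without a tab would still throw an IndexOutOfRangeException.

```vbnet
Option Strict On
Option Infer On

Imports System
Imports System.Collections.Generic
Imports System.IO
Imports Microsoft.VisualBasic

Public Class MyObject
    Public Property Id As String
    Public Property Name As String
    Public Property [Date] As String
    Public Property Doc As String
End Class

Module PseudocodeTranslation
    ' Follows the pseudocode: header, values, header, values, then two blank lines.
    Public Function LoadObjects(reader As TextReader) As List(Of MyObject)
        Dim objects As New List(Of MyObject)
        While reader.Peek() <> -1
            reader.ReadLine()                  ' "ID<TAB>NAME" header: ignored
            Dim s1 = reader.ReadLine()
            reader.ReadLine()                  ' "DATE<TAB>DOC" header: ignored
            Dim s2 = reader.ReadLine()
            If s1 Is Nothing OrElse s2 Is Nothing Then Exit While ' truncated record

            Dim part1 = s1.Split(ControlChars.Tab)
            Dim part2 = s2.Split(ControlChars.Tab)
            objects.Add(New MyObject With {
                .Id = part1(0), .Name = part1(1),
                .[Date] = part2(0), .Doc = part2(1)})

            reader.ReadLine()                  ' blank separator line
            reader.ReadLine()                  ' blank separator line
        End While
        Return objects
    End Function
End Module
```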
Rather than writing specific code for that data, I'd first convert it to simple CSV. It might make sense if this is just a one-shot thing.
1) Load the file in Notepad++
2) Replace \t with ; (using Extended Search Mode), giving you this kind of data:
ID;NAME
001;Croline
DATE;DOC
30/06/2010;101435
ID;NAME
002;Grek
DATE;DOC
30/06/2010;101437
3) Remove all DATE;DOC lines by searching for DATE;DOC\n and replacing with ; (you might need to do Edit > EOL Conversions > UNIX Format for that to work)
4) Do the same to replace all ID;NAME lines with a placeholder symbol that is guaranteed not to appear in the data, maybe £. Your data should look like this:
£001;Croline
;30/06/2010;101435
£002;Grek
;30/06/2010;101437
5) Do Edit > Blank Operations > Remove Unnecessary Blank and EOL, which should put all your data on a single line.
6) Search & Replace your placeholder symbol £ by \n
7) Remove extra blank line at the top. Voilà, you have a CSV file.

Creating an Array from a CSV text file and selecting certain parts of the array. in VB.NET

I've searched the internet over and over for my issue, but I haven't been able to find anything or word my searches correctly...
My issue is that I have a comma-separated values file in .txt format. Simply put, it's a bunch of data delimited by commas, and the text qualifier is the double quote (").
For example:
"So and so","1234","Blah Blah", "Foo","Bar","","","",""
"foofoo","barbar","etc.."
Wherever there is a carriage return, it signifies a new row, and every comma separates one column from the next.
My next step is to go into VB.NET and create an array from these values, using the commas as the delimiter, and somehow turn the array into a table whose layout matches the text file's format (I hope I'm explaining myself correctly :/ )
After that array has been created, I need to select only certain parts of it and store those values in variables for later use...
And that's where my trouble comes in... I can't seem to get the correct logic for making the array and selecting certain info out of it.
If any
You could give a more detailed description of the problem, but I gather you're looking for something like this:
Sub Main()
Dim fileOne As String = "a1,b1,c1,d1,e1,f1,g1" + Environment.NewLine + _
"a2,b2,c2,d2,e2,f2,g2"
Dim table As New List(Of List(Of String))
' Process the file
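' Split on the whole newline string; Split(Environment.NewLine) would only use its first char (vbCr)
For Each line As String In fileOne.Split(New String() {Environment.NewLine}, StringSplitOptions.None)
Dim row As New List(Of String)
For Each value In line.Split(","c)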
row.Add(value)
Next
table.Add(row)
Next
' Search the "table" using LINQ (for example)
Dim v = From c In table _
Where c(2) = "c1"
Console.WriteLine("Rows containing 'c1' in the 3rd column:")
For Each x As List(Of String) In v
Console.WriteLine(x(0)) ' printing the 1st column only
Next
' *** EDIT: added this after clarification
' Fetch value in row 2, column 3 (remember that lists are zero-indexed)
Console.WriteLine("Value of (2, 3): " + table(1)(2))
End Sub
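The split-based answer above does not deal with the double-quote text qualifiers shown in the question, so a value such as "So and so" would keep its quotes. A sketch using TextFieldParser (Microsoft.VisualBasic.FileIO), which strips the qualifiers and keeps commas inside quoted fields; LoadTable is a hypothetical name, and it takes a TextReader so it works with a StreamReader over the .txt file or a StringReader.

```vbnet
Option Strict On
Option Infer On

Imports System
Imports System.Collections.Generic
Imports System.IO
Imports Microsoft.VisualBasic.FileIO

Module QuotedCsvTable
    ' Hypothetical helper: reads comma-separated text whose fields may be wrapped
    ' in double quotes into a list of rows (one String() per line), quotes removed.
    Public Function LoadTable(reader As TextReader) As List(Of String())
        Dim table As New List(Of String())
        Using parser As New TextFieldParser(reader)
            parser.TextFieldType = FieldType.Delimited
            parser.SetDelimiters(",")
            parser.HasFieldsEnclosedInQuotes = True   ' "So and so" -> So and so
            parser.TrimWhiteSpace = True               ' tolerate ", " between fields
            While Not parser.EndOfData
                table.Add(parser.ReadFields())
            End While
        End Using
        Return table
    End Function
End Module
```

For the question's sample, table(0)(1) would be "1234", and the rows can then be searched with LINQ as in the answer above.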