Extracting and saving relevant data from a txt file - vb.net

I tried to write a code but I didn't succeed at all. Could someone help me please?
I want my program to read the data.txt file
The data.txt file contains:
Name: Christian
Phone: x
Address: x
Name: Alexander
Phone: x
Address: x
I would like the program to save the names in an output file: output_data.txt
The output_data.txt file should contain:
Christian
Alexander
This is what I have so far:
Using DataReader As New Microsoft.VisualBasic.FileIO.TextFieldParser("data.txt")
Dim DataSaver As System.IO.StreamWriter
DataReader.TextFieldType = FileIO.FieldType.Delimited
DataReader.SetDelimiters("Name: ")
Dim Row As String()
While Not DataReader.EndOfData
Row = DataReader.ReadFields()
Dim DataSplited As String
For Each DataSplited In Row
My.Computer.FileSystem.WriteAllText("output_data.txt", DataSplited, False)
'MsgBox(DataSplited)
Next
End While
End Using
But the output_data.txt file does not end up containing what MsgBox(DataSplited) shows. MsgBox(DataSplited) splits the name off at "Name: " but also shows the rest, such as the address and phone lines. I don't know what the problem is.

This will work:
Dim strFile = "c:\test5\data.txt"
Dim InputBuf As String = File.ReadAllText(strFile)
Dim OutBuf As String = ""
Dim InBufHold() As String = Split(InputBuf, "Name:")
For i = 1 To InBufHold.Length - 1
OutBuf += Trim(Split(InBufHold(i), vbCrLf)(0)) & vbCrLf
Next
File.WriteAllText("c:\test5\output_data.txt", OutBuf)

Dim names = File.ReadLines("data.txt").
Where(Function(line) line.StartsWith("Name: ")).
Select(Function(line) line.SubString(5).Trim())
File.WriteAllText("output_data.txt", String.Join(vbCrLf, names))
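For reference, the original attempt appears to lose data for two reasons: WriteAllText is called with append set to False, so every call overwrites the previous content, and TextFieldParser only treats "Name: " as a field delimiter, so the Phone and Address lines still come back as fields. A minimal line-by-line sketch, assuming each name sits on its own line starting with "Name: " (ExtractNames is a made-up helper name, not part of any library):

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO

Module NameExtractor
    ' Hypothetical helper: returns the text after "Name: " for every matching line.
    Function ExtractNames(lines As IEnumerable(Of String)) As List(Of String)
        Const Prefix As String = "Name: "
        Dim names As New List(Of String)
        For Each line As String In lines
            If line.StartsWith(Prefix, StringComparison.Ordinal) Then
                names.Add(line.Substring(Prefix.Length).Trim())
            End If
        Next
        Return names
    End Function

    Sub Main()
        ' File names are taken from the question; adjust the paths as needed.
        Dim names = ExtractNames(File.ReadLines("data.txt"))
        ' Write all names in one call so earlier names are not overwritten.
        File.WriteAllLines("output_data.txt", names)
    End Sub
End Module
```

Writing once at the end avoids both the overwrite problem and reopening the output file for every name.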

Related

VB.net Read Specific Lines From a Text File That Start With and Stop Reading When Start With

I'm looking to read lines from a text file that start with certain characters and stop when a line starts with other characters. In my example I would like to start reading at the AB line and stop at the EF line; however, not every record will contain a CD line. There will always be an AB line and an EF line, but the number of lines in between is unknown.
Here is an example of the lines in a text file I would be reading. You can see that this will create two rows in the DataGridView however the first row is missing the CD line and should be blank.
AB-id1
EF-address1
AB-id2
CD-name1
EF-address2
Here is the code I have so far:
Dim lines() As String = File.ReadAllLines(textfile)
For i As Integer = 0 To lines.Length - 1
If lines(i).StartsWith("AB") Then
Dim nextLines As String() = lines.Skip(i + 1).ToArray
Dim info As String = nextLines.FirstOrDefault(Function(Line) Line.StartsWith("CD"))
Dim name As String = "Yes"
Dim info2 As String = nextLines.FirstOrDefault(Function(Line) Line.StartsWith("EF"))
Dim address As String = "Yes"
End If
DataGridView.Rows.Add(name,address)
Next
Now the output I currently get is:
|Yes|Yes|
|Yes|Yes|
And I should be getting:
||Yes|
|Yes|Yes|
It looks like it's reading too far down the text file and I need it to stop reading at EF. I've tried Do while and Do Until with no success. Any suggestions?
You could use the Array.FindIndex function to get the index of the next line starting with your prefix. This way you don't have to skip lines and create a new array each time.
Try this out instead:
Dim lines() As String = File.ReadAllLines(textFile)
For i As Integer = 0 To lines.Length - 1
If lines(i).StartsWith("AB") Then
Dim addressIndex As Integer = Array.FindIndex(lines, i + 1, Function(Line) Line.StartsWith("EF"))
Dim address As String = If(addressIndex <> -1, lines(addressIndex).Substring(3), "") ' Get everything past the "-"
Dim name As String = ""
If addressIndex <> -1 Then
Dim nameIndex As Integer = Array.FindIndex(lines, i + 1, addressIndex - i, Function(line) line.StartsWith("CD"))
If nameIndex <> -1 Then
name = lines(nameIndex).Substring(3) ' Get everything past the "-"
End If
End If
DataGridView.Rows.Add(name, address)
End If
Next
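An alternative single-pass sketch, assuming each record starts with an AB line and ends with an EF line as described in the question. It collects (name, address) pairs into a list instead of a DataGridView so that it can run as a console program; ParseRecords is a hypothetical helper name.

```vbnet
Imports System
Imports System.Collections.Generic

Module RecordParser
    ' Hypothetical helper: walks the lines once and emits one
    ' (name, address) pair per AB..EF record.
    Function ParseRecords(lines As IEnumerable(Of String)) As List(Of Tuple(Of String, String))
        Dim records As New List(Of Tuple(Of String, String))
        Dim inRecord As Boolean = False
        Dim name As String = ""
        For Each line As String In lines
            If line.StartsWith("AB") Then
                ' A new record begins: reset the optional name.
                inRecord = True
                name = ""
            ElseIf inRecord AndAlso line.StartsWith("CD") Then
                name = line.Substring(3) ' everything past the "-"
            ElseIf inRecord AndAlso line.StartsWith("EF") Then
                records.Add(Tuple.Create(name, line.Substring(3)))
                inRecord = False
            End If
        Next
        Return records
    End Function

    Sub Main()
        Dim sample = {"AB-id1", "EF-address1", "AB-id2", "CD-name1", "EF-address2"}
        For Each record In ParseRecords(sample)
            Console.WriteLine("|" & record.Item1 & "|" & record.Item2 & "|")
        Next
        ' Prints:
        ' ||address1|
        ' |name1|address2|
    End Sub
End Module
```

In a form, each pair could be passed to DataGridView.Rows.Add in place of Console.WriteLine.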

How do I set max character length of filenames?

I am currently using the following code to remove the defined special characters and white spaces from any file names within my defined directory:
For Each file As FileInfo In files
newName = Regex.Replace(file.Name, "[!##$%^&*()_ ]", "")
If (file.Name <> newName) Then
newPath = Path.Combine(dir, newName)
file.CopyTo(newPath)
End If
Next
Edit: How do I trim the new file name (newName) so that only the first 26 characters are kept?
Answer:
For Each file As FileInfo In files
If (file.Name.Length >= 36) Then
Dim maxLen As Integer = 26 - file.Extension.Length
newName = $"{Regex.Replace(Path.GetFileNameWithoutExtension(file.Name), "[!##$%^&*()_ ]", "").Substring(0, maxLen)}{file.Extension}"
newPath = Path.Combine(dir, newName)
file.CopyTo(newPath, True)
ElseIf (file.Name.Length < 36) Then
newName = Regex.Replace(file.Name, "[!##$%^&*()_ ]", "")
If (file.Name <> newName) Then
newPath = Path.Combine(dir, newName)
file.CopyTo(newPath)
End If
End If
Next
You can use Linq as follows:
Dim dir = "c:\myFolder"
Dim except = "[!##$%^&*()_ ]".ToArray
For Each file As FileInfo In files
Dim maxLen As Integer = 26 - file.Extension.Length
' Where keeps repeated letters; Except is a set operation and would also drop duplicate characters
Dim cleaned = New String(Path.GetFileNameWithoutExtension(file.Name).
Where(Function(c) Not except.Contains(c)).
Take(maxLen).
ToArray)
Dim newPath = Path.Combine(dir, $"{cleaned}{file.Extension}")
file.CopyTo(newPath, True)
Next
Suppose you have a file with name:
abcdefg_hijk!lm#pno#pq%r(stuvy)x$z.dbf
The newPath output will be:
c:\myFolder\abcdefghijklmpnopqrstu.dbf
If that is what you need to do.
Edit:
Alternative using RegEx:
Dim except = "[!##$%^&*()_ ]"
For Each file As FileInfo In files
Dim maxLen As Integer = 26 - file.Extension.Length
Dim cleaned = Regex.Replace(Path.GetFileNameWithoutExtension(file.Name), except, "")
' Math.Min guards against names that are already shorter than maxLen
Dim newName = $"{cleaned.Substring(0, Math.Min(cleaned.Length, maxLen))}{file.Extension}"
Dim newPath = Path.Combine(dir, newName)
file.CopyTo(newPath, True)
Next
So the newPath for a file with name:
n_BrucesMiddle NH 12 34 5 W3_H.dbf
... will be:
c:\myFolder\nBrucesMiddleNH12345W3.dbf
The unwanted characters have been removed and the maximum length of the new file name (newName) including the extension is 26.
Again, if that is what you need. Good luck.
Use String.Remove
newName = newName.Remove(26)
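Note that the string length should be greater than or equal to 26; a shorter string makes Remove throw an ArgumentOutOfRangeException.
EDIT:
If you want the extension to remain, use this instead (it keeps the last four characters, i.e. a three-letter extension plus the dot):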
newName = newName.Remove(26, newName.length - 30)
To rename files you can use .MoveTo method.
From the docs:
Moves a specified file to a new location, providing the option to
specify a new file name.
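You probably want to rename only the "name" part so that the extension remains unchanged.
This approach supports any file extension (not only extensions with 3 characters):
For Each file As FileInfo In files
Dim baseName As String = Path.GetFileNameWithoutExtension(file.Name)
' Remove(26) throws for names shorter than 26 characters, so only trim longer names
Dim newName As String = If(baseName.Length > 26, baseName.Remove(26), baseName)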
Dim newPath = Path.Combine(file.DirectoryName, $"{newName}{file.Extension}")
file.MoveTo(newPath)
Next
If you are asking how to get a subset of a String, you would get the Substring. If you want the first 26 characters it would be the second overload.
Example:
Dim filename As String = "this is a terrible f^ln#me! that is (way) too long.txt"
Dim newFilename As String = Regex.Replace(filename, "[!##$%^&*()_ ]", String.Empty).Substring(0, 26)
Update
As per your comment, here is how you'd trim down just the filename without the extension:
Dim filename As String = "this is a terrible f^ln#me! that is (way) too long.txt"
Dim extension As String = Path.GetExtension(filename)
Dim shortFilename As String = Path.GetFileNameWithoutExtension(filename)
Dim newFilename As String = Regex.Replace(shortFilename, "[!##$%^&*()_ ]", String.Empty).Substring(0, 26) & extension
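Several of the snippets above call Substring(0, n) or Remove(n) directly, which throws an ArgumentOutOfRangeException when the cleaned name is shorter than n. A small sketch of a guarded version, reusing the character class from the question; ShortenFileName and its maxBaseLength parameter are hypothetical names introduced for illustration.

```vbnet
Imports System
Imports System.IO
Imports System.Text.RegularExpressions

Module FileNameShortener
    ' Hypothetical helper: strips the unwanted characters from the base name,
    ' caps the base name at maxBaseLength characters and re-attaches the extension.
    Function ShortenFileName(fileName As String, maxBaseLength As Integer) As String
        Dim extension As String = Path.GetExtension(fileName)
        Dim baseName As String = Path.GetFileNameWithoutExtension(fileName)
        Dim cleaned As String = Regex.Replace(baseName, "[!##$%^&*()_ ]", String.Empty)
        ' Math.Min avoids the exception that Substring throws when the
        ' cleaned name is already shorter than the limit.
        Return cleaned.Substring(0, Math.Min(cleaned.Length, maxBaseLength)) & extension
    End Function

    Sub Main()
        ' Long name: the base name is cut to 26 characters.
        Console.WriteLine(ShortenFileName("this is a terrible f^ln#me! that is (way) too long.txt", 26))
        ' Short name: only the unwanted characters are removed.
        Console.WriteLine(ShortenFileName("short name.txt", 26))
    End Sub
End Module
```

To cap the total length including the extension instead, pass 26 minus the extension length as the limit, as the earlier answers do with maxLen.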

Select text between key words

This is a follow on question to Select block of text and merge into new document
I have an SGM document to which start/stop comments have been added. I need to extract the strings in between the start/stop comments so I can put them in a temporary file for modification. Right now it's selecting everything, including the start/stop comments and data outside of the start/stop comments.
Dim DirFolder As String = txtDirectory.Text
Dim Directory As New IO.DirectoryInfo(DirFolder)
Dim allFiles As IO.FileInfo() = Directory.GetFiles("*.sgm")
Dim singleFile As IO.FileInfo
Dim Prefix As String
Dim newMasterFilePath As String
Dim masterFileName As String
Dim newMasterFileName As String
Dim startMark As String = "<!--#start#-->"
Dim stopMark As String = "<!--#stop#-->"
searchDir = txtDirectory.Text
Prefix = txtBxUnique.Text
For Each singleFile In allFiles
If File.Exists(singleFile.FullName) Then
Dim fileName = singleFile.FullName
Debug.Print("file name : " & fileName)
' A backup first
Dim backup As String = fileName & ".bak"
File.Copy(fileName, backup, True)
' Load lines from the source file in memory
Dim lines() As String = File.ReadAllLines(backup)
' Now re-create the source file and start writing lines inside a block
' Evaluate all the lines in the file.
' Set insideBlock to false
Dim insideBlock As Boolean = False
Using sw As StreamWriter = File.CreateText(backup)
For Each line As String In lines
If line = startMark Then
' start writing at the line below
insideBlock = True
' Evaluate if the next line is <!Stop>
ElseIf line = stopMark Then
' Stop writing
insideBlock = False
ElseIf insideBlock = True Then
' Write the current line in the block
sw.WriteLine(line)
End If
Next
End Using
End If
Next
This is the example text to test on.
<chapter id="Chapter_Overview"> <?Pub Lcl _divid="500" _parentid="0">
<title>Learning how to gather data</title>
<!--#start#-->
<section>
<title>ALTERNATE MISSION EQUIPMENT</title>
<para0 verdate="18 Jan 2019" verstatus="ver">
<title>
<applicabil applicref="xxx">
</applicabil>Three-Button Trackball Mouse</title>
<para>This is the example to grab all text between start and stop comments.
</para></para0>
</section>
<!--#stop#-->
Things to note: the start and stop comments ALWAYS fall on a new line, a document can have multiple start/stop sections
I thought maybe using a regex on this
(<section>[\w+\w]+.*?<\/section>)\R(<\?Pub _gtinsert.*>\R<pgbrk pgnum.*?>\R<\?Pub /_gtinsert>)*
Or maybe use IndexOf and LastIndexOf, but I couldn't get that working.
You can read the entire file and split it into an array using the string array of {"<!--#start#-->", "<!--#stop#-->"} to split, into this
Element 0: Text before "<!--#start#-->"
Element 1: Text between "<!--#start#-->" and "<!--#stop#-->"
Element 2: Text after "<!--#stop#-->"
and take element 1. Then write it to your backup.
Dim text = File.ReadAllText(backup).Split({startMark, stopMark}, StringSplitOptions.RemoveEmptyEntries)(1)
Using sw As StreamWriter = File.CreateText(backup)
sw.Write(text)
End Using
Edit to address comment
I did make the original code a little compact. It can be expanded out into the following, which allows you to add some validation
Dim text = File.ReadAllText(backup)
Dim split = text.Split({startMark, stopMark}, StringSplitOptions.RemoveEmptyEntries)
If split.Count() <> 3 Then Throw New Exception("File didn't contain one or more delimiters.")
text = split(1)
Using sw As StreamWriter = File.CreateText(backup)
sw.Write(text)
End Using
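Since the question notes that a document can contain several start/stop sections, here is a sketch that collects every block rather than only the first one. It assumes the markers are never nested; ExtractBlocks is a hypothetical helper name, and the sample text is made up for the demonstration.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module BlockExtractor
    ' Hypothetical helper: returns the text of every startMark..stopMark block, in order.
    Function ExtractBlocks(text As String, startMark As String, stopMark As String) As List(Of String)
        ' Regex.Escape keeps the markers literal, the lazy (.*?) stops at the
        ' nearest stop marker, and Singleline lets "." match across line breaks.
        Dim pattern As String = Regex.Escape(startMark) & "(.*?)" & Regex.Escape(stopMark)
        Dim blocks As New List(Of String)
        For Each m As Match In Regex.Matches(text, pattern, RegexOptions.Singleline)
            blocks.Add(m.Groups(1).Value.Trim())
        Next
        Return blocks
    End Function

    Sub Main()
        Dim startMark As String = "<!--#start#-->"
        Dim stopMark As String = "<!--#stop#-->"
        Dim sample As String = String.Join(Environment.NewLine,
            "<title>Intro</title>", startMark, "<section>one</section>", stopMark,
            "<para>skip me</para>", startMark, "<section>two</section>", stopMark)
        For Each block In ExtractBlocks(sample, startMark, stopMark)
            Console.WriteLine(block)
        Next
        ' Prints:
        ' <section>one</section>
        ' <section>two</section>
    End Sub
End Module
```

The returned list can then be written to the temporary file, for example with File.WriteAllLines.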

Search text file for a ranged value

I want to read and write the same file with StreamReader and StreamWriter. I know that my code tries to open the file twice and that this is the problem. Could anyone give me another way to do this? I got a bit confused.
As for the program, I want it to create the text file if it doesn't exist. If the file exists, the program compares each line with a ListBox to see whether each ListBox value already appears there. If it doesn't, the value is added to the file.
Dim SR As System.IO.StreamReader
Dim SW As System.IO.StreamWriter
SR = New System.IO.StreamReader("D:\temp\" & Cerberus.TextBox1.Text & "_deleted.txt", True)
SW = New System.IO.StreamWriter("D:\temp\" & Cerberus.TextBox1.Text & "_deleted.txt", True)
Dim strLine As String
Do While SR.Peek <> -1
strLine = SR.ReadLine()
For i = 0 To Cerberus.ListBox2.Items.Count - 1
If Cerberus.ListBox2.Items.Item(i).Contains(strLine) = False Then
SW.WriteLine(Cerberus.ListBox2.Items.Item(i))
End If
Next
Loop
SR.Close()
SW.Close()
SR.Dispose()
SW.Dispose()
MsgBox("Duplicates Removed!")
If your file is not that large, consider using File.ReadAllLines and File.WriteAllLines.
Dim path = "D:\temp\" & Cerberus.TextBox1.Text & "_deleted.txt"
Dim lines = File.ReadAllLines(path) 'String() -- holds all the lines in memory
Dim linesToWrite = Cerberus.ListBox2.Items.Cast(Of String).Except(lines)
File.AppendAllLines(path, linesToWrite)
If the file is large, but you only have to write a few lines, then you can use File.ReadLines:
Dim lines = File.ReadLines(path) ' IEnumerable(Of String)
' holds only a single line in memory at a time,
' but the file remains open until the iteration is finished
Dim linesToWrite = Cerberus.ListBox2.Items.Cast(Of String).Except(lines).ToList
File.AppendAllLines(path, linesToWrite)
If there are a large number of lines to write, then use the answers from this question.
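The question also asks for the file to be created when it does not exist, which File.ReadAllLines does not handle (it throws a FileNotFoundException). A sketch under that assumption, with the ListBox items replaced by a plain array so it runs outside a form; AppendMissingLines and the temp-file name are hypothetical.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Linq

Module DeletedListUpdater
    ' Hypothetical helper: appends only those items the file does not already contain.
    ' Returns the number of lines appended.
    Function AppendMissingLines(filePath As String, items As IEnumerable(Of String)) As Integer
        ' Treat a missing file as empty instead of letting ReadAllLines throw.
        Dim existing As String() = If(File.Exists(filePath), File.ReadAllLines(filePath), New String() {})
        ' Except is a set operation, so duplicates within items are written only once.
        Dim linesToWrite As List(Of String) = items.Except(existing).ToList()
        ' AppendAllLines creates the file if it is not there yet.
        File.AppendAllLines(filePath, linesToWrite)
        Return linesToWrite.Count
    End Function

    Sub Main()
        Dim filePath As String = Path.Combine(Path.GetTempPath(), "example_deleted.txt")
        File.Delete(filePath) ' start clean for the demonstration
        Console.WriteLine(AppendMissingLines(filePath, {"a", "b"})) ' 2
        Console.WriteLine(AppendMissingLines(filePath, {"b", "c"})) ' 1
    End Sub
End Module
```

Because the file is read completely before anything is appended, it is never open for reading and writing at the same time.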

Read sections of INI file in VB .NET 2012

Can you help me on this? I want to get the section name and fields of my INI file. Example:
[connection]
server=localhost
user=root
password=root
My program should return the section name and the fields:
connection
server
user
password
Thanks in advance..
Just in case someone needs it. Here it is:
Dim path As String = Application.StartupPath & "\path_to_file"
' get a list of the files in this directory
' -----------------------------------------
Dim file_list As String() = Directory.GetFiles(path, "*.ini")
' go through the list of files
' -------------------------------------------------------------------
For Each f As String In file_list
Dim a = New System.IO.FileInfo(f).Name
' open the file and read it
' -------------------------
Using sr As StreamReader = New StreamReader(f)
' read through the file line by line
' -----------------------------------
Do While sr.Peek() >= 0
Dim line = sr.ReadLine().Trim()
If line.StartsWith("[") AndAlso line.EndsWith("]") Then
' section header such as [connection]
Dim section_name = line.Substring(1, line.Length - 2)
' do something with the section name here
ElseIf line.Contains("=") Then
' field line such as server=localhost; split at the first "=" only
Dim temp_name = Split(line, "=", 2)
Dim first_part = temp_name(0)
Dim second_part = temp_name(1)
' do something with the information here
End If
Loop
End Using
Next
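To return the section name together with its field names, as the question asks, the same line-by-line idea can be wrapped in a function. This is only a sketch: ReadSections is a hypothetical helper name, and treating lines that start with ";" as comments is an assumption about the INI dialect.

```vbnet
Imports System
Imports System.Collections.Generic

Module IniReader
    ' Hypothetical helper: maps each section name to the list of its field names.
    Function ReadSections(lines As IEnumerable(Of String)) As Dictionary(Of String, List(Of String))
        Dim sections As New Dictionary(Of String, List(Of String))
        Dim current As List(Of String) = Nothing
        For Each rawLine As String In lines
            Dim line As String = rawLine.Trim()
            If line.Length = 0 OrElse line.StartsWith(";") Then
                Continue For ' skip blank lines and (assumed) comment lines
            ElseIf line.StartsWith("[") AndAlso line.EndsWith("]") Then
                ' A new section starts: register it and collect its fields from here on.
                current = New List(Of String)
                sections(line.Substring(1, line.Length - 2)) = current
            ElseIf current IsNot Nothing AndAlso line.Contains("=") Then
                ' Keep only the field name, i.e. the text before the first "=".
                current.Add(line.Substring(0, line.IndexOf("="c)).Trim())
            End If
        Next
        Return sections
    End Function

    Sub Main()
        Dim sample = {"[connection]", "server=localhost", "user=root", "password=root"}
        For Each section In ReadSections(sample)
            Console.WriteLine(section.Key)
            For Each field In section.Value
                Console.WriteLine(field)
            Next
        Next
        ' Prints:
        ' connection
        ' server
        ' user
        ' password
    End Sub
End Module
```

For a real file, File.ReadLines(f) can be passed in place of the sample array.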