I need to do a search and replace across two lines of a large ASCII text file, where the pattern may occur n times (n > 1000) in random places. The text file looks like this:
....
StringVariable='
my contents'
.....
and I want it to read:
....
StringVariable='my contents'
....
For small files, I use ReadAllText and WriteAllText, which works fine:
My.Computer.FileSystem.WriteAllText(MyOutputFile, My.Computer.FileSystem.ReadAllText(MyInputFile).Replace("='" & vbCrLf, "='"), False)
For large files, ReadAllText fails with an out-of-memory error. I have seen posts suggesting ReadLine and WriteLine, and showing how to test strings for characters, but I am missing how to combine multiple lines n times without losing my place in the file. I guess I could carefully split the large file into many small files so I can use ReadAllText, and then recombine them, but that seems crude. Is there a better way?
I can see how to fix the case above, but I have other cases (e.g., two CRs after a specific string), and I'm struggling to handle the flexible case where you want to replace a multi-line string with a variable-length multi-line string.
Here is the code I used for the initial case above:
' Requires Imports System.IO at the top of the file.
' Joins every line that starts with LookedFor onto the end of the previous line.
' ReadLine already strips the line terminator, so "removing the CR" just means
' writing the two lines out as one.
Private Sub RemoveCRBefore(ByVal Infile As String, ByVal Outfile As String, ByVal LookedFor As String)
    Using sr As New StreamReader(Infile)
        Using sw As StreamWriter = File.CreateText(Outfile)
            ' Line held back until we know whether the next line must be joined to it
            Dim pending As String = sr.ReadLine()
            If pending Is Nothing Then Return   ' empty input file
            Dim current As String = sr.ReadLine()
            Do While current IsNot Nothing
                If current.StartsWith(LookedFor, StringComparison.Ordinal) Then
                    pending &= current          ' merge: drop the line break between them
                Else
                    sw.WriteLine(pending)
                    pending = current
                End If
                current = sr.ReadLine()
            Loop
            sw.WriteLine(pending)
        End Using
    End Using
End Sub
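A different approach, which avoids ReadLine altogether and so also covers the multi-line and variable-length cases, is to stream the file in fixed-size character chunks and replace the literal search text (line breaks included) as it goes. The sketch below is an illustration under stated assumptions, not a drop-in fix: the names StreamingReplace, ReplaceStream and ReplaceInFile are hypothetical, and it assumes a non-empty search string, exact (ordinal) matching, and separate input and output files. The one subtle part is that a match can straddle two chunks, so the last findText.Length - 1 unmatched characters are carried over into the next chunk instead of being written.

```vb
Imports System
Imports System.IO
Imports System.Text

' Hypothetical helper module: a sketch, not a library API.
Module StreamingReplace
    ' Copies reader to writer, replacing every occurrence of findText with replaceText.
    ' Only one buffer plus (findText.Length - 1) carried-over characters are kept
    ' in memory, so the size of the input does not matter.
    Sub ReplaceStream(reader As TextReader, writer As TextWriter,
                      findText As String, replaceText As String,
                      Optional bufferSize As Integer = 65536)
        If String.IsNullOrEmpty(findText) Then Throw New ArgumentException("findText must not be empty.")
        Dim buffer(bufferSize - 1) As Char
        Dim pending As New StringBuilder()
        Dim charsRead As Integer = reader.Read(buffer, 0, buffer.Length)
        While charsRead > 0
            pending.Append(buffer, 0, charsRead)
            Dim s As String = pending.ToString()
            Dim pos As Integer = 0
            Dim idx As Integer = s.IndexOf(findText, pos, StringComparison.Ordinal)
            While idx >= 0
                writer.Write(s.Substring(pos, idx - pos))   ' text before the match
                writer.Write(replaceText)
                pos = idx + findText.Length
                idx = s.IndexOf(findText, pos, StringComparison.Ordinal)
            End While
            ' A match could start in the last (findText.Length - 1) characters and
            ' end in the next chunk, so hold those back instead of writing them now.
            Dim keepFrom As Integer = Math.Max(pos, s.Length - (findText.Length - 1))
            writer.Write(s.Substring(pos, keepFrom - pos))
            pending.Clear()
            pending.Append(s, keepFrom, s.Length - keepFrom)
            charsRead = reader.Read(buffer, 0, buffer.Length)
        End While
        writer.Write(pending.ToString())   ' too short to contain a match
    End Sub

    ' File wrapper; assumes inPath and outPath are different files.
    Sub ReplaceInFile(inPath As String, outPath As String, findText As String, replaceText As String)
        Using reader As New StreamReader(inPath)
            Using writer As New StreamWriter(outPath)
                ReplaceStream(reader, writer, findText, replaceText)
            End Using
        End Using
    End Sub
End Module
```

For the case in the question, something like ReplaceInFile(MyInputFile, MyOutputFile, "='" & vbCrLf, "='") should do the job, and "two CRs after a specific string" becomes, for example, ReplaceInFile(inFile, outFile, "foo" & vbCrLf & vbCrLf, "foo" & vbCrLf), where "foo" stands in for your string. Memory use stays at roughly one buffer regardless of the file size.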
This is a follow-on question to "Select block of text and merge into new document".
I have an SGM document with start/stop comments added to it. I need to extract the strings between the start/stop comments so I can put them in a temporary file for modification. Right now my code selects everything, including the start/stop comments and the data outside them.
Dim DirFolder As String = txtDirectory.Text
Dim Directory As New IO.DirectoryInfo(DirFolder)
Dim allFiles As IO.FileInfo() = Directory.GetFiles("*.sgm")
Dim singleFile As IO.FileInfo
Dim Prefix As String
Dim newMasterFilePath As String
Dim masterFileName As String
Dim newMasterFileName As String
Dim startMark As String = "<!--#start#-->"
Dim stopMark As String = "<!--#stop#-->"
searchDir = txtDirectory.Text
Prefix = txtBxUnique.Text
For Each singleFile In allFiles
    If File.Exists(singleFile.FullName) Then
        Dim fileName = singleFile.FullName
        Debug.Print("file name : " & fileName)
        ' A backup first
        Dim backup As String = fileName & ".bak"
        File.Copy(fileName, backup, True)
        ' Load lines from the source file in memory
        Dim lines() As String = File.ReadAllLines(backup)
        ' Now re-create the source file and start writing lines inside a block
        ' Evaluate all the lines in the file.
        ' Set insideBlock to false
        Dim insideBlock As Boolean = False
        Using sw As StreamWriter = File.CreateText(backup)
            For Each line As String In lines
                If line = startMark Then
                    ' start writing at the line below
                    insideBlock = True
                    ' Evaluate if the next line is <!Stop>
                ElseIf line = stopMark Then
                    ' Stop writing
                    insideBlock = False
                ElseIf insideBlock = True Then
                    ' Write the current line in the block
                    sw.WriteLine(line)
                End If
            Next
        End Using
    End If
Next
This is the example text to test on.
<chapter id="Chapter_Overview"> <?Pub Lcl _divid="500" _parentid="0">
<title>Learning how to gather data</title>
<!--#start#-->
<section>
<title>ALTERNATE MISSION EQUIPMENT</title>
<para0 verdate="18 Jan 2019" verstatus="ver">
<title>
<applicabil applicref="xxx">
</applicabil>Three-Button Trackball Mouse</title>
<para>This is the example to grab all text between start and stop comments.
</para></para0>
</section>
<!--#stop#-->
Things to note: the start and stop comments ALWAYS fall on their own line, and a document can have multiple start/stop sections.
I thought about using a regex like this:
(<section>[\w+\w]+.*?<\/section>)\R(<\?Pub _gtinsert.*>\R<pgbrk pgnum.*?>\R<\?Pub /_gtinsert>)*
Or maybe use IndexOf and LastIndexOf, but I couldn't get that working.
You can read the entire file and split it, using the string array {"<!--#start#-->", "<!--#stop#-->"} as the separators, into this:
Element 0: Text before "<!--#start#-->"
Element 1: Text between "<!--#start#-->" and "<!--#stop#-->"
Element 2: Text after "<!--#stop#-->"
and take element 1. Then write it to your backup.
Dim text = File.ReadAllText(backup).Split({startMark, stopMark}, StringSplitOptions.RemoveEmptyEntries)(1)
Using sw As StreamWriter = File.CreateText(backup)
sw.Write(text)
End Using
Edit to address comment
I did make the original code a little compact. It can be expanded into the following, which lets you add some validation:
Dim text = File.ReadAllText(backup)
Dim split = text.Split({startMark, stopMark}, StringSplitOptions.RemoveEmptyEntries)
If split.Length <> 3 Then Throw New Exception("File didn't contain one or more delimiters.")
text = split(1)
Using sw As StreamWriter = File.CreateText(backup)
sw.Write(text)
End Using
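The Split answer takes only element 1, so it handles a single start/stop pair. Since the question says a document can contain several sections, here is a sketch that walks the text with IndexOf (the approach the question mentions) and collects every section in order. The module and function names are hypothetical, and it assumes the markers alternate start, stop, start, stop and never nest; an unmatched start marker at the end is ignored.

```vb
Imports System
Imports System.Collections.Generic

' Hypothetical helper module: a sketch, not a library API.
Module SectionExtractor
    ' Returns the text between every startMark/stopMark pair, in document order.
    Function ExtractSections(text As String, startMark As String, stopMark As String) As List(Of String)
        Dim sections As New List(Of String)
        Dim pos As Integer = 0
        Do
            Dim startIdx As Integer = text.IndexOf(startMark, pos, StringComparison.Ordinal)
            If startIdx < 0 Then Exit Do                 ' no more sections
            Dim contentStart As Integer = startIdx + startMark.Length
            Dim stopIdx As Integer = text.IndexOf(stopMark, contentStart, StringComparison.Ordinal)
            If stopIdx < 0 Then Exit Do                  ' start without a stop: ignore the rest
            sections.Add(text.Substring(contentStart, stopIdx - contentStart))
            pos = stopIdx + stopMark.Length              ' continue after this stop marker
        Loop
        Return sections
    End Function
End Module
```

Because the markers sit on their own lines, each section keeps the line break after the start marker and before the stop marker; call .Trim() on each one if you don't want them. To write the result back, something like File.WriteAllText(backup, String.Join(Environment.NewLine, ExtractSections(File.ReadAllText(backup), startMark, stopMark))) would work.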
What I want to do is replace every "A" in a string with "Bb", but my loop only runs over the original string, not the new one.
For example:
AAA
BbAA
BbBbA
It stops there because the original string only has a length of 3, so the loop only reads up to index 2 and never reaches the rest.
Dim txt As String
txt = output_text.Text
Dim a As String = a_equi.Text
Dim index As Integer = txt.Length - 1
Dim output As String = ""
For i = 0 To index
    If (txt(i) = TextBox1.Text) Then
        output = txt.Remove(i, 1).Insert(i, a)
        txt = output
        TextBox2.Text += txt + Environment.NewLine
    End If
Next
End Sub
I think this leaves us looking for a String.ReplaceFirst function. Since there isn't one, we can just write that function. Then the code that calls it becomes much more readable because it's quickly apparent what it's doing (from the name of the function).
Public Function ReplaceFirst(searched As String, target As String, replacement As String) As String
    'This input validation is just for completeness.
    'It's not strictly necessary.
    'If the searched string is "null", throw an exception.
    If (searched Is Nothing) Then Throw New ArgumentNullException("searched")
    'If the target string is "null", throw an exception.
    If (target Is Nothing) Then Throw New ArgumentNullException("target")
    'If the searched string doesn't contain the target string at all
    'then just return it; we're done.
    Dim foundIndex As Integer = searched.IndexOf(target)
    If (foundIndex = -1) Then Return searched
    'Build a new string that replaces the target with the replacement.
    Return String.Concat(searched.Substring(0, foundIndex), replacement, _
        searched.Substring(foundIndex + target.Length, searched.Length - (foundIndex + target.Length)))
End Function
Notice how when you read the code below, you don't even have to spend a moment trying to figure out what it's doing. It's readable. While the input string contains "A", replace the first "A" with "Bb".
Dim input As String = "AAA"
While input.IndexOf("A") > -1
    input = ReplaceFirst(input, "A", "Bb")
    'If you need to capture individual values of "input" as it changes
    'add them to a list.
End While
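ReplaceFirst above is an ordinary function, so it is called as ReplaceFirst(input, "A", "Bb"). If you would rather write input.ReplaceFirst("A", "Bb"), the function has to be declared as an extension method inside a Module. A sketch follows; the module name StringExtensions is hypothetical, and this version also rejects an empty target, because IndexOf returns 0 for an empty string and a loop that keeps calling it while the target is found would never end.

```vb
Imports System
Imports System.Runtime.CompilerServices

' Hypothetical module name; extension methods must live in a Module.
Module StringExtensions
    ' Extension-method form of ReplaceFirst, so you can write s.ReplaceFirst("A", "Bb").
    ' Ordinal comparison matches character by character, with no culture rules.
    <Extension()>
    Public Function ReplaceFirst(searched As String, target As String, replacement As String) As String
        If searched Is Nothing Then Throw New ArgumentNullException(NameOf(searched))
        If String.IsNullOrEmpty(target) Then Throw New ArgumentException("target must not be empty.", NameOf(target))
        Dim foundIndex As Integer = searched.IndexOf(target, StringComparison.Ordinal)
        If foundIndex = -1 Then Return searched          ' nothing to replace
        Return searched.Substring(0, foundIndex) & replacement & searched.Substring(foundIndex + target.Length)
    End Function
End Module
```

Use this instead of the plain function, not alongside it in the same scope, so the two declarations don't collide.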
You could optimize or completely replace the function. What matters is that your code is readable, someone can tell what it's doing, and the ReplaceFirst function is testable.
Then, let's say you wanted another function that gave you all of the "versions" of your input string as the target string is replaced:
Public Function GetIterativeReplacements(searched As String, target As String, replacement As String) As List(Of String)
    Dim output As New List(Of String)
    While searched.IndexOf(target) > -1
        searched = ReplaceFirst(searched, target, replacement)
        output.Add(searched)
    End While
    Return output
End Function
If you call
Dim output As List(Of String) = GetIterativeReplacements("AAAA", "A", "Bb")
It's going to return a list of strings containing
BbAAA, BbBbAA, BbBbBbA, BbBbBbBb
It's almost always good to keep methods short. If they start to get too long, just break them into smaller methods with clear names. That way you're not trying to read and follow and test one big, long function. That's difficult whether or not you're a new programmer. The trick isn't being able to create long, complex functions that we understand because we wrote them - it's creating small, simpler functions that anyone can understand.
Check the comments for a better solution, but for future reference: use a While loop instead of a For loop if the loop's condition can change while it runs and you want that change taken into account.
I've made a simple example below to help you understand. If you tried the same thing with a For loop, you'd only get "one", "two" and "three" printed, because a For loop evaluates its end value only once, when the loop starts, so it doesn't 'see' that vals was changed.
Dim vals As New List(Of String)
vals.Add("one")
vals.Add("two")
vals.Add("three")
Dim i As Integer = 0
While i < vals.Count
    Console.WriteLine(vals(i))
    If vals(i) = "two" Then
        vals.Add("four")
        vals.Add("five")
    End If
    i += 1
End While
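For comparison, here is the same loop written with For, as a small sketch (the module and function names are made up for the example). VB evaluates the For loop's end value, vals.Count - 1, once before the first iteration, so the two items added along the way are never visited even though the list grows to five items.

```vb
Imports System
Imports System.Collections.Generic

' Illustrative names only.
Module ForVersusWhile
    ' Visits the list with a For loop and returns the items it actually saw.
    Function VisitWithFor(vals As List(Of String)) As List(Of String)
        Dim visited As New List(Of String)
        ' The end value (vals.Count - 1 = 2 here) is fixed when the loop starts.
        For i As Integer = 0 To vals.Count - 1
            visited.Add(vals(i))
            If vals(i) = "two" Then
                vals.Add("four")
                vals.Add("five")
            End If
        Next
        Return visited
    End Function

    Sub Main()
        Dim vals As New List(Of String) From {"one", "two", "three"}
        Dim visited = VisitWithFor(vals)
        Console.WriteLine(String.Join(" ", visited))   ' one two three
        Console.WriteLine(vals.Count)                  ' 5
    End Sub
End Module
```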
If you do want to replace one by one instead of using the Replace function, you could use a while loop to look for the index of your search character/string, and then replace/insert at that index.
Sub Main()
    Dim a As String = String.Empty
    Dim b As String = String.Empty
    Dim c As String = String.Empty
    Dim d As Int32 = -1
    Console.Write("Whole string: ")
    a = Console.ReadLine()
    Console.Write("Replace: ")
    b = Console.ReadLine()
    Console.Write("Replace with: ")
    c = Console.ReadLine()
    If b.Length = 0 Then Return   ' an empty search string would match everywhere
    d = a.IndexOf(b)
    While d > -1
        a = a.Remove(d, b.Length)
        a = a.Insert(d, c)
        ' Keep searching after the inserted text, so a replacement that itself
        ' contains the search string cannot cause an endless loop.
        d = a.IndexOf(b, d + c.Length)
    End While
    Console.WriteLine("Finished string: " & a)
    Console.ReadLine()
End Sub
Output would look like this:
Whole string: This is A string for replAcing chArActers.
Replace: A
Replace with: Bb
Finished string: This is Bb string for replBbcing chBbrBbcters.
I was going to write a while loop to answer your question, but realized (with assistance from others) that you could just use .Replace(x, y):
Output.Text = Input.Text.Replace("A", "Bb")
'Input = N A T O
'Output = N Bb T O
Edit: There is probably a better alternative, but I quickly jotted this loop down; I hope it helps.
You've said you're new and don't fully understand While loops. If you don't understand functions or how to pass arguments to them either, I'd suggest looking those up too.
This is your event handler. It can be a Button Click or a TextBox TextChanged event.
'Cut & Paste into an Event (Change textboxes to whatever you have input/output)
Dim Input As String = textbox1.Text
Do While Input.Contains("A")
Input = ChangeString(Input, "A", "Bb")
' Do whatever you like with each return of ChangeString() here
Loop
textbox2.Text = Input
This is your function, with three arguments and a return value, which you can call from your code:
' Cut & Paste into Code somewhere (not inside another sub/Function)
Private Function ChangeString(Input As String, LookFor As Char, ReplaceWith As String) As String
    ' Replaces only the first occurrence of LookFor in Input.
    Dim Output As String = ""
    Dim cFlag As Boolean = False   ' True once the first match has been replaced
    For i As Integer = 0 To Input.Length - 1
        Dim c As Char = Input(i)
        If (c = LookFor) AndAlso (cFlag = False) Then
            Output &= ReplaceWith
            cFlag = True
        Else
            Output &= c
        End If
    Next
    Console.WriteLine("Output: " & Output)
    Return Output
End Function
I have this code, but it has errors. What should I do?
Dim lines As New List(Of String)
lines = RichTextBox1.Lines.ToList
'Dim FilterText = "#"
For i As Integer = lines.Count - 1 To 0 Step -1
'If (lines(i).Contains(FilterText)) Then
RichTextBox1.Lines(i) = RichTextBox1.Lines(i).Replace("#", "#sometext")
'End If
Next
RichTextBox1.Lines = lines.ToArray
Update: while the following "works", it only modifies the array returned by the Lines property. Changing that array does not change the text of the RichTextBox, so you need to re-assign the whole array to the Lines property if you want to change the text (as shown below). I keep the first part of my answer only because it fixes the syntax, not the real issue.
It's not
RichTextBox1.Lines(i).Replace = "#sometext"
but
RichTextBox1.Lines(i) = "#sometext"
You can loop over the Lines forward; the reverse loop is not needed here.
Maybe you want to replace all "#" with "#sometext" instead:
RichTextBox1.Lines(i) = RichTextBox1.Lines(i).Replace("#","#sometext")
So here is the full code needed (since it still seems to be a problem):
Dim newLines As New List(Of String)
For i As Integer = 0 To RichTextBox1.Lines.Length - 1
newLines.Add(RichTextBox1.Lines(i).Replace("#", "#sometext"))
Next
RichTextBox1.Lines = newLines.ToArray()
But maybe you could even use:
RichTextBox1.Text = RichTextBox1.Text.Replace("#", "#sometext")
But if we have "#abcd", this code changes it to "#sometextabcd"! I want code that completely replaces, for example, line 1 with "#sometext".
Please provide all relevant information up front next time:
Dim newLines As New List(Of String)
For Each line As String In RichTextBox1.Lines
Dim newLine = If(line.Contains("#"), "#sometext", line)
newLines.Add(newLine)
Next
RichTextBox1.Lines = newLines.ToArray()
I'm a bit of a newbie, so any advice would be great. I have a program that opens a CSV file and then saves it as a CSV with a different name. There will be a set of rules to change fields, but I haven't got that far yet.
When I run this on a small CSV file (about 4 columns and rows) it works fine, but with a larger file it fails with the error above. I'm sure it's something daft, but I'm at a loss.
Thanks,
Dean
Dim FileName = tbOpen.Text
Dim fileout = tbSave.Text
Dim lines = File.ReadAllLines(FileName)
Dim output As New List(Of String)
For Each line In lines
Dim fields = line.Split(","c)
If fields(1) = "" Then 'This is where the error is triggered
fields(1) = "Norman"
End If
If fields(3) = "" Then
fields(3) = "Blue Leather"
End If
If fields(4) = "" Then
fields(3) = "Interlined"
End If
output.Add(String.Join(","c, fields))
Next
File.WriteAllLines(fileout, output)
Try
Dim a As String = My.Computer.FileSystem.ReadAllText(tbSave.Text)
Dim b As String() = a.Split(vbNewLine)
ListBox2.Items.AddRange(b)
Catch ex As Exception
MsgBox("error")
End Try
Keep in mind that arrays start at index 0. VB is notorious for stretching this concept: the number in a declaration such as Dim arr(10) As String is the upper bound, not the element count, so its indexes run from 0 to 10, which actually gives you 11 elements. In the same way, a line with only 4 comma-separated fields has indexes 0 to 3, so fields(4) does not exist.
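Applied to the question's code, the error at fields(1) most likely comes from a line that contains no comma at all (for example, a blank line at the end of the file): Split then returns a one-element array, so only fields(0) exists. A sketch that checks the array length before touching a column follows. The module and helper names are hypothetical, and because the question's third If tests fields(4) but assigns fields(3), putting "Interlined" into fields(4) is an assumption.

```vb
Imports System
Imports System.Collections.Generic
Imports System.IO

' Hypothetical names; column positions and default values come from the question.
Module CsvDefaults
    ' Fills empty fields with defaults, skipping columns a short (or blank) line doesn't have.
    Function ApplyDefaults(line As String) As String
        Dim fields As String() = line.Split(","c)
        SetIfEmpty(fields, 1, "Norman")
        SetIfEmpty(fields, 3, "Blue Leather")
        SetIfEmpty(fields, 4, "Interlined")   ' assumed target column
        Return String.Join(",", fields)
    End Function

    ' Only touches fields(index) when that index exists in the array.
    Private Sub SetIfEmpty(fields As String(), index As Integer, value As String)
        If index < fields.Length AndAlso fields(index) = "" Then fields(index) = value
    End Sub

    ' Same overall flow as the question: read all lines, transform, write them out.
    Sub ConvertFile(inPath As String, outPath As String)
        Dim output As New List(Of String)
        For Each line As String In File.ReadAllLines(inPath)
            output.Add(ApplyDefaults(line))
        Next
        File.WriteAllLines(outPath, output)
    End Sub
End Module
```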
I have a text file with the format:
(title,price,id#)
CD1,11.00,111111
CD2,12.00,222222
CD3,13.00,333333
CD4,14.00,444444
CD5,15.00,555555
CD6,16.00,666666
What is the best way to change the price of the appropriate CD if I'm given the id# and the new price?
I'm sure it has something to do with getting the line and splitting it, but I'm not sure how to edit just one line without messing up the whole file.
You can't rewrite a line without rewriting the entire file (unless the lines happen to be the same length). For such a small file it's probably the easiest to change the line in memory and then rewrite all to the file:
Dim idToFind = "444444"
Dim newPrice = "100"
' path is the full path of the CD file
Dim lines = IO.File.ReadAllLines(path)
For i = 0 To lines.Length - 1
    Dim line = lines(i)
    Dim fields = line.Split(","c)
    If fields.Length > 2 Then
        Dim id = fields(2)
        If id = idToFind Then
            Dim title = fields(0)
            lines(i) = String.Format("{0},{1},{2}", title, newPrice, id)
            Exit For
        End If
    End If
Next
IO.File.WriteAllLines(path, lines)
Okay, now that we know it's a short file, life becomes much easier:
Load the file into an array of lines using File.ReadAllLines
Find the right line using string.Split to split each line into the constituent parts, and check the ID.
When you've found the right line, replace it with the complete new line
Write the file back with File.WriteAllLines
That should be enough to get you going.
If it's just a file with about 25 lines, you could do a simple input-transform-output routine and update the price line by line.
Something like this (using a StreamReader and StreamWriter):
Sub UpdatePrice(ByVal pricesToUpdate As Dictionary(Of Integer, String), ByVal inputPath As String)
    If Not IO.File.Exists(inputPath) Then Return
    Try
        Using inputStream = New IO.StreamReader(inputPath, System.Text.Encoding.UTF8, True)
            Using outputStream = New IO.StreamWriter(inputPath + ".tmp", False, System.Text.Encoding.UTF8)
                While Not inputStream.EndOfStream
                    Dim inputLine = inputStream.ReadLine
                    Dim content = inputLine.Split(","c)
                    If Not content.Length >= 3 Then
                        outputStream.WriteLine(inputLine)
                        Continue While
                    End If
                    Dim id As Integer
                    If Not Integer.TryParse(content(2), id) Then
                        outputStream.WriteLine(inputLine)
                        Continue While
                    End If
                    If Not pricesToUpdate.ContainsKey(id) Then
                        outputStream.WriteLine(inputLine)
                        Continue While
                    End If
                    content(1) = pricesToUpdate(id)
                    outputStream.WriteLine(String.Join(",", {content(0), content(1), content(2)}))
                End While
            End Using
        End Using
        If IO.File.Exists(inputPath + ".tmp") Then
            IO.File.Delete(inputPath)
            IO.File.Move(inputPath + ".tmp", inputPath)
        End If
    Catch ex As IO.IOException
        If IO.File.Exists(inputPath + ".tmp") Then IO.File.Delete(inputPath + ".tmp")
    End Try
End Sub