Remove "Invisible" Control Characters in VB.Net

I am currently reading in a text file in VB.Net using
Dim fileReader As String
fileReader = My.Computer.FileSystem.ReadAllText(file)
The file contains several lines of text, and when I read it in, the text is recognized as separate lines and printed out accordingly.
However, when I try to split fileReader into an array of lines, the line break seems to remain, even if I use Split(ControlChars.Cr) or Split(ControlChars.NewLine). The text is split into separate lines, but when I display them, each line is "pushed" down by one, as if the line break were still there.
Does anyone have any idea what is going on and how I can remove these "invisible" control characters?
Text File:
Test1
Test2
Test3
Test4
fileReader:
Test1
Test2
Test3
Test4
lines() printout:
Test1
Test2
Test3
Test4

Use .Trim() (the String.Trim method) on each line; it removes extraneous leading and trailing whitespace, including a leftover CR or LF.
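A minimal sketch of the Trim() suggestion, assuming the text was split on ControlChars.Cr as in the question. It calls the String.Trim() method, which strips all leading and trailing white space (CR and LF included); the Visual Basic Trim() function is documented as removing spaces, so it would leave the line feed behind. TrimLinesDemo and SplitAndTrim are hypothetical names.

```vbnet
Imports System
Imports Microsoft.VisualBasic

Module TrimLinesDemo
    ' Hypothetical helper: split on CR only, then trim each piece.
    ' String.Trim() removes the LF left at the start of every line after
    ' the first, but it also removes leading/trailing spaces and tabs.
    Function SplitAndTrim(text As String) As String()
        Dim parts As String() = text.Split(ControlChars.Cr)
        For i As Integer = 0 To parts.Length - 1
            parts(i) = parts(i).Trim()
        Next
        Return parts
    End Function

    Sub Main()
        Dim fileReader As String = "Test1" & vbCrLf & "Test2" & vbCrLf & "Test3"
        For Each line As String In SplitAndTrim(fileReader)
            ' Brackets make any leftover invisible characters obvious.
            Console.WriteLine("[" & line & "]")
        Next
    End Sub
End Module
```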

The System.IO.File class has a ReadAllLines method that will actually give you back an array of strings, one per line.
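File.ReadAllLines does the splitting for you: each element is one line with its terminator (CRLF, CR, or LF) already removed. A small sketch, assuming a hypothetical sample file written to the temp folder purely for the demo; ReadAllLinesDemo and ReadSampleLines are illustrative names.

```vbnet
Imports System
Imports System.IO
Imports Microsoft.VisualBasic

Module ReadAllLinesDemo
    ' Writes a small CRLF-terminated sample file (hypothetical path), then
    ' reads it back with File.ReadAllLines, one array element per line.
    Function ReadSampleLines() As String()
        Dim filePath As String = Path.Combine(Path.GetTempPath(), "readalllines-demo.txt")
        File.WriteAllText(filePath, "Test1" & vbCrLf & "Test2" & vbCrLf & "Test3" & vbCrLf & "Test4")
        Return File.ReadAllLines(filePath)
    End Function

    Sub Main()
        For Each line As String In ReadSampleLines()
            Console.WriteLine("[" & line & "]")
        Next
    End Sub
End Module
```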
If that method doesn't work either, I would examine exactly which bytes are causing the issue. In the Watch window, you can evaluate System.Text.Encoding.ASCII.GetBytes(sampleLine) and see exactly what you are working with.
I'm assuming you are using ASCII encoding. If not, you'll need to swap out ASCII for the correct encoding, and also change your file read to use that encoding.
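The same inspection can be done in code by printing the byte values. A sketch assuming ASCII-compatible text, as the answer does; in the output, 13 is a carriage return and 10 is a line feed. ByteDumpDemo and DescribeBytes are hypothetical names.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text
Imports Microsoft.VisualBasic

Module ByteDumpDemo
    ' Lists the ASCII byte value of each character so invisible ones show up
    ' (13 = CR, 10 = LF). Non-ASCII characters are encoded as 63 ("?").
    Function DescribeBytes(sampleLine As String) As String
        Dim codes As New List(Of String)
        For Each b As Byte In Encoding.ASCII.GetBytes(sampleLine)
            codes.Add(b.ToString())
        Next
        Return String.Join(" ", codes.ToArray())
    End Function

    Sub Main()
        ' A line as it comes out of Split(ControlChars.Cr): LF still at the front.
        Console.WriteLine(DescribeBytes(vbLf & "Test2"))  ' 10 84 101 115 116 50
    End Sub
End Module
```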

As mentioned, use the ReadAllLines method to have the text split automatically.
The problem you are having is that PC (Windows) text files usually end each line with a carriage return followed by a line feed, so splitting on just one of them leaves the other behind. You can split and trim as mentioned, or use the version of Split that splits on strings instead of chars:
Dim s() As String = Split(fileReader, vbCrLf)
Trim will also remove leading and trailing spaces from the data, which could be a problem depending on your situation.
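A sketch contrasting the two kinds of split on the sample text from the question: String.Split with the single Char ControlChars.Cr leaves the LF at the start of each following line, while the Visual Basic Split function treats the vbCrLf string as one separator. SplitOnCrLfDemo, SplitOnCrOnly and SplitOnCrLf are hypothetical names.

```vbnet
Imports System
Imports Microsoft.VisualBasic

Module SplitOnCrLfDemo
    ' String.Split with a single Char: CR is removed, LF stays attached
    ' to the start of every line after the first.
    Function SplitOnCrOnly(text As String) As String()
        Return text.Split(ControlChars.Cr)
    End Function

    ' The Visual Basic Split function takes a String delimiter, so the
    ' two-character vbCrLf sequence is treated as a single separator.
    Function SplitOnCrLf(text As String) As String()
        Return Split(text, vbCrLf)
    End Function

    Sub Main()
        Dim fileReader As String = "Test1" & vbCrLf & "Test2" & vbCrLf & "Test3"
        For Each line As String In SplitOnCrOnly(fileReader)
            Console.WriteLine("CR only, length " & line.Length)
        Next
        For Each line As String In SplitOnCrLf(fileReader)
            Console.WriteLine("vbCrLf, length " & line.Length)
        Next
    End Sub
End Module
```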

Ran into a similar problem recently. Trim() doesn't work because the extra lines are already there after doing the split (or using File.ReadAllLines). Here's what worked for me:
Dim allText As String = System.IO.File.ReadAllText(filePath)
' Remove every carriage return so only the LF separators remain.
allText = allText.Replace(Chr(13), "")
' ControlChars.Lf is a Char, so this also compiles with Option Strict On.
Dim lines As String() = allText.Split(ControlChars.Lf)
Chr(13) is the Control-M (carriage return) character, which results in extra lines when using Split() or File.ReadAllLines.
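The thread's title asks about removing invisible control characters in general. A minimal sketch that drops every character for which Char.IsControl returns True, applied to each piece after splitting on LF as in the answer above. Note that this also removes tabs and NUL characters, which may or may not be what you want. StripControlCharsDemo, StripControlChars and CleanLines are hypothetical names.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text
Imports Microsoft.VisualBasic

Module StripControlCharsDemo
    ' Removes every character for which Char.IsControl returns True
    ' (CR, LF, tab, NUL and the other C0/C1 control characters).
    Function StripControlChars(s As String) As String
        Dim sb As New StringBuilder(s.Length)
        For Each c As Char In s
            If Not Char.IsControl(c) Then sb.Append(c)
        Next
        Return sb.ToString()
    End Function

    ' Splits on LF first, then cleans each piece. Pieces that are empty
    ' after cleaning are dropped (this also drops genuinely blank lines).
    Function CleanLines(allText As String) As String()
        Dim lines As New List(Of String)
        For Each piece As String In allText.Split(ControlChars.Lf)
            Dim cleaned As String = StripControlChars(piece)
            If cleaned.Length > 0 Then lines.Add(cleaned)
        Next
        Return lines.ToArray()
    End Function

    Sub Main()
        ' Sample text with an extra CR before one of the CRLF pairs.
        Dim allText As String = "Test1" & vbCr & vbCrLf & "Test2" & vbCrLf & "Test3"
        For Each line As String In CleanLines(allText)
            Console.WriteLine("[" & line & "]")
        Next
    End Sub
End Module
```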

Related

Streamwriter adding unwanted characters to the beginning of the line

I've written a program that writes out an automated, delimited invoice file. It seems to work properly, and the file looks correct in Notepad. However, when the file is received on the other end, there are a couple of extra characters at the beginning.
The code is basically:
fWriter = My.Computer.FileSystem.OpenTextFileWriter(OutputPath, False)
LineString = "..." ' insert line here
fWriter.WriteLine(LineString)
The screenshot from the client shows three odd-shaped characters at the beginning. They aren't in the input string, and I'm led to believe it's because OpenTextFileWriter isn't writing ASCII, even though it's a flat text file, or is supposed to be.
Any help would be appreciated.
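The three odd characters are very likely the UTF-8 byte order mark (bytes EF BB BF), which the receiving program shows as text; this is an assumption based on the symptom, not something confirmed in the question. A sketch that writes UTF-8 without a BOM via New UTF8Encoding(False) and checks the first bytes of the result. If the receiver really needs ASCII, Encoding.ASCII writes no preamble either, and My.Computer.FileSystem.OpenTextFileWriter also has an overload that accepts an Encoding. NoBomWriterDemo, WriteInvoiceLines and StartsWithUtf8Bom are hypothetical names.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Text

Module NoBomWriterDemo
    ' Writes each line as UTF-8 but without a byte order mark:
    ' New UTF8Encoding(False) tells the encoder not to emit EF BB BF.
    Sub WriteInvoiceLines(outputPath As String, lines As IEnumerable(Of String))
        Using fWriter As New StreamWriter(outputPath, False, New UTF8Encoding(False))
            For Each lineString As String In lines
                fWriter.WriteLine(lineString)
            Next
        End Using
    End Sub

    ' True if the file starts with the UTF-8 BOM bytes EF BB BF.
    Function StartsWithUtf8Bom(filePath As String) As Boolean
        Dim data As Byte() = File.ReadAllBytes(filePath)
        Return data.Length >= 3 AndAlso data(0) = &HEF AndAlso data(1) = &HBB AndAlso data(2) = &HBF
    End Function

    Sub Main()
        ' Hypothetical output path and invoice lines for the demo.
        Dim outputPath As String = Path.Combine(Path.GetTempPath(), "invoice-demo.txt")
        WriteInvoiceLines(outputPath, New String() {"INV001|10.00", "INV002|25.50"})
        Console.WriteLine(StartsWithUtf8Bom(outputPath))  ' False
    End Sub
End Module
```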

Read a text file and display result in a window

I have a text file that contains about 60 lines. I would like to parse all the text from that file and display it in a window. The words in the text file are separated by underscores, and I would like to use a regular expression to solve this problem.
Update:
This is my code so far. I am trying to read the file named by filename in my code.
Dim filename = "D:\databases.txt"
Dim regexpression As String = "/^[^_]*_([^_]*)\w/"
I know I don't have much done here, but I am trying to learn VB on my own and have gotten stuck.
Please feel free to suggest what I should be doing instead.
Something like this:
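TextBox1.Lines = IO.File.ReadAllLines(filename)
To remove underscores (String.Replace works on a single string, so read the whole file as text instead of as an array of lines):
TextBox1.Text = IO.File.ReadAllText(filename).Replace("_", String.Empty)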
If you also need other special characters removed, you can use Regex.Replace:
Remove special characters from a string
Also on MSDN:
How to: Strip Invalid Characters from a String
Or the old-school way: loop through all the characters and keep only those you need:
Most efficient way to remove special characters from string
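A sketch of both approaches mentioned in this answer, applied to the underscore case from the question. The set of characters kept here (ASCII letters, digits and spaces) is an assumption; adjust the pattern or the condition to keep whatever your data needs. The two are not identical for non-ASCII input, since Char.IsLetterOrDigit also accepts accented and other Unicode letters. RemoveSpecialCharsDemo, RemoveWithRegex and RemoveWithLoop are hypothetical names.

```vbnet
Imports System
Imports System.Text
Imports System.Text.RegularExpressions

Module RemoveSpecialCharsDemo
    ' Regex approach: delete everything except ASCII letters, digits and spaces.
    Function RemoveWithRegex(s As String) As String
        Return Regex.Replace(s, "[^A-Za-z0-9 ]", String.Empty)
    End Function

    ' Loop approach: keep only the characters you want.
    ' Char.IsLetterOrDigit also accepts non-ASCII letters and digits.
    Function RemoveWithLoop(s As String) As String
        Dim sb As New StringBuilder(s.Length)
        For Each c As Char In s
            If Char.IsLetterOrDigit(c) OrElse c = " "c Then sb.Append(c)
        Next
        Return sb.ToString()
    End Function

    Sub Main()
        Dim sample As String = "alpha_beta gamma!"
        Console.WriteLine(RemoveWithRegex(sample))  ' alphabeta gamma
        Console.WriteLine(RemoveWithLoop(sample))   ' alphabeta gamma
    End Sub
End Module
```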

Cutting up a CSV file using split

I have a CSV file. When I use the Split function, my issue is that the 16th segment of the array (in most cases) contains a name whose first and last parts are separated by a comma. This obviously causes problems because it puts my array out of sync. Any suggestions on how I can handle this?
The string in the 16th segment is surrounded by double quotes, if that helps, but the Split function still splits it.
You can use TextFieldParser, as indicated here.
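A minimal TextFieldParser sketch (Microsoft.VisualBasic.FileIO) that parses a single line. With HasFieldsEnclosedInQuotes = True, a quoted field such as "Smith, John" stays one field and comes back without its quotes. TextFieldParserDemo and ParseCsvLine are hypothetical names.

```vbnet
Imports System
Imports System.IO
Imports Microsoft.VisualBasic.FileIO

Module TextFieldParserDemo
    ' Parses one CSV line; quoted fields may contain commas.
    Function ParseCsvLine(line As String) As String()
        Using parser As New TextFieldParser(New StringReader(line))
            parser.TextFieldType = FieldType.Delimited
            parser.SetDelimiters(",")
            ' Treat "..." as one field and strip the surrounding quotes.
            parser.HasFieldsEnclosedInQuotes = True
            Return parser.ReadFields()
        End Using
    End Function

    Sub Main()
        Dim fields As String() = ParseCsvLine("42,""Smith, John"",Paid")
        Console.WriteLine(fields.Length)  ' 3
        Console.WriteLine(fields(1))      ' Smith, John
    End Sub
End Module
```

For a whole file, pass the file path to the TextFieldParser constructor instead and call ReadFields in a loop while Not parser.EndOfData.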
I recommend the Lumen CSV Library; it correctly handles field values that contain commas.
It also has very good performance and is very simple to use.
See the link above; it won't disappoint you.
I think you're missing the point. Split is only good for simple CSV parsing; anything even a little more complicated means a lot of extra code. Something like TextFieldParser is better suited to what you want. However, if you must use Split, here's one way:
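Dim TempArray() As String
Dim Output As New List(Of String)
If SourceString.Contains("""") Then
    ' (0) = fields before the quoted one, (1) = the quoted field, (2) = fields after it.
    TempArray = SourceString.Split(""""c)
    ' Drop the comma just before the opening quote, or Split adds an empty entry.
    ' (Assumes the quoted field is not the first field on the line.)
    Output.AddRange(TempArray(0).Substring(0, TempArray(0).Length - 1).Split(","c))
    Output.Add(TempArray(1))
    ' If the quoted part of the CSV line is at the end of the line, omit this statement.
    ' Otherwise drop the comma just after the closing quote before splitting.
    Output.AddRange(TempArray(2).Substring(1).Split(","c))
Else
    Output = New List(Of String)(SourceString.Split(","c))
End If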
This assumes that the data is strictly organized apart from the quotes; if not, you'll have to add validation code.
Split on "," (quote, comma, quote) instead of on a bare comma. Don't forget to handle the first and last quotes on the line.
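A sketch of the quote-comma-quote split, assuming every field on the line is wrapped in double quotes; if only some fields are quoted, this breaks and TextFieldParser is the safer choice. QuoteCommaQuoteDemo and SplitQuotedLine are hypothetical names.

```vbnet
Imports System

Module QuoteCommaQuoteDemo
    ' Assumes EVERY field is wrapped in double quotes, e.g. "1","Smith, John","x".
    Function SplitQuotedLine(line As String) As String()
        Dim inner As String = line.Trim()
        ' Remove the opening quote of the first field and the closing quote
        ' of the last field.
        If inner.Length > 0 AndAlso inner(0) = """"c Then inner = inner.Substring(1)
        If inner.Length > 0 AndAlso inner(inner.Length - 1) = """"c Then inner = inner.Substring(0, inner.Length - 1)
        ' Split on the three-character separator "," (quote, comma, quote).
        Return inner.Split(New String() {""","""}, StringSplitOptions.None)
    End Function

    Sub Main()
        For Each field As String In SplitQuotedLine("""1"",""Smith, John"",""x""")
            Console.WriteLine("[" & field & "]")
        Next
    End Sub
End Module
```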

How to remove white space except vbCr/vbCrLf/newline in VB.NET

I am currently using this code to remove excess newlines from a split ("exploded") string:
Me.rtb.Lines = Me.rtb.Text.Split(New Char() {ControlChars.Lf}, _
StringSplitOptions.RemoveEmptyEntries)
I use a RichTextBox. I split the string with:
incoming = stringOfRtb.Split(ControlChars.CrLf.ToCharArray) ' vbCrLf splitter
As a result, I get one string per line. But sometimes I think the first code removes not only the white space but also the vbCrLf or the newlines that the module sends back. The awful thing is that it happens in random places, so the strings I put into text boxes get shuffled and end up in different array elements every time I receive the same data.
Sometimes, it's like this:
While rtb.Text.Contains(vbLf & vbLf)
    rtb.Text = rtb.Text.Replace(vbLf & vbLf, vbLf)
End While
This code should collapse doubled line feeds into one.
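One possible contributor to the shifting array indexes (an assumption; the thread doesn't confirm it): ControlChars.CrLf.ToCharArray() makes CR and LF two separate separators, so every CRLF pair yields an extra empty entry between them. A sketch comparing that with splitting on the whole CRLF string; CrLfSplitDemo, SplitOnEitherChar and SplitOnCrLfString are hypothetical names.

```vbnet
Imports System
Imports Microsoft.VisualBasic

Module CrLfSplitDemo
    ' ToCharArray() turns CRLF into two separate separators, so "a" CRLF "b"
    ' becomes three entries: "a", "" (between CR and LF), and "b".
    Function SplitOnEitherChar(text As String) As String()
        Return text.Split(ControlChars.CrLf.ToCharArray())
    End Function

    ' Splitting on the whole CRLF string keeps exactly one entry per line.
    Function SplitOnCrLfString(text As String) As String()
        Return text.Split(New String() {ControlChars.CrLf}, StringSplitOptions.None)
    End Function

    Sub Main()
        Dim received As String = "line1" & vbCrLf & "line2" & vbCrLf & "line3"
        Console.WriteLine(SplitOnEitherChar(received).Length)  ' 5
        Console.WriteLine(SplitOnCrLfString(received).Length)  ' 3
    End Sub
End Module
```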

String Manipulation Inconsistency

This is a more general question. I am reading a document and saving its contents into a string variable. The resulting variable contains approximately 1 million characters (no cleansing). My code then searches the string and extracts key words. However, I am hung up on an issue:
If I pass the Mid result directly to a message box, it shows me the contents:
Messagebox.Show(Mid(searchString, startPos, endPos))
However, if I first assign the Mid result to a string variable, the variable appears empty and the message box displays nothing:
Dim myString as String
myString = Mid(searchString, startPos, endPos)
Messagebox.Show(myString)
The same thing happens when I use .Substring and when I use a StringBuilder.
Does anybody know why this is happening? I assume something happens during assignment, but I am not sure what is lost.
Here is a snippet of code:
searchPos = textString.IndexOf(searchText, searchPos, StringComparison.OrdinalIgnoreCase)
MessageBox.Show(searchPos)
MessageBox.Show(Mid(textString, searchPos, 100))
So the inconsistency is this: textString is around 3,700,000 characters long. When I search with IndexOf, the value returned in the first message box is 455,225. However, if I try to pull out the characters using Mid, the second message box is blank.
Also, although the string reports a length of 3,700,000 characters, if I show textString in a message box, I only see around 6 characters of what appears to be XML. The file being searched is an old .ppt file, and I know I can work around it, but I am confused about how the computer can find the index of my searchText correctly but then cannot show me anything.
Thoughts?
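One likely explanation, offered as an assumption rather than a confirmed diagnosis: a binary file such as a legacy .ppt read as text contains many NUL characters (Chr(0)), and MessageBox stops displaying at the first one because the underlying Windows API works with null-terminated strings, even though the .NET string still holds every character and IndexOf searches all of them. Two smaller points also apply to the snippet: String.IndexOf returns a 0-based position while Mid's start argument is 1-based, and Mid's third argument is a length, not an end position. The sketch uses Substring (0-based) and replaces NUL characters before display; NullCharDemo, CountNulls and Excerpt are hypothetical names.

```vbnet
Imports System
Imports Microsoft.VisualBasic

Module NullCharDemo
    ' Counts NUL characters; a large count suggests binary data read as text.
    Function CountNulls(s As String) As Integer
        Dim count As Integer = 0
        For Each c As Char In s
            If c = ControlChars.NullChar Then count += 1
        Next
        Return count
    End Function

    ' Returns up to maxLength characters starting at a 0-based index (as
    ' returned by IndexOf), with NULs replaced by spaces so display code
    ' such as MessageBox.Show does not cut the text short.
    Function Excerpt(text As String, zeroBasedIndex As Integer, maxLength As Integer) As String
        If zeroBasedIndex < 0 OrElse zeroBasedIndex >= text.Length Then Return String.Empty
        Dim available As Integer = Math.Min(maxLength, text.Length - zeroBasedIndex)
        Return text.Substring(zeroBasedIndex, available).Replace(ControlChars.NullChar, " "c)
    End Function

    Sub Main()
        ' Hypothetical stand-in for the .ppt contents: text with embedded NULs.
        Dim textString As String = "<xml>" & ControlChars.NullChar & ControlChars.NullChar & "Key word here"
        Dim searchPos As Integer = textString.IndexOf("key", StringComparison.OrdinalIgnoreCase)
        Console.WriteLine(CountNulls(textString))               ' 2
        Console.WriteLine(Excerpt(textString, searchPos, 100))  ' Key word here
    End Sub
End Module
```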