Problem with File IO and splitting strings with Environment.NewLine in VB.Net

I was experimenting with basic VB.Net file I/O and string splitting and ran into the following problem. I don't know whether it is caused by the file I/O or by the string splitting.
I write text to a file like this:
Dim sWriter As New StreamWriter("Data.txt")
sWriter.WriteLine("FirstItem")
sWriter.WriteLine("SecondItem")
sWriter.WriteLine("ThirdItem")
sWriter.Close()
Then I read the text back from the file:
Dim sReader As New StreamReader("Data.txt")
Dim fileContents As String = sReader.ReadToEnd()
sReader.Close()
Next, I split fileContents using Environment.NewLine as the delimiter:
Dim tempStr() As String = fileContents.Split(Environment.NewLine)
When I print the resulting array, I get some strange results:
For Each str As String In tempStr
Console.WriteLine("*" + str + "*")
Next
I added the asterisks around each array item when printing, to see exactly what each one contains. Since NewLine is the delimiter, I expected none of the strings in the array to contain a newline. But the output was this:
*FirstItem*
*
SecondItem*
*
ThirdItem*
*
*
Shouldn't it be this:
*FirstItem*
*SecondItem*
*ThirdItem*
**
Why is there a newline at the beginning of every string except the first?
Update: I printed fileContents character by character, with each character's code, and got this:
F - 70
i - 105
r - 114
s - 115
t - 116
I - 73
t - 116
e - 101
m - 109
- 13
- 10
S - 83
e - 101
c - 99
o - 111
n - 110
d - 100
I - 73
t - 116
e - 101
m - 109
- 13
- 10
T - 84
h - 104
i - 105
r - 114
d - 100
I - 73
t - 116
e - 101
m - 109
- 13
- 10
It seems Environment.NewLine consists of:
- 13
- 10
13 and 10, I understand. But what about the blank space where the character should appear? I don't know whether it comes from printing to the console or is really part of NewLine.
So, when splitting, only the first character of NewLine is used as the delimiter (as explained in the answers). That character is the carriage return (ASCII 13). The line feed (ASCII 10) that follows it stays at the start of the next string, and that leftover line feed is what prints as a new line. The blank space in the listing above is most likely just the carriage return and line feed characters themselves, which have no visible glyph when printed.
Now it is clear. Thanks for the help. :)

First of all, yes: WriteLine appends a newline to the end of each string, hence the extra entry at the end.
The problem is the way you're calling fileContents.Split(). In the .NET Framework, the only overload of that method that takes a single argument takes a Char() (as a ParamArray), not a String. Environment.NewLine is a String, not a Char. Assuming you have Option Strict Off, the call therefore implicitly converts it to a Char, keeping only the first character of the string. So instead of splitting your string on the whole sequence that makes up Environment.NewLine (on Windows, a carriage return followed by a line feed), it splits only on the first of those characters, the carriage return. (.NET Core 2.0 and later also provide a Split(String, StringSplitOptions) overload, so on those versions the same call splits on the whole string.)
To get your desired output, you need to call it like this:
Dim delims() As String = { Environment.NewLine }
Dim tempStr() As String = fileContents.Split(delims, _
StringSplitOptions.RemoveEmptyEntries)
This splits on the whole string rather than on its first character. RemoveEmptyEntries drops any empty entries from the result, including the one produced by the final newline. Note that it will also drop any genuinely blank lines in the file.
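The mechanism is easy to reproduce outside VB. Below is a minimal Java sketch with illustrative class and method names. It compares splitting CR LF text on the carriage return alone with splitting on the full CR LF sequence. Java's String.split takes a regular expression, and a limit of -1 keeps trailing empty entries the way .NET's Split does by default.

```java
import java.util.Arrays;

class NewlineSplitDemo {
    // What StreamWriter.WriteLine produces on Windows: every line ends in CR LF.
    static final String CONTENTS = "FirstItem\r\nSecondItem\r\nThirdItem\r\n";

    // Splitting on the CR alone: the LF after each CR is left at the start of the next piece.
    static String[] splitOnFirstCharOnly(String text) {
        return text.split("\r", -1); // limit -1 keeps trailing empty entries, as .NET does by default
    }

    // Splitting on the full two-character CR LF sequence.
    static String[] splitOnWholeNewline(String text) {
        return text.split("\r\n", -1);
    }

    public static void main(String[] args) {
        for (String s : splitOnFirstCharOnly(CONTENTS)) {
            System.out.println("*" + s + "*");
        }
        System.out.println(Arrays.toString(splitOnWholeNewline(CONTENTS)));
    }
}
```

The loop in main prints the same asterisk pattern as the question's output, because each piece after the first begins with a line feed.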

Why not just use File.ReadAllLines? A single call reads the file and returns a string array with one element per line, with the line terminators already removed.
Dim tempStr() As String = File.ReadAllLines("Data.txt")
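File.ReadAllLines treats CR LF as a single line terminator and strips it. It also does not return an extra empty entry for the final newline. The same line-reading rules can be sketched in Java with BufferedReader, which recognises "\n", "\r" and "\r\n". The sketch reads from a string to stay self-contained; Files.readAllLines applies the same rules to a file. The class and method names are illustrative.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.List;
import java.util.stream.Collectors;

class ReadLinesDemo {
    // BufferedReader treats "\r\n", "\r" and "\n" as line terminators and strips them;
    // a terminator at the very end does not produce an extra empty line.
    static List<String> readLines(String text) {
        return new BufferedReader(new StringReader(text))
                .lines()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(readLines("FirstItem\r\nSecondItem\r\nThirdItem\r\n"));
    }
}
```

Unlike RemoveEmptyEntries, this keeps genuinely blank lines in the middle of the text. For example, "a\n\nb" gives three lines, the second one empty.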

I just ran into the same issue and found all the comments very helpful. However, I fixed it by replacing Environment.NewLine with vbLf (vbCrLf had the same issue as Environment.NewLine). Are there any problems with this approach? It seems more straightforward, but I'm not a programmer, so I wouldn't know of any potential issues.
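One caveat applies to a file with CR LF line endings. Splitting it on the line feed alone leaves the carriage return at the end of every item. That character is invisible when printed, but comparisons such as item = "FirstItem" will fail. The Java sketch below shows this, with illustrative names. It also shows one way to tolerate both Windows and Unix line endings: splitting on an optional CR followed by LF.

```java
import java.util.Arrays;

class LfSplitDemo {
    // Splitting CR LF text on LF alone: each item keeps its trailing CR.
    static String[] splitOnLf(String text) {
        return text.split("\n", -1); // limit -1 keeps the trailing empty entry, like .NET
    }

    // Tolerates both CR LF and LF input: an optional CR followed by LF.
    static String[] splitOnAnyNewline(String text) {
        return text.split("\r?\n", -1);
    }

    public static void main(String[] args) {
        String contents = "FirstItem\r\nSecondItem\r\n";
        String first = splitOnLf(contents)[0];
        System.out.println(first.length());             // 10: "FirstItem" plus the CR
        System.out.println(first.equals("FirstItem"));  // false
        System.out.println(Arrays.toString(splitOnAnyNewline(contents)));
    }
}
```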

Related

How to get the last number of a string using Selenium WebDriver

1 - 2 of 2
Above is my text; it comes from the pagination of a web application. How do I extract the last number from this text? That way I will get the number of items on the page and can run a loop over them.
You can use substring.
Let's consider your example. You have the String "1 - 2 of 2" (probably from pagination).
Each individual character is at a specific index of the String:
1 = 0
space = 1
- = 2
space = 3
etc.
String has a set of methods for various tasks. One of them is length(), which returns the number of characters in your String.
What you can do is pass the length of the String to substring.
Example:
myString.substring(0,1) returns "1"
myString.substring(0,myString.length()) returns "1 - 2 of 2"
Additional info: myString.length() returns an int, so you can do arithmetic on it, such as + or -.
myString.substring(0,myString.length()-1) returns "1 - 2 of " (everything except the last character)
I've given you the tools; now it's your turn to find the solution.
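Putting those tools together gives a minimal Java sketch. The method name lastNumber is hypothetical, and the sketch assumes the text always ends with the number, as in "1 - 2 of 2". lastIndexOf(' ') finds the space before the last number, and substring takes everything after it. Integer.parseInt then turns that text into an int you can loop over.

```java
class LastNumber {
    // Everything after the last space is the final number, e.g. "2" in "1 - 2 of 2".
    // If there is no space, lastIndexOf returns -1 and the whole string is used.
    static int lastNumber(String pagination) {
        String trimmed = pagination.trim();
        String lastToken = trimmed.substring(trimmed.lastIndexOf(' ') + 1);
        return Integer.parseInt(lastToken);
    }

    public static void main(String[] args) {
        int count = lastNumber("1 - 2 of 2");
        for (int i = 1; i <= count; i++) {
            System.out.println("item " + i);
        }
    }
}
```

This also works when the last number has more than one digit, because it takes everything after the last space rather than a single character.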
You could just split the string using the spaces and then grab the last element of the split array. That should cover you even if the last number has more than one digit. Throw in a trim, just in case, to remove any leading/trailing white space.
String[] splitter = pageCount.trim().split(" ");
System.out.println(splitter[splitter.length - 1]);

Find Each Occurrence of X and Insert a Carriage Return

A colleague is putting some data into a flat file (.txt) and needs to insert a carriage return before EACH occurrence of 'POL01', 'SUB01', 'VEH01' and 'MCO01'.
I tried:
For Each line1 As String In System.IO.File.ReadAllLines(BodyFileLoc)
If line1.Contains("POL01") OrElse line1.Contains("SUB01") OrElse line1.Contains("VEH01") OrElse line1.Contains("MCO01") Then
Writer.WriteLine(Environment.NewLine & line1)
Else
Writer.WriteLine(line1)
End If
Next
Unfortunately, it turns out that SSIS does not write the file as separate lines but as one long string.
How can I insert a carriage return before every occurrence of the above?
Test Text
POL01CALT302276F 332 NBPM 00101 20151113201511130001201611132359 2015111300010020151113000100SUB01CALT302276F 332 NBPMP01 Akl Abi-Khalil 19670131 M U33 Stoford Close SW19 6TJ 2015111300010020151113000100VEH01CALT302276F 332 NBPM001LV56 LEJ N 2006VAUXHALL CA 2015111300010020151113000100MCO01CALT302276F 332 NBPM0101 0 2015111300010020151113000100POL01CALT742569N
You can use regular expressions for this, specifically by using Regex.Replace to find and replace each occurrence of the strings you're looking for with a newline followed by the matching text:
Imports System.Text.RegularExpressions
Dim str As String = "xxxPOL01xxxSUB01xxxVEH01xxxMCO01xxx"
Dim output As String = Regex.Replace(str, "((?:POL|SUB|VEH|MCO)01)", Environment.NewLine & "$1")
'output contains:
'xxx
'POL01xxx
'SUB01xxx
'VEH01xxx
'MCO01xxx
There may be a better way to construct this regular expression, but this is a simple alternation on the different prefixes, followed by 01. The matched text is captured in group 1, which $1 refers to in the replacement string.
If you're new to regular expressions, there are a number of tools that help you understand them; for example, regex101.com will show you an explanation of the pattern used here.
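For comparison, the same alternation and $1 back-reference work with Java's String.replaceAll. The sketch below uses "\n" as the line break, and its class and method names are hypothetical.

```java
class MarkerBreaks {
    // Same pattern as the VB answer: one of four prefixes followed by "01",
    // captured as group 1 and written back after a line break.
    static String breakBeforeMarkers(String text) {
        return text.replaceAll("((?:POL|SUB|VEH|MCO)01)", "\n$1");
    }

    public static void main(String[] args) {
        System.out.println(breakBeforeMarkers("xxxPOL01xxxSUB01xxxVEH01xxxMCO01xxx"));
    }
}
```

The test text above starts with one of the markers, so its result begins with a line break; strip it if that matters.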

Finding Characters in String

I'm having this problem and I have no idea how to handle it.
Say I have a string with the following format:
"3 6 9 12 13 15 16"
I'm searching for "6" and I find it at position 3, and I remove it.
Next, I search for 6 again, and I find it at position IndexOf(6) (whatever that is). This time I don't want to remove it because it's the 6 in 16.
If string1.Contains("6") Then
string1 = string1.Remove(string1.IndexOf("6"), 2)
End If
This is VB.NET, but any solution to this problem would help.
P.S. This is just sample code; the main code I'm using has too many things attached to it, and cleaning it up for this example would be a nightmare.
You asked for a "fancier" solution so I'll give you one:
Dim input As String = "3 6 9 12 13 15 16"
Dim output As String = String.Join(" ", input.Split(" "c).Where(Function(s) s <> "6"))
Debug.WriteLine(output)
3 9 12 13 15 16
There could be more elegant ways of doing this, but if you are processing numbers, convert them to numbers first; then you can simply look for the number you are interested in.
If you need it in space delimited form, you can always convert it back.
I'm afraid there won't be any "shortcuts" with this one.
Another thing to consider: if you are working with actual space-delimited strings and are looking for patterns, regular expressions are the way to go.
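Dim input As String = "3 4 5 6 13 14 15 16"
Dim inputArray() As String = input.Split(" "c)
Dim lst As New List(Of Integer)
' Convert each token to a number so that 6 and 16 are distinct values
For Each s In inputArray
lst.Add(Convert.ToInt32(s))
Next
If lst.Contains(6) Then
lst.Remove(6) ' removes the first occurrence only
End If
' Convert back to space-delimited form
Dim output As String = String.Join(" ", lst)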
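Another option is a regular expression (tested with sed on Mac OS X). Be aware that this particular pattern can also strip part of a larger number, for example the 6 from 106, or from 16 when another number follows it: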
echo "6 3 6 9 12 13 15 16" | sed -E "s/(6 |[^1-9]6| 6$)//g"
# outputs
3 9 12 13 15 16
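A regex can be restricted to whole space-separated tokens by using lookarounds instead of consuming the neighbouring characters. In the Java sketch below, which uses hypothetical names, (?<!\S) means "not preceded by a non-space" and (?!\S) means "not followed by a non-space". So 6 matches on its own but never inside 16 or 106. The gaps left behind are then collapsed.

```java
import java.util.regex.Pattern;

class StandaloneNumber {
    // Matches number only when it is a whole token: not preceded and not
    // followed by a non-space character, so "6" never matches inside "16" or "106".
    static String tokenPattern(String number) {
        return "(?<!\\S)" + Pattern.quote(number) + "(?!\\S)";
    }

    // Removes every standalone occurrence, then collapses the gaps left behind.
    // Assumes the numbers are separated by single spaces.
    static String removeAll(String input, String number) {
        return input.replaceAll(tokenPattern(number), "").replaceAll("\\s{2,}", " ").trim();
    }

    // Removes only the first standalone occurrence.
    static String removeFirst(String input, String number) {
        return input.replaceFirst(tokenPattern(number), "").replaceAll("\\s{2,}", " ").trim();
    }

    public static void main(String[] args) {
        String once = removeFirst("3 6 9 12 13 15 16", "6");
        System.out.println(once);                    // 3 9 12 13 15 16
        System.out.println(removeFirst(once, "6"));  // unchanged: the 6 in 16 is not a match
    }
}
```

The second call in main mirrors the question's scenario: after the standalone 6 is gone, searching again leaves 16 alone.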

Convert =00 formatted UTF codes in a plain text file to the correct UTF character in VB.NET

I'm writing a simple program to extract all the postal addresses from a big plain text file, and I'm having a problem because some of the addresses use non-standard characters.
This is some source text from the file I need to process:
Rua Vale de Louro, N=BA 97
Bloco 2, 1=BA A
but it needs to read:
Rua Vale de Louro, Nº 97
Bloco 2, 1º A
Obviously I could do a simple replace for this one character, but I need it to work with every character.
BA is the hex value of the º symbol in UTF-32 (albeit with a load of zeros in front of it). So if I can write something that finds all these "=xx" instances in the string and replaces them with the correct character, that would solve it. But for the life of me I can't figure out how.
Can anyone help?
Thanks
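Use
Dim txt As String = IO.File.ReadAllText("fileName", System.Text.Encoding.encoding) 'ASCII, UTF32, UTF8, Unicode, etc.
Replace the word encoding with the appropriate one. (Note that this controls how the file's bytes are decoded; it will not by itself turn the literal text "=BA" into º.)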
It can be done using regular expressions with a match evaluator to calculate the replacement string.
Imports System.Text.RegularExpressions
Dim input = "Rua Vale de Louro, N=BA 97 Bloco 2, 1=BA A"
Dim expected = "Rua Vale de Louro, Nº 97 Bloco 2, 1º A"
Dim regex = New Regex("=([0-9A-Fa-f]{2})", RegexOptions.CultureInvariant, TimeSpan.FromSeconds(10))
Dim evaluator As MatchEvaluator = Function(m As Match) Char.ConvertFromUtf32(Convert.ToInt32(m.Groups(1).Value, 16))
Dim actual = regex.Replace(input, evaluator)
The pattern matches = followed by exactly two hex digits (one byte, as in "=BA"). The hex digits are captured in group 1.
The evaluator takes the hex digits and converts them from base 16 to an integer. It then returns the character with that Unicode code point.
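The =XX form looks like quoted-printable encoding (RFC 2045), where each escape is one byte of the text in some character set. The approach above treats each escape as a Unicode code point. That works when the character set is ISO-8859-1 (Latin-1), because its byte values 00-FF equal the first 256 code points. If the file were UTF-8, a character like º would arrive as two escapes, =C2=BA, and would have to be decoded as bytes. The Java sketch below makes those assumptions: the class and method names are hypothetical, and the unescaped text is assumed to be plain ASCII. It turns each escape into a raw byte and then decodes the bytes with a caller-supplied charset.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EscapeDecoder {
    static boolean isHex(char ch) {
        return (ch >= '0' && ch <= '9') || (ch >= 'A' && ch <= 'F') || (ch >= 'a' && ch <= 'f');
    }

    // Turns each "=XX" escape into the byte 0xXX, keeps every other character
    // as its own byte(s), then decodes the whole byte sequence with charset.
    static String decode(String input, Charset charset) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == '=' && i + 2 < input.length()
                    && isHex(input.charAt(i + 1)) && isHex(input.charAt(i + 2))) {
                bytes.write(Integer.parseInt(input.substring(i + 1, i + 3), 16));
                i += 3;
            } else {
                // Assumes unescaped text is plain ASCII, which encodes the same way in ISO-8859-1 and UTF-8.
                byte[] raw = String.valueOf(c).getBytes(charset);
                bytes.write(raw, 0, raw.length);
                i++;
            }
        }
        return new String(bytes.toByteArray(), charset);
    }

    public static void main(String[] args) {
        System.out.println(decode("Rua Vale de Louro, N=BA 97", StandardCharsets.ISO_8859_1));
        System.out.println(decode("Bloco 2, 1=C2=BA A", StandardCharsets.UTF_8));
    }
}
```

Both calls in main produce º: the first from a single Latin-1 byte, the second from a two-byte UTF-8 sequence. An "=" that is not followed by two hex digits is left as it is.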

How to convert string to byte in Visual Basic

I'm trying to simulate a cryptography algorithm and I need to convert a string of 0s and 1s back into a word. Example:
I have: 01011110010101101000001101100001101
I have split it into an array of strings:
0101111, 0010101, ...
Each member has 7 characters. I want to get the letter that 0101111 represents in UTF-8. How do I do this?
I tried CType("0010101", Byte), but it fails. The largest value I can pass this way is 111.
Help :/
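UTF-8 uses 8-bit code units, and those are only 7 bits. Do you mean 7-bit ASCII? (ASCII characters are encoded identically in UTF-8, so for these values the result is the same.)
In that case, here you go:
Function BinToStr(binStr As String) As String
Dim sb As New System.Text.StringBuilder()
' Each group of 7 characters is one character code written in base 2
For i As Integer = 0 To binStr.Length \ 7 - 1
Dim code As Integer = Convert.ToInt32(binStr.Substring(i * 7, 7), 2)
sb.Append(ChrW(code))
Next
Return sb.ToString()
End Function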
If that's not what you're looking for, let me know.
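CType("0010101", Byte) fails because it parses the string as a decimal number. "0010101" becomes 10101, which overflows a Byte (maximum 255), and that is why only values up to 111 worked. The digits need to be parsed in base 2 instead. Below is a minimal Java sketch of the whole conversion. It uses hypothetical names, assumes 7-bit ASCII codes as in the question, and ignores any leftover bits that do not fill a group of seven.

```java
class BinaryText {
    // Splits the bit string into 7-bit groups and parses each group in base 2.
    // Leftover bits that do not fill a whole group are ignored.
    static String fromSevenBitGroups(String bits) {
        StringBuilder text = new StringBuilder();
        for (int i = 0; i + 7 <= bits.length(); i += 7) {
            int code = Integer.parseInt(bits.substring(i, i + 7), 2);
            text.append((char) code);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        System.out.println(fromSevenBitGroups("10010001101001")); // Hi
    }
}
```

In VB.NET, the counterpart of Integer.parseInt(s, 2) is Convert.ToInt32(s, 2).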