Find Each Occurrence of X and Insert a Carriage Return - vb.net

A colleague has some data he is putting into a flat file (.txt) and needs to insert a carriage return before EACH occurrence of 'POL01', 'SUB01', 'VEH01' and 'MCO01'.
I originally used:
For Each line1 As String In System.IO.File.ReadAllLines(BodyFileLoc)
    If line1.Contains("POL01") OrElse line1.Contains("SUB01") OrElse line1.Contains("VEH01") OrElse line1.Contains("MCO01") Then
        Writer.WriteLine(Environment.NewLine & line1)
    Else
        Writer.WriteLine(line1)
    End If
Next
Unfortunately, it turns out that SSIS does not write the file as separate lines but as one long string.
How can I insert a carriage return before every occurrence of the strings above?
Test text:
POL01CALT302276F 332 NBPM 00101 20151113201511130001201611132359 2015111300010020151113000100SUB01CALT302276F 332 NBPMP01 Akl Abi-Khalil 19670131 M U33 Stoford Close SW19 6TJ 2015111300010020151113000100VEH01CALT302276F 332 NBPM001LV56 LEJ N 2006VAUXHALL CA 2015111300010020151113000100MCO01CALT302276F 332 NBPM0101 0 2015111300010020151113000100POL01CALT742569N

You can use a regular expression for this: Regex.Replace finds each occurrence of the strings you're looking for and replaces it with a newline followed by the matched text:
' Requires: Imports System.Text.RegularExpressions
Dim str As String = "xxxPOL01xxxSUB01xxxVEH01xxxMCO01xxx"
Dim output As String = Regex.Replace(str, "((?:POL|SUB|VEH|MCO)01)", Environment.NewLine & "$1")
'output contains:
'xxx
'POL01xxx
'SUB01xxx
'VEH01xxx
'MCO01xxx
There may be a better way to construct this regular expression, but this is a simple alternation of the three-letter prefixes, followed by 01. The outer parentheses capture the whole match as group 1, which the $1 in the replacement string refers to.
If you're new to regular expressions, a number of tools can help you understand them; for example, regex101.com will show you an explanation of the pattern used here.
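Outside .NET, the same replace-with-newline idea can be sketched with Python's re module. This is an illustrative port, not the VB.NET answer itself; the helper name split_records and its newline parameter are hypothetical, and the "\r\n" default assumes Windows-style line endings (what Environment.NewLine returns on Windows).

```python
import re

# Same alternation as the VB.NET pattern: one of the four record prefixes, then "01".
RECORD_TAG = re.compile(r"(?:POL|SUB|VEH|MCO)01")


def split_records(text: str, newline: str = "\r\n") -> str:
    """Insert `newline` before every record tag in `text` (hypothetical helper)."""
    # A function replacement inserts `newline` literally and then puts the
    # matched tag (m.group(0), the whole match) back right after it.
    return RECORD_TAG.sub(lambda m: newline + m.group(0), text)


if __name__ == "__main__":
    print(split_records("xxxPOL01xxxSUB01xxxVEH01xxxMCO01xxx", newline="\n"))
```

If the text starts with a record tag, as the SSIS output does, the result starts with a newline; strip it if that is unwanted.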

Related

Pandas use contains for a specific word excluding similar words

I am filtering a string using Results[Results['Subject'].str.contains('lock')], but I need to exclude words like "clock".
What I need is strings that start with "lock", end with " lock", or contain " lock ".
Many thanks
Use the regex word boundary \b:
Results[Results['Subject'].str.contains(r'\block\b')]
Example input:
Results = pd.DataFrame({'Subject': ['lock', 'clock', 'abc lock', 'locker']})
Output:
Subject
0 lock
2 abc lock
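pandas' str.contains applies the pattern as a regular-expression search by default (regex=True), so the effect of \b can be checked with Python's re module alone. A minimal sketch; subjects_with_lock is a hypothetical helper, not part of pandas. Because \b only looks at word characters (letters, digits, underscore), "lock-in" or "(lock)" would also match.

```python
import re

# \b is a zero-width word boundary: "lock" must not have a word character
# (letter, digit or underscore) directly before or after it.
LOCK_WORD = re.compile(r"\block\b")


def subjects_with_lock(subjects):
    """Keep only the subjects that contain 'lock' as a whole word."""
    return [s for s in subjects if LOCK_WORD.search(s)]


if __name__ == "__main__":
    print(subjects_with_lock(["lock", "clock", "abc lock", "locker"]))
```

Matching is case-sensitive, so "Lock" is not kept; with pandas, str.contains(r'\block\b', case=False) relaxes that.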

regex: match everything, but not a certain string including whitespace (regular expression, in spite of, anything but, VBA Visual Basic)

Folks, there are already countless questions on "regex: match everything, but not ...", but none seems to fit my simple question.
Given a simple string, "1 Rome, 2 London, 3 Wembley Stadium", I want to match just the names, in order to extract them without the ranks ("Rome, London, Wembley Stadium").
Using a regex tester (https://extendsclass.com/regex-tester.html), I can easily match the opposite with:
([0-9]+\s*), which matches only the ranks ("1 ", "2 " and "3 ").
But how do I invert it? I tried something like:
[^0-9 |;]+[^0-9 |;], but it also excludes whitespace that I want to keep (e.g. after the comma and between Wembley and Stadium). I guess "0-9 " somehow needs to be treated as one continuous string. I tried various brackets, quotation marks and \s*, but nothing has worked yet.
Note: I'm working in a Visual Basic (VBA) environment, where lookbehinds are not supported!
You can use
\d+\s*(.*?)(?=,\s*\d+\s|$)
Get the values from match.SubMatches(0). Details:
\d+ - one or more digits
\s* - zero or more whitespaces
(.*?) - Group 1: zero or more chars other than line break chars, as few as possible (lazy)
(?=,\s*\d+\s|$) - a positive lookahead that requires, immediately to the right of the current location, either a comma, zero or more whitespaces, one or more digits and a whitespace, OR the end of the string.
Here is a demo of how to get all matches:
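Sub TestRegEx()
    Dim regex As Object, matches As Object, match As Object
    Dim str As String
    str = "1 Rome, 2 London, 3 Wembley Stadium"
    ' Late binding: no reference to "Microsoft VBScript Regular Expressions 5.5" is needed
    Set regex = CreateObject("VBScript.RegExp")
    regex.Pattern = "\d+\s*(.*?)(?=,\s*\d+\s|$)"
    regex.Global = True   ' return all matches, not just the first one
    Set matches = regex.Execute(str)
    For Each match In matches
        ' SubMatches(0) holds capture group 1: the name without its rank
        Debug.Print match.SubMatches(0)
    Next
End Sub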
Output:
Rome
London
Wembley Stadium
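The pattern relies only on a lookahead, which is why it works in an engine without lookbehind support. As a cross-check outside VBA, the same pattern can be run with Python's re module, where findall returns the contents of the single capture group for each match. A sketch only; extract_names is a hypothetical helper.

```python
import re

# Rank digits, optional whitespace, then the name (captured lazily) up to the
# next ", <digits><space>" or the end of the string. Lookahead only, no lookbehind.
PATTERN = re.compile(r"\d+\s*(.*?)(?=,\s*\d+\s|$)")


def extract_names(text: str) -> list:
    """Return the names without their ranks (hypothetical helper)."""
    return PATTERN.findall(text)


if __name__ == "__main__":
    print(extract_names("1 Rome, 2 London, 3 Wembley Stadium"))
```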

Perl6 split function adding extra elements to array

my @r = split("", "hi");
say @r.elems;  # 4
split is adding two extra elements to the array, one at the beginning and another at the end.
I have to do shift and pop after every split to correct for this.
Is there a better way to split a string?
If you're splitting on the empty string, you will get an empty element at the start and at the end of the returned list, because there is also an empty string before the first character and after the last one.
What you want is .comb without arguments, written out as a method chain:
"hi".comb.elems.say; # 2
See https://docs.raku.org/routine/comb#(Str)_routine_comb for more info.
The reason is that using an empty Str “” as the delimiter is the same as using the regex /<|wb>/, which matches next to every character. So it also matches before the first character and after the last character. Perl 5 removes these “extra” strings for you in this case (and in this case only), which is likely where the confusion lies.
What Perl 6 does instead is let you explicitly request :skip-empty, in any of these forms:
'hi'.split('') :skip-empty
'hi'.split('', :skip-empty)
split("", "hi") :skip-empty
split("", "hi", :skip-empty)
Or specify what you actually want, using comb:
'hi'.comb( /./ )
'hi'.comb( 1 )
'hi'.comb
comb( /./, 'hi' )
comb( 1, 'hi' )
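The same effect shows up in other regex engines. As a cross-language illustration (Python, not Raku), re.split accepts patterns that can match the empty string since Python 3.7, and it likewise returns an empty string at each end; filtering out empties mirrors :skip-empty, and iterating over the characters mirrors .comb. Note that plain str.split("") is rejected in Python with a ValueError.

```python
import re

# Splitting on an empty pattern matches at every position, including before
# the first character and after the last one.
parts = re.split("", "hi")
print(parts)

# Analogue of :skip-empty: drop the empty strings.
print([p for p in parts if p])

# Analogue of .comb with no arguments: just the characters.
print(list("hi"))
```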

Extra blank space between words

Please help me with two questions about GREL expressions:
1. If there are double spaces between two words in a column, how can I remove one of them? Example: Robert--Smith to Robert-Smith (each minus sign stands for a space).
2. How can I search for an exact word in a text filter?
Thanks!
1°) Try transform ---> value.replace("  "," ")
Or simply use Common transforms ---> Collapse consecutive whitespace
2°) Column ---> Text filter, and enter your word
Or use Column ---> Facet ---> Custom text facet and type: value.contains(" your_word ")
or value.contains(/(yourexactword)/)
This will return a True or False facet
H.
@hpiedcoq's answer is right if you need to do this in GREL. If not, you can just use the point-and-click interface:
For the first question: select your column, then Edit cells > Common transforms > Collapse consecutive whitespace.
For the second question: select your column > Text filter, then enter the word you are looking for. Tick "case sensitive" if you want the search to distinguish upper and lower case.
1.1 transform --> value.replace("  ", " ")
Replaces each double space with a single space.
1.2 transform --> value.trim()
Removes whitespace before and after the string; it does not change the spaces between words.
1.3 transform --> value.replace(/\b  \b/, " ")
Replaces using a regular expression, so only a double space between two words is replaced.
2. Text filter > tick "regular expression" and use \b.
With the regular expression \bWord\b, the text filter finds Word as an exact word: the characters immediately before and after it must not be word characters (whitespace, punctuation, or the start or end of the cell all qualify).
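GREL is not Python, but for simple patterns like these Python's re module behaves the same way, so the three whitespace approaches above can be compared side by side. A sketch under that assumption: the function names are hypothetical, and collapse_whitespace is just one way to collapse every run of whitespace, not OpenRefine's own implementation of the point-and-click transform.

```python
import re


def replace_double_spaces(value: str) -> str:
    """Like value.replace("  ", " "): each pair of spaces becomes one space."""
    return value.replace("  ", " ")


def replace_double_spaces_between_words(value: str) -> str:
    r"""Like value.replace(/\b  \b/, " "): only a double space with a word
    character on both sides is replaced."""
    return re.sub(r"\b  \b", " ", value)


def collapse_whitespace(value: str) -> str:
    """Collapse every whitespace run to one space and trim both ends."""
    return re.sub(r"\s+", " ", value).strip()


if __name__ == "__main__":
    for cell in ["Robert  Smith", "Robert   Smith", "  Robert  Smith"]:
        print(repr(replace_double_spaces(cell)),
              repr(replace_double_spaces_between_words(cell)),
              repr(collapse_whitespace(cell)))
```

The three-space cell shows the difference: a single replace("  ", " ") still leaves two spaces, the \b version leaves it unchanged, and only collapsing every run fixes it.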

Problem with File IO and splitting strings with Environment.NewLine in VB.Net

I was experimenting with basic VB.NET file I/O and string splitting and ran into this problem. I don't know whether it has to do with the file I/O or with the string splitting.
I am writing text to a file like so:
Dim sWriter As New StreamWriter("Data.txt")
sWriter.WriteLine("FirstItem")
sWriter.WriteLine("SecondItem")
sWriter.WriteLine("ThirdItem")
sWriter.Close()
Then I read the text back from the file:
Dim sReader As New StreamReader("Data.txt")
Dim fileContents As String = sReader.ReadToEnd()
sReader.Close()
Now, I am splitting fileContents using Environment.NewLine as the delimiter.
Dim tempStr() As String = fileContents.Split(Environment.NewLine)
When I print the resulting array, I get some weird results:
For Each str As String In tempStr
Console.WriteLine("*" + str + "*")
Next
I added the *s to the beginning and end of each array item when printing, to see what is going on. Since NewLine is the delimiter, I expected the strings in the array NOT to contain any NewLines. But the output was this:
*FirstItem*
*
SecondItem*
*
ThirdItem*
*
*
Shouldn't it be this?
*FirstItem*
*SecondItem*
*ThirdItem*
**
Why is there a new line in the beginning of all but the first string?
Update: I printed fileContents character by character, with each character's code, and got this:
F - 70
i - 105
r - 114
s - 115
t - 116
I - 73
t - 116
e - 101
m - 109
- 13
- 10
S - 83
e - 101
c - 99
o - 111
n - 110
d - 100
I - 73
t - 116
e - 101
m - 109
- 13
- 10
T - 84
h - 104
i - 105
r - 114
d - 100
I - 73
t - 116
e - 101
m - 109
- 13
- 10
It seems 'Environment.NewLine' consists of
- 13
- 10
13 and 10, I understand. But what is the empty space in between? I don't know whether it comes from printing to the console or is really part of NewLine.
So, when splitting, only the character with ASCII value 13, the first character of NewLine, is used as the delimiter (as explained in the replies), and the remaining character (10, the line feed) is still present in the strings. That leftover line feed at the start of each string is what prints the extra line break, and the "empty space" in the list above is just the line feed character itself being printed.
Now it is clear. Thanks for the help. :)
First of all, yes, WriteLine tacks on a newline to the end of the string, hence the blank line at the end.
The problem is the way you're calling fileContents.Split(). On the .NET Framework, the only overload of that function that takes a single argument takes a Char() (declared ParamArray), not a String. Environment.NewLine is a String, not a Char, so (assuming you have Option Strict Off) the call implicitly converts it to a Char, using only the first character of the string. This means that instead of splitting your string on the actual two-character sequence that makes up Environment.NewLine, it splits only on the first of those characters.
To get your desired output, you need to call it like this:
Dim delims() As String = {Environment.NewLine}
Dim tempStr() As String = fileContents.Split(delims, StringSplitOptions.RemoveEmptyEntries)
This splits on the actual two-character string rather than only its first character, and removes any empty entries from the results (including genuinely blank lines in the file, if there are any).
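The effect of splitting on only the first character of Environment.NewLine can be reproduced with a Python string that has the same \r\n line endings. This is an analogy, not VB.NET: Python's str.split always uses the whole separator string, so the buggy case is simulated by splitting on "\r" explicitly.

```python
# File contents as written by three WriteLine calls on Windows ("\r\n" endings).
contents = "FirstItem\r\nSecondItem\r\nThirdItem\r\n"

# What the implicit String -> Char conversion amounts to: split on "\r" only.
# Each later piece still starts with the leftover "\n".
on_cr = contents.split("\r")

# Splitting on the whole two-character sequence, like passing a String() array.
on_crlf = contents.split("\r\n")

# Dropping empty pieces mirrors StringSplitOptions.RemoveEmptyEntries.
non_empty = [s for s in on_crlf if s]

for label, parts in [("\\r only", on_cr), ("\\r\\n", on_crlf), ("no empties", non_empty)]:
    print(label, [f"*{s}*" for s in parts])
```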
Why not just use File.ReadAllLines? A single call reads the file and returns a string array with the lines.
Dim tempStr() As String = File.ReadAllLines("data.txt")
I just ran into the same issue and found all the comments very helpful. However, I fixed my issue by replacing Environment.NewLine with vbLf (as opposed to vbCrLf, which had the same issue). Are there any problems with this approach? (It seems more straightforward, but I'm not a programmer, so I wouldn't know of any potential issues.)
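vbLf is a single character, so converting it to a Char loses nothing; with \r\n line endings, however, every piece except the last keeps a trailing carriage return, which is invisible when printed but can break comparisons. A Python sketch of the same effect (an analogy, not .NET), with str.splitlines() standing in for a line-aware reader such as File.ReadAllLines:

```python
contents = "FirstItem\r\nSecondItem\r\nThirdItem\r\n"

# Splitting on "\n" alone (like vbLf) leaves a "\r" at the end of each line.
on_lf = contents.split("\n")

# A line-aware split treats "\r\n" as one line break and does not return a
# trailing empty string for the final line ending.
lines = contents.splitlines()

print([repr(s) for s in on_lf])
print(lines)
print(on_lf[0] == "FirstItem")  # False: the stray "\r" is still there
```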