Split string on parentheses and braces - vb.net

Let me say, I hate working with strings! I'm trying to find a way to split a string on brackets. For example, the string is:
Hello (this is) me!
And, from this string, get an array with Hello and me. I would like to do this with parentheses and braces (not with brackets). Please note that the string is variable, so something like SubString wouldn't work.
Thanks in advance,
FWhite

You can use regular expressions (Regex). The code below excludes text inside all parentheses and braces, and also removes the exclamation mark. Feel free to expand the CleanUp method to filter out other punctuation symbols:
Imports System.Text.RegularExpressions
Module Module1
Sub Main()
Dim re As New Regex("\(.*\)|{.*}") 'anything inside parenthesis OR braces
Dim input As String = "Hello (this is) me and {that is} him!"
Dim inputParsed As String = re.Replace(input, String.Empty)
Dim reSplit As New Regex("\b") 'split by word boundary
Dim output() As String = CleanUp(reSplit.Split(inputParsed))
'output = {"Hello", "me", "and", "him"}
End Sub
Private Function CleanUp(output As String()) As String()
Dim outputFiltered As New List(Of String)
For Each v As String In output
If String.IsNullOrWhiteSpace(v) Then Continue For 'remove spaces
If v = "!" Then Continue For 'remove punctuation, feel free to expand
outputFiltered.Add(v)
Next
Return outputFiltered.ToArray
End Function
End Module
To explain the regular expression I used (\(.*\)|{.*}):
\( is just a (, parenthesis is a special symbol in Regex, needs to be escaped with a \.
.* means anything, i.e. literally any combination of characters.
| is a logical OR, so the expression will match either the left or the right side of it.
{ does not need escaping, so it just goes as is.
Overall, you can read this as: find anything inside parentheses or braces. The code then replaces the findings with an empty string, i.e. removes all occurrences. One of the interesting concepts here is greedy vs lazy matching. In this particular case greedy (the default) works well because the input has only one pair of each symbol. With several pairs of the same kind, greedy .* would run from the first opening symbol to the last closing one, and the lazy form .*? would be the safer choice.
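To make the greedy vs lazy difference concrete, here is a minimal sketch. The module and function names (GreedyVsLazyDemo, StripGreedy, StripLazy) are made up for illustration, and the input is changed to contain two parenthesized groups so that the two patterns actually behave differently.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module GreedyVsLazyDemo
    ' Greedy pattern from the answer: .* grabs as much as it can.
    Function StripGreedy(input As String) As String
        Return Regex.Replace(input, "\(.*\)|{.*}", String.Empty)
    End Function

    ' Lazy variant: .*? stops at the first closing symbol it finds.
    Function StripLazy(input As String) As String
        Return Regex.Replace(input, "\(.*?\)|{.*?}", String.Empty)
    End Function

    Sub Main()
        Dim input As String = "Hello (this is) me and (that is) him"
        ' Greedy removes everything from the first ( to the last ).
        Console.WriteLine(StripGreedy(input)) ' Hello  him
        ' Lazy removes each group separately.
        Console.WriteLine(StripLazy(input))   ' Hello  me and  him
    End Sub
End Module
```

Both results keep the spaces that surrounded the removed groups, which is why the original answer follows up with a split and clean-up step.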
Useful resources for working with Regex:
http://regex101.com/ - Regex test/practice/sandbox.
http://www.regular-expressions.info/ - Theory and examples.
http://www.regular-expressions.info/wordboundaries.html - How word boundaries work.

Try this code:
Dim var As String = "Hello ( me!"
Dim arr() As String = var.Split("("c) 'the c suffix makes this a Char, so it also compiles with Option Strict On
MsgBox(arr(0)) 'Display Hello
MsgBox(arr(1)) 'Display me!

Something like this should work for you:
Dim x As String = "Hello (this is) me"
Dim firstString As String = x.Substring(0, x.IndexOf("("))
Dim secondString As String = x.Substring(x.IndexOf(")") + 1)
Dim finalString = firstString & secondString
'Results:
'x = "Hello (this is) me"
'firstString = "Hello "
'secondString = " me"
'finalString = "Hello  me" (two spaces, one from each part)

Cannot trim closed bracket from vb.net string

I have an issue trimming a string in vb.net
Dim bgColor1 As String = (foundRows(count).Item(16).ToString())
'This returns Color [Indigo] I need it to be just Indigo so vb.net can read it.
'So i used this
Dim MyChar() As Char = {"C", "o", "l", "r", "[", "]", " "}
Dim firstBgcolorbgColor1 As String = bgColor1.TrimStart(MyChar)
'But the ] is still in the string so it looks like this Indigo]
Any ideas on why i cannot trim the ]?
Update
Didn't see that the input was "Color [Indigo]". I would not recommend TrimStart() & TrimEnd()
You have a variety of options to choose from:
Imports System
Imports System.Text.RegularExpressions
Public Module Module1
Public Sub Main()
Dim Color As String = "Color [Indigo]"
' Substring() & IndexOf()
Dim openBracket = Color.IndexOf("[") + 1
Dim closeBracket = Color.IndexOf("]")
Console.WriteLine(Color.Substring(openBracket, closeBracket - openBracket))
' Replace()
Console.WriteLine(Color.Replace("Color [", String.Empty).Replace("]", String.Empty))
' Regex.Replace()
Console.WriteLine(Regex.Replace(Color, "Color \[|\]", String.Empty))
' Regex.Match()
Console.WriteLine(Regex.Match(Color, "\[(\w+)\]").Groups(1))
End Sub
End Module
Results:
Indigo
Indigo
Indigo
Indigo
Well, you are calling TrimStart(...), which as the name implies, will only trim the front part of the string.
Did you mean to call Trim(MyChar) instead?
You could use a Regex to do the job:
Dim colorRegex As New Regex("(?<=\[)\w+") 'Get the word following the bracket ([)
Dim firstBgcolorbgColor1 As String = colorRegex.Match(bgColor1).Value
Called without arguments, the TrimStart, TrimEnd, and Trim functions remove white space from the beginning, the end, and both sides of a string respectively; called with a character array, they remove those characters instead. You are using TrimStart, so only the leading characters are stripped and the ] stays at the end. Switching to Trim(MyChar) removes the ] too, but it also eats the trailing o of Indigo, because o is in your character list. It is safer to use String.Replace or String.Remove to take out exactly the characters you don't want.
Examples here: http://www.dotnetperls.com/remove-vbnet
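To see what each variant does with the character list from the question, here is a small sketch. The module and helper names (TrimDemo, TrimStartOnly, TrimBothEnds, ReplaceWrapper) are hypothetical, and the input is hard-coded to the "Color [Indigo]" value mentioned in the question.

```vbnet
Imports System

Module TrimDemo
    ' The character list from the question.
    Private ReadOnly MyChar As Char() = {"C"c, "o"c, "l"c, "r"c, "["c, "]"c, " "c}

    ' Strips listed characters from the front only.
    Function TrimStartOnly(s As String) As String
        Return s.TrimStart(MyChar)
    End Function

    ' Strips listed characters from both ends.
    Function TrimBothEnds(s As String) As String
        Return s.Trim(MyChar)
    End Function

    ' Removes the fixed wrapper text instead of trimming by character.
    Function ReplaceWrapper(s As String) As String
        Return s.Replace("Color [", String.Empty).Replace("]", String.Empty)
    End Function

    Sub Main()
        Dim bgColor1 As String = "Color [Indigo]"
        Console.WriteLine(TrimStartOnly(bgColor1))  ' Indigo]
        Console.WriteLine(TrimBothEnds(bgColor1))   ' Indig
        Console.WriteLine(ReplaceWrapper(bgColor1)) ' Indigo
    End Sub
End Module
```

The middle result shows why trimming by character set is fragile here: the final o of Indigo is in the set, so it goes too.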

Lowercase the first word

Does anybody know how to lowercase the first word for each line in a textbox?
Not the first letter, the first word.
I tried like this but it doesn't work:
For Each iz As String In txtCode.Text.Substring(0, txtCode.Text.IndexOf(" "))
iz = LCase(iz)
Next
When you call Substring, it is making a copy of that portion of the string and returning it as a new string object. So, even if you were successfully changing the value of that returned sub-string, it still would not change the original string in the Text property.
On top of that, strings in .NET are immutable reference types, so when you set iz = ... all you are doing is re-assigning the iz variable to point to yet another new string object. When you set iz, you aren't even touching the value of that copied sub-string to which it previously pointed.
In order to change the value of the text box, you must actually assign a new string value to its Text property, like this:
txtCode.Text = "the new value"
Since that is the case, I would recommend building a new string using a StringBuilder object and then, once the modified string is complete, setting the text box's Text property to that new string, for instance:
Dim builder As New StringBuilder()
For Each line As String In txtCode.Text.Split({Environment.NewLine}, StringSplitOptions.None)
' Fix case and append line to builder
Next
txtCode.Text = builder.ToString()
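The loop body above is left as a comment. One way to fill it in might look like the sketch below. The function name LowercaseFirstWords is made up, and it assumes that the first word of a line starts at column 0 and ends at the first space.

```vbnet
Imports System
Imports System.Text

Module LowercaseFirstWordDemo
    ' Returns the text with the first word of every line lowercased.
    Function LowercaseFirstWords(text As String) As String
        Dim builder As New StringBuilder()
        Dim lines() As String = text.Split({Environment.NewLine}, StringSplitOptions.None)
        For i As Integer = 0 To lines.Length - 1
            Dim line As String = lines(i)
            Dim spaceIndex As Integer = line.IndexOf(" "c)
            If spaceIndex < 0 Then
                ' The line is a single word (or empty), so lowercase all of it.
                builder.Append(line.ToLower())
            Else
                ' Lowercase up to the first space, keep the rest untouched.
                builder.Append(line.Substring(0, spaceIndex).ToLower())
                builder.Append(line.Substring(spaceIndex))
            End If
            ' Put back the line break that Split removed, except after the last line.
            If i < lines.Length - 1 Then builder.Append(Environment.NewLine)
        Next
        Return builder.ToString()
    End Function

    Sub Main()
        Dim sample As String = "HELLO World" & Environment.NewLine & "FOO Bar Baz"
        Console.WriteLine(LowercaseFirstWords(sample))
    End Sub
End Module
```

In the form you would then write txtCode.Text = LowercaseFirstWords(txtCode.Text).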
The solutions here are interesting but they are ignoring a fundamental tool of .NET: regular expressions. The solution can be written in one expression:
Dim result = Regex.Replace(txtCode.Text, "^\w+",
Function (match) match.Value.ToLower(), RegexOptions.Multiline)
(This requires the import System.Text.RegularExpressions.)
This solution is likely more efficient than all the other solutions here (It’s definitely more efficient than most), and it’s less code, thus less chance of a bug and easier to understand and to maintain.
The problem with your code is that you are running the loop only on each character of the first word in the whole TextBox text.
This code is looping over each line and takes the first word:
For Each line As String In txtCode.Text.Split({Environment.NewLine}, StringSplitOptions.None)
line = line.Trim().ToLower()
If line.IndexOf(" ") > 0 Then
line = line.Substring(0, line.IndexOf(" ")).Trim()
End If
' do something with 'line' here
Next
Loop through each of the lines of the textbox, splitting all of the words in the line, making sure to .ToLower() the first word:
Dim strResults As String = String.Empty
For Each strLine As String In IO.File.ReadAllText("C:\Test\StackFlow.txt").Split(ControlChars.NewLine)
Dim lstWords As List(Of String) = strLine.Split(" ").ToList()
If Not lstWords Is Nothing Then
strResults += lstWords(0).ToLower()
If lstWords.Count > 1 Then
For intCursor As Integer = 1 To (lstWords.Count - 1)
strResults += " " & lstWords(intCursor)
Next
End If
End If
Next
I used your ideas, guys, and ended up with this:
For Each line As String In txtCode.Text.Split(Environment.NewLine)
Dim abc() As String = line.Split(" ")
txtCode.Text = txtCode.Text.Replace(abc(0), LCase(abc(0)))
Next
It works like this. Thank you all.

Avoid escaping all double-quotes in strings?

In a string that has multiple double-quotes, is there an easier solution than escaping each and every one of them?
For instance, here's an HTML string:
Dim test As Regex = New Regex("^<div class="blah">\r\n<div class="blah"></div>\r\n</div>\r\n<div class="blah">\r\n(.+?)\r\n<div> class="blah">\r\n", RegexOptions.Singleline)
Alternatively, to hold the string, can VB.Net be told to use another character than the double-quote, eg.
New Regex(#my string="" my other string=""#)
?
Thank you.
If you are certain that the # symbol does not appear in your string...
Dim s As String = Replace("^<div class=#blah#>\r\n<div class=#blah#></div>\r\n</div>\r\n<div class=#blah#>\r\n(.+?)\r\n<div> class=#blah#>\r\n", "#", """")
Dim test As Regex = New Regex(s, RegexOptions.Singleline)
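As a quick check that the placeholder trick yields the same string as doubling every quote, here is a small sketch on a shortened sample. The helper name WithQuotes is hypothetical, and it assumes, as the answer does, that the placeholder character never occurs in the real text.

```vbnet
Imports System

Module QuotePlaceholderDemo
    ' Swaps a placeholder character for a double quote.
    ' Assumes the placeholder never occurs in the real text.
    Function WithQuotes(template As String, placeholder As Char) As String
        Return template.Replace(placeholder, """"c)
    End Function

    Sub Main()
        Dim viaPlaceholder As String = WithQuotes("<div class=#blah#>", "#"c)
        Dim viaDoubling As String = "<div class=""blah"">"
        Console.WriteLine(viaPlaceholder)
        ' Both spellings produce the same string.
        Console.WriteLine(viaPlaceholder = viaDoubling) ' True
    End Sub
End Module
```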

VB.NET Get text in between Quotations or other symbols

I want to be able to extract a string in between quotation marks or parenthesis etc. to a variable. For example my text might be "Hello there "Bob" ". I want to extract the text "Bob" from in between the two quotation marks and put it in the string "name" for later use. The same would be for "Hello there (Bob)". How would I go about this? Thanks.
=======EDIT======
Sorry, I worded this poorly. Ok, so lets say I have a textbox(Textbox1) and a button. If the user inputs the text: MsgBox "THIS IS MY MESSAGE" I want that when the Button is pressed, only the text THIS IS MY MESSAGE is displayed.
This is a very simple solution:
Dim sAux() As String = TextBox1.Text.Split(""""c)
Dim sResult As String = ""
If sAux.Length = 3 Then
sResult = sAux(1)
Else
' Error or something (number of quotes <> 2)
End If
There are basically three methods -- regular expressions, string.indexof and substring and finally looping over the characters one by one. I would avoid the latter as it is just reinventing the wheel. Whether to use regexs or indexof depends upon the complexity of your requirements and data. Indexof is a bit wordy but fairly straightforward and possibly just what you want in this case.
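Dim str As String = "Hello there ""Bob"""
Dim startName As Integer
Dim endName As Integer
Dim name As String = ""
startName = str.IndexOf("""") 'position of the opening quote, or -1
endName = str.IndexOf("""", startName + 1) 'search after the opening quote
If endName > startName Then
'Substring takes a start index and a length, not an end index
name = str.Substring(startName + 1, endName - startName - 1) 'name = "Bob"
End If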
If you need to do this for arbitrary symbols, then you want regexs.
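For the arbitrary-symbols case, a regex version could look like the sketch below. The function name ExtractBetween and its parameters are made up for illustration. It returns only the first match and uses a lazy .*? so it stops at the nearest closing symbol.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module ExtractBetweenDemo
    ' Returns the text between the first opener/closer pair, or "" when there is none.
    Function ExtractBetween(input As String, opener As String, closer As String) As String
        ' Regex.Escape lets symbols such as ( and ) be used as plain delimiters.
        Dim pattern As String = Regex.Escape(opener) & "(.*?)" & Regex.Escape(closer)
        Dim m As Match = Regex.Match(input, pattern)
        Return If(m.Success, m.Groups(1).Value, String.Empty)
    End Function

    Sub Main()
        Console.WriteLine(ExtractBetween("Hello there ""Bob""", """", """")) ' Bob
        Console.WriteLine(ExtractBetween("Hello there (Bob)", "(", ")"))     ' Bob
    End Sub
End Module
```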

.split removes tabs/spaces in string?

i am trying to split a string up into separate lines with the following code, but for some reason it is also removing the spaces in the string.
Dim calculationText As String
calculationText = File.ReadAllText(fileName)
Dim fields() As String
fields = calculationText.Split(vbCrLf)
when i am in debugger mode, i look at fields, and every element has a line of the string but all the spaces and tabs are removed.
any reason for this?
If you are reading from a file, you can use:
Sub Main()
Dim fields As New List(Of String)
' read file into list
Using sr As System.IO.StreamReader = My.Computer.FileSystem.OpenTextFileReader(filename)
Try
Do While sr.Peek() >= 0
fields.Add(sr.ReadLine())
Loop
Finally
If sr IsNot Nothing Then sr.Close()
End Try
End Using
' check results
For Each line As String In fields
Console.WriteLine(line)
Next
End Sub
How 'bout:
Dim fields() As String = File.ReadAllLines(fileName)
As for why string.Split() is doing weird things...
vbCrLf is a string, and there's not an overload for String.Split that accepts a single string parameter. If he were to turn on Option Strict it wouldn't even compile, but since it's off, vbCrLf can be interpreted as an array of characters. And in this code, that's exactly what happens:
Sub Main()
Dim z As String = "The quick brown" & vbCrLf & " fox jumps over the lazy dogs."
Dim a() As String = z.Split(vbCrLf)
For Each c As String In a
Console.WriteLine(c)
Next
Console.ReadKey(True)
End Sub
You'll see two line breaks between the 1st and 2nd parts of that string. Something else is stripping out the spaces. Can you share the larger code block?
Gotta say I've never seen it do that, and I've used String.Split extensively. Are they really really gone, or is it a trick of the debugger?
There's not actually any .Split method that takes one string as the parameter, so the VB compiler would be doing "things" behind the scenes to pick a different overload. To try and force the correct overload, you could try calculationText.Split(vbCrLf.ToCharArray()). I doubt it will help, but you never know :-)
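Another way to force a specific overload is to pass the separator as a string array together with StringSplitOptions, which splits on the whole CR+LF pair. A minimal sketch follows. The name SplitLines is hypothetical, and the brackets in the output are only there to make leading tabs and spaces visible.

```vbnet
Imports System
Imports Microsoft.VisualBasic

Module SplitLinesDemo
    ' Splits on the whole CR+LF pair; spaces and tabs inside each line are kept.
    Function SplitLines(text As String) As String()
        Return text.Split(New String() {vbCrLf}, StringSplitOptions.None)
    End Function

    Sub Main()
        Dim sample As String = "The quick brown" & vbCrLf & vbTab & " fox jumps"
        For Each line As String In SplitLines(sample)
            ' Brackets show that the leading tab and space survive the split.
            Console.WriteLine("[" & line & "]")
        Next
    End Sub
End Module
```

If the whitespace is still missing with this overload, the split is not what removes it, and the cause is more likely in how the values are displayed or processed afterwards.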