I am writing a program in Visual Basic 2010 that lists how many times a word of each length occurs in a user-inputted string. Although most of the program is working, I have one problem:
When looping through all of the characters in the string, the program checks whether there is a next character (such that the program does not attempt to loop through characters that do not exist). For example, I use the condition:
If letter = Microsoft.VisualBasic.Right(input, 1) Then
Where letter is the character, input is the string, and Microsoft.VisualBasic.Right(input, 1) extracts the rightmost character from the string. Thus, if letter is the rightmost character, the program will cease to loop through the string.
This is where the problem comes in. Let us say the string is This sentence has five words. The rightmost character is an s, but an s is also the fourth and sixth character. That means that the first and second s will break the loop just as the last one will.
My question is whether there is a way to ensure that only the last s, or whatever character is the last one in the string, can break the loop.
There are a few methods you can use for this, one as Neolisk shows; here are a couple of others:
Dim breakChar As Char = "s"
Dim str As String = "This sentence has five words"
str = str.Replace(".", " ")
str = str.Replace(",", " ")
str = str.Replace(vbTab, " ")
' other chars to replace
Dim words() As String = str.ToLower.Split(New Char() {" "}, StringSplitOptions.RemoveEmptyEntries)
For Each word In words
If word.StartsWith(breakChar) Then Exit For
Console.WriteLine("M1 Word: ""{0}"" Length: {1:N0}", word, word.Length)
Next
If you need to loop through chars for whatever reason, you can use something like this:
Dim breakChar As Char = "s"
Dim str As String = "This sentence has five words"
str = str.Replace(".", " ")
str = str.Replace(",", " ")
str = str.Replace(vbTab, " ")
' other chars to replace
'method 2 (StringBuilder needs Imports System.Text)
Dim word As New StringBuilder
Dim words As New List(Of String)
For Each c As Char In str.ToLower.Trim
If c = " "c Then
If word.Length > 0 Then 'support multiple white-spaces (double-space etc.)
Console.WriteLine("M2 Word: ""{0}"" Length: {1:N0}", word.ToString, word.ToString.Length)
words.Add(word.ToString)
word.Clear()
End If
Else
If word.Length = 0 And c = breakChar Then Exit For
word.Append(c)
End If
Next
If word.Length > 0 Then
words.Add(word.ToString)
Console.WriteLine("M2 Word: ""{0}"" Length: {1:N0}", word.ToString, word.ToString.Length)
End If
I wrote these specifically to break on the first letter of a word, as you asked; adjust as needed.
VB.NET code to calculate how many times a word of each length occurs in a user-inputted string:
Dim sentence As String = "This sentence has five words"
Dim words() As String = sentence.Split(" ")
Dim v = From word As String In words Group By L = word.Length Into Group Order By L
Line 2 may need to be adjusted to remove punctuation characters, trim extra spaces, etc.
In the above example, v(i).L contains the word length, and v(i).Group.Count contains how many words of this length were encountered. For debugging purposes, you also have v(i).Group, a collection of String containing all words belonging to this group.
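To make the grouping concrete, here is a minimal console sketch of the same query that prints each length with its count. The module name and the output format are made up for illustration, and Option Infer On is assumed so that the anonymous group type is inferred.

```vbnet
Option Strict On
Option Infer On

Imports System
Imports System.Linq

Module WordLengthDemo
    Sub Main()
        Dim sentence As String = "This sentence has five words"
        ' Split on spaces and drop the empty entries that repeated spaces would create.
        Dim words() As String = sentence.Split(New Char() {" "c}, StringSplitOptions.RemoveEmptyEntries)

        ' Group the words by length, shortest length first.
        Dim groups = From word In words
                     Group By L = word.Length Into Group
                     Order By L

        For Each g In groups
            Console.WriteLine("Length {0}: {1} word(s): {2}", g.L, g.Group.Count(), String.Join(", ", g.Group))
        Next
        ' Prints:
        ' Length 3: 1 word(s): has
        ' Length 4: 2 word(s): This, five
        ' Length 5: 1 word(s): words
        ' Length 8: 1 word(s): sentence
    End Sub
End Module
```

Because the words come from Split rather than from a character-by-character loop, no check for "the last character" is needed at all.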
I have a VBA formula-function to split a string and add a space between each character. It works fine only for an ASCII string, but I want to do the same for the Tamil language. Since it is Unicode, the result is not readable. It splits even the auxiliary characters (upper dots, prefix and suffix auxiliary characters), which should not be separated in Tamil/Hindi/Kannada/Malayalam/all Indian languages. So, how do I write a function to split a Tamil word into readable characters?
Function AddSpace(Str As String) As String
Dim i As Long
For i = 1 To Len(Str)
AddSpace = AddSpace & Mid(Str, i, 1) & " "
Next i
AddSpace = Trim(AddSpace)
End Function
Adding Space is not the important point of this question. Splitting the Unicode string into an array from any of those languages is the requirement.
For example, the word, "பார்த்து" should be separated as "பா ர் த் து", not as "ப ா ர ் த ் த ு". As you can see, the first two letters "பா" (ப + ா) are combined. If I try to manually put a space in between them, I can't do it in any word processor. If you want to test, please put it in Notepad and add space between each character. It won't allow you to separate as ("ப ா"). So "பார்த்து" should be separated as "பா ர் த் து". It is the correct separation in Tamil like languages. This is the one that I am struggling to achieve in VBA.
The Character Code table for Tamil is here.
Tamil/Hindi/many Indian languages have (1) consonants, (2) independent vowels, (3) dependent vowel signs, and (4) two-part dependent vowel signs. Among these 4 types, the first two are each a separate letter, so there are no issues with them. The last two are dependent and should not be separated from the character they are joined to. For example, the letter பா (ப + ா) contains one independent (ப) and one dependent (ா) letter.
If this info is not enough, please comment on what else I should post.
(Note: It is possible in C#.Net using the code from the MS link by @Codo)
You can assign a string to a Byte array so the following might work
Dim myBytes() As Byte
myBytes = "Tamilstring"
which generates two bytes for each character. You could then create a second byte array twice the size of the first by using Space$ to create a suitable string, and then use a For loop (Step 4) to copy two bytes at a time from the first to the second array. Finally, assign the byte array back to a string.
The problem you have is you are looking for what Unicode calls an extended grapheme cluster.
For a Unicode compatible regex engine that is simply /\X/
Not sure how you do that in VBA.
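This does not answer the VBA side, but if moving the split into .NET is an option (an assumption, in line with the note in the question that it works in C#), System.Globalization.StringInfo enumerates text elements, which keeps combining marks attached to their base character. A minimal VB.NET sketch follows; SplitTextElements is a hypothetical helper name.

```vbnet
Option Strict On
Option Infer On

Imports System
Imports System.Collections.Generic
Imports System.Globalization

Module GraphemeDemo
    ' Returns the text elements (user-perceived characters) of a string.
    Function SplitTextElements(input As String) As List(Of String)
        Dim parts As New List(Of String)
        Dim enumerator As TextElementEnumerator = StringInfo.GetTextElementEnumerator(input)
        While enumerator.MoveNext()
            parts.Add(enumerator.GetTextElement())
        End While
        Return parts
    End Function

    Sub Main()
        ' The console may need a Unicode-capable font to render Tamil.
        Console.OutputEncoding = System.Text.Encoding.UTF8
        Dim parts As List(Of String) = SplitTextElements("பார்த்து")
        Console.WriteLine(parts.Count)              ' 4
        Console.WriteLine(String.Join(" ", parts))  ' பா ர் த் து
    End Sub
End Module
```

The source file has to be saved as UTF-8 for the Tamil literal to survive compilation.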
Referring to the link mentioned by @ScottCraner in the comments on the question, and to the character codes for Tamil.
Check the result in cell A2; the cells highlighted in yellow are the dependent vowel signs used in the DepVow string.
Sub Split_Unicode_String()
'https://stackoverflow.com/questions/68774781/how-to-split-an-unicode-string-to-readable-characters
Dim my_string As String
'input string
Dim buff() As String
'array of input string characters
Dim DepVow As String
'Create string of Dependent vowel signs
Dim newStr As String
'result string with spaces as desired
Dim i As Long
my_string = Range("A1").Value
ReDim buff(Len(my_string) - 1) 'array of my_string characters
For i = 1 To Len(my_string)
buff(i - 1) = Mid$(my_string, i, 1)
Cells(1, i + 2) = buff(i - 1)
Cells(2, i + 2) = AscW(buff(i - 1)) 'used this for creating DepVow below
Next i
'Create string of Dependent vowel signs separated by commas
DepVow = "," & Join(Array(ChrW$(3006), ChrW$(3021), ChrW$(3009)), ",")
newStr = ""
i = LBound(buff)
Do While i <= UBound(buff)
    newStr = newStr & buff(i)
    'Keep a following dependent vowel sign attached; the bounds check avoids reading past the last element
    If i < UBound(buff) Then
        If InStr(1, DepVow, buff(i + 1), vbTextCompare) > 0 Then
            newStr = newStr & buff(i + 1)
            i = i + 1
        End If
    End If
    newStr = newStr & " "
    i = i + 1
Loop
'result string in range A2
Cells(2, 1) = Left(newStr, Len(newStr) - 1)
End Sub
Try the algorithm below, which concatenates each mark character onto the letter that precedes it.
ReDim letters(0)
For i = 1 To Len(Str)
    If AscW(Mid(Str, i, 1)) > 3005 And AscW(Mid(Str, i, 1)) < 3022 Then
        'Tamil dependent signs (U+0BBE to U+0BCD): append to the previous letter
        letters(UBound(letters) - 1) = letters(UBound(letters) - 1) + Mid(Str, i, 1)
    Else
        'new letter: grow the array, leaving one empty slot at the end
        ReDim Preserve letters(UBound(letters) + 1)
        letters(UBound(letters) - 1) = Mid(Str, i, 1)
    End If
Next
MsgBox Join(letters, ", ") 'returns பா, ர், த், து,
I'm a programming student, so I've started with VB.NET as my first language and I need some help.
I need to know how to delete excess white space between words in a sentence, using only these string functions: Trim, InStr, Char, Mid, Val and Len.
I wrote part of the code but it doesn't work. Thanks.
Knocked up a quick routine for you.
Public Function RemoveMyExcessSpaces(str As String) As String
Dim r As String = ""
If str IsNot Nothing AndAlso Len(str) > 0 Then
Dim spacefound As Boolean = False
For i As Integer = 1 To Len(str)
If Mid(str, i, 1) = " " Then
If Not spacefound Then
spacefound = True
End If
Else
If spacefound Then
spacefound = False
r += " "
End If
r += Mid(str, i, 1)
End If
Next
End If
Return r
End Function
I think it meets your criteria.
Hope that helps.
Unless using those VB6 methods is a requirement, here's a one-line solution:
TextBox2.Text = String.Join(" ", TextBox1.Text.Split(New Char() {" "c}, StringSplitOptions.RemoveEmptyEntries))
Online test: http://ideone.com/gBbi55
String.Split() splits a string on a specific character or substring (in this case a space) and creates an array of the string parts in between. For example: "Hello There" -> {"Hello", "There"}
StringSplitOptions.RemoveEmptyEntries removes any empty strings from the resulting split array. Double spaces will create empty strings when split, thus you'll get rid of them using this option.
String.Join() will create a string from an array and separate each array entry with the specified string (in this case a single space).
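As a small illustration of why RemoveEmptyEntries matters, this sketch prints how many entries a plain split yields and then collapses the spaces. CollapseSpaces is a hypothetical helper name; the input is built with New String so the number of spaces is explicit.

```vbnet
Option Strict On
Option Infer On

Imports System

Module CollapseSpacesDemo
    ' Collapses every run of spaces to one space; leading and trailing spaces vanish too.
    Function CollapseSpaces(input As String) As String
        Dim parts() As String = input.Split(New Char() {" "c}, StringSplitOptions.RemoveEmptyEntries)
        Return String.Join(" ", parts)
    End Function

    Sub Main()
        ' "Hello", three spaces, "There", two spaces, "World"
        Dim input As String = "Hello" & New String(" "c, 3) & "There" & New String(" "c, 2) & "World"

        ' A plain split keeps one empty string for every extra space.
        Dim raw() As String = input.Split(" "c)
        Console.WriteLine(raw.Length)             ' 6 (three words plus three empty strings)

        Console.WriteLine(CollapseSpaces(input))  ' Hello There World
    End Sub
End Module
```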
There is a very simple answer to this question: a string method lets you remove the white spaces within a string. Note that it removes every space, not only the excess ones.
Dim text_with_white_spaces as string = "Hey There!"
Dim text_without_white_spaces as string = text_with_white_spaces.Replace(" ", "")
'text_without_white_spaces should be equal to "HeyThere!"
Hope it helped!
I have just started using Visual Basic and wanted to create a program that counts the number of times a word appears. My plan was to develop a program that analyses a sentence containing several words without punctuation. When a word in that sentence is input, the program identifies all of the positions where the word occurs in the sentence.
I started by writing code that counts the number of spaces in a sentence but am now stuck.
Module Module1
Sub Main()
Dim Sentence As String
Dim SentenceLength As Integer
Dim Text As String
Console.WriteLine("ASK NOT WHAT YOUR COUNTRY CAN DO FOR YOU ASK WHAT YOU CAN DO FOR YOUR COUNTRY")
Console.WriteLine("Enter your word ") : Sentence = Console.ReadLine
Dim TextCounter As Integer = 0
Dim MainWord As String = Sentence
Dim CountChar As String = " "
Do While InStr(MainWord, CountChar) > 0
MainWord = Mid(MainWord, 1 + InStr(MainWord, CountChar), Len(MainWord))
TextCounter = TextCounter + 1
Text = TextCounter + 2
Console.WriteLine(Text)
Loop
Console.WriteLine(TextCounter)
Console.Write("Press Enter to Exit")
Console.ReadLine()
End Sub
End Module
A quick & dirty method is to split the string into an array of strings, then count how many times a word appears in it:
Dim words() As String = Sentence.Split(New Char() {" "c, ","c, "."c, ";"c}) ' add other punctuation as appropriate
Dim count = words.Count(Function(word) word = MainWord)
This uses the String.Split method to split the string each time a space or one of the listed punctuation characters is encountered. Then it uses the Enumerable.Count extension method to count the words that match a certain condition, namely that the word is equal to MainWord.
To count substrings:
Dim count = UBound(Split("catty cat", "cat")) ' 2
To count words:
Dim countWords = Regex.Matches("catty cat", "\bcat\b").Count ' 1
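Since the question also asks for the positions of the word, the same word-boundary pattern can report where each match starts through Match.Index. This is a sketch only; the variable names are illustrative, and the positions printed are zero-based character indexes rather than word numbers.

```vbnet
Option Strict On
Option Infer On

Imports System
Imports System.Text.RegularExpressions

Module WordPositionsDemo
    Sub Main()
        Dim sentence As String = "ASK NOT WHAT YOUR COUNTRY CAN DO FOR YOU ASK WHAT YOU CAN DO FOR YOUR COUNTRY"
        Dim target As String = "YOU"

        ' Regex.Escape guards against regex metacharacters in the search word;
        ' \b on both sides keeps "YOU" from matching inside "YOUR".
        Dim pattern As String = "\b" & Regex.Escape(target) & "\b"
        Dim matches As MatchCollection = Regex.Matches(sentence, pattern, RegexOptions.IgnoreCase)

        Console.WriteLine("Occurrences: {0}", matches.Count)   ' Occurrences: 2
        For Each m As Match In matches
            Console.WriteLine("Starts at character index {0}", m.Index)
        Next
    End Sub
End Module
```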
I have a method that restricts user input to letters, digits, - and single spaces. By letters I mean only a-z and A-Z, not characters such as ę, Ę, ą, Ą, ś, Ś, ż, Ż. Can you please help me fix the code below so that it also checks for this (without using regex)?
For Each c As Char In txtSymbol.Text
If Not Char.IsLetterOrDigit(c) AndAlso c <> "-"c AndAlso c <> " "c Then
MessageBox.Show("Only lower/upper letters, digits, - and single spaces are allowed", "Warning", MessageBoxButtons.OK, MessageBoxIcon.Warning)
Exit Try
End If
Next
For further discussion:
'--We help the user by removing leading and trailing spaces and by converting runs of spaces to a single space
Dim str As String = txtNazwa.Text.Trim 'deleting leading and ending spaces
While str.Contains("  ") 'deleting more than one space in same place
str = str.Replace("  ", " ")
End While
txtNazwa.Text = str 'corrected one we passed to textbox
'Now we checking further for only those can be presented to pass test:
'--> single space
'--> letters a-z A-Z
'--> digits
'--> -
Dim pattern As String = "^([a-zA-Z0-9-]+\s)*[a-zA-Z0-9-]+$"
Dim r As New Regex(pattern)
If Not r.IsMatch(str) Then
Exit Try
End If
You can try to use this regex:
^([a-zA-Z0-9-]+\s)*[a-zA-Z0-9-]+$
Regex Demo
In your code you can try it like this:
Dim str As String = "^([a-zA-Z0-9-]+\s)*[a-zA-Z0-9-]+$"
Dim r As New Regex(str)
Console.WriteLine(r.IsMatch("yourInputString"))
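The question asked for a check without regex. One possible sketch compares each character against explicit ASCII ranges instead of calling Char.IsLetterOrDigit, which also accepts letters such as ę or Ż. IsAllowedChar and IsValidSymbol are hypothetical names; repeated spaces are not rejected here, on the assumption that they are collapsed beforehand.

```vbnet
Option Strict On
Option Infer On

Imports System

Module AsciiCheckDemo
    ' True only for a-z, A-Z, 0-9, "-" and space.
    Function IsAllowedChar(c As Char) As Boolean
        Return (c >= "a"c AndAlso c <= "z"c) OrElse
               (c >= "A"c AndAlso c <= "Z"c) OrElse
               (c >= "0"c AndAlso c <= "9"c) OrElse
               c = "-"c OrElse c = " "c
    End Function

    ' True when every character of the input is allowed.
    Function IsValidSymbol(input As String) As Boolean
        For Each c As Char In input
            If Not IsAllowedChar(c) Then Return False
        Next
        Return True
    End Function

    Sub Main()
        Console.WriteLine(IsValidSymbol("AB-12 cd"))  ' True
        Console.WriteLine(IsValidSymbol("zażółć"))    ' False
    End Sub
End Module
```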
Let's say I have a string that I want to split based on several characters, like ".", "!", and "?". How do I figure out which one of those characters split my string so I can add that same character back on to the end of the split segments in question?
Dim linePunctuation as Integer = 0
Dim myString As String = "some text. with punctuation! in it?"
For i = 1 To Len(myString)
If Mid$(myString, i, 1) = "." Then linePunctuation += 1
Next
For i = 1 To Len(myString)
If Mid$(myString, i, 1) = "!" Then linePunctuation += 1
Next
For i = 1 To Len(myString)
If Mid$(myString, i, 1) = "?" Then linePunctuation += 1
Next
Dim delimiters(2) As Char
delimiters(0) = "."
delimiters(1) = "!"
delimiters(2) = "?"
currentLineSplit = myString.Split(delimiters)
Dim sentenceArray(linePunctuation) As String
Dim count As Integer = 0
While linePunctuation > 0
sentenceArray(count) = currentLineSplit(count)'Here I want to add what ever delimiter was used to make the split back onto the string before it is stored in the array.'
count += 1
linePunctuation -= 1
End While
If you add a capturing group to your regex like this:
SplitArray = Regex.Split(myString, "([.?!])")
Then the returned array contains both the text between the punctuation, and separate elements for each punctuation character. The Split() function in .NET includes text matched by capturing groups in the returned array. If your regex has several capturing groups, all their matches are included in the array.
This splits your sample into:
some text
.
with punctuation
!
in it
?
You can then iterate over the array to get your "sentences" and your punctuation.
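A minimal sketch of that iteration, assuming the single capturing group shown above so that text and delimiters alternate in the array. SplitSentences is a hypothetical helper name. In practice the text elements keep their leading space and the array ends with an empty string when the input ends in punctuation, so the sketch trims each piece and skips empty ones.

```vbnet
Option Strict On
Option Infer On

Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module SplitKeepDelimiterDemo
    Function SplitSentences(input As String) As List(Of String)
        Dim sentences As New List(Of String)
        Dim parts() As String = Regex.Split(input, "([.?!])")
        ' Even indexes hold text; the odd index after each one holds its delimiter.
        For i As Integer = 0 To parts.Length - 1 Step 2
            Dim body As String = parts(i).Trim()
            Dim delimiter As String = If(i + 1 < parts.Length, parts(i + 1), "")
            ' Skip empty pieces, such as the one after the final delimiter.
            If body.Length > 0 Then sentences.Add(body & delimiter)
        Next
        Return sentences
    End Function

    Sub Main()
        For Each sentence As String In SplitSentences("some text. with punctuation! in it?")
            Console.WriteLine(sentence)
        Next
        ' Prints:
        ' some text.
        ' with punctuation!
        ' in it?
    End Sub
End Module
```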
.Split() does not provide this information.
You will need to use a regular expression to accomplish what you are after, which I infer as the desire to split an English-ish paragraph into sentences by splitting on punctuation.
The simplest implementation would look like this.
var input = "some text. with punctuation! in it?";
string[] sentences = Regex.Split(input, @"\b(?<sentence>.*?[\.!?](?:\s|$))");
foreach (string sentence in sentences)
{
Console.WriteLine(sentence);
}
Results
some text.
with punctuation!
in it?
But you are going to find very quickly that language, as spoken/written by humans, does not follow simple rules most of the time.
Here it is in VB.NET for you:
Dim sentences As String() = Regex.Split(line, "\b(?<sentence>.*?[\.!?](?:\s|$))")
Once you've called Split with all 3 characters, you've tossed that information away. You could do what you're trying to do by splitting yourself or by splitting on one punctuation mark at a time.