Split text between multiple delimiters - vb.net

I have a rather tricky problem with my ongoing project.
I pretty much need to extract ceritain Strings between muliple delimiters out of a bigger String.
To give you a better understanding, what I mean, here is an example:
Some Text that wont be needed
Some Text that wont be needed
Some Text that wont be needed
Some Text that wont be needed
Some Text that wont be needed
Textstart (Start-Delimiter)
Text I want
Text I want
Text I want
Text I want
Text I want
Textend (End-Delimiter)
So far, so easy. But now comes a messy part in. The End-delimiters change sometimes like this
Textstart
Text I want
Text I want
Text I want
Text I want
Textend2 (another end delimiter)
I also solved that Problem, but now since I discovered, that the start delimiter can also occur twice before the next endpart.
Like this:
Textstart (Start-Delimiter)
Text I want
Text I want
Textstart
Text I want
Text I want
Textend (End-Delimiter)
This really is confusing to me. This is the function right now. It works but only if the start delimiter does not occur twice.
I could split the text first by the end strings and after that by the start string, but I don't know hot to split a text by multiple delimiters.
Function NewTextGet(ByVal Text As String, ByVal StartString As String, ByVal EndStrings() As String)
Dim AllBlocks As New List(Of String)
Dim FirstSplit() As String = Strings.Split(Text, StartString) ' Splits Text at Start delimiter
For Each splt In FirstSplit.Skip(1)
Dim EndSplit1 = splt.Split({EndStrings(0)}, StringSplitOptions.None) ' First end delimiter Split
Dim EndSplit2 = EndSplit1(0).Split({EndStrings(1)}, StringSplitOptions.None) ' Second delimiter Split
Dim EndSplit3 = EndSplit2(0).Split({EndStrings(2)}, StringSplitOptions.None) ' Third delimiter Split
If EndSplit3.Length > 1 Then
AllBlocks.Add(EndSplit3(0))
ElseIf EndSplit2.Length > 1 Then
AllBlocks.Add(EndSplit2(0))
Else
AllBlocks.Add(EndSplit1(0))
End If
Next
Return AllBlocks
End Function`
I hope I explained this well enough, and thank you for any help :)

This version produces a List(Of List(OF String)). So each set of lines will be in a different list:
Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
Dim fileName As String = "C:\Users\mikes\Downloads\gYeziPRE.txt"
Dim blocks As List(Of List(Of String)) = NewTextGet(My.Computer.FileSystem.ReadAllText(fileName), "ctxt", New String() {"done", "sdone", "prompt"})
For i As Integer = 0 To blocks.Count - 1
Debug.Print("Block: " & i)
For Each line As String In blocks(i)
Debug.Print(line)
Next
Debug.Print("")
Next
End Sub
Function NewTextGet(ByVal Text As String, ByVal StartString As String, ByVal EndStrings() As String) As List(Of List(Of String))
Dim started As Boolean = False
Dim curBlock As List(Of String)
Dim AllBlocks As New List(Of List(Of String))
Dim lines() As String = Text.Split(Environment.NewLine)
For Each line As String In lines
If line.Contains(StartString) Then
If Not started Then
started = True
curBlock = New List(Of String)
End If
Dim i As Integer = line.IndexOf(StartString)
curBlock.Add(line.Substring(i + StartString.Length).Trim())
ElseIf EndStrings.Contains(line.Trim()) Then
started = False
If Not IsNothing(curBlock) Then
AllBlocks.Add(curBlock)
End If
curBlock = Nothing
ElseIf started = True AndAlso Not IsNothing(curblock) Then
curBlock.Add(line.Trim())
End If
Next
If Not IsNothing(curBlock) Then
AllBlocks.Add(curBlock)
End If
Return AllBlocks
End Function
Output:
Block: 0
"What's up?"
para "All these Trainers"
line "look the same, but"
para "only one is the"
line "leader!"
Block: 1
"Am I Koji?"
para "Why, yes, I am!"
Block: 2
"Well done!"
para "Here!"
para "The Fist Badge!"
Block: 3
"<PLAYER> received"
line "Fist Badge."
Block: 4
"Here!"
para "Take this TM!"
Block: 5
"Hah!"
para "That was joyful"
line "sparring!"
Block: 6
"Japanese"
line "onomatopoeia"
cont "are so kawaii!"
Block: 7
"Hiri hiri!"
Block: 8
"Uwaaaa!"
Block: 9
"Well, you chose"
line "unwisely."
Block: 10
"You have more"
line "chances."
Block: 11
"Koji is hot."
para "Dressing like him"
line "is<...>"
para "wonderful!"
Block: 12
"Wasn't supposed"
line "to happen!"
Block: 13
"Can't wait for"
line "Halloween!"
Block: 14
"Ninjas are so"
line "cool!"
Block: 15
"Not skilled"
line "enough!"
Block: 16
"Time to study"
line "ninjutsu instead"
cont "of pretending."

Try this
Function NewTextGet(ByVal RawText As String, ByVal StartString As String, ByVal EndStrings() As String) As List(Of String)
Dim bEnd As List(Of String) = EndStrings.ToList
bEnd.Insert(0, StartString)
Dim Blocks As New List(Of String)
Dim Splits() As String = Split(RawText, vbNewLine, , CompareMethod.Text)
For x As Integer = 0 To Splits.Length - 1
1:
Dim block As String = ""
If Splits(x).Contains(StartString) Then
block = Splits(x)
For y As Integer = x + 1 To Splits.Length - 1
Dim BlockEnd As Boolean = False
For Each s As String In bEnd
If Splits(y).Contains(s) Then BlockEnd = True
Next
block &= vbNewLine
If BlockEnd Then
If Splits(y).Contains(StartString) Then
Blocks.Add(block & vbNewLine)
x = y - 1
GoTo 1
End If
x = y + 1
block &= Splits(y)
Blocks.Add(block & vbNewLine)
Exit For
End If
block &= Splits(y)
Next
End If
Next
Return Blocks
End Function
usage
For Each s As String In NewTextGet(Raw, "ctxt", New String() {"sdone", "done", "prompt"})
TextBox2.Text &= s & "=======" & vbNewLine
Next
use this order {"sdone", "done", "prompt"} to avoid conflection while spliting

Related

Creating a text translator with text file

I'm trying to have this program take a normal English sentence in one text box and with the click of a button convert it into a textese sentence.
Any word that can be shortened will be replaced with the words in a text file.
An example of a line in the text file is
anyone,ne1
There are 52 lines (replacement words).
What would be the best way to approach this problem? Is a nested loop possibly a good route to take?
I don't have much experience and trying to learn the language more so open to trying all methods.
Below is what I have so far before I begin the coding process as I'm not really sure where to head from here. Only the words that are found within the text file will be replaced, so I think I'd use an If/Else statement that would ignore and leave any words not found alone.
Public Class frmTextese
Dim inputData() As String = IO.File.ReadAllLines("Textese.txt")
Private Sub btnTranslate_Click(sender As Object, e As EventArgs) Handles btnTranslate.Click
Dim english As Integer = 0
Dim englishSentence As String = txtEnglish.Text
Dim result() As String
result = englishSentence.Split(englishSentence)
Dim line As String
Dim data() As String
For i As Integer = 0 To (inputData.Length - 1)
line = inputData(i)
data = line.Split(" "c)
Next
'txtTextese.Text =
End Sub
End Class
An image of what I am trying to achieve:
Try this and have a look at the comments to get an idea of what is happening. I should add that strictly speaking, this site isn't a code writing service, but I can't help it sometimes. This also takes into account capitalization of words and common punctuation after the words. It's a bit quick and dirty, but for the basics, it seems to work ok.
Public Class frmTextese
'create a new empty dictionary where we'll add the pairs of english and
'textese words
Dim englishToTextese As New Dictionary(Of String, String)
Private Sub frmTextese_Load(sender As Object, e As EventArgs) Handles MyBase.Load
'load textese dictionary to an array
Dim filedata() As String = IO.File.ReadAllLines("k:\textese.txt")
'Split each line into its elements and add to the dictionary
For Each item As String In filedata
Dim splitstring() As String = item.Split(","c)
englishToTextese.Add(splitstring(0), splitstring(1))
Next
End Sub
Private Function IsCapitalized(word As String) As Boolean
'If the first character in the word is upper case then return true
'else return false
If Mid(word, 1, 1) = Mid(word, 1, 1).ToUpper Then
Return True
Else
Return False
End If
End Function
Private Function GetPunctuation(word As String) As Tuple(Of String, String)
'If the last character in the word is a punctuation mark, return a pair of values which are :
'the word without puctuation, and the punctuation. Else
'return the word and an empty string
Dim result As Tuple(Of String, String)
If "!':;?/.,".Contains(word.Last()) Then
result = New Tuple(Of String, String)(Mid(word, 1, word.Length - 1), word.Last)
Else
result = New Tuple(Of String, String)(word, "")
End If
Return result
End Function
Private Sub btnTranslate_Click(sender As Object, e As EventArgs) Handles btnTranslate.Click
Dim textese As String = ""
'split the text in the txtEnglish textbox into its component words including any punctuation
Dim words() As String = txtEnglish.Text.Split(" "c)
For Each englishWord As String In words
'get the word and any punctuation following it as a pair of items
'and store in punctResult
Dim punctResult As Tuple(Of String, String) = GetPunctuation(englishWord)
Dim texteseWord As String
'store the first item (the word) in englishWord
englishWord = punctResult.Item1
'store the secont item (the punctuation or a blank string) in punctuation
Dim punctuation As String = punctResult.Item2
'If the english word is in the dictionary
If englishToTextese.ContainsKey(englishWord.ToLower) Then
'get the textesevertion
texteseWord = englishToTextese(englishWord.ToLower)
'if the original english word was capiutalized, capitalize the textese word
If IsCapitalized(englishWord) Then
texteseWord = texteseWord.ToUpperInvariant
End If
'add the word to the textese sentence
textese = textese & texteseWord & punctuation & " "
Else
'if the word isn't in the dictionary, add the original english word and its original state
'of capitalization and punctuation to the textese sentence
textese = textese & englishWord & punctuation & " "
End If
Next
'store the new texteze sentence in the textbox
txtTextese.Text = textese
End Sub
End Class

permutation not accepting large words

the vb.net code below permutates a given word...the problem i have is that it does not accept larger words like "photosynthesis", "Calendar", etc but accepts smaller words like "book", "land", etc ...what is missing...Pls help
Module Module1
Sub Main()
Dim strInputString As String = String.Empty
Dim lstPermutations As List(Of String)
'Loop until exit character is read
While strInputString <> "x"
Console.Write("Please enter a string or x to exit: ")
strInputString = Console.ReadLine()
If strInputString = "x" Then
Continue While
End If
'Create a new list and append all possible permutations to it.
lstPermutations = New List(Of String)
Append(strInputString, lstPermutations)
'Sort and display list+stats
lstPermutations.Sort()
For Each strPermutation As String In lstPermutations
Console.WriteLine("Permutation: " + strPermutation)
Next
Console.WriteLine("Total: " + lstPermutations.Count.ToString)
Console.WriteLine("")
End While
End Sub
Public Sub Append(ByVal pString As String, ByRef pList As List(Of String))
Dim strInsertValue As String
Dim strBase As String
Dim strComposed As String
'Add the base string to the list if it doesn't exist
If pList.Contains(pString) = False Then
pList.Add(pString)
End If
'Iterate through every possible set of characters
For intLoop As Integer = 1 To pString.Length - 1
'we need to slide and call an interative function.
For intInnerLoop As Integer = 0 To pString.Length - intLoop
'Get a base insert value, example (a,ab,abc)
strInsertValue = pString.Substring(intInnerLoop, intLoop)
'Remove the base insert value from the string eg (bcd,cd,d)
strBase = pString.Remove(intInnerLoop, intLoop)
'insert the value from the string into spot and check
For intCharLoop As Integer = 0 To strBase.Length - 1
strComposed = strBase.Insert(intCharLoop, strInsertValue)
If pList.Contains(strComposed) = False Then
pList.Add(strComposed)
'Call the same function to review any sub-permutations.
Append(strComposed, pList)
End If
Next
Next
Next
End Sub
End Module
Without actually creating a project to run this code, nor knowing how it 'doesn't accept' long words, my answer would be that there are a lot of permutations for long words and your program is just taking much longer than you're expecting to run. So you probably think it has crashed.
UPDATE:
The problem is the recursion, it's blowing up the stack. You'll have to rewrite your code to use an iteration instead of recursion. Generally explained here
http://www.refactoring.com/catalog/replaceRecursionWithIteration.html
Psuedo code here uses iteration instead of recursion
Generate list of all possible permutations of a string

Using Functions in Visual Basic

The program I'm working on has two different functions, one that calculates the number of syllables in a text file, and another that calculates the readability of the text file based on the formula
206.835-85.6*(Number of Syllables/Number of Words)-1.015*(Number of Words/Number of Sentences)
Here are the problems I'm having:
I'm supposed to display the contents of the text file in a multi-line text box.
I'm supposed to display the answer I get from the function indexCalculation in a label below the text box.
I'm having trouble calling the function to actually have the program calculate the answer to be displayed in the label.
Here is the code I have so far.
Option Strict On
Imports System.IO
Public Class Form1
Private Sub ExitToolStripMenuItem_Click(sender As Object, e As EventArgs) Handles ExitToolStripMenuItem.Click
Me.Close()
End Sub
Private Sub OpenToolStripMenuItem_Click(sender As Object, e As EventArgs) Handles OpenToolStripMenuItem.Click
Dim open As New OpenFileDialog
open.Filter = "text files |project7.txt|All file |*.*"
open.InitialDirectory = Environment.GetFolderPath(Environment.SpecialFolder.DesktopDirectory)
If open.ShowDialog() = Windows.Forms.DialogResult.OK Then
Dim selectedFileName As String = System.IO.Path.GetFileName(open.FileName)
If selectedFileName.ToLower = "project7.txt" Then
Dim text As String = File.ReadAllText("Project7.txt")
Dim words = text.Split(" "c)
Dim wordCount As Integer = words.Length
Dim separators As Char() = {"."c, "!"c, "?"c, ":"c}
Dim sentences = text.Split(separators, StringSplitOptions.RemoveEmptyEntries)
Dim sentenceCount As Integer = sentences.Length
Dim vowelCount As Integer = 0
For Each word As String In words
vowelCount += CountSyllables(word)
Next
vowelCount = CountSyllables(text)
Label1.Show(indexCalculation(wordCount, sentenceCount, vowelCount))
Else
MessageBox.Show("You cannot use that file!")
End If
End If
End Sub
Function CountSyllables(word As String) As Integer
word = word.ToLower()
Dim dipthongs = {"oo", "ou", "ie", "oi", "ea", "ee", _
"eu", "ai", "ua", "ue", "au", "io"}
For Each dipthong In dipthongs
word = word.Replace(dipthong, dipthong(0))
Next
Dim vowels = "aeiou"
Dim vowelCount = 0
For Each c In word
If vowels.IndexOf(c) >= 0 Then vowelCount += 1
Next
If vowelCount = 0 Then
vowelCount = 1
End If
Return vowelCount
End Function
Function indexCalculation(ByRef wordCount As Integer, ByRef sentenceCount As Integer, ByRef vowelCount As Integer) As Integer
Dim answer As Integer = CInt(206.835 - 85.6 * (vowelCount / wordCount) - 1.015 * (wordCount / sentenceCount))
Return answer
End Function
End Class
Any suggestions would be greatly appreciated.
Here are my suggestions:
update your indexCalculation function to take in Integers, not strings. that way you don't have to convert them to numbers.
remove all of your extra variables you are not using. this will clean things up a bit.
remove your streamreader. it appears you are reading the text via File.ReadAllText
Label1.Show(answer) should be changed to Label1.Show(indexCalculation(wordCount,sentenceCount,vowelCount)) -- unless Label1 is something other than a regular label, use Label1.Text = indexCalculation(wordCount,sentenceCount,vowelCount))
Then for the vowelCount, you need to do the following:
Dim vowelCount as Integer = 0
For Each word as String in words
vowelCount += CountSyllables(word)
Next
Also, add the logic to the CountSyllables function to make it 1 if 0. If you don't want to include the last character in your vowel counting, then use a for loop instead of a for each loop and stop 1 character short.

Flesch Readability Index in Visual Basic

I'm working on a program that is supposed to perform the calculations for the Flesch Readability Index. The program is supposed to read in a text file "Project7.txt", it's then supposed to display the text in a multi-line text box and perform the following calculations:
Count the number of words in the file.
Count the number of syllables in the file.
Count the number of sentences in the file (a sentence can be ended by a ".", "?", "!", or ":"
The program is then supposed to plug the values into the following formula and display the result in a label (label1).
206.835-85.6*(Number of syllables/Number of words) - 1.015*(Number of words/Number of sentences)
Here is the code I have written so far.
Option Strict On
Imports System.IO
Public Class Form1
Private Sub ExitToolStripMenuItem_Click(sender As Object, e As EventArgs) Handles ExitToolStripMenuItem.Click
Me.Close()
End Sub
Private Sub OpenToolStripMenuItem_Click(sender As Object, e As EventArgs) Handles OpenToolStripMenuItem.Click
Dim open As New OpenFileDialog
open.Filter = "text files |project7.txt|All file |*.*"
open.InitialDirectory = Environment.GetFolderPath(Environment.SpecialFolder.DesktopDirectory)
If open.ShowDialog() = Windows.Forms.DialogResult.OK Then
Dim selectedFileName As String = System.IO.Path.GetFileName(open.FileName)
If selectedFileName.ToLower = "project7.txt" Then
Dim doc As String = ""
Dim line As String
Using reader As New StreamReader(open.OpenFile)
While Not reader.EndOfStream
doc += reader.ReadLine
Console.WriteLine(line)
End While
Dim text = File.ReadAllText("Project7.txt")
Dim words = text.Split(" "c)
Dim wordCount = words.Length
Dim separators As Char() = {"."c, "!"c, "?"c, ":"c}
Dim sentences = text.Split(separators, StringSplitOptions.RemoveEmptyEntries)
Dim sentenceCount = sentences.Length
End Using
Else
MessageBox.Show("You cannot use that file!")
End If
End If
End Sub
Function CountSyllables(word As String) As Integer
word = word.ToLower()
Dim dipthongs = {"oo", "ou", "ie", "oi", "ea", "ee", _
"eu", "ai", "ua", "ue", "au", "io"}
For Each dipthong In dipthongs
word = word.Replace(dipthong, dipthong(0))
Next
Dim vowels = "aeiou"
Dim vowelCount = 0
For Each c In word
If vowels.IndexOf(c) >= 0 Then vowelCount += 1
Next
Return vowelCount
End Function
End Class
Any suggestions are appreciated. Thanks in advance for the help.
Is the code always reporting one more sentence than there actually is?
If so take a look at this from the String.Split method MSDN docs:
When the Split function encounters two delimiters in a row, or a
delimiter at the beginning or end of the string, it interprets them as
surrounding an empty string ("")...
I'm sure your last sentence ends with your sentence delimiter so what's happening is your assignment to sentences is getting an extra, empty array element. See for yourself by breakpointing the line after your assignment and hovering your mouse over sentences. Examine the contents of the array.
The fix is to call Split with the option to remove empty array values. To do that though you'll need to call the Split overload that takes an array of Char for the delimiters:
Replace this line:
Dim sentences = text.Split("."c, "!"c, "?"c, ":"c)
With this:
Dim separators As Char() = {"."c, "!"c, "?"c, ":"c}
Dim sentences = text.Split(separators, StringSplitOptions.RemoveEmptyEntries)
And you should be good.

Splitting a string based on a set length of characters

MVC 3. Vb.net. Part of my app generates PDF files using Itextsharp. Some strings are too long to go onto the background image correctly. So I basically need to split this string when its over 26 characters long and when it splits it cant split in the middle of a word. from there I will use newline to add the string to the right to the next line... Any ideas that might point me in the right direction.. I did start bulding the function that I will pass the string into test for length and then pass back the string after it finishes but I am stummped after that..
Private Function stringLength(ByVal _string As String) As String
If _string.Length < 26 Then
_string.Split(
End If
End Function
I'm sure there's a million different ways to do this.
You basically need to get all of your words split by the space into a list. After that, you just need to keep checking if the current word plus a space plus the next word reach your threshold or not, and if it does, you move to the next line. Once you have all of your lines, then you rejoin the list into a single string again.
Private Function LimitWidth(ByVal text As String, ByVal maxCharacters As Integer) As String
Dim words As List(Of String) = text.Split(" "c).ToList()
If text.Length < maxCharacters OrElse words.Count = 1 Then
Return text
Else
Dim lines As New List(Of String)
Dim currentLine As String = words(0)
For i As Integer = 1 To words.Count - 1
If (currentLine & " " & words(i)).Length > maxCharacters Then
lines.Add(currentLine)
currentLine = words(i)
If i = words.Count - 1 Then
lines.Add(currentLine)
End If
Else
If i = words.Count - 1 Then
lines.Add(currentLine & " " & words(i))
End If
currentLine &= " " & words(i)
End If
Next
Return String.Join(Environment.NewLine, lines.ToArray())
End If
End Function
To Test:
Private Sub Button1_Click(ByVal sender As Object, ByVal e As EventArgs) Handles Button1.Click
MessageBox.Show(LimitWidth("This is a really long sentence " & _
"meant to demonstrate how to split " & _
"the words into a confined character length.", 26))
End Sub
It sounds like you are asking for a word wrap function.
Since I feel that it's better to answer in a way that promotes learning than to just give answers, I have for you a link that walks you through the process of using Test Driven Development (TDD) to solve this problem. It just so happens that the word wrap problem is a popular coding kata, and Robert C. Martin wrote a somewhat silly fictional story about a developer being taught how to use TDD to solve the word wrap kata.
The code examples are in Java, but it should be trivial to read and translate.
http://thecleancoder.blogspot.com/2010/10/craftsman-62-dark-path.html
The goofy bits are skip-able. Just jump down to the sentences right before the first code snippet.
I would add to it handling of multiline input text with following:
Private Function LimitWidth(ByVal text As String, ByVal maxCharacters As Integer, SplitSign As String) As String
Dim Output As String = ""
Dim OrgLines As List(Of String) = text.Split(Environment.NewLine).ToList()
For x As Integer = 1 To OrgLines.Count - 1
Dim words As List(Of String) = OrgLines(x).Split(" "c).ToList()
If text.Length < maxCharacters OrElse words.Count = 1 Then
Output += OrgLines(x)
Else
Dim lines As New List(Of String)
Dim currentLine As String = words(0)
For i As Integer = 1 To words.Count - 1
If (currentLine & " " & words(i)).Length > maxCharacters Then
lines.Add(currentLine)
currentLine = words(i)
If i = words.Count - 1 Then
lines.Add(currentLine)
End If
Else
If i = words.Count - 1 Then
lines.Add(currentLine & " " & words(i))
End If
currentLine &= " " & words(i)
End If
Next
Output += String.Join(SplitSign, lines.ToArray())
End If
Next
Return Output
End Function
use:
LimitWidth("your text", 80, Environment.NewLine)