Find all variations of letters in a word in Vb.net


In a program that finds words from random Scrabble letters, how do you loop through each possible arrangement of the letters? For example: abc acb bac bca cab cba

What you're trying to do here is permute the set of characters contained within the string.
There's a question here about it:
Find string permutation including the single character using C# or F#
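As a rough illustration of what permuting the characters could look like in VB.NET, here is a minimal recursive sketch. The module and function names (PermutationDemo, Permutations) are made up for this example and are not taken from the linked question. It assumes a compiler that supports Iterator functions, and it does not remove duplicates when a letter occurs more than once.

```vbnet
Imports System
Imports System.Collections.Generic

Module PermutationDemo
    ' Hypothetical helper: lazily yields every ordering of the given letters.
    ' Repeated letters produce repeated results; filter with Distinct() if needed.
    Iterator Function Permutations(letters As String) As IEnumerable(Of String)
        If letters.Length <= 1 Then
            Yield letters
        Else
            For i As Integer = 0 To letters.Length - 1
                ' Fix the i-th letter in front and permute the remaining ones.
                Dim rest As String = letters.Remove(i, 1)
                For Each tail As String In Permutations(rest)
                    Yield letters(i) & tail
                Next
            Next
        End If
    End Function

    Sub Main()
        ' Prints abc, acb, bac, bca, cab, cba (one per line).
        For Each word As String In Permutations("abc")
            Console.WriteLine(word)
        Next
    End Sub
End Module
```

The number of permutations grows factorially with the number of letters, so for a full Scrabble rack it may be cheaper to compare sorted letters against a dictionary instead of generating every ordering.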

Related

How to parse specific data in a string line?

What makes my question different from many others already asked is that I know how to parse data with delimiters such as a comma or space, but I'm unsure how to parse data that is separated by spaces but also contains spaces. This is an example line:
M 9 12.02 Adam Productions Inc
So I need to parse "M", 9, 12.02, and "Adam Productions Inc" out to different variables but I'm not sure how to do that. Here is my code to parse by spaces.
Dim contents() As String = strRawFile.Split(New [Char]() {CChar(" ")}, StringSplitOptions.RemoveEmptyEntries)
Obviously this does parse my first 3 pieces of data, but then it tears apart the 4th piece. How can I modify my code to overcome this?
Desired Result:
M
9
12.02
Adam Productions Inc

One way is to split it out and then join together items from the 4th element of the array until the array length. This assumes that there will not be any embedded spaces in the first 3 items.
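A minimal sketch of that split-then-join approach, assuming the line always has at least four space-separated items and that the first three never contain spaces. The variable names code, quantity, price and company are hypothetical, as is the choice of Integer and Decimal for the numeric fields.

```vbnet
Imports System
Imports System.Globalization

Module ParseLineDemo
    Sub Main()
        Dim strRawFile As String = "M 9 12.02 Adam Productions Inc"
        Dim contents() As String = strRawFile.Split(New Char() {" "c}, StringSplitOptions.RemoveEmptyEntries)

        If contents.Length < 4 Then
            Console.WriteLine("Line does not have the expected four fields.")
            Return
        End If

        ' The first three items are taken as-is.
        Dim code As String = contents(0)
        Dim quantity As Integer = Integer.Parse(contents(1), CultureInfo.InvariantCulture)
        Dim price As Decimal = Decimal.Parse(contents(2), CultureInfo.InvariantCulture)

        ' Everything from the 4th element to the end is joined back together.
        Dim company As String = String.Join(" ", contents, 3, contents.Length - 3)

        Console.WriteLine(code)      ' M
        Console.WriteLine(quantity)  ' 9
        Console.WriteLine(price.ToString(CultureInfo.InvariantCulture)) ' 12.02
        Console.WriteLine(company)   ' Adam Productions Inc
    End Sub
End Module
```

Because empty entries are removed before joining, runs of several spaces inside the last field collapse to a single space.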

How to separate words characters and non word characters?

Unicode has categories of characters. Some are alphanumeric. Some are punctuation.
What if I want to know whether a character belongs to a word (keyword) or not?
For example,
A, a, b, c tend to belong to words. So do Ƈ, Ǝ, ǟ, and so do all Chinese characters.
A sentence like
Hello World, I "like" (to) eat ƇƎǟ and 款开源 ©
has these keywords:
Hello
World
I
like
to
eat
ƇƎǟ
款
开
源
Here , ( ) and © are not word characters and hence should just be ignored.
© doesn't count as punctuation either. Char.IsPunctuation returns False for '©' in VB.NET, but I want to get rid of that too.
Now I want to make a program that can split sentences into keywords. For that I need to know which characters are word characters and which ones are not.
Is there a vb.net function for that?

Do it the other way round: use IsLetter for your test. Or better yet, use regular expressions to split your string by words:
Imports System.Text.RegularExpressions
Dim str = "Hello World, I ""like"" (to) eat ƇƎǟ and 款开源 ©"
Dim wordPattern As New Regex("\p{L}+")
' Print each run of consecutive Unicode letters
For Each match As Match In wordPattern.Matches(str)
    Console.WriteLine(match.Value)
Next
Here, \p{L} matches any Unicode letter. However, the above matches “款开源” as a single match rather than as separate matches, since there is no separator between the characters.
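If each Chinese character should count as its own keyword, one possible variation is to match a single CJK ideograph or a run of any other letters. This is only a sketch: it assumes the .NET named block IsCJKUnifiedIdeographs (U+4E00 to U+9FFF) covers the ideographs you care about, which is not true for every CJK character, and the module name KeywordDemo is made up.

```vbnet
Imports System
Imports System.Text
Imports System.Text.RegularExpressions

Module KeywordDemo
    Sub Main()
        ' Ask the console to emit UTF-8 so the non-ASCII matches can be displayed.
        Console.OutputEncoding = Encoding.UTF8

        Dim str As String = "Hello World, I ""like"" (to) eat ƇƎǟ and 款开源 ©"

        ' Either one CJK ideograph on its own, or a run of letters that are
        ' not CJK ideographs (character class subtraction).
        Dim pattern As String = "\p{IsCJKUnifiedIdeographs}|[\p{L}-[\p{IsCJKUnifiedIdeographs}]]+"

        For Each m As Match In Regex.Matches(str, pattern)
            Console.WriteLine(m.Value)
        Next
    End Sub
End Module
```

With this pattern, 款, 开 and 源 come out as three separate matches, while Hello, World and ƇƎǟ stay whole. Punctuation and © are skipped because they are not in \p{L}.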

You need to deal with character codes.
For example, if you only want the letters [a-z],
then
if (c >= 'a' && c <= 'z') {
}
or
if (c >= 97 && c <= 122) {
}

Which Unicode characters are "composing" characters (whose sole purpose is to add an accent or tilde)?

This is related to
What are the characters that count as the same character under collation of UTF8 Unicode? And what VB.net function can be used to merge them?
This is how I plan to do this:
Use http://msdn.microsoft.com/en-us/library/dd374126%28v=vs.85%29.aspx to turn the string into KD form.
Basically it'll turn most variations, such as superscripts, into the normal number. It also decomposes a letter with a tilde or accent into two characters.
The next step would be to remove all characters whose sole purpose is to add a tilde or accent to another character.
How do I know which characters are like that? Which characters are just "composing characters"?
How do I find such characters? After I find them, how do I get rid of them? Should I scan character by character and remove all such "combining characters"?
For example:
Characters from 300 to 362 can be gotten rid of.
Then what?

Combining characters are listed in UnicodeData.txt as having a nonzero Canonical_Combining_Class, and a General_Category of Mn (Mark, nonspacing).

For each character in the string, call GetUnicodeCategory and check the UnicodeCategory for NonSpacingMark, SpacingCombiningMark or EnclosingMark.
You may be able to do it more efficiently using regex, eg Regex.Replace(str, "\p{M}", "").
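Put together with the KD normalization step from the question, the category check could be sketched like this. StripMarks and DiacriticsDemo are hypothetical names, and the sketch assumes that dropping every character in the three mark categories is what you want.

```vbnet
Imports System
Imports System.Globalization
Imports System.Text

Module DiacriticsDemo
    ' Hypothetical helper: decompose to form KD, then drop all mark characters.
    Function StripMarks(input As String) As String
        Dim decomposed As String = input.Normalize(NormalizationForm.FormKD)
        Dim sb As New StringBuilder(decomposed.Length)

        For Each ch As Char In decomposed
            Select Case CharUnicodeInfo.GetUnicodeCategory(ch)
                Case UnicodeCategory.NonSpacingMark,
                     UnicodeCategory.SpacingCombiningMark,
                     UnicodeCategory.EnclosingMark
                    ' Combining mark: skip it.
                Case Else
                    sb.Append(ch)
            End Select
        Next

        Return sb.ToString()
    End Function

    Sub Main()
        ' "é" decomposes to "e" plus U+0301, and "²" becomes "2" under form KD.
        Console.WriteLine(StripMarks("café x²")) ' cafe x2
    End Sub
End Module
```

Checking the category rather than a fixed code point range also covers combining marks outside the basic combining diacritics block.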

Regex issue using ICU regex/regexkitlite

Starting a new question as my other question solved a different issue with the regex.
Here's my regex:
(?i)\\d{1,4}(?<!v(?:ol)?\\.?\\s?)(?![^\\(]*\\))
Regex split up for clarity:
(?i) - case insensitive
\\d{1,4} - a number with 1-4 digits
(?<!v(?:ol)?\\.?\\s?) the number cannot be preceded by 'v', 'v.', 'vol', 'vol.', with or without a space on the end.
(?![^\\(]*\\)) - Number cannot be inside parentheses.
It all works except for the 'vol.' bit:
@"Words words 342 words (2342) (words 2 words) (words).ext" result 342 - correct.
@"Words - words words (2010) (words 2 words) (words).ext" result nil - correct.
@"words words v34 35.ext" result 34 - incorrect.
@"Words vol.342 343 (1234) (3 words) (desc).ext" result 342 - incorrect.
What am I doing wrong with my 'vol.' section?

You need to put the lookbehind before the number. Also, you need to add digits as illegal characters inside the lookbehind, or the 4 in v.34 will match. Try
(?i)(?<!v(?:ol)?\\.?\\s*\\d*)\\d{1,4}(?![^(]*\\))
This assumes (edit: wrongly, as it turns out) that RegexKitLite supports unbounded repetition inside lookbehind, which few regex flavors do.
A look at the docs shows that it does support finite (but variable) repetition inside lookbehind. So, as long as you accept that the following only works if there is at most one space between vol. and the number, you could try
(?i)(?<!v(?:ol)?\\.?\\s?)(?<!\\d)\\d{1,4}(?![^(]*\\))
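To see how that last pattern behaves on the sample strings from the question, here is a small check written in VB.NET. This is an assumption-laden stand-in: it uses the .NET regex engine rather than ICU/RegexKitLite, so it only shows the logic of the pattern, not that RegexKitLite accepts it. VB.NET string literals do not need the doubled backslashes.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module VolumeRegexDemo
    Sub Main()
        ' Same pattern as above, with single backslashes.
        Dim pattern As String = "(?i)(?<!v(?:ol)?\.?\s?)(?<!\d)\d{1,4}(?![^(]*\))"

        Dim samples() As String = {
            "Words words 342 words (2342) (words 2 words) (words).ext",
            "Words - words words (2010) (words 2 words) (words).ext",
            "words words v34 35.ext",
            "Words vol.342 343 (1234) (3 words) (desc).ext"
        }

        For Each sample As String In samples
            ' Report the first match in each sample, if any.
            Dim m As Match = Regex.Match(sample, pattern)
            Console.WriteLine(If(m.Success, m.Value, "(no match)"))
        Next
    End Sub
End Module
```

In .NET the first matches are 342, no match, 35 and 343: the numbers directly after v or vol. are skipped, and the second lookbehind stops the engine from matching the tail of a skipped number.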

How to remove strings contained in a list in VB.NET?

How can I find words like and, or, to, a, no, with, for etc. in a sentence using VB.NET and remove them? Also, where can I find a list of all such words?

Note that unless you use Regex word boundaries you risk falling afoul of the Scunthorpe (Sfannythorpe) problem.
using System.Text.RegularExpressions;
string pattern = @"\band\b";
Regex re = new Regex(pattern);
string input = "a band loves and its fans";
string output = re.Replace(input, ""); // "a band loves  its fans" (the spaces around the removed word remain)
Notice the 'and' in 'band' is untouched.

You can indeed replace your list of words using the .Replace function (as colithium described) ...
myString = myString.Replace("and", "")
Edit:
... but indeed, a nicer way is to use Regular Expressions (as edg suggested) to avoid replacing parts of words.
As your question suggests that you would like to clean up a sentence to keep meaningful words, you have to do more than just remove two- and three-letter words.
What you need is a list of stop-words:
http://en.wikipedia.org/wiki/Stop_word
A comma-separated list of stop-words for the English language can be found here:
http://www.textfixer.com/resources/common-english-words.txt

The easiest way is:
myString = myString.Replace("and", "")
You'd loop over your word list and have a statement like the above. Google for a list of common English words?
List of English 2 Letter Words
List of English 3 Letter Words

You can match the words and remove them using regular expressions.
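A minimal VB.NET sketch of the regular-expression approach, assuming the stop words are available as a list of strings. RemoveStopWords and StopWordDemo are hypothetical names, and the short word list is just the one from the question.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Text.RegularExpressions

Module StopWordDemo
    ' Hypothetical helper: removes whole-word occurrences of the stop words.
    Function RemoveStopWords(sentence As String, stopWords As IEnumerable(Of String)) As String
        ' Build \b(?:and|or|...)\b so that only whole words are matched.
        Dim alternation As String = String.Join("|", stopWords.Select(Function(w) Regex.Escape(w)))
        Dim pattern As String = "\b(?:" & alternation & ")\b"

        Dim stripped As String = Regex.Replace(sentence, pattern, "", RegexOptions.IgnoreCase)

        ' Collapse the whitespace left behind by the removed words.
        Return Regex.Replace(stripped, "\s+", " ").Trim()
    End Function

    Sub Main()
        Dim stopWords As String() = {"and", "or", "to", "a", "no", "with", "for"}
        Console.WriteLine(RemoveStopWords("a band loves and its fans", stopWords))
        ' band loves its fans
    End Sub
End Module
```

The \b word boundaries are what keep the "and" inside "band" intact, which a plain String.Replace would not do.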
