Extracting characters from an input string in VB.NET

Hey guys I'm stuck with this question. Please help.
I want to write a program that can extract alphabetical characters and special characters from an input string. An alphabetical character is any character from "a" to "z" (capital letters and numbers not included). A special character is any other character that is not alphanumeric.
Example:
string = hello//this-is-my-string#capetown
alphabetical characters = hellothisismystringcapetown
special characters = //---#
Now my question is this:
How do I loop through all the characters?
(the for loop I'm using reads like this for x = 0 to strname.length)...is this correct?
How do I extract characters to a string?
How do I determine special characters?
any input is greatly appreciated.
Thank you very much for your time.

You could loop through each character as follows:
For Each _char As Char In strname
    'Code here
Next
or
For x As Integer = 0 To strname.Length - 1
    'Code here
Next
Or you can use Regex to replace the values you do not need in your string (I think this may be faster, but I am no expert). Take a look at: http://msdn.microsoft.com/en-us/library/xwewhkd1.aspx
Edit
The replacement code will look something like the following, although I am not sure what the regular expression should be (the variable called pattern currently only removes digits):
Dim pattern As String = "(\d+)?" 'You need to update the regular expression here
Dim input As String = "123//hello//this-is-my-string#capetown"
Dim rgx As New Regex(pattern)
Dim result As String = rgx.Replace(input, "")
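For the question as asked, two replacements may be enough. The following is a minimal sketch, assuming "alphabetical" means lower-case a-z only and "special" means anything that is not a letter or digit; the module and variable names are illustrative only.

```vbnet
Imports System.Text.RegularExpressions

Module ExtractWithRegex
    Sub Main()
        Dim input As String = "hello//this-is-my-string#capetown"

        ' Remove everything that is NOT a lower-case letter: the letters remain.
        Dim alphabetical As String = Regex.Replace(input, "[^a-z]", "")

        ' Remove every letter and digit: the special characters remain.
        Dim special As String = Regex.Replace(input, "[A-Za-z0-9]", "")

        Console.WriteLine(alphabetical) ' hellothisismystringcapetown
        Console.WriteLine(special)      ' //---#
    End Sub
End Module
```

Capital letters and digits end up in neither result, which matches the definitions given in the question.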

Since you need to keep the values, you'll want to loop through your string. Keeping a list of characters as a result will come in handy since you can build a fresh string later. Then take advantage of a simple Regex test to determine where to place things. The pseudocode looks something like this.
Dim alphaChars As New List(Of String)
Dim specialChars As New List(Of String)
For Each _char As Char In testString
    If Regex.IsMatch(_char, "[a-z]") Then
        alphaChars.Add(_char)
    Else
        specialChars.Add(_char)
    End If
Next
Then, if you need to dump your results into a full string, you can simply use
String.Join(String.Empty, alphaChars.ToArray())
Note that this code makes the assumption that ANYTHING other than a-z is considered a special character, so if need be you can do a second regular expression in your Else clause to test for your special characters in a similar manner. It really depends on how much control you have over the input.
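The second test mentioned above could be sketched as follows. This is only an illustration, assuming capital letters and digits should be dropped from both results; the sample input and the module name are made up for the example.

```vbnet
Imports System.Text
Imports System.Text.RegularExpressions

Module ClassifyChars
    Sub Main()
        Dim testString As String = "Hello//this-is-my-string#capetown2"
        Dim alphaChars As New StringBuilder()
        Dim specialChars As New StringBuilder()

        For Each c As Char In testString
            Dim s As String = c.ToString()
            If Regex.IsMatch(s, "[a-z]") Then
                alphaChars.Append(c)
            ElseIf Regex.IsMatch(s, "[^A-Za-z0-9]") Then
                ' Second test: only non-alphanumeric characters count as special.
                specialChars.Append(c)
            End If
            ' Capital letters and digits match neither test and are ignored.
        Next

        Console.WriteLine(alphaChars.ToString())   ' ellothisismystringcapetown
        Console.WriteLine(specialChars.ToString()) ' //---#
    End Sub
End Module
```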

Related

Parsing string - "Contains" is insufficient

I use this code to check if a String is in another String:
If StringData(1).Contains("-SomeText2.") Then
'some code
End If
'StringData(1) looks like this:
'-SomeText1.1401-|-SomeText2.0802-|-SomeText3.23-|-SomeText4.104-|
'In case I look for -SomeText1. I need 1401
'In case I look for -SomeText2. I need 0802
'In case I look for -SomeText3. I need 23
'In case I look for -SomeText4. I need 104
I first check if -SomeText2. is in StringData(1). If it is, I need to get the next part of the text (0802), which is the part I don't know how to do. How can I do it?
All the strings are separated by | and all substrings start and end with - and have a . separating the first part from the second. I check all the strings starting with - and ending with . because there are some with - and | in the middle, so Split function won't work.
Those strings change quite often, so I need something to check it no matter the length of the strings.
I would just split the string up and get the text between "." and "-" when the search text is found like this:
Dim str As String = "-SomeText1.1401-|-SomeText2.0802-|-SomeText3.23-|-SomeText4.104-"
Dim searches() As String = {"-SomeText1", "-SomeText2", "-SomeText3", "-SomeText4"}
For Each search As String In searches
    For Each value As String In str.Split(CChar("|"))
        If value.Contains(search) Then
            Dim partIwant As String = value.Substring(value.IndexOf(".") + 1, value.Length - value.IndexOf(".") - 2)
            MsgBox(partIwant)
            'Outputs: 1401, 0802, 23, 104
            Exit For
        End If
    Next
Next
In this example, we just use Contains() to see if our search string is present or not...we can't actually use that function to get any further information because all it returns is a True or False. So once we know that our string has been found, it's just a matter of some string manipulation to grab the text between the "." and "-" characters. IndexOf() will get us the index of the period, and then we just pull the text between there and the last character of the string.
Your question has nothing to do with WPF, so the tag and title are misleading.
To solve your problem, you should use String.IndexOf(string) instead of String.Contains(string). That tells you at which position the given string starts. If that value is -1, it means that the original string does not contain your search string at all.
Once you have that starting index, you can use String.IndexOf(string, int) to search for the next occurrence of -, so you know where the entry stops. The second parameter tells it at which index it should start the search. In this case you should start just past the end of your match (the match index plus the length of the search string), because the search string itself begins with - and would otherwise be found again.
Now that you know the starting index of your match, the end index of the entry and the length of your search string, you can put those together and easily use String.Substring(int, int) to get the part of the string that you are interested in.
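The steps above can be sketched as follows. GetValue is a hypothetical helper name, and the sketch assumes the value never contains a "-" itself.

```vbnet
Module FindValue
    ' Returns the text between the search string and the next "-",
    ' or Nothing when the search string (or the closing "-") is missing.
    Function GetValue(data As String, search As String) As String
        Dim matchStart As Integer = data.IndexOf(search, StringComparison.Ordinal)
        If matchStart = -1 Then Return Nothing

        ' The value begins right after the search string.
        Dim valueStart As Integer = matchStart + search.Length
        Dim valueEnd As Integer = data.IndexOf("-", valueStart, StringComparison.Ordinal)
        If valueEnd = -1 Then Return Nothing

        Return data.Substring(valueStart, valueEnd - valueStart)
    End Function

    Sub Main()
        Dim data As String = "-SomeText1.1401-|-SomeText2.0802-|-SomeText3.23-|-SomeText4.104-|"
        Console.WriteLine(GetValue(data, "-SomeText2.")) ' 0802
        Console.WriteLine(GetValue(data, "-SomeText4.")) ' 104
    End Sub
End Module
```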
That's the straightforward, naive solution. A more sophisticated solution would build a regular expression from the search string so that the part you are interested in ends up in a capture group. But that's a more elaborate topic.
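A rough sketch of the regular expression variant, under the same assumption that the value contains no "-". GetValue is again a hypothetical name; Regex.Escape is used so that the "." in the search string is matched literally.

```vbnet
Imports System.Text.RegularExpressions

Module FindValueRegex
    Function GetValue(data As String, search As String) As String
        ' The group ([^-]*) captures everything up to the next "-".
        Dim pattern As String = Regex.Escape(search) & "([^-]*)-"
        Dim m As Match = Regex.Match(data, pattern)
        Return If(m.Success, m.Groups(1).Value, Nothing)
    End Function

    Sub Main()
        Dim data As String = "-SomeText1.1401-|-SomeText2.0802-|-SomeText3.23-|-SomeText4.104-|"
        Console.WriteLine(GetValue(data, "-SomeText1.")) ' 1401
        Console.WriteLine(GetValue(data, "-SomeText3.")) ' 23
    End Sub
End Module
```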

Replacing Characters Simultaneously

Hey guys I'm trying to make a program that helps people encrypt messages and decrypt messages using the Caesar shift cipher, I know it's probably already been done, I want to have a go myself though.
The problem I've been having is when it comes to encrypting the text. The user selects a number (between 1-25) and then the application will change the letters corresponding to the number chosen, e.g. if the user inputs "HI" and selects 2, both characters are moved two places down the alphabet outputting "JK". My main problem is the replacing characters though, mostly because I've set up the program to be able to encrypt large blocks of text, because my code is:
If cmbxKey.Text = "1" Then
If txtOutput.Text.Contains("a") Then
sOutput = txtOutput.Text.Replace("a", "b")
txtOutput.Text = sOutput
End If
If txtOutput.Text.Contains("b") Then
sOutput = txtOutput.Text.Replace("b", "c")
txtOutput.Text = sOutput
End If
End If
This means if the user inputs "HAY" it will change it to "HBY" and then because of the second if statement it will change it to "HCY" but I only want it to be changed once. Any suggestions to avoid this???? Thanks guys
Since you want to shift all characters, start out by looping through the characters of the string:
For Each c As Char In txtOutput.Text
    'This will run for each character in the string, even spaces
Next
Then, rather than having cases for every letter, look at its ASCII code:
Asc(c)
...and shift it by the number you want to. Keep in mind that if the shifted code is greater than 122 ("z"; I don't know if you want upper/lower case, for upper case the limit is 90, "Z"), you want to subtract 26 to get back to the start of the alphabet.
Then you can convert it back into a character using:
Chr(code)
So this might look something like this:
Dim shift As Integer = CInt(cmbxKey.Text) 'The key the user selected
Dim sb As New System.Text.StringBuilder()
For Each c As Char In txtOutput.Text
    If c >= "a"c AndAlso c <= "z"c Then
        Dim code As Integer = Asc(c) + shift
        If code > 122 Then code -= 26 'Wrap around past "z"
        sb.Append(Chr(code))
    Else
        sb.Append(c) 'Leave everything that is not a lower case letter unchanged
    End If
Next
txtOutput.Text = sb.ToString()
A very simple method of changing your application while keeping your strategy is to replace the lower case characters with upper case characters. Then they won't be recognized by the Replace method anymore.
Obviously, the problem is that you want to implement an algorithm. In general, an algorithm should be smart in the sense that you don't have to do the grunt work. That's why a method such as the one presented by Steve is smarter; it doesn't require you to map each character separately, which is tedious, and - as most tedious tasks - error prone.
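One way to avoid mapping each letter separately is modular arithmetic on the letter's position in the alphabet. This is a minimal sketch, assuming only the 26 unaccented Latin letters are shifted and everything else is copied through unchanged; CaesarShift is a hypothetical name, not something from the posts above.

```vbnet
Imports System.Text

Module CaesarCipher
    ' Shifts A-Z and a-z by the given key; other characters are copied unchanged.
    Function CaesarShift(text As String, key As Integer) As String
        ' Normalise the key to 0..25 so negative keys (for decrypting) also work.
        Dim k As Integer = ((key Mod 26) + 26) Mod 26
        Dim sb As New StringBuilder(text.Length)

        For Each c As Char In text
            If c >= "a"c AndAlso c <= "z"c Then
                sb.Append(ChrW(AscW("a"c) + ((AscW(c) - AscW("a"c) + k) Mod 26)))
            ElseIf c >= "A"c AndAlso c <= "Z"c Then
                sb.Append(ChrW(AscW("A"c) + ((AscW(c) - AscW("A"c) + k) Mod 26)))
            Else
                sb.Append(c)
            End If
        Next

        Return sb.ToString()
    End Function

    Sub Main()
        Console.WriteLine(CaesarShift("HAY", 1))  ' IBZ
        Console.WriteLine(CaesarShift("xyz", 2))  ' zab
        Console.WriteLine(CaesarShift("IBZ", -1)) ' HAY
    End Sub
End Module
```

Because each character is handled exactly once, a letter can never be replaced twice, which is the problem described in the question.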
One big issue arises when you're facing a String that the basic alphanumeric table can't handle, such as a String that contains words like:
"Déja vu" -> The "é" is going to be what ?
And also, how about encoding the string "I'm Aaron Mbilébé" if you use .ToUpper().
.ToUpper returns "I'M AARON MBILÉBÉ".
You've lost the casing, and how do you handle the shifting of "É" ?
Of course, code should be smart as pointed out above, and I used to deal with strings just by using System.Text.ASCIIEncoding to make things easier. But from the moment I started to use large amounts of textual data from the web, files (...), I was forced to dig deeper and seriously consider string encoding (and system endianness, by the way, when converting strings to and from arrays of bytes).
Rethink what you really want in the end. If you're the only one who will use your code, and you're certain that you'll only use A..Z, 0..9, a..z, space and a fixed set of allowed characters (like punctuation), then just build a table containing each of those chars.
Private _AllowedChars As Char() = { "A"c, "B"c, ... "0"c, "1"c, .. "."c, ","c ... }
or
Private _AllowedChars As Char() = "ABCDEF....012...abcd..xyz.;,?:/".ToCharArray()
Then use
Private Function ShiftChars(ByVal CurrentString As String, ByVal ShiftValue As Integer) As String
Dim AllChars As Char() = CurrentString.ToCharArray()
Dim FinalChars As Char()
Dim i As Integer
FinalChars = New Char(AllChars.Length - 1) {} ' In VB the number in parentheses is the upper bound,
' so an array of n items is declared with n - 1.
For i = 0 To AllChars.Length - 1
FinalChars(i) = _AllowedChars((Array.IndexOf(_AllowedChars, AllChars(i)) + ShiftValue) Mod _AllowedChars.Length)
Next
Return New String(FinalChars)
End Function
And
Private Function UnShiftChars(ByVal CurrentString As String, ByVal ShiftValue As Integer) As String
' ... the same code until :
FinalChars(i) = _AllowedChars((Array.IndexOf(_AllowedChars, AllChars(i)) - ShiftValue + _AllowedChars.Length) Mod _AllowedChars.Length)
' ...
End Function
^^ Assuming ShiftValue is always positive (defined once)
But again, this only works when you have a predefined set of allowed characters. If you want a more flexible tool, you ought to start dealing with encodings, arrays of bytes, BitConverter, and have a look at system endianness. That's why I asked whether someone else is going to use your application. Let's try this string:
"Xin chào thế giới" ' which is Hello World in vietnamese (Google Trad)
In that case, you may give up..? No ! You ALWAYS have a trick in your cards !
Just create your allowed chars on the fly
Private _AllowedChars As New SortedList(Of Char, Char)
-> get the string to encode (shift)
Private Function ShiftChars(ByVal CurrentString As String, ByVal ShiftValue As Integer) As String
Dim AllChars As Char() = CurrentString.ToCharArray()
Dim FinalChars As Char()
Dim i As Integer
' Build your list of allowed chars...
_AllowedChars.Clear()
For i = 0 To AllChars.Length - 1
If Not _AllowedChars.ContainsKey(AllChars(i)) Then
_AllowedChars.Add(AllChars(i), AllChars(i))
End If
Next
' Then, encode...
FinalChars = New Char(AllChars.Length - 1) {}
For i = 0 To AllChars.Length - 1
FinalChars(i) = _AllowedChars.Keys.Item((_AllowedChars.IndexOfKey(AllChars(i)) + ShiftValue) Mod _AllowedChars.Count)
Next
Return New String(FinalChars)
End Function
The same for Unshift/decode.
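A compact sketch of the round trip, written as a single function that accepts a negative shift for decoding. This is a variation on the code above rather than the original author's version; BuildAlphabet is a hypothetical helper. It relies on the fact that shifting is a permutation of the distinct characters, so the encoded string yields the same character list as the original.

```vbnet
Imports System.Collections.Generic

Module OnTheFlyShift
    ' Collects the distinct characters of the text in sorted order.
    Private Function BuildAlphabet(text As String) As SortedList(Of Char, Char)
        Dim allowed As New SortedList(Of Char, Char)
        For Each c As Char In text
            If Not allowed.ContainsKey(c) Then allowed.Add(c, c)
        Next
        Return allowed
    End Function

    ' A negative shiftValue undoes a positive one.
    Function ShiftChars(text As String, shiftValue As Integer) As String
        Dim allowed As SortedList(Of Char, Char) = BuildAlphabet(text)
        Dim n As Integer = allowed.Count
        Dim result(text.Length - 1) As Char

        For i As Integer = 0 To text.Length - 1
            Dim idx As Integer = allowed.IndexOfKey(text(i)) + shiftValue
            ' The double Mod keeps the index in 0..n-1 for negative shifts too.
            result(i) = allowed.Keys(((idx Mod n) + n) Mod n)
        Next

        Return New String(result)
    End Function

    Sub Main()
        Dim original As String = "Xin chào thế giới"
        Dim encoded As String = ShiftChars(original, 3)
        Dim decoded As String = ShiftChars(encoded, -3)
        Console.WriteLine(decoded = original) ' True
    End Sub
End Module
```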
Note: for foreign languages, the resulting string is pure garbage and totally unreadable until you unshift the chars again.
However, the main limitation of this workaround is the same as for the fixed chars array above: once you encode your string, if you add a char to the encoded string that doesn't exist in the initially generated allowed chars, then you've nuked your data and you won't be able to decode your string. All you'll have is pure garbage.
So one day... one day maybe, you'll have to dig deeper at the byte level of the thing, in a defined extended encoding (Unicode/UTF8/16) to secure the integrity of your data.

Isolate a substring within quotes from an entire line

To start here is an example of a line I am trying to manipulate:
trait slot QName(PrivateNamespace("*", "com.company.assembleegameclient.ui:StatusBar"), "_-0IA") type QName(PackageNamespace(""), "Boolean") value False() end
I wrote code that reads through each line and stops at the appropriate line. What I am trying to achieve now is to read through the characters and save just the
_-0IA
to a new string. I tried using Trim(), Replace(), and IndexOf so far, but I am having a ton of difficulties because of the quotation marks. Has anyone dealt with this issue before?
Assuming your source string will always follow a strict format with only some data changes, something like this might work:
'Split the string by "," and extract the 3rd element. Trim the space and
'quotation mark from the front and extract the first 5 characters.
Dim targetstr As String = sourcestr.Split(","c)(2).TrimStart(" """.ToCharArray).Substring(0, 5)
If the length of the target string is variable it can be done like this:
Dim temp As String = sourcestr.Split(","c)(2).TrimStart(" """.ToCharArray)
'Use the index of the next quotation mark instead of a fixed length
Dim targetstr As String = temp.Substring(0, temp.IndexOf(""""c))
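If counting commas feels fragile, a regular expression is another option. This is only a sketch, assuming the wanted value is always the first quoted string that directly follows a "),"; that assumption comes from the single sample line in the question and may not hold for other lines.

```vbnet
Imports System.Text.RegularExpressions

Module ExtractQuoted
    Sub Main()
        Dim sourcestr As String = "trait slot QName(PrivateNamespace(""*"", ""com.company.assembleegameclient.ui:StatusBar""), ""_-0IA"") type QName(PackageNamespace(""""), ""Boolean"") value False() end"

        ' Look for ")" and "," followed by optional spaces and a quoted string.
        ' Group 1 captures the text between the quotation marks.
        Dim m As Match = Regex.Match(sourcestr, "\),\s*""([^""]*)""")
        If m.Success Then
            Console.WriteLine(m.Groups(1).Value) ' _-0IA
        End If
    End Sub
End Module
```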

Detect chinese character in a string VB.NET

Is there a way to detect a Chinese character in a string which is built like this:
dim test as string = "letters 中國的"
Now I want to substring only the Chinese characters. But my code is database driven, so I can't substring it, because the length is always different. So is there a way I can split the string, from the moment I detect a Chinese character?
I think you can use a regular expression like in the following example. I didn't test it and I haven't coded in VB.NET for years, so the syntax may not be correct.
Dim m As Match = Regex.Match(value, "[\u4e00-\u9fa5]+")
' If successful, read the matched text.
If m.Success Then
    ' The pattern has no capture group, so use the whole match.
    Dim key As String = m.Value
End If
http://msdn.microsoft.com/en-us/library/twcw2f1c.aspx
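To split the string at the point where the Chinese text starts, the match position can be used. A minimal sketch, assuming the Chinese part is one contiguous run and that the main CJK Unified Ideographs block (U+4E00 to U+9FFF) is enough for the data in question:

```vbnet
Imports System.Text.RegularExpressions

Module SplitAtChinese
    Sub Main()
        Dim test As String = "letters 中國的"

        ' Characters in the CJK extension blocks are not covered by this range.
        Dim m As Match = Regex.Match(test, "[\u4e00-\u9fff]+")

        If m.Success Then
            Dim before As String = test.Substring(0, m.Index) ' text before the first Chinese character
            Dim chinese As String = m.Value                   ' the run of Chinese characters
            Console.WriteLine(before.TrimEnd()) ' letters
            Console.WriteLine(chinese)          ' 中國的
        End If
    End Sub
End Module
```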

Strip all punctuation from a string in VB.net

How do I strip all punctuation from a string in vb.net? I really do not want to do stringname.Replace("$", "") for every single bit of punctuation, though it would work.
How do I do this quickly and efficiently?
Other than coding something that codes this for me....
You can use a regular expression to match anything that you want to remove:
str = Regex.Replace(str, "[^A-Za-z]+", String.Empty)
[^...] is a negative set that matches any character that is not in the set. You can just put any character there that you want to keep.
Quick example using a positive regex match. Simply place the characters you want removed in it:
Imports System.Text.RegularExpressions
Dim foo As String = "The, Quick brown fox. Jumped over the Lazy Dog!"
Console.WriteLine(Regex.Replace(foo, "[!,.""'?]+", String.Empty))
If you want a non-regex solution, you could try something like this:
Imports System.Text
Dim stringname As String = "^^^%%This,,,... is $$my** original(((( stri____ng."
Dim sb As New StringBuilder
Dim c As Char
For Each c In stringname
    If Not (Char.IsSymbol(c) OrElse Char.IsPunctuation(c)) Then
        sb.Append(c)
    End If
Next
Console.WriteLine(sb.ToString)
Output is "This is my original string".
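The same filtering can probably be expressed with a regular expression using Unicode categories, where \p{P} matches punctuation and \p{S} matches symbols. A short sketch reusing the sample string from above:

```vbnet
Imports System.Text.RegularExpressions

Module StripPunctuation
    Sub Main()
        Dim stringname As String = "^^^%%This,,,... is $$my** original(((( stri____ng."

        ' Remove every run of punctuation or symbol characters.
        Dim cleaned As String = Regex.Replace(stringname, "[\p{P}\p{S}]+", String.Empty)
        Console.WriteLine(cleaned) ' This is my original string
    End Sub
End Module
```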