Replace or encode the em-dash symbol in VB.NET

When I replace the em-dash with an empty string or another symbol, it doesn't work:
Dim sWebsiteText As String = "Hari—prasanth"
sWebsiteText = sWebsiteText.Replace("—", "")
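This excerpt contains no answer to the question. The snippet itself is valid VB.NET, so one plausible explanation, which the question does not confirm, is that the website text contains a different character than the one typed in the source (for example an en dash, U+2013), or that the source file's encoding altered the literal. The sketch below builds the characters from their code points with ChrW, so it does not depend on how the source file is saved. The module and function names are hypothetical.

```vbnet
Imports System
Imports Microsoft.VisualBasic

Module DashCleaner
    ' Hypothetical helper: removes em dashes (U+2014) and, as an assumption
    ' about the scraped text, en dashes (U+2013). ChrW builds each character
    ' from its code point, so this does not depend on the source file encoding.
    Public Function RemoveDashes(ByVal text As String) As String
        text = text.Replace(ChrW(&H2014).ToString(), "")
        text = text.Replace(ChrW(&H2013).ToString(), "")
        Return text
    End Function

    Sub Main()
        Dim sWebsiteText As String = "Hari" & ChrW(&H2014) & "prasanth"
        sWebsiteText = RemoveDashes(sWebsiteText)
        Console.WriteLine(sWebsiteText) ' prints Hariprasanth
    End Sub
End Module
```

If the replacement still fails, it is worth printing AscW of each character in the input to see which code point the dash actually has.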

Related

How to split on a string instead of a character?

I have a file name like this:
sub_fa__hotchkis_type1a__180310__PUO4x4__180813
I want to split it on double underscores ("__") using this code:
Dim MdlNameArr() As String = Path.GetFileNameWithoutExtension(strProjMdlName).Split(New Char() {"__"}, StringSplitOptions.RemoveEmptyEntries)
myTool.Label9.Text = MdlNameArr(1).ToString
I expect the result to be "hotchkis_type1a", but it returns "fa".
It doesn't distinguish a single underscore "_" from the double one.
Is there a way to do this properly?
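You need to split on a string rather than a single character. The code in the question only compiles with Option Strict Off, where the String "__" is implicitly converted to the Char "_"c (its first character), so the name is split at every single underscore. Among the overloads of String.Split, the one that fits is String.Split(String(), StringSplitOptions), which takes an array of strings as separators and requires a StringSplitOptions argument, like this: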
Dim s = "sub_fa__hotchkis_type1a__180310__PUO4x4__180813"
Dim separators() As String = {"__"}
Dim parts = s.Split(separators, StringSplitOptions.None)
If parts.Length >= 2 Then
Console.WriteLine(parts(1))
Else
Console.WriteLine("Not enough parts found.")
End If
Outputs:
hotchkis_type1a
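If you would rather not build a separator array, the Split function from the Microsoft.VisualBasic namespace (Strings.Split) also treats a whole string as the delimiter. A minimal sketch follows; the helper name SecondPart is hypothetical. On .NET Core 2.0 and later, String.Split also has an overload that takes a single String separator.

```vbnet
Imports System
Imports Microsoft.VisualBasic

Module SplitOnStringDemo
    ' Hypothetical helper: returns the second "__"-separated part, or Nothing.
    Public Function SecondPart(ByVal s As String) As String
        ' Strings.Split treats the whole delimiter string as one separator,
        ' unlike String.Split(Char()), which splits on individual characters.
        Dim parts() As String = Strings.Split(s, "__")
        If parts.Length >= 2 Then Return parts(1)
        Return Nothing
    End Function

    Sub Main()
        Console.WriteLine(SecondPart("sub_fa__hotchkis_type1a__180310__PUO4x4__180813")) ' prints hotchkis_type1a
    End Sub
End Module
```

Like StringSplitOptions.None, Strings.Split keeps empty entries between consecutive delimiters.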

VB2010: String length increases when appending a "" Char

I'm comparing strings in Visual Basic 2010 Express. When concatenating strings, I sometimes append a Char set to "", which I hoped would add nothing.
Example:
Dim text as String = "test"
Dim sign as Char = ""
text = text + sign
While debugging, the new text is shown as "test", but its Length is 5.
This is a problem when I try to compare it with another string:
Dim bigtext as String = "test1234"
Dim text as String = "test"
Dim sign as Char = ""
text = text + sign
bigtext.indexOf(text) 'should be 0 (index), but is -1 (not found)
Any idea how to filter out this "" character, or any other workaround?
Edit: my workaround for now:
I now append "§" everywhere instead of "", and whenever I need IndexOf() for a comparison, I first call Replace("§", "") to delete it.
As far as I can see, a Char variable always holds exactly one character (which can be the null character, ChrW(0)). Concatenating it to another string appends that character, which is why Length is 5 even though the debugger shows "test".
I see two workarounds:
Use a String for sign instead of a Char. The string could be empty or have a single character in it.
Trim the undesired character from the resulting string:
text = (text & sign).Trim(ControlChars.NullChar)
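The two workarounds can be checked in a small console sketch. It assumes, as the answer suggests, that the question's "" Char ends up holding the null character, ChrW(0). The module and function names are hypothetical.

```vbnet
Imports System
Imports Microsoft.VisualBasic

Module NullCharDemo
    ' Workaround 2: append the Char, then strip null characters (ChrW(0))
    ' from the end of the result.
    Public Function AppendAndTrim(ByVal text As String, ByVal sign As Char) As String
        Return (text & sign).TrimEnd(ControlChars.NullChar)
    End Function

    Sub Main()
        ' Assumption: this is what the question's "" Char ends up holding.
        Dim sign As Char = ControlChars.NullChar
        Console.WriteLine(("test" & sign).Length)             ' 5: the null char is kept
        Console.WriteLine(AppendAndTrim("test", sign).Length) ' 4

        ' Workaround 1: an empty String appends nothing.
        Dim signText As String = ""
        Console.WriteLine(("test" & signText).Length)         ' 4
        Console.WriteLine("test1234".IndexOf(AppendAndTrim("test", sign))) ' 0
    End Sub
End Module
```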

Changing Unicode character to string

I have the following string:
Dim text As String = "user̲upload"
I want to replace the Unicode character #0332 (U+0332, COMBINING LOW LINE) with an underscore. I have the following code for this:
Dim test As New Text.StringBuilder
test.Append(text.Replace("#0332", "_"))
Dim normalizedUrl As String = test.ToString()
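However, it does not work: my "test" string has the same value as the "text" variable. Does anyone have an idea what is wrong?
You can search for the character itself rather than the literal text "#0332", which never occurs in the string. Build the character from its code point with ChrW and assign the result, because Replace returns a new string:
text = text.Replace(ChrW(&H332), "_")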
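A minimal sketch of the fix, using the hypothetical helper name ReplaceLowLine. The input is also built with ChrW, so the example does not depend on how the source file stores the combining character.

```vbnet
Imports System
Imports Microsoft.VisualBasic

Module CombiningLowLine
    ' Replaces U+0332 COMBINING LOW LINE with a plain underscore.
    ' ChrW(&H332) builds the character from its code point; String.Replace
    ' returns a new string, so the caller must use or assign the result.
    Public Function ReplaceLowLine(ByVal text As String) As String
        Return text.Replace(ChrW(&H332), "_"c)
    End Function

    Sub Main()
        Dim text As String = "user" & ChrW(&H332) & "upload"
        text = ReplaceLowLine(text)
        Console.WriteLine(text) ' prints user_upload
    End Sub
End Module
```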

Mixed Encoding to String

I have a string in VB.net that may contain something like the following:
This is a 0x000020AC symbol
This is the UTF-32 encoding of the euro symbol, according to this article: http://www.fileformat.info/info/unicode/char/20ac/index.htm
I'd like to convert this into
This is a € symbol
I've tried using the UnicodeEncoding class in VB.NET (.NET Framework 2.0, as I'm modifying a legacy application).
When I use this class to encode and then decode, I still get back the original string.
I expected UnicodeEncoding to recognise the already-encoded part and not encode it again, but that doesn't appear to be the case.
I'm now a little lost as to how to convert a string with mixed encoding into a normal string.
Background: when saving an Excel spreadsheet as CSV, anything outside the ASCII range gets converted to "?". My idea is to get my client to search and replace a few characters, such as the euro symbol, with an encoded string such as 0x000020AC, and then convert those encoded parts back into the real symbols before inserting them into a SQL database.
I've tried a function like this:
Public Function Decode(ByVal s As String) As String
Dim uni As New UnicodeEncoding()
Dim encodedBytes As Byte() = uni.GetBytes(s)
Dim output As String = ""
output = uni.GetString(encodedBytes)
Return output
End Function
It was based on the examples on MSDN at http://msdn.microsoft.com/en-us/library/system.text.unicodeencoding.aspx
I may have completely misunderstood how this works in VB.NET. In C# I can simply use escape sequences such as "\u20AC", but VB.NET string literals have no such escapes.
Based on advice from Heinzi, I implemented a Regex.Replace approach with the following code, which appears to work for my examples:
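' At the top of the file:
Imports System.Text
Imports System.Text.RegularExpressions
Public Function Decode(ByVal s As String) As String
Dim output As String = ""
' "0x" followed by exactly eight hexadecimal digits
Dim sRegex As String = "0x[0-9a-fA-F]{8}"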
Dim r As Regex = New Regex(sRegex)
Dim myEvaluator As MatchEvaluator = New MatchEvaluator(AddressOf HexToString)
output = r.Replace(s, myEvaluator)
Return output
End Function
Public Function HexToString(ByVal hexString As Match) As String
Dim uni As New UnicodeEncoding(True, True)
Dim input As String = hexString.ToString
input = input.Substring(2)
input = input.TrimStart("0"c)
Dim output As String
Dim length As Integer = input.Length
Dim upperBound As Integer = length \ 2
If length Mod 2 = 0 Then
upperBound -= 1
Else
input = "0" & input
End If
Dim bytes(upperBound) As Byte
For i As Integer = 0 To upperBound
bytes(i) = Convert.ToByte(input.Substring(i * 2, 2), 16)
Next
output = uni.GetString(bytes)
Return output
End Function
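A shorter variant of the same Regex.Replace idea, offered as a sketch rather than as Heinzi's or the asker's exact method: it parses the eight hex digits as a code point with Convert.ToInt32 and turns it into text with Char.ConvertFromUtf32 (available since .NET Framework 2.0). It uses AddressOf rather than a lambda to stay compatible with older compilers. The module and function names are hypothetical.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module HexCodePointDecoder
    ' Matches "0x" followed by exactly eight hex digits, e.g. 0x000020AC,
    ' and captures the digits in group 1.
    Private ReadOnly CodePointPattern As New Regex("0x([0-9A-Fa-f]{8})")

    Public Function Decode(ByVal s As String) As String
        Return CodePointPattern.Replace(s, New MatchEvaluator(AddressOf CodePointToString))
    End Function

    Private Function CodePointToString(ByVal m As Match) As String
        ' Interpret the captured digits as a Unicode code point.
        Dim codePoint As Integer = Convert.ToInt32(m.Groups(1).Value, 16)
        ' Returns one Char up to U+FFFF and a surrogate pair above that;
        ' throws ArgumentOutOfRangeException for surrogates or values above U+10FFFF.
        Return Char.ConvertFromUtf32(codePoint)
    End Function

    Sub Main()
        ' The output contains the euro sign in place of 0x000020AC.
        Console.WriteLine(Decode("This is a 0x000020AC symbol"))
    End Sub
End Module
```

Because ConvertFromUtf32 works from the code point rather than from raw bytes, it also handles values such as 0x0001F600, which the byte-based UnicodeEncoding version does not decode correctly.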
Have you tried:
Public Function Decode(ByVal Coded As String) As String
Return StrConv(Coded, vbUnicode)
End Function
Note that vbUnicode is a VB6 constant; the VbStrConv enumeration in VB.NET has no Unicode member, so this does not compile in VB.NET.
Also, your original function doesn't do anything useful: it encodes s to bytes and decodes those same bytes with the same UnicodeEncoding, so it always returns s unchanged.

Remove special characters from a string

These are valid characters:
a-z
A-Z
0-9
-
/
How do I remove all other characters from my string?
Dim cleanString As String = Regex.Replace(yourString, "[^A-Za-z0-9\-/]", "")
Use either a regex or Char class methods such as IsControl() and IsDigit(). A list of these methods is available here: http://msdn.microsoft.com/en-us/library/system.char_members.aspx
Here's a sample regex:
(Import this namespace before using Regex:)
Imports System.Text.RegularExpressions
In your function, write this:
Return Regex.Replace(strIn, "[^\w\\-]", "")
This statement removes any character that is not a word character, a backslash (\) or a hyphen (-). For example, aa-b#c becomes aa-bc. Note that \w also matches the underscore and non-ASCII letters and digits, and this pattern does not keep "/", so it is looser than the character set in the question.
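The Char-method route mentioned in the answer above can be sketched as a whitelist filter. Explicit character ranges are used instead of Char.IsLetterOrDigit, because that method also accepts non-ASCII letters and digits. The comparisons assume the default Option Compare Binary, and the module and function names are hypothetical.

```vbnet
Imports System
Imports System.Text
Imports Microsoft.VisualBasic

Module CharWhitelist
    ' True for the characters the question allows: a-z, A-Z, 0-9, "-" and "/".
    ' Char comparisons here assume the default Option Compare Binary.
    Public Function IsAllowed(ByVal c As Char) As Boolean
        Return (c >= "a"c AndAlso c <= "z"c) OrElse _
               (c >= "A"c AndAlso c <= "Z"c) OrElse _
               (c >= "0"c AndAlso c <= "9"c) OrElse _
               c = "-"c OrElse c = "/"c
    End Function

    ' Copies only the allowed characters into a new string.
    Public Function KeepAllowed(ByVal input As String) As String
        Dim sb As New StringBuilder(input.Length)
        For Each c As Char In input
            If IsAllowed(c) Then sb.Append(c)
        Next
        Return sb.ToString()
    End Function

    Sub Main()
        ' ChrW(&HE9) is "é", which Char.IsLetter would accept but this filter drops.
        Console.WriteLine(KeepAllowed("a/b-c#d " & ChrW(&HE9) & "_1")) ' prints a/b-cd1
    End Sub
End Module
```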
Dim txt As String = "Some text#with$special/chars-here" ' sample input
txt = Regex.Replace(txt, "[^a-zA-Z 0-9-/-]", "")
Function RemoveCharacter(ByVal stringToCleanUp As String) As String
' Blacklist approach: remove each listed character; Chr(34) is the double quote.
' "-" and "/" are not listed because the question treats them as valid.
Dim charactersToRemove As String = Chr(34) & "#$%&'()*+,.\~"
For Each c As Char In charactersToRemove
stringToCleanUp = stringToCleanUp.Replace(c.ToString(), "")
Next
Return stringToCleanUp
End Function
I used the first solution, from LukeH, but then realized that it also removes the dot before the file extension, so I modified the code slightly:
Dim fileNameNoExtension As String = Path.GetFileNameWithoutExtension(fileNameWithExtension)
Dim cleanFileName As String = Regex.Replace(fileNameNoExtension, "[^A-Za-z0-9\-/]", "") & Path.GetExtension(fileNameWithExtension)
cleanFileName will contain the file name without special characters, followed by its original extension.