Detecting Unrecognized (codepage-unicode) Characters in a string

How do you detect an unrecognized code page character in a string in VB.NET? These characters usually show up as a default character such as "?" or a square when the current code page cannot represent the original character from some other source.
I have text fields from an external source that display the "square" character in place of some long dash character (not Chr(150)). I want to replace it with character code 45 (hyphen) to make the text compatible, but I can't work out how to test for the default "unrecognized" character in a Replace call. I searched the net but couldn't find a solution. I experimented with System.Text.Encoding but still can't get what I want. Any idea how to do this?
Thanks!

I see this question was asked quite a while ago, so I figure you have found the answer by now. At any rate, this is what I'm doing at the moment: I list the specific characters I want to replace in one array and put their replacements in a second array. I hope this works for you.
Private Function CleanText(TextToClean As String) As String
    Dim CleanedText As String = TextToClean
    ' An upper bound of 4 gives five elements (indexes 0 to 4).
    Dim BadText(4) As Char
    Dim GoodText(4) As String
    BadText(0) = ChrW(169) ' © (alt 0169, copyright)
    BadText(1) = ChrW(174) ' ® (alt 0174, registered trademark)
    BadText(2) = ChrW(8482) ' ™ (alt 0153, trademark)
    BadText(3) = ChrW(8364) ' € (alt 0128, Euro)
    BadText(4) = ChrW(176) ' ° (alt 0176, degrees)
    GoodText(0) = "(c)"
    GoodText(1) = "(r)"
    GoodText(2) = "(tm)"
    GoodText(3) = "(euro)"
    GoodText(4) = "o"
    ' Replace each unwanted character with its plain-text substitute.
    For i As Integer = 0 To BadText.GetUpperBound(0)
        CleanedText = CleanedText.Replace(BadText(i), GoodText(i))
    Next
    Return CleanedText
End Function
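When the offending character is not known in advance, it may help to list the code points of everything outside the ASCII range first and only then decide what to replace. The following is a minimal sketch along those lines. The helper names FindNonAscii and NormalizeDashes are hypothetical, and the list of dash code points is an assumption about which characters the external source might send. If the source already substituted "?" or a placeholder while decoding, the original character cannot be recovered this way.

```vbnet
Imports System
Imports System.Collections.Generic
Imports Microsoft.VisualBasic

Module DashCleanup
    ' Hypothetical helper: returns "U+XXXX" for every character above the ASCII range.
    Function FindNonAscii(text As String) As List(Of String)
        Dim found As New List(Of String)
        For Each c As Char In text
            If AscW(c) > 127 Then
                found.Add("U+" & AscW(c).ToString("X4"))
            End If
        Next
        Return found
    End Function

    ' Hypothetical helper: maps some common Unicode dashes to the ASCII hyphen (code 45).
    ' The code point list is an assumption, not an exhaustive set.
    Function NormalizeDashes(text As String) As String
        Dim dashes As Char() = {ChrW(&H2010), ChrW(&H2011), ChrW(&H2012),
                                ChrW(&H2013), ChrW(&H2014), ChrW(&H2015), ChrW(&H2212)}
        For Each d As Char In dashes
            text = text.Replace(d, "-"c)
        Next
        Return text
    End Function

    Sub Main()
        ' Sample text containing an en dash (U+2013).
        Dim sample As String = "pages 10" & ChrW(&H2013) & "20"
        Console.WriteLine(String.Join(", ", FindNonAscii(sample))) ' U+2013
        Console.WriteLine(NormalizeDashes(sample))                 ' pages 10-20
    End Sub
End Module
```

Running FindNonAscii on a real field from the external source should reveal which code point is being rendered as a square, and that code point can then be added to the replacement list.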

Related

how to parse string containing Unicode ID's as well as plain text for display in datagrid view [duplicate]

This question already has answers here:
How do I convert Unicode escape sequences to Unicode characters in a .NET string?
(5 answers)
Closed 3 years ago.
I am trying to parse a string (returned by a web server) that contains non-standard (as far as I can tell) Unicode IDs such as "\Ud83c" or "\U293c", as well as plain text. I need to display this string, emojis intact, to the user in a DataGridView.
By the way, I am blind, so please excuse any formatting errors :(
Full example of what my code is parsing: "Castle: \Ud83d\Udc40Jerusal\U00e9m.Miles"
The code I wrote, which is failing miserably:
Public Function ParseUnicodeId(LNKText As String) As String
    Dim workingarray() As String
    Dim CurString As String
    Dim finalString As String
    finalString = ""
    ' split at \ char
    workingarray = Split(LNKText, chr(92))
    For Each CurString In workingarray
        If CurString <> "" Then
            ' remove leading U so number can be converted to hex
            CurString = Right(CurString, Len(CurString) - 1)
            ' attempt to cut off rightmost chars until number can be converted to text, as there is nothing separating end of Unicode chars and start of plain text
            Do While IsNumeric(CurString) = False
                If CurString = "" Then
                    Exit Do
                End If
                CurString = Left(CurString, Len(CurString) - 1)
            Loop
            If CurString.StartsWith("U", StringComparison.InvariantCultureIgnoreCase) Then
                CurString = CurString.Substring(1)
            End If
            ' convert result from above to hex
            Dim numeric = Int32.Parse(CurString, NumberStyles.HexNumber)
            ' convert to bytes
            Dim bytes = BitConverter.GetBytes(numeric)
            ' convert resulting bytes to a real char for display
            finalString = finalString & Encoding.Unicode.GetString(bytes)
        End If
    Next
    ParseUnicodeId = finalString
End Function
I tried to do this all kinds of ways but can't seem to get it right. My code currently returns empty strings, although my guess is that this is caused by some of the more recent changes I made to cut off the leading U or to chop off one char at a time. If I take those bits out and just pass it something like "Ud83c", it works perfectly; it's only when plain text is mixed in that it fails, and I can't seem to come up with a way to separate the two and recombine them at the end.

You can use Regex.Unescape() to convert the unicode escaped char (\uXXXX) to a string.
If you receive \U instead of \u, you also need to perform that substitution, since \U is not recognized as a valid escape sequence.
Dim input As String = "Castle: \Ud83d\Udc40Jerusal\U00e9m.Miles"
Dim result As String = Regex.Unescape(input.Replace("\U", "\u"))
This prints (it may depend on the Font used):
Castle: 👀Jerusalém.Miles
As a note, you might also have used the wrong encoding when you decoded the input stream.
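Regex.Unescape also converts every other escape sequence it recognizes (for example \n), which may not be wanted if the input can contain other backslashes. As an alternative, here is a minimal sketch that touches only \u or \U followed by exactly four hex digits. The helper name DecodeUnicodeEscapes is hypothetical, and the four-digit assumption is taken from the sample string in the question.

```vbnet
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module EscapeDecoder
    ' Hypothetical helper: replaces \uXXXX or \UXXXX (exactly four hex digits)
    ' with the UTF-16 code unit it names. Surrogate halves end up side by side,
    ' so a pair such as \Ud83d\Udc40 becomes one emoji.
    Function DecodeUnicodeEscapes(input As String) As String
        Return Regex.Replace(input, "\\[uU]([0-9A-Fa-f]{4})",
            Function(m As Match) ChrW(Convert.ToInt32(m.Groups(1).Value, 16)).ToString())
    End Function

    Sub Main()
        Dim decoded As String = DecodeUnicodeEscapes("Castle: \Ud83d\Udc40Jerusal\U00e9m.Miles")
        Console.WriteLine(decoded)
    End Sub
End Module
```

Plain text that follows an escape, such as "Jerusal" or "m.Miles", is left alone because the pattern consumes only the four hex digits.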

VB.NET Get text in between multiple Quotations

I need some help.
I need to put each quoted value from a text file into its own text box (TextBox1, TextBox2, TextBox3), but I can only get one value.
At the moment I only get the first quoted value, in TextBox1.
Text file (2.txt):
C:\contexture\img2itp.exe "\mynetwork\1.png" "\mynetwork\2.png" "148"
VB code:
Using sr As New StreamReader("C:\test\2.txt")
    Dim line As String
    ' Read the whole stream into a string.
    line = sr.ReadToEnd()
    Dim s As String = line
    ' Find the first pair of quotes and take the text between them.
    Dim i As Integer = s.IndexOf("""")
    Dim f As String = s.Substring(i + 1, s.IndexOf("""", i + 1) - i - 1)
    TextBox1.Text = f
End Using
Thanks for your help :)

Regex Match
What Is Regex :
A regular expression is a pattern that the regular expression engine attempts to match in input text. A pattern consists of one or more character literals, operators, or constructs.
Source: https://msdn.microsoft.com/en-us/library/az24scfc(v=vs.110).aspx
Two regular expressions could solve this problem: a simple version and a more complex one with additional capture groups.
Simple one :
"(.*?)"
Here is the explanation: https://regex101.com/r/mzSmH5/1
Complex One :
(["'])(?:(?=(\\?))\2.)*?\1
Here is an explanation: https://regex101.com/r/7ZMVsB/1
VB.NET Implementation
This would be a job for Regex.Matches, which works like this:
Dim value As String = IO.File.ReadAllText("C:\test\2.txt")
Dim matches As MatchCollection = Regex.Matches(value, """(.*?)""")
' Loop over matches.
For Each m As Match In matches
    ' Loop over captures. Each capture is the whole match, including the quotes;
    ' m.Groups(1).Value holds only the text between them.
    For Each c As Capture In m.Captures
        ' Display.
        Console.WriteLine("Index={0}, Value={1}", c.Index, c.Value)
    Next
Next
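To get the values without the surrounding quotes, one option is to read capture group 1 of each match. This is a minimal sketch using the simple pattern and the sample line from the question. The helper name ExtractQuoted is hypothetical.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module QuotedValues
    ' Hypothetical helper: returns the text inside each pair of double quotes.
    Function ExtractQuoted(line As String) As List(Of String)
        Dim values As New List(Of String)
        For Each m As Match In Regex.Matches(line, """(.*?)""")
            ' Group 1 is the part between the quotes.
            values.Add(m.Groups(1).Value)
        Next
        Return values
    End Function

    Sub Main()
        Dim line As String = "C:\contexture\img2itp.exe ""\mynetwork\1.png"" ""\mynetwork\2.png"" ""148"""
        Dim values As List(Of String) = ExtractQuoted(line)
        For i As Integer = 0 To values.Count - 1
            Console.WriteLine("{0}: {1}", i, values(i))
        Next
    End Sub
End Module
```

In the form from the question, values(0), values(1), and values(2) could then be assigned to TextBox1.Text, TextBox2.Text, and TextBox3.Text, after checking that values.Count is at least 3.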

VB2010 String adds up Length by adding ""

I'm comparing strings in Visual Basic 2010 Express. When I build a string, I sometimes append a Char set to "", which I expected to add nothing.
Example:
Dim text as String = "test"
Dim sign as Char = ""
text = text + sign
While debugging, the new text shows as "test", but its Length is 5.
This is a problem when I try to compare it with another string:
Dim bigtext as String = "test1234"
Dim text as String = "test"
Dim sign as Char = ""
text = text + sign
bigtext.indexOf(text) 'should be 0 (index), but is -1 (not found)
Any idea how to filter out the "", or any other workaround?
Edit - my workaround for now:
Now I add "§" everywhere instead of "", and when I need to use IndexOf() to compare something, I call Replace("§", "") first.
(Replace() deletes it.)

As far as I can see, a Char variable always has a character in it (which can be the null character). Concatenating it to another string will append that character to the existing string.
I see two workarounds:
Use a String for sign instead of a Char. The string could be empty or have a single character in it.
Trim the undesired character from the resulting string:
text = (text + sign).Trim(CChar(""))
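The difference between the two workarounds can be seen in a small sketch. It uses ControlChars.NullChar explicitly to stand in for the "empty" Char from the question; the variable names are made up for illustration.

```vbnet
Imports System
Imports Microsoft.VisualBasic

Module NullCharDemo
    Sub Main()
        Dim text As String = "test"
        ' A Char always holds exactly one character; the null character is its "empty" value.
        Dim signAsChar As Char = ControlChars.NullChar
        ' A String can really be empty.
        Dim signAsString As String = ""

        Console.WriteLine((text & signAsChar).Length)   ' 5: the null character is appended
        Console.WriteLine((text & signAsString).Length) ' 4: nothing is appended

        ' Second workaround: strip the null character again after concatenating.
        Dim trimmed As String = (text & signAsChar).TrimEnd(ControlChars.NullChar)
        Console.WriteLine(trimmed.Length)               ' 4
    End Sub
End Module
```

Using a String for sign avoids the problem at its source, so no cleanup is needed before comparing.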

Cannot trim closed bracket from vb.net string

I have an issue trimming a string in VB.NET.
Dim bgColor1 As String = (foundRows(count).Item(16).ToString())
'This returns Color [Indigo]. I need it to be just Indigo so VB.NET can read it.
'So I used this:
Dim MyChar() As Char = {"C", "o", "l", "r", "[", "]", " "}
Dim firstBgcolorbgColor1 As String = bgColor1.TrimStart(MyChar)
'But the ] is still in the string, so it looks like this: Indigo]
Any ideas on why I cannot trim the ]?

Update
I didn't see that the input was "Color [Indigo]". I would not recommend TrimStart() and TrimEnd().
You have a variety of options to choose from:
Imports System
Imports System.Text.RegularExpressions
Public Module Module1
    Public Sub Main()
        Dim Color As String = "Color [Indigo]"
        ' Substring() & IndexOf()
        Dim openBracket = Color.IndexOf("[") + 1
        Dim closeBracket = Color.IndexOf("]")
        Console.WriteLine(Color.Substring(openBracket, closeBracket - openBracket))
        ' Replace()
        Console.WriteLine(Color.Replace("Color [", String.Empty).Replace("]", String.Empty))
        ' Regex.Replace()
        Console.WriteLine(Regex.Replace(Color, "Color \[|\]", String.Empty))
        ' Regex.Match()
        Console.WriteLine(Regex.Match(Color, "\[(\w+)\]").Groups(1))
    End Sub
End Module
Results:
Indigo
Indigo
Indigo
Indigo

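Well, you are calling TrimStart(...), which, as the name implies, only trims the front of the string.
Did you mean to call Trim(MyChar) instead? Be aware that Trim(MyChar) also strips any trailing character that is in the set, so "Color [Indigo]" would lose its final "o" as well as the "]".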
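Because the Trim family works on a set of characters rather than on a fixed prefix or suffix, the result depends on which letters the color name happens to contain. A minimal sketch with the character set from the question ("Coral" is an extra input chosen here to show the effect):

```vbnet
Imports System

Module TrimDemo
    Sub Main()
        Dim trimChars As Char() = {"C"c, "o"c, "l"c, "r"c, "["c, "]"c, " "c}

        ' TrimStart only touches the front, so the closing bracket stays.
        Console.WriteLine("Color [Indigo]".TrimStart(trimChars)) ' Indigo]

        ' Trim also works on the end, but it keeps going while characters are in the set,
        ' so the final "o" is removed along with the bracket.
        Console.WriteLine("Color [Indigo]".Trim(trimChars))      ' Indig

        ' A name made mostly of characters from the set is almost entirely eaten.
        Console.WriteLine("Color [Coral]".Trim(trimChars))       ' a
    End Sub
End Module
```

This is why extracting the text between the brackets is usually more reliable than trimming for this input.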

You could use a Regex to do the job:
Dim colorRegex As New Regex("(?<=\[)\w+") 'Get the word following the bracket ([)
Dim firstBgcolorbgColor1 As String = colorRegex.Match(bgColor1).Value

Called without arguments, TrimStart, TrimEnd, and Trim remove white space from the beginning, the end, and both ends of a string respectively. Called with a character array, they remove those characters instead, stopping at the first character that is not in the array. You are using TrimStart, which only works on the beginning of the string, so the ] at the end is left alone. Because trimming works on a set of characters rather than on a fixed prefix or suffix, String.Remove or String.Replace is a safer way to take out the text you don't want.
Examples here: http://www.dotnetperls.com/remove-vbnet

How to update Format function from VB to VB.NET

I am trying to port a VB function to VB.NET, but I cannot get the function to work correctly and update properly.
rFormat = Format(Format(Value, fmt), String$(Len(fmt), "#"))
It seems like the problem lies with the String$() function parameter which is used to align decimal points of values. How would I be able to properly fix this, or is there another way to achieve this?
EDIT
The following is an example console application that shows the issues that I am having.
Imports Microsoft.VisualBasic
Module Module1
    Sub Main()
        Dim rFormat As String
        Dim fmt As String
        Dim value As Object
        fmt = "########.000"
        value = 12345.2451212
        'value = 12345
        '~~~~~~~~~~~~~~~~~~~~~
        'rFormat = Microsoft.VisualBasic.Format(Microsoft.VisualBasic.Format(value, fmt), "".PadLeft(fmt.Length, "#"c))
        'Console.WriteLine(rFormat) ' Not working: prints all "#" for any value
        'rFormat = Microsoft.VisualBasic.Format(Microsoft.VisualBasic.Format(value, fmt), "".PadLeft(fmt.Length))
        'Console.WriteLine(rFormat) ' Not working: prints nothing
        'rFormat = (String.Format(value, fmt)).PadLeft(Len(fmt))
        'Console.WriteLine(rFormat) ' Not working: prints 12345.2451212, should print 12345.245
        ' (works for integer values)
        rFormat = String.Format("{0," + fmt.Length.ToString + "}", String.Format(value, fmt))
        Console.WriteLine(rFormat) ' Not working: prints 12345.2451212, should print 12345.245
        ' (works for integer values)
    End Sub
End Module

All String$ does is repeat the character specified in the second parameter the number of times specified in the first parameter.
So if fmt is, for example "9999", then the String$ command will produce "####".
You can replace this with the String.PadLeft method and continue to use the VB Format function from the Microsoft.VisualBasic namespace:
rFormat = Microsoft.VisualBasic.Format(Microsoft.VisualBasic.Format(value, fmt), "".PadLeft(fmt.Length, "#"c))
EDIT:
Based on the edit in the question, the correct format logic should be:
rFormat = String.Format("{0:" & fmt & "}", value)
It is very helpful to review the String.Format documentation since it has a lot of examples and explanation.

It sounds like you're wanting to pad out your results so they are a fixed length. How about using the String.PadLeft Method or the String.PadLeft(int32,char) Method to Pad out rFormat.
Something like this for spaces:
rFormat = (String.Format(value, fmt)).PadLeft(Len(fmt))
Edit
Boy, is it hard to find VB6 documentation online. It appears that the # in a VB6 custom format has to do with string justification. Based on a forum posting and an SO answer, something like this is suggested:
rFormat = String.Format("{0," + fmt.Length.ToString + "}", String.Format(value, fmt))
This is using the Composite Formatting Alignment Component
Alignment Component
The optional alignment component is a signed integer indicating the preferred formatted field width. If the value of alignment is less than the length of the formatted string, alignment is ignored and the length of the formatted string is used as the field width. The formatted data in the field is right-aligned if alignment is positive and left-aligned if alignment is negative. If padding is necessary, white space is used. The comma is required if alignment is specified.
The main issue that I see in your updated example is that you are using an Object to store your Double. By changing the declaration of value to Double and changing the format call, I was able to get it to work.
Sub Main()
    Dim rFormat As String
    Dim fmt As String
    Dim value As Double
    fmt = "#######0.000"
    value = 12345.2451212
    rFormat = String.Format("{0," + fmt.Length.ToString + "}", value.ToString(fmt))
    Console.WriteLine(rFormat)
    Console.ReadLine()
End Sub
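The alignment component and the format string can also be combined in a single composite format item of the form {index,alignment:format}. The following is a minimal sketch of that idea. The helper name AlignedFormat is hypothetical, and CultureInfo.InvariantCulture is an assumption made here so the decimal separator is always a period. It will not work if fmt itself contains "{" or "}".

```vbnet
Imports System
Imports System.Globalization

Module FormatDemo
    ' Hypothetical helper: formats value with fmt and right-aligns the result
    ' in a field as wide as fmt, for example "{0,12:#######0.000}".
    Function AlignedFormat(value As Double, fmt As String) As String
        Dim composite As String = "{0," & fmt.Length.ToString() & ":" & fmt & "}"
        Return String.Format(CultureInfo.InvariantCulture, composite, value)
    End Function

    Sub Main()
        ' Brackets are printed only to make the padding visible.
        Console.WriteLine("[" & AlignedFormat(12345.2451212, "#######0.000") & "]") ' [   12345.245]
        Console.WriteLine("[" & AlignedFormat(12345, "#######0.000") & "]")         ' [   12345.000]
    End Sub
End Module
```

Since both values are padded to the same width and have the same number of decimal places, their decimal points line up when printed in a fixed-width font.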

In VB.NET, you can also do this:
Dim rFormat As String = String.Empty
Dim fmt As String = "########.000"
Dim value As Object = 12345.2451212
rFormat = (CDbl(value)).ToString(fmt)
