Get substring until first numeric character - vb.net

As the title says, I want to get a substring of a string that contains an address, and I would like to have only the street.
I can't simply keep the non-numeric characters, because then the box letter would remain.
I can't take the substring up to the first space either, because the street name can contain a space.
For example, 'developerstreet 123a' should give 'developerstreet'.
The 'a' is the box number of the house, which I'm not interested in.
How can I do this in VB.NET?

Parsing addresses is notoriously difficult, so I caution you to be very deliberate about the choices you make. I would strongly recommend reviewing the documentation provided by the postal service. If these are US addresses, you should start by looking at USPS Publication 28.
However, to answer your specific question, you can find the index of the first numeric character in a string by using the Char.IsDigit method. You may also want to take a look at the Char.IsNumber method, but that's probably more inclusive than what you really want. For instance, this will get the index of the first numeric character in the input string:
Dim index As Integer = -1
For i As Integer = 0 to input.Length - 1
If Char.IsDigit(input(i)) Then
index = i
Exit For
End If
Next
However, for complex string parsing, like this, I would suggest learning Regular Expressions. Getting the non-numeric portion at the beginning of a string becomes trivial with RegEx:
Dim m As Match = Regex.Match(input, "^\D+")
If m.Success Then
Dim nonNumericPart As String = m.Value
End If
Here is the meaning of the regular expression in the above example:
^ - The match must start at the beginning of the input string (or of a line, if RegexOptions.Multiline is set)
\D - Any non-digit character
+ - One or more times

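Try this:
Private Sub MyFormLoad(sender As Object, e As EventArgs) Handles Me.Load
Dim str As String = "developerstreet 123a"
Dim index As Integer = GetIndexOfNumber(str)
' -1 means no digit was found, so keep the whole string
Dim substr As String = If(index >= 0, str.Substring(0, index), str)
MsgBox(substr)
End Sub
' Returns the index of the first numeric character, or -1 if there is none
Public Function GetIndexOfNumber(ByVal str As String) As Integer
For n = 0 To str.Length - 1
If IsNumeric(str.Substring(n, 1)) Then
Return n
End If
Next
Return -1
End Function
The output will be: developerstreet (followed by the space that precedes the number; call TrimEnd() to drop it)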

text.Substring(0, text.IndexOfAny("0123456789".ToCharArray()))
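The one-liner above throws if the address contains no digit at all, because IndexOfAny then returns -1. A minimal sketch of a guarded version follows; the module and function names (StreetSketch, StreetPart) are made up for illustration, and trimming the trailing space is an assumption about what the asker wants.

```vbnet
Imports System

Module StreetSketch
    ' Hypothetical helper: returns the text before the first ASCII digit.
    Function StreetPart(address As String) As String
        Dim index As Integer = address.IndexOfAny("0123456789".ToCharArray())
        ' No digit found: return the whole (trimmed) input instead of throwing.
        If index < 0 Then Return address.Trim()
        ' Drop the space that separates the street name from the number.
        Return address.Substring(0, index).TrimEnd()
    End Function
End Module
```

With the example from the question, StreetPart("developerstreet 123a") should give "developerstreet".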

Related

Generate a Unique ID (GUID) That Is Only 25 Characters in Length

What is the best approach to generate unique IDs (no special characters) that are 25 characters in length? I was thinking of generating a GUID and taking a substring of that, but I don't know if that's the best idea for uniqueness.
This is for use on disconnected systems. Creating a primary key in a database will not work in my situation; I need to create a unique ID manually.
I tried this, but I am seeing duplicates in the output for some reason, so it doesn't seem very unique even in this simple test.
Sub Main()
Dim sb As StringBuilder = New StringBuilder
For i As Integer = 1 To 100000
Dim s As String = GenerateRandomString(25, True)
sb.AppendLine(s)
sb.AppendLine(Environment.NewLine)
Next
Console.WriteLine(sb.ToString)
Console.ReadLine()
End Sub
Public Function GenerateRandomString(ByRef len As Integer, ByRef upper As Boolean) As String
Dim rand As New Random()
Dim allowableChars() As Char = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLOMNOPQRSTUVWXYZ0123456789".ToCharArray()
Dim final As String = String.Empty
For i As Integer = 0 To len - 1
final += allowableChars(rand.Next(allowableChars.Length - 1))
Next
Return IIf(upper, final.ToUpper(), final)
End Function
You’re probably seeing duplicates because New Random() is seeded according to a system clock, which may not have changed by the next iteration.
Try a cryptographically secure RNG:
Public Function GenerateRandomString(ByVal len As Integer, ByVal upper As Boolean) As String
Const ALLOWABLE_ALL As String = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
Const ALLOWABLE_UPPERCASE As String = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
Dim allowable As String = If(upper, ALLOWABLE_UPPERCASE, ALLOWABLE_ALL)
Dim result(len - 1) As Char
Dim current As Integer = 0
Using r As New Security.Cryptography.RNGCryptoServiceProvider()
Do
' Fetch a fresh batch of random bytes
Dim buffer(255) As Byte
r.GetBytes(buffer)
For Each b As Byte In buffer
' Skip bytes that do not map to a character, so every character stays equally likely
If b < allowable.Length Then
result(current) = allowable(b)
current += 1
If current = len Then Return New String(result)
End If
Next
Loop
End Using
End Function
This is also “more random” than your implementation in that letters aren’t weighted twice as heavily if upper is True.
A GUID might be 32 digits, but only if expressed in hexadecimal. That means it will only use characters 0-9 and A-F. If your string can use the entire alphabet then you can express the same GUID in fewer characters, especially if your string can be case sensitive.
See http://en.wikipedia.org/wiki/Globally_unique_identifier#Text_encoding for an example of alternative encoding, or http://web.archive.org/web/20100408172352/http://prettycode.org/2009/11/12/short-guid/ for example code. EDIT: Or Hans's method above which is much better. If you want to encode a GUID with only A-Z, a-z and 0-9 characters then you will need to look up Base-62 encoding (as opposed to base-64) because you only have 62 characters to encode into.
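As a rough illustration of the Base-62 idea, here is a minimal sketch that treats the 16 GUID bytes as one unsigned integer and writes it in base 62. The module name, function name, and alphabet order are assumptions made for this example, not a standard encoding. A 128-bit value needs at most 22 base-62 digits, so the result fits within 25 characters.

```vbnet
Imports System
Imports System.Numerics
Imports System.Text

Module Base62Sketch
    ' Assumed alphabet order; any fixed ordering of the 62 characters would work.
    Private Const Alphabet As String = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

    ' Hypothetical helper: encodes a GUID as a fixed-width 22-character string.
    Function ToBase62(id As Guid) As String
        ' 17 bytes: the 16 GUID bytes plus a trailing zero so BigInteger reads them as unsigned.
        Dim bytes(16) As Byte
        id.ToByteArray().CopyTo(bytes, 0)
        Dim value As New BigInteger(bytes)
        Dim divisor As New BigInteger(62)
        Dim sb As New StringBuilder()
        Do
            Dim remainder As BigInteger = BigInteger.Zero
            value = BigInteger.DivRem(value, divisor, remainder)
            ' Prepend the digit so the most significant digit ends up first.
            sb.Insert(0, Alphabet(CType(remainder, Integer)))
        Loop While Not value.IsZero
        ' Pad so every ID has the same length.
        Return sb.ToString().PadLeft(22, "0"c)
    End Function
End Module
```

Calling ToBase62(Guid.NewGuid()) keeps all 128 bits of the GUID, unlike taking a substring of its hexadecimal form.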
Stop trying to re-invent the wheel and just use .NET's built in GUID generator:
System.Guid.NewGuid()
which will generate a properly randomly seeded GUID, then simply sub-string it to your limit. Even better if you grab the last 25 chars, instead of the first 25.
PS: I don't consider this a great idea in general, because it's the entire GUID that's considered unique, not part of it, but it should satisfy what you want.
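For reference, a minimal sketch of the substring approach described above; the names GuidSketch and ShortId are hypothetical. Note that the last 25 hex digits still include the GUID's fixed version and variant bits, so the result carries fewer than 100 random bits.

```vbnet
Imports System

Module GuidSketch
    ' Hypothetical helper: the last 25 hex digits of a new GUID.
    Function ShortId() As String
        ' "N" formats the GUID as 32 hex digits without hyphens or braces.
        Dim full As String = Guid.NewGuid().ToString("N")
        Return full.Substring(full.Length - 25)
    End Function
End Module
```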

Replacing nth occurrence of string

This should be fairly simple but I'm having one of those days. Can anyone advise me as to how to replace the first and third occurrence of a character within a string? I have looked at replace but that cannot work as the string could be of different lengths. All I want to do is replace the first and third occurrence.
There is an overload of the IndexOf method which takes a start position as a parameter. Using a loop you'll be able to find the positions of the first and third occurrences. Then you could use a combination of the Remove and Insert methods to do the replacements.
You could also use a StringBuilder to do the replacements. The StringBuilder has a Replace method for which you can specify a start index and a number of characters affected.
aspiringCoder,
Perhaps something like this might be useful to you (in line with what Meta-Knight was talking about <+1>)
Dim str As String = "this is a test this is a test this is a test"
Dim target As String = "test"
Dim first As Integer = -1
Dim third As Integer = -1
Dim count As Integer = 0
' Walk the occurrences using the IndexOf overload that takes a start index
Dim pos As Integer = str.IndexOf(target)
While pos >= 0
count += 1
If count = 1 Then
first = pos
ElseIf count = 3 Then
third = pos
Exit While
End If
pos = str.IndexOf(target, pos + target.Length)
End While
This only finds the two positions (-1 means there was no such occurrence), but it should at least get you started.
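To go from finding positions to actually replacing, one option is to rebuild the string with a StringBuilder while counting occurrences. The sketch below is one possible way to do it, not the approach of either answer; ReplaceOccurrences and its parameters are hypothetical names, and it assumes ordinal (case-sensitive) matching of non-overlapping occurrences.

```vbnet
Imports System
Imports System.Text

Module ReplaceNthSketch
    ' Hypothetical helper: replaces the occurrences of "target" whose 1-based
    ' position in the sequence of matches is listed in "ordinals".
    Function ReplaceOccurrences(source As String, target As String,
                                replacement As String,
                                ParamArray ordinals As Integer()) As String
        ' An empty target would match everywhere and never advance.
        If target.Length = 0 Then Return source
        Dim sb As New StringBuilder()
        Dim last As Integer = 0    ' end of the text copied so far
        Dim count As Integer = 0   ' occurrences seen so far
        Dim pos As Integer = source.IndexOf(target, StringComparison.Ordinal)
        While pos >= 0
            count += 1
            ' Copy the text between the previous match and this one.
            sb.Append(source, last, pos - last)
            ' Swap in the replacement only for the requested occurrences.
            sb.Append(If(Array.IndexOf(ordinals, count) >= 0, replacement, target))
            last = pos + target.Length
            pos = source.IndexOf(target, last, StringComparison.Ordinal)
        End While
        ' Copy whatever follows the final match.
        sb.Append(source, last, source.Length - last)
        Return sb.ToString()
    End Function
End Module
```

For example, ReplaceOccurrences("a-b-c-d", "-", "+", 1, 3) should give "a+b-c+d".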

splitting a string to access integer within it

I have a string "<PinX F='53mm'></PinX>". I want to access the 53 within the string, do some addition to it, and then put the answer back into that string. I've been thinking about this and wasn't sure whether it can be done with a regular expression or not. Can anybody help me out?
Thanks
Yes, you can use a regular expression. This will get the digits, parse them to a number, add one to it, and put it back in the string (that is, the result is actually a new string as strings are immutable).
string s = Regex.Replace(
input,
@"(\d+)",
m => (Int32.Parse(m.Groups[1].Value) + 1).ToString()
);
Take a look at the HTML Agility Pack.
A regular expression looks like a good fit for this particular problem:
\d+
Will match one or more digits.
Int32.Parse(Regex.Match("<PinX F='53mm'></PinX>", @"\d+").Value)
Will return 53.
In this single case, yes: match '(.*?)' and then access the first group. But if this is part of a larger XML document, regular expressions should not be used. You should use the XML parser built into .NET, find the attribute with XSD, and get the value.
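A minimal sketch of the parser-based route, assuming the fragment is well-formed XML and the number sits in an attribute named F as in the question. It uses LINQ to XML (XElement) as one of the parsers built into .NET; the names PinXSketch and AddToAttribute are hypothetical.

```vbnet
Imports System
Imports System.Text.RegularExpressions
Imports System.Xml.Linq

Module PinXSketch
    ' Hypothetical helper: adds "amount" to the number inside the F attribute.
    Function AddToAttribute(xml As String, amount As Integer) As String
        Dim element As XElement = XElement.Parse(xml)
        Dim attr As XAttribute = element.Attribute("F")
        ' Nothing to change if the attribute is missing.
        If attr Is Nothing Then Return xml
        ' Replace only the digits, so a unit suffix such as "mm" is kept.
        attr.Value = Regex.Replace(attr.Value, "\d+",
            Function(m) (Integer.Parse(m.Value) + amount).ToString())
        Return element.ToString()
    End Function
End Module
```

Calling AddToAttribute("<PinX F='53mm'></PinX>", 10) should set the attribute value to 63mm; the serializer may write the attribute with double quotes rather than the original single quotes.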
Alternatively, here's a small routine...
' Set testing string
Dim s As String = "<PinX F='53mm'></PinX>"
' find first occurrence of CHAR ( ' )
Dim a As Integer = s.IndexOf("'")
' find last occurrence of CHAR ( ' )
Dim b As Integer = s.LastIndexOf("'")
' get substring "53mm" (the text between the quotes)
Dim substring As String = s.Substring(a + 1, b - a - 1)
' collect the digits from the substring
Dim result As String = String.Empty
For Each c As Char In substring
If Char.IsDigit(c) Then
result &= c
End If
Next
Console.WriteLine(Int32.Parse(result))
Console.ReadLine()

How to strip a string of all alpha's?

Dim phoneNumber As String = "077 47578 587(num)"
How do I strip every character that isn't a number from the above string, so that only the digits are left, and then check that the result is 11 characters long?
dim number as string = Regex.Replace(phoneNumber,"[^0-9]","")
if number.length = 11 then
'valid number
else
'not valid
end if
You could loop over each character and check whether it is a digit. While looping, check that the number of accepted characters (digits) does not exceed 11.
Or
use a regex to remove all the alpha characters, but you will still have to count at the end.
Dim phoneNumber As String = "077 47578 587(num)"
Dim newPhoneNumber = String.Empty
For i = 0 To phoneNumber.Length - 1
If IsNumeric(phoneNumber(i)) Then
newPhoneNumber += phoneNumber(i)
End If
Next
Dim valid = newPhoneNumber.Length = 11
One possible solution is to treat the string as a character array, then retrieve only those characters with ASCII codes within the parameters you define.
Ascii codes can be found at a resource such as: http://www.bolen.net/html/misc/ASCII-codes.html
Alternatively, you could use a regular expression to retrieve only the characters you want. My regex isn't so hot, so I can't give an example :)
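The character-range idea can be sketched with LINQ as below. This is only an illustration: PhoneSketch and DigitsOnly are hypothetical names, and the range "0" to "9" is assumed to be the set of characters to keep.

```vbnet
Imports System
Imports System.Linq

Module PhoneSketch
    ' Hypothetical helper: keeps only the ASCII digits of the input.
    Function DigitsOnly(input As String) As String
        ' Comparing against "0"c and "9"c keeps exactly the ASCII codes 48 to 57.
        Return New String(input.Where(Function(c) c >= "0"c AndAlso c <= "9"c).ToArray())
    End Function
End Module
```

For the string in the question, DigitsOnly("077 47578 587(num)") should give "07747578587", whose Length is 11.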

Shortening a repeating sequence in a string

I have built a blog platform in VB.NET whose audience is very young and, for some reason, likes to express their commitment by repeating sequences of characters in their comments.
Examples:
Hi!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
<3<3<3<3<3<3<3<3<3<3<3<3<3<3<3<3<3<3<3<3<3<3<3<3<3<3<3
LOLOLOLOLOLOLOLOLOLOLOLOLLOLOLOLOLOLOLOLOLOLOLOLOL
..and so on.
I don't want to filter this out completely, however, I would like to shorten it down to a maximum of 5 repeating characters or sequences in a row.
I have no problem writing a function to handle a single repeating character. But what is the most effective way to filter out a repeating sequence as well?
This is what I used earlier for the single repeating characters
Private Shared Function RemoveSequence(ByVal str As String) As String
Dim sb As New System.Text.StringBuilder
sb.Capacity = str.Length
Dim c As Char
Dim prev As Char = String.Empty
Dim prevCount As Integer = 0
For i As Integer = 0 To str.Length - 1
c = str(i)
If c = prev Then
If prevCount < 10 Then
sb.Append(c)
End If
prevCount += 1
Else
sb.Append(c)
prevCount = 0
End If
prev = c
Next
Return sb.ToString
End Function
Any help would be greatly appreciated
You should be able to recursively use the 'Longest repeated substring problem' to solve this. On the first pass you will get two matching substrings, and you will need to check whether they are contiguous. Then repeat the step for one of the substrings. Stop the algorithm if the strings are not contiguous, or if the string size becomes less than a certain number of characters. Finally, you should be able to keep the last match and discard the rest. You will need to dig around for an implementation :(
Also have a look at this previously asked question: finding long repeated substrings in a massive string
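A different and simpler technique than the longest-repeated-substring approach is a regular expression with a backreference, which handles both single characters and short sequences. The sketch below is an assumption-laden illustration, not a tuned solution: ShortenRepeats and maxRepeats are hypothetical names, and the lazy group can be slow on very long comments, so it may be worth limiting input length first.

```vbnet
Imports System
Imports System.Linq
Imports System.Text.RegularExpressions

Module RepeatSketch
    ' Hypothetical helper: caps any unit repeated more than maxRepeats times in a row.
    Function ShortenRepeats(input As String, Optional maxRepeats As Integer = 5) As String
        ' (.+?) lazily captures the shortest repeating unit;
        ' \1{n,} then requires at least n further copies of that same unit.
        Dim pattern As String = "(.+?)\1{" & maxRepeats & ",}"
        ' Replace the whole run with exactly maxRepeats copies of the unit.
        Return Regex.Replace(input, pattern,
            Function(m) String.Concat(Enumerable.Repeat(m.Groups(1).Value, maxRepeats)))
    End Function
End Module
```

With the default of 5, a run such as "Hi!!!!!!!!!!" should come back as "Hi!!!!!", and a long run of "<3" should be cut to five copies.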