Split cells into 2 columns when any number - vba

I can see numerous posts around this topic but none that specifically solves the problem I have.
I have a string that has text and numbers. I need to split the string into 2 columns when it first sees a number.
Example:
Ballyvic Boru5/6
First Drift2/1
Sizing Cusimanoin15/2
Becomes:

You can use a simple formula to find the first number, along with LEFT and MID to split the string.
Part 1:
=LEFT(A1,MIN(FIND({1,2,3,4,5,6,7,8,9,0},A1&"1234567890"))-1)
Part 2:
=MID(A1,MIN(FIND({1,2,3,4,5,6,7,8,9,0},A1&"1234567890")),99)

Here's a regex method:
You must set a reference to Microsoft VBScript Regular Expressions x.x, where x.x is the highest version you have (mine is 5.5)
Option Explicit
Sub splitCells()
Dim RegEx As New RegExp, i As Long, tempStr As String
With RegEx
.Global = True
.IgnoreCase = True
.Pattern = "(([a-z]*\s?)*\D)(\d.*)"
End With
With ThisWorkbook.Worksheets(1)
For i = 1 To .Cells(.Rows.Count, 1).End(xlUp).Row
If RegEx.Test(.Cells(i, 1)) Then
tempStr = .Cells(i, 1)
.Cells(i, 1) = RegEx.Replace(tempStr, "$1")
.Cells(i, 2) = RegEx.Replace(tempStr, "$3")
End If
Next i
End With
End Sub
Breaking down the Regular Expression:
(([a-z]*\s?)*\D)(\d.*)
[a-z]* matches any character in the alphabet, with the * multiplier for unlimited occurances
\s? Matches any whitespace character, with the ? multiplier to match 0-1 occurances (meaning there may or may not be a white space
Both of the above is enclosed in a grouping (), followed by another * to match 0-unlimited occurances
\D This excludes all digits
The above is enclosed in a group with the first (([..])*\D)
We have our final group: (\d.*), which matches the first digit and everything else afterwards.

Here's a pair functions you can use on the worksheet (as opposed to having to run a VBA procedure to 'fix' the cells one time):
Public Function splitNum1(str As String) As String
Dim p
For p = 1 To Len(str)
If Mid(str, p, 1) Like "#" Then Exit For
Next
splitNum1 = Left(str, p - 1)
End Function
Public Function splitNum2(str As String) As String
splitNum2 = Right(str, Len(str) - Len(splitNum1(str)))
End Function
splitNum1 returns the string on the "left" side of the number.
splitNum2 returns the string beginning with the first nummber.

Related

Excel cell content validation with use of VBA code

I am looking for a solution to validate and highlight my cell in case false.
I tried the most promising solution: Regex. But still can not find the pattern I need.
My latest attempt was this pattern: "[A-Z-0-9_.]" This works only if the cell contains only a symbol and nothing else, if the symbol is part of a string it does not work.
Problem is that it does not catch cells that have an odd character in a string of text: Example C4UNIT| or B$GROUP.
Specification Cell can contain only capital characters and two allowed symbols Dash - and Underbar _
This is my complete code:
Function ValidateCellContent()
Sheets("MTO DATA").Select
Dim RangeToCheck As Range
Dim CellinRangeToCheck As Range
Dim CollNumberFirst As Integer
Dim CollNumberLast As Integer
Dim RowNumberFirst As Integer
Dim RowNumberLast As Integer
'--Start on Column "1" and Row "3"
CollNumberFirst = 1
RowNumberFirst = 3
'--Find last Column used on row "2" (Write OMI Headings)
CollNumberLast = Cells(2, Columns.count).End(xlToLeft).Column
RowNumberLast = Cells(Rows.count, 1).End(xlUp).Row
'--Set value of the used range of cell addresses like: "A3:K85"
Set RangeToCheck = Range(Chr(64 + CollNumberFirst) & RowNumberFirst & ":" & Chr(64 + CollNumberLast) & RowNumberLast)
Debug.Print "Cells used in active Range = " & (Chr(64 + CollNumberFirst) & RowNumberFirst & ":" & Chr(64 + CollNumberLast) & RowNumberLast)
For Each CellinRangeToCheck In RangeToCheck
Debug.Print "CellinRangeToCheck value = " & CellinRangeToCheck
If Len(CellinRangeToCheck.Text) > 0 Then
'--Non Printables (Space,Line Feed,Carriage Return)
If InStr(CellinRangeToCheck, " ") _
Or InStr(CellinRangeToCheck, Chr(10)) > 0 _
Or InStr(CellinRangeToCheck, Chr(13)) > 0 Then
CellinRangeToCheck.Font.Color = vbRed
CellinRangeToCheck.Font.Bold = True
'--Allowed Characters
ElseIf Not CellinRangeToCheck.Text Like "*[A-Z-0-9_.]*" Then
CellinRangeToCheck.Font.Color = vbRed
CellinRangeToCheck.Font.Bold = True
Else
CellinRangeToCheck.Font.Color = vbBlack
CellinRangeToCheck.Font.Bold = False
End If
End If
Next CellinRangeToCheck
End Function
Try this:
Option Explicit
Private Sub Worksheet_Change(ByVal Target As Range)
'we want only validate when cell content changed, if whole range is involved (i.e. more than 1 cell) then exit sub
If Target.Cells.Count > 1 Then Exit Sub
'if there is error in a cell, also color it red
If IsError(Target) Then
Target.Interior.ColorIndex = 3
Exit Sub
End If
'validate cell with our function, if cell content is valid, it'll return True
'if it i s not valid, then color cell red
If Not ValidateText(Target.Value) Then
Target.Interior.ColorIndex = 3
End If
End Sub
Function ValidateText(ByVal txt As String) As Boolean
Dim i As Long, char As String
'loop through all characters in string
For i = 1 To Len(txt)
char = Mid(txt, i, 1)
If Not ((Asc(char) >= 65 And Asc(char) <= 90) Or char = "-" Or char = "_") Then
'once we come upon invalid character, we can finish the function with False result
ValidateText = False
Exit Function
End If
Next
ValidateText = True
End Function
I've originally assumed you wanted to use RegEx to solve your problem. As per your comment you instead seem to be using the Like operator.
Like operator
While Like accepts character ranges that may resemble regular expressions, there are many differences and few similarities between the two:
Like uses ! to negate a character range instead of the ^ used in RegEx.
Like does not allow/know quantifiers after the closing bracket ] and thus always matches a single character per pair of brackets []. To match multiple characters you need to add multiple copies of your character range brackets.
Like does not understand advanced concepts like capturing groups or lookahead / lookbehind
probably more differences...
The unavailability of quantifiers leaves Like in a really bad spot for your problem. You always need to have one character range to compare to for each character in your cell's text. As such the only way I can see to make use of the Like operator would be as follows:
Private Function IsTextValid(ByVal stringToValidate As String) As Boolean
Dim CharValidationPattern As String
CharValidationPattern = "[A-Z0-9._-]"
Dim StringValidationPattern As String
StringValidationPattern = RepeatString(CharValidationPattern, Len(stringToValidate))
IsTextValid = stringToValidate Like StringValidationPattern
End Function
Private Function RepeatString(ByVal stringToRepeat As String, ByVal repetitions As Long) As String
Dim Result As String
Dim i As Long
For i = 1 To repetitions
Result = Result & stringToRepeat
Next i
RepeatString = Result
End Function
You can then pass the text you want to check to IsTextValid like that:
If IsTextValid("A.ASDZ-054_93") Then Debug.Print "Hurray, it's valid!"
As per your comment, a small Worksheet_Change event to place into the worksheet module of your respective worksheet. (You will also need to place the above two functions there. Alternatively you can make them public and place them in a standard module.):
Private Sub Worksheet_Change(ByVal Target As Range)
Dim ValidationRange As Range
Set ValidationRange = Me.Range("A2:D5")
Dim TargetCell As Range
For Each TargetCell In Target.Cells
' Only work on cells falling into the ValidationRange
If Not Intersect(TargetCell, ValidationRange) Is Nothing Then
If IsTextValid(TargetCell.Text) Then
TargetCell.Font.Color = vbBlack
TargetCell.Font.Bold = False
Else
TargetCell.Font.Color = vbRed
TargetCell.Font.Bold = True
End If
End If
Next TargetCell
End Sub
Regular Expressions
If you want to continue down the RegEx road, try this expression:
[^A-Z0-9_-]+
It will generate a match, whenever a passed-in string contains one or more characters you don't want. All cells with only valid characters should not return a match.
Explanation:
A-Z will match all capital letters,
0-9 will match all numbers,
_- will match underscore and dash symbols.
The preceding ^ will negate the whole character set, meaning the RegEx only matches characters not in the set.
The following + tells the RegEx engine to match one or more characters of the aforementioned set. You only want to match your input, if there is at least one illegal char in there. And if there are more than one, it should still match.
Once in place, adapting the system to changing requirements (different chars considered legal) is as easy as switching out a few characters between the [brackets].
See a live example online.

How to add decimal and remove text from alphanumerical string

I have a huge amount of data which is alphanumerical and I need to convert it to purely numerical. Which no text in the string.
Ex.
C0424.100 ---> 424.100 (or 0424.100)
There always is 3 places after the decimal. Any tips on how to go about this? I'm pretty new to VBA. So basically I need to remove all text and a decimal with three digits to the right of it.
This is well described in String functions and how to use them
However, this should get you started. I would handle the formatting in Excel afterwards, but this is the simple string to number conversion. If the strings are more complex, consider using the Search string function to find the numbers, then use Right, Left, Mid functions to trim the string. Lastly use the CDbl() function to convert the string to the double.
Macro code as follows:
Sub temp()
'
' temp Macro
Range("A2").Select
stringToConvert = Selection.Value
trimmedString = Right(stringToConvert, Len(stringToConvert) - 1)
numberToDisplay = CDbl(trimmedString)
Range("A3").Value = numberToDisplay
End Sub
Do you even need VBA? If your data always has just one leading alpha character then you can just use standard Excel functions. For an entry in A2 that you want to convert, place the following formula in a convenient cell (e.g. B2):
=VALUE(RIGHT(A2,LEN(A2)-1))
I got UDF options for you.
Option 1: If you want to remove all the alphas from the beginning of string:
Function RemoveFirstAlphas(txt As String) As String
Dim i As Long
For i = 1 To Len(txt)
Select Case Mid$(txt, i, 1)
Case "0" To "9": Exit For
Case Else: Mid$(txt, i, 1) = Chr(32)
End Select
Next
RemoveFirstAlphas = Trim(txt)
End Function
Option 2: If you want to remove all the alphas from entire string:
Function RemoveAllAlphas(txt As String) As String
Dim ObjRegex As Object
Set ObjRegex = CreateObject("vbscript.regexp")
With ObjRegex
.Global = True
.Pattern = "[a-zA-Z\s]+"
RemoveAllAlphas = .Replace(Replace(txt, "-", Chr(32)), vbNullString)
End With
End Function
No need for VBA. Something like:
=--MID(A1,MIN(FIND({0,1,2,3,4,5,6,7,8,9},A1&"0123456789")),99)
will return the string starting with the first digit, and convert it to a numeric value. You can then format it in the cell however you wish.
The above will work with any number of non-digit leading characters. If will only have a single non-digit character, then #Skippy answer is simpler
If you have to have a VBA routine, something like the following should work -- it will extract the first numeric substring in the string. It does not matter if there are non-digits before or after. And, if there are no digits, the function will return the #NUM! error
Option Explicit
Function ExtractNums(S As String) As Variant
Dim I As Long
For I = 1 To Len(S)
If IsNumeric(Mid(S, I, 1)) Then
ExtractNums = Val(Mid(S, I))
Exit Function
End If
Next I
ExtractNums = CVErr(xlErrNum)
End Function

Excel VBA Using wildcard to replace string within string

I have a difficult situation and so far no luck in finding a solution.
My VBA collects number figures like $80,000.50. and I'm trying to get VBA to remove the last period to make it look like $80,000.50 but without using right().
The problem is after the last period there are hidden spaces or characters which will be a whole lot of new issue to handle so I'm just looking for something like:
replace("$80,000.50.",".**.",".**")
Is this possible in VBA?
I cant leave a comment so....
what about InStrRev?
Private Sub this()
Dim this As String
this = "$80,000.50."
this = Left(this, InStrRev(this, ".") - 1)
Debug.Print ; this
End Sub
Mid + Find
You can use Mid and Find functions. Like so:
The Find will find the first dot . character. If all the values you are collecting are currency with 2 decimals, stored as text, this will work well.
The formula is: =MID(A2,1,FIND(".",A2)+2)
VBA solution
Function getStringToFirstOccurence(inputUser As String, FindWhat As String) As String
getStringToFirstOccurence = Mid(inputUser, 1, WorksheetFunction.Find(FindWhat, inputUser) + 2)
End Function
Other possible solutions, hints
Trim + Clear + Substitute(Char(160)): Chandoo -
Untrimmable Spaces – Excel Formula
Ultimately, you can implement Regular expressions into Excel UDF: VBScript’s Regular Expression Support
How about:
Sub dural()
Dim r As Range
For Each r In Selection
s = r.Text
l = Len(s)
For i = l To 1 Step -1
If Mid(s, i, 1) = "." Then
r.Value = Mid(s, 1, i - 1) & Mid(s, i + 1)
Exit For
End If
Next i
Next r
End Sub
This will remove the last period and leave all the other characters intact. Before:
and after:
EDIT#1:
This version does not require looping over the characters in the cell:
Sub qwerty()
Dim r As Range
For Each r In Selection
If InStr(r.Value, ".") > 0 Then r.Characters(InStrRev(r.Text, "."), 1).Delete
Next r
End Sub
Shortest Solution
Simply use the Val command. I assume this is meant to be a numerical figure anyway? Get rid of commas and the dollar sign, then convert to value, which will ignore the second point and any other trailing characters! Robustness not tested, but seems to work...
Dim myString as String
myString = "$80,000.50. junk characters "
' Remove commas and dollar signs, then convert to value.
Dim myVal as Double
myVal = Val(Replace(Replace(myString,"$",""),",",""))
' >> myVal = 80000.5
' If you're really set on getting a formatted string back, use Format:
myString = Format(myVal, "$000,000.00")
' >> myString = $80,000.50
From the Documentation,
The Val function stops reading the string at the first character it can't recognize as part of a number. Symbols and characters that are often considered parts of numeric values, such as dollar signs and commas, are not recognized.
This is why we must first remove the dollar sign, and why it ignores all the junk after the second dot, or for that matter anything non numerical at the end!
Working with Strings
Edit: I wrote this solution first but now think the above method is more comprehensive and shorter - left here for completeness.
Trim() removes whitespace at the end of a string. Then you could simply use Left() to get rid of the last point...
' String with trailing spaces and a final dot
Dim myString as String
myString = "$80,000.50. "
' Get rid of whitespace at end
myString = Trim(myString)
' Might as well check if there is a final dot before removing it
If Right(myString, 1) = "." Then
myString = Left(myString, Len(myString) - 1)
End If
' >> myString = "$80,000.50"

VBA check if a string follows pattern, space, two letters, six digits

I need to check if last 9 chars from cell follow a pattern.
The searched pattern is space two letters and six digits.
Cells contain some text then should have this pattern.
Usually searched cell content looks something like this:
"Tractor mowers PT009988"
Regards
Michał
This will test for this.
Public Function RegExTest(sCellContent As String) As String
Dim sContent As String, sMatch As Variant, i As Long
sContent = Right(sCellContent, 9)
With CreateObject("VBScript.RegExp")
.Global = True
.ignorecase = True
.Pattern = " [A-Za-z]{2}[0-9]{6}"
If .test(sContent) Then
Set sMatch = .Execute(sContent)
RegExTest = sMatch(i)
Exit Function
End If
End With
End Function
This is the pattern that needs to be matched:
" [A-Za-z]{2}[0-9]{6}"
1 Space, 2 letters (both upper case and lower case) and six digits.
If in range A1 is the value Tractor mowers PT009988 and you put this formula in B1 =RegExTest(A1) then the output in B1 will be PT009988.
If you don't care whether or not this is in the last 9 characters then change sContent = Right(sCellContent, 9) to sContent = sCellContent
Try this (if you want to include upper- and lower-case characters)
Dim c
For Each c In Selection
If c.Value Like "* [A-Z,a-z][A-Z,a-z]######" Then _
Debug.Print c.Value
Next c
https://msdn.microsoft.com/en-us/library/swf8kaxw.aspx

How to normalize filenames listed in a range

I have a list of filenames in a spreadsheet in the form of "Smith, J. 010112.pdf". However, they're in the varying formats of "010112.pdf", "01.01.12.pdf", and "1.01.2012.pdf". How could I change these to one format of "010112.pdf"?
Personally I hate using VBA where worksheet functions will work, so I've worked out a way to do this with worksheet functions. Although you could cram this all into one cell, I've broken it out into a lot of independent steps in separate columns so you can see how it's working, step by step.
For simplicity I'm assuming your file name is in A1
B1 =LEN(A1)
determine the length of the filename
C1 =SUBSTITUTE(A1," ","")
replace spaces with nothing
D1 =LEN(C1)
see how long the string is if you replace spaces with nothing
E1 =B1-D1
determine how many spaces there are
F1 =SUBSTITUTE(A1," ",CHAR(8),E1)
replace the last space with a special character that can't occur in a file name
G1 =SEARCH(CHAR(8), F1)
find the special character. Now we know where the last space is
H1 =LEFT(A1,G1-1)
peel off everything before the last space
I1 =MID(A1,G1+1,255)
peel off everything after the last space
J1 =FIND(".",I1)
find the first dot
K1 =FIND(".",I1,J1+1)
find the second dot
L1 =FIND(".",I1,K1+1)
find the third dot
M1 =MID(I1,1,J1-1)
find the first number
N1 =MID(I1,J1+1,K1-J1-1)
find the second number
O1 =MID(I1,K1+1,L1-K1-1)
find the third number
P1 =TEXT(M1,"00")
pad the first number
Q1 =TEXT(N1,"00")
pad the second number
R1 =TEXT(O1,"00")
pad the third number
S1 =IF(ISERR(K1),M1,P1&Q1&R1)
put the numbers together
T1 =H1&" "&S1&".pdf"
put it all together
It's kind of a mess because Excel hasn't added a single new string manipulation function in over 20 years, so things that should be easy (like "find last space") require severe trickery.
Here's a screenshot of a simple four-step method based on Excel commands and formulas, as suggested in a comment to the answered post (with a few changes)...
This function below works. I've assumed that the date is in ddmmyy format, but adjust as appropriate if it's mmddyy -- I can't tell from your example.
Function FormatThis(str As String) As String
Dim strDate As String
Dim iDateStart As Long
Dim iDateEnd As Long
Dim temp As Variant
' Pick out the date part
iDateStart = GetFirstNumPosition(str, False)
iDateEnd = GetFirstNumPosition(str, True)
strDate = Mid(str, iDateStart, iDateEnd - iDateStart + 1)
If InStr(strDate, ".") <> 0 Then
' Deal with the dot delimiters in the date
temp = Split(strDate, ".")
strDate = Format(DateSerial( _
CInt(temp(2)), CInt(temp(1)), CInt(temp(0))), "ddmmyy")
Else
' No dot delimiters... assume date is already formatted as ddmmyy
' Do nothing
End If
' Piece it together
FormatThis = Left(str, iDateStart - 1) _
& strDate & Right(str, Len(str) - iDateEnd)
End Function
This uses the following helper function:
Function GetFirstNumPosition(str As String, startFromRight As Boolean) As Long
Dim i As Long
Dim startIndex As Long
Dim endIndex As Long
Dim indexStep As Integer
If startFromRight Then
startIndex = Len(str)
endIndex = 1
indexStep = -1
Else
startIndex = 1
endIndex = Len(str)
indexStep = 1
End If
For i = startIndex To endIndex Step indexStep
If Mid(str, i, 1) Like "[0-9]" Then
GetFirstNumPosition = i
Exit For
End If
Next i
End Function
To test:
Sub tester()
MsgBox FormatThis("Smith, J. 01.03.12.pdf")
MsgBox FormatThis("Smith, J. 010312.pdf")
MsgBox FormatThis("Smith, J. 1.03.12.pdf")
MsgBox FormatThis("Smith, J. 1.3.12.pdf")
End Sub
They all return "Smith, J. 010312.pdf".
You don't need VBA. Start by replacing the "."s with nothing:
=SUBSTITUTE(A1,".","")
This will change the ".PDF" to "PDF", so let's put that back:
=SUBSTITUTE(SUBSTITUTE(A1,".",""),"pdf",".pdf")
Got awk? Get the data into a text file, and
awk -F'.' '{ if(/[0-9]+\.[0-9]+\.[0-9]+/) printf("%s., %02d%02d%02d.pdf\n", $1, $2, $3, length($4) > 2 ? substr($4,3,2) : $4); else print $0; }' your_text_file
Assuming the data are exactly as what you described, e.g.,
Smith, J. 010112.pdf
Mit, H. 01.02.12.pdf
Excel, M. 8.1.1989.pdf
Lec, X. 06.28.2012.pdf
DISCLAIMER:
As #Jean-FrançoisCorbett has mentioned, this does not work for "Smith, J. 1.01.12.pdf". Instead of reworking this completely, I'd recommend his solution!
Option Explicit
Function ExtractNumerals(Original As String) As String
'Pass everything up to and including ".pdf", then concatenate the result of this function with ".pdf".
'This will not return the ".pdf" if passed, which is generally not my ideal solution, but it's a simpler form that still should get the job done.
'If you have varying extensions, then look at the code of the test sub as a guide for how to compensate for the truncation this function creates.
Dim i As Integer
Dim bFoundFirstNum As Boolean
For i = 1 To Len(Original)
If IsNumeric(Mid(Original, i, 1)) Then
bFoundFirstNum = True
ExtractNumerals = ExtractNumerals & Mid(Original, i, 1)
ElseIf Not bFoundFirstNum Then
ExtractNumerals = ExtractNumerals & Mid(Original, i, 1)
End If
Next i
End Function
I used this as a testcase, which does not correctly cover all your examples:
Sub test()
MsgBox ExtractNumerals("Smith, J. 010112.pdf") & ".pdf"
End Sub