Extracting Date/Time from comment cell - vba

I have a comment field with cells containing text like this:
Cancelled by user at 2018-01-03 03:11:57 without charge
I want to get the date and time information, but it may not always sit between the 3rd and 4th spaces from the end, otherwise I might try some sort of complicated split of the cell. Is there an "in cell" way to extract the date-time information? Or will this need a VBA script? I prefer the former, but I'm trying to make a macro to simplify my life anyway, so VBA would work too.

I'd propose the following formula:
=MID(A1,FIND("at 20",A1)+3,19)
This would require that the date is always preceded by the word 'at' and the date string starts with 20.

You can try this function. It splits the string checking for items that have the first letter numeric, and builds a result string of just the date information.
Public Function ParseForDate(sCell As String) As String
Dim vSplit As Variant
Dim nIndex As Integer
Dim sResult As String
vSplit = Split(sCell, " ")
For nIndex = 0 To UBound(vSplit)
If IsNumeric(Left$(vSplit(nIndex), 1)) Then
sResult = sResult & vSplit(nIndex) & " "
End If
Next
ParseForDate = Trim$(sResult)
End Function
If you wanted to use it in a formula it would look something like this:
=ParseForDate(A1)
To use it in a VBA routine:
Dim s As String
s = ParseForDate(Range("A1").Value)

Non-VBA solution: (this is assuming the date format is always the same for all cells)
= MAX(IFERROR(DATEVALUE(MID(A1,ROW(INDEX($A:$A,1):INDEX($A:$A,LEN(A1)-19)),20)),0))
+MAX(IFERROR(TIMEVALUE(MID(A1,ROW(INDEX($A:$A,1):INDEX($A:$A,LEN(A1)-19)),20)),0))
Note this is an array formula, so you must press Ctrl+Shift+Enter instead of just Enter when typing this formula.
You will obviously then need to format the cell as a date and time, but this formula gets the numerical value that Excel uses for its internal date and time system.

Using a regex will enable you to fetch the date and time, irrespective of its placement in the string. The following solution will work if the date and time are of the same format as shown in the example string.
Code:
Sub getDateTime()
Dim objReg, matches, str
str = Sheet1.Cells(1, 1).Value 'Change this as per your requirements
Set objReg = CreateObject("vbscript.regexp")
objReg.Global = True
objReg.Pattern = "\d{4}(?:-\d{2}){2}\s*\d{2}(?::\d{2}){2}"
If objReg.test(str) Then
Set matches = objReg.Execute(str)
strResult = matches.Item(0)
MsgBox strResult
End If
End Sub
Regex Explanation:
\d{4} - matches 4 digits representing the year
(?:-\d{2}){2} - matches - followed by 2 digits. {2} in the end repeats this match 2 times. Once for getting MM and the next time for DD
\s* - matches 0+ whitespaces to match the space between the Date and Time
\d{2} - matches 2 digits representing the HH
(?::\d{2}){2} - matches : followed by 2 digits. The {2} in the end repeats this match 2 times. First time for matching the :MM and the next time for matching the :SS
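Since the question asked for something usable "in cell", the same pattern can also be wrapped in a user-defined function. This is only a sketch: ExtractDateTimeText is a made-up name, and it assumes the timestamp always looks like yyyy-mm-dd hh:mm:ss, as in the example string.

```vba
Option Explicit

' Returns the first "yyyy-mm-dd hh:mm:ss" substring found in txt,
' or an empty string when there is none.
' ExtractDateTimeText is a hypothetical name, not part of the answer above.
Public Function ExtractDateTimeText(ByVal txt As String) As String
    Dim re As Object
    Dim hits As Object

    ' Late binding, so no reference to the VBScript RegExp library is required
    Set re = CreateObject("VBScript.RegExp")
    re.Global = False   ' only the first match is needed
    re.Pattern = "\d{4}(?:-\d{2}){2}\s*\d{2}(?::\d{2}){2}"

    Set hits = re.Execute(txt)
    If hits.Count > 0 Then
        ExtractDateTimeText = hits(0).Value
    End If
End Function
```

Placed in a standard module, it could then be called from a cell as =ExtractDateTimeText(A1). The result is text, so it would still need converting if a real date value is wanted.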

This will be good for about 90 years (using cell C3 for example):
Sub GetDate()
Dim s As String
s = Range("C3").Comment.Text
arr = Split(s, " ")
For i = LBound(arr) To UBound(arr)
If Left(arr(i), 2) = "20" Then
msg = arr(i) & " " & arr(i + 1)
MsgBox msg
Exit Sub
End If
Next i
End Sub

Excel cell content validation with use of VBA code

I am looking for a solution to validate my cells and highlight any cell that fails validation.
I tried the most promising solution: Regex. But still can not find the pattern I need.
My latest attempt was this pattern: "[A-Z-0-9_.]" This works only if the cell contains only a symbol and nothing else, if the symbol is part of a string it does not work.
Problem is that it does not catch cells that have an odd character in a string of text: Example C4UNIT| or B$GROUP.
Specification: a cell can contain only capital letters and two allowed symbols: dash - and underscore _
This is my complete code:
Function ValidateCellContent()
Sheets("MTO DATA").Select
Dim RangeToCheck As Range
Dim CellinRangeToCheck As Range
Dim CollNumberFirst As Integer
Dim CollNumberLast As Integer
Dim RowNumberFirst As Integer
Dim RowNumberLast As Integer
'--Start on Column "1" and Row "3"
CollNumberFirst = 1
RowNumberFirst = 3
'--Find last Column used on row "2" (Write OMI Headings)
CollNumberLast = Cells(2, Columns.count).End(xlToLeft).Column
RowNumberLast = Cells(Rows.count, 1).End(xlUp).Row
'--Set value of the used range of cell addresses like: "A3:K85"
Set RangeToCheck = Range(Chr(64 + CollNumberFirst) & RowNumberFirst & ":" & Chr(64 + CollNumberLast) & RowNumberLast)
Debug.Print "Cells used in active Range = " & (Chr(64 + CollNumberFirst) & RowNumberFirst & ":" & Chr(64 + CollNumberLast) & RowNumberLast)
For Each CellinRangeToCheck In RangeToCheck
Debug.Print "CellinRangeToCheck value = " & CellinRangeToCheck
If Len(CellinRangeToCheck.Text) > 0 Then
'--Non Printables (Space,Line Feed,Carriage Return)
If InStr(CellinRangeToCheck, " ") _
Or InStr(CellinRangeToCheck, Chr(10)) > 0 _
Or InStr(CellinRangeToCheck, Chr(13)) > 0 Then
CellinRangeToCheck.Font.Color = vbRed
CellinRangeToCheck.Font.Bold = True
'--Allowed Characters
ElseIf Not CellinRangeToCheck.Text Like "*[A-Z-0-9_.]*" Then
CellinRangeToCheck.Font.Color = vbRed
CellinRangeToCheck.Font.Bold = True
Else
CellinRangeToCheck.Font.Color = vbBlack
CellinRangeToCheck.Font.Bold = False
End If
End If
Next CellinRangeToCheck
End Function
Try this:
Option Explicit
Private Sub Worksheet_Change(ByVal Target As Range)
'we want only validate when cell content changed, if whole range is involved (i.e. more than 1 cell) then exit sub
If Target.Cells.Count > 1 Then Exit Sub
'if there is error in a cell, also color it red
If IsError(Target) Then
Target.Interior.ColorIndex = 3
Exit Sub
End If
'validate cell with our function, if cell content is valid, it'll return True
'if it is not valid, then color cell red
If Not ValidateText(Target.Value) Then
Target.Interior.ColorIndex = 3
End If
End Sub
Function ValidateText(ByVal txt As String) As Boolean
Dim i As Long, char As String
'loop through all characters in string
For i = 1 To Len(txt)
char = Mid(txt, i, 1)
If Not ((Asc(char) >= 65 And Asc(char) <= 90) Or char = "-" Or char = "_") Then
'once we come upon invalid character, we can finish the function with False result
ValidateText = False
Exit Function
End If
Next
ValidateText = True
End Function
I've originally assumed you wanted to use RegEx to solve your problem. As per your comment you instead seem to be using the Like operator.
Like operator
While Like accepts character ranges that may resemble regular expressions, there are many differences and few similarities between the two:
Like uses ! to negate a character range instead of the ^ used in RegEx.
Like does not allow/know quantifiers after the closing bracket ] and thus always matches a single character per pair of brackets []. To match multiple characters you need to add multiple copies of your character range brackets.
Like does not understand advanced concepts like capturing groups or lookahead / lookbehind
probably more differences...
The unavailability of quantifiers leaves Like in a really bad spot for your problem. You always need to have one character range to compare to for each character in your cell's text. As such the only way I can see to make use of the Like operator would be as follows:
Private Function IsTextValid(ByVal stringToValidate As String) As Boolean
Dim CharValidationPattern As String
CharValidationPattern = "[A-Z0-9._-]"
Dim StringValidationPattern As String
StringValidationPattern = RepeatString(CharValidationPattern, Len(stringToValidate))
IsTextValid = stringToValidate Like StringValidationPattern
End Function
Private Function RepeatString(ByVal stringToRepeat As String, ByVal repetitions As Long) As String
Dim Result As String
Dim i As Long
For i = 1 To repetitions
Result = Result & stringToRepeat
Next i
RepeatString = Result
End Function
You can then pass the text you want to check to IsTextValid like that:
If IsTextValid("A.ASDZ-054_93") Then Debug.Print "Hurray, it's valid!"
As per your comment, a small Worksheet_Change event to place into the worksheet module of your respective worksheet. (You will also need to place the above two functions there. Alternatively you can make them public and place them in a standard module.):
Private Sub Worksheet_Change(ByVal Target As Range)
Dim ValidationRange As Range
Set ValidationRange = Me.Range("A2:D5")
Dim TargetCell As Range
For Each TargetCell In Target.Cells
' Only work on cells falling into the ValidationRange
If Not Intersect(TargetCell, ValidationRange) Is Nothing Then
If IsTextValid(TargetCell.Text) Then
TargetCell.Font.Color = vbBlack
TargetCell.Font.Bold = False
Else
TargetCell.Font.Color = vbRed
TargetCell.Font.Bold = True
End If
End If
Next TargetCell
End Sub
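A shorter Like-based check may also be possible by negating the character class: rather than building one [...] per character, test that no character outside the allowed set occurs anywhere in the text. This is a sketch only. IsTextValidNegated is a made-up name, it assumes the default Option Compare Binary (so that A-Z covers uppercase letters only), and it treats an empty string as valid.

```vba
Option Explicit

' Hypothetical alternative to IsTextValid, using a negated character class.
Public Function IsTextValidNegated(ByVal stringToValidate As String) As Boolean
    ' "*[!A-Z0-9._-]*" matches any string that contains at least one character
    ' outside the allowed set. The hyphen at the end of the list is literal.
    IsTextValidNegated = Not (stringToValidate Like "*[!A-Z0-9._-]*")
End Function
```

With this, IsTextValidNegated("A.ASDZ-054_93") should return True, while "C4UNIT|" and "B$GROUP" should return False.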
Regular Expressions
If you want to continue down the RegEx road, try this expression:
[^A-Z0-9_-]+
It will generate a match, whenever a passed-in string contains one or more characters you don't want. All cells with only valid characters should not return a match.
Explanation:
A-Z will match all capital letters,
0-9 will match all numbers,
_- will match underscore and dash symbols.
The preceding ^ will negate the whole character set, meaning the RegEx only matches characters not in the set.
The following + tells the RegEx engine to match one or more characters of the aforementioned set. You only want to match your input, if there is at least one illegal char in there. And if there are more than one, it should still match.
Once in place, adapting the system to changing requirements (different chars considered legal) is as easy as switching out a few characters between the [brackets].
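To show how the expression could be wired into VBA, here is a minimal sketch. HasOnlyAllowedChars is a made-up name, and late binding via CreateObject is an assumption made so that no library reference is needed.

```vba
Option Explicit

' Returns True when txt contains none of the illegal characters.
' HasOnlyAllowedChars is a hypothetical helper name.
Public Function HasOnlyAllowedChars(ByVal txt As String) As Boolean
    Dim re As Object
    Set re = CreateObject("VBScript.RegExp")

    re.Pattern = "[^A-Z0-9_-]+"
    re.IgnoreCase = False   ' lowercase letters stay illegal

    ' Test returns True when an illegal character is found, so invert it
    HasOnlyAllowedChars = Not re.Test(txt)
End Function
```

A cell could then be highlighted whenever HasOnlyAllowedChars(cell.Text) returns False.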

Excel VBA Using wildcard to replace string within string

I have a difficult situation and so far no luck in finding a solution.
My VBA collects number figures like $80,000.50. and I'm trying to get VBA to remove the last period to make it look like $80,000.50 but without using right().
The problem is after the last period there are hidden spaces or characters which will be a whole lot of new issue to handle so I'm just looking for something like:
replace("$80,000.50.",".**.",".**")
Is this possible in VBA?
I can't leave a comment, so...
what about InStrRev?
Private Sub RemoveLastPeriod()
Dim s As String
s = "$80,000.50."
'Keep everything before the last period; anything after it is dropped as well
s = Left(s, InStrRev(s, ".") - 1)
Debug.Print s
End Sub
Mid + Find
You can use Mid and Find functions. Like so:
The Find will find the first dot . character. If all the values you are collecting are currency with 2 decimals, stored as text, this will work well.
The formula is: =MID(A2,1,FIND(".",A2)+2)
VBA solution
Function getStringToFirstOccurence(inputUser As String, FindWhat As String) As String
getStringToFirstOccurence = Mid(inputUser, 1, WorksheetFunction.Find(FindWhat, inputUser) + 2)
End Function
Other possible solutions, hints
TRIM + CLEAN + SUBSTITUTE(CHAR(160)): see Chandoo, "Untrimmable Spaces – Excel Formula"
Ultimately, you can implement Regular expressions into Excel UDF: VBScript’s Regular Expression Support
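The TRIM + CLEAN + SUBSTITUTE hint can be sketched in VBA roughly as follows. CleanFigure is a made-up name, and the sketch assumes the hidden characters are non-breaking spaces (character 160) or non-printing control characters, which may not cover every case.

```vba
Option Explicit

' Removes hidden trailing characters, then drops one trailing period.
' CleanFigure is a hypothetical helper name.
Public Function CleanFigure(ByVal raw As String) As String
    Dim s As String

    s = Replace(raw, ChrW$(160), " ")            ' non-breaking spaces -> ordinary spaces
    s = Application.WorksheetFunction.Clean(s)   ' drop non-printing control characters
    s = Trim$(s)                                 ' drop leading and trailing spaces

    If Right$(s, 1) = "." Then s = Left$(s, Len(s) - 1)
    CleanFigure = s
End Function
```

For example, CleanFigure("$80,000.50." & ChrW$(160) & " ") should give $80,000.50.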
How about:
Sub dural()
Dim r As Range
For Each r In Selection
s = r.Text
l = Len(s)
For i = l To 1 Step -1
If Mid(s, i, 1) = "." Then
r.Value = Mid(s, 1, i - 1) & Mid(s, i + 1)
Exit For
End If
Next i
Next r
End Sub
This will remove the last period and leave all the other characters intact.
EDIT#1:
This version does not require looping over the characters in the cell:
Sub qwerty()
Dim r As Range
For Each r In Selection
If InStr(r.Value, ".") > 0 Then r.Characters(InStrRev(r.Text, "."), 1).Delete
Next r
End Sub
Shortest Solution
Simply use the Val command. I assume this is meant to be a numerical figure anyway? Get rid of commas and the dollar sign, then convert to value, which will ignore the second point and any other trailing characters! Robustness not tested, but seems to work...
Dim myString as String
myString = "$80,000.50. junk characters "
' Remove commas and dollar signs, then convert to value.
Dim myVal as Double
myVal = Val(Replace(Replace(myString,"$",""),",",""))
' >> myVal = 80000.5
' If you're really set on getting a formatted string back, use Format:
myString = Format(myVal, "$#,##0.00")
' >> myString = $80,000.50
From the Documentation,
The Val function stops reading the string at the first character it can't recognize as part of a number. Symbols and characters that are often considered parts of numeric values, such as dollar signs and commas, are not recognized.
This is why we must first remove the dollar sign, and why it ignores all the junk after the second dot, or for that matter anything non numerical at the end!
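A quick way to see that behaviour is to run Val on a few variations in the Immediate window. This is just an illustrative sketch; ValDemo is a made-up name.

```vba
Option Explicit

' Prints what Val returns for a few variations of the input.
Public Sub ValDemo()
    Debug.Print Val("80000.50. junk")   ' 80000.5: stops at the second period
    Debug.Print Val("$80,000.50.")      ' 0: "$" is not part of a number
    Debug.Print Val("80,000.50.")       ' 80: stops at the comma
End Sub
```

The last two lines show why both the dollar sign and the commas have to be removed before calling Val.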
Working with Strings
Edit: I wrote this solution first but now think the above method is more comprehensive and shorter - left here for completeness.
Trim() removes leading and trailing spaces from a string (it does not remove other whitespace such as tabs or non-breaking spaces). Then you could simply use Left() to get rid of the last point...
' String with trailing spaces and a final dot
Dim myString as String
myString = "$80,000.50. "
' Get rid of whitespace at end
myString = Trim(myString)
' Might as well check if there is a final dot before removing it
If Right(myString, 1) = "." Then
myString = Left(myString, Len(myString) - 1)
End If
' >> myString = "$80,000.50"

Word VBA: iterating through characters incredibly slow

I have a macro that changes single quotes in front of a number to an apostrophe (or close single curly quote). Typically when you type something like "the '80s" in word, the apostrophe in front of the "8" faces the wrong way. The macro below works, but it is incredibly slow (like 10 seconds per page). In a regular language (even an interpreted one), this would be a fast procedure. Any insights why it takes so long in VBA on Word 2007? Or if someone has some find+replace skills that can do this without iterating, please let me know.
Sub FixNumericalReverseQuotes()
Dim char As Range
Debug.Print "starting " + CStr(Now)
With Selection
total = .Characters.Count
' Will be looking ahead one character, so we need at least 2 in the selection
If total < 2 Then
Exit Sub
End If
For x = 1 To total - 1
a_code = Asc(.Characters(x))
b_code = Asc(.Characters(x + 1))
' We want to convert a single quote in front of a number to an apostrophe
' Trying to use all numerical comparisons to speed this up
If (a_code = 145 Or a_code = 39) And b_code >= 48 And b_code <= 57 Then
.Characters(x) = Chr(146)
End If
Next x
End With
Debug.Print "ending " + CStr(Now)
End Sub
Besides the two explicit questions (why is it so slow, and how to do it without iterating), there is an implied one: how to iterate properly through a Word object collection.
The answer is to use the obj.Next property rather than access by index.
That is, instead of:
For i = 1 to ActiveDocument.Characters.Count
'Do something with ActiveDocument.Characters(i), e.g.:
Debug.Print ActiveDocument.Characters(i).Text
Next
one should use:
Dim ch as Range: Set ch = ActiveDocument.Characters(1)
Do
'Do something with ch, e.g.:
Debug.Print ch.Text
Set ch = ch.Next 'Note iterating
Loop Until ch is Nothing
Timing: 00:03:30 vs. 00:00:06, more than 3 minutes vs. 6 seconds.
Found on Google, link lost, sorry. Confirmed by personal exploration.
Modified version of @Comintern's "Array method":
Sub FixNumericalReverseQuotes()
Dim chars() As Byte
chars = StrConv(Selection.Text, vbFromUnicode)
Dim pos As Long
For pos = 0 To UBound(chars) - 1
If (chars(pos) = 145 Or chars(pos) = 39) _
And (chars(pos + 1) >= 48 And chars(pos + 1) <= 57) Then
' Make the change directly in the selection so track changes is sensible.
' I have to use 213 instead of 146 for reasons I don't understand--
' probably has to do with encoding on Mac, but anyway, this shows the change.
Selection.Characters(pos + 1) = Chr(213)
End If
Next pos
End Sub
Maybe this?
Sub FixNumQuotes()
Dim MyArr As Variant, MyString As String, X As Long, Z As Long
Debug.Print "starting " + CStr(Now)
For Z = 145 To 146
MyArr = Split(Selection.Text, Chr(Z))
For X = LBound(MyArr) To UBound(MyArr)
If IsNumeric(Left(MyArr(X), 1)) Then MyArr(X) = "'" & MyArr(X)
Next
MyString = Join(MyArr, Chr(Z))
Selection.Text = MyString
Next
Selection.Text = Replace(Replace(Selection.Text, Chr(146) & "'", "'"), Chr(145) & "'", "'")
Debug.Print "ending " + CStr(Now)
End Sub
I am not 100% sure on your criteria, I have made both an open and close single quote a ' but you can change that quite easily if you want.
It splits the string to an array on chr(145), checks the first char of each element for a numeric and prefixes it with a single quote if found.
Then it joins the array back to a string on chr(145), then repeats the whole thing for chr(146). Finally it looks through the string for an occurrence of a single quote AND either of those curled quotes next to each other (because that has to be something we just created) and replaces them with just the single quote we want. This leaves any occurrence not next to a number intact.
This final replacement part is the bit you would change if you want something other than ' as the character.
I have been struggling with this for days now. My attempted solution was to use a regular expression on document.text. Then, using the matches in a document.range(start,end), replace the text. This preserves formatting.
The problem is that the start and end in the range do not match the index into text. I think I have found the discrepancy - hidden in the range are field codes (in my case they were hyperlinks). In addition, document.text has a bunch of BEL codes that are easy to strip out. If you loop through a range using the character method, append the characters to a string and print it you will see the field codes that don't show up if you use the .text method.
Amazingly you can get the field codes in document.text if you turn on "show field codes" in one of a number of ways. Unfortunately, that version is not exactly the same as what the range/characters shows - the document.text has just the field code, the range/characters has the field code and the field value. Therefore you can never get the character indices to match.
I have a working version where instead of using range(start,end), I do something like:
Set matchRange = doc.Range.Characters(myMatches(j).FirstIndex + 1)
matchRange.Collapse (wdCollapseStart)
Call matchRange.MoveEnd(WdUnits.wdCharacter, myMatches(j).Length)
matchRange.text = Replacement
As I say, this works but the first statement is dreadfully slow - it appears that Word is iterating through all of the characters to get to the correct point. In doing so, it doesn't seem to count the field codes, so we get to the correct point.
Bottom line, I have not been able to come up with a good way to match the indexing of the document.text string to an equivalent range(start,end) that is not a performance disaster.
Ideas welcome, and thanks.
This is a problem begging for regular expressions. Resolving the .Characters calls that many times is probably what is killing you in performance.
I'd do something like this:
Public Sub FixNumericalReverseQuotesFast()
Dim expression As RegExp
Set expression = New RegExp
Dim buffer As String
buffer = Selection.Range.Text
expression.Global = True
expression.MultiLine = True
expression.Pattern = "[" & Chr$(145) & Chr$(39) & "]\d"
Dim matches As MatchCollection
Set matches = expression.Execute(buffer)
Dim found As Match
For Each found In matches
buffer = Replace(buffer, found, Chr$(146) & Right$(found, 1))
Next
Selection.Range.Text = buffer
End Sub
NOTE: Requires a reference to Microsoft VBScript Regular Expressions 5.5 (or late binding).
EDIT:
The solution without using the Regular Expressions library is still avoiding working with Ranges. This can easily be converted to working with a byte array instead:
Sub FixNumericalReverseQuotes()
Dim chars() As Byte
chars = StrConv(Selection.Text, vbFromUnicode)
Dim pos As Long
For pos = 0 To UBound(chars) - 1
If (chars(pos) = 145 Or chars(pos) = 39) _
And (chars(pos + 1) >= 48 And chars(pos + 1) <= 57) Then
chars(pos) = 146
End If
Next pos
Selection.Text = StrConv(chars, vbUnicode)
End Sub
Benchmarks (100 iterations, 3 pages of text with 100 "hits" per page):
Regex method: 1.4375 seconds
Array method: 2.765625 seconds
OP method: (Ended task after 23 minutes)
About half as fast as the Regex, but still roughly 10ms per page.
EDIT 2: Apparently the methods above are not format safe, so method 3:
Sub FixNumericalReverseQuotesVThree()
Dim full_text As Range
Dim cached As Long
Set full_text = ActiveDocument.Range
full_text.Find.ClearFormatting
full_text.Find.MatchWildcards = True
cached = full_text.End
Do While full_text.Find.Execute("[" & Chr$(145) & Chr$(39) & "][0-9]")
full_text.End = full_text.Start + 2
full_text.Characters(1) = Chr$(146) 'closing single quote, as in the methods above
full_text.Start = full_text.Start + 1
full_text.End = cached
Loop
End Sub
Again, slower than both the above methods, but still runs reasonably fast (on the order of ms).

Word VBA code to cut numbers from one column and paste them in another

I am looking for some code that can search cell by cell in the 2nd column of a table for numbers and decimal points, cut them and paste them in the cell to the left whilst leaving the text behind.
For example:
1(tab space)Test
1.1(tab space)Test
1.1.1(tab space)Test
1.1.1.1(tab space)Test
Where the bullet points represent separate cells in different columns.
In all instances the numbers are separated from the text by a tab character, Chr(9) (as indicated in the example).
Any help or useful snippets of code would be much appreciated!
EDIT: I have some code that scans each cell in a column, but I don't know the code to tell it to only cut numbers and decimal points up to the first tab.
The Split function delivers what you are after. Sample code:
Dim inputString As String
Dim splitArray() As String
Dim result As String
inputString = "1" & vbTab & "Test"
splitArray = Split(inputString, vbTab) 'Split on the tab character, Chr(9)
If (UBound(splitArray) >= 1) Then 'Making sure that it found something before using it
result = splitArray(1) 'Text
End If
inputString = "1.1" & vbTab & "Test"
splitArray = Split(inputString, vbTab)
If (UBound(splitArray) >= 1) Then
result = splitArray(1) 'Text
End If
'etc.
UPDATE
Code delivering the functionality you want:
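Dim splitArray() As String
Dim cellText As String
Dim curTable As Table
Dim Row As Long
Set curTable = ActiveDocument.Tables(1)
For Row = 1 To curTable.Rows.Count
With curTable
'Cell text ends with an end-of-cell marker (Chr(13) & Chr(7)); strip it before splitting
cellText = .Cell(Row, 2).Range.Text
cellText = Left$(cellText, Len(cellText) - 2)
'The numbers are separated from the text by a tab
splitArray = Split(cellText, vbTab)
If (UBound(splitArray) >= 1) Then
.Cell(Row, 2).Range.Text = splitArray(1)
.Cell(Row, 1).Range.Text = splitArray(0)
End If
End With
Next Row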

How to normalize filenames listed in a range

I have a list of filenames in a spreadsheet in the form of "Smith, J. 010112.pdf". However, they're in the varying formats of "010112.pdf", "01.01.12.pdf", and "1.01.2012.pdf". How could I change these to one format of "010112.pdf"?
Personally I hate using VBA where worksheet functions will work, so I've worked out a way to do this with worksheet functions. Although you could cram this all into one cell, I've broken it out into a lot of independent steps in separate columns so you can see how it's working, step by step.
For simplicity I'm assuming your file name is in A1
B1 =LEN(A1)
determine the length of the filename
C1 =SUBSTITUTE(A1," ","")
replace spaces with nothing
D1 =LEN(C1)
see how long the string is if you replace spaces with nothing
E1 =B1-D1
determine how many spaces there are
F1 =SUBSTITUTE(A1," ",CHAR(8),E1)
replace the last space with a special character that can't occur in a file name
G1 =SEARCH(CHAR(8), F1)
find the special character. Now we know where the last space is
H1 =LEFT(A1,G1-1)
peel off everything before the last space
I1 =MID(A1,G1+1,255)
peel off everything after the last space
J1 =FIND(".",I1)
find the first dot
K1 =FIND(".",I1,J1+1)
find the second dot
L1 =FIND(".",I1,K1+1)
find the third dot
M1 =MID(I1,1,J1-1)
find the first number
N1 =MID(I1,J1+1,K1-J1-1)
find the second number
O1 =MID(I1,K1+1,L1-K1-1)
find the third number
P1 =TEXT(M1,"00")
pad the first number
Q1 =TEXT(N1,"00")
pad the second number
R1 =TEXT(O1,"00")
pad the third number
S1 =IF(ISERR(K1),M1,P1&Q1&R1)
put the numbers together
T1 =H1&" "&S1&".pdf"
put it all together
It's kind of a mess because Excel hasn't added a single new string manipulation function in over 20 years, so things that should be easy (like "find last space") require severe trickery.
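For comparison, if VBA is acceptable, the "find last space" steps (columns B to I above) collapse into one call to InStrRev. This is a small sketch; TextAfterLastSpace is a made-up name and is not part of the worksheet solution.

```vba
Option Explicit

' Returns the part of fileName after its last space, or the whole
' string when it contains no space.
' TextAfterLastSpace is a hypothetical helper name.
Public Function TextAfterLastSpace(ByVal fileName As String) As String
    Dim lastSpace As Long
    lastSpace = InStrRev(fileName, " ")   ' 0 when no space is present
    TextAfterLastSpace = Mid$(fileName, lastSpace + 1)
End Function
```

For "Smith, J. 01.01.12.pdf" this should return 01.01.12.pdf, which corresponds to the value computed in I1.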
This function below works. I've assumed that the date is in ddmmyy format, but adjust as appropriate if it's mmddyy -- I can't tell from your example.
Function FormatThis(str As String) As String
Dim strDate As String
Dim iDateStart As Long
Dim iDateEnd As Long
Dim temp As Variant
' Pick out the date part
iDateStart = GetFirstNumPosition(str, False)
iDateEnd = GetFirstNumPosition(str, True)
strDate = Mid(str, iDateStart, iDateEnd - iDateStart + 1)
If InStr(strDate, ".") <> 0 Then
' Deal with the dot delimiters in the date
temp = Split(strDate, ".")
strDate = Format(DateSerial( _
CInt(temp(2)), CInt(temp(1)), CInt(temp(0))), "ddmmyy")
Else
' No dot delimiters... assume date is already formatted as ddmmyy
' Do nothing
End If
' Piece it together
FormatThis = Left(str, iDateStart - 1) _
& strDate & Right(str, Len(str) - iDateEnd)
End Function
This uses the following helper function:
Function GetFirstNumPosition(str As String, startFromRight As Boolean) As Long
Dim i As Long
Dim startIndex As Long
Dim endIndex As Long
Dim indexStep As Integer
If startFromRight Then
startIndex = Len(str)
endIndex = 1
indexStep = -1
Else
startIndex = 1
endIndex = Len(str)
indexStep = 1
End If
For i = startIndex To endIndex Step indexStep
If Mid(str, i, 1) Like "[0-9]" Then
GetFirstNumPosition = i
Exit For
End If
Next i
End Function
To test:
Sub tester()
MsgBox FormatThis("Smith, J. 01.03.12.pdf")
MsgBox FormatThis("Smith, J. 010312.pdf")
MsgBox FormatThis("Smith, J. 1.03.12.pdf")
MsgBox FormatThis("Smith, J. 1.3.12.pdf")
End Sub
They all return "Smith, J. 010312.pdf".
You don't need VBA. Start by replacing the "."s with nothing:
=SUBSTITUTE(A1,".","")
This will change the ".PDF" to "PDF", so let's put that back:
=SUBSTITUTE(SUBSTITUTE(A1,".",""),"pdf",".pdf")
Got awk? Get the data into a text file, and
awk -F'.' '{ if(/[0-9]+\.[0-9]+\.[0-9]+/) printf("%s. %02d%02d%02d.pdf\n", $1, $2, $3, length($4) > 2 ? substr($4,3,2) : $4); else print $0; }' your_text_file
Assuming the data are exactly as what you described, e.g.,
Smith, J. 010112.pdf
Mit, H. 01.02.12.pdf
Excel, M. 8.1.1989.pdf
Lec, X. 06.28.2012.pdf
DISCLAIMER:
As @Jean-FrançoisCorbett has mentioned, this does not work for "Smith, J. 1.01.12.pdf". Instead of reworking this completely, I'd recommend his solution!
Option Explicit
Function ExtractNumerals(Original As String) As String
'Pass everything up to and including ".pdf", then concatenate the result of this function with ".pdf".
'This will not return the ".pdf" if passed, which is generally not my ideal solution, but it's a simpler form that still should get the job done.
'If you have varying extensions, then look at the code of the test sub as a guide for how to compensate for the truncation this function creates.
Dim i As Integer
Dim bFoundFirstNum As Boolean
For i = 1 To Len(Original)
If IsNumeric(Mid(Original, i, 1)) Then
bFoundFirstNum = True
ExtractNumerals = ExtractNumerals & Mid(Original, i, 1)
ElseIf Not bFoundFirstNum Then
ExtractNumerals = ExtractNumerals & Mid(Original, i, 1)
End If
Next i
End Function
I used this as a testcase, which does not correctly cover all your examples:
Sub test()
MsgBox ExtractNumerals("Smith, J. 010112.pdf") & ".pdf"
End Sub