UDF to remove special characters, punctuation & spaces within a cell to create unique key for Vlookups

I hacked together the following User Defined Function in VBA that allows me to remove certain non-text characters from any given Cell.
The code is as follows:
Function removeSpecial(sInput As String) As String
Dim sSpecialChars As String
Dim i As Long
sSpecialChars = "\/:*?™""®<>|.&##(_+`©~);-+=^$!,'" 'This is your list of characters to be removed
For i = 1 To Len(sSpecialChars)
sInput = Replace$(sInput, Mid$(sSpecialChars, i, 1), " ")
Next
removeSpecial = sInput
End Function
This portion of the code obviously defines what characters are to be removed:
sSpecialChars = "\/:*?™""®<>|.&##(_+`©~);-+=^$!,'"
I also want to include a normal space character, " ", within this criteria. I was wondering if there is some sort of escape character that I can use to do this?
So, my goal is to be able to run this function, and have it remove all specified characters from a given Excel Cell, while also removing all spaces.
Also, I realize I could do this with a =SUBSTITUTE function within Excel itself, but I would like to know if it is possible in VBA.
Edit: It's fixed! Thank you simoco!
Function removeSpecial(sInput As String) As String
Dim sSpecialChars As String
Dim i As Long
sSpecialChars = "\/:*?™""®<>|.&## (_+`©~);-+=^$!,'" 'This is your list of characters to be removed
For i = 1 To Len(sSpecialChars)
sInput = Replace$(sInput, Mid$(sSpecialChars, i, 1), "") 'this will remove spaces
Next
removeSpecial = sInput
End Function

So after the advice from simoco I was able to modify my for loop:
For i = 1 To Len(sSpecialChars)
sInput = Replace$(sInput, Mid$(sSpecialChars, i, 1), "") 'this will remove spaces
Next
Now every special character in a given cell of my spreadsheet is replaced with nothing, i.e. removed. This is done by the Replace$ and Mid$ functions used together as shown:
sInput = Replace$(sInput, Mid$(sSpecialChars, i, 1), "") 'this will remove spaces
This line runs once for each character in sSpecialChars, starting with the character at position 1, via my For loop. Each pass removes every occurrence of that one character from the cell's text.
Hopefully this answer benefits someone in the future if they stumble upon my original question.
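For anyone who wants to try the idea outside the workbook, here is a minimal sketch of the same loop with the removal list passed in as a parameter. StripChars and DemoStripChars are hypothetical names, not part of the original UDF, and the sample text is made up.

```vba
Option Explicit

' Hypothetical generalisation of removeSpecial: the characters to strip
' are passed in rather than hard-coded.
Function StripChars(ByVal sInput As String, ByVal sRemove As String) As String
    Dim i As Long
    ' Loop over the removal list, not over the input text
    For i = 1 To Len(sRemove)
        sInput = Replace$(sInput, Mid$(sRemove, i, 1), "")
    Next i
    StripChars = sInput
End Function

Sub DemoStripChars()
    ' The removal list starts with a space, so spaces are stripped too
    Debug.Print StripChars("Smith & Sons, Ltd. (UK)", " &,.()")   ' prints SmithSonsLtdUK
End Sub
```

No escape character is needed for the space: it is just another character inside the quoted list.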

Related

MS Access VBA: Split string into pre-defined width

I have an MS Access form where the user pastes a string into a field {Vars}, and I want to reformat that string into a new field so that (a) it retains whole words, and (b) "fits" within 70 columns.
Specifically, the user will be cutting/pasting variable names from SPSS. So the string will go into the field as whole names---no spaces allowed---with line breaks between each variable. So the first bit of VBA code looks like this:
Vars = Replace(Vars, vbCrLf, " ")
which removes the line breaks. But from there, I'm stumped---ultimately I want the long string that is pasted in the Vars field to be put on consecutive multiple lines that each are no longer than 70 columns.
Any help is appreciated!
Okay, for posterity, here is a solution:
The field name on the form that captures the user input is VarList. The call to the SPSS_Syntax function below returns the list of variable names (in "Vars") that can then be used elsewhere:
Vars = SPSS_Syntax(me.VarList)
Recall that user input into Varlist comes in as each variable (word) with a line break in between each. The problem is that we want the list to be on one line (horizontal, not vertical) AND a line can be no more than 256 characters in length (I'm setting it to 70 characters below). Here's the function:
Public Function SPSS_Syntax(InputString As String) As String
    Dim MyArray() As String, MyString As String, Syntax As String
    Dim i As Long
    InputString = Replace(InputString, vbNewLine, " ") 'Puts the string into one line, separated by a space.
    If Len(InputString) <= 70 Then 'if the string is already short enough, just returns it as is.
        SPSS_Syntax = InputString
        Exit Function
    End If
    MyArray = Split(InputString, " ") 'Creates the array
    For i = LBound(MyArray) To UBound(MyArray) 'for each element in the array
        If Len(MyArray(i)) > 0 Then 'skips empty elements left by repeated spaces
            If Len(MyString) > 0 And Len(MyString) + 1 + Len(MyArray(i)) > 70 Then
                Syntax = Syntax & MyString & vbNewLine 'the next word would not fit, so saves the current line
                MyString = "" 'erases string value for the next line
            End If
            If Len(MyString) > 0 Then MyString = MyString & " " 'blank space between words
            MyString = MyString & MyArray(i)
        End If
    Next
    SPSS_Syntax = Syntax & MyString 'appends the last, partial line so no words are lost
End Function
There's probably a better way to do it but this works. Cheers.

Removing All Spaces in String

I created a macro for removing all whitespace in a string, specifically an email address. However it only removes about 95% of the whitespace, and leaves a few.
My code:
Sub NoSpaces()
Dim w As Range
For Each w In Selection.Cells
w = Replace(w, " ", "")
Next
End Sub
Things I have tried to solve the issue include:
~ Confirmed the spaces are indeed spaces with the Code function, it is character 32 (space)
~ Used a substitute macro in conjunction with the replace macro
~ Have additional macro utilizing Trim function to remove leading and trailing whitespace
~ Made a separate macro to test for non-breaking spaces (character 160)
~ Used the Find and Replace feature to search and replace spaces with nothing. Confirmed working.
I only have one cell selected when I run the macro. It selects and goes through all the cells because of the Selection.Cells part of the code.
A few examples:
1 STAR MOVING @ ATT.NET
322 TRUCKING@GMAIL.COM
ALEZZZZ@AOL. COM.
These just contain regular whitespace, but are skipped over.
Just use a regular expression:
'Add a reference to Microsoft VBScript Regular Expressions 5.5
Public Function RemoveWhiteSpace(target As String) As String
With New RegExp
.Pattern = "\s"
.MultiLine = True
.Global = True
RemoveWhiteSpace = .Replace(target, vbNullString)
End With
End Function
Call it like this:
Sub NoSpaces()
Dim w As Range
For Each w In Selection.Cells
w.Value = RemoveWhiteSpace(w.Value)
Next
End Sub
Try this:
Sub NoSpaces()
Selection.Replace " ", ""
End Sub
Use "Substitute"
Example...
=SUBSTITUTE(C1:C18," ","")
Because you assume that Selection.Cells includes all cells on the sheet.
Cells.Replace " ", ""
And to add to the excellent advice from all the great contributors, try the
TRIM or LTRIM, or RTRIM and you can read more about these functions here:
https://msdn.microsoft.com/en-us/library/office/gg278916.aspx
Now this does not remove embedded spaces (spaces in between the letters) but it will remove any leading and trailing spaces.
Hope this helps.
Space Problem with Excel
OK, the only way I can tell these two types of space apart is by converting them to their ASCII code values, which I do here.
To explain the function I made: it filters the string character by character, checking whether each one equals either of the two types of space I mentioned. If it does not, the character is appended to the string that becomes the final value after the loop. Hope this helps. Thanks.
Function spaceremove(strs) As String
Dim str As String
Dim nstr As String
Dim sstr As String
Dim x As Integer
str = strs
For x = 1 To VBA.Len(str)
sstr = Left(Mid(str, x), 1)
If sstr = " " Or sstr = Chr(160) Then
Else
nstr = nstr & "" & sstr
End If
Next x
spaceremove = nstr
End Function
I copied an HTML table with data and pasted it into Excel, but the cells were filled with unwanted spaces, and none of the methods posted here worked. So I debugged and discovered that they weren't actually space characters (ASCII 32) but non-breaking spaces (character 160), i.e. &nbsp; in HTML.
So to make it work with that Non-breaking space char I did this:
Sub NoSpaces()
Dim w As Range
For Each w In Selection.Cells
w.Value = Replace(w.Value, " ", vbNullString)
w.Value = Replace(w.Value, Chr(160), vbNullString)
Next
End Sub
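If you are not sure which kind of "space" you are dealing with, it can help to look at the character codes before choosing what to replace. A rough sketch follows; CharCodes and DemoCharCodes are hypothetical helper names, not something from the posts above.

```vba
Option Explicit

' Hypothetical helper: returns the character codes of a string, separated by spaces
Function CharCodes(ByVal s As String) As String
    Dim i As Long
    Dim result As String
    For i = 1 To Len(s)
        result = result & AscW(Mid$(s, i, 1)) & " "
    Next i
    CharCodes = Trim$(result)
End Function

Sub DemoCharCodes()
    ' 32 is a regular space, 160 is a non-breaking space
    Debug.Print CharCodes("A " & ChrW(160) & "B")   ' prints 65 32 160 66
End Sub
```

Any code other than 32 in the gaps means Replace(w.Value, " ", "") alone will not catch it.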

Excel VBA Using wildcard to replace string within string

I have a difficult situation and so far no luck in finding a solution.
My VBA collects number figures like $80,000.50. and I'm trying to get VBA to remove the last period to make it look like $80,000.50 but without using right().
The problem is after the last period there are hidden spaces or characters which will be a whole lot of new issue to handle so I'm just looking for something like:
replace("$80,000.50.",".**.",".**")
Is this possible in VBA?
I cant leave a comment so....
what about InStrRev?
Private Sub this()
Dim this As String
this = "$80,000.50."
this = Left(this, InStrRev(this, ".") - 1)
Debug.Print ; this
End Sub
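Note that Left(..., InStrRev(...) - 1) also drops everything after the last dot, and it raises an error if the string has no dot at all. Below is a guarded sketch, assuming you would rather keep whatever follows the dot; RemoveLastDot is a hypothetical name.

```vba
Option Explicit

' Hypothetical helper: removes only the last "." and keeps the rest of the string
Function RemoveLastDot(ByVal s As String) As String
    Dim pos As Long
    pos = InStrRev(s, ".")
    If pos > 0 Then
        RemoveLastDot = Left$(s, pos - 1) & Mid$(s, pos + 1)
    Else
        RemoveLastDot = s   ' no dot found, return the input unchanged
    End If
End Function

Sub DemoRemoveLastDot()
    Debug.Print RemoveLastDot("$80,000.50.")   ' prints $80,000.50
    Debug.Print RemoveLastDot("no dot here")   ' prints no dot here
End Sub
```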
Mid + Find
You can use Mid and Find functions. Like so:
The Find will find the first dot . character. If all the values you are collecting are currency with 2 decimals, stored as text, this will work well.
The formula is: =MID(A2,1,FIND(".",A2)+2)
VBA solution
Function getStringToFirstOccurence(inputUser As String, FindWhat As String) As String
getStringToFirstOccurence = Mid(inputUser, 1, WorksheetFunction.Find(FindWhat, inputUser) + 2)
End Function
Other possible solutions, hints
Trim + Clean + Substitute(Char(160)): Chandoo - Untrimmable Spaces – Excel Formula
Ultimately, you can implement Regular expressions into Excel UDF: VBScript’s Regular Expression Support
How about:
Sub dural()
Dim r As Range
For Each r In Selection
s = r.Text
l = Len(s)
For i = l To 1 Step -1
If Mid(s, i, 1) = "." Then
r.Value = Mid(s, 1, i - 1) & Mid(s, i + 1)
Exit For
End If
Next i
Next r
End Sub
This will remove the last period and leave all the other characters intact.
EDIT#1:
This version does not require looping over the characters in the cell:
Sub qwerty()
Dim r As Range
For Each r In Selection
If InStr(r.Value, ".") > 0 Then r.Characters(InStrRev(r.Text, "."), 1).Delete
Next r
End Sub
Shortest Solution
Simply use the Val command. I assume this is meant to be a numerical figure anyway? Get rid of commas and the dollar sign, then convert to value, which will ignore the second point and any other trailing characters! Robustness not tested, but seems to work...
Dim myString as String
myString = "$80,000.50. junk characters "
' Remove commas and dollar signs, then convert to value.
Dim myVal as Double
myVal = Val(Replace(Replace(myString,"$",""),",",""))
' >> myVal = 80000.5
' If you're really set on getting a formatted string back, use Format:
myString = Format(myVal, "$#,##0.00")
' >> myString = $80,000.50
From the Documentation,
The Val function stops reading the string at the first character it can't recognize as part of a number. Symbols and characters that are often considered parts of numeric values, such as dollar signs and commas, are not recognized.
This is why we must first remove the dollar sign, and why it ignores all the junk after the second dot, or for that matter anything non numerical at the end!
Working with Strings
Edit: I wrote this solution first but now think the above method is more comprehensive and shorter - left here for completeness.
Trim() removes leading and trailing spaces from a string. Then you could simply use Left() to get rid of the last point...
' String with trailing spaces and a final dot
Dim myString as String
myString = "$80,000.50. "
' Get rid of whitespace at end
myString = Trim(myString)
' Might as well check if there is a final dot before removing it
If Right(myString, 1) = "." Then
myString = Left(myString, Len(myString) - 1)
End If
' >> myString = "$80,000.50"

Removing characters with .Replace in VBA for Excel

The following function was given to me in an answer to a question that I asked earlier today.
What I'm trying to do is to remove a character from a string in Excel using VBA. However, whenever the function runs, it ends up erasing the value stored and returning a #VALUE! error. I cannot seem to figure out what is going on. Would anyone mind explaining, or suggesting an alternative?
Function ReplaceAccentedCharacters(S As String) As String
Dim I As Long
With WorksheetFunction
For I = 1 To Len(S)
Select Case Asc(Mid(S, I, 1))
' Extraneous coding removed. Leaving the examples which
' do work and the one that is causing the problem.
Case 32
S = .Replace(S, I, 1, "-")
Case 94
S = .Replace(S, I, 1, "/")
' This is the coding that is generating the error.
Case 34
S = .Replace(S, I, 1, "")
End Select
Next I
End With
ReplaceAccentedCharacters = S
End Function
When the string contains a " (or character code 34 in Decimal, 22 in Hexadecimal... I used both) it is supposed to remove the quotation mark. However, instead, Excel ignores it, and still returns the " mark anyway.
I then tried to go ahead and replace the .Replace() clause with another value.
Case 34
S = .Replace(S, I, 1, "/")
End Select
Using the code above, the script indeed does replace the " with a /.
I ended up finding the following example here in Stack Overflow:
https://stackoverflow.com/a/7386565/692250
In the answer given there, I see a code example nearly identical to the one I gave, yet it does nothing for me: Excel is still ignoring the quotation mark. I even went so far as to expand the definition with curly braces and still did not get anything.
Try this:
Function blah(S As String) As String
Dim arr, i
'array of [replace, with], [replace, with], etc
arr = Array(Chr(32), "-", Chr(94), "/", Chr(34), "")
For i = LBound(arr) To UBound(arr) Step 2
S = Replace(S, arr(i), arr(i + 1))
Next i
blah = S
End Function
This function was designed to replace one character with another. It was not designed to replace a character with nothing. When you replace a character with nothing, the string gets shorter, but the loop counter still runs up to the original length. On the last iteration it looks for a character position that is greater than the length of the string. Mid returns an empty string there, and Asc of an empty string raises an error. Other errors in the replacement routine will also occur when the length of the string is changed while the code is running.
To modify the routine to replace a character with nothing, I would suggest the following:
In the Case statements:
Case 34
S = .Replace(S, I, 1, Chr(1))
And in the assignment statement:
ReplaceAccentedCharacters = Replace(S, Chr(1), "")
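Note that VBA's Replace is different from WorksheetFunction.Replace: the first substitutes a substring wherever it occurs, the second overwrites a number of characters starting at a given position.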
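To make that difference concrete, here is a small sketch that runs both on the same text. It assumes the code runs inside Excel, since WorksheetFunction is only available there; CompareReplace is a made-up name.

```vba
Option Explicit

Sub CompareReplace()
    ' VBA Replace: swaps every occurrence of the substring "-"
    Debug.Print Replace("a-b-c", "-", "/")                                    ' prints a/b/c
    ' WorksheetFunction.Replace: overwrites 1 character starting at position 2
    Debug.Print Application.WorksheetFunction.Replace("a-b-c", 2, 1, "/")     ' prints a/b-c
End Sub
```

Because the worksheet version works by position, any replacement that changes the string's length shifts the positions of everything after it.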

How to normalize filenames listed in a range

I have a list of filenames in a spreadsheet in the form of "Smith, J. 010112.pdf". However, they're in the varying formats of "010112.pdf", "01.01.12.pdf", and "1.01.2012.pdf". How could I change these to one format of "010112.pdf"?
Personally I hate using VBA where worksheet functions will work, so I've worked out a way to do this with worksheet functions. Although you could cram this all into one cell, I've broken it out into a lot of independent steps in separate columns so you can see how it's working, step by step.
For simplicity I'm assuming your file name is in A1
B1 =LEN(A1)
determine the length of the filename
C1 =SUBSTITUTE(A1," ","")
replace spaces with nothing
D1 =LEN(C1)
see how long the string is if you replace spaces with nothing
E1 =B1-D1
determine how many spaces there are
F1 =SUBSTITUTE(A1," ",CHAR(8),E1)
replace the last space with a special character that can't occur in a file name
G1 =SEARCH(CHAR(8), F1)
find the special character. Now we know where the last space is
H1 =LEFT(A1,G1-1)
peel off everything before the last space
I1 =MID(A1,G1+1,255)
peel off everything after the last space
J1 =FIND(".",I1)
find the first dot
K1 =FIND(".",I1,J1+1)
find the second dot
L1 =FIND(".",I1,K1+1)
find the third dot
M1 =MID(I1,1,J1-1)
find the first number
N1 =MID(I1,J1+1,K1-J1-1)
find the second number
O1 =MID(I1,K1+1,L1-K1-1)
find the third number
P1 =TEXT(M1,"00")
pad the first number
Q1 =TEXT(N1,"00")
pad the second number
R1 =TEXT(O1,"00")
pad the third number
S1 =IF(ISERR(K1),M1,P1&Q1&R1)
put the numbers together
T1 =H1&" "&S1&".pdf"
put it all together
It's kind of a mess because Excel hasn't added a single new string manipulation function in over 20 years, so things that should be easy (like "find last space") require severe trickery.
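For comparison only, and if you don't mind VBA, the "find last space" steps (columns B through I) collapse to a single InStrRev call. A minimal sketch, with DemoLastSpace as a made-up name and the sample filename taken from the question's format:

```vba
Option Explicit

Sub DemoLastSpace()
    Dim s As String
    Dim pos As Long
    s = "Smith, J. 01.01.12.pdf"
    pos = InStrRev(s, " ")          ' position of the last space
    Debug.Print Left$(s, pos - 1)   ' prints Smith, J.
    Debug.Print Mid$(s, pos + 1)    ' prints 01.01.12.pdf
End Sub
```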
Here's a screenshot of a simple four-step method based on Excel commands and formulas, as suggested in a comment to the answered post (with a few changes)...
This function below works. I've assumed that the date is in ddmmyy format, but adjust as appropriate if it's mmddyy -- I can't tell from your example.
Function FormatThis(str As String) As String
Dim strDate As String
Dim iDateStart As Long
Dim iDateEnd As Long
Dim temp As Variant
' Pick out the date part
iDateStart = GetFirstNumPosition(str, False)
iDateEnd = GetFirstNumPosition(str, True)
strDate = Mid(str, iDateStart, iDateEnd - iDateStart + 1)
If InStr(strDate, ".") <> 0 Then
' Deal with the dot delimiters in the date
temp = Split(strDate, ".")
strDate = Format(DateSerial( _
CInt(temp(2)), CInt(temp(1)), CInt(temp(0))), "ddmmyy")
Else
' No dot delimiters... assume date is already formatted as ddmmyy
' Do nothing
End If
' Piece it together
FormatThis = Left(str, iDateStart - 1) _
& strDate & Right(str, Len(str) - iDateEnd)
End Function
This uses the following helper function:
Function GetFirstNumPosition(str As String, startFromRight As Boolean) As Long
Dim i As Long
Dim startIndex As Long
Dim endIndex As Long
Dim indexStep As Integer
If startFromRight Then
startIndex = Len(str)
endIndex = 1
indexStep = -1
Else
startIndex = 1
endIndex = Len(str)
indexStep = 1
End If
For i = startIndex To endIndex Step indexStep
If Mid(str, i, 1) Like "[0-9]" Then
GetFirstNumPosition = i
Exit For
End If
Next i
End Function
To test:
Sub tester()
MsgBox FormatThis("Smith, J. 01.03.12.pdf")
MsgBox FormatThis("Smith, J. 010312.pdf")
MsgBox FormatThis("Smith, J. 1.03.12.pdf")
MsgBox FormatThis("Smith, J. 1.3.12.pdf")
End Sub
They all return "Smith, J. 010312.pdf".
You don't need VBA. Start by replacing the "."s with nothing:
=SUBSTITUTE(A1,".","")
This will change the ".PDF" to "PDF", so let's put that back:
=SUBSTITUTE(SUBSTITUTE(A1,".",""),"pdf",".pdf")
Got awk? Get the data into a text file, and
awk -F'.' '{ if(/[0-9]+\.[0-9]+\.[0-9]+/) printf("%s. %02d%02d%02d.pdf\n", $1, $2, $3, length($4) > 2 ? substr($4,3,2) : $4); else print $0; }' your_text_file
Assuming the data are exactly as what you described, e.g.,
Smith, J. 010112.pdf
Mit, H. 01.02.12.pdf
Excel, M. 8.1.1989.pdf
Lec, X. 06.28.2012.pdf
DISCLAIMER:
As @Jean-FrançoisCorbett has mentioned, this does not work for "Smith, J. 1.01.12.pdf". Instead of reworking this completely, I'd recommend his solution!
Option Explicit
Function ExtractNumerals(Original As String) As String
'Pass everything up to and including ".pdf", then concatenate the result of this function with ".pdf".
'This will not return the ".pdf" if passed, which is generally not my ideal solution, but it's a simpler form that still should get the job done.
'If you have varying extensions, then look at the code of the test sub as a guide for how to compensate for the truncation this function creates.
Dim i As Integer
Dim bFoundFirstNum As Boolean
For i = 1 To Len(Original)
If IsNumeric(Mid(Original, i, 1)) Then
bFoundFirstNum = True
ExtractNumerals = ExtractNumerals & Mid(Original, i, 1)
ElseIf Not bFoundFirstNum Then
ExtractNumerals = ExtractNumerals & Mid(Original, i, 1)
End If
Next i
End Function
I used this as a testcase, which does not correctly cover all your examples:
Sub test()
MsgBox ExtractNumerals("Smith, J. 010112.pdf") & ".pdf"
End Sub