Please suggest reg express for every nth occurrence of a character in a string in vba - vba

At the very outset, let me confess. I am not a developer nor do I have any technical background. However, I have learned a bit of coding. I want to a regular expression for finding every nth position of a character. For example, google, yahoo, rediff, facebook, cnn, pinterest, gmail. I want to find every third occurrence of the comma in the string and replace it with a semicolon. This is for a excel vba macro I am working on. For now, I am trying to loop through it and then replace. If the data is large, the macro fails. Would appreciate your reply. Thanks a ton in advance.
Here is what I am doing:
Option Explicit
Sub reg()
Dim regx As RegExp
Set regx = New RegExp
Dim allMatches As Object
Dim contents As String
Dim contents2 As String
contents = "hello, wow, true, false, why, try, cry, haha"
contents = "contents1" & contents
regx.pattern = ",{4}"
regx.Global = True
regx.IgnoreCase = False
Set allMatches = regx.Execute(contents)
contents2 = regx.Replace(contents, ";")
MsgBox contents2
End Sub
I get the data from all selected cells. Join it. Add semicolon (an indicator for the row end) at every fourth comma found. Please suggest if there is a better way to do it as I am new to this:
Here is what I have done currently by looping through array. I want to avoid this.
Function insertColon(sInputString As String) As Variant
Dim data3 As String
Dim sFind As String
Dim sReplacewith As String
Dim result As String
'Dim i As Integer
Dim Counter As Integer
sFind = ","
sReplacewith = ";"
data3 = sInputString
' MsgBox = data3
' Dim J As Integer
Application.Volatile
FindN = 0
'Dim i As Integer
' i = 1
Counter = 4
' MsgBox Len(data3)
While InStr(Counter, sInputString, sFind)
FindN = InStr(Counter, sInputString, sFind)
data3 = Application.Substitute(data3, sFind, sReplacewith, Counter)
Counter = Counter + 3
' MsgBox "loop" & i
'
' i = i + 1
Wend

If I understood you properly then all your code could be summarized to a few lines
Dim sText As String
sText = "hello, wow, true, false, why, try, cry, haha, huh, muh"
For p = 3 To Len(sText) Step 2
sText = WorksheetFunction.Substitute(sText, ",", ";", p)
Next p
MsgBox sText
'Output
'hello, wow, true; false, why, try; cry, haha, huh; muh
Test it and let me know whether it fails. If not don't forget to accept the answer.

Related

Access VBA Recordset Truncation

I'm using Access 2016. I have created a function that I've passed to a query - the function's role is to parse a memo (long text) field down from 3-4k characters (parses the string between 2 markers) to a length I can work with. Here is the function:
Function getBASE(sStrx)
Dim sTemp, iBEG As Integer, iEND As Integer, sBEG As String, sEND As String
'
If InStr(UCase(sStrx), "FROM ") > 1 And InStr(sStrx, "{") > 1 Then
iBEG = InStr(sStrx, "{") + 1
iEND = InStr(sStrx, "}") - iBEG
getBASE = Mid(sStrx, iBEG, iEND)
End If
'
End Function
The resultant length of this is around 700 characters. When I create a LEN on the result, I get 700 chars. I then run a sub (code below) that contains a Split function, but the result of the split is 255 characters. However, I have a couple of msgbox's that pop up to check the length of the recordset value and it's 700 (the correct length), but for some reason, it's truncating. I don't know if this is due the function, the sub, or something else. One other thing: I can pass the original value (3-4k chars) to the sub and it works fine. I'm beyond frustrated...thx for the help.
Sub Break_String()
Dim db As DAO.Database, rsSource As DAO.Recordset, rsDest As DAO.Recordset
Set db = CurrentDb()
Set rsSource = db.OpenRecordset("qTEST")
Set rsDest = db.OpenRecordset("tblParsed")
Dim WrdArray() As String
rsSource.MoveFirst
WrdArray() = Split(UCase(rsSource("[getBASE]")), "FROM ")
'
MsgBox Len(rsSource("[getBASE]")) <---- 700 chars
For i = LBound(WrdArray) To UBound(WrdArray)
strg = strg & vbNewLine & WrdArray(i)
Next i
'
MsgBox strg <----- 255 chars
End Sub

Excel VBA Custom Function Remove Words Appearing in One String From Another String

I am trying to remove words appearing in one string from a different string using a custom function. For instance:
A1:
the was why blue hat
A2:
the stranger wanted to know why his blue hat was turning orange
The ideal outcome in this example would be:
A3:
stranger wanted to know his turning orange
I need to have the cells in reference open to change so that they can be used in different situations.
The function will be used in a cell as:
=WORDREMOVE("cell with words needing remove", "cell with list of words being removed")
I have a list of 20,000 rows and have managed to find a custom function that can remove duplicate words (below) and thought there may be a way to manipulate it to accomplish this task.
Function REMOVEDUPEWORDS(txt As String, Optional delim As String = " ") As String
Dim x
'Updateby20140924
With CreateObject("Scripting.Dictionary")
.CompareMode = vbTextCompare
For Each x In Split(txt, delim)
If Trim(x) <> "" And Not .exists(Trim(x)) Then .Add Trim(x), Nothing
Next
If .Count > 0 Then REMOVEDUPEWORDS = Join(.keys, delim)
End With
End Function
If you can guarantee that your words in both strings will be separated by spaces (no comma, ellipses, etc), you could just Split() both strings then Filter() out the words:
Function WORDREMOVE(ByVal strText As String, strRemove As String) As String
Dim a, w
a = Split(strText) ' Start with all words in an array
For Each w In Split(strRemove)
a = Filter(a, w, False, vbTextCompare) ' Remove every word found
Next
WORDREMOVE = Join(a, " ") ' Recreate the string
End Function
You can also do this using Regular Expressions in VBA. The version below is case insensitive and assumes all words are separated only by space. If there is other punctuation, more examples would aid in crafting an appropriate solution:
Option Explicit
Function WordRemove(Str As String, RemoveWords As String) As String
Dim RE As Object
Set RE = CreateObject("vbscript.regexp")
With RE
.ignorecase = True
.Global = True
.Pattern = "(?:" & Join(Split(WorksheetFunction.Trim(RemoveWords)), "|") & ")\s*"
WordRemove = .Replace(Str, "")
End With
End Function
My example is certainly not the best code, but it should work
Function WORDREMOVE(FirstCell As String, SecondCell As String)
Dim FirstArgument As Variant, SecondArgument As Variant
Dim FirstArgumentCounter As Integer, SecondArgumentCounter As Integer
Dim Checker As Boolean
WORDREMOVE = ""
FirstArgument = Split(FirstCell, " ")
SecondArgument = Split(SecondCell, " ")
For SecondArgumentCounter = 0 To UBound(SecondArgument)
Checker = False
For FirstArgumentCounter = 0 To UBound(FirstArgument)
If SecondArgument(SecondArgumentCounter) = FirstArgument(FirstArgumentCounter) Then
Checker = True
End If
Next FirstArgumentCounter
If Checker = False Then WORDREMOVE = WORDREMOVE & SecondArgument(SecondArgumentCounter) & " "
Next SecondArgumentCounter
WORDREMOVE = Left(WORDREMOVE, Len(WORDREMOVE) - 1)
End Function

Find and replace all names of variables in VBA module

Let's assume that we have one module with only one Sub in it, and there are no comments. How to identify all variable names ? Is it possible to identify names of variables which are not defined using Dim ? I would like to identify them and replace each with some random name to obfuscate my code (O0011011010100101 for example), replace part is much easier.
List of characters which could be use in names of macros, functions and variables :
ABCDEFGHIJKLMNOPQRSTUVWXYZdefghijklmnopqrstuvwxyzg€‚„…†‡‰Š‹ŚŤŽŹ‘’“”•–—™š›śťžź ˇ˘Ł¤Ą¦§¨©Ş«¬­®Ż°±˛ł´µ¶·¸ąş»Ľ˝ľżŔÁÂĂÄĹĆÇČÉĘËĚÍÎĎĐŃŇÓÔŐÖ×ŘŮÚŰÜÝŢßŕáâăäĺćçčéęëěíîďđńňóôőö÷řůúűüýţ˙ÉĘËĚÍÎĎĐŃŇÓÔŐÖ×ŘŮÚŰÜÝŢßŕáâăäĺćçčéęëěíîďđńňóôőö÷řůúűüýţ˙
Below are my function I've wrote recenlty :
Function randomName(n as integer) as string
y="O"
For i = 2 To n:
If Rnd() > 0.5 Then
y = y & "0"
Else
y = y & "1"
End If
Next i
randomName=y
End Function
In goal to replace given strings in another string which represent the code of module I use below sub :
Sub substituteNames()
'count lines in "Module1" which is part of current workbook
linesCount = ActiveWorkbook.VBProject.VBComponents("Module1").CodeModule.CountOfLines
'read code from module
code = ActiveWorkbook.VBProject.VBComponents("Module1").CodeModule.Lines(StartLine:=1, Count:=linesCount)
inputStr = Array("name1", "name2", "name2") 'some hardwritten array with string to replace
namesLength = 20 'length of new variables names
For i = LBound(inputStr) To UBound(inputStr)
outputString = randomName(namesLength-1)
code = Replace(code, inputStr(i), outputString)
Next i
Debug.Print code 'view code
End Sub
then we simply substitute old code with new one, but how to identify strings with names of variables ?
Edition
Using **Option Explicit ** decrease safety of my simple method of obfuscation, because to reverse changes you only have to follow Dim statements and replace ugly names with something normal. Except that to make such substitution harder, I think it's good idea to break the line in the middle of variable name :
O0O000O0OO0O0000 _
0O00000O0OO0
the simple method is also replacing some strings with chains based on chr functions chr(104)&chr(101)&chr(108)&chr(108)&chr(111) :
Sub stringIntoChrChain()
strInput = "hello"
strOutput = ""
For i = 1 To Len(strInput)
strOutput = strOutput & "chr(" & Asc(Mid(strInput, i, 1)) & ")&"
Next i
Debug.Print Mid(strOutput, 1, Len(strOutput) - 1)
End Sub
comments like below could make impression on user and make him think that he does not poses right tool to deal with macro etc.:
'(k=Äó¬)w}ż^¦ů‡ÜOyúm=ěËnóÚŽb W™ÄQó’ (—*-ĹTIäb
'R“ąNPÔKZMţ†üÍQ‡
'y6ű˛Š˛ŁŽ¬=iýQ|˛^˙ ‡ńb ¬ĂÇr'ń‡e˘źäžŇ/âéç;1qýěĂj$&E!V?¶ßšÍ´cĆ$Âű׺Ůî’ﲦŔ?TáÄu[nG¦•¸î»éüĽ˙xVPĚ.|
'ÖĚ/łó®Üă9Ę]ż/ĹÍT¶Mµę¶mÍ
'q[—qëýY~Pc©=jÍ8˘‡,Ú+ń8ŐűŻEüńWü1ďëDZ†ć}ęńwŠbŢ,>ó’Űçµ™Š_…qÝăt±+‡ĽČg­řÍ!·eŠP âńđ:ŶOážű?őë®ÁšńýĎáËTbž}|Ö…ăË[®™
You can use a regular expression to find variable assignments by looking for the equals sign. You'll need to add a reference to the Microsoft VBScript Regular Expressions 5.5 and Microsoft Visual Basic for Applications Extensibility 5.3 libraries as I've used early binding.
Please be sure to back up your work and test this before using it. I could have gotten the regex wrong.
UPDATE:
I've refined the regular expressions so that it no longer catches datatypes of strongly typed constants (Const ImAConstant As String = "Oh Noes!" previously returned String). I've also added another regex to return those constants as well. The last version of the regex also mistakenly caught things like .Global = true. That was corrected. The code below should return all variable and constant names for a given code module. The regular expressions still aren't perfect, as you'll note that I was unable to stop false positives on double quotes. Also, my array handling could be done better.
Sub printVars()
Dim linesCount As Long
Dim code As String
Dim vbPrj As VBIDE.VBProject
Dim codeMod As VBIDE.CodeModule
Dim regex As VBScript_RegExp_55.RegExp
Dim m As VBScript_RegExp_55.match
Dim matches As VBScript_RegExp_55.MatchCollection
Dim i As Long
Dim j As Long
Dim isInDatatypes As Boolean
Dim isInVariables As Boolean
Dim datatypes() As String
Dim variables() As String
Set vbPrj = VBE.ActiveVBProject
Set codeMod = vbPrj.VBComponents("Module1").CodeModule
code = codeMod.Lines(1, codeMod.CountOfLines)
Set regex = New RegExp
With regex
.Global = True ' match all instances
.IgnoreCase = True
.MultiLine = True ' "code" var contains multiple lines
.Pattern = "(\sAs\s)([\w]*)(?=\s)" ' get list of datatypes we've used
' match any whole word after the word " As "
Set matches = .Execute(code)
End With
ReDim datatypes(matches.count - 1)
For i = 0 To matches.count - 1
datatypes(i) = matches(i).SubMatches(1) ' return second submatch so we don't get the word " As " in our array
Next i
With regex
.Pattern = "(\s)([^\.\s][\w]*)(?=\s\=)" ' list of variables
' begins with a space; next character is not a period (handles "with" assignments) or space; any alphanumeric character; repeat until... space
Set matches = .Execute(code)
End With
ReDim variables(matches.count - 1)
For i = 0 To matches.count - 1
isInDatatypes = False
isInVariables = False
' check to see if current match is a datatype
For j = LBound(datatypes) To UBound(datatypes)
If matches(i).SubMatches(1) = datatypes(j) Then
isInDatatypes = True
Exit For
End If
'Debug.Print matches(i).SubMatches(1)
Next j
' check to see if we already have this variable
For j = LBound(variables) To i
If matches(i).SubMatches(1) = variables(j) Then
isInVariables = True
Exit For
End If
Next j
' add to variables array
If Not isInDatatypes And Not isInVariables Then
variables(i) = matches(i).SubMatches(1)
End If
Next i
With regex
.Pattern = "(\sConst\s)(.*)(?=\sAs\s)" 'strongly typed constants
' match anything between the words " Const " and " As "
Set matches = .Execute(code)
End With
For i = 0 To matches.count - 1
'add one slot to end of array
j = UBound(variables) + 1
ReDim Preserve variables(j)
variables(j) = matches(i).SubMatches(1) ' again, return the second submatch
Next i
' print variables to immediate window
For i = LBound(variables) To UBound(variables)
If variables(i) <> "" And variables(i) <> Chr(34) Then ' for the life of me I just can't get the regex to not match doublequotes
Debug.Print variables(i)
End If
Next i
End Sub

Excel VBA Type Mismatch (13)

I am getting a type mismatch error in VBA and I am not sure why.
The purpose of this macro is to go through a column in an Excel spreadsheet and add all the emails to an array. After each email is added to the first array, it's also supposed to added to a second array but split into two pieces at the # symbol in order to separate name from domain. Like so: person#gmail.com to person and gmail.com.
The problem that I'm getting is that when it gets to the point where it's supposed to split the email, it throws a Type Mismatch error.
Specifically this part:
strDomain = Split(strText, "#")
Here is the complete code:
Sub addContactListEmails()
Dim strEmailList() As String 'Array of emails
Dim blDimensioned As Boolean 'Is the array dimensioned?
Dim strText As String 'To temporarily hold names
Dim lngPosition As Long 'Counting
Dim strDomainList() As String
Dim strDomain As String
Dim dlDimensioned As Boolean
Dim strEmailDomain As String
Dim i As Integer
Dim countRows As Long
'countRows = Columns("E:E").SpecialCells(xlVisible).Rows.Count
countRows = Range("E:E").CurrentRegion.Rows.Count
MsgBox "The number of rows is " & countRows
'The array has not yet been dimensioned:
blDimensioned = False
Dim counter As Long
Do While counter < countRows
counter = counter + 1
' Set the string to the content of the cell
strText = Cells(counter, 5).Value
If strText <> "" Then
'Has the array been dimensioned?
If blDimensioned = True Then
'Yes, so extend the array one element large than its current upper bound.
'Without the "Preserve" keyword below, the previous elements in our array would be erased with the resizing
ReDim Preserve strEmailList(0 To UBound(strEmailList) + 1) As String
Else
'No, so dimension it and flag it as dimensioned.
ReDim strEmailList(0 To 0) As String
blDimensioned = True
End If
'Add the email to the last element in the array.
strEmailList(UBound(strEmailList)) = strText
'Also add the email to the separation array
strDomain = Split(strText, "#")
If strDomain <> "" Then
If dlDimensioned = True Then
ReDim Preserve strDomainList(0 To UBound(strDomainList) + 1) As String
Else
ReDim strDomainList(0 To 0) As String
dlDimensioned = True
End If
strDomainList(UBound(strDomainList)) = strDomain
End If
End If
Loop
'Display email addresses, TESTING ONLY!
For lngPosition = LBound(strEmailList) To UBound(strEmailList)
MsgBox strEmailList(lngPosition)
Next lngPosition
For i = LBound(strDomainList) To UBound(strDomainList)
MsgBox strDomainList(strDomain)
Next
'Erase array
'Erase strEmailList
End Sub
ReDiming arrays is a big hassle. Welcome to the world of collections and Dictionarys. Collection objects are always accessible. Dictionaries require a reference to Microsoft Scripting Runtime (Tools>References>scroll down to find that text and check the box> OK). They dynamically change size for you, you can add, remove items very easily compared to arrays, and Dictionaries especially allow you to organize your data in more logical ways.
In the below code I used a dictionary there the key is the domain (obtained with the split function). Each value for a key is a collection of email addresses with that domain.
Put a break point on End Sub and look at the contents of each of these objects in your locals window. I think you'll see they make more sense and are easier in general.
Option Explicit
Function AllEmails() As Dictionary
Dim emailListCollection As Collection
Set emailListCollection = New Collection 'you're going to like collections way better than arrays
Dim DomainEmailDictionary As Dictionary
Set DomainEmailDictionary = New Dictionary 'key value pairing. key is the domain. value is a collection of emails in that domain
Dim emailParts() As String
Dim countRows As Long
Dim EmailAddress As String
Dim strDomain As String
'countRows = Columns("E:E").SpecialCells(xlVisible).Rows.Count
Dim sht As Worksheet 'always declare your sheets!
Set sht = Sheets("Sheet1")
countRows = sht.Range("E2").End(xlDown).Row
Dim counter As Long
Do While counter < countRows
counter = counter + 1
EmailAddress = Trim(sht.Cells(counter, 5))
If EmailAddress <> "" Then
emailParts = Split(EmailAddress, "#")
If UBound(emailParts) > 0 Then
strDomain = emailParts(1)
End If
If Not DomainEmailDictionary.Exists(strDomain) Then
'if you have not already encountered this domain
DomainEmailDictionary.Add strDomain, New Collection
End If
'Add the email to the dictionary of emails organized by domain
DomainEmailDictionary(strDomain).Add EmailAddress
'Add the email to the collection of only addresses
emailListCollection.Add EmailAddress
End If
Loop
Set AllEmails = DomainEmailDictionary
End Function
and use it with
Sub RemoveUnwantedEmails()
Dim allemailsDic As Dictionary, doNotCallSheet As Worksheet, emailsSheet As Worksheet
Set doNotCallSheet = Sheets("DoNotCallList")
Set emailsSheet = Sheets("Sheet1")
Set allemailsDic = AllEmails
Dim domain As Variant, EmailAddress As Variant
Dim foundDoNotCallDomains As Range, emailAddressesToRemove As Range
For Each domain In allemailsDic.Keys
Set foundDoNotCallDomains = doNotCallSheet.Range("A:A").Find(domain)
If Not foundDoNotCallDomains Is Nothing Then
Debug.Print "domain found"
'do your removal
For Each EmailAddress In allemailsDic(domain)
Set emailAddressesToRemove = emailsSheet.Range("E:E").Find(EmailAddress)
If Not emailAddressesToRemove Is Nothing Then
emailAddressesToRemove = ""
End If
Next EmailAddress
End If
Next domain
End Sub
strDomain must store array of the split text, therefore,
Dim strDomain As Variant
Afterwards, strDomain should be referenced by index, if operations with certain fragments will be made:
If strDomain(i) <> "" Then
The split function returns an array of strings based on the provided separator.
In your if you are sure that the original string is an email, with just one "#" in it then you can safely use the below code:
strDomain = Split(strText, "#")(1)
This will get you the part after "#" which is what you are looking for.
Split returns an array:
Dim mailComp() As String
[...]
mailComp = Split(strText, "#")
strDomain = mailComp(1)
Try strDomain = Split(strText,"#")(1) to get the right hand side of the split where (0) would be the left. And of course works with more than 2 splits as well. You could dim you string variable as an array strDomain() and then Split(strText,"#") will place all the seperated text into the array.

loading formatted data in VBA from a text file

I'm looking for the best way of loading formatted data in VBA. I’ve spent quite some time trying to find the equivalent of C-like or Fortran-like fscanf type functions, but without success.
Basically I want to read from a text file millions of numbers placed on many (100,000’s) lines with 10 numbers each (except the last line, possibly 1-10 numbers). The numbers are separated by spaces, but I don’t know in advance the width of each field (and this width changes between data blocks).
e.g.
397143.1 396743.1 396343.1 395943.1 395543.1 395143.1 394743.1 394343.1 393943.1 393543.1
-0.11 -0.10 -0.10 -0.10 -0.10 -0.09 -0.09 -0.09 -0.09 -0.09
0.171 0.165 0.164 0.162 0.158 0.154 0.151 0.145 0.157 0.209
Previously I’ve used the Mid function but in this case I can’t, because I don’t know in advance the width of each field. Also it's too many lines to load in an Excel sheet. I can think of a brute force way in which I look at each successive character and determine whether it’s a space or a number, but it seems terribly clumsy.
I’m also interested in pointers on how to write formatted data, but this seems easier -- just format each string and concatenate them using &.
The following snippet will read whitespace-delimited numbers from a text file:
Dim someNumber As Double
Open "YourDataFile.txt" For Input As #1
Do While Not (EOF(1))
Input #1, someNumber
`// do something with someNumber here...`
Loop
Close #1
update: Here is how you could read one line at a time, with a variable number of items on each line:
Dim someNumber As Double
Dim startPosition As Long
Dim endPosition As Long
Dim temp As String
Open "YourDataFile" For Input As #1
Do While Not (EOF(1))
startPosition = Seek(1) '// capture the current file position'
Line Input #1, temp '// read an entire line'
endPosition = Seek(1) '// determine the end-of-line file position'
Seek 1, startPosition '// jump back to the beginning of the line'
'// read numbers from the file until the end of the current line'
Do While Not (EOF(1)) And (Seek(1) < endPosition)
Input #1, someNumber
'// do something with someNumber here...'
Loop
Loop
Close #1
You could also use regular expressions to replace multiple whitespaces to one space and then use the Split function for each line like the example code shows below.
After 65000 rows have been processed a new sheet will be added to the Excel workbook so the source file can be bigger than the max number of rows in Excel.
Dim rx As RegExp
Sub Start()
Dim fso As FileSystemObject
Dim stream As TextStream
Dim originalLine As String
Dim formattedLine As String
Dim rowNr As Long
Dim sht As Worksheet
Dim shtCount As Long
Const maxRows As Long = 65000
Set fso = New FileSystemObject
Set stream = fso.OpenTextFile("c:\data.txt", ForReading)
rowNr = 1
shtCount = 1
Set sht = Worksheets.Add
sht.Name = shtCount
Do While Not stream.AtEndOfStream
originalLine = stream.ReadLine
formattedLine = ReformatLine(originalLine)
If formattedLine <> "" Then
WriteValues formattedLine, rowNr, sht
rowNr = rowNr + 1
If rowNr > maxRows Then
rowNr = 1
shtCount = shtCount + 1
Set sht = Worksheets.Add
sht.Name = shtCount
End If
End If
Loop
End Sub
Function ReformatLine(line As String) As String
Set rx = New RegExp
With rx
.MultiLine = False
.Global = True
.IgnoreCase = True
.Pattern = "[\s]+"
ReformatLine = .Replace(line, " ")
End With
End Function
Function WriteValues(formattedLine As String, rowNr As Long, sht As Worksheet)
Dim colNr As Long
colNr = 1
stringArray = Split(formattedLine, " ")
For Each stringItem In stringArray
sht.Cells(rowNr, colNr) = stringItem
colNr = colNr + 1
Next
End Function