I am trying to split data using VBA within word.
I have got the data using the following method
d = ActiveDocument.Tables(1).Cell(1, 1).Range.Text
This works and gets the correct data. Data for this example is
This
is
a
test
However, when I need to split the string into a list of strings using the delimiter as \n
Here is an example of the desired output
This,is,a,test
I am currently using
Dim dataTesting() As String
dataTesting() = Split(d, vbLf)
Debug.Print dataTesting(0)
However, this returns all the data and not just the first line.
Here is what I have tried within the Split function
\n
\n\r
\r
vbNewLine
vbLf
vbCr
vbCrLf
Word uses vbCr (ANSI 13) to write a "new" paragraph (created when you press ENTER) - represented in the Word UI by ¶ if the display of non-printing characters is activated.
In this case, the table cell content you show would look like this
This¶
is¶
a¶
test¶
The correct way to split an array delimited by a pilcro in Word is:
Dim d as String
d = ActiveDocument.Tables(1).Cell(1, 1).Range.Text
Dim dataTesting() As String
dataTesting() = Split(d, vbCr)
Debug.Print dataTesting(0) 'result is "This"
You can try this (regex splitter from this thread)
Sub fff()
Dim d As String
Dim dataTesting() As String
d = ActiveDocument.Tables(1).Cell(1, 1).Range.Text
dataTesting() = SplitRe(d, "\s+")
Debug.Print "1:" & dataTesting(0)
Debug.Print "2:" & dataTesting(1)
Debug.Print "3:" & dataTesting(2)
Debug.Print "4:" & dataTesting(3)
End Sub
Public Function SplitRe(Text As String, Pattern As String, Optional IgnoreCase As Boolean) As String()
Static re As Object
If re Is Nothing Then
Set re = CreateObject("VBScript.RegExp")
re.Global = True
re.MultiLine = True
End If
re.IgnoreCase = IgnoreCase
re.Pattern = Pattern
SplitRe = Strings.Split(re.Replace(Text, ChrW(-1)), ChrW(-1))
End Function
If this doesn't work, there may be strange unicode/Wprd characters in your Word doc. It may be soft breaks, for instance. You could try to not split with "\W+" in stead of "\s+". I cannot test this without your document.
Dim dataTesting() As String
dataTesting() = Split(d, vbLf)
Debug.Print dataTesting(0)
works fine and thank you very much for your example,
for why it have returned a whole array is because you have used 0 as index, in many programming languages 0 is the whole array, so the first element is ,
so in my case counting from 1 this perfectly split a string that I had troubles with.
To be more exact this is how it was used in my case
Dim dataTesting() As String
dataTesting() = Split(Document.LatheMachineSetup.Heads.Item(1).Comment, vbCrLf)
MsgBox (dataTesting(1))
And that comment is a multiline string.
Image
So this msg box returned exactly first line.
I am using following code to read csv file and finally the data is converted into array.
Function Sample(strPath) As String()
Dim MyData As String, strData() As String
Open strPath For Binary As #1
MyData = Space$(LOF(1))
Get #1, , MyData
Close #1
Sample = Split(MyData, vbCrLf)
End Function
Following code convert each comma separated line into array.
SaleRows = Sample(filepath)
For Each SaleRow In SaleRows
SaleRowArray = Split(SaleRow, ",")
Next SaleRow
Everything works fine but when a cell contains comma, my above function fails. Is there any way to handle comma in a particular cell?
Sample csv
Please observe comma in second row.
When I get array from the comma record, I get below.
It's likely that the strings in your CSV file are escaped with quote marks. If you want to parse the CSV using your own code, then your code needs to be able to ignore commas which occur inside quotes.
A quick way to do this would be to use RegEx. You could try the following:
Private Function changeDelimiter(line as String) as String
Dim regEx As Object
Set regEx = CreateObject("VBScript.regexp")
regEx.IgnoreCase = True
regEx.Global = True
regex.Pattern = ",(?=([^" & Chr(34) & "]*" & Chr(34) & "[^" & Chr(34) & "]*" & Chr(34) & ")*(?![^" & Chr(34) & "]*" & Chr(34) & "))"
changeDelimiter = regex.Replace(line, "#|#")
End Function
Then change the Split() line in your code as follows:
SaleRowArray = Split(changeDelimiter(SaleRow), "#|#")
The function replaces all commas outside quote marks with #|# (which I assume won't ever come up in your input data). You then split the lines into data fields on #|# instead of comma.
I have a PDF file that contains a table. I want to use Excel-VBA to search just the first column for an array of values. I have a work around solution at the moment. I converted the PDF to a text file and search it like that. The problem is sometimes these values can be found in multiple columns, and I have no way of telling which one it's in. I ONLY want it if it's in the first column.
When the PDF converts to text, it converts it in a way such that there is an unpredictable amount of lines for each piece of information, so I can't convert it back to a table in an excel sheet based on the number of lines (believe me, I tried). The current method searches each line, and if it sees a match, it checks to see if the two strings are the same length. But like I mentioned earlier, (in a rare case but it does happen) there will be a match in a column that is NOT the column I want to search in. So, I'm wondering, is there a way to extract a single column from a PDF? Or even the entire table as it stands?
Public Sub checkNPCClist()
Dim lines As String
Dim linesArr() As String
Dim line As Variant
Dim these As String
lines = Sheet2.Range("F104").Value & ", " & Sheet2.Range("F105").Value & ", " & Sheet2.Range("F106").Value & ", " & Sheet2.Range("F107").Value
linesArr() = Split(lines, ",")
For Each line In linesArr()
If line <> " " Then
If matchlinename(CStr(line)) = True Then these = these & Trim(CStr(line)) & ", "
End If
Next line
If these <> "" Then
Sheet2.Range("H104").Value = Left(these, Len(these) - 2)
Else: Sheet2.Range("H104").Value = "Nope, none."
End If
End Sub
Function matchlinename(lookfor As String) As Boolean
Dim filename As String
Dim textdata As String
Dim textrow As String
Dim fileno As Integer
Dim temp As String
fileno = FreeFile
filename = "C:\Users\...filepath"
lookfor = Trim(lookfor)
Open filename For Input As #fileno
Do While Not EOF(fileno)
temp = textrow
Line Input #fileno, textrow
If InStr(1, textrow, lookfor, vbTextCompare) Then
If Len(Trim(textrow)) = Len(lookfor) Then
Close #fileno
matchlinename = True
GoTo endthis
End If
End If
'Debug.Print textdata
Loop
Close #fileno
matchlinename = False
endthis:
End Function
Task:
My goal is to find all numbered lines in procedures of my Code Modules.
The CodeModule.Find method can be used to check for search terms (target parameter).
Syntax:
object.Find(target, startline, startcol, endline, endcol [, wholeword] [, matchcase] [, patternsearch])
The referring help site https://msdn.microsoft.com/en-us/library/aa443952(v=vs.60).aspx states:
parameter patternsearch: Optional. A Boolean value specifying whether or not the target string is a regular expression pattern.
If True, the target string is a regular expression pattern. False is the default.
As explained above the find method allows a regex pattern search, which I would like to use in order to identify numbered lines in a precise way:
digits followed by a tab. The example below therefore defines a search string s and sets the last parameter PatternSearch in the .Find method to True.
Problem
AFAIK a valid regex definition could be
s = "[0-9]{1,4}[ \t]"
but that doesn't show anything, not even an error.
In order to show at least any results, I defined the search term
s = "[0-9]*[ \t]*)"
in the calling example procedure ListNumberedLines showing erratic results.
Question
Is there any possibility to use a valid regex patternsearch in the CodeModule.Find method?
Example code
Option Explicit
' ==============
' Example Search
' ==============
Sub ListNumberedLines()
' Declare search pattern string s
Dim S As String
10 S = "[0-9]*[ \t]*)"
20 Debug.Print "Search Term: " & S
30 Call findWordInModules(S)
End Sub
Public Sub findWordInModules(ByVal sSearchTerm As String)
' Purpose: find modules ('components') with lines containing a search term
' Method: .CodeModule.Find with last parameter patternsearch set to True
' Based on https://www.devhut.net/2016/02/24/vba-find-term-in-vba-modulescode/
' VBComponent requires reference to Microsoft Visual Basic for Applications Extensibility
' or keep it as is and use Late Binding instead
' Declare module variable oComponent
Dim oComponent As Object 'VBComponent
For Each oComponent In Application.VBE.ActiveVBProject.VBComponents
If oComponent.CodeModule.Find(sSearchTerm, 1, 1, -1, -1, False, False, True) = True Then
Debug.Print "Module: " & oComponent.Name 'Name of the current module in which the term was found (at least once)
'Need to execute a recursive listing of where it is found in the module since it could be found more than once
Call listLinesinModuleWhereFound(oComponent, sSearchTerm)
End If
Next oComponent
End Sub
Sub listLinesinModuleWhereFound(ByVal oComponent As Object, ByVal sSearchTerm As String)
' Purpose: list module lines containing a search term
' Method: .CodeModule.Find with last parameter patternsearch set to True
Dim lTotalNoLines As Long 'total number of lines within the module being examined
Dim lLineNo As Long 'will return the line no where the term is found
lLineNo = 1
With oComponent ' Module
lTotalNoLines = .CodeModule.CountOfLines
Do While .CodeModule.Find(sSearchTerm, lLineNo, 1, -1, -1, False, False, True) = True
Debug.Print vbTab & "Zl. " & lLineNo & "|" & _
Trim(.CodeModule.Lines(lLineNo, 1)) 'Remove any padding spaces
lLineNo = lLineNo + 1 'Restart the search at the next line looking for the next occurence
Loop
End With
End Sub
As #MatsMug says, parsing VBA with Regex is hard impossible, but line-numbers are a simpler case, and should be findable with regex alone.
Fortunately, line numbers can only appear within a procedure body (including before the End Sub/Function/Property statement), so we know they'll never be the first line of your code.
Unfortunately, you can prefix a line-label with 0 or more line continuations:
Sub Foo()
_
_
10 Beep
End Sub
Furthermore, a line number isn't always followed by a space - it can be followed by an instruction separator, giving the line-number the appearance of a line-label:
Sub foo()
10: Beep
End Sub
And if you're code is evil, you might encounter a negative line-number (entered by using hex notation - which VBE dutifully pretty prints back to the code-pane with a leading space and a negative number):
Sub foo()
10 Beep
-1 Beep
End Sub
And we also need to be able to identify numbers that appear on a continued line, that aren't line-numbers:
Sub foo()
Debug.Print _
5 & "is not a line-number"
End Sub
So, here's some evil line-numbering, with a mix of all of those edge-cases:
Option Explicit
Sub foo()
5: Beep
_
_
_
10 Beep
20 _
'Debug.Print _
30
50: Beep
40 Beep
_
-1 _
Beep 'The "-1" line number is achieved by entering "&HFFFFFFFF"
Debug.Print _
2 & "is not a line-number"
60 End Sub
And here's some regex that identifies the line-numbers:
(?<! _)\n( _\n)* ?(?<line_number>(?:\-)?\d+)[: ]
And here's a syntax highlight from regex101:
For the longest time, Rubberduck was struggling with properly/formally parsing line numbers - our work-around was to remove them (replacing them with spaces) before feeding the code module contents to our parser.
Recently we've managed to formally define line numbers:
// lineNumberLabel should actually be "statement-label" according to MS VBAL but they only allow lineNumberLabels:
// A <statement-label> that occurs as the first element of a <list-or-label> element has the effect
// as if the <statement-label> was replaced with a <goto-statement> containing the same
// <statement-label>. This <goto-statement> takes the place of <line-number-label> in
// <statement-list>.
listOrLabel :
lineNumberLabel (whiteSpace? COLON whiteSpace? sameLineStatement?)*
| (COLON whiteSpace?)? sameLineStatement (whiteSpace? COLON whiteSpace? sameLineStatement?)*
;
sameLineStatement : blockStmt;
And lineNumberLabel is defined as:
//Statement labels can only appear at the start of a line.
statementLabelDefinition : {_input.La(-1) == NEWLINE}? (combinedLabels | identifierStatementLabel | standaloneLineNumberLabel);
identifierStatementLabel : unrestrictedIdentifier whiteSpace? COLON;
standaloneLineNumberLabel :
lineNumberLabel whiteSpace? COLON
| lineNumberLabel;
combinedLabels : lineNumberLabel whiteSpace identifierStatementLabel;
lineNumberLabel : numberLiteral;
(full Antlr4 grammar here)
Notice the predicate {_input.La(-1) == NEWLINE}?, which force the parser rule to only match a statementLabelDefinition at the start of a line - a logical line of code.
You see VBA code has physical code lines, like what you're getting from the CodeModule's contents. But VBA code also has a concept of logical code lines, and it turns out that is all the parser cares about.
This would trip any typical regex:
Sub DoSomething()
Debug.Print _
42
End Sub
There's only 1 logical line of code between the signature and the End Sub token, but a simple Find will happily consider that 42 as a "line number" ...which it isn't - it's the argument passed to Debug.Print, in the same instruction, on the same logical code line, but on the next physical code line.
And you can't be dealing with logical code lines without first pre-processing your input, to take line continuation tokens into account.
And in order to do that, you need to actually parse the instructions you're seeing - at least know where they start and where they end... and that's no small undertaking! see ThunderFrame's answer
The VBIDE API is extremely limited, and won't be helpful for that.
TL;DR: You can't parse VBA code with regular expressions alone. So, nope. Sorry! you need a much more complex regex pattern than that - see ThunderFrame's answer.
Conclusion regarding CodeModule.Find via search pattern
Firstly, CodeModule.Find doesn't help via search pattern and its possible use is intransparent.
I agree that the VBIDE API is extremely limited and that there exist excellent professional tools which I highly recommand for any programmer :-)
Consequence: Work around via XML
Secondly I prefer household remedies if possible, so I tried to find an alternative solution using only the helpful parts of VBIDE.
Method
That is why I tried a simple xml conversation of the CodeModule.Lines allowing a flexible search within logical lines.
Instead of using regular expressions in requesting the xml data, I demonstrate a method to find leading numbers via a well defined XPath search (loop thru node list),
thus resolving most problems shown by #ThunderFrame. The search string in function showErls is defined as "line[substring(translate(.,'0123456789','¹¹¹¹¹¹¹¹¹¹'),1,1)="¹"]"
Furthermore function 'lineNumber' returns the logical line number within the module.
Note: To keep it simple, the search is restrained to one module only (user defined constant MYMODULE) and code avoids any regex.
Work around code - main sub
Option Explicit
' ==========================================
' User defined name of module to be analyzed
' ==========================================
Const MYMODULE = "modThunderFrame" ' << change to existing module name or userform
' Declare xml file as object
Dim xCMods As Object ' Late Binding; instead of Early Bd: Dim xCMods As MSXML2.DOMDocument6
Public Sub TestLineNumbers()
' =================
' A. Load/refresh code into xml
' =================
' set xml into memory - contains code module(s) lines
Set xCMods = CreateObject("MSXML2.Domdocument.6.0") ' L.Bd.; instead of E.Bd: Set xCMods = New MSXML2.DOMDocument60
xCMods.async = False
xCMods.validateOnParse = False
' read in user defined code module and load xml, if failed show error message
refreshCM MYMODULE
If xCMods Is Nothing Then Exit Sub
' ======================
' B. search line numbers
' ======================
showERLs
' =============================
' C. Save xml if needed
' =============================
' xCMods.Save ThisWorkbook.Path & "\VBE(" & MYMODULE & ").xml"
' MsgBox "Successfully exported Excel data to " & ThisWorkbook.Path & "\VBE(" & MYMODULE & ").XML!", _
' vbInformation, "Module " & MYMODULE & " to xml"
' =================
' D. terminate xml
' =================
Set xCMods = Nothing
End Sub
Sub procedures
Private Sub showERLs()
' Purpose: [B.] declare XPath search string and define special translate character
Dim s As String
Dim S1 As String: S1 = Chr(185) ' superior number 1 (hex B9) replaces any digit
' declare node and node list
Dim line As Object
Dim lines As Object
' define XPath search string for first digit in line (usual case)
s = "line[substring(translate(.,'0123456789','" & String(10, S1) & "'),1,1)=""" & _
S1 & _
"""]"
' start debugging
Debug.Print "**search string=""" & s & """" & vbNewLine & String(50, "-")
Debug.Print "Line #|Line Content" & vbNewLine & String(50, "-"); ""
' set node list
Set lines = xCMods.DocumentElement.SelectNodes(s)
' -------------------
' loop thru node list
' -------------------
For Each line In lines
Debug.Print Format(lineNumber(line), "00000") & "|" & line.Text ' return logical line number plus line content
Next line
End Sub
Private Sub refreshCM(sModName As String)
' Purpose: [A.] load xml string via LoadXML method
Dim sErrTxt As String
Dim line As Object
Dim lines As Object
Dim xpe As Object
Dim s As String ' xpath expression
Dim pos As Integer ' position of line number prefix
' ======================================
' 1. Read code module lines and load xml
' ======================================
If Not xCMods.LoadXML(readCM(sModName)) Then
' set ParseError object
Set xpe = xCMods.parseError
With xpe
sErrTxt = sErrTxt & vbNewLine & String(20, "-") & vbNewLine & _
"Loading Error No " & .ErrorCode & " of xml file " & vbCrLf & _
Replace(" " & Replace(.URL, "file:///", "") & " ", " ", "[No file found]") & vbCrLf & vbCrLf & _
xpe.reason & vbCrLf & _
"Source Text: " & .srcText & vbCrLf & _
"char?: " & """" & Mid(.srcText, .linepos, 1) & """" & vbCrLf & vbCrLf & _
"Line no: " & .line & vbCrLf & _
"Line pos: " & .linepos & vbCrLf & _
"File pos.: " & .filepos & vbCrLf & vbCrLf
End With
MsgBox sErrTxt, vbExclamation, "XML Loading Error"
Set xCMods = Nothing
Exit Sub
End If
' 2. resolve hex input problem of negative line numbers with leading space (thx #Thunderframe)
s = "line"
Set lines = xCMods.DocumentElement.SelectNodes(s)
' loop thru all logical lines
For Each line In lines
pos = ErlPosInLine(line.Text)
If pos <= Len(line.Text) Then
' to do: add attribute to line node, if wanted
' correct line content
line.Text = Mid(line.Text, pos)
End If
Next
End Sub
Private Function lineNumber(node As Object) As Long
' Purpose: [B.] return logical line number within code module lines
' Param.: IXMLDomNode
' Method: XPath via preceding-sibling count plus one
Dim tag As String: tag = "line"
lineNumber = node.SelectNodes("preceding-sibling::" & tag).Length + 1
End Function
Private Function readCM(Optional modName = "*") As String
' Purpose: return code module line string (VBIDE) of a user defined module to be read into xml
' Call: called from [A.] refreshCM
' xCMods.LoadXML(readCM(sModName))
' Declare variable
Dim s As String
Dim md As CodeModule
If modName = "*" Then Exit Function
On Error GoTo OOPS
' get code module lines into string
Set md = Application.VBE.ActiveVBProject.VBComponents(modName).CodeModule ' MSAccess: Modules("modVBELines")
' change to xml tags
s = getTags(md.lines(1, md.CountOfLines))
' return
readCM = s
OOPS:
End Function
Private Function getTags(ByVal s As String, Optional mode = False) As String
' Purpose: prepares xml string to be loaded
' define constant
Const HEAD = "<?xml version=""1.0"" encoding=""utf-8""?>" & vbCrLf & "<cm>" & vbCrLf
' 1. change tag characters
s = Replace(Replace(s, "<", "<"), ">", ">")
' 2. change special characters (ampersand)
s = Replace(s, "&", "&")
' 3. change "_" points
s = Replace(s, "_" & vbCrLf, Chr(133) & vbLf)
' 4. define logical line entities
If Right(s, 2) = vbCrLf Then s = Left(s, Len(s) - 2)
s = HEAD & " <line>" & Replace(s, vbCrLf, "</line>" & vbCrLf & " <line>") & "</line>" & vbCrLf & "</cm>"
' debug xml tags if second function parameter is true (mode = True)
If mode Then Debug.Print s
' return
getTags = s
End Function
Sub testErlPosInLine()
' Purpose: Test Thunderframe's problem with ERL prefixes (underscores, " ",..) and hex inputs
Dim s As String
s = " _" & vbLf & " -1 xx"
MsgBox "|" & Mid(s, ErlPosInLine(s)) & "|" & vbNewLine & _
"prefix = |" & Mid(s, 1, ErlPosInLine(s) - 1) & "|"
End Sub
Private Function ErlPosInLine(ByVal s As String) As Integer
' Purpose: remove prefix (underscore, tab, " ",.. ) from numbered line
' cf: http://stackoverflow.com/questions/42716936/vba-to-remove-numbers-from-start-of-string-cell
Dim i As Long
For i = 1 To Len(s) ' loop each char
Select Case Mid$(s, i, 1) ' examine current char
Case " " ' permitted chars
Case "_"
Case vbLf, Chr(133), Chr(34)
Case "0" To "9": Exit For ' cut off point
Case Else: Exit For ' i is the cut off point
End Select
Next
If Mid$(s, i, 1) = "-" And Len(s) > 1 Then
If IsNumeric(Mid$(s, i + 1, 1)) Then i = i + 1
End If
' return
ErlPosInLine = i
' debug.print Mid$(s, i) '//strip lead
End Function
I'm rooky for VBA. I have some problem about reversing my data on VBA-Excel. My data is "3>8 , 6>15 , 26>41 (each data on difference cells)" that i could reverse "3>8" to "8>3" follow my requirement by using function reverse. But i couldn't reverse "6>15" and "26>41" to "15>6" and "41>26". It will be "51>6" and "14>62" that failure, I want to be "15>6" and "41>26".
Reverse = StrReverse(Trim(str))
Help me for solve my issue please and thank for comment.
You first need to find the position of the ">" in the cell. you do this by taking the contents of the cell and treating it as a String and finding the ">"
This is done in the line beginning arrowPosition. This is the integer value of the position of the ">" in you original string
Next use Left to extract the text up to the ">" and Right to extract the text after the ">"
Then build a new String of rightstr & ">" & leftStr.
Note I input my data from Sheet1 B5 but you can just use any source as long as it is a String in the correct format.
Sub Test()
Dim myString As String
myString = Sheets("Sheet1").Range("B5")
Debug.Print myString
Debug.Print reverseString(myString)
End Sub
Function reverseString(inputString As String) As String
Dim leftStr As String
Dim rightStr As String
Dim arrowPosition As Integer
arrowPosition = InStr(1, inputString, ">")
leftStr = Left(inputString, arrowPosition - 1)
rightStr = Right(inputString, Len(inputString) - arrowPosition)
reverseString = rightStr & ">" & leftStr
End Function
just because you look for a VBA, you can add this function into your code:
Function rev(t As String) As String
s = Split(t, ">", 2)
rev = s(1) & ">" & s(0)
End Function
of course only if you have to reverse 2 number, otherwise you'll loop the "s", but the function would lose its usefulness