Extract PDF table and insert into Excel - vba

I have a PDF file that contains a table. I want to use Excel-VBA to search just the first column for an array of values. I have a work around solution at the moment. I converted the PDF to a text file and search it like that. The problem is sometimes these values can be found in multiple columns, and I have no way of telling which one it's in. I ONLY want it if it's in the first column.
When the PDF converts to text, it converts it in a way such that there is an unpredictable amount of lines for each piece of information, so I can't convert it back to a table in an excel sheet based on the number of lines (believe me, I tried). The current method searches each line, and if it sees a match, it checks to see if the two strings are the same length. But like I mentioned earlier, (in a rare case but it does happen) there will be a match in a column that is NOT the column I want to search in. So, I'm wondering, is there a way to extract a single column from a PDF? Or even the entire table as it stands?
Public Sub checkNPCClist()
Dim lines As String
Dim linesArr() As String
Dim line As Variant
Dim these As String
lines = Sheet2.Range("F104").Value & ", " & Sheet2.Range("F105").Value & ", " & Sheet2.Range("F106").Value & ", " & Sheet2.Range("F107").Value
linesArr() = Split(lines, ",")
For Each line In linesArr()
If line <> " " Then
If matchlinename(CStr(line)) = True Then these = these & Trim(CStr(line)) & ", "
End If
Next line
If these <> "" Then
Sheet2.Range("H104").Value = Left(these, Len(these) - 2)
Else: Sheet2.Range("H104").Value = "Nope, none."
End If
End Sub
Function matchlinename(lookfor As String) As Boolean
Dim filename As String
Dim textdata As String
Dim textrow As String
Dim fileno As Integer
Dim temp As String
fileno = FreeFile
filename = "C:\Users\...filepath"
lookfor = Trim(lookfor)
Open filename For Input As #fileno
Do While Not EOF(fileno)
temp = textrow
Line Input #fileno, textrow
If InStr(1, textrow, lookfor, vbTextCompare) Then
If Len(Trim(textrow)) = Len(lookfor) Then
Close #fileno
matchlinename = True
GoTo endthis
End If
End If
'Debug.Print textdata
Loop
Close #fileno
matchlinename = False
endthis:
End Function

Related

How can I format value between 2nd and 4th underscore in the file name?

I have VBA code to capture filenames to a table in an MS Access Database.
The values look like this:
FileName
----------------------------------------------------
WC1603992365_Michael_Cert_03-19-2019_858680723.csv
WC1603992365_John_Non-Cert_03-19-2019_858680722.csv
WC1703611403_Paul_Cert_03-27-2019_858679288.csv
Each filename has 4 _ underscores and the length of the filename varies.
I want to capture the value between the 2nd and the 3rd underscore, e.g.:
Cert
Non-Cert
Cert
I have another file downloading program, and it has "renaming" feature with a regular expression. And I set up the following:
Source file Name: (.*)\_(.*)\_(.*)\_(.*)\_\-(.*)\.(.*)
New File Name: \5.\6
In this example, I move the 5th section of the file name to the front, and add the file extension.
For example, WC1603992365_Michael_Cert_03-19-2019_858680723.csv would be saved as 858680723.csv in the folder.
Is there a way that I can use RegEx to capture 3rd section of the file name, and save the value in a field?
I tried VBA code, and searched SQL examples, but I did not find any.
Because the file name length is not fixed, I cannot use LEFT or RIGHT...
Thank you in advance.
One possible solution is to use the VBA Split function to split the string into an array of strings using the underscore as a delimiter, and then return the item at index 2 in this array.
For example, you could define a VBA function such as the following, residing in a public module:
Function StringElement(strStr, intIdx As Integer) As String
Dim strArr() As String
strArr = Split(Nz(strStr, ""), "_")
If intIdx <= UBound(strArr) Then StringElement = strArr(intIdx)
End Function
Here, I've defined the argument strStr as a Variant so that you may pass it Null values without error.
If supplied with a Null value or if the supplied index exceeds the bounds of the array returned by splitting the string using an underscore, the function will return an empty string.
You can then call the above function from a SQL statement:
select StringElement(t.Filename, 2) from Filenames t
Here I have assumed that your table is called Filenames - change this to suit.
This is the working code that I completed. Thank you for sharing your answers.
Public Function getSourceFiles()
Dim rs As Recordset
Dim strFile As String
Dim strPath As String
Dim newFileName As String
Dim FirstFileName As String
Dim newPathFileName As String
Dim RecSeq1 As Integer
Dim RecSeq2 As Integer
Dim FileName2 As String
Dim WrdArrat() As String
RecSeq1 = 0
Set rs = CurrentDb.OpenRecordset("tcsvFileNames", dbOpenDynaset) 'open a recordset
strPath = "c:\in\RegEx\"
strFile = Dir(strPath, vbNormal)
Do 'Loop through the balance of files
RecSeq1 = RecSeq1 + 1
If strFile = "" Then 'If no file, exit function
GoTo ExitHere
End If
FirstFileName = strPath & strFile
newFileName = strFile
newPathFileName = strPath & newFileName
FileName2 = strFile
Dim SubStrings() As String
SubStrings = Split(FileName2, "_")
Debug.Print SubStrings(2)
rs.AddNew
rs!FileName = strFile
rs!FileName68 = newFileName 'assign new files name max 68 characters
rs!Decision = SubStrings(2) 'extract the value after the 3rd underscore, and add it to Decision Field
rs.Update
Name FirstFileName As newPathFileName
strFile = Dir()
Loop
ExitHere:
Set rs = Nothing
MsgBox ("Directory list is complete.")
End Function

VBA Split data by new line word

I am trying to split data using VBA within word.
I have got the data using the following method
d = ActiveDocument.Tables(1).Cell(1, 1).Range.Text
This works and gets the correct data. Data for this example is
This
is
a
test
However, when I need to split the string into a list of strings using the delimiter as \n
Here is an example of the desired output
This,is,a,test
I am currently using
Dim dataTesting() As String
dataTesting() = Split(d, vbLf)
Debug.Print dataTesting(0)
However, this returns all the data and not just the first line.
Here is what I have tried within the Split function
\n
\n\r
\r
vbNewLine
vbLf
vbCr
vbCrLf
Word uses vbCr (ANSI 13) to write a "new" paragraph (created when you press ENTER) - represented in the Word UI by ¶ if the display of non-printing characters is activated.
In this case, the table cell content you show would look like this
This¶
is¶
a¶
test¶
The correct way to split an array delimited by a pilcro in Word is:
Dim d as String
d = ActiveDocument.Tables(1).Cell(1, 1).Range.Text
Dim dataTesting() As String
dataTesting() = Split(d, vbCr)
Debug.Print dataTesting(0) 'result is "This"
You can try this (regex splitter from this thread)
Sub fff()
Dim d As String
Dim dataTesting() As String
d = ActiveDocument.Tables(1).Cell(1, 1).Range.Text
dataTesting() = SplitRe(d, "\s+")
Debug.Print "1:" & dataTesting(0)
Debug.Print "2:" & dataTesting(1)
Debug.Print "3:" & dataTesting(2)
Debug.Print "4:" & dataTesting(3)
End Sub
Public Function SplitRe(Text As String, Pattern As String, Optional IgnoreCase As Boolean) As String()
Static re As Object
If re Is Nothing Then
Set re = CreateObject("VBScript.RegExp")
re.Global = True
re.MultiLine = True
End If
re.IgnoreCase = IgnoreCase
re.Pattern = Pattern
SplitRe = Strings.Split(re.Replace(Text, ChrW(-1)), ChrW(-1))
End Function
If this doesn't work, there may be strange unicode/Wprd characters in your Word doc. It may be soft breaks, for instance. You could try to not split with "\W+" in stead of "\s+". I cannot test this without your document.
Dim dataTesting() As String
dataTesting() = Split(d, vbLf)
Debug.Print dataTesting(0)
works fine and thank you very much for your example,
for why it have returned a whole array is because you have used 0 as index, in many programming languages 0 is the whole array, so the first element is ,
so in my case counting from 1 this perfectly split a string that I had troubles with.
To be more exact this is how it was used in my case
Dim dataTesting() As String
dataTesting() = Split(Document.LatheMachineSetup.Heads.Item(1).Comment, vbCrLf)
MsgBox (dataTesting(1))
And that comment is a multiline string.
Image
So this msg box returned exactly first line.

Writing Fixed width text files from excel vba

This is the output of a program.
I have specified what shall be width of each cell in the program and my program shows correct output.
What I want to do is cell content shall be written from right to left. E.g highlighted figure 9983.54 has width of 21. Text file has used first 7 columns. But I want it to use last 7 columns of text file.
Please see expected output image.
I am not getting any clue how to do this. I am not a very professional programmer but I love coding. This text file is used as input to some other program and i am trying to automate writing text file from excel VBA.
Can anyone suggest a way to get this output format?
Here is the code which gave me first output
Option Explicit
Sub CreateFixedWidthFile(strFile As String, ws As Worksheet, s() As Integer)
Dim i As Long, j As Long
Dim strLine As String, strCell As String
'get a freefile
Dim fNum As Long
fNum = FreeFile
'open the textfile
Open strFile For Output As fNum
'loop from first to last row
'use 2 rather than 1 to ignore header row
For i = 1 To ws.Range("a65536").End(xlUp).Row
'new line
strLine = ""
'loop through each field
For j = 0 To UBound(s)
'make sure we only take chars up to length of field (may want to output some sort of error if it is longer than field)
strCell = Left$(ws.Cells(i, j + 1).Value, s(j))
'add on string of spaces with length equal to the difference in length between field length and value length
strLine = strLine & strCell & String$(s(j) - Len(strCell), Chr$(32))
Next j
'write the line to the file
Print #fNum, strLine
Next i
'close the file
Close #fNum
End Sub
'for example the code could be called using:
Sub CreateFile()
Dim sPath As String
sPath = Application.GetSaveAsFilename("", "Text Files,*.txt")
If LCase$(sPath) = "false" Then Exit Sub
'specify the widths of our fields
'the number of columns is the number specified in the line below +1
Dim s(6) As Integer
'starting at 0 specify the width of each column
s(0) = 21
s(1) = 9
s(2) = 15
s(3) = 11
s(4) = 12
s(5) = 10
s(6) = 186
'for example to use 3 columns with field of length 5, 10 and 15 you would use:
'dim s(2) as Integer
's(0)=5
's(1)=10
's(2)=15
'write to file the data from the activesheet
CreateFixedWidthFile sPath, ActiveSheet, s
End Sub
Something like this should work:
x = 9983.54
a = Space(21-Len(CStr(x))) & CStr(x)
Then a will be 14 spaces followed by x:
a = " 9983.54"
Here 21 is the desired column width --- change as necessary. CStr may be unnecessary for non-numeric x.
If you're going to right-justify a lot of different data to different width fields you could write a general purpose function:
Function LeftJust(val As String, width As Integer) As String
LeftJust = Space(width - Len(val)) & val
End Function
The you call it with LeftJust(CStr(9983.54), 21).
Also note that VBA's Print # statement has a Spc(n) parameter that you can use to produce fixed-width output, e.g., Print #fNum, Spc(n); a; before this statement you calculate n: n = 21-Len(CStr(a)).
Hope that helps

Connecting to Access from Excel, then create table from txt file

I am writing VBA code for an Excel workbook. I would like to be able to open a connection with an Access database, and then import a txt file (pipe delimited) and create a new table in the database from this txt file. I have searched everywhere but to no avail. I have only been able to find VBA code that will accomplish this from within Access itself, rather than from Excel. Please help! Thank you
Google "Open access database from excel VBA" and you'll find lots of resources. Here's the general idea though:
Dim db As Access.Application
Public Sub OpenDB()
Set db = New Access.Application
db.OpenCurrentDatabase "C:\My Documents\db2.mdb"
db.Application.Visible = True
End Sub
You can also use a data access technology like ODBC or ADODB. I'd look into those if you're planning more extensive functionality. Good luck!
I had to do this exact same problem. You have a large problem presented in a small question here, but here is my solution to the hardest hurdle. You first parse each line of the text file into an array:
Function ParseLineEntry(LineEntry As String) As Variant
'Take a text file string and parse it into individual elements in an array.
Dim NumFields As Integer, LastFieldStart As Integer
Dim LineFieldArray() As Variant
Dim i As Long, j As Long
'Determine how many delimitations there are. My data always had the format
'data1|data2|data3|...|dataN|, so there was always at least one field.
NumFields = 0
For I = 1 To Len(LineEntry)
If Mid(LineEntry, i, 1) = "|" Then NumFields = NumFields + 1
Next i
ReDim LineFieldArray(1 To NumFields)
'Parse out each element from the string and assign it into the appropriate array value
LastFieldStart = 1
For i = 1 to NumFields
For j = LastFieldStart To Len(LineEntry)
If Mid(LineEntry, j , 1) = "|" Then
LineFieldArray(i) = Mid(LineEntry, LastFieldStart, j - LastFieldStart)
LastFieldStart = j + 1
Exit For
End If
Next j
Next i
ParseLineEntry = LineFieldArray
End Function
You then use another routine to add the connection in (I am using ADODB). My format for entries was TableName|Field1Value|Field2Value|...|FieldNValue|:
Dim InsertDataCommand as String
'LineArray = array populated by ParseLineEntry
InsertDataCommand = "INSERT INTO " & LineArray(1) & " VALUES ("
For i = 2 To UBound(LineArray)
If i = UBound(LineArray) Then
InsertDataCommand = InsertDataCommand & "'" & LineArray(i) & "'" & ")"
Else
InsertDataCommand = InsertDataCommand & LineArray(i) & ", "
End If
Next i
Just keep in mind that you will have to build some case handling into this. For example, if you have an empty value (e.g. Val1|Val2||Val4) and it is a string, you can enter "" which will already be in the ParseLineEntry array. However, if you are entering this into a number column it will fail on you, you have to insert "Null" instead inside the string. Also, if you are adding any strings with an apostrophe, you will have to change it to a ''. In sum, I had to go through my lines character by character to find these issues, but the concept is demonstrated.
I built the table programmatically too using the same parsing function, but of this .csv format: TableName|Field1Name|Field1Type|Field1Size|...|.
Again, this is a big problem you are tackling, but I hope this answer helps you with the less straight forward parts.

modifying/ getting rid of characters in text from txt file ussing vb.net

I have a string of text i captured within AutoCAD (0.000000, 0.000000, 0.000000) wich is saved to a text based file named position.txt.
as you probably have gatherd with a file name such as position.txt the text could be composed of any random number combination eg: (5.745379, 0.846290, 150.6459046).
However for it to be of any use to me I need the captured string to exist without spaces or brackets how can i achiev this in VB.net?
Use String.Replace. Its probably not the most efficient way but it will get the job done.
Dim file as String = My.Computer.FileSystem.ReadAllText("position.txt")
Dim output as String = file.Replace(" ", "") _
.Replace("(", "") _
.Replace(")", "")
My.Computer.FileSystem.WriteAllText("output.txt", output, false)
as above
s = "(5.745379, 0.846290, 150.6459046)"
s = s.replace("(","")
s = s.replace(")","")
and then
dim answer() as string = s.split(",")
dim number as double
For each a as string in answer
if double.tryparse(a,n) then
console.writeline(n.tostring & " is a number")
else
console.writeline(n.tostring & " is rubbish")
next