The method I used was for text files and gives gibberish as expected.
In: John Smith
Out: PK! ... [Content_Types].xml ... (followed by several lines of unreadable binary characters)
I'm a novice at VBA, and I'm trying to read a document line by line so that I can eventually have the macro automatically remove entire lines based on their content.
Sub ayaya()
Dim TextLine As String
Open ActiveDocument.Path & "\Doc1.docm" For Input As #1
Do While Not EOF(1) ' Loop until end of file.
Line Input #1, TextLine ' Read line into variable.
Debug.Print TextLine
Loop
Close #1
End Sub
Part of me hoped that it would give "John Smith". I've seen some solutions put the entire document into a text file. Is there any way where I can delimit the data somehow? I'd like to be able to isolate a single line and remove it.
You are trying to read a docx or docm file, which is a zip archive. Word files are not plain text files, so you won't get anything meaningful treating them as such. You need to open the file with Word or another app that can read such files.
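A minimal sketch of what "open the file with Word" could look like, assuming the macro runs inside Word with the document already open. The procedure name and the search text "John Smith" (taken from the question's example) are illustrative only. It treats each paragraph as a "line" and deletes the ones that contain the search text:

```vba
Sub DeleteMatchingParagraphs()
    ' Hypothetical example: remove every paragraph containing the search text.
    Const SEARCH_TEXT As String = "John Smith"
    Dim i As Long

    ' Walk backwards so that deleting a paragraph does not shift
    ' the index of paragraphs that have not been checked yet.
    For i = ActiveDocument.Paragraphs.Count To 1 Step -1
        If InStr(1, ActiveDocument.Paragraphs(i).Range.Text, SEARCH_TEXT, vbTextCompare) > 0 Then
            ActiveDocument.Paragraphs(i).Range.Delete
        End If
    Next i
End Sub
```

Note that Word keeps a final paragraph mark in every document, so if the last paragraph matches, its text is removed but an empty paragraph may remain.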
This is not a duplicate, since I want a solution that does not consist of reformatting the file to txt.
My intention is to open a csv file using semicolon as the delimiter. For that purpose I have used the following code:
Sub prueba2()
Dim sfile As String
Dim wb As Workbook
Dim Path As String
Dim Namefile As String
Path = "V:\evfilesce9i9\apps9\vbe9\dep4\KFTP\KFTP001D_FicherosCeca"
Namefile = "\QryCECARFSECTORIAL0239*.txt"
Set wb = Workbooks.Open(Filename:=Path & Namefile, Delimiter:=";")
End Sub
When I try it, the file is opened using commas as the delimiter instead of the one I specified (semicolon).
I have read in other questions that this is normal in post-2006 Excel versions, and that the fastest solution is to reformat the file to txt.
This does not fit my needs because I have to do it without changing the format. I can't find any solution.
Could someone help me?
Please see the MS documentation here.
I think you want to use the Format parameter, and not the delimiter parameter.
Try:
Set wb = Workbooks.Open(Filename:=Path & Namefile, Format:=4)
It seems like the Delimiter argument is only used if Format is set to 6, which signifies a custom delimiter character. Semi-colon is a standard delimiter.
Edit:
Hmm... so, this seems to be something that's been tricky in Excel/VBA for a while.
After some more research, the "Format" option may only be used when opening .txt files. Which is why the "reformat file to .txt" is one possible solution.
There are some things that can be done, however.
Excel will handle opening a semicolon delimited file well if the first line of the file is:
sep=;
I know you said you could not reformat the files, but is that something that you can do?
If not, the next things I would suggest would be to either: 1) use the Open Statement to open your file and then write it to a temporary file (perhaps as a .txt), to be reopened with the original Workbooks.Open(Format:=4), or 2) write your own text importer. A sample text importer can be found in this stackoverflow page.
Sub ImportCSVFile(filepath As String)
Dim line As String
Dim arrayOfElements
Dim linenumber As Long ' Long avoids an overflow on files with more than 32,767 lines
Dim elementnumber As Long
Dim element As Variant
linenumber = 0
elementnumber = 0
Open filepath For Input As #1 ' Open file for input
Do While Not EOF(1) ' Loop until end of file
linenumber = linenumber + 1
Line Input #1, line
arrayOfElements = Split(line, ";")
elementnumber = 0
For Each element In arrayOfElements
elementnumber = elementnumber + 1
Cells(linenumber, elementnumber).Value = element
Next
Loop
Close #1 ' Close file.
End Sub
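Option 1 above (copy the file to a temporary .txt and reopen it with Format:=4) could be sketched roughly as follows. Both paths and the procedure name are placeholders, and whether Format:=4 is honoured still depends on the Excel version, as discussed above:

```vba
Sub OpenSemicolonFileViaTempCopy()
    Dim sourcePath As String
    Dim tempPath As String
    Dim wb As Workbook

    sourcePath = "C:\data\input.csv"                      ' hypothetical source file
    tempPath = Environ$("TEMP") & "\input_copy.txt"       ' hypothetical temporary copy

    ' The original file is left untouched; only the copy gets the .txt extension.
    FileCopy sourcePath, tempPath

    ' Format:=4 asks Excel to treat semicolons as the delimiter.
    Set wb = Workbooks.Open(Filename:=tempPath, Format:=4)
End Sub
```

The temporary copy can be deleted with Kill once the workbook has been closed.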
You can open PDFs in text editors to see the structure of how the PDF is written.
Using VBA, I open a PDF as a text file, extract the text, and save it in a string variable. I want to search this text for a specific element, a polyline (called sTreamTrain), and get the vertices of the polyline using the InStr function.
When I add more vertices to the polyline, I can no longer extract the text string of the PDF. I get 'Run time error 62', and I do not understand what it means or what has changed about the PDF to cause it.
Attached (via the link) are a PDF that I can read (Document 15) and a PDF that I cannot read (Document 16). I have checked in Excel to see that the vertices are present in both files. There is also a copy of my VBA script as a Notepad document, and my Excel file (the script is hard to find in the Excel file: it is in "Module 6", in the procedure called "CoordExtractor_TestBuild01()").
Link:
https://drive.google.com/open?id=1zOhwnFWZZfy9bTAxKiQFSl7qiQLlYIJV
Code snippet of the text extraction process below to reproduce the problem (given an applicable pdf is used):
Sub CoordExtractor_TestBuild01()
'Opening the PDF and getting the coordinates
Dim TextFile As Integer
Dim FilePath As String
Dim FileContent As String
'File Path of Text File
FilePath = "C:\Users\KAllan\Documents\WorkingInformation\sTreamTrain\Document16 - Original.pdf"
'Determine the next file number available for use by the FileOpen function
TextFile = FreeFile
'Open the text file in a Read State
Open FilePath For Input As TextFile
'Store file content inside a variable
Dim Temp As Long
Temp = LOF(TextFile)
FileContent = Input(LOF(TextFile), TextFile)
'Close Text File
Close TextFile
End Sub
I would like someone to explain what runtime error 62 means in this context and to propose a workflow for getting around it in future. I would also like to know whether there are certain characters that cannot be stored in strings; perhaps such characters are included once I increase the number of vertices past a certain number.
I would prefer to keep the scripts simple and avoid external libraries, because I want to share the script when it is done and it is simpler for others to use if it works without extra dependencies. That said, any and all advice is welcome, since this is only the first half of the project.
Thank you very much.
Run-time error 62 is "Input past end of file". According to the MSDN documentation, this error is caused by the file containing
...blank spaces or extra returns at the end of the file or the syntax
is not correct.
Since your code works sometimes on documents with very similar names and content to documents where it doesn't work, we can rule out syntax errors in this case.
You can clean up the file contents before processing it any further by replacing the code at the top of your macro with the one below. With this I can read and extract information from your Document16.pdf:
Sub CoordExtractor_TestBuild01()
'Purpose to link together the extracting real PDF information and outputting the results onto a spreadsheet
'########################################################################################
'Opening the PDF and getting the coordinates
Dim n As Long
Dim TextFile As Integer
Dim FilePath As String
Dim FileContent As String
'File Path of Text File
FilePath = "C:\TEST\Document16.pdf" ' change path
'Determine the next file number available for use by the FileOpen function
TextFile = FreeFile
'Open the text file in a Read State
Open FilePath For Input As TextFile
Dim strTextLine As String
Dim vItem As Variant
Line Input #TextFile, strTextLine ' use the file number returned by FreeFile, not a hard-coded #1
vItem = Split(strTextLine, Chr(10))
' clean file of garbage information line by line
For n = LBound(vItem) To UBound(vItem)
' insert appropriate conditions here - in this case if the string "<<" is present
If InStr(1, vItem(n), "<<") > 0 Then
If FileContent = vbNullString Then
FileContent = vItem(n)
Else
FileContent = FileContent & Chr(10) & vItem(n)
End If
End If
Next n
'Close Text File
Close TextFile
' insert the rest of the code here
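Another workaround that is often suggested for error 62, sketched here as an assumption and not as something tested against the PDFs in the question, is to open the file in Binary mode and read it with Get, so that the file's bytes are never interpreted as text-mode end-of-file markers. The function name is hypothetical:

```vba
Function ReadWholeFileBinary(ByVal filePath As String) As String
    Dim fileNumber As Integer
    Dim content As String

    fileNumber = FreeFile
    Open filePath For Binary Access Read As #fileNumber

    ' Pre-size the buffer to the file length; Get fills it with the raw bytes.
    content = Space$(LOF(fileNumber))
    Get #fileNumber, , content
    Close #fileNumber

    ReadWholeFileBinary = content
End Function
```

The returned string can then be searched with InStr in the same way as FileContent in the original macro.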
I am trying to get the last modified time for the files in a directory. I loop through the directory and print each file's modified date. With a test folder of 10 files (I tried other folders with different numbers of files too), all 10 files appeared in the command prompt, but every one of them printed 12/31/1600.
How could I fix it so that it would print the correct date?
Dim strFilepath = "C:\Test" 'Test folder contains 10 files for test
Dim File As System.IO.FileInfo() = directory.GetFiles()
Dim File1 As System.IO.FileInfo
Dim strLastModified As String
For Each File1 In File 'Loops the GetLastWriteTime
strLastModified = System.IO.File.GetLastWriteTime(strFilepath & File.ToString()).ToShortDateString()
Console.WriteLine(strLastModified)'Prints all 10 files but with the 12/31/1600 date
'Files do exist, code goes into file, it loops through it but wrong date.
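Jim already gave you the reason why your date is wrong with his link to the duplicate: GetLastWriteTime returns midnight on January 1, 1601 UTC, adjusted to local time, when the file does not exist.
You concatenate strFilepath and File.ToString() incorrectly because a backslash \ is missing between them, which gives something like: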
C:\TestYourFile.txt.
Additionally you are using the wrong variable in the For Each.
It should be File1 instead of File (thanks @Mark).
Solution 1:
That's exactly what the Path.Combine function is for.
So change
strLastModified = System.IO.File.GetLastWriteTime(strFilepath & File.ToString()).ToShortDateString()
To
strLastModified = System.IO.File.GetLastWriteTime(Path.Combine(strFilepath, File1.ToString())).ToShortDateString()
Solution 2:
Like Mark commented you could just use the FullName property which makes it even easier:
strLastModified = System.IO.File.GetLastWriteTime(File1.FullName).ToShortDateString()
I believe I have come up with a very efficient way to read very, very large files line-by-line. Please tell me if you know of a better/faster way or see room for improvement. I am trying to get better at coding, so any sort of advice you have would be nice. Hopefully this is something that other people might find useful, too.
It appears to be something like 8 times faster than using Line Input from my tests.
'This function reads a file into a string. '
'I found this in the book Programming Excel with VBA and .NET. '
Public Function QuickRead(FName As String) As String
Dim I As Integer
Dim res As String
Dim l As Long
I = FreeFile
l = FileLen(FName)
res = Space(l)
Open FName For Binary Access Read As #I
Get #I, , res
Close I
QuickRead = res
End Function
'This function works like the Line Input statement'
Public Sub QRLineInput( _
ByRef strFileData As String, _
ByRef lngFilePosition As Long, _
ByRef strOutputString, _
ByRef blnEOF As Boolean _
)
On Error GoTo LastLine
strOutputString = Mid$(strFileData, lngFilePosition, _
InStr(lngFilePosition, strFileData, vbNewLine) - lngFilePosition)
lngFilePosition = InStr(lngFilePosition, strFileData, vbNewLine) + 2
Exit Sub
LastLine:
blnEOF = True
End Sub
Sub Test()
Dim strFilePathName As String: strFilePathName = "C:\Fld\File.txt"
Dim strFile As String
Dim lngPos As Long
Dim blnEOF As Boolean
Dim strFileLine As String
strFile = QuickRead(strFilePathName) & vbNewLine
lngPos = 1
Do Until blnEOF
Call QRLineInput(strFile, lngPos, strFileLine, blnEOF)
Loop
End Sub
Thanks for the advice!
My two cents…
Not long ago I needed to read large files using VBA and noticed this question. I tested three approaches to reading data from a file, comparing their speed and reliability across a wide range of file sizes and line lengths. The approaches are:
1. Line Input VBA statement
2. Using the File System Object (FSO)
3. Using the Get VBA statement for the whole file and then parsing the string read, as described in posts here
Each test case consists of three steps:
1. Test case setup that writes a text file containing a given number of lines of the same given length, filled with a known character pattern.
2. Integrity test. Read each file line and verify its length and contents.
3. File read speed test. Read each line of the file, repeated 10 times.
As you can notice, Step #3 verifies the true file read speed (as asked in the question) while Step #2 verifies the file read integrity and therefore simulates real conditions when string parsing is needed.
The following chart shows the test results for the File read speed test. The file size is 64M bytes for all tests, and the tests differ in line length that varies from 2 bytes (not including CRLF) to 8M bytes.
CONCLUSION:
All the three methods are reliable for large files with normal and abnormal line lengths (please compare to Graeme Howard’s answer)
All the three methods produce almost equivalent file reading speed for normal line lengths
“Superfast way” (Method #3) works fine for extremely long lines while the other two don’t.
All of this applies across different Office versions and different PCs, for both VBA and VB6.
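For anyone who wants to repeat a comparison like this, a timing loop for Method #1 might look roughly like the sketch below. It is not the harness used for the results above; the file path and procedure name are placeholders, and Timer only has coarse resolution, so it is best used on large files:

```vba
Sub TimeLineInputRead()
    Dim startTime As Single
    Dim fileNumber As Integer
    Dim textLine As String
    Dim lineCount As Long

    startTime = Timer                      ' seconds since midnight
    fileNumber = FreeFile
    Open "C:\Fld\File.txt" For Input As #fileNumber   ' placeholder path
    Do While Not EOF(fileNumber)
        Line Input #fileNumber, textLine
        lineCount = lineCount + 1
    Loop
    Close #fileNumber

    Debug.Print lineCount & " lines read in " & Format$(Timer - startTime, "0.000") & " s"
End Sub
```

Swapping the body of the loop for the FSO or Get-based readers gives comparable figures for the other two methods.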
You can use Scripting.FileSystemObject to do that.
From the Reference:
The ReadLine method allows a script to read individual lines in a text file. To use this method, open the text file, and then set up a Do Loop that continues until the AtEndOfStream property is True. (This simply means that you have reached the end of the file.) Within the Do Loop, call the ReadLine method, store the contents of the first line in a variable, and then perform some action. When the script loops around, it will automatically drop down a line and read the second line of the file into the variable. This will continue until each line has been read (or until the script specifically exits the loop).
And a quick example:
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objFile = objFSO.OpenTextFile("C:\FSO\ServerList.txt", 1)
Do Until objFile.AtEndOfStream
strLine = objFile.ReadLine
MsgBox strLine
Loop
objFile.Close
Line Input works fine for small files. However, when file sizes reach around 90k, Line Input jumps all over the place and reads data in the wrong order from the source file.
I tested it with different filesizes:
49k = ok
60k = ok
78k = ok
85k = ok
93k = error
101k = error
127k = error
156k = error
Lesson learned - use Scripting.FileSystemObject
With that code you load the file in memory (as a big string) and then you read that string line by line.
By using Mid$() and InStr() you actually read the "file" twice but since it's in memory, there is no problem.
I don't know if VB's String has a length limit (probably not), but if the text files are hundreds of megabytes in size you are likely to see a performance drop due to virtual memory usage.
I would think that in a large-file scenario using a stream would be far more efficient, because memory consumption would be very small.
But your algorithm could alternate between using a stream and loading the entire thing in memory based on the file size. I wouldn't be surprised if one is only better than the other under certain criteria.
You can modify the above to read the full file in one go and then display each line, as shown below:
Option Explicit
Public Function QuickRead(FName As String) As Variant
Dim i As Integer
Dim res As String
Dim l As Long
Dim v As Variant
i = FreeFile
l = FileLen(FName)
res = Space(l)
Open FName For Binary Access Read As #i
Get #i, , res
Close i
'split the file with vbcrlf
QuickRead = Split(res, vbCrLf)
End Function
Sub Test()
' you can replace "c:\writename.txt" with any file name you desire
Dim strFilePathName As String: strFilePathName = "C:\writename.txt"
Dim strFileLine As String
Dim v As Variant
Dim i As Long
v = QuickRead(strFilePathName)
For i = 0 To UBound(v)
MsgBox v(i)
Next
End Sub
My take on it...obviously, you've got to do something with the data you read in. If it involves writing it to the sheet, that'll be deadly slow with a normal For Loop. I came up with the following based upon a rehash of some of the items there, plus some help from the Chip Pearson website.
Reading in the text file (assuming you don't know the length of the range it will create, so only the startingCell is given):
Public Sub ReadInPlainText(startCell As Range, Optional textfilename As Variant)
If IsMissing(textfilename) Then textfilename = Application.GetOpenFilename("All Files (*.*), *.*", , "Select Text File to Read")
If textfilename = "" Then Exit Sub
Dim filelength As Long
Dim filenumber As Integer
filenumber = FreeFile
filelength = filelen(textfilename)
Dim text As String
Dim textlines As Variant
Open textfilename For Binary Access Read As filenumber
text = Space(filelength)
Get #filenumber, , text
'split the file with vbcrlf
textlines = Split(text, vbCrLf)
'output to range
Dim outputRange As Range
Set outputRange = startCell
Set outputRange = outputRange.Resize(UBound(textlines) + 1, 1) ' Split returns a zero-based array, so the line count is UBound + 1
outputRange.Value = Application.Transpose(textlines)
Close filenumber
End Sub
Conversely, if you need to write out a range to a text file, this does it quickly in one print statement (note: the file 'Open' type here is in text mode, not binary..unlike the read routine above).
Public Sub WriteRangeAsPlainText(ExportRange As Range, Optional textfilename As Variant)
If IsMissing(textfilename) Then textfilename = Application.GetSaveAsFilename(FileFilter:="Text Files (*.txt), *.txt")
If textfilename = "" Then Exit Sub
Dim filenumber As Integer
filenumber = FreeFile
Open textfilename For Output As filenumber
Dim textlines() As Variant, outputvar As Variant
textlines = Application.Transpose(ExportRange.Value)
outputvar = Join(textlines, vbCrLf)
Print #filenumber, outputvar
Close filenumber
End Sub
Be careful when using Application.Transpose with a huge number of values. If you transpose values into a column, Excel assumes they came from a row.
Since the maximum column limit is smaller than the maximum row limit, it will only display the first (max column limit) values, and anything after that will be "N/A".
I just wanted to share some of my results...
I have text files, which apparently came from a Linux system, so I only have a vbLF/Chr(10) at the end of each line and not vbCR/Chr(13).
Note 1:
This meant that the Line Input method would read in the entire file, instead of just one line at a time.
From my research testing small (152KB) and large (2778KB) files, both on and off the network, I found the following:
Open FileName For Input: Line Input was the slowest (See Note 1 above)
Open FileName For Binary Access Read: Input was the fastest for reading the whole file
FSO.OpenTextFile: ReadLine was fast, but a bit slower than Binary Input
Note 2:
If I just needed to check the file header (first 1-2 lines) to check if I had the proper file/format, then FSO.OpenTextFile was the fastest, followed very closely by Binary Input.
The drawback with the Binary Input is that you have to know how many characters you want to read.
On normal files, Line Input would be a good option as well, but I couldn't test it due to Note 1.
Note 3:
Obviously, the files on the network showed the largest difference in read speed. They also showed the greatest benefit from reading the file a second time (although there are certainly memory buffers that come into play here).
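For LF-only files like the ones described in Note 1, one possible approach (a sketch with a hypothetical function name, not the code used for the timings above) is to read the whole file in Binary mode and split it on vbLf yourself:

```vba
Function ReadLfLines(ByVal filePath As String) As Variant
    Dim fileNumber As Integer
    Dim content As String

    fileNumber = FreeFile
    Open filePath For Binary Access Read As #fileNumber
    content = Space$(LOF(fileNumber))      ' buffer sized to the whole file
    Get #fileNumber, , content
    Close #fileNumber

    ' Normalise Windows line endings first, so the same code
    ' handles both CRLF and LF-only files.
    content = Replace(content, vbCrLf, vbLf)

    ReadLfLines = Split(content, vbLf)     ' zero-based array of lines
End Function
```

If the file ends with a line break, the last element of the returned array is an empty string.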