VBA search and replace script produces long delays - vba

[[UPDATE: Apologies...I forgot the code block. Edited.]]
This is a question from one of my faculty, so I'll quote him directly, and preface this by saying I don't know VB (just the conduit here):
"In order to help an international student, I need to convert Youtube transcripts and Closed Captionings into readable text.
I wrote this routine for the purpose of eliminating time code marks from Youtube transcripts by invoking Visual Basic from within Microsoft Word. Since time codes are always on a separate line and since time codes must contain “:”, I am searching for the character “:” and then deleting the whole line in which it occurs. For some reason, the routine I have written, when completed, forces Word into a long unresponsive period (around 60-100 seconds) after which it works perfectly well. Any suggestions for avoiding that delay or for solving this problem within Word, without writing macros?"
Sub Deleteyt()
    Dim oRng As Word.Range
    Dim oRngDelete As Word.Range
    Set oRng = ActiveDocument.Range
    With oRng.Find
        .Text = ":"
        While .Execute
            oRng.Select
            Set oRngDelete = ActiveDocument.Bookmarks("\Line").Range
            oRngDelete.Delete
        Wend
    End With
End Sub
So the code works, there's just the long delay up front. Any ideas?

You don't need a macro for this - all you need is a wildcard Find/Replace, with:
Find = <[0-9]@:[0-9]@>*^13
Replace = nothing
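If a macro is still preferred (for example, to process many transcripts), the same wildcard replace can be run in a single call, which avoids selecting each match in turn. The following is a minimal sketch rather than a tested drop-in: the name DeleteTimecodeParagraphs is made up for illustration, and the pattern is written in Word's wildcard syntax, where @ means "one or more of the preceding item" and ^13 is a paragraph mark.

```vba
Sub DeleteTimecodeParagraphs()
    ' Hypothetical helper: deletes every paragraph that starts with
    ' something like 0:15 or 12:03, using one wildcard ReplaceAll.
    With ActiveDocument.Content.Find
        .ClearFormatting
        .Replacement.ClearFormatting
        .Text = "<[0-9]@:[0-9]@>*^13"
        .Replacement.Text = ""
        .MatchWildcards = True
        .Forward = True
        .Wrap = wdFindStop
        .Format = False
        .Execute Replace:=wdReplaceAll
    End With
End Sub
```

Because nothing is selected and the screen is not repositioned for each match, this kind of single ReplaceAll usually finishes far faster than a Select-based loop.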

Why not modify the text file that contains the transcript? I would expect you to end up with a .txt file.
See the example below, heavily inspired by: Text file in VBA: Open/Find Replace/SaveAs/Close File
Sub CleanFile()
    Dim sBuf As String
    Dim sTemp As String
    Dim iFileNum As Integer
    Dim sFileName As String
    ' Edit as needed. Note: GetOpenFilename and GetSaveAsFilename are
    ' Excel's dialogs, so run this from Excel or assign the paths directly.
    sFileName = Application.GetOpenFilename
    iFileNum = FreeFile
    Open sFileName For Input As iFileNum
    Do Until EOF(iFileNum)
        Line Input #iFileNum, sBuf
        ' Keep the line in sTemp only if it does not contain a ":"
        If InStr(sBuf, ":") = 0 Then sTemp = sTemp & sBuf & vbCrLf
    Loop
    Close iFileNum
    iFileNum = FreeFile
    sFileName = Application.GetSaveAsFilename()
    Open sFileName For Output As iFileNum
    Print #iFileNum, sTemp
    Close iFileNum
End Sub
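Checking only for ":" also drops any spoken line that happens to contain a colon. A stricter check, sketched below, keeps such lines. It assumes that timecodes look like m:ss, mm:ss, h:mm:ss or hh:mm:ss and sit alone on their line; IsTimecodeLine is a hypothetical helper name, not part of the answer above.

```vba
Function IsTimecodeLine(ByVal sLine As String) As Boolean
    ' In a Like pattern, # matches exactly one digit.
    Dim s As String
    s = Trim$(sLine)
    IsTimecodeLine = (s Like "#:##") Or (s Like "##:##") _
                  Or (s Like "#:##:##") Or (s Like "##:##:##")
End Function
```

In the loop, the test `If InStr(sBuf, ":") = 0 Then` would become `If Not IsTimecodeLine(sBuf) Then`.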

Related

VBA - How can you pull Cyrillic text into Powerpoint Label from .TXT File

I am making a random word generator for use in my Global Studies class, for a game with the Cyrillic alphabet. I found a VBA setup for PowerPoint 2016 that pulls random words from a text file. The problem is that it won't show the Cyrillic. I have tried changing the encoding in the VBA Tools, and I have tried different encoding settings for the .txt file, but I can't seem to get actual Cyrillic letters in the label.
The VBA code that I'm using is:
Public myArray, Word1
Sub OnSlideShowPageChange(ByVal SSW As SlideShowWindow)
If SSW.View.CurrentShowPosition = 2 Then
Randomize
Label1.Caption = ""
Dim path
path = ActivePresentation.path & "\words.txt"
Open path For Input As #1
filecontent = Input(LOF(1), #1)
Close #1
myArray = Split(filecontent, vbCrLf)
End If
End Sub
Private Sub CommandButton1_Click()
Word1 = Int((UBound(myArray)) * Rnd)
Label1.Caption = myArray(Word1)
End Sub
Private Sub Label1_Click()
End Sub
Try reading a line at a time from the file rather than using LOF:
Function FileToString(sFileName As String) As String
    Dim FileNum As Integer
    Dim sBuf As String
    Dim sTemp As String
    FileToString = "" ' default when the file is missing
    ' Dir is used here as the existence check
    If Len(Dir(sFileName)) > 0 Then
        FileNum = FreeFile
        Open sFileName For Input As FileNum
        While Not EOF(FileNum)
            Line Input #FileNum, sBuf
            sTemp = sTemp & sBuf & vbCrLf
        Wend
        Close FileNum
        FileToString = sTemp
    End If
End Function
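Line Input, like Input(LOF(...)), decodes the file using the system's ANSI code page, so on its own it may not be enough for Cyrillic text. One commonly used alternative is to read the file through ADODB.Stream with an explicit charset. The sketch below assumes that words.txt is saved as UTF-8 and that ADO is available on the machine; ReadUtf8File is a made-up name.

```vba
Function ReadUtf8File(ByVal sPath As String) As String
    ' Late-bound ADODB.Stream, so no library reference is required.
    Dim stm As Object
    Set stm = CreateObject("ADODB.Stream")
    With stm
        .Type = 2              ' adTypeText
        .Charset = "utf-8"     ' assumption: the file is UTF-8
        .Open
        .LoadFromFile sPath
        ReadUtf8File = .ReadText(-1)   ' adReadAll
        .Close
    End With
End Function
```

In the question's code this would replace the Open/Input/Close lines, for example `myArray = Split(ReadUtf8File(path), vbCrLf)`.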

Why pasting a text file into Excel cuts off sentence in the middle

I wrote a script that loops through hundreds of simple text files. The script opens a file, pastes it into Excel, extracts some data, deletes the text file and moves on to the next one. It worked perfectly until a few days ago, when, after pasting one specific file into Excel, I noticed that one of its lines ended abruptly (it happens only with that file).
If I Debug.Print the file, I can see that everything is there, but when I paste it into Excel, the same line is cut off.
Does anyone know what might be causing this? What's weirder, if I manually copy and paste the text into a new text file, Excel processes it without any problem.
Below is the code that deals with those text files.
MyFolder = "C:\Test\"
StrFile = Dir(MyFolder & "tempfile.txt")
Open MyFolder & StrFile For Binary As #1
MyData = Space$(LOF(1))
Get #1, , MyData
Close #1
strData() = Split(MyData, vbCrLf)
LineNumber = 1
For xy = LBound(strData) To UBound(strData)
Debug.Print strData(xy) 'THIS WORKS!
Range("A" & LineNumber).Value = strData(xy) 'THIS DOESN'T!
LineNumber = LineNumber + 1
Next xy
Thank you!
I found what was causing this problem! I opened the text file in Word and noticed that the line that was abruptly cut had some weird symbols (four empty squares?) that were not visible in Notepad. Then I opened all the other files in Word, and none of them had those symbols.
So I started reading about "non-printable characters" and came up with this solution: I'm replacing each NULL character, Chr(0), with a blank space.
I'm posting it in case someone else has to deal with a problem like this:
MyFolder = "C:\Test\"
StrFile = Dir(MyFolder & "tempfile.txt")
Open MyFolder & StrFile For Binary As #1
MyData = Space$(LOF(1))
Get #1, , MyData
Close #1
strData() = Split(MyData, vbCrLf)
LineNumber = 1
For Each sData In strData
sData = Replace(sData, Chr(0), " ")
Range("A" & LineNumber).Value = sData 'Now it works!!
LineNumber = LineNumber + 1
Next
Thank you for your help everybody!
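To find out which non-printable characters a file contains before deciding what to replace, the character codes can be listed. A small sketch (ListControlChars is a hypothetical helper; it reports every character with a code below 32 other than tab, CR and LF, as position:code pairs):

```vba
Function ListControlChars(ByVal s As String) As String
    Dim i As Long
    Dim code As Long
    Dim result As String
    For i = 1 To Len(s)
        code = AscW(Mid$(s, i, 1))
        ' AscW can return negative values for high code points, hence code >= 0
        If code >= 0 And code < 32 And code <> 9 And code <> 10 And code <> 13 Then
            result = result & i & ":" & code & " "
        End If
    Next i
    ListControlChars = Trim$(result)
End Function
```

Calling `Debug.Print ListControlChars(MyData)` right after the Get statement would show where any NULLs or other control characters sit in the file.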
(This needs the formatting of an answer, but it is really a comment.)
Give this a try:
Sub tgr()
Dim oFSO As Object
Dim sFolderPath As String
Dim sFileName As String
Dim sFullText As String
Dim aOutput() As String
Set oFSO = CreateObject("Scripting.FileSystemObject")
sFolderPath = "C:\Test\"
sFileName = Dir(sFolderPath & "tempfile.txt")
sFullText = oFSO.OpenTextFile(sFolderPath & sFileName).ReadAll
aOutput = Split(sFullText, vbCrLf)
With Range("A1").Resize(UBound(aOutput) - LBound(aOutput) + 1)
.Value = Application.Transpose(aOutput)
End With
End Sub
If that still fails, you'll need to provide the problem text file.

How to keep original word document open when saving copy in HTML?

The problem I'm having is that when I run my macro to save the current Word document as HTML, the document stays open, but in the .htm format rather than the original .docx format.
If I edit the document after the macro has run, the changes no longer go to the original .docx file.
I would appreciate feedback on how to stay in the original format while also saving a copy in a different format. Thanks.
Here is my docx-to-HTML code in VBA:
Sub DocToHTML()
Dim slice As String
Dim strDocName As String
Dim PathOrg As String
On Error Resume Next
strDocName = ActiveDocument.Name
slice = Left(strDocName, InStrRev(strDocName, ".") - 1)
strDocName = ActiveDocument.Path + "\" + slice
ActiveDocument.SaveAs2 FileName:=strDocName, FileFormat:=wdFormatHTML
End Sub
Before you write code to do things like this, stop and think how you would do it in the UI without code. Any code that you write will simply automate that process.
So what would you do in the UI?
1. Save the original document to preserve any changes that you have made.
2. Save a copy as HTML.
3. Reopen the original document.
4. Possibly close the HTML version.
So your code can be rewritten as follows:
Sub DocToHTML()
Dim origName As String
Dim saveName As String
Dim docHTML As Document
If Not ActiveDocument.Saved Then ActiveDocument.Save
origName = ActiveDocument.FullName
saveName = Left(origName, InStrRev(origName, ".") - 1)
ActiveDocument.SaveAs2 FileName:=saveName, FileFormat:=wdFormatHTML
Set docHTML = ActiveDocument
Documents.Open origName
docHTML.Close wdDoNotSaveChanges
End Sub
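One edge case applies to both versions: for a document that has never been saved, the name contains no ".", so InStrRev returns 0 and Left(..., -1) raises a run-time error. A hedged sketch of a guard follows; StripExtension is a hypothetical helper, not part of the answer above.

```vba
Function StripExtension(ByVal sFullName As String) As String
    ' Removes the extension only if the last "." belongs to the file name,
    ' not to a folder name, and is not the file name's first character.
    Dim dotPos As Long
    Dim slashPos As Long
    dotPos = InStrRev(sFullName, ".")
    slashPos = InStrRev(sFullName, "\")
    If dotPos > slashPos + 1 Then
        StripExtension = Left$(sFullName, dotPos - 1)
    Else
        StripExtension = sFullName
    End If
End Function
```

With this in place, the line that builds the save name could read `saveName = StripExtension(origName)`.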

How to count number of rows and to move files automatically with VBA macros?

My goal is to write a VBA macro that will:
- let me choose a folder with files to open
- count the number of rows in each file (each file contains only 1 sheet)
- move all the files that contain more than 1 row to another folder
I'm very new to VBA. So far I have found how to count the number of rows in the active worksheet, but I still can't manage to open the files automatically and move them to another folder:
Sub RowCount()
Dim iAreaCount As Integer
Dim i As Integer
Worksheets("Sheet1").Activate
iAreaCount = Selection.Areas.Count
If iAreaCount <= 1 Then
MsgBox "The selection contains " & Selection.Rows.Count & " rows."
Else
For i = 1 To iAreaCount
MsgBox "Area " & i & " of the selection contains " & _
Selection.Areas(i).Rows.Count & " rows."
Next i
End If
End Sub
Could someone help with this, please?
This is actually easy. Really easy. :)
First, code to choose a folder to look into for Excel files. Used Google and searched for excel vba select folder dialog. First result yields this code:
Function GetFolder(strPath As String) As String
Dim fldr As FileDialog
Dim sItem As String
Set fldr = Application.FileDialog(msoFileDialogFolderPicker)
With fldr
.Title = "Select a Folder"
.AllowMultiSelect = False
.InitialFileName = strPath
If .Show <> -1 Then GoTo NextCode
sItem = .SelectedItems(1)
End With
NextCode:
GetFolder = sItem
Set fldr = Nothing
End Function
We'll use it later. Next, we need a loop to count how many rows there are in each file/sheet. However, we can't count them without opening the files. So, let's look for code that opens workbooks in a loop. Googling excel vba open excel files in folder, we get the second result. The first result uses a method that is deprecated in Excel 2007 and up. I will assume you're running 2007 or later. Here's the code, applying the correction detailed by Siddharth Rout.
Sub OpenFiles()
Dim MyFolder As String
Dim MyFile As String
MyFolder = "Blah blah blah"
MyFile = Dir(MyFolder & "\*.xlsx")
Do While MyFile <> ""
Workbooks.Open Filename:=MyFolder & "\" & MyFile
MyFile = Dir
Loop
End Sub
Now, some semi-advanced best practices. Rather than opening each workbook/worksheet/file and then counting the rows in each of the opened files separately (which is highly counter-intuitive), let's work toward modifying the above code so that it also counts the rows in each file, then moves the file to another folder if it has more than one (1) used row. As a first step, we'll change the above code to use the first function to get the folder we want to loop over.
Sub OpenFiles()
Dim MyFolder As String
Dim MyFile As String
MyFolder = GetFolder("C:\users\yourname\Desktop") 'Modify as needed.
MyFile = Dir(MyFolder & "\*.xlsx") 'Modify as needed.
Do While MyFile <> ""
Workbooks.Open Filename:=MyFolder & "\" & MyFile
MyFile = Dir
Loop
End Sub
See what happened there? We called the GetFolder function and assigned its result to MyFolder. We then concatenate MyFolder and a wildcarded string, and pass that to Dir so we can loop over the files. What are the remaining two things? Right, counting the used rows AND moving the files. For the used rows, I'll hack together a simple function that returns the last used row in column A of the workbook's only sheet, so the caller can check whether it is 2 or greater.
Function CountUsedRows(Wbk As Workbook) As Long
Dim WS As Worksheet
Set WS = Wbk.Sheets(1)
CountUsedRows = WS.Range("A" & WS.Rows.Count).End(xlUp).Row 'Modify as necessary.
End Function
Now that's simple enough. Next, let's write some simple code to move the files. For personal purposes, I'll write code to copy instead. It'll be up to you to modify it for moving, as that's a rather sensitive operation and if it messes up... well. Hmm. But something here tells me that there's a much better option. Copying can cause all manner of errors, from permission denial to erroneous copying. Since we've got the file open, why not just save it to the new folder instead?
Now, let's tie them all together neatly.
Sub OpenFiles()
Dim MyFolder As String
Dim MyFile As String
Dim TargetWB As Workbook
MyFolder = GetFolder("C:\Users\yourname\Desktop") 'Modify as needed.
MyFile = Dir(MyFolder & "\*.xlsx") 'Modify as needed.
Application.ScreenUpdating = False
Application.DisplayAlerts = False
Do While MyFile <> ""
Set TargetWB = Workbooks.Open(Filename:=MyFolder & "\" & MyFile)
With TargetWB
If CountUsedRows(TargetWB) > 1 Then
.SaveAs "C:\Users\yourname\Desktop\Blah\CopyOf" & MyFile 'Modify as needed.
End If
.Close
End With
MyFile = Dir
Loop
Shell "explorer.exe C:\Users\yourname\Desktop\Blah", vbMaximizedFocus 'Open the folder.
Application.DisplayAlerts = True
Application.ScreenUpdating = True
End Sub
Function GetFolder(strPath As String) As String
Dim fldr As FileDialog
Dim sItem As String
Set fldr = Application.FileDialog(msoFileDialogFolderPicker)
With fldr
.Title = "Select a Folder"
.AllowMultiSelect = False
.InitialFileName = strPath
If .Show <> -1 Then GoTo NextCode
sItem = .SelectedItems(1)
End With
NextCode:
GetFolder = sItem
Set fldr = Nothing
End Function
Function CountUsedRows(Wbk As Workbook) As Long
Dim WS As Worksheet
Set WS = Wbk.Sheets(1)
CountUsedRows = WS.Range("A" & WS.Rows.Count).End(xlUp).Row 'Modify as necessary.
End Function
Tried and tested. Let us know if this works for you.
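The macro above saves a copy and leaves the original in place. If the files really have to be moved, one option is VBA's Name statement, applied after the workbook has been closed. A sketch under those assumptions (MoveClosedFile is a made-up name; the target folder must already exist, and Name raises an error if a file with the same name is already there):

```vba
Sub MoveClosedFile(ByVal sSource As String, ByVal sTargetFolder As String)
    ' Moves a file that is not currently open into sTargetFolder,
    ' keeping its file name.
    Dim sName As String
    sName = Mid$(sSource, InStrRev(sSource, "\") + 1)
    If Right$(sTargetFolder, 1) <> "\" Then sTargetFolder = sTargetFolder & "\"
    Name sSource As sTargetFolder & sName
End Sub
```

It would be called only after the workbook's .Close. Collecting the qualifying paths in a list during the Dir loop and moving them once the loop has finished is the more cautious ordering, since it avoids changing the folder while it is being enumerated.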
Nice answer from Manhattan: that's exactly how I use Excel's built-in functionality to select a folder and fetch a set of file names.
However, there's an interesting side-question in there:
Are those single-sheet Excel files workbooks, or just .csv text files?
If they have a .csv extension, you don't need to open them in Excel to count the rows!
Here's the code to do it:
Fast VBA for Counting Rows in a CSV file
Public Function FileRowCount(FilePath As String, Optional RowDelimiter As String = vbCr) As Long
' Returns the row count of a text file, including the header row
' Returns - 1 on error
' Unicode-compliant, works on UTF-8, UTF-16, ASCII, with or without a Byte order Marker.
' Reads a typical 30Mb file over the network in 200-300ms. Hint: always copy to a local folder.
' If you're scanning files for use with a SQL driver, use basSQL.TableRowCount: it's 20x slower,
' but it returns a proper test of the file's usability as a SQL 'table'
' Nigel Heffernan Excellerando.Blogspot.com 2015
' Unit test:
' s=Timer : for i = 0 to 99 : n=FileRowCount("C:\Temp\MyFile.csv") : Next i : Print Format(n,"#,##0") & " rows in " & FORMAT((Timer-s)/i,"0.000") & " sec"
' Network performance on a good day: reads ~ 150 MB/second, plus an overhead of 70 ms for each file
' Local-drive performance: ~ 4.5 GB/second, plus an overhead of 4 ms for each file
On Error Resume Next
Dim hndFile As Long
Dim lngRowCount As Long
Dim lngOffset As Long
Dim lngFileLen As Long
Const CHUNK_SIZE As Long = 8192
Dim strChunk As String * CHUNK_SIZE
If Len(Dir(FilePath, vbNormal)) < 1 Then
FileRowCount = -1
Exit Function
End If
' trap the error of a folder path without a filename:
If FileName(FilePath) = "" Then
FileRowCount = -1
Exit Function
End If
hndFile = FreeFile
Open FilePath For Binary Access Read Shared As #hndFile
lngFileLen = LOF(hndFile)
lngOffset = 1
Do Until EOF(hndFile)
Get #hndFile, , strChunk
FileRowCount = FileRowCount + UBound(Split(strChunk, RowDelimiter))
Loop
Close #hndFile
End Function
Public Function FileName(Path As String) As String
' Strip the folder and path from a file's path string, leaving only the file name
' This does not check for the existence or accessibility of the file:
' all we're doing here is string-handling
' Nigel Heffernan Excellerando.Blogspot.com 2011
Dim strPath As String
Dim arrPath() As String
Const BACKSLASH As String * 1 = "\"
strPath = Trim(Path)
If strPath = "" Then Exit Function
If Right$(strPath, 1) = BACKSLASH Then Exit Function
arrPath = Split(strPath, BACKSLASH)
If UBound(arrPath) = 0 Then ' does not contain "\"
FileName = Path
Else
FileName = arrPath(UBound(arrPath))
End If
Erase arrPath
End Function
Note the use of the Split function to count the row separators: VBA's string-handling is generally slow, especially when you concatenate strings, but there are a couple of places where VBA can perform a string manipulation without internal allocation and deallocation; if you know where they are, you'll find that parts of your code run as fast as a 'C' developer's best work.
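The counting trick in isolation, as a minimal sketch (CountDelimiters is a hypothetical name): Split returns an array with one more element than there are delimiters, so its upper bound equals the delimiter count. Splitting an empty string gives an upper bound of -1, hence the guard.

```vba
Function CountDelimiters(ByVal sText As String, _
                         Optional ByVal sDelim As String = vbCr) As Long
    If Len(sText) = 0 Or Len(sDelim) = 0 Then
        CountDelimiters = 0
    Else
        ' UBound of the zero-based result = number of delimiters found
        CountDelimiters = UBound(Split(sText, sDelim))
    End If
End Function
```

For files with Unix line endings, the delimiter argument would be vbLf rather than the default vbCr.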
Warning: Horrible Hack
Strictly speaking, I should declare Dim arrBytes(CHUNK_SIZE) As Byte and use this Byte array instead of strChunk to receive the Get from a file opened for binary read.
There are two reasons for not doing it the 'right' way:
- The last Get, which will set end-of-file TRUE, will extract less data from the file than the full 'chunk'. What happens next is that these last few bytes of the file are written into the array without clearing out the data from the previous 'Get'. So you have to do additional plumbing, counting bytes off against LOF(hndFile) to detect the 'Last Get' and branching into a statement that clears the buffer, or allocates a smaller byte array and uses that instead;
- The code will only cope with UTF-8 2-byte encoded character sets, or with single-byte encoded ASCII 'Latin' text if you do a bit of byte-array substitution around your row delimiters.
The VBA String type is a byte array with a wrapper that allows your code (or rather, the compiler) to handle all that complexity in the background.
However, it's much faster to go back into the primordial C, using old-school Get statements, than using later libraries like Scripting.FileSystemObject. Also, you have some ability to examine the incoming data at the byte level, to debug issues where you're getting '???????' characters instead of the text you were expecting.
Anyway: this is late to the game, as StackOverflow answers go, and it's an answer to the less-interesting part of your question. But it's going to be interesting to people who need a quick rowcount in their data files, and your question comes at the top of the list when they search for that.

VBA Scripting Runtime Library with Unix file - LF as End Of Line instead of CRLF issue

I'm writing a very simple parser to read text files into Excel.
Files are exceeding available lines in Excel 2012, so I have to go for a line by line approach.
I've tested Microsoft Scripting Runtime Library, TextStream Object and ReadLine method.
It works fine as long as I have Windows files with CRLF as the end of line, but it fails when only LF marks the end of line.
I've seen lots of solutions outside VBA, is there any viable solution within VBA?
thanks in advance
Start by making a test file that uses only LFs as line endings, at c:\temp\lfs.txt
This will read the file into a string:
Dim FileNum As Integer
Dim sBuf As String
Dim sTemp As String
FileNum = FreeFile
Open "c:\temp\lfs.txt" For Input As FileNum
While Not EOF(FileNum)
Line Input #FileNum, sBuf
sTemp = sTemp & sBuf
Wend
Close FileNum
' Now, what do we have? First the string itself:
Debug.Print sTemp
' Are there any CRs in it?
Debug.Print Replace(sTemp, vbCr, "CR")
' LFs?
Debug.Print Replace(sTemp, vbLf, "LF")
' Replace the LFs with CRLFs:
Debug.Print Replace(sTemp, vbLf, vbCrLf)
Now if you write it back out to a file, you should be able to use it.
I used a very simple text file with lines separated by LFs, and the following VBA code works for me. Maybe you could post a short sample of the text file data on which the TextStream.ReadLine method fails?
Private Const TEXT_FILE_PATH As String = "C:\Temp\VBA\textFileForUnix.txt"
Public Sub test()
Dim targetSheet As Worksheet
Set targetSheet = ThisWorkbook.ActiveSheet
ReadText sheet:=targetSheet, textFilePath:=TEXT_FILE_PATH
End Sub
Private Sub ReadText(ByRef sheet As Worksheet, ByRef textFilePath As String)
Dim scriptingFileSystem As Scripting.FileSystemObject
Dim scriptingFile As Scripting.File
Dim scriptingStream As Scripting.TextStream
Set scriptingFileSystem = New Scripting.FileSystemObject
Set scriptingFile = scriptingFileSystem.GetFile(textFilePath)
Set scriptingStream = scriptingFile.OpenAsTextStream(ForReading)
Dim r As Long
Dim c As Byte
With scriptingStream
r = 1
c = 1
Do While Not .AtEndOfStream
sheet.Cells(r, c).Value = .ReadLine
r = r + 1
Loop
.Close
End With
Set scriptingFile = Nothing
Set scriptingFileSystem = Nothing
End Sub
This one-liner will fix any "unusual" end-of-line I've ever come across, converting it to a proper CRLF. In particular, it will fix single LFs, single CRs, and LF-CRs.
result$ = Replace(Replace(Replace(Replace(Replace(txt$, vbLf & vbCr, vbLf), vbCrLf, vbLf), vbLf, vbCrLf), vbCr, vbCrLf), vbCrLf & vbLf, vbCrLf)
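For the common cases (CRLF, lone LF, lone CR), a shorter three-step normalisation is often enough. This is a sketch of a different, simpler technique than the one-liner above, and unlike it, this version does not treat LF-CR as a single line break; NormalizeToCrLf is a hypothetical name.

```vba
Function NormalizeToCrLf(ByVal sText As String) As String
    Dim s As String
    s = Replace(sText, vbCrLf, vbLf)             ' CRLF -> LF
    s = Replace(s, vbCr, vbLf)                   ' lone CR -> LF
    NormalizeToCrLf = Replace(s, vbLf, vbCrLf)   ' every LF -> CRLF
End Function
```

The result can then be passed to `Split(..., vbCrLf)` and processed one line at a time.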