How to Copy URLs from a word document to another document? - vba

I have a word document that is stuffed with codes and URLs that are lying all over the document.
I've been trying to find out how I can extract all the URLs in this word document and have them pasted on another document?
the URLs all have the same website which starts with https://subdomain.domain.com ..
the problem is .. i will need the full URL link which usually ends with a .jpg
i have tried googling but all i find are solutions on how URLs of hyperlinks can be extracted. Couldnt find a solution on my situation so i hope you guys can help!

I edited the code to send the results to C:\temp\my_links.txt. You can edit the code to change the destination.
Public Sub GetUrls()
Dim r As Range
Dim outfile As String
outfile = "C:\temp\my_links.txt"
Open outfile For Output As #1
Set r = ActiveDocument.Range
r.Select
With Selection.Find
.ClearFormatting
.Text = "https://subdomain.domain.com/*.jpg"
.Forward = True
.Wrap = wdFindStop
.MatchWildcards = True
.MatchCase = False
.MatchWholeWord = False
.MatchAllWordForms = False
.MatchSoundsLike = False
End With
Do While Selection.Find.Execute
Write #1, Selection.Text
Loop
Close #1
End Sub
When I run this on a test file, I get this in the the output file:
"https://subdomain.domain.com/res1/joe.jpg"
"https://subdomain.domain.com/res2/cat.jpg"
Hope that helps.

This will parse the document and find all the URL's for you and output the array to a new Document:
Option Explicit
Sub FindLinks()
Dim p As Paragraph
Dim vSplit As Variant
Dim nIndex As Integer
Dim sURLs() As String
ReDim sURLs(0)
' find each URL and add it to an array
For Each p In ActiveDocument.Paragraphs
vSplit = Split(p.Range, " ")
For nIndex = 0 To UBound(vSplit)
If InStr(vSplit(nIndex), "https://stackoverflow.com") > 0 Then
ReDim Preserve sURLs(UBound(sURLs) + 1)
sURLs(UBound(sURLs)) = Replace(vSplit(nIndex), "src=", "")
End If
Next
Next
' create a new document and output the array
Dim sURL As Variant
Documents.Add
For Each sURL In sURLs
Selection.TypeText sURL
Selection.TypeParagraph
Next
End Sub

Related

Replace all instances of a string but the first one

I need to clean up text in a MS Word file.
I receive text from a web form like this and then get it as a Word file.
Confirmed: Something
Confirmed: Else
Confirmed: every
Confirmed: time
I would like to get rid of all the "Confirmed" but for the first one, to get something like the following.
Confirmed:
Something
Else
every
time
I count all the words with
Function CountOccurrences(ByVal strToCount As String) As Integer
Dim iCount As Integer
iCount = 0
With ActiveDocument.Content.Find
.Text = strToCount
.Format = False
.Wrap = wdFindStop
Do While .Execute
iCount = iCount + 1
Loop
End With
CountOccurrences = iCount
End Function
I found articles on how to delete just the first one, or the last one, but can't figure out, how to delete all but the first one.
Try this:
Sub tester()
Dim col As Collection, i As Long
Set col = AllOccurrences("Confirmed:")
For i = 2 To col.Count
col(i).Text = vbTab 'replace the text with a tab
Next i
End Sub
'Return a collection with all instances of strToMatch in the activedocument
Function AllOccurrences(ByVal strToMatch As String) As Collection
Dim rv As New Collection, rng As Range
Set rng = ActiveDocument.Range
With rng.Find
.Text = strToMatch
.Format = False
.Wrap = wdFindStop
Do While .Execute
rv.Add ActiveDocument.Range(rng.Start, rng.End)
Loop
End With
Set AllOccurrences = rv
End Function

VBA Word macro not working as expected with field results in document

I have a word document (report) and in that document, I'm importing many text files with fields like this:
{INCLUDETEXT "C:\\PATH\\TOXMLFILES\\Request.xml" \*CHARFORMAT}
Also I'm updating all those fields with a macro on opening the document...
Sub AutoOpen()
With Options
.UpdateFieldsAtPrint = True
.UpdateLinksAtPrint = True
End With
ActiveDocument.Fields.Update
End Sub
Now I need to highlight the text of those imported XMLs (in the IncludeText fields) between <faultstring></faultstring> tags
Here is code I got here on stackoverflow for highlighting text (making it bold)
Sub BoldBetweenQuotes()
' base for a quotes finding macro
Dim blnSearchAgain As Boolean
Dim blnFindStart As Boolean
Dim blnFindEnd As Boolean
Dim rngFind As word.Range
Dim rngFindStart As word.Range
Dim rngFindEnd As word.Range
Set rngFind = ActiveDocument.content
Set rngFindStart = rngFind.Duplicate
Do
' set up find of first of quote pair
With rngFindStart.Find
.ClearFormatting
.Text = "<faultstring>"
.Replacement.Text = ""
.Forward = True
.wrap = wdFindStop
blnFindStart = .Execute
End With
If blnFindStart Then
rngFindStart.Collapse wdCollapseEnd
Set rngFindEnd = rngFindStart.Duplicate
rngFindEnd.Find.Text = "</faultstring>"
blnFindEnd = rngFindEnd.Find.Execute
If blnFindEnd Then
rngFindStart.End = rngFindEnd.Start
' make it bold
rngFindStart.Font.Bold = True
rngFindStart.Start = rngFindEnd.End
rngFindStart.End = rngFind.End
blnSearchAgain = True
Else
blnSearchAgain = False
End If
Else
blnSearchAgain = False
End If
Loop While blnSearchAgain = True
End Sub
Problem is, when I run the macro in my Word document (with the IncludeText fields) it keeps cycling and bolding just the first appearance of text between faultstring tags. When I run it in a new Word document with some random text and faultrstring tags it works well...
EDIT: It turns out the problem is due to the faultstring tags being inside the IncludeText fields. I need to turn the fields into static text after opening the document and updating the fields. How can I do that?
In order to convert dynamic field content to static text using Word's object model (such as VBA) the Fields.Unlink method is required. For the entire document:
ActiveDocument.Fields.Unlink
This is also possible for any given Range; to remove the fields in the last paragraph, for example:
ActiveDocument.Paragraphs.Last.Range.Fields.Unlink
In order to unlink only a certain type of field, loop the Fields collection, test the Field.Type and unlink accordingly. For example, for IncludeText:
Sub DeleteIncludeTextFields()
Dim doc As word.Document
Set doc = ActiveDocument
Debug.Print DeleteFieldType(wdFieldIncludeText, doc)
End Sub
Function DeleteFieldType(fldType As word.WdFieldType, doc As word.Document) _
As Long
Dim fld As word.Field
Dim counter As Long
counter = 0
For Each fld In doc.Fields
If fld.Type = wdFieldIncludeText Then
fld.Unlink
counter = counter + 1
End If
Next
DeleteFieldType = counter
End Function
Assuming you want to do this for all the fields in your document, after updating it:
Sub AutoOpen()
With Options
.UpdateFieldsAtPrint = True
.UpdateLinksAtPrint = True
End With
ActiveDocument.Fields.Update
ActiveDocument.Fields.Unlink
End Sub

Find and Replace Multiple Text Strings on Multiple Text Files from a folder

I'm working on the vba code to accomplish the following tasks
Word Document Open a Text file from the folder
Find and replace the text (multiple Text) based on a excel sheet (which have find what and replace with)
Process all text files in the folder and save it.
I would like to customize the below code for the above requirement,
I'm using Office 2016 and I think I have to replace Application.FileSearch in the script to ApplicationFileSearch for 2003 and prior office editions.
I try to accomplish using the word macro recorder and also used notepad++, I've recorded in Notepad++ also and it works for one file, I would like to do batch process all files in the folder and save it after replacing the text.
As there is too many lines there to replace more than 30 or more lines to replace, I would like the code to look for the text from a excel file like find what and replace with columns.
Sub FindReplaceAllDocsInFolder( )
Dim i As Integer
Dim doc As Document
Dim rng As Range
With Application.FileSearch
.NewSearch
.LookIn = "C:\My Documents"
.SearchSubFolders = False
.FileType = msoFileTypeWordDocuments
If Not .Execute( ) = 0 Then
For i = 1 To .FoundFiles.Count
Set doc = Documents.Open(.FoundFiles(i))
For Each rng In doc.StoryRanges
With rng.Find
.ClearFormatting
.Replacement.ClearFormatting
.Text = "Dewey & Cheatem"
.Replacement.Text = "Dewey, Cheatem & Howe"
.Forward = True
.Wrap = wdFindContinue
.Format = False
.MatchCase = False
.MatchWholeWord = False
.MatchWildcards = False
.MatchSoundsLike = False
.MatchAllWordForms = False
.Execute Replace:=wdReplaceAll
End With
Next rng
doc.Save
doc.Close
Set rng = Nothing
Set doc = Nothing
Next i
Else
MsgBox "No files matched " & .FileName
End If
End With
End Sub
Thanks
Jay
Borrowed from https://social.msdn.microsoft.com/Forums/en-US/62fceda5-b21a-40b6-857c-ad28f12c1b23/use-excel-vba-to-open-a-text-file-and-search-it-for-a-specific-string?forum=isvvba
Sub SearchTextFile()
Const strFileName = "C:\MyFiles\TextFile.txt"
Const strSearch = "Some Text"
Dim strLine As String
Dim f As Integer
Dim lngLine As Long
Dim blnFound As Boolean
f = FreeFile
Open strFileName For Input As #f
Do While Not EOF(f)
lngLine = lngLine + 1
Line Input #f, strLine
If InStr(1, strLine, strSearch, vbBinaryCompare) > 0 Then
MsgBox "Search string found in line " & lngLine, vbInformation
blnFound = True
Exit Do
End If
Loop
Close #f
If Not blnFound Then
MsgBox "Search string not found", vbInformation
End If
End Sub
This is simply just finding the match. You can use the built in function "Replace" which land the total fix. You would also have to fit in the "loop through files" code, which here is a snippet.
Dim StrFile As String
StrFile = Dir(pathtofile & "\*" & ".txt")
Do While Len(StrFile) > 0
Debug.Print StrFile
StrFile = Dir
Loop
I wouldve made this a comment, but it was too much text. This isnt meant to be a full blown answer, just giving you the pieces you need to put it all together on your own.
Thanks for all your help. I have found alternate solution using the below EXE. FART.exe (FART - Find and Replace Text). I have create a batch file with the below command example.
https://emtunc.org/blog/03/2011/farting-the-easy-way-find-and-replace-text/
http://fart-it.sourceforge.net/
Examples:
fart "C:\APRIL2011\Scripts\someFile.txt" oldText newText
This line instructs FART to simply replace the string oldText with newText.
fart -i -r "C:\APRIL2011\Scripts*.prm" march2011 APRIL2011
This line will instruct FART to recursively (-r) search for any file with the .prm extension in the \Scripts folder. The -i flag tells FART to ignore case-sensitivity when looking for the string.

VB.NET replacing text in a word document, including a TextBox

I'm trying to replace some placeholder text in a word document using my VB.NET code. Here's the code I'm using:
Dim oWord As Word.Application
Dim aDoc As Word.Document
'Start Word and open the document template.
oWord = CreateObject("Word.Application")
oWord.Visible = True
'Load Invoice Template From Resource File
Dim myTempFile As String = Application.UserAppDataPath & "\mytemp.docx"
My.Computer.FileSystem.WriteAllBytes(myTempFile, My.Resources.InvoiceTemp, False)
aDoc = oWord.Documents.Add(myTempFile, , , True)
oWord.Selection.Find.ClearFormatting()
oWord.Selection.Find.Replacement.ClearFormatting()
With oWord.Selection.Find
.Text = "[iOrder]"
.Replacement.Text = bwOrderID
.Forward = True
.Format = False
.MatchCase = False
.MatchWholeWord = False
.MatchWildcards = False
.MatchSoundsLike = False
.MatchAllWordForms = False
End With
oWord.Selection.Find.Execute(Replace:=Word.WdReplace.wdReplaceAll)
For Each oCtl As Word.Shape In aDoc.Shapes
If oCtl.Type = Microsoft.Office.Core.MsoShapeType.msoTextBox Then
oCtl.TextFrame.TextRange.Text.Replace("[iOrder]", bwOrderID)
End If
Next
This code executes correctly, with no errors, but it doesn't find "[iOrder]" in the textbox. However, if I then go to the word file and press ctrl + F for find and replace, the search criteria specified in the code is there, if I then click replace, it correctly replaces "[iOrder]" with the bwOrderID string.
I must be missing something here?
Update 1
I've updated my code to:
For Each oCtl As Word.Shape In aDoc.Shapes
If oCtl.Type = Microsoft.Office.Core.MsoShapeType.msoTextBox Then
oCtl.TextFrame.TextRange.Text = "Invoice: " & oCtl.TextFrame.TextRange.Text.Replace("[iOrder]", bwOrderID)
End If
Next
This is working correctly, however I am losing my formatting. Is there a way to keep the formatting? The word "Invoice:" is a different colour to the "[iOrder]" and I'd very much like to keep it like that if possible.
Update 2
I have got the "Find/Replace" working, but I have a formatting issue as a result. Before the Find/Replace, the formatting of the textbox is like this:
Code:
For Each oCtl As Word.Shape In aDoc.Shapes
If oCtl.Type = Microsoft.Office.Core.MsoShapeType.msoTextBox Then
oCtl.TextFrame.TextRange.FormattedText.Text = oCtl.TextFrame.TextRange.FormattedText.Text.Replace("[iOrder]", bwOrderID)
End If
Next
This is what the result is after the above code runs:
I have lost the alignment of the whole string and the color of the [iOrder] section that has been replaced. Is there a way of preserving this formatting?

VBA loop won't stop/doesn't find the "\EndofDoc" marker

I am writing a vba macro to search a word document line by line and trying to find certain names in the document. The looping works fine except for when it gets to the end of the document, it just continues from the top and starts over. Here is the code:
Application.ScreenUpdating = False
Dim i As Integer, Rng As Range
With ActiveDocument.Range
With .Find
.ClearFormatting
.Replacement.ClearFormatting
.Text = "?"
.Replacement.Text = ""
.Forward = True
.Wrap = wdFindStop
.Format = False
.MatchWildcards = True
.Execute
End With
Do While .Find.found
i = i + 1
Set Rng = .Duplicate
Set Rng = Rng.GoTo(What:=wdGoToBookmark, Name:="\line")
MsgBox "Line " & i & vbTab & Rng.Text
If Rng.Bookmarks.Exists("\EndOfDoc") Then Exit Do
.start = Rng.End
.Collapse wdCollapseEnd
.Find.Execute
Loop
End With
Set Rng = Nothing
Application.ScreenUpdating = True
I have also tried this piece of code:
Dim appWD As Word.Application
Dim docWD As Word.Document
Dim rngWD As Word.Range
Dim strDoc As String
Dim intVal As Integer
Dim strLine As String
Dim bolEOF As Boolean
bolEOF = False
' Set strDoc here to include the full
' file path and file name
On Error Resume Next
Set appWD = GetObject(, "Word.Application")
If Err.Number <> 0 Then
Set appWD = CreateObject("Word.Application")
End If
Err.Clear
On Error GoTo 0
strDoc = "c:\KenGraves\Project2\output\master.doc"
Set docWD = appWD.Documents.Open(strDoc)
appWD.Visible = True
docWD.Characters(1).Select
Do
appWD.Selection.MoveEnd Unit:=wdLine, Count:=1
strLine = appWD.Selection.Text
Debug.Print strLine
intVal = LineContainsDescendant(strLine)
If intVal = 1 Then
MsgBox strLine
End If
appWD.Selection.Collapse wdCollapseEnd
If appWD.Selection.Bookmarks.Exists("\EndOfDoc") Then bolEOF = True
Loop Until bolEOF = True
Neither seem to recognize the bookmark ("\EndOfDoc"). It doesn't matter which one gets working. Is it possible that my document does not contain this bookmark?
Not terribly elegant, but this change to one line of your first procedure seems to stop it at the appropriate time. I believe you actually have to insert bookmarks into your document if you want to reference them. They aren't automatically generated.
If i >= ActiveDocument.BuiltInProperties("NUMBER OF LINES") Then Exit Do
Cheers, LC
Unless you have a corrupted document, all Word documents should have the \EndOfDoc bookmark. You can check using simply ActiveDocument.Range.Bookmarks("\EndOfDoc").Exists. If it doesn't then you'll need to supply more details on the version of Word and if possible supply a sample document via Dropbox or the like.
I'm not sure why you're looping to the start of the Word document, when I run the code it works fine. However, if I put a footnote at the end of the document it runs into an endless loop, depending on your documents you may run into additional situations like this where your code fails to handle the document setup.
I would suggest modifying slightly how you check for the end of the document to make your code a bit more robust. I'd still use the bookmark "\EndOfDoc", however I'd check the limits of the range against your current search range.
So at the top of your code declare a range variable and set it to range of the end of the document eg:
Dim rEnd As Range
Set rEnd = ActiveDocument.Bookmarks("\EndOfDoc").Range
and then in your loop, instead of this line:
If Rng.Bookmarks.Exists("\EndOfDoc") Then Exit Do
use this line:
If Rng.End >= rEnd.End Then Exit Do