PDF page count not correct - pdf

I was just wondering why the VBS code in the link below is not counting PDF pages correctly. It seems to undercount, by half or more, the number of pages that actually exist in each PDF.
http://docs.ongetc.com/index.php?q=content/pdf-pages-counting-using-vb-script
Here is the code in case you cannot access the link above:
' By Chanh Ong
'File: pdfpagecount.vbs
' Purpose: count pages in pdf file in folder
Const OPEN_FILE_FOR_READING = 1
Set gFso = WScript.CreateObject("Scripting.FileSystemObject")
Set gShell = WScript.CreateObject ("WSCript.shell")
Set gNetwork = Wscript.CreateObject("WScript.Network")
directory="."
set base=gFso.getFolder(directory)
call listPDFFile(base)
Function ReadAllTextFile(filespec)
Const ForReading = 1, ForWriting = 2
Dim f
Set f = gFso.OpenTextFile(filespec, ForReading)
ReadAllTextFile = f.ReadAll
End Function
function countPage(sString)
Dim regEx, Match, Matches, counter, sPattern
sPattern = "/Type\s*/Page[^s]" ' capture PDF page count
counter = 0
Set regEx = New RegExp ' Create a regular expression.
regEx.Pattern = sPattern ' Set pattern "^rem".
regEx.IgnoreCase = True ' Set case insensitivity.
regEx.Global = True ' Set global applicability.
set Matches = regEx.Execute(sString) ' Execute search.
For Each Match in Matches ' Iterate Matches collection.
counter = counter + 1
Next
if counter = 0 then
counter = 1
end if
countPage = counter
End Function
sub listPDFFile(grp)
Set pf = gFso.CreateTextFile("pagecount.txt", True)
for each file in grp.files
if (".pdf" = lcase(right(file,4))) then
larray = ReadAllTextFile(file)
pages = countPage(larray)
wscript.echo "The " & file.name & " PDF file has " & pages & " pages"
pf.WriteLine(file.name&","&pages)
end if
next
pf.Close
end sub
Thanks

The solution offered (and accepted) will only work for a limited number of PDF documents. Since PDF documents frequently compress large chunks of data including page metadata, crude regular expression searches for "type\s*/page[^s]" will often miss pages.
The only really reliable solution is to very laboriously decompose the PDF document. I'm afraid I don't have a working VBS solution but I have written a Delphi function which demonstrates how to do this (see http://www.angusj.com/delphitips/pdfpagecount.php).
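To illustrate the limitation described above, here is a minimal VBScript sketch. It assumes the PDF's raw text has already been read into a string. The helper names HasObjectStreams and CountVisiblePages and the sample string are hypothetical. The sketch only detects whether object streams (/Type /ObjStm) are present, which hints that page objects may be compressed and therefore invisible to the regex; it does not decompress anything.

```vbscript
Option Explicit

' Hypothetical helper: True when the raw PDF text contains object streams
' (/Type /ObjStm). Page objects stored inside them are compressed and
' cannot be seen by a plain-text regular expression.
Function HasObjectStreams(sRaw)
    Dim re : Set re = New RegExp
    re.Pattern = "/Type\s*/ObjStm"
    HasObjectStreams = re.Test(sRaw)
End Function

' Counts only the page objects that appear uncompressed in the text.
' The trailing [^s] keeps "/Type /Pages" (the page tree) out of the count.
Function CountVisiblePages(sRaw)
    Dim re : Set re = New RegExp
    re.Pattern = "/Type\s*/Page[^s]"
    re.Global = True
    CountVisiblePages = re.Execute(sRaw).Count
End Function

' Made-up sample text standing in for the contents of a PDF file.
Dim sSample
sSample = "1 0 obj << /Type /Pages /Count 2 >> endobj " & _
          "2 0 obj << /Type /Page /Parent 1 0 R >> endobj " & _
          "3 0 obj << /Type /ObjStm /N 1 >> endobj"

WScript.Echo "Visible page objects: " & CountVisiblePages(sSample)
If HasObjectStreams(sSample) Then
    WScript.Echo "Object streams found: the regex count may be too low."
End If
```

If the check reports object streams, treat the regex count as a lower bound rather than the true page count.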

Try this
Function getPdfPgCnt(ByVal sPath)
Dim strTStr
With CreateObject("Adodb.Stream")
.Open
.Charset = "x-ansi"
.LoadFromFile sPath
strTStr = .ReadText(-1)
End With
With (New RegExp)
.Pattern = "Type\s+/Page[^s]"
.IgnoreCase = True
.Global = True
getPdfPgCnt = .Execute(strTStr).Count
End With
If getPdfPgCnt = 0 Then getPdfPgCnt = 1
End Function
'Usage : getPdfPgCnt("C:\1.pdf")
Update #1~#2:
Option Explicit
Private Function getPdfPgCnt(ByVal sPath) 'Returns page count of file on passed path
Dim strTStr
With CreateObject("Adodb.Stream")
.Open
.Charset = "x-ansi"
.LoadFromFile sPath
strTStr = .ReadText(-1)
End With
With (New RegExp)
.Pattern = "Type\s*/Page[^s]"
.IgnoreCase = True
.Global = True
getPdfPgCnt = .Execute(strTStr).Count
End With
If getPdfPgCnt = 0 Then getPdfPgCnt = 1
End Function
'--------------------------------
Dim oFso, iFile
Set oFso = CreateObject("Scripting.FileSystemObject")
'enumerating pdf files in vbs's base directory
For Each iFile In oFso.getFolder(oFso.GetParentFolderName(WScript.ScriptFullName)).Files
If LCase(oFso.GetExtensionName(iFile)) = "pdf" Then WScript.Echo iFile & " has "& getPdfPgCnt(iFile)&" pages."
Next
Set oFso = Nothing
'--------------------------------

How to quickly search selected text on Google in the Microsoft Word Document?

I found an article with a macro that takes the selected text in a Microsoft Word document and searches for it on Google:
https://www.datanumen.com/blogs/quickly-search-selected-text-google-yahoo-bing-word-document/
However, the macro declares "Dim objIE As Object" and then creates an Internet Explorer object, so it cannot run on my computer: my company uninstalled Internet Explorer (IE) many years ago, and the current Microsoft Edge API does not allow this approach.
Sub OpenBrowser(strAddress As String, Menubar As Boolean, nHeight As Long, nWidth As Long, varResizable As Boolean)
Dim objIE As Object
' Create and set the object settings.
Set objIE = CreateObject("InternetExplorer.Application")
With objIE
.Visible = False
.width = nWidth
.height = nHeight
.Menubar = Menubar
.Visible = True
.resizable = varResizable
.Navigate strAddress
End With
End Sub
Sub SearchOnGoogle()
Dim strText As String
Dim strButtonValue As String
strButtonValue = MsgBox("Do you want to search the selected text on Google?", vbYesNo, "Search on Google")
If strButtonValue = vbNo Then
Exit Sub
Else
' Make sure there is text selected.
If Selection.Type <> wdSelectionIP Then
strText = Selection.text
strText = Trim(strText)
Else
MsgBox ("Please select text first!")
Exit Sub
End If
' Search selected text on Google with browser window opened in set size.
OpenBrowser "https://www.google.com/search?num=20&hl=en&q=" & strText, True, 550, 650, True
End If
End Sub
I then wrote the following macro to take the selected word in MS Word and search for it on Google, but it can only search for a single word. If multiple words (such as "Social Capital") are selected when the macro runs, Chrome pops up twice and searches for "Social" and "Capital" separately.
Sub Google_Search_Single_Word()
Dim theTerm As String
Dim strURL As String
Dim arrSites(1)
Dim appPath As String
Dim strText As String
Dim strButtonValue As String
appPath = """C:\Program Files (x86)\Google\Chrome\Application\chrome.exe"""
If Selection.Type = wdSelectionIP Then
theTerm = Selection.Words(1).Text
Else
theTerm = Selection.Text
End If
arrSites(1) = "http://www.google.com/search?hl=en&q=" + theTerm
For i = 0 To 1 Step 1
strURL = arrSites(i)
Shell (appPath & " -url " & strURL)
Next i
End Sub
I then found an Excel VBA macro at https://excelchamps.com/blog/vba-code-search-google-chrome/, which also works in MS Word. However, this method pops up an input box for the search terms. If you don't type anything into it, it still opens Google Chrome automatically, which is not user-friendly.
Sub GoogleSearch()
Dim chromePath As String
Dim search_string As String
Dim query As String
query = InputBox("Please enter the keywords", "Google Search")
search_string = query
search_string = Replace(search_string, " ", "+")
chromePath = "C:\Program Files (x86)\Google\Chrome\Application\chrome.exe"
Shell (chromePath & " -url http://www.google.com/search?hl=en&q=" & search_string)
End Sub
I'm thankful for the above contributions from different experts. Does anyone know how I can edit one of the above versions to make a macro that quickly searches for the selected text on Google from a Microsoft Word document?
Here's a version of the Google_Search_Single_Word Sub that can handle multiple words. It uses the helper function URLEncode, and you will need to include the Microsoft ActiveX Data Objects library in your project (Tools > References. If there are multiple versions available, go with the highest version number).
URLEncode is from this answer.
Sub Google_Search_Selected_Text()
Dim theTerm As String
Dim strURL As String
Dim appPath As String
appPath = """C:\Program Files (x86)\Google\Chrome\Application\chrome.exe"""
' Encode the selection so that spaces and punctuation survive the command line.
theTerm = URLEncode(Selection.Text)
strURL = "http://www.google.com/search?hl=en&q=" & theTerm
' Launch Chrome once with the encoded search URL.
Shell (appPath & " -url " & strURL)
End Sub
Public Function URLEncode( _
ByVal StringVal As String, _
Optional SpaceAsPlus As Boolean = False _
) As String
Dim bytes() As Byte, b As Byte, i As Integer, space As String
If SpaceAsPlus Then space = "+" Else space = "%20"
If Len(StringVal) > 0 Then
With New ADODB.Stream
.Mode = adModeReadWrite
.Type = adTypeText
.Charset = "UTF-8"
.Open
.WriteText StringVal
.Position = 0
.Type = adTypeBinary
.Position = 3 ' skip BOM
bytes = .Read
End With
ReDim Result(UBound(bytes)) As String
For i = UBound(bytes) To 0 Step -1
b = bytes(i)
Select Case b
Case 97 To 122, 65 To 90, 48 To 57, 45, 46, 95, 126
Result(i) = Chr(b)
Case 32
Result(i) = space
Case 0 To 15
Result(i) = "%0" & Hex(b)
Case Else
Result(i) = "%" & Hex(b)
End Select
Next i
URLEncode = Join(Result, "")
End If
End Function
Go to Google, type in two words (e.g. dog food), and look at the resulting URL. Notice that the space between dog and food has been replaced with a + symbol (https://www.google.com/search?q=dog+food). This is called URL escaping and is necessary for the URL to be properly interpreted. The original version of the Sub does not escape the URL, so Google only picks up the first word.
The version I posted escapes the URL so that spaces are converted (to %20 by default, or to + when SpaceAsPlus is True), and it also handles other characters that need to be escaped, such as ! -> %21, ? -> %3F, etc.
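As a dependency-free illustration of the escaping described above, here is a minimal VBScript sketch that percent-encodes plain ASCII text without the ADODB reference. The function name EncodeAscii is hypothetical, and the sketch assumes the input contains only ASCII characters; for anything else, the UTF-8 based URLEncode above is the safer choice.

```vbscript
Option Explicit

' Hypothetical helper: percent-encodes an ASCII-only string.
' Letters, digits and - . _ ~ pass through; everything else becomes %XX.
Function EncodeAscii(s)
    Dim i, ch, code, out
    out = ""
    For i = 1 To Len(s)
        ch = Mid(s, i, 1)
        code = Asc(ch)
        If (code >= 48 And code <= 57) Or (code >= 65 And code <= 90) Or _
           (code >= 97 And code <= 122) Or InStr("-._~", ch) > 0 Then
            out = out & ch
        Else
            ' Pad to two hex digits so that e.g. Tab (9) becomes %09.
            out = out & "%" & Right("0" & Hex(code), 2)
        End If
    Next
    EncodeAscii = out
End Function

WScript.Echo EncodeAscii("Social Capital?")   ' Social%20Capital%3F
```

The function body uses only built-ins that also exist in VBA, so it could be pasted into a Word module with WScript.Echo replaced by Debug.Print.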

How to store data from a file in memory for reuse?

I have VBA code which runs when a mail is received.
I want to forward a template to the first address found in the mail. I run a regex to find the email address in the mail, read an HTML file (the template) and forward it to that address.
Outlook shuts down after a few minutes. I think it is a performance problem. I want to optimize the code and, if possible, avoid reading the template again on every execution. Is it possible to store it in a global variable?
Sub GetEmailAndForward(Item As Outlook.MailItem)
' RegExp
Dim mailRegExp As RegExp
' File
Dim FileTemplate As Integer
Dim FileProperties As Integer
' Properties
Dim splitProperty() As String
' Email
Dim DataLine As String
Dim emails As MatchCollection
Dim email As String
Dim forward As Outlook.MailItem
Dim body As String
Dim forwardText As String
' Path
Dim fileTemplatePath As String
Dim dirPath As String
Dim filePropertyPath As String
dirPath = "C:\OutlookVBA"
Set mailRegExp = New RegExp
With mailRegExp
.Pattern = "[\_]*([a-z0-9]+(\.|\_*)?)+@([a-z][a-z0-9\-]+(\.|\-*\.))+[a-z]{2,6}"
.Global = False
.IgnoreCase = True
End With
' Get the template
fileTemplatePath = dirPath & "\template.html"
' Get the email body to analyse
body = Item.body
' Get the first email found
If mailRegExp.Test(body) Then
Set emails = mailRegExp.Execute(body)
If emails.Count > 0 Then
email = emails.Item(0)
Set forward = Item.forward
FileTemplate = FreeFile()
Open fileTemplatePath For Input As #FileTemplate
While Not EOF(FileTemplate)
Line Input #FileTemplate, DataLine
forwardText = forwardText & DataLine
Wend
forward.BodyFormat = olFormatHTML
forward.HTMLBody = forwardText & forward.HTMLBody
Close #FileTemplate
If Not IsEmpty(email) Then
forward.Recipients.Add email
forward.subject = "RE:" & Item.subject
forward.Send
End If
End If
End If
End Sub
You can use something like this - the function will only read from the file on the first call, and after that will use the text stored in the static variable:
Function GetForWardText(f As String) As String
Static rv As String '<< value is maintained between calls
If Len(rv) = 0 Then
rv = CreateObject("scripting.filesystemobject"). _
opentextfile(f, 1).readall()
End If
GetForWardText = rv
End Function
In your code, remove this:
FileTemplate = FreeFile()
Open fileTemplatePath For Input As #FileTemplate
While Not EOF(FileTemplate)
Line Input #FileTemplate, DataLine
forwardText = forwardText & DataLine
Wend
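and replace it with the line below (the later Close #FileTemplate line can be deleted as well, since the file is no longer opened here):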
forwardText = GetForWardText(fileTemplatePath)
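The same read-once idea can be sketched in VBScript, which has no Static keyword, so a script-level variable takes its place (a module-level variable would play that role in VBA). The name GetCachedText and the use of a Scripting.Dictionary keyed by path are assumptions made for illustration, not part of the answer above.

```vbscript
Option Explicit

' Script-level cache: file path -> file contents.
Dim gCache : Set gCache = CreateObject("Scripting.Dictionary")

' Hypothetical helper: reads the file on the first call for a given path
' and serves every later call from memory.
Function GetCachedText(sPath)
    Dim ts
    If Not gCache.Exists(sPath) Then
        Set ts = CreateObject("Scripting.FileSystemObject").OpenTextFile(sPath, 1)
        ' ReadAll raises an error on an empty file, so guard against that.
        If ts.AtEndOfStream Then
            gCache.Add sPath, ""
        Else
            gCache.Add sPath, ts.ReadAll()
        End If
        ts.Close
    End If
    GetCachedText = gCache(sPath)
End Function

' Usage: the second call does not touch the disk.
' The path is the one from the question; adjust as needed.
Dim sFirst, sSecond
sFirst = GetCachedText("C:\OutlookVBA\template.html")
sSecond = GetCachedText("C:\OutlookVBA\template.html")
```

Keying the cache by path means several templates can be cached side by side, at the cost of never noticing when a file changes on disk while the host keeps running.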

VBSCRIPT to read multi files into an array then write them to a single file

As the title suggests, I have three separate text files that I want to join together in a certain order (i.e., append file1, file2, file3 (in order) to make file4).
From what I've read, to do this with VBScript would require the FileSystemObject to read the files into an array then write the contents to the new file (I am open to whatever works with VBScript if suggested)
I'm having the following issues with my code:
1) The script runs, but produces no data
2) Once it runs, it is imperative that the files are appended to the output file in the order of the array (per line), as described above.
Here is the Array example I'm working with :
CODE
Const ForReading = 1
Dim arrServiceList(2)
arrServiceList(0) = strText1
arrServiceList(1) = strText2
arrServiceList(2) = strText3
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objOutputFile = objFSO.CreateTextFile("output.txt")
Set objTextFile1 = objFSO.OpenTextFile("C:\Users\95540\Desktop\Sample1.txt", ForReading)
Set objTextFile2 = objFSO.OpenTextFile("C:\Users\95540\Desktop\Sample2.txt", ForReading)
Set objTextFile3 = objFSO.OpenTextFile("C:\Users\95540\Desktop\Sample3.txt", ForReading)
strText1 = objTextFile1.ReadAll
objTextFile1.Close
strText2 = objTextFile2.ReadAll
objTextFile2.Close
strText3 = objTextFile3.ReadAll
objTextFile3.Close
objOutputFile.WriteLine arrServiceList(0)
objOutputFile.Close
====================
UPDATE TO MY CODE 5-15-15 (Description of corrections in below post)
CODE
Const ForReading = 1
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objOutputFile = objFSO.CreateTextFile("output.txt")
Set objTextFile1 = objFSO.OpenTextFile("C:\Users\Brill\Desktop\Grab1.txt", ForReading)
Set objTextFile2 = objFSO.OpenTextFile("C:\Users\Brill\Desktop\Grab2.txt", ForReading)
Set objTextFile3 = objFSO.OpenTextFile("C:\Users\Brill\Desktop\Grab3.txt", ForReading)
Do While objTextFile1.AtEndOfStream <> True
Do While objTextFile2.AtEndOfStream <> True
Do While objTextFile3.AtEndOfStream <> True
strText1 = objTextFile1.ReadLine
objOutputFile.Write strText1 & vbTab
strText2 = objTextFile2.ReadLine
objOutputFile.Write strText2 & vbTab
strText3 = objTextFile3.ReadLine
objOutputFile.Write strText3 & vbTab & vbCrLf
Loop
Loop
Loop
objOutputFile.Close
objTextFile1.Close
objTextFile2.Close
objTextFile3.Close
The code below works. There were two problems with your script:
1. You were assigning the variables to the array before you had populated them.
2. You were not writing all the elements of the array.
Const ForReading = 1
Dim arrServiceList(2)
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objOutputFile = objFSO.CreateTextFile("output.txt")
Set objTextFile1 = objFSO.OpenTextFile("C:\Users\95540\Desktop\Sample1.txt", ForReading)
Set objTextFile2 = objFSO.OpenTextFile("C:\Users\95540\Desktop\Sample2.txt", ForReading)
Set objTextFile3 = objFSO.OpenTextFile("C:\Users\95540\Desktop\Sample3.txt", ForReading)
strText1 = objTextFile1.ReadAll
objTextFile1.Close
strText2 = objTextFile2.ReadAll
objTextFile2.Close
strText3 = objTextFile3.ReadAll
objTextFile3.Close
arrServiceList(0) = strText1
arrServiceList(1) = strText2
arrServiceList(2) = strText3
objOutputFile.WriteLine arrServiceList(0)
objOutputFile.WriteLine arrServiceList(1)
objOutputFile.WriteLine arrServiceList(2)
objOutputFile.Close
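The same in-order append can also be written as a loop over the list of input paths, which avoids one variable per file. This is only a sketch: the file names are placeholders, and it writes each file's contents as-is, without adding a line break between files.

```vbscript
Option Explicit
Const ForReading = 1

Dim oFS : Set oFS = CreateObject("Scripting.FileSystemObject")

' Placeholder file names; the array order is the order in the output.
Dim aPaths : aPaths = Array("Sample1.txt", "Sample2.txt", "Sample3.txt")

Dim tsOut : Set tsOut = oFS.CreateTextFile("output.txt", True)
Dim sPath, tsIn
For Each sPath In aPaths
    Set tsIn = oFS.OpenTextFile(sPath, ForReading)
    ' ReadAll fails on an empty file, so skip files with no content.
    If Not tsIn.AtEndOfStream Then tsOut.Write tsIn.ReadAll()
    tsIn.Close
Next
tsOut.Close
```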
Merging/zipping more than one collection (e.g. some 'column files') into one collection (e.g. a 'table file') is a standard problem with a standard solution strategy (which doesn't involve reading "the files into an array" at all).
This demo code:
Option Explicit
Dim goFS : Set goFS = CreateObject("Scripting.FileSystemObject")
Dim oFZip : Set oFZip = New cFZip
oFZip.m_aIFSpecs = Split("..\data\a.txt ..\data\b.txt ..\data\c.txt")
oFZip.zip "..\data\abc.txt"
WScript.Echo goFS.OpenTextFile("..\data\abc.txt").ReadAll()
Class cFZip
Public m_aIFSpecs ' array of input files
Function zip(sOFSpec)
Dim tsOut : Set tsOut = goFS.CreateTextFile(sOFSpec)
Dim nUBFiles : nUBFiles = UBound(m_aIFSpecs)
ReDim aFiles(nUBFiles)
Dim f
For f = 0 To nUBFiles
Set aFiles(f) = goFS.OpenTextFile(m_aIFSpecs(f))
Next
Dim bDone
Do
Redim aData(UBound(m_aIFSpecs))
bDone = True
For f = 0 To nUBFiles
If Not aFiles(f).AtEndOfStream Then
bDone = False
aData(f) = aFiles(f).ReadLine()
End If
Next
If Not bDone Then tsOut.WriteLine Join(aData, ",")
Loop Until bDone
For f = 0 To nUBFiles
aFiles(f).Close
Next
tsOut.Close
End Function
End Class
output:
1,10,100
2,20,200
3,30,300
4,,400
,,500
shows the basic approach. I use a Class to make experiments and specific adaptations (e.g. delimiter, quoting, ...) easier.

validation of comma and other characters

This is my sample file:
col1,col2,colx,col3,col4,col5
1,A,,AA,X,Y
2,B,,,*/;wBB,D --invalid or bad
3,E,,,....;*()//FF,Y --invalid or bad
4,G,,,.,;'()XX,P --invalid or bad
5,P,Kk,,...(),D
After following the instructions from here, I have
2,B,,,BB,D
3,E,,,FF,Y
4,G,,,XX,P
These rows are the bad data in the CSV file. My task is to validate the records by splitting each one into columns and checking for an extra delimiter; if one is found, it should be removed.
I tried this:
Sub File validation()
Dim goFS: Set goFS = CreateObject("Scripting.FileSystemObject") ' (2)
Dim tsIn: Set tsIn = goFS.OpenTextFile("....bad.csv")
Do Until tsIn.AtEndOfStream
sLine = tsIn.ReadLine()
If sLine = EOF then exit else Loop ' I get a error here
Dim str : strconv(sLine) 'error
End Sub
Function strConv(ByVal str As String) As String
Dim objRegEx As Object, allMatches As Object
Set objRegEx = CreateObject("VBScript.RegExp")
With objRegEx
.MultiLine = False
.IgnoreCase = False
.Global = True
.Pattern = ",,,"
End With
strConv = objRegEx.Replace(str, ",,")
End Function
I need a solution, with or without regex, that validates this file and writes the result back to the source file.
I am very new to VBA scripting. Can somebody help me?
After validation, I need the file to look something like this:
col1,col2,colx,col3,col4,col5
1,A,,AA,X,Y
2,B,,BB,D,
3,E,,FF,Y,
4,G,,XX,P,
5,P,Kk,,,D
Are you saying that rows without a value for colX are "bad"? It appears they just have no value. Regardless, you can check for a value in colX easily enough.
Do While Not tsIn.AtEndOfStream
' Read and split the line...
a = Split(tsIn.ReadLine, ",")
' Check for a value in "colX"...
If Len(Trim(a(2))) = 0 Then
' Not sure what you want to do here. Replace it with another value?
a(2) = "0"
End If
' Write the line to another file...
tsOut.WriteLine Join(a, ",")
Loop
An 'experimental function' (see here) to work out the RegExp for converting bad to good lines:
Function demoRegExp()
demoRegExp = 0
Dim aTests : aTests = Array( _
"2,B,,,BB,D", "2,B,,BB,D," _
, "3,E,,,FF,Y", "3,E,,FF,Y," _
, "field,no comma here,,,what,ever", "field,no comma here,,what,ever," _
)
Dim sC : sC = ","
Dim sF : sF = "[^,]+"
Dim r : Set r = New RegExp
r.Pattern = Join(Array("^(", sF, sC, sF, sC, sC, ")(", sC, ")(", sF, sC, sF, ")$"), "")
WScript.Echo "pattern:", qq(r.Pattern)
Dim i
For i = 0 To UBound(aTests) Step 2
Dim sInp : sInp = aTests(i + 0)
Dim sExp : sExp = aTests(i + 1)
Dim sAct : sAct = r.Replace(sInp, "$1$3$2")
WScript.Stdout.Write qq(sInp) & " => " & qq(sAct)
If sAct = sExp Then
WScript.Echo " ok"
Else
WScript.Echo " Fail - exp:", qq(sExp)
End If
Next
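End Function
' qq() is not defined in this snippet; it is assumed here to wrap a string in double quotes, as the output suggests.
Function qq(s)
qq = """" & s & """"
End Function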
output:
pattern: "^([^,]+,[^,]+,,)(,)([^,]+,[^,]+)$"
"2,B,,,BB,D" => "2,B,,BB,D," ok
"3,E,,,FF,Y" => "3,E,,FF,Y," ok
"field,no comma here,,,what,ever" => "field,no comma here,,what,ever," ok

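Once the pattern is worked out, applying it to a whole file could look like the sketch below. The file names bad.csv and fixed.csv are placeholders, and the sketch assumes the stray characters have already been removed, as in the test strings above; lines that do not match the pattern are copied unchanged.

```vbscript
Option Explicit
Const ForReading = 1

Dim oFS : Set oFS = CreateObject("Scripting.FileSystemObject")

' Pattern taken from the demo output above: two fields, an empty field,
' one surplus comma, then two more fields.
Dim r : Set r = New RegExp
r.Pattern = "^([^,]+,[^,]+,,)(,)([^,]+,[^,]+)$"

' Placeholder file names.
Dim tsIn : Set tsIn = oFS.OpenTextFile("bad.csv", ForReading)
Dim tsOut : Set tsOut = oFS.CreateTextFile("fixed.csv", True)

Do Until tsIn.AtEndOfStream
    ' "$1$3$2" moves the surplus comma to the end of the line.
    tsOut.WriteLine r.Replace(tsIn.ReadLine(), "$1$3$2")
Loop

tsIn.Close
tsOut.Close
```

Writing to a second file and replacing the source only after the result has been checked is safer than overwriting the input while it is still being read.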
Read and store data using vb script

I have a file with following content
aaaaaaaaa filename1.txt
bbbbbbbbbb filename2.dat
I want to read this file and store the data in the relevant file separately.
eg :- aaaaa ---> filename1.txt
Could someone please tell me the way to do this? If you have any worked through examples, that would be a real help!
Sub Main
Set ctlpath = "\\Download_Directory\CA.ctl"
Set subDirectory = "AUB"
Set MkrExtention = ".mkr"
ReadCtl(ctlpath,subDirectory,MkrExtention )
End Sub
Function ReadCtl(ctlFileName,ctlSubDirectory,MarkerFileExtension)
Const ForReading = 1
Const ForWriting = 2
Dim lineData,hashValue,fileName,fileToWrite
Set objFSO = CreateObject("Scripting.FileSystemObject")
If objFSO.FileExists(ctlFileName) Then
Set objFile = objFSO.OpenTextFile(ctlFileName, ForReading)
Do Until objFile.AtEndOfStream
lineData = objFile.ReadLine
hashValue =Split(lineData," ") (0)
fileName =Split(lineData," ") (1)
WriteFileText("\\Download"+ctlSubDirectory+"\"+fileName+MarkerFileExtension,hashValue)
Loop
objFile.Close
objFSO.Close
End If
ReadCtl = True
End Function
Function WriteFileText(sFilePath, sText)
Dim objFSO1 'As FileSystemObject
Dim objTextFile 'As Object
Set objFSO1 = CreateObject("Scripting.FileSystemObject")
Set objTextFile = objFSO1.CreateTextFile(sFilePath, True)
' Write hash data in to the file .mkr file
objTextFile.Write (sText)
objTextFile.Close
objFSO1.Close
End Function
It has been a long time since you asked this, and it surprises me that nobody answered, because it is simple enough.
I hope you are still around; in any case, here it is for reference. This is of course not a real hash as in, e.g., Ruby, but I kept enough of your script so that you can compare.
ReadCtl("file2hash1.txt")
function ReadCtl(ctlFileName)
const ForReading = 1, ForWriting = 2, ForAppending = 8, createIfNeeded = true
set oFso = createObject("Scripting.FileSystemObject")
Set oRegExpre = new RegExp
oRegExpre.Global = true
oRegExpre.Pattern = "(.*) +(.*)"
if oFso.FileExists(ctlFileName) then
set objFileRead = oFso.OpenTextFile(ctlFileName, ForReading)
do until objFileRead.AtEndOfStream
lineData = objFileRead.ReadLine
hashValue = oRegExpre.Replace(lineData,"$1")
fileName = oRegExpre.Replace(lineData,"$2")
if oFso.fileExists(fileName) then
modus = ForAppending
else
modus = ForWriting
end if
set objFileWrite = oFso.OpenTextFile(fileName, modus, createIfNeeded)
objFileWrite.writeLine hashValue
objFileWrite.close
loop
objFileRead.close
end if
set oFso = nothing
end function
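The regular expression above splits each line at the last run of spaces. A sketch of the same split without a regex, using InStrRev, is shown below; the helper name SplitAtLastSpace is hypothetical, and it assumes the file name itself contains no spaces.

```vbscript
Option Explicit

' Hypothetical helper: returns a two-element array (data, file name),
' splitting the line at its last space.
Function SplitAtLastSpace(sLine)
    Dim p : p = InStrRev(sLine, " ")
    If p = 0 Then
        ' No space at all: treat the whole line as data with no file name.
        SplitAtLastSpace = Array(sLine, "")
    Else
        ' RTrim drops any extra spaces left between the two parts.
        SplitAtLastSpace = Array(RTrim(Left(sLine, p - 1)), Mid(sLine, p + 1))
    End If
End Function

Dim aParts : aParts = SplitAtLastSpace("aaaaaaaaa filename1.txt")
WScript.Echo aParts(0) & " -> " & aParts(1)   ' aaaaaaaaa -> filename1.txt
```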