Add text to an existing PDF document by appending something after the PDF content

I would like to "overlay" text onto an existing PDF document by appending something at the end of the PDF file (after %%EOF). It is very important that nothing before the %%EOF is modified.
Is it even possible to do this?
How can I "generate" what to append after %%EOF for a given text? The technology doesn't really matter; once I have my "blob" I will just append it myself.
Thanks a lot!

How can I "generate" what to append after %%EOF for a given text? The technology doesn't really matter; once I have my "blob" I will just append it myself.
That "blob" depends on the PDF you append it to. Essentially you have to parse the original PDF and find the page object of the page to overlay. Then you append, as an incremental update, a new annotation or content stream with the overlay text, a copy of the page object that references the new annotation or content stream, and a new cross reference section with a trailer that points back to the previous one. In general you do that with a PDF library for your preferred programming language.
In a comment to your question you asked for example code to run and see the before/after and reverse-engineer it.
In the following example I use Java and the iText 7 PDF library (current development head but any 7.1.x version should do):
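// Package names as of iText 7.1.x; the try block itself belongs inside a method.
import com.itextpdf.kernel.colors.DeviceRgb;
import com.itextpdf.kernel.geom.Rectangle;
import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfReader;
import com.itextpdf.kernel.pdf.PdfWriter;
import com.itextpdf.kernel.pdf.StampingProperties;
import com.itextpdf.layout.Document;
import com.itextpdf.layout.borders.SolidBorder;
import com.itextpdf.layout.element.Paragraph;
import com.itextpdf.layout.property.TextAlignment;
import com.itextpdf.layout.property.VerticalAlignment;

try (   PdfReader pdfReader = new PdfReader(SOURCE_PDF);
        PdfWriter pdfWriter = new PdfWriter(TARGET_PDF);
        // Append mode keeps the original bytes and writes an incremental update
        PdfDocument pdfDocument = new PdfDocument(pdfReader, pdfWriter, new StampingProperties().useAppendMode());
        Document document = new Document(pdfDocument)
) {
    // Leave the new content stream uncompressed so it stays readable
    pdfWriter.setCompressionLevel(0);
    Paragraph paragraph = new Paragraph("Hello! This text is added for Fratt");
    paragraph
        .setWidth(100)
        .setBorder(new SolidBorder(new DeviceRgb(0f, 0f, 0.6f), 3))
        .setRotationAngle(Math.PI / 4);
    // Center the paragraph on the crop box of the first page
    Rectangle box = pdfDocument.getFirstPage().getCropBox();
    document.showTextAligned(paragraph,
        (box.getLeft() + box.getRight()) / 2,
        (box.getTop() + box.getBottom()) / 2,
        1,
        TextAlignment.CENTER,
        VerticalAlignment.MIDDLE,
        0);
}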
(ShowTextAtPosition test testAddCenteredBorderedParagraph)
This adds a rotated, framed text to the first page of the source document.
In the case of my example document, the following "blob" is added after the original %%EOF:
16 0 obj
<</CreationDate(D:20060808104513+02'00')/Creator(TeX)/ModDate(D:20201221183247+01'00')/PTEX.Fullbanner(This is pdfeTeX, Version 3.141592-1.21a-2.2 (Web2C 7.5.4) kpathsea version 3.5.4)/Producer(pdfeTeX-1.21a; modified using iText® 7.1.14-SNAPSHOT ©2000-2020 iText Group NV \(AGPL-version\))>>
endobj
19 0 obj
<</BaseFont/Helvetica/Encoding/WinAnsiEncoding/Subtype/Type1/Type/Font>>
endobj
1 0 obj
<</Font<</F1 19 0 R/F73 6 0 R/F8 9 0 R>>/ProcSet[/PDF /Text]>>
endobj
2 0 obj
<</Contents[18 0 R 3 0 R 17 0 R]/MediaBox[0 0 595.2756 841.8898]/Parent 10 0 R/Resources 1 0 R/Type/Page>>
endobj
17 0 obj
<</Length 568>>stream
Q
q
0.70711 0.70711 -0.70711 0.70711 358.89 -66.87 cm
q
0 0 0.6 rg
251.62 401.57 m
351.62 401.57 l
354.62 404.57 l
248.62 404.57 l
251.62 401.57 l
f
Q
q
0 0 0.6 rg
351.62 401.57 m
351.62 374.93 l
354.62 371.93 l
354.62 404.57 l
351.62 401.57 l
f
Q
q
0 0 0.6 rg
351.62 374.93 m
251.62 374.93 l
248.62 371.93 l
354.62 371.93 l
351.62 374.93 l
f
Q
q
0 0 0.6 rg
251.62 374.93 m
251.62 401.57 l
248.62 404.57 l
248.62 371.93 l
251.62 374.93 l
f
Q
q
BT
/F1 12 Tf
255.94 391.23 Td
(Hello! This text is)Tj
( )Tj
ET
Q
q
BT
/F1 12 Tf
262.27 377.91 Td
(added for Fratt)Tj
ET
Q
Q
endstream
endobj
18 0 obj
<</Length 2>>stream
q
endstream
endobj
xref
1 2
0000009898 00000 n
0000009976 00000 n
16 4
0000009500 00000 n
0000010098 00000 n
0000010715 00000 n
0000009809 00000 n
trailer
<</ID [<98bc0d0e9347d0a066ab140ebd9ce62c><fa0dda3a13b826a6ecbd129bb048a3d0>]/Info 16 0 R/Prev 9003/Root 15 0 R/Size 20>>
%iText-7.1.14-SNAPSHOT
startxref
10764
%%EOF
Because of the pdfWriter.setCompressionLevel(0) in the code, the content stream is not compressed and you can read and understand it easily.
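To make the structure of such a blob more concrete, here is a minimal Python sketch that assembles an incremental update from objects that are already serialized. It is not what iText does internally. The function name build_incremental_update and its parameters are assumptions made for illustration, and the caller still has to obtain the original file length, the previous startxref value, the /Root object number and the new /Size by parsing the original PDF.

```python
def build_incremental_update(original_length, objects, prev_startxref, root_ref, size):
    """Return the bytes to append after the original %%EOF.

    objects maps object number -> serialized object body (bytes);
    generation 0 is assumed for every object (an assumption of this sketch).
    """
    out = bytearray(b"\n")  # make sure the new objects start on a fresh line
    offsets = {}
    for num in sorted(objects):
        # Offsets are counted from the start of the whole file
        offsets[num] = original_length + len(out)
        out += b"%d 0 obj\n" % num + objects[num] + b"\nendobj\n"

    xref_offset = original_length + len(out)
    out += b"xref\n"
    nums = sorted(offsets)
    start = 0
    while start < len(nums):
        # One subsection per run of consecutive object numbers
        end = start
        while end + 1 < len(nums) and nums[end + 1] == nums[end] + 1:
            end += 1
        out += b"%d %d\n" % (nums[start], end - start + 1)
        for num in nums[start:end + 1]:
            out += b"%010d 00000 n \n" % offsets[num]  # each entry is 20 bytes
        start = end + 1

    # /Prev links the new cross reference section to the previous one
    out += b"trailer\n<</Size %d/Root %d 0 R/Prev %d>>\n" % (size, root_ref, prev_startxref)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_offset
    return bytes(out)
```

The size argument must be one greater than the highest object number in the updated file, and prev_startxref is the number that follows startxref at the end of the original file (9003 for the example document above, as the /Prev entry of the appended trailer shows).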

Related

Assigning an ExtGState to a stroke in a PDF does not work

I am trying to assign an ExtGState object to a stroke in an image stream in a PDF. The ExtGState should set the blend mode of the stroke. But no matter what I try, it does not work, and the PDF specification does not help.
I hope somebody here knows what to do.
Here is my ExtGState Object:
5 0 obj
<< /Type /ExtGState
/BM Multiply
>>
endobj
My Proc Set of the Page:
4 0 obj
<< /ProcSet [/PDF /Text]
/ExtGState << /GS1 5 0 R
>>
>>
endobj
And finally the image stream
6 0 obj
<< >>
stream
3 w
0 0 0 RG
1 J
1 j
178 2658 m
310 2322 l
S
10 w
0.13725490196078433 0.4196078431372549 0.5568627450980392 RG
1 J
1 j
/GS1 gs
[3 5] 6 d
152 2423 m
400 2600 l
S
endstream
endobj
I am using /GS1 gs in order to assign the ExtGState Object to my second stroke. The first stroke is just for checking, if the blend mode works (which does not).
Here you can find my whole pdf: https://pastebin.com/nwGBb7vB
It is supposed to look like this:
You have a syntax error in your graphics state dictionary:
5 0 obj
<< /Type /ExtGState
/BM Multiply
>>
endobj
The Multiply needs to be a PDF name object. In particular it has to start with a slash:
5 0 obj
<< /Type /ExtGState
/BM /Multiply
>>
endobj
After this change the blend mode takes effect.
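A missing slash like this is easy to overlook in hand-written PDF source. The following Python sketch is a rough heuristic for spotting it, not a PDF parser. The function name find_bad_blend_modes is made up for illustration, and the check assumes that a /BM value is written directly as a name or an array, so an indirect reference would be reported as well.

```python
import re

def find_bad_blend_modes(pdf_bytes):
    """Return the tokens following /BM that are neither a name nor an array."""
    bad = []
    # (?![A-Za-z0-9]) avoids matching longer keys that merely start with /BM
    for match in re.finditer(rb"/BM(?![A-Za-z0-9])\s*([^\s>]*)", pdf_bytes):
        token = match.group(1)
        if not token.startswith((b"/", b"[")):
            bad.append(token.decode("latin-1"))
    return bad

broken = b"5 0 obj\n<< /Type /ExtGState\n/BM Multiply\n>>\nendobj\n"
fixed = b"5 0 obj\n<< /Type /ExtGState\n/BM /Multiply\n>>\nendobj\n"
print(find_bad_blend_modes(broken))  # ['Multiply']
print(find_bad_blend_modes(fixed))   # []
```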

How can I fix the damage in PDF files requested using MSXML2.XMLHTTP?

Friends, this is my first question here. I've been facing some problems when downloading a PDF buffer using MSXML2.XMLHTTP. I've been using Genexus to do so, but I also tried it in pure Visual FoxPro. The problem is that when I assign the ResponseText to a string variable, some characters are replaced by question marks; the same happens when I write the ResponseText to a PDF or TXT file. The object created with MSXML2.XMLHTTP.6.0 does not allow using the ResponseBody property. Any thoughts on how I could solve this using MSXML2.XMLHTTP? Thanks.
oHTTP = CreateObject("MSXML2.XMLHTTP.6.0")
oHTTP.Open("GET", 'https://homologacao.plugboleto.com.br/api/v1/boletos/impressao/lote/NIKLfYBWz',.F.)
oHTTP.setRequestHeader("content-type", "application/pdf")
oHTTP.Send()
? oHTTP.responseText
I've received something like the following (full of question marks):
%PDF-1.4 %??2 0 obj <</ColorSpace/DeviceRGB/Subtype/Image/Height 38/Filter/DCTDecode/Type/XObject/Width 149/BitsPerComponent 8/Length 2619>>stream ???JFIF H H ??C  !"$"$??C?? & ?" ?? ??? } !1AQa"q2???#B??R?$3br? %&'()456789:CDEFGHIJSTUVWXYZcdefghijstuvwxyz???????????????????????????????????????????????????? ??? w !1AQaq"2?B????#3R??$4?&'()56789:CDEFGHIJSTUVWXYZcdefghijstuvwxyz??????????????????????????????????????????????????? ? ?h???8OO????a&??G?3? ?p1???|b?o?? ??x?_???%??E?en9??T???T>.????JG??rx??????????h?w????????:?!?????????jlm?Tn???????u??? ??Ey?PA?? (?? (?? (?? (??>B???;.??3?e??J??~?F??? y,s??i???#?m=kw???? ?[?K#????vR#G??^$?????k?[??BSu??#???M??????? _??Z?Fo??????/??*x?¾ mn??{)???80??s]W?x? ??+??k??=????????8 ?|D?c?j???h???$?8???:c??(???M/?Ze??;O?[?J????? '?~/j~!???n?urm???1^ITl;?3?%[?b??~?&<=u?Y\x??W6?¬$2?q?1?;??qc??_??qk>?&?v?????,??F?{??x???s??????{?k????r8.??<P?|,????q?]I?]?e???p;??/?W?x??)???A?????&)??dc,?d7?J?s??m?>???????!??9Va?? c???Zv??x+?b?wd??f?8a????????,6????????x?? ?-????<9F???????[~$?{??o???X??????y?ZgQ?#8??ox;? ???|??mZ?? I~a?k ~P ?? j??? '?c??4?F??l??$?8???(?'?"?.?????,????9V?????d???????UU)??? ???o?&???4?7?Z?? g???y?
W[?????d?Q$?#??^mZ???B Z(??QE ?Q#Q#Q#Q#Q#??endstream endobj 6 0 obj <</Filter/FlateDecode/Length 846>>stream x??V???+?q??z???'U./h???{???d?U??xf??PQ?O???unA???x1?0 50]#?\?T?y?B?s?9B? ? 2|????C???2t????k?U??]??]{? ?????s?AH??????h?"w????? f?????i??? ??>?9?8?#??"??G?$???<??0???S?2??sn?n??^?5?\FN?o1?4?~4~??Qe=&?T[???????Z??x??????k?????0z'#?;?'a??a?f~?q?~8ZH~?m???????Mm?p?#hh{????W7??????
?8?Olk'?A|?[???P?5?????uGxRr#?pw<$y?n??kD ???0??ih??9?5v??0?_}iG?Dq?8??_U??5a?????k????d???M??2???C(??;t2uA]z ??6A??o?t?}d????[?<;??R?iO8n??f???40???S?aVX????Y?p2N?eq]N?VeE?>??/V0?]MV?&???.aZ-???z2???????????8o??3?S????????gf??B?'6??]?J endstream endobj 8 0 obj<</Contents 6 0 R/Type/Page/Resources<</ProcSet [/PDF /Text /ImageB /ImageC /ImageI]/Font<</F1 3 0 R/F2 4 0 R>>/XObject<</Xf1 1 0 R/Xf2 5 0 R/img0 2 0 R>>>>/Parent 7 0 R/MediaBox[0 0 595 842]>>endobj 3 0 obj<</Subtype/Type1/Type/Font/BaseFont/Helvetica-Bold/Encoding/WinAnsiEncoding>>endobj 4 0 obj<</Subtype/Type1/Type/Font/BaseFont/Helvetica/Encoding/WinAnsiEncoding>>endobj 5 0 obj <</Subtype/Form/Filter/FlateDecode/Type/XObject/Matrix [1 0 0 1 0 0]/FormType 1/Resources<</ProcSet [/PDF /Text /ImageB /ImageC /ImageI]>>/BBox[0 0 292.41 39]/Length 474>>stream x?m??e1C?#???1Ly}??Ua??>????r?R?????r?7gr??a???\??PTj??p???s????~m"???:K??T???1????Gw({c???? !???p?rB g M?QG*?PC ?o??v?????'n[!n2??}*?g}r?G??J?R"aI?S??q ???d;??-??m?????y?lCp??[B(=?L??G[]2??)???
?8???9L????]y)?B??t<??E??????I????????#1?]$? ??h??6?Q[A)?8????<???????z??8c??????s??R????%6? endstream endobj 1 0 obj <</Subtype/Form/Filter/FlateDecode/Type/XObject/Matrix [1 0 0 1 0 0]/FormType 1/Resources<</ProcSet[/PDF/Text]/ExtGState 9 0 R/Font 10 0 R>>/Length 1818/BBox[0 0 595.28 419.53]>>stream x???Ko???t??V|???4-$n??{Pm%]6????P????:?$E?6i ?4??????d??m.U?7L????E??"???e?^r??c????????S#'?????????X?bz?k.J?3!?)??{?V ??'VS1?????8??L???? fU"&Fx?v?Q?9G??EL]?iLIN}?C?i~??4???J<??P?4Ec??F??P%c=!
?=?!U??P?T?b]???k>+¹&?5?9A5ai?"???G????H???J??J?#N??#?3dP??#O=A6%??&dO?eU&5;?Q?#M?'??.??8????P???z!
'??j??O?8??7?
?f????????u???^???:N#?q?Y?xN6Kjv B??Z?????<?? Dx^?J??;A1?3s /?S?k?8??'?9?n??.w?s????g????? M<0????????<?,p???xG!pv?v??O??,?!pv?v?P??l?O??3?M)[????????x??D?h????Z??&i)??,????k???k????j*???-?#?'?x9D)]?J:?=?G??1r? ???!???X?I???|n?q}?=?6?:ðl??????_T??[??_?AC???YI??????+??]??}f}S?P<{??EY??#??q?pah???,Pj?????v~??a?c???{R?7????? ?E~?mv??v?6??t ?? ??Y?????&???F?7P'?e?????R&??(?#????????)?2???P??j?.I??s4?|???s???$z????????E?P??x?{??tU?????????|??b?'?jH????f6 .?g? ?"?????iVR";;?P?'????F?????*??^?b?Nu6rO6? ?Xn[~>t???x2????n?[?D^????6C4O??vx??p?#???$?ru??Yj??55,?Z???u?&?yy????%????+????aMk?3 ???v?1M\A&?q???? '?Sf?,??ce)? ??x?????P?#?Ea&y????/n??~8j???????Co????????????%?? ????????5C???(?<??}???OA???a$?)J?`?!vd????T????D{,?}^?e?]]#?'#T?v??J??;??4?G?e???&b?Bl???K????.?t=s?i?;6.> ?????:?H??Z}:.V? ??) endstream endobj 9 0 obj<</R7 11 0 R/R9 12 0 R>>endobj 10 0 obj<</R8 13 0 R>>endobj 11 0 obj<</TK true/Type/ExtGState/BM/Normal/OPM 1>>endobj 12 0 obj<</Type/ExtGState/SA true>>endobj 13 0 obj<</Subtype/Type1/Type/Font/BaseFont/Helvetica/Encoding 14 0 R>>endobj 14 0 obj<</Type/Encoding/Differences[225/aacute/acircumflex/atilde 231/ccedilla 233/eacute/ecircumflex 243/oacute 245/otilde 250/uacute]>>endobj 7 0 obj<</Kids[8 0 R]/Type/Pages/Count 1>>endobj 15 0 obj<</Type/Catalog/Pages 7 0 R>>endobj 16 0 obj<<>>endobj xref 0 17 0000000000 65535 f 0000004765 00000 n 0000000015 00000 n 0000003908 00000 n 0000004000 00000 n 0000004087 00000 n 0000002787 00000 n 0000007190 00000 n 0000003700 00000 n 0000006794 00000 n 0000006833 00000 n 0000006863 00000 n 0000006922 00000 n 0000006965 00000 n 0000007044 00000 n 0000007240 00000 n 0000007285 00000 n trailer<</Info 16 0 R/ID []/Root 15 0 R/Size 17>>startxref 7305 %%EOF
Since a PDF is a binary file and not a text file, it is quite normal that you see ? and all sorts of other non-printable characters. Instead, save it to a file on disk and open it with something like ShellExecute, e.g.:
oHTTP = CreateObject("MSXML2.XMLHTTP.6.0")
oHTTP.Open("GET", 'https://homologacao.plugboleto.com.br/api/v1/boletos/impressao/lote/NIKLfYBWz',.F.)
oHTTP.setRequestHeader("content-type", "application/pdf")
oHTTP.Send()
Local lcFileName
lcFileName = Forcepath(Sys(2015)+'.pdf', Sys(2023))
Strtofile(oHttp.responseText, m.lcFileName)
Declare Long ShellExecute In "shell32.dll" ;
long HWnd, String lpszOp, ;
string lpszFile, String lpszParams, ;
string lpszDir, Long nShowCmd
ShellExecute(_vfp.HWnd,'',m.lcFileName,'','',1)
EDIT: This is not a job for MSXML2.XMLHTTP. You can simply download the file as a PDF and open it:
Local lcFileName, lcRemote
lcRemote = 'https://homologacao.plugboleto.com.br/api/v1/boletos/impressao/lote/NIKLfYBWz'
lcFileName = Forcepath(Sys(2015)+'.pdf', Sys(2023))
If (getFileFromURL(m.lcRemote, m.lcFileName) = 0)
    Declare Long ShellExecute In "shell32.dll" ;
        long HWnd, String lpszOp, ;
        string lpszFile, String lpszParams, ;
        string lpszDir, Long nShowCmd
    ShellExecute(_vfp.HWnd,'',m.lcFileName,'','',1)
Endif

Procedure getFileFromURL
    Lparameters tcRemoteFile,tcLocalFile
    Declare Integer URLDownloadToFile In urlmon.Dll;
        INTEGER pCaller, String szURL, String szFileName,;
        INTEGER dwReserved, Integer lpfnCB
    Return URLDownloadToFile(0, m.tcRemoteFile, m.tcLocalFile, 0, 0)
Endproc
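Why the question marks mean real damage, and not just a display problem, can be illustrated with a small Python sketch. It only simulates the effect: the helper name roundtrip_as_text and the ASCII encoding are assumptions for illustration, and the conversion MSXML actually applies to responseText may differ in detail.

```python
def roundtrip_as_text(data, encoding="ascii"):
    """Simulate treating binary data as text: decode it, then encode it again."""
    text = data.decode(encoding, errors="replace")
    return text.encode(encoding, errors="replace")

# A few PDF-like bytes, including values above 0x7F as found in image streams
pdf_like = b"%PDF-1.4\n%\xe2\xe3\xcf\xd3\nstream\n\xff\xd8\xff\xe0JFIF\nendstream\n"
damaged = roundtrip_as_text(pdf_like)

print(damaged)
print(damaged == pdf_like)  # False: the original byte values are gone
```

Every byte that has no representation in the text encoding is replaced by the same placeholder, so the original value cannot be recovered afterwards. This is why the download has to go straight to a file as bytes.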

Visible Signature in a PDF file

I'm trying to create a visible signature in a PDF file.
Taking a simple PDF "hello world" file:
%PDF-1.7
1 0 obj % entry point
<<
/Type /Catalog
/Pages 2 0 R
>>
endobj
2 0 obj
<<
/Type /Pages
/MediaBox [ 0 0 200 200 ]
/Count 1
/Kids [ 3 0 R ]
>>
endobj
3 0 obj
<<
/Type /Page
/Parent 2 0 R
/Resources <<
/Font <<
/F1 4 0 R
>>
>>
/Contents 5 0 R
>>
endobj
4 0 obj
<<
/Type /Font
/Subtype /Type1
/BaseFont /Times-Roman
>>
endobj
5 0 obj % page content
<<
/Length 44
>>
stream
BT
10 05 TD
/F1 12 Tf
(Hello, world!) Tj
ET
endstream
endobj
xref
0 6
0000000000 65535 f
0000000010 00000 n
0000000079 00000 n
0000000173 00000 n
0000000301 00000 n
0000000380 00000 n
trailer
<<
/Size 6
/Root 1 0 R
>>
startxref
492
%%EOF
Signing it so that the text "Yolo" appears at some position on the first page produces this:
%PDF-1.7
1 0 obj % entry point
<<
/Type /Catalog
/Pages 2 0 R
>>
endobj
2 0 obj
<<
/Type /Pages
/MediaBox [ 0 0 200 200 ]
/Count 1
/Kids [ 3 0 R ]
>>
endobj
3 0 obj
<<
/Type /Page
/Parent 2 0 R
/Resources <<
/Font <<
/F1 4 0 R
>>
>>
/Contents 5 0 R
>>
endobj
4 0 obj
<<
/Type /Font
/Subtype /Type1
/BaseFont /Times-Roman
>>
endobj
5 0 obj % page content
<<
/Length 44
>>
stream
BT
10 05 TD
/F1 12 Tf
(Hello, world!) Tj
ET
endstream
endobj
xref
0 6
0000000000 65535 f
0000000010 00000 n
0000000079 00000 n
0000000173 00000 n
0000000301 00000 n
0000000380 00000 n
trailer
<<
/Size 6
/Root 1 0 R
>>
startxref
492
%%EOF
8 0 obj
<</F 132/Type/Annot/Subtype/Widget/Rect[0 0 0 0]/FT/Sig/DR<<>>/T(Signature1)/V 6 0 R/P 3 0 R/AP<</N 7 0 R>>>>
endobj
6 0 obj
<</Contents <...>/Type/Sig/SubFilter/ETSI.CAdES.detached/M(D:20190626125540+00'00')/ByteRange [0 824 60826 1401]/Filter/Adobe.PPKLite>>
endobj
9 0 obj
<</BaseFont/Helvetica/Type/Font/Subtype/Type1/Encoding/WinAnsiEncoding/Name/Helv>>
endobj
10 0 obj
<</BaseFont/ZapfDingbats/Type/Font/Subtype/Type1/Name/ZaDb>>
endobj
12 0 obj
<</Length 35>>stream
BT
1 15 TD
/Helv 6 Tf
(Yolo) Tj
ET
endstream
endobj
7 0 obj
<</Type/XObject/Resources<</ProcSet [/PDF /Text /ImageB /ImageC /ImageI]>>/Subtype/Form/BBox[0 0 0 0]/Matrix [1 0 0 1 0 0]/Length 8/FormType 1/Filter/FlateDecode>>stream
xœ
endstream
endobj
3 0 obj
<</Type/Page/Parent 2 0 R/Resources<</Font<</F1 4 0 R>>>>/Contents [12 0 R 5 0 R]/Annots[8 0 R]>>
endobj
2 0 obj
<</Type/Pages/MediaBox[0 0 200 200]/Count 1/Kids[3 0 R]>>
endobj
1 0 obj
<</AcroForm<</Fields[8 0 R]/DR<</Font<</Helv 9 0 R/ZaDb 10 0 R>>>>/DA(/Helv 0 Tf 0 g )/SigFlags 3>>/Type/Catalog/Pages 2 0 R>>
endobj
11 0 obj
<</Producer(AdES Tools https://www.turboirc.com)/ModDate(D:20190626125540+00'00')>>
endobj
xref
0 4
0000000000 65535 f
0000061604 00000 n
0000061529 00000 n
0000061414 00000 n
6 7
0000000804 00000 n
0000000000 65535 f
0000000679 00000 n
0000060952 00000 n
0000061050 00000 n
0000061746 00000 n
0000061127 00000 n
trailer
<</Root 1 0 R/Prev 492/Info 11 0 R/Size 17/ID[<4BB225C2F629BB21464F66FBF2FED264><8E3C9AD8354C66931EAAC282088455EA>]>>
startxref
61846
%%EOF
So there is an object in the PDF that shows some text in the first page:
12 0 obj
<</Length 35>>stream
BT
1 15 TD
/Helv 6 Tf
(Yolo) Tj
ET
endstream
endobj
My problem now is that this object is treated like ordinary page text in Adobe Reader. I want a click on it to lead to the digital signature, the way it does in documents signed by Adobe Acrobat.
What am I missing? Is there a parameter in the digital signature (object 6 or 8) or in any of the other objects my app puts in the new PDF that links the text object with the signature?
Thanks a lot.
Your object 8
8 0 obj
<</F 132/Type/Annot/Subtype/Widget/Rect[0 0 0 0]/FT/Sig/DR<<>>/T(Signature1)/V 6 0 R/P 3 0 R/AP<</N 7 0 R>>>>
endobj
is an AcroForm form field for signatures (as the FT entry with value Sig tells us). At the same time, though, this object also is a form field widget annotation (as can be seen in the Type and Subtype entries). Form field widget annotations are the visual representations of form fields, and if a form field has only one representation, the widget can be merged with the form field as in your object.
In your case the annotation has a 0x0 size (/Rect[0 0 0 0]), i.e. invisible. To have a visible representation, you need an annotation rectangle that does not vanish.
The content that is displayed is defined in the normal appearance /AP<</N 7 0 R>> which points to object 7.
7 0 obj
<</Type/XObject/Resources<</ProcSet [/PDF /Text /ImageB /ImageC /ImageI]>>/Subtype/Form/BBox[0 0 0 0]/Matrix [1 0 0 1 0 0]/Length 8/FormType 1/Filter/FlateDecode>>stream
xœ
endstream
endobj
At first glance this looks pretty empty, even after decompression.
Thus, what you have to do is
choose a non-vanishing rectangle for your signature form field annotation,
adapt the BBox of the normal appearance stream to that annotation rectangle, and
create a non-empty content in the normal appearance stream of that annotation instead of adding page content.
Furthermore, you should fix obvious errors in your PDF, e.g.
object 7, your signature field normal appearance, is marked as free in your cross references
your trailer claims a size of 17, although the highest object number in the file is 12, so the Size should be 13
For details please study the PDF specification ISO 32000. Part 1 is published for download by Adobe at https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf
In particular sections
12.5 "Annotations"
12.7 "Interactive Forms"
12.8 "Digital Signatures"
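The two cross reference problems mentioned above can be found mechanically. The following Python sketch parses the cross reference lines of the appended section from the question and reports free entries and the Size implied by the highest object number. The function name parse_xref_section is an assumption for illustration, and the sketch handles only classic cross reference tables, not cross reference streams.

```python
def parse_xref_section(lines):
    """Parse the lines between 'xref' and 'trailer' into {objnum: (offset, gen, kind)}."""
    entries = {}
    i = 0
    while i < len(lines):
        # Subsection header: first object number and number of entries
        first, count = (int(x) for x in lines[i].split())
        i += 1
        for num in range(first, first + count):
            offset, gen, kind = lines[i].split()
            entries[num] = (int(offset), int(gen), kind)
            i += 1
    return entries

xref_lines = """0 4
0000000000 65535 f
0000061604 00000 n
0000061529 00000 n
0000061414 00000 n
6 7
0000000804 00000 n
0000000000 65535 f
0000000679 00000 n
0000060952 00000 n
0000061050 00000 n
0000061746 00000 n
0000061127 00000 n""".splitlines()

entries = parse_xref_section(xref_lines)
# Object 0 is always free, so it is not reported
free = [n for n, (_, _, kind) in entries.items() if kind == "f" and n != 0]
print("free entries:", free)                # free entries: [7]
print("expected /Size:", max(entries) + 1)  # expected /Size: 13
```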

Open PDF, save DOCX bugs out after a few dozen documents and outputs garbled/corrupted files

I have a few thousand PDF files that I need to convert to DOCX. I wrote the following macro:
Sub convertPDFtoDOCX()
'
' convertPDFtoDOCX Macro
'
'
    Dim docDirectory As String
    Dim pdfDirectory As String
    Dim docPath As String
    Dim doc As Document
    docDirectory = "C:\Users\<USER>\DOCX\"
    pdfDirectory = "C:\Users\<USER>\PDF\"
    pdfFile = Dir(pdfDirectory & "*.*")
    Do While pdfFile <> ""
        docPath = docDirectory & pdfFile & ".docx"
        Set doc = Documents.Open(FileName:=pdfDirectory & pdfFile)
        ActiveDocument.SaveAs2 FileName:=docPath, FileFormat:=wdFormatXMLDocument
        Documents.Close
        pdfFile = Dir
    Loop
End Sub
It works fine for the first few dozen documents, but then it starts outputting "corrupted" files that aren't DOCX and can't be opened with a PDF viewer either. There is no error message when it starts failing. The problem doesn't come from the PDF files: if I stop the macro and start it again on the same documents, they are converted correctly the second time.
"Corrupted" files look like this:
%PDF-1.5
%µµµµ
1 0 obj
<</Type/Catalog/Pages 2 0 R/Lang(fr-FR) /StructTreeRoot 91 0 R/MarkInfo<</Marked true>>>>
endobj
2 0 obj
<</Type/Pages/Count 21/Kids[ 3 0 R 27 0 R 31 0 R 42 0 R 44 0 R 46 0 R 48 0 R 55 0 R 59 0 R 61 0 R 63 0 R 65 0 R 67 0 R 69 0 R 71 0 R 73 0 R 75 0 R 77 0 R 79 0 R 81 0 R 88 0 R] >>
endobj
3 0 obj
<</Type/Page/Parent 2 0 R/Resources<</Font<</F1 5 0 R/F2 9 0 R/F3 11 0 R/F4 16 0 R/F5 18 0 R/F6 20 0 R/F7 25 0 R>>/ExtGState<</GS7 7 0 R/GS8 8 0 R>>/ProcSet[/PDF/Text/ImageB/ImageC/ImageI] >>/MediaBox[ 0 0 595.2 841.8] /Contents 4 0 R/Group<</Type/Group/S/Transparency/CS/DeviceRGB>>/Tabs/S/StructParents 0>>
endobj
4 0 obj
<</Filter/FlateDecode/Length 4428>>
stream
xœ­\Ën7Ýð?Ô.Ý ¨Ä7«‚ ¹%e4ð+²’Y$Yt¤¶£A,9RÛÈüÕ|Æ|ÆìÙäæ^²ÈzðQ-¦ È]U¼$//:<yØÞ¾__o«££Ã“ív}ýóæ¦úþðÅýv{ÿñÇë}Ú¾]¸½[ooïï
What causes the issue and how can I fix it?
I use Word 2016 on Windows 10.
I don't think you can fix the issue without a patch from Microsoft. Meanwhile, you can move your code to run outside Word and create a new Word.Application object for each iteration.
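Whichever workaround you choose, it helps to detect failed conversions automatically so that only those files are retried. Here is a minimal Python sketch, assuming the failed outputs start with %PDF as in the question, while a real DOCX file is a ZIP container and starts with the bytes PK\x03\x04. The function names sniff_format and find_failed_conversions are made up for illustration.

```python
from pathlib import Path

def sniff_format(head):
    """Guess the container format from the first bytes of a file."""
    if head.startswith(b"PK\x03\x04"):
        return "zip"   # DOCX files are ZIP containers
    if head.startswith(b"%PDF-"):
        return "pdf"   # what the failed conversions in the question look like
    return "unknown"

def find_failed_conversions(directory):
    """Return the *.docx files in directory that are not ZIP containers."""
    failed = []
    for path in sorted(Path(directory).glob("*.docx")):
        with open(path, "rb") as handle:
            if sniff_format(handle.read(8)) != "zip":
                failed.append(path)
    return failed
```

Calling find_failed_conversions on the output folder after a run yields the list of documents to convert again.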

Is there a text string variable type in Adobe PDF specification?

In the below example (from gnupdf.org/Introduction_to_PDF; also related: How to generate plain-text source-code PDF examples that work in a document viewer?), text is written verbatim using:
(Hello, world!) Tj
Is there a way I could store this "Hello, world!" in a variable (dictionary?), say /MyStringVar, and then output it in multiple places using something like:
(/MyStringVar) Tj
(I've tried the above, couldn't get it to work; /MyStringVar is interpreted verbatim)
Here is the code, hello.pdf:
%PDF-1.4
1 0 obj % entry point
<<
/Type /Catalog
/Pages 2 0 R
>>
endobj
2 0 obj
<<
/Type /Pages
/MediaBox [ 0 0 200 200 ]
/Count 1
/Kids [ 3 0 R ]
>>
endobj
3 0 obj
<<
/Type /Page
/Parent 2 0 R
/Resources <<
/Font <<
/F1 4 0 R
>>
>>
/Contents 5 0 R
>>
endobj
4 0 obj
<<
/Type /Font
/Subtype /Type1
/BaseFont /Times-Roman
>>
endobj
5 0 obj % page content
<<
/Length 44
>>
stream
BT
70 50 TD
/F1 12 Tf
(Hello, world!) Tj
ET
endstream
endobj
xref
0 6
0000000000 65535 f
0000000010 00000 n
0000000079 00000 n
0000000173 00000 n
0000000301 00000 n
0000000380 00000 n
trailer
<<
/Size 6
/Root 1 0 R
>>
startxref
492
%%EOF
PDF does not have variables the way PostScript does. What may come close to what you are trying to achieve (output the same text in multiple places) is a form XObject. Just like a page, it has a content stream with graphics objects such as (Hello, world!) Tj, and it can be drawn on a page (or another XObject) through the graphics Do operator. Its operand corresponds to a key in the XObject dictionary in the Resources dictionary of the page. The PDF would look something like this. (Note that the stream lengths, the cross reference table and the trailer are no longer valid, so consider this pseudo-PDF.)
%PDF-1.4
1 0 obj % entry point
<<
/Type /Catalog
/Pages 2 0 R
>>
endobj
2 0 obj
<<
/Type /Pages
/MediaBox [ 0 0 200 200 ]
/Count 1
/Kids [ 3 0 R ]
>>
endobj
3 0 obj
<<
/Type /Page
/Parent 2 0 R
/Resources <<
/Font <<
/F1 4 0 R
>>
/XObject <<
/A 6 0 R % XObject /A is obj 6 0
>>
>> % /Resources must close here
/Contents 5 0 R
>>
endobj
4 0 obj
<<
/Type /Font
/Subtype /Type1
/BaseFont /Times-Roman
>>
endobj
5 0 obj % page content
<<
/Length 44
>>
stream
BT
70 50 TD % this has no effect on `/A Do` - only on the "manual" `Tj`
/A Do % do the drawing of XObject A
/F1 12 Tf % without this line: "Error: No font in show;"
% if without TD, then the next text is just appended
%-10 50 TD
0 0 TD % "Td/TD move to the start of next line"; but here like \r
(Hello, world - manual!) Tj
ET
endstream
endobj
6 0 obj
<< /Type /XObject
/Subtype /Form
/FormType 1
/BBox [ 0 0 1000 1000 ]
/Matrix [ 1 0 0 1 0 0 ]
/Resources << /ProcSet [ /PDF ] >>
/Length 58
>>
stream
%70 50 TD % without this `TD` setting, `/A Do` places this in 0,0 - bottom left corner
/F1 12 Tf
(Hello, world!) Tj
endstream
endobj
xref
0 7
0000000000 65535 f
0000000010 00000 n
0000000079 00000 n
0000000173 00000 n
0000000301 00000 n
0000000380 00000 n
0000000450 00000 n
trailer
<<
/Size 7
/Root 1 0 R
>>
startxref
600
%%EOF
Output in evince:
EDIT: The text in the form XObject appears at the lower left corner because the current transformation matrix equals the identity matrix at the time of the show string operation. The initial CTM of the form XObject equals the concatenation of [the CTM in the parent stream when Do is invoked] and [the Matrix entry in the form XObject dictionary], which is the identity in this case. The text matrix is not propagated from the parent stream to the form XObject.
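The concatenation described in this edit can be checked with a few lines of Python. This is a sketch of the matrix arithmetic only. The helper names concat and apply are made up, and the translation 1 0 0 1 20 180 is just an illustrative value for a cm operation executed in the parent stream before Do.

```python
def concat(m, n):
    """Return m x n for PDF matrices given as (a, b, c, d, e, f); m is applied first."""
    a1, b1, c1, d1, e1, f1 = m
    a2, b2, c2, d2, e2, f2 = n
    return (a1 * a2 + b1 * c2, a1 * b2 + b1 * d2,
            c1 * a2 + d1 * c2, c1 * b2 + d1 * d2,
            e1 * a2 + f1 * c2 + e2, e1 * b2 + f1 * d2 + f2)

def apply(m, x, y):
    """Transform the point (x, y) with matrix m."""
    a, b, c, d, e, f = m
    return (a * x + c * y + e, b * x + d * y + f)

IDENTITY = (1, 0, 0, 1, 0, 0)
form_matrix = IDENTITY  # the /Matrix entry of the form XObject

# Parent CTM is the identity: the form's origin stays at the lower left corner
print(apply(concat(form_matrix, IDENTITY), 0, 0))      # (0, 0)

# After "1 0 0 1 20 180 cm" in the parent stream, the form's origin moves
translated = concat((1, 0, 0, 1, 20, 180), IDENTITY)
print(apply(concat(form_matrix, translated), 0, 0))    # (20, 180)
```

So the form XObject has to be positioned by changing the CTM before Do, not by text positioning operators in the parent stream.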
As an addendum to @Frank's answer:
Deviations
There are some deviations from the PDF specification in the PDF in the answer.
in the page content stream (object 5) the XObject A is drawn from within a text object:
BT
70 50 TD % this has no effect on `/A Do` - only on the "manual" `Tj`
/A Do % do the drawing of XObject A
This is not allowed, cf. section 8.2, especially figure 9 at its end: XObjects may only be inserted at the page description level of the content of a page or XObject.
in the XObject content stream (object 6) a font is referenced
/F1 12 Tf
but no font resources are defined:
/Resources << /ProcSet [ /PDF ] >>
This is not allowed: the Tf operator shall specify the name of a font resource, that is, an entry in the Font subdictionary of the current resource dictionary (section 9.2.2 of the specification), which here is the resource dictionary of the XObject, not that of the page.
In very early versions of the PDF format an XObject could inherit the resources of the page if it omitted the Resources entry... This construct is obsolete and should not be used by conforming writers (section 7.8.3 of the PDF specification), and in the example at hand the Resources entry is not even omitted.
in the XObject content stream (object 6) the text showing operator Tj is used outside a text object:
stream
%70 50 TD % without this `TD` setting, `/A Do` places this in 0,0 - bottom left corner
/F1 12 Tf
(Hello, world!) Tj
endstream
This is not allowed, cf. section 8.2, especially figure 9 at its end: Text showing operators are only allowed in text objects, and as XObject shall not be used inside text objects, this stream cannot be considered to reside in one.
As it displays the XObject nonetheless, evince seems to be quite forgiving concerning PDF validity issues, even more forgiving than Adobe Reader, which is already very forgiving but does not display the XObject at all for that PDF.
Adapted sample
This section contains an adapted sample which is nearer to the specification.
Furthermore the wish of the OP to position the XObject more freely is taken into account:
%PDF-1.4
1 0 obj % entry point
<<
/Type /Catalog
/Pages 2 0 R
>>
endobj
2 0 obj
<<
/Type /Pages
/MediaBox [ 0 0 200 200 ]
/Count 1
/Kids [ 3 0 R ]
>>
endobj
3 0 obj
<<
/Type /Page
/Parent 2 0 R
/Resources <<
/XObject <<
/A 6 0 R
>>
>>
/Contents 5 0 R
>>
endobj
4 0 obj
<<
/Type /Font
/Subtype /Type1
/BaseFont /Times-Roman
>>
endobj
5 0 obj % page content
<<
/Length 588
>>
stream
% draw xobject at 0, 0
/A Do
% draw xobject at 20, 180
q
1 0 0 1 20 180 cm
/A Do
Q
% draw xobject at 100, 100, with different scales and rotations applied
q
1 0 0 1 100 100 cm
/A Do
0.7 0.5 -0.5 0.7 0 0 cm
/A Do
0.7 0.5 -0.5 0.7 0 0 cm
/A Do
0.7 0.5 -0.5 0.7 0 0 cm
/A Do
0.7 0.5 -0.5 0.7 0 0 cm
/A Do
0.7 0.5 -0.5 0.7 0 0 cm
/A Do
0.7 0.5 -0.5 0.7 0 0 cm
/A Do
0.7 0.5 -0.5 0.7 0 0 cm
/A Do
0.7 0.5 -0.5 0.7 0 0 cm
/A Do
0.7 0.5 -0.5 0.7 0 0 cm
/A Do
Q
% draw xobject at 120, 180, skewed somewhat
q
1 0 0.3 1 120 180 cm
/A Do
Q
endstream
endobj
6 0 obj
<< /Type /XObject
/Subtype /Form
/FormType 1
/BBox [ 0 0 1000 1000 ]
/Matrix [ 1 0 0 1 0 0 ]
/Resources <<
/ProcSet [ /PDF ]
/Font <<
/F1 4 0 R
>>
>>
/Length 130
>>
stream
BT
/F1 12 Tf
% To not cut off stuff below the base line, namely parts of the comma
1 0 0 1 0 3 Tm
(Hello, world!) Tj
ET
endstream
endobj
xref
0 7
0000000000 65535 f
0000000010 00000 n
0000000079 00000 n
0000000173 00000 n
0000000301 00000 n
0000000380 00000 n
0000000450 00000 n
trailer
<<
/Size 7
/Root 1 0 R
>>
startxref
600
%%EOF
(Cross reference entries and stream lengths surely are wrong.)
This results in the following (as seen in Adobe Reader):
All the "Hello, world!" instances are generated using the single XObject of the PDF.
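Since the cross reference entries in the hand-edited samples are admittedly wrong, here is a small Python sketch that recomputes them for a simple PDF like this one. The function name rebuild_xref is made up. The sketch assumes that every object starts at the beginning of a line, that object numbers run from 1 upwards without gaps, and that no stream data happens to contain a line that looks like an object header.

```python
import re

def rebuild_xref(pdf_bytes):
    """Return (xref_table, offsets) for a simple PDF with objects 1..n."""
    offsets = {}
    for match in re.finditer(rb"^(\d+) (\d+) obj\b", pdf_bytes, re.MULTILINE):
        # Byte offset of the object header and its generation number
        offsets[int(match.group(1))] = (match.start(), int(match.group(2)))

    size = max(offsets) + 1
    if sorted(offsets) != list(range(1, size)):
        raise ValueError("object numbers must be contiguous for this sketch")

    # Entry 0 is the head of the free list; every entry is exactly 20 bytes
    table = [b"xref\n0 %d\n" % size, b"0000000000 65535 f \n"]
    for num in range(1, size):
        offset, gen = offsets[num]
        table.append(b"%010d %05d n \n" % (offset, gen))
    return b"".join(table), offsets

sample = b"%PDF-1.4\n1 0 obj\n<<>>\nendobj\n2 0 obj\n<<>>\nendobj\n"
table, offsets = rebuild_xref(sample)
print(table.decode("ascii"))
```

The value to write after startxref is the byte offset at which the xref keyword ends up in the final file, and /Size in the trailer is the number of entries in the table.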