Word OLE Automation - delete first page and manipulate header and footer - com

I am using PHP to start Word Automation and manipulate Word documents, but I guess it can be done in any other language. What I need to do is quite simple: remove the first page and add a header and footer.
Here is my code:
$word = new COM('word.application');
$word->Documents->Open('xxx.docx');
$word->Documents[1]->SaveAs($result_file_name, 12);
Any samples?

This is the way you could do it in VBA. This can likely be ported to PHP fairly simply.
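Sub RemoveFirstPageAndAddHeaderFooter()
Dim d As Document
Set d = ActiveDocument
' "\page" is a predefined bookmark for the page that contains the selection,
' so move the selection to the start of the document first
Selection.HomeKey Unit:=wdStory
Dim pageOne As Range
Set pageOne = d.Bookmarks("\page").Range
pageOne.Select
Selection.Delete
' index 1 is wdHeaderFooterPrimary
d.Sections(1).Headers(1).Range.Text = "Some text"
d.Sections(1).Footers(1).Range.InlineShapes.AddPicture "C:\beigeplum.jpg", False, True
End Sub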
A note on ...InlineShapes.AddPicture: the onus is on you to ensure the picture is the right size. If you want more control over this, use .Footers(1).Shapes.AddPicture instead, as that lets you set the width/height, top/left, etc.

$word = null;
try
{
// new COM() throws an exception on failure, which the catch block below handles
$word = new COM("word.application");
//bring word to the front
$word->Visible = 1;
//open a word document
$word->Documents->Open("file.docx");
// remove first page
$range = $word->ActiveDocument->Bookmarks("\page");
$range->Select();
$word->Selection->Delete();
//save the document as docx
$word->Documents[1]->SaveAs("modified_file.docx", 12); // SaveAs('filename', format) // format (WdSaveFormat): 0 - doc, 2 - plain text, 6 - rtf, 12 - docx
}
catch(Exception $e)
{
echo "error class.document.php - convert_to_docx: $e 20100816.01714";
}
//close word
if($word)
$word->Quit();
//free object resources
//$word->Release();
$word = null;
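The second argument to SaveAs is a value from Word's WdSaveFormat enumeration. As a rough sketch (in Python, only to illustrate the mapping), a lookup table can pick the value from the target file's extension. The table lists just a few members of the enumeration, and the helper name save_format_for is hypothetical, not part of any Word or COM API.

```python
import os

# A few members of Word's WdSaveFormat enumeration, keyed by file extension.
WD_SAVE_FORMAT = {
    ".doc": 0,    # wdFormatDocument
    ".txt": 2,    # wdFormatText
    ".rtf": 6,    # wdFormatRTF
    ".docx": 12,  # wdFormatXMLDocument
    ".pdf": 17,   # wdFormatPDF
}


def save_format_for(filename):
    """Return the WdSaveFormat value matching the file extension (hypothetical helper)."""
    ext = os.path.splitext(filename)[1].lower()
    if ext not in WD_SAVE_FORMAT:
        raise ValueError("no save format known for extension %r" % ext)
    return WD_SAVE_FORMAT[ext]


if __name__ == "__main__":
    print(save_format_for("modified_file.docx"))
```

The same table could be written as a PHP array and its result passed as the second argument of SaveAs.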

Related

Photoshop action for changing text number and saving with variable

I have a lot of graphic buttons that I need to make. I have 2 layers
TEXT (This is going to be numbers 1-48 for instance)
White Button image
I'm not sure how to go about writing this action or if I need a script. I need to have the text layer start at 1 and follow this progression.
save file w1.png (this yields a png with a button labeled with a "1")
change text to 2
save file w2.png (this yields a png with a button labeled with a "2")
change text to 3
. . . .
etc., all the way to 48. So this would make 48 images automatically. Can this be done with "actions" or do I need to learn scripting?
You'll need a script for this, but it's going to be a rather simple one:
function main() {
//this just checks if you have a text layer selected
try {
var textLayer = activeDocument.activeLayer.textItem
} catch (e) {
alert("active layer isn't a text layer");
return
};
var loops = 48,
pngSaveOptions = new PNGSaveOptions(),
outputFolder = Folder.selectDialog('', Folder.desktop); //this will ask for an output folder
if (!outputFolder) return; //the folder dialog was cancelled
for (var i = 0; i < loops; i++) {
var myNum = i + 1;
textLayer.contents = String(myNum); //this will change layer contents to number only. if you need some text here, write it in quotes like textLayer.contents = "my text" + myNum;
activeDocument.saveAs(new File(outputFolder + "/w" + myNum + ".png"), pngSaveOptions, true, Extension.LOWERCASE);
}
}
app.activeDocument.suspendHistory("temp", "main()");

Using pdfbox - how to get the font from a COSName?

How to get the font from a COSName?
The solution I'm looking for looks somehow like this:
COSDictionary dict = new COSDictionary();
dict.add(fontname, something); // fontname COSName from below code
PDFontFactory.createFont(dict);
If you need more background, I added the whole story below:
I am trying to replace some strings in a PDF. This succeeds (as long as all the text is stored in one token). In order to keep the format, I would like to re-center the text. As far as I understand, I can do this by getting the width of the old string and the new one, doing some trivial calculation and setting the new position.
I found some inspiration on Stack Overflow for the replacement part, https://stackoverflow.com/a/36404377 (yes, it has some issues, but it works for my simple PDFs), and for the centering part in "How to center a text using PDFBox". Unfortunately, the latter example uses a font constant.
So, using the first link's code, I get one handler for the operator 'TJ' and one for 'Tj'.
PDFStreamParser parser = new PDFStreamParser(page);
parser.parse();
java.util.List<Object> tokens = parser.getTokens();
for (int j = 0; j < tokens.size(); j++)
{
Object next = tokens.get(j);
if (next instanceof Operator)
{
Operator op = (Operator) next;
// Tj and TJ are the two operators that display strings in a PDF
if (op.getName().equals("Tj"))
{
// Tj takes one operand, the string to display, so let's
// update that operand
COSString previous = (COSString) tokens.get(j - 1);
String string = previous.getString();
String replaced = prh.getReplacement(string);
if (!string.equals(replaced))
{ // if changes are there, replace the content
previous.setValue(replaced.getBytes());
float xpos = getPosX(tokens, j);
//if (true) // center the text
if (6 * xpos > page.getMediaBox().getWidth()) // check if text starts right from 1/xth page width
{
float fontsize = getFontSize(tokens, j);
COSName fontname = getFontName(tokens, j);
// TODO
PDFont font = ?getFont?(fontname);
// TODO
float widthnew = getStringWidth(replaced, font, fontsize);
setPosX(tokens, j, page.getMediaBox().getWidth() / 2F - (widthnew / 2F));
}
replaceCount++;
}
}
As for the code between the TODO tags: I get the required values from the token list. (Yes, this code is awful, but for now it lets me concentrate on the main issue.)
Having the string, the size and the font, I should be able to call the getWidth(..) method from the sample code.
Unfortunately, I ran into trouble creating a font from the COSName variable.
PDFont doesn't provide a method to create a font by name.
PDFontFactory looks fine, but requires a COSDictionary. This is the point where I gave up and decided to ask for help.
The names are associated with font objects in the page resources.
Assuming you use PDFBox 2.0.x and that page is a PDPage instance, you can resolve the name fontname using:
PDFont font = page.getResources().getFont(fontname);
But the warning from the comments on the answer you reference still applies: this approach will work only for very simple PDFs and might even damage other ones.
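Once the font is resolved, the re-centering described in the question is plain arithmetic. PDFBox's getStringWidth reports widths in units of 1/1000 of the font size, so the value has to be scaled before it is compared with the page width. The sketch below (in Python, only to show the calculation) stands in for that call with a small glyph-width table; the table values and the function names are illustrative assumptions, not PDFBox API.

```python
def string_width(text, glyph_widths, font_size, default_width=500):
    """Width of text in user-space units.

    glyph_widths maps a character to its advance in 1/1000 text-space units,
    the same scale PDFBox's getStringWidth uses. default_width is an
    assumption for characters missing from the table.
    """
    units = sum(glyph_widths.get(ch, default_width) for ch in text)
    return units / 1000.0 * font_size


def centered_x(page_width, text_width):
    """x position at which text of the given width is horizontally centered."""
    return (page_width - text_width) / 2.0


if __name__ == "__main__":
    sample_widths = {"H": 722, "i": 222}  # illustrative widths only
    width = string_width("Hi", sample_widths, font_size=12)
    print(centered_x(612.0, width))
```

This is the same expression as page.getMediaBox().getWidth() / 2F - (widthnew / 2F) in the question, with the width scaled by the font size first.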
try {
//Loading an existing document
File file = new File("UKRSICH_Mo6i-Spikyer_z1560-FAV.pdf");
PDDocument document = PDDocument.load(file);
PDPage page = document.getPage(0);
PDResources pageResources = page.getResources();
System.out.println(pageResources.getFontNames() );
for (COSName key : pageResources.getFontNames())
{
PDFont font = pageResources.getFont(key);
System.out.println("Font: " + font.getName());
}
document.close();
} catch (IOException e) {
// PDDocument.load and close can throw IOException
e.printStackTrace();
}

Identify and extract or delete pages of a PDF based on a search string / text (action / javascript)

Good Evening (UK)
I'm trying to filter a 1500+ page PDF file down to only the pages that include a certain text string (typically one or two words). My laptop is locked down with respect to installing more software, but I have used actions (scripts) quite a bit.
I get the error below when I try to install this action into Adobe Acrobat X Pro (Win 7):
screen dump of error
The action is called "Extract Commented Pages". It is supposed to be OK for X and XI, and it looks like what I want.
I wondered if there was something simple causing the problem but the actionscript file is rather... busy to say the least.
I used to have an action that I think was based on a legal redaction script but it is filed somewhere!
If you already have an action that does this, or a version of the above that doesn't give the error I get (unable to import the Action.... The file is either invalid or corrupt), I will be forever in your debt.
Many thanks, have a good weekend!
I recently came across a script found at the following link: http://forums.adobe.com/thread/1077118
I'm having some issues getting the script to run in Acrobat, despite everything looking alright in the script itself. I'll update if I find any errors.
Here is a copy of the script:
// Set the word to search for here
var sWord = "forms";
// Source document = current document
var sd = this;
var nWords, currWord, fp, fpa = [], nd;
var fn = sd.documentFileName.replace(/\.pdf$/i, "");
// Loop through the pages
for (var i = 0; i < sd.numPages; i += 1) {
// Get the number of words on the page
nWords = sd.getPageNumWords(i);
// Loop through the words on the page
for (var j = 0; j < nWords; j += 1) {
// Get the current word
currWord = sd.getPageNthWord(i, j);
if (currWord === sWord) {
// Extract the current page to a new file
fp = fn + "_" + i + ".pdf";
fpa.push(fp);
sd.extractPages({nStart: i, nEnd: i, cPath: fp});
// Stop searching this page
break;
}
}
}
// Combine the individual pages into one PDF
if (fpa.length) {
// Open the document that's the first extracted page
nd = app.openDoc({cPath: fpa[0], oDoc: sd});
// Append any other pages that were extracted
if (fpa.length > 1) {
for (var i = 1; i < fpa.length; i += 1) {
nd.insertPages({nPage: i - 1, cPath: fpa[i], nStart: 0, nEnd: 0});
}
}
// Save to a new document and close this one
nd.saveAs({cPath: fn + "_searched.pdf"});
nd.closeDoc({bNoSave: true});
}
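Note that the script compares each word with ===, so it only finds an exact, case-sensitive, single-word match, while the question asks about strings of one or two words. A minimal sketch of the page-selection step with phrase and case-insensitive matching follows (in Python, to show the logic only). It assumes each page has already been reduced to a list of words, as getPageNthWord would return them, and the function name is hypothetical.

```python
def pages_containing(pages, phrase, ignore_case=True):
    """Return the 0-based indices of pages whose words contain the phrase.

    pages is a list of word lists, one list per page; the phrase matches
    when its words appear consecutively on a page.
    """
    target = phrase.split()
    if not target:
        return []
    if ignore_case:
        target = [word.lower() for word in target]
    hits = []
    for index, words in enumerate(pages):
        if ignore_case:
            words = [word.lower() for word in words]
        span = len(target)
        # Slide a window of the phrase's length across the page's words.
        if any(words[k:k + span] == target for k in range(len(words) - span + 1)):
            hits.append(index)
    return hits


if __name__ == "__main__":
    pages = [["Tax", "forms", "attached"], ["Nothing", "here"], ["Blank", "Tax", "Forms"]]
    print(pages_containing(pages, "tax forms"))
```

The returned indices correspond to the page numbers the script passes to extractPages as nStart and nEnd.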

Split a "tagged" PDF document into multiple documents, keeping the tagging

In a project I have to split a PDF document into two documents, one containing all blank pages, and one containing all pages with content.
For this job, I use a PdfReader to read the source file, and two PdfCopy objects (one for the blank pages document, one for the pages with content document) to write the files to.
I use GetImportedPage to read a PdfImportedPage, which is then added to one of the PdfCopy writers.
Now, the problem is the following: the source file is using the "tagged PDF format". To preserve this (which is absolutely required), I use the SetTagged() method on both PdfCopy writers, and use the extra third parameter in GetImportedPage(...) to keep the tagged format. However, when calling the AddPage(...) on the PdfCopy writer, I get an invalid cast exception:
"Unable to cast object of type 'iTextSharp.text.pdf.PdfDictionary' to type 'iTextSharp.text.pdf.PRIndirectReference'."
Does anyone have any ideas on how to solve this? Any hints?
Also: the project currently references version 5.1.0.0 of the iText libraries. In 5.4.4.0 the third parameter to GetImportedPage does not seem to be there anymore.
Below, you can find a code extract:
iTextSharp.text.Document targetPdf = new iTextSharp.text.Document();
iTextSharp.text.Document blankPdf = new iTextSharp.text.Document();
iTextSharp.text.pdf.PdfReader sourcePdfReader = new iTextSharp.text.pdf.PdfReader(inputFile);
iTextSharp.text.pdf.PdfCopy targetPdfWriter = new iTextSharp.text.pdf.PdfSmartCopy(targetPdf, new FileStream(outputFile, FileMode.Create));
iTextSharp.text.pdf.PdfCopy blankPdfWriter = new iTextSharp.text.pdf.PdfSmartCopy(blankPdf, new FileStream(blanksFile, FileMode.Append));
targetPdfWriter.SetTagged();
blankPdfWriter.SetTagged();
try
{
iTextSharp.text.pdf.PdfImportedPage page = null;
int n = sourcePdfReader.NumberOfPages;
targetPdf.Open();
blankPdf.Open();
blankPdf.Add(new iTextSharp.text.Phrase("This document contains the blank pages removed from " + inputFile));
blankPdf.NewPage();
for (int i = 1; i <= n; i++)
{
byte[] pageBytes = sourcePdfReader.GetPageContent(i);
string pageText = "";
iTextSharp.text.pdf.PRTokeniser token = new iTextSharp.text.pdf.PRTokeniser(new iTextSharp.text.pdf.RandomAccessFileOrArray(pageBytes));
while (token.NextToken())
{
if (token.TokenType == iTextSharp.text.pdf.PRTokeniser.TokType.STRING)
{
pageText += token.StringValue;
}
}
if (pageText.Length >= 15)
{
page = targetPdfWriter.GetImportedPage(sourcePdfReader, i, true);
targetPdfWriter.AddPage(page);
}
else
{
page = blankPdfWriter.GetImportedPage(sourcePdfReader, i, true);
blankPdfWriter.AddPage(page);
blankPageCount++;
}
}
}
catch (Exception ex)
{
Console.WriteLine("Exception at LOC1: " + ex.Message);
}
The error occurs in the call to targetPdfWriter.AddPage(page); near the end of the code sample.
Thank you very much for your help.
Koen.
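For reference, the routing rule in the extract above is independent of the tagging problem: a page goes to the target document when at least 15 characters of string tokens were collected from its content stream, and to the blank-pages document otherwise. A small sketch of that rule follows (in Python, with the extracted text passed in as plain strings); the function name is hypothetical.

```python
def split_blank_pages(page_texts, min_chars=15):
    """Partition 1-based page numbers into (content_pages, blank_pages).

    page_texts holds the text collected from each page's content stream;
    pages with fewer than min_chars characters count as blank, mirroring
    the pageText.Length >= 15 check in the extract.
    """
    content_pages, blank_pages = [], []
    for number, text in enumerate(page_texts, start=1):
        if len(text) >= min_chars:
            content_pages.append(number)
        else:
            blank_pages.append(number)
    return content_pages, blank_pages


if __name__ == "__main__":
    print(split_blank_pages(["This page has plenty of text.", "", "short"]))
```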

How to remove blank pages from PDF using PDFsharp?

How can I remove a blank page from a PDF file? I have a sample PDF file where the 1st page contains a few strings and the 2nd page has absolutely nothing in it. I tried to loop over the PDF pages and get the element count per page, but the funny thing is that I get the same number for both pages. How can that be, if the 1st page has a few strings and the 2nd page is completely blank?
This is my code:
Dim inputDOcument As PdfDocument = PdfReader.Open("")
Dim elemountCount As Integer = 0
Dim elemountCount2 As Integer = 0
Dim pdfPageCount As Integer = inputDOcument.PageCount
For x As Integer = 0 To pdfPageCount - 1
elemountCount = inputDOcument.Pages(x).Contents.Elements.Count
elemountCount2 = inputDOcument.Pages(x).Elements.Count
Next
Try checking the length of each element's stream:
public bool HasContent(PdfPage page)
{
for(var i = 0; i < page.Contents.Elements.Count; i++)
{
if (page.Contents.Elements.GetDictionary(i).Stream.Length > 76)
{
return true;
}
}
return false;
}
You can try the PDFsharp Document Explorer that comes with PDFsharp to see what the PDF file really contains.
Or load and save the file with a PDFsharp DEBUG build; this will give you a "verbose" file. Viewing that with Notepad could help you understand what the file contains.