How do I get the name of a PostScript file object?

Using PostScript, I want to retrieve the name of the file being executed. I know that I can get the file object with currentfile, but how do I get its name?
I want to include the name in the document.
Thanks.

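Ghostscript has a .filename operator that does this. Like other dot-prefixed operators, it is a Ghostscript-specific extension, so don't expect it to work in other PostScript interpreters. Tiny viewable example: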
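/Times-Roman findfont
12 scalefont
setfont
newpath
100 200 moveto
% .filename (Ghostscript only) pushes the name string and true,
% or just false if the file has no name
currentfile .filename { show } { (unknown file) show } ifelse
showpage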
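A portable alternative, not part of the answer above, is to write the name into the PostScript when the file is generated, since (as far as I know) no standard operator reports a file's name. A minimal Python sketch, assuming you generate the file yourself and the name is plain ASCII; ps_string and build_page are hypothetical helper names:

```python
def ps_string(text: str) -> str:
    """Quote text as a PostScript string literal, escaping \\, ( and )."""
    # Backslashes must be escaped first so the later escapes are not doubled.
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return "(" + escaped + ")"


def build_page(filename: str) -> str:
    """Return a one-page PostScript program that shows filename at (100, 200)."""
    lines = [
        "%!PS",
        "/Times-Roman findfont 12 scalefont setfont",
        "newpath",
        "100 200 moveto",
        ps_string(filename) + " show",  # the name is fixed when the file is written
        "showpage",
    ]
    return "\n".join(lines) + "\n"
```

Writing build_page(name) to a file called name gives a document that shows its own name in any PostScript interpreter, at the cost of the name going stale if the file is renamed later.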

Related

Searching filename in a textfile

I need to search through a text file and find filenames and folders. If a filename is found, display one text; if a folder is found, display another.
Can anyone suggest how to do it?
while IFS= read -r line
do
    echo "$line"
done < filenames.txt
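A minimal sketch of the check the question asks for, written in Python rather than shell. It assumes each line of filenames.txt is a path on the local file system (a name that does not exist on disk can't be classified this way); classify, report and the message texts are hypothetical:

```python
import os


def classify(path: str) -> str:
    """Say whether path is an existing folder, an existing file, or neither."""
    if os.path.isdir(path):
        return "folder"
    if os.path.isfile(path):
        return "file"
    return "not found"


# Hypothetical messages; replace them with the texts you want to display.
MESSAGES = {
    "folder": "Folder found: {}",
    "file": "File found: {}",
    "not found": "Not found: {}",
}


def report(list_file: str) -> None:
    """Print one message per non-empty line of list_file."""
    with open(list_file, encoding="utf-8") as f:
        for line in f:
            entry = line.rstrip("\r\n")
            if entry:  # skip blank lines
                print(MESSAGES[classify(entry)].format(entry))
```

Calling report("filenames.txt") prints one message per entry. In the shell loop, the equivalent tests are [ -d "$line" ] and [ -f "$line" ].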

Setting MS Word Paper Size to PostScript Custom Page Size programmatically

The Scene
What I am trying to do is programmatically convert an MS Word (.docx) file to a PostScript file (.ps). I am doing this by creating a PostScript printer with one of the default PostScript printer drivers bundled with MS Windows, and then printing the Word document to this printer from Word. The catch is that I am trying to do this with a custom page size, i.e. the height and width do not match any of the standard paper sizes such as A4, A3, Letter, etc.
If I do this manually in MS Word, everything works as expected, BUT only if I set the Page Setup paper size to PostScript Custom Page Size. If it's not set to this value, the output page size is one of the predefined page sizes (B5 by default).
But if I set the paper size to PostScript Custom Page Size and then print with the same printer, the output file has the correct height and width as set on the document, in this case 181 mm x 260 mm.
The Problem
I can't find a way to programmatically set the Page Setup paper size to "PostScript Custom Page Size", and if I don't set this value, the custom height and width are ignored.
What I have already tried
I have tried doing the following:
Using the Word COM objects in PowerShell
...
#create com object
$word = New-Object -com Word.Application
#don't show the Word UI
$word.visible = $false
#open input file
$doc = $word.Documents.Open($inputfile)
$width = [double]$word.MillimetersToPoints($widthInMM)
$height = [double]$word.MillimetersToPoints($heightInMM)
#set page setup width and height
$doc.PageSetup.PageWidth = $width
$doc.PageSetup.PageHeight = $height
#save the changes
$doc.Save()
$pBackGround = 0
$pAppend = 0
$pRange = 0
#print the file to default printer (i.e. ps printer)
$doc.printout([ref]$pBackGround,[ref]$pAppend,[ref]$pRange,[ref]$outputfile)
...
Looking at the MS docs for the PageSetup object, the PageHeight and PageWidth properties say the following:
Setting the PageHeight or PageWidth property changes the PaperSize property to wdPaperCustom.
And looking at the PaperSize property, it's an enum, WdPaperSize, with a fixed set of values for the standard paper sizes plus wdPaperCustom.
But as the quote above says, if you set the height and width, the paper size is set to wdPaperCustom. That is NOT the same as PostScript Custom Page Size, which, from what I have read, is not one of the valid enum values.
Pure PowerShell
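The only pure-PowerShell way I found to print a Word (.docx) file is the Start-Process command with the Print verb. If you don't want to use the default printer, you can pipe it to the Out-Printer command (although Start-Process writes nothing to the pipeline unless -PassThru is used, so the Out-Printer stage probably doesn't change which printer Word uses):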
Start-Process $file -verb Print | out-printer -name "PrinterName"
This prints the document, but it actually opens Word to print, which has two problems:
a. You have to manually specify the output file name
b. It still uses the MS Word default page settings
Recording a VBA macro: when I record myself setting the correct paper size, the macro does not record setting it to PostScript Custom Page Size. This is what the recorded macro looks like:
With Selection.PageSetup
.LineNumbering.Active = False
.Orientation = wdOrientPortrait
.TopMargin = MillimetersToPoints(13)
.BottomMargin = MillimetersToPoints(13)
.LeftMargin = MillimetersToPoints(13)
.RightMargin = MillimetersToPoints(13)
.Gutter = MillimetersToPoints(3)
.HeaderDistance = MillimetersToPoints(12.5)
.FooterDistance = MillimetersToPoints(12.5)
.PageWidth = MillimetersToPoints(181)
.PageHeight = MillimetersToPoints(260)
.FirstPageTray = wdPrinterDefaultBin
.OtherPagesTray = wdPrinterDefaultBin
.SectionStart = wdSectionNewPage
.OddAndEvenPagesHeaderFooter = True
.DifferentFirstPageHeaderFooter = True
.VerticalAlignment = wdAlignVerticalTop
.SuppressEndnotes = False
.MirrorMargins = True
.TwoPagesOnOne = False
.BookFoldPrinting = False
.BookFoldRevPrinting = False
.BookFoldPrintingSheets = 1
.GutterPos = wdGutterPosLeft
End With
As you can see above there is no mention of the paper size being set to any value.
I haven't tried this in C# or .NET, because they all seem to use the same COM object API, which brings me back to the issues with the COM approach above.
I think the issue is that Word seems to ignore the printer settings; even Microsoft seems to acknowledge this.
I create the PostScript printer with the specific paper size (height and width), but when printing, Word ignores these settings and uses its own defaults. Even though the page height and width in Word are set correctly, it's the paper size property that seems to be causing the problem.
So the only logical thing I can think of is to remove Word from the mix. The issue there is that I can't find anything else that handles Word files properly. You can send the file straight to a printer, say from PowerShell, but that still seems to open Word and use Word's settings.
Does anyone know a way around this, or a way to programmatically set the paper size to PostScript Custom Page Size?
For anyone interested in how I solved this: well, I say solved, but it's more of a different solution that achieves the same outcome.
I still have not found a way to change the paper size to PostScript Custom Page Size. Instead, I got the paper size to change based on the width and height set on the document (as the documentation describes), which feels like a better solution to me. These are the steps I took:
Chose the PostScript driver I wanted to use. I decided on the Xerox PS Class Driver, a PostScript driver bundled with Windows.
Found where the driver is located. Printer drivers on Windows live under the following directory:
C:\Windows\System32\DriverStore\FileRepository\
You can locate the driver you are after with the following grep-like command:
findstr /S /I /M /C:"Xerox PS Class Driver" C:\Windows\System32\DriverStore\FileRepository\*.*
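If findstr is not convenient, the same search can be sketched in Python. This is an illustration, not one of the original steps: find_files_containing is a hypothetical helper that walks the directory tree and returns the files whose contents contain the driver name, ignoring case. It also checks a UTF-16 reading of each file, on the assumption that some driver files are stored that way:

```python
import os
from typing import List


def find_files_containing(root: str, needle: str) -> List[str]:
    """Return paths of files under root whose contents contain needle (any case)."""
    needle_lower = needle.lower()
    matches = []
    # os.walk silently skips directories it cannot list
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as f:
                    data = f.read()
            except OSError:
                continue  # skip files we are not allowed to read
            # Try an 8-bit reading and a UTF-16 reading of the same bytes.
            readings = (data.decode("latin-1"),
                        data.decode("utf-16-le", errors="ignore"))
            if any(needle_lower in text.lower() for text in readings):
                matches.append(path)
    return matches
```

For example, find_files_containing(r"C:\Windows\System32\DriverStore\FileRepository", "Xerox PS Class Driver") lists candidate files, from which the driver's folder (and its PPD) can be picked out.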
Edited the PPD file, adding the paper size I was looking for and setting it as the default paper size. The most important entry to update is PageSize, which gives the PostScript invocation code for each supported page size. I removed all the other page sizes and added only the one I was after, calling it Custom:
*% Page Size
*OpenUI *PageSize: PickOne
*OrderDependency: 40 AnySetup *PageSize
*DefaultPageSize: Custom
*PageSize Custom/Custom: "featurebegin{<< /PageSize [369 522] >> setpagedevice}featurecleanup"
*CloseUI: *PageSize
The values in this snippet are in points (1 point = 1/72 inch), so you need to convert from millimetres: points = mm * 72 / 25.4. Above I am using 130 mm x 184 mm, which is about 369 x 522 points.
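The conversion can be scripted. A small Python sketch that converts millimetres to points and prints a *PageSize line in the same form as the PPD snippet above; rounding to whole points and the helper names mm_to_points and ppd_page_size_entry are assumptions:

```python
MM_PER_INCH = 25.4
POINTS_PER_INCH = 72  # PostScript points


def mm_to_points(mm: float) -> int:
    """Convert millimetres to whole PostScript points (1 pt = 1/72 inch)."""
    return round(mm * POINTS_PER_INCH / MM_PER_INCH)


def ppd_page_size_entry(name: str, width_mm: float, height_mm: float) -> str:
    """Build a *PageSize option line in the style of the PPD snippet above."""
    w = mm_to_points(width_mm)
    h = mm_to_points(height_mm)
    # Doubled braces in the f-string produce literal { and }.
    return (f'*PageSize {name}/{name}: "featurebegin{{<< /PageSize [{w} {h}] >> '
            f'setpagedevice}}featurecleanup"')


print(mm_to_points(130), mm_to_points(184))
print(ppd_page_size_entry("Custom", 130, 184))
```

For the 181 mm x 260 mm page from the question, the same conversion gives about 513 x 737 points.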
More information about PPD files is in the Adobe PPD specification.
Added a printer using this adjusted printer driver
Add-Printer -Name "PrinterName" -DriverName "Xerox PS Class Driver" -PortName "file:"
To keep things simple, I named my printer after the page size, i.e. 130x184, so it's easy to use programmatically.
Created a new form in the print server properties that matches my new paper size. To do this, open Devices and Printers > click your printer > click Print server properties in the top menu > check "Create a new form" > add a name and set your dimensions > Save Form.
With my PowerShell code above, when the page dimensions of the document are set correctly and the new printer is the default printer, Word finds the new form I just created, because the printer now handles the new page dimensions. In my script I actually set the printer as the default printer on Windows; I just left that part out, so either add it to the script or set the printer as default manually.
Printed a PostScript file using the new printer
Hope this helps someone else too

Extract the text out of illustrator file, any API or script?

I'm working on a project with 1000 files that all have the same typo: e.g. a text layer contains the word "dogs", which has to be changed to "dog". Is there any way I can write a script to do that in Illustrator? Or is there some API that lets me extract the text from a file, edit it (change "dogs" to "dog"), and save it back? I don't want to open 1000 files and do it 1000 times.

PostScript code to un-hide hidden text in PDF

I have a PDF with some hidden text in it.
When I press [CTRL+a] I see the hidden text in my document viewer.
I can copy the text, and I can extract it with pdftotext, but I can't recolor the text so that I can see it in the PDF viewer without pressing [CTRL+a].
So I had the idea that I could use PostScript to change the color of this text object.
But how can I determine which operator sets the color or hides the text?
You cannot use PostScript to achieve what you want. You need to resort to manually editing the PDF file...
There are basically three ways to "hide" text:
It could be white (or any color) text on white (or same color as text) background.
It could be covered by another object, say, a white area, or an image.
It could be using Text Rendering Mode 3 ("3 Tr").
I won't explain the first two cases here, because they are rather unlikely. For the third case you could proceed like this:
Use qpdf to decompress as many of the PDF's streams as possible, producing what qpdf calls a 'QDF mode' PDF:
qpdf --qdf --object-streams=disable input.pdf uncompressed.pdf
Open uncompressed.pdf in a good text editor, such as Vim.
Search for the sequence 3 Tr.
(Text rendering mode 3 is described in the PDF-1.7 specification as "Neither fill nor stroke text (invisible).")
Change it to 1 Tr or 2 Tr and save the file.
(Text rendering mode 1 is "stroke text", mode 2 is "Fill, then stroke text." Mode 1 will only show the outlines...)
Re-compress the file:
qpdf uncompressed.pdf input-modified.pdf
Open the new file input-modified.pdf in your favourite PDF viewer. It should now show the "un-hidden" text.
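The search-and-replace steps above can be scripted. A minimal Python sketch, assuming the file has already been unpacked with qpdf --qdf as shown: it rewrites every standalone 3 Tr to 2 Tr (or another visible mode) without changing the byte length, so the stream lengths qpdf wrote stay valid. It scans raw bytes and does not parse content streams, so a match inside a binary stream qpdf could not decompress, or inside a string literal, would also be changed; checking the result by eye is still wise. unhide_text_render_mode is a hypothetical name:

```python
import re

# "3 Tr" as a standalone operator: the operand 3 must not be the tail of a
# longer number such as 13 or 0.3, and "Tr" must not run into another token.
INVISIBLE_TR = re.compile(rb"(?<![\d.])3(\s+Tr)(?![A-Za-z0-9])")


def unhide_text_render_mode(data: bytes, new_mode: int = 2) -> bytes:
    """Replace text rendering mode 3 (invisible) with new_mode in QDF bytes.

    The replacement keeps the original length, so /Length entries stay correct.
    """
    if new_mode not in (0, 1, 2):
        raise ValueError("new_mode should be 0 (fill), 1 (stroke) or 2 (fill, then stroke)")
    return INVISIBLE_TR.sub(str(new_mode).encode("ascii") + rb"\1", data)
```

Read uncompressed.pdf in binary mode, pass its bytes through unhide_text_render_mode, write the result to a new file, and re-compress that file with qpdf as in the step above.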
Update
Having received a sample of a PDF file with "hidden" text from the OP (via private channels), I can now confirm that the hiding is indeed achieved by using white text color (RGB white).
To make such text visible:
Unpack the PDF, using qpdf --qdf --object-streams=disable in.pdf unpacked.pdf
Search for all occurrences of 1 1 1 rg and 1 1 1 RG. These set the RGB color to white (rg for non-stroking operations such as filling text, RG for stroking operations).
Comments à la %%Contents for page N: in the QDF-version of the uncompressed PDF file will indicate for which page the color setting is valid. (Note, there may be multiple occurrences of the rg and RG operators, each one setting a different (or the same) color for the next drawing operation.)
Now replace the white colors with black ones by overwriting the occurrences you found with 0 0 0 rg and 0 0 0 RG. Don't do this all at once: change them one at a time and check what changes on the respective page after saving. (You may want to avoid turning white text black if it already sits on a black background!)
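To follow the one-at-a-time approach, it helps to list where each white color operator sits. A Python sketch, assuming the QDF page comments look like the %%Contents for page N: comments mentioned above (the pattern also tolerates a space after %%). It only looks for the exact 1 1 1 rg / 1 1 1 RG form, not other ways of setting white such as 1 g; find_white_rgb and blacken_at are hypothetical names:

```python
import bisect
import re
from typing import List, Optional, Tuple

PAGE_COMMENT = re.compile(rb"%%\s*Contents for page (\d+)")
WHITE_RGB = re.compile(rb"(?<![\d.])1\s+1\s+1\s+(rg|RG)(?![A-Za-z])")


def find_white_rgb(data: bytes) -> List[Tuple[int, Optional[int], str]]:
    """List (byte offset, page number, operator) for each white rg/RG.

    The page is taken from the nearest preceding page comment, or None.
    """
    starts, pages = [], []
    for m in PAGE_COMMENT.finditer(data):
        starts.append(m.start())
        pages.append(int(m.group(1)))
    hits = []
    for m in WHITE_RGB.finditer(data):
        i = bisect.bisect_right(starts, m.start()) - 1
        hits.append((m.start(), pages[i] if i >= 0 else None, m.group(1).decode("ascii")))
    return hits


def blacken_at(data: bytes, offset: int) -> bytes:
    """Turn the single white operator starting at offset into 0 0 0 rg/RG.

    Only the three 1s become 0s, so the byte length does not change.
    """
    for m in WHITE_RGB.finditer(data):
        if m.start() == offset:
            span = data[m.start():m.end()].replace(b"1", b"0")
            return data[:m.start()] + span + data[m.end():]
    raise ValueError(f"no white RGB operator at offset {offset}")
```

Print find_white_rgb(...) once, then apply blacken_at to one offset at a time, saving and viewing the page after each change.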
Firstly, hidden text in PDF is normally done with a text rendering mode, not a colour: text rendering mode 3 is 'neither stroke nor fill', so changing the colour won't help you if that is how the text is drawn. Of course, we can't tell whether this is how the text has been drawn (though I suspect it is), because you haven't made the PDF file publicly available. In almost all cases, if you want to discuss a particular file, the best thing to do is make it public.
Secondly, you can't use PostScript to change a PDF file (well, you could write a PostScript program to interpret the PDF file, but that would be hard...)

How to parse text from a plain text file and use the result to highlight a PDF file

Back in 2010, some guy claimed to be capable of doing this:
http://www.mobileread.com/forums/showthread.php?t=103847
"The Kindle stores its annotations in a Mobipocket (".mobi") file for each document and in one long text file named "My Clippings.txt." In this post I describe a system that synchronizes these annotations with PDF versions of the corresponding documents on a computer.
Overview
This system is embodied in an Applescript that parses the My Clippings file and controls the Skim PDF reader. The script first parses the clippings file. It then searches through the clippings and isolates any that come from documents on the kindle matching the filename of the currently open PDF file (the "pertinent clippings"). The script then iterates through each of the pertinent clippings, locating the matching text or location in the PDF document and applying highlights or adding notes where appropriate. The end result is an annotated, printable PDF document that matches the document on the kindle.
You can download the script here: http://dl.dropbox.com/u/2541109/KindleClippings.scpt. Before running the script, be sure to change the value of MyEmail to match your sending address and to verify that the Kindle mount point defined in MyClippingsFile is correct. You'll also need the free Skim PDF Reader.
To use it, send or copy a document file to your kindle. Remember, the kindle supports RTF, DOC, TXT and other common text formats and it will convert them into MobiPocket files internally for easier reading. Make some notes. Then take the same document that you just sent to the kindle and convert it to a PDF, e.g. by using the print to PDF feature in Mac OS X. Be sure to keep the filename the same. Open that same PDF in Skim and run the script. The highlights and notes should appear in the PDF.
If you're interested in how this works, read more on my blog here:
[no longer available]
Sadly, his script is no longer available, nor his blog.
Do you guys know if this is possible? I've been looking for this kind of functionality but can't find it anywhere.
This code, using python and PyMuPDF, works:
import fitz  # PyMuPDF

# the document to annotate
doc = fitz.open("text_to_highlight.pdf")

# the text to be marked
text_list = [
    "first piece of text",
    "second piece of text",
    "third piece of text",
]

for page in doc:
    for text in text_list:
        # quads=True returns Quad objects for each hit instead of rectangles
        quads = page.search_for(text, quads=True)
        if quads:  # skip text that does not occur on this page
            page.add_highlight_annot(quads)

# save to a new PDF
doc.save("text_annotated.pdf")
The original 'My Clippings.txt' needs to be processed somehow. stringr could work, but I found it more useful to edit the text with multiple selections in Sublime Text; the goal is to get a list of highlights in the form of text_list above.
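Building text_list can also be automated. A Python sketch, assuming entries are separated by ========== lines and that highlight entries have a metadata line starting with "- Your Highlight" (the answer below checks for "- Your Note", "- Your Bookmark" and " | Added on "; the exact wording may differ between Kindle firmware versions and languages). highlights_for_book is a hypothetical name:

```python
from typing import List

SEPARATOR = "=========="


def highlights_for_book(clippings: str, book_title: str) -> List[str]:
    """Return the highlighted passages for book_title from My Clippings text.

    Assumed entry layout:
        Title (Author)
        - Your Highlight on page 15 | Location 230-231 | Added on ...
        <blank line>
        highlighted text
        ==========
    """
    result = []
    # Kindle files may contain byte order marks; drop them before parsing.
    for entry in clippings.replace("\ufeff", "").split(SEPARATOR):
        lines = [line.strip() for line in entry.strip().splitlines()]
        if len(lines) < 3:
            continue  # empty trailing piece or malformed entry
        title, meta, body = lines[0], lines[1], lines[2:]
        if book_title not in title or not meta.startswith("- Your Highlight"):
            continue  # other books, notes and bookmarks
        text = " ".join(line for line in body if line)
        if text:
            result.append(text)
    return result
```

Reading "My Clippings.txt" with encoding="utf-8" and passing its contents plus the book title gives a list that can be used directly as text_list in the PyMuPDF code above.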
I am trying to do this using Python plus a Windows macro creator (I'm a Win 7 user). You can use this approach to save the file as RTF, DOCX, PDF, etc. So far, it's been reasonably effective. Note two things first:
1- The 'My Clippings' file only saves the text and the page number, not the position on the page. For example, if you highlighted "mammals are animals" on page 15, you get that line and the page number, but if "mammals are animals" appears more than once on page 15, it's impossible to know which occurrence you highlighted. This is especially bad when you've highlighted a generic word, like "animals" or "the". And if you made a comment by pressing on a word, that word is the only information you get about what on the page the comment refers to (e.g., I pressed on "animals", the menu popped up, and I selected 'Comment'; if "animals" appears 20 times on page 15, I cannot know which one my comment refers to).
2- The only way to retrieve the position on the page would be to analyze the *.pds and *.pdt files inside the *.sdr folder on the Kindle's drive ('Documents'). I can make no sense of these files.
In Python, you can write a short script to extract the information you want from "My Clippings". Then you can use a macro creator to automate copying the text, annotating it in the PDF (using Adobe Acrobat, for example), and saving the PDF file.
Exemplifying with Adobe Acrobat:
Say I want to save all my highlights to the PDF file. First, I run a Python script that copies all the strings related to the highlights (i.e., the highlighted text and the page number) into a new *.txt file. Here's an example of such code (first, copy the "My Clippings.txt" file into the folder the script runs from, e.g. C:\Python27):
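# works with Python 2.7 and 3 (io.open accepts an encoding in both)
import io

bookTitle = 'Book Title'  # the book's name as it's written in "My Clippings.txt"

with io.open('My Clippings.txt', 'r', encoding='utf-8-sig') as rf:
    with io.open('My Clippings Output.txt', 'w', encoding='utf-8') as wf:
        access = 0
        for x in rf:
            if bookTitle in x:
                access = 1  # an entry for this book starts; the title line itself is not copied
                continue
            # for highlights only, instead of all annotations, keep this if statement
            # (it runs before writing, so note/bookmark header lines are not copied):
            if (' | Added on ' in x and '- Your Note ' in x) or ('- Your Bookmark ' in x):
                access = 0
            if access == 1:
                wf.write(x)
            if x.strip() == '==========':
                access = 0  # end of the current entry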
Then I create a macro that copies the page number from the "My Clippings Output.txt" file (it's in the same folder as "My Clippings.txt"), pastes it into Acrobat's page-number box, finds (Ctrl+F) the string on that page, and then presses "Highlight". Done!
There's a catch in Acrobat, though: the search/find function has a limit of ~28 characters, so your highlighted text can't be longer than that. I still don't know how to get around this limitation; I raised the problem here: https://superuser.com/questions/884221/how-to-search-and-highlight-long-passages-in-a-pdf-file . As a workaround for the 28-character limit, you can program the macro to select the text with Shift+Right Arrow 28 times, and then use "cut" instead of "copy".
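Another way to live with the limit is to shorten the search strings in Python before the macro uses them. A sketch, taking the ~28-character limit from the answer above as an assumption (it is a parameter in case it differs in your Acrobat version): search_prefix is a hypothetical helper that trims each highlight to at most that many characters, cutting at a space so the search does not end mid-word:

```python
def search_prefix(text: str, limit: int = 28) -> str:
    """Return the longest prefix of text, at most limit characters long,
    that ends at a word boundary (hard cut if the first word is too long)."""
    text = " ".join(text.split())  # collapse runs of whitespace and newlines
    if len(text) <= limit:
        return text
    cut = text.rfind(" ", 0, limit + 1)  # last space at or before position limit
    return text[:cut] if cut > 0 else text[:limit]
```

The macro can then search for the prefix and extend the selection by hand or by key presses, as described above.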
There are many free and libre macro creators out there; just google and pick the one you like best. For Windows, my favorite is Pulover's Macro Creator. If you have any questions about the process, you can comment here or PM me. I'd prefer you comment here, so that I can improve the answer.