How do you edit a Binary Mainframe file in the RecordEditor using a Cobol Copybook (pt1)

How do you edit a Single-Record-type Binary Mainframe file in the RecordEditor using a Cobol Copybook on a Windows or Linux PC?
Note: This is an attempt to split a very broad question into a series of simpler questions and answers.

To edit a file in the RecordEditor with a Cobol Copybook, you must first load the copybook and then edit the file.
Loading the Cobol Copybook into the RecordEditor
Select the Record Layouts >>> Load Cobol Copybook menu option.
On the Cobol Load Screen, enter the Cobol Copybook and your Mainframe data file.
The RecordEditor will read the file and try to work out what the file attributes are.
The important attributes are:
Split Copybook: use No Split for a single-record-type file.
Font (or Charset / encoding): you need to enter the appropriate encoding for the file.
Cp037 (or IBM037) is US EBCDIC; Cp273 (or IBM273) is German EBCDIC.
Cobol Dialect: use Mainframe for IBM Mainframe Cobol.
File Structure: this corresponds to the RECFM attribute on the Mainframe.
Use Fixed Length Binary for RECFM=FB.
Use Mainframe VB (rdw based) Binary for RECFM=VB.
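To show what the File Structure and encoding attributes mean for the raw bytes, here is a minimal Python sketch. It is not RecordEditor or JRecord code. It assumes a RECFM=FB file transferred in binary with no record delimiters. It also assumes a RECFM=VB file transferred with its 4-byte RDWs kept: a 2-byte big-endian length that includes the RDW, then 2 bytes that are zero for non-spanned records. The function names are hypothetical. Decode only display (text) fields with Cp037/Cp273. Binary and packed fields have to be interpreted using the layout from the copybook, which is why the copybook must match the data exactly.

```python
import struct
from typing import Iterator


def fb_records(data: bytes, lrecl: int) -> Iterator[bytes]:
    """Split a RECFM=FB file (no delimiters) into records of LRECL bytes."""
    if len(data) % lrecl:
        raise ValueError("file size is not a multiple of the record length")
    for start in range(0, len(data), lrecl):
        yield data[start:start + lrecl]


def vb_records(data: bytes) -> Iterator[bytes]:
    """Split a RECFM=VB file that still carries a 4-byte RDW before each record."""
    pos = 0
    while pos < len(data):
        # RDW bytes 0-1: big-endian record length, including the 4 RDW bytes.
        (length,) = struct.unpack_from(">H", data, pos)
        if length < 4 or pos + length > len(data):
            raise ValueError(f"invalid RDW at offset {pos}")
        yield data[pos + 4:pos + length]  # record data without the RDW
        pos += length


# Text fields decode with the EBCDIC code page, e.g. Cp037 (US EBCDIC).
print([rec.decode("cp037") for rec in fb_records("HELLOWORLD".encode("cp037"), 5)])
# ['HELLO', 'WORLD']
```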
The RecordEditor will try to display the file using the current attributes on the right-hand side of the screen. You can experiment with the attributes.
If you cannot get the file to display correctly, you may have the wrong Cobol Copybook.
You must use a Cobol Copybook that matches the data exactly, near enough is never good enough.
Viewing (Editing) your file
Once you have loaded your copybook, go to the Open files screen.
....
Select your data file.
Select your copybook in the Record Layout field.
Click on Edit (the Return key should also work).
Generating Java Code to Read the file.
To generate Java~JRecord code to read the file, select Generate >>> Generate Java~JRecord code for Cobol.
The first screen is basically the same as the Import Cobol Copybook screen. This answer has details on generating Java code.

Related

What should I use to save text as a PDF in GDScript?

I want to make multi-device software with the Godot engine and keep it as lightweight as I can, so I just want to use a LineEdit node and a button for saving the text. Is there any way to save it as .txt and .pdf files with code, or do I need an extra plugin?
Writing a plain text file is relatively easy:
var file = File.new()
file.open("user://some_file.txt", File.WRITE)
file.store_string("Some text")
file.close()
PDF is more difficult. I don't think there are any out-of-the-box solutions. But remember that a PDF is mostly a text-based file with specific commands embedded in the text (although it can also contain compressed binary streams). You would have to study the PDF specification and then generate the required structures yourself using the method described above.
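Here is a rough illustration of that approach as a Python sketch. It is written in Python rather than GDScript, but the same bytes could be written out from Godot. It hand-builds a one-page PDF with four parts: a header, five numbered objects, a cross-reference (xref) table holding the byte offset of each object, and a trailer. It assumes PDF 1.4, the standard Helvetica font (which does not need to be embedded), a US Letter page and a single line of Latin-1 text. It does no line wrapping, Unicode or compression. `build_minimal_pdf` is a hypothetical helper name.

```python
def build_minimal_pdf(text: str) -> bytes:
    """Hand-build a one-page PDF 1.4 file showing `text` in 12 pt Helvetica."""
    # Backslashes and parentheses must be escaped inside a PDF literal string.
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    # Page content: begin text, select font F1 at 12 pt, move to (72, 720), show, end.
    content = f"BT /F1 12 Tf 72 720 Td ({escaped}) Tj ET".encode("latin-1")
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Contents 4 0 R /Resources << /Font << /F1 5 0 R >> >> >>",
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # the xref table needs each object's byte offset
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"
    xref_offset = len(out)
    # Each xref entry is exactly 20 bytes: offset, generation, n/f flag, EOL.
    out += b"xref\n0 %d\n0000000000 65535 f \n" % (len(objects) + 1)
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\nstartxref\n%d\n%%%%EOF\n" % (
        len(objects) + 1,
        xref_offset,
    )
    return bytes(out)
```

For example, `Path("some_file.pdf").write_bytes(build_minimal_pdf("Some text"))` (with `from pathlib import Path`) writes a file that a PDF viewer should open. Anything beyond a line of plain text quickly requires much more of the specification. That is why a library or plugin is usually the practical choice.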

Manipulating PDF file

I would like to read a PDF file as text (PostScript), add new objects to the file structure and save the final output as a new PDF. But if I just copy the PDF PostScript content and paste it into a newly created PDF file (where encoding='ansi'), the file doesn't work.
I suspect this is an encoding issue, but I am not sure what I need to do to get a valid PDF file after manipulating the original PostScript content.
Here is the piece of code that didn't work for me:
pdf_file = open('Input.pdf', 'r', encoding='ansi').read()
pdf_file_bytes = bytearray(pdf_file, 'ansi')
pdf_file = open('Output_bytes.pdf', 'wb').write(pdf_file_bytes)
And as I said, the output PDF is not valid!
First problem: the content of a PDF file is PDF, not PostScript.
Second, PDF is a binary file format, so if you copy and paste it, any kind of translation (such as CR/LF conversion) will break it.
You haven't said what programming language your code uses, though it looks like Python. If it is Python, then reading the file as binary instead of text might help.
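Here is a minimal sketch of that suggestion, assuming Python 3. It reads and writes the file in binary mode, so no character decoding or CR/LF translation takes place. `copy_pdf` is a hypothetical helper. Adding new objects still requires understanding the PDF structure (or using a PDF library), but at least the bytes survive the round trip unchanged.

```python
from pathlib import Path


def copy_pdf(src: str, dst: str) -> bytes:
    """Copy a PDF byte for byte and return the raw bytes for further processing."""
    data = Path(src).read_bytes()  # like open(src, "rb").read(): no decoding
    if not data.startswith(b"%PDF-"):
        raise ValueError(f"{src} does not start with a %PDF- header")
    Path(dst).write_bytes(data)  # like open(dst, "wb").write(data)
    return data
```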
A PDF file is a complex file format consisting of various objects. Unless you understand the low-level syntax of the PDF specification well, it will be difficult or impossible to arbitrarily replace some bytes with other bytes and still end up with a valid PDF file.
More to the point, what are you trying to accomplish? There may be a high-level way of doing it that doesn't involve manipulating PDF syntax directly, e.g. if you need to modify a font, add an annotation, set the PDF version, etc. Otherwise, if you actually need to modify PDF syntax, you need to use a library capable of dealing with low-level objects.
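The cross-reference table is one concrete reason. It records the byte offset of every object, and `startxref` records the offset of the table itself. Inserting or deleting even one byte before those positions therefore makes them wrong. Many viewers try to rebuild a damaged table, but you should not rely on that. The Python sketch below checks those offsets. It assumes a single uncompressed `xref` table with one subsection, as in simple PDF 1.4 files, and does not handle cross-reference streams or incremental updates. `xref_is_consistent` is a hypothetical name.

```python
def xref_is_consistent(pdf: bytes) -> bool:
    """Check that startxref and every in-use classic xref entry point at the right bytes."""
    try:
        after = pdf[pdf.rindex(b"startxref") + len(b"startxref"):]
        xref_pos = int(after.split()[0])
    except (ValueError, IndexError):
        return False  # no startxref keyword, or no number after it
    if not pdf.startswith(b"xref", xref_pos):
        return False  # offset is stale (or the file uses an xref stream)
    tokens = pdf[xref_pos + 4:].split()
    first, count = int(tokens[0]), int(tokens[1])
    for i in range(count):
        offset, generation, kind = tokens[2 + 3 * i:5 + 3 * i]
        if kind == b"n":  # in-use entry; "f" entries are free
            header = f"{first + i} {int(generation)} obj".encode()
            if not pdf.startswith(header, int(offset)):
                return False
    return True
```

Replacing bytes with the same number of bytes (for example `%PDF-1.4` with `%PDF-1.7`) keeps the offsets valid. Replacing `/Count 0` with `/Count 10` shifts everything after it by one byte and breaks them.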

Open pdf file in Microsoft Word using OLE

I am looking for the method of the Word OLE object that can open a PDF in Microsoft Word.
I want to copy all pages of the PDF into a doc/docx file and add footers there.
Could anybody give me a hint on how to import a PDF?
PS: any sample code for this problem would be great.
Thanks,
Lilya
You need an OCR (Optical Character Recognition) engine to convert a PDF to a document. PDF is a generic format and it can include text as images, so it is very hard to convert a PDF to a document. SAP doesn't have any OCR function for doing this. Maybe OpenText (if the customer is using it) has this functionality; I don't have detailed information about OpenText. You need third-party tools for this. If the PDF contains text, you can easily convert it to a text file with online services or command-line utilities; otherwise you need a professional SDK (for example ABBYY FineReader).
I used Foxit PDF Reader to save the PDF file as a text file and wrote a macro to read the text file. Of course, this way you only get the text and nothing else.

Using Texmaker, I want to lock the PDF file created so others cannot copy text or print the file

I'm creating PDFs using Texmaker. I would like to create some of the PDF files so that when I give the PDF to others, they are not able to print the file or copy the text. I know I can do this with some PDF creator applications, but can I do it from a command-line program I have with LaTeX, MiKTeX and Texmaker?
It wouldn't be effective anyway. There are bits in the PDF format that purport to forbid the user from doing this, but they are really just suggestions that the reader application may or may not act on. Nothing stops a user from removing the code that inspects those bits from a free/libre PDF reader, or from simply running a tool over the file to remove the restrictions.
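The bits are less than they sound. In an encrypted PDF they are a single integer, the /P entry of the encryption dictionary, and each restriction is one bit that the reader is expected to check. This Python sketch decodes such a value using the bit positions from the PDF specification's user access permissions table (bit 1 is the lowest-order bit). Bits 9 to 12 only apply to security handler revision 3 or later. The sketch assumes you have already read the /P value from the file, for example with a PDF library. `decode_permissions` is a hypothetical helper.

```python
# Bit positions (1 = least significant) of the /P permission flags.
PERMISSION_BITS = {
    3: "print",
    4: "modify contents",
    5: "copy or extract text and graphics",
    6: "add or modify annotations",
    9: "fill in form fields",
    10: "extract for accessibility",
    11: "assemble document",
    12: "high-quality print",
}


def decode_permissions(p: int) -> dict[str, bool]:
    """Map a /P value (a signed 32-bit integer) to allowed (True) / forbidden (False)."""
    unsigned = p & 0xFFFFFFFF  # interpret negative values as two's complement
    return {
        name: bool((unsigned >> (bit - 1)) & 1)
        for bit, name in PERMISSION_BITS.items()
    }


print(decode_permissions(-3904)["print"])  # False: printing is "forbidden"
```

Whether a viewer actually refuses to print when this returns `False` is entirely up to the viewer.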

Copy+pasting text from PDF results in garbage

I am writing a Master's thesis on an NLP system. One of its components is an extractor.
It extracts plain text from PDF files. There are a few PDF files that cannot be extracted correctly. The extractor (PDFBox library) returns a string like this:
"┤xDn║if|d├gDF"Ti&cD╬lh d FÁhis~n ╗xd f«"d┤ffih »h"
or
"10a61a91a22a25a3a27a17a23a20a8a13a14a61a25a17"
I checked each file that causes this extraction problem, and the text of all these files also cannot be copied and pasted from a PDF reader (Adobe Reader and Foxit Reader). Viewing them in these readers works, but after selecting their content and copying it to the clipboard I get the same wrong text as described above: strings of meaningless characters, or strings of digits and letters.
Could anybody help me???
Very often in such cases, where you can't select, copy'n'paste text from the Acrobat (Reader) window, there is another option which may work nevertheless:
Open 'File' menu,
select 'Save as...',
select 'Text (normal) (*.txt)',
browse to the target directory,
type the name you want to use for the text file.
You'll have all the text from all pages in the file and need to locate the spot you initially wanted to copy'n'paste, so it is not as comfortable as direct copy'n'paste. But it works more reliably.
It also works with acroread on Linux (but you have to choose 'Save as text...' from the file menu).
Update
You can use the pdffonts command-line utility to get a quick analysis of the fonts used by a PDF.
Here is an example output, which shows where a text extraction problem will very likely occur. It uses one of the hand-coded PDF files from a GitHub repository that was created to provide well-commented PDF sample files which can easily be opened in a text editor:
$ pdffonts textextract-bad2.pdf
name                            type         encoding    emb sub uni object ID
------------------------------- ------------ ----------- --- --- --- ---------
BAAAAA+Helvetica                TrueType     WinAnsi     yes yes yes     12  0
CAAAAA+Helvetica-Bold           TrueType     WinAnsi     yes yes no      13  0
How to interpret this table?
The above PDF file uses two subsetted fonts, Helvetica and Helvetica-Bold (as indicated by the BAAAAA+ and CAAAAA+ prefixes to their names, as well as by the yes entries in the sub column).
Both fonts are of type TrueType.
Both fonts use a WinAnsi encoding (a font encoding maps char identifiers used in the PDF source code to glyphs that should be drawn).
However, only the font /Helvetica has a /ToUnicode table inside the PDF (/Helvetica-Bold has none), as indicated by the yes/no in the uni column.
The /ToUnicode table is required to provide a reverse mapping from character identifiers/codes to characters.
A missing /ToUnicode table for a specific font is almost always a sure indicator that text strings using this font cannot be extracted or copied'n'pasted from the PDF. (Even if a /ToUnicode table is present, text extraction may still be a problem, because the table may be damaged, incorrect or incomplete. This is seen in many real-world PDF files and is also demonstrated by a few companion files in the above-linked GitHub repository.)
If you are able to select and copy the text in Adobe Reader (which indicates that the PDF does contain text objects), but you can't paste the copied text into Notepad without it looking like a bunch of garbage characters, then the problem is probably related to the CMap that the selected text uses.
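If you need to check many files, the uni column can be tested automatically. The following Python sketch parses pdffonts output as text. It uses the dashed separator row to find each column's position, so values containing spaces (such as the type CID TrueType) are handled. It returns the fonts that have no /ToUnicode table. It assumes the header-plus-separator layout shown above. `fonts_without_tounicode` is a hypothetical helper.

```python
import re


def fonts_without_tounicode(pdffonts_output: str) -> list[str]:
    """Return the names of fonts whose 'uni' column says 'no'."""
    lines = pdffonts_output.splitlines()
    sep_index = next((i for i, line in enumerate(lines) if line.startswith("---")), None)
    if not sep_index:  # None, or separator without a header row above it
        raise ValueError("no header/separator rows found; is this pdffonts output?")
    # Each run of dashes in the separator row marks one column's character span.
    spans = [m.span() for m in re.finditer(r"-+", lines[sep_index])]
    columns = [lines[sep_index - 1][start:end].strip() for start, end in spans]
    # Let the last column ('object ID') run to the end of the line.
    spans[-1] = (spans[-1][0], None)
    missing = []
    for row in lines[sep_index + 1:]:
        if not row.strip():
            continue
        fields = dict(zip(columns, (row[start:end].strip() for start, end in spans)))
        if fields.get("uni") == "no":
            missing.append(fields["name"])
    return missing
```

To run it on a real file, you could pass it `subprocess.run(["pdffonts", "file.pdf"], capture_output=True, text=True, check=True).stdout`, which requires pdffonts (part of Poppler/Xpdf) to be installed.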
The PDF specification provides many options for the display of textual content and the related extraction of the text content. A CMap specifies the mapping from character codes to character selectors. The PDF spec outlines some predefined CMaps, but other CMaps can also be embedded.
My guess is that either the CMap for this text is corrupt or that the PDFBox library doesn't support this particular CMap. I suggest trying a different SDK just to see if you get any different results.
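It helps to see what a ToUnicode CMap does during extraction. The CMap described above maps codes to glyph selectors for display, while a ToUnicode CMap maps the character codes stored in the page content to Unicode text. If that mapping is missing or broken, an extractor can only guess, which can produce output like the strings in the question. The Python sketch below reads only the `bfchar` entries of a ToUnicode CMap and substitutes U+FFFD for any code without a mapping. It ignores `bfrange` entries and the `codespacerange` that defines code lengths, assuming instead a fixed code length (1 byte by default). The function names are hypothetical.

```python
import re

# A bfchar section lists "<source code> <UTF-16BE destination>" pairs.
BFCHAR_BLOCK = re.compile(rb"beginbfchar(.*?)endbfchar", re.S)
HEX_PAIR = re.compile(rb"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>")


def parse_bfchar(cmap: bytes) -> dict[bytes, str]:
    """Collect code -> Unicode mappings from the bfchar sections of a ToUnicode CMap."""
    mapping = {}
    for block in BFCHAR_BLOCK.findall(cmap):
        for src, dst in HEX_PAIR.findall(block):
            code = bytes.fromhex(src.decode("ascii"))
            mapping[code] = bytes.fromhex(dst.decode("ascii")).decode("utf-16-be")
    return mapping


def decode_codes(codes: bytes, mapping: dict[bytes, str], code_len: int = 1) -> str:
    """Translate shown character codes to text; U+FFFD marks codes with no mapping."""
    return "".join(
        mapping.get(codes[i:i + code_len], "\ufffd")
        for i in range(0, len(codes), code_len)
    )
```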
When the file is opened as a Gmail attachment in Chrome (using the built-in PDF viewer), copying produces normal readable characters!
It worked for me when I had this problem and for others as well. I think the Chrome PDF viewer uses the Google Drive OCR automatically... It's like magic!
What was the PDF created with? Some PDFs contain no encoding information, only the data needed to draw the text, so there is no way to extract it.
Select the text you wish to copy.
Right click
Choose option "Export Selection as"
In the dialog box, choose a file name and save the new file as Rich Text Format (RTF)
Open the RTF file to see your text!
The best way to deal with this is to convert the PDF file to Word using this website:
https://www.ilovepdf.com/pdf_to_word
That fixes the garbage issue.
The best way to deal with this (assuming you have Adobe Acrobat or something similar; I'm not sure whether Reader can do this) is to save the document as JPEG images, recombine all the images into a single PDF, and then use the OCR function to find the text in the pages. After that you can copy and paste the text.
PDF is not a text document. It's more of a vector graphic format that sometimes can contain text. So there are some documents from which you can't extract text unless you are willing to do OCR. That's just the way it is.