gzip several files and pipe them into one input

I have a program that takes one argument for the source file and then parses it. I have several gzipped files that I would like to parse, but since the program only takes one input, I'm wondering if there is a way to create one huge file using gzip and then pipe it into that single input.

Use zcat - you can give it multiple input files, and it will de-gzip them and concatenate them just like cat would. If your parser supports piped input on stdin, you can pipe zcat's output to it directly; otherwise, redirect the output to a file and then invoke your parser on that file.
If the program actually expects a gzip'd file, then pipe the output from zcat to gzip to recompress the combined data into a single gzip'd file.
http://www.mkssoftware.com/docs/man1/zcat.1.asp
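For example, a sketch of all three variants (myparser stands in for your actual parser, and file1.gz / file2.gz for your inputs):
$ zcat file1.gz file2.gz | myparser            # if the parser reads stdin
$ zcat file1.gz file2.gz > combined.txt        # or stage to a plain file first
$ zcat file1.gz file2.gz | gzip > combined.gz  # recompress if a .gz file is expected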

OpenVMS: Extracting RMS indexed file to Windows as a sequential flat file

I haven't used OpenVMS for 20+ years. It was my 1st OS. I've been asked if it is possible to copy the data from RMS files on an OpenVMS server to Windows as a text file, so that it's readable.
No one has experience or knowledge of the record structures etc.
The files are xyz.DAT and are relative files. I'm hoping the .DAT files are fixed length.
My 1st attempt would be to try and use Datatrieve (DTR), but I get an error that the image isn't loaded.
I thought it might be as easy as using CONVERT/FDL = nnnn.FDL and changing the organization from Relative to Sequential. The file still seems to be unreadable.
Is there an easy way to stream an RMS indexed file to a flat ASCII file?
I used to use COBOL and C to access the data in the past, but had lots of libraries to help....
I've noticed some solutions may use ODBC to connect, but I'm not sure what I can or cannot install on the server.
I can FTP to the server using FileZilla....
Another plan is writing a C application to read a file and output it as strings.....or DCL too.....it doesn't have to be quick...
Any ideas?
As mentioned before:
The simple solution MIGHT be to just use: $ TYPE/OUT=test.TXT test.DAT.
This will handle Relative and Indexed files alike.
It is much the same as $ CONVERT /FDL=NL: test.DAT test.TXT
Both will just read records from the source and transfer the bytes, byte for byte, to the records in a sequential file.
FTP in ASCII mode will transfer that nicely to windows.
You can also use an 'inline' FDL file to generate a 'unix' LF file like:
$ conv /fdl="record; format stream_lf" test.DAT test.TXT
Or CR-LF file using:
$ conv /fdl="record; format stream" test.DAT test.TXT
Both can be transferred in binary or ASCII mode with FTP.
MOSTLY - because this really only works well for a TEXT-ONLY source .DAT file.
There should be no CR, LF, FF or NUL characters in the source or things will break.
As 'habo' points out, use DUMP /RECORD=COUNT=3 to see how 'readable' the source data is.
If you spot 'binary' data using DUMP then you will need to find a record definition somewhere which maps bytes to integers or floating points or dates as needed.
These definitions can be COBOL LIB files, or BASIC MAPs, and are often stored IN the CDD (Common Data Dictionary) or indeed in DATATRIEVE .DIC DICTIONARIES.
To use such a definition you likely need a program to just read, following the 'map', and write/print as text. Normally that's not too hard, notably not when you can find an example program on the server to tweak.
If it is just one or two 'suspect' byte ranges, then you can create a DCL loop to read and write and use F$EXTRACT to select the chunks you like.
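For instance, a minimal DCL sketch along those lines (the byte offsets and lengths here are invented for illustration; substitute the ranges from your own record layout):
$ OPEN/READ infile TEST.DAT
$ OPEN/WRITE outfile TEST.TXT
$ LOOP:
$ READ/END_OF_FILE=DONE infile record
$ ! keep bytes 0-19 and 40-79, skipping a 'suspect' binary range in between
$ WRITE outfile F$EXTRACT(0, 20, record) + " " + F$EXTRACT(40, 40, record)
$ GOTO LOOP
$ DONE:
$ CLOSE infile
$ CLOSE outfile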
If you want further help, kindly describe in words what kind of data is expected and perhaps provide the output from DUMP for 3 or 5 rows.
Good luck!
Hein.

Extract from tar.gz by file name

I've got a large number of files inside a .tar.gz.
Is it (programmatically) possible to extract a file by its filename, without the overhead of decompressing other files?
I'll split the reply into two parts
Is it (programmatically) possible to extract a file by its filename
Yes, it is possible to extract a file by its filename:
tar xzf tarfile.tar.gz filename
without the overhead of decompressing other files?
In order to extract a file from a compressed tar file, the tar program has to find the file you want. If that is the first file in the tarfile, then it only has to decompress that. If the file isn't the first in the tarfile, the tar program needs to scan through the tarfile until it finds the file you want. To do that it MUST decompress the preceding files in the tarfile. That doesn't mean it has to extract them to disk or buffer them in memory: it streams the decompression, so the memory overhead isn't significant.
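If you don't know the exact stored path, one approach (file names here are placeholders) is to list the archive first and then extract only the member you want:
$ tar tzf archive.tar.gz                    # list members to find the exact stored path
$ tar xzf archive.tar.gz dir/wanted.txt     # extract just that member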

Behave framework .ini file usage

In my .ini file I have
[behave]
format=rerun
outfiles=rerun_failing.features
So I want to use the "rerun_failing.features" file for storing scenarios that fail.
However, when I run with the '--steps-catalog' option, it also stores that catalog to the same file. Why is that?
How do I set up two separate files for '--rerun' and '--steps-catalog'?
Thanks!
Use behave --dry-run -f steps.catalog ... instead. The output of the steps.catalog formatter is written to stdout, not the "rerun-outputfile".
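For example, something like this should keep the two outputs apart (steps_catalog.txt is just a name picked for illustration):
$ behave --dry-run -f steps.catalog > steps_catalog.txt
The rerun formatter configured in the .ini keeps writing failing scenarios to rerun_failing.features as before.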

How to Input Redirect Two Files to Standard Input?

Is it possible to redirect two or more files to standard input in one command? For example
$ myProgram < file1 < file2
I tried that command; however, it seemed like only one of the files was being read and the other ignored...
If not, how can I achieve that?
NOTE: concatenating the two files will not help in my case.
You can't do this with plain redirection from bash: with < file1 < file2, only the last redirection takes effect, so standard input never receives both files. What bash does offer is Process Substitution.
The output of each substitution is presented as a file descriptor under /dev/fd/<n>.
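A sketch of both variants (myProgram is the asker's placeholder; the second form only helps if the program accepts file name arguments):
$ myProgram < <(cat file1 file2)        # combined stream arrives on stdin
$ myProgram <(cat file1) <(cat file2)   # each input appears as a /dev/fd/<n> path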

Programmatically splitting PDF pages into their own PDFs in UNIX

I am trying to write a program that takes a .pdf file as input and separates each page into its own .pdf file, on the UNIX command line. I have tried SplitPdf, but for some reason I keep getting errors.
Update: I have already tried pdftk, but it has poor performance and a limitation on the size of the pdf file.
Use pdftk.
The burst command is what you are after.
Man page section: http://www.pdflabs.com/docs/pdftk-man-page/#dest-op-burst
burst
Splits a single, input PDF document into individual pages. Also creates a report named doc_data.txt which is the same as the output from dump_data. If the output section is omitted, then PDF pages are named: pg_%04d.pdf, e.g.: pg_0001.pdf, pg_0002.pdf, etc. To name these pages yourself, supply a printf-styled format string in the output section. For example, if you want pages named: page_01.pdf, page_02.pdf, etc., pass output page_%02d.pdf to pdftk. Encryption can be applied to the output by appending output options such as owner_pw, e.g.: pdftk in.pdf burst owner_pw foopass
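Putting it together, a typical invocation looks like this (in.pdf stands for your input document):
$ pdftk in.pdf burst output page_%03d.pdf
This writes page_001.pdf, page_002.pdf, and so on to the current directory, plus the doc_data.txt report.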