Screen command for buffer - gnu-screen

I want to read the contents of a file into the buffer and print it to stdout. I did this:
screen -X readbuf /home/nitro/file|screen -X writebuf|cat /tmp/screen-exchange
but the cat command showed me the screen-exchange file with the previous result of the readbuf command. If I run these commands separately, everything works correctly and I get the modified screen-exchange file.
How can I perform all three commands readbuf, writebuf and cat at once?

This works for me:
screen -X eval "readbuf /tmp/x" writebuf && cat /tmp/screen-exchange
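If you use this often, the same one-liner can be wrapped in a small shell function; a minimal sketch (the dumpbuf name is made up here, and the default exchange file path /tmp/screen-exchange is assumed):
# dumpbuf: read FILE into screen's paste buffer, write the buffer to the
# exchange file, then print it. Assumes the default exchange file path.
dumpbuf() {
    screen -X eval "readbuf $1" writebuf && cat /tmp/screen-exchange
}
dumpbuf /home/nitro/file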

How to run awk script on multiple files

I need to run a command on hundreds of files and I need help to get a loop to do this:
have a list of input files /path/dir/file1.csv, file2.csv, ..., fileN.csv
need to run a script on all those input files
script is something like: command input=/path/dir/file1.csv output=output1
I have tried things like:
for f in /path/dir/file*.csv; do command ..., but how do I get it to read each input file and write a new output file every time?
Thank you....
Try this (after changing /path/to/data to the correct path; do the same with /path/to/awkscript and the other placeholders, so they point to your test data):
#!/bin/bash
cd /path/to/data
for f in *.csv ; do
echo "awk -f /path/to/awkscript \"$f\" > ${f%.csv}.rpt"
#remove_me awk -f /path/to/awkscript "$f" > ${f%.csv}.rpt
done
Make the script executable with:
chmod 755 myScript.sh
The echo version will help you ensure the script is going to work as expected. You still have to examine that output carefully, OR work on a copy of your data so you don't wreck your baseline data.
You could take the output of the last iteration
awk -f /path/to/awkscript myFileLast.csv > myFileLast.rpt
and copy/paste it to the command line to confirm it will work.
When you're comfortable that the awk script works as you need, comment out the echo awk ... line and delete the #remove_me marker so the real awk line runs (and save your bash script).
for f in /path/to/files/*.csv ; do
    bname=$(basename "$f")
    pref=${bname%%.csv}
    awk -f /path/to/awkscript "$f" > "/path/to/store/output/${pref}_new.txt"
done
Hopefully this helps; I am on my BlackBerry, so there may be typos.
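If your real command uses the input=/output= argument style from your example, the same loop shape applies; a minimal sketch (your_command is a placeholder for the actual tool, and the output naming is just one possibility):
#!/bin/bash
n=1
for f in /path/dir/file*.csv ; do
    # your_command is a placeholder for the actual tool from the question
    your_command input="$f" output="output${n}"
    n=$((n+1))
done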

How to extract table data from PDF as CSV from the command line?

I want to extract all rows from here while ignoring the column headers as well as all page headers, i.e. Supported Devices.
pdftotext -layout DAC06E7D1302B790429AF6E84696FCFAB20B.pdf - \
| sed '$d' \
| sed -r 's/ +/,/g; s/ //g' \
> output.csv
The resulting file should be in CSV spreadsheet format (comma separated value fields).
In other words, I want to improve the above command so that the output doesn't break at all. Any ideas?
I'll offer you another solution as well.
While in this case the pdftotext method works with reasonable effort, there may be cases where not every page has the same column widths (unlike your rather benign PDF).
Here the not-so-well-known, but pretty cool Free and OpenSource Software Tabula-Extractor is the best choice.
I myself am using the direct GitHub checkout:
$ cd $HOME ; mkdir svn-stuff ; cd svn-stuff
$ git clone https://github.com/tabulapdf/tabula-extractor.git git.tabula-extractor
I wrote myself a pretty simple wrapper script like this:
$ cat ~/bin/tabulaextr
#!/bin/bash
cd ${HOME}/svn-stuff/git.tabula-extractor/bin
./tabula "$@"
Since ~/bin/ is in my $PATH, I just run
$ tabulaextr --pages all \
$(pwd)/DAC06E7D1302B790429AF6E84696FCFAB20B.pdf \
| tee my.csv
to extract all the tables from all pages and convert them to a single CSV file.
The first ten (out of a total of 8727) lines of the CSV look like this:
$ head DAC06E7D1302B790429AF6E84696FCFAB20B.csv
Retail Branding,Marketing Name,Device,Model
"","",AD681H,Smartfren Andromax AD681H
"","",FJL21,FJL21
"","",Luno,Luno
"","",T31,Panasonic T31
"","",hws7721g,MediaPad 7 Youth 2
3Q,OC1020A,OC1020A,OC1020A
7Eleven,IN265,IN265,IN265
A.O.I. ELECTRONICS FACTORY,A.O.I.,TR10CS1_11,TR10CS1
AG Mobile,Status,Status,Status
which in the original PDF look like this:
It even got these lines on the last page, 293, right:
nabi,"nabi Big Tab HD\xe2\x84\xa2 20""",DMTAB-NV20A,DMTAB-NV20A
nabi,"nabi Big Tab HD\xe2\x84\xa2 24""",DMTAB-NV24A,DMTAB-NV24A
which look on the PDF page like this:
TabulaPDF and Tabula-Extractor are really, really cool for jobs like this!
Update
Here is an asciinema screencast (which you can also download and replay locally in your Linux/MacOSX/Unix terminal with the help of the asciinema command line tool), starring tabula-extractor:
As Martin R commented, tabula-java is the new version of tabula-extractor and is actively developed. Version 1.0.0 was released on July 21st, 2017.
Download the jar file and run it with a recent Java:
java -jar ./tabula-1.0.0-jar-with-dependencies.jar \
--pages=all \
./DAC06E7D1302B790429AF6E84696FCFAB20B.pdf \
> support_devices.csv
What you want is rather easy, but you're having a different problem also (I'm not sure you are aware of it...).
First, you should add -nopgbrk ("no page breaks, please!") to your command, so that the pesky ^L characters which would otherwise appear in the output don't need to be filtered out later.
Adding a grep -vE '(Supported Devices|^$|Marketing Name)' will then filter out all the lines you do not want: empty lines as well as the page and column header lines:
pdftotext -layout -nopgbrk \
DAC06E7D1302B790429AF6E84696FCFAB20B.pdf - \
| grep -vE '(Supported Devices|^$|Marketing Name)' \
| gsed '$d' \
| gsed -r 's# +#,#g' \
| gsed 's# ##g' \
> output2.csv
However, your other problem is this:
Some of the table fields are empty.
Empty fields appear with the -layout option as a series of space characters, sometimes even two in the same row.
However, the text columns are not spaced identically from page to page.
Therefore you will not know from line to line how many spaces you need to regard as an "empty CSV field" (where you'd need an extra , separator).
As a consequence, your current code will show only one, two or three (instead of four) fields for some lines, and these fields end up in the wrong columns!
There is a workaround for this:
Add the -x ... -y ... -W ... -H ... parameters to pdftotext to crop the PDF column-wise.
Then append the columns with a combination of utilities like paste and column.
The following command extracts the first column:
pdftotext -layout -x 38 -y 77 -W 176 -H 500 \
DAC06E7D1302B790429AF6E84696FCFAB20B.pdf - > 1st-columns.txt
These are for second, third and fourth columns:
pdftotext -layout -x 214 -y 77 -W 176 -H 500 \
DAC06E7D1302B790429AF6E84696FCFAB20B.pdf - > 2nd-columns.txt
pdftotext -layout -x 390 -y 77 -W 176 -H 500 \
DAC06E7D1302B790429AF6E84696FCFAB20B.pdf - > 3rd-columns.txt
pdftotext -layout -x 567 -y 77 -W 176 -H 500 \
DAC06E7D1302B790429AF6E84696FCFAB20B.pdf - > 4th-columns.txt
BTW, I cheated a bit: in order to get a clue about what values to use for -x, -y, -W and -H, I first ran this command to find the exact coordinates of the column header words:
pdftotext -f 1 -l 1 -layout -bbox \
DAC06E7D1302B790429AF6E84696FCFAB20B.pdf - | head -n 10
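If you are curious what that gives you, one quick way to pick out just the coordinates of the column header words from the -bbox output is to grep for them; a small sketch (the header words are the ones seen in the CSV above):
pdftotext -f 1 -l 1 -layout -bbox \
  DAC06E7D1302B790429AF6E84696FCFAB20B.pdf - \
  | grep -E 'Retail|Branding|Marketing|Device|Model'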
It's always good if you know how to read and make use of pdftotext -h. :-)
Anyway, how to append the four text files as columns side by side, with the proper CSV separator in between, you should find out yourself. Or ask a new question :-)
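That said, a minimal sketch of that final step could look like this (assuming the four per-column files line up row for row; the sed just trims the padding spaces, and output3.csv is an arbitrary name):
paste -d, 1st-columns.txt 2nd-columns.txt 3rd-columns.txt 4th-columns.txt \
  | sed -E 's/ *, */,/g; s/^ +//; s/ +$//' \
  > output3.csv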
This can be done easily with an IntelliGet (http://akribiatech.com/intelliget) script as below
userVariables = brand, name, device, model;
{ start = Not(Or(Or(IsSubstring("Supported Devices",Line(0)),
IsSubstring("Retail Branding",Line(0))),
IsEqual(Length(Trim(Line(0))),0)));
brand = Trim(Substring(Line(0),10,44));
name = Trim(Substring(Line(0),45,79));
device = Trim(Substring(Line(0),80,114));
model = Trim(Substring(Line(0),115,200));
output = Concat(brand, ",", name, ",", device, ",", model);
}
For the case where you want to extract tabular data from PDFs over which you have control at creation time (e.g. timesheets or contracts your employees have to sign), the following solution will be cleaner:
Create a PDF form with field IDs.
Let people fill and save the PDF forms.
Use Apache PDFBox, an open source tool that allows you to extract form data from a PDF. It includes a command-line example tool, PrintFields, that you would call as follows to print the desired field information:
org.apache.pdfbox.examples.interactive.form.PrintFields file.pdf
For other options, see this question.
As an alternative to the above workflow, maybe you could also use a digital signature web service that allows PDF form filling and export of the data to tables, such as SignRequest, which lets you create templates and later export the data of signed documents. (Not affiliated, just found this myself.)

Grep query in C shell script not performing properly

When I run the grep command on the command prompt, the output is correct. However, when I run it as part of a script, I only get partial output. Does anyone know what is wrong with this programme?
#!/bin/csh
set res = `grep -E "OPEN *(OUTPUT|INPUT|I-O|EXTEND)" ~/work/lst/TXT12UPD.lst`
echo $res
Your wildcard is probably being processed by the shell calling grep rather than being passed to grep as part of the pattern.
Try escaping the * with a \ (i.e. \*).

How to get few lines from a .gz compressed file without uncompressing

How do I get the first few lines from a gzipped file?
I tried zcat, but it's throwing an error:
zcat CONN.20111109.0057.gz|head
CONN.20111109.0057.gz.Z: A file or directory in the path name does not exist.
zcat(1) can be supplied by either compress(1) or by gzip(1). On your system, it appears to be compress(1) -- it is looking for a file with a .Z extension.
Switch to gzip -cd in place of zcat and your command should work fine:
gzip -cd CONN.20111109.0057.gz | head
Explanation
-c --stdout --to-stdout
Write output on standard output; keep original files unchanged. If there are several input files, the output consists of a sequence of independently compressed members. To obtain better compression, concatenate all input files before compressing them.
-d --decompress --uncompress
Decompress.
On some systems (e.g., Mac), you need to use gzcat.
On a Mac, you need to use < with zcat:
zcat < CONN.20111109.0057.gz|head
If a continuous range of lines is needed, one option might be:
gunzip -c file.gz | sed -n '5,10p;11q' > subFile
where lines 5 through 10 (both inclusive) of file.gz are extracted into a new subFile. For sed options, refer to the manual.
If every, say, 5th line is required:
gunzip -c file.gz | sed -n '1~5p' > subFile
which extracts the 1st line, jumps over four lines, picks the next one, and so on.
If you want to use zcat, this will show the first 10 rows
zcat your_filename.gz | head
Let's say you want the first 16 rows:
zcat your_filename.gz | head -n 16
This awk snippet will let you show not only the first few lines but any range you specify. It will also add line numbers, which I needed for debugging an error message pointing to a certain line way down in a gzipped file.
gunzip -c file.gz | awk -v from=10 -v to=20 'NR>=from { print NR,$0; if (NR>=to) exit 1}'
Here is the awk snippet used in the one-liner above. In awk, NR is a built-in variable (number of records seen so far) which is usually equivalent to a line number. The from and to variables are picked up from the command line via the -v options.
NR>=from {
print NR,$0;
if (NR>=to)
exit 1
}

How do I iterate over all the lines output by a command in zsh?

How do I iterate over all the lines output by a command using zsh, without setting IFS?
The reason is that I want to run a command against every file output by a command, and some of these files contain spaces.
Eg, given the deleted file:
foo/bar baz/gamma
That is, a single directory 'foo', containing a sub directory 'bar baz', containing a file 'gamma'.
Then running:
git ls-files --deleted | xargs ls
will result in that file being handled as two files: 'foo/bar' and 'baz/gamma'.
I need it to handle it as one file: 'foo/bar baz/gamma'.
If you want to run the command once for all the lines:
ls "${(#f)$(git ls-files --deleted)}"
The f parameter expansion flag means to split the command's output on newlines. There's a more general form (#s:||:) to split at an arbitrary string like ||. The # flag means to retain empty records. Somewhat confusingly, the whole expansion needs to be inside double quotes, to avoid IFS splitting on the output of the command substitution, but it will produce separate words for each record.
If you want to run the command for each line in turn, the portable idiom isn't particularly complicated:
git ls-files --deleted | while IFS= read -r line; do ls "$line"; done
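A zsh-native variant of the same per-line loop, combining the (@f) split shown above with an ordinary for loop (a sketch, equivalent in effect):
for line in "${(@f)$(git ls-files --deleted)}"; do
  ls -- "$line"
done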
If you want to run the command as few times as the command line length limit permits, use zargs.
autoload -U zargs
zargs -- "${(#f)$(git ls-files --deleted)}" -- ls
Using tr and the -0 option of xargs, assuming that the lines don't contain \000 (NUL), which is a fair assumption due to NUL being one of the characters that can't appear in filenames:
git ls-files --deleted | tr '\n' '\000' | xargs -0 ls
This turns the line foo/bar baz/gamma\n into foo/bar baz/gamma\000, which xargs -0 knows how to handle.
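As a side note, git ls-files also has a -z option that NUL-terminates its output directly, so the tr step can be dropped if you prefer:
git ls-files -z --deleted | xargs -0 ls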