Grep | cut not finishing process

I'm trying to parse some text from a text file in Linux using the following command:
grep "x" | cut -d ":" -f 2 EthernetData1.txt
Everything seems to be working fine, as in the display I can see the expected result, but the process does not finish, so I can't execute another command without pressing Ctrl+C.
The file is quite big, but the process seems to reach the end of it.
Any suggestion?
Thank you.

I think you mean
grep "x" EthernetData1.txt | cut ...
i.e. you need to give your input file to grep, not cut.
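Putting it together with the same cut options as in your original command, that is:
grep "x" EthernetData1.txt | cut -d ":" -f 2
That also explains the hang: with no file argument, grep was reading from standard input (the terminal), so the pipeline kept waiting for you to type something, while cut happily printed the whole file. Pressing Ctrl+D (end of input) would also have ended it.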

Related

How to place quote marks in an Ansible task with grep, awk, sed

My task searches the CMD column for the config, to gather what the directory of the application config is and also the PID.
---
- hosts: all
  pre_tasks:
    - name: Check if process is running
      become: yes
      shell: 'ps -e --format="pid cmd" | grep process.cfg | sed -e "s/[[:space:]]\+/ /g"| grep -v color'
      register: proces_out
The output looks like this after running this command:
32423 /var/local/bin/application -c /var/local/etc/process.cfg
But I think Ansible has trouble with 2 greps in 1 command. I need them both, because if I don't use the inverted "grep -v color", this annoying "grep --color=auto" line appears and I can't cut out the PID that I need in another task (which kills the process), because the real process is on the second line.
My second idea was to use awk, which I think would be the best tool for this case, but if I use double quotation marks in the --format parameter and in the sed command, and single quotation marks in the awk parameters, they don't want to cooperate. Even if I keep them balanced, they interfere with each other.
AWK idea:
shell: 'ps -e --format="pid cmd" | grep process.cfg | sed -e "s/[[:space:]]\+/ /g"| awk 'FNR == 2''
I want to ask for a hint: what would be the best way to avoid this quoting incompatibility and still be able to use the output in variables afterwards, like this:
## PID
{{ proces_out.stdout.split(' ')[0] }}
## application
{{ proces_out.stdout.split(' ')[1] }}
## config
{{ proces_out.stdout.split(' ')[3] }}
But I think Ansible has trouble with 2 greps in 1 command
That is for sure not true
if I don't use the inverted "grep -v color", this annoying "grep --color=auto" line appears and I can't cut out the PID that I need in another task (which kills the process), because the real process is on the second line.
You are running into the classic case of the grep process matching its own regex, as will happen in a lot of "simple" cases. What you want is a regex that matches your string but does not match itself. In that example above it would be:
shell: 'ps -e --format="pid cmd" | grep process[.]cfg | sed -e "s/[[:space:]]\+/ /g"'
because process[.]cfg matches process.cfg but does not match the string process[.]cfg. I also fixed your regex, because in a regex the . means "any character", which doesn't appear to be what you really wanted.
With regard to that --color bit, you can likely side-step that nonsense by using the full path to grep, which will cause bash to really execute the binary, versus some alias that uses --color=auto. I actually wouldn't have expected the colors to show up in an Ansible run, because it's not the right $TERM, but systems are weird.
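As for the single-quote clash with awk: inside a YAML single-quoted scalar you can escape a quote by doubling it (''), but a more readable way is to drop the outer quoting and use a folded block scalar, so awk keeps its own quotes untouched. A sketch combining that with the process[.]cfg fix (with the self-match gone, the awk line-picking may not even be necessary):
- name: Check if process is running
  become: yes
  # folded scalar: the newlines collapse to spaces, and having no outer quotes
  # means awk's single quotes never collide with YAML quoting
  shell: >
    ps -e --format="pid cmd" |
    grep 'process[.]cfg' |
    sed -e "s/[[:space:]]\+/ /g" |
    awk 'FNR == 1'
  register: proces_out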
Thank you, Matthew, for that solution, but I found a different option to avoid the unnecessary output.
The syntax is almost the same, but I added the additional parameter ppid (parent process id) to --format. In most cases I believe the parent process always has the number 1 in the output, which helps to sort it the way I want.
It looks like this:
  shell: >
    ps -e --format="ppid pid cmd" |
    grep process.cfg |
    sed -e "s/[[:space:]]\+/ /g"
  register: output_process
And the output looks like this:
1 54345 /var/local/bin/application -c /var/local/etc/process.cfg
6435 6577 grep --color=auto process.cfg
Now it's easy; we can use Ansible to sort it out:
- name: Kill process
  become: yes
  shell: "kill {{ output_process.stdout_lines[0].split(' ')[2] }}"
What does it do? It selects line 0, which is the first line, splits the output on spaces and selects the 3rd field. In the output there is a space before ppid, which is why the PID is the 3rd field.
Thank you again for your solution, Matthew, it might be helpful in another case.

awk search and replace for specific fields

For Excel purposes I need to create a CSV file with an exact format, where some columns are temperatures presented as floats. This is my input file structure:
'14/11/09 00:00 13.0C 25.1C 26.5C 25.4C 26.3C 25.0C *** *** Some text Control
'14/11/10 08:49 POWER ON
So far I'm able to get rid of the "dot" and have a "comma" instead. I have multiple files and I made a list of them. I'm passing this list to my script, which reads it line by line ($line represents an input file):
grep "'" $line | tr -s " " | sed -e "s/'//g" | cut -d" " -f 1-15 |
grep "\*\*\*" | sed -e "s/\./,/g" > $basename"_measurements.csv"
14/11/09 00:00 13,0C 25,1C 26,5C 25,4C 26,3C 25,0C *** *** Some text Control
Excel does not accept 13,0C as a number. But I simply don't have any idea how to get rid of the "C" next to the number, e.g. "13,0C" and so on. I cannot run sed on the whole line because I would break the text columns (e.g. the last column). I thought of using awk on columns 3-8 and piping them to sed, but it gets more and more complicated. Maybe there is a smarter way to do it?
This will trim every C next to a digit:
sed -r 's/([0-9])C/\1/g'
If you want to do the replacement only in certain fields, you'll have better control with awk.
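For instance, a minimal sketch along those lines, assuming the temperatures sit in fields 3 through 8 as in the sample line (input.txt stands in for whatever file you feed it):
awk '{ for (i = 3; i <= 8; i++) { gsub(/\./, ",", $i); sub(/C$/, "", $i) } print }' input.txt
The gsub swaps the decimal point for a comma and the sub strips a trailing C, but only inside those fields, so the text columns at the end of the line are left alone.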

How to extract table data from PDF as CSV from the command line?

I want to extract all rows from here while ignoring the column headers as well as all page headers, i.e. Supported Devices.
pdftotext -layout DAC06E7D1302B790429AF6E84696FCFAB20B.pdf - \
| sed '$d' \
| sed -r 's/ +/,/g; s/ //g' \
> output.csv
The resulting file should be in CSV spreadsheet format (comma separated value fields).
In other words, I want to improve the above command so that the output doesn't break at all. Any ideas?
I'll offer you another solution as well.
While in this case the pdftotext method works with reasonable effort, there may be cases where not every page has the same column widths (unlike your rather benign PDF).
Here the not-so-well-known but pretty cool free and open source tool Tabula-Extractor is the best choice.
I myself am using the direct GitHub checkout:
$ cd $HOME ; mkdir svn-stuff ; cd svn-stuff
$ git clone https://github.com/tabulapdf/tabula-extractor.git git.tabula-extractor
I wrote myself a pretty simple wrapper script like this:
$ cat ~/bin/tabulaextr
#!/bin/bash
cd ${HOME}/svn-stuff/git.tabula-extractor/bin
./tabula "$@"
Since ~/bin/ is in my $PATH, I just run
$ tabulaextr --pages all \
$(pwd)/DAC06E7D1302B790429AF6E84696FCFAB20B.pdf \
| tee my.csv
to extract all the tables from all pages and convert them to a single CSV file.
The first ten (out of a total of 8727) lines of the CSV look like this:
$ head DAC06E7D1302B790429AF6E84696FCFAB20B.csv
Retail Branding,Marketing Name,Device,Model
"","",AD681H,Smartfren Andromax AD681H
"","",FJL21,FJL21
"","",Luno,Luno
"","",T31,Panasonic T31
"","",hws7721g,MediaPad 7 Youth 2
3Q,OC1020A,OC1020A,OC1020A
7Eleven,IN265,IN265,IN265
A.O.I. ELECTRONICS FACTORY,A.O.I.,TR10CS1_11,TR10CS1
AG Mobile,Status,Status,Status
It even got these lines on the last page, 293, right:
nabi,"nabi Big Tab HD\xe2\x84\xa2 20""",DMTAB-NV20A,DMTAB-NV20A
nabi,"nabi Big Tab HD\xe2\x84\xa2 24""",DMTAB-NV24A,DMTAB-NV24A
TabulaPDF and Tabula-Extractor are really, really cool for jobs like this!
Update
As Martin R commented, tabula-java is the new version of tabula-extractor and is actively maintained. Version 1.0.0 was released on July 21st, 2017.
Download the jar file and, with a recent Java, run:
java -jar ./tabula-1.0.0-jar-with-dependencies.jar \
--pages=all \
./DAC06E7D1302B790429AF6E84696FCFAB20B.pdf \
> support_devices.csv
What you want is rather easy, but you're having a different problem also (I'm not sure you are aware of it...).
First, you should add -nopgbrk ("No page breaks, please!") to your command, so that the pesky ^L characters which otherwise appear in the output need not be filtered out later.
Adding a grep -vE '(Supported Devices|^$)' will then filter out all the lines you do not want, including empty lines, or lines with only spaces:
pdftotext -layout -nopgbrk \
DAC06E7D1302B790429AF6E84696FCFAB20B.pdf - \
| grep -vE '(Supported Devices|^$|Marketing Name)' \
| gsed '$d' \
| gsed -r 's# +#,#g' \
| gsed 's# ##g' \
> output2.csv
However, your other problem is this:
Some of the table fields are empty.
Empty fields appear with the -layout option as a series of space characters, sometimes even two in the same row.
However, the text columns are not spaced identically from page to page.
Therefore you will not know from line to line how many spaces you need to regard as an "empty CSV field" (where you'd need an extra , separator).
As a consequence, your current code will show only one, two or three (instead of four) fields for some lines, and these fields end up in the wrong columns!
There is a workaround for this:
Add the -x ... -y ... -W ... -H ... parameters to pdftotext to crop the PDF column-wise.
Then append the columns with a combination of utilities like paste and column.
The following command extracts the first columns:
pdftotext -layout -x 38 -y 77 -W 176 -H 500 \
DAC06E7D1302B790429AF6E84696FCFAB20B.pdf - > 1st-columns.txt
These are for second, third and fourth columns:
pdftotext -layout -x 214 -y 77 -W 176 -H 500 \
DAC06E7D1302B790429AF6E84696FCFAB20B.pdf - > 2nd-columns.txt
pdftotext -layout -x 390 -y 77 -W 176 -H 500 \
DAC06E7D1302B790429AF6E84696FCFAB20B.pdf - > 3rd-columns.txt
pdftotext -layout -x 567 -y 77 -W 176 -H 500 \
DAC06E7D1302B790429AF6E84696FCFAB20B.pdf - > 4th-columns.txt
BTW, I cheated a bit: in order to get a clue about what values to use for -x, -y, -W and -H, I first ran this command to find the exact coordinates of the column header words:
pdftotext -f 1 -l 1 -layout -bbox \
DAC06E7D1302B790429AF6E84696FCFAB20B.pdf - | head -n 10
It's always good if you know how to read and make use of pdftotext -h. :-)
Anyway, how to append the four text files as columns side by side, with the proper CSV separator in between, you should find out yourself. Or ask a new question :-)
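For the impatient, one rough sketch of what that appending could look like, assuming the four *-columns.txt files produced above line up row for row (a hypothetical starting point, not a polished solution):
paste -d, 1st-columns.txt 2nd-columns.txt 3rd-columns.txt 4th-columns.txt \
| sed -e 's/[[:space:]]*,[[:space:]]*/,/g' -e 's/^ *//' \
> output3.csv
The paste glues the files together column-wise with a comma separator, and the sed trims the stray spaces around the commas and at the start of each line.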
This can be done easily with an IntelliGet (http://akribiatech.com/intelliget) script as below
userVariables = brand, name, device, model;
{ start = Not(Or(Or(IsSubstring("Supported Devices",Line(0)),
IsSubstring("Retail Branding",Line(0))),
IsEqual(Length(Trim(Line(0))),0)));
brand = Trim(Substring(Line(0),10,44));
name = Trim(Substring(Line(0),45,79));
device = Trim(Substring(Line(0),80,114));
model = Trim(Substring(Line(0),115,200));
output = Concat(brand, ",", name, ",", device, ",", model);
}
For the case where you want to extract tabular data from PDFs over which you have control at creation time (e.g. timesheets or contracts your employees have to sign), the following approach will be cleaner:
Create a PDF form with field IDs.
Let people fill and save the PDF forms.
Use Apache PDFBox, an open source tool that allows you to extract form data from a PDF. It includes a command-line example tool, PrintFields, that you would call as follows to print the desired field information:
org.apache.pdfbox.examples.interactive.form.PrintFields file.pdf
For other options, see this question.
As an alternative to the above workflow, maybe you could also use a digital signature web service that allows PDF form filling and export of the data to tables, such as SignRequest, which lets you create templates and later export the data of signed documents. (Not affiliated, I just found this myself.)

grep a number from the line and append it to a file

I went through several grep examples, but don't see how to do the following.
Say I have a file with the line
! some test here and number -123.2345 text
I can get this line using
grep ! input.txt
But how do I get the number (possibly positive or negative) from this line and append it to the end of another file? Is it possible to apply grep to grep results?
If yes, then I could get the number via something like
grep -Eo "[0-9]{1,}|\-[0-9]{1,}"
P.S. I am using OS X.
P.P.S. I'm trying to fetch data from several files and put it into a single file for later plotting.
The format with your commands would be:
grep ! input.txt | grep -Eo "[0-9]{1,}|\-[0-9]{1,}" >> output
To grep from grep, we use the pipe operator |; this lets us chain commands together. To append this output to a file, we use the redirection operator >>.
However, there are a couple of problems. Your regexp is better written as grep -Eoe '-?[0-9.]+'; this allows for the decimal point and returns a single number instead of two. And if you want lines that start with !, then grep '^!' is better, to avoid matching lines that contain ! but don't start with it. Better to do:
grep '^!' input | grep -Eoe '-?[0-9.]+' >> output
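And since the P.P.S. mentions collecting numbers from several files into one file for plotting, a minimal sketch of a loop built on the same pipeline (the input*.txt glob is a placeholder; adjust it to your file names):
for f in input*.txt; do
    grep '^!' "$f" | grep -Eoe '-?[0-9.]+' >> output
done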
perl -lne 'm/.*?([\d\.\-]+).*/g;print $1' your_file >>anotherfile_to_append
$ foo="! some test here and number -123.2345 text"
$ echo $foo | sed -e 's/[^0-9\.-]//g'
-123.2345
Edit:
For a file:
[ ]$ cat log
! some test here and number -123.2345 text
some blankline
some line without "the character" and with number 345.566
! again a number 34
[ ]$ sed -e '/^[^!]/d' -e 's/[^0-9.-]//g' log > op
[ ]$ cat op
-123.2345
34
Now let's see the toothpicks :) In '/^[^!]/d': / starts the address pattern, ^ anchors at the beginning of the line, [^!] matches any character that is not !, and d deletes, so every line that does not start with ! is deleted. In the second expression, [^0-9.-] matches anything that is not a digit, a dot or a minus (i.e. everything else), and // replaces it with nothing (i.e. deletes it), and done :)

How to get a few lines from a .gz compressed file without uncompressing it

How do I get the first few lines from a gzipped file?
I tried zcat, but it's throwing an error:
zcat CONN.20111109.0057.gz|head
CONN.20111109.0057.gz.Z: A file or directory in the path name does not exist.
zcat(1) can be supplied by either compress(1) or by gzip(1). On your system, it appears to be compress(1) -- it is looking for a file with a .Z extension.
Switch to gzip -cd in place of zcat and your command should work fine:
gzip -cd CONN.20111109.0057.gz | head
Explanation
-c --stdout --to-stdout
Write output on standard output; keep original files unchanged. If there are several input files, the output consists of a sequence of independently compressed members. To obtain better compression, concatenate all input files before compressing them.
-d --decompress --uncompress
Decompress.
On some systems (e.g., Mac), you need to use gzcat.
On a Mac you need to use the < with zcat:
zcat < CONN.20111109.0057.gz|head
If a continuous range of lines is needed, one option might be:
gunzip -c file.gz | sed -n '5,10p;11q' > subFile
where lines 5 through 10 (both inclusive) of file.gz are extracted into a new subFile; the 11q stops reading once line 11 is reached. For the sed options, refer to the manual.
If every, say, 5th line is required:
gunzip -c file.gz | sed -n '1~5p;6q' > subFile
which extracts the 1st line and then every 5th line after it (lines 6, 11, and so on); the 6q here stops reading at line 6.
If you want to use zcat, this will show the first 10 rows
zcat your_filename.gz | head
Let's say you want the first 16 rows:
zcat your_filename.gz | head -n 16
This awk snippet will let you show not only the first few lines but any range you specify. It will also add line numbers, which I needed for debugging an error message pointing to a certain line way down in a gzipped file.
gunzip -c file.gz | awk -v from=10 -v to=20 'NR>=from { print NR,$0; if (NR>=to) exit 1}'
Here is the awk snippet used in the one-liner above. In awk, NR is a built-in variable (the number of records read so far), which usually is equivalent to a line number. The from and to variables are picked up from the command line via the -v options.
NR>=from {
print NR,$0;
if (NR>=to)
exit 1
}