Delete "0" or "1" from the end of each line, except the first line - awk

the input file looks like
Kick-off team 68 0
Ball safe 69 1
Attack 77 8
Attack 81 4
Throw-in 83 0
Ball possession 86 3
Goal kick 100 10
Ball possession 101 1
Ball safe 114 13
Throw-in 123 9
Ball safe 134 11
Ball safe 135 1
Ball safe 137 2
and at the end it should look like this:
Kick-off team 68 0
Attack 77 8
Attack 81 4
Ball possession 86 3
Goal kick 100 10
Ball safe 114 13
Throw-in 123 9
Ball safe 134 11
Ball safe 137 2
My solution is
awk '{print $NF}' test.txt | sed -re '2,${/(^0$|^1$)/d}'
How can I change the file directly, e.g. with sed -i?

sed -i '2,$ {/[^0-9][01]$/d}' test.txt
2,$ the range of lines to act upon; this one means from the 2nd line to the end of the file
{/[^0-9][01]$/d} from the selected lines, delete those ending with 0 or 1
'2,$ {/ [01]$/d}' can also be used if the character before the last column is always a space
With awk, which is better suited for column manipulations:
awk 'NR==1 || ($NF!=1 && $NF!=0)' test.txt > tmp && mv tmp test.txt
NR==1 first line
($NF!=1 && $NF!=0) last column shouldn't be 0 or 1
You can also use $NF>1 if the last column only has non-negative numbers
> tmp && mv tmp test.txt saves the output to a temporary file and then moves it back as the original file
With GNU awk, there is an inplace option: awk -i inplace 'NR==1 || ($NF!=1 && $NF!=0)' test.txt
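If you also want a backup copy when editing in place, GNU awk's inplace extension can keep one; the exact variable name depends on the gawk version, so treat this as a sketch and check your gawk manual:
# gawk 5.x honors inplace::suffix for the backup file name
awk -i inplace -v inplace::suffix=.bak 'NR==1 || ($NF!=1 && $NF!=0)' test.txt
# older gawk releases used INPLACE_SUFFIX instead
awk -i inplace -v INPLACE_SUFFIX=.bak 'NR==1 || ($NF!=1 && $NF!=0)' test.txt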

Here's my take on this.
sed -i.bak -e '1p;/[^0-9][01]$/d' file.txt
The sed script prints the first line, then deletes all subsequent lines that match the pattern you described. This assumes that your first line would be a candidate for deletion; if it contains something other than 0 or 1 in the last field, this script will print it twice. And the -i option is what tells sed to edit in-place (with a backup file).
Standard awk doesn't have an equivalent option for editing files in-place -- if you want that kind of functionality, you need to implement it in a shell wrapper around your awk script (or use GNU awk's inplace extension), as @sundeep suggested.
Note that I'm not using GNU sed, but this command should work equally well with it.

awk to the rescue!
$ awk 'NR==1 || $NF && $NF!=1' file
or, more cryptically (the product is zero, i.e. false, exactly when the last field is 0 or 1):
$ awk 'NR==1 || $NF*($NF-1)' file

This might work for you (GNU sed):
sed -i '1b;/\s[01]$/d' file
Other than the first line, delete any line ending in 0 or 1.

Related

How do I print every nth entry of the mth column, starting from a particular line of a file?

Consider the following data in a file file.txt:
$
$
$
FORCE 10 30 40
* 1 5 4
FORCE 11 20 22
* 2 3 0
FORCE 19 25 10
* 16 12 8
.
.
.
I want to print every 2nd element of the third column, starting from line 4, resulting in:
30
20
25
I have tried:
cat file.txt | sed 's/\|/ /' | awk 'NR%2==4 {print $3}'
However, this is not resulting in anything being printed and no errors generated either.
You might use awk, checking that the row number is > 3 and that the row number is even with NR%2==0.
Note that you don't have to use cat; awk can read the file directly.
awk 'NR > 3 && NR%2==0 {
print $3
}' file.txt
Output
30
20
25
Using sed
$ sed -En '4~2s/([^ \t]*[ \t]+){2}([^ \t]*).*/\2/p' input_file
30
20
25
I have tried:
cat file.txt | sed 's/\|/ /' | awk 'NR%2==4 {print $3}'
However, this is not resulting in anything being printed and no errors
generated either.
You do not need cat when using GNU sed, as it can read the file on its own; in this case it would be sed 's/\|/ /' file.txt.
You should consider whether you need that part at all: your sample input does not contain a pipe character, so the substitution would do nothing. You can also drop it if the lines holding the values you want to print do not contain that character.
The output is empty because NR%2==4 never holds: the remainder of division by x is always smaller than x (in the particular case of %2 only two values are possible, 0 and 1).
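A corrected version of the attempt with the offset written out explicitly, equivalent to the NR > 3 && NR%2==0 answer above (a sketch; cat and the no-op sed are dropped):
# start at line 4, then take every 2nd line
awk 'NR >= 4 && (NR - 4) % 2 == 0 { print $3 }' file.txt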
This might work for you (GNU sed):
sed -nE '4~2s/^((\S+)\s*){3}.*/\2/p' file
Turn off implicit printing by setting the -n option and reduce backslashes in regexps by turning on -E.
From the fourth line and then every second line thereafter, capture the third column and print it.
N.B. The \2 holds the last value captured by that back reference, which, in conjunction with the {3}, is the third column.
Alternative:
sed -n '4,${s/^\(\(\S\+\)\s*\)\{3\}.*/\2/p;n}' file

How to replace content matching a specific pattern using sed without losing info?

I have a text file with a bunch of data. I was able to extract exactly what I want using sed, but
I need to replace only the specific pattern I searched for without losing the other content of the file.
I'm using the following sed command, but I don't know how to do the replacement.
cat file.txt | sed -rn '/([a-z0-9]{2}\s){6}/p' > output.txt
The sed searches for the following pattern: ## ## ## ## ## ##, but I want to replace that pattern like this: ######-######.
cat file.txt | sed -rn '/([a-z0-9]{2}\s){6}/p' > output.txt
Output:
1 | ec eb b8 7b e3 c0 47
9 | 90 20 c2 f6 3d c0 1/1/1
25 | 00 fd 45 3d a7 80 31
Desired Output:
1 | ecebb8-7be3c0 47
9 | 9020c2-f63dc0 1/1/1
25 | 00fd45-3da780 31
Thanks
With your shown samples, please try the following awk program.
awk '
BEGIN{ FS=OFS="|" }
{
$2=substr($2,1,3) substr($2,5,2) substr($2,8,2)"-" substr($2,11,2) substr($2,14,2) substr($2,17,2) substr($2,19)
}
1
' Input_file
Explanation: Set FS and OFS to | for the awk program. Then, in the 2nd field, use awk's substr function to keep only the needed characters as per the OP's samples; substr extracts part of a string by start position and length. Save the result back into the 2nd field and print the current line.
With awk:
awk '{$3=$3$4$5"-"$6$7$8; print $1"\t",$2,$3,$NF}' file
1 | ecebb8-7be3c0 47
9 | 9020c2-f63dc0 1/1/1
25 | 00fd45-3da780 31
This might work for you (GNU sed):
sed -E 's/ (\S\S) (\S\S) (\S\S)/ \1\2\3/;s//-\1\2\3/' file
Match three spaced pairs twice: the first substitution joins the first three pairs, and the second substitution (the empty pattern // reuses the same regex) joins the next three pairs, replacing the space in front of them with a -.
If you want to extract specific substrings, you'll need to write a more specific regex to pull out exactly those.
sed -rn 's/([a-z0-9]{2})\s([a-z0-9]{2})\s([a-z0-9]{2})\s([a-z0-9]{2})\s([a-z0-9]{2})\s([a-z0-9]{2})\s/\1\2\3-\4\5\6 /p' file.txt > output.txt
Notice also how easy it is to avoid the useless cat.
\s is generally not portable; the POSIX equivalent is [[:space:]].
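For portability, here is the same substitution written with [[:space:]] instead of \s; this assumes your sed supports -E for extended regexps (GNU and BSD sed both do):
sed -En 's/([a-z0-9]{2})[[:space:]]([a-z0-9]{2})[[:space:]]([a-z0-9]{2})[[:space:]]([a-z0-9]{2})[[:space:]]([a-z0-9]{2})[[:space:]]([a-z0-9]{2})[[:space:]]/\1\2\3-\4\5\6 /p' file.txt > output.txt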

Print every second consecutive field in two columns - awk

Assume the following file
#zvview.exe
#begin Present/3
77191.0000 189.320100 0 0 3 0111110 16 1
-8.072430+6-8.072430+6 77190 0 1 37111110 16 2
37 2 111110 16 3
8.115068+6 0.000000+0 8.500000+6 6.390560-2 9.000000+6 6.803440-1111110 16 4
9.500000+6 1.685009+0 1.000000+7 2.582780+0 1.050000+7 3.260540+0111110 16 5
37 2 111110 16 18
What I would like to do is print, in two columns, the fields after line 6. This can be done using NR. The tricky part is the following: every second field should go into the second column, and an E should be inserted before the exponent sign, so that the output file looks like this:
8.115068E+6 0.000000E+0
8.500000E+6 6.390560E-2
9.000000E+6 6.803440E-1
9.500000E+6 1.685009E+0
1.000000E+7 2.582780E+0
1.050000E+7 3.260540E+0
From the output file you can see that I want to keep only the first 10 characters of $6 (length($6)=10).
How is it possible to do it in awk?
You can do it all in awk, but it is perhaps easier with the Unix toolset:
$ sed -n '6,7p' file | cut -c2-66 | tr ' ' '\n' | pr -2ats' '
8.115068+6 0.000000+0
8.500000+6 6.390560-2
9.000000+6 6.803440-1
9.500000+6 1.685009+0
1.000000+7 2.582780+0
1.050000+7 3.260540+0
Here is an awk-only solution for comparison:
$ awk 'NR>=6 && NR<=7{$6=substr($6,1,10);
for(i=1;i<=6;i+=2) {f[++c]=$i;s[c]=$(i+1)}}
END{for(i=1;i<=c;i++) print f[i],s[i]}' file
8.115068+6 0.000000+0
8.500000+6 6.390560-2
9.000000+6 6.803440-1
9.500000+6 1.685009+0
1.000000+7 2.582780+0
1.050000+7 3.260540+0
Perhaps a shorter version:
$ awk 'NR>=6 && NR<=7{$6=substr($6,1,10);
for(i=1;i<=6;i+=2) print $i FS $(i+1)}' file
8.115068+6 0.000000+0
8.500000+6 6.390560-2
9.000000+6 6.803440-1
9.500000+6 1.685009+0
1.000000+7 2.582780+0
1.050000+7 3.260540+0
To convert the format to standard scientific notation, you can pipe the result to
sed, or embed something similar in the awk script using gsub (a sketch of the embedded version follows the output below).
... | sed 's/[+-]/E&/g'
8.115068E+6 0.000000E+0
8.500000E+6 6.390560E-2
9.000000E+6 6.803440E-1
9.500000E+6 1.685009E+0
1.000000E+7 2.582780E+0
1.050000E+7 3.260540E+0
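Here is the embedded-gsub version mentioned above, a sketch based on the shorter awk command (it assumes, as before, that only lines 6-7 carry the data pairs):
awk 'NR>=6 && NR<=7 { $6 = substr($6,1,10)
     for (i = 1; i <= 6; i += 2) {
         pair = $i FS $(i+1)
         gsub(/[+-]/, "E&", pair)   # insert E before each exponent sign
         print pair
     } }' file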
With GNU awk for FIELDWIDTHS:
$ cat tst.awk
BEGIN { FIELDWIDTHS="9 2 9 2 9 2 9 2 9 2 9 2" }
NR>5 && NR<8 {
for (i=1;i<NF;i+=4) {
print $i "E" $(i+1), $(i+2) "E" $(i+3)
}
}
$ awk -f tst.awk file
8.115068E+6 0.000000E+0
8.500000E+6 6.390560E-2
9.000000E+6 6.803440E-1
9.500000E+6 1.685009E+0
1.000000E+7 2.582780E+0
1.050000E+7 3.260540E+0
If you really want to get rid of the leading blanks, there are various ways to do it (the simplest being gsub(/ /,"",$<field number>) on the relevant fields; see the sketch after this answer), but I left them in because the above lets your output line up properly if/when your numbers start with a -, as they do on line 4 of your sample input.
If you don't have GNU awk, get it as you're missing a LOT of extremely useful functionality.
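A minimal variant of tst.awk that strips those leading blanks with gsub, still requiring GNU awk for FIELDWIDTHS (a sketch, not the original answer's script):
BEGIN { FIELDWIDTHS = "9 2 9 2 9 2 9 2 9 2 9 2" }
NR>5 && NR<8 {
    for (i = 1; i < NF; i += 4) {
        v1 = $i "E" $(i+1); v2 = $(i+2) "E" $(i+3)
        gsub(/ /, "", v1); gsub(/ /, "", v2)   # drop the blanks kept by fixed-width splitting
        print v1, v2
    }
}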
I tried to combine @karafka's answer using substr, and the following does the trick:
awk 'NR>=6 && NR<=7{$6=substr($6,1,10);for(i=1;i<=6;i+=2) print substr($i,1,8) "E" substr($i,9) FS substr($(i+1),1,8) "E" substr($(i+1),9)}' file
and the output is
8.115068E+6 0.000000E+0
8.500000E+6 6.390560E-2
9.000000E+6 6.803440E-1
9.500000E+6 1.685009E+0
1.000000E+7 2.582780E+0
1.050000E+7 3.260540E+0

How to match "field 5 through the end of the line" (for example, in awk)

I want to pretty-print the output of a find-like script that would take input like this:
- 2015-10-02 19:45 102 /My Directory/some file.txt
and produce something like this:
- 102 /My Directory/some file.txt
In other words: "f" (for "file"), file size (right-justified), then pathname (with an arbitrary number of spaces).
This would be easy in awk if I could write a script that takes $1, $4, and "everything from $5 through the end of the line".
I tried using the awk construct substr($0, index($0, $8)), which I thought meant "everything starting with field $8 to the end of $0".
Using index() in this way is offered as a solution on linuxquestions.org and was upvoted 29 times in a stackoverflow.com thread.
On closer inspection, however, I found that index() does not achieve this effect if the starting field happens to match an earlier point in the string. For example, given:
-rw-r--r-- 1 tbaker staff 3024 2015-10-01 14:39 calendar
-rw-r--r-- 1 tbaker staff 4062 2015-10-01 14:39 b
-rw-r--r-- 1 tbaker staff 2374 2015-10-01 14:39 now or later
Gawk (and awk) get the following results:
$ gawk '{ print index($0, $8) }' test.txt
49
15
49
In other words, the value of $8 ('b') matches at index 15 instead of 49 (i.e., like most of the other filenames).
My issue, then, is how to specify "everything from field X to the end of the string".
I have re-written this question in order to make this clear.
Looks to me like you should just be using the "stat" command rather than "ls", for the reasons already commented upon:
stat -c "f%15s %n" *
But you should double-check how your "stat" operates; it can be platform-specific (GNU and BSD stat take different options).
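For example (a sketch; the GNU form is the one shown above, and the BSD/macOS format string is an assumption based on its %z and %N specifiers, so check your local man page):
stat -c "f%15s %n" *       # GNU coreutils: %s = size, %n = name
stat -f "f%15z %N" *       # BSD/macOS: -f takes the format, %z = size, %N = name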
The built-in awk function index() is sometimes recommended as a way
to print "from field 5 through the end of the string" [1, 2, 3].
In awk, index($0, $8) does not mean "the index of the first character of
field 8 in string $0". Rather, it means "the index of the first occurrence in
string $0 of the string value of field 8". In many cases, that first
occurrence will indeed be the first character in field 8 but this is not the
case in the example above.
It has been pointed out that parsing the output of ls is generally a bad
idea [4], in part because implementations of ls significantly differ in output.
Since the author of that note recommends find as a replacement for ls for some uses,
here is a script using find:
find "$@" -ls |
sed -e 's/^ *//' -e 's/ */ /g' -e 's/ /|/2' -e 's/ /|/2' -e 's/ /|/4' -e 's/ /|/4' -e 's/ /|/6' |
gawk -F'|' '{ $2 = substr($2, 1, 1) ; gsub(/^-/, "f", $2) }
{ printf("%s %15s %s\n", $2, $4, $6) }'
...which yields the required output:
f 4639 /Users/foobar/uu/a
f 3024 /Users/foobar/uu/calendar
f 2374 /Users/foobar/uu/xpect
This approach recursively walks through a file tree. However, there may of course be implementation differences between versions of find as well.
[1] http://www.linuxquestions.org/questions/linux-newbie-8/awk-print-field-to-end-and-character-count-179078/
[2] How to print third column to last column?
[3] Print Field 'N' to End of Line
[4] http://mywiki.wooledge.org/ParsingLs
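For comparison, a plain-awk sketch of "everything from field 5 to the end" that avoids index() by stripping the first four fields off a copy of the line (it assumes blank-separated fields; the sample line is the one from the question):
echo '- 2015-10-02 19:45 102 /My Directory/some file.txt' |
awk '{
    rest = $0
    for (i = 1; i < 5; i++) {                     # remove fields 1..4 one at a time
        sub(/^[[:space:]]*[^[:space:]]+[[:space:]]+/, "", rest)
    }
    printf "%s %15s %s\n", $1, $4, rest           # type, size right-justified, full pathname
}'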
Maybe some variation of find -printf | awk is what you're looking for?
$ ls -l tmp
total 2
-rw-r--r-- 1 Ed None 7 Oct 2 14:35 bar
-rw-r--r-- 1 Ed None 2 Oct 2 14:35 foo
-rw-r--r-- 1 Ed None 0 May 3 09:55 foo bar
$ find tmp -type f -printf "f %s %p\n" | awk '{sub(/^[^ ]+ +[^ ]+/,sprintf("%s %10d",$1,$2))}1'
f 7 tmp/bar
f 2 tmp/foo
f 0 tmp/foo bar
or
$ find tmp -type f -printf "%s %p\n" | awk '{sub(/^[^ ]+/,sprintf("f %10d",$1))}1'
f 7 tmp/bar
f 2 tmp/foo
f 0 tmp/foo bar
It won't work with file names that contain newlines.

Confusion about awk command when dealing with if statement

$ cat awk.txt
12 32 45
5 2 3
33 11 33
$ cat awk.txt | awk '{FS='\t'} $1==5 {print $0}'
5 2 3
$ cat awk.txt | awk '{FS='\t'} $1==33 {print $0}'
Nothing is returned when testing whether the first field is 33. It's confusing.
By saying
awk '{FS='\t'} $1==5 {print}' file
You are defining the field separator incorrectly. To make it a tab, you need to say "\t" (with double quotes). Further reading: awk not capturing first line / separator.
Also, you are setting it on every line, so it does not affect the first one. You want to use:
awk 'BEGIN{FS="\t"} $1==5' file
Yes, but why did it work in one case but not in the other?
awk '{FS='\t'} $1==5' file # it works
awk '{FS='\t'} $1==33' file # it does not work
You're using single quotes around '\t', which means that you're actually concatenating 3 strings together: '{FS=', \t and '} $1==5' to produce your awk command. The shell interprets the \t as t, so your awk script is actually:
awk '{FS=t} $1==5'
The variable t is unset, so you're setting the field separator to the empty string "". This means the line is split into as many fields as it has characters. You can see it by running awk 'BEGIN{FS='\t'} {print NF}' file, which will show how many fields each record has.
Then, for the line 33 11 33, $1 is just the first 3 and $2 contains the second 3, so $1==33 never matches; $1==5 still matched because 5 is a single character, so the first field of that line happens to equal 5.
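To make that concrete, you can set FS to the empty string explicitly (per-character splitting with an empty FS is gawk behavior; POSIX leaves it unspecified):
$ awk 'BEGIN { FS = "" } { print NF }' awk.txt
8
5
8
Each record is split into one field per character, spaces included, which is why $1 is a single digit.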
First of all, could you explain more clearly what you really want to do before you ask? Look:
more awk.txt
12 32 45
5 2 3
33 11 33
awk -F"[ \t]" '$1 == 5 { print $0}' awk.txt
5 2 3
awk -F"[ \t]" '$1 == 33 { print $0}' awk.txt
33 11 33
awk -F"[ \t]" '$1 == 12 { print $0}' awk.txt
12 32 45
http://www.staff.science.uu.nl/~oostr102/docs/nawk/nawk_23.html