Print all but the first three columns - awk

Too cumbersome:
awk '{print " "$4" "$5" "$6" "$7" "$8" "$9" "$10" "$11" "$12" "$13}' things

awk '{for(i=1;i<4;i++) $i="";print}' file

use cut
$ cut -f4-13 file
or if you insist on awk and $13 is the last field
$ awk '{$1=$2=$3="";print}' file
else
$ awk '{for(i=4;i<=13;i++)printf "%s ",$i;printf "\n"}' file

A solution that does not add extra leading or trailing whitespace:
awk '{ for(i=4; i<NF; i++) printf "%s",$i OFS; if(NF) printf "%s",$NF; printf ORS}'
### Example ###
$ echo '1 2 3 4 5 6 7' |
awk '{for(i=4;i<NF;i++)printf"%s",$i OFS;if(NF)printf"%s",$NF;printf ORS}' |
tr ' ' '-'
4-5-6-7
Sudo_O proposes an elegant improvement using the ternary operator i==NF?ORS:OFS:
$ echo '1 2 3 4 5 6 7' |
awk '{ for(i=4; i<=NF; i++) printf "%s",$i (i==NF?ORS:OFS) }' |
tr ' ' '-'
4-5-6-7
EdMorton gives a solution that preserves the original whitespace between fields:
$ echo '1 2 3 4 5 6 7' |
awk '{ sub(/([^ ]+ +){3}/,"") }1' |
tr ' ' '-'
4---5----6-7
BinaryZebra also provides two awesome solutions:
(these solutions even preserve the trailing spaces of the original string)
$ echo -e ' 1 2\t \t3 4 5 6 7 \t 8\t ' |
awk -v n=3 '{ for ( i=1; i<=n; i++) { sub("^["FS"]*[^"FS"]+["FS"]+","",$0);} } 1 ' |
sed 's/ /./g;s/\t/->/g;s/^/"/;s/$/"/'
"4...5...6.7.->.8->."
$ echo -e ' 1 2\t \t3 4 5 6 7 \t 8\t ' |
awk -v n=3 '{ print gensub("["FS"]*([^"FS"]+["FS"]+){"n"}","",1); }' |
sed 's/ /./g;s/\t/->/g;s/^/"/;s/$/"/'
"4...5...6.7.->.8->."
The solution given by larsr in the comments is almost correct:
$ echo '1 2 3 4 5 6 7' |
awk '{for (i=3;i<=NF;i++) $(i-2)=$i; NF=NF-2; print $0}' | tr ' ' '-'
3-4-5-6-7
This is the fixed and parametrized version of larsr's solution:
$ echo '1 2 3 4 5 6 7' |
awk '{for(i=n;i<=NF;i++)$(i-(n-1))=$i;NF=NF-(n-1);print $0}' n=4 | tr ' ' '-'
4-5-6-7
All other answers before Sep-2013 are nice but add extra spaces:
Example of an answer adding extra leading spaces:
$ echo '1 2 3 4 5 6 7' |
awk '{$1=$2=$3=""}1' |
tr ' ' '-'
---4-5-6-7
Example of an answer adding extra trailing spaces:
$ echo '1 2 3 4 5 6 7' |
awk '{for(i=4;i<=13;i++)printf "%s ",$i;printf "\n"}' |
tr ' ' '-'
4-5-6-7-------

Try this:
awk '{ $1=""; $2=""; $3=""; print $0 }'

The correct way to do this is with an RE interval, because it lets you simply state how many fields to skip and it retains the inter-field spacing of the remaining fields.
E.g. to skip the first 3 fields without affecting the spacing between the remaining fields, given the input format we seem to be discussing in this question, is simply:
$ echo '1 2 3 4 5 6' |
awk '{sub(/([^ ]+ +){3}/,"")}1'
4 5 6
If you want to accommodate leading whitespace and non-space whitespace (e.g. tabs), but again with the default FS, then it's:
$ echo ' 1 2 3 4 5 6' |
awk '{sub(/[[:space:]]*([^[:space:]]+[[:space:]]+){3}/,"")}1'
4 5 6
If you have an FS that's an RE you can't negate in a character set, you can convert it to a single char first (RS is ideal if it's a single char since an RS CANNOT appear within a field, otherwise consider SUBSEP), then apply the RE interval substitution, then convert to the OFS. e.g. if chains of "."s separated the fields:
$ echo '1...2.3.4...5....6' |
awk -F'[.]+' '{gsub(FS,RS);sub("([^"RS"]+["RS"]+){3}","");gsub(RS,OFS)}1'
4 5 6
Obviously if OFS is a single char AND it can't appear in the input fields you can reduce that to:
$ echo '1...2.3.4...5....6' |
awk -F'[.]+' '{gsub(FS,OFS); sub("([^"OFS"]+["OFS"]+){3}","")}1'
4 5 6
Then you have the same issue as with all the loop-based solutions that reassign the fields - the FSs are converted to OFSs. If that's an issue, you need to look into GNU awk's patsplit() function.
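For reference, here is a minimal sketch of that patsplit() approach (GNU awk 4.0+ only; the field pattern /[^.]+/ is just an assumption matching the dotted example above). seps[i] holds the separator that followed field i, so rebuilding from field 4 onward keeps the original separators:
$ echo '1...2.3.4...5....6' |
gawk '{
  n = patsplit($0, f, /[^.]+/, s)       # f[i] = fields, s[i] = separator after f[i]
  out = ""
  for (i = 4; i <= n; i++)
    out = out f[i] (i < n ? s[i] : "")  # keep original separators, drop the trailing one
  print out
}'
4...5....6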

Pretty much all the answers currently add either leading spaces, trailing spaces, or some other separator issue. To select from the fourth field onward, where the separator is whitespace and the output separator is a single space, using awk:
awk '{for(i=4;i<=NF;i++)printf "%s",$i (i==NF?ORS:OFS)}' file
To parametrize the starting field you could do:
awk '{for(i=n;i<=NF;i++)printf "%s",$i (i==NF?ORS:OFS)}' n=4 file
And also the ending field:
awk '{e=(m>NF?NF:m); for(i=n;i<=e;i++)printf "%s",$i (i==e?ORS:OFS)}' n=4 m=10 file
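A quick sanity check of the parametrized form (here passing n and m with -v so the pipeline reads stdin; when m exceeds NF it simply stops at the last field):
$ echo '1 2 3 4 5 6 7 8 9 10 11 12' |
awk -v n=4 -v m=10 '{e=(m>NF?NF:m); for(i=n;i<=e;i++)printf "%s",$i (i==e?ORS:OFS)}'
4 5 6 7 8 9 10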

awk '{$1=$2=$3="";$0=$0;$1=$1}1'
Input
1 2 3 4 5 6 7
Output
4 5 6 7

echo 1 2 3 4 5| awk '{ for (i=3; i<=NF; i++) print $i }'

Another way to avoid using the print statement:
$ awk '{$1=$2=$3=""}sub("^"FS"*","")' file
In awk, when a condition is true, print is the default action.
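For example, on the sample line used elsewhere in this thread, the sub() both strips the leading field separators and supplies the true condition:
$ echo '1 2 3 4 5 6 7' | awk '{$1=$2=$3=""}sub("^"FS"*","")'
4 5 6 7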

I can't believe nobody offered plain shell:
while read -r a b c d; do echo "$d"; done < file
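For example (read assigns whatever is left of the line to the last variable, so $d holds fields 4 onward with its internal spacing intact):
$ echo '1 2 3 4 5 6 7' | while read -r a b c d; do echo "$d"; done
4 5 6 7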

Options 1 to 3 have issues with multiple whitespace (but are simple).
That is the reason to develop options 4 and 5, which handle runs of whitespace with no problem.
Of course, if options 4 or 5 are used with n=0, both will preserve any leading whitespace, as n=0 means no splitting.
Option 1
A simple cut solution (works with single delimiters):
$ echo '1 2 3 4 5 6 7 8' | cut -d' ' -f4-
4 5 6 7 8
Option 2
Forcing an awk re-calculation sometimes solves the problem of added leading spaces (works with some versions of awk):
$ echo '1 2 3 4 5 6 7 8' | awk '{ $1=$2=$3="";$0=$0;} NF=NF'
4 5 6 7 8
Option 3
Printing each field formatted with printf gives more control:
$ echo ' 1 2 3 4 5 6 7 8 ' |
awk -v n=3 '{ for (i=n+1; i<=NF; i++){printf("%s%s",$i,i==NF?RS:OFS);} }'
4 5 6 7 8
However, all previous answers change all FS between fields to OFS. Let's build a couple of solutions to that.
Option 4
A loop with sub to remove fields and delimiters is more portable, and doesn't trigger a change of FS to OFS:
$ echo ' 1 2 3 4 5 6 7 8 ' |
awk -v n=3 '{ for(i=1;i<=n;i++) { sub("^["FS"]*[^"FS"]+["FS"]+","",$0);} } 1 '
4 5 6 7 8
NOTE: The "^["FS"]*" is to accept an input with leading spaces.
Option 5
It is quite possible to build a solution that does not add extra leading or trailing whitespace, and that preserves existing whitespace, using the function gensub from GNU awk, like this:
$ echo ' 1 2 3 4 5 6 7 8 ' |
awk -v n=3 '{ print gensub("["FS"]*([^"FS"]+["FS"]+){"n"}","",1); }'
4 5 6 7 8
It may also be used to split the line around a count n of leading fields (e.g. to swap the two parts):
$ echo ' 1 2 3 4 5 6 7 8 ' |
awk -v n=3 '{ a=gensub("["FS"]*([^"FS"]+["FS"]+){"n"}","",1);
b=gensub("^(.*)("a")","\\1",1);
print "|"a"|","!"b"!";
}'
|4 5 6 7 8 | ! 1 2 3 !
Of course, in such a case, the OFS is used to separate both parts of the line, and the trailing whitespace of the fields is still printed.
Note1: ["FS"]* is used to allow leading spaces in the input line.

cut (GNU coreutils) has a --complement flag that makes it easy (and fast) to delete columns. The resulting syntax is analogous to what you want to do, making the solution easier to read and understand. --complement also works for the case where you would like to delete non-contiguous columns.
$ foo='1 2 3 %s 5 6 7'
$ echo "$foo" | cut --complement -d' ' -f1-3
%s 5 6 7
$
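For instance, deleting the non-contiguous columns 1, 3 and 5 from the same input (note that --complement is specific to GNU cut):
$ echo "$foo" | cut --complement -d' ' -f1,3,5
2 %s 6 7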

Perl solution which does not add leading or trailing whitespace:
perl -lane 'splice @F,0,3; print join " ",@F' file
The Perl @F autosplit array starts at index 0, while awk fields start with $1.
Perl solution for comma-delimited data:
perl -F, -lane 'splice @F,0,3; print join ",",@F' file
Python solution:
python -c "import sys;[sys.stdout.write(' '.join(line.split()[3:]) + '\n') for line in sys.stdin]" < file

For me, the most compact and compliant solution to the request is:
$ a='1 2\t \t3 4 5 6 7 \t 8\t ';
$ echo -e "$a" | awk -v n=3 '{while (i<n) {i++; sub($1 FS"*", "")}; print $0}'
And if you have more lines to process, as for instance in file foo.txt, don't forget to reset i to 0:
$ awk -v n=3 '{i=0; while (i<n) {i++; sub($1 FS"*", "")}; print $0}' foo.txt
Thanks to this forum.

As I was annoyed by the first highly upvoted but wrong answer, I found enough there to write a reply, and since the wrong answers here are marked as such, here is my bit. I do not like the proposed solutions, as I can see no reason to make the answer so complex.
I have a log where, after $5 with an IP address, there can be more text or no text. I need everything from the IP address to the end of the line, should there be anything after $5. In my case, this is actually within an awk program, not an awk one-liner, so awk must solve the problem. When I try to remove the first 4 fields using the old, nice-looking and most upvoted but completely wrong answer:
echo " 7 27.10.16. Thu 11:57:18 37.244.182.218 one two three" | awk '{$1=$2=$3=$4=""; printf "[%s]\n", $0}'
it spits out a wrong and useless response (I added the [] to demonstrate):
[ 37.244.182.218 one two three]
Instead, if columns are fixed width until the cut point and awk is needed, the correct and quite simple answer is:
echo " 7 27.10.16. Thu 11:57:18 37.244.182.218 one two three" | awk '{printf "[%s]\n", substr($0,28)}'
which produces the desired output:
[37.244.182.218 one two three]

I've found this other possibility; maybe it could also be useful...
awk 'BEGIN {OFS=ORS="\t" }; {for(i=1; i<14; i++) print $i " "; print $NF "\n" }' your_file
Note: this is for tabular data, from column $1 to $14.

Use cut:
cut -d <delimiter character> -f <first column>-<last column> <file name>
e.g.: if you have file1 containing: car.is.nice.equal.bmw
run: cut -d . -f1-3 file1 and it will print car.is.nice

This isn't very far from some of the previous answers, but does solve a couple of issues:
cols.sh:
#!/bin/bash
awk -v s=$1 '{for(i=s; i<=NF;i++) printf "%-5s", $i; print "" }'
Which you can now call with an argument that will be the starting column:
$ echo "1 2 3 4 5 6 7 8 9 10 11 12 13 14" | ./cols.sh 3
3 4 5 6 7 8 9 10 11 12 13 14
Or:
$ echo "1 2 3 4 5 6 7 8 9 10 11 12 13 14" | ./cols.sh 7
7 8 9 10 11 12 13 14
This is 1-indexed; if you prefer zero-indexed, use i=s+1 instead.
Moreover, if you would like to have two arguments for the starting index and the end index, change the file to:
#!/bin/bash
awk -v s=$1 -v e=$2 '{for(i=s; i<=e;i++) printf "%-5s", $i; print "" }'
For example:
$ echo "1 2 3 4 5 6 7 8 9 10 11 12 13 14" | ./cols.sh 7 9
7 8 9
The %-5s formats the result as 5-character-wide columns; if this isn't enough, increase the number, or use "%s " (with a trailing space) instead if you don't care about alignment.

An AWK printf-based solution that avoids the % problem, and is unique in that it prints nothing (not even a newline) if there are fewer than 4 columns to print:
awk 'NF > 3 { for(i=4; i<NF; i++) printf("%s ", $(i)); print $(i) }'
Testing:
$ x='1 2 3 %s 4 5 6'
$ echo "$x" | awk 'NF > 3 { for(i=4; i<NF; i++) printf("%s ", $(i)); print $(i) }'
%s 4 5 6
$ x='1 2 3'
$ echo "$x" | awk 'NF > 3 { for(i=4; i<NF; i++) printf("%s ", $(i)); print $(i) }'
$ x='1 2 3 '
$ echo "$x" | awk 'NF > 3 { for(i=4; i<NF; i++) printf("%s ", $(i)); print $(i) }'
$

Related

awk Can not Select Column with empty value

I am trying to select a column with missing values.
Here is my input file, separated by tabs:
1   2   3
4       5
        6
7   8
    9
I am trying to select the first column, so the output will look like:
1
4
7
and the length of my column would be 5 in this case
I have tried
awk '$1!=""{print $1}' ./demo.txt
but it returns
1
4
6
7
9
Can anybody help with this? I am new to AWK.
You can use cut:
$ cut -f 1 file # the default delimiter is a tab
Or with sed:
$ sed 's/[[:blank:]].*$//' file
Or awk:
$ awk '{sub(/[[:blank:]].*$/,"")}1' file
Or:
$ awk 'BEGIN{FS=OFS="\t"} {print $1}' file
All of those print the first column and all five lines (blank or not); two of the five output lines are blank:
1
4

7
Tell awk to use a tab (\t) as the input field delimiter (-F):
$ awk -F'\t' '{ print $1 }' demo.txt
1
4
7
If you want to print multiple columns, maintaining the same delimiter for output, another approach using the FS and OFS variables:
$ awk 'BEGIN { FS=OFS="\t" } { print $1,$3 }' demo.txt
1 3
4 5
7
9
With sed something like:
sed 's/^\([^[:blank:]]*\).*/\1/' demo.txt
Using FIELDWIDTHS in GNU awk, you can do this for fixed-width data:
awk 'BEGIN {FIELDWIDTHS = "4 4 *"} {print $1}' file
1
4
7
For demo purpose:
awk 'BEGIN {FIELDWIDTHS = "4 4 *"} {print NR ":", $1}' file
1: 1
2: 4
3:
4: 7
5:
If they're all single digits in the 1st column:
echo \
'1 2 3
4 5
6
7 8
9' |
mawk NF=1 FS= |
gcat -n
1 1
2 4
3
4 7
5
That's literally all you need. To play it safe, then do:
nawk NF=1 FS='[[:space:]]' # overly-verbose so-called
# "proper" posix form
gawk NF=1 FS='[ \t]' # suffices unless the input
# happens to have uncommon bytes
# like \013 \v or \014 \f
Or a very fringe way of fudging NF:
mawk 'NF ^= FS="[ \t]"'

Assigning output from awk to variables

I'm trying to create a bash script that ingests the output of another script, cpu_latency.bt.
The output of cpu_latency.bt is generated every second and looks similar to:
#usecs:
[0] 3 |########## |
[1] 5 |################# |
[2, 4) 5 |################# |
[4, 8) 0 | |
[8, 16) 5 |################# |
[16, 32) 15 |####################################################|
[32, 64) 1 |### |
[64, 128) 0 | |
[128, 256) 1 |### |
#usecs:
[0] 1 |### |
[1] 1 |### |
[2, 4) 6 |###################### |
[4, 8) 2 |####### |
[8, 16) 4 |############## |
I am trying to capture only the first number after the [ and then the number before the first | (so in the last line above, that would be 8 and 4).
The script below is fairly close (with the exception of handling [0] and [1] lines):
duration=10
while read line
do
echo $line | cut -d "|" -f1 | sed 's/\[//g; s/\,//g; s/)//g' | awk '{print $1,$3}' |
while read key value; do
print "The Key is "$key "and the Value is "$value
done
done < <(timeout $duration cpu_latency.bt | grep "\[")
However, the output it returns is not quite right:
Error: no such file "The Key is 16"
Error: no such file "and the Value is 10"
Error: no such file "The Key is 16"
Error: no such file "and the Value is 9"
Can anyone recommend a better way of assigning the output of $1 and $3 to variables so I can write them out to a file?
Thanks
CiCa
@RavinderSingh13 I'm not sure if I'm misunderstanding how I can use an array for this, but a little more work with a while loop has gotten me considerably closer to what I'm aiming for:
while read key value
do
echo `hostname`".cpu-lat."$key"\\"$value"\\`date +"%s"`" >> /tmp/stats.out
done < <(
timeout $duration /root/bpftrace/cpu_latency.bt | awk '
match($0,/^\[[0-9]+/){
val=substr($0,RSTART+1,RLENGTH-1)
match($0,/[0-9]+ +\|/)
val2=substr($0,RSTART,RLENGTH)
sub(/ +\|/,"",val2)
print val,val2
val=val2=""
}' )
Is this what you're trying to do?
$ awk -F'[][,[:space:]]+' 'sub(/ \|.*/,""){print $2, $NF}' file
0 3
1 5
2 5
4 0
8 5
16 15
32 1
64 0
128 1
0 1
1 1
2 6
4 2
8 4
Or, to build the output lines directly in awk:
$ awk -v host="$(hostname)" -v date="$(date +%s)" -F'[][,[:space:]]+' '
sub(/ \|.*/,"") { printf "%s.cpu-lat.%s\\%s\\%s\n", host, $2, $NF, date }
' file
mypc.cpu-lat.0\3\1576428453
mypc.cpu-lat.1\5\1576428453
mypc.cpu-lat.2\5\1576428453
mypc.cpu-lat.4\0\1576428453
mypc.cpu-lat.8\5\1576428453
mypc.cpu-lat.16\15\1576428453
mypc.cpu-lat.32\1\1576428453
mypc.cpu-lat.64\0\1576428453
mypc.cpu-lat.128\1\1576428453
mypc.cpu-lat.0\1\1576428453
mypc.cpu-lat.1\1\1576428453
mypc.cpu-lat.2\6\1576428453
mypc.cpu-lat.4\2\1576428453
mypc.cpu-lat.8\4\1576428453
sed -e 's/[^0-9]/ /g' lala |gawk '{ print $1, substr($0,20,5) }'
The sed removes all non-numbers.
The awk will print the first field and, from position 20, 5 characters of text (which can be converted to numeric by adding 0 to it):
sed -e 's/[^0-9]/ /g' lala |gawk '{ print $1, 0+substr($0,20,5) }'
Of course, Ed is right!:
gawk '{ gsub(/[^0-9]/," "); print $1, 0+substr($0,20,5) }'
EDIT: After seeing the OP's attempt based on my code, adding the following now.
awk -v host="$(hostname)" -v date="$(date +%s)" '
match($0,/^\[[0-9]+/){
val=substr($0,RSTART+1,RLENGTH-1)
match($0,/[0-9]+ +\|/)
val2=substr($0,RSTART,RLENGTH)
sub(/ +\|/,"",val2)
printf("%s.cpu-lat.%s\\%s\\%s\n", host, val, val2, date)
val=val2=""
}' Input_file
Could you please try the following. I have used Input_file to pass input to awk here; in case you want to pipe a command's output into awk instead, then run it like: your_script | the awk code below, without Input_file.
awk '
match($0,/^\[[0-9]+/){
val=substr($0,RSTART+1,RLENGTH-1)
match($0,/[0-9]+ +\|/)
val2=substr($0,RSTART,RLENGTH)
sub(/ +\|/,"",val2)
print val,val2
val=val2=""
}' Input_file

How to replace multiple empty fields into zeroes using awk

I am using the following command to replace tab-delimited empty fields with zeroes.
awk 'BEGIN { FS = OFS = "\t" } { for(i=1; i<=NF; i++) if($i ~ /^ *$/) $i = 0 }; 1'
How can I do the same if I have the following input, which is not tab-delimited and has multiple empty fields?
input
name  A1348138  A1086070  A1080879  A1070208  A821846  A1068905  A1101931
g1    5         8         1         2         1        3         1
g2    1         3         2         1         1        2
desired output
name  A1348138  A1086070  A1080879  A1070208  A821846  A1068905  A1101931
g1    5         8         1         2         1        3         1
g2    1         3         2         1         1        2        0
I'd suggest using GNU awk for FIELDWIDTHS to solve the problem you appear to be asking about and also to convert your fixed-width input to tab-separated output (or something else sensible) while you're at it:
$ cat file
1   2   3
4       6
$ gawk -v FIELDWIDTHS='4 4 4' -v OFS='\t' '{for (i=1;i<=NF;i++) {gsub(/^[[:space:]]+|[[:space:]]+$/,"",$i); $i=($i==""?0:$i)}; print}' file
1 2 3
4 0 6
$ gawk -v FIELDWIDTHS='4 4 4' -v OFS=',' '{for (i=1;i<=NF;i++) {gsub(/^[[:space:]]+|[[:space:]]+$/,"",$i); $i=($i==""?0:$i)}; print}' file
1,2,3
4,0,6
$ gawk -v FIELDWIDTHS='4 4 4' -v OFS=',' '{for (i=1;i<=NF;i++) {gsub(/^[[:space:]]+|[[:space:]]+$/,"",$i); $i="\""($i==""?0:$i)"\""}; print}' file
"1","2","3"
"4","0","6"
Take your pick of the above.

Multiply every nth field...elegantly

I have a text file with a series of numbers:
1 2 4 2 2 6 3 4 7 4 4 8 2 4 6 5 5 8
I need to have every third field multiplied by 3, so output would be:
1 2 12 2 2 18 3 4 21 4 4 24 2 4 18 5 5 24
Now, I've hammered out a solution already, but I know there's a quicker, more elegant one out there. Here's what I've gotten to work:
xargs -n1 < input.txt | awk '{printf NR%3 ? "%d " : $0*3" ", $1}' > output.txt
I feel that there must be an awk one-liner that can do this?? How can I make awk look at each field (instead of each record), thus not needing the call to xargs to put every field on a different line? Or maybe sed can do it?
Try:
awk '{for (i=3;i<=NF;i+=3)$i*=3; print}' input.txt > output.txt
I have not tested this yet (posted on my iPod). The print command without parameters should print out the whole (partially modified) line. You might have to set OFS=" " in the BEGIN section to get the blank as the separator in the output.
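For what it's worth, a quick check against the sample input from the question seems to confirm it (the default OFS is already a single space, so the BEGIN block is not needed here):
$ echo '1 2 4 2 2 6 3 4 7 4 4 8 2 4 6 5 5 8' |
awk '{for (i=3;i<=NF;i+=3)$i*=3; print}'
1 2 12 2 2 18 3 4 21 4 4 24 2 4 18 5 5 24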
This line would work too:
awk -v RS="\\n| " -v ORS=" " '!(NR%3){$0*=3}7' file
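A regexp RS like that is a GNU awk / mawk extension (POSIX leaves a multi-character RS unspecified), and because ORS is a space the result ends with a trailing blank instead of a newline. A quick check against the same sample input:
$ echo '1 2 4 2 2 6 3 4 7 4 4 8 2 4 6 5 5 8' |
awk -v RS="\\n| " -v ORS=" " '!(NR%3){$0*=3}7'
1 2 12 2 2 18 3 4 21 4 4 24 2 4 18 5 5 24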

Substituting multiple (but not all) occurences of a pattern using AWK's gensub() function

This example removes the fifth occurrence of a regular expression:
printf "%s " $(seq 9) | gawk '{ print gensub(/[0-9]/,"","5") }'
1 2 3 4 6 7 8 9
This example removes the sixth instance of a regular expression:
printf "%s " $(seq 9) | gawk '{ print gensub(/[0-9]/,"","6") }'
1 2 3 4 5 7 8 9
Is it possible to combine the above examples into one?
I tried this, but it does not work:
printf "%s " $(seq 9) | gawk '{ print gensub(/[0-9]/,"","5|6") }'
2 3 4 5 6 7 8 9
I want printed:
1 2 3 4 7 8 9
According to the documentation (quoted below), one of the ways I can think of:
printf "%s " $(seq 9) | gawk 'END{ print gensub(/[0-9]/,"","5",gensub(/[0-9]/,"","5")) }'
And another way (with your very specific input):
printf "%s " $(seq 9) | gawk 'END { print gensub(/[0-9] [0-9]/,"","3") }'
From the gawk documentation for gensub(regexp, replacement, how [, target]): Search the target string target for matches of the regular expression
regexp. If how is a string beginning with ‘g’ or ‘G’ (short for
“global”), then replace all matches of regexp with replacement.
Otherwise, how is treated as a number indicating which match of regexp
to replace. If no target is supplied, use $0. It returns the modified
string as the result of the function and the original target string is
not changed.