Printing everything except the first field with awk - awk

I have a file that looks like this:
AE United Arab Emirates
AG Antigua & Barbuda
AN Netherlands Antilles
AS American Samoa
BA Bosnia and Herzegovina
BF Burkina Faso
BN Brunei Darussalam
And I'd like to invert the order, printing first everything except $1 and then $1:
United Arab Emirates AE
How can I do the "everything except field 1" trick?

$1="" leaves a space as Ben Jackson mentioned, so use a for loop:
awk '{for (i=2; i<=NF; i++) print $i}' filename
So if your string was "one two three", the output will be:
two
three
If you want the result in one row, you could do as follows:
awk '{for (i=2; i<NF; i++) printf "%s ", $i; print $NF}' filename
This will give you: "two three"
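As a sketch of how the loop can answer the original question directly (fields 2..NF first, then field 1), something like the following could be used. The function name rotate_first is only an illustrative assumption, and the output is always joined with single spaces:

```shell
# rotate_first is a hypothetical helper name, not something awk provides.
# Reads lines on stdin; prints fields 2..NF followed by field 1.
rotate_first() {
  awk '{
    for (i = 2; i <= NF; i++) printf "%s ", $i   # fields 2..NF, each followed by a space
    print $1                                      # then the first field and a newline
  }'
}

printf 'AE United Arab Emirates\nBF Burkina Faso\n' | rotate_first
```

Passing each field through a "%s" format, rather than using it as the format string, keeps a literal % in the data from being interpreted by printf.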

Assigning $1 works but it will leave a leading space: awk '{first = $1; $1 = ""; print $0, first; }'
You can also find the number of columns in NF and use that in a loop.
From Thyag: To eliminate the leading space, add sed to the end of the command:
awk '{first = $1; $1=""; print $0}' | sed 's/^ //g'

Use the cut command with -f 2- (POSIX) or --complement (not POSIX):
$ echo a b c | cut -f 2- -d ' '
b c
$ echo a b c | cut -f 1 -d ' '
a
$ echo a b c | cut -f 1,2 -d ' '
a b
$ echo a b c | cut -f 1 -d ' ' --complement
b c

Maybe the most concise way:
$ awk '{$(NF+1)=$1;$1=""}sub(FS,"")' infile
United Arab Emirates AE
Antigua & Barbuda AG
Netherlands Antilles AN
American Samoa AS
Bosnia and Herzegovina BA
Burkina Faso BF
Brunei Darussalam BN
Explanation:
$(NF+1)=$1: Generator of a "new" last field.
$1="": Set the original first field to null
sub(FS,""): After the first two actions {$(NF+1)=$1;$1=""} get rid of the first field separator by using sub. The final print is implicit.

awk '{sub($1 FS,"")}7' YourFile
Remove the first field and separator, and print the result (7 is a non zero value so printing $0).
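One caveat: sub() treats $1 as a regular expression, so a first field containing metacharacters (for example a+b) may fail to match itself, and the line is then printed unchanged. A minimal sketch that avoids the regex, assuming fields are separated by a single space and lines have no leading whitespace (strip_first_literal is a made-up name):

```shell
# strip_first_literal is a hypothetical helper name.
# Drops everything up to and including the first space, without using $1 as a regex.
strip_first_literal() {
  awk '{ p = index($0, " "); print (p ? substr($0, p + 1) : "") }'
}

printf 'a+b second third\n' | strip_first_literal
```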

awk '{ saved = $1; $1 = ""; print substr($0, 2), saved }'
Setting the first field to "" leaves a single copy of OFS at the start of $0. Assuming that OFS is only a single character (by default, it's a single space), we can remove it with substr($0, 2). Then we append the saved copy of $1.
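If OFS might be longer than one character, the same idea can be sketched by skipping length(OFS) characters instead of exactly one. The function name rotate_with_ofs and the ' :: ' separator are only illustrative assumptions:

```shell
# rotate_with_ofs is a hypothetical helper; its first argument is the OFS to use.
rotate_with_ofs() {
  awk -v OFS="$1" '{
    saved = $1
    $1 = ""                                     # $0 is rebuilt and now starts with one OFS
    print substr($0, length(OFS) + 1), saved    # skip that leading OFS, then append the saved field
  }'
}

printf 'AE United Arab Emirates\n' | rotate_with_ofs ' :: '
```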

If you're open to a Perl solution...
perl -lane 'print join " ",@F[1..$#F,0]' file
is a simple solution with an input/output separator of one space, which produces:
United Arab Emirates AE
Antigua & Barbuda AG
Netherlands Antilles AN
American Samoa AS
Bosnia and Herzegovina BA
Burkina Faso BF
Brunei Darussalam BN
This next one is slightly more complex
perl -F'  ' -lane 'print join "  ",@F[1..$#F,0]' file
and assumes that the input/output separator is two spaces:
United Arab Emirates AE
Antigua & Barbuda AG
Netherlands Antilles AN
American Samoa AS
Bosnia and Herzegovina BA
Burkina Faso BF
Brunei Darussalam BN
These command-line options are used:
-n loop around every line of the input file, do not automatically print every line
-l removes newlines before processing, and adds them back in afterwards
-a autosplit mode – split input lines into the @F array. Defaults to splitting on whitespace
-F autosplit modifier, in this example splits on '  ' (two spaces)
-e execute the following perl code
@F is the array of words in each line, indexed starting with 0
$#F is the index of the last element of @F (one less than the number of words)
@F[1..$#F] is an array slice of element 1 through the last element
@F[1..$#F,0] is an array slice of element 1 through the last element plus element 0

Let's shift every field one position to the left and put the original first field at the end:
$ awk '{a=$1; for (i=2; i<=NF; i++) $(i-1)=$i; $NF=a}1' file
United Arab Emirates AE
Antigua & Barbuda AG
Netherlands Antilles AN
American Samoa AS
Bosnia and Herzegovina BA
Burkina Faso BF
Brunei Darussalam BN
Explanation
a=$1 save the first value into a temporary variable.
for (i=2; i<=NF; i++) $(i-1)=$i save the Nth field value into the (N-1)th field.
$NF=a save the first value ($1) into the last field.
{}1 true condition to make awk perform the default action: {print $0}.
This way, if you happen to have another field separator, the result is also good:
$ cat c
AE-United-Arab-Emirates
AG-Antigua-&-Barbuda
AN-Netherlands-Antilles
AS-American-Samoa
BA-Bosnia-and-Herzegovina
BF-Burkina-Faso
BN-Brunei-Darussalam
$ awk 'BEGIN{OFS=FS="-"}{a=$1; for (i=2; i<=NF; i++) $(i-1)=$i; $NF=a}1' c
United-Arab-Emirates-AE
Antigua-&-Barbuda-AG
Netherlands-Antilles-AN
American-Samoa-AS
Bosnia-and-Herzegovina-BA
Burkina-Faso-BF
Brunei-Darussalam-BN

The field separator in gawk (at least) can be a string as well as a character (it can also be a regex). If your data is consistent, then this will work:
awk -F "  " '{print $2,$1}' inputfile
That's two spaces between the double quotes.
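If the gap after the code is not always exactly two spaces, a regex separator is one possible variation. This is only a sketch, assuming the code is followed by two or more spaces while the words of the name are separated by single spaces; swap_wide_gap is a made-up name:

```shell
# swap_wide_gap is a hypothetical helper name.
# FS is a regex: two or more spaces separate the fields, single spaces stay inside a field.
swap_wide_gap() {
  awk -F '  +' '{ print $2, $1 }'
}

printf 'AE  United Arab Emirates\nBA   Bosnia and Herzegovina\n' | swap_wide_gap
```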

awk '{ tmp = $1; sub(/^[^ ]+ +/, ""); print $0, tmp }'

Option 1
There is a solution that works with some versions of awk:
awk '{ $(NF+1)=$1;$1="";$0=$0;} NF=NF ' infile.txt
Explanation:
$(NF+1)=$1 # add a new field equal to field 1.
$1="" # erase the contents of field 1.
$0=$0;} NF=NF # force a re-calc of fields.
# and use NF to promote a print.
Result:
United Arab Emirates AE
Antigua & Barbuda AG
Netherlands Antilles AN
American Samoa AS
Bosnia and Herzegovina BA
Burkina Faso BF
Brunei Darussalam BN
However that might fail with older versions of awk.
Option 2
awk '{ $(NF+1)=$1;$1="";sub(OFS,"");}1' infile.txt
That is:
awk '{ # call awk.
$(NF+1)=$1; # Add one trailing field.
$1=""; # Erase first field.
sub(OFS,""); # remove leading OFS.
}1' # print the line.
Note that what needs to be erased is the OFS, not the FS. The line gets re-calculated when the field $1 is assigned. That changes all runs of FS to one OFS.
But even that option still fails with several delimiters, as is clearly shown by changing the OFS:
awk -v OFS=';' '{ $(NF+1)=$1;$1="";sub(OFS,"");}1' infile.txt
That line will output:
United;Arab;Emirates;AE
Antigua;&;Barbuda;AG
Netherlands;Antilles;AN
American;Samoa;AS
Bosnia;and;Herzegovina;BA
Burkina;Faso;BF
Brunei;Darussalam;BN
That reveals that runs of FS are being changed to one OFS.
The only way to avoid that is to avoid the field re-calculation.
One function that can avoid re-calc is sub.
The first field could be captured, then removed from $0 with sub, and then both re-printed.
Option 3
awk '{ a=$1;sub("[^"FS"]+["FS"]+",""); print $0, a;}' infile.txt
a=$1 # capture first field.
sub( " # replace:
[^"FS"]+ # A run of non-FS
["FS"]+ # followed by a run of FS.
" , "" # for nothing.
) # Default to $0 (the whole line).
print $0, a # Print in reverse order, with OFS.
United Arab Emirates AE
Antigua & Barbuda AG
Netherlands Antilles AN
American Samoa AS
Bosnia and Herzegovina BA
Burkina Faso BF
Brunei Darussalam BN
Even if we change the FS, the OFS and/or add more delimiters, it works.
If the input file is changed to:
AE..United....Arab....Emirates
AG..Antigua....&...Barbuda
AN..Netherlands...Antilles
AS..American...Samoa
BA..Bosnia...and...Herzegovina
BF..Burkina...Faso
BN..Brunei...Darussalam
And the command changes to:
awk -vFS='.' -vOFS=';' '{a=$1;sub("[^"FS"]+["FS"]+",""); print $0,a;}' infile.txt
The output will be (still preserving delimiters):
United....Arab....Emirates;AE
Antigua....&...Barbuda;AG
Netherlands...Antilles;AN
American...Samoa;AS
Bosnia...and...Herzegovina;BA
Burkina...Faso;BF
Brunei...Darussalam;BN
The command could be expanded to several fields, but only with modern awks and with --re-interval option active. This command on the original file:
awk -vn=2 '{a=$1;b=$2;sub("([^"FS"]+["FS"]+){"n"}","");print $0,a,b;}' infile.txt
Will output this:
Arab Emirates AE United
& Barbuda AG Antigua
Antilles AN Netherlands
Samoa AS American
and Herzegovina BA Bosnia
Faso BF Burkina
Darussalam BN Brunei

There's a sed option too...
sed 's/\([^ ]*\) \(.*\)/\2 \1/' inputfile.txt
Explained...
Swap
\([^ ]*\) = Match anything until we reach a space, store in $1
\(.*\) = Match everything else, store in $2
With
\2 = Retrieve $2
\1 = Retrieve $1
More thoroughly explained...
s = Substitute
/ = Beginning of source pattern
\( = start storing this value
[^ ] = text not matching the space character
* = 0 or more of the previous pattern
\) = stop storing this value
\( = start storing this value
. = any character
* = 0 or more of the previous pattern
\) = stop storing this value
/ = End of source pattern, beginning of replacement
\2 = Retrieve the 2nd stored value
\1 = Retrieve the 1st stored value
/ = end of replacement
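For comparison, the same substitution can be sketched with extended regular expressions, which drops the backslashes before the parentheses. This assumes a sed that accepts -E (GNU and BSD sed both do); swap_first_word is a made-up name:

```shell
# swap_first_word is a hypothetical helper name.
# Group 1: everything before the first space. Group 2: the rest of the line.
swap_first_word() {
  sed -E 's/^([^ ]*) (.*)$/\2 \1/'
}

printf 'AE United Arab Emirates\n' | swap_first_word
```

A line that contains no space does not match the pattern and is printed unchanged.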

If you're open to another Perl solution:
perl -ple 's/^(\S+)\s+(.*)/$2 $1/' file

A first stab at it seems to work for your particular case.
awk '{ f = $1; sub(/^[A-Z][A-Z] +/, ""); print $0, f }'

Yet another way...
...this rejoins the fields 2 thru NF with the FS and outputs one line per line of input
awk '{for (i=2;i<=NF;i++){printf "%s", $i; if (i < NF) {printf "%s", FS};}printf "%s", RS}'
I use this with git to see what files have been modified in my working dir:
git diff| \
grep '\-\-git'| \
awk '{print$NF}'| \
awk -F"/" '{for (i=2;i<=NF;i++){printf "%s", $i; if (i < NF) {printf "%s", FS};}printf "%s", RS}'

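Another easy way, using the cat command. This hard-codes six fields, so longer lines are truncated and shorter lines get extra spaces before the final field:
cat filename | awk '{print $2,$3,$4,$5,$6,$1}' > newfilename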

Related

gawk - Delimit lines with custom character and no similar ending character

Let's say I have a file like so:
test.txt
one
two
three
I'd like to get the following output: one|two|three
And am currently using this command: gawk -v ORS='|' '{ print $0 }' test.txt
Which gives: one|two|three|
How can I print it so that the last | isn't there?
Here's one way to do it:
$ seq 1 | awk -v ORS= 'NR>1{print "|"} 1; END{print "\n"}'
1
$ seq 3 | awk -v ORS= 'NR>1{print "|"} 1; END{print "\n"}'
1|2|3
With paste:
$ seq 1 | paste -sd'|'
1
$ seq 3 | paste -sd'|'
1|2|3
Convert one column to one row with field separator:
awk '{$1=$1} 1' FS='\n' OFS='|' RS='' file
Or in another notation:
awk -v FS='\n' -v OFS='|' -v RS='' '{$1=$1} 1' file
Output:
one|two|three
See: 8 Powerful Awk Built-in Variables – FS, OFS, RS, ORS, NR, NF, FILENAME, FNR
awk solutions work great. Here is tr + sed solution:
tr '\n' '|' < file | sed 's/|$//'
1|2|3
just flatten it :
gawk/mawk 'BEGIN { FS = ORS; RS = "^[\n]*$"; OFS = "|"
} NF && ( $NF ? NF=NF : --NF )'
ascii | = octal \174 = hex 0x7C. The reason for --NF is that more often than not, the input includes a trailing newline, which makes the field count 1 too many and results in
1|2|3|
Both NF=NF and --NF are similar concepts to $1=$1. Empty inputs, regardless of whether trailing new lines exist or not, would result in nothing printed.
At the OFS spot, you can delimit it with any string combo you like instead of being constrained by tr, which has inconsistent behavior. For instance :
gtr '\012' '高' # UTF8 高 = \351\253\230 = xE9 xAB x98
on bsd-tr, \n will get replaced by the unicode properly 1高2高3高 , but if you're on gnu-tr, it would only keep the leading byte of the unicode, and result in
1 \351 2 \351 . . .
For unicode equiv-classes, bsd-tr works as expected while gtr '[=高=]' '\v' results in
gtr: ?\230: equivalence class operand must be a single character
and if you attempt equiv-classes with an arbitrary non-ASCII byte, bsd-tr does nothing while gnu-tr would gladly oblige, even if it means slicing straight through UTF8-compliant characters :
g3bn 77138 | (g)tr '[=\224=]' '\v'
bsd-tr : 77138=Koyote 코요태 KYT✜ 高耀太
gnu-tr : 77138=Koyote ?
?
태 KYT✜ 高耀太
I would do it following way, using GNU AWK, let test.txt content be
one
two
three
then
awk '{printf NR==1?"%s":"|%s", $0}' test.txt
output
one|two|three
Explanation: If it is the first line, print that line's content without a trailing newline; otherwise print | followed by the line's content, again without a trailing newline. Note that I assumed that test.txt has no trailing newline; if this is not the case, test this solution before applying it.
(tested in gawk 5.0.1)
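The printf form above leaves the output without a final newline. A possible variation, sketched here with a made-up function name join_lines and a separator passed as an argument, adds one in an END block whenever at least one line was read:

```shell
# join_lines is a hypothetical helper; its first argument is the separator.
join_lines() {
  awk -v sep="$1" '
    { printf "%s%s", (NR > 1 ? sep : ""), $0 }   # separator before every line except the first
    END { if (NR) print "" }                      # close the row with a single newline
  '
}

printf 'one\ntwo\nthree\n' | join_lines '|'
```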
Also you can try this with awk:
awk '{ORS = (NR%3 ? "|" : RS)} 1' file
one|two|three
% is the modulo operator and NR%3 ? "|" : RS is a ternary expression.
See Ed Morton's explanation here: https://stackoverflow.com/a/55998710/14259465
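Note that the modulo trick groups lines in threes rather than joining the whole file, so it only yields a single row when the file has exactly three lines. A small sketch with the group size as a parameter (join_every is a made-up name) shows the behavior on six lines:

```shell
# join_every is a hypothetical helper; its first argument is the group size n.
# Every nth line ends with RS (a newline); all other lines end with "|".
join_every() {
  awk -v n="$1" '{ ORS = (NR % n ? "|" : RS) } 1'
}

printf '%s\n' 1 2 3 4 5 6 | join_every 3
```

If the number of lines is not a multiple of n, the last row ends with | and has no trailing newline.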
With a GNU sed, you can pass -z option to match line breaks, and thus all you need is replace each newline but the last one at the end of string:
sed -z 's/\n\(.\)/|\1/g' test.txt
perl -0pe 's/\n(?!\z)/|/g' test.txt
perl -pe 's/\n/|/g if !eof' test.txt
Details:
s - substitution command
\n\(.\) - an LF char followed with any one char captured into Group 1 (so \n at the end of string won't get matched)
|\1 - a | char and the captured char
g - all occurrences.
The first perl command matches any LF char (\n) not at the end of string ((?!\z)) after slurping the whole file into a single string input (again, to make \n visible to the regex engine).
The second perl command replaces an LF char at the end of each line except the one at the end of file (eof).
To make the changes inline add -i option (mind this is a GNU sed example):
sed -i -z 's/\n\(.\)/|\1/g' test.txt
perl -i -0pe 's/\n(?!\z)/|/g' test.txt
perl -i -pe 's/\n/|/g if !eof' test.txt

Insert blank based on first digit of line

Input:
3abdce
412ae3
21dege
Expected Output - starting digit of line is removed and a blank inserted based on the offset specified by that digit
abd ce
12ae 3
1d ege
I can only remove the first character:
sed 's/^.\{1\}//g' file
GNU awk solution:
awk -v FS="" '{ print substr($0,2,$1), substr($0,$1+2) }' file
$1 - points to the 1st figure value (slice size)
The output:
abd ce
12ae 3
1d ege
this one should do the trick:
awk '{ split($0, a, ""); print substr($0, 2, a[1])" "substr($0, 2+a[1]) }' yourfile
Output:
abd ce
12ae 3
1d ege
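Both answers above depend on splitting a line into single characters with an empty separator, which POSIX leaves undefined. A sketch that needs only substr, assuming the offset is always a single leading digit (insert_blank is a made-up name):

```shell
# insert_blank is a hypothetical helper name.
insert_blank() {
  awk '{
    n = substr($0, 1, 1) + 0                       # leading digit, converted to a number
    print substr($0, 2, n) " " substr($0, n + 2)   # n characters, a blank, then the rest
  }'
}

printf '3abdce\n412ae3\n21dege\n' | insert_blank
```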
If perl is okay
$ perl -F -lane 'print @F[1..$F[0]], " ", @F[$F[0]+1..$#F]' ip.txt
abd ce
12ae 3
1d ege
-F -lane split each line on empty string, so each character is a field, saved in @F array
Then print as required, indexing starts from 0
Using gawk as it supports empty FS and OFS
awk -v FS="" -v OFS="" '{gsub($($1+1),"& ");gsub(/^./,"")}1' inputfile
abd ce
12ae 3
1d ege
Here, FS and OFS are set to blank and two gsub functions are used to do the required search and replace operation.

Replace using gsub in awk

trying to do a gsub in awk.
I want to replace single space with underscore, but the adjoining characters are replaced
awk -F"  +" 'NF > 1 {gsub(/[[:alnum:]][ ][[:alnum:]]/, "_")}1' file
Input:
this  is  example
ca  bc  dec cat
251  otg  op con
this is what I get:
this  is  example
ca  bc  de_at
251  otg  o_on
Desired output:
this  is  example
ca  bc  dec_cat
251  otg  op_con
This is one area where (non-GNU) awk isn't the best tool for the job. I'd suggest using sed instead:
$ sed '/  / s/\([[:alnum:]]\) \([[:alnum:]]\)/\1_\2/g' file
this is example
ca bc dec_cat
251 otg op_con
This performs substitutions on lines containing 2 or more spaces, which is an equivalent condition to NF > 1 given your field separator.
The key here is to capture the characters before and after the space and then use them in the replacement. This can be done in GNU awk too, using gensub:
$ gawk -F"  +" 'NF > 1 { $0 = gensub(/([[:alnum:]]) ([[:alnum:]])/, "\\1_\\2", "g") }1' file
this is example
ca bc dec_cat
251 otg op_con
gensub returns the result of the substitution, so it must be reassigned to $0 in order to affect the output.
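Without gensub, one possible approach in plain awk is to loop with match() and rebuild the line around each hit. This is only a sketch: it assumes columns are separated by two or more spaces, uses [A-Za-z0-9] in place of [[:alnum:]] for older awks, and underscore_single_spaces is a made-up name:

```shell
# underscore_single_spaces is a hypothetical helper name.
underscore_single_spaces() {
  awk -F '  +' 'NF > 1 {
    # Replace one single space between two alphanumerics per pass;
    # each pass removes a space, so the loop terminates.
    while (match($0, /[A-Za-z0-9] [A-Za-z0-9]/))
      $0 = substr($0, 1, RSTART) "_" substr($0, RSTART + 2)
  } 1'
}

printf 'this  is  example\nca  bc  dec cat\n251  otg  op con\n' | underscore_single_spaces
```

RSTART points at the character before the space, so substr($0, 1, RSTART) keeps it and substr($0, RSTART + 2) resumes at the character after the space.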

Delete text before comma in a delimited field

I have a pipe delimited file where I want to remove all text before a comma in field 9.
Example line:
www.upstate.edu|upadhyap|Prashant K Upadhyaya, MD||General Surgery|http://www.upstate.edu/hospital/providers/doctors/?docID=upadhyap|Patricia J. Numann Center for Breast, Endocrine & Plastic Surgery|Upstate Specialty Services at Harrison Center|Suite D, 550 Harrison Street||Syracuse|NY|13202|
so the targeted field is: |Suite D, 550 Harrison Street|
and I want it to look like: |550 Harrison Street|
So far what I have tried has either deleted information from other fields (usually the name in field 3) or has had no effect.
The .awk script I have been trying to write looks like this:
mv $1 $1.bak4
cat $1.bak4 | awk -F "|" '{
gsub(/*,/,"", $9);
print $0
}' > $1
The pattern argument to gsub is a regex, not a glob. Your * isn't matching what you expect it to. You want /.*,/ there. You are also going to need to set OFS to | to keep that delimiter.
mv $1 $1.bak4
awk 'BEGIN{ FS = OFS = "|" }{ gsub(/.*,/,"",$9) } 1' $1.bak4 > $1
I also replaced the verbose print line you had with a true pattern (1) that uses the fact that the default action is print.
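Two details of /.*,/ may matter: it is greedy, so it removes everything up to the last comma in the field, and it leaves the space that follows the comma. A hedged variation that stops at the first comma and also drops the spaces after it (strip_before_comma and its column argument are made up for illustration):

```shell
# strip_before_comma is a hypothetical helper; its first argument is the field number.
strip_before_comma() {
  awk -v col="$1" 'BEGIN { FS = OFS = "|" }
    { sub(/^[^,]*, */, "", $col) }   # text up to the first comma, plus any spaces after it
    1'
}

printf 'a|Suite D, 550 Harrison Street|c\n' | strip_before_comma 2
```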

Print line numbers starting at zero using awk

Can anyone tell me how to print line numbers including zero using awk?
Here is my input file stackfile2.txt
when I run the below awk command I get actual_output.txt
awk '{print NR,$0}' stackfile2.txt | tr " ", "," > actual_output.txt
whereas my expected output is file.txt
How do I print the line numbers starting with zero (0)?
NR starts at 1, so use
awk '{print NR-1 "," $0}'
Using awk.
i starts at 0, i++ will increment the value of i, but return the original value that i held before being incremented.
awk '{print i++ "," $0}' file
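When several files are passed at once, NR and the i++ counter keep counting across files. If numbering should restart at zero for each file, FNR can be used instead; a minimal sketch (number_from_zero is a made-up name, and the temporary files exist only for the demonstration):

```shell
# number_from_zero is a hypothetical helper; it numbers lines from 0, restarting per file.
number_from_zero() {
  awk '{ printf "%d,%s\n", FNR - 1, $0 }' "$@"
}

# Demonstration with two throwaway files.
tmp=$(mktemp -d)
printf 'a\nb\n' > "$tmp/one.txt"
printf 'c\n' > "$tmp/two.txt"
number_from_zero "$tmp/one.txt" "$tmp/two.txt"
rm -r "$tmp"
```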
Another option besides awk is nl which allows for options -v for setting starting value and -n <lf,rf,rz> for left, right and right with leading zeros justified. You can also include -s for a field separator such as -s "," for comma separation between line numbers and your data.
In a Unix environment, this can be done as
cat <infile> | ...other stuff... | nl -v 0 -n rz
or simply
nl -v 0 -n rz <infile>
Example:
echo "Here
are
some
words" > words.txt
cat words.txt | nl -v 0 -n rz
Out:
000000 Here
000001 are
000002 some
000003 words
If Perl is an option, you can try this:
perl -ne 'printf "%s,%s", $.-1, $_' file
$_ is the line
$. is the line number