What is a Unix oneliner to swap between fields? - awk

I have a file with a list
id1 str1 str2 .. strn
id2 str1 str2 .. strm
(the number of str can vary) and I want a oneliner that transforms it into
str1 str2 .. strn [id]
str1 str2 .. strm [id]
There should be a way with awk to do that, but I don't know how to take "all fields" after $1, when they are of variable length.
My idea would be something like
cat file | awk '{ print $2 and the rest " [" $1 "]" }'
but just missing the "$2 and the rest"....

With Perl
perl -F'\s+' -E 'say join " ", @F[1..$#F], "[" . @F[0] . "]"' file
Output
str1 str2 ... strn [id1]
str1 str2 ... strm [id2]

$ awk '{for(i=2;i<=NF;i++)printf "%s%s",$i,OFS (i==NF?"[" $1 "]" ORS:"")}' file
Output:
str1 str2 .. strn [id1]
str1 str2 .. strm [id2]

For just handling the first field, a regex-based solution seems simple enough:
sed -E 's/([^[ ]+) (.*)/\2 [\1]/'

Everyone stands up and moves one space.
echo "a b c d e f" | awk '{ f=$1; for(i=1; i<NF; i++){ $i=$(i+1) }; $NF=f }1'
Output:
b c d e f a

Like this:
awk '{v=$1;$1="";sub(/^ /, "");$NF=$NF" ["v"]"}1' file
Or exploded multi-line for readability:
awk '{
v=$1
$1=""
sub(/^ /, "")
$NF=$NF" ["v"]"}
1
' file
Output
str1 str2 ... strn [id1]
str1 str2 ... strm [id2]
Explanations
v=$1 : assign $1 to the variable v
$1="" : unset $1
sub(/^ /, "") : remove the leading space from $0
$NF=$NF" ["v"]" : append " [" v "]" to the last field $NF, giving the expected output with the id taken from v
1 : shorthand for print
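One side effect of this approach worth noting: assigning to a field (here $1="") makes awk rebuild $0 using OFS, so any runs of whitespace in the input collapse to single spaces. A minimal sketch on spaced-out input:

```shell
# Assigning to a field forces awk to rebuild $0, joining all fields with OFS
# (a single space by default), so the double/triple spaces below collapse.
echo 'id1  str1   str2' | awk '{v=$1; $1=""; sub(/^ /, ""); $NF=$NF" ["v"]"} 1'
```

If the original spacing must be preserved, the substr/index or sed answers are the safer choice.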

$ awk '{$0=$0 " [" $1 "]"; sub(/^[^ ]+ /,"")} 1' file
str1 str2 .. strn [id1]
str1 str2 .. strm [id2]
or if you prefer:
$ awk '{for (i=2; i<=NF; i++) printf "%s ", $i; print "[" $1 "]"}' file
str1 str2 .. strn [id1]
str1 str2 .. strm [id2]

I would harness GNU AWK in the following way. Let file.txt content be
id1 str1 str2 .. strn
id2 str1 str2 .. strm
then
awk '{print substr($0,index($0," ")+1),"[" $1 "]"}' file.txt
gives output
str1 str2 .. strn [id1]
str1 str2 .. strm [id2]
Warning: I assume that your values are separated by single spaces; if this is not the case, do not use this solution. Explanation: I use string functions to get
$2 and the rest
by finding the position of the first space (via the index function) and then getting everything beyond that space (via the substr function), which is then followed by the 1st field value enclosed in [...].
(tested in GNU Awk 5.0.1)

Another perl answer
perl -lane '$f = shift @F; push @F, "[$f]"; print "@F"' file

Or a very close, but whitespace-agnostic, substitution with sed, using the [[:space:]] list in a character class to handle the whitespace regardless of whether it is ' ', '\t', or mixed, with the extended-regex option:
sed -E 's/^([^[:space:]]+)[[:space:]]+(.*$)/\2 [\1]/' file
If your sed doesn't support extended regex (and doesn't use the -r option instead of -E for it), then you can do it with basic regex by escaping the '(' and ')' backreference capture delimiters, e.g.
sed 's/^\([^[:space:]][^[:space:]]*\)[[:space:]][[:space:]]*\(.*$\)/\2 [\1]/' file
(note: basic regex doesn't support the '+' one-or-more repetition operator, so the '*' zero-or-more repetition is used, with one literal match also included to create the equivalent of a one-or-more match.)
In both cases the standard substitution form is 's/find/replace/', shown with extended regex below, where:
find:
^ anchors the search at the beginning of line,
([^[:space:]]+) captures all (one-or-more) characters from the beginning that are not whitespace (the '^' in the character-class inverts the match) for use as the 1st backreference,
[[:space:]]+ select one or more spaces,
(.*$) capture all remaining characters to '$' (end of line) for the 2nd backreference.
replace
\2 insert the text captured as the 2nd backreference,
' [' insert a space and a literal opening bracket,
\1 insert the text captured as the 1st backreference, and finally,
']' insert the literal closing bracket.
Example Use/Output
$ sed -E 's/^([^[:space:]]+)[[:space:]]+(.*$)/\2 [\1]/' << 'eof'
id1 str1 str2 .. strn
id2 str1 str2 .. strm
eof
str1 str2 .. strn [id1]
str1 str2 .. strm [id2]
Let me know if you have questions.

Using gnu awk and a pattern with 2 capture groups:
awk 'match($0, /^([^ ]+) +(.+)/, a) {print a[2], "["a[1]"]"}' file
Or using POSIX bracket expressions to match spaces with [[:space:]]
gawk 'match($0, /^([^[:space:]]+)[[:space:]]+(.+)/, a) {print a[2], "["a[1]"]"}' file
Output
str1 str2 .. strn [id1]
str1 str2 .. strm [id2]

This should be one-liner enough?
echo 'id1 str1 str2 .. strn
id2 str1 str2 .. strm' |
{m,g}awk '$!NF = substr($_, index($_, $(!_+!_))) " [" $!_ "]"'
str1 str2 .. strn [id1]
str1 str2 .. strm [id2]

Related

Replacing with Awk and preserving the FS to OFS

I have a file with the input text below (this is not the original file, just an example of the input text) and I want to replace all the 2-letter strings with the number 100. In this file FS can be :, | or " " (space); I have no choice but to treat all three of them as FS, and I want to preserve these field separators at their original positions (as in the input file) in the output
A:B C|D
AA:C EE G
BB|FF XX1 H
DD:MM:YY K
I have tried
awk -F"[:| ]" '{gsub(/[A-Z]{2}/,"100");print}'
but this does not seem to work; please suggest.
Desired output:
A:B C|D
100:C 1000 G
100|100 1001 H
100:100:100 K
There is no functionality in POSIX awk to retain the strings that match the string defined by RS or the regexp defined by FS. Since in POSIX RS is just a string there's no need for such functionality for it, and doing it for every FS-matching string would be unnecessarily inefficient given that it's rarely needed.
With GNU awk where RS can be a regexp, not just a string, you can retain the string that matched the regexp RS with RT but there is no functionality that retains the values that match FS for the same efficiency reason that POSIX doesn't do it. Instead in GNU awk they added a 4th arg to split() so you can retain the strings that match FS in an array yourself if you want it (seps[] below):
$ awk -v FS='[:| ]' '{
split($0,flds,FS,seps)
gsub(/[A-Z]{2}/,"100")
for (i=1;i<=NF;i++) {
printf "%s%s", $i, seps[i]
}
print ""
}' file
A:B C|D
100:C 100 G
100|100 1001 H
100:100:100 K
Look up split() in the GNU awk manual for more info.
in this case
sed 's/[A-Z]\{2\}/100/g' YourFile
awk '{gsub(/[A-Z]{2}/, "100"); print}' YourFile
no need for field separation in this case: change every group of two upper-case letters to "100". Unless you have constraints other than those in the OP (like other elements in the string), in which case you need to specify what is possible and, ideally, add a sample of the expected result to be unambiguous.
Now you certainly have a lot more going on around this, so this code will certainly fail by changing things like ABC:DEF into 100C:100F, which is certainly not expected.
in this case
awk -F '[[:blank:]:|]+' '
{
split( $0, aS, /[^[:blank:]:|]+/)
for( i=1;i<=NF;i++){
if( $i ~ /^[A-Z][A-Z]$/) $i = "100"
printf( "%s%s", $i, aS[i+1])
}
printf( "\n" )
} ' YourFile
Give this sed one-liner a try:
kent$ sed -r 's/(^|[:| ])[A-Z][A-Z]([:| ]|$)/\1100\2/g' file
A:B C|D
100:C 100 G
100|FF XX1 H
100:MM:100 K
Note:
this will search and replace pattern: exact two [A-Z] between two delimiters. If this is not what you want exactly, paste the desired output.
Your code seems to work just fine with my Gnu awk:
A:B C|D
100:C 100 G # even the typo in this record got fixed.
100|100 1001 H
100:100:100 K
I'd say the problem is that the regex /[A-Z]{2}/ should be written /[A-Z][A-Z]/.

Remove word from a comma separated values of specific field

The NIS group file has format
group1:*:100:bat,cat,zat,ratt
group2:*:200:rat,cat,bat
group3:*:300:rat
With : as the delimiter, I need to remove the exact word (for example rat) from the 4th column. Any leading or trailing , around the word should be deleted as well, to preserve the comma-separated-values format of the 4th column.
Expected output:
group1:*:100:bat,cat,zat,ratt
group2:*:200:cat,bat
group3:*:300:
You'd better use awk for this job. Try this (GNU awk):
awk 'BEGIN {OFS=FS=":"} {gsub (/\yrat,?\y|\y,?rat\y/, "", $4)}1' file
Using : as the field separator, gsub removes all rat in the 4th field. \y is used for word boundaries so that rat will match but ratt will not.
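The \y word boundary is GNU-specific. If a portable variant is wanted, a minimal POSIX-awk sketch (assuming exact-word matching is all that's required) is to split the 4th field on commas and rebuild it:

```shell
# Split field 4 on commas, keep every element that is not exactly "rat",
# and join the survivors back with commas (POSIX awk only, no \y needed).
awk 'BEGIN{OFS=FS=":"} {
  n = split($4, parts, ",")
  out = ""
  for (i = 1; i <= n; i++)
    if (parts[i] != "rat")
      out = (out == "" ? parts[i] : out "," parts[i])
  $4 = out
} 1' file
```

Because the comparison is a full string equality, ratt survives untouched.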
If a perl solution is okay:
Modified sample input to add more relevant cases:
$ cat ip.txt
group1:*:100:bat,cat,zat,ratt
group2:*:200:rat,cat,bat
group3:*:300:rat
group4:*:400:mat,rat,sat
group5:*:500:pat,rat
$ perl -F: -lane '(@a) = split/,/,$F[3]; $F[3] = join ",", grep { $_ ne "rat" } @a; print join ":", @F' ip.txt
group1:*:100:bat,cat,zat,ratt
group2:*:200:cat,bat
group3:*:300:
group4:*:400:mat,sat
group5:*:500:pat
-F: split input line on : and save to the @F array
(@a) = split/,/,$F[3] split the 4th column on , and save to the @a array
$F[3] = join ",", grep { $_ ne "rat" } @a remove elements in the @a array exactly matching rat, join the remaining elements with , and assign back to the 4th field of the input line
print join ":", @F print the modified @F array elements joined by :
Golfing to avoid the temp array @a
$ perl -F: -lane '$F[3] = join ",", grep { $_ ne "rat" } split/,/,$F[3]; print join ":", @F' ip.txt
Using regex on 4th column:
$ perl -F: -lane '$F[3] =~ s/,rat\b|\brat(,|\b)//g; print join ":", @F' ip.txt
group1:*:100:bat,cat,zat,ratt
group2:*:200:cat,bat
group3:*:300:
group4:*:400:mat,sat
group5:*:500:pat
This might work for you (GNU sed):
sed -r 's/\brat\b,?//g' file
Remove all occurrences of the word rat, each followed by an optional ,.
awk 'NR>1{sub(/rat,*/,"")}1' file
group1:*:100:bat,cat,zat,ratt
group2:*:200:cat,bat
group3:*:300:

why doesn't awk seem to work on splitting into fields based on alternative involving "."?

It is OK to awk split on .:
>printf foo.bar | awk '{split($0, a, "."); print a[1]}'
foo
It is OK to awk split on an alternative:
>printf foo.bar | awk '{split($0, a, "b|a"); print a[1]}'
foo.
Then why is it not OK to split on an alternative involving .:
>printf foo.bar | awk '{split($0, a, ".|a"); print a[1]}'
(nothing printed)
Escape that period and I think you'll be golden:
printf foo.bar | awk '{split($0, a, "\\.|a"); print a[1]}'
JNevill showed how to get it working. But to answer your question of why the escape is needed in one case but not the other, we can find the answer in the awk manual in the summary of "how fields are split, based on the value of FS." (And the same rules apply to the fieldsep given to the split command.)
The bottom line is that when FS is a single character it is not treated as a regular expression but otherwise it is.
Hence split($0, a, ".") works as we hope, taking . to literally be ., but split($0, a, ".|a") takes .|a to be a regexp where . has a special meaning, setting the separator to be any character, and with that the necessity to add the backslashes to have the . treated literally.
FS == " "
Fields are separated by runs of whitespace. Leading and trailing whitespace are ignored. This is the default.
FS == any single character
Fields are separated by each occurrence of the character. Multiple successive occurrences delimit empty fields, as do leading and
trailing occurrences.
FS == regexp
Fields are separated by occurrences of characters that match regexp. Leading and trailing matches of regexp delimit empty fields.
You can see that despite the empty result, .|a is really doing something: it divides the line into eight empty fields, the same as a line like ,,,,,,, would with FS set to ,.
$ printf foo.bar | awk '{split($0, a, ".|a"); for (i in a) print i ": " a[i]; }'
4:
5:
6:
7:
8:
1:
2:
3:

How to split a delimited string into an array in awk?

How do I split the string when it contains pipe symbols (|) in it?
I want to split it into an array.
I tried
echo "12:23:11" | awk '{split($0,a,":"); print a[3] a[2] a[1]}'
Which works fine. If my string is like "12|23|11" then how do I split them into an array?
Have you tried:
echo "12|23|11" | awk '{split($0,a,"|"); print a[3],a[2],a[1]}'
To split a string into an array in awk we use the function split():
awk '{split($0, array, ":")}'
# \/ \___/ \_/
# | | |
# string | delimiter
# |
# array to store the pieces
If no separator is given, it uses the FS, which defaults to the space:
$ awk '{split($0, array); print array[2]}' <<< "a:b c:d e"
c:d
We can give a separator, for example ::
$ awk '{split($0, array, ":"); print array[2]}' <<< "a:b c:d e"
b c
Which is equivalent to setting it through the FS:
$ awk -F: '{split($0, array); print array[2]}' <<< "a:b c:d e"
b c
In GNU Awk you can also provide the separator as a regexp:
$ awk '{split($0, array, ":*"); print array[2]}' <<< "a:::b c::d e"
b c
(note the multiple :)
And even see what the delimiter was on every step by using its fourth parameter:
$ awk '{split($0, array, ":*", sep); print array[2]; print sep[1]}' <<< "a:::b c::d e"
b c
:::
Let's quote the man page of GNU awk:
split(string, array [, fieldsep [, seps ] ])
Divide string into pieces separated by fieldsep and store the pieces in array and the separator strings in the seps array. The first piece is stored in array[1], the second piece in array[2], and so forth. The string value of the third argument, fieldsep, is a regexp describing where to split string (much as FS can be a regexp describing where to split input records). If fieldsep is omitted, the value of FS is used. split() returns the number of elements created. seps is a gawk extension, with seps[i] being the separator string between array[i] and array[i+1]. If fieldsep is a single space, then any leading whitespace goes into seps[0] and any trailing whitespace goes into seps[n], where n is the return value of split() (i.e., the number of elements in array).
Please be more specific! What do you mean by "it doesn't work"?
Post the exact output (or error message), your OS and awk version:
% awk -F\| '{
for (i = 0; ++i <= NF;)
print i, $i
}' <<<'12|23|11'
1 12
2 23
3 11
Or, using split:
% awk '{
n = split($0, t, "|")
for (i = 0; ++i <= n;)
print i, t[i]
}' <<<'12|23|11'
1 12
2 23
3 11
Edit: on Solaris you'll need to use the POSIX awk (/usr/xpg4/bin/awk) in order to process 4000 fields correctly.
I do not like the echo "..." | awk ... solution as it makes unnecessary fork and exec system calls.
I prefer Dimitre's solution with a little twist
awk -F\| '{print $3 $2 $1}' <<<'12|23|11'
Or a bit shorter version:
awk -F\| '$0=$3 $2 $1' <<<'12|23|11'
In this case the assignment puts the output record together; its value is a true condition, so the record gets printed.
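One caveat with any assignment-as-pattern one-liner (a general awk gotcha, not specific to this input): if the value assigned is a single field that looks like the number 0, the pattern evaluates false and the line is silently dropped. An explicit print, or appending ||1, avoids that:

```shell
# $1 here is the numeric string "0", so the assignment is falsy: no output.
echo '0 x' | awk '$0=$1'
# Forcing the pattern true prints the record regardless of its value.
echo '0 x' | awk '($0=$1)||1'
```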
In this specific case the stdin redirection can be spared by setting an awk internal variable:
awk -v T='12|23|11' 'BEGIN{split(T,a,"|");print a[3] a[2] a[1]}'
I used ksh for quite a while, but in bash this can be managed by built-in string manipulation. In the first case the original string is split at the internal separators. In the second case it is assumed that the string always contains digit pairs separated by a one-character separator.
T='12|23|11';echo -n ${T##*|};T=${T%|*};echo ${T#*|}${T%|*}
T='12|23|11';echo ${T:6}${T:3:2}${T:0:2}
The result in all cases is
112312
Actually awk has a feature for this: the input field separator variable FS. This is how to use it. It's not really an array, but it uses the internal $ variables. For splitting a simple string it is easier.
echo "12|23|11" | awk 'BEGIN {FS="|";} { print $1, $2, $3 }'
I know this is kind of an old question, but I thought maybe someone would like my trick, especially since this solution is not limited to a specific number of items.
# Convert to an array
_ITEMS=($(echo "12|23|11" | tr '|' '\n'))
# Output array items
for _ITEM in "${_ITEMS[@]}"; do
echo "Item: ${_ITEM}"
done
The output will be:
Item: 12
Item: 23
Item: 11
Joke? :)
How about echo "12|23|11" | awk '{split($0,a,"|"); print a[3] a[2] a[1]}'
This is my output:
p2> echo "12|23|11" | awk '{split($0,a,"|"); print a[3] a[2] a[1]}'
112312
so I guess it's working after all..
echo "12|23|11" | awk '{split($0,a,"|"); print a[3] a[2] a[1]}'
should work.
echo "12|23|11" | awk '{split($0,a,"|"); print a[3] a[2] a[1]}'
Code:
awk -F"|" '{split($0,a); print a[1],a[2],a[3]}' <<< '12|23|11'
Output:
12 23 11
The challenge: parse and store split strings with spaces, and insert them into variables.
Solution: the best and simplest choice would be to convert the string list into an array and then parse it into variables by index. Here's an example of how to convert and access the array.
Example: parse disk space statistics on each line:
sudo df -k | awk 'NR>1' | while read -r line; do
#convert into array:
array=($line)
#variables:
filesystem="${array[0]}"
size="${array[1]}"
capacity="${array[4]}"
mountpoint="${array[5]}"
echo "filesystem:$filesystem|size:$size|capacity:$capacity|mountpoint:$mountpoint"
done
#output:
filesystem:/dev/dsk/c0t0d0s1|size:4000|capacity:40%|mountpoint:/
filesystem:/dev/dsk/c0t0d0s2|size:5000|capacity:50%|mountpoint:/usr
filesystem:/proc|size:0|capacity:0%|mountpoint:/proc
filesystem:mnttab|size:0|capacity:0%|mountpoint:/etc/mnttab
filesystem:fd|size:1000|capacity:10%|mountpoint:/dev/fd
filesystem:swap|size:9000|capacity:9%|mountpoint:/var/run
filesystem:swap|size:1500|capacity:15%|mountpoint:/tmp
filesystem:/dev/dsk/c0t0d0s3|size:8000|capacity:80%|mountpoint:/export
awk -F'[|]' '{print $1"\t"$2"\t"$3}' <<<'12|23|11'

replacing the `'` char using awk

I have lines with a single : and a ' in them that I want to get rid of. I want to use awk for this. I've tried using:
awk '{gsub ( "[:\\']","" ) ; print $0 }'
and
awk '{gsub ( "[:\']","" ) ; print $0 }'
and
awk '{gsub ( "[:']","" ) ; print $0 }'
None of them worked; they return the error Unmatched ". When I put
awk '{gsub ( "[:_]","" ) ; print $0 }'
then it works and removes all : and _ chars. How can I get rid of the ' char?
tr is made for this purpose
echo test\'\'\'\':::string | tr -d \':
teststring
$ echo test\'\'\'\':::string | awk '{gsub(/[:\47]*/,"");print $0}'
teststring
This works:
awk '{gsub( "[:'\'']","" ); print}'
You could use:
Octal code for the single quote:
[:\47]
The single quote inside double quotes, but in that case special
characters will be expanded by the shell:
% print a\': | awk "gsub(/[:']/, x)"
a
Use a dynamic regexp, but there are performance implications related
to this approach:
% print a\': | awk -vrx="[:\\\']" 'gsub(rx, x)'
a
With bash you cannot insert a single quote inside a literal surrounded with single quotes. Use '"'"' for example.
First ' closes the current literal, then "'" concatenates it with a literal containing only a single quote, and ' reopens a string literal, which will also be concatenated.
What you want is:
awk '{gsub ( "[:'"'"']","" ) ; print $0; }'
ssapkota's alternative is also good ('\'').
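The '\'' idiom works the same way: close the single-quoted literal, emit an escaped quote, and reopen the literal. A sketch on sample input:

```shell
# '\'' = close the single-quoted string, add \' (an escaped quote), reopen it,
# so the awk regex seen by awk is [:'] and removes both characters.
echo "don't:stop" | awk '{ gsub(/[:'\'']/, "") } 1'
```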
I don't know why you are restricting yourself to using awk; anyway, you've got many answers from other users. You can also use sed to get rid of : and ':
sed "s/[:']//g"
This will also serve your purpose. Simple and less complex.
This also works:
awk '{gsub("\x27",""); print}'
simplest
awk '{gsub(/\047|:/,"")};1'