How to split a delimited string into an array in awk?
How do I split a string when it contains pipe symbols (|)?
I want to split it into an array.
I tried
echo "12:23:11" | awk '{split($0,a,":"); print a[3] a[2] a[1]}'
This works fine. If my string is "12|23|11", how do I split it into an array?
Have you tried:
echo "12|23|11" | awk '{split($0,a,"|"); print a[3],a[2],a[1]}'
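A caveat on the separator, going by the POSIX rules that split() shares with FS: a single character other than space (such as |) is matched literally, but a longer separator is treated as an extended regular expression, in which | means alternation. A minimal sketch, using a hypothetical helper name, that splits on a literal "||" by putting each pipe inside a bracket expression:

```shell
# split_double_pipe is a hypothetical helper name.
# "[|][|]" matches a literal "||"; an unbracketed "||" would be
# read as regex alternation instead.
split_double_pipe() {
    printf '%s\n' "$1" | awk '{
        n = split($0, parts, "[|][|]")
        for (i = 1; i <= n; i++)
            print i, parts[i]
    }'
}

split_double_pipe 'a||b||c'
```

This prints "1 a", "2 b" and "3 c" on separate lines.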
To split a string into an array in awk, use the function split():
awk '{split($0, array, ":")}'
#           \/  \___/  \_/
#           |     |     |
#         string  |  delimiter
#                 |
#     array to store the pieces
If no separator is given, split() uses FS, which defaults to a single space (meaning that runs of blanks separate the pieces):
$ awk '{split($0, array); print array[2]}' <<< "a:b c:d e"
c:d
We can give a separator, for example ":":
$ awk '{split($0, array, ":"); print array[2]}' <<< "a:b c:d e"
b c
This is equivalent to setting it through FS:
$ awk -F: '{split($0, array); print array[2]}' <<< "a:b c:d e"
b c
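To make the difference between the default FS and an explicit separator concrete: with the default (a single space), leading and trailing blanks are ignored and a run of blanks counts as one separator, while a one-character separator such as ":" splits at every occurrence, so adjacent separators produce empty pieces. A small sketch; the function names are made up for illustration:

```shell
# Hypothetical helpers that print how many pieces split() returns.
count_default() {
    # No third argument: split() uses FS, which defaults to " "
    printf '%s\n' "$1" | awk '{ print split($0, a) }'
}
count_colon() {
    # Explicit ":" separator: every colon splits, even adjacent ones
    printf '%s\n' "$1" | awk '{ print split($0, a, ":") }'
}

count_default '  a   b  '   # 2 pieces: "a", "b"
count_colon 'a::b'           # 3 pieces: "a", "", "b"
```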
The separator can also be a regexp (this example uses GNU Awk):
$ awk '{split($0, array, ":*"); print array[2]}' <<< "a:::b c::d e"   # note the multiple colons
b c
In GNU Awk you can even see what the delimiter was at every step by using split()'s fourth parameter:
$ awk '{split($0, array, ":*", sep); print array[2]; print sep[1]}' <<< "a:::b c::d e"
b c
:::
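Since ":*" can also match the empty string, awk implementations may not all treat it the same way. A more portable regexp for "one or more colons" is ":+"; a sketch assuming any POSIX awk (the helper name is hypothetical):

```shell
# split_colons is a hypothetical helper.
# ":+" matches one or more colons and, unlike ":*", never the empty string.
split_colons() {
    printf '%s\n' "$1" | awk '{
        n = split($0, array, ":+")
        for (i = 1; i <= n; i++)
            print i ": " array[i]
    }'
}

split_colons 'a:::b c::d e'
```

This prints "1: a", "2: b c" and "3: d e"; the return value n of split() drives the loop.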
Let's quote the man page of GNU awk:
split(string, array [, fieldsep [, seps ] ])
Divide string into pieces separated by fieldsep and store the pieces in array and the separator strings in the seps array. The first piece is stored in array[1], the second piece in array[2], and so forth. The string value of the third argument, fieldsep, is a regexp describing where to split string (much as FS can be a regexp describing where to split input records). If fieldsep is omitted, the value of FS is used. split() returns the number of elements created. seps is a gawk extension, with seps[i] being the separator string between array[i] and array[i+1]. If fieldsep is a single space, then any leading whitespace goes into seps[0] and any trailing whitespace goes into seps[n], where n is the return value of split() (i.e., the number of elements in array).
Please be more specific! What do you mean by "it doesn't work"?
Post the exact output (or error message), your OS, and your awk version. Here is what I get:
% awk -F\| '{
for (i = 0; ++i <= NF;)
print i, $i
}' <<<'12|23|11'
1 12
2 23
3 11
Or, using split:
% awk '{
n = split($0, t, "|")
for (i = 0; ++i <= n;)
print i, t[i]
}' <<<'12|23|11'
1 12
2 23
3 11
Edit: on Solaris you'll need to use the POSIX awk (/usr/xpg4/bin/awk) in order to process 4000 fields correctly.
I do not like the echo "..." | awk ... solution, as it makes unnecessary fork and exec system calls.
I prefer Dimitre's solution with a little twist:
awk -F\| '{print $3 $2 $1}' <<<'12|23|11'
Or a slightly shorter version:
awk -F\| '$0=$3 $2 $1' <<<'12|23|11'
Here the assignment builds the output record, and its value ("112312") is a true condition, so the record gets printed. A record that ends up empty would not be printed.
In this specific case, the stdin redirection can be avoided by passing the string in as an awk variable:
awk -v T='12|23|11' 'BEGIN{split(T,a,"|");print a[3] a[2] a[1]}'
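One thing to keep in mind with -v: awk processes backslash escape sequences in the assigned value, so a string such as C:\tmp would have its \t turned into a tab. If the string may contain backslashes, passing it through the environment and reading ENVIRON keeps it verbatim. A sketch; the helper name and the variable S are illustrative:

```shell
# split_env is a hypothetical helper; S is passed via the environment
# so that backslashes in it are not interpreted by awk.
split_env() {
    S="$1" awk 'BEGIN {
        n = split(ENVIRON["S"], a, "|")
        for (i = 1; i <= n; i++)
            print a[i]
    }'
}

split_env 'C:\tmp|D:\new'
```

This prints C:\tmp and D:\new unchanged, one per line.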
I used ksh for quite a while, but in bash this can also be done with built-in string manipulation. The first version splits the original string on the internal separator. The second assumes that the string always contains digit pairs separated by a one-character separator.
T='12|23|11';echo -n ${T##*|};T=${T%|*};echo ${T#*|}${T%|*}
T='12|23|11';echo ${T:6}${T:3:2}${T:0:2}
The result in all cases is
112312
Actually, awk has the input field separator variable, FS. This is how to use it. The result is not really an array, but it uses the built-in field variables $1, $2, and so on. For splitting a simple string it is easier.
echo "12|23|11" | awk 'BEGIN {FS="|";} { print $1, $2, $3 }'
I know this is kind of an old question, but I thought someone might like my trick, especially since this solution is not limited to a specific number of items. (It uses bash arrays rather than awk.)
# Convert to an array
_ITEMS=($(echo "12|23|11" | tr '|' '\n'))
# Output array items
for _ITEM in "${_ITEMS[@]}"; do
echo "Item: ${_ITEM}"
done
The output will be:
Item: 12
Item: 23
Item: 11
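The unquoted $(...) in that trick also performs pathname expansion, so an item such as * would expand to file names, and it splits on spaces as well as on the newlines produced by tr. A POSIX sh sketch (the function name is hypothetical) that splits on | directly through IFS and turns globbing off while doing so:

```shell
# split_to_items is a hypothetical helper name.
split_to_items() {
    old_ifs=$IFS
    set -f            # disable globbing so items like * stay literal
    IFS='|'
    set -- $1         # field-split "$1" on | into the positional parameters
    IFS=$old_ifs
    set +f            # re-enable globbing (assumes it was on before)
    for item in "$@"; do
        printf 'Item: %s\n' "$item"
    done
}

split_to_items '12|23|11'
```

This produces the same three "Item:" lines, and an input such as 'a b|*' yields the two items "a b" and "*".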
Joke? :)
How about echo "12|23|11" | awk '{split($0,a,"|"); print a[3] a[2] a[1]}'
This is my output:
p2> echo "12|23|11" | awk '{split($0,a,"|"); print a[3] a[2] a[1]}'
112312
so I guess it's working after all.
echo "12|23|11" | awk '{split($0,a,"|"); print a[3] a[2] a[1]}'
should work.
code
awk -F"|" '{split($0,a); print a[1],a[2],a[3]}' <<< '12|23|11'
output
12 23 11
The challenge: parse split strings that contain spaces and store the pieces in variables.
Solution: the simplest choice is to convert the string into an array and then assign its elements to variables by index. Here's an example of how to convert and access the array.
Example: parse disk space statistics on each line:
sudo df -k | awk 'NR>1' | while read -r line; do
#convert into array:
array=($line)
#variables:
filesystem="${array[0]}"
size="${array[1]}"
capacity="${array[4]}"
mountpoint="${array[5]}"
echo "filesystem:$filesystem|size:$size|usage:$capacity|mountpoint:$mountpoint"
done
#output:
filesystem:/dev/dsk/c0t0d0s1|size:4000|usage:40%|mountpoint:/
filesystem:/dev/dsk/c0t0d0s2|size:5000|usage:50%|mountpoint:/usr
filesystem:/proc|size:0|usage:0%|mountpoint:/proc
filesystem:mnttab|size:0|usage:0%|mountpoint:/etc/mnttab
filesystem:fd|size:1000|usage:10%|mountpoint:/dev/fd
filesystem:swap|size:9000|usage:9%|mountpoint:/var/run
filesystem:swap|size:1500|usage:15%|mountpoint:/tmp
filesystem:/dev/dsk/c0t0d0s3|size:8000|usage:80%|mountpoint:/export
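Since the question is about awk, the same report can also come from awk alone, which already splits each line into $1 ... $NF without a shell loop. A sketch assuming the usual df column order (filesystem, size, used, available, use%, mount point) and mount points without spaces; the helper name is made up, and df -P (POSIX output format) is used so each filesystem stays on one line:

```shell
# format_df is a hypothetical helper; it reads df output on stdin.
# Columns assumed: $1 filesystem, $2 size, $5 use%, $6 mount point.
format_df() {
    awk 'NR > 1 {
        printf "filesystem:%s|size:%s|usage:%s|mountpoint:%s\n", $1, $2, $5, $6
    }'
}

df -kP | format_df
```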
awk -F'[|]' '{print $1"\t"$2"\t"$3}' <<<'12|23|11'
Related
gawk - Delimit lines with custom character and no similar ending character
Let's say I have a file like so:
test.txt
one
two
three
I'd like to get the following output:
one|two|three
I am currently using this command:
gawk -v ORS='|' '{ print $0 }' test.txt
which gives:
one|two|three|
How can I print it so that the last | isn't there?
Here's one way to do it:
$ seq 1 | awk -v ORS= 'NR>1{print "|"} 1; END{print "\n"}'
1
$ seq 3 | awk -v ORS= 'NR>1{print "|"} 1; END{print "\n"}'
1|2|3
With paste:
$ seq 1 | paste -sd'|'
1
$ seq 3 | paste -sd'|'
1|2|3
Convert one column to one row with a field separator:
awk '{$1=$1} 1' FS='\n' OFS='|' RS='' file
Or in another notation:
awk -v FS='\n' -v OFS='|' -v RS='' '{$1=$1} 1' file
Output:
one|two|three
See: 8 Powerful Awk Built-in Variables – FS, OFS, RS, ORS, NR, NF, FILENAME, FNR
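The awk solutions work great. Here is a tr + sed solution:
tr '\n' '|' < file | sed 's/|$//'
1|2|3
(In a basic regex an unescaped | is literal; GNU sed reads \| as alternation, so escaping it would stop the trailing | from being removed.)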
Just flatten it:
gawk/mawk 'BEGIN { FS = ORS; RS = "^[\n]*$"; OFS = "|" } NF && ( $NF ? NF=NF : --NF )'
ASCII | = octal \174 = hex 0x7C.
The reason for --NF is that, more often than not, the input includes a trailing newline, which makes the field count one too many and would result in 1|2|3|
Both NF=NF and --NF are similar in concept to $1=$1. Empty inputs, whether or not trailing newlines exist, result in nothing being printed.
At the OFS spot, you can delimit with any string combination you like instead of being constrained by tr, which has inconsistent behavior. For instance:
gtr '\012' '高'   # UTF-8 高 = \351\253\230 = xE9 xAB x98
On bsd-tr, \n gets replaced by the Unicode character properly (1高2高3高), but on gnu-tr only the leading byte is kept, resulting in 1 \351 2 \351 . . .
For Unicode equivalence classes, bsd-tr works as expected, while
gtr '[=高=]' '\v'
results in
gtr: ?\230: equivalence class operand must be a single character
and if you attempt equivalence classes with an arbitrary non-ASCII byte, bsd-tr does nothing while gnu-tr gladly obliges, even if it means slicing straight through UTF-8-compliant characters:
g3bn 77138 | (g)tr '[=\224=]' '\v'
bsd-tr : 77138=Koyote 코요태 KYT✜ 高耀太
gnu-tr : 77138=Koyote ? ? 태 KYT✜ 高耀太
I would do it the following way, using GNU AWK. Let test.txt content be
one
two
three
then
awk '{printf NR==1?"%s":"|%s", $0}' test.txt
output
one|two|three
Explanation: if it is the first line, print that line's content without a trailing newline; otherwise print | followed by the line's content without a trailing newline. Note that the output itself has no trailing newline. I assumed that test.txt has no trailing newline; if that is not the case, test this solution before applying it. (tested in gawk 5.0.1)
You can also try this with awk:
awk '{ORS = (NR%3 ? "|" : RS)} 1' file
one|two|three
% is the modulo operator and NR%3 ? "|" : RS is a ternary expression: every third line ends with RS (a newline) and all others with |, so this joins each group of three lines. See Ed Morton's explanation here: https://stackoverflow.com/a/55998710/14259465
With GNU sed, you can pass the -z option to match line breaks, so all you need is to replace each newline except the one at the end of the string:
sed -z 's/\n\(.\)/|\1/g' test.txt
perl -0pe 's/\n(?!\z)/|/g' test.txt
perl -pe 's/\n/|/g if !eof' test.txt
Details:
s - substitution command
\n\(.\) - an LF char followed by any one char captured into Group 1 (so \n at the end of the string won't get matched)
|\1 - a | char and the captured char
g - all occurrences.
The first perl command matches any LF char (\n) not at the end of the string ((?!\z)) after slurping the whole file into a single string (again, to make \n visible to the regex engine). The second perl command replaces the LF char at the end of each line except the one at the end of the file (eof).
To make the changes inline, add the -i option (mind this is a GNU sed example):
sed -i -z 's/\n\(.\)/|\1/g' test.txt
perl -i -0pe 's/\n(?!\z)/|/g' test.txt
perl -i -pe 's/\n/|/g if !eof' test.txt
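A portable awk sketch that avoids the trailing separator without any assumption about a trailing newline in the input: it accumulates the lines and prints them once at the end, followed by a single newline, and prints nothing for empty input. The helper name is hypothetical, and the whole input is held in memory:

```shell
# join_lines is a hypothetical helper: join stdin lines with separator $1.
join_lines() {
    awk -v sep="$1" '
        { out = (NR == 1) ? $0 : out sep $0 }
        END { if (NR > 0) print out }
    '
}

printf 'one\ntwo\nthree\n' | join_lines '|'
```

This prints one|two|three followed by a newline.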
AWK Match & Split not finding string pattern
Passing the following commands, I would expect the first to split the string (which is also a regex) into two array elements and the second command (match) to print [[:blank:]].
echo "new[[:blank:]]+File\(" | awk '{ split($0, a, "[[:blank:]]"); print a[1]}'
prints the whole string, as it has not split.
echo "new[[:blank:]]+File\(" | awk '{ match($0, /[[:blank:]]/, m)}END{print m[0]}'
prints nothing. What am I missing here?
UPDATE
I'm calling an awk script with the following command:
awk -v regex1=new[[:blank:]]+File\( -f parameterisedRegexAwkScript.awk "$file" >> "output.txt"
Then in my script I attempt to split on the string literal with the following command:
len = split(regex1, regex, /[[:blank:]]/, seps)
but when I print len, its value is 1 when I would have expected it to be 2.
echo "new[[:blank:]]+File\(" | awk '{ split($0, a, "[[:blank:]]"); print a[1]}'
The 3rd argument of split works like setting FS in BEGIN, so in this case you are telling it to split at any blank (space or tab). To split on the literal text, you need to escape [ and ]. Let file.txt content be
new[[:blank:]]+File\(
then
awk '{split($0, a, "\\[\\[:blank:\\]\\]"); print a[1]}' file.txt
output
new
(tested in gawk 4.2.1)
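If the separator is meant to be a literal string rather than a regexp, another option is to skip regex escaping altogether and split with index() and substr(), which do plain string matching. A sketch of a hypothetical lsplit() function (it is not a built-in), with the input passed through ENVIRON so the backslash in File\( survives:

```shell
# literal_split is a hypothetical helper: $1 = string, $2 = literal separator.
literal_split() {
    S="$1" SEP="$2" awk '
    # lsplit: like split(), but sep is a plain string, not a regexp.
    # Note: arr is not cleared first, so pass a fresh array.
    function lsplit(s, arr, sep,    n, pos) {
        if (sep == "") { arr[1] = s; return 1 }
        n = 0
        while ((pos = index(s, sep)) > 0) {
            arr[++n] = substr(s, 1, pos - 1)
            s = substr(s, pos + length(sep))
        }
        arr[++n] = s
        return n
    }
    BEGIN {
        n = lsplit(ENVIRON["S"], parts, ENVIRON["SEP"])
        for (i = 1; i <= n; i++)
            print parts[i]
    }'
}

literal_split 'new[[:blank:]]+File\(' '[[:blank:]]'
```

This prints "new" and "+File\(" on two lines, i.e. the two elements the asker expected.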
linux csv file concatenate columns into one column
I've been looking to do this with sed, awk, or cut. I am willing to use any other command-line program that I can pipe data through.
I have a large set of data that is comma delimited. The rows have between 14 and 20 columns. I need to recursively concatenate column 10 with column 11 per row such that every row has exactly 14 columns. In other words, this:
a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p
will become:
a,b,c,d,e,f,g,h,i,jkl,m,n,o,p
I can get the first 10 columns. I can get the last N columns. I can concatenate columns. I cannot think of how to do it in one line so I can pass a stream of endless data through it and end up with exactly 14 columns per row.
Examples (by request):
How many columns are in the row?
sed 's/[^,]//g' | wc -c
Get the first 10 columns:
cut -d, -f1-10
Get the last 4 columns:
rev | cut -d, -f1-4 | rev
Concatenate columns 10 and 11, showing columns 1-10 after that:
awk -F',' ' NF { print $1","$2","$3","$4","$5","$6","$7","$8","$9","$10$11}'
Awk solution:
awk 'BEGIN{ FS=OFS="," }
{
    diff = NF - 14;
    for (i=1; i <= NF; i++)
        printf "%s%s", $i, (diff > 0 && i >= 10 && i < (10+diff)? "": (i == NF? ORS : ","))
}' file
The output:
a,b,c,d,e,f,g,h,i,jkl,m,n,o,p
With GNU awk for the 3rd arg to match() and gensub():
$ cat tst.awk
BEGIN{ FS="," }
match($0,"(([^,]+,){9})(([^,]+,){"NF-14"})(.*)",a) {
    $0 = a[1] gensub(/,/,"","g",a[3]) a[5]
}
{ print }
$ awk -f tst.awk file
a,b,c,d,e,f,g,h,i,jkl,m,n,o,p
If perl is okay, it can be used just like awk for stream processing:
$ cat ip.txt
a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p
1,2,3,4,5,6,3,4,2,4,3,4,3,2,5,2,3,4
1,2,3,4,5,6,3,4,2,4,a,s,f,e,3,4,3,2,5,2,3,4
$ awk -F, '{print NF}' ip.txt
16
18
22
$ perl -F, -lane '$n = $#F - 4; print join ",", (@F[0..8], join("", @F[9..$n]), @F[$n+1..$#F])' ip.txt
a,b,c,d,e,f,g,h,i,jkl,m,n,o,p
1,2,3,4,5,6,3,4,2,43432,5,2,3,4
1,2,3,4,5,6,3,4,2,4asfe3432,5,2,3,4
-F, -lane: split on , and save the results in the @F array
$n = $#F - 4: magic number, to ensure the output ends with 14 columns. $#F gives the index of the last element of the array (won't work if the input line has fewer than 14 columns)
join: stitches array elements together with the specified string
@F[0..8]: array slice with the first 9 elements
@F[9..$n] and @F[$n+1..$#F]: the other slices as needed
Borrowing from Ed Morton's regex-based solution:
$ perl -F, -lape '$n=$#F-13; s/^([^,]*,){9}\K([^,]*,){$n}/$&=~tr|,||dr/e' ip.txt
a,b,c,d,e,f,g,h,i,jkl,m,n,o,p
1,2,3,4,5,6,3,4,2,43432,5,2,3,4
1,2,3,4,5,6,3,4,2,4asfe3432,5,2,3,4
$n=$#F-13: magic number
^([^,]*,){9}\K: first 9 fields
([^,]*,){$n}: fields to change
$&=~tr|,||dr: use tr to delete the commas
e: this modifier allows the use of Perl code in the replacement section
This solution also has the added advantage of working even if the input has fewer than 14 fields.
You can try this with GNU sed:
sed -E '
s/,/\n/9g
:A
s/([^\n]*\n)(.*)(\n)(([^\n]*\n){4})/\1\2\4/
tA
s/\n/,/g
' infile
First variant, with awk:
awk -F, '
{
    for(i = 1; i <= NF; i++) {
        OFS = (i > 9 && i < NF - 4) ? "" : ","
        if(i == NF)
            OFS = "\n"
        printf "%s%s", $i, OFS
    }
}' input.txt
Second variant, with sed:
sed -r 's/,/#/10g; :l; s/#(.*)((#[^#]){4})/\1\2/; tl; s/#/,/g' input.txt
or, more straightforwardly (without a loop) and probably faster:
sed -r 's/,(.),(.),(.),(.)$/#\1#\2#\3#\4/; s/,//10g; s/#/,/g' input.txt
Note that both sed variants assume the last four fields are single characters, as in the test data below.
Testing
Input
a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p
a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r
a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u
Output
a,b,c,d,e,f,g,h,i,jkl,m,n,o,p
a,b,c,d,e,f,g,h,i,jklmn,o,p,q,r
a,b,c,d,e,f,g,h,i,jklmnopq,r,s,t,u
Solved a similar problem using csvtool. Source file, copied from one of the other answers:
$ cat input.txt
a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p
1,2,3,4,5,6,3,4,2,4,3,4,3,2,5,2,3,4
1,2,3,4,5,6,3,4,2,4,a,s,f,e,3,4,3,2,5,2,3,4
Concatenating columns:
$ cat input.txt | csvtool format '%1,%2,%3,%4,%5,%6,%7,%8,%9,%10%11%12,%13,%14,%15,%16,%17,%18,%19,%20,%21,%22\n' -
a,b,c,d,e,f,g,h,i,jkl,m,n,o,p,,,,,,
1,2,3,4,5,6,3,4,2,434,3,2,5,2,3,4,,,,
1,2,3,4,5,6,3,4,2,4as,f,e,3,4,3,2,5,2,3,4
Note that the format string is fixed: it always merges columns 10 to 12 and pads shorter rows with empty fields, so the rows do not end up with exactly 14 columns.
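For comparison, a sketch in plain awk (any POSIX awk) that rebuilds each line as a string: every column from 11 up to column 10 + (NF - 14) is appended to column 10 without a comma, so rows with 14 or fewer columns pass through unchanged. The helper name is illustrative, and like the other answers it assumes there are no quoted commas inside fields:

```shell
# merge_to_14 is a hypothetical helper reading comma-separated lines on stdin.
merge_to_14() {
    awk -F, '{
        extra = NF - 14              # surplus columns to fold into column 10
        if (extra < 0) extra = 0
        out = $1
        # columns 11 .. 10+extra are appended to column 10 without a comma
        for (i = 2; i <= NF; i++)
            out = out ((i > 10 && i <= 10 + extra) ? "" : ",") $i
        print out
    }'
}

printf 'a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p\n' | merge_to_14
```

For the sample row this prints a,b,c,d,e,f,g,h,i,jkl,m,n,o,p, and an 18-column row a ... r becomes a,b,c,d,e,f,g,h,i,jklmn,o,p,q,r.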
awk - concatenate two string variable and assign to a third
In awk, I have 2 fields: $1 and $2. They are both strings that I want to concatenate and assign to a variable.
Just use var = var1 var2 and it will automatically concatenate the vars var1 and var2:
awk '{new_var=$1$2; print new_var}' file
You can put a space in between with:
awk '{new_var=$1" "$2; print new_var}' file
which, as long as FS has its default value, is the same as using FS, because it defaults to a space:
awk '{new_var=$1 FS $2; print new_var}' file
Test
$ cat file
hello how are you
i am fine
$ awk '{new_var=$1$2; print new_var}' file
hellohow
iam
$ awk '{new_var=$1 FS $2; print new_var}' file
hello how
i am
You can play around with it in ideone: http://ideone.com/4u2Aip
Could use sprintf to accomplish this:
awk '{str = sprintf("%s %s", $1, $2)} END {print str}' file
(As written, the END block prints only the string built from the last line.)
You can also concatenate strings from across multiple lines with whitespace.
$ cat file.txt
apple 10
oranges 22
grapes 7
Example 1:
awk '{aggr=aggr " " $2} END {print aggr}' file.txt
 10 22 7
Example 2:
awk '{aggr=aggr ", " $1 ":" $2} END {print aggr}' file.txt
, apple:10, oranges:22, grapes:7
Because aggr starts out empty, each result begins with the separator (a leading space in the first example).
Concatenating strings in awk can be accomplished with the print statement (see the AWK manual page), and you can build complicated combinations. Here I was trying to change the 16th character to A, using string concatenation:
echo CTCTCTGAAATCACTGAGCAGGAGAAAGATT | awk -v w=15 -v BA=A '{OFS=""; print substr($0, 1, w), BA, substr($0,w+2)}'
Output:
CTCTCTGAAATCACTAAGCAGGAGAAAGATT
I used the substr function to extract a portion of the input (STDIN). I passed some external parameters (here I am using hard-coded values) that are usually shell variables. In the context of shell programming, you can write -v w="$width" -v BA="$my_charval". The key is OFS, which stands for Output Field Separator in awk. The print statement takes a list of values, writes them to STDOUT and glues them together with OFS. This is analogous to the perl join function.
In awk, strings can also be concatenated simply by placing them next to each other:
echo xxx | awk -v a="aaa" -v b="bbb" '{ print a b $1 "string literal"}'
# will produce: aaabbbxxxstring literal
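Because placing values side by side is all concatenation takes, awk has no built-in equivalent of perl's join; it is usually written as a small user-defined function. A sketch (join is a hypothetical name, not part of awk) that splits a |-delimited line and glues the pieces back together with a different separator:

```shell
# join_with_dash is a hypothetical helper; join() is user-defined.
join_with_dash() {
    awk '
    # join: concatenate arr[1..n], putting sep between elements
    function join(arr, n, sep,    i, s) {
        s = arr[1]
        for (i = 2; i <= n; i++)
            s = s sep arr[i]
        return s
    }
    {
        n = split($0, parts, "|")
        print join(parts, n, "-")
    }'
}

printf '12|23|11\n' | join_with_dash
```

This prints 12-23-11.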
How to use awk sort by column 3
I have a file (user.csv) like this:
ip,hostname,user,group,encryption,aduser,adattr
I want to print all columns sorted by user. I tried
awk -F ":" '{print|"$3 sort -n"}' user.csv
but it doesn't work.
How about just sort:
sort -t, -nk3 user.csv
where
-t, - defines your delimiter as ,.
-n - gives you a numerical sort. I added it since you used it in your attempt. If your user field is text only, you don't need it.
-k3 - defines the field (key); user is the third field. Strictly, -k3 means "from field 3 to the end of the line"; use -k3,3 to compare field 3 only.
Use awk to put the user ID in front.
Sort.
Use sed to remove the duplicate user ID, assuming user IDs do not contain any spaces.
awk -F, '{ print $3, $0 }' user.csv | sort | sed 's/^[^ ]* //'
Seeing as the original question was about how to use awk, every single one of the first 7 answers uses sort instead, and this is the top hit on Google, here is how to do it in awk.
Sample net.csv file with headers:
ip,hostname,user,group,encryption,aduser,adattr
192.168.0.1,gw,router,router,-,-,-
192.168.0.2,server,admin,admin,-,-,-
192.168.0.3,ws-03,user,user,-,-,-
192.168.0.4,ws-04,user,user,-,-,-
And sort.awk:
#!/usr/bin/awk -f
# usage: ./sort.awk -v f=FIELD FILE
BEGIN { FS="," }
# each line
{
    a[NR]=$0 ""
    s[NR]=$f ""
}
END {
    isort(s,a,NR);
    for(i=1; i<=NR; i++) print a[i]
}
#insertion sort of A[1..n]
function isort(S, A, n,    i, j, hs, ha) {
    for( i=2; i<=n; i++) {
        hs = S[j=i]
        ha = A[j=i]
        while (S[j-1] > hs) {
            j--;
            S[j+1] = S[j]
            A[j+1] = A[j]
        }
        S[j] = hs
        A[j] = ha
    }
}
To use it:
awk -f sort.awk f=3 < net.csv
# OR
chmod +x sort.awk
./sort.awk f=3 net.csv
Note that the header line is sorted along with the data rows.
You can choose a delimiter. In this case I chose a colon and printed column one, sorted alphabetically (sort -u also drops duplicate lines):
awk -F\: '{print $1|"sort -u"}' /etc/passwd
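awk -F, '{ print $3, $0 }' user.csv | sort -k1,1
and for reverse order
awk -F, '{ print $3, $0 }' user.csv | sort -rk1,1
The prepended user field is the first blank-separated key, so sort on key 1; it stays at the front of each output line.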
Try this:
awk '{print $0|"sort -t, -nk3"}' user.csv
or
sort -t',' -nk3 user.csv
awk -F "," '{print $0}' user.csv | sort -nk3 -t ','
This should work.
To exclude the first line (header) from sorting, I split it out into two buffers:
df | awk 'BEGIN{header=""; body=""} { if(NR==1){header=$0}else{body=(body=="" ? $0 : body"\n"$0)}} END{print header; print body|"sort -nk3"}'
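A shorter way to keep the header on top is to print it directly and pipe only the remaining lines to sort. Because awk buffers its own output when it is not writing to a terminal, the header should be flushed before sort starts writing; fflush() is available in gawk, mawk and BWK awk. A sketch sorting CSV text on column 3 (the helper name is illustrative):

```shell
# sort_keep_header is a hypothetical helper reading CSV on stdin.
sort_keep_header() {
    awk '
        NR == 1 { print; fflush(); next }   # header goes straight to stdout
        { print | "sort -t, -k3,3" }        # data lines are sorted on field 3
    '
}

printf 'ip,hostname,user\n192.168.0.1,gw,router\n192.168.0.2,server,admin\n' | sort_keep_header
```

This prints the header line first, then the admin row, then the router row.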
With GNU awk:
awk -F ',' '{ a[$3]=$0 } END{ PROCINFO["sorted_in"]="@ind_str_asc"; for(i in a) print a[i] }' file
Note that a[$3]=$0 keeps only the last line for each distinct value of column 3.
See 8.1.6 Using Predefined Array Scanning Orders with gawk for more sorting algorithms.
I'm running Linux (Ubuntu) with mawk:
tmp$ awk -W version
mawk 1.3.4 20200120
Copyright 2008-2019,2020, Thomas E. Dickey
Copyright 1991-1996,2014, Michael D. Brennan
random-funcs: srandom/random
regex-funcs: internal
compiled limits:
sprintf buffer 8192
maximum-integer 2147483647
mawk (and gawk) can redirect the output of print to a command. From man awk, chapter 9, Input and output:
The output of print and printf can be redirected to a file or command by appending > file, >> file or | command to the end of the print statement. Redirection opens file or command only once, subsequent redirections append to the already open stream.
Below you'll find a simplified example of how | can be used to pass the wanted records to an external program that does the hard work. This also nicely encapsulates everything in a single awk file and reduces the command-line clutter:
tmp$ cat input.csv
alpha,num
D,4
B,2
A,1
E,5
F,10
C,3
tmp$ cat sort.awk
# print header line
/^alpha,num/ { print }
# all other lines are data lines that should be sorted
!/^alpha,num/ { print | "sort --field-separator=, --key=2 --numeric-sort" }
tmp$ awk -f sort.awk input.csv
alpha,num
A,1
B,2
C,3
D,4
E,5
F,10
See man sort for the details of the sort options:
-t, --field-separator=SEP
use SEP instead of non-blank to blank transition
-k, --key=KEYDEF
sort via a key; KEYDEF gives location and type
-n, --numeric-sort
compare according to string numerical value