AWK: how to count patterns in the first column?

I was trying to get the total number of "??", " M", "A" and "D" from this:
?? this is a sentence
M this is another one
A more text here
D more and more text
I have this sample line of code, but it doesn't work:
awk -v pattern="\?\?" '{$1 == pattern} END{print " "FNR}'

$ awk '{ print $1 }' file | sort | uniq -c
1 ??
1 A
1 D
1 M
If for some reason you want an awk-only solution:
awk '{ ++cnt[$1] } END { for (i in cnt) print cnt[i], i }' file
but I think that's needlessly complicated compared to using the built-in unix tools that already do most of the work.
If you just want to count one particular value:
awk -v value='??' '$1 == value' file | wc -l
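If you need all four codes from the question reported in a fixed order, including a 0 for any code that never appears (which uniq -c cannot show), one awk pass can do it. This is a minimal sketch that assumes the code list is exactly the four values from the question; count_codes is a hypothetical helper name.

```shell
# count_codes: hypothetical helper; reads the named files (or stdin)
# and prints "<count> <code>" for each code in the fixed list below.
count_codes() {
  awk '
    BEGIN { n = split("?? M A D", codes, " ") }   # codes to report, in order
    { cnt[$1]++ }                                 # tally every first field
    END {
      for (i = 1; i <= n; i++)
        print cnt[codes[i]] + 0, codes[i]         # +0 turns "never seen" into 0
    }' "$@"
}

printf '%s\n' '?? this is a sentence' 'M this is another one' \
  'A more text here' 'D more and more text' | count_codes
```

The order of the output follows the list in BEGIN rather than the order in which awk happens to iterate the cnt array.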
If you want to count only a subset of values, you can use a regex:
$ awk -v pattern='A|D|(\\?\\?)' '$1 ~ pattern { print $1 }' file | sort | uniq -c
1 ??
1 A
1 D
Here you do need to send a \ so that the ?s are escaped within the regular expression. And because \ is itself a special character in the string passed with -v (awk processes escape sequences in -v values), you need to escape it first, hence the double backslash.
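Two refinements may be worth considering; they are suggestions, not part of the answer above. A bracket expression such as [?] matches a literal ? without any backslashes, and anchoring the alternation with ^(...)$ stops first fields like AB or DA from matching merely because they contain an A or a D. The helper name count_subset is hypothetical.

```shell
# count_subset: hypothetical helper wrapping the pipeline above.
# [?] is a bracket expression matching a literal "?", so no backslashes are
# needed, and ^( ... )$ makes the whole first field match one alternative.
count_subset() {
  awk -v pattern='^(A|D|[?][?])$' '$1 ~ pattern { print $1 }' "$@" | sort | uniq -c
}

# "AB" and "DA" are not counted, although they contain an A or a D.
printf '%s\n' '?? x' 'AB y' 'A z' 'D w' 'DA v' | count_subset
```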

Related

awk conditional statement based on a value between colon

I was just introduced to awk and I'm trying to retrieve rows from my file based on the value on column 10.
I need to filter the data based on the third value obtained when column 10 (the last column) is split on ":".
Here is example data from column 10: 0/1:1,9:10:15:337,0,15
I was able to extract the third value using this command: awk '{print $10}' file.txt | awk -F ":" '/1/ {print $3}'
This returns the value 10 but how can I return other rows (not just the value in column 10) if this third value is less than or greater than a specific number?
I tried this awk '{if($10 -F ":" "/1/ ($3<10))" print $0;}' file.txt but it returns a syntax error.
Thanks!
Your code:
awk '{print $10}' file.txt | awk -F ":" '/1/ {print $3}'
should be just 1 awk script:
awk '$10 ~ /1/ { split($10,f,/:/); print f[3] }' file.txt
but I'm not sure that code is doing what you think it does. If you want to print the 3rd value of all $10s that contain :s, as it sounds like from your text, that'd be:
awk 'split($10,f,/:/) > 1 { print f[3] }' file.txt
and to print the rows where that value is less than 7 would be:
awk '(split($10,f,/:/) > 1) && (f[3] < 7)' file.txt
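To avoid hard-coding the limit, it can be passed in with -v. A minimal sketch, wrapped in a hypothetical shell function filter_by_col10 that takes the limit as its first argument; the sample rows below are made up to have 10 whitespace-separated columns.

```shell
# filter_by_col10: hypothetical helper. Prints every whole row whose 10th
# column, split on ":", has a 3rd sub-field less than the limit given as $1.
filter_by_col10() {
  max=$1; shift
  # "+ 0" makes both sides of the comparison explicitly numeric.
  awk -v max="$max" '(split($10, f, /:/) > 1) && (f[3] + 0 < max + 0)' "$@"
}

printf '%s\n' \
  'c1 c2 c3 c4 c5 c6 c7 c8 c9 0/1:1,9:10:15:337,0,15' \
  'd1 d2 d3 d4 d5 d6 d7 d8 d9 0/1:2,3:5:15:1,0,15' |
  filter_by_col10 7
```

Swapping < for > gives the "greater than" variant the question also asks about.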

linux csv file concatenate columns into one column

I've been looking to do this with sed, awk, or cut. I am willing to use any other command-line program that I can pipe data through.
I have a large set of data that is comma delimited. The rows have between 14 and 20 columns. I need to recursively concatenate column 10 with column 11 per row such that every row has exactly 14 columns. In other words, this:
a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p
will become:
a,b,c,d,e,f,g,h,i,jkl,m,n,o,p
I can get the first 10 columns. I can get the last N columns. I can concatenate columns. I cannot think of how to do it in one line so I can pass a stream of endless data through it and end up with exactly 14 columns per row.
Examples (by request):
How many columns are in the row?
sed 's/[^,]//g' | wc -c
Get the first 10 columns:
cut -d, -f1-10
Get the last 4 columns:
rev | cut -d, -f1-4 | rev
Concatenate columns 10 and 11, showing columns 1-10 after that:
awk -F',' ' NF { print $1","$2","$3","$4","$5","$6","$7","$8","$9","$10$11}'
Awk solution:
awk 'BEGIN{ FS=OFS="," }
{
diff = NF - 14
# no separator after columns 10 .. 9+diff, so they merge with the next column
for (i=1; i <= NF; i++)
printf "%s%s", $i, (i >= 10 && i < 10+diff ? "" : (i == NF ? ORS : ","))
}' file
The output:
a,b,c,d,e,f,g,h,i,jkl,m,n,o,p
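The same idea can take the target column count as a parameter and leave rows that are already short enough untouched. A sketch under these assumptions: merge_to_n is a hypothetical name, the merge always starts at column 10 as in the question, and the target is at least 10.

```shell
# merge_to_n: hypothetical helper. Folds surplus columns into column 10 so that
# every row ends up with exactly N columns (N given as $1, assumed >= 10);
# rows that already have N or fewer columns are printed unchanged.
merge_to_n() {
  n=$1; shift
  awk -v want="$n" 'BEGIN { FS = OFS = "," }
  NF > want {
    extra = NF - want                  # number of columns to glue onto column 10
    out = $1
    for (i = 2; i <= NF; i++) {
      # no separator before columns 11 .. 10+extra, so they join column 10
      sep = (i > 10 && i <= 10 + extra) ? "" : OFS
      out = out sep $i
    }
    $0 = out
  }
  { print }' "$@"
}

printf '%s\n' a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p | merge_to_n 14
```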
With GNU awk for the 3rd arg to match() and gensub():
$ cat tst.awk
BEGIN{ FS="," }
match($0,"(([^,]+,){9})(([^,]+,){"NF-14"})(.*)",a) {
$0 = a[1] gensub(/,/,"","g",a[3]) a[5]
}
{ print }
$ awk -f tst.awk file
a,b,c,d,e,f,g,h,i,jkl,m,n,o,p
If perl is okay, it can be used just like awk for stream processing:
$ cat ip.txt
a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p
1,2,3,4,5,6,3,4,2,4,3,4,3,2,5,2,3,4
1,2,3,4,5,6,3,4,2,4,a,s,f,e,3,4,3,2,5,2,3,4
$ awk -F, '{print NF}' ip.txt
16
18
22
$ perl -F, -lane '$n = $#F - 4;
print join ",", (@F[0..8], join("", @F[9..$n]), @F[$n+1..$#F])
' ip.txt
a,b,c,d,e,f,g,h,i,jkl,m,n,o,p
1,2,3,4,5,6,3,4,2,43432,5,2,3,4
1,2,3,4,5,6,3,4,2,4asfe3432,5,2,3,4
-F, -lane split on , with the results saved in the @F array
$n = $#F - 4 magic number, to ensure the output ends with 14 columns. $#F gives the index of the last element of the array (won't work if the input line has fewer than 14 columns)
join stitches array elements together with the specified string
@F[0..8] array slice with the first 9 elements
@F[9..$n] and @F[$n+1..$#F] the other slices as needed
Borrowing from Ed Morton's regex-based solution:
$ perl -F, -lape '$n=$#F-13; s/^([^,]*,){9}\K([^,]*,){$n}/$&=~tr|,||dr/e' ip.txt
a,b,c,d,e,f,g,h,i,jkl,m,n,o,p
1,2,3,4,5,6,3,4,2,43432,5,2,3,4
1,2,3,4,5,6,3,4,2,4asfe3432,5,2,3,4
$n=$#F-13 magic number
^([^,]*,){9}\K first 9 fields
([^,]*,){$n} fields to change
$&=~tr|,||dr use tr to delete the commas
e this modifier allows use of Perl code in replacement section
This solution also has the added advantage of working even if the input has fewer than 14 fields.
You can try this GNU sed:
sed -E '
s/,/\n/9g
:A
s/([^\n]*\n)(.*)(\n)(([^\n]*\n){4})/\1\2\4/
tA
s/\n/,/g
' infile
First variant - with awk
awk -F, '
{
for(i = 1; i <= NF; i++) {
OFS = (i > 9 && i < NF - 4) ? "" : ","
if(i == NF) OFS = "\n"
printf "%s%s", $i, OFS
}
}' input.txt
Second variant - with sed
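sed -r 's/,/#/10g; :l; s/#(.*)((#[^#]*){4})$/\1\2/; tl; s/#/,/g' input.txt
or, more straightforwardly (without a loop) and probably faster:
sed -r 's/,([^,]*),([^,]*),([^,]*),([^,]*)$/#\1#\2#\3#\4/; s/,//10g; s/#/,/g' input.txt
Both sed variants handle fields of any length, but they use # as a temporary marker, so they assume the data never contains #.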
Testing
Input
a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p
a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r
a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u
Output
a,b,c,d,e,f,g,h,i,jkl,m,n,o,p
a,b,c,d,e,f,g,h,i,jklmn,o,p,q,r
a,b,c,d,e,f,g,h,i,jklmnopq,r,s,t,u
Solved a similar problem using csvtool. Source file, copied from one of the other answers:
$ cat input.txt
a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p
1,2,3,4,5,6,3,4,2,4,3,4,3,2,5,2,3,4
1,2,3,4,5,6,3,4,2,4,a,s,f,e,3,4,3,2,5,2,3,4
Concatenating columns:
$ cat input.txt | csvtool format '%1,%2,%3,%4,%5,%6,%7,%8,%9,%10%11%12,%13,%14,%15,%16,%17,%18,%19,%20,%21,%22\n' -
a,b,c,d,e,f,g,h,i,jkl,m,n,o,p,,,,,,
1,2,3,4,5,6,3,4,2,434,3,2,5,2,3,4,,,,
1,2,3,4,5,6,3,4,2,4as,f,e,3,4,3,2,5,2,3,4

awk: print each column of a file into separate files

I have a file with 100 columns of data. I want to print the first column and the i-th column into 99 separate files. I am trying to use
for i in {2..99}; do awk '{print $1" " $i }' input.txt > data${i}; done
But I am getting errors
awk: illegal field $(), name "i"
input record number 1, file input.txt
source line number 1
How to correctly use $i inside the {print }?
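The following single awk command may help here:
awk -v start=2 -v end=99 '{for(i=start;i<=end;i++){f=("file" i); print $1,$i >> f; close(f)}}' Input_file
It appends with >> because close() makes the next print reopen the file, and reopening with > would truncate it; remove any old file* outputs before rerunning.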
An all-awk solution. First, test data:
$ cat foo
11 12 13
21 22 23
Then the awk:
$ awk '{for(i=2;i<=NF;i++) print $1,$i > ("data" i)}' foo
and results:
$ ls data*
data2 data3
$ cat data2
11 12
21 22
The for loop iterates from 2 to the last field. If there are more fields than you want to process, change NF to the number you'd like. If, for some reason, a hundred open files would be a problem on your system, you'd need to put the print into a block and add a close call, appending with >> so that reopening a file after close() does not truncate it:
$ awk '{for(i=2;i<=NF;i++){f=("data" i); print $1,$i >> f; close(f)}}' foo
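One caveat with the >> and close() approach: >> also appends to any data* files left over from an earlier run. A hedged sketch that truncates each output file the first time it is touched in the current run and then appends, still keeping only one file open at a time; split_columns and its output-directory argument are hypothetical.

```shell
# split_columns DIR [FILE...]: hypothetical helper. Writes "col1 colN" lines to
# DIR/dataN for every column N >= 2, with at most one output file open at a time.
split_columns() {
  dir=$1; shift
  awk -v dir="$dir" '{
    for (i = 2; i <= NF; i++) {
      f = dir "/data" i
      if (!(f in seen)) {      # first time this file is used in this run:
        printf "" > f          # create or truncate it
        close(f)
        seen[f] = 1
      }
      print $1, $i >> f        # then append, and close again right away
      close(f)
    }
  }' "$@"
}

tmp=$(mktemp -d)
printf '%s\n' '11 12 13' '21 22 23' | split_columns "$tmp"
cat "$tmp/data2"
```

Opening and closing a file for every line costs more system calls than keeping the files open, so this only pays off when the open-file limit is actually a problem.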
If you want to do it the way you were trying to:
for i in {2..99}; do
awk -v x=$i '{print $1" " $x }' input.txt > data${i}
done
Note
the -v switch of awk to pass variables
$x is the nth column defined in your variable x
Note 2: this is not the fastest solution (a single awk call is fastest); I am just trying to correct your logic. Ideally, take the time to understand awk; it's never wasted time.

awk shows "fatal: cannot open pipe (Too many open files)" error

I was trying to mask a file with tr and awk, but it fails with the error fatal: cannot open pipe (Too many open pipes). The file has approximately 1,000,000 records, quite a huge number.
Below is the code I am trying :-
awk - F "|" - v OFS="|" '{ "echo \""$1"\" | tr \" 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ\" \" QWERTYUIOPASDFGHJKLZXCVBNM9876543210mnbvcxzlkjhgfdsapoiuytrewq\"" | get line $1}1' FILE.CSV > test.CSV
It is showing error :-
awk: (FILENAME=- FNR=1019) fatal: cannot open pipe `echo ""TTP_123"" | tr "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ" "QWERTYUIOPASDFGHJKLZXCVBNM9876543210mnbvcxzlkjhgfdsapoiuytrewq"' (Too many open pipes)
Please let me know what I am doing wrong here
Also note: any number of columns could be masked, and they can be at any position. In this example I have taken columns 1 and 2, but it could be columns 3 and 10, or 5, 7 and 25.
Thanks
AJ
First things first, you can't have a space between - and F or v.
I was going to suggest sed, but as you only want to translate the first column, that's not as easy.
Unfortunately, awk doesn't have built-in tr functionality, so you'd have to use the shell like you are and just close the pipe:
awk -F "|" -v OFS="|" '{
command = "echo \"" $1 "\" | tr \" 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ\" \" QWERTYUIOPASDFGHJKLZXCVBNM9876543210mnbvcxzlkjhgfdsapoiuytrewq\""
command | getline $1
close(command)   # close each pipe so they do not accumulate
}1' FILE.CSV > test.CSV
However, I suggest using perl, which can do field splitting and character translation:
perl -F'\|' -lane '$F[0] =~ tr/0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ/QWERTYUIOPASDFGHJKLZXCVBNM9876543210mnbvcxzlkjhgfdsapoiuytrewq/; print join("|", @F)' FILE.CSV > test.CSV
Or, for a shorter command line, just put the program into a file, drop the e in -lane and use the file name instead of the '...' command.
You can do the mapping in awk instead of making a system call for each line, or perhaps simply:
paste -d'|' <(cut -d'|' -f1 file | tr '0-9' 'a-z') <(cut -d'|' -f2- file)
replace the tr arguments with yours.
This does not answer your question, but you can implement tr as an awk function that would save having to spawn lots of external processes
$ cat tr.awk
function tr(str, from, to, s,i,c,idx) {
s = ""
for (i=1; i<=length(str); i++) {
c = substr(str, i, 1)
idx = index(from, c)
s = s (idx == 0 ? c : substr(to, idx, 1))
}
return s
}
{
print $1, tr($1,
" 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ",
" QWERTYUIOPASDFGHJKLZXCVBNM9876543210mnbvcxzlkjhgfdsapoiuytrewq")
}
Example:
$ printf "%s\n" hello wor-ld | awk -f tr.awk
hello KGCCN
wor-ld 3N8-CF
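The question also asks for masking arbitrary columns (1 and 2, or 3 and 10, or 5, 7 and 25). A sketch that combines the tr function above with a hypothetical -v cols list, using | as the field separator as in the question. The mapping strings are the question's without the leading space, which only mapped space to itself.

```shell
# mask_columns COLS [FILE...]: hypothetical helper. COLS is a comma-separated
# list such as "1,2" or "5,7,25"; those "|"-separated fields are passed
# through an awk re-implementation of tr, so no external process is spawned.
mask_columns() {
  cols=$1; shift
  awk -v cols="$cols" '
    function tr(str, from, to,    s, i, c, idx) {
      s = ""
      for (i = 1; i <= length(str); i++) {
        c = substr(str, i, 1)
        idx = index(from, c)                    # position of c in "from", 0 if absent
        s = s (idx == 0 ? c : substr(to, idx, 1))
      }
      return s
    }
    BEGIN {
      FS = OFS = "|"
      n = split(cols, mask, ",")                # field numbers to mask
      from = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
      to   = "QWERTYUIOPASDFGHJKLZXCVBNM9876543210mnbvcxzlkjhgfdsapoiuytrewq"
    }
    {
      for (k = 1; k <= n; k++)
        $(mask[k]) = tr($(mask[k]), from, to)   # assigning a field rebuilds $0 with OFS
      print
    }' "$@"
}

printf '%s\n' 'hello|wor-ld|keep' | mask_columns 1,2
```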

How to use awk to sort by column 3

I have a file (user.csv) like this:
ip,hostname,user,group,encryption,aduser,adattr
I want to print all columns, sorted by user.
I tried awk -F ":" '{print|"$3 sort -n"}' user.csv, but it doesn't work.
How about just sort?
sort -t, -nk3 user.csv
where
-t, - defines your delimiter as ,.
-n - gives you a numerical sort. Added since you used it in your attempt. If your user field is text only, you don't need it.
-k3 - defines the field (key); user is the third field. Strictly, -k3 makes the key run from field 3 to the end of the line; -k3,3 limits it to the user field alone.
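Building on those options, a sketch that sorts on the user field alone and keeps the header line on top, using only head, tail and sort; sort_keep_header is a hypothetical name and the sample rows are taken from the net.csv example further down.

```shell
# sort_keep_header FILE: hypothetical helper. Prints the header line first,
# then the remaining rows sorted by the 3rd comma-separated field only.
sort_keep_header() {
  head -n 1 "$1"
  tail -n +2 "$1" | sort -t, -k3,3
}

csv=$(mktemp)
printf '%s\n' 'ip,hostname,user,group,encryption,aduser,adattr' \
  '192.168.0.1,gw,router,router,-,-,-' \
  '192.168.0.2,server,admin,admin,-,-,-' > "$csv"
sort_keep_header "$csv"
```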
Use awk to put the user ID in front.
Sort
Use sed to remove the duplicate user ID, assuming user IDs do not contain any spaces.
awk -F, '{ print $3, $0 }' user.csv | sort | sed 's/^[^ ]* //'
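If the user field (or the rest of the row) may contain spaces, a tab can serve as the separator between the added key and the original line instead. A sketch assuming the data itself contains no tabs; sort_by_user is a hypothetical name.

```shell
# sort_by_user FILE: hypothetical helper. Decorate each row with its 3rd field
# and a tab, sort on that first tab-separated key, then cut the key off again.
sort_by_user() {
  tab=$(printf '\t')
  awk -F, -v OFS='\t' '{ print $3, $0 }' "$1" | sort -t "$tab" -k1,1 | cut -f2-
}

csv=$(mktemp)
printf '%s\n' '10.0.0.1,gw one,zed,g' '10.0.0.2,srv two,amy,g' > "$csv"
sort_by_user "$csv"
```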
Seeing as the original question was about how to use awk, every one of the first 7 answers uses sort instead, and this is the top hit on Google, here is how to do it with awk.
Sample net.csv file with headers:
ip,hostname,user,group,encryption,aduser,adattr
192.168.0.1,gw,router,router,-,-,-
192.168.0.2,server,admin,admin,-,-,-
192.168.0.3,ws-03,user,user,-,-,-
192.168.0.4,ws-04,user,user,-,-,-
And sort.awk:
#!/usr/bin/awk -f
# usage: ./sort.awk -v f=FIELD FILE
BEGIN {
FS=","
}
# each line
{
a[NR]=$0 ""
s[NR]=$f ""
}
END {
isort(s,a,NR);
for(i=1; i<=NR; i++) print a[i]
}
# insertion sort of A[1..n], keyed on S[1..n]
function isort(S, A, n,    i, j, hs, ha) {
for( i=2; i<=n; i++) {
hs = S[j=i]
ha = A[j=i]
while (S[j-1] > hs) {
j--;
S[j+1] = S[j]
A[j+1] = A[j]
}
S[j] = hs
A[j] = ha
}
}
To use it:
awk -f sort.awk f=3 < net.csv # OR
chmod +x sort.awk
./sort.awk f=3 net.csv
You can choose a delimiter. In this case I chose a colon and printed the first column, sorted alphabetically (sort -u also removes duplicates):
awk -F\: '{print $1|"sort -u"}' /etc/passwd
awk -F, '{ print $3, $0 }' user.csv | sort -nk1,1
and for reverse order
awk -F, '{ print $3, $0 }' user.csv | sort -nrk1,1
Try this:
awk '{print $0|"sort -t',' -nk3 "}' user.csv
OR
sort -t',' -nk3 user.csv
awk -F "," '{print $0}' user.csv | sort -nk3 -t ','
This should work
To exclude the first line (header) from sorting, I split it out into two buffers.
df | awk 'NR==1{header=$0; next} {body = body $0 "\n"} END{print header; printf "%s", body | "sort -nk3"}'
With GNU awk:
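awk -F ',' '{ a[$3]=$0 } END{ PROCINFO["sorted_in"]="@ind_str_asc"; for(i in a) print a[i] }' file
See 8.1.6 Using Predefined Array Scanning Orders with gawk for more scanning orders. Note that a[$3]=$0 keeps only the last line for each value of $3, so rows that share a user are dropped.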
I'm running Linux (Ubuntu) with mawk:
tmp$ awk -W version
mawk 1.3.4 20200120
Copyright 2008-2019,2020, Thomas E. Dickey
Copyright 1991-1996,2014, Michael D. Brennan
random-funcs: srandom/random
regex-funcs: internal
compiled limits:
sprintf buffer 8192
maximum-integer 2147483647
mawk (and gawk) can redirect the output of print to a command. From man awk, chapter 9, Input and output:
The output of print and printf can be redirected to a file or command by appending > file, >> file or | command to the end of the print statement. Redirection opens file or command only once, subsequent redirections append to the already open stream.
Below you'll find a simplified example of how | can be used to pass the wanted records to an external program that does the hard work. This also nicely encapsulates everything in a single awk file and reduces command-line clutter:
tmp$ cat input.csv
alpha,num
D,4
B,2
A,1
E,5
F,10
C,3
tmp$ cat sort.awk
# print header line
/^alpha,num/ {
print
}
# all other lines are data lines that should be sorted
!/^alpha,num/ {
print | "sort --field-separator=, --key=2 --numeric-sort"
}
tmp$ awk -f sort.awk input.csv
alpha,num
A,1
B,2
C,3
D,4
E,5
F,10
See man sort for the details of the sort options:
-t, --field-separator=SEP
use SEP instead of non-blank to blank transition
-k, --key=KEYDEF
sort via a key; KEYDEF gives location and type
-n, --numeric-sort
compare according to string numerical value