I am looking for a way of counting the number of times a value in a field appears across a whole column of a csv file, much the same as COUNTIF in Excel, although I would like to use an awk command if possible.
So column 6 holds the range of values, and column 7 should show the number of times each value appears in column 6, as per below:
>awk -F, '{print $0}' file3
f1,f2,f3,f4,f5,test
row1_1,row1_2,row1_3,SBCDE,row1_5,SBCD
row2_1,row2_2,row2_3,AWERF,row2_5,AWER
row3_1,row3_2,row3_3,ASDFG,row3_5,ASDF
row4_1,row4_2,row4_3,PRE-ASDQG,row4_5,ASDQ
row4_1,row4_2,row4_3,PRE-ASDQF,row4_5,ASDQ
>awk -F, '{print $6}' file3
test
SBCD
AWER
ASDF
ASDQ
ASDQ
What I want is:
f1,f2,f3,f4,f5,test,count
row1_1,row1_2,row1_3,SBCDE,row1_5,SBCD,1
row2_1,row2_2,row2_3,AWERF,row2_5,AWER,1
row3_1,row3_2,row3_3,ASDFG,row3_5,ASDF,1
row4_1,row4_2,row4_3,PRE-ASDQG,row4_5,ASDQ,2
row4_1,row4_2,row4_3,PRE-ASDQF,row4_5,ASDQ,2
# adds the field name "count" that I want:
awk -F, -v OFS=, 'NR==1{ print $0, "count"}
NR>1{ print $0}' file3
How do I get the output I want?
I have tried this, from a previous/similar question, but no joy:
>awk -F, 'NR>1{c[$6]++;l[NR>1]=$0}END{for(i=0;i++<NR;){split(l[i],s,",");print l[i]","c[s[1]]}}' file3
row4_1,row4_2,row4_3,PRE-ASDQF,row4_5,ASDQ,
,
,
,
,
,
Very similar question to this one.
A similar Python-related Q, for my reference.
I would harness GNU AWK for this task in the following way. Let file.txt content be
f1,f2,f3,f4,f5,test
row1_1,row1_2,row1_3,SBCDE,row1_5,SBCD
row2_1,row2_2,row2_3,AWERF,row2_5,AWER
row3_1,row3_2,row3_3,ASDFG,row3_5,ASDF
row4_1,row4_2,row4_3,PRE-ASDQG,row4_5,ASDQ
row4_1,row4_2,row4_3,PRE-ASDQF,row4_5,ASDQ
then
awk 'BEGIN{FS=OFS=","}NR==1{print $0,"count";next}FNR==NR{arr[$6]+=1;next}FNR>1{print $0,arr[$6]}' file.txt file.txt
gives output
f1,f2,f3,f4,f5,test,count
row1_1,row1_2,row1_3,SBCDE,row1_5,SBCD,1
row2_1,row2_2,row2_3,AWERF,row2_5,AWER,1
row3_1,row3_2,row3_3,ASDFG,row3_5,ASDF,1
row4_1,row4_2,row4_3,PRE-ASDQG,row4_5,ASDQ,2
row4_1,row4_2,row4_3,PRE-ASDQF,row4_5,ASDQ,2
Explanation: this is a two-pass approach, hence file.txt appears twice. I inform GNU AWK that , is both the field separator (FS) and the output field separator (OFS). For the first line (the header) I print it followed by count and instruct GNU AWK to go to the next line, so nothing else is done with the header. During the first pass, i.e. where the global line number (NR) equals the line number within the current file (FNR), I count the occurrences of the values in the 6th field, storing the tallies in the array arr, then go to the next line, so nothing else is done in this pass. During the second pass, for every line after the 1st (FNR>1), I print the whole line ($0) followed by the corresponding count from arr.
(tested in GNU Awk 5.0.1)
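For readability, the same two-pass logic can be laid out as a commented script (a sketch of the identical program; note the file name is still passed twice):
awk '
BEGIN     { FS = OFS = "," }           # comma in, comma out
NR == 1   { print $0, "count"; next }  # header: emit once with the new column
FNR == NR { arr[$6] += 1; next }       # first pass: tally 6th-field values
FNR > 1   { print $0, arr[$6] }        # second pass: append each line's count
' file.txt file.txt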
You did not copy the code from the linked question properly. Why change l[NR] to l[NR>1] at all? On the other hand, you should change s[1] to s[6] since it's the sixth field that has the key you're counting:
awk -F, 'NR>1{c[$6]++;l[NR]=$0}END{for(i=1;i++<NR;){split(l[i],s,",");print l[i]","c[s[6]]}}'
You can also output the header with the new field name:
awk -F, -v OFS=, 'NR==1{print $0,"count"}NR>1{c[$6]++;l[NR]=$0}END{for(i=1;i++<NR;){split(l[i],s,",");print l[i],c[s[6]]}}'
One awk idea:
awk '
BEGIN { FS=OFS="," } # define input/output field delimiters as comma
{ lines[NR]=$0
if (NR==1) next
col6[NR]=$6 # copy field 6 so we do not have to parse the contents of lines[] in the END block
cnt[$6]++
}
END { for (i=1;i<=NR;i++)
print lines[i], (i==1 ? "count" : cnt[col6[i]] )
}
' file3
This generates:
f1,f2,f3,f4,f5,test,count
row1_1,row1_2,row1_3,SBCDE,row1_5,SBCD,1
row2_1,row2_2,row2_3,AWERF,row2_5,AWER,1
row3_1,row3_2,row3_3,ASDFG,row3_5,ASDF,1
row4_1,row4_2,row4_3,PRE-ASDQG,row4_5,ASDQ,2
row4_1,row4_2,row4_3,PRE-ASDQF,row4_5,ASDQ,2
I need the 7th column of a csv file to be converted from a zero-padded float to a plain integer. It's a huge file and I don't want to use a while read loop for the conversion. Any shortcuts with awk?
Input:
"xx","x","xxxxxx","xxx","xx","xx"," 00000001.0000"
"xx","x","xxxxxx","xxx","xx","xx"," 00000002.0000"
"xx","x","xxxxxx","xxx","xx","xx"," 00000005.0000"
"xx","x","xxxxxx","xxx","xx","xx"," 00000011.0000"
Output:
"xx","x","xxxxxx","xxx","xx","xx","1"
"xx","x","xxxxxx","xxx","xx","xx","2"
"xx","x","xxxxxx","xxx","xx","xx","5"
"xx","x","xxxxxx","xxx","xx","xx","11"
I tried these and they worked, but is there anything simpler?
awk 'BEGIN {FS=OFS="\",\""} {$7 = sprintf("%.0f", $7)} 1' $test > $test1
awk '{printf("%s\"\n", $0)}' $test1
With your shown samples, please try the following awk program.
awk -v s1="\"" 'BEGIN{FS=OFS=","} {gsub(/[^0-9.]+/,"",$NF); $NF = s1 ($NF + 0) s1} 1' Input_file
Explanation: set , as both the input and output field separator; then, in the main program, strip everything except digits and the decimal point from each line's last field, coerce it to a number with $NF + 0 (which drops the leading zeros and the trailing .0000), wrap it in double quotes again, and print all lines, edited or not, with the final 1.
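As a quick check with one of the sample lines:
printf '%s\n' '"xx","x","xxxxxx","xxx","xx","xx"," 00000011.0000"' |
awk -v s1='"' 'BEGIN{FS=OFS=","} {gsub(/[^0-9.]+/,"",$NF); $NF = s1 ($NF + 0) s1} 1'
# "xx","x","xxxxxx","xxx","xx","xx","11"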
Another simple awk solution:
awk 'BEGIN {FS=OFS="\",\""} {$NF = $NF+0 "\""} 1' file
"xx","x","xxxxxx","xxx","xx","xx","1"
"xx","x","xxxxxx","xxx","xx","xx","2"
"xx","x","xxxxxx","xxx","xx","xx","5"
"xx","x","xxxxxx","xxx","xx","xx","11"
awk 'BEGIN{FS=OFS=","} {gsub(/"/, "", $7); $7="\"" $7+0 "\""; print}' file
Output:
"xx","x","xxxxxx","xxx","xx","xx","1"
"xx","x","xxxxxx","xxx","xx","xx","2"
"xx","x","xxxxxx","xxx","xx","xx","5"
"xx","x","xxxxxx","xxx","xx","xx","11"
gsub(/"/, "", $7): removes all " from $7
$7+0: Reduces the number in $7 to minimal representation
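The coercion is easy to check in isolation (a quick sanity test; works in any POSIX awk):
awk 'BEGIN { s = " 00000011.0000"; print s + 0 }'   # prints 11
Leading whitespace and zeros are skipped during the string-to-number conversion, and the fractional part disappears because awk prints integral values without a decimal point.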
I have a text file with thousands of lines:
:ABC:xyz:1234:200:some text:xxx:yyyy:11818:AAA:BBB
:ABC:xyz:6789:200:some text:xxx:yyyy:203450:AAA:BBB
:EFG:xyz:11818:200:some text:xxx:yyyy:154678:AAA:BBB
:HIJ:xyz:203450:200:some text:xxx:yyyy:154678:AAA:BBB
:KLM:xyz:7777:200:some text:xxx:yyyy:11818:AAA:BBB
.....
....
:DEL:xyz:1234:200:some text:xxx:yyyy:203450:AAA:BBB
I need to find the lines where the 9th column's value occurs more than once, i.e. the output should show:
:ABC:xyz:1234:200:some text:xxx:yyyy:11818:AAA:BBB
:KLM:xyz:7777:200:some text:xxx:yyyy:11818:AAA:BBB
:ABC:xyz:6789:200:some text:xxx:yyyy:203450:AAA:BBB
:DEL:xyz:1234:200:some text:xxx:yyyy:203450:AAA:BBB
I tried:
awk -F ":" '$9 > 2 {split($0,a,":"); print $0}'
but this prints all the records: it compares the value of $9 against 2 rather than counting how often that value occurs.
awk -F':' 'NR==FNR{cnt[$9]++;next} cnt[$9]>1' file file
or if you don't want to parse the file twice:
awk -F':' 'cnt[$9]++{printf "%s", prev[$9]; delete prev[$9]; print; next} {prev[$9]=$0 ORS}' file
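The same one-pass logic, broken out with comments (purely a readability sketch):
awk -F':' '
cnt[$9]++ {                  # true from the 2nd occurrence of a key onward
    printf "%s", prev[$9]    # emit the buffered 1st occurrence, if still held
    delete prev[$9]          # ensure the 1st occurrence is emitted only once
    print                    # emit the current line
    next
}
{ prev[$9] = $0 ORS }        # 1st occurrence of a key: buffer the whole line
' file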
This should do it in pure awk:
awk -F":" '{if( s[$9] ){ print } else if( f[$9] ){ print f[$9]; s[$9]=1; print }; f[$9]=$0 }'
Explanation:
The "f" array stores values of the 9th column that have occurred at least once.
The "s" array stores values of the 9th column that have occurred twice or more.
If the 9th column has occurred before, print the first occurrence, and this line.
If the 9th column has occurred twice or more before, print this line.
Here is another awk:
awk -F: '{++a[$9];b[NR]=$0} END {for (i=1;i<=NR;i++) {split(b[i],c,":");if (a[c[9]]>1) print b[i]}}' file
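All of the answers above keep the lines in file order. If the duplicates should instead come out grouped by the 9th-column value, as in the expected output, one sketch is to collect the lines per key and dump each qualifying group in the END block (this assumes the file fits in memory, and note that for-in traversal order is unspecified in POSIX awk):
awk -F':' '
{ cnt[$9]++; grp[$9] = grp[$9] $0 ORS }                    # collect lines per key
END { for (k in grp) if (cnt[k] > 1) printf "%s", grp[k] } # print duplicate groups
' file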
E.g., each row of the file is like:
1, 2, 3, 4,..., 1000
How can I print out
1 2 3 4 ... 1000
?
If you just want to delete the commas, you can use tr:
$ tr -d ',' <file
1 2 3 4 1000
If it is something more general, you can set FS and OFS (read about FS and OFS) in your BEGIN block:
awk 'BEGIN{FS=","; OFS=""} ...' file
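A concrete instance of that idea (a sketch; the otherwise pointless assignment $1 = $1 is what forces awk to rebuild the record with the new OFS):
awk 'BEGIN{FS=","; OFS=""} {$1 = $1; print}' file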
You need to set OFS (the output field separator). Unfortunately, this has no effect unless you also modify the record, leading to the rather cryptic:
awk '{$1=$1}1' FS=, OFS=
Although, if you are happy with some additional space being added, you can leave OFS at its default value (a single space), and do:
awk -F, '{$1=$1}1'
and if you don't mind omitting blank lines in the output, you can simplify further to:
awk -F, '$1=$1'
You could also remove the field separators:
awk -F, '{gsub(FS,"")} 1'
Set FS to the input field separator. Assigning to $1 will then force awk to rebuild the record using the output field separator, which defaults to a single space:
awk -F',\s*' '{$1 = $1; print}'
See the GNU Awk Manual for an explanation of $1 = $1
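A quick demonstration (assuming GNU awk, since \s in the FS regex is a GNU extension):
echo '1, 2, 3, 4' | awk -F',\s*' '{$1 = $1; print}'   # prints: 1 2 3 4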