Awk count frequency

Hey, I want to count the occurrences of a value in a certain column with awk.
An example dataset is:
2 5 8
1 3 7
8 5 9
and I want to count the frequency of 5 in the second column. This is what I tried, but it didn't work:
{
    total = 0;
    for (i = 1; i <= NF; i++)
    {
        if (i == 2)
        {
            if ($i == 5) { total++; }
        }
        printf("%s ", total);
    }
}

How about the following:
awk '{ if ($2==5) count++ } END { print count }'

awk 'NR == 1 {ind = 0} $2 == 5 {ind++} END {print ind}' testdata.txt
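For the sample data above, both commands print 2, since 5 appears in the second column of the first and third rows. One optional tweak (not part of the original answers): if no row matches, count is never set and the first version prints an empty line; adding 0 in the END block forces numeric output:
awk '{ if ($2==5) count++ } END { print count+0 }' testdata.txt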

If specific columns match, add them to the same row using BASH

I have data that looks like this:
client1 5 10 12 17
client1 6 8 3 20
client1 3 2 2 2
client2 3 3 3 3
client2 4 4 0 0
client2 0 3 3 9
...
client100 3 3 2 1
client100 1 1 1 2
client100 3 3 4 4
I want to make it so there is only one row for every client, with all the information from each of that client's rows merged into one. So for example, client1 and client2 would look like this merged (but obviously I need all the clients merged):
client1 5 10 12 17 6 8 3 20 3 2 2 2
client2 3 3 3 3 4 4 0 0 0 3 3 9
awk '{ x[$1]=x[$1] " " $2; y[$2]=y[$2] " " $1; }
END {
for (k in x) print k,x[k] >"OUTPUT1";
for (k in y) print k,y[k] >"OUTPUT2";
}' INPUT
When you're merging, you need to add all the fields, not just field 2. The easiest way to do this is to empty field 1, then append the whole record to the array entry.
awk '{ client = $1; $1 = ""; x[client] = x[client] $0 }
END { for (k in x) print k x[k] }' INPUT
I'm not sure what your array y was for. There doesn't seem to be any reason for an array that uses the second field as the keys.
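Note that for (k in x) visits the keys in an unspecified order. If you are using GNU awk you can impose a lexical sort on the keys (a GNU-awk-only sketch; note that lexically client100 sorts before client2), while the answer below preserves the original input order instead:
awk 'BEGIN { PROCINFO["sorted_in"] = "@ind_str_asc" }
     { client = $1; $1 = ""; x[client] = x[client] $0 }
     END { for (k in x) print k x[k] }' INPUT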
Could you please try the following? It should produce output in the same order in which $1 first occurs in Input_file.
awk '
{
gsub(/\r/,"")
}
!a[$1]++{
b[++count]=$1
}
{
val=$1
$1=""
sub(/^ +/,"")
c[val]=(c[val]?c[val] OFS:"")$0
}
END{
for(i=1;i<=count;i++){
print b[i],c[b[i]]
}
}
' Input_file
Explanation: a detailed explanation of the above code.
awk ' ##Starting awk program.
{
gsub(/\r/,"")
}
!a[$1]++{ ##Checking condition if $1 is NOT in array a then do following.
b[++count]=$1 ##Creating array b with index count and value is $1.
}
{
val=$1 ##Creating a variable val whose value is $1.
$1="" ##Nullifying $1 here.
sub(/^ +/,"") ##Substituting initial space with null here.
c[val]=(c[val]?c[val] OFS:"")$0 ##Appending the current line (with $1 removed) to array c indexed by val, separated by OFS each time the same $1 is seen again.
}
END{
for(i=1;i<=count;i++){ ##Starting a for loop from i=1 till value of count here.
print b[i],c[b[i]] ##Printing value of array b with index i and array c with index of b[i].
}
}
' Input_file ##Mentioning Input_file name here.
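With the sample input from the question this preserves the order in which each client first appears, so the first two output lines should be:
client1 5 10 12 17 6 8 3 20 3 2 2 2
client2 3 3 3 3 4 4 0 0 0 3 3 9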

Stored each of the first 2 blocks of lines in arrays

I've been doing this with Google Sheets, but it takes a long time, so I'd like to handle it with awk instead.
input.txt
Column 1
2
2
2
4
4

Column 2
562
564
119
215
12

Range
13455,13457
13161
11409
13285,13277-13269
11409
I've tried this script to rearrange the values:
awk '/Column 1/' RS= input.txt
(as referred in How can I set the grep after context to be "until the next blank line"?)
But it seems it only picks out one matching block.
The blocks should be matched up by their respective lines.
Result:
562Value2#13455
562Value2#13457
564Value2#13161
119Value2#11409
215Value4#13285
215Value4#13277-13269
12Value4#11409
It should be something like that: when the Range entry has comma-separated values, the corresponding Column 1 and Column 2 values are repeated for each of them.
For example:
Range:
13455,13457
Result:
562Value2#13455
562Value2#13457
I don't know what sorting has to do with it, but it seems like this is what you're looking for:
$ cat tst.awk
BEGIN { FS=","; recNr=1; print "Result:" }
!NF { ++recNr; lineNr=0; next }
{ ++lineNr }
lineNr == 1 { next }
recNr == 1 { a[lineNr] = $0 }
recNr == 2 { b[lineNr] = $0 }
recNr == 3 {
    for (i=1; i<=NF; i++) {
        print b[lineNr] "Value" a[lineNr] "#" $i
    }
}
$ awk -f tst.awk input.txt
Result:
562Value2#13455
562Value2#13457
564Value2#13161
119Value2#11409
215Value4#13285
215Value4#13277-13269
12Value4#11409

AWK: How can I print averages of consecutive numbers in a file, but skip over alphabetical characters/strings?

I've figured out how to get the average of a file that contains numbers in all lines such as:
Numbers.txt
1
2
4
8
Output:
Average: 3.75
This is the code I use for that:
awk '{ sum += $1; tot++ } END { print sum / tot; }' Numbers.txt
However, the problem is that this doesn't take into account possible strings that might be in the file. For example, a file that looks like this:
NumbersAndExtras.txt
1
2
4
8
Hello
4
5
6
Cat
Dog
2
4
3
For such a file I'd want to print the averages of each run of consecutive numbers, ignoring the strings, so that the result looks something like this:
Output:
Average: 3.75
Average: 5
Average: 3
I could devise some complicated code that might accomplish that with variables, if statements, and loops, but I've been told it's easier than that given some of awk's features. I'd like to know what that might look like, along with an explanation of why it works.
BEGIN runs before the first line is read from the file; it sets sum and count to 0.
awk 'BEGIN{ sum=0; count=0} {if ( /[a-zA-Z]/ ) { if (count > 0) {avg = sum/count; print avg;} count=0; sum=0} else { count++; sum += $1} } END{if (count > 0) {avg = sum/count; print avg}} ' NumbersAndExtras.txt
When there is an alphabetic character on the line, calculate and print the average so far.
Then do the same in the END block, which runs after processing the whole file.
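The same logic laid out over multiple lines for readability:
awk '
BEGIN { sum = 0; count = 0 }
{
    if (/[a-zA-Z]/) {
        # Alphabetic line: print the average of the run that just ended.
        if (count > 0) { avg = sum/count; print avg }
        count = 0; sum = 0
    } else {
        # Numeric line: accumulate.
        count++; sum += $1
    }
}
END { if (count > 0) { avg = sum/count; print avg } }
' NumbersAndExtras.txt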
Keep it simple:
awk '/^$/{next}
/^[0-9]+/{a+=$1+0;c++;next}
c&&a{print "Average: "a/c;a=c=0}
END{if(c&&a){print "Average: "a/c}}' input_file
Results:
Average: 3.75
Average: 5
Average: 3
Another one:
$ awk '
function avg(s, c) { print "Average: ", s/c }
NF && !/^[[:digit:]]/ { if (count) avg(sum, count); sum = 0; count = 0; next}
NF { sum += $1; count++ }
END {if (count) avg(sum, count)}
' <file
Note: The value of this answer is in explaining the solution; other answers offer more concise alternatives.
Try the following:
Note that this is an awk command with a script specified as a multi-line shell string literal - you can paste the whole thing into your terminal to try it; while it is possible to cram this into a single line, it hurts readability and the ability to comment:
awk '
# Define output function that prints an average.
function printAvg() { print "Average: ", sum/count }
# Skip blank lines
NF == 0 { next}
# Is the line non-numeric?
/[[:alpha:]]/ {
# If this line ends a numeric block, print its
# average now and reset the variables to start the next group.
if (count) {
printAvg()
sum = count = 0
}
# Skip to next line.
next
}
# Numeric line: add to the sum and increment the counter.
{ sum += $1; count++ }
# Finally:
END {
# If there is a group whose average has not been printed yet,
# do it now.
if (count) printAvg()
}
' NumbersAndExtras.txt
If we condense whitespace and strip the comments, we still get a reasonably readable solution, as long as we still use multiple lines:
awk '
function printAvg() { print "Average: ", sum/count }
NF == 0 { next}
/[[:alpha:]]/ { if (count) { printAvg(); sum = count = 0 } next }
{ sum += $1; count++ }
END { if (count) printAvg() }
' NumbersAndExtras.txt

How can I subtract to each column its mean using awk?

I have a file such as the following (but with thousands of rows and hundreds of columns)
1 2 1
1 2 2
3 2 3
3 2 6
How can I subtract from each column/field its mean using awk, in order to obtain something like this?
-1 0 -2
-1 0 -1
1 0 0
1 0 3
Thank you very much for your help.
The closest solution I found, http://www.unix.com/shell-programming-scripting/102293-normalize-dataset-awk.html, does not seem to do the job element by element. Of course it performs a different operation, but the generic concept is the same: perform an operation on each column using a value calculated from that column.
With awk in two passes (the file is listed twice so it is read twice; NR==FNR is true only during the first pass, which collects the column sums):
awk '
NR==FNR {
for (i=1;i<=NF;i++) {
a[i]+=$i
}
next
}
{
for (y=1;y<=NF;y++) {
printf "%2d ", $y-=(a[y]/(NR-FNR))
}
print ""
}' file file
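With the four-row sample from the question saved as file, this should print:
-1  0 -2
-1  0 -1
 1  0  0
 1  0  3
(the %2d format pads each value to at least two characters).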
With awk in one pass (the whole file is buffered in array b):
awk '{
for (i=1;i<=NF;i++) {
a[i]+=$i;
b[NR,i]=$i
}
}
END {
for (i=1;i<=NR;i++) {
for (j=1;j<=NF;j++) {
printf "%2d ",b[i,j]-=(a[j]/NR)
}
print ""
}
}' file
Or, if Python with numpy is an option:
import sys
import numpy as np

a = np.array([line.split() for line in open(sys.argv[1])], dtype=float)
for row in a - np.mean(a, axis=0):
    print(' '.join(map(str, row)))
Usage: python script.py inputFile

How to get total column from awk

I am testing awk and had this thought. So we know that:
raja#badfox:~/Desktop/trails$ cat num.txt
1 2 3 4
1 2 3 4
4 1 2 31
raja#badfox:~/Desktop/trails$ awk '{ if ($1 == '4') print $0}' num.txt
4 1 2 31
raja#badfox:~/Desktop/trails$
So the command checks for a 4 in the 1st column of num.txt.
Now I want the output to reflect that there is a 4 in column 4 as well. For example, if I have 100 columns of information, I want to find out how many columns contain the term I am searching for.
I mean, from the above example I want column 4 and column 1 as the output when I am searching for 4.
If you are trying to find the rows which contain your search item (in this case, the value 4), and you want a count of how many such values appear in the row (as well as the row's data), then you need something like:
awk '{ count=0
for (i = 1; i <= NF; i++) if ($i == 4) count++
if (count) print count ": " $0
}'
That doesn't quite fit onto one SO line.
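For the num.txt sample, each row contains exactly one 4, so the output would be:
1: 1 2 3 4
1: 1 2 3 4
1: 4 1 2 31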
If you merely want to identify which columns contain the search value, then you can use:
awk '{ for (i = 1; i <= NF; i++) if ($i == 4) column[i] = 1 }
END { for (i in column) print i }'
This sets the (associative) array element column[i] to 1 for each column that contains the search value, 4. The loop at the end prints the column numbers that contain 4 in an indeterminate (not sorted) order. GNU awk includes sort functions (asort, asorti); POSIX awk does not. If sorted order is crucial, then consider:
awk 'BEGIN { max = 0 }
{ for (i = 1; i <= NF; i++) if ($i == 4) { column[i] = 1; if (i > max) max = i } }
END { for (i = 1; i <= max; i++) if (column[i] == 1) print i }'
You are looking for the NF variable. It's the number of fields in the line.
Here's an example of how to use it:
{
if (NF == 8) {
print $3, $8;
} else if (NF == 9) {
print $3, $9;
}
}
Or inside a loop:
# This will print the line if any field has the value 4
for (i=1; i<=NF; i++) {
if ($i == 4)
print $0
}
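To turn that loop into a complete command that reports which column holds the search value on each matching line of num.txt (just an illustration; adjust the printf to whatever output you actually need):
awk '{ for (i = 1; i <= NF; i++) if ($i == 4) printf "line %d: column %d\n", NR, i }' num.txt
For the sample data this prints "line 1: column 4", "line 2: column 4" and "line 3: column 1".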