awk array initialization at each line fails

I have a log file where each line contains some numbers separated by commas. I just want to do some operation on each number. It seemed easy with awk, but somehow I got stuck. The array I'm using to split each line seems to be initialized only once, at the first line; after that it doesn't get cleared. split() is supposed to clear the array before filling it, and I have even used delete on the array, but the problem persists. Any help will be appreciated.
Below is an example.
This is a sample file:
$ cat log
1,2,3
2
9
1,4
5,7
7
8
6,2
This is what I'm getting:
$ awk '{print "New Line " $1; delete a; split($1,a,","); for(var in a){ print "Array Element " var; } }' log
New Line 1,2,3
Array Element 1
Array Element 2
Array Element 3
New Line 2
Array Element 1
New Line 9
Array Element 1
New Line 1,4
Array Element 1
Array Element 2
New Line 5,7
Array Element 1
Array Element 2
New Line 7
Array Element 1
New Line 8
Array Element 1
New Line 6,2
Array Element 1
Array Element 2
But this is what I am expecting:
$ awk '{print "New Line " $1; delete a; split($1,a,","); for(var in a){ print "Array Element " var; } }' log
New Line 1,2,3
Array Element 1
Array Element 2
Array Element 3
New Line 2
Array Element 2
New Line 9
Array Element 9
New Line 1,4
Array Element 1
Array Element 4
New Line 5,7
Array Element 5
Array Element 7
New Line 7
Array Element 7
New Line 8
Array Element 8
New Line 6,2
Array Element 6
Array Element 2
Mine is GNU Awk 3.1.5. I have also found another way, using a combination of shell and awk (running awk separately on each line), but I want to know what I am doing wrong.

for (var in array) sets var to each index of the array in turn, so var is the index, not the contents. Use:
print "Array Element " a[var]
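Note also that for (var in a) visits indices in an unspecified order, so it is not guaranteed to print the elements in the order they appeared on the line. A minimal sketch of a safer pattern keeps the element count that split() returns and loops over 1..n; split() already deletes the old array contents before filling it, so the delete a is not needed. The shell function name print_elements is hypothetical, used only to make the example reusable:

```shell
#!/bin/sh
# print_elements is a hypothetical helper name: it reads comma-separated
# lines on stdin and prints each element in its original order.
print_elements() {
    awk '{
        print "New Line " $0
        n = split($0, a, ",")        # n = number of elements stored in a[1..n]
        for (i = 1; i <= n; i++)     # numeric loop keeps the original order
            print "Array Element " a[i]
    }'
}

# Usage on part of the sample log from the question
printf '1,2,3\n2\n6,2\n' | print_elements
```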

You don't need split() at all; just set the field separator to a comma:
awk -F, '{print "New Line",$0;for(i=1;i<=NF;i++)print "Array Element",$i}' log
New Line 1,2,3
Array Element 1
Array Element 2
Array Element 3
New Line 2
Array Element 2
New Line 9
Array Element 9
New Line 1,4
Array Element 1
Array Element 4
New Line 5,7
Array Element 5
Array Element 7
New Line 7
Array Element 7
New Line 8
Array Element 8
New Line 6,2
Array Element 6
Array Element 2

Related

Subtract two fields of two consecutive rows in awk

I have a file as follows:
5 6
7 8
12 15
Using awk, how can I find the distance between the second column of one line and the first column of the next line? In this case, that is the distance between 6 and 7 and between 8 and 12, printed as follows, with the distance for the first line set to zero:
5 6 0
7 8 1
12 15 4
awk '{print $0, (NR>1?$1-p:0); p=$2}' file
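The one-liner above relies on the ternary operator: on the first record there is no previous line, so it prints 0; on later records it prints $1 minus p, the second field saved from the previous line, and then saves the current $2. An expanded, commented sketch of the same logic (the function name add_gap is hypothetical):

```shell
#!/bin/sh
# add_gap is a hypothetical helper: appends to each line the difference between
# its first field and the previous line's second field (0 on the first line).
add_gap() {
    awk '{
        gap = (NR > 1 ? $1 - p : 0)   # no previous line when NR == 1
        print $0, gap
        p = $2                        # remember the 2nd field of this line
    }'
}

printf '5 6\n7 8\n12 15\n' | add_gap
```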
Try:
awk 'NR==1{val=$2;print $0,"0";next} {print $0,$1-val;val=$2}' Input_file
Explanation:
When NR==1 (the first line of Input_file), set a variable named val to the second field, print the current line followed by "0", and then run next, which skips all further statements for that line. For every other line, print the current line along with the value of $1-val, then set val to $2 of the current line so it can be used with the next line.
Short awk approach:
awk 'NR==1{ $3=0 }NR>1{ $3=$1-p }{ p=$2 }1' file
The output:
5 6 0
7 8 1
12 15 4
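p=$2 captures the 2nd field value, so p holds the previous line's value when the next line is processed. The trailing 1 is an always-true pattern whose default action prints the (modified) record.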

What happens when I delete an array element in awk?

I wrote the following code:
awk -F"\t" '{
a[1]=1; a[2]=2; a[3]=3; a[4]=4; a[5]=5;
delete a[4];
print "len", length(a);
for( i =1; i<=length(a); i++)
print i"\t"a[i]
for( i in a)
print i"\t"a[i]
}' -
And the output is:
len 4
1 1
2 2
3 3
4
5 5
4
5 5
1 1
2 2
3 3
My question: since I deleted the 4th element and the length of array a became 4, why are there still 5 elements, with the value of the 4th element blank, when I print the array? Does that mean delete only deletes the value, and the corresponding index remains?
Remove the middle for loop and you'll see what's happening:
$ echo x | awk -F"\t" '{
a[1]=1; a[2]=2; a[3]=3; a[4]=4; a[5]=5;
delete a[4];
print "len", length(a);
for( i in a)
print i"\t"a[i]
}'
len 4
2 2
3 3
5 5
1 1
The delete is working as you expect, removing the array element with index 4, leaving 4 elements with indices 1, 2, 3, and 5. (Even though you are using numeric indices, it's still an associative array and the old a[5] is not now accessible as a[4] --- it's still a[5].)
The reason you're seeing five elements in your example is the middle for loop:
for( i =1; i<=length(a); i++)
print i"\t"a[i]
By simply referring to a[4] in the above print statement, you are recreating an element of the a array with that index having an empty value.
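To walk numeric indices without recreating deleted elements, test membership with (i in a) before reading a[i]; the in operator does not create the element. A minimal sketch using the array from the question (the function name demo_delete is hypothetical, and the bound 5 is just the highest index used there). It counts elements with a for-in loop instead of length(a), on the assumption that not every awk supports length() on arrays:

```shell
#!/bin/sh
# demo_delete is a hypothetical helper showing that (i in a) checks membership
# without creating a[i], so the deleted element stays deleted.
demo_delete() {
    echo x | awk '{
        a[1]=1; a[2]=2; a[3]=3; a[4]=4; a[5]=5
        delete a[4]
        for (i = 1; i <= 5; i++)
            if (i in a)               # membership test: does not create a[i]
                print i "\t" a[i]
        n = 0
        for (k in a) n++              # portable element count
        print "count", n              # still 4: a[4] was not recreated
    }'
}

demo_delete
```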

Duplicate number in 2nd row, n number of times, where n is the number in the first row

I have text files that each have a single column of numbers:
2
3
4
I want to duplicate the second line n times, where n is the number in the first row, so the output looks like this:
3
3
I've done something similar in awk but can't seem to figure out this specific example.
$ awk 'NR==1{n=$1;} NR==2{for (i=1;i<=n;i++) print; exit;}' file
3
3
How it works
NR==1{n=$1;}
When we reach the first row, save the number in variable n.
NR==2{for (i=1;i<=n;i++) print; exit;}
When we reach the second row, print it n times and exit.
Just for fun, in plain shell:
{ read -r c; read -r d; } < file; yes "$d" | head -n "$c"
Read the first two rows into c and d, then repeat $d forever with yes but keep only the first $c lines.

AWK - find min value of each row with arbitrary size

I have a file with the lines as:
5 3 6 4 2 3 5
1 4 3 2 6 5 8
..
I want to get the min on each line, so for example with the input given above, I should get:
min of first line: 2
min of second line: 1
..
How can I use awk to do this for any arbitrary number of columns in each line?
If you don't mind the output using digits instead of words, you can use this one-liner:
$ awk '{m=$1;for(i=2;i<=NF;i++)if($i<m)m=$i;print "min of line",NR":",m}' file
min of line 1: 2
min of line 2: 1
If you really do want to count in ordinal numbers:
BEGIN {
    split("first second third fourth", count, " ")
}
{
    min = $1
    for (i = 2; i <= NF; i++)
        if ($i < min)
            min = $i
    print "min of", count[NR], "line:", min
}
Save this to script.awk and run like:
$ awk -f script.awk file
min of first line: 2
min of second line: 1
Obviously this will only work for files with up to 4 lines, but you can extend the list of ordinal numbers to the maximum you think you will need. You should be able to find such a list online easily.
Your problem is pretty simple. All you need is a variable min that you reset on each line, and then a simple C-like algorithm for finding the minimum element: set min to the first field, compare it with the next field, and so on until you reach the last field of the line. The number of fields in the line is available in the variable NF, so it's just a matter of writing a for loop. Once the loop has run for the line, min holds the minimum element and you can print it.
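The algorithm described above can be sketched as follows, resetting min on every line. The +0 forces a numeric comparison, which assumes every field is a number; the function name line_min is hypothetical:

```shell
#!/bin/sh
# line_min is a hypothetical helper: prints the minimum field of each input line.
line_min() {
    awk '{
        min = $1 + 0                  # start with the first field (numeric)
        for (i = 2; i <= NF; i++)     # compare against every remaining field
            if ($i + 0 < min)
                min = $i + 0
        print "min of line " NR ": " min
    }'
}

printf '5 3 6 4 2 3 5\n1 4 3 2 6 5 8\n' | line_min
```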

substituting nth character of a field with awk

I have a file that looks like this:
seq1 CT 5 CCCGCTGCTGATGAC
seq2 AG 8 CTGTGTAGATGATGGGTTAGAG
seq3 TG 3 CGTGTGACA
I am trying to replace the nth character of field 4 with the string in field 2, where n is the value in field 3. The output would be:
seq1 CT 5 CCCGCTTGCTGATGAC
seq2 AG 8 CTGTGTAAGATGATGGGTTAGAG
seq3 TG 3 CGTGGTGACA
My attempt looks like this:
awk '{a=$3; b=$2; sub(/(substr($4, a, 1))/,b); print $0}'
I guess what is happening is that sub() treats what is inside the /.../ as a literal string, rather than evaluating the substr() call and the variable b. After searching, I can't find the correct way of doing this.
awk '{$4 = substr($4, 1, $3-1) $2 substr($4, $3+1); print}' file
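The answer rebuilds field 4 from three pieces: the characters before position $3, the replacement string in $2, and the characters after position $3 (substr() with no length argument runs to the end of the string). Assigning to $4 also makes awk rebuild $0 using OFS, a single space by default, so runs of whitespace between fields are collapsed. A commented sketch, with replace_nth as a hypothetical function name:

```shell
#!/bin/sh
# replace_nth is a hypothetical helper: replaces character number $3 of
# field 4 with the string in field 2.
replace_nth() {
    awk '{
        n = $3
        # prefix (chars 1..n-1) + replacement + suffix (chars n+1..end)
        $4 = substr($4, 1, n - 1) $2 substr($4, n + 1)
        print
    }'
}

printf 'seq1 CT 5 CCCGCTGCTGATGAC\nseq3 TG 3 CGTGTGACA\n' | replace_nth
```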