awk to print all numbers between the starting number in column 1 and the ending number in column 2

I have data in 2 columns in the format below:
92312401 92312403
92312417 92312418
92345810 92345814
and I want awk to print all the numbers starting from the value in $1 and ending at the value in $2. E.g. the output should look like:
92312401
92312402
92312403
92312417
92312418
92345810
92345811
92345812
92345813
92345814
Can someone please explain how this can be done in a Linux/Unix environment?

Quite simply:
awk '{ for(i = $1; i <= $2; ++i) print i }' filename
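If the input might contain blank lines or reversed ranges, a small guard skips them (a sketch; the answer above assumes well-formed input):
awk 'NF >= 2 && $1+0 <= $2+0 { for (i = $1; i <= $2; ++i) print i }' filename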

Related

awk print sum of group of lines

I have a file with a column headed (effect), whose groups of rows are separated by blank lines:
(effect)
1
1
1
(effect)
1
1
1
1
(effect)
1
1
I know how to print the sum of a column, like:
awk '{sum+=$1;} END{print sum;}' file.txt
Using awk, how can I print the sum for each (effect) group, so that I get three lines (or more in other cases), like below?
sum=3
sum=4
sum=2
You can check for the (effect) header, and print the running sum when encountering the next (effect) header and once more in the END block.
awk '
$1 == "(effect)" { if(seen) print "sum="sum; seen = 1; sum = 0 }
/[0-9]/ { sum += $1 }
END { if (seen) print "sum="sum }
' file
Output
sum=3
sum=4
sum=2
With the samples you have shown, please try the following awk code, written and tested in GNU awk.
awk -v RS='(^|\n)?\\(effect\\)[^(]*' '
RT{
  gsub(/\(effect\)\n|\n+[[:space:]]*$/, "", RT)
  num = split(RT, arr, ORS)
  print "sum=" num
}
' Input_file
Explanation: this needs GNU awk. RS is set to the regex (^|\n)?\\(effect\\)[^(]* for the whole Input_file, so each separator awk matches swallows an (effect) header together with the group of numbers that follows it. In the main program, whenever RT (the text that RS actually matched) is not null, the gsub (global substitution) call strips the (effect)\n header and any trailing newlines and spaces from RT. split then breaks what is left of RT into the array arr on ORS and returns the number of pieces, which is saved in num. That line count equals the group's sum only because every value in the sample is 1; printing sum= followed by num therefore gives the required result.
With shown samples, output will be as follows:
sum=3
sum=4
sum=2
This should work in any version of awk:
awk '{sum += $1}   # harmless on "(effect)" lines, where $1 is numerically 0
$0=="(effect)" && NR>1 {print "sum=" sum; sum=0}
END{print "sum=" sum}' file
sum=3
sum=4
sum=2
Similar to @Ravinder's answer, but this does not depend on the name of the header:
awk -v RS='' -v FS='\n' '{
sum = 0
for (i=2; i<=NF; i++) sum += $i
printf "sum=%d\n", sum
}' file
RS='' means that sequences of 2 or more newlines separate records.
The Field Separator is newline.
The for loop omits field #1, the header.
However that means that empty lines truly need to be empty: no spaces or tabs allowed. If your data might have blank lines that contain whitespace, you can set
-v RS='\n[[:space:]]*\n'
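Putting that together, a whitespace-tolerant variant might look like this (a sketch; it requires gawk, since a multi-character regex RS is a GNU extension):
awk -v RS='\n[[:space:]]*\n' -v FS='\n' '{
  sum = 0
  for (i = 2; i <= NF; i++) sum += $i   # field 1 is the (effect) header
  printf "sum=%d\n", sum
}' file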
$ awk -v RS='(effect)' 'NR>1{sum=0; for(i=1;i<=NF;i++) sum+=$i; print "sum="sum}' file
sum=3
sum=4
sum=2

Print every nth column of a file

I have a rather big file with 255 comma-separated columns and I need to print out every third column only.
I was trying something like this
awk '{ for (i=0;i<=NF;i+=3) print $i }' file
but that doesn't seem to be the solution, since it prints everything as one long column. Can anybody help? Thanks
Here is one way to do this.
The script prog.awk:
BEGIN {FS = ","} # field separator
{for (i = 1; i <= NF; i += 3) printf ("%s%c", $i, i + 3 <= NF ? "," : "\n");}
Invocation:
awk -f prog.awk <input.csv >output.csv
Example input.csv:
1,2,3,4,5,6,7,8,9,10
11,12,13,14,15,16,17,18,19,20
Example output.csv:
1,4,7,10
11,14,17,20
It behaves like that because by default awk splits fields on whitespace. You have to tell it to split on commas instead, which is done using the FS variable or the -F switch. Besides that, the first field is number one; field zero ($0) is the whole line, so also change the initial value of the for loop:
awk -F',' '{ for (i=1;i<=NF;i+=3) print $i }' file
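To make the step configurable (a hypothetical generalization, not part of the original answers), the stride can be passed in as a variable:
awk -F',' -v n=3 '{
  for (i = 1; i <= NF; i += n)
    printf "%s%s", $i, (i + n <= NF ? "," : "\n")   # comma between kept fields, newline after the last
}' file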

New to Awk. Struggling with negative number formatting

Goal: to output only data that is above 1 and below -1
or
output data that is between 1 and -1
I have the basics of awk and can print column 2 (where my data is).
Notice that I also specified a range of 0-1:
awk '/[0-1]/ {print $2}' test.dat
I also need the line number, so I added NR:
awk '/[0-1]/ {print $2 NR}' test.dat
To make sure I am clear, the point is to identify which lines of the data are outside of the acceptable range, so we can ignore them in our analysis (i.e. anything bigger than 1 or lower than -1 is too much of a change).
Any help you can provide would be great. I have pasted some sample data below.
http://pastebin.com/7tpBAqua
Not sure if you want to evaluate the data in every column, or if there's a specific column you need to test. Testing a single column is simplest; testing multiple or all columns is a fairly simple repetitive extension of the pattern. Since you mention column 2 specifically, let's assume you want to print column 2 only when it is between -1 and 1:
awk -F, '($2 >= -1) && ($2 <= 1) { print $2 }'
To test for the field being greater than 1 or less than -1 instead:
awk -F, '($2 <= -1) || ($2 >= 1) { print $2 }'
Printing a different field, or the entire line instead ($0) should be fairly obvious. To examine each field, either simply repeat the entire ($2 >= -1) && ($2 <= 1) { print $2 } clause for each field you're interested in (which quickly gets verbose), or something like this (not tested):
awk -F, '{ for (i = 1; i <= NF; ++i) if (($i >= -1) && ($i <= 1)) print $i; }'
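Since the stated goal is to flag which lines fall outside the acceptable range, a sketch that also reports the line and field number of each offending value (assuming comma-separated data, as above):
awk -F, '{
  for (i = 1; i <= NF; ++i)
    if ($i < -1 || $i > 1)
      printf "line %d, field %d: %s\n", NR, i, $i
}' test.dat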
awk -F'[ ,]' 'NR>2{for (i=2;i<=NF;i++) if ($i<-1 || $i>1) { print NR; next } }' file

awk script for finding smallest value from column

I am a beginner in AWK, so please help me learn it. I have a text file named snd and its values are:
1 0 141
1 2 223
1 3 250
1 4 280
I want to print the entire row when the third column value is the minimum.
This should do it:
awk 'NR == 1 {line = $0; min = $3}
NR > 1 && $3 < min {line = $0; min = $3}
END{print line}' file.txt
EDIT:
What this does is:
Remember the 1st line and its 3rd field.
For the other lines, if the 3rd field is smaller than the min found so far, remember the line and its 3rd field.
At the end of the script, print the line.
Note that the test NR > 1 can be skipped, as for the 1st line, $3 < min will be false. If you know that the 3rd column is always negative (not positive), you can also skip the NR == 1 ... test, since min starts out as zero, so the very first comparison $3 < min succeeds.
EDIT2:
This is shorter:
awk 'NR == 1 || $3 < min {line = $0; min = $3}END{print line}' file.txt
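If several lines tie for the minimum and all of them should be printed, a two-pass variant (a hypothetical extension, not part of the original answer) reads the file twice:
awk 'NR == FNR { if (FNR == 1 || $3 < min) min = $3; next }   # pass 1: find the minimum
$3 == min                                                     # pass 2: print every matching line
' file.txt file.txt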
You don't need awk to do what you want. Use sort
sort -nk 3 file.txt | head -n 1
Results:
1 0 141
I think sort is an excellent answer, unless for some reason what you're looking for is the awk logic to do this in a larger script, or you want to avoid the extra pipes, or the purpose of this question is to learn more about awk.
$ awk 'NR==1{x=$3;line=$0} $3<x{line=$0} END{print line}' snd
Broken out into pieces, this is:
NR==1 {x=$3;line=$0} -- On the first line, set an initial value for comparison and store the line.
$3<x{line=$0} -- On each line, compare the third field against our stored value, and if the condition is true, store the line. (We could make this run only on NR>1, but it doesn't matter.)
END{print line} -- At the end of our input, print whatever line we've stored.
You should read man awk to learn about any parts of this that don't make sense.
A short answer for this would be:
sort -k3,3n temp|head -1
Since you have asked for awk:
awk '{if(min>$3||NR==1){min=$3;a[$3]=$0}}END{print a[min]}' your_file
But I always prefer the shorter one.
To calculate the smallest value in any column, say the last column:
awk '(FNR==1){a=$NF} {a=$NF < a?$NF:a} END {print a}'
This will print only the smallest value of the column.
If the complete line is needed, it is better to use sort:
sort -r -n -t [delimiter] -k[column] [file name]
awk -F ";" '(NR==1){a=$NF;b=$0} {a=$NF<a?$NF:a;b=$NF>a?b:$0} END {print b}' filename
This will print the first line encountered that has the smallest value.
awk 'BEGIN {OFS=FS=","}{if ( a[$1]>$2 || a[$1]=="") {a[$1]=$2;} if (b[$1]<$2) {b[$1]=$2;} } END {for (i in a) {print i,a[i],b[i]}}' input_file
We use || a[$1]=="" because when the first value for a given field 1 is encountered, a[$1] is still null.
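For illustration, given a hypothetical comma-separated input (the file and its contents are made up here; note that for (i in a) leaves the output order unspecified):
$ cat input_file
a,5
a,3
b,7
$ awk 'BEGIN {OFS=FS=","}{if ( a[$1]>$2 || a[$1]=="") {a[$1]=$2;} if (b[$1]<$2) {b[$1]=$2;} } END {for (i in a) {print i,a[i],b[i]}}' input_file
a,3,5
b,7,7
Each output line is the key followed by its minimum and maximum second-column value.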

Counting and matching process

I have a matching problem with awk :(
I want to count the occurrences of each first-column element in main.file, and if the count is more than 2, print the first and second columns.
main.file
1725009 7211378
3353866 11601802
3353866 8719104
724973 3353866
3353866 7211378
For example, "3353866" occurs 3 times in the first column, so output.file should look like this:
output.file
3353866 11601802
3353866 8719104
3353866 7211378
How can I do this in awk?
If you mean items with at least 3 occurrences, you can collect occurrences in one array and the collected values as a preformatted or delimited string in another.
awk '{o[$1]++; v[$1] = v[$1] "\n" $0}
END{for(k in o){if(o[k]<3)continue;
print substr(v[k],2)}}' main.file
Untested, not at my computer. The output order will be essentially random; you'll need another variable to keep track of line numbers if you require the order to be stable.
This would be somewhat less hackish in Perl or Python, where a hash/dict can contain a structured value, such as a list.
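A sketch of that stable-order variant, remembering the order in which keys are first seen (same threshold of at least 3 occurrences assumed):
awk '!($1 in o) { order[++n] = $1 }        # remember first-seen order of keys
{ o[$1]++; v[$1] = v[$1] "\n" $0 }         # count and collect lines per key
END {
  for (j = 1; j <= n; j++) {
    k = order[j]
    if (o[k] >= 3) print substr(v[k], 2)   # drop the leading newline
  }
}' main.file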
Another approach is to run through the file twice: it's a little bit slower, but the code is very neat:
awk '
NR==FNR {count[$1]++; next}   # first pass: count occurrences of each $1
count[$1] > 2 {print}         # second pass: print lines whose key was seen 3+ times
' main.file main.file
awk '{store[$1"-"lines[$1]] = $0; lines[$1]++;}
END {for (l in store) {
split(l, pair, "-"); if (lines[pair[1]] > 2) { print store[l] } } }'
One approach is to track all the records seen, the corresponding key $1 for each record, and how often each key occurs. Once you've recorded those for all the lines, you can then iterate through all the records stored, only printing those for which the count of the key is greater than two.
awk '{
record[NR] = $0;
key[$0] = $1;
count[$1]++
}
END {
for (n=1; n <= length(record); n++) {
if (count[key[record[n]]] > 2) {
print record[n]
}
}
}'
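One caveat: calling length() on an array is a gawk extension (some other awks also accept it). A portable sketch can loop with NR instead, since NR still holds the total record count inside the END block:
awk '{
record[NR] = $0
key[$0] = $1
count[$1]++
}
END {
for (n = 1; n <= NR; n++)   # NR is the total number of records here
  if (count[key[record[n]]] > 2)
    print record[n]
}' main.file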
Sort first, and then use awk to print only when the 1st field occurs 3 times or more:
cat your_file | sort -n | awk 'prev == $1 {count++; p0=p1; p1=p2; p2=$2}   # same key: shift the 3-value buffer
prev != $1 {prev=$1; count=1; p2=$2}                                       # new key: reset the counter
count == 3 {print $1 " " p0; print $1 " " p1; print $1 " " p2}             # third occurrence: flush the buffer
count > 3 {print $1 " " $2}'                                               # beyond three: print directly
This keeps awk from using too much memory on big input files.
Based on how the question looks and the Ray Toal edit, I'm guessing you mean based on count, so something like this works:
awk '!y[$1] {y[$1] = 1} x[$1] {if(y[$1]==1) {y[$1]=2; print $1, x[$1]}; print} {x[$1] = $2}'