awk: if a user-defined function returns 1, start from the very beginning

Let's say I have the following script:
function helper1() {
if (NR==3 && !/PATTERN/) {
return 1
} else {
if (NR>=13) {
print $0
}
return 0
}
}
BEGIN {
if (helper1() == 1) {
print $0
}
}
In other words, I have a user-defined helper function that checks whether the 3rd line of a file contains some PATTERN; if it does, it prints all the other lines starting from line 13.
If it does not (the helper function returns 1), I'd like awk to print all the lines starting from line 1. That is not happening :)
Would be grateful for any advice here,
Thank you.

You may use this awk:
awk 'NR < 3 {              # for the first 2 lines
    s = s $0 ORS           # store the lines in variable s
    next                   # skip to the next record
}
NR == 3 {                  # for record number 3
    if (/PATTERN/)         # if PATTERN is found
        p = 1              # set flag p to 1
    else                   # otherwise
        printf "%s", s     # print the first 2 lines
}
(p && NR >= 13) || !p      # print if flag is not set, or else if NR >= 13
' file
Using a function:
awk '
function helper1() {
    if (NR < 3) {
        s = s $0 ORS
        return 0
    }
    else if (NR == 3) {
        if (/PATTERN/)
            p = 1
        else
            printf "%s", s
    }
    return (p && NR >= 13) || !p
}
helper1()
' file
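As for why the attempt in the question prints nothing useful: a BEGIN block runs once, before awk has read any input, so inside it NR is 0 and $0 is empty, and helper1() never gets to see line 3. A minimal sketch showing this (the two-line input is made up for illustration):

```shell
# BEGIN runs before the first record is read; main rules run once per record.
out=$(printf 'a\nb\n' | awk '
BEGIN { print "BEGIN: NR=" NR ", $0=[" $0 "]" }
      { print "main:  NR=" NR ", $0=[" $0 "]" }
')
printf '%s\n' "$out"
```

That is why both versions above call the test from a main rule rather than from BEGIN.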

Related

AWK, count variable does not return 0

So I have a program almost finished using AWK inside of Unix/Linux! I have to return a count, sum, max, min, and average. The program works when numbers are found. But, if no numbers are found and count does not iterate... I don't get back 0. I get back something like, "1a2,5".
Here is my code,
#! /bin/awk -f
{
sum += $1
}
/[0-9]+/{
if (NR >= 1){
if ($0 != 0 && NF == 1){
if ($0 !~ /[A-Za-z]/){
min = (NR==1 || $1<min ? $1 : min)
max = (NR==1 || $1>max ? $1 : max)
mean = sum/NR
count+=1
}
} else if ($0 == 0)
{exit(0)
}
}
}
END{
printf("# items: %d \n",count)
printf("Total: %lf\n", sum)
printf("Maximum: %lf\n", max)
printf("Minimum: %lf\n", min)
printf("Average: %lf\n", mean)
}
What am I forgetting, or what needs to be changed, so that count comes back as 0 when nothing is found? Thank you.
#! /bin/awk -f
# Skip records with more than one field
NF > 1 {
    next
}
# now $0 == $1
# Only with numbers
#$1 ~ /^[0-9]+$/ {
# or
$1 ~ /^[0-9]+(\.[0-9]+)?$/ {
    # exit if zero found
    if ($1 == 0) {
        exit 0
    }
    # compute; test count rather than NR, because skipped records still advance NR
    sum += $1
    min = (count == 0 || $1 < min ? $1 + 0 : min)
    max = (count == 0 || $1 > max ? $1 + 0 : max)
    count += 1
    mean = sum / count
}
END {
    printf("# items: %d \n", count)
    printf("Total: %f\n", sum)
    printf("Maximum: %f\n", max)
    printf("Minimum: %f\n", min)
    printf("Average: %f\n", mean)
}
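A small sketch of the same statistics, assuming the goal is simply "count 0 when no numbers are found". It keys the first-value test and the average on count instead of NR, so skipped records cannot affect the result. The sample values and the variable name stats are made up, and the "exit on zero" rule is left out to keep it short:

```shell
# awk program kept in a shell variable so it can be run on two inputs
stats='
NF == 1 && $1 ~ /^[0-9]+(\.[0-9]+)?$/ {
    v = $1 + 0                            # force a numeric value
    if (count == 0 || v < min) min = v
    if (count == 0 || v > max) max = v
    sum += v
    count++
}
END {
    printf "# items: %d\n", count
    printf "Total: %f\n", sum
    printf "Maximum: %f\n", max
    printf "Minimum: %f\n", min
    printf "Average: %f\n", (count ? sum / count : 0)
}'

# Mixed input: "abc" and "x y" are skipped, 5, 3 and 10 are counted.
out=$(printf '%s\n' abc 5 3 'x y' 10 | awk "$stats")
# Input without any number: every value is reported as zero.
none=$(printf '%s\n' abc | awk "$stats")
printf '%s\n' "$out" "$none"
```

The guard in the last printf avoids a division by zero when no record qualified.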

awk running total count and sum (Cont)

In continuation of a previous post: how can I calculate the 80%-20% rule contribution of vendors by Date ($1) AND Region ($2)?
The input file is already sorted by Date and Region, and by Amount from highest to lowest.
Input.csv
Date,Region,Vendor,Amount
5-Apr-15,east,aa,123
5-Apr-15,east,bb,50
5-Apr-15,east,cc,15
5-Apr-15,south,dd,88
5-Apr-15,south,ee,40
5-Apr-15,south,ff,15
5-Apr-15,south,gg,10
7-Apr-15,east,ii,90
7-Apr-15,east,jj,20
In the above input, for each Date ($1) AND Region ($2) combination, I need to populate a running sum of Amount and then calculate that running sum as a percentage of the total Amount for that day and region:
Date,Region,Vendor,Amount,RunningSum,%RunningSum
5-Apr-15,east,aa,123,123,65%
5-Apr-15,east,bb,50,173,92%
5-Apr-15,east,cc,15,188,100%
5-Apr-15,south,dd,88,88,58%
5-Apr-15,south,ee,40,128,84%
5-Apr-15,south,ff,15,143,93%
5-Apr-15,south,gg,10,153,100%
7-Apr-15,east,ii,90,90,82%
7-Apr-15,east,jj,20,110,100%
Once that is derived, the line items up to and including the first one that reaches 80% or above should be considered the 80% contribution; the remaining line items should be considered the 20% contribution.
Date,Region,Countof80%Vendor, SumOf80%Vendor, Countof20%Vendor, SumOf20%Vendor
5-Apr-15,east,2,173,1,15
5-Apr-15,south,2,128,2,25
7-Apr-15,east,1,90,1,20
This awk script will help you do the first part; ask if you need clarification. It stores the values in arrays and prints the requested info after parsing the whole document. Note that it relies on arrays of arrays (for example cities[i][k]), which need GNU awk 4.0 or later.
awk -F',' 'BEGIN{OFS=FS}
NR==1{print $0, "RunningSum", "%RunningSum"}
NR!=1{
    if (date == $1 && region == $2) {
        counts[i]++
        cities[i][counts[i]] = $3
        amounts[i][counts[i]] = $4
        rsum[i][counts[i]] = rsum[i][counts[i] - 1] + $4
    } else {
        date = $1; region = $2
        dates[++i] = $1
        regions[i] = $2
        counts[i] = 1
        cities[i][1] = $3
        amounts[i][1] = $4
        rsum[i][1] = $4
    }
}
END{
    for(j=1; j<=i; j++) {
        total = rsum[j][counts[j]];
        for (k=1; k<=counts[j]; k++) {
            # round to nearest so the percentages match the expected output (58%, 84%, 82%)
            print dates[j], regions[j], cities[j][k], amounts[j][k], rsum[j][k], int(rsum[j][k]/total*100 + 0.5) "%"
        }
        if (j != i) { print "" }
    }
}' yourfilename
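If GNU awk is not available, the running sum can also be sketched without nested arrays by reading the file twice: the first pass collects the total per Date and Region, the second prints the running sum and the rounded percentage. This is only a sketch under the assumption that the input is sorted as described in the question; unlike the script above it prints no blank line between groups. The temporary file stands in for Input.csv:

```shell
# Sample rows copied from the Input.csv shown in the question.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
Date,Region,Vendor,Amount
5-Apr-15,east,aa,123
5-Apr-15,east,bb,50
5-Apr-15,east,cc,15
5-Apr-15,south,dd,88
5-Apr-15,south,ee,40
5-Apr-15,south,ff,15
5-Apr-15,south,gg,10
7-Apr-15,east,ii,90
7-Apr-15,east,jj,20
EOF

out=$(awk -F, -v OFS=, '
NR == FNR {                      # pass 1: total per (date, region) group
    if (FNR > 1) total[$1, $2] += $4
    next
}
FNR == 1 { print $0, "RunningSum", "%RunningSum"; next }
{                                # pass 2: running sum and rounded percentage
    key = $1 SUBSEP $2
    if (key != prev) { run = 0; prev = key }
    run += $4
    print $0, run, int(run / total[key] * 100 + 0.5) "%"
}' "$tmp" "$tmp")
rm -f "$tmp"
printf '%s\n' "$out"
```

Naming the same file twice on the command line is what makes NR == FNR true only during the first pass.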
The second part can be done like this (using the output of the first awk script):
awk -F'[,%]' 'BEGIN{ OFS="," }
NR==1 || $0 ~ /^$/ {
    over = ""
    record = 1
}
! (NR==1 || $0 ~ /^$/) {
    if (record) {
        dates[++i] = $1
        regions[i] = $2
        record = ""
    }
    if (over) {
        twenty[i]++
        twenties[i] += $4
    } else {
        eighty[i]++
        eighties[i] += $4
    }
    if ($6 >= 80) {
        over = 1
    }
}
END {
    print "Date","Region","Countof80%Vendor", "SumOf80%Vendor", "Countof20%Vendor", "SumOf20%Vendor"
    for (j=1; j<=i; j++) {
        print dates[j], regions[j], eighty[j], eighties[j], twenty[j], twenties[j]
    }
}' output/file/of/first/script
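For comparison, the 80/20 summary can also be sketched in one portable awk program that works directly on the original input, skipping the intermediate file. Two assumptions to be aware of: the threshold is tested on the unrounded percentage (so 79.6% would not count as reaching 80%), and the whole file is held in memory. The names order, cnt, amt and so on are made up for this sketch:

```shell
# Sample rows copied from the Input.csv shown in the question.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
Date,Region,Vendor,Amount
5-Apr-15,east,aa,123
5-Apr-15,east,bb,50
5-Apr-15,east,cc,15
5-Apr-15,south,dd,88
5-Apr-15,south,ee,40
5-Apr-15,south,ff,15
5-Apr-15,south,gg,10
7-Apr-15,east,ii,90
7-Apr-15,east,jj,20
EOF

out=$(awk -F, -v OFS=, '
NR == 1 { next }                          # skip the header
{
    key = $1 OFS $2
    if (!(key in total)) order[++n] = key # remember first-seen group order
    total[key] += $4
    cnt[key]++
    amt[key, cnt[key]] = $4
}
END {
    print "Date", "Region", "Countof80%Vendor", "SumOf80%Vendor", "Countof20%Vendor", "SumOf20%Vendor"
    for (g = 1; g <= n; g++) {
        key = order[g]
        run = c80 = s80 = c20 = s20 = 0
        over = 0
        for (k = 1; k <= cnt[key]; k++) {
            a = amt[key, k]
            if (over) { c20++; s20 += a } else { c80++; s80 += a }
            run += a
            if (run / total[key] * 100 >= 80) over = 1
        }
        print key, c80, s80, c20, s20
    }
}' "$tmp")
rm -f "$tmp"
printf '%s\n' "$out"
```

The item that first crosses the threshold is still counted on the 80% side, because the over flag is only set after that item has been added.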

Awk input variable as a rule

Good day!
I have the next code:
BLOCK=`awk '
/\/\* R \*\// {
level=1
count=0
}
level {
n = split($0, c, "");
for (i = 1; i <= n; i++)
{
printf(c[i]);
if (c[i] == ";")
{
if(level==1)
{
level = 0;
if (count != 0)
printf("\n");
};
}
else if (c[i] == "{")
{
level++;
count++;
}
else if (c[i] == "}")
{
level--;
count++;
}
}
printf("\n")
}' $i`
That code cuts out the piece of the file from the /* R */ mark to the ';' symbol, taking details like nested braces into account. But that isn't important. I want to replace the hard-coded /* R */ with a variable:
RECORDSEQ="/* R */"
...
BLOCK=`awk -v rec="$RECORDSEQ" '
rec {
level=1
count=0
}
But that doesn't work.
How can I fix it?
Thank you in advance.
Found the solution:
RECORDSEQ="/* R */"
# Construct regexp for awk
RECORDSEQREG=`echo "$RECORDSEQ" | sed 's:\/:\\\/:g;s:\*:\\\*:g'`
# Cycle for files
for i in $SOURCE;
do
# Find RECORDSEQ and cut out the block
BLOCK=`awk -v rec="$RECORDSEQREG" '
$0 ~ rec {
level=1
count=0
}
...
Many thanks to people who helped.
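A possible alternative, in case the marker only needs to be matched as literal text rather than as a regular expression: awk's index() does a plain substring search, so nothing has to be escaped for the regex engine. A minimal sketch with made-up input lines (note that awk -v still interprets backslash escapes in the assigned value, which does not matter for this marker):

```shell
RECORDSEQ='/* R */'

# index($0, rec) is non-zero when rec occurs literally in the line,
# so "*" and "/" need no escaping.
out=$(printf '%s\n' 'int a;' '/* R */ struct r {' '  int x;' '};' |
    awk -v rec="$RECORDSEQ" 'index($0, rec) { print NR ": " $0 }')
printf '%s\n' "$out"
```

In the original script the same condition could take the place of the $0 ~ rec pattern.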

merge file on the basis of 2 fields

file1
session=1|w,eventbase=4,operation=1,rule=15
session=1|e,eventbase=5,operation=2,rule=14
session=2|t,eventbase=,operation=1,rule=13
file2
field1,field2,field3,session=1,fieldn,operation=1,fieldn
field1,field2,field3,session=1,fieldn,operation=2,fieldn
field1,field2,field3,session=2,fieldn,operation=2,fieldn
field1,field2,field3,session=2,fieldn,operation=1,fieldn
Output
field1,field2,field3,session=1,fieldn,operation=1,fieldn,eventbase=4,rule=15
field1,field2,field3,session=1,fieldn,operation=2,fieldn,eventbase=5,rule=14
field1,field2,field3,session=2,fieldn,operation=2,fieldn,NOMATCH
field1,field2,field3,session=2,fieldn,operation=1,fieldn,eventbase=,rule=13
I have Tried
BEGIN { FS = OFS = "," }
FNR == NR {
split($1,s,"|")
session=s[1];
a[session,$3] = session","$2","$3","$4;
next
}
{
split($4,x,"|");
nsession=x[1];
if(nsession in a)print $0 a[nsession,$6];
else print $0",NOMATCH";
}
The issue is that I cannot find nsession in the 2D array a with if(nsession in a).
I am trying to match the 2 files on the combination of session and operation.
Thanks.. it helped.. Now I am learning :) Thanks team
BEGIN { FS = OFS = "," }
FNR == NR {
split($1,s,"|")
session=s[1];
a[session,$3] = session","$2","$3","$4;
next
}
{
split($4,x,"|");
nsession=x[1];
key=nsession SUBSEP $6
if(key in a)print $0 a[nsession,$6];
else print $0",NOMATCH";
}
You can try
awk -f merge.awk file1 file2
where merge.awk is
NR==FNR {
    sub(/[[:blank:]]*$/,"")
    getSessionInfo(1)
    ar[ses,op]=",eventbase="evb",rule="rule
    next
}
{
    sub(/[[:blank:]]*$/,"")
    getSessionInfo(0)
    if ((ses,op) in ar)
        print $0 ar[ses,op]
    else
        print $0 ",NOMATCH"
}
# The three-argument form of match() used below is a GNU awk extension.
# The + and * quantifiers allow values longer than one character and an
# empty eventbase.
function getSessionInfo(f, a) {
    match($0,/session=([^|,]+)[|,]/,a)
    ses=a[1]
    match($0,/operation=([^,]+),/,a)
    op=a[1]
    if (f) {
        match($0,/eventbase=([^,]*),/,a)
        evb=a[1]
        match($0,/rule=(.*)$/,a)
        rule=a[1]
    }
}
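To see why if(nsession in a) never matched in the question: a[session,$3] is stored under the single string key session SUBSEP $3, so a lookup has to use both parts. A minimal sketch (the key values are taken from the sample data; r1 to r3 are throwaway names):

```shell
out=$(awk 'BEGIN {
    a["session=1", "operation=1"] = ",eventbase=4,rule=15"

    # Only one part of the key: no such element exists.
    r1 = ("session=1" in a) ? "found" : "missing"
    # Both parts, written as a subscript list or joined with SUBSEP.
    r2 = (("session=1", "operation=1") in a) ? "found" : "missing"
    r3 = (("session=1" SUBSEP "operation=1") in a) ? "found" : "missing"

    print "session only:  " r1
    print "(ses, op):     " r2
    print "ses SUBSEP op: " r3
}')
printf '%s\n' "$out"
```

The last two forms are equivalent; the parenthesized list is usually easier to read.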

awk '/range start/,/range end/' within script

How do I use the awk range pattern '/begin regex/,/end regex/' within a self-contained awk script?
To clarify, given program csv.awk:
#!/usr/bin/awk -f
BEGIN {
FS = "\""
}
/TREE/,/^$/
{
line="";
for (i=1; i<=NF; i++) {
if (i != 2) line=line $i;
}
split(line, v, ",");
if (v[5] ~ "FOAM") {
print NR, v[5];
}
}
and file chunk:
TREE
10362900,A,INSTL - SEAL,Revise
,10362901,A,ASSY / DETAIL - PANEL,Revise
,,-203,ASSY - PANEL,Qty -,Add
,,,-309,PANEL,Qty 1,Add
,,,,"FABRICATE FROM TEKLAM NE1G1-02-250 PER TPS-CN-500, TYPE A"
,,,-311,PANEL,Qty 1,Add
,,,,"FABRICATE FROM TEKLAM NE1G1-02-750 PER TPS-CN-500, TYPE A"
,,,-313,FOAM SEAL,1.00 X 20.21 X .50 THK,Qty 1,Add
,,,,"BMS1-68, GRADE B, FORM II, COLOR BAC706 (BLACK)"
,,,-315,FOAM SEAL,1.50 X 8.00 X .25 THK,Qty 1,Add
,,,,"BMS1-68, GRADE B, FORM II, COLOR BAC706 (BLACK)"
,PN HERE,Dual Lock,Add
,
10442900,IR,INSTL - SEAL,Update (not released)
,10362901,A,ASSY / DETAIL - PANEL,Revise
,PN HERE,Dual Lock,Add
I want to have this output:
27 FOAM SEAL
29 FOAM SEAL
What is the syntax for adding the command line form '/begin regex/,/end regex/' to the script to operate on those lines only? All my attempts lead to syntax errors and googling only gives me the cli form.
Why not use 2 steps:
% awk '/start/,/end/' < input.csv | awk -f csv.awk
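Simply put the opening brace on the same line as the range pattern. In awk a newline ends a pattern-only rule, so in your script /TREE/,/^$/ on its own line just prints the lines in the range, and the following { ... } block is a separate rule that runs on every line: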
#!/usr/bin/awk -f
BEGIN {
    FS = "\""
}
/from/,/to/ {
    line="";
    for (i=1; i<=NF; i++) {
        if (i != 2) line=line $i;
    }
    split(line, v, ",");
    if (v[5] ~ "FOAM") {
        print NR, v[5];
    }
}
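The effect of the brace position can be checked with a tiny made-up input; this sketch counts how many lines the action runs for in each layout:

```shell
# Made-up input: the range runs from the TREE line to the first empty line.
input=$(printf '%s\n' skip TREE a b '' after)

# Brace on the pattern line: the action runs only for the 4 lines in the range.
same_line=$(printf '%s\n' "$input" | awk '/TREE/,/^$/ { n++ } END { print n }')

# Brace on the next line: the bare range pattern prints its lines (default
# action) and the separate { n++ } rule runs for all 6 input lines.
next_line=$(printf '%s\n' "$input" | awk '/TREE/,/^$/
{ n++ } END { print n }' | tail -n 1)

echo "same line: $same_line, next line: $next_line"
```

Neither layout is a syntax error, which is why the misplaced brace is easy to overlook.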
If the from and to regexes are dynamic:
#!/usr/bin/awk -f
BEGIN {
    FS = "\""
    FROM = ARGV[1]
    TO = ARGV[2]
    # Remove the two patterns from the argument list so that awk does not
    # treat them as input files; with no file operand left, awk reads
    # standard input.
    delete ARGV[1]
    delete ARGV[2]
}
{ if ($0 ~ FROM) { p = 1; l = 0 } }
{ if ($0 ~ TO) { p = 0; l = 1 } }
{
    if (p == 1 || l == 1) {
        line="";
        for (i=1; i<=NF; i++) {
            if (i != 2) line=line $i;
        }
        split(line, v, ",");
        if (v[5] ~ "FOAM") {
            print NR, v[5];
        }
        l = 0
    }
}
Now you have to call it like: ./scriptname.awk "FROM_REGEX" "TO_REGEX" INPUTFILE. The last param is optional, if missing STDIN can be used.
HTH
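Another way to sketch the dynamic case, assuming it is acceptable to pass the regexes with -v instead of as positional arguments: both ends of a range pattern can be arbitrary expressions, so $0 ~ from, $0 ~ to behaves like /from/,/to/ with the regexes taken from variables. The variable names from and to and the sample input are made up:

```shell
# Range pattern built from two dynamic regexes passed in with -v.
out=$(printf '%s\n' one START two three END four |
    awk -v from='^START$' -v to='^END$' '
    $0 ~ from, $0 ~ to { print NR ": " $0 }')
printf '%s\n' "$out"
```

This keeps the normal file argument handling of awk, so no ARGV manipulation is needed.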
You need to show us what you have tried. Is there something about /begin regex/ or /end regex/ you're not telling us? Otherwise your script with the additions should work, i.e.
#!/usr/bin/awk -f
BEGIN {
    FS = "\""
}
/begin regex/,/end regex/{
    line="";
    for (i=1; i<=NF; i++) {
        if (i != 2) line=line $i;
    }
    split(line, v, ",");
    if (v[5] ~ "FOAM") {
        print NR, v[5];
    }
}
Or are you using an old Unix where /usr/bin/awk is the old awk and the new awk is /usr/bin/nawk? Also see if you have /usr/xpg4/bin/awk or gawk (the path could be anything).
Finally, show us the error messages you are getting.
I hope this helps.