Grouping GAWK output - gawk

Varlog kindly provided a solution to a previous problem, which looked at finding the DISCHARGE event corresponding to an INDUCT event and deleting it from an array. This left me with all the items that have not yet been discharged to their destination.
The script and its output are shown below.
/REDIRECT_ITEM_REPLY/ {
    match($0, /itemId=<([^>]+)>/, ary1)
    match($0, /CscdestinationId=<([^>]+)>/, ary2)
    dest[ary1[1]] = ary2[1]
}
/DISCHARGE_VERIFIED/ {
    match($0, /itemId=<([^>]+)>/, ary1)
    delete dest[ary1[1]]
}
END {
    for (id in dest) {
        print dest[id]
    }
}
OUTPUT:
17: CHU207
17: CHU207
35: CHU214
1: CHU001
157: FLY437
115: FLY424
108: FLY321
I would like to GROUP this information into something like:
CHU207 - 5
CHU001 - 10
FLY437 - 3
I was thinking about using the uniq command but wondered how to incorporate this into the script.
I have tried a command-line approach using uniq -c, but I'm not sure if this is the best approach:
gawk -f inductedNEW.awk item1.log | uniq -c
Appreciate your help
Phil

You don't need pipes here. You have everything you need in your awk program.
Change the END part of your script to:
END {
    print "Detail:"
    for (id in dest) {
        print dest[id]
        group = dest[id]               # group = "123: ABC"
        sub(/^[0-9]+: */, "", group)   # group = "ABC"
        groupCounter[group]++          # counter++
    }
    print "Grouped:"
    for (group in groupCounter) {
        print group " - " groupCounter[group]
    }
}
If you don't need the detail part, just remove it.
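If you do want the pipeline approach from your question, note that uniq -c only counts adjacent duplicate lines, so the output must be sorted first. A sketch (the sed call, an assumption here, strips the leading "id: " prefix; uniq -c prints the count before the destination rather than after it):
gawk -f inductedNEW.awk item1.log | sed 's/^[0-9]*: *//' | sort | uniq -c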


Grabbing value from piped file contents

Let's say I have the following file:
credentials:
[default]
key_id = AKIAGHJQTOP
secret_key = alcsjkf
[default2]
key_id = AKIADGHNKVP
secret_key = njprmls
I want to grab the value of [default] key_id. I'm trying to do it with the awk command, but I'm open to any other way if it's more efficient and easier. Instead of passing a file name to awk, I want to pass the file contents from the environment variable FILE_CONTENTS.
I tried the following:
$ export VAR=$(echo "$FILE_CONTENTS" | awk '/credentials.default.key_id/ {print $2}')
But it didn't work. Any help is appreciated.
You can use awk like this:
cat srch.awk
BEGIN { FS = " *= *" }
{ sub(/^[[:blank:]]+/, "") }
/:[[:blank:]]*$/ {
    sub(/:[[:blank:]]*$/, "")
    k = $1
}
/^[[:blank:]]*\[/ {
    s = k "." $1
}
NF == 2 {
    map[s "." $1] = $2
}
key in map {
    print map[key]
    exit
}
# then use it as
echo "$FILE_CONTENTS" |
awk -v key='credentials.[default].key_id' -f srch.awk
AKIAGHJQTOP
# or else
echo "$FILE_CONTENTS" |
awk -v key='credentials.[default].secret_key' -f srch.awk
alcsjkf
With your shown samples, please try the following awk code, written and tested in GNU awk.
awk -v RS='(^|\\n)credentials:\\n[[:space:]]+\\[default\\]\\n[[:space:]]+key_id = \\S+' '
RT && (num = split(RT, arr, " key_id = ")) {
    print arr[num]
}
' Input_file
Assumptions:
no spaces between labels and :
no spaces between [, the stanza name, and ]
all lines with attribute/value pairs have exactly 3 space-delimited fields as shown (ie, attr = value; value has no embedded spaces)
the contents of OP's variable (FILE_CONTENTS) are an exact copy (data and format) of the sample file provided by OP
NOTE: if the input file format can differ from these assumptions, then additional code must be added to address said differences; as mentioned in the comments, writing your own parser is doable, but you need to ensure you address all possible format variations.
One awk idea:
awk -v label='credentials' -v stanza='default' -v attr='key_id' '
/:/             { f1=0; if ($0 ~ (label ":"))          f1=1 }
f1 && /[][]/    { f2=0; if ($0 ~ ("\\[" stanza "\\]")) f2=1 }
f1 && f2 && /=/ { if ($1 == attr) { print $3; f1=f2=0 } }
'
This generates:
AKIAGHJQTOP
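Since the label, stanza, and attribute are all parameterized, the same three rules should work for the other stanza too; for example (assuming the same sample input):
echo "$FILE_CONTENTS" |
awk -v label='credentials' -v stanza='default2' -v attr='secret_key' '
/:/             { f1=0; if ($0 ~ (label ":"))          f1=1 }
f1 && /[][]/    { f2=0; if ($0 ~ ("\\[" stanza "\\]")) f2=1 }
f1 && f2 && /=/ { if ($1 == attr) { print $3; f1=f2=0 } }
'
njprmls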
$ awk 'f{print $3; exit} /\[default]/{f=1}' <<<"$FILE_CONTENTS"
AKIAGHJQTOP
If that's not all you need then edit your question to provide more truly realistic sample input/output including cases where the above doesn't work.
open to any other way if it's more efficient and easier
I suggest taking a look at Python's configparser, which is part of the standard library. Let the FILE_CONTENTS environment variable hold
credentials:
[default]
key_id = AKIAGHJQTOP
secret_key = alcsjkf
[default2]
key_id = AKIADGHNKVP
secret_key = njprmls
then create file getkeyid.py with content as follows
import configparser
import os
config = configparser.ConfigParser()
config.read_string(os.environ["FILE_CONTENTS"].replace("credentials","#credentials",1))
print(config["default"]["key_id"])
and do
python3 getkeyid.py
to get output
AKIAGHJQTOP
Explanation: I retrieve the string from the environment variable and replace credentials with #credentials at most 1 time in order to comment out that line (otherwise the parser will fail), then parse it and retrieve the value corresponding to the desired key.
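To land the value in a shell variable, as in the original attempt, the script's output can simply be captured (reusing the getkeyid.py file from above):
export VAR=$(python3 getkeyid.py)
echo "$VAR"    # AKIAGHJQTOP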

Generating a new file after processing data in Shell script

The input file shown below is generated by combining the results of 2 other files, i.e.:
awk 'BEGIN{FS=OFS=","} FNR==NR{arr[$0];next} {print $1,$2,$3,$5,($4 in arr)?1:0}' $NGW_REG_RESP_FILE $NGW_REG_REQ_FILE >> $NGW_REG_FILE
The $NGW_REG_FILE file contains the data below; based on that, I have to create a new file.
2020-12-21,18,1,1,1
2020-12-21,18,1,1,0
2020-12-21,18,1,2,1
2020-12-21,18,1,2,1
2020-12-21,18,2,1,1
2020-12-21,18,2,1,1
2020-12-21,18,2,1,0
2020-12-21,18,3,2,1
2020-12-21,18,3,2,1
2020-12-21,18,4,2,0
2020-12-21,18,4,2,1
2020-12-21,18,3,2,0
What this data indicates is:
Date,Hour,Quarter,ReqType,Success/Failed
ReqType has 2 possible values: 1 -> incoming, 2 -> outgoing
last field: 1 -> success, 0 -> failed
Quarter -> 1,2,3,4
I want to read this file and generate a new file that contains data like below (MY OUTPUT FILE):
2020-12-21,18,1,1,1,1
2020-12-21,18,1,2,2,0
2020-12-21,18,2,1,2,1
.....
Explanation:
heading: date,hour,quarter,reqType,Success_count,Failure_count (for reference, to understand the output file)
Date H Q ReqID SuccessCnt FailCnt
2020-12-21,18,1,1,1,1
Explanation: in the input file, both ReqTypes (1 & 2) are present for quarter 1, and there will be at most 2 entries in each quarter. In quarter 1, for ReqID 1, there were 2 requests: one succeeded and the other failed, so the success count is 1 and the failure count is 1.
2020-12-21,18,1,2,2,0
Here, in quarter 1 for ReqID 2, there were 2 requests and both succeeded, so the success count is 2 and the failure count is 0.
UPDATE
The answer given below worked exactly as I was looking for.
I have an update to the sample input file: one more column gets added before the last column, i.e. the STATUS CODE, which you can see in the input below (200, 400, 300):
2020-12-21,18,1,1,200,1
2020-12-21,18,2,1,400,0
2020-12-21,18,2,1,300,0
The existing code gives the below result in the output file, i.e. the total count of success/failed in that quarter, which is correct.
What I want to do is add one more column to the output file, next to the total failed count: the array holding those status codes.
2020-12-21,18,1,1,1,0,[]          // empty array at the end because there are no failed requests, 1 successful request
2020-12-21,18,2,1,0,2,[400,300]   // here, 2 failed requests, 0 successful requests
<DATE>,<HOUR>,<QUARTER>,<REQ_TYPE>,<SUCCESS_COUNT>,<FAIL_CNT>,<ARRAY_HOLDING_STATUSCODE>
I have added the below changes to the code, but I am not getting how to iterate inside the same for loop:
cat $input_file | grep -v Orig | awk -F, '{
    if ($NF==1) {
        map[$1][$2][$3][$4]["success"]++
    }
    else {
        map[$1][$2][$3][$4]["fail"]++
        harish[$1][$2][$3][$4][$5]++   # ADDED THIS
    }
}
END {
    PROCINFO["sorted_in"]="#ind_num_asc";
    for (i in map) {
        for (j in map[i]) {
            for (k in map[i][j]) {
                for (l in map[i][j][k]) {
                    print i","j","k","l","(map[i][j][k][l]["success"]==""?"0":map[i][j][k][l]["success"])","(map[i][j][k][l]["fail"]==""?"0":map[i][j][k][l]["fail"])
                }
            }
        }
    }
}' >> OUTPUT_FILE.txt
With awk (GNU awk for array sorting):
awk -F, '{ if ($NF==1) { map[$1][$2][$3][$4]["success"]++ } else { map[$1][$2][$3][$4]["fail"]++ } } END { PROCINFO["sorted_in"]="#ind_num_asc";for (i in map) { for (j in map[i]) { for (k in map[i][j]) { for (l in map[i][j][k]) { print i","j","k","l","(map[i][j][k][l]["success"]==""?"0":map[i][j][k][l]["success"])","(map[i][j][k][l]["fail"]==""?"0":map[i][j][k][l]["fail"]) } } } } }' $NGW_REG_FILE
Explanation:
awk -F, '{
    if ($NF==1) {
        map[$1][$2][$3][$4]["success"]++   # If the last field is 1, increment a success index in array map, with the other fields as further indexes
    }
    else {
        map[$1][$2][$3][$4]["fail"]++      # Otherwise increment a fail index
    }
}
END {
    PROCINFO["sorted_in"]="#ind_num_asc";  # Set the array ordering
    for (i in map) {
        for (j in map[i]) {
            for (k in map[i][j]) {
                for (l in map[i][j][k]) {
                    # Loop through the array and print the data in the format required.
                    # If there is no entry in the success or fail index, print 0.
                    print i","j","k","l","(map[i][j][k][l]["success"]==""?"0":map[i][j][k][l]["success"])","(map[i][j][k][l]["fail"]==""?"0":map[i][j][k][l]["fail"])
                }
            }
        }
    }
}' $NGW_REG_FILE
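To address the update above (printing the failed status codes as a bracketed list), one option is to walk the status-code sub-array inside the existing innermost loop, joining the codes into a string before printing. A sketch, untested; reusing the OP's harish array, and the [code,code] formatting is an assumption:
for (l in map[i][j][k]) {
    codes = ""
    for (s in harish[i][j][k][l])                 # failed status codes collected for this group
        codes = (codes == "" ? s : codes "," s)
    print i","j","k","l","(map[i][j][k][l]["success"]==""?"0":map[i][j][k][l]["success"])","(map[i][j][k][l]["fail"]==""?"0":map[i][j][k][l]["fail"])",[" codes "]"
}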

How to find the maximum value for the field by ignoring the lines with characters using awk?

Since I am a newbie to awk, please help me with your suggestions. I tried the below commands to filter the maximum value and to ignore the first & last lines from the sample text file. They work when I try them separately.
My query:
I need to ignore the last line and the first few lines of the file, and then take the maximum value of field 7 using awk.
I also need to ignore the lines containing alphabetic characters. Can anyone suggest how to use both commands together to get the required output?
Sample file:
Linux 3.10.0-957.5.1.el7.x86_64 (j051s784) 11/24/2020 _x86_64_ (8 CPU)
12:00:02 AM kbmemfree kbmemused %memused kbbuffers kbcached kbcommit %commit kbactive kbinact kbdirty
12:10:01 AM 4430568 61359128 93.27 1271144 27094976 66771548 33.04 39005492 16343196 1348
12:20:01 AM 4423380 61366316 93.28 1271416 27102292 66769396 33.04 39012312 16344668 1152
12:30:04 AM 4406324 61383372 93.30 1271700 27108332 66821724 33.06 39028320 16343668 2084
12:40:01 AM 4404100 61385596 93.31 1271940 27107724 66799412 33.05 39031244 16344532 1044
06:30:04 PM kbmemfree kbmemused %memused kbbuffers kbcached kbcommit %commit kbactive kbinact kbdirty
07:20:01 PM 3754904 62034792 94.29 1306112 27555948 66658632 32.98 39532204 16476848 2156
Average: 4013043 61776653 93.90 1293268 27368986 66755606 33.03 39329729 16427160 2005
Commands used:
cat testfile | awk '{print $7}' | head -n -1 | tail -n+7
awk 'BEGIN{a= 0}{if ($7>0+a) a=$7} END{print a}' testfile
Expected output:
The maximum value for column 7, excluding the lines that contain alphabetic characters.
1st solution (generic): Here is a generic solution where you pass the field name whose maximum value you want to an awk variable; the script automatically finds the corresponding field number from the very first line and works accordingly. This assumes that your first line contains the field name you want to look for.
awk -v var="kbcached" '
FNR==1{
    for (i=1; i<=NF; i++) {
        if ($i==var) { field=i }
    }
    next
}
/kbmemused/{
    next
}
{
    if ($2!~/^[AP]M$/) {
        val=$(field-1)
    }
    else {
        val=$field
    }
}
{
    max=(max>val?max:val)
    val=""
}
END{
    print "Maximum value is:" max
}
' Input_file
2nd solution (as per shown samples only): Please try the following, based on your shown samples only. I am assuming you want the value of the kbcached column.
awk '
/kbmemfree/{
    next
}
{
    if ($2!~/^[AP]M$/) {
        val=$6
    }
    else {
        val=$7
    }
}
{
    max=(max>val?max:val)
    val=""
}
END{
    print "Maximum value is:" max
}
' Input_file
awk '$7 ~ /^[[:digit:]]+$/ && $1 != "Average:" {
    max[$7]=""
}
END {
    PROCINFO["sorted_in"]="#ind_num_asc";
    for (i in max) {
        maxtot=i
    }
    print maxtot
}' file
One liner:
awk '$7 ~ /^[[:digit:]]+$/ && $1 != "Average:" { max[$7]="" } END { PROCINFO["sorted_in"]="#ind_num_asc";for (i in max) { maxtot=i } print maxtot }' file
Using GNU awk, search for lines where field 7 contains only digits and field one is not "Average:". In these instances, create an array entry with field 7 as the index. At the end, sort the array in ascending numeric index order and loop through it, setting a maxtot variable on each pass. The last entry in the max array is the highest kbcached value, so print maxtot.
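If GNU awk's array sorting isn't available, a portable sketch that tracks the maximum directly (forcing numeric comparison with $7+0) should give the same result:
awk '$7 ~ /^[0-9]+$/ && $1 != "Average:" && $7+0 > max { max = $7+0 } END { print max }' file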

awk: match pattern and print lines after and before till next pattern

Using awk:
Find a pattern.
Print all lines after that pattern till next pattern.
Print all lines before that pattern till next pattern.
eg. if this is the content of the file
?hello#
line-0
?type=A;so on
line-1
short-description
line-2
line-3
ending#
line-4
?bye#
match pattern short-description, print the lines after it till pattern #, and print the lines before it till pattern ?, so the output should be:
?type=A;so on
line-1
short-description
line-2
line-3
ending#
I tried: awk '/short-description/{copy=1;next} /#/{copy=0;next} copy' file
but I don't know how to get the before-pattern part; I have very limited knowledge of awk. A one-line solution would also be appreciated.
Please help. Thanks a lot.
Try:
/^\?/ { delete arr; len = 0; hit = 0 }
/^\?/,/#$/ {
    arr[len++] = $0
    if (/short-description/)
        hit = 1
}
/#$/ {
    if (hit)
        for (i = 0; i < len; ++i)
            print arr[i]
}
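Assuming the script above is saved as, say, extract.awk (the file name is arbitrary), it runs as:
awk -f extract.awk file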
Or, this one-liner, which makes ? the record separator and prints the matching record up to the first #:
BEGIN { RS="?" } /short-description/ { print "?" substr($0, 1, index($0, "#")) }

Using a variable defined inside AWK

I got this piece of script working. This is what I wanted:
input
3.76023 0.783649 0.307724 8766.26
3.76022 0.764265 0.307646 8777.46
3.7602 0.733251 0.30752 8821.29
3.76021 0.752635 0.307598 8783.33
3.76023 0.79528 0.307771 8729.82
3.76024 0.814664 0.307849 8650.2
3.76026 0.845679 0.307978 8802.97
3.76025 0.826293 0.307897 8690.43
with script
#!/bin/bash
awk -F ', ' '
{
    for (i=3; i<=10; i++) {
        if (i==NR) {
            npc1[i]=sprintf("%s", $1);
            npc2[i]=sprintf("%s", $2);
            npc3[i]=sprintf("%s", $3);
            npRs[i]=sprintf("%s", $4);
            print npc1[i], npc2[i], \
                  npc3[i], npRs[i];
        }
    }
} ' p_walls.raw
echo "${npc1[100]}"
But now I can't use those arrays, like npc1[i], outside awk. That last echo prints nothing. Isn't it possible, or am I missing something?
awk is a separate process; after it finishes, all of its internal data is gone. This is true for all external processes/commands. Bash only sees what bash builtins touch.
i is never 100, so why do you want to access npc1[100]?
What are you really trying to do? If you rewrite the question we might be able to help...
(Cherry on the cake is always good!)
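If the goal is just to use those column values in the shell, one option (a sketch assuming whitespace-separated input; mapfile requires bash 4+) is to capture awk's output into a bash array instead:
mapfile -t npc1 < <(awk 'NR>=3 && NR<=10 {print $1}' p_walls.raw)
echo "${npc1[0]}"    # first captured value of column 1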
Sorry, but all of @yi_H's answers and comments above are correct.
But there's really no problem loading 2 sets of data into 2 separate arrays in awk, i.e.:
awk '{
    if (FILENAME == "file1") arr1[++i] = $0
    else                     arr2[++j] = $0   # same idea for file2
}
END {
    f1max = i; f2max = j
    for (i = 1; i <= f1max; i++) {
        # put what you need here for arr1 processing
        #
        # don't forget that you can do things like
        if (arr1[i] in arr2) { print arr1[i] " = arr2[arr1[" i "]] = " arr2[arr1[i]] }
    }
    for (j = 1; j <= f2max; j++) {
        # and here for arr2 processing
    }
}' file1 file2
You'll have to fill in the actual processing for arr1[i] and arr2[j].
Also, get an awk book for the weekend and be up and running by Monday. It's easy. You can probably figure it out from grymoire.com/Unix/awk.html
I hope this helps.