How can I embed arguments in an awk script? - scripting
This is the evolution of these two questions: here and here.
For my own learning, I'm trying to accomplish two (more) things with the code below:
Instead of invoking my script with # myscript -F "," file_to_process, how can I fold the -F "," part into the script itself?
How can I initialize a variable so that I assign it a value only once (ignoring subsequent matches)? You can see from the script that I parse seconds and microseconds in each rule; I'd like to keep the first assignment of sec around so I can subtract it from subsequent values in the printf() statement.
#!/usr/bin/awk -f
/DIAG:/ {
    lbl = $3
    sec = $5
    usec = $6
    # nested pattern-action rules are a syntax error inside an action,
    # so the per-test matches are expressed as if statements
    if ($0 ~ /Test-S/) {
        stgt = $7
        s1 = $30
        s2 = $31
    }
    if ($0 ~ /Test-A/) {
        atgt = $7
        a = $8
    }
    if ($0 ~ /Test-H/) {
        htgt = $7
        h = $8
    }
    if ($0 ~ /Test-C/) {
        ctgt = $7
        c = $8
    }
}
/WARN:/ {
    sec = $4
    usec = $5
    m1 = $2
    m2 = $3
}
{
    printf("%16s,%17d.%06d,%7.2f,%7.2f,%7.2f,%7.2f,%7.2f,%7.2f,%7.2f,%7.2f,%7.2f,%5d,%5d\n", lbl, sec, usec, stgt, s1, s2, atgt, a, htgt, h, ctgt, c, m1, m2)
}
Use a BEGIN clause:
BEGIN {
    FS = ","
    var1 = "text"
    var2 = 3
    etc.
}
This is run once, before any of the line-processing statements. FS is the field separator, so setting FS = "," inside BEGIN has the same effect as invoking awk with -F ",".
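Putting that together, a minimal sketch (file names and fields are illustrative): the BEGIN block sets FS, so no -F "," is needed on the command line.

```shell
cat > myscript <<'EOF'
#!/usr/bin/awk -f
BEGIN { FS = "," }           # replaces the -F "," command-line option
{ print "field 2 is:", $2 }  # per-line rules follow as usual
EOF
chmod +x myscript
printf 'a,b,c\nd,e,f\n' > sample.txt
awk -f myscript sample.txt   # or ./myscript sample.txt, thanks to the shebang
```

This prints "field 2 is: b" and "field 2 is: e" without any option on the command line.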
If you want to parse a value and keep it, it depends on whether you want only the first one or you want each previous one.
To keep the first one:
FNR == 1 {
    keep = $1
}
To keep the previous one:
BEGIN {
    prevone = "initial value"
}
/regex/ {
    do stuff with $1
    do stuff with prevone
    prevone = $1
}
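Applied to the original poster's case, keeping the first timestamp so later matches print an offset from it. A minimal sketch (the log lines and field positions are invented to mimic the question's layout):

```shell
printf 'DIAG: x lbl1 y 100 500\nDIAG: x lbl2 y 103 900\n' > log.txt
awk '
  /DIAG:/ {
      sec = $5
      if (first == "") first = sec   # assign only on the first match
      printf "%s elapsed: %d\n", $3, sec - first
  }
' log.txt
```

The test `first == ""` is true only before the first assignment, since an unset awk variable compares equal to the empty string.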
Related
awk: preserve row order and remove duplicate strings (mirrors) when generating data
I have two text files.

g1.txt:

alfa beta;www.google.com
Light Dweller - CR, Technical Metal;http://alfa.org;http://beta.org;http://gamma.org;

g2.txt:

Jack to ride.zip;http://alfa.org;
JKr.rui.rar;http://gamma.org;
Nofj ogk.png;http://gamma.org;

I use this command to run my awk script:

awk -f ./join2.sh g1.txt g2.txt > "g3.txt"

and I obtain this output:

Light Dweller - CR, Technical Metal;http://alfa.org;http://beta.org;http://gamma.org;;Jack to ride.zip;http://alfa.org;JKr.rui.rar;http://gamma.org;Nofj ogk.png;http://gamma.org;
alfa beta;www.google.com;

What are the problems?
1. Row order is not preserved: in the output file g3.txt the line alfa beta;www.google.com; comes after the Light... line, when it should come first, as you can see in g1.txt.
2. I have many mirror (duplicate) strings in the Light... line: you can see that in g3.txt http://alfa.org and http://gamma.org are repeated in the same row.

What kind of output do I want instead? Like this:

alfa beta;www.google.com
Light Dweller - CR, Technical Metal;http://alfa.org;http://beta.org;http://gamma.org;Jack to ride.zip;JKr.rui.rar;Nofj ogk.png;

First: each string enclosed within delimiters must be present once and only once per row. This rule should only apply to the output file, g3.txt.
Second: the original order of rows in g1.txt must be maintained in the g3.txt output file. My script currently returns the rows in a different order, and I want to prevent that reordering.

My join2.sh script is this:

#!/usr/bin/awk -f
BEGIN {
    OFS = FS = ";"
    C = 0
}
{
    if (ARGIND == 1) {
        X = $NF
        T0[$NF] = C++
        $NF = ""
        if (T1[X]) { T1[X] = T1[X] $0 } else { T1[X] = $0 }
    } else {
        X = $NF
        T0[$NF] = C++
        $NF = ""
        if (T2[X]) { T2[X] = T2[X] $0 } else { T2[X] = $0 }
    }
}
END {
    for (X in T0) {
        # concatenate T1[X] and X, since T1[X] ends with ";"
        print T1[X] X, T2[X]
    }
}

SOLUTION:
You should process g2.txt first, like this:

cat join2.awk
BEGIN { OFS = FS = ";" }
ARGIND == 1 {
    map[$2] = ($2 in map ? map[$2] OFS : "") $1
    next
}
{
    r = $0
    for (i = 1; i <= NF; ++i)
        if ($i in map)
            r = r OFS map[$i]
    $0 = r
}
1

(The final 1 is an always-true pattern with no action, i.e. a shorthand for { print }.)

Then use it as:

awk -f join2.awk g2.txt g1.txt
alfa beta;www.google.com
Light Dweller - CR, Technical Metal;http://alfa.org;http://beta.org;http://gamma.org;;Jack to ride.zip;JKr.rui.rar;Nofj ogk.png
Migration from Pelican to Hugo
I was reading an article showing how to migrate markdown files from Pelican to Hugo. I'm trying to understand what the awk script is doing:

# begin block, executed once,
# to set field separator, output field separator & print 3 dashes
BEGIN { FS = ":"; OFS = ":"; print "---" }

# ???
!c && /^$/ { print "---\n"; c = 1 }

# user defined function?
c { print; next }

# user defined function?
!c {
    # lower first field
    $1 = tolower($1)
    # if first field is "date"
    if ($1 == "date") {
        # transform second field
        $2 = gensub(/ ([^.]+)\.([^.]+).([^.]+)/, " \\3-\\2-\\1", 1, $2)
        $2 = gensub(/-([0-9])-/, "-0\\1-", 1, $2)
    }
    if ($1 == "tags")
        $2 = " [" gensub(/[-a-z]+/, "'\\0'", "g", substr($2, 2)) "]"
    print
}

I don't really understand: what are c and !c? Are they user-defined functions, without the function keyword and without parameters? What exactly is the meaning of c = 1?
c is a variable, not a function, and c = 1 sets its value to 1.

Used alone as a pattern, c is a test of the variable c: it is true if c holds anything other than 0 or the empty string. !c is the opposite test: true if c is unset or 0.

c { print; next }

means: if c is set to something other than nothing or 0, then print (this prints the whole line, since print is given no arguments). next stops processing the current line and skips to the next input line, starting over at the first rule.
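A tiny illustration of the flag idiom (the sample file is invented, not from the article): skip everything up to and including the first blank line, then print the rest.

```shell
printf 'title: x\n\nbody line 1\nbody line 2\n' > post.txt
awk '
  !c && /^$/ { c = 1; next }  # first blank line: set the flag, consume the line
  c                           # flag set: print every following line
' post.txt
```

Only "body line 1" and "body line 2" are printed; the header line and the blank line are consumed before the flag is set.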
gsub for substituting translations not working
I have a dictionary dict with records separated by ":" and data fields by newlines, for example:

:one
1
:two
2
:three
3
:four
4

Now I want awk to substitute all occurrences of each record in the input file, e.g.

onetwotwotwoone two
threetwoone four

My first awk script looked like this and works just fine:

BEGIN { RS = ":" ; FS = "\n" }
NR == FNR {
    rep[$1] = $2
    next
}
{
    for (key in rep)
        gsub(key, rep[key])
    print
}

giving me:

12221 2
321 4

Unfortunately another dict file contains some characters used by regular expressions, so I have to substitute escape characters in my script. By moving key and rep[key] into strings (which can then be parsed for escape characters), the script will only substitute the second record in the dict. Why? And how to solve it? Here's the current second part of the script:

{
    for (key in rep)
        orig = key
        trans = rep[key]
        gsub(/[\]\[^$.*?+{}\\()|]/, "\\\\&", orig)
        gsub(orig, trans)
    print
}

All scripts are run by awk -f translate.awk dict input. Thanks in advance!
Your fundamental problem is using strings in regexp and backreference contexts when you don't want them, and then trying to escape the metacharacters in your strings to disable the behavior that you're enabling by using them in those contexts. If you want strings, use them in string contexts, that's all.

You don't want this:

gsub(regexp, backreference-enabled-string)

You want something more like this:

index(..., string)
substr(string)

I think this is what you're trying to do:

$ cat tst.awk
BEGIN { FS = ":" }
NR == FNR {
    if ( NR%2 ) { key = $2 }
    else        { rep[key] = $0 }
    next
}
{
    for ( key in rep ) {
        head = ""
        tail = $0
        while ( start = index(tail, key) ) {
            head = head substr(tail, 1, start-1) rep[key]
            tail = substr(tail, start + length(key))
        }
        $0 = head tail
    }
    print
}

$ awk -f tst.awk dict file
12221 2
321 4
Never mind for asking... just some missing braces?!

{
    for (key in rep) {
        orig = key
        trans = rep[key]
        gsub(/[\]\[^$.*?+{}\\()|]/, "\\\\&", orig)
        gsub(orig, trans)
    }
    print
}

works like a charm. Without the braces, only the first statement after the for was part of the loop body.
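The difference the escaping makes shows up on any key containing a metacharacter. A standalone sketch (the sample strings are invented): as a raw regex, a+b means "one or more a followed by b", so it misses the literal text a+b; after escaping, it matches it.

```shell
echo 'xa+by' | awk '{ gsub(/a+b/, "Z"); print }'   # regex a+b: no literal match, line unchanged
echo 'xa+by' | awk '{
    orig = "a+b"
    gsub(/[\]\[^$.*?+{}\\()|]/, "\\\\&", orig)     # orig becomes "a\+b"
    gsub(orig, "Z")
    print
}'
```

The first command prints xa+by untouched; the second prints xZy because the escaped dynamic regex matches the literal "a+b".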
awk: count selective combinations only
Would like to read and count the field value == "TRUE" only from the 3rd field to the 5th field.

Input.txt
Locationx,Desc,A,B,C,Locationy
ab123,Name1,TRUE,TRUE,TRUE,ab1234
ab123,Name2,TRUE,FALSE,TRUE,ab1234
ab123,Name2,FALSE,FALSE,TRUE,ab1234
ab123,Name1,TRUE,TRUE,TRUE,ab1234
ab123,Name2,TRUE,TRUE,TRUE,ab1234
ab123,Name3,FALSE,FALSE,FALSE,ab1234
ab123,Name3,TRUE,FALSE,FALSE,ab1234
ab123,Name3,TRUE,TRUE,FALSE,ab1234
ab123,Name3,TRUE,TRUE,FALSE,ab1234
ab123,Name1,TRUE,TRUE,FALSE,ab1234

While reading the headers from the 3rd field to the 5th field, i.e. A, B, C, I want to generate unique combinations like A, B, C, AB, AC, BC, ABC only. Note: AA, BB, CC, BA etc. are excluded. If a "TRUE" is counted toward the "AB" combination, it should not be counted again toward the "A" count or the "B" count, to avoid duplicates.

Example #1
Locationx,Desc,A,B,C,Locationy
ab123,Name1,TRUE,TRUE,TRUE,ab1234
Op #1
Desc,A,B,C,AB,AC,BC,ABC
Name1,,,,,,,1

Example #2
Locationx,Desc,A,B,C,Locationy
ab123,Name1,TRUE,TRUE,FALSE,ab1234
Op #2
Desc,A,B,C,AB,AC,BC,ABC
Name1,,,,1,,,

Example #3
Locationx,Desc,A,B,C,Locationy
ab123,Name1,FALSE,TRUE,FALSE,ab1234
Op #3
Desc,A,B,C,AB,AC,BC,ABC
Name1,,1,,,,,

Desired Output:
Desc,A,B,C,AB,AC,BC,ABC
Name1,,,,1,,,2
Name2,,,1,,1,,1
Name3,1,,,2,,,

The actual file is like below:

Input.txt
Locationx,Desc,INCOMING,OUTGOING,SMS,RECHARGE,DEBIT,DATA,Locationy
ab123,Name1,TRUE,TRUE,TRUE,FALSE,FALSE,FALSE,ab1234
ab123,Name2,TRUE,TRUE,FALSE,TRUE,TRUE,TRUE,ab1234
ab123,Name2,TRUE,TRUE,TRUE,TRUE,FALSE,FALSE,ab1234
ab123,Name1,TRUE,TRUE,TRUE,TRUE,FALSE,TRUE,ab1234
ab123,Name2,TRUE,TRUE,TRUE,TRUE,FALSE,TRUE,ab1234
ab123,Name3,FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,ab1234
ab123,Name3,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,ab1234
ab123,Name3,TRUE,TRUE,FALSE,TRUE,FALSE,FALSE,ab1234
ab123,Name3,TRUE,TRUE,FALSE,TRUE,FALSE,FALSE,ab1234
ab123,Name1,TRUE,TRUE,FALSE,FALSE,FALSE,TRUE,ab1234

Have tried a lot, nothing has materialised; any suggestions please!
Edit: Desired Output from the actual Input:

Desc,INCOMING-OUTGOING-SMS-RECHARGE-DEBIT-DATA,OUTGOING-SMS-RECHARGE-DEBIT-DATA,INCOMING-SMS-RECHARGE-DEBIT-DATA,INCOMING-OUTGOING-RECHARGE-DEBIT-DATA,INCOMING-OUTGOING-SMS-RECHARGE-DATA,INCOMING-OUTGOING-SMS-RECHARGE-DEBIT,SMS-RECHARGE-DEBIT-DATA,OUTGOING-RECHARGE-DEBIT-DATA,OUTGOING-SMS-RECHARGE-DATA,OUTGOING-SMS-RECHARGE-DEBIT,INCOMING-RECHARGE-DEBIT-DATA,INCOMING-SMS-DEBIT-DATA,INCOMING-SMS-RECHARGE-DATA,INCOMING-SMS-RECHARGE-DEBIT,INCOMING-OUTGOING-DEBIT-DATA,INCOMING-OUTGOING-RECHARGE-DATA,INCOMING-OUTGOING-RECHARGE-DEBIT,INCOMING-OUTGOING-SMS-DATA,INCOMING-OUTGOING-SMS-DEBIT,INCOMING-OUTGOING-SMS-RECHARGE,RECHARGE-DEBIT-DATA,SMS-DEBIT-DATA,SMS-RECHARGE-DATA,SMS-RECHARGE-DEBIT,OUTGOING-RECHARGE-DATA,OUTGOING-RECHARGE-DEBIT,OUTGOING-SMS-DATA,OUTGOING-SMS-DEBIT,OUTGOING-SMS-RECHARGE,INCOMING-DEBIT-DATA,INCOMING-RECHARGE-DATA,INCOMING-RECHARGE-DEBIT,INCOMING-SMS-DATA,INCOMING-SMS-DEBIT,INCOMING-SMS-RECHARGE,INCOMING-OUTGOING-DATA,INCOMING-OUTGOING-DEBIT,INCOMING-OUTGOING-RECHARGE,INCOMING-OUTGOING-SMS,DEBIT-DATA,RECHARGE-DATA,RECHARGE-DEBIT,SMS-DATA,SMS-DEBIT,SMS-RECHARGE,OUTGOING-DATA,OUTGOING-DEBIT,OUTGOING-RECHARGE,OUTGOING-SMS,INCOMING-DATA,INCOMING-DEBIT,INCOMING-RECHARGE,INCOMING-SMS,INCOMING-OUTGOING,DATA,DEBIT,RECHARGE,SMS,OUTGOING,INCOMING
Name1,,,,,1,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1,,,1,,,,,,,,,,,,,,,,,,,,,
Name2,,,,1,1,,,,,,,,,,,,,,,1,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
Name3,1,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,2,,,,,,,,,,,,,,,,,,,1,,,

I don't have Perl or Python access!
I have written a perl script that does this for you. As you can see from the size and comments, it is really simple to get this done.

#!/usr/bin/perl
use strict;
use warnings;
use autodie;
use Algorithm::Combinatorics qw(combinations);

## change the file to the path where your file exists
open my $fh, '<', 'file';
my (%data, @new_labels);

## capture the header line in an array
my @header = split /,/, <$fh>;

## backup the header
my @fields = @header;

## remove first, second and last columns
@header = splice @header, 2, -1;

## generate unique combinations
for my $iter (1 .. +@header) {
    my $combination = combinations(\@header, $iter);
    while (my $pair = $combination->next) {
        push @new_labels, "@$pair";
    }
}

## iterate through rest of the file
while (my $line = <$fh>) {
    my @line = split /,/, $line;

    ## identify combined labels that are true
    my @is_true = map { $fields[$_] } grep { $line[$_] eq "TRUE" } 0 .. $#line;

    ## increment counter in hash map keyed at description and then new labels
    ++$data{$line[1]}{$_} for map { s/ /-/g; $_ } "@is_true";
}

## print the new header
print join(",", "Desc", map { s/ /-/g; $_ } reverse @new_labels) . "\n";

## print the description and counter values
for my $desc (sort keys %data) {
    print join(",", $desc, (map { $data{$desc}{$_} //= "" } reverse @new_labels)) . "\n";
}

Output:

Desc,INCOMING-OUTGOING-SMS-RECHARGE-DEBIT-DATA,OUTGOING-SMS-RECHARGE-DEBIT-DATA,INCOMING-SMS-RECHARGE-DEBIT-DATA,INCOMING-OUTGOING-RECHARGE-DEBIT-DATA,INCOMING-OUTGOING-SMS-DEBIT-DATA,INCOMING-OUTGOING-SMS-RECHARGE-DATA,INCOMING-OUTGOING-SMS-RECHARGE-DEBIT,SMS-RECHARGE-DEBIT-DATA,OUTGOING-RECHARGE-DEBIT-DATA,OUTGOING-SMS-DEBIT-DATA,OUTGOING-SMS-RECHARGE-DATA,OUTGOING-SMS-RECHARGE-DEBIT,INCOMING-RECHARGE-DEBIT-DATA,INCOMING-SMS-DEBIT-DATA,INCOMING-SMS-RECHARGE-DATA,INCOMING-SMS-RECHARGE-DEBIT,INCOMING-OUTGOING-DEBIT-DATA,INCOMING-OUTGOING-RECHARGE-DATA,INCOMING-OUTGOING-RECHARGE-DEBIT,INCOMING-OUTGOING-SMS-DATA,INCOMING-OUTGOING-SMS-DEBIT,INCOMING-OUTGOING-SMS-RECHARGE,RECHARGE-DEBIT-DATA,SMS-DEBIT-DATA,SMS-RECHARGE-DATA,SMS-RECHARGE-DEBIT,OUTGOING-DEBIT-DATA,OUTGOING-RECHARGE-DATA,OUTGOING-RECHARGE-DEBIT,OUTGOING-SMS-DATA,OUTGOING-SMS-DEBIT,OUTGOING-SMS-RECHARGE,INCOMING-DEBIT-DATA,INCOMING-RECHARGE-DATA,INCOMING-RECHARGE-DEBIT,INCOMING-SMS-DATA,INCOMING-SMS-DEBIT,INCOMING-SMS-RECHARGE,INCOMING-OUTGOING-DATA,INCOMING-OUTGOING-DEBIT,INCOMING-OUTGOING-RECHARGE,INCOMING-OUTGOING-SMS,DEBIT-DATA,RECHARGE-DATA,RECHARGE-DEBIT,SMS-DATA,SMS-DEBIT,SMS-RECHARGE,OUTGOING-DATA,OUTGOING-DEBIT,OUTGOING-RECHARGE,OUTGOING-SMS,INCOMING-DATA,INCOMING-DEBIT,INCOMING-RECHARGE,INCOMING-SMS,INCOMING-OUTGOING,DATA,DEBIT,RECHARGE,SMS,OUTGOING,INCOMING
Name1,,,,,,1,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1,,,1,,,,,,,,,,,,,,,,,,,,,
Name2,,,,1,,1,,,,,,,,,,,,,,,,1,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
Name3,1,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,2,,,,,,,,,,,,,,,,,,,1,,,

Note: Please revisit your expected output. It has a few mistakes in it, as you can see from the output generated by the script above.
Here is an attempt at solving this using awk.

Content of script.awk:

BEGIN { FS = OFS = "," }

function combinations(flds, itr,    i, pre) {
    for (i = ++cnt; i <= numRecs; i++) {
        ++n
        sep = ""
        for (pre = 1; pre <= itr; pre++) {
            newRecs[n] = newRecs[n] sep (sprintf ("%s", flds[pre]))
            sep = "-"
        }
        newRecs[n] = newRecs[n] sep (sprintf ("%s", flds[i]))
    }
}

NR==1 {
    for (fld = 3; fld < NF; fld++) {
        recs[++numRecs] = $fld
    }
    for (iter = 0; iter < numRecs; iter++) {
        combinations(recs, iter)
    }
    next
}

!seen[$2]++ { desc[++d] = $2 }

{
    y = 0; var = sep = ""
    for (idx = 3; idx < NF; idx++) {
        if ($idx == "TRUE") {
            is_true[++y] = recs[idx-2]
        }
    }
    for (z = 1; z <= y; z++) {
        var = var sep sprintf ("%s", is_true[z])
        sep = "-"
    }
    data[$2,var]++
}

END {
    printf "%s,", "Desc"
    for (k = 1; k <= n; k++) {
        printf "%s%s", newRecs[k], (k==n ? RS : FS)
    }
    for (name = 1; name <= d; name++) {
        printf "%s,", desc[name]
        for (nR = 1; nR <= n; nR++) {
            printf "%s%s", (data[desc[name],newRecs[nR]] ? data[desc[name],newRecs[nR]] : ""), (nR==n ? RS : FS)
        }
    }
}

Sample file:

Locationx,Desc,A,B,C,Locationy
ab123,Name1,TRUE,TRUE,TRUE,ab1234
ab123,Name2,TRUE,FALSE,TRUE,ab1234
ab123,Name2,FALSE,FALSE,TRUE,ab1234
ab123,Name1,TRUE,TRUE,TRUE,ab1234
ab123,Name2,TRUE,TRUE,TRUE,ab1234
ab123,Name3,FALSE,FALSE,FALSE,ab1234
ab123,Name3,TRUE,FALSE,FALSE,ab1234
ab123,Name3,TRUE,TRUE,FALSE,ab1234
ab123,Name3,TRUE,TRUE,FALSE,ab1234
ab123,Name1,TRUE,TRUE,FALSE,ab1234

Execution:

$ awk -f script.awk file
Desc,A,B,C,A-B,A-C,A-B-C
Name1,,,,1,,2
Name2,,,1,,1,1
Name3,1,,,2,,

Now, there is a pretty evident bug in the combinations function: it does not recurse to generate all combinations. For example, for A B C D it will print A, B, C, AB, AC, ABC but not BC.
How can I delete one line before and two after a string?
Say you have records in a text file which look like this:

header
data1
data2
data3

I would like to delete the whole record if data1 is a given string. I presume this needs awk, which I do not know.
Awk can handle these multiline records by setting the record separator to the empty string:

BEGIN { RS = ""; ORS = "\n\n" }
$2 == "some string" { next }  # skip this record
{ print }                     # print (non-skipped) record

You can save this in a file (e.g. remove.awk) and execute it with

awk -f remove.awk data.txt > newdata.txt

This assumes your data is of the format:

header
data
....

header
data
...

If there are no blank lines between the records, you need to split the records manually (this is with 4 lines per record):

{ a[++i] = $0 }
i == 2 && a[i] == "some string" { skip = 1 }
i == 4 && ! skip { for (j = 1; j <= 4; j++) print a[j] }
i == 4 { skip = 0; i = 0 }

(Note the print loop uses its own variable j; reusing i there would clobber the line counter.)
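A runnable sketch of the blank-line-separated variant (sample data invented; the value to drop is a single word, since with RS = "" and the default FS, $2 is the second whitespace-separated word of the record, not necessarily line 2):

```shell
cat > data.txt <<'EOF'
header
keepme
data2
data3

header
badvalue
data2
data3
EOF
awk 'BEGIN { RS = ""; ORS = "\n\n" } $2 == "badvalue" { next } { print }' data.txt
```

Only the first record survives; the record whose second word is badvalue is dropped whole.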
Without knowing what output you desire, and with insufficient sample input:

awk 'BEGIN{RS=""} !/data1/' file
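Under the same blank-line-separated-records guess, the one-liner behaves like this (the sample file is an assumption):

```shell
cat > file <<'EOF'
header
data1
data2
data3

header
other
data2
data3
EOF
awk 'BEGIN { RS = "" } !/data1/' file   # prints only records not containing "data1"
```

With RS = "" each paragraph is one record, so !/data1/ drops any whole record that mentions data1 anywhere.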