How to use a variable with the OR operator | in awk

The following works:
awk '
NR==FNR { sub(/\.(png|txt|jpg|json)$/,""); a[$0]; next }
{ f=$0; sub(/\.(png|txt|jpg|json)$/,"", f) }
!(f in a)
' comp1.txt comp2.txt > result.txt
Now I want it to take the file endings to be ignored in the comparison as a variable, but I cannot get it to work. My attempt below just compares without ignoring any file endings. I tried with $ and without, with () and without, and escaping the |, but so far without success. What is the correct solution?
fileEndingsToIgnore="png|txt|jpg|json"
awk -v fileEndingsToIgnore="${fileEndingsToIgnore}" '
NR==FNR { sub(/\.(fileEndingsToIgnore)$/,""); a[$0]; next }
{ f=$0; sub(/\.(fileEndingsToIgnore)$/,"", f) }
!(f in a)
' comp1.txt comp2.txt > result.txt

GNU AWK doesn't allow you to use a variable inside a regular expression literal. You can, however, use a string variable with ~ and !~ and with many string functions, but in that case you need to double the escaping, as explained in Using Dynamic Regexps. Consider the following example: say you want to output only .txt and .json filenames without their extensions, and you have file.txt with content as follows
file1.txt
file2.bmp
file3.json
then
awk 'BEGIN{s="\\.(txt|json)$"}sub(s,""){print}' file.txt
gives output
file1
file3
Observe the \\. rather than \. : inside a string constant the backslash itself must be escaped.
(tested in GNU Awk 5.0.1)
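Building on that, the OP's shell variable can be passed in with -v and spliced into the dynamic regexp string; a minimal sketch against the same file.txt (the variable name ext is mine):
fileEndingsToIgnore="txt|json"
awk -v ext="${fileEndingsToIgnore}" '
BEGIN { s = "\\.(" ext ")$" }   # splice the variable into the dynamic regexp string
sub(s, "") { print }            # strip a matching extension, then print
' file.txt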

One workaround is to dynamically build the regex and store it in a variable, then use the variable in the sub() call.
Sample input file:
$ cat test.file
abc.txt
def.jpg
ghi.exe
jkl.dat
123.json
456.ini
789.pngX
000.png
111.dat
One awk idea:
fileEndingsToIgnore="png|txt|jpg|json"
awk -v fileEndingsToIgnore="${fileEndingsToIgnore}" '
BEGIN { regex="\\.(" fileEndingsToIgnore ")$" } # need to escape the escape char, i.e., "\\"
{ out=$1
sub(regex,"",out)
printf "%s => %s\n",$0,out
}
' test.file
This generates:
abc.txt => abc
def.jpg => def
ghi.exe => ghi.exe
jkl.dat => jkl.dat
123.json => 123
456.ini => 456.ini
789.pngX => 789.pngX
000.png => 000
111.dat => 111.dat
Applying this to OP's current code:
fileEndingsToIgnore="png|txt|jpg|json"
awk -v fileEndingsToIgnore="${fileEndingsToIgnore}" '
BEGIN { regex="\\.(" fileEndingsToIgnore ")$" }
NR==FNR { sub(regex,""); a[$0]; next }
{ f=$0; sub(regex,"", f) }
!(f in a)
' comp1.txt comp2.txt > result.txt
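If the shell variable is exported, another option is awk's built-in ENVIRON array (POSIX); unlike -v assignments, ENVIRON values undergo no extra backslash processing (not an issue for this particular value, but it matters when patterns contain backslashes):
export fileEndingsToIgnore="png|txt|jpg|json"
awk '
BEGIN { regex = "\\.(" ENVIRON["fileEndingsToIgnore"] ")$" }
NR==FNR { sub(regex,""); a[$0]; next }
{ f=$0; sub(regex,"", f) }
!(f in a)
' comp1.txt comp2.txt > result.txt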

I think this should be generic enough:
"-v FS=..." is the list of file extensions to exclude, case-sensitive:
mawk -v FS='mp[34]|txt|sh|awk' 'BEGIN { _^= FS = "[.]("FS")$"
split("",__) } FNR==NR ? __[$_] : NF<=($_ in __)' file file

Related

Grabbing value from piped file contents

Let's say I have the following file:
credentials:
[default]
key_id = AKIAGHJQTOP
secret_key = alcsjkf
[default2]
key_id = AKIADGHNKVP
secret_key = njprmls
I want to grab the value of key_id under [default]. I'm trying to do it with an awk command, but I'm open to any other way if it's more efficient and easier. Instead of passing a file name to awk, I want to pass the file contents from the environment variable FILE_CONTENTS.
I tried the following:
$ export VAR=$(echo "$FILE_CONTENTS" | awk '/credentials.default.key_id/ {print $2}')
But it didn't work. Any help is appreciated.
You can use awk like this:
cat srch.awk
BEGIN { FS = " *= *" }          # split "attr = value" pairs on the equals sign
{ sub(/^[[:blank:]]+/, "") }    # strip leading indentation
/:[[:blank:]]*$/ {              # a "label:" line starts a new group
    sub(/:[[:blank:]]*$/, "")
    k = $1
}
/^[[:blank:]]*\[/ {             # a "[stanza]" line; qualify it with the current label
    s = k "." $1
}
NF == 2 {                       # an "attr = value" line; store under label.[stanza].attr
    map[s "." $1] = $2
}
key in map {                    # key comes in via -v key=...
    print map[key]
    exit
}
# then use it as
echo "$FILE_CONTENTS" |
awk -v key='credentials.[default].key_id' -f srch.awk
AKIAGHJQTOP
# or else
echo "$FILE_CONTENTS" |
awk -v key='credentials.[default].secret_key' -f srch.awk
alcsjkf
With your shown samples, please try the following awk code (written and tested in GNU awk).
awk -v RS='(^|\\n)credentials:\\n[[:space:]]+\\[default\\]\\n[[:space:]]+key_id = \\S+' '
RT && num=split(RT,arr," key_id = "){
print arr[num]
}
' Input_file
The RS regexp consumes the whole credentials:/[default]/key_id block; GNU awk stores the text actually matched by RS in the built-in variable RT, and split() then pulls the value off the end of that text.
Assumptions:
no spaces between labels and :
no spaces between [ the stanza name and ]
all lines with attribute/value pairs have exactly 3 space-delimited fields as shown (ie, attr = value; value has no embedded spaces)
the contents of OP's variable (FILE_CONTENTS) is an exact copy (data and format) of the sample file provided by OP
NOTE: if the input file format can differ from these assumptions then additional code must be added to address said differences; as mentioned in the comments, writing your own parser is doable, but you need to ensure you address all possible format variations
One awk idea:
awk -v label='credentials' -v stanza='default' -v attr='key_id' '
/:/ { f1=0; if ($0 ~ label ":") f1=1 }
f1 && /[][]/ { f2=0; if ($0 ~ "[" stanza "]") f2=1 }
f1 && f2 && /=/ { if ($1 == attr) { print $3; f1=f2=0 } }
'
This generates:
AKIAGHJQTOP
$ awk 'f{print $3; exit} /\[default]/{f=1}' <<<"$FILE_CONTENTS"
AKIAGHJQTOP
If that's not all you need then edit your question to provide more truly realistic sample input/output including cases where the above doesn't work.
open to any other way if it's more efficient and easier
I suggest taking a look at Python's configparser, which is part of the standard library. Let the FILE_CONTENTS environment variable hold
credentials:
[default]
key_id = AKIAGHJQTOP
secret_key = alcsjkf
[default2]
key_id = AKIADGHNKVP
secret_key = njprmls
then create a file getkeyid.py with the following content
import configparser
import os
config = configparser.ConfigParser()
config.read_string(os.environ["FILE_CONTENTS"].replace("credentials","#credentials",1))
print(config["default"]["key_id"])
and do
python3 getkeyid.py
to get output
AKIAGHJQTOP
Explanation: I retrieve the string from the environment variable and replace credentials with #credentials at most once, to comment out that line (otherwise the parser will fail), then parse it and retrieve the value corresponding to the desired key.

Can't replace string to multi-lined string with sed

I'm trying to replace a fixed placeholder ("replaceMe") in a text with a multi-line string, using sed.
My bash script goes as follows:
content=$(awk '{print $5}' < data.txt | sort | uniq)
target=$(cat install.sh)
text=$(sed "s/replaceMe/$content/" <<< "$target")
echo "${text}"
If content contains one line only, replacing works, but if it contains several lines I get:
sed: ... unterminated `s' command
I read about "fetching" multi-line content, but I couldn't find anything about inserting a multi-line string.
You'll have more problems than that depending on the contents of data.txt, since sed doesn't understand literal strings (see Is it possible to escape regex metacharacters reliably with sed). Just use awk, which does:
text="$( awk -v old='replaceMe' '
NR==FNR {
if ( !seen[$5]++ ) {
new = (NR>1 ? new ORS : "") $5
}
next
}
s = index($0,old) { $0 = substr($0,1,s-1) new substr($0,s+length(old)) }
{ print }
' data.txt install.sh )"
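The reason index()/substr() is used here instead of sub() is that it treats old as a literal string; sub() would interpret regexp metacharacters in the pattern and & in the replacement. The same idiom as a reusable sketch (the lit_sub name is mine, not from the answer above):
# replace the first literal occurrence of old with new in $0
function lit_sub(old, new,    s) {
    s = index($0, old)
    if (s) $0 = substr($0, 1, s-1) new substr($0, s + length(old))
}
{ lit_sub("replaceMe", "line1" ORS "line2"); print }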

awk: extract data from a column by name rather than position

I have a text file that is comma delimited. The first line is a list of field names, and subsequent lines contain data. I'll get new versions of the file, and I want to extract all the values from a particular column by name rather than by column number. (I.e. the column I want may be in different positions in different versions of the file.)
For example, here are two files:
foo,bar,interesting,junk
1,2,gold,ramjet
2,25,diamonds,superfluous
and
foo,bar,baz,interesting,junk,morejunk
5,3,smurf,platinum,garbage,scrap
6,2.5,mushroom,sodium,liverwurst,eew
I'd like a single script that will go through multiple files, extracting the minerals in the "interesting" column. :-)
What I've got so far is something that works on ONE file, but I know that awk is more elegant than this. How do I clean this up and make it work on multiple files at once?
BEGIN {
    FS=",";
}
NR == 1 {
    for(i=1; i<=NF; i++) {
        if($i=="interesting") {
            col=i;
        }
    }
}
NR > 1 {
    print $col;
}
You're pretty darn close already. Just use FNR instead of NR, for "File NR".
#!/usr/bin/awk -f
BEGIN { FS="," }
FNR==1 {
    for (col=1;col<=NF;col++)
        if ($col=="interesting")
            next
}
{ print $col }
Or if you like:
#!/usr/bin/awk -f
BEGIN { FS="," }
FNR==1 { for (col=1;$col!="interesting";col++); next }
{ print $col }
Or if you prefer one-liners:
$ awk -F, -v txt="interesting" 'FNR==1{for(c=1;$c!=txt;c++);next} {print $c}' file1 file2
Of course, be careful that you actually have the specified column, or you may find yourself in an endless loop. You can probably figure out the extra condition that saves you from that risk.
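One possible guard (my own variation, not from the answer above): stop the loop at NF and bail out if the header doesn't contain the column.
awk -F, -v txt="interesting" '
FNR==1 {
    for (c=1; c<=NF && $c!=txt; c++) ;
    if (c>NF) { print "column \"" txt "\" not found in " FILENAME > "/dev/stderr"; exit 1 }
    next
}
{ print $c }' file1 file2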
Note that in awk, you only need to terminate commands with semicolons if they are followed by another command. Thus, you would do this:
command1; command2
But you can drop the semicolon if you separate commands with newlines:
command1
command2
Do it this way:
$ cat tst.awk
BEGIN { FS=OFS="," }
FNR==1 { for (i=1;i<=NF;i++) f[$i]=i; next }
{ print $(f["interesting"]) }
$ awk -f tst.awk file1 file2
gold
diamonds
platinum
sodium
Creating a name->value array is always the best approach when it's applicable. It keeps every part of the code simple and decoupled from the rest of the code, and it sets you up for doing other things like changing the order of the fields when you output the results, e.g.:
$ cat tst.awk
BEGIN { FS=OFS="," }
FNR==1 { for (i=1;i<=NF;i++) f[$i]=i; next }
{ print $(f["junk"]), $(f["interesting"]), $(f["bar"]) }
$ awk -f tst.awk file1 file2
ramjet,gold,2
superfluous,diamonds,25
garbage,platinum,3
liverwurst,sodium,2.5

How to rewrite a Awk script to process several files instead of one

I am writing a report tool which processes the source files of some application and produces a report table with two columns: one containing the name of the file, the other containing the word TODO if the file contains a call to some deprecated function deprecated_function, and DONE otherwise.
I used awk to prepare this report and my shell script looks like
report()
{
    find . -type f -name '*.c' \
        | xargs -n 1 awk -v deprecated="$1" '
            BEGIN { status = "DONE" }
            $0 ~ deprecated { status = "TODO" }
            END {
                printf("%s|%s\n", FILENAME, status)
            }'
}
report "deprecated_function"
report "deprecated_function"
The output of this script looks like
./plop-plop.c|DONE
./fizz-boum.c|TODO
This works well, but I would like to rewrite the awk script so that it supports several input files instead of just one, letting me remove the -n 1 argument to xargs. The only solutions I could figure out involve a lot of bookkeeping, because we need to track changes of FILENAME to catch each end-of-file; the END event only fires once, after the last file.
awk -v deprecated="$1" '
BEGIN { status = "DONE" }
oldfilename != FILENAME {
    if (oldfilename) printf("%s|%s\n", oldfilename, status);
    status = "DONE";
    oldfilename = FILENAME;
}
$0 ~ deprecated { status = "TODO" }
END {
    printf("%s|%s\n", FILENAME, status)
}'
Maybe there is a cleaner and shorter way to handle this.
I am using FreeBSD's awk and am looking for solutions compatible with this tool.
This will work in any modern awk:
awk -v deprecated="$1" -v OFS='|' '
$0 ~ deprecated{ dep[FILENAME] }
END {
for (i=1;i<ARGC;i++)
print ARGV[i], (ARGV[i] in dep ? "TODO" : "DONE")
}
' file1 file2 ...
Any time you need to produce a report for all files and don't have GNU awk for ENDFILE, you MUST loop through ARGV[] in the END section (or loop through it in BEGIN and populate a different array for END section processing). Anything else will fail if you have empty files.
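To see why, consider a hypothetical empty.c: it produces no records, so per-record logic never fires, but the ARGV[] loop in END still reports it:
$ touch empty.c
$ awk -v deprecated="deprecated_function" -v OFS='|' '
    $0 ~ deprecated { dep[FILENAME] }
    END { for (i=1;i<ARGC;i++) print ARGV[i], (ARGV[i] in dep ? "TODO" : "DONE") }
' empty.c
empty.c|DONE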
Your awk script could be something like this:
awk -v deprecated="$1" '
FNR==1 {if(file) print file "|" (f?"TODO":"DONE"); file=FILENAME; f=0}
$0 ~ deprecated {f=1}
END {print file "|" (f?"TODO":"DONE")}' file1.c file2.c # etc.
The logic is fairly similar to your program so hopefully it's all clear. FNR is the record number of the current file, which I'm using to detect the start of a new file. Admittedly there's some repetition in the END block but I don't think it's a big deal. You could always use a function if you wanted to.
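For instance, a function-based variant might look like this (a sketch; the report() helper name is mine):
awk -v deprecated="$1" '
function report(name, flag) { print name "|" (flag ? "TODO" : "DONE") }
FNR==1 { if (file) report(file, f); file=FILENAME; f=0 }
$0 ~ deprecated { f=1 }
END { if (file) report(file, f) }' file1.c file2.c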
Testing it out:
$ cat f1.c
int deprecated_function()
{
// some deprecated stuff
}
$ cat f2.c
int good_function()
{
// some good stuff
}
$ find -name "f?.c" -print0 | xargs -0 awk -v deprecated="deprecated" 'FNR==1 {if(file) print file "|" (f?"TODO":"DONE"); file=FILENAME; f=0} $0 ~ deprecated {f=1} END {print file "|" (f?"TODO":"DONE")}'
./f2.c|DONE
./f1.c|TODO
I have used -print0 and the -0 switch to xargs so that both programs will work with file names separated by null bytes "\0" rather than spaces. This means that you won't run into problems with spaces in file names.
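An alternative that sidesteps xargs entirely is find's -exec ... {} +, which likewise copes with unusual file names:
find . -name "f?.c" -exec awk -v deprecated="deprecated" '
    FNR==1 {if(file) print file "|" (f?"TODO":"DONE"); file=FILENAME; f=0}
    $0 ~ deprecated {f=1}
    END {print file "|" (f?"TODO":"DONE")}' {} +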

awk: non-terminated string

I'm trying to run the command below, and it's giving me the error shown. Thoughts on how to fix it? I would rather have this be a one-line command than a script.
grep "id\": \"http://room.event.assist.com/event/room/event/" failed_events.txt |
head -n1217 |
awk -F/ ' { print $7 } ' |
awk -F\" ' { print "url \= \"http\:\/\/room\.event\.assist\.com\/event\/room\/event\/'{ print $1 }'\?schema\=1\.3\.0\&form\=json\&pretty\=true\&token\=582EVTY78-03iBkTAf0JAhwOBx\&account\=room_event\"" } '
awk: non-terminated string url = "ht... at source line 1
context is
>>> <<<
awk: giving up
source line number 2
The pipeline below extracts a single column of IDs:
grep "id\": \"http://room.event.assist.com/event/room/event/" failed_events.txt |
head -n1217 |
awk -F/ ' { print $7 } '
156512145
898545774
454658748
898432413
I'm looking to get the IDs above into a string like so:
" url = "string...'ID'string"
Take a look at what you have in that last awk:
awk -F\"
' #single quote starts here
{ print " #double quote opens for print, and never closes
url \= \"http\:\/\/room\.event\.assist\.com\/event\/room\/event\/
' #single quote ends here???
{ print $1 }'..... #single quote again??? ...
(rest of the code)
Did you really want to print a literal { print } in the output? I don't think so. Why were you nesting print statements?
Most of the elements of your pipe can be expressed right inside awk.
I can't tell exactly what you want to do with the last awk script, but here are some points:
Your "grep" is really just looking for a string of text, not a regexp.
You can save time and simplify things if you use awk's index() function instead of a RE.
Output formats are almost always best handled using printf().
Since you haven't provided your input data, I can't test this code, so you'll need to adapt it if it doesn't work. But here goes:
awk -F/ '
BEGIN {
    string="id\": \"http://room.event.assist.com/event/room/event/";
    fmt="url = http://example.com/event/room/event/%s?schema=whatever\n";
}
count == 1217 { nextfile; }
index($0, string) {
    split($7, a, "\"");
    printf(fmt, a[1]);   # split() arrays are 1-indexed in awk
    count++;
}' failed_events.txt
If you like, you can use awk's -v option to pass in the string variable from a shell script calling this awk script. Or if this is a stand-alone awk script (using #! shebang), you could refer to command line options with ARGV.
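For example, the search string could be passed in from the shell with -v rather than hardcoded in BEGIN (a sketch along the lines of the script above; variable names mine):
search='id": "http://room.event.assist.com/event/room/event/'
awk -F/ -v string="$search" '
BEGIN { fmt = "url = http://example.com/event/room/event/%s?schema=whatever\n" }
index($0, string) { split($7, a, "\""); printf(fmt, a[1]) }
' failed_events.txt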