I have the following output from repo status:
project X/Y (*** NO BRANCH ***)
-- A/B/c
-m D/E/f
project Z/ (*** NO BRANCH ***)
-- G/H/i
-m J/K/l
(lowercase letters are files, and uppercase are dirs)
The lines prefixed with -- indicate newly added files. repo diff does not include these files, so I can't create a patch that includes all differences. So, I'll just create a tarball of them.
Q: What script (e.g., awk, perl, or python) can I use to create a tarball of these files? The tarball should contain:
X/Y/A/B/c
Z/G/H/i
I'm thinking an awk script would be something like this (I'm not that familiar with the syntax):
awk {
BEGIN curdir = '', filelist = []
{
if ($0 == "project") {
curdir = $1
} else if ($0 == "--") {
# insert file specified by $1 into tarball
}
}
}
Ideas? Thanks!
You are close. Here is a suggestion:
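/^project/ {
dir = $2
sub(/\/$/, "", dir)   # drop a trailing slash (as in "Z/") so every dir is joined the same way
}
$1 == "--" {
fullpath = dir "/" $2   # the space between operands means concatenation
print fullpath
# Do something with fullpath, such as
# system("tar ...")
}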
Discussion
$1 is the first field (token) in a line, $2 is the second field, and so on
$0 represents the whole record (line)
Every time we see a line that starts with project, we save the directory ($2) in the variable dir
Every time the first field is "--", we print the full path built from the directory and the file name. In your case, insert the command to archive the file here.
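Since the question also allows Python, here is a minimal Python sketch of the same idea. It assumes the repo status layout shown above: a project line whose second field is the directory, followed by lines whose first field is the status and whose second field is the path. The function names new_files and make_tarball and the default archive name are hypothetical. The archive is built with the standard tarfile module instead of calling tar.

```python
import tarfile


def new_files(status_text):
    """Return the paths of the files marked "--" (newly added) in repo status output."""
    paths = []
    project = ""
    for line in status_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        if fields[0] == "project":
            # Drop a trailing slash so "Z/" and "X/Y" are joined the same way.
            project = fields[1].rstrip("/")
        elif fields[0] == "--":
            paths.append(f"{project}/{fields[1]}" if project else fields[1])
    return paths


def make_tarball(paths, out="new-files.tar.gz"):
    """Add each path (relative to the current directory) to a gzip-compressed tarball."""
    with tarfile.open(out, "w:gz") as tar:
        for path in paths:
            tar.add(path)
    return out
```

For the status output above, new_files returns X/Y/A/B/c and Z/G/H/i. You would pass that list to make_tarball from the top of the checkout so the relative paths resolve.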
Let's say I have the following file:
credentials:
[default]
key_id = AKIAGHJQTOP
secret_key = alcsjkf
[default2]
key_id = AKIADGHNKVP
secret_key = njprmls
I want to grab the value of key_id in [default]. I'm trying to do it with awk, but I'm open to any other way if it's more efficient and easier. Instead of passing a file name to awk, I want to pass the file contents from the environment variable FILE_CONTENTS.
I tried the following:
$ export VAR=$(echo "$FILE_CONTENTS" | awk '/credentials.default.key_id/ {print $2}')
But it didn't work. Any help is appreciated.
You can use awk like this:
cat srch.awk
BEGIN { FS = " *= *" }
{ sub(/^[[:blank:]]+/, "") }
/:[[:blank:]]*$/ {
sub(/:[[:blank:]]*$/, "")
k = $1
}
/^[[:blank:]]*\[/ {
s = k "." $1
}
NF == 2 {
map[s "." $1] = $2
}
key in map {
print map[key]
exit
}
# then use it as
echo "$FILE_CONTENTS" |
awk -v key='credentials.[default].key_id' -f srch.awk
AKIAGHJQTOP
# or else
echo "$FILE_CONTENTS" |
awk -v key='credentials.[default].secret_key' -f srch.awk
alcsjkf
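The map-building idea in srch.awk can also be expressed in Python. This is a rough equivalent, not a line-by-line port. It assumes the same layout: a label ending in ":", then [stanza] lines, then attr = value lines. The function name flatten is hypothetical. Unlike srch.awk, it reads the whole input instead of exiting at the first match.

```python
import re


def flatten(text):
    """Map "label.[stanza].attr" keys to values, similar to the map built by srch.awk."""
    values = {}
    label = stanza = ""
    for raw in text.splitlines():
        line = raw.strip()
        if line.endswith(":"):                  # e.g. "credentials:"
            label = line[:-1].rstrip()
        elif line.startswith("["):              # e.g. "[default]"
            stanza = f"{label}.{line}"
        else:
            parts = re.split(r"\s*=\s*", line, maxsplit=1)
            if len(parts) == 2:                 # e.g. "key_id = AKIAGHJQTOP"
                values[f"{stanza}.{parts[0]}"] = parts[1]
    return values
```

With the sample data in FILE_CONTENTS, flatten(os.environ["FILE_CONTENTS"])["credentials.[default].key_id"] would give AKIAGHJQTOP.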
Using your shown samples, please try the following awk code, written and tested in GNU awk.
awk -v RS='(^|\\n)credentials:\\n[[:space:]]+\\[default\\]\\n[[:space:]]+key_id = \\S+' '
RT && num=split(RT,arr," key_id = "){
print arr[num]
}
' Input_file
Here is the online demo for the regex used (it is slightly different from the regex in the awk code, because the escaping there is done in the program rather than on the site).
Assumptions:
no spaces between labels and :
no spaces between [ the stanza name and ]
all lines with attribute/value pairs have exactly 3 space-delimited fields as shown (ie, attr = value; value has no embedded spaces)
the contents of OP's variable (FILE_CONTENTS) is an exact copy (data and format) of the sample file provided by OP
NOTE: if the input file format can differ from these assumptions, additional code must be added to handle those differences; as mentioned in the comments, writing your own parser is doable, but you need to ensure you address all possible format variations.
One awk idea:
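awk -v label='credentials' -v stanza='default' -v attr='key_id' '
/:/ { f1=0; if (index($0, label ":")) f1=1 }
# index() is a literal match; as a dynamic regex, "[" stanza "]" would be a bracket expression
f1 && /[][]/ { f2=0; if (index($0, "[" stanza "]")) f2=1 }
f1 && f2 && /=/ { if ($1 == attr) { print $3; f1=f2=0 } }
' <<< "$FILE_CONTENTS"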
This generates:
AKIAGHJQTOP
$ awk 'f{print $3; exit} /\[default]/{f=1}' <<<"$FILE_CONTENTS"
AKIAGHJQTOP
If that's not all you need then edit your question to provide more truly realistic sample input/output including cases where the above doesn't work.
open to any other way if it's more efficient and easier
I suggest taking a look at Python's configparser, which is part of the standard library. Let the FILE_CONTENTS environment variable hold
credentials:
[default]
key_id = AKIAGHJQTOP
secret_key = alcsjkf
[default2]
key_id = AKIADGHNKVP
secret_key = njprmls
then create the file getkeyid.py with the following content
import configparser
import os
config = configparser.ConfigParser()
config.read_string(os.environ["FILE_CONTENTS"].replace("credentials","#credentials",1))
print(config["default"]["key_id"])
and do
python3 getkeyid.py
to get output
AKIAGHJQTOP
Explanation: I retrieve the string from the environment variable and replace credentials with #credentials at most once, which comments out that line (otherwise the parser fails, because the text does not start with a section header). Then I parse it and retrieve the value for the desired key.
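A variation on the same approach, sketched under the assumption that everything before the first [section] line can be ignored. Instead of rewriting the literal word credentials, it drops all leading lines up to the first section header, so the code does not depend on the label's name. The function name read_stanzas is hypothetical. interpolation=None is used so that % characters in values are read literally.

```python
import configparser


def read_stanzas(text):
    """Parse INI-style stanzas, skipping any lines before the first [section] header."""
    lines = text.splitlines()
    # Find the first line that looks like a section header. Anything before it
    # (such as the "credentials:" label) would make configparser raise
    # MissingSectionHeaderError.
    start = next(
        (i for i, line in enumerate(lines) if line.strip().startswith("[")),
        len(lines),
    )
    config = configparser.ConfigParser(interpolation=None)
    config.read_string("\n".join(lines[start:]))
    return config
```

Usage would then be read_stanzas(os.environ["FILE_CONTENTS"])["default"]["key_id"].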
I have an smb.conf ini file which is overwritten whenever edited with a certain GUI tool, wiping out a custom setting. This means I need a cron job to ensure that one particular section in the file contains a certain option=value pair, and insert it at the end of the section if it doesn't exist.
Example
Ensure that hosts deny=192.168.23. exists within the [myshare] section:
[global]
printcap name = cups
winbind enum groups = yes
security = user
[myshare]
path=/mnt/myshare
browseable=yes
enable recycle bin=no
writeable=yes
hosts deny=192.168.23.
[Another Share]
invalid users=nobody,nobody
valid users=nobody,nobody
path=/mnt/share2
browseable=no
Long-winded solution using awk
After a long time struggling with sed, I concluded that it might not be the right tool for the job. So I moved over to awk and came up with this:
#!/bin/sh
file="smb.conf"
tmp="smb.conf.tmp"
section="myshare"
opt="hosts deny=192.168.23."
awk '
BEGIN {
this_section=0;
opt_found=0;
}
# Match the line where our section begins
/^[ \t]*\['"$section"'\][ \t]*$/ {
this_section=1;
print $0;
next;
}
# Match lines containing our option
this_section == 1 && /^[ \t]*'"$opt"'[ \t]*$/ {
opt_found=1;
}
# Match the following section heading
this_section == 1 && /^[ \t]*\[.*$/ {
this_section=0;
if (opt_found != 1) {
print "\t'"$opt"'";
}
}
# Print every line
{ print $0; }
END {
# In case our section is the very last in the file
if (this_section == 1 && opt_found != 1) {
print "\t'"$opt"'";
}
}
' $file > $tmp
# Overwrite $file only if $tmp is different
diff -q $file $tmp > /dev/null 2>&1
if [ $? -ne 0 ]; then
mv $tmp $file
# reload smb.conf here
else
rm $tmp
fi
I can't help feeling that this is a long script to achieve a simple task. Is there a more efficient/elegant way to insert a property in an ini file using basic shell tools like sed and awk?
Consider using Python 3's configparser:
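#!/usr/bin/python3
import sys
from configparser import ConfigParser  # SafeConfigParser was removed in Python 3.12

# interpolation=None: treat % macros such as %U literally instead of as
# interpolation syntax
cfg = ConfigParser(interpolation=None)
cfg.read(sys.argv[1])
cfg['myshare']['hosts deny'] = '192.168.23.'
with open(sys.argv[1], 'w') as f:
    cfg.write(f)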
To be called as ./filename.py smb.conf (i.e., the first parameter is the file to change).
Note that comments are not preserved by this. However, since a GUI overwrites the config and doesn't preserve custom options, I suspect that comments are already nuked and that this is not a worry in your case.
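The question's script only replaces smb.conf when the content actually changed, and it writes through a temporary file. The same safeguards can be added to the configparser approach. This is a sketch with a hypothetical ensure_option function. Like the script above, it does not preserve comments.

```python
import os
import shutil
import tempfile
from configparser import ConfigParser


def ensure_option(path, section, option, value):
    """Make sure [section] has option = value; return True if the file was rewritten."""
    cfg = ConfigParser(interpolation=None)
    cfg.read(path)
    if not cfg.has_section(section):
        cfg.add_section(section)
    if cfg.get(section, option, fallback=None) == value:
        return False  # already set: leave the file (and its mtime) alone
    cfg.set(section, option, value)

    # Write to a temporary file in the same directory, then atomically replace
    # the original, much like the smb.conf.tmp + mv step in the question.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
    try:
        with os.fdopen(fd, "w") as f:
            cfg.write(f)
        if os.path.exists(path):
            shutil.copymode(path, tmp)  # mkstemp creates the file as 0600; keep the original mode
        os.replace(tmp, path)
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
    return True
```

In a cron job, a True return value from ensure_option("smb.conf", "myshare", "hosts deny", "192.168.23.") could be the signal to reload Samba.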
Untested, should work though
awk -v T="hosts deny=192.168.23." 'x&&index($0,T){x=0} x&&/^ *\[[^]]+\]/{print "\t\t"T;x=0}
/^ *\[myshare\]/{x=1} 1; END{if(x)print "\t\t"T}' file
This solution is a bit awkward. It uses the INI section header as the record separator. This means that there is an empty record before the first header, so when we match the header we're interested in, we have to read the next record to handle that INI section. Also, there are some printf commands because the records still contain leading and trailing newlines.
awk -v RS='[[][^]]+[]]' -v str="hosts deny=192.168.23." '
{printf "%s", $0; printf "%s", RT}
RT == "[myshare]" {
getline
printf "%s", $0
if (index($0, str) == 0) print str
printf "%s", RT
}
' smb.conf
RS is the awk variable that holds the record separator; GNU awk treats a multi-character RS as a regular expression used to split the text into records.
RT is a GNU awk variable that holds the actual text matched by RS for the current record.
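The record-separator idea maps onto Python's re.split. When the pattern contains a capturing group, the separators are kept in the result, so the list alternates between the text of a record ($0) and the header that follows it (RT). This is a sketch of a hypothetical helper, add_to_section. Unlike the getline version, it also handles the case where the target section is the last one in the file.

```python
import re

# Same idea as RS='[[][^]]+[]]' above: a section header is "[" + non-"]" chars + "]".
SECTION_HEADER = re.compile(r"(\[[^\]]+\])")


def add_to_section(text, header, line):
    """Append `line` at the end of section `header` unless the section already contains it."""
    # With a capturing group, re.split keeps the separators:
    # [record, header, record, header, ..., record]
    parts = SECTION_HEADER.split(text)
    for i in range(1, len(parts), 2):
        body = parts[i + 1]  # text between this header and the next one
        if parts[i] == header and line not in body:
            content = body.rstrip("\n")
            # Insert after the last non-empty line, keeping the trailing newline(s).
            parts[i + 1] = content + "\n" + line + body[len(content):]
    return "".join(parts)
```

Called on the contents of smb.conf with "[myshare]" and "hosts deny=192.168.23.", it returns the text unchanged if the option is already there.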
With GNU awk for a couple of extensions:
$ cat tst.awk
index($0,str) { found = 1 }
match($0,/^\s*\[([^]]+).*/,a) {
if ( (name == tgt) && !found ) { print indent str }
name = a[1]
found = 0
}
{ print; indent=gensub(/\S.*/,"","") }
$ awk -v tgt="myshare" -v str="hosts deny=192.168.23." -f tst.awk file
[global]
printcap name = cups
winbind enum groups = yes
security = user
[myshare]
path=/mnt/myshare
browseable=yes
enable recycle bin=no
writeable=yes
hosts deny=192.168.23.
[Another Share]
invalid users=nobody,nobody
valid users=nobody,nobody
path=/mnt/share2
browseable=no
$ awk -v tgt="myshare" -v str="fluffy bunny" -f tst.awk file
[global]
printcap name = cups
winbind enum groups = yes
security = user
[myshare]
path=/mnt/myshare
browseable=yes
enable recycle bin=no
writeable=yes
hosts deny=192.168.23.
fluffy bunny
[Another Share]
invalid users=nobody,nobody
valid users=nobody,nobody
path=/mnt/share2
browseable=no
I am writing a report tool which processes the source files of some application and produces a report table with two columns: one containing the name of the file, and the other containing the word TODO if the file contains a call to some deprecated function deprecated_function, and DONE otherwise.
I used awk to prepare this report and my shell script looks like
report()
{
find . -type f -name '*.c' \
| xargs -n 1 awk -v deprecated="$1" '
BEGIN { status = "DONE" }
$0 ~ deprecated{ status = "TODO" }
END {
printf("%s|%s\n", FILENAME, status)
}'
}
report "deprecated_function"
The output of this script looks like
./plop-plop.c|DONE
./fizz-boum.c|TODO
This works well, but I would like to rewrite the awk script so that it supports several input files instead of just one, so that I can remove the -n 1 argument to xargs. The only solutions I could figure out involve a lot of bookkeeping, because we need to track changes of FILENAME and also use the END event to catch the end of each file.
awk -v deprecated="$1" '
BEGIN { status = "DONE" }
oldfilename != FILENAME {
if (oldfilename) printf("%s|%s\n", oldfilename, status)
status = "DONE"
oldfilename = FILENAME
}
$0 ~ deprecated{ status = "TODO" }
END {
printf("%s|%s\n", FILENAME, status)
}'
Maybe there is a cleaner and shorter way to handle this.
I am using FreeBSD's awk and am looking for solutions compatible with this tool.
This will work in any modern awk:
awk -v deprecated="$1" -v OFS='|' '
$0 ~ deprecated{ dep[FILENAME] }
END {
for (i=1;i<ARGC;i++)
print ARGV[i], (ARGV[i] in dep ? "TODO" : "DONE")
}
' file1 file2 ...
Any time you need to produce a report for all files and don't have GNU awk for ENDFILE, you MUST loop through ARGV[] in the END section (or loop through it in BEGIN and populate a different array for END section processing). Anything else will fail if you have empty files.
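The same principle, driving the report from the list of inputs rather than from the records read, looks like this in Python. It is a sketch with a hypothetical report function. Note that re.search uses Python regex syntax, which is close to, but not identical to, awk's EREs.

```python
import re


def report(paths, deprecated):
    """Return "path|TODO" or "path|DONE" for every path, empty files included."""
    pattern = re.compile(deprecated)
    rows = []
    # Iterating over the file list (like ARGV in the END block) means a file with
    # no lines still produces a row; it simply never matches.
    for path in paths:
        with open(path, errors="replace") as f:
            todo = any(pattern.search(line) for line in f)
        rows.append(f"{path}|{'TODO' if todo else 'DONE'}")
    return rows
```

For example, print("\n".join(report(["./f1.c", "./f2.c"], "deprecated_function"))) prints one row per file.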
Your awk script could be something like this:
awk -v deprecated="$1" '
FNR==1 {if(file) print file "|" (f?"TODO":"DONE"); file=FILENAME; f=0}
$0 ~ deprecated {f=1}
END {print file "|" (f?"TODO":"DONE")}' file1.c file2.c # etc.
The logic is fairly similar to your program so hopefully it's all clear. FNR is the record number of the current file, which I'm using to detect the start of a new file. Admittedly there's some repetition in the END block but I don't think it's a big deal. You could always use a function if you wanted to.
Testing it out:
$ cat f1.c
int deprecated_function()
{
// some deprecated stuff
}
$ cat f2.c
int good_function()
{
// some good stuff
}
$ find . -name "f?.c" -print0 | xargs -0 awk -v deprecated="deprecated" 'FNR==1 {if(file) print file "|" (f?"TODO":"DONE"); file=FILENAME; f=0} $0 ~ deprecated {f=1} END {print file "|" (f?"TODO":"DONE")}'
./f2.c|DONE
./f1.c|TODO
I have used -print0 with find and the -0 switch with xargs so that both programs work with file names separated by null bytes ("\0") rather than whitespace. This means that you won't run into problems with spaces in file names.
I am trying to append lines to some new files with awk in this way:
#!/usr/bin/awk -f
BEGIN {
FS = "[ \t|]"; }
{
print $5 "\t" $13 "\t" $14 >> "./bed/" $5 ".bed";
}
END {
}
The output file name is derived from a field of the awk input (the 5th field), so new files get created as needed. However, I am unable to run this script, since it fails with:
awk: ./blast2bed.awk:6: (FILENAME=blastout000 FNR=1) fatal: can't redirect to `./bed/AY517392.1.bed' (No such file or directory)
Any hints?
Thanks
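The directory bed has to exist, so create it first with mkdir bed, either before you run your script or in the BEGIN block (for example with system("mkdir -p bed")). You should also put parentheses around the output file name expression, since an unparenthesized concatenation after >> is ambiguous and not portable across awk implementations: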
print $5"\t"$13"\t"$14 >> ("./bed/"$5".bed")
Notes: you don't need to end lines with ; when there is a single statement per line, and the BEGIN and END blocks are optional (the empty END block can simply be removed).
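For comparison, here is a Python sketch of the same script. It assumes the field numbers from the question ($5, $13 and $14, i.e. indexes 4, 12 and 13), and the function name split_to_bed is hypothetical. The key difference is the explicit os.makedirs call, which plays the role of mkdir bed.

```python
import os
import re


def split_to_bed(lines, outdir="bed"):
    """Append fields 5, 13 and 14 of each line to <outdir>/<field 5>.bed."""
    os.makedirs(outdir, exist_ok=True)  # the equivalent of running mkdir bed first
    handles = {}  # like awk, keep each output file open until the end
    try:
        for line in lines:
            # FS = "[ \t|]": every single space, tab or pipe separates fields,
            # so consecutive separators produce empty fields, as in awk.
            f = re.split(r"[ \t|]", line.rstrip("\n"))
            f += [""] * (14 - len(f))  # missing fields are empty strings, as in awk
            name = f[4]
            if name not in handles:
                handles[name] = open(os.path.join(outdir, name + ".bed"), "a")
            handles[name].write(f"{f[4]}\t{f[12]}\t{f[13]}\n")
    finally:
        for fh in handles.values():
            fh.close()
```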
I would like to read a file like this
1.23213213
0.12321321
-1.12321321
0.23232322
into a variable or array, to use it somewhere in the main {} code.
However, I only want to use it if this file exists. How can I check whether it exists and, if it doesn't, skip using that variable or array?
I don't understand completely what you want to achieve, but perhaps something like this can be useful to you:
It processes the file line by line and saves each line in an array keyed by line number, so the order is kept. The END section checks how many lines were processed (FNR) to tell whether the file had any content.
awk '{ line[ FNR ] = $0 } END { if ( FNR > 0 ) { print "File" } else { print "NO file" } }' infile
EDIT to comment:
But in awk you can process several files given on the command line:
BEGIN {
...
}
## Processing of first file in command line.
FNR == NR {
a[ FNR ] = $0
next
}
## Processing of second file in command line
FNR < NR {
## Check if array 'a' has the values you want and use them
## 'for(...)variable += a[i]' or whatever.
}
Run the script like this:
awk -f script.awk first_file.txt second_file.txt
But if first_file.txt doesn't exist, awk will complain with an error.
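If the surrounding processing can be done in Python instead, the "use it only if the file exists" requirement reduces to catching FileNotFoundError. This sketch uses a hypothetical function, load_optional_numbers. It returns None when the file is missing, so the caller can tell "no file" apart from "empty file" (an empty list).

```python
def load_optional_numbers(path):
    """Return the numbers in `path` (one per line), or None if the file does not exist."""
    try:
        with open(path) as f:
            return [float(line) for line in f if line.strip()]
    except FileNotFoundError:
        return None
```

A caller would then check that the result is not None before using the list.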