How to grep inside while loop with two files or one file - awk

I have a use case where I am listing some keys from S3 and filtering the results with a grep command.
fileA:
abc/def
def/123
After listing the keys, I am trying to remove those exact keys from the list. For example, if the listing returns other keys with the same prefixes:
list:
abc/def/123
abc/def/1234
abc/ghi/12345
def/123/456
def/456/4567
I want to remove the keys matching the patterns read from the file, i.e. abc/def and def/123.
Code:
while read line; do
prefix = $(echo "$line"| grep -oPw '[A-Za-z0-9]*')
aws s3api list-objects --bucket blah-bucket --prefix "$prefix" | grep -vFfw "$line" > result
done < fileA
I am getting the error command not found: prefix.
What am I missing here in the loop?

This is a common problem that has been addressed in various questions posted here for years and years. :-)
The notation you want should look more like this:
prefix="$(echo ...)"
Remember that the shell is a shell, not really a full fledged programming language. Its parsing rules were intended to facilitate calling other programs, and setting up the plumbing to allow those programs to interact with each other.
Here are the various ways mis-placed spaces can be interpreted by shells in the Bourne family (sh, bash, ksh, zsh, ash/dash). Consider:
var=val
var =val
var= val
var = val
var=val: this is the correct syntax for variable assignment -- an unquoted word followed immediately by an equals sign, followed immediately by the value.
var =val: this runs the var command with =val as its argument.
var= val: this assigns an empty string to the var variable, then runs the val command as if var had been exported to it. This is meant to provide single-use environment variables to commands called by the shell.
var = val: this runs the var command with = and val as arguments.
Other (non-Bourne-style or non-POSIX) shells will have different interpretations.
Also, beware that you will be overwriting the file result for every iteration of this loop.
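With that fixed, the loop itself would look something like the sketch below; it keeps the OP's bucket name, fileA and result, anchors the prefix extraction to the first path component, and appends with >> so each iteration's output is kept (note that list-objects prints JSON by default, so the grep filter is only a rough cut):
while read -r line; do
    # assignment: no spaces around "="
    prefix=$(echo "$line" | grep -oP '^[A-Za-z0-9]+')
    aws s3api list-objects --bucket blah-bucket --prefix "$prefix" |
        grep -vF "$line" >> result
done < fileA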

Well, you had a simple syntax error. But if I understand what you mean by remove the keys matching the pattern read from file, you're working too hard.
If the following solution does what you want, I guarantee it will run faster and be easier to understand:
$ head patterns input
==> patterns <==
abc/def
def/123
==> input <==
abc/def/123
abc/def/1234
abc/ghi/12345
def/123/456
def/456/4567
$ grep -vf patterns input
abc/ghi/12345
def/456/4567
Any shell solution that iterates over the data is bound to be the wrong approach. Look for ways to let grep and friends operate on whole files, and use the shell to choose the files. It's always a safe bet that your problem can be solved that way, because over the decades lots of problems have looked like your problem. :-)
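Since the patterns here are literal prefixes rather than regular expressions, -F keeps any regex metacharacters from being interpreted; a hedged variant of the same command:
grep -vFf patterns input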

You can also use the following chain of commands:
$ cat to_remove.in
abc/def
def/123
$ cat to_process.in
abc/def/123
abc/def/1234
abc/ghi/12345
def/123/456
def/456/4567
$ awk 'BEGIN{ORS="\\|"}{print}' to_remove.in | sed 's/\\|$//' | xargs -I {} grep -v {} to_process.in
abc/ghi/12345
def/456/4567
Explanations:
awk builds a regex from the lines of to_remove.in, joining them with \| so that grep -v can use it as an alternation to exclude the matching lines from to_process.in
sed 's/\\|$//' removes the trailing \| left at the end of the regex string
xargs then passes the resulting regex string to the grep command
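If an extended regex is acceptable, the same alternation can be built more directly with paste; a sketch using the same file names:
$ grep -Ev "$(paste -sd'|' to_remove.in)" to_process.in
abc/ghi/12345
def/456/4567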

Related

bulk renaming files rearranging file names based on delimiter

I have seen questions that are close to this but I have not seen the exact answer I need and can't seem to get my head wrapped around the regex, awk, sed, grep, rename that I would need to make it happen.
I have files in one directory, sequentially named, that came from multiple subdirectories of a different directory; they were copied there using find piped to xargs.
Command I used:
find <dir1> -name "*.png" | xargs cp -t <dir2>
This resulted in the second directory containing duplicate filenames sequentially named as follows:
<name>.png
<name>.png.~1~
<name>.png.~2~
...
<name>.png.~n~
What I would like to do is take all files ending in ~*~ and rename them as follows:
<name>.#.png, where the "#" is the number between the "~"s at the end of the file name
Any help would be appreciated.
With Perl's rename (stand alone command):
rename -nv 's/^([^.]+)\.(.+)\.~([0-9]+)~/$1.$3.$2/' *
If everything looks fine remove option -n.
There might be an easier way to do this, but here is a small shell script using grep and awk that achieves what you want:
for i in $(ls | grep '\.png\.'); do
    name=$(echo "$i" | awk -F'png' '{print $1}')   # everything up to "png", keeps the trailing dot
    n=$(echo "$i" | awk -F'~' '{print $2}')        # the number between the ~s
    mv "$i" "$name$n.png"
done
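If the filenames might contain spaces, a variant that avoids parsing ls and relies only on parameter expansion is possible; a minimal sketch, assuming every backup matches *.png.~N~:
for f in *.png.~*~; do
    n=${f##*.~}         # strip through the last ".~", leaving "N~"
    n=${n%~}            # drop the trailing "~", leaving the number
    base=${f%.png.*}    # filename without the ".png.~N~" tail
    mv -- "$f" "$base.$n.png"
done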

Retain backslashes with while read loop in multiple shells

I have the following code:
#!/bin/sh
while read line; do
printf "%s\n" $line
done < input.txt
Input.txt has the following lines:
one\two
eight\nine
The output is as follows
onetwo
eightnine
The "standard" solutions to retain the slashes would be to use read -r.
However, I have the following limitations:
must run under #!/bin/sh for reasons of portability/POSIX compliance.
not all systems will support the -r switch to read under /bin/sh
The input file format cannot be changed
Therefore, I am looking for another way to retain the backslash after reading in the line. I have come up with one working solution, which is to use sed to replace the \ with some other value (e.g. ||) in a temporary file (thus bypassing my last requirement above); then, after reading the lines in, use sed again to transform them back. Like so:
#!/bin/sh
sed -e 's/[\/&]/||/g' input.txt > tempfile.txt
while read line; do
printf "%s\n" $line | sed -e 's/||/\\/g'
done < tempfile.txt
I'm thinking there has to be a more "graceful" way of doing this.
Some ideas:
1) Use command substitution to store this into a variable instead of a file. Problem - I'm not sure command substitution will be portable here either and my attempts at using a variable instead of a file were unsuccessful. Regardless, file or variable the base solution is really the same (two substitutions).
2) Use IFS somehow? I've investigated a little, but not sure that can help in this issue.
3) ???
What are some better ways to handle this given my constraints?
Thanks
Your constraints seem a little strict. Here's a piece of code I jotted down (I'm not too sure how valuable your while loop is for the other things you would like to do, so I removed it just for ease). I don't guarantee this code to be robust, but the logic should give you hints about the direction you may wish to proceed in. (temp.dat is the input file.)
#!/bin/sh
var1="$(cut -d\\ -f1 temp.dat)"
var2="$(cut -d\\ -f2 temp.dat)"
iter=1
set -- $var2
for x in $var1; do
    if [ "$iter" -eq 1 ]; then
        echo $x "\\" $1
    else
        echo $x "\\" $2
    fi
    iter=$((iter+1))
done
As Larry Wall once said, writing a portable shell is easier than writing a portable shell script.
perl -lne 'print $_' input.txt
The simplest possible Perl script is simpler still, but I imagine you'll want to do something with $_ before printing it.
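If Perl isn't available, awk behaves the same way, since it does no backslash processing on input lines; a minimal sketch:
awk '{ print }' input.txt
This prints one\two and eight\nine unchanged; any per-line work would go inside the braces.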

awk: setting environment variables directly from within an awk script

First post here, but I've been a lurker for ages. I have googled for ages, but can't find what I want (many ambiguous topic subjects which don't deliver what the topic suggests they do...). Not new to awk or scripting, just a little rusty :)
I'm trying to write an awk script which will set shell env values as it runs - for another bash script to pick up and use later on. I cannot simply use stdout from awk to report the value I want to set (i.e. "export whatever=$(awk cmd here)"), as that's already directed to a 'results file' which the awk script is creating (plus I have more than one variable to export in the final code anyway).
As an example test script, to demo my issue:
echo $MYSCRIPT_RESULT # returns nothing, not currently set
echo | awk -f scriptfile.awk # do whatever, setting MYSCRIPT_RESULT as we go
echo $MYSCRIPT_RESULT # desired: returns the env value set in scriptfile.awk
Within scriptfile.awk, I have tried (without success):
1) building and executing an ad hoc string directly:
{
cmdline="export MYSCRIPT_RESULT=1"
cmdline
}
2) using the system function:
{
cmdline="export MYSCRIPT_RESULT=1"
system(cmdline)
}
... but these do not work. I suspect that these 2 commands are creating a subshell within the shell awk is executing from, and doing what I ask (proven by touching files as a test), but once the "cmd"/system calls have completed, the subshell dies, unfortunately taking whatever I have set with it - so my env setting changes don't stick from "the caller of awk"'s perspective.
So my question is: how do you actually set env variables within awk directly, so that a calling process can access these env values after awk execution has completed? Is it actually possible?
Other than the ad hoc/system ways above, which I have proven fail for me, I cannot see how this could be done (other than writing these values to a 'random' file somewhere to be picked up and read by the calling script, which IMO is a little dirty anyway), hence, help!
All ideas/suggestions/comments are welcome!
You cannot change the environment of your parent process. If
MYSCRIPT_RESULT=$(awk stuff)
is unacceptable, what you are asking cannot be done.
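One way around the stated obstacle (stdout already going to a results file) is to let awk write the results file itself and reserve stdout for the value you want back; a minimal sketch with hypothetical file names:
# results go to results.txt via awk's own redirection; only the
# value meant for the caller is printed to stdout and captured
MYSCRIPT_RESULT=$(awk '{ print $0 > "results.txt" } END { print "1" }' input.txt)
echo "$MYSCRIPT_RESULT"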
You can also use something like what is described at
Set variable in current shell from awk
unset var
var=99
declare $( echo "foobar" | awk '/foo/ {tmp="17"} END {print "var="tmp}' )
echo "var=$var"
var=17
The awk END clause is essential; otherwise, if there are no matches to the pattern, declare dumps the current environment to stdout and doesn't change the content of your variable.
Multiple values can be set by separating them with spaces.
declare a=1 b=2
echo -e "a=$a\nb=$b"
NOTE: declare is bash-only; for other shells, use eval with the same syntax.
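For a plain POSIX sh, the eval form of the same example might look like this (a sketch, not bash-specific):
eval "$( echo "foobar" | awk '/foo/ {tmp="17"} END {print "var="tmp}' )"
echo "var=$var"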
You can do this, but it's a bit of a kludge. Since awk does not allow redirection to a file descriptor, you can use a fifo or a regular file:
$ mkfifo fifo
$ echo MYSCRIPT_RESULT=1 | awk '{ print > "fifo" }' &
$ IFS== read var value < fifo
$ eval export $var=$value
It's not really necessary to split the var and value; you could just as easily have awk print the "export" and just eval the output directly.
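That simpler variant might look like this sketch, where awk emits the whole export statement and the shell evals it directly:
eval "$(echo 1 | awk '{ print "export MYSCRIPT_RESULT=" $1 }')"
echo "$MYSCRIPT_RESULT"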
I found a good answer: encapsulate everything in a subshell!
The command declare works as below:
#Creates 3 variables
declare var1=1 var2=2 var3=3
ex1:
#Exactly the same as above
$(awk 'BEGIN{var="declare "; var=var "var1=1 var2=2 var3=3"; print var}')
I found some really interesting uses for this technique. In the next example I have several partitions with labels. I create variables using the labels as variable names and the device names as variable values.
ex2:
#Partition data
lsblk -o NAME,LABEL
NAME LABEL
sda
├─sda1
├─sda2
├─sda5 System
├─sda6 Data
└─sda7 Arch
#Creates a subshell to execute the text
$(\
#Pipe lsblk to awk
lsblk -o NAME,LABEL | awk \
#Initiate the variable with the text for the declare command
'BEGIN{txt="declare "}'\
#Filters devices with labels Arch or Data
'/Data|Arch/'\
#Concatenate txt with itself plus text for the variables(name and value)
#substr eliminates the special characters before the device name
'{txt=txt$2"="substr($1,3)" "}'\
#awk prints the text and the subshell executes it as a command
'END{print txt}'\
)
The end result of this is 2 variables: Data with value sda6 and Arch with value sda7.
The same example in a single line:
$(lsblk -o NAME,LABEL | awk 'BEGIN{txt="declare "}/Data|Arch/{txt=txt$2"="substr($1,3)" "}END{print txt}')
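To confirm the variables really landed in the current shell, a quick check (assuming the labels shown above):
echo "Data=$Data Arch=$Arch"   # expected: Data=sda6 Arch=sda7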

How do I iterate over all the lines output by a command in zsh?

How do I iterate over all the lines output by a command using zsh, without setting IFS?
The reason is that I want to run a command against every file output by a command, and some of these files contain spaces.
Eg, given the deleted file:
foo/bar baz/gamma
That is, a single directory 'foo', containing a sub directory 'bar baz', containing a file 'gamma'.
Then running:
git ls-files --deleted | xargs ls
Will result in that file being handled as two files: 'foo/bar' and 'baz/gamma'.
I need it to handle it as one file: 'foo/bar baz/gamma'.
If you want to run the command once for all the lines:
ls "${(#f)$(git ls-files --deleted)}"
The f parameter expansion flag means to split the command's output on newlines. There's a more general form (#s:||:) to split at an arbitrary string like ||. The # flag means to retain empty records. Somewhat confusingly, the whole expansion needs to be inside double quotes, to avoid IFS splitting on the output of the command substitution, but it will produce separate words for each record.
If you want to run the command for each line in turn, the portable idiom isn't particularly complicated:
git ls-files --deleted | while IFS= read -r line; do ls "$line"; done
If you want to run the command as few times as the command line length limit permits, use zargs.
autoload -U zargs
zargs -- "${(#f)$(git ls-files --deleted)}" -- ls
Using tr and the -0 option of xargs, assuming that the lines don't contain \000 (NUL), which is a fair assumption due to NUL being one of the characters that can't appear in filenames:
git ls-files --deleted | tr '\n' '\000' | xargs -0 ls
This turns the line foo/bar baz/gamma\n into foo/bar baz/gamma\000, which xargs -0 knows how to handle.
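If your git is recent enough, ls-files can emit NUL-terminated records itself, which makes the tr step unnecessary; a hedged shortcut with the same effect:
git ls-files --deleted -z | xargs -0 ls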

How to assign the output of a program to a variable in a DCL com script on VMS?

For example, I have a perl script p.pl that writes "5" to stdout. I'd like to assign that output to a variable like so:
$ x = perl p.pl ! not working code
$ ! now x would be 5
The PIPE command allows you to do Unix-ish pipelining, but DCL is not bash. Getting the output assigned to a symbol is tricky. Each PIPE segment runs in a separate subprocess (like Unix) and there's no way to return a symbol from a subprocess. AFAIK, there's no direct DCL equivalent of bash's x=$(command) for capturing stdout in a variable.
The typical approach is to write (redirect) the output to a file and then read it back:
$ PIPE perl p.pl > temp.txt
$ open t temp.txt
$ read t x
$ close t
Another approach is to assign the return value as a JOB logical which is shared by all subprocesses. This can be done as a one-liner using PIPE:
$ PIPE perl p.pl | DEFINE/JOB RET_VALUE @SYS$PIPE
$ x = f$logical("RET_VALUE")
Since the "RET_VALUE" is shared by all processes in the job, you have to be careful of side-effects.
Look up the PIPE command. It lets you do Unix-like things.
I wanted to identify a particular ACE from a file's ACL and then assign the value to a variable I could refer to later in the script. I wanted to avoid the overhead of writing to/reading from a file as I had 1000s of files to iterate over. This method worked for me.
$ PIPE DIR/SEC filename | SEARCH SYS$PIPE variable | (READ SYS$PIPE variable && DEFINE/JOB/NOLOG variable &variable)
$ SHOW LOGICAL variable