while loop only iterates once - awk

I'm writing a Unix script that runs an awk command and pipes its output to a while loop. For some reason, though, the while loop iterates only once. Can someone point out what I am missing?
awk '{ print $1, $2}' file |
while IFS=" " read A B
do
    echo $B
    if [ "$B" -eq "16" ];
    then
        grep -A 1 $A $1 | python unreverse.py
    else
        grep -A 1 $A
    fi
done
"file" looks something like
cheese 2
elephant 5
tiger 16

Solution
The solution is to replace:
grep -A 1 $A
With:
grep -A 1 "$A" filename
Where filename is whatever file you intended grep to read from. Just guessing, maybe you intended:
grep -A 1 "$A" "$1"
I added double-quotes to prevent any possible word-splitting.
Explanation
The problem is that, without the filename, the grep command reads from and consumes all of standard input. It does this on the first run through the loop. Consequently, there is no input left for the second run, so read A B fails and the loop terminates.
A Simpler Example
We can see the same issue happening with many fewer statements. Here is a while loop that is given two lines of input but only loops once:
$ { echo 1; echo 2; } | while read n; do grep "$n"; echo "n=$n"; done
n=1
Here, simply by adding a filename to the grep statement, we see that the while loop executes twice, as it should:
$ { echo 1; echo 2; } | while read n; do grep "$n" /dev/null; echo "n=$n"; done
n=1
n=2
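Going back to the original script, applying that fix (plus quoting the variables) would give something like the following, assuming "$1" really is the file you meant grep to search:
awk '{ print $1, $2}' file |
while IFS=" " read -r A B
do
    echo "$B"
    if [ "$B" -eq 16 ]; then
        # both branches now give grep a file to read,
        # so grep no longer consumes the loop's standard input
        grep -A 1 "$A" "$1" | python unreverse.py
    else
        grep -A 1 "$A" "$1"
    fi
done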

Related

Change a string using sed or awk

I have some files whose time and date are wrong, but the filename contains the correct time and date, so I am trying to write a script that fixes this with the touch command.
Example of filename:
071212_090537.jpg
I would like this to be converted to the following format:
1712120905.37
Note that the year is listed as 07 in the filename even though it is 17, so I would like the first 0 to be changed to 1.
How can I do this using awk or sed?
I'm quite new to awk and sed, and to programming in general. I have tried to search for a solution and instructions, but haven't managed to figure out how to solve this.
Can anyone help me?
Thanks. :)
Take your example:
awk -F'[_.]' '{$0=$1$2; sub(/^./,"1"); sub(/..$/,".&")}1' <<< "071212_090537.jpg"
will output:
1712120905.37
If you want the file to be renamed as well, you can let awk generate the mv old new command and pipe the output to sh, like this (comments inline):
listYourFiles |                                      # list your files as input to awk
awk -F'[_.]' '{o=$0; $0=$1$2; sub(/^./,"1"); sub(/..$/,".&")
               printf "mv %s %s\n",o,$0}' |          # this will print "mv old new"
sh                                                   # this will execute the mv commands
It's completely unnecessary to call awk or sed for this; you can do it in your shell, e.g. with bash:
$ f='071212_090537.jpg'
$ [[ $f =~ ^.(.*)_(.*)(..)\.[^.]+$ ]]
$ echo "1${BASH_REMATCH[1]}${BASH_REMATCH[2]}.${BASH_REMATCH[3]}"
1712120905.37
This is probably what you're trying to do:
for old in *.jpg; do
    [[ $old =~ ^.(.*)_(.*)(..)\.[^.]+$ ]] || { printf 'Warning, unexpected old file name format "%s"\n' "$old" >&2; continue; }
    new="1${BASH_REMATCH[1]}${BASH_REMATCH[2]}.${BASH_REMATCH[3]}"
    [[ -f "$new" ]] && { printf 'Warning, new file name "%s" generated from "%s" already exists, skipping.\n' "$new" "$old" >&2; continue; }
    mv -- "$old" "$new"
done
You need the test for whether new already exists, since an old of 071212_090537.jpg or 171212_090537.jpg (or various other values) would create the same new of 1712120905.37.
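Since the original goal was to correct the files' timestamps with touch rather than (or as well as) rename them, here is a minimal sketch along the same lines, assuming the generated string is meant to be used as a touch -t timestamp:
for f in *.jpg; do
    [[ $f =~ ^.(.*)_(.*)(..)\.[^.]+$ ]] || continue
    # 1712120905.37 is already in touch -t's [[CC]YY]MMDDhhmm[.SS] format
    touch -t "1${BASH_REMATCH[1]}${BASH_REMATCH[2]}.${BASH_REMATCH[3]}" "$f"
done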
I think sed really is the easiest solution. You could do this:
▶ for f in *.jpg ; do
    new_f=$(sed -E 's/([0-9]{6})_([0-9]{4})([0-9]{2})\.jpg/\1\2.\3.jpg/' <<< "$f")
    mv -- "$f" "$new_f"
  done
For more info:
You probably need to read an introductory tutorial on regular expressions.
Note that the -E option to sed allows use of extended regular expressions, allowing a more readable and convenient expression here.
Use of <<< is a Bashism known as a "here-string". If you are using a shell that doesn't support it, A <<< "$b" can be rewritten as echo "$b" | A.
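For example, these two assignments do the same thing:
new_f=$(sed -E 's/([0-9]{6})_([0-9]{4})([0-9]{2})\.jpg/\1\2.\3.jpg/' <<< "$f")
new_f=$(echo "$f" | sed -E 's/([0-9]{6})_([0-9]{4})([0-9]{2})\.jpg/\1\2.\3.jpg/')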
Testing:
▶ touch 071212_090538.jpg 071212_090539.jpg
▶ ls -1 *.jpg
071212_090538.jpg
071212_090539.jpg
▶ for f in *.jpg ; do
    new_f=$(sed -E 's/([0-9]{6})_([0-9]{4})([0-9]{2})\.jpg/\1\2.\3.jpg/' <<< "$f")
    mv -- "$f" "$new_f"
  done
▶ ls -1
0712120905.38.jpg
0712120905.39.jpg

How do I correctly retrieve, using bash's cut, the first field from a line with only 1 field in a text file?

In a text file (accounts.txt) with (financial) accounts, the sub-accounts are, and need to be, separated by underscores, like this:
assets
assets_hh
assets_hh_reimbursements
assets_hh_reimbursements_ff
... etc.
Now I want to get specific sub-accounts from specific line numbers, e.g.:
field 3 from line 4:
$ lnr=4; fnr=3
$ cut -d $'\n' -f "$lnr" < accounts.txt | cut -d _ -f "$fnr"
reimbursements
$
But for the first line, which has only 1 field, both fnr=1 and fnr=2 give:
$ cut -d $'\n' -f 1 < accounts.txt | cut -d _ -f "$fnr"
assets
$
which is undesired behaviour.
Now I can get around this by prefixing an underscore to each account and adding 1 to each required field number, but this is not an elegant solution.
Am I doing something wrong and/or can this be changed by issuing a different retrieval command?
Using cut -d $'\n' -f "$lnr" to get the lnr-th line from the file is somewhat strange. A more common approach is to use sed, like:
sed -n "${lnr}p" file | cmd ...
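For example, with the accounts.txt from the question, field 3 of line 4 could be fetched like this (note that cut still prints the whole line when the delimiter is absent, so the original problem remains):
$ lnr=4; fnr=3
$ sed -n "${lnr}p" accounts.txt | cut -d _ -f "$fnr"
reimbursements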
However, awk is better for this - one invocation can handle both lnr and fnr.
file=accounts.txt
lnr=1
fnr=2
awk -F_ -v l=$lnr -v f=$fnr 'NR==l{print $f}' "$file"
For all combinations of lnr/fnr, the above produces:
line                          field1  field2  field3          field4
---------------------------------------------------------------------
assets                        assets
assets_hh                     assets  hh
assets_hh_reimbursements      assets  hh      reimbursements
assets_hh_reimbursements_ff   assets  hh      reimbursements  ff
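If you would rather get no output (and a non-zero exit status) when the requested field does not exist on that line, the awk can be extended slightly, for example:
awk -F_ -v l="$lnr" -v f="$fnr" 'NR==l { if (f<=NF) { print $f; exit 0 } else exit 1 }' "$file"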
Check the solution below:
cat f
assets
assets_hh
assets_hh_reimbursements
assets_hh_reimbursements_ff
Based on your comment, try the commands below:
$ lnr=1; fnr=2
$ echo $lnr $fnr
1 2
$ awk -v lnr=$lnr -v fnr=$fnr -F'_' 'NR==lnr {print $fnr}' f
### Output is empty because line 1 has no field 2 when FS is "_"
$ lnr=4;fnr=1
$ echo $lnr $fnr
4 1
$ awk -v lnr=$lnr -v fnr=$fnr -F'_' 'NR==lnr {print $fnr}' f
assets
$ lnr=4;fnr=3
$ echo $lnr $fnr
4 3
$ awk -v lnr=$lnr -v fnr=$fnr -F'_' 'NR==lnr {print $fnr}' f
reimbursements
One solution is to head|tail and read into an array so it's easier to work with the items:
lnr=4
fnr=2
IFS=_ read -r -a arr < <(head -n "$lnr" accounts.txt | tail -n 1)
# note that the array is 0-indexed, so the field number has to fit that
echo "${arr[$fnr]}"
Then you could expand the idea into a more usable function:
get_field_from_file() {
    local fname="$1"
    local lnr="$2"
    local fnr="$3"
    IFS=_ read -r -a arr < <(head -n "$lnr" "$fname" | tail -n 1)
    if (( fnr >= ${#arr[@]} )); then
        return 1
    else
        echo "${arr[$fnr]}"
    fi
}
field=$(get_field_from_file "accounts.txt" "4" "2") || echo "no such line or field"
[[ -n $field ]] && echo "field: $field"

How do you change a variable in a KSH if or case statement?

Does anyone know how to set a variable with global scope in a KSH if, case, or loop statement?
I am trying to run the following code, but the script only echoes "H" instead of the actual value seen in the input file.
CFG_DIR=${WORK_DIR}/cfg
CFG_FILE=${CFG_DIR}/$1
NAME=$(echo $CFG_FILE | cut -f1 -d\.)
UPPER_BUS_NETWORK="H"
cat ${CFG_FILE} | grep -v ^\# |
while read CLINE
do
    PROPERTY=$(echo $CLINE | cut -f1 -d\=)
    VALUE=$(echo $CLINE | cut -f2 -d\=)
    if [ ${PROPERTY} = "UpperBusService" ]; then
        UPPER_BUS_SERVICE="${VALUE}"
    fi
    if [ ${PROPERTY} = "UpperBusNetwork" ]; then
        UPPER_BUS_NETWORK="${VALUE}"
    fi
done
echo ${UPPER_BUS_NETWORK}
Are you sure you're running that in ksh? Which version? Ksh93 runs the while loop at the end of a pipeline in the current shell, not in a subshell, so assignments made inside it survive. Bash, dash, ash and pdksh run it in a subshell, though. I'm not sure about ksh88.
Compare
$ bash -c 'a=111; echo foo | while read bar; do echo $a; a=222; echo $a; done; echo "after: $a"'
111
222
after: 111
to
$ ksh -c 'a=111; echo foo | while read bar; do echo $a; a=222; echo $a; done; echo "after: $a"'
111
222
after: 222
Zsh gives the same result as ksh93.
Unfortunately, pdksh doesn't support process substitution and ksh93 does, but not when redirected into the done of a while loop, so the usual solution which works in Bash is not available. This is what it would look like:
# Bash (or Zsh)
while read ...
do
...
done < <(command)
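In a shell that does support it (Bash or Zsh), the loop from the question could be fed that way, so the assignment made inside the loop survives; roughly:
while read -r CLINE
do
    PROPERTY=$(echo "$CLINE" | cut -f1 -d=)
    VALUE=$(echo "$CLINE" | cut -f2 -d=)
    if [ "${PROPERTY}" = "UpperBusService" ]; then
        UPPER_BUS_SERVICE="${VALUE}"
    fi
    if [ "${PROPERTY}" = "UpperBusNetwork" ]; then
        UPPER_BUS_NETWORK="${VALUE}"
    fi
done < <(grep -v '^#' "${CFG_FILE}")   # the loop now runs in the current shell
echo "${UPPER_BUS_NETWORK}"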
Using a temporary file may be the only solution:
command > tmpfile
while read
do
...
done < tmpfile
Some additional notes:
Instead of cat ${CFG_FILE} | grep -v ^\# do grep -v ^\# "${CFG_FILE}"
Usually, you should use read -r so backslashes are handled literally
Instead of NAME=$(echo $CFG_FILE | cut -f1 -d\.) you should be able to do NAME=${CFG_FILE%%.*}, and similarly PROPERTY=${CLINE%%=*} and VALUE=${CLINE#*=} (see the sketch after these notes)
Variables should usually be quoted on output, for example in each of your echo statements and your cat command
I recommend the habit of using lowercase or mixed case variable names to avoid conflict with shell variables (though none are present in your posted code)
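Putting those notes together with the temporary-file approach above, the relevant part of the script might end up looking something like this (a sketch; mktemp and the lowercase names are my own choices, not requirements):
tmpfile=$(mktemp)
name=${CFG_FILE%%.*}                  # replaces the echo | cut for NAME
grep -v '^#' "${CFG_FILE}" > "$tmpfile"
while read -r cline
do
    property=${cline%%=*}             # text before the first '='
    value=${cline#*=}                 # text after the first '='
    if [ "$property" = "UpperBusService" ]; then
        upper_bus_service=$value
    elif [ "$property" = "UpperBusNetwork" ]; then
        upper_bus_network=$value
    fi
done < "$tmpfile"
rm -f "$tmpfile"
echo "$upper_bus_network"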

Problem with awk and grep

I am using the following script to find the running process and print its id, command, etc.
if [ "`uname`" = "SunOS" ]
then
awk_c="nawk"
ps_d="/usr/ucb/"
time_parameter=7
else
awk_c="awk"
ps_d=""
time_parameter=5
fi
main_class=RiskEngine
connection_string=db.regression
AWK_CMD='BEGIN{printf "%-15s %-6s %-8s %s\n","ID","PID","STIME","Cmd"} {printf "%-15s %-6s %-8s %s %s %s\n","MY_APP",$2,$time_parameter, main_class, connection_string, port}'
while getopts ":pnh" opt; do
case $opt in
p) AWK_CMD='{ print $2 }'
do_print_message=1;;
n) AWK_CMD='{printf "%-15s %-6s %-8s %s %s %s\n","MY_APP",$2,$time_parameter,main_class, connection_string, port}' ;;
h) print "usage : `basename ${0}` {-p} {-n} : Returns details of process running "
print " -p : Returns a list of PIDS"
print " -n : Returns process list without preceding header"
exit 1 ;
esac
done
ps auxwww | grep $main_class | grep 10348 | grep -v grep | ${awk_c} -v main_class=$merlin_main_class -v connection_string=$merlin_connection_string -v port=10348 -v time_parameter=$time_parameter "$AWK_CMD"
# cat /etc/redhat-release
Red Hat Enterprise Linux AS release 4 (Nahant Update 6)
# uname -a
Linux deapp25v 2.6.9-67.0.4.EL #1 Fri Jan 18 04:49:54 EST 2008 x86_64 x86_64 x86_64 GNU/Linux
When I am executing the following from the script independently or inside script
# ps auxwww | grep $main_class | grep 10348 | grep -v grep | ${awk_c} -v main_class=$merlin_main_class -v connection_string=$merlin_connection_string -v port=10348 -v time_parameter=$time_parameter "$AWK_CMD"
I get two rows on Linux:
ID PID STIME Cmd
MY_APP 6217 2355352 RiskEngine 10348
MY_APP 21874 5316 RiskEngine 10348
I just have one jvm (Java command) running in the background but still I see 2 rows.
I know one of them (the duplicate with pid 21874) comes from the awk command that I am executing, since its argument list again includes the main class and the port, hence the two rows. Can you please help me avoid the duplicate row?
AWK can do all that grepping for you.
Here is a simple example of how an AWK command can be selective:
ps auxww | awk -v select="$main_class" '$0 ~ select && /10348/ && ! (/grep/ || /awk/) {print}'
ps can be made to selectively output fields which will help a little to reduce false positives. However pgrep may be more useful to you since all you're really using is the PID from the result.
pgrep -f "$main_class.*10348"
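For instance, if all you really need is the list of PIDs (the -p case in the script), something like this might be enough:
pids=$(pgrep -f "${main_class}.*10348")   # matches against the full command line, like the greps above
echo "$pids"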
I've reformatted the code as code, but you need to learn that the return key is your friend. The monstrously long pipelines should be split over multiple lines - I typically use one line per command in the pipeline. You can also write awk scripts on more than one line. This makes your code more readable.
Then you need to explain to us what you are up to.
However, it is likely that you are using 'awk' as a variant on grep and are finding that the value 10348 (possibly intended as a port number on some command line) is also in the output of ps as one of the arguments to awk (as is the 'main_class' value), so you get the extra information. You'll need to revise the awk script to eliminate (ignore) the line that contains 'awk'.
Note that you could still be bamboozled by a command running your main class on port 9999 (any value other than 10348) if it so happens that it is run by a process with PID or PPID equal to 10348. If you're going to do the job thoroughly, then the 'awk' script needs to analyze only the 'command plus options' part of the line.
You're already using the grep -v grep trick in your code, why not just update it to exclude the awk process as well with grep -v ${awk_c}?
In other words, the last command in your script would become (split across lines for readability, and with the real command parameters to awk rather than blah blah blah):
ps auxwww |
    grep $main_class |
    grep 10348 |
    grep -v grep |
    grep -v ${awk_c} |
    ${awk_c} -v blah blah blah
This will ensure the list of processes does not contain any with the word awk in it.
Keep in mind that it's not always a good idea to do it this way (false positives) but, since you're already taking the risk with processes containing grep, you may as well do so with those containing awk as well.
You can add this simple code in front of all your awk args:
'!/awk/ { .... original awk code .... }'
The '!/awk/' will have the effect of telling awk to ignore any line containing the string awk.
You could also remove your 'grep -v' if you extended my awk suggestion into something like:
'!/awk/ && !/grep/ { ... original awk code ... }'.
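For instance, the header-printing AWK_CMD from the question might become something like the following, at which point the grep -v grep stage is no longer needed either:
AWK_CMD='BEGIN{printf "%-15s %-6s %-8s %s\n","ID","PID","STIME","Cmd"}
!/awk/ && !/grep/ {printf "%-15s %-6s %-8s %s %s %s\n","MY_APP",$2,$time_parameter,main_class,connection_string,port}'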

Print columns with Awk or Cut?

I'm writing a script that will take a filename as an argument, find a specific word at the beginning of each line - the word ATOM, in this case - and print the values from specific columns.
$FILE=*.pdb *
if test $# -lt 1
then
    echo "usage: $0 Enter a .PDB filename"
    exit
fi
if test -r $FILE
then
    grep ^ATOM $FILE | awk '{ print $18 }' | awk '{ print NR $4, "\t" $38,}'
else
    echo "usage: $FILE must be readable"
    exit
fi
I'm having trouble figuring out three problems:
How to use awk to print only lines that contain ATOM as the first word
How to use awk to print only certain columns from the rows that match the above criteria, specifically columns 2-20 and 38-40
How can I indicate this must be a pdb file? *.pdb *
That would be
awk '$1 == "ATOM"' $FILE
That task is probably better accomplished with cut:
grep ^ATOM $FILE | cut -c 2-20,38-40
If you want to ensure that the filename passed as the first argument to your script ends with .pdb: first, please don't (file extensions don't really matter in UNIX), and secondly, if you must, here's one way:
"${1%%.pdb}" == "$1" && echo "usage:..." && exit 1
This takes the first command-line argument ($1), strips the suffix .pdb if it exists, and then compares it to the original command-line argument. If they match, it didn't have the suffix, so the program prints a usage message and exits with status code 1.
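Putting those three pieces together, the whole script might look something like this sketch (messages and column ranges taken from the question):
#!/bin/sh
if [ $# -lt 1 ]; then
    echo "usage: $0 Enter a .PDB filename"
    exit 1
fi
FILE=$1
if [ "${FILE%%.pdb}" = "$FILE" ]; then
    echo "usage: $0 Enter a .PDB filename"
    exit 1
fi
if [ -r "$FILE" ]; then
    # lines whose first word is ATOM, character columns 2-20 and 38-40
    awk '$1 == "ATOM"' "$FILE" | cut -c 2-20,38-40
else
    echo "usage: $FILE must be readable"
    exit 1
fi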
Contrary to the answer above, your task can be accomplished with just one awk command. No need for grep or cut or ...
if [ $# -lt 1 ]; then
    echo "usage: $0 Enter a .PDB filename"
    exit
fi
FILE="$1"
case "$FILE" in
*.pdb )
    if test -r "$FILE"
    then
        # do fields 2-20, assuming whitespace as the column separator
        awk '$1=="ATOM" && NF>=20 {
            printf "%s ",$2
            for(i=3;i<=19;i++){
                printf "%s ",$i
            }
            printf "%s\n",$20
        }' "$FILE"
    else
        echo "usage: $FILE must be readable"
        exit
    fi
    ;;
*) exit;;
esac
You can do everything you need in native bash without spawning any sub-processes:
#!/bin/bash
declare key="ATOM"
declare -a print_columns=( 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 38 39 40 )
[ ! -f "${1}" ] && echo "File not found." && exit
[ "${1%.pdb}" == "${1}" ] && echo "File is wrong type." && exit
while read -r -a columns; do
    if [ "${columns[0]}" == "${key}" ]; then
        printf "%s " "${key}"
        for print_column in "${print_columns[@]}"; do
            printf "%s " "${columns[${print_column}]}"
        done
        printf "\n"
    fi
done < "${1}"