Print columns with Awk or Cut? - scripting

I'm writing a script that will take a filename as an argument, find a specific word at the beginning of each line (the word ATOM, in this case) and print the values from specific columns.
$FILE=*.pdb *
if test $# -lt 1
then
echo "usage: $0 Enter a .PDB filename"
exit
fi
if test -r $FILE
then
grep ^ATOM $FILE | awk '{ print $18 }' | awk '{ print NR $4, "\t" $38,}'
else
echo "usage: $FILE must be readable"
exit
fi
I'm having trouble figuring out three problems:
How to use awk to print only lines that contain ATOM as the first word
How to use awk to print only certain columns from the rows that match the above criteria, specifically columns 2-20 and 38-40
How can I indicate this must be a pdb file? *.pdb *

To print only the lines whose first word is ATOM:
awk '$1 == "ATOM"' "$FILE"
Printing specific columns is probably better accomplished with cut, which with -c selects character positions:
grep ^ATOM "$FILE" | cut -c 2-20,38-40
If you want to ensure that the filename passed as the first argument to your script ends with .pdb: first, please don't (file extensions don't really matter in UNIX), and second, if you must, here's one way:
[ "${1%.pdb}" = "$1" ] && echo "usage:..." && exit 1
This takes the first command-line argument ($1), strips the suffix .pdb if it exists, and then compares it to the original command-line argument. If they match, it didn't have the suffix, so the program prints a usage message and exits with status code 1.
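If a single process is preferred, the match and the column selection can be folded into one awk call. This is only a minimal sketch: it assumes the columns are character positions (as with cut -c), and the function name atom_columns is made up for illustration.

```shell
# Hypothetical helper: for lines whose first field is ATOM, print
# character positions 2-20 and 38-40, mirroring cut -c 2-20,38-40.
atom_columns() {
    # substr(s, start, length): 19 characters from position 2, 3 from position 38
    awk '$1 == "ATOM" { print substr($0, 2, 19) substr($0, 38, 3) }' "$1"
}
```

It would be called as atom_columns "$FILE". Unlike grep ^ATOM, the test $1 == "ATOM" requires the whole first field to be ATOM, so a record such as ATOMX is not picked up.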

Contrary to the answer above, your task can be accomplished with a single awk command; there is no need for grep or cut.
if [ $# -lt 1 ];then
echo "usage: $0 Enter a .PDB filename"
exit
fi
FILE="$1"
case "$FILE" in
*.pdb )
if test -r "$FILE"
then
# print columns 2-20, assuming whitespace as the column separator
awk '$1=="ATOM" && NF>18 {
printf "%s ",$2
for(i=3;i<=19;i++){
printf "%s ",$i
}
printf "%s",$20
}' "$FILE"
else
echo "usage: $FILE must be readable"
exit
fi
;;
*) exit;;
esac

You can do everything you need in native bash without spawning any sub-processes:
#!/bin/bash
declare key="ATOM"
declare print_columns=( 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 38 39 40 )
[ ! -f "${1}" ] && echo "File not found." && exit
[ "${1%.pdb}" == "${1}" ] && echo "File is wrong type." && exit
while read -r -a columns; do
if [ "${columns[0]}" == "${key}" ]; then
printf "%s " "${key}"
for print_column in "${print_columns[@]}"; do
# columns[] is zero-based, so column N lives at index N-1
printf "%s " "${columns[print_column-1]}"
done
printf "\n"
fi
done < "${1}"


find pattern in multiple files and perform some action on them

I have 2 files - file1.txt and file2.txt.
I want to set a condition such that a command is run on both files only if a pattern "xyz" is present in both files. If even one file lacks that pattern, the command shouldn't run. Also, I need both files to be passed to the grep or awk command at the same time, as I am using this code inside another workflow language.
I wrote some code with grep, but it performs the action even if the pattern is present in only one of the files, which is not what I want. Please let me know if there is a better way to do this.
if grep "xyz" file1.txt file2.txt; then
my_command file1.txt file2.txt
else
echo " command cannot be run on these files"
fi
Thanks!
This awk should work for you:
awk -v s='xyz' 'FNR == NR {
if ($0 ~ s) {
++p
nextfile
}
next
}
FNR == 1 {
if (!p) exit 1
}
{
if ($0 ~ s) {
++p
exit
}
}
END {
exit p < 2
}' file1 file2
This exits with status 0 if the given string is found in both files; otherwise it exits with status 1.
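Since nextfile is not available in every awk, here is a rough alternative that avoids it by remembering which files have matched. It is a sketch under assumptions: the function name both_match and its argument order are invented for illustration, and the two file arguments are expected to be different paths.

```shell
# Hypothetical helper: succeed only if pattern $1 occurs in both files $2 and $3.
both_match() {
    awk -v s="$1" '
        # count each file once, the first time one of its lines matches
        $0 ~ s && !(FILENAME in seen) { seen[FILENAME]; n++ }
        # exit status 0 only when both files contributed a match
        END { exit (n == 2) ? 0 : 1 }
    ' "$2" "$3"
}
```

The condition then becomes if both_match xyz file1.txt file2.txt; then my_command file1.txt file2.txt; fi. Note that awk -v interprets backslash escapes in the assigned value, so a pattern containing backslashes would need extra care.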
Salvaging code from a deleted answer by Cyrus:
if grep -q "xyz" file1.txt && grep -q "xyz" file2.txt; then
echo "xyz was found in both files"
else
echo "xyz was found in one or no file"
fi
If you need to run a single command, save this as a script, and run that script in your condition.
#!/bin/sh
grep -q "xyz" "$1" && grep -q "xyz" "$2"
If you save this in your PATH and call it grepboth (don't forget to chmod a+x grepboth when you save it) your condition can now be written
grepboth file1.txt file2.txt
Or perhaps grepall to accept a search expression and a list of files;
#!/bin/sh
what=$1
shift
for file; do
grep -q "$what" "$file" || exit
done
This could be used as
grepall "xyz" file1.txt file2.txt

how to check if file exist and not empty in awk using wildcard in filename or path

I am trying to create a small awk line that should go through several paths and in each path find a specific file that should not be empty (wildcard). If the file is not found or empty it should print "NULL".
I did some searching in stackoverflow and other places but couldn't really make it work.
Example: path is /home/test[1..5]/test.json
awk -F"[{}]" '{ if (system("[ ! -f FILENAME ]") == 0 && NR > 0 && NF > 0) print $2; else print "NULL"}' /home/test*/test.txt
If the test.txt is empty or does not exists it should print "NULL" but meanwhile when it is not empty it should print $2.
In the above example it will just skip the empty file and not write "NULL"!
Example execution /home/ has test1, test2, test3 path and each path has one test.txt (/home/test1/test.txt is empty):
The test.txt file in each of the /home/test* path will be empty or the below kind of text (always one line):
{"test":1033}
# awk -F"[{}]" '{ if (system("[ ! -f FILENAME ]") == 0 && NR > 0 && NF > 0) print $2; else print "NULL"}' /home/test*/test.txt
"test":1033
"test":209
File examples:
/home/test0/test.txt (not empty -> {"test":1033})
/home/test1/test.txt (empty)
/home/test2/test.txt (not empty -> {"test":209})
/home/test3/test.txt (not exist)
But for ../test1/test.txt I would like to see "NULL" but instead I see nothing!
I would like to have a printout like the below:
"test":1033
NULL
"test":209
NULL
What am I doing wrong?
BR
If I understand what you are asking correctly, there is no need for a system call. One can use ENDFILE to check to see if a file was empty.
Try this:
awk -F"[{}]" '{print $2} ENDFILE{if(FNR==0)print "NULL"}' /home/test*/test.txt
FNR is the number of records read so far from the current file. If FNR is zero at the end of a file, then that file had no records and we print NULL.
Note: Since this solution uses ENDFILE, Ed Morton points out that GNU awk (sometimes called gawk) is required.
Example
Suppose that we have these three files:
$ ls -1 home/test*/test.txt
home/test1/test.txt
home/test2/test.txt
home/test3/test.txt
All are empty except home/test2/test.txt which contains:
$ cat home/test2/test.txt
first{second}
1st{2nd}
Our command produces the output:
$ awk -F"[{}]" '{print $2} ENDFILE{if(FNR==0)print "NULL"}' home/test*/test.txt
NULL
second
2nd
NULL
Test for non-existent files
for d in home/test*/; do [ -f "$d/test.txt" ] || echo "Missing $d/test.txt"; done
Sample output:
$ for d in home/test*/; do [ -f "$d/test.txt" ] || echo "Missing $d/test.txt"; done
Missing home/test4//test.txt
A plain shell loop also covers both cases, printing NULL when the file is missing or empty:
for dir in home/test*; do
file="$dir/test.txt"
if [ -s "$file" ]; then
# exists and is non-empty
val=$( awk -F'[{}]' '{print $2}' "$file" )
else
# does not exist or is empty
val="NULL"
fi
printf '%s\n' "$val"
done
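If gawk is not available and a single awk call is still wanted, one possibility is to read each file explicitly with getline inside BEGIN. This is a sketch only: the function name null_or_field is invented, and the files must be named explicitly (a shell glob never expands to a file that does not exist).

```shell
# Hypothetical helper: print field 2 (split on { or }) of every line in
# each named file, or NULL for a file that is empty or cannot be read.
null_or_field() {
    awk 'BEGIN {
        for (i = 1; i < ARGC; i++) {
            f = ARGV[i]
            n = 0
            # getline returns 1 per record, 0 at end of file, -1 on error
            while ((getline line < f) > 0) {
                n++
                split(line, part, /[{}]/)
                print part[2]
            }
            close(f)
            if (n == 0) print "NULL"
        }
    }' "$@"
}
```

For example, null_or_field /home/test0/test.txt /home/test1/test.txt /home/test3/test.txt prints one result per file, with NULL for both the empty and the missing one.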

How to move grep inside awk script?

Below I have 3 grep commands that I would like to replace with awk's own pattern matching, so I have tried
! /000000000000/;
! /000000000000/ $0;
! /000000000000/ $3;
where I don't get an error, but testing with both the script below and
$ echo 000000000000 | awk '{ ! /000000000000/; print }'
000000000000
it doesn't skip the lines as expected.
Question
Can anyone explain why my "not grep" doesn't work in awk?
grep -v '^#' $hosts | grep -E '[0-9A-F]{12}\b' | grep -v 000000000000 | awk '{
print "host "$5" {"
print " option host-name \""$5"\";"
gsub(/..\B/,"&:", $3)
print " hardware ethernet "$3";"
print " fixed-address "$1";"
print "}"
print ""
}' > /etc/dhcp/reservations.conf
Could you please try changing your code to:
echo 000000000000 | awk '!/000000000000/'
Problem in your attempt: $ echo 000000000000 | awk '{ ! /000000000000/; print }'. Here the condition ! /000000000000/ sits inside the action block and is followed by ;, so it is evaluated and its result is simply discarded; it prints nothing and skips nothing. The print after it is a separate statement that does not depend on that condition, so it prints the line regardless.
awk works on pattern{action} pairs. A semicolon inside an action ends the statement before it, and whatever follows the ; is an entirely new statement as far as awk is concerned.
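To make the pattern{action} distinction concrete, here are three ways of dropping the unwanted lines that should behave the same. The function names are hypothetical and exist only so the variants can be compared side by side; each reads standard input.

```shell
# Pattern with no action: the default action is to print matching records.
skip_with_pattern() { awk '!/000000000000/'; }

# Discard unwanted records inside an action, then print the rest.
skip_with_next() { awk '/000000000000/ { next } { print }'; }

# Explicit test inside a single action block.
skip_with_if() { awk '{ if ($0 !~ /000000000000/) print }'; }
```

For instance, echo 000000000000 | skip_with_pattern prints nothing, which is the behavior the question was after.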
EDIT: Here is a possible solution based on the OP's attempt. It is untested, since the OP showed no sample input. I am using --re-interval because my awk version is old; you can drop it if you have a newer awk.
awk --re-interval '!/^#/ && !/000000000000/ && /[0-9A-Fa-f]{12}/{
print "host "$5" {"
print " option host-name \""$5"\";"
gsub(/..\B/,"&:", $3)
print " hardware ethernet "$3";"
print " fixed-address "$1";"
print "}"
print ""
}' "$hosts" > /etc/dhcp/reservations.conf
Taking a look at your code:
$ echo 000000000000 | awk '
{
! /000000000000/ # on the given input this evaluates to false,
# but since it's inside an action, it affects nothing
print # this prints the record regardless of what happened above
}'
Adding a print may help you understand:
$ echo 000000000000 | awk '{ print ! /000000000000/; print }'
0
000000000000
Removing the !:
$ echo 000000000000 | awk '{ print /000000000000/; print }'
1
000000000000
This is all I can help you with since there is not enough information for more.

while loop only iterates once

I'm writing a unix script which does an awk and pipes to a while loop. For some reason, though, the while loop iterates only once. Can someone point out what I am missing?
awk '{ print $1, $2}' file |
while IFS=" " read A B
do
echo $B
if [ "$B" -eq "16" ];
then
grep -A 1 $A $1 | python unreverse.py
else
grep -A 1 $A
fi
done
"file" looks something like
cheese 2
elephant 5
tiger 16
Solution
The solution is to replace:
grep -A 1 $A
With:
grep -A 1 "$A" filename
Where filename is whatever file you intended grep to read from. Just guessing, maybe you intended:
grep -A 1 "$A" "$1"
I added double-quotes to prevent any possible word-splitting.
Explanation
The problem is that, without the filename, the grep command reads from and consumes all of standard input. It does this on the first run through the loop. Consequently, there is no input left for the second run, so read A B fails and the loop terminates.
A Simpler Example
We can see the same issue happening with many fewer statements. Here is a while loop that is given two lines of input but only loops once:
$ { echo 1; echo 2; } | while read n; do grep "$n"; echo "n=$n"; done
n=1
Here, simply by adding a filename to the grep statement, we see that the while loop executes twice, as it should:
$ { echo 1; echo 2; } | while read n; do grep "$n" /dev/null; echo "n=$n"; done
n=1
n=2
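Another way to protect the loop, sketched here under assumptions, is to feed read from a separate file descriptor so that nothing in the loop body can drain the loop's input. The function name loop_on_fd3 and its file argument are hypothetical, and cat stands in for any command that reads standard input.

```shell
# Hypothetical helper: iterate over the lines of file $1 via descriptor 3.
loop_on_fd3() {
    while read -r n <&3; do
        # Stand-in for a stdin-hungry command such as grep without a filename;
        # it drains standard input but cannot touch descriptor 3.
        cat > /dev/null
        echo "n=$n"
    done 3< "$1"
}
```

With a two-line input file the loop body runs twice, even though cat swallows everything on standard input during the first pass.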

printing variable inside awk

In this script, I want awk to print the variables $file, $f, $order and sum/NR (all in a single row)
#!/bin/bash
for file in pmb_mpi tau xhpl mpi_tile_io fftw ; do
for f in 2.54 1.60 800 ;do
if [ ${f} = 2.54 ]
then
for order in even odd ; do
# echo ${file}_${f}_${order}_v1.xls >> P-state-summary.xls
awk '{sum+=$2} END {print ${file}_${f}_${order}_v1.xls, sum/NR}' ${file}_${f}_${order}_v1.xls >> P-state-summary.xls
done
else
# echo ${file}_${f}_v1.xls >> P-state-summary.xls
awk '{sum+=$2} END {print ${file}_${f}_v1.xls , sum/NR}' ${file}_${f}_v1.xls >> P-state-summary.xls
fi
done
done
Could any of you kindly help me with this?
awk doesn't go out and get shell variables for you, you have to pass them in as awk variables:
pax> export x=XX
pax> export y=YY
pax> awk 'BEGIN{print x "_" y}'
_
pax> awk -vx=$x -v y=$y 'BEGIN{print x "_" y}'
XX_YY
There is another way of doing it by using double quotes instead of single quotes (so that bash substitutes the values before awk sees them), but then you have to start escaping $ symbols and all sorts of other things in your awk command:
pax> awk "BEGIN {print \"${x}_${y}\"}"
XX_YY
I prefer to use explicit variable creation.
By the way, there's another solution to your previous related question here which should work.
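Applied to the script in the question, the -v approach might look like the following sketch. The function name average_column and the awk variable name are invented for illustration; it assumes the file name contains no backslashes, because awk -v interprets escape sequences in the value.

```shell
# Hypothetical helper: print "<file name> <mean of column 2>" for file $1.
average_column() {
    # name is handed to awk as an awk variable rather than expanded by the shell
    awk -v name="$1" '{ sum += $2 } END { if (NR) print name, sum / NR }' "$1"
}
```

Inside the loops this would be called as average_column "${file}_${f}_${order}_v1.xls" >> P-state-summary.xls. The if (NR) guard avoids a division by zero on an empty file.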
You can do this:
echo -n "${file}_${f}_${order}_v1.xls " >> P-state-summary.xls
# or printf "${file}_${f}_${order}_v1.xls " >> P-state-summary.xls
awk '{sum+=$2} END {print sum/NR}' "${file}_${f}_${order}_v1.xls" |
tee "${file}_${f}_avrg.xls" >> P-state-summary.xls
Using echo -n or printf without a "\n" will output the text without a newline so the output of the awk command will follow it on the same line. I added a space as a separator, but you could use anything.
Using tee will allow you to write your output to the individual files and the summary file using only one awk invocation per input (order) file.