I have two files, file1.txt and file2.txt.
I want to set up a condition such that a command is run on both files only if a pattern "xyz" is present in both of them. If even one file lacks the pattern, the command shouldn't run. Also, I need both files to be passed to the grep or awk command at the same time, as I am using this code inside another workflow language.
I wrote some code with grep, but it performs the action even if the pattern is present in only one of the files, which is not what I want. Please let me know if there is a better way to do this.
if grep "xyz" file1.txt file2.txt; then
my_command file1.txt file2.txt
else
echo " command cannot be run on these files"
fi
Thanks!
This awk should work for you:
awk -v s='xyz' 'FNR == NR {        # first file
    if ($0 ~ s) {
        ++p                        # pattern found; no need to read further
        nextfile
    }
    next
}
FNR == 1 {                         # first line of the second file
    if (!p) exit 1                 # pattern was missing from the first file
}
{
    if ($0 ~ s) {
        ++p                        # pattern found in the second file too
        exit
    }
}
END {
    exit p < 2                     # exit 0 only if both files matched
}' file1.txt file2.txt
This will exit with 0 if the given string is found in both files; otherwise it will exit with 1.
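Since the script communicates only through its exit status, it slots straight into the original if. A sketch, with the awk program above saved to a file whose name (both.awk) is just illustrative:
if awk -v s='xyz' -f both.awk file1.txt file2.txt; then
    my_command file1.txt file2.txt
else
    echo "command cannot be run on these files"
fi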
Salvaging code from a deleted answer by Cyrus:
if grep -q "xyz" file1.txt && grep -q "xyz" file2.txt; then
echo "xyz was found in both files"
else
echo "xyz was found in one or no file"
fi
If you need to run a single command, save this as a script, and run that script in your condition.
#!/bin/sh
grep -q "xyz" "$1" && grep -q "xyz" "$2"
If you save this in your PATH and call it grepboth (don't forget to chmod a+x grepboth when you save it), your condition can now be written as
grepboth file1.txt file2.txt
Or perhaps grepall, to accept a search expression and a list of files:
#!/bin/sh
what=$1
shift
for file; do
grep -q "$what" "$file" || exit
done
This could be used as
grepall "xyz" file1.txt file2.txt
Related
I am trying to create a small awk line that should go through several paths and, in each path, find a specific file (matched by a wildcard) that should not be empty. If the file is not found or is empty, it should print "NULL".
I did some searching on Stack Overflow and other places but couldn't really make it work.
Example: the path is /home/test[1..5]/test.txt
awk -F"[{}]" '{ if (system("[ ! -f FILENAME ]") == 0 && NR > 0 && NF > 0) print $2; else print "NULL"}' /home/test*/test.txt
If test.txt is empty or does not exist, it should print "NULL"; when it is not empty, it should print $2.
In the above example it will just skip the empty file and not write "NULL"!
Example execution: /home/ has test1, test2, and test3 directories, and each directory has one test.txt (/home/test1/test.txt is empty).
The test.txt in each of the /home/test* paths is either empty or contains a single line of the following kind:
{"test":1033}
# awk -F"[{}]" '{ if (system("[ ! -f FILENAME ]") == 0 && NR > 0 && NF > 0) print $2; else print "NULL"}' /home/test*/test.txt
"test":1033
"test":209
File examples:
/home/test0/test.txt (not empty -> {"test":1033})
/home/test1/test.txt (empty)
/home/test2/test.txt (not empty -> {"test":209})
/home/test3/test.txt (not exist)
But for ../test1/test.txt I would like to see "NULL" but instead I see nothing!
I would like to have a printout like the below:
"test":1033
NULL
"test":209
NULL
What am I doing wrong?
BR
If I understand what you are asking correctly, there is no need for a system call. One can use ENDFILE to check to see if a file was empty.
Try this:
awk -F"[{}]" '{print $2} ENDFILE{if(FNR==0)print "NULL"}' /home/test*/test.txt
FNR is the number of records read so far from the current file. If FNR is zero at the end of a file, then that file had no records and we print NULL.
Note: since this solution uses ENDFILE, Ed Morton points out that GNU awk (sometimes called gawk) is required.
Example
Suppose that we have these three files:
$ ls -1 home/test*/test.txt
home/test1/test.txt
home/test2/test.txt
home/test3/test.txt
All are empty except home/test2/test.txt which contains:
$ cat home/test2/test.txt
first{second}
1st{2nd}
Our command produces the output:
$ awk -F"[{}]" '{print $2} ENDFILE{if(FNR==0)print "NULL"}' home/test*/test.txt
NULL
second
2nd
NULL
Test for non-existent files
for d in home/test*/; do [ -f "${d}test.txt" ] || echo "Missing ${d}test.txt"; done
Sample output:
$ for d in home/test*/; do [ -f "${d}test.txt" ] || echo "Missing ${d}test.txt"; done
Missing home/test4/test.txt
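If GNU awk is not available, a portable sketch of the same idea is to compare FILENAME against ARGV each time a new file starts producing records: any argument that was jumped over must have been an empty file. (Like the ENDFILE version, this still cannot report files that don't exist at all.)
awk -F'[{}]' '
FNR == 1 {
    # any argv entries we skipped over produced no records, i.e. were empty
    while (ARGV[++argi] != FILENAME) print "NULL"
}
{ print $2 }
END {
    # trailing empty files never trigger FNR == 1
    while (++argi < ARGC) print "NULL"
}' /home/test*/test.txt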
Alternatively, you can handle both missing and empty files in the shell and only call awk on files that have content:
for dir in home/test*; do
file="$dir/test.txt"
if [ -s "$file" ]; then
# exists and is non-empty
val=$( awk -F'[{}]' '{print $2}' "$file" )
else
# does not exist or is empty
val="NULL"
fi
printf '%s\n' "$val"
done
I have some files whose time and date are wrong, but the filename contains the correct time and date, and I am trying to write a script to fix this with the touch command.
Example of filename:
071212_090537.jpg
I would like this to be converted to the following format:
1712120905.37
Note that the year is listed as 07 in the filename even though it is 17, so I would like the first 0 to be changed to 1.
How can I do this using awk or sed?
I'm quite new to awk and sed, and to programming in general. I have tried to search for a solution and instructions, but haven't managed to figure out how to solve this.
Can anyone help me?
Thanks. :)
Take your example:
awk -F'[_.]' '{$0=$1$2;sub(/^./,"1");sub(/..$/,".&")}1' <<< "071212_090537.jpg"
will output:
1712120905.37
If you want the file to be renamed, you can let awk generate the mv original new commands and pipe the output to sh, like this (comments inline):
listYourFiles |                # list your files as input to awk
awk -F'[_.]' '{o=$0; $0=$1$2; sub(/^./,"1"); sub(/..$/,".&")
               printf "mv %s %s\n", o, $0}' |    # prints one "mv old new" per file
sh                             # executes the mv commands
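Since the stated goal was to fix timestamps with touch rather than to rename files, the same conversion can feed touch -t directly. A minimal sketch, assuming every filename matches the pattern and that your touch accepts the [[CC]YY]MMDDhhmm[.ss] timestamp format:
for f in *.jpg; do
    # derive 1712120905.37 from 071212_090537.jpg
    ts=$(awk -F'[_.]' '{$0=$1$2; sub(/^./,"1"); sub(/..$/,".&")}1' <<< "$f")
    touch -t "$ts" "$f"
done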
It's completely unnecessary to call awk or sed for this; you can do it in your shell, e.g. with bash:
$ f='071212_090537.jpg'
$ [[ $f =~ ^.(.*)_(.*)(..)\.[^.]+$ ]]
$ echo "1${BASH_REMATCH[1]}${BASH_REMATCH[2]}.${BASH_REMATCH[3]}"
1712120905.37
This is probably what you're trying to do:
for old in *.jpg; do
[[ $old =~ ^.(.*)_(.*)(..)\.[^.]+$ ]] || { printf 'Warning, unexpected old file name format "%s"\n' "$old" >&2; continue; }
new="1${BASH_REMATCH[1]}${BASH_REMATCH[2]}.${BASH_REMATCH[3]}"
[[ -f "$new" ]] && { printf 'Warning, new file name "%s" generated from "%s" already exists, skipping.\n' "$new" "$old" >&2; continue; }
mv -- "$old" "$new"
done
You need the test for new already existing because an old of 071212_090537.jpg or 171212_090537.jpg (or various other values) would produce the same new of 1712120905.37.
I think sed really is the easiest solution. You could do this:
▶ for f in *.jpg ; do
new_f=$(sed -E 's/([0-9]{6})_([0-9]{4})([0-9]{2})\.jpg/\1\2.\3.jpg/' <<< "$f")
mv "$f" "$new_f"
done
For more info:
You probably need to read an introductory tutorial on regular expressions.
Note that the -E option to sed allows use of extended regular expressions, allowing a more readable and convenient expression here.
Use of <<< is a Bashism known as a "here-string". If you are using a shell that doesn't support that, A <<< $b can be rewritten as echo $b | A.
Testing:
▶ touch 071212_090538.jpg 071212_090539.jpg
▶ ls -1 *.jpg
071212_090538.jpg
071212_090539.jpg
▶ for f in *.jpg ; do
new_f=$(sed -E 's/([0-9]{6})_([0-9]{4})([0-9]{2})\.jpg/\1\2.\3.jpg/' <<< "$f")
mv "$f" "$new_f"
done
▶ ls -1
0712120905.38.jpg
0712120905.39.jpg
I have a file with options for a command I run. Whenever I run the command I want it to run with the options defined in the first line which is not commented out. I do this using this bash script:
while read run opt c; do
[[ $run == \#* ]] && continue
./submit.py $opt $run -c "$c"
break
done < to_submit.txt
The file to_submit.txt has entries like this:
#167993 options/optionfile.py long description
167995 options/other_optionfile.py other long description
...
After the submit script has run with the options from the first non-commented-out line, I want to comment out that line, provided the command ran successfully.
I can find the line number of the options I used by adding this to the while loop:
line=$(grep -n "$run" to_submit.txt | grep "$opt" | grep "$c" | cut -f 1 -d ":")
But I'm not sure how to actually prepend a # to that line now. I could probably use head and tail to save the other lines, process that line separately, and combine it all back into the file. But that sounds too complicated; there must be an easier sed or awk solution to this.
$ awk '!f && sub(/^[^#]/,"#&"){f=1} 1' file
#167993 options/optionfile.py long description
#167995 options/other_optionfile.py other long description
...
To overwrite the contents of the original file:
awk '!f && sub(/^[^#]/,"#&"){f=1} 1' file > tmp && mv tmp file
just like with any other UNIX command.
Using GNU sed is probably simplest here:
sed '0,/^[^#]/ s//#&/' file
Add option -i if you want to update file in place.
0,/^[^#]/ matches all lines up to and including the first one that doesn't start with #
s//#&/ then prepends # to that line.
Note that s//.../ (i.e., an empty regex) reuses the last matching regex in the range, which is /^[^#]/ in this case.
Note that the command doesn't work with BSD/OSX sed, unfortunately, because starting a range with 0 so as to allow the range endpoint to match the very first line also is not supported there. It is possible to make the command work with BSD/OSX sed, but it's more cumbersome.
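One such workaround (a sketch; the separate -e fragments are needed because BSD sed requires a label to end its script fragment):
sed -e '/^[^#]/{s//#&/' -e ':a' -e 'n' -e 'ba' -e '}' file
On the first non-comment line it prepends #, then the :a/n/ba loop simply copies the remaining lines through unchanged, so no later line is touched.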
If the input/output file is not very large, you can do it all in Bash:
optsfile=to_submit.txt
has_run_cmd=0
outputlines=()
while IFS= read -r inputline || [[ -n $inputline ]] ; do
read run opt c <<<"$inputline"
if (( has_run_cmd )) || [[ $run == \#* ]] ; then
outputlines+=( "$inputline" )
elif ./submit.py "$opt" "$run" -c "$c" ; then
has_run_cmd=1
outputlines+=( "#$inputline" )
else
exit $?
fi
done < "$optsfile"
(( has_run_cmd )) && printf '%s\n' "${outputlines[@]}" > "$optsfile"
The lines of the file are put in the outputlines array, with a hash prepended to the line that was used in the ./submit.py command. If the command runs successfully, the file is overwritten with the lines in outputlines.
After some searching around I found that
awk -v run="$run" -v opt="$opt" '{if($1 == run && $2 == opt) {print "#" $0} else print}' to_submit.txt > temp
mv -b -f temp to_submit.txt
seems to solve this (without needing to find the line number first; it just compares $run and $opt). This assumes that the combination of run and opt is enough to identify a line and that the comment is not needed (which happens to be true in my case). I'm not sure how the comment, which spans multiple fields in awk, could also be taken into account.
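One sketch for matching the comment as well, assuming the fields in to_submit.txt are separated by single spaces so the rebuilt line compares equal, is to match the whole line at once:
awk -v line="$run $opt $c" '$0 == line { $0 = "#" $0 } { print }' to_submit.txt > temp
mv -b -f temp to_submit.txt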
I'm currently writing a shell script that will be given a directory, then output an ls of that directory with the return code from a C program appended to each line. The C program only needs to be called for regular files.
The problem I'm having is that output from the C program is cluttering up the output from awk, and I can't get stdout to redirect to /dev/null inside of awk. I have no use for the output, I just need the return code. Speed is definitely a factor, so if you have a more efficient solution I'd be happy to hear it. Code follows:
directory=$1
ls -i --full-time $directory | awk '
{
rc = 0
if (substr($2,1,1) == "-") {
dbType=system("cprogram '$directory'/"$10)
}
print $0 " " rc
}
'
awk is not shell, so you can't just use a shell variable inside an awk script; and in shell, always quote your variables. Try this:
directory="$1"
ls -i --full-time "$directory" | awk -v dir="$directory" '
{
rc = 0
if (substr($2,1,1) == "-") {
rc = system("cprogram \"" dir "/" $10 "\" >/dev/null")
}
print $0, rc
}
'
Oh and, of course, don't actually do this. See http://mywiki.wooledge.org/ParsingLs.
I just spent a minute thinking about what your script is actually doing, and rather than trying to use awk as a shell and parse the output of ls, it looks like the solution you REALLY want would be more like:
directory="$1"
find "$directory" -type f -maxdepth 1 -print |
while IFS= read -r dirFile
do
op=$(ls -i --full-time "$dirFile")
cprogram "$dirFile" >/dev/null
rc="$?"
printf "%s %s\n" "$op" "$rc"
done
and you could probably save a step by using the -printf arg for find to get whatever info you're currently using ls for.
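A sketch of that idea, assuming GNU find (the -printf directives %p, %i, %M, %u, %g, %s, and %T+ exist there, but this field selection is illustrative and only approximates ls -i --full-time; filenames containing tabs or newlines would still break it):
find "$directory" -maxdepth 1 -type f -printf '%p\t%i %M %u %g %s %T+\n' |
while IFS=$'\t' read -r name info; do
    cprogram "$name" >/dev/null    # discard output, keep only the exit status
    printf '%s %s %s\n' "$info" "$name" "$?"
done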
I'm writing a script that will take a filename as an argument, find a specific word at the beginning of each line - the word ATOM, in this case - and print the values from specific columns.
$FILE=*.pdb *
if test $# -lt 1
then
echo "usage: $0 Enter a .PDB filename"
exit
fi
if test -r $FILE
then
grep ^ATOM $FILE | awk '{ print $18 }' | awk '{ print NR $4, "\t" $38,}'
else
echo "usage: $FILE must be readable"
exit
fi
I'm having trouble figuring out three problems:
How to use awk to print only lines that contain ATOM as the first word
How to use awk to print only certain columns from the rows that match the above criteria, specifically columns 2-20 and 38-40
How can I indicate this must be a pdb file? *.pdb *
That would be
awk '$1 == "ATOM"' $FILE
That task is probably better accomplished with cut:
grep ^ATOM $FILE | cut -c 2-20,38-40
If you want to ensure that the filename passed as the first argument to your script ends with .pdb: first, please don't (file extensions don't really matter in UNIX), and secondly, if you must, here's one way:
"${1%%.pdb}" == "$1" && echo "usage:..." && exit 1
This takes the first command-line argument ($1), strips the suffix .pdb if it exists, and then compares it to the original command-line argument. If they match, it didn't have the suffix, so the program prints a usage message and exits with status code 1.
Contrary to the answer above, your task can be accomplished with just one awk command. There is no need for grep or cut or anything else.
if [ $# -lt 1 ];then
echo "usage: $0 Enter a .PDB filename"
exit
fi
FILE="$1"
case "$FILE" in
*.pdb )
if test -r "$FILE"
then
# print columns 2-20, assuming whitespace as the column separator
awk '$1=="ATOM" && NF>18 {
printf "%s ",$2
for(i=3;i<=19;i++){
printf "%s ",$i
}
printf "%s\n",$20
}' "$FILE"
else
echo "usage: $FILE must be readable"
exit
fi
;;
*) exit;;
esac
You can do everything you need in native bash without spawning any sub-processes:
#!/bin/bash
declare key="ATOM"
declare print_columns=( 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 38 39 40 )
[ ! -f "${1}" ] && echo "File not found." && exit
[ "${1%.pdb}" == "${1}" ] && echo "File is wrong type." && exit
while read -r -a columns; do
    if [ "${columns[0]}" == "${key}" ]; then
        printf "%s " "${key}"
        for print_column in "${print_columns[@]}"; do
            printf "%s " "${columns[${print_column}]}"
        done
        printf "\n"
    fi
done < "${1}"