I am interested in efficiently searching files for content using bash and related tools (e.g. sed, grep), in the specific case where I have additional information about where in the file the intended content is. For example, I want to replace a particular string on line #3 of each file that contains a specific string on line 3. I therefore don't want to do a recursive grep -r on the whole directory, since that would search the entirety of each file and waste time: I know that the string of interest, if it is there, is on line #3. The full-grep approach could be done with grep -rl 'string_to_find_in_files' base_directory_to_search_recursively.
Instead I am thinking about using sed -i ".bak" '3s/string_to_replace/string_to_replace_with/' files to search only line #3 of all files recursively in a directory; however, sed seems to only be able to take one file as an input argument. How can I apply sed to multiple files recursively? find -exec {} \; and find -print0 | xargs -0 seem to be very slow. Is there a faster method than using find?
I can achieve the desired effect very quickly with awk, but only on a single directory; it does not seem to be recursive, e.g. awk 'FNR==3{print $0}' directory/*. Any way to make this recursive? Thanks.
You can use find to build the list of files and feed them to sed or awk one at a time via xargs.
For example, this prints the first line of every file listed by find:
$ find . -name "*.csv" | xargs -L 1 sed -n '1p'
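Applying the same idea to the original line-3 replacement, a rough sketch using the placeholder names from the question (GNU sed takes the backup suffix attached as -i.bak; BSD/macOS sed wants it as a separate argument):
find base_directory_to_search_recursively -type f -print0 |
  xargs -0 sed -i.bak '3s/string_to_replace/string_to_replace_with/'
The substitution is only attempted on line 3 of each file (sed with -i still has to rewrite the whole file), and xargs -0 passes many files per sed invocation while coping with spaces in names.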
I have a directory full of output files, with files with names:
file1.out,file2.out,..., fileN.out.
There is a certain key string in each of these files, let's call it keystring. I want to replace every instance of keystring with newstring in all files.
If there was only one file, I know I can do:
awk '{gsub("keystring","newstring",$0); print $0}' file1.out > tmp && mv tmp file1.out
Is there a way to loop through all N files in awk?
You could use the find command for this. Please make sure you run it on a test file first; only once it works fine should you run it on your actual path (on all your actual files), to be on the safe side. This also needs a newer version of gawk, which has the inplace option to save the output back into the files themselves.
find your_path -type f -name "*.out" -exec awk -i inplace -f myawkProgram.awk {} +
Where your awk program is as follows, as per your shown samples (cat myawkProgram.awk is used here only to display the program's contents):
cat myawkProgram.awk
{
gsub("keystring","newstring",$0); print $0
}
A second option would be to pass all the .out files to your gawk program itself with -i inplace, by doing something like the following (but again, make sure you run this on a single test file first and only run the actual command once you are happy with the result):
awk -i inplace '{gsub("keystring","newstring",$0); print $0}' *.out
sed is well suited to this, so integrating it with find:
find /directory/path -type f -name "*.out" -exec sed -i 's/keystring/newstring/g' {} +
Find files with the extension .out and then execute the sed command on as many of the found files at a time as possible (using + with -exec).
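The same batching can be had with find -print0 piped to xargs -0, which also survives spaces in file names (GNU sed's in-place form shown; BSD/macOS sed needs -i ''):
find /directory/path -type f -name "*.out" -print0 |
  xargs -0 sed -i 's/keystring/newstring/g'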
I have a bunch of output files labelled file1.out, file2.out, file3.out, ...,fileN.out.
All of these files have multiple instances of a string in them called "keystring". However, only the first instance of "keystring" is meaningful to me. The other lines are not required.
When I do
grep 'keystring' *.out
I reach all files, and they output every instance of keystring.
When I do grep -m1 'keystring' *.out I only get one match overall, the one from file1.out.
I want to extract the line where keystring appears FIRST in all these output files. How can I pull this off?
You can use awk:
awk '/keystring/ {print FILENAME ":", $0; nextfile}' *.out
nextfile moves on to the next file as soon as the first match in the current file has been printed.
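nextfile is not available in every awk (GNU awk, mawk and BusyBox awk have it); a portable fallback is a flag that is reset on the first record of each file, roughly:
awk 'FNR==1 {done=0} !done && /keystring/ {print FILENAME ":", $0; done=1}' *.out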
Use find -exec like so:
find . -name '*.out' -exec grep -m1 'keystring' {} \;
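One caveat: grep omits the filename prefix when it is handed a single file, so adding -H (and batching with +) gives output closer to the awk answer above:
find . -name '*.out' -exec grep -m1 -H 'keystring' {} +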
SEE ALSO:
GNU find manual
I have seen questions that are close to this, but I have not seen the exact answer I need, and I can't seem to wrap my head around the regex, awk, sed, grep, or rename invocation I would need to make it happen.
I have files in one directory, sequentially named, that were gathered from multiple subdirectories of a different directory using find piped to xargs.
Command I used:
find <dir1> -name "*.png" | xargs cp -t <dir2>
This resulted in the second directory containing duplicate filenames sequentially named as follows:
<name>.png
<name>.png.~1~
<name>.png.~2~
...
<name>.png.~n~
What I would like to do is take every file ending in ~*~ and rename it as follows:
<name>.#.png, where '#' is the number between the '~'s at the end of the original file name.
Any help would be appreciated.
With Perl's rename (standalone command):
rename -nv 's/^([^.]+)\.(.+)\.~([0-9]+)~/$1.$3.$2/' *
If everything looks fine remove option -n.
There might be an easier way to do this, but here is a small shell script using grep and awk to achieve what you want:
for i in $(ls | grep '\.png\.'); do
  name=$(echo "$i" | awk -F'png' '{print $1}')   # "<name>." including the trailing dot
  n=$(echo "$i" | awk -F'~' '{print $2}')        # the number between the tildes
  mv "$i" "$name$n.png"
done
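If you'd rather skip the awk calls, bash parameter expansion can pull out the same pieces; a sketch assuming the <name>.png.~N~ naming shown above:
for i in *.png.~*~; do
  base=${i%.~*~}            # <name>.png
  n=${i##*.~}; n=${n%\~}    # the number between the tildes
  mv -- "$i" "${base%.png}.$n.png"
done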
I have hundreds of folders, each with a subfolder named "thumbs". I need to rename the "thumbs" subfolder to "thumb" under each of these folders.
I tried
find . -type d -exec rename 's/^thumbs$/thumb/' {} ";"
and I ran this from the shell while inside the folder that contains all the subfolders, each of which holds the "thumbs" folder that needs to be renamed to "thumb".
Well, I ran that command and the shell sat there thinking for a long time, so I hit CTRL+C to stop it. I checked and no folder under the current directory was renamed; I don't know whether I renamed folders outside the directory I was in. Can someone tell me where I am going wrong with the code?
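One note on why the attempted command did nothing: rename is handed whole paths such as ./foo1/thumbs, so the anchored pattern ^thumbs$ can never match. Anchoring on the final path component instead does work; a sketch, assuming Perl's rename (-depth makes find visit nested thumbs/thumbs children first):
find . -depth -type d -name thumbs -exec rename 's{/thumbs$}{/thumb}' {} +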
Goal 1: To change a subfolder "thumbs" to "thumb" if only one level deep.
Example Input:
./foo1/thumbs
./foo2/thumbs
./foo2/thumbs
Solution:
find . -maxdepth 2 -type d -name thumbs | sed 'p;s/thumbs/thumb/' | xargs -n2 mv
Output:
./foo1/thumb
./foo2/thumb
./foo2/thumb
Explanation:
Use find to give you all the "thumbs" folders that are only one level deep. Pipe the output to sed: the p command prints the input line as-is, and the substitution then changes "thumbs" to "thumb", so each folder yields a source/destination pair. Finally, pipe to xargs. The -n2 option tells xargs to take two arguments at a time from the pipe and pass them to the mv command.
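To preview the pairs xargs will hand to mv, swap mv for echo mv as a harmless dry run:
find . -maxdepth 2 -type d -name thumbs | sed 'p;s/thumbs/thumb/' | xargs -n2 echo mv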
Issue:
This will not catch deeper subfolders. You can't simply drop the depth limit here, because find prints paths from the top down and we rewrite them with sed before mv ever runs, so mv will fail for deeper subfolders. For example, ./foo/thumbs/thumbs/ will not work: mv handles ./foo/thumbs first and turns it into ./foo/thumb, so by the time the next output line is processed, ./foo/thumbs/thumbs/ no longer exists.
Goal 2: To change all subfolders "thumbs" to "thumb" regardless of how deep.
Example Input:
./foo1/thumbs
./foo2/thumbs
./foo2/thumbs/thumbs
./foo2/thumbs
Solution:
find . -type d -name thumbs | awk -F'/' '{print NF, $0}' | sort -k 1 -n -r | awk '{print $2}' | sed 'p;s/\(.*\)thumbs/\1thumb/' | xargs -n2 mv
Output:
./foo1/thumb
./foo2/thumb
./foo2/thumb/thumb
./foo2/thumb
Explanation:
Use find to give you all the "thumbs" subfolders. Pipe the output to awk to print the number of '/'s in each path followed by the original path. Sort that numerically, in reverse, on the count so the deepest paths come first. Pipe the sorted list to awk again to strip the counts. Pipe the result to sed: the p command prints the input line, and the substitution rewrites only the last occurrence of "thumbs" to "thumb". Because the list runs from deepest to shallowest, this feeds mv the pairs in a safe order. Finally, pipe to xargs; the -n2 option tells xargs to take two arguments at a time from the pipe and pass them to the mv command.
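find's -depth option walks the tree bottom-up (children before parents), which removes the need for the counting and sorting; a more compact sketch of the same idea, with the usual caveat that xargs -n2 splits on whitespace and so assumes paths without spaces:
find . -depth -type d -name thumbs | sed 'p;s/\(.*\)thumbs/\1thumb/' | xargs -n2 mv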
I am trying to run the following to extract the text from all the pdfs
find *.pdf | awk '{system("pdftotext "$0)}'
but dang it, some crazy person put spaces in the file names. How can I deal with this smoothly?
What is awk's role in this? Perhaps you should let find execute things itself.
find . -name \*.pdf -exec /path/to/pdftotext {} \;
Or, if you are really determined to assume that filenames are safe to pass over find's stdout (which, simply by asking this question, you have shown they are not), then put the filenames in quotes. This handles the spaces:
find . -name \*.pdf -print | awk '{cmd=sprintf("pdftotext \"%s\"", $0);system(cmd);}'
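A null-delimited handoff sidesteps the quoting problem entirely, even for names containing quotes or newlines:
find . -name '*.pdf' -print0 | xargs -0 -n1 pdftotext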