Shell Script Search and Delete Non Text Files - scripting

I want to write a shell script that finds and deletes all non-text files in a directory.
I basically cd into the directory that I want to iterate through in the script and search through all files.
-- Here is the part I can't do --
I want to check using an if statement if the file is a text file.
If not I want to delete it
else continue
Thanks
PS: By the way, this is on Linux.
EDIT
I assume a file is a "text file" if and only if its name matches the shell pattern *.txt.

The file program prints a description containing the word "text" for any file it determines to be text. You can test for that in its output using grep. For example:
find . -type f -exec file '{}' \; | grep -v '.*:[[:space:]].*text.*' | cut -d ':' -f 1
I strongly recommend printing out files to delete before deleting them, to the point of redirecting output to a file and then doing:
rm $(<filename)
after reviewing the contents of "filename". Beware of filenames containing spaces: the command substitution splits on whitespace, so such names need more careful handling.
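For names with spaces, one option is to let find hand each path to a small inline script instead of parsing text output. The following is only a sketch: list_non_text is a made-up helper name, and it assumes a file(1) that supports the -b and --mime-type options. It only prints candidates and deletes nothing. Note that file reports some things you might consider text (empty files, for instance) with a non-text MIME type, so review the list.

```shell
# list_non_text DIR (hypothetical helper): print every regular file under DIR
# whose MIME type, as reported by file(1), does not start with "text/".
# Nothing is deleted here; the output is meant to be reviewed first.
list_non_text() {
    find "$1" -type f -exec sh -c '
        for f in "$@"; do
            case $(file -b --mime-type -- "$f") in
                text/*) ;;                  # classified as text: keep quiet
                *) printf "%s\n" "$f" ;;    # anything else: report the path
            esac
        done
    ' sh {} +
}
```

Usage would be something like list_non_text . > candidates.txt, followed by a look at candidates.txt before removing anything.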

Unless an if statement is mandatory, use the opposite of:
find <dir-path> -type f -name "*.txt" -exec rm {} \;
What the opposite is exactly is an exercise for you. Hint: it comes before -name.
Your question was ambiguous about how you define a "text file"; I assume here that it's just a file with the extension ".txt".

find . -type f ! -name "*.txt" -exec rm {} +
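If the if statement from the question is wanted after all, a loop along these lines might do. It is a sketch, not a drop-in script: delete_non_txt and its --force argument are invented for illustration, it looks only at the top level of the directory (not recursively), and the * glob skips hidden files. By default it only prints what it would remove.

```shell
# delete_non_txt DIR [--force] (hypothetical helper)
# Without --force it only prints what it would remove.
# The body runs in a subshell so its variables do not leak to the caller.
delete_non_txt() (
    dir=$1
    for f in "$dir"/*; do
        [ -f "$f" ] || continue             # skip directories and an unmatched glob
        if [ "${f%.txt}" = "$f" ]; then     # stripping ".txt" changed nothing: not a .txt file
            if [ "${2:-}" = "--force" ]; then
                rm -- "$f"
            else
                printf 'would remove: %s\n' "$f"
            fi
        else
            continue                        # a .txt file: leave it alone
        fi
    done
)
```

Run delete_non_txt /some/dir first to review the list, then add --force once it looks right.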

Related

Using grep to only obtain first match in EACH file

I have a bunch of output files labelled file1.out, file2.out, file3.out, ...,fileN.out.
All of these files have multiple instances of a string in them called "keystring". However, only the first instance of "keystring" is meaningful to me. The other lines are not required.
When I do
grep 'keystring' *.out
I reach all files, and they output every instance of keystring.
When I do grep -m1 'keystring' *.out I only get the instance when file1.out has keystring.
I want to extract the line where keystring appears FIRST in all these output files. How can I pull this off?
You can use awk:
awk '/keystring/ {print FILENAME ":", $0; nextfile}' *.out
nextfile moves on to the next file as soon as the first match from the current file has been printed.
Use find -exec like so:
find . -name '*.out' -exec grep -m1 'keystring' {} \;
SEE ALSO:
GNU find manual
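A plain loop gets the same result while keeping the file name in the output, which the find -exec form above does not print. This is a sketch assuming a grep that supports -m and -H (GNU and BSD grep both do); first_match_per_file is just an illustrative name.

```shell
# first_match_per_file PATTERN FILE... (hypothetical helper)
# Runs grep once per file so that -m 1 stops after the first hit in each file;
# -H prefixes every hit with the file name.
first_match_per_file() {
    pattern=$1
    shift
    for f in "$@"; do
        grep -H -m 1 -e "$pattern" -- "$f"
    done
    return 0    # a file without any match is not treated as an error
}
```

For the files in the question this would be called as first_match_per_file keystring *.out.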

cygwin awk output files containing string

Okay so I wish to search my C drive for multiple .txt files containing the name "Public"
example of file names
public.txt
publicagain.txt
etc etc
and then output them into a folder
cd to C: and use the find command below (-iname matches case-insensitively, so it finds both public.txt and Public.txt):
find . -type f -iname '*public*.txt'
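To actually put the matches into a folder, as the question asks, find can copy each one itself. A rough sketch: collect_public is a made-up name, and the destination should live outside the tree being searched so that find does not pick up the copies. Files with the same name in different directories will overwrite each other in the destination.

```shell
# collect_public SRC DEST (hypothetical helper)
# Copies every *public*.txt file (any letter case) found under SRC into DEST.
# DEST is assumed to be outside SRC.
collect_public() {
    src=$1
    dest=$2
    mkdir -p "$dest"
    find "$src" -type f -iname '*public*.txt' -exec cp -- {} "$dest"/ \;
}
```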

Search file contents recursively when know where in file

I am interested in efficiently searching files for content using bash and related tools (e.g. sed, grep), in the specific case where I have additional information about where in the file the intended content is. For example, I want to replace a particular string on line 3 of each file that contains a specific string on line 3. Therefore, I don't want to run a recursive grep -r on the whole directory, as that would search the entirety of each file, wasting time since I know that the string of interest is on line 3, if it is there. This full-grep approach could be done with grep -rl 'string_to_find_in_files' base_directory_to_search_recursively.
Instead I am thinking about using sed -i ".bak" '3s/string_to_replace/string_to_replace_with/' files to search only line 3 of all files recursively in a directory; however, sed seems to only be able to take one file as an input argument. How can I apply sed to multiple files recursively? find -exec {} \; and find -print0 | xargs -0 seem to be very slow. Is there a faster method than using find?
I can achieve the desired effect very quickly with awk, but only on a single directory; it does not seem to be recursive, e.g. awk 'FNR==3{print $0}' directory/*. Is there any way to make this recursive? Thanks.
You can use find to get the list of files and feed them to sed or awk one at a time with xargs.
For example, this prints the first line of every file listed by find:
find . -name "*.csv" | xargs -L 1 sed -n '1p'
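For the line-3 replacement from the question, find can also pass many files to a single sed process with -exec ... {} +, which avoids starting one process per file. This is a sketch that assumes GNU sed, where -i makes line addresses restart in each file. replace_on_line3 is an invented name, and OLD and NEW are pasted into the sed script as-is, so they must not contain "/" or other characters special to sed.

```shell
# replace_on_line3 DIR OLD NEW (hypothetical helper)
# Replaces the first OLD on line 3 of every regular file under DIR with NEW,
# leaving a .bak copy of each file next to it. Existing *.bak files are skipped.
replace_on_line3() {
    dir=$1 old=$2 new=$3
    find "$dir" -type f ! -name '*.bak' -exec sed -i.bak "3s/$old/$new/" {} +
}
```

For example, replace_on_line3 . foo bar would touch only line 3 of each file and leave matches on other lines alone.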

SSH recursively change all subfolders names to a specific name

I have hundreds of folders, each with a subfolder named "thumbs". I need to rename each "thumbs" subfolder to "thumb".
I tried
find . -type d -exec rename 's/^thumbs$/thumb/' {} ";"
I ran this in the shell from inside the folder that contains all the subfolders, each of which contains a "thumbs" folder that needs to be renamed to "thumb".
The command ran for a long time, so I pressed CTRL+C to stop it. When I checked, no folder had been renamed under the current directory, and I don't know whether I renamed folders outside the directory I was in. Can someone tell me what is wrong with the command?
Goal 1: To change a subfolder "thumbs" to "thumb" if only one level deep.
Example Input:
./foo1/thumbs
./foo2/thumbs
./foo2/thumbs
Solution:
find . -maxdepth 2 -type d -name thumbs | sed 'p;s/thumbs$/thumb/' | xargs -n2 mv
Output:
./foo1/thumb
./foo2/thumb
./foo2/thumb
Explanation:
Use find to give you all "thumbs" folders only one level deep. Pipe the output to sed. The p command prints the input line unchanged, and the rest of the sed script changes "thumbs" to "thumb", so each path is followed by its new name. Finally, pipe to xargs. The -n2 option tells xargs to take two arguments at a time from the pipe and pass them to the mv command.
Issue:
This will not catch deeper subfolders. You can't simply drop -maxdepth here, because find prints its output from the top down and, since we rewrite the paths with sed before running mv, mv will fail for deeper subfolders. For example, ./foo/thumbs/thumbs/ will not work: mv handles ./foo/thumbs first and renames it to ./foo/thumb, so the next line results in an error because ./foo/thumbs/thumbs/ no longer exists.
Goal 2: To change all subfolders "thumbs" to "thumb" regardless of how deep.
Example Input:
./foo1/thumbs
./foo2/thumbs
./foo2/thumbs/thumbs
./foo2/thumbs
Solution:
find . -type d -name thumbs | awk -F'/' '{print NF, $0}' | sort -k 1 -n -r | awk '{print $2}' | sed 'p;s/\(.*\)thumbs/\1thumb/' | xargs -n2 mv
Output:
./foo1/thumb
./foo2/thumb
./foo2/thumb/thumb
./foo2/thumb
Explanation:
Use find to give you all "thumbs" subfolders. Pipe the output to awk to print the number of '/'-separated fields in each path followed by the path itself. Sort the output numerically in reverse by that count, which puts the deepest paths on top. Pipe the sorted list to awk to remove the counts from each line. Pipe the output to sed. The p command prints the input line unchanged, and the rest of the sed script finds the last occurrence of "thumbs" and changes only it to "thumb". Since we are working through the list from the deepest to the shallowest level, this gives mv the right commands in the right order. Finally, pipe to xargs. The -n2 option tells xargs to take two arguments at a time from the pipe and pass them to the mv command.
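The same deepest-first ordering can also be had from find itself with -depth, which processes a directory's contents before the directory. This is an alternative sketch rather than the pipeline above: rename_thumbs is a made-up name, and it assumes no sibling named "thumb" already exists (mv would then move "thumbs" inside it). Because each path is passed as a single argument, names with spaces are not a problem.

```shell
# rename_thumbs DIR (hypothetical helper)
# Renames every directory called "thumbs" under DIR to "thumb".
# -depth visits children before their parents, so nested thumbs/thumbs
# directories are renamed from the inside out.
rename_thumbs() {
    find "$1" -depth -type d -name thumbs -exec sh -c '
        mv -- "$1" "${1%/*}/thumb"    # same parent directory, new last component
    ' sh {} \;
}
```

Called as rename_thumbs . from the folder that contains all the subfolders.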

awk system command with spaces in parameters

I am trying to run the following to extract the text from all the pdfs
find *.pdf | awk '{system("pdftotext "$0)}'
but dang it some crazy person put spaces in file names, how can I deal with this smoothly?
What is awk's role in this? Perhaps you should let find execute things itself.
find . -name \*.pdf -exec /path/to/pdftotext {} \;
Or, if you're really stuck with parsing find's output (which is not safe for arbitrary filenames, as this question itself shows), put the filenames in quotes. This handles spaces, though not names that contain double quotes or shell metacharacters such as $:
find . -name \*.pdf -print | awk '{cmd=sprintf("pdftotext \"%s\"", $0);system(cmd);}'
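If the file names really have to travel through a pipe, null-delimited output avoids the quoting problem altogether. A sketch, assuming a find with -print0 and an xargs with -0 (GNU and BSD both have them); convert_each is an invented name, and the command to run is passed in as arguments so the function can be tried out without pdftotext installed.

```shell
# convert_each DIR COMMAND [ARGS...] (hypothetical helper)
# Runs COMMAND once per *.pdf file under DIR. Paths are separated by NUL bytes,
# so spaces, quotes and even newlines in names reach COMMAND intact.
convert_each() {
    dir=$1
    shift
    find "$dir" -type f -name '*.pdf' -print0 | xargs -0 -n 1 "$@"
}
```

For the question's case that would be convert_each . pdftotext.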