Removing group of random characters - batch-rename

Ok so I have these files
File1 (37362717)
File2 (29190171)
File3 (35515714)
How would a script look to remove the (*) portion of the name?

If you can use PowerShell, run this command in the directory containing the files you want to rename:
Get-ChildItem File* | Rename-Item -NewName { $_.name -replace " \(\d+\)", "" }
This command renames every file in the current directory whose name starts with "File", removing the space and the parenthesized digits that follow it (the pattern " \(\d+\)").
Check example 4 in this link to understand how this command works.
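For readers without PowerShell, the same rename can be sketched in a POSIX shell. This is only an illustration, not part of the answer above. It assumes every name ends in a space followed by one parenthesized group, and it strips that group whatever it contains (not only digits). It runs in a scratch directory made with mktemp, and demo_dir and renamed are names invented for the demo.

```shell
# Work in a scratch directory so no real files are touched.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch 'File1 (37362717)' 'File2 (29190171)' 'File3 (35515714)'

for f in File*' ('*')'; do
    [ -e "$f" ] || continue            # the glob matched nothing
    new=${f%' ('*')'}                  # drop the trailing " (...)" group
    if [ -e "$new" ]; then             # never overwrite an existing file
        echo "skipping $f: $new already exists" >&2
        continue
    fi
    mv -- "$f" "$new"
done

renamed=$(ls)
printf '%s\n' "$renamed"
```

To use it on real files, drop the mktemp, cd and touch lines and run the loop in the directory that holds the files.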

Related

Using grep to only obtain first match in EACH file

I have a bunch of output files labelled file1.out, file2.out, file3.out, ...,fileN.out.
All of these files have multiple instances of a string in them called "keystring". However, only the first instance of "keystring" is meaningful to me. The other lines are not required.
When I do
grep 'keystring' *.out
I reach all files, and they output every instance of keystring.
When I do grep -m1 'keystring' *.out I only get the instance when file1.out has keystring.
I want to extract the line where keystring appears FIRST in all these output files. How can I pull this off?
You can use awk:
awk '/keystring/ {print FILENAME ":", $0; nextfile}' *.out
nextfile moves on to the next file as soon as the first match from the current file has been printed.
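If the awk at hand lacks nextfile, a per-file flag gives the same output. The sketch below is an assumption-laden stand-in rather than the answer's method: it resets a flag (the made-up variable seen) on the first line of each file and prints only while the flag is unset. The sample files are invented for the demo.

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
printf 'alpha\nkeystring one\nkeystring two\n'     > file1.out
printf 'keystring three\nbeta\nkeystring four\n'   > file2.out

first_matches=$(awk '
    FNR == 1 { seen = 0 }              # first line of a new file: reset the flag
    !seen && /keystring/ {             # only the first match per file gets here
        print FILENAME ": " $0
        seen = 1
    }
' *.out)
printf '%s\n' "$first_matches"
```

Unlike nextfile, this still reads each file to the end, so it is slower on large files.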
Use find -exec like so:
find . -name '*.out' -exec grep -m1 'keystring' {} \;
SEE ALSO:
GNU find manual
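With -exec ... \; grep is handed one file at a time, so it prints the matching line without a file name. A possible variant, assuming a grep that supports -H and -m (GNU and BSD grep do; neither option is in POSIX), adds -H to label each line. The directory layout and the variable labelled are invented for the demo.

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
mkdir sub
printf 'keystring one\nkeystring two\n'        > file1.out
printf 'x\nkeystring three\nkeystring four\n'  > sub/file2.out

# -H forces the file name prefix even though grep sees one file per call.
# sort is only there to make the demo output order predictable.
labelled=$(find . -name '*.out' -exec grep -H -m1 'keystring' {} \; | LC_ALL=C sort)
printf '%s\n' "$labelled"
```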

awk output with spaces in first column

I tried using awk splitting the columns to print a sentence but the first column has spaces.
Sample of my beginner code:
$ awk '/Linux/ { print "The filename","\""$1"\"","is located in",$2 }' test.txt
The filename "The" is located in test
The filename "Some" is located in file
The filename "File" is located in name
The filename "Something_here" is located in /ABC
The filename "Another_test" is located in /DEFG
The filename "Label" is located in test
From file: test.txt
Filename                               Folder         Type
-------------------------------------- -------------- ------
The test file                          /test/folder   Linux
Some file                              /              Linux
File name                              /Temp          Linux
Something_here                         /ABC           Linux
Another_test                           /DEFG          Linux
Label test                             /HIJK          Linux
what I want to achieve: (with quotes inclusive)
The filename "Default file" is located in /
The filename "The test file" is located in /test/folder
The issue is that when I use a space or '/' as the delimiter, I cannot get the whole line when printing.
If you have GNU AWK, this should do the trick:
awk 'match($0, /([^\/]+)([^ ]+) *Linux/, arr) { sub(/ +$/, "", arr[1]); printf("The filename \"%s\" is located in %s\n", arr[1], arr[2]) }' test.txt
Explanation:
# match and store groups in 'arr'
# - arr[1]: everything up until the first slash (including a lot of whitespace)
# - arr[2]: first slash until space
# - rest: also ensure there's 'Linux' after that
match($0, /([^\/]+)([^ ]+) *Linux/, arr) {
# trim whitespace from the right hand side of the filename
sub(/ +$/, "", arr[1]);
# print
printf("The filename \"%s\" is located in %s\n", arr[1], arr[2])
}
Note that other flavors of AWK have a less powerful version of match; the same thing can be achieved with them, but you'd have to write a bit more code.
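As a rough idea of what "a bit more code" could look like, here is a sketch that uses only the two-argument match together with RSTART and RLENGTH. It assumes, like the answer above, that the folder is the first thing on the line that starts with a slash and that it contains no spaces. The shortened test.txt and the variable result are invented for the demo.

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
printf '%s\n' \
    'Filename        Folder        Type' \
    '--------------  ------------  -----' \
    'The test file   /test/folder  Linux' \
    'Some file       /             Linux' \
    'Label test      /HIJK         Linux' > test.txt

result=$(awk '
/ Linux *$/ && match($0, /\/[^ ]*/) {
    name = substr($0, 1, RSTART - 1)     # everything before the first slash
    dir  = substr($0, RSTART, RLENGTH)   # the slash up to the next space
    sub(/ +$/, "", name)                 # trim the column padding
    printf "The filename \"%s\" is located in %s\n", name, dir
}' test.txt)
printf '%s\n' "$result"
```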
GNU awk has regex field separators, so just require two or more spaces between your columns:
awk '/Linux/ { print "The file \""$1"\" is in "$2"." }' FS="  +" test.txt
It also offers fixed-width fields (see info gawk fieldwidths); you could use the lengths of the dash lines to set those on the fly.
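The idea of reading the column widths from the dash line can be sketched without gawk's FIELDWIDTHS. The version below swaps in plain substr calls so that it runs in any awk. It assumes columns are separated by exactly one space beyond their padded width, as in the dash line. The helper dashes and the variables w1, w2 and result are invented for the demo.

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
dashes() { printf "%${1}s" '' | tr ' ' '-'; }    # print $1 dashes in a row
{
    printf '%-38s %-14s %s\n' 'Filename' 'Folder' 'Type'
    printf '%s %s %s\n' "$(dashes 38)" "$(dashes 14)" "$(dashes 6)"
    printf '%-38s %-14s %s\n' 'The test file' '/test/folder' 'Linux'
    printf '%-38s %-14s %s\n' 'Some file' '/' 'Linux'
    printf '%-38s %-14s %s\n' 'Label test' '/HIJK' 'Linux'
} > test.txt

result=$(awk '
NR == 2 {                                # dash line: one run of dashes per column
    split($0, dash, " ")
    w1 = length(dash[1]); w2 = length(dash[2])
    next
}
/Linux/ {
    name = substr($0, 1, w1)             # column 1
    dir  = substr($0, w1 + 2, w2)        # column 2 starts after one separator space
    sub(/ +$/, "", name); sub(/ +$/, "", dir)
    printf "The filename \"%s\" is located in %s\n", name, dir
}' test.txt)
printf '%s\n' "$result"
```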
I would propose sed with the substitution based on regular expression and back references plus a grep command to eliminate the header lines of the source file:
$ cat test.txt | grep -E 'Linux[ ]*$' | sed -E 's%(.+)([^ ])([ ]+)(/.+)[ ]+Linux[ ]*$%The filename "\1\2" is located in \4%'
The filename "The test file" is located in /test/folder
The filename "Some file" is located in /
The filename "File name" is located in /Temp
The filename "Something_here" is located in /ABC
The filename "Another_test" is located in /DEFG
The filename "Label test" is located in /HIJK
A good reference for regular expressions (regex) is the Linux manual page regex(7).
The detailed description as requested in the comment:
grep with the -E option accepts extended regular expressions (see the reference above). Here it keeps only the lines that end with the word "Linux", optionally followed by spaces.
The output of grep is piped into sed.
Like grep, sed is passed the -E option to accept extended regex. The s command replaces the text matching the regex (the first part between % characters: "(.+)([^ ])([ ]+)(/.+)[ ]+Linux[ ]*$") with the replacement (the second part between % characters: "The filename "\1\2" is located in \4").
The replacement uses back references: '\' followed by a nonzero decimal digit n is replaced by the text matched by the nth parenthesized sub-expression of the regex. Here, \1 is the text matched by the first "(.+)", which is the filename without its last character; \2 is the text matched by the following "([^ ])", which is the last character of the filename (a trick to keep the blanks that follow out of the name); and \4 is the path matched by "(/.+)".
This is not a rigorous explanation, but it gives some pointers for going further.
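To see how the groups split a line, the same regex can be run on a single sample line with a replacement that just prints the groups. This is a small illustration only; the sample line and the variable names line and groups are invented for the demo.

```shell
line='Label test      /HIJK Linux'
groups=$(printf '%s\n' "$line" |
    sed -E 's%(.+)([^ ])([ ]+)(/.+)[ ]+Linux[ ]*$%1=[\1] 2=[\2] 4=[\4]%')
printf '%s\n' "$groups"
# prints: 1=[Label tes] 2=[t] 4=[/HIJK]
```

Group 3 holds the run of blanks between the name and the path, which is why it is left out of the replacement.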
Another solution is to pass multiple commands to sed on the command line. You can add a command that deletes the two header lines, which removes the need for the pipe through cat and grep. Here "1,2d" means "delete lines 1 and 2":
$ sed -E '1,2d;s%(.+)([^ ])([ ]+)(/.+)[ ]+Linux[ ]*$%The filename "\1\2" is located in \4%' test.txt
The filename "The test file" is located in /test/folder
The filename "Some file" is located in /
The filename "File name" is located in /Temp
The filename "Something_here" is located in /ABC
The filename "Another_test" is located in /DEFG
The filename "Label test" is located in /HIJK
NOTES: According to the manual, the -E option switches to using extended regular expressions. It has been supported for years by GNU sed, and is now included in POSIX.
On older systems, -r may be used if -E is not supported:
$ sed -r '1,2d;s%(.+)([^ ])([ ]+)(/.+)[ ]+Linux[ ]*$%The filename "\1\2" is located in \4%' test.txt
The filename "The test file" is located in /test/folder
The filename "Some file" is located in /
The filename "File name" is located in /Temp
The filename "Something_here" is located in /ABC
The filename "Another_test" is located in /DEFG
The filename "Label test" is located in /HIJK

Print File Paths and Filenames within a Directory into a CSV

I have a directory that contains several files. For instance:
File1.bam
File2.bam
File3.bam
I want to create a .csv file that contains 2 columns and includes a header:
Name,Path
File1, /Path/To/File1.bam
File2, /Path/To/File2.bam
File3, /Path/To/File3.bam
I've managed to piece together a way of doing this in separate steps, but it involves creating a csv with the path, appending with the filename, and then appending again with the header. I would like to add both the filename and path in 1 step, so that there is no possibility of linking an incorrect filename and path.
In case it matters, I'm trying to do this within a script that is running in a batch job (SLURM), and the output CSV will be used in subsequent workflow steps.
find ~/Desktop/test -iname '*.csv' -type f >bamlist1.csv
awk '{print FILENAME (NF?",":"") $0}' *.csv > test.csv
{ echo 'Name, Path'; cat bamlist.csv; } > bamdata.csv
Untested but should be close:
find ~/Desktop/test -name '*.bam' -type f |
awk '
BEGIN { OFS=","; print "Name", "Path" }
{ fname=$0; sub(".*/","",fname); print fname, $0 }
' > bamdata.csv
Like your original script the above assumes that none of your file/directory names contain newlines or commas.
If you have GNU find you can just do:
{ echo "Name,Path"; find ~/Desktop/test -name '*.bam' -type f -printf '%f,%p\n'; } > bamdata.csv
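Both commands above put the full file name (File1.bam) in the Name column, while the question shows it without the extension. One possible way to get that is a plain shell loop, sketched here under the same assumption that names contain no commas or newlines. Unlike find, the glob does not descend into subdirectories. The scratch directory and the variables demo_dir, path, name and csv are invented for the demo.

```shell
demo_dir=$(mktemp -d)
touch "$demo_dir/File1.bam" "$demo_dir/File2.bam" "$demo_dir/notes.txt"

{
    echo 'Name,Path'
    for path in "$demo_dir"/*.bam; do
        [ -f "$path" ] || continue         # skip if the glob matched nothing
        name=${path##*/}                   # strip the directory part
        printf '%s,%s\n' "${name%.bam}" "$path"
    done
} > "$demo_dir/bamdata.csv"

csv=$(cat "$demo_dir/bamdata.csv")
printf '%s\n' "$csv"
```

Because the name and the path come from the same loop variable, a name cannot end up paired with the wrong path.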

Save entire file from GREP results

Okay so I'm using grep on a external HDD
example,
M:/
grep -rhI "bananas" . > out.txt
which would output any lines within " M:/ " containing " bananas "
However I would like to output the entire contents of the file, so if one line in example.txt contains " bananas " output entire content of example.txt and same goes for any other .txt file within directory " M:/ " that contains " bananas ".
To print the contents of every file that contains the string bananas:
find . -type f -exec grep 'bananas' -l --null {} + | xargs -0 cat
The above uses GNU tools to handle file names containing newlines.
Forget you ever saw any grep args to recursively find files; adding those args to GNU grep was a terrible idea and just makes your code more complicated. Use find to find files and grep to g/re/p within files.
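Where GNU grep's --null or xargs -0 are not available, a similar result can be sketched with find running a small sh loop. This is an alternative under stated assumptions, not the answer's command: it restricts the search to *.txt as the question mentions, and it writes the result outside the searched tree so the output file is never picked up as input. demo_dir and out_file are names invented for the demo.

```shell
demo_dir=$(mktemp -d)
out_file=$(mktemp)                        # kept outside the searched tree
printf 'apples\nbananas\ncherries\n' > "$demo_dir/example.txt"
printf 'no fruit here\n'             > "$demo_dir/other.txt"

# find passes the file names as arguments, so odd characters in names are safe.
find "$demo_dir" -type f -name '*.txt' -exec sh -c '
    for f in "$@"; do
        if grep -q "bananas" "$f"; then   # the file has at least one match
            cat "$f"                      # print the whole file
        fi
    done
' sh {} + > "$out_file"

cat "$out_file"
```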

zcat a file, output its contents to another file based on original filename

I'm looking to create a bash/perl script in Linux that will restore .gz files based on filename:
_path_to_file.txt.gz
_path_to_another_file.conf.gz
Where the underscores form the directory structure.. so the two above would be:
/path/to/file.txt
/path/to/another/file.conf
These are all in the /backup/ directory..
I want to write a script that will cat each .gz file into its correct location by changing the _ to / to find the correct path - so that the contents of _path_to_another_file.conf.gz replaces the text in /path/to/another/file.conf
zcat _path_to_another_file.conf.gz > /path/to/another/file.conf
I've started by creating a file with the correct destination filenames in it.. I could create another file to list the original filenames in it and have the script go through line by line?
ls /backup/ |grep .gz > /backup/backup_files && sed -i 's,_,\/,g' /backup/backup_files && cat /backup/backup_files
Whatcha think?
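Here's a Bash script that should do what you want:
#!/bin/bash
for f in *.gz; do
    # translate every _ in the name to / to get the target path
    n=$(printf '%s\n' "$f" | tr _ /)
    # strip the trailing .gz and write the decompressed data there
    zcat "$f" > "${n%.gz}"
done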
It loops over all files that end with .gz, and extracts them into the path represented by their filename with _ replaced with /.
That's not necessarily an invertible mapping (what if the original file is named high_scores for instance? is that encoded specially, e.g., with double underscore as high__scores.gz?) but if you just want to take a name and translate _ to / and remove .gz at the end, sed will do it:
for name in /backup/*.gz; do
newname=$(echo "$name" |
sed -e 's,^/backup/,,' \
-e 's,_,/,g' \
-e 's/\.gz$//')
echo "zcat $name > $newname"
done
Make sure it works right (the above is completely untested!) then take out the echo, leaving:
zcat "$name" > "$newname"
(the quotes protect against white space in the names).
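Putting the pieces together, a restore loop might look like the sketch below. It is untested against real backups and rests on several assumptions: every _ in a name stands for a /, the target directories may need to be created first, and gzip -dc is used as an equivalent of zcat. The restore_root prefix is a hypothetical safety net so that the demo writes into a scratch directory instead of real paths.

```shell
backup_dir=$(mktemp -d)      # stands in for /backup
restore_root=$(mktemp -d)    # hypothetical prefix; an empty value would target real paths
printf 'hello\n' | gzip -c > "$backup_dir/_path_to_file.txt.gz"

for archive in "$backup_dir"/*.gz; do
    [ -f "$archive" ] || continue                 # the glob matched nothing
    name=${archive##*/}                           # _path_to_file.txt.gz
    name=${name%.gz}                              # _path_to_file.txt
    target=$restore_root$(printf '%s\n' "$name" | tr _ /)   # <root>/path/to/file.txt
    mkdir -p -- "$(dirname -- "$target")"         # create missing directories
    gzip -dc -- "$archive" > "$target"
done

restored=$(cat "$restore_root/path/to/file.txt")
printf '%s\n' "$restored"
```

Running it against the scratch prefix first makes it easy to inspect the resulting tree before pointing it at the real locations.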