Cygwin - grep (if file contains) - awk

Okay so basically I have a list of emails in
EMAILS.TXT
Then I have a bunch of other .txt files, each containing lines of the form email:phonenumber:name:
Compiled1.txt
Compiled2.txt
Compiled3.txt
...
Can I use grep or gawk to search a folder containing Compiled1, Compiled2, etc. to see which lines contain emails from EMAILS.TXT?
So for example:
EMAILS.TXT contains "example@example.com" & "example1@example1.com"
A folder contains
Compiled1.txt & Compiled2.txt, which both have lines with those emails.
Cygwin/GnuWin should output the lines from Compiled1 & Compiled2 IF they contain the emails specified in EMAILS.TXT:
Output > example@example.com:000000:ExampleUser
example1@example1.com:00010101:ExampleUser2
...

You can use
grep -Fi -f EMAILS.TXT Compiled*.txt
-f EMAILS.TXT uses the lines of EMAILS.TXT as search patterns.
-F disables special treatment of symbols like . or ? (without it, the . in each email would match any character).
-i makes the search case-insensitive.
Output will be of the form
File-in-which-a-match-was-found:Matched-line-from-that-file
In case of your example:
Compiled1.txt:example@example.com:000000:ExampleUser
Compiled1.txt:example1@example1.com:00010101:ExampleUser2
Compiled2.txt:example@example.com:000000:ExampleUser
Compiled2.txt:example1@example1.com:00010101:ExampleUser2
If you are also interested in the line numbers, add -n to the command; if you want the output without the file-name prefix, as in your example, add -h.
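If you want exact matches on the email field rather than substring matches anywhere in the line, gawk can do it too. A minimal sketch, assuming the email is always the first colon-separated field; the sub() call strips Windows carriage returns, which matter under Cygwin:
gawk -F: 'NR==FNR { sub(/\r$/, ""); emails[tolower($0)]; next } tolower($1) in emails' EMAILS.TXT Compiled*.txt
The first file is read into an array (NR==FNR is only true while gawk is still reading EMAILS.TXT); after that, a line from any Compiled*.txt file is printed when its first field appears in the array. Note this form does not prefix matches with the file name.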

Related

Search for and print any number between "[" and "]" in a PDF document

I have a PDF document with information. I need only the numbers between [ and ] in the document. For instance, [12223-23-56] should print as 12223-23-56.
On Windows that's a one-line command with xpdf's pdftotext (or with Poppler's utilities, but those ship as more than one .exe), and you will need to modify the simple code you use to strip the brackets from the ends of the strings.
pdftotext -layout "numbers in brackets text.pdf" text.txt & findstr /r [[0-9]] text.txt
If you only want those with - in them, then filter further:
pdftotext -layout "numbers in brackets text.pdf" text.txt & findstr /r [[0-9]] text.txt | find "-"
[1-2-3]
[12223-23-56]
etc., but I have given enough hints to use in your required minimal attempt.
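Since the surrounding question is about Cygwin, here is a sketch of the same idea with Cygwin/Linux tools, assuming the bracketed values are digit groups separated by hyphens; grep -o prints only the matching part and tr then deletes the brackets:
pdftotext -layout "numbers in brackets text.pdf" - | grep -oE '\[[0-9]+(-[0-9]+)*\]' | tr -d '[]'
(pdftotext writes to stdout when the output file is given as -.)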

How to specify a file prefix in gawk

I am trying to identify file extensions from a list of filenames extracted from a floppy disk image. The problem is different from this example where files are already extracted from the disk image. I'm new to gawk so maybe it is not the right tool.
ls Sounddsk2.img -a1 > allfilenames
The command above creates the list of filenames shown below.
flute.pt
flute.ss
flute.vc
guitar.pt
guitar.ss
guitar.vc
The gawk command below identifies files ending in .ss
cat allfilenames | gawk '/[fluteguitar].ss/' > ssfilenames
This would be fine when there are just a few known file names. How do I specify a file prefix in a more generic form?
Unless someone can suggest a better one, this seems to be the most generic way to express it. It will work for any filename prefix spelt with uppercase letters, lowercase letters and numbers:
cat allfilenames | gawk '/[a-zA-Z0-9].ss/' > ssfilenames
Edit
αғsнιη's first suggested answer and jetchisel's comment prompted me to try using gawk without using cat.
gawk '/^([a-zA-Z0-9])\.ss$/' allfilenames > ssfilenames
and this also worked
gawk '/[a-zA-Z0-9]\.ss/' allfilenames > ssfilenames
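One caveat: a bracket expression such as [a-zA-Z0-9] matches exactly one character, so the anchored form above should only match single-character names like f.ss, and the unanchored form merely requires one such character somewhere before .ss. To match a whole prefix of one or more of those characters, add the + quantifier:
gawk '/^[a-zA-Z0-9]+\.ss$/' allfilenames > ssfilenames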
Use the find command to match file names; with your shown samples, you could try the following. You can run this command on the directory itself, so you need not store the file names in a file and then use awk on it.
find . -regextype egrep -regex '.*/(flute|guitar)\.ss$'
Explanation: simply put, this uses find's ability to take a regex type (egrep style here), passing a regex that matches the file names flute OR guitar and makes sure the name ends with .ss.
You might also use grep with -E for extended regexp and use an alternation to match either flute or guitar.
ls Sounddsk2.img -a1 | grep -E "^(flute|guitar)\.ss$" > ssfilenames
The pattern matches:
^ Start of string
(flute|guitar) Match either flute or guitar
\.ss Match .ss
$ End of string
The file ssfilenames contains:
flute.ss
guitar.ss
With the regex you came up with, /[fluteguitar].ss/, you match lines having any one of the characters f, l, u, t, e, g, i, a or r in them (specified within the bracket expression [...]; duplicated characters count only once), followed by any single character (except newline), which is what a single unescaped dot . matches, then a double ss, anywhere in the line.
You need to restrict the matching by using the start-of-line ^ and end-of-line $ anchors, as well as a match group:
awk '/^(flute|guitar)\.ss$/' allFilesName > ssFileNames
to filter only the two file names flute.ss and guitar.ss. The group (...|...) matches any one of the regexps separated by the pipe, acting as a logical OR.
If these are just prefixes and you want to match any file beginning with those characters and ending with .ss, use:
awk '/^(flute|guitar).*\.ss$/' allFilesName > ssFileNames

Save entire file from GREP results

Okay so I'm using grep on an external HDD, for example,
M:/
grep -rhI "bananas" . > out.txt
which outputs any lines within "M:/" containing "bananas".
However, I would like to output the entire contents of the file: if one line in example.txt contains "bananas", output the entire content of example.txt, and the same goes for any other .txt file within "M:/" that contains "bananas".
To print the contents of every file that contains the string bananas:
find . -type f -exec grep 'bananas' -l --null {} + | xargs -0 cat
The above uses GNU tools to handle file names containing newlines.
Forget you ever saw any grep args to recursively find files; adding those args to GNU grep was a terrible idea and just makes your code more complicated. Use find to find files and grep to g/re/p within files.
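If you only want to scan .txt files, as the question suggests, the find is easy to restrict; a sketch along the same lines:
find . -type f -name '*.txt' -exec grep -l --null 'bananas' {} + | xargs -0 cat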

zcat a file, output its contents to another file based on original filename

I'm looking to create a bash/perl script in Linux that will restore .gz files based on filename:
_path_to_file.txt.gz
_path_to_another_file.conf.gz
Where the underscores form the directory structure, so the two above would be:
/path/to/file.txt
/path/to/another/file.conf
These are all in the /backup/ directory.
I want to write a script that will cat each .gz file into its correct location by changing the _ to / to find the correct path, so that the contents of _path_to_another_file.conf.gz replace the text in /path/to/another/file.conf:
zcat _path_to_another_file.conf.gz > /path/to/another/file.conf
I've started by creating a file with the correct destination filenames in it. I could create another file listing the original filenames and have the script go through them line by line?
ls /backup/ |grep .gz > /backup/backup_files && sed -i 's,_,\/,g' /backup/backup_files && cat /backup/backup_files
Whatcha think?
Here's a Bash script that should do what you want:
#!/bin/bash
for f in *.gz; do
    n=$(echo "$f" | tr _ /)   # turn every _ into a /
    zcat "$f" > "${n%.*}"     # ${n%.*} strips the trailing .gz
done
It loops over all files that end with .gz, and extracts them into the path represented by their filename with _ replaced with /.
That's not necessarily an invertible mapping (what if the original file is named high_scores for instance? is that encoded specially, e.g., with double underscore as high__scores.gz?) but if you just want to take a name and translate _ to / and remove .gz at the end, sed will do it:
for name in /backup/*.gz; do
    newname=$(echo "$name" |
        sed -e 's,^/backup/,,' \
            -e 's,_,/,g' \
            -e 's/\.gz$//')
    echo "zcat $name > $newname"
done
Make sure it works right (the above is completely untested!) then take out the echo, leaving:
zcat "$name" > "$newname"
(the quotes protect against white space in the names).
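The same renaming can also be done with bash parameter expansion alone, no external tr or sed; a sketch under the same untested caveat:
for name in /backup/*.gz; do
    base=${name##*/}          # strip the leading /backup/ directory part
    newname=${base//_//}      # replace every _ with /
    newname=${newname%.gz}    # remove the trailing .gz
    echo "zcat $name > $newname"
done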

Show filename and line number in grep output

I am trying to search my rails directory using grep. I am looking for a specific word and I want to grep to print out the file name and line number.
Is there a grep flag that will do this for me? I have been trying to use a combination of -n and -l but these are either printing out the file names with no numbers or just dumping out a lot of text to the terminal which can't be easily read.
ex:
grep -ln "search" *
Do I need to pipe it to awk?
I think -l is too restrictive, as it suppresses the output of -n. I would suggest -H (--with-filename): print the file name for each match.
grep -Hn "search" *
If that gives too much output, try -o to only print the part that matches.
grep -nHo "search" *
grep -rin searchstring * | cut -d: -f1-2
This would say, search recursively (for the string searchstring in this example), ignoring case, and display line numbers. The output from that grep will look something like:
/path/to/result/file.name:100: Line in file where 'searchstring' is found.
Next we pipe that result to the cut command using colon : as our field delimiter and displaying fields 1 through 2.
When I don't need the line numbers I often use -f1 (just the filename and path), and then pipe the output to uniq, so that I only see each filename once:
grep -ir searchstring * | cut -d: -f1 | uniq
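For that last case, grep's -l flag gets you the same once-per-file list directly, without the cut and uniq pipeline:
grep -ril searchstring *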
I like using:
grep -niro 'searchstring' <path>
But that's just because I always forget the other ways and I can't forget Robert de grep - niro for some reason :)
The comment from @ToreAurstad can be spelled grep -Horn 'search' ./, which is easier to remember.
grep -HEroine 'search' ./ could also work ;)
For the curious:
$ grep --help | grep -Ee '-[HEroine],'
-E, --extended-regexp PATTERNS are extended regular expressions
-e, --regexp=PATTERNS use PATTERNS for matching
-i, --ignore-case ignore case distinctions
-n, --line-number print line number with output lines
-H, --with-filename print file name with output lines
-o, --only-matching show only nonempty parts of lines that match
-r, --recursive like --directories=recurse
Here's how I used the upvoted answer to search a tree for the Fortran files containing a string:
find . -name "*.f" -exec grep -nHo the_string {} \;
Without the -nHo, you learn only that some file, somewhere, matches the string: run this way, grep is invoked on one file at a time and prints matching lines without saying which file they came from.
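A small efficiency note: terminating -exec with + instead of \; passes many files to each grep invocation instead of spawning one grep per file, and -H still guarantees the file name is printed even when a batch happens to contain a single file:
find . -name "*.f" -exec grep -nHo the_string {} +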