Using grep or sed in a foreach loop won't work - variables

I've spent countless hours trying to get this to work and I think it's time to get some help. I have a 2-column file - let's call it "result.txt" - with a list of values like this:
fileA.ext -10.3
fileB.ext -9.8
fileC_1.ext -9.7
fileC_2.ext -9.5
fileD.ext -9.4
fileC_3.ext -9.3
I want to recreate this list using only unique results for each file type, so it should look like this:
fileA.ext -10.3
fileB.ext -9.8
fileC_1.ext -9.7
fileD.ext -9.4
I created a list of file prefixes, from which I should be able to do this by using grep or sed to extract the first line containing each matching prefix:
fileA
fileB
fileC
fileD
We'll call this result2.txt.
I have attempted to write the following C shell script:
foreach l (`cat result2.txt`)
set name = "$l"
echo "$name"
grep -m1 "$name" result.txt >> result3.txt
end
The output file, "result3.txt", is empty. The script runs perfectly up to the grep command. When I run the grep command outside of the loop, using a line from result2.txt, it works fine. I get the same result using this: sed -n '/"\$name\"/p'
And I think I tried an awk command at some point.
The problem seems to be in getting those programs to recognise the $name or $l variables. I have tried different combinations of " and ' around $name and I have tried adding backslashes: e.g. $\name. Can anyone please tell me what the issue is?
Thanks

Sounds like a job for awk. Use underscore or whitespace as the field separator, and print a line only if the first field has not been seen yet:
awk -F '[_[:space:]]+' '!seen[$1]++' << END
fileA.ext -10.3
fileB.ext -9.8
fileC_1.ext -9.7
fileC_2.ext -9.5
fileD.ext -9.4
fileC_3.ext -9.3
END
fileA.ext -10.3
fileB.ext -9.8
fileC_1.ext -9.7
fileD.ext -9.4
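Applied directly to your files, that is the whole job (a sketch, assuming result.txt contains exactly what you showed):
awk -F '[_[:space:]]+' '!seen[$1]++' result.txt > result3.txt
No loop and no result2.txt are needed: awk keeps one entry per prefix in the seen array and prints only the first line for each.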

I've just tried this in csh, and both your version and the following simplified version work. Note: no quotation marks at all.
foreach name (`cat result2.txt`)
grep -m1 $name result.txt >>result3.txt
end
Could you please check whether result.txt really contains what you mentioned at the beginning?
cat result.txt
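If the contents look right, also check for invisible characters such as DOS carriage returns, which make grep patterns silently fail to match; a quick diagnostic (od -c is standard) is:
od -c result2.txt | head
A \r appearing before each \n would explain an empty result3.txt.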

sed -n 's/.*/²&³/;H
$ {x;s/\(.\).*/&\1/
t again
: again
s/²\([^_]\{1,\}_\)\(.*\)\²\1[^³]*³./²\1\2/
t again
s/.\(.*\)./\1/;s/[²³]//g
p
}' YourFile
Two temporary delimiters, ² and ³, are used because of sed's limitations when manipulating \n in the pattern space.

Related

How to use sed/awk to replace the original file and get the following desired output?

I'm writing a bash script that translates one file into another, and am encountering an issue.
Whenever the program sees something like this (the ...... stands for surrounding text and is not included):
......Mul(-a1+b2-c3...+f+e)......
it should change it to:
......M(-a1)*M(b2)*M(-c3)*...*M(f)*M(e)......
The number of variables in Mul is unknown and there could be multiple occurrences of Mul in the file. There are also other places in the file where + or - appears, and variables can be one or more characters long.
I tried grouping in sed, with a group followed by a "*", but it doesn't seem to work because an unknown number of variables has to be replaced.
Here is a sed script that will do it:
:a
s/\(Mul(.[^)]*\)\([+-].\)/\1)*Mul(\2/
ta
s/Mul(+\{0,1\}/M(/g
The trick is to use the test to jump back to the beginning after making a substitution (e.g. "Mul(a+b+c)"=>"Mul(a)*Mul(+b+c)").
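For reference, saving the script above as mul.sed (the name is just for illustration) and running it on the question's sample line gives:
$ echo '......Mul(-a1+b2-c3+f+e)......' | sed -f mul.sed
......M(-a1)*M(b2)*M(-c3)*M(f)*M(e)......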
$ cat tst.awk
match($0,/Mul\([^()]+\)/) {
tgt = substr($0,RSTART+4,RLENGTH-5)
gsub(/[-+][[:alnum:]]+/,"*M(&)",tgt)
gsub(/\+/,"",tgt)
sub(/^\*/,"",tgt)
print substr($0,1,RSTART-1) tgt substr($0,RSTART+RLENGTH)
}
$ awk -f tst.awk file
......M(-a1)*M(b2)*M(-c3)*M(f)*M(e)......
The above was run on this input file:
$ cat file
......Mul(-a1+b2-c3+f+e)......

Find a word in a text file and replace it with the filename

I have a lot of text files in which I would like to find the word 'CASE' and replace it with the related filename.
I tried
find . -type f | while read file
do
awk '{gsub(/CASE/,print "FILENAME",$0)}' $file >$file.$$
mv $file.$$ >$file
done
but I got the following error
awk: syntax error at source line 1 context is >>> {gsub(/CASE/,print <<< "CASE",$0)}
awk: illegal statement at source line 1
I also tried
for i in $(ls *);
do
awk '{gsub(/CASE/,${i},$0)}' ${i} > file.txt;
done
getting an empty output and
awk: syntax error at source line 1 context is >>> {gsub(/CASE/,${ <<<
awk: illegal statement at source line 1
Why awk? sed is what you want:
while read -r file; do
sed -i "s/CASE/${file##*/}/g" "$file"
done < <( find . -type f )
or
while read -r file; do
sed -i.bak "s/CASE/${file##*/}/g" "$file"
done < <( find . -type f )
To create a backup of the original.
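Note that ${file##*/} strips everything up to the last /, so only the basename is substituted for CASE. For example (the path here is made up):
file=./sub/data.txt
echo "${file##*/}"    # prints: data.txt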
You didn't post any sample input and expected output, so this is a guess, but maybe this is what you want:
find . -type f |
while IFS= read -r file
do
awk '{gsub(/CASE/,FILENAME)} 1' "$file" > "${file}.$$" &&
mv "${file}.$$" "$file"
done
Every change I made to the shell code is important, so if you don't understand why I changed any part of it, just ask.
btw if after making the changes you are still getting the error message:
awk: syntax error at source line 1
awk: illegal statement at source line 1
then you are using old, broken awk (/usr/bin/awk on Solaris). Never use that awk. On Solaris use /usr/xpg4/bin/awk (or nawk if you must).
Caveats: the above will fail if your file name contains newlines or ampersands (&) or escaped digits (e.g. \1). See Is it possible to escape regex metacharacters reliably with sed for details. If any of that is a problem, post some representative sample input and expected output.
The print in that first script is the error.
The second argument to gsub is the replacement string, not a command.
You want just FILENAME. (Note: not "FILENAME", which is a literal string, but FILENAME, the variable.)
find . -type f -print0 | while IFS= read -d '' file
do
awk '{gsub(/CASE/,FILENAME,$0)} 7' "$file" >"$file.$$"
mv "$file.$$" "$file"
done
Note that I quoted all your variables and fixed your find | read pipeline to work correctly for files with odd characters in the names (see Bash FAQ 001 for more about that). I also fixed the erroneous > in the mv command.
See the answers on this question for how to properly escape the original filename to make it safe to use in the replacement portion of gsub.
Also note that recent (4.1+ I believe) versions of awk have the -i inplace argument.
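With GNU awk 4.1+ that looks like this (gawk-only, not portable to other awks; the ./*.txt glob is just an example):
gawk -i inplace '{gsub(/CASE/,FILENAME)} 1' ./*.txt
which rewrites each named file in place, substituting its own name for CASE.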
To fix the second script you need to add the quotes you removed from the first script.
for i in *; do awk '{gsub(/CASE/,"'"${i}"'",$0)}' "${i}" > file.txt; done
Note that I got rid of the worse-than-useless use of ls (worse than useless because it actively breaks files with spaces or shell metacharacters in their names; see Parsing ls for more on that).
That command is somewhat ugly and unsafe for filenames containing various characters, though, and would be better written as:
for i in *; do awk -v fname="$i" '{gsub(/CASE/,fname,$0)}' "${i}" > file.txt; done
since that will correctly handle filenames containing double quotes etc., whereas the direct variable expansion version will not.
That being said, the corrected first script is the right answer.

grep a number from the line and append it to a file

I went through several grep examples, but don't see how to do the following.
Say I have a file with a line
! some test here and number -123.2345 text
i can get this line using
grep ! input.txt
but how do I get the number (possibly positive or negative) from this line and append it to the end of another file? Is it possible to apply grep to grep results?
If yes, then I could get the number via something like
grep -Eo "[0-9]{1,}|\-[0-9]{1,}"
P.S. I am using OS X.
P.P.S. I'm trying to fetch data from several files and put it into a single file for later plotting.
The format with your commands would be:
grep ! input.txt | grep -Eo "[0-9]{1,}|\-[0-9]{1,}" >> output
To grep from grep we use the pipe operator | this lets us chain commands together. To append this output to a file we use the redirection operator >>.
However, there are a couple of problems. Your regexp is better written as grep -Eoe '-?[0-9.]+'; this allows for the decimal point and returns a single number instead of two. And if you want lines that start with !, then grep '^!' is better, to avoid matching lines that contain ! but don't start with it. Better to do:
grep '^!' input | grep -Eoe '-?[0-9.]+' >> output
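With the sample line from the question in input, that pipeline yields exactly one number:
$ grep '^!' input | grep -Eoe '-?[0-9.]+'
-123.2345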
perl -lne 'm/.*?([\d\.\-]+).*/g;print $1' your_file >>anotherfile_to_append
$foo="! some test here and number -123.2345 text"
$echo $foo | sed -e 's/[^0-9\.-]//g'
$-123.2345
Edit:
For a file:
[ ]$ cat log
! some test here and number -123.2345 text
some blankline
some line without "the character" and with number 345.566
! again a number 34
[ ]$ sed -e '/^[^!]/d' -e 's/[^0-9.-]//g' log > op
[ ]$ cat op
-123.2345
34
Now let's see the toothpicks :) In '/^[^!]/d', / starts the address pattern, ^ anchors it at the start of the line, [^!] matches any character that is not !, and d deletes, so every line that does not start with ! is deleted. In the second expression, [^0-9.-] matches anything that is not a digit, . or -, and it is replaced with nothing (i.e. deleted), and done :)
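If you'd rather skip the toothpicks entirely, a rough awk equivalent (a sketch, assuming the same log file) is:
awk '/^!/ { for (i = 1; i <= NF; i++) if ($i ~ /^-?[0-9.]+$/) print $i }' log
which prints -123.2345 and 34, just like the sed version.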

Show filename and line number in grep output

I am trying to search my rails directory using grep. I am looking for a specific word and I want to grep to print out the file name and line number.
Is there a grep flag that will do this for me? I have been trying to use a combination of -n and -l but these are either printing out the file names with no numbers or just dumping out a lot of text to the terminal which can't be easily read.
ex:
grep -ln "search" *
Do I need to pipe it to awk?
I think -l is too restrictive as it suppresses the output of -n. I would suggest -H (--with-filename): Print the filename for each match.
grep -Hn "search" *
If that gives too much output, try -o to only print the part that matches.
grep -nHo "search" *
grep -rin searchstring * | cut -d: -f1-2
This says to search recursively (for the string searchstring in this example), ignoring case, and to display line numbers. The output from that grep will look something like:
/path/to/result/file.name:100: Line in file where 'searchstring' is found.
Next we pipe that result to the cut command using colon : as our field delimiter and displaying fields 1 through 2.
When I don't need the line numbers I often use -f1 (just the filename and path), and then pipe the output to uniq, so that I only see each filename once:
grep -ir searchstring * | cut -d: -f1 | uniq
I like using:
grep -niro 'searchstring' <path>
But that's just because I always forget the other ways and I can't forget Robert de grep - niro for some reason :)
The comment from @ToreAurstad can be spelled grep -Horn 'search' ./, which is easier to remember.
grep -HEroine 'search' ./ could also work ;)
For the curious:
$ grep --help | grep -Ee '-[HEroine],'
-E, --extended-regexp PATTERNS are extended regular expressions
-e, --regexp=PATTERNS use PATTERNS for matching
-i, --ignore-case ignore case distinctions
-n, --line-number print line number with output lines
-H, --with-filename print file name with output lines
-o, --only-matching show only nonempty parts of lines that match
-r, --recursive like --directories=recurse
Here's how I used the upvoted answer to search a tree for the Fortran files containing a string:
find . -name "*.f" -exec grep -nHo the_string {} \;
Without the -nHo, you learn only that some file, somewhere, matches the string.
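With GNU grep you can skip find entirely and let grep walk the tree itself (GNU-only; --include is not in POSIX grep):
grep -rnHo --include='*.f' the_string .
which restricts the recursive search to the Fortran files.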

Solaris awk Troubles

I'm writing a shell script and I need to strip FIND ME out of something like this:
* *[**FIND ME**](find me)*
and assign it to an array. I had the code working flawlessly... until I moved the script to a non-global Solaris zone. Here is the code I used before:
objectArray[$i]=`echo $line | awk -F '* *[**|**]' '{print $2}'`
Now it prints:
awk: syntax error near line 1
awk: bailing out near line 1
It was suggested that I try the same command with nawk, but I receive this error now instead:
nawk: illegal primary in regular expression `* *[**|**]` at `*[**|**]`
input record number 1
source line number 1
Also, /usr/xpg4/bin/awk does not exist.
I think you need to be clearer on what you want to get. For me, your awk line doesn't 'strip FIND ME out':
echo "* *[**FIND ME**](find me)*" | nawk -F '* *[**|**]' '{print $2}'
[
So it would help if you gave some examples of the input/output you are expecting. Maybe there's a way to do what you want with sed?
EDIT:
From the comments, you actually want to select "FIND ME" from the line, not strip it out.
I guess the dialect of regular expressions accepted by this nawk is different from gawk's. So maybe a tool better suited to the job is in order.
echo "* *[**FIND ME**](find me)*" | sed -e"s/.*\* \*\[\*\*\(.[^*]*\)\*\*\].*/\1/"
FIND ME
Quote your $line variable like this: "$line". If it still doesn't work, you can do it another way with nawk. Since you only want to find one instance of FIND ME:
$ echo "$line" | nawk '{gsub(/.*\*\[\*\*|\*\*\].*/,"");print}'
FIND ME
or if you are using bash/ksh on Solaris,
$ line="${line#*\[\*\*}"
$ echo "${line%%\*\*\]*}"
FIND ME
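For the record, those are standard POSIX shell expansions: ${var#pattern} removes the shortest matching prefix, and ${var%%pattern} removes the longest matching suffix. A minimal illustration with a made-up string:
$ v='aXbXc'
$ echo "${v#*X}"
bXc
$ echo "${v%%X*}"
a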