How to remove comments from an SQL file using grep?

I have an SQL file from which I need to remove all the comments:
-- Sql comment line
How can I achieve this in Linux using grep or another tool?

The grep tool has a -v option which reverses the sense of the filter. For example:
grep -v pax people
will give you all lines in the people file that don't contain pax.
An example is:
grep -v '^ *-- ' oldfile >newfile
which gets rid of lines with only white space preceding a comment marker. It won't, however, change lines like:
select blah blah -- comment here.
If you wanted to do that, you would use something like sed:
sed -e 's/ --.*$//' oldfile >newfile
which edits each line removing any characters from " --" to the end of the line.
Keep in mind you need to be careful with finding the string " --" in real SQL statements like (the contrived):
select ' -- ' || colm from blah blah blah
If you have these, you're better off creating/using an SQL parser rather than a simple text modification tool.
A transcript of the sed in operation:
pax$ echo '
...> this is a line with -- on it.
...> this is not
...> and -- this is again' | sed -e 's/ --.*$//'
this is a line with
this is not
and
For the grep:
pax$ echo '
-- this line starts with it.
this line does not
and -- this one is not at the start' | grep -v '^ *-- '
this line does not
and -- this one is not at the start
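If a full parser is overkill, a quote-aware regex is a middle ground. Here is a minimal Perl sketch (my own addition, not from the answers above: it respects single-quoted strings, but not doubled-quote ('') escapes or /* */ block comments):
perl -pe 's/^((?:[^\x27-]|\x27[^\x27]*\x27|-(?!-))*)--.*$/$1/' oldfile >newfile  # sketch: ignores doubled-quote escapes
The capture group only consumes ordinary characters, complete '...' string literals (written as \x27 to keep the shell quoting simple), and lone dashes, so the -- that starts the removed comment is guaranteed to sit outside any string.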

You can use the sed command as sed -i '/--/d' <filename>, but be aware that this deletes every line containing -- anywhere, including lines where -- appears mid-statement or inside a string literal.

Try using sed on the shell (note the -E; without it the parentheses are matched literally rather than grouping):
sed -E "s/(--.*)//" sql.filename

How do I decrement all array indexes in a text file?

Background
I have a text file that looks like the following:
$SomeText.element_[1]="MoreText[3]";\r"
$SomeText.element_[2]="MoreText[6]";\r"
$SomeText.element_[3]="MoreText[2]";\r"
$SomeText.element_[4]="MoreText[1]";\r"
$SomeText.element_[5]="MoreText[5]";\r"
This goes on for over a thousand lines. I want to do the following:
$SomeText.element_[0]="MoreText[3]";\r"
$SomeText.element_[1]="MoreText[6]";\r"
$SomeText.element_[2]="MoreText[2]";\r"
$SomeText.element_[3]="MoreText[1]";\r"
$SomeText.element_[4]="MoreText[5]";\r"
Each line of text in the file should have the left most index reduced by one, with the rest of the text unchanged.
Attempted Solutions
So far I have tried the following...but the issue for me is I do not know how to feed it back into the file properly:
Attempt 1
I tried a double cutting technique:
cat file.txt | cut -d '[' -f2 | cut -d ']' -f1 | xargs -I {} expr {} - 1
This properly outputs all of the indices reduced by one to the command line.
Attempt 2
I tried using awk with a mix of sed, but this caused my machine to hang:
awk -F'[' '{printf("%d\n", $2-1)}' file.txt | xargs -I {} sed -i 's/\[\d+\]/{}/g' file.txt
Question
How do I properly decrement all of the array indexes in the file by one and write the decremented indexes back to the right locations in the text file?
A Perl one-liner makes this easy, overwriting the input file:
perl -pi -e 's/(\d+)/$1-1/e' your-file-name-here
(assuming the first number on each line is the index)
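If you're nervous about editing in place, -i accepts a backup suffix, so the original is kept:
perl -pi.bak -e 's/(\d+)/$1-1/e' your-file-name-here
This writes the modified file over your-file-name-here and saves the untouched original as your-file-name-here.bak.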
With simple awk you could try the following, written and tested with the shown samples.
awk '
match($0,/\[[^]]*/){
print substr($0,1,RSTART) count++ substr($0,RSTART+RLENGTH)
}
' Input_file
Or, in case the counts between [..] in your Input_file are in arbitrary order, simply subtract 1 from each as follows.
awk '
match($0,/\[[^]]*/){
print substr($0,1,RSTART) substr($0,RSTART+1,RLENGTH)-1 substr($0,RSTART+RLENGTH)
}
' Input_file
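awk has no in-place switch of its own, so to write the result back, redirect to a temporary file and move it over the original, or use GNU awk's inplace extension (a sketch, assuming gawk 4.1+):
awk '...' Input_file > tmp && mv tmp Input_file
gawk -i inplace '...' Input_file
where '...' stands for either program above.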
With GNU sed and bash:
sed -E "s/([^[]*\[)([0-9]+)(].*)/printf '%s%d%s\n' '\1' \$((\2 - 1)) '\3'/e" file
Or, if it is possible that the lines contain ' character:
sed -E "
/\[[0-9]+]/{
s/'/'\\\''/g
s/([^[]*\[)([0-9]+)(].*)/printf '%s%d%s\n' '\1' \$((\2 - 1)) '\3'/e
}" file

How to extract IDs from Google Drive URLs with sed, gawk or grep

URLs:
1. https://docs.google.com/uc?id=0B3X9GlR6EmbnQ0FtZmJJUXEyRTA&export=download
2. https://drive.google.com/open?id=1TkLq5C7NzzmbRjd7VGRhauNT9Vaap-Py
3. https://drive.google.com/drive/folders/1TkLq5C7NzzmbRjd7VGRhauNT9Vaap-Py?usp=sharing
I need a single regex that works for all of these URLs.
This is what I tried, but it didn't give the expected results:
sed -E 's/.*\(folders\)?\(id\)?=?\/?(.*)&?.*/\1/'
Expected results:
0B3X9GlR6EmbnQ0FtZmJJUXEyRTA
1TkLq5C7NzzmbRjd7VGRhauNT9Vaap-Py
With your own code updated:
$ cat file
1. https://docs.google.com/uc?id=0B3X9GlR6EmbnQ0FtZmJJUXEyRTA&export=download
2. https://drive.google.com/open?id=1TkLq5C7NzzmbRjd7VGRhauNT9Vaap-Py
3. https://drive.google.com/drive/folders/1TkLq5C7NzzmbRjd7VGRhauNT9Vaap-Py?usp=sharing
$ sed -E 's#.*(folders/|id=)([^?&]+).*#\2#' file
0B3X9GlR6EmbnQ0FtZmJJUXEyRTA
1TkLq5C7NzzmbRjd7VGRhauNT9Vaap-Py
1TkLq5C7NzzmbRjd7VGRhauNT9Vaap-Py
$ sed -E 's#.*(folders/|id=)([^?&]+).*#\2#' file | uniq
0B3X9GlR6EmbnQ0FtZmJJUXEyRTA
1TkLq5C7NzzmbRjd7VGRhauNT9Vaap-Py
And yours updated to sed -E 's#.*(folders/|id=)(.*)(\?|&|$).*#\2#' would work on GNU sed.
You are using -E, so there is no need to escape the grouping parentheses (), and | means OR.
When matching a literal ?, you need to escape it.
And sed's separator can be changed to another character, # here.
Note that uniq only removes adjacent duplicates; if there are duplicates in different places, use sort -u instead.
A GNU grep solution:
$ grep -Poi '(id=|folders/)\K[a-z0-9_-]*' file
0B3X9GlR6EmbnQ0FtZmJJUXEyRTA
1TkLq5C7NzzmbRjd7VGRhauNT9Vaap-Py
1TkLq5C7NzzmbRjd7VGRhauNT9Vaap-Py
Also, these two give the same results but are more accurate than the shorter sed above:
sed -E 's#.*(folders/|id=)([A-Za-z0-9_-]*).*#\2#'
sed -E 's#.*(folders/|id=)([[:alnum:]_-]*).*#\2#'
Btw, + means one or more occurrences, * means zero or more.
A GNU awk version (removes duplicates at the same time):
awk 'match($0,".*(folders/|id=)([A-Za-z0-9_-]+)",m){if(!a[m[2]]++)print m[2]}' file
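The !a[m[2]]++ test is the usual awk de-duplication idiom: the first time an id is seen, a[m[2]] is 0 (false), so the negation is true and the id prints; every later occurrence is nonzero and is skipped.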
Could you please try the following (the pattern matches ?id=... as well as folders/..., so the open?id= URL is covered too).
awk 'match($0,/\?id=[^&]*|folders\/[^?]*/){value=substr($0,RSTART,RLENGTH);gsub(/.*=|.*\//,"",value);print value}' Input_file
Try this:
sed -E 's/.*(id=|folders\/)([^&?/]*).*/\2/' file
Explanations:
.*(id=|folders\/): after any characters (.*), match id= or folders/
([^&?/]*): search and capture any characters except &, ? and /
\2: using a backreference, the matched string is replaced with the second captured text ([^&?/]*)
Edit:
To remove duplicate URLs, pipe the command to sort and then to uniq (uniq only removes adjacent duplicate lines, so the list has to be sorted first):
sed -E 's/.*(id=|folders\/)([^&?/]*).*/\2/' file | sort | uniq
As @Tiw suggests in the edit, you can also do it with a single command by using sort with the -u flag:
sed -E 's/.*(id=|folders\/)([^&?/]*).*/\2/' file | sort -u
Using Perl
$ cat rohit.txt
1. https://docs.google.com/uc?id=0B3X9GlR6EmbnQ0FtZmJJUXEyRTA&export=download
2. https://drive.google.com/open?id=1TkLq5C7NzzmbRjd7VGRhauNT9Vaap-Py
3. https://drive.google.com/drive/folders/1TkLq5C7NzzmbRjd7VGRhauNT9Vaap-Py?usp=sharing
$ perl -lne ' s/.*\/.*..\/(.*)$/$1/g; s/(.*id=)//g; /(.+?)(&|\?|$)/ and print $1 ' rohit.txt
0B3X9GlR6EmbnQ0FtZmJJUXEyRTA
1TkLq5C7NzzmbRjd7VGRhauNT9Vaap-Py
1TkLq5C7NzzmbRjd7VGRhauNT9Vaap-Py
$

Using grep, awk and sed in a one-line command results in a "No such file or directory" error

..And I know why:
I have an XML document with lots of information inside. I need to extract what I need and eventually print it to a new file.
The XML (well, part of it; the rows just keep repeating):
<module classname="org.openas2.processor.receiver.AS2DirectoryPollingModule"
outboxdir="%home%/../../../home/samba/user/Outbound/toMartha/"
errordir="%home%/../../../home/samba/user/Outbound/toMartha/error"
sentdir="%home%/../../../home/samba/user/data/Sent/Martha"
interval="600"
defaults="sender.name=me_myself, receiver.name=Martha"
sendfilename="true"
mimetype="application/standard"/>
<module classname="org.openas2.processor.receiver.AS2DirectoryPollingModule"
outboxdir="%home%/../../../home/samba/user/Outbound/toJosh/"
errordir="%home%/../../../home/samba/user/Outbound/toJosh/error"
sentdir="%home%/../../../home/samba/user/data/Sent/Josh"
interval="600"
defaults="sender.name=me_myself, receiver.name=Josh"
sendfilename="true"
mimetype="application/standard"/>
<module classname="org.openas2.processor.receiver.AS2DirectoryPollingModule"
outboxdir="%home%/../../../home/samba/user/Outbound/toPamela/"
errordir="%home%/../../../home/samba/user/Outbound/toPamela/error"
interval="600"
defaults="sender.name=me_myself, receiver.name=Pamela"
sendfilename="true"
mimetype="application/standard"/>
I need to extract the folder after "Outbound" and strip the quotes and slashes.
Also, I need to exclude the "/error" entries so I get only one result for each of them.
My command is:
grep -o -v "/error" "Outbound/" config.xml | awk -F"Outbound/" '{print $2}' | sed -e "s/\/\"//g" > /tmp/sync_users
The error is grep: Outbound/: No such file or directory, which of course means that I'm giving grep too many arguments(?). If I remove the -v "/error" it works, but it also prints the names with "/error".
Can someone help me?
EDIT:
As some pointed out in their example (thanks for the time you put in), I'd need to extract these words based on the sample above:
toMartha
toJosh
toPamela
It could be interesting to use sed in this case:
sed -e '\#/Outbound/#!d' -e '\#/error"$#d' -e 's#.*/Outbound/##;s#/\{0,1\}"$##' Config.xml
An awk version, assuming (for the last print) that your folder is always one level below Outbound, as shown:
awk -F '/' '$0 !~ /\/Outbound\// || /\/error"$/ {next} {print $(NF-1)}' Config.xml
Lose the grep altogether:
$ awk '/outboxdir/{gsub(/^.+Outbound\/|\/" *\r?$/,""); print}' file
toMartha
toJosh
toPamela
/outboxdir/ only processes records that have outboxdir in them
gsub removes the unwanted parts of the record
added space removal at the end of the record and a CRLF fix for Windows-originated files
To give grep multiple patterns, they have to be separated by newlines or specified with multiple pattern options (-e, -f, ...). However, -v inverts the match as a whole; you can't invert only one pattern.
For what you're after you can use PCRE (-P argument) for the lookaround ability:
grep -o -P '(?<=Outbound\/)[^\/]+(?!.*\/error)' config.xml
Regex demo here
The regex tries to:
match something that is not a slash, at least once: the [^\/]+
preceded by Outbound/: the positive lookbehind (?<=Outbound\/)
and not followed by something ending with /error: the negative lookahead (?!.*\/error)
With your first sample input:
$ grep -o -P '(?<=Outbound\/)[^\/]+(?!.*\/error)' test.txt
toMartha
toJosh
toPamela
How about:
grep -i "outbound" your_file | awk -F"Outbound/" '{print $2}' | sed -e 's/error//' -e 's/\/\"//' | uniq
Should work :)
You can use match in gawk and capturing groups in the regex:
awk 'match($0, /^.*\/Outbound\/([^\/]+)\/([^\/]*)\/?"$/, a){
if(a[2]!="error"){print a[1]}
}' config.xml
You get:
toMartha
toJosh
toPamela
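Note that the three-argument form of match(), which fills the array a with the capture groups, is a gawk extension; it is not available in POSIX awk.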
grep can accept multiple patterns with the -e option (aka --regexp, even though it can be used with --fixed-strings too, go figure). However, -v (--invert-match) applies to all of the patterns as a group.
Another solution would be to chain two calls to grep:
grep -v "/error" config.xml | grep "Outbound/" | awk -F"Outbound/" '{print $2}' | sed -e "s/\/\"//g"
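Since the input is XML, a more robust route (a sketch on my part, assuming config.xml is well-formed and xmlstarlet is installed) is to let an XML tool pull the attribute and keep the text tools for trimming:
xmlstarlet sel -t -m '//module' -v '@outboxdir' -n config.xml | awk -F'Outbound/' 'NF>1{sub(/\/$/, "", $2); print $2}'
Because only the outboxdir attribute is selected, the errordir values never enter the pipeline, so no /error filtering is needed.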

How to extract the final word of a sentence

For a given text file I'd like to extract the final word in every sentence to a space-delimited text file. It would be acceptable to have a few errors for words like Mr. and Dr., so I don't need to try to achieve that level of precision.
I was thinking I could do this with Sed and Awk, but it's been too long since I've worked with them and I don't remember where to begin. Help?
(Output example: For the previous two paragraphs, I'd like to see this):
file Mr Dr precision begin Help
Using this regex:
([[:alpha:]]+)[.!?]
Grep can do this:
$ echo "$txt" | grep -o -E '([[:alpha:]]+)[.!?]'
file.
Mr.
Dr.
precision.
begin.
Help?
Then if you want only the words, a second time through:
$ echo "$txt" | grep -o -E '([[:alpha:]]+)[.!?]' | grep -o -E '[[:alpha:]]+'
file
Mr
Dr
precision
begin
Help
In awk, same regex:
$ echo "$txt" | awk '/[[:alpha:]]+[.!?]/{for(i=1;i<=NF;i++) if($i~/[[:alpha:]]+[.!?]/) print $i}'
Perl, same regex, allows capture groups and maybe a little more direct syntax:
$ echo "$txt" | perl -ne 'print "$1 " while /([[:alpha:]]+)[.!?]/g'
file Mr Dr precision begin Help
And with Perl, it is easier to refine the regex to be more discriminating about the words captured:
echo "$txt" | perl -ne 'print "$1 " while /([[:alpha:]]+)(?=[.!?](?:(?:\s+[[:upper:]])|(?:\s*\z)))/g'
file precision begin Help
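The lookahead (?=[.!?](?:(?:\s+[[:upper:]])|(?:\s*\z))) only accepts a word whose terminator is followed by whitespace and an uppercase letter (the start of a new sentence) or by the end of the input, which is why Mr. and Dr. are filtered out this time: both are followed by lowercase text or a comma.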
gawk:
$ gawk -v ORS=' ' -v RS='[.?!]' '{print $NF}' w.txt
file Mr Dr precision begin Help
(Note that plain awk does not support assigning a regular expression to RS.)
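If you're stuck with an awk that lacks regex RS support, a portable sketch (assuming sentences end only in ., ! or ?) is to turn the terminators into newlines first and then print the last field of each non-empty chunk:
tr '.!?' '\n\n\n' < w.txt | awk 'NF{printf "%s ", $NF} END{print ""}'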
This might work for you (GNU sed):
sed -r 's/^[^.?!]*\b(\w+)[.?!]/\1\n/;/\n/!d;P;D' file
That gives one word per line; for a single line, pipe through paste:
sed -r 's/^[^.?!]*\b(\w+)[.?!]/\1\n/;/\n/!d;P;D' file | paste -sd' '
For another solution just using sed:
sed -r 'H;$!d;x;s/\n//g;s/\b(\w+)[.?!]/\n\1\n/g;/\n/!d;s/[^\n]*\n([^\n]*)\n/ \1/g;s/.//' file
Easy in Perl:
perl -ne 'print "$1 " while /(\w+)[.!?]/g'
-n reads the input line by line.
\w matches a "word character".
\w+ matches one or more word characters.
[.!?] matches any of the sentence-end markers.
/g stands for "globally" - it remembers where the last match occurred and tries to match after it.

sed replace variables while read lines

I'm working on a shell script and I need to change some strings on different lines of a file inside a while read loop. The structure needs to be like this, because String_Search and String_Result will be calculated for each line.
while read line
do
varA="String_Search"
resA="String_Result"
line=`echo $line | sed -e "s/$varA/$resA"`
echo $line >> outputFile.txt
done < "inputFile.txt"
The script doesn't work and shows this error message:
sed: -e expression #1, char 31: unterminated `s' command
Can anyone help me?
Thanks to all.
You need to terminate the substitution expression with a slash /:
line=`echo $line | sed -e "s/$varA/$resA/"`
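For completeness, the loop has a few more lurking issues (unquoted expansions, word splitting, appending on every run). A safer sketch of the same structure, assuming $varA and $resA never contain characters that are special to sed, such as /:
while IFS= read -r line
do
    varA="String_Search"
    resA="String_Result"
    # quote the expansions so whitespace in the line survives
    line=$(printf '%s\n' "$line" | sed -e "s/$varA/$resA/")
    printf '%s\n' "$line"
done < inputFile.txt > outputFile.txt
If the search and replacement are literal text, bash's own ${line//$varA/$resA} does the substitution without spawning sed for every line (note that $varA is then treated as a glob pattern, not a regex).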