How can I convert SQL comments with -- to # using Perl? - sql

UPDATE:
This is what works!
fgrep -ircl --include=*.sql -- -- *
I have various SQL files with '--' comments. We migrated to the latest version of MySQL, and it hates these comments. I want to replace -- with #.
I am looking for a recursive, inplace replace one-liner.
This is what I have:
perl -p -i -e 's/--/# /g' `fgrep -- -- *`
A sample .sql file:
use myDB;
--did you get an error
I get the following error:
Unrecognized switch: --did (-h will show valid options).
P.S.: fgrep skipping two dashes was just discussed here, if you are interested.
Any help is appreciated.

The command-line arguments after the -e 's/.../.../' argument should be filenames. Use fgrep -l to return names of files that contain a pattern:
perl -p -i -e 's/--/# /g' `fgrep -l -- -- * `
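For the recursive case the question asks about, a hedged sketch along the same lines (assuming GNU fgrep, so -r and --include are available, and that no matching filename contains whitespace, since the backticks word-split):
perl -p -i -e 's/--/# /g' `fgrep -rl --include='*.sql' -- -- .`
Here -r makes fgrep descend into subdirectories and -l again prints only the names of files that contain --.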

I'd use a combination of find and inplace sed
find . -name '*.sql' -exec sed -i -e "s/^--/#/" '{}' \;
Note that it will only replace lines beginning with --
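If the comments can be indented, a slightly broader sketch that keeps the leading whitespace (same find and sed -i approach, still only touching comments that start a line):
find . -name '*.sql' -exec sed -i -e 's/^\([[:space:]]*\)--/\1#/' '{}' \;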
The regex will become vastly more complex if you want to replace this, for example:
INSERT INTO stuff VALUES (...) -- values used for xyz
because the -- might also appear in your data (and I guess you don't want to replace those):
INSERT INTO stuff VALUES (42, "<!-- sboing -->") -- values used for xyz

The equivalent of that in script form is:
#!/usr/bin/perl -i
use warnings;
use strict;
while (<>) {
    s/--/# /g;
    print;
}
If I have several files with comments of the form --comment and feed any number of names to this script, they are changed in place to # comment. You could use find, ls, grep, etc. to find the files...
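For example, assuming the script above were saved under the hypothetical name fix-comments.pl and made executable, you could feed it every .sql file under the current directory with:
find . -name '*.sql' -exec ./fix-comments.pl {} +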
There is nothing per se wrong with using a one-liner.
Is that what you are looking for?

Related

Script to display only comments from /etc/services file

I need to write a bash script that takes a service name as a parameter and displays only the comment that comes after the hash symbol in /etc/services, but I have no idea how to cut out only the comment part.
The "it works" solution for me is just:
grep "^$1" /etc/services | awk '{print $3,$4 ...
but I don't think this is a good one
I'm searching for something like:
[find the service] -> print only the part from # till the end of the line
I'm still learning so any solution with explanation or just a hint will be very helpful for me.
Chances are this is what you're looking for:
awk -v svc="$1" '($1==svc) && sub(/[^#]+#/,"")' /etc/services
but without sample input/output it's a guess.
The above will work using any awk in any shell on every Unix box.
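As a rough illustration (the exact comment text and which entries carry comments vary by system), running it with a literal service name in place of $1 might look like:
$ awk -v svc="ssh" '($1==svc) && sub(/[^#]+#/,"")' /etc/services
 The Secure Shell (SSH) Protocol
Lines whose first field matches but that have no # comment produce no output, since sub() then returns 0.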
Try this:
SERVICE_NAME=linuxconf; grep -Po "^$SERVICE_NAME.*# \K.*$" /etc/services
-P tells grep to use perl regex.
-o trims the output so that it only includes the regex match.
\K tells the regex engine to exclude previously matched part of the string from the match, i.e. only the part after \K will be present in the final match.

Replace character except between pattern using grep -o or sed (or others)

In the following file I want to replace all the ; with , except that, when there is a string (delimited by two "), the ; inside it should not be replaced.
Example:
Input
A;B;C;D
5cc0714b9b69581f14f6427f;5cc0714b9b69581f14f6428e;1;"5cc0714b9b69581f14f6427f;16a4fba8d13";xpto;
5cc0723b9b69581f14f64285;5cc0723b9b69581f14f64294;2;"5cc0723b9b69581f14f64285;16a4fbe3855";xpto;
5cc072579b69581f14f6428a;5cc072579b69581f14f64299;3;"5cc072579b69581f14f6428a;16a4fbea632";xpto;
output
A,B,C,D
5cc0714b9b69581f14f6427f,5cc0714b9b69581f14f6428e,1,"5cc0714b9b69581f14f6427f;16a4fba8d13",xpto,
5cc0723b9b69581f14f64285,5cc0723b9b69581f14f64294,2,"5cc0723b9b69581f14f64285;16a4fbe3855",xpto,
5cc072579b69581f14f6428a,5cc072579b69581f14f64299,3,"5cc072579b69581f14f6428a;16a4fbea632",xpto,
For sed I have: sed 's/;/,/g' input.txt > output.txt but this would replace everything.
The regex for the " delimited string: \".*;.*\" .
(A regex for hexadecimal would be better -- something like: [0-9a-fA-F]+)
My problem is combining it all to make a grep -o / sed that replaces everything except for that pattern.
The file size is on the order of tens of GB (up to 99 GB), so performance is important. Relevant.
Any ideas are appreciated.
sed is for doing simple s/old/new on individual strings. grep is for doing g/re/p. You're not trying to do either of those tasks so you shouldn't be considering either of those tools. That leaves the other standard UNIX tool for manipulating text - awk.
You have a ;-separated CSV that you want to make ,-separated. That's simply:
$ awk -v FPAT='[^;]*|"[^"]+"' -v OFS=',' '{$1=$1}1' file
A,B,C,D
5cc0714b9b69581f14f6427f,5cc0714b9b69581f14f6428e,1,"5cc0714b9b69581f14f6427f;16a4fba8d13",xpto,
5cc0723b9b69581f14f64285,5cc0723b9b69581f14f64294,2,"5cc0723b9b69581f14f64285;16a4fbe3855",xpto,
5cc072579b69581f14f6428a,5cc072579b69581f14f64299,3,"5cc072579b69581f14f6428a;16a4fbea632",xpto,
The above uses GNU awk for FPAT. See What's the most robust way to efficiently parse CSV using awk? for more details on parsing CSVs with awk.
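To see what that FPAT definition is doing, a quick sketch (same GNU awk requirement, same file as above) that prints each field of the second line on its own line; field 4 should come out as the whole quoted string, semicolon included:
awk -v FPAT='[^;]*|"[^"]+"' 'NR==2{for (i=1; i<=NF; i++) print i": "$i}' file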
If I understand your requirements correctly, one option would be to do it in three passes.
From your comment about hex, I'll assume nothing like # appears in the input, so you can do (using GNU sed):
sed -E 's/("[^"]+);([^"]+")/\1#\2/g' original > transformed
sed -i 's/;/,/g' transformed
sed -i 's/#/;/g' transformed
The idea is to replace the ; within quotes with something else and write that to a new file, then replace all remaining ; with , and finally put the ; back, editing the same file in place (the -i flag of sed).
The three passes can be combined into a single command with
sed -E 's/("[^"]+);([^"]+")/\1#\2/g;s/;/,/g;s/#/;/g' original > transformed
That said, there are probably plenty of CSV parsers which already handle quoted fields that you could use for the final use case, as I bet this is just an intermediate step for something else later in the chain.
From Ed Morton's comment: if you do it in one pass, you can use \n as the replacement separator, since there can't be a newline in text that is processed line by line.
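A sketch of that single pass (same GNU sed assumption and, like the three-pass version, it protects one ; per quoted field):
sed -E 's/("[^"]+);([^"]+")/\1\n\2/g; s/;/,/g; s/\n/;/g' original > transformed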
This might work for you (GNU sed):
sed -E ':a;s/^([^"]*("[^"]*"[^"]*)*"[^";]*);/\1\n/;ta;y/;/,/;y/\n/;/' file
Replace ;'s inside double quotes with newlines, transpose ;'s to ,'s and then transpose newlines to ;'s.

Use multiline regex on cat output

I've the following file queries.sql that contains a number of queries, structured like this:
/* Query 1 */
SELECT cab_type_id,
Count(*)
FROM trips
GROUP BY 1;
/* Query 2 */
SELECT passenger_count,
Avg(total_amount)
FROM trips
GROUP BY 1;
/* Query 3 */
SELECT passenger_count,
Extract(year FROM pickup_datetime),
Count(*)
FROM trips
GROUP BY 1,
2;
Then I've written a regex, that finds all those queries in the file:
/\*[^\*]*\*/[^;]*;
What I'd like to achieve is the following:
Select all the queries with the regex.
Prefix each query with EXPLAIN ANALYZE
Execute each query and output the results to a new file. That means, query 1 will create a file q1.txt with the corresponding output, query 2 create q2.txt etc.
One of my main challenges (there are no problems, right? ;-)) is that I'm rather unfamiliar with the Linux bash I have to use.
I tried cat queries.sql | grep '/\*[^\*]*\*/[^;]*;' but that doesn't return anything.
So a solution could look like:
count = 0
for query in (cat queries.sql | grep 'somehow-here-comes-my-regex') do
count = $count+1
query = 'EXPLAIN ANALYZE '+query
psql -U postgres -h localhost -d nyc-taxi-data -c query > 'q'$count'.txt'
Except: that doesn't work, and I don't know how to make it work.
You have to omit spaces for variable assignments.
The following script would help. Save it in a file, e.g. explain.sh, make it executable with chmod 0700 explain.sh, and run it like this: ./explain.sh query.sql.
#!/bin/bash
qfile="$1"
# query numbers
n="$(grep -oP '(?<=Query )[0-9]+ ' "$qfile")"
count=1
for q in $n; do
    # Corrected solution, modified after the remarks of @EdMorton
    qn="EXPLAIN ANALYZE $(awk -v n="Query $q" 'flag; $0 ~ n {flag=1} /;/{flag=0}' "$qfile")"
    #qn="EXPLAIN ANALYZE $(awk -v n=$q "flag; /Query $q/{flag=1} /;/{flag=0}" $qfile)"
    # psql -U postgres -h localhost -d nyc-taxi-data -c "$qn" > q$count.txt
    echo "$qn" > q$count.txt
    count=$(( $count + 1 ))
done
First of all, the script expects one argument (your example input query.sql file). It reads out the query numbers and saves them into the variable n. Then, in a for loop, it iterates through the query numbers and uses awk to extract query number q, prepending EXPLAIN ANALYZE to it. Then you can run your psql with the desired query. Here I commented out the psql part; this example script only creates a qN.txt file for each EXPLAIN query.
UPDATE:
The awk part: it is possible to use a shell variable in awk via the -v flag. Here we create an awk variable n with the value of the q shell variable. n is used to build the start pattern, i.e. Query 1. awk -v n="Query $q" 'flag; $0 ~ n {flag=1} /;/{flag=0}' "$qfile" matches everything between Query 1 and the first occurrence of a semicolon (;), excluding the Query 1 line itself, from query.sql. The $(...) is command substitution in bash, so we can save the output of a shell command into a variable. Here we save the output of awk and prefix it with the EXPLAIN ANALYZE string.
Here is a great answer about awk pattern matching.
It sounds like this is what you're looking for:
awk -v RS= -v ORS='\0' '{print "EXPLAIN ANALYZE", $0}' queries.sql |
while IFS= read -r -d '' query; do
    psql -U postgres -h localhost -d nyc-taxi-data -c "$query" > "q$((++count)).txt"
done
The awk statement outputs each query as a NUL-terminated string; the shell loop reads them one at a time and calls psql on each. Simple, robust, efficient, etc...
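To preview what the loop receives before involving psql, one option is to swap the psql call for a plain printf:
awk -v RS= -v ORS='\0' '{print "EXPLAIN ANALYZE", $0}' queries.sql |
while IFS= read -r -d '' query; do
    printf '=== query %d ===\n%s\n' "$((++count))" "$query"
done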

Find a word in a text file and replace it with the filename

I have a lot of text files in which I would like to find the word 'CASE' and replace it with the related filename.
I tried
find . -type f | while read file
do
awk '{gsub(/CASE/,print "FILENAME",$0)}' $file >$file.$$
mv $file.$$ >$file
done
but I got the following error
awk: syntax error at source line 1 context is >>> {gsub(/CASE/,print <<< "CASE",$0)}
awk: illegal statement at source line 1
I also tried
for i in $(ls *);
do
awk '{gsub(/CASE/,${i},$0)}' ${i} > file.txt;
done
getting an empty output and
awk: syntax error at source line 1 context is >>> {gsub(/CASE/,${ <<<
awk: illegal statement at source line 1
Why awk? sed is what you want:
while read -r file; do
    sed -i "s/CASE/${file##*/}/g" "$file"
done < <( find . -type f )
or
while read -r file; do
    sed -i.bak "s/CASE/${file##*/}/g" "$file"
done < <( find . -type f )
To create a backup of the original.
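Note that ${file##*/} strips everything up to the last /, so only the base name of the file is used as the replacement; for example (hypothetical path, just to show the expansion):
file=./reports/2019/summary.txt
echo "${file##*/}"   # prints: summary.txt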
You didn't post any sample input and expected output so this is a guess but maybe this is what you want:
find . -type f |
while IFS= read -r file
do
    awk '{gsub(/CASE/,FILENAME)} 1' "$file" > "${file}.$$" &&
    mv "${file}.$$" "$file"
done
Every change I made to the shell code is important so if you don't understand why I changed any part of it, ask the question.
btw if after making the changes you are still getting the error message:
awk: syntax error at source line 1
awk: illegal statement at source line 1
then you are using old, broken awk (/usr/bin/awk on Solaris). Never use that awk. On Solaris use /usr/xpg4/bin/awk (or nawk if you must).
Caveats: the above will fail if your file name contains newlines or ampersands (&) or escaped digits (e.g. \1). See Is it possible to escape regex metacharacters reliably with sed for details. If any of that is a problem, post some representative sample input and expected output.
print in that first script is the error.
The second argument to gsub is the replacement string, not a command.
You want just FILENAME. (Note: not "FILENAME", which is a literal string, but FILENAME, the variable.)
find . -type f -print0 | while IFS= read -d '' file
do
    awk '{gsub(/CASE/,FILENAME,$0)} 7' "$file" >"$file.$$"
    mv "$file.$$" "$file"
done
Note that I quoted all your variables and fixed your find | read pipeline to work correctly for files with odd characters in the names (see Bash FAQ 001 for more about that). I also fixed the erroneous > in the mv command.
See the answers on this question for how to properly escape the original filename to make it safe to use in the replacement portion of gsub.
Also note that recent (4.1+ I believe) versions of awk have the -i inplace argument.
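With GNU awk 4.1+ that could look something like the following sketch (note that FILENAME will then include the leading ./path from find, unlike the sed answer above that uses ${file##*/}):
find . -type f -exec gawk -i inplace '{gsub(/CASE/,FILENAME)} 1' {} \;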
To fix the second script you need to add the quotes you removed from the first script.
for i in *; do awk '{gsub(/CASE/,"'"${i}"'",$0)} 7' "${i}" > file.txt; done
Note that I got rid of the worse than useless use of ls (worse than useless because it actively breaks files with spaces or shell metacharacters in their names; see Parsing ls for more on that).
That command, though, is somewhat ugly and unsafe for filenames with various characters in them, and would be better written as the following:
for i in *; do awk -v fname="$i" '{gsub(/CASE/,fname,$0)} 7' "${i}" > file.txt; done
since that will correctly handle filenames with double quotes etc. in their names, whereas the direct variable expansion version will not.
That being said, the corrected first script is the right answer.

Show filename and line number in grep output

I am trying to search my rails directory using grep. I am looking for a specific word and I want to grep to print out the file name and line number.
Is there a grep flag that will do this for me? I have been trying to use a combination of -n and -l but these are either printing out the file names with no numbers or just dumping out a lot of text to the terminal which can't be easily read.
ex:
grep -ln "search" *
Do I need to pipe it to awk?
I think -l is too restrictive as it suppresses the output of -n. I would suggest -H (--with-filename): Print the filename for each match.
grep -Hn "search" *
If that gives too much output, try -o to only print the part that matches.
grep -nHo "search" *
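With some hypothetical Rails file names, the -Hn output has the form file:line:matching-line, and adding -o trims the last part down to just the match:
$ grep -Hn "search" *
user_controller.rb:42:  def search(query)
routes.rb:7:  get "/search" => "pages#results"
$ grep -nHo "search" *
user_controller.rb:42:search
routes.rb:7:search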
grep -rin searchstring * | cut -d: -f1-2
This would say, search recursively (for the string searchstring in this example), ignoring case, and display line numbers. The output from that grep will look something like:
/path/to/result/file.name:100: Line in file where 'searchstring' is found.
Next we pipe that result to the cut command using colon : as our field delimiter and displaying fields 1 through 2.
When I don't need the line numbers I often use -f1 (just the filename and path), and then pipe the output to uniq, so that I only see each filename once:
grep -ir searchstring * | cut -d: -f1 | uniq
I like using:
grep -niro 'searchstring' <path>
But that's just because I always forget the other ways and I can't forget Robert de grep - niro for some reason :)
The comment from @ToreAurstad can be spelled grep -Horn 'search' ./, which is easier to remember.
grep -HEroine 'search' ./ could also work ;)
For the curious:
$ grep --help | grep -Ee '-[HEroine],'
-E, --extended-regexp PATTERNS are extended regular expressions
-e, --regexp=PATTERNS use PATTERNS for matching
-i, --ignore-case ignore case distinctions
-n, --line-number print line number with output lines
-H, --with-filename print file name with output lines
-o, --only-matching show only nonempty parts of lines that match
-r, --recursive like --directories=recurse
Here's how I used the upvoted answer to search a tree to find the fortran files containing a string:
find . -name "*.f" -exec grep -nHo the_string {} \;
Without the nHo, you learn only that some file, somewhere, matches the string.