REGEX check only at the start of a line - sql

Suppose we want to keep an entire line of a string only if a particular word, e.g. 'test', appears at the start of the line.
If it appears anywhere else, the entire line should be removed.
For example:
if function_test()=5; // this entire line should be removed
test sample =5; // this entire line should be kept

From Oracle 10g R2 on, you should be able to use the anchor \A to require the match at the beginning of the string (so this only works for single-line strings).
http://www.regular-expressions.info/oracle.html
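To see why \A only helps with single-line strings, here is a small sketch using Python's re module purely as a test bench (it is not Oracle, so check your database's documentation before relying on identical behaviour). \A matches only at the very start of the whole string, while ^ in multiline mode matches at the start of every line, so only ^ finds the 'test' line when it is not the first one.

```python
import re

text = "if function_test()=5;\ntest sample =5;"

# \A only ever matches at the start of the whole string, even in MULTILINE mode,
# so the 'test' line (which is the second line) is not found.
anchored_to_string = re.findall(r"\Atest.*", text, flags=re.MULTILINE)

# In MULTILINE mode, ^ matches at the start of every line.
anchored_to_line = re.findall(r"^test.*", text, flags=re.MULTILINE)

print(anchored_to_string)  # []
print(anchored_to_line)    # ['test sample =5;']
```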

What do you mean by keeping / removing lines? Where is this regex supposed to run? I.e. is it part of an SQL command, part of a grep, or something else?
In SQL you can use the LIKE operator:
WHERE line LIKE 'test%'
You can also use SUBSTRING:
WHERE substring(line, 1, 4) = 'test'
With grep (or most other languages), you can anchor the pattern to the start of the line with ^, e.g.:
grep '^test' bigfile.txt
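A quick way to compare the two SQL variants side by side is an in-memory SQLite database driven from Python. SQLite is only a stand-in here (it spells the function substr), and the table name lines is made up for the example. One difference worth knowing: SQLite's LIKE is case-insensitive for ASCII letters by default, while the substr comparison is case-sensitive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table holding one line of text per row.
conn.execute("CREATE TABLE lines (line TEXT)")
conn.executemany(
    "INSERT INTO lines (line) VALUES (?)",
    [("if function_test()=5;",), ("test sample =5;",), ("TEST upper =1;",)],
)

# LIKE 'test%': 'test' followed by anything (case-insensitive for ASCII in SQLite).
like_rows = conn.execute(
    "SELECT line FROM lines WHERE line LIKE 'test%' ORDER BY rowid"
).fetchall()

# substr(line, 1, 4) = 'test': the first four characters, compared case-sensitively.
substr_rows = conn.execute(
    "SELECT line FROM lines WHERE substr(line, 1, 4) = 'test' ORDER BY rowid"
).fetchall()

print(like_rows)    # [('test sample =5;',), ('TEST upper =1;',)]
print(substr_rows)  # [('test sample =5;',)]
conn.close()
```

Neither variant keeps the 'if function_test()=5;' line, because in both cases 'test' has to be at position 1.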

Try...
...
WHEN REGEXP_LIKE(string,'^test','i') THEN
-- this is a good line: do what you want, or return string
END
...
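The same keep-or-drop logic, applied to a whole multi-line string, might look like this in Python. The 'i' flag of REGEXP_LIKE corresponds to re.IGNORECASE here; the function name keep_test_lines is only for illustration.

```python
import re

# ^test with IGNORECASE, analogous to REGEXP_LIKE(line, '^test', 'i')
STARTS_WITH_TEST = re.compile(r"^test", re.IGNORECASE)


def keep_test_lines(text: str) -> list:
    """Return only the lines that start with 'test' (any case); all others are dropped."""
    return [line for line in text.splitlines() if STARTS_WITH_TEST.match(line)]


sample = "if function_test()=5;\ntest sample =5;\nTEST other =1;"
print(keep_test_lines(sample))  # ['test sample =5;', 'TEST other =1;']
```

Note that ^test also accepts a line starting with 'testing'; if 'test' must be a whole word, r"^test\b" is the stricter pattern.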

Related

How do I print the first occurrence of a string after a special character in Hive using regexp_extract or split?

I'm stuck on a problem in Hive. My data set looks like this:
##214628##564#7576#7876
#12771#242###256823
###3264###7236473####3
In each instance, I want to print only the first string after the #. So the output should be something like this:
214628
12771
3264
I tried using the regexp_extract function, but alas I only get NULL values. Since Hive doesn't support regexp_substr, the following syntax doesn't work:
to_number(trim(regexp_substr(col_name,'[^#]+',1,1)))
Any suggestions are welcome!
You can use a combination of regexp_replace and substr.
First, collapse every run of consecutive # characters into a single # using regexp_replace():
regexp_replace(col,'#+','#') -- for data '#####123##' this will produce '#123#'
Then remove the leading # using substr, and use instr to take everything from the first character up to the next #:
substr(substr(str,2),1, instr(substr(str,2),'#')-1) -- this will produce '123'
You can see the whole SQL below.
select substr(substr(str,2),1, instr(substr(str,2),'#')-1) as result
from (
SELECT regexp_replace('#####123##','#+','#') as str) a
I assumed the string always starts with #. If it doesn't, add a check such as if left(str,1)='#'... and handle it according to your data.
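The three steps can be traced outside Hive with a short Python sketch. Python strings are 0-based, so substr(str, 2) becomes a slice starting at index 1; the helper names first_token and first_run_of_non_hash are made up. For comparison, first_run_of_non_hash shows what the '[^#]+' pattern from the question matches: the first run of characters that are not #. Since Hive's regexp_extract(subject, pattern, index) uses Java regexes, regexp_extract(col, '[^#]+', 0) may also be worth trying, but I have not verified that in Hive.

```python
import re


def first_token(s: str) -> str:
    """Mimic the answer's regexp_replace + substr + instr pipeline.

    Assumes s starts with '#' and that another '#' follows the first token,
    as in the sample data; str.index raises ValueError otherwise.
    """
    collapsed = re.sub(r"#+", "#", s)  # regexp_replace(col, '#+', '#')
    rest = collapsed[1:]               # substr(str, 2): drop the leading '#'
    end = rest.index("#")              # 0-based index, equal to instr(rest, '#') - 1
    return rest[:end]                  # substr(rest, 1, instr(rest, '#') - 1)


def first_run_of_non_hash(s: str) -> str:
    """First match of '[^#]+', the pattern used with regexp_substr in the question."""
    match = re.search(r"[^#]+", s)
    return match.group(0) if match else ""


rows = ["##214628##564#7576#7876", "#12771#242###256823", "###3264###7236473####3"]
for row in rows:
    print(first_token(row), first_run_of_non_hash(row))
```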

String manipulation with Replace in SQL

I am using a replace function to add some quotes around a couple of keywords.
However, this replacement goes wrong in some cases, like the one below.
This is the query:
replace(replace(aa.SourceQuery,'sequence','"sequence"'),'timestamp','"timestamp"')
Before:
select timestamp, SparkTimeStamp
from SparkRecordCounts
After:
select "timestamp", Spark"timestamp"
from SparkRecordCounts
However, I want it to be like:
select "timestamp", Sparktimestamp
from SparkRecordCounts
EDIT: I wrote this before knowing which RDBMS you were using, but I've left it in case it helps someone else.
I think you are looking for word boundaries in your replacement, which are generally a job for regular expressions.
Oracle has one built in, called regexp_replace, and you could use something like this:
regexp_replace(aa.SourceQuery, '(^|\s|\W)timestamp($|\s|\W)', '\1"timestamp"\2')
The regular expression looks at the start for:
^ - the start of the line OR
\s - a space character OR
\W - a non-word character
It then matches timestamp, which must be followed by:
$ - the end of the line OR
\s - a space character OR
\W - a non-word character
Then, and only then, does it perform the replacement. \1 and \2 put back whichever boundary characters matched before and after the word, so they are not lost.
I'm not sure how other databases handle regexp_replace; it looks like older MySQL versions can do it via a plugin like this, and MySQL 8.0 added a native REGEXP_REPLACE function.
SQL Server has a solution to something similar here
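Python's re module can serve as a test bench for the pattern before it goes into the database. IGNORECASE is an assumption here, meant to mimic a case-insensitive REPLACE (which would explain why SparkTimeStamp was changed in the first place). The second call swaps in \b, a zero-width word boundary, plus an alternation so both keywords from the original query are handled in one pass; check whether your database's regex flavor supports \b before using that form there.

```python
import re

query = "select timestamp, SparkTimeStamp\nfrom SparkRecordCounts"

# The answer's pattern: the neighbouring characters are captured and restored via \1 and \2.
grouped = re.sub(
    r"(^|\s|\W)timestamp($|\s|\W)", r'\1"timestamp"\2', query, flags=re.IGNORECASE
)

# Zero-width \b boundaries; \1 re-inserts whichever keyword matched, with its original case.
bounded = re.sub(r"\b(sequence|timestamp)\b", r'"\1"', query, flags=re.IGNORECASE)

print(grouped)
print(bounded)
```

Both calls leave SparkTimeStamp alone. One difference: because the grouped version consumes the character after the keyword, two occurrences separated by a single space ("timestamp timestamp") leave the second one unquoted, while the zero-width \b version quotes both.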

Need to remove GO from string but only if followed or preceded by hidden character or space

I have a string (an SQL query) and I need to remove all GO commands.
That could be done simply like this: REPLACE(<columnname>,'GO',''), but then strings like 'Be gone!' will suddenly look like 'Be ne!'
So my idea is to use something like this:
REPLACE(<columnname>,'GO' + <hidden character>,'')
But how to do that?
If line breaks are also a problem, you'll have to nest REPLACE, like:
REPLACE(REPLACE(<columnname>,'GO ',''), 'GO' + CHAR(13) + CHAR(10), '').
Note that CHAR(13) + CHAR(10) is a Windows line ending (Carriage Return followed by Line Feed); CHAR(10) + CHAR(13) would be the reverse order. If you (also) have bare Carriage Returns or Line Feeds, you'll have to account for those too, and with a mix of possible line endings you'll have to nest REPLACE even further. The general pattern stays the same, though.
replace ([columnA], 'GO' + char(13),'') seems to do the trick.
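If the script can be processed outside the database, a line-based filter avoids the 'Be gone!' problem entirely. This Python sketch assumes GO always sits alone on its own line (optionally surrounded by whitespace), which is how the batch separator recognised by tools such as sqlcmd and SSMS is normally written; it does not handle variants such as GO followed by a repeat count. splitlines() recognises \r\n, \n and \r, so mixed line endings need no extra nesting. The function name remove_go_lines is made up.

```python
def remove_go_lines(script: str) -> str:
    """Drop lines that consist only of GO (any case, surrounding whitespace allowed)."""
    kept = [
        line
        for line in script.splitlines(keepends=True)  # keeps each line's own ending
        if line.strip().upper() != "GO"
    ]
    return "".join(kept)


script = "SELECT 'Be gone!'\r\nGO\r\nSELECT 1\nGO"
print(repr(remove_go_lines(script)))  # "SELECT 'Be gone!'\r\nSELECT 1\n"
```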

Regex: match line if previous line satisfies a criteria

What's a regex that will match lines whose previous line starts with a set of characters?
I'm trying to parse M3U files, and I need to match each line whose preceding line starts with #EXTINF:, as in this example:
#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:10
#EXTINF:11.54
ASMIK_tid_0000250058_m.600000-00000.ts
#EXTINF:8.51
ASMIK_tid_0000250058_m.600000-00001.ts
#EXTINF:11.76
ASMIK_tid_0000250058_m.600000-00002.ts
#EXTINF:10.05
ASMIK_tid_0000250058_m.600000-00003.ts
etc...
I only want to extract these lines:
ASMIK_tid_0000250058_m.600000-00000.ts
ASMIK_tid_0000250058_m.600000-00001.ts
ASMIK_tid_0000250058_m.600000-00002.ts
ASMIK_tid_0000250058_m.600000-00003.ts
I've tried variations on this answer and this: (?#EXT.*\n) but had no luck...
First, make sure the function you are using matches against the whole file rather than line by line; otherwise this is impossible.
Then you would need a lookbehind (written here for Windows \r\n line endings; use \n instead for Unix-style files):
(?<=#EXTINF.*\r\n).*
If your regex implementation does not support lookbehinds OR repetition inside of a lookbehind, you can use two capture groups instead:
(#EXTINF.*\r\n)(.*)
You would simply ignore the first capture group and keep the data in the second one.
If you need to explicitly specify that . does not match newlines, you can set that mode at the beginning of the regex: (?-s)
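Python's built-in re module is one of the implementations that rejects the variable-length lookbehind (the third-party regex package does accept it), so in plain Python the capture-group form is the way to go. This sketch uses MULTILINE so ^ anchors each #EXTINF line, accepts both \n and \r\n endings, and keeps \r out of the captured file name; the playlist is the sample from the question.

```python
import re

playlist = (
    "#EXTM3U\n"
    "#EXT-X-VERSION:3\n"
    "#EXT-X-TARGETDURATION:10\n"
    "#EXTINF:11.54\n"
    "ASMIK_tid_0000250058_m.600000-00000.ts\n"
    "#EXTINF:8.51\n"
    "ASMIK_tid_0000250058_m.600000-00001.ts\n"
    "#EXTINF:11.76\n"
    "ASMIK_tid_0000250058_m.600000-00002.ts\n"
    "#EXTINF:10.05\n"
    "ASMIK_tid_0000250058_m.600000-00003.ts\n"
)

# Group 1 is the whole line right after any line that starts with #EXTINF.
SEGMENT_AFTER_EXTINF = re.compile(r"^#EXTINF[^\r\n]*\r?\n([^\r\n]*)", re.MULTILINE)
segments = SEGMENT_AFTER_EXTINF.findall(playlist)
print(segments)

# Python's re refuses variable-width lookbehinds such as (?<=#EXTINF.*\r\n).
try:
    re.compile(r"(?<=#EXTINF.*\r\n).*")
    lookbehind_supported = True
except re.error:
    lookbehind_supported = False
print(lookbehind_supported)
```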

BASH - Single quote inside double quote for SQL Where clause

I need to send a properly formatted date comparison WHERE clause to a program on the command line in bash.
Once it gets inside the called program, the WHERE clause should be valid for Oracle, and should look exactly like this:
highwater>TO_DATE('11-Sep-2009', 'DD-MON-YYYY')
The date value is in a variable. I've tried a variety of combinations of quotes and backslashes. Rather than confuse the issue and give examples of my mistakes, I'm hoping for a pristine accurate answer unsullied by dreck.
If I were writing it in Perl, I think the assignment would look like this:
$hiwaterval = '11-Sep-2009';
$where = "highwater>TO_DATE(\'$hiwaterval\', \'DD-MON-YYYY\')";
How do I achieve the same effect in bash?
hiwaterval='11-Sep-2009'
where="highwater > TO_DATE('$hiwaterval', 'DD-MON-YYYY')"
Optionally, add "export " before the final assignment if the variable needs to be visible to programs started from the current shell.
Have you tried doubling the single quotes? Like highwater>TO_DATE(''11-Sep-2009'', ''DD-MON-YYYY''). Just a suggestion; I haven't tried it out.
You can assign the where clause like this:
export WHERECLAUSE=`echo "where highwater >TO_DATE('11-Sep-2009', 'DD-MON-YYYY')"`
which works with a shell script of the form:
sqlplus /nolog <<EOS
connect $USERNAME/$PASSWD@$DB
select * from test $WHERECLAUSE
;
exit
EOS