Regex - find lines starting with 2 dashes containing single speech mark - sql

Starting with this sample text:
-- Search for relevant Title, second half of the screen, under "Context Field Values" lists the main parts of the Flexfield
-- This lists the bits users see in Core Applications when they click into the DFF plus shows if there is a LOV linked to the field
-- It's a test
-- So is this
SELECT fat.application_name
, fdfv.title
, fdfv.application_table_name
How can I use a regex in Notepad++ to find lines that start with -- and contain a single quote ('), so that only this line is returned:
-- It's a test
I tried a silly amount of things, such as:
[^--']
[^--*']
[*--*']
--[']
[--][']
[']
^--[']
^--*\'*
^--*'*
But as you can see, I'm not too clever!

You may use this regex to match a line starting with -- containing only a single ':
^\s*--[^'\r\n]*'[^'\r\n]*$
Make sure to keep MULTILINE mode on since we are using the anchors ^ and $.
RegEx Breakup:
^: Start
\s*: Match 0 or more whitespaces
--: Match --
[^'\r\n]*: Match 0 or more of any char that is not ' and not a line break
': Match a '
[^'\r\n]*: Match 0 or more of any char that is not ' and not a line break
$: End
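As a quick check outside Notepad++, the same pattern can be tried with Python's re module. This is only a sketch: Python's engine is not the one Notepad++ uses, and the SAMPLE text is a shortened copy of the text in the question.

```python
import re

# Shortened copy of the sample text from the question.
SAMPLE = """\
-- Search for relevant Title, second half of the screen, under "Context Field Values" lists the main parts of the Flexfield
-- It's a test
-- So is this
SELECT fat.application_name
, fdfv.title
"""

# Optional leading whitespace, two dashes, then exactly one single quote on the line.
# re.MULTILINE makes ^ and $ match at every line boundary, not only at the string ends.
ONE_QUOTE = re.compile(r"^\s*--[^'\r\n]*'[^'\r\n]*$", re.MULTILINE)

matches = ONE_QUOTE.findall(SAMPLE)
print(matches)
```

Only the comment line containing the apostrophe is returned; the line with double quotes is not, because the character class excludes only the single quote.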

Ctrl+F
Find what: ^--.*'.*$
TICK Wrap around
SELECT Regular expression
UNTICK . matches newline
Find All in Current Document
Explanation:
^ # beginning of line
-- # 2 dashes
.* # 0 or more any character but newline
' # a single quote
.* # 0 or more any character but newline
$ # end of line
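The two patterns offered in this thread are not equivalent: ^--.*'.*$ accepts a line with one or more single quotes, while the character-class version accepts exactly one. A small Python sketch of the difference follows (Python's re is assumed here as a stand-in for the Notepad++ engine, and the test lines are made up for illustration):

```python
import re

# Hypothetical test lines, not taken from the question.
lines = [
    "-- It's a test",
    "-- 'quoted' text",
    "-- So is this",
    "SELECT 'x' FROM dual",
]

at_least_one = re.compile(r"^--.*'.*$")                    # one or more quotes
exactly_one = re.compile(r"^\s*--[^'\r\n]*'[^'\r\n]*$")    # exactly one quote

loose = [s for s in lines if at_least_one.search(s)]
strict = [s for s in lines if exactly_one.search(s)]

print(loose)
print(strict)
```

If comment lines with paired quotes should also be found, the shorter pattern is enough; if only lone apostrophes matter, the stricter one is the safer choice.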

How do I add a regular expression to sed?

action: 'blah' 'blah'
Need to remove anything that is after action: in a file
sed -i 's/action:\*//g' tes1
This does not do anything.
It appears you are confusing globbing with regular expressions. In your question you have:
Need to remove anything that is after action: in a file
sed -i 's/action:\*//g' tes1
This does not do anything.
Of course it doesn't if the input is "action: 'blah' 'blah'". The only thing that would be matched by your attempt would be a literal:
action:*
(As a shell glob, i.e. a wildcard, '*' matches any number of characters. As a regular expression it controls repetition of the preceding item, and the escaped '\*' in your command matches only a literal asterisk.)
What you want to do is match the "action:" portion and then match everything that follows it, replacing what follows with nothing. The general way to match any number of characters up to the end of the line is .*$, where '.' matches any character, the repetition '*' matches zero or more times, and '$' anchors the expression to the end of the line.
Put another way, you want to match "action:" at the beginning of the line, followed by any number of additional characters, and you want to replace all additional characters with nothing. To anchor action: to the beginning of the line you use the circumflex '^', e.g. "^action:" to ensure you only match "action:" and not "any action:".
Putting it all together, you could use:
sed -i 's/^action:.*$/action:/' tes1
In sum, this matches "action:" through to the end of the line and replaces the entire match with "action:", leaving you with your desired line containing only "action:".
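The difference between the two expressions can also be seen with Python's re.sub, used here only as a stand-in for sed (the sample text and variable names are assumptions for illustration):

```python
import re

# Hypothetical file content: the line from the question plus one unrelated line.
text = "action: 'blah' 'blah'\nother: keep me\n"

# Original attempt: \* is a literal asterisk, so only the text "action:*" would match.
unchanged = re.sub(r"action:\*", "", text)

# Anchored pattern from the answer: "action:" at the start of a line plus the rest of that line.
trimmed = re.sub(r"^action:.*$", "action:", text, flags=re.MULTILINE)

print(repr(unchanged))
print(repr(trimmed))
```

The first substitution leaves the text untouched, while the second keeps "action:" and drops everything after it on that line only.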

Teradata regular expressions, 0 or 1 spaces

In Teradata, I'm looking for one regular expression pattern that would allow me to find a pattern of some numbers, then a space or maybe no space, and then 'SF'. It should return 7 in both cases below:
SELECT
REGEXP_INSTR('12345 1000SF', pattern),
REGEXP_INSTR('12345 1000 SF', pattern)
Or, my actual goal is to extract the 1000 in both cases if there's an easier way, probably using REGEXP_SUBSTR. More details are below if you need them.
I have a column that contains free text and I would like to extract the square footage. But, in some cases, there is a space between the number and 'SF' and in some cases there is not:
'other stuff 1000 SF'
'other stuff 1000SF'
I am trying to use the REGEXP_INSTR function to find the starting position. Through Google, I have found the pattern for the first to be
'([0-9])+ SF'
When I try the pattern for the second, I try
'([0-9])+SF'
and I get the error
SELECT Failed. [2662] SUBSTR: string subscript out of bounds
I've also found answers to similar questions, but they don't work for Teradata. For example, I don't think you can use ? in Teradata.
The error message indicates you're using SUBSTR, not REGEXP_SUBSTR.
Try this:
RegExp_Substr(col, '[0-9]*(?= {0,1}SF)')
This finds a run of digits followed by an optional single blank and then SF, and extracts only the digits.
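The lookahead can be tried in Python's re module as a rough check. This is a sketch only: Teradata's regex engine may differ in details, and the third input string is made up to show an edge case.

```python
import re

samples = ["12345 1000SF", "12345 1000 SF"]

# Pattern from the answer: digits that are followed by an optional blank and "SF".
lookahead = re.compile(r"[0-9]*(?= {0,1}SF)")
extracted = [lookahead.search(s).group(0) for s in samples]

# Because * also allows zero digits, an "SF" with no number in front of it
# produces an empty match before the real one is reached.
edge_case = "SF then 1000 SF"
empty_hit = lookahead.search(edge_case).group(0)

# Using + instead of * requires at least one digit and skips that empty match.
stricter = re.search(r"[0-9]+(?= {0,1}SF)", edge_case).group(0)

print(extracted, repr(empty_hit), stricter)
```

If the free-text column can contain a bare "SF", the + variant is probably the safer choice.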
I would pattern it like this:
\b(\d+)\s*[Ss][Ff]\b
\b # word boundary
(\d+) # 1 or more digits (captured)
\s* # 0 or more white-space characters
[Ss] # character class
[Ff] # character class
\b # word boundary
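A minimal Python sketch of this pattern, returning both the 1-based start position (what REGEXP_INSTR is expected to report) and the captured digits. The helper name square_feet is hypothetical, and Python's re is assumed to behave like the Teradata engine for this simple pattern.

```python
import re

SQFT = re.compile(r"\b(\d+)\s*[Ss][Ff]\b")

def square_feet(text):
    """Return (1-based start position, digits) for the first match, or None."""
    m = SQFT.search(text)
    if m is None:
        return None
    return m.start() + 1, m.group(1)

inputs = ("12345 1000SF", "12345 1000 SF", "no footage here")
results = [square_feet(s) for s in inputs]
print(results)
```

Both sample strings from the question give position 7 and the value 1000, with or without the space before SF.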

Oracle SQL Reg Exp check email

I want to check if an email address fits a pattern:
-Only letters, numbers, and '.' or '_' symbols.
-The last part (ex: .com) must contain between 2 and 4 letters.
This is my Reg Exp: '[a-zA-Z0-9._]+#[a-zA-Z0-9._]+.[a-zA-Z]{2,4}'
The problem is that it accepts symbols like %, and .commmm is accepted as the last part. How could I solve it?
There are two main problems here:
You are using an unescaped . outside the character class that may match any symbol (but a newline)
You are not using anchors ^ and $, and thus you may match substring inside a larger string.
Use
'^[a-zA-Z0-9._]+#[a-zA-Z0-9._]+[.][a-zA-Z]{2,4}$'
 ^                             ^^^             ^
When you place a . into a pair of square brackets, you match a literal period.
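The effect of the two fixes can be checked with a short Python sketch. Python's re is assumed as a stand-in for Oracle's REGEXP_LIKE, the # separator is kept exactly as written in the question, and the candidate addresses are invented for the test.

```python
import re

# Pattern from the question: no anchors, unescaped dot before the last part.
unanchored = re.compile(r"[a-zA-Z0-9._]+#[a-zA-Z0-9._]+.[a-zA-Z]{2,4}")

# Corrected pattern: anchored at both ends, literal dot via [.].
anchored = re.compile(r"^[a-zA-Z0-9._]+#[a-zA-Z0-9._]+[.][a-zA-Z]{2,4}$")

# Hypothetical inputs: one valid, one with a forbidden symbol, one with a long suffix.
candidates = ["john.doe#example.com", "john%doe#example.com", "john#example.commmm"]

loose = [bool(unanchored.search(c)) for c in candidates]
strict = [bool(anchored.search(c)) for c in candidates]

print(loose)
print(strict)
```

Without anchors every candidate is accepted, because a valid-looking substring is enough; with anchors only the first one passes.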
I think you also need ^ and $ to specify the beginning and end of the string, with the dot before the last part escaped so that it matches only a literal period:
'^[a-zA-Z0-9._]+#[a-zA-Z0-9._]+\.[a-zA-Z]{2,4}$'
You might want to slightly adjust the rules so that the email and domain cannot start with a period:
'^\w[a-zA-Z0-9._]*#\w[a-zA-Z0-9._]*\.[a-zA-Z]{2,4}$'

How to prevent two succeeding spaces in an Antlr rule?

As a lexer rule I'd like to match a string according to these rules:
must not contain tabs (\t) or line breaks (\r, \n)
must not contain two succeeding spaces
can contain all other characters, including single spaces
I came up with:
STRING: ~[\t\r\n]*
But I don't know how to prevent succeeding spaces.
This will do it:
STRING:
(
~[\t\r\n ] // non-whitespace
| ' ' ~[\t\r\n ] // or single space followed by non-whitespace
)+
' '? // may optionally end in a space (if desired, remove the line otherwise)
;
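The same idea can be emulated with an ordinary regular expression to test which inputs the rule is meant to accept. This is only a Python approximation of the lexer rule, not ANTLR itself, and the helper name is_valid is hypothetical.

```python
import re

# One or more of: a non-whitespace character, or a single space followed by
# a non-whitespace character; then an optional trailing space.
STRING = re.compile(r"(?:[^\t\r\n ]| [^\t\r\n ])+ ?")

def is_valid(s):
    """True if the whole string satisfies the rule."""
    return STRING.fullmatch(s) is not None

inputs = ["a b c", "a b ", " a", "a  b", "a\tb", ""]
checks = {s: is_valid(s) for s in inputs}
print(checks)
```

Single spaces are accepted anywhere, including at either end, while two consecutive spaces, a tab, or an empty string are rejected.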

Fortran read statement reading beyond an end of line

Do you know if the following statement is guaranteed to be true by one of the Fortran 90/95/2003 standards?
"Suppose a read statement for a character variable is given a blank line (i.e., containing only white spaces and new line characters). If the format specifier is an asterisk (*), it continues to read the subsequent lines until a non-blank line is found. If the format specifier is '(A)', a blank string is assigned to the character variable."
For example, please look at the following minimal program and input file.
program code:
PROGRAM chk_read
INTEGER, PARAMETER :: MAXLEN=30
CHARACTER(len=MAXLEN) :: str1, str2
str1='minomonta'
read(*,*) str1
write(*,'(3A)') 'str1_start|', str1, '|str1_end'
str2='minomonta'
read(*,'(A)') str2
write(*,'(3A)') 'str2_start|', str2, '|str2_end'
END PROGRAM chk_read
input file:
----'input.dat' content is below this line----

yamanakako

kawaguchiko
----'input.dat' content is above this line----
Please note that there are four lines in 'input.dat' and the first and third lines are blank (contain only white spaces and new line characters). If I run the program as
$ ../chk_read < input.dat > output.dat
I get the following output
----'output.dat' content is below this line----
str1_start|yamanakako |str1_end
str2_start| |str2_end
----'output.dat' content is above this line----
The first read statement for the variable 'str1' seems to look at the first line of 'input.dat', find a blank line, move on to the second line, find the character value 'yamanakako', and store it in 'str1'.
In contrast, the second read statement for the variable 'str2' seems to be given the third line, which is blank, and store the blank line in 'str2', without moving on to the fourth line.
I tried compiling the program by Intel Fortran (ifort 12.0.4) and GNU Fortran (gfortran 4.5.0) and got the same result.
Some background on why I am asking: I am writing a subroutine to read a data file that uses a blank line as a separator between data blocks. I want to make sure that the blank line, and only the blank line, is thrown away while reading the data. I also need the code to be standard conforming and portable.
Thanks for your help.
From Fortran 2008 standard draft:
List-directed input/output allows data editing according to the type
of the list item instead of by a format specification. It also allows
data to be free-field, that is, separated by commas (or semicolons) or
blanks.
Then:
The characters in one or more list-directed records constitute a
sequence of values and value separators. The end of a record has the
same effect as a blank character, unless it is within a character
constant. Any sequence of two or more consecutive blanks is treated as
a single blank, unless it is within a character constant.
This implicitly states that in list-directed input, blank lines are treated as blanks until the next non-blank value.
When reading with a fmt='(A)' format descriptor, blank lines are read into str. In contrast, fmt=*, which implies list-directed I/O in free form, skips blank lines until it finds a non-blank character string. To test this, do something like:
PROGRAM chk_read
INTEGER :: cnt
INTEGER, PARAMETER :: MAXLEN=30
CHARACTER(len=MAXLEN) :: str
cnt=1
do
read(*,fmt='(A)',end=100)str
write(*,'(I1,3A)')cnt,' str_start|', str, '|str_end'
cnt=cnt+1
enddo
100 continue
END PROGRAM chk_read
$ cat input.dat
yamanakako
kawaguchiko
EOF
Running the program gives this output:
$ a.out < input.dat
1 str_start| |str_end
2 str_start| |str_end
3 str_start| |str_end
4 str_start|yamanakako |str_end
5 str_start| |str_end
6 str_start|kawaguchiko |str_end
On the other hand, if you use default input:
read(*,fmt=*,end=100)str
You end up with this output:
$ a.out < input.dat
1 str1_start|yamanakako |str1_end
2 str2_start|kawaguchiko |str2_end
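For readers who want the two behaviours side by side, here is a rough Python analogy. It is not Fortran and makes simplifying assumptions: it ignores commas, slashes, quoted strings and repeat counts, and the function names are hypothetical.

```python
def read_formatted_a(records, width=30):
    """Mimic read(*,'(A)'): every record is consumed, blank or not,
    and the value is blank-padded or truncated to the variable length."""
    return [rec.rstrip("\r\n").ljust(width)[:width] for rec in records]

def read_list_directed(records):
    """Mimic read(*,*) into one character variable per read: the end of a
    record acts like a blank, so blank records are skipped until a value
    is found; the rest of that record is then discarded."""
    values = []
    for rec in records:
        tokens = rec.split()
        if tokens:
            values.append(tokens[0])
    return values

# Four records, the first and third blank, as described in the question.
records = ["", "yamanakako", "   ", "kawaguchiko"]
formatted = read_formatted_a(records)
listed = read_list_directed(records)

print(formatted)
print(listed)
```

For the blank-line-as-separator use case in the question, this suggests reading each record with '(A)' and testing it for blankness explicitly, since list-directed input hides the separators.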
This part of the F2008 standard draft probably addresses your question:
10.10.3 List-directed input
7 When the next effective item is of type character, the input form
consists of a possibly delimited sequence of zero or more
rep-chars whose kind type parameter is implied by the kind of the
effective item. Character sequences may be continued from the end of
one record to the beginning of the next record, but the end of record
shall not occur between a doubled apostrophe in an
apostrophe-delimited character sequence, nor between a doubled quote
in a quote-delimited character sequence. The end of the record does
not cause a blank or any other character to become part of the
character sequence. The character sequence may be continued on as many
records as needed. The characters blank, comma, semicolon, and slash
may appear in default, ASCII, or ISO 10646 character sequences.