SQL Regular Expressions

I created the following SQL regex pattern for matching an ISBN:
CREATE RULE ISBN_Rule AS @value LIKE 'ISBN\x20(?=.{13}$)\d{1,5}([- ])\d{1,7}\1\d{1,6}\1(\d|X)$'
I used the following values as test data; however, the data is not being committed:
ISBN 0 93028 923 4 | ISBN 1-56389-668-0 | ISBN 1-56389-016-X
Where am I wrong?

You can do this using LIKE.
You'll need some ORs to deal with the different ISBN-10 and ISBN-13 formats.
For the above strings:
LIKE 'ISBN [0-9][ -][0-9][0-9][0-9][0-9][0-9][ -][0-9][0-9][0-9][ -][0-9X]'

The LIKE operator in SQL Server isn't a regex operator. You can do some complicated pattern matching, but it's not normal regex syntax.
http://msdn.microsoft.com/en-us/library/ms179859.aspx

SQL Server 2005 does not support regular expressions out of the box; you would need OLE Automation or a CLR assembly to provide that functionality through a UDF.
The only supported wildcards are % (any string of zero or more characters) and _ (any single character), plus character sets and ranges written with [] and negated with [^]. So your expression
'ISBN\x20(?=.{13}$)\d{1,5}([- ])\d{1,7}\1\d{1,6}\1(\d|X)$'
means something very weird: [- ] is treated as a character set, and everything else is matched literally.
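To make that concrete, here is a rough Python sketch that translates a LIKE pattern into a regex using only the wildcards listed above. The helper names like_to_regex and like_matches are made up for this illustration, and the translation is a simplification (no ESCAPE clause, no collation rules, case-sensitive), so treat it as a model of the behaviour rather than a reimplementation of SQL Server.

```python
import re


def like_to_regex(pattern):
    """Translate a LIKE pattern (%, _, [...], [^...]) into a compiled regex.

    Simplified model: anything that is not one of those wildcards is
    treated as a literal character, which is the point being illustrated.
    """
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        # A character set needs a closing bracket with at least one char inside.
        end = pattern.find("]", i + 2) if ch == "[" else -1
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        elif ch == "[" and end != -1:
            body = pattern[i + 1:end]
            negate = body.startswith("^") and len(body) > 1
            if negate:
                body = body[1:]
            # Escape the few characters that are special inside a regex set.
            body = "".join("\\" + c if c in "\\[^" else c for c in body)
            parts.append("[" + ("^" if negate else "") + body + "]")
            i = end
        else:
            parts.append(re.escape(ch))
        i += 1
    return re.compile("".join(parts), re.DOTALL)


def like_matches(pattern, value):
    """True if the whole value matches the LIKE pattern under this model."""
    return like_to_regex(pattern).fullmatch(value) is not None


SAMPLES = ["ISBN 0 93028 923 4", "ISBN 1-56389-668-0", "ISBN 1-56389-016-X"]
ANSWER_LIKE = "ISBN [0-9][ -][0-9][0-9][0-9][0-9][0-9][ -][0-9][0-9][0-9][ -][0-9X]"
ASKER_PATTERN = r"ISBN\x20(?=.{13}$)\d{1,5}([- ])\d{1,7}\1\d{1,6}\1(\d|X)$"

for sample in SAMPLES:
    print(sample, like_matches(ANSWER_LIKE, sample), like_matches(ASKER_PATTERN, sample))
```

Under this model the bracket-only pattern accepts all three sample values, while the regex-style pattern accepts none of them, because sequences such as \d and (?= are compared character by character.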

If it splits on | and doesn't strip whitespace, it's probably missing a space before ISBN and/or after (\d|X), just before the $. Also, I doubt this is the problem, but [- ] could be [ -].
edit: ok, well keep this in mind when you get a regex lib/control.
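For what it is worth, the pattern itself looks sound once a real regex engine is available. The following is a small sketch using Python's re module as a stand-in for whatever regex library ends up being used (other engines may differ in details). It splits the test data on | and shows the effect of stripping, or not stripping, the surrounding whitespace.

```python
import re

# The asker's pattern, unchanged, compiled with Python's regex engine.
ISBN_RE = re.compile(r"ISBN\x20(?=.{13}$)\d{1,5}([- ])\d{1,7}\1\d{1,6}\1(\d|X)$")

raw = "ISBN 0 93028 923 4 | ISBN 1-56389-668-0 | ISBN 1-56389-016-X"

# With whitespace stripped, each value is tested on its own.
stripped_results = {part.strip(): bool(ISBN_RE.match(part.strip()))
                    for part in raw.split("|")}

# Without stripping, the leftover spaces around each value break the match.
unstripped_results = [bool(ISBN_RE.match(part)) for part in raw.split("|")]

print(stripped_results)
print(unstripped_results)
```

The backreference \1 also forces the same separator throughout, so a value that mixes hyphens and spaces is rejected.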

Related

Get everything after a string pattern and before a ' ' in Databricks SQL

I got the following entry in my database with column name - properties_desc:
#Thu Sep 03 02:18:11 UTC 2020 cardType=MasterCard cardDebit=true cardUniqueNumber=f0b03da93bc70fbc194a5a4ef5879685
I want to trim the entry so I get: MasterCard
So basically, I want everything after 'cardType=' and before the next ' '.
I tried following "Get everything after and before certain character in SQL Server",
but this works for a special character and not a string.
My try:
SUBSTRING(properties_desc, length(SUBSTRING(properties_desc, 0, length(properties_desc) - CHARINDEX ('cardType=', properties_desc))) + 1,
length(properties_desc) - length(SUBSTRING(properties_desc, 0, length(properties_desc) - CHARINDEX ('cardType=', properties_desc))) - length(SUBSTRING(
properties_desc, CHARINDEX (' ', properties_desc), length(properties_desc))))
But the above query does not work. Any help is appreciated.
How can I solve it?

You have tagged this question as both sql-server and databricks. Based on your use of length() instead of len(), I assume that you are using databricks. In that case, you can make use of the regexp_extract() function.
Try: "regexp_extract(properties_desc, '(?<=cardType=)[^ ]*', 0)". The third argument selects which group to return; it defaults to 1, and since this pattern has no capture group, 0 (the whole match) is needed.
This is untested, as I am not a databricks programmer.
The "[^ ]*" in the above will match and extract a string of non-space characters after "cardType=". The "(?<=...)" is a "look-behind" construct that requires that the matched text be preceded by "cardType=", but does not include that text in the result. The end result is that the regex matches and extracts everything after "cardType=" up to the next space (or the end of the string).
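The look-behind pattern can be tried outside the database. This is a minimal sketch using Python's re module on the sample row from the question; extract_after is a hypothetical helper name, and Python's regex flavour is only assumed to agree with the Java flavour behind regexp_extract for a pattern this simple.

```python
import re

properties_desc = (
    "#Thu Sep 03 02:18:11 UTC 2020 cardType=MasterCard "
    "cardDebit=true cardUniqueNumber=f0b03da93bc70fbc194a5a4ef5879685"
)


def extract_after(key, text):
    """Return the run of non-space characters that follows key, or '' if absent."""
    # (?<=key) requires key just before the match without including it.
    match = re.search(r"(?<=" + re.escape(key) + r")[^ ]*", text)
    return match.group(0) if match else ""


print(extract_after("cardType=", properties_desc))
```

Because [^ ]* stops at a space or at the end of the string, the same helper also works for the last key on the line.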
Regular expressions are a pretty powerful string matching tool. Well worth learning if you are not already familiar with them. (I wish SQL Server had them.)

String manipulation with Replace in SQL

I am using a replace function to add some quotes around a couple of keywords.
However, this replacement doesn't work for a few cases like the one below.
See example below.
This is the query:
replace(replace(aa.SourceQuery,'sequence','"sequence"'),'timestamp','"timestamp"')
Before:
select timestamp, SparkTimeStamp
from SparkRecordCounts
After:
select "timestamp", Spark"timestamp"
from SparkRecordCounts
However, I want it to be like:
select "timestamp", Sparktimestamp
from SparkRecordCounts

EDIT I wrote this before knowing what RDBMS you were using but have left it in case it helps someone else.
I think you are looking for word boundaries in your replacement, which are generally a job for regular expressions.
Oracle has one built in, called regexp_replace, and you could use something like this:
regexp_replace(aa.SourceQuery, '(^|\s|\W)timestamp($|\s|\W)', '\1"timestamp"\2')
The regular expression looks at the start for:
^ - the start of the line OR
\s - a space character OR
\W - a non-word character
It then matches timestamp, and must end with:
$ - the end of the line OR
\s - a space character OR
\W - a non-word character
Then, and only then, does it perform the replace. \1 and \2 are used to preserve what word boundary matched at the beginning and ending of the word.
I'm not sure how other databases handle regexp_replace. It looks like MySQL can do it via a plugin, but there may not be a native method.
SQL Server has a solution to something similar.
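For anyone who wants to experiment with the idea outside Oracle, here is a sketch in Python. The first function ports the pattern above as-is; the second swaps in the \b word-boundary assertion, which is a different technique from the one in this answer but does not consume the neighbouring characters. Both function names are invented for the example, and case-insensitive matching is an assumption made to mirror the behaviour shown in the question.

```python
import re


def quote_keyword_ported(sql, keyword):
    """Port of the pattern above: capture and re-emit the surrounding characters."""
    pattern = r"(^|\s|\W)" + re.escape(keyword) + r"($|\s|\W)"
    return re.sub(
        pattern,
        lambda m: m.group(1) + '"' + keyword + '"' + m.group(2),
        sql,
        flags=re.IGNORECASE,
    )


def quote_keyword_boundary(sql, keyword):
    """Variant using \\b, a zero-width word boundary."""
    pattern = r"\b" + re.escape(keyword) + r"\b"
    return re.sub(pattern, lambda m: '"' + keyword + '"', sql, flags=re.IGNORECASE)


before = "select timestamp, SparkTimeStamp\nfrom SparkRecordCounts"
print(quote_keyword_ported(before, "timestamp"))
print(quote_keyword_boundary(before, "timestamp"))
```

Both leave SparkTimeStamp alone. They differ when two occurrences are separated by a single character: the ported pattern consumes that character as part of the first match, so the second occurrence is skipped, whereas the \b variant quotes both.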

howto cut text from specific character in sqlite query

SQLITE Query question:
I have a query which returns string with the character '#' in it.
I would like to remove all characters after this specific character '#':
select field from mytable;
result :
text#othertext
text2#othertext
text3#othertext
So in my sample I would like to create a query which only returns :
text
text2
text3
I tried something with instr() to get the index, but instr() was not recognized as a function: "SQL Error: no such function: instr" (probably because the database version is old; sqlite_version() returns 3.7.5).
Any hints on how to achieve this?

There are two approaches:
You can rtrim the string of all characters other than the # character.
This assumes, of course, that (a) there is only one # in the string; and (b) that you're dealing with simple strings (e.g. 7-bit ASCII) in which it is easy to list all the characters to be stripped.
You can use sqlite3_create_function to create your own rendition of INSTR. The specifics here will vary a bit upon how you're using
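Both approaches can be sketched from Python, whose sqlite3 module exposes the same mechanism as Connection.create_function. The function name my_instr, the in-memory database and the list of characters to strip are assumptions made for this example; the table and sample rows come from the question.

```python
import sqlite3


def my_instr(haystack, needle):
    """1-based position of needle in haystack, 0 if absent (stand-in for INSTR)."""
    if haystack is None or needle is None:
        return None
    return haystack.find(needle) + 1


conn = sqlite3.connect(":memory:")
conn.create_function("my_instr", 2, my_instr)  # register the user-defined function
conn.execute("CREATE TABLE mytable (field TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?)",
    [("text#othertext",), ("text2#othertext",), ("text3#othertext",)],
)

# Approach 1: rtrim everything except '#', then rtrim the '#' itself.
# Only safe when the trailing text uses characters from this known set.
STRIP_CHARS = "abcdefghijklmnopqrstuvwxyz0123456789"
rtrim_rows = [row[0] for row in conn.execute(
    "SELECT rtrim(rtrim(field, ?), '#') FROM mytable ORDER BY rowid",
    (STRIP_CHARS,),
)]

# Approach 2: locate '#' with the user-defined function and cut before it.
instr_rows = [row[0] for row in conn.execute(
    "SELECT substr(field, 1, my_instr(field, '#') - 1) FROM mytable ORDER BY rowid"
)]

print(rtrim_rows)
print(instr_rows)
```

The second query assumes every row contains a #; rows without one would need a CASE expression around the substr call.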

How can I perform a SQL SELECT with a LIKE condition for a string containing an open bracket character?

I have a simple search query:
<cfquery name="_qSearch" dbtype="Query">
SELECT
*
FROM MyQoQ
WHERE
DESCRIPTION LIKE '%#URL.searchString#%'
</cfquery>
This query works excellently for most values. However, if someone searches for a value like "xxx[en", it bombs with the error message "The pattern of the LIKE conditional is malformed."
Is there any way around this, since the bracket has a special use in CFQUERY?

QoQ shares a feature of TSQL (MS SQL Server) whereby it's not just % and _ that are wildcards in LIKE - it also supports regex-style character classes, as in [a-z] for any lowercase letter.
To escape these values and match the literal equivalents, you can use a character class itself, i.e. [[] will match a literal [, and of course you probably also want to escape any % and _ in the user input - you can do all three like so:
'%#Url.SearchString.replaceAll('[\[%_]','[$0]')#%'
That is just a simple regex replace (using String.replaceAll) to match all instances of [ or % or _ and wrap each one in [..] - the $0 on the replacement side represents the matched text.
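The same escaping step, sketched in Python for readers who want to check what the replacement produces. escape_like and like_pattern are illustrative names, not part of ColdFusion or any library.

```python
import re


def escape_like(user_input):
    """Wrap each [, % or _ in brackets so LIKE treats it as a literal."""
    return re.sub(r"[\[%_]", lambda m: "[" + m.group(0) + "]", user_input)


def like_pattern(user_input):
    """Build a 'contains' LIKE pattern from raw user input."""
    return "%" + escape_like(user_input) + "%"


print(like_pattern("xxx[en"))
print(like_pattern("50%_off"))
```

The search term from the question, xxx[en, becomes %xxx[[]en%, which is a well-formed pattern.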

How can I extract field names from SQL with Perl?

I have a series of select statements in a text file and I need to extract the field names from each select query. This would be easy if some of the fields didn't use nested functions like to_char() etc.
Given select statement fields that could have several nested parentheses like:
ltrim(rtrim(to_char(base_field_name, format))) renamed_field_name,
Or the simple case of just base_field_name as a field, what would the regex look like in Perl?

Don't try to write a regex parser (though perl regexes can handle nested patterns like that). Use SQL::Statement::Structure instead.

Why not ask the target database itself how it would interpret the queries?
In perl, one can use the DBI to query the prepared representation of a SQL query. Sometimes this is database-specific: some drivers (under the perl DBD:: namespace) support their RDBMS' idea of describing statements in ways analogous to the RDBMS' native C or C++ API.
It can be done generically, however, as the DBI will put the names of result columns in the statement handle attribute NAME. The following, for example, has a good chance of working on any DBI-supported RDBMS:
use strict;
use warnings;
use DBI;
use constant DSN => 'dbi:YouHaveNotToldUs:dbname=we_do_not_know';
# The ... below is a placeholder for your username and password.
my $dbh = DBI->connect(DSN, ..., { RaiseError => 1 });
my $sth;
while (<>) {
    next unless /^SELECT/i; # SELECTs only, assume whole query on one line
    chomp;
    my $sql = /\bWHERE\b/i ? "$_ AND 1=0" : "$_ WHERE 1=0"; # XXX ugly!
    eval {
        $sth = $dbh->prepare($sql); # some drivers don't know column names
        $sth->execute();            # until after a successful execute()
    };
    print($@), next if $@; # oops, problem with that one
    print join(', ', @{$sth->{NAME}}), "\n";
}
The XXX ugly! bit there tries to append an always-false condition on the SELECT, so that the SQL engine doesn't have to do any real work when you execute(). It's a terribly naive approach -- that /\bWHERE\b/i test is no more correctly identifying a SQL WHERE clause than simple regexes correctly parse out SELECT field names -- but it is likely to work.
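The same trick carries over to other database APIs. As a rough illustration only, here it is with Python's sqlite3 module, where cursor.description plays the role of the NAME attribute. The table some_table and its columns are invented for the example, and to_char is left out because SQLite has no such function.

```python
import re
import sqlite3


def column_names(conn, select_sql):
    """Return the result column names of a one-line SELECT without fetching rows."""
    # Same naive always-false condition as the Perl version above.
    if re.search(r"\bWHERE\b", select_sql, re.IGNORECASE):
        probe = select_sql + " AND 1=0"
    else:
        probe = select_sql + " WHERE 1=0"
    cursor = conn.execute(probe)
    return [column[0] for column in cursor.description]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE some_table (base_field_name TEXT, other_field INTEGER)")

sql = "SELECT ltrim(rtrim(base_field_name)) renamed_field_name, other_field FROM some_table"
names = column_names(conn, sql)
print(names)
```

The database resolves the nested function calls and the alias itself, so no parsing of the field list is needed.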

In a somewhat related problem at the office I used:
my @SqlKeyWordList = qw/select from where .../; # (1)
my @Candidates = split(/\s/, $SqlSelectQuery); # (2)
my %FieldHash; # (3)
for my $Word (@Candidates) {
    next if grep { lc($Word) eq $_ } @SqlKeyWordList; # skip SQL keywords
    $FieldHash{$Word}++;
}
Comments:
(1) @SqlKeyWordList contains all the SQL keywords that could appear in the SQL statement. We use MySQL; there are many SQL dialects, and choosing or building this list is work (look at my comments below!). If someone decided to use a keyword as a field name, you will need a regex after all (better to refactor the code).
(2) Split the SQL statement into a list of words. This is the trickiest part and WILL REQUIRE tweaking. For now it uses Perl's notion of "space" (= not in a word) to split. Splitting the field list (select a,b,c) and the "from" portion of the SQL separately might be advisable here, depending on your SQL statements.
(3) %FieldHash will contain one entry per select field (plus gunk, until you have validated your @SqlKeyWordList and the regex in (2)).
Beware
there is nothing in this code that could not be done in Python.
your life would be much easier if you can influence the creation of said SQL statements. (e.g. make sure each field is written to a comment)
there are so many things that can/will go wrong in this parsing approach, you really should sidestep the issue entirely, by changing the process (saves time in the long run).
this is the regex we use at the office
my @Candidates=split(/[\s
\(
\)
\+
\,
\*
\/
\-
\n
\
\=
\r
]+/,$SqlSelectQuery
);
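Since the point was made that none of this is Perl-specific, here is the same filter sketched in Python. The keyword set is a tiny illustrative sample rather than a real keyword list, the separator class is a simplified version of the regex above, and candidate_fields is a made-up name.

```python
import re

# Illustrative only: a real list would need every keyword of your SQL dialect.
SQL_KEYWORDS = {"select", "from", "where", "as", "and", "or"}

# Whitespace, parentheses, commas and arithmetic/comparison characters.
SEPARATORS = re.compile(r"[\s(),+*/=-]+")


def candidate_fields(sql):
    """Count every word of the statement that is not a known SQL keyword."""
    counts = {}
    for word in SEPARATORS.split(sql):
        if not word or word.lower() in SQL_KEYWORDS:
            continue
        counts[word] = counts.get(word, 0) + 1
    return counts


query = ("select ltrim(rtrim(to_char(base_field_name, format))) "
         "renamed_field_name from some_table")
print(sorted(candidate_fields(query)))
```

The output still contains function names, the format argument and the table name, which is the gunk mentioned in (3): the approach yields candidates, not a clean field list.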

How about splitting each line into terms (replace every parenthesis, comma and space with a newline), then sorting:
perl -ne's/[(), ]/\n/g; print' < textfile | sort -u
You'll end up with a lot of content like:
fieldname1
fieldname1
formatstring
ltrim
rtrim
to_char
