Regex for parsing SQL parameters - sql

If I have a query such as SELECT * from authors where name = #name_param, is there a regex to parse out the parameter names (specifically the "name_param")?
Thanks

This is tricky because params can also occur inside quoted strings.
SELECT * FROM authors WHERE name = #name_param
AND string = 'don\'t use #name_param';
How would the regular expression know to use the first #name_param but not the second?
It's a problem that can be solved, but it's not practical to do it in a single regular expression. I had to handle this in Zend_Db, and what I did was first strip out all quoted strings and delimited identifiers, and then you can use regular expressions on the remainder.
You can see the code, because it's open-source.
See functions _stripQuoted() and _parseParameters().
https://github.com/zendframework/zf1/blob/136735e776f520b081cd374012852cb88cef9a88/library/Zend/Db/Statement.php#L200
https://github.com/zendframework/zf1/blob/136735e776f520b081cd374012852cb88cef9a88/library/Zend/Db/Statement.php#L140
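The two-step idea described above (strip quoted text first, then look for parameters) can be sketched roughly as follows. This is not the Zend_Db code; the names `strip_quoted` and `parse_parameters` are made up for illustration, and the sketch assumes `#name` placeholders, backslash or doubled-quote escapes, and no SQL comments.

```python
import re

# Hypothetical sketch: matches quoted strings and delimited identifiers so
# they can be removed before searching for placeholders.
QUOTED = re.compile(
    r"'(?:\\.|''|[^'\\])*'"    # single-quoted string literal
    r'|"(?:\\.|""|[^"\\])*"'   # double-quoted identifier
    r"|`(?:``|[^`])*`",        # backtick-delimited identifier
    re.DOTALL,
)

# Assumed placeholder syntax: '#' followed by an identifier.
PARAM = re.compile(r"#([A-Za-z_][A-Za-z0-9_]*)")


def strip_quoted(sql):
    """Remove quoted strings and delimited identifiers from the SQL text."""
    return QUOTED.sub("", sql)


def parse_parameters(sql):
    """Return placeholder names found outside quoted text, in order."""
    return PARAM.findall(strip_quoted(sql))
```

With the example query from this answer, `parse_parameters` reports only the first `#name_param`, because the second one is removed together with the string literal that contains it.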

Given you have no quoted strings or comments with parameters in them, the required regex would be quite trivial:
#([_a-zA-Z]+) /* match group 1 contains the name only */
I go with Bill Karwin's recommendation to be cautious, knowing that the naïve approach has its pitfalls. But if you know the data you deal with, this regex would be all you need.
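To see what the naïve pattern does and does not do, here is a small Python sketch using the same regex. The function name `naive_parameters` is hypothetical.

```python
import re

# Same pattern as in the answer: '#' followed by letters or underscores.
NAIVE_PARAM = re.compile(r"#([_a-zA-Z]+)")


def naive_parameters(sql):
    """Return every '#name' occurrence, including those inside quotes."""
    return NAIVE_PARAM.findall(sql)
```

On the original query this returns just `name_param`. On a query that also has the placeholder inside a string literal it returns the name twice, and because the character class has no digits, a placeholder such as `#param2` is reported as `param`.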

Related

Splitting variable content in SQL

I have a variable in a stored procedure that contains a string of characters like
[Tag]MESSAGE[/Tag]
I need a way to get the MESSAGE part from within the tags.
Any help would be much appreciated
Note: I have tested it on Oracle RDBMS
A more reliable approach is to use REGEXP_REPLACE.
REGEXP_REPLACE(value, pattern)
Example
SELECT REGEXP_REPLACE(
'<Tag>Message</Tag>',
'\s*</?\w+((\s+\w+(\s*=\s*(".*?"|''.*?''|[^''">\s]+))?)+\s*|\s*)/?>\s*') FROM DUAL;
Just replace "<" with "[" if your tags are different
What you need is this:
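SELECT SUBSTRING(ColumnName,
       CHARINDEX('html_tag', ColumnName) + LEN('html_tag'),
       CHARINDEX('html_close_tag', ColumnName) - CHARINDEX('html_tag', ColumnName) - LEN('html_tag'))
FROM TableName
The third argument of SUBSTRING is a length, so it has to be the distance from the end of the opening tag to the start of the closing tag. You'll need to replace html_tag and html_close_tag with the opening and closing tags that you want to get rid of.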
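The position arithmetic behind that kind of substring call may be easier to follow outside SQL. A minimal Python sketch, assuming a single pair of tags; `between_tags` is a made-up helper name:

```python
def between_tags(text, open_tag, close_tag):
    """Return the text between the first open_tag and the next close_tag."""
    start = text.find(open_tag)
    if start == -1:
        return None
    start += len(open_tag)              # first character after the opening tag
    end = text.find(close_tag, start)   # where the closing tag begins
    if end == -1:
        return None
    return text[start:end]              # length is end - start
```

For the string in the question, `between_tags("[Tag]MESSAGE[/Tag]", "[Tag]", "[/Tag]")` gives `MESSAGE`.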
If the column contains only a single tag, a simple call to the substring function should be enough. Otherwise there will always be some point where a regular expression does not suffice, since you fall into a trap (see this legendary StackOverflow answer).

Regex not working in LIKE condition

I'm currently using Oracle SQL developer and am trying to write a query that will allow me to search for all fields that resemble a certain value but always differ from it.
SELECT last_name FROM employees WHERE last_name LIKE 'Do[^e]%';
So the result that I'm after would be: Give me all last names that start with 'Do' but are not 'Doe'.
I got the square brackets method from a general SQL basics book so I assume any SQL database should be able to run it.
This is my first post and I'd be happy to clarify if my question wasn't clear enough.
In Oracle's LIKE no regular expressions can be used. But you can use REGEXP_LIKE.
SELECT * FROM employees WHERE REGEXP_LIKE (last_name, '^Do[^e]');
The ^ at the beginning of the pattern anchors it to the beginning of the compared string. In other words the string must start with the pattern to match. And there is no wildcard needed at the end, as there is no anchor for the end of the string (which would be $). And you seem to already know the meaning of [^e].
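The behaviour of that pattern can be checked quickly in Python, whose `re.search` is likewise unanchored unless the pattern says otherwise. This is only an illustration of the regex itself, not of Oracle; the sample names are invented.

```python
import re

# Same pattern as in the REGEXP_LIKE call: starts with "Do", third character is not "e".
pattern = re.compile(r"^Do[^e]")


def matches(last_name):
    """True if the name starts with 'Do' followed by any character except 'e'."""
    return pattern.search(last_name) is not None


names = ["Doe", "Dole", "Donovan", "Do", "McDonald", "Doerr"]
selected = [n for n in names if matches(n)]
```

Here `selected` ends up as `['Dole', 'Donovan']`. Note that the pattern rejects any name whose third letter is "e" (such as "Doerr"), not only "Doe", and it also rejects the bare "Do" because `[^e]` needs a character to match.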

How to select values around .(dot) using sql

I am running the query below in Teradata:
sel requesttext from dbc.tables
where tablename='old_employee_table'
Result:
alter table DB_NAME.employee_table,no fallback ;
I want to get the result below using SQL:
DB_NAME.employee_table
Requesttext can be:
create set table DB_NAME.employee_table;
The DB name and table name can occur anywhere in the result. Since a . (dot) joins them, I want to split around the . (dot).
Basically I need SQL that returns the values on either side of the . (dot).
I want the DB name and table name in the result.
I'm not a Teradata person, but this should work for both strings given so far, as long as teradata's regexp_substr() supports positive look-behind and positive look-ahead assertions (I might have the Teradata syntax wrong, so a little tweaking may be needed):
SELECT REGEXP_SUBSTR(requesttext, '(?<= )(\w+\.\w+)(?=[,$]?)', 1, 1)
FROM dbc.tables
WHERE tablename='old_employee_table'
See the regex101 example. Hopefully it translates to Teradata easily.
The regex looks for and returns the words either side of and including the period, when preceded by a space, and followed by an optional comma or the end of the line.
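The same pattern can be tried in Python, whose `re` module supports these look-around assertions. This says nothing about what Teradata accepts; `qualified_name` is a hypothetical helper.

```python
import re

# Pattern copied from the answer above.
DOTTED = re.compile(r"(?<= )(\w+\.\w+)(?=[,$]?)")


def qualified_name(request_text):
    """Return the first 'word.word' token that follows a space, or None."""
    match = DOTTED.search(request_text)
    return match.group(1) if match else None
```

Both sample strings yield `DB_NAME.employee_table`. One observation from trying it: inside a character class `$` is a literal dollar sign, and the trailing `?` makes the look-ahead optional, so in Python the look-ahead never rejects a match. The `\w+\.\w+` part stops at the comma or semicolon on its own.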
You could do this with either regexp_substr() or strtok().
As Jamie Zawinski said:
Some people, when confronted with a problem, think "I know, I'll use
regular expressions." Now they have two problems.
So I would go with the strtok() method. Also I'm lazy and regular expressions are hard.
Function strtok() takes three arguments:
The string being split
The delimiter to split the string
The number of the token to grab.
To get at the <database>.<table> from that string that is returned in your query, we can split by a space, grab the third token, then split that by a comma and grab the first token.
That would look like:
SELECT strtok(strtok(requestText,' ',3),',',1)
FROM dbc.tables
WHERE tablename='old_employee_table'
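To make the nested call easier to follow, here is a rough Python stand-in for the tokenizing step. It assumes 1-based token numbers and that empty tokens are skipped; check the Teradata documentation for the exact rules. The function is a sketch written for this example, not Teradata's implementation.

```python
def strtok(text, delimiters, n):
    """Return the n-th (1-based) non-empty token of text, or None."""
    tokens, current = [], ""
    for ch in text:
        if ch in delimiters:
            if current:              # skip empty tokens between delimiters
                tokens.append(current)
            current = ""
        else:
            current += ch
    if current:
        tokens.append(current)
    return tokens[n - 1] if 1 <= n <= len(tokens) else None


alter_text = "alter table DB_NAME.employee_table,no fallback ;"
result = strtok(strtok(alter_text, " ", 3), ",", 1)
```

For the `alter table` text, the third space-separated token is `DB_NAME.employee_table,no`, and its first comma-separated token is `DB_NAME.employee_table`. The fixed position 3 does not carry over to the `create set table DB_NAME.employee_table;` form, where the third token is `table`.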

Return sql rows where field contains ONLY non-alphanumeric characters

I need to find out how many rows in my SQL Server table have a particular field that contains ONLY non-alphanumeric characters.
I'm thinking it's a regular expression that I need along the lines of [^a-zA-Z0-9] but I'm not sure of the exact syntax I need to return the rows if there are no valid alphanumeric chars in there.
SQL Server doesn't have regular expressions. It uses the LIKE pattern matching syntax which isn't the same.
As it happens, you are close. You just need leading and trailing wildcards, and to move the NOT:
WHERE whatever NOT LIKE '%[a-z0-9]%'
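The logic of that condition is "keep the row if no alphanumeric character appears anywhere in it". A Python sketch of the same test, assuming plain ASCII letters and digits and ignoring collation effects; the sample values are invented:

```python
import re

# Any single ASCII letter or digit, anywhere in the string.
ALNUM = re.compile(r"[A-Za-z0-9]")


def only_non_alphanumeric(value):
    """True if the value contains no ASCII letter or digit at all."""
    return ALNUM.search(value) is None


rows = ["!!!", "abc", "--a--", "#$%", "12-34"]
kept = [r for r in rows if only_non_alphanumeric(r)]
```

Here `kept` is `['!!!', '#$%']`: a single letter or digit anywhere in the value is enough to exclude the row.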
If you have short strings you should be able to create a few LIKE patterns ('[^a-zA-Z0-9]', '[^a-zA-Z0-9][^a-zA-Z0-9]', ...) to match strings of different lengths. Otherwise you should use a CLR user-defined function and a proper regular expression; see "Regular Expressions Make Pattern Matching And Data Extraction Easier".
This will not work correctly: e.g. abcÑxyz will pass through this as it has a, b, c... You need to work with COLLATE or check each byte.

Preg_replace solution for prepared statements

I have a command class that abstracts almost all specific database functions (we have exactly the same application running on MSSQL 2005 (using ODBC and the native mssql library), MySQL and Oracle). But sometimes we have problems with our prepare method which, when executed, replaces all placeholders with their respective values. The problem is that I am using the following:
if(is_array($Parameter['Value']))
{
$Statement = str_ireplace(':'.$Name, implode(', ', $this->Adapter->QuoteValue($Parameter['Value'])), $Statement);
}
else
{
$Statement = str_ireplace(':'.$Name, $this->Adapter->QuoteValue($Parameter['Value']), $Statement);
}
The problem arises when we have two or more similar parameter names, for example session_browser and session_browser_version: the first one will partially replace the last one.
Of course we learned to work around this by specifying the parameters in a specific order, but now that I have some "free" time I want to make it better, so I am thinking of switching to preg_replace. I am not good at regular expressions; can anyone help with a regex to replace a string like ':parameter_name'?
Best Regards,
Bruno B B Magalhaes
You should use the \b metacharacter to match the word boundary, so you don't accidentally match a short parameter name within a longer parameter name.
Also, you don't need to special-case arrays if you coerce a scalar Value to an array of one entry:
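// preg_quote() guards against regex metacharacters in the parameter name
$Statement = preg_replace('/:' . preg_quote($Name, '/') . '\b/',
implode(",", $this->Adapter->QuoteValue( (array) $Parameter['Value'] )),
$Statement);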
Note, however, that this can make false positive matches when an identifier or a string literal contains a pattern that looks like a parameter placeholder:
SELECT * FROM ":Name";
SELECT * FROM Table WHERE column = ':Name';
This gets even more complicated when quoted identifiers and string literals can contain escaped quotes.
SELECT * FROM Table WHERE column = 'word\':Name';
You might want to reconsider interpolating variables into SQL strings during prepare, because you're defeating any benefits of prepared statements with respect to security or performance.
I understand why you're doing what you're doing, because not every RDBMS back-end supports named parameters, and also SQL parameters can't be used for lists of values in an IN() predicate. But you're creating an awfully leaky abstraction.
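For illustration, the word-boundary replacement suggested in this answer behaves like this in a small Python sketch. The function `bind` and the sample statement are hypothetical, the values are assumed to be quoted already, and the false-positive cases inside quoted text remain unsolved here too.

```python
import re


def bind(statement, name, quoted_values):
    """Replace ':name' placeholders, but not longer names that start with it."""
    pattern = re.compile(":" + re.escape(name) + r"\b")
    replacement = ", ".join(quoted_values)
    # A function replacement keeps backslashes in the values from being
    # interpreted as escape sequences by re.sub.
    return pattern.sub(lambda _match: replacement, statement)


statement = ("SELECT * FROM sessions WHERE browser = :session_browser "
             "AND version = :session_browser_version")
bound = bind(statement, "session_browser", ["'Firefox'"])
```

After the call, `:session_browser` is replaced while `:session_browser_version` is left intact, because the underscore that follows counts as a word character and so there is no word boundary at that position.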