Teradata substring out of bounds - sql

I'm having issues figuring out the bounds for a substring. For example, from the string 063016_shape_tea_cleanse__emshptea1_ I want to extract emshptea1, but it also has to work for the string 063016_shape_tea_cleanse__emshptea1_TESTDATA_HERE.
Currently I have:
sel SUBSTR('063016_shape_tea_cleanse__emshptea1_',POSITION('__' IN '063016_shape_tea_cleanse__emshptea1_')+2,
POSITION('_' IN SUBSTR('063016_shape_tea_cleanse__emshptea1_',POSITION('__' IN '063016_shape_tea_cleanse__emshptea1_') + 2,CHARACTER_LENGTH('063016_shape_tea_cleanse__emshptea1_') - (POSITION('__' IN '063016_shape_tea_cleanse__emshptea1_') + 2)))-1)
But that errors out because it ends up trying to take a substring starting at position 27 with a length of -1.

You might use a regular expression. This will extract everything between __ and the following _ or the end of the string:
REGEXP_SUBSTR(col, '(?<=__).+?(?=(_|$))')
'(?<= )' is a look-behind, i.e. search for preceding characters without adding them to the result. Here: search for __
'.+' matches any character, one or multiple times. This would match until the end of the string ("greedy"), '?' ("lazy") prevents that.
'(?= )' is a look-ahead, i.e. search for following characters without adding it to the result.
( | ) The pipe splits an expression into multiple alternatives. Here: either an underscore character or the end of the string, $
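To try the pattern outside the database, here is a small Python sketch using the standard re module. It assumes Python's regex engine handles the look-behind, the lazy quantifier and the look-ahead the same way for this pattern; extract_token is a made-up helper name, not a Teradata function.

```python
import re

# Same pattern as in the REGEXP_SUBSTR call above.
PATTERN = re.compile(r"(?<=__).+?(?=(_|$))")


def extract_token(value):
    """Return the text between '__' and the next '_' (or end of string), else None."""
    match = PATTERN.search(value)
    return match.group(0) if match else None


print(extract_token("063016_shape_tea_cleanse__emshptea1_"))
print(extract_token("063016_shape_tea_cleanse__emshptea1_TESTDATA_HERE"))
```

Both calls give emshptea1, and a string without __ gives None instead of an error.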

Related

How to remove non-numeric characters (except full stop "." ) from a string in amazon redshift

I have been trying to figure out how to remove multiple non-numeric characters except full stop ("."), or return only the numeric characters with full stop (".") from a string. I've tried:
SELECT regexp_replace('~$$$1$$#1633,123.60&&!!__!', '[^0-9]+', '')
This query returns the following result: 1163312360
But I want the result to be 11633123.60
Please try this:
The regexp_replace expression below removes every character that is not ("^") a digit (range 0-9) or a full stop ("."):
SELECT regexp_replace('ABC$$$%%11633123.60','([^0-9.])','');
It returns the expected output "11633123.60"
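The same character class can be checked quickly in Python. This is only a sketch with the standard re module, assuming the negated class behaves the same as in Redshift's regexp_replace; strip_non_numeric is a hypothetical helper name.

```python
import re


def strip_non_numeric(value):
    """Remove every character that is neither a digit nor a full stop."""
    return re.sub(r"[^0-9.]", "", value)


print(strip_non_numeric("ABC$$$%%11633123.60"))
print(strip_non_numeric("~$$$1$$#1633,123.60&&!!__!"))
```

Both calls print 11633123.60. Note that a string with more than one full stop would keep all of them.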

Replace function doesn't work as expected

I'm having trouble figuring out why REPLACE() doesn't work correctly.
I'm getting a string formatted as:
RISHON_LEZION-CMTSDV4,Cable7/0/4/U1;RISHON_LEZION-CMTSDV4,Cable7/0/4/U2;RISHON_LEZION-CMTSDV4,Cable7/0/5/U0;.....
Up to 4000 characters.
Each ; marks the end of one entry (there can be up to about 15 in one string). I'm splitting it using REPLACE(): each occurrence of ; is replaced with $ + a line break + the entire string again + $ (I have another part that splits the string further down).
I think the length of the string is somehow affecting the result, though I have never heard that REPLACE has any limitation on the length of the string.
SELECT REPLACE(HOT_ALERTKEY_PK, ';', '$' || CHR(13) || CHR(10) || HOT_ALERTKEY_PK || '$')
from (SELECT 'RISHON_LEZION-CMTSDV4,Cable7/0/3/U0;RISHON_LEZION-CMTSDV4,Cable7/0/3/U1;RISHON_LEZION-CMTSDV4,Cable7/0/3/U2;RISHON_LEZION-CMTSDV4,Cable7/0/4/U0;RISHON_LEZION-CMTSDV4,Cable7/0/4/U1;RISHON_LEZION-CMTSDV4,Cable7/0/4/U2;RISHON_LEZION-CMTSDV4,Cable7/0/5/U0;RISHON_LEZION-CMTSDV4,Cable7/0/5/U1;RISHON_LEZION-CMTSDV4,Cable7/0/5/U2;RISHON_LEZION-CMTSDV4,Cable7/0/7/U0;RISHON_LEZION-CMTSDV4,Cable7/0/7/U1;RISHON_LEZION-CMTSDV4,Cable7/0/7/U2;RISHON_LEZION-CMTSDV4,Cable7/0/9/U0;RISHON_LEZION-CMTSDV4,Cable7/0/9/U1;RISHON_LEZION-CMTSDV4,Cable7/0/9/U2' as hot_alertkey_pk
FROM dual)
For some reason this splits the string correctly up to Cable7/0/5/U0; and then stops. If I remove one or more parts from the start of the string (each part runs up to a semicolon), the result extends to the following cables, according to how many parts I removed from the beginning.
Why is this happening?
Thanks in advance.
If you wrap your sample input string within to_clob() in the inner query, and you wrap the resulting string within length() in the outer query, you will find that the result is 8127 characters. This answers your question, but only partially.
I am not sure why replace doesn't throw an error, or perhaps just truncate the result at 4000 characters. I got exactly the same result as you did in Oracle 11.2, with the result chopped off after 3503 characters. I just looked quickly at the Oracle documentation for replace() and it doesn't say what the behavior should be if the input is VARCHAR2 but the output is more than 4000 characters. It looks as though it performed as many substitutions as it could and then it stopped (the next substitution would have gone above 4000 characters).
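The two lengths mentioned above can be reproduced with plain arithmetic. The Python sketch below rebuilds the sample string and then applies an assumed model of the observed behaviour (substitute only while the next replacement text still fits in 4000 characters). That model is a guess that happens to match the numbers, not documented Oracle behaviour.

```python
# Rebuild the 15-entry sample string from the question.
parts = [
    f"RISHON_LEZION-CMTSDV4,Cable7/0/{slot}/U{channel}"
    for slot in (3, 4, 5, 7, 9)
    for channel in (0, 1, 2)
]
source = ";".join(parts)

# Each ';' is replaced by '$' + CR + LF + the whole string + '$'.
replacement = "$" + "\r\n" + source + "$"
full_length = len(source.replace(";", replacement))

# Assumed model: keep substituting only while the next replacement
# text still fits within the 4000-character VARCHAR2 limit.
LIMIT = 4000
truncated = parts[0]
for part in parts[1:]:
    if len(truncated) + len(replacement) > LIMIT:
        break
    truncated += replacement + part

print(len(source), full_length, len(truncated))
```

This prints 539 8127 3503, and the truncated text ends at Cable7/0/5/U0, which is where the question says the output stops.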

Search an Oracle clob for special characters that are not escaped

Is it possible to run a query that searches an Oracle CLOB for any record that contains an ampersand character which is not part of one of the following escape codes (or possibly any escape code):
& - &amp;
< - &lt;
> - &gt;
" - &quot;
' - &apos;
I want to extract 5 characters before the ampersand and 5 characters after the ampersand so I can see the actual value.
Basically I want to search for any record that contains those characters and replace them with the escape code.
At the moment I am doing something like this:
Select * from articles
where dbms_lob.instr(article_summary , '&amp' ) = 0 and dbms_lob.instr(article_summary , '&' ) > 0
Update
If I were to use a regular expression, how would I specify it if I want to retrieve all fields where the value is & followed by any character other than 'a'?
You can use DBMS_XMLGEN.CONVERT for this. The second parameter is optional and, if left out, will escape the XML special characters.
select DBMS_XMLGEN.CONVERT(article_summary)
from articles;
But if article_summary contains a mixture of escaped and unescaped characters, this will give the wrong result. The easiest way to solve that is to unescape the characters first and then escape them.
select DBMS_XMLGEN.CONVERT(
DBMS_XMLGEN.CONVERT(article_summary,1) --1 as parameter does unescaping
)
from articles;
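To illustrate the idea outside Oracle, here is a Python sketch using only the standard library. It shows one way to list ampersands that are not part of the five escape codes (with 5 characters of context on each side) and the unescape-then-escape round trip. The function names are made up, and html.escape is only a stand-in for DBMS_XMLGEN.CONVERT: it writes the apostrophe as &#x27; rather than &apos;.

```python
import html
import re

# '&' that is NOT the start of one of the five XML escape codes.
BARE_AMPERSAND = re.compile(r"&(?!(?:amp|lt|gt|quot|apos);)")


def find_unescaped_ampersands(text, context=5):
    """Return each bare '&' with up to `context` characters on either side."""
    return [
        text[max(0, m.start() - context): m.end() + context]
        for m in BARE_AMPERSAND.finditer(text)
    ]


def normalize_entities(text):
    """Unescape first, then escape, so already escaped text is not escaped twice."""
    return html.escape(html.unescape(text))


print(find_unescaped_ampersands("fish & chips &amp; peas"))
print(normalize_entities("Tom &amp; Jerry & co <b>"))
```

The first call returns ['fish & chip'] and the second returns Tom &amp;amp; Jerry &amp;amp; co &amp;lt;b&amp;gt; when viewed as raw text, i.e. every special character is escaped exactly once.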

Parse string to get final end result

I'm trying to parse this string 'Smith, Joe M_16282' to get everything before the comma, combined with everything after the underscore.
The resulting string would be: Smith16282
string longName = "Smith, Joe M_16282";
string shortName = longName.Substring(0, longName.IndexOf(",")) + longName.Substring(longName.LastIndexOf("_") + 1);
Notes:
The second "substring" doesn't need a length parameter, because we want everything after the underscore
The LastIndexOf is used instead of IndexOf in case there are other underscores appearing in the name such as "Smith_Jones, Joe M_16282"
This code assumes that there is at least one comma and at least one underscore in the string "longName." If not, the code fails. I will leave that checking to you if you need it.
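For the checking that is left out above, one possible shape is sketched below in Python rather than C#. The helper name short_name and the choice to return None on malformed input are assumptions, not part of the original answer.

```python
def short_name(long_name):
    """Text before the first comma plus text after the last underscore.

    Returns None when the comma is missing or no underscore follows it.
    """
    comma = long_name.find(",")
    underscore = long_name.rfind("_")
    if comma == -1 or underscore < comma:
        return None
    return long_name[:comma] + long_name[underscore + 1:]


print(short_name("Smith, Joe M_16282"))
print(short_name("Smith_Jones, Joe M_16282"))
print(short_name("Smith Joe"))
```

The first two calls give Smith16282 and Smith_Jones16282; the last one gives None instead of raising.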
As others have said, the simple approach for parsing a string like that would be to use the String's various parsing methods, such as IndexOf and SubString. If you want something more powerful and flexible, you may also want to consider using a RegEx replacement. For instance, you could do something like this:
Dim input As String = "Smith, Joe M_16282"
Dim pattern As String = "(.*?),.*?_(.*)"
Dim replacement As String = "$1$2"
Dim output As String = Regex.Replace(input, pattern, replacement)
Or, more simply:
Dim output As String = Regex.Replace("Smith, Joe M_16282", "(.*?),.*?_(.*)", "$1$2")
Here's the meaning of the pattern:
(.*?) - The first group capturing all of the characters before the comma
( - Starts the capturing group
. - This is a wildcard which matches any character
* - Specifies that the previous thing (any character) is repeated any number of times
? - Specifies that the * is non-greedy, meaning it won't match everything until the end of the string--it will only match until it finds the following comma
) - Ends the capturing group
, - The comma to look for
.*? - Says that there will be any number of any characters between the comma and the underscore which we don't care about
. - Any character
* - Any number of times
? - Until you find the underscore
_ - The underscore to look for
(.*) - The second group capturing all of the characters after the underscore
( - Starts the capturing group
. - Any character
* - Any number of times
) - Ends the capturing group
Here's the meaning of the replacement:
$1 - The value of all of the characters found in the first capturing group
$2 - The value of all of the characters found in the second capturing group
RegEx may be overkill for your particular situation, but it is a very handy tool to learn. One major advantage is that you could move the pattern and replacement values out into external settings in the app.config, or somewhere. Then, you could modify the replacement rules without recompiling your application.
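The same pattern can be tried in Python, where the replacement groups are written \1\2 instead of $1$2. This is a sketch with the standard re module, and regex_short_name is a hypothetical name. One difference from the LastIndexOf approach is worth seeing: the lazy .*? stops at the first underscore after the comma, not the last one.

```python
import re

PATTERN = re.compile(r"(.*?),.*?_(.*)")


def regex_short_name(long_name):
    """Join capture group 1 (before the comma) and group 2 (after the underscore)."""
    return PATTERN.sub(r"\1\2", long_name)


print(regex_short_name("Smith, Joe M_16282"))
print(regex_short_name("Smith, Joe_M_16282"))
```

The first call gives Smith16282. The second gives SmithM_16282, because the second group starts right after the first underscore that follows the comma.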

List of special characters for SQL LIKE clause

What is the complete list of all special characters for a SQL (I'm interested in SQL Server but others would be good too) LIKE clause?
E.g.
SELECT Name FROM Person WHERE Name LIKE '%Jon%'
SQL Server:
%
_
[specifier] E.g. [a-z]
[^specifier]
ESCAPE clause E.g. '%30!%%' ESCAPE '!' will evaluate 30% as true
' characters need to be escaped with ' E.g. they're becomes they''re
MySQL:
% - Any string of zero or more characters.
_ - Any single character
ESCAPE clause E.g. '%30!%%' ESCAPE '!' will evaluate 30% as true
Oracle:
% - Any string of zero or more characters.
_ - Any single character
ESCAPE clause E.g. '%30!%%' ESCAPE '!' will evaluate 30% as true
Sybase
%
_
[specifier] E.g. [a-z]
[^specifier]
Progress:
% - Any string of zero or more characters.
_ - Any single character
Reference Guide here [PDF]
PostgreSQL:
% - Any string of zero or more characters.
_ - Any single character
ESCAPE clause E.g. '%30!%%' ESCAPE '!' will evaluate 30% as true
ANSI SQL92:
%
_
An ESCAPE character only if specified.
PostgreSQL also has the SIMILAR TO operator which adds the following:
[specifier]
[^specifier]
| - either of two alternatives
* - repetition of the previous item zero or more times.
+ - repetition of the previous item one or more times.
() - group items together
The idea is to make this a community Wiki that can become a "One stop shop" for this.
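To see the ESCAPE clause from the lists above in action, here is a small Python sketch that runs the '%30!%%' ESCAPE '!' example against an in-memory SQLite database. SQLite is used only because it ships with Python; it is not one of the databases listed, and the table and rows are invented for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (val TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?)",
    [("30% off",), ("300 items",), ("only 30%",)],
)

# With ESCAPE '!', the sequence !% means a literal percent sign.
escaped = [
    row[0]
    for row in conn.execute(
        "SELECT val FROM t WHERE val LIKE '%30!%%' ESCAPE '!' ORDER BY val"
    )
]

# Without the escape, every % is a wildcard, so '300 items' matches too.
unescaped = [
    row[0]
    for row in conn.execute("SELECT val FROM t WHERE val LIKE '%30%%' ORDER BY val")
]

print(escaped)
print(unescaped)
```

The escaped query returns only the two rows that contain a literal 30%, while the unescaped one returns all three.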
For SQL Server, from http://msdn.microsoft.com/en-us/library/ms179859.aspx :
% Any string of zero or more characters.
WHERE title LIKE '%computer%' finds all book titles with the word 'computer' anywhere in the book title.
_ Any single character.
WHERE au_fname LIKE '_ean' finds all four-letter first names that end with ean (Dean, Sean, and so on).
[ ] Any single character within the specified range ([a-f]) or set ([abcdef]).
WHERE au_lname LIKE '[C-P]arsen' finds author last names ending with arsen and starting with any single character between C and P, for example Carsen, Larsen, Karsen, and so on. In range searches, the characters included in the range may vary depending on the sorting rules of the collation.
[^] Any single character not within the specified range ([^a-f]) or set ([^abcdef]).
WHERE au_lname LIKE 'de[^l]%' all author last names starting with de and where the following letter is not l.
ANSI SQL92:
%
_
an ESCAPE character only if specified.
It is disappointing that many databases do not stick to the standard rules and add extra characters, or incorrectly enable ESCAPE with a default value of ‘\’ when it is missing. Like we don't already have enough trouble with ‘\’!
It's impossible to write DBMS-independent code here, because you don't know what characters you're going to have to escape, and the standard says you can't escape things that don't need to be escaped. (See section 8.5/General Rules/3.a.ii.)
Thank you SQL! gnnn
You should add that you have to add an extra ' to escape an existing ' in SQL Server:
smith's -> smith''s
Sybase :
% : Matches any string of zero or more characters.
_ : Matches a single character.
[specifier] : Brackets enclose ranges or sets, such as [a-f]
or [abcdef]. Specifier can take two forms:
rangespec1-rangespec2:
rangespec1 indicates the start of a range of characters.
- is a special character, indicating a range.
rangespec2 indicates the end of a range of characters.
set:
can be composed of any discrete set of values, in any
order, such as [a2bR]. The range [a-f], and the
sets [abcdef] and [fcbdae] return the same
set of values.
Specifiers are case-sensitive.
[^specifier] : A caret (^) preceding a specifier indicates
non-inclusion. [^a-f] means "not in the range
a-f"; [^a2bR] means "not a, 2, b, or R."
Potential answer for SQL Server
Interesting: I just ran a test using LINQPad against SQL Server, which should just be running LINQ to SQL underneath, and it generates the following SQL statement.
Records
.Where(r => r.Name.Contains("lkjwer--_~[]"))
-- Region Parameters
DECLARE @p0 VarChar(1000) = '%lkjwer--~_~~~[]%'
-- EndRegion
SELECT [t0].[ID], [t0].[Name]
FROM [RECORDS] AS [t0]
WHERE [t0].[Name] LIKE @p0 ESCAPE '~'
So I haven't tested it yet, but it looks like the ESCAPE '~' clause may allow for automatic escaping of a string for use within a LIKE expression.
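The escaping visible in that generated parameter can be mimicked with a few lines of Python. This is a sketch inferred from the single example above: the set of escaped characters (%, _, [ and the escape character itself) is an assumption read off the output, and escape_like is a hypothetical helper, not a LINQ to SQL API.

```python
def escape_like(value, escape_char="~"):
    """Prefix LIKE wildcards and the escape character itself with escape_char."""
    special = {"%", "_", "[", escape_char}
    return "".join(escape_char + ch if ch in special else ch for ch in value)


# Wrap in % on both sides, as a Contains() style search would.
pattern = "%" + escape_like("lkjwer--_~[]") + "%"
print(pattern)
```

This prints %lkjwer--~_~~~[]%, the same value as the generated parameter. The pattern would still need to be passed with ESCAPE '~' in the LIKE clause.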