Determine if substring corresponds to specific code (character types) in SQL - sql

I have a collection of strings and want to filter out those where the last four characters are: (alpha)(alpha)(number)(number).
I know I can take a substring of each of these and check the characters separately, but what is the method to determine the types of the characters in the sequence?
This is for SQL in Hive.

You can use regular expressions. Something like:
where col regexp '[a-zA-Z]{2}[0-9]{2}$'
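To check the pattern outside Hive, here is a minimal Python sketch that applies the same regular expression to a few made-up strings. The function name and the sample values are illustrative assumptions, not part of the original question.

```python
import re

# Same pattern as the Hive filter: two ASCII letters, then two digits, at the end.
PATTERN = re.compile(r"[a-zA-Z]{2}[0-9]{2}$")

def ends_with_code(value: str) -> bool:
    """Return True when the last four characters are letter, letter, digit, digit."""
    return PATTERN.search(value) is not None

# Hypothetical sample values, chosen to cover matching and non-matching endings.
samples = ["orderAB12", "order1234", "xy99", "AB1", "ab12 "]
for s in samples:
    print(repr(s), ends_with_code(s))
```

The condition as written keeps the matching rows; if the goal is to exclude them instead, negating the condition in the WHERE clause should do it.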

Related

How do I remove a character from strings of different lengths with sql? Intersystems cache sql

I have a column of strings that have an '&' at the beginning and end of each one, which I need to remove for a Crystal report I'm creating. I'm writing the SQL code outside of Crystal, using Intersystems Cache SQL. Below is an example:
&This& This
&is& is
&What& what
&it& I
&looks& need
&like& it
&now& to
look
like
Any suggestions would be greatly appreciated!!!
Assuming the ampersands are always the first and last characters, here is a starting point. Use a combination of SUBSTR (or SUBSTRING, if using stream data) and LENGTH, like so:
SELECT SUBSTR(column, 2, LENGTH(column) - 2) FROM table
This returns the substring that starts at the 2nd character of the original string (the first argument to SUBSTR) and runs for the length of the original string less 2 characters (i.e. less the two ampersands).
If you need to include trailing blanks and/or the string-termination character, you may need to use a different variation of the LENGTH function. See these resources for details on the functions and their variants:
https://docs.intersystems.com/irislatest/csp/docbook/DocBook.UI.Page.cls?KEY=RSQL_substr
https://cedocs.intersystems.com/latest/csp/docbook/DocBook.UI.Page.cls?KEY=RSQL_length
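As a rough way to try the SUBSTR/LENGTH idea without a Caché instance, the sketch below runs the same expression against an in-memory SQLite database from Python. SQLite is only a stand-in here, and the table name `words` and column name `raw` are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for Caché; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (raw TEXT)")
conn.executemany(
    "INSERT INTO words VALUES (?)",
    [("&This&",), ("&is&",), ("&What&",)],
)

# Start at character 2 and keep LENGTH - 2 characters,
# which drops the first and the last character of each value.
rows = conn.execute(
    "SELECT substr(raw, 2, length(raw) - 2) FROM words ORDER BY rowid"
).fetchall()
stripped = [r[0] for r in rows]
print(stripped)
```

This only holds when every value really does start and end with an ampersand; values without them would lose their first and last characters.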
Here's a Crystal formula that does the same:
ExtractString({YourData},"&","&")

How to use Regular Expressions to replace part of a string in SQLite?

I would like some advice on how to find and replace part of a string using regular expressions in SQLite. I am using RStudio/R as the SQLite connector.
I have the following strings:
my_strings
--------------
1244599arts
3490872testing
4478933great
2342340obvious
gremlin2342678
I would like to replace the numbers with the word "final". I want to use regular expressions for this so that I capture only the numbers and replace them with the word "final" without affecting any other part of the string.
The output I would like to achieve is the following:
my_strings
--------------
finalarts
finaltesting
finalgreat
finalobvious
gremlinfinal
As you can see the numbers have now been replaced by the word "final" - please note that I have around 8 million rows so I cannot just repeat a REPLACE function as there are simply too many numbers!
I have written some regex to capture those numbers and the following statement will match those numbers:
[0-9]{7}
Here is an example of how the above matches those numbers
Now I would like to use this regex statement to amend these strings - the reason is that I would like to learn how to use regex in sqlite to find and replace matching parts of a string.
Has anyone got any advice?
For reference, I can use the REGEXP function, as I have already created a SQLite instance in R.
You can use the sqlean-regexp extension, which provides regular expressions search and replace functions:
-- replace 7 digits with the word 'final'
update t set my_strings = regexp_replace(my_strings, '[0-9]{7}', 'final');
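If loading the extension is not an option, one possible workaround is to register a function under the same name from the host language. The sketch below does this from Python's `sqlite3` module rather than R, purely as an illustration. The `regexp_replace` defined here is a hypothetical stand-in built on Python's `re.sub`, not the extension's implementation, and it reuses the table and column names from the statement above.

```python
import re
import sqlite3

def regexp_replace(source, pattern, replacement):
    """Stand-in taking the same three arguments as the call in the UPDATE statement."""
    if source is None:
        return None
    return re.sub(pattern, replacement, source)

conn = sqlite3.connect(":memory:")
# Expose the Python function to SQL as a 3-argument scalar function.
conn.create_function("regexp_replace", 3, regexp_replace)

conn.execute("CREATE TABLE t (my_strings TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?)",
    [("1244599arts",), ("3490872testing",), ("gremlin2342678",)],
)

# Replace each run of exactly 7 digits with the word 'final'.
conn.execute(
    "UPDATE t SET my_strings = regexp_replace(my_strings, '[0-9]{7}', 'final')"
)
result = [row[0] for row in conn.execute("SELECT my_strings FROM t ORDER BY rowid")]
print(result)
```

A function registered this way is called once per row from the host process, so on millions of rows it is likely to be slower than a compiled extension.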

How to use BETWEEN Operator with Text Value in SQL?

How do I use the BETWEEN operator with text values? For example, what is the right syntax to select all products whose ProductName ends with any letter BETWEEN 'C' and 'M'?
Most SQL dialects provide the RIGHT() function. This allows you to do:
WHERE RIGHT(TextValue, 1) BETWEEN 'C' AND 'M'
If your database doesn't have this function, you can do something similar with the built-in functions. Also, the exact comparison might depend on the collation of the column/table/database/server. Sometimes comparisons are independent of case and sometimes they are dependent on case.
In case you are interested in an alternative method (which does work with the w3schools SQL editor), you can also use the LIKE operator:
WHERE ProductName LIKE '%[c-m]'
This returns all product names ending in any character between C and M.
In this case, the LIKE operator is using two wildcard characters:
1. % matches any string of zero or more characters.
2. [c-m] matches any single character within the specified range (e.g. [a-f]) or set (e.g. [abcdef]).
You can find more information about the LIKE operator here:
https://msdn.microsoft.com/en-us/library/ms179859.aspx
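The comparison both answers rely on can be sketched in Python to see which names would qualify. This assumes a case-insensitive comparison, which in a real database depends on the collation. The function name and product names below are illustrative only.

```python
def ends_between(name: str, low: str = "C", high: str = "M") -> bool:
    """True when the last character falls in [low, high], ignoring case."""
    if not name:
        return False
    return low.upper() <= name[-1].upper() <= high.upper()

# Hypothetical product names used only to exercise the check.
products = ["Chai", "Tofu", "Ikura", "Konbu", "Chang"]
print([p for p in products if ends_between(p)])
```

Both endpoints are included, which mirrors BETWEEN being inclusive.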

Regular expression filter

I have this regular expression in my sql query
DECLARE @RETURN_VALUE VARCHAR(MAX)
IF @value LIKE '%[0-9]%[^A-Z]%[0-9]%'
BEGIN
SET @RETURN_VALUE = NULL
END
I am not sure why, but whenever a row contains 12 TEST it gives me the value 12, whereas a three-digit number gets filtered out. How can I modify the regular expression so that it returns three-digit numbers too?
Any help will be appreciated.
SQL doesn't have regular expressions: it has SQL wildcard expressions, which are much simpler. For instance, there is no way to specify alternation (a|b) or repetition (a*, a+, a?, a{m,n}) such as you might find in a regular expression.
The 'like expression' that you have
LIKE '%[0-9]%[^A-Z]%[0-9]%'
will match any string containing the following pattern anywhere in the string
zero or more of any character, followed by...
a single decimal digit, followed by...
zero or more of any character, followed by...
a single character other than A–Z (whether it's case sensitive or not depends on the collating sequence in use), followed by...
zero or more of any character, followed by...
a single decimal digit, followed by...
zero or more of any character
One should note that the % is likely to match perhaps more than you might like.
Have you tried ([0-9]*)? I believe this will capture every digit for you, although I am not that strong at regex. It worked when I ran it through Rubular, which, by the way, is a great way to test regular expressions.
You can easily create a SQL CLR function and use this in your queries. Visual Studio has a project template for this and makes deploying the functions a snap.
Here is more information from Microsoft about how to create the function and how to use it (for boolean matches and for data extraction).
First of all, note that this is not really a "regular expression", it's a SQL-specific form of wildcard matching. You are very limited in what you can accomplish with SQL wildcards. As one example, you cannot "optionally" match a specific character or character set.
Your expression, as you've written it, will match any value that contains two digits with at least one non-letter character in between them, meaning it will match:
111
1^1
1?7
1AAAAAAAAAAA?AAAAAAAAA1
-----------------------5-----------------3-------
And infinitely more items of a similar structure.
Oddly, one string that would not match this pattern is "12 TEST" because there is no character between the 1 and 2. The pattern also won't "give you" the value of 12 back because it's not a parsing expression, just a matching expression: it returns 1 (true) or 0 (false).
There is clearly something else going on in your application, possibly even an actual regular expression, but it has nothing to do with the SQL you've included here.
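To see why "12 TEST" and a three-digit value behave differently, the LIKE pattern can be approximated with a regular expression and run against a few strings. This is only a sketch. It assumes a case-insensitive collation, so that [^A-Z] excludes lowercase letters too, and the function name is made up for the example.

```python
import re

# Rough regex equivalent of LIKE '%[0-9]%[^A-Z]%[0-9]%':
# a digit, anything, one non-letter character, anything, another digit.
# Assumption: case-insensitive collation, so [^A-Z] also excludes a-z.
LIKE_EQUIVALENT = re.compile(r"[0-9].*[^A-Za-z].*[0-9]", re.DOTALL)

def like_matches(value: str) -> bool:
    """Approximate whether the LIKE condition would be true for this value."""
    return LIKE_EQUIVALENT.search(value) is not None

for value in ["12 TEST", "123 TEST", "111", "1?7", "1A7"]:
    print(repr(value), like_matches(value))
```

A digit counts as a non-letter, so three digits in a row satisfy the pattern by themselves. That is consistent with the three-digit values taking the branch that sets the return value to NULL, while "12 TEST" does not.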

Return sql rows where field contains ONLY non-alphanumeric characters

I need to find out how many rows in my SQL Server table have a particular field that contains ONLY non-alphanumeric characters.
I'm thinking it's a regular expression that I need, along the lines of [^a-zA-Z0-9], but I'm not sure of the exact syntax needed to return the rows that have no valid alphanumeric characters in them.
SQL Server doesn't have regular expressions. It uses the LIKE pattern matching syntax which isn't the same.
As it happens, you are close. Just need leading+trailing wildcards and move the NOT
WHERE whatever NOT LIKE '%[a-z0-9]%'
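The logic of that condition, "no alphanumeric character anywhere in the value", can be sketched in Python as follows. It assumes a case-insensitive comparison and only ASCII letters and digits. How accented letters are treated in SQL Server depends on the collation, which this sketch does not model. The function name is illustrative.

```python
import re

def only_non_alphanumeric(value: str) -> bool:
    """Mirror of NOT LIKE '%[a-z0-9]%', assuming a case-insensitive collation."""
    # True when no ASCII letter or digit appears anywhere in the value.
    return re.search(r"[A-Za-z0-9]", value) is None

for value in ["!!!", "--", "a-", "#1", ""]:
    print(repr(value), only_non_alphanumeric(value))
```

Note that an empty string also satisfies the condition, so it may need to be excluded separately if empty values should not be counted.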
If you have short strings you should be able to create a few LIKE patterns ('[^a-zA-Z0-9]', '[^a-zA-Z0-9][^a-zA-Z0-9]', ...) to match strings of different length. Otherwise you should use CLR user defined function and a proper regular expression - Regular Expressions Make Pattern Matching And Data Extraction Easier.
This will not work correctly, e.g. abcÑxyz will pass thru this as it has a,b,c... you need to work with Collate or check each byte.