Numbers vs Non-numbers in String validation - sql

I would like to ask for advice on records testing.
At this point, I have an account field that must consist of numbers only. Nevertheless, this is a varchar field because of the leading zeroes.
I had this query that actually shows me non-digits in the account number (or null).
LTRIM(TRANSLATE(ACCOUNT,'0123456789',' '),' ') INVALID_DATA
Nevertheless, I just faced another issue- spaces are not taken into account, and therefore if account has a space, it goes unnoticed as null. Yes, I can replace space with something more recognizable, but will it be enough? I am sure there are other exceptions I don't know about.
Is there any universal way to check for ANY POSSIBLE variation that is not a number? maybe something like this? How reliable is it?
LENGTH(ACCOUNT)-LENGTH(TO_NUMBER(REGEXP_REPLACE((ACCOUNT), '[^[:digit:]]+', ''))) NON-NUMBERS
Also, how to detect and account for cases with non-Unicode characters?

I would just use a regexp_replace approach to find non-numbers in a string:
regexp_replace(account, '\d+')
Explanation
-The escape character, \d, is metacharacter for a digit character.
-The + symbol is a quantifier indicating one or more instances of this digit.
Thus, we are removing all digits from the account column and where this is non-null,
you have non-number left.
~~~~~~~~~~~~
With respect to your calculation:
LENGTH(ACCOUNT)-LENGTH(TO_NUMBER(REGEXP_REPLACE((ACCOUNT), '[^[:digit:]]+', ''))) NON-NUMBERS
Your reg expression looks for non-digits and removes them. Your approach will work (I do not like converting the string to a number then dynamically casting as a string). If you use this in a condition, it would need to be encapsulated in a NVL function like this:
NVL(LENGTH(ACCOUNT)-LENGTH(REGEXP_REPLACE(ACCOUNT, '[^[:digit:]]+', '')),0) NON-NUMBERS

I think you have the right idea:
LTRIM(TRANSLATE(ACCOUNT, ' 0123456789', 'X'), ' ') as INVALID_DATA
Should return a non-empty string when there is a space.
In a where clause you would use:
where LENGTH( LTRIM(TRANSLATE(ACCOUNT, ' 0123456789', 'X'), ' ')) > 0

You can replace all spaces like this:
SELECT REPLACE(fld_or_variable, ' ', '')
Or you can trim spaces like this
SELECT LTRIM(RTRIM(' Amit Tech Corp '))

Related

split text if contains certain string postgres

exit_reason
sr_inefficient_management
tech_too_complex
company_member_resignation
sr_product_engagement
sr_contractual_reasons
sr_contractual_reasons-expectation_issues
sr_churn-takeover_business
I would like to split the column if the value contains the string "sr_" and keep the rest as it is. If the column contains "-" such as "sr_contractual_reasons-expectation_issues", I only want to keep it as "contractual reasons".
So far, my idea is to use
case when exit_reason like '%inefficient_management%' then 'inefficient management'
but if there are many different values, I am in trouble.
Expected output
exit_reason column
tech too complex
company member resignation
product engagement
contractual reasons
contractual reasons
churn
You can just replace 'sr_'
replace(exit_reason, 'sr_', '')
It is unlikely that 'sr_' would appear in any of the reasons. But you can use regexp_replace() to be sure:
regexp_replace(exit_reason, '^sr_', '')
You can try something like it:
REPLACE(
CASE
WHEN exit_reason LIKE '%-%'
THEN split_part(exit_reason,'-',2)
WHEN exit_reason LIKE 'sr_%'
THEN split_part(exit_reason,'sr_',2)
ELSE exit_reason
END
, '_', ' '
)
This code first checks if 'exist_reason' has a hyphen, then if it has 'sr_' and replaces all underscores with blanks.
To also remove the suffix, you could use:
SELECT replace(
regexp_replace(
'sr_contractual_reasons-expectation_issues',
'^(sr_)?([^-]*).*$',
'\2'
),
'_',
' '
);
replace
═════════════════════
contractual reasons
(1 row)
The regular expression matches an optional leading sr_, then all characters until the first -, then anything that follows that, and keeps only the middle part. replace then replaces underscores with spaces.

TRIM doesn't remove inner whitespaces

I try to remove all whitespaces inside a string. For this, I use TRIM() function. Unfortunately it doesn't work as expected, inner whitespaces (between 35 and 'A') remain untouched:
select TRIM('Hopkins 35 A Street') as Street
Column type is nvarchar. The funny thing is that this function works fine (using example from above) when executed on W3Schools (TRIM function example): https://www.w3schools.com/sql/func_sqlserver_trim.asp.
I can use replace on this string and replace ' ' into '' without a problem. I work on SQL Server 18.7.1 (2020)
if you use TRIM like this you are only removing leading and trailing spaces from a string. To remove also spaces in between you should change to:
select TRIM(' ' FROM 'Hopkins 35 A Street') as Street
UPDATE: if you meant to remove all spaces you should use
SELECT REPLACE('Hopkins 35 A Street', ' ', '')
TRIM is only intended to make a double space become a single one
"Trimming" means the removal of whitespace from the start and/or the end of a string value. It never means (and has never meant to mean) the removal of whitespace within a string value (enclosed by non-whitespace characters).
You can indeed use the TRIM function with a FROM in its argument to specify other characters than whitespace to trim. In that case, the TRIM function will remove the specified characters from the start and the end of the string, but not within the string (enclosed by other characters).
In other words: the specified characters will be treated as if they were whitespace as well, but specifying them so will not affect the trimming behavior/algorithm itself.
Check out the sample on Microsoft Docs:
SELECT TRIM( '.,! ' FROM ' # test .') AS Result;
produces this result: # test
TRIM function will only remove only the leading and tailing spaces in the data. It cannot remove all the spaces in the data. I mean it cannot remove all the spaces if there are any spaces in the data like 'Hello World'. TRIM cannot remove the space between the word Hello and World and make it look like 'HelloWorld'. If you want to remove all the spaces, you can use the REPLACE function. In the REPLACE function you can replace the space with any character/number/symbol. If you don't need any you can simply remove the space with ''. like
SELECT REPLACE('Hopkins 35 A Street', ' ', '')

Get only Number

How to ignore special characters and get only number with the below input as string.
Input: '33-01-616-000'
Output should be 3301616000
Use the REPLACE() function to remove the - characters.
REPLACE(columnname, '-', '')
Or if there can be other non-numeric characters, you can use REGEXP_REPLACE() to remove anything that isn't a number.
REGEXP_REPLACE(columnname, '\D', '')
Standard string functions (like REPLACE, TRANSLATE etc.) are often much faster (one order of magnitude faster) than their regular expression counterparts. Of course, this is only important if you have a lot of data to process, and/or if you don't have that much data but you must process it very frequently.
Here is one way to use TRANSLATE for this problem even if you don't know ahead of time what other characters there may be in the string - besides digits:
TRANSLATE(columnname, '0123456789' || columnname, '0123456789')
This will map 0 to 0, 1 to 1, etc. - and all other characters in the input string columnname to nothing (so they will be simply removed). Note that in the TRANSLATE mapping, only the first occurrence of a character in the second argument matters - any additional mapping (due to the appearance of the same character in the second argument more than once) is ignored.
You can also use REGEXP_REPLACE function. Try code below,
SELECT REGEXP_REPLACE('33-01-61ASDF6-0**(98)00[],./123', '([^[:digit:]])', NULL)
FROM DUAL;
SELECT regexp_replace('33-01-616-000','[^0-9]') digits_only FROM dual;
/

White spaces in sql

that will sounds stupid, but I have a table with names, those names may finish with white space or may not. E.g. I have name ' dummy ', but even if in the query I write only ' dummy' it will find the record ' dummy '. Can I fix it somehow?
SELECT *
FROM MYTABLE where NAME=' dummy'
Thanks
This is how SQL works (except Oracle), when you compare two strings the shorter one will be padded with blanks to the length of th 2nd string.
If you really need to consider trailings blanks you can switch to LIKE which doesn't follow that rule:
SELECT *
FROM MYTABLE where NAME LIKE ' dummy'
Of course, you better clean your data during load.
There's only one thing which is worse than trailing spaces, leading spaces (oh, wait a minute, you got them, too).

Regular expressions in SQL

Im curious if and how you can use regular expressions to find white space in SQL statments.
I have a string that can have an unlimited amount of white space after the actual string.
For example:
"STRING "
"STRING "
would match, but
"STRING A"
"STRINGB"
would not.
Right now I have:
like 'STRING%'
which doesnt quite return the results I would like.
I am using Sql Server 2008.
A simple like can find any string with spaces at the end:
where col1 like '% '
To also allow tabs, carriage returns or line feeds:
where col1 like '%[ ' + char(9) + char(10) + char(13) + ']'
Per your comment, to find "string" followed by any number of whitespace:
where rtrim(col1) = 'string'
You could try
where len(col1) <> len(rtrim(col1))
Andomar's answer will find the strings for you, but my spidey sense tells me maybe the scope of the problem is bigger than simply finding the whitespace.
If, as I suspect, you are finding the whitespace so that you can then clean it up, a simple
UPDATE Table1
SET col1 = RTRIM(col1)
will remove any trailing whitespace from the column.
Or RTRIM(LTRIM(col1)) to remove both leading and trailing whitespace.
Or REPLACE(col1,' '.'') to remove all whitespace including spaces within the string.
Note that RTRIM and LTRIM only work on spaces, so to remove tabs/CRs/LFs you would have to use REPLACE. To remove those only from the leading/trailing portion of the string is feasible but not entirely simple. Bug your database vendor to implement the ANSI SQL 99 standard TRIM function that would make this much easier.
where len(col1 + 'x') <> len(rtrim(col1)) + 1
BOL provides workarounds for LEN() with trailing spaces : http://msdn.microsoft.com/en-us/library/ms190329.aspx
LEN(Column + '_') - 1
or using DATALENGTH