Decode does not pick up first character - sql

Why does this fail for the first character in Oracle SQL?
select DECODE( TRANSLATE('1','123',' '), NULL, 'number','contains char') from dual
This works because 1 is the second digit
select DECODE( TRANSLATE('1','4123',' '), NULL, 'number','contains char') from dual
But this fails because 4 is the first digit
select DECODE( TRANSLATE('4','423',' '), NULL, 'number','contains char') from dual

First, let's take a look at the definition of the TRANSLATE function:
TRANSLATE(expr, from_string, to_string): TRANSLATE returns expr with all
occurrences of each character in from_string replaced by its corresponding
character in to_string. Characters in expr that are not in from_string are not replaced.
If expr is a character string, then you must enclose it in single quotation marks.
The argument from_string can contain more characters than to_string. In this case,
the extra characters at the end of from_string have no corresponding characters
in to_string. If these extra characters appear in expr, then they are removed
from the return value.
For example, in TRANSLATE(some_string,'123','abc'), 1 is replaced by a, 2 by b, and 3 by c (from here on I will write -> instead of "is replaced by").
Now let's take a look at your examples:
TRANSLATE('1','123',' '): 1 -> " ", 2 -> nothing, 3 -> nothing.
(nothing means the character is removed from the return value, see the definition)
The result of the above function is a string consisting of a single space: " "
TRANSLATE('1','4123',' '): 4 -> " ", 1 -> nothing, 2 -> nothing, 3 -> nothing
The result of the above function is the empty string "". Oracle Database interprets the empty string as null, so TRANSLATE returns NULL here and DECODE matches it against NULL.
TRANSLATE('4','423',' '): 4 -> " ", 2 -> nothing, 3 -> nothing
The result of the above function is a single space, as in the first example.
That is why you are getting "contains char" in the first and third queries, and "number" in the second one.
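The mapping rules above can be sketched in Python. This is only a model of the documented behaviour, not Oracle itself: oracle_translate and classify are hypothetical helper names, and Python's None stands in for NULL. The last two calls show one possible fix, which is my assumption and not part of the original queries: put a throwaway character (here a space) first in from_string, so that it is the one paired with the single character of to_string and every digit is removed.

```python
from typing import Optional


def oracle_translate(expr: Optional[str], from_string: Optional[str],
                     to_string: Optional[str]) -> Optional[str]:
    """Model of Oracle TRANSLATE, including its NULL / empty-string rules."""
    # Oracle treats '' as NULL, and any NULL argument makes the result NULL.
    if not expr or not from_string or not to_string:
        return None
    mapping = {}
    for i, ch in enumerate(from_string):
        if ch in mapping:
            continue  # the first occurrence in from_string wins
        # Characters past the end of to_string are mapped to "removed".
        mapping[ch] = to_string[i] if i < len(to_string) else ""
    result = "".join(mapping.get(ch, ch) for ch in expr)
    return result or None  # an empty result is NULL in Oracle


def classify(value: str, from_string: str, to_string: str = " ") -> str:
    """Model of DECODE(TRANSLATE(...), NULL, 'number', 'contains char')."""
    translated = oracle_translate(value, from_string, to_string)
    return "number" if translated is None else "contains char"


print(classify("1", "123"))    # contains char (1 -> " ")
print(classify("1", "4123"))   # number        (1 is removed)
print(classify("4", "423"))    # contains char (4 -> " ")

# Assumed fix: the leading space absorbs the to_string character.
print(classify("4", " 0123456789"))   # number
print(classify("4a", " 0123456789"))  # contains char
```

In Oracle itself the corresponding expression would be TRANSLATE(value, ' 0123456789', ' '), which follows the same rule the other answers use for removing characters.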

REGEXP_REPLACE for spark.sql()

I need to write a REGEXP_REPLACE query for a spark.sql() job.
If the value follows the pattern below, only the text before the first hyphen should be extracted and assigned to the target column 'name'; if the pattern doesn't match, the entire 'name' should be returned.
Pattern:
Values should be hyphen-delimited. Any characters can be present before the first hyphen (numbers, letters, special characters, or even spaces).
The first hyphen should be followed by exactly 2 words, separated by a hyphen (each word may contain only numbers and letters; special characters and blanks are not allowed).
The two words should be followed by one or more digits, followed by a hyphen.
The last portion should be one or more digits only.
For example:
if name = abc45-dsg5-gfdvh6-9890-7685, output of REGEXP_REPLACE = abc45
if name = abc, output of REGEXP_REPLACE = abc
if name = abc-gf5-dfg5-asd5-98-00, output of REGEXP_REPLACE = abc-gf5-dfg5-asd5-98-00
I have
spark.sql("SELECT REGEXP_REPLACE(name , '-[^-]+-\\w{2}-\\d+-\\d+$','',1,1,'i') AS name").show();
But it does not work.
Use
^([^-]*)(-[a-zA-Z0-9]+){2}-[0-9]+-[0-9]+$
Replace with $1 (Spark SQL uses Java regular expressions, where $1 refers to the first capture group). If $1 does not work, use \1. If \1 does not work, use \\1.
EXPLANATION
--------------------------------------------------------------------------------
^               the beginning of the string
(               group and capture to \1:
  [^-]*           any character except '-' (0 or more times, matching the most amount possible)
)               end of \1
(               group and capture to \2 (2 times):
  -               '-'
  [a-zA-Z0-9]+    any character of 'a' to 'z', 'A' to 'Z', '0' to '9' (1 or more times, matching the most amount possible)
){2}            end of \2 (NOTE: because you are using a quantifier on this capture, only the LAST repetition of the captured pattern will be stored in \2)
-               '-'
[0-9]+          any character of '0' to '9' (1 or more times, matching the most amount possible)
-               '-'
[0-9]+          any character of '0' to '9' (1 or more times, matching the most amount possible)
$               before an optional \n, and the end of the string
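To check the pattern against the three sample values from the question, here is a small Python sketch. It uses Python's re module rather than Spark, so the backreference is written \1 instead of $1, and extract_name is a hypothetical helper name. Because the pattern is anchored with ^ and $, a value that does not match is returned unchanged.

```python
import re

# Same pattern as in the answer above.
PATTERN = re.compile(r"^([^-]*)(-[a-zA-Z0-9]+){2}-[0-9]+-[0-9]+$")


def extract_name(name: str) -> str:
    # Replace the whole match with capture group 1;
    # input that does not match the pattern comes back as-is.
    return PATTERN.sub(r"\1", name)


for name in ["abc45-dsg5-gfdvh6-9890-7685", "abc", "abc-gf5-dfg5-asd5-98-00"]:
    print(name, "->", extract_name(name))
# abc45-dsg5-gfdvh6-9890-7685 -> abc45
# abc -> abc
# abc-gf5-dfg5-asd5-98-00 -> abc-gf5-dfg5-asd5-98-00
```

The third value is left alone because asd5 sits where the pattern requires digits only.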

Oracle SQL Regular Expression RegExp_SubStr: End Of Line (chr(10)) in search text returns null

I am trying to get a specific part of a text with an Oracle regular expression. If there is no end-of-line character (chr(10)) in the text, I get what I want. But if there is an end-of-line character, it returns null. The sample SQL is below.
SELECT RegExp_SubStr('TEXT_BEGIN Line 1 Line 2 TEXT_END',
'TEXT_BEGIN(.+)TEXT_END', 1, 1, NULL, 1)
FROM dual;
returns
Line 1 Line 2
With an end-of-line character:
SELECT RegExp_SubStr('TEXT_BEGIN Line 1' || chr(10) || 'Line 2 TEXT_END',
'TEXT_BEGIN(.+)TEXT_END', 1, 1, NULL, 1)
FROM dual;
returns
NULL
One solution might be to convert the end-of-line characters in the text to a special marker such as ## CHR10 ## before SubStr and then convert them back afterwards. But I want a simple solution without a hack.
By default, the dot does not match the newline character.
The fifth argument to regexp_substr (which you have as NULL in your example) is used for a few modifiers. One of them is 'n' to allow the dot to match newline.
So - it's really easy: change the fifth argument from NULL to 'n'.
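The same default exists in most regex engines, so the effect is easy to reproduce outside the database. The sketch below uses Python's re module as a stand-in; re.DOTALL plays the role that the 'n' modifier plays in RegExp_SubStr. The variable names are only for illustration.

```python
import re

text = "TEXT_BEGIN Line 1" + chr(10) + "Line 2 TEXT_END"
pattern = r"TEXT_BEGIN(.+)TEXT_END"

# Default: '.' stops at the newline, so there is no match at all.
no_flag = re.search(pattern, text)

# With DOTALL, '.' also matches the newline, like 'n' in Oracle.
with_flag = re.search(pattern, text, flags=re.DOTALL)

print(no_flag)                   # None
print(repr(with_flag.group(1)))  # ' Line 1\nLine 2 '
```

In the Oracle query from the question, the equivalent change is RegExp_SubStr(..., 'TEXT_BEGIN(.+)TEXT_END', 1, 1, 'n', 1).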

Why is the output NULL for select translate(' #',' ','') from dual; and why is the result # for select replace(' #',' ','') from dual;

Basically, TRANSLATE changes character to character and REPLACE changes string to string. Here I have tried to remove spaces using TRANSLATE in order to count the number of words.
select translate(' #',' ','') from dual;
select replace(' #',' ','') from dual;
select ename , nvl(length(replace(TRANSLATE(upper(trim(ename)),'ABCDEFGHIJKLMNOPQRSTUVWXYZ'' ',' # '),' ',''))+1,1) NOOFWORDs
from emp;
Unfortunately Oracle has made many bizarre choices around null vs. empty string.
One of those has to do with TRANSLATE. TRANSLATE will return NULL if any of its arguments (including the last one) is NULL, no matter what the logical behavior should be.
So, to remove spaces (say) with TRANSLATE, you must add a character you do NOT want to be removed to both the second and the third argument. I added the lower-case letter z, but you could add anything (a dot, the digit 0, whatever - just make sure you add the same character at the beginning of both arguments)
... translate (input_string, 'z ', 'z') ....
For example:
select translate(' #','z ','z') from dual;
TRANSLATE('#','Z','Z')
------------------------
#
select translate(' #',' ','') from dual;
Returns NULL because in Oracle empty strings unfortunately yield NULLs. Therefore it's equivalent to
SELECT translate(' #', ' ', NULL)
FROM dual;
and translate() returns NULL when an argument is null. Actually this is well documented in "TRANSLATE":
(...)
You cannot use an empty string for to_string to remove all characters in from_string from the return value. Oracle Database interprets the empty string as null, and if this function has a null argument, then it returns null.
If you want to replace one character, use replace() as you already did. For a handful of characters you can nest several replace() calls.
This gets unwieldy, however, when you want to replace quite a lot of characters. In such a situation, if the replacement is a single character or the empty string, regexp_replace() with a character class or alternation may come in handy.
For example
SELECT regexp_replace('a12b478c01', '[0-9]', '')
FROM dual;
replaces all the digits so just 'abc' remains and
SELECT regexp_replace('ABcc1233', 'c|3', '')
FROM dual;
removes any '3' or 'c' and results in 'AB12'. In your example
SELECT regexp_replace(' #', ' ', '')
FROM dual;
would also work and give you '#'. Though in the simple case of your example a simple replace() is enough.
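The three patterns can be tried quickly outside the database. This is a Python sketch using re.sub as a stand-in for regexp_replace() with an empty replacement; regexp_remove is a hypothetical helper name, and for patterns this simple the two engines are assumed to behave the same.

```python
import re


def regexp_remove(text: str, pattern: str) -> str:
    # Delete every match of the pattern, like regexp_replace(text, pattern, '').
    return re.sub(pattern, "", text)


print(regexp_remove("a12b478c01", r"[0-9]"))  # abc
print(regexp_remove("ABcc1233", r"c|3"))      # AB12
print(regexp_remove(" #", r" "))              # #
```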

Oracle replace a character not followed by another character

I am attempting to replace all of the &'s in a string with &amp unless the & is followed by lt, apos, gt or quot.
Running this statement
select
regexp_replace('&lt &apos &gt &quot &','&(^lt|^gt|^quot|^apos)','&amp')
however results in no changes to the string.
The output I would be looking for is
'&lt &apos &gt &quot &amp'
A direct and efficient solution (but difficult to write, read and maintain) is:
set define off
(in case you are using a front-end that uses & to mark substitution variables)
then
with
inputs ( inp_str ) as (
select '&lt &apos &gt &quot &' from dual union all
select 'Hello, World!' from dual union all
select '' from dual union all
select '7 &lt 10 &and &&quot' from dual
)
select inp_str,
regexp_replace(inp_str,
'&($|[^lagq]|(g|l)([^t]|$)|a($|[^p]|p($|[^o]|o($|[^s])))|q($|[^u]|u($|[^o]|o($|[^t]))))',
'&amp\1') as new_str
from inputs;
Explanation: (partial...) This will replace every & with &amp, with a few exceptions. The & will be replaced if:
It is followed by the end of the string ($), or
It is followed by any character other than l, a, g or q; or
It is followed by g or l, which is then followed by a character other than t, or by the end of the string ($); or
It is followed by a, followed by the end of the string, by any character other than p, or by the letter p followed by the end of the string, or .........
Output (from my inputs):
INP_STR                      NEW_STR
---------------------------- ----------------------------
&lt &apos &gt &quot &        &lt &apos &gt &quot &amp
Hello, World!                Hello, World!

7 &lt 10 &and &&quot         7 &lt 10 &ampand &amp&quot

4 rows selected.
(Note: I always include an empty string and a string with no ampersands among the inputs, to verify that the query works correctly on them too.)
These codes look much like HTML entity names, but the ending semi-colons are missing... making it less clear where a name ends.
In the following solution I assume that an & which is immediately followed by a letter, a digit or an underscore starts one of these entities.
Such an & is considered part of an entity and is not touched. Only the other & characters are replaced.
select regexp_replace('&lt &apos &gt &quot &', '&(\W|$)', '&amp\1') from dual;
The \W|$ matches either with a character that is not a letter, digit or underscore, or with the end of the string.
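The shorter pattern can be tried on the same sample inputs with a small Python sketch. It uses re.sub as a stand-in for regexp_replace, and escape_bare_ampersands is a hypothetical helper name; \W and \1 are assumed to mean the same thing in both engines for these ASCII inputs.

```python
import re


def escape_bare_ampersands(text: str) -> str:
    # An '&' followed by a non-word character, or by the end of the
    # string, is treated as "bare" and becomes '&amp'. The character
    # captured in group 1 is put back after the replacement.
    return re.sub(r"&(\W|$)", r"&amp\1", text)


print(escape_bare_ampersands("&lt &apos &gt &quot &"))
# &lt &apos &gt &quot &amp
print(escape_bare_ampersands("Hello, World!"))
# Hello, World!
print(escape_bare_ampersands("7 &lt 10 &and &&quot"))
# 7 &lt 10 &and &amp&quot
```

Note the difference on the last input: because &and is followed by a letter, this pattern leaves it alone, whereas the longer pattern that lists lt, gt, apos and quot explicitly turns it into &ampand.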

How to replace character in SQL

I want to replace the character at position 4 of a string in SQL Server.
I know about REPLACE and CASE WHEN, but my problem is that I only want to replace the character at the 4th position.
I am trying something like
SELECT REPLACE(_NAME,0,1) AS exp FROM _EMPLOYEE
but it does not check the 4th character.
For example, if _NAME contains IMR002001, then the result should be IMR012001.
Use stuff():
select stuff(_NAME, 4, 1, '#')
This replaces the substring starting at position 4 with length 1 with the string that is the fourth argument. The string can be longer or shorter than the string being replaced.
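For your example:
select stuff(_NAME, 4, 1, '1')
Note that positions are 1-based, so this turns IMR002001 into IMR102001. The IMR012001 shown in the question changes the fifth character; to get exactly that, use stuff(_NAME, 5, 1, '1').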
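To see how the start position decides which character is overwritten, here is a small Python model of STUFF. It is a sketch only: the stuff function below is a hypothetical helper that assumes start and length fall inside the string, and it does not model what SQL Server does with out-of-range arguments.

```python
def stuff(text: str, start: int, length: int, replacement: str) -> str:
    """Model of STUFF(text, start, length, replacement), 1-based start."""
    i = start - 1  # convert to Python's 0-based index
    # Keep everything before the span, insert the replacement,
    # then keep everything after the deleted span.
    return text[:i] + replacement + text[i + length:]


print(stuff("IMR002001", 4, 1, "1"))  # IMR102001
print(stuff("IMR002001", 5, 1, "1"))  # IMR012001
print(stuff("IMR002001", 4, 1, "#"))  # IMR#02001
```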