How to remove a semicolon from a string in Hive SQL

I am trying to remove the semicolon (;) from a string.
Which function in Hive SQL should I use? I know regexp_replace may work, but what pattern should I pass?
It appears that ; does not work as the pattern, although other special characters such as , or : do.
For example,
Data looks like
;;;;;0123445
I want the data to look like this
0123445
Any help on this will be appreciated. I have been struggling with this.

REGEXP_REPLACE is indeed a good pick. For example, this removes all semicolons from the field:
REGEXP_REPLACE(my_column, ';', '')
From the documentation :
Returns the string resulting from replacing all substrings in INITIAL_STRING that match the java regular expression syntax defined in PATTERN with instances of REPLACEMENT.
Please note that the semicolon has no special meaning in the regexp language.
If you want to match semicolons at the beginning of the string only (as shown in your question), use the regexp special character ^, which anchors the match to the beginning of the string, together with + so that the whole run of leading semicolons is matched:
REGEXP_REPLACE(my_column, '^;+', '')
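A quick way to sanity-check these patterns outside Hive is to run them through another regex engine. The sketch below uses Python's re module as a stand-in; that is an assumption, since Hive uses Java regular expressions, but for patterns this simple the two engines should behave the same. The sample value is made up for illustration.

```python
import re

# Hypothetical sample value: leading semicolons plus one in the middle.
value = ";;;;;0123;445"

# ';' matches every semicolon, wherever it occurs.
all_removed = re.sub(";", "", value)

# '^;+' matches only the run of semicolons at the start of the string.
leading_removed = re.sub("^;+", "", value)

# '^;' matches just a single leading semicolon.
one_removed = re.sub("^;", "", value)

print(all_removed)      # 0123445
print(leading_removed)  # 0123;445
print(one_removed)      # ;;;;0123;445
```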

To remove all semicolons, you can simply use replace():
replace(my_column, ';', '')
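To remove only the leading semicolons you need a regular expression, which replace() does not interpret, so use regexp_replace() instead:
regexp_replace(my_column, '^;+', '')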

In Hive you may need to escape the semicolon, because some Hive clients treat it as a statement terminator even inside a quoted string:
regexp_replace(column_name,'\;','')

Related

sql regexp string end with ".0"

I want to check whether a positive number string ends with ".0", so I wrote the following SQL:
select '12310' REGEXP '^[0-9]*\.0$'. However, the result is true. I wonder why, since I put "\" before "." to escape it.
So I wrote another one, select '1231.0' REGEXP '^[0-9]\d*\.0$', but this time the result is false.
Could anyone tell me the right pattern?
The dot (.) has a special meaning in regular expressions (any character) and must be escaped if you want a literal dot:
select '12310' REGEXP '^[0-9]*\\.0$';
Result:
false
Use a double backslash to escape special characters in Hive. A single backslash has a special meaning in string literals and is used for sequences like \073 (semicolon), \n (newline), \t (tab), etc. This is why regex escapes need a double backslash. The digit character class is likewise written \\d:
hive> select '12310.0' REGEXP '^\\d*?\\.0$';
OK
true
Also characters inside square brackets do not need double-slash escaping: [.] can be used instead of \\.
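The effect of the string-literal layer can be reproduced outside Hive. The sketch below uses Python's re module as a stand-in for Hive's Java regex engine (an assumption; these particular patterns should behave the same in both) and feeds it each pattern as the regex engine would see it after the string literal has been processed, following the explanation above.

```python
import re

# What the regex engine receives when the Hive literal is '^[0-9]*\.0$':
# the single backslash is consumed by the string literal, leaving a bare dot.
unescaped = r"^[0-9]*.0$"

# What it receives when the literal is '^[0-9]*\\.0$' or '^[0-9]*[.]0$'.
escaped = r"^[0-9]*\.0$"
bracketed = r"^[0-9]*[.]0$"

def matches(pattern, text):
    """Return True if the pattern is found in the text."""
    return re.search(pattern, text) is not None

print(matches(unescaped, "12310"))   # True: the bare '.' matches the digit '1'
print(matches(escaped, "12310"))     # False: there is no literal dot
print(matches(escaped, "1231.0"))    # True
print(matches(bracketed, "1231.0"))  # True
```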
If you know it is a number string, why not just use:
select ( val like '%.0' )
You need a regular expression if you want to validate that the string has digits everywhere else. But if you only need to check the last two characters, LIKE is sufficient.
As for your question: . is a wildcard in regular expressions. It matches any character.

how to remove ' from string

I'm trying to use the TRIM command in SQL to remove special characters from a string. The thing is, I can't figure out how to remove the ' character, as used in some surnames,
e.g. O'Reilly.
In order to remove a character I have to quote it, but how can I quote or identify the character ' when it is itself used for quoting?
You want to use replace() and not trim(). Then, the escaping of single quotes requires doubling it, plus the outer single quotes. So:
replace(name, '''', '')
---------------^^ escaped single quote
--------------^--^ string delimiter for the single quote character
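The doubled-quote rule can be tried without a database server. This is a minimal sketch using Python's built-in sqlite3 module; SQLite is an assumption here (the question does not name a database), chosen only because it follows the same standard quoting rule and has a replace() function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# 'O''Reilly' is the literal O'Reilly, and '''' is a literal containing
# exactly one single quote: outer quotes delimit, inner pair is the escape.
row = conn.execute("SELECT replace('O''Reilly', '''', '')").fetchone()
result = row[0]
conn.close()

print(result)  # OReilly
```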
Use the replace function to remove that character ('):
replace(name,"'","");

How can I escape the wildcard for like operator? [duplicate]

This question also has the answer, but it mentions DB2 specifically.
How do I search for a string using LIKE that already has a percent % symbol in it? The LIKE operator uses % symbols to signify wildcards.
Use brackets. For example, to look for 75%:
WHERE MyCol LIKE '%75[%]%'
This is simpler than ESCAPE, but note that the bracket syntax is specific to SQL Server (T-SQL) and is not portable to most other RDBMSes.
You can use the ESCAPE keyword with LIKE. Simply prepend the desired character (e.g. '!') to each of the existing % signs in the string and then add ESCAPE '!' (or your character of choice) to the end of the query.
For example:
SELECT *
FROM prices
WHERE discount LIKE '%80!% off%'
ESCAPE '!'
This will make the database treat 80% as an actual part of the string to search for and not 80(wildcard).
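To see the difference the ESCAPE clause makes, here is a small sketch using Python's built-in sqlite3 module. SQLite is an assumption (the question does not name a database), and the table contents are invented for illustration; the point is only to compare the same pattern with and without ESCAPE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (discount TEXT)")
conn.executemany(
    "INSERT INTO prices VALUES (?)",
    [("now 80% off today",), ("now 800 off today",), ("80 percent off",)],
)

# Without ESCAPE, the % after 80 is a wildcard, so all three rows match.
plain = conn.execute(
    "SELECT discount FROM prices WHERE discount LIKE '%80% off%'"
).fetchall()

# With ESCAPE '!', the sequence !% stands for a literal percent sign.
escaped = conn.execute(
    "SELECT discount FROM prices WHERE discount LIKE '%80!% off%' ESCAPE '!'"
).fetchall()
conn.close()

print(len(plain))  # 3
print(escaped)     # [('now 80% off today',)]
```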
MSDN Docs for LIKE
WHERE column_name LIKE '%save 50[%] off!%'
You can use the code below to find a specific value.
WHERE col1 LIKE '%[%]75%'
When you want exactly one character (for example a single digit) after the % sign, you can write the following code.
WHERE col2 LIKE '%[%]_'
In MySQL,
WHERE column_name LIKE '%|%%' ESCAPE '|'

Remove Special Characters from an Oracle String

From within an Oracle 11g database, using SQL, I need to remove the following sequence of special characters from a string, i.e.
~!##$%^&*()_+=\{}[]:”;’<,>./?
If any of these characters exist within a string, I would like them completely removed. The only exceptions are these two characters, which I DO NOT want removed: "|" and "-".
For example:
From: 'ABC(D E+FGH?/IJK LMN~OP' To: 'ABCD EFGHIJK LMNOP' after removal of special characters.
I have tried this small test which works for this sample, i.e:
select regexp_replace('abc+de)fg','\+|\)') from dual
but is there a better means of using my sequence of special characters above without doing this string pattern of '\+|\)' for every special character using Oracle SQL?
You can replace anything other than letters and space with empty string
[^a-zA-Z ]
As per below comments
I still need to keep the following two special characters within my string, i.e. "|" and "-".
Just exclude more characters (keeping the space, so that spaces in your example survive):
[^a-zA-Z |-]
Note: the hyphen - should be placed at the start or end of the character class, or escaped as \-, because elsewhere inside a character class it defines a range.
For more info read about Character Classes or Character Sets
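The negated character class can be checked against the example from the question. The sketch below uses Python's re module as a stand-in for Oracle's REGEXP_REPLACE (an assumption; the two engines differ in places, but a simple negated class like this should behave the same). The class includes a space so that spaces are kept, and the test string is the question's example with "|Q-R" appended to show that both protected characters survive.

```python
import re

# Question's example, extended with '|' and '-' (hypothetical suffix).
text = "ABC(D E+FGH?/IJK LMN~OP|Q-R"

# Remove everything that is NOT a letter, a space, '|' or '-'.
# The hyphen is last in the class, so it is literal rather than a range.
cleaned = re.sub(r"[^a-zA-Z |-]", "", text)

print(cleaned)  # ABCD EFGHIJK LMNOP|Q-R
```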
Consider using this regex replacement instead:
REGEXP_REPLACE('abc+de)fg', '[~!##$%^&*()_+=\\{}[\]:”;’<,>.\/?]', '')
The replacement will match any character from your list.
The regex to match your sequence of special characters is:
[]~!##$%^&*()_+=\{}[:”;’<,>./?]+
I suspect you have still not escaped all of the regex-special characters.
To find out which ones, work iteratively:
build a test string, then build up your regex string character by character and check that it removes what you expect to be removed.
If the most recently added character does not work, you have to escape it.
That should do the trick.
SELECT TRANSLATE('~!##$%sdv^&*()_+=\dsv{}[]:”;’<,>dsvsdd./?', '~!##$%^&*()_+=\{}[]:”;’<,>./?',' ')
FROM dual;
result:
TRANSLATE
-------------
sdvdsvdsvsdd
SQL> select translate('abc+de#fg-hq!m', 'a+-#!', 'a') from dual;
TRANSLATE(
----------
abcdefghqm
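The idea behind the TRANSLATE approach, deleting by a set of characters rather than by a regular expression, can be illustrated with Python's str.translate. This is only an analogue in another language, not Oracle behavior: in Python the characters to delete are passed as the third argument of str.maketrans.

```python
# Characters to strip out, taken from the example above.
chars_to_remove = "+-#!"

# Build a table that maps each listed character to "delete".
table = str.maketrans("", "", chars_to_remove)

cleaned = "abc+de#fg-hq!m".translate(table)
print(cleaned)  # abcdefghqm
```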

SQL Server LIKE containing bracket characters

I am using SQL Server 2008. I have a table with the following column:
sampleData (nvarchar(max))
The value for this column in some of these rows are lists formatted as follows:
["value1","value2","value3"]
I'm trying to write a simple query that will return all rows with lists formatted like this, by just detecting the opening bracket.
SELECT * from sampleTable where sampleData like '[%'
The above query doesn't work, because '[' is a special character. How can I escape the bracket so my query does what I want?
... like '[[]%'
You use [ ] to surround a special character (or range).
See the section "Using Wildcard Characters As Literals" in SQL Server LIKE
Note: You don't need to escape the closing bracket...
Aside from gbn's answer, the other method is to use the ESCAPE option:
SELECT * from sampleTable where sampleData like '\[%' ESCAPE '\'
See the documentation for details.
Just a further note here...
If you want to include the bracket (or other specials) within a set of characters, you only have the option of using ESCAPE (since you are already using the brackets to indicate the set).
Also you must specify the ESCAPE clause, since there is no default escape character (it isn't backslash by default as I first thought, coming from a C background).
E.g., if I want to pull out rows where a column contains anything outside of a set of 'acceptable' characters, for the sake of argument let's say alphanumerics... we might start with this:
SELECT * FROM MyTest WHERE MyCol LIKE '%[^a-zA-Z0-9]%'
So we are returning anything that has any character not in the list (due to the leading caret ^ character).
If we then want to add special characters in this set of acceptable characters, we cannot nest the brackets, so we must use an escape character, like this...
SELECT * FROM MyTest WHERE MyCol LIKE '%[^a-zA-Z0-9\[\]]%' ESCAPE '\'
Preceding the brackets (individually) with a backslash and indicating that we are using backslash for the escape character allows us to escape them within the functioning brackets indicating the set of characters.
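When the search term comes from user input, the escaping can be applied programmatically before the pattern is sent to the database. The helper below is a hypothetical sketch (the name escape_like and its signature are not part of any library): it prefixes each LIKE metacharacter with the chosen escape character, and the query itself must still declare that character with an ESCAPE clause. It is meant for literal search terms, not for building character sets.

```python
def escape_like(term: str, escape: str = "\\") -> str:
    """Prefix LIKE metacharacters in `term` with the escape character."""
    # The escape character itself must be escaped too.
    special = {escape, "%", "_", "[", "]"}
    return "".join(escape + ch if ch in special else ch for ch in term)

# Wrap the escaped term in wildcards for a "contains" search.
pattern = "%" + escape_like('["value1"') + "%"
percent = escape_like("75%_x")

print(pattern)  # %\["value1"%
print(percent)  # 75\%\_x
```

The resulting string would then be bound as a parameter in a query of the form ... LIKE ? ESCAPE '\'.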