removing speciacl characters from teradata coloumn - sql

how to remove special characters from Teradata columns.
I am having columns values other than characters A-Z and numbers 1-9, how to remove theses column values as we don't know the exact special character. Can any one share the command for it?

You can use a regular expression, '[^0-9a-z]' to find characters which don't match a list of characters:
regexp_replace(col, '[^0-9a-z]', '', 1, 0, 'i')

Related

How to replace characters which are not alphanumeric in pyspark sql?

this is my code.
%spark.pyspark
jdbc_write(spark, spark.sql("""
SELECT
Global_Order_Number__c
, Infozeile__c
FROM STAG.SF_CASE_TRANS
"""), JDBC_URLS['xyz_tera_utf8'], "DEV_STAG.SF_CASE", "abc", "1234")
I want to exclude every character in the Infozeile__c field which are not a-z, A-Z, 0-9.
Is their any function which is able to do this?
Apply regexp_replace() to the column in your query:
regexp_replace(Infozeile__c, '[^a-zA-Z0-9]', '') as Infozeile__c
The regex [^a-zA-Z0-9] is a negated character class, meaning any character not in the ranges given. The replacement is a blank, effectively deleting the matched character.
If you're expecting lots of characters to be replaced like this, it would be a bit more efficient to add a +, which means "one or more", so whole blocks of undesirable characters are removed at a time.
regexp_replace(Infozeile__c, '[^a-zA-Z0-9]+', '')

Find phone numbers with unexpected characters using SQL in Oracle?

I need to find rows where the phone number field contains unexpected characters.
Most of the values in this field look like:
123456-7890
This is expected. However, we are also seeing character values in this field such as * and #.
I want to find all rows where these unexpected character values exist.
Expected:
Numbers are expected
Hyphen with numbers is expected (hyphen alone is not)
NULL is expected
Empty is expected
Tried this:
WHERE phone_num is not like ' %[0-9,-,' ' ]%
Still getting rows where phone has numbers.
from https://regexr.com/3c53v address you can edit regex to match your needs.
I am going to use example regex for this purpose
select * from Table1
Where NOT REGEXP_LIKE(PhoneNumberColumn, '^[+]*[(]{0,1}[0-9]{1,4}[)]{0,1}[-\s\./0-9]*$')
You can use translate()
...
WHERE translate(Phone_Number,'a1234567890-', 'a') is NOT NULL
This will strip out all valid characters leaving behind the invalid ones. If all the characters are valid, the result would be NULL. This does not validate the format, for that you'd need to use REGEXP_LIKE or something similar.
You can use regexp_like().
...
WHERE regexp_like(phone_num, '[^ 0123456789-]|^-|-$')
[^ 0123456789-] matches any character that is not a space nor a digit nor a hyphen. ^- matches a hyphen at the beginning and -$ on the end of the string. The pipes are "ors" i.e. a|b matches if pattern a matches of if pattern b matches.
Oracle has REGEXP_LIKE for regex compares:
WHERE REGEXP_LIKE(phone_num,'[^0-9''\-]')
If you're unfamiliar with regular expressions, there are plenty of good sites to help you build them. I like this one

SQL find all rows that do not have certain characters

I want to find all rows for which values in string column does not possess certain characters (to be specific [a-Z0-9 \t\n]) how can I do it in sql ?
I tried to do it with like operator
SELECT ***
where column like '%[^ a-Z0-9 \t\n]%'
however, it does not work and I get rows that possess characters and numbers.
To fetch all records that contain any characters other than alphabets, numbers, spaces, tabs and new-line delimiters:
SELECT ***
WHERE column like '%[^A-Za-z0-9 \t\n]%'
Note that [^A-Za-z0-9 \t\n] represents anything other than alphanumeric characters, spaces, tabs, and new line delimitters.
Your logic is inverted. I think you want:
where column not like '%[^ a-Z0-9 \t\n]%'
I don't think that SQL Server interprets \t and \n as special characters. You may need to insert the actual values for the characters. (See here.)
SELECT ***
WHERE column like '%[^A-Za-z0-9 \t\n]%'

a sql code or function to remove all the special characters from a particular column in a table

a sql code or function to remove all the special characters from a particular column in a table.
:a oracle code to remove all the special character from a column .for example ABC D.E.F so it should be ABC DEF,space should be maintained between 2 words.
If you want to remove the hyphens and the dots, you can go with translate like so:
select
translate(column_name, 'Q._"?!##$%^&*æ', 'Q')
from
your_table;
The easiest way should be a Regular Expression, this removes anything but spaces and letters a to z:
REGEXP_REPLACE(col, '[^a-z ]','' , 1, 0, 'i')

Regex to split values in PostgreSQL

I have a list of values coming from a PGSQL database that looks something like this:
198
199
1S
2
20
997
998
999
C1
C10
A
I'm looking to parse this field a bit into individual components, which I assume would take two regexp_replace function uses in my SQL. Essentially, any non-numeric character that appears before numeric ones needs to be returned for one column, and the other column would show all non-numeric characters appearing AFTER numeric ones.
The above list would then be split into this layout as the result from PG:
I have created a function that strips out the non-numeric characters (the last column) and casts it as an Integer, but I can't figure out the regex to return the string values prior to the number, or those found after the number.
All I could come up with so far, with my next to non-existant regex knowledge, was this: regexp_replace(fieldname, '[^A-Z]+', '', 'g'), which just strips out anything not A-Z, but I can;t get to to work with strings before numeric values, or after them.
For extracting the characters before the digits:
regexp_replace(fieldname, '\d.*$', '')
For extracting the characters after the digits:
regexp_replace(fieldname, '^([^\d]*\d*)', '')
Note that:
if there are no digits, the first will return the original value and then second an empty string. This way you are sure that the concatenation is equal to the original value in this case also.
the concatenation of the three parts will not return the original if there are non-numerical characters surrounded by digits: those will be lost.
This also works for any non-alphanumeric characters like #, [, ! ...etc.
Final SQL
select
fieldname as original,
regexp_replace(fieldname, '\d.*$', '') as before_s,
regexp_replace(fieldname, '^([^\d]*\d*)', '') as after_s,
cast(nullif(regexp_replace(fieldname, '[^\d]', '', 'g'), '') as integer) as number
from mytable;
See fiddle.
This answer relies on information you delivered, which is
Essentially, any non-numeric character that appears before numeric
ones needs to be returned for one column, and the other column would
show all non-numeric characters appearing AFTER numeric ones.
Everything non-numeric before a numeric value into 1 column
Everything non-numeric after a numeric value into 2 column
So there's assumption that you have a value that has a numeric value in it.
select
val,
regexp_matches(val,'([a-zA-Z]*)\d+') AS before_numeric,
regexp_matches(val,'\d+([a-zA-Z]*)') AS after_numeric
from
val;
Attached SQLFiddle for a preview.