How can I replace multiples value using regex_repalce in sql? - apache-spark-sql

I have column in which different values are there like
refer_nO
Post-factos022110
P0st-fact04433
Postfact0s304004
Postfact202934
Now I want to keep the numbers and replace all non-numbers values with blank, Wanting something like this
refer_nO
022110
04433
304004
202934
To replace one single value I can use
select regex_replace(column1,"Post-factos"," ") from table
How can I replace mutiples values ?

Use regex [^0-9]+ to remove all non-numeric characters:
select regexp_replace(column1,'[^0-9]+','') from table

Related

extract all numbers from start of string?

I have a table which contains some bad data I am trying to clean up.
An example of the fields is below
36234735HAN876
2342JOE9823
554444PUT003
What I want to do is remove all the numeric characters before the first alphabetical character so it would look like the below:
HAN876
JOE9823
PUT003
What would be the best way to achieve this? I have used the below method but this can only be used to extract ALL numeric from the string, not the ones before the alphabetical characters
How to get the numeric part from a string using T-SQL?
You could achieve this using PATINDEX to locate the first position of an alphabetical character in the string, and then use SUBSTRING to only return the characters after that position:
CREATE TABLE #temp (val VARCHAR(50));
INSERT INTO #temp VALUES ('36234735HAN876'), ('2342JOE9823'), ('554444PUT003'), ('TEST1234');
SELECT val,
SUBSTRING(val, PATINDEX('%[A-Z]%', val), LEN(val)) AS output
FROM #temp;
DROP TABLE #temp;
Outputs:
val output
36234735HAN876 HAN876
2342JOE9823 JOE9823
554444PUT003 PUT003
TEST1234 TEST1234
Note that I have created a temporary table with a column named val. You should change this to work with whatever the actual column is called.
About case sensitivity: If you are using a non-case sensitive collation this will work without issue. If your collation is case sensitive then you may need to alter the pattern being matched to cater for upper- and lower-case letters.
Use PATINDEX to find the first non-numeric character (or first alpha character, depending on the logic) and STUFF to remove them:
SELECT STUFF(V.YourString,1,ISNULL(NULLIF(PATINDEX('%[^0-9]%',V.YourString),0)-1,0),'')
FROM (VALUES('36234735HAN876'),
('2342JOE9823'),
('554444PUT003'),
('ABC123'))V(YourString)
If the logic is the first alpha character, instead of the first non-numeric, then the pattern would be [A-z].
The NULLIF and ISNULL are in there for when/if the string starts with a alpha/non-numeric and thus doesn't cause STUFF to error due to the 3rd parameter being -1. The is demonstrated with the additional example I put into the sample data ('ABC123').

Combine several columns in one with Teradata

I have 10 columns and their values could be either null, or a name of a fruit.
I would like to add another column with all the fruits that every row has. I have used Concat(column1 , column2,..., column10) as name.
Issue : There are no commas coming on the result and if I add the comma before concatenating, we are having them together, the last word is also a comma.
Any ideas?
Thanks!
You can use the standard concatenation (||) in conjunciton with COALESCE function, which returns the value of the first non-null argument.
Example:
select coalesce(column1||',', '')||coalesce(column2||',', '')|| ... ||coalesce(column10||, '');

Regex Postgres More than one dot

I need to return the fields that have more than one . in a specific column.
Now I have this query:
select *
from table
where column ~ '\.{2,}?';
But for some reason it returns nothing. If I use something like 'A{2,}?' it works. Apparently the problem is the dot.
It returns null since the dots are not next two each other. You have to consider the occurrences of the characters in the order of your regex meta characters. You could try this instead:
select *
from table
where column ~ '\.\d{3}\.';
Or instead of just focusing on the dot characters start parsing the string as a whole and consider the numbers as well:
where column ~ '^\d{3}\.\d{3}\.';
Why not just use like?
where column like '%.%.%'

Using SQL to make specific changes in a database.

I am trying to figure out some commands/code in SQL.
I have database with names, addresses IDs etc, but I have to convert firstname values ending in “jnr” to “(Jnr)” and those ending in “snr” to “(Snr)”.
How do I do this?
update table TABLE_NAME set NAMES = '*xyz*Jnr' where NAMES like '%jnr'
Update or select:
PASTE(column, CHAR_LENGTH(column)-3, 1, UPPER(SUBSTRING(column FROM CHAR_LENGTH(column)-3 FOR 1)
WHERE column LIKE '%jnr' OR column LIKE '%snr'
PASTE is used to put in one character at position 3 from end,
CHAR_LENGTH to get length of column value,
UPPER converts character to upper case,
SUBSTRING is used to pick one character here (j or s),
LIKE is used to find values ending with jnr, or snr.
All ANSI SQL (no dbms specified!)

replace two characters in one cell

I am using this query to replace one character in a cell
select replace(id,',','')id from table
But I want to replace two characters in a cell.
If the cell is having this data (1,3.1), and I want it to look like this (131).
How can I replace two different characters in one cell?
Use TRANSLATE instead of REPLACE(). It replaces each occurrence of a character in the first pattern with its matched character in the second. To remove characters, simply leave cut short the replacement string:
select translate(id, '1,.', '1') id from table
Note that the second string cannot be null. Hence the need to include 1 (or some other character) in both strings.
Find out more.
Obviously the more characters you need to convert/remove the more attractive TRANSLATE() becomes. The main use for REPLACE is changing patterns (such as words) rather than individual characters.
Can use
select replace(translate(id,',.',' '),' ','') from table;
or
select regexp_replace('1,3.1','[,.]','') from dual;
or
select replace(replace(id,',',''),'.','') from table;
Call the replace again.
select replace(replace(id,',',''), '.','') id from table
Do this:
select REPLACE(REPLACE(id,',',''),'.','')
Or use a regular expression:
select regexp_replace(id, '[.,]', '') id from table
Find out more