Remove substring from a column's text - sql

I have the following two columns in a Postgres table:
name | last_name
----------------
AA | AA aa
BBB | BBB bbbb
.... | .....
.... | .....
How can I update last_name by removing the name text from it?
The final output should look like this:
name | last_name
----------------
AA | aa
BBB | bbbb
.... | .....
.... | .....

UPDATE table SET last_name = regexp_replace(last_name, '^' || name || ' ', '');
This only removes one copy from the beginning of the column and correctly removes the trailing space.
Edit
I'm using a regular expression here. '^' || name || ' ' builds the regular expression, so with the 'Davis McDavis' example, it builds the regular expression '^Davis '. The ^ causes the regular expression to be anchored to the beginning of the string, so it's going to match the word 'Davis' followed by a space only at the beginning of the string it is replacing in, which is the last_name column.
You could achieve the same effect without regular expressions like this:
UPDATE table SET last_name = substr(last_name, length(name) + 2);
You need to add two to the length to create the offset because substr is one-based (+1) and you want to include the space (+1). However, I prefer the regular expression solution even though it probably performs worse because I find it somewhat more self-documenting. It has the additional advantage that it is idempotent: if you run it again on the database it won't have any effect. The substr/offset method is not idempotent; if you run it again, it will eat more characters off your last name.
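To make the idempotency point concrete, here is a minimal sketch in Python rather than SQL, with Python's re module standing in for regexp_replace. The function names are made up for illustration. The sketch also escapes the name before building the pattern, which the SQL above does not do, so a name containing regex metacharacters would need similar care there.

```python
import re

def strip_prefix_regex(name: str, last_name: str) -> str:
    """Mimics regexp_replace(last_name, '^' || name || ' ', '')."""
    # re.escape guards against names that contain regex metacharacters.
    return re.sub("^" + re.escape(name) + " ", "", last_name)

def strip_prefix_offset(name: str, last_name: str) -> str:
    """Mimics substr(last_name, length(name) + 2) using 0-based slicing."""
    return last_name[len(name) + 1:]

# Running the regex version twice leaves the value unchanged.
once = strip_prefix_regex("BBB", "BBB bbbb")
twice = strip_prefix_regex("BBB", once)
print(once, twice)

# Running the offset version twice keeps cutting characters off.
cut_once = strip_prefix_offset("BBB", "BBB bbbb")
cut_twice = strip_prefix_offset("BBB", cut_once)
print(repr(cut_once), repr(cut_twice))
```

The second call of the regex version is a no-op because the prefix is no longer there, while the second call of the offset version removes whatever happens to occupy the first four characters.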

Not sure about syntax, but try this:
UPDATE table
SET last_name = TRIM(REPLACE(last_name,name,''))
I suggest checking it first with a SELECT:
SELECT REPLACE(last_name,name,'') FROM table

You need the replace function; see http://www.postgresql.org/docs/8.1/static/functions-string.html
UPDATE table SET last_name = REPLACE(last_name,name,'')

Related

SQL: Select rows that contain a word

The goal is to select all rows that contain a specific word. The word can be at the beginning or the end of the string and/or surrounded by whitespace, but it should not be inside another word.
Here are a couple of rows in my database:
+---+--------------------+
| 1 | string with test |
+---+--------------------+
| 2 | test string |
+---+--------------------+
| 3 | testing stringtest |
+---+--------------------+
| 4 | not-a-test |
+---+--------------------+
| 5 | test |
+---+--------------------+
So in this example, selecting the word test should return rows 1, 2 and 5.
The problem is that, for some reason, SELECT * FROM ... WHERE ... RLIKE '(\s|^)test(\s|$)'; returns 0 rows.
Where am I going wrong, and how could this be done better?
Edit: Query should also select the row with just a word test.
The answer to my first question is:
I hadn't escaped the backslashes, so \s should be \\s.
Working query: SELECT * FROM ... WHERE ... RLIKE '(\\s|^)test(\\s|$)'; (using a plain space, ( |^) and ( |$), also works).
You could match the word with a leading space and with a trailing space:
SELECT * from new_table
where text RLIKE(' test')
union
SELECT * from new_table
where text RLIKE('test ')
The REGEXP_INSTR() function, which is an extension of the INSTR() function, can be used in version 10.0.5+; it is case-insensitive by default:
SELECT *
FROM t
WHERE REGEXP_INSTR(str, 'TeSt ')>0
OR REGEXP_INSTR(str, ' tESt')>0
SELECT * FROM ...
WHERE ... LIKE 'test';
This should do the trick.
Is this what you want?
SELECT * FROM ... WHERE ... LIKE
'%test%';
Use word boundary tests:
Before MySQL 8.0, and in MariaDB:
WHERE ... REGEXP '[[:<:]]test[[:>:]]'
MySQL 8.0:
WHERE ... REGEXP '\btest\b'
(If that does not work, double up the backslashes; this depends on whether the client is collapsing backslashes before MySQL gets them.)
Note that this solution will also work with punctuation such as the comma in "foo, test, bar"
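As a rough illustration of how the whitespace pattern and the word-boundary pattern differ, here is a small sketch that runs both over the five sample rows from the question using Python's re module. Python's engine is not MySQL's, so treat this as an approximation; the rows dictionary and the matching_ids helper are made up for the example.

```python
import re

# The five sample rows from the question, keyed by id.
rows = {
    1: "string with test",
    2: "test string",
    3: "testing stringtest",
    4: "not-a-test",
    5: "test",
}

def matching_ids(pattern: str) -> list[int]:
    """Return the ids whose text contains a match for the pattern."""
    return [row_id for row_id, text in rows.items() if re.search(pattern, text)]

# Word delimited by whitespace or by the start/end of the string.
whitespace_ids = matching_ids(r"(\s|^)test(\s|$)")
# Word delimited by any word boundary, including punctuation.
boundary_ids = matching_ids(r"\btest\b")
print(whitespace_ids)
print(boundary_ids)
```

With Python's engine the whitespace pattern selects rows 1, 2 and 5, while the word-boundary pattern also selects row 4, because the hyphen in "not-a-test" counts as a boundary. Which behaviour is wanted depends on whether punctuation should delimit words.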

getting the string path piece by piece with regex (SQL -Athena)

I want to convert a string into a table row in SQL in Amazon Athena.
Since Athena does not support certain functions, I'm forced to chain many regex functions.
An input (which can also have different lengths) can look like this:
v1 facility username utm_parameter
and I want to turn this into a table that looks like this:
1st | 2nd      | 3rd      | 4th
--- | -------- | -------- | -------------
v1  | facility | username | utm_parameter
I already extract the first piece of text from the string with this code:
SELECT REGEXP_EXTRACT( REGEXP_replace( REGEXP_REPLACE( REGEXP_EXTRACT( REGEXP_EXTRACT(message,'path=\S+'),'"(.*?)"'),'/', ' '),'"',''),'\S+') AS '1st' from data
but I don't know how to get the text after the next blank spaces with the regex.
Does anyone know how to write the next regex function?
Try this:
-- input, don't use in real query
WITH
input(message) AS (
SELECT 'v1 facility username utm_parameter'
)
-- input end, start real query here
SELECT
SPLIT_PART(message,' ',1) AS "1st"
, SPLIT_PART(message,' ',2) AS "2nd"
, SPLIT_PART(message,' ',3) AS "3rd"
, SPLIT_PART(message,' ',4) AS "4th"
FROM input;
1st|2nd |3rd |4th
v1 |facility|username|utm_parameter
And, for the rest, it's like spelling the word Mississippi: you need to know when to stop.....
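For readers unfamiliar with SPLIT_PART, its behaviour can be sketched in Python as below. This is only an approximation: the split_part helper is written for this example, and returning None (standing in for SQL NULL) for an index past the last field is an assumption worth checking against the Athena documentation.

```python
from typing import Optional

def split_part(text: str, delimiter: str, index: int) -> Optional[str]:
    """Return the 1-based field, or None when the index is past the last field."""
    fields = text.split(delimiter)
    if 1 <= index <= len(fields):
        return fields[index - 1]
    return None

message = "v1 facility username utm_parameter"
# Ask for one field more than the input has to show the short-input case.
columns = [split_part(message, " ", i) for i in range(1, 6)]
print(columns)
```

Because inputs can have different lengths, the number of SPLIT_PART columns has to cover the longest expected input; shorter inputs simply leave the trailing columns empty.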

how to skip particular character(s) in SQL LIKE query

I have a table (say users) with a column called name.
You can think of the table structure as shown below:
-------------
name
--------------
Abdul Khalid
--------------
Abdul, Khalid
--------------
Abdul - Khalid
--------------
other names
My question is: can I write a query that finds all 3 rows in which the name column value is "Abdul Khalid" (that is, "Abdul Khalid", "Abdul, Khalid" or "Abdul - Khalid", if I skip the "," and "-" characters)?
You can use like:
select t.*
from t
where name like 'Abdul%Khalid';
If you want the names anywhere in the string (but in that order), then put wildcards at the beginning:
select t.*
from t
where name like '%Abdul%Khalid%';
If you are passing in the value as a variable:
select t.*
from t
where name like replace('Abdul Khalid', ' ', '%');
For PostgreSQL it is better to use the ~ operator:
name ~ '^Abdul[ ,-]+Khalid$'
Or, if the name may also appear in the middle of the string:
name ~ 'Abdul[ ,-]+Khalid'
Or you can use translate (with an index on the expression), where it is available:
translate(name, ' ,-', '') = 'AbdulKhalid'
You can also use REGEXP like this:
SELECT * from yourTable where name REGEXP 'Abdul( |, | - )Khalid';
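The separator patterns are easy to get subtly wrong, so here is a small check in Python's re module (not the PostgreSQL or MySQL engines; the names list and the matches helper are made up for this sketch). It compares a character class that allows exactly one separator character, the same class with a + quantifier, and the explicit alternation.

```python
import re

names = ["Abdul Khalid", "Abdul, Khalid", "Abdul - Khalid", "other names"]

def matches(pattern: str) -> list[str]:
    """Return the names that match the pattern from start to end."""
    return [n for n in names if re.fullmatch(pattern, n)]

# Exactly one separator character between the two words.
single_separator = matches(r"Abdul[ ,-]Khalid")
# One or more separator characters between the two words.
one_or_more = matches(r"Abdul[ ,-]+Khalid")
# The three separators spelled out explicitly.
alternation = matches(r"Abdul( |, | - )Khalid")
print(single_separator)
print(one_or_more)
print(alternation)
```

A class without a quantifier matches only "Abdul Khalid", since ", " and " - " are more than one character long; the quantified class and the alternation both match all three variants.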

Firebird Database Split String on Field

Currently working with a Firebird 1.5 database and attempting to pull the data in the correct format natively with SQL.
Consider the following database:
ID | Full Name
1  | Jon Doe
2  | Sarah Lee
What I am trying to achieve is a simple split on the full name field (space) within a query.
ID | First Name | Last Name
1  | Jon        | Doe
2  | Sarah      | Lee
The issue faced is Firebird POSITION() was introduced in v2.0. Is there any known workaround to split on a space that anyone has come across?
Much appreciate your assistance!
For Firebird 1.5, a solution is to find a UDF that either combines both functions, or provides the position (I don't use UDFs, so I am not sure if one already exists). If none is available, you might have to write one.
The other solution is to write a stored procedure for this functionality, see for example: Position of substring function in SP
CREATE PROCEDURE Pos (SubStr VARCHAR(100), Str VARCHAR(100))
RETURNS (Pos INTEGER) AS
DECLARE VARIABLE SubStr2 VARCHAR(201); /* 1 + SubStr-length + Str-length */
DECLARE VARIABLE Tmp VARCHAR(100);
BEGIN
IF (SubStr IS NULL OR Str IS NULL)
THEN BEGIN Pos = NULL; EXIT; END
SubStr2 = SubStr || '%';
Tmp = '';
Pos = 1;
WHILE (Str NOT LIKE SubStr2 AND Str NOT LIKE Tmp) DO BEGIN
SubStr2 = '_' || SubStr2;
Tmp = Tmp || '_';
Pos = Pos + 1;
END
IF (Str LIKE Tmp) THEN Pos = 0;
END
This example (taken from the link) can be extended to then use SUBSTRING to split on the space.
For searching on a single character like a space, a simpler solution than the stored procedure above can probably be devised. For your exact needs you might need to write a selectable stored procedure specifically for this purpose.
However, upgrading your database to Firebird 2.5 will give you much more powerful internal functions that simplify this query (and your life)!
I also wanted to split a full name string into first and last name, and I used the following SQL statements in a Firebird 2.1 database.
Patients is the table name.
The Name field holds the full name string, e.g. "Jon Doe". The FIRST_NAME field will store the first name and the LAST_NAME field the last name.
First get the first name (the part of the string before the first space), then execute a TRIM UPDATE statement to remove any spaces:
UPDATE "Patients" SET "Patients".FIRST_NAME = (SUBSTRING("Patients"."Name" FROM 1 FOR (POSITION(' ' IN "Patients"."Name"))))
UPDATE "Patients" SET "Patients".FIRST_NAME = TRIM(BOTH ' ' FROM "Patients".FIRST_NAME)
Then get the last name (the string after the first space) and execute a TRIM UPDATE statement to remove any spaces
UPDATE "Patients" SET "Patients"."LAST_NAME" = (SUBSTRING("Patients"."Name" FROM (POSITION(' ' IN "Patients"."Name")+1)))
UPDATE "Patients" SET "Patients".LAST_NAME = TRIM(BOTH ' ' FROM "Patients".LAST_NAME)
The result will be:
ID | NAME      | FIRST_NAME | LAST_NAME
1  | Jon Doe   | Jon        | Doe
2  | Sarah Lee | Sarah      | Lee
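The intent of the statements above (split at the first space, then trim both parts) can be sketched outside the database like this. It is a Python illustration only; split_full_name is a made-up helper, and putting a name without any space entirely into the first name is a choice of this sketch, so the UPDATE statements may behave differently for that case.

```python
def split_full_name(full_name: str) -> tuple[str, str]:
    """Split on the first space; the last name is empty if there is no space."""
    first, _, last = full_name.strip().partition(" ")
    return first, last.strip()

for full_name in ["Jon Doe", "Sarah Lee", "Cher"]:
    print(split_full_name(full_name))
```

Names with more than two parts keep everything after the first space in the last name, which matches the "string after the first space" rule described above.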
You could use a UDF, but that isn't strictly SQL.
You could write a stored procedure to parse and split, but that's not strictly SQL either.

Format phone number in Oracle with country code

I have a requirement to format phone numbers in the following way:
No spaces
No special characters
Remove preceding zero - if area code exists
Remove country code if present e.g. +44
For instance this: (03069) 990927 would become: 3069990927.
So far I have come up with this:
replace(replace(replace(replace(replace(replace(substr(replace(ltrim([VALUE],0), ' ', ''),nvl(length(substr(replace(ltrim([VALUE],0), ' ', ''),11)),0)+1), '-', ''), '(', ''), ')', ''),'/', ''), '.', ''), '+', '')
Is there a shorter version of this, maybe using a regular expression?
The final version of this snippet will become a column in a view that will return the following columns:
Customer Number
Customer Name
Country
Formatted Phone Number
The formatted phone number will be concatenated with the international dial code (e.g. +44); these codes are stored in the database in a table, DIALCODE_TAB(COUNTRY_CODE, CODE). Below is an example using the replace syntax above:
CREATE OR REPLACE FORCE VIEW "CUST_PHONE" ("CUSTOMER_ID", "NAME", "COUNTRY", "PHONE_NUMBER") AS
select
cicm.customer_id,
cicm.name,
dct.country,
dct.code || replace(replace(replace(replace(replace(replace(substr(replace(ltrim(cicm.value,0), ' ', ''),nvl(length(substr(replace(ltrim(cicm.value,0), ' ', ''),11)),0)+1), '-', ''), '(', ''), ')', ''),'/', ''), '.', ''), '+', '') phone_number
from customer_info_comm_method cicm
join dialcode_tab dct
on dct.country_code = customer_info_api.get_country_code(cicm.customer_id)
where cicm.method_id_db = 'PHONE'
--and dct.code || replace(replace(replace(replace(replace(replace(substr(replace(ltrim(cicm.value,0), ' ', ''),nvl(length(substr(replace(ltrim(cicm.value,0), ' ', ''),11)),0)+1), '-', ''), '(', ''), ')', ''),'/', ''), '.', ''), '+', '') = [phone_number]
--in terms of performance this SQL has to be written so that it returns all the records or a specific record when searching for the phone number - very quickly (<10s).
WITH read only;
N.B. A customer record can have more than 1 phone number and the same phone number can exist on more than 1 customer record.
To begin with a remark: This only works if the country is stored elsewhere for the record and there are no telephone numbers without an area code. Otherwise one would not be able to reconstruct the complete phone number again.
Then: How are country codes represented in your data? Is it always +44 or can it be 0044? Be careful here. Especially don't remove a single zero (assuming it's an area code), when it's actually the first of two zeros representing the country code :-)
Then: You need a list of all country codes. Let's take for example +1441441441. Where does the country code end? (Solution: +1441 is Bermuda.)
As to "no spaces" and "no special characters" you can solve this best with regexp_replace.
So all in all not so simple a task as you obviously expected it to be. (But not too hard to do either.)
I would use PL/SQL for this.
Hope my hints help you. Good luck.
EDIT: Here is what is needed. I still think a PL/SQL function will be best here.
Make sure your DIALCODE_TAB contains all country codes necessary.
1. Trim the phone number.
2. Then check if it starts with a country identifier (+, 00).
2.1. If so: remove that. Remove all non-digits. Look up the country code in your table and remove it.
2.2. If not: check if it starts with an area identifier (0).
2.2.1. If so: remove it.
2.2.2. In any case: remove all non-digits.
That should do it, provided the numbers are valid. In Germany sometimes people write +49(0)40-123456, which is not valid, because one either uses a country code or an area code, not both in the same number. The (0) would have to be removed to make the number valid.
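The steps above can be sketched as follows. This is a Python illustration rather than the PL/SQL function suggested here, and it rests on assumptions: DIAL_CODES is a hypothetical stand-in for the contents of DIALCODE_TAB, normalize_phone is a made-up name, and non-digits are stripped before the leading-zero check so that inputs such as "(03069) 990927" are handled. It does not repair the invalid +49(0)40 style.

```python
import re

# Hypothetical stand-in for the country codes stored in DIALCODE_TAB.
DIAL_CODES = {"1", "44", "49", "358", "1441"}

def normalize_phone(raw: str) -> str:
    """Strip the country code or leading area-code zero, plus all non-digits."""
    number = raw.strip()
    international = number.startswith("+") or number.startswith("00")
    if number.startswith("00"):
        number = number[2:]
    digits = re.sub(r"\D", "", number)
    if international:
        # Try the longest dial codes first so 1441 wins over 1.
        for code in sorted(DIAL_CODES, key=len, reverse=True):
            if digits.startswith(code):
                return digits[len(code):]
        return digits
    # National format: drop one leading area-code zero if present.
    return digits[1:] if digits.startswith("0") else digits

for sample in ["(03069) 990927", "+44 1234 567890", "0044 1234 567890", "+1441441441"]:
    print(sample, "->", normalize_phone(sample))
```

Matching the longest code first is what resolves the +1441441441 ambiguity mentioned earlier; with an incomplete code list the function falls back to returning all digits.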
SELECT LTRIM(REGEXP_REPLACE(
REGEXP_REPLACE('+44(03069) 990927',
'(\+).([[:digit:]])+'), -- to strip off country code
'[^[:alnum:]]'), -- strip non-alphanumeric characters (use [^[:digit:]] to keep digits only)
'0') -- Remove preceding Zero
FROM DUAL;
This won't work for +44990927 (that is, when nothing separates the country code from the rest of the number) or when the country code doesn't start with +.
SQL Fiddle
Oracle 11g R2 Schema Setup:
CREATE TABLE phone_numbers ( phone_number ) AS
SELECT '(03069) 990927' FROM DUAL
UNION ALL SELECT '+44 1234 567890' FROM DUAL
UNION ALL SELECT '+44(0)1234 567890' FROM DUAL
UNION ALL SELECT '+44(012) 34-567-890' FROM DUAL
UNION ALL SELECT '+44-1234-567-890' FROM DUAL
UNION ALL SELECT '+358-1234567890' FROM DUAL;
Query 1:
If you are just dealing with +44 international dialling codes then you could:
use ^\+44|\D to strip the +44 international code and all non-digit characters; then
use ^0 to strip a leading zero if it's present.
Like this:
SELECT REGEXP_REPLACE(
REGEXP_REPLACE(
phone_number,
'^\+44|\D',
''
),
'^0', '' ) AS phone_number
FROM phone_numbers
Results:
| PHONE_NUMBER |
|---------------|
| 3069990927 |
| 1234567890 |
| 1234567890 |
| 1234567890 |
| 1234567890 |
| 3581234567890 |
(You can see it doesn't work for the final number with a +358 international code.)
Query 2:
This can be simplified into a single regular expression (that's slightly less readable):
SELECT REGEXP_REPLACE(
phone_number,
'^(\+44)?\D*0?|\D',
''
) AS phone_number
FROM phone_numbers
Results:
| PHONE_NUMBER |
|---------------|
| 3069990927 |
| 1234567890 |
| 1234567890 |
| 1234567890 |
| 1234567890 |
| 3581234567890 |
Query 3:
If you want to deal with multiple international dialling codes then you will need to know which ones are valid (see http://en.wikipedia.org/wiki/List_of_country_calling_codes for a list).
This is an example of a regular expression which will strip out valid international dialling codes beginning with +3, +4 or +5 (I'll leave all the other dialling codes for you to code up):
SELECT REGEXP_REPLACE(
phone_number,
'^(\+(3[0123469]|3[57]\d|38[01256789]|4[013456789]|42[013]|5[09]\d|5[12345678]))?\D*0?|\D',
''
) AS phone_number
FROM phone_numbers
Results:
| PHONE_NUMBER |
|--------------|
| 3069990927 |
| 1234567890 |
| 1234567890 |
| 1234567890 |
| 1234567890 |
| 1234567890 |
If the + at the start of the international dialling code is optional then just replace \+ (near the start of the regular expression) with \+?.
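To experiment with these patterns without an Oracle instance, the first two queries can be approximated with Python's re module, as in the sketch below. Python's regex engine is not Oracle's, so this is only a convenience for exploring the patterns; the function names are made up, and the sample numbers are the ones from the schema setup above.

```python
import re

phone_numbers = [
    "(03069) 990927",
    "+44 1234 567890",
    "+44(0)1234 567890",
    "+44(012) 34-567-890",
    "+44-1234-567-890",
    "+358-1234567890",
]

def two_step(number: str) -> str:
    """Query 1: strip +44 and non-digits, then one leading zero."""
    return re.sub(r"^0", "", re.sub(r"^\+44|\D", "", number))

def single_pass(number: str) -> str:
    """Query 2: the same idea folded into one pattern."""
    return re.sub(r"^(\+44)?\D*0?|\D", "", number)

two_step_results = [two_step(n) for n in phone_numbers]
single_pass_results = [single_pass(n) for n in phone_numbers]
print(two_step_results)
print(single_pass_results)
```

Both functions give the same values as the result tables for Query 1 and Query 2, including the unstripped 358 prefix on the last number.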