PL/SQL to find Special Characters in multiple columns and tables - sql

I am trying to write a script, using variables, that locates any special characters that may exist in a column of data, other than period, dash, or underscore.
My Data - Employees table:
---------------------------------------------------------
ID | LASTFIRST | LAST_NAME | FIRST_NAME | MIDDLE_NAME
---------------------------------------------------------
57 | Miller, Bob | Miller | &^$#*)er | NULL
58 | Smith, Tom | Smith | Tom | B
59 | Perry, Pat | Perry | P. | Andrew
My Script:
VAR spchars VARCHAR
spchars := '!#$%&()*+/:;<=>?#[\\\]^`{}|~'
select *
from (select dcid, LastFirst, Last_Name, First_Name, middle_name,
CASE WHEN REGEXP_LIKE(First_Name, '[ || spchars || ]*$' )
THEN '0' ELSE '1' END AS FNSPC
from employees)
where FNSPC = '0';
/
And all rows are returned.
Any idea what I am doing wrong here? I want to only select Bob Miller's row.

REGEXP, Schmegexp! ;-)
select * from employees
where translate (first_name, 'x!#$%&()*+/:;<=>?#[\]^`{}|~', 'x') != first_name;
That translates all the special characters to nothing, i.e. removes them from the string - hence changing the string value.
The 'x' is just a trick: in Oracle, translate returns NULL if the 3rd parameter is null (an empty string counts as null), so at least one character has to be mapped to itself.
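As for why the original query returns every row: the `||` concatenation sits inside the string literal, so the variable is never spliced in, and the trailing `*` lets the character class match zero characters at the end of any string. The sketch below reproduces both points outside the database, using Python's `re` and `str.translate` as stand-ins for REGEXP_LIKE and TRANSLATE. The sample names come from the question; the helper name `has_special` is made up for illustration.

```python
import re

# Special characters taken from the answer's TRANSLATE call.
SPECIALS = "!#$%&()*+/:;<=>?[\\]^`{}|~"

def has_special(value):
    """Mirror of translate(col, 'x' || specials, 'x') != col: strip, then compare."""
    stripped = value.translate({ord(ch): None for ch in SPECIALS})
    return stripped != value

names = ["&^$#*)er", "Tom", "P."]
flagged = [n for n in names if has_special(n)]

# A class followed by "*$" also matches the empty string at the end of any value,
# which is why a pattern of that shape selects every row.
star_pattern = re.compile("[" + re.escape(SPECIALS) + "]*$")
matches_everything = all(star_pattern.search(n) for n in names)

# Dropping "*$" requires at least one special character to be present.
plain_pattern = re.compile("[" + re.escape(SPECIALS) + "]")
regex_flagged = [n for n in names if plain_pattern.search(n)]

print(flagged, matches_everything, regex_flagged)
```

Only the first name is flagged by either approach, while the `*$` variant accepts all three values.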

Related

how to loop an array in string in a where clause

I have an information table with a column that stores an array as a comma-separated string. The number of elements is unknown and may be 0. How can I use it in a WHERE clause in PostgreSQL?
* hospital_information_table
| ID | main_name | alternative_name |
| --- | ---------- | ----------------- |
| 111 | 'abc' | 'abe, abx' |
| 222 | 'bbc' | '' |
| 333 | 'cbc' | 'cbe,cbd,cbf,cbg' |
* record
| ID | name | hospital_id |
| --- | ------- | ------------ |
| 1 | 'abc-1' | |
| 2 | 'bbe+2' | |
| 3 | 'cbf*3' | |
For example, this column holds the alternative names of hospitals, say 'abc,abd,abe,abf' as the names and '111' as the ID. I have a record with the hospital name 'cbf*3' ('3' is the department name) and I would like to find its ID. How can I check the names in 'cbe,cbd,cbf,cbg' one by one and get the ID '333'?
--update--
In the example record table I used '-', '*', and '+' to show that the record name cannot be split on a fixed pattern. What I can guarantee is that one of the alternative names appears in the record name as a substring, e.g. 'cbf' in 'cbf*3'. I would like to check every name: is 'abe' in 'cbf*3'? No. Is 'abx' in 'cbf*3'? No. Then move on to the next row, and so on.
--update--
Thanks for the answers! They are great!
For more details, the original dataset is not in an alphabetic language. The text in the record name is not separable; it is really hard to find one or more separators. Therefore, solutions that rely on a regex like '[-*+]' will not work here.
Thanks in advance!
You could use regexp_split_to_array to convert the comma-delimited string to a proper array, and then use the ANY operator to search inside it:
SELECT r.*, h.id
FROM record r
JOIN hospital_information h ON
SPLIT_PART(r.name, '-', 1) = ANY(REGEXP_SPLIT_TO_ARRAY(h.alternative_name, '\s*,\s*'))
SQLFiddle demo
Substring can be used with a regular expression to get the hospital name from the record's name.
And String_to_array can transform a CSV string to an array.
SELECT
r.id as record_id
, r.name as record_name
, h.id as hospital_id
FROM record r
LEFT JOIN hospital_information h
ON SUBSTRING(r.name from '^(.*)[+*\-]\w+$') = ANY(STRING_TO_ARRAY(h.alternative_name,',')||h.main_name)
WHERE r.hospital_id IS NULL;
| record_id | record_name | hospital_id |
| --- | --- | --- |
| 1 | abc-1 | 111 |
| 2 | bbe+2 | 222 |
| 3 | cbf*3 | 333 |
Demo on db<>fiddle here
Btw, text[] can be used as a column datatype in a table, which avoids storing the names as a delimited string.
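The question's update rules out splitting the record name on a separator and asks for a substring test instead. One way to read that requirement, sketched in Python rather than SQL: split each alternative_name string on commas, trim the pieces, and return the first hospital whose main or alternative name occurs inside the record name. The rows are copied from the question; the function name `find_hospital_id` and the "first match wins" rule are assumptions made for illustration.

```python
# Rows copied from the question's hospital_information_table.
hospitals = [
    (111, "abc", "abe, abx"),
    (222, "bbc", ""),
    (333, "cbc", "cbe,cbd,cbf,cbg"),
]

def find_hospital_id(record_name):
    """Return the first hospital ID whose main or alternative name occurs in record_name."""
    for hospital_id, main_name, alternative_name in hospitals:
        # Split the CSV string, trim stray spaces, and drop empty entries.
        candidates = [main_name] + [
            part.strip() for part in alternative_name.split(",") if part.strip()
        ]
        if any(candidate in record_name for candidate in candidates):
            return hospital_id
    return None

print(find_hospital_id("cbf*3"))
```

With the question's rows, 'cbf*3' resolves to 333 and 'abc-1' to 111, while 'bbe+2' finds no hospital because no listed name is contained in it.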

Similarity search for name surname

I have a column name which contains name surname (name, space, surname) and I would like to search it based on name and surname, but I would also like to match cases where people:
accidentally inserted surname name in a different order
misspelled the name or surname by 1-2 characters.
You should read about the pg_trgm extension and its function similarity(). A few examples below.
Example data:
create table my_table(id serial primary key, name text);
insert into my_table (name) values
('John Wilcock'),
('Henry Brown'),
('Jerry Newcombe');
create extension if not exists pg_trgm; -- install the extension
Example 1:
select *,
similarity(name, 'john wilcock') as "john wilcock",
similarity(name, 'wilcock john') as "wilcock john"
from my_table;
id | name | john wilcock | wilcock john
----+----------------+--------------+--------------
1 | John Wilcock | 1 | 1
2 | Henry Brown | 0 | 0
3 | Jerry Newcombe | 0.037037 | 0.037037
(3 rows)
Example 2:
select *,
similarity(name, 'henry brwn') as "henry brwn",
similarity(name, 'brovn henry') as "brovn henry"
from my_table;
id | name | henry brwn | brovn henry
----+----------------+------------+-------------
1 | John Wilcock | 0 | 0
2 | Henry Brown | 0.642857 | 0.6
3 | Jerry Newcombe | 0.04 | 0.0384615
(3 rows)
Example 3:
select *
from my_table
where similarity(name, 'J Newcombe') >= 0.6;
id | name
----+----------------
3 | Jerry Newcombe
(1 row)
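To see where those numbers come from, here is a rough Python re-creation of the trigram comparison, assuming the behaviour described in the pg_trgm documentation: text is lowercased, split into alphanumeric words, each word is padded with two leading spaces and one trailing space, and the score is the number of shared trigrams divided by the number of distinct trigrams in both strings. The function names are hypothetical and this is a sketch for intuition, not the extension's implementation.

```python
import re

def trigrams(text):
    """Collect the trigram set of text, padding each word with two leading and one trailing space."""
    result = set()
    for word in re.findall(r"[^\W_]+", text.lower()):
        padded = "  " + word + " "
        result.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return result

def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams of both strings."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

print(round(similarity("Henry Brown", "henry brwn"), 6))
print(round(similarity("Henry Brown", "brovn henry"), 6))
print(similarity("John Wilcock", "wilcock john"))
```

Because the trigrams are collected per word into a set, swapping the name and surname does not change the score, which is why 'wilcock john' still scores 1 against 'John Wilcock'.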
To counter the exchanged parts of the name you could use split_part() to split the name in its two parts and compare both of them, something similar to the following:
SELECT *
FROM person
WHERE split_part(name, ' ', 1) IN ('<given_name_searched_for>',
'<surname_searched_for>')
OR split_part(name, ' ', 2) IN ('<given_name_searched_for>',
'<surname_searched_for>');
Or have a look at the other string functions and operators; there are variants of the split functions that use regular expressions, for example.
Are there names like 'John F. Kennedy', that is, with more than two tokens? Are there names with more than one contiguous space? Bear in mind that such cases need additional handling. (Such things can get hairy. If possible, consider revising your design and using a separate column for the surname.)
For the similarity part: PostgreSQL provides some modules, that might be useful here:
fuzzystrmatch
pg_trgm

Query WHERE Only Alphabetic Characters

I am trying to filter out data in my Excel sheet of customers for my company.
The three fields I need to filter by are FIRST_NAME, LAST_NAME, and COMPANY_NAME.
The rules are as follows:
FIRST_NAME AND LAST_NAME must NOT be NULL
FIRST_NAME AND LAST_NAME must be only alphabetic
The above rules are irrelevant IF COMPANY_NAME is NOT NULL
So, just to reiterate to be clear.. A customer must have a FIRST_NAME AND a LAST_NAME (They cannot be missing one or both), BUT, if they have a COMPANY_NAME they are allowed to not have a FIRST_NAME and/or LAST_NAME.
Here's some example data and if they should stay in the data or not:
FIRST_NAME | LAST_NAME | COMPANY_NAME | Good customer?
-----------|-----------|--------------|--------------------------------
Alex | Goodman | AG Inc. | Yes - All are filled out
John | Awesome | | Yes - First and last are fine
Cindy | | Cindy Corp. | Yes - Company is filled out
| | Blank Spa | Yes - Company is filled out
| | | No - Nothing is filled out
Gordon | Mang#2 | | No - Last contains non-alphabet
Jesse#5 | Levvitt | JL Inc. | Yes - Company is filled out
Holly | | | No - No last or company names
Here is the query (With some fields in the SELECT clause removed):
SELECT VR_CUSTOMERS.CUSTOMER_ID, VR_CUSTOMERS.FIRST_NAME, VR_CUSTOMERS.LAST_NAME, VR_CUSTOMERS.COMPANY_NAME, ...
FROM DEV.VR_CUSTOMERS VR_CUSTOMERS
WHERE (
LENGTH(NAME)>4 AND
(UPPER(NAME) NOT LIKE UPPER('%delete%')) AND
(COMPANY_NAME IS NOT NULL OR (COMPANY_NAME IS NULL AND FIRST_NAME IS NOT NULL AND LAST_NAME IS NOT NULL AND FIRST_NAME LIKE '%^[A-z]+$%' AND LAST_NAME LIKE '%^[A-z]+$%'))
)
I've tried as well the regex of '%[^a-z]%'. I've tried RLIKE and REGEXP, instead of LIKE, and those did not seem to work either.
With the above query, the results only show records with a COMPANY_NAME.
Fixed the issue using REGEXP_LIKE and the regex ^[A-Za-z]+$ (the range A-z would also accept the punctuation characters that sit between Z and a in ASCII, such as _ and ^).
Here is the WHERE clause after this fix:
WHERE (
LENGTH(NAME)>4 AND
(UPPER(NAME) NOT LIKE UPPER('%delete%')) AND
(COMPANY_NAME IS NOT NULL OR (COMPANY_NAME IS NULL AND REGEXP_LIKE(FIRST_NAME, '^[A-Za-z]+$') AND REGEXP_LIKE(LAST_NAME, '^[A-Za-z]+$')))
)
It appears you're using MySQL given your mention of RLIKE and REGEXP. In that case, try this WHERE clause, that uses the regular expression character class 'alpha':
WHERE
COMPANY_NAME is not null -- COMPANY_NAME being present is the higher priority pass condition
or ( -- but if COMPANY_NAME is not present, then the following conditions must be satisfied
FIRST_NAME is not null
and FIRST_NAME REGEXP '^[[:alpha:]]+$'
and LAST_NAME is not null
and LAST_NAME REGEXP '^[[:alpha:]]+$'
)
Bear in mind that the not null check is redundant given the regular expression (matching a NULL value yields NULL, which the WHERE clause does not accept), so the WHERE clause would simplify itself to:
WHERE
COMPANY_NAME is not null -- COMPANY_NAME being present is the higher priority pass condition
or ( -- but if COMPANY_NAME is not present, then the following conditions must be satisfied
FIRST_NAME REGEXP '^[[:alpha:]]+$'
and LAST_NAME REGEXP '^[[:alpha:]]+$'
)
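The rule from the question can be checked against its own sample rows outside the database. The sketch below is a plain Python restatement, with `None` standing in for SQL NULL and the function name `is_good_customer` invented for illustration. It also shows two details that are easy to miss: the pattern has to cover the whole value (anchors in SQL, `fullmatch` here), and the range `A-z` accepts an underscore because it spans the ASCII characters between `Z` and `a`.

```python
import re

ALPHA_ONLY = re.compile(r"[A-Za-z]+")

def is_good_customer(first_name, last_name, company_name):
    """Apply the question's rule; None stands in for SQL NULL."""
    if company_name is not None:
        return True
    return (
        first_name is not None
        and last_name is not None
        and ALPHA_ONLY.fullmatch(first_name) is not None
        and ALPHA_ONLY.fullmatch(last_name) is not None
    )

# Sample rows from the question's table.
rows = [
    ("Alex", "Goodman", "AG Inc."),
    ("John", "Awesome", None),
    ("Cindy", None, "Cindy Corp."),
    (None, None, "Blank Spa"),
    (None, None, None),
    ("Gordon", "Mang#2", None),
    ("Jesse#5", "Levvitt", "JL Inc."),
    ("Holly", None, None),
]
results = [is_good_customer(*row) for row in rows]

# An unanchored search accepts "Mang#2"; the A-z range accepts "a_b".
unanchored_accepts = ALPHA_ONLY.search("Mang#2") is not None
wide_range_accepts = re.fullmatch(r"[A-z]+", "a_b") is not None

print(results, unanchored_accepts, wide_range_accepts)
```

The `results` list lines up with the "Good customer?" column of the question's table, row by row.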

Specify Which Column Comes First SQL

I am processing a large list of church members in order to send them a letter. We want the letter to say "Dear John & Jane Smith". We will use Word to do the mail merge from an Excel sheet. The important thing is the male name has to always come first.
Each individual has their own row in the table I am using. They have a unique ID as well as a family ID. I am using that family ID to put families together on the same row. Currently I have the male name and the female name separated using MAX(CASE WHEN) in order to specify what goes where. It looks something like this:
+-----------+-----------+-------------+-----------+
| family id | male name | female name | last name |
+-----------+-----------+-------------+-----------+
| 1234 | john | jane | doe |
| 1235 | bob | cindy | smith |
| 1236 | NULL | susan | jones |
| 1237 | jim | NULL | taylor |
+-----------+-----------+-------------+-----------+
But I run into a problem when the family only has one member.
Here's a part of the query I have:
SELECT
fm.family_id AS 'Family ID',
MAX(CASE WHEN PB.gender like 'm' and FM.role_luid=29 THEN PB.nick_name END)
AS 'Male Name',
MAX(CASE WHEN PB.gender like 'f' and FM.role_luid=29 THEN PB.nick_name END)
AS 'Female Name',
PB.last_name AS 'Last Name',
FROM core_family F
I was thinking that I need to combine rows using STUFF or something like that, but I'd need some way of specifying which column comes first so that the male name always comes first. Essentially, as stated above, I need the letter to read "Dear John & Jane Smith" for families with two people and "Dear John Smith" for families with one person. So I am hoping my results might look like:
+-----------+--------------+-----------+
| family id | First name | last name |
+-----------+--------------+-----------+
| 1234 | john & jane | doe |
| 1235 | bob & cindy | smith |
| 1236 | susan | jones |
| 1237 | jim | taylor |
+-----------+--------------+-----------+
You can use your intermediate table (assuming you don't have 3 names for a family id).
From the table you indicated use:
select
id
, coalesce(male_name+' & '+female_name,male_name, female_name)
, last_name
from F;
Here is an example with your data
Basically, if you concatenate with + in SQL Server and any operand is NULL, the result is NULL. So if either the male or the female name is NULL, the first expression is NULL. COALESCE moves on to the next value whenever it sees NULL. This way you get either a pair joined with '&' or a single name for each family.
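The same fall-through logic can be traced step by step in a few lines of Python, with `None` playing the role of NULL. The helpers `concat`, `coalesce`, and `greeting_names` are stand-ins written for this illustration, not SQL Server functions; the name pairs come from the question's table.

```python
def concat(*parts):
    """Concatenate like T-SQL '+': the result is None (NULL) if any part is None."""
    if any(part is None for part in parts):
        return None
    return "".join(parts)

def coalesce(*values):
    """Return the first value that is not None, like SQL COALESCE."""
    return next((v for v in values if v is not None), None)

def greeting_names(male_name, female_name):
    """Pair joined with ' & ' when both exist, otherwise whichever name is present."""
    return coalesce(concat(male_name, " & ", female_name), male_name, female_name)

families = [("john", "jane"), ("bob", "cindy"), (None, "susan"), ("jim", None)]
first_names = [greeting_names(m, f) for m, f in families]
print(first_names)
```

Because the male name is always the first operand of the concatenation, it always comes first whenever both names exist.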
I've created some test data. This technique works with the test data.
CREATE TABLE #Temp (FamID INT,
MaleName VARCHAR(20),
FemaleName VARCHAR(20),
LName VARCHAR(20))
INSERT #Temp
VALUES (1234, 'John' ,'Jane' , 'Doe' ),
(1235, 'Bob' , 'Cindy' , 'Smith'),
(1236 , NULL , 'Susan' , 'Jones'),
(1237 , 'Jim' , NULL , 'Taylor')
Here is your query.
SELECT FamID,
ISNULL(MaleName+' ','') +
CASE WHEN MaleName IS NULL OR FemaleName IS NULL THEN '' ELSE 'and ' END+
ISNULL(FemaleName,'') AS FirstName,
LName
FROM #Temp
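You can use it like this. COALESCE is needed because + returns NULL when either name is missing:
SELECT
fm.family_id AS 'Family ID',
COALESCE(
MAX(CASE WHEN PB.gender like 'm' and FM.role_luid=29 THEN PB.nick_name END)
+ ' & ' +
MAX(CASE WHEN PB.gender like 'f' and FM.role_luid=29 THEN PB.nick_name END),
MAX(CASE WHEN PB.gender like 'm' and FM.role_luid=29 THEN PB.nick_name END),
MAX(CASE WHEN PB.gender like 'f' and FM.role_luid=29 THEN PB.nick_name END)
) AS 'First Name',
PB.last_name AS 'Last Name'
FROM core_family F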

How to get initials easily out of text field using Postgres

I am using Postgres version 9.4 and I have a full_name field in a table.
In some cases, I want to put initials instead of the full_name of the person in my table.
Something like:
Name | Initials
------------------------
Joe Blow | J. B.
Phil Smith | P. S.
The full_name field is a string value (obviously) and I think the best way to go about this is to split the string into an array on each space, i.e.:
select full_name, string_to_array(full_name,' ') initials
from my_table
This produces the following result-set:
Eric A. Korver;{Eric,A.,Korver}
Ignacio Bueno;{Ignacio,Bueno}
Igmar Mendoza;{Igmar,Mendoza}
Now, the only thing I am missing is how to loop through each array element and pull the 1st character out of it. I will end up using substring() to get the initial character of each element - however I am just stuck on how to loop through them on-the-fly..
Anybody have a simple way to go about this?
Use unnest with string_agg:
select full_name, string_agg(substr(initials, 1,1)||'.', ' ') initials
from (
select full_name, unnest(string_to_array(full_name,' ')) initials
from my_table
) sub
group by 1;
full_name | initials
------------------------+-------------
Phil Smith | P. S.
Joe Blow | J. B.
Jose Maria Allan Pride | J. M. A. P.
Eric A. Korver | E. A. K.
(4 rows)
In Postgres 14+ you can replace unnest(string_to_array(...)) with string_to_table(...).
Test it in db<>fiddle.
You can also create a helper function for this, in case you want to use similar logic in multiple queries:
--
-- Function to extract a person's initials from the full name.
--
DROP FUNCTION IF EXISTS get_name_initials(TEXT);
CREATE OR REPLACE FUNCTION get_name_initials(full_name TEXT)
RETURNS TEXT AS $$
DECLARE
result TEXT :='';
part VARCHAR :='';
BEGIN
FOREACH part IN ARRAY string_to_array($1, ' ') LOOP
result := result || substr(part, 1, 1) || '. ';
END LOOP;
RETURN rtrim(result); -- drop the trailing space
END;
$$ LANGUAGE plpgsql;
Now you can simply use this function to get the initials like this.
SELECT full_name, get_name_initials(full_name) as initials
FROM my_table;
SELECT get_name_initials('Phil Smith'); -- Returns P. S.
SELECT get_name_initials('Joe Blow'); -- Returns J. B.
SqlFiddleDemo
WITH add_id AS (
SELECT n.*, row_number() OVER (ORDER BY "Name") AS id
FROM names n
),
split_names AS (
SELECT id, regexp_split_to_table("Name", E'\\s+') AS single_name
FROM add_id
),
initials AS (
SELECT id, left(single_name, 1) || '.' AS initial
FROM split_names
),
final AS (
SELECT id, string_agg(initial, ' ')
FROM initials
GROUP BY id
)
SELECT a.*, f.*
FROM add_id a
JOIN final f USING (id)
For debugging, I included the expected Initials column to show that it matches the string_agg result:
| Name | Initials | id | id | string_agg |
|----------------|----------|----|----|------------|
| Eric A. Korver | E. A. K. | 1 | 1 | E. A. K. |
| Igmar Mendoza | I. M. | 2 | 2 | I. M. |
| Ignacio Bueno | I. B. | 3 | 3 | I. B. |
| Joe Blow | J. B. | 4 | 4 | J. B. |
| Phil Smith | P. S. | 5 | 5 | P. S. |
After some work I got a compact version SqlFiddleDemo
SELECT "Name", string_agg(left(single_name, 1) || '.', '') AS Initials
FROM (
SELECT
"Name",
regexp_split_to_table("Name", E'\\s+') AS single_name
FROM names
) split_names
GROUP BY "Name"
OUTPUT
| Name | initials |
|----------------|----------|
| Eric A. Korver | E.K.A. |
| Igmar Mendoza | M.I. |
| Ignacio Bueno | I.B. |
| Joe Blow | B.J. |
| Phil Smith | P.S. |
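Some rows in the compact query's output come back out of order (for example `B.J.` for Joe Blow), most likely because string_agg without an ORDER BY makes no promise about the order in which rows are aggregated. For comparison, here is the intended transformation as a small Python sketch that keeps the words in their original order; the function name `initials` is made up for illustration, and the sample names come from the question.

```python
def initials(full_name):
    """Build 'J. B.' style initials, keeping the words in their original order."""
    # str.split() with no argument also collapses runs of whitespace.
    return " ".join(word[0] + "." for word in full_name.split())

for name in ["Joe Blow", "Phil Smith", "Eric A. Korver"]:
    print(name, "->", initials(name))
```

A word that is already an initial, such as "A." in "Eric A. Korver", contributes its first letter like any other word.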