RegEx with spaces and delimiters - sql

I have two columns with the following data:
Column 1: BIG123 - Telecommunications (John Barrot)
Column 2: 7 Congressional 1 - Toward
The format is always the same, with spaces and "-" as the delimiter in each column, but the organization, names, and leading code can be longer or shorter than shown here (instead of Telecommunications it can be CEO, or instead of John Barrot it can be Guy Rodriguez, etc.). I need to extract the following:
(Column name first, then the value to extract)
Organization: Telecommunications
Supervisor: John Barrot
Profile: Congressional 1 - Toward
I have been using the following cheat sheet but I am still having issues extracting: https://cheatography.com/davechild/cheat-sheets/regular-expressions/
I have tried regex_extract(column1, [A-Z][a-z]) and I only get the first two letters of column 1 after the "-".
Any help would be great.
Thanks,
DW

With your example, try the following:
with sample_data as (
  select 'BIG123 - Telecommunications (John Barrot)' as COLUMN_1,
         '7 Congressional 1 - Toward' as COLUMN_2
)
select
  -- first run of non-space characters after "- "
  regexp_extract(COLUMN_1, r'.+-\s(\S+)') as Organization,
  -- text after "(" up to the last word character
  regexp_extract(COLUMN_1, r'.+\((.+\w)') as Supervisor,
  -- everything after the leading number and one whitespace character
  regexp_extract(COLUMN_2, r'\d+\s(.+)') as Profile
from sample_data
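To check what each of the three patterns captures, here is a minimal sketch using Python's `re` module rather than the SQL engine. The helper `extract` and the multi-word pattern are my own illustrations, not part of the answer above; they assume the organization is always followed by " (".

```python
import re

column_1 = "BIG123 - Telecommunications (John Barrot)"
column_2 = "7 Congressional 1 - Toward"


def extract(pattern, text):
    """Return the first capture group of the first match, or None."""
    match = re.search(pattern, text)
    return match.group(1) if match else None


# The three patterns from the answer, unchanged.
organization = extract(r".+-\s(\S+)", column_1)
supervisor = extract(r".+\((.+\w)", column_1)
profile = extract(r"\d+\s(.+)", column_2)

print(organization)  # Telecommunications
print(supervisor)    # John Barrot
print(profile)       # Congressional 1 - Toward

# \S+ stops at the first space, so a multi-word organization is cut short.
# Hypothetical alternative: capture lazily up to the opening parenthesis.
multi_word = "BIG123 - Chief Executive Office (Guy Rodriguez)"
first_word_only = extract(r".+-\s(\S+)", multi_word)
whole_name = extract(r"-\s(.+?)\s*\(", multi_word)
print(first_word_only)  # Chief
print(whole_name)       # Chief Executive Office
```

If organizations in the real data can contain spaces, the second pattern may be the safer starting point, but it should be checked against the regex dialect your database uses.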

SQL query to get the first letter of each word and separate it by a dot and a space

I have never really used SQL much, but recent changes due to working from home are forcing me to gain some knowledge of it. I have been doing fine so far, but I am now running into a problem that I can't seem to find a solution for.
I have an Excel sheet that pulls customer information through a SQL query which is run by VBA code.
What I first needed to do is to get a full name from a customer and input this into the sheet. This works fine. I am using the following query for this:
Select concat(concat(Customer_First_Names,' '), Customer_Last_Name) FROM CustomerInformationTable where Customer_Number = &&1
This gives me the full name of a customer and spaces in between the first and last name and in between the names (the full first names are already spaced in between in the table).
Now I have another request: to retrieve not the full first names and last name of a customer, but their initials and the last name.
For example:
Barack Hussein Obama
Would become
B. H. Obama
I need to do 2 things for this:
1. I need to change my query to retrieve only the initial of each first name. Like I said, all full first names (even if a customer has more than one first name) are located in the column Customer_First_Names.
2. I need to add a dot and a space after each initial.
How would I go about this?
I have been thinking about using SUBSTRING, but I am struggling with how to do this if there is more than one first name.
So this is not going to work:
Select concat(substr(Customer_First_Names, 1, 1), '. ') from CustomerInformationTable where Customer_Number = &&1
My apologies if this has already been asked on the board; I looked but did not find a suitable solution.
Assuming you don't want to see 2 dots after someone who has just one first name (like J.. Smith), then here's a solution that works in postgres. Not sure what your db is, so you may need to adjust as needed.
The 'with' query is splitting apart the first names, limiting to two.
The 'case' statement then checks whether the person has a second first name. If not, only the first initial is provided, followed by a dot. Otherwise, both initials are each followed by a dot. In the final result, all initials and names are separated by a space (like T. R. Smith).
So, a table looking like this:
cid | first | last
1 | JAKE | SMITH
2 | TERREL HOWARD | WILLIAMS
3 | PHOEBE M | KATES
Will produce the following results with the query below.
cid | cust_name
1 | J. SMITH
2 | T. H. WILLIAMS
3 | P. M. KATES
with first_names as (
  select distinct customer_number,
         split_part(customer_first_names, ' ', 1) as first1,
         split_part(customer_first_names, ' ', 2) as first2
  from CustomerInformationTable
)
select distinct a.customer_number,
       case
         when fn.first2 = '' then substring(fn.first1, 1, 1) || '.'
         else substring(fn.first1, 1, 1) || '. ' || substring(fn.first2, 1, 1) || '.'
       end
       || ' ' || a.customer_last_name as cust_name
from CustomerInformationTable a
join first_names fn on fn.customer_number = a.customer_number
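The query above handles at most two first names. As a rough sketch of the same rule for any number of first names, here is the logic in Python. The function name `initials_name` is hypothetical, and the sketch assumes first names are separated by whitespace, as in the question.

```python
def initials_name(first_names: str, last_name: str) -> str:
    """Turn 'Barack Hussein' + 'Obama' into 'B. H. Obama'."""
    # One initial plus a dot per first name; split() ignores repeated spaces.
    initials = [part[0] + "." for part in first_names.split()]
    # Join initials and the last name with single spaces, so a customer
    # with one first name never ends up with a doubled dot.
    return " ".join(initials + [last_name])


print(initials_name("Barack Hussein", "Obama"))    # B. H. Obama
print(initials_name("JAKE", "SMITH"))              # J. SMITH
print(initials_name("TERREL HOWARD", "WILLIAMS"))  # T. H. WILLIAMS
```

If the formatting can be done on the client side (for example in the VBA code that calls the query), this kind of split-and-join may be simpler than expressing it in SQL.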

How to match entries in SQL based on their ending letter?

I'm trying to match entries from two tables so that each row of the result consists of two words that end in the same letter. Each table has a single column named word. Table 1 contains, in order: Dog, High, It, Weeks. Table 2 contains: Bat, Is, Laugh, Sing. I need to select from both tables and match the words so that the rows are: Dog | Sing, High | Laugh, It | Bat, Weeks | Is
The screenshot is what I have so far for my SQL statement. I'm still early on in learning SQL so any info to help on this would be appreciated.
I recommend reading up on SUBSTR() to understand why the code below works: https://docs.oracle.com/cd/B28359_01/olap.111/b28126/dml_functions_2101.htm#OLADM679
SELECT
a.word
, b.word
FROM sec1313_words1 a
JOIN sec1313_words2 b
ON SUBSTR(b.word, -1) = SUBSTR(a.word, -1)
ORDER BY a.word
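To try the join without an Oracle instance, here is a small sketch that runs the same statement against an in-memory SQLite database from Python. Using SQLite is my assumption for the demo; its `SUBSTR` also accepts a negative start position, so `SUBSTR(word, -1)` returns the last character there as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sec1313_words1 (word TEXT);
    CREATE TABLE sec1313_words2 (word TEXT);
    INSERT INTO sec1313_words1 VALUES ('Dog'), ('High'), ('It'), ('Weeks');
    INSERT INTO sec1313_words2 VALUES ('Bat'), ('Is'), ('Laugh'), ('Sing');
""")

# Join on the last character of each word, as in the answer above.
rows = conn.execute("""
    SELECT a.word, b.word
    FROM sec1313_words1 a
    JOIN sec1313_words2 b
      ON SUBSTR(b.word, -1) = SUBSTR(a.word, -1)
    ORDER BY a.word
""").fetchall()

for left, right in rows:
    print(f"{left} | {right}")
```

Note that the comparison is case-sensitive, so a word ending in "G" would not pair with one ending in "g" unless both sides are wrapped in LOWER().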

CASE ReGex with substring

I'm writing a SQL query where I am taking the substring of 2 names (First name/last name) to create an initials column, the data is unstructured to a certain extent (Can't show for GDPR reasons) but where there is a company name it is just in the surname column.
I'm trying to use regex to detect when the existing initials column is 1 letter (i.e. not an initial), and if it is not an initial, run an expression I wrote that already works.
CAST(CASE
WHEN [DATA_TABLE].[INITIALS] = '\d' THEN (CONCAT(substring([DATA_TABLE].[FIRSTNAMES],1,1),substring([DATA_TABLE].[SURNAME],1,1)) AS char) AS INITIALS
ELSE [DATA_TABLE].[INITIALS]
end as char) as INITIALS,
An example of the data format:
First name | Last name | Initials
John | smith | JS
(empty) | Electrical company | E
Sam | Craig | SC
I want the names that are only in the surname column (company names) to remain as they are with no change (i.e. the \d regex). The others should become the substring (1,1) of their first name followed by the substring (1,1) of their last name.

Name Correction

As the wedding season is on, John has been given the work of printing guest names on wedding cards. John has written code to print only those names that start with upper-case alphabets and reject those that start with lower-case alphabets or special characters.
Your job is to do the following:
1. Correct the rejected names (names that start with a lower-case letter or a special character). Change the first letter of a rejected name to upper case; if it starts with a special character, leave it unchanged.
2. Output the corrected names in ascending order.
Table format
Table: person
Field Type
name varchar(20)
Sample
Sample person table
name
mohit
Kunal
manoj
Raj
tanya
#man
Sample output table
name
#man
Manoj
Mohit
Tanya
Solution attempted (in SQL Server 2014):
select name
from person as per
where (left(per.name,1) like '%[^A-Z]%' or left(per.name,1) like '% %')
union
select Upper(left(per.name,1))+right(per.name,len(per.name)-1)
from person as per
where left(per.name,1)<>left(Upper(per.name),1)
collate Latin1_General_CS_AI
order by per.name
The sample test cases pass,
but I still get a wrong answer in a competitive exam.
Please suggest which test case I have not handled.
Since you are only interested in correcting lower case and reporting special characters in the first character position, I would use an ASCII comparison rather than regex.
select name, ascii(left(name,1)),
case
when ascii(left(name,1)) between 97 and 122 then
concat(char(ascii(left(name,1)) - 32),substring(name,2,len(name) -1))
else name
end name
from t
where ascii(left(name,1)) <= 64 or
ascii(left(name,1)) >= 91
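The ASCII-range idea can be checked outside the database. The following Python sketch applies the same rule to the sample table from the question and also does the sort that the task asks for. The function name `corrected_names` is hypothetical, and the sketch assumes plain ASCII names.

```python
def corrected_names(names):
    """Return the rejected names, corrected and sorted ascending."""
    result = []
    for name in names:
        if not name:
            continue  # skip empty strings; nothing to inspect
        code = ord(name[0])
        if 65 <= code <= 90:
            continue  # starts with A-Z: already accepted, not reported
        if 97 <= code <= 122:
            # lower-case a-z: shift the first character to upper case
            result.append(chr(code - 32) + name[1:])
        else:
            result.append(name)  # special character: keep unchanged
    return sorted(result)


sample = ["mohit", "Kunal", "manoj", "Raj", "tanya", "#man"]
print(corrected_names(sample))  # ['#man', 'Manoj', 'Mohit', 'Tanya']
```

The sort here compares by character code, which puts "#" before letters; in SQL the order depends on the collation in use.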

SQL get student with last name 5 characters

I have a table called Students.
This table has two fields (ID, Name)
I need to select all the students whose last name has 5 characters.
For example, say I have two records in this table:
Student 1: ID: 1
Name: Roman Jatt Pearce
Student 2: ID: 2
Name: Matt Crazy
The query I want should return only Matt Crazy, since his last name has 5 characters and Roman Jatt Pearce's doesn't.
Someone told me to use CHARINDEX, but I don't really know how to implement it.
Any suggestions?
This assumes the format of Name is always "First Middle Last", that no individual name contains spaces, and that nothing else such as a generational suffix (Jr., Sr., et al.) is appended:
SELECT *
FROM Students
WHERE CHARINDEX(' ', REVERSE(Name)) = 6
How about
select * from Students where Name like '% _____'
with the underscore repeated five times.
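Here is a small sketch of the LIKE approach, run against an in-memory SQLite database from Python. SQLite and the third row ("Ann Bo Li") are my assumptions for illustration and are not in the question. Because `_` matches any single character, including a space, the plain pattern also accepts a name whose last five characters span two short words; the stricter variant additionally rejects a space among those five characters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Students (ID INTEGER PRIMARY KEY, Name TEXT);
    INSERT INTO Students VALUES
        (1, 'Roman Jatt Pearce'),
        (2, 'Matt Crazy'),
        (3, 'Ann Bo Li');  -- hypothetical extra row, not in the question
""")

# Plain pattern from the answer: a space followed by exactly five characters.
like_only = [row[0] for row in conn.execute(
    "SELECT Name FROM Students WHERE Name LIKE '% _____' ORDER BY ID")]

# Stricter variant: the last five characters must not contain a space.
strict = [row[0] for row in conn.execute(
    "SELECT Name FROM Students "
    "WHERE Name LIKE '% _____' AND SUBSTR(Name, -5) NOT LIKE '% %' "
    "ORDER BY ID")]

print(like_only)  # ['Matt Crazy', 'Ann Bo Li']
print(strict)     # ['Matt Crazy']
```

The CHARINDEX/REVERSE query does not have this issue, since it locates the last space in the name directly.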