Matching pattern in SQL - sql

I have two tables and need to match all records by name. The problem is that one table stores the name as FirstName LastName and the other as LastName FirstName, and I cannot split the names into separate columns because some records have several first names or last names, so I don't know where the first name ends and the last name starts.
E.g. the first table has John Erick Smith, and I need to join all records from the other table where the name is Smith John Erick.
Is there a solution in SQL?

I think you can use string functions to take the part of the string after the last space (in the 'John Erick Smith' type column) as the surname and move it to the front. Then you can compare the strings. That assumes you don't have spaces in surnames.
Here is MSDN article on how to do it.
DECLARE @string nvarchar(max) = 'FirstName SecondName Surname'
-- Surname (the text after the last space), then everything before the last space
SELECT RIGHT(@string, NULLIF(CHARINDEX(' ', REVERSE(@string)), 0) - 1) + ' ' +
REVERSE(RIGHT(REVERSE(@string), LEN(@string) - NULLIF(CHARINDEX(' ', REVERSE(@string)), 0)))
Returns:
Surname FirstName SecondName
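For comparison, the same "move the last word to the front" idea can be sketched outside the database. This is only an illustration in Python, not part of the original answer; the function name surname_first is made up, and it carries the same assumption that surnames contain no spaces.

```python
def surname_first(name: str) -> str:
    """Move the last space-separated word of a name to the front."""
    # rpartition splits on the LAST space: ("John Erick", " ", "Smith").
    head, sep, last = name.strip().rpartition(" ")
    # A single-word name has no space, so there is nothing to reorder.
    return f"{last} {head}" if sep else last


print(surname_first("John Erick Smith"))
```

The reordered value can then be compared directly with the LastName FirstName column of the other table.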

First check whether you have other tables that store "FirstName" and "LastName" as separate columns, which you could use instead of the combined "FirstName LastName" field. Oracle normally has this kind of table for persons/employees, so you may have something like this.
If the "LastName FirstName" data uses a "," (comma) as a separator, you can use a substring to separate the LastName from the FirstName.
Another alternative is to join on IDs (e.g. employee IDs), if they are available.

Related

SQL join table on partial string

I have two tables in a Postgres database:
Table A:
**Middle_name**
John
Joe
Fred
Jim Bob
Paul-John
Table B:
**Full_name**
Fred, Joe, Bobda
Jason, Fred, Anderson
Tom, John, Jefferson
Jackson, Jim Bob, Sager
Michael, Paul-John, Jensen
Sometimes the middle name is hyphenated or has a space between it. But there is never a comma in the middle name. If it is hyphenated or two middle names, the entries will still be the same in both Table A and Table B.
I want to join the tables on Middle_name and Full_name. The difficult part is that the join has to check only the values between the commas in Full_name. Otherwise it might match the first name accidentally.
I've been using the query below but I just realized that there is nothing stopping it from matching the middle name to a first name accidentally.
SELECT Full_name, Middle_name
FROM B
JOIN A
ON POSITION(Middle_name IN Full_name)>0
I'm wondering how I can refactor this query to match only the middle name (assuming they all appear in the same format).
Use split_part('Fred, Joe, Bobda', ',', 2), which returns the middle name with its leading space (' Joe'); wrap it in trim() to get 'Joe':
SELECT Full_name, Middle_name
FROM B
JOIN A
ON trim(split_part(B.Full_name, ',', 2)) = A.Middle_name
demo for returning middle name
If there is always exactly one space after the comma, and everybody has a middle name like your sample data suggests, the space can just be part of the delimiter in split_part():
SELECT full_name, middle_name
FROM A
JOIN B ON split_part(B.full_name, ', ', 2) = A.middle_name;
Related:
Split comma separated column data into additional columns
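To see why comparing only the second comma-separated field avoids accidental first-name matches, here is a small sketch in Python using the sample data from the question. The helper middle_name is a made-up name that mimics trim(split_part(..., ',', 2)); it is an illustration, not a replacement for the SQL join.

```python
table_a = ["John", "Joe", "Fred", "Jim Bob", "Paul-John"]
table_b = [
    "Fred, Joe, Bobda",
    "Jason, Fred, Anderson",
    "Tom, John, Jefferson",
    "Jackson, Jim Bob, Sager",
    "Michael, Paul-John, Jensen",
]


def middle_name(full_name: str) -> str:
    """Return the second comma-separated field, without surrounding spaces."""
    parts = full_name.split(",")
    return parts[1].strip() if len(parts) > 1 else ""


# Join on the extracted middle name only, never on the whole string.
middle_names = set(table_a)
matches = [
    (full, middle_name(full))
    for full in table_b
    if middle_name(full) in middle_names
]

for full, middle in matches:
    print(f"{full} -> {middle}")
```

"Fred, Joe, Bobda" is paired with Joe only, even though Fred also appears in Table A.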

SQL select specific letter from concatenated string

This may have been answered similarly somewhere but I am still kind of confused. I need to create a view named A7T7 that will display the concatenated first name and last name of the students who have at least three letter Os or three letter Ts in their concatenated name (e.g., John Olson, Richard Hooton, Tina Trout). The column heading should be Student and the rows should be sorted by the length of the concatenated names from long to short.
Not really sure how to make my WHERE statement for this restriction.
You can use a LIKE query - for example (using a temporary table in SQL Server):
CREATE TABLE #students (firstname varchar(20), lastname varchar(20));
INSERT INTO #students (firstname, lastname) VALUES
('John','Olson'),
('Richard','Hooton'),
('Tina','Trout');
SELECT *
FROM #students s
WHERE CONCAT(s.firstname, s.lastname) LIKE '%o%o%o%';
DROP TABLE #students;
Which roughly translates to "select all the rows from students where the concatenation of firstname and lastname contains o three times".
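The full requirement (three Os or three Ts, sorted by length from long to short) can be checked against the sample names with a short sketch. This is Python for illustration only; the helper has_three and the extra row "Ann Lee" are made up, and the match is case-insensitive, which is an assumption about the intended behaviour.

```python
def has_three(name: str, letter: str) -> bool:
    """True if the letter occurs at least three times, ignoring case."""
    return name.lower().count(letter.lower()) >= 3


# The first three rows come from the question; "Ann Lee" is a hypothetical non-match.
students = [("John", "Olson"), ("Richard", "Hooton"), ("Tina", "Trout"), ("Ann", "Lee")]
full_names = [f"{first} {last}" for first, last in students]

# Keep names with three Os or three Ts, longest first.
selected = sorted(
    (n for n in full_names if has_three(n, "o") or has_three(n, "t")),
    key=len,
    reverse=True,
)

for name in selected:
    print(name)
```

In SQL terms, the second condition corresponds to adding OR ... LIKE '%t%t%t%' to the WHERE clause and ordering by the length of the concatenated name in descending order.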

Basic SQL Query, I am newbie

I just started my database and query class on Monday. We met on Monday and just went over the syllabus, and on Wednesday the network at school was down so we couldn't even do the power point lecture. Right now I am working on my first homework assignment and I am almost finished but I am having trouble on one question.
Here it is...
Write a SELECT statement that returns one column from the Customers table named FullName that joins the LastName and FirstName columns.
Format the columns with the last name, a comma, a space, and the first name like this:
Doe, John
Sort the result set by last name in ascending sequence.
Return only the contacts whose last name begins with letters from M to Z.
Here is what I have so far...
USE md0577283
SELECT FirstName,LastName
FROM Customers
ORDER BY LastName,FirstName
My question is: how do I format it as LastName, FirstName like the professor wants, and how do I select only names M-Z?
If someone could point me in the right direction I would greatly appreciate it.
Thank you.
PS With all due respect, I didn't ask for the answer, I asked for a nudge in the right direction, so why the down vote, guys?
USE md0577283
SELECT LastName + ', ' + FirstName FullName
FROM Customers
WHERE LastName LIKE '[M-Z]%'
ORDER BY LastName,FirstName
You want to add two things. First, create an expression that returns the name in the requested format:
(LastName + ', ' + FirstName AS FullName)
Second, use a WHERE clause to filter what is returned, perhaps WHERE LastName >= 'M'. Adding LastName <= 'Z' as an upper bound would exclude names such as 'Zimmer', which sort after 'Z'.
Simply write it like this.
If you want last names from M to Z (BETWEEN does not treat % as a wildcard, so compare the first letter instead):
SELECT LastName, FirstName
FROM Customers
WHERE LEFT(LastName, 1) BETWEEN 'M' AND 'Z'
ORDER BY LastName, FirstName
If you want last names starting with M or Z only:
SELECT LastName, FirstName
FROM Customers
WHERE LastName LIKE 'M%' OR LastName LIKE 'Z%'
ORDER BY LastName, FirstName
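As a quick sanity check of the expected result, the formatting and the M-to-Z filter can be sketched in Python. The customer rows and the helper names are hypothetical; only the "Doe, John" format comes from the assignment.

```python
# Hypothetical rows: (FirstName, LastName)
customers = [("John", "Doe"), ("Mary", "Miller"), ("Zed", "Zimmer"), ("Ann", "Lopez")]


def full_name(first: str, last: str) -> str:
    """Format as last name, comma, space, first name (e.g. 'Doe, John')."""
    return f"{last}, {first}"


def in_m_to_z(last: str) -> bool:
    """True if the last name begins with a letter from M to Z."""
    return "M" <= last[:1].upper() <= "Z"


# Sort by last name ascending, then keep only last names starting with M to Z.
result = [
    full_name(first, last)
    for first, last in sorted(customers, key=lambda c: c[1])
    if in_m_to_z(last)
]

for row in result:
    print(row)
```

Note that "Zimmer" is kept because only its first letter is compared, which is the case a plain LastName <= 'Z' comparison would miss.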

How can I compare two name strings that are formatted differently in SQL Server?

What would be the best approach for comparing the following set of strings in SQL Server?
Purpose: The main purpose of this stored procedure is to compare a set of input customer names to the names that exist in the Customer database for a specific account. If the input name differs from the name in the database, this should trigger an update of the Customer database with the new name information.
Conditions:
Format of Input: FirstName [MiddleName] LastName
Format of Value in Database: LastName, FirstName MiddleName
The complication arises when names like this are presented,
Example:
Input: Dr. John A. Mc Donald
Database: Mc Donald, Dr. John A.
For last names that consist of 2 or more parts what logic would have to be put into place
to ensure that the lastname in the input is being compared to the lastname in the database and likewise for the first name and middle name.
I've thought about breaking the database values up into a temp HASH table since I know that everything before the ',' in the database is the last name. I could then check to see if the input contains the lastname and split out the FirstName [MiddleName] from it to perform another comparison to the database for the values that come after the ','.
There is a second part to this however. In the event that the input name has a completely New last name (i.e. if the name in the database is Mary Smith but the updated input name is now Mary Mc Donald). In this case comparing the database value of the last name before the ',' to the input name will result in no match which is correct, but at this point how does the code know where the last name even begins in the input value? How does it know that her Middle name isn't MC and her last name Donald?
Has anyone had to deal with a similar problem like this before? What solutions did you end up going with?
I greatly appreciate your input and ideas.
Thank you.
Realistically, it's extremely computationally difficult (if not impossible) to know if a name like "Mary Jane Evelyn Scott" is first-middle-last1-last2, first1-first2-middle-last, first1-first2-last1-last2, or some other combination... and that's not even getting into cultural considerations...
So personally, I would suggest a change in the data structure (and, correspondingly, the application's input fields). Instead of a single string for name, break it into several fields, e.g.:
FullName{
title, //i.e. Dr., Professor, etc.
firstName, //or given name
middleName, //doesn't exist in all countries!
lastName, //or surname
qualifiers //i.e. Sr., Jr., fils, D.D.S., PE, Ph.D., etc.
}
Then the user could choose that their first name is "Mary", their middle name is "Jane Evelyn", and their last name is "Scott".
UPDATE
Based on your comments, if you must do this entirely in SQL, I'd do something like the following:
1. Build a table of all possible combinations of "lastname, firstname [middlename]" given an input string "firstname [middlename] lastname".
2. Run a query based on the join of your original data and all possible orderings.
So, step 1. would take the string "Dr. John A. Mc Donald" and create the table of values:
'Donald, Dr. John A. Mc'
'Mc Donald, Dr. John A.'
'A. Mc Donald, Dr. John'
'John A. Mc Donald, Dr.'
Then step 2. would search for all occurrences of any of those strings in the database.
Assuming MSSQL 2005 or later, step 1. can be achieved using some recursive CTE, and a modification of a method I've used to split CSV strings (found here) (SQL isn't the ideal language for this form of string manipulation...):
declare @str varchar(200)
set @str = 'Dr. John A. Mc Donald'
--Create a numbers table
select [Number] = identity(int)
into #Numbers
from sysobjects s1
cross join sysobjects s2
create unique clustered index Number_ind on #Numbers(Number) with IGNORE_DUP_KEY
;with nameParts as (
--Split the name string at the spaces.
select [ord] = row_number() over(order by Number),
[part] = substring(fn1, Number, charindex(' ', fn1+' ', Number) - Number)
from (select @str fn1) s
join #Numbers n on substring(' '+fn1, Number, 1) = ' '
where Number<=Len(fn1)+1
),
lastNames as (
--Build all possible lastName strings.
select [firstOrd]=ord, [lastOrd]=ord, [lastName]=cast(part as varchar(max))
from nameParts
where ord!=1 --remove the case where the whole string is the last name
UNION ALL
select firstOrd, p.ord, l.lastName+' '+p.part
from lastNames l
join nameParts p on l.lastOrd+1=p.ord
),
firstNames as (
--Build all possible firstName strings.
select [firstOrd]=ord, [lastOrd]=ord, [firstName]=cast(part as varchar(max))
from nameParts
where ord!=(select max(ord) from nameParts) --remove the case where the whole string is the first name
UNION ALL
select p.ord, f.lastOrd, p.part+' '+f.firstName
from firstNames f
join nameParts p on f.firstOrd-1 = p.ord
)
--Combine for all possible name strings.
select ln.lastName+', '+fn.firstName
from firstNames fn
join lastNames ln on fn.lastOrd+1=ln.firstOrd
where fn.firstOrd=1
and ln.lastOrd = (select max(ord) from nameParts)
drop table #Numbers
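For readers who find the recursive CTE hard to follow, the enumeration in step 1 amounts to trying every split point between the words. A minimal sketch in Python (the function name candidate_names is made up; it is not the SQL above, only the same idea):

```python
def candidate_names(name: str) -> list[str]:
    """List every 'lastname, firstname' reading of a 'firstname lastname' string."""
    parts = name.split()
    # i is the index of the first word of the last name. It runs from 1 to
    # len(parts) - 1 so that neither the first nor the last name is empty.
    return [
        f"{' '.join(parts[i:])}, {' '.join(parts[:i])}"
        for i in range(1, len(parts))
    ]


for candidate in candidate_names("Dr. John A. Mc Donald"):
    print(candidate)
```

For the sample input this yields the same four strings listed in step 1, which can then be matched against the database values.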
Having had my share of terrible experiences with data from third parties, I can almost guarantee that the input data will contain lots of garbage that does not follow the specified format.
When trying to match multipart string data like in your case, I preprocessed both the input and our data into something I called a "normalized string" using the following method:
strip all non-alphanumeric chars (leaving language-specific chars like "č" intact)
compact spaces (replace multiple spaces with single one)
lower case
split into words
remove duplicates
sort alphabetically
join back to string separated by dashes
Using your sample data, this function would produce:
Dr. John A. Mc Donald -> a-donald-dr-john-mc
Mc Donald, Dr. John A. -> a-donald-dr-john-mc
Unfortunately it's not 100% bulletproof; there are cases where degenerate inputs produce invalid matches.
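The normalization steps listed above could look roughly like this in Python. This is a sketch of the described method, not the original implementation; the function name normalize_name is made up, and "strip" is taken to mean removing punctuation while keeping letters and digits of any language.

```python
import re


def normalize_name(name: str) -> str:
    """Build an order-independent key from the words of a name."""
    # Drop everything that is not a letter, digit, underscore or whitespace.
    # \w is Unicode-aware for str patterns, so letters like "č" are kept.
    cleaned = re.sub(r"[^\w\s]", "", name)
    # lower() folds case; split() with no argument also collapses space runs.
    words = cleaned.lower().split()
    # Remove duplicates, sort alphabetically, join with dashes.
    return "-".join(sorted(set(words)))


print(normalize_name("Dr. John A. Mc Donald"))
print(normalize_name("Mc Donald, Dr. John A."))
```

Both sample strings produce the same key, so the comparison no longer depends on word order or punctuation. The price is the false matches mentioned above: two different people whose names contain the same set of words become indistinguishable.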
The name field in your database is badly designed. Redesign it and get rid of that field. If you have a first name, middle name, last name, prefix and suffix structure, you can have a computed field that has the structure you are using. But it is a very poor way to store data, and your first priority should be to stop using it.
Since you have a common customer Id why aren't you matching on that instead of name?

How to fetch values with a MySQL query?

I want to fetch the First Name and Last Name of all the records, combined as "First Name Last Name", in a single MySQL query.
For example,
mytable looks like this:
rec Id   First Name   Last Name
1        Gnaniyar     Zubair
2        Frankyn      Albert
3        John         Mathew
4        Suhail       Ahmed
Output should be like this:
Gnaniyar Zubair, Frankyn Albert, John Mathew, Suhail Ahmed
Give me the SQL.
If this must be done in the query, you can use GROUP_CONCAT, but unless you're grouping by something it's a pretty silly query and the concatenation should really be done on the client.
SELECT GROUP_CONCAT(CONCAT(FirstName, ' ', LastName)
ORDER BY FirstName, LastName
SEPARATOR ', ') AS Names
FROM People;
It is not a matter of getting one row with all the records, but a matter of how the data is presented. Therefore, I suggest running a simple SELECT query to fetch the records you need, then arranging them in the view layer as you like.
On the other hand, why do you need to solve this record concatenation at SQL level and not on view level?
If you wanted to get them in just one row, you're probably not using your database properly.
If you just want to join together the first and last names, that's easy:
SELECT CONCAT(`First Name`, ' ', `Last Name`) FROM mytable
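The client-side concatenation recommended in the answers above can be as small as this. It is a sketch in Python that assumes the rows have already been fetched with a plain SELECT; the rows list stands in for the query result and uses the sample data from the question.

```python
# Stand-in for the fetched result set: (rec Id, First Name, Last Name)
rows = [
    (1, "Gnaniyar", "Zubair"),
    (2, "Frankyn", "Albert"),
    (3, "John", "Mathew"),
    (4, "Suhail", "Ahmed"),
]

# Join first and last name per row, then join the rows with ", ".
names = ", ".join(f"{first} {last}" for _rec_id, first, last in rows)

print(names)
```

This keeps the query simple and leaves the presentation (separator, ordering, truncation) to the application.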