split a fullname into first, middle, last and suffix - sql

I have full_name column data like
|Victoria Brown |
|Sam Allen JR |
|Ray M James III |
I want to split it into first name, middle name, last name and suffix, based on the number of spaces.
Here is what I did, but the last CASE statement is wrong: it still picks up the suffix when there are three spaces. I also need to combine the results into one column, please.

This is unfortunately a lot more complex than it may first seem, and how you handle it may have to do largely with the end goal for the data.
Here's a post that covers this same issue in fairly great depth -
SQL: parse the first, middle and last name from a fullname field
As Gilbert pointed out, some names are just different, and it will be hard to get everything right, but there are certainly things you can do to limit errors.
One of the better pieces of advice from that article would be to alter your collection method to get First/Middle/Last/Suffixes/Prefixes entered separately and join them after, rather than try to parse through richer text that contains them all.
Here is a function that you could get creative with - https://blog.seandaylor.com/sql-server-split-part/
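As a rough illustration of the "limit errors" idea, here is a minimal Python sketch (not SQL, so it can be run anywhere) that splits on spaces and peels off a trailing suffix before deciding what the last name is. The function name split_name and the suffix list are assumptions made for this example; real data will need a longer list and will still contain names it gets wrong.

```python
# Hypothetical helper: the suffix list is an assumption, extend it for real data.
SUFFIXES = {"JR", "SR", "II", "III", "IV"}

def split_name(full_name):
    """Return (first, middle, last, suffix); missing parts are empty strings."""
    parts = full_name.split()
    suffix = ""
    # Peel the suffix off first so it is never mistaken for the last name.
    if len(parts) > 2 and parts[-1].rstrip(".").upper() in SUFFIXES:
        suffix = parts.pop()
    first = parts[0] if parts else ""
    last = parts[-1] if len(parts) > 1 else ""
    middle = " ".join(parts[1:-1])
    return first, middle, last, suffix

for name in ["Victoria Brown", "Sam Allen JR", "Ray M James III"]:
    print(split_name(name))
```

The same order of operations (strip the suffix, then take first and last, then treat whatever remains as the middle name) should carry over to a CASE-based SQL version.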

Related

SQL - Calculating columns using dynamic functions

I'm trying to create a set of data that I'm going to write out to a file. It's essentially a report composed of various fields from a number of different tables. Some columns need some processing done on them; others can just be selected.
Different users will likely want different processing performed on certain columns, and in the future, I'll probably need to add additional functions for computed columns.
I'm considering the cleanest, most flexible approach to storing and using all the different functions I'm likely to need for these computed columns. I have two ideas in mind, but I'm hoping there might be a much more obvious solution I'm missing.
For a simple, slightly odd example, a Staff table:
Employee | DOB | VacationDays
Frank | 01/01/1970 | 25
Mike | 03/03/1975 | 24
Dave | 05/02/1980 | 30
I'm thinking I'd either end up with a query like
SELECT NameFunction(Employee, optionID),
DOBFunction(DOB, optionID),
VacationFunction(VacationDays, optionID)
FROM Staff
With user defined functions, where the optionID would be used in a case statement inside the functions to decide what processing to perform.
Or I'd want to make the way the data is returned customisable using a lookup table of other functions:
ID | Name | Description
1 | ShortName | Obtains 3 letter abbreviation of employee name
2 | LongDOB | Returns DOB in format ~ 1st January 1970
3 | TimeStampDOB | Returns Timestamp for DOB
4 | VacationSeconds | Returns seconds of vacation time
5 | VacationBusinessHours | Returns number of business hours of vacation
Which seems neater, but I'm not sure how I'd formulate the query, presumably using dynamic SQL? Is there a sensible alternative?
The functions will be used on a few thousand rows.
The closest answer I've found was in this thread:
Call dynamic function name in SQL
I'm not a huge fan of dynamic SQL, although in this case I think it might be the best way to get the result I'm after?
Any replies appreciated,
Thanks,
Chris
I would go for the second solution. You could even use real stored proc names in your lookup table.
create proc ShortName (
@param varchar(50)
) as
begin
select 'ShortName: ' + @param
end
go
declare @proc sysname = 'ShortName'
exec @proc 'David'
As you can see in the example above, the first argument of exec (i.e. the procedure name) can be a variable. That said, all the usual warnings regarding dynamic SQL apply.
In the end, you should go with whichever is faster, so you should try both ways (and any other way someone might come up with) and decide after that.
I prefer the first option, as long as your functions don't run extra selects against a table. You may not even need the user defined functions if they are not going to be reused in a different report.
I prefer to use dynamic SQL only to improve a query's performance, such as adding a dynamic ordering or adding / removing complex WHERE conditions.
But these are all subjective opinions, the best thing is try, compare, and decide.
Actually, this isn't a question of what's faster. It is a question of what makes the code cleaner, particularly for adding new functionality (new columns, new column formats, re-ordering them).
Don't think of your second approach as "using dynamic SQL", since that tends to have negative connotations. Instead, think of it as a data-driven approach. You want to build a table that describes the columns that users can get, and the formats. This is great! Users can then provide a list of columns, and you'll have a magical stored procedure that combines the information from the users with the information in your metadata table, and produces the desired result.
I'm a big fan of data-driven approaches, and dynamic SQL is the best SQL tool I've found so far for implementing them.
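To make the data-driven idea concrete, here is a minimal sketch in Python rather than SQL. The FORMATTERS dictionary stands in for the lookup table from the question, and build_report stands in for the stored procedure; both names are invented for this example, as are IsoDOB and the 24-hour reading of VacationSeconds.

```python
from datetime import date

# Hypothetical registry standing in for the lookup table in the question:
# each format name maps to the function that implements it.
FORMATTERS = {
    "ShortName": lambda row: row["Employee"][:3].upper(),
    "IsoDOB": lambda row: row["DOB"].isoformat(),
    # Assumes a vacation day is a full 24 hours; adjust to your own rule.
    "VacationSeconds": lambda row: row["VacationDays"] * 24 * 60 * 60,
}

def build_report(rows, requested):
    """Apply the requested formats, in order, to every row."""
    return [[FORMATTERS[name](row) for name in requested] for row in rows]

staff = [
    {"Employee": "Frank", "DOB": date(1970, 1, 1), "VacationDays": 25},
    {"Employee": "Mike", "DOB": date(1975, 3, 3), "VacationDays": 24},
]
report = build_report(staff, ["ShortName", "VacationSeconds"])
print(report)
```

Adding a new column format means adding one entry to the registry, and a name that is not in the registry fails with a KeyError instead of being executed. A dynamic SQL version would want the same kind of check against the metadata table before building the statement.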

SQL Server - looking for matches within names

I'm using SQL Server 2005, I have names coming into a system and I want to compare them against a table to look for matches. Any suggestions on how to match something like this:
The incoming value is something like "J.R. Thompson Corporation"
while the value is "The Jim Ryan Thompson Company" in the database.
Simply said - it cannot be done. Even normalizing addresses is complex, and there you only follow specific rules (Str. for Street, for example). I was on a project doing that in Germany 15 years ago and all hell broke loose, so to speak, because some places had special rules ("m 4" is a valid address in one city, for example, because the inner city is divided into quadrants, and "Strasse des 14. July Appartement 3" broke our "first number is the end of the street name" rule).
The whole thing gets worse - in your example "J.R." and "Jim Ryan" may or may not be the same. There are some rules you can set up, and it gets a lot easier with addresses (same address means that in the end the name matching can be looser), but in general this is not a feasible approach. Even spelling correction will not catch that. There simply is no way to normalize that without artificial intelligence that has internet access and can use Google to find out whether it matches. Yes, you may get a 20% or 30% hit rate, but that leaves you with a TON of errors left and right and is likely less than useless from a business point of view.
You need at least one standardized identifier you can use to nail down the selection: house numbers, phone numbers, anything that can be standardized more easily and then provides an anchor for the name matching algorithm.
Without doing full text search (which is designed to do these things), you can do this in a simple way and get close by just replacing the spaces and periods with % wild cards and putting % at the start and end of the string:
-- SQL Server 2005 cannot initialize a variable in DECLARE, so assign it with SET
DECLARE @input VARCHAR(50)
SET @input = 'J.R. Thompson Corporation'
SELECT *
FROM Company
WHERE Name LIKE '%' + REPLACE(REPLACE(@input, '.', '%'), ' ', '%') + '%'
It is important to note that doing any sort of LIKE search where you have a leading % symbol will not benefit from an index on that column.
Note that this still will not pick up things like "Company" meaning "Corporation", as in your example.
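To see what the wildcard pattern does and does not catch, here is a small sketch using Python's built-in sqlite3 module instead of SQL Server. The helper name to_like_pattern and the first sample row are made up for illustration; the second row is the database value from the question.

```python
import sqlite3

def to_like_pattern(text):
    """Replace periods and spaces with % and wrap the result in %."""
    return "%" + text.replace(".", "%").replace(" ", "%") + "%"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Company (Name TEXT)")
conn.executemany(
    "INSERT INTO Company (Name) VALUES (?)",
    [
        ("J. R. Thompson Corporation Ltd",),   # invented sample row
        ("The Jim Ryan Thompson Company",),    # value from the question
    ],
)

pattern = to_like_pattern("J.R. Thompson Corporation")
matches = [
    name
    for (name,) in conn.execute(
        "SELECT Name FROM Company WHERE Name LIKE ? ORDER BY Name", (pattern,)
    )
]
print(pattern)
print(matches)
```

The spacing and punctuation variant is found, but the row from the question is not, because nothing in the pattern can turn "Corporation" into "Company".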

FREETEXT queries in SQL Server 2008 not phrase matching

I have a full-text indexed table in SQL Server 2008 that I am trying to query for an exact phrase match using FREETEXT. I don't believe using CONTAINS or LIKE is appropriate for this, because in other cases the query might not be exact (the user doesn't surround the phrase in double quotes), and in general I want the flexibility of FREETEXT.
According to the documentation[MSDN] for FREETEXT:
If freetext_string is enclosed in double quotation marks, a phrase match is instead performed; stemming and thesaurus are not performed.
which would lead me to believe a query like this:
SELECT Description
FROM Projects
WHERE FREETEXT(Description, '"City Hall"')
would only return results where the term "City Hall" appears in the Description field, but instead I get results like this:
1 Design of handicap ramp at Manning Hall.
2 Antenna investigation. Client: City of Cranston Engineering Dept.
3 Structural investigation regarding fire damage to International Tennis Hall of Fame.
4 Investigation Roof investigation for proposed satellite design on Herald Hall.
... etc
Obviously those results include at least one of the words in my phrase, but not the phrase itself. What's worse, I had thought the results would be ranked but the two results I actually wanted (because they include the actual phrase) are buried.
SELECT Description
FROM Projects
WHERE Description LIKE '%City Hall%'
1 Major exterior and interior renovation of the existing city hall for Quincy Massachusetts
2 Cursory structural investigation of Pawtucket City Hall tower plagued by leaks.
I'm sure this is a case of me not understanding the documentation, but is there a way to achieve what I'm looking for? Namely, to be able to pass in a search string without quotes and get exactly what I'm getting now or with quotes and get only that exact phrase?
As you said, FREETEXT looks up every word in your phrase, not the phrase as a whole. For that you need to use the CONTAINS predicate, like this:
SELECT Description
FROM Projects
WHERE CONTAINS(Description, '"City Hall"')
If you want to get the rank of the results, you have to use CONTAINSTABLE. It works roughly the same, but it returns a table with two columns: [Key], which contains the primary key of the search table, and [Rank], which gives you the rank of the result.

MySQL: select the closest match?

I want to show the closest related item for a product. So say I am showing a product and the style number is SG-sfs35s. Is there a way to select whatever product's style number is closest to that?
Thanks.
EDIT: To answer your questions: I definitely want to keep the first 2 letters, as that is the manufacturer code, but for the part after the first dash, just whatever matches closest. So for example SG-sfs35s would match SG-shs35s much more than SG-sht64s. I hope this makes sense. Whenever I do LIKE product_style_number it only pulls the exact match.
There normally isn't a simple way to match product codes that are roughly similar.
A more SQL friendly solution is to create a new table that maps each product to all the products it is similar to.
This table would either need to be maintained manually, or a more sophisticated script can be executed periodically to update it.
If your product codes follow a consistent pattern (all the letters are the same for similar products, with only the numbers changing), then you should be able to use a regular expression to match the similar items. There are docs on this here...
It sounds like what you want is Levenshtein distance.
Unfortunately, there isn't a built-in Levenshtein function for MySQL, but some folks have come up with a user-defined function that does it (dead link).
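Since the linked function is gone, here is a minimal sketch of the distance itself, written in Python rather than as a MySQL user-defined function. The closest helper and its rule of comparing only products with the same first two letters are assumptions taken from the asker's edit, and the XX- style number is invented to show the filter.

```python
def levenshtein(a, b):
    """Edit distance: minimum single-character inserts, deletes, substitutions."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]

def closest(target, candidates):
    """Return candidates sharing the 2-letter manufacturer code, nearest first."""
    same_maker = [c for c in candidates if c[:2] == target[:2] and c != target]
    return sorted(same_maker, key=lambda c: levenshtein(target, c))

print(closest("SG-sfs35s", ["SG-sht64s", "SG-shs35s", "XX-sfs35s"]))
```

With the style numbers from the question, SG-shs35s is one substitution away from SG-sfs35s and so sorts ahead of SG-sht64s. Doing this in application code means pulling every candidate for the manufacturer, which is probably fine for a product catalogue but worth measuring.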
You will probably want to do it as a stored procedure, as I expect that the algorithm may not be trivial.
For example, you may split the term at the -, so you have two parts. You do a LIKE query on each part and use that to make a decision.
You could just loop, though, replacing the last character with "%" until you get at least one result, in your stored procedure.
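The loop described above can be sketched as follows, using Python with sqlite3 instead of a MySQL stored procedure. The products table, its style_number column and the function name closest_by_prefix are assumptions for this example, and min_prefix=3 is a guess at keeping the manufacturer code and dash intact.

```python
import sqlite3

def closest_by_prefix(conn, style_number, min_prefix=3):
    """Shorten the prefix one character at a time until LIKE finds a match."""
    for length in range(len(style_number), min_prefix - 1, -1):
        pattern = style_number[:length] + "%"
        rows = conn.execute(
            "SELECT style_number FROM products "
            "WHERE style_number LIKE ? AND style_number <> ? "
            "ORDER BY style_number",
            (pattern, style_number),
        ).fetchall()
        if rows:
            return [row[0] for row in rows]
    return []

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (style_number TEXT)")
conn.executemany(
    "INSERT INTO products VALUES (?)",
    [("SG-shs35s",), ("SG-sht64s",), ("XX-sfs35s",)],
)
found = closest_by_prefix(conn, "SG-sfs35s")
print(found)
```

Note the limitation: the prefix has to shrink to "SG-s" before anything matches, at which point both SG products come back and the loop cannot say which is closer. Style numbers that contain % or _ would also need escaping before being used in a LIKE pattern.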
Sounds like you need something like Lucene, though I'm not sure if that would be overkill for your situation. But it certainly would be able to do text searches and return the ones most similar first.
If you need something more simple I would try to start by searching with the full product code, then if that doesn't work try to use wildcards/remove some characters until you return a result.
JD Isaacks.
This situation of yours is very simple to solve.
It's not like you need to use artificial intelligence like Google.
http://www.w3schools.com/sql/sql_wildcards.asp
Take a look at this manual at w3schools about wildcards to use with your SELECT code.
But also you will need to create a new table with 3 columns: LeftCode, RightCode and WildCard.
Example:
Rows on Table:
LeftCode = SG | RightCode = 35s | WildCard = SG-s_s35s
LeftCode = SG | RightCode = 64s | WildCard = SG-s_t64s
SQL Code
If the user typed the code that matches the row1 of the table:
SELECT * FROM PRODUCTS WHERE CODE LIKE "$WildCard";
Where $WildCard is the PHP variable containing the column 3 of the new table.
I hope I helped, even 4 years late...

How to use the SQL replace function effectively?

I am trying to replace multiple rows in an Access database to follow a new set of data rules. For instance, the word Fort in Fort Myers is listed as Ft., Ft and Fort. I would like to make a global change to the group. I am familiar with the SQL replace command, but wondering if anyone has done something similar with a stored procedure or had experience with something like this.
You have to be really, really careful that you don't replace more than what you intend.
MAKE A BACKUP first in case things go horribly wrong.
Always start with a SELECT to filter the records first. Go over the results carefully.
SELECT * FROM Table WHERE City LIKE "%Ft. Myers%"
Then do the Replaces as Carlton said.
Harder than it sounds to the lay person ...
There is no way around it but making a Replace for each thing you don't like, changing into what you do like. BUT BE VERY CAREFUL ... unintended consequences and all. I recommend doing a select before every update to see exactly what you will be updating.
So in your instance of Fort Myers you have to do 3 Replaces:
Replace(City, "Ft. Myers", "Fort Myers")
Replace(City, "Ft Myers", "Fort Myers")
Replace(City, "Fort. Myers", "Fort Myers")
If you have much data and many things to change, this could be a HUGE task. But there is no "automated" way to do it - SQL does not use fuzzy logic, you have to specify exactly everything you want it to do.
Tidying addresses can be a nightmare. You may need to create a replace table:
ShouldBe | Current
Fort Myers | Ft Myers
Foot Hill | Ft Hills
For the most part, the ShouldBe column can be filled in with update queries, but you will also be able to run your eye over the results before updating the main table. This will also stand in good stead for future data entry.
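Here is a minimal sketch of the replace-table workflow, run through Python's sqlite3 module rather than Access, so the exact SQL syntax will differ in your database. The table and column names (addresses, city_fixes, current_value, should_be) and the sample rows are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE addresses (id INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE city_fixes (current_value TEXT PRIMARY KEY, should_be TEXT);
    INSERT INTO addresses (city)
        VALUES ('Ft Myers'), ('Ft. Myers'), ('Fort Myers'), ('Naples');
    INSERT INTO city_fixes
        VALUES ('Ft Myers', 'Fort Myers'), ('Ft. Myers', 'Fort Myers');
""")

# Step 1: preview exactly which rows would change, and to what.
preview = conn.execute(
    "SELECT a.id, a.city, f.should_be FROM addresses a "
    "JOIN city_fixes f ON f.current_value = a.city ORDER BY a.id"
).fetchall()
print(preview)

# Step 2: apply the reviewed fixes; rows with no entry are left alone.
conn.execute(
    "UPDATE addresses SET city = "
    "(SELECT should_be FROM city_fixes WHERE current_value = addresses.city) "
    "WHERE city IN (SELECT current_value FROM city_fixes)"
)
cities = [c for (c,) in conn.execute("SELECT city FROM addresses ORDER BY id")]
print(cities)
```

Because the update only touches values that have an exact entry in the fix table, it cannot replace more than was reviewed in the preview step.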
If your goal is to standardize the city names that are something like ~Fort Myers, you should be able to do something like this:
UPDATE Table SET City = 'Fort Myers' WHERE City LIKE 'F%Myers';
This should replace any City field in any row where the City begins with an F and ends in Myers. This may be what you want, but be very careful.