SQL Server Function to Split Full Names

I have scoured SO for a resolution to my problem with getting this function to run properly, but I mostly see solutions regarding the use of the function and not as many in creating the function. I have already created the function, so that's why it reads 'ALTER FUNCTION' at the top of my code. The end goal is to parse out First Name, Middle Initial, and Last Name.
I keep getting an incorrect syntax error near the first 'END' in the CASE statement regarding the parsing of the FirstName. I apologize if this is such an easy fix but I just cannot figure out what I am missing. Any help in error recognition or a cleaner syntax would be much appreciated for a beginner like myself.
Also, the second SET statement towards the bottom just calls a simple function (already written before I got here) that camel-cases the output.
Sorry about the first comment. Here are some sample names that I have been using and I want to parse these from one column into 3 columns. First, middle, and last name.
Carlton J Smith
Charmane Thorn
Deel S Shah
Curtis Brennan
Allie F Allison
Alex Finde
Tina D Page
Jackie Russell
I tried adding two more SET statements but it still is giving me the same syntax error around the first CASE statement. Anything else I could provide to give more context? Thanks for the prompt responses.
ALTER FUNCTION fn_clean_Name_Split (@source VARCHAR(255))
RETURNS VARCHAR(255)
AS
BEGIN
    DECLARE @target VARCHAR(255) = @source
    DECLARE @index INT = CHARINDEX(' ',@target)
    SET @target =
        CASE
            WHEN @index <> LEN(@target)
            THEN LEFT(@target, @index)
        END AS FirstName,
        CASE
            WHEN @index <> LEN(@target) - CHARINDEX(' ', REVERSE(@target)) + 1
            THEN SUBSTRING(@target, @index + 1, LEN(@target) - CHARINDEX(' ', REVERSE(@target)) - @index)
        END AS MI,
        CASE
            WHEN @index <> LEN(@target) - @index + 1
            THEN RIGHT(@target, CHARINDEX(' ', REVERSE(@target))) AS LastName,
            ELSE @target
    SET @target = dbo.fn_standardize_CamelCase(@target)
    RETURN @target
END

This is too long for a comment.
A SET statement assigns a value to a single variable, not to several variables at once. This is easy to work around: just use multiple SET statements.
You can assign multiple variables in one statement using SELECT, which is a nice convenience. But you cannot both assign variables and return values (which is what the AS FirstName, AS MI, and AS LastName column aliases imply) in the same statement; that is why the parser complains near the first END.
You have a scalar function, so you don't want a query that returns values anyway.
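Since a scalar function can hand back only one value, one way to get three columns is an inline table-valued function. The following is only a sketch of that idea, not the asker's original design: the name dbo.fn_split_full_name is hypothetical, it assumes SQL Server 2008 or later, it treats the text between the first and last space as the middle initial, and it leaves out the dbo.fn_standardize_CamelCase call.

```sql
-- Hypothetical inline table-valued function; the name is an assumption.
CREATE FUNCTION dbo.fn_split_full_name (@source VARCHAR(255))
RETURNS TABLE
AS
RETURN
    SELECT
        -- Everything before the first space, or the whole value if there is none
        CASE WHEN p.first_space = 0 THEN p.full_name
             ELSE LEFT(p.full_name, p.first_space - 1)
        END AS FirstName,
        -- Only present when the first and last spaces are at different positions
        CASE WHEN p.first_space > 0 AND p.last_space > p.first_space
             THEN NULLIF(LTRIM(RTRIM(SUBSTRING(p.full_name, p.first_space + 1,
                                               p.last_space - p.first_space - 1))), '')
        END AS MI,
        -- Everything after the last space
        CASE WHEN p.first_space > 0
             THEN SUBSTRING(p.full_name, p.last_space + 1, 255)
        END AS LastName
    FROM (
        SELECT t.full_name,
               CHARINDEX(' ', t.full_name) AS first_space,
               LEN(t.full_name) - CHARINDEX(' ', REVERSE(t.full_name)) + 1 AS last_space
        FROM (SELECT LTRIM(RTRIM(@source)) AS full_name) AS t
    ) AS p;
GO

-- Usage against two of the sample names
SELECT s.FirstName, s.MI, s.LastName
FROM (VALUES ('Carlton J Smith'), ('Charmane Thorn')) AS n(FullName)
CROSS APPLY dbo.fn_split_full_name(n.FullName) AS s;
-- Carlton  | J    | Smith
-- Charmane | NULL | Thorn
```

CROSS APPLY runs the function once per input row, so the same call works against a real table column in place of the VALUES list.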

Related

in SQL how can I remove the first 3 characters on the left and everything on the right after an specific character

In SQL, how can I remove (only from what my report displays, not from the database) the first 3 characters (CN=) and everything after the comma that is followed by "OU", so that I am left with the first and last name in the same column? For example:
CN=Tom Chess,OU=records,DC=1234564786_data for testing, 1234567
CN=Jack Bauer,OU=records,DC=1234564786_data for testing, 1234567
CN=John Snow,OU=records,DC=1234564786_data for testing, 1234567
CN=Anna Rodriguez,OU=records,DC=1234564786_data for testing, 1234567
Desired display:
Tom Chess
Jack Bauer
John Snow
Anna Rodriguez
I tried playing with TRIM, but I don't know how to do it without declaring the position, and with first and last names having different lengths I really don't know how to handle that.
Thank you in advance.
Update: I wonder about using LOCATE to find the position of the comma and then feeding that to a substring function. I am not sure whether an approach like that would work, or how to put the syntax together. What do you think? Would it be feasible?
You can try this one SUBSTRING(ColumnName, 4, CHARINDEX(',', ColumnName) - 4)
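The SUBSTRING/CHARINDEX expression above can be tried against the sample rows with a small T-SQL script. This is a sketch under assumptions: the table variable @dn and its column value are made up for illustration, every row is assumed to start with CN=, and it searches for ',OU=' rather than a bare comma so that a comma inside a name would not cut it short.

```sql
-- Hypothetical table variable holding two of the sample rows
DECLARE @dn TABLE (value VARCHAR(200));
INSERT INTO @dn (value) VALUES
    ('CN=Tom Chess,OU=records,DC=1234564786_data for testing, 1234567'),
    ('CN=Jack Bauer,OU=records,DC=1234564786_data for testing, 1234567');

-- Skip the 3-character 'CN=' prefix and stop just before ',OU='.
-- Appending ',OU=' to the searched string guarantees a match, so a row
-- without that marker returns everything after the prefix instead of
-- failing with a negative length.
SELECT SUBSTRING(value, 4, CHARINDEX(',OU=', value + ',OU=') - 4) AS display_name
FROM @dn;
-- Tom Chess
-- Jack Bauer
```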
In Postgres, you could use split_part() assuming no name contains a ,
select substr(split_part(the_column, ',', 1), 4)
from ...
Db2 11.x for LUW:
with tab (str) as (values
' CN = Tom Chess , OU = records,DC=1234564786_data for testing, 1234567'
, 'CN=Jack Bauer,OU=records,DC=1234564786_data for testing, 1234567'
, 'CN=John Snow,OU=records,DC=1234564786_data for testing, 1234567'
, 'CN=Anna Rodriguez,OU=records,DC=1234564786_data for testing, 1234567'
)
select REGEXP_REPLACE(str, '^\s*CN\s*=\s*(.*)\s*,\s*OU\s*=.*', '\1')
from tab;
Note that this regex pattern tolerates an arbitrary number of spaces around the tokens, as in the first record of the example above.
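In Oracle 11g, something like this might work. A bracket expression such as [^CN=] matches any single character other than C, N, or =, not the literal prefix, so it would cut 'Tom Chess' short at the 'C'; capturing the text between the leading CN= and the first comma avoids that:
REGEXP_SUBSTR(COLUMN_NAME, '^CN=([^,]+)', 1, 1, NULL, 1)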
I think there has to be a loop to handle this. Here's SQL Server function that will parse this out. (I know the question didn't specify SQL Server, but it's an example of how it can be done.)
select dbo.ScrubFieldValue(value) from table will return what you're looking for
CREATE FUNCTION ScrubFieldValue
(
    @Input varchar(8000)
)
RETURNS varchar(8000)
AS
BEGIN
    DECLARE @retval varchar(8000)
    DECLARE @charidx int
    DECLARE @remaining varchar(8000)
    DECLARE @current varchar(8000)
    DECLARE @currentLength int
    select @retval = ''
    select @remaining = @Input
    select @charidx = CHARINDEX('CN=', @remaining,2)
    while(LEN(@remaining) > 0)
    BEGIN
        --strip current row from remaining
        if (@charidx > 0)
        BEGIN
            select @current = SUBSTRING(@remaining, 1, @charidx - 1)
        END
        else
        BEGIN
            select @current = @remaining
        END
        select @currentLength = LEN(@current)
        -- get current name
        select @current = SUBSTRING(@current, 4, CHARINDEX(',OU', @current)-4)
        select @retval = @retval + @current + ' '
        -- strip off current from remaining
        select @remaining = substring(@remaining, @currentLength + 1,
            LEN(@remaining) - @currentLength)
        select @charidx = CHARINDEX('CN=', @remaining,2)
    END
    RETURN @retval
END
On my version of DB2 for Z/OS CHARINDEX throws a syntax error. Here are two ways to work around that.
SUBSTRING(ColumnName, 4, INSTR(ColumnName,',',1) - 4)
SUBSTRING(ColumnName, 4, LOCATE_IN_STRING(ColumnName,',') - 4)
I should add that the version is V12R1
If the input str is well formed (i.e. looks like your sample data without any additional tokens such as spaces), you could use something like:
substr(str,locate('CN=', str)+length('CN='), locate(',', str)-length('CN=')-1)
If your Db2 version supports REGEXP, that's a better choice.

Extract substring from string if certain characters exists SQL

I have a string:
DECLARE @UserComment AS VARCHAR(1000) = 'bjones marked inspection on system UP for site COL01545 as Refused to COD won''t pay upfront :Routeid: 12 :Inspectionid: 55274'
Is there a way for me to extract everything from the string after 'Inspectionid: ' leaving me just the InspectionID to save into a variable?
Your example doesn't quite work correctly. You defined your variable as varchar(100) but there are more characters in your string than that.
This should work based on your sample data.
DECLARE @UserComment AS VARCHAR(1000) = 'bjones marked inspection on system UP for site COL01545 as Refused to COD won''t pay upfront :Routeid: 12 :Inspectionid: 55274'
select right(@UserComment, case when charindex('Inspectionid: ', @UserComment, 0) > 0 then len(@UserComment) - charindex('Inspectionid: ', @UserComment, 0) - 13 else len(@UserComment) end)
I would do this as:
select stuff(@UserComment, 1, charindex(':Inspectionid: ', @UserComment) + 14, '')
This does not raise an error if the string is not found, although CHARINDEX then returns 0 and STUFF simply removes the first 14 characters. To get an empty string in this case:
select stuff(@UserComment, 1, charindex(':Inspectionid: ', @UserComment + ':Inspectionid: ') + 14, '')
Firstly, let me say that your @UserComment variable is not long enough to contain the text you're putting into it. Increase the size of that first.
The SQL below will extract the value:
DECLARE @UserComment AS VARCHAR(1000); SET @UserComment = 'bjones marked inspection on system UP for site COL01545 as Refused to COD won''t pay upfront :Routeid: 12 :Inspectionid: 55274'
DECLARE @pos int
DECLARE @InspectionId int
DECLARE @IdToFind varchar(100)
SET @IdToFind = 'Inspectionid: '
SET @pos = CHARINDEX(@IdToFind, @UserComment)
IF @pos > 0
BEGIN
    SET @InspectionId = CAST(SUBSTRING(@UserComment, @pos+LEN(@IdToFind)+1, (LEN(@UserComment) - @pos) + 1) AS INT)
    PRINT @InspectionId
END
You could make the above code into a SQL function if necessary.
If the Inspection ID is always 5 digits then the last argument for the Substring function (length) can be 5, i.e.
SELECT SUBSTRING(@UserComment,PATINDEX('%Inspectionid:%',@UserComment)+14,5)
If the Inspection ID varies (but is always at the end - which your question slightly implies), then the last argument can be derived by subtracting the position of 'InspectionID:' from the overall length of the string. Like this:
SELECT SUBSTRING(@UserComment,PATINDEX('%Inspectionid:%',@UserComment)+14,LEN(@UserComment)-(PATINDEX('%Inspectionid:%',@UserComment)+13))
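If the ID is not guaranteed to be the last thing in the comment, a variation is to take only the run of digits that follows the marker. The script below is a sketch of that idea rather than any answerer's method; the variable names @marker and @tail are made up, and it assumes SQL Server 2008 or later for the inline DECLARE initializers.

```sql
DECLARE @UserComment VARCHAR(1000) = 'bjones marked inspection on system UP for site COL01545 as Refused to COD won''t pay upfront :Routeid: 12 :Inspectionid: 55274';
DECLARE @marker VARCHAR(20) = 'Inspectionid: ';
DECLARE @pos INT = CHARINDEX(@marker, @UserComment);

-- DATALENGTH counts the trailing space in the marker; LEN would ignore it
DECLARE @tail VARCHAR(1000) =
    CASE WHEN @pos > 0
         THEN SUBSTRING(@UserComment, @pos + DATALENGTH(@marker), 1000)
    END;

-- Keep only the leading run of digits. The appended 'x' guarantees that
-- PATINDEX finds a non-digit even when the digits run to the end of the string.
-- NULLIF turns "no digits at all" into NULL instead of 0.
SELECT CAST(NULLIF(LEFT(@tail, PATINDEX('%[^0-9]%', @tail + 'x') - 1), '') AS INT) AS InspectionId;
-- 55274
```

When the marker is missing, @tail is NULL and the result is NULL rather than an error.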

Simple Explanation for PATINDEX

I have been reading up on PATINDEX, attempting to understand the what and the why. I understand that when using the wildcards it will return an INT indicating where the character(s) appear/start. So:
SELECT PATINDEX('%b%', '123b') -- returns 4
However I am looking to see if someone can explain the reason as to why you would use this in a simple(ish) way. I have read some other forums but it just is not sinking in to be honest.
Are you asking for realistic use-cases? I can think of two, real-life use-cases that I've had at work where PATINDEX() was my best option.
I had to import a text-file and parse it for INSERT INTO later on. But these files sometimes had numbers in this format: 00000-59. If you try CAST('00000-59' AS INT) you'll get an error. So I needed code that would parse 00000-59 to -59 but also 00000159 to 159 etc. The - could be anywhere, or it could simply not be there at all. This is what I did:
DECLARE @my_var VARCHAR(255) = '00000-59', @my_int INT
SET @my_var = STUFF(@my_var, 1, PATINDEX('%[^0]%', @my_var)-1, '')
SET @my_int = CAST(@my_var AS INT)
[^0] in this case means "any character that isn't a 0". So PATINDEX() tells me when the 0's end, regardless of whether that's because of a - or a number.
The second use-case I've had was checking whether an IBAN number was correct. In order to do that, any letters in the IBAN need to be changed to a corresponding number (A=10, B=11, etc...). I did something like this (incomplete but you get the idea):
SET @i = PATINDEX('%[^0-9]%', @IBAN)
WHILE @i <> 0 BEGIN
    SET @num = UNICODE(SUBSTRING(@IBAN, @i, 1))-55
    SET @IBAN = STUFF(@IBAN, @i, 1, CAST(@num AS VARCHAR(2)))
    SET @i = PATINDEX('%[^0-9]%', @IBAN)
END
So again, I'm not concerned with finding exactly the letter A or B etc. I'm just finding anything that isn't a number and converting it.
PATINDEX is roughly equivalent to CHARINDEX except that it returns the position of a pattern instead of a literal substring. Examples:
Check if a string contains at least one digit:
SELECT PATINDEX('%[0-9]%', 'Hello') -- 0
SELECT PATINDEX('%[0-9]%', 'H3110') -- 2
Extract numeric portion from a string:
SELECT SUBSTRING('12345', PATINDEX('%[0-9]%', '12345'), 100) -- 12345
SELECT SUBSTRING('x2345', PATINDEX('%[0-9]%', 'x2345'), 100) -- 2345
SELECT SUBSTRING('xx345', PATINDEX('%[0-9]%', 'xx345'), 100) -- 345
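To make the difference from CHARINDEX concrete, the same searches can be run side by side. This is just an illustrative sketch; the literals are made up.

```sql
SELECT CHARINDEX('b', '123b');        -- 4: literal search
SELECT PATINDEX('%b%', '123b');       -- 4: same result via a pattern
SELECT CHARINDEX('[0-9]', 'abc7');    -- 0: CHARINDEX looks for the literal text [0-9]
SELECT PATINDEX('%[0-9]%', 'abc7');   -- 4: PATINDEX treats [0-9] as "any digit"
SELECT PATINDEX('b%', '123b');        -- 0: without the leading %, the pattern must match from position 1
```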
Quoted from PATINDEX (Transact-SQL)
The following example uses % and _ wildcards to find the position at
which the pattern 'en', followed by any one character and 'ure' starts
in the specified string (index starts at 1):
SELECT PATINDEX('%en_ure%', 'please ensure the door is locked');
Here is the result set.
8
You'd use the PATINDEX function when you want to know at which character position a pattern begins in an expression of a valid text or character data type.

Looking for a scalar function to find the last occurrence of a character in a string

Table FOO has a column FILEPATH of type VARCHAR(512). Its entries are absolute paths:
FILEPATH
------------------------------------------------------------
file://very/long/file/path/with/many/slashes/in/it/foo.xml
file://even/longer/file/path/with/more/slashes/in/it/baz.xml
file://something/completely/different/foo.xml
file://short/path/foobar.xml
There's ~50k records in this table and I want to know all distinct filenames, not the file paths:
foo.xml
baz.xml
foobar.xml
This looks easy, but I couldn't find a DB2 scalar function that allows me to search for the last occurrence of a character in a string. Am I overlooking something?
I could do this with a recursive query, but this appears to be overkill for such a simple task and (oh wonder) is extremely slow:
WITH PATHFRAGMENTS (POS, PATHFRAGMENT) AS (
SELECT
1,
FILEPATH
FROM FOO
UNION ALL
SELECT
POSITION('/', PATHFRAGMENT, OCTETS) AS POS,
SUBSTR(PATHFRAGMENT, POSITION('/', PATHFRAGMENT, OCTETS)+1) AS PATHFRAGMENT
FROM PATHFRAGMENTS
)
SELECT DISTINCT PATHFRAGMENT FROM PATHFRAGMENTS WHERE POS = 0
I think what you're looking for is the LOCATE_IN_STRING() scalar function. This is what Info Center has to say if you use a negative start value:
If the value of the integer is less than zero, the search begins at
LENGTH(source-string) + start + 1 and continues for each position to
the beginning of the string.
Combine that with the LENGTH() and RIGHT() scalar functions, and you can get what you want:
SELECT
RIGHT(
FILEPATH
,LENGTH(FILEPATH) - LOCATE_IN_STRING(FILEPATH,'/',-1)
)
FROM FOO
One way to do this is by taking advantage of the power of DB2's XQuery engine. The following worked for me (and fast):
SELECT DISTINCT XMLCAST(
XMLQuery('tokenize($P, ''/'')[last()]' PASSING FILEPATH AS "P")
AS VARCHAR(512) )
FROM FOO
Here I use tokenize to split the file path into a sequence of tokens and then select the last of these tokens. The rest is only conversion from SQL to XML types and back again.
I know that the problem from the OP was already solved but I decided to post the following anyway to hopefully help others like me that land here.
I came across this thread while searching for a solution to my similar problem which had the exact same requirement but was for a different kind of database that was also lacking the REVERSE function.
In my case this was for an OpenEdge (Progress) database, which has a slightly different syntax. It does offer the INSTR function that most Oracle-style databases provide.
So I came up with the following code:
SELECT
SUBSTRING(
foo.filepath,
INSTR(foo.filepath, '/',1, LENGTH(foo.filepath) - LENGTH( REPLACE( foo.filepath, '/', '')))+1,
LENGTH(foo.filepath))
FROM foo
However, in my specific situation (the OpenEdge (Progress) database) this did not produce the desired behaviour, because replacing the character with an empty string gave the same length as the original string. This doesn't make much sense to me, but I was able to bypass the problem with the code below:
SELECT
SUBSTRING(
foo.filepath,
INSTR(foo.filepath, '/',1, LENGTH( REPLACE( foo.filepath, '/', 'XX')) - LENGTH(foo.filepath))+1,
LENGTH(foo.filepath))
FROM foo
Now, I understand that this code won't solve the problem for T-SQL, because T-SQL has no built-in alternative to the INSTR function that offers the occurrence parameter.
Just to be thorough, I'll add the code needed to create this scalar function in SQL Server so it can be used the same way as in the examples above.
-- Drop the function if it already exists
IF OBJECT_ID('INSTR', 'FN') IS NOT NULL
    DROP FUNCTION INSTR
GO
-- User-defined function to implement Oracle INSTR in SQL Server
CREATE FUNCTION INSTR (@str VARCHAR(8000), @substr VARCHAR(255), @start INT, @occurrence INT)
RETURNS INT
AS
BEGIN
    DECLARE @found INT = @occurrence,
            @pos INT = @start;
    WHILE 1=1
    BEGIN
        -- Find the next occurrence
        SET @pos = CHARINDEX(@substr, @str, @pos);
        -- Nothing found
        IF @pos IS NULL OR @pos = 0
            RETURN @pos;
        -- The required occurrence found
        IF @found = 1
            BREAK;
        -- Prepare to find another occurrence
        SET @found = @found - 1;
        SET @pos = @pos + 1;
    END
    RETURN @pos;
END
GO
To state the obvious: when the REVERSE function is available, you do not need to create this scalar function and you can just get the required result like this:
SELECT
SUBSTRING(
foo.filepath,
LEN(foo.filepath) - CHARINDEX('/', REVERSE(foo.filepath))+2,
LEN(foo.filepath))
FROM foo
You could just do it in a single statement:
select distinct reverse(substring(reverse(FILEPATH), 1, charindex('/', reverse(FILEPATH))-1))
from filetable
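For SQL Server, the REVERSE approach can be made tolerant of values that contain no slash at all. The script below is a sketch only: the table variable @paths is made up to mirror table FOO, and the third row is an invented edge case.

```sql
-- Hypothetical table variable standing in for FOO
DECLARE @paths TABLE (FILEPATH VARCHAR(512));
INSERT INTO @paths (FILEPATH) VALUES
    ('file://very/long/file/path/with/many/slashes/in/it/foo.xml'),
    ('file://short/path/foobar.xml'),
    ('no_slash_here.xml');

-- The first '/' in the reversed string is the last '/' in the original.
-- Appending '/' to the reversed value guarantees a match, so a value
-- without any slash comes back unchanged instead of raising an error.
SELECT DISTINCT
    RIGHT(FILEPATH, CHARINDEX('/', REVERSE(FILEPATH) + '/') - 1) AS FILENAME
FROM @paths;
-- Returns foo.xml, foobar.xml and no_slash_here.xml
```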

SQL Server substring breaking on words, not characters

I'd like to show no more than n characters of a text field in search results to give the user an idea of the content. However, I can't find a way to easily break on words, so I wind up with a partial word at the break.
When I want to show: "This student has not submitted his last few assignments", the system might show: "This student has not submitted his last few assig"
I'd prefer that the system show up to the n character limit where words are preserved, so I'd like to see:
"This student has not submitted his last few"
Is there a nearest word function that I could write in T-SQL, or should I do that when I get the results back into ASP or .NET?
If you must do it in T-SQL:
DECLARE @t VARCHAR(100)
SET @t = 'This student has not submitted his last few assignments'
SELECT LEFT(LEFT(@t, 50), LEN(LEFT(@t, 50)) - CHARINDEX(' ', REVERSE(LEFT(@t, 50))))
It will not be catastrophically slow, but it will definitely be slower than doing it in the presentation layer.
Other than that, just cutting off the word and appending an ellipsis for longer strings is not a bad option either. This way at least all truncated strings have the same length, which might come in handy if you are formatting for a fixed-width output.
I agree with doing this outside of the database; that way, other applications with different length restrictions can make their own decisions on what to show or hide. Perhaps the limit could be a parameter to the database call, though.
Here's a quick stab at a solution:
DECLARE @OriginalData NVARCHAR(MAX)
,@ReversedData NVARCHAR(MAX)
,@MaxLength INT
,@DelimiterPosition INT ;
SELECT @OriginalData = 'This student has not submitted his last few assignments'
,@MaxLength = 45;
SET @ReversedData = REVERSE(
    LEFT(@OriginalData, @MaxLength)
);
SET @DelimiterPosition = CHARINDEX(' ', @ReversedData);
PRINT LEFT(@OriginalData, @MaxLength - @DelimiterPosition);
/*
This student has not submitted his last few assignments
1234567890123456789012345678901234567890123456789012345
*/
I recommend doing that kind of logic outside database. With C# it could look similar to this:
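static string Cut(string s, int length)
{
    // Short enough already: nothing to cut
    if (s.Length <= length)
    {
        return s;
    }
    // Walk back to the nearest space; stop at 0 so a string
    // without any spaces cannot index below the start
    while (length > 0 && s[length] != ' ')
    {
        length--;
    }
    return s.Substring(0, length).Trim();
}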
Of course you could do this with T-SQL, but that is a bad idea (bad performance etc.). If you really need to put it inside the DB, I would use a CLR-based stored procedure instead.
I'd like to add to the solutions already offered that word breaking logic is a lot more complicated than it seems on the surface. To do it well you are going to need to define a number of rules for what constitutes a word. Consider the following:
Spaces - No brainer.
Hyphens - Well that depends. In Over-exposed probably, in re-animated probably not. Then what about dates such as 01-02-1985?
Periods - No brainer. Oh wait, what about the one in myemail@myisp.com or $79.95?
Commas - In numbers such as 1,239 no, but in sentences yes.
Apostrophes - In O'Reily no, in SQL is an 'Enterprise' Database tool yes.
Do special characters alone constitute words?: In Item 1 : Buy TP is the colon counted as a word?
I found an answer on this site and modified it:
the cast (150) must be greater than the number of characters you're returning (100)
LEFT (Cast(myTextField As varchar(150)),
CHARINDEX(' ', CAST(flag_myTextField AS VARCHAR(150)), 100) ) AS myTextField_short
I'm not sure how fast this will run, but it will work....
DECLARE @Max int
SET @Max=??
SELECT
REVERSE(RIGHT(REVERSE(LEFT(YourColumnHere,@Max)),@Max- CHARINDEX(' ',REVERSE(LEFT(YourColumnHere,@Max)))))
FROM YourTable
WHERE X=Y
I wouldn't advise doing that either, but if you must, you can do something like this:
DECLARE @text nvarchar(max);
DECLARE @end_char int;
SELECT @text = 'This student has not submitted his last few assignments', @end_char = 50 ;
WHILE @end_char > 0 AND SUBSTRING( @text, @end_char+1, 1 ) <> ' '
    SET @end_char = @end_char - 1
SELECT @text = SUBSTRING( @text, 1, @end_char ) ;
SELECT @text
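Several of the T-SQL answers above share two edge cases: text that already fits within the limit, and text with no space inside the limit. The script below is one possible way to cover both without a loop. It is a sketch under assumptions, not a tuned implementation: the variable names @max and @head are made up, a single space is taken as the only word separator, and the inline DECLARE initializers need SQL Server 2008 or later.

```sql
DECLARE @text NVARCHAR(MAX) = N'This student has not submitted his last few assignments';
DECLARE @max INT = 50;

-- Look one character past the limit, so a space sitting exactly at
-- position @max + 1 lets the full @max characters through
DECLARE @head NVARCHAR(MAX) = LEFT(@text, @max + 1);

SELECT CASE
           -- Already short enough: return as is
           WHEN LEN(@text) <= @max THEN @text
           -- No space to break on: fall back to a hard cut
           WHEN CHARINDEX(N' ', @head) = 0 THEN LEFT(@text, @max)
           -- Cut just before the last space in @head; @head is exactly
           -- @max + 1 characters long whenever this branch is reached
           ELSE RTRIM(LEFT(@head, (@max + 1) - CHARINDEX(N' ', REVERSE(@head))))
       END AS Preview;
-- This student has not submitted his last few
```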