Isolating an alphanumerical ID that can be anywhere in a text based cell

I'm trying to isolate a certain character string from a text cell.
For example, I would like to extract "AB-T120-15" from the string "His server ID was AB-T120-15 and his problem was that he needed a reboot"
AB-T120-15 is only an example; all the codes have a maximum length of 13 characters and start with a prefix such as AB-T, CL-R, etc.
The codes can appear anywhere in a text field of the column.
string_split() cannot be used since the DB we are under is older.
I have tried many combinations of SUBSTRING and LEFT, but I cannot seem to get it to work.
Any thoughts?

String operations are not the strength of SQL Server -- which I assume you are using.
You can do this with rather painful string manipulation:
select left(stuff(str, 1, patindex('%[A-Z][A-Z]-[A-Z]%', str) - 1, ''),
charindex(' ', stuff(str, 1, patindex('%[A-Z][A-Z]-[A-Z]%', str), '') + ' ')
)
from (values ('His server ID was AB-T120-15 and his problem was that he needed a reboot')) v(str);
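If the text can be post-processed outside the database, the same idea (find where the prefix pattern starts, then read up to the next space) can be sketched with a regular expression. This is only an illustration in Python: the ID shape is an assumption taken from the question's examples, and the names ID_PATTERN and extract_id are hypothetical.

```python
import re

# Assumed ID shape, taken from the question's examples: two capital letters,
# a hyphen, a capital letter, then any run of non-space characters.
ID_PATTERN = re.compile(r"[A-Z]{2}-[A-Z]\S*")


def extract_id(text):
    """Return the first ID-like token in text, or None if there is none."""
    match = ID_PATTERN.search(text)
    return match.group(0) if match else None


sample = "His server ID was AB-T120-15 and his problem was that he needed a reboot"
print(extract_id(sample))  # AB-T120-15
```

Like the SQL version, this reads up to the next space, so punctuation directly after a code would be included in the result.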

Related

In SQL Server, how can I identify "double" strings and correct?

How can I find strings in a column that are doubled-up and correct them? I feel like there is an easy answer to this I just can't think of it.
Example:
I want to find instances of a repeating string, example "SolonSolon", and then update the column to "Solon".
Update:
They're always the same. No extra characters, but might have a space as part of the repeating value. Other examples would be...
"PlacePlace", "TreeTree", "OrangeOrange", "TravisMemorialHSTravisMemorialHS", "Texas HSTexas HS"

You can check if the string is equal to the first half replicated.
SELECT LEFT(YourCol,LEN(REPLACE(YourCol, ' ', 'x'))/2)
FROM YourTable
WHERE YourCol = REPLICATE(LEFT(YourCol,LEN(REPLACE(YourCol, ' ', 'x'))/2),2)
Spaces are replaced with x before calculating LEN because that function ignores trailing spaces. You can also use the technique in @lptr's answer for this, but an edge case arises if the column is varchar(8000) and the value is already 8000 characters long: concatenating an extra character then does nothing (LEN(SPACE(8000) + 'x') is 0).
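For anyone who wants to check the "first half replicated" test on a few values before running an UPDATE, here is a small sketch of the same comparison in Python. The function name undouble is hypothetical, and returning None for non-doubled values is just a choice made for this illustration.

```python
def undouble(value):
    """Return the first half if value is that half repeated twice, else None."""
    half, remainder = divmod(len(value), 2)
    if remainder or half == 0:
        return None  # odd-length or empty strings cannot be a doubled value
    first = value[:half]
    return first if value == first * 2 else None


for s in ["SolonSolon", "Texas HSTexas HS", "ababc", "onetwoone"]:
    print(repr(s), "->", undouble(s))
```

Python's len() counts trailing spaces, so the REPLACE trick needed for LEN in SQL Server has no counterpart here.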

..replace the first half of the value with an empty string..if there is nothing left..the value consists of two equal parts
select *, substring(c, 1, (len(c+'.')-1)/2)
from
(
values
('solosolo'), ('yoyo'), ('andand'), ('1212'),(' . .'),
('ababc'), ('onetwoone')
) as t(c)
where replace(c, substring(c, 1, (len(c+'.')-1)/2), '') = '';

Another alternative. The query removes inner spaces using REPLACE(str_col, ' ', ''), removes leading/trailing spaces using TRIM, and checks that the first half of the string equals the second half.
select left(no_spaces.str_col, v.str_len/2)
from foo f
cross apply (values (replace(trim(f.str_col), ' ', ''))) no_spaces(str_col)
cross apply (values (len(no_spaces.str_col))) v(str_len)
where no_spaces.str_col=replicate(left(no_spaces.str_col, v.str_len/2), 2);

Trim FIRST character in string that has multiple text of that character

I'm using SQL Server 2008 and I'm trying to trim values that look like this
DocID
----------------
FOO_1_23_456
FOO1_1_23_4567
I'm trying to make it so it will only give me everything after the first '_'
Result
_1_23_456
_1_23_4567
Right now my query is
select
right(DocID.Document, LEN(DocID.Document) - 3) AS NewDocID
which only trims the first 3 characters. I need it to trim everything before the first '_'
Thanks

Use stuff() and charindex():
select stuff(document, 1, charindex('_', document) - 1, '')
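To make the arithmetic visible: charindex gives the position of the first '_', and stuff deletes the characters in front of it. A rough equivalent in Python, with a hypothetical function name, might look like this. How to treat a value with no underscore at all is an assumption here (it is returned unchanged).

```python
def from_first_underscore(doc_id):
    """Drop everything before the first '_' (the underscore itself is kept)."""
    position = doc_id.find("_")
    # Assumption: a value with no underscore is returned unchanged.
    return doc_id if position == -1 else doc_id[position:]


for doc_id in ["FOO_1_23_456", "FOO1_1_23_4567"]:
    print(doc_id, "->", from_first_underscore(doc_id))
```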

Using Upper to Capitalize the first letter of City name

I am doing some data clean-up and need to capitalize the first letter of city names. How do I capitalize the second word in a city like Terra Bella?
SELECT UPPER(LEFT([MAIL CITY],1))+
LOWER(SUBSTRING([MAIL CITY],2,LEN([MAIL CITY])))
FROM masterfeelisting
My result is 'Terra bella' and I need 'Terra Bella'. Thanks in advance.

Ok, I know I answered this before, but it bugged me that we couldn't write something efficient to handle an unknown amount of 'text segments'.
So re-thinking it and researching, I discovered a way to change the [MAILCITY] field into XML nodes where each 'text segment' is assigned its own node within the xml field. Then those xml fields can be processed node by node, concatenated together, and then changed back to a SQL varchar. It's convoluted, but it works. :)
Here's the code:
CREATE TABLE
#masterfeelisting (
[MAILCITY] varchar(max) not null
);
INSERT INTO #masterfeelisting VALUES
('terra bellA')
,(' terrA novA ')
,('chicagO ')
,('bostoN')
,('porT dE sanTo')
,(' porT dE sanTo pallo ');
SELECT
RTRIM
(
(SELECT
UPPER([xmlField].[xmlNode].value('.', 'char(1)')) +
LOWER(STUFF([xmlField].[xmlNode].value('.', 'varchar(max)'), 1, 1, '')) + ' '
FROM [xmlNodeRecordSet].[nodeField].nodes('/N') as [xmlField]([xmlNode]) FOR
xml path(''), type
).value('.', 'varchar(max)')
) as [MAILCITY]
FROM
(SELECT
CAST('<N>' + REPLACE([MAILCITY],' ','</N><N>')+'</N>' as xml) as [nodeField]
FROM #masterfeelisting
) as [xmlNodeRecordSet];
Drop table #masterfeelisting;
First I create a table and fill it with dummy values.
Now here is the beauty of the code:
For each record in #masterfeelisting, we are going to create an xml field with a node for each 'text segment'.
ie. '<N></N><N>terrA</N><N>novA</N><N></N>'
(This is built from the varchar ' terrA novA ')
1) The way this is done is by using the REPLACE function.
The string starts with a '<N>' to designate the beginning of the node. Then:
REPLACE([MAILCITY],' ','</N><N>')
This effectively goes through the whole [MAILCITY] string and replaces each
' ' with '</N><N>'
and then the string ends with a '</N>'. Where '</N>' designates the end of each node.
So now we have a beautiful XML string with a couple of empty nodes and the 'text segments' nicely nestled in their own node. All the 'spaces' have been removed.
2) Then we have to CAST the string into xml. And we will name that field [nodeField]. Now we can use xml functions on our newly created record set. (Conveniently named [xmlNodeRecordSet].)
3) Now we can read the [xmlNodeRecordSet] into the main sub-Select by stating:
FROM [xmlNodeRecordSet].[nodeField].nodes('/N')
This tells us we are reading the [nodeField] as nodes with a '/N' delimiter.
This table of node fields is then parsed by stating:
as [xmlField]([xmlNode]) FOR xml path(''), type
This means each [xmlField] will be parsed for each [xmlNode] in the xml string.
4) So in the main sub-select:
Each blank node '<N></N>' is discarded. (Or not processed.)
Each node with a 'text segment' in it will be parsed. ie <N>terrA</N>
UPPER([xmlField].[xmlNode].value('.', 'char(1)')) +
This code grabs each node out of the field, takes its contents ('.') and keeps only the first character ('char(1)'), then upper-cases that character. The plus sign at the end concatenates this letter with the next bit of code:
LOWER(STUFF([xmlField].[xmlNode].value('.', 'varchar(max)'), 1, 1, ''))
Now here is the beauty... STUFF is a function that will take a string, from a position, for a length, and substitute another string.
STUFF(string, start position, length, replacement string)
So our string is:
[xmlField].[xmlNode].value('.', 'varchar(max)')
Which grabs the whole string inside the current node since it is 'varchar(max)'.
The start position is 1. The length is 1. And the replacement string is ''. This effectively strips off the first character by replacing it with nothing. So the remaining string is all the other characters that we want to have lower case. So that's what we do... we use LOWER to make them all lower case. And this result is concatenated to our first letter that we already upper cased.
But wait... we are not done yet... we still have to append a + ' '. Which adds a blank space after our nicely capitalized 'text segment'. Just in case there is another 'text segment' after this node is done.
This main sub-Select will now parse each node in our [xmlField] and concatenate them all nicely together.
5) But now that we have one big happy concatenation, we still have to change it back from an xml field to a SQL varchar field. So after the main sub-select we need:
.value('.', 'varchar(max)')
This changes our [MAILCITY] back to a SQL varchar.
6) But hold on... we are still not done. Remember we put an extra space at the end of each 'text segment'? The last 'text segment' still has that extra space after it, so we need to right-trim it off using RTRIM.
7) And don't forget to rename the final field back to [MAILCITY].
8) And that's it. This code will take an unknown number of 'text segments' and format each one of them. All using the fun of XML and its node parsers.
Hope that helps :)
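For comparison, the split / capitalize / re-join steps described above can be traced outside the database in a few lines. This is only a sketch in Python with a hypothetical function name, not a replacement for the query.

```python
def title_case_segments(city):
    """Capitalize the first letter of every whitespace-separated segment."""
    # str.split() with no arguments drops leading, trailing and repeated
    # whitespace, which plays the role of skipping the empty <N></N> nodes.
    segments = city.split()
    # Upper-case the first character, lower-case the rest, re-join with spaces.
    return " ".join(seg[:1].upper() + seg[1:].lower() for seg in segments)


for city in [" terrA novA ", "chicagO ", "porT dE sanTo"]:
    print(repr(city), "->", repr(title_case_segments(city)))
```

Because the segments are joined with single spaces, no trailing space is left over and nothing needs the final RTRIM step.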

Here's one way to handle this using APPLY. Note that this solution supports up to 3 substrings (e.g. "Phoenix", "New York", "New York City") but can easily be updated to handle more.
DECLARE @string varchar(100) = 'nEW yoRk ciTY';
WITH DELIMCOUNT(String, DC) AS
(
SELECT @string, LEN(RTRIM(LTRIM(@string)))-LEN(REPLACE(RTRIM(LTRIM(@string)),' ',''))
),
CIPOS AS
(
SELECT *
FROM DELIMCOUNT
CROSS APPLY (SELECT CHARINDEX(char(32), string, 1)) CI1(CI1)
CROSS APPLY (SELECT CHARINDEX(char(32), string, CI1.CI1+1)) CI2(CI2)
)
SELECT
OldString = @string,
NewString =
CASE DC
WHEN 0 THEN UPPER(SUBSTRING(string,1,1))+LOWER(SUBSTRING(string,2,8000))
WHEN 1 THEN UPPER(SUBSTRING(string,1,1))+LOWER(SUBSTRING(string,2,CI1-1)) +
UPPER(SUBSTRING(string,CI1+1,1))+LOWER(SUBSTRING(string,CI1+2,100))
WHEN 2 THEN UPPER(SUBSTRING(string,1,1))+LOWER(SUBSTRING(string,2,CI1-1)) +
UPPER(SUBSTRING(string,CI1+1,1))+LOWER(SUBSTRING(string,CI1+2,CI2-(CI1+1))) +
UPPER(SUBSTRING(string,CI2+1,1))+LOWER(SUBSTRING(string,CI2+2,100))
END
FROM CIPOS;
Results:
OldString NewString
--------------- --------------
nEW yoRk ciTY New York City

This will only capitalize the first letter of the second word. A shorter but less flexible approach. Replace @str with [Mail City].
DECLARE @str AS VARCHAR(50) = 'Los angelas'
SELECT STUFF(@str, CHARINDEX(' ', @str) + 1, 1, UPPER(SUBSTRING(@str, CHARINDEX(' ', @str) + 1, 1)));

This is a way to use embedded Selects for three City name parts.
It uses CHARINDEX to find the location of your separator character. (ie a space)
I put an 'if' structure around the Select to test if you have any records with more than 3 parts to the city name. If you ever get the warning message, you could add another sub-Select to handle another city part.
Although... just to be clear... SQL is not the best language to do complicated formatting. It was written as a data retrieval engine with the idea that another program will take that data and massage it into a friendlier look and feel. It may be easier to handle the formatting in another program. But if you insist on using SQL and you need to account for city names with 5 or more parts... you may want to consider using Cursors so you can loop through the variable possibilities. (But Cursors are not a good habit to get into. So don't do that unless you've exhausted all other options.)
Anyway, the following code creates and populates a table so you can test the code and see how it works. Enjoy!
CREATE TABLE
#masterfeelisting (
[MAILCITY] varchar(30) not null
);
Insert into #masterfeelisting select 'terra bella';
Insert into #masterfeelisting select ' terrA novA ';
Insert into #masterfeelisting select 'chicagO ';
Insert into #masterfeelisting select 'bostoN';
Insert into #masterfeelisting select 'porT dE sanTo';
--Insert into #masterfeelisting select ' porT dE sanTo pallo ';
Declare @intSpaceCount as integer;
SELECT @intSpaceCount = max (len(RTRIM(LTRIM([MAILCITY]))) - len(replace([MAILCITY],' ',''))) FROM #masterfeelisting;
if @intSpaceCount > 2
SELECT 'You need to account for more than 3 city name parts ' as Warning, @intSpaceCount as SpacesFound;
else
SELECT
cThird.[MAILCITY1] + cThird.[MAILCITY2] + cThird.[MAILCITY3] as [MAILCITY]
FROM
(SELECT
bSecond.[MAILCITY1] as [MAILCITY1]
,SUBSTRING(bSecond.[MAILCITY2],1,bSecond.[intCol2]) as [MAILCITY2]
,UPPER(SUBSTRING(bSecond.[MAILCITY2],bSecond.[intCol2] + 1, 1)) +
SUBSTRING(bSecond.[MAILCITY2],bSecond.[intCol2] + 2,LEN(bSecond.[MAILCITY2]) - bSecond.[intCol2]) as [MAILCITY3]
FROM
(SELECT
SUBSTRING(aFirst.[MAILCITY],1,aFirst.[intCol1]) as [MAILCITY1]
,UPPER(SUBSTRING(aFirst.[MAILCITY],aFirst.[intCol1] + 1, 1)) +
SUBSTRING(aFirst.[MAILCITY],aFirst.[intCol1] + 2,LEN(aFirst.[MAILCITY]) - aFirst.[intCol1]) as [MAILCITY2]
,CHARINDEX ( ' ', SUBSTRING(aFirst.[MAILCITY],aFirst.[intCol1] + 1, LEN(aFirst.[MAILCITY]) - aFirst.[intCol1]) ) as intCol2
FROM
(SELECT
UPPER (LEFT(RTRIM(LTRIM(mstr.[MAILCITY])),1)) +
LOWER(SUBSTRING(RTRIM(LTRIM(mstr.[MAILCITY])),2,LEN(RTRIM(LTRIM(mstr.[MAILCITY])))-1)) as [MAILCITY]
,CHARINDEX ( ' ', RTRIM(LTRIM(mstr.[MAILCITY]))) as intCol1
FROM
#masterfeelisting as mstr -- Initial Master Table
) as aFirst -- First Select Shell
) as bSecond -- Second Select Shell
) as cThird; -- Third Select Shell
Drop table #masterfeelisting;

Parse SQL file to separate columns

I have a sql file which has a lot of insert statements (over 3000+).
E.g.
insert into `pubs_for_client` (`ID`, `num`, `pub_name`, `pub_address`, `publ_tele`, `publ_fax`, `pub_email`, `publ_website`, `pub_vat`, `publ_last_year`, `titles_on_backlist`, `Personnel`) values('7','5','4TH xxxx xxxx','xxxx xxxx, 16 xxxxx xxxxx, xxxxxxx, We','111111111','1111111111','support@example.net','www.example.net','15 675 4238 14',NULL,NULL,'Jane Bloggs(Sales Contact:)jane.bloggs@example.net,Joe Bloggs(Other Contact:)joe.bloggs@example.net');
I have exported this into an excel document (I did this through running the query in phpmyadmin, and exporting for an excel document). There's just one problem, as you can see in this case, there are two names & email addresses being inserted into 'Personnel'.
How easy/difficult would it be to separate these out to display as Name, email, Name2, email2?

What about when there are three e-mails/names?
With shown data it should be easy to do
select replace(substring(substring_index(`Personnel`, ',', 1),length(substring_index(`Personnel`, ',', 1 - 1)) + 1), ',', '') personnel1,
replace(substring(substring_index(`Personnel`, ',', 2),length(substring_index(`Personnel`, ',', 2 - 1)) + 1), ',', '') personnel2
from `pubs_for_client`
The above will split the Personnel column on delimiter ,.
You can then split these fields on delimiter ( and ) to split personnel into name, position and e-mail
The SQL will be ugly (because mysql does not have split function), but it will get the job done.
The split expression was taken from comments on mysql documentation (search for split).
You can also
CREATE FUNCTION strSplit(x varchar(255), delim varchar(12), pos int) returns varchar(255)
return replace(substring(substring_index(x, delim, pos), length(substring_index(x, delim, pos - 1)) + 1), delim, '');
After which you can use
select strSplit(`Personnel`, ',', 1), strSplit(`Personnel`, ',', 2)
from `pubs_for_client`
You could also create your own function that will extract directly names and e-mails.
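If the export ends up being processed outside MySQL anyway, the two-level split (first on the comma, then on the parentheses) can be sketched like this in Python. The Name(Role:)email shape is an assumption based on the single sample row, and ENTRY and split_personnel are hypothetical names.

```python
import re

# Assumed entry shape, from the sample row: Name(Role:)email
ENTRY = re.compile(r"^(?P<name>[^(]+)\((?P<role>[^)]*)\)(?P<email>.+)$")


def split_personnel(personnel):
    """Split a Personnel value into (name, role, email) tuples."""
    people = []
    for entry in personnel.split(","):
        match = ENTRY.match(entry.strip())
        if match:  # entries that do not fit the assumed shape are skipped
            people.append((match["name"].strip(),
                           match["role"].rstrip(":").strip(),
                           match["email"].strip()))
    return people


row = ("Jane Bloggs(Sales Contact:)jane.bloggs@example.net,"
       "Joe Bloggs(Other Contact:)joe.bloggs@example.net")
for person in split_personnel(row):
    print(person)
```

This handles any number of names per row, which also covers the case of three e-mails/names, but it would break on a name or role that itself contains a comma.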

As bogeymin has already said, either get the data to CSV (or convert it easily from Excel) to manipulate it. If you're on Windows, then have a look at using Notepad++ to break apart the last column.
Or... (and I'd probably do this), insert it into the database as it is (even if you insert into a dummy field, not the one you eventually want to use), then use the string manipulation functions in your variant of SQL to make either update statements or more insert statements (whatever you need). Certainly, MS-SQL Server can do this using things like SUBSTRING, PATINDEX, etc.

Oracle SQL - Parsing a name string and converting it to first initial & last name

Does anyone know how to turn this string: "Smith, John R"
Into this string: "jsmith" ?
I need to lowercase everything with lower()
Find where the comma is and track its integer location value
Get the first character after that comma and put it in front of the string
Then get the entire last name and stick it after the first initial.
Sidenote - instr() function is not compatible with my version
Thanks for any help!

Start by writing your own INSTR function - call it my_instr for example. It will start at char 1 and loop until it finds a ','.
Then use as you would INSTR.

The best way to do this is using Oracle Regular Expressions feature, like this:
SELECT LOWER(regexp_replace('Smith, John R',
'(.+)(, )([A-Z])(.+)',
'\3\1', 1, 1))
FROM DUAL;
That says: when you find the pattern of any set of characters, followed by ", ", followed by an uppercase character, followed by any remaining characters, take the third element (the initial of the first name) and append the first element (the last name). Then make everything lowercase.
Your side note: "instr() function is not compatible with my version" doesn't make sense to me, as that function's been around for ages. Check your version, because regular expression support was only added to Oracle in version 10g.
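The group numbering is the part that tends to confuse people, so here is the same pattern and replacement run through Python's re module as a quick way to experiment with it. The function name is hypothetical, and Python's regex flavor is not identical to Oracle's, although this particular pattern behaves the same way in both.

```python
import re


def first_initial_last_name(name):
    """Turn 'Last, First M' into lowercase first initial + last name."""
    # Group 1 is the last name, group 2 the ", " separator, group 3 the first
    # capital letter after it, group 4 the rest. Only groups 3 and 1 are kept.
    return re.sub(r"(.+)(, )([A-Z])(.+)", r"\3\1", name).lower()


print(first_initial_last_name("Smith, John R"))  # jsmith
```

A name that does not match the pattern (for example, no space after the comma) is simply lower-cased and returned otherwise unchanged.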
Thanks for the points.
-- Stew

instr() is not compatible with your version of what? Oracle? Are you using version 4 or something?

There is no need to create your own function, and quite frankly, it seems a waste of time when this can be done fairly easily with sql functions that already exist. Care must be taken to account for sloppy data entry.
Here is another way to accomplish your stated goal:
with name_list as
(select ' Parisi, Kenneth R' name from dual)
select name
-- There may be a space after the comma. This will strip an arbitrary
-- amount of whitespace from the first name, so we can easily extract
-- the first initial.
, substr(trim(substr(name, instr(name, ',') + 1)), 1, 1) AS first_init
-- a simple substring function, from the first character until the
-- last character before the comma.
, substr(trim(name), 1, instr(trim(name), ',') - 1) AS last_name
-- put together what we have done above to create the output field
, lower(substr(trim(substr(name, instr(name, ',') + 1)), 1, 1)) ||
lower(substr(trim(name), 1, instr(trim(name), ',') - 1)) AS init_plus_last
from name_list;
HTH,
Gabe

I have a hard time believing you don’t have access to a proper instr() but if that’s the case, implement your own version.
Assuming you have that straightened out:
select
substr(
lower( 'Smith, John R' )
, instr( 'Smith, John R', ',' ) + 2
, 1
) || -- first_initial
substr(
lower( 'Smith, John R' )
, 1
, instr( 'Smith, John R', ',' ) - 1
) -- last_name
from dual;
Also, be careful about your assumption that all names will be in that format. Watch out for something other than a single space after the comma, last names having data like “Parisi, Jr.”, etc.
