Remove unwanted text from column values

I don't know how to do this. In my table, descriptions are mixed with a description code. I need to remove the code and keep only the description. The description is the first part, without the acronym (capital letters at the end). I use SQL Server 2012.
Example:
ColumnDescription
Chemistry Q
education E
psychology P
Sociology SOC
Documentation DOC
communication COM
Political Science CP
Pharmacy and Toxicology FT
Engineering Education (General) ING-G

If you are looking to simply strip the code that is at the end of each string, a way to do this is to identify the last space character in the string and then use SUBSTRING to extract everything before that character:
SELECT SUBSTRING(ColumnDescription, 0, LEN(ColumnDescription) - CHARINDEX(' ', REVERSE(ColumnDescription)) + 1) AS ColumnDescription
FROM Table
Note that I do not know what your table is called, so I called it Table.
This effectively reverses the column text (using REVERSE), finds the first occurrence of a space character (using CHARINDEX) and then subtracts this from the length of the text (using LEN).
Then a simple SUBSTRING is used to extract the left-most portion of the text, resulting in the output of:
ColumnDescription
-----------------
Chemistry
education
psychology
Sociology
Documentation
communication
Political Science
Pharmacy and Toxicology
Engineering Education (General)
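The same idea can also be written with LEFT. The following is a minimal sketch, not the query from the answer above: the table variable @t, its rows, and the output column name DescriptionOnly are assumptions made for illustration. It trims trailing spaces first (LEN ignores them, REVERSE does not) and guards against values that contain no space at all.

```sql
-- Hypothetical sample data: @t and its rows are illustrative only.
DECLARE @t TABLE (ColumnDescription varchar(100));
INSERT INTO @t (ColumnDescription)
VALUES ('Chemistry Q'),
       ('Political Science CP'),
       ('Engineering Education (General) ING-G'),
       ('Unlabelled');   -- no space, so there is no code to strip

SELECT ColumnDescription,
       CASE
           -- No space at all: return the trimmed value unchanged.
           WHEN CHARINDEX(' ', RTRIM(ColumnDescription)) = 0
               THEN RTRIM(ColumnDescription)
           -- Otherwise keep everything before the last space.
           ELSE LEFT(RTRIM(ColumnDescription),
                     LEN(ColumnDescription)
                     - CHARINDEX(' ', REVERSE(RTRIM(ColumnDescription))))
       END AS DescriptionOnly
FROM @t;
```

To change the stored values rather than just select them, the same CASE expression could be used on the right-hand side of an UPDATE ... SET.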

Based on your sample data I would guess that your problem could be simplified to:
Split the string at the last space character in the string
In which case:
DECLARE @your_table table (
    ColumnDescription varchar(100)
);
INSERT INTO @your_table (ColumnDescription)
VALUES ('Chemistry Q')
     , ('education E')
     , ('psychology P')
     , ('Sociology SOC')
     , ('Documentation DOC')
     , ('communication COM')
     , ('Political Science CP')
     , ('Pharmacy and Toxicology FT')
     , ('Engineering Education (General) ING-G');
SELECT *
       -- everything after the last space (the code)
     , SubString(ColumnDescription, number_of_characters - last_space + 2, 100) As last_part
       -- everything before the last space (the description)
     , SubString(ColumnDescription, 1, number_of_characters - last_space) As first_part
FROM (
    SELECT ColumnDescription
         , Len(ColumnDescription) As number_of_characters
         , Reverse(ColumnDescription) As reversed
         , CharIndex(' ', Reverse(ColumnDescription)) As last_space
    FROM @your_table
) As x;

Related

SQL Server replace different characters in a column value

I have a Barcode column with some data like below:
Z1B1S1A -- Zone 1 Bay 1 Shelf 1A
Z10B10S10B -- Zone 10 Bay 10 Shelf 10B
want to replace them with:
01-01-01A -- I think I can get by with 1-1-1A
10-10-10B
The zone, bay, shelf can go from 1 to 99.
The problem for me is the inconsistency between 1-digit and 2-digit numbers, and that a final character 'B' can be mistaken for the Bay marker.
Thank you for any help.

Two solutions. The first is based on help and ideas from Analyst and DeepShiKha.
1. First solution
-- SUBSTRING ( expression ,start , length )
-- CHARINDEX ( expressionToFind , expressionToSearch [ , start_location ] )
select substring(Barcode,2,charindex('B',barcode)-2) AS Zoney,
substring(Barcode,charindex('B',Barcode)+1, charindex('S',Barcode)-charindex('B',Barcode)-1) AS Bay,
substring(Barcode,charindex('S',Barcode)+1,len(Barcode)) AS Shelf,
concat (
substring(Barcode,2,charindex('B',barcode)-2),'-',
substring(Barcode,charindex('B',Barcode)+1, charindex('S',Barcode)-charindex('B',Barcode)-1), '-',
substring(Barcode,charindex('S',Barcode)+1,len(Barcode))
) AS ZBS
from TB_BarcodeTag4
where Barcode LIKE 'Z%B%S%'
2. Second solution (my own)
UPDATE TB_BarcodeTag4
SET Barcode = STUFF(Barcode, LEN(Barcode),1, '&')
WHERE Barcode LIKE 'Z%B%S%' AND Barcode like '%B'
UPDATE TB_BarcodeTag4
SET Barcode = REPLACE(REPLACE(REPLACE(Barcode, 'Z', ''),'B','-'),'S','-')
WHERE Barcode LIKE 'Z%B%S%'
UPDATE TB_BarcodeTag4
SET Barcode = STUFF(Barcode, LEN(Barcode),1, 'B')
WHERE Barcode LIKE '%-%-%&' -- after the previous UPDATE the value no longer matches 'Z%B%S%'

I am confident there is a much more beautiful way to do this but just wanted to give back to the forum that has given so much to me. Hope it helps.
This produces 4 columns.
1. Zone
2. Bay
3. Shelf
4. ZoneBayShelf (the last column is just a concatenation of the previous three)
It uses a combination of the SUBSTR and LOCATE functions to find the points at which to cut the barcode. (These are MySQL-style functions; on SQL Server the equivalents are SUBSTRING and CHARINDEX.)
CONCAT just brings it all together.
SELECT
SUBSTR(BARCODE, 2, LOCATE('B', BARCODE) - 2) AS ZONE,
SUBSTR(BARCODE, LOCATE('B', BARCODE) + 1,
LOCATE('S', BARCODE) - LOCATE('B', BARCODE) - 1) AS BAY,
SUBSTR(BARCODE, LOCATE('S', BARCODE) + 1) AS SHELF,
CONCAT( SUBSTR(BARCODE, 2, LOCATE('B', BARCODE) - 2),
'-', SUBSTR(BARCODE, LOCATE('B', BARCODE) + 1,
LOCATE('S', BARCODE) - LOCATE('B', BARCODE) - 1),
'-', SUBSTR(BARCODE, LOCATE('S', BARCODE) + 1)
) AS ZoneBayShelf
FROM your_table -- placeholder for the actual table name

It might look a bit messy, but I think the following code should help you:
First select the parts of string we are interested in:
declare @T1 table (barcode varchar(max))
insert into @T1 values('Z1B1S1A'),('Z10B10S10B'),('Z99B99S99C')
select substring(barcode,2,charindex('B',barcode)-2),
       substring(barcode,charindex('B',barcode)+1,
                 charindex('S',Barcode)-charindex('B',Barcode)-1),
       substring(barcode,charindex('S',barcode)+1,len(barcode))
from @T1
Now we can format as required:
select right('00'+ substring(barcode,2,charindex('B',barcode)-2),2)
       +'-'+
       right('00' +
             substring(barcode,charindex('B',barcode)+1,
                       charindex('S',Barcode)-charindex('B',Barcode)-1),2)
       +'-'+
       right('000' +
             substring(barcode,charindex('S',barcode)+1,
                       len(barcode)),3)
from @T1
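The repeated CHARINDEX calls can be computed once per row. This is only a sketch of one way to do that, assuming the same three sample barcodes; the aliases p, b_pos, s_pos and formatted are names introduced here for illustration, and the approach still assumes every value has the Z..B..S.. shape.

```sql
-- Same sample values as above, in a table variable.
DECLARE @T1 TABLE (barcode varchar(20));
INSERT INTO @T1 VALUES ('Z1B1S1A'), ('Z10B10S10B'), ('Z99B99S99C');

SELECT t.barcode,
       -- zone: between 'Z' (position 1) and the first 'B'
       RIGHT('00' + SUBSTRING(t.barcode, 2, p.b_pos - 2), 2) + '-' +
       -- bay: between the first 'B' and the 'S'
       RIGHT('00' + SUBSTRING(t.barcode, p.b_pos + 1, p.s_pos - p.b_pos - 1), 2) + '-' +
       -- shelf: everything after the 'S', padded to 3 characters
       RIGHT('000' + SUBSTRING(t.barcode, p.s_pos + 1, LEN(t.barcode)), 3) AS formatted
FROM @T1 AS t
CROSS APPLY (SELECT CHARINDEX('B', t.barcode) AS b_pos,   -- first 'B' is the bay marker
                    CHARINDEX('S', t.barcode) AS s_pos) AS p;
```

Because CHARINDEX returns the first match, a trailing shelf letter 'B' is never mistaken for the bay marker.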

SQL - Extracting a substring between two characters

I am trying to pull out the name of a city from a long string in my database. Here is an example of what the data looks like for a few different locations.
"701 MONROE STREET NW RUSSELLVILLE, AL 35653 (34.514971, -87.736372)"
"1825 HOLTVILLE ROAD WETUMPKA, AL 36092 (32.558544, -86.221265)"
I want to create a column for just the name of the city. My thought was to take everything left of the first comma and right of the last space before it. I have tried a few different ways to pull this but think I might be missing something.
SELECT left(Location, CHARINDEX(',', location)) as city FROM table
This is returning everything left of the first comma.
"701 MONROE STREET NW RUSSELLVILLE,
"1825 HOLTVILLE ROAD WETUMPKA,
But now I want to return everything left of the comma and everything Right of the last space in this string and I am stumped as to how I would pull that information correctly. Any help would be appreciated.
Thanks,
Pat

Using REVERSE could work with something along the lines of:
SELECT reverse(
         left(
           reverse(left(Location, CHARINDEX(',', location) - 1)),
           -- position of the last space before the comma; -1 drops the space itself
           CHARINDEX(' ', reverse(left(Location, CHARINDEX(',', location) - 1))) - 1
         )
       ) as city
FROM table;
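A shorter variant computes the text before the comma once and then takes the last word with RIGHT. This is a sketch under assumptions: the table variable @addr and the alias before_comma are made up for illustration, every row is assumed to contain a comma and at least one space before it, and a multi-word city name would be reduced to its last word.

```sql
-- Hypothetical table variable holding the two sample rows from the question.
DECLARE @addr TABLE (Location varchar(250));
INSERT INTO @addr (Location)
VALUES ('701 MONROE STREET NW RUSSELLVILLE, AL 35653 (34.514971, -87.736372)'),
       ('1825 HOLTVILLE ROAD WETUMPKA, AL 36092 (32.558544, -86.221265)');

SELECT RIGHT(b.before_comma,
             -- length of the last word = position of first space in the reversed text, minus 1
             CHARINDEX(' ', REVERSE(b.before_comma)) - 1) AS city
FROM @addr AS a
CROSS APPLY (SELECT LEFT(a.Location, CHARINDEX(',', a.Location) - 1) AS before_comma) AS b;
```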

If the Google API mentioned in my comment above is not an option, you can download (or even purchase) a ZIP Code database. The cost is nominal. I would suggest the quarterly updates because ZIP Codes change over time (add/edit/delete).
Example
Declare @YourTable table (id int,addr varchar(250))
Insert Into @YourTable values
(1,'701 MONROE STREET NW RUSSELLVILLE, AL 35653 (34.514971, -87.736372)'),
(2,'1825 HOLTVILLE ROAD WETUMPKA, AL 36092 (32.558544, -86.221265)')
Select A.ID
,StreetAddress =left(addr,nullif(charindex(Z.CityName,addr),0)-1)
,Z.CityName
,Z.StateCode
,Z.ZIPCode
From @YourTable A
Join [dbo].[OD-Zip] Z
on Z.ZipCode = substring(addr,nullif(patindex('%[0-9][0-9][0-9][0-9][0-9]%',addr),0),5)
and charindex(Z.CityName,addr)>0
and Z.ZipType='S'
and Z.CityType='D'
Returns
ID  StreetAddress         CityName      StateCode  ZIPCode
1   701 MONROE STREET NW  Russellville  AL         35653
2   1825 HOLTVILLE ROAD   Wetumpka      AL         36092

Retrieving right two words in char DB2 field

Using SQL, how can I retrieve the 2 words from the right end of a CHAR(30) field?
namefield = "My name is Bill Smith"
results = Bill Smith

This is untested, but maybe something like:
SELECT REVERSE(SUBSTRING(REVERSE(namefield) , 0, CHARINDEX(' ', REVERSE(namefield), CHARINDEX(' ', REVERSE(namefield), 0)+1))) FROM TABLE
Replace table with your table. Let me know if it works!

If you use LOCATE_IN_STRING you can pass a negative start position to search backwards, as opposed to CHARINDEX, which only looks forward.
select
-- locate_in_string(str,' ',-1),
-- substr(str,1,locate_in_string(str,' ',-1)-1),
-- length(str) - locate_in_string(str,' ',-1),
-- locate_in_string(str,' ',-7),
-- locate_in_string(str,' ',(-1* (length(str) - locate_in_string(str,' ',-1))) -2 ),
-- substr(str,1,11) || '<-',
-- substr(str,1,locate_in_string(str,' ',(-1* (length(str) - locate_in_string(str,' ',-1))) -2 )-1),
substr(str,locate_in_string(str,' ',(-1* (length(str) - locate_in_string(str,' ',-1))) -2 ))
FROM (
VALUES('My name is Bill Smith')
) AS T(str)
Here I start at the point of the last space and search for the prior space and then pass that to substr.
https://www.ibm.com/support/knowledgecenter/en/SSEPGG_9.7.0/com.ibm.db2.luw.sql.ref.doc/doc/r0054098.html
I've included my testing code above. You can see how I test various parts by removing the comment marker from that line. This technique may prove useful in your testing.

Using Upper to Capitalize the first letter of City name

I am doing some data clean-up and need to capitalize the first letter of city names. How do I capitalize the second word in a city like Terra Bella?
SELECT UPPER(LEFT([MAIL CITY],1))+
LOWER(SUBSTRING([MAIL CITY],2,LEN([MAILCITY])))
FROM masterfeelisting
My result is 'Terra bella' and I need 'Terra Bella'. Thanks in advance.

Ok, I know I answered this before, but it bugged me that we couldn't write something efficient to handle an unknown amount of 'text segments'.
So re-thinking it and researching, I discovered a way to change the [MAILCITY] field into XML nodes where each 'text segment' is assigned its own node within the xml field. Then those xml fields can be processed node by node, concatenated together, and then changed back to a SQL varchar. It's convoluted, but it works. :)
Here's the code:
CREATE TABLE
#masterfeelisting (
[MAILCITY] varchar(max) not null
);
INSERT INTO #masterfeelisting VALUES
('terra bellA')
,(' terrA novA ')
,('chicagO ')
,('bostoN')
,('porT dE sanTo')
,(' porT dE sanTo pallo ');
SELECT
RTRIM
(
(SELECT
UPPER([xmlField].[xmlNode].value('.', 'char(1)')) +
LOWER(STUFF([xmlField].[xmlNode].value('.', 'varchar(max)'), 1, 1, '')) + ' '
FROM [xmlNodeRecordSet].[nodeField].nodes('/N') as [xmlField]([xmlNode]) FOR
xml path(''), type
).value('.', 'varchar(max)')
) as [MAILCITY]
FROM
(SELECT
CAST('<N>' + REPLACE([MAILCITY],' ','</N><N>')+'</N>' as xml) as [nodeField]
FROM #masterfeelisting
) as [xmlNodeRecordSet];
Drop table #masterfeelisting;
First I create a table and fill it with dummy values.
Now here is the beauty of the code:
For each record in #masterfeelisting, we are going to create an xml field with a node for each 'text segment'.
ie. '<N></N><N>terrA</N><N>novA</N><N></N>'
(This is built from the varchar ' terrA novA ')
1) The way this is done is by using the REPLACE function.
The string starts with a '<N>' to designate the beginning of the node. Then:
REPLACE([MAILCITY],' ','</N><N>')
This effectively goes through the whole [MAILCITY] string and replaces each
' ' with '</N><N>'
and then the string ends with a '</N>'. Where '</N>' designates the end of each node.
So now we have a beautiful XML string with a couple of empty nodes and the 'text segments' nicely nestled in their own node. All the 'spaces' have been removed.
2) Then we have to CAST the string into xml. And we will name that field [nodeField]. Now we can use xml functions on our newly created record set. (Conveniently named [xmlNodeRecordSet].)
3) Now we can read the [xmlNodeRecordSet] into the main sub-Select by stating:
FROM [xmlNodeRecordSet].[nodeField].nodes('/N')
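The XQuery path '/N' tells nodes() to return one row for each top-level <N> element in [nodeField].
These rows are then named and concatenated by stating:
as [xmlField]([xmlNode]) FOR xml path(''), type
Here [xmlField] is the table alias and [xmlNode] the column alias for the rows returned by nodes(), and FOR xml path(''), type concatenates the per-node results into a single xml value.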
4) So in the main sub-select:
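Each blank node '<N></N>' is discarded. (STUFF returns NULL when it is given an empty string, so the whole concatenation for that node becomes NULL and contributes nothing to the output.)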
Each node with a 'text segment' in it will be parsed. ie <N>terrA</N>
UPPER([xmlField].[xmlNode].value('.', 'char(1)')) +
This code takes the contents of each node ('.') and keeps only its first character ('char(1)'), then upper-cases that character. The plus sign at the end concatenates this letter with the next bit of code:
LOWER(STUFF([xmlField].[xmlNode].value('.', 'varchar(max)'), 1, 1, ''))
Now here is the beauty... STUFF is a function that will take a string, from a position, for a length, and substitute another string.
STUFF(string, start position, length, replacement string)
So our string is:
[xmlField].[xmlNode].value('.', 'varchar(max)')
Which grabs the whole string inside the current node since it is 'varchar(max)'.
The start position is 1. The length is 1. And the replacement string is ''. This effectively strips off the first character by replacing it with nothing. So the remaining string is all the other characters that we want to have lower case. So that's what we do... we use LOWER to make them all lower case. And this result is concatenated to our first letter that we already upper cased.
But wait... we are not done yet... we still have to append a + ' '. Which adds a blank space after our nicely capitalized 'text segment'. Just in case there is another 'text segment' after this node is done.
This main sub-Select will now parse each node in our [xmlField] and concatenate them all nicely together.
5) But now that we have one big happy concatenation, we still have to change it back from an xml field to a SQL varchar field. So after the main sub-select we need:
.value('.', 'varchar(max)')
This changes our [MAILCITY] back to a SQL varchar.
6) But hold on... we still are not done. Remember we put an extra space at the end of each 'text segment'? Well, the last 'text segment' still has that extra space after it. So we need to right-trim that space off by using RTRIM.
7) And don't forget to rename the final field back to [MAILCITY].
8) And that's it. This code will take an unknown number of 'text segments' and format each one of them, all using the fun of XML and its node parsers.
Hope that helps :)
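The individual steps described above can be tried in isolation. This is a small sketch, not part of the original answer: the variable @segment is hypothetical, and LEFT is used here in place of the value('.', 'char(1)') call to pick the first character of a plain string.

```sql
-- Steps 1-2: wrap each space-separated segment in its own <N> element and cast to xml.
SELECT CAST('<N>' + REPLACE(' terrA novA ', ' ', '</N><N>') + '</N>' AS xml) AS nodeField;

-- Step 4: capitalize a single segment the way the query does for each node.
DECLARE @segment varchar(20) = 'terrA';   -- hypothetical single segment
SELECT UPPER(LEFT(@segment, 1))              -- first character, upper-cased
     + LOWER(STUFF(@segment, 1, 1, ''))      -- remaining characters, lower-cased
       AS capitalized;                       -- 'Terra'

-- STUFF on an empty string yields NULL rather than '', which is how blank nodes drop out.
SELECT STUFF('', 1, 1, '') AS stuffed_empty;
```

One limitation worth knowing: the CAST to xml fails if a city name contains characters such as & or <, so such values would need to be escaped first.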

Here's one way to handle this using APPLY. Note that this solution supports up to 3 substrings (e.g. "Phoenix", "New York", "New York City") but can easily be updated to handle more.
DECLARE @string varchar(100) = 'nEW yoRk ciTY';
WITH DELIMCOUNT(String, DC) AS
(
    SELECT @string, LEN(RTRIM(LTRIM(@string)))-LEN(REPLACE(RTRIM(LTRIM(@string)),' ',''))
),
CIPOS AS
(
    SELECT *
    FROM DELIMCOUNT
    CROSS APPLY (SELECT CHARINDEX(char(32), string, 1)) CI1(CI1)
    CROSS APPLY (SELECT CHARINDEX(char(32), string, CI1.CI1+1)) CI2(CI2)
)
SELECT
    OldString = @string,
    NewString =
    CASE DC
        WHEN 0 THEN UPPER(SUBSTRING(string,1,1))+LOWER(SUBSTRING(string,2,8000))
        WHEN 1 THEN UPPER(SUBSTRING(string,1,1))+LOWER(SUBSTRING(string,2,CI1-1)) +
                    UPPER(SUBSTRING(string,CI1+1,1))+LOWER(SUBSTRING(string,CI1+2,100))
        WHEN 2 THEN UPPER(SUBSTRING(string,1,1))+LOWER(SUBSTRING(string,2,CI1-1)) +
                    UPPER(SUBSTRING(string,CI1+1,1))+LOWER(SUBSTRING(string,CI1+2,CI2-(CI1+1))) +
                    UPPER(SUBSTRING(string,CI2+1,1))+LOWER(SUBSTRING(string,CI2+2,100))
    END
FROM CIPOS;
Results:
OldString NewString
--------------- --------------
nEW yoRk ciTY New York City

This will only capitalize the first letter of the second word. It is a shorter but less flexible approach. Replace @str with [MAIL CITY].
DECLARE @str AS VARCHAR(50) = 'Los angelas'
SELECT STUFF(@str, CHARINDEX(' ', @str) + 1, 1, UPPER(SUBSTRING(@str, CHARINDEX(' ', @str) + 1, 1)));

This is a way to use embedded Selects for three city name parts.
It uses CHARINDEX to find the location of your separator character (i.e. a space).
I put an 'if' structure around the Select to test whether you have any records with more than 3 parts to the city name. If you ever get the warning message, you could add another sub-Select to handle another city part.
Although... just to be clear... SQL is not the best language to do complicated formatting. It was written as a data retrieval engine with the idea that another program will take that data and massage it into a friendlier look and feel. It may be easier to handle the formatting in another program. But if you insist on using SQL and you need to account for city names with 5 or more parts... you may want to consider using Cursors so you can loop through the variable possibilities. (But Cursors are not a good habit to get into. So don't do that unless you've exhausted all other options.)
Anyway, the following code creates and populates a table so you can test the code and see how it works. Enjoy!
CREATE TABLE
#masterfeelisting (
[MAILCITY] varchar(30) not null
);
Insert into #masterfeelisting select 'terra bella';
Insert into #masterfeelisting select ' terrA novA ';
Insert into #masterfeelisting select 'chicagO ';
Insert into #masterfeelisting select 'bostoN';
Insert into #masterfeelisting select 'porT dE sanTo';
--Insert into #masterfeelisting select ' porT dE sanTo pallo ';
Declare @intSpaceCount as integer;
SELECT @intSpaceCount = max (len(RTRIM(LTRIM([MAILCITY]))) - len(replace([MAILCITY],' ',''))) FROM #masterfeelisting;
if @intSpaceCount > 2
SELECT 'You need to account for more than 3 city name parts ' as Warning, @intSpaceCount as SpacesFound;
else
SELECT
cThird.[MAILCITY1] + cThird.[MAILCITY2] + cThird.[MAILCITY3] as [MAILCITY]
FROM
(SELECT
bSecond.[MAILCITY1] as [MAILCITY1]
,SUBSTRING(bSecond.[MAILCITY2],1,bSecond.[intCol2]) as [MAILCITY2]
,UPPER(SUBSTRING(bSecond.[MAILCITY2],bSecond.[intCol2] + 1, 1)) +
SUBSTRING(bSecond.[MAILCITY2],bSecond.[intCol2] + 2,LEN(bSecond.[MAILCITY2]) - bSecond.[intCol2]) as [MAILCITY3]
FROM
(SELECT
SUBSTRING(aFirst.[MAILCITY],1,aFirst.[intCol1]) as [MAILCITY1]
,UPPER(SUBSTRING(aFirst.[MAILCITY],aFirst.[intCol1] + 1, 1)) +
SUBSTRING(aFirst.[MAILCITY],aFirst.[intCol1] + 2,LEN(aFirst.[MAILCITY]) - aFirst.[intCol1]) as [MAILCITY2]
,CHARINDEX ( ' ', SUBSTRING(aFirst.[MAILCITY],aFirst.[intCol1] + 1, LEN(aFirst.[MAILCITY]) - aFirst.[intCol1]) ) as intCol2
FROM
(SELECT
UPPER (LEFT(RTRIM(LTRIM(mstr.[MAILCITY])),1)) +
LOWER(SUBSTRING(RTRIM(LTRIM(mstr.[MAILCITY])),2,LEN(RTRIM(LTRIM(mstr.[MAILCITY])))-1)) as [MAILCITY]
,CHARINDEX ( ' ', RTRIM(LTRIM(mstr.[MAILCITY]))) as intCol1
FROM
#masterfeelisting as mstr -- Initial Master Table
) as aFirst -- First Select Shell
) as bSecond -- Second Select Shell
) as cThird; -- Third Select Shell
Drop table #masterfeelisting;

Unexpected execution in an update query in SQL

I am getting an 'Unexpected' result with an update query in SQL Server 2012.
This is what I am trying to do.
From a column (IDENTIFIER) composed of an ID, a comma, and a name (e.g. 258967,Sarah Jones), I have to fill two other columns: ID and SELLER_NAME.
Some values in the original column have a blank at the end and the rest do not:
'258967,Sarah Jones'
'98745,Richard James '
This is the update query that I am executing:
UPDATE SELLER
SET
IDENTIFIER = LTRIM(RTRIM(IDENTIFIER)),
ID = Left(IDENTIFIER , charindex(',', IDENTIFIER )-1),
SELLER_NAME = UPPER(RIGHT((IDENTIFIER ),LEN(IDENTIFIER )-CHARINDEX(',',IDENTIFIER )));
But I am having a wrong result at the end
258967,Sarah Jones 258967 SARAH JONES
98745,Richard James 98745 ICHARD JAMES
The same happens with all the names that have a blank at the end. At this point I wonder: if I have specified, as the first action, that I want to remove all blanks at the beginning and end of the IDENTIFIER value, why does the system update ID and SELLER_NAME before performing this action?
Just to specify: the IDENTIFIER column is part of the SELLER table, which is updated by another person who imports the data from an Excel file. I receive these values and I have to normalize the information. I can only read the SELLER table; please take this into account before answering.

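Try this. Because there is a space on the right side of the name, LEN (which ignores trailing spaces) returns one less than the number of characters RIGHT counts, so the first character of the name is cut off. You just need RTRIM(IDENTIFIER) inside RIGHT, and that's it.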
SELLER_NAME = UPPER(RIGHT((RTRIM(IDENTIFIER)),LEN(IDENTIFIER )-CHARINDEX(',',IDENTIFIER)));

The design of your tables violates 1NF and is nothing but painful. Instead of doing all this crazy string manipulation you could leverage PARSENAME here quite easily.
with Something(SomeValue) as
(
select '258967,Sarah Jones' union all
select '98745,Richard James '
)
select *
, ltrim(rtrim(PARSENAME(replace(SomeValue, ',', '.'), 2)))
, ltrim(rtrim(PARSENAME(replace(SomeValue, ',', '.'), 1)))
from Something

Instead of using Right(), use SubString().
Here's an example. I've tried to show each step individually to illustrate
; WITH x (identifier) AS (
SELECT '258967,Sarah Jones'
UNION ALL
SELECT '98745,Richard James '
)
SELECT identifier
, CharIndex(',', identifier) As comma
, SubString(identifier, CharIndex(',', identifier) + 1, 1000) As name_only
, LTrim(RTrim(SubString(identifier, CharIndex(',', identifier) + 1, 1000))) As trimmed_name_only
FROM x
Note that the 1000 used should be the maximum length of the column definition or higher, e.g. if your IDENTIFIER column is a varchar(2000) then use 2000 instead.

Try trimming the IDENTIFIER first, like this:
SELLER_NAME = UPPER(RIGHT(RTRIM(IDENTIFIER), LEN(IDENTIFIER) - CHARINDEX(',', IDENTIFIER)));
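On the question of why the trim in the first assignment does not carry over to the others: within a single UPDATE, every expression on the right-hand side of SET reads the column values as they were before the statement ran, so ID and SELLER_NAME still see the untrimmed IDENTIFIER. A minimal sketch, using a hypothetical table variable @SELLER in place of the real table, with each expression doing its own trimming:

```sql
-- Hypothetical stand-in for the SELLER table, with the two sample values.
DECLARE @SELLER TABLE (IDENTIFIER varchar(100), ID varchar(20), SELLER_NAME varchar(100));
INSERT INTO @SELLER (IDENTIFIER)
VALUES ('258967,Sarah Jones'), ('98745,Richard James ');

UPDATE @SELLER
SET IDENTIFIER  = LTRIM(RTRIM(IDENTIFIER)),
    -- Both expressions below still see the original, untrimmed IDENTIFIER.
    ID          = LEFT(IDENTIFIER, CHARINDEX(',', IDENTIFIER) - 1),
    SELLER_NAME = UPPER(LTRIM(RTRIM(
                      SUBSTRING(IDENTIFIER, CHARINDEX(',', IDENTIFIER) + 1, LEN(IDENTIFIER)))));

SELECT IDENTIFIER, ID, SELLER_NAME FROM @SELLER;
```

Using SUBSTRING from the comma onward avoids the mismatch between LEN, which ignores trailing spaces, and RIGHT, which counts them.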
