remove all words in front of a consistent known sub string - sql

I am trying to remove all words in front of a consistent, known substring ("To Find a"). I would like to remove everything in front of "To Find a" in bulk across 600 Description strings. The words in front of this substring are different in all cases. For example (Description 'Some Text, Some More Text…To Find a… Some More Text'). I have read several other posts and have tried using TRIM, CHARINDEX, and SUBSTRING_INDEX.
Thanks for the help!

If this is SQL Server, a relatively easy way to remove the leading bit would be with the help of SUBSTRING and CHARINDEX:
SELECT SUBSTRING(ColumnName, CHARINDEX('To Find a', ColumnName), 2147483647)
FROM YourTable;
The CHARINDEX function finds the position of the substring, and the result is used as SUBSTRING's second argument. The length argument is set to the maximum int value to make sure all the remaining characters to the end of the string are returned. (You don't need to calculate the exact number.) If the substring isn't found, CHARINDEX returns 0. In this context, 0 as the starting position causes the entire string value to be returned.
If you actually want to do the opposite, i.e. keep the leading text and remove the rest, as one of your comments seems to imply, you could try using CHARINDEX and LEFT in this way:
SELECT LEFT(ColumnName, CHARINDEX('To Find a', ColumnName + 'To Find a') - 1)
FROM YourTable;
Again, CHARINDEX returns the position of 'To Find a' in the column value. After subtracting 1, that becomes the length argument of LEFT. To make sure CHARINDEX does find the search term, the term is appended to the value being searched: if the original value doesn't have 'To Find a', CHARINDEX hits the appended bit and returns the position after the last character of the original string, which, when subtracted, becomes the string's exact length.
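To see how both queries behave when the marker is present and when it is missing, here is a small sketch you could run. The table variable @Samples and its Description column are made-up names for illustration, not something from the question.

```sql
-- Hypothetical sample data; @Samples and Description are illustrative names.
DECLARE @Samples TABLE (Description VARCHAR(255));

INSERT INTO @Samples (Description)
VALUES ('Some Text, Some More Text To Find a Some More Text'),
       ('No marker in this row');

SELECT
    Description,
    -- Drops everything before the marker; rows without it come back unchanged.
    SUBSTRING(Description, CHARINDEX('To Find a', Description), 2147483647) AS FromMarker,
    -- Keeps only the text before the marker; rows without it come back unchanged.
    LEFT(Description, CHARINDEX('To Find a', Description + 'To Find a') - 1) AS BeforeMarker
FROM @Samples;
```

For the first row, FromMarker should be 'To Find a Some More Text' and BeforeMarker should be 'Some Text, Some More Text ' (with the trailing space). For the second row both columns should simply repeat the original value.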

In SQL Server, to select the leading text:
DECLARE @String AS VARCHAR(255) = 'Some Text, Some More Text…To Find a… Some More Text'
SELECT LEFT(@String,CHARINDEX('To Find a',@String)-1)
(Assuming the string is consistently present, as stated in the question.)
To remove the leading text:
DECLARE @String AS VARCHAR(255) = 'Some Text, Some More Text…To Find a… Some More Text'
SELECT RIGHT(@String,CHARINDEX(REVERSE('To Find a'),REVERSE(@String))-1)
This returns only the text after 'To Find a'. If you want to keep the 'To Find a' in the result, change the -1 near the end of the query to + LEN('To Find a') - 1.
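As a rough sketch of that adjustment, the marker's own length can be added to the RIGHT() length. The @Marker variable is just a helper introduced here for readability, and the sample string uses plain spaces instead of the ellipses.

```sql
DECLARE @String AS VARCHAR(255) = 'Some Text, Some More Text To Find a Some More Text';
DECLARE @Marker AS VARCHAR(50)  = 'To Find a';  -- hypothetical helper variable

-- RIGHT() length = number of characters after the marker + the marker's own length.
SELECT RIGHT(@String,
             CHARINDEX(REVERSE(@Marker), REVERSE(@String)) - 1 + LEN(@Marker)) AS FromMarker;
```

This should return 'To Find a Some More Text'. One thing to keep in mind: because the search runs on the reversed string, it locates the last occurrence of the marker, whereas a plain CHARINDEX locates the first. That only matters if the marker can appear more than once in a value.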
Update:
If 'To Find a' isn't in every string, and using your table:
SELECT CASE WHEN CHARINDEX('To Find a',YourField) > 0
THEN LEFT(YourField,CHARINDEX('To Find a',YourField)-1)
ELSE YourField
END AS 'FixedField'
FROM YourTable

Related

SQL Server pull out only data after = OR only the numerics

It seems that a regular expression would be ideal, yet some team members are not fond of regex...
Problem: Data in a column (from a mainframe flat file import) comes in 2 different forms
BreakID = 83823737237
OR
MFR BreakID=482883
Thus, the differences are: there may be a space before the numerics, the length of the alpha characters before the equals sign varies, and the length of the numbers varies.
Seems I have a few approaches,
1. Everything after the = sign , and trim ?
2. regex , get only the numerics?
So I found this code, in which I assume PATINDEX is the standard way of doing regex in T-SQL? What is "string" in this?
SELECT SUBSTRING(string, PATINDEX('%[0-9]%', string), PATINDEX('%[0-9][^0-9]%', string + 't') - PATINDEX('%[0-9]%', string) + 1) AS Number
How would this be solved with best practices?
Slightly different answer than scsimon. I usually go this route when I have to grab the vals at the end of a string. You reverse the string and grab position of the first instance of your key value ('=' in this case). Get that position with charindex, and then grab the RIGHT() chars using that charindex value.
DECLARE @val1 VARCHAR(100) = 'BreakID = 83823737237'
DECLARE @val2 VARCHAR(100) = 'MFR BreakID=482883'
SELECT
LTRIM(RTRIM(RIGHT(@val1, CHARINDEX('=', REVERSE(@val1), 0)-1)))
,LTRIM(RTRIM(RIGHT(@val2, CHARINDEX('=', REVERSE(@val2), 0)-1)))
This solution will play nice if you have weird cases, like if you have a company called SQL=Cool in your data and it needs an ID:
'SQL=CoolID = 12345'
and you wanted to still get 12345.
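A small sketch of that edge case, comparing the reverse search with a plain forward search (the column aliases are made up for illustration):

```sql
DECLARE @val VARCHAR(100) = 'SQL=CoolID = 12345';

SELECT
    -- Searching the reversed string finds the last '=', so only the ID is kept.
    LTRIM(RTRIM(RIGHT(@val, CHARINDEX('=', REVERSE(@val)) - 1))) AS AfterLastEquals,
    -- Searching forward finds the first '=', which here belongs to the name.
    LTRIM(RTRIM(SUBSTRING(@val, CHARINDEX('=', @val) + 1, LEN(@val)))) AS AfterFirstEquals;
```

AfterLastEquals should come back as '12345', while AfterFirstEquals should come back as 'CoolID = 12345'.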
Seems like a good use case for substring and replace with charindex
We take the substring starting at the first character after the = and running for up to 99 characters (or however many you want to allow). We use REPLACE to get rid of the leading space, if there is one; note that it removes every space in the result, not only the leading one.
select replace(substring(stringColumn,charindex('=',stringColumn) + 1,99),' ','')
That solution is good and versatile, although it sounds like your string will always have an = so you could write something more specific around that if you want to.
That solution finds the start location of the first number string:
PATINDEX('%[0-9]%', string)
And finds the location of the first non-numeric character after that number string (adding a 't' to the end of the string, in case it ends in a number which would otherwise throw an error):
PATINDEX('%[0-9][^0-9]%', string + 't')
And finally it subtracts the start position of the number from the end position to find the length of the number string, and pulls that length out with substring:
SELECT SUBSTRING(string, PATINDEX('%[0-9]%', string), PATINDEX('%[0-9][^0-9]%', string + 't') - PATINDEX('%[0-9]%', string) + 1) AS Number
Here "string" is a placeholder that should be replaced with your column name. Also, the easiest way to test stuff like this in tsql is to use a variable:
DECLARE @string varchar(100) = 'foo bar la la la 83823737237'
SELECT SUBSTRING(@string, PATINDEX('%[0-9]%', @string), PATINDEX('%[0-9][^0-9]%', @string + 't') - PATINDEX('%[0-9]%', @string) + 1) AS Number
Output:
83823737237
Kaizen: go for the simple solution, not the perfect one
SELECT substring(c, charindex('=', c) + 1, 999) -- + 1 skips the = itself
I'm assuming the column you're putting this in is some kind of number. SQL Server doesn't care about leading spaces when casting a string to a number.
If it's going in a string column then wrap it in an ltrim()
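As a minimal sketch of the cast, assuming the sample value from the question: the @c variable stands in for the column, and BIGINT is my choice here because 83823737237 is too large for INT.

```sql
DECLARE @c VARCHAR(100) = 'BreakID = 83823737237';  -- stand-in for the column value

-- "+ 1" skips the '=' itself; the leading space is ignored by the cast.
SELECT CAST(SUBSTRING(@c, CHARINDEX('=', @c) + 1, 999) AS BIGINT) AS BreakID;
```

This should return 83823737237. If some rows might hold non-numeric text after the =, the cast will raise an error for them, so it is worth checking the data first.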
Now to your questions
1 .. trim
Sure, as above
2 regex...
Not implemented in SQL Server unless you use CLR
PATINDEX ...
It's like regex, but it's a very limited subset: it only does searching, only returns one string index, doesn't capture, and has limited character classes. It's more like DOS/VB6 wildcards or LIKE than regex.
...best practice?
Look at it simply: you're getting the part of a string after an =, not landing on the moon. The best solution to minor optimisations like these is the one that requires the least mental effort from the next human who takes over your job and has to get up to speed with it (it'll still be in use in 20 years) :)

Parsing a string and comparing values to existing column

I have the below table with the string marked "Remark" that needs to be parsed. The highlighted fares need to be compared from the columns TotalBookedFare and Remark. The only issue is that the value I need to compare under the Remark column is in the middle of a string. I've tried to parse the string but I cannot figure it out. I am using SQL Server 2008. As you can see the first row is not a match while the other three are matching.
Ideally I would like to convert the one string "Remark" to the 5 columns listed below so I can compare the TotalBookedFare to the "New" column.
I think this should work
select substring(
remark, --string base
charindex ('/', 'xyz/57.77usd/zyx') + 1,
--starting position is location one to the right of first instance of / character (5)
charindex ('u', 'xyz/57.77usd/zyx', charindex ('/', 'xyz/57.77usd/zyx')) - charindex ('/', 'xyz/57.77usd/zyx') - 1
--length is the location of the first instance of the u character
--starting from the location of first instance of the / character (10)
--then subtracted by the location of the first instance of the / character (4)
--and then an additional 1 resulting in the length of the string to be extracted (5)
)
The string I put in there is just a more concrete example, if you replace it with Remark, it should extract the substring for each row. You could even modify it with some copy/pasting to get each of those columns you were looking for.
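Here is a runnable sketch of the same idea using a variable in place of the Remark column. The @remark, @slash and @u names are made up for illustration, and the cast to DECIMAL(10, 2) is my assumption about how you might compare the result with TotalBookedFare.

```sql
DECLARE @remark VARCHAR(100) = 'xyz/57.77usd/zyx';     -- sample value from the answer
DECLARE @slash  INT = CHARINDEX('/', @remark);          -- position of the first '/'
DECLARE @u      INT = CHARINDEX('u', @remark, @slash);  -- first 'u' at or after that '/'

-- Take the characters strictly between the '/' and the 'u', then cast for comparison.
SELECT CAST(SUBSTRING(@remark, @slash + 1, @u - @slash - 1) AS DECIMAL(10, 2)) AS Fare;
```

With this sample value, @slash is 4, @u is 10, and the query should return 57.77. Against the real table you would replace @remark with Remark and put the same expression in the SELECT list or a WHERE clause.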

SQL: Finding dynamic length characters in a data string

I am not sure how to do this, but I have a string of data. I need to isolate a number out of the string that can vary in length. The original string also varies in length. Let me give you an example. Here is a set of the original data string:
:000000000:370765:P:000001359:::3SA70000SUPPL:3SA70000SUPPL:
:000000000:715186816:P:000001996:::H1009671:H1009671:
For these two examples, I need 3SA70000SUPPL from the first and H1009671 from the second. How would I do this using SQL? I have heard that case statements might work, but I don't see how. Please help.
This works in Oracle 11g:
with tbl as (
select ':000000000:370765:P:000001359:::3SA70000SUPPL:3SA70000SUPPL:' str from dual
union
select ':000000000:715186816:P:000001996:::H1009671:H1009671:' str from dual
)
select REGEXP_SUBSTR(str, '([^:]*)(:|$)', 1, 8, NULL, 1) data
from tbl;
Which can be described as "look at the 8th occurrence of zero or more non-colon characters that are followed by a colon or the end of the line, and return the 1st subgroup (which is the data less the colon or end of the line)".
From this post: REGEX to select nth value from a list, allowing for nulls
Sorry, just saw you are using DB2. I don't know if there is an equivalent regular expression function, but maybe it will still help.
For the fun of it: SQL Fiddle
In T-SQL, the first SUBSTRING takes everything after the ::: and the second SUBSTRING takes that remainder up to the next :
declare @x varchar(1024)=':000000000:715186816:P:000001996:::H1009671:H1009671:'
declare @temp varchar(1024)= SUBSTRING(@x,patindex('%:::%', @x)+3, len(@x))
SELECT SUBSTRING( @temp, 0,CHARINDEX(':', @temp, 0))
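The same extraction can also be sketched without the intermediate string, by using CHARINDEX's optional third argument to start the search after the ::: marker. This is T-SQL, tried here on the first sample row; @start is a hypothetical helper variable.

```sql
DECLARE @x VARCHAR(1024) = ':000000000:370765:P:000001359:::3SA70000SUPPL:3SA70000SUPPL:';
DECLARE @start INT = PATINDEX('%:::%', @x) + 3;  -- first character after ':::'

-- CHARINDEX's third argument begins the search for the closing ':' at @start.
SELECT SUBSTRING(@x, @start, CHARINDEX(':', @x, @start) - @start) AS Code;
```

For this row the query should return 3SA70000SUPPL. It assumes every value contains ::: followed by at least one more colon; rows that do not fit that shape would need a guard.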

Transact SQL replace part of string

Is it possible to delete part of string using regexp (or something else, may be something like CHARINDEX could help) in SQL query?
I use MS SQL Server (2008 most likely).
Example: I have strings like "[some useless info] Useful part of string". I want to delete the parts with text in brackets if they are present in the line.
Use REPLACE
for example :
UPDATE authors SET city = replace(city, 'To Remove', 'With BLACK or Whatever')
WHERE city LIKE 'Salt%'; -- with where condition
You can use the PATINDEX function. It's not a complete regular expression implementation, but you can use it for simple things.
PATINDEX (Transact-SQL): "Returns the starting position of the first occurrence of a pattern in a specified expression, or zeros if the pattern is not found, on all valid text and character data types."
OR You can use CLR to extend the SQL Server with a complete regular expression implementation.
SQL Server 2005: CLR Integration
SELECT * FROM temp where replace(replace(replace(url,'http://',''),'www.',''),'https://','')='"+url+"';
You can use STUFF to insert a string into another string. It deletes a specified length of characters in the first string at the start position and then inserts the second string into the first string at the start position.
For example, the code below replaces the 5 with 666666:
DECLARE @Variable NVARCHAR(MAX) = '12345678910'
SELECT STUFF(@Variable, 5, 1, '666666')
Note that the second argument is not a string; it is a position, and you can calculate that position using CHARINDEX, for example.
Here is your case:
DECLARE @Variable NVARCHAR(MAX) = '[some useless info] Useful part of string'
SELECT STUFF(
@Variable
,CHARINDEX('[', @Variable)
,LEN(SUBSTRING(@Variable, CHARINDEX('[', @Variable), CHARINDEX(']', @Variable) - LEN(SUBSTRING(@Variable, 0, CHARINDEX('[', @Variable)))))
,''
)
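The length calculation can probably be written more directly as the distance between the two brackets. The sketch below is one way to do that; @openPos and @closePos are made-up helper variables, and the CASE guard and LTRIM are my additions, not part of the answer above.

```sql
DECLARE @Variable NVARCHAR(MAX) = '[some useless info] Useful part of string';
DECLARE @openPos  INT = CHARINDEX('[', @Variable);
DECLARE @closePos INT = CHARINDEX(']', @Variable, @openPos);

SELECT CASE
           -- Only cut when both brackets exist and are in the right order.
           WHEN @openPos > 0 AND @closePos > @openPos
               THEN LTRIM(STUFF(@Variable, @openPos, @closePos - @openPos + 1, ''))
           ELSE @Variable
       END AS Cleaned;
```

For the sample value this should return 'Useful part of string'. Strings without a bracketed part are returned unchanged.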
In the end, REPLACE, SUBSTRING and PATINDEX helped.
REPLACE(t.badString, Substring(t.badString , Patindex('%[%' , t.badString)+1 , Patindex('%]%' , t.badString)), '').
Thanks to all.

Split string and replace

I have a varchar column of URLs with data that looks like this:
http://google.mews.......http://www.somesite.com
I want to get rid of the first http and keep the second one so the row above would result in:
http://www.somesite.com
I've tried using split() but can't get it to work.
Thanks
If you are trying to do this using T-SQL, you can try something along the lines of:
-- assume @v is the variable holding the URL
SELECT SUBSTRING(@v, PATINDEX('%_http://%', @v) + 1, LEN(@v))
PATINDEX here finds the first http:// that has at least one character before it (hence the '_' before it in the pattern), and the + 1 offset moves from that preceding character to the start of the http:// itself.
If the first URL always starts right from the beginning of the string, you can use SUBSTRING() & CHARINDEX():
SELECT SUBSTRING(column, CHARINDEX('http://', column, 2), LEN(column))
FROM table
CHARINDEX simply searches a string for a substring and returns the substring's starting position within the string. Its third argument is optional and, if set, specifies the search starting position. In this case it's 2, so the search doesn't hit the first http://.
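Both approaches can be tried side by side on the sample value from the question. This is only a quick sketch; the column aliases are made up.

```sql
DECLARE @v VARCHAR(200) = 'http://google.mews.......http://www.somesite.com';

SELECT
    -- '_' matches the character just before the second http://, so + 1 lands on the 'h'.
    SUBSTRING(@v, PATINDEX('%_http://%', @v) + 1, LEN(@v)) AS ViaPatindex,
    -- Starting the search at position 2 skips the http:// at the very beginning.
    SUBSTRING(@v, CHARINDEX('http://', @v, 2), LEN(@v)) AS ViaCharindex;
```

Both columns should return http://www.somesite.com. They differ when a value holds only one URL: the PATINDEX version returns the whole string, while the CHARINDEX version starts at position 0 and so drops the last character, which means single-URL rows would need a CASE guard there.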