SQL search for complex string within a row

OK, so I have a database row that contains a specific string, for example 'i am here'.
I want to know how I could match this row in a T-SQL query if my input were, for example, 'hello i am here in this bright room'.
To be clearer (and hopefully get a better answer), here is a rough example:
Table:
1 | i am there |
2 | i am here |
3 | i am not here |
Problem:
I have the input 'hello i am here in this bright room'. This should return a match for row 2 only, because only row 2 contains exactly 'i am here'; the other rows contain similar text but with subtle differences.
If anyone can help, it would be much appreciated. I would like to do this all in SQL so I can create a stored procedure for it.

DECLARE @InputString VARCHAR(100);
SET @InputString = 'hello i am here in this bright room';
SELECT *
FROM YourTable
WHERE CHARINDEX(YourColumn, @InputString) <> 0;

declare @input as varchar(100) -- without a length, DECLARE creates varchar(1) and keeps only 'h'
set @input = 'hello i am here in this bright room'
select *
from MyTable
where @input like '%' + MyCol + '%'
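A self-contained sketch of both ideas against the sample rows, using a hypothetical table variable @Phrases in place of YourTable/MyTable. The second query adds an assumption that neither answer makes: padding the stored phrase and the input with spaces so that a phrase only matches on word boundaries (otherwise a row such as 'am he' would also match). Unlike the LIKE version, CHARINDEX treats the stored text literally, so a %, _ or [ inside the column is not interpreted as a wildcard. For this input, both queries should return row 2 only.

```sql
-- Hypothetical stand-in for YourTable / MyTable
DECLARE @Phrases TABLE (Id int PRIMARY KEY, Phrase varchar(100));
INSERT INTO @Phrases (Id, Phrase) VALUES
    (1, 'i am there'),
    (2, 'i am here'),
    (3, 'i am not here');

DECLARE @Input varchar(100) = 'hello i am here in this bright room';

-- CHARINDEX version: a non-zero position means the stored phrase occurs in the input
SELECT Id, Phrase
FROM @Phrases
WHERE CHARINDEX(Phrase, @Input) > 0;

-- Whole-word variant (assumption): surround both sides with spaces
-- so a phrase cannot match in the middle of a word
SELECT Id, Phrase
FROM @Phrases
WHERE CHARINDEX(' ' + Phrase + ' ', ' ' + @Input + ' ') > 0;
```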

Related

How to use the SQL REPLACE Function, so that it will replace some text between a certain range, rather than one specific value

I have a table called Product and I am trying to replace some of the values in the Product ID column pictured below:
ProductID
PIDLL0000074853
PIDLL000086752
PIDLL00000084276
I am familiar with the REPLACE function and have used this like so:
SELECT REPLACE(ProductID, 'LL00000', '/') AS 'Product Code'
FROM Product
Which returns:
Product Code
PID/74853
PIDLL000086752
PID/084276
The ProductID will always contain the letter L twice (LL). However, the number of zeros after it ranges from 4 to 6. The LL and the zeros should be replaced with a /.
If anyone could suggest the best way to achieve this, it would be greatly appreciated. I'm using Microsoft SQL Server, so standard SQL syntax would be ideal.
Please try the following solution.
All credit goes to @JeroenMostert
SQL
-- DDL and sample data population, start
DECLARE @tbl TABLE (ID INT IDENTITY PRIMARY KEY, ProductID VARCHAR(50));
INSERT INTO @tbl (ProductID) VALUES
('PIDLL0000074853'),
('PIDLL000086752'),
('PIDLL00000084276'),
('PITLL0000084770');
-- DDL and sample data population, end
SELECT *
, CONCAT(LEFT(ProductID,3),'/', CONVERT(DECIMAL(38, 0), STUFF(ProductID, 1, 5, ''))) AS [After]
FROM @tbl;
Output
+----+------------------+-----------+
| ID | ProductID | After |
+----+------------------+-----------+
| 1 | PIDLL0000074853 | PID/74853 |
| 2 | PIDLL000086752 | PID/86752 |
| 3 | PIDLL00000084276 | PID/84276 |
| 4 | PITLL0000084770 | PIT/84770 |
+----+------------------+-----------+
This isn't particularly pretty in T-SQL, as it doesn't support regex or even pattern replacement. Therefore the method is to use functions like CHARINDEX and PATINDEX to find the start and end positions and then replace that part of the text (with STUFF, not REPLACE).
This uses CHARINDEX to find the 'LL', and then PATINDEX to find the first non-'0' character after that position. As PATINDEX doesn't support a start position, I use STUFF to remove the leading characters first.
Then, finally, we can use STUFF (again) to replace that run of characters with a single '/':
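SELECT V.ProductID, STUFF(V.ProductID,CI.I,2+Z.N,'/') AS ProductCode
FROM (VALUES('PIDLL0000074853'),
('PIDLL000086752'),
('PIDLL00000084276'),
('PIDLL3246954384276'))V(ProductID)
-- CI.I: position of the first 'L' of 'LL' (NULL when there is no 'LL')
CROSS APPLY(VALUES(NULLIF(CHARINDEX('LL',V.ProductID),0)))CI(I)
-- Z.N: number of zeros right after 'LL' (position of the first non-zero character, minus 1)
CROSS APPLY(VALUES(PATINDEX('%[^0]%',STUFF(V.ProductID,1,CI.I+1,''))-1))Z(N);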
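To make the positions easier to follow, here is the same CHARINDEX/PATINDEX/STUFF idea traced step by step for a single ID. It assumes the goal is the output shown in the question (the LL and the zeros after it replaced by one '/'); the variable names are illustrative only.

```sql
DECLARE @ProductID varchar(50) = 'PIDLL000086752';

-- Position of the first character of 'LL' (4 for this value)
DECLARE @LLPos int = CHARINDEX('LL', @ProductID);

-- Everything after 'LL'; STUFF stands in for the start position PATINDEX lacks
DECLARE @AfterLL varchar(50) = STUFF(@ProductID, 1, @LLPos + 1, '');  -- '000086752'

-- Leading zeros: one less than the position of the first non-zero character
DECLARE @Zeros int = PATINDEX('%[^0]%', @AfterLL) - 1;  -- 4

-- Replace 'LL' plus the zeros (2 + @Zeros characters) with a single '/'
SELECT STUFF(@ProductID, @LLPos, 2 + @Zeros, '/') AS ProductCode;  -- PID/86752
```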
If the value always starts with "PIDLL", you can just remove the "PIDLL", cast the rest to INT to drop the leading zeros, and then prepend "PID/". One line of code.
-- Sample Data
DECLARE @t TABLE (ProductID VARCHAR(40));
INSERT @t VALUES('PIDLL0000074853'),('PIDLL000086752'),('PIDLL00000084276');
-- Solution
SELECT t.ProductID, NewProdID = 'PID/'+LEFT(CAST(REPLACE(t.ProductID,'PIDLL','') AS INT),20)
FROM @t AS t;
Returns:
ProductID NewProdID
------------------ ----------------
PIDLL0000074853 PID/74853
PIDLL000086752 PID/86752
PIDLL00000084276 PID/84276
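The INT cast is fine for these sample IDs, but INT tops out at 2,147,483,647, so a longer numeric tail (for example the one in PIDLL3246954384276) raises an arithmetic overflow error. Below is a sketch under the same "always starts with PIDLL" assumption that borrows DECIMAL(38, 0) from the first answer and uses TRY_CAST (SQL Server 2012 or later), so that a non-numeric tail gives NULL instead of an error. The extra sample rows are made up for illustration.

```sql
DECLARE @t TABLE (ProductID VARCHAR(40));
INSERT @t VALUES ('PIDLL0000074853'), ('PIDLL3246954384276'), ('PIDLLABC');

SELECT t.ProductID,
       -- DECIMAL(38, 0) holds far longer digit runs than INT; TRY_CAST returns NULL
       -- (and therefore a NULL NewProdID) rather than failing on a non-numeric tail
       NewProdID = 'PID/' + CAST(TRY_CAST(REPLACE(t.ProductID, 'PIDLL', '') AS DECIMAL(38, 0)) AS VARCHAR(40))
FROM @t AS t;
```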

In SQL, how can I remove the first 3 characters on the left and everything on the right after a specific character?

In SQL, how can I remove (from what is displayed on my report, not deleting from the database) the first 3 characters (CN=) and everything from the comma that is followed by "OU", so that I am left with the first and last name in the same column? For example:
CN=Tom Chess,OU=records,DC=1234564786_data for testing, 1234567
CN=Jack Bauer,OU=records,DC=1234564786_data for testing, 1234567
CN=John Snow,OU=records,DC=1234564786_data for testing, 1234567
CN=Anna Rodriguez,OU=records,DC=1234564786_data for testing, 1234567
Desired display:
Tom Chess
Jack Bauer
John Snow
Anna Rodriguez
I tried playing with TRIM, but I don't know how to do it without specifying the position, and since first and last names have different lengths, I really don't know how to handle that.
Thank you in advance
Update: I wonder about an approach that uses LOCATE to find the position of the comma and then feeds that to a substring function. I'm not sure whether an approach like this would work, or how to put the syntax together. What do you think? Would it be a feasible approach?
You can try this one SUBSTRING(ColumnName, 4, CHARINDEX(',', ColumnName) - 4)
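A runnable version of this answer against the question's sample values (the @Names table variable and DN column are illustrative names). One caveat worth guarding against: if a value has no comma, CHARINDEX returns 0, the length becomes -4, and SUBSTRING raises an "Invalid length parameter" error, so this sketch wraps the expression in CASE and returns NULL for such rows instead.

```sql
DECLARE @Names TABLE (DN varchar(200));
INSERT INTO @Names (DN) VALUES
    ('CN=Tom Chess,OU=records,DC=1234564786_data for testing, 1234567'),
    ('CN=Jack Bauer,OU=records,DC=1234564786_data for testing, 1234567'),
    ('CN=John Snow,OU=records,DC=1234564786_data for testing, 1234567'),
    ('CN=Anna Rodriguez,OU=records,DC=1234564786_data for testing, 1234567'),
    ('malformed value without a comma');

SELECT DN,
       CASE
           -- Only cut when the value starts with CN= and has a comma after it;
           -- otherwise the CASE falls through to NULL
           WHEN DN LIKE 'CN=%' AND CHARINDEX(',', DN) > 4
               THEN SUBSTRING(DN, 4, CHARINDEX(',', DN) - 4)
       END AS DisplayName
FROM @Names;
```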
In Postgres, you could use split_part(), assuming no name contains a comma:
select substr(split_part(the_column, ',', 1), 4)
from ...
Db2 11.x for LUW:
with tab (str) as (values
' CN = Tom Chess , OU = records,DC=1234564786_data for testing, 1234567'
, 'CN=Jack Bauer,OU=records,DC=1234564786_data for testing, 1234567'
, 'CN=John Snow,OU=records,DC=1234564786_data for testing, 1234567'
, 'CN=Anna Rodriguez,OU=records,DC=1234564786_data for testing, 1234567'
)
select REGEXP_REPLACE(str, '^\s*CN\s*=\s*(.*)\s*,\s*OU\s*=.*', '\1')
from tab;
Note that this regex pattern allows an arbitrary number of spaces, as in the first record of the example above.
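In Oracle 11g or later, REGEXP_SUBSTR can return a capture group (its sixth argument, subexpr), which might work:
REGEXP_SUBSTR(COLUMN_NAME, '^CN=([^,]*)', 1, 1, NULL, 1)
Note that a bracket expression such as [^CN=] matches any single character other than C, N or =, so it does not strip the literal prefix "CN=": it would stop at the C in "Tom Chess".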
I think there has to be a loop to handle this. Here's a SQL Server function that will parse this out. (I know the question didn't specify SQL Server, but it's an example of how it can be done.)
select dbo.ScrubFieldValue(value) from table will return what you're looking for.
CREATE FUNCTION ScrubFieldValue
(
@Input varchar(8000)
)
RETURNS varchar(8000)
AS
BEGIN
DECLARE @retval varchar(8000)
DECLARE @charidx int
DECLARE @remaining varchar(8000)
DECLARE @current varchar(8000)
DECLARE @currentLength int
select @retval = ''
select @remaining = @Input
-- position of the next 'CN=' (searching from 2 skips the one at the start)
select @charidx = CHARINDEX('CN=', @remaining,2)
while(LEN(@remaining) > 0)
BEGIN
--strip current row from remaining
if (@charidx > 0)
BEGIN
select @current = SUBSTRING(@remaining, 1, @charidx - 1)
END
else
BEGIN
select @current = @remaining
END
select @currentLength = LEN(@current)
-- get current name: the text between 'CN=' and ',OU'
select @current = SUBSTRING(@current, 4, CHARINDEX(',OU', @current)-4)
select @retval = @retval + @current + ' '
-- strip off current from remaining
select @remaining = substring(@remaining,@currentLength + 1,
LEN(@remaining) - @currentLength)
select @charidx = CHARINDEX('CN=', @remaining,2)
END
-- RTRIM drops the separator space added after the last name
RETURN RTRIM(@retval)
END
On my version of Db2 for z/OS (V12R1), CHARINDEX throws a syntax error. Here are two ways to work around that:
SUBSTRING(ColumnName, 4, INSTR(ColumnName,',',1) - 4)
SUBSTRING(ColumnName, 4, LOCATE_IN_STRING(ColumnName,',') - 4)
If the input str is well-formed (i.e. it looks like your sample data, without any additional tokens such as spaces), you could use something like:
substr(str,locate('CN=', str)+length('CN='), locate(',', str)-length('CN=')-1)
If your Db2 version supports REGEXP functions, that's a better choice.

Needing to parse out data

I am trying to parse out certain data from a string and I am having issues.
Here is the string:
1=BETA.1.0^2=175^3=812^4=R^5=N^9=1^12=1^13=00032^14=REP NOT FOUND ON REP TABLE, CANNOT INSERT TO REPRGR.^10=107~117~265~1114~3143~3505~3506~3513~5717^11=SA16~1~WY~WY~A~S~20100210~001~SE62^-omitted due to existing Rep Not Found
I need to return this "REP NOT FOUND ON REP TABLE, CANNOT INSERT TO REPRGR."
Here is my query: SELECT CONVERT(VARCHAR(5000),CHARINDEX('14=',Column)) FROM Table
If you're parsing, can we assume that you don't know what might come after '^14=', but you need to capture whatever does? Searching for a particular string won't work, because anything could come after '^14='. The best approach is to identify the longest reliable string that gives you a "foothold" to find the data you're looking for. What you don't want is to accidentally capture the wrong data if '^14=' appears more than once in your string. It looks like '^' is your delimiter, since I don't see one at the start of the string. So you were on the right track; you just need to use SUBSTRING, as a commenter mentioned. You also need to identify a marker for the end of the error message, which looks like it might be the next '^', correct? Check several samples to be sure of this, and make sure the end marker always occurs after your start marker, or you'll get an error.
SELECT CAST((SUBSTRING(Column,CHARINDEX('14=',Column,0),CHARINDEX('^',Column,CHARINDEX('14=',Column,0) + 1) - CHARINDEX('14=',Column,0))) AS VARCHAR(5000)) FROM Table
As written, the result still begins with '14=', so you may need to adjust the start and end positions: adding 3 to the start position and subtracting 3 from the length returns just the message. This should dynamically grab an error message of any length, provided you are sure of your start and end markers.
I also have a table-valued parsing function: you pass it the string and the '^' delimiter, and it returns a table containing not only the 14= item but every item.
CREATE function [dbo].[fn_SplitStringByDelimeter]
(
@list nvarchar(4000) -- nvarchar(n) allows at most 4000 characters; nvarchar(8000) is rejected
,@splitOn char(1)
)
returns @rtnTable table
(
id int identity(1,1)
,value nvarchar(100)
)
as
begin
declare @index int
declare @string nvarchar(4000)
select @index = 1
if len(@list) < 1 or @list is null return
--
while @index!= 0
begin
-- position of the next delimiter (0 when none is left)
set @index = charindex(@splitOn,@list)
if @index!=0
set @string = left(@list,@index - 1)
else
set @string = @list
if(len(@string)>0)
insert into @rtnTable(value) values(@string)
-- drop the item just read, together with its delimiter
set @list = right(@list,len(@list) - @index)
if len(@list) = 0 break
end
return
end
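On SQL Server 2016 or later (with database compatibility level 130 or higher), the built-in STRING_SPLIT function can stand in for a hand-written splitter when you only need one key. This is a swapped-in technique, not the function above. STRING_SPLIT takes a single-character separator and does not guarantee output order, which doesn't matter here because the wanted item is picked by its '14=' prefix.

```sql
DECLARE @s nvarchar(4000) = N'1=BETA.1.0^2=175^3=812^4=R^5=N^9=1^12=1^13=00032^14=REP NOT FOUND ON REP TABLE, CANNOT INSERT TO REPRGR.^10=107~117~265~1114~3143~3505~3506~3513~5717^11=SA16~1~WY~WY~A~S~20100210~001~SE62^-omitted due to existing Rep Not Found';

-- Split on '^', keep the item whose key is 14, and drop the '14=' prefix
SELECT SUBSTRING(value, 4, 4000) AS ErrorMessage
FROM STRING_SPLIT(@s, N'^')
WHERE value LIKE N'14=%';
```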
It sounds like you're trying to get the value of argument 14. This should do it:
select substring(
someData
, charindex('^14=',someData) + 4
, charindex('^',someData, charindex('^14=',someData) + 4) - charindex('^14=',someData) - 4
) errorMessage
from myData
where charindex('^14=',someData) > 0
and charindex('^',someData, charindex('^14=',someData) + 4) > 0
Try it here: http://sqlfiddle.com/#!18/22f23/2
This gets a substring of the given input.
The substring starts at the first character after the string ^14=; i.e. we get the index of ^14= in the string, then add 4 to it to skip over the matched characters themselves.
The substring ends at the first ^ character after the one in ^14=. We get the index of that character, then subtract the starting position from it to get the length of the desired output.
Caveats: If there is no parameter (^) after ^14= this will not work. Equally if there is no ^14= (even if the string starts 14=) this will not work. From the information available that's OK; but if this is a concern please say and we can provide something to handle that more complex scenario.
Code to create table & populate demo data
create table myData (someData nvarchar(256))
insert myData (someData)
values ('1=BETA.1.0^2=175^3=812^4=R^5=N^9=1^12=1^13=00032^14=REP NOT FOUND ON REP TABLE, CANNOT INSERT TO REPRGR.^10=107~117~265~1114~3143~3505~3506~3513~5717^11=SA16~1~WY~WY~A~S~20100210~001~SE62^-omitted due to existing Rep Not Found')
, ('1xx^14=something else.^10=xx')
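The first caveat above can be lifted, as a sketch assuming '^' never appears inside a value, by appending a '^' sentinel to the string that is searched for the end marker: when 14 is the last parameter, the message then runs to the end of the string. NULLIF turns a missing '^14=' into a NULL start position, so such rows produce NULL rather than an invalid-length error and are filtered out. The @myData table variable and its rows are illustrative.

```sql
DECLARE @myData TABLE (someData nvarchar(256));
INSERT INTO @myData (someData) VALUES
    (N'1xx^14=something else.^10=xx'),
    (N'1xx^10=xx^14=last parameter'),
    (N'1xx^10=xx');

SELECT SUBSTRING(d.someData, p.StartPos,
                 CHARINDEX(N'^', d.someData + N'^', p.StartPos) - p.StartPos) AS errorMessage
FROM @myData AS d
-- StartPos: first character after '^14=' (NULL when the marker is missing)
CROSS APPLY (VALUES (NULLIF(CHARINDEX(N'^14=', d.someData), 0) + 4)) AS p(StartPos)
WHERE p.StartPos IS NOT NULL;
```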
You could try a CASE WHEN expression with wildcards to find the value that you want.
Example:
SELECT
CASE
WHEN x LIKE '%REP Not Found%'
THEN 'REP NOT FOUND ON REP TABLE, CANNOT INSERT TO REPRGR'
ELSE
''
END AS x
FROM
#T1
You could use this query (assuming a MySQL database):
-- item is the column that contains the string
select SUBSTR(item, LOCATE('REP',item), LOCATE('REPRGR.',item) + LENGTH('REPRGR.') - LOCATE('REP', item)) info_msg from Table;
Illustration:
create table parsetest (item varchar(5000));
insert into parsetest values('1=BETA.1.0^2=175^3=812^4=R^5=N^9=1^12=1^13=00032^14=REP NOT FOUND ON REP TABLE, CANNOT INSERT TO REPRGR.^10=107~117~265~1114~3143~3505~3506~3513~5717^11=SA16~1~WY~WY~A~S~20100210~001~SE62^-omitted due to existing Rep Not Found');
select * from parsetest;
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| item |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 1=BETA.1.0^2=175^3=812^4=R^5=N^9=1^12=1^13=00032^14=REP NOT FOUND ON REP TABLE, CANNOT INSERT TO REPRGR.^10=107~117~265~1114~3143~3505~3506~3513~5717^11=SA16~1~WY~WY~A~S~20100210~001~SE62^-omitted due to existing Rep Not Found |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
select SUBSTR(item, LOCATE('REP',item), LOCATE('REPRGR.',item) + LENGTH('REPRGR.') - LOCATE('REP', item)) info_msg from parsetest;
+------------------------------------------------------+
| info_msg |
+------------------------------------------------------+
| REP NOT FOUND ON REP TABLE, CANNOT INSERT TO REPRGR. |
+------------------------------------------------------+

create sql view from comma separated values

T-SQL question:
I need help building a join between 2 tables, where one of the tables holds aggregated data (comma-separated values).
I have a table - Users where I have 3 columns: UserId, DefaultLanguage and OtherLanguages.
The table looks like this:
UserId | DefaultLanguage | OtherLanguages
---------------------------------------------
1 | en | NULL
2 | en | it, fr
3 | fr | en, it
4 | en | sp
and so on.
I have another table where I have the association between language code (en, fr, ro, it, sp) and language name:
LangCode | LanguageName
-------------------------
en | English
fr | French
it | Italian
sp | Spanish
and so on.
I want to create a view like this:
UserId | DefaultLanguage | OtherLanguages
---------------------------------------------
1 | English | NULL
2 | English | Italian, French
3 | French | English, Italian
4 | English | Spanish
and so on.
In short, I need a view where the language code is replaced by language name.
Any help, please?
There are several solutions; of course, you could also recreate the tables and change the data structure.
1. If all the language codes are 2 characters:
select t1.UserId, t2.LanguageName AS DefaultLanguage,
ISNULL( t3.LanguageName, '') + ISNULL(', '+t4.LanguageName, '') + ISNULL( ', '+t5.LanguageName, '') OtherLanguages
from Table1 t1
inner join Table2 t2 on t1.DefaultLanguage = t2.LangCode
left join Table2 t3 on Left(t1.OtherLanguages,2) = t3.LangCode
left join Table2 t4 on CASE WHEN len(Replace(t1.OtherLanguages, ' ', '')) > 3 THEN
SUBSTRING( Replace(t1.OtherLanguages, ' ', ''), 4, 2) ELSE null END = t4.LangCode
left join Table2 t5 on CASE WHEN len(Replace(t1.OtherLanguages, ' ', '')) > 6 THEN
SUBSTRING( Replace(t1.OtherLanguages, ' ', ''), 7, 2) ELSE null END = t5.LangCode
2. Use a user-defined function:
CREATE FUNCTION [dbo].[func_GetLanguageName] (@pLanguageList varchar(max))
RETURNS varchar(max) AS
BEGIN
Declare @aLanguageList varchar(max) = @pLanguageList
Declare @aLangCode varchar(max) = null
Declare @aReturnName varchar(max) = null
WHILE LEN(@aLanguageList) > 0
BEGIN
IF PATINDEX('%,%',@aLanguageList) > 0
BEGIN
SET @aLangCode = RTRIM(LTRIM(SUBSTRING(@aLanguageList, 0, PATINDEX('%,%',@aLanguageList))))
SET @aLanguageList = LTRIM(SUBSTRING(@aLanguageList, LEN(@aLangCode + ',') + 1,LEN(@aLanguageList)))
END
ELSE
BEGIN
SET @aLangCode = @aLanguageList
SET @aLanguageList = NULL
END
Select @aReturnName = ISNULL( @aReturnName + ', ' , '') + LanguageName from Table2 where LangCode=@aLangCode
END
RETURN(@aReturnName)
END
and use it in a select:
select UserId, dbo.func_GetLanguageName(DefaultLanguage) DefaultLanguage, dbo.func_GetLanguageName(OtherLanguages) OtherLanguages from table1
Best practice would dictate not to have this type of comma-delimited data in a column...
Since you stated in comments that the schema cannot be changed, the next best thing is a function. This can be used in a select query in-line.
SQL is notoriously slow with string manipulation. Here is an interesting article on the topic. There are many SQL "string split" functions out there. They all generally split a comma delimited string and return a table.
For this specific use case, you actually need a scalar-valued function (a function which returns one value) rather than a table-valued function (one which returns a table of values).
Below is such a function, modified to return a scalar value in place of the original comma-delimited string of language codes.
The comments explain what is happening line by line.
The gist is that you must loop through the input string keeping track of the last comma location, extract each code, look up the full language name in the languages table, and then return the output as a comma-delimited string.
Language codes to languages function:
Create Function [dbo].fn_languageCodeToFull
( @Input Varchar(100) )
Returns Varchar(1000)
As
Begin
-- To address null input, based on the example you provided, we set the output to NULL if there is no input
If @Input = '' Or @Input Is Null
Return Null
Declare
@CodeLength int, -- constant for code length to avoid hardcoded "magic numbers"
@Output varchar(1000), -- will contain the final comma delimited string of full languages
@LastIndex int, -- tracks the location of the input we are searching as we loop over the string
@CurrentCode varchar(2), -- for code readability, we extract each language code to this variable
@CurrentLanguage varchar(50), -- for code readability, we store the full language in this variable
@IndexIncrement int -- constant to increment the search index by 1 at each iteration
-- ensuring the loop moves forward
Set @LastIndex = 0 -- seed the index, so we begin to search at 0 index
Set @CodeLength = 2 -- ISO language codes are always 2 characters in length
Set @Output = '' -- seed with empty string to avoid NULL when concatenating
Set @IndexIncrement = 1 -- again avoiding hardcoded values...
-- We will loop until we have gone to or beyond the length of the input string
While @LastIndex < len(@Input)
Begin
-- Set the index of each comma (charindex is 1-based)
Set @LastIndex = CHARINDEX(',', @Input, @LastIndex)
-- When we get to the last item, CharIndex will return 0 when it does not find a comma.
-- To pull the last item, we will artificially set @LastIndex to be 1 greater than the input string
-- This will allow the code following this line to be unaltered for this scenario
If @LastIndex = 0 set @LastIndex = len(@Input) + 1 -- account for 1-based index of substring
-- Extract the code prior to the current comma that charindex has identified
Set @CurrentCode = substring(@Input, @LastIndex - @CodeLength, @CodeLength)
-- Do a lookup to get the language for the current code ('languages' is an assumed table name)
Set @CurrentLanguage = (Select LanguageName From languages Where LangCode = @CurrentCode)
-- Only add a separator after the first language to ensure no extra comma will be present in Output
If @LastIndex > 3 Set @Output = @Output + ', '
-- Here we build the Output string with the language
Set @Output = @Output + @CurrentLanguage
-- Finally, we increment @LastIndex by 1 to avoid loop on first instance of comma
Set @LastIndex = @LastIndex + @IndexIncrement
End
Return @Output
End
Then your view would simply do something like:
Sample view using the function:
Create View vw_UserLanguages
As
Select
UserId,
dbo.fn_languageCodeToFull(DefaultLanguage) as DefaultLanguage,
dbo.fn_languageCodeToFull(OtherLanguages) as OtherLanguages
From UserLanguageCodes -- you do not provide a name so I made one up
Note that the function will work whether there are commas or not, so there is no need to join the Languages table here as you can just have the function do all the work in this case.
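If the server is SQL Server 2017 or later (STRING_SPLIT also needs database compatibility level 130 or higher), STRING_SPLIT and STRING_AGG allow a set-based version without a loop. This is a swapped-in technique, not the function above; the @Users and @Languages table variables are illustrative stand-ins for the real tables, and ordering by CHARINDEX assumes each code appears at most once per row. The same SELECT could be wrapped in a view over the real tables.

```sql
DECLARE @Users TABLE (UserId int, DefaultLanguage varchar(2), OtherLanguages varchar(100));
INSERT INTO @Users VALUES (1, 'en', NULL), (2, 'en', 'it, fr'), (3, 'fr', 'en, it'), (4, 'en', 'sp');

DECLARE @Languages TABLE (LangCode varchar(2), LanguageName varchar(50));
INSERT INTO @Languages VALUES ('en', 'English'), ('fr', 'French'), ('it', 'Italian'), ('sp', 'Spanish');

SELECT u.UserId,
       d.LanguageName AS DefaultLanguage,
       o.OtherLanguages
FROM @Users AS u
JOIN @Languages AS d ON d.LangCode = u.DefaultLanguage
OUTER APPLY (
    -- Split the list, look up each code, and join the names back together
    -- in the order the codes appear in the original string
    -- (a NULL list splits into no rows, so the aggregate yields NULL)
    SELECT STRING_AGG(l.LanguageName, ', ') WITHIN GROUP (ORDER BY c.Pos) AS OtherLanguages
    FROM STRING_SPLIT(u.OtherLanguages, ',') AS s
    CROSS APPLY (VALUES (LTRIM(s.value), CHARINDEX(LTRIM(s.value), u.OtherLanguages))) AS c(Code, Pos)
    JOIN @Languages AS l ON l.LangCode = c.Code
) AS o;
```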
One quick-and-dirty solution would be to use nested REPLACE calls, but that can result in a complex, long-winded statement, especially if you have more than five languages.
As an example:
SELECT [UserId],[DefaultLanguage],
CASE
WHEN [OtherLanguages] IS NULL THEN ''
ELSE REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE([OtherLanguages],
'en','English'),
'fr','French'),
'it','Italian'),
'ro','Romulan'), --Probably not the intended language ;-)
'sp','Spanish')
END as [OtherLanguages]
FROM YourTable
Personally, I'd create a scalar function, again using the REPLACE command, but you can then check the number of languages present and add a counter so that you're not doing unnecessary lookups.
SELECT [UserId],[DefaultLanguage],
CASE
WHEN [OtherLanguages] IS NULL THEN ''
WHEN [OtherLanguages] = '' THEN ''
ELSE dbo.do_function_name([OtherLanguages]) -- placeholder name; scalar UDFs must be called with their schema prefix
END as [OtherLanguages]
FROM YourTable
It might not be good practice, but there are times when it is more efficient to store multiple values in a single field; just accept that when you do, it will slow down the way you handle that data.

T-SQL match abbreviation

I need to match abbreviations, where a dot means "anything that starts with", like:
Rows:
blue love
abo love
comeb love
blue lauer
blue alo
(a)
Input:
b. love
Expected output:
blue love
(b)
Input:
b. l.
Expected output:
blue love
blue lauer
Any tip?
You could convert the . to a % so that you can use LIKE:
SELECT @yourfilter = REPLACE(@yourfilter, '.', '%');
Then just use it:
SELECT * FROM TABLE WHERE COLUMN LIKE @yourfilter
It would be equivalent to:
SELECT * FROM TABLE WHERE COLUMN LIKE 'b% love'
OR
SELECT * FROM TABLE WHERE COLUMN LIKE 'b% l%'
Here is a working SQL Fiddle example
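A self-contained version of the REPLACE idea against the rows from the question (the @Names table variable and Name column are illustrative). Because % matches any run of characters, including spaces, a pattern like 'b% love' would also match something like 'big blue love'; if an abbreviation should stay within one word, a stricter pattern would be needed. Any %, _ or [ typed into the filter would also act as a wildcard.

```sql
DECLARE @Names TABLE (Name varchar(50));
INSERT INTO @Names (Name) VALUES
    ('blue love'), ('abo love'), ('comeb love'), ('blue lauer'), ('blue alo');

DECLARE @yourfilter varchar(50) = 'b. l.';

-- Each '.' becomes '%', so 'b. l.' turns into the LIKE pattern 'b% l%'
SET @yourfilter = REPLACE(@yourfilter, '.', '%');

-- Expected to return 'blue love' and 'blue lauer'
SELECT Name
FROM @Names
WHERE Name LIKE @yourfilter;
```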
You could do something like:
CREATE PROCEDURE abrieviate
(
@delim char,
@word varchar(20)
)
AS
SELECT *
FROM Table
WHERE CHARINDEX(@delim, WORD_FIELD) = 1 -- WORD_FIELD starts with @delim
AND WORD_FIELD LIKE '%' + @word + '%'
GO
EXEC abrieviate 'b', 'love'