Matching multiple variations of input to one SQL row

After much searching, I would like to know how to match different variations of input to one SQL row using standard TSQL. Here is the scenario:
I have in my sql row the following text: I love
I then have the following 3 inputs all of which should return a match to this row:
"I want to tell you we all love StackOverflow"
"I'm totally in love with StackOverflow"
"I really love StackOverflow"
In each input, the words that cause the match are "I" and "love" (bolded in the original post). The I in I'm is deliberately matched too, so it would be good if we could include that in matches.
I thought about splitting the input string, which I did using the following TSQL:
-- Create a space delimited string for testing
declare @str varchar(max)
select @str = 'I want to tell you we all love StackOverflow'
-- XML tag the string by replacing spaces with </x><x> tags
declare @xml xml
select @xml = cast('<x><![CDATA['+ replace(@str,' ',']]></x><x><![CDATA[') + ']]></x>' as xml)
-- Finally select values from nodes <x> and trim at the same time
-- nvarchar(100) leaves room for longer words such as StackOverflow
select ltrim(rtrim(mynode.value('.[1]', 'nvarchar(100)'))) as Code
from (select @xml doc) xx
cross apply doc.nodes('/x') (mynode)
This gets me all the words as separate rows but then I could not work out how to do the query for matching these.
Therefore any help from this point or any alternate ways of matching as required would be more than greatly appreciated!
UPDATE:
@freefaller pointed me to the RegEx route, and by creating a function I have been able to get a bit further forward (therefore +1 @freefaller). However, I now need to know how to get it to look at all my table rows rather than the hard-coded input of 'I love'. I now have the following select statements:
SELECT * FROM dbo.FindWordsInContext('i love','I want to tell you we all love StackOverflow',30)
SELECT * FROM dbo.FindWordsInContext('i love','I''m totally in love with StackOverflow',30)
SELECT * FROM dbo.FindWordsInContext('i love','I really love StackOverflow',30)
The above returns me the number of times matched and the context of the string matched, therefore the first select above returns:
Hits Context
1 ...I want to tell you we all love StackOv...
So based on the fact we now have the above can anyone tell me how to make this function look at all of the rows for matches and then return the row/rows that have a match?

One option would be to use Regular Expressions via SQLCLR objects as explained here.
I have never created SQLCLR objects myself, so I cannot comment on the ease of this method. I am, however, a great fan of Regular Expressions and would recommend their use for most text search / manipulation.
Edit: In response to the comment, I have no experience of SQLCLR, but assuming you get that working, something like the following simple untested TSQL might work...
SELECT *
FROM mytable
WHERE dbo.RegexMatch(@search, REPLACE(myfield, ' ', '.*?')) = 1

I have managed to come up with an answer to my own question, so I thought I would post it here in case anyone else has similar requirements in the future. It relies on the SQL-CLR regular expression functionality and runs with minimal impact on performance.
Firstly enable SQL-CLR on your server if not already available (you need to be sysadmin):
--Enables CLR Integration
exec sp_configure 'clr enabled', 1
GO
RECONFIGURE
GO
Then you will need to create the assembly in SQL. Don't forget to change the path from D:\SqlRegEx.dll. Use the SAFE permission set, as it is the most restrictive and safest set of permissions (I won't go into detail here):
CREATE ASSEMBLY [SqlRegEx] FROM 'D:\SqlRegEx.dll' WITH PERMISSION_SET = SAFE
Now create the actual function you will call:
CREATE FUNCTION [dbo].[RegexMatch]
(@Input NVARCHAR(MAX), @Pattern NVARCHAR(MAX), @IgnoreCase BIT)
RETURNS BIT
AS EXTERNAL NAME SqlRegEx.[SqlClrTools.SqlRegEx].RegExMatch
Finally, to complete and answer my own question, we can run the following TSQL:
SELECT *
FROM your_table
WHERE dbo.RegexMatch(@search, REPLACE(your_field, ' ', '.*?'), 1) = 1
SELECT *
FROM your_table
WHERE dbo.RegexMatch(@search, REPLACE(REVERSE(your_field), ' ', '.*?'), 1) = 1
I hope this will help someone in what should be a simple search option in the future.
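The pattern that REPLACE(your_field, ' ', '.*?') builds can be sanity-checked outside SQL Server. The following is only a sketch under assumptions: Python's re module stands in for the .NET regex engine behind the CLR function, and the helper names row_pattern and matches are made up for this illustration.

```python
import re

def row_pattern(row_text: str) -> str:
    # Mirrors REPLACE(your_field, ' ', '.*?'): each space becomes a lazy wildcard.
    # Assumption: the stored text contains no regex metacharacters.
    return row_text.replace(" ", ".*?")

def matches(user_input: str, row_text: str) -> bool:
    # Mirrors dbo.RegexMatch(@search, <pattern>, 1): case-insensitive search.
    return re.search(row_pattern(row_text), user_input, re.IGNORECASE) is not None

inputs = [
    "I want to tell you we all love StackOverflow",
    "I'm totally in love with StackOverflow",
    "I really love StackOverflow",
]

for text in inputs:
    print(text, "->", matches(text, "I love"))
```

Because the match is case-insensitive and unanchored, "I" will match any letter i in the input, so the pattern is more permissive than a whole-word match. If that matters, word boundaries could be added to the pattern.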

Related

REVERSE function in Netezza not working, how to extract file name from path without it?

My company has production and testing databases in SQL Server and a data warehouse in IBM Netezza. I wrote a query in SQL Server and now need to convert it for use in the data warehouse; however, I am running into a problem.
A crucial part of the query is extracting a file name from a path, and in SQL Server I use this:
RIGHT( BitmapID, CHARINDEX( '\', REVERSE( BitmapID ) + '\' ) - 1 )
This turns "G:\grps\every\Permanent Marketing Signage\SPC\BRD\BLAD\BCAG_BLAD_001.png" to "BCAG_BLAD_001.png" and it works perfectly. I tried to convert this to Netezza syntax like so:
SUBSTRING(bit_map_ID, LENGTH(bit_map_ID) - ( STRPOS( REVERSE( bit_map_ID ), '\' ) + 2 ) )
However, when I run this, I get an error:
ERROR [42S02] ERROR: Function 'REVERSE(VARCHAR)' does not exist
Unable to identify a function that satisfies the given argument types
You may need to add explicit typecasts
When I replace REVERSE( bit_map_ID ) with a reversed string example like "gnp.100_DALB_GACB\DALB\DRB\CPS\egangiS gnitekraM tnenamreP\yreve\sprg:G" this also works perfectly, so it's the REVERSE function that's the problem. Even though Aginity Workbench highlights the REVERSE function as if it exists, it doesn't seem to work at all - or if there is a way to make it work, I can't figure it out. I've already tried using CAST as suggested by the error message but it makes no difference.
Is there a way to reverse a string in Netezza? Or failing that, is there any other way of accomplishing what I want to do without reversing the string?
I was able to figure out how to do this in Netezza without using a REVERSE function like so:
SUBSTRING( bit_map_ID, INSTR( bit_map_ID, '\', -1 ) + 1 )
The key is to use the INSTR function and specify the third argument as -1 so that it will look for the first instance starting from the end of the string instead of the beginning of the string. No reversing needed.
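The "search from the end" idea is easy to check outside the database. The sketch below is an assumption-laden illustration in Python, not Netezza code: str.rfind plays the role of INSTR with a negative start position, and the function name file_name is hypothetical.

```python
def file_name(path: str) -> str:
    # rfind gives the 0-based index of the last backslash, or -1 when there is none.
    # That lines up with INSTR(..., '\', -1) returning a 1-based position, or 0.
    last_sep = path.rfind("\\")
    # Keep everything after the last backslash; the whole string when none exists.
    return path[last_sep + 1:]

path = r"G:\grps\every\Permanent Marketing Signage\SPC\BRD\BLAD\BCAG_BLAD_001.png"
print(file_name(path))
```

In both versions the "not found" value plus one lands on the start of the string, so a bare file name with no directory part comes back unchanged.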
While this works for my needs, I would definitely be open for alternative answers for the question I posed!
To my knowledge, the REVERSE function does not exist on Netezza, and that is indeed what the error message above says, so I can confirm that the solution you provided is the way to go.
Alternative solutions would be to use a regular expression function or a string split.
To my knowledge, MS SQL Server has none of those 3 solutions available. The real issue for you is probably that the SQL standard does not include a list of functions needed to be compliant, so each database has its own take on which functions to include and what their interface is (negative arguments to INSTR are not universally accepted).

Dynamic, Nested Replace

I'm using SQL Server 2008 and need to strip out quite a bit of data within a string. Because of the nature and variability of the string, I think I'm needing to use multiple, nested REPLACE commands. The problem is each REPLACE needs to build on the previous one. Here is a sample of what I'm looking at:
<Paragraph><Replacement Id="40B"><Le><Run Foreground="#FFFF0000">Treatment by </Run></Le><Op isFreeText="True"><Run Foreground="#FFFF0000">test</Run></Op><Tr><Run Foreground="#FFFF0000">. </Run></Tr></Replacement></Paragraph>
Essentially, I need it to return just the text outside of the <> brackets so for this example it would be:
Treatment by test.
Also, I wanted to mention that the strings inside the <> brackets can vary quite a bit for each row both by content and length, but it isn't relevant for what I'm needing other than making it more complex for replacing.
Here is what I've tried:
REPLACE(note,substring(note,patindex('<%>',note),CHARINDEX('>',note) - CHARINDEX('<',note) + 1),'')
And it returns:
<Replacement Id="40B"><Le><Run Foreground="#FFFF0000">Treatment by </Run></Le><Op isFreeText="True"><Run Foreground="#FFFF0000">test</Run></Op><Tr><Run Foreground="#FFFF0000">. </Run></Tr></Replacement></Paragraph>
Somehow I need to keep going with replacing each of the <> brackets but don't know how to proceed. Any help or guidance would be greatly appreciated!!!
Depending on how you have that string holding the HTML fragment available you could try to use something like:
SELECT convert(xml, '<Paragraph><Replacement Id="40B"><Le><Run Foreground="#FFFF0000">Treatment by </Run></Le><Op isFreeText="True"><Run Foreground="#FFFF0000">test</Run></Op><Tr><Run Foreground="#FFFF0000">. </Run></Tr></Replacement></Paragraph>').value('/', 'varchar(255)') as stripped
You convert it to XML and then use the built in xml parser function "value".
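The same "parse it, then read only the text nodes" idea can be tried outside the database. This is a sketch in Python using the standard library's xml.etree.ElementTree, not SQL Server's XML parser. The name strip_tags is hypothetical, and ET.fromstring needs a well-formed fragment with a single root element.

```python
import xml.etree.ElementTree as ET

note = (
    '<Paragraph><Replacement Id="40B"><Le><Run Foreground="#FFFF0000">Treatment by </Run></Le>'
    '<Op isFreeText="True"><Run Foreground="#FFFF0000">test</Run></Op>'
    '<Tr><Run Foreground="#FFFF0000">. </Run></Tr></Replacement></Paragraph>'
)

def strip_tags(fragment: str) -> str:
    # Parse the fragment, then concatenate every text node in document order.
    root = ET.fromstring(fragment)
    return "".join(root.itertext())

print(repr(strip_tags(note)))
```

Parsing avoids having to guess how many nested REPLACE calls a row needs, since every tag is dropped regardless of its name, attributes, or length.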

How to add comments to SQL CLR function?

I want to add comments to my SQL CLR functions (as I do to other SQL objects I am creating or editing - functions, procedures and views). Unfortunately, I am not able to do this for the SQL CLR objects.
For example, the following code:
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
-- =================================================================================================================================
-- Author: gotqn
-- Create date: 2015-03-25
-- Description: Converts a string that has been encoded for transmission in a URL into a decoded string.
-- Usage Example:
/*
SELECT [dbo].[fn_UrlDecode]('http://stackoverflow.com/search?q=tql+sql+server');
*/
-- =================================================================================================================================
CREATE FUNCTION [dbo].[fn_UrlDecode] (@value NVARCHAR(MAX))
RETURNS NVARCHAR(MAX)
AS EXTERNAL NAME [Utils].[Utils].[UrlDecode]
GO
When the function is scripted from SQL Server Management Studio, it produces this:
SET ANSI_NULLS OFF
GO
SET QUOTED_IDENTIFIER OFF
GO
CREATE FUNCTION [dbo].[fn_UrlDecode](@value [nvarchar](max))
RETURNS [nvarchar](max) WITH EXECUTE AS CALLER
AS
EXTERNAL NAME [Utils].[Utils].[UrlDecode]
GO
I tried to fix this by moving the comments after the AS, as this is how comments are added for views, but it failed again. Then I tried putting the comments after the CREATE clause and after the EXTERNAL NAME ... clause, but nothing changed.
Is there a way to fix this behaviour?
While @Damien is correct as to why the comments are not saved, there is still something of a workaround for storing comments: Extended Properties.
For example:
EXEC sys.sp_addextendedproperty @name = N'comments', @value = N'
-- =================================================================================================================================
-- Author: gotqn
-- Create date: 2015-03-25
-- Description: Converts a string that has been encoded for transmission in a URL into a decoded string.
-- Usage Example:
/*
SELECT [dbo].[fn_UrlDecode](''http://stackoverflow.com/search?q=tql+sql+server'');
*/
-- =================================================================================================================================
', @level0type = 'SCHEMA', @level0name = N'dbo',
@level1type = 'FUNCTION', @level1name = N'fn_UrlDecode';
You just need to escape your embedded single-quotes.
Then you can retrieve them via:
SELECT [value]
FROM sys.fn_listextendedproperty(N'comments', 'SCHEMA', N'dbo',
'FUNCTION', N'fn_UrlDecode', NULL, NULL);
Minor additional note: if you won't ever decode URLs that are more than 4000 characters long (and I am pretty sure that you won't run into many that are even over 2048 characters), then you would be better served to use NVARCHAR(4000) for both input and output datatypes as that will be quite a bit faster than if either, or both, are NVARCHAR(MAX).
Basically, if it's a type not listed as having data stored in sys.sql_modules then the original text that created the object is not retained and so comments aren't retained. No CLR object stores such text.
This is the expected behavior. Even when you write a native TSQL script with comments right before the routine signature, build it against the DBMS, and then right-click/edit to see the code, the comments won't be there. Go ahead and give this approach a try:
CREATE FUNCTION [dbo].[fn_UrlDecode]
(
@value [nvarchar](max)
)
RETURNS [nvarchar](max) WITH EXECUTE AS CALLER
AS
/*
***All the comments go here***
*/
EXTERNAL NAME [Utils].[Utils].[UrlDecode]
GO
Hope it helps!
Committing Necromancy.
Solution first...reason second...
Simply wrap the CLR function in a standard function and put the comments in the standard function.
Overkill? Perhaps, but as I said, I have a reason and need in my current situation.
As an employee of a contractor to a large organization, there are tons of fingers in every pie, and the hands attached to those fingers are usually long gone and forgotten...but maintenance of their legacy code lives on. This is especially noticeable when external assemblies are loaded in a database, and there is a need to change or augment existing libraries...somewhere...but where. The coder is gone, and the documentation of the process is buried somewhere long forgotten.
So, for me, the primary reason for commenting such functions is to make it easy to identify the name of the Assembly and the Repository for the Source Code. Other than starting out with properly documented code in the first place (remember this is legacy cruft) I am open to other suggestions about better places/ways to do this...I am all ears.
Providing this information in a comment, somewhere in or related to the function or assembly, is very helpful. The simple wrapper noted above is a very KISS method to achieve this goal.

string replace sql select

Hi, I've tried to find an answer to this but can't find one.
I'd like to remove some characters and prepend a pound sign to the result of an SQL query, which looks as follows (it's already using a replace command; can I stack these?):
select fundraiser.Company_Name,
replace(Just_Giving_Campaign,'"label":',''),
sum(fundraising_campaigns.Total_Collected) as donations
from fundraising_campaigns,
fundraiser
where Charity_Name = 'WaterAid'
and fundraising_campaigns.Campaigners_ID = fundraiser.id
group by fundraiser.Company_Name
Can anyone confirm how I would go about adding a £ sign and removing several sets of characters in a select statement? I don't appear to be able to stack replace statements, e.g.:
replace(replace (string, what to match, what to replace it with), what to match, what to replace it with)
Appreciate any thoughts
I am not sure about your question. If I am correct you want to prepend £ and do some nested replace. Hope the below example helps.
select '£'+replace(replace('YourText','x','s'),'You','U')

Search for string with conditions

I have a table with a name field, which is a string. I need to create a SQL statement that searches for the children of a node without finding the children's children. Is it possible to use LIKE and some wildcards to accomplish this? You can see some examples below of the results I need to get based on my search string.
Search string is /home
Then the following entries should be returned
/home/something
/home/somethingElse
but not
/home/something/foo
/home/something/bar
/home/somethingElse/foo
but if the search string is /home/something
These should be returned
/home/something/foo
/home/something/bar
SELECT name FROM table
WHERE name LIKE '/home/%' AND name NOT LIKE '/home/%/%'
should filter out anything with second level node under it.
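To see the two LIKE conditions working together on the question's sample rows, here is a small sketch that runs the query against an in-memory SQLite database from Python. Several things here are assumptions for illustration: SQLite stands in for the asker's database, the table is called nodes (table is a reserved word), and direct_children is a hypothetical helper that takes the parent path as a parameter.

```python
import sqlite3

def direct_children(conn, parent):
    # Keep names one level below `parent` and drop anything deeper.
    sql = (
        "SELECT name FROM nodes "
        "WHERE name LIKE ? || '/%' AND name NOT LIKE ? || '/%/%' "
        "ORDER BY name"
    )
    return [row[0] for row in conn.execute(sql, (parent, parent))]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (name TEXT)")
conn.executemany(
    "INSERT INTO nodes VALUES (?)",
    [(n,) for n in [
        "/home/something",
        "/home/somethingElse",
        "/home/something/foo",
        "/home/something/bar",
        "/home/somethingElse/foo",
    ]],
)

print(direct_children(conn, "/home"))
print(direct_children(conn, "/home/something"))
```

One caveat with this approach: if the parent path itself contains % or _, LIKE treats those characters as wildcards, so they would need escaping.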
I would probably search on the number of slashes in addition to the actual keywords. So the first one would be searching for /home with 1-2 /'s
The second one would be /home/something with 2-3 slashes.
I don't have SQL in front of me, but I'll work on some sample code for you.
Edit:
CREATE FUNCTION [dbo].[ufn_CountChar] ( @pInput VARCHAR(1000), @pSearchChar CHAR(1) )
RETURNS INT
BEGIN
RETURN (LEN(@pInput) - LEN(REPLACE(@pInput, @pSearchChar, '')))
END
GO
This little function will act nicely to count the number of slashes in your strings.
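The counting trick and one way of using it for the original question can be sketched in Python. This is an illustration only: count_char mirrors the length-difference idea of ufn_CountChar, while is_direct_child is a hypothetical helper showing one possible rule (same prefix, exactly one more slash) and is not something the answer spells out.

```python
def count_char(text: str, ch: str) -> int:
    # Same trick as ufn_CountChar: length before minus length after removing ch.
    return len(text) - len(text.replace(ch, ""))

def is_direct_child(name: str, parent: str) -> bool:
    # A direct child starts with "<parent>/" and has exactly one more slash.
    return (
        name.startswith(parent + "/")
        and count_char(name, "/") == count_char(parent, "/") + 1
    )

names = [
    "/home/something",
    "/home/somethingElse",
    "/home/something/foo",
    "/home/something/bar",
    "/home/somethingElse/foo",
]

print([n for n in names if is_direct_child(n, "/home")])
print([n for n in names if is_direct_child(n, "/home/something")])
```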
Enjoy
Hope this helps,
Cheers,