Search a database for a sentence separated by a delimiter in SQL - sql

I currently have knowledge base articles in a database, and the system separates the keywords relating to the article by a semicolon.
On the front facing form, I have a text box limited to 100 characters. How do I create a SQL query that looks at the words within the text box and starts matching those words to the keywords within the database?
I basically want to create a search a little like Google's.
I am currently using LIKE '%{token}%'. However, this is not good enough.

If you have an application layer that in-turn interacts with the database, then you should generate that portion of the query, searching for each word at the beginning, in the middle and at the end of the sentence column of each record, for each word in the search text.
(
sentence LIKE 'token;%'
OR sentence like '%;token;%'
OR sentence like '%;token'
)
E.g.
Search text = "search google";
Query:
(
sentence LIKE 'search;%'
OR sentence like '%;search;%'
OR sentence like '%;search'
)
AND
(
sentence LIKE 'google;%'
OR sentence like '%;google;%'
OR sentence like '%;google'
)

If I understood you correctly, I had a similar problem.
I solved it by stored function "split". The function splits the parameter by delimiters and returns a table with one column.
This allows you to join the returned table column with your article keywords.
Here is a bit more generic version of the code
FUNCTION [dbo].[Split] (#s varchar(512))
RETURNS table
AS
RETURN (
WITH Pieces(pn, start, stop) AS (
SELECT 1, 1, dbo.GetNextDelimiter(#s,0)
UNION ALL
SELECT pn + 1, stop + 1, dbo.GetNextDelimiter(#s, stop + 1)
FROM Pieces
WHERE stop > 0
),
Parts(pn,s,start,[end]) as
(SELECT pn,
SUBSTRING(#s, start, CASE WHEN (stop > 0) THEN stop-start ELSE 512 END) AS s, start, stop
FROM Pieces)
SELECT pn as PartNumber,s as StringPart,(start - 1) start,([end] - 1) [end] FROM Parts WHERE s != ''
)
The delimiter function :
FUNCTION [dbo].[GetNextDelimiter](#string nvarchar(max), #startFrom int)
RETURNS int
AS
BEGIN
-- Declare the return variable here
DECLARE #response int
SELECT #response = MIN(delimiterIndex)
FROM
(SELECT CHARINDEX(' ', #string, #startFrom)
UNION
SELECT CHARINDEX('-', #string, #startFrom)
UNION
SELECT CHARINDEX('''', #string, #startFrom)) as Delimiters(delimiterIndex)
WHERE delimiterIndex != 0
IF #response is NULL
SET #response = 0
RETURN #response
END
This would allow you to use not only semicolons, but any type and number of delimiters.
From here it's simple
Select * from Articles WHERE Token in Split(#Input)

Related

Update or replace a special character text in my database field

I am facing to an issue that I cannot update or replace some characters in my database.
Here is how this text look like in my column when I retrieve it:
As you can see, there is an unknown characters between 'master' and 'degree' which I cannot even paste it here.
I also tried to update and replace it with below code (I cannot paste that two vertical lines here since they are not supported in any browser and I am not sure what they are, Please see the picture above to see what is in my SQL statement).
begin transaction
update gm_desc set projdesc=replace(projdesc,'%â s%','') where projdesc like '%âs%' and proposalno = '15-149-01'
You can see the real SQL Statement here:
I tried to update, or replace it but I cannot do it. The update statement successfully works but I still see that weird special charters. I would be appreciate to help me.
Here's a scalar-valued function which removes all non-alphanumeric characters (preserves spaces) from a string.
Hopefully it helps!
dbfiddle
create function dbo.get_alphanumeric_str
(
#string varchar(max)
)
returns varchar(max)
as
begin
declare #ret varchar(max);
with nums as (
select 1 as n
union all select n+1 from nums
where n < 256
)
select #ret = replace(stuff(
(
select '' + substring(#string, nums.n, 1)
from nums
where patindex('%[^0-9A-Za-z ]%', substring(#string, nums.n,1)) = 0
for xml path('')
), 1, 0, ''
), ' ', ' ')
option (MAXRECURSION 256)
return #ret;
end
Usage
select dbo.get_alphanumeric_str('Helloᶄ âWorld 1234⅊⅐')
Returns: Hello World 1234
How it works
The nums CTE is just to get a list of numbers (you can set the maximum of 256 to a higher value if your strings are longer; n.b. option (MAXRECURSION n) is for this CTE but has to be placed at the query)
The stuff essentially iterates through the string, using the list of numbers above and extracts a substring of length 1; each of these chars are checked if they match the [^0-9A-Za-z ] regex group (0-9 all digits, A-Za-z all letters both lower and upper case, and a single space character)
If they match, patindex() should return 0; i.e. index zero.
Use replace(string, ' ', ' ') for the space character as the xml path returns a special encoding, see this question.
Use a binary collation for accented characters; see this answer

Hide characters in email address using an SQL query

I have a very specific business requirement to hide certain characters within an email address when returned from SQL and I have hit the limit of my ability with SQL to achieve this. I was wondering if someone out there would be able to point me in the right direction. Essentially, my business is asking for the following:
test#email.com to become t*\*t#e**l.com
or
thislong#emailaddress.com to become t******h#e**********s.com
I am aware that if either portion of the email before of after the # are less than 3 characters, then this won't work, but I intend on checking for this and dealing with it appropriately. I have tried a mixture of SUBSTRING, STUFF, LEFT/RIGHT etc but I can't quite get it right.
DECLARE #String VARCHAR(100) = 'example#gmail.com'
SELECT LEFT(#String, 1) + '*****#'
+ REVERSE(LEFT(RIGHT(REVERSE(#String) , CHARINDEX('#', #String) +1), 1))
+ '******'
+ RIGHT(#String, 5)
result will be
e******e#g***l.com
Very interesting and very much tough to generate generic solution try this
this may help you
DECLARE #String VARCHAR(100) = 'sample#gmail.com'
SELECT STUFF(STUFF(#STring,
CHARINDEX('#',#String)+2,
(CHARINDEX('.',#String, CHARINDEX('#',#String))-CHARINDEX('#',#String)-3),
REPLICATE('*',CHARINDEX('.',#String, CHARINDEX('#',#String))-CHARINDEX('#',#String)))
,2
,CHARINDEX('#',#String)-3
,REPLICATE('*',CHARINDEX('#',#String)-3))
OUTPUT will be
s****e#g******l.com
Similar way for thislong#emailaddress.com
OUTPUT will be
t******g#e*************s.com
You could also use regular expressions. Something like this (not completely finished):
select regexp_replace('test#email.com', '^(.?).*#(.?).*', '\1***#\2***')
from dual
Results in:
t***#e***
Could be a useful solution if you can only use SELECT statements.
CREATE FUNCTION dbo.EmailObfuscate
( #Email VARCHAR(255)
) RETURNS TABLE AS RETURN
(
SELECT adr.email
, LEFT (adr.email, 1)
+ REPLICATE ('*', AtPos-3)
+ SUBSTRING (adr.Email, AtPos-1, 3)
+ REPLICATE ('*', Length-DotPos - AtPos - 2)
+ SUBSTRING (adr.Email, Length - DotPos, 10) AS hide
FROM ( VALUES ( #Email) ) AS ADR (EMail)
CROSS APPLY ( SELECT CHARINDEX ('#', adr.Email)
, CHARINDEX ('.', REVERSE(Adr.Email))
, LEN (Adr.Email)
) positions ( AtPos
, dotpos
, Length
)
);
GO
-- Calling
SELECT Emails.*
FROM ( Values ( 'this.long#emailaddress.com')
, ( 'test#email.com' )
, ( 'sample#gmail.com' )
, ( 'test#gmx.de' )
) AS adr (email)
CROSS APPLY dbo.EmailObfuscate (Adr.Email) AS Emails
SELECT
-- Email here is the name of the selected Column from your Table
--Display the First Character
SUBSTRING(Email,1,1)+
--Replace selected Number of *
REPLICATE('*',10)+
--Display the One Character before # along with # & One Character after #
SUBSTRING(Email,CHARINDEX('#',Email)-1,3)+
--Replace selected Number of *
REPLICATE('*',10)+
--Display. Character along with the rest selected Number of Characters
SUBSTRING(Email,CHARINDEX('.',Email)-1, LEN(Email) -
CHARINDEX('.',Email)+12)
--NameEmail is the Table Name
FROM NameEmail
Result is:
j**********l#a**********a.co.uk
I've found this answer, but was searching for PostgreSQL version (and have not found it)
So I've made my own for PGSQL
SELECT
-- Email here is the name of the selected Column from your Table
--Display the First Character
SUBSTRING('g40gj30m#my-email.co.uk', 1, 1) ||
--Replace selected Number of *
REPEAT('*', 10) ||
--Display the One Character before # along with # & One Character after #
SUBSTRING('g40gj30m#my-email.co.uk', strpos('g40gj30m#my-email.co.uk', '#') - 1, 3) ||
--Replace selected Number of *
REPEAT('*', 10) ||
--Display. Character along with the rest selected Number of Characters
SUBSTRING(
'g40gj30m#my-email.co.uk',
strpos('g40gj30m#my-email.co.uk', '.') - 1,
length('g40gj30m#my-email.co.uk') - strpos('g40gj30m#my-email.co.uk', '.') + 12
);
the result will be
?column?
---------------------------------
g**********m#m**********l.co.uk
(1 row)

Looking for a scalar function to find the last occurrence of a character in a string

Table FOO has a column FILEPATH of type VARCHAR(512). Its entries are absolute paths:
FILEPATH
------------------------------------------------------------
file://very/long/file/path/with/many/slashes/in/it/foo.xml
file://even/longer/file/path/with/more/slashes/in/it/baz.xml
file://something/completely/different/foo.xml
file://short/path/foobar.xml
There's ~50k records in this table and I want to know all distinct filenames, not the file paths:
foo.xml
baz.xml
foobar.xml
This looks easy, but I couldn't find a DB2 scalar function that allows me to search for the last occurrence of a character in a string. Am I overseeing something?
I could do this with a recursive query, but this appears to be overkill for such a simple task and (oh wonder) is extremely slow:
WITH PATHFRAGMENTS (POS, PATHFRAGMENT) AS (
SELECT
1,
FILEPATH
FROM FOO
UNION ALL
SELECT
POSITION('/', PATHFRAGMENT, OCTETS) AS POS,
SUBSTR(PATHFRAGMENT, POSITION('/', PATHFRAGMENT, OCTETS)+1) AS PATHFRAGMENT
FROM PATHFRAGMENTS
)
SELECT DISTINCT PATHFRAGMENT FROM PATHFRAGMENTS WHERE POS = 0
I think what you're looking for is the LOCATE_IN_STRING() scalar function. This is what Info Center has to say if you use a negative start value:
If the value of the integer is less than zero, the search begins at
LENGTH(source-string) + start + 1 and continues for each position to
the beginning of the string.
Combine that with the LENGTH() and RIGHT() scalar functions, and you can get what you want:
SELECT
RIGHT(
FILEPATH
,LENGTH(FILEPATH) - LOCATE_IN_STRING(FILEPATH,'/',-1)
)
FROM FOO
One way to do this is by taking advantage of the power of DB2s XQuery engine. The following worked for me (and fast):
SELECT DISTINCT XMLCAST(
XMLQuery('tokenize($P, ''/'')[last()]' PASSING FILEPATH AS "P")
AS VARCHAR(512) )
FROM FOO
Here I use tokenize to split the file path into a sequence of tokens and then select the last of these tokens. The rest is only conversion from SQL to XML types and back again.
I know that the problem from the OP was already solved but I decided to post the following anyway to hopefully help others like me that land here.
I came across this thread while searching for a solution to my similar problem which had the exact same requirement but was for a different kind of database that was also lacking the REVERSE function.
In my case this was for a OpenEdge (Progress) database, which has a slightly different syntax. This made the INSTR function available to me that most Oracle typed databases offer.
So I came up with the following code:
SELECT
SUBSTRING(
foo.filepath,
INSTR(foo.filepath, '/',1, LENGTH(foo.filepath) - LENGTH( REPLACE( foo.filepath, '/', '')))+1,
LENGTH(foo.filepath))
FROM foo
However, for my specific situation (being the OpenEdge (Progress) database) this did not result into the desired behaviour because replacing the character with an empty char gave the same length as the original string. This doesn't make much sense to me but I was able to bypass the problem with the code below:
SELECT
SUBSTRING(
foo.filepath,
INSTR(foo.filepath, '/',1, LENGTH( REPLACE( foo.filepath, '/', 'XX')) - LENGTH(foo.filepath))+1,
LENGTH(foo.filepath))
FROM foo
Now I understand that this code won't solve the problem for T-SQL because there is no alternative to the INSTR function that offers the Occurence property.
Just to be thorough I'll add the code needed to create this scalar function so it can be used the same way like I did in the above examples.
-- Drop the function if it already exists
IF OBJECT_ID('INSTR', 'FN') IS NOT NULL
DROP FUNCTION INSTR
GO
-- User-defined function to implement Oracle INSTR in SQL Server
CREATE FUNCTION INSTR (#str VARCHAR(8000), #substr VARCHAR(255), #start INT, #occurrence INT)
RETURNS INT
AS
BEGIN
DECLARE #found INT = #occurrence,
#pos INT = #start;
WHILE 1=1
BEGIN
-- Find the next occurrence
SET #pos = CHARINDEX(#substr, #str, #pos);
-- Nothing found
IF #pos IS NULL OR #pos = 0
RETURN #pos;
-- The required occurrence found
IF #found = 1
BREAK;
-- Prepare to find another one occurrence
SET #found = #found - 1;
SET #pos = #pos + 1;
END
RETURN #pos;
END
GO
To avoid the obvious, when the REVERSE function is available you do not need to create this scalar function and you can just get the required result like this:
SELECT
SUBSTRING(
foo.filepath,
LEN(foo.filepath) - CHARINDEX('\', REVERSE(foo.filepath))+2,
LEN(foo.filepath))
FROM foo
You could just do it in a single statement:
select distinct reverse(substring(reverse(FILEPATH), 1, charindex('/', reverse(FILEPATH))-1))
from filetable

How do I call this function in a sproc?

I got this function from someone on here:
create FUNCTION [dbo].[fnSplitString] (#s varchar(512),#sep char(1))
RETURNS table
AS
RETURN (
WITH Pieces(pn, start, stop) AS (
SELECT 1, 1, CHARINDEX(#sep, #s)
UNION ALL
SELECT pn + 1, stop + 1, CHARINDEX(#sep, #s, stop + 1)
FROM Pieces
WHERE stop > 0
)
SELECT pn,
SUBSTRING(#s, start, CASE WHEN stop > 0 THEN stop-start ELSE 512 END) AS s
FROM Pieces
)
Normally in where clauses of sprocs I call this type of function like this:
WHERE u.[Fleet] IN (SELECT [VALUE] FROM dbo.udf_GenerateVarcharTableFromStringList(#Fleets, ','))
How do I go about calling the above function in a similar manner?
Thanks!
You would want to replace [value] with [s].
Since you said the result of:
SELECT * FROM dbo.fnSplitString('abc,def', ',')
is correct, then it's likely that everything should work. Check what happens when you have an empty token (two adjacent commas) and see if the results are as you expect.
Also, I noticed there was an effective limit on how long the input string could be. Check to make sure that nothing is getting cut off.
The function does not remove leading/trailing spaces, so if you have a string 'a, b, c' and a separator ',' you'll get a table with 'a', ' b', ' c'. Does this match in your u.[Fleet]?

sql function to return table of names and values given a querystring

Anyone have a t-sql function that takes a querystring from a url and returns a table of name/value pairs?
eg I have a value like this stored in my database:
foo=bar&baz=qux&x=y
and I want to produce a 2-column (key and val) table (with 3 rows in this example), like this:
name | value
-------------
foo | bar
baz | qux
x | y
UPDATE: there's a reason I need this in a t-sql function; I can't do it in application code. Perhaps I could use CLR code in the function, but I'd prefer not to.
UPDATE: by 'querystring' I mean the part of the url after the '?'. I don't mean that part of a query will be in the url; the querystring is just used as data.
create function dbo.fn_splitQuerystring(#querystring nvarchar(4000))
returns table
as
/*
* Splits a querystring-formatted string into a table of name-value pairs
* Example Usage:
select * from dbo.fn_splitQueryString('foo=bar&baz=qux&x=y&y&abc=')
*/
return (
select 'name' = SUBSTRING(s,1,case when charindex('=',s)=0 then LEN(s) else charindex('=',s)-1 end)
, 'value' = case when charindex('=',s)=0 then '' else SUBSTRING(s,charindex('=',s)+1,4000) end
from dbo.fn_split('&',#querystring)
)
go
Which utilises this general-purpose split function:
create function dbo.fn_split(#sep nchar(1), #s nvarchar(4000))
returns table
/*
* From https://stackoverflow.com/questions/314824/
* Splits a string into a table of values, with single-char delimiter.
* Example Usage:
select * from dbo.fn_split(',', '1,2,5,2,,dggsfdsg,456,df,1,2,5,2,,dggsfdsg,456,df,1,2,5,2,,')
*/
AS
RETURN (
WITH Pieces(pn, start, stop) AS (
SELECT 1, 1, CHARINDEX(#sep, #s)
UNION ALL
SELECT pn + 1, stop + 1, CHARINDEX(#sep, #s, stop + 1)
FROM Pieces
WHERE stop > 0
)
SELECT pn,
SUBSTRING(#s, start, CASE WHEN stop > 0 THEN stop-start ELSE 4000 END) AS s
FROM Pieces
)
go
Ultimately letting you do something like this:
select name, value
from dbo.fn_splitQuerystring('foo=bar&baz=something&x=y&y&abc=&=whatever')
I'm sure TSQL could be coerced to jump through this hoop for you, but why not parse the querystring in your application code where it most probably belongs?
Then you can look at this answer for what others have done to parse querystrings into name/value pairs.
Or this answer.
Or this.
Or this.
Please don't encode your query strings directly in URLs, for security reasons: anyone can easily substitute any old query to gain access to information they shouldn't have -- or worse, "DROP DATABASE;". Checking for suspicious "keywords" or things like quote characters is not a solution -- creative hackers will work around these measures, and you'll annoy everyone whose last name is "O'Reilly."
Exceptions: in-house-only servers or public https URLS. But even then, there's no reason why you can't build the SQL query on the client side and submit it from there.