Working with a table in a DB2 User Defined Function - sql

So I've made a UDF in DB2 called Tokenize that splits a string and returns a table with an ID and a word (so each row is a word that was in the initial string). For example:
tokenize('University of Toronto', ' ') returns
ID Word
1 University
2 of
3 Toronto
What I'm trying to do is make another function that compares 2 strings to see how many words they have in common based on the length of the first string.
So, for example, 'University of Toronto' and 'University of Guelph' should return 0.66. I've gotten this to work with this code:
CREATE OR REPLACE FUNCTION Parsed_Match(STRING1 VARCHAR(256), STRING2 VARCHAR(256))
RETURNS DECIMAL(5,2)
NO EXTERNAL ACTION
BEGIN
    DECLARE SCORE DECIMAL(5,2);
    DECLARE mymatches int;
    DECLARE len int;

    set mymatches = (
        select count(*)
        from (
            select word
            from table(tokenize(STRING1, ' '))
            intersect all
            select word
            from table(tokenize(STRING2, ' '))
        )
    );

    set len = (
        select count(*)
        from table(tokenize(STRING1, ' '))
    );

    set score = decimal(mymatches) / decimal(len);
    RETURN SCORE;
END
Having to call tokenize again just to get the length of STRING1 strikes me as wrong, though. Is there a way in DB2 to store the calculated table in a variable so I can reuse it later on?
Ideally I want to do something like:
set t1 = tokenize(String1);
set t2 = tokenize(String2);
set matches = (
    select count(*)
    from (
        select word
        from t1
        intersect all
        select word
        from t2
    )
);
set len = ( select count(*) from t1 );
but just can't find a way to get that to work :(

It'd probably be easiest to just create the table inline; that is, with a CTE:
WITH T1 AS (SELECT word
            FROM TABLE(tokenize(STRING1, ' ')))
SELECT DECIMAL(COUNT(*), 5, 2) / (SELECT COUNT(*) FROM T1) AS score
FROM (SELECT word
      FROM T1
      INTERSECT ALL
      SELECT word
      FROM TABLE(tokenize(STRING2, ' '))
     )
... if you'll notice, this also allowed me to calculate the score in the same statement.
This won't help if you actually need to work with the tables outside this single statement, though.
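For what it's worth, here's a rough, untested sketch of the whole function with the CTE folded into the RETURN. It assumes DB2 accepts a common table expression in the RETURN of an SQL scalar function and uses the same tokenize signature as in the question; the DECIMAL() cast is there so the division isn't done in integer arithmetic (which would truncate the score to 0):
CREATE OR REPLACE FUNCTION Parsed_Match(STRING1 VARCHAR(256), STRING2 VARCHAR(256))
RETURNS DECIMAL(5,2)
NO EXTERNAL ACTION
RETURN
    WITH T1 AS (SELECT word
                FROM TABLE(tokenize(STRING1, ' ')))
    -- DECIMAL() keeps the count/count division from truncating to an integer
    SELECT DECIMAL(COUNT(*), 5, 2) / (SELECT COUNT(*) FROM T1)
    FROM (SELECT word
          FROM T1
          INTERSECT ALL
          SELECT word
          FROM TABLE(tokenize(STRING2, ' '))
         ) AS matched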

Related

sort given string alphabetically and return letter for letter

I basically want to alphabetically sort a given string and output it letter for letter via an inline function...
create function dbo.SortStringLetter4Letter (
    @key varchar(64)
) returns table
as
return select ...
And I want to call up the function like this:
select *
from dbo.SortStringLetter4Letter('TEST');
go
And get a result like this:
id Sorted
----------- --------
1 E
2 S
3 T
4 T
Anybody got an idea for this problem?
This should do it:
create or alter function dbo.SortStringLetter4Letter (
    @key varchar(64)
) returns table
as
return
WITH Numbers AS
(
    SELECT 1 AS Number
    UNION ALL
    SELECT Number + 1
    FROM Numbers
    WHERE Number < Len(@key)
)
Select
    id = Number, Sorted = SUBSTRING(@key, Number, 1)
from Numbers
GO
Select * from dbo.SortStringLetter4Letter('test') order by sorted
This is the solution I worked out:
drop function if exists dbo.LetterForLetter;
create function dbo.LetterForLetter (
    @key varchar(128)
) returns @chars table (# int, character char)
as
begin
    insert into @chars
    select #,
           substring(@key, #, 1) "character"
    from Numbers
    where # <= len(@key)
      and # > 0
    return
end;
go
select *
from dbo.LetterForLetter('TEST');
go
The table Numbers is just filled with sequential numbers, by the way; I used it to give each character an ID.
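(In case anyone wants to reproduce this, a minimal way to build such a Numbers table might look like the following. The single column is named # only to match the references above, and the size and row source are arbitrary.)
-- Hypothetical setup for the Numbers table used above: one int column
-- named [#] holding the values 1 through 1000.
create table dbo.Numbers ([#] int primary key);

insert into dbo.Numbers ([#])
select top (1000) row_number() over (order by (select null))
from sys.all_objects;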

Parsing / Indexing a Binary String in SQL Server

I have searched extensively for a relevant answer, but none quite satisfy what I need to be doing.
For our purposes I have a column with a 50 character binary string. In our database, it is actually hundreds of characters long.
There is one string for each unique item ID in our database. The location of each '1' flags a specific criterion as true, and a '0' as false, so the indexed locations of the ones and zeros are very important. Mostly, I care about where the 1's are.
I am not updating any databases, so I first decided to try and make a loop to look through each string and create a list of the 1's locations.
declare @binarystring varchar(50) = '10000010000110000001000000000000000000000000000001'
declare @position int = 0
declare @list varchar(200) = ''

while (@position <= len(@binarystring))
begin
    set @position = charindex('1', @binarystring, @position)
    set @list = @list + ', ' + convert(varchar(10), @position)
    set @position = charindex('1', @binarystring, @position) + 1
end

select right(@list, len(@list)-2)
This creates the following list:
1, 7, 12, 13, 20, 50
However, the loop will bomb if there is not a '1' at the end of the string, as I am searching through the string via occurrences of 1's rather than one character at a time. I am not sure how to satisfy the break criteria when the loop reaches the end of the string without there being a 1.
Is there a simple solution to my loop bombing, and should I even be looping in the first place?
I have tried other methods of parsing, union joining, indexing, etc, but given this very specific set of circumstances I couldn't find any combination that did quite what I needed. The above code is the best I've got so far.
I don't specifically need a comma delimited list as an output, but I need to know the location of all 1's within the string. The amount of 1's vary, but the string size is always the same.
This is my first time posting to stackoverflow, but I have used answers many times. I seek to give a clear question with relevant information. If there is anything I can do to help, I will try to fulfill any requests.
How about changing the while condition to this?
while (charindex('1', @binarystring, @position) > 0)
Alternatively, keep the original condition and break out of the loop once charindex finds no further '1':
while (@position <= len(@binarystring))
begin
    set @position = charindex('1', @binarystring, @position)
    if @position != 0
    begin
        set @list = @list + ', ' + convert(varchar(10), @position)
        set @position = charindex('1', @binarystring, @position) + 1
    end
    else
    begin
        break
    end;
end
It's often useful to have a source of large ranges of sequential integers handy. I have a table, dbo.range, with a single column, id, containing all the sequential integers from -500,000 to +500,000. That column is a clustered primary key, so lookups against it are fast. With such a table, solving your problem is easy.
Assuming your table has a schema something like
create table dbo.some_table_with_flags
(
    id    int           not null primary key,
    flags varchar(1000) not null
)
The following query should do what you need:
select row_id        = t.id,
       flag_position = r.id
from dbo.some_table_with_flags t
join dbo.range r on r.id between 1 and len(t.flags)
                and substring(t.flags, r.id, 1) = '1'
For each 1 value in the flags column, you'll get a row containing the ID from your source table's ID column, plus the position in which the 1 was found in flags.
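To make that concrete, here is a small self-contained check against the sample string from the question; the numbers come from a throwaway recursive CTE rather than a permanent table, purely for illustration, and it returns the same positions listed above:
-- Demo only: find the positions of the 1's in the question's sample string.
declare @flags varchar(50) = '10000010000110000001000000000000000000000000000001';

with n(id) as
(
    select 1
    union all
    select id + 1 from n where id < len(@flags)
)
select flag_position = id
from n
where substring(@flags, id, 1) = '1';
-- returns 1, 7, 12, 13, 20, 50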
There are a number of techniques for generating such sequences. This link shows several:
http://sqlperformance.com/2013/01/t-sql-queries/generate-a-set-1
For instance, you could use common table expressions (CTEs) to generate your sequences, like this:
WITH
s1(n) AS -- 10 (10^1)
( SELECT 1
UNION ALL SELECT 1
UNION ALL SELECT 1
UNION ALL SELECT 1
UNION ALL SELECT 1
UNION ALL SELECT 1
UNION ALL SELECT 1
UNION ALL SELECT 1
UNION ALL SELECT 1
UNION ALL SELECT 1
) ,
s2(n) as ( select 1 from s1 a cross join s1 b ) , -- 10^2 100
s3(n) as ( select 1 FROM s1 a cross join s2 b ) , -- 10^3 1,000
s4(n) as ( select 1 from s1 a cross join s3 b ) , -- 10^4 10,000
s5(n) as ( select 1 from s1 a cross join s4 b ) , -- 10^5 100,000
s6(n) as ( select 1 from s1 a cross join s5 b ) , -- 10^6 1,000,000
seq(n) as ( select row_number() over ( order by n ) from s6 )
select *
from dbo.some_table_with_flags t
join seq s on s.n between 1 and len(t.flags)
and substring(t.flags,s.n,1) = '1'

How to compare two sub queries in one sql statement

I have a table tbl_Country, which contains columns called ID and Name. The Name column has multiple country names separated by commas. I want the IDs returned when the multiple country names I pass in match the Name column values. I am splitting the country names using a function; the sample query looks like this:
@country varchar(50)
SELECT *
FROM tbl_Country
WHERE (SELECT *
       FROM Function(@Country)) IN (SELECT *
                                    FROM Function(Name))
tbl_country
ID Name
1 'IN,US,UK,SL,NZ'
2 'IN,PK,SA'
3 'CH,JP'
parameter @country = 'IN,SA'
I have to get:
ID
1
2
NOTE: The Function will split the string into a datatable
Try this
SELECT * FROM tbl_Country C
LEFT JOIN tbl_Country C1 ON C1.Name=C.Country
Try this:
SELECT *
FROM tbl_Country C
WHERE ',' + @country + ',' LIKE '%,' + C.Name + ',%';
Basically, by specifying multiple values in a single column, you are violating first normal form (1NF). The following therefore might not be a good approach, but it provides the solution that you are looking for:
declare @country varchar(50) = 'IN,SA'
declare @counterend int
declare @counterstart int = 1
declare @singleCountry varchar(10)

set @counterend = (select COUNT(*) from fnSplitStringList(@country))

create table #temp10(
    id   int,
    name varchar(50))

while @counterstart <= @counterend
begin
    ;with cte as (
        select stringliteral country,
               ROW_NUMBER() over (order by stringliteral) countryseq
        from fnSplitStringList(@country))
    select @singleCountry = (select country from cte where countryseq = @counterstart)

    insert into #temp10(id, name)
    select * from tbl_country t1
    where not exists (select id from #temp10 t2 where t1.id = t2.id)
      and name like '%' + @singleCountry + '%'

    set @counterstart = @counterstart + 1
end

select * from #temp10
begin drop table #temp10 end
How it works: it splits the passed string and ranks the pieces. Afterwards, it loops through the records once for every single value (country) produced and inserts the matches into the temp table.
Try this:
select a.id
from tbl_Country a
inner join (SELECT country FROM dbo.Function(@Country)) b on a.name = b.country
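For completeness, here is a rough sketch of the same split-both-sides idea using SQL Server 2016+'s built-in STRING_SPLIT in place of the splitting function from the question (purely illustrative):
-- Hypothetical: return the IDs whose comma-separated Name column shares at
-- least one value with the @country parameter list.
declare @country varchar(50) = 'IN,SA';

select distinct c.ID
from tbl_Country c
cross apply string_split(c.Name, ',') n
where n.value in (select value from string_split(@country, ','));
-- with the sample data above this returns IDs 1 and 2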

Finding strings with duplicate letters inside

Can somebody help me with this little task? What I need is a stored procedure that can find duplicate letters (in a row) in a string from a table "a" and after that make a new table "b" with just the id of the string that has a duplicate letter.
Something like this:
Table A
ID Name
1 Matt
2 Daave
3 Toom
4 Mike
5 Eddie
And from that table I can see that Daave, Toom, Eddie have duplicate letters in a row and I would like to make a new table and list their ID's only. Something like:
Table B
ID
2
3
5
Only 2,3,5 because that is the ID of the string that has duplicate letters in their names.
I hope this is understandable and would be very grateful for any help.
In your answer with the stored procedure, you have two mistakes: one is a missing space between the column name and the LIKE clause, the second is missing single quotes around the search parameter.
First, I create a user-defined scalar function which returns 1 if the string contains duplicate letters:
CREATE FUNCTION FindDuplicateLetters
(
    @String NVARCHAR(50)
)
RETURNS BIT
AS
BEGIN
    DECLARE @Result BIT = 0
    DECLARE @Counter INT = 1

    WHILE (@Counter <= LEN(@String) - 1)
    BEGIN
        IF (ASCII((SELECT SUBSTRING(@String, @Counter, 1))) = ASCII((SELECT SUBSTRING(@String, @Counter + 1, 1))))
        BEGIN
            SET @Result = 1
            BREAK
        END
        SET @Counter = @Counter + 1
    END

    RETURN @Result
END
GO
After the function is created, just call it from a simple SELECT query like the following:
SELECT *
FROM
(
    SELECT *,
           dbo.FindDuplicateLetters(ColumnName) AS Duplicates
    FROM TableName
) AS a
WHERE a.Duplicates = 1
With this combination, you will get just the rows that have duplicate letters.
In any version of SQL, you can do this with a brute force approach:
select *
from t
where t.name like '%aa%' or
t.name like '%bb%' or
. . .
t.name like '%zz%'
If you have a case sensitive collation, then use:
where lower(t.name) like '%aa%' or
. . .
Here's one way.
First create a table of numbers
CREATE TABLE dbo.Numbers
(
number INT PRIMARY KEY
);
INSERT INTO dbo.Numbers
SELECT number
FROM master..spt_values
WHERE type = 'P'
AND number > 0;
Then with that in place you can use
SELECT *
FROM TableA
WHERE EXISTS (SELECT *
FROM dbo.Numbers
WHERE number < LEN(Name)
AND SUBSTRING(Name, number, 1) = SUBSTRING(Name, number + 1, 1))
Though this is an old post, it's worth adding a solution that will be faster than a brute force approach or one that uses a scalar UDF (which generally drags down performance). Using NGrams8K this is rather simple.
-- sample data
declare @table table (id int identity primary key, [name] varchar(20));
insert @table([name]) values ('Mattaa'),('Daave'),('Toom'),('Mike'),('Eddie');

-- solution #1
select id
from @table
cross apply dbo.NGrams8k([name], 1)
where charindex(replicate(token, 2), [name]) > 0
group by id;
-- solution #2 (SQL 2012+ solution using LAG)
select id
from
(
    select id, token, prevToken = lag(token, 1) over (partition by id order by position)
    from @table
    cross apply dbo.NGrams8k([name], 1)
) prep
where token = prevToken
group by id; -- optional, if you want to remove possible duplicates
Another brute force way:
select *
from t
where t.name ~ '(.)\1';

SQL query to match keywords?

I have a table with an nvarchar(max) column containing text extracted from Word documents. How can I create a select query to which I pass a list of keywords as a parameter, and which returns the rows ordered by the number of matches?
Maybe it is possible with full text search?
Yes, possible with full text search, and likely the best answer. For a straight T-SQL solution, you could use a split function and join, e.g. assuming a table of numbers called dbo.Numbers (you may need to decide on a different upper limit):
SET NOCOUNT ON;
DECLARE @UpperLimit INT;
SET @UpperLimit = 200000;

WITH n AS
(
    SELECT
        rn = ROW_NUMBER() OVER
             (ORDER BY s1.[object_id])
    FROM sys.objects AS s1
    CROSS JOIN sys.objects AS s2
    CROSS JOIN sys.objects AS s3
)
SELECT [Number] = rn - 1
INTO dbo.Numbers
FROM n
WHERE rn <= @UpperLimit + 1;

CREATE UNIQUE CLUSTERED INDEX n ON dbo.Numbers([Number]);
And a splitting function that uses that table of numbers:
CREATE FUNCTION dbo.SplitStrings
(
    @List NVARCHAR(MAX)
)
RETURNS TABLE
AS
RETURN
(
    SELECT DISTINCT
        [Value] = LTRIM(RTRIM(
            SUBSTRING(@List, [Number],
                CHARINDEX(N',', @List + N',', [Number]) - [Number])))
    FROM
        dbo.Numbers
    WHERE
        Number <= LEN(@List)
        AND SUBSTRING(N',' + @List, [Number], 1) = N','
);
GO
Then you can simply say:
SELECT key, NvarcharColumn /*, other cols */
FROM dbo.table AS outerT
WHERE EXISTS
(
    SELECT 1
    FROM dbo.table AS t
    INNER JOIN dbo.SplitStrings(N'list,of,words') AS s
        ON t.NvarcharColumn LIKE '%' + s.[Value] + '%'
    WHERE t.key = outerT.key
);
As a procedure:
CREATE PROCEDURE dbo.Search
    @List NVARCHAR(MAX)
AS
BEGIN
    SET NOCOUNT ON;

    SELECT key, NvarcharColumn /*, other cols */
    FROM dbo.table AS outerT
    WHERE EXISTS
    (
        SELECT 1
        FROM dbo.table AS t
        INNER JOIN dbo.SplitStrings(@List) AS s
            ON t.NvarcharColumn LIKE '%' + s.[Value] + '%'
        WHERE t.key = outerT.key
    );
END
GO
Then you can just pass in @List (e.g. EXEC dbo.Search @List = N'foo,bar,splunge') from C#.
This won't be super fast, but I'm sure it will be quicker than pulling all the data out into C# and looping over it in a double-nested loop manually.
how to ... return the rows ordered by the number of [full-text] matches
I have not used it myself, but I believe SQL Server 2008 supports weighting/ranking the CONTAINSTABLE matches, which might be of help to you:
http://msdn.microsoft.com/en-us/library/ms189760.aspx
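As a rough illustration of what that could look like (the table and column names here are made up, and a full-text index on the column is assumed):
-- Hypothetical: rank documents by full-text relevance for a set of keywords
-- using the KEY and RANK columns that CONTAINSTABLE returns.
SELECT d.id, d.SearchText, ct.[RANK]
FROM dbo.Documents AS d
INNER JOIN CONTAINSTABLE(dbo.Documents, SearchText, 'word1 OR word2 OR word3') AS ct
        ON ct.[KEY] = d.id
ORDER BY ct.[RANK] DESC;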
If you don't have an engine that returns results weighted by the number of hits ...
You could write a UDF that takes two inputs and returns an integer: the big text value is the first input, and the words you're looking for, as a comma-delimited string, are the second. The function returns an integer representing either the number of distinct looked-for words that were found at least once in the text, or the total number of times the looked-for words were found. Implementation (how to weight) is up to you. Maybe, for example, you'd want to arrange the looked-for words in most-important to least-important order, and give an important word hit more weight than a less important word hit.
You could then use your full text search engine to find all records that contain at least one of the words (you'd OR them), and you'd run this result set through your UDF scalar function:
pseudo code
select title, weightfunction(summary, 'word1,word2,word3....wordN')
from docs
where summary contains ( word1 or word2 or word3 ... or wordN)
order by weightfunction(summary, 'word1,word2,word3....wordN') desc
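A minimal sketch of such a weighting UDF, assuming the simplest scheme (every hit weighted equally, counting every occurrence of every comma-delimited word); the names and the use of STRING_SPLIT are illustrative, not from the original post:
-- Hypothetical scalar UDF: total number of times any of the comma-delimited
-- words in @words appears in @text, with every hit weighted equally.
CREATE FUNCTION dbo.WeightFunction
(
    @text  NVARCHAR(MAX),
    @words NVARCHAR(MAX)   -- e.g. N'word1,word2,word3'
)
RETURNS INT
AS
BEGIN
    DECLARE @total INT;

    -- occurrences of a word = (length of text - length with the word removed) / word length
    SELECT @total = SUM(
               (LEN(@text) - LEN(REPLACE(@text, LTRIM(RTRIM(w.[value])), '')))
               / NULLIF(LEN(LTRIM(RTRIM(w.[value]))), 0)
           )
    FROM STRING_SPLIT(@words, N',') AS w;   -- SQL Server 2016+; swap in your own splitter otherwise

    RETURN ISNULL(@total, 0);
END
You would then plug it in wherever the pseudo code above calls weightfunction.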