Firebird how to select ids that match all items in a set - sql

I'm using Firebird 2.1.
There is a table: IDs, Labels
There can be multiple labels for the same ID:
10 Peach
10 Pear
10 Apple
11 Apple
12 Pear
13 Peach
13 Apple
Let's say I have a set of labels, e.g. (Apple, Pear, Peach).
How can I write a single select that returns all IDs associated with every label in a given set? Preferably I'd like to specify the set as a comma-separated string, like ('Apple', 'Pear', 'Peach') -> this should return ID = 10.
Thanks!

As asked, I'm posting my simpler version of pilcrow's answer. I have tested this on my Firebird, which is version 2.5, but the OP (Steve) has tested it on 2.1 and it works there as well.
SELECT id
FROM table
WHERE label IN ('Apple', 'Pear', 'Peach')
GROUP BY id
HAVING COUNT(DISTINCT label)=3
This solution has the same disadvantage as pilcrow's: you need to know how many values you are looking for, because the HAVING = condition must match the number of values in the WHERE IN condition. In this respect, Ed's answer is more flexible, as it splits the concatenated value string parameter and counts the values. So you only have to change the one parameter, instead of the two conditions that pilcrow and I use.
OTOH, if efficiency is a concern, I would rather think (but I am absolutely not sure) that Ed's CTE approach might be less optimizable by the Firebird engine than the one I suggest. Firebird is very good at optimizing queries, but I don't really know whether it can do so when you use a CTE this way. The WHERE + GROUP BY + HAVING version, however, should be optimizable simply by having an index on (id, label).
In conclusion, if execution times are of concern in your case, then you probably need some explain plans to see what is happening, whichever solution you choose ;)
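The drawback mentioned above (the count in HAVING must agree with the list in WHERE IN) goes away if the calling code builds both from the same list. The following is a minimal sketch of that idea, not a Firebird-specific implementation: it uses Python's built-in sqlite3 module as a stand-in database, and the table name `t` and the helper `ids_with_all_labels` are assumptions made for illustration.

```python
import sqlite3

def ids_with_all_labels(conn, labels):
    """Return the ids that carry every label in `labels`."""
    wanted = sorted(set(labels))          # duplicates would break the count
    if not wanted:
        return []
    placeholders = ", ".join("?" for _ in wanted)
    sql = (
        "SELECT id FROM t "
        f"WHERE label IN ({placeholders}) "
        "GROUP BY id "
        "HAVING COUNT(DISTINCT label) = ? "
        "ORDER BY id"
    )
    # The same list supplies both the IN values and the expected count.
    return [row[0] for row in conn.execute(sql, (*wanted, len(wanted)))]

conn = sqlite3.connect(":memory:")        # stand-in for the real database
conn.execute("CREATE TABLE t (id INTEGER, label TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(10, "Peach"), (10, "Pear"), (10, "Apple"), (11, "Apple"),
     (12, "Pear"), (13, "Peach"), (13, "Apple")],
)

result = ids_with_all_labels(conn, "Apple,Pear,Peach".split(","))
print(result)
```

With the sample data from the question, only ID 10 carries all three labels.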

It's easiest to split the string in application code and then run the query:
SQL> select ID
CON> from (select ID, count(DISTINCT LABEL) as N_LABELS
CON> from T
CON> where LABEL in ('Apple', 'Pear', 'Peach')
CON> group by 1) D
CON> where D.N_LABELS >= 3; -- We know a priori we have 3 LABELs
ID
============
10

If it is acceptable to create a helper stored procedure that is called from the primary select, then consider the following.
The helper stored procedure takes a delimited string along with the delimiter and returns one row for each delimited value:
CREATE OR ALTER PROCEDURE SPLIT_BY_DELIMITER (
WHOLESTRING VARCHAR(10000),
SEPARATOR VARCHAR(10))
RETURNS (
ROWID INTEGER,
DATA VARCHAR(10000))
AS
DECLARE VARIABLE I INTEGER;
BEGIN
I = 1;
WHILE (POSITION(:SEPARATOR IN WHOLESTRING) > 0) DO
BEGIN
ROWID = I;
DATA = TRIM(SUBSTRING(WHOLESTRING FROM 1 FOR POSITION(TRIM(SEPARATOR) IN WHOLESTRING) - 1));
SUSPEND;
I = I + 1;
WHOLESTRING = TRIM(SUBSTRING(WHOLESTRING FROM POSITION(TRIM(SEPARATOR) IN WHOLESTRING) + 1));
END
IF (CHAR_LENGTH(WHOLESTRING) > 0) THEN
BEGIN
ROWID = I;
DATA = WHOLESTRING;
SUSPEND;
END
END
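To make the procedure's loop easier to follow, here is a rough Python sketch of the same splitting logic: take everything before the first separator, emit it as a numbered row, trim, and repeat, then emit whatever is left. The function name `split_by_delimiter` is chosen here only to mirror the procedure; it is an illustration, not part of the answer's code.

```python
def split_by_delimiter(whole, sep):
    """Return (rowid, data) pairs, mirroring the stored procedure's loop."""
    rows = []
    rowid = 1
    while sep in whole:
        # Everything before the first separator becomes the next row.
        head, whole = whole.split(sep, 1)
        rows.append((rowid, head.strip()))
        rowid += 1
        whole = whole.strip()
    if whole:                       # trailing piece after the last separator
        rows.append((rowid, whole))
    return rows

pieces = split_by_delimiter("Apple, Peach,Pear", ",")
print(pieces)
```

As in the procedure, a trailing separator does not produce an empty final row.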
Below is the code that calls it. I am using EXECUTE BLOCK to demonstrate passing in the delimited string:
EXECUTE BLOCK
RETURNS (
LABEL_ID INTEGER)
AS
DECLARE VARIABLE PARAMETERS VARCHAR(50);
BEGIN
PARAMETERS = 'Apple,Peach,Pear';
FOR WITH CTE
AS (SELECT ROWID,
DATA
FROM SPLIT_BY_DELIMITER(:PARAMETERS, ','))
SELECT ID
FROM TABLE1
WHERE LABELS IN (SELECT DATA
FROM CTE)
GROUP BY ID
HAVING COUNT(*) = (SELECT COUNT(*)
FROM CTE)
INTO :LABEL_ID
DO
SUSPEND;
END

Related

Checking if field contains multiple string in sql server

I am working on a SQL database that will supply data to a grid. The grid will support filtering, sorting and paging, but there is also a strict requirement that users can enter free text into a text input above the grid, for example
'Engine 1001 Requi', and that the result will contain only rows in which some columns together contain all the pieces of the text. So one column may contain Engine, another column may contain 1001 and yet another may contain Requi.
I created a technical column (let's call it myTechnicalColumn) in the table (let's call it myTable), which is updated each time someone inserts or updates a row and contains the values of all the columns combined and separated by spaces.
Now, to use it with Entity Framework, I decided to use a table-valued function that accepts one parameter, @searchText, and handles it like this:
CREATE FUNCTION myFunctionName(@searchText NVARCHAR(MAX))
RETURNS @Result TABLE
( ... here come columns )
AS
BEGIN
DECLARE @searchToken TokenType
INSERT INTO @searchToken(token) SELECT value FROM STRING_SPLIT(@searchText,' ')
DECLARE @searchTextLength INT
SET @searchTextLength = (SELECT COUNT(*) FROM @searchToken)
INSERT INTO @Result
SELECT
... here come columns
FROM myTable
WHERE (SELECT COUNT(*) FROM @searchToken WHERE CHARINDEX(token, myTechnicalColumn) > 0) = @searchTextLength
RETURN;
END
Of course the solution works fine but it's kinda slow. Any hints how to improve its efficiency?
You can use an inline Table Valued Function, which should be quite a lot faster.
This would be a direct translation of your current code
CREATE FUNCTION myFunctionName(@searchText NVARCHAR(MAX))
RETURNS TABLE
AS RETURN
(
WITH searchText AS (
SELECT s.value AS token
FROM STRING_SPLIT(@searchText, ' ') s
)
SELECT
... here come columns
FROM myTable t
WHERE (
SELECT COUNT(*)
FROM searchText s
WHERE CHARINDEX(s.token, t.myTechnicalColumn) > 0
) = (SELECT COUNT(*) FROM searchText)
);
GO
GO
You are using a form of query called Relational Division Without Remainder and there are other ways to cut this cake:
CREATE FUNCTION myFunctionName(@searchText NVARCHAR(MAX))
RETURNS TABLE
AS RETURN
(
WITH searchText AS (
SELECT s.value AS token
FROM STRING_SPLIT(@searchText, ' ') s
)
SELECT
... here come columns
FROM myTable t
WHERE NOT EXISTS (
SELECT 1
FROM searchText s
WHERE CHARINDEX(s.token, t.myTechnicalColumn) = 0
)
);
GO
GO
This may be faster or slower depending on a number of factors, so you need to test.
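The two formulations of relational division above (count the matching tokens and compare with the total, versus require that no token fails to match) select the same rows. A small Python sketch of that equivalence, using made-up sample rows and hypothetical helper names, and assuming empty tokens have already been discarded:

```python
def matches_by_count(text, tokens):
    # Count form: the number of tokens found equals the number of tokens.
    return sum(1 for tok in tokens if tok in text) == len(tokens)

def matches_by_not_exists(text, tokens):
    # NOT EXISTS form: there is no token that is missing from the text.
    return not any(tok not in text for tok in tokens)

# Made-up rows standing in for myTechnicalColumn values.
rows = [
    "Engine 1001 Required",
    "Engine 2002 Required",
    "Pump 1001 Requisition",
]
tokens = "Engine 1001 Requi".split()   # split() also drops empty tokens

by_count = [r for r in rows if matches_by_count(r, tokens)]
by_not_exists = [r for r in rows if matches_by_not_exists(r, tokens)]
print(by_count)
```

Both filters keep only the first row. The NOT EXISTS form can stop at the first missing token, which is one reason it may behave differently in a query plan.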
Since there is no data to test with, I am not sure whether the following will solve your issue:
-- Replace the last INSERT portion
INSERT INTO @Result
SELECT
... here come columns
FROM myTable T
JOIN @searchToken S ON CHARINDEX(S.token, T.myTechnicalColumn) > 0

How does one automatically insert the results of several function calls into a table?

Wasn't sure how to title the question but hopefully this makes sense :)
I have a table (OldTable) with an index and a column of comma separated lists. I'm trying to split the strings in the list column and create a new table with the indexes coupled with each of the sub strings of the string it was connected to in the old table.
Example:
OldTable
index | list
1 | 'a,b,c'
2 | 'd,e,f'
NewTable
index | letter
1 | 'a'
1 | 'b'
1 | 'c'
2 | 'd'
2 | 'e'
2 | 'f'
I have created a function that will split the string and return each sub string as a record in a 1 column table as so:
SELECT * FROM Split('a,b,c', ',', 1)
Which will result in:
Result
index | string
1 | 'a'
1 | 'b'
1 | 'c'
I was hoping that I could use this function as so:
SELECT * FROM Split((SELECT * FROM OldTable), ',')
And then use the id and string columns from OldTable in my function (by rewriting it slightly) to create NewTable. But as far as I understand, sending tables into the function doesn't work, as I get: "Subquery returned more than 1 value. ... not permitted ... when the subquery is used as an expression."
One solution I was thinking of would be to run the function, as is, on all the rows of OldTable and insert the result of each call into NewTable. But I'm not sure how to iterate over each row without a function, and I can't send tables into a function to iterate, so I'm back at square one.
I could do it manually but OldTable contains a few records (1000 or so) so it seems like automation would be preferable.
Is there a way to either:
Iterate over OldTable row by row, run the row through Split(), add the result to NewTable for all rows in OldTable. Either by a function or through regular sql-transactions
Re-write Split() to take a table variable after all
Get rid of the function altogether and just do it in sql transactions?
I'd prefer to not use procedures (don't know if there is a solutions with them either) mostly because I don't want the functionality inside of the DB to be exposed to the outside. If, however that is the "best"/only way to go I'll have to consider it. I'm quite (read very) new to SQL so it might be a needless worry.
Here is my Split() function if it is needed:
CREATE FUNCTION Split (
@string nvarchar(4000),
@delimitor nvarchar(10),
@index int = 0
)
RETURNS @splitTable TABLE (id int, string nvarchar(4000) NOT NULL) AS
BEGIN
DECLARE @startOfSubString smallint;
DECLARE @endOfSubString smallint;
SET @startOfSubString = 1;
SET @endOfSubString = CHARINDEX(@delimitor, @string, @startOfSubString);
IF (@endOfSubString <> 0)
WHILE @endOfSubString > 0
BEGIN
INSERT INTO @splitTable
SELECT @index, SUBSTRING(@string, @startOfSubString, @endOfSubString - @startOfSubString);
SET @startOfSubString = @endOfSubString+1;
SET @endOfSubString = CHARINDEX(@delimitor, @string, @startOfSubString);
END;
INSERT INTO @splitTable
SELECT @index, SUBSTRING(@string, @startOfSubString, LEN(@string)-@startOfSubString+1);
RETURN;
END
Hope my problem and attempt was explained and possible to understand.
You are looking for cross apply:
SELECT t.[index], s.string
FROM OldTable t CROSS APPLY
dbo.Split(t.list, ',', t.[index]) s;
Inserting in the new table just requires an insert or select into clause.
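What CROSS APPLY does here is run the split once per row of OldTable and pair each resulting piece with that row's index. The same shape can be sketched in Python as a nested loop over rows and pieces; the sample data is the one from the question, and `split_rows` is a name made up for this illustration.

```python
def split_rows(rows, delimiter=","):
    """Pair every piece of each row's list with that row's index."""
    return [
        (index, piece)
        for index, text in rows            # outer loop: one OldTable row
        for piece in text.split(delimiter)  # inner loop: that row's pieces
    ]

old_table = [(1, "a,b,c"), (2, "d,e,f")]
new_table = split_rows(old_table)
print(new_table)
```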

SQL select multiple rows of data then compare

What would be the best approach in SQL Server 2008 to select something that can return up to 10 rows of data, and then compare that data with a specific value in one of its columns?
So something like this:
SELECT bType FROM WORK_STATION WHERE nFileId = 123456789
This could return between 1 and 10 values (it will return at least one). Then I want to compare the data we just selected to a specific value, something like:
if bType = 1
--DO something
What is the best approach of doing something like this?
declare @table as table(btype int)
declare @btype int
insert into @table
SELECT bType FROM WORK_STATION WHERE nFileId = 123456789
while(exists(select top 1 'x' from @table)) --as long as @table contains records continue
begin
select top 1 @btype = btype from @table
if(@btype = 10)
print 'something'
delete top (1) from @table --remove the previously processed row. also ensures no infinite loop
end
I think you can use a stored procedure to declare variables and then compare them with the result set. If you know that you have only 10 values, you can use a temp table and insert the 10 values.
I hope this is helpful.

SQL Inline or Scalar Function?

So I need an SQL function that will concatenate a bunch of row values into one varchar.
I have the functions written but right now I'm focused on what is the better choice for performance.
The Scalar Function is
CREATE FUNCTION fn_GetPatients_ByRecipient (@recipient int)
RETURNS varchar(max)
AS
BEGIN
DECLARE @patients varchar(max)
SET @patients = ''
SELECT @patients = @patients + convert(varchar, Patient) + ';' FROM RecipientsPatients WHERE Recipient = @recipient
RETURN @patients
END
The Inline Function just returns a table of all the values instead of concatenating them.
CREATE FUNCTION fn_GetPatients_ByRecipient (@recipient int)
RETURNS TABLE
AS
RETURN
(
SELECT Patient FROM RecipientsPatients WHERE Recipient = @recipient
)
I would then take this table in a separate function and concatenate them together. I was thinking the second choice is best since I will be going row by row through a smaller data set. Any opinions on what I'm doing right/wrong would be appreciated.
Thanks
This problem of string concatenation in SQL Server has several solutions, and the pros and cons are discussed in Concatenating Row Values in Transact-SQL and other similar articles on the web.
My favourite solution is using the FOR XML PATH('') trick. The chain assignment method you use works fine, although it is not officially supported and hence may break in the future. Your method should be among the fastest possible, if not the fastest, as long as the table-valued function does not perform a full scan, i.e. you have an index on Recipient that covers Patient (use INCLUDE).
The only thing I would add is to declare both functions WITH SCHEMABINDING, this has side effects that improve performance.
See here for an example of using the FOR XML PATH trick
set nocount on;
declare @t table (id int, name varchar(20), x char(1))
insert into @t (id, name, x)
select 1,'test1', 'a' union
select 1,'test1', 'b' union
select 1,'test1', 'c' union
select 2,'test2', 'a' union
select 2,'test2', 'c' union
select 3,'test3', 'b' union
select 3,'test3', 'c'
SELECT p1.id, p1.name,
stuff((SELECT ', ' + x
FROM @t p2
WHERE p2.id = p1.id
ORDER BY name, x
FOR XML PATH('') ), 1,2, '') AS p3
FROM @t p1
GROUP BY
id, name
it returns
1 test1 a, b, c
2 test2 a, c
3 test3 b, c
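For readers unfamiliar with the FOR XML PATH idiom, the query above amounts to "group the rows by (id, name) and join each group's x values with a separator". A rough Python sketch of that grouping, using the same sample rows (the helper name `concat_by_group` is made up for illustration):

```python
from collections import defaultdict

def concat_by_group(rows, sep=", "):
    """Group rows by (id, name) and join each group's x values."""
    groups = defaultdict(list)
    for id_, name, x in rows:
        groups[(id_, name)].append(x)
    return [
        (id_, name, sep.join(sorted(values)))   # ORDER BY x within a group
        for (id_, name), values in sorted(groups.items())
    ]

rows = [
    (1, "test1", "a"), (1, "test1", "b"), (1, "test1", "c"),
    (2, "test2", "a"), (2, "test2", "c"),
    (3, "test3", "b"), (3, "test3", "c"),
]
grouped = concat_by_group(rows)
for row in grouped:
    print(*row)
```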
Have a look at Adam Machanic's results from his Grouped String Concatenation Contest:
http://web.archive.org/web/20150328021904/http://sqlblog.com/blogs/adam_machanic/archive/2009/05/31/grouped-string-concatenation-the-winner-is.aspx
It has the code to show you the most efficient way to do this. Peter Larsson, who won the contest, used a combination of tricks including XML PATH to accomplish the task. There was some debate later about whether it was the most efficient solution based on subsequent tests of other submissions. Make sure you check the comments to know what scripts to look at in the zip file you can download there. Generally FOR XML PATH('') is the fastest though.

Find the last value in a "rolled-over" sequence with a stored procedure?

Suppose I had a set of alpha-character identifiers of a set length, e.g. always five letters, and they are assigned in such a way that they are always incremented sequentially (GGGGZ --> GGGHA, etc.). Now, if I get to ZZZZZ, since the length is fixed, I must "roll over" to AAAAA. I might have a contiguous block from ZZZAA through AAAAM. I want to write a sproc that will give me the "next" identifier, in this case AAAAN.
If I didn't have this "rolling over" issue, of course, I'd just ORDER BY DESC and grab the top result. But I'm at a bit of a loss now -- and it doesn't help at all that SQL is not my strongest language.
If I have to I can move this to my C# calling code, but a sproc would be a better fit.
ETA: I would like to avoid changing the schema (new column or new table); I'd rather just be able to "figure it out". I might even prefer to do it brute force (e.g. start at the lowest value and increment until I find a "hole"), even though that could get expensive. If you have an answer that does not modify the schema, it'd be a better solution for my needs.
Here's code that I think will give you your Next value. I created 3 functions. The table is just my simulation of the table.column with your alpha ids (I used MyTable.AlphaID). I assume that it's as you implied and there is one contiguous block of five-character uppercase alphabetic strings (AlphaID):
IF OBJECT_ID('dbo.MyTable','U') IS NOT NULL
DROP TABLE dbo.MyTable
GO
CREATE TABLE dbo.MyTable (AlphaID char(5) PRIMARY KEY)
GO
-- Play with different population scenarios for testing
INSERT dbo.MyTable VALUES ('ZZZZY')
INSERT dbo.MyTable VALUES ('ZZZZZ')
INSERT dbo.MyTable VALUES ('AAAAA')
INSERT dbo.MyTable VALUES ('AAAAB')
GO
IF OBJECT_ID('dbo.ConvertAlphaIDToInt','FN') IS NOT NULL
DROP FUNCTION dbo.ConvertAlphaIDToInt
GO
CREATE FUNCTION dbo.ConvertAlphaIDToInt (@AlphaID char(5))
RETURNS int
AS
BEGIN
RETURN 1+ ASCII(SUBSTRING(@AlphaID,5,1))-65
+ ((ASCII(SUBSTRING(@AlphaID,4,1))-65) * 26)
+ ((ASCII(SUBSTRING(@AlphaID,3,1))-65) * POWER(26,2))
+ ((ASCII(SUBSTRING(@AlphaID,2,1))-65) * POWER(26,3))
+ ((ASCII(SUBSTRING(@AlphaID,1,1))-65) * POWER(26,4))
END
GO
IF OBJECT_ID('dbo.ConvertIntToAlphaID','FN') IS NOT NULL
DROP FUNCTION dbo.ConvertIntToAlphaID
GO
CREATE FUNCTION dbo.ConvertIntToAlphaID (@ID int)
RETURNS char(5)
AS
BEGIN
RETURN CHAR((@ID-1) / POWER(26,4) + 65)
+ CHAR ((@ID-1) % POWER(26,4) / POWER(26,3) + 65)
+ CHAR ((@ID-1) % POWER(26,3) / POWER(26,2) + 65)
+ CHAR ((@ID-1) % POWER(26,2) / 26 + 65)
+ CHAR ((@ID-1) % 26 + 65)
END
GO
IF OBJECT_ID('dbo.GetNextAlphaID','FN') IS NOT NULL
DROP FUNCTION dbo.GetNextAlphaID
GO
CREATE FUNCTION dbo.GetNextAlphaID ()
RETURNS char(5)
AS
BEGIN
DECLARE @MaxID char(5), @ReturnVal char(5)
SELECT @MaxID = MAX(AlphaID) FROM dbo.MyTable
IF @MaxID < 'ZZZZZ'
RETURN dbo.ConvertIntToAlphaID(dbo.ConvertAlphaIDToInt(@MaxID)+1)
IF @MaxID IS NULL
RETURN 'AAAAA'
SELECT @MaxID = MAX(AlphaID)
FROM dbo.MyTable
WHERE AlphaID < dbo.ConvertIntToAlphaID((SELECT COUNT(*) FROM dbo.MyTable))
IF @MaxID IS NULL
RETURN 'AAAAA'
RETURN dbo.ConvertIntToAlphaID(dbo.ConvertAlphaIDToInt(@MaxID)+1)
END
GO
SELECT * FROM dbo.MyTable ORDER BY dbo.ConvertAlphaIDToInt(AlphaID)
GO
SELECT dbo.GetNextAlphaID () AS 'NextAlphaID'
By the way, if you don't want to assume contiguity, you can do as you suggested and (if there's a 'ZZZZZ' row) use the first gap in the sequence. Replace the last function with this:
IF OBJECT_ID('dbo.GetNextAlphaID_2','FN') IS NOT NULL
DROP FUNCTION dbo.GetNextAlphaID_2
GO
CREATE FUNCTION dbo.GetNextAlphaID_2 ()
RETURNS char(5)
AS
BEGIN
DECLARE @MaxID char(5), @ReturnVal char(5)
SELECT @MaxID = MAX(AlphaID) FROM dbo.MyTable
IF @MaxID < 'ZZZZZ'
RETURN dbo.ConvertIntToAlphaID(dbo.ConvertAlphaIDToInt(@MaxID)+1)
IF @MaxID IS NULL
RETURN 'AAAAA'
SELECT TOP 1 @MaxID=M1.AlphaID
FROM dbo.MyTable M1
WHERE NOT EXISTS (SELECT 1 FROM dbo.MyTable M2
WHERE AlphaID = dbo.ConvertIntToAlphaID(dbo.ConvertAlphaIDToInt(M1.AlphaID) + 1 )
)
ORDER BY M1.AlphaID
IF @MaxID IS NULL
RETURN 'AAAAA'
RETURN dbo.ConvertIntToAlphaID(dbo.ConvertAlphaIDToInt(@MaxID)+1)
END
GO
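The two conversion functions above treat an AlphaID as a five-digit base-26 number (A = 0 ... Z = 25) shifted by one, so that AAAAA maps to 1. A minimal Python sketch of the same arithmetic may make that easier to check. The names `alpha_to_int`, `int_to_alpha` and `next_alpha` are made up for this sketch, and the explicit wrap from ZZZZZ back to AAAAA in `next_alpha` is an addition here; the T-SQL above handles that case separately in GetNextAlphaID.

```python
def alpha_to_int(alpha_id):
    """AAAAA -> 1, AAAAB -> 2, ..., ZZZZZ -> 26**5 (base 26, one-based)."""
    n = 0
    for ch in alpha_id:
        n = n * 26 + (ord(ch) - ord("A"))
    return n + 1

def int_to_alpha(n, width=5):
    """Inverse of alpha_to_int for 1 <= n <= 26**width."""
    n -= 1
    chars = []
    for _ in range(width):
        n, remainder = divmod(n, 26)     # peel off the lowest letter first
        chars.append(chr(ord("A") + remainder))
    return "".join(reversed(chars))

def next_alpha(alpha_id):
    """Successor of alpha_id, rolling over from all Z to all A."""
    total = 26 ** len(alpha_id)
    return int_to_alpha(alpha_to_int(alpha_id) % total + 1, len(alpha_id))

print(next_alpha("GGGGZ"), next_alpha("ZZZZZ"), next_alpha("AAAAM"))
```

The examples from the question behave as expected: GGGGZ is followed by GGGHA, ZZZZZ rolls over to AAAAA, and AAAAM is followed by AAAAN.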
You'd have to store the last allocated identifier in the sequence.
For example, store it in another table that has one column & one row.
CREATE TABLE CurrentMaxId (
Id CHAR(6) NOT NULL
);
INSERT INTO CurrentMaxId (Id) VALUES ('AAAAAA');
Each time you allocate a new identifier, you'd fetch the value in that tiny table, increment it, and store that value in your main table as well as updating the value in CurrentMaxId.
The usual caveats apply with respect to concurrency, table-locking, etc.
I think I'd have tried to store the sequence as an integer, then translate it to string. Or else store a parallel integer column that is incremented at the same time as the alpha value. Either way, you could sort on the integer column.
A problem here is that you can't really tell from the data where the "last" entry is unless there is more detail as to how the old entries are deleted.
If I understand correctly, you are wrapping around at the end of the sequence, which means you must be deleting some of your old data to make space. However if the data isn't deleted in a perfectly uniform manner, you'll end up with fragments, like below:
ABCD HIJKL NOPQRS WXYZ
You'll notice that there is no obvious next value...D could be the last value created, but it might also be L or S.
At best you could look for the first or last missing element (use a stored procedure to perform a x+1 check just like you would to find a missing element in an integer sequence), but it's not going to provide any special result for rolled-over lists.
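The ambiguity described above can be shown with a short sketch: an ID is a candidate "next" value whenever its predecessor exists and it does not. For one contiguous block, even one that wraps around, there is exactly one candidate; for fragmented data there are several. This is an illustrative Python sketch with made-up function names, not a replacement for the stored procedure.

```python
def successor(alpha_id):
    """Next ID in sequence, rolling over from all Z to all A."""
    chars = list(alpha_id)
    for i in reversed(range(len(chars))):
        if chars[i] != "Z":
            chars[i] = chr(ord(chars[i]) + 1)
            return "".join(chars)
        chars[i] = "A"                 # carry into the next position
    return "".join(chars)              # every letter was Z: wrapped around

def candidate_next_ids(existing):
    """IDs that directly follow an existing ID but are not present."""
    present = set(existing)
    return sorted(
        successor(x) for x in present if successor(x) not in present
    )

contiguous = ["ZZZZY", "ZZZZZ", "AAAAA", "AAAAB"]   # one wrapped block
fragmented = ["AAAAA", "AAAAB", "AAAAD"]            # a hole at AAAAC

print(candidate_next_ids(contiguous))
print(candidate_next_ids(fragmented))
```

The contiguous block yields a single answer, while the fragmented set yields one candidate per fragment, which is why the data alone cannot say which one is the "last" entry.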
Since I don't feel like writing code to increment letters, I'd create a table of all valid IDs (AAAAAA through ZZZZZZ) with an integer from 1 to X for those IDs. Then you can use the following:
SELECT @max_id = MAX(id) FROM Possible_Silly_IDs
SELECT
COALESCE(MAX(PSI2.silly_id), 'AAAAAA')
FROM
My_Table T1
INNER JOIN Possible_Silly_IDs PSI1 ON
PSI1.silly_id = T1.silly_id
INNER JOIN Possible_Silly_IDs PSI2 ON
PSI2.id = CASE WHEN PSI1.id = @max_id THEN 1 ELSE PSI1.id + 1 END
LEFT OUTER JOIN My_Table T2 ON
T2.silly_id = PSI2.silly_id
WHERE
T2.silly_id IS NULL
The COALESCE is there in case the table is empty. To be truly robust you should calculate the 'AAAAAA' (SELECT @min_silly_id = silly_id FROM Possible_Silly_IDs WHERE id = 1) in case your "numbering" algorithm changes.
If you really wanted to do things right, you'd redo the database design as has been suggested.
I think the lowest-impact solution for my needs is to add an identity column. The one thing I can guarantee is that the ordering will be such that entries that should "come first" will be added first -- I'll never add one with identifier BBBB, then go back and add BBBA later. If I didn't have that constraint, obviously it wouldn't work, but as it stands, I can just order by the identity column and get the sort I want.
I'll keep thinking about the other suggestions -- maybe if they "click" in my head, they'll look like a better option.
To return the next ID for a given ID (with rollover), use:
SELECT COALESCE
(
(
SELECT TOP 1 id
FROM mytable
WHERE id > @id
ORDER BY
id
),
(
SELECT TOP 1 id
FROM mytable
ORDER BY
id
)
) AS nextid
This query searches for the ID next to the given. If there is no such ID, it returns the first ID.
Here are the results:
WITH mytable AS
(
SELECT 'AAA' AS id
UNION ALL
SELECT 'BBB' AS id
UNION ALL
SELECT 'CCC' AS id
UNION ALL
SELECT 'DDD' AS id
UNION ALL
SELECT 'EEE' AS id
)
SELECT mo.id,
COALESCE
(
(
SELECT TOP 1 id
FROM mytable mi
WHERE mi.id > mo.id
ORDER BY
id
),
(
SELECT TOP 1 id
FROM mytable mi
ORDER BY
id
)
) AS nextid
FROM mytable mo
id nextid
----- ------
AAA BBB
BBB CCC
CCC DDD
DDD EEE
EEE AAA
That is, it returns BBB for AAA, CCC for BBB, etc., and finally AAA for EEE, which is last in the table.
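The COALESCE query is a "smallest ID greater than the given one, otherwise the smallest ID overall" lookup. A short Python sketch of the same lookup over a sorted list, using the standard bisect module (the function name `next_id` is made up here, and returning None for an empty table is an assumption that mirrors the NULL the query would give):

```python
from bisect import bisect_right

def next_id(ids, current):
    """Smallest id greater than `current`, wrapping to the first id."""
    ordered = sorted(ids)
    if not ordered:
        return None                      # empty table: nothing to return
    pos = bisect_right(ordered, current)  # index of first id > current
    return ordered[pos] if pos < len(ordered) else ordered[0]

ids = ["AAA", "BBB", "CCC", "DDD", "EEE"]
pairs = [(i, next_id(ids, i)) for i in ids]
for current, following in pairs:
    print(current, following)
```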