SQL Text Matching Query Tuning

SQL Text Matching Query Tuning - sql

I'm trying to do some free text search matching, and wondering if I can improve this query (using MSSQL 2008):
#FreeText is a table, where each row is a search word
DECLARE #WordCount = (SELECT COUNT(*) from #FreeText)
SELECT p.ID
FROM Product p
OUTER APPLY
(
SELECT COUNT(ID) as MatchCount
FROM Product pm
INNER JOIN #FreeText ft
ON pm.txt like '%'+ft.text+'%'
WHERE pm.ID = p.ID
AND (SELECT TOP 1 [text] FROM #FreeText) IS NOT NULL
)MC
WHERE MatchCount = #WordCount
So I'm wondering if there is any way to avoid the "FROM Product pm" in the outer apply?
I cannot always INNER JOIN #FreeText because sometimes we don't use free text searching.
Any thoughts or tips would be greatly appreciated, also let me know if I can clarify anything. Thanks in advance.
P.S. I do know that MS SQL has a FREETEXT() search, but I unfortunately cannot use that at the moment.

Here's a query without OUTER APPLY, that returns all results when there are no search critera.
DECLARE #FreeText TABLE
(
[text] varchar(200)
)
INSERT INTO #FreeText SELECT 'a'
INSERT INTO #FreeText SELECT 'c'
-- what, null? No.
DELETE FROM #FreeText WHERE [text] is null
DECLARE #WordCount int
SET #WordCount = (SELECT Count(*) FROM #FreeText)
SELECT p.ID
FROM Product p
LEFT JOIN #FreeText ft
ON p.txt like '%' + ft.text + '%'
WHERE ft.text is not null OR #WordCount = 0
GROUP BY p.ID
HAVING COUNT(*) = #WordCount OR #WordCount = 0
Note: it would be my preference to not use the "freetext" query when there is not any freetext criteria - instead use another query (simpler). If you choose to go that route - go back to an INNER JOIN and drop the OR #WordCount = 0 x2.

Related

Query / design performance problem: How to get an Id via a 'link'-table in the most efficient way?

See this part from my ERD:
Part from the design
From a reader, I get a RFID which belongs to a wheel. How can I obtain the corresponding Id from the tblProduct-table. I know how to do it with a number of SELECT-statements, but is this the fastest way? I am asking this because I do not have a lot of experience generating speed-efficient QUERY Statements.
On the moment I did create a function to handle this:
CREATE FUNCTION [dbo].ufnGetProductIdFromRFID
(#StationRFID NVARCHAR(20))
RETURNS
INT
AS
BEGIN
DECLARE #ProductId INT = 0
DECLARE #WheelId INT = 0
SELECT #WheelId = [Id] FROM [dbo].[tblWheel] WHERE RFID = #StationRFID
SELECT #ProductId = [ProductId] FROM [dbo].[tblLinkWheelProduct] WHERE WheelId = #WheelId
RETURN #ProductId
END
GO
So i execute two query's.
My question: will a 'JOIN' or some other contruction lead to a more efficient (less executing time) solution? Since the two SELECT-statements are fairly simple and not time-consuming at all I think.....
Thanks already for thinking along with me!

In this way you can get whatever information from tblProduct table
select * from tblProduct p
where exists (
select 1 from tblLinkWheelProduct lwp
inner join tblWheel w
on w.Id = lwp.WheelId
where p.Id = lwp.ProductId and RFID = ?
)
if your intention is to only get productId then,
select productId from tblLinkWheelProduct lwp
where exists (
select 1 from tblWheel w
where w.id = lwp.WheelId
and w.RFID = ?
)

I would join tblWheel to tblLinkWhelProduct to tblProduct. You would get the data with one query. I think it is the best way:
select p.Id from tblWheel w
inner join tblLinkWheelProduct l on w.Id=t.WheelId
inner join tblProduct p on p.Id=l.ProductId

Conversion failed while using ids "where ID in (...)", nvarchar to int

I have a query in MSSQL 2008 like:
IF OBJECT_Id('tempdb..#AccessibleFacilities') IS NOT NULL DROP TABLE #AccessibleFacilities
SELECT u.Userid
, AccesibleFacilityIds = dbo.GetCommaDelimitedString(upf.Facility_Id)
INTO #AccessibleFacilities
FROM Users u
INNER join UserProfileFacilities upf on upf.UserProfile_Id = up.Id
WHERE LOWER(u.Userid) = LOWER(#userId)
GROUP BY u.Userid
This query returns AccessibleFacilityIds like ",1,2,3,4,5,6,". Please note that I am not able to modify GetCommaDelimitedString function.
What I actually need to do is that using those facility ids to reach provs like below:
INSERT INTO #AccessibleProvs
SELECT Userid = #userId
, AccessibleProvIds = dbo.GetCommaDelimitedString(distinct p.Id)
FROM Provs p
inner join ProvFacs pf on p.Id = pf.Provider_Id
WHERE pf.Facility_Id in
(select a.AccesibleFacilityIds from #AccessibleFacilities a)
However, it gives me an error like:
Conversion failed when converting the nvarchar value ',1,2,3,4,5,6,'
to data type int.
I tried removing the comma signs at the start and end like below to fix it, but it did not help:
...
where pf.Facility_Id in (
select SUBSTRING(a.AccesibleFacilityIds,2,LEN(a.AccesibleFacilityIds)-2)
from #AccessibleFacilities a
)
Any advice would be appreciated. Thanks.

Instead of converting Facility_Id into a comma delimited string, why not keep it as a usable column in your temp table?
if object_Id('tempdb..#AccessibleFacilities') is not null drop table #AccessibleFacilities;
select
u.UserId
, upf.Facility_Id
into #AccessibleFacilities
from Users u
inner join UserProfileFacilities upf
on upf.UserProfile_Id = up.Id
Then use it as you did with in() or with exists():
insert into #AccessibleProvs
select
UserId = #userId
, AccessibleProvIds = dbo.GetCommaDelimitedString(distinct p.Id)
from Provs p
inner join ProvFacs pf
on p.Id = pf.Provider_Id
where exists (
select 1
from #AccessibleFacilities a
where a.Facility_Id = pf.Facility_Id
--and a.UserId = #UserId -- Do you need to check Facility_Id by User?
)
If you have the value for #UserId in the beginning, you could limit your temp table usage to just the user you need. Hopefully this code is not meant for use in some sort of cursor or other loop.

How can I perform the Count function with a where clause?

I have my database setup to allow a user to "Like" or "Dislike" a post. If it is liked, the column isliked = true, false otherwise (null if nothing.)
The problem is, I am trying to create a view that shows all Posts, and also shows a column with how many 'likes' and 'dislikes' each post has. Here is my SQL; I'm not sure where to go from here. It's been a while since I've worked with SQL and everything I've tried so far has not given me what I want.
Perhaps my DB isn't setup properly for this. Here is the SQL:
Select trippin.AccountData.username, trippin.PostData.posttext,
trippin.CategoryData.categoryname, Count(trippin.LikesDislikesData.liked)
as TimesLiked from trippin.PostData
inner join trippin.AccountData on trippin.PostData.accountid = trippin.AccountData.id
inner join trippin.CategoryData on trippin.CategoryData.id = trippin.PostData.categoryid
full outer join trippin.LikesDislikesData on trippin.LikesDislikesData.postid =
trippin.PostData.id
full outer join trippin.LikesDislikesData likes2 on trippin.LikesDislikesData.accountid =
trippin.AccountData.id
Group By (trippin.AccountData.username), (trippin.PostData.posttext), (trippin.categorydata.categoryname);
Here's my table setup (I've only included relevant columns):
LikesDislikesData
isliked(bit) || accountid(string) || postid(string
PostData
id(string) || posttext || accountid(string)
AccountData
id(string) || username(string)
CategoryData
categoryname(string)

Problem 1: FULL OUTER JOIN versus LEFT OUTER JOIN. Full outer joins are seldom what you want, it means you want all data specified on the "left" and all data specified on the "right", that are matched and unmatched. What you want is all the PostData on the "left" and any matching Likes data on the "right". If some right hand side rows don't match something on the left, then you don't care about it. Almost always work from left to right and join results that are relevant.
Problem 2: table alias. Where ever you alias a table name - such as Likes2 - then every instance of that table within the query needs to use that alias. Straight after you declare the alias Likes2, your join condition refers back to trippin.LikesDislikesData, which is the first instance of the table. Given the second one in joining on a different field I suspect that the postid and accountid are being matched on the same row, therefore it should be AND together, not a separate table instance. EDIT reading your schema closer, it seems this wouldn't be needed at all.
Problem 3: to solve you Counts problem separate them using CASE statements. Count will add the number of non NULL values returned for each CASE. If the likes.liked = 1, then return 1 otherwise return NULL. The NULL will be returned if the columns contains a 0 or a NULL.
SELECT trippin.PostData.Id, trippin.AccountData.username, trippin.PostData.posttext,
trippin.CategoryData.categoryname,
SUM(CASE WHEN likes.liked = 1 THEN 1 ELSE 0 END) as TimesLiked,
SUM(CASE WHEN likes.liked = 0 THEN 1 ELSE 0 END) as TimesDisLiked
FROM trippin.PostData
INNER JOIN trippin.AccountData ON trippin.PostData.accountid = trippin.AccountData.id
INNER JOIN trippin.CategoryData ON trippin.CategoryData.id = trippin.PostData.categoryid
LEFT OUTER JOIN trippin.LikesDislikesData likes ON likes.postid = trippin.PostData.id
-- remove AND likes.accountid = trippin.AccountData.id
GROUP BY trippin.PostData.Id, (trippin.AccountData.username), (trippin.PostData.posttext), (trippin.categorydata.categoryname);
Then "hide" the PostId column in the User Interface.

Instead of selecting Count(trippin.LikesDislikesData.liked) you could put in a select statement:
Select AccountData.username, PostData.posttext, CategoryData.categoryname,
(select Count(*)
from LikesDislikesData as likes2
where likes2.postid = postdata.id
and likes2.liked = 'like' ) as TimesLiked
from PostData
inner join AccountData on PostData.accountid = AccountData.id
inner join CategoryData on CategoryData.id = PostData.categoryid

USE AdventureWorksDW2008R2
GO
SET NOCOUNT ON
GO
/*
Default
*/
SET TRANSACTION ISOLATION LEVEL READ COMMITTED
GO
BEGIN TRAN
IF OBJECT_ID('tempdb.dbo.#LikesDislikesData') IS NOT NULL
BEGIN
DROP TABLE #LikesDislikesData
END
CREATE TABLE #LikesDislikesData(
isLiked bit
,accountid VARCHAR(50)
,postid VARCHAR(50)
);
IF OBJECT_ID('tempdb.dbo.#PostData') IS NOT NULL
BEGIN
DROP TABLE #PostData
END
CREATE TABLE #PostData(
postid INT IDENTITY(1,1) NOT NULL
,accountid VARCHAR(50)
,posttext VARCHAR(50)
);
IF OBJECT_ID('tempdb.dbo.#AccountData') IS NOT NULL
BEGIN
DROP TABLE #AccountData
END
CREATE TABLE #AccountData(
accountid INT
,username VARCHAR(50)
);
IF OBJECT_ID('tempdb.dbo.#CategoryData') IS NOT NULL
BEGIN
DROP TABLE #CategoryData
END
CREATE TABLE #CategoryData(
categoryname VARCHAR(50)
);
INSERT INTO #AccountData VALUES ('1', 'user1')
INSERT INTO #PostData VALUES('1','this is a post')
INSERT INTO #LikesDislikesData (isLiked ,accountid, postid)
SELECT '1', P.accountid, P.postid
FROM #PostData P
WHERE P.posttext = 'this is a post'
SELECT *
FROM #PostData
SELECT *
FROM #LikesDislikesData
SELECT *
FROM #AccountData
SELECT COUNT(L.isLiked) 'Likes'
,P.posttext
,A.username
FROM #PostData P
JOIN #LikesDislikesData L
ON P.accountid = L.accountid
AND L.IsLiked = 1
JOIN #AccountData A
ON P.accountid = A.accountid
GROUP BY P.posttext, A.username
SELECT X.likes, Y.dislikes
FROM (
(SELECT COUNT(isliked)as 'likes', accountid
FROM #LikesDislikesData
WHERE isLiked = 1
GROUP BY accountid
) X
JOIN
(SELECT COUNT(isliked)as 'dislikes', accountid
FROM #LikesDislikesData
WHERE isLiked = 0
GROUP BY accountid) Y
ON x.accountid = y.accountid)
IF (XACT_STATE() = 1 AND ERROR_STATE() = 0)
BEGIN
COMMIT TRAN
END
ELSE IF (##TRANCOUNT > 0)
BEGIN
ROLLBACK TRAN
END

How do you think about the solution? We create a new table SummaryReport(PostID,AccountID,NumberOfLikedTime,NumberOfDislikedTimes).
An user clicks on LIKE or DISLIKE button we update the table. After that, you can query as you desire. Another advantage, the table can be served reporting purpose.

SQL Query + more efficient alternative

I have a query which involves 2 tables 'Coupons' and 'CouponUsedLog' in SQL Server, the query below will obtain some information from these 2 tables for statistics study use. Somehow I feel that while my query works and returns me the desired results, I feel that I can be written in a more efficient way, can someone please advice if there's a better way to rewrite this? Am I using too many unnecessary variables and joins? Thanks.
DECLARE #CouponISSUED int=null
DECLARE #CouponUSED int=null
DECLARE #CouponAVAILABLE int=null
DECLARE #CouponEXPIRED int=null
DECLARE #CouponLastUsed Date=null
--Total CouponIssued
SET #CouponISSUED =
(
select count(*)
from Coupon C Left Join
couponusedlog CU on C.autoid = CU.Coupon_AutoID
where C.VoidedBy is null and
C.VoidedOn is null and
DeletedBy is null and
DeletedOn is null and
Card_AutoID in (Select AutoID
from Card
where MemberID = 'Mem001')
)
--Total CouponUsed
SET #CouponUSED =
(
select count(*)
from couponusedlog CU Left Join
Coupon C on CU.Coupon_AutoID = V.autoid
where CU.VoidedBy is null and
CU.VoidedOn is null and
C.Card_AutoID in (select AutoID
from Card
where MemberID = 'Mem001')
)
SET #CouponAVAILABLE = #CouponISSUED - #CouponUSED
--Expired Coupons
SET #CouponEXPIRED =
(
select Count(*)
from Coupon C Left Join
couponusedlog CU on C.autoid = CU.Coupon_AutoID
where C.VoidedBy is null and
C.VoidedOn is null and
deletedBy is null and
deletedOn is null and
Card_AutoID in (select AutoID
from Card
where MemberID = 'Mem002') and
CONVERT (date, getdate()) > C.expirydate
)
--Last Used On
SET #CouponLastUsed =
(
select CONVERT(varchar(10),
Max(VU.AddedOn), 103) AS [DD/MM/YYYY]
from couponusedlog CU Left Join
coupon C on CU.Coupon_AutoID = C.autoid
where CU.voidedBy is null and
CU.voidedOn is null and
C.Card_AutoID in (select AutoID
from Card
where MemberID = 'Mem002')
)
Select #CouponISSUED As Coupon_Issued,
#CouponUSED As Coupon_Used,
#CouponAVAILABLE As Coupon_Available,
#CouponEXPIRED As Coupon_Expired,
#CouponLastUsed As Last_Coupon_UsedOn

In general its better to do things in a single query if you you're just looking for counts of things particularly against nearly the same data set then in four separate queries.
This query combines what you need into a single query by converting your WHERE Clauses into SUMS of CASE statements. The MAX of the date is just a normal thing you can do when you're doing a count or a sum.
SELECT COUNT(*) couponissued,
SUM(CASE
WHEN deletedby IS NULL
AND deletedon IS NULL THEN 1
ELSE 0
END) AS couponused,
SUM(CASE
WHEN deletedby IS NULL
AND deletedon IS NULL
AND Getdate() > c.expirydate THEN 1
ELSE 0
END) AS couponex,
MAX(vu.addedon) CouponEXPIRED
FROM [couponusedlog] cu
LEFT JOIN [Coupon] c
ON ( cu.coupon_autoid = v.autoid )
WHERE cu.voidedby IS NULL
AND cu.voidedon IS NULL
AND ( c.card_autoid IN (SELECT [AutoID]
FROM [Card]
WHERE memberid = 'Mem001') )
You can then convert that into a Common Table Expression to do your subtraction and formatting

Are you asking this question out of a proactive desire to be as effecient as possible, or because of an actual performance issue you would like to correct? You can make this more effecient at the cost of having code that is harder to manage. If the performance is okay right now I would highly recommend you leave it because the next person to come along will be able to understand it just fine. If you make one huge effecient but garbled sql statement out of it then when you or anyone else wants to update something about it it's going to take you 3 times longer as you try to re-figure out what the heck you were thinking when you wrote it.

How to check intersection of subqueries in query?

I have the next query:
SELECT c.client_code, a.account_num, m.account_close_date, u.uso, m.product_name
FROM accounts a INNER JOIN Clients c ON c.id = a.client_id INNER JOIN
Uso u ON c.uso_id = u.uso_id INNER JOIN Magazine m ON a.account_id = m.account_id
and I need to compare product_name with input parameter.
product_name and input parameter #s are comma-delimited strings.
I use next split function:
ALTER FUNCTION [dbo].[Split]
(
#s VARCHAR(max),
#split CHAR(1)
)
RETURNS #temptable TABLE (items VARCHAR(MAX))
AS
BEGIN
DECLARE #x XML
SELECT #x = CONVERT(xml,'<root><s>' + REPLACE(#s,#split,'</s><s>') + '</s></root>');
INSERT INTO #temptable
SELECT [Value] = T.c.value('.','varchar(20)')
FROM #X.nodes('/root/s') T(c);
RETURN
END;
I think that I need to check the intersection of tables, which I will receive after split of product_name and after split of input parameter. I trid to do this:
WHERE (select * from dbo.Split(m.product_name, ';')
INTERSECT select * from dbo.Split('product1;product2',';'))
is not null
But it does not work quite right. Please, help me.

INTERSECT requires the same column output and is used like UNION or EXCEPT: not in the WHERE clause
Just JOIN onto the udf
...
INNER JOIN
Magazine m ON a.account_id = m.account_id
INNER JOIN
dbo.Split(#parameter, ';') CSV ON m.productname = CSV.items
If you need to split m.productname, if you can't fix the design, use CROSS APPLY
...
INNER JOIN
Magazine m ON a.account_id = m.account_id
CROSS APPLY
dbo.Split(m.productname, ';') WTF
INNER JOIN
dbo.Split(#parameter, ';') CSV ON WTF.items = CSV.items
However, JOIN and INTERSECT give different results if #parameter has duplicated values. Add a DISTINCT to the UDF for example to get around this. Or change the udf JOIN into EXISTS

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

SQL Text Matching Query Tuning - sql

Related

Query / design performance problem: How to get an Id via a 'link'-table in the most efficient way?

Conversion failed while using ids "where ID in (...)", nvarchar to int

How can I perform the Count function with a where clause?

SQL Query + more efficient alternative

How to check intersection of subqueries in query?

Categories

Resources