Cross Join taking too long - sql

Please see the DDL below:
CREATE TABLE [dbo].[TBX_RRDGenieDeletedItem](
[DeletedId] [decimal](25, 0) NOT NULL
) ON [PRIMARY]
INSERT INTO TBX_RRDGenieDeletedItem values (90309955000010401948421)
CREATE TABLE [dbo].[dbNicheCIS](
[OccurrenceID] [decimal](25, 0) NULL,
[OccurrenceFileNo] [varchar](20) NULL
)
INSERT INTO dbNicheCIS values (90309955000010401948421,'3212')
CREATE TABLE [dbo].[Asset_Table](
[user_crimenumber] [varchar](4000) NOT NULL
)
INSERT INTO Asset_Table VALUES ('3212; 4512; 34322; 45674; 33221')
The only table I designed was dbNicheCIS. I am trying to find all of the rows in TBX_RRDGenieDeletedItem that are also in Asset_Table, using a LIKE comparison. Asset_Table contains the OccurrenceFileNo values (note that Asset_Table contains OccurrenceFileNo 3212, which relates to OccurrenceID 90309955000010401948421). I have tried this:
Select user_crimenumber from tbx_rrdgeniedeleteditem --asset_table.user_crimenumber
inner join dbNicheCIS on tbx_rrdgeniedeleteditem.deletedid = dbNicheCIS.OccurrenceID
cross join asset_table
where deletedid like '903%' and asset_table.user_crimenumber like '%' + occurrencefileno + '%'
It works, but it takes hours to run. Is there a better way to approach it rather than a cross join?

You can use an INNER JOIN, and you can also eliminate LIKE for the number comparison, like below:
Select user_crimenumber from tbx_rrdgeniedeleteditem
inner join dbNicheCIS
on tbx_rrdgeniedeleteditem.deletedid = dbNicheCIS.OccurrenceID
inner join asset_table
on CAST(LEFT([DeletedId], 3) AS [decimal](25, 0)) =903
and asset_table.user_crimenumber like '%' + occurrencefileno + '%'

You can make use of the IN operator in this case:
SELECT * FROM TBX_RRDGenieDeletedItem
WHERE DeletedId IN (
SELECT DISTINCT OccurrenceID FROM dbNicheCIS
INNER JOIN Split(...) ON ...)
Updated: you can create a custom split function that splits the values into a temp table, and then do the join.
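To illustrate the split idea (independently of any particular T-SQL split function, which the answer leaves unspecified), here is a small Python/sqlite3 sketch using the sample data from the question; the Asset_Split table and the Python-side split stand in for the temp table and the custom split function:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
# TEXT is used for the 25-digit IDs: they exceed 64-bit integer range.
cur.executescript("""
CREATE TABLE TBX_RRDGenieDeletedItem (DeletedId TEXT);
CREATE TABLE dbNicheCIS (OccurrenceID TEXT, OccurrenceFileNo TEXT);
CREATE TABLE Asset_Table (user_crimenumber TEXT);
INSERT INTO TBX_RRDGenieDeletedItem VALUES ('90309955000010401948421');
INSERT INTO dbNicheCIS VALUES ('90309955000010401948421', '3212');
INSERT INTO Asset_Table VALUES ('3212; 4512; 34322; 45674; 33221');
-- Normalized version of Asset_Table: one row per file number.
CREATE TABLE Asset_Split (user_crimenumber TEXT, item TEXT);
""")

# Split each semicolon-delimited list in Python (standing in for a
# T-SQL split function) and load the pieces into Asset_Split.
for (crimenumber,) in cur.execute(
        "SELECT user_crimenumber FROM Asset_Table").fetchall():
    for item in crimenumber.split(";"):
        cur.execute("INSERT INTO Asset_Split VALUES (?, ?)",
                    (crimenumber, item.strip()))

# Exact equality join instead of LIKE '%...%'.
rows = cur.execute("""
    SELECT DISTINCT d.DeletedId
    FROM TBX_RRDGenieDeletedItem d
    JOIN dbNicheCIS n ON d.DeletedId = n.OccurrenceID
    JOIN Asset_Split s ON s.item = n.OccurrenceFileNo
""").fetchall()
print(rows)  # [('90309955000010401948421',)]
```

Once the list is split into rows, the wildcard scan becomes an indexable equality join.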

You need to index your tables to get faster query response.
CREATE INDEX [IX_dbNicheCIS_OccurrenceID] ON [dbNicheCIS]
([OccurrenceID] ASC, [OccurrenceFileNo] ASC)
CREATE INDEX [IX_TBX_RRDGenieDeletedItem_DeletedId] ON [dbo].[TBX_RRDGenieDeletedItem]
([DeletedId] ASC)
Creating such indexes replaces the "Table Scan" in the query execution plan with a faster "Index Scan" or "Index Seek". But simple indexes cannot solve the like '%' + occurrencefileno + '%' problem.
For that you will have to use full-text indexes. After you define a full-text index on asset_table.user_crimenumber, you can use the following query:
SELECT user_crimenumber
FROM tbx_rrdgeniedeleteditem di --asset_table.user_crimenumber
JOIN dbNicheCIS dnc
ON di.deletedid = dnc.OccurrenceID
CROSS JOIN asset_table at
WHERE di.deletedid like '903%'
AND CONTAINS(at.user_crimenumber, occurrencefileno)
But it is bad practice to store your occurrencefileno list as a semicolon-delimited varchar value. If you were the author of this database design, you should have normalized the data, so that you get one row for every occurrencefileno rather than a string like '3212; 4512; 34322; 45674; 33221'.
As a first step before querying, you can also create a normalized version of asset_table.user_crimenumber and then use this table, with normal indexes, as the base for your further queries.
To split your asset_table.user_crimenumber fields you can use the Fn_Split() function as mentioned in this answer.
There is also the option of using fnSplit to rewrite your query this way:
SELECT user_crimenumber
FROM tbx_rrdgeniedeleteditem di --asset_table.user_crimenumber
JOIN dbNicheCIS dnc
ON di.deletedid = dnc.OccurrenceID
INNER JOIN (
SELECT at.user_crimenumber, f.item FROM asset_table at
CROSS APPLY dbo.fnSplit(at.user_crimenumber,';') f ) at
ON at.item=dnc.occurrencefileno
WHERE di.deletedid like '903%'
If you create fnSplit as a CLR function in C#, as described here, you may get even faster results. But it will not speed up your query magically.

Related

Trying to optimize a system objects search

I’m trying to search the database for any stored procedures that contain one of about 3500 different values.
I created a table to store the values in. I’m running the query below. The problem is, just testing it with a SELECT TOP 100 is taking 3+ mins to run (I have 3500+ values). I know it’s happening due to the query using LIKE.
I’m wondering if anyone has an idea on how I could optimize the search. The only results I need are the names of every value being searched for (pulled directly from the table I created: “SearchTerms”) and then a column that displays a 1 if it exists, 0 if it doesn’t.
Here’s the query I’m running:
SELECT
trm.Pattern,
(CASE
WHEN sm.object_id IS NULL THEN 0
ELSE 1
END) AS "Exists"
FROM dbo.SearchTerms trm
LEFT OUTER JOIN sys.sql_modules sm
ON sm.definition LIKE '%' + trm.Pattern + '%'
ORDER BY trm.Pattern
Note: it’s a one-time deal; it’s not something that will be run consistently.
Try a CTE: get the Patterns that exist in any stored procedure with a WHERE condition using EXISTS (...). Then LEFT JOIN dbo.SearchTerms with the CTE to get the 1 or 0 value for the Exists column.
;WITH ExistsSearchTerms AS (
SELECT Pattern
FROM dbo.SearchTerms
WHERE EXISTS (SELECT 1 FROM sys.sql_modules sm WHERE sm.definition LIKE '%' + Pattern + '%')
)
SELECT trm.Pattern, IIF(trmExist.Pattern IS NULL, 0, 1) AS "Exists"
FROM dbo.SearchTerms trm
LEFT JOIN ExistsSearchTerms trmExist
ON trm.Pattern = trmExist.Pattern
ORDER BY Pattern
References:
SQL performance on LEFT OUTER JOIN vs NOT EXISTS
NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL: SQL Server
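As a sanity check of the CTE + EXISTS pattern, here is a Python/sqlite3 sketch on a toy dataset (the sql_modules table and its contents are made-up stand-ins for sys.sql_modules):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.executescript("""
CREATE TABLE SearchTerms (Pattern TEXT);
CREATE TABLE sql_modules (definition TEXT);  -- stand-in for sys.sql_modules
INSERT INTO SearchTerms VALUES ('CustomerId'), ('LegacyFlag');
INSERT INTO sql_modules VALUES ('SELECT CustomerId FROM dbo.Customers');
""")

# One EXISTS probe per term; the LEFT JOIN back to the CTE yields the 0/1 flag.
rows = cur.execute("""
    WITH ExistsSearchTerms AS (
        SELECT Pattern
        FROM SearchTerms
        WHERE EXISTS (SELECT 1 FROM sql_modules sm
                      WHERE sm.definition LIKE '%' || Pattern || '%')
    )
    SELECT trm.Pattern,
           CASE WHEN e.Pattern IS NULL THEN 0 ELSE 1 END AS ItExists
    FROM SearchTerms trm
    LEFT JOIN ExistsSearchTerms e ON trm.Pattern = e.Pattern
    ORDER BY trm.Pattern
""").fetchall()
print(rows)  # [('CustomerId', 1), ('LegacyFlag', 0)]
```

The key point is that EXISTS stops scanning modules at the first hit for each term, while the original LEFT JOIN on LIKE keeps matching every module against every term.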

Improve performance of SQL Query with dynamic like

I need to search for people whose FirstName is included (a substring of) in the FirstName of somebody else.
SELECT DISTINCT top 10 people.[Id], peopleName.[LastName], peopleName.[FirstName]
FROM [dbo].[people] people
INNER JOIN [dbo].[people_NAME] peopleName on peopleName.[Id] = people.[Id]
WHERE EXISTS (SELECT *
FROM [dbo].[people_NAME] peopleName2
WHERE peopleName2.[Id] != people.[id]
AND peopleName2.[FirstName] LIKE '%' + peopleName.[FirstName] + '%')
It is so slow! I know it's because of the "'%' + peopleName.[FirstName] + '%'", because if I replace it with a hardcoded value like '%G%', it runs instantly.
With my dynamic LIKE, my top 10 takes more than 10 seconds!
I want to be able to run it on much bigger database.
What can I do?
Take a look at my answer about using the LIKE operator here.
It can be quite performant if you use some tricks.
You can gain much speed if you play with collation, try this:
SELECT DISTINCT TOP 10 p.[Id], n.[LastName], n.[FirstName]
FROM [dbo].[people] p
INNER JOIN [dbo].[people_NAME] n on n.[Id] = p.[Id]
WHERE EXISTS (
SELECT 'x' x
FROM [dbo].[people_NAME] n2
WHERE n2.[Id] != p.[id]
AND
lower(n2.[FirstName]) collate latin1_general_bin
LIKE
'%' + lower(n.[FirstName]) + '%' collate latin1_general_bin
)
As you can see, we are using binary comparison instead of string comparison, and this is much more performant.
Pay attention: you are working with people's names, so you can have issues with special Unicode characters, strange accents, etc.
Normally the EXISTS clause is better than an INNER JOIN, but you are also using a DISTINCT, which is a GROUP BY on all columns.. so why not use this?
You can switch to an INNER JOIN and use GROUP BY instead of DISTINCT, so testing COUNT(*) > 1 will be (very slightly) more performant than testing WHERE n2.[Id] != p.[id], especially if your TOP clause is extracting many rows.
Try this:
SELECT TOP 10 p.[Id], n.[LastName], n.[FirstName]
FROM [dbo].[people] p
INNER JOIN [dbo].[people_NAME] n on n.[Id] = p.[Id]
INNER JOIN [dbo].[people_NAME] n2 on
lower(n2.[FirstName]) collate latin1_general_bin
LIKE
'%' + lower(n.[FirstName]) + '%' collate latin1_general_bin
GROUP BY p.[Id], n.[LastName], n.[FirstName]
HAVING COUNT(*)>1
Here we also match each name against itself, so every name finds at least one match.
But we need only names that match other names, so we keep only rows with a match count greater than one (COUNT(*) = 1 means the name matched only itself).
EDIT: I ran all tests using a table of 100,000 random names and found that in this scenario the normal LIKE comparison is about three times slower than the binary comparison.
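The normalize-case-once-then-match-bytes idea behind the collation trick can be illustrated in plain Python (the names here are invented):

```python
# Toy illustration of the lower() + binary-match idea: normalize case
# once, then use plain byte-wise substring containment, which is the
# cheap comparison the binary collation buys you in SQL Server.
names = ["Ann", "Joanna", "MARIO", "Mario Luigi", "Zoe"]

# A name qualifies if its lowercased form is a substring of some
# OTHER name's lowercased form.
matches = sorted(
    a for a in names
    if any(a is not b and a.lower() in b.lower() for b in names)
)
print(matches)  # ['Ann', 'MARIO']
```

Note the caveat from the answer still applies: `lower()` plus byte comparison is only safe when the data has no accented or other special Unicode characters that need culture-aware collation rules.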
This is a hard problem. I don't think a full text index will help, because you want to compare two columns.
That doesn't leave good options. One possibility is to implement ngrams. These are sequences of characters (say, 3 in a row) that come from a string. From my first name, you would have:
gor
ord
rdo
don
Then you can use these for direct matching on another column. Then you have to do additional work to see if the full name for one column matches another. But the ngrams should significantly reduce the work space.
Also, implementing ngrams requires work. One method uses a trigger which calculates the ngrams for each name and then inserts them into an ngram table.
I'm not sure if all this work is worth the effort to solve your problem. But it is possible to speed up the search.
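For concreteness, here is a minimal Python sketch of the trigram approach: build an inverted index from trigram to names, intersect the candidate sets for a probe string, then verify with a real substring check (the names and helper functions are illustrative, not from any library):

```python
from collections import defaultdict

def trigrams(s):
    """All 3-character windows of a lowercased string."""
    s = s.lower()
    return {s[i:i + 3] for i in range(len(s) - 2)}

names = ["gordon", "don", "ann", "joanna"]

# Inverted index: trigram -> set of names containing it.
index = defaultdict(set)
for name in names:
    for g in trigrams(name):
        index[g].add(name)

def candidates(needle):
    """Names sharing every trigram of the needle; a final substring
    check verifies the match (the index only narrows the work space)."""
    cands = set.intersection(*(index.get(g, set()) for g in trigrams(needle)))
    return sorted(c for c in cands
                  if c != needle and needle.lower() in c.lower())

print(sorted(trigrams("gordon")))  # ['don', 'gor', 'ord', 'rdo']
print(candidates("don"))           # ['gordon']
```

In a database you would materialize the trigram table with a trigger, as the answer suggests, and join on it before applying the expensive LIKE only to the surviving candidates.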
You can do this,
With CTE as
(
SELECT top 10 peopleName.[Id], peopleName.[LastName], peopleName.[FirstName]
FROM [dbo].[people] people
INNER JOIN [dbo].[people_NAME] peopleName ON peopleName.[Id] = people.[Id]
WHERE EXISTS (SELECT 1
FROM [dbo].[people_NAME] peopleName2
WHERE peopleName2.[Id] != people.[id]
AND peopleName2.[FirstName] LIKE '%' + peopleName.[FirstName] + '%')
order by peopleName.[Id]
)
-- here join the CTE with the people table, if that is required at all
select * from CTE
If joining with people is not required, then there is no need for the CTE.
Have you tried a JOIN instead of a correlated subquery?
Being unable to use an index, it won't have optimal performance, but it should be a bit better than a correlated subquery.
SELECT DISTINCT top 10 people.[Id], peopleName.[LastName], peopleName.[FirstName]
FROM [dbo].[people] people
INNER JOIN [dbo].[people_NAME] peopleName on peopleName.[Id] = people.[Id]
INNER JOIN [dbo].[people_NAME] peopleName2 on peopleName2.[Id] <> people.[id] AND
peopleName2.[FirstName] LIKE '%' + peopleName.[FirstName] + '%'

Oracle like statement not using correct index

Oracle database.
I've got the following segment of SQL that's performing a full table scan on the PROVIDERS P1 table. I believe this is because it's dynamically building a LIKE clause, as you can see on the line marked XXX.
I've got an index on PROVIDER.TERMINAL_NUMBER and the following SQL snippet does use the correct index.
select * from providers where terminal_number like '1234%'
so why does the following not hit that index?
SELECT P1.PROVIDER_NUMBER, P1.TERMINAL_NUMBER, PC."ORDER" FROM PROVIDERS P1
INNER JOIN PROVIDER_CONFIG PC
ON PC.PROVIDER_NUMBER = P1.PROVIDER_NUMBER
WHERE EXISTS (
SELECT E2.* FROM EQUIPMENT E1
INNER JOIN EQUIPMENT E2
ON E1.MERCHANT_NUMBER = E2.MERCHANT_NUMBER
WHERE E1.TERMINAL_NUMBER = 'SA323F'
AND E1.STATUS IN (0, 9)
AND E2.STATUS IN (0, 9)
XXX
AND P1.TERMINAL_NUMBER LIKE SUBSTR(E2.TERMINAL_NUMBER, 0, length(E2.TERMINAL_NUMBER) - 1) || '%'
)
ORDER BY PC."ORDER" DESC
Here ...
select * from providers where terminal_number like '1234%'
... the Optimiser knows all the fitting numbers start with a fixed prefix and so will be co-located in the index. Hence reading the index is likely to be very efficient.
But here there is no such knowledge ...
P1.TERMINAL_NUMBER LIKE SUBSTR(E2.TERMINAL_NUMBER, 0, length(E2.TERMINAL_NUMBER) - 1) || '%'
There can be any number of different prefixes from E2.TERMINAL_NUMBER and the query will be returning records from all over the PROVIDERS table. So indexed reads will be highly inefficient, and a blunt approach of full scans is the right option.
It may be possible to rewrite the query so it works more efficiently - for instance you would want a Fast Full Index Scan rather than a Full Table Scan. But without knowing your data and business rules we're not really in a position to help, especially when dynamic query generation is involved.
One thing which might improve performance would be to replace the WHERE EXISTS with a WHERE IN...
SELECT P1.PROVIDER_NUMBER, P1.TERMINAL_NUMBER, PC."ORDER" FROM PROVIDERS P1
INNER JOIN PROVIDER_CONFIG PC
ON PC.PROVIDER_NUMBER = P1.PROVIDER_NUMBER
WHERE substr(P1.TERMINAL_NUMBER, 1, 5) IN (
SELECT SUBSTR(E2.TERMINAL_NUMBER, 1, 5)
FROM EQUIPMENT E1
INNER JOIN EQUIPMENT E2
ON E1.MERCHANT_NUMBER = E2.MERCHANT_NUMBER
WHERE E1.TERMINAL_NUMBER = 'SA323F'
AND E1.STATUS IN (0, 9)
AND E2.STATUS IN (0, 9)
)
ORDER BY PC."ORDER" DESC
This would work if the length of the terminal number is constant. Only you know your data, so only you can tell whether it will fly.
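Here is a quick Python/sqlite3 sketch of that prefix-equality rewrite, assuming the fixed 5-character prefix length (the terminal numbers are invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.executescript("""
CREATE TABLE providers (terminal_number TEXT);
CREATE TABLE equipment (merchant_number TEXT, terminal_number TEXT,
                        status INTEGER);
INSERT INTO providers VALUES ('SA3231'), ('SA3239'), ('XX9990');
INSERT INTO equipment VALUES ('M1', 'SA323F', 0), ('M1', 'SA323A', 9);
""")

# Prefix-equality version of the LIKE: both sides reduced to substr(..., 1, 5),
# so the subquery produces a plain set of join keys.
rows = cur.execute("""
    SELECT p.terminal_number
    FROM providers p
    WHERE substr(p.terminal_number, 1, 5) IN (
        SELECT substr(e2.terminal_number, 1, 5)
        FROM equipment e1
        JOIN equipment e2 ON e1.merchant_number = e2.merchant_number
        WHERE e1.terminal_number = 'SA323F'
          AND e1.status IN (0, 9)
          AND e2.status IN (0, 9)
    )
    ORDER BY p.terminal_number
""").fetchall()
print(rows)  # [('SA3231',), ('SA3239',)]
```

As the answer says, this only works because the prefix length is constant; with variable-length prefixes you are back to the LIKE.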
If this query does not use an index:
select *
from providers
where terminal_number like '1234%'
Then presumably terminal_number is numeric and not a string. The type conversion prevents the use of the index.
If you want to use an index, then convert the value to a string and use a string index:
create index idx_providers_terminal_number_str on providers(cast(terminal_number as varchar2(255)));
Then write the query as:
select *
from providers
where cast(terminal_number as varchar2(255)) like '1234%'

What if the column to be indexed is nvarchar data type in SQL Server?

I retrieve data by joining multiple tables as indicated on the image below. As there is no data in the FK column (EmployeeID) of the Event table, I have to use the CardNo (nvarchar) fields to join the two tables. Because the number of digits in the CardNo fields differs between the Event and Employee tables, I also have to use SQL Server's RIGHT function, and this makes the query take approximately 10 times longer to execute. Given this, what should I do? Can I use the CardNo field without changing its data type to int, etc.? (Other problems might appear after changing it, so it would be better to find a solution that doesn't change the data type.) The execution plan of the query below is also attached.
Query:
; WITH a AS (SELECT emp.EmployeeName, emp.Status, dep.DeptName, job.JobName, emp.CardNo
FROM TEmployee emp
LEFT JOIN TDeptA AS dep ON emp.DeptAID = dep.DeptID
LEFT JOIN TJob AS job ON emp.JobID = job.JobID),
b AS (SELECT eve.EventID, eve.EventTime, eve.CardNo, evt.EventCH, dor.DoorName
FROM TEvent eve LEFT JOIN TEventType AS evt ON eve.EventType = evt.EventID
LEFT JOIN TDoor AS dor ON eve.DoorID = dor.DoorID)
SELECT * FROM b LEFT JOIN a ON RIGHT(a.CardNo, 8) = RIGHT(b.CardNo, 8)
ORDER BY b.EventID ASC
You can add a computed column to your table like this:
ALTER TABLE TEmployee -- Don't start your table names with prefixes, you already know they're tables
ADD CardNoRight8 AS RIGHT(CardNo, 8) PERSISTED
ALTER TABLE TEvent
ADD CardNoRight8 AS RIGHT(CardNo, 8) PERSISTED
CREATE INDEX TEmployee_CardNoRight8_IDX ON TEmployee (CardNoRight8)
CREATE INDEX TEvent_CardNoRight8_IDX ON TEvent (CardNoRight8)
You don't need to persist the column since it already matches the criteria for a computed column to be indexed, but adding the PERSISTED keyword shouldn't hurt and might help the performance of other queries. It will cause a minor performance hit on updates and inserts, but that's probably fine in your case unless you're importing a lot of data (millions of rows) at a time.
The better solution though is to make sure that your columns that are supposed to match actually match. If the right 8 characters of the card number are something meaningful, then they shouldn't be part of the card number, they should be another column. If this is an issue where one table uses leading zeroes and the other doesn't then you should fix that data to be consistent instead of putting together work arounds like this.
This line is what is costing you 86% of the query time:
LEFT JOIN a ON RIGHT(a.CardNo, 8) = RIGHT(b.CardNo, 8)
This is happening because it has to run RIGHT() on those fields for every row and then match them with the other table. This is obviously going to be inefficient.
The most straightforward solution is probably to either remove the RIGHT() entirely or else to re-implement it as a built-in column on the table so it doesn't have to be calculated on the fly while the query is running.
While inserting the record, you would also have to insert the eight right digits of the card number and store them in this field. (My original thought was to use a computed column; note that computed columns meeting certain requirements can in fact be indexed, as another answer here shows, so a regular column is not strictly necessary.)
; WITH a AS (
SELECT emp.EmployeeName, emp.Status, dep.DeptName, job.JobName, emp.CardNoRightEight
FROM TEmployee emp
LEFT JOIN TDeptA AS dep ON emp.DeptAID = dep.DeptID
LEFT JOIN TJob AS job ON emp.JobID = job.JobID
),
b AS (
SELECT eve.EventID, eve.EventTime, eve.CardNoRightEight, evt.EventCH, dor.DoorName
FROM TEvent eve LEFT JOIN TEventType AS evt ON eve.EventType = evt.EventID
LEFT JOIN TDoor AS dor ON eve.DoorID = dor.DoorID
)
SELECT *
FROM b
LEFT JOIN a ON a.CardNoRightEight = b.CardNoRightEight
ORDER BY b.EventID ASC
This will help you see how to add a calculated column to your database.
create table #temp (test varchar(30))
insert into #temp
values('000456')
alter table #temp
add test2 as right(test, 3) persisted
select * from #temp
The other alternative is to fix the data and the data entry so that both columns have the same data type and contain the same leading zeros (or remove them).
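As an aside, the store-the-derived-key-and-index-it idea can be sketched with Python's sqlite3 module; a plain column populated once stands in for SQL Server's PERSISTED computed column, and substr(CardNo, -8) stands in for RIGHT(CardNo, 8) (the sample card numbers are invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.executescript("""
CREATE TABLE TEmployee (EmployeeName TEXT, CardNo TEXT);
CREATE TABLE TEvent (EventID INTEGER, CardNo TEXT);
INSERT INTO TEmployee VALUES ('Alice', '12345678');
INSERT INTO TEvent VALUES (1, '0012345678'), (2, '0099999999');
-- Stand-in for a persisted computed column: store RIGHT(CardNo, 8) once...
ALTER TABLE TEmployee ADD COLUMN CardNoRight8 TEXT;
ALTER TABLE TEvent ADD COLUMN CardNoRight8 TEXT;
UPDATE TEmployee SET CardNoRight8 = substr(CardNo, -8);
UPDATE TEvent SET CardNoRight8 = substr(CardNo, -8);
-- ...and index it, so the join is a seek rather than a per-row function call.
CREATE INDEX IX_TEmployee_CardNoRight8 ON TEmployee (CardNoRight8);
CREATE INDEX IX_TEvent_CardNoRight8 ON TEvent (CardNoRight8);
""")

# The join now compares precomputed, indexed keys.
rows = cur.execute("""
    SELECT e.EventID, emp.EmployeeName
    FROM TEvent e
    LEFT JOIN TEmployee emp ON emp.CardNoRight8 = e.CardNoRight8
    ORDER BY e.EventID
""").fetchall()
print(rows)  # [(1, 'Alice'), (2, None)]
```

Event 1 matches Alice because the last eight characters agree despite the leading zeros; event 2 has no matching employee.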
Many thanks for all of your help. With the help of your answers, as a first step I managed to reduce the query execution time from 2 minutes to 1 by using computed columns. After creating an index on those columns, I got the execution time down to 3 seconds. Wow, it is really perfect :)
Here are the steps, posted for those who suffer from a similar problem:
Step 1: Add computed columns to the tables (as the CardNo fields are nvarchar, I cast them to int inside the computed columns):
ALTER TABLE TEvent ADD CardNoRightEight AS RIGHT(CAST(CardNo AS int), 8)
ALTER TABLE TEmployee ADD CardNoRightEight AS RIGHT(CAST(CardNo AS int), 8)
Step 2: Create an index on the computed columns in order to execute the query faster:
CREATE INDEX TEmployee_CardNoRightEight_IDX ON TEmployee (CardNoRightEight)
CREATE INDEX TEvent_CardNoRightEight_IDX ON TEvent (CardNoRightEight)
Step 3: Update the query to use the computed columns:
; WITH a AS (
SELECT emp.EmployeeName, emp.Status, dep.DeptName, job.JobName, emp.CardNoRightEight --emp.CardNo
FROM TEmployee emp
LEFT JOIN TDeptA AS dep ON emp.DeptAID = dep.DeptID
LEFT JOIN TJob AS job ON emp.JobID = job.JobID
),
b AS (
SELECT eve.EventID, eve.EventTime, evt.EventCH, dor.DoorName, eve.CardNoRightEight --eve.CardNo
FROM TEvent eve
LEFT JOIN TEventType AS evt ON eve.EventType = evt.EventID
LEFT JOIN TDoor AS dor ON eve.DoorID = dor.DoorID)
SELECT * FROM b LEFT JOIN a ON a.CardNoRightEight = b.CardNoRightEight --ON RIGHT(a.CardNo, 8) = RIGHT(b.CardNo, 8)
ORDER BY b.EventID ASC

Check the query efficiency

I have the SQL query below and want an opinion on whether I can improve it, using temp tables or something else, or whether it is good enough. Basically I am just feeding the result set from the inner query to the outer one.
SELECT S.SolutionID
,S.SolutionName
,S.Enabled
FROM dbo.Solution S
WHERE s.SolutionID IN (
SELECT DISTINCT sf.SolutionID
FROM dbo.SolutionToFeature sf
WHERE sf.SolutionToFeatureID IN (
SELECT sfg.SolutionToFeatureID
FROM dbo.SolutionFeatureToUsergroup SFG
WHERE sfg.UsergroupID IN (
SELECT UG.UsergroupID
FROM dbo.Usergroup UG
WHERE ug.SiteID = @SiteID
)
)
)
It's going to depend largely on the indexes you have on those tables. Since you are only selecting data from the Solution table, you can put everything else in an EXISTS clause, do some proper joins, and it should perform better.
The EXISTS clause allows you to remove the DISTINCT on the SolutionToFeature table. DISTINCT causes a performance hit because it essentially builds a temp table behind the scenes to check each record for uniqueness against the rest of the result set; you take a pretty big hit as your tables grow.
It will look something similar to what I have below, but without sample data or anything I can't tell if it's exactly right.
Select S.SolutionID, S.SolutionName, S.Enabled
From dbo.Solution S
Where Exists (
select 1
from dbo.SolutionToFeature sf
Inner Join dbo.SolutionFeatureToUsergroup SFG on sf.SolutionToFeatureID = SFG.SolutionToFeatureID
Inner Join dbo.UserGroup UG on sfg.UserGroupID = UG.UserGroupID
Where S.SolutionID = sf.SolutionID
and UG.SiteID = @SiteID
)
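Here is a Python/sqlite3 sketch of that EXISTS rewrite on a toy dataset (the table contents are invented), showing it returns only the solutions reachable from a site's user groups:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.executescript("""
CREATE TABLE Solution (SolutionID INTEGER, SolutionName TEXT, Enabled INTEGER);
CREATE TABLE SolutionToFeature (SolutionToFeatureID INTEGER,
                                SolutionID INTEGER);
CREATE TABLE SolutionFeatureToUsergroup (SolutionToFeatureID INTEGER,
                                         UsergroupID INTEGER);
CREATE TABLE Usergroup (UsergroupID INTEGER, SiteID INTEGER);
INSERT INTO Solution VALUES (1, 'Reporting', 1), (2, 'Billing', 1);
INSERT INTO SolutionToFeature VALUES (10, 1);
INSERT INTO SolutionFeatureToUsergroup VALUES (10, 100);
INSERT INTO Usergroup VALUES (100, 5);
""")

site_id = 5
# Joins inside a single EXISTS replace the three nested IN subqueries;
# no DISTINCT is needed because EXISTS only asks whether a row exists.
rows = cur.execute("""
    SELECT s.SolutionID, s.SolutionName, s.Enabled
    FROM Solution s
    WHERE EXISTS (
        SELECT 1
        FROM SolutionToFeature sf
        JOIN SolutionFeatureToUsergroup sfg
          ON sf.SolutionToFeatureID = sfg.SolutionToFeatureID
        JOIN Usergroup ug ON sfg.UsergroupID = ug.UsergroupID
        WHERE s.SolutionID = sf.SolutionID
          AND ug.SiteID = ?
    )
""", (site_id,)).fetchall()
print(rows)  # [(1, 'Reporting', 1)]
```

'Billing' is correctly excluded because no feature links it to a user group at site 5.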