SQL Server 2012 performance issue using FULLTEXT

I'm using SQL Server 2012 Standard and I have a performance issue when using the CONTAINS predicate in a query.
My query:
select *
from
calles as c
INNER JOIN
colonias as col ON c.ID_Colonia = col.ID_Colonia
where
CONTAINS(c.Nombre, @Busqueda) OR CONTAINS(col.Nombre, @Busqueda)
If I use only one CONTAINS, the search takes about 200 ms, but if I use both it takes about 10 s (far too long). I tried a workaround using UNION, like this:
select *
from
calles as c
INNER JOIN
colonias as col ON c.ID_Colonia = col.ID_Colonia
where
CONTAINS(c.Nombre, @Busqueda)
UNION
select *
from
calles as c
INNER JOIN
colonias as col ON c.ID_Colonia = col.ID_Colonia
where
CONTAINS(col.Nombre, @Busqueda)
And the query time is about 200 ms again. But I think the second version is clumsy. Am I doing something wrong?

FULLTEXT index in SQL Server is a service which is (kinda) external to the RDBMS engine.
It accepts the search string and returns a list of key values from the table (which then need to be joined with the table itself to be sure they're still there).
So in effect you are joining two more tables in your query and applying an OR condition to the result of the join.
SQL Server's optimizer is not especially smart when it comes to constructs like this.
Replacing an OR condition with a UNION is a legitimate and commonly used optimization technique.
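As a rough illustration of the OR-to-UNION rewrite, here is a minimal sketch. It makes two loud assumptions: SQLite (via Python's sqlite3 module) stands in for SQL Server, and LIKE stands in for CONTAINS, so it only shows that the two query shapes return the same rows, not how either one performs. The ID_Calle column and the sample data are made up for the example.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server, LIKE stands in for CONTAINS.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE colonias (ID_Colonia INTEGER PRIMARY KEY, Nombre TEXT);
    CREATE TABLE calles (ID_Calle INTEGER PRIMARY KEY, Nombre TEXT,
                         ID_Colonia INTEGER REFERENCES colonias(ID_Colonia));
    INSERT INTO colonias VALUES (1, 'Centro'), (2, 'Juarez');
    INSERT INTO calles VALUES (10, 'Juarez', 1), (11, 'Reforma', 2),
                              (12, 'Hidalgo', 1), (13, 'Juarez Norte', 2);
""")

BASE = """
    SELECT c.ID_Calle, c.Nombre, col.ID_Colonia, col.Nombre
    FROM calles AS c
    INNER JOIN colonias AS col ON c.ID_Colonia = col.ID_Colonia
"""

# Single query with an OR across both tables.
or_query = BASE + " WHERE c.Nombre LIKE :q OR col.Nombre LIKE :q"

# Same search split into two branches; UNION removes rows found by both.
union_query = (BASE + " WHERE c.Nombre LIKE :q UNION "
               + BASE + " WHERE col.Nombre LIKE :q")

params = {"q": "%Juarez%"}
or_rows = sorted(conn.execute(or_query, params).fetchall())
union_rows = sorted(conn.execute(union_query, params).fetchall())

print(or_rows == union_rows)
```

Because every row carries its primary key, the duplicate removal done by UNION only drops rows matched by both branches, which is why the two result sets agree here.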

Related

SQL INNER JOIN vs WHERE IN big performance difference

I have read multiple sources and still don't understand where a big difference comes from in a query I have for Microsoft SQL Server.
I need to count different alerts linked to vehicles (IdMateriel is synonymous with the id of the vehicle) based on types (CodeAlerte), state (Etat), and a Top true/false column, but before the different counts I need to select the data.
TL;DR: There are two parameters: the current date as a SQL DATETIME, and a VARCHAR(MAX) string of comma-separated entity codes, which I split using STRING_SPLIT and use either in a WHERE IN clause or in an INNER JOIN. The WHERE IN version is ~10x faster than the INNER JOIN version, although the two seem equivalent to me. Why?
First, the queries are based on the view created as follows:
CREATE OR ALTER VIEW [dbo].[AlertesVehicules]
WITH SCHEMABINDING
AS
SELECT dbo.Alerte.IdMateriel, dbo.Materiel.EntiteGestion, dbo.Alerte.IdTypeAlerte,
dbo.TypeAlerte.CodeAlerte, dbo.TypeAlerte.TopAlerteMajeure, dbo.Alerte.Etat,
Vehicule.Top, Vehicule.EtatVehicule,COUNT_BIG(*) AS COUNT
FROM dbo.Alerte
INNER JOIN dbo.Materiel on dbo.Alerte.IdMateriel= dbo.Materiel.Id
INNER JOIN dbo.Vehicule on dbo.Vehicule.Id= dbo.Materiel.Id
INNER JOIN dbo.TypeAlerte on dbo.Alerte.IdTypeAlerte = dbo.TypeAlerte.Id
WHERE dbo.Materiel.EntiteGestion is NOT NULL
AND dbo.TypeAlerte.CodeAlerte IN ('P07','P08','P09','P11','P12','P13','P14')
GROUP BY dbo.Alerte.IdMateriel, dbo.Materiel.EntiteGestion, dbo.Alerte.IdTypeAlerte,
dbo.TypeAlerte.CodeAlerte, dbo.TypeAlerte.TopAlerteMajeure, dbo.Alerte.Etat,
Vehicule.Top, Vehicule.EtatVehicule
GO
CREATE UNIQUE CLUSTERED INDEX IX_AlerteVehicule
ON dbo.AlertesVehicules (EntiteGestion,IdMateriel,CodeAlerte,Etat,TopAlerteMajeure);
GO
This first version of the query takes ~100ms:
SELECT DISTINCT a.IdMateriel, a.CodeAlerte, a.Etat, a.Top INTO #tmpTabAlertes
FROM dbo.AlertesVehicules a
LEFT JOIN tb_AFFECTATION_SECTION ase ON a.IdMateriel = ase.ID_Vehicule
INNER JOIN (SELECT value AS entiteGestion FROM STRING_SPLIT(@entiteGestion, ',')) eg
ON a.EntiteGestion = eg.entiteGestion
WHERE
a.CodeAlerte IN ('P08','P09')
AND @currentDate <= ISNULL(ase.DateFin, @currentDate)
According to SQL Sentry Plan Explorer, the actual execution plan starts with an index seek taking ~30% of the time on dbo.Alerte with the predicate Alerte.IdTypeAlerte=TypeAlerte.Id, outputting 369 000 rows of Etat, IdMateriel and IdTypeAlerte, which it then directly filters down to 7 742 based on predicate PROBE(Opt_Bitmapxxxx,Alerte.IdMateriel), and then inner joins taking ~25% of the time with the 2 results of another index seek of TypeAlerte but with predicates TypeAlerte.CodeAlerte = N'P08' and =N'P09'. So just these two parts take > 50ms, but I don't understand why there are so many initial results.
The second version takes ~10 ms:
SELECT DISTINCT a.IdMateriel, a.CodeAlerte, a.Etat, a.Top INTO #tmpTab
FROM dbo.AlertesVehicules a
LEFT JOIN tb_AFFECTATION_SECTION ase ON a.IdMateriel = ase.ID_Vehicule
WHERE
a.EntiteGestion IN (SELECT value FROM STRING_SPLIT(@entiteGestion, ','))
AND a.CodeAlerte IN ('P08','P09')
AND (@currentDate <= ISNULL(ase.DateFin, @currentDate))
For this one, SQL Sentry Plan Explorer starts with a View Clustered Index Seek directly on the view AlertesVehicules, with Seek Predicates AlertesVehicules.EntiteGestion > Exprxxx1 and < Exprxxx2, and Predicate AlertesVehicules.CodeAlerte = N'P08' and = N'P09'.
Why are those two treated so differently when it seems to me that they are exactly equivalent?
For reference, here are some threads I already looked into, but I didn't find an explanation in them (other than that there shouldn't be a difference):
SQL JOIN vs IN performance?
Performance gap between WHERE IN (1,2,3,4) vs IN (select * from STRING_SPLIT('1,2,3,4',','))
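One detail that may be worth checking, offered as an observation rather than a diagnosis of the plans above: an INNER JOIN against a split list and a WHERE IN against the same list are not strictly equivalent when the list contains a repeated value, because the join returns one row per match. The sketch below assumes SQLite in place of SQL Server, a Python str.split in place of STRING_SPLIT, and a hypothetical two-column alertes table.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; table "alertes" is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE alertes (IdMateriel INTEGER, EntiteGestion TEXT);
    INSERT INTO alertes VALUES (1, 'A'), (2, 'B'), (3, 'C');
    CREATE TEMP TABLE eg (entiteGestion TEXT);
""")

# Stand-in for STRING_SPLIT(@entiteGestion, ','); note the repeated 'A'.
entite_gestion = "A,A,B"
conn.executemany("INSERT INTO eg VALUES (?)",
                 [(v,) for v in entite_gestion.split(",")])

# The join produces one output row per matching list entry.
join_rows = conn.execute("""
    SELECT a.IdMateriel FROM alertes a
    INNER JOIN eg ON a.EntiteGestion = eg.entiteGestion
    ORDER BY a.IdMateriel
""").fetchall()

# IN is a semi-join: each alertes row appears at most once.
in_rows = conn.execute("""
    SELECT a.IdMateriel FROM alertes a
    WHERE a.EntiteGestion IN (SELECT entiteGestion FROM eg)
    ORDER BY a.IdMateriel
""").fetchall()

print(join_rows, in_rows)
```

In the queries above the DISTINCT hides this difference in the final output, but the two forms still describe different intermediate results.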

How to speed up a SQL query which has an IN clause that contains 4,000 elements?

I am using python to generate a query text which I then send to the SQL server. The query is created in a function that accepts a list of strings which are then inserted into the query.
The query looks like:
SELECT *
FROM DB
WHERE last_word in ('red', 'phone', 'robin')
The issue is that here I have just 3 words, red, phone, and robin, but in another use case I have over 4,000 words and the response takes about 2 hours. How can I rewrite this query to make it more performant?
Optimization strategies:
Add an index on last_word:
CREATE INDEX ON db(last_word)
Store the filter words in a table and use WHERE EXISTS (or an inner join):
WITH words (word) AS (
VALUES ('red'), ('phone'), ('robin')
)
SELECT *
FROM db
WHERE EXISTS (SELECT TRUE FROM words WHERE word = last_word)
or
WITH words (word) AS (
VALUES ('red'), ('phone'), ('robin')
)
SELECT db.*
FROM db
JOIN words ON db.last_word = words.word
The WHERE EXISTS here should be slightly faster than JOIN
How many rows do you have in "DB"? Do more "last_word" values match the 4000 words in the IN clause than not? If so, it would be better to use NOT IN, to exclude instead of include. Also, try to avoid SELECT *, since the wildcard tends to hurt performance; it's better to explicitly list the columns you want in your query.
You could also try putting the 4000 words to match on in a (temporary) table or a CTE and then joining on it, since joins usually work better than a large list of values inside the IN clause. Even then, I still recommend not using the wildcard in the SELECT statement.
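Since the query is generated from Python, the temporary-table approach might look like the sketch below. It is only an illustration under stated assumptions: SQLite (sqlite3) stands in for the real server, the id column, the filter_words table, and the rows_matching_words helper are hypothetical names, and INSERT OR IGNORE is SQLite syntax that would need a different spelling elsewhere.

```python
import sqlite3

def rows_matching_words(conn, words):
    """Return rows of db whose last_word is in `words`, via a temp-table join."""
    conn.execute(
        "CREATE TEMP TABLE IF NOT EXISTS filter_words (word TEXT PRIMARY KEY)")
    conn.execute("DELETE FROM filter_words")
    # Bound parameters: no giant IN (...) literal and no manual quoting.
    conn.executemany("INSERT OR IGNORE INTO filter_words VALUES (?)",
                     [(w,) for w in words])
    return conn.execute("""
        SELECT db.id, db.last_word
        FROM db
        INNER JOIN filter_words fw ON db.last_word = fw.word
        ORDER BY db.id
    """).fetchall()

# Assumption: a tiny in-memory table stands in for the real "DB" table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE db (id INTEGER PRIMARY KEY, last_word TEXT);
    CREATE INDEX ix_db_last_word ON db (last_word);
    INSERT INTO db (last_word)
    VALUES ('red'), ('blue'), ('phone'), ('robin'), ('red');
""")

result = rows_matching_words(conn, ["red", "phone", "robin", "red"])
print(result)
```

The primary key on filter_words de-duplicates the word list, so a repeated word cannot multiply rows in the join.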
Put your data into a temp table or CTE. This would make for easier addition of new data. Likewise, you’ll have to do an inner join to your source table to make sure you capture everything.
Hope this helps.
Try doing something like this:
SELECT *
FROM DB INNER JOIN WORDS_TABLE
ON DB.WORDS = WORDS_TABLE.WORDS;
Instead of the * use whatever you want to get.
JOIN in this case will be faster than the IN as you will have to write another inner query if you are using a table.

Left join Ignore

I have recently noticed that SQL Server 2016 appears to ignore left joins if no column from the joined table is used in the SELECT or WHERE clause. The joined tables do not appear in the actual execution plan either.
This is good, because an extra join that someone added does not affect performance.
I have a query that takes 9 seconds if I add a column from the left-joined tables to the SELECT clause, but only 1 second without it.
Can anyone confirm whether this is true?
Here is the query with its actual execution plan. You can see that no table from the left joins appears in the plan.
I'm not 100% sure what the question is asking, but a SQL optimizer can ignore left join. Consider this type of query:
select a.*
from a left join
b
on a.b_id = b.id;
If b.id is declared as unique (or equivalently a primary key) then the above query returns exactly the same result set as:
select a.*
from a;
I am not per se aware that SQL Server started putting this optimization in 2016. But the optimization is perfectly valid and (I believe) other optimizers do implement it.
Remember, SQL is a declarative language, not a procedural language. The SQL query describes the result set, not how it is produced.
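To make the uniqueness condition concrete, here is a small sketch. It assumes SQLite in place of SQL Server and uses made-up tables (b_dup is a hypothetical copy of b without a unique id). It only checks result sets; it says nothing about whether a given optimizer actually removes the join.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; all tables are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE b (id INTEGER PRIMARY KEY, label TEXT);
    CREATE TABLE b_dup (id INTEGER, label TEXT);  -- no uniqueness on id
    CREATE TABLE a (a_id INTEGER PRIMARY KEY, b_id INTEGER);
    INSERT INTO b VALUES (1, 'x'), (2, 'y');
    INSERT INTO b_dup VALUES (1, 'x'), (1, 'x2'), (2, 'y');
    INSERT INTO a VALUES (100, 1), (101, 2), (102, NULL), (103, 9);
""")

plain = conn.execute("SELECT a.* FROM a ORDER BY a.a_id").fetchall()

# b.id is unique: each row of a matches at most one row of b.
joined_unique = conn.execute("""
    SELECT a.* FROM a LEFT JOIN b ON a.b_id = b.id ORDER BY a.a_id
""").fetchall()

# b_dup.id is not unique: a row of a can match several rows.
joined_dup = conn.execute("""
    SELECT a.* FROM a LEFT JOIN b_dup ON a.b_id = b_dup.id ORDER BY a.a_id
""").fetchall()

print(plain == joined_unique, len(plain), len(joined_dup))
```

With the unique key the join cannot add or remove rows, so dropping it is safe; with duplicates the row for a_id 100 appears twice, so the join has to be executed.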
If you have a left join and your matching condition doesn't return any data from the joined table, it will return the same data an inner join would when the SELECT statement does not contain columns from the right-hand tables. This applies not only to SQL Server 2016 but to most databases.
A left join reduces query performance if the joined tables contain a large amount of data.

Access SQL Syntax error: missing operator

I am trying to convert a T-SQL query to MS Access SQL and getting a syntax error that I am struggling to find. My MS Access SQL query looks like this:
INSERT INTO IndvRFM_PreSort (CustNum, IndvID, IndvRScore, IndRecency, IndvFreq, IndvMonVal )
SELECT
IndvMast.CustNum, IndvMast.IndvID, IndvMast.IndvRScore,
IndvMast.IndRecency, IndvMast.IndvFreq, IndvMast.IndvMonVal
FROM
IndvMast
INNER JOIN
OHdrMast ON IndvMast.IndvID = OHdrMast.IndvID
INNER JOIN
MyParameterSettings on 1=1].ProdClass
INNER JOIN
[SalesTerritoryFilter_Check all that apply] ON IndvMast.SalesTerr = [SalesTerritoryFilter_Check all that apply].SalesTerr
WHERE
(((OHdrMast.OrdDate) >= [MyParameterSettings].[RFM_StartDate]))
GROUP BY
IndvMast.CustNum, IndvMast.IndvID, IndvMast.IndvRScore,
IndvMast.IndRecency, IndvMast.IndvFreq, IndvMast.IndvMonVal,
[CustTypeFilter_Check all that apply].IncludeInRFM,
[ProductClassFilter_Check all that apply].IncludeInRFM,
[SourceCodeFilter_Check all that apply].IncludeInRFM,
IndvMast.FlgDontUse
I have reviewed differences between MS Access SQL and T-SQL at http://rogersaccessblog.blogspot.com/2013/05/what-are-differences-between-access-sql.html and a few other locations but with no luck.
All help is appreciated.
update: I have removed many lines trying to find the syntax error and I am still getting the same error when running just (which runs fine using T-SQL):
SELECT
IndvMast.CustNum, IndvMast.IndvID, IndvMast.IndvRScore,
IndvMast.IndRecency, IndvMast.IndvFreq, IndvMast.IndvMonVal
FROM
IndvMast
INNER JOIN
OHdrMast ON IndvMast.IndvID = OHdrMast.IndvID
INNER JOIN
[My Parameter Settings] ON 1 = 1
There are a number of items in your query that should also have failed in any SQL-compliant database:
You have fields from tables in GROUP BY not referenced in FROM or JOIN clauses.
Number of fields in SELECT query do not match number of fields in INSERT INTO clause.
The MyParameterSettings table is not properly joined with valid ON expression.
Strictly MS Access SQL items:
For more than one join, MS Access SQL requires paired parentheses, and this can get tricky when some tables are joined together and their paired result is then joined to an outer table, producing nested joins.
Expressions like ON 1=1 must be used in the WHERE clause; for cross-joined tables, as MyParameterSettings appears to be, use comma-separated tables.
For above reasons and more, it is advised for beginners to this SQL dialect to use the Query Design builder providing table diagrams and links (if you have the MS Access GUI .exe of course). Then, once all tables connect graphically with at least one field selected, jump into SQL view for any nuanced scripting logic.
Below is an adjusted SQL statement that demonstrates the parentheses pairings and, as a best practice, uses table aliases, especially for long table names.
INSERT INTO IndvRFM_PreSort (CustNum, IndvID, IndvRScore, IndRecency, IndvFreq, IndvMonVal)
SELECT
i.CustNum, i.IndvID, i.IndvRScore, i.IndRecency, i.IndvFreq, i.IndvMonVal
FROM
[MyParameterSettings] p, (IndvMast i
INNER JOIN
OHdrMast o ON i.IndvID = o.IndvID)
INNER JOIN
[SalesTerritoryFilter_Check all that apply] s ON i.SalesTerr = s.SalesTerr
WHERE
(o.OrdDate >= p.[RFM_StartDate])
GROUP BY
i.CustNum, i.IndvID, i.IndvRScore, i.IndRecency, i.IndvFreq, i.IndvMonVal
And in your smaller SQL subset, the last table does not need an ON 1=1 condition, which may be redundant in SQL Server as well. A comma-separated table will suffice if you intend a cross join. The same is done in the example above:
SELECT
IndvMast.CustNum, IndvMast.IndvID, IndvMast.IndvRScore,
IndvMast.IndRecency, IndvMast.IndvFreq, IndvMast.IndvMonVal
FROM
[My Parameter Settings], IndvMast
INNER JOIN
OHdrMast ON IndvMast.IndvID = OHdrMast.IndvID
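The claim that a comma-separated table behaves like a join with ON 1=1 can be checked with a short sketch. Note the assumption: this runs on SQLite through Python's sqlite3, not on MS Access, so it illustrates the cross-join semantics only and does not validate Access syntax rules. The sample dates are made up.

```python
import sqlite3

# Assumption: SQLite stands in for Access / SQL Server; data is illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MyParameterSettings (RFM_StartDate TEXT);
    CREATE TABLE OHdrMast (IndvID INTEGER, OrdDate TEXT);
    INSERT INTO MyParameterSettings VALUES ('2020-01-01');
    INSERT INTO OHdrMast VALUES
        (1, '2019-06-30'), (2, '2020-01-01'), (3, '2021-03-15');
""")

# Cross join written with a comma; the filter lives in WHERE.
comma_rows = conn.execute("""
    SELECT o.IndvID
    FROM MyParameterSettings p, OHdrMast o
    WHERE o.OrdDate >= p.RFM_StartDate
    ORDER BY o.IndvID
""").fetchall()

# Same cross join written as INNER JOIN ... ON 1 = 1.
on_true_rows = conn.execute("""
    SELECT o.IndvID
    FROM OHdrMast o
    INNER JOIN MyParameterSettings p ON 1 = 1
    WHERE o.OrdDate >= p.RFM_StartDate
    ORDER BY o.IndvID
""").fetchall()

print(comma_rows, on_true_rows)
```

With a one-row settings table, either form simply makes the parameter columns available to every row of the other table.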
I suppose there are some errors in your query. The first (and more important):
Why do you use a HAVING clause to add these conditions?
HAVING (((IndvMast.IndRecency)>(date()-7200))
AND (([CustTypeFilter_Check all that apply].IncludeInRFM)=1)
AND (([ProductClassFilter_Check all that apply].IncludeInRFM)=1)
AND (([SourceCodeFilter_Check all that apply].IncludeInRFM)=1)
AND ((IndvMast.FlgDontUse) Is Null))
HAVING is usually used for conditions on aggregate functions (COUNT, SUM, MAX, MIN, AVG); conditions on scalar values must go in the WHERE clause.
The second: you have 12 parentheses opened and 11 closed in the HAVING clause.
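The WHERE versus HAVING split can be sketched as follows. This is a minimal example under assumptions: SQLite (sqlite3) rather than Access, a cut-down IndvMast table with made-up rows, and an arbitrary threshold of 8 chosen only for illustration.

```python
import sqlite3

# Assumption: SQLite stands in for Access; table contents are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE IndvMast (IndvID INTEGER, FlgDontUse INTEGER, IndvMonVal REAL);
    INSERT INTO IndvMast VALUES
        (1, NULL, 10), (1, NULL, 15), (2, 1, 50), (3, NULL, 5);
""")

rows = conn.execute("""
    SELECT IndvID, SUM(IndvMonVal) AS total
    FROM IndvMast
    WHERE FlgDontUse IS NULL      -- scalar condition: filters rows before grouping
    GROUP BY IndvID
    HAVING SUM(IndvMonVal) > 8    -- aggregate condition: filters whole groups
    ORDER BY IndvID
""").fetchall()

print(rows)
```

Individual 2 is removed by WHERE before any grouping happens, while individual 3 survives WHERE and is then dropped by HAVING because its total is too small.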

How do I force MS SQL Server to perform an index join?

I'm working on an assignment where I'm supposed to compare different join methods in SQL Server, namely hash-join, merge-join and index-join.
I'm having difficulties getting SQL Server to perform an index-join. Can anyone show me how I can force it to use an index-join (using a join hint or similar), or just simply provide a simple query with a join on which SQL server uses the index-join method?
SQL Server has only loop, hash and merge joins (see BOL). There are no index joins.
For more than you ever needed to know, see Craig Freedman's series on JOINs (he's one of the team that designed the relational engine for SQL Server).
You can have an Index hint on straight select, but I'm not sure that the same syntax is available for a join.
SELECT blah FROM table WITH (INDEX (index_name))
You could use this in a non-ANSI (?) join; the hint goes right after the table it applies to:
SELECT blah FROM TABLE1, TABLE2 WITH (INDEX (index_name))
WHERE TABLE2.ForiegnKeyID = TABLE1.ID
Join with an index hint:
SELECT
ticket.ticket_id
FROM
purchased_tickets
JOIN ticket WITH (INDEX ( ticket_ix3))
ON ticket.original_ticket_id = purchased_tickets.ticket_id
AND ticket.paid_for = 1
AND ticket.punched = 0
WHERE purchased_tickets.seller_id = @current_user
OPTION (KEEPFIXED PLAN);
I'm having trouble finding such terminology in SQL server.
http://en.wikipedia.org/wiki/Join_(SQL)#Join_algorithms
Are you just looking for a nested loop that uses indexes, resulting in an index seek?
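For what "a nested loop that uses indexes" means as an algorithm, here is a textbook-style sketch in plain Python. It is not SQL Server's implementation: the sorted list is a toy stand-in for a B-tree index, and the function names and sample rows are invented for the example (only the ticket_id and original_ticket_id column names come from the query above).

```python
from bisect import bisect_left

def build_index(rows, key):
    """Sorted list of (key value, row position): a toy stand-in for a B-tree."""
    return sorted((row[key], pos) for pos, row in enumerate(rows))

def index_seek(index, value):
    """Yield positions of rows whose key equals `value`, using binary search."""
    i = bisect_left(index, (value,))
    while i < len(index) and index[i][0] == value:
        yield index[i][1]
        i += 1

def index_nested_loop_join(outer, inner, outer_key, inner_key):
    """For each outer row, seek matching inner rows through the index."""
    index = build_index(inner, inner_key)
    for o in outer:
        for pos in index_seek(index, o[outer_key]):
            yield o, inner[pos]

# Illustrative rows only.
purchased_tickets = [{"ticket_id": 1}, {"ticket_id": 2}, {"ticket_id": 4}]
ticket = [
    {"ticket_id": 10, "original_ticket_id": 2},
    {"ticket_id": 11, "original_ticket_id": 1},
    {"ticket_id": 12, "original_ticket_id": 2},
]

pairs = [
    (o["ticket_id"], t["ticket_id"])
    for o, t in index_nested_loop_join(
        purchased_tickets, ticket, "ticket_id", "original_ticket_id")
]
print(pairs)
```

The outer table is scanned once and each outer row costs one binary search on the inner side, which is the shape a loop join with an index seek on the inner table takes in an execution plan.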