performance impact of making where clause dummy - SQL Server

I need to know about the performance impact of the below method of writing the query.
Assume there is an employee table. The requirement is to get a list of employees in a particular department; optionally, the user can filter the result set by providing a city/location.
declare @dept varchar(10) = 'ABC', @city varchar(10)
select * from employee where dept = @dept and city = isnull(@city, city)
Is this fine? Or do we need to use traditional IF logic to check whether the user provided a city as input?
Thanks,
Sabarish.

I remember reading somewhere that the following syntax is quicker than calling ISNULL():
select * from employee where dept = @dept and (@city IS NULL OR @city = city)
It was something to do with the SQL compiler effectively knowing that it can ignore the expression in brackets if @city is null.
Sorry but no idea where I read this (it was some time ago), otherwise I would cite it properly.
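Worth noting (this is a common addition, not something from the answers in this thread): with either form, a single cached plan has to serve both the NULL and the non-NULL case of @city. Adding OPTION (RECOMPILE) lets the optimizer evaluate @city at execution time and simplify the predicate away when a city is supplied, at the cost of a compile on every run. A minimal sketch:
declare @dept varchar(10) = 'ABC', @city varchar(10)

-- option (recompile) makes SQL Server compile a fresh plan using the actual
-- value of @city, so "@city is null or ..." can collapse to a plain seek predicate
select *
from employee
where dept = @dept
  and (@city is null or city = @city)
option (recompile)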

The most powerful approach to solving performance problems with NULLs is to avoid NULLs by using default values. In your case it would be good to try something like:
declare @dept varchar(10) = 'ABC', @city varchar(10) = 'unknown'
SELECT *
FROM employee
WHERE dept = @dept AND
@city = 'unknown'
UNION
SELECT *
FROM employee
WHERE dept = @dept AND
city = @city AND
@city != 'unknown'
Why?
The cardinality estimator is not able to estimate the correct number of rows the query returns, which means the execution plan can be bad for this particular query. Avoid NULLs and everything will be great B-)

For sure the answer provided by @Jonathan will improve performance if the 'City' column has a separate nonclustered index on it. If not, both execution plans will lead to a SCAN. If you have a nonclustered index, then Jonathan's approach will do a SEEK instead of a SCAN, which is good in terms of performance.
Let me try to explain why with a sample, as in the table below. For ease of use I did not consider the two predicates dept and city; I am considering only city.
Consider below Employee table:
CREATE TABLE [dbo].[Employee](
[EmployeeId] [int] NULL,
[EmployeeName] [varchar](20) NULL,
[Dept] [varchar](15) NULL,
[city] [varchar](15) NULL
) ON [PRIMARY]
GO
--Creating Clustered Index on Id
CREATE CLUSTERED INDEX [CI_Employee_EmployeeId] ON [dbo].[Employee] ( [EmployeeId] ASC)
--Loading Data
Insert into Employee
Select top (10000) EmployeeId = Row_Number() over (order by (Select NULL))
,EmployeeName = Concat ('Name ',Row_Number() over (order by (Select NULL)))
,Dept = Concat ('Dept ',(Row_Number() over (order by (Select NULL))) % 50)
,City = Concat ('City ',Row_Number() over (order by (Select NULL)))
from master..spt_values s1, master..spt_values s2
Now executing a simple query with a normal predicate:
Declare @city varchar(15) = 'City 1500'
Select * from Employee where city = @city
--It Does Clustered Index Scan
Now creating a non-clustered index on city:
--Now adding Index on City
Create NonClustered Index NCI_Employee_City on dbo.Employee (city)
Declare @city varchar(15) = 'City 1500'
Select * from Employee where city = @city
--It Does Index Seek
Now coming to your ISNULL function:
Since it forces the function to be evaluated against each row's city value, it uses a SCAN, as below:
Declare @city varchar(15) = 'City 1500'
Select * from Employee where city = isnull(@city, City)
go
Declare @city varchar(15) = 'City 1500'
Select * from Employee where @city is null or city = @city
If you look at the overall cost percentages, the ISNULL version takes more.
So if you have an index, all of this will be helpful; otherwise it is going to be a scan anyway.
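If you want to reproduce the comparison yourself, logical reads from SET STATISTICS IO are a quick proxy for the scan/seek difference (a minimal sketch against the sample table above; exact numbers will vary):
SET STATISTICS IO ON

Declare @city varchar(15) = 'City 1500'
Select * from Employee where city = isnull(@city, City)     -- expect a scan
go
Declare @city varchar(15) = 'City 1500'
Select * from Employee where @city is null or city = @city  -- may still scan without a recompile hint
go
Declare @city varchar(15) = 'City 1500'
Select * from Employee where city = @city                   -- expect a seek via NCI_Employee_City
go
SET STATISTICS IO OFF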

Related

SQL Server pivot values with different data types

I am trying to pivot values of different types in MSSQL 2016, and I could not find a way to pivot the different data types.
The first table is the initial form/structure. The second table is the desired shape.
I was trying the following SQL code to pivot my values
SELECT
[id] AS [id],
FIRSTNAME,
LASTNAME,
BIRTHDATE,
ADDRESS,
FLAG,
NUMBER
FROM (
SELECT
[cm].[key] AS [id],
[cm].[column] AS [column],
[cm].[string] AS [string],
[cm].[bit] AS [bit],
[cm].[xml] AS [xml],
[cm].[number] AS [number],
[cm].[date] AS [date]
FROM [cmaster] AS [cm]
) AS [t]
PIVOT (
MAX([string]) --!?!?
FOR [column] IN (
FIRSTNAME,
LASTNAME,
BIRTHDATE,
ADDRESS,
FLAG,
NUMBER
)
) AS [p]
I think your best bet is to use conditional aggregation, e.g.
SELECT cm.id,
FIRSTNAME = MAX(CASE WHEN cm.[property] = 'firstname' THEN cm.[string] END),
LASTNAME = MAX(CASE WHEN cm.[property] = 'lastname' THEN cm.[string] END),
BIRTHDATE = MAX(CASE WHEN cm.[property] = 'birthdate' THEN cm.[date] END),
FLAG = CONVERT(BIT, MAX(CASE WHEN cm.[property] = 'flag' THEN CONVERT(TINYINT, cm.[boolean]) END)),
NUMBER = MAX(CASE WHEN cm.[property] = 'number' THEN cm.[integer] END)
FROM cmaster AS cm
GROUP BY cm.id;
Although, as you can see, your query becomes very tightly coupled to your EAV model, which is why EAV is considered an SQL antipattern. Your alternative is to create a single value column in your subquery and pivot on that, but you have to convert everything to a single data type and lose a bit of type safety:
SELECT id, FIRSTNAME, LASTNAME, BIRTHDATE, ADDRESS, FLAG, NUMBER
FROM (
SELECT id = cm.[key],
[column] = cm.[column],
Value = CASE cm.type
WHEN 'NVARCHAR' THEN cm.string
WHEN 'DATETIME' THEN CONVERT(NVARCHAR(MAX), cm.date, 112)
WHEN 'XML' THEN CONVERT(NVARCHAR(MAX), cm.xml)
WHEN 'BIT' THEN CONVERT(NVARCHAR(MAX), cm.boolean)
WHEN 'INT' THEN CONVERT(NVARCHAR(MAX), cm.integer)
END
FROM cmaster AS cm
) AS t
PIVOT
(
MAX(Value)
FOR [column] IN (FIRSTNAME, LASTNAME, BIRTHDATE, ADDRESS, FLAG, NUMBER)
) AS p;
To produce the result you asked for, we first need to bring the data into one format that is compatible with all the data types; VARCHAR is ideal for that. Then prepare the base table using a simple SELECT query, and PIVOT the result.
In the final projection you can, if you want, convert the data back into its original types.
This query can also be written dynamically so it keeps working as records are added. Here I provide the static answer according to your data; a rough dynamic sketch follows the static query below.
--data insert scripts I used:
CREATE TABLE First_Table
(
[id] int,
[column] VARCHAR(10),
[string] VARCHAR(20),
[bit] BIT,
[xml] [xml],
[number] INT,
[date] DATE
)
INSERT INTO First_Table VALUES(1, 'FIRST NAME', 'JOHN' , NULL, NULL, NULL, NULL)
INSERT INTO First_Table VALUES(1, 'LAST NAME', 'DOE' , NULL, NULL, NULL, NULL)
INSERT INTO First_Table VALUES(1, 'BIRTH DATE', NULL , NULL, NULL, NULL, '1985-02-25')
INSERT INTO First_Table VALUES(1, 'ADDRESS', NULL , NULL, 'SDFJDGJOKGDGKPDGKPDKGPDKGGKGKG', NULL, NULL)
INSERT INTO First_Table VALUES(1, 'FLAG', NULL , 1, NULL, NULL, NULL)
INSERT INTO First_Table VALUES(1, 'NUMBER', NULL , NULL, NULL, 20, NULL)
SELECT PIVOTED.*
FROM
(
    --MAKING THE BASE TABLE FOR PIVOT
    SELECT
        [id]
        ,[column] AS [COLUMN]
        ,CASE WHEN [column] = 'FIRST NAME' THEN [string]
              WHEN [column] = 'LAST NAME'  THEN [string]
              WHEN [column] = 'BIRTH DATE' THEN CAST([date] AS VARCHAR(100))
              WHEN [column] = 'ADDRESS'    THEN CAST([xml] AS VARCHAR(100))
              WHEN [column] = 'FLAG'       THEN CAST([bit] AS VARCHAR(100))
              ELSE CAST([number] AS VARCHAR(100)) END AS [VALUE]
    FROM First_Table
) AS [P]
PIVOT
(
    MIN([P].[VALUE])
    FOR [column] IN ([FIRST NAME],[LAST NAME],[BIRTH DATE],[ADDRESS],[FLAG],[NUMBER])
) AS PIVOTED
RESULT: (shown as an image in the original post)
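As mentioned above, the pivot can also be written dynamically so new [column] values are picked up automatically. A rough, hedged sketch (assuming the First_Table layout from this answer; SQL Server 2016, hence STUFF/FOR XML PATH rather than STRING_AGG):
DECLARE @cols NVARCHAR(MAX), @sql NVARCHAR(MAX);

-- Build the quoted pivot column list from the data, e.g. [ADDRESS],[BIRTH DATE],...
SELECT @cols = STUFF((
    SELECT DISTINCT ',' + QUOTENAME([column])
    FROM First_Table
    FOR XML PATH(''), TYPE
).value('.', 'NVARCHAR(MAX)'), 1, 1, '');

SET @sql = N'
SELECT PIVOTED.*
FROM (
    SELECT [id], [column],
           COALESCE([string],
                    CAST([date]   AS VARCHAR(100)),
                    CAST([xml]    AS VARCHAR(100)),
                    CAST([bit]    AS VARCHAR(100)),
                    CAST([number] AS VARCHAR(100))) AS [VALUE]
    FROM First_Table
) AS P
PIVOT (MIN(P.[VALUE]) FOR [column] IN (' + @cols + ')) AS PIVOTED;';

EXEC sp_executesql @sql;
QUOTENAME protects bracketed names such as [FIRST NAME], and COALESCE works here because each row populates exactly one of the typed columns.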
SQL:
SELECT
    ID,
    FIRSTNAME,
    ...,
    FLAG = CAST(FLAG AS INT),
    ...
FROM
(
    SELECT *
    FROM
    (
        SELECT
            f.ID,
            f.PROPERTY,
            f.STRING + f."INTEGER" + f.DATETIME + f.BOLLEAN + f.XML AS COLS
        FROM FIRSTTBL f
    )
    PIVOT(
        min(COLS) FOR PROPERTY IN
        (
            'firstname' AS firstname,
            'lastname' AS lastname,
            'birthdate' AS birthdate,
            'address' AS address,
            'flag' AS flag,
            'number' AS "NUMBER"
        )
    )
)
According to the original table, there is one and only one non-null value among the STRING, INTEGER, DATETIME, BOLLEAN and XML columns for any row, so we just need to get the first non-null value and assign it to the corresponding new column. It is not difficult to perform the transposition using the PIVOT function, except that we need to handle the different data types: SQL requires that each column have a consistent type. For this task we first convert the combined column values into a string, perform the row-to-column transposition, and then convert the strings back to the proper types. When there are a lot of columns the SQL statement can get tricky, and dynamic requirements are even harder to achieve.
Yet it is easy to write the code using the open-source esProc SPL:
A1: =connect("MSSQL")
A2: =A1.query#x("SELECT * FROM FIRSTTBL")
A3: =A2.pivot(ID;PROPERTY,~.array().m(4:).ifn();"firstname":"FIRSTNAME","lastname":"LASTNAME","birthdate":"BIRTHDAY","address":"ADDRESS","flag":"FLAG","number":"NUMBER")
SPL does not require that data in the same column have consistent type. It is easy for it to maintain the original data types while performing the transposition.

Updating 20 rows in a table is really slow

I can't figure out why updating only 21 rows in a table takes so much time.
Step 1: I'm creating #tempTable from the StagingTable (it will never have more than 20 rows of data)
CREATE TABLE #tempTable (
ID INT NULL,
UniqueID INT NULL,
ReportDate VARCHAR(15) NULL,
DOB Datetime NULL,
Weight VARCHAR(15) NULL,
Height VARCHAR(15) NULL)
INSERT INTO #tempTable (
ID,
UniqueID,
ReportDate,
DOB,
Weight,
Height)
SELECT
A.ID,
A.UniqueID,
A.ReportDate,
A.DOB,
A.Weight,
A.Height
FROM [testDB].[StagingTable] as A
WHERE A.UniqueID = '12345'
Step 2. Updating FinalTable:
UPDATE [Customers].[FinalTable]
SET ID = B.ID,
UniqueID = B.UniqueID,
ReportDate = B.ReportDate,
DOB = B.DOB,
Weight = B.Weight,
Height = B.Height
FROM #tempTable AS B
WHERE [Customers].[FinalTable].[ReportDate] = B.ReportDate
AND [Customers].[FinalTable].[DOB] = B.DOB
This query takes more than 30 minutes!
Is there any way to speed up this update process? Any ideas what I might be doing wrong?
I just want to add that the FinalTable has millions of rows...
Any help would be greatly appreciated.
Thanks!
If there are only ~20 matching rows, the update is slow because SQL Server has no way to find them other than scanning the millions of rows in [Customers].[FinalTable]. You want an index on the join columns of the big table, i.e. FinalTable(ReportDate, DOB):
create index idx_FinalTable_2 on [Customers].[FinalTable](ReportDate, DOB);
Also check that ReportDate and DOB are stored with the same data types in both tables; the temp table declares ReportDate as VARCHAR(15), and an implicit conversion can stop the index from being used.
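As a side note, the same update written with an explicit JOIN is easier to read and makes the matching predicate (and therefore the useful index) obvious. A sketch with the same semantics as the original statement:
-- Explicit-join form of the update; assumes the index above exists
UPDATE f
SET ID = b.ID,
    UniqueID = b.UniqueID,
    ReportDate = b.ReportDate,
    DOB = b.DOB,
    Weight = b.Weight,
    Height = b.Height
FROM [Customers].[FinalTable] AS f
INNER JOIN #tempTable AS b
    ON f.ReportDate = b.ReportDate
   AND f.DOB = b.DOB;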

If a table has an unindexed column with a 1 to many relationship to an indexed column, how to optimize a query for the unindexed column?

Suppose there is a two-column table MyTable with enough records that query optimization is relevant:
CorporationID int (unindexed)
BatchID int (indexed)
And let's assume there is always a 1-to-many relationship between CorporationID and BatchID. In other words, for each BatchID there will be only one CorporationID, but for each CorporationID there will be many BatchID values.
We need to get all BatchID values where corporationID = 1.
I know the simplest solution may be to just add an index to CorporationID, but assuming that is not allowed, is there some other way to inform SQL that each BatchID corresponds to only 1 CorporationID, through a query or otherwise?
select distinct batchid from MyTable where corporationID = 1
It seems this is not effective.
select batchid from (select min(corporationid) corporationid, batchid
from MyTable group by batchid) subselect where corporationid = 1
This is also not effective, I assume due to SQL needing to iterate needlessly through all values of corporationid? (Does an aggregate function exist to select any() value which would not have the overhead of min(), max(), sum() or avg()??)
select batchid
from (
select corporationid, batchid
from (
select *, ROW_NUMBER() OVER (PARTITION BY batchid ORDER BY(SELECT NULL)) AS RowNumber
from mytable
) subselect
where RowNumber = 1
) subselect2
where corporationid = 1
Would this work? By arbitrarily selecting the corporationid related to row number 1 after partitioning by batchid with no order?
"assuming it is not allowed to create an index" - this is a highly unlikely assumption. Of course, you should create the index.
The most direct answer to your alternate questions that lie within your question is "no". There is no function or sub query or view or other "read" action you can make to get a list of the batches for a given CorpID. You NEED to access the corpID data to do that... all your sample queries do not work because, at some point, they NEED to access the CorpIDs to know which rows to gather for BatchIDs. Any summary or "rollup" function that might exist would still NEED to access all the pages of data to "see" them. The reading of the pages cannot be avoided.
Without changes to your architecture, it's not physically possible to optimize your query further.
However, with some changes, you could have some options (but I'd guess they are much uglier than just adding the index). For instance, you could modify the structure of your BatchID to include data for both the BatchID and the CorpID. Something like "8888899999999"... the 9's are the BatchID and the 8's are the CorpID. This doesn't win you much, though; you're not saving any index space, but at least you don't have to index the CorpID field :) Things like this could be done, but I won't share any others. I don't want the really experienced people here to see this stuff and get ill. :) See the sketch below for the idea.
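To make that concrete, here is a hedged sketch of the encoding idea (the multiplier and key shape are invented for illustration). Because the CorpID occupies the high-order digits of the indexed BatchID, one corporation's batches become a contiguous range that the existing index can seek:
-- Hypothetical encoding: BatchID = CorpID * 100000000 + batch sequence number
-- (requires BatchID to be wide enough, e.g. bigint, for the combined value)
DECLARE @CorpID bigint = 1;

SELECT BatchID
FROM MyTable
WHERE BatchID >= @CorpID * 100000000        -- start of this corporation's range
  AND BatchID <  (@CorpID + 1) * 100000000; -- start of the next corporation's range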
You need an index on CorpID if you want to improve performance.
If you don't have a lot of data, I suggest putting an index on the CorporationID column. But if you have a lot of data, you can define a filtered index for each CorporationID value:
Part 01=>
/*01Create DB*/
IF DB_ID('Test01')>0
BEGIN
ALTER DATABASE Test01 SET SINGLE_USER WITH ROLLBACK IMMEDIATE
DROP DATABASE Test01
END
GO
CREATE DATABASE Test01
GO
USE Test01
Go
Part 02=>
/*02Create table*/
CREATE TABLE Table01(
ID INT PRIMARY KEY IDENTITY,
Title NVARCHAR(100),
CreationDate DATETIME,
CorporationID INT ,
MyID INT ,
[GuidId1] [uniqueidentifier] NOT NULL,
[GuidId2] [uniqueidentifier] NOT NULL,
[Code] [nvarchar](50) NULL
)
ALTER TABLE [dbo].[Table01] ADD DEFAULT (GETDATE()) FOR [CreationDate]
GO
ALTER TABLE [dbo].[Table01] ADD DEFAULT (NEWSEQUENTIALID()) FOR [GuidId1]
GO
ALTER TABLE [dbo].[Table01] ADD DEFAULT (NEWID()) FOR [GuidId2]
GO
CREATE TABLE Table02(
ID INT PRIMARY KEY IDENTITY,
Title NVARCHAR(100),
CreationDate DATETIME,
CorporationID INT ,
MyID INT ,
[GuidId1] [uniqueidentifier] NOT NULL,
[GuidId2] [uniqueidentifier] NOT NULL,
[Code] [nvarchar](50) NULL
)
ALTER TABLE [dbo].[Table02] ADD DEFAULT (GETDATE()) FOR [CreationDate]
GO
ALTER TABLE [dbo].[Table02] ADD DEFAULT (NEWSEQUENTIALID()) FOR [GuidId1]
GO
ALTER TABLE [dbo].[Table02] ADD DEFAULT (NEWID()) FOR [GuidId2]
GO
Part 03=>
/*03Add Data*/
DECLARE @I INT = 1
WHILE @I < 1000000
BEGIN
DECLARE @Title NVARCHAR(100) = 'TITLE '+ CAST(@I AS NVARCHAR(10)),
@CorporationID INT = CAST((RAND()*20) + 1 AS INT),
@Code NVARCHAR(50) = 'CODE '+ CAST(@I AS NVARCHAR(10)),
@MyID INT = CAST((RAND()*50) + 1 AS INT)
INSERT INTO Table01 (Title , CorporationID , Code , MyID )
VALUES ( @Title , @CorporationID , @Code , @MyID)
SET @I += 1
END
INSERT INTO Table02 ([Title], [CreationDate], [CorporationID], [MyID], [GuidId1], [GuidId2], [Code])
SELECT [Title], [CreationDate], [CorporationID], [MyID], [GuidId1], [GuidId2], [Code] FROM Table01
Part 04=>
/*04 CREATE INDEX*/
CREATE NONCLUSTERED INDEX IX_Table01_ALL
ON Table01 (CorporationID) INCLUDE (MyID) ;
DECLARE @QUERY NVARCHAR(MAX) = ''
DECLARE @J INT = 1
WHILE @J < 21
BEGIN
SET @QUERY += '
CREATE NONCLUSTERED INDEX IX_Table02_'+CAST(@J AS NVARCHAR(5))+'
ON Table02 (CorporationID) INCLUDE (MyID) WHERE CorporationID = '+CAST(@J AS NVARCHAR(5))+';'
SET @J += 1
END
EXEC (@QUERY)
Part 05=>
/*05 READ DATA => PUSH Button CTRL + M ( EXECUTION PLAN) */
SET STATISTICS IO ON
SET STATISTICS TIME ON
SELECT * FROM [dbo].[Table01] WHERE CorporationID = 10 AND MyID = 25
SELECT * FROM [dbo].[Table01] WITH(INDEX(IX_Table01_ALL)) WHERE CorporationID = 10 AND MyID = 25
SELECT * FROM [dbo].[Table02] WITH(INDEX(IX_Table02_10)) WHERE CorporationID = 10 AND MyID = 25
SET STATISTICS IO OFF
SET STATISTICS TIME OFF
Notice the IO, TIME, and EXECUTION PLAN.
Good luck

How do I search for ALL words within ANY columns of multiple Full Text indexes?

If I have two full text indexes on tables such as Contacts and Companies, how can I write a query that ensures ALL the words of the search phrase exist within either of the two indexes?
For example, if I'm searching for contacts where all the keywords exist in either the contact record or the company, how would I write the query?
I've tried doing CONTAINSTABLE on both the contact and company tables and then joining the tables together, but if I pass the search phrase in to each as '"searchTerm1*" AND "searchTerm2*"' then it only matches when all the search words are on both indexes and returns too few records. If I pass it in like '"searchTerm1*" OR "searchTerm2*"' then it matches where any (instead of all) of the search words are in either of the indexes and returns too many records.
I also tried creating an indexed view that joins contacts to companies so I could search across all the columns in one shot, but unfortunately a contact can belong to more than one company and so the ContactKey that I was going to use as the key for the view is no longer unique and so it fails to be created.
It seems like maybe I need to break the phrase apart and query for each word separately and then join the results back together to be able to ensure all the words were matched on, but I can't think of how I'd write that query.
Here's an example of what the model could look like:
Contact         CompanyContact   Company
--------------  ---------------  ------------
ContactKey      ContactKey       CompanyKey
FirstName       CompanyKey       CompanyName
LastName
I have a Full Text index on FirstName,LastName and another on CompanyName.
This answer is rebuilt to address your issue such that multiple strings must exist ACROSS the fields. Note the single key in the CompanyContactLink linking table:
CREATE FULLTEXT CATALOG CompanyContact WITH ACCENT_SENSITIVITY = OFF
GO
CREATE TABLE Contact ( ContactKey INT IDENTITY, FirstName VARCHAR(20) NOT NULL, LastName VARCHAR(20) NOT NULL )
ALTER TABLE Contact ADD CONSTRAINT PK_Contact PRIMARY KEY NONCLUSTERED ( ContactKey )
CREATE TABLE Company ( CompanyKey INT IDENTITY, CompanyName VARCHAR(50) NOT NULL )
ALTER TABLE Company ADD CONSTRAINT PK_Company PRIMARY KEY NONCLUSTERED ( CompanyKey )
GO
CREATE TABLE CompanyContactLink ( CompanyContactKey INT IDENTITY NOT NULL, CompanyKey INT NOT NULL, ContactKey INT NOT NULL )
GO
INSERT INTO Contact ( FirstName, LastName ) VALUES ( 'Dipper', 'Pines' )
INSERT INTO Contact ( FirstName, LastName ) VALUES ( 'Mabel', 'Pines' )
INSERT INTO Contact ( FirstName, LastName ) VALUES ( 'Stanley', 'Pines' )
INSERT INTO Contact ( FirstName, LastName ) VALUES ( 'Soos', 'Ramirez' )
INSERT INTO Contact ( FirstName, LastName ) VALUES ( 'Wendy', 'Corduroy' )
INSERT INTO Contact ( FirstName, LastName ) VALUES ( 'Sheriff', 'Blubs' )
INSERT INTO Contact ( FirstName, LastName ) VALUES ( 'Bill', 'Cipher' )
INSERT INTO Contact ( FirstName, LastName ) VALUES ( 'Pine Dip', 'Nobody' )
INSERT INTO Contact ( FirstName, LastName ) VALUES ( 'Nobody', 'Pine Dip' )
INSERT INTO Company ( CompanyName ) VALUES ( 'Mystery Shack' )
INSERT INTO Company ( CompanyName ) VALUES ( 'Greesy Diner' )
INSERT INTO Company ( CompanyName ) VALUES ( 'Watertower' )
INSERT INTO Company ( CompanyName ) VALUES ( 'Manotaur Cave' )
INSERT INTO Company ( CompanyName ) VALUES ( 'Big Dipper Watering Hole' )
INSERT INTO Company ( CompanyName ) VALUES ( 'Lost Pines Dipping Pool' )
GO
INSERT INTO CompanyContactLink Values (3, 5), (1, 1), (1, 2), (1, 3), (1, 4), (1,5), (5,1), (3,1), (4,1)
GO
CREATE FULLTEXT INDEX ON Contact (LastName, FirstName)
KEY INDEX PK_Contact
ON CompanyContact
WITH STOPLIST = SYSTEM
CREATE FULLTEXT INDEX ON Company (CompanyName)
KEY INDEX PK_Company
ON CompanyContact
WITH STOPLIST = SYSTEM
GO
CREATE VIEW CompanyContactView
WITH SCHEMABINDING
AS
SELECT
CompanyContactKey,
CompanyName,
FirstName,
LastName
FROM
dbo.CompanyContactLink
INNER JOIN dbo.Company ON Company.CompanyKey = CompanyContactLink.CompanyKey
INNER JOIN dbo.Contact ON Contact.ContactKey = CompanyContactLink.ContactKey
GO
CREATE UNIQUE CLUSTERED INDEX idx_CompanyContactView ON CompanyContactView (CompanyContactKey);
GO
CREATE FULLTEXT INDEX ON CompanyContactView (CompanyName, LastName, FirstName)
KEY INDEX idx_CompanyContactView
ON CompanyContact
WITH STOPLIST = SYSTEM
GO
-- Wait a few moments for the FULLTEXT INDEXing to take place.
-- Check to see how the index is doing ... repeat the following line until you get a zero back.
DECLARE @ReadyStatus INT
SET @ReadyStatus = 1
WHILE (@ReadyStatus != 0)
BEGIN
WAITFOR DELAY '00:00:01' -- pause between polls instead of spinning
SELECT @ReadyStatus = FULLTEXTCATALOGPROPERTY('CompanyContact', 'PopulateStatus')
END
SELECT
CompanyContactView.*
FROM
CompanyContactView
WHERE
FREETEXT((FirstName,LastName,CompanyName), 'Dipper') AND
FREETEXT((FirstName,LastName,CompanyName), 'Shack')
GO
And for the sake of your example with Wendy at the Watertower:
SELECT
CompanyContactView.*
FROM
CompanyContactView
WHERE
FREETEXT((FirstName,LastName,CompanyName), 'Wendy') AND
FREETEXT((FirstName,LastName,CompanyName), 'Watertower')
GO
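If you need the prefix matching from the question (the '"searchTerm1*"' style terms), CONTAINS against the same view works the same way; a sketch:
-- Prefix search across all three indexed columns of the view
SELECT CompanyContactView.*
FROM CompanyContactView
WHERE CONTAINS((FirstName, LastName, CompanyName), '"Wen*"')
  AND CONTAINS((FirstName, LastName, CompanyName), '"Water*"');
GO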
I created a method that works with any number of full text indexes and columns. Using this method, it is very easy to add additional facets to search for:
1. Split the search phrase into rows in a temp table.
2. Join to this temp table to search for each search term using CONTAINSTABLE on each applicable full text index.
3. Union the results together and get the distinct count of the search terms found.
4. Filter out results where the number of search terms specified does not match the number of search terms found.
Example:
DECLARE @SearchPhrase nvarchar(255) = 'John Doe'
DECLARE @Matches TABLE(
MentionedKey int,
CoreType char(1),
Label nvarchar(1000),
Ranking int
)
-- Split the search phrase into separate words.
DECLARE @SearchTerms TABLE (Term NVARCHAR(100), Position INT)
INSERT INTO @SearchTerms (Term, Position)
SELECT dbo.ScrubSearchTerm(Term), Position -- Removes invalid characters and converts the words into search tokens for Full Text searching such as '"word*"'.
FROM dbo.SplitSearchTerms(@SearchPhrase)
-- Count the search words.
DECLARE @numSearchTerms int = (SELECT COUNT(*) FROM @SearchTerms)
-- Find the matching contacts.
;WITH MatchingContacts AS
(
SELECT
[ContactKey] = sc.[KEY],
[Ranking] = sc.[RANK],
[Term] = st.Term
FROM @SearchTerms st
CROSS APPLY dbo.SearchContacts(st.Term) sc -- I wrap my CONTAINSTABLE query in a Sql Function for convenience
)
-- Find the matching companies
,MatchingContactCompanies AS
(
SELECT
c.ContactKey,
Ranking = sc.[RANK],
st.Term
FROM @SearchTerms st
CROSS APPLY dbo.SearchCompanies(st.Term) sc
JOIN dbo.CompanyContact cc ON sc.CompanyKey = cc.CompanyKey
JOIN dbo.Contact c ON c.ContactKey = cc.ContactKey
)
-- Find the matches where ALL search words were found.
,ContactsWithAllTerms AS
(
SELECT
x.ContactKey,
Ranking = SUM(x.Ranking)
FROM (
SELECT ContactKey, Ranking, Term FROM MatchingContacts UNION ALL
SELECT ContactKey, Ranking, Term FROM MatchingContactCompanies
) x
GROUP BY x.ContactKey
HAVING COUNT(DISTINCT x.Term) = @numSearchTerms
)
)
SELECT
*
FROM ContactsWithAllTerms c
Update
Per the comments, here's an example of my SearchContacts function. It's just a simple wrapper function because I was using it in multiple procedures.
CREATE FUNCTION [dbo].[SearchContacts]
(
@contactsKeyword nvarchar(4000)
)
RETURNS @returntable TABLE
(
[KEY] int,
[RANK] int
)
AS
BEGIN
INSERT @returntable
SELECT [KEY],[RANK] FROM CONTAINSTABLE(dbo.Contact, ([FullName],[LastName],[FirstName]), @contactsKeyword)
RETURN
END
GO
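A quick usage sketch of the wrapper (the search token here is hypothetical; the trailing * makes it a prefix term):
-- Prefix-search contacts, then join back to get the names
SELECT c.ContactKey, c.FirstName, c.LastName, s.[RANK]
FROM dbo.SearchContacts(N'"john*"') AS s
JOIN dbo.Contact AS c ON c.ContactKey = s.[KEY]
ORDER BY s.[RANK] DESC;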

ANSI NULLS and the ON clause

Please see the DDL below:
create table #address (ID int IDENTITY, housenumber varchar(30), street varchar(30), town varchar(30), county varchar(30), postcode varchar(30), primary key (id))
insert into #address (housenumber,street,town,county,postcode) values ('1', 'The Street', 'Lincoln', null, 'LN21AA')
insert into #address (housenumber,street,town,county,postcode) values ('1', 'The Street', 'Lincoln', null, 'LN21AA')
insert into #address (housenumber,street,town,county,postcode) values ('1', 'The Street', 'Lincoln', 'Lincolnshire', 'LN21AA')
and the SQL below:
select #address.id as masterid, address2.id as childid from #address inner join #address as address2 on
#address.housenumber=address2.housenumber and #address.street=address2.street
and #address.town=address2.town
and #address.county=address2.county
and #address.postcode=address2.postcode
where #address.id<address2.id
I am trying to identify duplicates.
The county column is null in some rows and not null in others. The query above returns no rows.
I have tried this command:
set ansi_nulls off
However, it makes no difference. I realise I can do this:
select #address.id as masterid, address2.id as childid from #address inner join #address as address2 on
#address.housenumber=address2.housenumber and #address.street=address2.street
and #address.town=address2.town
and ((#address.county=address2.county) or (#address.county is null and address2.county is null))
and #address.postcode=address2.postcode
However, I am interested to know why setting ansi nulls to off allows you to do this:
select * from #address where county=null
which returns two rows. However, my first query returns no rows even when ANSI_NULLS is OFF. Why does ANSI_NULLS have no effect on the ON clause?
I have spent 20 minutes Googling this, but I have not found my answer.
The reason SET ANSI_NULLS OFF makes no difference here is that it only affects comparisons where one side is a NULL literal or a variable that is NULL; when both sides are columns, as in your ON clause, the comparison always follows standard three-valued logic, so rows with a NULL county never match. You can identify duplicates by using GROUP BY instead. The following returns the ids when there are two or more matching rows:
select housenumber, street, town, county, postcode, count(*) as cnt,
min(a.id) as masterid, max(a.id) as childid
from #address a
group by housenumber, street, town, county, postcode
having count(*) >= 2;
Getting all ids for a given address would require additional joins or funky string aggregations.
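For the "all ids" case, a window count is one way to avoid the string aggregation (a sketch; note that PARTITION BY groups NULL county values together, so NULLs compare as equal here, unlike in the join):
-- List every row of every duplicated address using a window count
select id, housenumber, street, town, county, postcode
from (
    select a.*,
           count(*) over (partition by housenumber, street, town, county, postcode) as cnt
    from #address a
) d
where cnt >= 2
order by housenumber, street, town, county, postcode, id;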