What if the column to be indexed is nvarchar data type in SQL Server? - sql

I retrieve data by joining multiple tables as indicated on the image below. On the other hand, as there is no data in the FK column (EmployeeID) of Event table, I have to use CardNo (nvarchar) fields in order to join the two tables. On the other hand, the digit numbers of CardNo fields in the Event and Employee tables are different, I also have to use RIGHT function of SQL Server and this makes the query to be executed approximately 10 times longer. So, in this scene what should I do? Can I use CardNo field without changing its data type to int, etc (because there are other problem might be seen after changing it and it sill be better to find a solution without changing the data type of it). Here is also execution plan of the query below.
Query:
; WITH a AS (SELECT emp.EmployeeName, emp.Status, dep.DeptName, job.JobName, emp.CardNo
FROM TEmployee emp
LEFT JOIN TDeptA AS dep ON emp.DeptAID = dep.DeptID
LEFT JOIN TJob AS job ON emp.JobID = job.JobID),
b AS (SELECT eve.EventID, eve.EventTime, eve.CardNo, evt.EventCH, dor.DoorName
FROM TEvent eve LEFT JOIN TEventType AS evt ON eve.EventType = evt.EventID
LEFT JOIN TDoor AS dor ON eve.DoorID = dor.DoorID)
SELECT * FROM b LEFT JOIN a ON RIGHT(a.CardNo, 8) = RIGHT(b.CardNo, 8)
ORDER BY b.EventID ASC

You can add a computed column to your table like this:
ALTER TABLE TEmployee -- Don't start your table names with prefixes, you already know they're tables
ADD CardNoRight8 AS RIGHT(CardNo, 8) PERSISTED
ALTER TABLE TEvent
ADD CardNoRight8 AS RIGHT(CardNo, 8) PERSISTED
CREATE INDEX TEmployee_CardNoRight8_IDX ON TEmployee (CardNoRight8)
CREATE INDEX TEvent_CardNoRight8_IDX ON TEvent (CardNoRight8)
You don't need to persist the column since it already matches the criteria for a computed column to be indexed, but adding the PERSISTED keyword shouldn't hurt and might help the performance of other queries. It will cause a minor performance hit on updates and inserts, but that's probably fine in your case unless you're importing a lot of data (millions of rows) at a time.
The better solution though is to make sure that your columns that are supposed to match actually match. If the right 8 characters of the card number are something meaningful, then they shouldn't be part of the card number, they should be another column. If this is an issue where one table uses leading zeroes and the other doesn't then you should fix that data to be consistent instead of putting together work arounds like this.

This line is what is costing you 86% of the query time:
LEFT JOIN a ON RIGHT(a.CardNo, 8) = RIGHT(b.CardNo, 8)
This is happening because it has to run RIGHT() on those fields for every row and then match them with the other table. This is obviously going to be inefficient.
The most straightforward solution is probably to either remove the RIGHT() entirely or else to re-implement it as a built-in column on the table so it doesn't have to be calculated on the fly while the query is running.
While inserting the record, you would have to also insert the eight, right digits of the card number and store it in this field. My original thought was to use a computed column but I don't think those can be indexed so you'd have to use a regular column.
; WITH a AS (
SELECT emp.EmployeeName, emp.Status, dep.DeptName, job.JobName, emp.CardNoRightEight
FROM TEmployee emp
LEFT JOIN TDeptA AS dep ON emp.DeptAID = dep.DeptID
LEFT JOIN TJob AS job ON emp.JobID = job.JobID
),
b AS (
SELECT eve.EventID, eve.EventTime, eve.CardNoRightEight, evt.EventCH, dor.DoorName
FROM TEvent eve LEFT JOIN TEventType AS evt ON eve.EventType = evt.EventID
LEFT JOIN TDoor AS dor ON eve.DoorID = dor.DoorID
)
SELECT *
FROM b
LEFT JOIN a ON a.CardNoRightEight = b.CardNoRightEight
ORDER BY b.EventID ASC

This will help you see how to add a calculated column to your database.
create table #temp (test varchar(30))
insert into #temp
values('000456')
alter table #temp
add test2 as right(test, 3) persisted
select * from #temp
The other alternative is to fix the data and the data entry so that both columns are the same data type and contain the same leading zeros (or remove them)

Many thanks all of your help. With the help of your answers, I managed to reduce the query execution time from 2 minutes to 1 at the first step after using computed columns. After that, when creating an index for these columns, I managed to reduce the execution time to 3 seconds. Wow, it is really perfect :)
Here are the steps posted for those who suffers from a similar problem:
Step I: Adding computed columns to the tables (As CardNo fields are nvarchar data type, I specify data type of computed columns as int):
ALTER TABLE TEvent ADD CardNoRightEight AS RIGHT(CAST(CardNo AS int), 8)
ALTER TABLE TEmployee ADD CardNoRightEight AS RIGHT(CAST(CardNo AS int), 8)
Step II: Create index for the computed columns in order to execute the query faster:
CREATE INDEX TEmployee_CardNoRightEight_IDX ON TEmployee (CardNoRightEight)
CREATE INDEX TEvent_CardNoRightEight_IDX ON TEvent (CardNoRightEight)
Step 3: Update the query by using the computed columns in it:
; WITH a AS (
SELECT emp.EmployeeName, emp.Status, dep.DeptName, job.JobName, emp.CardNoRightEight --emp.CardNo
FROM TEmployee emp
LEFT JOIN TDeptA AS dep ON emp.DeptAID = dep.DeptID
LEFT JOIN TJob AS job ON emp.JobID = job.JobID
),
b AS (
SELECT eve.EventID, eve.EventTime, evt.EventCH, dor.DoorName, eve.CardNoRightEight --eve.CardNo
FROM TEvent eve
LEFT JOIN TEventType AS evt ON eve.EventType = evt.EventID
LEFT JOIN TDoor AS dor ON eve.DoorID = dor.DoorID)
SELECT * FROM b LEFT JOIN a ON a.CardNoRightEight = b.CardNoRightEight --ON RIGHT(a.CardNo, 8) = RIGHT(b.CardNo, 8)
ORDER BY b.EventID ASC

Related

fastest way to compare two columns with different data types

I have two tables that I need to join over a linked server, but I have a problem with the source data that I am stuck with for now.
The column names that I need to join on are account_number and member_number respectively.
My issue is that account_number is a varchar(10) and is always padded with leading zeros, but member_number is a varchar(12) (don't ask why, the last 2 are never used) but is not padded with leading zeros.
If we say that account_number is in A and member_number is in B, I have come up with the following solutions:
SELECT * FROM
A INNER JOIN B
ON CAST(A.account_number AS BIGINT) = CAST(B.member_number AS BIGINT)
and
SELECT * FROM
A INNER JOIN B
ON A.account_number = RIGHT('0000000000'+B.member_number, 10)
The problem is that they are super slow!
It must be the fact that the functions are forcing table scans, but I'm not sure what else to do about this. Is there any way to do this comparison that is faster? Maybe with some variation of like or something?
The fastest way is to create a computed column so they are of the same types and then build an index on that column. Something like:
alter table b add account_number as ( RIGHT('0000000000'+B.member_number, 10) );
create index b_acount_number on b(account_number);
Then run the query as:
SELECT *
FROM A INNER JOIN
B
ON A.account_number = b.account_number;
That is probably the fastest you can get.

How to create a table with multiple columns

Struggling with a create table statement which is based on this select into statement below:
#MaxAPDRefundAmount money = 13.00
...
select pkd.*, pd.ProviderReference, per.FirstName, per.Surname, #MaxAPDRefundAmount [MaxAPDRefundAmount],commission.Type [Commission] into #StagedData from CTE_PackageData pkd
inner join J2H.dbo.Package pk on pkd.Reference = pk.Reference
inner join J2H.dbo.Product pd on pk.PackageId = pd.PackageId
inner join J2H.dbo.FlightReservation fr on pd.ProductId = fr.ProductId
and fr.FlightBoundID = 1
inner join J2H.dbo.ProductPerson pp on pd.ProductId = pp.ProductID
and pp.StatusId < 7
inner join J2H.dbo.Flight f on fr.FlightId = f.FlightID
inner join J2H.dbo.Person per on pk.PackageId = per.PackageId
and per.PersonId = pp.PersonId
inner join J2H.dbo.PersonType pt on per.PersonTypeId = pt.PersonTypeID
We are changing a select into to just normal insert and select, so need a create table (we are going to create a temp (hash tag table) and not declaring a variable table. Also there is a pkd.* at the start as well so I am confused in knowing which fields to include in the create table. Do I include all the fields in the select statement into the create statement?
Update:
So virtually I know I need to include the data types below but I can just do:
create table #StagedData
(
pkd.*,
pd.ProviderReference,
per.FirstName,
per.Surname,
#MaxAPDRefundAmount [MaxAPDRefundAmount],
commission
)
"Do I include all the fields in the select statement into the create statement?" Well, that depends, if you need them, than yes, if not than no. It's impossible for us to say whether you need them... If you're running this exact query as insert, than yes.
As for the create statement, you can run the query you have, but replace into #StagedData with something like into TEMP_StagedData. In management studio you can let sql server build the create query for you: right-click the newly created TEMP_StagedData table in the object explorer (remember to refresh), script Table as, CREATE To and select New Query Editor Window.
The documentation of the CREATE TABLE statement is pretty straightforward.
No. Clearly, you cannot use pkd.* in a create table statement.
What you can do is run your old SELECT INTO statement as a straight SELECT (remove the INTO #stagedata) part, and look at the columns that get returned by the SELECT.
Then write a CREATE TABLE statement that includes those columns.
To create a table from a SELECT without inserting data, add a WHERE clause that never returns True.
Like this:
SELECT * INTO #TempTable FROM Table WHERE 1=0
Once the table with the columns for your SELECT, you can add additional columns with ALTER TABLE.
ALTER TABLE #TempTable ALL ExtraColumn INT
Then do your INSERT/SELECT.

SQL joins before where clause when selecting columns?

This is kind of a weird one. This particular piece of bad database design has caught me out so many times and I've always had to make ridiculous work arounds and this is no exception. To summarize: I have 3 tables, the first one is a lookup table of questions, the second is a lookup table of answers and the third stores a questions and answer id to show which questions have been answered. So far straight forward.
However the answer can be 1 of 3 types: Free text, multiple choice or multiple selection and these are all stored in the same column (Answer). Free text can be anything, like 'Hello' or a datetime '2015-07-03 00:00:00'. Multiple choice gets stored as integers 1 or 49 etc and Multiple selection gets stored as a delimited string '1,4,7,8' (i know this is very bad design, a column shouldn't store more than 1 value however it is before my time and written into our aspx web application, as I work on my own I simply do not have the resource or time to change it)
Here comes the problem; take a look at this query:
Select *
FROM AnswersTable
JOIN LK_Questions
ON AnswersTable.QuestionID = LK_Questions.QuestionID
JOIN LK_Answers
ON AnswersTable.Answer = LK_Answers.AnswerID
Where LK_Questions.QuestionTypeID = 1
The where clause should ensure that the only questions that are returned are multiple choice. (So I am not joining a free text answer to an integer) and in fact when I run this query it runs ok but when i try to select individual columns it errors out with this error message:
Conversion failed when converting the varchar value ',879' to data type smallint.
It almost like it's doing the join before it does the where although I know the query optimizer doesn't work that way. The problem is I need to select column names as this is going into a table so I need to define the column names. ?Is there anything I can do? I've tried for ages but with no results. I should mention that I am running SQL Server 2005.
Many thanks in advance
EDIT:
This is the query that causes an error:
Select LK_Answers.Answer
FROM AnswersTable
JOIN LK_Questions
ON AnswersTable.QuestionID = LK_Questions.QuestionID
JOIN LK_Answers
ON AnswersTable.Answer = LK_Answers.AnswerID
Where LK_Questions.QuestionTypeID = 1
The issue in your query is that you have a datatype mismatch on the columns you're using to join your tables together.
One of the easiest ways to correct this is to explicitly CAST both sides of the join to VARCHAR so that there is no problem matching datatypes when joining. This isn't ideal, but if you're not able to change the table schema then you have to work around it.
SQL Fiddle Demo
CREATE TABLE LeftTable
(
id INT
);
CREATE TABLE RightTable
(
id VARCHAR(30)
);
INSERT INTO LeftTable (id)
VALUES (1), (2), (3), (4), (5);
INSERT INTO RightTable (id)
VALUES ('1'), ('2'), ('3'), ('4'), ('5'), (',879');
SELECT l.*, r.*
FROM LeftTable l
JOIN RightTable r ON CAST(l.id AS VARCHAR(30)) = CAST(r.id AS VARCHAR(30))
WHERE l.id = '1'
You can use a subselect for this.
Select *
FROM AnswersTable
JOIN ( SELECT * FROM LK_Questions Where QuestionTypeID = 1) as LK_Questions
ON AnswersTable.QuestionID = LK_Questions.QuestionID
JOIN LK_Answers
ON AnswersTable.Answer = LK_Answers.AnswerID
You are joining the ID field of the lookup table against the answer field of the answer table. Is this OK?
Pay attention to those fields;
AnswersTable.QuestionID, LK_Questions.QuestionID
AnswersTable.Answer , LK_Answers.AnswerID and LK_Questions.QuestionTypeID
are they sharing the same datatype?
if so, you should change your query like;
SELECT * FROM AnswersTable RIGHT JOIN LK_Questions ON LK_Questions.QuestionTypeID = AnswersTable.QuestionID JOIN LK_Answers ON AnswersTable.Answer = LK_Answers.AnswerID where LK_Questions.QuestionTypeID = 1

INNER LOOP JOIN Failing

I need to update a field called FamName in a table called Episode with randomly generated Germanic names from a different table called Surnames which has a single column called Surname.
To do this I have first added an ID field and NONCLUSTERED INDEX to my Surnames table
ALTER TABLE Surnames
ADD ID INT NOT NULL IDENTITY(1, 1);
GO
CREATE UNIQUE NONCLUSTERED INDEX idx ON Surnames(ID);
GO
I then attempt to update my Episode table via
UPDATE E
SET E.FamName = S.Surnames
FROM Episode AS E
INNER LOOP JOIN Surnames AS S
ON S.ID = (1 + ABS(CRYPT_GEN_RANDOM(8) % (SELECT COUNT(*) FROM Surnames)));
GO
where I am attempting to force the query to 'loop' using the LOOP join hint. Of course if I don't force the optimizer to loop (using LOOP) I will get the same German name for all rows. However, this query is strangely returning zero rows affected.
Why is this returning zero affected rows and how can this be amended to work?
Note, I could use a WHILE loop to perform this update, but I want a succinct way of doing this and to find out what I am doing wrong in this particular case.
You cannot (reliably) affect query results with join hints. They are performance hints, not semantic hints. You try to rely on undefined behavior.
Moving the random number computation out of the join condition into one of the join sources prevents the expression to be treated as a constant:
UPDATE E
SET E.FamName = S.Surnames
FROM (
SELECT *, (1 + ABS(CRYPT_GEN_RANDOM(8) % (SELECT COUNT(*) FROM Surnames))) AS SurnameID
FROM Episode AS E
) E
INNER LOOP JOIN Surnames AS S ON S.ID = E.SurnameID
The derived table E adds the computed SurnameID as a new column.
You don't need join hints any longer. I just tested that this works in my specific test case although I'm not whether this is guaranteed to work.

Why does my left join in Access have fewer rows than the left table?

I have two tables in an MS Access 2010 database: TBLIndividuals and TblIndividualsUpdates. They have a lot of the same data, but the primary key may not be the same for a given person's record in both tables. So I'm doing a join between the two tables on names and birthdates to see which records correspond. I'm using a left join so that I also get rows for the people who are in TblIndividualsUpdates but not in TBLIndividuals. That way I know which records need to be added to TBLIndividuals to get it up to date.
SELECT TblIndividuals.PersonID AS OldID,
TblIndividualsUpdates.PersonID AS UpdateID
FROM TblIndividualsUpdates LEFT JOIN TblIndividuals
ON ( (TblIndividuals.FirstName = TblIndividualsUpdates.FirstName)
and (TblIndividuals.LastName = TblIndividualsUpdates.LastName)
AND (TblIndividuals.DateBorn = TblIndividualsUpdates.DateBorn
or (TblIndividuals.DateBorn is null
and (TblIndividuals.MidName is null and TblIndividualsUpdates.MidName is null
or TblIndividuals.MidName = TblIndividualsUpdates.MidName))));
TblIndividualsUpdates has 4149 rows, but the query returns only 4103 rows. There are about 50 new records in TblIndividualsUpdates, but only 4 rows in the query result where OldID is null.
If I export the data from Access to PostgreSQL and run the same query there, I get all 4149 rows.
Is this a bug in Access? Is there a difference between Access's left join semantics and PostgreSQL's? Is my database corrupted (Compact and Repair doesn't help)?
ON (
TblIndividuals.FirstName = TblIndividualsUpdates.FirstName
and
TblIndividuals.LastName = TblIndividualsUpdates.LastName
AND (
TblIndividuals.DateBorn = TblIndividualsUpdates.DateBorn
or
(
TblIndividuals.DateBorn is null
and
(
TblIndividuals.MidName is null
and TblIndividualsUpdates.MidName is null
or TblIndividuals.MidName = TblIndividualsUpdates.MidName
)
)
)
);
What I would do is systematically remove all the join conditions except the first two until you find the records drop off. Then you will know where your problem is.
This should never happen. Unless rows are being inserted/deleted in the meantime,
the query:
SELECT *
FROM a LEFT JOIN b
ON whatever ;
should never return less rows than:
SELECT *
FROM a ;
If it happens, it's a bug. Are you sure the queries are exactly like this (and you have't omitted some detail, like a WHERE clause)? Are you sure that the first returns 4149 rows and the second one 4103 rows? You could make another check by changing the * above to COUNT(*).
Drop any indexes from both tables which include those JOIN fields (FirstName, LastName, and DateBorn). Then see whether you get the expected
4,149 rows with this simplified query.
SELECT
i.PersonID AS OldID,
u.PersonID AS UpdateID
FROM
TblIndividualsUpdates AS u
LEFT JOIN TblIndividuals AS i
ON
(
(i.FirstName = u.FirstName)
AND (i.LastName = u.LastName)
AND (i.DateBorn = u.DateBorn)
);
For whatever it is worth, since this seems to be a deceitful bug and any additional information could help resolving it, I have had the same problem.
The query is too big to post here and I don't have the time to reduce it now to something suitable, but I can report what I found. In the below, all joins are left joins.
I was gradually refining and changing my query. It had a derived table in it (D). And the whole thing was made into a derived table (T) and then joined to a last table (L). In any case, at one point in its development, no field in T that originated in D participated in the join to L. It was then the problem occurred, the total number of rows mysteriously became less than the main table, which should be impossible. As soon as I again let a field from D participate (via T) in the join to L, the number increased to normal again.
It was as if the join condition to D was moved to a WHERE clause when no field in it was participating (via T) in the join to L. But I don't really know what the explanation is.