SQL joins before where clause when selecting columns? - sql

This is kind of a weird one. This particular piece of bad database design has caught me out so many times and I've always had to make ridiculous work arounds and this is no exception. To summarize: I have 3 tables, the first one is a lookup table of questions, the second is a lookup table of answers and the third stores a questions and answer id to show which questions have been answered. So far straight forward.
However the answer can be 1 of 3 types: Free text, multiple choice or multiple selection and these are all stored in the same column (Answer). Free text can be anything, like 'Hello' or a datetime '2015-07-03 00:00:00'. Multiple choice gets stored as integers 1 or 49 etc and Multiple selection gets stored as a delimited string '1,4,7,8' (i know this is very bad design, a column shouldn't store more than 1 value however it is before my time and written into our aspx web application, as I work on my own I simply do not have the resource or time to change it)
Here comes the problem; take a look at this query:
Select *
FROM AnswersTable
JOIN LK_Questions
ON AnswersTable.QuestionID = LK_Questions.QuestionID
JOIN LK_Answers
ON AnswersTable.Answer = LK_Answers.AnswerID
Where LK_Questions.QuestionTypeID = 1
The where clause should ensure that the only questions that are returned are multiple choice. (So I am not joining a free text answer to an integer) and in fact when I run this query it runs ok but when i try to select individual columns it errors out with this error message:
Conversion failed when converting the varchar value ',879' to data type smallint.
It almost like it's doing the join before it does the where although I know the query optimizer doesn't work that way. The problem is I need to select column names as this is going into a table so I need to define the column names. ?Is there anything I can do? I've tried for ages but with no results. I should mention that I am running SQL Server 2005.
Many thanks in advance
EDIT:
This is the query that causes an error:
Select LK_Answers.Answer
FROM AnswersTable
JOIN LK_Questions
ON AnswersTable.QuestionID = LK_Questions.QuestionID
JOIN LK_Answers
ON AnswersTable.Answer = LK_Answers.AnswerID
Where LK_Questions.QuestionTypeID = 1

The issue in your query is that you have a datatype mismatch on the columns you're using to join your tables together.
One of the easiest ways to correct this is to explicitly CAST both sides of the join to VARCHAR so that there is no problem matching datatypes when joining. This isn't ideal, but if you're not able to change the table schema then you have to work around it.
SQL Fiddle Demo
CREATE TABLE LeftTable
(
id INT
);
CREATE TABLE RightTable
(
id VARCHAR(30)
);
INSERT INTO LeftTable (id)
VALUES (1), (2), (3), (4), (5);
INSERT INTO RightTable (id)
VALUES ('1'), ('2'), ('3'), ('4'), ('5'), (',879');
SELECT l.*, r.*
FROM LeftTable l
JOIN RightTable r ON CAST(l.id AS VARCHAR(30)) = CAST(r.id AS VARCHAR(30))
WHERE l.id = '1'

You can use a subselect for this.
Select *
FROM AnswersTable
JOIN ( SELECT * FROM LK_Questions Where QuestionTypeID = 1) as LK_Questions
ON AnswersTable.QuestionID = LK_Questions.QuestionID
JOIN LK_Answers
ON AnswersTable.Answer = LK_Answers.AnswerID

You are joining the ID field of the lookup table against the answer field of the answer table. Is this OK?

Pay attention to those fields;
AnswersTable.QuestionID, LK_Questions.QuestionID
AnswersTable.Answer , LK_Answers.AnswerID and LK_Questions.QuestionTypeID
are they sharing the same datatype?
if so, you should change your query like;
SELECT * FROM AnswersTable RIGHT JOIN LK_Questions ON LK_Questions.QuestionTypeID = AnswersTable.QuestionID JOIN LK_Answers ON AnswersTable.Answer = LK_Answers.AnswerID where LK_Questions.QuestionTypeID = 1

Related

Oracle Invalid Number in Join Clause

I am getting an Oracle Invalid Number error that doesn't make sense to me. I understand what this error means but it should not be happening in this case. Sorry for the long question, but please bear with me so I can explain this thoroughly.
I have a table which stores IDs to different sources, and some of the IDs can contain letters. Therefore, the column is a VARCHAR.
One of the sources has numeric IDs, and I want to join to that source:
SELECT *
FROM (
SELECT AGGPROJ_ID -- this column is a VARCHAR
FROM AGG_MATCHES -- this is the table storing the matches
WHERE AGGSRC = 'source_a'
) m
JOIN SOURCE_A a ON a.ID = TO_NUMBER(m.AGGPROJ_ID);
In most cases this works, but depending on random things such as what columns are in the select clause, if it uses a left join or an inner join, etc., I will start seeing the Invalid Number error.
I have verified multiple times that all entries in AGG_MATCHES where AGGSRC = 'source_a' do not contain non numeric characters in the AGGPROJ_ID column:
-- this returns no results
SELECT AGGPROJ_ID
FROM AGG_MATCHES
WHERE AGGSRC = 'source_a' AND REGEXP_LIKE(AGGPROJ_ID, '[^0-9]');
I know that Oracle basically rewrites the query internally for optimization. Going back to the first SQL example, my best guess is that depending on how the entire query is written, in some cases Oracle is trying to perform the JOIN before the sub query. In other words, it's trying to join the entire AGG_MATCHES tables to SOURCE_A instead of just the subset returned by the sub query. If so, there would be rows that contain non numeric values in the AGGPROJ_ID column.
Does anyone know for certain if this is what's causing the error? If it is the reason, is there anyway for me to force Oracle to execute the sub query part first so it's only trying to join a subset of the AGG_MATCHES table?
A little more background:
This is obviously a simplified example to illustrate the problem. The AGG_MATCHES table is used to store "matches" between different sources (i.e. projects). In other words, it's used to say that a project in sourceA is matched to a project in sourceB.
Instead of writing the same SQL over and over, I've created views for the sources we commonly use. The idea is to have a view with two columns, one for SourceA and one for SourceB. For this reason, I don't want to use TO_CHAR on the ID column of the source table, because devs would have to remember to do this every time they are doing a join, and I'm trying to remove code duplication. Also, since the ID in SOURCE_A is a number, I feel that any view storing SOURCE_A.ID should go ahead and convert it to a number.
You are right that Oracle is executing the statement in a different order than what you wrote, causing conversion errors.
The best ways to fix this problem, in order, are:
Change the data model to always store data as the correct type. Always store numbers as numbers, dates as dates, and strings as strings. (You already know this and said you can't change your data model, this is a warning for future readers.)
Convert numbers to strings with a TO_CHAR.
If you're on 12.2, convert strings to numbers using the DEFAULT return_value ON CONVERSION ERROR syntax, like this:
SELECT *
FROM (
SELECT AGGPROJ_ID -- this column is a VARCHAR
FROM AGG_MATCHES -- this is the table storing the matches
WHERE AGGSRC = 'source_a'
) m
JOIN SOURCE_A a ON a.ID = TO_NUMBER(m.AGGPROJ_ID default null on conversion error);
Add a ROWNUM to an inline view to prevent optimizer transformations that may re-write statements. ROWNUM is always evaluated at the end and it forces Oracle to run things in a certain order, even if the ROWNUM isn't used. (Officially hints are the way to do this, but getting hints right is too difficult.)
SELECT *
FROM (
SELECT AGGPROJ_ID -- this column is a VARCHAR
FROM AGG_MATCHES -- this is the table storing the matches
WHERE AGGSRC = 'source_a'
--Prevent optimizer transformations for type safety.
AND ROWNUM >= 1
) m
JOIN SOURCE_A a ON a.ID = TO_NUMBER(m.AGGPROJ_ID);
I think the simplest solution uses case, which has more guarantees on the order of evaluation:
SELECT a.*
FROM AGG_MATCHES m JOIN
SOURCE_A a
ON a.ID = (CASE WHEN m.AGGSRC = 'source_a' THEN TO_NUMBER(m.AGGPROJ_ID) END);
Or, better yet, convert to strings:
SELECT a.*
FROM AGG_MATCHES m JOIN
SOURCE_A a
ON TO_CHAR(a.ID) = m.AGGPROJ_ID AND
m.AGGSRC = 'source_a' ;
That said, the best advice is to fix the data model.
Possibly the best solution in your case is simply a view or a generate column:
create view v_agg_matches_a as
select . . .,
(case when regexp_like(AGGPROJ_ID, '^[0-9]+$')
then to_number(AGGPROJ_ID)
end) as AGGPROJ_ID
from agg_matches am
where m.AGGSRC = 'source_a';
The case may not be necessary if you use a view, but it is safer.
Then use the view in subsequent queries.

Long SQL subquery trouble

I just registered and want to ask.
I learn sql queries not so long time and I got a trouble when I decided to move a table to another database. A few articles were read about building long subqueries , but they didn't help me.
Everything works perfect before that my action.
I just moved the table and tried to rewrite the query while whole day.
update [dbo].Full
set [salary] = 1000
where [dbo].Full.id in (
select distinct k1.id
from (
select id, Topic, User
from Full
where User not in (select distinct topic_name from [DB_1].dbo.S_School)
) k1
where k1.id not in (
select distinct k2.id
from (
select id, Topic, User
from Full
where User not in (select distinct topic_name from [DB_1].dbo.Shool)
) k2,
List_School t3
where charindex (t3.NameApp, k2.Topic)>5
)
)
I moved table List_School to database [DB_1] and I can't to bend with it.
I can't write [DB_1].dbo.List_School. Should I use one more subquery?
I even thought about create a few temporary tables but it can influence on speed of execution.
Sql gurus , please invest some your time on me. Thank you in advance.
I will be happy for each hint, which you give me.
There appear to be a number of issues. You are comparing the user column to the topic_name column. An expected meaning of those column names would suggest you are not comparing the correct columns. But that is a guess.
In the final subquery you have an ansi join on table List_School but no join columns which means the join witk k2 is a cartesian product (aka cross join) which is not what you would want in most situations. Again a guess as no details of actual problem data or error messages was provided.

fastest way to compare two columns with different data types

I have two tables that I need to join over a linked server, but I have a problem with the source data that I am stuck with for now.
The column names that I need to join on are account_number and member_number respectively.
My issue is that account_number is a varchar(10) and is always padded with leading zeros, but member_number is a varchar(12) (don't ask why, the last 2 are never used) but is not padded with leading zeros.
If we say that account_number is in A and member_number is in B, I have come up with the following solutions:
SELECT * FROM
A INNER JOIN B
ON CAST(A.account_number AS BIGINT) = CAST(B.member_number AS BIGINT)
and
SELECT * FROM
A INNER JOIN B
ON A.account_number = RIGHT('0000000000'+B.member_number, 10)
The problem is that they are super slow!
It must be the fact that the functions are forcing table scans, but I'm not sure what else to do about this. Is there any way to do this comparison that is faster? Maybe with some variation of like or something?
The fastest way is to create a computed column so they are of the same types and then build an index on that column. Something like:
alter table b add account_number as ( RIGHT('0000000000'+B.member_number, 10) );
create index b_acount_number on b(account_number);
Then run the query as:
SELECT *
FROM A INNER JOIN
B
ON A.account_number = b.account_number;
That is probably the fastest you can get.

What if the column to be indexed is nvarchar data type in SQL Server?

I retrieve data by joining multiple tables as indicated on the image below. On the other hand, as there is no data in the FK column (EmployeeID) of Event table, I have to use CardNo (nvarchar) fields in order to join the two tables. On the other hand, the digit numbers of CardNo fields in the Event and Employee tables are different, I also have to use RIGHT function of SQL Server and this makes the query to be executed approximately 10 times longer. So, in this scene what should I do? Can I use CardNo field without changing its data type to int, etc (because there are other problem might be seen after changing it and it sill be better to find a solution without changing the data type of it). Here is also execution plan of the query below.
Query:
; WITH a AS (SELECT emp.EmployeeName, emp.Status, dep.DeptName, job.JobName, emp.CardNo
FROM TEmployee emp
LEFT JOIN TDeptA AS dep ON emp.DeptAID = dep.DeptID
LEFT JOIN TJob AS job ON emp.JobID = job.JobID),
b AS (SELECT eve.EventID, eve.EventTime, eve.CardNo, evt.EventCH, dor.DoorName
FROM TEvent eve LEFT JOIN TEventType AS evt ON eve.EventType = evt.EventID
LEFT JOIN TDoor AS dor ON eve.DoorID = dor.DoorID)
SELECT * FROM b LEFT JOIN a ON RIGHT(a.CardNo, 8) = RIGHT(b.CardNo, 8)
ORDER BY b.EventID ASC
You can add a computed column to your table like this:
ALTER TABLE TEmployee -- Don't start your table names with prefixes, you already know they're tables
ADD CardNoRight8 AS RIGHT(CardNo, 8) PERSISTED
ALTER TABLE TEvent
ADD CardNoRight8 AS RIGHT(CardNo, 8) PERSISTED
CREATE INDEX TEmployee_CardNoRight8_IDX ON TEmployee (CardNoRight8)
CREATE INDEX TEvent_CardNoRight8_IDX ON TEvent (CardNoRight8)
You don't need to persist the column since it already matches the criteria for a computed column to be indexed, but adding the PERSISTED keyword shouldn't hurt and might help the performance of other queries. It will cause a minor performance hit on updates and inserts, but that's probably fine in your case unless you're importing a lot of data (millions of rows) at a time.
The better solution though is to make sure that your columns that are supposed to match actually match. If the right 8 characters of the card number are something meaningful, then they shouldn't be part of the card number, they should be another column. If this is an issue where one table uses leading zeroes and the other doesn't then you should fix that data to be consistent instead of putting together work arounds like this.
This line is what is costing you 86% of the query time:
LEFT JOIN a ON RIGHT(a.CardNo, 8) = RIGHT(b.CardNo, 8)
This is happening because it has to run RIGHT() on those fields for every row and then match them with the other table. This is obviously going to be inefficient.
The most straightforward solution is probably to either remove the RIGHT() entirely or else to re-implement it as a built-in column on the table so it doesn't have to be calculated on the fly while the query is running.
While inserting the record, you would have to also insert the eight, right digits of the card number and store it in this field. My original thought was to use a computed column but I don't think those can be indexed so you'd have to use a regular column.
; WITH a AS (
SELECT emp.EmployeeName, emp.Status, dep.DeptName, job.JobName, emp.CardNoRightEight
FROM TEmployee emp
LEFT JOIN TDeptA AS dep ON emp.DeptAID = dep.DeptID
LEFT JOIN TJob AS job ON emp.JobID = job.JobID
),
b AS (
SELECT eve.EventID, eve.EventTime, eve.CardNoRightEight, evt.EventCH, dor.DoorName
FROM TEvent eve LEFT JOIN TEventType AS evt ON eve.EventType = evt.EventID
LEFT JOIN TDoor AS dor ON eve.DoorID = dor.DoorID
)
SELECT *
FROM b
LEFT JOIN a ON a.CardNoRightEight = b.CardNoRightEight
ORDER BY b.EventID ASC
This will help you see how to add a calculated column to your database.
create table #temp (test varchar(30))
insert into #temp
values('000456')
alter table #temp
add test2 as right(test, 3) persisted
select * from #temp
The other alternative is to fix the data and the data entry so that both columns are the same data type and contain the same leading zeros (or remove them)
Many thanks all of your help. With the help of your answers, I managed to reduce the query execution time from 2 minutes to 1 at the first step after using computed columns. After that, when creating an index for these columns, I managed to reduce the execution time to 3 seconds. Wow, it is really perfect :)
Here are the steps posted for those who suffers from a similar problem:
Step I: Adding computed columns to the tables (As CardNo fields are nvarchar data type, I specify data type of computed columns as int):
ALTER TABLE TEvent ADD CardNoRightEight AS RIGHT(CAST(CardNo AS int), 8)
ALTER TABLE TEmployee ADD CardNoRightEight AS RIGHT(CAST(CardNo AS int), 8)
Step II: Create index for the computed columns in order to execute the query faster:
CREATE INDEX TEmployee_CardNoRightEight_IDX ON TEmployee (CardNoRightEight)
CREATE INDEX TEvent_CardNoRightEight_IDX ON TEvent (CardNoRightEight)
Step 3: Update the query by using the computed columns in it:
; WITH a AS (
SELECT emp.EmployeeName, emp.Status, dep.DeptName, job.JobName, emp.CardNoRightEight --emp.CardNo
FROM TEmployee emp
LEFT JOIN TDeptA AS dep ON emp.DeptAID = dep.DeptID
LEFT JOIN TJob AS job ON emp.JobID = job.JobID
),
b AS (
SELECT eve.EventID, eve.EventTime, evt.EventCH, dor.DoorName, eve.CardNoRightEight --eve.CardNo
FROM TEvent eve
LEFT JOIN TEventType AS evt ON eve.EventType = evt.EventID
LEFT JOIN TDoor AS dor ON eve.DoorID = dor.DoorID)
SELECT * FROM b LEFT JOIN a ON a.CardNoRightEight = b.CardNoRightEight --ON RIGHT(a.CardNo, 8) = RIGHT(b.CardNo, 8)
ORDER BY b.EventID ASC

Why does my left join in Access have fewer rows than the left table?

I have two tables in an MS Access 2010 database: TBLIndividuals and TblIndividualsUpdates. They have a lot of the same data, but the primary key may not be the same for a given person's record in both tables. So I'm doing a join between the two tables on names and birthdates to see which records correspond. I'm using a left join so that I also get rows for the people who are in TblIndividualsUpdates but not in TBLIndividuals. That way I know which records need to be added to TBLIndividuals to get it up to date.
SELECT TblIndividuals.PersonID AS OldID,
TblIndividualsUpdates.PersonID AS UpdateID
FROM TblIndividualsUpdates LEFT JOIN TblIndividuals
ON ( (TblIndividuals.FirstName = TblIndividualsUpdates.FirstName)
and (TblIndividuals.LastName = TblIndividualsUpdates.LastName)
AND (TblIndividuals.DateBorn = TblIndividualsUpdates.DateBorn
or (TblIndividuals.DateBorn is null
and (TblIndividuals.MidName is null and TblIndividualsUpdates.MidName is null
or TblIndividuals.MidName = TblIndividualsUpdates.MidName))));
TblIndividualsUpdates has 4149 rows, but the query returns only 4103 rows. There are about 50 new records in TblIndividualsUpdates, but only 4 rows in the query result where OldID is null.
If I export the data from Access to PostgreSQL and run the same query there, I get all 4149 rows.
Is this a bug in Access? Is there a difference between Access's left join semantics and PostgreSQL's? Is my database corrupted (Compact and Repair doesn't help)?
ON (
TblIndividuals.FirstName = TblIndividualsUpdates.FirstName
and
TblIndividuals.LastName = TblIndividualsUpdates.LastName
AND (
TblIndividuals.DateBorn = TblIndividualsUpdates.DateBorn
or
(
TblIndividuals.DateBorn is null
and
(
TblIndividuals.MidName is null
and TblIndividualsUpdates.MidName is null
or TblIndividuals.MidName = TblIndividualsUpdates.MidName
)
)
)
);
What I would do is systematically remove all the join conditions except the first two until you find the records drop off. Then you will know where your problem is.
This should never happen. Unless rows are being inserted/deleted in the meantime,
the query:
SELECT *
FROM a LEFT JOIN b
ON whatever ;
should never return less rows than:
SELECT *
FROM a ;
If it happens, it's a bug. Are you sure the queries are exactly like this (and you have't omitted some detail, like a WHERE clause)? Are you sure that the first returns 4149 rows and the second one 4103 rows? You could make another check by changing the * above to COUNT(*).
Drop any indexes from both tables which include those JOIN fields (FirstName, LastName, and DateBorn). Then see whether you get the expected
4,149 rows with this simplified query.
SELECT
i.PersonID AS OldID,
u.PersonID AS UpdateID
FROM
TblIndividualsUpdates AS u
LEFT JOIN TblIndividuals AS i
ON
(
(i.FirstName = u.FirstName)
AND (i.LastName = u.LastName)
AND (i.DateBorn = u.DateBorn)
);
For whatever it is worth, since this seems to be a deceitful bug and any additional information could help resolving it, I have had the same problem.
The query is too big to post here and I don't have the time to reduce it now to something suitable, but I can report what I found. In the below, all joins are left joins.
I was gradually refining and changing my query. It had a derived table in it (D). And the whole thing was made into a derived table (T) and then joined to a last table (L). In any case, at one point in its development, no field in T that originated in D participated in the join to L. It was then the problem occurred, the total number of rows mysteriously became less than the main table, which should be impossible. As soon as I again let a field from D participate (via T) in the join to L, the number increased to normal again.
It was as if the join condition to D was moved to a WHERE clause when no field in it was participating (via T) in the join to L. But I don't really know what the explanation is.