SQL: Except Join on multiple queries - sql

Getting a bit stuck trying to build this query. (SQL SERVER)
I'm trying to join two tables on similar rows, but then stack the unique rows from both table 1 and table 2 on the result set. I was first shooting for a full outer join, but it leaves my key fields blank when the data comes from only one of the tables.
Example: Full Outer Join
Here's what I would like for the query to be able to do:
Essentially, I would like to have a result table where the key fields (Part and Operation) are all returned in two columns (so like a union), but the Estimated and Actual Rate columns returned side by side where there is a matching row between table 1 and table 2.
I've also been trying to inner join the two tables to make a subquery, then using that inner join for except clause on each of the tables, then stacking the original inner join with the two except unions.
Current Attempt: One Join, Two Excepts, Two Unions
UPDATE: I got the current attempt to return values! It's a bit complicated though, Appreciate any advice or feedback though! Great answers below thanks, I will need to do some comparisons
Thanks

SELECT ISNULL(t1.part,t2.part) AS Part,
ISNULL(t1.operation,t2.operation) AS Operation,
ISNULL('Estimated Rate',0) AS 'Estimated Rate',
ISNULL('Actual Rate',0) AS 'Actual Rate'
FROM table1 t1
FULL OUTER JOIN table2 t2
ON t1.part = t2.part
AND t1.operation = t2.operation

I would do this as a union all and group by:
select part, operation,
sum(estimatedrate) as estimatedrate, sum(actualrate) as actualrate
from ((select part, operation, estimatedrate, 0 as actualrate
from table1
) union all
(select part, operation, 0 as estimatedrate, 0 actualrate
from table1
)
) er
group by part, operation;

Related

Alternative for joining two tables multiple times

I have a situation where I have to join a table multiple times. Most of them need to be left joins, since some of the values are not available. How to overcome the query poor performance when joining multiple times?
The Scenario
Tables
[Project]: ProjectId Guid, Name VARCHAR(MAX).
[UDF]: EntityId Guid, EntityType Char(1), UDFCode Guid, UDFName varchar(20)
[UDFDetail]: UDFCode Guid, Description VARCHAR(MAX)
Relationship:
[Project].ProjectId - [UDF].EntityId
[UDFDetail].UDFCode - [UDF].UDFCode
The UDF table holds custom fields for projects, based on the UDFName column. The value for these fields, however, is stored on the UDFDetail, in the column Description.
I have lots of custom columns for Project, and they are stored in the UDF table.
So for example, to get two fields for the project I do the following select:
SELECT
p.Name ProjectName,
ud1.Description Field1,
ud1.UDFCode Field1Id,
ud2.Description Field2,
ud2.UDFCode Field2Id
FROM
Project p
LEFT JOIN UDF u1 ON
u1.EntityId = p.ProjectId AND u1.ItemName='Field1'
LEFT JOIN UDFDetail ud1 ON
ud1.UDFCode = u1.UDFCode
LEFT JOIN UDF u2 ON
u2.EntityId = p.ProjectId AND u2.ItemName='Field2'
LEFT JOIN UDFDetail ud2 ON
ud2.UDFCode = u2.UDFCode
The Problem
Imagine the above select but joining with like 15 fields. In my query I have around 10 fields already and the performance is not very good. It is taking about 20 seconds to run. I have good indexes for these tables, so looking at the execution plan, it is doing only index seeks without any lookups. Regarding the joins, it needs to be left join, because Field 1 might not exist for that specific project.
The Question
Is there a more performatic way to retrieve the data?
How would you do the query to retrieve 10 different fields for one project in a schema like this?
Your choices are pivot, explicit aggregation (with conditional functions), or the joins. If you have the appropriate indexes set up, the joins may be the fastest method.
The correct index would be UDF(EntityId, ItemName, UdfCode).
You can test if the group by is faster by running a query such as:
SELECT count(*)
FROM p LEFT JOIN
UDF u1
ON u1.EntityId = p.ProjectId LEFT JOIN
UDFDetail ud1
ON ud1.UDFCode = u1.UDFCode;
If this runs fast enough, then you can consider the group by approach.
You can try this very weird contraption (it does not look pretty, but it does a single set of outer joins). The intermediate result is a very "wide" and "long" dataset, which we can then "compact" with aggregation (for example, for each ProjectName, each Field1 column will have N result, N-1 NULLs and 1 non-null result, which is then selecting with a simple MAX aggregation) [N is the number of fields].
select ProjectName, max(Field1) as Field1, max(Field1Id) as Field1Id, max(Field2) as Field2, max(Field2Id) as Field2Id
from (
select
p.Name as ProjectName,
case when u.UDFName='Field1' then ud.Description else NULL end as Field1,
case when u.UDFName='Field1' then ud.UDFCode else NULL end as Field1Id,
case when u.UDFName='Field2' then ud.Description else NULL end as Field2,
case when u.UDFName='Field2' then ud.UDFCode else NULL end as Field2Id
from Project p
left join UDF u on p.ProjectId=u.EntityId
left join UDFDetail ud on u.UDFCode=ud.UDFCode
) tmp
group by ProjectName
The query can actually be rewritten without the inner query, but that should not make a big difference :), and looking at Gordon Linoff's suggestion and your answer, it might actually take just about 20 seconds as well, but it is still worth giving a try.

Number of Records don't match when Joining three tables

Despite going through every material I could possibly find on the internet, I haven't been able to solve this issue myself. I am new to MS Access and would really appreciate any pointers.
Here's my problem - I have three tables
Source1084 with columns - Department, Sub-Dept, Entity, Account, +few more
R12CAOmappingTable with columns - Account, R12_Account
Table4 with columns - R12_Account, Department, Sub-Dept, Entity, New Dept, LOB +few more
I have a total of 1084 records in Source and the result table must also contain 1084 records. I need to draw a table with all the columns from Source + R12_account from R12CAOmappingTable + all columns from Table4.
Here is the query I wrote. This yields the right columns but gives me more or less number of records with interchanging different join options.
SELECT rmt.r12_account,
srb.version,
srb.fy,
srb.joblevel,
srb.scenario,
srb.department,
srb.[sub-department],
srb.[job function],
srb.entity,
srb.employee,
table4.lob,
table4.product,
table4.newacct,
table4.newdept,
srb.[beg balance],
srb.jan,
srb.feb,
srb.mar,
srb.apr,
srb.may,
srb.jun,
srb.jul,
srb.aug,
srb.sep,
srb.oct,
srb.nov,
srb.dec,
rmt.r12_account
FROM (source1084 AS srb
LEFT JOIN r12caomappingtable AS rmt
ON srb.account = rmt.account)
LEFT JOIN table4
ON ( srb.department = table4.dept )
AND ( srb.[sub-department] = table4.subdept )
AND ( srb.entity = table4.entity )
WHERE ( ( ( srb.[sub-department] ) = table4.subdept )
AND ( ( srb.entity ) = table4.entity )
AND ( ( rmt.r12_account ) = table4.r12_account ) );
In this simple example, Table1 contains 3 rows with unique fld1 values. Table2 contains one row, and the fld1 value in that row matches one of those in Table1. Therefore this query returns 3 rows.
SELECT *
FROM
Table1 AS t1
LEFT JOIN Table2 AS t2
ON t1.fld1 = t2.fld1;
However if I add the WHERE clause as below, that version of the query returns only one row --- the row where the fld1 values match.
SELECT *
FROM
Table1 AS t1
LEFT JOIN Table2 AS t2
ON t1.fld1 = t2.fld1
WHERE t1.fld1 = t2.fld1;
In other words, that WHERE clause counteracts the LEFT JOIN because it excludes rows where t2.fld1 is Null. If that makes sense, notice that second query is functionally equivalent to this ...
SELECT *
FROM
Table1 AS t1
INNER JOIN Table2 AS t2
ON t1.fld1 = t2.fld1;
Your situation is similar. I suggest you first eliminate the WHERE clause and confirm this query returns at least your expected 1084 rows.
SELECT Count(*) AS CountOfRows
FROM (source1084 AS srb
LEFT JOIN r12caomappingtable AS rmt
ON srb.account = rmt.account)
LEFT JOIN table4
ON ( srb.department = table4.dept )
AND ( srb.[sub-department] = table4.subdept )
AND ( srb.entity = table4.entity );
After you get the query returning the correct number of rows, you can alter the SELECT list to return the columns you want. But the columns aren't really the issue until you can get the correct rows.
Without knowing your tables values it is hard to give a complete answer to your question. The issue that is causing you a problem based on how you described it. Is more then likely based on the type of joins you are using.
The best way I found to understand what type of joins you should be using would referencing a Venn diagram explaining the different type of joins that you can use.
Jeff Atwood also has a really good explanation of SQL joins on his site using the above method as well.
Best to just use the query builder. Drop in your main table. Choose the columns you want. Now for any of the other lookup values then simply drop in the other tables, draw the join line(s), double click and use a left join. You can do this for 2 or 30 columns that need to "grab" or lookup other values from other tables. The number of ORIGINAL rows in the base table returned should ALWAYS remain the same.
So just use the query builder and follow the above.
The problem with your posted SQL is you NESTED the joins inside (). Don't do that. (or let the query builder do this for you – they tend to be quite messy but will also work).
Just use this:
FROM source1084 AS srb
LEFT JOIN r12caomappingtable AS rmt
ON srb.account = rmt.account
LEFT JOIN table4
ON ( srb.department = table4.dept )
AND ( srb.[sub-department] = table4.subdept )
AND ( srb.entity = table4.entity )
As noted, I don't see why you are "repeating" the conditions again in the where clause.

SQL join result errors

I'm trying to run this join and I'm not receiving the correct values.
My first query return like 25,000 record
SELECT count(*) from table1 as DSO,
table2 as EAR
WHERE
(UCASE(TRIM(EAR.value)) = UCASE(TRIM(DSO.value))
AND
UCASE(TRIM(EAR.value1) = UCASE(TRIM(DSO.value1))
my second Query return like 3,000,000
SELECT count(*) from table1 as DSO
left join table2 as EAR,
ON
(UCASE(TRIM(EAR.value)) = UCASE(TRIM(DSO.value))
AND
UCASE(TRIM(EAR.value1) = UCASE(TRIM(DSO.value1))
The total of records of the table 1 are like 45,000, thats what I Should recieve.
First query is an INNER JOIN and second one is a LEFT JOIN. You should expect quite different results. Also, look at the way db2400 treats NULLs with the UCASE and TRIM functions. My guess is that your left join is making some matches that you don't want.
The INNER JOIN in the first query is going to exclude any records from table1 that don't have a match in table2. That pretty quickly explains the lower count.
Either join will happily create more than one row for each record in table1 if it finds multiple matches in table2. The difference is that the LEFT JOIN will ALSO create one row for each record in table1 that doesn't have a match in table2. It sounds like you expect there to be a 1:1 match between the two tables, but that is not what you are getting.

INNER JOIN with complex condition dramatically increases the execution time

I have 2 tables with several identical fields needed to be linked in JOIN condition. E.g. in each table there are fields: P1, P2. I want to write the following join query:
SELECT ... FROM Table1
INNER JOIN
Table2
ON Table1.P1 = Table2.P1
OR Table1.P2 = Table2.P2
OR Table1.P1 = Table2.P2
OR Table1.P2 = Table2.P1
In the case I have huge tables this request is executing a lot of time.
I tried to test how long will be the request of a query with one condition only. First, I have modified the tables in such way all data from P2 & P1 where copied as new rows into Table1 & Table2. So my query is simple:
SELECT ... FROM Table1 INNER JOIN Table2 ON Table1.P = Table2.P
The result was more then surprised: the execution time from many hours (the 1st case) was reduced to 2-3 seconds!
Why is it so different? Does it mean the complex conditions are always reduce performance? How can I improve the issue? May be P1,P2 indexing will help? I want to remain the 1st DB schema and not to move to one field P.
The reason the queries are different is because of the join strategies being used by the optimizer. There are basically four ways that two tables can be joined:
"Hash join": Creates a hash table on one of the tables which it uses to look up the values in the second.
"Merge join": Sorts both tables on the key and then readsthe results sequentially for the join.
"Index lookup": Uses an index to look up values in one table.
"Nested Loop": Compars each value in each table to all the values in the other table.
(And there are variations on these, such as using an index instead of a table, working with partitions, and handling multiple processors.) Unfortunately, in SQL Server Management Studio both (3) and (4) are shown as nested loop joins. If you look more closely, you can tell the difference from the parameters in the node.
In any case, your original join is one of the first three -- and it goes fast. These joins can basically only be used on "equi-joins". That is, when the condition joining the two tables includes an equality operator.
When you switch from a single equality to an "in" or set of "or" conditions, the join condition has changed from an equijoin to a non-equijoin. My observation is that SQL Server does a lousy job of optimization in this case (and, to be fair, I think other databases do pretty much the same thing). Your performance hit is the hit of going from a good join algorithm to the nested loops algorithm.
Without testing, I might suggest some of the following strategies.
Build an index on P1 and P2 in both tables. SQL Server might use the index even for a non-equijoin.
Use the union query suggested in another solution. Each query should be correctly optimized.
Assuming these are 1-1 joins, you can also do this as a set of multiple joins:
from table1 t1 left outer join
table2 t2_11
on t1.p1 = t2_11.p1 left outer join
table2 t2_12
on t1.p1 = t2_12.p2 left outer join
table2 t2_21
on t1.p2 = t2_21.p2 left outer join
table2 t2_22
on t1.p2 = t2_22.p2
And then use case/coalesce logic in the SELECT to get the value that you actually want. Although this may look more complicated, it should be quite efficient.
you can use 4 query and Union there result
SELECT ... FROM Table1
INNER JOIN
Table2
ON Table1.P1 = Table2.P1
UNION
SELECT ... FROM Table1
INNER JOIN
Table2
ON Table1.P1 = Table2.P2
UNION
SELECT ... FROM Table1
INNER JOIN
Table2
ON Table1.P2 = Table2.P1
UNION
SELECT ... FROM Table1
INNER JOIN
Table2
ON Table1.P2 = Table2.P2
Does using CTEs help performance?
;WITH Table1_cte
AS
(
SELECT
...
[P] = P1
FROM Table1
UNION
SELECT
...
[P] = P2
FROM Table1
)
, Table2_cte
AS
(
SELECT
...
[P] = P1
FROM Table2
UNION
SELECT
...
[P] = P2
FROM Table2
)
SELECT ... FROM Table1_cte x
INNER JOIN
Table2_cte y
ON x.P = y.P
I suspect, as far as the processor is concerned, the above is just different syntax for the same complex conditions.

How can I exclude values from a third query (Access)

I have a query that shows me a listing of ALL opportunities in one query
I have a query that shows me a listing of EXCLUSION opportunities, ones we want to eliminate from the results
I need to produce a query that will take everything from the first query minus the second query...
SELECT DISTINCT qryMissedOpportunity_ALL_Clients.*
FROM qryMissedOpportunity_ALL_Clients INNER JOIN qryMissedOpportunity_Exclusions ON
([qryMissedOpportunity_ALL_Clients].[ClientID] <> [qryMissedOpportunity_Exclusions].[ClientID])
AND
([qryMissedOpportunity_Exclusions].[ClientID] <> [qryMissedOpportunity_Exclusions].[BillingCode])
The initial query works as intended and exclusions successfully lists all the hits, but I get the full listing when I query with the above which is obviously wrong. Any tips would be appreciated.
EDIT - Two originating queries
qryMissedOpportunity_ALL_Clients (1)
SELECT MissedOpportunities.MOID, PriceList.BillingCode, Client.ClientID, Client.ClientName, PriceList.WorkDescription, PriceList.UnitOfWork, MissedOpportunities.Qty, PriceList.CostPerUnit AS Our_PriceList_Cost, ([MissedOpportunities].[Qty]*[PriceList].[CostPerUnit]) AS At_Cost, MissedOpportunities.fBegin
FROM PriceList INNER JOIN (Client INNER JOIN MissedOpportunities ON Client.ClientID = MissedOpportunities.ClientID) ON PriceList.BillingCode = MissedOpportunities.BillingCode
WHERE (((MissedOpportunities.fBegin)=#10/1/2009#));
qryMissedOpportunity_Exclusions
SELECT qryMissedOpportunity_ALL_Clients.*, MissedOpportunity_Exclusions.Exclusion, MissedOpportunity_Exclusions.Comments
FROM qryMissedOpportunity_ALL_Clients INNER JOIN MissedOpportunity_Exclusions ON (qryMissedOpportunity_ALL_Clients.BillingCode = MissedOpportunity_Exclusions.BillingCode) AND (qryMissedOpportunity_ALL_Clients.ClientID = MissedOpportunity_Exclusions.ClientID)
WHERE (((MissedOpportunity_Exclusions.Exclusion)=True));
One group needs to see everything, the other needs to see things they havn't deamed as "valid" missed opportunity as in, we've seen it, verified why its there and don't need to bother critiquing it every single month.
Generally you can exclude a table by doing a left join and comparing against null:
SELECT t1.* FROM t1 LEFT JOIN t2 on t1.id = t2.id where t2.id is null;
Should be pretty easy to adopt this to your situation.
Looking at your query rewritten to use table aliases so I can read it...
SELECT DISTINCT c.*
FROM qryMissedOpportunity_ALL_Clients c
JOIN qryMissedOpportunity_Exclusions e
ON c.ClientID <> e.ClientID
AND e.ClientID <> e.BillingCode
This query will produce a cartesian product of sorts... each and every row in qryMissedOpportunity_ALL_Clients will match and join with every row in qryMissedOpportunity_Exclusions where ClientIDs do not match... Is this what you want?? Generally join conditions are based on a column in one table being equal to the value of a column in the other table... Joining where they are not equal is unusual ...
Second, the second iniquality in the join conditions is between columns in the same table (qryMissedOpportunity_Exclusions table) Are you sure this is what you want? If it is, it is not a join condition, it is a Where clause condition...
Second, your question mentions two queries, but there is only the one query (above) in yr question. Where is the second one?