Novice SQL Server
Can someone explain the logic of the below update with a join. I don't understand the setting of a specific value in the 'on' clause...
(#c is a tiny temp table with fields: cert, prod, cov, i)
update m
set inieff = i
from tmempt m
inner join #c on clntcode = '01208' and
polno = '00000408' and
certno = cert and
prodcode = prod and
covgcode = cov and
rcdsts = 'A'
...so how does '..on clntcode='01208' and polno='00000408'' work in the context of a join? I thought that joins work by field relationships...
Thanks
J
An inner join is simple. For each pair of rows in the two tables, the on clause is evaluated. When it evaluates to true (i.e. not false and not NULL), then the pair passes the filter.
Note that there is no restriction whatsoever on the condition. The most typical conditions are equality conditions on one or more columns. However, inequalities, function calls, and even subqueries are allowed.
The definition of outer joins is just a slight variation on the inner join definition. For outer joins, rows are output from one or both tables even when the on clause does not evaluate to true.
For inner joins, putting conditions in the on versus where is really a matter of style. For outer joins, some conditions may need to go in the on -- and others in the where.
The join conditions can have whatever clauses you like.
The main purpose is to join one column of one table to a column in another table, but it can also be used to limit the rows you look at in the joined tables.
For example something like this is relatively common
select a1.address as postal, a2.address as street
from customer
join address a1 on a1.customerid=customer.id and a1.addresstype='postal'
join address a2 on a2.customerid=customer.id and a2.addresstype='street'
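As a rough, runnable sketch of the point above, here is the customer/address example in Python's built-in sqlite3 module. SQLite stands in for SQL Server, and the table contents are invented for illustration. It shows that, for inner joins, a constant comparison in the on clause returns the same rows as the same comparison in the where clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data, not taken from the original question.
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE address  (customerid INTEGER, addresstype TEXT, address TEXT);
    INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO address VALUES
        (1, 'postal', 'PO Box 1'), (1, 'street', '1 Main St'),
        (2, 'postal', 'PO Box 2'), (2, 'street', '2 High St');
""")

# Constant comparisons inside ON: each alias only sees one address type.
on_rows = conn.execute("""
    SELECT a1.address AS postal, a2.address AS street
    FROM customer
    JOIN address a1 ON a1.customerid = customer.id AND a1.addresstype = 'postal'
    JOIN address a2 ON a2.customerid = customer.id AND a2.addresstype = 'street'
    ORDER BY customer.id
""").fetchall()

# The same comparisons moved to WHERE: identical rows for inner joins.
where_rows = conn.execute("""
    SELECT a1.address AS postal, a2.address AS street
    FROM customer
    JOIN address a1 ON a1.customerid = customer.id
    JOIN address a2 ON a2.customerid = customer.id
    WHERE a1.addresstype = 'postal' AND a2.addresstype = 'street'
    ORDER BY customer.id
""").fetchall()

print(on_rows)
print(on_rows == where_rows)
```

With outer joins the two placements can give different results, so this equivalence should only be relied on for inner joins.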
SQL Masters,
I don't understand part of this query. In the select statement there are what look like independent 'select statements', almost like a function. This code is vendor-written Blackbaud CRM. As independent code there is no join in the code for the info they bring into the data set, as you can see in the from clause. One last odd item is that in the column aliased Spouse_id, the column SPOUSE.RECIPROCALCONSTITUENTID does not even exist in the table referred to. Any BBCRM people out there who can explain this?
Thanks
select
CONSTITUENT.ID,
CONSTITUENT.ISORGANIZATION,
CONSTITUENT.KEYNAME,
CONSTITUENT.FIRSTNAME,
CONSTITUENT.MIDDLENAME,
CONSTITUENT.MAIDENNAME,
CONSTITUENT.NICKNAME,
(select SPOUSE.RECIPROCALCONSTITUENTID
from dbo.RELATIONSHIP as SPOUSE
where SPOUSE.RELATIONSHIPCONSTITUENTID = CONSTITUENT.ID
and SPOUSE.ISSPOUSE = 1) as [SPOUSE_ID],
(select MARITALSTATUSCODE.DESCRIPTION
from dbo.MARITALSTATUSCODE
where MARITALSTATUSCODE.ID = CONSTITUENT.MARITALSTATUSCODEID) as [MARITALSTATUSCODEID_TRANSLATION]
From
dbo.constituent
left join
dbo.ORGANIZATIONDATA on ORGANIZATIONDATA.ID = CONSTITUENT.ID
where
(CONSTITUENT.ISCONSTITUENT = 1)
These are correlated subqueries. Although there is no explicit JOIN, there is a link to the outer table which behaves like a join (although more constrained than explicit JOINs):
(select SPOUSE.RECIPROCALCONSTITUENTID
from dbo.RELATIONSHIP as SPOUSE
where SPOUSE.RELATIONSHIPCONSTITUENTID = CONSTITUENT.ID AND
-------^ correlation clause connecting to outer table
SPOUSE.ISSPOUSE = 1
) as [SPOUSE_ID],
This behaves like a LEFT JOIN. If no rows match, then the result is NULL.
Note that in this context, the correlated subquery is also a scalar subquery. That means that it returns exactly one column and at most one row.
If the query returned more than one column, you would get a compile-time error on the query. If the query returns more than one row, you will get a run-time error on the query.
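Here is a small sketch of the "behaves like a LEFT JOIN" point, using Python's sqlite3 module as a stand-in for SQL Server. The tables are cut-down, hypothetical versions of CONSTITUENT and RELATIONSHIP with invented rows. It assumes each constituent has at most one spouse row; under that assumption the correlated scalar subquery and an explicit LEFT JOIN return the same rows, with NULL (None in Python) where nothing matches.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified, hypothetical versions of the vendor tables.
conn.executescript("""
    CREATE TABLE constituent (id INTEGER PRIMARY KEY, keyname TEXT);
    CREATE TABLE relationship (
        relationshipconstituentid INTEGER,
        reciprocalconstituentid   INTEGER,
        isspouse                  INTEGER);
    INSERT INTO constituent VALUES (1, 'Smith'), (2, 'Jones'), (3, 'Lee');
    INSERT INTO relationship VALUES (1, 2, 1), (2, 1, 1), (3, 1, 0);
""")

# Correlated scalar subquery: evaluated once per outer row.
subquery_rows = conn.execute("""
    SELECT c.id,
           (SELECT s.reciprocalconstituentid
              FROM relationship AS s
             WHERE s.relationshipconstituentid = c.id   -- correlation
               AND s.isspouse = 1) AS spouse_id
    FROM constituent AS c
    ORDER BY c.id
""").fetchall()

# Explicit LEFT JOIN with the same conditions in the ON clause.
join_rows = conn.execute("""
    SELECT c.id, s.reciprocalconstituentid AS spouse_id
    FROM constituent AS c
    LEFT JOIN relationship AS s
           ON s.relationshipconstituentid = c.id AND s.isspouse = 1
    ORDER BY c.id
""").fetchall()

print(subquery_rows)
```

One caveat about the stand-in: SQLite does not raise the run-time error described above when a scalar subquery matches several rows (it uses the first row), so that part can only be observed on SQL Server itself.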
I've been dealing with a slow running query, similar to the following
select
count(*)
from
a
join b
on a.akey = b.akey
join c
on b.bkey = c.bkey
left join d
on c.ykey = d.ykey
and b.xkey = d.xkey
where
a.idkey = 'someid'
This query takes 130s to run for 'someid'
If I remove either condition of the left join, the query runs in <1s.
I've determined the issue for this particular record (someid). There are a huge number of matching d.xkey values (~5 000 000). I've done some tests and modifying the relevant d.xkey values for this record to more unique values improves run time to <1s.
This is the fix I'm currently using.
select
count(*)
from
a
join b
on a.akey = b.akey
join c
on b.bkey = c.bkey
left join d
on c.ykey = d.ykey
where
a.idkey = 'someid'
and (
b.xkey = d.xkey
OR b.xkey is null
OR not exists (
select
dd.xkey
from
d dd
where
dd.xkey = b.xkey
and dd.ykey = c.ykey
)
)
This query runs in less than 1s.
My question is, why is this so much faster than the left join?
Is my new query equivalent to the old one in terms of results?
If the join onto d is efficient for either b.xkey or c.ykey alone (these names are appallingly subtle), but not when both are combined, it's probably because it is able to use an index on d for each one individually, but there is no combined index available.
The second example you've posted with the NOT EXISTS clause is almost unfathomable, but crucially it includes extra logic and is not directly equivalent to the LEFT JOIN in the first example.
In the WHERE clause of your second example, you permit rows to be included that have been left-joined between c and d where b.xkey is null, whereas in your first example, the joining of these rows would never have occurred (because b.xkey being null would have precluded the left-join). This means d has already possibly multiplied the rows in the results set improperly, which cannot be filtered by the where-clause (because without a ROW_NUMBER function, the where-clause cannot differentiate between each improper match - and can only filter either all or none of them, rather than reducing them back down to a single row), so the two queries can be deemed not logically identical on this ground alone.
It's otherwise difficult to reason precisely about what the combined effect of the whole where-clause is, and how it might be interacting with the other constraints and the underlying data to allow the query to perform better (despite ostensibly having to perform a similar lookup as the left-join did in the first example). If you are getting identical results from both queries, I would say it is only because of a dangerous coincidence in the data, while the logical constraints imposed by the two queries are fundamentally different.
The two queries do not seem to be logically equal. You can see this simply from the condition
OR b.xkey is null.
The joins are a=b and b=c and c=d(+).
If you add OR b.xkey is null, the rows are no longer filtered the way the original join condition filtered them.
So the two queries are actually very different.
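To make the non-equivalence concrete, here is a minimal counter-example in Python's sqlite3 module. The four tables keep the a/b/c/d names from the question, but the rows are invented: b.xkey is NULL and three d rows share the same ykey. The original query counts one row; the rewritten query counts three, because the join on ykey alone multiplies the row and the `b.xkey is null` branch lets every copy through.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Invented data: one chain a -> b -> c, with b.xkey NULL and three d rows on the same ykey.
conn.executescript("""
    CREATE TABLE a (akey INTEGER, idkey TEXT);
    CREATE TABLE b (akey INTEGER, bkey INTEGER, xkey INTEGER);
    CREATE TABLE c (bkey INTEGER, ykey INTEGER);
    CREATE TABLE d (xkey INTEGER, ykey INTEGER);
    INSERT INTO a VALUES (1, 'someid');
    INSERT INTO b VALUES (1, 10, NULL);
    INSERT INTO c VALUES (10, 100);
    INSERT INTO d VALUES (7, 100), (8, 100), (9, 100);
""")

# Original query: both conditions in the LEFT JOIN.
original_count = conn.execute("""
    SELECT count(*)
    FROM a
    JOIN b ON a.akey = b.akey
    JOIN c ON b.bkey = c.bkey
    LEFT JOIN d ON c.ykey = d.ykey AND b.xkey = d.xkey
    WHERE a.idkey = 'someid'
""").fetchone()[0]

# Rewritten query: xkey condition moved into the WHERE clause.
rewritten_count = conn.execute("""
    SELECT count(*)
    FROM a
    JOIN b ON a.akey = b.akey
    JOIN c ON b.bkey = c.bkey
    LEFT JOIN d ON c.ykey = d.ykey
    WHERE a.idkey = 'someid'
      AND (b.xkey = d.xkey
           OR b.xkey IS NULL
           OR NOT EXISTS (SELECT dd.xkey FROM d dd
                          WHERE dd.xkey = b.xkey AND dd.ykey = c.ykey))
""").fetchone()[0]

print(original_count, rewritten_count)
```

If both queries return the same count on the real data, that would only mean the data happens not to contain rows like these.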
Consider below SQL.
SELECT DISTINCT bvc_Order.ID,
bvc_OrderItem.ProductID,
bvc_OrderItem_BundleItem.ProductID
FROM dbo.bvc_OrderItem WITH (nolock)
RIGHT OUTER JOIN dbo.bvc_Order WITH (nolock)
LEFT OUTER JOIN dbo.bvc_User WITH (nolock) ON dbo.bvc_Order.UserID = dbo.bvc_User.ID
LEFT OUTER JOIN dbo.Amazon_Merchants WITH (nolock) ON dbo.bvc_Order.CompanyID = dbo.Amazon_Merchants.ID ON dbo.bvc_OrderItem.OrderID = dbo.bvc_Order.ID
LEFT OUTER JOIN dbo.bvc_OrderItem_BundleItem WITH (nolock) ON dbo.bvc_OrderItem.ID = dbo.bvc_OrderItem_BundleItem.OrderItemID
LEFT OUTER JOIN dbo.bvc_Product WITH (nolock) ON dbo.bvc_OrderItem.ProductID = dbo.bvc_Product.ID
WHERE 1=1
AND (bvc_Order.StatusCode <> 1
AND bvc_Order.StatusCode <> 999)
AND ( bvc_OrderItem.ProductID IN ('28046_00')
OR bvc_OrderItem_BundleItem.ProductID IN ('28046_00'))
AND bvc_Order.OrderSource = 56;
When I execute the query against my database, it returns 85 rows, which is not correct.
If I just remove the part "AND bvc_Order.OrderSource = 56" it returns 5 rows, which is correct.
Strange.....
Another thing, if I remove the part
OR bvc_OrderItem_BundleItem.ProductID IN ('28046_00')
it will also return the 5 rows as expected even with bvc_Order.OrderSource filter.
I am not sure why it is adding more rows while I am trying to reduce rows by using filters.
the table bvc_OrderItem_BundleItem doesn't contain any rows for the result order ids or OrderItemIDs
[edit]
Thanks guys, I tried to remove the LEFT/RIGHT join mix, but the query manager doesn't allow only LEFT joins; it always adds at least one RIGHT join. I updated the SQL to remove the extra tables and now we have only three, but the result is the same.
SELECT DISTINCT dbo.bvc_Order.ID, dbo.bvc_OrderItem.ProductID, dbo.bvc_OrderItem_BundleItem.ProductID AS Expr1
FROM dbo.bvc_OrderItem
LEFT OUTER JOIN dbo.bvc_OrderItem_BundleItem ON dbo.bvc_OrderItem.ID = dbo.bvc_OrderItem_BundleItem.OrderItemId
RIGHT OUTER JOIN dbo.bvc_Order ON dbo.bvc_OrderItem.OrderID = dbo.bvc_Order.ID
WHERE 1=1
AND (bvc_Order.StatusCode <> 1 AND bvc_Order.StatusCode <> 999)
AND (
bvc_OrderItem.ProductID IN ('28046_00')
OR bvc_OrderItem_BundleItem.ProductID IN ('28046_00')
)
AND bvc_Order.OrderSource = 56;
[edit] So far, there is no solution for this. I previously pasted a link in my comment with example data output for both valid/invalid results with queries. Here it is again.
http://sameers.me/SQLIssue.xlsx
One thing to remember here is that using only left joins is not possible. Let me explain further:
bvc_Order contains the main order record
bvc_OrderItem contains order items/products
bvc_OrderItem_BundleItem contains child products of the products in the bvc_OrderItem table.
Not every product has child products, so bvc_OrderItem_BundleItem may not have any record (and in the current scenario, there is really no valid row for the orders in bvc_OrderItem_BundleItem).
In short, in the current scenario, there is NO matching row available in the bvc_OrderItem_BundleItem table. If I remove that join for now, it is all okay, but in the real world I can't remove the BundleItem table join, of course.
thank you
When you say
WHERE bvc_Order.OrderSource = 56
that does not evaluate to true when bvc_Order.OrderSource is NULL (the comparison yields UNKNOWN, which WHERE treats the same as false). If the LEFT/RIGHT join failed then it will be NULL. This effectively turns the LEFT/RIGHT join into an inner join.
You probably should write the predicate into the ON clause. An alternative approach, which might not deliver the same results, is:
WHERE (bvc_Order.OrderSource IS NULL OR bvc_Order.OrderSource = 56)
The other predicates have the same problem:
Another thing, if I remove the part OR bvc_OrderItem_BundleItem.ProductID IN ('28046_00') it will also return the 5 rows as expected
When the join fails bvc_OrderItem_BundleItem.ProductID is NULL.
I would also recommend writing queries manually. If I understand you correctly, this query comes from a designer. Its structure is quite confusing. I'm pulling up the most important comment:
Mixing left and right outer joins in a query is just confusing. You should start by rewriting the from clause to only use one type (and I strongly recommend left outer join). – Gordon Linoff
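The effect described in this answer can be reproduced on a toy schema. The sketch below uses Python's sqlite3 module as a stand-in for SQL Server, with hypothetical `orders` and `items` tables and invented rows. It compares three placements of the same predicate: in WHERE, in ON, and in WHERE with an added IS NULL test.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: order 1 has items A and B, order 2 has none, order 3 has only B.
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY);
    CREATE TABLE items  (orderid INTEGER, productid TEXT);
    INSERT INTO orders VALUES (1), (2), (3);
    INSERT INTO items  VALUES (1, 'A'), (1, 'B'), (3, 'B');
""")

def run(sql):
    return conn.execute(sql).fetchall()

# Predicate in WHERE: NULL-extended rows fail the test, so the
# LEFT JOIN behaves like an inner join.
where_rows = run("""
    SELECT o.id, i.productid
    FROM orders o LEFT JOIN items i ON i.orderid = o.id
    WHERE i.productid = 'A'
    ORDER BY o.id""")

# Predicate in ON: every order is preserved; only 'A' items are attached.
on_rows = run("""
    SELECT o.id, i.productid
    FROM orders o LEFT JOIN items i ON i.orderid = o.id AND i.productid = 'A'
    ORDER BY o.id""")

# IS NULL alternative: keeps orders with no items at all, but still
# drops order 3, which has items that are not 'A'.
null_or_rows = run("""
    SELECT o.id, i.productid
    FROM orders o LEFT JOIN items i ON i.orderid = o.id
    WHERE i.productid IS NULL OR i.productid = 'A'
    ORDER BY o.id""")

print(where_rows, on_rows, null_or_rows, sep="\n")
```

The third variant shows why the IS NULL rewrite "might not deliver the same results": it treats an order with no items differently from an order whose items simply fail the test.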
When you have eliminated the impossible, whatever remains, however
improbable, must be the truth. (S.H.)
It is impossible that an extra AND condition appended to a WHERE clause can ever result in extra rows. That would imply a database engine defect, which I hope I can assume is "impossible". (If not, then I guess it's back to square one).
That fact makes it easier to concentrate on possible reasons:
When you comment out
AND bvc_Order.OrderSource = 56;
then you also comment out the semicolon terminator. Is it possible
that there is text following this query that is affecting it? Try
putting a semicolon at the end of the previous line to make sure.
Depending on the tool you are using to run queries, sometimes when a
query fails to execute, the tool mistakenly shows an old result set.
Make sure your query is executing correctly by adding a dummy column
to the SELECT statement to absolutely prove you are seeing live
results. Which tool are you using?
When you use a LEFT OUTER JOIN, it returns all the rows from the left table (dbo.bvc_OrderItem) that satisfy your AND/OR conditions;
the same thing happens with a RIGHT OUTER JOIN for the right table.
Those joins (left join, right join) may not restrict the rows at all, since one table can contribute all of its rows while the other contributes only some.
First check your join conditions.
Then check your condition:
(bvc_Order.StatusCode <> 1 AND bvc_Order.StatusCode <> 999)
and see whether any rows satisfy it.
Next check the other condition:
[bvc_OrderItem.ProductID IN ('28046_00')
OR bvc_OrderItem_BundleItem.ProductID IN ('28046_00')]
and then bvc_Order.OrderSource = 56.
Compare the results of the three queries and check the data against the conditions before writing your complete query, so that you can see where the mistake is.
A few points to remember:
1. AND predicates in an ON clause are applied during the virtual join phases.
2. The WHERE clause is applied after the joins have produced their result.
3. A left join followed by a right join is effectively an inner join in some cases.
Let's break your query down step by step.
dbo.bvc_OrderItem a1
LEFT OUTER JOIN
dbo.bvc_OrderItem_BundleItem b1
The output above will be a single virtual table (logically) which contains all rows from a1 with the matching rows from b1.
now below predicates from your and clause will be applied
bvc_OrderItem.ProductID IN ('28046_00')
OR bvc_OrderItem_BundleItem.ProductID IN ('28046_00')
This effectively eliminates all rows from bvc_OrderItem_BundleItem, even if they have matches, and gives a result something like the one below if bvc_OrderItem_BundleItem.ProductID IN ('28046_00') is true:
bvc_OrderItem bvc_OrderItem_BundleItem
28046 28046
null 1
null 2
null 3
If the condition (bvc_OrderItem.ProductID IN ('28046_00')) is true, then you are asking SQL to ignore all rows in bvc_OrderItem, which effectively means the same result set as above:
bvc_OrderItem bvc_OrderItem_BundleItem other columns
28046 28046
null 1
null 2
null 3
Next you are doing a right outer join with dbo.bvc_Order, which may qualify for the join point I mentioned above.
Assume you got the result set below as output, which preserves all of the bvc_Order table (rough output only, for understanding, due to lack of actual data):
bvc_OrderItem bvc_OrderItem_BundleItem statuscode ordersource
28046 28046 999 56
null 1 1 57
null 2 100 58
null 3 11 59
Next, the AND predicates below will be applied:
statuscode <> 1 and statuscode <> 999
which means: ignore rows that match bvc_Order and have a status of 1 or 999, even if matching rows were found.
Next you are asking for bvc_Order.OrderSource = 56, which means: I don't care about other rows; preserve matching rows only for 56 and keep the rest as null.
I hope this clarifies what is happening step by step. A better way would be to provide some test data and show the expected output.
You can also control the physical order of joins. You can try the query below to see if this is what you are trying to do:
SELECT DISTINCT dbo.bvc_Order.ID, dbo.bvc_OrderItem.ProductID, dbo.bvc_OrderItem_BundleItem.ProductID AS Expr1
FROM dbo.bvc_OrderItem
LEFT OUTER JOIN
(
dbo.bvc_OrderItem_BundleItem
RIGHT OUTER JOIN
dbo.bvc_Order
ON dbo.bvc_OrderItem.OrderID = dbo.bvc_OrderItem_BundleItem.OrderItemId
)c
on
dbo.bvc_OrderItem.ID = c.bvc_OrderItem_BundleItem.OrderItemId
WHERE 1=1
AND (bvc_Order.StatusCode <> 1 AND bvc_Order.StatusCode <> 999)
AND (
bvc_OrderItem.ProductID IN ('28046_00')
OR bvc_OrderItem_BundleItem.ProductID IN ('28046_00')
)
AND bvc_Order.OrderSource = 56;
It looks like you are using the Query Designer. I would avoid using this as this can make your queries extremely confusing. Your queries will be much more concise if you are designing them by hand. If you don't completely understand how inner/outer joins work, a great textbook that I used to teach myself SQL is Murach's SQL Server for Developers.
https://www.murach.com/shop/murach-s-sql-server-2012-for-developers-detail
Now, onto the answer.
I've been thinking about how to resolve your problem, and if you are trying to reduce the result set to 5 rows, why are you using multiple outer joins in the first place? I would consider switching the joins to inner joins instead of outer joins if you are looking for a very specific result set. I can't really provide you with a really comprehensive answer without looking at exactly what results you are trying to achieve, but here's a general idea based on what you've provided to all of us:
SELECT DISTINCT dbo.bvc_Order.ID, dbo.bvc_OrderItem.ProductID, dbo.bvc_OrderItem_BundleItem.ProductID AS 'bvc_OrderItem_BundleItem_ProductID'
FROM dbo.bvc_OrderItem
INNER JOIN dbo.bvc_OrderItem_BundleItem ON dbo.bvc_OrderItem.ID = dbo.bvc_OrderItem_BundleItem.OrderItemId
INNER JOIN dbo.bvc_Order ON dbo.bvc_OrderItem.OrderID = dbo.bvc_Order.ID
Start here and then based upon what you are searching for, add where clauses to filter criteria.
Also, your where clause must be rewritten if you use an inner join instead of an outer join:
WHERE 1=1 --this is always true, so it has no effect on the result set and can be omitted.
AND (bvc_Order.StatusCode <> 1 AND bvc_Order.StatusCode <> 999) --this keeps only orders whose StatusCode is neither 1 nor 999.
--If you instead want the status codes 1 or 999, use:
--(bvc_Order.StatusCode = 1 OR bvc_Order.StatusCode = 999)
AND bvc_OrderItem.ProductID IN ('28046_00') --I would eliminate this unless you are looking to see if this value exists in ProductID.
--If you are trying to see if this value is in both tables, change it to:
--(bvc_OrderItem.ProductID = '28046_00' AND bvc_OrderItem_BundleItem.ProductID = '28046_00')
--if you are trying to see if the order source is 56, use this.
AND bvc_Order.OrderSource = 56;
If you are trying to find out rows that are not included in this result set, then I would use OUTER JOIN as necessary (LEFT preferred). Without more information about what you're looking for in your database, that's the best all of us can do.
Like @usr wrote, the reason for this unexpected (to you) result is that you build the query with outer joins and filter the rows after the join. If you need to filter rows of outer-joined tables, you should do this before the join.
But you are probably trying to build this:
SELECT DISTINCT o.ID, oi.ProductID, bi.ProductID AS Expr1
FROM dbo.bvc_Order as o
LEFT JOIN dbo.bvc_OrderItem as oi on oi.OrderID = o.ID
LEFT JOIN dbo.bvc_OrderItem_BundleItem as bi ON oi.ID = bi.OrderItemId
WHERE 1=1
AND o.OrderSource = 56
AND o.StatusCode not in (1, 999)
AND '28046_00' in (oi.ProductID, isnull(bi.ProductID,'_') )
Does this query give the results you need?
If not, try changing the last condition, for example:
and (bi.ProductID = '28046_00' or bi.ProductID is null and oi.ProductID = '28046_00')
You can also put additional conditions into the join conditions, for example:
SELECT DISTINCT o.ID, oi.ProductID, bi.ProductID AS Expr1
FROM dbo.bvc_Order as o
LEFT JOIN dbo.bvc_OrderItem as oi on oi.OrderID = o.ID
LEFT JOIN dbo.bvc_OrderItem_BundleItem as bi ON oi.ID = bi.OrderItemId
and bi.ProductID in ('28046_00') --this join BundleItem only if ...
WHERE 1=1
AND o.OrderSource = 56
AND o.StatusCode not in (1, 999)
AND (oi.ProductID in ('28046_00') or bi.ProductID is not null)
Ah, and if you always need to join bvc_Order with bvc_OrderItem, then use an inner join.
I have a query which works, goes like this:
Select
count(InsuranceOrderLine.AntallPotensiale) as potensiale,
COUNT(InsuranceOrderLine.AntallSolgt) as Solgt,
InsuranceProduct.Name,
InsuranceProductCategory.Name as Kategori
From
InsuranceOrderLine, InsuranceProduct, InsuranceProductCategory
where
InsuranceOrderLine.FKInsuranceProductId = InsuranceProduct.InsuranceProductID
and InsuranceProduct.FKInsuranceProductCategory = InsuranceProductCategory.InsuranceProductCategoryID
Group by
InsuranceProduct.name, InsuranceProductCategory.Name
This query returns what I need, but when I try to add one more table (InsuranceOrder) to be able to get the RegardingUser column, all the count values are far too high.
Select
count(InsuranceOrderLine.AntallPotensiale) as Potensiale,
COUNT(InsuranceOrderLine.AntallSolgt) as Solgt,
InsuranceProduct.Name,
InsuranceProductCategory.Name as Kategori,
RegardingUser
From
InsuranceOrderLine, InsuranceProduct, InsuranceProductCategory, InsuranceSalesLead
where
InsuranceOrderLine.FKInsuranceProductId = InsuranceProduct.InsuranceProductID
and InsuranceProduct.FKInsuranceProductCategory = InsuranceProductCategory.InsuranceProductCategoryID
Group by
InsuranceProduct.name, InsuranceProductCategory.Name,RegardingUser
Thanks in advance
You're adding one more table to your FROM clause, but you don't specify any JOIN condition for that table - so your previous result set is combined with the new table in a CROSS JOIN (cartesian product)! Of course you'll get duplication of data....
That's one of the reasons that I'm recommending never to use that old, legacy style JOIN - do not simply list a comma-separated bunch of tables in your FROM statement.
Always use the new ANSI standard JOIN syntax with INNER JOIN, LEFT OUTER JOIN and so on:
SELECT
count(iol.AntallPotensiale) as Potensiale,
COUNT(iol.AntallSolgt) as Solgt,
ip.Name,
ipc.Name as Kategori,
isl.RegardingUser
FROM
dbo.InsuranceOrderLine iol
INNER JOIN
dbo.InsuranceProduct ip ON iol.FKInsuranceProductId = ip.InsuranceProductID
INNER JOIN
dbo.InsuranceProductCategory ipc ON ip.FKInsuranceProductCategory = ipc.InsuranceProductCategoryID
INNER JOIN
dbo.InsuranceSalesLead isl ON ???????? -- JOIN condition missing here !!
When you do this, you first of all see right away that you're missing a JOIN condition here - how is this new table InsuranceSalesLead linked to any of the other tables already used in this SQL statement??
And secondly, your intent is much clearer, since the JOIN conditions linking the tables are where they belong - right with the JOIN - and don't clutter up your WHERE clauses ...
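To see how a missing join condition inflates the counts, here is a small sketch in Python's sqlite3 module. The `order_line`, `product` and `sales_lead` tables and their rows are hypothetical, simplified stand-ins for the tables in the question. Because `sales_lead` is listed in FROM with no condition tying it to the other tables, every order line is paired with every lead row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified stand-ins for the insurance tables.
conn.executescript("""
    CREATE TABLE product    (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE order_line (product_id INTEGER);
    CREATE TABLE sales_lead (regarding_user TEXT);
    INSERT INTO product    VALUES (1, 'Car'), (2, 'Home');
    INSERT INTO order_line VALUES (1), (1), (2);
    INSERT INTO sales_lead VALUES ('u1'), ('u1'), ('u2');
""")

# Correct counts: every table is tied to another one by a join condition.
good_rows = conn.execute("""
    SELECT p.name, COUNT(*) AS sold
    FROM order_line ol
    INNER JOIN product p ON ol.product_id = p.id
    GROUP BY p.name
    ORDER BY p.name
""").fetchall()

# Comma-style FROM with no condition for sales_lead: a cartesian product,
# so each order line is counted once per lead row.
bad_rows = conn.execute("""
    SELECT p.name, sl.regarding_user, COUNT(*) AS sold
    FROM order_line ol, product p, sales_lead sl
    WHERE ol.product_id = p.id
    GROUP BY p.name, sl.regarding_user
    ORDER BY p.name, sl.regarding_user
""").fetchall()

print(good_rows)
print(bad_rows)
```

In the second result user u1 is credited with four Car lines even though only two exist, which is the "way high" effect from the question.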
It looks like you added a table join that multiplies the row count - make sure that you are joining the table properly. And be careful with aggregate functions over several joined tables - joins very often lead to duplicates.
I am trying to find out if there's a good way to get all the column names, and values for a particular row, where a part of a condition is met. That is, I want to know which fields within my huge nested AND and OR where condition, met which conditions, and their values.
The catch is I am actually using the Dynamic LINQ API over a datatable and I will have to get it to generate that query, or do something else entirely to essentially check user-defined validation rules on some forms. If anyone has better ideas on how to approach this, I'd appreciate it.
Warning this is ugly as hell - but it may work for you.
Select a.*, b.*, mycond1.matched as cond1, mycond2.matched as cond2, mycond3.matched as cond3
From a
Inner Join b On a.pk = b.pk
… rest of normal query …
-- set of conditions (OUTER APPLY is used so each derived table can reference the outer row) --
Outer Apply (select 1 as matched where mycondition1) mycond1
Outer Apply (select 1 as matched where mycondition2) mycond2
Outer Apply (select 1 as matched where mycondition3) mycond3
-- Relationship between conditions
Where (mycond1.matched is not null or mycond2.matched is not null) and mycond3.matched is not null
The idea is to use the correlated subqueries to return a 1 or a null depending on whether the individual part of the criteria expression is true for the row. Then the logical relationship between the individual criteria expressions is applied in the where clause.
This might be doable if you're generating the SQL rather than maintaining it by hand.
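Here is a minimal runnable sketch of the idea, using Python's sqlite3 module rather than SQL Server or Dynamic LINQ. The `form` table, its columns and the three rules are all invented for illustration. Each rule becomes a correlated scalar subquery that yields 1 or NULL, and the outer query combines the flags, so every returned row also shows which individual rules it met.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical form data and validation rules.
conn.executescript("""
    CREATE TABLE form (id INTEGER PRIMARY KEY, age INTEGER, email TEXT, country TEXT);
    INSERT INTO form VALUES
        (1, 20, 'a@x.org', 'NO'),
        (2, 15, 'b@x.org', 'NO'),
        (3, 15, NULL,      'NO'),
        (4, 30, 'd@x.org', 'SE');
""")

# One flag per rule: 1 when the rule holds for the row, NULL otherwise.
flags_sql = """
    SELECT id,
           (SELECT 1 WHERE age >= 18)         AS cond1,
           (SELECT 1 WHERE email IS NOT NULL) AS cond2,
           (SELECT 1 WHERE country = 'NO')    AS cond3
    FROM form
"""

# Flags for every row, useful for reporting which rule failed.
all_flags = conn.execute(flags_sql + " ORDER BY id").fetchall()

# Relationship between the rules: (cond1 OR cond2) AND cond3.
passing = conn.execute(
    "SELECT * FROM (" + flags_sql + ") "
    "WHERE (cond1 IS NOT NULL OR cond2 IS NOT NULL) AND cond3 IS NOT NULL "
    "ORDER BY id"
).fetchall()

print(all_flags)
print(passing)
```

Generating `flags_sql` from the user-defined rules would be the part to automate; the combining WHERE clause then mirrors the nested AND/OR structure.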