Declare a variable inside of a projection - sql

Note below that I need to declare a variable that holds the result of another query. If I don't do this, I have to repeat that query everywhere I need the value.
SQL Server throws an error saying DECLARE cannot be written inside a SELECT. What can I do, or what am I missing?
SELECT A.StudentId,
(
    CASE WHEN (SELECT B.OverwrittenScore
               FROM dbo.OverwrittenScores AS B
               WHERE B.StudentId = A.StudentId AND B.AssignmentId = @assignmentId) IS NOT NULL
    THEN (SELECT B.OverwrittenScore
          FROM dbo.OverwrittenScores AS B
          WHERE B.StudentId = A.StudentId AND B.AssignmentId = @assignmentId)
    ELSE (SELECT 0) -- ANOTHER QUERY; FOR THE MOMENT: SELECT 0
    END
) AS FinalScore
FROM dbo.Students AS A
Inside the parentheses I need to implement some logic, possibly another two queries.
I wondered whether I could use the BEGIN keyword here, but it didn't work.

You don't need all that craziness. There are a lot of conceptual problems with what you're trying to do.
You can't declare variables in the middle of a query.
Scalar variables can only hold one value.
Scalar variables in SQL Server always begin with @. Cursor names can be plain identifiers, but you definitely don't want a cursor here.
A simple JOIN will do what you're looking for. The subquery method works but is awkward (sticking queries in the SELECT statement), can't pull more than one column value, and can't be reused throughout the query like a JOIN can.
You can use a CASE statement directly on a column. There is no need to try to put the value into a variable first. And that wouldn't work anyway (see #2).
You can use the IsNull or Coalesce functions to turn a NULL into a 0 with simpler syntax.
I encourage you to use aliases that hint at the tables instead of using A and B. For example, S for Students and O for OverwrittenScores.
Taking all those points into consideration, you can do something like this instead:
SELECT
    S.StudentId,
    OverwrittenScore = Coalesce(O.OverwrittenScore, 0)
FROM
    dbo.Students S
    LEFT JOIN dbo.OverwrittenScores O
        ON S.StudentId = O.StudentID
        AND O.AssignmentId = @assignmentId
    LEFT JOIN dbo.SomeOtherTable T -- add another join here if you like
        ON S.StudentID = T.StudentID
        AND O.OverwrittenScore IS NULL
UPDATE
I added another LEFT JOIN for you above. Do you see how it joins on the condition that O.OverwrittenScore IS NULL? This seems to me like it will probably do what you want.
Again, if you provide more detail, I can give you a more complete answer.
Also, for what it's worth, your edit to your post is overcomplicated. If you were going to write your query that way, it would be better like this:
SELECT
    S.StudentId,
    FinalScore =
        Coalesce(
            (SELECT O.OverwrittenScore
             FROM dbo.OverwrittenScores O
             WHERE
                 S.StudentId = O.StudentId
                 AND O.AssignmentId = @assignmentId
            ),
            (SELECT SomethingElse FROM SomewhereElse),
            0
        )
FROM dbo.Students S
I also encourage you when writing correlations or joins to put the other or outer table first in the join (as in S.StudentId = O.StudentId instead of O.StudentId = S.StudentId). I suggest this because it helps you understand the join faster, since you already know the local table and want to know the outer table, and thus your eye doesn't have to scan as far. I also recommend putting multiple conditions on separate lines. I promise you that you will be able to understand your own queries faster in the future if you get in the habit of doing this.
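The LEFT JOIN plus Coalesce approach described above can be tried out in a small, self-contained sketch. This is only an illustration under stated assumptions: it runs on SQLite through Python's sqlite3 module rather than on SQL Server, the sample rows are invented, and the named parameter :assignment_id stands in for @assignmentId.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Students (StudentId INTEGER PRIMARY KEY);
    CREATE TABLE OverwrittenScores (
        StudentId INTEGER, AssignmentId INTEGER, OverwrittenScore INTEGER
    );
    -- Hypothetical sample data: student 1 has an override for assignment 10,
    -- student 3 only has one for a different assignment.
    INSERT INTO Students VALUES (1), (2), (3);
    INSERT INTO OverwrittenScores VALUES (1, 10, 95), (3, 11, 70);
""")

# The assignment filter lives in the ON clause, so students without a
# matching override are kept and fall back to 0 through COALESCE.
rows = conn.execute("""
    SELECT S.StudentId,
           COALESCE(O.OverwrittenScore, 0) AS FinalScore
    FROM Students AS S
    LEFT JOIN OverwrittenScores AS O
           ON S.StudentId = O.StudentId
          AND O.AssignmentId = :assignment_id
    ORDER BY S.StudentId
""", {"assignment_id": 10}).fetchall()

print(rows)
```

Every student appears exactly once, and no subquery has to be repeated to get the fallback value.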

Related

SQL Server Query - Weird Behaviour

Consider below SQL.
SELECT DISTINCT bvc_Order.ID,
bvc_OrderItem.ProductID,
bvc_OrderItem_BundleItem.ProductID
FROM dbo.bvc_OrderItem WITH (nolock)
RIGHT OUTER JOIN dbo.bvc_Order WITH (nolock)
LEFT OUTER JOIN dbo.bvc_User WITH (nolock) ON dbo.bvc_Order.UserID = dbo.bvc_User.ID
LEFT OUTER JOIN dbo.Amazon_Merchants WITH (nolock) ON dbo.bvc_Order.CompanyID = dbo.Amazon_Merchants.ID ON dbo.bvc_OrderItem.OrderID = dbo.bvc_Order.ID
LEFT OUTER JOIN dbo.bvc_OrderItem_BundleItem WITH (nolock) ON dbo.bvc_OrderItem.ID = dbo.bvc_OrderItem_BundleItem.OrderItemID
LEFT OUTER JOIN dbo.bvc_Product WITH (nolock) ON dbo.bvc_OrderItem.ProductID = dbo.bvc_Product.ID
WHERE 1=1
AND (bvc_Order.StatusCode <> 1
AND bvc_Order.StatusCode <> 999)
AND ( bvc_OrderItem.ProductID IN ('28046_00')
OR bvc_OrderItem_BundleItem.ProductID IN ('28046_00'))
AND bvc_Order.OrderSource = 56;
When I execute this query against my database, it returns 85 rows, which is not correct.
If I just remove the part "AND bvc_Order.OrderSource = 56", it returns 5 rows, which is correct.
Strange...
Another thing, if I remove the part
OR bvc_OrderItem_BundleItem.ProductID IN ('28046_00')
it will also return the 5 rows as expected even with bvc_Order.OrderSource filter.
I am not sure why it is adding more rows while I am trying to reduce rows by using filters.
The table bvc_OrderItem_BundleItem doesn't contain any rows for the resulting order IDs or OrderItemIDs.
[edit]
Thanks guys. I tried to remove the LEFT/RIGHT join mix, but the Query manager doesn't allow only LEFT joins; it always adds at least one RIGHT join. I updated the SQL to remove the extra tables, so now there are only three, but the result is the same.
SELECT DISTINCT dbo.bvc_Order.ID, dbo.bvc_OrderItem.ProductID, dbo.bvc_OrderItem_BundleItem.ProductID AS Expr1
FROM dbo.bvc_OrderItem
LEFT OUTER JOIN dbo.bvc_OrderItem_BundleItem ON dbo.bvc_OrderItem.ID = dbo.bvc_OrderItem_BundleItem.OrderItemId
RIGHT OUTER JOIN dbo.bvc_Order ON dbo.bvc_OrderItem.OrderID = dbo.bvc_Order.ID
WHERE 1=1
AND (bvc_Order.StatusCode <> 1 AND bvc_Order.StatusCode <> 999)
AND (
bvc_OrderItem.ProductID IN ('28046_00')
OR bvc_OrderItem_BundleItem.ProductID IN ('28046_00')
)
AND bvc_Order.OrderSource = 56;
[edit] So far, there is no solution for this. I previously pasted a link in my comment with example data output for both the valid and invalid results, with queries. Here it is again:
http://sameers.me/SQLIssue.xlsx
One thing to remember here is that using only LEFT joins is not possible. Let me explain further:
bvc_Order contains the main order record.
bvc_OrderItem contains the order items/products.
bvc_OrderItem_BundleItem contains the child products of the products in the bvc_OrderItem table.
Not every product has child products, so bvc_OrderItem_BundleItem may not have any record (and in the current scenario, there really is no valid row for these orders in bvc_OrderItem_BundleItem).
In short, in the current scenario there is no matching row in the bvc_OrderItem_BundleItem table. If I remove that join for now, everything is fine, but in the real world I can't remove the BundleItem join, of course.
Thank you
When you say
WHERE bvc_Order.OrderSource = 56
that evaluates to UNKNOWN rather than true when bvc_Order.OrderSource is NULL, so the row is filtered out. If the LEFT/RIGHT join found no match, the column will be NULL. This effectively turns the LEFT/RIGHT join into an inner join.
You should probably move the predicate into the ON clause. An alternative approach, which might not deliver the same results, is:
WHERE (bvc_Order.OrderSource IS NULL OR bvc_Order.OrderSource = 56)
The other predicates have the same problem:
Another thing, if I remove the part OR bvc_OrderItem_BundleItem.ProductID IN ('28046_00') it will also return the 5 rows as expected
When the join fails bvc_OrderItem_BundleItem.ProductID is NULL.
I would also recommend writing queries by hand. If I understand you correctly, this query comes from a designer. Its structure is quite confusing. I'm pulling up the most important comment:
Mixing left and right outer joins in a query is just confusing. You should start by rewriting the from clause to only use one type (and I strongly recommend left outer join). – Gordon Linoff
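The general effect described in this answer (a WHERE predicate on the nullable side of an outer join discarding the unmatched rows, while the same predicate in the ON clause keeps them) can be seen in a minimal sketch. It makes no claim to reproduce the asker's data: the two tables are simplified stand-ins with invented rows, and it runs on SQLite via Python's sqlite3 instead of SQL Server.

```python
import sqlite3

# Simplified, hypothetical stand-ins for bvc_OrderItem and bvc_OrderItem_BundleItem.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE OrderItem (ID INTEGER PRIMARY KEY, ProductID TEXT);
    CREATE TABLE BundleItem (OrderItemId INTEGER, ProductID TEXT);
    INSERT INTO OrderItem VALUES (1, 'A'), (2, 'B');
    INSERT INTO BundleItem VALUES (1, 'X');   -- item 2 has no bundle rows
""")

# Predicate in WHERE: for item 2 bi.ProductID is NULL, the comparison is
# UNKNOWN, and the row is dropped, as if this were an INNER JOIN.
in_where = conn.execute("""
    SELECT oi.ID, bi.ProductID
    FROM OrderItem AS oi
    LEFT JOIN BundleItem AS bi ON bi.OrderItemId = oi.ID
    WHERE bi.ProductID = 'X'
    ORDER BY oi.ID
""").fetchall()

# Same predicate in ON: it only decides which bundle rows match,
# so item 2 is preserved with a NULL bundle product.
in_on = conn.execute("""
    SELECT oi.ID, bi.ProductID
    FROM OrderItem AS oi
    LEFT JOIN BundleItem AS bi
           ON bi.OrderItemId = oi.ID
          AND bi.ProductID = 'X'
    ORDER BY oi.ID
""").fetchall()

print(in_where)
print(in_on)
```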
"When you have eliminated the impossible, whatever remains, however improbable, must be the truth." - S.H.
It is impossible that an extra AND condition appended to a WHERE clause can ever result in extra rows. That would imply a database engine defect, which I hope I can assume is "impossible". (If not, then I guess it's back to square one).
That fact makes it easier to concentrate on possible reasons:
When you comment out
AND bvc_Order.OrderSource = 56;
then you also comment out the semicolon terminator. Is it possible that there is text following this query that is affecting it? Try putting a semicolon at the end of the previous line to make sure.
Depending on the tool you are using to run queries, sometimes when a query fails to execute, the tool mistakenly shows an old result set. Make sure your query is executing correctly by adding a dummy column to the SELECT statement to absolutely prove you are seeing live results. Which tool are you using?
When you use a LEFT OUTER JOIN, you get all the rows from the left table (dbo.bvc_OrderItem), and the AND/OR conditions in the WHERE clause are applied afterwards.
The same thing happens with a RIGHT OUTER JOIN.
These joins may not restrict the rows, since all rows come from one table while only some rows come from the other.
First check your join conditions.
Then check your condition:
(bvc_Order.StatusCode <> 1 AND bvc_Order.StatusCode <> 999)
and see whether any rows satisfy it.
Next, check the other condition:
[bvc_OrderItem.ProductID IN ('28046_00')
OR bvc_OrderItem_BundleItem.ProductID IN ('28046_00')]
Then check bvc_Order.OrderSource = 56.
Compare the results of the three queries against the data, and then write your complete query, so that you can see where the mistake is.
A few points to remember:
1. Predicates in the ON clause are applied during the virtual join phases.
2. The WHERE clause is applied after the joined result has been built.
3. A left join followed by a right join is effectively an inner join in some cases.
Let's break your query down step by step.
dbo.bvc_OrderItem a1
LEFT OUTER JOIN
dbo.bvc_OrderItem_BundleItem b1
The output of this join is a single (logical) virtual table that contains all rows from a1, with the matching rows from b1 where they exist and NULLs where they don't.
now below predicates from your and clause will be applied
bvc_OrderItem.ProductID IN ('28046_00')
OR bvc_OrderItem_BundleItem.ProductID IN ('28046_00')
which effectively eliminates all rows from bvc_OrderItem_BundleItem even if they have matches and gives result some thing like below if bvc_OrderItem_BundleItem.ProductID IN ('28046_00') is true
bvc_OrderItem bvc_OrderItem_BundleItem
28046 28046
null 1
null 2
null 3
if this condition(bvc_OrderItem.ProductID IN ('28046_00')) is true,then you are asking sql to ignore all rows in bvc_OrderItem ,which effectively means the same result set as above
bvc_OrderItem bvc_OrderItem_BundleItem other columns
28046 28046
null 1
null 2
null 3
Next you do a right outer join with dbo.bvc_Order, which may fall under point 3 mentioned above.
Assume you get the result set below, which preserves all of the bvc_Order table (rough output only, for understanding, due to the lack of actual data):
bvc_OrderItem bvc_OrderItem_BundleItem statuscode ordersource
28046 28046 999 56
null 1 1 57
null 2 100 58
null 3 11 59
Next, the AND predicates below are applied:
StatusCode <> 1 AND StatusCode <> 999
which means: ignore rows from bvc_Order that have a status of 1 or 999, even if they found matching rows.
Next you ask for bvc_Order.OrderSource = 56, which means: keep only the rows whose order source is 56 and discard the rest.
I hope this clarifies what is happening step by step. A better way forward would be to provide some test data and show the expected output.
You can also control the order in which the joins are evaluated by nesting them in parentheses; you can try the query below to see if it is what you are trying to do.
SELECT DISTINCT dbo.bvc_Order.ID, dbo.bvc_OrderItem.ProductID, dbo.bvc_OrderItem_BundleItem.ProductID AS Expr1
FROM dbo.bvc_OrderItem_BundleItem
RIGHT OUTER JOIN
(
    -- evaluated first: every order, with its items where they exist
    dbo.bvc_OrderItem
    RIGHT OUTER JOIN
    dbo.bvc_Order
    ON dbo.bvc_OrderItem.OrderID = dbo.bvc_Order.ID
)
ON
dbo.bvc_OrderItem.ID = dbo.bvc_OrderItem_BundleItem.OrderItemId
WHERE 1=1
AND (bvc_Order.StatusCode <> 1 AND bvc_Order.StatusCode <> 999)
AND (
bvc_OrderItem.ProductID IN ('28046_00')
OR bvc_OrderItem_BundleItem.ProductID IN ('28046_00')
)
AND bvc_Order.OrderSource = 56;
It looks like you are using the Query Designer. I would avoid using this as this can make your queries extremely confusing. Your queries will be much more concise if you are designing them by hand. If you don't completely understand how inner/outer joins work, a great textbook that I used to teach myself SQL is Murach's SQL Server for Developers.
https://www.murach.com/shop/murach-s-sql-server-2012-for-developers-detail
Now, onto the answer.
I've been thinking about how to resolve your problem, and if you are trying to reduce the result set to 5 rows, why are you using multiple outer joins in the first place? I would consider switching the joins to inner joins instead of outer joins if you are looking for a very specific result set. I can't really provide you with a really comprehensive answer without looking at exactly what results you are trying to achieve, but here's a general idea based on what you've provided to all of us:
SELECT DISTINCT dbo.bvc_Order.ID, dbo.bvc_OrderItem.ProductID, dbo.bvc_OrderItem_BundleItem.ProductID AS 'bvc_OrderItem_BundleItem_ProductID'
FROM dbo.bvc_OrderItem
INNER JOIN dbo.bvc_OrderItem_BundleItem ON dbo.bvc_OrderItem.ID = dbo.bvc_OrderItem_BundleItem.OrderItemId
INNER JOIN dbo.bvc_Order ON dbo.bvc_OrderItem.OrderID = dbo.bvc_Order.ID
Start here and then based upon what you are searching for, add where clauses to filter criteria.
Also, your where clause must be rewritten if you use an inner join instead of an outer join:
WHERE 1=1 --always true and harmless; it only lets every real condition start with AND, so it can be omitted.
AND (bvc_Order.StatusCode <> 1 AND bvc_Order.StatusCode <> 999) --this keeps only the orders whose StatusCode is neither 1 nor 999.
--If you actually want the orders with status code 1 or 999, use this instead:
--AND (bvc_Order.StatusCode = 1 OR bvc_Order.StatusCode = 999)
AND (bvc_OrderItem.ProductID = '28046_00' AND bvc_OrderItem_BundleItem.ProductID = '28046_00') --use this if the value must be present in both tables; change AND to OR to match either table.
--if you are trying to see if the order source is 56, use this.
AND bvc_Order.OrderSource = 56;
If you are trying to find out rows that are not included in this result set, then I would use OUTER JOIN as necessary (LEFT preferred). Without more information about what you're looking for in your database, that's the best all of us can do.
Like @usr wrote, the reason for this unexpected (to you) result is that you build the query with outer joins and filter the rows after the join. If you need to filter rows of outer-joined tables, you should do it before the join.
But you are probably trying to build this:
SELECT DISTINCT o.ID, oi.ProductID, bi.ProductID AS Expr1
FROM dbo.bvc_Order as o
LEFT JOIN dbo.bvc_OrderItem as oi on oi.OrderID = o.ID
LEFT JOIN dbo.bvc_OrderItem_BundleItem as bi ON oi.ID = bi.OrderItemId
WHERE 1=1
AND o.OrderSource = 56
AND o.StatusCode not in (1, 999)
AND '28046_00' in (oi.ProductID, isnull(bi.ProductID,'_') );
Does this query give the results you need?
If not, try changing the last condition, for example:
and (bi.ProductID = '28046_00' or bi.ProductID is null and oi.ProductID = '28046_00')
You can also put an additional condition into the join conditions, for example:
SELECT DISTINCT o.ID, oi.ProductID, bi.ProductID AS Expr1
FROM dbo.bvc_Order as o
LEFT JOIN dbo.bvc_OrderItem as oi on oi.OrderID = o.ID
LEFT JOIN dbo.bvc_OrderItem_BundleItem as bi ON oi.ID = bi.OrderItemId
and bi.ProductID in ('28046_00') --this joins BundleItem only if ...
WHERE 1=1
AND o.OrderSource = 56
AND o.StatusCode not in (1, 999)
AND (oi.ProductID in ('28046_00') or bi.ProductID is not null);
Ah, and if you always need to join bvc_Order with bvc_OrderItem, then use an inner join.

Check the query efficiency

I have the SQL query below and would like an opinion on whether I can improve it using temp tables or something else, or whether it is good enough as is. Basically, I am just feeding the result set from each inner query to the outer one.
SELECT S.SolutionID
    ,S.SolutionName
    ,S.Enabled
FROM dbo.Solution S
WHERE s.SolutionID IN (
    SELECT DISTINCT sf.SolutionID
    FROM dbo.SolutionToFeature sf
    WHERE sf.SolutionToFeatureID IN (
        SELECT sfg.SolutionToFeatureID
        FROM dbo.SolutionFeatureToUsergroup SFG
        WHERE sfg.UsergroupID IN (
            SELECT UG.UsergroupID
            FROM dbo.Usergroup UG
            WHERE ug.SiteID = @SiteID
        )
    )
)
It's going to depend largely on the indexes you have on those tables. Since you are only selecting data out of the Solution table, you can put everything else in an exists clause, do some proper joins, and it should perform better.
The exists clause will allow you to remove the distinct you have on the SolutionToFeature table. Distinct will cause a performance hit because it is basically creating a temp table behind the scenes to do the comparison on whether or not the record is unique against the rest of the result set. You take a pretty big hit as your tables grow.
It will look something similar to what I have below, but without sample data or anything I can't tell if it's exactly right.
Select S.SolutionID, S.SolutionName, S.Enabled
From dbo.Solution S
Where Exists (
    select 1
    from dbo.SolutionToFeature sf
    Inner Join dbo.SolutionFeatureToUsergroup SFG on sf.SolutionToFeatureID = SFG.SolutionToFeatureID
    Inner Join dbo.Usergroup UG on sfg.UsergroupID = UG.UsergroupID
    Where S.SolutionID = sf.SolutionID
    and UG.SiteID = @SiteID
)
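To check that the EXISTS rewrite returns the same rows as the nested IN version, here is a small sketch. It is an illustration only: it uses SQLite through Python's sqlite3 rather than SQL Server, the table layouts are reduced to the columns the queries touch, the rows are invented, and :site_id stands in for @SiteID. It says nothing about relative performance, which depends on indexes and the optimizer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Solution (SolutionID INTEGER PRIMARY KEY, SolutionName TEXT);
    CREATE TABLE SolutionToFeature (SolutionToFeatureID INTEGER PRIMARY KEY, SolutionID INTEGER);
    CREATE TABLE SolutionFeatureToUsergroup (SolutionToFeatureID INTEGER, UsergroupID INTEGER);
    CREATE TABLE Usergroup (UsergroupID INTEGER PRIMARY KEY, SiteID INTEGER);
    -- Hypothetical data: solution 1 reaches site 7 through two features.
    INSERT INTO Solution VALUES (1, 'Alpha'), (2, 'Beta'), (3, 'Gamma');
    INSERT INTO SolutionToFeature VALUES (10, 1), (11, 1), (20, 2), (30, 3);
    INSERT INTO SolutionFeatureToUsergroup VALUES (10, 100), (11, 100), (20, 200);
    INSERT INTO Usergroup VALUES (100, 7), (200, 8);
""")
params = {"site_id": 7}

nested_in = conn.execute("""
    SELECT S.SolutionID, S.SolutionName
    FROM Solution AS S
    WHERE S.SolutionID IN (
        SELECT sf.SolutionID FROM SolutionToFeature AS sf
        WHERE sf.SolutionToFeatureID IN (
            SELECT sfg.SolutionToFeatureID FROM SolutionFeatureToUsergroup AS sfg
            WHERE sfg.UsergroupID IN (
                SELECT ug.UsergroupID FROM Usergroup AS ug WHERE ug.SiteID = :site_id)))
    ORDER BY S.SolutionID
""", params).fetchall()

with_exists = conn.execute("""
    SELECT S.SolutionID, S.SolutionName
    FROM Solution AS S
    WHERE EXISTS (
        SELECT 1
        FROM SolutionToFeature AS sf
        JOIN SolutionFeatureToUsergroup AS sfg ON sfg.SolutionToFeatureID = sf.SolutionToFeatureID
        JOIN Usergroup AS ug ON ug.UsergroupID = sfg.UsergroupID
        WHERE sf.SolutionID = S.SolutionID AND ug.SiteID = :site_id)
    ORDER BY S.SolutionID
""", params).fetchall()

# For contrast: joining in the outer query repeats a solution once per matching feature.
plain_join = conn.execute("""
    SELECT S.SolutionID
    FROM Solution AS S
    JOIN SolutionToFeature AS sf ON sf.SolutionID = S.SolutionID
    JOIN SolutionFeatureToUsergroup AS sfg ON sfg.SolutionToFeatureID = sf.SolutionToFeatureID
    JOIN Usergroup AS ug ON ug.UsergroupID = sfg.UsergroupID
    WHERE ug.SiteID = :site_id
""", params).fetchall()

print(nested_in, with_exists, plain_join)
```

Because EXISTS only asks whether at least one matching row exists, it never multiplies the outer rows, which is why no DISTINCT is needed there.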

multiple sql joins not producing desired results

I'm new to sql and trying to tweak someone else's huge stored procedure to get a subset of the results. The code below is maybe 10% of the whole procedure. I added the lp.posting_date, last left join, and the where clause. Trying to get records where the posting date is between the start date and the end date. Am I doing this right? Apparently not because the results are unaffected by the change. UPDATE: I CHANGED THE LAST JOIN. The results are correct if there's only one area allocation term. If there is more than one area allocation term, the results are duplicated for each term.
SELECT Distinct
l.lease_id ,
l.property_id as property_id,
l.lease_number as LeaseNumber,
l.name as LeaseName,
lty.name as LeaseType,
lst.name as LeaseStatus,
l.possession_date as PossessionDate,
l.rent as RentCommencementDate,
l.store_open_date as StoreOpenDate,
msr.description as MeasureUnit,
l.comments as Comments ,
lat.start_date as atStartDate,
lat.end_date as atEndDate,
lat.rentable_area as Rentable,
lat.usable_area as Usable,
laat.start_date as aatStartDate,
laat.end_date as aatEndDate,
MK.Path as OrgPath,
CAST(laa.percentage as numeric(9,2)) as Percentage,
laa.rentable_area as aaRentable,
laa.usable_area as aaUsable,
laa.headcounts as Headcount,
laa.area_allocation_term_id,
lat.area_term_id,
laa.area_allocation_id,
lp.posting_date
INTO #LEASES FROM la_tbl_lease l
INNER JOIN #LEASEID on l.lease_id=#LEASEID.lease_id
INNER JOIN la_tbl_lease_term lt on lt.lease_id=l.lease_id and lt.IsDeleted=0
LEFT JOIN la_tlu_lease_type lty on lty.lease_type_id=l.lease_type_id and lty.IsDeleted=0
LEFT JOIN la_tlu_lease_status lst on lst.status_id= l.status_id
LEFT JOIN la_tbl_area_group lag on lag.lease_id=l.lease_id
LEFT JOIN fnd_tlu_unit_measure msr on msr.unit_measure_key=lag.unit_measure_key
LEFT JOIN la_tbl_area_term lat on lat.lease_id=l.lease_id and lat.isDeleted=0
LEFT JOIN la_tbl_area_allocat_term laat on laat.area_term_id=lat.area_term_id and laat.isDeleted=0
LEFT JOIN dbo.la_tbl_area_allocation laa on laa.area_allocation_term_id=laat.area_allocation_term_id and laa.isDeleted=0
LEFT JOIN vw_FND_TLU_Menu_Key MK on menu_type_id_key=2 and isActive=1 and id=laa.menu_id_key
INNER JOIN la_tbl_lease_projection lp on lp.lease_projection_id = #LEASEID.lease_projection_id
where lp.posting_date <= laat.end_date and lp.posting_date >= laat.start_date
As may have already been hinted at, you should be careful when using the WHERE clause with an OUTER JOIN.
The idea of the OUTER JOIN is to optionally join that table and provide access to its columns.
The JOINs generate your set, and then the WHERE clause runs to restrict it. If a condition in the WHERE clause says that one of the columns in your outer-joined table must exist or equal a value, then you are effectively no longer doing a LEFT JOIN, since you only retrieve rows where that join found a match.
Shorten it and copy it out as a new query in ssms or whatever you are using for testing. Use an inner join unless you want to preserve the left side set even when there is no matching lp.lease_id. Try something like
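if object_id('tempdb..#leases') is not null
    drop table #leases;
select distinct
    l.lease_id
    ,l.property_id as property_id
    ,lp.posting_date
into #leases
from la_tbl_lease as l
inner join la_tbl_lease_projection as lp on lp.lease_id = l.lease_id
-- laat has to be joined before the where clause can reference it
inner join la_tbl_area_term as lat on lat.lease_id = l.lease_id and lat.isDeleted = 0
inner join la_tbl_area_allocat_term as laat on laat.area_term_id = lat.area_term_id and laat.isDeleted = 0
where lp.posting_date <= laat.end_date and lp.posting_date >= laat.start_date;
select * from #leases
drop table #leases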
If this gets what you want then you can work from there and add the other left joins to the query (getting rid of the select * and 'drop table' if you copy it back into your proc). If it doesn't then look at your Boolean date logic or provide more detail for us. If you are new to sql and its procedural extensions, try using the object explorer to examine the properties of the columns you are querying, and try selecting the top 1000 * from the tables you are using to get a feel for what the data looks like when building queries. -Mike
You can try the BETWEEN operator as well
Where lp.posting_date BETWEEN laat.start_date AND laat.end_date
Reasoning: You can have issues where there are no matching values in a table. In that case, a left join fills that table's columns with NULL. BETWEEN is shorthand for the same pair of >= and <= comparisons, so it returns the same rows as the original WHERE clause: only rows whose posting date falls inside the range, with the NULL rows excluded. Its advantage is readability.
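How BETWEEN and the explicit pair of comparisons treat unmatched (NULL) rows from a left join can be checked with a short sketch. This is an illustration under assumptions: it runs on SQLite through Python's sqlite3, the tables projection and alloc_term are hypothetical and much smaller than the ones in the stored procedure, and dates are stored as ISO text so they compare correctly as strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE projection (lease_id INTEGER, posting_date TEXT);
    CREATE TABLE alloc_term (lease_id INTEGER, start_date TEXT, end_date TEXT);
    INSERT INTO projection VALUES (1, '2020-06-15'), (2, '2020-06-15'), (3, '2020-06-15');
    -- Lease 1: posting date inside the term. Lease 2: outside. Lease 3: no term at all.
    INSERT INTO alloc_term VALUES (1, '2020-01-01', '2020-12-31'),
                                  (2, '2021-01-01', '2021-12-31');
""")

BASE = """
    SELECT p.lease_id
    FROM projection AS p
    LEFT JOIN alloc_term AS t ON t.lease_id = p.lease_id
    WHERE {predicate}
    ORDER BY p.lease_id
"""

with_between = conn.execute(BASE.format(
    predicate="p.posting_date BETWEEN t.start_date AND t.end_date")).fetchall()
with_compare = conn.execute(BASE.format(
    predicate="p.posting_date >= t.start_date AND p.posting_date <= t.end_date")).fetchall()

# Lease 3 has NULL start/end dates after the left join; both predicates
# evaluate to UNKNOWN for it, so both forms drop that row.
print(with_between, with_compare)
```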
As it turns out, the problem was easier to solve and it was in a different place in the stored procedure. All I had to do was add one line to one of the cursors to include area term allocations by date.

INNER JOIN should happen only when a condition is fulfilled

I am trying to "short-circuit" an INNER JOIN, if condition isn't met.
What I tried:
I found that if a Left Join is preceded with a False clause on "ON" condition, the LEFT JOIN fails. Hence, I tried to simulate INNER JOIN with LEFT OUTER JOIN and WHERE clause got the execution plan as below:
DECLARE @a nvarchar(4) = 'All'
SELECT A.*
FROM [dbo].[your_table] A
LEFT JOIN [dbo].[your_table_2] B
    ON @a <> 'All'
WHERE A.City_Code = CASE WHEN @a <> 'All'
                         THEN B.City_Code
                         ELSE A.City_Code END
This will "short-circuit" the left join and it will never occur. Execution plan is below:
But then when I tried to execute the same statement by declaring the variable as 'Al' and not 'All', I saw execution plan was still the same.
I am puzzled if Join happened in the initial step or not?
What I want:
I want to know whether the above approach is correct? Is it really short-circuiting the INNER JOIN?
I basically want the INNER JOIN to happen between the two tables only when variable is not 'All' else it should not JOIN and continue further. I have already tried by using "OR" (to short-circuit) and "IN" (to apply filter) but the performance slows down if you have too many items in IN clause.
Please help me and do tell me if I was wrong anywhere in my approach.
Sample Data:
I should get the INNER JOIN result only when variable <> 'all'
When variable = 'All', I should get table A i.e.
Note: I have simplified this query hence it appears that simple if statement can do. In actual I have 53 parameters that I need to check and run JOINS. Plus result set of one query must be joined with another i.e. I have several other JOIN conditions preceding this :)
Have you tried something like
SELECT A.*
FROM [dbo].[your_table] A
WHERE EXISTS (
    SELECT 1
    FROM [dbo].[your_table_2] B
    WHERE A.City_Code = B.City_Code
)
OR @a = 'All'
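The EXISTS ... OR pattern above can be exercised with both parameter values in a small sketch. The assumptions: SQLite via Python's sqlite3 instead of SQL Server, invented city codes, and a named parameter :a in place of the @a variable. It shows the result sets only; whether the engine actually skips the lookup for 'All' is a question for the execution plan, which this sketch does not inspect.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE your_table (City_Code TEXT);
    CREATE TABLE your_table_2 (City_Code TEXT);
    -- Hypothetical sample data.
    INSERT INTO your_table VALUES ('LAX'), ('NYC'), ('SFO');
    INSERT INTO your_table_2 VALUES ('NYC');
""")

QUERY = """
    SELECT A.City_Code
    FROM your_table AS A
    WHERE :a = 'All'
       OR EXISTS (SELECT 1 FROM your_table_2 AS B WHERE B.City_Code = A.City_Code)
    ORDER BY A.City_Code
"""

def city_codes(a):
    """Run the query with the given parameter value and return the city codes."""
    return [row[0] for row in conn.execute(QUERY, {"a": a})]

all_rows = city_codes("All")   # parameter is 'All': every row of your_table
filtered = city_codes("Al")    # anything else: only rows that also exist in your_table_2

print(all_rows, filtered)
```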
Did you try a CASE expression? CASE (or DECODE in Oracle) works much like an IF-ELSE statement in SQL.

Formatting Clear and readable SQL queries

I'm writing some SQL queries with several subqueries and lots of joins everywhere, both inside the subquery and the resulting table from the subquery.
We're not using views so that's out of the question.
After writing it I'm looking at it and scratching my head wondering what it's even doing cause I can't follow it.
What kind of formatting do you use to make an attempt to clean up such a mess? Indents perhaps?
With large queries I tend to rely a lot on named result sets using WITH. This allows you to define the result set beforehand, and it makes the main query simpler. Named result sets may help to make the query plan more efficient as well, e.g. Postgres can store the result set in a temporary table.
Example:
WITH
  cubed_data AS (
     SELECT
        dimension1_id,
        dimension2_id,
        dimension3_id,
        measure_id,
        SUM(value) value
     FROM
        source_data
     GROUP BY
        CUBE(dimension1_id, dimension2_id, dimension3_id),
        measure_id
  ),
  dimension1_label AS(
     SELECT
        dimension1_id,
        dimension1_label
     FROM
        labels
     WHERE
        object = 'dimension1'
  ), ...
SELECT
  *
FROM
  cubed_data
  JOIN dimension1_label USING (dimension1_id)
  JOIN dimension2_label USING (dimension2_id)
  JOIN dimension3_label USING (dimension3_id)
  JOIN measure_label USING (measure_id)
The example is a bit contrived, but I hope it shows the increase in clarity compared to inline subqueries. Named result sets have been a great help for me when I've been preparing data for OLAP use. Named result sets are also a must if you have to (or want to) create recursive queries.
WITH works at least on current versions of Postgres, Oracle and SQL Server
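For readers who want to run a named result set end to end, here is a deliberately tiny sketch. It is not the cube example above: the customers and orders tables and their rows are hypothetical, and it runs on SQLite (which also supports WITH) through Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (customer_id INTEGER, amount INTEGER);
    -- Hypothetical sample data.
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO orders VALUES (1, 10), (1, 5), (2, 7);
""")

# The aggregation gets a name up front, so the main query reads as a
# plain join instead of carrying an inline subquery.
totals = conn.execute("""
    WITH order_totals AS (
        SELECT customer_id,
               SUM(amount) AS total
        FROM orders
        GROUP BY customer_id
    )
    SELECT c.name, t.total
    FROM customers AS c
    JOIN order_totals AS t ON t.customer_id = c.customer_id
    ORDER BY c.name
""").fetchall()

print(totals)
```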
Boy is this a loaded question. :) There are as many ways to do it right as there are smart people on this site. That said, here is how I keep myself sane when building complex sql statements:
select
    c.customer_id
    ,c.customer_name
    ,o.order_id
    ,o.order_date
    ,o.amount_taxable
    ,od.order_detail_id
    ,p.product_name
    ,pt.product_type_name
from
    customer c
inner join
    order o
    on c.customer_id = o.customer_id
inner join
    order_detail od
    on o.order_id = od.order_id
inner join
    product p
    on od.product_id = p.product_id
inner join
    product_type pt
    on p.product_type_id = pt.product_type_id
where
    o.order_date between '1/1/2011' and '1/5/2011'
    and
    (
        pt.product_type_name = 'toys'
        or
        pt.product_type_name like '%kids%'
    )
order by
    o.order_date
    ,pt.product_type_name
    ,p.product_name
If you're interested, I can post/send layouts for inserts, updates and deletes as well as correlated subqueries and complex join predicates.
Does this answer your question?
Generally, people break lines on reserved words, and indent any sub-queries:
SELECT *
FROM tablename
WHERE value in
   (SELECT *
   FROM tablename2
   WHERE condition)
ORDER BY column
In general, I follow a simple hierarchical set of formatting rules. Basically, keywords such as SELECT, FROM, ORDER BY all go on their own line. Each field goes on its own line (in a recursive fashion)
SELECT
    F.FIELD1,
    F.FIELD2,
    F.FIELD3
FROM
    FOO F
WHERE
    F.FIELD4 IN
    (
        SELECT
            B.BAR
        FROM
            BAR B
        WHERE
            B.TYPE = 4
            AND B.OTHER = 7
    )
Table aliases and simple consistency will get you a long, long way
What looks decent is breaking lines on main keywords SELECT, FROM, WHERE (etc..).
Joins can be trickier, indenting the ON part of joins brings out the important part of it to the front.
Breaking complicated logical expressions (joins and where conditions both) on the same level also helps.
Indenting logically the same level of statement (subqueries, opening brackets, etc)
Capitalize all keywords and standard functions.
Really complex SQL will not shy away from comments - although typically you find these in SQL scripts not dynamic SQL.
EDIT example:
SELECT a.name, SUM(b.tax)
FROM   db_prefix_registered_users a
       INNER JOIN db_prefix_transactions b
           ON a.id = b.user_id
       LEFT JOIN db_countries c
           ON b.paid_from_country_id = c.id
WHERE  a.type IN (1, 2, 7) AND
       b.date < (SELECT MAX(date)
                 FROM audit) AND
       c.country = 'CH'
So, at the end to sum it up - consistency matters the most.
I like to use something like:
SELECT col1,
       col2,
       ...
FROM
    MyTable as T1
INNER JOIN
    MyOtherTable as T2
        ON t1.col1 = t2.col1
        AND t1.col2 = t2.col2
LEFT JOIN
    (
        SELECT 1,2,3
        FROM Someothertable
        WHERE somestuff = someotherstuff
    ) as T3
    ON t1.field = t3.field
The only true and right way to format SQL is:
SELECT   t.mycolumn        AS column1
        ,t.othercolumn     AS column2
        ,SUM(t.tweedledum) AS column3
FROM     table1 t
        ,(SELECT u.anothercol
                ,u.memaw              /*this is a comment*/
          FROM   table2       u
                ,anothertable x
          WHERE  u.bla       = :b1    /*the bla value*/
          AND    x.uniquecol = :b2    /*the widget id*/
         ) v
WHERE    t.tweedledee = v.anothercol
AND      t.hohum      = v.memaw
GROUP BY t.mycolumn
        ,t.othercolumn
HAVING   COUNT(*) > 1
;
;)
Seriously though, I like to use WITH clauses (as already suggested) to tame very complicated SQL queries.
Put it in a view so it's easier to visualize, maybe keep a screenshot as part of the documentation. You don't have to save the view or use it for any other purpose.
Indenting certainly but you can also split the subqueries up with comments, make your alias names something really meaningful and specify which subquery they refer to e.g. innerCustomer, outerCustomer.
Common Table Expressions can really help in some cases to break up a query into meaningful sections.
An age-old question with a thousand opinions and no one right answer, and one of my favorites. Here's my two cents.
With regards to subqueries, lately I've found it easier to follow what's going on with "extreme" indenting and adding comments like so:
SELECT mt.Col1, mt.Col2, subQ.Dollars
 from MyTable1 mt
  inner join (-- Get the dollar total for each SubCol
              select SubCol, sum(Dollars) Dollars
               from MyTable2
               group by SubCol) subQ
   on subQ.SubCol = mt.Col1
 order by mt.Col2
As for the other cent, I only use upper case on the first word. With pages of run-on queries, it makes it a bit easier to pick out when a new one starts.
Your mileage will, of course, vary.
Wow, a lot of responses here, but one thing I haven't seen in many of them is COMMENTS! I tend to add a lot of comments throughout, especially with large SQL statements. Formatting is important, but well-placed and meaningful comments are extremely important, not just for you but for the poor soul who needs to maintain the code ;)