optimization select distinct query - sql

How can I optimize this query,
and how can I rewrite it using EXISTS?
select DISTINCT
p.SBJ_ID,
nvl(l.ATTR,c.CODE) as ATTR,
l.VALUE
from T_TRFPRMLST p,
T_CMSATTR_LINK l,
C_SBJCONCEPT c
where l.SBJ_ID(+) = p.SBJ_ID
and p.sbj_id = c.ID;

First of all, please use ANSI-style join syntax.
Now, coming to your question: as far as I know, NVL can perform worse when working with large data sets.
So how can we achieve the same functionality? We can use DECODE or CASE WHEN.
Of these two, CASE WHEN should be the better choice when it comes to performance.
Compare the execution plan of the query in your question with the execution plan of the following query to see whether it makes a difference for your data.
SELECT DISTINCT
P.SBJ_ID,
CASE
WHEN L.ATTR IS NOT NULL THEN L.ATTR
ELSE C.CODE
END AS ATTR,
L.VALUE
FROM
T_TRFPRMLST P
JOIN C_SBJCONCEPT C ON ( P.SBJ_ID = C.ID )
LEFT JOIN T_CMSATTR_LINK L ON ( P.SBJ_ID = L.SBJ_ID);
Please also make sure that the primary and foreign keys are properly defined and that suitable indexes exist on the join columns, since indexes are what mainly drive performance here.
Cheers!!

You can't use EXISTS here, because you are selecting columns from more than one table.
Also, as I understand it, you should use the standard JOIN keyword when joining tables.
select DISTINCT
p.SBJ_ID,
nvl(l.ATTR,c.CODE) as ATTR,
l.VALUE
from T_TRFPRMLST p
left join T_CMSATTR_LINK l
on l.SBJ_ID = p.SBJ_ID
join C_SBJCONCEPT c
on p.sbj_id = c.ID;
The (+) sits on l.SBJ_ID, so T_CMSATTR_LINK is the optional side and needs a left join. I don't use (+) often, so double-check the direction.
Hope this helps.
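To see why the join direction matters, here is a minimal sketch using Python's sqlite3 module. The three tiny tables and their contents are made up for illustration, and COALESCE stands in for Oracle's NVL, which SQLite does not have. It only shows which rows come back, not how fast.

```python
import sqlite3

# Hypothetical miniature versions of the three tables from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T_TRFPRMLST    (SBJ_ID INTEGER);
    CREATE TABLE C_SBJCONCEPT   (ID INTEGER PRIMARY KEY, CODE TEXT);
    CREATE TABLE T_CMSATTR_LINK (SBJ_ID INTEGER, ATTR TEXT, VALUE TEXT);

    INSERT INTO T_TRFPRMLST    VALUES (1), (2), (2);   -- subject 2 appears twice
    INSERT INTO C_SBJCONCEPT   VALUES (1, 'C1'), (2, 'C2');
    INSERT INTO T_CMSATTR_LINK VALUES (1, 'A1', 'V1'); -- no link row for subject 2
""")

QUERY = """
    SELECT DISTINCT p.SBJ_ID,
           COALESCE(l.ATTR, c.CODE) AS ATTR,   -- stands in for Oracle's NVL
           l.VALUE
    FROM T_TRFPRMLST p
    JOIN C_SBJCONCEPT c ON c.ID = p.SBJ_ID
    {join_type} JOIN T_CMSATTR_LINK l ON l.SBJ_ID = p.SBJ_ID
    ORDER BY p.SBJ_ID
"""

left_rows = conn.execute(QUERY.format(join_type="LEFT")).fetchall()
inner_rows = conn.execute(QUERY.format(join_type="INNER")).fetchall()

print(left_rows)   # subject 2 is kept, with ATTR falling back to c.CODE
print(inner_rows)  # subject 2 is dropped because it has no link row
```

With an inner join the subjects that have no row in T_CMSATTR_LINK disappear, which is not what the original (+) query does.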

Subquery select statement vs inner join

I'm confused about these two statements: which is faster, which is more commonly used, and which is better for memory?
select p.id, p.name, w.id, w.name
from person p
inner join work w on w.id = p.wid
where p.id in (somenumbers)
vs
select p.id, p.name, (select id from work where id=p.wid) , (select name from work where id=p.wid)
from person p
where p.id in (somenumbers)
The idea is this: if I have a huge database and I use an inner join, joining the work and person tables will take memory and cost performance, whereas the subqueries only select one row at a time. So which is best here?
First, the two queries are not the same. The first filters out any person rows that have no matching row in work.
To make the first query equivalent to the second, use a left join:
select p.id, p.name, w.id, w.name
from person p left join
work w
on w.id = p.wid
where p.id in (somenumbers);
Then, the second query can be simplified to:
select p.id, p.name, p.wid,
(select w.name from work w where w.id = p.wid)
from person p
where p.id in (somenumbers);
There is no reason to look up the id in work when it is already present in person.
If you want optimized queries, then you want indexes on person(id, wid, name) and work(id, name).
With these indexes, the two queries should have basically the same performance. The subquery will use the index on work for fetching the rows from work and the where clause will use the index on person. Either query should be fast and scalable.
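As a rough check of the equivalence claim, here is a small sketch with Python's sqlite3 module. The sample rows and the index names are invented for illustration. It compares results only and says nothing about timing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE work   (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, wid INTEGER);
    -- the indexes suggested above (index names are hypothetical)
    CREATE INDEX person_idx ON person (id, wid, name);
    CREATE INDEX work_idx   ON work (id, name);

    INSERT INTO work   VALUES (10, 'bakery');
    INSERT INTO person VALUES (1, 'Ann', 10), (2, 'Bob', NULL);  -- Bob has no work
""")

inner_join = conn.execute("""
    SELECT p.id, p.name, w.name
    FROM person p INNER JOIN work w ON w.id = p.wid
    WHERE p.id IN (1, 2) ORDER BY p.id
""").fetchall()

left_join = conn.execute("""
    SELECT p.id, p.name, w.name
    FROM person p LEFT JOIN work w ON w.id = p.wid
    WHERE p.id IN (1, 2) ORDER BY p.id
""").fetchall()

subquery = conn.execute("""
    SELECT p.id, p.name,
           (SELECT w.name FROM work w WHERE w.id = p.wid)
    FROM person p
    WHERE p.id IN (1, 2) ORDER BY p.id
""").fetchall()

print(inner_join)  # only Ann: the inner join drops people without work
print(left_join)   # Ann and Bob, Bob with NULL for the work name
print(subquery)    # same rows as the left join
```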
The subqueries in your second example will execute once for every row, which will perform badly. That said, some optimizers may be able to convert it to a join for you - YMMV.
A good rule to follow in general is: much prefer joins to subqueries.
Joins generally give better performance than subqueries. You get the best performance when the join is on an integer column or when the join column is indexed.
select p.id, p.name, w.id, w.name
from person p
inner join work w on w.id = p.wid
where p.id in (somenumbers)
It really depends on how you want to optimize the query (including, but not limited to, adding, removing or reordering indexes).
I have found that a setup which makes the join soar may make the subquery suffer, and the opposite may also be true. So there is not much point in comparing them under the same setup.
I choose to use joins and optimize for them. In my experience a join under its best setup rarely loses to a subquery, and it is a lot easier to read.
When a vendor stuffs a huge load of queries with subqueries into the system, it simply isn't worth the effort to change them, given my other query optimization work, unless performance starts to crawl.

Recommended way to write database query

Is there anything wrong with this database query?
select
abstract_author.name,
title,
affiliation_number,
af_name
from
abs_affiliation_name,
abstract_affiliation,
abstracts_item,
abstract_author,
authors_abstract
where
abstracts_item._id = authors_abstract.abstractsitem_id and
abstract_author._id = authors_abstract.abstractauthor_id and
abstract_affiliation._id = abstract_author._id and
abs_affiliation_name._id = abstracts_item._id
I'm getting my expected result, but someone said this is not the recommended way or good practice. Would you please tell me the recommended way to write my query (I mean one that uses joins)?
It's not recommended to do your joins in the where clause. Instead it's better to use explicit JOIN conditions. So your query would be
SELECT
abstract_author.name
, title
, affiliation_number
, af_name
FROM abstracts_item
JOIN authors_abstract ON abstracts_item._id = authors_abstract.abstractsitem_id
JOIN abstract_author ON abstract_author._id = authors_abstract.abstractauthor_id
JOIN abstract_affiliation ON abstract_affiliation._id = abstract_author._id
JOIN abs_affiliation_name ON abs_affiliation_name._id = abstracts_item._id
I'd highly recommend using aliases on your tables, though, as you'll avoid confusion. In this example, if you introduced a title field to one of the other tables, the query would most likely break, as it would not know which table to target. I'd do something like
SELECT
au.name
, af.title
, af.affiliation_number
, af.af_name
FROM abstracts_item ai
JOIN authors_abstract aa ON ai._id = aa.abstractsitem_id
JOIN abstract_author au ON au._id = aa.abstractauthor_id
JOIN abstract_affiliation af ON af._id = au._id
JOIN abs_affiliation_name an ON an._id = ai._id
You'll need to change the aliases in the select list, though, as I've guessed which tables the columns come from.
I recommend that you use joins and aliases, as below. (Each table is joined only after the tables its ON clause refers to.)
select aath.name, /*alias*/title, /*alias*/affiliation_number,/*alias*/af_name
from abs_affiliation_name aan
join abstracts_item ai on aan._id = ai._id
join authors_abstract aAbs on ai._id = aAbs.abstractsitem_id
join abstract_author aath on aath._id = aAbs.abstractauthor_id
join abstract_affiliation aa on aa._id = aath._id
No, there is nothing wrong with your query; it is a matter of personal preference. However, the ANSI-89 implicit joins you have used are over 20 years out of date: they were superseded in ANSI-92 by explicit JOIN syntax.
Aaron Bertrand has written a compelling article on why in most instances it is preferable to use the newer join syntax, and on the potential pitfalls of using ANSI-89 joins. In most cases the execution plan for both methods will be exactly the same (assuming you haven't accidentally cross joined with implicit joins). It is worth noting, though, that on occasion Oracle will produce different execution plans, and the ANSI-89 join syntax can produce the more efficient of the two. (I have seen an example of this posted in response to one of my answers but I can't find it at the moment, so you'll have to take my word for it for now.) I would not, however, use this as a reason to always use ANSI-89 joins. Another key reason to use the ANSI-92 join syntax is that outer joins can be written portably with it, whereas the outer join syntax for implicit joins varies by DBMS.
e.g. on Oracle
SELECT *
FROM a, b
WHERE a.id = b.id(+)
On SQL-Server (deprecated)
SELECT *
FROM a, b
WHERE a.id *= b.id
However, the following works on both:
SELECT *
FROM a
LEFT JOIN b
ON a.id = b.id
If you always use explicit joins you end up with more consistent (and in my opinion more readable) queries.
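The accidental cross join mentioned above is easy to reproduce. Here is a minimal sketch using Python's sqlite3 module, with three made-up tables a, b and c of three rows each:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER);
    CREATE TABLE b (id INTEGER);
    CREATE TABLE c (id INTEGER);
    INSERT INTO a VALUES (1), (2), (3);
    INSERT INTO b VALUES (1), (2), (3);
    INSERT INTO c VALUES (1), (2), (3);
""")

# Implicit joins: the condition linking c was forgotten, yet the query still runs
# and silently pairs every matched (a, b) row with every row of c.
implicit = conn.execute(
    "SELECT COUNT(*) FROM a, b, c WHERE a.id = b.id"
).fetchone()[0]

# Explicit joins: each table sits next to its own ON condition,
# so a missing condition is much easier to spot when reading the query.
explicit = conn.execute("""
    SELECT COUNT(*)
    FROM a
    JOIN b ON b.id = a.id
    JOIN c ON c.id = a.id
""").fetchone()[0]

print(implicit, explicit)  # 9 3
```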

SQL Server : does order of full outer join matter?

I have 4 full outer joins in my query and it's really slow. Does the order of FULL OUTER JOINs make a difference to performance or to the result?
FULL OUTER JOIN = ⋈
Then,
I have a situation : A ⋈ B ⋈ C ⋈ D
All joins are on a key k that is present in all of A, B, C and D.
Then:
Will changing the order of ⋈ joins make a difference to performance ?
Will changing the order of ⋈ change the result ?
I feel that it should not affect the result, but I am not sure whether it will affect performance.
Update:
Will SQL Server automatically rearrange the joins for better performance assuming the result set will be independent of the order ?
No, rearranging the JOIN orders should not affect the performance. MSSQL (as with other DBMS) has a query optimizer whose job it is to find the most efficient query plan for any given query. Generally, these do a pretty good job - so you're unlikely to beat the optimizer easily.
That said, they do get it wrong occasionally. That's where reading an execution plan comes into play. You can add JOIN hints to tell MSSQL how to join your tables (at which point, ordering does matter). You'd generally order from smallest to largest table (though, with a FULL JOIN, it's not likely to matter very much) and follow the rules of thumb for join types.
Since you're doing FULL JOINS, you're basically reading the entirety of 4 tables off disk. That's likely to be very expensive. You may want to re-examine the problem, and see if it can be accomplished in a different way.
Will changing the order of ⋈ change the result ?
No, the order of the FULL JOIN does not matter, the result will be the same. Notice however, that you can't use something like this (the following may give different results depending on the order of joins):
SELECT
COALESCE(a.id, b.id, c.id, d.id) AS id, --- Key columns used in FULL JOIN
a.*, b.*, c.*, d.* --- other columns
FROM a
FULL JOIN b
ON b.id = a.id
FULL JOIN c
ON c.id = a.id
FULL JOIN d
ON d.id = a.id ;
You have to use something like this (no difference in results whatever the order of joins):
SELECT
COALESCE(a.id, b.id, c.id, d.id) AS id,
a.*, b.*, c.*, d.*
FROM a
FULL JOIN b
ON b.id = a.id
FULL JOIN c
ON c.id = COALESCE(a.id, b.id)
FULL JOIN d
ON d.id = COALESCE(a.id, b.id, c.id) ;
Will changing the order of ⋈ joins make a difference to performance?
Taking into consideration that the second and third joins have to be done on the COALESCE() of the columns and not the columns themselves, I think only testing with large enough tables will show if the indexes can be used effectively.
Changing the order of a Full outer join shouldn't affect performance or results. The only thing that will be affected based on order of a Full Outer Join is the default order of the columns produced if using a SELECT *. You may be having performance issues simply from trying to do multiple joins with large tables. If there is no where clause to limit the tables, you could be going through hundreds of thousands of results.
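To illustrate why the later ON conditions above use COALESCE, here is a small pure-Python emulation of a chain of full joins. It is a sketch only: the helper names full_join and coalesce are made up, each table is reduced to a list of unique ids, and it does not run against SQL Server.

```python
def coalesce(*values):
    """Return the first value that is not None, like SQL COALESCE."""
    return next((v for v in values if v is not None), None)


def full_join(rows, columns, name, ids, key):
    """Emulate `rows FULL JOIN <name> ON <name>.id = key(row)`.

    rows    -- list of dicts, one entry per already-joined table
    columns -- names of the already-joined tables
    ids     -- unique ids of the table being joined
    As in SQL, a NULL (None) key never matches anything.
    """
    out, matched = [], set()
    for row in rows:
        k = key(row)
        hit = k if (k is not None and k in ids) else None
        matched.add(hit)
        out.append({**row, name: hit})
    # Rows of the new table that found no partner get NULLs on the left.
    for i in ids:
        if i not in matched:
            out.append({**dict.fromkeys(columns), name: i})
    return out


# Key 1 exists only in a; key 2 exists in b and c but not in a.
a_ids, b_ids, c_ids = [1], [2], [2]
start = [{"a": i} for i in a_ids]
ab = full_join(start, ["a"], "b", b_ids, key=lambda r: r["a"])

# c joined ON c.id = a.id: key 2 ends up on two separate rows.
naive = full_join(ab, ["a", "b"], "c", c_ids, key=lambda r: r["a"])

# c joined ON c.id = COALESCE(a.id, b.id): key 2 is merged into one row.
merged = full_join(ab, ["a", "b"], "c", c_ids,
                   key=lambda r: coalesce(r["a"], r["b"]))

print(len(naive), len(merged))  # 3 2
```

When a key is missing from a, joining only on a.id cannot line up the b and c rows for that key, so they come back as separate rows.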

Formatting Clear and readable SQL queries

I'm writing some SQL queries with several subqueries and lots of joins everywhere, both inside the subquery and the resulting table from the subquery.
We're not using views so that's out of the question.
After writing it, I'm looking at it and scratching my head, wondering what it's even doing, because I can't follow it.
What kind of formatting do you use to make an attempt to clean up such a mess? Indents perhaps?
With large queries I tend to rely a lot on named result sets using WITH. This lets you define each result set beforehand, which makes the main query simpler. Named result sets may also help make the query plan more efficient; for example, PostgreSQL can materialize the result set once and reuse it (before version 12 it always did so).
Example:
WITH
cubed_data AS (
SELECT
dimension1_id,
dimension2_id,
dimension3_id,
measure_id,
SUM(value) value
FROM
source_data
GROUP BY
CUBE(dimension1_id, dimension2_id, dimension3_id),
measure_id
),
dimension1_label AS(
SELECT
dimension1_id,
dimension1_label
FROM
labels
WHERE
object = 'dimension1'
), ...
SELECT
*
FROM
cubed_data
JOIN dimension1_label USING (dimension1_id)
JOIN dimension2_label USING (dimension2_id)
JOIN dimension3_label USING (dimension3_id)
JOIN measure_label USING (measure_id)
The example is a bit contrived, but I hope it shows the increase in clarity compared to inline subqueries. Named result sets have been a great help for me when preparing data for OLAP use. They are also a must if you need or want to write recursive queries.
WITH works at least on current versions of Postgres, Oracle and SQL Server
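For a smaller, runnable comparison, here is a sketch using Python's sqlite3 module (SQLite also supports WITH). The orders table and its rows are invented for illustration. The same aggregate is written once as an inline subquery and once as a named result set.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer TEXT, amount INTEGER);
    INSERT INTO orders VALUES ('ann', 10), ('ann', 30), ('bob', 5);
""")

# Inline subquery: the aggregation is buried inside the FROM clause.
inline = conn.execute("""
    SELECT t.customer, t.total
    FROM (SELECT customer, SUM(amount) AS total
          FROM orders
          GROUP BY customer) t
    WHERE t.total > 20
""").fetchall()

# Named result set: the aggregation gets a name up front,
# and the main query reads like a query on an ordinary table.
named = conn.execute("""
    WITH customer_totals AS (
        SELECT customer, SUM(amount) AS total
        FROM orders
        GROUP BY customer
    )
    SELECT customer, total
    FROM customer_totals
    WHERE total > 20
""").fetchall()

print(inline, named)  # both: [('ann', 40)]
```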
Boy is this a loaded question. :) There are as many ways to do it right as there are smart people on this site. That said, here is how I keep myself sane when building complex sql statements:
select
c.customer_id
,c.customer_name
,o.order_id
,o.order_date
,o.amount_taxable
,od.order_detail_id
,p.product_name
,pt.product_type_name
from
customer c
inner join
order o
on c.customer_id = o.customer_id
inner join
order_detail od
on o.order_id = od.order_id
inner join
product p
on od.product_id = p.product_id
inner join
product_type pt
on p.product_type_id = pt.product_type_id
where
o.order_date between '1/1/2011' and '1/5/2011'
and
(
pt.product_type_name = 'toys'
or
pt.product_type_name like '%kids%'
)
order by
o.order_date
,pt.product_type_name
,p.product_name
If you're interested, I can post/send layouts for inserts, updates and deletes as well as correlated subqueries and complex join predicates.
Does this answer your question?
Generally, people break lines on reserved words, and indent any sub-queries:
SELECT *
FROM tablename
WHERE value in
(SELECT value
FROM tablename2
WHERE condition)
ORDER BY column
In general, I follow a simple hierarchical set of formatting rules. Basically, keywords such as SELECT, FROM, ORDER BY all go on their own line. Each field goes on its own line (in a recursive fashion)
SELECT
F.FIELD1,
F.FIELD2,
F.FIELD3
FROM
FOO F
WHERE
F.FIELD4 IN
(
SELECT
B.BAR
FROM
BAR B
WHERE
B.TYPE = 4
AND B.OTHER = 7
)
Table aliases and simple consistency will get you a long, long way
What looks decent is breaking lines on main keywords SELECT, FROM, WHERE (etc..).
Joins can be trickier, indenting the ON part of joins brings out the important part of it to the front.
Breaking complicated logical expressions (joins and where conditions both) on the same level also helps.
Indenting logically the same level of statement (subqueries, opening brackets, etc)
Capitalize all keywords and standard functions.
Really complex SQL will not shy away from comments - although typically you find these in SQL scripts not dynamic SQL.
EDIT example:
SELECT a.name, SUM(b.tax)
FROM db_prefix_registered_users a
INNER JOIN db_prefix_transactions b
ON a.id = b.user_id
LEFT JOIN db_countries c
ON b.paid_from_country_id = c.id
WHERE a.type IN (1, 2, 7) AND
b.date < (SELECT MAX(date)
FROM audit) AND
c.country = 'CH'
So, at the end to sum it up - consistency matters the most.
I like to use something like:
SELECT col1,
col2,
...
FROM
MyTable as T1
INNER JOIN
MyOtherTable as T2
ON t1.col1 = t2.col1
AND t1.col2 = t2.col2
LEFT JOIN
(
SELECT 1,2,3
FROM Someothertable
WHERE somestuff = someotherstuff
) as T3
ON t1.field = t3.field
The only true and right way to format SQL is:
SELECT t.mycolumn AS column1
,t.othercolumn AS column2
,SUM(t.tweedledum) AS column3
FROM table1 t
,(SELECT u.anothercol
,u.memaw /*this is a comment*/
FROM table2 u
,anothertable x
WHERE u.bla = :b1 /*the bla value*/
AND x.uniquecol = :b2 /*the widget id*/
) v
WHERE t.tweedledee = v.anothercol
AND t.hohum = v.memaw
GROUP BY t.mycolumn
,t.othercolumn
HAVING COUNT(*) > 1
;
;)
Seriously though, I like to use WITH clauses (as already suggested) to tame very complicated SQL queries.
Put it in a view so it's easier to visualize, maybe keep a screenshot as part of the documentation. You don't have to save the view or use it for any other purpose.
Indenting certainly but you can also split the subqueries up with comments, make your alias names something really meaningful and specify which subquery they refer to e.g. innerCustomer, outerCustomer.
Common Table Expressions can really help in some cases to break up a query into meaningful sections.
An age-old question with a thousand opinions and no one right answer, and one of my favorites. Here's my two cents.
With regards to subqueries, lately I've found it easier to follow what's going on with "extreme" indenting and adding comments like so:
SELECT mt.Col1, mt.Col2, subQ.Dollars
from MyTable1 mt
inner join (-- Get the dollar total for each SubCol
select SubCol, sum(Dollars) Dollars
from MyTable2
group by SubCol) subQ
on subQ.SubCol = mt.Col1
order by mt.Col2
As for the other cent, I only use upper case on the first word. With pages of run-on queries, it makes it a bit easier to pick out when a new one starts.
Your mileage will, of course, vary.
Wow, a lot of responses here, but one thing I haven't seen in many of them is COMMENTS! I tend to add a lot of comments throughout, especially with large SQL statements. Formatting is important, but well-placed and meaningful comments are extremely important, not just for you but for the poor soul who needs to maintain the code ;)

How do I remove a nested select from this SQL statement

I have the following SQL:
SELECT * FROM Name
INNER JOIN ( SELECT 2 AS item, NameInAddress.NameID as itemID, NameInAddress.AddressID
FROM NameInAddress
INNER JOIN Address ON Address.AddressID = NameInAddress.AddressID
WHERE (Address.Country != 'UK')
) AS Items ON (Items.itemID = Name.NameID)
I have been asked to remove the nested select and use INNER JOINS instead, as it will improve performance, but I'm struggling.
Using SQL Server 2008
Can anyone help?
Thanks!
Your query is not correct as you're using Items.itemID while it's not in the subselect
I guess this is what you meant:
SELECT Name.*
FROM Name
INNER JOIN NameInAddress
ON Name.NameID = NameInAddress.NameID
INNER JOIN Address
ON Address.AddressID = NameInAddress.AddressID
WHERE (Address.Country != 'UK')
EDIT: The exact translation of your query would start with a SELECT Name.*, 2 as Item, NameInAddress.NameID, NameInAddress.AddressID though
It is one of those long-lived myths that nested selects are slower than joins. It depends completely on what the nested select says. SQL is just a declarative language for saying what you want done; the database will transform it into completely different things. Both MSSQL and Oracle (and I suspect other major engines as well) are perfectly able to transform correlated subqueries and nested views into joins if it is beneficial (unless you do really complex things which would be very hard, if at all possible, to describe with normal joins).
SELECT 2 AS Item, *
FROM Name
INNER JOIN NameInAddress
ON Name.NameID = NameInAddress.NameID
INNER JOIN Address
ON Address.AddressID = NameInAddress.AddressID
WHERE Address.Country != 'UK'
PS: Don't use "*". This will increase performance too. :)
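Before swapping one form for the other, it is worth checking that both return the same rows. Here is a minimal sketch with Python's sqlite3 module rather than SQL Server 2008. The FullName column and the sample rows are made up, and the check says nothing about which form is faster.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Name          (NameID INTEGER PRIMARY KEY, FullName TEXT);
    CREATE TABLE Address       (AddressID INTEGER PRIMARY KEY, Country TEXT);
    CREATE TABLE NameInAddress (NameID INTEGER, AddressID INTEGER);

    INSERT INTO Name          VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO Address       VALUES (100, 'UK'), (200, 'FR');
    INSERT INTO NameInAddress VALUES (1, 100), (2, 200);
""")

# The original shape: a nested select joined to Name.
nested = conn.execute("""
    SELECT Name.NameID, Name.FullName, Items.item, Items.AddressID
    FROM Name
    INNER JOIN (SELECT 2 AS item,
                       NameInAddress.NameID AS itemID,
                       NameInAddress.AddressID AS AddressID
                FROM NameInAddress
                INNER JOIN Address ON Address.AddressID = NameInAddress.AddressID
                WHERE Address.Country != 'UK') AS Items
            ON Items.itemID = Name.NameID
    ORDER BY Name.NameID
""").fetchall()

# The flattened shape: plain inner joins, explicit column list instead of "*".
flat = conn.execute("""
    SELECT Name.NameID, Name.FullName, 2 AS item, NameInAddress.AddressID
    FROM Name
    INNER JOIN NameInAddress ON NameInAddress.NameID = Name.NameID
    INNER JOIN Address ON Address.AddressID = NameInAddress.AddressID
    WHERE Address.Country != 'UK'
    ORDER BY Name.NameID
""").fetchall()

print(nested)  # [(2, 'Bob', 2, 200)]
print(flat)    # same rows
```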