SQL joins instead of nested query

select baseurl from tmp_page_tbl
where baseurl NOT IN ( select baseurl from page_lookup )
How do I write this query using joins instead of nesting it?
The idea is to get the baseurls from tmp_page_tbl that do not exist in the page_lookup table.

select t.baseurl
from tmp_page_tbl t
left outer join page_lookup p on t.baseurl = p.baseurl
where p.baseurl IS NULL
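The following is a minimal sketch that checks whether the two forms return the same rows. It uses Python's built-in sqlite3 module as a stand-in for your database, and the URLs are hypothetical. The selected column is written as t.baseurl because both tables have a baseurl column. SQLite, like most engines, rejects an unqualified baseurl here as ambiguous.

```python
import sqlite3

# In-memory stand-in database; the rows are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tmp_page_tbl (baseurl TEXT);
    CREATE TABLE page_lookup (baseurl TEXT);
    INSERT INTO tmp_page_tbl VALUES ('a.com'), ('b.com'), ('c.com');
    INSERT INTO page_lookup VALUES ('b.com');
""")

# Original form: NOT IN with a nested query.
not_in = conn.execute("""
    SELECT baseurl FROM tmp_page_tbl
    WHERE baseurl NOT IN (SELECT baseurl FROM page_lookup)
    ORDER BY baseurl
""").fetchall()

# Join form: keep the tmp rows for which the LEFT JOIN found no partner.
left_join = conn.execute("""
    SELECT t.baseurl
    FROM tmp_page_tbl t
    LEFT OUTER JOIN page_lookup p ON t.baseurl = p.baseurl
    WHERE p.baseurl IS NULL
    ORDER BY t.baseurl
""").fetchall()

print(not_in)     # [('a.com',), ('c.com',)]
print(left_join)  # [('a.com',), ('c.com',)]
```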

You could rewrite it using a join like this:
SELECT t.baseurl from tmp_page_tbl as t
LEFT JOIN page_lookup as pl
ON t.baseurl=pl.baseurl
where pl.baseurl IS NULL
I'm not sure I would, though, unless you have a compelling reason. Here are a few links worth looking at:
http://explainextended.com/2009/09/15/not-in-vs-not-exists-vs-left-join-is-null-sql-server/
http://sqlinthewild.co.za/index.php/2010/03/23/left-outer-join-vs-not-exists/

If you aren't selecting most of the table and you have an index on page_lookup.baseurl, then NOT EXISTS should typically be the most efficient option.
select baseurl from tmp_page_tbl tmp
where not exists ( select 1 from page_lookup pl WHERE pl.baseurl = tmp.baseurl );
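Check one difference before choosing NOT IN. If page_lookup.baseurl can contain NULL, NOT IN returns no rows at all, because the comparison against the NULL yields UNKNOWN rather than TRUE. NOT EXISTS is unaffected. The sketch below shows this behaviour, again using Python's sqlite3 as a stand-in engine with hypothetical data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tmp_page_tbl (baseurl TEXT);
    CREATE TABLE page_lookup (baseurl TEXT);
    INSERT INTO tmp_page_tbl VALUES ('a.com'), ('b.com');
    -- Hypothetical data: a single NULL in the lookup column.
    INSERT INTO page_lookup VALUES ('b.com'), (NULL);
""")

# NOT EXISTS only asks whether a matching row exists, so the NULL is harmless.
not_exists = conn.execute("""
    SELECT baseurl FROM tmp_page_tbl tmp
    WHERE NOT EXISTS (
        SELECT 1 FROM page_lookup pl WHERE pl.baseurl = tmp.baseurl
    )
""").fetchall()

# 'a.com' NOT IN ('b.com', NULL) evaluates to UNKNOWN, so no row qualifies.
not_in = conn.execute("""
    SELECT baseurl FROM tmp_page_tbl
    WHERE baseurl NOT IN (SELECT baseurl FROM page_lookup)
""").fetchall()

print(not_exists)  # [('a.com',)]
print(not_in)      # []
```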
Some RDBMSs also provide a set-difference operator: MINUS in Oracle Database and EXCEPT in PostgreSQL. In some cases it is very efficient, but both operators also remove duplicate rows from the result.
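EXCEPT and MINUS are set operators, so they remove duplicate rows, which NOT IN and NOT EXISTS do not. The small sketch below shows the difference. It uses Python's sqlite3, where the operator is spelled EXCEPT as in PostgreSQL, and hypothetical rows in which 'a.com' appears twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tmp_page_tbl (baseurl TEXT);
    CREATE TABLE page_lookup (baseurl TEXT);
    -- Hypothetical data: 'a.com' appears twice.
    INSERT INTO tmp_page_tbl VALUES ('a.com'), ('a.com'), ('b.com');
    INSERT INTO page_lookup VALUES ('b.com');
""")

# Set difference: duplicates are collapsed.
except_rows = conn.execute("""
    SELECT baseurl FROM tmp_page_tbl
    EXCEPT
    SELECT baseurl FROM page_lookup
""").fetchall()

# Anti-join: every unmatched row is kept, duplicates included.
not_exists_rows = conn.execute("""
    SELECT baseurl FROM tmp_page_tbl t
    WHERE NOT EXISTS (SELECT 1 FROM page_lookup p WHERE p.baseurl = t.baseurl)
""").fetchall()

print(except_rows)      # [('a.com',)]
print(not_exists_rows)  # [('a.com',), ('a.com',)]
```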

Related

using an alias of a function in a sql condition

I have something like this:
SELECT
cansa1.NAME,
mod(cansa1.PRODUCT_ID, 1000000) prodIdHash
FROM CANSA_TABLE cansa1
INNER JOIN CUSER_TABLE cuser1 ON cansa1.PRODUCT_ID = cuser1.PRODUCT_ID
AND mod(cansa1.PRODUCT_ID, 1000000) = cuser1.PRODUCT_HASH
This query works, but I want to replace the second occurrence of the mod() function (in the inner join) to avoid executing it twice. I tried replacing it with the alias from the SELECT clause, but that does not work. Any idea how I can make this query not repeat the mod() function?
Sorry for my English.
Don't worry about executing it twice. The SQL engine optimizes the query and decides whether to cache the function value or compute it twice. It may also rewrite the query so that what actually executes has a different structure from what you wrote, because it has determined that this would be more efficient.
If you really want to rewrite it, then:
SELECT c.NAME,
c.prodIdHash
FROM (
SELECT name,
PRODUCT_ID,
mod(PRODUCT_ID, 1000000) As prodIdHash
FROM CANSA_TABLE
) c
INNER JOIN CUSER_TABLE u
ON ( c.PRODUCT_ID = u.PRODUCT_ID
AND c.prodIdHash = u.PRODUCT_HASH )
However, the SQL engine may rewrite the query and push the function out to the outer scope. You may therefore need a seemingly irrelevant filter condition to materialize the inner query and stop the calculation from being rewritten. In Oracle, a ROWNUM condition inside the inline view does this:
SELECT c.NAME,
c.prodIdHash
FROM (
SELECT name,
PRODUCT_ID,
mod(PRODUCT_ID, 1000000) As prodIdHash
FROM CANSA_TABLE
WHERE ROWNUM > 0
) c
INNER JOIN CUSER_TABLE u
ON ( c.PRODUCT_ID = u.PRODUCT_ID
AND c.prodIdHash = u.PRODUCT_HASH )
However, this really looks like premature optimisation. Check whether there is actually a problem before you apply an optimisation that is probably not needed.
You can use a derived table (i.e. a subquery in the FROM clause):
SELECT dt.NAME, dt.prodIdHash
FROM
(SELECT
cansa1.NAME,
cansa1.PRODUCT_ID,
mod(cansa1.PRODUCT_ID, 1000000) prodIdHash
FROM CANSA_TABLE cansa1) dt
INNER JOIN CUSER_TABLE cuser1 ON dt.PRODUCT_ID = cuser1.PRODUCT_ID
AND dt.prodIdHash = cuser1.PRODUCT_HASH
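The sketch below runs the derived-table rewrite against Python's sqlite3 with hypothetical rows. SQLite may not provide a mod() function, so a Python function named mod is registered to mimic Oracle's MOD. PRODUCT_ID has to be selected inside the derived table; otherwise the outer ON clause cannot reference dt.PRODUCT_ID.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: a Python mod() stands in for Oracle's MOD.
conn.create_function("mod", 2, lambda a, b: a % b)
conn.executescript("""
    CREATE TABLE CANSA_TABLE (NAME TEXT, PRODUCT_ID INTEGER);
    CREATE TABLE CUSER_TABLE (PRODUCT_ID INTEGER, PRODUCT_HASH INTEGER);
    -- Hypothetical rows: only 'widget' has a matching hash.
    INSERT INTO CANSA_TABLE VALUES ('widget', 1000042), ('gadget', 2000007);
    INSERT INTO CUSER_TABLE VALUES (1000042, 42), (2000007, 999);
""")

# Original query: mod() is written in both SELECT and ON.
original = conn.execute("""
    SELECT cansa1.NAME, mod(cansa1.PRODUCT_ID, 1000000) prodIdHash
    FROM CANSA_TABLE cansa1
    INNER JOIN CUSER_TABLE cuser1 ON cansa1.PRODUCT_ID = cuser1.PRODUCT_ID
        AND mod(cansa1.PRODUCT_ID, 1000000) = cuser1.PRODUCT_HASH
""").fetchall()

# Derived table: mod() is written once and referenced through its alias.
derived = conn.execute("""
    SELECT dt.NAME, dt.prodIdHash
    FROM (SELECT NAME, PRODUCT_ID, mod(PRODUCT_ID, 1000000) prodIdHash
          FROM CANSA_TABLE) dt
    INNER JOIN CUSER_TABLE cuser1 ON dt.PRODUCT_ID = cuser1.PRODUCT_ID
        AND dt.prodIdHash = cuser1.PRODUCT_HASH
""").fetchall()

print(original)  # [('widget', 42)]
print(derived)   # [('widget', 42)]
```

Writing mod() once does not guarantee that it is evaluated once. SQLite, like other engines, may flatten the derived table back into the join.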

where clause in nested isNull query giving unexpected result

The below query returns around 200000 results.
How the nested WHERE clause works in this query is not clear to me. Where does it come into the picture?
If I comment out the WHERE clause inside the ISNULL, I get 0 results. That is fine and expected, as MAX(invoiceid) is not null after the join.
select * from CustomerServices where isNull((
SELECT MAX(invoiceid)
FROM Invoices
LEFT JOIN InvoicesHistory
ON InvoicesHistory.ServiceHistoryID = Invoices.ServiceHistoryID
WHERE serviceID = Invoices.serviceID
),0)=0
Please let me know if you want me to add more information.
I think you just want NOT EXISTS:
select cs.*
from CustomerServices cs
where not exists (select 1
from Invoices i left join
InvoicesHistory ih
on ih.ServiceHistoryID = i.ServiceHistoryID
where cs.serviceID = i.serviceID
);
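In your case, the nested WHERE clause is effectively doing nothing. By SQL's scoping rules, the unqualified serviceID is resolved against the subquery's own tables before the outer query is considered, so the condition is equivalent to:
Invoices.serviceID = Invoices.serviceID
This is true for every invoice with a non-NULL serviceID. In all likelihood, it was intended to be a correlation clause, so the outer column needs to be qualified (e.g. CustomerServices.serviceID = Invoices.serviceID).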
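The sketch below demonstrates the scoping problem, using Python's sqlite3 with hypothetical rows. SQLite has no two-argument ISNULL function, so COALESCE stands in for it. The unqualified serviceID inside the subquery binds to Invoices.serviceID, so the subquery ignores the outer row. Qualifying the outer column turns the condition into a real correlation, which gives the same result as NOT EXISTS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CustomerServices (serviceID INTEGER);
    CREATE TABLE Invoices (invoiceid INTEGER, serviceID INTEGER,
                           ServiceHistoryID INTEGER);
    CREATE TABLE InvoicesHistory (ServiceHistoryID INTEGER);
    -- Hypothetical rows: service 3 has no invoices.
    INSERT INTO CustomerServices VALUES (1), (2), (3);
    INSERT INTO Invoices VALUES (10, 1, 100), (11, 1, 101), (12, 2, 102);
    INSERT INTO InvoicesHistory VALUES (100);
""")

def services(where_clause):
    # COALESCE replaces SQL Server's ISNULL(expr, 0).
    sql = f"""
        SELECT serviceID FROM CustomerServices
        WHERE COALESCE((
            SELECT MAX(invoiceid)
            FROM Invoices
            LEFT JOIN InvoicesHistory
              ON InvoicesHistory.ServiceHistoryID = Invoices.ServiceHistoryID
            WHERE {where_clause}
        ), 0) = 0
        ORDER BY serviceID
    """
    return [row[0] for row in conn.execute(sql)]

# Unqualified: binds to Invoices.serviceID, so the subquery is uncorrelated.
uncorrelated = services("serviceID = Invoices.serviceID")
# Qualified: compares each outer service with its own invoices.
correlated = services("CustomerServices.serviceID = Invoices.serviceID")

not_exists = [row[0] for row in conn.execute("""
    SELECT cs.serviceID FROM CustomerServices cs
    WHERE NOT EXISTS (SELECT 1
                      FROM Invoices i LEFT JOIN InvoicesHistory ih
                        ON ih.ServiceHistoryID = i.ServiceHistoryID
                      WHERE cs.serviceID = i.serviceID)
    ORDER BY cs.serviceID
""")]

print(uncorrelated)  # []
print(correlated)    # [3]
print(not_exists)    # [3]
```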

SQL WITH clause doesn't work

I'm trying to execute a seemingly simple query containing a WITH clause:
WITH sub AS (SELECT url FROM site WHERE id = 15)
SELECT * FROM search_result WHERE url = sub.url
But it doesn't work. I get
ERROR: missing FROM-clause entry for table "sub"
What's the matter?
Table expressions need to be used like tables. You're trying to use the value of sub as a scalar.
Try this (forgive me, Postgres is not my first SQL dialect).
WITH sub AS (SELECT url FROM site WHERE id = 15)
SELECT * FROM sub
INNER JOIN
search_result
ON
sub.url = search_result.url
EDIT: Alternatively, you could skip the WITH clause entirely and use:
SELECT * FROM
site
INNER JOIN
search_result
ON
site.url = search_result.url
WHERE
site.id = 15
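The sketch below illustrates both points in this answer, using Python's sqlite3 with hypothetical site and search_result rows (the title column is made up). Referencing sub.url without putting sub in a FROM clause fails. Joining the CTE, or joining site directly, works.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE site (id INTEGER, url TEXT);
    CREATE TABLE search_result (url TEXT, title TEXT);
    -- Hypothetical rows.
    INSERT INTO site VALUES (15, 'example.org'), (16, 'example.net');
    INSERT INTO search_result VALUES ('example.org', 'hit A'),
                                     ('example.net', 'hit B');
""")

# Using the CTE like a scalar, as in the question, is rejected.
try:
    conn.execute("""
        WITH sub AS (SELECT url FROM site WHERE id = 15)
        SELECT * FROM search_result WHERE url = sub.url
    """).fetchall()
    scalar_use_failed = False
except sqlite3.OperationalError as exc:
    scalar_use_failed = True
    print("error:", exc)

# Using the CTE like a table in a join works.
cte_rows = conn.execute("""
    WITH sub AS (SELECT url FROM site WHERE id = 15)
    SELECT search_result.*
    FROM sub
    INNER JOIN search_result ON sub.url = search_result.url
""").fetchall()

# The same result without a CTE.
plain_rows = conn.execute("""
    SELECT search_result.*
    FROM site
    INNER JOIN search_result ON site.url = search_result.url
    WHERE site.id = 15
""").fetchall()

print(cte_rows)  # [('example.org', 'hit A')]
```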
Don't use a CTE at all for this simple case.
Contrary to what you seem to expect, the following simple query without a CTE is likely to be slightly faster:
SELECT r.*
FROM search_result r
JOIN site s USING (url)
WHERE s.id = 15;
Test with EXPLAIN ANALYZE to verify.
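In PostgreSQL before version 12, CTEs act as an optimization barrier. Since version 12, a non-recursive, side-effect-free CTE that is referenced only once is inlined unless you write AS MATERIALIZED. CTEs have many very good uses, but they won't make simple queries faster.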
Here is a thread on pgsql-performance that gives you more details as to why that is.
That's not the correct way to use a CTE:
With sub as (
SELECT url
FROM site
WHERE id = 15
)
SELECT *
FROM Search_Result SR
JOIN sub ON SR.url = sub.Url
You can just as easily do an inner join:
SELECT search_result.*
FROM
search_result
INNER JOIN
(SELECT url FROM site WHERE id = 15) as st
ON
search_result.url = st.url
This filters site before the join, so you join against a smaller set than if you applied the WHERE clause after the join. Most planners push such a predicate down on their own, so this may not matter in your case, but it is something to consider.
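The sketch below compares filtering inside the derived table with filtering after the join, using Python's sqlite3 and hypothetical rows. Both return the same rows. EXPLAIN QUERY PLAN is SQLite's rough counterpart to PostgreSQL's EXPLAIN; it shows the plan without running the query, so you can check whether the planner treats the two forms differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE site (id INTEGER, url TEXT);
    CREATE TABLE search_result (url TEXT, title TEXT);
    -- Hypothetical rows.
    INSERT INTO site VALUES (15, 'example.org'), (16, 'example.net');
    INSERT INTO search_result VALUES ('example.org', 'hit A'),
                                     ('example.net', 'hit B'),
                                     ('example.org', 'hit C');
""")

# Filter inside a derived table, then join.
FILTER_FIRST = """
    SELECT search_result.*
    FROM search_result
    INNER JOIN (SELECT url FROM site WHERE id = 15) AS st
        ON search_result.url = st.url
    ORDER BY title
"""

# Join first, then filter in the outer WHERE.
FILTER_AFTER = """
    SELECT r.*
    FROM search_result r
    JOIN site s USING (url)
    WHERE s.id = 15
    ORDER BY title
"""

filter_first = conn.execute(FILTER_FIRST).fetchall()
filter_after = conn.execute(FILTER_AFTER).fetchall()
print(filter_first)  # [('example.org', 'hit A'), ('example.org', 'hit C')]

# Print both plans for a side-by-side comparison.
for label, sql in (("filter first", FILTER_FIRST), ("filter after", FILTER_AFTER)):
    print(label)
    for row in conn.execute("EXPLAIN QUERY PLAN " + sql):
        print("  ", row)
```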

Best way to tune NOT EXISTS in SQL queries

I am trying to tune SQL queries that contain a NOT EXISTS clause. My database is Netezza. I tried replacing NOT EXISTS with NOT IN and compared the query plans: both look similar, and so do the execution times. Can someone help me with this? Thanks in advance.
SELECT ETL_PRCS_DT, COUNT (*) TOTAL_PRGM_HOLD_DUE_TO_STATION
FROM DEV_AM_EDS_1..AM_HOLD_TV_PROGRAM_INSTANCE D1
WHERE NOT EXISTS (
SELECT *
FROM DEV_AM_EDS_1..AM_STATION
WHERE D1.STN_ID = STN_ID
)
GROUP BY ETL_PRCS_DT;
You can try a JOIN:
SELECT ETL_PRCS_DT, COUNT (*) TOTAL_PRGM_HOLD_DUE_TO_STATION
FROM DEV_AM_EDS_1..AM_HOLD_TV_PROGRAM_INSTANCE D1
LEFT JOIN DEV_AM_EDS_1..AM_STATION TAB2 ON D1.STN_ID = TAB2.STN_ID
WHERE TAB2.STN_ID IS NULL
GROUP BY ETL_PRCS_DT;
Compare the execution plans; the JOIN may well produce the same plan you already have.
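The sketch below checks that the NOT EXISTS query and the LEFT JOIN form return the same per-date counts, using Python's sqlite3 with hypothetical rows. SQLite does not understand Netezza's DB..TABLE names, so plain table names are used. The join form still aggregates per ETL_PRCS_DT, so it needs the same GROUP BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Plain table names replace Netezza's DEV_AM_EDS_1..TABLE notation.
conn.executescript("""
    CREATE TABLE AM_HOLD_TV_PROGRAM_INSTANCE (ETL_PRCS_DT TEXT, STN_ID INTEGER);
    CREATE TABLE AM_STATION (STN_ID INTEGER);
    -- Hypothetical rows: stations 7, 8 and 9 are missing from AM_STATION.
    INSERT INTO AM_STATION VALUES (1), (2);
    INSERT INTO AM_HOLD_TV_PROGRAM_INSTANCE VALUES
        ('2024-01-01', 1), ('2024-01-01', 9), ('2024-01-01', 8),
        ('2024-01-02', 2), ('2024-01-02', 7);
""")

not_exists = conn.execute("""
    SELECT ETL_PRCS_DT, COUNT(*) TOTAL_PRGM_HOLD_DUE_TO_STATION
    FROM AM_HOLD_TV_PROGRAM_INSTANCE D1
    WHERE NOT EXISTS (SELECT * FROM AM_STATION WHERE D1.STN_ID = STN_ID)
    GROUP BY ETL_PRCS_DT
    ORDER BY ETL_PRCS_DT
""").fetchall()

# Anti-join via LEFT JOIN; the GROUP BY is required for the per-date count.
left_join = conn.execute("""
    SELECT ETL_PRCS_DT, COUNT(*) TOTAL_PRGM_HOLD_DUE_TO_STATION
    FROM AM_HOLD_TV_PROGRAM_INSTANCE D1
    LEFT JOIN AM_STATION TAB2 ON D1.STN_ID = TAB2.STN_ID
    WHERE TAB2.STN_ID IS NULL
    GROUP BY ETL_PRCS_DT
    ORDER BY ETL_PRCS_DT
""").fetchall()

print(not_exists)  # [('2024-01-01', 2), ('2024-01-02', 1)]
print(left_join)   # [('2024-01-01', 2), ('2024-01-02', 1)]
```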
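You can try a join, but you sometimes need to be careful. If the join key is not unique in the second table, a join can return multiple rows per match. With the IS NULL filter used here, matched rows are discarded anyway, so duplicates do not change the counts. The following query rules them out explicitly: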
SELECT ETL_PRCS_DT,
COUNT (*) TOTAL_PRGM_HOLD_DUE_TO_STATION
FROM DEV_AM_EDS_1..AM_HOLD_TV_PROGRAM_INSTANCE D1
left outer join
(
select distinct STN_ID
from DEV_AM_EDS_1..AM_STATION ams
) ams
on d1.STN_ID = ams.STN_ID
WHERE ams.STN_ID is NULL
GROUP BY ETL_PRCS_DT;
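The sketch below shows what a non-unique STN_ID does, using Python's sqlite3 with hypothetical rows in which station 1 is listed twice. The duplicate inflates the raw LEFT JOIN. However, the IS NULL filter discards every matched row, so the per-date counts are the same with or without SELECT DISTINCT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE AM_HOLD_TV_PROGRAM_INSTANCE (ETL_PRCS_DT TEXT, STN_ID INTEGER);
    CREATE TABLE AM_STATION (STN_ID INTEGER);
    -- Hypothetical rows: station 1 is listed twice (non-unique join key).
    INSERT INTO AM_STATION VALUES (1), (1), (2);
    INSERT INTO AM_HOLD_TV_PROGRAM_INSTANCE VALUES
        ('2024-01-01', 1), ('2024-01-01', 9), ('2024-01-02', 7);
""")

def counts(station_source):
    # station_source is either the table itself or a DISTINCT subquery.
    return conn.execute(f"""
        SELECT ETL_PRCS_DT, COUNT(*) TOTAL_PRGM_HOLD_DUE_TO_STATION
        FROM AM_HOLD_TV_PROGRAM_INSTANCE D1
        LEFT OUTER JOIN {station_source} ams ON D1.STN_ID = ams.STN_ID
        WHERE ams.STN_ID IS NULL
        GROUP BY ETL_PRCS_DT
        ORDER BY ETL_PRCS_DT
    """).fetchall()

plain = counts("AM_STATION")
deduplicated = counts("(SELECT DISTINCT STN_ID FROM AM_STATION)")

# Without the IS NULL filter, the duplicate station does add a joined row.
joined_rows = conn.execute("""
    SELECT COUNT(*) FROM AM_HOLD_TV_PROGRAM_INSTANCE D1
    LEFT JOIN AM_STATION ams ON D1.STN_ID = ams.STN_ID
""").fetchone()[0]

print(plain)        # [('2024-01-01', 1), ('2024-01-02', 1)]
print(joined_rows)  # 4
```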

How do I remove a nested select from this SQL statement

I have the following SQL:
SELECT * FROM Name
INNER JOIN ( SELECT 2 AS item, NameInAddress.NameID as itemID, NameInAddress.AddressID
FROM NameInAddress
INNER JOIN Address ON Address.AddressID = NameInAddress.AddressID
WHERE (Address.Country != 'UK')
) AS Items ON (Items.itemID = Name.NameID)
I have been asked to remove the nested select and use INNER JOINS instead, as it will improve performance, but I'm struggling.
Using SQL Server 2008
Can anyone help?
Thanks!
Your query is not correct as you're using Items.itemID while it's not in the subselect
I guess this is what you meant:
SELECT Name.*
FROM Name
INNER JOIN NameInAddress
ON Name.NameID = NameInAddress.NameID
INNER JOIN Address
ON Address.AddressID = NameInAddress.AddressID
WHERE (Address.Country != 'UK')
EDIT: An exact translation of your query would start with SELECT Name.*, 2 AS item, NameInAddress.NameID AS itemID, NameInAddress.AddressID, though.
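The sketch below checks that the nested select and the flat joins return the same rows, using Python's sqlite3 in place of SQL Server 2008 and hypothetical data (the FullName column is made up). The flat version selects 2 AS item and the itemID alias explicitly so that the column lists line up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Name (NameID INTEGER, FullName TEXT);
    CREATE TABLE Address (AddressID INTEGER, Country TEXT);
    CREATE TABLE NameInAddress (NameID INTEGER, AddressID INTEGER);
    -- Hypothetical rows; Cy has one UK and one non-UK address.
    INSERT INTO Name VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO Address VALUES (10, 'UK'), (20, 'FR'), (30, 'US');
    INSERT INTO NameInAddress VALUES (1, 10), (2, 20), (3, 30), (3, 10);
""")

# The question's query, with the nested select.
nested = conn.execute("""
    SELECT * FROM Name
    INNER JOIN (SELECT 2 AS item, NameInAddress.NameID AS itemID,
                       NameInAddress.AddressID
                FROM NameInAddress
                INNER JOIN Address ON Address.AddressID = NameInAddress.AddressID
                WHERE Address.Country != 'UK') AS Items
        ON Items.itemID = Name.NameID
    ORDER BY Name.NameID, Items.AddressID
""").fetchall()

# The same result with plain joins.
flat = conn.execute("""
    SELECT Name.*, 2 AS item, NameInAddress.NameID AS itemID,
           NameInAddress.AddressID
    FROM Name
    INNER JOIN NameInAddress ON Name.NameID = NameInAddress.NameID
    INNER JOIN Address ON Address.AddressID = NameInAddress.AddressID
    WHERE Address.Country != 'UK'
    ORDER BY Name.NameID, NameInAddress.AddressID
""").fetchall()

print(flat)  # [(2, 'Bob', 2, 2, 20), (3, 'Cy', 2, 3, 30)]
```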
It is one of those long-lived myths that nested selects are slower than joins. It depends entirely on what the nested select says. SQL is a declarative language for describing what you want done, and the database transforms that description into something completely different to execute. Both MSSQL and Oracle (and, I suspect, the other major engines) are perfectly able to transform correlated subqueries and nested views into joins when that is beneficial. The exception is really complex constructs that would be very hard, if possible at all, to express with ordinary joins.
SELECT 2 AS Item, *
FROM Name
INNER JOIN NameInAddress
ON Name.NameID = NameInAddress.NameID
INNER JOIN Address
ON Address.AddressID = NameInAddress.AddressID
WHERE Address.Country != 'UK'
PS: Don't use "*"; selecting only the columns you need can improve performance too. :)