Caching inner query in SQL - sql

How can I optimize the SQL below (a simplified view of my complex query)? Ideally, I should be able to cache the result of the first query (the order ids) and do some kind of projection on the OrderLine table in the second query.
Any pointers will be helpful.
Restrictions - I cannot create temporary tables, cursors or procedures / functions. I am connecting to Oracle 10g.
SELECT object_type, id, mod_id FROM
(
(Select 'Order_id' AS object_type, order_id AS id, mod_id FROM Orders)
UNION
(select 'Order_line_id', order_line_id, mod_id FROM OrderLine
WHERE order_id IN (Select order_id FROM Orders)
)
)

The optimizer is responsible for optimizing your query; relax and let it do its stuff.
Any explicit caching you attempt will involve things like temporary tables (which you've ruled out), so you can only make the common query more explicit, perhaps, by using a CTE (common table expression, aka WITH clause) to name the common sub-query. But the optimizer might well process things the same way regardless.
You can replace the IN clause with a JOIN; that will likely be faster. Again, the optimizer might do that anyway. However, that's not a caching operation; it is a standard query rewrite.
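To check that the IN form and the JOIN form really are the same query, here is a minimal sketch using Python's built-in sqlite3 module rather than Oracle. The table and column names follow the question; the sample rows are invented for illustration, and the result says nothing about which form is faster on Oracle 10g.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Orders (order_id INTEGER PRIMARY KEY, mod_id INTEGER);
    CREATE TABLE OrderLine (order_line_id INTEGER PRIMARY KEY,
                            order_id INTEGER, mod_id INTEGER);
    INSERT INTO Orders VALUES (1, 10), (2, 20);
    -- order line 103 points at an order that does not exist
    INSERT INTO OrderLine VALUES (101, 1, 11), (102, 2, 21), (103, 9, 31);
""")

# Filter with IN, as in the question.
in_rows = conn.execute("""
    SELECT order_line_id, mod_id
    FROM OrderLine
    WHERE order_id IN (SELECT order_id FROM Orders)
    ORDER BY order_line_id
""").fetchall()

# Same filter expressed as an inner join on the unique order_id.
join_rows = conn.execute("""
    SELECT ol.order_line_id, ol.mod_id
    FROM OrderLine ol
    JOIN Orders o ON o.order_id = ol.order_id
    ORDER BY ol.order_line_id
""").fetchall()

print(in_rows)
print(join_rows)
```

The two forms are interchangeable here only because order_id is unique in Orders; joining to a non-unique column would duplicate order lines.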

You can design your queries based on following example:
WITH CTE AS
(
Select 'Order_id' Descr, order_id, mod_id FROM Orders
)
, CTE2 AS
(
SELECT * FROM CTE
UNION
SELECT 'Order_line_id' Descr, order_line_id, mod_id FROM OrderLine ol
WHERE EXISTS (Select order_id FROM CTE WHERE order_id = ol.order_id)
)
SELECT * FROM CTE2;

Select ol.order_line_id, ol.mod_id
from orders o
inner join orderline ol
on o.order_id = ol.order_id

You might be able to remove this line:
WHERE order_id IN (Select order_id FROM Orders)
If your database has integrity, there are no orphans allowed and this line doesn't filter anything.
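A small sketch of that point, again with Python's sqlite3 standing in for Oracle. The schema is an assumption: it adds a NOT NULL foreign key from OrderLine.order_id to Orders, which the question does not state. The NOT NULL matters, because a NULL order_id would pass the foreign key check but still be dropped by the IN filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE Orders (order_id INTEGER PRIMARY KEY, mod_id INTEGER);
    CREATE TABLE OrderLine (
        order_line_id INTEGER PRIMARY KEY,
        order_id INTEGER NOT NULL REFERENCES Orders(order_id),
        mod_id INTEGER);
    INSERT INTO Orders VALUES (1, 10);
    INSERT INTO OrderLine VALUES (101, 1, 11);
""")

# An order line for a missing order is rejected, so no orphan can exist.
try:
    conn.execute("INSERT INTO OrderLine VALUES (102, 9, 21)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

filtered = conn.execute("""
    SELECT order_line_id FROM OrderLine
    WHERE order_id IN (SELECT order_id FROM Orders)
""").fetchall()
unfiltered = conn.execute("SELECT order_line_id FROM OrderLine").fetchall()

print(orphan_rejected, filtered == unfiltered)
```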

with myOrders as (
select order_id,
mod_id
from Orders
)
select 'order_id' obj_type,
order_id,
mod_id
from myOrders
union all
select 'order_line_id' obj_type,
order_line_id,
mod_id
from OrderLine
join myOrders
on OrderLine.order_id = myOrders.order_id;

You could try something like the following...
SELECT DISTINCT Object_ID, id, mod_id
FROM
(
SELECT
CASE
WHEN CROSSNUM = 1 THEN 'Order_ID'
ELSE 'Order_Line_ID'
END AS Object_ID,
CASE
WHEN CROSSNUM = 1 THEN Orders.order_id
ELSE OrderLine.order_line_id
END AS id,
CASE
WHEN CROSSNUM = 1 THEN Orders.mod_id
ELSE OrderLine.mod_id
END AS mod_id
FROM
(Select order_id, mod_id FROM Orders) Orders
LEFT JOIN (select order_id, order_line_id, mod_id FROM OrderLine) OrderLine
ON Orders.order_id = OrderLine.order_id
CROSS JOIN
(
SELECT 1 AS CROSSNUM FROM DUAL
UNION ALL
SELECT 2 AS CROSSNUM FROM DUAL
) X
WHERE
NOT (CROSSNUM = 2 AND OrderLine.order_line_id IS NULL)
)

Related

All joined subquery results return null

I am trying to get all customers with their latest payment transaction, including customers without any transaction:
SELECT c.customer_id, c.phone_number, c.email
, p.transaction_no, p.amount, p.transaciton_datetime
FROM tbl_customers c
LEFT JOIN (
SELECT customer_id, transaction_no, amount, transaciton_datetime
FROM tbl_payment_transactions
ORDER BY payment_transaction_id DESC
LIMIT 1
) p
ON c.customer_id = p.customer_id
The above query returns NULL for p.transaction_no, p.amount, p.transaciton_datetime in every row, even though I have verified that tbl_payment_transactions contains transactions made by these customers.
Your subquery runs only once and returns a single row, the latest transaction in the whole table, so at most one customer can match it. You want the subquery to run once for each row of the driving table tbl_customers. This is called a lateral subquery and takes the form:
SELECT
c.customer_id, c.phone_number, c.email,
p.transaction_no, p.amount, p.transaciton_datetime
FROM tbl_customers c
LEFT JOIN LATERAL (
SELECT customer_id, transaction_no, amount, transaciton_datetime
FROM tbl_payment_transactions t
WHERE c.customer_id = t.customer_id
ORDER BY payment_transaction_id DESC
LIMIT 1
) p
ON true
The Impaler provided the correct form with a LATERAL subquery.
Alternatively, you can use DISTINCT ON in a subquery and a plain LEFT JOIN.
The latter can perform better when you retrieve all (or most) customers, and when there are only a few transactions per customer and/or you lack a multicolumn index on (customer_id, payment_transaction_id) or (customer_id, payment_transaction_id DESC):
SELECT c.customer_id, c.phone_number, c.email
, p.transaction_no, p.amount, p.transaciton_datetime
FROM tbl_customers c
LEFT JOIN (
SELECT DISTINCT ON (customer_id)
customer_id, transaction_no, amount, transaciton_datetime
FROM tbl_payment_transactions
ORDER BY customer_id, payment_transaction_id DESC
) p USING (customer_id);
About performance aspects:
Optimize GROUP BY query to retrieve latest row per user
Select first row in each GROUP BY group?
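To see why the original query matched so few rows, here is a minimal sketch with Python's sqlite3. SQLite has neither LATERAL nor DISTINCT ON, so the per-customer version below swaps in a correlated MAX subquery; that is a substitute technique, not the form the answers use. The table names come from the question, while the trimmed column list and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_customers (customer_id INTEGER PRIMARY KEY, email TEXT);
    CREATE TABLE tbl_payment_transactions (
        payment_transaction_id INTEGER PRIMARY KEY,
        customer_id INTEGER, amount INTEGER);
    INSERT INTO tbl_customers VALUES
        (1, 'a@example.com'), (2, 'b@example.com'), (3, 'c@example.com');
    INSERT INTO tbl_payment_transactions VALUES
        (1, 1, 50), (2, 2, 70), (3, 1, 90);
""")

# Shape of the original query: the derived table holds ONE row in total,
# the globally latest transaction, so only its owner can match.
global_latest = conn.execute("""
    SELECT c.customer_id, p.amount
    FROM tbl_customers c
    LEFT JOIN (
        SELECT customer_id, amount
        FROM tbl_payment_transactions
        ORDER BY payment_transaction_id DESC
        LIMIT 1
    ) p ON c.customer_id = p.customer_id
    ORDER BY c.customer_id
""").fetchall()

# Per-customer latest row via a correlated subquery (substitute for LATERAL).
per_customer = conn.execute("""
    SELECT c.customer_id, p.amount
    FROM tbl_customers c
    LEFT JOIN tbl_payment_transactions p
      ON p.customer_id = c.customer_id
     AND p.payment_transaction_id = (
           SELECT MAX(t.payment_transaction_id)
           FROM tbl_payment_transactions t
           WHERE t.customer_id = c.customer_id)
    ORDER BY c.customer_id
""").fetchall()

print(global_latest)
print(per_customer)
```

Customer 3 has no transactions and keeps a NULL amount in both results, which is what the LEFT JOIN is for.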

Can anybody tell me which one will perform better with millions of rows of data in SQL

In SQL which one is good:
Common Table Expression (CTE)
Temp table
Table variable
When we have 10000000 records in our table and need a subquery to fetch records, like this:
WITH cteData AS
(
SELECT
Product_Id, Variant_Id, Variant_Name, Unit_Price,
(SELECT GST FROM Product_Details P with(nolock)
WHERE V.Product_Id = P.Product_Id) AS GST
FROM
Variant_Details V with(nolock)
)
SELECT
Product_Id, Variant_Id, Variant_Name,
SUM(Unit_Price * GST) AS Variant_Total_Price,
(SELECT SUM(C.Unit_Price * C.GST) FROM cteData C
WHERE CD.Product_Id = C.Product_Id) AS Product_Total_Price
FROM
cteData CD
GROUP BY
Product_Id, Variant_Id, Variant_Name
or:
SELECT
Product_Id, Variant_Id, Variant_Name, Unit_Price,
(SELECT GST FROM Product_Details P with(nolock)
WHERE V.Product_Id = P.Product_Id) AS GST
INTO
#tempData
FROM
Variant_Details V with(nolock)
SELECT
Product_Id, Variant_Id, Variant_Name,
SUM(Unit_Price * GST) AS Variant_Total_Price,
(SELECT SUM(C.Unit_Price * C.GST) FROM #tempData C
WHERE CD.Product_Id = C.Product_Id) AS Product_Total_Price
FROM
#tempData CD
GROUP BY
Product_Id, Variant_Id, Variant_Name
In both cases - which one is better when there are millions of records in the table?
As with all performance questions, you should probably try on your database with your hardware and your data.
That said, I would write the query like this:
SELECT v.Product_Id, v.Variant_Id, v.Variant_Name,
SUM(v.Unit_Price * p.GST) as variant_total_price,
SUM(SUM(v.Unit_Price * p.GST)) OVER (PARTITION BY v.Product_Id) as product_total_price
FROM Variant_Details V LEFT JOIN
Product_Details P
ON V.Product_Id = P.Product_Id
GROUP BY v.Product_Id, v.Variant_Id, v.Variant_Name;
With the right indexes, I would expect this to be faster than other alternatives.
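The nested SUM(SUM(...)) OVER (...) form can look odd: the inner SUM is the per-group aggregate, and the outer one is a window function that adds those group totals up per product. A minimal sketch with Python's sqlite3, assuming the bundled SQLite is 3.25 or newer (the first version with window functions). The sample rows are invented, and GST is stored as a whole-number multiplier only to keep the arithmetic exact.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Product_Details (Product_Id INTEGER PRIMARY KEY, GST INTEGER);
    CREATE TABLE Variant_Details (
        Variant_Id INTEGER PRIMARY KEY, Product_Id INTEGER,
        Variant_Name TEXT, Unit_Price INTEGER);
    INSERT INTO Product_Details VALUES (1, 2), (2, 3);
    INSERT INTO Variant_Details VALUES
        (10, 1, 'small', 100), (11, 1, 'large', 200), (20, 2, 'only', 50);
""")

rows = conn.execute("""
    SELECT v.Product_Id, v.Variant_Id,
           SUM(v.Unit_Price * p.GST) AS variant_total_price,
           -- window function over the grouped rows: total per product
           SUM(SUM(v.Unit_Price * p.GST))
               OVER (PARTITION BY v.Product_Id) AS product_total_price
    FROM Variant_Details v
    LEFT JOIN Product_Details p ON v.Product_Id = p.Product_Id
    GROUP BY v.Product_Id, v.Variant_Id
    ORDER BY v.Variant_Id
""").fetchall()

for row in rows:
    print(row)
```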
Consider using joins and window functions rather than a CTE and subqueries. This should do what you want, and will likely perform better than both solutions you are comparing:
select v.product_id, v.variant_id, v.variant_name,
sum(v.unit_price * pd.gst) variant_total_price,
sum(sum(v.unit_price * pd.gst)) over(partition by v.product_id) product_total_price
from variant_details v
left join product_details pd on pd.product_id = v.product_id
group by v.product_id, v.variant_id, v.variant_name
I can see that you have also used GROUP BY in your query. With millions of records, GROUP BY can slow the query down.
I suggest a solution using OUTER APPLY. Here is the code for it:
SELECT v.Product_Id, v.Variant_Id, v.Variant_Name,
val.variant_total_price,
SUM(val.variant_total_price) OVER (PARTITION BY v.Product_Id) as product_total_price
FROM Variant_Details v
OUTER APPLY
(
SELECT v.Unit_Price * SUM(p.GST) as variant_total_price
FROM Product_Details p where v.Product_Id = p.Product_Id
) val

use IN statement within ID column and retrieve count(*)?

I have the following SQL command:
SELECT * from products
WHERE id IN (
SELECT product_id, count(*)
FROM account_products
GROUP BY product_id
)
Obviously it doesn't retrieve any data, because the inner query returns two columns (product_id and count).
But I need the count too, because I'm going to use it in some calculations later.
How can I use this IN query using the count(*) too?
Thanks!
Join them:
select products.*, t.product_count
from products
join (
SELECT product_id, count(*) as product_count
FROM account_products
GROUP BY product_id
) as t on t.product_id = products.id
select products.*, count(account_products.product_id) as product_count
from products join account_products on
account_products.product_id = products.id
group by products.id -- plus every other products column, unless your DBMS accepts the primary key alone
Or with a subquery in the select clause, when your DBMS supports it:
SELECT
products.*,
(SELECT count(*) FROM account_products ap WHERE ap.product_id = products.id) as "Number of Accounts"
from products
This might work for you:
WITH MyAccountProducts AS
(
SELECT product_id, count(*) AS CountOfAccountProducts
FROM account_products
GROUP BY product_id
)
SELECT p.*,ap.CountOfAccountProducts
from products AS p
INNER JOIN MyAccountProducts AS ap ON p.id=ap.product_id
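The join-based answers and the scalar-subquery answer do not return the same rows: an inner join drops products that appear in no account, while the subquery in the select list keeps them with a count of 0. A minimal sketch with Python's sqlite3; the `name` and `account_id` columns and the sample rows are assumptions, not from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE account_products (account_id INTEGER, product_id INTEGER);
    INSERT INTO products VALUES (1, 'pen'), (2, 'ink'), (3, 'pad');
    INSERT INTO account_products VALUES (100, 1), (101, 1), (100, 2);
""")

# Join against the grouped subquery: only products that have accounts.
joined = conn.execute("""
    SELECT p.id, t.product_count
    FROM products p
    JOIN (SELECT product_id, COUNT(*) AS product_count
          FROM account_products
          GROUP BY product_id) t ON t.product_id = p.id
    ORDER BY p.id
""").fetchall()

# Scalar subquery in the select list: every product, 0 when unused.
scalar = conn.execute("""
    SELECT p.id,
           (SELECT COUNT(*) FROM account_products ap
            WHERE ap.product_id = p.id) AS product_count
    FROM products p
    ORDER BY p.id
""").fetchall()

print(joined)
print(scalar)
```

The inner join matches the intent of the original IN query, which also returns only products present in account_products.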

SQL Select Statement with nested group by clause

I have 2 tables. I need to select the column name and a calculated field from Invoices called balance_due.
The result of the query should be the name and their balance due from all of their records combined.
Thanks for any help.
SELECT v.vendor_name, i.totalbalance
FROM Vendors as v
INNER JOIN (
SELECT vendor_id, sum(invoice_total-payment_total) as totalbalance
FROM invoices
GROUP BY vendor_id
) as i on i.vendor_id = v.vendor_id
Or there is another syntax:
;With i As
(
SELECT vendor_id, sum(invoice_total-payment_total) as totalbalance
FROM invoices
WHERE payment_total is not null
GROUP BY vendor_id
)
SELECT Vendors.vendor_name, i.totalbalance
From Vendors LEFT JOIN i ON Vendors.vendor_id = i.vendor_id

How could I optimize this dynamic SQL query?

I've been having difficulties with the following query. I've been trying to optimize it and perhaps make it more readable. Let's say I have 3 tables, orders_returned, orders_completed and orders_delivered, with matching columns order_id and customer_id. Depending on the selected options, I might need to retrieve orders which were delivered, then returned and finally completed (the same order_id occurs in all three tables) and which have the same customer_id. I might also need to retrieve only delivered and returned orders, in which case I would omit AND order_id IN (SELECT order_id FROM ORDERS_COMPLETED) from the WHERE clause. For example: get delivered and returned orders by customers John and Tim.
As of now my query looks like this:
SELECT order_id
FROM
(
SELECT order_id, customer_id
FROM ORDERS_RETURNED
UNION
SELECT order_id, customer_id
FROM ORDERS_COMPLETED
UNION
SELECT order_id, customer_id
FROM ORDERS_DELIVERED
)
WHERE
customer_id IN ('customer1', 'customer2', ...)
AND order_id IN (SELECT order_id FROM ORDERS_RETURNED)
AND order_id IN (SELECT order_id FROM ORDERS_COMPLETED)
AND order_id IN (SELECT order_id FROM ORDERS_DELIVERED)
I'm still learning SQL and would like to see if there are better options.
EDIT: I am using Oracle database. There is also Orders table which has distinct order_ids and some other irrelevant columns. It does not store customer_ids.
Also, the order might occur in one table or in two of them only, so joins, I think, are of no use here.
Since you have an Order table, I presume you are also storing the CustomerId in that table as well. Assuming so, try this:
SELECT DISTINCT O.OrderId
FROM Orders O
LEFT JOIN Orders_Completed OC ON O.OrderId = OC.OrderId
LEFT JOIN Orders_Delivered OD ON O.OrderId = OD.OrderId
LEFT JOIN Orders_Returned ORE ON O.OrderId = ORE.OrderId
WHERE O.CustomerId IN (...)
AND OD.OrderId IS NOT NULL AND ORE.OrderId IS NOT NULL AND OC.OrderId IS NULL
This particular query will return you all distinct orders where customer in (...) where the order has been delivered and returned, but not completed. Toggle the use of the IS NULL and IS NOT NULL to get your desired output.
Good luck.
You should use Joins instead of Inner/Nested queries.
Try the following instead:
SELECT A.order_id, A.customer_id FROM ORDERS_RETURNED A
INNER JOIN ORDERS_COMPLETED B ON A.order_id = B.order_id AND A.customer_id = B.customer_id
INNER JOIN ORDERS_DELIVERED C ON A.order_id = C.order_id AND A.customer_id = C.customer_id
Where A.customer_id IN ('customer1', 'customer2', ...)
You could do this with something like:
with data
as (select customer_id,
order_id,
nvl(max(case status when 'RETURNED' then 'Y' end), 'N') returned,
nvl(max(case status when 'COMPLETED' then 'Y' end), 'N') completed,
nvl(max(case status when 'DELIVERED' then 'Y' end), 'N') delivered
from (select 'RETURNED' status, order_id, customer_id
from orders_returned
union all
select 'COMPLETED' status, order_id, customer_id
from orders_completed
union all
select 'DELIVERED' status, order_id, customer_id
from orders_delivered)
group by customer_id, order_id)
select *
from data
where returned = 'Y'
and delivered = 'Y'
and customer_id in ('xx', 'xxx') ;
or
with data
as (select customer_id,
order_id,
max(returned) returned,
max(completed) completed,
max(delivered) delivered
from (select 'Y' returned, null completed, null delivered, order_id, customer_id
from orders_returned
union all
select null, 'Y', null, order_id, customer_id
from orders_completed
union all
select null, null, 'Y', order_id, customer_id
from orders_delivered)
group by customer_id, order_id)
select *
from data
where returned = 'Y'
and delivered = 'Y'
and customer_id in ('xx', 'xxx');
eg: http://sqlfiddle.com/#!4/3e2fb/2
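Since the set of required statuses changes with the selected options, the query text can be assembled from a fixed whitelist of tables. The sketch below uses Python's sqlite3 instead of Oracle and replaces the Y/N flag columns with HAVING COUNT(DISTINCT status), a variation on the union-and-group idea above rather than the exact query. The helper name `orders_with_all`, the `STATUS_TABLES` mapping and the sample rows are hypothetical.

```python
import sqlite3

# Whitelist: only these names are ever interpolated into the SQL text.
STATUS_TABLES = {
    "returned": "orders_returned",
    "completed": "orders_completed",
    "delivered": "orders_delivered",
}

def orders_with_all(conn, statuses, customer_ids):
    """Return (customer_id, order_id) pairs found in every selected table."""
    union = " UNION ALL ".join(
        f"SELECT '{s}' AS status, order_id, customer_id FROM {STATUS_TABLES[s]}"
        for s in statuses  # unknown status raises KeyError
    )
    placeholders = ", ".join("?" for _ in customer_ids)
    sql = f"""
        SELECT customer_id, order_id
        FROM ({union})
        WHERE customer_id IN ({placeholders})
        GROUP BY customer_id, order_id
        HAVING COUNT(DISTINCT status) = ?
        ORDER BY customer_id, order_id
    """
    # Customer ids and the status count are bound, never interpolated.
    return conn.execute(sql, [*customer_ids, len(statuses)]).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders_returned  (order_id INTEGER, customer_id TEXT);
    CREATE TABLE orders_completed (order_id INTEGER, customer_id TEXT);
    CREATE TABLE orders_delivered (order_id INTEGER, customer_id TEXT);
    INSERT INTO orders_returned  VALUES (1, 'john'), (2, 'tim');
    INSERT INTO orders_completed VALUES (1, 'john');
    INSERT INTO orders_delivered VALUES (1, 'john'), (2, 'tim'), (3, 'ann');
""")

two = orders_with_all(conn, ["delivered", "returned"], ["john", "tim"])
three = orders_with_all(conn, ["delivered", "returned", "completed"],
                        ["john", "tim"])
print(two)
print(three)
```

This covers only the "present in all selected tables" case; excluding a status (delivered and returned but not completed) would need an extra condition.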