NHibernate HQL SELECT TOP in sub query - nhibernate

Is there a way of using SetMaxResults() on a subquery? I'm writing a query to return all the order items belonging to the most recent order, so I need to limit the number of records returned by the subquery.
The equivalent sql looks something like:
SELECT i.*
FROM tbl_Orders o
JOIN tbl_OrderItems i on i.OrderId = o.Id
WHERE
o.Id in (SELECT TOP 1 o.Id FROM tbl_Orders o ORDER BY o.Date desc)
I'm using HQL specifically because the criteria API doesn't let you project another domain object (I'm querying on orders but want to return order items).
I know that HQL doesn't accept "SELECT TOP", but if I use SetMaxResults() it will apply to the outer query, not the subquery.
Any ideas?

From NHibernate 3.2 on, you can use SKIP n / TAKE n in HQL at the end of the query.
Your query will be:
SELECT i.*
FROM tbl_Orders o
JOIN tbl_OrderItems i on i.OrderId = o.Id
WHERE
o.Id in (SELECT o.Id FROM tbl_Orders o order by o.Date desc take 1)

Just query the orders (and use SetMaxResults) and do a 'fetch join' to ensure all order items for the selected orders are loaded straight away.
On the returned orders you can then access the order items without this resulting in a new SQL statement being sent to the database.

I encountered this problem too, but didn't find a solution using HQL.
Subqueries with top would be very nice, since they are faster than doing a full join first. With a full join, SQL Server joins the tables first, sorts all rows, and only then selects the top 30. With the subselect, the top 30 rows of one table are taken and then joined with the other table. This is much faster!
My query with the subselect takes about 1 second; the one with the join and sort takes 15 seconds! So the join wasn't an option.
I ended up with two queries. First the subselect:
IQuery q1 = session.CreateQuery("select id from table1 order by id desc");
q1.SetMaxResults(100);
And then the second query
IQuery q2 = session.CreateQuery("select colone, coltwo from table2 where table1id in (:subselect)");
q2.SetParameterList("subselect", q1.List());
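The same two-step idea can be sketched outside NHibernate. The following is only an illustration using Python's built-in sqlite3 module; the tables `orders` and `order_items`, their columns, and the helper `items_of_latest_orders` are hypothetical names, and `LIMIT` stands in for what SetMaxResults does on the first query.

```python
import sqlite3

def items_of_latest_orders(conn, limit):
    # Step 1: run the "subselect" on its own so the row limit applies only to it.
    ids = [row[0] for row in conn.execute(
        "SELECT id FROM orders ORDER BY order_date DESC LIMIT ?", (limit,))]
    if not ids:
        return []
    # Step 2: pass the ids to the outer query, one placeholder per id.
    placeholders = ", ".join("?" for _ in ids)
    sql = ("SELECT id, order_id, product FROM order_items "
           f"WHERE order_id IN ({placeholders}) ORDER BY id")
    return conn.execute(sql, ids).fetchall()

# Hypothetical sample data: order 2 is the most recent one.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, order_date TEXT);
CREATE TABLE order_items (id INTEGER PRIMARY KEY, order_id INTEGER, product TEXT);
INSERT INTO orders VALUES (1, '2020-01-01'), (2, '2020-03-01'), (3, '2020-02-01');
INSERT INTO order_items VALUES (10, 1, 'pen'), (11, 2, 'ink'), (12, 2, 'pad'), (13, 3, 'cap');
""")
latest_items = items_of_latest_orders(conn, 1)
```

The cost of this approach is a second round trip, and the id list has to stay within whatever parameter limit the database driver imposes.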

Related

Making select query more efficient (subquery slows run speed)

The below query seems to take forever to run ever since I have added the subquery into it.
I originally tried to accomplish my goal by having two joins but the results were wrong.
Does anyone know the correct way to write this?
SELECT
c.cus_Name,
COUNT(o.orderHeader_id) AS Orders,
(select count(ol.orderLines_id) from orderlines ol where ol.orderLines_orderId = o.orderHeader_id) as linesOrderd,
MAX(o.orderHeader_dateCreated) AS lastOrdered,
SUM(o.orderHeader_totalSell) AS orderTotal,
SUM(o.orderHeader_currentSell) AS sellTotal
FROM
cus c
JOIN
orderheader o ON o.orderHeader_customer = c.cus_id
group by
c.cus_name
order by
orderTotal desc
Example data below
For the data you want, I think this is the way to go:
SELECT c.cus_Name,
COUNT(o.orderHeader_id) AS Orders,
SUM(ol.cnt) as linesOrderd,
MAX(o.orderHeader_dateCreated) AS lastOrdered,
SUM(o.orderHeader_totalSell) AS orderTotal,
SUM(o.orderHeader_currentSell) AS sellTotal
FROM cus c JOIN
orderheader o
ON o.orderHeader_customer = c.cus_id LEFT JOIN
(SELECT ol.orderLines_orderId, count(*) as cnt
FROM orderlines ol
GROUP BY ol.orderLines_orderId
) ol
ON ol.orderLines_orderId = o.orderHeader_id
GROUP BY c.cus_name
ORDER BY orderTotal DESC;
I'm not sure if it will be much faster, but it will at least produce a sensible result -- the total number of order lines for a customer rather than the number of order lines on an arbitrary order.
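To see why the order lines are pre-aggregated before joining, here is a small sketch using Python's sqlite3 module. The table and column names follow the question, but the sample rows are made up, and SQLite is only a stand-in for whatever database is actually in use. Joining orderlines directly repeats each order header once per line, which inflates the order count and the sell total.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cus (cus_id INTEGER PRIMARY KEY, cus_name TEXT);
CREATE TABLE orderheader (orderHeader_id INTEGER PRIMARY KEY,
                          orderHeader_customer INTEGER,
                          orderHeader_totalSell REAL);
CREATE TABLE orderlines (orderLines_id INTEGER PRIMARY KEY, orderLines_orderId INTEGER);
INSERT INTO cus VALUES (1, 'Acme');
INSERT INTO orderheader VALUES (100, 1, 50.0), (101, 1, 20.0);
INSERT INTO orderlines VALUES (1, 100), (2, 100), (3, 100), (4, 101);
""")

# Direct second join: order 100 appears three times, once per order line.
naive = conn.execute("""
SELECT c.cus_name, COUNT(o.orderHeader_id), COUNT(ol.orderLines_id),
       SUM(o.orderHeader_totalSell)
FROM cus c
JOIN orderheader o ON o.orderHeader_customer = c.cus_id
LEFT JOIN orderlines ol ON ol.orderLines_orderId = o.orderHeader_id
GROUP BY c.cus_name
""").fetchone()

# Pre-aggregated lines: at most one joined row per order, so the sums stay correct.
pre_aggregated = conn.execute("""
SELECT c.cus_name, COUNT(o.orderHeader_id), SUM(ol.cnt),
       SUM(o.orderHeader_totalSell)
FROM cus c
JOIN orderheader o ON o.orderHeader_customer = c.cus_id
LEFT JOIN (SELECT orderLines_orderId, COUNT(*) AS cnt
           FROM orderlines
           GROUP BY orderLines_orderId) ol
       ON ol.orderLines_orderId = o.orderHeader_id
GROUP BY c.cus_name
""").fetchone()
```

With this sample data the direct join reports 4 orders and a total of 170, while the pre-aggregated version reports 2 orders, 4 lines and a total of 70.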
It is strange that the subselect should not be possible, since the count is only indirectly related to the grouping. Do you want to count all order lines of all orders that belong to one customer? Normally this would be done with a second join, but then each orderheader row is repeated once per matching order line, which would produce wrong results in the other aggregations.
Putting the subselect into the joined table should help instead.
Replace orderheader o with
(select o.*, (select count(ol.orderLines_id) from orderlines ol where ol.orderLines_orderId = o.orderHeader_id) as linesOrder from orderheader o) as o
and replace the subselect with
sum(o.linesOrder)

Optimizing aggregate function in Oracle

I have a query for pulling customer information, and I'm adding a max() function to find the most recent order date. Without the aggregate the query takes 0.23 seconds to run, but with it, it takes 12.75 seconds.
Here's the query:
SELECT U.SEQ, MAX(O.ORDER_DATE) FROM CUST_MST U
INNER JOIN ORD_MST O ON U.SEQ = O.CUST_NUM
WHERE U.SEQ = :customerNumber
GROUP BY U.SEQ;
ORD_MST is a table with 890,000 records.
Is there a more efficient way to get this functionality?
EDIT: For the record, there's nothing specifically stopping me from running two queries and joining them in my program. I find it incredibly odd that such a simple query would take this long to run. In this case it is much cleaner/easier to let the database do the joining of information, but it's not the only way for me to get it done.
EDIT 2: As requested, here are the plans for the queries I reference in this question.
With Aggregate
Without Aggregate
The problem with your query is that it joins both tables completely, then runs the max function against the whole result, and only at the end applies the where clause to filter the rows.
You can improve the join by joining only the rows for the given customer instead of the full tables. It should look like this:
SELECT U.SEQ, MAX(O.ORDER_DATE) FROM
(SELECT * FROM CUST_MST WHERE SEQ = :customerNumber ) U
INNER JOIN
(SELECT * FROM ORD_MST WHERE CUST_NUM = :customerNumber) O ON U.SEQ = O.CUST_NUM
GROUP BY U.SEQ;
Another option is to use an order by and filter on the first rownum. It's not really the clean way, but it could be faster. If not, you will also need a subselect so that the full tables are not ordered. I haven't used Oracle for a while, but it should look something like this:
SELECT * FROM
(
SELECT U.SEQ, O.ORDER_DATE FROM CUST_MST U
INNER JOIN ORD_MST O ON U.SEQ = O.CUST_NUM
WHERE U.SEQ = :customerNumber
ORDER BY O.ORDER_DATE DESC
)
WHERE ROWNUM = 1
Are you forced to use the join for some reason? If not, why don't you select directly from ORD_MST without the join?
EDIT
One more idea:
SELECT * FROM
(SELECT CUST_NUM, MAX(ORDER_DATE) FROM ORD_MST WHERE CUST_NUM = :customerNumber GROUP BY CUST_NUM) O
INNER JOIN CUST_MST U ON O.CUST_NUM = U.SEQ
If the inner select takes just one second, then the join should be nearly instant.
Run these commands:
Explain plan for
SELECT U.SEQ, MAX(O.ORDER_DATE) FROM CUST_MST U
INNER JOIN ORD_MST O ON U.SEQ = O.CUST_NUM
WHERE U.SEQ = :customerNumber
GROUP BY U.SEQ;
select * from table( dbms_xplan.display );
and post the results here.
Without knowing the execution plan we can only guess what really happens.
By the way, my feeling is that adding a composite index on the ORD_MST table with the columns cust_num + order_date could solve the problem (assuming that SEQ is the primary key of the CUST_MST table and already has a unique index). Try:
CREATE INDEX idx_name ON ORD_MST( cust_num, order_date );
Also, after creating the index refresh statistics with commands:
EXEC DBMS_STATS.gather_table_stats('your-schema-name', 'CUST_MST');
EXEC DBMS_STATS.gather_table_stats('your-schema-name', 'ORD_MST');
Then try your query again.
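As a rough illustration of what such a composite index buys, the sketch below uses Python's sqlite3 module rather than Oracle, so the plan format and the optimizer behaviour are SQLite's, not Oracle's. The table is a cut-down, hypothetical version of ORD_MST, and `idx_ord_cust_date` is a made-up index name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ord_mst (seq INTEGER PRIMARY KEY, cust_num INTEGER, order_date TEXT);
INSERT INTO ord_mst (cust_num, order_date)
VALUES (7, '2021-01-05'), (7, '2021-06-30'), (8, '2022-02-02');
-- Composite index: equality column first, then the column being aggregated.
CREATE INDEX idx_ord_cust_date ON ord_mst (cust_num, order_date);
""")

query = "SELECT MAX(order_date) FROM ord_mst WHERE cust_num = ?"
latest = conn.execute(query, (7,)).fetchone()[0]

# The last column of each EXPLAIN QUERY PLAN row is the human-readable detail.
plan = " ".join(str(row[-1]) for row in conn.execute("EXPLAIN QUERY PLAN " + query, (7,)))
```

Because both columns are in the index, the latest date for one customer can be answered from the index alone, without touching the table rows.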

database paging design

I'm fetching data for my grid like this
SELECT
Orders.CustomerID,
Orders.OrderTime,
OrderItems.ProductID,
OrderItems.Quantity
FROM
dbo.Orders INNER JOIN dbo.OrderItems
ON Orders.ID = OrderItems.OrderID
I also need the total count for the pagination.
There are two options.
1- Do another fetch
SELECT count(*) FROM dbo.Orders
2- Put the count statement in the query
SELECT
Orders.CustomerID,
Orders.OrderTime,
OrderItems.ProductID,
OrderItems.Quantity,
(SELECT count(*) FROM dbo.Orders) as Count
FROM
dbo.Orders INNER JOIN dbo.OrderItems
ON Orders.ID = OrderItems.OrderID
Which way should I go ?
Of the 2 methods you've put forward, the first (separate query) is better. The second method means the count will appear in every row returned, which is unnecessary. Also, if the query returns 20 rows, the select count(*) may be executed 20 times (if I remember right; this could depend on which database engine you're using).
Additionally, depending on how much traffic you're envisaging and how big the table is likely to get, you can improve upon this by caching the result of select count(*) somewhere, and then refreshing it upon insertions / deletions to the table.
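A minimal sketch of the separate-query approach, using Python's sqlite3 module. The schema, the sample rows and the helper `fetch_page` are hypothetical, and `LIMIT`/`OFFSET` is SQLite syntax rather than SQL Server's. The count here is taken over the joined rows, on the assumption that those are the rows the grid pages through.

```python
import sqlite3

def fetch_page(conn, page, page_size):
    """Return (total_rows, rows_for_page); page is zero-based."""
    join = "FROM orders o JOIN order_items i ON o.id = i.order_id"
    # Query 1: total number of grid rows, fetched once per request.
    total = conn.execute(f"SELECT COUNT(*) {join}").fetchone()[0]
    # Query 2: only the rows needed for the requested page.
    rows = conn.execute(
        f"SELECT o.customer_id, i.product_id, i.quantity {join} "
        "ORDER BY i.id LIMIT ? OFFSET ?",
        (page_size, page * page_size),
    ).fetchall()
    return total, rows

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id TEXT);
CREATE TABLE order_items (id INTEGER PRIMARY KEY, order_id INTEGER,
                          product_id TEXT, quantity INTEGER);
INSERT INTO orders VALUES (1, 'alice'), (2, 'bob');
INSERT INTO order_items VALUES (1, 1, 'pen', 2), (2, 1, 'ink', 1),
                               (3, 2, 'pad', 5), (4, 2, 'cap', 1), (5, 2, 'nib', 3);
""")
total, last_page = fetch_page(conn, 2, 2)
```

The total could be cached between requests, as suggested above, instead of being recomputed on every call.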
If this is for SQL Server 2005 or higher, one of the best ways to get pagination is to use a Common Table Expression.
CREATE PROC MyPaginatedDataProc
@pageNumber INT
AS
WITH OrdersCTE (CustomerID, OrderTime, ProductID, Quantity, RowNumber)
AS
(
SELECT
Orders.CustomerID,
Orders.OrderTime,
OrderItems.ProductID,
OrderItems.Quantity,
ROW_NUMBER() OVER (ORDER BY OrderItems.OrderID) AS RowNumber
FROM
dbo.Orders INNER JOIN dbo.OrderItems ON Orders.ID = OrderItems.OrderID
)
SELECT
CustomerID,
OrderTime,
ProductId,
Quantity
FROM
OrdersCTE
WHERE
-- ROW_NUMBER() starts at 1; @pageNumber is assumed to be zero-based, 10 rows per page
RowNumber BETWEEN (@pageNumber * 10) + 1 AND ((@pageNumber + 1) * 10)
Otherwise for getting the total row count, I'd use a separate query like Mailslut said.
If you are using Oracle you can use COUNT(*) OVER () AS CNT. This was more efficient for me, as it takes a single table scan:
SELECT
Orders.CustomerID,
Orders.OrderTime,
OrderItems.ProductID,
OrderItems.Quantity,
COUNT(*) OVER () AS CNT
FROM
dbo.Orders INNER JOIN dbo.OrderItems
ON Orders.ID = OrderItems.OrderID
As @Mailslut suggests, you should probably use two queries. However, you should probably add a WHERE clause to the query that fetches the data, so you only fetch the data that you actually need to show (unless you are caching it).
If more than one thread is accessing the database at a time, you will also need to somehow make sure that the count is kept in sync with the database.
I would consider something different, because what you are trying to do is not very simple, but quite necessary. Have you considered using the SQL Server row_number function? This way you will know how many records there are by looking at the max row_number returned, but also in the order you want.
SELECT
Orders.CustomerID,
Orders.OrderTime,
OrderItems.ProductID,
OrderItems.Quantity,
ROW_NUMBER() OVER(ORDER BY Orders.CustomerId) rn
FROM
dbo.Orders INNER JOIN dbo.OrderItems
ON Orders.ID = OrderItems.OrderID

Find records that do not have related records in SQL

I have 2 tables (Orders, OrderItems) that are related based on a column OrderID. I need to find all Orders that do not have any OrderItems.
We use JOIN to find related data. To find data without any related data, we can use an anti-join.
The following joins the tables, then selects those orders without any order items. This tends to be more efficient than a WHERE id NOT IN (...) style query.
select *
from
Orders O
left outer join OrderItems I
on I.OrderId = O.Id
where
I.Id is null
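Here is a small runnable sketch of that anti-join, using Python's sqlite3 module with made-up sample rows (only the table and column names come from the query above). It also runs a NOT EXISTS version to show that both forms pick out the same orders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Orders (Id INTEGER PRIMARY KEY);
CREATE TABLE OrderItems (Id INTEGER PRIMARY KEY, OrderId INTEGER);
INSERT INTO Orders VALUES (1), (2), (3);
INSERT INTO OrderItems VALUES (10, 1), (11, 3), (12, 3);
""")

# Anti-join: keep the left rows for which the outer join found no partner.
via_left_join = conn.execute("""
SELECT O.Id
FROM Orders O
LEFT OUTER JOIN OrderItems I ON I.OrderId = O.Id
WHERE I.Id IS NULL
ORDER BY O.Id
""").fetchall()

# Equivalent formulation with NOT EXISTS.
via_not_exists = conn.execute("""
SELECT O.Id
FROM Orders O
WHERE NOT EXISTS (SELECT 1 FROM OrderItems I WHERE I.OrderId = O.Id)
ORDER BY O.Id
""").fetchall()
```

Only order 2 has no items, so both queries return that single row.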
Select * From Orders Where OrderID not in (Select Distinct OrderID From OrderItems)
If your database supports it (for example DB2 for i), try LEFT EXCEPTION JOIN:
select *
from Orders
LEFT EXCEPTION JOIN OrderItems ON ...

What is the most efficient way to write a select statement with a "not in" subquery?

What is the most efficient way to write a select statement similar to the below.
SELECT *
FROM Orders
WHERE Orders.Order_ID not in (Select Order_ID FROM HeldOrders)
The gist is you want the records from one table when the item is not in another table.
For starters, a link to an old article in my blog on how NOT IN predicate works in SQL Server (and in other systems too):
Counting missing rows: SQL Server
You can rewrite it as follows:
SELECT *
FROM Orders o
WHERE NOT EXISTS
(
SELECT NULL
FROM HeldOrders ho
WHERE ho.OrderID = o.OrderID
)
, however, most databases will treat these queries the same.
Both these queries will use some kind of an ANTI JOIN.
This is useful for SQL Server if you want to check two or more columns, since SQL Server does not support this syntax:
SELECT *
FROM Orders o
WHERE (col1, col2) NOT IN
(
SELECT col1, col2
FROM HeldOrders ho
)
Note, however, that NOT IN may be tricky due to the way it treats NULL values.
If HeldOrders.OrderID is nullable, then whenever no match is found and the subquery returns even a single NULL, the whole query will return nothing (both IN and NOT IN will evaluate to NULL in this case).
Consider these data:
Orders:
OrderID
---
1
HeldOrders:
OrderID
---
2
NULL
This query:
SELECT *
FROM Orders o
WHERE OrderID NOT IN
(
SELECT OrderID
FROM HeldOrders ho
)
will return nothing, which is probably not what you'd expect.
However, this one:
SELECT *
FROM Orders o
WHERE NOT EXISTS
(
SELECT NULL
FROM HeldOrders ho
WHERE ho.OrderID = o.OrderID
)
will return the row with OrderID = 1.
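The two results can be reproduced with the sample data above. This sketch uses Python's sqlite3 module purely as a convenient engine; SQLite follows the same three-valued NULL logic here, but it is not the SQL Server setup the answer was written for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Orders (OrderID INTEGER);
CREATE TABLE HeldOrders (OrderID INTEGER);
INSERT INTO Orders VALUES (1);
INSERT INTO HeldOrders VALUES (2), (NULL);
""")

# 1 NOT IN (2, NULL) evaluates to NULL, not TRUE, so the row is filtered out.
not_in_rows = conn.execute("""
SELECT OrderID FROM Orders
WHERE OrderID NOT IN (SELECT OrderID FROM HeldOrders)
""").fetchall()

# NOT EXISTS only asks whether a matching row exists; NULL never matches.
not_exists_rows = conn.execute("""
SELECT o.OrderID FROM Orders o
WHERE NOT EXISTS (SELECT NULL FROM HeldOrders ho WHERE ho.OrderID = o.OrderID)
""").fetchall()
```

Here `not_in_rows` is empty while `not_exists_rows` contains the row with OrderID = 1.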
Note that the LEFT JOIN solutions proposed by others are far from being the most efficient.
This query:
SELECT *
FROM Orders o
LEFT JOIN
HeldOrders ho
ON ho.OrderID = o.OrderID
WHERE ho.OrderID IS NULL
will use a filter condition that needs to evaluate and filter out all matching rows, which can be numerous.
An ANTI JOIN method used by both IN and EXISTS only needs to make sure that a record does not exist once for each row in Orders, so it will eliminate all possible duplicates first:
NESTED LOOPS ANTI JOIN and MERGE ANTI JOIN will just skip the duplicates when evaluating HeldOrders.
A HASH ANTI JOIN will eliminate duplicates when building the hash table.
"Most efficient" is going to be different depending on tables sizes, indexes, and so on. In other words it's going to differ depending on the specific case you're using.
There are three ways I commonly use to accomplish what you want, depending on the situation.
1. Your example works fine if Orders.order_id is indexed, and HeldOrders is fairly small.
2. Another method is the "correlated subquery" which is a slight variation of what you have...
SELECT *
FROM Orders o
WHERE o.Order_ID not in (Select Order_ID
FROM HeldOrders h
where h.order_id = o.order_id)
Note the addition of the where clause. This tends to work better when HeldOrders has a large number of rows. Order_ID needs to be indexed in both tables.
3. Another method I use sometimes is left outer join...
SELECT *
FROM Orders o
left outer join HeldOrders h on h.order_id = o.order_id
where h.order_id is null
When using the left outer join, h.order_id will have a value in it matching o.order_id when there is a matching row. If there isn't a matching row, h.order_id will be NULL. By checking for the NULL values in the where clause you can filter on everything that doesn't have a match.
Each of these variations can work more or less efficiently in various scenarios.
You can use a LEFT OUTER JOIN and check for NULL on the right table.
SELECT O1.*
FROM Orders O1
LEFT OUTER JOIN HeldOrders O2
ON O1.Order_ID = O2.Order_Id
WHERE O2.Order_Id IS NULL
I'm not sure what is the most efficient, but other options are:
1. Use EXISTS
SELECT *
FROM ORDERS O
WHERE NOT EXISTS (SELECT 1
FROM HeldOrders HO
WHERE O.Order_ID = HO.Order_ID)
2. Use EXCEPT
SELECT O.Order_ID
FROM ORDERS O
EXCEPT
SELECT HO.Order_ID
FROM HeldOrders HO
Try
SELECT *
FROM Orders
LEFT JOIN HeldOrders
ON HeldOrders.Order_ID = Orders.Order_ID
WHERE HeldOrders.Order_ID IS NULL