Optimizing aggregate function in Oracle - sql

I have a query for pulling customer information, and I'm adding an max() function to find the most recent order date. Without the aggregate the query takes .23 seconds to run, but with it it takes 12.75 seconds.
Here's the query:
SELECT U.SEQ, MAX(O.ORDER_DATE) FROM CUST_MST U
INNER JOIN ORD_MST O ON U.SEQ = O.CUST_NUM
WHERE U.SEQ = :customerNumber
GROUP BY U.SEQ;
ORD_MST is a table with 890,000 records.
Is there a more efficient way to get this functionality?
EDIT: For the record, there's nothing specifically stopping me from running two queries and joining them in my program. I find it incredibly odd that such a simple query would take this long to run. In this case it is much cleaner/easier to let the database do the joining of information, but it's not the only way for me to get it done.
EDIT 2: As requested, here are the plans for the queries I reference in this question.
With Aggregate
Without Aggregate

the problem with your query is that you join both tables completely, then the max function is executed against the whole result, and at last the where statement filters your rows.
you have improve the join, by just joining the rows with the certain custid instead of the full tables, should look like this:
SELECT U.SEQ, MAX(O.ORDER_DATE) FROM
(SELECT * FROM CUST_MST WHERE SEQ = :customerNumber ) U
INNER JOIN
(SELECT * FROM ORD_MST WHERE CUST_NUM = :customerNumber) O ON U.SEQ = O.CUST_NUM
GROUP BY U.SEQ;
Another option is to use an order by and filter the first rownum. its not rly the clean way. Could be faster, if not you will also need a subselect to not order the full tables. Didnt use oracle for a while but it should look something like this:
SELECT * FROM
(
SELECT U.SEQ, O.ORDER_DATE FROM CUST_MST U
INNER JOIN ORD_MST O ON U.SEQ = O.CUST_NUM
WHERE U.SEQ = :customerNumber
GROUP BY U.SEQ;
ORDER BY O.ORDER_DATE DESC
)
WHERE ROWNUM = 1
Are you forced to use the join for some reason or why dont you select directly from ORD_MST without join?
EDIT
One more idea:
SELECT * FROM
(SELECT CUST_NUM, MAX(ORDER_DATE) FROM ORD_MST WHERE CUST_NUM = :customerNumber GROUP BY CUST_NUM) O
INNER JOIN CUST_MST U ON O.CUST_NUM = U.SEQ
if the inner select just takes one second, then the join should work instant.

Run this commands:
Explain plan for
SELECT U.SEQ, MAX(O.ORDER_DATE) FROM CUST_MST U
INNER JOIN ORD_MST O ON U.SEQ = O.CUST_NUM
WHERE U.SEQ = :customerNumber
GROUP BY U.SEQ;
select * from table( dbms_xplan.display );
and post results here.
Whithout knowing an execution plan we can only guess what really happens.
Btw. my feeling is that adding composite index for ORD_MST table with columns cust_num+order_date could solve the problem (assuming that SEQ is primary key for CUST_MST table and it has already an unique index). Try:
CREATE INDEX idx_name ON ORD_MST( cust_num, order_date );
Also, after creating the index refresh statistics with commands:
EXEC DBMS_STATS.gather_table_stats('your-schema-name', 'CUST_MST');
EXEC DBMS_STATS.gather_table_stats('your-schema-name', 'ORD_MST');
try your query.

Related

Sum fields of an Inner join

How I can add two fields that belong to an inner join?
I have this code:
select
SUM(ACT.NumberOfPlants ) AS NumberOfPlants,
SUM(ACT.NumOfJornales) AS NumberOfJornals
FROM dbo.AGRMastPlanPerformance MPR (NOLOCK)
INNER JOIN GENRegion GR ON (GR.intGENRegionKey = MPR.intGENRegionLink )
INNER JOIN AGRDetPlanPerformance DPR (NOLOCK) ON
(DPR.intAGRMastPlanPerformanceLink =
MPR.intAGRMastPlanPerformanceKey)
INNER JOIN vwGENPredios P โ€‹โ€‹(NOLOCK) ON ( DPR.intGENPredioLink =
P.intGENPredioKey )
INNER JOIN AGRSubActivity SA (NOLOCK) ON (SA.intAGRSubActivityKey =
DPR.intAGRSubActivityLink)
LEFT JOIN (SELECT RA.intGENPredioLink, AR.intAGRActividadLink,
AR.intAGRSubActividadLink, SUM(AR.decNoPlantas) AS
intPlantasTrabajads, SUM(AR.decNoPersonas) AS NumOfJornales,
SUM(AR.decNoPlants) AS NumberOfPlants
FROM AGRRecordActivity RA WITH (NOLOCK)
INNER JOIN AGRActividadRealizada AR WITH (NOLOCK) ON
(AR.intAGRRegistroActividadLink = RA.intAGRRegistroActividadKey AND
AR.bitActivo = 1)
INNER JOIN AGRSubActividad SA (NOLOCK) ON (SA.intAGRSubActividadKey
= AR.intAGRSubActividadLink AND SA.bitEnabled = 1)
WHERE RA.bitActive = 1 AND
AR.bitActive = 1 AND
RA.intAGRTractorsCrewsLink IN(2)
GROUP BY RA.intGENPredioLink,
AR.decNoPersons,
AR.decNoPlants,
AR.intAGRAActivityLink,
AR.intAGRSubActividadLink) ACT ON (ACT.intGENPredioLink IN(
DPR.intGENPredioLink) AND
ACT.intAGRAActivityLink IN( DPR.intAGRAActivityLink) AND
ACT.intAGRSubActivityLink IN( DPR.intAGRSubActivityLink))
WHERE
MPR.intAGRMastPlanPerformanceKey IN(4) AND
DPR.intAGRSubActivityLink IN( 1153)
GROUP BY
P.vchRegion,
ACT.NumberOfFloors,
ACT.NumOfJournals
ORDER BY ACT.NumberOfFloors DESC
However, it does not perform the complete sum. It only retrieves all the values โ€‹โ€‹of the columns and adds them 1 by 1, instead of doing the complete sum of the whole column.
For example, the query returns these results:
What I expect is the final sums. In NumberOfPlants the result of the sum would be 163,237 and of NumberJornales would be 61.
How can I do this?
First of all the (nolock) hints are probably not accomplishing the benefit you hope for. It's not an automatic "go faster" option, and if such an option existed you can be sure it would be already enabled. It can help in some situations, but the way it works allows the possibility of reading stale data, and the situations where it's likely to make any improvement are the same situations where risk for stale data is the highest.
That out of the way, with that much code in the question we're better served with a general explanation and solution for you to adapt.
The issue here is GROUP BY. When you use a GROUP BY in SQL, you're telling the database you want to see separate results per group for any aggregate functions like SUM() (and COUNT(), AVG(), MAX(), etc).
So if you have this:
SELECT Sum(ColumnB) As SumB
FROM [Table]
GROUP BY ColumnA
You will get a separate row per ColumnA group, even though it's not in the SELECT list.
If you don't really care about that, you can do one of two things:
Remove the GROUP BY If there are no grouped columns in the SELECT list, the GROUP BY clause is probably not accomplishing anything important.
Nest the query
If option 1 is somehow not possible (say, the original is actually a view) you could do this:
SELECT SUM(SumB)
FROM (
SELECT Sum(ColumnB) As SumB
FROM [Table]
GROUP BY ColumnA
) t
Note in both cases any JOIN is irrelevant to the issue.

Restricting inner query with outer query atttribute

I currently have a large SQL query (not mine) which I need to modify. I have a transaction and valuation table. The transaction has a one-to-many relationship with valuations. The two tables are being joined via a foreign key.
I've been asked to prevent any transactions (along with their subsequent valuations) from being returned if no valuations for a transaction exist past a certain date. The way I thought I would achieve this would be to use an inner query, but I need to make the inner query aware of the outer query and the transaction. So something like:
SELECT * FROM TRANSACTION_TABLE T
INNER JOIN VALUATION_TABLE V WHERE T.VAL_FK = V.ID
WHERE (SELECT COUNT(*) FROM V WHERE V.DATE > <GIVEN DATE>) > 1
Obviously the above wouldn't work as the inner query is separate and I can't reference the outer query V reference from the inner. How would I go about doing this, or is there a simpler way?
This would just be the case of setting the WHERE V.DATE > in the outer query as I want to prevent any valuation for a given transaction if ANY of them exceed a specified date, not just the ones that do.
Many thanks for any help you can offer.
You may looking for this
SELECT *
FROM TRANSACTION_TABLE T
INNER JOIN VALUATION_TABLE V1 ON T.VAL_FK = V1.ID
WHERE (SELECT COUNT(*)
FROM VALUATION_TABLE V2
WHERE V2.ID = V1.ID AND V2.DATE > <GIVEN DATE>) > 1
SELECT *
FROM TRANSACTION_TABLE T
INNER JOIN VALUATION_TABLE V1 ON T.VAL_FK = V.ID
WHERE V.ID IN ( SELECT ID
FROM VALUATION_TABLE
WHERE DATE > <GIVEN DATE>
)
If execution time is important, you may want to test the various solutions on your actual data and see which works best in your situation.

SQL pagination using INNER JOINs and filtering with LIKE

This query feeds a data table with sorting, filtering, and pagination. All features worked fine until I added the INNER JOIN and then i got:
The multi-part 'identifier "Types.Description" could not be bound
if i remove the second WHERE clause at the end of the query the LIKE statements work, but i lose pagination. I removed some of the LIKE clauses to try and clean up this monstrous query.
SELECT *
FROM (
SELECT ROW_NUMBER() OVER (ORDER BY TAG asc) AS RowNumber, *
FROM (
SELECT (SELECT COUNT(*) FROM Instruments) AS TotalDisplayRows, (SELECT COUNT(*) FROM Instruments) AS TotalRows, Instruments.Tag, Instruments.Location, Instruments.Description, Types.Description As TypeDesc, Manufacturer.Name, Lease.Name as LeaseName, Facility.Name as FacName
FROM Instruments
INNER JOIN Types ON Instruments.Type = Types.ID
INNER JOIN Manufacturer ON Instruments.Manufacturer = Manufacturer.ID
INNER JOIN Facility ON Instruments.Facility = Facility.ID
INNER JOIN Lease ON Instruments.Lease = Lease.ID
WHERE (Types.Description LIKE '%Cat%')
) RawResults
) Results
WHERE (Types.Description LIKE '%Cat%') AND RowNumber BETWEEN 1 AND 10
I think this is your problem
WHERE (types.description LIKE '%Cat%')
You can't do this because you are actually selecting from your derived table named Results and you aliased the column as TypeDesc.
So it should be
WHERE (results.typeDesc LIKE '%Cat%')

How to optimize the query? t-sql

This query works about 3 minutes and returns 7279 rows:
SELECT identity(int,1,1) as id, c.client_code, a.account_num,
c.client_short_name, u.uso, us.fio, null as new, null as txt
INTO #ttable
FROM accounts a INNER JOIN Clients c ON
c.id = a.client_id INNER JOIN Uso u ON c.uso_id = u.uso_id INNER JOIN
Magazin m ON a.account_id = m.account_id LEFT JOIN Users us ON
m.user_id = us.user_id
WHERE m.status_id IN ('1','5','9') AND m.account_new_num is null
AND u.branch_id = #branch_id
ORDER BY c.client_code;
The type of 'client_code' field is VARCHAR(6).
Is it possible to somehow optimize this query?
Insert the records in the Temporary table without using Order by Clause and then Sort them using the c.client_code. Hope it should help you.
Create table #temp
(
your columns...
)
and Insert the records in this table Without Using the Order by Clause. Now run the select with Order by Clause
Do you have indexes set up for your tables? An index on foreign key columns as well as Magazin.status might help.
Make sure there is an index on every field used in the JOINs and in the WHERE clause
If one or the tables you select from are actually views, the problem may be in the performance of these views.
Always try to list tables earlier if they are referenced in the where clause - it cuts off row combinations as early as possible. In this case, the Magazin table has some predicates in the where clause, but is listed way down in the tables list. This means that all the other joins have to be made before the Magazin rows can be filtered - possibly millions of extra rows.
Try this (and let us know how it went):
SELECT ...
INTO #ttable
FROM accounts a
INNER JOIN Magazin m ON a.account_id = m.account_id
INNER JOIN Clients c ON c.id = a.client_id
INNER JOIN Uso u ON c.uso_id = u.uso_id
LEFT JOIN Users us ON m.user_id = us.user_id
WHERE m.status_id IN ('1','5','9')
AND m.account_new_num is null
AND u.branch_id = #branch_id
ORDER BY c.client_code;
This kind of optimization can greatly improve query performance.

NHibernate HQL SELECT TOP in sub query

Is there a way of using SetMaxResult() on a sub query? Im writing a query to return all the order items belonging to the most recent order. So I need to limit the number of records on the sub query.
The equivalent sql looks something like:
SELECT i.*
FROM tbl_Orders o
JOIN tbl_OrderItems i on i.OrderId = o.Id
WHERE
o.Id in (SELECT TOP 1 o.Id FROM tbl_Orders o orderby o.Date desc)
Im using hql specifically because criteria api doesnt let you project another domain object (Im querying on orders but want to return order items)
I know that hql doesnt accept "SELECT TOP", but if I use SetMaxResult() it will apply to the outer query, not the subquery.
Any ideas?
From NHibernate 3.2 you could use SKIP n / TAKE n in hql at the end of the query.
You query will be:
SELECT i.*
FROM tbl_Orders o
JOIN tbl_OrderItems i on i.OrderId = o.Id
WHERE
o.Id in (SELECT o.Id FROM tbl_Orders o orderby o.Date desc take 1)
Just query the orders (and use SetMaxResult) and do a 'fetch join' to ensure all orderitems for the selected orders are loaded straight away.
On the returned orders you can then access the order items without this resulting in a new SQL statement being sent to the database.
I encountered this problem too, but didn't found a solution using HQL...
Subqueries with top would be very nice, since this is faster then doing a full join first. When doing a full join first, the SQL Servers join the table first, sort all rows and select the top 30 then. With the subselect, the top 30 column of one table are taken and then joined with the other table. This is much faster!
My query with Subselect takes about 1 second, the one with the join and sort takes 15 seconds! So join wasn't an option.
I ended up with two queries, first the subselect:
IQuery q1 = session.CreateQuery("select id from table1 order by id desc");
q1.SetMaxResults(100);
And then the second query
IQuery q2 = session.CreateQuery("select colone, coltwo from table2 where table1id in (:subselect)");
q2.SetParameterList("subselect", q1.List());