Restricting inner query with outer query attribute - SQL

I currently have a large SQL query (not mine) which I need to modify. I have a transaction table and a valuation table. A transaction has a one-to-many relationship with valuations, and the two tables are joined via a foreign key.
I've been asked to prevent any transactions (along with their associated valuations) from being returned if no valuations for a transaction exist past a certain date. The way I thought I would achieve this would be to use an inner query, but I need to make the inner query aware of the outer query and its transaction. So something like:
SELECT * FROM TRANSACTION_TABLE T
INNER JOIN VALUATION_TABLE V ON T.VAL_FK = V.ID
WHERE (SELECT COUNT(*) FROM V WHERE V.DATE > <GIVEN DATE>) > 1
Obviously the above wouldn't work, as the inner query is separate and I can't reference the outer query's V alias from the inner one. How would I go about doing this, or is there a simpler way?
This wouldn't just be a case of setting WHERE V.DATE > <GIVEN DATE> in the outer query, as I want to return or exclude all the valuations for a given transaction based on whether ANY of them exceed the specified date, not just the ones that do.
Many thanks for any help you can offer.

You may be looking for this:
SELECT *
FROM TRANSACTION_TABLE T
INNER JOIN VALUATION_TABLE V1 ON T.VAL_FK = V1.ID
WHERE (SELECT COUNT(*)
       FROM VALUATION_TABLE V2
       WHERE V2.ID = V1.ID AND V2.DATE > <GIVEN DATE>) > 1

SELECT *
FROM TRANSACTION_TABLE T
INNER JOIN VALUATION_TABLE V1 ON T.VAL_FK = V1.ID
WHERE V1.ID IN (SELECT ID
                FROM VALUATION_TABLE
                WHERE DATE > <GIVEN DATE>)
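If it helps, another common way to express "keep the parent row when at least one related row qualifies" is a correlated EXISTS. This is only a sketch against the table and column names given in the question, so adjust it to the real schema:
-- Keeps a transaction (and all of its valuations) as soon as at least one
-- of its valuations falls after the given date.
SELECT *
FROM TRANSACTION_TABLE T
INNER JOIN VALUATION_TABLE V ON T.VAL_FK = V.ID
WHERE EXISTS (SELECT 1
              FROM VALUATION_TABLE V2
              WHERE V2.ID = T.VAL_FK
                AND V2.DATE > <GIVEN DATE>)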
If execution time is important, you may want to test the various solutions on your actual data and see which works best in your situation.


Sum fields of an Inner join

How can I add two fields that belong to an inner join?
I have this code:
select
SUM(ACT.NumberOfPlants ) AS NumberOfPlants,
SUM(ACT.NumOfJornales) AS NumberOfJornals
FROM dbo.AGRMastPlanPerformance MPR (NOLOCK)
INNER JOIN GENRegion GR ON (GR.intGENRegionKey = MPR.intGENRegionLink )
INNER JOIN AGRDetPlanPerformance DPR (NOLOCK) ON
(DPR.intAGRMastPlanPerformanceLink =
MPR.intAGRMastPlanPerformanceKey)
INNER JOIN vwGENPredios P (NOLOCK) ON ( DPR.intGENPredioLink =
P.intGENPredioKey )
INNER JOIN AGRSubActivity SA (NOLOCK) ON (SA.intAGRSubActivityKey =
DPR.intAGRSubActivityLink)
LEFT JOIN (SELECT RA.intGENPredioLink, AR.intAGRActividadLink,
AR.intAGRSubActividadLink, SUM(AR.decNoPlantas) AS
intPlantasTrabajads, SUM(AR.decNoPersonas) AS NumOfJornales,
SUM(AR.decNoPlants) AS NumberOfPlants
FROM AGRRecordActivity RA WITH (NOLOCK)
INNER JOIN AGRActividadRealizada AR WITH (NOLOCK) ON
(AR.intAGRRegistroActividadLink = RA.intAGRRegistroActividadKey AND
AR.bitActivo = 1)
INNER JOIN AGRSubActividad SA (NOLOCK) ON (SA.intAGRSubActividadKey
= AR.intAGRSubActividadLink AND SA.bitEnabled = 1)
WHERE RA.bitActive = 1 AND
AR.bitActive = 1 AND
RA.intAGRTractorsCrewsLink IN(2)
GROUP BY RA.intGENPredioLink,
AR.decNoPersons,
AR.decNoPlants,
AR.intAGRAActivityLink,
AR.intAGRSubActividadLink) ACT ON (ACT.intGENPredioLink IN(
DPR.intGENPredioLink) AND
ACT.intAGRAActivityLink IN( DPR.intAGRAActivityLink) AND
ACT.intAGRSubActivityLink IN( DPR.intAGRSubActivityLink))
WHERE
MPR.intAGRMastPlanPerformanceKey IN(4) AND
DPR.intAGRSubActivityLink IN( 1153)
GROUP BY
P.vchRegion,
ACT.NumberOfPlants,
ACT.NumOfJornales
ORDER BY ACT.NumberOfPlants DESC
However, it does not produce the complete sum. It returns the individual column values and adds them up row by row, instead of computing a single total for the whole column.
For example, instead of one row with the totals, the query returns a set of partial results.
What I expect is the final sums: for NumberOfPlants the total would be 163,237 and for NumberOfJornals it would be 61.
How can I do this?
First of all, the (NOLOCK) hints are probably not providing the benefit you hope for. It's not an automatic "go faster" option, and if such an option existed you can be sure it would already be enabled. It can help in some situations, but the way it works allows the possibility of reading stale data, and the situations where it's likely to make any improvement are the same situations where the risk of stale data is highest.
With that out of the way: with that much code in the question, we're better served by a general explanation and a solution for you to adapt.
The issue here is GROUP BY. When you use a GROUP BY in SQL, you're telling the database you want to see separate results per group for any aggregate functions like SUM() (and COUNT(), AVG(), MAX(), etc).
So if you have this:
SELECT Sum(ColumnB) As SumB
FROM [Table]
GROUP BY ColumnA
You will get a separate row per ColumnA group, even though it's not in the SELECT list.
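A quick way to see the difference is a throwaway example (a hypothetical #Demo temp table, SQL Server syntax):
-- Set up three sample rows.
CREATE TABLE #Demo (ColumnA varchar(1), ColumnB int);
INSERT INTO #Demo VALUES ('A', 10), ('A', 20), ('B', 30);

SELECT SUM(ColumnB) AS SumB FROM #Demo;                    -- one row: 60
SELECT SUM(ColumnB) AS SumB FROM #Demo GROUP BY ColumnA;   -- two rows: 30 and 30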
If you don't really care about that, you can do one of two things:
Option 1: Remove the GROUP BY. If there are no grouped columns in the SELECT list, the GROUP BY clause is probably not accomplishing anything important.
Option 2: Nest the query. If option 1 is somehow not possible (say, the original is actually a view), you could do this:
SELECT SUM(SumB)
FROM (
    SELECT Sum(ColumnB) As SumB
    FROM [Table]
    GROUP BY ColumnA
) t
Note in both cases any JOIN is irrelevant to the issue.
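Applied to the query in the question, option 2 would look roughly like this (a sketch: the outer SELECT keeps only the two totals, and the inner query is the original statement unchanged except that its ORDER BY is dropped, since a derived table cannot be ordered):
SELECT SUM(t.NumberOfPlants) AS NumberOfPlants,
       SUM(t.NumberOfJornals) AS NumberOfJornals
FROM (
    -- ... the original SELECT ... FROM ... GROUP BY query goes here, minus the ORDER BY ...
) t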

SQL Query to count the records

I am writing a SQL query which will get all the transaction types from one table and, from the other table, count the frequency of each transaction type.
My query is this:
with CTE as
(
    select a.trxType, a.created, b.transaction_key, b.description, a.mode
    FROM transaction_data AS a with (nolock)
    RIGHT JOIN transaction_types b with (nolock) ON b.transaction_key = a.trxType
)
SELECT COUNT(trxType) AS Frequency, description AS trxType, mode
from CTE
where created >= '2017-04-11' and created <= '2018-04-13'
group by trxType, description, mode
The transaction_types table contains all the types of transactions only and transaction_data contains the transactions which have occurred.
The problem I am facing is that even though it's a RIGHT join, it does not select all the records from the transaction_types table.
I need to select all the transactions from the transaction_types table and show the number of counts for each transaction, even if it's 0.
Please help.
LEFT JOIN is so much easier to follow.
I think you want:
select tt.transaction_key, tt.description, t.mode, count(t.trxType)
from transaction_types tt left join
     transaction_data t
     on tt.transaction_key = t.trxType and
        t.created >= '2017-04-11' and t.created <= '2018-04-13'
group by tt.transaction_key, tt.description, t.mode;
Notes:
Use reasonable table aliases! a and b mean nothing. t and tt are abbreviations of the table name, so they are easier to follow.
t.mode will be NULL for non-matching rows.
The condition on the dates needs to be in the ON clause; otherwise, the outer join is turned into an inner join (see the contrast sketch after these notes).
LEFT JOIN is easier to follow (at least for people whose native language reads left-to-right) because it means "keep all the rows in the table you have already read".
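For contrast, here is a sketch of the variant to avoid: with the date filter moved into the WHERE clause, the NULL-extended rows are filtered out, so transaction types with no transactions in the date range disappear instead of showing a count of 0.
select tt.transaction_key, tt.description, t.mode, count(t.trxType)
from transaction_types tt left join
     transaction_data t
     on tt.transaction_key = t.trxType
where t.created >= '2017-04-11' and t.created <= '2018-04-13'   -- turns the outer join into an inner join
group by tt.transaction_key, tt.description, t.mode;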

Oracle SQL: Is it more efficient to use a WHERE clause in a subquery or after the join?

I wanted to know which would be more efficient and why:
example 1:
SELECT a.CUSTOMER_KEY, a.LAST_NAME, b.TRASACTION_AMT
FROM CUSTOMER_TABLE a
LEFT JOIN TRANSACTION_TABLE b
ON a.CUSTOMER_KEY = b.CUSTOMER_KEY
WHERE b.DATE_TRANSACTION > 20150101 AND a.CUSTOMER_ACTIVE_FLAG = 'Y';
or example 2:
SELECT a.CUSTOMER_KEY, a.LAST_NAME, b.TRASACTION_AMT
FROM
(SELECT *
FROM CUSTOMER_TABLE
WHERE CUSTOMER_ACTIVE_FLAG = 'Y') a
LEFT JOIN
(SELECT *
FROM TRANSACTION_TABLE
WHERE DATE_TRANSACTION > 20150101) b
ON a.CUSTOMER_KEY = b.CUSTOMER_KEY
For instance, would example 2 be better optimized because it filters out the records not satisfying the WHERE clause first?
(NOTE: the query joins customer information with transaction information based on the customer key. The customer key is unique to the customer table. Both queries produce equivalent output.)
The correct equivalent query, without the subqueries, is:
SELECT a.CUSTOMER_KEY, a.LAST_NAME, b.TRASACTION_AMT
FROM CUSTOMER_TABLE a LEFT JOIN
TRANSACTION_TABLE b
ON a.CUSTOMER_KEY = b.CUSTOMER_KEY AND b.DATE_TRANSACTION > 20150101
WHERE a.CUSTOMER_ACTIVE_FLAG = 'Y';
The condition on the second table goes in the ON clause.
The best way to know is to look at the execution plans and run-times for the two queries. I would expect the equivalent versions to have the same execution plan. Oracle has a smart optimizer and should optimize away the subqueries. However, it might miss a particular case or two, which is why you should check on your own queries.
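To check this yourself, you can generate the plan for each variant with EXPLAIN PLAN and DBMS_XPLAN (sketched here for the ON-clause version; repeat the same steps for examples 1 and 2):
EXPLAIN PLAN FOR
SELECT a.CUSTOMER_KEY, a.LAST_NAME, b.TRASACTION_AMT
FROM CUSTOMER_TABLE a
LEFT JOIN TRANSACTION_TABLE b
  ON a.CUSTOMER_KEY = b.CUSTOMER_KEY AND b.DATE_TRANSACTION > 20150101
WHERE a.CUSTOMER_ACTIVE_FLAG = 'Y';

SELECT * FROM TABLE(dbms_xplan.display);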

How to do an inequality join in Apache Drill?

I am trying to run a query in Drill that requires inequality joins (such as ‘on a.event_time >= b.event_time and a.event_time < b.next_event_time’). I am getting the error that Drill does not support inequality joins, and that is also what I am reading online.
Are there any workarounds in Drill to get the same results without using an inequality join? All I can think of is expanding one of my tables to include duplicate rows for every iteration of the field I am trying to join on, but I am guessing there is a more straightforward way Drill users get around this.
I guess you are trying something like this:
SELECT *
FROM Table1
JOIN Table2
ON Table1.time > Table2.time
Can you try this instead?
SELECT *
FROM Table1, Table2
WHERE Table1.time > Table2.time
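Applied to the condition from the question, that rewrite would look roughly like this (the table names here are placeholders, since the question only shows the aliases a and b, and whether Drill accepts the inequality predicates in the WHERE clause can depend on the Drill version):
SELECT *
FROM table_a a, table_b b            -- hypothetical table names
WHERE a.event_time >= b.event_time
  AND a.event_time < b.next_event_time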
This is hacky, but I was able to get it to work by duplicating the logic of the join inside the WHERE clause and then adding an OR with the opposite of the join condition.
So for example, if you want to do this:
SELECT *
FROM ORDERS as Ord
LEFT JOIN Customers as Cus
  ON Cus.CustomerID = Ord.CustomerID
  AND Cus.CustomerType <> 'Employee'
You can do this:
SELECT *
FROM ORDERS as Ord
LEFT JOIN Customers as Cus
  ON Cus.CustomerID = Ord.CustomerID
WHERE ((Cus.CustomerID = Ord.CustomerID AND Cus.CustomerType <> 'Employee')
       OR (Cus.CustomerID <> Ord.CustomerID))

Optimizing aggregate function in Oracle

I have a query for pulling customer information, and I'm adding a MAX() function to find the most recent order date. Without the aggregate the query takes 0.23 seconds to run, but with it, it takes 12.75 seconds.
Here's the query:
SELECT U.SEQ, MAX(O.ORDER_DATE) FROM CUST_MST U
INNER JOIN ORD_MST O ON U.SEQ = O.CUST_NUM
WHERE U.SEQ = :customerNumber
GROUP BY U.SEQ;
ORD_MST is a table with 890,000 records.
Is there a more efficient way to get this functionality?
EDIT: For the record, there's nothing specifically stopping me from running two queries and joining them in my program. I find it incredibly odd that such a simple query would take this long to run. In this case it is much cleaner/easier to let the database do the joining of information, but it's not the only way for me to get it done.
EDIT 2: As requested, here are the plans for the queries I reference in this question.
With Aggregate
Without Aggregate
The problem with your query is that you join both tables completely, then the MAX function is executed against the whole result, and only at the end does the WHERE statement filter your rows.
You can improve the join by joining only the rows for the given customer number instead of the full tables. It should look like this:
SELECT U.SEQ, MAX(O.ORDER_DATE) FROM
(SELECT * FROM CUST_MST WHERE SEQ = :customerNumber ) U
INNER JOIN
(SELECT * FROM ORD_MST WHERE CUST_NUM = :customerNumber) O ON U.SEQ = O.CUST_NUM
GROUP BY U.SEQ;
Another option is to use an ORDER BY and filter on the first ROWNUM. It's not really the cleanest way, but it could be faster; if it isn't, you will also need a subselect so that you don't order the full tables. I haven't used Oracle for a while, but it should look something like this:
SELECT * FROM
(
    SELECT U.SEQ, O.ORDER_DATE FROM CUST_MST U
    INNER JOIN ORD_MST O ON U.SEQ = O.CUST_NUM
    WHERE U.SEQ = :customerNumber
    ORDER BY O.ORDER_DATE DESC
)
WHERE ROWNUM = 1
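As a side note (this assumes you are on Oracle 12c or later, which the question does not say), the same "latest order for one customer" lookup can be written without the ROWNUM wrapper by using FETCH FIRST:
SELECT U.SEQ, O.ORDER_DATE
FROM CUST_MST U
INNER JOIN ORD_MST O ON U.SEQ = O.CUST_NUM
WHERE U.SEQ = :customerNumber
ORDER BY O.ORDER_DATE DESC
FETCH FIRST 1 ROW ONLY;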
Are you forced to use the join for some reason, or why don't you select directly from ORD_MST without the join?
EDIT
One more idea:
SELECT * FROM
(SELECT CUST_NUM, MAX(ORDER_DATE) FROM ORD_MST WHERE CUST_NUM = :customerNumber GROUP BY CUST_NUM) O
INNER JOIN CUST_MST U ON O.CUST_NUM = U.SEQ
If the inner select is fast on its own, then the join should be close to instant.
Run these commands:
Explain plan for
SELECT U.SEQ, MAX(O.ORDER_DATE) FROM CUST_MST U
INNER JOIN ORD_MST O ON U.SEQ = O.CUST_NUM
WHERE U.SEQ = :customerNumber
GROUP BY U.SEQ;
select * from table( dbms_xplan.display );
and post results here.
Without knowing the execution plan, we can only guess at what really happens.
By the way, my feeling is that adding a composite index on the ORD_MST table with the columns cust_num + order_date could solve the problem (assuming that SEQ is the primary key of the CUST_MST table and already has a unique index). Try:
CREATE INDEX idx_name ON ORD_MST( cust_num, order_date );
Also, after creating the index, refresh the statistics with these commands:
EXEC DBMS_STATS.gather_table_stats('your-schema-name', 'CUST_MST');
EXEC DBMS_STATS.gather_table_stats('your-schema-name', 'ORD_MST');
and then try your query again.