Tuning SQL query: subquery with aggregate function on the same table - sql

The following query takes approximately 30 seconds to return results.
table1 contains ~20 million rows
table2 contains ~10,000 rows
I'm trying to find a way to improve performance. Any ideas?
declare @PreviousMonthDate datetime
select @PreviousMonthDate = (SELECT DATEADD(MONTH, DATEDIFF(MONTH, '19000101', GETDATE()) - 1, '19000101') as [PreviousMonthDate])
select distinct(t1.code), t1.ent, t3.lib, t3.typ
from table1 t1, table2 t3
where (select min(t2.dat) from table1 t2 where t2.code = t1.code) > @PreviousMonthDate
and t1.ent in ('XXX')
and t1.code = t3.cod
and t1.dat > @PreviousMonthDate
Thanks

This is your query, more sensibly written:
select t1.code, t1.ent, t2.lib, t2.typ
from table1 t1 join
table2 t2
on t1.code = t2.cod
where not exists (select 1
from table1 tt1
where tt1.code = t1.code and
tt1.dat <= @PreviousMonthDate
) and
t1.ent = 'XXX' and
t1.dat > @PreviousMonthDate;
For this query, you want the following indexes:
table1(ent, dat, code) -- for the where
table1(code, dat) -- for the subquery
table2(cod, lib, typ) -- for the join
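As a sketch, the corresponding CREATE INDEX statements could look like this (assuming SQL Server; the index names are just illustrative):
-- composite index covering the WHERE clause on table1
create index ix_table1_ent_dat_code on table1 (ent, dat, code);
-- composite index for the correlated subquery on table1
create index ix_table1_code_dat on table1 (code, dat);
-- composite index covering the join and the selected columns of table2
create index ix_table2_cod_lib_typ on table2 (cod, lib, typ);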
Notes:
Table aliases should make sense. t3 for table2 is cognitively dissonant, even though I know these are made up names.
not exists (especially with the right indexes) should be faster than the aggregation subquery.
The indexes will satisfy the where clause, reducing the data needed for filtering.
select distinct applies to the whole select list. distinct is not a function, so the parentheses around t1.code do nothing.
Never use comma in the FROM clause. Always use proper, explicit, standard JOIN syntax.


How to join two SQL tables by extracting maximum numbers from one then into another?

As others have commented, I'm now going to add some code:
Imported tables
table3
Case No. is the primary key. Each report date shows one patient. Depending on whether the patient is imported or local, the corresponding cumulative column increases. You can see that some days have no cases, so a date like 25/01/2020 is skipped.
table2
Report date has no duplicates.
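Roughly, the tables look like this (a sketch only; the column names are taken from the query below, the data types are assumptions, and table2 may well have more columns):
CREATE TABLE table3 (
  `Case No.` INT PRIMARY KEY,
  `Report date` DATE,
  cumulative_local INT,
  cumulative_import INT
);

CREATE TABLE table2 (
  `Report date` DATE PRIMARY KEY  -- no duplicate report dates
);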
Now, I want to join the tables. Example outcome:
The maximum cumulative value of each date is joined into the new table. So although 26/01/2020 in table3 shows the cumulative increasing from 6 to 7 to 8, I only want the highest number (8) there.
Thanks for letting me know how my previous query could be improved. Your opinion helps me a lot.
I have tried Gordon Linoff's answer, substituting the actual names (which I initially omitted because I thought they were ambiguous).
His code is as follows (I've upvoted):
SELECT t3.`Report date`,
max(max(t3.cumulative_local)) over (order by t3.`Report date`),
max(max(t3.cumulative_import)) over (order by t3.`Report date`)
from table3 t3 left join
table2 t2
using (`Report date`)
group by t2.`Report date`;
But I got an error
Error Code: 1055. Expression #1 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'new.t3.Report date' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by
Anyway, I am now experimenting. Both answers helped. If you know how to fix error 1055, or if you could propose another solution, let me know. Thanks
I think you just want aggregation and window functions:
select t1.date,
max(max(cumulativea)) over (order by t1.date),
max(max(cumulativeb)) over (order by t1.date)
from table1 t1 left join
table2 t2
on t1.date = t2.date
group by t1.date;
This returns the maximum values of the two columns up to each date, which is, I think, what you are trying to describe.
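If the earlier attempt keeps raising error 1055, one hedged guess at a fix (assuming MySQL's only_full_group_by mode) is to GROUP BY the same t3.`Report date` column that appears ungrouped in the SELECT list:
SELECT t3.`Report date`,
       max(max(t3.cumulative_local)) over (order by t3.`Report date`),
       max(max(t3.cumulative_import)) over (order by t3.`Report date`)
from table3 t3 left join
     table2 t2
     using (`Report date`)
group by t3.`Report date`;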
I don't understand why you have cumulA and cumulB on table1. I suppose it is to store the max cumulA and cumulB for each day.
You must first self-join table2 to find the max for each date (with a GROUP BY date):
SELECT t2.id, t2.date, td.cA
FROM t2
JOIN (
    SELECT date AS d2, MAX(cumulA) AS cA
    FROM t2
    GROUP BY d2
) AS td
ON t2.date = td.d2
AND t2.cumulA = td.cA
ORDER BY t2.date
Then, you LEFT JOIN table1 onto the result of the self-joined table2 so that every day is present.
SELECT * FROM `t1` LEFT JOIN t2 ON t1.date = t2.date ORDER BY t1.date
Here is the combination of the two joins:
SELECT * FROM `t1` LEFT JOIN (
    SELECT t2.id, t2.date, td.cA
    FROM t2
    JOIN (
        SELECT date AS d2, MAX(cumulA) AS cA
        FROM t2
        GROUP BY d2
    ) AS td
    ON t2.date = td.d2
    AND t2.cumulA = td.cA
) AS tt
ON t1.date = tt.date ORDER BY t1.date
You do the same for cumulB.
And afterwards (I suppose), you INSERT the result into table1.
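For example, a rough sketch of that last step (assuming t1 has date and cumulA columns and you only want the per-day maximum; adapt the names to your schema):
INSERT INTO t1 (date, cumulA)
SELECT td.d2, td.cA
FROM (
    -- per-day maximum of cumulA, as in the self-join above
    SELECT date AS d2, MAX(cumulA) AS cA
    FROM t2
    GROUP BY date
) AS td;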
I hope this answers your question.
Good luck.
_Teddy_

Need help with a PostgreSQL query

I have two tables, table1 and table2. I have written a query with some conditions as follows:
Select t2.employee_id,
t2.adddate,
t2.previousaleave
from table2 as t2, table1 as t1
WHERE t1.enddate IS NULL
OR t1.enddate>t2.adddate
AND t2.adddate<=now()
AND t2.leavetype='annualleave'
If I run this, the conditions are not working. It selects all the empids of table t2. I checked that the problem is with the t1.enddate IS NULL condition, since the enddate column can be either
some date
or null
I need to get the empid if t1.enddate IS NULL and the other conditions succeed. Here leavetype is distinct within each empid (each employee has only one row for annualleave). Is there any other, alternative way to do this?
You always need parentheses when mixing OR with AND.
For example:
Select t2.employee_id,t2.adddate,t2.previousaleave
from table2 as t2
Inner join table1 as t1 ON ............. ?????
WHERE (t1.enddate IS NULL OR t1.enddate>t2.adddate AND t2.adddate<=now())
AND t2.leavetype='annualleave'
BUT ALSO NOTE you don't appear to have joined the tables. Always use explicit ANSI join syntax to avoid creating an accidental Cartesian product of the rows from the tables.
If empid exists in both tables, try:
Select t2.employee_id,t2.adddate,t2.previousaleave
from table2 as t2
Inner join table1 as t1 ON t2.empid = t1.empid
WHERE (t1.enddate IS NULL OR t1.enddate>t2.adddate AND t2.adddate<=now())
AND t2.leavetype='annualleave'
SELECT t2.employee_id, t2.adddate, t2.previousaleave
FROM table2 AS t2
INNER JOIN table1 as t1
ON (
( t1.enddate IS NULL OR t1.enddate > t2.adddate )
AND t2.adddate <= now()
AND t2.leavetype = 'annualleave'
)
Is it possible for t2.adddate to be > now()? That seems awkward.

Multiple Indexes or Single One in a comparison between two large tables

I'm going to compare two tables on Oracle with about 10 million records in each one.
t1 (anumber, bnumber, cdate, ctime, duration)
t2 (fcode, anumber, bnumber, mdate, mtime, odate, otime, duration)
Rows in these tables hold information about calls from one number to another for a specific month (August 2012).
For example (12345,9876,120821,120000,68) indicates a call from anumber=12345 to bnumber=9876 on date=2012/08/21 at time=12:00:00 which lasted 68 seconds.
I want to find records that don't exist in one of these tables but exist in the other. My comparison query is like this:
select t1.*
from table1 t1
where not exists(select t1.* from table2 t2
where t1.anumber = t2.anumber
and t1.cdate = t2.mdate
and t1.duration = t2.duration);
and my questions are:
Which kind of index is better to use: a composite index on the columns (anumber, cdate, duration), or a single index on each of them?
Considering that the third column is the duration of a call, which can take a wide range of values, is it worth creating an index on it? Won't it slow down my query?
What is the fastest way to find the differences between these tables?
Is it better to loop through the dates and execute my query with (cdate = 'a date in the month') added to the where clause?
Compared to the above query, how much slower is this one:
select t1.*
from table1 t1
where not exists (select t1.*
from table2 t2
where t1.anumber = t2.anumber
and t1.bnumber like '%t2.bnumber%'
and t1.cdate = t2.mdate
and t1.duration = t2.duration);
select * from t1
minus
select * from t2
Don't use indexes; you want to scan all 10 million rows in both tables, so a full table scan (TABLE ACCESS FULL) is preferable in this case.
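Note that the MINUS above only reports rows that are in t1 but missing from t2; to see differences in both directions, a rough sketch (projecting just the compared columns, since the two tables have different column lists) could be:
-- rows present in only one of the two tables, compared on the common columns
(select anumber, cdate, duration from t1
 minus
 select anumber, mdate, duration from t2)
union all
(select anumber, mdate, duration from t2
 minus
 select anumber, cdate, duration from t1);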
Try this way:
select t1.*
from table1 t1
where (t1.anumber, t1.cdate, t1.duration) not in (select t2.anumber, t2.mdate, t2.duration
                                                  from table2 t2);
Check the explain plan; if it is good then don't create indexes, or create one like this:
create index idx_anum_dat_dur on table2(anumber, mdate, duration)
Query performance depends on what the query returns; you must look at the explain plan and try different variants.
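For example, a minimal sketch of getting that explain plan in Oracle (assuming you can run EXPLAIN PLAN and query DBMS_XPLAN):
explain plan for
select t1.*
from table1 t1
where (t1.anumber, t1.cdate, t1.duration) not in (select t2.anumber, t2.mdate, t2.duration
                                                  from table2 t2);

-- display the plan that was just captured
select * from table(dbms_xplan.display);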
Your query is equivalent to:
select t1.*
from table1 t1
WHERE t1.bnumber like '%t2.bnumber%' -- << like 'Literal' !!!
AND NOT EXISTS (
select *
from table2 t2
where t2.anumber = t1.anumber
and t2.mdate = t1.cdate
and t2.duration = t1.duration
);
(table references inside quotes are not expanded! Is this the OP's intention ??? )
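If the intent was to treat t2.bnumber as a substring of t1.bnumber, a hedged guess at how that condition would actually have to be written (concatenating the column into the pattern instead of quoting it):
select t1.*
from table1 t1
where not exists (select 1
                  from table2 t2
                  where t1.anumber = t2.anumber
                    -- column value concatenated into the LIKE pattern, not a literal
                    and t1.bnumber like '%' || t2.bnumber || '%'
                    and t1.cdate = t2.mdate
                    and t1.duration = t2.duration);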

Adding other columns into a join with a group by

In Oracle 11g Express, I have the following query:
select t1.product_name, SUM(t1.product_cost_per_month)
FROM table1 t1 INNER JOIN table2 t2 on (t1.product_name = t2.product_name)
WHERE t2.date > sysdate
GROUP BY t1.product_name
This works: it returns the sum of the product costs per month, grouped by product, after a certain date (I just use sysdate here as an example).
However, I would like to display some additional description about each product, i.e. the vendor. So I use this code:
select t1.product_name, SUM(t1.product_cost_per_month), t2.vendor
FROM table1 t1 INNER JOIN table2 t2 ON (t1.product_name = t2.product_name)
WHERE t2.date > sysdate
GROUP BY t1.product_name
This doesn't work because every selected column must either appear in the GROUP BY clause or have an aggregate function applied to it, and an aggregate function on something like "vendor" seems meaningless... So is there a way to do this?
I am probably going to write a short PL/SQL routine to solve this, but I am wondering if there is a purely SQL way to do it?
Vendor should also be included in the GROUP BY clause.
GROUP BY t1.product_name, t2.vendor
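Applied to your query, a sketch (note that if a product has more than one vendor, this returns one row per product/vendor pair):
select t1.product_name, t2.vendor, SUM(t1.product_cost_per_month)
FROM table1 t1 INNER JOIN table2 t2 ON (t1.product_name = t2.product_name)
WHERE t2.date > sysdate
GROUP BY t1.product_name, t2.vendor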
Another technique to achieve what you're doing would be a nested query:
SELECT t1.product_name,
(
select sum(product_cost_per_month)
from table2 t2
where
t1.product_name = t2.product_name
and t2.date > sysdate
) as total_product_cost,
t1.another_field,
t1.another_field2,
t1.another_field3
FROM table1 t1
(Apologies for any errors, I didn't test this but this should give you the gist of it)

Column ambiguously defined in subquery using rownums

I have to execute SQL written by some users and show its results. An example could be this:
SELECT t1.*, t2.* FROM table1 t1, table2 t2 WHERE t1.id = t2.id
This SQL works fine as it is, but I need to manually add pagination and show the rownum, so the SQL ends up like this.
SELECT z.*
FROM(
SELECT y.*, ROWNUM rn
FROM (
SELECT t1.*, t2.* FROM table1 t1, table2 t2 WHERE t1.id = t2.id
) y
WHERE ROWNUM <= 50) z
WHERE rn > 0
This throws an exception: "ORA-00918: column ambiguously defined" because both Table1 and Table2 contains a field with the same name ("id").
What could be the best way to avoid this?
Regards.
UPDATE
In the end, we had to go the ugly way and parse each SQL statement before executing it. Basically, we resolved the asterisks to discover which fields we needed to add, and aliased every field with a unique id. This introduced a performance penalty, but our client understood it was the only option given the requirements.
I will mark Lex's answer as it's the solution we ended up working on.
I think you have to specify aliases for (at least one of) table1.id and table2.id, and possibly for any other matching column names as well.
So instead of SELECT t1.*, t2.* FROM table1 t1, table2 t2, use something like:
SELECT t1.id t1id, t2.id t2id [rest of columns] FROM table1 t1, table2 t2
I'm not familiar with Oracle syntax, but I think you'll get the idea.
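Plugged back into the ROWNUM pagination wrapper from the question, that would look roughly like this (an untested sketch; the explicit column list is only illustrative):
SELECT z.*
FROM (
  SELECT y.*, ROWNUM rn
  FROM (
    SELECT t1.id t1id, t2.id t2id -- plus the remaining, unambiguous columns
    FROM table1 t1, table2 t2
    WHERE t1.id = t2.id
  ) y
  WHERE ROWNUM <= 50
) z
WHERE rn > 0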
I was searching for an answer to something similar. I was referencing an aliased sub-query that had a couple of NULL columns, and I had to alias the NULL columns because I had more than one:
select a.*, t2.column, t2.column, t2.column
from (select t1.column, t1.column, NULL, NULL, t1.column
      from t1
      where t1.column = 'VALUE') a
left outer join t2 on t2.column = a.column;
Once I aliased the NULL columns in the sub-query it worked fine.
If you could modify the query syntactically (or get the users to do so) to use explicit JOIN syntax with the USING clause, this would automatically fix the problem at hand:
SELECT t1.*, t2.*
FROM table1 t1
JOIN table2 t2 USING (id)
The USING clause does the same as ON t1.id = t2.id (or the implicit JOIN you have in the question), except that only one id column remains in the result, thereby eliminating your problem.
You would still run into problems if there are more columns with identical names that are not included in the USING clause. Aliases as described by @Lex are indispensable then.
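For example, if both tables also had a hypothetical duplicate column called name, you would still need aliases for it even with USING:
-- "name" is a made-up duplicate column; id is resolved by USING and must stay unqualified
SELECT id, t1.name AS t1_name, t2.name AS t2_name
FROM table1 t1
JOIN table2 t2 USING (id)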
Use the NVL (replace null values) function to fix this.
SELECT z.*
FROM(
SELECT y.*, ROWNUM rn
FROM (
SELECT t1.*, t2.* FROM table1 t1, table2 t2 where
NVL(t1.id,0) = NVL(t2.id,0)
) y
WHERE ROWNUM <= 50) z
WHERE rn > 0