Teradata SQL Tuning: What was the purpose of the below code?

I tuned a badly skewed query, written by a Teradata consultant a few years back. The same code has been a perpetually high-CPU report, and it has gotten worse:
SELECT
    c.child,
    a.username,
    CAST(SUM(a.AmpCPUTime(DEC(18,3)) + ZEROIFNULL(a.ParserCPUTime)) AS DECIMAL(18,3))
FROM pdcrinfo.dbqlogtbl a
LEFT OUTER JOIN (
    SELECT queryid, logdate,
        MIN(objectdatabasename) AS objectdatabasename
    FROM pdcrinfo.dbqlobjtbl_hst
    GROUP BY 1, 2
) b ON a.queryid = b.queryid
JOIN dbc.children c ON b.objectdatabasename = c.child
WHERE c.parent = 'FINDB'
  AND a.logdate BETWEEN '2015-12-01' AND '2015-12-31'
  AND b.logdate BETWEEN '2015-12-01' AND '2015-12-31'
GROUP BY 1, 2, 3
ORDER BY 1, 2, 3;
I already rewrote the query, joining the log and obj tables (which have the same PI) and then doing an EXISTS against dbc.children, and it runs fabulously with the same output.
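For reference, a minimal sketch of what that rewrite might look like (hedged: I'm assuming both tables carry QueryID and LogDate in the PI and I'm reusing the original column names; note that without the MIN-style dedup discussed below, a query touching several objects would have its CPU counted more than once):

SELECT
    b.objectdatabasename,
    a.username,
    CAST(SUM(a.AmpCPUTime + ZEROIFNULL(a.ParserCPUTime)) AS DECIMAL(18,3))
FROM pdcrinfo.dbqlogtbl a
JOIN pdcrinfo.dbqlobjtbl_hst b
    ON  a.queryid = b.queryid
    AND a.logdate = b.logdate     -- same PI on both tables keeps the join AMP-local
WHERE a.logdate BETWEEN '2015-12-01' AND '2015-12-31'
  AND EXISTS (SELECT 1
              FROM dbc.children c
              WHERE c.child  = b.objectdatabasename
                AND c.parent = 'FINDB')
GROUP BY 1, 2
ORDER BY 1, 2;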
But I think I just got lucky, because FINDB does not have any children that are view databases.
My question:
I am trying to understand the purpose of
MIN(objectdatabasename)
Most of our table database names alphabetically precede our view database names (which are of the form findb_vw, etc.), so I think he may have been trying to eliminate the view databases?
The other thing: why a LEFT OUTER JOIN? (I changed it to an INNER JOIN.) Since you want a value for objectdatabasename, I don't think a LEFT OUTER JOIN applies here.
I am not sure, so I am throwing the question open. Just to clarify: I am not looking for tuning tips. I want other perspectives on the MIN(objectdatabasename) code.

You're right, the LEFT JOIN is useless (but the optimizer will change it to an inner join anyway, so it's just confusing).
The MIN(objectdatabasename) was probably used to avoid multiple rows for the same queryid, which would result in duplicate rows (and maybe also to remove the view databases).
But IMHO the main reason for the bad performance is a missing join condition between the DBQL tables. The tables in pdcrinfo should be partitioned by LogDate, so you need to add AND a.LogDate = b.LogDate to the existing ON a.queryid = b.queryid to get a fast join (PI plus matching partitioning); otherwise the optimizer must do some kind of preparation or a more expensive sliding-window join.
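A minimal sketch of that fix against the original FROM clause (only the ON clause gains a predicate; everything else stays as it was):

FROM pdcrinfo.dbqlogtbl a
LEFT OUTER JOIN (
    SELECT queryid, logdate,
        MIN(objectdatabasename) AS objectdatabasename
    FROM pdcrinfo.dbqlobjtbl_hst
    GROUP BY 1, 2
) b ON  a.queryid = b.queryid
    AND a.logdate = b.logdate   -- added: lets the optimizer combine the PI match with partition elimination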

Long SQL subquery trouble

I just registered and want to ask.
I have not been learning SQL for very long, and I ran into trouble when I decided to move a table to another database. I read a few articles about building long subqueries, but they didn't help me.
Everything worked perfectly before I made this change.
I moved the table and then spent a whole day trying to rewrite the query:
update [dbo].Full
set [salary] = 1000
where [dbo].Full.id in (
select distinct k1.id
from (
select id, Topic, User
from Full
where User not in (select distinct topic_name from [DB_1].dbo.S_School)
) k1
where k1.id not in (
select distinct k2.id
from (
select id, Topic, User
from Full
where User not in (select distinct topic_name from [DB_1].dbo.S_School)
) k2,
List_School t3
where charindex (t3.NameApp, k2.Topic)>5
)
)
I moved the table List_School to database [DB_1] and I can't get the query to work with it.
I can't simply write [DB_1].dbo.List_School. Should I use one more subquery?
I even thought about creating a few temporary tables, but that could affect execution speed.
SQL gurus, please invest some of your time in me. Thank you in advance.
I will be happy with every hint you give me.
There appear to be a number of issues. You are comparing the User column to the topic_name column. The expected meaning of those column names suggests you are not comparing the correct columns. But that is a guess.
In the final subquery you join the table List_School with old-style comma syntax but no join condition, which means the join with k2 is a Cartesian product (aka CROSS JOIN). That is not what you would want in most situations. Again, this is a guess, as no details of the actual problem data or error messages were provided.
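For what it's worth, a hedged sketch of how the moved table could be referenced with a three-part name and an explicit join, assuming SQL Server and assuming the charindex predicate is the intended join condition (only the k2 branch shown):

-- sketch only: [DB_1].dbo.List_School is directly referenceable once the
-- table exists there and the login has permissions on DB_1
select distinct k2.id
from (
    select id, Topic, [User]    -- USER and FULL are reserved words, hence the brackets
    from [dbo].[Full]
    where [User] not in (select distinct topic_name from [DB_1].dbo.S_School)
) k2
inner join [DB_1].dbo.List_School t3
    on charindex(t3.NameApp, k2.Topic) > 5   -- the original WHERE, written as the join condition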

Efficiently order by columns from different tables

We're currently trying to improve a system that lets the user filter and sort a large list of objects (> 100k) by fields that are not being displayed. Since the fields can be selected dynamically, we plan to build the query dynamically as well.
That doesn't sound too hard, and the basics are done easily, but the problem lies in how the data is structured. In some cases more or less expensive joins would be needed, which can add up to a quite expensive query, especially when those joins are combined (i.e. select * from table join some_expensive_join join another_expensive_join ...).
Filtering wouldn't be that big a problem, since we can use intersections.
Ordering, however, would require us to first build a table that contains all the necessary data; doing that via one huge select statement with all those joins would become quite expensive.
So the question is: is there a more efficient way to do that?
I could think of doing it like this:
1. Do a select query for the first sort column and order by that.
2. For all elements that effectively have the same order (e.g. the same value), do another query to resolve the tie.
3. Repeat the step above until the order is unambiguous or we run out of sort criteria.
Does that make sense? If yes, how could this be done in PostgreSQL 9.4? (We currently can't upgrade, so 9.5+ solutions, though welcome, wouldn't help us right now.)
Does this help, or is it too trivial? (the subqueries could be prefab join views)
SELECT t0.id, t0.a, t0.b, t0.c, ...
FROM main_table t0
-- one subquery (or prefab join view) per sort criterion, each reduced
-- to (id, rank) so only the ranking is carried into the outer query
JOIN ( SELECT t1.id AS id
            , rank() OVER (ORDER BY whatever) AS rnk
       FROM different_tables_or_JOINS
     ) AS t1 ON t1.id = t0.id
JOIN ( SELECT t2.id AS id
            , rank() OVER (ORDER BY whatever) AS rnk
       FROM different_tables_or_JOINS2
     ) AS t2 ON t2.id = t0.id
...
ORDER BY t1.rnk      -- primary sort criterion
       , t2.rnk      -- tie-breaker from the second criterion
       ...
       , t0.id       -- final tie-breaker: makes the order unambiguous
;
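To make the shape concrete, a hypothetical instantiation (all table and column names invented): sort people by a salary held in one joined table, then by a score from another, in a single statement that works on 9.4:

-- hypothetical schema: person(id), contract(person_id, salary), exam(person_id, score)
SELECT p.id
FROM person p
JOIN (SELECT c.person_id AS id
           , rank() OVER (ORDER BY c.salary DESC) AS rnk
      FROM contract c) AS t1 ON t1.id = p.id
JOIN (SELECT e.person_id AS id
           , rank() OVER (ORDER BY e.score DESC) AS rnk
      FROM exam e) AS t2 ON t2.id = p.id
ORDER BY t1.rnk, t2.rnk, p.id;   -- p.id settles any remaining ties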

Does SQL JOIN order affect performance?

I was just tidying up some SQL when I came across this query:
SELECT
jm.IMEI ,
jm.MaxSpeedKM ,
jm.MaxAccel ,
jm.MaxDeccel ,
jm.JourneyMaxLeft ,
jm.JourneyMaxRight ,
jm.DistanceKM ,
jm.IdleTimeSeconds ,
jm.WebUserJourneyId ,
jm.lifetime_odo_metres ,
jm.[Descriptor]
FROM dbo.Reporting_WebUsers AS wu WITH (NOLOCK)
INNER JOIN dbo.Reporting_JourneyMaster90 AS jm WITH (NOLOCK) ON wu.WebUsersId = jm.WebUsersId
INNER JOIN dbo.Reporting_Journeys AS j WITH (NOLOCK) ON jm.WebUserJourneyId = j.WebUserJourneyId
WHERE ( wu.isActive = 1 )
AND ( j.JourneyDuration > 2 )
AND ( j.JourneyDuration < 1000 )
AND ( j.JourneyDistance > 0 )
My question is: does the order of the joins make any performance difference? For the above query, I would have started with
FROM dbo.Reporting_JourneyMaster90 AS jm
and then joined the other two tables to that one.
Join order in SQL Server 2008 R2 does unquestionably affect query performance, particularly in queries with a large number of table joins and WHERE clauses applied against multiple tables.
Although the join order is changed during optimisation, the optimiser doesn't try all possible join orders. It stops when it finds what it considers a workable solution, as the very act of optimisation uses precious resources.
We have seen queries that were performing like dogs (1 min+ execution time) come down to sub-second performance just by changing the order of the join expressions. Please note, however, that these are queries with 12 to 20 joins and WHERE clauses on several of the tables.
The trick is to set your order so as to help the query optimiser figure out what makes sense. You can use FORCE ORDER, but that can be too rigid. Try to make sure that your join order starts with the tables whose WHERE clauses will reduce the data the most.
No, the JOIN order is changed during optimization.
The only caveat is the OPTION (FORCE ORDER) hint, which will force joins to happen in the exact order you have them specified.
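For illustration, a sketch of the hint applied to the query above (select list abbreviated; forcing the written order is rarely optimal, so treat this purely as a syntax demonstration):

SELECT jm.IMEI, jm.MaxSpeedKM
FROM dbo.Reporting_JourneyMaster90 AS jm
INNER JOIN dbo.Reporting_Journeys AS j
    ON jm.WebUserJourneyId = j.WebUserJourneyId
INNER JOIN dbo.Reporting_WebUsers AS wu
    ON wu.WebUsersId = jm.WebUsersId
WHERE wu.isActive = 1
OPTION (FORCE ORDER);   -- joins are evaluated in exactly the written order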
I have a clear example of inner join order affecting performance. It is a simple join between two tables. One has 50+ million records, the other has 2,000. If I select from the smaller table and join the larger, it takes 5+ minutes.
If I select from the larger table and join the smaller, it takes 2 minutes 30 seconds.
This is with SQL Server 2012.
To me this is counter-intuitive, since I am using the largest dataset for the initial query.
Usually not. I'm not 100% sure this applies verbatim to SQL Server, but in Postgres the query planner reserves the right to reorder the inner joins as it sees fit. The exception is when you reach a threshold beyond which it's too expensive to investigate changing their order.
JOIN order doesn't matter; the query engine will reorganize the order based on index statistics and other factors.
To test this, do the following:
1. Select 'Include Actual Execution Plan' and run the first query.
2. Change the JOIN order and run the query again.
3. Compare the execution plans.
They should be identical, as the query engine will reorganize them according to those other factors.
As commented on the other answer, you could use OPTION (FORCE ORDER) to get exactly the order you want, but it might not be the most efficient one.
As a general rule of thumb, JOIN order should put the table with the fewest records first and the one with the most records last, as in some DBMS engines the order can make a difference, and also when the FORCE ORDER hint is used to help limit the results.
Wrong. In SQL Server 2005 it definitely matters, since you are limiting the dataset from the beginning of the FROM clause. If you start with 2,000 records instead of 2 million, your query is faster.

SQL: want to make an inner join statement, but not with whole tables

I want to use the fact that, for two tables t1 and t2, I can make an inner join with ON t1.column1 > t2.column2 to calculate the maximum drawdown of a return vector. The problem is that an inner join seems only possible between two stored tables, and I want to do it selecting just a part of each table.
Is there any other possibility? I am totally new to SQL and I can't find any other option.
Thank you
edit
Before manipulating my inner join to calculate my maximum drawdown, I have to be able to make this inner join on a selection of the tables, not the tables themselves. So I followed Mark's advice, but I am still getting an error. Here is my query:
select *
from (select * from bars where rownum <= 10 as x)as tab1
inner join (select * from bars where rownum <= 10 as y) as tab2
on tab1.x=tab2.y
The error is ORA-00907: missing right parenthesis.
(Additional information extracted from the OP's message, originally published as an answer to this post.)
You can inner join on subselects too, you just need to give the subselects an alias:
SELECT *
FROM (SELECT 1 AS X FROM dual) T1
INNER JOIN (SELECT 1 AS Y FROM dual) T2
ON T1.X = T2.Y
If you post your non-working query, I can give you a better answer more tailored to your exact tables.
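Applying this to the query from the edit above, a hedged sketch of what may have been intended: in Oracle, AS is not allowed before a table alias, and x/y need to be column aliases defined inside the SELECT lists (some_col stands in for whichever column of bars was meant):

-- sketch: alias the column inside each subselect; no AS before table aliases
select *
from (select b.some_col as x     -- some_col is hypothetical
      from bars b
      where rownum <= 10) tab1
inner join
     (select b.some_col as y
      from bars b
      where rownum <= 10) tab2
on tab1.x = tab2.y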
An (inner) join is not limited to whole tables.
I got a definition of maximum drawdown from an investment website (thanks Google!). So I think we need to calculate the percentage drop between the highest point in a graph and its subsequent lowest point.
The following query calculates the maximum drawdown on investments in Oracle stock over the last twelve months. It joins the investments table to itself, with aliases to distinguish the versions of the table (one for the highest peak, one for the lowest trough). This may not mirror your precise business logic, but it shows the SQL techniques which Oracle offers you.
select round(((max_return - min_return) / max_return) * 100, 2) as max_drawdown
from
( -- self-join: t1 supplies the peak, t2 the subsequent trough
  select max(t1.return_amt) as max_return
       , min(t2.return_amt) as min_return
  from investments t1
  join investments t2
  on ( t1.stock_id = 'ORCL'
  and t2.stock_id = t1.stock_id
  and t2.created_date > t1.created_date )             -- the trough must come after the peak
  where t1.created_date >= add_months(sysdate, -12)   -- twelve-month window
  and t2.created_date >= add_months(sysdate, -12)
)
/
This query will return zero if the stock has not experienced a drop during the window. It also does not check for a following upturn (as I understand drawdown it is supposed to be the bottom of a trough, a point we can only establish once the stock has started to climb again).
With regard to training at home, we can download software from Oracle TechNet for that purpose. If bandwidth or disk space are an issue go for the Express Edition; it doesn't have all the features but you probably won't want them for a while yet. Oracle do provide a free IDE, SQL Developer. As its name suggests it is primarily targeted at developers but it has many of the DBA-oriented features of DB Artisan. For full-on database management Oracle offers Enterprise Manager.
edit
In the comments, outis suggests: "You could add a t1.return_amt > t2.return_amt in the join as a minor optimization."
I think it is unlikely that return_amt would be indexed, so I think it is unlikely that such a clause would have an impact on performance. What it would do is change the behaviour for stocks which do not have a drawdown. The query I presented returns zero for stocks which have increased continuously through the time window. The additional filter would return a NULL in such a case. Which is the more desirable outcome is a matter of taste (or requirements spec).
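Concretely, the suggested variant would read as follows (same query as above, with the extra predicate; as noted, a stock with no drawdown now yields NULL instead of zero):

select round(((max_return - min_return) / max_return) * 100, 2) as max_drawdown
from
( select max(t1.return_amt) as max_return
       , min(t2.return_amt) as min_return
  from investments t1
  join investments t2
  on ( t1.stock_id = 'ORCL'
  and t2.stock_id = t1.stock_id
  and t2.created_date > t1.created_date
  and t1.return_amt > t2.return_amt )   -- added: only genuine drops pair up
  where t1.created_date >= add_months(sysdate, -12)
  and t2.created_date >= add_months(sysdate, -12)
)
/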