Spool space issue using XMLAGG in a single table - sql

I need to aggregate all the NOTI_TEXT values corresponding to each NOTI_ID. One NOTI_ID can have multiple NOTI_TEXT rows. I am using XMLAGG, but it is running out of spool.
Below is the query:
select
NOTI_ID,
cast(XMLAGG(NOTI_TEXT order by NOTI_TEXT_LINE_ID) as varchar(32000)) as NOTI_TEXT,
NOTI_COUNTRY_ID,
NOTI_MAT_DIVISION_ID,
NOTI_MAT_DIVISION_TEXT,
NOTI_SOURCESYSTEM_ID,
CURRENT_DATE as TABLE_LOAD_DT
from
HC_PRD_D_RDDL_SDTB_0_1_0_0_0_0_0_0.SDTB_DM_SEV_111_NOTI_TXT_LINES_TEST_1
group by
NOTI_ID,
NOTI_COUNTRY_ID,
NOTI_MAT_DIVISION_ID,
NOTI_MAT_DIVISION_TEXT,
NOTI_SOURCESYSTEM_ID
All relevant stats have been collected. The skew factor of the source table is 1.5.
Below is the EXPLAIN plan:
Explain select
NOTI_ID,
cast(XMLAGG(NOTI_TEXT order by NOTI_TEXT_LINE_ID) as varchar(32000)) as NOTI_TEXT,
NOTI_COUNTRY_ID,
NOTI_MAT_DIVISION_ID,
NOTI_MAT_DIVISION_TEXT,
NOTI_SOURCESYSTEM_ID,
CURRENT_DATE as TABLE_LOAD_DT
from
HC_PRD_D_RDDL_SDTB_0_1_0_0_0_0_0_0.SDTB_DM_SEV_111_NOTI_TXT_LINES_TEST_1
group by
NOTI_ID,
NOTI_COUNTRY_ID,
NOTI_MAT_DIVISION_ID,
NOTI_MAT_DIVISION_TEXT,
NOTI_SOURCESYSTEM_ID;
1) First, we lock
HC_PRD_D_RDDL_SDTB_0_1_0_0_0_0_0_0.SDTB_DM_SEV_111_NOTI_TXT_LINES_TEST
_1 for read on a reserved RowHash in all partitions to prevent
global deadlock.
2) Next, we lock
HC_PRD_D_RDDL_SDTB_0_1_0_0_0_0_0_0.SDTB_DM_SEV_111_NOTI_TXT_LINES_TEST
_1 for read.
3) We do an all-AMPs SUM step to aggregate from
HC_PRD_D_RDDL_SDTB_0_1_0_0_0_0_0_0.SDTB_DM_SEV_111_NOTI_TXT_LINES_TEST
_1 by way of an all-rows scan with no residual conditions, and the
grouping identifier in field 1. Aggregate Intermediate Results
are computed globally, then placed in Spool 3. The input table
will not be cached in memory, but it is eligible for synchronized
scanning. The aggregate spool file will not be cached in memory.
The size of Spool 3 is estimated with high confidence to be
13,749,188 rows (64,456,193,344 bytes). The estimated time for
this step is 15 hours and 20 minutes.
4) We do an all-AMPs RETRIEVE step from Spool 3 (Last Use) by way of
an all-rows scan into Spool 1 (group_amps), which is built locally
on the AMPs. The result spool file will not be cached in memory.
The size of Spool 1 is estimated with high confidence to be
13,749,188 rows (148,092,503,948 bytes). The estimated time for
this step is 4 minutes and 10 seconds.
5) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of
statement 1. The total estimated time is 15 hours and 24 minutes.
Having used this table for other queries, we never found any anomaly. I want to check whether this query can be further optimized, or whether there is an alternative way to achieve the same result.

It's a large base table and a large aggregation, and your system is probably not that big (the spools are not cached).
Try to aggregate separately, i.e. only on the key column(s) (probably NOTI_ID), and then join back. This removes the additional GROUP BY columns from the aggregation spool (NOTI_MAT_DIVISION_TEXT might be causing the issue if it's a large VarChar):
select
t1.NOTI_ID,
t1.NOTI_TEXT,
t2.NOTI_COUNTRY_ID,
t2.NOTI_MAT_DIVISION_ID,
t2.NOTI_MAT_DIVISION_TEXT,
t2.NOTI_SOURCESYSTEM_ID,
CURRENT_DATE as TABLE_LOAD_DT
from
( select
NOTI_ID,
cast(XMLAGG(NOTI_TEXT order by NOTI_TEXT_LINE_ID) as varchar(32000)) as NOTI_TEXT
from
HC_PRD_D_RDDL_SDTB_0_1_0_0_0_0_0_0.SDTB_DM_SEV_111_NOTI_TXT_LINES_TEST_1
group by
NOTI_ID
) as t1
join
( select distinct
NOTI_ID,
NOTI_COUNTRY_ID,
NOTI_MAT_DIVISION_ID,
NOTI_MAT_DIVISION_TEXT,
NOTI_SOURCESYSTEM_ID
from
HC_PRD_D_RDDL_SDTB_0_1_0_0_0_0_0_0.SDTB_DM_SEV_111_NOTI_TXT_LINES_TEST_1
) as t2
on t1.NOTI_ID = t2.NOTI_ID
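If spool is still tight after this rewrite, another option in the same spirit (just a sketch, assuming volatile tables are allowed in your environment and NOTI_ID is a reasonable primary index) is to materialize the aggregated text into a volatile table first, so the XMLAGG step and the join back run as two smaller steps:
-- step 1: aggregate only by NOTI_ID; this spool holds just the key and the text
create volatile table noti_text_agg as
( select
    NOTI_ID,
    cast(XMLAGG(NOTI_TEXT order by NOTI_TEXT_LINE_ID) as varchar(32000)) as NOTI_TEXT
  from
    HC_PRD_D_RDDL_SDTB_0_1_0_0_0_0_0_0.SDTB_DM_SEV_111_NOTI_TXT_LINES_TEST_1
  group by
    NOTI_ID
) with data
primary index (NOTI_ID)
on commit preserve rows;

-- step 2: join back to the distinct attribute rows, as in t2 above
select
  t2.NOTI_ID,
  t1.NOTI_TEXT,
  t2.NOTI_COUNTRY_ID,
  t2.NOTI_MAT_DIVISION_ID,
  t2.NOTI_MAT_DIVISION_TEXT,
  t2.NOTI_SOURCESYSTEM_ID,
  CURRENT_DATE as TABLE_LOAD_DT
from noti_text_agg t1
join
( select distinct
    NOTI_ID,
    NOTI_COUNTRY_ID,
    NOTI_MAT_DIVISION_ID,
    NOTI_MAT_DIVISION_TEXT,
    NOTI_SOURCESYSTEM_ID
  from
    HC_PRD_D_RDDL_SDTB_0_1_0_0_0_0_0_0.SDTB_DM_SEV_111_NOTI_TXT_LINES_TEST_1
) as t2
on t1.NOTI_ID = t2.NOTI_ID;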

Related

Does SQL Server Table-Scan Time depend on the Query?

I observed that a full table scan takes a different amount of time depending on the query. I believed that under similar conditions (set of columns in the SELECT, column data types) a table scan should take roughly the same time. It seems that's not the case, and I just want to understand the reason behind it.
I have used "CHECKPOINT" and "DBCC DROPCLEANBUFFERS" before querying to make sure there is no impact from the buffer cache.
Table:
10 Columns
10M rows
Each column has a different density, ranging from 0.1 to 0.000001
No indexes
Queries:
Query A: returned 100 rows, time took: ~ 900ms
SELECT [COL00]
FROM [TEST].[dbo].[Test]
WHERE COL07 = 50000
Query B: returned 910595 rows, time took: ~ 15000ms
SELECT [COL00]
FROM [TEST].[dbo].[Test]
WHERE COL01 = 5
** Column COL07 was randomly populated with integers ranging from 0 to 100000, and column COL01 was randomly populated with integers ranging from 0 to 10.
Time Taken:
Query A: around 900 ms
Query B: around 18000 ms
What's the point I'm missing here?
Query A: (returned 100 rows, time took: ~ 900ms)
Query B: (returned 910595 rows, time took: ~ 15000ms)
I believe that what you are missing is that there are roughly 9,000 times more rows to fetch in the second query. That alone could explain why it took about 20 times longer.
The two columns have a different data density:
Query A, COL07: 10000000/100000 = 100
Query B, COL01: 10000000/10 = 1000000
The fact that both search parameters are in the middle of the data range doesn't necessarily impact the speed of the search. What matters is how many rows the engine has to return for the search predicate.
To see if this is indeed the case, I would try the following:
COL04: 10000000/1000 = 10000 rows per value. Filter on WHERE COL04 = 500
COL08: 10000000/10000 = 1000 rows per value. Filter on WHERE COL08 = 5000
Considering the times from the initial test, you would expect to see COL04 at ~7200ms and COL08 at ~3600ms.
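For completeness, the two follow-up test queries would look like this (same table and column naming as in the question; the timings above are expectations, not measurements):
-- ~10,000 rows expected per value of COL04
SELECT [COL00]
FROM [TEST].[dbo].[Test]
WHERE COL04 = 500;

-- ~1,000 rows expected per value of COL08
SELECT [COL00]
FROM [TEST].[dbo].[Test]
WHERE COL08 = 5000;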
An interesting article about SQL Server COUNT() Function Performance Comparison
Full Table Scan (also known as Sequential Scan) is a scan made on a database where each row of the table under scan is read in a sequential (serial) order. (Reference)
In your case, the full table scan reads rows sequentially (in order), so it does not need to scan the whole table to advance to the next record, because COL07 is ordered.
In Query B that is not the case: COL01 is randomly distributed, so a scan of the whole table is needed.
Query A is an optimistic scan, whereas Query B is a pessimistic scan.

Explaining the Explain in teradata

Explain
sel * from sandbox.comapny_employees ce
left join
sandbox.comapny_age ca
on ce.age=ca.age and ce.age>42
Sometimes it's really confusing to understand how the explain plan was created.
The explain plan below completely skipped the condition "ce.age>42" from its steps. We know all the records still get returned and the condition will not filter the result, but how was this explain built so that the condition is discarded?
Please let me know if the table structure is required.
1) First, we lock sandbox.ca for read on a reserved RowHash to
prevent global deadlock.
2) Next, we lock sandbox.ce for read on a reserved RowHash to prevent
global deadlock.
3) We lock sandbox.ca for read, and we lock sandbox.ce for read.
4) We do an all-AMPs RETRIEVE step from sandbox.ce by way of an
all-rows scan with no residual conditions into Spool 2 (all_amps),
which is redistributed by the hash code of (sandbox.ce.age) to all
AMPs. Then we do a SORT to order Spool 2 by row hash. The size
of Spool 2 is estimated with high confidence to be 3 rows (81
bytes). The estimated time for this step is 0.01 seconds.
5) We do an all-AMPs JOIN step from sandbox.ca by way of a RowHash
match scan with a condition of ("sandbox.ca.age >= 43"), which is
joined to Spool 2 (Last Use) by way of a RowHash match scan.
sandbox.ca and Spool 2 are right outer joined using a merge join,
with condition(s) used for non-matching on right table ("age >= 43"),
with a join condition of ("age = sandbox.ca.age"). The result
goes into Spool 1 (group_amps), which is built locally on the AMPs.
The size of Spool 1 is estimated with low confidence to be 4 rows
(176 bytes). The estimated time for this step is 0.02 seconds.
6) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of
statement 1. The total estimated time is 0.03 seconds.

Performance issue for join on the same tables multiple times

I am facing a performance issue with the query below, where the same table is self-joined multiple times. How can I avoid multiple joins on the same table?
INSERT INTO "TEMP"."TABLE2"
SELECT
T1."PRODUCT_SNO"
,T2."PRODUCT_SNO"
,T3."PRODUCT_SNO"
,T4."PRODUCT_SNO"
,((COUNT(DISTINCT T1."ACCESS_METHOD_ID")(FLOAT)) /
(MAX(T5.GROUP_NUM(FLOAT))))
FROM
"TEMP"."TABLE1" T1
,"TEMP"."TABLE1" T2
,"TEMP"."TABLE1" T3
,"TEMP"."TABLE1" T4
,"TEMP"."_TWM_GROUP_COUNT" T5
WHERE
T1."ACCESS_METHOD_ID" = T2."ACCESS_METHOD_ID"
AND T2."ACCESS_METHOD_ID" = T3."ACCESS_METHOD_ID"
AND T3."ACCESS_METHOD_ID" = T4."ACCESS_METHOD_ID"
AND T1."SUBSCRIPTION_DATE" < T2."SUBSCRIPTION_DATE"
AND T2."SUBSCRIPTION_DATE" < T3."SUBSCRIPTION_DATE"
AND T3."SUBSCRIPTION_DATE" < T4."SUBSCRIPTION_DATE"
GROUP BY 1, 2, 3, 4;
This is taking 3 hrs to complete. Below is the explain for it:
1) First, we lock a distinct TEMP."pseudo table" for write on a
RowHash to prevent global deadlock for
TEMP.TABLE2.
2) Next, we lock a distinct TEMP."pseudo table" for read on a
RowHash to prevent global deadlock for TEMP.T5.
3) We lock TEMP.TABLE2 for write, we lock
TEMP.TABLE1 for access, and we lock TEMP.T5 for read.
4) We do an all-AMPs RETRIEVE step from TEMP.T5 by way of an
all-rows scan with no residual conditions into Spool 4 (all_amps),
which is duplicated on all AMPs. The size of Spool 4 is estimated
with high confidence to be 48 rows (816 bytes). The estimated
time for this step is 0.01 seconds.
5) We execute the following steps in parallel.
1) We do an all-AMPs JOIN step from Spool 4 (Last Use) by way of
an all-rows scan, which is joined to TEMP.T4 by way of an
all-rows scan with no residual conditions. Spool 4 and
TEMP.T4 are joined using a product join, with a join
condition of ("(1=1)"). The result goes into Spool 5
(all_amps), which is built locally on the AMPs. Then we do a
SORT to order Spool 5 by the hash code of (
TEMP.T4.ACCESS_METHOD_ID). The size of Spool 5 is
estimated with high confidence to be 8,051,801 rows (
233,502,229 bytes). The estimated time for this step is 1.77
seconds.
2) We do an all-AMPs JOIN step from TEMP.T2 by way of a
RowHash match scan with no residual conditions, which is
joined to TEMP.T1 by way of a RowHash match scan with no
residual conditions. TEMP.T2 and TEMP.T1 are joined
using a merge join, with a join condition of (
"(TEMP.T1.ACCESS_METHOD_ID = TEMP.T2.ACCESS_METHOD_ID)
AND (TEMP.T1.SUBSCRIPTION_DATE <
TEMP.T2.SUBSCRIPTION_DATE)"). The result goes into Spool
6 (all_amps), which is built locally on the AMPs. The size
of Spool 6 is estimated with low confidence to be 36,764,681
rows (1,213,234,473 bytes). The estimated time for this step
is 4.12 seconds.
6) We do an all-AMPs JOIN step from Spool 5 (Last Use) by way of a
RowHash match scan, which is joined to TEMP.T3 by way of a
RowHash match scan with no residual conditions. Spool 5 and
TEMP.T3 are joined using a merge join, with a join condition
of ("(TEMP.T3.SUBSCRIPTION_DATE < SUBSCRIPTION_DATE) AND
(TEMP.T3.ACCESS_METHOD_ID = ACCESS_METHOD_ID)"). The result
goes into Spool 7 (all_amps), which is built locally on the AMPs.
The size of Spool 7 is estimated with low confidence to be
36,764,681 rows (1,360,293,197 bytes). The estimated time for
this step is 4.14 seconds.
7) We do an all-AMPs JOIN step from Spool 6 (Last Use) by way of a
RowHash match scan, which is joined to Spool 7 (Last Use) by way
of a RowHash match scan. Spool 6 and Spool 7 are joined using a
merge join, with a join condition of ("(SUBSCRIPTION_DATE <
SUBSCRIPTION_DATE) AND ((ACCESS_METHOD_ID = ACCESS_METHOD_ID) AND
((ACCESS_METHOD_ID = ACCESS_METHOD_ID) AND ((ACCESS_METHOD_ID =
ACCESS_METHOD_ID) AND (ACCESS_METHOD_ID = ACCESS_METHOD_ID ))))").
The result goes into Spool 3 (all_amps), which is built locally on
the AMPs. The result spool file will not be cached in memory.
The size of Spool 3 is estimated with low confidence to be
766,489,720 rows (29,893,099,080 bytes). The estimated time for
this step is 1 minute and 21 seconds.
8) We do an all-AMPs SUM step to aggregate from Spool 3 (Last Use) by
way of an all-rows scan , grouping by field1 (
TEMP.T1.PRODUCT_SNO ,TEMP.T2.PRODUCT_SNO
,TEMP.T3.PRODUCT_SNO ,TEMP.T4.PRODUCT_SNO
,TEMP.T1.ACCESS_METHOD_ID). Aggregate Intermediate Results
are computed globally, then placed in Spool 9. The aggregate
spool file will not be cached in memory. The size of Spool 9 is
estimated with low confidence to be 574,867,290 rows (
46,564,250,490 bytes). The estimated time for this step is 6
minutes and 38 seconds.
9) We do an all-AMPs SUM step to aggregate from Spool 9 (Last Use) by
way of an all-rows scan , grouping by field1 (
TEMP.T1.PRODUCT_SNO ,TEMP.T2.PRODUCT_SNO
,TEMP.T3.PRODUCT_SNO ,TEMP.T4.PRODUCT_SNO). Aggregate
Intermediate Results are computed globally, then placed in Spool
11. The size of Spool 11 is estimated with low confidence to be
50,625 rows (3,695,625 bytes). The estimated time for this step
is 41.87 seconds.
10) We do an all-AMPs RETRIEVE step from Spool 11 (Last Use) by way of
an all-rows scan into Spool 1 (all_amps), which is redistributed
by the hash code of (TEMP.T1.PRODUCT_SNO,
TEMP.T2.PRODUCT_SNO, TEMP.T3.PRODUCT_SNO,
TEMP.T4.PRODUCT_SNO) to all AMPs. Then we do a SORT to order
Spool 1 by row hash. The size of Spool 1 is estimated with low
confidence to be 50,625 rows (1,873,125 bytes). The estimated
time for this step is 0.04 seconds.
11) We do an all-AMPs MERGE into TEMP.TABLE2 from
Spool 1 (Last Use). The size is estimated with low confidence to
be 50,625 rows. The estimated time for this step is 1 second.
12) We spoil the parser's dictionary cache for the table.
13) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> No rows are returned to the user as the result of statement 1.
All the required stats are collected.
I have to admit I am not an expert with Teradata, but I did a quick check, and you can use the ANSI JOIN syntax.
So first I rewrote your query so I could understand it:
INSERT INTO
"TEMP"."TABLE2"
SELECT
T1."PRODUCT_SNO",
T2."PRODUCT_SNO",
T3."PRODUCT_SNO",
T4."PRODUCT_SNO",
((COUNT(DISTINCT T1."ACCESS_METHOD_ID")(FLOAT)) /
(MAX(T5.GROUP_NUM(FLOAT))))
FROM
"TEMP"."TABLE1" T1
INNER JOIN "TEMP"."TABLE1" T2 ON T2."ACCESS_METHOD_ID" = T1."ACCESS_METHOD_ID"
AND T2."SUBSCRIPTION_DATE" > T1."SUBSCRIPTION_DATE"
INNER JOIN "TEMP"."TABLE1" T3 ON T3."ACCESS_METHOD_ID" = T2."ACCESS_METHOD_ID"
AND T3."SUBSCRIPTION_DATE" > T2."SUBSCRIPTION_DATE"
INNER JOIN "TEMP"."TABLE1" T4 ON T4."ACCESS_METHOD_ID" = T3."ACCESS_METHOD_ID"
AND T4."SUBSCRIPTION_DATE" > T3."SUBSCRIPTION_DATE"
CROSS JOIN "TEMP"."_TWM_GROUP_COUNT" T5
GROUP BY
T1."PRODUCT_SNO",
T2."PRODUCT_SNO",
T3."PRODUCT_SNO",
T4."PRODUCT_SNO";
Note that many of those changes are just a personal preference, but others will "allow your queries to enter the 21st Century" ;P
Now I can read your SQL I can make a number of assumptions on what you are actually trying to achieve here:
you have some table that has products in it; each product has a serial number, an "access method" (no idea what this is?) and a subscription date;
you are finding products with the same "access method" and then chaining them together into subscription date order, then you display the serial number of each product in the chain;
each chain must be exactly 4 products long. No idea what happens if there are either fewer than or more than 4 products in a chain (well, I can see that a chain with fewer than 4 products will be discarded);
you also have a metric which turns this logic on its head. Now you are counting the number of distinct access methods per chain, and dividing this by some number that comes from another table that we know nothing at all about.
That isn't really a lot to go on, but I can see a few places that you could look to optimise:
you only ever use that _TWM_GROUP_COUNT table for one thing, the MAX(GROUP_NUM). So you could work that out ahead of the main query and remove the need for this potentially expensive join (see the sketch after this list). I have no idea how you could do this with Teradata, but in other SQL variants you could stick this into a variable, use a common table expression, use a sub query, etc. If there are many rows in that table, then there is the potential that the optimiser will run your query x times, then discard x-1 result sets!
any non-equi join is going to be inefficient, but it doesn't appear as though you can avoid these. If your table isn't indexed by SUBSCRIPTION_DATE then it might help to pre-sort the data in the table, adding a numeric order number (again, in other variants of SQL this would be a ROW_NUMBER() OVER (ORDER BY SUBSCRIPTION_DATE) type syntax). Then your date comparison could be a numeric comparison;
obviously indexes are going to be important here;
finally, you could split the query into stages, starting with the T1 to T2 join, then using this as the basis for the (T1 to T2) to T3 join, etc. This might not help, but it's possibly worth a try?
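To illustrate the first point in the list above, here is a sketch (untested; table and column names are taken from the original query, and MAX_GROUP_NUM is just an alias I introduced) of pre-aggregating MAX(GROUP_NUM) in a one-row derived table so the whole _TWM_GROUP_COUNT table is no longer carried through the join:
INSERT INTO "TEMP"."TABLE2"
SELECT
    T1."PRODUCT_SNO",
    T2."PRODUCT_SNO",
    T3."PRODUCT_SNO",
    T4."PRODUCT_SNO",
    -- T5 is now a single pre-aggregated row, so MAX() here just passes it through
    ((COUNT(DISTINCT T1."ACCESS_METHOD_ID")(FLOAT)) / (MAX(T5.MAX_GROUP_NUM (FLOAT))))
FROM
    "TEMP"."TABLE1" T1
    INNER JOIN "TEMP"."TABLE1" T2 ON T2."ACCESS_METHOD_ID" = T1."ACCESS_METHOD_ID"
        AND T2."SUBSCRIPTION_DATE" > T1."SUBSCRIPTION_DATE"
    INNER JOIN "TEMP"."TABLE1" T3 ON T3."ACCESS_METHOD_ID" = T2."ACCESS_METHOD_ID"
        AND T3."SUBSCRIPTION_DATE" > T2."SUBSCRIPTION_DATE"
    INNER JOIN "TEMP"."TABLE1" T4 ON T4."ACCESS_METHOD_ID" = T3."ACCESS_METHOD_ID"
        AND T4."SUBSCRIPTION_DATE" > T3."SUBSCRIPTION_DATE"
    -- collapse _TWM_GROUP_COUNT to one row before it is duplicated across AMPs
    CROSS JOIN (SELECT MAX(GROUP_NUM) AS MAX_GROUP_NUM
                FROM "TEMP"."_TWM_GROUP_COUNT") T5
GROUP BY 1, 2, 3, 4;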
That's probably not a lot of help, but there isn't really enough to go on without some sample data, etc. really...

Teradata SQL Optimization

I hope this is concise. I am basically looking for a methodology on how to improve queries, after watching one of my colleagues speed up my query almost 10-fold with a quick change.
I had a query that used two tables, t_item and t_action.
t_item is basically an item with characteristics, and t_action holds the events or actions that are performed on this item, with a timestamp for each action; each action also has an id.
My query joined the two tables on id. There were also some criteria on t_action.action_type, which is free text.
My simplified original query was like below
SELECT *
FROM t_item
JOIN t_action
ON t_item.pk = t_action.fk
WHERE t_action.action_type LIKE ('%PURCHASE%')
AND t_item.location = 'DE'
This ran OK; it came back in roughly 8 minutes.
My colleague changed it so that the t_action.action_type filter ended up in the FROM portion of the SQL (the join's ON clause). This reduced the time to 2 minutes.
SELECT *
FROM t_item
JOIN t_action
ON t_item.pk = t_action.fk
AND t_action.action_type LIKE ('%PURCHASE%')
WHERE t_item.location = 'DE'
My question is: generally, how do you know when to put filter conditions in the FROM/ON clause vs. in the WHERE clause?
I thought the Teradata SQL optimizer does this automatically.
Thank you for your help.
In this case, you don't actually need to understand the plan. You just need to see if the two plans are the same. Teradata has a pretty good optimizer, so I would not expect there to be a difference between the two versions (there could be, but I would be surprised). Hence, caching is a possible explanation for the difference in performance.
For this query:
SELECT *
FROM t_item JOIN
t_action
ON t_item.pk = t_action.fk
AND t_action.action_type LIKE '%PURCHASE%'
WHERE t_item.location = 'DE';
The best indexes are probably on t_item(location, pk) and t_action(action_type). However, you should try to get rid of the wildcards for a production query. This makes the query harder to optimize, which in turn might have a large impact on performance.
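If you want to try the index suggestion on Teradata, that would be secondary indexes, roughly as below (a sketch only; note that a NUSI on action_type will not be used for a leading-wildcard LIKE, which is another reason to lose the wildcards):
-- non-unique secondary indexes (NUSI) on the filter/join columns
CREATE INDEX (location, pk) ON t_item;
CREATE INDEX (action_type) ON t_action;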
I tried to create a similar query but didn't see any difference in the explain plan, though the record counts were smaller: trans (15k) and accounts (10k), with indexes on Account_Number. Probably do what Gordon has specified: try to run the query at a different time, and also check the explain plans for both queries to see any difference.
Explain select * from trans t
inner join
ap.accounts a
on t.account_number = a.account_number
where t.trans_id like '%DEP%';
4) We do an all-AMPs JOIN step from ap.a by way of a RowHash match
scan with no residual conditions, which is joined to ap.t by way
of a RowHash match scan with a condition of ("ap.t.Trans_ID LIKE
'%DEP%'"). ap.a and ap.t are joined using a merge join, with a
join condition of ("ap.t.Account_Number = ap.a.Account_Number").
The result goes into Spool 1 (group_amps), which is built locally
on the AMPs. The size of Spool 1 is estimated with no confidence
to be 11,996 rows (1,511,496 bytes). The estimated time for this
step is 0.25 seconds.
-> The contents of Spool 1 are sent back to the user as the result of
statement 1. The total estimated time is 0.25 seconds.
Explain select * from trans t
inner join
ap.accounts a
on t.account_number = a.account_number
and t.trans_id like '%DEP%';
4) We do an all-AMPs JOIN step from ap.a by way of a RowHash match
scan with no residual conditions, which is joined to ap.t by way
of a RowHash match scan with a condition of ("ap.t.Trans_ID LIKE
'%DEP%'"). ap.a and ap.t are joined using a merge join, with a
join condition of ("ap.t.Account_Number = ap.a.Account_Number").
The result goes into Spool 1 (group_amps), which is built locally
on the AMPs. The size of Spool 1 is estimated with no confidence
to be 11,996 rows (1,511,496 bytes). The estimated time for this
step is 0.25 seconds.
-> The contents of Spool 1 are sent back to the user as the result of
statement 1. The total estimated time is 0.25 seconds.
The general order of query processing on Teradata is:
Where/And + Joins
Aggregate
Having
Olap/Window
Qualify
Sample/Top
Order By
Format
An easy way to remember is WAHOQSOF - as in Wax On, Wax Off :)
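One practical consequence of this order: QUALIFY is evaluated after the OLAP/window step, so you can filter on a window function directly without a derived table. A small sketch (table and column names are only illustrative, borrowed from the example above):
-- keep the first transaction per account; ROW_NUMBER is computed in the OLAP step,
-- then QUALIFY filters on it, matching the processing order listed above
SELECT account_number, trans_id
FROM trans
QUALIFY ROW_NUMBER() OVER (PARTITION BY account_number ORDER BY trans_id) = 1;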

Teradata Performance issue and example

I am facing an issue in our Teradata QA environment where a simple query that ran in under 1 minute is now taking 12 minutes to complete. This select is pulling 5 fields based on a simple inner join
select a.material
, b.season
, b.theme
, b.collection
from SalesOrders_view.Allocation_Deliveries_cur a
inner join SalesOrders_view.Material_Attributes_cur b
on a.material = b.material;
I can run this same query in our Prod environment and it returns in less than a minute while running on approx 200k more records than QA.
Total volume is under 1.1 M records in SalesOrders.Allocation_Deliveries and 129 k records in SalesOrders.Material_Attributes. These are small datasets.
I compared the Explain plans in both environments and there is a stark difference in the estimated spool volume in the first join step. The estimate in Production is on the money, while the estimate in QA is an order of magnitude off. However, the data and tables/views are identical in both systems, we have collected stats in every conceivable manner, and we can see that the table demographics in both systems are identical.
Lastly, this query has always returned in under a minute in all environments, including QA, as it still does in Production. This slow behavior has only appeared in the last week or so. I discussed this with our DBA and we have had no changes to software or configuration. He is new, but he seems to know what he's doing and is still getting caught up with a new environment.
I am looking for some pointers on what to check next. I have compared the relevant table/view definitions across QA and Prod and they are identical. The table demographics in each system are also the same (I went through these with our DBA to make sure).
Any help is appreciated. Thanks in advance.
Pat
This is the Explain plan from QA. Note the very low estimate in Step 5 (144 rows). In Prod, the same Explain shows > 1 M rows, which would be close to what I know to be correct.
Explain select a.material
, b.season
, b.theme
, b.collection
from SalesOrders_view.Allocation_Deliveries a
inner join SalesOrders_view.Material_Attributes_cur b
on a.material = b.material;
1) First, we lock SalesOrders.Allocation_Deliveries in view
SalesOrders_view.Allocation_Deliveries for access, and we lock
SalesOrders.Material_Attributes in view SalesOrders_view.Material_Attributes_cur for
access.
2) Next, we do an all-AMPs SUM step to aggregate from
SalesOrders.Material_Attributes in view SalesOrders_view.Material_Attributes_cur by way
of an all-rows scan with no residual conditions
, grouping by field1 ( SalesOrders.Material_Attributes.material
,SalesOrders.Material_Attributes.season ,SalesOrders.Material_Attributes.theme
,SalesOrders.Material_Attributes.theme ,SalesOrders.Material_Attributes.af_grdval
,SalesOrders.Material_Attributes.af_stcat
,SalesOrders.Material_Attributes.Material_Attributes_SRC_SYS_NM). Aggregate
Intermediate Results are computed locally, then placed in Spool 4.
The size of Spool 4 is estimated with high confidence to be
129,144 rows (41,713,512 bytes). The estimated time for this step
is 0.06 seconds.
3) We execute the following steps in parallel.
1) We do an all-AMPs RETRIEVE step from Spool 4 (Last Use) by
way of an all-rows scan into Spool 2 (all_amps), which is
redistributed by the hash code of (
SalesOrders.Material_Attributes.Field_9,
SalesOrders.Material_Attributes.Material_Attributes_SRC_SYS_NM,
SalesOrders.Material_Attributes.Field_7, SalesOrders.Material_Attributes.Field_6,
SalesOrders.Material_Attributes.theme, SalesOrders.Material_Attributes.theme,
SalesOrders.Material_Attributes.season, SalesOrders.Material_Attributes.material)
to all AMPs. Then we do a SORT to order Spool 2 by row hash
and the sort key in spool field1 eliminating duplicate rows.
The size of Spool 2 is estimated with low confidence to be
129,144 rows (23,504,208 bytes). The estimated time for this
step is 0.11 seconds.
2) We do an all-AMPs RETRIEVE step from SalesOrders.Material_Attributes in
view SalesOrders_view.Material_Attributes_cur by way of an all-rows scan
with no residual conditions locking for access into Spool 6
(all_amps), which is redistributed by the hash code of (
SalesOrders.Material_Attributes.material, SalesOrders.Material_Attributes.season,
SalesOrders.Material_Attributes.theme, SalesOrders.Material_Attributes.theme,
SalesOrders.Material_Attributes.Material_Attributes_SRC_SYS_NM,
SalesOrders.Material_Attributes.Material_Attributes_UPD_TS, (CASE WHEN (NOT
(SalesOrders.Material_Attributes.af_stcat IS NULL )) THEN
(SalesOrders.Material_Attributes.af_stcat) ELSE ('') END )(VARCHAR(16),
CHARACTER SET UNICODE, NOT CASESPECIFIC), (CASE WHEN (NOT
(SalesOrders.Material_Attributes.af_grdval IS NULL )) THEN
(SalesOrders.Material_Attributes.af_grdval) ELSE ('') END )(VARCHAR(8),
CHARACTER SET UNICODE, NOT CASESPECIFIC)) to all AMPs. Then
we do a SORT to order Spool 6 by row hash. The size of Spool
6 is estimated with high confidence to be 129,144 rows (
13,430,976 bytes). The estimated time for this step is 0.08
seconds.
4) We do an all-AMPs RETRIEVE step from Spool 2 (Last Use) by way of
an all-rows scan into Spool 7 (all_amps), which is built locally
on the AMPs. Then we do a SORT to order Spool 7 by the hash code
of (SalesOrders.Material_Attributes.material, SalesOrders.Material_Attributes.season,
SalesOrders.Material_Attributes.theme, SalesOrders.Material_Attributes.theme,
SalesOrders.Material_Attributes.Field_6, SalesOrders.Material_Attributes.Field_7,
SalesOrders.Material_Attributes.Material_Attributes_SRC_SYS_NM,
SalesOrders.Material_Attributes.Field_9). The size of Spool 7 is estimated
with low confidence to be 129,144 rows (13,301,832 bytes). The
estimated time for this step is 0.05 seconds.
5) We do an all-AMPs JOIN step from Spool 6 (Last Use) by way of an
all-rows scan, which is joined to Spool 7 (Last Use) by way of an
all-rows scan. Spool 6 and Spool 7 are joined using an inclusion
merge join, with a join condition of ("(material = material) AND
((season = season) AND ((theme = theme) AND ((theme =
theme) AND (((( CASE WHEN (NOT (af_grdval IS NULL )) THEN
(af_grdval) ELSE ('') END ))= Field_6) AND (((( CASE WHEN (NOT
(AF_STCAT IS NULL )) THEN (AF_STCAT) ELSE ('') END ))= Field_7)
AND ((Material_Attributes_SRC_SYS_NM = Material_Attributes_SRC_SYS_NM) AND
(Material_Attributes_UPD_TS = Field_9 )))))))"). The result goes into Spool
8 (all_amps), which is duplicated on all AMPs. The size of Spool
8 is estimated with low confidence to be 144 rows (5,616 bytes).
The estimated time for this step is 0.04 seconds.
6) We do an all-AMPs JOIN step from Spool 8 (Last Use) by way of an
all-rows scan, which is joined to SalesOrders.Allocation_Deliveries in view
SalesOrders_view.Allocation_Deliveries by way of an all-rows scan with no
residual conditions. Spool 8 and SalesOrders.Allocation_Deliveries are
joined using a single partition hash_ join, with a join condition
of ("SalesOrders.Allocation_Deliveries.material = material"). The result goes
into Spool 1 (group_amps), which is built locally on the AMPs.
The size of Spool 1 is estimated with low confidence to be 3,858
rows (146,604 bytes). The estimated time for this step is 0.44
seconds.
7) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of
statement 1. The total estimated time is 0.70 seconds.
Here is what the record distribution looks like and the SQL I used to generate the result set
SELECT HASHAMP(HASHBUCKET(HASHROW( MATERIAL ))) AS
"AMP#",COUNT(*)
FROM EDW_LND_SAP_VIEW.EMDMMU01_CUR
GROUP BY 1
ORDER BY 2 DESC;
Output
Highest: AMP 137 with 1093 rows
Lowest: AMP 72 with 768 rows
Total AMPs: 144
Statistics Recommendations
Run the following in PROD and QA and post the differences (obscure column names if need be):
DIAGNOSTIC HELPSTATS ON FOR SESSION;
EXPLAIN
select a.material
, b.season
, b.theme
, b.collection
from SalesOrders_view.Allocation_Deliveries_cur a
inner join SalesOrders_view.Material_Attributes_cur b
on a.material = b.material;
This diagnostic when run in conjunction with the EXPLAIN command will produce a list of recommended statistics that may beneficial to optimizer in producing the lowest cost query plan. This may yield no difference or it may point to something that is different between the environments (data or otherwise).
Views and JOIN conditions
Based on your EXPLAIN plan, one or both of the views in the SalesOrders_view database appear to be using an EXISTS clause. This EXISTS clause relies on a COALESCE condition (or explicit CASE logic) to accommodate a comparison between a column in one table that is defined as NOT NULL and a column in another table that is defined to allow NULL values. This can affect the performance of that join.
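I cannot see the view DDL, but the pattern the plan hints at looks something like the following hypothetical condition (purely illustrative; the second table name is made up, and only af_stcat/af_grdval come from the CASE expressions visible in the plan):
-- hypothetical sketch of the kind of EXISTS the _cur view may contain:
-- a NOT NULL column compared to a nullable one, masked with COALESCE/CASE,
-- which gets in the way of a straightforward rowhash-based join
SELECT a.material
FROM SalesOrders.Material_Attributes a
WHERE EXISTS
( SELECT 1
  FROM SalesOrders.Material_Attributes_stage s  -- made-up table name
  WHERE s.material = a.material
    AND COALESCE(s.af_stcat, '') = a.af_stcat
    AND COALESCE(s.af_grdval, '') = a.af_grdval
);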
Data Distribution
Your distribution results appear to be from the PRODUCTION environment. (Based on the number of AMPS and the number of rows shown on the AMP with the highest and lowest rows.) What does that look like for QA?
Edit - 2013-01-09 09:21
If the data was copied from Prod two months ago, it may seem silly to ask, but were the statistics recollected afterward? Stale statistics on top of replaced data could explain the variance in the query plan between the environments.
Are you collecting PARTITION statistics on your tables even if they are not PPI tables? This helps the optimizer with cardinality estimates.
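Collecting the PARTITION statistics mentioned here is a one-liner per table, roughly as below (table names taken from the question; the exact syntax varies a little by Teradata release):
-- PARTITION stats help the optimizer with cardinality estimates, even on non-PPI tables
COLLECT STATISTICS COLUMN (PARTITION) ON SalesOrders.Allocation_Deliveries;
COLLECT STATISTICS COLUMN (PARTITION) ON SalesOrders.Material_Attributes;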
Are you the only workload running on the QA system?
Have you looked at the DBQL metrics to compare CPU and IO consumption for the query in each environment? Look at IO Skew, CPU Skew, and Unnecessary IO metrics as well.
Do you have delay throttles in place in the QA environment that may be delaying your workload? This will give you the perception that the query takes longer to run in the QA environment when in fact the actual CPU and IO consumption are the same between QA and PROD.
Do you have access to Viewpoint?
If so, have you looked at your query using either the My Queries and/or Query Spotlight portlets to observe its behavior?
Do you know which step in the Query Plan is the most expensive or time consuming? Viewpoint Rewind with either portlet I mentioned or Step logging in DBQL can show you this.
Are the DBS Control settings between the environments identical? Ask your DBA to look at this. There are settings in there that can affect the join plans that are used by the optimizer.
In the end, if the data, table structures, indexes, and statistics are the same on two systems whose hardware and TDBMS patch levels are identical, you should not get two different EXPLAIN plans. If that ends up being the case, I would suggest that you contact the GSC and get them involved.