I am using a Teradata database but am quite new to how it works. Could you please help me make the query below more efficient so that it does not fail with a 'no more spool space' error? It becomes too heavy once I add the second join.
SELECT
a.src_cmpgn_code,
a.cmc_name,
SUM(b.open_cnt)
FROM access_views.dw_cmc_lkp a
LEFT JOIN prs_restricted_v.mh_crm_engmnt_sd b
ON b.cmpgn_id = a.cmc_id
LEFT JOIN access_views.dw_cmc_instnc c
ON b.cmpgn_id = c.cmc_id
WHERE 1=1
AND b.trigger_dt BETWEEN '2019-01-01' AND '2019-12-31'
AND b.site_cntry_id = 1
AND a.cmpgn_group_name IN ('a', 'b', 'c', 'd')
AND c.dlvry_vhcl_id IN (1, 10)
AND c.chnl_id = 1
GROUP BY 1,2;
Explain looks like this:
This query is optimized using type 2 profile Cost_NoSlidingJ_Profile,
profileid 10007. 1) First, we lock mdm_tables.DW_CMC_INSTNC in view
access_views.dw_cmc_instnc for access, we lock MDM_TABLES.DW_CMC_LKP
in view access_views.dw_cmc_lkp for access, and we lock
PRS_T.MH_CRM_ENGMNT_SD in view prs_restricted_v.mh_crm_engmnt_sd for
access. 2) Next, we do an all-AMPs RETRIEVE step from 365 partitions
of PRS_T.MH_CRM_ENGMNT_SD in view prs_restricted_v.mh_crm_engmnt_sd
with a condition of ("(NOT (PRS_T.MH_CRM_ENGMNT_SD in view
prs_restricted_v.mh_crm_engmnt_sd.CMPGN_ID IS NULL )) AND
(((PRS_T.MH_CRM_ENGMNT_SD in view
prs_restricted_v.mh_crm_engmnt_sd.TRIGGER_DT <= DATE '2019-12-31') AND
(PRS_T.MH_CRM_ENGMNT_SD.TRIGGER_DT >= DATE '2019-01-01')) AND
(PRS_T.MH_CRM_ENGMNT_SD in view
prs_restricted_v.mh_crm_engmnt_sd.SITE_CNTRY_ID = 1. ))") into Spool 4
(all_amps), which is redistributed by the hash code of (
PRS_T.MH_CRM_ENGMNT_SD.CMPGN_ID) to all AMPs. The size of Spool 4 is
estimated with no confidence to be 329,656,959 rows ( 7,582,110,057
bytes). The estimated time for this step is 2.40 seconds. 3) We do an
all-AMPs JOIN step from MDM_TABLES.DW_CMC_LKP in view
access_views.dw_cmc_lkp by way of an all-rows scan with a condition of
("MDM_TABLES.DW_CMC_LKP in view
access_views.dw_cmc_lkp.CMPGN_GROUP_NAME IN ('Bucks_Nectar_eBayPlus',
'DailyDeal','Other','STEP_User_Agreement')"), which is joined to Spool
4 (Last Use) by way of an all-rows scan. MDM_TABLES.DW_CMC_LKP and
Spool 4 are joined using a single partition hash join, with a join
condition of ("CMPGN_ID = MDM_TABLES.DW_CMC_LKP.CMC_ID"). The result
goes into Spool 5 (all_amps) fanned out into 5 hash join partitions,
which is built locally on the AMPs. The size of Spool 5 is estimated
with no confidence to be 79,119,821 rows (10,681,175,835 bytes). The
estimated time for this step is 0.19 seconds. 4) We do an all-AMPs
RETRIEVE step from mdm_tables.DW_CMC_INSTNC in view
access_views.dw_cmc_instnc by way of an all-rows scan with a condition
of ("(mdm_tables.DW_CMC_INSTNC in view
access_views.dw_cmc_instnc.DLVRY_VHCL_ID IN (1 , 10 )) AND
((mdm_tables.DW_CMC_INSTNC in view access_views.dw_cmc_instnc.CHNL_ID
= 1) AND (mdm_tables.DW_CMC_INSTNC in view access_views.dw_cmc_instnc.TRTMNT_TYPE_CODE <> 'I'))") into Spool 6
(all_amps) fanned out into 5 hash join partitions, which is
redistributed by the hash code of ( mdm_tables.DW_CMC_INSTNC.CMC_ID)
to all AMPs. The size of Spool 6 is estimated with no confidence to be
2,874,675 rows (48,869,475 bytes). The estimated time for this step is
0.58 seconds. 5) We do an all-AMPs JOIN step from Spool 5 (Last Use) by way of an all-rows scan, which is joined to Spool 6 (Last Use) by
way of an all-rows scan. Spool 5 and Spool 6 are joined using a hash
join of 5 partitions, with a join condition of ("(CMPGN_ID = CMC_ID)
AND (CMC_ID = CMC_ID)"). The result goes into Spool 3 (all_amps),
which is built locally on the AMPs. The size of Spool 3 is estimated
with no confidence to be 5,353,507,625 rows ( 690,602,483,625 bytes).
The estimated time for this step is 14.82 seconds. 6) We do an
all-AMPs SUM step to aggregate from Spool 3 (Last Use) by way of an
all-rows scan , grouping by field1 (
MDM_TABLES.DW_CMC_LKP.SRC_CMPGN_CODE ,MDM_TABLES.DW_CMC_LKP.CMC_NAME).
Aggregate Intermediate Results are computed globally, then placed in
Spool 7. The size of Spool 7 is estimated with no confidence to be
11,774 rows (5,286,526 bytes). The estimated time for this step is
24.51 seconds. 7) We do an all-AMPs RETRIEVE step from Spool 7 (Last Use) by way of an all-rows scan into Spool 1 (group_amps), which is
built locally on the AMPs. The size of Spool 1 is estimated with no
confidence to be 11,774 rows (2,837,534 bytes). The estimated time for
this step is 0.01 seconds. 8) Finally, we send out an END TRANSACTION
step to all AMPs involved in processing the request. -> The contents
of Spool 1 are sent back to the user as the result of statement 1. The
total estimated time is 42.50 seconds.
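A possible direction, offered as a sketch rather than a tested fix. The WHERE conditions on b and c discard the NULL-extended rows, so both LEFT JOINs already behave like inner joins. Step 5 of the plan also shows the join to dw_cmc_instnc taking the estimate from about 79 million rows to about 5.3 billion. The version below rests on two assumptions that are not stated in the question: that c is only needed as a filter (so the repetition of each b row per matching c row is unintended), and that open_cnt can be summed per cmpgn_id before joining. If either assumption is wrong, the totals will differ from the original query.

```sql
-- Sketch only: assumes c acts purely as a filter and that
-- pre-aggregating b per cmpgn_id is acceptable.
SELECT
    a.src_cmpgn_code,
    a.cmc_name,
    SUM(b.open_cnt) AS open_cnt
FROM access_views.dw_cmc_lkp a
JOIN (
    -- Shrink the large fact table to one row per campaign before joining.
    SELECT cmpgn_id, SUM(open_cnt) AS open_cnt
    FROM prs_restricted_v.mh_crm_engmnt_sd
    WHERE trigger_dt BETWEEN DATE '2019-01-01' AND DATE '2019-12-31'
      AND site_cntry_id = 1
    GROUP BY 1
) b
  ON b.cmpgn_id = a.cmc_id
WHERE a.cmpgn_group_name IN ('a', 'b', 'c', 'd')
  -- EXISTS filters on c without multiplying rows.
  AND EXISTS (
      SELECT 1
      FROM access_views.dw_cmc_instnc c
      WHERE c.cmc_id = a.cmc_id
        AND c.dlvry_vhcl_id IN (1, 10)
        AND c.chnl_id = 1
  )
GROUP BY 1, 2;
```

Every estimate in the plan is marked "no confidence", so it may also be worth asking whether statistics exist on the join and filter columns.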
I want to insert data into the target table, but unfortunately it takes too much time, even though it is only around 800,000 records.
I think the problem is with the execution plan, bad indexes, or statistics. Could you please look at the explain plan and tell me whether anything looks suspicious?
Here is the plan:
This query is optimized using type 2 profile nonested_cost, profileid
10003.
This request is eligible for incremental planning and execution (IPE)
but does not meet cost thresholds. The following is the static plan
for the request.
1) First, we lock Schema1.Target_Table in TD_MAP1
for write on a reserved RowHash to prevent global deadlock.
2) Next, we lock Schema1.Target_Table in TD_MAP1
for write, we lock Schema1.Table1 in view
Schema1.Table1_View in TD_MAP1 for access, and we lock
Schema1.Table2 in view
Schema1.Table3 in TD_MAP1 for access.
3) We do an all-AMPs RETRIEVE step in TD_MAP1 from
Schema1.Table2 in view
Schema1.Table3 by way of an all-rows scan
with a condition of ("(NOT (Schema1.Table2 in
view Schema1.Table3.CUSTOMER_ID IS NULL
)) AND (Schema1.Table2 in view
Schema1.Table3.TYPE = 2.)") into Spool 3
(all_amps) (compressed columns allowed), which is redistributed by
the hash code of (
Schema1.Table2.CUSTOMER_ID) to all AMPs in
TD_Map1. The size of Spool 3 is estimated with high confidence to
be 66,211 rows (66,939,321 bytes). The estimated time for this
step is 0.80 seconds.
4) We do an all-AMPs JOIN step in TD_Map1 from Spool 3 (Last Use) by
way of a RowHash match scan, which is joined to
Schema1.Table1 in view Schema1.Table1_View by way
of a RowHash match scan with no residual conditions. Spool 3 and
Schema1.Table1 are joined using a single partition hash
join, with a join condition of ("CUSTOMER_ID =
Schema1.Table1.CUSTOMER_ID"). The result goes into
Spool 4 (all_amps) (compressed columns allowed), which is
redistributed by the hash code of (
HERE IS THE LIST OF COLUMNS FROM TABLE 1 AND TABLE 2 to all AMPs in TD_Map1.
Then we do a SORT to order Spool 4 by row hash. The size of Spool
4 is estimated with index join confidence to be 66,211 rows (
73,163,155 bytes). The estimated time for this step is 0.11
seconds.
5) We execute the following steps in parallel.
1) We do an all-AMPs RETRIEVE step in TD_Map1 from Spool 4 by
way of an all-rows scan into Spool 5 (all_amps) (compressed
columns allowed) fanned out into 33 hash join partitions,
which is duplicated on all AMPs in TD_Map1. The size of
Spool 5 is estimated with index join confidence to be
28,603,152 rows (31,635,086,112 bytes). The estimated time
for this step is 1.69 seconds.
2) We do an all-AMPs RETRIEVE step in TD_MAP1 from
S.TAB_M by way of an all-rows scan with a
condition of ("(NOT (S.TAB_M.EQUIPMENT_ID IS NULL ))
AND (NOT (S.TAB_M.COUNTY_ID IS NULL ))")
into Spool 6 (all_amps) (compressed columns allowed) fanned
out into 33 hash join partitions, which is built locally on
the AMPs. The size of Spool 6 is estimated with high
confidence to be 120,648,009 rows (29,679,410,214 bytes).
The estimated time for this step is 2.68 seconds.
6) We do an all-AMPs JOIN step in TD_Map1 from Spool 6 (Last Use) by
way of an all-rows scan, which is joined to Spool 5 (Last Use) by
way of an all-rows scan. Spool 6 and Spool 5 are joined using a
hash join of 33 partitions, with a join condition of ("(NOT
(SubCustomer_ID IS NULL )) AND ((NOT (ORG_ID IS NULL )) AND
((EQUIPMENT_ID = SubCustomer_ID) AND ((COUNTY_ID )= (ORG_ID
(FLOAT, FORMAT '-9.99999999999999E-999')))))"). The result goes
into Spool 7 (all_amps) (compressed columns allowed), which is
redistributed by the hash code of (
HERE IS THE LIST OF COLUMNS FROM TABLE 1 AND TABLE 2) to all AMPs in
TD_Map1. Then we do a SORT to order Spool 7 by row hash. The
size of Spool 7 is estimated with index join confidence to be
88,525 rows (28,239,475 bytes).
7) We do an all-AMPs JOIN step in TD_Map1 from Spool 7 (Last Use) by
way of a RowHash match scan, which is joined to Spool 4 (Last Use)
by way of a RowHash match scan. Spool 7 and Spool 4 are
right outer joined using a merge join, with a join condition of (
"Field_1 = Field_1"). The result goes into Spool 2 (all_amps)
(compressed columns allowed), which is built locally on the AMPs.
The size of Spool 2 is estimated with index join confidence to be
88,525 rows (114,639,875 bytes). The estimated time for this step
is 1.68 seconds.
8) We do an all-AMPs SUM step in TD_Map1 to aggregate from Spool 2
(Last Use) by way of an all-rows scan, grouping by field1 (HERE IS THE LIST OF COLUMNS FROM TABLE 1 AND TABLE 2 AND S.TAB_M). Aggregate Intermediate
Results are computed locally, then placed in Spool 11 in TD_Map1.
The size of Spool 11 is estimated with low confidence to be 66,394
rows (330,575,726 bytes). The estimated time for this step is
0.23 seconds.
9) We do an all-AMPs RETRIEVE step in TD_Map1 from Spool 11 (Last
Use) by way of an all-rows scan into Spool 1 (all_amps)
(compressed columns allowed), which is redistributed by the hash
code of ((CASE WHEN (NOT (S.TAB_M.TIME_DELIVERY IS
NULL )) THEN (S.TAB_M.TIME_DELIVERY) WHEN (NOT
(S.TAB_M.TIME_DELIVERY_EXP IS NULL )) THEN
(S.TAB_M.TIME_DELIVERY_EXP) ELSE (0) END )(INTEGER)) to
all AMPs in TD_Map1. Then we do a SORT to order Spool 1 by row
hash. The size of Spool 1 is estimated with low confidence to be
66,394 rows (85,050,714 bytes). The estimated time for this step
is 0.05 seconds.
10) We do an all-AMPs MERGE step in TD_MAP1 into
Schema1.Target_Table from Spool 1 (Last Use).
The size is estimated with low confidence to be 66,394 rows. The
estimated time for this step is 4.00 seconds.
11) We spoil the parser's dictionary cache for the table.
12) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> No rows are returned to the user as the result of statement 1.
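Two things in this plan may be worth checking. In step 6 the join condition converts ORG_ID to FLOAT, which usually suggests that COUNTY_ID and ORG_ID are declared with different data types. In step 5.1 a 66,211-row spool is duplicated to all AMPs, giving an estimated 28.6 million rows. The sketch below only gathers information. The owning table of ORG_ID and SubCustomer_ID is not visible in the plan (they may even be view aliases), so the lookup goes by column name, and the choice of statistics columns is an assumption based on the join condition.

```sql
-- Look up the declared types of the columns used in the step 6 join.
-- ColumnType is a short code, for example 'I' = INTEGER, 'CV' = VARCHAR,
-- 'D' = DECIMAL, 'F' = FLOAT.
SELECT DatabaseName, TableName, ColumnName, ColumnType, ColumnLength
FROM DBC.ColumnsV
WHERE ColumnName IN ('COUNTY_ID', 'ORG_ID', 'EQUIPMENT_ID', 'SubCustomer_ID')
ORDER BY 1, 2, 3;

-- Assumption: multi-column statistics on the join columns of the large
-- table may help the optimizer estimate the step 6 join.
COLLECT STATISTICS COLUMN (EQUIPMENT_ID, COUNTY_ID) ON S.TAB_M;
```

If the types do differ, joining on columns of the same type (or casting the smaller side explicitly) is the usual way to avoid the implicit FLOAT conversion.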
This looks pretty weird to me. Teradata overestimates the number of rows: the actual count is 42 million, but the estimate is 943 million.
The query is pretty simple:
select ID, sum(amount)
from v_tb -- view
where REPORT_DATE between Date '2017-11-01' and Date '2017-11-30' -- report_date has date format
group by 1
Plan:
1) First, we lock tb in view v_tb for access.
2) Next, we do an all-AMPs SUM step to aggregate from 1230 partitions
of tb in view v_tb with a
condition of ("(tb.REPORT_DATE >= DATE '2017-11-01') AND
(tb.REPORT_DATE <= DATE '2017-11-30')")
, grouping by field1 ( ID). Aggregate
Intermediate Results are computed locally, then placed in Spool 1.
The input table will not be cached in memory, but it is eligible
for synchronized scanning. The size of Spool 1 is estimated with
low confidence to be 943,975,437 rows (27,375,287,673 bytes). The
estimated time for this step is 1 minute and 26 seconds.
3) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of
statement 1. The total estimated time is 1 minute and 26 seconds.
According to DBC.StatsV, statistics are collected on ID, report_date, and (ID, report_date), and they are all up to date. There are no NULL values (confirmed).
UniqueValueCount for ID, report_date, and (ID, report_date) is 36 million, 839, and 1,232 million respectively, which seems correct.
Why does Teradata overestimate the number of rows? Shouldn't it derive the final estimate from the UniqueValueCount of ID alone, since that is the only column I group by?
UPD1:
-- estimates 32 mln rows
select ID, sum(amount)
from v_tb -- view
where REPORT_DATE between Date '2017-11-01' and Date '2017-11-01' -- report_date has date format
group by 1
-- estimates 89 mln rows
select ID, sum(amount)
from v_tb -- view
where REPORT_DATE between Date '2017-11-01' and Date '2017-11-02' -- report_date has date format
group by 1
So the problem is with the WHERE predicate.
SampleSizePct is equal to 5.01. Does that mean the sample size is only 5%? Yes, it does.
UPD2: The previous query was part of a bigger query, which looks like this:
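Given the 5% sample, one thing that could be tried is recollecting the statistics on the full table and comparing the estimate afterwards. This is a minimal sketch, assuming the base table behind v_tb is the tb shown in the plan, that the Teradata release supports the USING clause of COLLECT STATISTICS (14.10 or later), and that the table is partitioned as the "1230 partitions" in the plan suggests. Whether it actually changes the estimate is not guaranteed.

```sql
-- Inspect what the optimizer currently has for the table.
SELECT ColumnName, RowCount, UniqueValueCount, SampleSizePct, LastCollectTimeStamp
FROM DBC.StatsV
WHERE TableName = 'tb';

-- Recollect on all rows instead of a 5% sample.
COLLECT STATISTICS USING NO SAMPLE COLUMN (REPORT_DATE) ON tb;
COLLECT STATISTICS USING NO SAMPLE COLUMN (ID, REPORT_DATE) ON tb;

-- Statistics on the system-derived PARTITION column are commonly
-- recommended for partitioned tables.
COLLECT STATISTICS COLUMN (PARTITION) ON tb;
```

Running EXPLAIN on the one-day and two-day queries again afterwards shows whether the estimates move closer to the actual counts.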
select top 100000000
base.*
, case when CPE_MODEL_NEW.device_type in ('Smartphone', 'Phone', 'Tablet', 'USB modem') then CPE_MODEL_NEW.device_type
else 'other' end as device_type
, usg_mbou
, usg_arpu_content
, date '2017-11-30' as max_report_date
, macroregion_name
from (
select
a.SUBS_ID
, a.tac
, MSISDN
, BRANCH_ID
, max(bsegment) bsegment
, max((date '2017-11-30' - cast (activation_dttm as date))/30.4167) as LT_month
, Sum(REVENUE_COMMERCE) REVENUE_COMMERCE
, max(LAST_FLASH_DTTM) LAST_FLASH_DTTM
from PRD2_BDS_V2.SUBS_CLR_D a
where a.REPORT_DATE between Date '2017-11-01' and Date '2017-11-30'
group by 1,2,3,4 --, 8, 9
) base
left join CPE_MODEL_NEW on base.tac = CPE_MODEL_NEW.tac
left join
(
select SUBS_ID, sum(case when TRAFFIC_TYPE_ID = 4 /*DATA*/ then all_vol / (1024 * 1024) else 0 end) usg_mbou
,sum(case when COST_BAND_ID IN (3,46,49,56) then rated_amount else 0 end) usg_arpu_content
from PRD2_BDS_V2.SUBS_USG_D where SUBS_USG_D.REPORT_DATE between Date '2017-11-01' and Date '2017-11-30'
group by 1
) SUBS_USG_D
on SUBS_USG_D.SUBS_ID = base.SUBS_ID
LEFT JOIN PRD2_DIC_V.BRANCH AS BRANCH ON base.BRANCH_ID = BRANCH.BRANCH_ID
LEFT JOIN PRD2_DIC_V2.REGION AS REGION ON BRANCH.REGION_ID = REGION.REGION_ID
AND Date '2017-11-30' >= REGION.SDATE AND REGION.EDATE >= Date '2017-11-01'
LEFT JOIN PRD2_DIC_V2.MACROREGION AS MACROREGION ON REGION.MACROREGION_ID = MACROREGION.MACROREGION_ID
AND Date '2017-11-30' >= MACROREGION.SDATE AND Date '2017-11-01' <= MACROREGION.EDATE
The query fails with a spool error at almost the last step:
We do an All-AMPs STAT FUNCTION step from Spool 10 by way of an all-rows scan into Spool 29, which is redistributed by hash code to all AMPs. The result rows are put into Spool 9, which is redistributed by hash code to all AMPs..
There is no product join and no incorrect duplication to all AMPs, which are what often lead to spool problems. However, there is another problem, very high skew:
Snapshot CPU skew: 99.7%
Snapshot I/O skew: 99.7%
Spool usage is only 30 GB at that point, but the query easily uses more than 300 GB at the beginning of execution.
The tables aren't skewed.
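Even when the base tables are evenly distributed on their primary index, a spool that is redistributed by a different column can still be skewed. A small sketch, assuming the same views and date range as in the query, of how the rows would spread across AMPs if hashed on SUBS_ID (the column Spools 11 and 12 are redistributed by in the plan):

```sql
-- Rows per AMP if the November data were hashed on SUBS_ID.
-- A few AMPs with far more rows than the rest would indicate skew.
SELECT
    HASHAMP(HASHBUCKET(HASHROW(SUBS_ID))) AS amp_no,
    COUNT(*) AS row_cnt
FROM PRD2_BDS_V2.SUBS_CLR_D
WHERE REPORT_DATE BETWEEN DATE '2017-11-01' AND DATE '2017-11-30'
GROUP BY 1
ORDER BY 2 DESC;
```

Swapping SUBS_ID for TAC or BRANCH_ID gives the same check for the other join columns. Separately, the plan builds Spool 10 with a constant Field1 ("28364") just before the STAT FUNCTION step that implements TOP, so it might be informative to test whether the skew is still there when the TOP clause is removed.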
Full explain:
1) First, we lock TELE2_UAT.CPE_MODEL_NEW for access, we lock
PRD2_DIC.REGION in view PRD2_DIC_V2.REGION for access, we lock
PRD2_DIC.MACROREGION in view PRD2_DIC_V2.MACROREGION for access,
we lock PRD2_DIC.BRANCH in view PRD2_DIC_V.BRANCH for access, we
lock PRD2_BDS.SUBS_CLR_D for access, and we lock
PRD2_BDS.SUBS_USG_D for access.
2) Next, we do an all-AMPs SUM step to aggregate from 1230 partitions
of PRD2_BDS.SUBS_CLR_D with a condition of (
"(PRD2_BDS.SUBS_CLR_D.REPORT_DATE >= DATE '2017-11-01') AND
(PRD2_BDS.SUBS_CLR_D.REPORT_DATE <= DATE '2017-11-30')"), and the
grouping identifier in field 1. Aggregate Intermediate Results
are computed locally,skipping sort when applicable, then placed in
Spool 4. The input table will not be cached in memory, but it is
eligible for synchronized scanning. The size of Spool 4 is
estimated with low confidence to be 1,496,102,647 rows (
285,755,605,577 bytes). The estimated time for this step is 1
minute and 55 seconds.
3) We execute the following steps in parallel.
1) We do an all-AMPs RETRIEVE step from Spool 4 (Last Use) by
way of an all-rows scan into Spool 2 (used to materialize
view, derived table, table function or table operator base)
(all_amps) (compressed columns allowed), which is built
locally on the AMPs with Field1 ("UniqueId"). The size of
Spool 2 is estimated with low confidence to be 1,496,102,647
rows (140,633,648,818 bytes). Spool AsgnList:
"Field_1" = "UniqueId",
"Field_2" = "SUBS_ID",
"Field_3" = "TAC",
"Field_4" = "MSISDN",
"Field_5" = "BRANCH_ID",
"Field_6" = "Field_6",
"Field_7" = "Field_7",
"Field_8" = "Field_8",
"Field_9" = "Field_9".
The estimated time for this step is 57.85 seconds.
2) We do an all-AMPs SUM step to aggregate from 1230 partitions
of PRD2_BDS.SUBS_USG_D with a condition of ("(NOT
(PRD2_BDS.SUBS_USG_D.SUBS_ID IS NULL )) AND
((PRD2_BDS.SUBS_USG_D.REPORT_DATE >= DATE '2017-11-01') AND
(PRD2_BDS.SUBS_USG_D.REPORT_DATE <= DATE '2017-11-30'))"),
and the grouping identifier in field 1. Aggregate
Intermediate Results are computed locally,skipping sort when
applicable, then placed in Spool 7. The input table will not
be cached in memory, but it is eligible for synchronized
scanning. The size of Spool 7 is estimated with low
confidence to be 943,975,437 rows (42,478,894,665 bytes).
The estimated time for this step is 1 minute and 29 seconds.
4) We execute the following steps in parallel.
1) We do an all-AMPs RETRIEVE step from Spool 7 (Last Use) by
way of an all-rows scan into Spool 1 (used to materialize
view, derived table, table function or table operator
SUBS_USG_D) (all_amps) (compressed columns allowed), which is
built locally on the AMPs with Field1 ("UniqueId"). The size
of Spool 1 is estimated with low confidence to be 943,975,437
rows (42,478,894,665 bytes). Spool AsgnList:
"Field_1" = "UniqueId",
"Field_2" = "SUBS_ID",
"Field_3" = "Field_3",
"Field_4" = "Field_4".
The estimated time for this step is 16.75 seconds.
2) We do an all-AMPs RETRIEVE step from Spool 2 (Last Use) by
way of an all-rows scan into Spool 11 (all_amps) (compressed
columns allowed), which is redistributed by hash code to all
AMPs to all AMPs with hash fields ("Spool_2.SUBS_ID"). Then
we do a SORT to order Spool 11 by row hash. The size of
Spool 11 is estimated with low confidence to be 1,496,102,647
rows (128,664,827,642 bytes). Spool AsgnList:
"SUBS_ID" = "Spool_2.SUBS_ID",
"TAC" = "TAC",
"MSISDN" = "MSISDN",
"BRANCH_ID" = "BRANCH_ID",
"BSEGMENT" = "BSEGMENT",
"LT_MONTH" = "LT_MONTH",
"REVENUE_COMMERCE" = "REVENUE_COMMERCE",
"LAST_FLASH_DTTM" = "LAST_FLASH_DTTM".
The estimated time for this step is 4 minutes and 8 seconds.
5) We execute the following steps in parallel.
1) We do an all-AMPs RETRIEVE step from Spool 1 (Last Use) by
way of an all-rows scan into Spool 12 (all_amps) (compressed
columns allowed), which is redistributed by hash code to all
AMPs to all AMPs with hash fields ("Spool_1.SUBS_ID"). Then
we do a SORT to order Spool 12 by row hash. The size of
Spool 12 is estimated with low confidence to be 943,975,437
rows (34,927,091,169 bytes). Spool AsgnList:
"SUBS_ID" = "Spool_1.SUBS_ID",
"USG_MBOU" = "USG_MBOU",
"USG_ARPU_CONTENT" = "USG_ARPU_CONTENT".
The estimated time for this step is 1 minute and 5 seconds.
2) We do an all-AMPs RETRIEVE step from PRD2_DIC.BRANCH in view
PRD2_DIC_V.BRANCH by way of an all-rows scan with a condition
of ("NOT (PRD2_DIC.BRANCH in view PRD2_DIC_V.BRANCH.BRANCH_ID
IS NULL)") into Spool 13 (all_amps) (compressed columns
allowed), which is redistributed by hash code to all AMPs to
all AMPs with hash fields ("PRD2_DIC.BRANCH.REGION_ID").
Then we do a SORT to order Spool 13 by row hash. The size of
Spool 13 is estimated with high confidence to be 107 rows (
1,712 bytes). Spool AsgnList:
"BRANCH_ID" = "BRANCH_ID",
"REGION_ID" = "PRD2_DIC.BRANCH.REGION_ID".
The estimated time for this step is 0.02 seconds.
6) We execute the following steps in parallel.
1) We do an all-AMPs JOIN step (No Sum) from PRD2_DIC.REGION in
view PRD2_DIC_V2.REGION by way of a RowHash match scan with a
condition of ("(PRD2_DIC.REGION in view
PRD2_DIC_V2.REGION.EDATE >= DATE '2017-11-01') AND
(PRD2_DIC.REGION in view PRD2_DIC_V2.REGION.SDATE <= DATE
'2017-11-30')"), which is joined to Spool 13 (Last Use) by
way of a RowHash match scan. PRD2_DIC.REGION and Spool 13
are right outer joined using a merge join, with condition(s)
used for non-matching on right table ("NOT
(Spool_13.REGION_ID IS NULL)"), with a join condition of (
"Spool_13.REGION_ID = PRD2_DIC.REGION.ID"). The result goes
into Spool 14 (all_amps) (compressed columns allowed), which
is redistributed by hash code to all AMPs to all AMPs with
hash fields ("PRD2_DIC.REGION.MACROREGION_CODE"). Then we do
a SORT to order Spool 14 by row hash. The size of Spool 14
is estimated with low confidence to be 107 rows (2,461 bytes).
Spool AsgnList:
"MACROREGION_CODE" = "PRD2_DIC.REGION.MACROREGION_CODE",
"BRANCH_ID" = "{RightTable}.BRANCH_ID".
The estimated time for this step is 0.03 seconds.
2) We do an all-AMPs RETRIEVE step from TELE2_UAT.CPE_MODEL_NEW
by way of an all-rows scan with no residual conditions into
Spool 17 (all_amps) (compressed columns allowed), which is
duplicated on all AMPs with hash fields (
"TELE2_UAT.CPE_MODEL_NEW.TAC"). Then we do a SORT to order
Spool 17 by row hash. The size of Spool 17 is estimated with
high confidence to be 49,024,320 rows (2,696,337,600 bytes).
Spool AsgnList:
"TAC" = "TELE2_UAT.CPE_MODEL_NEW.TAC",
"DEVICE_TYPE" = "DEVICE_TYPE".
The estimated time for this step is 2.81 seconds.
3) We do an all-AMPs JOIN step (No Sum) from Spool 11 (Last Use)
by way of a RowHash match scan, which is joined to Spool 12
(Last Use) by way of a RowHash match scan. Spool 11 and
Spool 12 are left outer joined using a merge join, with
condition(s) used for non-matching on left table ("NOT
(Spool_11.SUBS_ID IS NULL)"), with a join condition of (
"Spool_12.SUBS_ID = Spool_11.SUBS_ID"). The result goes into
Spool 18 (all_amps) (compressed columns allowed), which is
built locally on the AMPs with hash fields ("Spool_11.TAC").
Then we do a SORT to order Spool 18 by row hash. The size of
Spool 18 is estimated with low confidence to be 1,496,102,648
rows (152,602,470,096 bytes). Spool AsgnList:
"BRANCH_ID" = "{LeftTable}.BRANCH_ID",
"TAC" = "Spool_11.TAC",
"SUBS_ID" = "{LeftTable}.SUBS_ID",
"MSISDN" = "{LeftTable}.MSISDN",
"BSEGMENT" = "{LeftTable}.BSEGMENT",
"LT_MONTH" = "{LeftTable}.LT_MONTH",
"REVENUE_COMMERCE" = "{LeftTable}.REVENUE_COMMERCE",
"LAST_FLASH_DTTM" = "{LeftTable}.LAST_FLASH_DTTM",
"USG_MBOU" = "{RightTable}.USG_MBOU",
"USG_ARPU_CONTENT" = "{RightTable}.USG_ARPU_CONTENT".
The estimated time for this step is 3 minutes and 45 seconds.
7) We execute the following steps in parallel.
1) We do an all-AMPs JOIN step (No Sum) from
PRD2_DIC.MACROREGION in view PRD2_DIC_V2.MACROREGION by way
of a RowHash match scan with a condition of (
"(PRD2_DIC.MACROREGION in view PRD2_DIC_V2.MACROREGION.EDATE
>= DATE '2017-11-01') AND (PRD2_DIC.MACROREGION in view
PRD2_DIC_V2.MACROREGION.SDATE <= DATE '2017-11-30')"), which
is joined to Spool 14 (Last Use) by way of a RowHash match
scan. PRD2_DIC.MACROREGION and Spool 14 are right outer
joined using a merge join, with condition(s) used for
non-matching on right table ("NOT (Spool_14.MACROREGION_CODE
IS NULL)"), with a join condition of (
"Spool_14.MACROREGION_CODE = PRD2_DIC.MACROREGION.MR_CODE").
The result goes into Spool 19 (all_amps) (compressed columns
allowed), which is duplicated on all AMPs with hash fields (
"Spool_14.BRANCH_ID"). The size of Spool 19 is estimated
with low confidence to be 34,240 rows (1,712,000 bytes).
Spool AsgnList:
"BRANCH_ID" = "Spool_14.BRANCH_ID",
"MR_NAME" = "{LeftTable}.MR_NAME".
The estimated time for this step is 0.04 seconds.
2) We do an all-AMPs JOIN step (No Sum) from Spool 17 (Last Use)
by way of a RowHash match scan, which is joined to Spool 18
(Last Use) by way of a RowHash match scan. Spool 17 and
Spool 18 are right outer joined using a merge join, with
condition(s) used for non-matching on right table ("NOT
(Spool_18.TAC IS NULL)"), with a join condition of (
"Spool_18.TAC = Spool_17.TAC"). The result goes into Spool
22 (all_amps) (compressed columns allowed), which is built
locally on the AMPs with hash fields ("Spool_18.BRANCH_ID").
The size of Spool 22 is estimated with low confidence to be
1,496,102,648 rows (204,966,062,776 bytes). Spool AsgnList:
"BRANCH_ID" = "Spool_18.BRANCH_ID",
"SUBS_ID" = "{RightTable}.SUBS_ID",
"TAC" = "{RightTable}.TAC",
"MSISDN" = "{RightTable}.MSISDN",
"BSEGMENT" = "{RightTable}.BSEGMENT",
"LT_MONTH" = "{RightTable}.LT_MONTH",
"REVENUE_COMMERCE" = "{RightTable}.REVENUE_COMMERCE",
"LAST_FLASH_DTTM" = "{RightTable}.LAST_FLASH_DTTM",
"DEVICE_TYPE" = "{LeftTable}.DEVICE_TYPE",
"USG_MBOU" = "{RightTable}.USG_MBOU",
"USG_ARPU_CONTENT" = "{RightTable}.USG_ARPU_CONTENT".
The estimated time for this step is 1 minute and 23 seconds.
8) We do an all-AMPs JOIN step (No Sum) from Spool 19 (Last Use) by
way of an all-rows scan, which is joined to Spool 22 (Last Use) by
way of an all-rows scan. Spool 19 is used as the hash table and
Spool 22 is used as the probe table in a right outer joined using
a single partition classical hash join, with condition(s) used for
non-matching on right table ("NOT (Spool_22.BRANCH_ID IS NULL)"),
with a join condition of ("Spool_22.BRANCH_ID = Spool_19.BRANCH_ID").
The result goes into Spool 10 (all_amps) (compressed columns
allowed), which is built locally on the AMPs with Field1 ("28364").
The size of Spool 10 is estimated with low confidence to be
1,496,102,648 rows (260,321,860,752 bytes). Spool AsgnList:
"Field_1" = "28364",
"Spool_10.SUBS_ID" = "{ Copy }{RightTable}.SUBS_ID",
"Spool_10.TAC" = "{ Copy }{RightTable}.TAC",
"Spool_10.MSISDN" = "{ Copy }{RightTable}.MSISDN",
"Spool_10.BRANCH_ID" = "{ Copy }{RightTable}.BRANCH_ID",
"Spool_10.BSEGMENT" = "{ Copy }{RightTable}.BSEGMENT",
"Spool_10.LT_MONTH" = "{ Copy }{RightTable}.LT_MONTH",
"Spool_10.REVENUE_COMMERCE" = "{ Copy
}{RightTable}.REVENUE_COMMERCE",
"Spool_10.LAST_FLASH_DTTM" = "{ Copy }{RightTable}.LAST_FLASH_DTTM",
"Spool_10.DEVICE_TYPE" = "{ Copy }{RightTable}.DEVICE_TYPE",
"Spool_10.USG_MBOU" = "{ Copy }{RightTable}.USG_MBOU",
"Spool_10.USG_ARPU_CONTENT" = "{ Copy
}{RightTable}.USG_ARPU_CONTENT",
"Spool_10.MR_NAME" = "{ Copy }{LeftTable}.MR_NAME".
The estimated time for this step is 1 minute and 45 seconds.
9) We do an all-AMPs STAT FUNCTION step from Spool 10 by way of an
all-rows scan into Spool 29, which is redistributed by hash code
to all AMPs. The result rows are put into Spool 9 (group_amps),
which is built locally on the AMPs with Field1 ("Field_1"). This
step is used to retrieve the TOP 100000000 rows. Load
distribution optimization is used. If this step retrieves less
than 100000000 rows, then execute step 10. The size is estimated
with low confidence to be 100,000,000 rows (25,000,000,000 bytes).
10) We do an all-AMPs STAT FUNCTION step from Spool 10 (Last Use) by
way of an all-rows scan into Spool 29 (Last Use), which is
redistributed by hash code to all AMPs. The result rows are put
into Spool 9 (group_amps), which is built locally on the AMPs with
Field1 ("Field_1"). This step is used to retrieve the TOP
100000000 rows. The size is estimated with low confidence to be
100,000,000 rows (25,000,000,000 bytes).
11) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> The contents of Spool 9 are sent back to the user as the result of
statement 1.
What can I do here?
Most databases show wrong estimates, and this is OK as long as the relationship between those estimates is good enough to produce a decent execution plan.
Now, if you think the execution plan is wrong, then you should seriously care about those estimates. Did you update the table statistics recently?
Otherwise, I wouldn't worry too much about it.
There is a snippet of product code that does some row checks. It is actually migrated code that came into Teradata, and no one has bothered to change it to be, shall I say, Teradata savvy.
This code now throws
2646 : No More spool...
and that is not really a spool shortage but a result of data skew, as would be evident to any Teradata Master.
The code logic is plain stupid, but they are running it in Prod. A code change is NOT an option now because this is production. I could rewrite it using a simple NOT EXISTS and the query would run fine.
EXPLAIN SELECT ((COALESCE(FF.SKEW_COL,-99999))) AS Cnt1,
COUNT(*) AS Cnt
FROM DB.10_BILLON_FACT FF
WHERE FF.SKEW_COL IN(
SELECT F.SKEW_COL
FROM DB.10_BILLON_FACT F
EXCEPT
SELECT D.DIM_COL
FROM DB.Smaller_DIM D
)
GROUP BY 1 -- implied by the plan below, which groups by FF.SKEW_COL
It's failing because it wants to redistribute on SKEW_COL. WHATEVER I DO, THIS WILL NOT CHANGE. SKEW_COL is 99% skewed.
Here's the explain. It FAILS ON STEP 4.1.
This query is optimized using type 2 profile insert-sel, profileid
10001.
1) First, we lock a distinct DB."pseudo table" for read on a
RowHash to prevent global deadlock for DB.F.
2) Next, we lock a distinct DB."pseudo table" for read on a
RowHash to prevent global deadlock for DB.D.
3) We lock DB.F for read, and we lock DB.D for read.
4) We execute the following steps in parallel.
1) We do an all-AMPs RETRIEVE step from DB.F by way of an
all-rows scan with no residual conditions into Spool 6
(all_amps), which is redistributed by the hash code of (
DB.F.SKEW_COL) to all AMPs. Then we
do a SORT to order Spool 6 by row hash and the sort key in
spool field1 eliminating duplicate rows. The size of Spool 6
is estimated with low confidence to be 989,301 rows (
28,689,729 bytes). The estimated time for this step is 1
minute and 36 seconds.
2) We do an all-AMPs RETRIEVE step from DB.D by way of an
all-rows scan with no residual conditions into Spool 7
(all_amps), which is built locally on the AMPs. Then we do a
SORT to order Spool 7 by the hash code of (
DB.D.DIM_COL). The size of Spool 7 is
estimated with low confidence to be 6,118,545 rows (
177,437,805 bytes). The estimated time for this step is 0.11
seconds.
5) We do an all-AMPs JOIN step from Spool 6 (Last Use) by way of an
all-rows scan, which is joined to Spool 7 (Last Use) by way of an
all-rows scan. Spool 6 and Spool 7 are joined using an exclusion
merge join, with a join condition of ("Field_1 = Field_1"). The
result goes into Spool 1 (all_amps), which is built locally on the
AMPs. The size of Spool 1 is estimated with low confidence to be
494,651 rows (14,344,879 bytes). The estimated time for this step
is 3.00 seconds.
6) We execute the following steps in parallel.
1) We do an all-AMPs RETRIEVE step from Spool 1 (Last Use) by
way of an all-rows scan into Spool 5 (all_amps), which is
redistributed by the hash code of (
DB.F.SKEW_COL) to all AMPs. Then we
do a SORT to order Spool 5 by row hash. The size of Spool 5
is estimated with low confidence to be 494,651 rows (
12,366,275 bytes). The estimated time for this step is 0.13
seconds.
2) We do an all-AMPs RETRIEVE step from DB.FF by way of an
all-rows scan with no residual conditions into Spool 8
(all_amps) fanned out into 24 hash join partitions, which is
built locally on the AMPs. The size of Spool 8 is estimated
with high confidence to be 2,603,284,805 rows (
54,668,980,905 bytes). The estimated time for this step is
24.40 seconds.
7) We do an all-AMPs RETRIEVE step from Spool 5 (Last Use) by way of
an all-rows scan into Spool 9 (all_amps) fanned out into 24 hash
join partitions, which is duplicated on all AMPs. The size of
Spool 9 is estimated with low confidence to be 249,304,104 rows (
5,235,386,184 bytes). The estimated time for this step is 1.55
seconds.
8) We do an all-AMPs JOIN step from Spool 8 (Last Use) by way of an
all-rows scan, which is joined to Spool 9 (Last Use) by way of an
all-rows scan. Spool 8 and Spool 9 are joined using a inclusion
hash join of 24 partitions, with a join condition of (
"SKEW_COL = SKEW_COL"). The
result goes into Spool 4 (all_amps), which is built locally on the
AMPs. The size of Spool 4 is estimated with index join confidence
to be 1,630,304,007 rows (37,496,992,161 bytes). The estimated
time for this step is 11.92 seconds.
9) We do an all-AMPs SUM step to aggregate from Spool 4 (Last Use) by
way of an all-rows scan , grouping by field1 (
DB.FF.SKEW_COL). Aggregate Intermediate
Results are computed globally, then placed in Spool 11. The size
of Spool 11 is estimated with low confidence to be 494,651 rows (
14,344,879 bytes). The estimated time for this step is 35.00
seconds.
10) We do an all-AMPs RETRIEVE step from Spool 11 (Last Use) by way of
an all-rows scan into Spool 2 (group_amps), which is built locally
on the AMPs. The size of Spool 2 is estimated with low confidence
to be 494,651 rows (16,323,483 bytes). The estimated time for
this step is 0.01 seconds.
11) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> The contents of Spool 2 are sent back to the user as the result of
statement 1. The total estimated time is 2 minutes and 52 seconds.
There are some 900K unique values of SKEW_COL (interestingly, there are 6 million unique values for DIM_COL, which is why I think the optimizer is veering towards the fact table column). But still, it knows from the low number of unique values in the bigger table that the column is badly skewed.
My question is: knowing that SKEW_COL is 99% skewed due to a constant value like -9999, why does the optimizer still redistribute by this skewed column instead of using the alternative PRPD approach? A similar (but not identical) situation happened in the past, but when we upgraded to a faster box (more AMPs) it went away.
Does anything come to mind that would make it change plans? I tried most diagnostics, with no result. I created an SI (on a similar VT, but it will still skew). Skewing is inevitable (I am aware that you can artificially change the data to minimize it, but that is not an option after the fact; we are in PROD now). But even after it knows the column is skewed, why redistribute by it when other options are available?
It's not the NULL value that is skewing. It's a constant flag value (probably a value representing NULL, like -9999, as I mentioned in the post) that is causing the skew. If you rewrite the query as I updated it, it works fine. I preferred NOT EXISTS because it does not need NULL checking (as a practice, though from my DD knowledge I know both columns are declared NOT NULL). I have updated the post with alternative code that will work (though, as I explained, I finalized with the NOT EXISTS version):
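To put numbers on the skew described above, a quick check along these lines may help. This is only a sketch: it assumes the fact table is the DB.FF shown in the EXPLAIN and that SKEW_COL is the column being redistributed. HASHROW, HASHBUCKET and HASHAMP are standard Teradata hash functions; everything else is taken from the post.

```sql
-- Which values dominate SKEW_COL? (a constant such as -9999 should surface at the top)
SELECT TOP 10
       ff.SKEW_COL,
       COUNT(*) AS row_cnt
FROM DB.FF ff
GROUP BY ff.SKEW_COL
ORDER BY row_cnt DESC;

-- How many rows would land on each AMP if the table were redistributed by SKEW_COL?
SELECT HASHAMP(HASHBUCKET(HASHROW(ff.SKEW_COL))) AS amp_no,
       COUNT(*) AS row_cnt
FROM DB.FF ff
GROUP BY 1
ORDER BY 2 DESC;
```

If one AMP in the second result holds the bulk of the rows, that is the AMP that will run out of spool when the optimizer redistributes by this column.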
SELECT COUNT(*), f.SKEW_COL
FROM (
    SELECT ff.SKEW_COL
    FROM DB.10_BILLON_FACT ff
    WHERE ff.SKEW_COL NOT IN (
        SELECT d.DIM_COL
        FROM DB.Smaller_DIM d
    )
) AS f
GROUP BY f.SKEW_COL;
Can I not get the optimizer's query rewrite feature to think through the query and rewrite it with the above logic? The above will not redistribute; it will just sort by the skewed column.
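The NOT EXISTS version mentioned above is not shown in the post. A minimal sketch of what it might look like, assuming the same columns as the NOT IN version and using DB.FF (the fact table name from the EXPLAIN) as a stand-in for the 10-billion-row table:

```sql
-- Sketch only: NOT EXISTS form of the NOT IN query above.
-- DB.FF stands in for the large fact table; DB.Smaller_DIM is the name used in the post.
SELECT COUNT(*), f.SKEW_COL
FROM (
    SELECT ff.SKEW_COL
    FROM DB.FF ff
    WHERE NOT EXISTS (
        SELECT 1
        FROM DB.Smaller_DIM d
        WHERE d.DIM_COL = ff.SKEW_COL
    )
) AS f
GROUP BY f.SKEW_COL;
```

When both columns are declared NOT NULL, as stated above, the two forms return the same rows. NOT EXISTS simply avoids the three-valued-logic surprises that NOT IN has when the subquery can return NULL. Whether the optimizer picks the same plan for both should be confirmed with EXPLAIN.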
Until you can replace the SQL, adding spool may be your only option.
Make sure your stats are current or consider a join index with an alternative PI that covers this particular query without having to do the redistribution. You may have a skewed JI but if the work can be done AMP local you may be able to address the spool issue.
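As a rough illustration of the two suggestions above, the statements below refresh statistics on the skewed column and define a join index with a different primary index. The names FF_ALT_JI and ALT_PI_COL are hypothetical: the thread does not say which column would make a better PI, so that choice is an assumption to be replaced with a real, evenly distributed column.

```sql
-- Refresh statistics so the optimizer sees the current skew on the join column.
COLLECT STATISTICS COLUMN (SKEW_COL) ON DB.FF;

-- Check when statistics were last collected and the unique-value counts.
HELP STATISTICS DB.FF;

-- Hypothetical single-table join index with an alternative primary index.
-- ALT_PI_COL is a placeholder for whatever column distributes evenly.
CREATE JOIN INDEX DB.FF_ALT_JI AS
SELECT ff.SKEW_COL,
       ff.ALT_PI_COL
FROM DB.FF ff
PRIMARY INDEX (ALT_PI_COL);
```

After creating the index, re-run EXPLAIN to see whether the optimizer actually uses it. A join index on a table of this size also costs extra perm space and maintenance on every load.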
I would appreciate it if you can help me with a problem that I have.
I have this join condition:
SELECT *
FROM
T1_STAGING.(first_table) AS STG
JOIN T1_STAGING.(second_table) AS B
ON
(
STG.DLOF_ID_NO=B.DLOF_ID_NO_RU
)
This simple join is taking too long to finish: more than 20 minutes. The data in each table is less than 600,000K. I tried the following things:
I took statistics on each table.
I changed the columns to be PRIMARY INDEX.
I created JOIN INDEX for the second table but still nothing!
The query never ends; it takes 20+ minutes. This seems to be a data distribution problem in the second table, but I can't do anything with the data.
Please bear in mind that if I join my first_table with any other table, it takes only seconds.
Can you give me a suggestion to try? I need to optimize it for better performance.
Here is the explain of TERADATA:
Explain SEL *
FROM
T1_STAGING.DLS_DLO_OWS_STAGE_STG AS STG
JOIN T1_STAGING.DLS_ACQUISITION_STG AS B
ON
(
STG.DLOF_ID_NO=B.DLOF_ID_NO_RU
)
1) First, we lock a distinct T1_STAGING."pseudo table" for read on a
RowHash to prevent global deadlock for T1_STAGING.STG.
2) Next, we lock a distinct T1_STAGING."pseudo table" for read on a
RowHash to prevent global deadlock for T1_STAGING.B.
3) We lock T1_STAGING.STG for read, and we lock T1_STAGING.B for read.
4) We execute the following steps in parallel.
1) We do an all-AMPs RETRIEVE step from T1_STAGING.B by way of
an all-rows scan with no residual conditions split into Spool
2 (all_amps) with a condition of ("DLOF_ID_NO_RU IN (:)") to
qualify skewed rows and Spool 3 (all_amps) with a condition
of ("DLOF_ID_NO_RU IN (:)") to qualify rows matching skewed
rows of the skewed relation and Spool 4 (all_amps) with
remaining rows fanned out into 2 hash join partitions. Spool
2 is built locally on the AMPs. Then we do a SORT to order
Spool 2 by row hash. The size of Spool 2 is estimated with
high confidence to be 303 rows. Spool 3 is built locally on
the AMPs. The size of Spool 3 is estimated with high
confidence to be 4,710 rows. Spool 4 is redistributed by
hash code to all AMPs. The size of Spool 4 is estimated with
high confidence to be 97,742 rows. The estimated time for
this step is 1.27 seconds.
2) We do an all-AMPs RETRIEVE step from T1_STAGING.STG by way of
an all-rows scan with no residual conditions split into Spool
6 (all_amps) with a condition of ("DLOF_ID_NO IN (:)") to
qualify skewed rows and Spool 5 (all_amps) with a condition
of ("DLOF_ID_NO IN (:)") to qualify rows matching skewed
rows of the skewed relation and Spool 7 (all_amps) with
remaining rows fanned out into 2 hash join partitions. Spool
6 is built locally on the AMPs. The size of Spool 6 is
estimated with high confidence to be 21,587 rows. Spool 5 is
built locally on the AMPs. The size of Spool 5 is estimated
with high confidence to be 7 rows. Spool 7 is redistributed
by hash code to all AMPs. The size of Spool 7 is estimated
with high confidence to be 301,682 rows. The estimated time
for this step is 4.20 seconds.
5) We execute the following steps in parallel.
1) We do an all-AMPs RETRIEVE step from Spool 5 (Last Use) by
way of an all-rows scan into Spool 8 (all_amps), which is
duplicated on all AMPs. Then we do a SORT to order Spool 8
by the hash code of (T1_STAGING.STG.DLOF_ID_NO). The size of
Spool 8 is estimated with high confidence to be 336 rows (
640,080 bytes). The estimated time for this step is 0.01
seconds.
2) We do an all-AMPs RETRIEVE step from Spool 3 (Last Use) by
way of an all-rows scan into Spool 9 (all_amps), which is
duplicated on all AMPs. The result spool file will not be
cached in memory. The size of Spool 9 is estimated with high
confidence to be 226,080 rows (391,796,640 bytes). The
estimated time for this step is 1.05 seconds.
6) We do an all-AMPs JOIN step from Spool 8 (Last Use) by way of a
RowHash match scan, which is joined to Spool 2 (Last Use) by way
of a RowHash match scan. Spool 8 and Spool 2 are joined using a
merge join, with a join condition of ("DLOF_ID_NO = DLOF_ID_NO_RU").
The result goes into Spool 1 (group_amps), which is built locally
on the AMPs. The result spool file will not be cached in memory.
The size of Spool 1 is estimated with low confidence to be 2,121
rows (11,491,578 bytes). The estimated time for this step is 0.03
seconds.
7) We do an all-AMPs JOIN step from Spool 6 (Last Use) by way of an
all-rows scan, which is joined to Spool 9 (Last Use) by way of an
all-rows scan. Spool 6 and Spool 9 are joined using a single
partition hash join, with a join condition of ("DLOF_ID_NO =
DLOF_ID_NO_RU"). The result goes into Spool 1 (group_amps), which
is built locally on the AMPs. The result spool file will not be
cached in memory. The size of Spool 1 is estimated with low
confidence to be 9,243,161 rows (50,079,446,298 bytes). The
estimated time for this step is 0.60 seconds.
8) We do an all-AMPs JOIN step from Spool 4 (Last Use) by way of an
all-rows scan, which is joined to Spool 7 (Last Use) by way of an
all-rows scan. Spool 4 and Spool 7 are joined using a hash join
of 2 partitions, with a join condition of ("DLOF_ID_NO =
DLOF_ID_NO_RU"). The result goes into Spool 1 (group_amps), which
is built locally on the AMPs. The result spool file will not be
cached in memory. The size of Spool 1 is estimated with low
confidence to be 731,525 rows (3,963,402,450 bytes). The
estimated time for this step is 0.96 seconds.
9) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of
statement 1. The total estimated time is 6.84 seconds.
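The EXPLAIN above splits both tables into "skewed rows" spools, and step 7 alone estimates 9,243,161 result rows (50,079,446,298 bytes). That suggests a few join-key values repeat heavily on both sides and multiply in the join. A sketch to check this, using the table and column names from the EXPLAIN (the output column aliases are made up for illustration):

```sql
-- For each join key, count rows on both sides and how many rows the join would produce.
SELECT TOP 20
       s.DLOF_ID_NO,
       s.cnt AS stg_rows,
       b.cnt AS acq_rows,
       CAST(s.cnt AS BIGINT) * b.cnt AS joined_rows
FROM (
    SELECT DLOF_ID_NO, COUNT(*) AS cnt
    FROM T1_STAGING.DLS_DLO_OWS_STAGE_STG
    GROUP BY 1
) AS s
JOIN (
    SELECT DLOF_ID_NO_RU, COUNT(*) AS cnt
    FROM T1_STAGING.DLS_ACQUISITION_STG
    GROUP BY 1
) AS b
  ON s.DLOF_ID_NO = b.DLOF_ID_NO_RU
ORDER BY joined_rows DESC;
```

If a handful of keys account for most of joined_rows, the time is probably going into producing and returning a very wide SELECT * result rather than into the join strategy itself.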
600,000K, you mean 600M, right? ;) That's not that little.
1. The columns used in the JOIN should be indexed. (Looks like you've done that.)
2. Add a WHERE condition to choose the specific values you need.
3. Limit the SELECT result (Teradata uses TOP n or SAMPLE n rather than LIMIT).
4. Finally, use EXPLAIN SELECT ... to understand the issue.
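Put together, the suggestions above might look something like the following. Treat it as a sketch: the filter value is hypothetical (the thread does not give the data type or any real values of DLOF_ID_NO), and because Teradata has no LIMIT clause, TOP is used in its place.

```sql
-- Sketch: name only the needed columns, filter early, cap the result, and inspect the plan.
EXPLAIN
SELECT TOP 100
       STG.DLOF_ID_NO,
       B.DLOF_ID_NO_RU
FROM T1_STAGING.DLS_DLO_OWS_STAGE_STG AS STG
JOIN T1_STAGING.DLS_ACQUISITION_STG AS B
  ON STG.DLOF_ID_NO = B.DLOF_ID_NO_RU
WHERE STG.DLOF_ID_NO = 12345;  -- hypothetical value; adjust to the column's real type
```

Removing the EXPLAIN keyword runs the query itself. Comparing the estimated spool sizes with and without the WHERE clause shows how much of the cost comes from the size of the result.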