Creating a materialized view fills up the complete temp space - sql

I want to create a materialized view (MV); please see the SQL query below. When I try to create the materialized view, my temp tablespace (~128 GB) is completely used and I get the error below:
SQL Error: ORA-12801: error signaled in parallel query server P007
ORA-01652: unable to extend temp segment by 64 in tablespace TEMP1
12801. 00000 - "error signaled in parallel query server %s"
I then checked in OEM and saw that it used a degree of parallelism of 8, so I disabled parallelism with ALTER SESSION DISABLE PARALLEL QUERY. The MV then ran for several hours but was eventually created. Please suggest whether there is any approach to create it without using so much temp space. The SELECT for this MV returns around 55 million rows. Any suggestions are really appreciated.
DB: Oracle 11gR2
CREATE MATERIALIZED VIEW TEST NOLOGGING
REFRESH FORCE ON DEMAND
ENABLE QUERY REWRITE
AS
select
table4.num as "Number",table4.num as "SNum",
table4.status as "S_status",
'Open' as "NLP",
create_table2.fmonth as "SMN",
table6.wgrp as "SOW",
(table2.end_dt - create_table2.dt) as "elp",
table6.d_c as "SDC",
create_table2.fiscal_quarter_name as "SQN",
'TS' as "SSL",
table3.table3_id as "SR Owner CEC ID",
table4.sev as "ssev",
SUBSTR(table8.stech,1,INSTR(table8.stech,'=>')-1) as "srtech",
SUBSTR(table8.stech,INSTR(table8.stech,'=>')+2) as "srstech",
table5.sr_type as "SR Type",
table5.problem_code as "SR Problem Code",
--null as "SR Entry Channel",
--null as "SR Time in Status (Days)",
table6.center,
table6.th1col,
table6.master_theater,
table6.rol_3,
table7.center hier_17_center,
table7.rol_1,
table7.rol_2,
table7.rol_3 wg,
table2.dt as "SBD",
table2.wk_n as "SBFW",
table2.fmonth as "SBFM",
table3.defect_indicator as "Has Defect",
table2.sofw,
table2.sofm
from
A table1
join B table2 on (table1.date_id = table2.dw_date_key)
join C table3 on (table1.date_id = table3.date_id and table1.incident_id = table3.incident_id)
join D table4 on (table3.incident_id = table4.incident_id and table4.key_d <= table3.date_id and table3.table3_id = table4.current_owner_table3_id)
join E table5 on table4.incident_id = table5.incident_id
join B create_table2 on (table5.creation_dw_date_key = create_table2.dw_date_key)
join F table6 on (table1.objectnumber=table6.DW_WORKGROUP_KEY)
join G table7 on (table1.objectnumber=table7.DW_WORKGROUP_KEY)
left outer JOIN H table8 ON (table8.natural_key= table5.UPDATED_COT_TECH_KEY)
where
table4.bl_incident_key in (select max(bl_incident_key) from D b
where b.incident_id=table3.incident_id and b.key_d <= table3.date_id and b.current_owner_table3_id = table3.table3_id)
and table2.fiscal_year_name in ('FY2013','FY2014')

Without knowing your system, tables, or data, I assume that
some of the 8 tables have many rows (>> 55 million)
the join predicates and filters will not reduce the amount of data significantly
so nearly all the data will be written to the MV
The execution plan will probably use some hash operations and/or sort aggregations.
This hashing and sorting cannot be done in memory if the hash and sort segments are too big,
so it will be done in temp.
8 parallel slots will probably use more temp than 1 session, so this can be the reason for the ORA error.
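If you want to verify this, you can watch temp consumption while the statement runs. The sketch below uses the V$SORT_USAGE and V$SESSION views (you need privileges on both; the 8192 factor assumes an 8 KB block size, so adjust it to your DB_BLOCK_SIZE):

```sql
-- Run in a second session while the CREATE MATERIALIZED VIEW is active.
SELECT s.sid,
       s.sql_id,
       u.tablespace,
       u.segtype,                         -- SORT, HASH, ...
       u.blocks * 8192 / 1024 / 1024 AS mb_used
FROM   v$sort_usage u
JOIN   v$session    s ON s.saddr = u.session_addr
ORDER  BY mb_used DESC;
```

This shows which session and which kind of workarea (sort vs. hash) is eating the temp space.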
You can
accept the several hours; such operations are normally done at night or on the weekend,
and it doesn't matter whether it takes 4 hours or 1
increase temp
try to scale the degree of parallelism with a hint: create .... as select /*+ parallel(n) */ table4.num...
Use 2, 4, or 8 for n to get 2, 4, or 8 slots
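In context, the statement header would look like this sketch (using a degree of 4 as an example; pick the degree your temp space can sustain):

```sql
CREATE MATERIALIZED VIEW TEST NOLOGGING
REFRESH FORCE ON DEMAND
ENABLE QUERY REWRITE
AS
SELECT /*+ parallel(4) */
       table4.num AS "Number",
       table4.num AS "SNum",
       -- ... rest of the select list, joins and filters unchanged ...
       table2.sofm
FROM ...
```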
Try some indexes on the joined columns, e.g.
TABLE1(DATE_ID, INCIDENT_ID)
TABLE1(OBJECTNUMBER)
TABLE2(DW_DATE_KEY)
TABLE2(FISCAL_YEAR_NAME)
TABLE3(DATE_ID, INCIDENT_ID, TABLE3_ID)
TABLE3(INCIDENT_ID, TABLE3_ID, DATE_ID)
TABLE4(INCIDENT_ID, CURRENT_OWNER_TABLE3_ID, KEY_D, BL_INCIDENT_KEY)
TABLE5(INCIDENT_ID)
TABLE5(CREATION_DW_DATE_KEY)
TABLE5(UPDATED_COT_TECH_KEY)
TABLE6(DW_WORKGROUP_KEY)
TABLE7(DW_WORKGROUP_KEY)
TABLE8(NATURAL_KEY)
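In DDL form, the first few suggestions would look roughly like this (index names are made up, and TABLE1/TABLE2 stand for the real table names behind the aliases in your query):

```sql
CREATE INDEX ix_t1_date_incident ON table1 (date_id, incident_id);
CREATE INDEX ix_t1_objectnumber  ON table1 (objectnumber);
CREATE INDEX ix_t2_dw_date_key   ON table2 (dw_date_key);
```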
And use EXPLAIN PLAN for the different SQL statements to see which plan Oracle will generate.
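A minimal way to do that from SQL*Plus, assuming the PLAN_TABLE exists in your schema:

```sql
EXPLAIN PLAN FOR
SELECT /*+ parallel(4) */ ...;   -- your MV query here

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```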

Related

Oracle Query takes ages to execute

I have the Oracle query below. It takes ages to execute.
Select Distinct Z.WH_Source,
substr(Z.L_Y_Month,0,4) || '-' || substr(Z.L_Y_Month,5) Ld_Yr_Mth,
m.model_Name, p.SR, p.PLATE_NO, pp.value, z.CNT_number, z.platform_SR_number,
z.account_name, z.owner_name, z.operator_name, z.jetcare_expiry_date, z.wave,
z.address, z.country, substr(z.CNT_status, 10) ctstatus,
ALLOEM.GET_CNT_TYRE_SR#TNS_GG(z.CNT_number, Z.WH_Source, Z.L_Y_Month,
z.platform_SR_number, '¿')
product_SR_number
From MST.ROLE p
inner join MST.model m on m.model_id = p.model_id
left join MST.ROLEproperty pp on pp.ROLE_id = p.ROLE_id
and pp.property_lookup = 'SSG-WH-ENROLL'
left join alloem.Z_SSG_HM_LOG#TNS_GG z on z.camp_ac_ROLE_id = p.ROLE_id
Where
1 = 1 or z.L_Y_Month = 1
Order By 1, 2 desc, 3,4
If I remove this line,
ALLOEM.GET_CNT_TYRE_SR#TNS_GG(z.CNT_number, Z.WH_Source, Z.L_Y_Month,
z.platform_SR_number, '¿')
it executes very fast. But I can't remove the line. Is there any way to make this query execute faster?
Query tuning is a complex thing. Without table structures, indexes, an execution plan, or statistics it is very hard to provide one universal answer.
Anyway, I would try scalar subquery caching (if applicable):
ALLOEM.GET_CNT_TYRE_SR#TNS_GG(z.CNT_number, Z.WH_Source, Z.L_Y_Month,
z.platform_SR_number, '¿')
=>
(SELECT ALLOEM.GET_CNT_TYRE_SR#TNS_GG(z.CNT_number, Z.WH_Source,Z.L_Y_Month,
z.platform_SR_number, '¿') FROM dual)
Also, usage of DISTINCT may indicate a problem with normalization. If possible, fix the underlying problem and then remove it.
Finally, you should avoid positional ORDER BY (it is a common anti-pattern).
This:
alloem.Z_SSG_HM_LOG#TNS_GG
suggests that you fetch data over a database link. That is usually slower than fetching data locally. So, if you can afford it, and if your query manipulates "static" data (i.e. nothing in the Z_SSG_HM_LOG table changes frequently) or, even if it does, the amount of data isn't very high, consider creating a materialized view (MV) in the schema you're connected to while running that query. You can even create indexes on an MV, so hopefully everything will run faster without too much effort.
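A rough sketch of that idea (the MV and index names are made up; choose a refresh strategy matching how stale the data may be):

```sql
CREATE MATERIALIZED VIEW mv_ssg_hm_log
BUILD IMMEDIATE
REFRESH COMPLETE ON DEMAND
AS
SELECT * FROM alloem.Z_SSG_HM_LOG#TNS_GG;

-- Optional: index the join column used by the slow query.
CREATE INDEX ix_mv_ssg_hm_log ON mv_ssg_hm_log (camp_ac_ROLE_id);
```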

PostgreSQL No space left Error for a self-join query

I have the following query, which produces a matrix of products that are bought together, i.e. have the same ticket_id. The table calc_base has 500 million rows (43 GB). This query runs on a machine with 122 GB RAM, 16 CPUs, and a 600 GB SSD. CREATE INDEX ON calc_base(TICKET_ID);
create table calc_tmp as
select
a.product_id x_product_id,
a.product_desc x_product_desc,
b.product_id y_product_id,
b.product_desc y_product_desc,
a.units x_units,
b.units y_units,
a.sales x_sales,
b.sales y_sales,
a.flag x_flag,
b.flag y_flag
from calc_base a
inner join calc_base b on a.ticket_id = b.ticket_id;
All other queries work fine; only this query threw this error after 45 minutes:
org.postgresql.util.PSQLException: ERROR: could not extend file "base/12407/18990.223": No space left on device
Hint: Check free disk space.
at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2455)
at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2155)
at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:288)
at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:430)
at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:356)
at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:168)
at org.postgresql.jdbc.PgPreparedStatement.executeQuery(PgPreparedStatement.java:116)
at org.apache.commons.dbcp2.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:83)
at org.apache.commons.dbcp2.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:83)
at dbAnalysis.config.NamedParamStatement.executeQuery(NamedParamStatement.java:31)
at dbAnalysis.dao.DbAccess.profile(DbAccess.java:61)
at dbAnalysis.Benchmark.perform(Benchmark.java:63)
at dbAnalysis.controller.ConsoleApplication.main(ConsoleApplication.java:95)
Is it related to the temporary file size?
I want to know why this kind of behaviour happens in PostgreSQL.
I appreciate any suggestion to solve this problem.
You clearly have lots of duplicates in ticket_id. To see the number of rows generated, you can run the following query:
select sum(cnt * cnt)
from (select cb.ticket_id, count(*) as cnt
from calc_base cb
group by cb.ticket_id
) cb;
Actually, I realized that the above would count NULL, whereas your query would filter it out. Add where cb.ticket_id is not null to the subquery if the value can be NULL.
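The blow-up is easy to reproduce on a toy table (a hypothetical calc_base_demo, where ticket 1 appears 3 times and ticket 2 twice, so the self-join produces 3*3 + 2*2 = 13 rows, exactly what sum(cnt * cnt) predicts):

```sql
CREATE TABLE calc_base_demo (ticket_id int, product_id int);
INSERT INTO calc_base_demo VALUES (1, 10), (1, 11), (1, 12), (2, 20), (2, 21);

-- Rows actually produced by the self-join: 13
SELECT count(*)
FROM calc_base_demo a
JOIN calc_base_demo b ON a.ticket_id = b.ticket_id;

-- Rows predicted by the aggregate: 13
SELECT sum(cnt * cnt)
FROM (SELECT count(*) AS cnt FROM calc_base_demo GROUP BY ticket_id) t;
```

With 500 million rows, even a modest average duplication per ticket_id multiplies the output far beyond the 600 GB disk.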

Google Big Query - Error: 37.1 - 38.24: The JOIN operator's right-side table must be a small table

I got this error message when trying to execute this SQL query in Google BigQuery.
Error: 37.1 - 38.24: The JOIN operator's right-side table must be a
small table. Switch the tables if the left-side table is smaller, or
use JOIN EACH if both tables are larger than the maximum described at
https://cloud.google.com/bigquery/docs/reference/legacy-sql#joins
SQL query
SELECT
a.login,
e.CommitCommentEvent,
e.CreateEvent,
e.DeleteEvent,
e.DeploymentEvent,
e.DeploymentStatusEvent,
e.DownloadEvent,
e.FollowEvent,
e.ForkEvent,
e.ForkApplyEvent,
e.GistEvent,
e.GollumEvent,
e.IssueCommentEvent,
e.IssuesEvent,
e.MemberEvent,
e.MembershipEvent,
e.PageBuildEvent,
e.PublicEvent,
e.PullRequestEvent,
e.PullRequestReviewCommentEvent,
e.PushEvent,
e.ReleaseEvent,
e.RepositoryEvent,
e.StatusEvent,
e.TeamAddEvent,
e.WatchEvent
FROM (
SELECT actor.login AS login,
IFNULL(SUM(IF(type='CommitCommentEvent',1,NULL)),0) AS CommitCommentEvent,
IFNULL(SUM(IF(type='CreateEvent',1,NULL)),0) AS CreateEvent,
IFNULL(SUM(IF(type='DeleteEvent',1,NULL)),0) AS DeleteEvent,
IFNULL(SUM(IF(type='DeploymentEvent',1,NULL)),0) AS DeploymentEvent,
IFNULL(SUM(IF(type='DeploymentStatusEvent',1,NULL)),0) AS DeploymentStatusEvent,
IFNULL(SUM(IF(type='DownloadEvent',1,NULL)),0) AS DownloadEvent,
IFNULL(SUM(IF(type='FollowEvent',1,NULL)),0) AS FollowEvent,
IFNULL(SUM(IF(type='ForkEvent',1,NULL)),0) AS ForkEvent,
IFNULL(SUM(IF(type='ForkApplyEvent',1,NULL)),0) AS ForkApplyEvent,
IFNULL(SUM(IF(type='GistEvent',1,NULL)),0) AS GistEvent,
IFNULL(SUM(IF(type='GollumEvent',1,NULL)),0) AS GollumEvent,
IFNULL(SUM(IF(type='IssueCommentEvent',1,NULL)),0) AS IssueCommentEvent,
IFNULL(SUM(IF(type='IssuesEvent',1,NULL)),0) AS IssuesEvent,
IFNULL(SUM(IF(type='MemberEvent',1,NULL)),0) AS MemberEvent,
IFNULL(SUM(IF(type='MembershipEvent',1,NULL)),0) AS MembershipEvent,
IFNULL(SUM(IF(type='PageBuildEvent',1,NULL)),0) AS PageBuildEvent,
IFNULL(SUM(IF(type='PublicEvent',1,NULL)),0) AS PublicEvent,
IFNULL(SUM(IF(type='PullRequestEvent',1,NULL)),0) AS PullRequestEvent,
IFNULL(SUM(IF(type='PullRequestReviewCommentEvent',1,NULL)),0) AS PullRequestReviewCommentEvent,
IFNULL(SUM(IF(type='PushEvent',1,NULL)),0) AS PushEvent,
IFNULL(SUM(IF(type='ReleaseEvent',1,NULL)),0) AS ReleaseEvent,
IFNULL(SUM(IF(type='RepositoryEvent',1,NULL)),0) AS RepositoryEvent,
IFNULL(SUM(IF(type='StatusEvent',1,NULL)),0) AS StatusEvent,
IFNULL(SUM(IF(type='TeamAddEvent',1,NULL)),0) AS TeamAddEvent,
IFNULL(SUM(IF(type='WatchEvent',1,NULL)),0) AS WatchEvent
FROM (
TABLE_DATE_RANGE([githubarchive:day.events_],
DATE_ADD(CURRENT_TIMESTAMP(), -1, "YEAR"),
CURRENT_TIMESTAMP()
)) AS events
WHERE type IN ("CommitCommentEvent","CreateEvent","DeleteEvent","DeploymentEvent","DeploymentStatusEvent","DownloadEvent","FollowEvent",
"ForkEvent","ForkApplyEvent","GistEvent","GollumEvent","IssueCommentEvent","IssuesEvent","MemberEvent","MembershipEvent","PageBuildEvent",
"PublicEvent","PullRequestEvent","PullRequestReviewCommentEvent","PushEvent","ReleaseEvent","RepositoryEvent","StatusEvent","TeamAddEvent",
"WatchEvent")
GROUP BY 1
) AS e
JOIN [githubuser.malaysia] AS a
ON e.login = a.login
Change your JOIN line to use JOIN EACH.
Explicitly:
JOIN EACH [githubuser.malaysia] AS a
BigQuery handles joins for small tables differently than for large tables, and the query language currently forces you to choose between them.
Note to the BQ team: It would be nice to have the query planner automatically switch sides (when possible), or "upgrade" to JOIN EACH when necessary, and include a query option to do so, e.g. "Automatically optimize joins" that is on by default.

Optimize this query which exceeds the resource limit

SELECT DISTINCT
A.IDPRE
,A.IDARTB
,A.TIREGDAT
,B.IDDATE
,B.IDINFO
,C.TIINTRO
FROM
GLHAZQ A
,PRTINFO B
,PRTCON C
WHERE
B.IDARTB = A.IDARTB
AND B.IDPRE = A.IDPRE
AND C.IDPRE = A.IDPRE
AND C.IDARTB = A.IDARTB
AND C.TIINTRO = (
SELECT MIN(TIINTRO)
FROM
PRTCON D
WHERE D.IDPRE = A.IDPRE
AND D.IDARTB = A.IDARTB)
ORDER BY C.TIINTRO
I get the error below when I run this query (DB2):
SQL0495N Estimated processor cost of "000000012093" processor seconds
("000575872000" service units) in cost category "A" exceeds a resource limit error
threshold of "000007000005" service units. SQLSTATE=57051
Please help me fix this problem.
Apparently, the workload manager is doing its job in preventing you from using too many resources. You'll need to tune your query so that its estimated cost is lower than the threshold set by your DBA. You would start by examining the query explain plan as produced by db2exfmt. If you want help, publish the plan here, along with the table and index definitions.
To produce the explain plan, perform the following 3 steps:
Create explain tables by executing db2 -tf $INSTANCE_HOME/sqllib/misc/EXPLAIN.DDL
Generate the plan by executing the explain statement: db2 explain plan for select ...<the rest of your query>
Format the plan: db2exfmt -d <your db name> -1 (note the second parameter is the digit "1", not the letter "l").
To generate the table DDL statements use the db2look utility:
db2look -d <your db name> -o tables.sql -e -t GLHAZQ PRTINFO PRTCON
Although I am not a DB2 person, I suspect the query syntax is the same. In your query, you are doing a sub-select based on C.TIINTRO, which can kill performance. You are also querying for all records.
I would start the query by pre-querying the MIN() value, and since you are not using any other field from the "C" alias, leave it out.
SELECT DISTINCT
A.IDPRE,
A.IDARTB,
A.TIREGDAT,
B.IDDATE,
B.IDINFO,
PreQuery.TIINTRO
FROM
( SELECT D.IDPRE,
D.IDARTB,
MIN(D.TIINTRO) TIINTRO
from
PRTCON D
group by
D.IDPRE,
D.IDARTB ) PreQuery
JOIN GLHAZQ A
ON PreQuery.IDPre = A.IDPRE
AND PreQuery.IDArtB = A.IDArtB
JOIN PRTINFO B
ON PreQuery.IDPre = B.IDPRE
AND PreQuery.IDArtB = B.IDArtB
ORDER BY
PreQuery.TIINTRO
I would ensure you have indexes on
table Index keys
PRTCON (IDPRE, IDARTB, TIINTRO)
GLHAZQ (IDPRE, IDARTB)
PRTINFO (IDPRE, IDARTB)
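In DDL form (the index names are illustrative):

```sql
CREATE INDEX ix_prtcon_cover ON PRTCON (IDPRE, IDARTB, TIINTRO);
CREATE INDEX ix_glhazq_join  ON GLHAZQ (IDPRE, IDARTB);
CREATE INDEX ix_prtinfo_join ON PRTINFO (IDPRE, IDARTB);
```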
If you really DO need your "C" table, you could just add as another JOIN such as
JOIN PRTCON C
ON PreQuery.IDArtB = C.IDArtB
AND PreQuery.TIIntro = C.TIIntro
Given such run times, you might be better off with "covering indexes":
GLHAZQ table key ( IDPRE, IDARTB, TIREGDAT )
PRTINFO (IDPRE, IDARTB, IDDATE, IDINFO)
This way, the index has all the elements you are returning in the query, versus having to go back to the actual data pages; the values can be retrieved from the index directly.

The Same SQL Query takes longer to run in one DB than another DB under the same server

I have a SQL database server and 2 databases under it with the same structure and data. I run the same SQL query in the 2 databases; one of them takes longer, while the other completes in less than 50% of the time. They have different execution plans.
The query for the view is as below:
SELECT DISTINCT i.SmtIssuer, i.SecID, ra.AssetNameCurrency AS AssetIdCurrency, i.IssuerCurrency, seg.ProxyCurrency, shifts.ScenarioDate, ten.TenorID, ten.Tenor,
shifts.Shift, shifts.BusinessDate, shifts.ScenarioNum
FROM dbo.tblRrmIssuer AS i INNER JOIN
dbo.tblRrmSegment AS seg ON i.Identifier = seg.Identifier AND i.SegmentID = seg.SegmentID INNER JOIN
dbo.tblRrmAsset AS ra ON seg.AssetID = ra.AssetID INNER JOIN
dbo.tblRrmHistSimShift AS shifts ON seg.Identifier = shifts.Identifier AND i.SegmentID = shifts.SegmentID INNER JOIN
dbo.tblRrmTenor AS ten ON shifts.TenorID = ten.TenorID INNER JOIN
dbo.tblAsset AS a ON i.SmtIssuer = a.SmtIssuer INNER JOIN
dbo.tblRrmSource AS sc ON seg.SourceID = sc.SourceID
WHERE (a.AssetTypeID = 0) AND (sc.SourceName = 'CsVaR') AND (shifts.SourceID =
(SELECT SourceID
FROM dbo.tblRrmSource
WHERE (SourceName = 'CsVaR')))
The things I have already tried are: rebuilding and reorganizing the indexes on the table (tblRrmHistSimShift, which has over 2 million records), and checking for locks, other background processes, or errors on the server. Max degree of parallelism for the server is 0.
Is there anything more you can suggest to fix this issue?
The fact that you have two databases on the same server and with the same data set (as you said) does not ensure the same execution plan.
Here are some of the reasons why the query plans may differ:
The mdf and ldf files (for each database) are on different drives. If one drive is faster, that database will run the query faster too.
Stale statistics. If one database has newer stats than the other, SQL Server has a better chance of picking a proper (and faster) execution plan.
Indexes: I know you said they are identical, but I would check whether both have the same types of indexes.
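If stale statistics are the suspect, refreshing them in the slower database is a cheap experiment (standard T-SQL; the table name is taken from the question):

```sql
-- Refresh statistics for every table in the current database:
EXEC sp_updatestats;

-- Or target just the big table from the query:
UPDATE STATISTICS dbo.tblRrmHistSimShift WITH FULLSCAN;
```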
Focus on why the query is running slow instead of comparing the two databases. Checking the actual execution plan for the slow query will give you a hint about why it is running slower.
Also, I would not add a NOLOCK hint to fix the issue. In my experience, most slow queries can be tuned via code or indexes, instead of adding a NOLOCK hint that may return modified or old result sets, depending on your transactions.
The best way is to rebuild and reorganize your query:
SELECT DISTINCT i.SmtIssuer, i.SecID, ra.AssetNameCurrency AS AssetIdCurrency, i.IssuerCurrency, seg.ProxyCurrency, shifts.ScenarioDate, ten.TenorID, ten.Tenor,
shifts.Shift, shifts.BusinessDate, shifts.ScenarioNum
FROM dbo.tblRrmIssuer AS i INNER JOIN dbo.tblRrmSegment AS seg ON i.Identifier = seg.Identifier AND i.SegmentID = seg.SegmentID
INNER JOIN dbo.tblRrmSource AS sc ON seg.SourceID = sc.SourceID
INNER JOIN dbo.tblRrmAsset AS ra ON seg.AssetID = ra.AssetID
INNER JOIN dbo.tblRrmHistSimShift AS shifts ON seg.Identifier = shifts.Identifier AND i.SegmentID = shifts.SegmentID AND shifts.SourceID = sc.SourceID
INNER JOIN dbo.tblRrmTenor AS ten ON shifts.TenorID = ten.TenorID
INNER JOIN dbo.tblAsset AS a ON i.SmtIssuer = a.SmtIssuer
WHERE (a.AssetTypeID = 0) AND (sc.SourceName = 'CsVaR')