I need my query to run weekly, but it takes very long (about one hour per execution). The sheer volume of data makes it slow, but I wonder how I could optimize it. I'm an SQL novice. This is my query:
SELECT PRIME_MINISTER.PRIME_MINISTER_ID,
PRIME_MINISTER.PRIME_MINISTER_NAME,
CITY.CITY_ID,
CITY.CITY_POPULATION,
CITY.CITY_FOUNDATION_DATE,
STATE.STATE_ID,
STATE.STATE_NAME,
CITY_ACCOUNTANT.CITY_ACCOUNTANT_ID,
CITY_ACCOUNTANT.CITY_ACCOUNTANT_SOCIAL,
CITY_ACCOUNTANT.CITY_ACCOUNTANT_NAME,
CITY_COUNCIL.CITY_COUNCIL_ID,
CITY_COUNCIL.CITY_COUNCIL_FREQUENCY,
CITY_DEBT.CITY_DEBT_ID,
CITY_DEBT.CITY_DEBT_NATURE,
CITY_DEBT.CITY_DEBT_AMOUNT,
HEAD_OF_STATE.HEAD_OF_STATE_ID,
HEAD_OF_STATE.HEAD_OF_STATE_SOCIAL,
DEPUTY_HEAD_OF_STATE.DEPUTY_HEAD_OF_STATE_ID,
DEPUTY_HEAD_OF_STATE.DEPUTY_HEAD_OF_STATE_SOCIAL,
DEPUTY_HEAD_OF_STATE.DEPUTY_HEAD_OF_STATE_NAME
FROM CITY
LEFT JOIN CITY_COUNCIL ON CITY.CITY_ID = CITY_COUNCIL.CITY_ID
LEFT JOIN CITY_DEBT ON CITY_COUNCIL.CITY_COUNCIL_ID = CITY_DEBT.CITY_COUNCIL_ID
OR CITY.CITY_ID = CITY_DEBT.CITY_ID
INNER JOIN CITY_ACCOUNTANT ON CITY_ACCOUNTANT.CITY_ACCOUNTANT_ID = CITY.CITY_ACCOUNTANT_ID
INNER JOIN STATE ON STATE.STATE_ID = CITY.STATE_ID
INNER JOIN HEAD_OF_STATE ON HEAD_OF_STATE.HEAD_OF_STATE_ID = STATE.HEAD_OF_STATE_ID
INNER JOIN DEPUTY_HEAD_OF_STATE ON DEPUTY_HEAD_OF_STATE.DEPUTY_HEAD_OF_STATE_ID = HEAD_OF_STATE.DEPUTY_HEAD_OF_STATE_ID
INNER JOIN PRIME_MINISTER ON STATE.STATE_ID = PRIME_MINISTER.STATE_ID
WHERE CITY.CITY_STATUS = 2
AND CITY.PRIME_MINISTER_STATUS = 2
AND CITY.JURISDICTION = '70'
AND CITY.CITY_ACCOUNTANT_NATURE = 'S'
ORDER BY DEPUTY_HEAD_OF_STATE.DEPUTY_HEAD_OF_STATE_ID,
HEAD_OF_STATE.HEAD_OF_STATE_ID,
STATE.STATE_ID,
CITY.CITY_ID,
CITY_COUNCIL.CITY_COUNCIL_ID,
CITY_DEBT.CITY_DEBT_ID,
CITY_ACCOUNTANT.CITY_ACCOUNTANT_ID;
I select all of this data in the reader step of a Spring Batch job so that I can write it to a file.
This is the database model:
[Image: Database Model]
The database is not mine so I can't modify the database model but I can create indexes if needed.
Between 1,000 and 7,000 rows are selected per execution. All the columns are needed.
CITY_ACCOUTANT in SQL vs CITY_ACCOUNTANT in picture: is this the right query or a typo?
Is the whole Spring Batch process taking an hour, or just the query?
RDBMS - MS SQL
How can I combine these two SQL select queries into one and get it executed quickly?
Query 1
select
VVO.VV_CODE,
V.Vessel_name,
VVO.Arrival_date,
isnull(IGM.VIR_NO,'NULL') as VIR_NO,
isnull(VVO.TERMINAL_CODE,'NULL') as TERMINAL_CODE
from Vessel_voyage VVO, Vessel V, IGM
where V.Vessel_code = substring(VVO.VV_CODE,1,3) and VVO.VV_CODE = IGM.VV_CODE
Query 2
select
BLD.BL_NO,
isnull(BLD.Parent_BL,'NULL') as Parent_BL,
BLD.Consignee_Description,
DO.DO_Issue_Date,
CA.CAgent_Name,
BLC.Container_No ,
CS.Container_Size_Description
from BL_DATA BLD, CAgent CA, Delivery_Order DO, BL_Container BLC, Container_Size CS
where BLD.BL_NO = DO.BL_NO and DO.CAgent_Code = CA.CAgent_Code and BLD.BL_NO = BLC.BL_NO and BLC.Container_Size_Code = CS.Container_Size_Code
Executing these select queries individually, they get executed within seconds.
But making them into a single select query, they take around 30 to 40 minutes to get executed.
This is what I tried:
select
VVO.VV_CODE,
V.Vessel_name,
VVO.Arrival_date,
isnull(IGM.VIR_NO,'NULL') as VIR_NO,
isnull(VVO.TERMINAL_CODE,'NULL') as TERMINAL_CODE,
BLD.BL_NO,
isnull(BLD.Parent_BL,'NULL') as Parent_BL,
BLD.Consignee_Description,
DO.DO_Issue_Date,
CA.CAgent_Name,
BLC.Container_No ,
CS.Container_Size_Description
from Vessel_voyage VVO, Vessel V, IGM ,BL_DATA BLD, CAgent CA, Delivery_Order DO, BL_Container BLC, Container_Size CS
where V.Vessel_code = substring(VVO.VV_CODE,1,3) and VVO.VV_CODE = IGM.VV_CODE and BLD.BL_NO = DO.BL_NO and DO.CAgent_Code = CA.CAgent_Code and BLD.BL_NO = BLC.BL_NO and BLC.Container_Size_Code = CS.Container_Size_Code
Writing the query using ANSI style joins would make it easier to read - so I've done exactly that. It also helps spot where there are problems within the join logic.
Re-writing the joins I get to this :
from Vessel_voyage VVO
inner join Vessel V on V.Vessel_code = substring(VVO.VV_CODE,1,3)
inner join IGM on VVO.VV_CODE = IGM.VV_CODE
inner join BL_DATA BLD on BLD.BL_NO = DO.BL_NO
inner join CAgent CA
inner join Delivery_Order DO on DO.CAgent_Code = CA.CAgent_Code
inner join BL_Container BLC on BLD.BL_NO = BLC.BL_NO
inner join Container_Size CS on BLC.Container_Size_Code = CS.Container_Size_Code
I could move the DO-to-CA join predicate around, but ultimately there are 8 tables being joined and only 6 predicates joining them, while connecting 8 tables takes at least 7. The net result is a Cartesian product between two groups of tables, which is likely to give incorrect results and will most certainly degrade performance.
If you use this style of join I suspect you will be able to fix it very easily.
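As a rough sketch of what a fully connected version could look like, the query below adds a seventh predicate that ties the bill-of-lading tables to the voyage tables. The column BLD.VV_CODE is purely hypothetical: the question does not show which column relates BL_DATA to Vessel_voyage (the link might equally go through IGM), so substitute whatever the schema actually provides.

```sql
select
    VVO.VV_CODE,
    V.Vessel_name,
    VVO.Arrival_date,
    isnull(IGM.VIR_NO, 'NULL') as VIR_NO,
    isnull(VVO.TERMINAL_CODE, 'NULL') as TERMINAL_CODE,
    BLD.BL_NO,
    isnull(BLD.Parent_BL, 'NULL') as Parent_BL,
    BLD.Consignee_Description,
    DO.DO_Issue_Date,
    CA.CAgent_Name,
    BLC.Container_No,
    CS.Container_Size_Description
from Vessel_voyage VVO
inner join Vessel V on V.Vessel_code = substring(VVO.VV_CODE, 1, 3)
inner join IGM on IGM.VV_CODE = VVO.VV_CODE
-- ASSUMPTION: BLD.VV_CODE is a hypothetical column linking a bill of lading
-- to its voyage; this is the predicate missing from the combined query.
inner join BL_DATA BLD on BLD.VV_CODE = VVO.VV_CODE
inner join Delivery_Order DO on DO.BL_NO = BLD.BL_NO
inner join CAgent CA on CA.CAgent_Code = DO.CAgent_Code
inner join BL_Container BLC on BLC.BL_NO = BLD.BL_NO
inner join Container_Size CS on CS.Container_Size_Code = BLC.Container_Size_Code
```

With every table reachable from every other through a join predicate, the optimizer no longer has to pair each voyage row with each bill-of-lading row.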
I have a query against a table that contains about 2 million rows, accessed through a linked server.
Select * from OPENQUERY(LinkedServerName,
'SELECT
PV.col1
,PV.col2
,PV.col3
,VTR.col1
,CTR.col1
,PSR.col1
FROM
LinkedDbName.dbo.tbl1 PV
INNER JOIN LinkedDbName.dbo.tbl2 VTR
ON PV.col_id = VTR.col_id
INNER JOIN LinkedDbName.dbo.tbl3 CTR
ON PV.col_id = CTR.col_id
INNER JOIN LinkedDbName.dbo.tbl4 PSR
ON PV.col_id = PSR.col_id
WHERE
PV.col_id = ''80C53C9B-6272-11DA-BB34-000E0C7F3ED2''')
That query returns 365 rows and executes in under a second.
However, when I turn that query into a view, it takes at least 20 seconds and sometimes as long as 40 seconds.
Here's my create view script
CREATE VIEW [dbo].[myview]
AS
Select * from OPENQUERY(LinkedServerName,
'SELECT
PV.col1
,PV.col2
,PV.col3
,VTR.col1
,CTR.col1
,PSR.col1
FROM
LinkedDbName.dbo.tbl1 PV
INNER JOIN LinkedDbName.dbo.tbl2 VTR
ON PV.col_id = VTR.col_id
INNER JOIN LinkedDbName.dbo.tbl3 CTR
ON PV.col_id = CTR.col_id
INNER JOIN LinkedDbName.dbo.tbl4 PSR
ON PV.col_id = PSR.col_id')
then
Select * from myview where PV.col_id = '80C53C9B-6272-11DA-BB34-000E0C7F3ED2'
Any idea ? Thanks !
Your queries are quite different. In the first, the where clause is part of the SQL statement passed to OPENQUERY(). This has two important effects:
The amount of data returned is much smaller, only being the rows that match the condition.
The query can be optimized with the WHERE clause.
If you need to share the table, I might suggest that you make a copy on the local server -- either using replication or scheduling a job to copy it over.
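If the goal is a reusable object that still filters on the remote side, one possible alternative to the view is a stored procedure that builds the OPENQUERY text dynamically. This is only a sketch under assumptions: the procedure name and the column aliases are made up for illustration, and col_id is assumed to be a uniqueidentifier.

```sql
-- Hypothetical wrapper procedure (the name is illustrative, not from the question).
-- ASSUMPTION: col_id is a uniqueidentifier; typing the parameter this way also
-- keeps arbitrary text out of the dynamically built statement.
CREATE PROCEDURE dbo.usp_GetRowsByColId
    @col_id UNIQUEIDENTIFIER
AS
BEGIN
    SET NOCOUNT ON;

    -- Remote statement, with the filter included so the linked server applies it.
    DECLARE @remote NVARCHAR(MAX) = N'
        SELECT PV.col1 AS pv_col1, PV.col2, PV.col3,
               VTR.col1 AS vtr_col1, CTR.col1 AS ctr_col1, PSR.col1 AS psr_col1
        FROM LinkedDbName.dbo.tbl1 PV
        INNER JOIN LinkedDbName.dbo.tbl2 VTR ON PV.col_id = VTR.col_id
        INNER JOIN LinkedDbName.dbo.tbl3 CTR ON PV.col_id = CTR.col_id
        INNER JOIN LinkedDbName.dbo.tbl4 PSR ON PV.col_id = PSR.col_id
        WHERE PV.col_id = ''' + CONVERT(NVARCHAR(36), @col_id) + N'''';

    -- OPENQUERY only accepts a string literal, so the outer statement is built
    -- dynamically and the quotes inside the remote text are doubled.
    DECLARE @sql NVARCHAR(MAX) =
        N'SELECT * FROM OPENQUERY(LinkedServerName, '''
        + REPLACE(@remote, N'''', N'''''') + N''')';

    EXEC sys.sp_executesql @sql;
END;
```

It would be called as EXEC dbo.usp_GetRowsByColId @col_id = '80C53C9B-6272-11DA-BB34-000E0C7F3ED2'; so that only the matching rows travel across the link, as in the fast version of the query.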
I have a SQL query with many left joins
SELECT COUNT(DISTINCT po.o_id)
FROM T_PROPOSAL_INFO po
LEFT JOIN T_PLAN_TYPE tp ON tp.plan_type_id = po.Plan_Type_Fk
LEFT JOIN T_PRODUCT_TYPE pt ON pt.PRODUCT_TYPE_ID = po.cust_product_type_fk
LEFT JOIN T_PROPOSAL_TYPE prt ON prt.PROPTYPE_ID = po.proposal_type_fk
LEFT JOIN T_BUSINESS_SOURCE bs ON bs.BUSINESS_SOURCE_ID = po.CONT_AGT_BRK_CHANNEL_FK
LEFT JOIN T_USER ur ON ur.Id = po.user_id_fk
LEFT JOIN T_ROLES ro ON ur.roleid_fk = ro.Role_Id
LEFT JOIN T_UNDERWRITING_DECISION und ON und.O_Id = po.decision_id_fk
LEFT JOIN T_STATUS st ON st.STATUS_ID = po.piv_uw_status_fk
LEFT OUTER JOIN T_MEMBER_INFO mi ON mi.proposal_info_fk = po.O_ID
WHERE 1 = 1
AND po.CUST_APP_NO LIKE '%100010233976%'
AND 1 = 1
AND po.IS_STP <> 1
AND po.PIV_UW_STATUS_FK != 10
The performance seems to be not good and I would like to optimize the query.
Any suggestions please?
Try this one -
SELECT COUNT(DISTINCT po.o_id)
FROM T_PROPOSAL_INFO po
WHERE PO.CUST_APP_NO LIKE '%100010233976%'
AND PO.IS_STP <> 1
AND po.PIV_UW_STATUS_FK != 10
First, check your indexes. Are they old? Are they fragmented? Do they need rebuilding?
Then, check your "execution plan" (how you obtain it varies by SQL engine): are all joins properly understood? Are some of them out of order? Do some of them transfer too much data?
Then, check your plan and indexes together: are all important columns covered? Are there any conspicuously long table scans or joins? Are the columns in the indexes in the same order as the query uses them?
Then, revise your query:
- can you extract some parts that would quickly generate a small rowset?
- can you add new columns to indexes so that join/filter expressions get covered?
- or reorder them so they match the query better?
And, supporting the solution from @Devart:
Can you eliminate some tables along the way? Does the WHERE clause touch the other tables at all? Does the data in the other tables change the count significantly? If neither SELECT nor WHERE ever touches the columns of the other joined tables, and if the exact COUNT value is not that important (i.e. you only ask whether such a T_PROPOSAL_INFO row exists), then you might remove all the joins completely, as Devart suggested. LEFT JOINs never reduce the number of rows. They only copy/expand/multiply the rows, and COUNT(DISTINCT po.o_id) ignores those duplicates.
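To illustrate the last point, here is a small self-contained sketch. The two demo tables are hypothetical stand-ins for T_PROPOSAL_INFO and T_MEMBER_INFO, not the real schema.

```sql
-- Hypothetical miniature tables; names and columns are illustrative only.
CREATE TABLE demo_proposal (o_id INT PRIMARY KEY, cust_app_no VARCHAR(20));
CREATE TABLE demo_member (member_id INT PRIMARY KEY, proposal_info_fk INT);

INSERT INTO demo_proposal (o_id, cust_app_no) VALUES (1, 'A100'), (2, 'B200');
INSERT INTO demo_member (member_id, proposal_info_fk) VALUES (10, 1), (11, 1), (12, 1);

-- The LEFT JOIN multiplies proposal 1 by its three members and keeps
-- proposal 2 with NULLs: 4 joined rows.
SELECT COUNT(*) AS joined_rows
FROM demo_proposal p
LEFT JOIN demo_member m ON m.proposal_info_fk = p.o_id;

-- COUNT(DISTINCT ...) collapses the duplicates again: 2.
SELECT COUNT(DISTINCT p.o_id) AS distinct_with_join
FROM demo_proposal p
LEFT JOIN demo_member m ON m.proposal_info_fk = p.o_id;

-- Same answer without touching the second table at all: 2.
SELECT COUNT(DISTINCT p.o_id) AS distinct_without_join
FROM demo_proposal p;
```

As long as the filters only reference the left-hand table, the join is extra work that cannot change the distinct count.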
Looking for suggestions on speeding up this query. Access 2007.
The inner query takes a few minutes; the full query takes very long (40 - 80 minutes). The result is as expected. Everything is indexed.
SELECT qtdetails.F5, qtdetails.F16, ExpectedResult.DLID, ExpectedResult.NumRows
FROM qtdetails
INNER JOIN (INVDL
INNER JOIN ExpectedResult
ON INVDL.DLID = ExpectedResult.DLID)
ON (qtdetails.F1 = INVDL.RegionCode)
AND (qtdetails.RoundTotal = ExpectedResult.RoundTotal)
WHERE
(qtdetails.F5 IN (SELECT qtdetails.F5
FROM (ExpectedResult
INNER JOIN INVDL
ON ExpectedResult.DLID = INVDL.DLID)
INNER JOIN qtdetails
ON (INVDL.RegionCode = qtdetails.F1)
AND (ExpectedResult.RoundTotal = qtdetails.RoundTotal)
GROUP BY qtdetails.F5
HAVING (((COUNT(ExpectedResult.DLID)) < 2))
)
);
INVDL - 80,000 records
ExpectedResult - Ten Million records
qtDetails - 12,000 records
The inner query returns around 5,000 - 8,000 records.
I tried saving the results of the inner query in a table and then using
Select F5 from qTempTable instead, but it still takes a lot of time.
Any help would be very highly appreciated.
Data Type :
qtdetails.F5 = Number
qtdetails.F16 = Text
ExpectedResult.NumRows = Number
INVDL.DLID = Number
ExpectedResult.DLID = Number
INVDL.RegionCode = Text
qtdetails.F1 = Text
Rebuild the indexes on all the tables involved in the query, then run the query again and check the time. This may decrease the execution time. I will update this answer with a tuned query if I can.
Keep Querying!
I have a SQL database server and 2 databases under it with the same structure and data. I run the same sql query in the 2 databases, one of them takes longer while the other completes in less than 50% of the time. They both have different execution plans.
The query for the view is as below:
SELECT DISTINCT i.SmtIssuer, i.SecID, ra.AssetNameCurrency AS AssetIdCurrency, i.IssuerCurrency, seg.ProxyCurrency, shifts.ScenarioDate, ten.TenorID, ten.Tenor,
shifts.Shift, shifts.BusinessDate, shifts.ScenarioNum
FROM dbo.tblRrmIssuer AS i INNER JOIN
dbo.tblRrmSegment AS seg ON i.Identifier = seg.Identifier AND i.SegmentID = seg.SegmentID INNER JOIN
dbo.tblRrmAsset AS ra ON seg.AssetID = ra.AssetID INNER JOIN
dbo.tblRrmHistSimShift AS shifts ON seg.Identifier = shifts.Identifier AND i.SegmentID = shifts.SegmentID INNER JOIN
dbo.tblRrmTenor AS ten ON shifts.TenorID = ten.TenorID INNER JOIN
dbo.tblAsset AS a ON i.SmtIssuer = a.SmtIssuer INNER JOIN
dbo.tblRrmSource AS sc ON seg.SourceID = sc.SourceID
WHERE (a.AssetTypeID = 0) AND (sc.SourceName = 'CsVaR') AND (shifts.SourceID =
(SELECT SourceID
FROM dbo.tblRrmSource
WHERE (SourceName = 'CsVaR')))
The things I have already tried: rebuilding and reorganizing the indexes on the table (tblRRMHistSimShifts, which has over 2 million records), and checking for locks, other background processes, and errors on the server. Max degree of parallelism for the server is 0.
Is there anything more you can suggest to fix this issue?
Having two databases on the same server with the same data set (as you said) does not guarantee the same execution plan.
Here are some of the reasons why the query plan may be different:
- The mdf and ldf files (for each database) are on different drives. If one drive is faster, that database will run the query faster too.
- Stale statistics. If one database has newer statistics than the other, SQL Server has a better chance of picking a proper (and faster) execution plan.
- Indexes: I know you said both databases are identical, but I would check that you have the same types of indexes on both.
Focus on finding out why the query is slow by looking at the actual execution plan, instead of comparing the two databases. The actual execution plan for the slow query will give you a hint as to why it is running slower.
Also, I would not add a NOLOCK hint to fix the issue. In my experience, most slow queries can be tuned through code or indexes instead of adding a NOLOCK hint, which may return modified or stale result sets, depending on your transactions.
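As a starting point for the statistics check, something along these lines can be run in each of the two databases. It is a sketch that assumes the large table is dbo.tblRrmHistSimShift, as named in the query; note that rebuilding an index refreshes that index's statistics but not the column statistics SQL Server created on its own.

```sql
-- List when each statistics object on the large table was last updated.
SELECT s.name AS stats_name,
       STATS_DATE(s.object_id, s.stats_id) AS last_updated
FROM sys.stats AS s
WHERE s.object_id = OBJECT_ID(N'dbo.tblRrmHistSimShift');

-- Refresh all statistics on the table by reading every row
-- (slower than sampling, but removes sampling as a variable).
UPDATE STATISTICS dbo.tblRrmHistSimShift WITH FULLSCAN;
```

If the dates differ noticeably between the two databases, refreshing the older set and comparing the actual execution plans again is a cheap first experiment.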
The best approach is to restructure your query so that the source filter becomes a join condition instead of a subquery:
SELECT DISTINCT i.SmtIssuer, i.SecID, ra.AssetNameCurrency AS AssetIdCurrency, i.IssuerCurrency, seg.ProxyCurrency, shifts.ScenarioDate, ten.TenorID, ten.Tenor,
shifts.Shift, shifts.BusinessDate, shifts.ScenarioNum
FROM dbo.tblRrmIssuer AS i INNER JOIN dbo.tblRrmSegment AS seg ON i.Identifier = seg.Identifier AND i.SegmentID = seg.SegmentID
INNER JOIN dbo.tblRrmSource AS sc ON seg.SourceID = sc.SourceID
INNER JOIN dbo.tblRrmAsset AS ra ON seg.AssetID = ra.AssetID
INNER JOIN dbo.tblRrmHistSimShift AS shifts ON seg.Identifier = shifts.Identifier AND i.SegmentID = shifts.SegmentID AND shifts.SourceID = sc.SourceID
INNER JOIN dbo.tblRrmTenor AS ten ON shifts.TenorID = ten.TenorID
INNER JOIN dbo.tblAsset AS a ON i.SmtIssuer = a.SmtIssuer
WHERE (a.AssetTypeID = 0) AND (sc.SourceName = 'CsVaR')