I am new to SQL and have written the SQL below to fetch the required results. However, the query takes ages to run. Any help with optimizing it would be appreciated.
Below is the SQL query I am using:
SELECT
Date_trunc('week',a.pair_date) as pair_week,
a.used_code,
a.used_name,
b.line,
b.channel,
count(
case when b.sku = c.sku then used_code else null end
)
from
a
left join b on a.ma_number = b.ma_number
and (a.imei = b.set_id or a.imei = b.repair_imei
)
left join c on a.used_code = c.code
group by 1,2,3,4,5
I would rewrite the query as:
select Date_trunc('week',a.pair_date) as pair_week,
a.used_code, a.used_name, b.line, b.channel,
count(*) filter (where b.sku = c.sku)
from a left join
b
on a.ma_number = b.ma_number and
a.imei in ( b.set_id, b.repair_imei ) left join
c
on a.used_code = c.code
group by 1,2,3,4,5;
For this query, you want indexes on b(ma_number, set_id, repair_imei) and c(code, sku). However, this doesn't leave much scope for optimization.
There might be some other possibilities, depending on the tables. For instance, OR (or IN) in an ON clause is usually a bad sign, because it makes it harder for the optimizer to use an index for the join, but it is unclear what your intention really is.
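To check that the rewrite counts the same thing as the original, here is a minimal sketch using Python's built-in sqlite3 module. SQLite stands in for PostgreSQL (it supports the same COUNT(*) FILTER syntax from version 3.30), date_trunc is left out because SQLite lacks it, and the three tiny tables and their rows are made up for illustration.

```python
import sqlite3

# Hypothetical miniature versions of the question's tables a, b and c.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (ma_number INTEGER, imei TEXT, used_code TEXT);
CREATE TABLE b (ma_number INTEGER, set_id TEXT, repair_imei TEXT, sku TEXT);
CREATE TABLE c (code TEXT, sku TEXT);
INSERT INTO a VALUES (1, 'X', 'U1'), (2, 'Y', 'U1'), (3, 'Z', 'U2');
INSERT INTO b VALUES (1, 'X', NULL, 'S1'),  -- matches row 1 of a via set_id
                     (2, 'Q', 'Y', 'S2');   -- matches row 2 of a via repair_imei
INSERT INTO c VALUES ('U1', 'S1'), ('U2', 'S9');
""")

# Original form: COUNT over a CASE, OR in the join condition.
original = """
SELECT a.used_code,
       COUNT(CASE WHEN b.sku = c.sku THEN a.used_code END) AS n
FROM a
LEFT JOIN b ON a.ma_number = b.ma_number
           AND (a.imei = b.set_id OR a.imei = b.repair_imei)
LEFT JOIN c ON a.used_code = c.code
GROUP BY a.used_code
ORDER BY a.used_code
"""

# Rewritten form: COUNT(*) FILTER, IN (...) in the join condition.
rewritten = """
SELECT a.used_code,
       COUNT(*) FILTER (WHERE b.sku = c.sku) AS n
FROM a
LEFT JOIN b ON a.ma_number = b.ma_number
           AND a.imei IN (b.set_id, b.repair_imei)
LEFT JOIN c ON a.used_code = c.code
GROUP BY a.used_code
ORDER BY a.used_code
"""

rows_original = conn.execute(original).fetchall()
rows_rewritten = conn.execute(rewritten).fetchall()
print(rows_original)
print(rows_rewritten)
```

The two are interchangeable here because a row only counts when c matched, and c only matches when a.used_code is non-NULL, so counting used_code and counting rows give the same number.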
I wrote the following query to pull a unit cost from another table COSreport into my profitability query ProfitabilityReport and am having a problem with my sub-query.
select
i.tranid
, it.item_id
, it.displayname
, tl.Item_Count * -1 Unit_Qty
, case when tl.Item_Count=0 then 0
else ((tl.GROSS_AMOUNT * -1)/ Item_Count) * -1
end as PricePerUnit,
(select sum(c.tranamt) from ns.COSreport c
inner join ns.ProfitabilityReport d
on c.InvoiceID = d.tranid
and c.item_id = d.item_id) as 'True Cost'
, '0' 'Cost Per M'
from ns.tinvoice i
join ns.transaction_lines tl on i.transaction_id = tl.transaction_id
join ns.Customers cust on c.customer_id = i.ENTITY_ID
join ns.items it on it.item_id = tl.item_id
left join ns.ITEM_CLASSIFICATION it_class on it_class.list_id =
it.ITEM_CLASSIFICATION_ID
where list_item_name IS NOT NULL
and i.tranid = '1262INV'
I'm joining on both the invoice id and item id so that the proper cost is pulled across for the given invoice and item from COSReport.
However, the true cost is not coming up with the unit cost but instead is summing up the cost field for the entire table.
For example, for invoice 1262INV (the one specified in the query above), the costs should be 1.04, 0.26, and 4 respectively, instead of 138m.
Any help getting this cleared up would be appreciated.
I actually prefer using CTEs for readability. You can take your subquery, put it into a CTE, and then join it in your main query, but you'll want to add the tranid and item_id fields to the CTE so you can use them in your join.
EDIT: In T-SQL (including Azure SQL), the semicolon is only required to terminate the statement that precedes a WITH; when the WITH starts the batch, as here, you don't need one.
WITH TrueCosts AS
(
SELECT
d.tranid
,d.item_id
,TrueCost = SUM(c.tranamt)
FROM ns.COSreport c
INNER JOIN ns.ProfitabilityReport d
ON c.InvoiceID = d.tranid
AND c.item_id = d.item_id
GROUP BY d.tranid
,d.item_id
)
SELECT
i.tranid
, it.item_id
, it.displayname
, tl.Item_Count * -1 Unit_Qty
, case when tl.Item_Count=0 then 0
else ((tl.GROSS_AMOUNT * -1)/ Item_Count) * -1
END as PricePerUnit
, tc.TrueCost AS 'True Cost'
, '0' AS 'Cost Per M'
FROM ns.tinvoice i
JOIN ns.transaction_lines tl on i.transaction_id = tl.transaction_id
JOIN ns.Customers c on c.customer_id = i.ENTITY_ID
JOIN ns.items it on it.item_id = tl.item_id
LEFT JOIN ns.ITEM_CLASSIFICATION it_class on it_class.list_id = it.ITEM_CLASSIFICATION_ID
LEFT JOIN TrueCosts tc ON tc.tranid = i.tranid AND tc.item_id = tl.item_id
WHERE list_item_name IS NOT NULL
AND i.tranid = '1262INV'
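A minimal sketch of the CTE approach, run in SQLite through Python's sqlite3 module rather than Azure SQL. The invoice_lines table is a hypothetical stand-in for the tinvoice/transaction_lines join, and the data is invented to reproduce the costs from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE COSreport (InvoiceID TEXT, item_id INTEGER, tranamt REAL);
CREATE TABLE ProfitabilityReport (tranid TEXT, item_id INTEGER);
-- Hypothetical stand-in for tinvoice joined to transaction_lines
CREATE TABLE invoice_lines (tranid TEXT, item_id INTEGER);
INSERT INTO COSreport VALUES ('1262INV', 10, 1.04), ('1262INV', 20, 0.26),
                             ('1262INV', 30, 4.0), ('9999INV', 10, 500.0);
INSERT INTO ProfitabilityReport VALUES ('1262INV', 10), ('1262INV', 20),
                                       ('1262INV', 30), ('9999INV', 10);
INSERT INTO invoice_lines VALUES ('1262INV', 10), ('1262INV', 20), ('1262INV', 30);
""")

query = """
WITH TrueCosts AS (
    -- One row per (invoice, item), carrying the keys needed for the join
    SELECT d.tranid, d.item_id, SUM(c.tranamt) AS TrueCost
    FROM COSreport c
    JOIN ProfitabilityReport d
      ON c.InvoiceID = d.tranid AND c.item_id = d.item_id
    GROUP BY d.tranid, d.item_id
)
SELECT l.tranid, l.item_id, tc.TrueCost
FROM invoice_lines l
LEFT JOIN TrueCosts tc
  ON tc.tranid = l.tranid AND tc.item_id = l.item_id
WHERE l.tranid = '1262INV'
ORDER BY l.item_id
"""

rows = conn.execute(query).fetchall()
print(rows)
```

The 500.0 row for another invoice never leaks into 1262INV, because the CTE is grouped by both keys and joined on both keys.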
You have a couple of problems.
In your subquery you use the alias c for COSReport. You also use the alias c for customers in your outer query.
The bigger problem is that your subquery isn't correlated to your outer query at all. That's why it's summing up the entire table.
To correlate your subquery, you need to reference (in the subquery) one or more of the tables in the outer query. Not sure of your tables or data, but at a guess, I'd say you want to compare against i.tranid from ns.tinvoice i (and tl.item_id, to get the cost per item) in a WHERE clause in your subquery.
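A minimal sketch of the difference, using SQLite via Python's sqlite3 module and hypothetical tables (invoice_lines stands in for the outer query's tables). Note that the subquery's alias c no longer collides with anything in the outer query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE COSreport (InvoiceID TEXT, item_id INTEGER, tranamt REAL);
CREATE TABLE invoice_lines (tranid TEXT, item_id INTEGER);
INSERT INTO COSreport VALUES ('1262INV', 10, 1.0), ('1262INV', 20, 2.0),
                             ('7777INV', 10, 100.0);
INSERT INTO invoice_lines VALUES ('1262INV', 10), ('1262INV', 20);
""")

# Uncorrelated: the subquery never mentions the outer row,
# so every output row gets the total of the whole table.
uncorrelated = conn.execute("""
SELECT l.item_id,
       (SELECT SUM(c.tranamt) FROM COSreport c) AS true_cost
FROM invoice_lines l
ORDER BY l.item_id
""").fetchall()

# Correlated: the WHERE clause ties the subquery to the current outer row.
correlated = conn.execute("""
SELECT l.item_id,
       (SELECT SUM(c.tranamt) FROM COSreport c
        WHERE c.InvoiceID = l.tranid AND c.item_id = l.item_id) AS true_cost
FROM invoice_lines l
ORDER BY l.item_id
""").fetchall()

print(uncorrelated)
print(correlated)
```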
I have this INSERT setup where I want the following rows to be filtered and inserted into a table. The SplitOut table has more than 30,000,000 rows. This statement takes more than 2 hours for a single project, and I have 100 projects like this.
My initial plan was to insert everything at once, but because it took more than 20 to execute, I had to split it by project; even then, the performance was very poor. I was planning on using CROSS APPLY but wasn't really sure how it would apply in my case. Any suggestions to improve the performance are appreciated.
Below is the code I have now. Thank you!
insert into DimQuestion (ResponderKey, ProjectID, qid, Question, QuestionType,AttributeID,
Attribute, ProductID, ProductCode, ProductName, AnswerCode, AnswerLabel)
SELECT distinct RKey
,a.ProID
,a.qid
,c.QID + ' - ' + c.[Ql] as Question
,c.[Type] as QType
,a.AttributeID
,e.Attribute
,a.ProductID
,d.ProductCode
,d.ProductName
,a.Answers
,'AnswerLabel' = case when a.qid not in ('Q2','QA','QA1','QA2','QA5','QA6','QA7','QA8','QF1','QF2','QF2c','QF2a','QF5',
'QF6','QF7','QF8','QF9','QF10','QX5','QX12')
then datamap.[Answer Label]
when a.qid = 'Q2'
then f.AnswerLabel
when a.qid in ('QA','QA1','QA2','QA5','QA6','QA7','QA8','QF1','QF2','QF2c','QF2a','QF5',
'QF6','QF7','QF8','QF9','QF10','QX5','QX12')
then a.Answers END
FROM [SplitOut] a
INNER join [DimResponder] b on a.responseid = b.ResponseID and a.ProjectID = b.ProjectID
INNER join Question_List c on a.qid = c.qid
left outer JOIN Data_Map datamap ON a.QID = datamap.QID and a.answers = datamap.[answer code]
left outer join DimProduct d on a.ProductID = d.ProductTypeCode and a.ProjectID = d.ProjectID
left outer join DimAttribute e on e.projectid = 0 and a.AttributeID = e.AttributeCode
left outer join Q2AnswerData f on a.QID = f.QID and a.Answers = f.AnswerCode and a.AttributeID = f.VariableID
where a.columnNames not like '%open%'
and a.ColumnNames not like '%seg%'
and a.columnnames not like '%rot%'
and a.Answers not like ''
and datamap.Project not in ('Project 0')
and a.ProjectID in (1,2,3,4,5,6,7,8,9,10)
Make sure columns in the ON clauses are indexed.
Just looking at the SQL, without the table counts and execution plan, here's something to try: use a subquery for the SPLITOUT table. You say that table has 30,000,000 rows. Most of your WHERE clause qualifiers are against SPLITOUT, so I'm using a subquery to reduce the number of rows joined to it. The way it's coded in your version, SPLITOUT may be joined to the other tables before the WHERE clause is applied. I agree with the comments that the LIKE clauses are bad: patterns with a leading wildcard, such as '%open%', can't use an index seek, so you're most likely scanning your 30,000,000-row table.
I also made the DATAMAP join a subquery because it's a left join with a qualifier in the WHERE clause. If there's no matching DATAMAP row, datamap.Project is NULL, so NOT IN ('Project 0') evaluates to unknown and the row is discarded, even though you probably wanted to keep it. In effect, the left join becomes an inner join.
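A minimal sketch of that effect, using SQLite via Python's sqlite3 module rather than SQL Server. The tables are cut down and hypothetical, and the bracketed column names are replaced with AnswerCode/AnswerLabel to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SplitOut (QID TEXT, Answers TEXT);
CREATE TABLE Data_Map (QID TEXT, AnswerCode TEXT, AnswerLabel TEXT, Project TEXT);
INSERT INTO SplitOut VALUES ('Q1', '1'), ('Q9', '1');  -- Q9 has no Data_Map row
INSERT INTO Data_Map VALUES ('Q1', '1', 'Yes', 'Project 1');
""")

# Filter in WHERE: for Q9, datamap.Project is NULL, NOT IN yields unknown,
# and the row is dropped.
in_where = conn.execute("""
SELECT a.QID, datamap.AnswerLabel
FROM SplitOut a
LEFT JOIN Data_Map datamap
  ON a.QID = datamap.QID AND a.Answers = datamap.AnswerCode
WHERE datamap.Project NOT IN ('Project 0')
ORDER BY a.QID
""").fetchall()

# Filter inside a derived table: unmatched SplitOut rows survive with a NULL label.
in_subquery = conn.execute("""
SELECT a.QID, datamap.AnswerLabel
FROM SplitOut a
LEFT JOIN (SELECT QID, AnswerCode, AnswerLabel
           FROM Data_Map
           WHERE Project <> 'Project 0') datamap
  ON a.QID = datamap.QID AND a.Answers = datamap.AnswerCode
ORDER BY a.QID
""").fetchall()

print(in_where)
print(in_subquery)
```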
Run the subquery on SPLITOUT by itself and tune it first. Create a composite index on SPLITOUT (ProjectID, Answers, ColumnNames). If the optimizer uses it for ProjectID, the LIKE predicates on ColumnNames can be evaluated by scanning the index rather than the table. Once the SPLITOUT subquery is tuned, add the other joins one at a time.
Try to remove the DISTINCT by making the joins cleaner. The optimizer has to sort or hash the rows to do a DISTINCT, which is costly.
Don't use LIKE and IN when you don't need to; use = and <> where possible.
I wouldn't use CROSS APPLY for a query such as this.
insert into DimQuestion (ResponderKey, ProjectID, qid, Question, QuestionType,AttributeID, Attribute, ProductID, ProductCode, ProductName, AnswerCode, AnswerLabel)
SELECT distinct RKey
,a.ProID
,a.qid
,c.QID + ' - ' + c.[Ql] as Question
,c.[Type] as QType
,a.AttributeID
,e.Attribute
,a.ProductID
,d.ProductCode
,d.ProductName
,a.Answers
,'AnswerLabel' = case when a.qid = 'Q2' then f.AnswerLabel
when a.qid not in ('Q2','QA','QA1','QA2','QA5','QA6','QA7','QA8','QF1','QF2','QF2c','QF2a','QF5',
'QF6','QF7','QF8','QF9','QF10','QX5','QX12')
then datamap.[Answer Label]
when a.qid in ('QA','QA1','QA2','QA5','QA6','QA7','QA8','QF1','QF2','QF2c','QF2a','QF5',
'QF6','QF7','QF8','QF9','QF10','QX5','QX12')
then a.Answers
end
FROM (select a.responseID, a.ProjectID, a.ProID, a.qid, a.AttributeID, a.ProductID, a.Answers
from [SplitOut] a
where a.ProjectID in (1,2,3,4,5,6,7,8,9,10)
and a.Answers <> '' -- don't use LIKE when equality or inequality will work; note that NOT and <> generally can't use an index seek either
and a.columnNames not like '%open%' and a.ColumnNames not like '%seg%' and a.columnnames not like '%rot%' -- very bad, won't use an index; consider creating a category or code column to identify these values
) a
inner join [DimResponder] b on a.responseid = b.ResponseID and a.ProjectID = b.ProjectID
inner join Question_List c on a.qid = c.qid
left outer join (select QID, [Answer Code], [Answer Label] from Data_Map where Project <> 'Project 0') datamap ON a.QID = datamap.QID and a.answers = datamap.[answer code]
left outer join DimProduct d on a.ProductID = d.ProductTypeCode and a.ProjectID = d.ProjectID
left outer join DimAttribute e on e.projectid = 0 and a.AttributeID = e.AttributeCode
left outer join Q2AnswerData f on a.QID = f.QID and a.Answers = f.AnswerCode and a.AttributeID = f.VariableID
Inner and outer joins tend to be expensive (I did a lot of SQL years ago, and they were then). You may be able to do this as several INSERT statements, which may be a lot faster, especially if you can reduce or get rid of the inner/outer joins.
Also, making some temp tables to pre-process some of the data may help.
Also, providing a list of the indexes would really help. Indexes can make the difference between minutes and hours.
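A minimal sketch of the temp-table idea, using SQLite via Python's sqlite3 module rather than SQL Server. The filtered table, its index, and the reduced column set are hypothetical; the point is that the expensive LIKE filters run once, and each per-project INSERT then reads only the staged rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SplitOut (responseid INTEGER, ProjectID INTEGER, qid TEXT,
                       Answers TEXT, ColumnNames TEXT);
CREATE TABLE DimQuestion (ResponseID INTEGER, ProjectID INTEGER, qid TEXT, AnswerCode TEXT);
INSERT INTO SplitOut VALUES
  (1, 1, 'Q1', '3', 'Q1_r1'),
  (1, 1, 'Q2', '',  'Q2_r1'),    -- empty answer: filtered out
  (2, 1, 'Q3', '5', 'Q3_open'),  -- open-ended column: filtered out
  (3, 2, 'Q1', '4', 'Q1_r1');
""")

# Step 1: stage the filtered rows once, with an index on the join keys.
conn.executescript("""
CREATE TEMP TABLE filtered AS
SELECT responseid, ProjectID, qid, Answers
FROM SplitOut
WHERE ProjectID IN (1, 2)
  AND Answers <> ''
  AND ColumnNames NOT LIKE '%open%'
  AND ColumnNames NOT LIKE '%seg%'
  AND ColumnNames NOT LIKE '%rot%';
CREATE INDEX filtered_keys ON filtered (ProjectID, responseid);
""")

# Step 2: insert one project at a time from the much smaller staged table.
for project_id in (1, 2):
    conn.execute("""
        INSERT INTO DimQuestion (ResponseID, ProjectID, qid, AnswerCode)
        SELECT responseid, ProjectID, qid, Answers
        FROM filtered
        WHERE ProjectID = ?
    """, (project_id,))

rows = conn.execute(
    "SELECT ResponseID, ProjectID, qid, AnswerCode FROM DimQuestion "
    "ORDER BY ProjectID, ResponseID"
).fetchall()
print(rows)
```

In the real query, the joins to DimResponder, Question_List and the other lookup tables would go into the step 2 INSERT, against the staged table instead of the full 30-million-row one.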
I am building a complex SELECT statement, and when one of my values (pcf_auto_key) is null, it does not display any values for that header entry.
select c.company_name, h.prj_number, h.description, s.status_code, h.header_notes, h.cm_udf_001, h.cm_udf_002, h.cm_udf_008, l.classification_code
from project_header h, companies c, project_status s, project_classification l
where exists
(select company_name from companies where h.cmp_auto_key = c.cmp_auto_key)
and exists
(select status_code from project_status s where s.pjs_auto_key = h.pjs_auto_key)
and exists
(select classification_code from project_classification where h.pcf_auto_key = l.pcf_auto_key)
and pjm_auto_key = 11
--and pjt_auto_key = 10
and c.cmp_auto_key = h.cmp_auto_key
and h.pjs_auto_key = s.pjs_auto_key
and l.pcf_auto_key = h.pcf_auto_key
and s.status_type = 'O'
How does my select statement look? Is this an appropriate way of pulling info from other tables?
This is an oracle database, and I am using SQL Developer.
Assuming you want to show all the data that you can find, but display the classification as blank when there is no match in that table, you can use a left outer join, which is much clearer with explicit join syntax:
select c.company_name, h.prj_number, h.description, s.status_code, h.header_notes,
h.cm_udf_001, h.cm_udf_002, h.cm_udf_008, l.classification_code
from project_header h
join companies c on c.cmp_auto_key = h.cmp_auto_key
join project_status s on s.pjs_auto_key = h.pjs_auto_key
left join project_classification l on l.pcf_auto_key = h.pcf_auto_key
where pjm_auto_key = 11
and s.status_type = 'O'
I've taken out the exists conditions as they just seem to be replicating the join conditions.
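A minimal sketch, in SQLite via Python's sqlite3 module rather than Oracle, of why the original query loses the header whose pcf_auto_key is NULL: the comma-join condition l.pcf_auto_key = h.pcf_auto_key is never true for NULL, whereas the left join keeps the row and returns a NULL classification. Table contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE project_header (prj_number TEXT, pcf_auto_key INTEGER);
CREATE TABLE project_classification (pcf_auto_key INTEGER, classification_code TEXT);
INSERT INTO project_header VALUES ('P1', 1), ('P2', NULL);
INSERT INTO project_classification VALUES (1, 'CLS-A');
""")

# Old-style comma join: NULL = 1 is never true, so P2 disappears.
inner = conn.execute("""
SELECT h.prj_number, l.classification_code
FROM project_header h, project_classification l
WHERE l.pcf_auto_key = h.pcf_auto_key
ORDER BY h.prj_number
""").fetchall()

# Explicit left join: P2 is kept with a NULL classification.
outer = conn.execute("""
SELECT h.prj_number, l.classification_code
FROM project_header h
LEFT JOIN project_classification l ON l.pcf_auto_key = h.pcf_auto_key
ORDER BY h.prj_number
""").fetchall()

print(inner)
print(outer)
```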
If you might not have matching data in any of the other tables, you can make the other inner joins into outer joins in the same way, but be aware that if you outer join to project_status, you will need to move the status_type check into the join condition as well, or Oracle will effectively convert that back into an inner join.
Read more about the different kinds of joins.
We are currently upgrading to SQL Server 2014. I have a join that runs fine in SQL Server 2008 R2 but returns duplicates in SQL Server 2014. The issue appears to be the predicate AND L2.ACCOUNTING_PERIOD = RG.PERIOD_TO: if I change it to anything but 4, I do not get the duplicates. The query returns the values in Accounting Period 4 twice. It gets account balances for all the previous accounting periods, so in this case it returns the values for Accounting Periods 0, 1, 2 and 3 correctly but then duplicates the values from Period 4.
SELECT
A.ACCOUNT,
SUM(A.POSTED_TRAN_AMT),
SUM(A.POSTED_BASE_AMT),
SUM(A.POSTED_TOTAL_AMT)
FROM
PS_LEDGER A
LEFT JOIN PS_GL_ACCOUNT_TBL B
ON B.SETID = 'LTSHR'
LEFT OUTER JOIN PS_LEDGER L2
ON A.BUSINESS_UNIT = L2.BUSINESS_UNIT
AND A.LEDGER = L2.LEDGER
AND A.ACCOUNT = L2.ACCOUNT
AND A.ALTACCT = L2.ALTACCT
AND A.DEPTID = L2.DEPTID
AND A.PROJECT_ID = L2.PROJECT_ID
AND A.DATE_CODE = L2.DATE_CODE
AND A.BOOK_CODE = L2.BOOK_CODE
AND A.GL_ADJUST_TYPE = L2.GL_ADJUST_TYPE
AND A.CURRENCY_CD = L2.CURRENCY_CD
AND A.STATISTICS_CODE = L2.STATISTICS_CODE
AND A.FISCAL_YEAR = L2.FISCAL_YEAR
AND A.ACCOUNTING_PERIOD = L2.ACCOUNTING_PERIOD
AND L2.ACCOUNTING_PERIOD = RG.PERIOD_TO
WHERE
A.BUSINESS_UNIT = 'UK001'
AND A.LEDGER = 'LOCAL'
AND A.FISCAL_YEAR = 2015
AND ( (A.ACCOUNTING_PERIOD BETWEEN 1 and 4
AND B.ACCOUNT_TYPE IN ('E','R') )
OR
(A.ACCOUNTING_PERIOD BETWEEN 0 and 4
AND B.ACCOUNT_TYPE IN ('A','L','Q') ) )
AND A.STATISTICS_CODE = ' '
AND A.ACCOUNT = '21101'
AND A.CURRENCY_CD <> ' '
AND A.CURRENCY_CD = 'GBP'
AND B.SETID='LTSHR'
AND B.ACCOUNT=A.ACCOUNT
AND B.SETID = SETID
AND B.EFFDT=(SELECT MAX(EFFDT) FROM PS_GL_ACCOUNT_TBL WHERE SETID='LTSHR' AND ACCOUNT=B.ACCOUNT AND EFFDT<='2015-01-31 00:00:00.000')
GROUP BY A.ACCOUNT
ORDER BY A.ACCOUNT
I'm inclined to suspect that you have simplified your original query too much to reflect the real problem, but I'm going to answer the question as posed, in light of the comments on it to this point.
Since your query does not in fact select anything derived from table L2, nor do any other predicates rely on anything from that table, the only thing accomplished by (left) joining it is to duplicate rows of the pre-aggregation results where more than one satisfies the join condition for the same L2 row. That seems unlikely to be what you want, especially with that particular join being a self join, so I don't see any reason not to remove it altogether. Dollars to doughnuts, that solves the duplication problem.
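A minimal sketch of that duplication, in SQLite via Python's sqlite3 module. It assumes two period-4 ledger rows that agree on every column the self-join compares (the real join has many more columns), and replaces RG.PERIOD_TO with the literal 4.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PS_LEDGER (ACCOUNT TEXT, ACCOUNTING_PERIOD INTEGER, POSTED_TOTAL_AMT REAL);
-- Two period-4 rows that agree on every column compared by the self-join below
INSERT INTO PS_LEDGER VALUES ('21101', 3, 10.0), ('21101', 4, 20.0), ('21101', 4, 5.0);
""")

# Each period-4 row in A matches both period-4 rows in L2, so it is counted twice;
# the period-3 row matches nothing and survives once thanks to the LEFT JOIN.
with_self_join = conn.execute("""
SELECT SUM(A.POSTED_TOTAL_AMT)
FROM PS_LEDGER A
LEFT JOIN PS_LEDGER L2
  ON A.ACCOUNT = L2.ACCOUNT
 AND A.ACCOUNTING_PERIOD = L2.ACCOUNTING_PERIOD
 AND L2.ACCOUNTING_PERIOD = 4
""").fetchone()[0]

without_join = conn.execute(
    "SELECT SUM(POSTED_TOTAL_AMT) FROM PS_LEDGER"
).fetchone()[0]

print(with_self_join, without_join)
```

Only the period-4 amounts are inflated (25 becomes 50), which is the same symptom described in the question.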
I'm also going to suggest removing the correlated subquery in the WHERE clause in favor of joining an inline view, since you already join the base table for the subquery anyway. This particular inline view uses the window function version of MAX() instead of the aggregate function version. Ideally, it would directly select only the rows with the target EFFDT values, but it cannot do so without being rather more complicated, which is exactly what I am trying to avoid. The resulting query therefore filters EFFDT externally, as the original did, but without a correlated subquery.
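A minimal sketch comparing the two ways of picking the latest effective-dated row, in SQLite via Python's sqlite3 module. Dates are stored as ISO-8601 text so they compare correctly as strings; the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PS_GL_ACCOUNT_TBL (SETID TEXT, ACCOUNT TEXT, EFFDT TEXT, ACCOUNT_TYPE TEXT);
INSERT INTO PS_GL_ACCOUNT_TBL VALUES
  ('LTSHR', '21101', '2010-01-01', 'A'),
  ('LTSHR', '21101', '2014-06-01', 'L'),
  ('LTSHR', '21101', '2016-01-01', 'Q'),  -- after the cutoff, must be ignored
  ('LTSHR', '40000', '2012-01-01', 'E');
""")

# Correlated subquery, as in the question.
correlated = conn.execute("""
SELECT B.ACCOUNT, B.EFFDT, B.ACCOUNT_TYPE
FROM PS_GL_ACCOUNT_TBL B
WHERE B.SETID = 'LTSHR'
  AND B.EFFDT = (SELECT MAX(EFFDT) FROM PS_GL_ACCOUNT_TBL
                 WHERE SETID = 'LTSHR' AND ACCOUNT = B.ACCOUNT
                   AND EFFDT <= '2015-01-31')
ORDER BY B.ACCOUNT
""").fetchall()

# Inline view with the window-function MAX, filtered outside the view.
windowed = conn.execute("""
SELECT ACCOUNT, EFFDT, ACCOUNT_TYPE
FROM (SELECT *, MAX(EFFDT) OVER (PARTITION BY ACCOUNT) AS MAX_EFFDT
      FROM PS_GL_ACCOUNT_TBL
      WHERE EFFDT <= '2015-01-31' AND SETID = 'LTSHR') B
WHERE EFFDT = MAX_EFFDT
ORDER BY ACCOUNT
""").fetchall()

print(correlated)
print(windowed)
```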
I furthermore removed a few redundant predicates and rewrote one of the messier ones to a somewhat nicer equivalent. And I reordered the predicates in a way that seems more logical to me.
Additionally, since you are filtering on a specific value of A.ACCOUNT, it is pointless (but not wrong) to GROUP BY or ORDER BY that column. Accordingly, I have removed those clauses to make the query simpler and clearer.
Here's what I came up with:
SELECT
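MAX(A.ACCOUNT) AS ACCOUNT, -- without GROUP BY, a bare A.ACCOUNT is invalid; every row here has ACCOUNT = '21101'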
SUM(A.POSTED_TRAN_AMT),
SUM(A.POSTED_BASE_AMT),
SUM(A.POSTED_TOTAL_AMT)
FROM
PS_LEDGER A
INNER JOIN (
SELECT
*,
MAX(EFFDT) OVER (PARTITION BY ACCOUNT) AS MAX_EFFDT
FROM PS_GL_ACCOUNT_TBL
WHERE
EFFDT <= '2015-01-31 00:00:00.000'
AND SETID = 'LTSHR'
) B
ON B.ACCOUNT=A.ACCOUNT
WHERE
A.ACCOUNT = '21101'
AND A.BUSINESS_UNIT = 'UK001'
AND A.LEDGER = 'LOCAL'
AND A.FISCAL_YEAR = 2015
AND A.CURRENCY_CD = 'GBP'
AND A.STATISTICS_CODE = ' '
AND B.EFFDT = B.MAX_EFFDT
AND 1 = CASE
WHEN B.ACCOUNT_TYPE IN ('E','R') AND A.ACCOUNTING_PERIOD BETWEEN 1 AND 4
THEN 1
WHEN B.ACCOUNT_TYPE IN ('A','L','Q') AND A.ACCOUNTING_PERIOD BETWEEN 0 AND 4
THEN 1
ELSE 0
END
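Because SQL Server does not accept a Boolean predicate as a CASE result, a CASE-based filter has to compare the CASE value to something, as in 1 = CASE ... END. Here is a minimal sketch, in SQLite via Python's sqlite3 module, checking over a small hypothetical grid of account types and periods that such a form selects exactly the same rows as the original OR predicate.

```python
import itertools
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE grid (ACCOUNT_TYPE TEXT, ACCOUNTING_PERIOD INTEGER)")
# Every combination of a few account types (X matches neither branch) and periods -1..6.
conn.executemany(
    "INSERT INTO grid VALUES (?, ?)",
    itertools.product(['E', 'R', 'A', 'L', 'Q', 'X'], range(-1, 7)),
)

or_form = conn.execute("""
SELECT ACCOUNT_TYPE, ACCOUNTING_PERIOD FROM grid
WHERE (ACCOUNTING_PERIOD BETWEEN 1 AND 4 AND ACCOUNT_TYPE IN ('E','R'))
   OR (ACCOUNTING_PERIOD BETWEEN 0 AND 4 AND ACCOUNT_TYPE IN ('A','L','Q'))
ORDER BY 1, 2
""").fetchall()

case_form = conn.execute("""
SELECT ACCOUNT_TYPE, ACCOUNTING_PERIOD FROM grid
WHERE 1 = CASE
            WHEN ACCOUNT_TYPE IN ('E','R') AND ACCOUNTING_PERIOD BETWEEN 1 AND 4 THEN 1
            WHEN ACCOUNT_TYPE IN ('A','L','Q') AND ACCOUNTING_PERIOD BETWEEN 0 AND 4 THEN 1
            ELSE 0
          END
ORDER BY 1, 2
""").fetchall()

print(len(or_form), or_form == case_form)
```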