Query optimization beyond indexes - sql

I wrote this query that 'cubes' some data, producing partial totals:
select upper(coalesce(left(k.SubStabilimento,12),'ALL')) as Stabilimento,
sum(k.PotenzialmenteInappropriato) as Numeratore,
count(k.ProgrSdo)-sum(k.PotenzialmenteInappropriato) as Denominatore,
case when (count(k.ProgrSdo)-sum(k.PotenzialmenteInappropriato)) > 0 then 1.0*sum(k.PotenzialmenteInappropriato) / (count(k.ProgrSdo)-sum(k.PotenzialmenteInappropriato)) else 0 end as Rapporto,
upper(coalesce(DescrDisciplina,'ALL')) AS Disciplina,
case when K.TipologiaDRG = 'C' then 'CHIR.'
when K.TipologiaDRG = 'M' then 'MED.'
when K.TipologiaDRG is null then 'ALL'
when K.TipologiaDRG = '' then 'SENZA TIPO'
end as TipoDRG,
case when [Anno]=@anno then 'ATTUALE'
when [Anno]=@anno-1 then 'PRECEDENTE'
else cast([Anno] as varchar(4))
end as Periodo,
upper(coalesce(left(k.mese,2), 'ALL')) as Mese,
upper(coalesce(NomeMese,'ALL')) as MeseDescr
from
tabella k
where k.Mese <= @mese
and k.anno between @anno-1 and @anno
and k.RegimeRicovero = 1
and codicepresidio=080808
and TipologiaFlusso like 'Pro%'
group by SubStabilimento, DescrDisciplina, TipologiaDRG, anno, mese, nomemese with cube
having grouping(anno) = 0
AND GROUPING(nomeMese) = GROUPING(mese)
This Groovy code is appended at runtime, depending on the parameter values passed to the query:
if ( parameters.get('par_stabilimenti').toUpperCase() != "'TUTTO'" )
{ query = query + "and upper(coalesce(left(k.SubStabilimento,12),'AUSL_TOTALE')) in ("+ parameters.get('par_stabilimenti').toUpperCase() +" )";}
if ( parameters.get('par_discipline').toUpperCase() != "'TUTTO'" )
{ query = query + "and upper(coalesce(k.DescrDisciplina,'TOT. STABILIMENTO')) in ("+ parameters.get('par_discipline').toUpperCase() +" )";}
SQL parameters are passed by the application runtime
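For reference, with a hypothetical parameter value such as 'OSPEDALE A', the tail of the statement after the Groovy substitution would look roughly like this (note that the appended predicate lands after the HAVING clause):
having grouping(anno) = 0
AND GROUPING(nomeMese) = GROUPING(mese)
and upper(coalesce(left(k.SubStabilimento,12),'AUSL_TOTALE')) in ('OSPEDALE A')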
I manually created indexes on the single columns and on the table's primary key, and I also added the indexes suggested by the SQL Server query tuner.
It still takes too long to execute (about 4 seconds), and I need it to run 8 times faster.
Is there some optimization I can do on the query? (Parameters are passed by the application.)
Is there a way I can precalculate the execution plan, so SQL Server doesn't have to redo it every time I launch the query?
I really have no idea how to improve performance beyond what I already did.
I'm on SQL Server 2018 pro (so no columnstore indexes)
Here you can find the execution plan.

Related

Oracle Query is timing out

I'm trying to write an Oracle query to join data from 4 different tables. The code is below:
SELECT
PROJ.PRJ_NO, PROJ.PRJ_NAME, PROJ.PRJ_BEG_DATE, PROJ.PRJ_END_DATE,
PORT.TIER1_NAME, PORT.TIER2_NAME, PORT.TIER3_NAME,
MAX(A.FIS_WK_END_DATE) AS "FISCAL_WEEK",
SUM(A.ABDOL) AS "AAB_DOL", SUM(A.VHDOL) AS "AVH_DOL", SUM(A.ADOL) AS "AA_DOL",
SUM(A.DCDOL) AS "ADC_DOL", SUM(A.DCGADOL) AS "ADC_GA_DOL", SUM(A.COM) AS "AM_DOL",
SUM(A.FE) AS "AFE_DOL", SUM(A.IE) AS "AIE_DOL", SUM(A.OTHER) AS "AR_DOL",
SUM(A.MTSFT) AS "AS_FT", SUM(A.MTSST) AS "AS_ST", SUM(A.ACTST) AS "AL_ST", SUM(A.ACTFT) AS "ALL_FT",
MAX(P.SNAPSHOT_DATE) AS "SNAP_DATE", P.FINSCN_TYPE,
SUM(P.ABDOL) AS "PAB_DOL", SUM(P.VHDOL) AS "PVH_DOL", SUM(P.DCDOL) AS "PDC_DOL",
SUM(P.TCI_DOL) AS "PCI_GA_DOL", SUM(P.GADOL) AS "PN_GA_DOL", SUM(P.COM) AS "PN_COM",
SUM(P.FEE) AS "PN_FEE", SUM(P.D_IE) AS "PN_MOIE", SUM(P.OTHER) AS "PN_OTHER"
FROM PROJ_TASK_VW PROJ
LEFT JOIN PORTFOLIO_VW PORT
ON PROJ.TASKNO = PORT.TASKNO
LEFT JOIN ACTUAL_VW A
ON PROJ.TASKNO = A.CURR_TASKNO
LEFT JOIN BUDG_DOLL_VW P
ON PROJ.TASKNO = P.CURR_TASKNO
WHERE TO_CHAR(PROJ.PRJ_END_DATE, 'YYYY-MM-DD') > '2018-10-01'
AND PROJ.P_FLAG = 'N'
AND (PROJ.P_TYPE LIKE 'D-%' OR PROJ.P_TYPE LIKE '%MR%' OR PROJ.P_TYPE LIKE '%ID%')
AND (SUBSTR(PROJ.PRJ_NO,3,3) != 'BP' AND SUBSTR(PROJ.PRJ_NO,3,3) != 'PJ')
AND (P.FINSCN_TYPE = 'SR' OR P.FINSCN_TYPE = 'BUG')
AND (A.ABDOL + A.VHDOL + A.ADOL + A.DCDOL + A.DCGADOL + A.COM +
A.FE + A.IE + A.OTHER) <> 0
GROUP BY
PROJ.PRJ_NO,
PROJ.PRJ_NAME,
PROJ.PRJ_BEG_DATE,
PROJ.PRJ_END_DATE,
PORT.TIER1_NAME,
PORT.TIER2_NAME,
PORT.TIER3_NAME,
P.FINSCN_TYPE
My overall intent is to bring all of the select fields into a single table using left joins (using table "PROJ" as the parent table, with the remaining tables providing child data based on the data returned from the "PROJ" table). When the query is run it times out after about 30 minutes. Is there a better way to write this query so that I can build the table I need without timing out?
First, there's no way to answer this question without an execution plan. What columns do you have indexed? But here are some things I noticed.
WHERE TO_CHAR(PROJ.PRJ_END_DATE, 'YYYY-MM-DD') > '2018-10-01'
Your column is a date, so you should be comparing to a date, rather than converting to a VARCHAR2 and doing an inequality on strings.
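For example (assuming PRJ_END_DATE really is a DATE column), a sketch of the sargable form:
WHERE PROJ.PRJ_END_DATE > DATE '2018-10-01'
-- or equivalently:
WHERE PROJ.PRJ_END_DATE > TO_DATE('2018-10-01', 'YYYY-MM-DD')
If the column can carry a time-of-day component, the boundary may need adjusting, since the original string comparison excludes all of 2018-10-01.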
AND (PROJ.P_TYPE LIKE 'D-%' OR PROJ.P_TYPE LIKE '%MR%' OR PROJ.P_TYPE LIKE '%ID%')
I'm not sure how much indexes will help here: a prefix-only pattern like 'D-%' can use an index range scan, but the patterns with leading wildcards ('%MR%', '%ID%') cannot, so these will likely not be very performant.
AND (SUBSTR(PROJ.PRJ_NO,3,3) != 'BP' AND SUBSTR(PROJ.PRJ_NO,3,3) != 'PJ')
These do nothing, since your two SUBSTRs return strings 3 characters long and you are comparing them to 2-character strings, so the inequalities are always true.
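If the intent was to test the two characters starting at position 3, the predicate would presumably need to be something like this (hypothetical; adjust the offsets to your data):
AND SUBSTR(PROJ.PRJ_NO, 3, 2) NOT IN ('BP', 'PJ')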
AND (A.ABDOL + A.VHDOL + A.ADOL + A.DCDOL + A.DCGADOL + A.COM + A.FE + A.IE + A.OTHER) <> 0
Do you actually care about the sum here, or are you just checking that one or more of these values is non-zero? If these values are always > 0, then you're better off replacing this with:
AND ( a.ABDOL > 0 OR A.VHDOL > 0 ...

How do I join two tables in SQL

I have an issue that I'm hoping you can help me with. I am trying to create charting data for the performance of an application that I am working on. The first step is to perform two select statements, with my feature turned off and on.
SELECT onSet.testName,
avg(onSet.elapsed) as avgOn,
0 as avgOff
FROM Results onSet
WHERE onSet.pll = 'On'
GROUP BY onSet.testName
union
SELECT offSet1.testName,
0 as avgOn,
avg(offSet1.elapsed) as avgOff
FROM Results offSet1
WHERE offSet1.pll = 'Off'
GROUP BY offSet1.testName
This gives me data that looks like this:
Add,0,11.4160277777777778
Add,11.413625,0
Delete,0,4.5245277777777778
Delete,4.0039861111111111,0
Evidently UNION is not the correct feature, since the data needs to look like:
Add,11.413625,11.4160277777777778
Delete,4.0039861111111111,4.5245277777777778
I've been trying to get inner joins to work but I can't get the syntax to work.
Removing the union and trying to put this statement after the select statements also doesn't work. I evidently have the wrong syntax.
inner join xxx ON onSet.testName=offset1.testName
After getting the data to be like this I want to apply one last select statement that will subtract one column from another and give me the difference. So for me it's just one step at a time.
Thanks in advance.
-KAP
I think you can use a single query with conditional aggregation. Leave the CASE expressions without an ELSE so that non-matching rows become NULL and are ignored by AVG (an ELSE 0 would drag the averages down):
SELECT
testName,
AVG(CASE WHEN pll = 'On' THEN elapsed END) AS avgOn,
AVG(CASE WHEN pll = 'Off' THEN elapsed END) AS avgOff
FROM Results
GROUP BY testName
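Since you mentioned wanting to subtract one column from the other afterwards, that difference can be computed in the same statement; a minimal sketch along the same lines (whether FileMaker's SQL dialect accepts it is another question):
SELECT
testName,
AVG(CASE WHEN pll = 'On' THEN elapsed END) AS avgOn,
AVG(CASE WHEN pll = 'Off' THEN elapsed END) AS avgOff,
AVG(CASE WHEN pll = 'Off' THEN elapsed END)
  - AVG(CASE WHEN pll = 'On' THEN elapsed END) AS diff
FROM Results
GROUP BY testName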
I just saw the filemaker tag and have no idea if this works there, but on MySQL I would try something along the lines of the following (note that the sumOn/numOn aliases can't be referenced again within the same SELECT list, so the expressions are repeated):
SELECT testName, sum(if(pll = 'On',elapsed,0)) as sumOn,
sum(if(pll = 'On',1,0)) as numOn,
sum(if(pll ='Off',elapsed,0)) as sumOff,
sum(if(pll ='Off',1,0)) as numOff,
sum(if(pll = 'On',elapsed,0)) / sum(if(pll = 'On',1,0)) as avgOn,
sum(if(pll ='Off',elapsed,0)) / sum(if(pll ='Off',1,0)) as avgOff
FROM Results
WHERE pll = 'On' or pll='Off'
GROUP BY testName ;
If it works for you then this should be rather efficient, as you do not need a join. If not, fingers crossed that this triggers another idea.
The difficulty with the join you envisioned is that the filtering in the WHERE clause is performed after the join is completed, so you would still not know which records to use to compute each average. If the above is not implementable with FileMaker, then check whether nested queries work. You would then write something like:
SELECT onAvg.testName, onAvg.avg as avgOn, offAvg.avg as avgOff
FROM ( SELECT ... FROM Results ... ) as onAvg
JOIN ( ... ) as offAvg ON onAvg.testName = offAvg.testName
If that is also not possible then I would look for temporary tables.
OK guys... thanks for the help again. Here is the final answer. The statement below is a FileMaker custom function that takes 4 arguments (platform, runID, model and user count). You can see the SQL statement it uses. FileMaker's ExecuteSQL() function does not support nested select statements, does not support IF statements embedded in select statements (calc functions do, of course), and finally does not support the SQL keyword VALUES. FileMaker does support the SQL keyword CASE, which is a little more powerful but a bit wordy. The select statement is in a variable named sql and the result is placed in a variable named result. ExecuteSQL() works like a printf statement for parameter text, so you can see the parameter swaps do occur.
Let(
[
sql =
"SELECT testName, (sum( CASE WHEN PLL='On' THEN elapsed ELSE 0 END)) as sumOn,
sum( CASE WHEN PLL='On' THEN 1 ELSE 0 END) as countOn,
sum( CASE WHEN PLL='Off' THEN elapsed ELSE 0 END) as sumOff,
sum( CASE WHEN PLL='Off' THEN 1 ELSE 0 END) as countOff
FROM Results
WHERE Platform = ?
and RunID = ?
and Model = ?
and UserCnt = ?
GROUP BY testName";
result = ExecuteSQL ( sql ; "" ; ""
; platform
; runID
; model
; userCnt )
];
getAverages ( Result ; "" ; 2 )
)
For those interested the custom function looks like this:
getAverages( data, newList, pos )
Let (
[
curValues = Substitute( GetValue( data; pos ); ","; ¶ );
sumOn = GetValue( curValues; 2 ) ;
countOn = GetValue( curValues; 3 );
sumOff = GetValue( curValues; 4 );
countOff = GetValue( curValues; 5 );
avgOn = sumOn / countOn;
avgOff = sumOff / countOff;
newItem = ((avgOff - avgOn) / avgOff ) * 100
];
newList & If ( pos > ValueCount( data); newList;
getAverages( data; If ( not IsEmpty( newList); ¶ ) & newItem; pos + 1 ))
)

optional parameter checking in where clauses

I have a bunch of report parameters, and as a result my criteria checking first checks whether the parameter value is null and, if not, compares it with a column value.
(@dateStart IS NULL OR @dateStart <= BELGE.AccDate)
AND (@dateEnd IS NULL OR @dateEnd >= BELGE.AccDate)
AND (@CompanyId IS NULL OR @CompanyId = hrktlr.CompanyId)
AND ((@onKayitlarDahil = 1 and hrktlr.StatusCode in ('M', 'O'))
OR (@onKayitlarDahil = 0 AND hrktlr.StatusCode = 'M'))
AND (@BizPartnerId IS NULL or CK.BizPartnerId = @BizPartnerId)
AND (@BizPartnerKodStart is null or @BizPartnerKodStart = '' or @BizPartnerKodStart <= CK.BizPartnerKod)
AND (@BizPartnerKodEnd is null or @BizPartnerKodEnd = '' or @BizPartnerKodEnd >= CK.BizPartnerKod)
AND (@BizPartnerType is null or @BizPartnerType=CK.BizPartnerType)
This is great for a maintainable SQL query, but the problem is that the SQL query optimizer prepares itself for the worst case, I guess, and index usage is bad. For example, when I pass in BizPartnerId and thus avoid the 'BizPartnerId is null' check, the query runs about 100 times faster.
So if I keep going with this approach, are there any pointers you can recommend to help the query planner increase query performance?
Are there any viable alternatives to optional parameter checking?
To stop SQL Server from saving a suboptimal query plan you can use the option WITH RECOMPILE. The query plan will be recalculated each time you run the query.
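WITH RECOMPILE applies to a stored procedure; for an ad-hoc statement the equivalent is the OPTION (RECOMPILE) hint at the end of the statement, which also lets the optimizer treat the known parameter values as constants and prune the NULL branches. A minimal sketch against a made-up table (dbo.SomeReportTable and its columns are only for illustration):
SELECT t.CompanyId, t.AccDate
FROM dbo.SomeReportTable AS t          -- hypothetical table, for illustration only
WHERE (@dateStart IS NULL OR @dateStart <= t.AccDate)
  AND (@dateEnd   IS NULL OR @dateEnd   >= t.AccDate)
  AND (@CompanyId IS NULL OR @CompanyId  = t.CompanyId)
OPTION (RECOMPILE);                    -- plan is rebuilt for the actual parameter values on every run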

LINQ to SQL selecting records and converting dates

I'm trying to select records from a table based on a date using Linq to SQL. Unfortunately the date is split across two tables - the Hours table has the day and the related JobTime table has the month and year in two columns.
I have the following query:
Dim qry = From h As Hour In ctx.Hours Where Convert.ToDateTime(h.day & "/" & h.JobTime.month & "/" & h.JobTime.year & " 00:00:00") > Convert.ToDateTime("01/01/2012 00:00:00")
This gives me the error "Arithmetic overflow error converting expression to data type datetime."
Looking at the SQL query in SQL server profiler, I see:
exec sp_executesql N'SELECT [t0].[JobTimeID], [t0].[day], [t0].[hours]
FROM [dbo].[tbl_pm_hours] AS [t0]
INNER JOIN [dbo].[tbl_pm_jobtimes] AS [t1] ON [t1].[JobTimeID] = [t0].[JobTimeID]
WHERE (CONVERT(DateTime,(((((CONVERT(NVarChar,[t0].[day])) + @p0) + (CONVERT(NVarChar,COALESCE([t1].[month],NULL)))) + @p1) + (CONVERT(NVarChar,COALESCE([t1].[year],NULL)))) + @p2)) > @p3',N'@p0 nvarchar(4000),@p1 nvarchar(4000),@p2 nvarchar(4000),@p3 datetime',@p0=N'/',@p1=N'/',@p2=N' 00:00:00',@p3='2012-01-31 00:00:00'
I can see that it's not passing in the date to search for correctly but I'm not sure how to correct it.
Can anyone please help?
Thanks,
Emma
The direct cause of the error may have to do with this issue.
As said there, the conversions you use are a very inefficient way to build a query. On top of that, the expressions are not sargable: you are using a value computed from database columns in a comparison, which prevents the query optimizer from using indexes to jump to individual column values. So, you could try to fix the error by doctoring the direct cause, but I think it's better to rewrite the query in a way that only the bare column values are used in comparisons.
I've worked this out in C#:
var cfg = new DateTime(12,6,12);
int year = 12, month = 6, day = 13; // Try some more values here.
// Date from components > datetime value?
bool gt =
    year > cfg.Year ||
    (year == cfg.Year && month > cfg.Month) ||
    (year == cfg.Year && month == cfg.Month && day > cfg.Day);
You see that it's not as straightforward as it may look at first, but it works. There are many more comparisons to work out, but I'm sure that the ability to use indexes will easily outweigh this.
A more straightforward, but not sargable, way is to use sortable dates, like 20120101 and compare those (as integers).
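In SQL terms, and assuming the same day/month/year columns as in the generated query above, the sargable predicate would look roughly like this sketch:
WHERE t1.[year] > 2012
   OR (t1.[year] = 2012 AND t1.[month] > 1)
   OR (t1.[year] = 2012 AND t1.[month] = 1 AND t0.[day] > 1)
-- the simpler, sortable-integer alternative is not sargable, because the compared value is computed:
-- WHERE t1.[year] * 10000 + t1.[month] * 100 + t0.[day] > 20120101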

LINQ Incompatibility Issue with SQL Server 2000

I have a Linq to SQL query that was working just fine with SQL Server 2005, but I have to deploy the web app with SQL Server 2000 and, when executing that query, I get this error:
"System.Data.SqlClient.SqlException: The column prefix 't0' does not match with a table name or alias name used in the query."
I have more queries, but it doesn't seem to have problems with those.
Now, this is the query:
from warningNotices in DBContext_Analyze.FARs
where warningNotices.FAR_Area_ID == filter.WarningAreaID &&
warningNotices.FAR_Seq == filter.WarningSeq &&
warningNotices.FAR_Year == filter.WarningYear
orderby warningNotices.FAR_Seq ascending
select new Search_Result
{
FAR_Area_ID = warningNotices.FAR_Area_ID,
FAR_Seq = warningNotices.FAR_Seq,
FAR_Year = warningNotices.FAR_Year,
DateTime_Entered = (DateTime)warningNotices.DateTime_Entered == null ? DateTime.MaxValue : (DateTime)warningNotices.DateTime_Entered,
Time_Entered = warningNotices.Time_Entered,
OrigDept = warningNotices.OrigDept,
Part_No = warningNotices.Part_No,
DateTime_Analyzed = (DateTime)warningNotices.DateTime_Analyzed == null ? DateTime.MaxValue : (DateTime)warningNotices.DateTime_Analyzed,
Analyzed_By = warningNotices.Analyzed_By,
MDR_Required = (Char)warningNotices.MDR_Required == null ? Char.MinValue : (Char)warningNotices.MDR_Required,
Resp_Dept = (from FARSympt in DBContext_Analyze.FAR_Symptoms
where FARSympt.FAR_Area_ID == warningNotices.FAR_Area_ID &&
FARSympt.FAR_Year == warningNotices.FAR_Year &&
FARSympt.FAR_Seq == warningNotices.FAR_Seq
select new { FARSympt.Resp_Dept}).FirstOrDefault().Resp_Dept,
Sympt_Desc = (from SymptomsCatalog in DBContext_Analyze.Symptoms
where SymptomsCatalog.symptom_ID == filter.Status_ID
select new {
SymptomsCatalog.Sympt_Desc
}).FirstOrDefault().Sympt_Desc,
Status_ID = warningNotices.Status.HasValue ? warningNotices.Status.Value : 0
};
Previously I had a "Distinct" in the subquery for the Resp_Dept field, but I removed it.
Any ideas? Thanks in advance for your comments =)
This is the query I get from the SQL Server profiler:
exec sp_executesql N'SELECT [t0].[FAR_Seq], [t0].[FAR_Year],
(CASE
WHEN ([t0].[DateTime_Entered]) IS NULL THEN @p3
ELSE [t0].[DateTime_Entered]
END) AS [DateTime_Entered], [t0].[Time_Entered], [t0].[OrigDept], [t0].[Part_No],
(CASE
WHEN ([t0].[DateTime_Analyzed]) IS NULL THEN @p4
ELSE [t0].[DateTime_Analyzed]
END) AS [DateTime_Analyzed], [t0].[Analyzed_By],
(CASE
WHEN (UNICODE([t0].[MDR_Required])) IS NULL THEN @p5
ELSE CONVERT(NChar(1),[t0].[MDR_Required])
END) AS [MDR_Required], (
SELECT [t2].[Resp_Dept]
FROM (
**SELECT TOP (1)** [t1].[Resp_Dept]
FROM [dbo].[FAR_Symptoms] AS [t1]
WHERE (UNICODE([t1].[FAR_Area_ID]) = UNICODE([t0].[FAR_Area_ID])) AND ([t1].[FAR_Year] = [t0].[FAR_Year]) AND ([t1].[FAR_Seq]
= [t0].[FAR_Seq])
) AS [t2]
) AS [Resp_Dept], (
SELECT [t4].[Sympt_Desc]
FROM (
**SELECT TOP (1)** [t3].[Sympt_Desc]
FROM [dbo].[Symptoms] AS [t3]
WHERE [t3].[symptom_ID] = @p6
) AS [t4]
) AS [Sympt_Desc], [t0].[FAR_Area_ID],
(CASE
WHEN [t0].[Status] IS NOT NULL THEN [t0].[Status]
ELSE @p7
END) AS [Status_ID]
FROM [dbo].[FARs] AS [t0]
WHERE (UNICODE([t0].[FAR_Area_ID]) = @p0) AND ([t0].[FAR_Seq] = @p1) AND ([t0].[FAR_Year] = @p2)
ORDER BY [t0].[FAR_Seq]',N'@p0 int,@p1 int,@p2 varchar(2),@p3 datetime,@p4 datetime,@p5 nchar(1),@p6 int,@p7
int',@p0=76,@p1=7204,@p2='08',@p3=''9999-12-31 23:59:59:997'',@p4=''9999-12-31 23:59:59:997'',@p5=N' ',@p6=0,@p7=0
The only thing I see there that may not work in SQL Server 2000 is the '()' in the "Select top...", but I'm not sure if that is what is causing the problem and, also, I don't know how that could be fixed =S
Thanks again =)
My Linq statement worked on SQL2008 but broke with the exact same error message on SQL2000.
Had a very similar Linq query that worked on both, the only real difference was that before calling .ToList() I called the .OrderBy() clause.
Ex:
var query = from t1 in table1 ...
...;
list = query.OrderBy(o => o.Field).ToList();
Tried the same OrderBy clause on the broken Linq query and it worked!
Has to be a bug?
Do you have the latest Service Pack for Visual Studio and the framework?
I just checked some of my Linq generated SQL and it is using "Top 1" correctly against a SQL Server 2000 database.
After several tests and a review of the DB, I found that the problem was a legacy table I was working on: that table has "text" type fields. Also, I had to remove some "Distinct" instructions in a nested query I had.
I found this and, after reviewing it, I found that I have to change my queries and that the "Distinct" instruction does not work correctly. As a side note, let me say that nested queries can also generate unexpected behavior.
So, the real lesson here is that if you need to deploy against SQL Server 2000, set up an instance of that server and test against it!!! XD
Thanks a lot of your help =)