Oracle Query is timing out

I'm trying to write an Oracle query to join data from 4 different tables. The code is below:
SELECT
PROJ.PRJ_NO, PROJ.PRJ_NAME, PROJ.PRJ_BEG_DATE, PROJ.PRJ_END_DATE, PORT.TIER1_NAME, PORT.TIER2_NAME, PORT.TIER3_NAME, MAX(A.FIS_WK_END_DATE) AS "FISCAL_WEEK", SUM(A.ABDOL) AS "AAB_DOL", SUM(A.VHDOL) AS "AVH_DOL", SUM(A.ADOL) AS "AA_DOL", SUM(A.DCDOL) AS "ADC_DOL", SUM(A.DCGADOL) AS "ADC_GA_DOL", SUM(A.COM) AS "AM_DOL", SUM(A.FE) AS "AFE_DOL", SUM(A.IE) AS "AIE_DOL", SUM(A.OTHER) AS "AR_DOL", SUM(A.MTSFT) AS "AS_FT", SUM(A.MTSST) AS "AS_ST", SUM(A.ACTST) AS "AL_ST", SUM(A.ACTFT) AS "ALL_FT", MAX(P.SNAPSHOT_DATE) as "SNAP_DATE", P.FINSCN_TYPE, SUM(P.ABDOL) AS "PAB_DOL", SUM(P.VHDOL) AS "PVH_DOL", SUM(P.DCDOL) AS "PDC_DOL", SUM(P.TCI_DOL) AS "PCI_GA_DOL", SUM(P.GADOL) AS "PN_GA_DOL", SUM(P.COM) AS "PN_COM", SUM(P.FEE) AS "PN_FEE", SUM(P.D_IE) AS "PN_MOIE", SUM(P.OTHER) AS "PN_OTHER"
FROM PROJ_TASK_VW PROJ
LEFT JOIN PORTFOLIO_VW PORT
ON PROJ.TASKNO = PORT.TASKNO
LEFT JOIN ACTUAL_VW A
ON PROJ.TASKNO = A.CURR_TASKNO
LEFT JOIN BUDG_DOLL_VW P
ON PROJ.TASKNO = P.CURR_TASKNO
WHERE TO_CHAR(PROJ.PRJ_END_DATE, 'YYYY-MM-DD') > '2018-10-01'
AND PROJ.P_FLAG = 'N'
AND (PROJ.P_TYPE LIKE 'D-%' OR PROJ.P_TYPE LIKE '%MR%' OR PROJ.P_TYPE LIKE '%ID%')
AND (SUBSTR(PROJ.PRJ_NO,3,3) != 'BP' AND SUBSTR(PROJ.PRJ_NO,3,3) != 'PJ')
AND (P.FINSCN_TYPE = 'SR' OR P.FINSCN_TYPE = 'BUG')
AND (A.ABDOL + A.VHDOL + A.ADOL + A.DCDOL + A.DCGADOL + A.COM +
A.FE + A.IE + A.OTHER) <> 0
GROUP BY
PROJ.PRJ_NO,
PROJ.PRJ_NAME,
PROJ.PRJ_BEG_DATE,
PROJ.PRJ_END_DATE,
PORT.TIER1_NAME,
PORT.TIER2_NAME,
PORT.TIER3_NAME,
P.FINSCN_TYPE
My overall intent is to bring all of the select fields into a single table using left joins, with "PROJ" as the parent table and the remaining tables providing child data based on the rows returned from "PROJ". When the query is run, it times out after about 30 minutes. Is there a better way to write this query so that I can build the table I need without timing out?

First, there's no way to answer this question without an execution plan. What columns do you have indexed? But here are some things I noticed.
WHERE TO_CHAR(PROJ.PRJ_END_DATE, 'YYYY-MM-DD') > '2018-10-01'
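Your column is a date, so you should be comparing to a date, rather than converting to a VARCHAR2 and doing an inequality on strings. For example, PROJ.PRJ_END_DATE >= DATE '2018-10-02' keeps the same rows as the TO_CHAR test and leaves the column bare, so an index on PRJ_END_DATE (if one exists) can be used.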
AND (PROJ.P_TYPE LIKE 'D-%' OR PROJ.P_TYPE LIKE '%MR%' OR PROJ.P_TYPE LIKE '%ID%')
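These will likely not be very performant because of the wildcards. A pattern with a fixed prefix such as 'D-%' can be served by an ordinary index range scan on P_TYPE, but patterns with a leading wildcard ('%MR%', '%ID%') generally cannot, so every candidate row has to be checked.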
AND (SUBSTR(PROJ.PRJ_NO,3,3) != 'BP' AND SUBSTR(PROJ.PRJ_NO,3,3) != 'PJ')
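These do nothing whenever PRJ_NO is at least five characters long: SUBSTR(PROJ.PRJ_NO,3,3) then returns a 3-character string, which can never equal a 2-character string. If the intent was to test the two characters starting at position 3, that would be SUBSTR(PROJ.PRJ_NO,3,2).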
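A small sketch of the point above, run against SQLite through Python's sqlite3 module rather than Oracle (an assumption made only so the example is self-contained; substr with a positive start behaves the same way here). The project numbers are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical project numbers; only the characters from position 3 matter.
rows = conn.execute(
    """
    SELECT prj_no,
           substr(prj_no, 3, 3)         AS three_chars,
           substr(prj_no, 3, 3) != 'BP' AS original_test,
           substr(prj_no, 3, 2) != 'BP' AS two_char_test
    FROM (SELECT 'ABBP01' AS prj_no UNION ALL SELECT 'ABXY01')
    ORDER BY prj_no
    """
).fetchall()

for row in rows:
    print(row)
```

The original test is true for both rows, including the one that contains 'BP', while the two-character version filters that row out.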
AND (A.ABDOL + A.VHDOL + A.ADOL + A.DCDOL + A.DCGADOL + A.COM + A.FE + A.IE + A.OTHER) <> 0
Do you actually care about the sum here, or are you just checking that one or more of these values is non-zero? If these values are never negative, then you're better off replacing this with:
AND ( a.ABDOL > 0 OR A.VHDOL > 0 ...
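To see why the two forms of the test are only interchangeable when no value is negative, here is a minimal sketch using SQLite through Python's sqlite3 module. The table, its two columns and the rows are hypothetical stand-ins for ACTUAL_VW.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE actual (task TEXT, abdol REAL, vhdol REAL)")
conn.executemany(
    "INSERT INTO actual VALUES (?, ?, ?)",
    [("offsetting", 5, -5), ("all_zero", 0, 0), ("positive", 5, 0)],
)

# Test on the sum: amounts that cancel out are dropped.
by_sum = [r[0] for r in conn.execute(
    "SELECT task FROM actual WHERE abdol + vhdol <> 0 ORDER BY task")]

# Test on each column: any non-zero amount keeps the row.
by_column = [r[0] for r in conn.execute(
    "SELECT task FROM actual WHERE abdol <> 0 OR vhdol <> 0 ORDER BY task")]

print(by_sum)
print(by_column)
```

The row whose amounts cancel out is dropped by the sum test but kept by the per-column test, so which form is right depends on what the filter is meant to express.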

Related

How can I count all NULL values, without column names, using SQL?

I'm reading and executing sql queries from file and I need to inspect the result sets to count all the null values across all columns. Because the SQL is read from file, I don't know the column names and thus can't call the columns by name when trying to find the null values.
I think using CTE is the best way to do this, but how can I call the columns when I don't know what the column names are?
WITH query_results AS
(
<sql_read_from_file_here>
)
select count_if(<column_name> is not null) FROM query_results
If you are using Python to read the file of SQL statements, you can do something like this which uses pglast to parse the SQL query to get the columns for you:
import pglast
# Assumption: an older pglast release, where parse_sql() returns plain dicts.
sql_read_from_file_here = "SELECT 1 foo, 1 bar"
ast = pglast.parse_sql(sql_read_from_file_here)
# Each output column needs an explicit alias for ResTarget['name'] to be set.
cols = ast[0]['RawStmt']['stmt']['SelectStmt']['targetList']
sum_stmt = "sum(iff({col} is null,1,0))"
# Build one null-counting term per column.
sums = [sum_stmt.format(col=col['ResTarget']['name']) for col in cols]
print(f"select {' + '.join(sums)} total_null_count from query_results")
# outputs: select sum(iff(foo is null,1,0)) + sum(iff(bar is null,1,0)) total_null_count from query_results
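If counting on the client side is acceptable, the column names do not need to be parsed out of the SQL at all. The following is only a sketch of that alternative, assuming a DB-API 2.0 driver; it uses sqlite3 so that it runs as written, and count_nulls is a hypothetical helper, not part of any library.

```python
import sqlite3

def count_nulls(cursor, sql):
    """Run `sql` and count NULL (None) values in every column of the result."""
    cursor.execute(sql)
    # cursor.description carries the column names of whatever the query returned.
    names = [column[0] for column in cursor.description]
    counts = [0] * len(names)
    for row in cursor:
        for i, value in enumerate(row):
            if value is None:
                counts[i] += 1
    return sum(counts), list(zip(names, counts))

conn = sqlite3.connect(":memory:")
sql_read_from_file_here = (
    "SELECT 1 AS foo, NULL AS bar UNION ALL SELECT NULL, NULL"
)
total, per_column = count_nulls(conn.cursor(), sql_read_from_file_here)
print(total, per_column)
```

This pulls the whole result set to the client, so for large results the generated SQL above is likely the better fit.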

How can this SQL query be improved?

This code actually works, but I guess it could be done in an easier or more efficient way (I assume comparing strings like this is not a problem, but it may be). I tried to convert everything to datetime instead of strings, but failed.
This code takes the date and time columns from one table, concatenates them, and compares the result to the datetime column of another table. The result is: 2019-12-02 09:00:00
These are just regular Date, Time and Datetime columns in the tables, for example Date 2019-11-17, Time 09:00:00, Datetime 2019-01-15 16:00:00
SELECT
MAX(CONCAT(f_ini, CONCAT( " ", h_ini)))
FROM posible_work
WHERE
fk_id_asigned = 100573 AND
(SELECT MAX(CONCAT(f_ini, CONCAT( " ", h_ini)))
FROM posible_work) >
(SELECT MAX(RIGHT(f_fin,19)) AS FechaAVI FROM next_work WHERE fk_id_worker = 100573)
Apart from the fact that you force concat() to be called twice when you can use it with 3 arguments (or more), one additional problem would be this condition:
(SELECT MAX(CONCAT(f_ini, CONCAT( " ", h_ini)))
FROM posible_work) >
(SELECT MAX(RIGHT(f_fin,19)) AS FechaAVI FROM next_work WHERE fk_id_worker = 100573)
Why not use just CONCAT(...) on the left side of the inequality, instead of repeating SELECT ... MAX(...)?
I don't think that this breaks your code's logic:
SELECT
MAX(CONCAT(f_ini, ' ', h_ini))
FROM posible_work
WHERE
fk_id_asigned = 100573
AND
CONCAT(f_ini, ' ', h_ini) >
(SELECT MAX(RIGHT(f_fin,19)) AS FechaAVI FROM next_work WHERE fk_id_worker = 100573)
Also it could be better if you did not hardcode 100573 twice.
Just use aliases properly:
SELECT
MAX(CONCAT(p.f_ini, ' ', p.h_ini))
FROM posible_work p
WHERE
p.fk_id_asigned = 100573
AND
CONCAT(p.f_ini, ' ', p.h_ini) >
(SELECT MAX(RIGHT(f_fin,19)) AS FechaAVI FROM next_work WHERE fk_id_worker = p.fk_id_asigned)
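On the question of whether comparing the concatenated strings is safe: zero-padded 'YYYY-MM-DD HH:MM:SS' text sorts in the same order as the dates it represents, which is why MAX() and > give sensible results here. A quick sketch in Python, reusing the sample values from the question plus one made-up unpadded value:

```python
from datetime import datetime

stamps = ["2019-12-02 09:00:00", "2019-01-15 16:00:00", "2019-11-17 09:00:00"]

# Sorting as text and sorting as real datetimes give the same order.
as_text = sorted(stamps)
as_datetime = sorted(
    stamps, key=lambda s: datetime.strptime(s, "%Y-%m-%d %H:%M:%S")
)
same_order = as_text == as_datetime

# Without zero padding the guarantee is lost: February compares as later
# than November because '2' > '1' as characters.
unpadded_breaks = "2019-2-1 09:00:00" > "2019-11-17 09:00:00"

print(same_order, max(stamps), unpadded_breaks)
```

So the string comparison holds as long as every value involved is in the same zero-padded format.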

Query optimization beyond indexes

I wrote this query that 'cubes' some data writing partial totals:
select upper(coalesce(left(k.SubStabilimento,12),'ALL')) as Stabilimento,
sum(k.PotenzialmenteInappropriato) as Numeratore,
count(k.ProgrSdo)-sum(k.PotenzialmenteInappropriato) as Denominatore,
case when (count(k.ProgrSdo)-sum(k.PotenzialmenteInappropriato)) > 0 then 1.0*sum(k.PotenzialmenteInappropriato) / (count(k.ProgrSdo)-sum(k.PotenzialmenteInappropriato)) else 0 end as Rapporto,
upper(coalesce(DescrDisciplina,'ALL')) AS Disciplina,
case when K.TipologiaDRG = 'C' then 'CHIR.'
when K.TipologiaDRG = 'M' then 'MED.'
when K.TipologiaDRG is null then 'ALL'
when K.TipologiaDRG = '' then 'SENZA TIPO'
end as TipoDRG,
case when [Anno]=@anno then 'ATTUALE'
when [Anno]=@anno-1 then 'PRECEDENTE'
else cast([Anno] as varchar(4))
end as Periodo,
upper(coalesce(left(k.mese,2), 'ALL')) as Mese,
upper(coalesce(NomeMese,'ALL')) as MeseDescr
from
tabella k
where k.Mese <= @mese
and k.anno between @anno-1 and @anno
and k.RegimeRicovero = 1
and codicepresidio=080808
and TipologiaFlusso like 'Pro%'
group by SubStabilimento, DescrDisciplina, TipologiaDRG, anno, mese,nomemese with cube
having grouping(anno) = 0
AND GROUPING(nomeMese) = GROUPING(mese)
This Groovy code is appended at runtime, depending on the parameter values that have to be passed to the query:
if ( parameters.get('par_stabilimenti').toUpperCase() != "'TUTTO'" )
{ query = query + "and upper(coalesce(left(k.SubStabilimento,12),'AUSL_TOTALE')) in ("+ parameters.get('par_stabilimenti').toUpperCase() +" )";}
if ( parameters.get('par_discipline').toUpperCase() != "'TUTTO'" )
{ query = query + "and upper(coalesce(k.DescrDisciplina,'TOT. STABILIMENTO')) in ("+ parameters.get('par_discipline').toUpperCase() +" )";}
SQL parameters are passed by the application at runtime.
I (manually) created indexes on single columns and on the table's primary key, and I also added the indexes suggested by the SQL Server query tuner.
It still takes too long to execute (about 4 seconds), and I need it to run 8 times faster.
Is there some optimization I can do on the query? (Parameters are passed by the application.)
Is there a way to precalculate the execution plan, so SQL Server doesn't have to rebuild it every time I launch the query?
I really have no idea how to improve performance beyond what I already did.
I'm on SQL Server 2018 pro (so no columnstore indexes).
Here you can find the execution plan.

LINQ to SQL selecting records and converting dates

I'm trying to select records from a table based on a date using Linq to SQL. Unfortunately the date is split across two tables - the Hours table has the day and the related JobTime table has the month and year in two columns.
I have the following query:
Dim qry = From h As Hour In ctx.Hours Where Convert.ToDateTime(h.day & "/" & h.JobTime.month & "/" & h.JobTime.year & " 00:00:00") > Convert.ToDateTime("01/01/2012 00:00:00")
This gives me the error "Arithmetic overflow error converting expression to data type datetime."
Looking at the SQL query in SQL server profiler, I see:
exec sp_executesql N'SELECT [t0].[JobTimeID], [t0].[day], [t0].[hours]
FROM [dbo].[tbl_pm_hours] AS [t0]
INNER JOIN [dbo].[tbl_pm_jobtimes] AS [t1] ON [t1].[JobTimeID] = [t0].[JobTimeID]
WHERE (CONVERT(DateTime,(((((CONVERT(NVarChar,[t0].[day])) + @p0) + (CONVERT(NVarChar,COALESCE([t1].[month],NULL)))) + @p1) + (CONVERT(NVarChar,COALESCE([t1].[year],NULL)))) + @p2)) > @p3',N'@p0 nvarchar(4000),@p1 nvarchar(4000),@p2 nvarchar(4000),@p3 datetime',@p0=N'/',@p1=N'/',@p2=N' 00:00:00',@p3='2012-01-31 00:00:00'
I can see that it's not passing in the date to search for correctly but I'm not sure how to correct it.
Can anyone please help?
Thanks,
Emma
The direct cause of the error may have to do with this issue.
As said there, the conversions you use are a very inefficient way to build a query. On top of that, the expressions are not sargable: you are comparing a value computed from database columns, which prevents the query optimizer from using indexes to seek directly to individual column values. So you could try to fix the error by treating the direct cause, but I think it's better to rewrite the query so that only the plain column values are used in comparisons.
I've worked this out in C#:
var cfg = new DateTime(12,6,12);
int year = 12, month = 6, day = 13; // Try some more values here.
// Date from components > datetime value?
bool gt = (
year > cfg.Year || (
(year == cfg.Year && month > cfg.Month) || (
year == cfg.Year && month == cfg.Month && day > cfg.Day)
)
);
You see that it's not as straightforward as it may look at first, but it works. There are many more comparisons to evaluate, but I'm sure that the ability to use indexes will easily outweigh this.
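The same component-wise test can be checked against a real date comparison. This is a sketch in Python rather than C# or LINQ; components_after is a hypothetical helper, and the cutoff is the 01/01/2012 value from the question.

```python
from datetime import date, timedelta

def components_after(year, month, day, cutoff):
    """True when the date (year, month, day) is strictly later than `cutoff`."""
    return (
        year > cutoff.year
        or (year == cutoff.year and month > cutoff.month)
        or (year == cutoff.year and month == cutoff.month and day > cutoff.day)
    )

cutoff = date(2012, 1, 1)
start = date(2011, 1, 1)

# Compare with the built-in date ordering for every day over two years.
mismatches = [
    d
    for d in (start + timedelta(days=n) for n in range(730))
    if components_after(d.year, d.month, d.day, cutoff) != (d > cutoff)
]
print(mismatches)
```

An empty mismatch list means the three-branch condition agrees with an ordinary date comparison on every day tested.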
A more straightforward, but not sargable, way is to use sortable dates, like 20120101 and compare those (as integers).
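A sketch of the sortable-integer idea, again in Python with a hypothetical helper name. In the database the key would still be computed from the three columns for every row, which is why it is not sargable.

```python
def sortable_key(year, month, day):
    """Pack a date into an integer such as 20120101 that sorts chronologically."""
    return year * 10000 + month * 100 + day

cutoff_key = sortable_key(2012, 1, 1)

print(cutoff_key)
print(sortable_key(2011, 12, 31) > cutoff_key)
print(sortable_key(2012, 1, 2) > cutoff_key)
```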

Modify Return Value of SELECT-Statement (TSQL) [Optimizing query]

Problem:
A database column has a tri-state value (0, 1, 2).
Each of the values is used on the server side.
The client code (which can't be changed anymore) only understands '0' and '1'.
From the client's point of view, '1' is identical to '2'. So I want to change the SQL query in the database to return '1' if the value is > 0.
My current solution combines two SELECTs (using UNION SELECT) with different WHERE clauses, returning '1' or '0' as static values. Now I'm looking for a way to 'translate' the value within a single SELECT statement.
This is my current solution:
SELECT
dbo.Nachricht.NachrichtID, dbo.Nachricht.Bezeichnung, '1' AS BetrifftKontoeinrichtung
FROM dbo.Nachricht INNER JOIN dbo.AdditionalData
ON dbo.Nachricht.NachrichtID = dbo.AdditionalData.NachrichtID
WHERE (dbo.Nachricht.NachrichtID in ( 450,439 ))
AND dbo.AdditionalData.BetrifftKontoeinrichtung > 0
UNION SELECT
dbo.Nachricht.NachrichtID, dbo.Nachricht.Bezeichnung, '0' AS BetrifftKontoeinrichtung
FROM dbo.Nachricht INNER JOIN dbo.AdditionalData
ON dbo.Nachricht.NachrichtID = dbo.AdditionalData.NachrichtID
WHERE (dbo.Nachricht.NachrichtID in ( 450,439 ))
AND dbo.AdditionalData.BetrifftKontoeinrichtung = 0
You can use a CASE expression, like this:
SELECT
dbo.Nachricht.NachrichtID, dbo.Nachricht.Bezeichnung,
CASE WHEN dbo.AdditionalData.BetrifftKontoeinrichtung = 0
THEN '0' ELSE '1'
END AS BetrifftKontoeinrichtung
FROM dbo.Nachricht
INNER JOIN dbo.AdditionalData
ON dbo.Nachricht.NachrichtID = dbo.AdditionalData.NachrichtID
WHERE (dbo.Nachricht.NachrichtID in ( 450,439 ))
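A self-contained sketch of the same CASE mapping, run against SQLite through Python's sqlite3 module instead of SQL Server. The single table, its column names and the three rows are simplified, hypothetical stand-ins for the joined tables above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE additional_data (nachricht_id INTEGER, betrifft INTEGER)"
)
# One row per tri-state value: 0, 1 and 2.
conn.executemany(
    "INSERT INTO additional_data VALUES (?, ?)",
    [(439, 0), (450, 1), (451, 2)],
)

rows = conn.execute(
    """
    SELECT nachricht_id,
           CASE WHEN betrifft = 0 THEN '0' ELSE '1' END AS betrifft
    FROM additional_data
    ORDER BY nachricht_id
    """
).fetchall()
print(rows)
```

Both 1 and 2 come back as '1', which is what the client expects, and each source row is read only once instead of once per UNION branch.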
Looks like you need to use CASE. There is a decent tutorial here:
http://www.databasejournal.com/features/mssql/article.php/3288921/T-SQL-Programming-Part-5---Using-the-CASE-Function.htm
See the worked example there.
If you just CAST(CAST(val AS BIT) AS INT) you will get integer 0 for 0 and integer 1 for everything else.