I have the following query that joins two large tables. I am trying to join on patient_id and records that are not older than 30 days.
select * from
chairs c
join data id
on c.patient_id = id.patient_id
and to_date(c.from_date, 'YYYYMMDD') - to_date(id.from_date, 'YYYYMMDD') >= 0
and to_date (c.from_date, 'YYYYMMDD') - to_date(id.from_date, 'YYYYMMDD') < 30
Currently, this query takes 2 hours to run. What indexes can I create on these tables for this query to run faster.
I will take a shot in the dark, because as others said it depends on what the table structure, indices, and the output of the planner is.
The most obvious thing here is that as long as it is possible, you want to represent dates as some date datatype instead of strings. That is the first and most important change you should make here. No index can save you if you transform strings. Because very likely, the problem is not the patient_id, it's your date calculation.
Other than that, forcing hash joins on the patient_id and then doing the filtering could help if for some reason the planner decided to do nested loops for that condition. But that is for after you fixed your date representation AND you still have a problem AND you see that the planner does nested loops on that attribute.
Some observations if you are stuck with string fields for the dates:
YYYYMMDD date strings are ordered and can be used for <,> and =.
Building strings from the data in chairs to use to JOIN on data will make good use of an index like one on data for patient_id, from_date.
So my suggestion would be to write expressions that build the date strings you want to use in the JOIN. Or to put it another way: do not transform the child table data from a string to something else.
Example expression that takes 30 days off a string date and returns a string date:
select to_char(to_date('20200112', 'YYYYMMDD') - INTERVAL '30 DAYS','YYYYMMDD')
Untested:
select * from
chairs c
join data id
on c.patient_id = id.patient_id
and id.from_date between to_char(to_date(c.from_date, 'YYYYMMDD') - INTERVAL '30 DAYS','YYYYMMDD')
and c.from_date
For this query:
select *
from chairs c join data
id
on c.patient_id = id.patient_id and
to_date(c.from_date, 'YYYYMMDD') - to_date(id.from_date, 'YYYYMMDD') >= 0 and
to_date (c.from_date, 'YYYYMMDD') - to_date(id.from_date, 'YYYYMMDD') < 30;
You should start with indexes on (patient_id, from_date) -- you can put them in both tables.
The date comparisons are problematic. Storing the values as actual dates can help. But it is not a 100% solution because comparison operations are still needed.
Depending on what you are actually trying to accomplish there might be other ways of writing the query. I might encourage you to ask a new question, providing sample data, desired results, and a clear explanation of what you really want. For instance, this query is likely to return a lot of rows. And that just takes time as well.
Your query have a non SERGABLE predicate because it uses functions that are iteratively executed. You need to discard such functions and replace them by a direct access to the columns. As an exemple :
SELECT *
FROM chairs AS c
JOIN data AS id
ON c.patient_id = id.patient_id
AND c.from_date BETWEEN id.from_date AND id.from_date + INTERVAL '1 day'
Will run faster with those two indexes :
CREATE X_SQLpro_001 ON chairs (patient_id, from_date);
CREATE X_SQLpro_002 ON data (patient_id, from_date) ;
Also try to avoid
SELECT *
And list only the necessary columns
I have 2 columns of phone number and the requirement is to get the numbers which have same last 8 digits. ColumnA's numbers have 11 digits and columnB's numbers have 9 or 10 digits.
I tried to use SUBSTR or LIKE and LEFT RIGHT function to solve but the problem is the data is too big and i can't use that way.
select trunc(ta.timeA), ta.columnA
from table1A ta,
tableB tb
WHERE substr(ta.columnA,-8) LIKE substr(tb.columnB,-8)
and trunc(ta.timeA) = trunc(ta.timeB)
AND trunc(ta.timeA) >= TO_DATE('01/01/2018', 'dd/mm/yyyy')
AND trunc(ta.timeA) < TO_DATE('01/01/2018', 'dd/mm/yyyy') + 1
GROUP BY ta.columnA, trunc(ta.timeA)
You want to select from tableA, so do this. Don't join. You only want to select tableA rows that have a match in tableB. So place an EXISTS clause in your WHERE clause.
select trunc(timea), columna
from table1a ta
where trunc(timea) >= date '2018-01-01'
and trunc(timea) < date '2018-01-02'
and exists
(
select *
from tableb tb
where trunc(tb.timeb) = trunc(ta.timea)
and substr(tb.columnb, -8) = substr(ta.columna, -8)
)
order by trunc(timea), columna;
In order to have this run fast, create the following indexes:
create idxa on tablea( trunc(timea), substr(columna, -8) );
create idxb on tableb( trunc(timeb), substr(columnb, -8) );
I don't see, however, why you are so eager to have this run fast. Do you want to keep all data as is and run the query again and again? There should be a better solution. Splitting the area code and number into two separate columns is the first thing that comes to mind.
UPDATE: Still faster than the suggested idxa should be a covering index for tableA:
create idxa on tablea( trunc(timea), substr(columna, -8), columna );
Here the DBMS can work with the index only and doesn't have to access the table. So just in case the above was still a bit too slow for you, you can try with this altered index.
And as Alex Poole has pointed out in the comments below, it should be
where trunc(timea) = date '2018-01-01'
only, if the range you are looking at is always a single day as in the example.
You can try below using = operator instead of like operator
as your want to match last 2 digit
select trunc(ta.timeA),ta.columnA
from table1A ta inner join tableB tb
on substr(ta.columnA,-8) = substr(tb.columnB,-8)
and trunc(ta.timeA) = trunc(ta.timeB)
AND trunc(ta.timeA) >= TO_DATE('01/01/2018', 'dd/mm/yyyy')
AND trunc(ta.timeA) < TO_DATE('01/01/2018', 'dd/mm/yyyy') + 1
GROUP BY ta.columnA, trunc(ta.timeA)
It would be easier to help if you were more specific about your SQL environment, below is some advice on this query that would apply in most environments.
When dealing with large data sets performance becomes even more critical and small changes in technique can have a big impact.
For example:
Like is normally used for a partial match with wildcards, do you not mean equals? Like is slower than equals, if you're not using wildcards I recommend looking for equality.
Also, you initially start with a (cross/cartesian product) join, but then your where clause defines very specific match criteria (matching time fields), if you need a matching time field make it part of the table join, this will reduce the number of join results which will significantly shrink the dataset that then needs to have the other criteria applied to it.
Also, having date calculated values in your where clause is slow. It is better to set a #fromDate and #toDate parameter before the query, then use these in the where clause as what are then literals which then don't need to be calculated for every row.
Okay, so I've done quite a lot of reading on the possibility of emulating the networkdays function of excel in sql, and have come to the conclusion that by far the easiest solution is to have a calendar table which will flag working days or non working days. However, due to circumstances out of my control, we don't have access to such a luxury and it's unlikely that we will any time in the near future.
Currently I have managed to bodge together what is undoubtedly a horrible ineffecient query in SQL that does work - the catch is, it will only work for a single client record at a time.
SELECT O_ASSESSMENTS.ASM_ID,
O_ASSESSMENTS.ASM_START_DATE,
O_ASSESSMENTS.ASM_END_DATE,
sum(CASE
When TO_CHAR(O_ASSESSMENTS.ASM_START_DATE + rownum -1,'Day')
= 'Sunday ' THEN 0
When TO_CHAR(O_ASSESSMENTS.ASM_START_DATE + rownum -1,'Day')
= 'Saturday ' THEN 0
WHEN O_ASSESSMENTS.ASM_START_DATE + rownum - 1
IN ('03-01-2000','21-04-2000','24-04-2000','01-05-2000','29-05-2000','28-08-2000','25-12-2000','26-12-2000','01-01-2001','13-04-2001','16-04-2001','07-05-2001','28-05-2001','27-08-2001','25-12-2001','26-12-2001','01-01-2002','29-03-2002','01-04-2002','06-04-2002','03-06-2002','04-06-2002','26-08-2002','25-12-2002','26-12-2002','01-01-2003','18-04-2003','21-04-2003','05-05-2003','26-05-2003','25-08-2003','25-12-2003','26-12-2003','01-01-2004','09-04-2004','12-04-2004','03-05-2004','31-05-2004','30-08-2004','25-12-2004','26-12-2004','27-12-2004','28-12-2004','01-01-2005','03-01-2005','25-03-2005','28-03-2005','02-05-2005','30-05-2005','29-08-2005','27-12-2005','28-12-2005','02-01-2006','14-04-2006','17-04-2006','01-05-2006','29-05-2006','28-08-2006','25-12-2006','26-12-2006','02-01-2007','06-04-2007','09-04-2007','07-05-2007','28-05-2007','27-08-2007','25-12-2007','26-12-2007','01-01-2008','21-03-2008','24-03-2008','05-05-2008','26-05-2008','25-08-2008','25-12-2008','26-12-2008','01-01-2009','10-04-2009','13-04-2009','04-05-2009','25-05-2009','31-08-2009','25-12-2009','28-12-2009','01-01-2010','02-04-2010','05-04-2010','03-05-2010','31-05-2010','30-08-2010','24-12-2010','27-12-2010','28-12-2010','31-12-2010','03-01-2011','22-04-2011','25-04-2011','29-04-2011','02-05-2011','30-05-2011','29-08-2011','26-12-2011','27-12-2011')
THEN 0
ELSE 1
END)-1 AS Week_Day
From O_ASSESSMENTS,
ALL_OBJECTS
WHERE O_ASSESSMENTS.ASM_QSA_ID IN ('TYPE1')
AND O_ASSESSMENTS.ASM_END_DATE >= '01/01/2012'
AND O_ASSESSMENTS.ASM_ID = 'A00000'
AND ROWNUM <= O_ASSESSMENTS.ASM_END_DATE-O_ASSESSMENTS.ASM_START_DATE+1
GROUP BY
O_ASSESSMENTS.ASM_ID,
O_ASSESSMENTS.ASM_START_DATE,
O_ASSESSMENTS.ASM_END_DATE
Basically, I'm wondering if a) I should stop wasting my time on this or b) is it possible to get this to work for multiple clients? Any pointers appreciated thanks!
Edit: Further clarification - I already work out timescales using excel, but it would be ideal if we could do it in the report as the report in question is something that we would like end users to be able to run without any further manipulation.
Edit:
MarkBannister's answer works perfectly albeit slowly (though I had expected as much given it's not the preferred solution) - the challenge now lies in me integrating this into an existing report!
with
calendar_cte as (select
to_date('01-01-2000')+level-1 calendar_date,
case when to_char(to_date('01-01-2000')+level-1, 'day') in ('sunday ','saturday ') then 0 when to_date('01-01-2000')+level-1 in ('03-01-2000','21-04-2000','24-04-2000','01-05-2000','29-05-2000','28-08-2000','25-12-2000','26-12-2000','01-01-2001','13-04-2001','16-04-2001','07-05-2001','28-05-2001','27-08-2001','25-12-2001','26-12-2001','01-01-2002','29-03-2002','01-04-2002','06-04-2002','03-06-2002','04-06-2002','26-08-2002','25-12-2002','26-12-2002','01-01-2003','18-04-2003','21-04-2003','05-05-2003','26-05-2003','25-08-2003','25-12-2003','26-12-2003','01-01-2004','09-04-2004','12-04-2004','03-05-2004','31-05-2004','30-08-2004','25-12-2004','26-12-2004','27-12-2004','28-12-2004','01-01-2005','03-01-2005','25-03-2005','28-03-2005','02-05-2005','30-05-2005','29-08-2005','27-12-2005','28-12-2005','02-01-2006','14-04-2006','17-04-2006','01-05-2006','29-05-2006','28-08-2006','25-12-2006','26-12-2006','02-01-2007','06-04-2007','09-04-2007','07-05-2007','28-05-2007','27-08-2007','25-12-2007','26-12-2007','01-01-2008','21-03-2008','24-03-2008','05-05-2008','26-05-2008','25-08-2008','25-12-2008','26-12-2008','01-01-2009','10-04-2009','13-04-2009','04-05-2009','25-05-2009','31-08-2009','25-12-2009','28-12-2009','01-01-2010','02-04-2010','05-04-2010','03-05-2010','31-05-2010','30-08-2010','24-12-2010','27-12-2010','28-12-2010','31-12-2010','03-01-2011','22-04-2011','25-04-2011','29-04-2011','02-05-2011','30-05-2011','29-08-2011','26-12-2011','27-12-2011','01-01-2012','02-01-2012') then 0 else 1 end working_day
from dual
connect by level <= 1825 + sysdate - to_date('01-01-2000') )
SELECT
a.ASM_ID,
a.ASM_START_DATE,
a.ASM_END_DATE,
sum(c.working_day)-1 AS Week_Day
From
O_ASSESSMENTS a
join calendar_cte c
on c.calendar_date between a.ASM_START_DATE and a.ASM_END_DATE
WHERE a.ASM_QSA_ID IN ('TYPE1')
and a.ASM_END_DATE >= '01/01/2012'
GROUP BY
a.ASM_ID,
a.ASM_START_DATE,
a.ASM_END_DATE
There are a few ways to do this. Perhaps the simplest might be to create a CTE that produces a virtual calendar table, based on Oracle's connect by syntax, and then join it to the Assesments table, like so:
with calendar_cte as (
select to_date('01-01-2000')+level-1 calendar_date,
case when to_char(to_date('01-01-2000')+level-1, 'Day')
in ('Sunday ','Saturday ') then 0
when to_date('01-01-2000')+level-1
in ('03-01-2000','21-04-2000','24-04-2000','01-05-2000','29-05-2000','28-08-2000','25-12-2000','26-12-2000','01-01-2001','13-04-2001','16-04-2001','07-05-2001','28-05-2001','27-08-2001','25-12-2001','26-12-2001','01-01-2002','29-03-2002','01-04-2002','06-04-2002','03-06-2002','04-06-2002','26-08-2002','25-12-2002','26-12-2002','01-01-2003','18-04-2003','21-04-2003','05-05-2003','26-05-2003','25-08-2003','25-12-2003','26-12-2003','01-01-2004','09-04-2004','12-04-2004','03-05-2004','31-05-2004','30-08-2004','25-12-2004','26-12-2004','27-12-2004','28-12-2004','01-01-2005','03-01-2005','25-03-2005','28-03-2005','02-05-2005','30-05-2005','29-08-2005','27-12-2005','28-12-2005','02-01-2006','14-04-2006','17-04-2006','01-05-2006','29-05-2006','28-08-2006','25-12-2006','26-12-2006','02-01-2007','06-04-2007','09-04-2007','07-05-2007','28-05-2007','27-08-2007','25-12-2007','26-12-2007','01-01-2008','21-03-2008','24-03-2008','05-05-2008','26-05-2008','25-08-2008','25-12-2008','26-12-2008','01-01-2009','10-04-2009','13-04-2009','04-05-2009','25-05-2009','31-08-2009','25-12-2009','28-12-2009','01-01-2010','02-04-2010','05-04-2010','03-05-2010','31-05-2010','30-08-2010','24-12-2010','27-12-2010','28-12-2010','31-12-2010','03-01-2011','22-04-2011','25-04-2011','29-04-2011','02-05-2011','30-05-2011','29-08-2011','26-12-2011','27-12-2011')
then 0
else 1
end working_day
from dual
connect by level <= 36525 + sysdate - to_date('01-01-2000') )
SELECT a.ASM_ID,
a.ASM_START_DATE,
a.ASM_END_DATE,
sum(c.working_day) AS Week_Day
From O_ASSESSMENTS a
join calendar_cte c
on c.calendar_date between a.ASM_START_DATE and a.ASM_END_DATE
WHERE a.ASM_QSA_ID IN ('TYPE1') and
a.ASM_END_DATE >= '01/01/2012' -- and a.ASM_ID = 'A00000'
GROUP BY
a.ASM_ID,
a.ASM_START_DATE,
a.ASM_END_DATE
This will produce a virtual table populated with dates from 01 January 2000 to 10 years after the current date, with all weekends marked as non-working days and all days specified in the second in clause (ie. up to 27 December 2011) also marked as non-working days.
The drawback of this method (or any method where the holiday dates are hardcoded into the query) is that each time new holiday dates are defined, every single query that uses this approach will have to have those dates added.
If you can't use a calendar table in Oracle, you might be better off exporting to Excel. Brute force always works.
Networkdays() "returns the number of whole working days between start_date and end_date. Working days exclude weekends and any dates identified in holidays."
Excluding weekends seems fairly straightforward. Every 7-day period will contain two weekend days. You'll just need to take some care with the leftover days.
Holidays are a different story. You have to either store them or pass them as an argument. If you could store them, you'd store them in a calendar table, and your problem would be over. But you can't do that.
So you're looking at passing them as an argument. Off the top of my head--and I haven't had any tea yet this morning--I'd consider a common table expression or a wrapper for a stored procedure.