I'm working with Oracle PL/SQL and I have a stored procedure with this query. It is a bit convoluted, but it gets the job done; the problem is that it takes around 35 minutes, and SQL Developer's Autotrace says it is doing a full scan even though the tables have their indexes.
So is there any way to improve this query?
select tipotrx, sum(saldo) as saldo, count(*) as totaltrx
from (
    select max(ids) as IDTRX, max(monto) as monto, min(saldo) as saldo,
           max(aq_data) as aq_data, thekey, tipotrx
    from (
        select t.SID as ids,
               TO_NUMBER(SUBSTR(P.P1, 18, 12)) as monto,
               TO_NUMBER(SUBSTR(P.P1, 18, 12)) *
                   (TO_NUMBER(SUBSTR(t.acquirer_data, 13, 2)) -
                    TO_NUMBER(SUBSTR(P.P4, 3, 2))) as saldo,
               TO_CHAR(t.trx_date, 'YYMMDD') || t.auth_code || t.trx_amount ||
                   (SELECT functions.decrypt(t.card_number) FROM DUAL) as thekey,
               t.acquirer_data as aq_data,
               TO_NUMBER(SUBSTR(t.acquirer_data, 12, 1)) as tipotrx
        from TBL_TRX t
        inner join TBL_POS P on t.SID = P.transaction
        where TO_NUMBER(SUBSTR(t.acquirer_data, 13, 2)) >= TO_NUMBER(SUBSTR(P.P4, 3, 2))
          and trunc(t.INC_DATE) between TO_DATE('20/06/2020', 'DD/MM/YYYY') - 35
                                    and TO_DATE('20/06/2020', 'DD/MM/YYYY')
    ) t
    group by thekey, tipotrx
    order by max(ids) desc
) j
group by tipotrx;
Thanks.
Most of the time, the indexed expression has to match exactly what's in the WHERE clause for the index to be eligible for use. An index on the acquirer_data column cannot be used when your WHERE clause says
TO_NUMBER(SUBSTR(t.acquirer_data, 13,2))
An index on the INC_DATE cannot be used when your WHERE clause says
trunc(t.INC_DATE)
You apply a function to every column in the WHERE clause, and that alone can prevent the use of any normal index.
If, however, you create function-based indexes, you can make new indexes that match what's in your WHERE clause. That way there is at least a chance that the DB will use an index instead of doing full table scans.
--example function based index.
CREATE INDEX TRUNC_INC_DATE ON TBL_TRX (TRUNC(INC_DATE));
Of course, new indexes take up more space and add overhead of their own. Keep using that autotrace to see if it's worth it.
Also, updating table statistics probably won't hurt either.
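For example, with DBMS_STATS (a sketch; the schema name is a placeholder):

BEGIN
  -- Refresh optimizer statistics so the planner sees current row counts;
  -- cascade => TRUE gathers statistics on the tables' indexes as well
  DBMS_STATS.GATHER_TABLE_STATS(ownname => 'MY_SCHEMA', tabname => 'TBL_TRX', cascade => TRUE);
  DBMS_STATS.GATHER_TABLE_STATS(ownname => 'MY_SCHEMA', tabname => 'TBL_POS', cascade => TRUE);
END;
/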
Change this:
trunc(t.INC_DATE) between (TO_DATE('20/06/2020', 'DD/MM/YYYY') - 35)
AND TO_DATE('20/06/2020', 'DD/MM/YYYY')
To this:
t.INC_DATE between (TO_DATE('20/06/2020', 'DD/MM/YYYY') - 35)
AND TO_DATE('21/06/2020', 'DD/MM/YYYY') - INTERVAL '1' SECOND
Instead of building a function-based index you can modify the predicate to be sargable (able to use an index). Instead of applying TRUNC to the column, add one day minus one second to the upper-bound literal.
The code is more confusing but should be able to take advantage of the index. However, 35 days of data may be a large amount; the date index may not be very useful and you may need to look at other predicates.
I have the following query that joins two large tables. I am trying to join on patient_id, keeping only pairs of records that are no more than 30 days apart.
select *
from chairs c
join data id
  on c.patient_id = id.patient_id
 and to_date(c.from_date, 'YYYYMMDD') - to_date(id.from_date, 'YYYYMMDD') >= 0
 and to_date(c.from_date, 'YYYYMMDD') - to_date(id.from_date, 'YYYYMMDD') < 30
Currently, this query takes 2 hours to run. What indexes can I create on these tables for this query to run faster?
I will take a shot in the dark because, as others said, it depends on the table structure, the indexes, and the output of the planner.
The most obvious thing here is that, as long as it is possible, you want to represent dates as a date datatype instead of strings. That is the first and most important change you should make here. No index can save you if you keep transforming strings, because very likely the problem is not the patient_id, it's your date calculation.
Other than that, forcing a hash join on patient_id and then doing the filtering could help if for some reason the planner decided to use nested loops for that condition. But that is for after you have fixed your date representation AND you still have a problem AND you see that the planner does nested loops on that attribute.
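In Oracle, forcing the hash join would look something like this (a sketch; only worth trying if the plan actually shows nested loops on that join):

-- USE_HASH asks the optimizer to hash-join the two tables on patient_id
SELECT /*+ USE_HASH(c id) */ *
FROM chairs c
JOIN data id
  ON c.patient_id = id.patient_id
 AND to_date(c.from_date, 'YYYYMMDD') - to_date(id.from_date, 'YYYYMMDD') >= 0
 AND to_date(c.from_date, 'YYYYMMDD') - to_date(id.from_date, 'YYYYMMDD') < 30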
Some observations if you are stuck with string fields for the dates:
YYYYMMDD date strings are ordered and can be used for <,> and =.
Building date strings from the values in chairs to use in the JOIN against data will make good use of an index on data such as (patient_id, from_date).
So my suggestion would be to write expressions that build the date strings you want to use in the JOIN. Or, to put it another way: do not transform the from_date in the child table data from a string into something else.
Example expression that takes 30 days off a string date and returns a string date:
select to_char(to_date('20200112', 'YYYYMMDD') - INTERVAL '30' DAY, 'YYYYMMDD') from dual;
Untested:
select *
from chairs c
join data id
  on c.patient_id = id.patient_id
 and id.from_date between to_char(to_date(c.from_date, 'YYYYMMDD') - INTERVAL '30' DAY, 'YYYYMMDD')
                      and c.from_date
For this query:
select *
from chairs c
join data id
  on c.patient_id = id.patient_id and
     to_date(c.from_date, 'YYYYMMDD') - to_date(id.from_date, 'YYYYMMDD') >= 0 and
     to_date(c.from_date, 'YYYYMMDD') - to_date(id.from_date, 'YYYYMMDD') < 30;
You should start with indexes on (patient_id, from_date) -- one on each table.
The date comparisons are problematic. Storing the values as actual dates would help, but it is not a 100% solution, because the comparison still involves arithmetic between the two columns.
Depending on what you are actually trying to accomplish, there might be other ways of writing the query. I might encourage you to ask a new question, providing sample data, desired results, and a clear explanation of what you really want. For instance, this query is likely to return a lot of rows, and that alone takes time as well.
Your query has a non-sargable predicate because it uses functions that are iteratively executed. You need to discard such functions and replace them with direct access to the columns. As an example:
SELECT *
FROM chairs c
JOIN data id
  ON c.patient_id = id.patient_id
 AND c.from_date >= id.from_date
 AND c.from_date < id.from_date + INTERVAL '30' DAY
It will run faster with these two indexes:
CREATE INDEX X_SQLpro_001 ON chairs (patient_id, from_date);
CREATE INDEX X_SQLpro_002 ON data (patient_id, from_date);
Also try to avoid
SELECT *
and list only the necessary columns.
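For example (the column choice here is just an illustration):

-- Name only the columns the caller actually needs
SELECT c.patient_id, c.from_date, id.from_date AS data_from_date
FROM chairs c
JOIN data id
  ON c.patient_id = id.patient_id
 AND c.from_date >= id.from_date
 AND c.from_date < id.from_date + INTERVAL '30' DAY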
I have 2 columns of phone numbers and the requirement is to get the numbers which have the same last 8 digits. Column A's numbers have 11 digits and column B's numbers have 9 or 10 digits.
I tried to use SUBSTR or LIKE and LEFT/RIGHT functions to solve it, but the problem is the data is too big and I can't do it that way.
select trunc(ta.timeA), ta.columnA
from table1A ta,
     tableB tb
where substr(ta.columnA, -8) LIKE substr(tb.columnB, -8)
  and trunc(ta.timeA) = trunc(tb.timeB)
  and trunc(ta.timeA) >= TO_DATE('01/01/2018', 'dd/mm/yyyy')
  and trunc(ta.timeA) < TO_DATE('01/01/2018', 'dd/mm/yyyy') + 1
group by ta.columnA, trunc(ta.timeA)
You want to select from table1A, so do that; don't join. You only want to select table1A rows that have a match in tableB, so place an EXISTS clause in your WHERE clause.
select trunc(timea), columna
from table1a ta
where trunc(timea) >= date '2018-01-01'
and trunc(timea) < date '2018-01-02'
and exists
(
select *
from tableb tb
where trunc(tb.timeb) = trunc(ta.timea)
and substr(tb.columnb, -8) = substr(ta.columna, -8)
)
order by trunc(timea), columna;
In order to have this run fast, create the following indexes:
create index idxa on table1a( trunc(timea), substr(columna, -8) );
create index idxb on tableb( trunc(timeb), substr(columnb, -8) );
I don't see, however, why you are so eager to have this run fast. Do you want to keep all the data as is and run the query again and again? There should be a better solution; splitting the area code and the number into two separate columns is the first thing that comes to mind.
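A lighter-weight variant of that idea, assuming Oracle 11g or later and that the numbers are stored as strings (the column and index names here are made up), is a virtual column holding the last 8 digits, which can then be indexed directly:

-- Virtual column derived from columna; it takes no storage but can carry an index
ALTER TABLE table1a ADD (last8 VARCHAR2(8) GENERATED ALWAYS AS (SUBSTR(columna, -8)) VIRTUAL);
CREATE INDEX idx_table1a_last8 ON table1a (last8);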
UPDATE: Even faster than the suggested idxa would be a covering index for table1A:
create index idxa on table1a( trunc(timea), substr(columna, -8), columna );
Here the DBMS can work with the index alone and doesn't have to access the table at all. So, in case the above is still a bit too slow for you, you can try this altered index.
And as Alex Poole has pointed out in the comments below, if the range you are looking at is always a single day, as in the example, it should simply be
where trunc(timea) = date '2018-01-01'
You can try the version below using the = operator instead of the LIKE operator, since you want to match the last 8 digits exactly:
select trunc(ta.timeA), ta.columnA
from table1A ta inner join tableB tb
  on substr(ta.columnA, -8) = substr(tb.columnB, -8)
 and trunc(ta.timeA) = trunc(tb.timeB)
 and trunc(ta.timeA) >= TO_DATE('01/01/2018', 'dd/mm/yyyy')
 and trunc(ta.timeA) < TO_DATE('01/01/2018', 'dd/mm/yyyy') + 1
group by ta.columnA, trunc(ta.timeA)
It would be easier to help if you were more specific about your SQL environment; below is some advice on this query that applies in most environments.
When dealing with large data sets performance becomes even more critical and small changes in technique can have a big impact.
For example:
LIKE is normally used for a partial match with wildcards; do you not mean equals? LIKE is slower than equals, so if you're not using wildcards I recommend testing for equality.
Also, you initially start with a cross (Cartesian) join, but then your WHERE clause defines very specific match criteria (matching time fields). If you need a matching time field, make it part of the table join; this will reduce the number of join results, which significantly shrinks the dataset that the other criteria then need to be applied to.
Also, having calculated date values in your WHERE clause is slow. It is better to set a #fromDate and #toDate parameter before the query, then use these in the WHERE clause; they are then effectively literals that don't need to be calculated for every row.
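In Oracle/SQL*Plus terms that could look like this (a sketch; the bind variable names are illustrative):

-- Compute the bounds once as bind variables instead of once per row
VARIABLE from_dt VARCHAR2(10)
VARIABLE to_dt   VARCHAR2(10)
EXEC :from_dt := '01/01/2018'
EXEC :to_dt   := '02/01/2018'

SELECT trunc(ta.timeA), ta.columnA
FROM table1A ta INNER JOIN tableB tb
  ON substr(ta.columnA, -8) = substr(tb.columnB, -8)
 AND trunc(ta.timeA) = trunc(tb.timeB)
WHERE ta.timeA >= TO_DATE(:from_dt, 'dd/mm/yyyy')
  AND ta.timeA <  TO_DATE(:to_dt, 'dd/mm/yyyy')
GROUP BY ta.columnA, trunc(ta.timeA);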
I'm trying to track down new id numbers over time, for at least the past twelve months. Note, the data is such that once id numbers are in, they stick around for at least 3-5 years, and I literally just run this thing once a month. These are the specs: Oracle Database 11g Release 11.2.0.3.0 - 64bit Production, PL/SQL Release 11.2.0.3.0 - Production.
So far I'm wondering if I can use more dynamic date ranges and run this whole thing on a timer, or if this is the best way to write it. I've picked up SQL mostly through Googling and looking at sample queries that others have graciously shared. I also don't know how to write PL/SQL right now, but I am willing to learn.
create table New_ids_calendar_year_20xx
as
select b.id_num, (bunch of other fields)
from (select * from source_table where date = last_day(date_add(sysdate, -11))) a,
     (select * from source_table where date = last_day(date_add(sysdate, -10))) b
where a.id_num (+) = b.id_num
union all
-- repeats this same select statement with union all until:
-- last_day(date_add(sysdate, 0))
In Oracle there is no built-in function date_add (maybe you have one which you created); anyway, for adding and subtracting dates I used simple sysdate + number. Also, I am not quite sure about the logic behind your whole query. As for field names: better to avoid reserved words like date in column names, so I used tdate here.
This query does what your unioned query did for the last 30 days; for other periods, change 30 to something else. The whole solution is based on the hierarchical subquery connect by level, which gives a simple list of numbers 0..29.
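To see what that row generator produces on its own:

-- Returns one row per number: 0, 1, ..., 29
select level - 1 as lvl
from dual
connect by level <= 30;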
select b.id_num, b.field1 field1_b, a.field1 field1_a --..., (bunch of other fields)
from (select level - 1 lvl from dual connect by level <= 30) l
join source_table b
on b.tdate = last_day(trunc(sysdate) - l.lvl - 1)
left join source_table a
on a.id_num = b.id_num and a.tdate = last_day(trunc(sysdate) - l.lvl)
order by lvl desc
For the date column you may want to use trunc(tdate) if you also store a time component, but then an existing index on the date field will not be used. In that case, change the date condition to a range of the form x - 1 <= date and date < x.
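In this query that would look something like the following (a sketch of just the join condition):

-- Half-open one-day range: an index on tdate stays usable, unlike trunc(tdate) = ...
on b.tdate >= last_day(trunc(sysdate) - l.lvl - 1)
and b.tdate < last_day(trunc(sysdate) - l.lvl - 1) + 1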
select round(avg(et_gsm_sınyal)) as sinyal, mahalle_kodu, ilce_kodu, sebeke
from
  (select et_gsm_sınyal, sozlesme_no, SUBSTR(et_operator, 1, 5) as sebeke
   from thkol316old
   where tarih >= ADD_MONTHS(TRUNC(SYSDATE, 'MM'), -1)
     and tarih < TRUNC(SYSDATE, 'MM')) okuma,
  (select sozlesme_no, ilce_kodu, mahalle_kodu from commt020) bilgiler
where okuma.sozlesme_no = bilgiler.sozlesme_no
group by mahalle_kodu, ilce_kodu, sebeke;
commt020 -> customer table
thkol316old -> old bill table
This query works, but it works very slowly: about 550 seconds response time. What am I supposed to do to make this query run faster?
This is the execution plan:
SELECT STATEMENT  7547
  HASH GROUP BY  7547
    FILTER
      Filter Predicates: ADD_MONTHS(TRUNC(SYSDATE#!,'fmmm'),-1)
      NESTED LOOPS
        NESTED LOOPS  7546
          TABLE ACCESS BY GLOBAL INDEX ROWID COMMT020  3
(rest of the plan truncated)
First of all I would try ordinary table joins instead of the more awkward inline-view joining. I hope the following will work (I haven't created the tables and tried it):
select round(avg(okuma.et_gsm_sınyal)) as sinyal,
bilgiler.mahalle_kodu,bilgiler.ilce_kodu, SUBSTR(okuma.et_operator,1,5) as sebeke
from thkol316old okuma
inner join commt020 bilgiler on okuma.sozlesme_no=bilgiler.sozlesme_no
where okuma.tarih >= ADD_MONTHS (TRUNC (SYSDATE, 'MM'), -1)
AND okuma.tarih < TRUNC(SYSDATE, 'MM')
group by bilgiler.mahalle_kodu,bilgiler.ilce_kodu, SUBSTR(okuma.et_operator,1,5);
Then, if things are still slow:
Verify that there is an index where bilgiler.sozlesme_no is the first column in the index.
Verify that there is an index where okuma.sozlesme_no is the first column in the index.
Verify that there is an index where okuma.tarih is the first column in the index (see the sketch below).
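A sketch of those indexes, in case any are missing (the index names are made up):

CREATE INDEX commt020_sozlesme_idx ON commt020 (sozlesme_no);
CREATE INDEX thkol316old_sozlesme_idx ON thkol316old (sozlesme_no);
CREATE INDEX thkol316old_tarih_idx ON thkol316old (tarih);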
Clearly, column tarih is not indexed for table thkol316old:
create index tarih_idx on thkol316old (tarih);
Then
analyze table commt020 compute statistics;
analyze table thkol316old compute statistics;
In Oracle, analyzing tables produces statistics that are used in creating query execution plans. Every time you significantly alter a table's contents you may want to analyze it again; many big systems do it on a schedule.
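On recent Oracle versions DBMS_STATS is the preferred interface for gathering optimizer statistics, and the scheduling can be done with DBMS_SCHEDULER. A sketch (the job name and interval are illustrative):

-- Nightly statistics gathering at 02:00 in the current schema
BEGIN
  DBMS_SCHEDULER.CREATE_JOB(
    job_name        => 'gather_billing_stats',
    job_type        => 'PLSQL_BLOCK',
    job_action      => 'BEGIN
                          DBMS_STATS.GATHER_TABLE_STATS(USER, ''COMMT020'');
                          DBMS_STATS.GATHER_TABLE_STATS(USER, ''THKOL316OLD'');
                        END;',
    repeat_interval => 'FREQ=DAILY;BYHOUR=2',
    enabled         => TRUE);
END;
/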
First of all, you don’t need the BILGILER inline view just to select a few columns from the COMMT020 table; you’ll be perfectly fine selecting from that table directly:
SELECT ROUND(AVG(et_gsm_sınyal)) AS sinyal,
mahalle_kodu,ilce_kodu,sebeke
FROM (
SELECT et_gsm_sınyal,
sozlesme_no,
SUBSTR(et_operator,1,5) AS sebeke
FROM thkol316old
WHERE tarih >= ADD_MONTHS (TRUNC (SYSDATE, 'MM'), -1)
AND tarih < TRUNC(SYSDATE, 'MM')
) okuma, commt020
WHERE okuma.sozlesme_no = commt020.sozlesme_no
GROUP BY mahalle_kodu,ilce_kodu,sebeke
/
Then, let’s rewrite the join using ANSI join expression. I prefer ANSI joins over old Oracle-style joins because they allow separating join conditions from filtering conditions, therefore providing more clarity into what’s actually going on. Also, it is a good style to assign aliases to the tables and clearly indicate which columns we select from which table.
SELECT ROUND(AVG(o.et_gsm_sınyal)) AS sinyal,
       c.mahalle_kodu, c.ilce_kodu, o.sebeke
FROM (
       SELECT th.et_gsm_sınyal,
              th.sozlesme_no,
              SUBSTR(th.et_operator,1,5) AS sebeke
       FROM thkol316old th
       WHERE th.tarih >= ADD_MONTHS (TRUNC (SYSDATE, 'MM'), -1)
         AND th.tarih < TRUNC(SYSDATE, 'MM')
     ) o
JOIN commt020 c ON o.sozlesme_no = c.sozlesme_no
GROUP BY c.mahalle_kodu, c.ilce_kodu, o.sebeke
/
Now it is clearer that the remaining inline view is redundant too. Although it's hard to tell without knowing the details of those tables, you will probably be better off “unwrapping” that inline view and replacing it with a straight join:
SELECT ROUND(AVG(th.et_gsm_sınyal)) AS sinyal,
c.mahalle_kodu,
c.ilce_kodu,
SUBSTR(th.et_operator,1,5) AS sebeke
FROM commt020 c
JOIN thkol316old th ON c.sozlesme_no = th.sozlesme_no
WHERE th.tarih >= ADD_MONTHS (TRUNC (SYSDATE, 'MM'), -1)
AND th.tarih < TRUNC(SYSDATE, 'MM')
GROUP BY c.mahalle_kodu, c.ilce_kodu, SUBSTR(th.et_operator,1,5)
/
Unfortunately optimizing this query further requires additional information, such as:
The new execution plan.
The version of Oracle you’re using.
Numbers of rows in THKOL316OLD and COMMT020 tables.
What indexes exist on those tables.
I'm not as clued up on shortcuts in SQL, so I was hoping to utilize the brainpower on here to help speed up a query I'm using. I'm currently using Oracle 8i.
I have a query:
SELECT
NAME_CODE, ACTIVITY_CODE, GPS_CODE
FROM
(SELECT
a.NAME_CODE, b.ACTIVITY_CODE, a.GPS_CODE,
ROW_NUMBER() OVER (PARTITION BY a.GPS_DATE ORDER BY b.ACTIVITY_DATE DESC) AS RN
FROM GPS_TABLE a, ACTIVITY_TABLE b
WHERE a.NAME_CODE = b.NAME_CODE
AND a.GPS_DATE >= b.ACTIVITY_DATE
AND TRUNC(a.GPS_DATE) > TRUNC(SYSDATE) - 2)
WHERE
RN = 1
and this takes about 7 minutes, give or take 10 seconds, to run.
Now the GPS_TABLE is currently 6.586.429 rows and continues to grow as new GPS coordinates are put into the system, each day it grows by about 8.000 rows in 6 columns.
The ACTIVITY_TABLE is currently 1.989.093 rows and continues to grow as new activities are put into the system, each day it grows by about 2.000 rows in 31 columns.
So all in all these are not small tables, and I understand that there will always be a time hit running this or similar queries. As you can see I'm already limiting it to only the last 2 days' worth of data, but anything to speed it up would be appreciated.
Your strongest filter seems to be the filter on the last 2 days of GPS_TABLE. It should cut GPS_TABLE down to about 15k rows. Therefore one of the best candidates for improvement is an index on the column GPS_DATE.
Your filter TRUNC(a.GPS_DATE) > TRUNC(SYSDATE) - 2 is equivalent to a.GPS_DATE >= TRUNC(SYSDATE) - 1, so a simple index on the column will work if you change the query accordingly. If you can't change it, you could add a function-based index on TRUNC(GPS_DATE).
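Concretely (the index names are illustrative):

-- Plain index: usable once the predicate is rewritten as a.GPS_DATE >= TRUNC(SYSDATE) - 1
CREATE INDEX gps_table_date_idx ON gps_table (gps_date);

-- Function-based alternative if the TRUNC(...) predicate has to stay as it is
CREATE INDEX gps_table_trunc_date_idx ON gps_table (TRUNC(gps_date));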
Once you have this index in place, we need to access the rows in ACTIVITY_TABLE. The problem with your join is that it will fetch all the old activities, and therefore a good portion of the table. This means that the join as written will not be efficient with index scans.
I suggest you define an index on ACTIVITY_TABLE(name_code, activity_date DESC) and a PL/SQL function that will retrieve the last activity in the least amount of work using this index specifically:
CREATE OR REPLACE FUNCTION get_last_activity (p_name_code VARCHAR2,
p_gps_date DATE)
RETURN ACTIVITY_TABLE.activity_code%type IS
l_result ACTIVITY_TABLE.activity_code%type;
BEGIN
SELECT activity_code
INTO l_result
FROM (SELECT activity_code
FROM activity_table
WHERE name_code = p_name_code
AND activity_date <= p_gps_date
ORDER BY activity_date DESC)
WHERE ROWNUM = 1;
RETURN l_result;
END;
Modify your query to use this function:
SELECT a.NAME_CODE,
a.GPS_CODE,
get_last_activity(a.name_code, a.gps_date)
FROM GPS_TABLE a
WHERE trunc(a.GPS_DATE) > trunc(sysdate) - 2
Optimising an SQL query is generally done by:
adding some indexes, or
trying a different way to get the same information.
So, start by adding an index for ACTIVITY_DATE, and perhaps some other fields that are used in the conditions.
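For instance (the index names are illustrative; the composite index also matches the PL/SQL-function approach above):

CREATE INDEX activity_date_idx ON activity_table (activity_date);
CREATE INDEX activity_name_date_idx ON activity_table (name_code, activity_date);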