Find numbers which have the same last digits in 2 different columns - SQL

I have 2 columns of phone numbers and the requirement is to get the numbers which have the same last 8 digits. ColumnA's numbers have 11 digits and columnB's numbers have 9 or 10 digits.
I tried to use SUBSTR or LIKE and the LEFT/RIGHT functions to solve this, but the problem is the data is too big and I can't use that approach.
select trunc(ta.timeA), ta.columnA
from table1A ta,
     tableB tb
WHERE substr(ta.columnA, -8) LIKE substr(tb.columnB, -8)
  and trunc(ta.timeA) = trunc(tb.timeB)
  AND trunc(ta.timeA) >= TO_DATE('01/01/2018', 'dd/mm/yyyy')
  AND trunc(ta.timeA) < TO_DATE('01/01/2018', 'dd/mm/yyyy') + 1
GROUP BY ta.columnA, trunc(ta.timeA)

You want to select from tableA, so select from tableA only; don't join. You only want the tableA rows that have a match in tableB, so place an EXISTS clause in your WHERE clause.
select trunc(timea), columna
from table1a ta
where trunc(timea) >= date '2018-01-01'
  and trunc(timea) < date '2018-01-02'
  and exists
  (
    select *
    from tableb tb
    where trunc(tb.timeb) = trunc(ta.timea)
      and substr(tb.columnb, -8) = substr(ta.columna, -8)
  )
order by trunc(timea), columna;
In order to have this run fast, create the following indexes:
create index idxa on tablea( trunc(timea), substr(columna, -8) );
create index idxb on tableb( trunc(timeb), substr(columnb, -8) );
I don't see, however, why you are so eager to have this run fast. Do you want to keep all data as is and run the query again and again? There should be a better solution. Splitting the area code and number into two separate columns is the first thing that comes to mind.
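A minimal sketch of that restructuring, assuming Oracle and assuming the last 8 digits form the subscriber number (the new column names are illustrative, not from the question):
ALTER TABLE tablea ADD (prefix_part VARCHAR2(3), local_part VARCHAR2(8));
UPDATE tablea
   SET local_part  = SUBSTR(columna, -8),
       prefix_part = SUBSTR(columna, 1, LENGTH(columna) - 8);
-- repeat for tableb; a plain index on each local_part column then
-- supports the equality join with no SUBSTR at query time.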
UPDATE: Faster still than the suggested idxa would be a covering index for tableA:
create index idxa on tablea( trunc(timea), substr(columna, -8), columna );
Here the DBMS can work with the index alone and doesn't have to access the table. So, in case the above was still a bit too slow for you, you can try this altered index.
And as Alex Poole has pointed out in the comments below, if the range you are looking at is always a single day as in the example, it should simply be
where trunc(timea) = date '2018-01-01'

You can try the below, using the = operator instead of the LIKE operator, since you want to match the last 8 digits exactly:
select trunc(ta.timeA), ta.columnA
from table1A ta inner join tableB tb
  on substr(ta.columnA, -8) = substr(tb.columnB, -8)
 and trunc(ta.timeA) = trunc(tb.timeB)
 AND trunc(ta.timeA) >= TO_DATE('01/01/2018', 'dd/mm/yyyy')
 AND trunc(ta.timeA) < TO_DATE('01/01/2018', 'dd/mm/yyyy') + 1
GROUP BY ta.columnA, trunc(ta.timeA)

It would be easier to help if you were more specific about your SQL environment, but below is some advice on this query that would apply in most environments.
When dealing with large data sets, performance becomes even more critical, and small changes in technique can have a big impact.
For example:
LIKE is normally used for a partial match with wildcards; do you not mean equals? LIKE is slower than equals, so if you're not using wildcards I recommend testing for equality.
Also, you initially start with a cross (Cartesian) join, but then your WHERE clause defines very specific match criteria (matching time fields). If you need a matching time field, make it part of the table join; this reduces the number of join results, which significantly shrinks the dataset that the other criteria then have to be applied to.
Also, having calculated date values in your WHERE clause is slow. It is better to set #fromDate and #toDate parameters before the query, then use them in the WHERE clause as what are then literals, which don't need to be recalculated for every row.
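A minimal sketch of that parameter idea in Oracle SQL*Plus syntax (the bind-variable names are illustrative; the date boundaries are set once up front instead of being derived per row):
VARIABLE from_dt VARCHAR2(10)
VARIABLE to_dt   VARCHAR2(10)
EXEC :from_dt := '2018-01-01'
EXEC :to_dt   := '2018-01-02'

select trunc(ta.timeA), ta.columnA
from table1A ta
join tableB tb
  on substr(ta.columnA, -8) = substr(tb.columnB, -8)
 and trunc(ta.timeA) = trunc(tb.timeB)
where ta.timeA >= to_date(:from_dt, 'yyyy-mm-dd')
  and ta.timeA <  to_date(:to_dt, 'yyyy-mm-dd')
group by ta.columnA, trunc(ta.timeA);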

Related

Optimization on large tables

I have the following query that joins two large tables. I am trying to join on patient_id and records that are not older than 30 days.
select *
from chairs c
join data id
  on c.patient_id = id.patient_id
 and to_date(c.from_date, 'YYYYMMDD') - to_date(id.from_date, 'YYYYMMDD') >= 0
 and to_date(c.from_date, 'YYYYMMDD') - to_date(id.from_date, 'YYYYMMDD') < 30
Currently, this query takes 2 hours to run. What indexes can I create on these tables for this query to run faster?
I will take a shot in the dark because, as others have said, it depends on the table structure, the indexes, and the output of the planner.
The most obvious thing here is that, as long as it is possible, you want to represent dates as some date datatype instead of strings. That is the first and most important change you should make here; no index can save you if you have to transform strings, because very likely the problem is not the patient_id, it's your date calculation.
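A minimal sketch of that conversion, assuming PostgreSQL (the mention of the planner suggests it) and the table/column names from the question:
ALTER TABLE chairs ALTER COLUMN from_date TYPE date USING to_date(from_date, 'YYYYMMDD');
ALTER TABLE data   ALTER COLUMN from_date TYPE date USING to_date(from_date, 'YYYYMMDD');
-- with real date columns the join predicate needs no per-row to_date() call:
--   c.from_date - id.from_date >= 0 and c.from_date - id.from_date < 30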
Other than that, forcing hash joins on the patient_id and then doing the filtering could help if for some reason the planner decided to do nested loops for that condition. But that is for after you fixed your date representation AND you still have a problem AND you see that the planner does nested loops on that attribute.
Some observations if you are stuck with string fields for the dates:
YYYYMMDD date strings are ordered and can be used with <, > and =.
Building date strings from the data in chairs to JOIN against data will make good use of an index on data such as one on (patient_id, from_date).
So my suggestion would be to write expressions that build the date strings you want to use in the JOIN. Or, to put it another way: do not transform the child table's data from a string into something else.
Example expression that takes 30 days off a string date and returns a string date:
select to_char(to_date('20200112', 'YYYYMMDD') - INTERVAL '30 DAYS','YYYYMMDD')
Untested:
select *
from chairs c
join data id
  on c.patient_id = id.patient_id
 -- 29, not 30: the original predicate excludes a difference of exactly 30 days
 and id.from_date between to_char(to_date(c.from_date, 'YYYYMMDD') - INTERVAL '29 DAYS', 'YYYYMMDD')
                      and c.from_date
For this query:
select *
from chairs c join data id
     on c.patient_id = id.patient_id and
        to_date(c.from_date, 'YYYYMMDD') - to_date(id.from_date, 'YYYYMMDD') >= 0 and
        to_date(c.from_date, 'YYYYMMDD') - to_date(id.from_date, 'YYYYMMDD') < 30;
You should start with indexes on (patient_id, from_date) -- you can put them in both tables.
The date comparisons are problematic. Storing the values as actual dates can help. But it is not a 100% solution because comparison operations are still needed.
Depending on what you are actually trying to accomplish there might be other ways of writing the query. I might encourage you to ask a new question, providing sample data, desired results, and a clear explanation of what you really want. For instance, this query is likely to return a lot of rows. And that just takes time as well.
Your query has a non-SARGable predicate because it uses functions that are evaluated iteratively for every row. You need to discard such functions and replace them with direct access to the columns. As an example:
SELECT *
FROM chairs AS c
JOIN data AS id
  ON c.patient_id = id.patient_id
 -- a 29-day window mirrors the original "difference >= 0 and < 30" predicate
 AND c.from_date BETWEEN id.from_date AND id.from_date + INTERVAL '29 days'
It will run faster with these two indexes:
CREATE INDEX X_SQLpro_001 ON chairs (patient_id, from_date);
CREATE INDEX X_SQLpro_002 ON data (patient_id, from_date);
Also, try to avoid SELECT * and list only the necessary columns.

Oracle Optimize Query

I'm working with Oracle PL/SQL and I have a stored procedure with this query. It is a bit convoluted, but it gets the job done. The thing is, it takes about 35 minutes, and SQL Developer's Autotrace says it is doing a full scan even though the tables have their indexes.
So is there any way to improve this query?
select tipotrx, sum(saldo) as saldo, count(*) as totaltrx
from (
    select max(ids) as IDTRX, max(monto) as monto, min(saldo) as saldo,
           max(aq_data) as aq_data, thekey, tipotrx
    from (
        select t.SID as ids,
               TO_NUMBER(SUBSTR(P.P1, 18, 12)) as monto,
               (TO_NUMBER(SUBSTR(P.P1, 18, 12)) * (TO_NUMBER(SUBSTR(t.acquirer_data, 13, 2)) -
                TO_NUMBER(SUBSTR(P.P4, 3, 2)))) as saldo,
               (TO_CHAR(t.trx_date, 'YYMMDD') || t.auth_code || t.trx_amount ||
                (SELECT functions.decrypt(t.card_number) FROM DUAL)) as thekey,
               t.acquirer_data AS aq_data,
               TO_NUMBER(SUBSTR(t.acquirer_data, 12, 1)) as tipotrx
        from TBL_TRX t INNER JOIN TBL_POS P ON (t.SID = P.transaction)
        WHERE TO_NUMBER(SUBSTR(t.acquirer_data, 13, 2)) >= TO_NUMBER(SUBSTR(P.P4, 3, 2))
          AND trunc(t.INC_DATE) between (TO_DATE('20/06/2020', 'DD/MM/YYYY') - 35)
                                    and TO_DATE('20/06/2020', 'DD/MM/YYYY')
    ) t
    group by thekey, tipotrx
    order by max(ids) desc
) j
group by tipotrx;
Thanks.
Most of the time the index has to match exactly what's in the WHERE clause to be eligible for use. An index on the acquirer_data column cannot be used when your WHERE clause says
TO_NUMBER(SUBSTR(t.acquirer_data, 13,2))
An index on the INC_DATE cannot be used when your WHERE clause says
trunc(t.INC_DATE)
You manipulate every column in the WHERE clause and that alone can potentially prevent the use of any normal index.
If however you create function-based indexes, you can make some new indexes that match what's in your WHERE clause. That way at least there's a chance that the DB will use an index instead of doing full table scans.
-- example function-based index:
CREATE INDEX TRUNC_INC_DATE ON TBL_TRX (trunc(INC_DATE));
Of course, new indexes take up more space and add overhead of their own. Keep using that Autotrace to see if it's worth it.
Also, updating table statistics probably won't hurt either.
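A minimal sketch of refreshing the statistics in Oracle, assuming the current schema owns both tables:
BEGIN
  -- gather fresh optimizer statistics for the two tables in the query
  DBMS_STATS.GATHER_TABLE_STATS(ownname => USER, tabname => 'TBL_TRX');
  DBMS_STATS.GATHER_TABLE_STATS(ownname => USER, tabname => 'TBL_POS');
END;
/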
Change this:
trunc(t.INC_DATE) between (TO_DATE('20/06/2020', 'DD/MM/YYYY') - 35)
AND TO_DATE('20/06/2020', 'DD/MM/YYYY')
To this:
t.INC_DATE between (TO_DATE('20/06/2020', 'DD/MM/YYYY') - 35)
AND TO_DATE('21/06/2020', 'DD/MM/YYYY') - INTERVAL '1' SECOND
Instead of building a function-based index, you can modify the predicate to be sargable (able to use an index). Instead of applying TRUNC to the column, add a day minus one second to the upper-bound literal.
The code is more confusing but should be able to take advantage of the index. However, 35 days of data may be a large amount; the date index may not be very useful and you may need to look at other predicates.

Other, more efficient way to write multiple selects

I have following queries:
select * from
( select volume as vol1 from table1 where code='A1' and daytime='12-may-2012') a,
( select volume as vol2 from table2 where code='A2' and daytime='12-may-2012') b,
( select volume as vol3 from table3 where code='A3' and daytime='12-may-2012') c
result:
vol1 vol2 vol3
20 45
What would be another, more efficient way to write this query (in a real case there could be up to 15 subqueries), given that data does not always exist in every one of these tables for the selected date? I think it could be a join, but I'm not sure.
thanks,
S
If the concern is that data might not exist, then a cross join is not the right operator. If any subquery returns zero rows, you will get an empty result set.
Assuming at most one row is returned per query, just use subqueries in the select:
select (select volume from table1 where code = 'A1' and daytime = date '2012-05-12') as vol1,
(select volume from table2 where code = 'A2' and daytime = date '2012-05-12') as vol2,
(select volume from table3 where code = 'A3' and daytime = date '2012-05-12') as vol3
from dual;
If a value is missing, it will be NULL. If a subquery returns more than one row, then you'll get an error.
I much prefer ANSI standard formats, which is why I use the date keyword.
I am highly suspicious of comparing a field called daytime to a date constant with no time component. I would double-check the logic on this. Perhaps you intend trunc(daytime) = date '2012-05-12' or something similar.
I should also note that if performance is an issue, then you want an index on each table on (code, daytime, volume).

SQL create table with select statements and defined date ranges

I'm trying to track down new id numbers over time, for at least the past twelve months. Note, the data is such that once id numbers are in, they stick around for at least 3-5 years, and I just literally run this thing once a month. These are the specs: Oracle Database 11g Release 11.2.0.3.0 - 64bit Production, PL/SQL Release 11.2.0.3.0 - Production.
So far I'm wondering if I can use more dynamic date ranges and run this whole thing on a timer, or if this is the best way to write something. I've just picked up SQL, mostly through Googling and looking at sample queries that others have graciously shared. I also do not know how to write PL/SQL right now either, but am willing to learn.
create table New_ids_calendar_year_20xx as
select b.id_num, (bunch of other fields)
from (select * from source_table where date = last_day(date_add(sysdate, -11))) a,
     (select * from source_table where date = last_day(date_add(sysdate, -10))) b
where a.id_num (+) = b.id_num
union all
-- repeats this same select statement with union all until:
-- last_day(date_add(sysdate, 0))
In Oracle there is no built-in function date_add; maybe you have one which you created. Anyway, for adding and subtracting dates I used simple sysdate + number. Also, I am not quite sure about the logic behind your whole query. And for field names, it is better to avoid reserved words like date in column names, so I used tdate here.
This query does what your unioned query did for the last 30 days. For other periods, change 30 to something else.
The whole solution is based on the hierarchical subquery connect by, which gives a simple list of numbers 0..29.
select b.id_num, b.field1 field1_b, a.field1 field1_a --..., (bunch of other fields)
from (select level - 1 lvl from dual connect by level <= 30) l
join source_table b
  on b.tdate = last_day(trunc(sysdate) - l.lvl - 1)
left join source_table a
  on a.id_num = b.id_num and a.tdate = last_day(trunc(sysdate) - l.lvl)
order by lvl desc
For the date column you may want to use trunc(tdate) if you also store time, but that way an index on the date field, if one exists, will not be used.
In that case change the date condition to something like x - 1 <= tdate and tdate < x.
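A minimal sketch of that range form, with a concrete date standing in for x (the literal is illustrative):
select *
from source_table
where tdate >= date '2020-01-31'   -- x - 1
  and tdate <  date '2020-02-01'   -- x
-- equivalent to trunc(tdate) = date '2020-01-31', but a plain index
-- on tdate remains usable.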

How do I count data from 2 different tables by date

I have 2 tables with no relations; the tables have different numbers of columns, but there are a few columns that are the same, although they hold different data. I was able to create a function or view of only the data I wanted, but when I try to count the data filtered by date, I always get the wrong count in return. Let me explain by showing the 2 functions and what I am trying to do:
Function 1
ID - number from 1 to 8
data sent - YES or NO
Date - date value
Function 2
ID - number from 1 to 8
data sent - yes or no
date - date value
Upon running both separately, I get all the rows from the tables and everything looks good.
Then I try to add the following to each function:
select
count([data sent]), ID
from function1
Where (date between #date1 and #date2)
group by ID
The above statement works great and gives me the right result for each function.
Now I thought: what if I want to combine those 2 functions into one and get the counts from both functions on 1 page?
So I created the following function:
Function 3
select
count(Function1.[data sent]) as Expr1,
Function1.id,
count(Function2.[data sent]) as Expr2,
Function1.date
from
Function1
LEFT OUTER JOIN
Function2 on Function1.id = Function2.id
Where
(Function1.date between #date1 and #date2)
group by
Function1.id
Upon running the above, I get the following table:
ID Expr1 Expr2
On both Expr1 and Expr2 I get results, and I am not sure where they come from. I guess something is being multiplied by 100000, since one table holds almost 15000 rows and the other around 5000 rows.
What I would like to know first is whether it is possible at all to filter by date and count records from both tables at the same time. If anyone needs more information, please let me know and I will be glad to share and explain more.
Thank you
The LEFT OUTER JOIN is taking each row of the left table, finding ALL of the rows in the right table with the same id field, and creating that many rows in the result table. Since id isn't what we usually think of as an identity field (it looks more like a "deviceId" or something), you'll get lots of matches for each one. Repeat 15000 times and you get your combinatorial explosion.
Tip: To debug things like this, you can create sample tables with a tiny subset of the real data, say 10 rows from each, and run your query on them. You'll see the issue immediately.
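A minimal sketch of that tip in SQL Server syntax (the sample-table names are illustrative):
SELECT TOP (10) * INTO Function1_sample FROM Function1;
SELECT TOP (10) * INTO Function2_sample FROM Function2;
-- run the join on the samples and watch each id multiply:
SELECT f1.id, COUNT(*) AS match_count
FROM Function1_sample f1
LEFT JOIN Function2_sample f2 ON f1.id = f2.id
GROUP BY f1.id;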
It's possible to filter by date. It's hard to recommend an actual solution without better understanding your phrase "I want to add those 2 functions into one and get the count from both functions on 1 page".
Why can't you create a temporary table for each function then join them together?
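A minimal sketch of that temp-table approach in SQL Server syntax, keeping the #date1/#date2 placeholders from the question (the temp-table names are illustrative):
-- count per ID in each function separately
SELECT ID, COUNT([data sent]) AS Expr1
INTO #f1
FROM Function1
WHERE [Date] BETWEEN #date1 AND #date2
GROUP BY ID;

SELECT ID, COUNT([data sent]) AS Expr2
INTO #f2
FROM Function2
WHERE [Date] BETWEEN #date1 AND #date2
GROUP BY ID;

-- then join the two one-row-per-ID results, so nothing multiplies
SELECT COALESCE(f1.ID, f2.ID) AS ID, f1.Expr1, f2.Expr2
FROM #f1 f1
FULL JOIN #f2 f2 ON f1.ID = f2.ID;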
Maybe subqueries can help you to achieve what you want:
SELECT
ID = COALESCE(f1.ID, f2.ID),
Date = COALESCE(f1.Date, f2.Date),
f1.Expr1,
f2.Expr2
FROM (
SELECT
ID,
Date,
Expr1 = COUNT([data sent])
FROM Function1
WHERE Date BETWEEN #date1 AND #date2
GROUP BY
ID,
Date
) f1
FULL JOIN (
SELECT
ID,
Date,
Expr2 = COUNT([data sent])
FROM Function2
WHERE Date BETWEEN #date1 AND #date2
GROUP BY
ID,
Date
) f2
ON f1.ID = f2.ID AND f1.Date = f2.Date
This query also uses full (outer) join instead of left join, in case the right side of the join contains rows that have no match in the left side (and you want those rows).