Optimizing Oracle query

SELECT MAX(verification_id)
FROM VERIFICATION_TABLE
WHERE head = 687422
AND mbr = 23102
AND RTRIM(LTRIM(lname)) = '.iq bzw'
AND TO_CHAR(dob,'MM/DD/YYYY')= '08/10/2004'
AND system_code = 'M';
This query is taking 153 seconds to run. There are millions of rows in VERIFICATION_TABLE.
I think the query is taking long because of the functions in the WHERE clause. However, I need to do LTRIM/RTRIM on the columns, and the date has to be matched in MM/DD/YYYY format. How can I optimize this query?
Explain plan:
SELECT STATEMENT, GOAL = ALL_ROWS 80604 1 59
SORT AGGREGATE 1 59
TABLE ACCESS FULL P181 VERIFICATION_TABLE 80604 1 59
Primary key:
VRFTN_PK Primary VERIFICATION_ID
Indexes:
N_VRFTN_IDX2 head, mbr, dob, lname, verification_id
N_VRFTN_IDX3 last_update_date
N_VRFTN_IDX4 mbr, lname, dob, verification_id
N_VRFTN_IDX4 verification_id
Though, in the explain plan I don't see the indexes or primary key being used. Is that the problem?

Try this:
SELECT MAX(verification_id)
FROM VERIFICATION_TABLE
WHERE head = 687422
AND mbr = 23102
AND TRIM(lname) = '.iq bzw'
AND TRUNC(dob) = TO_DATE('08/10/2004','MM/DD/YYYY')
AND system_code = 'M';
Remove that TRUNC() if dob doesn't have a time component already; from the looks of it (Date of Birth?) it may not. Past that, you need some indexing work. If you're querying that much in this style, I'd index mbr and head in a two-column index; if you said what the columns mean it would help determine the best indexing here.
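A sketch of that two-column index (the index name is just a placeholder); note, though, that the existing N_VRFTN_IDX2 already leads with HEAD and MBR, so it may already cover this:
CREATE INDEX vrftn_head_mbr_idx ON verification_table (head, mbr);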

The only index that is a possible candidate for use in your query is N_VRFTN_IDX2, because it indexes four of the columns you use in your WHERE clause: HEAD, MBR, DOB and LNAME.
However, because you apply functions to both DOB and LNAME they are ineligible for consideration. The optimizer may then decide not to use that index because it thinks HEAD+MBR on their own are an insufficiently selective combination. If you removed the TO_CHAR() call from DOB then you have three leading columns on N_VRFTN_IDX2 which might make it more attractive to the optimizer. Likewise, is it necessary to TRIM() LNAME?
The other thing is, the need to look up SYSTEM_CODE means the query has to read from the table (because that column is not indexed). If N_VRFTN_IDX2 has a poor clustering factor, the optimizer may decide to go for a FULL TABLE SCAN because the indexed reads are an overhead. Whereas if you added SYSTEM_CODE to the index the entire query could be satisfied by an INDEX RANGE SCAN, which would be a lot faster.
Finally, how fresh are your statistics? If your statistics are stale, that might lead the optimizer to make a duff decision. For instance, more accurate statistics might lead the optimizer to use the compound index even with just the two leading columns.
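A sketch of what such a covering index could look like (the name and exact column order are illustrative only; whether to keep or replace the existing index depends on what else uses it):
CREATE INDEX vrftn_covering_idx ON verification_table
    (head, mbr, dob, lname, system_code, verification_id);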

You should turn the literal into a DATE and not the column into a VARCHAR2 like this:
AND dob = TO_DATE('08/10/2004','MM/DD/YYYY')
Or use the preferable ANSI date literal syntax:
AND dob = DATE '2004-08-10'
If the dob column contains time (a date of birth doesn't usually, except presumably in a hospital!) then you can do:
AND dob >= DATE '2004-08-10'
AND dob < DATE '2004-08-11'

Check the datatypes for HEAD and MBR.
The values "687422 and 23102" have the 'feel' of being quite selective. That is, if you have hundreds of thousands of values for head and millions of records in the table, it would seem that HEAD is quite selective. [That could be totally misleading though.]
Anyway, you may find that HEAD and/or MBR are actually stored as VARCHAR2 or CHAR fields rather than NUMBER. If so, comparing a character value to a number would prevent the use of the index. Try the following (I've also converted the dob predicate to a date comparison, with an explicit format mask).
SELECT MAX(verification_id)
FROM VERIFICATION_TABLE
WHERE head = '687422'
AND mbr = '23102'
AND RTRIM(LTRIM(lname)) = '.iq bzw'
AND TRUNC(dob) = TO_DATE('08/10/2004','MM/DD/YYYY')
AND system_code = 'M';
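To check the datatypes, you can describe the table in SQL*Plus or query the data dictionary, for example:
DESC verification_table

SELECT column_name, data_type
FROM   user_tab_columns
WHERE  table_name = 'VERIFICATION_TABLE';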

Please provide an EXPLAIN output on this query so we know where the slow-down occurs. Two thoughts:
change
AND TO_CHAR(dob,'MM/DD/YYYY')= '08/10/2004'
to
AND dob = <date here, not sure which oracle str2date function you need>
and use a function based index on
RTRIM(LTRIM(lname))
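A function-based index matching that expression might look like this (the index name is a placeholder; gather statistics afterwards so the optimizer can use it):
CREATE INDEX vrftn_lname_trim_idx ON verification_table (RTRIM(LTRIM(lname)));
Note that the query must use the exact same expression, RTRIM(LTRIM(lname)), for this index to be considered.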

Try this:
SELECT MAX(verification_id)
FROM VERIFICATION_TABLE
WHERE head = 687422
AND mbr = 23102
AND TRIM(lname) = '.iq bzw'
AND dob BETWEEN TO_DATE('08/10/2004','MM/DD/YYYY') AND TO_DATE('08/11/2004','MM/DD/YYYY')
AND system_code = 'M';
This way a possible index on dob will be used.


How to speed up sql sub query

I'm running a job and it's taking too long to run.
I have created a job to update based on the value of multiple tables
UPDATE applicant_scores
SET applicant_scores.Age=2.5
where applicant_scores.Applicant_id in
(select applicantinfo.subebno from applicantinfo
WHERE SUBSTR(applicantinfo.DOB,7,4) ='1985')
This should update a column for about 17,000 rows, but it's taking too long.
I would recommend using exists and an index:
UPDATE applicant_scores
SET applicant_scores.Age = 2.5
WHERE EXISTS (SELECT 1
FROM applicantinfo ai
WHERE applicant_scores.Applicant_id = ai.subebno AND
SUBSTR(ai.DOB, 7, 4) ='1985'
);
For performance, you want an index on applicantinfo(subebno, DOB).
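Something along these lines (the index name is just a placeholder):
CREATE INDEX ix_applicantinfo_subebno_dob ON applicantinfo (subebno, DOB);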
Note: DOB probably means "date of birth". It should be stored as a date in your database and you should be using proper date functions, such as:
extract(year from dob) = 1985
year(dob) = 1985
dob >= '1985-01-01' and dob < '1986-01-01'
Do not store dates as strings. Do not use string functions on dates.
Your problem is this:
WHERE SUBSTR(applicantinfo.DOB,7,4) ='1985'
The database doesn't have a way to quickly find all the rows that match that criteria. There's no index. It has to check every row in the database to find the ones that match that expression.
One solution is to add another column to your table, maybe called dob_year, which holds just the year out of that date. Then you CREATE INDEX applicantinfo_dob_year ON applicantinfo(dob_year) and change your WHERE clause to WHERE dob_year = '1985', as sketched below.
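A rough sketch of that approach, assuming MySQL-style DDL and that DOB is a string whose characters 7-10 hold the year:
ALTER TABLE applicantinfo ADD COLUMN dob_year CHAR(4);
UPDATE applicantinfo SET dob_year = SUBSTR(DOB, 7, 4);
CREATE INDEX applicantinfo_dob_year ON applicantinfo (dob_year);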
https://use-the-index-luke.com/ is a great site for learning about database indexes and how to use them properly to make your queries fast.
SET Age_Score=(SELECT Age_Score FROM age_scoretbl WHERE SUBSTR(applicantinfo.DOB,7,4)= age_scoretbl.Birth_Year);

Why does changing the where clause on this criteria reduce the execution time so drastically?

I ran across a problem with a SQL statement today that I was able to fix by adding additional criteria; however, I really want to know why my change fixed the problem.
The problem query:
SELECT *
FROM
(SELECT ah.*,
com.location,
ha.customer_number,
d.name applicance_NAME,
house.name house_NAME,
tr.name RULE_NAME
FROM actionhistory ah
INNER JOIN community com
ON (ah.city_id = com.city_id)
INNER JOIN house_address ha
ON (ah.applicance_id = ha.applicance_id
AND ha.status_cd = 'ACTIVE')
INNER JOIN applicance d
ON (ah.applicance_id = d.applicance_id)
INNER JOIN house house
ON (house.house_id = ah.house_id)
LEFT JOIN the_rule tr
ON (tr.the_rule_id = ah.the_rule_id)
WHERE actionhistory_id >= 'ACT100010000'
ORDER BY actionhistory_id
)
WHERE rownum <= 30000;
The "fix"
SELECT *
FROM
(SELECT ah.*,
com.location,
ha.customer_number,
d.name applicance_NAME,
house.name house_NAME,
tr.name RULE_NAME
FROM actionhistory ah
INNER JOIN community com
ON (ah.city_id = com.city_id)
INNER JOIN house_address ha
ON (ah.applicance_id = ha.applicance_id
AND ha.status_cd = 'ACTIVE')
INNER JOIN applicance d
ON (ah.applicance_id = d.applicance_id)
INNER JOIN house house
ON (house.house_id = ah.house_id)
LEFT JOIN the_rule tr
ON (tr.the_rule_id = ah.the_rule_id)
WHERE actionhistory_id >= 'ACT100010000' and actionhistory_id <= 'ACT100030000'
ORDER BY actionhistory_id
)
All of the _id columns are indexed sequences.
The first query's explain plan had a cost of 372 and the second was 14. This is running on an Oracle 11g database.
Additionally, if actionhistory_id in the where clause is anything less than ACT100000000, the original query returns instantly.
This is because of the index on the actionhistory_id column.
During the first query Oracle has to return all the index blocks containing entries for records that come after 'ACT100010000', then it has to go to the table for each of those entries to get the rows, and then it pulls the first 30,000 records from the result set.
During the second query Oracle only has to return the index blocks containing records between 'ACT100010000' and 'ACT100030000', and then grab from the table the records represented in those index blocks. That is a lot less work per record found than in the first query.
Noticing your last line about the id being less than ACT100000000: it sounds to me like those records may all be in the same memory block (or in a contiguous set of blocks).
EDIT: Please also consider what is said by Justin - I was talking about actual performance, but he is pointing out that the id being a varchar greatly increases the potential values (as opposed to a number) and that the estimated plan may reflect a greater time than reality because the optimizer doesn't know the full range until execution. To further optimize, taking his point into consideration, you could put a function based index on the id column or you could make it a combination key, with the varchar portion in one column and the numeric portion in another.
What are the plans for both queries?
Are the statistics on your tables up to date?
Do the two queries return the same set of rows? It's not obvious that they do but perhaps ACT100030000 is the largest actionhistory_id in the system. It's also a bit confusing because the first query has a predicate on actionhistory_id with a value of TRA100010000 which is very different than the ACT value in the second query. I'm guessing that is a typo?
Are you measuring the time required to fetch the first row? Or the time required to fetch the last row? What are those elapsed times?
My guess without that information is that the fact that you appear to be using the wrong data type for your actionhistory_id column is affecting the Oracle optimizer's ability to generate appropriate cardinality estimates which is likely causing the optimizer to underestimate the selectivity of your predicates and to generate poorly performing plans. A human may be able to guess that actionhistory_id is a string that starts with ACT10000 and then has 30,000 sequential numeric values from 00001 to 30000 but the optimizer is not that smart. It sees a 13 character string and isn't able to figure out that the last 10 characters are always going to be numbers so there are only 10 possible values rather than 256 (assuming 8-bit characters) and that the first 8 characters are always going to be the same constant value. If, on the other hand, actionhistory_id was defined as a NUMBER and had values between 1 and 30000, it would be dramatically easier for the optimizer to make reasonable estimates about the selectivity of various predicates.
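To capture the plans for both queries in Oracle, you can run something like this for each variant (a sketch; substitute the actual query):
EXPLAIN PLAN FOR
SELECT ... ;  -- the query in question

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);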

What does +0 mean after an ORDER BY in Oracle

I am trying to understand what the +0 at the end of this Oracle 9i query means:
SELECT /*+ INDEX (a CODE_ZIP_CODE_IX) */
a.city,
a.state,
LPAD(a.code,5,0) ZipCode,
b.County_Name CoName,
c.Description RegDesc,
d.Description RegTypeDesc
FROM TBL_CODE_ZIP a,
TBL_CODE_COUNTY b,
TBL_CODE_REGION c,
TBL_CODE_REGION_TYPE d
WHERE a.City = 'LONDONDERRY'
AND a.State = 'NH'
AND lpad(a.Code,5,0) = '03038'
AND a.Region_Type_Code = 1
AND b.County(+) = a.County_Code
AND b.STATE(+) = a.STATE
AND c.Code(+) = a.Region_Code
AND d.Code(+) = a.Region_Type_Code
ORDER BY a.Code +0
Any ideas?
NOTE: I don't think it has to do with ascending or descending, since I can't add ASC or DESC between a.Code and +0, but I can add ASC or DESC after the +0.
The + 0 was a trick back in the days of the rule based optimizer, which made it impossible to use an index on the numeric column. Similarly, they did a || '' for alphanumeric columns.
For your query, the only conclusion I can reach after inspecting it is that its creator was struggling with the performance. If (that's my assumption) index CODE_ZIP_CODE_IX is an index on TBL_CODE_ZIP(Code), then the query won't use it, even though it is hinted to use it. The creator probably wasn't aware that by using LPAD(a.code,5,0) instead of a.code, the index cannot be used. An ORDER BY clause takes its intermediate result set - which resides in memory - and sorts it; no index is needed for that. But with the + 0 it looks like he was trying to disable index use there as well.
So, the tricks that were used were ineffective, and are now only misleading, as you have found out.
Regards,
Rob.
PS1: It's better to use LPAD(TO_CHAR(a.code),5,'0') or TO_CHAR(a.code,'fm00009'). Then it is clear what you are doing with the datatype.
PS2: Your query might benefit from using a function based index on LPAD(TO_CHAR(a.code),5,'0'), or whatever expression you use to left pad your zipcode.
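For example (the index name is illustrative):
CREATE INDEX tbl_code_zip_lpad_idx ON TBL_CODE_ZIP (LPAD(TO_CHAR(code), 5, '0'));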
My guess would be that a.code is a VARCHAR2 containing a numeric string, and the +0 is effectively casting it to a NUMBER so the sort will be numeric rather than alpha
You should be able to add ASC/DESC after the +0
Note: I had deleted this answer, because Mark B was the faster typist. However, I have re-instated it because I think there is some value in demonstrating what may have been the underlying intent of the SQL which Lucas posted.
Suppose CODE had been a VARCHAR2 column holding strings of digits (zip codes). The problem is that varchars sort as strings, not numbers. Adding a zero to the CODE forces an implicit conversion to NUMBER, and hence a numeric sort:
SQL> select id, code
2 from t72
3 order by code
4 /
ID CODE
---------- -----
1 1
2 11
3 111
4 12
SQL> select id, code
2 from t72
3 order by code+0
4 /
ID CODE
---------- -----
1 1
2 11
4 12
3 111
SQL>
If the stored codes had been left-padded with zeroes then the cast would not have been necessary, as they would sort in numeric order anyway.
As others have observed, using TO_NUMBER() would have been the better choice. The +0 is less obvious than an explicit cast, and it is always good to be clear about intent.
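For example, assuming Code always holds only digits, the explicit version of the same sort would be:
ORDER BY TO_NUMBER(a.Code)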
Is there an index on TBL_CODE_ZIP.Code? I've seen queries that add 0 to a number (or '' to a string) in order to force the optimizer to avoid using an index for that part of the query. (Of course, the proper way to avoid using an index is to add an appropriate hint)
Maybe the original writer had a problem where the ORDER BY was being optimized to an index scan, which caused the query to run slower; so they added +0 to force a different access path and do an ordinary sort.
First of all, sorry for answering such an old question. The +0 effectively tells the database to ignore the index (if there is one on the a.Code column) for this specific query.
Sometimes an index makes retrieval fast and sometimes it makes it very slow, depending on the optimizer mode of the database.
So now you have two options: either use the +0 trick or drop the index if it is on a.Code; you will get the same speed.

Optimizing a strange MySQL Query

Hoping someone can help with this. I have a query that pulls data from a PHP application and turns it into a view for use in a Ruby on Rails application. The PHP app's table is an E-A-V style table, with the following business rules:
Given fields: First Name, Last Name, Email Address, Phone Number and Mobile Phone Carrier:
Each property has two custom fields defined: one being required, one being not required. Clients can use either one, and different clients use different ones based on their own rules (e.g. Client A may not care about First and Last Name, but client B might)
The RoR app must treat each "pair" of properties as only a single property.
Now, here is the query. The problem is it runs beautifully with around 11,000 records. However, the real database has over 40,000 and the query is extremely slow, taking roughly 125 seconds to run which is totally unacceptable from a business perspective. It's absolutely required that we pull this data, and we need to interface with the existing system.
The UserID part is to fake out a Rails-esque foreign key which relates to a Rails table. I'm a SQL Server guy, not a MySQL guy, so maybe someone can point out how to improve this query? They (the business) demand that it be sped up, but I'm not sure how, since the various group_concat and ifnull calls are required because I need every field for every client and then have to combine the data.
select `ls`.`subscriberid` AS `id`,left(`l`.`name`,(locate(_utf8'_',`l`.`name`) - 1)) AS `user_id`,
ifnull(min((case when (`s`.`fieldid` in (2,35)) then `s`.`data` else NULL end)),_utf8'') AS `first_name`,
ifnull(min((case when (`s`.`fieldid` in (3,36)) then `s`.`data` else NULL end)),_utf8'') AS `last_name`,
ifnull(`ls`.`emailaddress`,_utf8'') AS `email_address`,
ifnull(group_concat((case when (`s`.`fieldid` = 81) then `s`.`data` when (`s`.`fieldid` = 154) then `s`.`data` else NULL end) separator ''),_utf8'') AS `mobile_phone`,
ifnull(group_concat((case when (`s`.`fieldid` = 100) then `s`.`data` else NULL end) separator ','),_utf8'') AS `sms_only`,
ifnull(group_concat((case when (`s`.`fieldid` = 34) then `s`.`data` else NULL end) separator ','),_utf8'') AS `mobile_carrier`
from ((`list_subscribers` `ls`
join `lists` `l` on((`ls`.`listid` = `l`.`listid`)))
left join `subscribers_data` `s` on((`ls`.`subscriberid` = `s`.`subscriberid`)))
where (left(`l`.`name`,(locate(_utf8'_',`l`.`name`) - 1)) regexp _utf8'[[:digit:]]+')
group by `ls`.`subscriberid`,`l`.`name`,`ls`.`emailaddress`
EDIT
I removed the regexp and that sped the query up to about 20 seconds, instead of nearly 120 seconds. If I could remove the group by then it would be faster, but I cannot as removing this causes it to duplicate rows with blank data for each field, instead of aggregating them. For instance:
With group by
id user_id first_name last_name email_address mobile_phone sms_only mobile_carrier
1 1 John Doe jdoe#example.com 5551234567 0 Sprint
Without group by
id user_id first_name last_name email_address mobile_phone sms_only mobile_carrier
1 1 John jdoe#example.com
1 1 Doe jdoe#example.com
1 1 jdoe#example.com
1 1 jdoe#example.com 5551234567
And so on. What we need is the first result.
EDIT #2
The query still seems to take a long time, but earlier today it was running in only about 20 seconds on the production database. Without changing a thing, the same query is now once again taking over 60 seconds. This is still unacceptable. Any other ideas on how to improve this?
That is, without a doubt, the second most hideous SQL query I have ever laid my eyes on :-)
My advice is to trade storage requirements for speed. This is a common trick used when you find your queries have a lot of per-row functions (ifnull, case and so forth). These per-row functions never scale very well as the table becomes larger.
Create new fields in the table which will hold the values you want to extract and then calculate those values on insert/update (with a trigger) rather than select. This doesn't technically break 3NF since the triggers guarantee data consistency between columns.
The vast majority of database tables are read far more often than they're written so this will amortise the cost of the calculation across many selects. In addition, just about every reported problem with databases is one of speed, not storage.
An example of what I mean. You can replace:
case when (`s`.`fieldid` in (2,35)) then `s`.`data` else NULL end
with:
`s`.`data_2_35`
in your query if your insert/update trigger simply sets the data_2_35 column to data or NULL depending on the value of fieldid. Then you index data_2_35 and, voila, instant speed improvement at the cost of a little storage.
This trick can be done to the five case clauses, the left/regexp bit and the "naked" ifnull function as well (the ifnull functions containing min and group_concat may be harder to do).
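As a rough MySQL sketch of that idea (the data_2_35 column, its type, and the trigger name are all hypothetical):
-- hypothetical precomputed column for fieldids 2 and 35
ALTER TABLE subscribers_data ADD COLUMN data_2_35 VARCHAR(255) NULL;

CREATE TRIGGER subscribers_data_bi BEFORE INSERT ON subscribers_data
FOR EACH ROW
  SET NEW.data_2_35 = IF(NEW.fieldid IN (2,35), NEW.data, NULL);

CREATE INDEX ix_subscribers_data_2_35 ON subscribers_data (data_2_35);
An analogous BEFORE UPDATE trigger keeps the column in sync on updates, and existing rows need a one-off UPDATE to backfill.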
The problem is most likely the WHERE condition:
where (left(`l`.`name`,(locate(_utf8'_',`l`.`name`) - 1)) regexp _utf8'[[:digit:]]+')
This looks like complex string comparison, so no index can be used, which results in a full table scan, possibly for every row in the result set. I am not a MySQL expert, but if you can simplify this into more simple column comparisons it will probably run much faster.
The first thing that jumps out at me as the source of all the trouble:
The PHP app's table is an E-A-V style table...
Trying to convert data in EAV format into conventional relational format on the fly using SQL is bound to be awkward and inefficient. So don't try to smash it into a conventional column-per-attribute format. The following query returns multiple rows per subscriber, one row per EAV attribute:
SELECT ls.subscriberid AS id,
SUBSTRING_INDEX(l.name, _utf8'_', 1) AS user_id,
COALESCE(ls.emailaddress, _utf8'') AS email_address,
s.fieldid, s.data
FROM list_subscribers ls JOIN lists l ON (ls.listid = l.listid)
LEFT JOIN subscribers_data s ON (ls.subscriberid = s.subscriberid
AND s.fieldid IN (2,3,34,35,36,81,100,154))
WHERE SUBSTRING_INDEX(l.name, _utf8'_', 1) REGEXP _utf8'[[:digit:]]+'
This eliminates the GROUP BY which is not optimized well in MySQL -- it usually incurs a temporary table which kills performance.
id user_id email_address fieldid data
1 1 jdoe#example.com 2 John
1 1 jdoe#example.com 3 Doe
1 1 jdoe#example.com 81 5551234567
But you'll have to sort out the EAV attributes in application code. That is, you can't seamlessly use ActiveRecord in this case. Sorry about that, but that's one of the disadvantages of using a non-relational design like EAV.
The next thing that I notice is the killer string manipulation (even after I've simplified it with SUBSTRING_INDEX()). When you're picking substrings out of a column, that tells me you've overloaded one column with two distinct pieces of information. One is the name and the other is some kind of list-type attribute that you would use to filter the query. Store one piece of information in each column.
You should add a column for this attribute, and index it. Then the WHERE clause can utilize the index:
SELECT ls.subscriberid AS id,
SUBSTRING_INDEX(l.name, _utf8'_', 1) AS user_id,
COALESCE(ls.emailaddress, _utf8'') AS email_address,
s.fieldid, s.data
FROM list_subscribers ls JOIN lists l ON (ls.listid = l.listid)
LEFT JOIN subscribers_data s ON (ls.subscriberid = s.subscriberid
AND s.fieldid IN (2,3,34,35,36,81,100,154))
WHERE l.list_name_contains_digits = 1;
Also, you should always analyze an SQL query with EXPLAIN if it's important for it to perform well. There's an analogous feature in MS SQL Server, so you should be accustomed to the concept, but the MySQL terminology may be different.
You'll have to read the documentation to learn how to interpret the EXPLAIN report in MySQL, there's too much info to describe here.
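In MySQL that just means prefixing the statement with EXPLAIN, e.g. (abbreviated to the join skeleton):
EXPLAIN
SELECT ls.subscriberid, l.name
FROM list_subscribers ls JOIN lists l ON (ls.listid = l.listid);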
Re your additional info: Yes, I understand you can't do away with the EAV table structure. Can you create an additional table? Then you can load the EAV data into it:
CREATE TABLE subscriber_mirror (
subscriberid INT PRIMARY KEY,
first_name VARCHAR(100),
last_name VARCHAR(100),
first_name2 VARCHAR(100),
last_name2 VARCHAR(100),
mobile_phone VARCHAR(100),
sms_only VARCHAR(100),
mobile_carrier VARCHAR(100)
);
INSERT INTO subscriber_mirror (subscriberid)
SELECT DISTINCT subscriberid FROM list_subscribers;
UPDATE subscribers_data s JOIN subscriber_mirror m USING (subscriberid)
SET m.first_name = IF(s.fieldid = 2, s.data, m.first_name),
m.last_name = IF(s.fieldid = 3, s.data, m.last_name),
m.first_name2 = IF(s.fieldid = 35, s.data, m.first_name2),
m.last_name2 = IF(s.fieldid = 36, s.data, m.last_name2),
m.mobile_phone = IF(s.fieldid = 81, s.data, m.mobile_phone),
m.sms_only = IF(s.fieldid = 100, s.data, m.sms_only),
m.mobile_carrier = IF(s.fieldid = 34, s.data, m.mobile_carrier);
This will take a while, but you only need to do it when you get a new data update from the vendor. Subsequently you can query subscriber_mirror in a much more conventional SQL query:
SELECT ls.subscriberid AS id, l.name+0 AS user_id,
COALESCE(s.first_name, s.first_name2) AS first_name,
COALESCE(s.last_name, s.last_name2) AS last_name,
COALESCE(ls.emailaddress, '') AS email_address,
COALESCE(s.mobile_phone, '') AS mobile_phone,
COALESCE(s.sms_only, '') AS sms_only,
COALESCE(s.mobile_carrier, '') AS mobile_carrier
FROM lists l JOIN list_subscribers ls USING (listid)
JOIN subscriber_mirror s USING (subscriberid)
WHERE l.name+0 > 0
As for the userid that's embedded in the l.name column, if the digits are the leading characters in the column value, MySQL allows you to convert to an integer value much more easily:
An expression like '123_bill'+0 yields an integer value of 123. An expression like 'bill_123'+0 has no digits at the beginning, so it yields an integer value of 0.
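A quick demonstration of that behaviour:
SELECT '123_bill'+0;   -- returns 123
SELECT 'bill_123'+0;   -- returns 0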

Creating a quicker MySQL Query

I'm trying to create a faster query; right now I have large databases. My table sizes are 5 columns by 530k rows, and 300 columns by 4k rows (sadly I have zero control over the architecture, otherwise I wouldn't be having this silly problem with a poor db).
SELECT cast( table2.foo_1 AS datetime ) as date,
table1.*, table2.foo_2, foo_3, foo_4, foo_5, foo_6, foo_7, foo_8, foo_9, foo_10, foo_11, foo_12, foo_13, foo_14, foo_15, foo_16, foo_17, foo_18, foo_19, foo_20, foo_21
FROM table1, table2
WHERE table2.foo_0 = table1.foo_0
AND table1.bar1 >= NOW()
AND foo_20="tada"
ORDER BY
date desc
LIMIT 0,10
I've indexed table2.foo_0 and table1.foo_0 along with foo_20 in hopes that it would allow for faster querying. I'm still at nearly a 7-second load time. Is there something else I can do?
Cheers
I think an index on bar1 is the key. I always run into performance issues with dates because it has to compare each of the 530K rows.
Create the following indexes:
CREATE INDEX ix_table1_0_1 ON table1 (foo_1, foo_0)
CREATE INDEX ix_table2_20_0 ON table2 (foo_20, foo_0)
and rewrite your query like this:
SELECT cast( table2.foo_1 AS datetime ) as date,
table1.*, table2.foo_2, foo_3, foo_4, foo_5, foo_6, foo_7, foo_8, foo_9, foo_10, foo_11, foo_12, foo_13, foo_14, foo_15, foo_16, foo_17, foo_18, foo_19, foo_20, foo_21
FROM table1
JOIN table2
ON table2.foo_0 = table1.foo_0
AND table2.foo_20 = "tada"
WHERE table1.bar1 >= NOW()
ORDER BY
table1.foo_1 DESC
LIMIT 0, 10
The first index will be used for ORDER BY, the second one will be used for JOIN.
You, though, may benefit more from creating the first index like this:
CREATE INDEX ix_table1_0_1 ON table1 (bar1, foo_0)
which may apply more restrictive filtering on bar1.
I have a blog post on this: Choosing index, which advises on how to choose which index to create for cases like that.
Indexing table1.bar1 may improve the >=NOW comparison.
A compound index on table2.foo_0 and table2.foo_20 will help.
An index on table2.foo_1 may help the sort.
Overall, pasting the output of your query with EXPLAIN prepended may also give some hints.
table2 needs a compound index on foo_0, foo_20, and bar1.
An index on table1.foo_0, table1.bar1 could help too, assuming that foo_20 belongs to table1.
See How to use MySQL indexes and Optimizing queries with explain.
Use compound indexes whose columns correspond to your WHERE equalities (in general the leftmost columns in the index), WHERE comparisons to an absolute value (middle), and the ORDER BY clause (rightmost, in the same order).
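Applied to this query, that rule would give indexes roughly like these (the names are placeholders, and they assume foo_20 lives on table2 and bar1 on table1):
CREATE INDEX ix_table2_eq_join ON table2 (foo_20, foo_0);   -- equality column first, then the join column
CREATE INDEX ix_table1_join_range ON table1 (foo_0, bar1);  -- join equality first, then the range comparison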