SELECT FROM WHERE IN compared to SELECT FROM on multiple tables - sql

I attend a database course at my school. The teacher gave us a simple exercise: consider the following, simple schema:
Table Book:
Column title (primary key)
Column genre (one of: "romance", "polar", ...)
Table Author:
Column title (foreign key on Book.title)
Column name
Primary key on (title, name)
Among the questions was the following one:
Write the query that returns the authors who have written romance books.
I proposed this answer:
select distinct name
from Author where title in (select title from Book where genre = "romance")
However the teacher said it was wrong, and that the correct answer was:
select distinct name
from Book, Author
where Book.title = Author.title
and genre = "romance"
When I asked for explanations all I got was a "if you had paid more attention to the course you would know why". Brilliant.
So, why is my answer incorrect? What exactly is the difference between these queries? What exactly do they do, on the DB engine level?

So, why is my answer incorrect?
You answer is correct.
My guess why the teacher marked it as wrong, that he/she tried to practise the use of joins with that question. But that should have been part of the question if it was intended.
What exactly is the difference between these queries
Technically they are different indeed. A DBMS with a simple query optimizer will retrieve the subselect in a different way than the join from your teacher's answer.
I wouldn't be surprised if a DBMS with good optimizer might actually come up with the same execution plan for both queries.
Edit
I created some testdata with 50000 books, 50000 authors and 7 different genres to test (smaller numbers don't really make sense as the optimizers tend to simply grab the whole table then). The statement would return 7144 rows.
PostgreSQL
The execution plans are nearly identical with some small change in the "join" method.
Here is the plan for the sub-select version: http://explain.depesz.com/s/eov
Here is the plan for the join version: http://explain.depesz.com/s/aTI
Surprisingly, the join version has a slightly higher cost value.
Oracle
Both plans are 100% identical:
--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 6815 | 399K| | 273 (2)| 00:00:04 |
| 1 | HASH UNIQUE | | 6815 | 399K| 464K| 273 (2)| 00:00:04 |
|* 2 | HASH JOIN | | 6815 | 399K| | 172 (2)| 00:00:03 |
|* 3 | TABLE ACCESS FULL| BOOK | 6815 | 166K| | 69 (2)| 00:00:01 |
| 4 | TABLE ACCESS FULL| AUTHOR | 50000 | 1708K| | 103 (1)| 00:00:02 |
--------------------------------------------------------------------------------------
Looking at the statistics when using autotrace there is also no difference whatsoever. I didn't bother to actually create a trace file to analyze it as I don't expect to see a difference there.
Things don't really change if an index on book.genre is added. Oracle sticks with the full table scan (even with 100000 rows). Probably because the tables are not very wide and a lot of rows fit on a single page.
PostgreSQL does use the index for both statements but there is still no real difference between the plans.

Both queries are valid and return the same.
Your teacher uses quite outdated (though still valid) join syntax, and you are using the construct which is less efficient in some databases (MySQL, for instance).
If I were your teacher, I would write the query as this:
SELECT DISTINCT name
FROM books b
JOIN authors a
ON a.title = b.title
WHERE b.genre = 'romance'
but still accept both your and your teacher's queries, if the course was not specific to MySQL optimization.
Can't it be what the teacher meant when he/she said about paying attention?
Update:
On the DB engine level both queries would be optimized to use the same plan, except if the DB engine is MySQL.
In MySQL, your query would be forced to use Authors as a leading table, while for you teacher's query, the optimizer can choose which table to make leading depending on the table statistics.

Related

Self-Join with Natural Join

What is the difference between
select * from degreeprogram NATURAL JOIN degreeprogram ;
and
select * from degreeprogram d1 NATURAL JOIN degreeprogram d2;
in oracle?
I expected that they return the same result set, however, they do not. The second query does what I expect: it joins the two relations using the same named attributes and so it returns the same tuples as stored in degreeprogram. However, the first query is confusing for me: here, each tuple occurs several times in the result set-> what join condition is used here?
Thank you
NATURAL JOIN means join the two tables based on all columns having the same name in both tables.
I imagine that for each column in your table, Oracle is internally writing a condition like:
degreeprogram.column1 = degreeprogram.column1
(which you would not be able to write yourself due to ORA-00918 column ambiguously defined error)
And then, I imagine, Oracle is optimizing that away to just
degreeprogram.column1 is not null
So, you're not exactly getting a CROSS JOIN of your table with itself -- only a CROSS JOIN of those rows having no null columns.
UPDATE: Since this was the selected answer, I will just add from Thorsten Kettner's answer that this behavior is probably a bug on Oracle's part. In 18c, Oracle behaves properly and returns an ORA-00918 error when you try to NATURAL JOIN a table to itself.
The difference between those two statements is that the second explicitly defines a self join on the table, where the first statement, the optimizer is trying to figure out what you really want. On my database, the first statement performs a cartesian merge join and is not optimized at all, and the second statement has a better explain plan, using a single full table access with index scanning.
I'd call this a bug. This query:
select * from degreeprogram d1 NATURAL JOIN degreeprogram d2;
translates to
select col1, col2, ... -- all columns
from degreeprogram d1
join degreeprogram d2 using (col1, col2, ...)
and gives you all rows from the table where all columns are not null (because using(col) never matches nulls).
This query, however:
select * from degreeprogram NATURAL JOIN degreeprogram;
is invalid according to standard SQL, because every table must have a unique name or alias in a query. Oracle lets this pass, but doing so it should do something still to keep the table instances apart (e.g. create internally an alias for them). It obviously doesn't and multiplies the result with the number of rows in the table. A bug.
A so-called natural join instructs the database to
Find all column names common to both tables (in this case, degreeprogram and degreeprogram, which of course have the same columns.)
Generate a join condition for each pair of matching column names, in the form table1.column1 = table2.column1 (in this case, there will be one for every column in degreeprogram.)
Therefore a query like this
select count(*) from demo natural join demo;
will be transformed into
select count(*) from demo, demo where demo.x = demo.x;
I checked this by creating a table with one column and two rows:
create table demo (x integer);
insert into demo values (1);
insert into demo values (2);
commit;
and then tracing the session:
SQL> alter session set tracefile_identifier='demo_trace';
Session altered.
SQL> alter session set events 'trace [SQL_Compiler.*]';
Session altered.
SQL> select /* nj test */ count(*) from demo natural join demo;
COUNT(*)
----------
4
1 row selected.
SQL> alter session set events 'trace [SQL_Compiler.*] off';
Session altered.
Then in twelve_ora_6196_demo_trace.trc I found this line:
Final query after transformations:******* UNPARSED QUERY IS *******
SELECT COUNT(*) "COUNT(*)" FROM "WILLIAM"."DEMO" "DEMO","WILLIAM"."DEMO" "DEMO" WHERE "DEMO"."X"="DEMO"."X"
and a few lines later:
try to generate single-table filter predicates from ORs for query block SEL$58A6D7F6 (#0)
finally: "DEMO"."X" IS NOT NULL
(This is merely an optimisation on top of the generated query above, as column X is nullable but the join allows the optimiser to infer that only non-null values are required. It doesn't replace the joins.)
Hence the execution plan:
-----------------------------------------+-----------------------------------+
| Id | Operation | Name | Rows | Bytes | Cost | Time |
-----------------------------------------+-----------------------------------+
| 0 | SELECT STATEMENT | | | | 7 | |
| 1 | SORT AGGREGATE | | 1 | 13 | | |
| 2 | MERGE JOIN CARTESIAN | | 4 | 52 | 7 | 00:00:01 |
| 3 | TABLE ACCESS FULL | DEMO | 2 | 26 | 3 | 00:00:01 |
| 4 | BUFFER SORT | | 2 | | 4 | 00:00:01 |
| 5 | TABLE ACCESS FULL | DEMO | 2 | | 2 | 00:00:01 |
-----------------------------------------+-----------------------------------+
Query Block Name / Object Alias(identified by operation id):
------------------------------------------------------------
1 - SEL$58A6D7F6
3 - SEL$58A6D7F6 / DEMO_0001#SEL$1
5 - SEL$58A6D7F6 / DEMO_0002#SEL$1
------------------------------------------------------------
Predicate Information:
----------------------
3 - filter("DEMO"."X" IS NOT NULL)
Alternatively, let's see what dbms_utility.expand_sql_text does with it. I'm not quite sure what to make of this given the trace file above, but it shows a similar expansion taking place:
SQL> var result varchar2(1000)
SQL> exec dbms_utility.expand_sql_text('select count(*) from demo natural join demo', :result)
PL/SQL procedure successfully completed.
RESULT
----------------------------------------------------------------------------------------------------------------------------------
SELECT COUNT(*) "COUNT(*)" FROM (SELECT "A2"."X" "X" FROM "WILLIAM"."DEMO" "A3","WILLIAM"."DEMO" "A2" WHERE "A2"."X"="A2"."X") "A1"
Lesson: NATURAL JOIN is evil. Everybody knows this.

Improve join query in Oracle

I have a query which takes 17 seconds to execute. I have applied indexes on FIPS, STR_DT, END_DT but still it's taking time. Any suggestions on how I can improve the performance?
My query:
SELECT /*+ALL_ROWS*/ K_LF_SVA_VA.NEXTVAL VAL_REC_ID, a.REC_ID,
b.VID,
1 VA_SEQ,
51 VA_VALUE_DATATYPE,
b.VALUE VAL_NUM,
SYSDATE CREATED_DATE,
SYSDATE UPDATED_DATE
FROM CTY_REC a JOIN FIPS_CONS b
ON a.FIPS=b.FIPS AND a.STR_DT=b.STR_DT AND a.END_DT=b.END_DT;
DESC CTY_REC;
Name Null Type
------------------- ---- -------------
REC_ID NUMBER(38)
DATA_SOURCE_DATE DATE
STR_DT DATE
END_DT DATE
VID_RECSET_ID NUMBER
VID_VALSET_ID NUMBER
FIPS VARCHAR2(255)
DESC FIPS_CONS;
Name Null Type
------------- -------- -------------
STR_DT DATE
END_DT DATE
FIPS VARCHAR2(255)
VARIABLE VARCHAR2(515)
VALUE NUMBER
VID NOT NULL NUMBER
Explain Plan:
Plan hash value: 919279614
--------------------------------------------------------------
| Id | Operation | Name |
--------------------------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | SEQUENCE | K_VAL |
| 2 | HASH JOIN | |
| 3 | TABLE ACCESS FULL| CTY_REC |
| 4 | TABLE ACCESS FULL| FIPS_CONS |
--------------------------------------------------------------
I have added description of tables and explain plan for my query.
On the face of it, and without information on the configuration of the sequence you're using, the number of rows in each table, and the total number of rows projected from the query, it's possible that the execution plan you have is the most efficient one for returning all rows.
The optimiser clearly thinks that the indexes will not benefit performance, and this is often more likely when you optimise for all rows, not first rows. Index-based access is single block and one row at a time, so can be inherently slower than multiblock full scans on a per-block basis.
The hash join that Oracle is using is an extremely efficient way of joining data sets. Unless the hashed table is so large that it spills to disk, the total cost is only slightly more than full scans of the two tables. We need more detailed statistics on the execution to be able to tell if the hashed table is spilling to disk, and if it is the solution may just be modified memory management, not indexes.
What might also hold up your SQL execution is calling that sequence, if the sequence's cache value is very low and the number of records is high. More info required on that -- if you need to generate a sequential identifier for each row then you could use ROWNUM.
This is basically your query:
SELECT . . .
FROM CTY_REC a JOIN
FIPS_CONS b
ON a.FIPS = b.FIPS AND a.STR_DT = b.STR_DT AND a.END_DT = b.END_DT;
You want a composite index on (FIPS, STR_DT, END_DT), perhaps on both tables:
create index idx_cty_rec_3 on cty_rec(FIPS, STR_DT, END_DT);
create index idx_fipx_con_3 on cty_rec(FIPS, STR_DT, END_DT);
Actually, only one is probably necessary but having both gives the optimizer more choices for improving the query.
You should have at least these two indexes on the table:
CTY_REC(FIPS, STR_DT, END_DT)
FIPS_CONS(FIPS, STR_DT, END_DT)
which can still be sped up with covering indexes instead:
CTY_REC(FIPS, STR_DT, END_DT, REC_ID)
FIPS_CONS(FIPS, STR_DT, END_DT, VALUE, VID)
If you wish to drive the optimizer to use the indexes,
replace /*+ all_rows */ with /*+ first_rows */

optimize mysql count query

Is there a way to optimize this further or should I just be satisfied that it takes 9 seconds to count 11M rows ?
devuser#xcmst > mysql --user=user --password=pass -D marctoxctransformation -e "desc record_updates"
+--------------+----------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------------+----------+------+-----+---------+-------+
| record_id | int(11) | YES | MUL | NULL | |
| date_updated | datetime | YES | MUL | NULL | |
+--------------+----------+------+-----+---------+-------+
devuser#xcmst > date; mysql --user=user --password=pass -D marctoxctransformation -e "select count(*) from record_updates where date_updated > '2009-10-11 15:33:22' "; date
Thu Dec 9 11:13:17 EST 2010
+----------+
| count(*) |
+----------+
| 11772117 |
+----------+
Thu Dec 9 11:13:26 EST 2010
devuser#xcmst > mysql --user=user --password=pass -D marctoxctransformation -e "explain select count(*) from record_updates where date_updated > '2009-10-11 15:33:22' "
+----+-------------+----------------+-------+--------------------------------------------------------+--------------------------------------------------------+---------+------+----------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------------+-------+--------------------------------------------------------+--------------------------------------------------------+---------+------+----------+--------------------------+
| 1 | SIMPLE | record_updates | index | idx_marctoxctransformation_record_updates_date_updated | idx_marctoxctransformation_record_updates_date_updated | 9 | NULL | 11772117 | Using where; Using index |
+----+-------------+----------------+-------+--------------------------------------------------------+--------------------------------------------------------+---------+------+----------+--------------------------+
devuser#xcmst > mysql --user=user --password=pass -D marctoxctransformation -e "show keys from record_updates"
+----------------+------------+--------------------------------------------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+----------------+------------+--------------------------------------------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+
| record_updates | 1 | idx_marctoxctransformation_record_updates_date_updated | 1 | date_updated | A | 2416 | NULL | NULL | YES | BTREE | |
| record_updates | 1 | idx_marctoxctransformation_record_updates_record_id | 1 | record_id | A | 11772117 | NULL | NULL | YES | BTREE | |
+----------------+------------+--------------------------------------------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+
If mysql has to count 11M rows, there really isn't much of a way to speed up a simple count. At least not to get it to a sub 1 second speed. You should rethink how you do your count. A few ideas:
Add an auto increment field to the table. It looks you wouldn't delete from the table, so you can use simple math to find the record count. Select the min auto increment number for the initial earlier date and the max for the latter date and subtract one from the other to get the record count. For example:
SELECT min(incr_id) min_id FROM record_updates WHERE date_updated BETWEEN '2009-10-11 15:33:22' AND '2009-10-12 23:59:59';
SELECT max(incr_id) max_id FROM record_updates WHERE date_updated > DATE_SUB(NOW(), INTERVAL 2 DAY);`
Create another table summarizing the record count for each day. Then you can query that table for the total records. There would only be 365 records for each year. If you need to get down to more fine grained times, query the summary table for full days and the current table for just the record count for the start and end days. Then add them all together.
If the data isn't changing, which it doesn't seem like it is, then summary tables will be easy to maintain and update. They will significantly speed things up.
Since >'2009-10-11 15:33:22' contains most of the records,
I would suggest to do a reverse matching like <'2009-10-11 15:33:22' (mysql work less harder and less rows involved)
select
TABLE_ROWS -
(select count(*) from record_updates where add_date<"2009-10-11 15:33:22")
from information_schema.tables
where table_schema = "marctoxctransformation" and table_name="record_updates"
You can combine with programming language (like bash shell)
to make this calculation a bit smarter...
such as do execution plan first to calculate which comparison will use lesser row
From my testing (around 10M records), the normal comparison takes around 3s,
and now cut-down to around 0.25s
MySQL doesn't "optimize" count(*) queries in InnoDB because of versioning. Every item in the index has to be iterated over and checked to make sure that the version is correct for display (e.g., not an open commit). Since any of your data can be modified across the database, ranged selects and caching won't work. However, you possibly can get by using triggers. There are two methods to this madness.
This first method risks slowing down your transactions since none of them can truly run in parallel: use after insert and after delete triggers to increment / decrement a counter table. Second trick: use those insert / delete triggers to call a stored procedure which feeds into an external program which similarly adjusts values up and down, or acts upon a non-transactional table. Beware that in the event of a rollback, this will result in inaccurate numbers.
If you don't need an exact numbers, check out this query:
select table_rows from information_schema.tables
where table_name = 'foo';
Example difference: count(*): 1876668, table_rows: 1899004. The table_rows value is an estimation, and you'll get a different number every time even if you database doesn't change.
For my own curiosity: do you need exact numbers that are updated every second? IF so, why?
If the historical data is not volatile, create a summary table. There are various approaches, the one to choose will depend on how your table is updated, and how often.
For example, assuming old data is rarely/never changed, but recent data is, create a monthly summary table, populated for the previous month at the end of each month (eg insert January's count at the end of February). Once you have your summary table, you can add up the full months and the part months at the beginning and end of the range:
select count(*)
from record_updates
where date_updated >= '2009-10-11 15:33:22' and date_updated < '2009-11-01';
select count(*)
from record_updates
where date_updated >= '2010-12-00';
select sum(row_count)
from record_updates_summary
where date_updated >= '2009-11-01' and date_updated < '2010-12-00';
I've left it split out above for clarity but you can do this in one query:
select ( select count(*)
from record_updates
where date_updated >= '2010-12-00'
or ( date_updated>='2009-10-11 15:33:22'
and date_updated < '2009-11-01' ) ) +
( select count(*)
from record_updates
where date_updated >= '2010-12-00' );
You can adapt this approach for make the summary table based on whole weeks or whole days.
You should add an index on the 'date_updated' field.
Another thing you can do if you don't mind changing the structure of the table, is to use the timestamp of the date in 'int' instead of 'datetime' format, and it might be even faster.
If you decide to do so, the query will be
select count(date_updated) from record_updates where date_updated > 1291911807
There is no primary key in your table. It's possible that in this case it always scans the whole table. Having a primary key is never a bad idea.
If you need to return the total table's row count, then there is an alternative to the
SELECT COUNT(*) statement which you can use. SELECT COUNT(*) makes a full table scan to return the total table's row count, so it can take a long time. You can use the sysindexes system table instead in this case. There is a ROWS column in the sysindexes table. This column contains the total row count for each table in your database. So, you can use the following select statement instead of SELECT COUNT(*):
SELECT rows FROM sysindexes WHERE id = OBJECT_ID('table_name') AND indid < 2
This can improve the speed of your query.
EDIT: I have discovered that my answer would be correct if you were using a SQL Server database. MySQL databases do not have a sysindexes table.
It depends on a few things but something like this may work for you
im assuming this count never changes as it is in the past so the result can be cached somehow
count1 = "select count(*) from record_updates where date_updated <= '2009-10-11 15:33:22'"
gives you the total count of records in the table,
this is an approximate value in innodb table so BEWARE, depends on engine
count2 = "select table_rows from information_schema.`TABLES` where table_schema = 'marctoxctransformation' and TABLE_NAME = 'record_updates'"
your answer
result = count2 - count1
There are a few details I'd like you to clarify (would put into comments on the q, but it is actually easier to remove from here when you update your question).
What is the intended usage of data, insert once and get the counts many times, or your inserts and selects are approx on par?
Do you care about insert/update performance?
What is the engine used for the table? (heck you can do SHOW CREATE TABLE ...)
Do you need the counts to be exact or approximately exact (like 0.1% correct)
Can you use triggers, summary tables, change schema, change RDBMS, etc.. or just add/remove indexes?
Maybe you should explain also what is this table supposed to be? You have record_id with cardinality that matches the number of rows, so is it PK or FK or what is it? Also the cardinality of the date_updated suggests (though not necessarily correct) that it has same values for ~5,000 records on average), so what is that? - it is ok to ask a SQL tuning question with not context, but it is also nice to have some context - especially if redesigning is an option.
In the meantime, I'll suggest you to get this tuning script and check the recommendations it will give you (it's just a general tuning script - but it will inspect your data and stats).
Instead of doing count(*), try doing count(1), like this:-
select count(1) from record_updates where date_updated > '2009-10-11 15:33:22'
I took a DB2 class before, and I remember the instructor mentioned about doing a count(1) when we just want to count number of rows in the table regardless the data because it is technically faster than count(*). Let me know if it makes a difference.
NOTE: Here's a link you might be interested to read: http://www.mysqlperformanceblog.com/2007/04/10/count-vs-countcol/

how to optimize a left join query?

I have two tables, jos_eimcart_customers_addresses and jos_eimcart_customers. I want to pull all records from the customers table, and include address information where available from the addresses table. The query does work, but on my localhost machine it took over a minute to run. On localhost, the tables are about 8000 rows each, but in production the tables could have upwards of 25,000 rows each. Is there any way to optimize this so it doesn't take as long? Both tables have an index on the id field, which is primary key. Is there some other index I need to create that would help this run faster? Should the addresses table have an index on the customer_id field, since it's a foreign key? I have other database queries that are similar and run on much larger tables, more quickly.
(EDITED TO ADD: There can be more than one address record per customer, so customer_id is not a unique value in the addresses table.)
select
c.firstname,
c.lastname,
c.email as customer_email,
a.email as address_email,
c.phone as customer_phone,
a.phone as address_phone,
a.company,
a.address1,
a.address2,
a.city,
a.state,a.zip,
c.last_signin
from jos_eimcart_customers c
left join jos_eimcart_customers_addresses a
on c.id = a.customer_id
order by c.last_signin desc
EDITED TO ADD: Explain results
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
==========================================================================================
1 | SIMPLE | c | ALL | NULL | NULL| NULL |NULL |6175 |Using temporary; Using filesort
---------------------------------------------------------------------------------------
1 | SIMPLE | a | ALL | NULL | NULL| NULL |NULL |8111 |
You should create an index on a.customer_id. It doesn't need to be a unique index, but it should definitely be indexed.
Try creating an index and see if it is faster. For further optimisation, you can use SQL's EXPLAIN to see if your query is using indexes where it should be.
Try http://www.dbtuna.com/article.asp?id=14 and http://www.devshed.com/c/a/MySQL/MySQL-Optimization-part-1/2/ for a bit of info on EXPLAIN.
Short answer: Yes, customer_id should have index.
Better answer: It would be best to find a query analyzer for MySql and use it to determine what the actual cause of the slow down is.
For example you could put EXPLAIN before your select and see what the results is.
Optimizing MySQL: Queries and Indexes

How can I optimize this query?

I have the following query:
SELECT `masters_tp`.*, `masters_cp`.`cp` as cp, `masters_cp`.`punti` as punti
FROM (`masters_tp`)
LEFT JOIN `masters_cp` ON `masters_cp`.`nickname` = `masters_tp`.`nickname`
WHERE `masters_tp`.`stake` = 'report_A'
AND `masters_cp`.`stake` = 'report_A'
ORDER BY `masters_tp`.`tp` DESC, `masters_cp`.`punti` DESC
LIMIT 400;
Is there something wrong with this query that could affect the server memory?
Here is the output of EXPLAIN
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+------+---------------+------+---------+------+-------+----------------------------------------------+
| 1 | SIMPLE | masters_cp | ALL | NULL | NULL | NULL | NULL | 8943 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | masters_tp | ALL | NULL | NULL | NULL | NULL | 12693 | Using where |
Run the same query prefixed with EXPLAIN and add the output to your question - this will show what indexes you are using and the number of rows being analyzed.
You can see from your explain that no indexes are being used, and its having to look at thousands of rows to get your result. Try adding an index on the columns used to perform the join, e.g. nickname and stake:
ALTER TABLE masters_tp ADD INDEX(nickname),ADD INDEX(stake);
ALTER TABLE masters_cp ADD INDEX(nickname),ADD INDEX(stake);
(I've assumed the columns might have duplicated values, if not, use UNIQUE rather than INDEX). See the MySQL manual for more information.
Replace the "masters_tp.* " bit by explicitly naming only the fields from that table you actually need. Even if you need them all, name them all.
There's actually no reason to do a left join here. You're using your filters to whisk away any leftiness of the join. Try this:
SELECT
`masters_tp`.*,
`masters_cp`.`cp` as cp,
`masters_cp`.`punti` as punti
FROM
`masters_tp`
INNER JOIN `masters_cp` ON
`masters_tp`.`stake` = `masters_cp`.stake`
and `masters_tp`.`nickname` = `masters_cp`.`nickname`
WHERE
`masters_tp`.`stake` = 'report_A'
ORDER BY
`masters_tp`.`tp` DESC,
`masters_cp`.`punti` DESC
LIMIT 400;
inner joins tend to be faster than left joins. The query can limit the number of rows that have to be joined using the predicates (aka the where clause). This means that the database is handling, potentially, a lot less rows, which obviously speeds things up.
Additionally, make sure you have a non-clustered index on stake and nickname (in that order).
It is simple query. I think everything is ok with it. You can try add indexes on 'stake' fields or make limit lower.