Imagine you have a comments table in your database.
The comments table has the columns id, comment_id_no, text, show, and inserted_at.
When a user enters a comment, a row is inserted into the database:
| id | comment_id_no | text | show | inserted_at |
| -- | -------------- | ---- | ---- | ----------- |
| 1 | 1 | hi | true | 1/1/2000 |
If the user wants to update that comment, a new row is inserted into the database:
| id | comment_id_no | text | show | inserted_at |
| -- | -------------- | ---- | ---- | ----------- |
| 1 | 1 | hi | true | 1/1/2000 |
| 2 | 1 | hey | true | 1/1/2001 |
Notice it keeps the same comment_id_no. This is so we will be able to see the history of a comment.
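For example, a hypothetical history lookup (using the table above) would be:
select * from comments where comment_id_no = 1 order by inserted_at;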
Now the user decides that they no longer want to display their comment:
| id | comment_id_no | text | show | inserted_at |
| -- | -------------- | ---- | ----- | ----------- |
| 1 | 1 | hi | true | 1/1/2000 |
| 2 | 1 | hey | true | 1/1/2001 |
| 3 | 1 | hey | false | 1/1/2002 |
This hides the comment from the end users.
Now a second comment is made (not an update of the first):
| id | comment_id_no | text | show | inserted_at |
| -- | -------------- | ---- | ----- | ----------- |
| 1 | 1 | hi | true | 1/1/2000 |
| 2 | 1 | hey | true | 1/1/2001 |
| 3 | 1 | hey | false | 1/1/2002 |
| 4 | 2 | new | true | 1/1/2003 |
What I would like to be able to do is select the latest version of each unique comment_id_no where show is true. However, I do not want the query to return id=2.
Steps the query needs to take...
select all the most recent, distinct comment_id_nos. (should return id=3 and id=4)
select where show = true (should only return id=4)
Note: I am actually writing this query in Elixir using Ecto, and would like to be able to do this without using the subquery function. If anyone can answer this in SQL, I can convert the answer myself. If anyone knows how to answer this in Elixir, then also feel free to answer.
You can do this without a subquery by using a LEFT JOIN:
SELECT c.id, c.comment_id_no, c.text, c.show, c.inserted_at
FROM Comments AS c
LEFT JOIN Comments AS c2
ON c2.comment_id_no = c.comment_id_no
AND c2.inserted_at > c.inserted_at
WHERE c2.id IS NULL
AND c.show = 'true';
I think all other approaches will require a subquery of some sort; this would usually be done with a ranking function:
SELECT c.id, c.comment_id_no, c.text, c.show, c.inserted_at
FROM ( SELECT c.id,
c.comment_id_no,
c.text,
c.show,
c.inserted_at,
ROW_NUMBER() OVER(PARTITION BY c.comment_id_no
ORDER BY c.inserted_at DESC) AS RowNumber
FROM Comments AS c
) AS c
WHERE c.RowNumber = 1
AND c.show = 'true';
Since you have tagged the question with PostgreSQL, you could also make use of DISTINCT ON ():
SELECT *
FROM ( SELECT DISTINCT ON (c.comment_id_no)
c.id, c.comment_id_no, c.text, c.show, c.inserted_at
FROM Comments AS c
ORDER By c.comment_id_no, inserted_at DESC
) x
WHERE show = 'true';
Examples on DB<>Fiddle
I think you want:
select c.*
from comments c
where c.inserted_at = (select max(c2.inserted_at)
from comments c2
where c2.comment_id_no = c.comment_id_no
) and
c.show = 'true';
I don't understand what this has to do with select distinct. You simply want the last version of a comment, and then to check whether you can show it.
EDIT:
In Postgres, I would do:
select c.*
from (select distinct on (comment_id_no) c.*
from comments c
order by c.comment_id_no, c.inserted_at desc
) c
where c.show
distinct on usually has pretty good performance characteristics.
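As a hedged side note: in Postgres, a distinct on query like this can typically be served by a multicolumn index matching the order by (the index name here is illustrative):
create index comments_comment_id_no_inserted_at_idx
    on comments (comment_id_no, inserted_at desc);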
As I said in the comments, I don't advise polluting data tables with history/audit stuff.
And no: the "double versioning" suggested by @Josh_Eller in his comment isn't a good solution either, not only because it complicates queries unnecessarily, but also because it is much more expensive in terms of processing and tablespace fragmentation.
Keep in mind that UPDATE operations never update anything in place. They instead write a whole new version of the row and mark the old one as deleted. That's why vacuum processes are needed to defragment tablespaces and recover that space.
In any case, apart from being suboptimal, that approach forces you to implement more complex queries to read and write data, when in fact I suppose most of the time you will only need to select, insert, update, or delete a single row, and only occasionally look its history up.
So the best solution (IMHO) is to simply implement the schema you actually need for your main task, and implement the audit trail in a separate table maintained by a trigger.
This would be much more:
Robust and simple: because you focus on a single thing at a time (Single Responsibility and KISS principles).
Fast: audit operations can be performed in an AFTER trigger, so every time you perform an INSERT, UPDATE, or DELETE, any possible lock within the transaction is already freed, because the database engine knows that the statement's outcome won't change.
Efficient: an UPDATE will, of course, insert a new row and mark the old one as deleted. But this is done at a low level by the database engine and, more than that, your audit data will be fully unfragmented (because you only ever write there, never update), so the overall fragmentation will always be much lower.
That being said, how to implement it?
Suppose this simple schema:
create table comments (
text text,
mtime timestamp not null default now(),
id serial primary key
);
create table comments_audit ( -- Or audit.comments if using separate schema
text text,
mtime timestamp not null,
id integer,
rev integer not null,
primary key (id, rev)
);
...and then this function and trigger:
create or replace function fn_comments_audit()
returns trigger
language plpgsql
security definer
-- This allows you to restrict permissions to the auditory table
-- because the function will be executed by the user who defined
-- it instead of whom executed the statement which triggered it.
as $$
DECLARE
BEGIN
if TG_OP = 'DELETE' then
raise exception 'FATAL: Deletion is not allowed for %', TG_TABLE_NAME;
-- If you want to allow deletion there are a few more decisions to take...
-- So here I block it for the sake of simplicity ;-)
end if;
insert into comments_audit (
text
, mtime
, id
, rev
) values (
NEW.text
, NEW.mtime
, NEW.id
, coalesce (
(select max(rev) + 1 from comments_audit where id = new.ID)
, 0
)
);
return NULL;
END;
$$;
create trigger tg_comments_audit
after insert or update or delete
on public.comments
for each row
execute procedure fn_comments_audit()
;
And that's all.
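To see it in action, here is a quick hypothetical session, assuming the schema and trigger above:
insert into comments (text) values ('hi');
-- comments_audit now holds (text = 'hi', rev = 0)
update comments set text = 'hey', mtime = now() where id = 1;
-- comments_audit now also holds (text = 'hey', rev = 1)
select * from comments_audit where id = 1 order by rev;  -- full history of comment 1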
Notice that with this approach you will always have your current comments data in comments_audit as well. You could instead have used the OLD record and defined the trigger only on UPDATE (and DELETE) operations to avoid that.
But I prefer this approach, not only because it gives us extra redundancy (if an accidental deletion happened on the master table -in case it were allowed, or the trigger were accidentally disabled- we would still be able to recover all the data from the audit table), but also because it simplifies (and optimises) querying the history when it's needed.
Now you only need to insert, update, or select (or even delete, if you develop this schema a little more, e.g. by inserting a row with nulls...) in a fully transparent manner, just as if there were no audit system at all. And when you need the historical data, you only need to query the audit table instead.
NOTE: Additionally, you may want to include a creation timestamp (ctime). In that case it would be wise to prevent it from being modified in a BEFORE trigger, so I omitted it here (again, for the sake of simplicity), especially since you can already infer it from the mtime values in the audit table (though if you are going to use it in your application, it would be very advisable to add it).
If you are running Postgres 8.4 or higher, ROW_NUMBER() is the most efficient solution:
SELECT *
FROM (
    SELECT c.*, ROW_NUMBER() OVER(PARTITION BY comment_id_no ORDER BY inserted_at DESC) rn
    FROM comments c
) x WHERE rn = 1 AND x.show = 'true'
Note that the show filter belongs outside the subquery: filtering before ranking would make id=2 the latest visible row for comment 1, which is exactly what should be excluded.
Otherwise, this could also be achieved using a WHERE NOT EXISTS condition that ensures you are showing the latest comment:
SELECT c.*
FROM comments c
WHERE
c.show = 'true'
AND NOT EXISTS (
SELECT 1
FROM comments c1
WHERE c1.comment_id_no = c.comment_id_no AND c1.inserted_at > c.inserted_at
)
You have to use GROUP BY to get the latest ids, and then join back to the comments table to filter out the rows where show = false:
select c.*
from comments c inner join (
select comment_id_no, max(id) maxid
from comments
group by comment_id_no
) g on g.maxid = c.id
where c.show = 'true'
I assume that the column id is unique and auto-incrementing in the comments table.
See the demo
I have an image summary table [summary] that will serve as a reporting table in the near future. There is a reference table [views] and a third table that the image team populates [TeamImage]. The summary table has one row per part number (the part numbers are distinct) and many columns of image views (TOP, BOT, FRO, BAC, etc.). The [views] table lists each of these views with an id field, which is an IDENTITY field. The [TeamImage] table contains part numbers and views (the part number field is not unique, as a part number is listed once for each image view it has).
Example:
TABLE [summary]
Part_Number | TOP | BOT | FRO | BAC |
12345 | | | | |
67890 | | | | |
TABLE [views]
id | View |
1 | TOP |
2 | BOT |
3 | FRO |
4 | BAC |
TABLE [TeamImage]
PartNum | View |
12345 | TOP |
12345 | BOT |
12345 | FRO |
12345 | BAC |
67890 | FRO |
67890 | BAC |
Here's what I need in the end:
TABLE [summary]
Part_Number | TOP | BOT | FRO | BAC |
12345 | 1 | 1 | 1 | 1 |
67890 | | | 1 | 1 |
I could run several update queries, but I have 27 views and about 2 million part numbers. I was hoping I could run something like the query below, even though I know I cannot use a variable as a column name:
DECLARE @id int = (SELECT max(id) FROM [views]), @ViewType nvarchar(3);
WHILE @id IS NOT NULL
BEGIN
SELECT @ViewType = (SELECT [View] FROM [views] WHERE id = @id);
UPDATE a
SET a.[@ViewType] = '1'
FROM [summary] a
INNER JOIN [TeamImage] b
ON a.[Part_Number] = b.[PartNum]
WHERE b.[View] = @ViewType;
SELECT @id = max(id) FROM [views] WHERE id < @id;
END;
Basically, I was hoping to use a variable to grab the different views from the [views] table (id = 27 down to id=1...could have counted up but doesn't matter) and populate the corresponding field in the [summary] table.
I know the SET a.[@ViewType] = '1' won't work, and a colleague of mine mentioned using VB, but I didn't know whether that really was the most efficient option. I understand that I could use a PIVOT on the [TeamImage] table, but I'm not sure that will let me update my [summary] table (which has many more fields in it than just the image views). It still seems I need something that will effectively loop through update queries. I could write 4 update queries, one for each view (although my real table has 27 views), but I need something more dynamic in case we add views in the future.
To create your final summary you can use a simple manual pivot via conditional aggregation, although this is fixed to the few view codes you've shown. SQL Server does also have a PIVOT command, but I'm not directly familiar enough with it.
select
TA.PartNum,
max( case when TA.[View] = 'TOP' then '1' else ' ' end ) as TOPview,
max( case when TA.[View] = 'BOT' then '1' else ' ' end ) as BOTview,
max( case when TA.[View] = 'FRO' then '1' else ' ' end ) as FROview,
max( case when TA.[View] = 'BAC' then '1' else ' ' end ) as BACview
from
TeamImage TA
group by
TA.PartNum
Obviously this is simple to expand, but you can also look into the PIVOT syntax.
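For the "more dynamic" requirement from the question, here is a hedged sketch of the dynamic SQL route, assuming the [views], [summary], and [TeamImage] definitions above; sp_executesql is the standard SQL Server facility for this, and everything else is named as in the question:
DECLARE @id int = (SELECT MAX(id) FROM [views]);
DECLARE @ViewType nvarchar(3), @sql nvarchar(max);
WHILE @id IS NOT NULL
BEGIN
    SELECT @ViewType = [View] FROM [views] WHERE id = @id;
    -- A column name cannot be a variable, but it can be spliced into dynamic SQL;
    -- QUOTENAME guards against injection via the view name.
    SET @sql = N'UPDATE a SET a.' + QUOTENAME(@ViewType) + N' = ''1'''
             + N' FROM [summary] a'
             + N' INNER JOIN [TeamImage] b ON a.[Part_Number] = b.[PartNum]'
             + N' WHERE b.[View] = @v;';
    EXEC sp_executesql @sql, N'@v nvarchar(3)', @v = @ViewType;
    SELECT @id = MAX(id) FROM [views] WHERE id < @id;
END;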
I asked the question a little better here: SQL output as variable in VB.net and was able to receive an answer that worked for what I was looking for. I appreciate DRapp providing a solution through PIVOT, but I think the VB way will be easier for me moving forward. In short, using VB with ExecuteScalar and ExecuteNonQuery, I was able to re-write my query using the variables I had above.
I'm a bit stumped on writing a query (SQL not my strong point).
Say I have the following TABLE1:
CODE  NAME  SCOPE1  SCOPE2  SEQ
------------------------------------
A     a     Here            1
B     b     Here            2
C     c     Here            3
C     c             Room    1
A     aa            Room    2
B     bbb           Room    3
The business key is CODE + SCOPE1 + SCOPE2, where SCOPE1 and SCOPE2 are always mutually exclusive.
How can I get a distinct result of CODE and NAME, given that I need to sort by SCOPE1, SCOPE2, and SEQ?
That is, given SCOPE1 = 'Here' and SCOPE2 = 'Room', I would like to get this result:
CODE NAME
---------
A a
B b
C c
A aa
B bbb
Note: C c from Room is not wanted, as it's a duplicate of C c from Here.
I do realise the limitation of using DISTINCT with ORDER BY and the best I could come up with was the following:
select distinct CODE, NAME from
(
select CODE, NAME from MYTABLE
where (SCOPE1='Here' or SCOPE2='Room')
order by SCOPE1, SCOPE2, SEQ
);
The above produces the correct pairs but in the wrong sequence. I tried messing around with GROUP BY, but I guess I didn't know enough.
I have to stick with standard SQL (that is, no product-specific SQL constructs, unless it's Oracle, maybe), and I guess with this particular query, it's probably impossible to avoid subselects.
I would be very grateful for any pointers. Thanks in advance.
UPDATE: I've updated the data set, and based on peterm's answer, here's what I have so far: sqlfiddle. The MIN/MAX trick doesn't work well when I start tweaking the sequences.
The assumption is that I will always search for one specific SCOPE1 paired with one specific SCOPE2. But I need all SCOPE1 records to appear before SCOPE2. The idea is that I don't care whether CODE + NAME comes from SCOPE1 or SCOPE2 - I just want unique pairs that are sorted by SCOPE1, SCOPE2, and SEQ.
UPDATE: Based on your updated requirements, for Oracle:
SELECT CODE, NAME
FROM
(
SELECT CODE, NAME,
ROW_NUMBER() OVER (ORDER BY SCOPE1, SCOPE2, SEQ) rnum
FROM Table1
WHERE SCOPE1='Here'
OR SCOPE2='Room'
) q
GROUP BY CODE, NAME
ORDER BY MIN(rnum)
Here is SQLFiddle
To make it work the same way in SQL Server
SELECT CODE, NAME
FROM
(
SELECT CODE, NAME,
ROW_NUMBER() OVER (ORDER BY CASE WHEN SCOPE1 IS NULL
THEN 1 ELSE 0 END, SCOPE1,
CASE WHEN SCOPE2 IS NULL
THEN 2 ELSE 3 END, SCOPE2, SEQ) rnum
FROM Table1
WHERE SCOPE1='Here'
OR SCOPE2='Room'
) q
GROUP BY CODE, NAME
ORDER BY MIN(rnum)
Here is SQLFiddle
Output:
| CODE | NAME |
---------------
| A | a |
| B | b |
| C | c |
| A | aa |
| B | bbb |
Original answer: The only thing I could think of based on your description of requirements
SELECT CODE, NAME
FROM Table1
WHERE SCOPE1='Here'
OR SCOPE2='Room'
GROUP BY CODE, NAME
ORDER BY MIN(SCOPE1), MIN(SCOPE2), MIN(SEQ)
Here is SQLFiddle demo (MySql)
Here is SQLFiddle demo (SQL Server)
Here is SQLFiddle demo (Oracle)
Now in MySql and SQL Server NULLs go first by default therefore you'll get
| CODE | NAME |
---------------
| B | bbb |
| A | a |
| B | b |
| C | c |
In Oracle NULLs go last by default therefore you'll get
| CODE | NAME |
---------------
| A | a |
| B | b |
| C | c |
| B | bbb |
OK. So this is a hard one to explain, but I am replacing the type of a foreign key in a database. To do this I need to update the values in a table that references it. That is all fine and good, and nice and easy to do.
I'm inserting this stuff into a temporary table which will replace the original table, but the insert query isn't at all difficult, it's the select that I get the values from.
However, I also want to keep any entries where the original reference was NULL. Also not hard; I could use a LEFT JOIN for that.
But we're not done yet: I don't want the entries for which there is no match in the second table. I've been dinking around with this for 2 hours now and am no closer to figuring it out than I am to the moon.
Let me give you an example data set:
____________________________
| Inventory || Customer |
|============||============|
| ID Cust || ID Name |
|------------||------------|
| 1 A || 1 A |
| 2 B || 2 B |
| 3 E || 3 C |
| 4 NULL || 4 D |
|____________||____________|
Let's say the database used to use the Customer.Name field as its Primary Key, and I need to change it to a standard int identity(1,1) not null ID. I've added the field with no issues in the Customer table, and kept the Name because I need it for other stuff. I have had no trouble with this in all the tables that do not allow NULLs, but since the "Inventory" table allows something to be associated with No customer, I'm running into troubles.
If I did a plain left join, my results would be:
______________
| Results |
|============|
| ID Cust |
|------------|
| 1 1 |
| 2 2 |
| 3 NULL |
| 4 NULL |
|____________|
However, Inventory #3 was referencing a customer which does not exist. I want that to be filtered out.
This database is my development database, where I hack, slash, and destroy things with wanton disregard for validity. So a lot of links in these tables are no longer valid.
The next step is replicating this process in the beta-testing environment, where bad records shouldn't exist, but I can't guarantee that. So I'd like to keep the filter, if possible.
The query I have right now is using a sub-query to find all rows in Inventory whose CustID either exists in Customers, or is null. It then tries to only grab the value from those rows which the subquery found. Here's the translated query:
insert into results
(
ID,
Cust
)
select
inv.ID, cust.ID
from Inventory inv, Customer cust
where inv.ID in
(
select inv.ID from Inventory inv, Customer cust
where inv.Cust is null
or cust.Name = inv.Cust
)
and cust.Name = inv.Cust
But, as I'm sure you can see, this query isn't right. I've tried using 2, 3 subqueries, inner joins, left joins, bleh. The results of this query, and many others I've tried (that weren't horribly, horribly wrong) are:
______________
| Results |
|============|
| ID Cust |
|------------|
| 1 1 |
| 2 2 |
|____________|
Which is essentially an inner-join. Considering my actual data has around 1100 records which have NULL values in that field, I don't think truncating them is the answer.
The answer I'm looking for is:
______________
| Results |
|============|
| ID Cust |
|------------|
| 1 1 |
| 2 2 |
| 4 NULL |
|____________|
The trickiest part of this insert into select is that I'm looking to insert either a value from another table or the literal NULL. That just isn't something I know how to do; I'm still getting the hang of SQL.
Since I'm inserting the results of this query into a table, I've considered doing the insert using a select which leaves out the NULL values and un-matched records, then going back through and adding in all the NULL records, but I really want to learn how to do the more advanced queries like this.
So do any of yous folks have any ideas? 'Cause I'm lost.
How about a union?
Select all records where ID and Cust match and union that with all records where ID matches and inventory.cust is null.
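As a sketch, using the Inventory and Customer names from the example (and assuming Customer.ID is the new surrogate key), that union could look like:
select inv.ID, cust.ID as Cust
from Inventory inv
inner join Customer cust on cust.Name = inv.Cust   -- rows whose reference resolves
union all
select inv.ID, NULL
from Inventory inv
where inv.Cust is null;                            -- rows that reference no customer at all
Inventory #3 ('E') matches neither branch, so the dangling reference is filtered out.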
I've got a table events that contains events for users, for example:
PK | user | event_type | timestamp
--------------------------------
1 | ab | DTV | 1
2 | ab | DTV | 2
3 | ab | CPVR | 3
4 | cd | DTV | 1
5 | cd | DTV | 2
6 | cd | DTV | 3
What I want to do is keep only one event per user, namely the one with the latest timestamp and event_type = 'DTV'.
After applying the delete to the example above, the table should look like this:
PK | user | event_type | timestamp
--------------------------------
2 | ab | DTV | 2
6 | cd | DTV | 3
Can any one of you come up with something that accomplishes this task?
Update: I'm using SQLite. This is what I have so far:
delete from events
where id not in (
select id from (
select id, user, max(timestamp)
from events
where event_type = 'DTV'
group by user)
);
I'm pretty sure this can be improved upon. Any ideas?
I think you should be able to do something like this:
delete from events
where (user, timestamp) not in (
select user, max(timestamp)
from events
where event_type = 'DTV'
group by user
)
You could potentially do some more sophisticated tricks like table or partition replacement, depending on the database you're working with
If you are using SQL Server 2005/2008, then use the following SQL:
;WITH ce
AS (SELECT *,
Row_number()
OVER (
partition BY [user], event_type
ORDER BY timestamp DESC) AS rownumber
FROM events)
DELETE FROM ce
WHERE rownumber <> 1
OR event_type <> 'DTV'
Your solution doesn't seem reliable enough to me, because the subquery pulls a column that is neither aggregated nor added to GROUP BY. That said, I am not an experienced SQLite user, and your solution did work when I tested it. If there's any confirmation that the id column is always reliably correlated with the MAX(timestamp) value in this situation, then fine, your approach seems quite decent.
But if you are as unsure about your solution as I am, you could try the following:
DELETE FROM events
WHERE NOT EXISTS (
SELECT *
FROM (
SELECT MAX(timestamp) AS ts
FROM events e
WHERE event_type = 'DTV'
AND user = events.user
) s
WHERE ts = events.timestamp
);
The inner instance of events is assigned a different alias so that the events alias could be used to unambiguously reference the outer instance of the table (the one the DELETE command is actually being applied to). This solution does assume that timestamp is unique per user, though.
A working example can be run and played with on SQL Fiddle.
I have a query which is starting to cause some concern in my application. I'm trying to understand this EXPLAIN statement better to understand where indexes are potentially missing:
+----+-------------+-------+--------+---------------+------------+---------+-------------------------------+------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+---------------+------------+---------+-------------------------------+------+---------------------------------+
| 1 | SIMPLE | s | ref | client_id | client_id | 4 | const | 102 | Using temporary; Using filesort |
| 1 | SIMPLE | u | eq_ref | PRIMARY | PRIMARY | 4 | www_foo_com.s.user_id | 1 | |
| 1 | SIMPLE | a | ref | session_id | session_id | 4 | www_foo_com.s.session_id | 1 | Using index |
| 1 | SIMPLE | h | ref | email_id | email_id | 4 | www_foo_com.a.email_id | 10 | Using index |
| 1 | SIMPLE | ph | ref | session_id | session_id | 4 | www_foo_com.s.session_id | 1 | Using index |
| 1 | SIMPLE | em | ref | session_id | session_id | 4 | www_foo_com.s.session_id | 1 | |
| 1 | SIMPLE | pho | ref | session_id | session_id | 4 | www_foo_com.s.session_id | 1 | |
| 1 | SIMPLE | c | ALL | userfield | NULL | NULL | NULL | 1108 | |
+----+-------------+-------+--------+---------------+------------+---------+-------------------------------+------+---------------------------------+
8 rows in set (0.00 sec)
I'm trying to understand where my indexes are missing by reading this EXPLAIN statement. Is it fair to say that one can understand how to optimize this query without seeing the query at all and just look at the results of the EXPLAIN?
It appears that the ALL scan against the 'c' table is the Achilles heel. What's the best way to index this based on constant values, as recommended in MySQL's documentation?
Note, I also added an index to userfield in the cdr table and that hasn't done much good either.
Thanks.
--- edit ---
Here's the query, sorry -- don't know why I neglected to include it the first pass through.
SELECT s.`session_id` id,
DATE_FORMAT(s.`created`,'%m/%d/%Y') date,
u.`name`,
COUNT(DISTINCT c.id) calls,
COUNT(DISTINCT h.id) emails,
SEC_TO_TIME(MAX(DISTINCT c.duration)) duration,
(COUNT(DISTINCT em.email_id) + COUNT(DISTINCT pho.phone_id) > 0) status
FROM `fa_sessions` s
LEFT JOIN `fa_users` u ON s.`user_id`=u.`user_id`
LEFT JOIN `fa_email_aliases` a ON a.session_id = s.session_id
LEFT JOIN `fa_email_headers` h ON h.email_id = a.email_id
LEFT JOIN `fa_phones` ph ON ph.session_id = s.session_id
LEFT JOIN `fa_email_aliases` em ON em.session_id = s.session_id AND em.status = 1
LEFT JOIN `fa_phones` pho ON pho.session_id = s.session_id AND pho.status = 1
LEFT JOIN `cdr` c ON c.userfield = ph.phone_id
WHERE s.`partner_id`=1
GROUP BY s.`session_id`
I assume you've looked here to get more info about what it is telling you. Obviously the ALL type means it's going through all of the rows. The Using temporary and Using filesort notes are talked about on that page, so you might want to look at that.
From the page:
Using filesort
MySQL must do an extra pass to find out how to retrieve the rows in sorted order. The sort is done by going through all rows according to the join type and storing the sort key and pointer to the row for all rows that match the WHERE clause. The keys then are sorted and the rows are retrieved in sorted order. See Section 7.2.12, “ORDER BY Optimization”.
Using temporary
To resolve the query, MySQL needs to create a temporary table to hold the result. This typically happens if the query contains GROUP BY and ORDER BY clauses that list columns differently.
I agree that seeing the query might help to figure things out better.
My advice?
Break the query into two and use a temporary table in the middle (a sketch follows the reasoning below).
Reasoning
The problem appears to be that table c is being table scanned, and that this is the last table in the query. This is probably bad: if you have a table scan, you want to do it at the start of the query, so it's only done once.
I'm not a MySQL guru, but I have spent a whole lot of time optimising queries on other DBs. It looks to me like the optimiser hasn't worked out that it should start with c and work backwards.
The other thing that strikes me is that there are probably too many tables in the join. Most optimisers struggle with more than 4 tables (because the number of possible table orders is growing exponentially, so checking them all becomes impractical).
Having too many tables in a join is the root of 90% of performance problems I have seen.
Give it a go, and let us know how you get on. If it doesn't help, please post the SQL, table definitions and indexes, and I'll take another look.
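For illustration, a minimal sketch of that split in MySQL, assuming the query from the edit above (this stages the partner's phone ids first so the big cdr table is only probed for relevant values; it is one way to do it, not a definitive rewrite):
CREATE TEMPORARY TABLE tmp_partner_phones AS
SELECT ph.phone_id, s.session_id
FROM fa_sessions s
JOIN fa_phones ph ON ph.session_id = s.session_id
WHERE s.partner_id = 1;
-- Then run the reporting query joining cdr to tmp_partner_phones
-- (c.userfield = tmp_partner_phones.phone_id) instead of scanning cdr last.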
General Tips
Feel free to look at this answer I gave on general performance tips.
A great resource
MySQL Documentation for EXPLAIN
Well, looking at the query would be useful, but there's at least one thing that's obviously worth looking into: the final line shows the ALL type for that part of the query, which is generally not great to see. If the suggested possible key (userfield) makes sense as an added index to table c, it might be worth adding it and seeing whether that reduces the rows returned for that table in the search.
Query Plan
The query plan we might hope the optimiser would choose would be something like:
start with sessions where partner_id=1, possibly using an index on partner_id,
join sessions to users, using an index on user_id
join sessions to phones, where status=1, using an index on session_id and possibly status
join sessions to phones again using an index on session_id and phone_id **
join phones to cdr using an index on userfield
join sessions to email_aliases, where status=1 using an index on session_id and possibly status
join sessions to email_aliases again using an index on session_id and email_id **
join email_aliases to email_headers using an index on email_id
** by putting 2 fields in these indexes, we enable the optimiser to join to the table using session_id and immediately find the associated phone_id or email_id without having to read the underlying table. This technique saves us a read, and can save a lot of time.
Indexes I would create:
The above query plan suggests these indexes:
fa_sessions ( partner_id, session_id )
fa_users ( user_id )
fa_email_aliases ( session_id, email_id )
fa_email_headers ( email_id )
fa_email_aliases ( session_id, status )
fa_phones ( session_id, status, phone_id )
cdr ( userfield )
Notes
You will almost certainly get acceptable performance without creating all of these.
If any of the tables are small ( less than 100 rows ) then it's probably not worth creating an index.
fa_email_aliases might work with ( session_id, status, email_id ), depending on how the optimiser works.
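Expressed as DDL, the suggested indexes might look like the following sketch; the index names are illustrative, and the table definitions are assumed from the query above:
CREATE INDEX idx_sessions_partner       ON fa_sessions (partner_id, session_id);
CREATE INDEX idx_users_user             ON fa_users (user_id);  -- often already covered by the primary key
CREATE INDEX idx_aliases_session_email  ON fa_email_aliases (session_id, email_id);
CREATE INDEX idx_headers_email          ON fa_email_headers (email_id);
CREATE INDEX idx_aliases_session_status ON fa_email_aliases (session_id, status);
CREATE INDEX idx_phones_session_status  ON fa_phones (session_id, status, phone_id);
CREATE INDEX idx_cdr_userfield          ON cdr (userfield);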