help optimizing query (shows strength of two-way relationships between contacts) - sql

i have a contact_relationship table that stores the reported strength of a relationship between one contact and another at a given point in time.
mysql> desc contact_relationship;
+------------------+-----------+------+-----+-------------------+-----------------------------+
| Field | Type | Null | Key | Default | Extra |
+------------------+-----------+------+-----+-------------------+-----------------------------+
| relationship_id | int(11) | YES | | NULL | |
| contact_id | int(11) | YES | MUL | NULL | |
| other_contact_id | int(11) | YES | | NULL | |
| strength | int(11) | YES | | NULL | |
| recorded | timestamp | NO | | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
+------------------+-----------+------+-----+-------------------+-----------------------------+
now i want to get a list of two-way relationships between contacts (meaning there are two rows, one with contact a specifying a relationship strength with contact b and another with contact b specifying a strength for contact a -- the strength of the two-way relationship is the smaller of those two strength values).
this is the query i've come up with but it is pretty slow:
select
mrcr1.contact_id,
mrcr1.other_contact_id,
case when (mrcr1.strength < mrcr2.strength) then
mrcr1.strength
else
mrcr2.strength
end strength
from (
select
cr1.*
from (
select
contact_id,
other_contact_id,
max(recorded) as max_recorded
from
contact_relationship
group by
contact_id,
other_contact_id
) as cr2
inner join contact_relationship cr1 on
cr1.contact_id = cr2.contact_id
and cr1.other_contact_id = cr2.other_contact_id
and cr1.recorded = cr2.max_recorded
) as mrcr1,
(
select
cr3.*
from (
select
contact_id,
other_contact_id,
max(recorded) as max_recorded
from
contact_relationship
group by
contact_id,
other_contact_id
) as cr4
inner join contact_relationship cr3 on
cr3.contact_id = cr4.contact_id
and cr3.other_contact_id = cr4.other_contact_id
and cr3.recorded = cr4.max_recorded
) as mrcr2
where
mrcr1.contact_id = mrcr2.other_contact_id
and mrcr1.other_contact_id = mrcr2.contact_id
and mrcr1.contact_id != mrcr1.other_contact_id
and mrcr2.contact_id != mrcr2.other_contact_id
and mrcr1.contact_id <= mrcr1.other_contact_id;
anyone have any recommendations of how to speed it up?
note that because a user may specify the strength of his relationship with a particular user more than once, you must only grab the most recent record for each pair of contacts.
update: here is the result of explaining the query...
+----+-------------+----------------------+-------+----------------------------------------------------------------------------------------+------------------------------+---------+-------------------------------------+-------+--------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------------------+-------+----------------------------------------------------------------------------------------+------------------------------+---------+-------------------------------------+-------+--------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 36029 | Using where |
| 1 | PRIMARY | <derived4> | ALL | NULL | NULL | NULL | NULL | 36029 | Using where; Using join buffer |
| 4 | DERIVED | <derived5> | ALL | NULL | NULL | NULL | NULL | 36021 | |
| 4 | DERIVED | cr3 | ref | contact_relationship_index_1,contact_relationship_index_2,contact_relationship_index_3 | contact_relationship_index_2 | 10 | cr4.contact_id,cr4.other_contact_id | 1 | Using where |
| 5 | DERIVED | contact_relationship | index | NULL | contact_relationship_index_3 | 14 | NULL | 37973 | Using index |
| 2 | DERIVED | <derived3> | ALL | NULL | NULL | NULL | NULL | 36021 | |
| 2 | DERIVED | cr1 | ref | contact_relationship_index_1,contact_relationship_index_2,contact_relationship_index_3 | contact_relationship_index_2 | 10 | cr2.contact_id,cr2.other_contact_id | 1 | Using where |
| 3 | DERIVED | contact_relationship | index | NULL | contact_relationship_index_3 | 14 | NULL | 37973 | Using index |
+----+-------------+----------------------+-------+----------------------------------------------------------------------------------------+------------------------------+---------+-------------------------------------+-------+--------------------------------+

You are losing a lot lot lot of time selecting the most recent record. 2 options :
1- Change the way you are stocking data, and have a table with only recent record, and an other table more like historical record.
2- Use analytic request to select the most recent record, if your DBMS allows you to do this. Something like
Select first_value(strength) over(partition by contact_id, other_contact_id order by recorded desc)
from contact_relationship
Once you have the good record line, I think your query will go a lot faster.

Scorpi0's answer got me to thinking maybe I could use a temp table...
create temporary table mrcr1 (
contact_id int,
other_contact_id int,
strength int,
index mrcr1_index_1 (
contact_id,
other_contact_id
)
) replace as
select
cr1.contact_id,
cr1.other_contact_id,
cr1.strength from (
select
contact_id,
other_contact_id,
max(recorded) as max_recorded
from
contact_relationship
group by
contact_id, other_contact_id
) as cr2
inner join
contact_relationship cr1 on
cr1.contact_id = cr2.contact_id
and cr1.other_contact_id = cr2.other_contact_id
and cr1.recorded = cr2.max_recorded;
which i had to do twice (second time into a temp table named mrcr2) because mysql has a limitation where you can't alias the same temp table twice in one query.
with my two temp tables created my query then becomes:
select
mrcr1.contact_id,
mrcr1.other_contact_id,
case when (mrcr1.strength < mrcr2.strength) then
mrcr1.strength
else
mrcr2.strength
end strength
from
mrcr1,
mrcr2
where
mrcr1.contact_id = mrcr2.other_contact_id
and mrcr1.other_contact_id = mrcr2.contact_id
and mrcr1.contact_id != mrcr1.other_contact_id
and mrcr2.contact_id != mrcr2.other_contact_id
and mrcr1.contact_id <= mrcr1.other_contact_id;

Related

SQL select columns ordered by a query

I have tried and googled everything I can think of but I couldn't find an answer I have this simple database
technicien
+------------------+------------+------+-----+
| Field | Type | Null | Key |
+------------------+------------+------+-----+
| employe id | number(4) | NO | PRI |
| name | char(11) | NO | |
| salary | int(11) | NO | |
+------------------+------------+------+-----+
maintenance
+------------------+------------+------+---------+
| Field | Type | Null | Key |
+------------------+------------+------+---------+
| employe id | number(4) | NO | foreign |
| IP | char(11) | NO | foreign |
| maintenance_date | int(11) | NO | |
+------------------+------------+------+---------+
pc
+------------------+------------+------+---------+
| Field | Type | Null | Key |
+------------------+------------+------+---------+
| value | number(4) | NO | foreign |
| IP | char(11) | NO | PRI |
| price | int(11) | NO | |
+------------------+------------+------+---------+
what I need is to show the name, id, and salary of every technicien who has done a maintenance ordered by the total number of maintenances effected.
SELECT technicien.name, technicien.employeId, technicien.salary
FROM technicien
INNER JOIN maintenance
ON maintenance.employeId = technicien.employeId
INNER JOIN pc
ON maintenance.IP = pc.IP
ORDER BY COUNT(maintenance.maintenance_date)
What you need is to GROUPing with respect to the columns of table technicien, and ORDERing by count of maintenance tasks :
select t.*, count(0) cnt_maintenance
from technicien t
inner join maintenance m on ( t.employee_id = m.employee_id )
inner join pc p on ( p.value = m.IP )
group by t.employee_id, t.name, t.salary
order by count(0);
SQL Fiddle Demo

Sub-query producing an empty set

I am trying to provide a list of active quarterbacks who play for teams who had 20 or more sacks during the season.
I have information in two tables, one of them is a view(v_active_quarterbacks) which shows which quarterbacks are active, the other is the table team_game_stats.
I created the command that lists the teams that have 20+ sacks.
SELECT SUM(sacks)
FROM team_game_stats
GROUP BY team_code
HAVING SUM(sacks) > 20;
I now need to connect this to v_active_quarterbacks so that I can get a list. I have tried the following but it just provides an empty set.
SELECT player_code
FROM v_active_quaterbacks
INNER JOIN team_game_stats ON v_active_quaterbacks.team_code = team_game_stats.team_code
WHERE sacks IN (SELECT SUM(sacks)
FROM team_game_stats
GROUP BY team_code
HAVING SUM(sacks) > 20);
Here is the view description:
+-----------------------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-----------------------+-------------+------+-----+---------+-------+
| player_code | int(11) | NO | | NULL | |
| first_name | varchar(30) | YES | | NULL | |
| last_name | varchar(30) | YES | | NULL | |
| team_code | int(11) | YES | | NULL | |
| uniform_number | varchar(3) | YES | | NULL | |
| passes_player_code | int(11) | YES | | NULL | |
| COUNT(passes.attempt) | bigint(21) | NO | | 0 | |
+-----------------------+-------------+------+-----+---------+-------+
At this point I am confused and stuck. Any help would be appreciated.
You could use a an inner join on subselect for team_code and sum
SELECT player_code , T.sum_sacks
FROM v_active_quaterbacks
INNER JOIN team_game_stats ON v_active_quaterbacks.team_code = team_game_stats.team_code
INNER JOIN (
SELECT team_code, SUM(sacks) sum_sacks
FROM team_game_stats
GROUP BY team_code
HAVING SUM(sacks) > 20
) T on T.team_code = v_active_quaterbacks.team_code

MySQL Join Sub-Select Optimization

UPDATE users u
JOIN (select count(*) as job_count, user_id from job_responses where date_created > subdate(now(), 30) group by user_id) j
ON j.user_id = u.user_id
JOIN users_profile p
ON p.user_id = u.user_id
JOIN users_roles_xref x
ON x.user_id = u.user_id
SET num_job_responses = least(j.job_count, 5)
WHERE u.status = 1 AND p.visible = "Y" AND x.role_id = 2000
And explain tells me this:
+----+-------------+---------------+--------+---------------------------------+---------------+---------+----------------------+--------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------+--------+---------------------------------+---------------+---------+----------------------+--------+----------------------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 23008 | |
| 1 | PRIMARY | u | eq_ref | PRIMARY,user_id,status,status_2 | PRIMARY | 4 | j.user_id | 1 | Using where |
| 1 | PRIMARY | p | ref | user_id,visible | user_id | 4 | scoop_jazz.u.user_id | 2 | Using where |
| 1 | PRIMARY | x | ref | index_role_id,index_user_id | index_user_id | 4 | scoop_jazz.u.user_id | 3 | Using where |
| 2 | DERIVED | job_responses | range | date_created | date_created | 4 | NULL | 135417 | Using where; Using temporary; Using filesort |
+----+-------------+---------------+--------+---------------------------------+---------------+---------+----------------------+--------+----------------------------------------------+
I'm having trouble optimizing this query with explain. Any way to do it?
You will want to add an index on job_responses(date_created, user_id).
Then you can drop the current single-column index on date_created.
The most expensive part of the query is the subquery
(select count(*) as job_count, user_id
from job_responses
where date_created > subdate(now(), 30)
group by user_id)
The only two fields of note are user_id and date_created. There is an index on date_created that has been chosen to satisfy date_created in last 30 days. However, it will have to go back to the data pages to retrieve user_id, then group by it.
If you had a composite index, the user_id is available directly from the index. It also covers the single-column index date_created, so you can drop that one.
It ended up being easier and way faster to generate a temporary table, populate it, and then use a join on that table. I was "chunking" the original query, which ends up being very expensive when it has to create and destroy tables created by sub-selects.

SQL Performance: Using Union and Subqueries

Hi stackoverflow(My first question!),
We're doing something like an SNS, and got a question about optimizing queries.
Using mysql 5.1, the current table was created with:
CREATE TABLE friends(
user_id BIGINT NOT NULL,
friend_id BIGINT NOT NULL,
PRIMARY KEY (user_id, friend_id)
) ENGINE INNODB;
Sample data is populated like:
INSERT INTO friends VALUES
(1,2),
(1,3),
(1,4),
(1,5),
(2,1),
(2,3),
(2,4),
(3,1),
(3,2),
(4,1),
(4,2),
(5,1),
(5,6),
(6,5),
(7,8),
(8,7);
The business logic: we need to figure out which users are friends or friends of friends for a given user.
The current query for this for a user with user_id=1 is:
SELECT friend_id FROM friends WHERE user_id = 1
UNION
SELECT DISTINCT friend_id FROM friends WHERE user_id IN (
SELECT friend_id FROM friends WHERE user_id = 1
);
The expected result is(order doesn't matter):
2
3
4
5
1
6
As you can see, the above query performs the subquery "SELECT friend_id FROM friends WHERE user_id = 1" twice.
So, here is the question. If performance is your primary concern, how would you change the above query or schema?
Thanks in advance.
In this particular case, you can use a JOIN:
SELECT DISTINCT f2.friend_id
FROM friends AS f1
JOIN friends AS f2 ON f1.friend_id=f2.user_id OR f2.user_id=1
WHERE f1.user_id=1;
Examining each query suggests the JOIN will about as performant as the UNION in a big-O sense, though perhaps faster by a constant factor. Jasie's query looks like it might be big-O faster.
EXPLAIN SELECT friend_id FROM friends WHERE user_id = 1
UNION
SELECT DISTINCT friend_id FROM friends WHERE user_id IN (
SELECT friend_id FROM friends WHERE user_id = 1
);
+----+--------------------+------------+--------+---------------+---------+---------+------------+------+-------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+------------+--------+---------------+---------+---------+------------+------+-------------------------------------------+
| 1 | PRIMARY | friends | ref | PRIMARY | PRIMARY | 8 | const | 4 | Using index |
| 2 | UNION | friends | index | NULL | PRIMARY | 16 | NULL | 16 | Using where; Using index; Using temporary |
| 3 | DEPENDENT SUBQUERY | friends | eq_ref | PRIMARY | PRIMARY | 16 | const,func | 1 | Using index |
| NULL | UNION RESULT | <union1,2> | ALL | NULL | NULL | NULL | NULL | NULL | |
+----+--------------------+------------+--------+---------------+---------+---------+------------+------+-------------------------------------------+
EXPLAIN SELECT DISTINCT f2.friend_id
FROM friends AS f1
JOIN friends AS f2
ON f1.friend_id=f2.user_id OR f2.user_id=1
WHERE f1.user_id=1;
+----+-------------+-------+-------+---------------+---------+---------+-------+------+---------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+---------+---------+-------+------+---------------------------------------------+
| 1 | SIMPLE | f1 | ref | PRIMARY | PRIMARY | 8 | const | 4 | Using index; Using temporary |
| 1 | SIMPLE | f2 | index | PRIMARY | PRIMARY | 16 | NULL | 16 | Using where; Using index; Using join buffer |
+----+-------------+-------+-------+---------------+---------+---------+-------+------+---------------------------------------------+
EXPLAIN SELECT DISTINCT friend_id FROM friends WHERE user_id IN (
SELECT friend_id FROM friends WHERE user_id = 1
) OR user_id = 1;
+----+--------------------+---------+--------+---------------+---------+---------+------------+------+-------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+---------+--------+---------------+---------+---------+------------+------+-------------------------------------------+
| 1 | PRIMARY | friends | index | PRIMARY | PRIMARY | 16 | NULL | 16 | Using where; Using index; Using temporary |
| 2 | DEPENDENT SUBQUERY | friends | eq_ref | PRIMARY | PRIMARY | 16 | const,func | 1 | Using index |
+----+--------------------+---------+--------+---------------+---------+---------+------------+------+-------------------------------------------+
No need for the UNION. Just include an OR with the user_id of the beginning user:
SELECT DISTINCT friend_id FROM friends WHERE user_id IN (
SELECT friend_id FROM friends WHERE user_id = 1
) OR user_id = 1;
+-----------+
| friend_id |
+-----------+
| 2 |
| 3 |
| 4 |
| 5 |
| 1 |
| 6 |
+-----------+

4 table query / join. getting duplicate rows

So I have written a query that will grab an order (this is for an ecommerce type site), and from that order id it will get all order items (ecom_order_items), print options (c_print_options) and images (images). The eoi_p_id is currently a foreign key from the images table.
This works fine and the query is:
SELECT
eoi_parentid, eoi_p_id, eoi_po_id, eoi_quantity,
i_id, i_parentid,
po_name, po_price
FROM ecom_order_items, images, c_print_options WHERE eoi_parentid = '1' AND i_id = eoi_p_id AND po_id = eoi_po_id;
The above would grab all the stuff I need for order #1
Now to complicate things I added an extra table (ecom_products), which needs to act in a similar way to the images table. The eoi_p_id can also point at a foreign key in this table too. I have added an extra field 'eoi_type' which will either have the value 'image', or 'product'.
Now items in the order could be made up of a mix of items from images or ecom_products. Whatever I try it either ends up with too many records, wont actually output any with eoi_type = 'product', and just generally wont work. Any ideas on how to achieve what I am after? Can provide SQL samples if needed?
SELECT
eoi_id, eoi_parentid, eoi_p_id, eoi_po_id, eoi_po_id_2, eoi_quantity, eoi_type,
i_id, i_parentid,
po_name, po_price, po_id,
ep_id
FROM ecom_order_items, images, c_print_options, ecom_products WHERE eoi_parentid = '9' AND i_id = eoi_p_id AND po_id = eoi_po_id
The above outputs duplicate rows and doesnt work as expected. Am I going about this the wrong way? Should I have seperate foreign key fields for the eoi_p_id depending it its an image or a product?
Should I be using JOINs?
Here is a mysql explain of the tables in question
ecom_products
+-------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+--------------+------+-----+---------+----------------+
| ep_id | int(8) | NO | PRI | NULL | auto_increment |
| ep_title | varchar(255) | NO | | NULL | |
| ep_link | text | NO | | NULL | |
| ep_desc | text | NO | | NULL | |
| ep_imgdrop | text | NO | | NULL | |
| ep_price | decimal(6,2) | NO | | NULL | |
| ep_category | varchar(255) | NO | | NULL | |
| ep_hide | tinyint(1) | NO | | 0 | |
| ep_featured | tinyint(1) | NO | | 0 | |
+-------------+--------------+------+-----+---------+----------------+
ecom_order_items
+--------------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------+-------------+------+-----+---------+----------------+
| eoi_id | int(8) | NO | PRI | NULL | auto_increment |
| eoi_parentid | int(8) | NO | | NULL | |
| eoi_type | varchar(32) | NO | | NULL | |
| eoi_p_id | int(8) | NO | | NULL | |
| eoi_po_id | int(8) | NO | | NULL | |
| eoi_quantity | int(4) | NO | | NULL | |
+--------------+-------------+------+-----+---------+----------------+
c_print_options
+------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------+--------------+------+-----+---------+----------------+
| po_id | int(8) | NO | PRI | NULL | auto_increment |
| po_name | varchar(255) | NO | | NULL | |
| po_price | decimal(6,2) | NO | | NULL | |
+------------+--------------+------+-----+---------+----------------+
images
+--------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------+--------------+------+-----+---------+----------------+
| i_id | int(8) | NO | PRI | NULL | auto_increment |
| i_filename | varchar(255) | NO | | NULL | |
| i_data | longtext | NO | | NULL | |
| i_parentid | int(8) | NO | | NULL | |
+--------------+--------------+------+-----+---------+----------------+
You're missing a join condition for ecom_products either in the WHERE or FROM Clause. This is how it would be done using ANSI-92 joins
SELECT
eoi_id,
eoi_parentid,
eoi_p_id,
eoi_po_id,
eoi_po_id_2,
eoi_quantity,
eoi_type,
i_id,
i_parentid,
po_name,
po_price,
po_id,
ep_id
FROM
ecom_order_items,
LEFT JOIN images
ON i_id = eoi_p_id
LEFT JOIN c_print_options
ON po_id = eoi_po_id
INNER JOIN ecom_products
ON eoi_p_id = ep_id
WHERE
eoi_parentid = '9'
ANSI 92 joins are preferred and its a little clearer what's a join and what's filtering. That said you could just add AND eoi_p_id = ep_id to you where clause.
This is how I'd write the first query. I prefer to use joins.
SELECT eoi_parentid, eoi_p_id, eoi_po_id, eoi_quantity, i_id, i_parentid, po_name, po_price
FROM ecom_order_items
INNER JOIN images
ON i_id = eoi_p_id
INNER JOIN c_print_options
ON po_id = eoi_po_id
WHERE eoi_parentid = '1'
For your second query I would use a UNION on two queries, one for images and one for products.
SELECT eoi_id, eoi_parentid, eoi_p_id, eoi_po_id, eoi_po_id_2, eoi_quantity, eoi_type, i_id, i_parentid, po_name, po_price, po_id, ep_id
FROM ecom_order_items
INNER JOIN images
ON i_id = eoi_p_id
INNER JOIN c_print_options
ON po_id = eoi_po_id
WHERE eoi_type = 'image' AND i_id = eoi_p_id --Image conditions
AND eoi_parentid = '9'
AND po_id = eoi_po_id
UNION
SELECT eoi_id, eoi_parentid, eoi_p_id, eoi_po_id, eoi_po_id_2, eoi_quantity, eoi_type, i_id, i_parentid, po_name, po_price, po_id, ep_id
FROM ecom_order_items
INNER JOIN images
ON i_id = eoi_p_id
INNER JOIN c_print_options
ON po_id = eoi_po_id
WHERE eoi_type = 'product' AND ep_id = eoi_p_id -- Product conditions
AND eoi_parentid = '9'
AND po_id = eoi_po_id