I have the following data structure and data:
CREATE TABLE `parent` (
`id` int(11) NOT NULL auto_increment,
`name` varchar(10) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
INSERT INTO `parent` VALUES(1, 'parent 1');
INSERT INTO `parent` VALUES(2, 'parent 2');
CREATE TABLE `other` (
`id` int(11) NOT NULL auto_increment,
`name` varchar(10) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
INSERT INTO `other` VALUES(1, 'other 1');
INSERT INTO `other` VALUES(2, 'other 2');
CREATE TABLE `relationship` (
`id` int(11) NOT NULL auto_increment,
`parent_id` int(11) NOT NULL,
`other_id` int(11) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
INSERT INTO `relationship` VALUES(1, 1, 1);
INSERT INTO `relationship` VALUES(2, 1, 2);
INSERT INTO `relationship` VALUES(3, 2, 1);
I want to find the parent records that are related to both other 1 and other 2.
This is what I've figured out, but I'm wondering if there is a better way:
SELECT p.id, p.name
FROM parent AS p
LEFT JOIN relationship AS r1 ON (r1.parent_id = p.id)
LEFT JOIN relationship AS r2 ON (r2.parent_id = p.id)
WHERE r1.other_id = 1 AND r2.other_id = 2;
The result is 1, "parent 1", which is correct. The problem is that once the list grows to 5+ joins, the query gets messy, and as the relationship table grows, it gets slow.
Is there a better way?
I'm using MySQL and PHP, but this is probably pretty generic.
Ok, I tested this. The queries from best to worst were:
Query 1: Joins (0.016s; basically instant)
SELECT p.id, name
FROM parent p
JOIN relationship r1 ON p.id = r1.parent_id AND r1.other_id = 100
JOIN relationship r2 ON p.id = r2.parent_id AND r2.other_id = 101
JOIN relationship r3 ON p.id = r3.parent_id AND r3.other_id = 102
JOIN relationship r4 ON p.id = r4.parent_id AND r4.other_id = 103
Query 2: EXISTS (0.625s)
SELECT id, name
FROM parent p
WHERE EXISTS (SELECT 1 FROM relationship WHERE parent_id = p.id AND other_id = 100)
AND EXISTS (SELECT 1 FROM relationship WHERE parent_id = p.id AND other_id = 101)
AND EXISTS (SELECT 1 FROM relationship WHERE parent_id = p.id AND other_id = 102)
AND EXISTS (SELECT 1 FROM relationship WHERE parent_id = p.id AND other_id = 103)
Query 3: Aggregate (1.016s)
SELECT p.id, p.name
FROM parent p
WHERE (SELECT COUNT(*) FROM relationship WHERE parent_id = p.id AND other_id IN (100,101,102,103)) = 4
Query 4: UNION Aggregate (2.39s)
SELECT id, name FROM (
SELECT p1.id, p1.name
FROM parent AS p1 LEFT JOIN relationship as r1 ON(r1.parent_id=p1.id)
WHERE r1.other_id = 100
UNION ALL
SELECT p2.id, p2.name
FROM parent AS p2 LEFT JOIN relationship as r2 ON(r2.parent_id=p2.id)
WHERE r2.other_id = 101
UNION ALL
SELECT p3.id, p3.name
FROM parent AS p3 LEFT JOIN relationship as r3 ON(r3.parent_id=p3.id)
WHERE r3.other_id = 102
UNION ALL
SELECT p4.id, p4.name
FROM parent AS p4 LEFT JOIN relationship as r4 ON(r4.parent_id=p4.id)
WHERE r4.other_id = 103
) a
GROUP BY id, name
HAVING count(*) = 4
Actually, the above was producing the wrong data, so either the query is wrong or I did something wrong with it. Whatever the case, it is just a bad idea.
If that's not fast, then you need to look at the EXPLAIN plan for the query. You're probably just lacking appropriate indexes. Try it with:
CREATE INDEX idx1 ON relationship (parent_id, other_id);
Before you go down the route of aggregation (SELECT COUNT(*) FROM ...) you should read SQL Statement - “Join” Vs “Group By and Having”.
Note: The above timings are based on:
CREATE TABLE parent (
id INT PRIMARY KEY,
name VARCHAR(50)
);
CREATE TABLE other (
id INT PRIMARY KEY,
name VARCHAR(50)
);
CREATE TABLE relationship (
id INT PRIMARY KEY,
parent_id INT,
other_id INT
);
CREATE INDEX idx1 ON relationship (parent_id, other_id);
CREATE INDEX idx2 ON relationship (other_id, parent_id);
and nearly 800,000 records created with:
<?php
ini_set('max_execution_time', 600);
$start = microtime(true);
echo "<pre>\n";
// The old mysql_* extension was removed in PHP 7, so mysqli is used instead.
$db = mysqli_connect('localhost', 'scratch', 'scratch', 'scratch');
if (!$db) {
    die("Connect error: " . mysqli_connect_error() . "\n");
}
define('PARENTS', 100000);
define('CHILDREN', 100000);
define('MAX_CHILDREN', 10);
define('SCATTER', 10);
$rel = 0;
for ($i = 1; $i <= PARENTS; $i++) {
    query("INSERT INTO parent VALUES ($i, 'Parent $i')");
    // Candidate "other" ids close to $i, so relationships cluster.
    $potential = range(max(1, $i - SCATTER), min(CHILDREN, $i + SCATTER));
    $elements = sizeof($potential);
    $other = rand(1, min(MAX_CHILDREN, $elements - 4));
    $j = 0;
    while ($j < $other) {
        // Pick a random candidate that has not been used for this parent yet.
        $index = rand(0, $elements - 1);
        if (isset($potential[$index])) {
            $c = $potential[$index];
            $rel++;
            query("INSERT INTO relationship VALUES ($rel, $i, $c)");
            unset($potential[$index]);
            $j++;
        }
    }
}
for ($i = 1; $i <= CHILDREN; $i++) {
    query("INSERT INTO other VALUES ($i, 'Other $i')");
}
$count = PARENTS + CHILDREN + $rel;
$stop = microtime(true);
$duration = $stop - $start;
$insert = $duration / $count;
echo "$count records added.\n";
echo "Program ran for $duration seconds.\n";
echo "Insert time $insert seconds.\n";
echo "</pre>\n";
// Run one statement on the global connection and report any error.
function query($str) {
    global $db;
    if (!mysqli_query($db, $str)) {
        echo "$str: " . mysqli_error($db) . "\n";
    }
}
?>
So once again joins carry the day.
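The question's complaint was that writing five or more joins by hand gets messy. Because the winning query is just one self-join of relationship per required other_id, it can be generated. Below is a minimal sketch using Python's built-in sqlite3 module instead of MySQL/PHP, with the question's sample data; the helper name `parents_with_all` is hypothetical. The ids are passed as bound parameters, and only the join aliases are built into the SQL text.

```python
import sqlite3


def parents_with_all(conn, other_ids):
    """Return (id, name) of every parent related to all ids in other_ids."""
    if not other_ids:
        raise ValueError("other_ids must not be empty")
    # One inner self-join of relationship per required other_id (Query 1 style).
    joins = "\n".join(
        f"JOIN relationship r{i} ON r{i}.parent_id = p.id AND r{i}.other_id = ?"
        for i in range(len(other_ids))
    )
    sql = f"SELECT p.id, p.name FROM parent p\n{joins}\nORDER BY p.id"
    return conn.execute(sql, list(other_ids)).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parent (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE relationship (
        id INTEGER PRIMARY KEY,
        parent_id INTEGER NOT NULL,
        other_id INTEGER NOT NULL
    );
    CREATE INDEX idx1 ON relationship (parent_id, other_id);
    INSERT INTO parent VALUES (1, 'parent 1'), (2, 'parent 2');
    INSERT INTO relationship VALUES (1, 1, 1), (2, 1, 2), (3, 2, 1);
""")

print(parents_with_all(conn, [1, 2]))  # [(1, 'parent 1')]
print(parents_with_all(conn, [1]))     # [(1, 'parent 1'), (2, 'parent 2')]
```

If relationship can contain duplicate (parent_id, other_id) rows, a parent would be returned more than once, which is why the INNER JOIN answer further down uses SELECT DISTINCT.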
Given that the relationship table has a unique key on (parent_id, other_id), you can do this:
select p.id, p.name
from parent as p
where (select count(*)
from relationship as r
where r.parent_id = p.id
and r.other_id in (1,2)
) >= 2
Simplifying a bit, this should work, and efficiently.
SELECT DISTINCT p.id, p.name
FROM parent p
INNER JOIN relationship r1 ON p.id = r1.parent_id AND r1.other_id = 1
INNER JOIN relationship r2 ON p.id = r2.parent_id AND r2.other_id = 2
This requires at least one joined record for each "other" value. The optimizer should know it only has to find one match for each, and it only needs to read the relationship index (assuming one exists on (parent_id, other_id)), not the subsidiary tables themselves; the other table isn't even referenced at all.
I haven't actually tested it, but something along the lines of:
SELECT id, name FROM (
SELECT p1.id, p1.name
FROM parent AS p1 LEFT JOIN relationship as r1 ON(r1.parent_id=p1.id)
WHERE r1.other_id = 1
UNION ALL
SELECT p2.id, p2.name
FROM parent AS p2 LEFT JOIN relationship as r2 ON(r2.parent_id=p2.id)
WHERE r2.other_id = 2
-- etc
) AS a GROUP BY id, name
HAVING count(*) = 2
The idea is you don't have to do multi-way joins; just concatenate the results of regular joins, group by your ids, and pick the rows that showed up in every segment.
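The answer above says it is untested. Here is a minimal sketch of the same UNION ALL plus GROUP BY ... HAVING idea, run against SQLite through Python's sqlite3 module with the question's data. SQLite, like MySQL, accepts the derived table with an alias (`AS a` here). The function name is hypothetical. The count only works if each (parent_id, other_id) pair is stored once, so the sketch declares that pair UNIQUE and removes repeated ids from the search list.

```python
import sqlite3


def parents_in_every_segment(conn, other_ids):
    """Parents that appear in the result of every per-id SELECT."""
    ids = list(dict.fromkeys(other_ids))  # drop repeated ids so the count stays right
    if not ids:
        raise ValueError("other_ids must not be empty")
    segment = ("SELECT p.id, p.name FROM parent AS p "
               "JOIN relationship AS r ON r.parent_id = p.id "
               "WHERE r.other_id = ?")
    # Stack one segment per id; a parent related to every id shows up len(ids) times.
    stacked = "\nUNION ALL\n".join([segment] * len(ids))
    sql = (f"SELECT id, name FROM ({stacked}) AS a "
           "GROUP BY id, name HAVING COUNT(*) = ? ORDER BY id")
    return conn.execute(sql, [*ids, len(ids)]).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parent (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- A duplicated (parent_id, other_id) row would be counted twice and
    -- break the HAVING test, hence the UNIQUE constraint.
    CREATE TABLE relationship (
        id INTEGER PRIMARY KEY,
        parent_id INTEGER NOT NULL,
        other_id INTEGER NOT NULL,
        UNIQUE (parent_id, other_id)
    );
    INSERT INTO parent VALUES (1, 'parent 1'), (2, 'parent 2');
    INSERT INTO relationship VALUES (1, 1, 1), (2, 1, 2), (3, 2, 1);
""")

print(parents_in_every_segment(conn, [1, 2]))  # [(1, 'parent 1')]
```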
This is a common problem when searching multiple associations via a many-to-many join. It often comes up in services that use the 'tag' concept, e.g. Stack Overflow.
See my other post on a better architecture for tag (in your case 'other') storage.
Searching is a two-step process:
Find all possible candidate TagCollections that have any/all of the tags you require (this may be easier using a cursor or loop construct).
Select the data that matches those TagCollections.
Performance is typically faster because there are significantly fewer TagCollections than data items to search.
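The TagCollection post is not included here, so the schema below is an assumption: each distinct set of tags is stored once in a hypothetical tag_collection_tag table, and each item points at one collection through collection_id. This is a minimal Python/sqlite3 sketch of the two-step search, using a GROUP BY ... HAVING query for step 1 instead of a cursor; all table, column, and function names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Each distinct set of tags is stored once as a "tag collection".
    CREATE TABLE tag_collection_tag (
        collection_id INTEGER,
        tag TEXT,
        PRIMARY KEY (collection_id, tag)
    );
    -- Items point at the single collection describing their tag set.
    CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT, collection_id INTEGER);
    INSERT INTO tag_collection_tag VALUES
        (1, 'sql'), (1, 'mysql'),
        (2, 'sql'), (2, 'sqlite'),
        (3, 'sql'), (3, 'mysql'), (3, 'php');
    INSERT INTO item VALUES
        (10, 'q10', 1), (11, 'q11', 2), (12, 'q12', 3), (13, 'q13', 1);
""")


def items_with_all_tags(conn, tags):
    tags = sorted(set(tags))
    marks = ",".join("?" * len(tags))  # "?,?" for two tags
    # Step 1: candidate collections containing every requested tag
    # (this scans the small collection table, not every item).
    collections = [row[0] for row in conn.execute(
        f"SELECT collection_id FROM tag_collection_tag WHERE tag IN ({marks}) "
        "GROUP BY collection_id HAVING COUNT(*) = ?", [*tags, len(tags)])]
    if not collections:
        return []
    # Step 2: items whose collection is one of those candidates.
    marks = ",".join("?" * len(collections))
    return [row[0] for row in conn.execute(
        f"SELECT name FROM item WHERE collection_id IN ({marks}) ORDER BY id",
        collections)]


print(items_with_all_tags(conn, ["sql", "mysql"]))  # ['q10', 'q12', 'q13']
```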
You can do it with a nested select. I tested it in MSSQL 2005, but as you said, it should be pretty generic:
SELECT * FROM parent p
WHERE p.id in(
SELECT r.parent_Id
FROM relationship r
WHERE r.other_id in(1,2)
GROUP BY r.parent_id
HAVING COUNT(r.parent_Id)=2
)
The number 2 in COUNT(r.parent_Id)=2 must equal the number of other_id values in the IN list (i.e. the number of joins you would otherwise need).
If you can put your list of other_id values into a table, that would be ideal. The code below looks for parents with AT LEAST the ids given. If you want it to have EXACTLY the same ids (i.e. no extras), you would have to change the query slightly.
SELECT
P.id,
P.name
FROM
My_Other_IDs MOI
INNER JOIN relationship R ON
R.other_id = MOI.other_id
INNER JOIN parent P ON
P.id = R.parent_id
GROUP BY
P.id,
P.name
HAVING
COUNT(*) = (SELECT COUNT(*) FROM My_Other_IDs)
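A minimal sketch of both variants in SQLite through Python's sqlite3 module. The AT LEAST query is the one above with lower-case names; the EXACTLY query is one possible way to "change the query slightly", not the answerer's own: it starts from every relationship of a parent and additionally requires that none of them fall outside the list. my_other_ids and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parent (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE relationship (
        parent_id INTEGER,
        other_id INTEGER,
        PRIMARY KEY (parent_id, other_id)
    );
    CREATE TABLE my_other_ids (other_id INTEGER PRIMARY KEY);  -- the search list
    INSERT INTO parent VALUES (1, 'parent 1'), (2, 'parent 2'), (3, 'parent 3');
    INSERT INTO relationship VALUES (1, 1), (1, 2), (2, 1), (3, 1), (3, 2), (3, 3);
    INSERT INTO my_other_ids VALUES (1), (2);
""")

# Parents related to every listed id (extra relationships allowed).
AT_LEAST = """
    SELECT p.id, p.name
    FROM my_other_ids AS m
    JOIN relationship AS r ON r.other_id = m.other_id
    JOIN parent AS p ON p.id = r.parent_id
    GROUP BY p.id, p.name
    HAVING COUNT(*) = (SELECT COUNT(*) FROM my_other_ids)
    ORDER BY p.id
"""

# Parents related to every listed id and to nothing else: all of the parent's
# relationships must match the list, and there must be as many as the list.
EXACTLY = """
    SELECT p.id, p.name
    FROM parent AS p
    JOIN relationship AS r ON r.parent_id = p.id
    LEFT JOIN my_other_ids AS m ON m.other_id = r.other_id
    GROUP BY p.id, p.name
    HAVING COUNT(m.other_id) = (SELECT COUNT(*) FROM my_other_ids)
       AND COUNT(*) = (SELECT COUNT(*) FROM my_other_ids)
    ORDER BY p.id
"""

print(conn.execute(AT_LEAST).fetchall())  # [(1, 'parent 1'), (3, 'parent 3')]
print(conn.execute(EXACTLY).fetchall())   # [(1, 'parent 1')]
```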
I have the following tables:
CREATE TABLE forms
(
ID INT NOT NULL,
NAME TEXT NOT NULL,
TITLE TEXT NOT NULL
);
CREATE TABLE new_forms
(
ID INT NOT NULL,
NAME TEXT NULL,
TITLE TEXT NULL
);
INSERT INTO forms VALUES (0, 'test', 'test');
INSERT INTO new_forms VALUES (0, 'new_test', NULL);
And I'm using the following query:
INSERT INTO forms(id, name, title)
SELECT
1, COALESCE(nf.name, f.name), COALESCE(nf.title, f.title)
FROM
forms f
LEFT OUTER JOIN
new_forms nf ON nf.id = f.id;
SELECT * FROM forms;
The idea is to add both matching rows to the table.
In this example, these two new records should be added:
1 test test
1 new_test test
But it's only adding the last one.
I have tried all the join types, and none of them worked.
Thanks
You are using a join in the query, which gives you only one row. If you need two rows, you have to use a UNION ALL clause:
INSERT INTO forms(id, name, title)
SELECT
1, COALESCE(nf.name, f.name), COALESCE(nf.title, f.title)
FROM
forms f
LEFT OUTER JOIN
new_forms nf ON nf.id = f.id
UNION ALL
SELECT
1, COALESCE(f.name, nf.name), COALESCE(nf.title, f.title)
FROM
forms f
LEFT OUTER JOIN
new_forms nf ON nf.id = f.id;
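The question does not say which database the fiddle used; the sketch below replays the question's tables and the UNION ALL insert in SQLite through Python's sqlite3 module, to show that two rows are added. The first branch prefers the new_forms values, and the second keeps the original name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE forms (id INT NOT NULL, name TEXT NOT NULL, title TEXT NOT NULL);
    CREATE TABLE new_forms (id INT NOT NULL, name TEXT NULL, title TEXT NULL);
    INSERT INTO forms VALUES (0, 'test', 'test');
    INSERT INTO new_forms VALUES (0, 'new_test', NULL);
""")

conn.execute("""
    INSERT INTO forms (id, name, title)
    -- branch 1: values from new_forms win where they are not NULL
    SELECT 1, COALESCE(nf.name, f.name), COALESCE(nf.title, f.title)
    FROM forms f LEFT OUTER JOIN new_forms nf ON nf.id = f.id
    UNION ALL
    -- branch 2: the original name is kept
    SELECT 1, COALESCE(f.name, nf.name), COALESCE(nf.title, f.title)
    FROM forms f LEFT OUTER JOIN new_forms nf ON nf.id = f.id
""")

rows = conn.execute("SELECT id, name, title FROM forms ORDER BY id, name").fetchall()
print(rows)  # [(0, 'test', 'test'), (1, 'new_test', 'test'), (1, 'test', 'test')]
```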
I have this dictionary thing and the table with translations. I can do a nice select using it with SQLite
SELECT e.slug,
en.title,
en.locale
FROM entities AS e
LEFT JOIN (
locales AS en,
entity_locales AS el
) ON (
el.entity_id = e.id
AND el.locale_id = en.id
AND en.locale == 'en'
)
That produces:
present, translation, en
missing, NULL, NULL
But I can't convert it to Postgres because I don't understand what is going on when you specify more than one table in LEFT JOIN in SQLite:
SELECT e.slug,
en.title,
en.locale
FROM entities e
LEFT JOIN entity_locales el ON (el.entity_id = e.id)
JOIN locales en ON (
el.locale_id = en.id
AND en.locale = 'en'
)
Produces only
present, translation, en
Is there a way to make it work?
Database structure in SQLite format:
CREATE TABLE IF NOT EXISTS "entities" (
"id" integer PRIMARY KEY AUTOINCREMENT NOT NULL,
"slug" varchar
);
CREATE TABLE IF NOT EXISTS "entity_locales" (
"entity_id" integer,
"locale_id" integer
);
CREATE TABLE IF NOT EXISTS "locales" (
"id" integer PRIMARY KEY AUTOINCREMENT NOT NULL,
"title" varchar,
"locale" varchar
);
insert into entities(id, slug) values(1, 'present');
insert into entities(id, slug) values(2, 'missing');
insert into locales(id, title, locale) values(1, 'translation', 'en');
insert into entity_locales(entity_id, locale_id) values(1, 1);
You are using a left join only to the second table, and then an inner join to the third. You need a left join to the result of the inner join between the second and third tables.
Try this instead:
SELECT e.slug,
en.title,
en.locale
FROM entities e
LEFT JOIN
(
entity_locales el
JOIN locales en ON (
el.locale_id = en.id
AND en.locale = 'en'
)
) ON (el.entity_id = e.id)
By the way, your initial script mixes implicit and explicit joins. I would advise against implicit joins, since explicit joins have been a standard part of SQL for over 25 years now.
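To see the difference, here is a minimal sketch that runs both the nested-join query from the answer and the question's flat Postgres attempt against the question's data in SQLite (through Python's sqlite3 module). SQLite accepts the same parenthesized join syntax, which is standard SQL that Postgres also supports.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entities (id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, slug VARCHAR);
    CREATE TABLE entity_locales (entity_id INTEGER, locale_id INTEGER);
    CREATE TABLE locales (id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
                          title VARCHAR, locale VARCHAR);
    INSERT INTO entities (id, slug) VALUES (1, 'present'), (2, 'missing');
    INSERT INTO locales (id, title, locale) VALUES (1, 'translation', 'en');
    INSERT INTO entity_locales (entity_id, locale_id) VALUES (1, 1);
""")

# Left join to the *result* of the inner join, so 'missing' survives with NULLs.
NESTED = """
    SELECT e.slug, en.title, en.locale
    FROM entities e
    LEFT JOIN (
        entity_locales el
        JOIN locales en ON el.locale_id = en.id AND en.locale = 'en'
    ) ON el.entity_id = e.id
    ORDER BY e.id
"""

# The flat version: the inner JOIN after the LEFT JOIN drops the 'missing'
# row, because its el.locale_id is NULL and cannot match any locale.
FLAT = """
    SELECT e.slug, en.title, en.locale
    FROM entities e
    LEFT JOIN entity_locales el ON el.entity_id = e.id
    JOIN locales en ON el.locale_id = en.id AND en.locale = 'en'
    ORDER BY e.id
"""

print(conn.execute(NESTED).fetchall())  # [('present', 'translation', 'en'), ('missing', None, None)]
print(conn.execute(FLAT).fetchall())    # [('present', 'translation', 'en')]
```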
I am trying to solve the following problem entirely in SQL (ANSI or TSQL, in Sybase ASE 12), without relying on cursors or loop-based row-by-row processing.
NOTE: I already created a solution that accomplishes the same goal in application layer (therefore please refrain from "answering" with "don't do this in SQL"), but as a matter of principle (and hopefully improved performance) I would like to know if there is an efficient (e.g. no cursors) pure SQL solution.
Setup:
I have a table T with the following 3 columns (all NOT NULL):
---- Table T -----------------------------
| item | tag | value |
| [int] | [varchar(10)] | [varchar(255)] |
The table has a unique index on (item, tag).
Every tag has the form of a string "TAG##", where "##" is a number from 1 to 99.
Existing tags are not guaranteed to be contiguous, e.g. item 13 may have tags "TAG1", "TAG3", "TAG10".
TASK: I need to insert a bunch of new rows into the table from another table T_NEW, which only has items and values, and assign new tags to them so they don't violate the unique index on (item, tag).
Uniqueness of values is irrelevant (assume that item+value is always unique already).
---- Table T_NEW --------------------------
| item | tag | value |
| [int] | STARTS AS NULL | [varchar(255)] |
QUESTION: How can I assign new tags to all rows in table T_NEW, such that:
All item+tag combinations in a union of T and T_NEW are unique
Newly assigned tags should all be in the form "TAG##"
Newly assigned tags should ideally be the smallest available for a given item.
If it helps, you can assume that I already have a temp table #tags, with a "tag" column that contains 99 rows containing all the valid tags (TAG1..TAG99, one per row)
I started a fiddle that will get you the list of available "open" tags by item. It does this using the #tags (AllTags) and doing an outer-join-where-null. You could use that to insert new tags from T_New...
with T_openTags as (
select
items.item,
openTagName = a.tag
from
(select distinct item from T) items
cross join AllTags a
left outer join T on
items.item = T.item
and T.tag = a.tag
where
T.item is null
)
select * from T_openTags
Or see this updated fiddle, which does an update on the T_New table. It essentially adds a row_number so we can pick the correct open tag to use in a single update statement. I padded the tag names with a leading zero to simplify the sorting.
with T_openTags as (
select
items.item,
openTagName = a.tag,
rn = row_number() over(partition by items.item order by a.tag)
from
(select distinct item from T) items
cross join AllTags a
left outer join T on
items.item = T.item
and T.tag = a.tag
where
T.item is null
), T_New_numbered as (
select *,
rn = row_number() over(partition by item order by value)
from T_New
)
update tnn set tag = openTagName
from T_New_numbered tnn
inner join T_openTags tot on
tot.item = tnn.item
and tot.rn = tnn.rn
select * from T_New
Updated fiddle with a poor man's row_number replacement that only works with distinct T_New values.
Try this:
DECLARE @T TABLE (ITEM INT, TAG VARCHAR(10), VALUE VARCHAR(255))
INSERT INTO @T VALUES
(1,'TAG1', '100'),
(2,'TAG2', '200')
DECLARE @T_NEW TABLE (ITEM INT, TAG VARCHAR(10), VALUE VARCHAR(255))
INSERT INTO @T_NEW VALUES
(3,NULL, '500'),
(4,NULL, '600')
INSERT INTO @T
SELECT
ITEM,
('TAG' + CONVERT(VARCHAR(20),ITEM)) AS TAG,
VALUE
FROM
@T_NEW
SELECT * FROM @T
OK, here's a correct solution, tested to work on Sybase (H/T: big thanks to @ypercube for providing a solid basis for it)
declare @c int
select @c = 1
WHILE (@c > 0)
BEGIN
UPDATE
t_new
SET
tag =
( SELECT min(tags.tag)
FROM #tags tags
LEFT JOIN t o
ON tags.tag = o.tag
AND o.item = t_new.item
LEFT JOIN t_new n3
ON tags.tag = n3.tag
AND n3.item = t_new.item
WHERE o.tag IS NULL
AND n3.tag IS NULL
)
WHERE tag IS NULL
-- and here's the main magic for only updating one row per item at a time
AND NOT EXISTS (SELECT 1 FROM t_new n2 WHERE t_new.value > n2.value
and n2.tag IS NULL and n2.item=t_new.item)
SELECT @c = @@rowcount
END
Inserting directly into t:
INSERT INTO t
(item, tag, value)
SELECT
item,
( SELECT MIN(tags.tag)
FROM #tags AS tags
LEFT JOIN t AS o
ON tags.tag = o.tag
AND o.item = n.item
WHERE o.tag IS NULL
) AS tag,
value
FROM
t_new AS n ;
Updating t_new:
UPDATE
t_new AS n
SET
tag =
( SELECT MIN(tags.tag)
FROM #tags AS tags
LEFT JOIN t AS o
ON tags.tag = o.tag
AND o.item = n.item
WHERE o.tag IS NULL
) ;
Correction
UPDATE
n
SET
n.tag = w.tag
FROM
( SELECT item,
tag,
ROW_NUMBER() OVER (PARTITION BY item ORDER BY value) AS rn
FROM t_new
) AS n
JOIN
( SELECT di.item,
tags.tag,
ROW_NUMBER() OVER (PARTITION BY di.item ORDER BY tags.tag) AS rn
FROM
( SELECT DISTINCT item
FROM t_new
) AS di
CROSS JOIN
#tags AS tags
LEFT JOIN
t AS o
ON tags.tag = o.tag
AND o.item = di.item
WHERE o.tag IS NULL
) AS w
ON w.item = n.item
AND w.rn = n.rn ;
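A minimal sketch of the set-based idea behind the Correction query, in SQLite (3.25+ for ROW_NUMBER) through Python's sqlite3 module, not Sybase ASE 12. It inserts straight into t instead of updating t_new, and `tags` stands in for the #tags temp table. One deliberate difference: the answers order open tags with ORDER BY tags.tag, which sorts as text (TAG10 before TAG2); this sketch orders by the numeric part so the smallest free number wins. The sample data is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (item INTEGER NOT NULL, tag VARCHAR(10) NOT NULL,
                    value VARCHAR(255) NOT NULL, UNIQUE (item, tag));
    CREATE TABLE t_new (item INTEGER NOT NULL, value VARCHAR(255) NOT NULL);
    CREATE TABLE tags (tag VARCHAR(10) PRIMARY KEY);  -- stand-in for #tags
    INSERT INTO t VALUES (13, 'TAG1', 'a'), (13, 'TAG3', 'b'), (13, 'TAG10', 'c');
    INSERT INTO t_new VALUES (13, 'x'), (13, 'y'), (13, 'z'), (7, 'p');
""")
conn.executemany("INSERT INTO tags VALUES (?)", [(f"TAG{n}",) for n in range(1, 100)])

conn.execute("""
    WITH new_numbered AS (
        -- 1, 2, 3, ... for the new rows of each item
        SELECT item, value,
               ROW_NUMBER() OVER (PARTITION BY item ORDER BY value) AS rn
        FROM t_new
    ),
    open_tags AS (
        -- 1, 2, 3, ... for each item's unused tags, smallest number first
        SELECT i.item, g.tag,
               ROW_NUMBER() OVER (PARTITION BY i.item
                                  ORDER BY CAST(SUBSTR(g.tag, 4) AS INTEGER)) AS rn
        FROM (SELECT DISTINCT item FROM t_new) AS i
        CROSS JOIN tags AS g
        LEFT JOIN t AS o ON o.item = i.item AND o.tag = g.tag
        WHERE o.tag IS NULL
    )
    INSERT INTO t (item, tag, value)
    SELECT n.item, w.tag, n.value
    FROM new_numbered AS n
    JOIN open_tags AS w ON w.item = n.item AND w.rn = n.rn
""")

# Which tag did each new value receive?
assigned = dict(conn.execute(
    "SELECT value, tag FROM t WHERE value IN (SELECT value FROM t_new)"))
print(assigned)
```

Item 13 already uses TAG1, TAG3 and TAG10, so its three new rows should receive TAG2, TAG4 and TAG5, while item 7, which has no existing rows, starts at TAG1.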
I have a database which has 2 tables:
CREATE TABLE RecipeDB (
RecipeID INTEGER PRIMARY KEY AUTOINCREMENT,
Name VARCHAR,
Recipe VARCHAR,
Origin VARCHAR,
Category VARCHAR,
Favoured BOOL);
CREATE TABLE IngredientDB (
RecipeID INTEGER REFERENCES RecipeDB(RecipeID),
Ingredient VARCHAR,
Quantity VARCHAR);
(One-to-many relation between Recipe and Ingredients)
I also have an ActionScript in which ingArr:Array is an array of ingredient strings.
Now, I would like to write the following queries:
1) Select (all fields) one recipe which has the most of ingredients from the array. If more than one record have the same amount of matches, then divide the number of matches by total number of ingredients in recipe and return the one with the highest ratio. If there are no matches return nothing.
2) As above, but return 10 recipes with the most matches and do not perform check for equal number of matches. Sort the results by the number of matches.
Any ideas how to compose those queries in SQLite?
(The SQL statements below are for SQLite.)
For the second one, you need the top 10 recipes that match the most ingredients.
What you need is to:
count the rows that match your ingredient list (use the IN operator)
order the result by that count in descending order (4, 3, 2, ...)
limit the result to 10
So the SQL statement looks like:
SELECT
r.RecipeId, COUNT(1) cnt, r.Name, r.Recipe, r.Origin, r.Category, r.Favoured
FROM
RecipeDB r
INNER JOIN IngredientDB i USING(RecipeID)
WHERE
i.Ingredient in ('ingr_1',..,'ingr_x')
GROUP BY 1
ORDER BY 2 DESC
LIMIT 10
Using AIR + AS3, it could look something like this:
var sqls:SQLStatement = new SQLStatement()
sqls.sqlConnection = conn // conn is a placeholder for your open SQLConnection
// your ingredient list
var ingredients:Array = ['i2', 'i3', 'i4']
// use to build the in parameter array
var inParams:Array = []
// fill parameter values
for(var i:int = 0; i < ingredients.length; ++i) {
inParams[i] = '?'
sqls.parameters[i] = ingredients[i]
}
// build the query
var qry:String = "SELECT r.RecipeId, COUNT(1) cnt, r.Name, r.Recipe, r.Origin,"+
"r.Category, r.Favoured FROM RecipeDB r INNER JOIN IngredientDB i USING(RecipeID)"+
"WHERE i.Ingredient in (" + inParams.join(',') + ") GROUP BY 1 "+
"ORDER BY 2 DESC LIMIT 10"
// set the query
sqls.text = qry
//execute
sqls.execute()
For the first one, the idea is the same as above, but you also need to count all the ingredients in the recipe to compute the ratio of matches to total.
What you need is to:
count the rows that match your ingredient list (use the IN operator)
compute a rank by dividing that count by the recipe's total number of ingredients
order by rank to get the best match first
limit the result to 1
So the SQL statement looks like:
SELECT
i1.RecipeId, (cast(rs.cnt as real) / cast (COUNT(1) as real)) rank,
rs.Name, rs.Recipe, rs.Origin, rs.Category, rs.Favoured
FROM
IngredientDB i1
INNER JOIN (
SELECT
r.RecipeId, COUNT(1) cnt, r.Name, r.Recipe, r.Origin, r.Category, r.Favoured
FROM
RecipeDB r
INNER JOIN IngredientDB i USING(RecipeID)
WHERE
i.Ingredient in ('ingr_1',..,'ingr_x')
GROUP BY 1
) rs USING (RecipeId)
GROUP BY 1
ORDER BY 2 DESC
LIMIT 1
Using the same logic as for the first example, your query can be written as:
var ingredients:Array = ['i2', 'i3', 'i4']
var inParams:Array = []
for(var i:int = 0; i < ingredients.length; ++i) {
inParams[i] = '?'
sqls.parameters[i] = ingredients[i]
}
var qry:String = "SELECT i1.RecipeId, (cast(rs.cnt as real) / cast (COUNT(1) as real)) rank,"+
"rs.Name, rs.Recipe, rs.Origin, rs.Category, rs.Favoured "+
"FROM IngredientDB i1 INNER JOIN ("+
"SELECT r.RecipeId, COUNT(1) cnt, r.Name, r.Recipe, r.Origin, r.Category, r.Favoured "+
"FROM RecipeDB r INNER JOIN IngredientDB i USING(RecipeID) "+
"WHERE i.Ingredient in (" + inParams.join(',') + ") GROUP BY 1) rs USING (RecipeId) "+
"GROUP BY 1 ORDER BY 2 DESC LIMIT 1"
The queries can be tuned slightly, but the T-SQL below demonstrates the answers you're looking for in a fairly readable way.
BEGIN
-- setup test
DECLARE @one TABLE(id INT, name VARCHAR(10))
DECLARE @many TABLE(pid INT, name VARCHAR(10))
INSERT INTO @one VALUES
(1, 'AAA'),
(2, 'BBB'),
(3, 'CCC')
INSERT INTO @many VALUES
(1, 'x'),(1, 'y'),(1, 'z'),
(2, 'x'),(2, 'y'),
(3, 'z')
--
-- WHERE m.name IN ('x', 'y')
-- 'x', 'y' represent your list of ingredients
-- answer 1
SELECT * FROM @one WHERE id = (
SELECT TOP 1 x.id FROM (
SELECT o.id, COUNT(o.id) 'match', (SELECT COUNT(*) FROM @many WHERE pid=o.id) 'total' FROM @one o
INNER JOIN @many m ON o.id = m.pid
WHERE m.name IN ('x', 'y', 'z')
GROUP BY o.id
) as x ORDER BY x.match DESC, CAST(x.match AS FLOAT)/x.total DESC
)
-- answer 2
SELECT * FROM @one WHERE id IN (
SELECT TOP 10 x.id FROM (
SELECT o.id, COUNT(o.id) 'match' FROM @one o
INNER JOIN @many m ON o.id = m.pid
WHERE m.name IN ('x', 'y')
GROUP BY o.id
) as x ORDER BY x.match DESC
)
END
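A minimal sketch of both recipe queries in SQLite through Python's sqlite3 module, building one ? placeholder per ingredient in the same way as the AS3 code. The schema uses INTEGER PRIMARY KEY AUTOINCREMENT, which SQLite requires for an auto-incrementing key; the function names and sample rows are hypothetical. For query 1 it orders by match count first and uses matches/total only as the tie-breaker, as the question describes (the first answer's SQL orders by the ratio alone).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE RecipeDB (RecipeID INTEGER PRIMARY KEY AUTOINCREMENT,
                           Name VARCHAR, Recipe VARCHAR, Origin VARCHAR,
                           Category VARCHAR, Favoured BOOL);
    CREATE TABLE IngredientDB (RecipeID INTEGER REFERENCES RecipeDB(RecipeID),
                               Ingredient VARCHAR, Quantity VARCHAR);
    INSERT INTO RecipeDB (RecipeID, Name) VALUES (1, 'A'), (2, 'B'), (3, 'C');
    INSERT INTO IngredientDB (RecipeID, Ingredient) VALUES
        (1, 'x'), (1, 'y'), (1, 'z'), (1, 'w'),
        (2, 'x'), (2, 'y'),
        (3, 'z');
""")


def placeholders(values):
    return ",".join("?" * len(values))  # e.g. "?,?,?" for three values


def top_by_matches(conn, ingredients, limit=10):
    """Query 2: recipes ordered by number of matching ingredients."""
    sql = ("SELECT r.RecipeID, COUNT(*) AS cnt, r.Name "
           "FROM RecipeDB r JOIN IngredientDB i USING (RecipeID) "
           f"WHERE i.Ingredient IN ({placeholders(ingredients)}) "
           "GROUP BY r.RecipeID ORDER BY cnt DESC, r.RecipeID LIMIT ?")
    return conn.execute(sql, [*ingredients, limit]).fetchall()


def best_match(conn, ingredients):
    """Query 1: most matches first, then matches/total as the tie-breaker."""
    sql = ("SELECT RecipeID, Name, matches, total FROM ("
           "  SELECT r.RecipeID, r.Name, "
           f"         SUM(i.Ingredient IN ({placeholders(ingredients)})) AS matches, "
           "         COUNT(*) AS total "
           "  FROM RecipeDB r JOIN IngredientDB i USING (RecipeID) "
           "  GROUP BY r.RecipeID) "
           "WHERE matches > 0 "  # no matches at all: return nothing
           "ORDER BY matches DESC, CAST(matches AS REAL) / total DESC LIMIT 1")
    return conn.execute(sql, ingredients).fetchone()


print(top_by_matches(conn, ["x", "y", "z"]))  # [(1, 3, 'A'), (2, 2, 'B'), (3, 1, 'C')]
print(best_match(conn, ["x", "y"]))           # (2, 'B', 2, 2)
```

With ['x', 'y'], recipes A and B both match twice, so the ratio decides: B (2 of 2) beats A (2 of 4).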
About the system:
- The system has a total of 8 tables:
  - Users
  - Tutor_Details (tutors are a type of user; the Tutor_Details table is linked to Users)
  - learning_packs (stores packs created by tutors)
  - learning_packs_tag_relations (holds tag relations used for search)
  - tutors_tag_relations
  - tags
  - orders (contains purchase details of tutors' packs)
  - order_details (linked to orders and tutor_details)
For a clearer picture of the tables involved, please see the section "The tables" at the end.
- A tag-based search approach is used. Tag relations are created when new tutors register and when tutors create packs (this makes tutors and packs searchable). For details, please see the section "How tags work in this system?" below.
Following is a simplified representation (not the actual query) of the more complex query I am trying to optimize. Parts of the query are described by placeholder statements in backticks.
============================================================================
select
SUM(DISTINCT( t.tag LIKE "%Dictatorship%" )) as key_1_total_matches,
SUM(DISTINCT( t.tag LIKE "%democracy%" )) as key_2_total_matches,
td.*, u.*, count(distinct(od.id_od)), `if (lp.id_lp > 0) then some conditional logic on lp fields else 0 as tutor_popularity`
from Tutor_Details AS td JOIN Users as u on u.id_user = td.id_user
LEFT JOIN Learning_Packs_Tag_Relations AS lptagrels ON td.id_tutor = lptagrels.id_tutor
LEFT JOIN Learning_Packs AS lp ON lptagrels.id_lp = lp.id_lp
LEFT JOIN `some other tables on lp.id_lp - let's call learning pack tables set (including
Learning_Packs table)`
LEFT JOIN Order_Details as od on td.id_tutor = od.id_author LEFT JOIN Orders as o on
od.id_order = o.id_order
LEFT JOIN Tutors_Tag_Relations as ttagrels ON td.id_tutor = ttagrels.id_tutor
JOIN Tags as t on (t.id_tag = ttagrels.id_tag) OR (t.id_tag = lptagrels.id_tag)
where `some condition on Users table's fields`
AND CASE WHEN ((t.id_tag = lptagrels.id_tag) AND (lp.id_lp > 0)) THEN `some
conditions on learning pack tables set` ELSE 1 END
AND CASE WHEN ((t.id_tag = wtagrels.id_tag) AND (wc.id_wc > 0)) THEN `some
conditions on webclasses tables set` ELSE 1 END
AND CASE WHEN (od.id_od>0) THEN od.id_author = td.id_tutor and `some conditions on Orders table's fields` ELSE 1 END
AND ( t.tag LIKE "%Dictatorship%" OR t.tag LIKE "%democracy%")
group by td.id_tutor HAVING key_1_total_matches = 1 AND key_2_total_matches = 1
order by tutor_popularity desc, u.surname asc, u.name asc limit
0,20
=====================================================================
What does the above query do?
It performs an AND search on the search keywords (2 in this example: "Democracy" and "Dictatorship").
It returns only those tutors for which both keywords are present in the union of two sets: the tutor's details and the details of all the packs created by that tutor.
To make things clear, suppose a tutor named "Sandeepan Nath" has created a pack "My first pack". Then:
Searching "Sandeepan Nath" returns Sandeepan Nath.
Searching "Sandeepan first" returns Sandeepan Nath.
Searching "Sandeepan second" does not return Sandeepan Nath.
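A minimal sketch of just the AND logic described above, in SQLite through Python's sqlite3 module, with hypothetical cut-down tables (only the columns the search needs) and made-up tags for the "Sandeepan Nath" example. It unions the tutor's own tag relations with his packs' tag relations first, then keeps tutors for whom every keyword matched at least one tag. This mirrors the key_N_total_matches test in the HAVING clause without joining orders or categories.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tags (id_tag INTEGER PRIMARY KEY, tag TEXT UNIQUE);
    CREATE TABLE tutors_tag_relations (id_tag INTEGER, id_tutor INTEGER);
    CREATE TABLE learning_packs_tag_relations (id_tag INTEGER, id_tutor INTEGER,
                                               id_lp INTEGER);
    INSERT INTO tags VALUES (1, 'sandeepan'), (2, 'nath'), (3, 'my'),
                            (4, 'first'), (5, 'pack');
    -- tutor 7 is "Sandeepan Nath"; pack 70 is his "My first pack"
    INSERT INTO tutors_tag_relations VALUES (1, 7), (2, 7);
    INSERT INTO learning_packs_tag_relations VALUES (3, 7, 70), (4, 7, 70), (5, 7, 70);
""")


def tutors_matching_all(conn, keywords):
    if not keywords:
        raise ValueError("need at least one keyword")
    # One "did any of this tutor's tags match the keyword?" test per keyword.
    having = " AND ".join("MAX(t.tag LIKE ?) = 1" for _ in keywords)
    sql = f"""
        SELECT rel.id_tutor
        FROM (SELECT id_tutor, id_tag FROM tutors_tag_relations
              UNION
              SELECT id_tutor, id_tag FROM learning_packs_tag_relations) AS rel
        JOIN tags AS t ON t.id_tag = rel.id_tag
        GROUP BY rel.id_tutor
        HAVING {having}
        ORDER BY rel.id_tutor
    """
    params = [f"%{kw}%" for kw in keywords]  # SQLite LIKE ignores ASCII case
    return [row[0] for row in conn.execute(sql, params)]


print(tutors_matching_all(conn, ["Sandeepan", "first"]))   # [7]
print(tutors_matching_all(conn, ["Sandeepan", "second"]))  # []
```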
======================================================================================
The problem
The results returned by the above query are correct (the AND logic works as expected), but on heavily loaded databases the query takes about 25 seconds, compared with normal query timings on the order of 0.0002 to 0.005 seconds, which makes it totally unusable.
It is possible that some of the delay is caused by not all relevant fields being indexed yet, but I would appreciate a better query as a solution, optimized as much as possible, that returns the same results.
==========================================================================================
How tags work in this system?
When a tutor registers, tags are entered and tag relations are created from the tutor's details, like name, surname, etc.
When a tutor creates packs, tags are again entered and tag relations are created from the pack's details, like pack name, description, etc.
Tag relations for tutors are stored in tutors_tag_relations, and those for packs in learning_packs_tag_relations. All individual tags are stored in the tags table.
====================================================================
The tables
Most of the following tables contain many other fields which I have omitted here.
CREATE TABLE IF NOT EXISTS `users` (
`id_user` int(10) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(100) NOT NULL DEFAULT '',
`surname` varchar(155) NOT NULL DEFAULT '',
PRIMARY KEY (`id_user`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=636 ;
CREATE TABLE IF NOT EXISTS `tutor_details` (
`id_tutor` int(10) NOT NULL AUTO_INCREMENT,
`id_user` int(10) NOT NULL DEFAULT '0',
PRIMARY KEY (`id_tutor`),
KEY `Users_FKIndex1` (`id_user`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=51 ;
CREATE TABLE IF NOT EXISTS `orders` (
`id_order` int(10) unsigned NOT NULL AUTO_INCREMENT,
`id_user` int(10) unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`id_order`),
KEY `Orders_FKIndex1` (`id_user`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=275 ;
ALTER TABLE `orders`
ADD CONSTRAINT `Orders_ibfk_1` FOREIGN KEY (`id_user`) REFERENCES `users`
(`id_user`) ON DELETE NO ACTION ON UPDATE NO ACTION;
CREATE TABLE IF NOT EXISTS `order_details` (
`id_od` int(10) unsigned NOT NULL AUTO_INCREMENT,
`id_order` int(10) unsigned NOT NULL DEFAULT '0',
`id_author` int(10) NOT NULL DEFAULT '0',
PRIMARY KEY (`id_od`),
KEY `Order_Details_FKIndex1` (`id_order`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=284 ;
ALTER TABLE `order_details`
ADD CONSTRAINT `Order_Details_ibfk_1` FOREIGN KEY (`id_order`) REFERENCES `orders`
(`id_order`) ON DELETE NO ACTION ON UPDATE NO ACTION;
CREATE TABLE IF NOT EXISTS `learning_packs` (
`id_lp` int(10) unsigned NOT NULL AUTO_INCREMENT,
`id_author` int(10) unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`id_lp`),
KEY `Learning_Packs_FKIndex2` (`id_author`),
KEY `id_lp` (`id_lp`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=23 ;
CREATE TABLE IF NOT EXISTS `tags` (
`id_tag` int(10) unsigned NOT NULL AUTO_INCREMENT,
`tag` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id_tag`),
UNIQUE KEY `tag` (`tag`),
KEY `id_tag` (`id_tag`),
KEY `tag_2` (`tag`),
KEY `tag_3` (`tag`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=3419 ;
CREATE TABLE IF NOT EXISTS `tutors_tag_relations` (
`id_tag` int(10) unsigned NOT NULL DEFAULT '0',
`id_tutor` int(10) DEFAULT NULL,
KEY `Tutors_Tag_Relations` (`id_tag`),
KEY `id_tutor` (`id_tutor`),
KEY `id_tag` (`id_tag`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
ALTER TABLE `tutors_tag_relations`
ADD CONSTRAINT `Tutors_Tag_Relations_ibfk_1` FOREIGN KEY (`id_tag`) REFERENCES
`tags` (`id_tag`) ON DELETE NO ACTION ON UPDATE NO ACTION;
CREATE TABLE IF NOT EXISTS `learning_packs_tag_relations` (
`id_tag` int(10) unsigned NOT NULL DEFAULT '0',
`id_tutor` int(10) DEFAULT NULL,
`id_lp` int(10) unsigned DEFAULT NULL,
KEY `Learning_Packs_Tag_Relations_FKIndex1` (`id_tag`),
KEY `id_lp` (`id_lp`),
KEY `id_tag` (`id_tag`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
ALTER TABLE `learning_packs_tag_relations`
ADD CONSTRAINT `Learning_Packs_Tag_Relations_ibfk_1` FOREIGN KEY (`id_tag`)
REFERENCES `tags` (`id_tag`) ON DELETE NO ACTION ON UPDATE NO ACTION;
===================================================================================
Following is the exact query (it also includes classes: tutors can create classes, and search terms are matched against the classes created by tutors):
SELECT SUM(DISTINCT( t.tag LIKE "%Dictatorship%" )) AS key_1_total_matches,
SUM(DISTINCT( t.tag LIKE "%democracy%" )) AS key_2_total_matches,
COUNT(DISTINCT( od.id_od )) AS tutor_popularity,
CASE
WHEN ( IF(( wc.id_wc > 0 ), ( wc.wc_api_status = 1
AND wc.wc_type = 0
AND wc.class_date > '2010-06-01 22:00:56'
AND wccp.status = 1
AND ( wccp.country_code = 'IE'
OR wccp.country_code IN ( 'INT' )
) ), 0)
) THEN 1
ELSE 0
END AS 'classes_published',
CASE
WHEN ( IF(( lp.id_lp > 0 ), ( lp.id_status = 1
AND lp.published = 1
AND lpcp.status = 1
AND ( lpcp.country_code = 'IE'
OR lpcp.country_code IN ( 'INT' )
) ), 0)
) THEN 1
ELSE 0
END AS 'packs_published',
td . *,
u . *
FROM tutor_details AS td
JOIN users AS u
ON u.id_user = td.id_user
LEFT JOIN learning_packs_tag_relations AS lptagrels
ON td.id_tutor = lptagrels.id_tutor
LEFT JOIN learning_packs AS lp
ON lptagrels.id_lp = lp.id_lp
LEFT JOIN learning_packs_categories AS lpc
ON lpc.id_lp_cat = lp.id_lp_cat
LEFT JOIN learning_packs_categories AS lpcp
ON lpcp.id_lp_cat = lpc.id_parent
LEFT JOIN learning_pack_content AS lpct
ON ( lp.id_lp = lpct.id_lp )
LEFT JOIN webclasses_tag_relations AS wtagrels
ON td.id_tutor = wtagrels.id_tutor
LEFT JOIN webclasses AS wc
ON wtagrels.id_wc = wc.id_wc
LEFT JOIN learning_packs_categories AS wcc
ON wcc.id_lp_cat = wc.id_wp_cat
LEFT JOIN learning_packs_categories AS wccp
ON wccp.id_lp_cat = wcc.id_parent
LEFT JOIN order_details AS od
ON td.id_tutor = od.id_author
LEFT JOIN orders AS o
ON od.id_order = o.id_order
LEFT JOIN tutors_tag_relations AS ttagrels
ON td.id_tutor = ttagrels.id_tutor
JOIN tags AS t
ON ( t.id_tag = ttagrels.id_tag )
OR ( t.id_tag = lptagrels.id_tag )
OR ( t.id_tag = wtagrels.id_tag )
WHERE ( u.country = 'IE'
OR u.country IN ( 'INT' ) )
AND CASE
WHEN ( ( t.id_tag = lptagrels.id_tag )
AND ( lp.id_lp > 0 ) ) THEN lp.id_status = 1
AND lp.published = 1
AND lpcp.status = 1
AND ( lpcp.country_code = 'IE'
OR lpcp.country_code IN (
'INT'
) )
ELSE 1
END
AND CASE
WHEN ( ( t.id_tag = wtagrels.id_tag )
AND ( wc.id_wc > 0 ) ) THEN wc.wc_api_status = 1
AND wc.wc_type = 0
AND
wc.class_date > '2010-06-01 22:00:56'
AND wccp.status = 1
AND ( wccp.country_code = 'IE'
OR wccp.country_code IN (
'INT'
) )
ELSE 1
END
AND CASE
WHEN ( od.id_od > 0 ) THEN od.id_author = td.id_tutor
AND o.order_status = 'paid'
AND CASE
WHEN ( od.id_wc > 0 ) THEN od.can_attend_class = 1
ELSE 1
END
ELSE 1
END
GROUP BY td.id_tutor
HAVING key_1_total_matches = 1
AND key_2_total_matches = 1
ORDER BY tutor_popularity DESC,
u.surname ASC,
u.name ASC
LIMIT 0, 20
Please note - The provided database structure does not show all the fields and tables as in this query
=====================================================================================
The EXPLAIN output:
Please see this screenshot:
http://www.test.examvillage.com/Explain_query.jpg
Information on row counts, value distributions, indexes, database size, memory size, disk layout (RAID 0, 5, etc.), how many users are hitting your database when queries are slow, and what other queries are running would all help. All of these factor into performance.
Also, a printout of the EXPLAIN plan may shed some light on the cause if it's simply a query or index issue. The exact query would be needed as well.
You really should use some better formatting for the query.
Just add at least 4 spaces to the beginning of each row to get this nice code formatting.
SELECT * FROM sometable
INNER JOIN anothertable ON sometable.id = anothertable.sometable_id
Or have a look here: https://stackoverflow.com/editing-help
Could you provide the execution plan from MySQL? You need to add "EXPLAIN" in front of the query and copy the result.
EXPLAIN SELECT * FROM ...complexquery...
will give you some useful hints (execution order, returned rows, available/used indexes)
Your question is, "how can I find tutors that match certain tags?" That's not a hard question, so the query to answer it shouldn't be hard either.
Something like:
SELECT *
FROM tutors
WHERE tags LIKE '%Dictator%' AND tags LIKE '%Democracy%'
That will work, if you modify your design to have a "tags" field in your "tutors" table, in which you put all the tags that apply to that tutor. It will eliminate layers of joins and tables.
Are all those layers of joins and tables providing real functionality, or just more programming headaches? Think about the functionality that your app REALLY needs, and then simplify your database design!!
Answering my own question.
The main problem with this approach was that too many tables were joined in a single query. Some of those tables, like tags, have a large number of records (in future it could hold as many entries as there are words in the English vocabulary), and joining them with so many other tables causes a multiplication effect that cannot be countered in any way.
The solution is basically to make sure that too many joins are not made in a single query. Breaking one large join query into steps, and using the results of one query (joining some of the tables) in the next query (joining the other tables), reduces the multiplication effect.
I will try to provide better explanation to this later.
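A tiny illustration of the multiplication effect, using hypothetical minimal tables in SQLite through Python's sqlite3 module: one tutor with 3 tag relations and 4 order lines. Joining both one-to-many tables to the tutor in one query pairs every tag row with every order row. Aggregating each table separately first, then joining the one-row-per-tutor results, is one way to express the "steps" inside a single statement; running separate queries as described above works on the same principle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tutor_details (id_tutor INTEGER PRIMARY KEY);
    CREATE TABLE tutors_tag_relations (id_tag INTEGER, id_tutor INTEGER);
    CREATE TABLE order_details (id_od INTEGER PRIMARY KEY, id_author INTEGER);
    INSERT INTO tutor_details VALUES (7);
    INSERT INTO tutors_tag_relations VALUES (1, 7), (2, 7), (3, 7);             -- 3 tags
    INSERT INTO order_details VALUES (100, 7), (101, 7), (102, 7), (103, 7);  -- 4 orders
""")

# One big join: every tag row is paired with every order row (3 x 4).
joined_rows = conn.execute("""
    SELECT COUNT(*)
    FROM tutor_details td
    LEFT JOIN tutors_tag_relations tr ON tr.id_tutor = td.id_tutor
    LEFT JOIN order_details od ON od.id_author = td.id_tutor
""").fetchone()[0]

# Step by step: aggregate each one-to-many table on its own, then join the
# one-row-per-tutor results, so nothing gets multiplied.
stepwise = conn.execute("""
    SELECT td.id_tutor, tg.n_tags, pop.n_orders
    FROM tutor_details td
    LEFT JOIN (SELECT id_tutor, COUNT(*) AS n_tags
               FROM tutors_tag_relations GROUP BY id_tutor) AS tg
           ON tg.id_tutor = td.id_tutor
    LEFT JOIN (SELECT id_author, COUNT(*) AS n_orders
               FROM order_details GROUP BY id_author) AS pop
           ON pop.id_author = td.id_tutor
""").fetchall()

print(joined_rows)  # 12
print(stepwise)     # [(7, 3, 4)]
```

With more one-to-many tables in the same join (pack tag relations, packs, categories, classes, orders), the intermediate row count per tutor is the product of all their per-tutor counts.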