recursive select and join related table - sql

CREATE TABLE IF NOT EXISTS "product_category"(
"id" SERIAL NOT NULL,
"parent_id" integer DEFAULT NULL,
"name" varchar DEFAULT NULL,
PRIMARY KEY ("id")
);
CREATE TABLE IF NOT EXISTS "product"(
"id" SERIAL NOT NULL,
"product_category_id" integer DEFAULT NULL,
PRIMARY KEY ("id")
);
I have the two tables above. product_category is a hierarchy, and product.product_category_id is a foreign key to product_category.id.
How do I select all products under a specific product_category id, including products in its descendant categories? For example:
if the input is product_category 1,
output -> product:1, product:2
if the input is product_category 2,
output -> product:2
product_category
id | parent_id | name
1 | | parent
2 | 1 | child
3 | 2 | child child
product
id | product_category_id
1 | 1
2 | 3
My query so far is below, but it returns only the list of product_category rows; I want the list of products.
WITH RECURSIVE pc AS (
SELECT pc.id AS id
FROM product_category pc
LEFT JOIN product p ON p.product_category_id = pc.id
WHERE id = $1
UNION ALL
SELECT child.id
FROM product_category AS child
LEFT JOIN product p ON p.product_category_id = child.id
JOIN pc ON pc.id = child.parent_id
)
SELECT * FROM product_category WHERE id IN (SELECT * FROM pc)

You should first build up the list of categories, then join that to the products.
WITH RECURSIVE pc AS (
SELECT id
FROM product_category
WHERE id = $1
UNION ALL
SELECT child.id
FROM product_category AS child
JOIN pc ON pc.id = child.parent_id
)
SELECT pr.*
FROM product pr
JOIN pc on pr.product_category_id = pc.id
;
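To make the two-step idea concrete, here is a minimal sketch that loads the sample rows from the question into an in-memory SQLite database and runs the same recursive CTE. SQLite is used only so the snippet is self-contained (the question targets PostgreSQL), and `products_under` is a hypothetical helper name, not something from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_category (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT);
CREATE TABLE product (id INTEGER PRIMARY KEY, product_category_id INTEGER);
INSERT INTO product_category VALUES (1, NULL, 'parent'), (2, 1, 'child'), (3, 2, 'child child');
INSERT INTO product VALUES (1, 1), (2, 3);
""")

# Step 1: the CTE collects the starting category and all of its descendants.
# Step 2: the outer query joins that id list to product.
QUERY = """
WITH RECURSIVE pc(id) AS (
    SELECT id FROM product_category WHERE id = ?
    UNION ALL
    SELECT child.id
    FROM product_category AS child
    JOIN pc ON pc.id = child.parent_id
)
SELECT pr.id
FROM product AS pr
JOIN pc ON pr.product_category_id = pc.id
ORDER BY pr.id
"""

def products_under(category_id):
    """Return the ids of all products in the category or any descendant (hypothetical helper)."""
    return [row[0] for row in conn.execute(QUERY, (category_id,))]

print(products_under(1))  # [1, 2]
print(products_under(2))  # [2]
```

Keeping the product table out of the recursive part means the CTE only ever walks category rows, so a category with many products does not multiply the rows carried through the recursion.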

Note: I do not have PostgreSQL installed on my system, so I am answering based on concepts. I do not see where you have defined the foreign key constraint, but I am assuming you have done so. I have also not yet checked the correctness of the CTE (Common Table Expression, i.e. the WITH RECURSIVE portion of your SQL). Assuming the CTE is correct:
How about replacing
SELECT * FROM product_category WHERE id IN (SELECT * FROM pc)
with
SELECT * FROM product WHERE product_category_id IN (SELECT * FROM pc)
I will check the correctness of the CTE next.
As far as I can tell, the following should work:
WITH RECURSIVE pc AS (
SELECT pc.id AS id
FROM
product_category pc
LEFT JOIN product p
ON p.product_category_id = pc.id
WHERE pc.id = <Your product category id of interest>
UNION ALL
SELECT child.id
FROM
product_category AS child
LEFT JOIN product p
ON p.product_category_id = child.id
JOIN pc
ON pc.id = child.parent_id
)
SELECT id as product_id FROM product WHERE product_category_id
IN (
SELECT * FROM pc
);

Related

Names of nodes at depth d for every descendant leaf

I have a category hierarchy that products are attached to. That category hierarchy is saved as an adjacency list. Products can be attached to any category nodes at any level. The category hierarchy is a tree.
I would like to...
get the name of every level 3 category...
per product...
where that product is attached to any level 3 category node...
or a descendant of a level 3 node.
I know I can materialize the hierarchy, and from that I've been able to satisfy all requirements but the last. I always lose some products or categories.
Given
CREATE TABLE product (p_id varchar PRIMARY KEY);
CREATE TABLE category (c_id varchar PRIMARY KEY, parent_c_id varchar);
CREATE TABLE product_category (
p_id varchar,
c_id varchar,
PRIMARY KEY (p_id, c_id),
FOREIGN KEY (p_id) REFERENCES product (p_id)
ON UPDATE CASCADE ON DELETE CASCADE,
FOREIGN KEY (c_id) REFERENCES category (c_id)
ON UPDATE CASCADE ON DELETE CASCADE
);
INSERT INTO product (p_id) VALUES
('p_01'),
('p_02'),
('p_03'),
('p_04'),
('p_05');
INSERT INTO category (c_id, parent_c_id) VALUES
('c_0_1', NULL),
-- L1
('c_1_1', 'c_0_1'),
('c_1_2', 'c_0_1'),
('c_1_3', 'c_0_1'),
-- L2
('c_2_1', 'c_1_1'),
('c_2_2', 'c_1_1'),
('c_2_3', 'c_1_2'),
('c_2_4', 'c_1_3'),
-- L3
('c_3_1', 'c_2_1'),
('c_3_2', 'c_2_2'),
('c_3_3', 'c_2_3'),
('c_3_4', 'c_2_4'),
-- L4
('c_4_1', 'c_3_1'),
('c_4_2', 'c_3_2'),
('c_4_3', 'c_3_3'),
('c_4_4', 'c_3_4');
INSERT INTO product_category (p_id, c_id) VALUES
-- p_01 explicitly attached to every level in path 1; include.
('p_01', 'c_0_1'),
('p_01', 'c_2_1'),
('p_01', 'c_3_1'),
('p_01', 'c_4_1'),
-- p_02 explicitly attached to desired level in paths 1 and 3; include both.
('p_02', 'c_3_3'),
('p_02', 'c_3_4'),
-- p_03 explicitly attached to super-level in path 3; exclude.
('p_03', 'c_2_4'),
-- p_04 explicitly attached to sub-level in path 1,
-- transitively to desired level in path 1; include.
('p_04', 'c_4_2');
-- p_05 not attached at all.
I would like to end up with something like
p_id | c_id
------+----------------
p_01 | {c_3_1}
p_02 | {c_3_3, c_3_4}
p_04 | {c_3_2}
(3 rows)
but the closest I have gotten is
WITH RECURSIVE category_tree (c_id, parent_c_id, depth, path) AS (
SELECT c_id, parent_c_id, 0 AS depth, ARRAY[]::varchar[]
FROM category
WHERE parent_c_id IS NULL
UNION ALL
SELECT c.c_id, c.parent_c_id, ct.depth + 1, path || c.c_id
FROM category_tree AS ct
INNER JOIN category AS c ON c.parent_c_id = ct.c_id
)
SELECT *
INTO TEMP TABLE t_category_path
FROM category_tree;
SELECT p.p_id, ARRAY_AGG(c_id) category_names
FROM product AS p,
(SELECT DISTINCT t1.c_id, p_id
FROM product_category AS pc
INNER JOIN t_category_path AS t1 ON pc.c_id = t1.c_id
WHERE t1.depth = 3
ORDER BY c_id) x
WHERE p.p_id = x.p_id
GROUP BY p.p_id;
p_id | category_names
------+----------------
p_01 | {c_3_1}
p_02 | {c_3_4,c_3_3}
(2 rows)
The order of categories is irrelevant (I want a set, not a list).
I can tolerate duplicate categories far better than missing categories or products.
I have some liberty to adjust the schema.
> select version();
version
--------------------------------------------------------------------------------------------------------------
PostgreSQL 10.12 on x86_64-redhat-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit
step-by-step demo:db<>fiddle
WITH RECURSIVE cte AS (
SELECT c_id, parent_c_id, 0 as level, NULL AS level3_category
FROM category
WHERE parent_c_id IS NULL
UNION
SELECT
c.c_id,
c.parent_c_id,
cte.level + 1,
CASE -- 1
WHEN cte.level + 1 = 3 THEN c.c_id
ELSE cte.level3_category
END
FROM
category c
JOIN
cte
ON c.parent_c_id = cte.c_id
)
SELECT
p_id,
ARRAY_AGG(DISTINCT level3_category) as c_id -- 2
FROM
cte
JOIN
product_category pc
ON cte.c_id = pc.c_id AND cte.level3_category IS NOT NULL
GROUP BY p_id
1. The CASE expression stores the current category id if and only if the row is at level 3. For rows above level 3 (levels 0 to 2) it yields NULL; for rows below level 3 it carries the level 3 ancestor's id forward.
2. DISTINCT is allowed inside aggregate functions, so ARRAY_AGG(DISTINCT ...) removes duplicate values within each group.
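The carry-forward behaviour of the CASE expression can be checked against the sample data from the question. The following is a sketch only: it uses an in-memory SQLite database so that it is self-contained, and because SQLite has no ARRAY_AGG it groups the (product, level 3 category) pairs in Python instead. The `result` dictionary is an illustrative name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE category (c_id TEXT PRIMARY KEY, parent_c_id TEXT);
CREATE TABLE product_category (p_id TEXT, c_id TEXT, PRIMARY KEY (p_id, c_id));
INSERT INTO category VALUES
  ('c_0_1', NULL),
  ('c_1_1', 'c_0_1'), ('c_1_2', 'c_0_1'), ('c_1_3', 'c_0_1'),
  ('c_2_1', 'c_1_1'), ('c_2_2', 'c_1_1'), ('c_2_3', 'c_1_2'), ('c_2_4', 'c_1_3'),
  ('c_3_1', 'c_2_1'), ('c_3_2', 'c_2_2'), ('c_3_3', 'c_2_3'), ('c_3_4', 'c_2_4'),
  ('c_4_1', 'c_3_1'), ('c_4_2', 'c_3_2'), ('c_4_3', 'c_3_3'), ('c_4_4', 'c_3_4');
INSERT INTO product_category VALUES
  ('p_01', 'c_0_1'), ('p_01', 'c_2_1'), ('p_01', 'c_3_1'), ('p_01', 'c_4_1'),
  ('p_02', 'c_3_3'), ('p_02', 'c_3_4'),
  ('p_03', 'c_2_4'),
  ('p_04', 'c_4_2');
""")

rows = conn.execute("""
WITH RECURSIVE cte(c_id, level, level3_category) AS (
    SELECT c_id, 0, NULL FROM category WHERE parent_c_id IS NULL
    UNION ALL
    SELECT c.c_id,
           cte.level + 1,
           -- remember the level 3 node and pass it down to every descendant
           CASE WHEN cte.level + 1 = 3 THEN c.c_id ELSE cte.level3_category END
    FROM category AS c
    JOIN cte ON c.parent_c_id = cte.c_id
)
SELECT DISTINCT pc.p_id, cte.level3_category
FROM cte
JOIN product_category AS pc ON pc.c_id = cte.c_id
WHERE cte.level3_category IS NOT NULL
ORDER BY pc.p_id, cte.level3_category
""").fetchall()

# Group the pairs per product (stands in for ARRAY_AGG(DISTINCT ...)).
result = {}
for p_id, c_id in rows:
    result.setdefault(p_id, []).append(c_id)

print(result)
# {'p_01': ['c_3_1'], 'p_02': ['c_3_3', 'c_3_4'], 'p_04': ['c_3_2']}
```

p_03 drops out because its only category sits above level 3, and p_04 is kept because c_4_2 inherits c_3_2 from its parent.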
You can use exists and not exists with self-joins on the category table to pick out the categories at a particular depth:
select p.p_id, array_agg(pc.c_id)
from product p join
product_category pc
on p.p_id = pc.p_id
-- the category has at least two ancestors ...
where exists (select 1
from category ct join
category ctp
on ct.parent_c_id = ctp.c_id join
category ctp2
on ctp.parent_c_id = ctp2.c_id
where ct.c_id = pc.c_id
) and
-- ... but not three
not exists (select 1
from category ct join
category ctp
on ct.parent_c_id = ctp.c_id join
category ctp2
on ctp.parent_c_id = ctp2.c_id join
category ctp3
on ctp2.parent_c_id = ctp3.c_id
where ct.c_id = pc.c_id
)
group by p.p_id;

How to left join two tables on specific conditions

I have two tables Price(Type, Values) and Product(Seat) and some values.
Price | Product
-------------+---------
Type Values | Seat
S 4 | FO
P 6 | CA
| FA
I know that [FO] and [CA] belong to type [P], and [FA] belongs to type [S]. How can I join these tables and show the associated type and values:
Results
Seat Type Values
----- ----- -----------
FO P 6
CA P 6
FA S 4
You can join the tables like this:
select pr.seat, sum(p.value)
from price p join
product pr
on pr.seat in ('FO', 'CA') and p.type = 'P' or
pr.seat in ('FA') and p.type = 'S'
group by pr.seat;
That said, you should have a proper table that connects the seats to the products, probably called ProductSeats with one row per product and matching seat.
I would use a derived table to store the mapping between price type and seat. This is easily extensible when new requirements come up.
SELECT pri.*, pro.*
FROM price pri
INNER JOIN (
SELECT 'FO' seat, 'P' type
UNION ALL SELECT 'CA' seat, 'P' type
UNION ALL SELECT 'FA' seat, 'S' type
) map ON map.type = pri.type
INNER JOIN product pro ON pro.seat = map.seat
This can be simplified by using the VALUES() syntax:
SELECT pri.*, pro.*
FROM price pri
INNER JOIN (
VALUES('FO', 'P'), ('CA', 'P'), ('FA', 'S')
) AS map(seat, type) ON map.type = pri.type
INNER JOIN product pro ON pro.seat = map.seat
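As a quick check of the mapping idea, here is a small sketch against an in-memory SQLite database. Several things are assumptions made for the example: the Values column is renamed to `value` to avoid the keyword, the mapping is written as a CTE named `seat_type` because SQLite does not accept the `AS map(seat, type)` alias form, and rows are ordered by insertion order only to match the layout in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE price (type TEXT, value INTEGER);
CREATE TABLE product (seat TEXT);
INSERT INTO price VALUES ('S', 4), ('P', 6);
INSERT INTO product VALUES ('FO'), ('CA'), ('FA');
""")

rows = conn.execute("""
WITH seat_type(seat, type) AS (
    -- the seat-to-type knowledge from the question, expressed as data
    VALUES ('FO', 'P'), ('CA', 'P'), ('FA', 'S')
)
SELECT pro.seat, pri.type, pri.value
FROM product AS pro
JOIN seat_type AS m ON m.seat = pro.seat
JOIN price AS pri ON pri.type = m.type
ORDER BY pro.rowid
""").fetchall()

print(rows)  # [('FO', 'P', 6), ('CA', 'P', 6), ('FA', 'S', 4)]
```

If the mapping grows or changes over time, the same join works unchanged once `seat_type` becomes a real table.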

Count total number of rows where this.row is related with rows in another table

I have two simple tables, parents and children. I am trying to count the number of parents who have at least one child.
create table People(
id integer unique,
name varchar(120),
primary key (id)
);
create table children(
id integer unique,
name varchar(120),
parentId integer,
primary key(id),
foreign key (parentId) references People(id)
);
This is the code I tried but it gives me the total number of children instead:
select count(*)
from (people p join children ch on ch.parentid = p.id)
having count(ch.id) > 0;
I am trying to count the number of parents who have at least one child.
This should be as simple as:
SELECT COUNT(*)
FROM people p
WHERE EXISTS (SELECT 1 FROM children c WHERE c.parentid = p.id)
Using EXISTS is usually the most efficient way to check that something, well, exists.
You're close. You just need to make the check for children on a per-parent basis:
SELECT COUNT(*) AS parents_with_children
-- group by id rather than name, since two parents may share a name
FROM (SELECT p.id, COUNT(c.id) AS num_children
FROM people p
JOIN children c ON c.parentid = p.id
GROUP BY p.id
HAVING COUNT(c.id) > 0) p
Demo on dbfiddle
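SELECT COUNT(*), p.*
FROM People p JOIN children c ON c.parentId = p.id
WHERE c.parentId IS NOT NULL
GROUP BY p.id
(There is no need for HAVING, since the join only keeps parents that have children. Note that this returns one row per such parent, with that parent's number of children, rather than a single total.)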
select count(distinct p.id)
from people p inner join children ch
on ch.parentid = p.id
You could try something like this,
SELECT COUNT(DISTINCT children.parentid)
FROM People
INNER JOIN children
ON children.parentid = people.id;
With EXISTS:
select count(distinct p.id) counter from people p
where exists (
select 1 from children
where parentid = p.id
)
or even better:
select count(distinct parentid) counter
from children
because all the info you need is in the table children, so just count the distinct values in column parentid
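The difference between counting joined rows and counting parents is easy to see on a tiny data set. This is a sketch using an in-memory SQLite database; the sample names and the `scalar` helper are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE children (id INTEGER PRIMARY KEY, name TEXT,
                       parentid INTEGER REFERENCES people(id));
INSERT INTO people VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid');
-- Ann has two children, Bob has one, Cid has none; one child has no parent recorded.
INSERT INTO children VALUES (10, 'a1', 1), (11, 'a2', 1), (12, 'b1', 2), (13, 'x', NULL);
""")

def scalar(sql):
    """Run a query that returns a single value (hypothetical helper)."""
    return conn.execute(sql).fetchone()[0]

# A plain join yields one row per child, so COUNT(*) counts children.
joined_rows = scalar(
    "SELECT COUNT(*) FROM people p JOIN children c ON c.parentid = p.id")

# EXISTS asks one yes/no question per parent.
with_exists = scalar("""
    SELECT COUNT(*) FROM people p
    WHERE EXISTS (SELECT 1 FROM children c WHERE c.parentid = p.id)""")

# COUNT(DISTINCT ...) ignores NULLs and collapses repeated parent ids.
with_distinct = scalar("SELECT COUNT(DISTINCT parentid) FROM children")

print(joined_rows, with_exists, with_distinct)  # 3 2 2
```

The children-only variant relies on the foreign key: every non-NULL parentid is then guaranteed to refer to an existing row in people.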

Find the top value for each parent

I'm sure this is a common request but I wouldn't know how to ask for it formally.
I encountered this a long time ago when I was in the Army. A soldier has multiple physical fitness tests, but the primary test that counts is the most recent. The soldier also has multiple marksmanship qualifications, but only the most recent qualification with the assigned weapon is significant.
How do you create a view that itemizes the most significant child of the parent?
Use:
SELECT p.*, x.*
FROM PARENT p
JOIN CHILD x ON x.parent_id = p.id
-- latest date per parent; grouping by the child id as well would make every child its own maximum
JOIN (SELECT c.parent_id,
MAX(c.date_column) AS max_date
FROM CHILD c
GROUP BY c.parent_id) y ON y.parent_id = x.parent_id
AND y.max_date = x.date_column
Assuming SQL Server 2005+:
WITH summary AS (
SELECT p.*,
c.*,
ROW_NUMBER() OVER (PARTITION BY p.id
ORDER BY c.date DESC) AS rank
FROM PARENT p
JOIN CHILD c ON c.parent_id = p.id)
SELECT s.*
FROM summary s
WHERE s.rank = 1
Although I'm not quite sure what you are implying by "itemizing", you can do something like so:
Select ..
From Soldier
Left Join FitnessTest
On FitnessTest.SoldierId = Soldier.Id
And FitnessTest.TestDate = (
Select Max(FT1.TestDate)
From FitnessTest As FT1
Where FT1.SoldierId = FitnessTest.SoldierId
)
Left Join MarksmanshipTest
On MarksmanshipTest.SoldierId = Soldier.Id
And MarksmanshipTest.TestDate = (
Select Max(MT1.TestDate)
From MarksmanshipTest As MT1
Where MT1.SoldierId = MarksmanshipTest.SoldierId
)
This assumes that a soldier cannot have two identical test datetime values for a fitness test or for a marksmanship test.
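The correlated MAX pattern can be tried out on a few rows. The sketch below uses an in-memory SQLite database; the snake_case table and column names and the sample data are assumptions for the example, not the asker's actual schema, and only the fitness test is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE soldier (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE fitness_test (soldier_id INTEGER, test_date TEXT, result INTEGER);
INSERT INTO soldier VALUES (1, 'Adams'), (2, 'Baker'), (3, 'Clark');
INSERT INTO fitness_test VALUES
  (1, '2009-01-10', 250), (1, '2010-03-05', 270),
  (2, '2010-02-01', 300);
""")

rows = conn.execute("""
SELECT s.name, ft.test_date, ft.result
FROM soldier AS s
LEFT JOIN fitness_test AS ft
       ON ft.soldier_id = s.id
      -- keep only the row whose date is the latest one for this soldier
      AND ft.test_date = (SELECT MAX(ft1.test_date)
                          FROM fitness_test AS ft1
                          WHERE ft1.soldier_id = s.id)
ORDER BY s.id
""").fetchall()

for row in rows:
    print(row)
# ('Adams', '2010-03-05', 270)
# ('Baker', '2010-02-01', 300)
# ('Clark', None, None)
```

The LEFT JOIN keeps soldiers with no test at all, and two tests sharing the same latest date would both come back, which is the tie case the assumption above rules out.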
No significant difference from the previous two answers, but a little more detail perhaps:
create table soldier ( soldierId int primary key,
name varchar(100) )
create table fitnessTest ( soldierId int foreign key references soldier,
occurred datetime, result int )
create table marksmanshipTest ( soldierId int foreign key references soldier,
occurred datetime, result int )
;with
mostRecentFitnessTest as
(
select
fitnessTest.soldierId,
fitnessTest.result,
row_number() over (partition by soldierId order by occurred desc) as row
from fitnessTest
),
mostRecentMarksmanshipTest as
(
select
marksmanshipTest.soldierId,
marksmanshipTest.result,
row_number() over (partition by soldierId order by occurred desc) as row
from marksmanshipTest
)
select
soldier.soldierId,
soldier.name,
mostRecentFitnessTest.result,
mostRecentMarksmanshipTest.result
from soldier
left outer join mostRecentFitnessTest on
mostRecentFitnessTest.soldierId = soldier.soldierId
and mostRecentFitnessTest.row = 1
left outer join mostRecentMarksmanshipTest on
mostRecentMarksmanshipTest.soldierId = soldier.soldierId
and mostRecentMarksmanshipTest.row = 1

How to tune a 7-table-join MySQL count query where tables contain 30,000+ rows?

I have an sql query that counts the number of results for a complex query. The actual select query is very fast when limiting to 20 results, but the count version takes about 4.5 seconds on my current tables after lots of optimizing.
If I remove the two joins and where clauses on site tags and gallery tags, the query performs at 1.5 seconds. If I create 3 separate queries - one to select the pay sites, one to select the names and one to pull everything together - I can get the query down to .6 seconds, which is still not good enough. This would also force me to use a stored procedure since I will have to make a total of 4 queries in Hibernate.
For the query "as is", here is some info:
The Handler_read_key is 1746669
The Handler_read_next is 1546324
The gallery table has 40,000 rows
The site table has 900 rows
The name table has 800 rows
The tag table has 3560 rows
I'm pretty new to MySQL and tuning, and I have indexes on the:
'term' column in the tag table
'published' column in the gallery table
'value' for the name table
I am looking to get this query to 0.1 milliseconds.
SELECT count(distinct gallery.id)
from gallery gallery
inner join
site site
on gallery.site_id = site.id
inner join
site_to_tag p2t
on site.id = p2t.site_id
inner join
tag site_tag
on p2t.tag_id = site_tag.id
inner join
gallery_to_name g2mn
on gallery.id = g2mn.gallery_id
inner join
name name
on g2mn.name_id = name.id
inner join
gallery_to_tag g2t
on gallery.id = g2t.gallery_id
inner join
tag tag
on g2t.tag_id = tag.id
where
gallery.published = true and (
name.value LIKE 'sometext%' or
tag.term = 'sometext' or
site.`name` like 'sometext%' or
site_tag.term = 'sometext'
)
Explain Data:
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+--------+-------------------------------------------------------------------+--------------------+---------+-------------------------------------------+------+------------------------------------+
| 1 | SIMPLE | site | index | PRIMARY,nameIndex | nameIndex | 258 | NULL | 950 | Using index; Using temporary |
| 1 | SIMPLE | gallery | ref | PRIMARY,publishedIndex,FKF44C775296EECE37,publishedSiteIdIndex | FKF44C775296EECE37 | 9 | production.site.id | 20 | Using where |
| 1 | SIMPLE | g2mn | ref | PRIMARY,FK3EFFD7F8AFAD7A5E,FK3EFFD7F832C04188 | FK3EFFD7F8AFAD7A5E | 8 | production.gallery.id | 1 | Using index; Distinct |
| 1 | SIMPLE | name | eq_ref | PRIMARY,valueIndex | PRIMARY | 8 | production.g2mn.name_id | 1 | Distinct |
| 1 | SIMPLE | g2t | ref | PRIMARY,FK3DDB4D63AFAD7A5E,FK3DDB4D63E210FBA6 | FK3DDB4D63AFAD7A5E | 8 | production.g2mn.gallery_id | 2 | Using where; Using index; Distinct |
| 1 | SIMPLE | tag | eq_ref | PRIMARY,termIndex | PRIMARY | 8 | production.g2t.tag_id | 1 | Distinct |
| 1 | SIMPLE | p2t | ref | PRIMARY,FK29424AB796EECE37,FK29424AB7E210FBA6 | PRIMARY | 8 | production.gallery.site_id | 3 | Using where; Using index; Distinct |
| 1 | SIMPLE | site_tag | eq_ref | PRIMARY,termIndex | PRIMARY | 8 | production.p2t.tag_id | 1 | Using where; Distinct |
+----+-------------+--------------+--------+-------------------------------------------------------------------+--------------------+---------+-------------------------------------------+------+------------------------------------+
Individual Count Speeds:
[SQL] select count(*) from gallery;
Affected rows: 0
Time: 0.014ms
Results: 40385
[SQL]
select count(*) from gallery_to_name;
Affected rows: 0
Time: 0.012ms
Results: 35615
[SQL]
select count(*) from gallery_to_tag;
Affected rows: 0
Time: 0.055ms
Results: 165104
[SQL]
select count(*) from tag;
Affected rows: 0
Time: 0.002ms
Results: 3560
[SQL]
select count(*) from site;
Affected rows: 0
Time: 0.001ms
Results: 901
[SQL]
select count(*) from site_to_tag;
Affected rows: 0
Time: 0.003ms
Results: 7026
I've included my test schema and a script to produce test data at the end of this post. I have used the SQL_NO_CACHE option to prevent MySQL from caching query results - this is just for testing and should ultimately be removed.
This is a similar idea to that proposed by Donnie, but I have tidied it up a little. If I have understood the joins correctly, there is no need to repeat all the joins in each select, as each is effectively independent from the others. The original WHERE clause stipulates that gallery.published must be true and then follows with a series of 4 conditions joined by OR. Each query can therefore be executed separately. Here are the four joins:
gallery <--> gallery_to_name <--> name
gallery <--> gallery_to_tag <--> tag
gallery <--> site
gallery <--> site <--> site_to_tag <--> tag
Because gallery contains site_id, in this case, there's no need for the intermediate join via the site table. The last join can therefore be reduced to this:
gallery <--> site_to_tag <--> tag
Running each SELECT separately, and using UNION to combine the results, is very fast. The results here assume the table structures and indexes shown at the end of this post:
SELECT SQL_NO_CACHE COUNT(id) AS matches FROM (
(SELECT g.id
FROM gallery AS g
INNER JOIN site AS s ON s.id = g.site_id
WHERE g.published = TRUE AND s.name LIKE '3GRD%')
UNION
(SELECT g.id
FROM gallery AS g
INNER JOIN gallery_to_name AS g2n ON g2n.gallery_id = g.id
INNER JOIN name AS n ON n.id = g2n.name_id
WHERE g.published = TRUE AND n.value LIKE '3GRD%')
UNION
(SELECT g.id
FROM gallery AS g
INNER JOIN gallery_to_tag AS g2t ON g2t.gallery_id = g.id
INNER JOIN tag AS gt ON gt.id = g2t.tag_id
WHERE g.published = TRUE AND gt.term = '3GRD')
UNION
(SELECT g.id
FROM gallery AS g
INNER JOIN site_to_tag AS s2t ON s2t.site_id = g.site_id
INNER JOIN tag AS st ON st.id = s2t.tag_id
WHERE g.published = TRUE AND st.term = '3GRD')
) AS totals;
+---------+
| matches |
+---------+
| 99 |
+---------+
1 row in set (0.00 sec)
The speed does vary depending on the search criteria. In the following example, a different search value is used for each table, and the LIKE operator has to do a little more work, as there are now more potential matches for each:
SELECT SQL_NO_CACHE COUNT(id) AS matches FROM (
(SELECT g.id
FROM gallery AS g
INNER JOIN site AS s ON s.id = g.site_id
WHERE g.published = TRUE AND s.name LIKE '3H%')
UNION
(SELECT g.id
FROM gallery AS g
INNER JOIN gallery_to_name AS g2n ON g2n.gallery_id = g.id
INNER JOIN name AS n ON n.id = g2n.name_id
WHERE g.published = TRUE AND n.value LIKE '3G%')
UNION
(SELECT g.id
FROM gallery AS g
INNER JOIN gallery_to_tag AS g2t ON g2t.gallery_id = g.id
INNER JOIN tag AS gt ON gt.id = g2t.tag_id
WHERE g.published = TRUE AND gt.term = '3IDP')
UNION
(SELECT g.id
FROM gallery AS g
INNER JOIN site_to_tag AS s2t ON s2t.site_id = g.site_id
INNER JOIN tag AS st ON st.id = s2t.tag_id
WHERE g.published = TRUE AND st.term = '3OJX')
) AS totals;
+---------+
| matches |
+---------+
| 12505 |
+---------+
1 row in set (0.24 sec)
These results compare favourably with a query which uses multiple joins:
SELECT SQL_NO_CACHE COUNT(DISTINCT g.id) AS matches
FROM gallery AS g
INNER JOIN gallery_to_name AS g2n ON g2n.gallery_id = g.id
INNER JOIN name AS n ON n.id = g2n.name_id
INNER JOIN gallery_to_tag AS g2t ON g2t.gallery_id = g.id
INNER JOIN tag AS gt ON gt.id = g2t.tag_id
INNER JOIN site AS s ON s.id = g.site_id
INNER JOIN site_to_tag AS s2t ON s2t.site_id = s.id
INNER JOIN tag AS st ON st.id = s2t.tag_id
WHERE g.published = TRUE AND (
gt.term = '3GRD' OR
st.term = '3GRD' OR
n.value LIKE '3GRD%' OR
s.name LIKE '3GRD%');
+---------+
| matches |
+---------+
| 99 |
+---------+
1 row in set (2.62 sec)
SELECT SQL_NO_CACHE COUNT(DISTINCT g.id) AS matches
FROM gallery AS g
INNER JOIN gallery_to_name AS g2n ON g2n.gallery_id = g.id
INNER JOIN name AS n ON n.id = g2n.name_id
INNER JOIN gallery_to_tag AS g2t ON g2t.gallery_id = g.id
INNER JOIN tag AS gt ON gt.id = g2t.tag_id
INNER JOIN site AS s ON s.id = g.site_id
INNER JOIN site_to_tag AS s2t ON s2t.site_id = s.id
INNER JOIN tag AS st ON st.id = s2t.tag_id
WHERE g.published = TRUE AND (
gt.term = '3IDP' OR
st.term = '3OJX' OR
n.value LIKE '3G%' OR
s.name LIKE '3H%');
+---------+
| matches |
+---------+
| 12505 |
+---------+
1 row in set (3.17 sec)
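One thing that may be worth checking on your own data before switching: the UNION form and the single multi-join form are not strictly equivalent. The inner joins in the single query drop any gallery that lacks a name, a tag, or a site tag, whereas each UNION branch only needs its own join path. With the test data above every gallery has all three, so the counts agree. The sketch below illustrates this on a tiny, made-up data set in an in-memory SQLite database (the rows, the search term 'alp' and the `counts` helper are all hypothetical).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE site (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE gallery (id INTEGER PRIMARY KEY, site_id INTEGER, published INTEGER);
CREATE TABLE name (id INTEGER PRIMARY KEY, value TEXT);
CREATE TABLE tag (id INTEGER PRIMARY KEY, term TEXT);
CREATE TABLE gallery_to_name (gallery_id INTEGER, name_id INTEGER);
CREATE TABLE gallery_to_tag (gallery_id INTEGER, tag_id INTEGER);
CREATE TABLE site_to_tag (site_id INTEGER, tag_id INTEGER);
INSERT INTO site VALUES (1, 'alpha'), (2, 'beta');
INSERT INTO gallery VALUES (1, 1, 1), (2, 1, 1), (3, 2, 1), (4, 2, 0);
INSERT INTO name VALUES (1, 'alpine'), (2, 'zed');
INSERT INTO tag VALUES (1, 'alp'), (2, 'other');
INSERT INTO gallery_to_name VALUES (1, 1), (2, 2), (3, 2), (4, 1);
INSERT INTO gallery_to_tag VALUES (1, 2), (2, 2), (3, 1), (4, 1);
INSERT INTO site_to_tag VALUES (1, 2), (2, 2);
""")

OR_JOIN = """
SELECT COUNT(DISTINCT g.id)
FROM gallery g
JOIN gallery_to_name g2n ON g2n.gallery_id = g.id
JOIN name n ON n.id = g2n.name_id
JOIN gallery_to_tag g2t ON g2t.gallery_id = g.id
JOIN tag gt ON gt.id = g2t.tag_id
JOIN site s ON s.id = g.site_id
JOIN site_to_tag s2t ON s2t.site_id = s.id
JOIN tag st ON st.id = s2t.tag_id
WHERE g.published = 1
  AND (gt.term = 'alp' OR st.term = 'alp'
       OR n.value LIKE 'alp%' OR s.name LIKE 'alp%')
"""

UNION = """
SELECT COUNT(id) FROM (
    SELECT g.id FROM gallery g
    JOIN site s ON s.id = g.site_id
    WHERE g.published = 1 AND s.name LIKE 'alp%'
  UNION
    SELECT g.id FROM gallery g
    JOIN gallery_to_name g2n ON g2n.gallery_id = g.id
    JOIN name n ON n.id = g2n.name_id
    WHERE g.published = 1 AND n.value LIKE 'alp%'
  UNION
    SELECT g.id FROM gallery g
    JOIN gallery_to_tag g2t ON g2t.gallery_id = g.id
    JOIN tag gt ON gt.id = g2t.tag_id
    WHERE g.published = 1 AND gt.term = 'alp'
  UNION
    SELECT g.id FROM gallery g
    JOIN site_to_tag s2t ON s2t.site_id = g.site_id
    JOIN tag st ON st.id = s2t.tag_id
    WHERE g.published = 1 AND st.term = 'alp'
) AS totals
"""

def counts():
    """Return (multi-join count, UNION count) for the current data."""
    return (conn.execute(OR_JOIN).fetchone()[0],
            conn.execute(UNION).fetchone()[0])

before = counts()
# A published gallery on site 'alpha' with no names and no tags of its own.
conn.execute("INSERT INTO gallery VALUES (5, 1, 1)")
after = counts()

print(before)  # (3, 3)
print(after)   # (3, 4)
```

Whether the extra gallery should count is a requirements question; if it should not, each branch would need the missing existence checks added back.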
SCHEMA
The indexes on id columns plus site.name, name.value and tag.term are important:
DROP SCHEMA IF EXISTS `egervari`;
CREATE SCHEMA IF NOT EXISTS `egervari`;
USE `egervari`;
-- -----------------------------------------------------
-- Table `site`
-- -----------------------------------------------------
DROP TABLE IF EXISTS `site` ;
CREATE TABLE IF NOT EXISTS `site` (
`id` INT UNSIGNED NOT NULL AUTO_INCREMENT ,
`name` VARCHAR(255) NOT NULL ,
INDEX `name` (`name` ASC) ,
PRIMARY KEY (`id`) )
ENGINE = InnoDB;
-- -----------------------------------------------------
-- Table `gallery`
-- -----------------------------------------------------
DROP TABLE IF EXISTS `gallery` ;
CREATE TABLE IF NOT EXISTS `gallery` (
`id` INT UNSIGNED NOT NULL AUTO_INCREMENT ,
`site_id` INT UNSIGNED NOT NULL ,
`published` TINYINT(1) NOT NULL DEFAULT 0 ,
PRIMARY KEY (`id`) ,
INDEX `fk_gallery_site` (`site_id` ASC) ,
CONSTRAINT `fk_gallery_site`
FOREIGN KEY (`site_id` )
REFERENCES `site` (`id` )
ON DELETE CASCADE
ON UPDATE CASCADE)
ENGINE = InnoDB;
-- -----------------------------------------------------
-- Table `name`
-- -----------------------------------------------------
DROP TABLE IF EXISTS `name` ;
CREATE TABLE IF NOT EXISTS `name` (
`id` INT UNSIGNED NOT NULL AUTO_INCREMENT ,
`value` VARCHAR(255) NOT NULL ,
INDEX `value` (`value` ASC) ,
PRIMARY KEY (`id`) )
ENGINE = InnoDB;
-- -----------------------------------------------------
-- Table `tag`
-- -----------------------------------------------------
DROP TABLE IF EXISTS `tag` ;
CREATE TABLE IF NOT EXISTS `tag` (
`id` INT UNSIGNED NOT NULL AUTO_INCREMENT ,
`term` VARCHAR(255) NOT NULL ,
INDEX `term` (`term` ASC) ,
PRIMARY KEY (`id`) )
ENGINE = InnoDB;
-- -----------------------------------------------------
-- Table `gallery_to_name`
-- -----------------------------------------------------
DROP TABLE IF EXISTS `gallery_to_name` ;
CREATE TABLE IF NOT EXISTS `gallery_to_name` (
`gallery_id` INT UNSIGNED NOT NULL ,
`name_id` INT UNSIGNED NOT NULL ,
PRIMARY KEY (`gallery_id`, `name_id`) ,
INDEX `fk_gallery_to_name_gallery` (`gallery_id` ASC) ,
INDEX `fk_gallery_to_name_name` (`name_id` ASC) ,
CONSTRAINT `fk_gallery_to_name_gallery`
FOREIGN KEY (`gallery_id` )
REFERENCES `gallery` (`id` )
ON DELETE CASCADE
ON UPDATE CASCADE,
CONSTRAINT `fk_gallery_to_name_name`
FOREIGN KEY (`name_id` )
REFERENCES `name` (`id` )
ON DELETE CASCADE
ON UPDATE CASCADE)
ENGINE = InnoDB;
-- -----------------------------------------------------
-- Table `gallery_to_tag`
-- -----------------------------------------------------
DROP TABLE IF EXISTS `gallery_to_tag` ;
CREATE TABLE IF NOT EXISTS `gallery_to_tag` (
`gallery_id` INT UNSIGNED NOT NULL ,
`tag_id` INT UNSIGNED NOT NULL ,
PRIMARY KEY (`gallery_id`, `tag_id`) ,
INDEX `fk_gallery_to_tag_gallery` (`gallery_id` ASC) ,
INDEX `fk_gallery_to_tag_tag` (`tag_id` ASC) ,
CONSTRAINT `fk_gallery_to_tag_gallery`
FOREIGN KEY (`gallery_id` )
REFERENCES `gallery` (`id` )
ON DELETE CASCADE
ON UPDATE CASCADE,
CONSTRAINT `fk_gallery_to_tag_tag`
FOREIGN KEY (`tag_id` )
REFERENCES `tag` (`id` )
ON DELETE CASCADE
ON UPDATE CASCADE)
ENGINE = InnoDB;
-- -----------------------------------------------------
-- Table `site_to_tag`
-- -----------------------------------------------------
DROP TABLE IF EXISTS `site_to_tag` ;
CREATE TABLE IF NOT EXISTS `site_to_tag` (
`site_id` INT UNSIGNED NOT NULL ,
`tag_id` INT UNSIGNED NOT NULL ,
PRIMARY KEY (`site_id`, `tag_id`) ,
INDEX `fk_site_to_tag_site` (`site_id` ASC) ,
INDEX `fk_site_to_tag_tag` (`tag_id` ASC) ,
CONSTRAINT `fk_site_to_tag_site`
FOREIGN KEY (`site_id` )
REFERENCES `site` (`id` )
ON DELETE CASCADE
ON UPDATE CASCADE,
CONSTRAINT `fk_site_to_tag_tag`
FOREIGN KEY (`tag_id` )
REFERENCES `tag` (`id` )
ON DELETE CASCADE
ON UPDATE CASCADE)
ENGINE = InnoDB;
TEST DATA
This populates site with 900 rows, tag with 3560 rows, name with 800 rows and gallery with 40,000 rows, and inserts entries into the link tables:
DELIMITER //
DROP PROCEDURE IF EXISTS populate//
CREATE PROCEDURE populate()
BEGIN
DECLARE i INT DEFAULT 0;
WHILE i < 900 DO
INSERT INTO site (name) VALUES (CONV(i + 1 * 10000, 20, 36));
SET i = i + 1;
END WHILE;
SET i = 0;
WHILE i < 3560 DO
INSERT INTO tag (term) VALUES (CONV(i + 1 * 10000, 20, 36));
INSERT INTO site_to_tag (site_id, tag_id) VALUES ( (i MOD 900) + 1, i + 1 );
SET i = i + 1;
END WHILE;
SET i = 0;
WHILE i < 800 DO
INSERT INTO name (value) VALUES (CONV(i + 1 * 10000, 20, 36));
SET i = i + 1;
END WHILE;
SET i = 0;
WHILE i < 40000 DO
INSERT INTO gallery (site_id, published) VALUES ( (i MOD 900) + 1, i MOD 2 );
INSERT INTO gallery_to_name (gallery_id, name_id) VALUES ( i + 1, (i MOD 800) + 1 );
INSERT INTO gallery_to_tag (gallery_id, tag_id) VALUES ( i + 1, (i MOD 3560) + 1 );
SET i = i + 1;
END WHILE;
END;
//
DELIMITER ;
CALL populate();
Counts are often slow, as they require fetching all the data returned by the cursor in order to figure out how many rows would actually be fetched.
How long does it take to do a count on each of the individual tables? Add up the total times - if it's more than 0.1 milliseconds I don't think you'll be able to get the query to execute as fast as you'd like. As far as ways to speed it up goes, you could try pushing some of the WHERE clause criteria into a sub-select, as in
select
count(distinct this_.id) as y0_
from
(select * from gallery where published=?) this_
inner join
site site3_
on this_.site_id=site3_.id
inner join
site_to_tag list7_
on site3_.id=list7_.site_id
inner join
tag sitetag4_
on list7_.tag_id=sitetag4_.id
inner join
gallery_to_name names9_
on this_.id=names9_.gallery_id
inner join
name name2_
on names9_.name_id=name2_.id
inner join
gallery_to_tag list11_
on this_.id=list11_.gallery_id
inner join
tag tag1_
on list11_.tag_id=tag1_.id
where lower(name2_.value) like ? or
tag1_.term=? or
lower(site3_.name) like ? or
lower(this_.description) like ? or
sitetag4_.term=?
How many fields are on each of these tables? Can you use sub-selects to cut down on the amount of data the database has to join together, or do you really need all the columns?
The presence of three LIKE predicates is going to slow things down, as will the use of the LOWER function in the WHERE clause. If you need to be able to do case-insensitive compares it might be better to have two fields, one in 'normal' (as typed in) case and one stored in lower (or UPPER) case to do insensitive searches on. You could use a trigger to keep the lower/UPPER one in sync with the 'normal' case version.
I hope this helps.
EDIT:
Looking at the EXPLAIN PLAN output it doesn't appear that the fields used in your WHERE clause are indexed - or at least it appears the indexes aren't being used. This could be a by-product of all the OR predicates in the WHERE. If these fields aren't indexed, you might try indexing them.
It appears that your WHERE clause may be the offender, especially the following:
lower(name2_.value) like ?
According to MySQL documentation:
The default character set and collation are latin1 and latin1_swedish_ci, so nonbinary string comparisons are case insensitive by default.
You may not need the LOWER() function in your WHERE clause. Functions on the left side of the comparison prevent the use of indexes.
What do your LIKE values look like? If you are using a wildcard on the left side of the value, it prevents the use of indexes.
Try replacing your OR statements with UNION.
Try running the query without DISTINCT just to see how much it's affecting your query.
OR murders query performance, even with good indexes. It gets worse as tables get larger.
This is horrifically ugly, but it's likely to be faster (at the expense of readability, obviously). If only MySQL supported CTEs, this would be much, much neater.
You could also look into writing a short batch that selects the common part of the repeated query into a temp table, and then doing everything against the temp table. You may or may not have to index the temp table for this to work out well; it really depends on row counts.
(Note that UNION already does a distinct, so there's no need to do it again in the count and force another sort.)
select
count(id)
from (
SELECT gallery.id
from gallery gallery
inner join
site site
on gallery.site_id = site.id
inner join
site_to_tag p2t
on site.id = p2t.site_id
inner join
tag site_tag
on p2t.tag_id = site_tag.id
inner join
gallery_to_name g2mn
on gallery.id = g2mn.gallery_id
inner join
name name
on g2mn.name_id = name.id
inner join
gallery_to_tag g2t
on gallery.id = g2t.gallery_id
inner join
tag tag
on g2t.tag_id = tag.id
where
gallery.published = true and name.value like 'sometext%'
UNION
SELECT gallery.id
from gallery gallery
inner join
site site
on gallery.site_id = site.id
inner join
site_to_tag p2t
on site.id = p2t.site_id
inner join
tag site_tag
on p2t.tag_id = site_tag.id
inner join
gallery_to_name g2mn
on gallery.id = g2mn.gallery_id
inner join
name name
on g2mn.name_id = name.id
inner join
gallery_to_tag g2t
on gallery.id = g2t.gallery_id
inner join
tag tag
on g2t.tag_id = tag.id
where
gallery.published = true and tag.term = 'sometext'
UNION
SELECT gallery.id
from gallery gallery
inner join
site site
on gallery.site_id = site.id
inner join
site_to_tag p2t
on site.id = p2t.site_id
inner join
tag site_tag
on p2t.tag_id = site_tag.id
inner join
gallery_to_name g2mn
on gallery.id = g2mn.gallery_id
inner join
name name
on g2mn.name_id = name.id
inner join
gallery_to_tag g2t
on gallery.id = g2t.gallery_id
inner join
tag tag
on g2t.tag_id = tag.id
where
gallery.published = true and site.`name` like 'sometext%'
UNION
SELECT gallery.id
from gallery gallery
inner join
site site
on gallery.site_id = site.id
inner join
site_to_tag p2t
on site.id = p2t.site_id
inner join
tag site_tag
on p2t.tag_id = site_tag.id
inner join
gallery_to_name g2mn
on gallery.id = g2mn.gallery_id
inner join
name name
on g2mn.name_id = name.id
inner join
gallery_to_tag g2t
on gallery.id = g2t.gallery_id
inner join
tag tag
on g2t.tag_id = tag.id
where
gallery.published = true and site_tag.term = 'sometext'
) as x
I admit I didn't take the time to fully understand your tables and queries. However, for the kind of response time you're asking for, and for the apparent complexity of the current suggestions, I would say this is one of those situations where (instead of asking SQL to tally all the records I want to count) I'd keep a separate table of always-up-to-date counts, and always update any appropriate counts with triggered code upon any record add/change/delete.
Eg, imagine a transaction file with a million rows, and I want the total of field 2. I can ask the db to SUM() the field, or I can keep a separate total for field 2 in a table somewhere that gets adjusted any time a record is added, deleted, or has field 2 edited. It's redundant, but super fast when I want to know the total. And I can always SUM() if I want to audit my separate computed total.
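A minimal sketch of that running-total idea, using SQLite triggers in an in-memory database so that it is self-contained. The `txn` and `txn_total` tables and the trigger names are hypothetical, and MySQL trigger syntax differs in detail (for example it needs FOR EACH ROW), so treat this as an outline of the technique rather than drop-in code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE txn (id INTEGER PRIMARY KEY, amount INTEGER NOT NULL);
-- single-row table holding the always-up-to-date total
CREATE TABLE txn_total (id INTEGER PRIMARY KEY CHECK (id = 1), total INTEGER NOT NULL);
INSERT INTO txn_total VALUES (1, 0);

CREATE TRIGGER txn_ai AFTER INSERT ON txn BEGIN
    UPDATE txn_total SET total = total + NEW.amount WHERE id = 1;
END;
CREATE TRIGGER txn_ad AFTER DELETE ON txn BEGIN
    UPDATE txn_total SET total = total - OLD.amount WHERE id = 1;
END;
CREATE TRIGGER txn_au AFTER UPDATE OF amount ON txn BEGIN
    UPDATE txn_total SET total = total - OLD.amount + NEW.amount WHERE id = 1;
END;
""")

conn.executemany("INSERT INTO txn (amount) VALUES (?)", [(10,), (25,), (7,)])
conn.execute("UPDATE txn SET amount = 30 WHERE amount = 25")
conn.execute("DELETE FROM txn WHERE amount = 7")

# Fast path: read the maintained total. Audit path: recompute with SUM().
cached = conn.execute("SELECT total FROM txn_total WHERE id = 1").fetchone()[0]
audited = conn.execute("SELECT SUM(amount) FROM txn").fetchone()[0]

print(cached, audited)  # 40 40
```

The trade-off is the usual one for denormalised counters: reads become a single-row lookup, while every write to the base table pays for one extra update.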
Hm... just looking at your post for two minutes, so my answer might not be perfect... but have you thought of introducing an index table that links to the other entities?
like
CREATE TABLE `references` (
`text` VARCHAR(...) NOT NULL,
`name` VARCHAR(255) NOT NULL,
`reference_type` WHATEVER, -- enum or what suits your needs
`reference_id` INTEGER NOT NULL
);
Then just query this table:
SELECT COUNT(*) FROM `references` WHERE `text` LIKE ...;
Would have to handle the cases with 'sometext%' though...
Also, is the number of galleries really important, or is your query just intended to check whether a single one exists?