Optimizing Oracle Query

Running EXPLAIN PLAN on this query, I am getting full table access.
The two tables used are:
user_role: 803507 rows
cmp_role: 27 rows
Query:
SELECT
r.user_id, r.role_id, r.participant_code, MAX(status_id)
FROM
user_role r,
cmp_role c
WHERE
r.role_id = c.role_id
AND r.participant_code IS NOT NULL
AND c.group_id = 3
GROUP BY
r.user_id, r.role_id, r.participant_code
HAVING MAX(status_id) IN (SELECT b.status_id FROM USER_ROLE b
WHERE (b.ACTIVE = 1 OR ( b.ACTIVE IN ( 0,3 )
AND SYSDATE BETWEEN b.effective_from_date AND b.effective_to_date
))
)
How can I rewrite this query so that it returns results in a decent time? These are the indexes:
idx 1 = role_id
idx 2 = last_updt_user_id
idx 3 = actv_id, participant_code, effective_from_Date, effective_to_date
idx 4 = user_id, role_id, effective_from_Date, effective_to_date
idx 5 = participant_code, user_id, role_id, actv_cd
Explain plan:
Q_PLAN
--------------------------------------------------------------------------------
SELECT STATEMENT
FILTER
HASH GROUP BY
HASH JOIN
TABLE ACCESS BY INDEX ROWID ROLE
INDEX RANGE SCAN N_ROLE_IDX2
TABLE ACCESS FULL USER_ROLE
TABLE ACCESS BY INDEX ROWID USER_ROLE
INDEX UNIQUE SCAN U_USER_ROLE_IDX1
I do not have enough privileges to gather stats on the tables.
I tried the following change, but it only shaves off 1 or 2 seconds:
WITH CTE AS (SELECT b.status_id FROM USER_ROLE b
WHERE (b.ACTIVE = 1 OR ( b.ACTIVE IN ( 0,3 )
AND SYSDATE BETWEEN b.effective_from_date AND b.effective_to_date
))
)
SELECT
r.user_id, r.role_id, r.participant_code, MAX(status_id)
FROM
user_role r,
cmp_role c
WHERE
r.role_id = c.role_id
AND r.participant_code IS NOT NULL
AND c.group_id = 3
GROUP BY
r.user_id, r.role_id, r.participant_code
HAVING MAX(status_id) IN (select * from CTE)

Firstly, you have the subquery:
SELECT b.status_id FROM USER_ROLE b
WHERE (b.ACTIVE = 1
OR ( b.ACTIVE IN ( 0,3 )
AND SYSDATE BETWEEN b.effective_from_date AND b.effective_to_date )
)
There is no way that you can do anything other than a full table scan to get that result.
You may be missing a join, but not knowing what you expect your query to do, there's no way for us to tell.
Secondly, depending on the proportion of cmp_role records with a group_id of 3, and the proportion of user_role records that match those roles, it may be better off doing the full scan there. If, say, 3 out of the 27 cmp_role records are in group 3, and 100,000 of the user_role records match those cmp_role records, then it can be more efficient to do a single scan of the table than 100,000 index lookups.
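To check which is cheaper on your data, you can hint the indexed path and compare plans and runtimes (a sketch; idx1 stands in for the real name of the role_id index listed in the question):
SELECT /*+ INDEX(r idx1) */
       r.user_id, r.role_id, r.participant_code, MAX(r.status_id)
FROM user_role r
JOIN cmp_role c ON r.role_id = c.role_id
WHERE r.participant_code IS NOT NULL
  AND c.group_id = 3
GROUP BY r.user_id, r.role_id, r.participant_code;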

Collect statistics for the tables, then run EXPLAIN PLAN for the query and show the results.
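For example (a sketch; gathering stats needs privileges you or a DBA would have to supply, and the schema is assumed to be your own):
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(ownname => USER, tabname => 'USER_ROLE');
  DBMS_STATS.GATHER_TABLE_STATS(ownname => USER, tabname => 'CMP_ROLE');
END;
/
EXPLAIN PLAN FOR
SELECT r.user_id, r.role_id, r.participant_code, MAX(status_id)
FROM user_role r, cmp_role c
WHERE r.role_id = c.role_id
  AND r.participant_code IS NOT NULL
  AND c.group_id = 3
GROUP BY r.user_id, r.role_id, r.participant_code
HAVING MAX(status_id) IN (SELECT b.status_id FROM user_role b
                          WHERE b.active = 1 OR (b.active IN (0,3)
                          AND SYSDATE BETWEEN b.effective_from_date AND b.effective_to_date));

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);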

I think the following approach will work. I would have thought the subquery would be evaluated only once since it is not correlated - this doesn't seem to be the case. I tried a similar (simple) query against the sales table in the sh demo schema. I modified it to use a materialized CTE approach and it ran in 1 second as opposed to 18 seconds - 10 times faster. See below for the approach.
with cte as (
select /*+materialize*/ max(amount_sold) from sales)
select prod_id,sum(amount_sold) from
sales
group by prod_id
having max(amount_sold) in(
select * from cte)
/
So in your case you materialize the subquery as:
with CTE as (
SELECT /*+ materialize */ b.status_id FROM USER_ROLE b
WHERE (b.ACTIVE = 1 OR ( b.ACTIVE IN ( 0,3 )
AND SYSDATE BETWEEN b.effective_from_date AND b.effective_to_date
))
)
and select from the CTE in the main query.
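Putting it together, the whole statement would look roughly like this (an untested sketch of the question's query with the hinted CTE):
with cte as (
  select /*+ materialize */ b.status_id
  from user_role b
  where b.active = 1
     or (b.active in (0, 3)
         and SYSDATE between b.effective_from_date and b.effective_to_date)
)
select r.user_id, r.role_id, r.participant_code, max(r.status_id)
from user_role r
join cmp_role c on r.role_id = c.role_id
where r.participant_code is not null
  and c.group_id = 3
group by r.user_id, r.role_id, r.participant_code
having max(r.status_id) in (select status_id from cte);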

So you have a query that currently takes 16.5 seconds and you want it to run faster. To do that, you need to know where those 16.5 seconds are spent. The Oracle database is extremely well instrumented, so you can see in great detail what it is doing. You can check this thread that I wrote on the OTN Forums:
http://forums.oracle.com/forums/thread.jspa?messageID=1812597
Without knowing where your time is being spent, all efforts are just guesses ...
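For example, one common way to see where the time goes is an extended SQL trace of your own session (a sketch; the resulting trace file is then formatted with tkprof):
EXEC DBMS_MONITOR.SESSION_TRACE_ENABLE(waits => TRUE, binds => FALSE);
-- run the slow query here
EXEC DBMS_MONITOR.SESSION_TRACE_DISABLE;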
Regards,
Rob.

Related

SELECT 100 last entries with maximum 3 entries per unique user id

I have the following query to get all artworks inner joined with their user info:
SELECT a.*, row_to_json(u.*) as users
FROM artworks a INNER JOIN users u USING(address)
WHERE (a.flag != 'ILLEGAL' OR a.flag IS NULL)
ORDER BY a.date DESC
LIMIT 100
How could I have the same query but include no more than 3 entries per user?
Each user has a unique id called "address".
I think DISTINCT ON only works for 1 per user; maybe ROW_NUMBER?
Thank you in advance, I'm pretty new to DB queries.
You need an extra column in which you specify the nth time that the user is in the table. This will look something like this:
USER | N
user1 | 1
user1 | 2
user1 | 3
user2 | 1
user2 | 2
Getting the extra column in a new table can be done by using the following code
--Create new table as T
WITH T AS (
SELECT
a.*,
row_to_json(u.*) as users,
ROW_NUMBER() OVER(PARTITION BY u.address ORDER BY a.date DESC) AS N
FROM artworks a INNER JOIN users u USING(address)
WHERE (a.flag != 'ILLEGAL' OR a.flag IS NULL) )
--Select columns from your new table
SELECT columns from T
WHERE (T.N = 1 OR T.N = 2 OR T.N = 3)
ORDER BY date DESC
LIMIT 100
Just an addition to your original query will do. Count the resulting records for each user and then filter by the counter value.
I am using users.address as the user id.
SELECT * from
(
SELECT a.*, row_to_json(u.*) as userinfo,
row_number() over (partition by u.address order by a.date desc) as ucount
FROM artworks a INNER JOIN users u ON a.address = u.address
WHERE a.flag != 'ILLEGAL' OR a.flag IS NULL
) t
WHERE ucount <= 3
ORDER BY date DESC
LIMIT 100;
A remark - you have users as a column alias and as a table name, which may cause confusion. I have changed the alias to userinfo.

Optimize a complex PostgreSQL Query

I am attempting to make a complex SQL join on several tables, as shown below. I have also included an image of the DB schema.
Consider table_1 -
e_id name
1 a
2 b
3 c
4 d
and table_2 -
e_id date
1 1/1/2019
1 1/1/2020
2 2/1/2019
4 2/1/2019
The issue here is performance. From tables 2-4 we only want the most recent entry for a given e_id, but because these tables contain historical data (over 3.5M rows) it's quite slow. I've attached an example of how we're currently trying to achieve this, but it only includes one join of 'table_1' with 'table_x'. We group by e_id and get the max date for it. The other way we've thought about doing this is creating a Materialized View, pulling data from that, and refreshing it after some period of time. Any improvements welcome.
from fds.region as rg
inner join (
select e_id, name, p_id
from fds.table_1
where sec_type = 'S' AND active_flag = 1
) as table_1 on table_1.e_id = rg.e_id
inner join fds.table_2 table_2 on table_2.e_id = rg.e_id
inner join fds.sec sec on sec.p_id = table_1.p_id
inner join fds.entity ent on ent.int_entity_id = sec.int_entity_id
inner join (
SELECT int_1.e_id, int_1.date, int_1.int_price
FROM fds.table_4 int_1
INNER JOIN (
SELECT e_id, MAX(date) date
FROM fds.table_2
GROUP BY e_id
) int_2 ON int_1.e_id = int_2.e_id AND int_1.date = int_2.date
) as table_4 on table_4.e_id = rg.e_id
where rg.region_str like '%US' and ent.sec_type = 'P'
order by table_2.int_price
limit 500;
You can simplify this logic:
(
SELECT int_1.e_id, int_1.date, int_1.int_price
FROM fds.table_4 int_1
INNER JOIN (
SELECT e_id, MAX(date) date
FROM fds.table_2
GROUP BY e_id
) int_2 ON int_1.e_id = int_2.e_id AND int_1.date = int_2.date
) as table_4
To:
(SELECT DISTINCT ON (int_1.e_id) int_1.*
FROM fds.table_4 int_1
ORDER BY int_1.e_id, int_1.date DESC
) table_4
This can take advantage of an index on fds.table_4(e_id, date desc) -- and might be wicked fast with such an index.
You also want appropriate indexes for the joins and filtering. However, it is hard to be more specific without an execution plan.
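For example (a sketch; index names are illustrative):
-- supports DISTINCT ON (e_id) ... ORDER BY e_id, date DESC
CREATE INDEX table_4_e_id_date_idx ON fds.table_4 (e_id, date DESC);
-- join column used against fds.region
CREATE INDEX table_2_e_id_idx ON fds.table_2 (e_id);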

Postgres LEFT JOIN with SUM, missing records

I am trying to get the count of certain types of records in a related table. I am using a left join.
So I have a query that isn't quite right and one that is returning the correct results. The correct-results query has a higher execution cost. I'd like to use the first approach, if I can correct the results. (see http://sqlfiddle.com/#!15/7c20b/5/2)
CREATE TABLE people(
id SERIAL,
name varchar not null
);
CREATE TABLE pets(
id SERIAL,
name varchar not null,
kind varchar not null,
alive boolean not null default false,
person_id integer not null
);
INSERT INTO people(name) VALUES
('Chad'),
('Buck'); --can't keep pets alive
INSERT INTO pets(name, alive, kind, person_id) VALUES
('doggio', true, 'dog', 1),
('dog master flash', true, 'dog', 1),
('catio', true, 'cat', 1),
('lucky', false, 'cat', 2);
My goal is to get a table back with ALL of the people and the counts of the KINDS of pets they have alive:
| ID | ALIVE_DOGS_COUNT | ALIVE_CATS_COUNT |
|----|------------------|------------------|
| 1 | 2 | 1 |
| 2 | 0 | 0 |
I made the example more trivial. In our production app (not really pets) there would be about 100,000 dead dogs and cats per person. Pretty screwed up I know, but this example is simpler to relay ;) I was hoping to filter all the 'dead' stuff out before the count. I have the slower query in production now (from sqlfiddle above), but would love to get the LEFT JOIN version working.
Typically fastest if you fetch all or most rows:
SELECT pp.id
, COALESCE(pt.a_dog_ct, 0) AS alive_dogs_count
, COALESCE(pt.a_cat_ct, 0) AS alive_cats_count
FROM people pp
LEFT JOIN (
SELECT person_id
, count(kind = 'dog' OR NULL) AS a_dog_ct
, count(kind = 'cat' OR NULL) AS a_cat_ct
FROM pets
WHERE alive
GROUP BY 1
) pt ON pt.person_id = pp.id;
Indexes are irrelevant here; full table scans will be fastest. Except if alive pets are a rare case, in which case a partial index should help. Like:
CREATE INDEX pets_alive_idx ON pets (person_id, kind) WHERE alive;
I included all columns needed for the query (person_id, kind) to allow index-only scans.
SQL Fiddle.
Typically fastest for a small subset or a single row:
SELECT pp.id
, count(kind = 'dog' OR NULL) AS alive_dogs_count
, count(kind = 'cat' OR NULL) AS alive_cats_count
FROM people pp
LEFT JOIN pets pt ON pt.person_id = pp.id
AND pt.alive
WHERE <some condition to retrieve a small subset>
GROUP BY 1;
You should at least have an index on pets.person_id for this (or the partial index from above) - and possibly more, depending on the WHERE condition.
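A minimal sketch of that index:
CREATE INDEX pets_person_id_idx ON pets (person_id);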
Related answers:
Query with LEFT JOIN not returning rows for count of 0
GROUP or DISTINCT after JOIN returns duplicates
Get count of foreign key from multiple tables
Your WHERE alive=true is actually filtering out the record for person_id = 2. Use the query below, which pushes the alive = true condition into the CASE expression. See your modified Fiddle.
SELECT people.id,
pe.alive_dogs_count,
pe.alive_cats_count
FROM people
LEFT JOIN
(
select person_id,
COALESCE(SUM(case when pets.kind='dog' and alive = true then 1 else 0 end),0) as alive_dogs_count,
COALESCE(SUM(case when pets.kind='cat' and alive = true then 1 else 0 end),0) as alive_cats_count
from pets
GROUP BY person_id
) pe on people.id = pe.person_id
(OR) your version
SELECT
people.id,
COALESCE(SUM(case when pets.kind='dog' and alive = true then 1 else 0 end),0) as alive_dogs_count,
COALESCE(SUM(case when pets.kind='cat' and alive = true then 1 else 0 end),0) as alive_cats_count
FROM people
LEFT JOIN pets on people.id = pets.person_id
GROUP BY people.id;
JOIN with SUM
I think your original query was something like this:
SELECT people.id, stats.dog, stats.cat
FROM people
JOIN (
SELECT person_id
, count(kind) FILTER (WHERE kind = 'dog') AS dog
, count(kind) FILTER (WHERE kind = 'cat') AS cat
FROM pets
WHERE alive
GROUP BY person_id
) stats
ON stats.person_id = people.id
That works smoothly, but you should understand that the result will miss the people with 0 pets, because of the inner join.
In order to include people who have no pets, you can:
firstly LEFT JOIN,
then GROUP BY the joined result,
and be ready for NULL values instead of counts.
See the accepted answer above.
Credits to @ErwinBrandstetter
Slowness
In contrast to some other DBMSs, PostgreSQL doesn't automatically create indexes on foreign key columns.
One multicolumn index will be more efficient than three single-column indexes. Extend the foreign key index with the extra columns used in the WHERE and JOIN ON clauses, in the right order:
CREATE INDEX people_fk_with_kind_alive ON pets (person_id, alive, kind);
REF: https://postgresql.org/docs/11/indexes-multicolumn.html
Of course, your primary keys should be defined. The primary key will be indexed by default.
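For completeness, a sketch of the missing pieces in the question's schema (the SERIAL columns were declared without PRIMARY KEY, and no foreign key was defined; constraint names are illustrative):
ALTER TABLE people ADD PRIMARY KEY (id);
ALTER TABLE pets ADD PRIMARY KEY (id);
-- the referencing column still gets no automatic index; see the multicolumn index above
ALTER TABLE pets ADD CONSTRAINT pets_person_fk
  FOREIGN KEY (person_id) REFERENCES people (id);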

GROUP BY and COUNT in PostgreSQL

The query:
SELECT COUNT(*) as count_all,
posts.id as post_id
FROM posts
INNER JOIN votes ON votes.post_id = posts.id
GROUP BY posts.id;
Returns n records in Postgresql:
count_all | post_id
-----------+---------
1 | 6
3 | 4
3 | 5
3 | 1
1 | 9
1 | 10
(6 rows)
I just want to retrieve the number of records returned: 6.
I used a subquery to achieve what I want, but this doesn't seem optimum:
SELECT COUNT(*) FROM (
SELECT COUNT(*) as count_all, posts.id as post_id
FROM posts
INNER JOIN votes ON votes.post_id = posts.id
GROUP BY posts.id
) as x;
How would I get the number of records in this context right in PostgreSQL?
I think you just need COUNT(DISTINCT post_id) FROM votes.
See "4.2.7. Aggregate Expressions" section in http://www.postgresql.org/docs/current/static/sql-expressions.html.
EDIT: Corrected my careless mistake per Erwin's comment.
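That is, against the question's schema:
SELECT COUNT(DISTINCT post_id) FROM votes;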
There is also EXISTS:
SELECT count(*) AS post_ct
FROM posts p
WHERE EXISTS (SELECT FROM votes v WHERE v.post_id = p.id);
In Postgres and with multiple entries on the n-side like you probably have, it's generally faster than count(DISTINCT post_id):
SELECT count(DISTINCT p.id) AS post_ct
FROM posts p
JOIN votes v ON v.post_id = p.id;
The more rows per post there are in votes, the bigger the difference in performance. Test with EXPLAIN ANALYZE.
count(DISTINCT post_id) has to read all rows, sort or hash them, and then only consider the first per identical set. EXISTS will only scan votes (or, preferably, an index on post_id) until the first match is found.
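For example, to compare the two on your own data (both queries from above):
EXPLAIN ANALYZE
SELECT count(*) AS post_ct
FROM posts p
WHERE EXISTS (SELECT FROM votes v WHERE v.post_id = p.id);

EXPLAIN ANALYZE
SELECT count(DISTINCT p.id) AS post_ct
FROM posts p
JOIN votes v ON v.post_id = p.id;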
If every post_id in votes is guaranteed to be present in the table posts (referential integrity enforced with a foreign key constraint), this short form is equivalent to the longer form:
SELECT count(DISTINCT post_id) AS post_ct
FROM votes;
May actually be faster than the EXISTS query with no or few entries per post.
The query you had works in simpler form, too:
SELECT count(*) AS post_ct
FROM (
SELECT FROM posts
JOIN votes ON votes.post_id = posts.id
GROUP BY posts.id
) sub;
Benchmark
To verify my claims I ran a benchmark on my test server with limited resources. All in a separate schema:
Test setup
Fake a typical post / vote situation:
CREATE SCHEMA y;
SET search_path = y;
CREATE TABLE posts (
id int PRIMARY KEY
, post text
);
INSERT INTO posts
SELECT g, repeat(chr(g%100 + 32), (random()* 500)::int) -- random text
FROM generate_series(1,10000) g;
DELETE FROM posts WHERE random() > 0.9; -- create ~ 10 % dead tuples
CREATE TABLE votes (
vote_id serial PRIMARY KEY
, post_id int REFERENCES posts(id)
, up_down bool
);
INSERT INTO votes (post_id, up_down)
SELECT g.*
FROM (
SELECT ((random()* 21)^3)::int + 1111 AS post_id -- uneven distribution
, random()::int::bool AS up_down
FROM generate_series(1,70000)
) g
JOIN posts p ON p.id = g.post_id;
All of the following queries returned the same result (8093 of 9107 posts had votes).
I ran 4 tests with EXPLAIN ANALYZE and took the best of five on Postgres 9.1.4 with each of the three queries, and appended the resulting total runtimes.
1. As is.
2. After: ANALYZE posts; ANALYZE votes;
3. After: CREATE INDEX foo ON votes(post_id);
4. After: VACUUM FULL ANALYZE posts; CLUSTER votes USING foo;
count(*) ... WHERE EXISTS
1. 253 ms
2. 220 ms
3. 85 ms -- winner (seq scan on posts, index scan on votes, nested loop)
4. 85 ms
count(DISTINCT x) - long form with join
1. 354 ms
2. 358 ms
3. 373 ms -- (index scan on posts, index scan on votes, merge join)
4. 330 ms
count(DISTINCT x) - short form without join
1. 164 ms
2. 164 ms
3. 164 ms -- (always seq scan)
4. 142 ms
Best time for original query in question:
353 ms
For simplified version:
348 ms
@wildplasser's query with a CTE uses the same plan as the long form (index scan on posts, index scan on votes, merge join) plus a little overhead for the CTE. Best time:
366 ms
Index-only scans in the upcoming PostgreSQL 9.2 can improve the result for each of these queries, most of all for EXISTS.
Related, more detailed benchmark for Postgres 9.5 (actually retrieving distinct rows, not just counting):
Select first row in each GROUP BY group?
Using OVER() and LIMIT 1:
SELECT COUNT(1) OVER()
FROM posts
INNER JOIN votes ON votes.post_id = posts.id
GROUP BY posts.id
LIMIT 1;
WITH uniq AS (
SELECT DISTINCT posts.id as post_id
FROM posts
JOIN votes ON votes.post_id = posts.id
-- GROUP BY not needed anymore
-- GROUP BY posts.id
)
SELECT COUNT(*)
FROM uniq;
For followers, I like the OP's inner query method:
SELECT COUNT(*) FROM (
SELECT COUNT(*) as count_all, posts.id as post_id
FROM posts
INNER JOIN votes ON votes.post_id = posts.id
GROUP BY posts.id
) as x;
Since then you can use HAVING in there as well:
SELECT COUNT(*) FROM (
SELECT COUNT(*) as count_all, posts.id as post_id
FROM posts
INNER JOIN votes ON votes.post_id = posts.id
GROUP BY posts.id HAVING count(*) > 1
) as x;
Or the equivalent CTE
with posts_coalesced as (
SELECT COUNT(*) as count_all, posts.id as post_id
FROM posts
INNER JOIN votes ON votes.post_id = posts.id
GROUP BY posts.id )
select count(*) from posts_coalesced;

MySQL query - possible to include this clause?

I have the following query, which retrieves 4 adverts from certain categories in a random order.
At the moment, if a user has more than 1 advert, then potentially all of those ads might be retrieved - I need to limit it so that only 1 ad per user is displayed.
Is this possible to achieve in the same query?
SELECT a.advert_id, a.title, a.url, a.user_id,
FLOOR(1 + RAND() * x.m_id) 'rand_ind'
FROM adverts AS a
INNER JOIN advert_categories AS ac
ON a.advert_id = ac.advert_id,
(
SELECT MAX(t.advert_id) - 1 'm_id'
FROM adverts t
) x
WHERE ac.category_id IN
(
SELECT category_id
FROM website_categories
WHERE website_id = '8'
)
AND a.advert_type = 'text'
GROUP BY a.advert_id
ORDER BY rand_ind
LIMIT 4
Note: The solution is the last query at the bottom of this answer.
Test Schema and Data
create table adverts (
advert_id int primary key, title varchar(20), url varchar(20), user_id int, advert_type varchar(10))
;
create table advert_categories (
advert_id int, category_id int, primary key(category_id, advert_id))
;
create table website_categories (
website_id int, category_id int, primary key(website_id, category_id))
;
insert website_categories values
(8,1),(8,3),(8,5),
(1,1),(2,3),(4,5)
;
insert adverts (advert_id, title, user_id) values
(1, 'StackExchange', 1),
(2, 'StackOverflow', 1),
(3, 'SuperUser', 1),
(4, 'ServerFault', 1),
(5, 'Programming', 1),
(6, 'C#', 2),
(7, 'Java', 2),
(8, 'Python', 2),
(9, 'Perl', 2),
(10, 'Google', 3)
;
update adverts set advert_type = 'text'
;
insert advert_categories values
(1,1),(1,3),
(2,3),(2,4),
(3,1),(3,2),(3,3),(3,4),
(4,1),
(5,4),
(6,1),(6,4),
(7,2),
(8,1),
(9,3),
(10,3),(10,5)
;
Data properties
each website can belong to multiple categories
for simplicity, all adverts are of type 'text'
each advert can belong to multiple categories. If a website has multiple categories that are matched multiple times in advert_categories for the same user_id, the same advert_id shows up more than once in a straight join between the 3 tables, as in the next query.
This query joins the 3 tables together (notice that ids 1, 3 and 10 each appear twice)
select *
from website_categories wc
inner join advert_categories ac on wc.category_id = ac.category_id
inner join adverts a on a.advert_id = ac.advert_id and a.advert_type = 'text'
where wc.website_id='8'
order by a.advert_id
To make each advert show only once, this is the core query that shows all eligible ads, each only once:
select *
from adverts a
where a.advert_type = 'text'
and exists (
select *
from website_categories wc
inner join advert_categories ac on wc.category_id = ac.category_id
where wc.website_id='8'
and a.advert_id = ac.advert_id)
The next query retrieves all the advert_id's to be shown
select advert_id, user_id
from (
select
advert_id, user_id,
@r := @r + 1 r
from (select @r := 0) r
cross join
(
# core query -- vvv
select a.advert_id, a.user_id
from adverts a
where a.advert_type = 'text'
and exists (
select *
from website_categories wc
inner join advert_categories ac on wc.category_id = ac.category_id
where wc.website_id='8'
and a.advert_id = ac.advert_id)
# core query -- ^^^
order by rand()
) EligibleAdsAndUserIDs
) RowNumbered
group by user_id
order by r
limit 2
There are 3 levels to this query
aliased EligibleAdsAndUserIDs: core query, sorted randomly using order by rand()
aliased RowNumbered: row number added to core query, using MySQL side-effecting @variables
the outermost query forces MySQL to collect rows as numbered randomly in the inner queries, and group by user_id causes it to retain only the first row for each user_id. limit 2 causes the query to stop as soon as two distinct user_id's have been encountered.
This is the final query, which takes the advert_id's from the previous query and joins them back to the adverts table to retrieve the required columns. It shows:
each user_id only once
users with more ads proportionally (statistically) more often, according to the number of eligible ads they have
Note: Point (2) works because the more ads you have, the more likely you are to hit the top placings in the row-numbering subquery
select a.advert_id, a.title, a.url, a.user_id
from
(
select advert_id
from (
select
advert_id, user_id,
@r := @r + 1 r
from (select @r := 0) r
cross join
(
# core query -- vvv
select a.advert_id, a.user_id
from adverts a
where a.advert_type = 'text'
and exists (
select *
from website_categories wc
inner join advert_categories ac on wc.category_id = ac.category_id
where wc.website_id='8'
and a.advert_id = ac.advert_id)
# core query -- ^^^
order by rand()
) EligibleAdsAndUserIDs
) RowNumbered
group by user_id
order by r
limit 2
) Top2
inner join adverts a on a.advert_id = Top2.advert_id;
I'm thinking through something but don't have MySQL available... can you try this query to see if it works or crashes...
SELECT
PreQuery.user_id,
(select max( tmp.someRandom ) from PreQuery tmp where tmp.User_ID = PreQuery.User_ID ) MaxRandom
from
( select adverts.user_id,
rand() someRandom
from adverts, advert_categories
where adverts.advert_id = advert_categories.advert_id ) PreQuery
If the "tmp" alias is recognized as a temp buffer of the preliminary query as defined by the outer FROM clause, I might have something that will work... I think a field defined as a select statement against a queried-from derived table WON'T work, but if it does, I know I'll have something solid for you.
Ok, this one might make the head hurt a bit, but let's get the logic going... The innermost "Core Query" is a basis that gets all unique and randomly assigned QUALIFIED users that have a qualifying ad based on the category chosen, and type = 'text'. Since the order is random, I don't care what the assigned sequence is, and order by that. The limit 4 will return the first 4 entries that qualify. This is regardless of one user having 1 ad vs another having 1000 ads.
Next, join to the advertisements, reversing the table / join qualifications... but by having a WHERE - IN SUB-SELECT, the sub-select will be on each unique USER ID that was qualified by the "CoreQuery" and will ONLY be done 4 times based on ITS inner limit. So even if there are 100 users with different advertisements, we get 4 users.
Now, the Join to the CoreQuery is the Advert Table based on the same qualifying user. Typically this would join ALL records against the core query given they are for the same user in question... This is correct... HOWEVER, the NEXT WHERE clause is what filters it down to only ONE ad for the given person.
The sub-select makes sure its "Advert_ID" matches the one selected in the sub-select. The sub-select is based ONLY on the current "CoreQuery.user_ID" and gets ALL the qualifying category / ads for the user (wrong... we don't want ALL ads)... So, adding an ORDER BY RAND() will randomize only this one person's ads in the result set... then limiting THAT by 1 will only give ONE of their qualified ads...
So, the CoreQuery restricts down to 4 users. Then for each qualified user ID, gets only 1 of the qualified ads (by its inner order by RAND() and LIMIT 1 )...
Although I don't have MySQL to try it, the queries are COMPLETELY legit and I hope they work for you... man, I love brain teasers like this...
SELECT
ad1.*
from
( SELECT ad.user_id,
count(*) as UserAdCount,
RAND() as ANYRand
from
website_categories wc
inner join advert_categories ac
ON wc.category_id = ac.category_id
inner join adverts ad
ON ac.advert_id = ad.advert_id
AND ad.advert_type = 'text'
where
wc.website_id = 8
GROUP BY
1
order by
3
limit
4 ) CoreQuery,
adverts ad1
WHERE
ad1.advert_type = 'text'
AND CoreQuery.User_ID = ad1.User_ID
AND ad1.advert_id in
( select
ad2.advert_id
FROM
adverts ad2,
advert_categories ac2,
website_categories wc2
WHERE
ad2.user_id = CoreQuery.user_id
AND ad2.advert_id = ac2.advert_id
AND ac2.category_id = wc2.category_id
AND wc2.website_id = 8
ORDER BY
RAND()
LIMIT
1 )
I'd like to suggest that you do the randomization in PHP. This is way faster than doing it in MySQL.
"However, when the table is large (over about 10,000 rows) this method of selecting a random row becomes increasingly slow with the size of the table and can create a great load on the server. I tested this on a table I was working that contained 2,394,968 rows. It took 717 seconds (12 minutes!) to return a random row."
http://www.greggdev.com/web/articles.php?id=6
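The usual workaround in such articles is to pick a random key value and seek to it, instead of sorting the whole table with ORDER BY RAND(); a rough sketch (it assumes advert_id is numeric and reasonably gap-free, otherwise the distribution is skewed):
SELECT a.*
FROM adverts a
JOIN (SELECT FLOOR(1 + RAND() * (SELECT MAX(advert_id) FROM adverts)) AS rid) r
  ON a.advert_id >= r.rid
ORDER BY a.advert_id
LIMIT 1;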
set @userid = -1;
select id, title
from (
select
s.id,
s.title,
case when @userid = s.userid then
0
else
1
end as isfirst,
(@userid := s.userid) as track
from (
select a.id, a.title, a.userid
from adverts a
inner join advertcategories ac on ac.advertid = a.advertid
inner join categories c on c.categoryid = ac.categoryid
where c.website = 8
order by a.userid, rand()
) s
) marked
where isfirst = 1
limit 4
Add COUNT(a.user_id) AS owned to the main SELECT list and add HAVING owned < 2 after the GROUP BY.
http://dev.mysql.com/doc/refman/5.5/en/select.html
I think this is the way to do it: if a user has more than one advert, then we will not select it.
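Applied to the original query, that suggestion would look roughly like this (an untested sketch; note that with GROUP BY a.advert_id the count reflects matching category rows per advert, so it may need adjusting to count ads per user):
SELECT a.advert_id, a.title, a.url, a.user_id,
       COUNT(a.user_id) AS owned,
       FLOOR(1 + RAND() * x.m_id) AS rand_ind
FROM adverts AS a
INNER JOIN advert_categories AS ac ON a.advert_id = ac.advert_id,
     (SELECT MAX(t.advert_id) - 1 AS m_id FROM adverts t) x
WHERE ac.category_id IN (SELECT category_id FROM website_categories WHERE website_id = '8')
  AND a.advert_type = 'text'
GROUP BY a.advert_id
HAVING owned < 2
ORDER BY rand_ind
LIMIT 4;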