How to implement/improve/make faster this query with joins - sql

Please bear with me, I am not skilled in SQL.
I have three tables:
1) Notifications - stores all my data
2) GroupTable - has the names of groups and their related IDs
3) GroupUser - maps Uname and Udob to a group from GroupTable
Before I fetch records from Notifications, I want to look up the GroupID in GroupTable, then use that GroupID to find all matching records in GroupUser (Name and DOB, as these are unique). Once I have this data, I want to fetch the records from Notifications for those names and DOBs, in ascending order of date.
So far I have the following query. It works fine, but I am not satisfied with it and think it can be improved:
SELECT
*
FROM
(SELECT
*
FROM Notifications
WHERE
DateToNotify < '2016-03-24' AND
NotificationDateFor IN
(SELECT gu.Name
FROM GroupUser AS gu
INNER JOIN GroupTable AS gt ON
gu.GroupID = gt._id AND
gt.GroupName = "Groupn"
) AND
DOB IN
(SELECT gu.DOB
FROM GroupUser AS gu
INNER JOIN GroupTable AS gt ON
gu.GroupID = gt._id AND
gt.GroupName = "Groupn"
)
) as T
ORDER BY
SUBSTR(DATE('NOW'), 0) > SUBSTR(DateToNotify, 0)
, SUBSTR(DateToNotify, 0)

I don't think that you would get this faster with joins instead of the IN clauses. It may be that rewriting would not even change the execution plan, because the DBMS tries to access the data in the optimal way anyhow.
It seems a bit strange that you don't look for group users matching both name and dob, but only ensure that there are group users matching the name and (possibly other) group users matching the dob. But as you say that the query works fine as is, okay.
EDIT: Okay, according to your comment you actually want groupuser matches on both name and dob. So what you are looking for would be
AND (NotificationDateFor, DOB) IN (SELECT gu.Name, gu.DOB FROM ...)
But SQLite didn't support this beautiful syntax when this was written (row values were only added in SQLite 3.15.0; Oracle was the only DBMS I knew of that supported it).
So you either join or use EXISTS.
With JOIN:
select distinct n.*
from notifications n
join
(
select name, dob
from groupuser
where groupid = (select _id from grouptable where groupname = 'groupn')
) as gu on n.notificationdatefor = gu.name and n.dob = gu.dob
where n.datetonotify < '2016-03-24'
order by date('now') > n.datetonotify, n.datetonotify;
With EXISTS:
select *
from notifications n
where datetonotify < '2016-03-24'
and exists
(
select *
from groupuser gu
where gu.groupid = (select _id from grouptable where groupname = 'groupn')
and gu.name = n.notificationdatefor
and gu.dob = n.dob
)
order by date('now') > n.datetonotify, n.datetonotify;
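To see why matching name and dob as a pair matters, here is a minimal sketch using Python's built-in sqlite3 module. The sample rows and the simplified subqueries (GroupID = 1 instead of the GroupTable lookup) are assumptions made up for illustration; only the table and column names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE GroupUser (GroupID INTEGER, Name TEXT, DOB TEXT);
    CREATE TABLE Notifications (NotificationDateFor TEXT, DOB TEXT, DateToNotify TEXT);

    -- Hypothetical sample data: two users in group 1.
    INSERT INTO GroupUser VALUES (1, 'Ann', '1990-01-01'), (1, 'Bob', '1985-05-05');
    -- The second notification mixes Ann's name with Bob's DOB.
    INSERT INTO Notifications VALUES
        ('Ann', '1990-01-01', '2016-03-01'),
        ('Ann', '1985-05-05', '2016-03-02');
""")

# Two independent IN clauses: name and DOB may come from different group users.
two_in_clauses = conn.execute("""
    SELECT NotificationDateFor, DOB FROM Notifications
    WHERE NotificationDateFor IN (SELECT Name FROM GroupUser WHERE GroupID = 1)
      AND DOB IN (SELECT DOB FROM GroupUser WHERE GroupID = 1)
    ORDER BY DateToNotify
""").fetchall()

# EXISTS with both conditions: name and DOB must belong to the same group user.
exists_pair = conn.execute("""
    SELECT n.NotificationDateFor, n.DOB FROM Notifications n
    WHERE EXISTS (SELECT 1 FROM GroupUser gu
                  WHERE gu.GroupID = 1
                    AND gu.Name = n.NotificationDateFor
                    AND gu.DOB = n.DOB)
    ORDER BY n.DateToNotify
""").fetchall()

print(two_in_clauses)  # includes the mismatched ('Ann', '1985-05-05') row
print(exists_pair)     # only the row that matches one user on both columns
```

With this data the two-IN version also returns the mismatched row, while the EXISTS version does not.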

Related

SQL - Select highest value when data across 3 tables

I have 3 tables:
Person (with a column PersonKey)
Telephone (with columns Tel_NumberKey, Tel_Number, Tel_NumberType e.g. 1=home, 2=mobile)
xref_Person+Telephone (columns PersonKey, Tel_NumberKey, CreatedDate, ModifiedDate)
I'm looking to get the most recent entry (e.g. the highest Tel_NumberKey) from xref_Person+Telephone for each Person, and use that Tel_NumberKey to get the actual Tel_Number from the Telephone table.
The problem I am having is that I keep getting duplicates for the same Tel_NumberKey. I also need to be sure I get both the home and mobile numbers from the Telephone table, which I've been trying to do via 2 individual joins, one for each Tel_NumberType, again getting duplicates.
I've been trying the following, but to no avail:
-- For HOME
SELECT
p.PersonKey, pn.Phone_Number, pn.Tel_NumberKey
FROM
Persons AS p
INNER JOIN
xref_Person+Telephone AS x ON p.PersonKey = x.PersonKey
INNER JOIN
Telephone AS pn ON x.Tel_NumberKey = pn.Tel_NumberKey
WHERE
pn.Tel_NumberType = 1 -- e.g. Home phone number
AND pn.Tel_NumberKey = (SELECT MAX(pn1.Tel_NumberKey) AS Tel_NumberKey
FROM Person AS p1
INNER JOIN xref_Person+Telephone AS x1 ON p1.PersonKey = x1.PersonKey
INNER JOIN Telephone AS pn1 ON x1.Tel_NumberKey = pn1.Tel_NumberKey
WHERE pn1.Tel_NumberType = 1
AND p1.PersonKey = p.PersonKey
AND pn1.Tel_Number = pn.Tel_Number)
ORDER BY
p.PersonKey
And have been looking over the following links but again keep getting duplicates.
SQL select max(date) and corresponding value
How can I SELECT rows with MAX(Column value), DISTINCT by another column in SQL?
SQL Server: SELECT only the rows with MAX(DATE)
I'm sure this must be possible, but I've been at this a couple of days and can't believe it's that difficult to get the most recent / highest value when referencing 3 tables. Any help greatly appreciated.
select *
from
( SELECT p.PersonKey, pn.Phone_Number, pn.Tel_NumberKey
, row_number() over (partition by p.PersonKey, pn.Phone_Number order by pn.Tel_NumberKey desc) rn
FROM
Persons AS p
INNER JOIN
xref_Person+Telephone AS x ON p.PersonKey = x.PersonKey
INNER JOIN
Telephone AS pn ON x.Tel_NumberKey = pn.Tel_NumberKey
WHERE
pn.Tel_NumberType = 1
) tt
where tt.rn = 1
ORDER BY
tt.PersonKey
You have to use the max() function and then order by rownum in descending order, like this:
select f.empno
from(select max(empno) empno from emp e
group by rownum)f
order by rownum desc
It will give you all employees, from the highest employee number to the lowest. Now apply it to your case and let me know.
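As an alternative to the window-function approach, here is a hedged sketch of the classic "GROUP BY plus join back" pattern, run through Python's sqlite3 module. It assumes the goal is the highest Tel_NumberKey per person and per Tel_NumberType. The sample data is made up, and the cross-reference table is renamed to xref_Person_Telephone (a hypothetical name) because an unquoted + is not valid in an identifier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Person (PersonKey INTEGER PRIMARY KEY);
    CREATE TABLE Telephone (
        Tel_NumberKey INTEGER PRIMARY KEY, Tel_Number TEXT, Tel_NumberType INTEGER);
    CREATE TABLE xref_Person_Telephone (PersonKey INTEGER, Tel_NumberKey INTEGER);

    -- Hypothetical data: person 1 has two home numbers (type 1) and one mobile (type 2).
    INSERT INTO Person VALUES (1), (2);
    INSERT INTO Telephone VALUES
        (10, '555-0100', 1), (11, '555-0101', 1),
        (12, '555-0102', 2), (13, '555-0103', 2);
    INSERT INTO xref_Person_Telephone VALUES (1, 10), (1, 11), (1, 12), (2, 13);
""")

latest = conn.execute("""
    SELECT m.PersonKey, t.Tel_NumberType, t.Tel_Number
    FROM (
        -- One row per person and number type, carrying the highest key.
        SELECT x.PersonKey, pn.Tel_NumberType, MAX(pn.Tel_NumberKey) AS MaxKey
        FROM xref_Person_Telephone AS x
        JOIN Telephone AS pn ON pn.Tel_NumberKey = x.Tel_NumberKey
        GROUP BY x.PersonKey, pn.Tel_NumberType
    ) AS m
    -- Join back to Telephone to fetch the number itself.
    JOIN Telephone AS t ON t.Tel_NumberKey = m.MaxKey
    ORDER BY m.PersonKey, t.Tel_NumberType
""").fetchall()

print(latest)
```

Because the grouping is done on (PersonKey, Tel_NumberType) before joining back, each person gets at most one home and one mobile row.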

SQL SUBQUERY ON SELF

I have a table with the following data:
licence_number
date_of_birth
organisation
I want a query that does the following:
Get the licence_numbers and dobs in organisation1 where the same
licence numbers and dobs are also in organisation2.
I know it can't be that hard, but I'm struggling.
You can group by licence_number and date_of_birth where organisation is set to either of the two organisations of interest, and count how many distinct organisations there are in each group.
If a single group contains both of the two possible organisations, you have a hit.
SELECT licence_number, date_of_birth
FROM mytable
WHERE organisation IN ('organisation1', 'organisation2')
GROUP BY licence_number, date_of_birth
HAVING COUNT(DISTINCT organisation) = 2;
...or you can use INTERSECT:
SELECT licence_number, date_of_birth
FROM mytable WHERE organisation = 'organisation1'
INTERSECT
SELECT licence_number, date_of_birth
FROM mytable WHERE organisation = 'organisation2'
An SQLfiddle to test both.
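In case the fiddle is unavailable, here is a small sketch that runs both queries against an in-memory SQLite database via Python's sqlite3 module. The table name mytable follows the answer; the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (licence_number TEXT, date_of_birth TEXT, organisation TEXT);
    -- Hypothetical data: only licence A1 appears in both organisations.
    INSERT INTO mytable VALUES
        ('A1', '1980-01-01', 'organisation1'),
        ('A1', '1980-01-01', 'organisation2'),
        ('B2', '1975-06-15', 'organisation1'),
        ('C3', '1990-12-31', 'organisation2');
""")

# Variant 1: group and count the distinct organisations per (licence, dob) pair.
grouped = conn.execute("""
    SELECT licence_number, date_of_birth
    FROM mytable
    WHERE organisation IN ('organisation1', 'organisation2')
    GROUP BY licence_number, date_of_birth
    HAVING COUNT(DISTINCT organisation) = 2
    ORDER BY licence_number
""").fetchall()

# Variant 2: intersect the pairs found in each organisation.
intersected = conn.execute("""
    SELECT licence_number, date_of_birth
    FROM mytable WHERE organisation = 'organisation1'
    INTERSECT
    SELECT licence_number, date_of_birth
    FROM mytable WHERE organisation = 'organisation2'
    ORDER BY licence_number
""").fetchall()

print(grouped)
print(intersected)
```

Both variants return the same single pair for this data.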
select *
from t t0
where
organisation = 'organisation1'
and
exists (
select 1
from t
where
organisation = 'organisation2'
and
licence_number = t0.licence_number
and
date_of_birth = t0.date_of_birth
)
You can just self join the table where the licence number and the dates are the same but the organisation isn't:
SELECT DISTINCT p1.licence_number, p1.date_of_birth
FROM people p1
INNER JOIN people p2
ON p1.licence_number = p2.licence_number AND
p1.date_of_birth = p2.date_of_birth AND
p1.organisation <> p2.organisation
SQL Fiddle here
Is it a JOIN or 1 table?
Select
[licence_number],
[date_of_birth],
[organisation]
From YourTable
Where organisation1 = organisation2
--OR
Select
[licence_number],
[date_of_birth],
[organisation]
From YourTable
Where organisation1 IN ('organisation2','organisation3','organisation3')
Order By [licence_number]

How to exclude some rows from a SELECT Statement when 2 keys match?

I have 3 tables in my system: Courses, Scores and Users. Scores is a table which holds the test results for each course and each user. So I have the ScoreID, the CourseID, the UserID and the Score itself.
I want to show on a page the list of courses that the user hasn't finished yet. So I want it to show all the courses, excluding those for which the user has records in the Scores table (meaning they have already finished them).
How do I exclude rows from a SELECT statement when a certain CourseID and UserID match at the same time?
Assuming that this is for just one user, Mark Bannister's answer can be simplified a little...
SELECT
*
FROM
Courses
WHERE
NOT EXISTS (SELECT * FROM Scores WHERE CourseID = Courses.CourseID AND UserID = @userID)
Try:
select *
from Courses c
cross join Users u
where not exists
(select null from Scores s where s.CourseID = c.CourseID and s.UserID = u.UserID)
select *
from Courses
where not exists
(
select null from Scores where Scores.CourseID = Courses.CourseID
and Scores.UserID = Courses.UserID
)
Assuming you are using SQL Server, you can:
1) CROSS APPLY the courses and users, creating every possible combination of courses and users
2) use NOT EXISTS to filter out the combinations that already have a record in Scores
SQL statement:
SELECT *
FROM Courses c
CROSS APPLY Users u
WHERE NOT EXISTS (
SELECT *
FROM Scores
WHERE UserID = u.UserID
AND CourseID = c.CourseID
)
In case you are using any other DBMS, the following should work on most of them:
SELECT *
FROM Courses AS c
, Users AS u
WHERE NOT EXISTS (
SELECT *
FROM Scores
WHERE UserID = u.UserID
AND CourseID = c.CourseID
)
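For the single-user case, the NOT EXISTS pattern can be tried out with a short sketch like the one below, using Python's sqlite3 module and a bound parameter in place of the @userID variable. The Title column, the helper name unfinished_courses and the sample rows are assumptions added for illustration.

```python
import sqlite3

def unfinished_courses(conn, user_id):
    """Return the courses for which the given user has no row in Scores."""
    return conn.execute("""
        SELECT c.CourseID, c.Title
        FROM Courses AS c
        WHERE NOT EXISTS (
            SELECT 1 FROM Scores AS s
            WHERE s.CourseID = c.CourseID AND s.UserID = ?
        )
        ORDER BY c.CourseID
    """, (user_id,)).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Courses (CourseID INTEGER PRIMARY KEY, Title TEXT);
    CREATE TABLE Users (UserID INTEGER PRIMARY KEY);
    CREATE TABLE Scores (ScoreID INTEGER PRIMARY KEY, CourseID INTEGER,
                         UserID INTEGER, Score INTEGER);

    -- Hypothetical data: user 1 finished courses 1 and 2, user 2 finished course 1.
    INSERT INTO Courses VALUES (1, 'SQL'), (2, 'Python'), (3, 'Go');
    INSERT INTO Users VALUES (1), (2);
    INSERT INTO Scores VALUES (1, 1, 1, 90), (2, 2, 1, 75), (3, 1, 2, 60);
""")

print(unfinished_courses(conn, 1))
print(unfinished_courses(conn, 2))
```

The correlated subquery is evaluated per course, so a course is kept only when no score row exists for that course and that user together.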

MySQL query - possible to include this clause?

I have the following query, which retrieves 4 adverts from certain categories in a random order.
At the moment, if a user has more than 1 advert, then potentially all of those ads might be retrieved - I need to limit it so that only 1 ad per user is displayed.
Is this possible to achieve in the same query?
SELECT a.advert_id, a.title, a.url, a.user_id,
FLOOR(1 + RAND() * x.m_id) 'rand_ind'
FROM adverts AS a
INNER JOIN advert_categories AS ac
ON a.advert_id = ac.advert_id,
(
SELECT MAX(t.advert_id) - 1 'm_id'
FROM adverts t
) x
WHERE ac.category_id IN
(
SELECT category_id
FROM website_categories
WHERE website_id = '8'
)
AND a.advert_type = 'text'
GROUP BY a.advert_id
ORDER BY rand_ind
LIMIT 4
Note: The solution is the last query at the bottom of this answer.
Test Schema and Data
create table adverts (
advert_id int primary key, title varchar(20), url varchar(20), user_id int, advert_type varchar(10))
;
create table advert_categories (
advert_id int, category_id int, primary key(category_id, advert_id))
;
create table website_categories (
website_id int, category_id int, primary key(website_id, category_id))
;
insert website_categories values
(8,1),(8,3),(8,5),
(1,1),(2,3),(4,5)
;
insert adverts (advert_id, title, user_id) values
(1, 'StackExchange', 1),
(2, 'StackOverflow', 1),
(3, 'SuperUser', 1),
(4, 'ServerFault', 1),
(5, 'Programming', 1),
(6, 'C#', 2),
(7, 'Java', 2),
(8, 'Python', 2),
(9, 'Perl', 2),
(10, 'Google', 3)
;
update adverts set advert_type = 'text'
;
insert advert_categories values
(1,1),(1,3),
(2,3),(2,4),
(3,1),(3,2),(3,3),(3,4),
(4,1),
(5,4),
(6,1),(6,4),
(7,2),
(8,1),
(9,3),
(10,3),(10,5)
;
Data properties
each website can belong to multiple categories
for simplicity, all adverts are of type 'text'
each advert can belong to multiple categories. If a website has multiple categories that are matched multiple times in advert_categories for the same user_id, this causes the advert_id's to show twice when using a straight join between 3 tables in the next query.
This query joins the 3 tables together (notice that ids 1, 3 and 10 each appear twice)
select *
from website_categories wc
inner join advert_categories ac on wc.category_id = ac.category_id
inner join adverts a on a.advert_id = ac.advert_id and a.advert_type = 'text'
where wc.website_id='8'
order by a.advert_id
To make each advert show only once, this is the core query to show all eligible ads, each only once
select *
from adverts a
where a.advert_type = 'text'
and exists (
select *
from website_categories wc
inner join advert_categories ac on wc.category_id = ac.category_id
where wc.website_id='8'
and a.advert_id = ac.advert_id)
The next query retrieves all the advert_id's to be shown
select advert_id, user_id
from (
select
advert_id, user_id,
@r := @r + 1 r
from (select @r:=0) r
cross join
(
# core query -- vvv
select a.advert_id, a.user_id
from adverts a
where a.advert_type = 'text'
and exists (
select *
from website_categories wc
inner join advert_categories ac on wc.category_id = ac.category_id
where wc.website_id='8'
and a.advert_id = ac.advert_id)
# core query -- ^^^
order by rand()
) EligibleAdsAndUserIDs
) RowNumbered
group by user_id
order by r
limit 2
There are 3 levels to this query:
1) aliased EligibleAdsAndUserIDs: the core query, sorted randomly using order by rand()
2) aliased RowNumbered: a row number added to the core query, using MySQL side-effecting @variables
3) the outermost query forces MySQL to collect rows as numbered randomly in the inner queries, and group by user_id causes it to retain only the first row for each user_id. limit 2 causes the query to stop as soon as two distinct user_ids have been encountered.
This is the final query, which takes the advert_ids from the previous query and joins them back to table adverts to retrieve the required columns. It will:
1) show ads only once per user_id
2) feature users with more ads proportionally (statistically) to the number of eligible ads they have
Note: Point (2) works because the more ads you have, the more likely you are to hit the top placings in the row numbering subquery.
select a.advert_id, a.title, a.url, a.user_id
from
(
select advert_id
from (
select
advert_id, user_id,
@r := @r + 1 r
from (select @r:=0) r
cross join
(
# core query -- vvv
select a.advert_id, a.user_id
from adverts a
where a.advert_type = 'text'
and exists (
select *
from website_categories wc
inner join advert_categories ac on wc.category_id = ac.category_id
where wc.website_id='8'
and a.advert_id = ac.advert_id)
# core query -- ^^^
order by rand()
) EligibleAdsAndUserIDs
) RowNumbered
group by user_id
order by r
limit 2
) Top2
inner join adverts a on a.advert_id = Top2.advert_id;
I'm thinking through something but don't have MySQL available.. can you try this query to see if it works or crashes...
SELECT
PreQuery.user_id,
(select max( tmp.someRandom ) from PreQuery tmp where tmp.User_ID = PreQuery.User_ID ) MaxRandom
from
( select adverts.user_id,
rand() someRandom
from adverts, advert_categories
where adverts.advert_id = advert_categories.advert_id ) PreQuery
If the "tmp" alias is recognized as a temp buffer of the preliminary query as defined by the outer FROM clause, I might have something that will work... I think a field defined as a select statement against a derived table won't work, but if it does, I know I'll have something solid for you.
OK, this one might make the head hurt a bit, but let's get the logic going... The innermost "CoreQuery" gets all unique, randomly ordered, qualified users that have a qualifying ad based on the chosen category and type = 'text'. Since the order is random, I don't care what the assigned sequence is; I just order by it. The LIMIT 4 returns the first 4 entries that qualify, regardless of one user having 1 ad and another having 1000 ads.
Next, join to the adverts table, reversing the table / join qualifications... Because the WHERE ... IN sub-select is correlated on each unique user ID qualified by the "CoreQuery", it will only be done 4 times, based on the CoreQuery's inner limit. So even if there are 100 users with different advertisements, we get 4 users.
Now, the join to the CoreQuery is the adverts table for the same qualifying user. Typically this would join ALL of that user's records against the core query, which is correct so far. However, the next WHERE clause is what filters it down to only ONE ad for the given person.
The sub-select makes sure the outer advert_id matches the one it selects. It is based only on the current CoreQuery.user_id and would otherwise get ALL the qualifying category / ads for the user (which we don't want). Adding ORDER BY RAND() randomizes only this one person's ads in the result set, and LIMIT 1 then gives only ONE of their qualified ads.
So, the CoreQuery restricts down to 4 users. Then, for each qualified user ID, we get only 1 of the qualified ads (via the inner ORDER BY RAND() and LIMIT 1).
Although I don't have MySQL available to try it, the queries should be legitimate, and I hope it works for you... Man, I love brain teasers like this.
SELECT
ad1.*
from
( SELECT ad.user_id,
count(*) as UserAdCount,
RAND() as ANYRand
from
website_categories wc
inner join advert_categories ac
ON wc.category_id = ac.category_id
inner join adverts ad
ON ac.advert_id = ad.advert_id
AND ad.advert_type = 'text'
where
wc.website_id = 8
GROUP BY
1
order by
3
limit
4 ) CoreQuery,
adverts ad1
WHERE
ad1.advert_type = 'text'
AND CoreQuery.User_ID = ad1.User_ID
AND ad1.advert_id in
( select
ad2.advert_id
FROM
adverts ad2,
advert_categories ac2,
website_categories wc2
WHERE
ad2.user_id = CoreQuery.user_id
AND ad2.advert_id = ac2.advert_id
AND ac2.category_id = wc2.category_id
AND wc2.website_id = 8
ORDER BY
RAND()
LIMIT
1 )
I would suggest doing the randomization in PHP. This is much faster than doing it in MySQL.
"However, when the table is large (over about 10,000 rows) this method of selecting a random row becomes increasingly slow with the size of the table and can create a great load on the server. I tested this on a table I was working that contained 2,394,968 rows. It took 717 seconds (12 minutes!) to return a random row."
http://www.greggdev.com/web/articles.php?id=6
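As a rough illustration of doing the random selection in application code, here is a sketch in Python rather than PHP. It assumes the eligible (advert_id, user_id) pairs have already been fetched with a plain query without ORDER BY RAND(). The function name pick_ads and its parameters are hypothetical, and the sample pairs mirror the eligible ads for website 8 in the test data above.

```python
import random
from collections import defaultdict

def pick_ads(eligible_ads, limit=4, rng=random):
    """Pick up to `limit` ads at random, at most one per user.

    eligible_ads: iterable of (advert_id, user_id) tuples.
    rng: the random module or a random.Random instance (useful for seeding).
    """
    by_user = defaultdict(list)
    for advert_id, user_id in eligible_ads:
        by_user[user_id].append(advert_id)

    # Choose the users first, then one random ad for each chosen user.
    users = list(by_user)
    chosen_users = rng.sample(users, min(limit, len(users)))
    return [(rng.choice(by_user[user]), user) for user in chosen_users]

# Eligible (advert_id, user_id) pairs, as a database query might return them.
ads = [(1, 1), (2, 1), (3, 1), (4, 1), (6, 2), (8, 2), (9, 2), (10, 3)]
picked = pick_ads(ads, limit=2, rng=random.Random(0))
print(picked)
```

Note that this version picks users uniformly, so users with many ads are not favoured. Weighting users by their number of eligible ads would need a different sampling step.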
set @userid = -1;
select
a.id,
a.title,
case when @userid = a.userid then
0
else
1
end as isfirst,
(@userid := a.userid)
from
adverts a
inner join advertcategories ac on ac.advertid = a.advertid
inner join categories c on c.categoryid = ac.categoryid
where
c.website = 8
having
isfirst = 1
order by
a.userid,
rand()
limit 4
Add COUNT(a.user_id) as owned in the main select directive and add HAVING owned < 2 after Group By
http://dev.mysql.com/doc/refman/5.5/en/select.html
I think this is the way to do it, if the one user has more than one advert then we will not select it.

What's wrong with this MySQL query? SELECT * AS `x`, how to use x again later?

The following MySQL query:
select `userID` as uID,
(select `siteID` from `users` where `userID` = uID) as `sID`
from `actions`
where `sID` in (select `siteID` from `sites` where `foo` = "bar")
order by `timestamp` desc limit 100
…returns an error:
Unknown column 'sID' in 'IN/ALL/ANY subquery'
I don't understand what I'm doing wrong here. The sID thing is not supposed to be a column, but the 'alias' (what is this called?) I created by executing (select siteID from users where userID = uID) as sID. And it’s not even inside the IN subquery.
Any ideas?
Edit: @Roland: Thanks for your comment. I have three tables, actions, users and sites. The table actions contains a userID field, which corresponds to an entry in the users table. Every user in this table (users) has a siteID.
I'm trying to select the latest actions from the actions table, and link them to the users and sites table to find out who performed those actions, and on which site. Hope that makes sense :)
You either need to enclose it in a subquery:
SELECT *
FROM (
SELECT userID as uID, (select siteID from users where userID = actions.userID) as sID, timestamp
FROM actions
) q
WHERE sID IN (select siteID from sites where foo = "bar")
ORDER BY
timestamp DESC
LIMIT 100
Or, better, rewrite it as a JOIN:
SELECT a.userId, u.siteID
FROM actions a
JOIN users u
ON u.userID = a.userID
WHERE siteID IN
(
SELECT siteID
FROM sites
WHERE foo = 'bar'
)
ORDER BY
timestamp DESC
LIMIT 100
Create the following indexes:
actions (timestamp)
users (userId)
sites (foo, siteID)
The column alias is not established until the query processor finishes the SELECT clause and builds the first intermediate result set, so it can only be referenced in clauses that operate on that intermediate result set, such as GROUP BY. If you want to use it this way, put the alias inside a subquery; it will then be in the result set generated by the subquery, and therefore accessible to the outer query. To illustrate:
(This is not the simplest way to do this query but it illustrates how to establish and use a column alias from a subquery)
select a.userID as uID, z.Sid
from actions a
Join (select userID, siteID as sID from users) Z
On z.userID = a.userID
where Z.sID in (select siteID from sites where foo = "bar")
order by timestamp desc limit 100
Try the following:
SELECT
a.userID as uID
,u.siteID as sID
FROM
actions as a
INNER JOIN
users as u ON u.userID=a.userID
WHERE
u.siteID IN (SELECT siteID FROM sites WHERE foo = 'bar')
ORDER BY
a.timestamp DESC
LIMIT 100
I think the reason for the error is that the alias isn't available to the WHERE instruction, which is why we have HAVING.
select `userID` as uID,
(select `siteID` from `users` where `userID` = uID) as `sID`
from `actions`
HAVING `sID` in (select `siteID` from `sites` where `foo` = "bar")
order by `timestamp` desc limit 100
Though I also agree with the other answers that your query could be better structured.
Try the following:
SELECT
a.userID as uID
,u.siteID as sID
FROM
actions as a
INNER JOIN
users as u ON u.userID = a.userID
INNER JOIN
sites as s ON u.siteID = s.siteID
WHERE
s.foo = 'bar'
ORDER BY
a.timestamp DESC
LIMIT 100
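To try the JOIN version above on toy data, here is a minimal sketch using Python's sqlite3 module. It runs on SQLite rather than MySQL, so it only exercises the portable parts of the query, and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sites (siteID INTEGER PRIMARY KEY, foo TEXT);
    CREATE TABLE users (userID INTEGER PRIMARY KEY, siteID INTEGER);
    CREATE TABLE actions (userID INTEGER, timestamp TEXT);

    -- Hypothetical data: users 10 and 30 belong to the site where foo = 'bar'.
    INSERT INTO sites VALUES (1, 'bar'), (2, 'baz');
    INSERT INTO users VALUES (10, 1), (20, 2), (30, 1);
    INSERT INTO actions VALUES
        (10, '2020-01-01 10:00:00'),
        (20, '2020-01-02 10:00:00'),
        (30, '2020-01-03 10:00:00');
""")

# Filter on the joined tables directly instead of on a SELECT-list alias.
rows = conn.execute("""
    SELECT a.userID AS uID, u.siteID AS sID
    FROM actions AS a
    INNER JOIN users AS u ON u.userID = a.userID
    INNER JOIN sites AS s ON s.siteID = u.siteID
    WHERE s.foo = 'bar'
    ORDER BY a.timestamp DESC
    LIMIT 100
""").fetchall()

print(rows)
```

The WHERE clause only touches real columns of the joined tables, so the question of when the sID alias becomes visible never comes up.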
If you wish to use a field from the select section later, you can try a subselect:
SELECT One,
Two,
One + Two as Three
FROM (
SELECT 1 AS One,
2 as Two
) sub
I don't know whether this was not in the SQL standard 11 years ago, but I found the easiest way is to use HAVING (which has to come before ORDER BY):
select `userID` as uID,
(select `siteID` from `users` where `userID` = uID) as `sID`
from `actions`
HAVING `sID` in (select `siteID` from `sites` where `foo` = "bar")
order by `timestamp` desc limit 100