left join with special condition on right table - sql

I don't know if this is possible. I'm using SQLite 3.
Schema:
CREATE TABLE docs (id integer primary key, name string);
CREATE TABLE revs (id integer primary key, doc_id integer, number integer);
I want to select every document joined with only one of its revisions: the one with the highest number. How can I achieve this?
Right now I'm doing a left join, fetching everything, and then filtering it in the application, but this is ugly.
(By the way, can you suggest a good, easy introductory book on databases and how they work, and maybe something about SQL too?)
Thanks!

Try this:
Select * From docs d
Join revs r
On r.doc_id = d.id
Where r.number =
(Select Max(number) from revs
Where Doc_Id = d.Id)
Or, if you also want the docs that have no revs (is that possible in your data?):
Select * From docs d
Left Join revs r
On r.doc_id = d.id
And r.number =
(Select Max(number) from revs
Where Doc_Id = d.Id)
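A minimal, runnable way to check the LEFT JOIN version is Python's built-in sqlite3 module. The rows below are made-up sample data (not from the question); doc 3 deliberately has no revisions so the LEFT JOIN behavior is visible.

```python
import sqlite3

# Hypothetical sample data: doc 1 has revisions 1-3, doc 2 has one, doc 3 has none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE docs (id integer primary key, name string);
CREATE TABLE revs (id integer primary key, doc_id integer, number integer);
INSERT INTO docs VALUES (1, 'alpha'), (2, 'beta'), (3, 'gamma');
INSERT INTO revs VALUES (10, 1, 1), (11, 1, 2), (12, 1, 3), (20, 2, 1);
""")

# LEFT JOIN keeps every doc; the correlated subquery in the ON clause limits
# the matched revision to the one with the highest number for that doc.
LATEST_REV = """
SELECT d.id, d.name, r.id AS rev_id, r.number
FROM docs d
LEFT JOIN revs r
  ON r.doc_id = d.id
 AND r.number = (SELECT MAX(number) FROM revs WHERE doc_id = d.id)
ORDER BY d.id
"""

rows = conn.execute(LATEST_REV).fetchall()
for row in rows:
    print(row)
```

If two revisions of the same document share the highest number, both rows come back; a unique index on (doc_id, number) would rule that out.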

Not sure if your engine supports this, but typically, you would do something like this in ANSI SQL:
SELECT docs.*
,revs.*
FROM docs
INNER /* LEFT works here also if you don't have revs */ JOIN revs
ON docs.id = revs.doc_id
AND revs.number IN (
SELECT MAX(number)
FROM revs
WHERE doc_id = docs.id
)
There are a number of ways to write equivalent queries, using common table expressions, correlated aggregate subqueries, etc.
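The common-table-expression form can be sketched the same way (SQLite has supported WITH since 3.8.3). This is one possible CTE formulation, not the only one: it computes each document's highest number once and joins back to revs for the full revision row. The data is hypothetical sample data.

```python
import sqlite3

# Hypothetical sample data: doc 1 has revisions 1-3, doc 2 has one, doc 3 has none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE docs (id integer primary key, name string);
CREATE TABLE revs (id integer primary key, doc_id integer, number integer);
INSERT INTO docs VALUES (1, 'alpha'), (2, 'beta'), (3, 'gamma');
INSERT INTO revs VALUES (10, 1, 1), (11, 1, 2), (12, 1, 3), (20, 2, 1);
""")

# The CTE computes each document's highest revision number once; the outer
# query joins it back to revs to fetch the matching revision row.
LATEST_VIA_CTE = """
WITH max_revs AS (
    SELECT doc_id, MAX(number) AS max_number
    FROM revs
    GROUP BY doc_id
)
SELECT d.id, d.name, r.id AS rev_id, r.number
FROM docs d
LEFT JOIN max_revs m ON m.doc_id = d.id
LEFT JOIN revs r ON r.doc_id = m.doc_id AND r.number = m.max_number
ORDER BY d.id
"""

rows = conn.execute(LATEST_VIA_CTE).fetchall()
print(rows)
```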

select d.*, r.max_number
from docs d
left outer join (
select doc_id, max(number) as max_number
from revs
group by doc_id
) r on d.id = r.doc_id

Database design: Database Design for Mere Mortals by Hernandez
SQL: The Practical SQL Handbook
If you want to hurt your head, any of the SQL books by Joe Celko.

Here is a very good list of books on database design:
https://stackoverflow.com/search?q=database+book

If every document has revisions (e.g., starting with rev 0), I would use the same approach as OrbMan, but with an inner join. (If you are certain you are looking for a 1-to-1 match, why not let SQL know, too?)
select d.*, r.max_number
from docs d
inner join
(
select doc_id, max(number) as max_number
from revs
group by doc_id
) r on d.id = r.doc_id
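The only practical difference between the LEFT OUTER JOIN and INNER JOIN versions is what happens to documents with no revisions. A small sketch on made-up data (doc 3 has no revisions), using Python's sqlite3 module:

```python
import sqlite3

# Hypothetical sample data: doc 1 has revisions 1-3, doc 2 has one, doc 3 has none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE docs (id integer primary key, name string);
CREATE TABLE revs (id integer primary key, doc_id integer, number integer);
INSERT INTO docs VALUES (1, 'alpha'), (2, 'beta'), (3, 'gamma');
INSERT INTO revs VALUES (10, 1, 1), (11, 1, 2), (12, 1, 3), (20, 2, 1);
""")

# Derived table: one row per doc_id with its highest revision number.
# {join} is filled in with the join type being compared.
DERIVED = """
SELECT d.id, r.max_number
FROM docs d
{join} (
    SELECT doc_id, MAX(number) AS max_number
    FROM revs
    GROUP BY doc_id
) r ON d.id = r.doc_id
ORDER BY d.id
"""

left_rows = conn.execute(DERIVED.format(join="LEFT OUTER JOIN")).fetchall()
inner_rows = conn.execute(DERIVED.format(join="INNER JOIN")).fetchall()
print(left_rows)   # doc 3 kept, with a NULL max_number
print(inner_rows)  # doc 3 dropped
```

Note that this derived table returns max_number but not the revision's own id; getting the full revision row needs a join back to revs on (doc_id, number).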

I'd recommend "A Sane Approach to Database Design" as an excellent introduction to good design practices. (I am slightly biased. I wrote it. But hey, so far it has a 5-star average review on Amazon, none of which reviews were contributed by me or any friends or relatives.)

Related

Counting empty relations from a SQL table

I'm trying to count authors who don't have any articles in our system, which aggregates authorship across sites. I've got a query working, but it isn't performant.
The best query I have thus far is this:
select count(*) as count_all
from (
select authors.id
from authors
left outer join site_authors on site_authors.author_id = authors.id
left outer join articles on articles.site_author_id = site_authors.id
group by authors.id
having count(articles.id) = 0
) a;
However, the subquery is rather inefficient. I was hoping there's a way to flatten this. I have several similar queries that add extra conditions on the left outer joins, so adding a count column to my schema isn't really an option here.
Extra rub: this is a cross-platform query and needs to work on PostgreSQL, SQLite, and MySQL.
You can try a slightly different query, though I'm not sure it will be faster:
select count(*)
from authors as a
where not exists (
select b.id
from site_authors as b
inner join
articles as c
on a.id=b.author_id and b.id=c.site_author_id)
Of course, this assumes you have proper indexes on the tables:
site_authors: unique (author_id, id)
articles: non unique (site_author_id)
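To confirm that the NOT EXISTS rewrite returns the same count as the original LEFT JOIN/HAVING query, both can be run side by side on SQLite. The schema is inferred from the column names in the question, and the rows are hypothetical: author 1 has an article, author 2 has a site_authors row but no articles, and author 3 has no site_authors row at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE authors (id INTEGER PRIMARY KEY);
CREATE TABLE site_authors (id INTEGER PRIMARY KEY, author_id INTEGER);
CREATE TABLE articles (id INTEGER PRIMARY KEY, site_author_id INTEGER);
-- The indexes suggested in the answer (index names are made up).
CREATE UNIQUE INDEX sa_author_id ON site_authors (author_id, id);
CREATE INDEX art_site_author ON articles (site_author_id);
INSERT INTO authors VALUES (1), (2), (3);
INSERT INTO site_authors VALUES (100, 1), (200, 2);
INSERT INTO articles VALUES (1000, 100);
""")

# Original query from the question: group, then keep authors with zero articles.
HAVING_QUERY = """
SELECT COUNT(*) FROM (
    SELECT authors.id
    FROM authors
    LEFT OUTER JOIN site_authors ON site_authors.author_id = authors.id
    LEFT OUTER JOIN articles ON articles.site_author_id = site_authors.id
    GROUP BY authors.id
    HAVING COUNT(articles.id) = 0
) a
"""

# Answer's rewrite: no grouping, just an anti-join per author.
NOT_EXISTS_QUERY = """
SELECT COUNT(*)
FROM authors AS a
WHERE NOT EXISTS (
    SELECT b.id
    FROM site_authors AS b
    INNER JOIN articles AS c
        ON a.id = b.author_id AND b.id = c.site_author_id
)
"""

having_count = conn.execute(HAVING_QUERY).fetchone()[0]
not_exists_count = conn.execute(NOT_EXISTS_QUERY).fetchone()[0]
print(having_count, not_exists_count)
```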
Assuming that 'normal' joins are simpler and faster, you could subtract the number of authors with articles from the total number of authors:
SELECT (SELECT COUNT(*)
FROM authors) -
(SELECT COUNT(DISTINCT site_authors.author_id)
FROM site_authors
JOIN articles ON articles.site_author_id = site_authors.id)
Alternatively, try a subquery:
SELECT COUNT(*)
FROM authors
WHERE id NOT IN (SELECT site_authors.author_id
FROM site_authors
JOIN articles ON articles.site_author_id = site_authors.id)
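A sketch comparing the subtraction and NOT IN variants on hypothetical data, again with SQLite through Python. Author 1 appears on two sites with several articles, which shows why the second count in the subtraction needs DISTINCT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE authors (id INTEGER PRIMARY KEY);
CREATE TABLE site_authors (id INTEGER PRIMARY KEY, author_id INTEGER);
CREATE TABLE articles (id INTEGER PRIMARY KEY, site_author_id INTEGER);
-- Hypothetical rows: author 1 is on two sites with three articles in total.
INSERT INTO authors VALUES (1), (2), (3);
INSERT INTO site_authors VALUES (100, 1), (101, 1), (200, 2);
INSERT INTO articles VALUES (1000, 100), (1001, 101), (1002, 100);
""")

SUBTRACTION = """
SELECT (SELECT COUNT(*) FROM authors) -
       (SELECT COUNT(DISTINCT site_authors.author_id)
        FROM site_authors
        JOIN articles ON articles.site_author_id = site_authors.id)
"""

NOT_IN = """
SELECT COUNT(*)
FROM authors
WHERE id NOT IN (SELECT site_authors.author_id
                 FROM site_authors
                 JOIN articles ON articles.site_author_id = site_authors.id)
"""

subtraction = conn.execute(SUBTRACTION).fetchone()[0]
not_in = conn.execute(NOT_IN).fetchone()[0]
# Dropping DISTINCT counts author 1 once per article, which is wrong.
without_distinct = conn.execute(SUBTRACTION.replace(
    "COUNT(DISTINCT site_authors.author_id)",
    "COUNT(site_authors.author_id)")).fetchone()[0]
print(subtraction, not_in, without_distinct)
```

One caveat worth knowing for NOT IN: if the subquery can return a NULL author_id, `id NOT IN (...)` is never true and the count drops to zero; NOT EXISTS does not have that problem.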
It might be simpler and faster to use NOT IN rather than a join. SQL processors are pretty smart about using indexes even when the query looks obtuse. Something like this:
Select count(*)
from authors
where id not in (select author_id
                 from site_authors
                 where id in (select site_author_id from articles));
Be sure that author_id and site_author_id are indexed. The optimizer will notice what you are doing and create an indexed lookup for the NOT IN clause.
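On SQLite, one way to check both the result and whether the indexes get used is to run the query on toy data and print EXPLAIN QUERY PLAN. Because articles.site_author_id holds site_authors ids rather than authors ids, the sketch routes the articles check through site_authors. Tables, index names, and rows are hypothetical, and the plan text varies between SQLite versions, so it is printed rather than checked.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE authors (id INTEGER PRIMARY KEY);
CREATE TABLE site_authors (id INTEGER PRIMARY KEY, author_id INTEGER);
CREATE TABLE articles (id INTEGER PRIMARY KEY, site_author_id INTEGER);
CREATE INDEX sa_author ON site_authors (author_id);
CREATE INDEX art_site_author ON articles (site_author_id);
-- Author 1 has an article, author 2 has none, author 3 is on no site.
INSERT INTO authors VALUES (1), (2), (3);
INSERT INTO site_authors VALUES (100, 1), (200, 2);
INSERT INTO articles VALUES (1000, 100);
""")

# Authors whose site_authors rows have no articles (or who have no rows at all).
NOT_IN_QUERY = """
SELECT COUNT(*)
FROM authors
WHERE id NOT IN (
    SELECT author_id FROM site_authors
    WHERE id IN (SELECT site_author_id FROM articles)
)
"""

count = conn.execute(NOT_IN_QUERY).fetchone()[0]
print("authors without articles:", count)

# The last column of each plan row is the human-readable detail text.
for row in conn.execute("EXPLAIN QUERY PLAN " + NOT_IN_QUERY):
    print(row[-1])
```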

What is the best way to find the max record of a table per a foreign key?

At work, I often have to find the max status per foreign key. For the most part, I have always used a correlated subquery on the join to get the right record, assuming the highest primary key is the most recent. Here is a little demo:
select
c.plate_number, o.name
from
Car c
inner join Owner o
on o.owner_id = (
select max(owner_id)
from Owner
where owner_type = 'PRIMARY'
)
This is pretty fast in most queries I use, not to mention that I can put extra criteria in the subquery for type columns. I have tried using NOT EXISTS clauses to make sure there are no higher records, but can't find anything else. Can someone suggest anything better, and if so, why?
I recommend using the standard window functions:
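For comparison, here is a sketch of the correlated MAX subquery next to the NOT EXISTS ("no higher record") pattern the question mentions. As written, the subquery in the demo never references c, so every car would join to the same owner; the sketch assumes a hypothetical car_id column on Owner to correlate on. Schema and data are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE car (car_id INTEGER PRIMARY KEY, plate_number TEXT);
-- car_id on owner is an assumed column, used only to correlate the rows.
CREATE TABLE owner (owner_id INTEGER PRIMARY KEY, car_id INTEGER,
                    owner_type TEXT, name TEXT);
INSERT INTO car VALUES (1, 'ABC-123'), (2, 'XYZ-789');
INSERT INTO owner VALUES
    (1, 1, 'PRIMARY', 'Ann'), (2, 1, 'PRIMARY', 'Bob'),
    (3, 1, 'SECONDARY', 'Cid'), (4, 2, 'PRIMARY', 'Dee');
""")

# Correlated MAX subquery: the highest PRIMARY owner_id for this particular car.
MAX_SUBQUERY = """
SELECT c.plate_number, o.name
FROM car c
INNER JOIN owner o
    ON o.owner_id = (SELECT MAX(owner_id) FROM owner
                     WHERE owner_type = 'PRIMARY' AND car_id = c.car_id)
ORDER BY c.car_id
"""

# NOT EXISTS: keep a PRIMARY owner row only if no higher PRIMARY row exists for the same car.
NOT_EXISTS = """
SELECT c.plate_number, o.name
FROM car c
INNER JOIN owner o ON o.car_id = c.car_id AND o.owner_type = 'PRIMARY'
WHERE NOT EXISTS (
    SELECT 1 FROM owner o2
    WHERE o2.car_id = o.car_id
      AND o2.owner_type = 'PRIMARY'
      AND o2.owner_id > o.owner_id
)
ORDER BY c.car_id
"""

max_rows = conn.execute(MAX_SUBQUERY).fetchall()
not_exists_rows = conn.execute(NOT_EXISTS).fetchall()
print(max_rows)
print(not_exists_rows)
```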
;with cte as (
select c.plateNumber, o.name,
row_number() over (partition by c.ownerId order by purchaseDate desc) rw
from car c
inner join owner o
on o.ownerid = c.ownerid
)
select *
from cte
where rw=1;
This lets you get whatever you want from either table and still get only one row per partition.
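A sketch of the ROW_NUMBER() approach on SQLite (window functions need SQLite 3.25 or later). To line up with the question, it partitions by car and orders by owner_id, rather than using the ownerId/purchaseDate columns from the answer; the car_id column and all rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE car (car_id INTEGER PRIMARY KEY, plate_number TEXT);
CREATE TABLE owner (owner_id INTEGER PRIMARY KEY, car_id INTEGER,
                    owner_type TEXT, name TEXT);
INSERT INTO car VALUES (1, 'ABC-123'), (2, 'XYZ-789');
INSERT INTO owner VALUES
    (1, 1, 'PRIMARY', 'Ann'), (2, 1, 'PRIMARY', 'Bob'),
    (3, 1, 'SECONDARY', 'Cid'), (4, 2, 'PRIMARY', 'Dee');
""")

# Number each car's PRIMARY owners from highest owner_id down, keep rank 1.
# The WHERE filter runs before the window function is evaluated.
LATEST_OWNER = """
WITH ranked AS (
    SELECT c.plate_number, o.name,
           ROW_NUMBER() OVER (PARTITION BY c.car_id
                              ORDER BY o.owner_id DESC) AS rw
    FROM car c
    INNER JOIN owner o ON o.car_id = c.car_id
    WHERE o.owner_type = 'PRIMARY'
)
SELECT plate_number, name FROM ranked WHERE rw = 1 ORDER BY plate_number
"""

rows = conn.execute(LATEST_OWNER).fetchall()
print(rows)
```

Unlike the MAX subquery, ROW_NUMBER() returns exactly one row per car even when the ordering column has ties.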

Questions about SQL query

I need to select exam results of students in class 7A, but need to look into another table (student_profile) to identify students in 7A (identify by student_id).
I wonder which of the following methods will be faster, assuming an index on student_id exists in both tables:
Method 1:
select * from exam_results r
where exists
(select 1
from student_profile p
where p.student_id = r.student_id
and p.class = '7A')
Method 2:
select * from exam_results
where student_id in
(select student_id
from student_profile
where class = '7A')
Thanks in Advance,
Jonathan
Short answer, it doesn't matter. The query engine will treat them the same.
Personally, I'd consider this syntax:
select
r.*
from
exam_results r
join
student_profile p
on p.student_id = r.student_id
where
p.class = '7A'
INNER is implied if omitted.
You'll get the same performance because modern query engines are well developed, but I think this standard syntax is more extensible and easier to read.
If you extend this query in the future, multiple join conditions will be easier to optimize than multiple EXISTS or IN clauses.
If you compare the two queries, then the one with the EXISTS is faster. However the right (and usually faster) approach to such problems is a JOIN.
select r.student_id, r.other_columns
from exam_results r
inner join student_profile s
on r.student_id = s.student_id
where s.class = '7A'
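A quick way to see that the three forms agree is to run them against the same toy data. The table contents, the subject/score columns, and the index are hypothetical. The JOIN only matches EXISTS and IN here because student_id is the primary key of student_profile; if a student could have several profile rows, the JOIN would repeat that student's exam results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student_profile (student_id INTEGER PRIMARY KEY, class TEXT);
CREATE TABLE exam_results (student_id INTEGER, subject TEXT, score INTEGER);
CREATE INDEX er_student ON exam_results (student_id);
INSERT INTO student_profile VALUES (1, '7A'), (2, '7B'), (3, '7A');
INSERT INTO exam_results VALUES
    (1, 'math', 90), (1, 'art', 75), (2, 'math', 60), (3, 'math', 88);
""")

QUERIES = {
    "exists": """SELECT * FROM exam_results r WHERE EXISTS (
                     SELECT 1 FROM student_profile p
                     WHERE p.student_id = r.student_id AND p.class = '7A')""",
    "in": """SELECT * FROM exam_results WHERE student_id IN (
                 SELECT student_id FROM student_profile WHERE class = '7A')""",
    "join": """SELECT r.* FROM exam_results r
               JOIN student_profile p ON p.student_id = r.student_id
               WHERE p.class = '7A'""",
}

# Sort in Python so the comparison does not depend on row order.
results = {name: sorted(conn.execute(sql).fetchall())
           for name, sql in QUERIES.items()}
for name, rows in results.items():
    print(name, rows)
```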

Simple SQL question with subqueries

I'm refreshing my SQL with the online Stanford database class exercises, found here. Here is the problem:
"Find names and grades of students who only have friends in the same
grade. Return the result sorted by grade, then by name within each
grade."
We have a highschooler table, with the attributes name, grade, id. Also, the likes table has attributes id1 and id2. id1 and id2 in likes correspond to id in highschooler.
Based on the problem section this comes from, I can tell that I'll need to use subqueries, but I'm not sure where. How should I approach this problem? None of the currently suggested solutions work.
Here is my current SQL statement, which is not working correctly (ignoring sorting):
select distinct
student1.id,
student1.name,
student1.grade
from
highschooler student1,
highschooler student2,
friend
where not exists (select *
from friend
where student1.id = id1
and student2.id = id2
and student1.grade = student2.grade
and student1.id <> student2.id);
I assumed that if A is B's friend, then B is also A's friend.
CREATE VIEW Temp
AS
SELECT id,name,grade,id2,[grd2] FROM highschooler
INNER JOIN Likes ON highschooler.id = Likes.id1
INNER JOIN (SELECT id as [id2t], grade as [grd2] from highschooler) a ON a.id2t = Likes.id2
UNION ALL
SELECT id,name,grade,[id1] as [id2],[grd2] FROM highschooler
INNER JOIN Likes ON highschooler.id = Likes.id2
INNER JOIN (SELECT id as [id2t], grade as [grd2] from highschooler) a ON a.id2t = Likes.id1
The Temp view gives me all the information I need.
CREATE VIEW PlayWithClassMate
AS
SELECT distinct id FROM Temp WHERE grade = grd2
The PlayWithClassMate view gives me every student who plays with at least one classmate (although a student could also have friends who are not classmates).
CREATE VIEW IDResult
AS
SELECT id FROM (
SELECT id, COUNT(GRD2) as c FROM TEMP
WHERE id in (SELECT id FROM PlayWithClassMate)
GROUP BY ID) A
WHERE C>1
The IDResult view holds all the IDs the question asks for.
Now select whatever you need for the students whose ID is in IDResult.
I don't think it's the best approach, and it may be the worst, but it works.
(Sorry about the terrible grammar.)
This is harder than it looks, because it requires preparing sets sequentially. But there are a few ways to solve this one. Here's what quickly comes to mind:
First, find the friend-of-friend (FoF) for everybody, by grade, producing something like:
[ID], [FoF ID], [Grade of FoF]
You really don't need [FoF ID], but it might help when debugging.
Then, as a second-order operation, you'll need to produce a list of [ID]s where [Grade of FoF] is equal to both the MAX() and MIN():
SELECT [ID], MAX([Grade of FoF]) AS A, MIN([Grade of FoF]) AS B FROM [the above] GROUP BY [ID] HAVING MAX([Grade of FoF]) = MIN([Grade of FoF])
UPDATE:
I realized that I should also add this to the final query: A=B and A=Grade. Then this solution works. Keep in mind: it only answers the question "Find names and grades of students who only have friends in the same grade." and it assumes friendship is one-directional. (Sorry, I had to leave something undone.)
For those that need to see some SQL, here you are. It's written for MS Access, but easily ported (start by removing the "()" in the inner-most query) to MySQL, PGSQL, or Oracle. Better still, no procedural extensions and no temp tables.
SELECT
name
FROM
(
SELECT
ID
,name
,grade
,min( friend_grade) as min_friend_grade
,max( friend_grade) as max_friend_grade
FROM
(
SELECT
hs1.ID
,hs1.name
,hs1.grade
,l.ID2 as friend_id
,hs2.name as friend_name
,hs2.grade as friend_grade
FROM
( highschooler hs1
INNER JOIN likes l ON (hs1.ID = l.ID1) )
INNER JOIN highschooler hs2 ON (l.ID2 = hs2.ID)
)FoF
GROUP BY
ID
,name
,grade
)FoF_max_min
WHERE
grade=min_friend_grade
AND min_friend_grade=max_friend_grade
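The min/max idea ported to SQLite, as a sketch: the Access-style parentheses around the join are dropped, and the innermost derived table is folded into the GROUP BY, which should not change the result. The highschooler and likes rows are invented, and, like the answer, the query treats a row in likes as a one-directional friendship.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE highschooler (id INTEGER PRIMARY KEY, name TEXT, grade INTEGER);
CREATE TABLE likes (id1 INTEGER, id2 INTEGER);
INSERT INTO highschooler VALUES
    (1, 'Ann', 9), (2, 'Bob', 9), (3, 'Cal', 10), (4, 'Dee', 9);
-- Ann -> Bob (same grade), Bob -> Cal (different grade), Dee -> Bob (same grade).
INSERT INTO likes VALUES (1, 2), (2, 3), (4, 2);
""")

# Collect min/max friend grade per student; keep students whose friends
# all share one grade and that grade equals the student's own.
SAME_GRADE_ONLY = """
SELECT name, grade
FROM (
    SELECT hs1.id, hs1.name, hs1.grade,
           MIN(hs2.grade) AS min_friend_grade,
           MAX(hs2.grade) AS max_friend_grade
    FROM highschooler hs1
    INNER JOIN likes l ON hs1.id = l.id1
    INNER JOIN highschooler hs2 ON l.id2 = hs2.id
    GROUP BY hs1.id, hs1.name, hs1.grade
) fof_max_min
WHERE grade = min_friend_grade
  AND min_friend_grade = max_friend_grade
ORDER BY grade, name
"""

rows = conn.execute(SAME_GRADE_ONLY).fetchall()
print(rows)
```

Bob is excluded because his only friend is in grade 10, and Cal is excluded because the inner joins drop students with no outgoing likes.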

SQL: Speed Improvement - Left Join on cond1 or cond2

SELECT DISTINCT a.*, b.*
FROM current_tbl a
LEFT JOIN import_tbl b
ON ( a.user_id = b.user_id
OR ( a.f_name||' '||a.l_name = b.f_name||' '||b.l_name)
)
Two tables that are basically the same
I don't have access to the table structure or data input (thus no cleaning up primary keys)
Sometimes the user_id is populated in one and not the other
Sometimes names are equal, sometimes they are not
I've found that I can get most of the data by matching on user_id or on the first/last names. I'm using the ' ' between the names to avoid cases where one user's first name equals another user's last name and both are missing the other field (unlikely, but plausible).
This query runs in 33000 ms, whereas each condition on its own runs in about 200 ms.
I've been up late and can't think straight right now.
I'm thinking that I could do a UNION and only query by name where a user_id does not exist (the default join is on user_id; if a user_id doesn't exist, then I want to join by name).
Here are some free points for anyone who wants to help.
Please don't ask for the execution plan.
Looks like you can easily avoid the string concatenation:
OR ( a.f_name||' '||a.l_name = b.f_name||' '||b.l_name)
Change it to:
OR ( a.f_name = b.f_name AND a.l_name = b.l_name)
Rather than concatenating first and last name and comparing them, try comparing them individually instead. Assuming you have them (and you should create them if you don't), this should improve your chances of using indexes on the first name and last name columns.
SELECT DISTINCT a.*, b.*
FROM current_tbl a
LEFT JOIN import_tbl b
ON ( a.user_id = b.user_id
OR (a.f_name = b.f_name and a.l_name = b.l_name)
)
If people's suggestions don't provide a major speed increase, there is a possibility that your real problem is that the best query plan for your two possible join conditions is different. For that situation you would want to do two queries and merge results in some way. This is likely to make your query much, much uglier.
One obscure trick that I have used for that kind of situation is to do a GROUP BY off of a UNION ALL query. The idea looks like this:
SELECT a_field1, a_field2, ...,
  MAX(b_field1) as b_field1, MAX(b_field2) as b_field2, ...
FROM (
  SELECT a.field_1 as a_field1, ..., b.field1 as b_field1, ...
  FROM current_tbl a
  LEFT JOIN import_tbl b
    ON a.user_id = b.user_id
  UNION ALL
  SELECT a.field_1 as a_field1, ..., b.field1 as b_field1, ...
  FROM current_tbl a
  LEFT JOIN import_tbl b
    ON a.f_name = b.f_name AND a.l_name = b.l_name
) combined
GROUP BY a_field1, a_field2, ...
And now the database can do each of the two joins using the most efficient plan.
(Warning of a drawback in this approach. If a row in current_tbl joins to multiple rows in import_tbl, then you'll wind up merging data in a very odd way.)
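Here is a concrete version of the UNION ALL + GROUP BY trick on SQLite. The elided field lists are replaced by a hypothetical surrogate key id on current_tbl and a hypothetical email column on import_tbl; neither comes from the question. Row 1 matches only by user_id, row 2 only by name, and row 3 not at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE current_tbl (id INTEGER PRIMARY KEY, user_id TEXT,
                          f_name TEXT, l_name TEXT);
CREATE TABLE import_tbl (user_id TEXT, f_name TEXT, l_name TEXT, email TEXT);
INSERT INTO current_tbl VALUES
    (1, 'u1', 'Ann', 'Lee'), (2, NULL, 'Bob', 'Ray'), (3, 'u3', 'Cy', 'Oh');
INSERT INTO import_tbl VALUES
    ('u1', NULL, NULL, 'ann@example.com'), (NULL, 'Bob', 'Ray', 'bob@example.com');
""")

# Each branch can use its own join plan; GROUP BY + MAX then collapses the
# two candidate rows per current_tbl row, and MAX ignores the NULL misses.
UNION_GROUP = """
SELECT a_id, a_f_name, MAX(b_email) AS b_email
FROM (
    SELECT a.id AS a_id, a.f_name AS a_f_name, b.email AS b_email
    FROM current_tbl a
    LEFT JOIN import_tbl b ON a.user_id = b.user_id
    UNION ALL
    SELECT a.id, a.f_name, b.email
    FROM current_tbl a
    LEFT JOIN import_tbl b ON a.f_name = b.f_name AND a.l_name = b.l_name
) combined
GROUP BY a_id, a_f_name
ORDER BY a_id
"""

rows = conn.execute(UNION_GROUP).fetchall()
print(rows)
```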
Incidental random performance tip. Unless you have reason to believe that there are potential duplicate rows, avoid DISTINCT. It forces an implicit GROUP BY, which can be expensive.
I don't really understand why you're concatenating those strings. Seems like that's where your slowdown would be. Does this work instead?
SELECT DISTINCT a.*, b.*
FROM current_tbl a
LEFT JOIN import_tbl b
ON ( a.user_id = b.user_id
OR ( a.f_name = b.f_name AND a.l_name = b.l_name)
)
Here is Yet Another Ugly Way To Do It.
SELECT a.*
, CASE WHEN b.user_id IS NULL THEN c.field1 ELSE b.field1 END as b_field1
, CASE WHEN b.user_id IS NULL THEN c.field2 ELSE b.field2 END as b_field2
...
FROM current_tbl a
LEFT JOIN import_tbl b
ON a.user_id = b.user_id
LEFT JOIN import_tbl c
ON a.f_name = c.f_name AND a.l_name = c.l_name;
This avoids any GROUP BY, and also handles conflicting matches in a somewhat reasonable way.
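Here is the double LEFT JOIN and CASE form sketched on SQLite, with a hypothetical surrogate key id on current_tbl and a hypothetical email column on import_tbl standing in for the elided fields. The CASE prefers the user_id match and falls back to the name match. As with any join on names, a name that matches several import rows would still duplicate the current_tbl row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE current_tbl (id INTEGER PRIMARY KEY, user_id TEXT,
                          f_name TEXT, l_name TEXT);
CREATE TABLE import_tbl (user_id TEXT, f_name TEXT, l_name TEXT, email TEXT);
INSERT INTO current_tbl VALUES
    (1, 'u1', 'Ann', 'Lee'), (2, NULL, 'Bob', 'Ray'), (3, 'u3', 'Cy', 'Oh');
INSERT INTO import_tbl VALUES
    ('u1', NULL, NULL, 'ann@example.com'), (NULL, 'Bob', 'Ray', 'bob@example.com');
""")

# b = match on user_id, c = match on names; use c only when b found nothing.
TWO_JOINS = """
SELECT a.id, a.f_name,
       CASE WHEN b.user_id IS NULL THEN c.email ELSE b.email END AS b_email
FROM current_tbl a
LEFT JOIN import_tbl b
    ON a.user_id = b.user_id
LEFT JOIN import_tbl c
    ON a.f_name = c.f_name AND a.l_name = c.l_name
ORDER BY a.id
"""

rows = conn.execute(TWO_JOINS).fetchall()
print(rows)
```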
Try using JOIN hints:
http://msdn.microsoft.com/en-us/library/ms173815.aspx
We were encountering the same type of behavior with one of our queries. As a last resort we added the LOOP hint, and the query ran much, much faster.
It's important to note what Microsoft says about JOIN hints:
Because the SQL Server query optimizer typically selects the best execution plan for a query, we recommend that hints, including <join_hint>, be used only as a last resort by experienced developers and database administrators.
My boss at my last job... I swear, he thought that using UNIONs was ALWAYS FASTER THAN OR.
For example, instead of writing
Select * from employees Where Employee_id = 12 or employee_id = 47
he would write (and have me write)
Select * from employees Where employee_id = 12
UNION
Select * from employees Where employee_id = 47
The SQL Server optimizer said that this was the right thing to do in SOME situations. I have a friend who works on the SQL Server team at Microsoft; I emailed him about this, and he told me that my stats were out of date or something along those lines.
I never really got a good answer on WHY the unions are faster, it seems REALLY counter-intuitive.
I'm not recommending you DO this, but in some situations it can help.
Also, two more things. First, GET RID OF THE DISTINCT CLAUSE unless you absolutely need it.
Second, and more importantly, you can easily get rid of the concatenation in your join, for example like this (pardon my lack of MySQL knowledge):
SELECT DISTINCT a.*, b.*
FROM current_tbl a
LEFT JOIN import_tbl b
ON ( a.user_id = b.user_id
OR ( a.f_name = b.f_name and a.l_name = b.l_name)
)
I've had some tests at work in a similar situation that showed a 10x performance improvement from getting rid of the simple concatenation in the join.