Is it possible to accomplish the equivalent of a LEFT JOIN with a subselect where multiple columns are required?
Here's what I mean.
SELECT m.*, (SELECT * FROM model WHERE id = m.id LIMIT 1) AS models FROM make m
As it stands now, doing this gives me an 'Operand should contain 1 column(s)' error.
Yes, I know this is possible with LEFT JOIN, but I was told it was possible with a subselect, so I'm curious as to how it's done.
There are many practical uses for what you suggest.
This hypothetical query would return the most recent release_date (contrived example) for any make with at least one release_date, and null for any make with no release_date:
SELECT m.make_name,
sub.max_release_date
FROM make m
LEFT JOIN
(SELECT id,
max(release_date) as max_release_date
FROM make
GROUP BY 1) sub
ON sub.id = m.id
A scalar subselect in the SELECT list can only return one column (and one row), so you would need one subselect for each column that you want returned from the model table.
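For example, something like this (a sketch only; model_name and model_price are made-up column names, since the question doesn't show the model table's columns):
-- model_name / model_price are placeholder column names
SELECT m.*,
       (SELECT model_name  FROM model WHERE model.id = m.id LIMIT 1) AS model_name,
       (SELECT model_price FROM model WHERE model.id = m.id LIMIT 1) AS model_price
FROM make m;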
Background
I've got this PostgreSQL join that works pretty well for me:
select m.id,
m.zodiac_sign,
m.favorite_color,
m.state,
c.combined_id
from people."People" m
LEFT JOIN people.person_to_person_composite_crosstable c on m.id = c.id
As you can see, I'm joining two tables to bring in a combined_id, which I need for later analysis elsewhere.
The Goal
I'd like to write a query that keeps, for each combined_id, the row with the lowest value of m.id next to it (along with the other variables too). This ought to result in a new table with unique/distinct values of combined_id.
The Problem
The issue is that the current query returns ~300 records, but I need it to return ~100. Why? Each combined_id has, on average, 3 different m.id's. I don't actually care about the m.id's; I care about getting a unique combined_id. Because of this, I decided that a good "selection criterion" would be to select rows based on the lowest value m.id for rows with the same combined_id.
What I've tried
I've consulted several posts on this and I feel like I'm fairly close. See for instance this one or this one. This other one does exactly what I need (with MAX instead of MIN) but he's asking for it in Unix Bash 😞
Here's an example of something I've tried:
select m.id,
m.zodiac_sign,
m.favorite_color,
m.state,
c.combined_id
from people."People" m
LEFT JOIN people.person_to_person_composite_crosstable c on m.id = c.id
WHERE m.id IN (select min(m.id))
This returns the error ERROR: aggregate functions are not allowed in WHERE.
Any ideas?
Postgres's DISTINCT ON is probably the best approach here:
SELECT DISTINCT ON (c.combined_id)
m.id,
m.zodiac_sign,
m.favorite_color,
m.state,
c.combined_id
FROM people."People" m
LEFT JOIN people.person_to_person_composite_crosstable c
ON m.id = c.id
ORDER BY
c.combined_id,
m.id;
As for performance, the following index on the crosstable might speed up the query:
CREATE INDEX idx ON people.person_to_person_composite_crosstable (id, combined_id);
If used, the above index should let the join happen faster. Note that it also covers the combined_id column, which is required by the SELECT.
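For what it's worth, a roughly equivalent sketch using ROW_NUMBER() instead of DISTINCT ON, in case the same logic is ever needed on a database without DISTINCT ON (untested against your schema):
SELECT id, zodiac_sign, favorite_color, state, combined_id
FROM (
    SELECT m.id,
           m.zodiac_sign,
           m.favorite_color,
           m.state,
           c.combined_id,
           -- rn = 1 marks the row with the lowest m.id for each combined_id
           ROW_NUMBER() OVER (PARTITION BY c.combined_id ORDER BY m.id) AS rn
    FROM people."People" m
    LEFT JOIN people.person_to_person_composite_crosstable c ON m.id = c.id
) sub
WHERE rn = 1;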
I need to use INNER JOINs to get a series of information and then I need to COUNT this info. I need to be able to "View all courses and the instructor taking them, the capacity of the course, and the number of members currently booked on the course."
To get all the info I have done the following query:
SELECT
C.coursename, Instructors.fname, Instructors.lname,C.maxNo, membercourse.memno
FROM Courses AS C
INNER JOIN Instructors ON C.instructorNo = Instructors.instructorNo
INNER JOIN Membercourse ON C.courseID = Membercourse.courseID;
but no matter where I put the COUNT, it always tells me that whatever is outside the COUNT should be in the GROUP BY.
I have worked out how to COUNT/GROUP BY the necessary info e.g.:
SELECT courseID, COUNT (DISTINCT MC.memno)
FROM Membercourse AS MC
GROUP BY MC.courseID;
but I don't know how to combine the two!
I think what you're looking for is a subquery. I'm a SQL-Server guy (not postgresql) but the concept looks to be almost identical after some crash-course postgresql googling.
Anyway, basically, when you write a SELECT statement, you can use a subquery instead of an actual table. So your SQL would look something like:
select count(*)
from
(
    select t.stuff
    from someTable t
    inner join someOtherTable o on o.id = t.id
) sub
... hopefully that makes sense. Instead of trying to write one big query where you're doing both the inner join and count, you're writing two: an inner one that gets your inner-join'ed data, and then an outer one to actually count the rows.
EDIT: To help explain a bit more on the thought process behind subqueries.
Subqueries are a way of logically breaking down the steps/processes on the data. Instead of trying to do everything in one big step, you do it in steps.
In this case, what's step one? It's to get a single data source for your combined, inner-joined data.
Step 1: Write the Inner Join query
SELECT
C.coursename, Instructors.fname, Instructors.lname,C.maxNo,
membercourse.memno
FROM Courses AS C
INNER JOIN Instructors ON C.instructorNo = Instructors.instructorNo
INNER JOIN Membercourse ON C.courseID = Membercourse.courseID;
Okay, now, what next?
Well, let's say we want to get a count of how many entries there are for each 'memno' in that result above.
Instead of trying to figure out how to modify that query above, we instead use it as a data source, like it was a table itself.
Step 2 - Make it A Subquery
select * from
(
SELECT
C.coursename, Instructors.fname, Instructors.lname,C.maxNo,
membercourse.memno
FROM Courses AS C
INNER JOIN Instructors ON C.instructorNo = Instructors.instructorNo
INNER JOIN Membercourse ON C.courseID = Membercourse.courseID
) mySubQuery
Step 3 - Modify your outer query to get the data you want.
Well, we wanted to group by 'memno', and get the count, right? So...
select memno, count(*)
from
(
-- all that same subquery stuff
) mySubQuery
group by memno
... make sense? Once you've got your subquery written out, you don't need to worry about it any more - you just treat it like a table you're working with.
This is actually incredibly important, and makes it much easier to read more intricate queries - especially since you can name your subqueries in a way that explains what the subquery represents data-wise.
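Putting the two queries from the question together with that pattern would look something like this (a sketch, untested; the GROUP BY count you already wrote becomes the subquery and is joined back to Courses):
SELECT C.coursename, I.fname, I.lname, C.maxNo, MC.booked
FROM Courses AS C
INNER JOIN Instructors AS I ON C.instructorNo = I.instructorNo
INNER JOIN (
    -- this inner query is the per-course count from the question
    SELECT courseID, COUNT(DISTINCT memno) AS booked
    FROM Membercourse
    GROUP BY courseID
) AS MC ON MC.courseID = C.courseID;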
There are many ways to solve this, such as using window functions. But you can also achieve it using a simple subquery:
SELECT
C.coursename,
Instructors.fname,
Instructors.lname,
C.maxNo,
(SELECT
COUNT(*)
FROM
membercourse
WHERE
C.courseID = Membercourse.courseID) AS members
FROM
Courses AS C
INNER JOIN Instructors ON C.instructorNo = Instructors.instructorNo;
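And for completeness, a sketch of the window-function route mentioned above (assuming your database supports window functions; it counts Membercourse rows per course, so it assumes one row per member per course):
SELECT DISTINCT
    C.coursename,
    Instructors.fname,
    Instructors.lname,
    C.maxNo,
    -- counts the booking rows per course; 0 when a course has no bookings
    COUNT(MC.memno) OVER (PARTITION BY C.courseID) AS members
FROM Courses AS C
INNER JOIN Instructors ON C.instructorNo = Instructors.instructorNo
LEFT JOIN Membercourse AS MC ON C.courseID = MC.courseID;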
Using Oracle Live SQL, I'm trying to display a "Standings" view.
The two tables in use are MATCH and TEAM,
so every time MATCH.WIN_TEAM equals a team's ID, it should count as a win for that team.
I guess I'm having issues with my count function. What I want to end up with is:
TEAM_NAME WINS
This is pulling up just the teams; the subquery isn't working. Am I on the right track?
select A.TEAM, B.W
from TEAM A
left join
(
select MATCH.WIN_TEAM,
Count(MATCH.WIN_TEAM) as W
from MATCH
group by MATCH.WIN_TEAM
) B
on A.ID = B.WIN_TEAM
order by B.W desc;
I'm a newb, help. I'm guessing I need aliases too?
select TEAM.NAME ,copy_match.count_match
from TEAM
left join (select MATCH.WIN_TEAM as WIN_TEAM,
Count(MATCH.WIN_TEAM) AS count_match
from MATCH
group by MATCH.WIN_TEAM
) copy_match
on TEAM.ID=copy_match.WIN_TEAM;
Even if, technically, it works, I think you are overcomplicating your query:
select TEAM.NAME ,COUNT(MATCH.WIN_TEAM)
from TEAM
left join
MATCH ON TEAM.ID=MATCH.WIN_TEAM
GROUP BY TEAM.NAME
ORDER BY COUNT(MATCH.WIN_TEAM)
Isn't this version a little easier? Or maybe I missed a detail.
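If you want the standings sorted with the most wins first, as in your original attempt, you'd presumably just add DESC (same query, only the sort direction and a column alias added):
select TEAM.NAME, COUNT(MATCH.WIN_TEAM) AS WINS  -- WIN_TEAM is null for teams with no wins, so they count as 0
from TEAM
left join MATCH ON TEAM.ID = MATCH.WIN_TEAM
GROUP BY TEAM.NAME
ORDER BY COUNT(MATCH.WIN_TEAM) DESC;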
My code is as follows:
SELECT COUNT(*)
FROM earned_dollars a
LEFT JOIN product_reference b ON a.product_code = b.product_code
WHERE a.activity_year = '2015'
I'm trying to match two tables based on their product codes. I would expect the same number of results back from this as total records in table a (with a year of 2015). But for some reason I'm getting close to 3 million.
Table a has about 40,000,000 records and table b has 2000. When I run this statement without the join I get 2,500,000 results, so I would expect this even with the left join, but somehow I'm getting 300,000,000. Any ideas? I even referred to the diagram in this post.
It means either your LEFT JOIN is using only part of the foreign key, which causes row multiplication, or there are simply duplicate rows in the joined table.
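One quick way to check the second possibility, using the table and column names from your query (each duplicate product_code in product_reference multiplies the matching rows from earned_dollars):
SELECT product_code, COUNT(*) AS copies
FROM product_reference
GROUP BY product_code
HAVING COUNT(*) > 1;   -- any rows here are duplicates that multiply the join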
use COUNT(DISTINCT a.product_code)
What is the question you are trying to answer with the T-SQL?
Instead of SELECT COUNT(*), try SELECT a.product_code, b.product_code. That will show you which records match and which don't.
You should also add WHERE b.product_code IS NOT NULL. That should exclude the records that don't match.
Is b the parent table and a the child table? Try a RIGHT JOIN instead.
Or use the table's unique identifier, i.e.
SELECT COUNT(a.earned_dollars_id)
Not sure what your data model looks like and how it is structured, but I'm guessing you only care about earned_dollars?
SELECT COUNT(*)
FROM earned_dollars a
WHERE a.activity_year = '2015'
and exists (select 1 from product_reference b where a.product_code = b.product_code)
What more can I do to optimize this query?
SELECT * FROM
(SELECT `item`.itemID, COUNT(`votes`.itemID) AS `votes`,
`item`.title, `item`.itemTypeID, `item`.submitDate,
`item`.deleted, `item`.ItemCat,
`item`.counter, `item`.userID, `users`.name,
TIMESTAMPDIFF(minute,`submitDate`,NOW()) AS 'timeMin' ,
`myItems`.userID as userIDFav, `myItems`.deleted as myDeleted
FROM (votes `votes` RIGHT OUTER JOIN item `item`
ON (`votes`.itemID = `item`.itemID))
INNER JOIN
users `users`
ON (`users`.userID = `item`.userID)
LEFT OUTER JOIN
myItems `myItems`
ON (`myItems`.itemID = `item`.itemID)
WHERE (`item`.deleted = 0)
GROUP BY `item`.itemID,
`votes`.itemID,
`item`.title,
`item`.itemTypeID,
`item`.submitDate,
`item`.deleted,
`item`.ItemCat,
`item`.counter,
`item`.userID,
`users`.name,
`myItems`.deleted,
`myItems`.userID
ORDER BY `item`.itemID DESC) as myTable
where myTable.userIDFav = 3 or myTable.userIDFav is null
limit 0, 20
I'm using MySQL
Thanks
What does the analyzer say for this query? Without knowing how many rows there are in the tables, you can't tell what to optimize. So run the analyzer and you'll see which parts cost what.
Of course, as #theomega said, look at the execution plan.
But I'd also suggest trying to "clean up" your statement. (I don't know which one is faster - that depends on your table sizes.) Usually, I'd start with a clean statement and optimize from there; typically, a clean statement makes it easier for the optimizer to come up with a good execution plan.
So here are some observations about your statement that might make things slow:
a couple of outer joins (makes it hard for the optimizer to figure out an index to use)
a group by
a lot of columns to group by
As far as I understand your SQL, this statement should do most of what yours is doing:
SELECT `item`.itemID, `item`.title, `item`.itemTypeID,
       `item`.submitDate, `item`.deleted, `item`.ItemCat,
       `item`.counter, `item`.userID, `users`.name,
       TIMESTAMPDIFF(minute, `submitDate`, NOW()) AS 'timeMin'
FROM item `item`
INNER JOIN users `users` ON (`users`.userID = `item`.userID)
WHERE `item`.deleted = 0
Of course, this misses the info from the tables you outer joined; I'd suggest adding the required columns via a subselect:
SELECT `item`.itemID,
       (SELECT COUNT(v.itemID)
        FROM votes v
        WHERE v.itemID = `item`.itemID) AS `votes`, <etc.>
This way, you can get rid of one outer join and the group by. The outer join is replaced by the subselect, so there is a trade-off which may be bad for the "cleaner" statement.
Depending on the cardinality between item and myItems, you can do the same or you'd have to stick with the outer join (but no need to reintroduce the group by).
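For concreteness, here is a rough sketch of where that rewrite could end up, keeping the myItems columns via the LEFT JOIN and applying the userIDFav filter directly (untested; whether this beats the original depends on your data):
SELECT `item`.itemID,
       -- the vote count moves into a scalar subselect, so no GROUP BY is needed
       (SELECT COUNT(v.itemID)
        FROM votes v
        WHERE v.itemID = `item`.itemID) AS `votes`,
       `item`.title, `item`.itemTypeID, `item`.submitDate,
       `item`.ItemCat, `item`.counter, `item`.userID,
       `users`.name,
       TIMESTAMPDIFF(minute, `item`.submitDate, NOW()) AS 'timeMin',
       `myItems`.userID AS userIDFav,
       `myItems`.deleted AS myDeleted
FROM item `item`
INNER JOIN users `users` ON `users`.userID = `item`.userID
LEFT JOIN myItems `myItems` ON `myItems`.itemID = `item`.itemID
WHERE `item`.deleted = 0
  -- the original outer filter on userIDFav, applied directly
  AND (`myItems`.userID = 3 OR `myItems`.userID IS NULL)
ORDER BY `item`.itemID DESC
LIMIT 0, 20;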
Hope this helps.
Some quick semi-random thoughts:
Are your itemID and userID columns indexed?
What happens if you add "EXPLAIN " to the start of the query and run it? Does it use indexes? Are they sensible?
Do you need to run the whole inner query and filter on it, or could you move the where myTable.userIDFav = 3 or myTable.userIDFav is null part into the inner query?
You do seem to have too many fields in the GROUP BY list; since one of them is itemID, I suspect that you could use an inner SELECT to perform the grouping and an outer SELECT to return the set of fields desired.
Can't you add the where clause myTable.userIDFav = 3 or myTable.userIDFav is null to WHERE (item.deleted = 0)?
Regards
Lieven
Look at the way your query is built: you join a lot of stuff, then limit the output to 20 rows. Since your conditions only apply to item and myItems, you should do the outer join on those two tables, limit the output to the first 20 rows, and only then join and aggregate the rest. As it stands, you are performing a lot of work that is then discarded.
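A rough sketch of that idea with the names from the question (untested; the point is just to push the filter and the LIMIT into an inner query over item and myItems before the users join and the vote count happen):
SELECT t.itemID, t.title, t.submitDate, t.userIDFav, t.myDeleted,
       `users`.name,
       (SELECT COUNT(v.itemID) FROM votes v WHERE v.itemID = t.itemID) AS `votes`
FROM (
    -- filter and limit on item/myItems first, so the later work only touches 20 rows
    SELECT `item`.itemID, `item`.title, `item`.submitDate, `item`.userID,
           `myItems`.userID AS userIDFav, `myItems`.deleted AS myDeleted
    FROM item `item`
    LEFT JOIN myItems `myItems` ON `myItems`.itemID = `item`.itemID
    WHERE `item`.deleted = 0
      AND (`myItems`.userID = 3 OR `myItems`.userID IS NULL)
    ORDER BY `item`.itemID DESC
    LIMIT 0, 20
) AS t
INNER JOIN users `users` ON `users`.userID = t.userID
ORDER BY t.itemID DESC;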