Sorry for my English
I have two tables:
Partners
ID | NAME | IS_FAVORITE
PartnerPoints
ID | PARTNER_ID | NAME
I want to get all rows from PartnerPoints that are related to Partners (by PARTNER_ID) whose IS_FAVORITE field is set to 1. That is, I want all partner points that belong to favorite partners.
How can I do that?
You just need a WHERE clause with an EXISTS subquery:
SELECT PartnerPoints.*
FROM PartnerPoints
WHERE EXISTS ( SELECT *
FROM Partners
WHERE Partners.ID = PartnerPoints.PARTNER_ID
AND Partners.IS_FAVORITE = 1
)
You can do this by JOINING the tables.
SELECT PartnerPoints.*
FROM PartnerPoints JOIN Partners ON PartnerPoints.Partner_ID=Partners.ID
WHERE Partners.Is_favorite = 1
This is an INNER JOIN. Oscar Pérez’s answer, with the subquery, is called a SEMI-JOIN. The database may execute the same plan, or this INNER JOIN may be faster. In more complicated cases, you may have to use a semi-join.
You can do this by first computing the IDs of all favorite partners, and then searching for PartnerPoints that have such a partner ID:
SELECT *
FROM PartnerPoints
WHERE Partner_ID IN (SELECT ID
FROM Partners
WHERE Is_Favorite = 1)
Which type of query is fastest depends on the amount and distribution of data in the tables, and on which indexes you have; if the speed actually matters to you, you have to measure all three queries.
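Before timing anything, it can help to confirm that the three forms really return the same rows. Here is a minimal sketch using Python's built-in sqlite3 module; the sample rows are made up for illustration, and only the table and column names come from the question.

```python
import sqlite3

# Hypothetical sample data; only the table/column names come from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Partners (ID INTEGER PRIMARY KEY, NAME TEXT, IS_FAVORITE INTEGER);
    CREATE TABLE PartnerPoints (ID INTEGER PRIMARY KEY, PARTNER_ID INTEGER, NAME TEXT);
    INSERT INTO Partners VALUES (1, 'Acme', 1), (2, 'Globex', 0), (3, 'Initech', 1);
    INSERT INTO PartnerPoints VALUES
        (10, 1, 'Acme North'), (11, 1, 'Acme South'),
        (12, 2, 'Globex HQ'), (13, 3, 'Initech Main');
""")

# The three query shapes from the answers above, reduced to the ID column.
queries = {
    "exists": """
        SELECT pp.ID FROM PartnerPoints pp
        WHERE EXISTS (SELECT 1 FROM Partners p
                      WHERE p.ID = pp.PARTNER_ID AND p.IS_FAVORITE = 1)
        ORDER BY pp.ID""",
    "join": """
        SELECT pp.ID FROM PartnerPoints pp
        JOIN Partners p ON pp.PARTNER_ID = p.ID
        WHERE p.IS_FAVORITE = 1
        ORDER BY pp.ID""",
    "in": """
        SELECT ID FROM PartnerPoints
        WHERE PARTNER_ID IN (SELECT ID FROM Partners WHERE IS_FAVORITE = 1)
        ORDER BY ID""",
}

results = {name: [row[0] for row in conn.execute(sql)] for name, sql in queries.items()}
for name, ids in results.items():
    print(name, ids)
```

All three agree here because Partners.ID is unique, so the join cannot duplicate a partner point. On real data you would run each query several times against your own tables and indexes to compare speed.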
I have the following SQL:
SELECT j.AssocJobKey
, COUNT(DISTINCT o.ID) AS SubjectsOrdered
, COUNT(DISTINCT s.ID) AS SubjectsShot
FROM Jobs j
LEFT JOIN Orders o ON o.AssocJobKey = j.AssocJobKey
LEFT JOIN Subjects s ON j.AssocJobKey = s.AssocJobKey
GROUP BY
j.AssocJobKey
,j.JobYear
The basic structure is that a Job is the parent, unique by AssocJobKey, and has a one-to-many relationship with both Subjects and Orders.
The query gives me what I want, the output looks like this:
| AssocJobKey | SubjectsOrdered | SubjectsShot |
|-------------|-----------------|--------------|
| BAT-H181    | 107             | 830          |
| BAT-H131    | 226             | 1287         |
The problem is that the query is way too heavy and my memory is spiking; there's no way I could run this on a large dataset. If I remove one of the LEFT JOINs and its corresponding count, the query executes instantly and there's no problem. So somehow things are bouncing around between the two left joins more than they should, but I don't understand why they would.
Really hoping to avoid joining on sub selects if at all possible.
Your query is generating a Cartesian product for each job. And this is big -- your second row alone has about 290k rows being generated (226 × 1287). COUNT(DISTINCT) then has to figure out the unique ids among this Cartesian product.
The solution is simple: pre-aggregate:
SELECT j.AssocJobKey, o.SubjectsOrdered, s.SubjectsShot
FROM Jobs j LEFT JOIN
(SELECT o.AssocJobKey, COUNT(*) as SubjectsOrdered
FROM Orders o
GROUP BY o.AssocJobKey
) o
ON o.AssocJobKey = j.AssocJobKey LEFT JOIN
(SELECT s.AssocJobKey, COUNT(*) AS SubjectsShot
FROM Subjects s
GROUP BY s.AssocJobKey
) s
ON j.AssocJobKey = s.AssocJobKey;
This makes certain assumptions that I think are reasonable:
The ids in the orders and subjects table are unique and non-NULL.
jobs.AssocJobKey is unique.
The query can be easily adapted if either of these is not true, but they seem like reasonable assumptions.
Often for these types of joins over different dimensions, COUNT(DISTINCT) is a reasonable solution (the queries are certainly simpler). This is true when there are at most a handful of values.
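To see the blow-up on a small scale, here is a rough sketch in Python with the built-in sqlite3 module. The job keys and row counts are invented for illustration, and the COALESCE calls are my addition so that jobs without orders show 0 rather than NULL.

```python
import sqlite3

# Hypothetical data: JOB-A has 3 orders and 4 subjects, JOB-B has 0 orders and 2 subjects.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Jobs (AssocJobKey TEXT PRIMARY KEY, JobYear INTEGER);
    CREATE TABLE Orders (ID INTEGER PRIMARY KEY, AssocJobKey TEXT);
    CREATE TABLE Subjects (ID INTEGER PRIMARY KEY, AssocJobKey TEXT);
    INSERT INTO Jobs VALUES ('JOB-A', 2016), ('JOB-B', 2016);
""")
conn.executemany("INSERT INTO Orders (AssocJobKey) VALUES (?)", [("JOB-A",)] * 3)
conn.executemany("INSERT INTO Subjects (AssocJobKey) VALUES (?)",
                 [("JOB-A",)] * 4 + [("JOB-B",)] * 2)

# Rows produced per job by the double LEFT JOIN, before any DISTINCT is applied.
fanout = conn.execute("""
    SELECT j.AssocJobKey, COUNT(*)
    FROM Jobs j
    LEFT JOIN Orders o ON o.AssocJobKey = j.AssocJobKey
    LEFT JOIN Subjects s ON s.AssocJobKey = j.AssocJobKey
    GROUP BY j.AssocJobKey
    ORDER BY j.AssocJobKey
""").fetchall()

# Pre-aggregated version: each child table is collapsed to one row per job first.
counts = conn.execute("""
    SELECT j.AssocJobKey,
           COALESCE(o.SubjectsOrdered, 0),
           COALESCE(s.SubjectsShot, 0)
    FROM Jobs j
    LEFT JOIN (SELECT AssocJobKey, COUNT(*) AS SubjectsOrdered
               FROM Orders GROUP BY AssocJobKey) o
           ON o.AssocJobKey = j.AssocJobKey
    LEFT JOIN (SELECT AssocJobKey, COUNT(*) AS SubjectsShot
               FROM Subjects GROUP BY AssocJobKey) s
           ON s.AssocJobKey = j.AssocJobKey
    ORDER BY j.AssocJobKey
""").fetchall()

print(fanout)  # intermediate row counts: 3 * 4 = 12 for JOB-A
print(counts)
```

With 3 orders and 4 subjects the intermediate result already has 12 rows for one job; the pre-aggregated form never builds that product.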
I have a schema that looks like the following:
Invoices: | id | ... |
InvoicePayments: | id | invoice_id | amount_cents | ... |
LineItems: | id | invoice_id | unit_price_cents | quantity | ... |
and I am looking to find unpaid invoices, that is, invoices whose sum of amount_cents from InvoicePayments is less than the sum of (quantity * unit_price_cents) from LineItems. I was able to accomplish this with two subqueries, like:
SELECT prices.id FROM (
SELECT invoices.id, sum(invoice_payments.amount_cents) as paid
FROM invoices
LEFT JOIN invoice_payments ON invoice_payments.invoice_id = invoices.id
GROUP BY invoices.id
) payments JOIN (
SELECT invoices.id, sum(line_items.quantity * line_items.unit_price_cents) as price
FROM invoices
LEFT JOIN line_items ON line_items.invoice_id = invoices.id
GROUP BY invoices.id
) prices
ON payments.id = prices.id
WHERE paid < price OR paid IS NULL;
However, I am using ActiveRecord and would like something simpler that could be translated into Arel statements; additionally, I would like to use this as a reusable scope, so I could apply additional constraints, such as finding Invoices that were unpaid on a certain date, by filtering out InvoicePayments that are newer than that date.
Is there a way to accomplish this without subqueries so that I can use this more easily with Rails and apply flexible filters?
One approach would be to define views to encapsulate the subqueries you have defined above, and to then define read only models on them.
They can then associate with the invoice model, and give you the opportunity to simplify your syntax considerably.
SELECT
i.id,
COALESCE(SUM(l.quantity * l.unit_price_cents),0) - COALESCE(SUM(p.amount_cents),0) AS UnpaidBalance
FROM
invoices i
LEFT JOIN invoice_payments p
ON i.id = p.invoice_id
AND p.DateColumn <= '2016-01-30'
LEFT JOIN line_items l
ON i.id = l.invoice_id
GROUP BY
i.id
HAVING
COALESCE(SUM(p.amount_cents),0) < COALESCE(SUM(l.quantity * l.unit_price_cents),0)
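Especially with the addition of window functions to many relational database management systems such as PostgreSQL, you rarely need subqueries like that anymore. When using left joins, aggregate functions ignore NULL values, so you can combine both left joins in the same query. Be aware, though, that the sums are only correct when an invoice does not have multiple rows in both joined tables: if it has several payments and several line items, each payment row is repeated once per line item (and vice versa), and both sums are inflated.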
You may notice the use of table aliases, which make the query a little easier to read and a lot less code to write. You can define an alias by typing one just after the table name in your query. I also included p.DateColumn <= some date. You can parameterize the query to pass a variable there, test whether it is NULL, and fall back to the current date and time if you want the balance as of today.
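One thing worth checking against your own data: when an invoice has several line items or several payments, joining both child tables in one query repeats rows and inflates the sums. The sketch below (Python's built-in sqlite3; the sample rows and the paid_on column name are assumptions, standing in for the DateColumn placeholder) compares the single-query form with a version that aggregates each child table first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE invoices (id INTEGER PRIMARY KEY);
    CREATE TABLE invoice_payments (id INTEGER PRIMARY KEY, invoice_id INTEGER,
                                   amount_cents INTEGER, paid_on TEXT);
    CREATE TABLE line_items (id INTEGER PRIMARY KEY, invoice_id INTEGER,
                             unit_price_cents INTEGER, quantity INTEGER);
    INSERT INTO invoices VALUES (1), (2), (3);
    -- Invoice 1: two line items (2000 total), one payment of 1500: unpaid
    -- Invoice 2: one line item (1000), no payments: unpaid
    -- Invoice 3: one line item (1000), one payment of 1000: paid
    INSERT INTO line_items (invoice_id, unit_price_cents, quantity) VALUES
        (1, 1000, 1), (1, 500, 2), (2, 1000, 1), (3, 1000, 1);
    INSERT INTO invoice_payments (invoice_id, amount_cents, paid_on) VALUES
        (1, 1500, '2016-01-10'), (3, 1000, '2016-01-20');
""")

# Both child tables joined in one query: invoice 1's payment is counted twice.
combined_sql = """
    SELECT i.id
    FROM invoices i
    LEFT JOIN invoice_payments p ON p.invoice_id = i.id AND p.paid_on <= :as_of
    LEFT JOIN line_items l ON l.invoice_id = i.id
    GROUP BY i.id
    HAVING COALESCE(SUM(p.amount_cents), 0)
         < COALESCE(SUM(l.quantity * l.unit_price_cents), 0)
    ORDER BY i.id
"""

# Each child table aggregated to one row per invoice before joining.
preagg_sql = """
    SELECT i.id
    FROM invoices i
    LEFT JOIN (SELECT invoice_id, SUM(amount_cents) AS paid
               FROM invoice_payments
               WHERE paid_on <= :as_of
               GROUP BY invoice_id) p ON p.invoice_id = i.id
    LEFT JOIN (SELECT invoice_id, SUM(quantity * unit_price_cents) AS price
               FROM line_items
               GROUP BY invoice_id) l ON l.invoice_id = i.id
    WHERE COALESCE(p.paid, 0) < COALESCE(l.price, 0)
    ORDER BY i.id
"""

def unpaid(sql, as_of):
    """Return the ids of invoices the given query reports as unpaid on as_of."""
    return [row[0] for row in conn.execute(sql, {"as_of": as_of})]

combined = unpaid(combined_sql, "2016-01-30")
preagg = unpaid(preagg_sql, "2016-01-30")
print(combined, preagg)
```

On this data the single-query form misses invoice 1, because its one payment of 1500 is counted once per line item (3000 against 2000). The pre-aggregated form reports it as unpaid, at the cost of bringing subqueries back.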
I have 2 tables. One has been pruned to show only IDs that meet certain criteria. The second needs to be pruned to show only data that matches the previous "array" of IDs. There can be multiple results.
Consider the following:
Query_1_final: Returns the IDs of users who meet certain criteria:
select
t1.[user_id]
from
[SQLDB].[db].[meeting_parties] as t1
inner join
(select distinct
[user_id]
from
[SQLDB].[db].[meeting_parties]
group by
[user_id]
having
count([user_id]) = 1) as t2 on t1.user_id = t2.user_id
where
[type] = 'organiser'
This works great and returns:
user_id
--------------------
22
1255
9821
and so on...
It produces a single column with the IDs of everyone who is a "Meeting Organizer" and also in the active_meetings table. (Note: there are multiple types/roles; this was the best way to grab them all.)
Now, I need this data to filter another table, another join. Here is the start of my query
Query_2_PREP: returns 5 columns where the meeting has "started" already.
SELECT
[meeting_id]
,[meeting_style]
,[meeting_day]
,[address]
,[promos]
FROM
[SQLDB].[db].[all_meetings]
WHERE
[meeting_started] = 'TRUE'
This works as well
meeting_id | meeting_style | meeting_day ...
---------------------------------------------
23 open M,F,SA
23 discussion TU,TH
23 lead W,F
and so on...
and returns ALL 10,982 meetings that have started, but I need it to return only the meetings that belong to the distinct organiser IDs from Query_1_final (which should be more like 1,200 records or so).
Ideally, I need something "like" this below (but of course it does not work)
Query 2: needs to return all meetings that are from organiser ID's only.
SELECT
[meeting_party_id]
,[meeting_style]
,[meeting_day]
,[address]
,[promos]
FROM
[SQLDB].[db].[all_meetings]
WHERE
[meeting_started] = 'TRUE'
AND [meeting_party_id] = "ANY Query_1_final results, especially multiple"
I have tried nesting JOIN and INNER JOIN's but I think there is something fundamental I am missing here about SQL. In PHP I would use an array compare or just run another query... any help would be much appreciated.
Just use IN. Here is the structure of the logic:
with q1 as (
<first query here>
)
SELECT m.*
FROM [SQLDB].[db].[all_meetings] m
WHERE meeting_started = 'TRUE' AND
meeting_party_id IN (SELECT user_id FROM q1);
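As a rough end-to-end check of that structure, here is a sketch using Python's built-in sqlite3 module. The rows are invented, the schema is trimmed to the columns the filter needs, and the [SQLDB].[db] prefix is dropped because SQLite has no such schema; the CTE body is the first query from the question.

```python
import sqlite3

# Hypothetical rows; user 30 has two roles, so the HAVING COUNT = 1 filter drops them.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE meeting_parties (user_id INTEGER, type TEXT);
    CREATE TABLE all_meetings (meeting_id INTEGER PRIMARY KEY, meeting_party_id INTEGER,
                               meeting_style TEXT, meeting_started TEXT);
    INSERT INTO meeting_parties VALUES
        (22, 'organiser'), (1255, 'organiser'),
        (30, 'organiser'), (30, 'attendee'), (40, 'attendee');
    INSERT INTO all_meetings VALUES
        (1, 22, 'open', 'TRUE'), (2, 22, 'discussion', 'TRUE'),
        (3, 1255, 'lead', 'FALSE'), (4, 30, 'open', 'TRUE'),
        (5, 1255, 'open', 'TRUE');
""")

sql = """
    WITH q1 AS (
        SELECT t1.user_id
        FROM meeting_parties AS t1
        INNER JOIN (SELECT user_id
                    FROM meeting_parties
                    GROUP BY user_id
                    HAVING COUNT(user_id) = 1) AS t2
                ON t1.user_id = t2.user_id
        WHERE t1.type = 'organiser'
    )
    SELECT m.meeting_id
    FROM all_meetings m
    WHERE m.meeting_started = 'TRUE'
      AND m.meeting_party_id IN (SELECT user_id FROM q1)
    ORDER BY m.meeting_id
"""

meeting_ids = [row[0] for row in conn.execute(sql)]
print(meeting_ids)
```

Meeting 3 is excluded because it has not started, and meeting 4 because user 30 is not in the organiser list; IN handles any number of matching IDs.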
For an assignment I must combine 3 tables and write a query that returns the names of all people who have less than half the wealth of the richest person. We define the wealth of a person as the total money on all of his/her accounts.
The 3 tables are:
Persons
id | name | address | age | eyeColor | gender
BankAccounts
id | balance
AccountOf
id | person_id → Persons | account_id → BankAccounts
I know how to use the SUM() function and the MAX() function, but combining them is a pain in my ass.
There is also someone without a bank account.
Does anyone know how to do this assignment or can give me a hint?
Not to give it away, since it's an assignment and that kind of ruins the whole thing, but... you'll need to find the sum(balance) for the richest person, which would be the max of all the persons' sum(balance). This will look something like:
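SELECT
    MAX(personbalance)
FROM
(
    SELECT
        SUM(bankaccounts.balance) AS personbalance
    FROM
        persons
        JOIN accountof ON accountof.person_id = persons.id
        JOIN bankaccounts ON bankaccounts.id = accountof.account_id
    GROUP BY persons.id
) subForSum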
This will just be a subquery in your main query, but it should give you enough direction to slap the rest of it together. When in doubt with these things, just subquery and subquery and subquery. You can clean it up after you get the answer you expect.
For future students who are looking for an answer:
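Use a LEFT JOIN, as some persons might not have a bank account.
For those persons the sum is NULL; use COALESCE to replace it with 0.
Use this as a subquery to obtain the wealth of the richest person, then compare each person's wealth against it:
SELECT MAX(personbalance)
FROM
(
    SELECT
        COALESCE(SUM(bankaccounts.balance), 0) AS personbalance
    FROM
        persons
        LEFT JOIN accountof ON accountof.person_id = persons.id
        LEFT JOIN bankaccounts ON bankaccounts.id = accountof.account_id
    GROUP BY persons.id
) subForSum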
Good Luck!
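For anyone who wants to see the pieces fit together, here is one possible way to assemble them, sketched with Python's built-in sqlite3 module. The people and balances are made up, and the CTE named wealth is my own choice rather than part of the assignment.

```python
import sqlite3

# Hypothetical data: Alice 600 + 400, Bob 300, Carol 700, Dave has no account.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Persons (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE BankAccounts (id INTEGER PRIMARY KEY, balance INTEGER);
    CREATE TABLE AccountOf (id INTEGER PRIMARY KEY, person_id INTEGER, account_id INTEGER);
    INSERT INTO Persons VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol'), (4, 'Dave');
    INSERT INTO BankAccounts VALUES (1, 600), (2, 400), (3, 300), (4, 700);
    INSERT INTO AccountOf VALUES (1, 1, 1), (2, 1, 2), (3, 2, 3), (4, 3, 4);
""")

sql = """
    WITH wealth AS (
        -- One row per person; LEFT JOINs keep people without accounts (wealth 0).
        SELECT p.id, p.name, COALESCE(SUM(b.balance), 0) AS total
        FROM Persons p
        LEFT JOIN AccountOf ao ON ao.person_id = p.id
        LEFT JOIN BankAccounts b ON b.id = ao.account_id
        GROUP BY p.id, p.name
    )
    SELECT name
    FROM wealth
    WHERE total < (SELECT MAX(total) FROM wealth) / 2.0
    ORDER BY name
"""

names = [row[0] for row in conn.execute(sql)]
print(names)
```

Here the richest person has 1000, so anyone below 500 qualifies, including Dave, whose wealth is 0 thanks to the LEFT JOINs and COALESCE.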
For sake of example, let's assume 3 tables:
PHYSICAL_ITEM
ID
SELLER_ID
NAME
COST
DIMENSIONS
WEIGHT
DIGITAL_ITEM
ID
SELLER_ID
NAME
COST
DOWNLOAD_PATH
SELLER
ID
NAME
Item IDs are guaranteed unique across both item tables. I want to select, in order, with a type label, all item IDs for a given seller. I've come up with:
Query A
SELECT PI.ID AS ID, 'PHYSICAL' AS TYPE
FROM PHYSICAL_ITEM PI
JOIN SELLER S ON PI.SELLER_ID = S.ID
WHERE S.NAME = 'name'
UNION
SELECT DI.ID AS ID, 'DIGITAL' AS TYPE
FROM DIGITAL_ITEM DI
JOIN SELLER S ON DI.SELLER_ID = S.ID
WHERE S.NAME = 'name'
ORDER BY ID
Query B
SELECT ITEM.ID, ITEM.TYPE
FROM (SELECT ID, SELLER_ID, 'PHYSICAL' AS TYPE
FROM PHYSICAL_ITEM
UNION
SELECT ID, SELLER_ID, 'DIGITAL' AS TYPE
FROM DIGITAL_ITEM) AS ITEM
JOIN SELLER ON ITEM.SELLER_ID = SELLER.ID
WHERE SELLER.NAME = 'name'
ORDER BY ITEM.ID
Query A seems like it would be the most efficient, but it also looks unnecessarily duplicative (2 table joins to the same table, 2 where clauses on the same table column). Query B looks cleaner in a way to me (no duplication), but it also looks much less efficient, since it has a subquery. Is there a way to get the best of both worlds, so to speak?
In both cases, replace UNION with UNION ALL. UNION removes duplicates, which is unnecessary work here because item IDs are unique across both tables.
I would expect Query A to be more efficient, because the optimizer has more information when doing the join (although I think Oracle is pretty good with using indexes even after a union). In addition, the first query reduces the amount of data before the union.
This is, however, only an opinion. The real test is to time the two queries -- multiple times to avoid cache fill delays -- to see which is better.
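As a starting point for such a comparison, the sketch below runs both queries with UNION ALL on a tiny made-up dataset (Python's built-in sqlite3; the seller and item rows are invented) and confirms that they return the same rows. Timing would still need to happen on your real database.

```python
import sqlite3

# Hypothetical rows; item IDs are unique across both item tables, as in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE SELLER (ID INTEGER PRIMARY KEY, NAME TEXT);
    CREATE TABLE PHYSICAL_ITEM (ID INTEGER PRIMARY KEY, SELLER_ID INTEGER, NAME TEXT);
    CREATE TABLE DIGITAL_ITEM (ID INTEGER PRIMARY KEY, SELLER_ID INTEGER, NAME TEXT);
    INSERT INTO SELLER VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO PHYSICAL_ITEM VALUES (1, 1, 'Desk'), (3, 2, 'Lamp'), (4, 1, 'Chair');
    INSERT INTO DIGITAL_ITEM VALUES (2, 1, 'E-book'), (5, 2, 'Font');
""")

# Query A with UNION ALL: filter by seller inside each branch.
query_a = """
    SELECT PI.ID AS ID, 'PHYSICAL' AS TYPE
    FROM PHYSICAL_ITEM PI
    JOIN SELLER S ON PI.SELLER_ID = S.ID
    WHERE S.NAME = :name
    UNION ALL
    SELECT DI.ID AS ID, 'DIGITAL' AS TYPE
    FROM DIGITAL_ITEM DI
    JOIN SELLER S ON DI.SELLER_ID = S.ID
    WHERE S.NAME = :name
    ORDER BY ID
"""

# Query B with UNION ALL: combine both item tables first, then join and filter once.
query_b = """
    SELECT ITEM.ID, ITEM.TYPE
    FROM (SELECT ID, SELLER_ID, 'PHYSICAL' AS TYPE FROM PHYSICAL_ITEM
          UNION ALL
          SELECT ID, SELLER_ID, 'DIGITAL' AS TYPE FROM DIGITAL_ITEM) AS ITEM
    JOIN SELLER ON ITEM.SELLER_ID = SELLER.ID
    WHERE SELLER.NAME = :name
    ORDER BY ITEM.ID
"""

rows_a = conn.execute(query_a, {"name": "alice"}).fetchall()
rows_b = conn.execute(query_b, {"name": "alice"}).fetchall()
print(rows_a)
print(rows_b)
```

Using a named parameter means Query A's duplicated WHERE clause at least shares one bound value.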