Re-writing query from in() to joins - sql

Can you assist in re-writing this into joins?
select * from users where users.advised_by in (
select p.id
from advisors p
join advisor_members m on p.id = m.advisor_id
join representatives r on m.user_id=r.user_id
where m.memeber_type='Advisor'
)
This is part of a 200+ line query, and the IN () clause is hard to maintain when there are changes.

You can join to the subquery as a derived table, using a proper ON clause:
select *
from users
inner join
(
select p.id
from advisors p
join advisor_members m on p.id = m.advisor_id
join representatives r on m.user_id=r.user_id
where m.memeber_type='Advisor'
) t on users.advised_by = t.id
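One caveat worth checking before swapping IN for a join: IN never duplicates rows from users, while a join returns one row per match. The sketch below runs both forms against an in-memory SQLite database through Python's sqlite3 module. The table layouts and sample rows are assumptions made up for illustration; only the table and column names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, advised_by INTEGER);
CREATE TABLE advisors (id INTEGER PRIMARY KEY);
CREATE TABLE advisor_members (advisor_id INTEGER, user_id INTEGER, memeber_type TEXT);
CREATE TABLE representatives (user_id INTEGER);

INSERT INTO users VALUES (1, 10), (2, 20);
INSERT INTO advisors VALUES (10), (20);
-- advisor 10 has two qualifying members, so the subquery yields id 10 twice
INSERT INTO advisor_members VALUES
    (10, 100, 'Advisor'), (10, 101, 'Advisor'), (20, 200, 'Other');
INSERT INTO representatives VALUES (100), (101), (200);
""")

# Original form: IN () returns each matching user once.
in_rows = conn.execute("""
    SELECT users.id FROM users
    WHERE users.advised_by IN (
        SELECT p.id FROM advisors p
        JOIN advisor_members m ON p.id = m.advisor_id
        JOIN representatives r ON m.user_id = r.user_id
        WHERE m.memeber_type = 'Advisor')
    ORDER BY users.id
""").fetchall()

# Derived-table join: one output row per row of the subquery.
join_rows = conn.execute("""
    SELECT users.id FROM users
    INNER JOIN (
        SELECT p.id FROM advisors p
        JOIN advisor_members m ON p.id = m.advisor_id
        JOIN representatives r ON m.user_id = r.user_id
        WHERE m.memeber_type = 'Advisor') t ON users.advised_by = t.id
    ORDER BY users.id
""").fetchall()

# Adding DISTINCT inside the derived table restores the IN () behaviour.
distinct_rows = conn.execute("""
    SELECT users.id FROM users
    INNER JOIN (
        SELECT DISTINCT p.id FROM advisors p
        JOIN advisor_members m ON p.id = m.advisor_id
        JOIN representatives r ON m.user_id = r.user_id
        WHERE m.memeber_type = 'Advisor') t ON users.advised_by = t.id
    ORDER BY users.id
""").fetchall()

print(in_rows, join_rows, distinct_rows)
```

If an advisor can never match more than once in your data, the plain join is enough. Otherwise DISTINCT in the derived table keeps the result the same as the IN () version.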

/*Option 1 */
SELECT *
FROM users usr
INNER JOIN
(
SELECT p.id AS advisor_id
FROM advisors p
JOIN advisor_members m
ON p.id = m.advisor_id
JOIN representatives r
ON m.user_id=r.user_id
WHERE m.memeber_type='Advisor' ) t2 ON usr.advised_by = t2.advisor_id;
/* Option 2 */
SELECT *
FROM users usr
INNER JOIN advisors p
ON usr.advised_by=p.id
JOIN
(
SELECT *
FROM advisor_members
WHERE memeber_type='Advisor') m
ON p.id = m.advisor_id
JOIN representatives r
ON m.user_id=r.user_id

Related

SQL to HiveQL conversion

I have this SQL query and I am trying to convert it so that it can be run on HiveQL 2.1.1.
SELECT p.id FROM page p, comments c, users u
WHERE c.commentid= p.id
AND u.id = p.creatorid
AND u.upvotes IN (
SELECT MAX(upvotes)
FROM users u WHERE u.date > p.date
)
AND EXISTS (
SELECT 1 FROM links l WHERE l.relid > p.id
)
This does not work in HiveQL because it has more than one subquery, which is not supported.
EXISTS or IN in SQL can be replaced in HiveQL like this:
WHERE A.aid IN (SELECT bid FROM B...)
can be replaced by:
A LEFT SEMI JOIN B ON aid=bid
But I can't come up with a way to do this with the additional MAX() function.
Use standard JOIN syntax instead of a comma-separated table list:
SELECT p.id
FROM page p INNER JOIN
comments c
ON c.commentid= p.id INNER JOIN
users u
ON u.id = p.creatorid INNER JOIN
links l
ON l.relid > p.id
WHERE u.upvotes IN (SELECT MAX(upvotes)
FROM users u
WHERE u.date > p.date
);
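I am not sure what the upvotes logic is supposed to be doing. The links logic is easy to handle: joining links directly, as above, returns one row per matching link, whereas EXISTS only needs to know that some relid exceeds p.id, so comparing against MAX(relid) is enough. Hive may handle this: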
SELECT p.id
FROM page p JOIN
comments c
ON c.commentid = p.id JOIN
users u
ON u.id = p.creatorid CROSS JOIN
(SELECT MAX(l.relid) as max_relid
FROM links l
) l
WHERE l.max_relid > p.id AND
u.upvotes IN (SELECT MAX(upvotes)
FROM users u
WHERE u.date > p.date
);
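To see why the MAX(relid) form stands in for the EXISTS test, here is a small sketch run on an in-memory SQLite database through Python's sqlite3 module. It uses SQLite rather than Hive, and the sample rows are assumptions for illustration. It only covers the links part of the query, not the upvotes subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE page (id INTEGER PRIMARY KEY);
CREATE TABLE links (relid INTEGER);
INSERT INTO page VALUES (1), (5), (9);
INSERT INTO links VALUES (3), (7);
""")

# Original predicate: keep a page if any link has a larger relid.
exists_ids = [row[0] for row in conn.execute("""
    SELECT p.id FROM page p
    WHERE EXISTS (SELECT 1 FROM links l WHERE l.relid > p.id)
    ORDER BY p.id
""")]

# Plain inequality join: page 1 matches both links, so it appears twice.
join_ids = [row[0] for row in conn.execute("""
    SELECT p.id FROM page p
    INNER JOIN links l ON l.relid > p.id
    ORDER BY p.id
""")]

# Cross join against the single-row MAX: same pages as EXISTS, no duplicates.
max_ids = [row[0] for row in conn.execute("""
    SELECT p.id FROM page p
    CROSS JOIN (SELECT MAX(relid) AS max_relid FROM links) l
    WHERE l.max_relid > p.id
    ORDER BY p.id
""")]

print(exists_ids, join_ids, max_ids)
```

The aggregate subquery always returns exactly one row, so the cross join cannot multiply the page rows. If links is empty, MAX is NULL and the comparison filters every page out, which matches EXISTS.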

SELECT * and SELECT COUNT(*) in one query

My SQL query looks like this:
SELECT *
FROM categories AS c
LEFT JOIN LATERAL (SELECT i.*
FROM influencer_profiles AS i
WHERE c.id = i.category_id
ORDER BY i.updated_at
LIMIT 2) AS i ON 1 = 1
INNER JOIN users AS u ON i.user_id = u.id
But I also want to count the influencer_profiles in each category, to display how many there are per category. How can I use COUNT(*) while still selecting all columns? I tried this:
SELECT *
FROM categories AS c
LEFT JOIN LATERAL (SELECT COUNT(*)
FROM influencer_profiles AS i
WHERE c.id = i.category_id
ORDER BY i.updated_at
LIMIT 2) AS i ON 1 = 1
INNER JOIN users AS u ON i.user_id = u.id
This code doesn't work.
Perhaps you just want a window function. I note that you are using left join in one place and the inner join is undoing it.
So, I am thinking:
SELECT c.*, i.*, u.*,
COUNT(*) OVER (PARTITION BY c.id) as category_cnt
FROM categories c LEFT JOIN LATERAL
(SELECT i.*
FROM influencer_profiles AS i
WHERE c.id = i.category_id
ORDER BY i.updated_at
LIMIT 2
) i
ON 1=1 LEFT JOIN
users u
ON i.user_id = u.id;
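One thing to watch, if I read the query above correctly: COUNT(*) OVER runs on the rows that survive LIMIT 2, so it cannot report more than the profiles that were kept. The sketch below shows one way to get the full per-category count next to the two oldest profiles. It runs on SQLite through Python's sqlite3 module, and because SQLite has no LATERAL it swaps in ROW_NUMBER() for the LIMIT 2 step. That substitution, the column layout, and the sample rows are assumptions, and window functions need SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE categories (id INTEGER PRIMARY KEY);
CREATE TABLE influencer_profiles (
    id INTEGER PRIMARY KEY, category_id INTEGER, updated_at TEXT);
INSERT INTO categories VALUES (1), (2), (3);
INSERT INTO influencer_profiles VALUES
    (11, 1, '2020-01-01'), (12, 1, '2020-01-02'), (13, 1, '2020-01-03'),
    (21, 2, '2020-01-01');
""")

# The window functions see every profile of a category; the rn <= 2 filter
# is applied afterwards, so category_cnt is the full count, not the top-2 count.
rows = conn.execute("""
    SELECT c.id, r.id AS profile_id, r.category_cnt
    FROM categories c
    LEFT JOIN (
        SELECT i.*,
               ROW_NUMBER() OVER (PARTITION BY i.category_id
                                  ORDER BY i.updated_at) AS rn,
               COUNT(*) OVER (PARTITION BY i.category_id) AS category_cnt
        FROM influencer_profiles i
    ) r ON r.category_id = c.id AND r.rn <= 2
    ORDER BY c.id, r.rn
""").fetchall()

for row in rows:
    print(row)
```

Category 3 has no profiles, so its count comes back as NULL here; wrap it in COALESCE(r.category_cnt, 0) if a zero is preferred.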

SQL JOIN 3 TABLES WITH COUNT AND GROUP BY CLAUSE

I have 3 tables like this:
EXPEDITION (ID, CreateDate, Status);
PACKAGE (ID, EXPEDITION_ID)
ITEM (ID, EXPEDIITONPACKAGE_ID);
I need to know, for each expedition, the quantity of packages and the quantity of items.
UPDATE
This is the query that seems to do it:
SELECT
E.ID,
P.Packages,
I.Items
FROM EXPEDITION E
LEFT JOIN (
SELECT DISTINCT E.ID, COUNT(P.ID) AS "Packages" FROM EXPEDITION E
LEFT JOIN PACKAGE P
ON E.ID = P.EXPEDITION_ID
GROUP BY E.ID
) P
ON E.ID = P.ID
LEFT JOIN (
SELECT DISTINCT P.ID as "PackageID", COUNT(I.ID) AS "Items" FROM PACKAGE P
JOIN ITEM I
ON P.ID = I.EXPEDIITONPACKAGE_ID
GROUP BY P.ID
) I
ON P.ID = I.PackageId
GROUP BY
E.ID,
P.Packages,
I.Items
ORDER BY
E.ID
It has two inner queries that count the IDs separately, and they are joined in the main query to show the results.
Try this. It is not tested, but it should work. Each CTE produces one count per expedition, and the final query joins the two. Note that the query above joins the expedition ID to a package ID (P.ID = I.PackageId), so its item counts cannot be relied on.
;WITH c1 AS
(
SELECT e.ID, COUNT(p.ID) AS qtyPck
FROM EXPEDITION e
LEFT JOIN PACKAGE p ON p.EXPEDITION_ID = e.ID
GROUP BY e.ID
),
c2 AS
(
SELECT p.EXPEDITION_ID, COUNT(i.ID) AS qtyItems
FROM PACKAGE p
INNER JOIN ITEM i ON i.EXPEDIITONPACKAGE_ID = p.ID
GROUP BY p.EXPEDITION_ID
)
SELECT c1.ID, c1.qtyPck, COALESCE(c2.qtyItems, 0) AS qtyItems
FROM c1
LEFT JOIN c2 ON c2.EXPEDITION_ID = c1.ID;
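As an alternative sketch, both counts can also come from a single grouped query: join expedition to package to item with LEFT JOINs, count distinct packages, and count items. The example runs on an in-memory SQLite database through Python's sqlite3 module. The table and column names follow the question; the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EXPEDITION (ID INTEGER PRIMARY KEY);
CREATE TABLE PACKAGE (ID INTEGER PRIMARY KEY, EXPEDITION_ID INTEGER);
CREATE TABLE ITEM (ID INTEGER PRIMARY KEY, EXPEDIITONPACKAGE_ID INTEGER);

INSERT INTO EXPEDITION VALUES (1), (2), (3);
-- expedition 1: two packages; expedition 2: one empty package; expedition 3: none
INSERT INTO PACKAGE VALUES (10, 1), (11, 1), (20, 2);
INSERT INTO ITEM VALUES (100, 10), (101, 10), (102, 11);
""")

# A package row repeats once per item, so packages need COUNT(DISTINCT ...).
# COUNT(i.ID) skips the NULLs produced by the LEFT JOINs, giving 0 when empty.
counts = conn.execute("""
    SELECT e.ID,
           COUNT(DISTINCT p.ID) AS Packages,
           COUNT(i.ID)          AS Items
    FROM EXPEDITION e
    LEFT JOIN PACKAGE p ON p.EXPEDITION_ID = e.ID
    LEFT JOIN ITEM i    ON i.EXPEDIITONPACKAGE_ID = p.ID
    GROUP BY e.ID
    ORDER BY e.ID
""").fetchall()

for row in counts:
    print(row)
```

This works because ITEM hangs off PACKAGE in a single chain. If two unrelated child tables were joined to EXPEDITION instead, the rows would multiply against each other and separate subqueries would be the safer choice.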

Optimize JOIN SQL query with additional SELECT

I need a query that selects just one image for each product (GROUP BY phi.id_product), and this image has to be the one with the highest priority (inner SELECT with an ORDER BY clause).
The priority is stored in an N:M relation table called product_has_image.
I've created a query, but it takes about 3 seconds to execute and I need to optimize it. Here it is:
SELECT p.*, i.id AS imageid
FROM `product` p JOIN `category` c on c.`id` = p.`id_category`
LEFT OUTER JOIN (SELECT id_product, id_image FROM
`product_has_image` ORDER BY priority DESC) phi ON p.id = phi.id_product
LEFT OUTER JOIN `image` i ON phi.id_image = i.id
WHERE (c.`id_parent` = 2 OR c.`id` = 2)
GROUP BY phi.id_product
Indexes which I find to be important in this query are:
image (PRIMARY id)
product_has_image (PRIMARY id_product, id_image; INDEX id_product; INDEX id_image)
product (PRIMARY id, id_category; INDEX id_category)
category (PRIMARY id; INDEX id_parent)
Most of the time is spent joining against the inner SELECT, which is required for sorting.
Joining with LEFT JOIN [product_has_image] phi ON p.id = phi.id_product is much faster, but doesn't pick the image with the highest priority.
Any help would be appreciated.
Reformatted for readability:
SELECT p.*, i.id AS imageid
FROM `product` p
INNER JOIN `category` c on (c.`id` = p.`id_category`)
LEFT OUTER JOIN (SELECT id_product, id_image
FROM `product_has_image`
ORDER BY priority DESC) phi
ON (p.id = phi.id_product)
LEFT OUTER JOIN `image` i
ON (phi.id_image = i.id)
WHERE (c.`id_parent` = 2 OR c.`id` = 2)
GROUP BY phi.id_product
Without seeing an execution plan or DDL, I'd guess (shudder) that the problem is likely to be the inner select/sort. If you create a view:
create view highest_priority_images as
select id_product, max(priority) as max_priority
from product_has_image
group by id_product
Then you can replace that inner SELECT...ORDER BY with an INNER JOIN on that view, matching on both id_product and priority. That would reduce the cardinality, so I'd expect it to run faster.
Posting DDL would help.
I would probably try to do it like this:
SELECT p.*, i.id AS imageid
FROM `product` p
INNER JOIN `category` c ON c.id = p.id_category
/* a list of `id_product`s with their highest priorities
from `product_has_image` */
LEFT OUTER JOIN (
SELECT id_product, MAX(priority) AS max_priority
FROM `product_has_image`
GROUP BY id_product
) m ON p.id = m.id_product
/* now joining `product_has_image` again, using
m.`max_priority` for additional filtering */
LEFT OUTER JOIN `product_has_image` phi
ON p.id = phi.id_product AND m.max_priority = phi.priority
/* if you only select `id` from `image`, you can use
phi.`id_image` instead and remove this join */
LEFT OUTER JOIN `image` i ON phi.id_image = i.id
WHERE c.id_parent = 2 OR c.id = 2
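The MAX(priority) join above can be tried out on a small data set. The sketch below runs it on an in-memory SQLite database through Python's sqlite3 module, with the category filter left out to keep it short. It uses SQLite rather than MySQL, and the sample rows are assumptions for illustration. Product 2 deliberately has two images with the same top priority.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (id INTEGER PRIMARY KEY);
CREATE TABLE product_has_image (
    id_product INTEGER, id_image INTEGER, priority INTEGER,
    PRIMARY KEY (id_product, id_image));

INSERT INTO product VALUES (1), (2), (3);
INSERT INTO product_has_image VALUES
    (1, 101, 1), (1, 102, 5),   -- image 102 has the highest priority
    (2, 201, 2), (2, 202, 2);   -- tie on the highest priority
""")

# m holds one row per product with its top priority; joining
# product_has_image back on that priority finds the image(s) that carry it.
best = conn.execute("""
    SELECT p.id, phi.id_image
    FROM product p
    LEFT OUTER JOIN (
        SELECT id_product, MAX(priority) AS max_priority
        FROM product_has_image
        GROUP BY id_product
    ) m ON p.id = m.id_product
    LEFT OUTER JOIN product_has_image phi
        ON p.id = phi.id_product AND m.max_priority = phi.priority
    ORDER BY p.id, phi.id_image
""").fetchall()

for row in best:
    print(row)
```

Product 3 has no image and still appears thanks to the outer joins. Product 2 comes back twice because of the tie, so if exactly one image per product is required, priorities must be unique per product or an extra tie-breaker is needed.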
Can't test it now, but wouldn't it be possible to do this?
SELECT p.*, i.id AS imageid
FROM `product` p JOIN `category` c on c.`id` = p.`id_category`
LEFT JOIN `product_has_image` phi ON p.id = phi.id_product
LEFT OUTER JOIN `image` i ON phi.id_image = i.id
WHERE (c.`id_parent` = 2 OR c.`id` = 2)
GROUP BY phi.id_product
ORDER BY phi.priority DESC
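Do it in a regular join and order by phi.priority. Be aware, though, that ORDER BY is applied after GROUP BY, so this sorts the grouped rows and may not pick the highest-priority image for each product.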

INNER JOIN Distinct ID

I have the following code:
FROM CTE_Order cte
INNER JOIN tblOrders o
ON cte.OrderId = o.Id
INNER JOIN tblOrderUnits ou
ON o.id = ou.OrderId
INNER JOIN tblOrderServiceUnits osu
ON ou.VMSUnitID = osu.UnitId
When I join ou I get the same unit ID twice. This makes the INNER JOIN on tblOrderServiceUnits return 4 rows, 2 of which are duplicates. I need it to return only the 2 rows that are different. How do I use DISTINCT so that the INNER JOIN only sees distinct ou rows?
Sorry for the bad explanation, but basically I am just trying to see how an INNER JOIN with a DISTINCT subquery would work. If someone could give me an example of that, I could figure it out from there.
INNER JOIN (SELECT DISTINCT * FROM X) Alias
ON Alias.ID = Primary.ID
For your example:
INNER JOIN (SELECT DISTINCT VMSUnitID, OrderId FROM tblOrderUnits) ou
ON o.id = ou.OrderId
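To check the effect of the DISTINCT derived table, here is a small sketch on an in-memory SQLite database through Python's sqlite3 module. The table names and join columns come from the question. The ServiceName column and the sample rows are hypothetical, added only so the rows can be told apart.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblOrders (Id INTEGER PRIMARY KEY);
CREATE TABLE tblOrderUnits (Id INTEGER PRIMARY KEY, OrderId INTEGER, VMSUnitID INTEGER);
-- ServiceName is a hypothetical column for illustration
CREATE TABLE tblOrderServiceUnits (UnitId INTEGER, ServiceName TEXT);

INSERT INTO tblOrders VALUES (1);
-- the same unit is listed twice for order 1
INSERT INTO tblOrderUnits VALUES (1, 1, 7), (2, 1, 7);
INSERT INTO tblOrderServiceUnits VALUES (7, 'A'), (7, 'B');
""")

# Joining the raw table: 2 unit rows x 2 service rows = 4 rows.
plain = conn.execute("""
    SELECT o.Id, ou.VMSUnitID, osu.ServiceName
    FROM tblOrders o
    INNER JOIN tblOrderUnits ou ON o.Id = ou.OrderId
    INNER JOIN tblOrderServiceUnits osu ON ou.VMSUnitID = osu.UnitId
    ORDER BY osu.ServiceName
""").fetchall()

# Joining the DISTINCT derived table: the duplicate unit collapses first.
deduped = conn.execute("""
    SELECT o.Id, ou.VMSUnitID, osu.ServiceName
    FROM tblOrders o
    INNER JOIN (SELECT DISTINCT VMSUnitID, OrderId FROM tblOrderUnits) ou
        ON o.Id = ou.OrderId
    INNER JOIN tblOrderServiceUnits osu ON ou.VMSUnitID = osu.UnitId
    ORDER BY osu.ServiceName
""").fetchall()

print(len(plain), deduped)
```

DISTINCT only helps when the selected columns are the ones that repeat, which is why the derived table lists VMSUnitID and OrderId instead of using SELECT DISTINCT * (the differing Id values would keep both rows).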