Optimising where clause with x in y or z in y - hive

I'm just wondering if there is any way to optimise this query :
select * from table_x where buyer_id in (select id from table_y) x or
seller_id in (select id from table_y) y
The two subqueries in my where-clause are identical, and I suspect that the program will run them separately.
Thanks!

Your query is essentially:
select x.*
from table_x x
where x.buyer_id in (select y.id from table_y y) or
x.seller_id in (select y.id from table_y y);
This construct should be fine. In some databases, you might use exists instead of in, but I think Hive will be fine with this.

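To avoid repeating the subquery in Hive, you can use LEFT SEMI JOIN. Note that chaining two semi joins like this keeps only the rows where both buyer_id and seller_id are found in table_y (AND rather than OR):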
SELECT x.*
FROM table_x x
LEFT SEMI JOIN table_y b ON (x.buyer_id = b.id )
LEFT SEMI JOIN table_y c ON (x.seller_id = c.id )
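To see how the OR query and the chained semi joins differ, here is a minimal sketch using Python's sqlite3 module rather than Hive. SQLite has no LEFT SEMI JOIN, so the chained joins are written as two EXISTS conditions joined with AND, which is what they should amount to. The order_id column and the sample rows are made up for illustration.

```python
import sqlite3

# Hypothetical sample data; table names follow the question, order_id is invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_y (id INTEGER);
    CREATE TABLE table_x (order_id INTEGER, buyer_id INTEGER, seller_id INTEGER);
    INSERT INTO table_y VALUES (1), (2);
    INSERT INTO table_x VALUES
        (10, 1, 2),   -- buyer and seller both in table_y
        (11, 1, 9),   -- only buyer in table_y
        (12, 9, 2),   -- only seller in table_y
        (13, 8, 9);   -- neither in table_y
""")

# OR semantics, as in the question.
or_rows = conn.execute("""
    SELECT x.order_id FROM table_x x
    WHERE x.buyer_id IN (SELECT id FROM table_y)
       OR x.seller_id IN (SELECT id FROM table_y)
    ORDER BY x.order_id
""").fetchall()

# Equivalent of two chained semi joins: both conditions must hold.
and_rows = conn.execute("""
    SELECT x.order_id FROM table_x x
    WHERE EXISTS (SELECT 1 FROM table_y b WHERE b.id = x.buyer_id)
      AND EXISTS (SELECT 1 FROM table_y c WHERE c.id = x.seller_id)
    ORDER BY x.order_id
""").fetchall()

print(or_rows)   # [(10,), (11,), (12,)]
print(and_rows)  # [(10,)]
```

So if you need the OR behaviour from the question, the two-IN form is the one that matches it.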

Related

How to update Column of one CTE with another CTE in oracle

I have two CTEs, say A and B, and I want to update a column of A from CTE B.
WITH cte_A AS (
SELECT X, 0 AS Y FROM table_1 -- Some complex logic for Y, which is why I'm updating it from the other CTE
)
UPDATE cte_A SET Y = (
WITH CTE_B AS (SELECT Y FROM table_2 )
SELECT Y FROM CTE_B WHERE CTE_B.ID = cte_A.ID
)
SELECT * FROM cte_A
I am getting errors such as "missing SELECT keyword" in Oracle.
You cannot UPDATE a query; you UPDATE a table (or a view). However, you don't need to update it, you just need to display data from the two sources.
What you can do is JOIN the two CTEs and, instead of selecting * from CTE_A, select Y from CTE_B and the other columns from CTE_A (given your logic, you appear to need an OUTER JOIN):
WITH cte_A (id, x, y) AS (
SELECT id, x, 0 from table_1
),
CTE_B (id, y) AS (
SELECT id, y FROM table_2
)
SELECT a.id,
a.x,
b.y
FROM CTE_A a
LEFT OUTER JOIN CTE_B b
ON b.ID = a.ID
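A quick way to try the join idea out is with Python's sqlite3 module. This is only a sketch on SQLite, not Oracle, and the sample rows are invented. The second query is an assumption about what you might want: it falls back to the placeholder 0 from cte_a when table_2 has no matching row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_1 (id INTEGER, x TEXT);
    CREATE TABLE table_2 (id INTEGER, y INTEGER);
    INSERT INTO table_1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO table_2 VALUES (1, 100), (3, 300);
""")

ctes = """
    WITH cte_a (id, x, y) AS (
        SELECT id, x, 0 FROM table_1
    ),
    cte_b (id, y) AS (
        SELECT id, y FROM table_2
    )
"""

# Take y from cte_b; rows without a match get NULL.
rows = conn.execute(ctes + """
    SELECT a.id, a.x, b.y
    FROM cte_a a
    LEFT OUTER JOIN cte_b b ON b.id = a.id
    ORDER BY a.id
""").fetchall()

# Variant: keep cte_a's own y where cte_b has nothing.
rows_with_default = conn.execute(ctes + """
    SELECT a.id, a.x, COALESCE(b.y, a.y)
    FROM cte_a a
    LEFT OUTER JOIN cte_b b ON b.id = a.id
    ORDER BY a.id
""").fetchall()

print(rows)               # [(1, 'a', 100), (2, 'b', None), (3, 'c', 300)]
print(rows_with_default)  # [(1, 'a', 100), (2, 'b', 0), (3, 'c', 300)]
```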
You can't; a CTE is just another way of writing a subquery. You can't update a subquery; you update tables (in most cases), rarely views, but subqueries - nope.
In your case, you'd actually update table_1. merge might be a good choice:
MERGE INTO table_1 a
USING table_2 b
ON (a.id = b.id)
WHEN MATCHED
THEN
UPDATE SET a.y = b.y;
OK, I understand that there's some "complex logic" involved so the query might need to be modified, but - that's the general idea.

PostgreSQL: find rows where a value is equal maximum in subquery

I have a PostgreSQL query that works; it uses multiple joins and has one column that is the result of a calculation. From the result of that query I need to extract the rows in which that column is maximal, and there might be many. If there were only one, I could just ORDER BY that_column DESC LIMIT 1; and I found that if I just needed to do that on an existing database table, I could do:
SELECT columns FROM table_name
WHERE that_column = (SELECT MAX(that_column) FROM table_name)
However, that's also not the case. I have a query that is like this:
SELECT z.xyz, (x.aa- x.bb) * y.qq as that_column
FROM table_1 x
JOIN table_2 y ON x.foo = y.foo
JOIN table_3 z ON y.bar = z.bar
JOIN table_4 w ON w.baz = z.baz
where x.aa IS NOT NULL
Now, this works. Now I need to know how I put that as a subquery and take from that the z.xyz from the rows where that_column is maximal. Any help?
Maybe this?
SELECT MAX(thatmax.that_column) as theMax FROM (
SELECT z.xyz, (x.aa- x.bb) * y.qq as that_column
FROM table_1 x
JOIN table_2 y ON x.foo = y.foo
JOIN table_3 z ON y.bar = z.bar
JOIN table_4 w ON w.baz = z.baz
where x.aa IS NOT NULL
) thatmax
if you need to select other columns as your output:
select * from (
select *, dense_rank() over(order by (x.aa- x.bb) * y.qq desc) as that_column
from table_1 x
join table_2 y ON x.foo = y.foo
join table_3 z ON y.bar = z.bar
join table_4 w ON w.baz = z.baz
where x.aa IS NOT NULL
) t where that_column = 1
I found it!
I took that as a subquery, used [rank](https://www.postgresqltutorial.com/postgresql-rank-function/) to rank by thatColumn and then put that in another subquery to take the ones with rank=1
SELECT xyz
FROM (
SELECT xyz, thatColumn, RANK() OVER (ORDER BY thatColumn DESC) AS thatColumnRank
FROM (
-- the subquery in the question
) AS q
) AS highest_and_rank
WHERE thatColumnRank = 1
Thank you to everyone who tried to help. You're awesome.
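For anyone who would rather keep the MAX comparison from the question, another option is to put the joined query in a CTE and compare against its own maximum. The sketch below uses Python's sqlite3 module instead of PostgreSQL, and a single hypothetical table named joined stands in for the four joined tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-in for the result of joining table_1 .. table_4.
    CREATE TABLE joined (xyz TEXT, aa INTEGER, bb INTEGER, qq INTEGER);
    INSERT INTO joined VALUES
        ('p', 5, 1, 2),     -- (5 - 1) * 2 = 8
        ('q', 4, 2, 4),     -- (4 - 2) * 4 = 8
        ('r', 3, 1, 1),     -- (3 - 1) * 1 = 2
        ('s', NULL, 1, 9);  -- dropped by aa IS NOT NULL
""")

rows = conn.execute("""
    WITH calc AS (
        SELECT xyz, (aa - bb) * qq AS that_column
        FROM joined
        WHERE aa IS NOT NULL
    )
    SELECT xyz, that_column
    FROM calc
    WHERE that_column = (SELECT MAX(that_column) FROM calc)
    ORDER BY xyz
""").fetchall()

print(rows)  # [('p', 8), ('q', 8)]
```

Both tied rows come back, which is the behaviour ORDER BY ... LIMIT 1 cannot give you.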

SELECT * FROM t WHERE t.x OR t.y IN (SELECT id FROM z)

Is something like this possible (I know this statement is not working, I tried it):
SELECT * FROM t WHERE t.x OR t.y IN (SELECT id FROM z)
Example
Table t:
|id |x   |y   |
|1  |101 |201 |
|2  |102 |202 |
Table z:
|id |
|101 |
|201 |
And from this table t I want to select all entries where either attribute x or attribute y is contained in the list of ids of table z.
I know I can do
SELECT * FROM t WHERE t.x IN (SELECT id FROM z) OR t.y IN (SELECT id FROM z)
but this feels like it is very inefficient when the IN values are coming from a complex subquery (which then is the same in both IN clauses).
Or are current query planner implementations clever enough to see that both subqueries give the same results and only execute this one time? Or maybe there is another solution using EXISTS which I currently don't see?
PS: I'm using Postgres, but I'm looking for a generic solution.
Use EXISTS
SELECT * FROM t WHERE exists (SELECT 1 FROM z where z.id in (t.x,t.y))
If z is a complex query, then you can use a CTE to simplify the code:
WITH z AS (
. . .
)
SELECT *
FROM t
WHERE t.x IN (SELECT id FROM z) OR t.y IN (SELECT id FROM z);
You can also use JOIN or EXISTS instead:
SELECT *
FROM t
WHERE EXISTS (SELECT 1
FROM z
WHERE z.id IN (t.x, t.y)
);
The JOIN version has the downside that rows can multiply due to duplicates in z.
That said, the version with the two IN expressions is possibly the most efficient.
Try...
SELECT t.* FROM t join z on t.x = z.id or t.y = z.id
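Here is a small check of the three variants against the example tables from the question, run on SQLite through Python's sqlite3 module (the question is about Postgres, so treat this as a sketch of the semantics only, not of the query plans):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (id INTEGER, x INTEGER, y INTEGER);
    CREATE TABLE z (id INTEGER);
    INSERT INTO t VALUES (1, 101, 201), (2, 102, 202);
    INSERT INTO z VALUES (101), (201);
""")

# Two IN subqueries, as in the question.
in_rows = conn.execute(
    "SELECT t.id FROM t "
    "WHERE t.x IN (SELECT id FROM z) OR t.y IN (SELECT id FROM z)"
).fetchall()

# One EXISTS that checks both columns.
exists_rows = conn.execute(
    "SELECT t.id FROM t "
    "WHERE EXISTS (SELECT 1 FROM z WHERE z.id IN (t.x, t.y))"
).fetchall()

# JOIN with OR: row 1 matches z twice (101 via x, 201 via y).
join_rows = conn.execute(
    "SELECT t.id FROM t JOIN z ON t.x = z.id OR t.y = z.id"
).fetchall()

print(in_rows)      # [(1,)]
print(exists_rows)  # [(1,)]
print(join_rows)    # [(1,), (1,)]
```

The JOIN returns row 1 twice, so it would need a DISTINCT or similar to match the other two.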

In sql, can you join to a select statement that references the outer tables in other joins?

What I want to do is to transform the following sql
SELECT X
FROM Y LEFT JOIN Z ON Y.Id=Z.id
WHERE Y.Fld='P'
into
SELECT Y
FROM Y LEFT JOIN (SELECT TOP 1 Id FROM Z WHERE Z.Id=Y.Id ORDER BY Z.PrimaryKey DESC) ON 1=1
WHERE Y.Fld='P'
The reason I want to do this is that Z has multiple rows that can be joined to Y. They cannot be told apart in any meaningful way, except that the one we need is the latest, and we only need that one record. Is this possible? I tried it, but SQL Server complained that I cannot reference Y.Id from within the subquery.
How about a CTE approach:
;WITH CTE
AS
(
SELECT Id,
PrimaryKey,
ROW_NUMBER() OVER (PARTITION BY Id ORDER BY PrimaryKey DESC) AS RN
FROM Z
)
SELECT X
FROM Y
LEFT JOIN CTE
ON CTE.Id = Y.Id
AND CTE.RN = 1 -- filter in ON so the LEFT JOIN still keeps Y rows without a match
WHERE Y.Fld = 'P'
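One thing worth checking with this approach is where the RN = 1 filter goes. The sketch below runs on SQLite via Python's sqlite3 module (it needs SQLite 3.25 or later for window functions; it is not SQL Server) with invented sample rows. It compares the filter in the join condition with the filter in the WHERE clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Y (Id INTEGER, Fld TEXT);
    CREATE TABLE Z (PrimaryKey INTEGER, Id INTEGER);
    INSERT INTO Y VALUES (1, 'P'), (2, 'P'), (3, 'Q');
    INSERT INTO Z VALUES (10, 1), (11, 1), (12, 3);
""")

cte = """
    WITH latest AS (
        SELECT Id, PrimaryKey,
               ROW_NUMBER() OVER (PARTITION BY Id ORDER BY PrimaryKey DESC) AS RN
        FROM Z
    )
"""

# RN = 1 in the ON clause: Y rows without any Z row are kept with NULL.
rows_on = conn.execute(cte + """
    SELECT Y.Id, latest.PrimaryKey
    FROM Y
    LEFT JOIN latest ON latest.Id = Y.Id AND latest.RN = 1
    WHERE Y.Fld = 'P'
    ORDER BY Y.Id
""").fetchall()

# RN = 1 in the WHERE clause: the NULL rows are filtered out again,
# so the LEFT JOIN behaves like an inner join.
rows_where = conn.execute(cte + """
    SELECT Y.Id, latest.PrimaryKey
    FROM Y
    LEFT JOIN latest ON latest.Id = Y.Id
    WHERE Y.Fld = 'P' AND latest.RN = 1
    ORDER BY Y.Id
""").fetchall()

print(rows_on)     # [(1, 11), (2, None)]
print(rows_where)  # [(1, 11)]
```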

Using having count() in exists clause

I am trying to make a SQL query where the subquery in an 'exists' clause has a 'having' clause. The strange thing is that there is no error and the subquery works as a stand-alone query. However, the whole query gives exactly the same results with the 'having' clause as without.
This is kind of what my query looks like:
SELECT X
FROM A
WHERE exists (
SELECT X, count(distinct Y)
FROM B
GROUP BY X
HAVING count(distinct Y) > 2)
So I'm trying to select the rows from A where X has more than two occurrences of Y in B.
However, the results also include records that do not exist in the subquery. What am I doing wrong here?
You don't correlate the two queries:
SELECT X
FROM A
WHERE (
SELECT COUNT(DISTINCT y)
FROM b
WHERE b.x = a.x
) > 2
Your query says something like this:
select X from A if there are any records in B that, grouped by X, have more than two distinct values of Y.
If your EXISTS subquery returns even one record from table B, the condition is true for every row and you will get all the rows from A.
Try:
select X
from A
where exists (select 1
from B
where B.x = A.x
group by b.x
having count(distinct b.y) > 2
)
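To make the difference concrete, here is a minimal sketch with made-up data, run on SQLite through Python's sqlite3 module (the question does not say which database is in use, so this only illustrates the logic):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (X INTEGER);
    CREATE TABLE B (X INTEGER, Y INTEGER);
    INSERT INTO A VALUES (1), (2), (3);
    -- X = 1 has three distinct Y values, X = 2 has one, X = 3 has none.
    INSERT INTO B VALUES (1, 10), (1, 11), (1, 12), (2, 10), (2, 10);
""")

# Uncorrelated: the subquery returns a row (for X = 1) no matter which
# row of A is being tested, so every row of A passes.
uncorrelated = conn.execute("""
    SELECT X FROM A
    WHERE EXISTS (SELECT X, COUNT(DISTINCT Y) FROM B
                  GROUP BY X HAVING COUNT(DISTINCT Y) > 2)
    ORDER BY X
""").fetchall()

# Correlated: the subquery only looks at B rows for the current A.X.
correlated = conn.execute("""
    SELECT X FROM A
    WHERE EXISTS (SELECT 1 FROM B
                  WHERE B.X = A.X
                  GROUP BY B.X HAVING COUNT(DISTINCT B.Y) > 2)
    ORDER BY X
""").fetchall()

print(uncorrelated)  # [(1,), (2,), (3,)]
print(correlated)    # [(1,)]
```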
I had a similar situation and solved by a JOIN since the other answers didn't work for me. I tried to correlate to your generic example. Hope it is helpful to someone else!
SELECT X
FROM A
JOIN (SELECT X, COUNT(DISTINCT Y) AS y_count
FROM B
GROUP BY X
HAVING count(distinct Y) > 2) C
ON A.X = C.X