using multiple sub select query in Hive SQL

I have the following query in Hive, and since Hive doesn't allow more than one subselect, I am not sure how to work this out. I looked at existing postings but found nothing that came close to this. I have additional subselects in this query for more joins. Any help is greatly appreciated. Thank you.
select * from table_a a inner join table_b b on a.id = b.id
where a.timestamp in (select max(c.timestamp) from table_a c
where c.id = a.id)
and b.timestamp in (select max(d.timestamp) from table_b d
where d.id = b.id);
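
One possible way around the one-subselect limit is to replace each correlated max() subquery with a join against a grouped derived table. The sketch below only checks that logic, using Python's built-in sqlite3 module rather than Hive, so the exact Hive syntax is an assumption. The val column, the sample rows, and the aliases la and lb are made up for illustration.

```python
import sqlite3

# In-memory database standing in for the Hive tables (illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id INTEGER, timestamp INTEGER, val TEXT);
    CREATE TABLE table_b (id INTEGER, timestamp INTEGER, val TEXT);
    INSERT INTO table_a VALUES (1, 10, 'a-old'), (1, 20, 'a-new'), (2, 5, 'a-only');
    INSERT INTO table_b VALUES (1, 7, 'b-old'), (1, 9, 'b-new'), (2, 3, 'b-only');
""")

# Each "timestamp in (select max(...))" filter becomes a join against a
# derived table holding the latest timestamp per id.
latest_rows = conn.execute("""
    SELECT a.id, a.val, b.val
    FROM table_a a
    JOIN (SELECT id, MAX(timestamp) AS max_ts FROM table_a GROUP BY id) la
      ON a.id = la.id AND a.timestamp = la.max_ts
    JOIN table_b b
      ON a.id = b.id
    JOIN (SELECT id, MAX(timestamp) AS max_ts FROM table_b GROUP BY id) lb
      ON b.id = lb.id AND b.timestamp = lb.max_ts
    ORDER BY a.id
""").fetchall()

print(latest_rows)
```

Further subselects of the same shape could be handled the same way, with one more derived table and join per filter.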

Related

Optimise oracle sql

Hi, I have an existing SQL script that I need to optimise. The script is written like this:
select * from table A
Where (A.ID, A.name, A.age)
in
(
Select B.ID,B.name, B.age
from table B
where B.ID in
(
Select A.ID from table A Where age = '30'
)
)
When I try to run this, it takes a very long time, and I have no idea why the previous programmer wrote it this way. Please help if there is a better way to rewrite this. Much appreciated, thanks!

I would try something like this:
Select A.* from (select ID, NAME, age from table A WHERE age = 30) A
INNER JOIN table B ON A.ID = B.ID AND A.NAME = B.NAME AND A.age = B.age

Pulling values from IBM DB2 lookup table based on relationship between 3 tables

I sincerely hope what I entered in the Title is not confusing. I also hope I explain this properly. In a nutshell, I have 3 tables as follows:
TABLE_A
id *
value
TABLE_B
id *
other_id
TABLE_C
other_id *
name_of_product
I want to pull multiple values from TABLE_A and one value from TABLE_C based on matching IDs between TABLE_A and TABLE_B, as well as matching ID between TABLE_B and TABLE_C. I have tried searching this, but haven't as yet found anything directly related to my problem. I have tried this SQL code, but I know it is wrong:
SELECT
TRIM(id) primary_key_value,
a.value name,
c.name_of_product product
FROM TABLE_A a, TABLE_C c
JOIN TABLE_A t1 ON t1.id = a.id
JOIN TABLE_B t2 ON t2.other_id = c.other_id
WHERE c.name_of_product = 'widget'
Any help would be greatly appreciated. If it isn't obvious by the code above, I should state that I am somewhat of an SQL newbie. Thank you.

It seems you need two joins:
SELECT TRIM(a.id) as primary_key_value,
a.value as name,
c.name_of_product as product
FROM TABLE_A a JOIN
TABLE_B b
ON b.id = a.id JOIN
TABLE_C c
ON c.other_id = b.other_id
WHERE c.name_of_product = 'widget'

Please help - SQL Join query

I am working on a project, and need to join results from 2 tables into one set.
The tables are ordered as such:
gameData: [Id,TeamID, data..........]
players: [Id (same as above), name, data.....]
I need to do something like:
SELECT * FROM gameData and SELECT data FROM players WHERE gameData.Id = players.Id
And here is what I have thus far:
SELECT * FROM gameData AS A LEFT OUTER JOIN players AS B on A.playerID = B.Id;
And have it return all of the values from A, and only the data from B.
I know that the syntax is not correct. I have little experience working with SQL joins, so any advice would be greatly appreciated!
Edit: Trying both answers now. Thanks!
Edit2: Can I do something like: "Select a.* from tableA as a"

I love you guys, working as intended now!
Thanks!
The query I ended up using was:
Select a.*, b.height, b.weight from gameData as a LEFT OUTER JOIN players b on a.playerID = b.Id;

You could enumerate the fields that you select and alias the tables, like:
select a.Id, a.TeamId, a.data, b.data
from tableA a
join tableB b on a.Id = b.Id

Select a.Id, a.TeamID, a.data, b.data
FROM gameData as a
LEFT OUTER JOIN
players b On a.ID = b.ID

Postgresql: alternative to WHERE IN respective WHERE NOT IN

I have several statements that access very large PostgreSQL tables, e.g.:
SELECT a.id FROM a WHERE a.id IN ( SELECT b.id FROM b );
SELECT a.id FROM a WHERE a.id NOT IN ( SELECT b.id FROM b );
Some of them access even more tables in that way. What is the best approach to improve performance? Should I switch to joins, for example?
Many thanks!

JOIN will be far more efficient, or you can use EXISTS:
SELECT a.id FROM a WHERE EXISTS (SELECT 1 FROM b WHERE b.id = a.id)
EXISTS only needs to find one matching row in b for each row of a, so it can stop at the first match.

Here's a way to filter rows with an INNER JOIN:
SELECT a.id
FROM a
INNER JOIN b ON a.id = b.id
Note that each version can perform differently; sometimes IN is faster, sometimes EXISTS, and sometimes the INNER JOIN.
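
Performance aside, the three forms are not automatically interchangeable: if b.id contains duplicates, the INNER JOIN returns one row per match, while IN and EXISTS return each row of a at most once. A minimal sketch of that difference, run on SQLite through Python's sqlite3 module (not PostgreSQL) with made-up sample rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER);
    CREATE TABLE b (id INTEGER);
    INSERT INTO a VALUES (1), (2), (3);
    INSERT INTO b VALUES (2), (2), (3);  -- id 2 appears twice in b
""")

def ids(sql):
    """Run a single-column query and return its values as a list."""
    return [row[0] for row in conn.execute(sql)]

in_ids = ids("SELECT a.id FROM a WHERE a.id IN (SELECT b.id FROM b) ORDER BY a.id")
exists_ids = ids(
    "SELECT a.id FROM a WHERE EXISTS (SELECT 1 FROM b WHERE b.id = a.id) ORDER BY a.id"
)
join_ids = ids("SELECT a.id FROM a INNER JOIN b ON a.id = b.id ORDER BY a.id")
distinct_join_ids = ids(
    "SELECT DISTINCT a.id FROM a INNER JOIN b ON a.id = b.id ORDER BY a.id"
)

print(in_ids, exists_ids, join_ids, distinct_join_ids)
```

If b.id is unique, the plain join gives the same rows as the other two forms; otherwise a DISTINCT (or one of the subquery forms) is needed to match them.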

Yes, I would recommend switching to joins. It will speed up the select statements.

How do I find records that are not joined?

I have two tables that are joined together.
A has many B
Normally you would do:
select * from a,b where b.a_id = a.id
To get all of the records from a that have a record in b.
How do I get just the records in a that do not have anything in b?

select * from a where id not in (select a_id from b)
Or, as some other people on this thread say:
select a.* from a
left outer join b on a.id = b.a_id
where b.a_id is null

select * from a
left outer join b on a.id = b.a_id
where b.a_id is null

[Image: diagram illustrating a SQL LEFT JOIN]

Another approach:
select * from a where not exists (select * from b where b.a_id = a.id)
The "exists" approach is useful if there is some other "where" clause you need to attach to the inner query.
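
To illustrate the point about extra conditions, here is a small sketch run on SQLite through Python's sqlite3 module. The active column and the sample rows are hypothetical, added only to have something to filter on. It also shows that the outer-join form can express the same thing if the extra condition goes into the ON clause rather than the WHERE clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER);
    CREATE TABLE b (a_id INTEGER, active INTEGER);  -- 'active' is a made-up column
    INSERT INTO a VALUES (1), (2), (3);
    INSERT INTO b VALUES (1, 1), (2, 0);
""")

def ids(sql):
    """Run a single-column query and return its values as a list."""
    return [row[0] for row in conn.execute(sql)]

# Rows of a with no row in b at all.
no_match = ids("""
    SELECT id FROM a
    WHERE NOT EXISTS (SELECT * FROM b WHERE b.a_id = a.id)
    ORDER BY id
""")

# Rows of a with no *active* row in b: the extra condition sits in the inner query.
no_active_match = ids("""
    SELECT id FROM a
    WHERE NOT EXISTS (SELECT * FROM b WHERE b.a_id = a.id AND b.active = 1)
    ORDER BY id
""")

# Same result with an outer join, keeping the extra condition in the ON clause.
left_join_version = ids("""
    SELECT a.id FROM a
    LEFT OUTER JOIN b ON a.id = b.a_id AND b.active = 1
    WHERE b.a_id IS NULL
    ORDER BY a.id
""")

print(no_match, no_active_match, left_join_version)
```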

SELECT id FROM a
EXCEPT
SELECT a_id FROM b;

You will probably get a lot better performance (than using 'not in') if you use an outer join:
select * from a left outer join b on a.id = b.a_id where b.a_id is null;

SELECT <columns>
FROM a WHERE id NOT IN (SELECT a_id FROM b)

With a single join it is pretty fast, but when we remove records from a database that has about 50 million records and four or more joins due to foreign keys, it takes a few minutes.
It is much faster to use a WHERE NOT IN condition like this:
select a.* from a
where a.id NOT IN(SELECT DISTINCT a_id FROM b where a_id IS NOT NULL)
-- and for more joins
AND a.id NOT IN(SELECT DISTINCT a_id FROM c where a_id IS NOT NULL)
I can also recommend this approach for deleting when cascade delete is not configured.
This query takes only a few seconds.

The first approach is
select a.* from a where a.id not in (select b.ida from b)
the second approach is
select a.*
from a left outer join b on a.id = b.ida
where b.ida is null
The first approach is very expensive. The second approach is better.
With PostgreSQL 9.4, I ran EXPLAIN on both queries. The first query has a cost of cost=0.00..1982043603.32,
while the join query has a cost of cost=45946.77..45946.78.
For example, I search for all products that are not compatible with any vehicle. I have 100k products and more than 1M compatibilities.
select count(*) from product a left outer join compatible c on a.id=c.idprod where c.idprod is null
The join query took about 5 seconds, whereas the subquery version had not finished after 3 minutes.

Another way of writing it
select a.*
from a
left outer join b
on a.id = b.id
where b.id is null
Ouch, beaten by Nathan :)

This will protect you from nulls in the IN clause, which can cause unexpected behavior.
select * from a where id not in (select [a id] from b where [a id] is not null)
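
The "unexpected behavior" is that a single NULL in the subquery result makes NOT IN return no rows at all, because the comparison with NULL is never true. A minimal sketch of this, run on SQLite through Python's sqlite3 module with made-up sample rows (the column is named a_id here instead of [a id]):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER);
    CREATE TABLE b (a_id INTEGER);
    INSERT INTO a VALUES (1), (2), (3);
    INSERT INTO b VALUES (1), (NULL);  -- one NULL foreign key
""")

def ids(sql):
    """Run a single-column query and return its values as a list."""
    return [row[0] for row in conn.execute(sql)]

# Unguarded NOT IN: the NULL in b.a_id makes the predicate unknown for ids 2 and 3.
naive = ids("SELECT id FROM a WHERE id NOT IN (SELECT a_id FROM b) ORDER BY id")

# Guarded NOT IN: NULLs are filtered out of the subquery first.
guarded = ids(
    "SELECT id FROM a WHERE id NOT IN "
    "(SELECT a_id FROM b WHERE a_id IS NOT NULL) ORDER BY id"
)

# NOT EXISTS is not affected by the NULL.
not_exists = ids(
    "SELECT id FROM a WHERE NOT EXISTS "
    "(SELECT 1 FROM b WHERE b.a_id = a.id) ORDER BY id"
)

print(naive, guarded, not_exists)
```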
