Subquery in SELECT without JOIN in Hive?

I know Hive doesn't support this:
SELECT (CASE WHEN table1.id in (SELECT table1.id
from table1,table2
where table1.id = table2.id and table2.company like '%My Company%')
THEN table1.email
ELSE regexp_replace(table1.email, substr(table1.email, 1), 'XXXX')
END) as email, table1.id
FROM table1
Hive cannot do a SELECT within a SELECT (a subquery in the SELECT list).
But let's say that, because of some restriction, I cannot use a JOIN after the FROM clause. Is there a "creative" way to do this? I was thinking about running SELECT table1.id from table1,table2 where table1.id = table2.id and table2.company like '%My Company%' as a separate query, then parsing the result and passing it in as a "static list". But that list could grow to thousands of IDs.

If you can use a SELECT as a join source, you could use a left join and check for a NULL value:
SELECT case when t.id is null
then regexp_replace(table1.email, substr(table1.email, 1), 'XXXX')
else table1.email
end
, table1.id
FROM table1
left join (
SELECT table1.id
from table1,table2
where table1.id = table2.id
and table2.company like '%My Company%'
) t on table1.id = t.id
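
The left-join-and-check-for-NULL pattern above can be sketched in a runnable form. This is a minimal illustration using Python's built-in sqlite3 rather than Hive, so the sample rows, the simplified derived table (it reads table2 only and adds DISTINCT), and the literal 'XXXX' in place of the regexp_replace call are all assumptions made for the demo:

```python
import sqlite3

# In-memory database with hypothetical sample rows (not from the question).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, email TEXT);
    CREATE TABLE table2 (id INTEGER, company TEXT);
    INSERT INTO table1 VALUES (1, 'a@example.com'), (2, 'b@example.com');
    INSERT INTO table2 VALUES (1, 'My Company Ltd'), (2, 'Other Corp');
""")

# Rows with no match in the derived table t get a NULL t.id and are masked.
masked_rows = conn.execute("""
    SELECT CASE WHEN t.id IS NULL THEN 'XXXX' ELSE table1.email END AS email,
           table1.id
    FROM table1
    LEFT JOIN (
        SELECT DISTINCT table2.id
        FROM table2
        WHERE table2.company LIKE '%My Company%'
    ) t ON table1.id = t.id
    ORDER BY table1.id
""").fetchall()

print(masked_rows)
```

The DISTINCT in the derived table keeps the join from multiplying table1 rows when table2 has several matching rows per id.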

Related

Teradata update with left join and inner join

I need to port the sample code below from MS SQL Server to Teradata. Please let me know how to convert it. Sample code:
Update table1
set table1.name = table3.name
from table1
inner join table2
on table2.id = table1.id
left join table3
on table2.id = table3.id where table3.name is null
It's ugly, but this should work. You can get around Teradata not allowing outer joins in an update by using a derived table.
update table1
from table1,
(select <column list> from table2 left join table3 on table2.id = table3.id) t
set ...
where
table1.id = t.id
and t.name is null
Like the others mentioned, you should check your NULL condition. Nevertheless, here's one more option:
Update table1
FROM (
select table1.id, table3.name
from table1
inner join table2
on table2.id = table1.id
left join table3
on table2.id = table3.id where table3.name is null
) src
SET table1.name = src.name
WHERE table1.id = src.id
You just move your original source query into a "src" sub-query and update from it. Just make sure your "src" query returns only one row for each table1.id value, so you don't get a "target row updated by multiple source rows" error.
I think your logic is better handled by not exists:
Update table1
set table1.name = null
where not exists (select 1
from table2 join
table3
on table2.id = table3.id
where table2.id = table1.id
);
This is not exactly what your query specifies -- this will update table1.name when there is no match in table2. If that is an issue, you can do:
update table1
set table1.name = null
where exists (select 1
from table2
where table2.id = table1.id
) and
not exists (select 1
from table2 join
table3
on table2.id = table3.id
where table2.id = table1.id
);
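
As a rough check of the EXISTS / NOT EXISTS form, here is a small sketch run against SQLite through Python's sqlite3 module, not Teradata. The sample rows are hypothetical, and the first subquery is correlated on table1.id, which is assumed to be the intent:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, name TEXT);
    CREATE TABLE table2 (id INTEGER);
    CREATE TABLE table3 (id INTEGER, name TEXT);
    INSERT INTO table1 VALUES (1, 'keep-a'), (2, 'clear-me'), (3, 'keep-c');
    INSERT INTO table2 VALUES (1), (2);
    INSERT INTO table3 VALUES (1, 'x');
""")

# Clear the name only when the id is in table2 but has no table3 match.
conn.execute("""
    UPDATE table1
    SET name = NULL
    WHERE EXISTS (SELECT 1 FROM table2 WHERE table2.id = table1.id)
      AND NOT EXISTS (SELECT 1
                      FROM table2 JOIN table3 ON table2.id = table3.id
                      WHERE table2.id = table1.id)
""")

updated_rows = conn.execute(
    "SELECT id, name FROM table1 ORDER BY id"
).fetchall()
print(updated_rows)
```

Only id 2 is cleared: id 1 has a table3 match, and id 3 is not in table2 at all.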

How to exclude records based on values in 2 other tables

In SQL Server I need to select rows that have a specific value in one column, but I have to then take those records and exclude them from my results if they have specific values in 2 other tables. The records I want might not exist in the other 2 tables.
SELECT Table1.ID
FROM
Table1
WHERE
Table1.Column1 = 'A'
AND
Table1.ID NOT IN (SELECT Table2.ID FROM Table2 WHERE Table2.Column2 = 'X')
AND
Table1.ID NOT IN (SELECT Table3.ID FROM Table3 WHERE Table3.Column3 = 'Y')
I keep getting records where ID does appear in Table2 but not Table3 and vice versa. What I want to do is exclude that ID if it's in either Table2 with Column2 = 'X' or Table3 with Column3 = 'Y'
I think my logic or syntax is wrong.
Plus I know "NOT IN" can do weird things if the results of a subquery contain NULLs, so I'm not sure if there's a more straightforward way to do this.
I've been rewriting this a bunch of different ways, but I'm not getting the results I want. I keep getting no records, too few records, or too many records...argh!
Any suggestions?
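You could change the AND to OR, but note that this keeps an ID unless it appears in both subqueries, so it excludes only the IDs found in both tables: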
SELECT Table1.ID
FROM Table1
WHERE Table1.Column1 = 'A'
AND (
Table1.ID NOT IN (SELECT Table2.ID FROM Table2 WHERE Table2.Column2 = 'X')
OR
Table1.ID NOT IN (SELECT Table3.ID FROM Table3 WHERE Table3.Column3 = 'Y'))
Or, to exclude an ID found in either table, you could rewrite it as:
SELECT Table1.ID
FROM Table1
WHERE Table1.Column1 = 'A'
EXCEPT (
SELECT Table2.ID FROM Table2 WHERE Table2.Column2 = 'X'
UNION ALL
SELECT Table3.ID FROM Table3 WHERE Table3.Column3 = 'Y'
)
EDIT:
SELECT DISTINCT Table1.ID
FROM Table1
LEFT JOIN Table2
ON Table1.ID = Table2.ID
AND Table2.Column2 = 'X'
LEFT JOIN Table3
ON Table1.ID = Table3.ID
AND Table3.Column3 = 'Y'
WHERE Table1.Column1 = 'A'
AND (Table2.ID IS NULL AND Table3.ID IS NULL);
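
The NULL concern raised in the question can be illustrated with a short sketch. It uses SQLite through Python's sqlite3 module instead of SQL Server, and the sample rows (including the NULL ID in Table2) are assumptions chosen to trigger the problem. It compares the NOT IN form with a NOT EXISTS form that excludes an ID found in either table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (ID INTEGER, Column1 TEXT);
    CREATE TABLE Table2 (ID INTEGER, Column2 TEXT);
    CREATE TABLE Table3 (ID INTEGER, Column3 TEXT);
    INSERT INTO Table1 VALUES (1, 'A'), (2, 'A'), (3, 'A'), (4, 'B');
    INSERT INTO Table2 VALUES (1, 'X'), (NULL, 'X');  -- note the NULL ID
    INSERT INTO Table3 VALUES (2, 'Y');
""")

# NOT IN evaluates to NULL (not TRUE) when the subquery contains a NULL
# and no match is found, so no row passes the filter here.
not_in_ids = [r[0] for r in conn.execute("""
    SELECT Table1.ID FROM Table1
    WHERE Table1.Column1 = 'A'
      AND Table1.ID NOT IN (SELECT ID FROM Table2 WHERE Column2 = 'X')
      AND Table1.ID NOT IN (SELECT ID FROM Table3 WHERE Column3 = 'Y')
""")]

# NOT EXISTS compares row by row and is not affected by the NULL ID.
not_exists_ids = [r[0] for r in conn.execute("""
    SELECT Table1.ID FROM Table1
    WHERE Table1.Column1 = 'A'
      AND NOT EXISTS (SELECT 1 FROM Table2
                      WHERE Table2.ID = Table1.ID AND Table2.Column2 = 'X')
      AND NOT EXISTS (SELECT 1 FROM Table3
                      WHERE Table3.ID = Table1.ID AND Table3.Column3 = 'Y')
    ORDER BY Table1.ID
""")]

print(not_in_ids, not_exists_ids)
```

With these rows, the NOT IN version returns nothing, while NOT EXISTS returns only ID 3, the one 'A' row that appears in neither table with the excluded values.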

Need help in sql query to give result set based on previous version

I have two tables, table1 and table2.
table1
id  name   city    uqid
1   vikas  mysore  2
table2
id  uqid  name   status
1   1     vikas  pending
1   2     Vikas  processing
I have a SQL query to fetch the details of table1 joined with table2
select table1.id,
table1.name,
table1.city,
table2.status
from table1
left outer join table2
on table2.uqid = table1.uqid
and table2.id = table1.id
This gives me the result set:
id  name   city    status
1   vikas  mysore  processing
How can I modify the above query so that it does not return this row until the status is set to "pass" in table2 for uqid = 1 and id = 1?
Try the following.
select table1.id,
table1.name,
table1.city,
table2.status
from table1
left outer join table2
on table2.uqid = table1.uqid
and table2.id = table1.id
where table2.status ilike 'pass';
If, by saying you need uqid = 1 and id = 1 in table2, you mean that both fields must have the same value, then use the following.
select table1.id,
table1.name,
table1.city,
table2.status
from table1
left outer join table2
on table2.uqid = table1.uqid
and table2.id = table1.id
where table2.status ilike 'pass' and table2.uqid=table2.id;
Suggestion: try to normalize your tables.
I am not sure whether this is the proper way or whether a more efficient way exists, but this will give you the desired result.
select table1.id,
table1.name,
table1.city,
table2.status
from table1
left outer join table2
on table2.uqid = table1.uqid and table2.id = table1.id
where table1.id in(select distinct id from table2 where status like 'pass'
and uqid not in(select uqid from table1))
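
One reading of the requirement is that the joined row should stay hidden until its own status becomes 'pass'. The sketch below tries that reading in SQLite via Python's sqlite3 module. The helper name fetch_passed is hypothetical, lower(...) = 'pass' stands in for PostgreSQL's ilike, and an inner join is used because a WHERE filter on a table2 column discards the unmatched rows of a left join anyway:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, name TEXT, city TEXT, uqid INTEGER);
    CREATE TABLE table2 (id INTEGER, uqid INTEGER, name TEXT, status TEXT);
    INSERT INTO table1 VALUES (1, 'vikas', 'mysore', 2);
    INSERT INTO table2 VALUES (1, 1, 'vikas', 'pending'),
                              (1, 2, 'Vikas', 'processing');
""")

def fetch_passed(connection):
    """Return joined rows only where the matching table2 status is 'pass'."""
    return connection.execute("""
        SELECT table1.id, table1.name, table1.city, table2.status
        FROM table1
        JOIN table2
          ON table2.uqid = table1.uqid
         AND table2.id = table1.id
        WHERE lower(table2.status) = 'pass'
    """).fetchall()

rows_before = fetch_passed(conn)   # status is still 'processing'

conn.execute("UPDATE table2 SET status = 'pass' WHERE id = 1 AND uqid = 2")
rows_after = fetch_passed(conn)    # the row now appears

print(rows_before, rows_after)
```

If the gate should instead depend on a different table2 row (for example the uqid = 1 row), the filter would need an EXISTS subquery on that row rather than a condition on the joined one.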

Using Case statement in Where clause

In my query I have the following condition
left Join Table2 on Table2.Id = Table1.Id and Table2.status in ('Close', 'Open')
And the above condition gives me 2 extra rows because of the left join. I noticed that if I have only either Close or Open in the condition it returns the correct number of rows.
To fix that I was trying to write something like this
And Table2.status = (Case Table2.status
WHEN 'Open' Then 'Open'
When 'Close' Then 'Close'
End )
But this still returns 2 extra rows. Any suggestions on how to fix this?
You could do something like this:
select *
from table1
left join
(
select id from table2
where status in ('open', 'close')
group by id
) as table2
on table1.id = table2.id
This seems pretty hacky, but I cannot come up with something better at the moment. Otherwise, you could use DISTINCT:
select DISTINCT table1.*, table2.stuff
FROM Table1
LEFT JOIN Table2
on Table2.Id = Table1.Id and Table2.status in ('Close', 'Open')
The key is the DISTINCT keyword.
SELECT DISTINCT Table1.* FROM Table1
LEFT JOIN Table2 ON Table2.Id = Table1.Id
AND Table2.status in ('Close', 'Open')
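
To see where the extra rows come from, here is a small sketch in SQLite via Python's sqlite3 module. The sample rows are hypothetical. A Table1 row that matches both an 'Open' and a 'Close' row in Table2 is returned twice by the plain left join, while joining to a derived table grouped by id returns it once:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (Id INTEGER);
    CREATE TABLE Table2 (Id INTEGER, status TEXT);
    INSERT INTO Table1 VALUES (1), (2);
    INSERT INTO Table2 VALUES (1, 'Open'), (1, 'Close'), (2, 'Pending');
""")

# Plain left join: Id 1 matches two Table2 rows, so it appears twice.
plain_count = len(conn.execute("""
    SELECT Table1.Id, Table2.status
    FROM Table1
    LEFT JOIN Table2
      ON Table2.Id = Table1.Id AND Table2.status IN ('Close', 'Open')
""").fetchall())

# Derived table collapsed to one row per Id before joining.
dedup_count = len(conn.execute("""
    SELECT Table1.Id, t2.Id
    FROM Table1
    LEFT JOIN (
        SELECT Id FROM Table2
        WHERE status IN ('Close', 'Open')
        GROUP BY Id
    ) AS t2 ON t2.Id = Table1.Id
""").fetchall())

print(plain_count, dedup_count)
```

The grouped derived table keeps one output row per Table1 row, at the cost of no longer exposing which status matched.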

Subquery not in performance question

I have this slow query
select * from table1 where id NOT IN ( select id from table2 )
Would this be faster by doing something like (not sure if this is possible):
select * from table1 where id not in ( select id from table2 where id = table1.id )
Or:
select * from table1 where NOT EXISTS ( select id from table2 where table2.id = table1.id )
Or:
select * from table1
left join table2 on table2.id = table1.id
WHERE table2.id is null
Or do something else? Like break it up into two queries ...
The question is: are the fields in the comparison nullable (that is, can the column value be NULL)?
If they're nullable...
...in MySQL, NOT IN or NOT EXISTS performs better.
If they are NOT nullable...
...LEFT JOIN / IS NULL performs better.
select table1.* from table1
LEFT JOIN table2 ON table1.id = table2.id
WHERE table2.id IS NULL
The objective is to get rid of NOT IN.
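
The three anti-join forms discussed in this thread can be compared side by side. The sketch below runs them in SQLite through Python's sqlite3 module with hypothetical sample rows and non-NULL ids. It only checks that the results agree and says nothing about relative speed, which depends on the database engine, indexes, and nullability:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER NOT NULL);
    CREATE TABLE table2 (id INTEGER NOT NULL);
    INSERT INTO table1 VALUES (1), (2), (3);
    INSERT INTO table2 VALUES (2);
""")

queries = {
    "not_in": """
        SELECT id FROM table1
        WHERE id NOT IN (SELECT id FROM table2)
        ORDER BY id
    """,
    "not_exists": """
        SELECT id FROM table1
        WHERE NOT EXISTS (SELECT 1 FROM table2 WHERE table2.id = table1.id)
        ORDER BY id
    """,
    "left_join_is_null": """
        SELECT table1.id FROM table1
        LEFT JOIN table2 ON table2.id = table1.id
        WHERE table2.id IS NULL
        ORDER BY table1.id
    """,
}

# Run each form and collect the returned ids.
results = {
    name: [row[0] for row in conn.execute(sql)]
    for name, sql in queries.items()
}
print(results)
```

With non-nullable ids all three forms return the same rows, so the choice between them is a matter of how a given engine plans each one.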