Hive - Update From statement with an inline query - sql

I have the following query and I wanted to run it in Hive but Hive does not support inline queries in update. Can anyone please help me with this update query in Hive?
UPDATE TABLE1 FROM
(SELECT COUNT(*) AS NEW_COUNT
FROM TABLE2
WHERE XTRCT_DT IN (SELECT MAX(XTRCT_DT) FROM TABLE3)) AS T
SET TBL = T.NEW_COUNT
WHERE XTRCT_DT IN(SELECT MAX(XTRCT_DT) FROM TABLE4) AND TN=1;
I am currently using a Hive version above 3.0.
I tried a MERGE statement for this update, but it didn't work. Can someone please help?
This is the MERGE statement I tried; I got an error in the ON clause because of the IN.
MERGE INTO TABLE1 USING (
SELECT COUNT(*) AS NEW_COUNT FROM TABLE2 WHERE XTRCT_DT IN(SELECT MAX(XTRCT_DT) FROM TABLE3)) AS T
ON XTRCT_DT IN(SELECT MAX(XTRCT_DT) FROM TABLE4) AND TN=1
SET TBL=T.NEW_COUNT;

Your MERGE is missing a WHEN MATCHED THEN UPDATE clause and also does not have a join condition between the source subquery T and the target TABLE1. Move the filter date into the source subquery and use that to join:
MERGE INTO TABLE1
USING (SELECT COUNT(*) AS NEW_COUNT,
(SELECT MAX(XTRCT_DT) FROM TABLE4) AS MATCH_DT
FROM TABLE2
WHERE XTRCT_DT IN (SELECT MAX(XTRCT_DT) FROM TABLE3)
) AS T
ON XTRCT_DT = T.MATCH_DT AND TN=1
WHEN MATCHED THEN UPDATE
SET TBL=T.NEW_COUNT;
You could instead cross-join (JOIN with no ON clause) the subquery for MATCH_DT with TABLE2 rather than using a scalar subquery in the SELECT list.
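As a rough sketch of the cross-join variant of the source subquery only: SQLite (through Python's sqlite3 module) stands in for Hive here, the sample dates are made up, and the MERGE itself is not run. The GROUP BY on M.MATCH_DT is an assumption added so the non-aggregated column is legal next to COUNT(*).

```python
import sqlite3

# In-memory SQLite database standing in for Hive; the data is invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TABLE2 (XTRCT_DT TEXT);
    CREATE TABLE TABLE3 (XTRCT_DT TEXT);
    CREATE TABLE TABLE4 (XTRCT_DT TEXT);
    INSERT INTO TABLE2 VALUES ('2024-01-01'), ('2024-01-02'), ('2024-01-02');
    INSERT INTO TABLE3 VALUES ('2024-01-01'), ('2024-01-02');
    INSERT INTO TABLE4 VALUES ('2024-01-03'), ('2024-01-05');
""")

# Source query: the count from TABLE2 plus the date used to match TABLE1,
# with MATCH_DT supplied by a one-row cross-joined subquery.
source_rows = conn.execute("""
    SELECT COUNT(*) AS NEW_COUNT, M.MATCH_DT
    FROM TABLE2
    CROSS JOIN (SELECT MAX(XTRCT_DT) AS MATCH_DT FROM TABLE4) AS M
    WHERE TABLE2.XTRCT_DT IN (SELECT MAX(XTRCT_DT) FROM TABLE3)
    GROUP BY M.MATCH_DT
""").fetchall()
print(source_rows)
```

One difference worth keeping in mind: with the GROUP BY, this form returns no row at all when no TABLE2 row passes the filter, so nothing would be matched in that case.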

Related

How to write subquery like: column=(select xx from table) in Hive?

I have a scenario, for example:
with tmp as (select name from table1)
select * from table2 b
where b.name=(select max(name) from tmp)
However, Hive can't recognize this syntax, so is there any legal syntax for this?
After searching, I learned that a join can be used to do this:
select table2.* from table2
join (select max(name) as name from tmp) t2
where table2.name = t2.name
but I don't want to use a join, as the join will be very slow. I just want to treat the result as a reference.
In MySQL, for example, you can store the result in a variable:
set @max_date := (select max(date) from some_table);
select * from some_other_table where date > @max_date;
Hive can achieve this effect by storing the query result in a shell variable; see: HiveQL: Using query results as variables
Can Hive support such feature in SQL mode?
In Hive you can achieve it as below:
select * from table2 b
where b.name=(select max(name) from table1)
Another way:
You can also create a temporary table in Hive, which lets you reproduce the query above.
CREATE TEMPORARY TABLE tmp AS SELECT name FROM table1;
SELECT * FROM table2 b WHERE b.name=(SELECT max(name) FROM tmp);

Nested SELECT with a WHERE clause in Spark

I have a problem with running a Spark SQL query which uses a nested select with a "where in" clause. In the query below table1 represents a temporary table which comes from a more complicated query. In the end I want to substitute table1 with this query.
select * from (select * from table1) as table2
where (product, price)
in (select product, min(price) from table2 group by product)
The Spark error I get says:
AnalysisException: 'Table or view not found: table2;
How could I possibly change the query to make it work as intended?
The subquery (i.e. (select * from table1) as table2) is not needed. Its alias is only visible to the query it is defined in, so you can't reference it as a table inside an IN or WHERE subquery. You can use a correlated subquery instead:
select t1.*
from table1 t1
where t1.price = (select min(t2.price) from table1 t2 where t2.product = t1.product);
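To see what the correlated subquery returns, here is a small sketch that runs the same query shape on SQLite through Python's sqlite3 module rather than on Spark. The products and prices are made up for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for the Spark temporary table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (product TEXT, price INTEGER);
    INSERT INTO table1 VALUES
        ('apple', 15), ('apple', 12), ('pear', 20), ('pear', 24);
""")

# For each row, the inner query finds the minimum price of that row's product;
# the row is kept only when its own price equals that minimum.
cheapest = conn.execute("""
    SELECT t1.product, t1.price
    FROM table1 t1
    WHERE t1.price = (SELECT MIN(t2.price)
                      FROM table1 t2
                      WHERE t2.product = t1.product)
    ORDER BY t1.product
""").fetchall()
print(cheapest)
```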

postgres - select with "using"

I found a very useful delete query that will delete duplicates based on specific columns:
DELETE FROM table USING table alias
WHERE table.field1 = alias.field1 AND table.field2 = alias.field2 AND
table.max_field < alias.max_field
How to delete duplicate entries?
However, is there an equivalent SELECT query that would filter the same way? I tried USING, but without success.
Thank you.
You can join your table with itself using the specific columns, field1 and field2, and then filter based on a comparison between max_field on both tables.
select t1.*
from mytable t1
join mytable t2 on (t1.field1 = t2.field1 and t1.field2 = t2.field2)
where t1.max_field < t2.max_field;
You will get all the duplicates whose max_field is not the greatest.
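A quick way to try the self-join is the sketch below, which uses SQLite through Python's sqlite3 module instead of Postgres, with made-up rows. Note that a row is returned once for every larger row it pairs with, so the sketch adds DISTINCT (my addition, not part of the query above) to list each duplicate once.

```python
import sqlite3

# In-memory SQLite database; the sample rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (field1 TEXT, field2 TEXT, max_field INTEGER);
    INSERT INTO mytable VALUES
        ('a', 'x', 1), ('a', 'x', 3), ('a', 'x', 2), ('b', 'y', 5);
""")

# Rows that share field1/field2 with another row holding a larger max_field.
# Without DISTINCT, ('a', 'x', 1) would appear twice (it pairs with 2 and 3).
duplicates = conn.execute("""
    SELECT DISTINCT t1.field1, t1.field2, t1.max_field
    FROM mytable t1
    JOIN mytable t2
      ON t1.field1 = t2.field1 AND t1.field2 = t2.field2
    WHERE t1.max_field < t2.max_field
    ORDER BY t1.max_field
""").fetchall()
print(duplicates)
```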

Deleting from Oracle SQL table using 'inner join'

So I've searched high and low, trying other tips from this forum, to no avail.
I'm trying to delete using an inner join in Oracle SQL Developer (v3.2.20.09).
I wish to delete from Table1 (column name Column1) where the data matches the column 'Column2' in 'Table2'.
I know there are some differences between Oracle and Microsoft SQL. I tried multiple queries such as the ones below, with slight variations (using open/close brackets, inner joins, WHERE EXISTS, WHERE (SELECT ...)).
TRY:
delete from table2 where
exists (select column1 from table1);
delete from table2,
inner join table1 on table2.column2 = table1.column1;
What are the problem(s) of the code that I wrote?
The EXISTS version would look like this:
delete from table2
where exists (select *
from table1
where table1.column1 = table2.column2);
Alternatively you can use an IN clause
delete from table2
where column2 in (select column1
from table1);
If you're trying to delete from table1, then that's the table name that has to be used in the delete clause, not table2.
delete table1 t1
where exists (select null
from table2 t2
where t2.column2 = t1.column1)
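As a minimal check of the EXISTS form that deletes from table1, the sketch below runs it on SQLite through Python's sqlite3 module rather than Oracle. The data is made up, and the table alias is dropped in favor of the full table name to keep the statement portable.

```python
import sqlite3

# In-memory SQLite database; the sample values are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (column1 INTEGER);
    CREATE TABLE table2 (column2 INTEGER);
    INSERT INTO table1 VALUES (1), (2), (3);
    INSERT INTO table2 VALUES (2), (3), (4);
""")

# Delete every table1 row that has a matching value in table2.column2.
conn.execute("""
    DELETE FROM table1
    WHERE EXISTS (SELECT NULL
                  FROM table2 t2
                  WHERE t2.column2 = table1.column1)
""")

remaining = conn.execute("SELECT column1 FROM table1 ORDER BY column1").fetchall()
print(remaining)
```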

How to convert a SQL subquery to a join

I have two tables with a 1:n relationship: "content" and "versioned-content-data" (for example, an article entity and all the versions created of that article). I would like to create a view that displays the top version of each "content".
Currently I use this query (with a simple subquery):
SELECT
t1.id,
t1.title,
t1.contenttext,
t1.fk_idothertable,
t1.version
FROM mytable as t1
WHERE (version = (SELECT MAX(version) AS topversion
FROM mytable
WHERE (fk_idothertable = t1.fk_idothertable)))
The subquery is actually a query to the same table that extracts the highest version of a specific item. Notice that the versioned items will have the same fk_idothertable.
In SQL Server I tried to create an indexed view of this query but it seems I'm not able since subqueries are not allowed in indexed views. So... here's my question... Can you think of a way to convert this query to some sort of query with JOINs?
It seems like indexed views cannot contain:
subqueries
common table expressions
derived tables
HAVING clauses
I'm desperate. Any other ideas are welcome :-)
Thanks a lot!
This probably won't help if the table is already in production, but the right way to model this is to make version = 0 the permanent version and always increment the version of OLDER material. So when you insert a new version you would say:
UPDATE thetable SET version = version + 1 WHERE id = :id
INSERT INTO thetable (id, version, title, ...) VALUES (:id, 0, :title, ...)
Then this query would just be
SELECT id, title, ... FROM thetable WHERE version = 0
No subqueries, no MAX aggregation. You always know what the current version is. You never have to select max(version) in order to insert the new record.
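A small sketch of this scheme, run on SQLite through Python's sqlite3 module instead of SQL Server. The add_version helper, the three-column table, and the titles are hypothetical, chosen only to illustrate the two-statement insert.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE thetable (id INTEGER, version INTEGER, title TEXT)")

def add_version(conn, content_id, title):
    """Age every stored version of the item by one, then insert the new one as version 0."""
    with conn:  # run both statements in one transaction
        conn.execute(
            "UPDATE thetable SET version = version + 1 WHERE id = :id",
            {"id": content_id},
        )
        conn.execute(
            "INSERT INTO thetable (id, version, title) VALUES (:id, 0, :title)",
            {"id": content_id, "title": title},
        )

add_version(conn, 1, "first draft")
add_version(conn, 1, "second draft")
add_version(conn, 2, "other article")

# The current version of every item is simply the row with version = 0.
current = conn.execute(
    "SELECT id, title FROM thetable WHERE version = 0 ORDER BY id"
).fetchall()
print(current)
```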
Maybe something like this?
SELECT
t2.id,
t2.title,
t2.contenttext,
t2.fk_idothertable,
t2.version
FROM mytable t1, mytable t2
WHERE t1.fk_idothertable = t2.fk_idothertable
GROUP BY t2.fk_idothertable, t2.version
HAVING t2.version=MAX(t1.version)
Just a wild guess...
You might be able to turn the MAX into a derived table that does the GROUP BY.
It might look something like this:
SELECT
t1.id,
t1.title,
t1.contenttext,
t1.fk_idothertable,
t1.version
FROM mytable as t1 JOIN
(SELECT fk_idothertable, MAX(version) AS topversion
FROM mytable
GROUP BY fk_idothertable) as t2
ON t1.fk_idothertable = t2.fk_idothertable
AND t1.version = t2.topversion
I think FerranB was close but didn't quite have the grouping right:
with
latest_versions as (
select
max(version) as latest_version,
fk_idothertable
from
mytable
group by
fk_idothertable
)
select
t1.id,
t1.title,
t1.contenttext,
t1.fk_idothertable,
t1.version
from
mytable as t1
join latest_versions on (t1.version = latest_versions.latest_version
and t1.fk_idothertable = latest_versions.fk_idothertable);
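For a quick check of the join against the grouped maximum, here is a sketch on SQLite through Python's sqlite3 module rather than SQL Server, with made-up rows and without the contenttext column. It also shows why both join conditions are needed: matching on version alone would pair a row with another item's top version.

```python
import sqlite3

# In-memory SQLite database; the sample rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (id INTEGER, title TEXT, fk_idothertable INTEGER, version INTEGER);
    INSERT INTO mytable VALUES
        (1, 'A v1', 10, 1),
        (2, 'A v2', 10, 2),
        (3, 'B v1', 20, 1);
""")

# One row per item: the highest version, found by joining each row
# to its own item's maximum version.
latest = conn.execute("""
    WITH latest_versions AS (
        SELECT MAX(version) AS latest_version, fk_idothertable
        FROM mytable
        GROUP BY fk_idothertable
    )
    SELECT t1.id, t1.title, t1.fk_idothertable, t1.version
    FROM mytable AS t1
    JOIN latest_versions
      ON t1.version = latest_versions.latest_version
     AND t1.fk_idothertable = latest_versions.fk_idothertable
    ORDER BY t1.id
""").fetchall()
print(latest)
```

If the fk_idothertable condition were dropped, row 1 ('A v1', version 1) would also be returned, because it matches the top version of item 20.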
If SQL Server accepts LIMIT clause, I think the following should work:
SELECT
t1.id,
t1.title,
t1.contenttext,
t1.fk_idothertable,
t1.version
FROM mytable as t1 ORDER BY t1.version DESC LIMIT 1;
(DESC - For descending sort; LIMIT 1 chooses only the first row and
DBMS usually does good optimization on seeing LIMIT).
I don't know how efficient this would be, but:
SELECT t1.*, t2.version
FROM mytable AS t1
JOIN (
SELECT mytable.fk_idothertable, MAX(mytable.version) AS version
FROM mytable
GROUP BY mytable.fk_idothertable
) t2 ON t1.fk_idothertable = t2.fk_idothertable
AND t1.version = t2.version
Like this...I assume that the 'mytable' in the subquery was a different actual table...so I called it mytable2. If it was the same table then this will still work, but then I imagine that fk_idothertable will just be 'id'.
SELECT
t1.id,
t1.title,
t1.contenttext,
t1.fk_idothertable,
t1.version
FROM mytable as t1
INNER JOIN (SELECT MAX(Version) AS topversion,fk_idothertable FROM mytable2 GROUP BY fk_idothertable) t2
ON t1.id = t2.fk_idothertable AND t1.version = t2.topversion
Hope this helps