Hive: execution error when "where" condition contains a subquery

I have two tables: Table1 is large and Table2 is small. I would like to extract rows from Table1 whose values in Table1.column1 match those in Table2.column1 (both tables have a column named column1). Here is my code.
SELECT *
FROM Table1
WHERE condition1
  AND condition2
  AND column1 IN (SELECT column1 FROM Table2)
condition1 and condition2 are meant to restrict the size of the data extracted, although I am not sure that actually helps. With this query I get an execution error, return code 1. I am on the Hue platform.
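For reference, the same IN filter can also be written as a LEFT SEMI JOIN, which is the form older Hive versions require for this pattern; the right-hand table may only be referenced in the ON clause. A minimal sketch reusing the names above (condition1 and condition2 stand in for the real predicates):
-- LEFT SEMI JOIN keeps each Table1 row that has at least one match in Table2;
-- only Table1 columns may appear in SELECT and WHERE.
SELECT t1.*
FROM Table1 t1
LEFT SEMI JOIN Table2 t2 ON (t1.column1 = t2.column1)
WHERE condition1
  AND condition2;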
EDIT
As suggested by @yammanuruarun, I tried the following code.
SELECT *
FROM
(SELECT *
FROM Table1
WHERE condition1
AND condition2) t1
INNER JOIN Table2 t2 ON t1.column1 = t2.column1
Then, I got the following error.
Error while processing statement: FAILED: Execution Error, return code 2 from
org.apache.hadoop.hive.ql.exec.tez.TezTask. Application
application_1580875150091_97539 failed 2 times due to AM Container for
appattempt_1580875150091_97539_000002 exited with exitCode: 255. Failing this
attempt. Diagnostics: [2020-02-07 14:35:53.944] Exception from container-launch.
Container id: container_e1237_1580875150091_97539_02_000001 Exit code: 255
Exception message: Launch container failed
Shell output:
main : command provided 1
main : run as user is hive
main : requested yarn user is hive
Getting exit code file... Creating script paths... Writing pid file...
Writing to tmp file /disk-11/hadoop/yarn/local/nmPrivate/application_1580875150091_97539/container_e1237_1580875150091_97539_02_000001/container_e1237_1580875150091_97539_02_000001.pid.tmp
Writing to cgroup task files... Creating local dirs... Launching container...
Getting exit code file... Creating script paths...
[2020-02-07 14:35:53.967] Container exited with a non-zero exit code 255.
Error file: prelaunch.err. Last 4096 bytes of stderr:
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in
thread "IPC Server idle connection scanner for port 26888"
Halting due to Out Of Memory Error...
Halting due to Out Of Memory Error...
For more detailed output, check the application tracking page:
http://dcwipphm12002.edc.nam.gm.com:8088/cluster/app/application_1580875150091_97539
Then click on links to logs of each attempt. Failing the application.
It looks like a memory error. Is there any way I could optimize my query?
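One thing worth trying (my suggestion, and the exact values are assumptions that depend on the cluster) is to make sure the small Table2 is handled as a map-side join and to give the Tez ApplicationMaster and containers more memory, since the log shows the AM container itself running out of memory:
-- Broadcast the small table as a map join; the sizes below are illustrative only.
SET hive.auto.convert.join=true;
SET hive.auto.convert.join.noconditionaltask.size=64000000;  -- small-table threshold, bytes
SET hive.tez.container.size=4096;    -- MB, cluster-dependent
SET tez.am.resource.memory.mb=4096;  -- MB; the AM is what crashed here

SELECT t1.*
FROM Table1 t1
INNER JOIN Table2 t2 ON t1.column1 = t2.column1
WHERE condition1
  AND condition2;
Because the failure happens in the ApplicationMaster container rather than in a task, tez.am.resource.memory.mb is the setting most directly aimed at this particular stack trace.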

Related

DBT: How to fix Database Error Expecting Value?

I ran into trouble today while running Airflow with airflow-dbt-python. I tried to debug using the logs, and the error shown there was this one:
[2022-12-27, 13:53:53 CET] {functions.py:226} ERROR - 12:53:53.642186 [error] [MainThread]: Encountered an error:
Database Error
Expecting value: line 2 column 5 (char 5)
Quite a weird one.
Check the credentials file that allows dbt to run queries on your database (in our case we run dbt against BigQuery); our credentials file turned out to be empty. The "Expecting value" text is what Python's JSON parser reports on missing or malformed input, which fits an empty keyfile. We even tried running dbt directly on the worker instead of through Airflow, with exactly the same result. Unfortunately, this error is not very explicit.

Getting an Assert code 1000 error running VACUUM query in Redshift

I am running a Redshift cluster (version 1.0.39380) that fails when running a VACUUM DELETE ONLY <table> query. This is the output I'm getting:
Query 1 ERROR: ERROR: Assert
DETAIL:
-----------------------------------------------
error: Assert
code: 1000
context: state->m_tbl_rowid_encoding_type == EncodeRawAuto || (state->m_tbl_rowid_encoding_type == EncodeFor64Auto && gconf_enable_rowid_compression_with_az64 && gconf_enable_vacuum_for_rowid_compressed_table) - Tableid108790 is rowid compressed with0 encoding and
query: 1531591
location: xen_vacuum.cpp:4772
process: query1_24_1531591 [pid=16344]
-----------------------------------------------
To give a bit more context: this is a regular job that runs three times a week, and other queries have been running fine on the same table as well as on other tables. The cluster is not under any unusual load.
What may be the cause of this error?
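As a first diagnostic (my suggestion, not something from the original report), it can help to inspect the table's vacuum-related state in Redshift's svv_table_info system view before retrying:
-- svv_table_info is a built-in Redshift system view; 'my_table' is a
-- placeholder for the table from the failing VACUUM job.
SELECT "table", table_id, unsorted, stats_off, tbl_rows
FROM svv_table_info
WHERE "table" = 'my_table';
The table_id column can also be matched against the Tableid mentioned in the assert message.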

What happens when tensorflow.io.read_file fails to read

I've been running a training model that pulls images from Google Storage using tensorflow.io.read_file, and fairly often I see the following message:
tensorflow/core/platform/cloud/curl_http_request.cc:596] The transmission of request 0x7f8c80610d80 (URI: https://storage.googleapis.com/..../images1024x1024%2F51000%2F51249.png) has been stuck at 0 of 0 bytes for 61 seconds and will be aborted. CURL timing information: lookup time: 0.000304 (No error), connect time: 0 (No error), pre-transfer time: 0 (No error), start-transfer time: 0 (No error)
It does not seem to cause any error, so I am curious what this call does in the event it fails to read.

Scheduled Query Fails To Read Spreadsheet

I have 14 scheduled queries that run hourly from Google Sheets, but they fail half of the time. I don't understand the error status, though, since the queries do run successfully the other half of the time. The error reads:
Error status: Invalid value Error while reading table: tester-253410.test1.Table_View_2_Agent_Targets, error message: Failed to read the spreadsheet. Errors: Deadline=118.888051456s; JobID: tester-253410:5e59a150-0000-2421-b469-001a1144591c
Is there anything that I can try?

FAILED: Error in metadata:

When I try to show tables from Hive databases, the following error is displayed. I have granted permissions on the warehouse and the tables, yet the error still appears:
hive> show tables;
FAILED: Error in metadata: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
Thanks in advance.
This error occurs when the Hive CLI is terminated improperly.
Solution: exit Hive and run the jps command. A process named RunJar will be listed; kill it with kill -9 <pid>.
That's it, you are done.