Hive query CREATE TABLE WITH ORDER BY COLUMN stuck at 83% - hive

CREATE TABLE ordered_table
AS
SELECT name, epoc_time
FROM reference_table
order by name, cast(epoc_time as bigint)
My reference_table has 1000000000 records.
This query works on a smaller table but fails on this one at 83%, with alerts on YARN and Spark as shown below:
Yarn alert
Spark Thrift alert

Related

Hive to BigQuery Converted INSERT OVERWRITE TABLE with PARTITION on Integer

I am trying to convert the following Hive query to BigQuery with little luck. The idea is to remove the records from the specified partition and insert new records into the partition without touching other partitions. I have seen Google's documentation on using a DML statement to add rows to an ingestion-time partitioned table, but this isn't what I'm trying to accomplish.
INSERT OVERWRITE TABLE mytable PARTITION (integer_id = 100) select tmp.*, NULL as value from (select * from mytable2) as tmp;
Any help would be greatly appreciated!

`CREATE TABLE AS SELECT FROM` in Oracle Cloud doesn't create a new table

I was trying to create a series of tables in a single SQL query in Oracle Cloud under the ADMIN account. In the minimal script below, RAW_TABLE refers to an existing table.
CREATE TABLE BASE1 AS SELECT * FROM RAW_TABLE;
CREATE TABLE BASE2 AS SELECT * FROM BASE1;
CREATE TABLE BASE3 AS SELECT * FROM BASE2;
SELECT * FROM BASE3
This returns a view of the first 100 rows in BASE3, but it doesn't create the three tables along the way. Did I miss something or is there something peculiar about create table statements in Oracle SQL?
EDIT: The environment is Oracle Database Actions in Oracle Cloud. The three tables would not be available in the list of tables in the database, and doing something like select * from BASE3 in a subsequent query would fail.
CREATE TABLE BASE1 AS SELECT * FROM RAW_TABLE;
CREATE TABLE BASE2 AS SELECT * FROM BASE1;
CREATE TABLE BASE3 AS SELECT * FROM BASE2;
SELECT * FROM BASE3
The above is a valid statement sequence for an Oracle database, and it should have created three new tables. Since that is not happening, work through it in a few steps to find out what is wrong.
First, check whether RAW_TABLE exists in the database. Then try to select data from it:
select * from RAW_TABLE;
If both of those succeed, try to create a single table with the query below:
CREATE TABLE BASE1 AS SELECT * FROM RAW_TABLE;
Hopefully you will have found the problem by then.
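One way to confirm which tables actually exist is to query the data dictionary. This is a minimal sketch, assuming the tables were created in the current user's schema (USER_TABLES lists only that schema's tables) and with unquoted names, which Oracle stores in upper case:

```sql
-- List which of the expected tables exist in the current schema.
SELECT table_name
FROM   user_tables
WHERE  table_name IN ('RAW_TABLE', 'BASE1', 'BASE2', 'BASE3')
ORDER  BY table_name;
```

If RAW_TABLE is missing from the result, it may live in another schema; ALL_TABLES has an OWNER column for that case.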
DB-Fiddle:
Creating RAW_TABLE and populating data
create table RAW_TABLE (id int, name varchar(50));
insert into RAW_TABLE values (1,'A');
Query to create three more tables and select from the last table:
CREATE TABLE BASE1 AS SELECT * FROM RAW_TABLE;
CREATE TABLE BASE2 AS SELECT * FROM BASE1;
CREATE TABLE BASE3 AS SELECT * FROM BASE2;
SELECT * FROM BASE3
Output:
ID  NAME
1   A
db<>fiddle here
Your query fails because you are executing the whole script as one batch, and each statement depends on the previous one. Transactional DBMSs treat a block of code as one transaction, and that block is not committed until the SQL engine can parse and validate the whole of it. Since the BASE1 and BASE2 tables do not exist yet when your block is parsed, it fails.
So you need to run each statement as a separate batch, either by executing them one by one or, in Oracle, by using / as a batch separator, much as you can use GO in SQL Server. These are not SQL commands and are not sent to the database server; they only split the code into batches on your client (such as SQL*Plus, a shell, or SSMS for Microsoft SQL Server). It would look like this:
CREATE TABLE BASE1 AS SELECT * FROM RAW_TABLE;
/
CREATE TABLE BASE2 AS SELECT * FROM BASE1;
/
CREATE TABLE BASE3 AS SELECT * FROM BASE2;
/
SELECT * FROM BASE3
If your client doesn't support that, you will have to run them one by one in separate batches.
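Another option, which is not part of the answer above, is sometimes used when a tool insists on submitting everything as one unit: wrap the DDL in an anonymous PL/SQL block and issue each statement with EXECUTE IMMEDIATE. This is only a sketch, and it assumes that none of the BASE tables exist yet (otherwise the block stops with ORA-00955):

```sql
-- Each CREATE TABLE is parsed and run only when its EXECUTE IMMEDIATE
-- line is reached, so BASE1 already exists when BASE2 is created.
BEGIN
  EXECUTE IMMEDIATE 'CREATE TABLE BASE1 AS SELECT * FROM RAW_TABLE';
  EXECUTE IMMEDIATE 'CREATE TABLE BASE2 AS SELECT * FROM BASE1';
  EXECUTE IMMEDIATE 'CREATE TABLE BASE3 AS SELECT * FROM BASE2';
END;
/
```

The statement strings carry no trailing semicolon; the final / is what tells SQL*Plus-style clients to run the block.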

Select count(*) from Table, Select * from Table doesn't yield any output

I am trying to build a managed table (ORC-formatted, bucketed, with the transactional table property set to true) on which I can run UPDATE/INSERT statements in Hive.
I am running this whole setup on AWS EMR; the Hive version is 2.4.3 and the default directory for storing the data is on S3.
I am able to populate the table from another external table.
However, select count(*) returns zero and select * returns no output.
I dropped the table, recreated it, and repopulated the data.
ANALYZE TABLE TABLE-NAME COMPUTE STATISTICS gives proper output.
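For reference, a table of the kind described above might be declared roughly as follows. The table name, column names, and bucket count are hypothetical, and the two session settings are the ones Hive's transaction documentation lists for ACID tables; they may already be set on a given cluster.

```sql
-- Session settings required for ACID operations (may already be configured).
SET hive.support.concurrency=true;
SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;

-- Hypothetical managed ACID table: ORC, bucketed, transactional.
CREATE TABLE acid_demo (
  id   INT,
  name STRING
)
CLUSTERED BY (id) INTO 4 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional'='true');
```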

Get actual target table insert count

I'm inserting data into hive external table in append mode. Every time I insert some records in a table, I want to get the count of actual records which are inserted into the hive external table. Is there any way I could find this information in any hive log file?
There may be a workaround for this; I am not aware of any Hive property that does it.
Add a timestamp column to your table.
Self-join the table on the timestamp column.
Count the latest records inserted into the table. You can check the sample query below:
SELECT count(1) from (
SELECT tbl_alias.* FROM test_table tbl_alias JOIN
( select max(timestamp_date) as max_timestamp_date FROM test_table) max_timestamp_date_table ON
tbl_alias.timestamp_date=max_timestamp_date_table.max_timestamp_date ) outer_table;
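The workaround depends on every row of one load carrying the same timestamp. Below is a minimal sketch of such a load, assuming hypothetical names (a source called staging_table, and columns id and name in front of timestamp_date) and relying on Hive's current_timestamp(), which is evaluated once per query in Hive 1.2.0 and later:

```sql
-- Assumes test_table has columns (id, name, timestamp_date) in that order.
-- current_timestamp() returns the same value for every row of this query,
-- so the whole load shares one timestamp.
INSERT INTO TABLE test_table
SELECT s.id, s.name, current_timestamp() AS timestamp_date
FROM staging_table s;

-- Count the rows written by the most recent load.
SELECT count(*) AS latest_load_rows
FROM test_table t
JOIN (SELECT max(timestamp_date) AS max_ts FROM test_table) m
  ON t.timestamp_date = m.max_ts;
```

The count is only meaningful if it runs before the next load starts, since a later insert moves the maximum timestamp.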

hadoop hive insert query to insert all rows of one table to another table

I want to insert all rows of one Hive table into another Hive table:
insert into table <table_name> as select * from <table_bkp>
I have many rows in the table, but it inserts only one row.
Please suggest a solution.
I am using Hive 1.2.1.
Remove 'as' from your query and write it as follows:
insert into table <table_name> select * from <table_bkp>