Issue Counting Rows in Hive - hive

I have a table in Hive. When I run the following I always get 0 returned:
select count(*) from <table_name>;
Even though, if I run something like:
select * from <table_name> limit 10;
I get data returned.
I am on Hive 1.1.0.
I believe the following two issues are related:
https://issues.apache.org/jira/browse/HIVE-11266
https://issues.apache.org/jira/browse/HIVE-7400
Is there anything I can do to workaround this issue?

The root cause is that the table's statistics are stale. Try issuing this command, which should solve the problem:
ANALYZE TABLE <table_name> COMPUTE STATISTICS;
When the table is first imported, there are various reasons the statistics may not get updated by the Hive services. I am still looking for the options and properties to make this happen automatically.
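A minimal sketch of the statistics fix, assuming a hypothetical table my_table (the partition column dt is also hypothetical); hive.compute.query.using.stats is the setting that lets Hive answer count(*) from stored statistics instead of running a job:
ANALYZE TABLE my_table COMPUTE STATISTICS;                -- recompute table-level stats
ANALYZE TABLE my_table PARTITION (dt) COMPUTE STATISTICS; -- per-partition stats (newer Hive lets you omit the partition value to cover all partitions)
SET hive.compute.query.using.stats=false;                 -- or: stop answering count(*) from stats
SELECT COUNT(*) FROM my_table;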

Related

Out of Memory error while creating table but SELECT works

I'm trying to CREATE a table using CREATE TABLE AS, which gives the following error:
[Amazon](500310) Invalid operation: Out Of Memory:
Details:
-----------------------------------------------
error: Out Of Memory:
code: 1004
context: alloc(524288,MtPool)
query: 6041453
location: alloc.cpp:405
process: query2_98_6041453 [pid=8414]
-----------------------------------------------;
I'm getting this error every time I execute the query, but executing just the SELECT part (without the CREATE TABLE AS) works fine. The result has around 38k rows. However, I see a drastic difference in the bytes returned by the sequential scan on one table between the two query plans (screenshots of the SELECT plan and the CREATE TABLE AS SELECT plan omitted).
I fail to understand why there's so much difference between these two scenarios and what can be done to mitigate it. I also tried to create a TEMP TABLE, but that also results in a memory error.
I'm not great at understanding query plans (I never found a detailed guide to them for Redshift, so if you could link to some resource that'd be a bonus).
Update: I also tried creating the table first and then INSERTing the data using SELECT; that gives the same error.
Update 2: Tried set wlm_query_slot_count to 40; or even 50, but still the same error.
We ran into a similar issue after our clusters got updated to the latest release (1.0.10694).
Two things that helped:
Changing your WLM to allocate more memory to your query (in our case, we switched to WLM Auto).
Allocating a higher query_slot_count to your query: set wlm_query_slot_count to 2; to allocate 2 query slots, for example (see the sketch below).
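A minimal sketch of the slot-count workaround (the target and source table names here are hypothetical); the extra slots are returned by resetting the parameter afterwards:
set wlm_query_slot_count to 2;        -- borrow an extra slot's worth of queue memory
create table my_copy as
select id, amount from source_table;  -- the CTAS that was failing with Out Of Memory
reset wlm_query_slot_count;           -- release the extra slots for later queries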
We suspect that AWS may have changed something with memory management with the most recent updates. I'll update once we hear back.
As a workaround, you could try inserting the records in batches.
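As a sketch of the batching idea (column names and id ranges are hypothetical), splitting the INSERT into smaller statements keeps each one under the memory limit:
insert into my_copy select id, amount from source_table where id between 1 and 10000;
insert into my_copy select id, amount from source_table where id between 10001 and 20000;
-- repeat for the remaining ranges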
Solved this by switching to a manual WLM configuration.

BigQuery data using SQL "INSERT INTO" is gone after some time

Today I noticed another strange behaviour of BigQuery.
I ran a standard SQL query with a UDF in the BQ web UI:
CREATE TEMPORARY FUNCTION ...
INSERT INTO projectid.dataset.inserttable...
All seems good: the results of the UDF SQL are inserted into the insert table correctly, which I can tell from "Number of rows". But the table size is not correct; it still shows the size from before the insert query ran. Furthermore, I found all the inserted rows are gone an hour later.
Some more info I found: when I run a "DELETE FROM <insert table> WHERE true" or a "SELECT ...", the number of deleted rows and the table size are consistent with the inserted data. But I just cannot preview the insert table correctly in the web UI.
So I am guessing the "Details" or "Preview" info of the table has a time delay? Do you have any idea about this behaviour?
The preview may have a delay, so SELECT * FROM YourTable; will give the most up-to-date results, or you can use COUNT(*) just to verify that the number of rows is correct. You can think of it as being similar to streaming, if you have tried that, where some rows may be in the streaming buffer for a while before they make it into regular storage.
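For example, using the placeholder names from the question, both of these read the table itself rather than the cached preview:
SELECT COUNT(*) FROM `projectid.dataset.inserttable`;
SELECT * FROM `projectid.dataset.inserttable` LIMIT 10;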

Oracle simple SQL query result in: ORA-08103: object no longer exists

Please help with a query on Oracle. I'm using SQL*Plus (but it's the same with SQL Developer) to run a simple query like:
select count(*) from ARCHIT_D_CC where (TYP_ID=22 OR TYP_ID=23) and SUBTM like '%SEP%' and CONS=1234
This is a very simple query that works perfectly until I execute it on a big table that contains tons of data. After a few minutes I get:
ERROR at line 1: ORA-08103: object no longer exists
This is because the database is partitioned and, due to the large amount of data in the table, the Oracle BT mechanism rotates the table partitions before my query finishes. That's why I get the message.
Now, is there a way to avoid this error? Maybe specifying the partition, or something like that? As already written, on other tables with less data it works perfectly.
Thanks
Lucas
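For reference, Oracle lets a query be restricted to a single named partition, so it only depends on that partition staying in place; the partition name below is hypothetical:
select count(*)
from ARCHIT_D_CC partition (P_SEP)
where (TYP_ID=22 OR TYP_ID=23) and SUBTM like '%SEP%' and CONS=1234;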

How to run a query for a map job only in Apache hive

If I write a query in Apache Hive, it executes a MapReduce job behind the scenes, but how can I run a map-only job in Hive?
Thanks
Certain optimized queries do in fact only require a map phase. You may provide a MAPJOIN hint in Hive to achieve this; it is recommended when the secondary table is small:
SELECT /*+ MAPJOIN(...) */ * FROM ...
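As a sketch with hypothetical table names, a small countries lookup table can be joined map-side against a larger orders table like this:
SELECT /*+ MAPJOIN(c) */ o.id, o.amount, c.name
FROM orders o
JOIN countries c ON o.country_id = c.id;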
This was a question I was asked in an interview. I didn't know the answer at the time, but I figured it out later on.
The following query runs a map-only job. Simply selecting column values runs a map-only job, so no reducer is needed in this scenario.
select id,salary from tableA;
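Likewise, adding a simple filter keeps the job map-only, since nothing here needs aggregation, joining, or ordering (table and columns as above, threshold value hypothetical):
select id,salary from tableA where salary > 50000;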

Select from a SQL table starting with a certain index?

I'm new to SQL (using postgreSQL) and I've written a java program that selects from a large table and performs a few functions. The problem is that when I run the program I get a java OutOfMemoryError because the table is simply too big. I know that I can select from the beginning of the table using the LIMIT operator, but is there a way I can start the selection from a certain index where I left off with the LIMIT command? Thanks!
There is an OFFSET option in Postgres, as in:
select *
from your_table
offset 50
limit 50;
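One caveat: without an ORDER BY, the row order is not guaranteed between queries, so paging with OFFSET can skip or repeat rows. A sketch assuming a hypothetical id column:
select *
from your_table
order by id
offset 50
limit 50;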
For MySQL you can use the following approaches:
SELECT * FROM table LIMIT {offset}, row_count
SELECT * FROM table WHERE id > {max_id_from_the_previous_selection} LIMIT row_count. Initially, max_id_from_the_previous_selection = 0 (see the sketch below).
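A sketch of the second (keyset) approach with hypothetical names; each batch starts where the previous one ended, so the server does not rescan rows already skipped the way a large OFFSET does:
SELECT * FROM your_table WHERE id > 0 ORDER BY id LIMIT 1000;    -- first batch
-- suppose the largest id returned was 1000
SELECT * FROM your_table WHERE id > 1000 ORDER BY id LIMIT 1000; -- next batch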
This is actually something that the JDBC driver can handle for you transparently: you can stream the result set instead of loading it all into memory at once. To do this in MySQL, you need to follow the instructions here: http://javaquirks.blogspot.com/2007/12/mysql-streaming-result-set.html
Basically, when you call connection.prepareStatement, you need to pass ResultSet.TYPE_FORWARD_ONLY and ResultSet.CONCUR_READ_ONLY as the second and third parameters, then call setFetchSize(Integer.MIN_VALUE) on your PreparedStatement object.
There are similar instructions for doing this with other databases which I could iterate if needed.
EDIT: now we know you need instructions for PostgreSQL. Follow the instructions here: How to read all rows from huge table?