Hive: select * is ok but select a column fails

It is very strange that when I input the Hive query:
SELECT * FROM tb LIMIT 1;
It returns a row from the table successfully.
However, when I select a single column from the table, Hive fails:
SELECT col FROM tb LIMIT 1;
Hive gives this error message:
FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask. GC overhead limit exceeded
What is wrong with Hive?

This looks like a Java memory error. The reason a select * works while a select of a single column fails is that select * never launches a MapReduce job: Hive simply streams rows straight from HDFS. Projecting a column forces an actual MapReduce job, and that is where the JVM runs out of memory.
You might be able to solve the problem by increasing the maximum heap size of the client:
export HADOOP_CLIENT_OPTS="-Xmx512m"
This sets the heap size to 512 MB, for example.
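If the failure comes from the launched tasks rather than the client, you may also need to raise the task JVM heap. A minimal sketch using standard Hadoop properties set in the Hive session (the values are illustrative assumptions; tune them to your cluster):
SET mapreduce.map.memory.mb=4096;
SET mapreduce.map.java.opts=-Xmx3276m;
The java.opts heap is usually kept at roughly 80% of the container size so the JVM fits inside the YARN container.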

Related

BigQuery - Resources error when I make a SELECT FIELD but no error when I do SELECT * (on the same table)

Working query: SELECT * FROM my_project.my_dataset.my_table
Not working query: SELECT id FROM my_project.my_dataset.my_table
Error I get: Resources exceeded during query execution: Not enough resources for query planning - too many subqueries or query is too complex.
I understand the error, but I don't understand why it occurs when I change '*' to any single field (id, for example).
Thank you :)

Redshift showing 0 rows for external table, though data is viewable in Athena

I created an external table in Redshift and then added some data to the specified S3 folder. I can view all the data perfectly in Athena, but I can't seem to query it from Redshift. What's weird is that select count(*) works, so that means it can find the data, but it can't actually show anything. I'm guessing it's some mis-configuration somewhere, but I'm not sure what.
Some details that may be relevant (I anonymized a few values):
create external schema spectrum_staging
from data catalog
database 'spectrum_db'
iam_role 'arn:aws:iam::############:role/RedshiftSpectrumRole'
create external database if not exists;
create external table spectrum_staging.errors(
id varchar(100),
error varchar(100))
stored as parquet
location 's3://mybucket/errors/';
My sample data is stored in s3://mybucket/errors/2018-08-27-errors.parquet
This query works:
db=# select count(*) from spectrum_staging.errors;
count
-------
11
(1 row)
This query does not:
db=# select * from spectrum_staging.errors;
id | error
----+-------
(0 rows)
Check your Parquet file and make sure the column data types in the Spectrum table match up.
Then run SELECT pg_last_query_id(); after your query to get the query number, and look in the system tables STL_S3CLIENT and STL_S3CLIENT_ERROR for further details about the query execution.
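For example, a minimal sketch of that lookup, where xyz stands for the id that pg_last_query_id() returns (the query-id filter is an assumption; check the exact columns of these system tables in your cluster):
select pg_last_query_id();
select * from stl_s3client where query = xyz;
select * from stl_s3client_error where query = xyz;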
You don't need to define external tables when you have defined an external schema based on the Glue Data Catalog; Redshift Spectrum picks up all the tables that are in the catalog.
What's probably going on is that you somehow have two objects with the same name: in one case Redshift picks the table up from the data catalog, and in the other it tries to use your separately defined external table.
Check these tables from Redshift side to get a better view of what's there:
select * from SVV_EXTERNAL_SCHEMAS
select * from SVV_EXTERNAL_TABLES
select * from SVV_EXTERNAL_PARTITIONS
select * from SVV_EXTERNAL_COLUMNS
And these tables for queries that use the tables from external schema:
select * from SVL_S3QUERY_SUMMARY
select * from SVL_S3LOG order by eventtime desc
select * from SVL_S3QUERY where query = xyz
select * from SVL_S3PARTITION where query = xyz
Was there ever a resolution for this? A year on, I have the same problem today.
Nothing stands out in terms of schema differences, though an error does exist:
select recordtime, file, process, errcode, linenum as line,
trim(error) as err
from stl_error order by recordtime desc;
/home/ec2-user/padb/src/sys/cg_util.cpp padbmaster 1 601 Compilation of segment failed: /rds/bin/padb.1.0.10480/data/exec/227/48844003/de67afa670209cb9cffcd4f6a61e1c32a5b3dccc/0
Not sure what this means.
I encountered a similar issue when creating an external table in Athena using the RegexSerDe row format. I was able to query this external table from Athena without any issues. However, when querying the external table from Redshift, the results were null.
I resolved it by converting the data to Parquet format, since Spectrum cannot handle the regex serialization.
See link below:
Redshift spectrum shows NULL values for all rows
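If you need to do a similar conversion, one option is an Athena CTAS statement (the database, table, and bucket names here are hypothetical placeholders):
CREATE TABLE my_db.errors_parquet
WITH (format = 'PARQUET', external_location = 's3://mybucket/errors_parquet/')
AS SELECT * FROM my_db.errors_regex;
The new Parquet-backed table can then be queried from Spectrum in place of the RegexSerDe one.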

Hive Parquet table: simple select command ends with an error

I have a Parquet table in Hive (an external table on a Cloudera cluster). When I execute a select * from table_name command, it works fine.
But when I try to see the values of a particular column, I get an out-of-memory error, even though I limit the result to just 10 rows:
select col_name from table_name limit 10;
java.lang.OutOfMemoryError: Java heap space
FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask.
Java heap space
Really weird. I am new to Parquet, so I'd appreciate any help on this. Thanks.
Additional info on the Hive table, retrieved with the desc table command:
# Storage Information
SerDe Library: org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
InputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
Compressed: No
Num Buckets: -1
Bucket Columns: []
Sort Columns: []
Storage Desc Params:
serialization.format 1
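This is the same heap failure (return code -101 from MapRedTask) as the question at the top of this page, so the same remedy may apply: raise the Hive client heap before starting the session (the 2g value is an illustrative assumption):
export HADOOP_CLIENT_OPTS="-Xmx2g"
If the error persists, the task-level memory settings sketched in the first answer are the next thing to try.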

Gathering column stats throwing error

When I gather stats on the columns of a table, it throws an error:
hive> ANALYZE TABLE customer COMPUTE STATISTICS FOR COLUMNS;
FAILED: NullPointerException null
Also, if I query select * from customer; it works fine.
But if I use any where condition, it throws an error:
select * from customer where cust_id=45633;
FAILED: NullPointerException null
Please help me.
Thanks in advance

select not working after putting a where condition on nicknames?

I have a remote mainframe DB2 database for which I have created nicknames on my DB2 server.
The problem is as follows:
When I run the query
SELECT * FROM LNICKNAME.TABLE
it runs and I get all columns. But if I run the query below, it never gives any output and just keeps running:
SELECT * FROM LNICKNAME.TABLE a where a.columnB = 'ADH00040';
So effectively it does not work if I add any where condition.
It doesn't seem like there is an error with your SELECT statement, so I am assuming one of two things is happening:
Scenario 1:
The table is really big and there isn't an index on columnB. If that is the case, the query takes a long time because the DB has to read through every record and check whether columnB = 'ADH00040'. To see how many records are in the table, just run a count on it:
SELECT COUNT(*) FROM LNICKNAME.BMS_TABLE
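If the full scan turns out to be the bottleneck and you have DDL rights on the remote table, an index on the filtered column would help. A hypothetical sketch to run on the remote database (the index and schema names are placeholders):
CREATE INDEX IDX_COLUMNB ON SCHEMA.TABLE (COLUMNB);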
Scenario 2:
Something or someone is disconnecting your connection before your query completes. I know you can limit the amount of CPU time an iSeries job is allowed before it gets ended forcibly (CHGJOB CPUTIME(60000)). Is there no form of a job log that you could share with us?
Are you sure the value is actually in your table?
Try a LIKE, in case the stored value has extra characters around it:
SELECT * FROM LNICKNAME.TABLE a where a.columnB like '%ADH00040%';