The default value of Hive's dynamic partitions per node is 100, and in my case the number of dynamic partitions is greater than 100, which is why my job failed. Can anybody suggest the best way to overcome this problem?
Thanks in advance.
You have to set these properties just before running the dynamic partition insert:
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.max.dynamic.partitions.pernode=1000;
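For example, with those properties set in the same session, a dynamic partition insert might look like this (the table and column names are illustrative, not from the question):
-- Hive routes each row to the partition named by the last column of the SELECT
INSERT OVERWRITE TABLE sales PARTITION (sale_date)
SELECT id, amount, sale_date FROM staging_sales;
If the job still fails with a partition limit error, the cluster-wide cap hive.exec.max.dynamic.partitions (default 1000) may also need to be raised.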
I hope this will help you!!!
I have a dataset that has 1 billion rows. The data is stored in Hive, and I put Impala as a layer between Hive and Superset. The queries that run in Superset have a maximum row limit of 100,000, and I need to remove that limit. Furthermore, I need to build a visualization from what the queries return in SQL Lab, but that cannot be done because there is also a cache timeout limit. Therefore, if I increase the row limit in SQL Lab and the cache timeout for visualizations, then, I guess, there will be no problem.
I am trying my best to answer below. Please back up all config files before changing them.
For the SQL row limit issue -
modify the config.py file inside 'anaconda3/lib/python3.7/site-packages' and set
DEFAULT_SQLLAB_LIMIT to 1000000000
QUERY_SEARCH_LIMIT to 1000000000
modify viz.py and set -
filter_row_limit to 1000000000
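In config.py these are plain assignments; roughly, using the values from this answer:
DEFAULT_SQLLAB_LIMIT = 1000000000
QUERY_SEARCH_LIMIT = 1000000000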
For the timeout issue, increase the below parameter values -
For synchronous queries - change superset_config.py
SUPERSET_WEBSERVER_TIMEOUT
SQLLAB_TIMEOUT
SUPERSET_TIMEOUT -- this value should be >= SQLLAB_TIMEOUT
For async queries -
SQLLAB_ASYNC_TIME_LIMIT_SEC
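As a sketch, the corresponding superset_config.py entries might look like this (the timeout values, in seconds, are illustrative, not from the answer):
SUPERSET_WEBSERVER_TIMEOUT = 300
SQLLAB_TIMEOUT = 300
SUPERSET_TIMEOUT = 300  # keep this >= SQLLAB_TIMEOUT
SQLLAB_ASYNC_TIME_LIMIT_SEC = 600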
There must be a config parameter to change the max row limit in site-packages/superset: DEFAULT_SQLLAB_LIMIT to set the default, and SQL_MAX_ROW to set the max in SQL Lab.
I guess we have to run superset_init again to make the change appear in Superset.
I've been able to solve the problem as follows:
modify config.py in site-packages/superset:
increase SQL_MAX_ROW from 100,000.
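In config.py that is a single assignment, e.g. (the new value is illustrative):
SQL_MAX_ROW = 1000000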
I have a table in Hive. When I run the following I always get 0 returned:
select count(*) from <table_name>;
Even though, if I run something like:
select * from <table_name> limit 10;
I get data returned.
I am on Hive 1.1.0.
I believe the following two issues are related:
https://issues.apache.org/jira/browse/HIVE-11266
https://issues.apache.org/jira/browse/HIVE-7400
Is there anything I can do to workaround this issue?
The root cause is the old and outdated statistics of the table. Try issuing this command, which should solve the problem:
ANALYZE TABLE <table_name> COMPUTE STATISTICS;
When you first import the table, there may be various reasons why the statistics are not updated by the Hive services. I am still looking for the options and properties to make this right.
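If recomputing statistics is not an option, one workaround, assuming the zero count comes from Hive answering the query out of stale statistics rather than scanning the data, is to disable stats-based answers for the session:
set hive.compute.query.using.stats=false;
select count(*) from <table_name>;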
I would like to know if I can update parameter values (mainly a timestamp) automatically using a SQL query. Please don't suggest using Environments, as the value is constantly changing and I need it to be updated automatically when I run the package.
You may follow these steps to configure your parameter:
create a variable, e.g. User::starttime
add an 'Execute SQL Task': specify the connection info and the query, e.g. SELECT STARTTIME FROM TABLE (see the sketch after these steps). Set the ResultSet property to 'Single row' and, in the Result Set tab, map the result name to the variable name.
after the 'Execute SQL Task' has executed, the variable User::starttime will be set to the SQL result.
use the variable User::starttime as your parameter.
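A minimal sketch of the kind of query the Execute SQL Task expects, against a hypothetical run-log table (the table and column names are assumptions, not from the question):
SELECT MAX(starttime) AS starttime
FROM dbo.run_log; -- must return exactly one row to match the 'Single row' ResultSet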
Please go through the link below; it shows how to use any query output as a variable.
http://dataqueen.unlimitedviz.com/2012/08/how-to-set-and-use-variables-in-ssis-execute-sql-task/
I have a "very long" create external table" statement that i try to run in Hive (200+ columns) but I end up with this error message.
Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:For direct MetaStore DB connections, we don't support retries at the client level.)
It's supposed to create an external table over an already populated HBase table. If I reduce the number of columns in my Hive statement, it works.
So could it be the max number of columns? A connection timeout? The length of the statement?
Please share your thoughts.
Regards,
Breach
Not sure if the number of columns is the real problem given the limited information provided, but this post should help you check whether it is:
Creating a hive table with ~40K columns
Change the type of the column PARAM_VALUE in the SERDE_PARAMS table in the metastore database.
Try this command if you are using a MySQL server to store the metastore DB:
ALTER TABLE SERDE_PARAMS MODIFY PARAM_VALUE TEXT NOT NULL;
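To verify the change afterwards in MySQL:
SHOW COLUMNS FROM SERDE_PARAMS LIKE 'PARAM_VALUE';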
Hope it works for you.
select orderid from orders where REGEXP_REPLACE(orderid,'/^0+(.)/')
I have searched the documentation and can't find the answer. If I run this query, will it change any real data, or just the set returned for output (the "virtual" data)? The word "replace" scares me. I am using Oracle 11g.
Thank you.
Because you are performing a SELECT, you get a read-only view of the data; nothing is changed.
So you don't need to worry about running this SELECT statement. The only way to change the stored data would be to follow it up with an UPDATE statement.
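For contrast, a hedged sketch of the two cases, with the pattern rewritten in Oracle syntax (no slash delimiters, and with an explicit replacement argument), so treat the regex as an assumption about what the original pattern intended:
-- read-only: leading zeros are stripped only in the result set
SELECT REGEXP_REPLACE(orderid, '^0+(.)', '\1') FROM orders;
-- only a statement like this would actually modify the stored data
UPDATE orders SET orderid = REGEXP_REPLACE(orderid, '^0+(.)', '\1');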
No, it doesn't. (even though this answer is too short for SO).