Error starting cluster: Catalog was not initialized in expected time period - impala

I encountered the following error while starting an Impala cluster.
Command:
$ ./start-impala-cluster.py --verbose
Output:
...
Waiting for Catalog... Status: 1 DBs / 0 tables (ready=False)
Waiting for Catalog... Status: 1 DBs / 0 tables (ready=False)
Error starting cluster: Catalog was not initialized in expected time period.
When I opened start-impala-cluster.py, I saw that the metric value for 'catalog.num-tables' was always zero. How can I dig deeper into this and fix the issue?
I referred to the "Building Impala" document: https://cwiki.apache.org/confluence/display/IMPALA/Building+Impala
I am using CentOS 7 now.
Thanks,
Jinchul

I found a solution by myself :)
The catalog information should be aligned with the Hive Metastore.
This means Impala may not be able to connect to the Hive metastore. I found a clue in the log files under ${IMPALA_HOME}/logs/cluster.
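As a quick way to spot the failure, here is a minimal sketch that scans those logs (assuming plain-text log files under that directory; the keywords are just common symptoms of a metastore connection problem, not an exhaustive list):

import os

# Scan the mini-cluster logs for lines suggesting the catalogd
# could not reach the Hive Metastore.
log_dir = os.path.join(os.environ["IMPALA_HOME"], "logs", "cluster")
keywords = ("metastore", "Connection refused", "TTransportException")
for root, _, files in os.walk(log_dir):
    for name in files:
        path = os.path.join(root, name)
        with open(path, errors="ignore") as f:
            for line in f:
                if any(k in line for k in keywords):
                    print(path + ": " + line.rstrip())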
As for configuration files:
Check /etc/impala/conf if you installed Impala via CDH.
Check ${IMPALA_HOME}/fe/src/test/resources if you built and installed Impala from source.
For your information, the Cloudera Impala user guide definitely gave me good advice for understanding how it works. Please refer to the link below or search for the keywords {cloudera + impala + pdf}:
https://www.cloudera.com/documentation/enterprise/5-5-x/PDF/cloudera-impala.pdf
Thanks,
Jinchul

Related

Can't access external Hive metastore with Pyspark

I am trying to run some simple code that just shows the databases I created previously on my hive2 server. (Note: I tried this in both Python and Scala, with the same results in both.)
If I log into a hive shell and list my databases, I see a total of 3 databases.
When I start the Spark shell (2.3) with pyspark, I do the usual and add the following property to my SparkSession:
sqlContext.setConf("hive.metastore.uris","thrift://*****:9083")
And re-start a SparkContext within my session.
If I run the following lines to see all the configs:
pyspark.conf.SparkConf().getAll()
spark.sparkContext._conf.getAll()
I can indeed see that the parameter has been added. I then start a new HiveContext:
hiveContext = pyspark.sql.HiveContext(sc)
But if I list my databases:
hiveContext.sql("SHOW DATABASES").show()
It does not show the same results as the hive shell.
I'm a bit lost; for some reason it looks like Spark is ignoring the config parameter. I am sure the one I'm using is my metastore, because the address I get from running:
hive -e "SET" | grep metastore.uris
is the same address I get if I run:
ses2 = spark.builder.master("local").appName("Hive_Test").config('hive.metastore.uris','thrift://******:9083').getOrCreate()
ses2.sql("SET").show()
Could it be a permission issue? Maybe some tables are not set to be visible outside the hive shell/user.
Thanks
Managed to solve the issue: because of a communication issue, Hive was not hosted on that machine. I corrected the address and everything works fine.
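For reference, a minimal PySpark sketch of attaching to an external metastore (the thrift host below is a placeholder); the key point is that hive.metastore.uris must be set before the session and its SparkContext are created, since changing it afterwards is ignored:

from pyspark.sql import SparkSession

# Configure the metastore URI at build time, before the
# SparkContext exists; setting it on a running context is ignored.
spark = (SparkSession.builder
         .master("local")
         .appName("Hive_Test")
         .config("hive.metastore.uris", "thrift://metastore-host:9083")  # placeholder host
         .enableHiveSupport()
         .getOrCreate())

spark.sql("SHOW DATABASES").show()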

Gcloud SQL Postgres import error: CREATE TABLE ERROR: syntax error at or near "AS" LINE 2: AS integer ^ Import error: exit status 3

Problem:
Getting the error below while importing a schema from AWS Postgres to Gcloud Postgres.
Error:
Import failed:
SET
SET
SET
SET
SET set_config
------------
(1 row)
SET
SET
SET
CREATE SCHEMA
SET
SET
CREATE TABLE
ERROR: syntax error at or near "AS" LINE 2: AS integer ^
Import error: exit status 3
I used --no-acl --no-owner --format=plain while exporting the data from AWS Postgres:
pg_dump -Fc -n <schema_name> -h hostname -U user -d database --no-acl --no-owner --format=plain -f data.dump
I am able to import certain schemas into Gcloud SQL that were exported using the same method, but I get this error for some other, similar schemas. The tables have geospatial info, and postgis is already installed in the destination database.
Looking for some quick help here.
My solution:
Basically, I had a dump file from Postgres 10.0 with tables using a sequence for the PK. Apparently, the way sequences were dumped along with the other table data could not be read properly by Gcloud Postgres 9.6. That is where the "AS integer" error came from. Also, I finally found the expression below in the dump file, which I couldn't find earlier, so I needed to filter out this bit.
CREATE SEQUENCE sample.geofences_id_seq
AS integer <=====had to filter out this bit to get it working
START WITH 1
INCREMENT BY 1
NO MINVALUE
NO MAXVALUE
CACHE 1;
Not sure if anyone else has faced this issue, but I had, and this solution worked for me without losing any functionality.
Happy to see other, better solutions here.
The original answer is correct, and similar answers are given for the general case. Options include:
Upgrading the target database to 10: whether this is possible depends on what you are running in GCP. For a managed service like Cloud SQL, upgrading is not an option (though support for 10 is in the works, so waiting may be an option in some cases). It is an option if you are running the database inside a Compute Engine instance, or as a container in, e.g., App Engine (a ready-made instance is available from the Marketplace).
Downgrading the source before exporting. This is only possible if you control the source installation.
Removing all instances of this one line from the file before uploading it. Adapting other answers to modify an already-created dump file, the following worked for me:
cat dump10.sql | sed -e '/AS integer/d' > dump96.sql
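Equivalently, here is a small Python sketch of the same filter (the file names are the same placeholders as in the sed version above):

# Drop the Postgres-10-only "AS integer" clause lines so that
# Postgres 9.6 can parse the CREATE SEQUENCE statements.
with open("dump10.sql") as src, open("dump96.sql", "w") as dst:
    for line in src:
        if "AS integer" not in line:
            dst.write(line)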

Can't create ORC external tables on HAWQ PXF

I'm using Pivotal HAWQ with Ambari, and now I'm trying to run some queries over ORC Hive tables with HAWQ.
Previously I was able to run these external queries in psql using SELECT * FROM hcatalog.hive-db-name.hive-table-name distributed randomly;
But now I get the following error every time:
Exception report message java.lang.Exception: java.lang.NoClassDefFoundError: org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.
Can you provide some help on how to get past this?
I believe you have missed a step: updating your pxf-profiles.xml file, which is required after upgrading to HDB 2.2. Please see the instructions listed here:
http://hdb.docs.pivotal.io/220/hdb/install/install-ambari.html#post-install-212-req

Merge replication Error: The process could not bulk copy into table

Hi, I am using SQL Server 2005 Service Pack 4 on both the publisher and the distributor. While trying to set up merge replication, I am getting the error below continuously. Here are the replication details:
I am using a push subscription, and the path is a network path.
The distributor and publisher are on the same server.
I restored a recent backup on the subscriber and a week-old backup on the publisher.
I am setting up replication for only a few tables, procedures, and user-defined functions.
I have verified that both the publisher and subscriber have the same schema.
The replication initially failed saying it was unable to drop user-defined functions; to resolve this, I set the publisher property for user-defined functions to Keep existing object unchanged.
Every time, the error comes after the synchronization has been running for around 50 to 55 minutes.
My snapshot agent works fine without any issue. The problem is only with the merge agent.
I changed the verbosehistory value to 3 in the merge agent profile, but it does not give any additional information.
Error messages: The merge process was unable to deliver the snapshot
to the Subscriber. If using Web synchronization, the merge process may
have been unable to create or write to the message file. When
troubleshooting, restart the synchronization with verbose history
logging and specify an output file to which to write. (Source:
MSSQL_REPL, Error number: MSSQL_REPL-2147201001)
Get help: http://help/MSSQL_REPL-2147201001
The process could not bulk copy into table
'"dbo"."refund_import_log"'. (Source: MSSQL_REPL, Error number:
MSSQL_REPL20037)
Get help: http://help/MSSQL_REPL20037
The system cannot find the file specified. (Source: MSSQLServer, Error
number: 0)
Get help: http://help/0
To obtain an error file with details on the errors encountered when
initializing the subscribing table, execute the bcp command that
appears below. Consult the BOL for more information on the bcp
utility and its supported options. (Source: MSSQLServer, Error number:
20253)
Get help: http://help/20253
bcp "greyhound"."dbo"."refund_import_log" in
"\usaz-ism-db-02\ghstgrpltest\unc\USAZ-ISM-DB-02_GREYHOUND_GREYHOUND-STAGE\20150529112681\refund_import_log_7.bcp"
-e "errorfile" -t"\n\n" -r"\n<,#g>\n" -m10000 -SUSGA-QTS-GT-01 -T -w (Source: MSSQLServer, Error number: 20253)
I get the problem with a different table every time.
Is there a bug related to this? If so, where can I get the fix? If it is not a bug, please let me know how to resolve this problem.
The error message tells you the problem:
The process could not bulk copy into table '"dbo"."refund_import_log"'. (Source: MSSQL_REPL, Error number: MSSQL_REPL20037)
It then gives you a perfectly good repro, to see why bulk copy is failing:
bcp "greyhound"."dbo"."refund_import_log" in "\usaz-ism-db-02\ghstgrpltest\unc\USAZ-ISM-DB-02_GREYHOUND_GREYHOUND-STAGE\20150529112681\refund_import_log_7.bcp" -e "errorfile" -t"\n\n" -r"\n<,#g>\n" -m10000 -SUSGA-QTS-GT-01 -T -w
Looking at the bcp repro above, can you please double-check the UNC path that you set for the snapshot folder? It looks incorrect to me. UNC paths should begin with two backslashes, and yours only has one. The UNC path should look like this:
\\usaz-ism-db-02\ghstgrpltest\unc\
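Purely as an illustration of the convention (the helper below is hypothetical, not part of any replication tooling), normalizing a share path to proper UNC form looks like this:

def to_unc(path):
    # A UNC path must begin with exactly two backslashes: \\server\share\...
    # In Python source, "\\\\" spells two literal backslashes.
    while path.startswith("\\"):
        path = path[1:]
    return "\\\\" + path

print(to_unc("\\usaz-ism-db-02\\ghstgrpltest\\unc\\"))  # -> \\usaz-ism-db-02\ghstgrpltest\unc\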

FAILED: Hive Internal Error: java.util.NoSuchElementException(null) while running a CREATE TABLE query from shark command line

I am trying to create a table in hive metastore using shark by executing the following command:
CREATE TABLE src(key int, value string);
but I always get:
FAILED: Hive Internal Error: java.util.NoSuchElementException(null)
I read about the same thing in the shark-users Google group, but to no avail.
My spark version is 0.8.1
My shark version is 0.8.1
Hive binary version is 0.9.0
I have hive-0.10.0 from CDH 4.5.0 preinstalled, but I can't use it since Shark 0.8.1 is not compatible with hive-0.10.0 yet.
I can run various queries like select * from table_name; but not a CREATE TABLE query.
Even trying to create a cached table fails.
If I try an sbt build using HADOOP_VERSION=2.0.0cdh4.5.0, I get a DistributedFileSystem error and am not able to run any query.
I am in dire need of a solution. I'll be glad if somebody can point me in the right direction. I have a MySQL database (for the metastore), not Derby.
I encountered a similar problem, and it seems that this occurs only in 0.8.1 of Shark. I solved it by reverting to Spark and Shark 0.8.0, and it works fine.
0.8.0 and 0.8.1 are very similar in functionality and unless you are using Spark for the added functionality between the two releases, you would be better off staying with 0.8.0.
By the way, it's SPARK_HADOOP_VERSION and SHARK_HADOOP_VERSION if you intend to build those two from the source code. It's not just HADOOP_VERSION.
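For example, here is a hedged sketch of driving such a build from Python; the version string and the sbt target are illustrative assumptions, so check the build docs for your release before relying on them:

import os
import subprocess

# Spark/Shark 0.8.x read SPARK_HADOOP_VERSION / SHARK_HADOOP_VERSION at
# build time; plain HADOOP_VERSION alone is not consulted.
env = dict(os.environ)
env["SPARK_HADOOP_VERSION"] = "2.0.0-cdh4.5.0"  # assumed CDH 4.5.0 version string
env["SHARK_HADOOP_VERSION"] = "2.0.0-cdh4.5.0"
subprocess.run(["sbt/sbt", "package"], env=env, check=True)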