Share UDFs across multiple HiveServer2 instances - hive

In our production environment, we have multiple HiveServer2 instances for high availability. Users create permanent UDFs by running:
beeline -u "jdbc:hive2//hs1.name.com"
add jar <hdfs://ns:8020/path/udf.jar>
create function myfunc as 'com.test.udf.UDF_CLASS' using jar 'hdfs://ns:8020/path/udf.jar'
Connecting to hs1.name.com works fine, but users get a "function not found" error when they use beeline to connect to another HiveServer2 and call the UDF, like:
beeline -u "jdbc:hive2//hs2.name.com"
select myfunc(id) from table1
The error message is:
Error: Error while compiling statement: FAILED: SemanticException [Error 10011]: Line 1:7 Invalid function 'myfunc' (state=42000,code=10011)
After restarting HiveServer2 on hs2.name.com, users can call the UDF correctly.
Is there some way to tell HiveServer2 to reload UDF information from the metastore without restarting it?
Thanks @Kishore, RELOAD FUNCTION is great!

Use RELOAD FUNCTION;
As of HIVE-2573, creating permanent functions in one Hive CLI session may not be reflected in HiveServer2 or other Hive CLI sessions, if they were started before the function was created. Issuing RELOAD FUNCTION within a HiveServer2 or HiveCLI session will allow it to pick up any changes to the permanent functions that may have been done by a different HiveCLI session.
ref - https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Create/Drop/ReloadFunction
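For example, issuing the reload in a new beeline session on the second HiveServer2 should make the function visible without a restart (a sketch reusing the hostnames and function name from the question above):
beeline -u "jdbc:hive2://hs2.name.com"
RELOAD FUNCTION;
select myfunc(id) from table1;
After that, other sessions on the same HiveServer2 instance should also see the function, since the reload refreshes that server's function registry from the metastore.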

Related

How to drop a database from hive metastore when underlying HDFS cluster is no longer there

I'm working with ephemeral GCP Dataproc clusters (Apache Spark 2.2.1, Apache Hadoop 2.8.4 and Apache Hive 2.1.1). These clusters all point to the same Hive Metastore (hosted on a Google Cloud SQL instance).
I created a database on one such cluster and set its location to 'hdfs:///db_name', like so:
$ gcloud dataproc jobs submit hive \
-e "create database db_name LOCATION 'hdfs:///db_name'" \
--cluster=my-first-ephemeral-cluster --region=europe-west1
my-first-ephemeral-cluster then got deleted and with it the associated HDFS.
On all subsequent clusters the following error has since been popping up:
u'java.net.UnknownHostException: my-first-ephemeral-cluster-m'
This is probably because the Hive Metastore now has an entry for a location that does not exist.
Trying to drop the corrupted database is a no-go as well:
$ gcloud dataproc jobs submit hive \
-e 'drop database db_name' \
--cluster=my-second-ephemeral-cluster --region=europe-west1
Job [4462cb1d-88f2-4e2b-8a86-c342c0ce46ee] submitted.
Waiting for job output...
Connecting to jdbc:hive2://my-second-ephemeral-cluster-m:10000
Connected to: Apache Hive (version 2.1.1)
Driver: Hive JDBC (version 2.1.1)
18/11/03 13:40:04 [main]: WARN jdbc.HiveConnection: Request to set autoCommit to false; Hive does not support autoCommit=false.
Transaction isolation: TRANSACTION_REPEATABLE_READ
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:java.lang.IllegalArgumentException: java.net.UnknownHostException: my-first-ephemeral-cluster-m) (state=08S01,code=1)
Closing: 0: jdbc:hive2://my-second-ephemeral-cluster-m:10000
The reason is that the host my-first-ephemeral-cluster-m is no longer valid. Since changing the database's location is not an option in the version of Hive I'm using, I need a different workaround to drop this database.
https://cwiki.apache.org/confluence/display/Hive/Hive+MetaTool
The Hive MetaTool enables administrators to do bulk updates on the
location fields in database, table, and partition records in the
metastore (...)
Example (...)
./hive --service metatool -updateLocation hdfs://localhost:9000 hdfs://namenode2:8020
But first, you need to know exactly how the pseudo-HDFS paths have been saved in the Metastore, in their "canonical" form, e.g. hdfs://my-first-ephemeral-cluster-m/db_name (if Google follows Hadoop standards somewhat).
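A hedged sketch of how that inspection and rewrite could look with the hostnames from this question (the -listFSRoot and -dryRun options are documented on the same MetaTool page; the exact old and new locations are assumptions you must verify first):
./hive --service metatool -listFSRoot
./hive --service metatool -updateLocation hdfs://my-second-ephemeral-cluster-m hdfs://my-first-ephemeral-cluster-m -dryRun
./hive --service metatool -updateLocation hdfs://my-second-ephemeral-cluster-m hdfs://my-first-ephemeral-cluster-m
Once the locations have been rewritten to a host that resolves, the drop database statement from above should no longer fail with UnknownHostException.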
From my point of view, the correct way to delete the Hive metastore entry that causes the error is to drop the database just before you delete the cluster my-first-ephemeral-cluster, for example with a script that runs this sequence:
gcloud dataproc jobs submit hive -e 'drop database db_name' --cluster=my-first-ephemeral-cluster --region=europe-west1
gcloud dataproc clusters delete my-first-ephemeral-cluster
However, I found instructions for using the Cloud SQL proxy to set up a shared Hive warehouse between different Dataproc clusters using Cloud Storage (instead of LOCATION 'hdfs:///db_name', which creates the Hive warehouse in the local HDFS), which could give you the behavior you are looking for.
I created a Dataproc cluster with the same name in order to drop the schema that was created with a location in HDFS.
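A sketch of that workaround, reusing the cluster name, database name, and region from the question (this assumes the recreated cluster is configured to use the same shared Hive Metastore as the original one):
gcloud dataproc clusters create my-first-ephemeral-cluster --region=europe-west1
gcloud dataproc jobs submit hive -e 'drop database db_name' --cluster=my-first-ephemeral-cluster --region=europe-west1
gcloud dataproc clusters delete my-first-ephemeral-cluster --region=europe-west1
Because the host my-first-ephemeral-cluster-m resolves again, Hive can process the drop instead of failing with UnknownHostException.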

Google cloud dataproc failing to create new cluster with initialization scripts

I am using the command below to create a Dataproc cluster:
gcloud dataproc clusters create informetis-dev \
--initialization-actions "gs://dataproc-initialization-actions/jupyter/jupyter.sh,gs://dataproc-initialization-actions/cloud-sql-proxy/cloud-sql-proxy.sh,gs://dataproc-initialization-actions/hue/hue.sh,gs://dataproc-initialization-actions/ipython-notebook/ipython.sh,gs://dataproc-initialization-actions/tez/tez.sh,gs://dataproc-initialization-actions/oozie/oozie.sh,gs://dataproc-initialization-actions/zeppelin/zeppelin.sh,gs://dataproc-initialization-actions/user-environment/user-environment.sh,gs://dataproc-initialization-actions/list-consistency-cache/shared-list-consistency-cache.sh,gs://dataproc-initialization-actions/kafka/kafka.sh,gs://dataproc-initialization-actions/ganglia/ganglia.sh,gs://dataproc-initialization-actions/flink/flink.sh" \
--image-version 1.1 --master-boot-disk-size 100GB --master-machine-type n1-standard-1 --metadata "hive-metastore-instance=g-test-1022:asia-east1:db_instance" \
--num-preemptible-workers 2 --num-workers 2 --preemptible-worker-boot-disk-size 1TB --properties hive:hive.metastore.warehouse.dir=gs://informetis-dev/hive-warehouse \
--worker-machine-type n1-standard-2 --zone asia-east1-b --bucket info-dev
But Dataproc failed to create the cluster, with the following errors in the failure file:
+ mysql -u hive -phive-password -e ''
ERROR 2003 (HY000): Can't connect to MySQL server on 'localhost' (111)
+ mysql -e 'CREATE USER '\''hive'\'' IDENTIFIED BY '\''hive-password'\'';'
ERROR 2003 (HY000): Can't connect to MySQL server on 'localhost' (111)
Does anyone have any idea what is behind this failure?
It looks like you're missing the --scopes sql-admin flag as described in the initialization action's documentation, which will prevent the CloudSQL proxy from being able to authorize its tunnel into your CloudSQL instance.
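For example, the scope could be added to the create command from the question roughly like this (only the relevant flags are shown; everything else from the original command stays the same):
gcloud dataproc clusters create informetis-dev \
--scopes sql-admin \
--initialization-actions "gs://dataproc-initialization-actions/cloud-sql-proxy/cloud-sql-proxy.sh,..." \
--metadata "hive-metastore-instance=g-test-1022:asia-east1:db_instance"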
Additionally, aside from just the scopes, you need to make sure the default Compute Engine service account has the right project-level permissions in whichever project holds your CloudSQL instance. Normally the default service account is a project editor in the GCE project, so that should be sufficient when combined with the sql-admin scopes to access a CloudSQL instance in the same project, but if you're accessing a CloudSQL instance in a separate project, you'll also have to add that service account as a project editor in the project which owns the CloudSQL instance.
You can find the email address of your default compute service account under the IAM page for the project deploying Dataproc clusters, with the name "Compute Engine default service account"; it should look something like <number>@project.gserviceaccount.com.
I am assuming that you already created the Cloud SQL instance with something like this, correct?
gcloud sql instances create g-test-1022 \
--tier db-n1-standard-1 \
--activation-policy=ALWAYS
If so, then it looks like the error is in how the argument for the metadata is formatted. You have this:
--metadata "hive-metastore-instance=g-test-1022:asia-east1:db_instance”
Unfortunately, the zone looks to be incomplete (asia-east1 instead of asia-east1-b).
Additionally, since you are running that many initialization actions, you'll want to provide a fairly generous initialization action timeout so the cluster does not assume something has failed while your actions take a while to install. You can do that by specifying:
--initialization-action-timeout 30m
That will allow the cluster to give the initialization actions 30 minutes to bootstrap.
Around the time you reported this, an issue with the cloud-sql-proxy initialization action had been detected. It is most probable that this issue affected you.
Nowadays, it shouldn't be an issue.

Hive multiple users on same tables

Is it possible to have tables that are shared in Hive?
I mean a user creates a Hive table, and later multiple users can work on that same table simultaneously.
I have heard about Derby and an individual metastore for each user, but the individual-metastore option does not allow users to work simultaneously on the same set of tables, right?
Is there any other way to make this work?
When we try to access Hive at the same time, we get the following error:
Caused by: ERROR XSDB6: Another instance of Derby may have already booted the database /root/metastore_db.
This error can occur when you try to start more than one instance of the Hive shell. The lock may persist in the background (due to an improper disconnection) even after closing the tab/terminal.
The solution is to find the process using grep:
ps aux | grep hive
Now kill the process using:
kill -9 hive_process_id (ex: kill -9 21765)
Restart the Hive shell and it will work fine.
I use Ubuntu, and this error occurred when I opened Hive from the same location in two separate terminal windows. This is interpreted as multiple users by the system. Close one of the terminal windows/tabs and that should do the trick.
This occurs when two instances of a Spark application (e.g. spark-shell, spark-sql, or start-thriftserver) are started in the same directory using the embedded Derby metastore.
When not configured via hive-site.xml, the Spark context automatically creates metastore_db in the current directory (see the Spark docs). To avoid this, start the second Spark application in a different directory or use a persistent metastore (e.g. Hive Derby in Server Mode) and configure it via hive-site.xml.
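A minimal sketch of the first workaround, assuming a second interactive shell is all that is needed (the directory name is arbitrary):
# each working directory gets its own embedded metastore_db,
# so the second shell no longer competes for the Derby lock
mkdir -p /tmp/spark-app2
cd /tmp/spark-app2
spark-shell
For genuinely concurrent access to the same tables, the persistent-metastore route configured through hive-site.xml is the one to take, because separate embedded metastores cannot see each other's tables.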

Hive query is giving error

I have a Hive table called testdata and the columns are as follows:
name
age
gender
From the Hive prompt, when I issue the command "select * from testdata", it shows me the whole dataset. But when I issue the command "select name from testdata", it shows me the error:
java.net.NoRouteToHostException: No Route to Host from [NAMENODE_IP] to [CLUSTER_IP]:35946 failed on socket timeout exception: java.net.NoRouteToHostException: No route to host; For more details see: http://wiki.apache.org/hadoop/NoRouteToHost.
Can anybody please help me find out what exactly I am doing wrong?
My Hadoop version is 2.2.0 and my Hive version is 0.11.0.
Check your 'hosts' file; the system's current IP should be the first entry in your hosts file.
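As a hypothetical illustration (the IP address and hostnames below are made up), the hosts file on the name node could then start with the machine's real IP before the loopback entry:
192.168.1.10   namenode.example.com   namenode
127.0.0.1      localhost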
I ran into the same trouble, so I thought I would tell you what I did to fix this very issue.
1) Disable the firewall on the name node and see if it works.
2) If it does, then your firewall is blocking the network traffic between nodes. You will have to manually add rules to allow connections between your node IPs.
You can find out how to do so in this very detailed answer on iptables:
https://serverfault.com/questions/30026/whitelist-allowed-ips-in-out-using-iptables/30031#30031
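As a hypothetical example of such a whitelist rule (the IP address is made up; adapt it to each node that needs access, and handle the default policy as the linked answer describes):
iptables -A INPUT -s 192.168.1.20 -j ACCEPT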
You need the correct related Hadoop configuration. It seems that Hive cannot connect to the JobTracker.
When you query "select * from testdata", Hive does not use MapReduce to get the result.
But when you query "select name from testdata", Hive calls Hadoop to start a MapReduce job.
So make your Hadoop configuration correct.
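On Hadoop 2.x the JobTracker's role is played by the YARN ResourceManager, so one hedged sanity check is to confirm that the ResourceManager address in yarn-site.xml points at a host that is reachable from the Hive node (the config path and the host/port below are assumptions for illustration):
grep -A 1 "yarn.resourcemanager.address" /etc/hadoop/conf/yarn-site.xml
# then confirm the configured host:port is reachable from the Hive node, e.g.:
nc -z resourcemanager-host 8032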

how to connect to a file-based HSQLDB database with sqltool?

I have tried to follow the instructions in chapter 1 of the HSQLDB doc and started my server like this:
java -cp hsqldb-2.2.5/hsqldb/lib/hsqldb.jar org.hsqldb.Server -database.0 file:#pathtodb# -dbname.0 xdb
and I have reason to believe that worked, because it said (among other things):
Database [index=0, id=0, db=file:#pathtodb#, alias=xdb] opened sucessfully in 2463 ms.
However, at the next step, I try to connect using SqlTool, and based on chapter 8 of the documentation I came up with this command to connect:
java -jar hsqldb-2.2.5/hsqldb/lib/sqltool.jar localhost-sa
Which gives the following error:
Failed to get a connection to 'jdbc:hsqldb:hsql://localhost' as user "SA".
Cause: General error: database alias does not exist
while the server says:
[Server#60072ffb]: [Thread[HSQLDB Connection #4ceafb71,5,HSQLDB Connections #60072ffb]]: database alias= does not exist
I am at a loss. Should I specify an alias when connecting somehow? What alias would my database have then? The server did not say anything about that...
(Also, yes, I have copied the sqltool.rc file to my home folder.)
Your server has -dbname.0 xdb as the database alias. Therefore the connection URL should include xdb. For example jdbc:hsqldb:hsql://localhost/xdb
The server can serve several databases with different aliases. The URL without alias corresponds to a server command line that does not include the alias setting.
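A sketch of what the sqltool.rc entry for that urlid could look like, assuming the server started in the question and HSQLDB's default SA account with an empty password:
# in ~/sqltool.rc
urlid localhost-sa
url jdbc:hsqldb:hsql://localhost/xdb
username SA
password
With that entry in place, the original invocation should connect:
java -jar hsqldb-2.2.5/hsqldb/lib/sqltool.jar localhost-sa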
java -jar /hsqldb-2.3.2/hsqldb/lib/sqltool.jar --inlineRc=url=jdbc:hsqldb:localhost:3333/runtime,user=sa
Enter password for sa: as2dbadmin
SqlTool v. 5337.
JDBC Connection established to a HSQL Database Engine v. 2.3.2 database
This error has been haunting me for the last 5 hours.
Together with this stupid error: HSQL Driver not working?
If you want to run your HSQLDB database from a servlet with Apache Tomcat, it is necessary that you CLOSE runManagerSwing.bat. I know it sounds trivial, but even if you create the desired database and then run your Eclipse J2EE servlet with Tomcat afterwards, you will get a bunch of errors. So runManagerSwing.bat must be closed.
See my sqltool answer over on the question "How to see all the tables in an HSQLDB database". The critical piece is setting up your sqltool.rc correctly and putting it in the right location.
You can also use the following statement to get a connection from a file-based store. This can be used if you are running the application from Windows.
connection = DriverManager.getConnection("jdbc:hsqldb:file:///c:/hsqldb/mydb", "SA", "");