Hive: multiple users on the same tables

Is it possible to have shared tables in Hive?
I mean, one user creates a Hive table, and later multiple users can work on that same table simultaneously.
I have heard about Derby and an individual metastore for each user, but the individual-metastore option does not allow users to work simultaneously on the same set of tables, right?
Is there any other way to make this work?
When we try to access Hive at the same time, we get the following error:
Caused by: ERROR XSDB6: Another instance of Derby may have already booted the database /root/metastore_db.

ERROR XSDB6: Another instance of Derby may have already booted the database /root/metastore_db.
This error can occur when you try to start more than one instance of the Hive shell. The lock may persist in the background (due to an improper disconnection) even after closing the tab or terminal.
The solution is to find the process using grep:
ps aux | grep hive
Now kill that process:
kill -9 hive_process_id (e.g. kill -9 21765)
Restart the Hive shell and it should work fine.

I use Ubuntu, and this error occurred when I opened Hive from the same location in two separate terminal windows. The system interprets this as multiple users. Close one of the terminal windows/tabs and that should do the trick.

This occurs when running two instances of a Spark application (e.g. spark-shell, spark-sql, or start-thriftserver) started from the same directory while using the embedded Derby metastore.
When no metastore is configured in hive-site.xml, the Spark context automatically creates a metastore_db in the current directory (see the Spark docs). To avoid this, start the second Spark application in a different directory, or use a persistent metastore (e.g. Hive's Derby in server mode) and configure it via hive-site.xml.
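For example, a minimal hive-site.xml pointing at a Derby network-server (server mode) metastore could look roughly like the sketch below; the host, port and database name are placeholders for wherever you run the Derby network server, and derbyclient.jar must be on the classpath.
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:derby://metastore-host:1527/metastore_db;create=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>org.apache.derby.jdbc.ClientDriver</value>
  </property>
</configuration>
With this in place, every Spark application (and Hive shell) that picks up the same hive-site.xml shares one metastore instead of each booting its own embedded metastore_db.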


Installing RCU for Oracle Data Integrator runs into an error

I am trying to install RCU (Oracle Repository Creation Utility); however, every time I try to install the development repository it runs into an error.
Steps to reproduce the issue:
1. I run the rcu bat file.
2. I choose Create repository / system load and product load.
3. I choose Oracle as the database type, hostname localhost, port 1521, service name XE, username sys and the password (I am able to log into the Oracle database with this login information; I am using Oracle 18c Express).
4. I use the prefix dev.
5. I add passwords for the schemas, the supervisor, and the repository user.
6. I do not touch the tablespaces.
7. I start the install and several error messages appear, such as:
ORA-65096: invalid common user or role name
ORA-01917: user DEV_STB does not exist
ORA-00955: name is already used by an existing object
My question is: what could be the problem with the RCU installation, and how can I resolve the issue?
The funny thing is that I am installing ODI and RCU following a step-by-step video, and still something went wrong...
The issue here is that you are logging into the CDB with the service name XE. You need to log into the PDB (pluggable database).
Just change the service name from XE to XEPDB1 when connecting to the database through RCU and your issue should be resolved.
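If you are not sure what the PDB is called on your install, a quick way to check (assuming a default 18c Express setup and that you can connect as SYS) is to list the PDBs from SQL*Plus; the name shown there, typically XEPDB1, is the service name to give RCU:
$ sqlplus sys@//localhost:1521/XE as sysdba
SQL> show pdbs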

How to drop a database from the Hive metastore when the underlying HDFS cluster is no longer there

I'm working with ephemeral GCP Dataproc clusters (Apache Spark 2.2.1, Apache Hadoop 2.8.4 and Apache Hive 2.1.1). These clusters all point to the same Hive metastore (hosted on a Google Cloud SQL instance).
I created a database on one such cluster and set its location to 'hdfs:///db_name' like so:
$ gcloud dataproc jobs submit hive \
-e "create database db_name LOCATION 'hdfs:///db_name'" \
--cluster=my-first-ephemeral-cluster --region=europe-west1
my-first-ephemeral-cluster then got deleted and with it the associated HDFS.
On all subsequent clusters the following error has since been popping up:
java.net.UnknownHostException: my-first-ephemeral-cluster-m
This is probably because the Hive Metastore now has an entry for a location that does not exist.
Trying to drop the corrupted database is a no-go as well:
$ gcloud dataproc jobs submit hive \
-e 'drop database db_name' \
--cluster=my-second-ephemeral-cluster --region=europe-west1
Job [4462cb1d-88f2-4e2b-8a86-c342c0ce46ee] submitted.
Waiting for job output...
Connecting to jdbc:hive2://my-second-ephemeral-cluster-m:10000
Connected to: Apache Hive (version 2.1.1)
Driver: Hive JDBC (version 2.1.1)
18/11/03 13:40:04 [main]: WARN jdbc.HiveConnection: Request to set autoCommit to false; Hive does not support autoCommit=false.
Transaction isolation: TRANSACTION_REPEATABLE_READ
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:java.lang.IllegalArgumentException: java.net.UnknownHostException: my-first-ephemeral-cluster-m) (state=08S01,code=1)
Closing: 0: jdbc:hive2://my-second-ephemeral-cluster-m:10000
The reason is that the host my-first-ephemeral-cluster-m is no longer valid. Since changing the database's location is not an option in the version of Hive I'm using, I need a different workaround for dropping this database.
https://cwiki.apache.org/confluence/display/Hive/Hive+MetaTool
The Hive MetaTool enables administrators to do bulk updates on the
location fields in database, table, and partition records in the
metastore (...)
Example (...)
./hive --service metatool -updateLocation hdfs://localhost:9000 hdfs://namenode2:8020
But first, you need to know exactly how the pseudo-HDFS paths have been saved in the metastore, in their "canonical" form, e.g. hdfs://my-first-ephemeral-cluster-m/db_name (assuming Google follows Hadoop conventions somewhat).
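In this case the fix might look roughly like the following; -listFSRoot shows how the old locations are actually stored, and the target host below (the current cluster's master) is only an example, so adjust it to whichever NameNode is live:
$ hive --service metatool -listFSRoot
$ hive --service metatool -updateLocation hdfs://my-second-ephemeral-cluster-m hdfs://my-first-ephemeral-cluster-m
Once the stored location points at a reachable NameNode, the drop database statement should go through.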
From my point of view, the correct way to avoid the Hive metastore entry that causes the error is to drop the database just before you delete the cluster my-first-ephemeral-cluster, for example with a script that runs this sequence:
gcloud dataproc jobs submit hive -e 'drop database db_name' --cluster=my-first-ephemeral-cluster --region=europe-west1
gcloud dataproc clusters delete my-first-ephemeral-cluster
However, I found Cloud SQL proxy instructions for setting up a shared Hive warehouse between different Dataproc clusters using Cloud Storage (instead of LOCATION 'hdfs:///db_name', which creates the Hive warehouse in the cluster-local HDFS), which could give you the behavior you are looking for.
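With that setup, the create statement from the question would use a Cloud Storage location instead; the bucket name below is a placeholder:
$ gcloud dataproc jobs submit hive \
    -e "create database db_name LOCATION 'gs://my-warehouse-bucket/db_name'" \
    --cluster=my-first-ephemeral-cluster --region=europe-west1
Because the data then lives in the bucket rather than in the cluster's HDFS, the location stays valid after the ephemeral cluster is deleted.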
I created a Dataproc cluster with the same name in order to remove the schema that was created with a location in HDFS.
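Roughly, that workaround looks like the sketch below (cluster, database and region names are taken from the question); the short-lived replacement cluster only exists so that the old hostname resolves again long enough to drop the database:
$ gcloud dataproc clusters create my-first-ephemeral-cluster --region=europe-west1
$ gcloud dataproc jobs submit hive -e 'drop database db_name' \
    --cluster=my-first-ephemeral-cluster --region=europe-west1
$ gcloud dataproc clusters delete my-first-ephemeral-cluster --region=europe-west1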

Hive query is giving an error

I have a Hive table called testdata and its columns are as follows:
name
age
gender
From the Hive prompt, when I issue the command "select * from testdata" it shows me the whole dataset. But when I issue the command "select name from testdata", it shows me the error:
java.net.NoRouteToHostException: No Route to Host from [NAMENODE_IP] to [CLUSTER_IP]:35946 failed on socket timeout exception: java.net.NoRouteToHostException: No route to host; For more details see: http://wiki.apache.org/hadoop/NoRouteToHost.
Can anybody please help me find out what exactly I am doing wrong?
My Hadoop version is 2.2.0 and my Hive version is 0.11.0.
Check your 'hosts' file; the system's current IP should be the first entry in it.
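For example, on a node named namenode-host with IP 192.168.1.10 (both placeholders for your own hostname and address), /etc/hosts would start roughly like this, with the real IP mapped to the hostname before any loopback aliases such as 127.0.1.1:
192.168.1.10   namenode-host
127.0.0.1      localhost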
I ran into the same trouble, so I thought I would tell you what I did to fix this very issue.
1) Disable the firewall on the name node and see if it works.
2) If it does, then your firewall is blocking the network communication between nodes. You will have to manually add rules to allow connections between your nodes' IPs.
You can find out how to do so in this very detailed answer on iptables:
https://serverfault.com/questions/30026/whitelist-allowed-ips-in-out-using-iptables/30031#30031
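As a rough sketch (the IP is a placeholder for a data node's address; the linked answer covers doing this properly and persisting the rules):
# on the name node, allow all traffic from the data node at 192.168.1.11
iptables -A INPUT -s 192.168.1.11 -j ACCEPT
# repeat on each node for the other nodes' IPs, then persist the rules
# (e.g. service iptables save on RHEL/CentOS, or iptables-persistent on Debian/Ubuntu)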
You need the correct related Hadoop configuration. It seems that Hive cannot connect to the JobTracker.
When you query select * from testdata, Hive does not use MapReduce to get the result.
When you query select name from testdata, Hive asks Hadoop to start a MapReduce job.
So, make sure your Hadoop configuration is correct.

Migrate MySQL users to another server

I created a mysqldump --all-databases and transferred all my databases to the new server. It didn't work, as the debian-sys-maint user's password didn't match, so I changed that user's password. After that I restarted my server and got this error:
ERROR 1577 (HY000) at line 1: Cannot proceed because system tables used by Event Scheduler were found damaged at server start
ERROR 1547 (HY000) at line 1: Column count of mysql.proc is wrong. Expected 20, found 16. The table is probably corrupted
I don't know how many more errors will come after this, so I thought I would create a dump with only the databases associated with my applications (mysqldump --databases).
Now how do I migrate the users? Is there any standard way?
More Information:
New Server version: 5.1.63-0+squeeze1 (Debian)
Old Server version: 5.0.51a-24+lenny5 (Debian)
You probably need to run mysql_upgrade, since your MySQL versions are different.
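Something along these lines, run against the new 5.1 server (assuming root credentials):
$> mysql_upgrade -u root -p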
As a general rule, however, do not copy the mysql system schema from one server to another. As a consequence, and as far as I know, there is no "standard" way of copying users and user privileges from one server to another.
If you really want/need to do it, try the following:
$> mysql --silent --skip-column-names -e "show grants for 'user'@'host'"
The above outputs GRANT statements that you can feed straight away into your target server to create the user and give the same authorisations.
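If you need this for every account rather than one user at a time, a rough approach (run against the old server, adding whatever -u/-p options you need on both mysql invocations) is to generate the SHOW GRANTS statements from mysql.user and execute them:
$> mysql --silent --skip-column-names -e "SELECT CONCAT('SHOW GRANTS FOR ''', user, '''@''', host, ''';') FROM mysql.user" \
   | mysql --silent --skip-column-names | sed 's/$/;/' > all_grants.sql
The resulting all_grants.sql can then be fed straight into the target server.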
However, if your target server is empty, you could just move the whole data folder from your old server to the new server, and then run the standard upgrade procedure from 5.0 to 5.1 on the new server.

How to connect to a file-based HSQLDB database with SqlTool?

I have tried to follow the instructions in chapter 1 of the HSQLDB docs and started my server like this:
java -cp hsqldb-2.2.5/hsqldb/lib/hsqldb.jar org.hsqldb.Server -database.0 file:#pathtodb# -dbname.0 xdb
and I have reason to believe that worked, because it said (among other things):
Database [index=0, id=0, db=file:#pathtodb#, alias=xdb] opened sucessfully in 2463 ms.
However at the next step I try to connect using SqlTool and based on chapter 8 of the documentation I came up with this command to connect:
java -jar hsqldb-2.2.5/hsqldb/lib/sqltool.jar localhost-sa
Which gives the following error:
Failed to get a connection to 'jdbc:hsqldb:hsql://localhost' as user "SA".
Cause: General error: database alias does not exist
while the server says:
[Server#60072ffb]: [Thread[HSQLDB Connection #4ceafb71,5,HSQLDB Connections #60072ffb]]: database alias= does not exist
I am at a loss. Should I specify alias when connecting somehow? What alias would my database have then? The server did not say anything about that...
(Also, yes, I have copied the sqltool.rc file to my home folder.)
Your server has -dbname.0 xdb as the database alias. Therefore the connection URL should include xdb. For example jdbc:hsqldb:hsql://localhost/xdb
The server can serve several databases with different aliases. The URL without alias corresponds to a server command line that does not include the alias setting.
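In other words, the sqltool.rc entry behind a command like java -jar sqltool.jar localhost-sa would need to look something like this sketch (the urlid is whatever name you pass on the command line, and SA with an empty password is the HSQLDB default):
urlid localhost-sa
url jdbc:hsqldb:hsql://localhost/xdb
username SA
password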
java -jar /hsqldb-2.3.2/hsqldb/lib/sqltool.jar --inlineRc=url=jdbc:hsqldb:hsql://localhost:3333/runtime,user=sa
Enter password for sa: as2dbadmin
SqlTool v. 5337.
JDBC Connection established to a HSQL Database Engine v. 2.3.2 database
This error had been haunting me for the last 5 hours.
Together with this stupid error: HSQL Driver not working?
If you want to use HSQLDB from a servlet with Apache Tomcat, it is necessary that you CLOSE runManagerSwing.bat. I know it sounds trivial, but even if you create the desired database and then run your Eclipse servlet project with Tomcat afterwards, you will get a bunch of errors. So runManagerSwing.bat must be closed.
See my sqltool answer over on the question "How to see all the tables in an HSQLDB database". The critical piece is setting up your sqltool.rc correctly and putting it in the right location.
You can also use the following statement to get a connection from a file-based store. This can be used when you are running the application on Windows.
connection = DriverManager.getConnection("jdbc:hsqldb:file:///c:/hsqldb/mydb", "SA", "");
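If you would rather open that same file-based database from SqlTool instead of from Java code (and no server currently holds a lock on it), an inline connection in the style shown earlier in this thread should also work; the path below reuses the hypothetical c:/hsqldb/mydb from the snippet above, and SqlTool will prompt for the SA password (just press Enter if it is empty):
java -jar sqltool.jar --inlineRc=url=jdbc:hsqldb:file:///c:/hsqldb/mydb,user=SA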