For a given Hive query, how to trace/link Hive metastore logs

I am getting MySQL alerts for some slow queries, so I want to trace back the Hive query that is responsible for them.
The Hive stack looks like: MySQL -> HMS (Hive Metastore Service) -> HS2 server -> beeline
Each component logs the queries it executes, but I am not able to find a traceId-like value that links the queries executed by the different components.
For the HS2 server and beeline, I do get a queryId. For example, in beeline:
INFO : Compiling command(queryId=hadoop_20221109165616_c873bba7-b6e1-4ba0-a92f-b46c5f76ca36): use test
and in HS2
2022-11-09T16:56:16,429 INFO [180b5b91-eea3-47a9-a16f-1bf6c49d438a HiveServer2-Handler-Pool: Thread-125]: ql.Driver (Driver.java:compile(409)) - Compiling command(queryId=hadoop_20221109165616_c873bba7-b6e1-4ba0-a92f-b46c5f76ca36): use test
but there is nothing of this sort in the Hive metastore logs.
Is there a way to get this sort of traceId or queryId into the HMS logs, so that my goal of tracing a Hive query across all Hive services can be achieved?
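To illustrate, this is roughly how I correlate the beeline and HS2 logs today, by scanning the log files for the queryId (just a sketch; the log file paths are assumptions for my setup). It matches lines in the HS2 log but finds nothing in the metastore log:

import re
import sys

# Log paths are assumptions for my setup; adjust to your distribution.
LOG_FILES = {
    "hiveserver2": "/var/log/hive/hiveserver2.log",
    "metastore": "/var/log/hive/hivemetastore.log",
}

def find_query_id(query_id, log_files=LOG_FILES):
    """Print every log line that mentions the given queryId."""
    pattern = re.compile(re.escape(query_id))
    for component, path in log_files.items():
        try:
            with open(path, errors="replace") as f:
                for line_no, line in enumerate(f, 1):
                    if pattern.search(line):
                        print("%s:%d: %s" % (component, line_no, line.rstrip()))
        except IOError:
            print("%s: could not read %s" % (component, path), file=sys.stderr)

if __name__ == "__main__":
    # queryId taken from the beeline/HS2 log lines above
    find_query_id("hadoop_20221109165616_c873bba7-b6e1-4ba0-a92f-b46c5f76ca36")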

Related

Oozie solution to execute a query and get results back from SQL & Hive

I am trying to solve the problem below using Oozie. Any suggestions about the solution are much appreciated.
Background: I have developed code to import data from a SQL database using Oozie (Sqoop import), applied some transformations, and loaded the data into Hive. Now I need to do a count check between SQL and Hive for reconciliation.
Is there any way I can do that using Oozie?
I am thinking of executing the SQL query using "sqoop eval" and the Hive query using a "hive action" from Oozie, but I am wondering how to get the results back to Oozie / capture the results after the query execution.
Once the results are available, I need to do the reconciliation in a subsequent action.
I implemented it using a PySpark action, by executing sqoop eval and taking Hive DataFrame counts. It's working fine.
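A rough sketch of that approach (here using a Spark JDBC read for the SQL-side count instead of sqoop eval, and Spark 2's SparkSession API; the JDBC URL, credentials and table names are placeholders):

from pyspark.sql import SparkSession

# Placeholder connection details and table names.
JDBC_URL = "jdbc:sqlserver://sqlhost:1433;databaseName=sales"
SQL_TABLE = "dbo.orders"
HIVE_TABLE = "default.orders"

spark = (SparkSession.builder
         .appName("sql-hive-reconciliation")
         .enableHiveSupport()
         .getOrCreate())

# Row count on the source SQL database, read over JDBC.
sql_count = (spark.read.format("jdbc")
             .option("url", JDBC_URL)
             .option("dbtable", SQL_TABLE)
             .option("user", "etl_user")
             .option("password", "secret")
             .load()
             .count())

# Row count on the Hive table that the Sqoop import loaded.
hive_count = spark.table(HIVE_TABLE).count()

if sql_count != hive_count:
    raise SystemExit("Reconciliation failed: SQL=%d, Hive=%d" % (sql_count, hive_count))
print("Reconciliation OK: %d rows on both sides" % sql_count)

Run as an Oozie spark action, a non-zero exit status fails the action, so the workflow can branch to an error path when the counts do not match.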

Where does the Hive metastore store lock info?

I am trying to create indexes on a Hive table and getting this error:
FAILED: Error in acquiring locks: Lock acquisition for
LockRequest(component:[LockComponent(type:EXCLUSIVE, level:PARTITION,
dbname:,
tablename:jobs_indx_jobs_title,
partitionname:year=2016/month=1/sourcecd=BYD),
LockComponent(type:SHARED_READ, level:TABLE, dbname:,
tablename:jobs), LockComponent(type:SHARED_READ, level:PARTITION,
dbname:, tablename:jobs,
partitionname:year=2016/month=1/sourcecd=BD)], txnid:0, user:hadoop,
hostname:Hortorn-NN-2.b2vheq12ivkfdsjdskdf3nba.dx.internal.cloudapp.net)
timed out after 5504043ms. LockResponse(lockid:58318, state:WAITING)
I want to know in which metastore table Hive stores the lock info that it shows when executing the "show locks" command.
It's not in the metastore, it's in a ZooKeeper znode...
Just read the documentation and the design decisions from back in 2010 in HIVE-1293.
If the table is non-transactional, try setting hive.support.concurrency=false.
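For example, the setting can be applied per session before running the DDL (a minimal sketch; using PyHive as the HiveServer2 client and the index DDL itself are my assumptions, not from the original post):

from pyhive import hive  # assumption: PyHive as the HiveServer2 client

conn = hive.connect(host="hs2-host", port=10000, username="hadoop")
cursor = conn.cursor()

# Disable the lock manager for this session only (non-transactional tables).
cursor.execute("SET hive.support.concurrency=false")

# Hypothetical index DDL corresponding to the error message above.
cursor.execute(
    "CREATE INDEX jobs_indx_jobs_title ON TABLE jobs (title) "
    "AS 'COMPACT' WITH DEFERRED REBUILD"
)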

Failed to run Hive queries in parallel using the Hue query editor

I have a CDH 5 cluster with Hive, Impala and Hue installed.
When two users try to use the Hue "Query Editor" in parallel, with either Impala or Hive, we never get the results back.
When a single user fires a query, we get results without a problem.
When we tried to use the Hive command line interface, we could run queries in parallel.
We also tried to create different Hue users, but even when different Hue users tried to run queries in parallel, we still got no results.
It looks like a Hue configuration issue.
Any ideas?
Yosi

Hive command for finding the data nodes on which a query was run

Can somebody help me with a Hive command to find the data nodes on which a particular Hive query was run?
For example: Select * from mytable;
On which data nodes in my Hadoop cluster (which has only Hive) did it run?
A DataNode is only for storage. What you really want to know is which MR node is running the SQL.
Hive transforms the SQL into normal MR jobs, so you can find your SQL job in the JobTracker (MR1) or ResourceManager (YARN) web interface.
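For example, the ResourceManager REST API can be used to find the YARN application that a Hive query launched (a rough sketch; the ResourceManager address and the name-matching logic are assumptions):

import requests  # assumption: the 'requests' library is available

RM_URL = "http://resourcemanager-host:8088"  # assumed ResourceManager address

def find_hive_apps(search_text):
    """Print YARN applications whose name mentions the given query text."""
    resp = requests.get(RM_URL + "/ws/v1/cluster/apps", timeout=10)
    resp.raise_for_status()
    apps = (resp.json().get("apps") or {}).get("app") or []
    for app in apps:
        # Hive usually embeds part of the SQL (or the queryId) in the application name.
        if search_text.lower() in app.get("name", "").lower():
            print(app["id"], app["state"], app["trackingUrl"])

find_hive_apps("mytable")

The application's tracking URL then shows which NodeManagers ran the individual map and reduce tasks.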

Can Spark SQL be executed against Hive tables without any Map/Reduce (/Yarn) running?

It is my understanding that Spark SQL reads HDFS files directly - no need for M/R here. Specifically, none of the Map/Reduce-based Hadoop Input/OutputFormats are employed (except in special cases like HBase).
So are there any built-in dependencies on a functioning Hive server? Or is it only required to have:
a) Spark Standalone
b) HDFS and
c) Hive metastore server running
i.e. YARN/MRv1 are not required?
The Hadoop-related I/O formats for accessing Hive files seem to include:
TextInput/Output Format
ParquetFileInput/Output Format
Can Spark SQL/Catalyst read Hive tables stored in those formats, with only the Hive metastore server running?
Yes.
The Spark SQL README says:
Hive Support (sql/hive) - Includes an extension of SQLContext called HiveContext that allows users to write queries using a subset of HiveQL and access data from a Hive Metastore using Hive SerDes. There are also wrappers that allows users to run queries that include Hive UDFs, UDAFs, and UDTFs.
This is implemented by depending on Hive libraries for reading the data. But the processing happens inside Spark. So no need for MapReduce or YARN.
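A minimal PySpark sketch of that setup, using the Spark 1.x HiveContext API quoted above (the master URL and table name are placeholders):

from pyspark import SparkConf, SparkContext
from pyspark.sql import HiveContext  # Spark 1.x API, as in the README quote

# Spark standalone master, no YARN; the master URL is a placeholder.
conf = (SparkConf()
        .setAppName("hive-metastore-only")
        .setMaster("spark://spark-master:7077"))
sc = SparkContext(conf=conf)

# HiveContext asks the Hive metastore for table locations and SerDes,
# then the Spark executors read the HDFS files directly - no MapReduce jobs run.
sqlContext = HiveContext(sc)
sqlContext.sql("SELECT count(*) FROM mytable").show()  # placeholder table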