In hue we are not able to find table - hive

A client created a table in Hue under one of our databases, but the table does not show up in the Hue browser. We can see the table in Hive itself, just not in Hue.
Can someone please help me with this?
Thanks.

You have two options:
a. Distribute the tables across multiple databases (recommended).
b. Manually adjust the 'rows' limit in dbms.py as shown below, changing it from 5000 to 8000.
Restart hue after the change.
Sample from /usr/lib/hue/apps/beeswax/src/beeswax/server/dbms.py:
-----------------
if handle:
  result = self.fetch(handle, rows=8000)  # was rows=5000; this is the value to change
  self.close(handle)
  return [name for table in result.rows() for name in table]
else:
  return []
Please note: because you are customizing this file, you need to reapply this change after every Hive upgrade, and Hue performance may be affected by it.

Related

Hive retrieving meta data too slow

Situation:
I want to retrieve the location info for a list of Hive external tables. For one table the easiest way is to run show create table table_name, but I have quite a number of tables, so I am looking for an alternative. I found that Hive has a sys db.
It seems this db stores the metadata of all tables, and I found the table sds, which stores the location info.
However, when I query sds with the simplest possible filter, select * from sds where sd_id = a_sd_id, to look up just one table, it takes more than 50 seconds to return the result.
What is weird is that if I retrieve the same info with show create table the_table_name, all the table info, including the location, is returned in 0.05 seconds.
So my question is: when I run show create table, where does Hive retrieve this info from? Is it the same source I hit when I query sys.sds? If the two use the same source, the huge time gap between them cannot be explained.
Could anyone shed some light on why this happens, and on how I can retrieve the location info the way I expected, i.e. from the MySQL metastore, as fast as show create table does? I suppose show create table accesses MySQL. But if the sys db is a mapping of the MySQL db, why do queries on these tables return 100 times slower than show create table?
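For what it's worth, here is a rough sketch of pulling all external table locations in one query straight from the metastore database instead of going through sys. It assumes the usual metastore layout, where TBLS links to DBS via DB_ID and to SDS via SD_ID, and it uses an in-memory SQLite database as a stand-in for MySQL so it can be run anywhere. The helper names and the sample rows are made up for illustration; against a real metastore you would run the same SELECT through your MySQL client.

```python
import sqlite3

# Join the three metastore tables that together give db name, table name and location.
LOCATION_QUERY = """
SELECT d.NAME AS db_name, t.TBL_NAME AS tbl_name, s.LOCATION AS location
FROM TBLS t
JOIN DBS d ON d.DB_ID = t.DB_ID
JOIN SDS s ON s.SD_ID = t.SD_ID
WHERE t.TBL_TYPE = 'EXTERNAL_TABLE'
ORDER BY d.NAME, t.TBL_NAME
"""


def external_table_locations(conn):
    """Return (db_name, tbl_name, location) for every external table."""
    return conn.execute(LOCATION_QUERY).fetchall()


def build_demo_metastore():
    """Hypothetical stand-in for the metastore, with only the columns used above."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE DBS (DB_ID INTEGER PRIMARY KEY, NAME TEXT);
        CREATE TABLE SDS (SD_ID INTEGER PRIMARY KEY, LOCATION TEXT);
        CREATE TABLE TBLS (TBL_ID INTEGER PRIMARY KEY, DB_ID INTEGER,
                           SD_ID INTEGER, TBL_NAME TEXT, TBL_TYPE TEXT);
        INSERT INTO DBS VALUES (1, 'sales');
        INSERT INTO SDS VALUES (10, 'hdfs://nn/data/orders');
        INSERT INTO SDS VALUES (11, 'hdfs://nn/warehouse/sales.db/customers');
        INSERT INTO TBLS VALUES (100, 1, 10, 'orders', 'EXTERNAL_TABLE');
        INSERT INTO TBLS VALUES (101, 1, 11, 'customers', 'MANAGED_TABLE');
    """)
    return conn


demo_rows = external_table_locations(build_demo_metastore())
for db_name, tbl_name, location in demo_rows:
    print(db_name, tbl_name, location)
```

Only the external table is returned; the managed one is filtered out by TBL_TYPE.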

ClickHouse limitations in column manipulation

I found in the ClickHouse documentation that column manipulations have some limitations.
For tables that don’t store data themselves (such as Merge and Distributed), ALTER just changes the table structure, and does not change the structure of subordinate tables. For example, when running ALTER for a Distributed table, you will also need to run ALTER for the tables on all remote servers.
So here is my question: is there a way to run it automatically? I have 4 servers running in containers, and I don't want to log in to each one and manually execute commands like ALTER ... etc.
Run ALTER TABLE db.table ON CLUSTER 'cluster-name' ADD COLUMN ...
Run it first for the underlying Engine=ReplicatedMergeTree(...) table, and then for the Engine=Distributed(...) table.
Hmm, you could just expose a port on each container and write a script that goes through each container and runs the command.
ClickHouse has a Python driver.
from clickhouse_driver import Client
client = Client('localhost', port=8090, user='admin', password='admin')
Then just iterate over the ports.
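A minimal sketch of that loop, assuming each container exposes ClickHouse on its own host port and all of them accept the same credentials. run_on_all_servers and its make_client hook are hypothetical names introduced here (not part of clickhouse_driver); the hook is only there so the loop can be tried without live servers.

```python
def run_on_all_servers(ports, statement, make_client=None):
    """Run one SQL statement against the ClickHouse server behind each port.

    Returns a dict mapping port -> whatever client.execute() returned.
    """
    if make_client is None:
        # Imported lazily so the loop can be exercised without the driver installed.
        from clickhouse_driver import Client

        def make_client(port):
            # Host and credentials are the ones from the snippet above.
            return Client('localhost', port=port, user='admin', password='admin')

    results = {}
    for port in ports:
        client = make_client(port)
        results[port] = client.execute(statement)
    return results
```

You would call it with the list of ports you exposed and the statement to run on each server. Keep in mind that clickhouse_driver speaks the native protocol, so each exposed port has to map to the server's native TCP port (9000 by default), not the HTTP one.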

Hive Data Flow Issues

I am using Hive on an HDInsight/Azure Spark 2.2 cluster and submitting my queries through Ambari. The data is stored in external tables on Azure Data Lake. The staging and target tables are partitioned.
I've been working on loading data in Hive today. The data flows from .gz file -> staging table -> target table. It's an incremental load: a left join from target to landing to preserve old data, then a union all with the new data to get the full set.
I've noticed some behaviors that seem odd to me and was hoping to gather more insight.
Observation 1: After running the script through, I notice that the new data from the original table/.gz file is not present in the staging or the target table. I wouldn't expect that, since there's a UNION ALL present.
Observation 2: I did one step manually, loading data into my staging table from the .gz file/table. I run a simple count(*) on it and it returns 39k, great. I try running a select * where val = XYZ and it returns records, great again. I put a count(*) on that same expression and it returns 0 records.
Apologies if my thoughts are jumbled, but I wanted to know if anybody out there has experienced similar occurrences and how to overcome them. Let me know if any clarifications are needed.
Are you sure you don't have spaces in your key? Have you tried trim(val)?
Observation 2 is really surprising: with the same where predicates, you get rows returned by select * but nothing with select count(*)?
Could you include the SQL queries and some rows of data?

Add unique value in hive table

I want to add a unique value to my Hive table whenever I insert a record, and that value should not be repeated anywhere in the table. I am not able to find any solution or function for this. In my case I want to insert the records into Hive using Pig Latin. Please help.
Hive does not provide RDBMS-style constraints.
The suggested approach using a Pig script is as follows.
1. Load data
2. Apply DISTINCT to data
3. Store data at a location
4. Create external hive table at the same location.
Steps 3 and 4 can be combined if you can use HCatalog, which lets you store data directly in a Hive table.
Official documentation :Link 1 link 2
Did you take a look at this? https://github.com/manojkumarvohra/hive-hilo It seems to provide a way to generate sequence numbers in Hive using the hi/lo algorithm.
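To give an idea of how hi/lo works in general, here is a small sketch of the algorithm itself. It is not the code of the linked project, and the class names and the block size are made up for illustration. A shared counter hands out "hi" values, and each generator then produces block_size ids locally from its current hi without contacting the shared counter again, so separate writers never collide.

```python
import threading


class CentralCounter:
    """Stand-in for the shared store that hands out 'hi' values, one per block."""

    def __init__(self):
        self._next_hi = 0
        self._lock = threading.Lock()

    def next_hi(self):
        with self._lock:
            hi = self._next_hi
            self._next_hi += 1
            return hi


class HiLoGenerator:
    """Produces ids as hi * block_size + lo, fetching a new hi when a block runs out."""

    def __init__(self, counter, block_size):
        self._counter = counter
        self._block_size = block_size
        self._hi = None
        self._lo = block_size  # forces a hi fetch on the first call

    def next_id(self):
        if self._lo >= self._block_size:
            self._hi = self._counter.next_hi()
            self._lo = 0
        value = self._hi * self._block_size + self._lo
        self._lo += 1
        return value


# Two independent generators sharing one counter never produce the same id.
counter = CentralCounter()
gen_a = HiLoGenerator(counter, block_size=2)
gen_b = HiLoGenerator(counter, block_size=2)
ids_a = [gen_a.next_id() for _ in range(3)]
ids_b = [gen_b.next_id() for _ in range(3)]
print(ids_a, ids_b)
```

The ids are unique but not gap-free: a generator that stops partway through a block leaves the rest of that block unused.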

Hive - where is table information stored

I am creating tables and inserting into them in Hive. The files are created on HDFS, and some on external S3 storage.
Assuming I created 10 tables, is there any system table in Hive where I can find information about the tables created by the user? (For example, in Teradata we have DBC.tablesv, which holds information on all user-defined tables.)
You can find where your metastore is configured in the hive-site.xml file.
Its usual location is under /etc/hive/{$hadoop_version}/ or /etc/hive/conf/.
Grep for "hive.metastore.uris" or "javax.jdo.option.ConnectionURL" to see which db you are using for the metastore. The credentials should also be there.
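If you prefer a script over grep, something like the following sketch would pull those two properties out of the file. It assumes the standard Hadoop configuration layout (property elements with name and value children); the function name and the sample values are made up, and for a real file you would pass in its contents, e.g. open(path).read().

```python
import xml.etree.ElementTree as ET

# The two property names mentioned above.
METASTORE_KEYS = ("hive.metastore.uris", "javax.jdo.option.ConnectionURL")


def metastore_settings(xml_text, keys=METASTORE_KEYS):
    """Return {property name: value} for the requested keys found in hive-site.xml text."""
    root = ET.fromstring(xml_text)
    found = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name in keys:
            found[name] = prop.findtext("value")
    return found


# Made-up sample standing in for a real hive-site.xml.
SAMPLE_HIVE_SITE = """<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://metastore-host:9083</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://db-host/metastore</value>
  </property>
  <property>
    <name>hive.exec.scratchdir</name>
    <value>/tmp/hive</value>
  </property>
</configuration>"""

settings = metastore_settings(SAMPLE_HIVE_SITE)
for name, value in settings.items():
    print(name, "=", value)
```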
If, for example, your metastore is on a MySQL server, you can run queries like
SELECT * FROM TBLS;
SELECT * FROM PARTITIONS;
etc
You can't query (as in SELECT ... FROM...) the metadata from within Hive.
You do, however, have commands that display that information, e.g. show databases, show tables, desc MyTable, etc.
I'm not sure I fully understood your question. If you mean the information about the creation of the table, such as the query itself, the location on HDFS, table properties, etc., you can try:
SHOW CREATE TABLE <table>;
If you need a list of the column names and data types, try:
DESCRIBE <table>;