I know that you can get column names from a table via the following trick in Hive:
hive> set hive.cli.print.header=true;
hive> select * from tablename;
Is it also possible to just get the column names from the table?
I dislike having to change a setting for something I only need once.
My current solution is the following:
hive> set hive.cli.print.header=true;
hive> select * from tablename;
hive> set hive.cli.print.header=false;
This seems too verbose and against the DRY principle.
If you simply want to see the column names this one line should provide it without changing any settings:
describe database.tablename;
However, if that doesn't work for your version of Hive, the following will also provide it, though your default database will then switch to the one you use:
use database;
describe tablename;
You could also do show columns in $table, or see the question "Hive, how do I retrieve all the database's tables columns" for ways to access Hive metadata.
The solution is:
show columns in table_name;
This is simpler than using:
use database;
describe tablename;
Use desc tablename from the Hive CLI or Beeline to get all the column names. If you want the column names in a file, then run the below command from the shell.
$ hive -e 'desc dbname.tablename;' > ~/columnnames.txt
where dbname is the name of the Hive database where your table resides.
You will find the file columnnames.txt in your home directory.
$ cd ~
$ ls
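Note that desc prints each column's name, type, and comment. If you want only the names themselves, a small variation like this should work (a sketch; awk just keeps the first field of each line):
$ hive -e 'desc dbname.tablename;' | awk '{print $1}' > ~/columnnames.txt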
The best way to do this is to set the below properties:
set hive.cli.print.header=true;
set hive.resultset.use.unique.column.names=false;
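For illustration, with a hypothetical table t that has columns id and name:
hive> select * from t;
id      name
1       alice
Without hive.resultset.use.unique.column.names=false, the header would print as t.id and t.name instead.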
I am creating and inserting into tables in Hive, and the files are created on HDFS, with some on external storage (S3).
Assuming I created 10 tables, is there any system table in Hive where I can find info about the tables created by a user? (For example, in Teradata we have DBC.tablesv, which holds information about all user-defined tables.)
You can find where your metastore is configured to be in the hive-site.xml file.
Its usual location is under /etc/hive/{$hadoop_version}/ or /etc/hive/conf/.
grep for "hive.metastore.uris" or "javax.jdo.option.ConnectionURL" to see which db you are using for the metastore. The credentials should also be there.
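For example, something like this should surface the relevant settings (a sketch, assuming the config lives at /etc/hive/conf/hive-site.xml; each <name> in the XML is followed by its <value> on the next line, hence the -A1):
$ grep -A1 -E 'hive.metastore.uris|javax.jdo.option.ConnectionURL' /etc/hive/conf/hive-site.xml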
If, for example, your metastore is on a MySQL server, you can run queries like
SELECT * FROM TBLS;
SELECT * FROM PARTITIONS;
etc.
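For instance, a sketch against the default metastore schema (assuming the standard DBS and TBLS tables, where DBS holds databases and TBLS holds tables) that lists every table per database:
SELECT d.NAME AS db_name, t.TBL_NAME, t.TBL_TYPE
FROM TBLS t
JOIN DBS d ON t.DB_ID = d.DB_ID
ORDER BY d.NAME, t.TBL_NAME;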
You can't query (as in SELECT ... FROM...) the metadata from within Hive.
You do, however, have commands that display that information, e.g. show databases, show tables, desc MyTable, etc.
I'm not sure I understood your question 100%. If you mean the information about the creation of the table, like the query itself, the location on HDFS, table properties, etc., you can try:
SHOW CREATE TABLE <table>;
If you need to retrieve a list of the column names and datatypes, try:
DESCRIBE <table>;
I have a query whose result I want to store in a variable.
How can I do it?
I tried:
./hive -e "use telecom; insert overwrite local directory '/tmp/result' select avg(a) from abc;"
./hive --hiveconf MY_VAR=`cat /tmp/result/000000_0`;
I am able to get the average value in MY_VAR, but it drops me into the Hive CLI, which is not required.
Also, is there a way to access Unix commands inside the Hive CLI?
Use case: in MySQL the following is valid:
set @max_date := (select max(date) from some_table);
select * from some_other_table where date > @max_date;
This is super useful for scripts that need to repeatedly call this variable since you only need to execute the max date query once rather than every time the variable is called.
Hive does not currently support this. (Please correct me if I'm wrong! I have been trying to figure out how to do this all afternoon.)
My workaround is to store the required variable in a table that is small enough to map-join onto the query in which it is used. Because the small table is broadcast to every mapper (a map join) rather than shuffled, it should not significantly hurt performance. For example:
drop table if exists var_table;
create table var_table as
select max(date) as max_date from some_table;
select some_other_table.*
from some_other_table
cross join var_table
where some_other_table.date > var_table.max_date;
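If Hive does not convert this to a map join on its own, you can nudge it, either with the auto-conversion setting or with the classic hint syntax (a sketch):
set hive.auto.convert.join=true;
or:
select /*+ MAPJOIN(var_table) */ some_other_table.*
from some_other_table
cross join var_table
where some_other_table.date > var_table.max_date;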
The solution suggested by @visakh is not optimal, because it stores the string 'select count(1) from table_name;' rather than the returned value, and so will not be helpful in cases where you need to call a variable repeatedly during a script.
Storing Hive query output in a variable and using it in another query:
In the shell, create a variable with the desired value by doing:
var=`hive -S -e "select max(datekey) from ....;"`
echo $var
Use the variable's value in another Hive query by:
hive --hiveconf MID_DATE=$var -f test.hql
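Inside test.hql, the value is then read back through the hiveconf namespace. A minimal sketch (the table and column names here are hypothetical):
select * from some_table where datekey > ${hiveconf:MID_DATE};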
You can simply achieve this using a shell script.
Create a shell script:
File: avg_op.sh
#!/bin/sh
# run the query and capture its output in a file
hive -e 'use telecom; select avg(a) from abc;' > avg.txt
wait
# read the result back into a shell variable
value=`cat avg.txt`
# pass the value into a second query via --hiveconf
hive --hiveconf avgval=$value -e "set avgval; set hiveconf:avgval;
use telecom;
select * from abc2 where avg_var=\${hiveconf:avgval};"
Execute the .sh file:
$ bash avg_op.sh
If you are trying to capture a number from a Hive or Impala query on Linux, you can achieve it by executing the query and extracting the number with a regular expression.
With Hive,
max=`beeline -u ${hiveConnectionUrl} -e "select max(col1) from schema_name.table_name;" | sed 's/[^0-9]*//g'`
The main part is extracting the number from the result. Also, if the execution produces too much log output, you can use the --silent=true flag to suppress the log messages.
You can use BeeTamer for that. It allows you to store a result (or part of it) in a variable and use this variable later in your code.
BeeTamer is a macro language / macro processor that extends the functionality of the Apache Hive and Cloudera Impala engines.
select avg(a) from abc;
%capture MY_AVERAGE;
select * from abc2 where avg_var=#MY_AVERAGE#;
Here you save the average value from your query into the macro variable MY_AVERAGE and then reuse it in the second query.
Try the following:
$ var=$(hive -e "select '12' ")
$ echo $var
12 -- output
I'm new to Hive and could use some tips.
I'm trying to export query results from Hive as a CSV. When I try to pipe them out of the CLI like:
hive -e 'select * from table' > OutPut.txt
I get a text file that has all the records but doesn't have the column headers. Does anyone have a tip for how to export the query results, with the column headers, to a CSV file?
If I run the query in Hue and then download the results as a CSV, I get a CSV with the column headers but no records. If anyone has a tip on how to download query results from Hue with records and column headers, I would greatly appreciate that too.
To export the column headers, you need to set the following in the hiverc file:
set hive.cli.print.header=true;
To get just the headers into a file, you could try the following:
hive -e 'set hive.cli.print.header=true; SELECT * FROM TABLE_NAME LIMIT 0;' > /file_path/file_name.txt
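If you want the headers and the records together as a CSV, one common approach is to print the header and convert Hive's tab-separated output to commas, e.g. (a sketch; the naive sed conversion breaks if field values themselves contain tabs or commas):
hive -e 'set hive.cli.print.header=true; select * from table_name;' | sed 's/\t/,/g' > output.csv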
Having the column header but missing data is a known issue: HUE-544
The workaround is to use Hue 3 or later, or to switch to HiveServer2 (recommended starting from CDH4.6).
What is the best (least expensive) equivalent of the SQL Server UPDATE ... SET command in Hive?
For example, consider the case in which I want to convert the following query:
UPDATE TABLE employee
SET visaEligibility = 'YES'
WHERE experienceMonths > 36
to equivalent Hive query.
I'm assuming you have a table without partitions, in which case you should be able to run the following command:
INSERT OVERWRITE TABLE employee SELECT employeeId, employeeName, experienceMonths, salary, CASE WHEN experienceMonths > 36 THEN 'YES' ELSE visaEligibility END AS visaEligibility FROM employee;
There are other ways, but they are much more convoluted; I think the way Bejoy described is the most efficient.
(source: Bejoy KS blog)
Note that if you have to do this on a partitioned table (which is likely if you have a lot of data), you would probably need to overwrite your partition when doing this.
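For example, a sketch using dynamic partitions (assuming a hypothetical partition column dept, which must come last in the SELECT list):
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
INSERT OVERWRITE TABLE employee PARTITION (dept)
SELECT employeeId, employeeName, experienceMonths, salary,
  CASE WHEN experienceMonths > 36 THEN 'YES' ELSE visaEligibility END AS visaEligibility,
  dept
FROM employee;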
You can create an external table and use INSERT OVERWRITE LOCAL DIRECTORY; in case you want to change column values, you can use CASE WHEN, IF, or other conditional operators. Then copy the output file back to the HDFS location.
You can upgrade your Hive to 0.14.0.
Starting from 0.14.0, Hive supports the UPDATE operation.
To do this, we need to create Hive tables such that they support the ACID output format, and we need to set additional properties in hive-site.xml.
See: How to do CRUD operations in Hive
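A minimal sketch of what that looks like (assuming Hive 0.14+; ACID tables must be bucketed, stored as ORC, and marked transactional, and the session needs the transaction manager enabled):
set hive.support.concurrency=true;
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;

CREATE TABLE employee (
  employeeId INT,
  employeeName STRING,
  experienceMonths INT,
  salary DOUBLE,
  visaEligibility STRING
)
CLUSTERED BY (employeeId) INTO 4 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional'='true');

UPDATE employee SET visaEligibility = 'YES' WHERE experienceMonths > 36;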
I have created a table in Hive, and I would like to know which directory my table was created in. I would like to know the path...
DESCRIBE FORMATTED my_table;
or
DESCRIBE FORMATTED my_table PARTITION (my_column='my_value');
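If you want just the path, the output of DESCRIBE FORMATTED includes a Location: row that is easy to pick out from the shell (a sketch):
hive -S -e 'describe formatted my_table;' | grep 'Location'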
There are three ways to describe a table in Hive.
1) To see a Hive table's primary info, use the describe table_name; command.
2) To see more detailed information about the table, use the describe extended table_name; command.
3) To see all the details in a clean, formatted manner, use the describe formatted table_name; command.
Resource: Hive interview tips
You can use the below commands for the same:
show create table <table>;
desc formatted <table>;
describe formatted <table>;
DESCRIBE FORMATTED <tablename>
or
DESCRIBE EXTENDED <tablename>
I prefer formatted because it is a more human-readable format.
To see both the structure and the location (directory) of any table (internal or external), we can use the table's create statement:
show create table table_name;
In Hive 0.10 and later you can use SHOW CREATE TABLE to find the path where Hive stores the data.
In other versions, there is no good way to do this.
Updated:
Thanks, Joe K.
Use DESCRIBE FORMATTED <table> to show table information.
P.S.: database.tablename is not supported here.
Further to pensz's answer, you can get more info using:
DESCRIBE EXTENDED my_table;
or
DESCRIBE EXTENDED my_table PARTITION (my_column='my_value');
By default, all Hive managed tables are stored under the HDFS location below:
hadoop fs -ls /user/hive/warehouse/databasename.db/tablename
If you use Hue, you can browse the table in the Metastore App and then click on 'View file location': that will open the HDFS File Browser in its directory.
It is in the 'default' directory if you have not specifically mentioned a location. You can use describe and describe extended to learn about the table structure.