I created a table a couple of months ago. Is there any way in Hive to see when the table was created?
show tables doesn't give the creation date of the table.
Run the command desc formatted <database>.<table_name> in the Hive CLI. It will show detailed table information similar to:
Detailed Table Information
Database:
Owner:
CreateTime:
LastAccessTime:
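If you need the creation time in a script rather than on screen, the captured CLI output can be split into key/value pairs. This is only a minimal sketch, assuming the output keeps the "Key: value" layout shown above; the function name parse_describe_formatted and the sample values are hypothetical.

```python
def parse_describe_formatted(output: str) -> dict:
    """Collect 'Key: value' lines from `desc formatted` output into a dict."""
    info = {}
    for line in output.splitlines():
        # Split on the first colon only, because values such as timestamps
        # and HDFS URIs contain colons themselves.
        key, sep, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if sep and key and value:
            # Keep the first occurrence of each key.
            info.setdefault(key, value)
    return info


# Hypothetical sample; real values come from your own table.
sample = """\
# Detailed Table Information
Database:           \tdefault
Owner:              \thadoop
CreateTime:         \tMon Oct 03 09:15:42 UTC 2016
LastAccessTime:     \tUNKNOWN
"""

print(parse_describe_formatted(sample)["CreateTime"])
```

The value is returned as the raw string Hive printed; parsing it into a datetime is left out because the time zone abbreviation varies by cluster.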
You need to run the following command:
describe formatted <your_table_name>;
Or if you need this information about a particular partition:
describe formatted <your_table_name> partition (<partition_field>=<value>);
First of all, enable hive support when you create your spark session:
spark = SparkSession.builder.appName('AppName').enableHiveSupport().getOrCreate()
And then:
df_desc = spark.sql('describe formatted <your_table_name>')
df_desc.show()
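To pull out just the creation time instead of printing everything, you could scan the collected rows for the relevant label. A minimal sketch, assuming each row is a (col_name, data_type, comment) triple and that the label is either "CreateTime:" or "Created Time" depending on the Spark/Hive version; the helper find_created_time and the sample rows are hypothetical.

```python
from typing import Iterable, Optional, Sequence

# These labels are assumptions; print df_desc once to see what your version emits.
CREATED_LABELS = {"createtime", "created time"}


def find_created_time(rows: Iterable[Sequence[Optional[str]]]) -> Optional[str]:
    """Return the creation-time value from describe-formatted rows, if present."""
    for row in rows:
        label = (row[0] or "").strip().rstrip(":").lower()
        if label in CREATED_LABELS:
            return (row[1] or "").strip() or None
    return None


# With a live session this would be: rows = df_desc.collect()
rows = [
    ("id", "int", None),
    ("# Detailed Table Information", "", ""),
    ("Created Time", "Mon Oct 03 09:15:42 UTC 2016", ""),
]
print(find_created_time(rows))
```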
Note: this is nearly a duplicate of this question with the distinction that in this case, the source table is date partitioned and the destination table does not yet exist. Also, the accepted solution to that question didn't work in this case.
I'm trying to copy a single day's worth of data from one date-partitioned table into a new date-partitioned table that I have not yet created. My hope was that BigQuery would simply create the date-partitioned destination table for me, as it usually does in the non-date-partitioned case.
Using BigQuery CLI, here's my command:
bq cp mydataset.sourcetable\$20161231 mydataset.desttable\$20161231
Here's the output of that command:
BigQuery error in cp operation: Error processing job
'myproject:bqjob_bqjobid': Partitioning specification must be provided
in order to create partitioned table
I've tried doing something similar using the python SDK: running a select command on a date partitioned table (which selects data from only one date partition) and saving the results into a new destination table (which I hope would also be date partitioned). The job fails with the same error:
{u'message': u'Partitioning specification must be provided in order to
create partitioned table', u'reason': u'invalid'}
Clearly I need to add a partitioning specification, but I couldn't find any documentation on how to do so.
You need to create the partitioned destination table first (as per the docs):
If you want to copy a partitioned table into another partitioned
table, the partition specifications for the source and destination
tables must match.
So, just create the destination partitioned table before you start copying. If you can't be bothered specifying the schema, you can create the destination partitioned table like so:
bq mk --time_partitioning_type=DAY mydataset.temps
Then, use a query instead of a copy to write to the destination table. The schema will be copied with it:
bq query --allow_large_results --replace --destination_table 'mydataset.temps$20160101' 'SELECT * from `source`'
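If you script these two steps, building the argument lists in code avoids the shell-quoting pitfalls around $ and backticks. A minimal sketch that only uses the bq flags shown above; the helper names partition_decorator and build_commands are hypothetical, and nothing is executed here.

```python
import shlex
from datetime import date


def partition_decorator(table: str, day: date) -> str:
    """Return 'table$YYYYMMDD', the decorator addressing one daily partition."""
    return f"{table}${day:%Y%m%d}"


def build_commands(dest_table: str, day: date, sql: str) -> list:
    """Build the two bq invocations as argument lists (no shell quoting needed)."""
    mk = ["bq", "mk", "--time_partitioning_type=DAY", dest_table]
    query = [
        "bq", "query", "--allow_large_results", "--replace",
        "--destination_table", partition_decorator(dest_table, day),
        sql,
    ]
    return [mk, query]


commands = build_commands("mydataset.temps", date(2016, 1, 1), "SELECT * from `source`")
for cmd in commands:
    # shlex.join shows how the command would look when typed in a shell.
    print(shlex.join(cmd))
```

Each list could be handed to subprocess.run as-is, which passes the decorator and the query to bq without any shell interpretation.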
I want to add a unique value to my Hive table whenever I insert a record, and that value should not be repeated anywhere in the table. I have not been able to find a solution or a function for this. In my case I want to insert the records into Hive using Pig Latin. Please help.
Hive does not provide RDBMS-style constraints.
The suggested approach using a Pig script is as below.
1. Load data
2. Apply DISTINCT to data
3. Store data at a location
4. Create external hive table at the same location.
Step 3 and 4 can be combined if you can use HCATALOG which allows you to directly store data in Hive table.
Official documentation :Link 1 link 2
Did you take a look at this? https://github.com/manojkumarvohra/hive-hilo It seems to provide a way to generate sequence numbers in Hive using the hi/lo algorithm.
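For context, the general hi/lo idea is that each writer reserves a whole block of ids from a shared counter and then numbers records locally within that block. The sketch below illustrates that idea only and is not taken from the linked project; the class name HiLoGenerator and the in-memory counter are assumptions (a real setup would need a durable shared counter).

```python
import itertools


class HiLoGenerator:
    """Hand out unique ids using blocks reserved from a shared counter."""

    def __init__(self, next_hi, block_size=100):
        self._next_hi = next_hi        # callable returning a fresh 'hi' value
        self._block_size = block_size
        self._hi = None
        self._lo = block_size          # forces a reservation on first use

    def next_id(self):
        if self._lo >= self._block_size:
            # Block exhausted: reserve a new one from the shared counter.
            self._hi = self._next_hi()
            self._lo = 0
        value = self._hi * self._block_size + self._lo
        self._lo += 1
        return value


# In-memory stand-in for the shared counter (assumption for illustration).
shared = itertools.count()
next_hi = lambda: next(shared)

# Two independent writers drawing from the same counter never collide.
a = HiLoGenerator(next_hi, block_size=3)
b = HiLoGenerator(next_hi, block_size=3)
ids = [a.next_id(), b.next_id(), a.next_id(), a.next_id(), a.next_id()]
print(ids)
```

The ids are unique but not gap-free or strictly ordered across writers, which is the usual trade-off of this scheme.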
I am creating tables in Hive and inserting into them; some of the files are stored on HDFS and some on external storage (S3).
Assuming I have created 10 tables, is there any system table in Hive where I can find information about the tables created by a user? (For example, Teradata has DBC.tablesv, which holds information about all user-defined tables.)
You can find where your metastore is configured in the hive-site.xml file.
Its usual location is under /etc/hive/{$hadoop_version}/ or /etc/hive/conf/.
grep for "hive.metastore.uris" or "javax.jdo.option.ConnectionURL" to see which db you are using for the metastore. The credentials should also be there.
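As an alternative to grep, the two properties can be read with an XML parser. A minimal sketch, assuming the usual <configuration>/<property>/<name>/<value> layout of hive-site.xml; the function name metastore_settings and the sample content are hypothetical.

```python
import xml.etree.ElementTree as ET

KEYS = ("hive.metastore.uris", "javax.jdo.option.ConnectionURL")


def metastore_settings(xml_text: str) -> dict:
    """Return the metastore-related properties found in hive-site.xml text."""
    root = ET.fromstring(xml_text)
    found = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        if name in KEYS:
            found[name] = (prop.findtext("value") or "").strip()
    return found


# Hypothetical file content; on a real host, read the file from its actual path.
sample = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://metastore-host:3306/hive</value>
  </property>
  <property>
    <name>hive.exec.scratchdir</name>
    <value>/tmp/hive</value>
  </property>
</configuration>
"""
print(metastore_settings(sample))
```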
If, for example, your metastore is on a MySQL server, you can run queries like
SELECT * FROM TBLS;
SELECT * FROM PARTITIONS;
etc
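A more targeted query joins the tables to their databases and reads the creation time. The sketch below runs against an in-memory SQLite stand-in so it is self-contained; the column names (DB_ID, NAME, TBL_NAME, OWNER, CREATE_TIME) and the epoch-seconds encoding are assumptions about the metastore schema that you should check against your version, and the sample row is made up.

```python
import sqlite3
from datetime import datetime, timezone

# In-memory stand-in for the metastore database. The column names below are
# assumptions about the metastore schema; check them against your version.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DBS  (DB_ID INTEGER, NAME TEXT);
    CREATE TABLE TBLS (TBL_ID INTEGER, DB_ID INTEGER, TBL_NAME TEXT,
                       OWNER TEXT, CREATE_TIME INTEGER);
    INSERT INTO DBS  VALUES (1, 'default');
    INSERT INTO TBLS VALUES (10, 1, 'web_logs', 'hadoop', 1475486142);
""")

rows = conn.execute("""
    SELECT d.NAME, t.TBL_NAME, t.OWNER, t.CREATE_TIME
    FROM TBLS t JOIN DBS d ON d.DB_ID = t.DB_ID
    ORDER BY t.CREATE_TIME
""").fetchall()

for db, table, owner, created in rows:
    # CREATE_TIME is assumed to hold seconds since the Unix epoch.
    when = datetime.fromtimestamp(created, tz=timezone.utc)
    print(f"{db}.{table} owner={owner} created={when:%Y-%m-%d %H:%M:%S} UTC")
```

Against a real MySQL metastore, only the SELECT would be needed, run with the credentials from hive-site.xml.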
You can't query (as in SELECT ... FROM...) the metadata from within Hive.
You do, however, have commands that display that information, e.g. show databases, show tables, desc MyTable, etc.
I'm not sure I fully understood your question. If you mean information about how the table was created (the CREATE statement itself, the location on HDFS, table properties, etc.), you can try:
SHOW CREATE TABLE <table>;
If you need to retrieve a list of the columns names and datatypes try with:
DESCRIBE <table>;
Can anyone please suggest how to create a partitioned table in BigQuery?
Example: suppose I have log data for the year 2016 in Google Cloud Storage. All of the data is stored in one bucket, organized by year, month and date. I want to create a table partitioned by date.
Thanks in advance.
Documentation for partitioned tables is here:
https://cloud.google.com/bigquery/docs/creating-partitioned-tables
In this case, you'd create a partitioned table and populate the partitions with the data. You can run a query job that reads from GCS (and filters data for the specific date) and writes to the corresponding partition of a table. For example, to load data for May 1st, 2016 -- you'd specify the destination_table as table$20160501.
Currently, you'll have to run several query jobs to achieve this process. Please note that you'll be charged for each query job based on bytes processed.
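Since one query job per day is needed, it can help to generate the per-day destination names up front. A minimal sketch that only builds the table$YYYYMMDD strings for a date range; the function name daily_destinations is hypothetical and no BigQuery call is made.

```python
from datetime import date, timedelta


def daily_destinations(table: str, start: date, end: date) -> list:
    """List 'table$YYYYMMDD' for every day from start to end, inclusive."""
    destinations = []
    day = start
    while day <= end:
        destinations.append(f"{table}${day:%Y%m%d}")
        day += timedelta(days=1)
    return destinations


print(daily_destinations("table", date(2016, 5, 1), date(2016, 5, 3)))
```

Each returned name would be used as the destination_table of the query job that filters the source data for that day.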
Please see this post for some more details:
Migrating from non-partitioned to Partitioned tables
There are two options:
Option 1
You can load each daily file into its own table, named YourLogs_YYYYMMDD.
See details on how to Load Data from Cloud Storage
After the tables are created, you can access them using either Table wildcard functions (Legacy SQL) or a Wildcard Table (Standard SQL). See also Querying Multiple Tables Using a Wildcard Table for more examples.
Option 2
You can create a Date-Partitioned Table (just one table, YourLogs), but you will still need to load each daily file into its respective partition. See Creating and Updating Date-Partitioned Tables.
After the table is loaded, you can easily Query Date-Partitioned Tables.
Partitions on an External Table are not allowed as of now. There is a Feature Request for it:
https://issuetracker.google.com/issues/62993684
(please vote for it if you're interested in it!)
Google says that they are considering it.
I have created a table in Hive and would like to know which directory it was created in. How can I find the path?
DESCRIBE FORMATTED my_table;
or
DESCRIBE FORMATTED my_table PARTITION (my_column='my_value');
There are three ways to describe a table in Hive.
1) To see the basic information about a Hive table, use the describe table_name; command
2) To see more detailed information about the table, use the describe extended table_name; command
3) To see all of that detailed information in a cleaner, more readable layout, use the describe formatted table_name; command
Resource: Hive interview tips
You can use the commands below for this.
show create table <table>;
desc formatted <table>;
describe formatted <table>;
DESCRIBE FORMATTED <tablename>
or
DESCRIBE EXTENDED <tablename>
I prefer formatted because its output is more human-readable.
To see both the structure and the location (directory) of any table, internal or external, we can use the table's create statement:
show create table table_name;
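If you capture that statement in a script, the path can be extracted from the LOCATION clause. A minimal sketch, assuming the output contains a LOCATION keyword followed by a single-quoted path; the function name location_from_ddl and the sample DDL (host and path included) are hypothetical.

```python
import re
from typing import Optional


def location_from_ddl(ddl: str) -> Optional[str]:
    """Return the quoted path that follows the LOCATION keyword, if any."""
    match = re.search(r"\bLOCATION\s+'([^']+)'", ddl, flags=re.IGNORECASE)
    return match.group(1) if match else None


# Hypothetical output; the host and path are made up for illustration.
ddl = """CREATE EXTERNAL TABLE `table_name`(
  `id` int)
LOCATION
  'hdfs://namenode:8020/data/table_name'
TBLPROPERTIES (
  'transient_lastDdlTime'='1475486142')"""

print(location_from_ddl(ddl))
```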
In Hive 0.10 you can use SHOW CREATE TABLE to find the path where Hive stores the data.
In other versions, there is no good way to do this.
Updated:
thanks Joe K
use DESCRIBE FORMATTED <table> to show table information.
ps: database.tablename is not supported here.
Further to pensz's answer, you can get more info using:
DESCRIBE EXTENDED my_table;
or
DESCRIBE EXTENDED my_table PARTITION (my_column='my_value');
By default, Hive managed tables are stored under the warehouse directory in HDFS, at a location like the one below.
hadoop fs -ls /user/hive/warehouse/databasename.db/tablename
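The same convention can be written as a small helper. This is a sketch under stated assumptions: the warehouse root is /user/hive/warehouse (it is configurable), the table was created without an explicit location, and tables in the default database sit directly under the warehouse root. The function name default_table_path is hypothetical.

```python
import posixpath


def default_table_path(database: str, table: str,
                       warehouse: str = "/user/hive/warehouse") -> str:
    """Guess where a managed table lives when no LOCATION was given."""
    if database == "default":
        # Tables in the default database sit directly under the warehouse root.
        return posixpath.join(warehouse, table)
    return posixpath.join(warehouse, f"{database}.db", table)


print(default_table_path("databasename", "tablename"))
print(default_table_path("default", "tablename"))
```

This is only a guess at the path; DESCRIBE FORMATTED reports the actual location recorded for the table.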
If you use Hue, you can browse the table in the Metastore App and then click on 'View file location': that will open the HDFS File Browser in its directory.
It is in the 'default' directory if you have not explicitly specified a location.
You can use describe and describe extended to see the table structure.