In Hive, can we change the location of a managed/external table?
If yes, how? After changing the location, will it behave like an external table or an internal table?
I tried to search for this question but I didn't get a proper answer.
Yes, we can change the location of a managed table if we specify a location when creating it:
CREATE TABLE weather (wban INT, date STRING, precip INT)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LOCATION '/hive/data/weather';
After creation we can change the location with the command below:
ALTER TABLE table_name SET LOCATION 'hdfs_path';
Even if we change the location, the table will still behave as a managed table.
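A quick way to confirm which kind of table you end up with is to check the "Table Type" line in the describe output, which will read MANAGED_TABLE or EXTERNAL_TABLE:
DESCRIBE FORMATTED weather;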
Yes, it's possible. If you are looking to change an external table's location, use an ALTER statement like the one below:
ALTER TABLE users
SET LOCATION 'hdfs://hostname:port/source_folder_path'
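If only a single partition needs to point somewhere else, the same statement also works at the partition level; a minimal sketch, where the dt partition column and the path are placeholders rather than part of the original example:
-- Partition spec and path below are illustrative only.
ALTER TABLE users PARTITION (dt='2020-01-01')
SET LOCATION 'hdfs://hostname:port/new_folder_path/dt=2020-01-01';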
I have the following table:
CREATE EXTERNAL TABLE aggregate_status(
m_point VARCHAR(50),
territory VARCHAR(50),
reading_meter VARCHAR(50),
meter_type VARCHAR(500)
)
PARTITIONED BY(
insert_date VARCHAR(10))
STORED AS PARQUET
LOCATION '<the s3 route>/aggregate_status'
TBLPROPERTIES(
'parquet.compression'='SNAPPY'
)
I wish to change the reading_meter column to reading_mode, without losing data.
ALTER TABLE works, but the field now shows null for the existing data.
I'm not the owner of the Hadoop environment I'm working on, so changing properties such as set parquet.column.index.access = true is not an option.
Any help would be appreciated. Thanks.
Managed to find a solution, at least for small amounts of data.
Create a backup of the table, with the column name already changed.
CREATE TABLE aggregate_status_bkp AS
SELECT
m_point,
territory,
reading_meter AS reading_mode,
meter_type,
insert_date
FROM aggregate_status
Perform the ALTER TABLE
ALTER TABLE aggregate_status CHANGE COLUMN reading_meter reading_mode VARCHAR(50)
INSERT OVERWRITE from the backup into the original. This rewrites the Parquet files under the new column name, which is why the nulls go away (Hive's Parquet reader resolves columns by name by default).
--You might need to temporarily disable strict partition mode, depending on your case; this is safe here since strict mode is only a safeguard against fully dynamic partition inserts.
--set hive.exec.dynamic.partition.mode=nonstrict;
INSERT OVERWRITE TABLE aggregate_status PARTITION(insert_date)
SELECT
m_point,
territory,
reading_mode,
meter_type,
insert_date
FROM aggregate_status_bkp;
--set hive.exec.dynamic.partition.mode=strict;
Another situation we want to protect against dynamic partition insert is that the user may accidentally specify all partitions to be dynamic partitions without specifying one static partition, while the original intention is to just overwrite the sub-partitions of one root partition. We define another parameter hive.exec.dynamic.partition.mode=strict to prevent the all-dynamic partition case.
See https://cwiki.apache.org/confluence/display/Hive/Tutorial#Tutorial-QueryingandInsertingData
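For reference, an INSERT that keeps at least one static partition key passes even under strict mode; a minimal sketch, with an illustrative date value:
INSERT OVERWRITE TABLE aggregate_status PARTITION (insert_date='2021-01-01')
SELECT m_point, territory, reading_mode, meter_type
FROM aggregate_status_bkp
WHERE insert_date = '2021-01-01';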
Optional: delete the backup table after you're finished.
DROP TABLE aggregate_status_bkp;
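A quick sanity check (my addition, not part of the original steps) is to count non-null values per partition and confirm the renamed column no longer comes back as NULL:
SELECT insert_date, count(reading_mode) AS non_null_readings
FROM aggregate_status
GROUP BY insert_date;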
I have created a partitioned table. I tried two ways of mapping my S3 bucket folders, as shown below, but both ways I get no records when I query with a WHERE clause on the partition columns.
My S3 bucket looks like the following. part*.csv is what I want to query in Athena. There are other folders at the same level as output, and within output.
s3://bucket-rootname/ABC-CASE/report/f78dea49-2c3a-481b-a1eb-5169d2a97747/output/part-filename121231.csv
s3://bucket-rootname/XYZ-CASE/report/678d1234-2c3a-481b-a1eb-5169d2a97747/output/part-filename213123.csv
My table looks like the following.
Version 1:
CREATE EXTERNAL TABLE `mytable_trial1`(
`status` string,
`ref` string)
PARTITIONED BY (
`casename` string,
`id` string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LOCATION
's3://bucket-rootname/'
TBLPROPERTIES (
'has_encrypted_data'='false',
'skip.header.line.count'='1')
ALTER TABLE mytable_trial1 add partition (casename="ABC-CASE",id="f78dea49-2c3a-481b-a1eb-5169d2a97747") location "s3://bucket-rootname/casename=ABC-CASE/report/id=f78dea49-2c3a-481b-a1eb-5169d2a97747/output/";
select * from mytable_trial1 where casename='ABC-CASE' and report='report' and id='f78dea49-2c3a-481b-a1eb-5169d2a97747' and foldername='output';
Version 2:
CREATE EXTERNAL TABLE `mytable_trial1`(
`status` string,
`ref` string)
PARTITIONED BY (
`casename` string,
`report` string,
`id` string,
`foldername` string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LOCATION
's3://bucket-rootname/'
TBLPROPERTIES (
'has_encrypted_data'='false',
'skip.header.line.count'='1')
ALTER TABLE mytable_trial1 add partition (casename="ABC-CASE",report="report",id="f78dea49-2c3a-481b-a1eb-5169d2a97747",foldername="output") location "s3://bucket-rootname/casename=ABC-CASE/report=report/id=f78dea49-2c3a-481b-a1eb-5169d2a97747/foldername=output/";
select * from mytable_trial1 where casename='ABC-CASE' and id='f78dea49-2c3a-481b-a1eb-5169d2a97747'
SHOW PARTITIONS shows this partition, but no records are found with the WHERE clause.
I worked with AWS Support and we were able to narrow down the issue. Version 2 was the right one to use, since it has four partition columns matching my S3 path. Also, the ALTER TABLE command had an issue with the location: I used a Hive-style partition path, which was incorrect since my actual S3 layout is not in Hive format. Correcting the command to the following worked for me:
ALTER TABLE mytable_trial1 add partition (casename="ABC-CASE",report="report",id="f78dea49-2c3a-481b-a1eb-5169d2a97747",foldername="output") location "s3://bucket-rootname/ABC-CASE/report/f78dea49-2c3a-481b-a1eb-5169d2a97747/output/";
Preview table now shows my entries.
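As a follow-up sketch (not from the original answer): once the partition is registered against the real S3 path, you can confirm the mapping and query it, for example:
SHOW PARTITIONS mytable_trial1;
SELECT status, ref
FROM mytable_trial1
WHERE casename = 'ABC-CASE'
  AND id = 'f78dea49-2c3a-481b-a1eb-5169d2a97747';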
Is it possible to get the external table name if the only information I have is the HDFS directory?
For example, I create the table with
CREATE EXTERNAL TABLE IF NOT EXISTS userinfo(id String, name String)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 'hdfs:///user/testuser/log/2019-02-18/';
To get the location from the table name, I can use
show create table userinfo;
But what if I want to get the table name from "hdfs:///user/testuser/log/2019-02-18/"?
Is it possible to find the table name "userinfo" from the directory?
Thanks
David
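One possible approach (a sketch only, assuming you have read access to the database backing the Hive metastore; the database name below assumes a typical MySQL metastore called metastore, so adjust it for your installation) is to search the storage descriptors for that path:
SELECT d.NAME AS db_name, t.TBL_NAME, s.LOCATION
FROM metastore.TBLS t
JOIN metastore.DBS d ON t.DB_ID = d.DB_ID
JOIN metastore.SDS s ON t.SD_ID = s.SD_ID
WHERE s.LOCATION LIKE '%/user/testuser/log/2019-02-18%';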
I created a Hive table and then imported the data from the CSV file.
When I run the ORDER BY query on salary, it gives me the right output, but at the end it lists the column names.
Please see attached screenshot.
Any help would be much appreciated :)
[Screenshot: Creating Hive table]
A screenshot of SELECT * FROM emp_tb does not show the column names.
You can skip the header row from being selected by using
tblproperties("skip.header.line.count"="1"). Add this at the end of your table DDL.
Or you can alter existing table:
ALTER TABLE emp_tb SET TBLPROPERTIES ("skip.header.line.count"="1");
If you want header to be displayed in Hive CLI, set this property in Hive:
set hive.cli.print.header=true;
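For reference, a minimal sketch of where the property sits in a full DDL; the column list and delimiter here are assumptions, so adjust them to match the actual emp_tb layout:
CREATE TABLE emp_tb (
  emp_id INT,
  emp_name STRING,
  salary DOUBLE)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
TBLPROPERTIES ("skip.header.line.count"="1");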
I have the following file on HDFS:
I create the structure of the external table in Hive:
CREATE EXTERNAL TABLE google_analytics(
`session` INT)
PARTITIONED BY (date_string string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LOCATION '/flumania/google_analytics';
ALTER TABLE google_analytics ADD PARTITION (date_string = '2016-09-06') LOCATION '/flumania/google_analytics';
After that, the table structure is created in Hive, but I cannot see any data.
Since it's an external table, the data should be picked up automatically, right?
Your file should have its columns in this order:
int,string
Here your file contents are in the order:
string,int
Change your file to the following:
86,"2016-08-20"
78,"2016-08-21"
It should work.
Also, it is not recommended to use keywords as column names (date).
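If you do need a column literally called date, one option (not shown in the thread; the table name here is just illustrative) is to quote the identifier with backticks in the DDL:
CREATE TABLE example_dates (`date` STRING, sessions INT);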
I think the problem was with the ALTER TABLE command. The code below solved my problem:
CREATE EXTERNAL TABLE google_analytics(
`session` INT)
PARTITIONED BY (date_string string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LOCATION '/flumania/google_analytics/';
ALTER TABLE google_analytics ADD PARTITION (date_string = '2016-09-06');
After these two steps, if you have a date_string=2016-09-06 subfolder containing a CSV file that matches the structure of the table, the data will be picked up automatically and you can already use SELECT queries to see it.
Solved!
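As a side note (not part of the original answer): when the subfolders all follow the Hive-style date_string=YYYY-MM-DD naming, you can also let Hive discover every missing partition at once instead of adding them one by one:
-- Scans the table location for Hive-style partition directories and registers any that are missing.
MSCK REPAIR TABLE google_analytics;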