Adding a comma-separated table to Hive - hive

I have a very basic question: how can I add a very simple table to Hive? My table is saved in a text file (.txt) in HDFS. I have tried to create an external table in Hive that points to this file, but when I run an SQL query (select * from table_name) I don't get any output.
Here is an example code:
create external table Data (
dummy INT,
account_number INT,
balance INT,
firstname STRING,
lastname STRING,
age INT,
gender CHAR(1),
address STRING,
employer STRING,
email STRING,
city STRING,
state CHAR(2)
)
LOCATION 'hdfs:///KibTEst/Data.txt';
KibTEst/Data.txt is the path of the text file in HDFS.
The rows in the table are separated by carriage returns, and the columns are separated by commas.
Thanks for your help!

You just need to create an external table pointing to your file location in HDFS, with the delimiter properties set as below:
create external table Data (
dummy INT,
account_number INT,
balance INT,
firstname STRING,
lastname STRING,
age INT,
gender CHAR(1),
address STRING,
employer STRING,
email STRING,
city STRING,
state CHAR(2)
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
LOCATION 'hdfs:///KibTEst';  -- LOCATION must be the directory containing Data.txt, not the file itself
Then you just run a select query (since the file is already in HDFS, the external table fetches the data from it directly once the location is given in the create statement). Test with the select statement below:
SELECT * FROM Data;
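If the query still returns nothing, it is worth checking what the metastore actually recorded for the table; DESCRIBE FORMATTED shows the resolved location, SerDe, and delimiter properties:
DESCRIBE FORMATTED Data;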

create external table Data (
dummy INT,
account_number INT,
balance INT,
firstname STRING,
lastname STRING,
age INT,
gender CHAR(1),
address STRING,
employer STRING,
email STRING,
city STRING,
state CHAR(2)
)
row format delimited
FIELDS TERMINATED BY ','
stored as textfile
LOCATION 'Your hdfs location for external table';
If the data is already in HDFS, then use:
LOAD DATA INPATH 'hdfs_file_or_directory_path' INTO TABLE tablename;
Then use select * from table_name;

create external table Data (
dummy INT,
account_number INT,
balance INT,
firstname STRING,
lastname STRING,
age INT,
gender CHAR(1),
address STRING,
employer STRING,
email STRING,
city STRING,
state CHAR(2)
)
row format delimited
FIELDS TERMINATED BY ','
stored as textfile
LOCATION '/Data';
Then load the file into the table:
LOAD DATA INPATH '/KibTEst/Data.txt' INTO TABLE Data;
Then
select * from Data;
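Note that LOAD DATA INPATH moves the file rather than copying it, so after the load you can confirm from within the Hive CLI that Data.txt now sits under the table location:
hive> dfs -ls /Data;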

I hope the inputs below answer the question asked by #mshabeen.
There are different ways to load data into a Hive table that is created as an external table.
While creating the Hive external table you can use the LOCATION option and specify the HDFS, S3 (in the case of AWS), or file location from which you want to load data, OR you can use the LOAD DATA INPATH option to load data from HDFS, S3, or a file after creating the Hive table.
Alternatively, you can also use the ALTER TABLE command to load data into Hive partitions.
Below are some details.
Using LOCATION - Used while creating the Hive table. In this case the data is already loaded and available in the Hive table.
Using LOAD DATA INPATH - This Hive command can be used to load data from a specified location. The point to remember here is that the data gets MOVED from the input path to the Hive warehouse path.
Example -
LOAD DATA INPATH 'hdfs://cluster-ip/path/to/data/location/' INTO TABLE tablename;
Using ALTER TABLE command - Mostly this is used to add data from other locations into Hive partitions. In this case all partitions must already be defined and the values for the partitions must already be known. For dynamic partitions this command is not required.
Example -
ALTER TABLE table_name ADD PARTITION (date_col='2018-02-21') LOCATION 'hdfs://cluster-ip/path/to/location/';
The above code will map the partition to the specified data location (in this case HDFS). However, the data will NOT be MOVED to the Hive internal warehouse location.
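For the dynamic-partition case mentioned above, a minimal sketch looks like this (table and column names other than date_col are hypothetical); Hive derives the partition values from the select list:
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
-- partition columns must come last in the select list
INSERT INTO TABLE table_name PARTITION (date_col)
SELECT col1, col2, date_col FROM staging_table;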

Related

How to create a Hive external table with Parquet format

I am trying to create an external table in Hive over data in HDFS with the following query.
CREATE EXTERNAL TABLE `post` (
FileSK STRING,
OriginalSK STRING,
FileStatus STRING,
TransactionType STRING,
TransactionDate STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS PARQUET TBLPROPERTIES("Parquet.compression"="SNAPPY")
LOCATION 'hdfs://.../post'
I am getting the error:
Error while compiling statement: FAILED: ParseException line 11:2
missing EOF at 'LOCATION' near ')'
What is the best way to create a HIVE external table with data stored in parquet format?
I am able to create the table after removing the property TBLPROPERTIES("Parquet.compression"="SNAPPY"):
CREATE EXTERNAL TABLE `post` (
FileSK STRING,
OriginalSK STRING,
FileStatus STRING,
TransactionType STRING,
TransactionDate STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS PARQUET
LOCATION 'hdfs://.../post'
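For the record, the ParseException in the original statement is about clause order: in Hive's CREATE TABLE grammar, TBLPROPERTIES must come after LOCATION. A sketch that keeps the Snappy compression (the property name is conventionally lower-case, and ROW FORMAT DELIMITED is unnecessary for Parquet, so it is dropped):
CREATE EXTERNAL TABLE `post` (
FileSK STRING,
OriginalSK STRING,
FileStatus STRING,
TransactionType STRING,
TransactionDate STRING
)
STORED AS PARQUET
LOCATION 'hdfs://.../post'
TBLPROPERTIES ("parquet.compression"="SNAPPY");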

How to handle embedded commas in Hive?

For example, suppose I have a CSV file with three columns:
sno,name,salary
1,latha, 2000
2,Bhavish, Chaturvedi, 3000
How do I load this type of file into Hive? I tried a few of the posts from Stack Overflow, but they didn't work.
I have created an external table:
create external table test(
id int,
name string,
salary int
)
fields terminated by '\;'
stored as text file;
and loaded the data into it.
But when I ran select * from the table, I got all NULLs.
I think the CSV file has a header row with column names, so you have to skip the header to avoid the error. Follow these steps:
Step 1: Create the table, e.g.
CREATE TABLE salary (sno INT, name STRING, salary INT)
row format delimited fields terminated BY ',' stored as textfile
tblproperties("skip.header.line.count"="1");
Step 2: Load the CSV file into the table, e.g.
load data local inpath 'file path' into table salary;
Step 3: Test the records
select * from salary;
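Note that skipping the header does not handle the embedded commas from the original question. If the fields containing commas are quoted in the file (e.g. "Bhavish, Chaturvedi"), the OpenCSVSerde can parse them; a sketch (the table name test_csv is just for illustration), with the caveat that this SerDe reads every column as STRING:
CREATE EXTERNAL TABLE test_csv (
sno STRING,
name STRING,
salary STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES ("separatorChar" = ",", "quoteChar" = "\"")
STORED AS TEXTFILE
TBLPROPERTIES ("skip.header.line.count"="1");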

Hive and HBase table for hypothesis

I have an IBM cloud instance with Hive/HBase. I just created a table in Hive and loaded some data from a csv file.
My csv file contains information from Google Play Store apps.
My commands for creating the table and uploading data to it are the following:
hive> create table if not exists app_desc (name string,
category string, rating int,
reviews int, installs string,
type string, price int,
content string, genres string,
last_update string, current_ver string,
android_ver string)
row format delimited fields terminated by ',';
hive > load data local inpath '/home/uamibm130/googleplaystore.csv' into table app_desc;
OK, it works correctly, and using a SELECT I obtain the data.
Now what I want to do is create an HBase table; my problem is that I don't know how to do it correctly.
First of all I create an HBase table -> create 'google_db_', 'google_data', 'info_data'
Now I try to create an external table using this Hive command, but what I get is an error that my table is not found.
This is the command I am using for the creation of the external Hive table.
create external table uamibm130_hbase_google (name string, category string, rating int, reviews int, installs string, type string, price int, content string, genres string, last_update string, current_ver string, android_ver string)
stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,
google_data:category,google_data:rating, info_data:reviews,
info_data:installs, info_data:type, info_data:price, info_data:content,
info_data:genres, info_data:last_update, info_data:current_ver,
info_data:android_ver") TBLPROPERTIES("hbase.table.name" = "google_db_");
I don't know the correct way to create an HBase table based on a Hive schema, in order to upload my .csv data correctly.
Any idea? I am new to this.
Thanks!
Try the create table statement below in HBase.
Create the HBase table:
hbase(main):001:0>create 'google_db_','google_data','info_data'
Create the Hive external table on HBase:
hive> create external table uamibm130_hbase_google (name string, category string, rating int, reviews int, installs string, type string, price int, content string, genres string, last_update string, current_ver string, android_ver string)
stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,
google_data:category,google_data:rating, info_data:reviews,
info_data:installs, info_data:type, info_data:price, info_data:content,
info_data:genres, info_data:last_update, info_data:current_ver,
info_data:android_ver") TBLPROPERTIES("hbase.table.name" = "google_db_",
"hbase.mapred.output.outputtable" = "google_db_");
Then insert data into the Hive-HBase table (uamibm130_hbase_google) from the Hive table (app_desc).
Insert data into the Hive-HBase table:
hive> insert into table uamibm130_hbase_google select * from app_desc;
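To confirm the rows landed in HBase, a quick scan from the HBase shell (LIMIT just keeps the output short):
hbase(main):002:0> scan 'google_db_', {LIMIT => 5}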

Hive: Partitioning by part of an integer column

I want to create an external Hive table, partitioned by record type and date (year, month, day). One complication is that the date format I have in my data files is a single integer value, yyyymmddhhmmss, instead of the required date format yyyy-mm-dd hh:mm:ss.
Can I specify 3 new partition columns based on just a single data value? Something like the example below (which doesn't work):
create external table cdrs (
record_id int,
record_detail tinyint,
datetime_start int
)
partitioned by (record_type int, createyear=datetime_start(0,3) int, createmonth=datetime_start(4,5) int, createday=datetime_start(6,7) int)
row format delimited
fields terminated by '|'
lines terminated by '\n'
stored as TEXTFILE
location 'hdfs://nameservice1/tmp/sbx_unleashed.db'
tblproperties ("skip.header.line.count"="1", "skip.footer.line.count"="1");
If you want to be able to use MSCK REPAIR TABLE to add the partitions for you based on the directory structure, you should use the following convention:
The nesting of the directories should match the order of the partition columns.
A directory name should be {partition column name}={value}.
If you intend to add the partitions manually then the structure has no meaning.
Any set of values can be coupled with any directory, e.g. -
alter table cdrs
add if not exists partition (record_type='TYP123',createdate=date '2017-03-22')
location 'hdfs://nameservice1/tmp/sbx_unleashed.db/2017MAR22_OF_TYPE_123';
Assuming directory structure -
.../sbx_unleashed.db/record_type=.../createyear=.../createmonth=.../createday=.../
e.g.
.../sbx_unleashed.db/record_type=TYP123/createyear=2017/createmonth=03/createday=22/
create external table cdrs
(
record_id int
,record_detail tinyint
,datetime_start int
)
partitioned by (record_type string, createyear int, createmonth tinyint, createday tinyint)
row format delimited
fields terminated by '|'
lines terminated by '\n'
stored as TEXTFILE
location 'hdfs://nameservice1/tmp/sbx_unleashed.db'
tblproperties ("skip.header.line.count"="1", "skip.footer.line.count"="1")
;
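To get the year/month/day values out of the yyyymmddhhmmss integer from the original question: partition columns cannot be computed in the DDL, but they can be populated with a dynamic-partition insert from an unpartitioned staging table. A sketch, assuming a hypothetical staging table cdrs_raw holding the same data columns plus record_type (note that a 14-digit yyyymmddhhmmss value overflows INT, so BIGINT or STRING is safer for datetime_start):
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table cdrs partition (record_type, createyear, createmonth, createday)
select
record_id
,record_detail
,datetime_start
,record_type
,cast(substr(cast(datetime_start as string), 1, 4) as int)     -- yyyy
,cast(substr(cast(datetime_start as string), 5, 2) as tinyint) -- mm
,cast(substr(cast(datetime_start as string), 7, 2) as tinyint) -- dd
from cdrs_raw;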
Assuming directory structure -
.../sbx_unleashed.db/record_type=.../createdate=.../
e.g.
.../sbx_unleashed.db/record_type=TYP123/createdate=2017-03-22/
create external table cdrs
(
record_id int
,record_detail tinyint
,datetime_start int
)
partitioned by (record_type string, createdate date)
row format delimited
fields terminated by '|'
lines terminated by '\n'
stored as TEXTFILE
location 'hdfs://nameservice1/tmp/sbx_unleashed.db'
tblproperties ("skip.header.line.count"="1", "skip.footer.line.count"="1")
;

Create External Table atop pre-partitioned data

I have data that looks like this:
/user/me/output/
key1/
part_00000
part_00001
key2/
part_00000
part_00001
key3/
part_00000
part_00001
The data is pre-partitioned by "key_", and the "part_*" files contain my data in the form "a,b,key_". I create an external table:
CREATE EXTERNAL TABLE tester (
a STRING,
b INT
)
PARTITIONED BY (key STRING)
ROW FORMAT
DELIMITED FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/user/me/output/';
But a SELECT * gives no output. How can I create an external table that will read in this partitioned data?
You will have to change your directory structure to make sure that Hive reads the folders. It should be something like this:
/user/me/output/
key=key1/
part_00000
part_00001
key=key2/
part_00000
part_00001
key=key3/
part_00000
part_00001
Once this is done, you can create a table on top of it using the query you mentioned.
CREATE EXTERNAL TABLE tester (
a STRING,
b INT
)
PARTITIONED BY (key STRING)
ROW FORMAT
DELIMITED FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/user/me/output/';
You will also have to explicitly add the partitions, or run msck repair on the table to load the partitions into the Hive metadata. Either of these would do:
msck repair table tester;
OR
Alter table tester ADD PARTITION (key = 'key1');
Alter table tester ADD PARTITION (key = 'key2');
Alter table tester ADD PARTITION (key = 'key3');
Once you have done this, queries will return the output present in your folders.
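Once the partitions are registered, you can list them to verify before querying:
show partitions tester;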