How to insert a multivalued field into one column in Hive

I have a CSV file of the format
(id,name,courses)
and the data looks like
1,"ABC","C,C++,DS"
2,"DEF","Java"
How do I load this kind of data into Hive?

First, create a table whose columns match the file:
hive> create table tablename(id INT, name STRING, courses STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
Then load the data into Hive:
hive> LOAD DATA INPATH '/hdfspath' OVERWRITE INTO TABLE tablename;
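Note that a plain comma delimiter will also split inside the quoted "C,C++,DS" value, so the courses column would be cut off at the first embedded comma. One way to respect the quotes is the OpenCSVSerde bundled with Hive; a minimal sketch, reusing the names from the question (OpenCSVSerde reads every column as STRING):
CREATE TABLE tablename (id STRING, name STRING, courses STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES ("separatorChar" = ",", "quoteChar" = "\"")
STORED AS TEXTFILE;
LOAD DATA INPATH '/hdfspath' OVERWRITE INTO TABLE tablename;
You can then break the multivalued column apart at query time, e.g. SELECT id, name, split(courses, ',') FROM tablename;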

Related

How to load a "|" delimited file into hive without creating a hive table with "ROW FORMAT DELIMITER"

I am trying to load a local file with "|" delimited values into a Hive table. We usually create the table with ROW FORMAT DELIMITED FIELDS TERMINATED BY '|', but I want to create a plain table and specify the delimiter when loading the data. What is the right syntax? Please suggest.
Working Code
CREATE TABLE IF NOT EXISTS testdb.TEST_DATA_TABLE
( column1 string,
  column2 bigint
) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|';
LOAD DATA LOCAL INPATH 'xxxxx.csv' INTO TABLE testdb.TEST_DATA_TABLE;
But I want to do :
CREATE TABLE IF NOT EXISTS testdb.TEST_DATA_TABLE
( column1 string,
  column2 bigint
);
LOAD DATA LOCAL INPATH 'xxxxx.csv' INTO TABLE testdb.TEST_DATA_TABLE FIELDS TERMINATED BY '|';
Reason being: if I create the table that way, HDFS will store the table's data with the "|" delimiter.
With the second DDL you provided, Hive will create a table in the default format for your configuration (text, ORC, Parquet, etc.) delimited by Ctrl-A (\001), Hive's default field delimiter. LOAD DATA takes no FIELDS TERMINATED BY clause; it only moves the file into the table's directory.
If you want the HDFS file stored pipe-delimited, you need to create the Hive table as a text table with '|' as the delimiter,
or
you can write the result of a SELECT query to a local or HDFS path with a pipe delimiter.
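For the second option, Hive 0.11 and later let you set the delimiter when writing query results out to a directory; a minimal sketch (the output path here is only an example):
INSERT OVERWRITE DIRECTORY '/tmp/pipe_delimited_output'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
SELECT * FROM testdb.TEST_DATA_TABLE;
Add LOCAL after OVERWRITE to write to the local filesystem instead of HDFS.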

How to handle embedded commas in Hive?

For example, if I have a CSV file with three columns,
sno,name,salary
1,latha, 2000
2,Bhavish, Chaturvedi, 3000
how do I load this type of file into Hive? I tried a few of the posts from Stack Overflow, but it didn't work.
I have created an external table:
create external table test(
id int,
name string,
salary int
)
row format delimited
fields terminated by '\;'
stored as textfile;
and loaded the data into it.
But when I run select * from the table, I get all NULLs.
I think your CSV file has a header row with the column names; you have to skip the header to avoid the error. Follow these steps:
Step 1: Create the table, e.g.
CREATE TABLE salary (sno INT, name STRING, salary INT)
row format delimited fields terminated BY ',' stored as textfile
tblproperties("skip.header.line.count"="1");
Step 2: Load the CSV file into the table, e.g.
load data local inpath 'file path' into table salary;
Step 3: Test the records
select * from salary;
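Note that skipping the header does not handle the embedded comma in "Bhavish, Chaturvedi"; that row still splits into four fields. If the value can be quoted in the source file (2,"Bhavish, Chaturvedi",3000), the OpenCSVSerde approach from the first answer applies here too; a minimal sketch (OpenCSVSerde reads all columns as STRING, so cast in queries as needed):
CREATE EXTERNAL TABLE test (sno STRING, name STRING, salary STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES ("separatorChar" = ",", "quoteChar" = "\"")
STORED AS TEXTFILE
TBLPROPERTIES ("skip.header.line.count" = "1");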

Load data into Hive table

My data format is:
1::Toy Story (1995)::Animation|Children's|Comedy
but when I try to load the data into Hive, the third column is not read from the file correctly.
I created table as follows:
hive> create table movies(mid int,mname string,gn string)
row format delimited
fields terminated by '::'
lines terminated by '\n'
stored as TEXTFILE;
If the table won't read the data, try changing the field delimiter; note that the default text SerDe honors only a single delimiter character, so a two-character '::' may not be applied as you expect. Declaring the genres column as an array also helps with the third column:
hive> create table movies(mid int,mname string,gn array<string>)
row format delimited
fields terminated by '::'
collection items terminated by '|'
lines terminated by '\n'
stored as TEXTFILE;
Now you can load your dataset.
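If '::' is still not applied (again, the default text SerDe uses only one delimiter character), the MultiDelimitSerDe shipped in hive-contrib supports multi-character delimiters; a sketch, assuming hive-contrib is on the classpath (in recent Hive releases the class moved to org.apache.hadoop.hive.serde2.MultiDelimitSerDe):
CREATE TABLE movies (mid INT, mname STRING, gn ARRAY<STRING>)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe'
WITH SERDEPROPERTIES ("field.delim" = "::", "collection.delim" = "|")
STORED AS TEXTFILE;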

How to alter a schema by inserting a new column in Hive

I have a Hive table stored on the cluster. I want to modify it by adding a new column, keeping the old columns' data and populating the new column with data from another table. Is there a way to do it without recreating the table?
the old schema looks like:
create external table XXX
(item_id bigint,
start_dt string,
end_dt string,
title string,
subtitle string,
description string)
row format delimited fields terminated by '\t' lines terminated by '\n'
stored as textfile
location '/user/me/XXX';
You should be able to do it using the syntax below.
ALTER TABLE table_name
[PARTITION partition_spec] -- (Note: Hive 0.14.0 and later)
ADD|REPLACE COLUMNS (col_name data_type [COMMENT col_comment], ...)
[CASCADE|RESTRICT] -- (Note: Hive 0.15.0 and later)
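For example, against the table above (the column name new_col is just an illustration):
ALTER TABLE XXX ADD COLUMNS (new_col string COMMENT 'column added later');
ADD COLUMNS only changes the metadata, so existing rows return NULL for the new column. To populate it from another table you still have to rewrite the data, for instance with an INSERT OVERWRITE ... SELECT; here other_table and its item_id join key are hypothetical:
INSERT OVERWRITE TABLE XXX
SELECT x.item_id, x.start_dt, x.end_dt, x.title, x.subtitle, x.description, o.new_col
FROM XXX x
JOIN other_table o ON o.item_id = x.item_id;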

Create External Table atop pre-partitioned data

I have data that looks like this:
/user/me/output/
key1/
part_00000
part_00001
key2/
part_00000
part_00001
key3/
part_00000
part_00001
The data is pre-partitioned by "key", and the "part_*" files contain my data in the form "a,b,key". I create an external table:
CREATE EXTERNAL TABLE tester (
a STRING,
b INT
)
PARTITIONED BY (key STRING)
ROW FORMAT
DELIMITED FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/user/me/output/';
But a SELECT * gives no output. How can I create an external table that will read in this partitioned data?
You will have to change your directory structure to make sure that Hive reads the folders. It should look like this:
/user/me/output/
key=key1/
part_00000
part_00001
key=key2/
part_00000
part_00001
key=key3/
part_00000
part_00001
Once this is done you can create a table on top of this using the query you mentioned.
CREATE EXTERNAL TABLE tester (
a STRING,
b INT
)
PARTITIONED BY (key STRING)
ROW FORMAT
DELIMITED FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/user/me/output/';
You will also have to explicitly add the partitions, or run MSCK REPAIR on the table, to register the partitions in the Hive metastore. Either of these will do:
msck repair table tester;
OR
ALTER TABLE tester ADD PARTITION (key = 'key1');
ALTER TABLE tester ADD PARTITION (key = 'key2');
ALTER TABLE tester ADD PARTITION (key = 'key3');
Once you have done this, queries will return the data present in your folders.
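If renaming the directories is not an option, you can instead point each partition at its existing folder explicitly (using the paths from the question):
ALTER TABLE tester ADD PARTITION (key = 'key1') LOCATION '/user/me/output/key1';
ALTER TABLE tester ADD PARTITION (key = 'key2') LOCATION '/user/me/output/key2';
ALTER TABLE tester ADD PARTITION (key = 'key3') LOCATION '/user/me/output/key3';
Keep in mind that MSCK REPAIR TABLE only discovers directories named in the key=value form, so with this layout the partitions must be added by hand.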