How to create an external Hive table if the field value has comma-separated values - hive

I used the sqoop-import command to import data into Hive from Teradata. The sqoop-import command creates a text file with a comma (,) as the delimiter.
After the import, I created an external table as shown below:
CREATE EXTERNAL TABLE IF NOT EXISTS employee ( eid int, name String,
salary String, description String)
COMMENT 'Employee details'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
But the description column has values like "abc,xyz,mnl". Because of this, the data is not loaded into the Hive table correctly. So how can I create a text file with a delimiter other than a comma while sqooping?
And how should I delimit the fields when creating the external Hive table?

Use --fields-terminated-by in your Sqoop job if you want to avoid the default delimiter.
--fields-terminated-by sets the field separator character in the output.
Example: --fields-terminated-by |
Then change the field separator in the CREATE TABLE statement accordingly: FIELDS TERMINATED BY '|'.
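As a minimal end-to-end sketch, assuming a pipe delimiter; the connection string, credentials, source table name, and target directory below are illustrative placeholders, not from the original post:

sqoop import \
  --connect jdbc:teradata://tdhost/DATABASE=mydb \
  --username myuser -P \
  --table EMPLOYEE \
  --fields-terminated-by '|' \
  --target-dir /user/hive/warehouse/employee_staging

-- matching external table pointing at the Sqoop target directory
CREATE EXTERNAL TABLE IF NOT EXISTS employee ( eid int, name String,
salary String, description String)
COMMENT 'Employee details'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/user/hive/warehouse/employee_staging';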

Related

How to load a "|" delimited file into Hive without creating the table with "ROW FORMAT DELIMITED"

I am trying to load a local file with "|" delimited values into a Hive table. We usually create the table with ROW FORMAT DELIMITED FIELDS TERMINATED BY "|", but I want to create a plain table and then load the data. What is the right syntax I need to use? Please suggest.
Working Code
CREATE TABLE IF NOT EXISTS testdb.TEST_DATA_TABLE
( column1 string,
  column2 bigint
) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|';
LOAD DATA LOCAL INPATH 'xxxxx.csv' INTO TABLE testdb.TEST_DATA_TABLE;
But I want to do:
CREATE TABLE IF NOT EXISTS testdb.TEST_DATA_TABLE
( column1 string,
  column2 bigint
);
LOAD DATA LOCAL INPATH 'xxxxx.csv' INTO TABLE testdb.TEST_DATA_TABLE FIELDS TERMINATED BY '|';
Reason being: if I create the table that way, HDFS will store the data in the table with the "|" delimiter.
With the second DDL you have provided, Hive will create a table in its default format (text, ORC, Parquet, etc., depending on your configuration) expecting a Ctrl+A (\001) delimited file, which is the default delimiter in Hive.
If you want the HDFS file stored pipe-delimited, then you need to create the Hive table as a text table with the | delimiter.
Alternatively, you can write the result of a SELECT query to a local or HDFS path with a pipe delimiter.
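A minimal sketch of that alternative, assuming Hive 0.11 or later (where INSERT OVERWRITE DIRECTORY accepts a ROW FORMAT clause); the output path is a placeholder:

INSERT OVERWRITE DIRECTORY '/tmp/test_data_pipe'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
SELECT * FROM testdb.TEST_DATA_TABLE;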

Load data into HIVE table

My data format is:
1::Toy Story (1995)::Animation|Children's|Comedy
When I try to load the data into Hive, the 3rd column is not read from the file correctly.
I created the table as follows:
hive> create table movies(mid int,mname string,gn string)
row format delimited
fields terminated by '::'
lines terminated by '\n'
stored as TEXTFILE;
If the table won't read the data, try changing the field delimiter to the relevant unicode/escaped equivalent of '::'. To capture the pipe-separated genres, you can also declare the third column as an array:
hive> create table movies(mid int,mname string,gn array<string>)
row format delimited
fields terminated by '::'
collection items terminated by '|'
lines terminated by '\n'
stored as TEXTFILE;
Now you can load your dataset.
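A minimal sketch of loading and querying the dataset, assuming a local file in the format shown above (the path below is a placeholder):

LOAD DATA LOCAL INPATH '/tmp/movies.dat' INTO TABLE movies;
-- gn is an array<string>, so individual genres can be addressed by index
SELECT mid, mname, gn[0] FROM movies LIMIT 5;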

Hive: using quote character as delimiter in data files

Can we use a quote character (" or ') as the delimiter in Hive data files? If not, why not?
It would be great if we could refer to a list of characters that can be used as delimiters for Hive data.
When using decimal notation, you can use the whole basic ASCII range (decimal 0-127); tested.
Avoid using \n or \r.
As for " and ', it can be done straightforwardly:
create table mytable (i int,j int) row format delimited fields terminated by '"';
create table mytable (i int,j int) row format delimited fields terminated by "'";
or
create table mytable (i int,j int) row format delimited fields terminated by '\'';
create table mytable (i int,j int) row format delimited fields terminated by "\"";
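A minimal sketch of the decimal notation mentioned above, assuming the default delimited SerDe's behaviour of parsing a multi-character delimiter string as a decimal byte value (ASCII 34 is the double quote); the table name is illustrative:

-- equivalent to using " as the field delimiter, written as decimal 34
create table mytable_dec (i int, j int) row format delimited fields terminated by '34';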

how to alter schema by inserting a new column in hive

I have a Hive table stored on the cluster. I want to modify it by adding a new column, keeping the old columns' data and filling the new column with data from another table. Is there a way to do it without recreating the table?
The old schema looks like:
create external table XXX
(item_id bigint,
start_dt string,
end_dt string,
title string,
subtitle string,
description string)
row format delimited fields terminated by '\t' lines terminated by '\n'
stored as textfile
location '/user/me/XXX';
You should be able to do it using the syntax below.
ALTER TABLE table_name
[PARTITION partition_spec] -- (Note: Hive 0.14.0 and later)
ADD|REPLACE COLUMNS (col_name data_type [COMMENT col_comment], ...)
[CASCADE|RESTRICT] -- (Note: Hive 0.15.0 and later)
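A minimal sketch applied to the table above; the new column name (category) and the source table (other_table) are placeholders, not from the original post:

-- add the column in place, without recreating the table
ALTER TABLE XXX ADD COLUMNS (category string COMMENT 'new column');

-- then populate it by rewriting the data, pulling the new values from the other table
INSERT OVERWRITE TABLE XXX
SELECT x.item_id, x.start_dt, x.end_dt, x.title, x.subtitle, x.description, o.category
FROM XXX x
LEFT JOIN other_table o ON x.item_id = o.item_id;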

Update column in HIVE

I have a table created from PHP which is in this format:
CREATE EXTERNAL TABLE IF NOT EXISTS {$tableName} (fileContent VARCHAR(250), description VARCHAR(250), dimension DOUBLE, fileName VARCHAR(250)) ROW FORMAT
DELIMITED FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/var/www/ASOIS_Proiect/metadata/'
In one situation I want to update only the description field if fileName='a' and size='12' already exist in the database.
Any ideas? I tried to update the file created for the insert using the LOAD command with the OVERWRITE flag, but it is not working.
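For reference, a minimal sketch of the attempted LOAD with OVERWRITE described above; the file name is a placeholder and {$tableName} is the PHP variable from the DDL:

LOAD DATA LOCAL INPATH '/var/www/ASOIS_Proiect/metadata/updated_metadata.csv'
OVERWRITE INTO TABLE {$tableName};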