I have a text file with pipe as the delimiter. I want to load the data into an RC-formatted, partitioned Hive table. I can do that as a two-step process: first load into a text-format external table, and from there insert into the RC-formatted partitioned table (a sketch of that two-step process is shown after the error output below).
But the question is: can this be done as a one-step process using the LOAD DATA command (considering that at run time I am not sure about the different partition values)?
I have tried the statement below, but I am getting an error:
load data inpath 'hdfs_file_or_directory_path' OVERWRITE INTO TABLE table1 PARTITION (YEAR_DT)
The RC formatted table structure is as below:
CREATE EXTERNAL TABLE TEST.TABLE1(
Col1 DATE,
Col2 INT,
Col3 DOUBLE,
Col4 VARCHAR(2),
Col5 VARCHAR(3),
Col6 SMALLINT,
Col7 TIMESTAMP
)
partitioned by (YEAR_DT INT)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
STORED AS RCFILE
LOCATION
'hdfs_file_or_directory_path'
TBLPROPERTIES ('transactional'='true');
The error I am getting is given below:
hive> load data inpath '<hdfs path>' OVERWRITE INTO TABLE TABLE1 PARTITION(year_dt);
FAILED: SemanticException org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Invalid partition key & values; keys [year_dt, ], values [])
hive> load data inpath '<hdfs path>' OVERWRITE INTO TABLE TABLE1 PARTITION(year_dt = 2014);
Loading data to table test.test1 partition (year_dt=2014)
Failed with exception Wrong file format. Please check the file's format.
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask
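For what it's worth, here is a minimal sketch of the two-step process mentioned above; the staging table name is a placeholder. LOAD DATA only moves files into the table's directory without converting their format, which is why the pipe-delimited text file cannot go straight into the RCFile table and the second attempt fails with "Wrong file format":
-- Step 1: stage the pipe-delimited text file in a TEXTFILE table
CREATE EXTERNAL TABLE TEST.TABLE1_STAGE (
  Col1 DATE, Col2 INT, Col3 DOUBLE, Col4 VARCHAR(2),
  Col5 VARCHAR(3), Col6 SMALLINT, Col7 TIMESTAMP, YEAR_DT INT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS TEXTFILE;
LOAD DATA INPATH 'hdfs_file_or_directory_path' OVERWRITE INTO TABLE TEST.TABLE1_STAGE;
-- Step 2: rewrite into the partitioned RCFile table, deriving YEAR_DT per row
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT INTO TABLE TEST.TABLE1 PARTITION (YEAR_DT)
SELECT Col1, Col2, Col3, Col4, Col5, Col6, Col7, YEAR_DT
FROM TEST.TABLE1_STAGE;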
Related
I am trying to insert data into an ORC table with Hive v2, but each time I get an error:
ERROR : Job failed with java.lang.NoSuchMethodError:
org.apache.orc.TypeDescription.createRowBatch(I)Lorg/apache/hadoop/hive/ql/exec/vector/VectorizedRowBatch;
Am I missing any dependencies?
You can try this:
Create a Table to LOAD Text data:
CREATE TABLE txt_table(col1 <datatype>, col2 <datatype>) STORED AS TEXTFILE;
LOAD DATA LOCAL INPATH '/../../file.txt' INTO TABLE txt_table;
Load the data into the ORC table (LOAD DATA does not convert file formats, so an INSERT ... SELECT is needed to rewrite the text data as ORC):
CREATE TABLE orc_table(col1 <datatype>, col2 <datatype>) STORED AS ORC;
INSERT INTO TABLE orc_table SELECT * FROM txt_table;
I am trying to load a local file with "|"-delimited values into a Hive table. We usually create the table with the option ROW FORMAT DELIMITED FIELDS TERMINATED BY '|', but I want to create a normal table and then load the data. What is the right syntax I need to use? Please suggest.
Working Code
CREATE TABLE IF NOT EXISTS testdb.TEST_DATA_TABLE
( column1 string,
  column2 bigint
) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|';
LOAD DATA LOCAL INPATH 'xxxxx.csv' INTO TABLE testdb.TEST_DATA_TABLE;
But I want to do :
CREATE TABLE IF NOT EXISTS testdb.TEST_DATA_TABLE
( column1 string,
  column2 bigint
);
LOAD DATA LOCAL INPATH 'xxxxx.csv' INTO TABLE testdb.TEST_DATA_TABLE FIELDS TERMINATED BY '|';
The reason being: if I create the table that way, HDFS will store the data in the table with the "|" delimiter.
With the second DDL you have provided, Hive will create a table in the default format (TEXTFILE, ORC, Parquet, etc., depending on your configuration) with Ctrl+A (\001), Hive's default delimiter, as the field separator.
If you want the HDFS file to be stored pipe-delimited, then you need to create the Hive table as a text table with the '|' delimiter.
(or)
You can also write the result of a SELECT query to a local (or HDFS) path with a pipe delimiter.
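A minimal sketch of that last option, assuming Hive 0.11.0 or later (the output directory is a placeholder):
-- Writes the query result as pipe-delimited text files under the directory
INSERT OVERWRITE DIRECTORY '/tmp/pipe_output'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
SELECT * FROM testdb.TEST_DATA_TABLE;
Add LOCAL after OVERWRITE to write to the local filesystem instead of HDFS.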
I am trying to upload 3,000 records to a Teradata table and I am getting the following error:
Error reading import file at record 1: Index and length must refer to
a location within the string
I am importing the data from a txt file and loading it with the following code:
-- Create the table (CT is Teradata shorthand for CREATE TABLE)
CT mytable
( col1 VARBYTE (35))
-- Insert data; the ? placeholder is filled from each record of the import file
INSERT INTO mytable VALUES(?)
The text file looks something like this
812619
816625
2B01112
...
I have a CSV file of the format
(id,name,courses)
and the data is like
1,"ABC","C,C++,DS"
2,"DEF","Java"
How do I load this type of data into Hive?
First, create a table:
hive> create table tablename(id INT, name STRING, courses STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
Then load the data into Hive:
hive>LOAD DATA INPATH '/hdfspath' OVERWRITE INTO TABLE tablename;
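Note that a plain ',' delimiter will split the quoted courses value ("C,C++,DS") across columns. A sketch using Hive's OpenCSVSerde, which respects quote characters (the table name is a placeholder):
CREATE TABLE courses_csv (id STRING, name STRING, courses STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES ("separatorChar" = ",", "quoteChar" = "\"")
STORED AS TEXTFILE;
LOAD DATA INPATH '/hdfspath' OVERWRITE INTO TABLE courses_csv;
Since OpenCSVSerde treats every column as STRING, cast id in queries if you need it as an INT.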
I am using HDP 2.3
Hadoop version - 2.7.1
Hive version - 1.2.1
I created a table dev101 in Hive using
create table dev101 (col1 int, col2 char(10));
I inserted two records using
insert into dev101 values (1, 'value1');
insert into dev101 values (2, 'value2');
I exported data to HDFS using
export table dev101 to '/tmp/dev101';
Then, I created a new table dev102 using
create table dev102 (col1 int, col2 String);
I imported the data from /tmp/dev101 into dev102 using
import table dev102 from '/tmp/dev101';
I got an error:
FAILED: SemanticException [Error 10120]: The existing table is not compatible with the import spec. Column Schema does not match
Then I created another table dev103 using
create table dev103 (col1 int, col2 char(50));
Again imported:
import table dev103 from '/tmp/dev101';
Same error:
FAILED: SemanticException [Error 10120]: The existing table is not compatible with the import spec. Column Schema does not match
Finally, I created a table with exactly the same schema
create table dev104 (col1 int, col2 char(10));
And imported:
import table dev104 from '/tmp/dev101';
It imported successfully.
Does Hive need the exact same schema in Hive Export/Import?
On Hive export, Hive creates a _metadata file and a data directory, which hold the table metadata and the table data respectively.
On Hive import, you need either a new table (not already present in Hive) or an empty table with exactly the same metadata.
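For reference, the exported layout under /tmp/dev101 looks roughly like this (the data file name below is illustrative and will vary):
/tmp/dev101/_metadata
/tmp/dev101/data/000000_0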