Inserting data into ORC tables in Hive2 - hive

I am trying to insert data into an ORC table with Hive v2, but each time I get the following error:
ERROR : Job failed with java.lang.NoSuchMethodError:
org.apache.orc.TypeDescription.createRowBatch(I)Lorg/apache/hadoop/hive/ql/exec/vector/VectorizedRowBatch;
Am I missing any dependencies?

You can try this:
Create a table and load the text data into it:
CREATE TABLE txt_table(col1 <datatype>, col2 <datatype>) STORED AS TEXTFILE;
LOAD DATA LOCAL INPATH '/../../file.txt' INTO TABLE txt_table;
Then load the data into the ORC table:
CREATE TABLE orc_table(col1 <datatype>, col2 <datatype>) STORED AS ORC;
INSERT INTO TABLE orc_table SELECT * FROM txt_table;
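For example, a minimal sketch with made-up column names, types, and file path (none of these come from the question; adjust the delimiter to match your file):
CREATE TABLE txt_table(id INT, name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
LOAD DATA LOCAL INPATH '/path/to/file.txt' INTO TABLE txt_table;
CREATE TABLE orc_table(id INT, name STRING) STORED AS ORC;
-- Hive rewrites the text rows as ORC during this insert
INSERT INTO TABLE orc_table SELECT * FROM txt_table;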

Related

columns has 2 elements while hbase.columns.mapping has 3 elements error while creating hive table from hbase

I'm getting the following error when I run the command below to create a Hive table.
sample is the Hive table I'm trying to create; hloan is my existing HBase table. Please help.
create external table sample(id int, name string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES("hbase.columns.mapping"=":key,hl:id,hl:name")
TBLPROPERTIES ("hbase.table.name"="hloan","hbase.mapred.output.outputtable"="sample");
ERROR:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: MetaException(message:org.apache.hadoop.hive.serde2.SerDeException org.apache.hadoop.hive.hbase.HBaseSerDe: columns has 2 elements while hbase.columns.mapping has 3 elements (counting the key if implicit))
As the error describes, your create external table statement has 2 columns: id, name.
In the HBase mapping you have 3 columns: :key, hl:id, hl:name.
Create table with 3 columns:
hive> create external table sample(key int, id int, name string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES("hbase.columns.mapping"=":key,hl:id,hl:name")
TBLPROPERTIES ("hbase.table.name"="hloan","hbase.mapred.output.outputtable"="hloan");
(or)
if the key and id columns contain the same data, then you can skip hl:id in the mapping.
Create table with 2 columns:
hive> create external table sample(id int, name string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES("hbase.columns.mapping"=":key,hl:name")
TBLPROPERTIES ("hbase.table.name"="hloan","hbase.mapred.output.outputtable"="hloan");

How to output a table as a parquet file in spark-sql, not spark-shell?

It is easy to read a table from a CSV file using spark-sql:
CREATE TABLE MyTable (
X STRING,
Y STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
"separatorChar" = "\,",
"quoteChar" = "\""
)
STORED AS TEXTFILE;
LOAD DATA LOCAL INPATH 'input.csv' INTO TABLE MyTable;
But how can I output this result as a Parquet file?
PS: I know how to do that in spark-shell, but it is not what I'm looking for.
You have to create a table in Hive with the schema of your results, stored as Parquet. After getting the results, you can export them into that Parquet table like this:
set hive.insert.into.external.tables=true;
create external table mytable_parq ( use your source table DDL) stored as parquet location '/hadoop/mytable';
insert into mytable_parq select * from mytable ;
or
insert overwrite directory '/hadoop/mytable' STORED AS PARQUET select * from MyTable ;
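If you do not need an external location for the target, a CTAS form should also work; a minimal sketch (the target table name here is made up):
-- create a managed Parquet table directly from the query result
CREATE TABLE mytable_parquet_ctas STORED AS PARQUET AS
SELECT * FROM MyTable;
Any of these statements can be run from the spark-sql shell directly, or saved into a script and executed with spark-sql -f script.sql (the script name is just an example), which keeps the whole job out of spark-shell.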

How to insert multivalued field into one column in hive

I have a CSV file with the format
(id,name,courses)
and the data looks like
1,"ABC","C,C++,DS"
2,"DEF","Java"
How do I load this type of data into Hive?
First, create a table whose columns match the file:
hive> create table tablename(id INT, name STRING, courses STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
Then load the data into Hive:
hive> LOAD DATA INPATH '/hdfspath' OVERWRITE INTO TABLE tablename;
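Note that with a plain comma delimiter, the quoted value "C,C++,DS" will be split across fields instead of staying in one column. One way to keep it together is the OpenCSVSerde already used in the Parquet question above; a minimal sketch, assuming the columns from the question (the table name is made up, and OpenCSVSerde reads every column as STRING):
CREATE TABLE courses_table (id STRING, name STRING, courses STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
"separatorChar" = ",",
"quoteChar" = "\""
)
STORED AS TEXTFILE;
LOAD DATA INPATH '/hdfspath' OVERWRITE INTO TABLE courses_table;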

Does hive need exact schema in Hive Export/Import?

I am using HDP 2.3
Hadoop version - 2.7.1
Hive version - 1.2.1
I created a table dev101 in hive using
create table dev101 (col1 int, col2 char(10));
I inserted two records using
insert into dev101 values (1, 'value1');
insert into dev101 values (2, 'value2');
I exported data to HDFS using
export table dev101 to '/tmp/dev101';
Then, I created a new table dev102 using
create table dev102 (col1 int, col2 String);
I imported data from /tmp/dev101 into dev102 using
import table dev102 from '/tmp/dev101';
I got error:
FAILED: SemanticException [Error 10120]: The existing table is not compatible with the import spec. Column Schema does not match
Then I created another table dev103 using
create table dev103 (col1 int, col2 char(50));
Again imported:
import table dev103 from '/tmp/dev101';
Same error:
FAILED: SemanticException [Error 10120]: The existing table is not compatible with the import spec. Column Schema does not match
Finally, I created a table with exactly the same schema:
create table dev104 (col1 int, col2 char(10));
And imported
import table dev104 from '/tmp/dev101';
Imported Successfully.
Does hive need exact schema in Hive Export/Import?
On Hive export, Hive creates _metadata and data directories, where it keeps the metadata and the data respectively.
On Hive import, you need either a new table (not yet present in Hive) or an empty table with exactly the same metadata.
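If the target schema has to differ, one pattern that should work (a sketch; dev101_copy is a made-up staging name) is to let IMPORT create a fresh table from the exported metadata and then copy the rows into the table you actually want:
-- IMPORT creates dev101_copy with the exported schema (col1 int, col2 char(10))
import table dev101_copy from '/tmp/dev101';
-- dev102 (col1 int, col2 string) already exists from the steps above;
-- the char(10) values convert to string during the insert
insert into table dev102 select col1, col2 from dev101_copy;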

how to load hive RC format table from a text file

I have a text file with a pipe as the delimiter. I want to load the data into an RC-formatted, partitioned Hive table. I can do that as a two-step process, i.e. first load it into a text-format external table and from there load it into the RC-formatted partitioned table.
But the question is: can this be done in a one-step process using the LOAD DATA command (considering that at run time I am not sure about the different partition values)?
I have tried load data inpath 'hdfs_file_or_directory_path' OVERWRITE INTO TABLE table1 PARTITION (YEAR_DT) as shown below, but I am getting an error.
The RC formatted table structure is as below:
CREATE EXTERNAL TABLE TEST.TABLE1(
Col1 DATE,
Col2 INT,
Col3 DOUBLE,
Col4 VARCHAR(2),
Col5 VARCHAR(3),
Col6 SMALLINT,
Col7 TIMESTAMP
)
partitioned by (YEAR_DT INT)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
STORED AS RCFILE
LOCATION
'hdfs_file_or_directory_path'
TBLPROPERTIES ('transactional'='true');
The error I am getting is given below:
hive> load data inpath '<hdfs path>' OVERWRITE INTO TABLE TABLE1 PARTITION(year_dt);
FAILED: SemanticException org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Invalid partition key & values; keys [year_dt, ], values [])
hive> load data inpath '<hdfs path>' OVERWRITE INTO TABLE TABLE1 PARTITION(year_dt = 2014);
Loading data to table test.test1 partition (year_dt=2014)
Failed with exception Wrong file format. Please check the file's format.
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask
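The second error is the telling one: LOAD DATA only moves files into place, it does not convert them, so a pipe-delimited text file cannot go straight into an RCFile table. A sketch of the two-step route described in the question, assuming the partition value is the last field in the file and dynamic partitioning is enabled (the staging table name is made up):
CREATE TABLE TEST.TABLE1_STAGE(
Col1 DATE, Col2 INT, Col3 DOUBLE, Col4 VARCHAR(2),
Col5 VARCHAR(3), Col6 SMALLINT, Col7 TIMESTAMP, YEAR_DT INT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS TEXTFILE;
load data inpath 'hdfs_file_or_directory_path' OVERWRITE INTO TABLE TEST.TABLE1_STAGE;
-- let Hive pick the partition value per row
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT OVERWRITE TABLE TEST.TABLE1 PARTITION (YEAR_DT)
SELECT Col1, Col2, Col3, Col4, Col5, Col6, Col7, YEAR_DT FROM TEST.TABLE1_STAGE;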