Creating a Hive table without ROW FORMAT for comma delimited columns - hive

I have a .CSV comma delimited file:
c1,c2,c3,c4
d1,d2,d3,d4
My requirement is to create an external Hive table that has a single field named item, containing each full row of my CSV file regardless of the comma delimited columns.
What is the Hive CREATE TABLE query that I have to use?

Create the Hive table without specifying ROW FORMAT; Hive then defaults to the Ctrl-A (^A) delimiter.
Since your data is comma delimited, each whole line will be read into the single field.
Example:
create external table i(item string) location '<your_directory_path>';
Here the item field will hold all the data!
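If you later need the individual columns, you can split the single field at query time. A minimal sketch, assuming the table i from the answer above and the four-column sample data from the question:
-- split() turns the comma delimited string into an array<string>
SELECT split(item, ',')[0] AS c1,
       split(item, ',')[1] AS c2,
       split(item, ',')[2] AS c3,
       split(item, ',')[3] AS c4
FROM i;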

Related

Skipping header in Hive is removing first line of my data

I have the following query in hive:
CREATE EXTERNAL TABLE shop.id_store (
person_id INT,
shop_category STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 'user/schema/table'
TBLPROPERTIES ('skip.header.line.count'='1', 'external.table.purge'='true');
LOAD DATA INPATH 'tmp/ids.csv' OVERWRITE INTO TABLE shop.id_store;
INSERT OVERWRITE TABLE shop.id_store
SELECT
*
FROM
shop.id_store
My CSV ids.csv does contain headers; however, I have noticed that the above code actually removes the first row of my actual data. What is going on?
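A hedged note, since no answer is attached above: skip.header.line.count is applied on every read of the table's files, not once at load time. The INSERT OVERWRITE ... SELECT * FROM shop.id_store rewrites the data files without a header, but the property still tells Hive to skip the first line of each file, so a real data row disappears on the next read. One way around it is to keep the property on a staging table only (shop.id_store_raw is a hypothetical name, and shop.id_store is assumed to be recreated without the skip property):
-- Staging table: owns the header-skipping property, points at the raw CSV
CREATE EXTERNAL TABLE shop.id_store_raw (
person_id INT,
shop_category STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 'user/schema/table_raw'
TBLPROPERTIES ('skip.header.line.count'='1');
-- Final table (no skip property): nothing is dropped on later reads
INSERT OVERWRITE TABLE shop.id_store
SELECT * FROM shop.id_store_raw;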

How to create Impala table with complex data type and how can I specify delimiter for array type column

I am trying to create an Impala table with an array column type, and I have to use a custom delimiter for the array column.
I tried the query below, but it throws an error.
Create table array_demo( arra_col ARRAY<string>) row format delimited fields terminated by ','
collection items terminated by '|' stored as parquet
You should omit the ROW FORMAT clause and the subclauses specifying the terminators, and include a STORED AS clause (Parquet is the only format Impala supports with complex data).
The data files used to load the table have to be in Parquet format too.
If you don't have the data files in Parquet format, you can create the table in Hive,
then create a copy using CREATE TABLE … AS SELECT (a CTAS statement) with STORED AS PARQUET.
You can then query the table in Impala.
As an example:
-- Create table in Hive
CREATE TABLE array_demo( arra_col ARRAY<STRING>)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY '|'
STORED AS TEXTFILE;
-- Copy the table in Parquet format (STORED AS must come before AS SELECT)
CREATE TABLE array_demo_impala
STORED AS PARQUET
AS SELECT *
FROM array_demo;
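To read the array back in Impala, a complex-type column is queried by joining the table with the column; ITEM and POS are Impala's built-in pseudocolumns for array elements:
-- Each array element becomes one result row
SELECT a.pos, a.item
FROM array_demo_impala t, t.arra_col a;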

Loading into a Hive table imported entire data into the first column only

I am trying to copy Hive data from one server to another. To do this, I am exporting the Hive data into a CSV file on server1 and trying to import that CSV file into Hive on server2.
My table contains following datatypes:
bigint
string
array
Here are my commands:
export:
hive -e 'select * from sample' > /home/hadoop/sample.csv
import:
load data local inpath '/home/hadoop/sample.csv' into table sample;
After importing into the Hive table, the entire row's data is inserted into the first column only.
How can I overcome this? Or is there a better way to copy data from one server to another?
While creating the table, add the line below at the end of the CREATE statement:
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
Like below:
hive> CREATE TABLE sample(id INT,
name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
Then load the data:
hive>load data local inpath '/home/hadoop/sample.csv' into table sample;
For your example:
sample.csv
123,Raju,Hello|How Are You
154,Nishant,Hi|How Are You
So in the above sample data, the first column is BIGINT, the second is STRING, and the third is an ARRAY separated by |.
hive> CREATE TABLE sample(id BIGINT,
name STRING,
messages ARRAY<STRING>)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY '|';
hive> LOAD DATA LOCAL INPATH '/home/hadoop/sample.csv' INTO TABLE sample;
Most important point: define a delimiter for the collection items, and don't impose the array structure you use in normal programming.
Also, make the field delimiter different from the collection items delimiter, to avoid confusion and unexpected results.
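To verify that the array was parsed correctly, a quick check against the sample data above:
-- messages[0] is the first array element; size() returns the element count
SELECT id, name, messages[0] AS first_msg, size(messages) AS n_msgs
FROM sample;
With the two sample rows this should return Hello/2 for Raju and Hi/2 for Nishant.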
You really should not be using CSV as your data transfer format.
DistCp copies data between Hadoop clusters as-is.
Hive supports EXPORT and IMPORT (a sketch follows below).
Circus Train allows Hive table replication.
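A minimal sketch of the EXPORT/IMPORT route, using the sample table from the question; the HDFS paths are placeholders:
-- On the source cluster: writes the table's data plus metadata to HDFS
EXPORT TABLE sample TO '/tmp/sample_export';
-- Copy /tmp/sample_export to the destination cluster (e.g. with distcp),
-- then on the destination cluster:
IMPORT TABLE sample FROM '/tmp/sample_export';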
Why not use a Hadoop command to transfer data from one cluster to another, such as:
bash$ hadoop distcp hdfs://nn1:8020/foo/bar \
hdfs://nn2:8020/bar/foo
Then load the data into your new table:
load data inpath '/bar/foo/*' into table wyp;
Your problem may be caused by the delimiter.
The default delimiter is '\001' if you haven't set one when creating a Hive table.
If you use hive -e 'select * from sample' > /home/hadoop/sample.csv, all columns will end up in one column on reload, because the exported file's delimiter does not match the one the target table expects.
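If you do want a delimited flat-file export rather than redirecting hive -e output, Hive can write the delimiters explicitly. A sketch using the sample table from the question (the export directory is a placeholder):
-- Writes comma delimited files, with | between array elements,
-- instead of relying on the CLI's default output format
INSERT OVERWRITE LOCAL DIRECTORY '/home/hadoop/sample_export'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY '|'
SELECT * FROM sample;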

Creation of a partitioned external table with Hive: no data available

I have a file with comma separated values on HDFS.
I create the structure of the external table in Hive:
CREATE EXTERNAL TABLE google_analytics(
`session` INT)
PARTITIONED BY (date_string string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LOCATION '/flumania/google_analytics';
ALTER TABLE google_analytics ADD PARTITION (date_string = '2016-09-06') LOCATION '/flumania/google_analytics';
After that, the table structure is created in Hive, but I cannot see any data.
Since it's an external table, data insertion should happen automatically, right?
Your file should have its columns in this sequence:
int,string
Here your file contents are in the below sequence:
string,int
Change your file to the below:
86,"2016-08-20"
78,"2016-08-21"
It should work.
Also, it is not recommended to use keywords as column names (date).
I think the problem was with the alter table command. The code below solved my problem:
CREATE EXTERNAL TABLE google_analytics(
`session` INT)
PARTITIONED BY (date_string string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LOCATION '/flumania/google_analytics/';
ALTER TABLE google_analytics ADD PARTITION (date_string = '2016-09-06');
After these two steps, if you have a date_string=2016-09-06 subfolder containing a CSV file that corresponds to the structure of the table, the data will be loaded automatically and you can already use SELECT queries to see it.
Solved!
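As an aside (standard Hive, not part of the original answer): when there are many date_string=... subfolders, partitions can be discovered in bulk instead of one ALTER TABLE per partition:
-- Scans the table's LOCATION and registers any partition directories
-- (e.g. date_string=2016-09-06) the metastore does not yet know about
MSCK REPAIR TABLE google_analytics;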

Populating Hive table from file yields far too many rows

I am creating a Hive table from a file with 8k rows, but the table created has 78k rows. The command line is the following:
bin/hive_executable < my_script.hql
my_script.hql:
create table my_table(k1 t1, k2 t2....);
load data local inpath 'path/to/table_file.txt' INTO TABLE my_table;
table_file.txt:
v1 v2 v3...
I've tried both space and tab delimited fields, and explicitly declaring the structure in the create table statement. When I use example code to create a table from $HIVE_HOME/example/file/kv1.txt, the table and file both have 500 lines / rows.
Any ideas?
Thanks
Strip the text fields of all newline characters. Hive's default text format treats every newline as a row terminator, so a field containing embedded newlines is split across multiple physical rows, which is how 8k logical rows can turn into 78k.
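A minimal pre-load cleanup sketch, assuming tab delimited fields; NFIELDS and the file names are hypothetical and need adjusting to the real table:
# Join broken lines until each record has the expected number of
# tab separated fields; embedded newlines become single spaces.
awk -v NFIELDS=4 '
{
    buf = (buf == "" ? $0 : buf " " $0)
    if (split(buf, parts, "\t") >= NFIELDS) { print buf; buf = "" }
}
END { if (buf != "") print buf }
' table_file.txt > table_file.clean.txt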