How do I load data into Cloudera Impala Table? - sql

I'm loading data into a Cloudera Impala ODBC table using a post SQL statement but I'm getting a "URI path must be absolute" error. Below is my SQL.
REFRESH sw_cfnusdata.CPN_Sales_Data;
DROP TABLE IF EXISTS sw_cfnusdata.CPN_Sales_Data_parquet;
CREATE TABLE IF NOT EXISTS sw_cfnusdata.CPN_Sales_Data_parquet LIKE
sw_cfnusdata.CPN_Sales_Data STORED AS PARQUET;
REFRESH sw_cfnusdata.CPN_Sales_Data_parquet;
LOAD DATA INPATH 'data/shared_workspace/sw_cfnusdata/Alteryx_CPN_Sales_Data' OVERWRITE INTO TABLE sw_cfnusdata.CPN_Sales_Data_parquet;
REFRESH sw_cfnusdata.CPN_Sales_Data_parquet;
COMPUTE STATS sw_cfnusdata.CPN_Sales_Data;
DROP TABLE sw_cfnusdata.CPN_Sales_Data;
Any ideas on what I'm missing here? I tried the same statements without COMPUTE STATS and still got the same error. Thank you in advance.

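You need to provide an absolute HDFS path. The path in your statement has no leading slash, so it is treated as relative and rejected.
Upload the file into HDFS and try the same command with an absolute path such as /data/sampletable, or a full URI such as hdfs://DEV/data/sampletable.
Note that Impala's LOAD DATA does not support the LOCAL keyword. Loading from the local disk works only in Hive, for example:
load data local inpath "/data/sampletable.txt" into table sampletable;
So the statement below needs to be changed to use an absolute HDFS path.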
LOAD DATA INPATH 'data/shared_workspace/sw_cfnusdata/Alteryx_CPN_Sales_Data' OVERWRITE INTO TABLE sw_cfnusdata.CPN_Sales_Data_parquet;
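To make the "absolute path" requirement concrete, here is a minimal Python sketch of a pre-check you could run before sending the statement. The function name is_absolute_load_path is made up for illustration. It only inspects the string and does not talk to Impala or HDFS, so it cannot tell whether the file actually exists.

```python
from urllib.parse import urlsplit


def is_absolute_load_path(path: str) -> bool:
    """Return True if the path part of a LOAD DATA location starts with '/'.

    Hypothetical helper: accepts either a bare path ('/data/x') or a full
    URI ('hdfs://DEV/data/x') and checks only the path component.
    """
    return urlsplit(path).path.startswith("/")


# The path from the question has no leading slash, so it fails the check.
bad = "data/shared_workspace/sw_cfnusdata/Alteryx_CPN_Sales_Data"
good = "/" + bad
print(is_absolute_load_path(bad), is_absolute_load_path(good))
```

If the check fails, adding the leading slash (or the full hdfs:// URI) is the first thing to try.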

Related

How Can we load data into hive using URL

I have created a table in Hive and I need to load CSV data into it,
but the data is on GitHub (I downloaded it and tested that it works). I need to load the data directly from the URL. Is it possible to load data into Hive from a URL?
Could something like this work?
LOAD DATA INPATH 'https://github.com/xx/stock-prices.csv' INTO TABLE
stocks;
Loading data from flat files into Hive can be done using the command below.
From Apache Hive Wiki:
LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]
LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)] [INPUTFORMAT 'inputformat' SERDE 'serde'] (3.0 or later)
If the keyword LOCAL is specified, Hive looks for the file path in the local filesystem and loads from there. If the keyword LOCAL is not specified, Hive looks for the file path in HDFS and loads the data from there.
You can specify full URI for HDFS files as well as local files.
Example:
file:///user/data/project/datafolder (Local Path)
hdfs://namenode:10001/user/data/project/datafolder (HDFS path)
This means it is not possible to load data into Hive directly from an https URL. You have to download the data first and then load it into Hive.
This is not the solution but the correct answer.
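The download-then-load approach described above could be scripted roughly like this in Python. It is only a sketch: fetch_to_local and load_local_statement are hypothetical helper names, the URL is the placeholder from the question, and it assumes the downloaded file ends up on the machine where Hive resolves LOCAL paths. Running the generated statement is left to whatever Hive client you use.

```python
import shutil
import urllib.request


def fetch_to_local(url: str, dest: str) -> str:
    """Download url to the local file dest and return dest."""
    with urllib.request.urlopen(url) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)
    return dest


def load_local_statement(local_path: str, table: str) -> str:
    """Build a Hive LOAD DATA LOCAL statement for an already downloaded file."""
    if "'" in local_path:
        # Keep the sketch simple: refuse paths that would need escaping.
        raise ValueError("path must not contain a single quote")
    return f"LOAD DATA LOCAL INPATH '{local_path}' INTO TABLE {table}"


# Usage (not executed here because it needs network access):
#   path = fetch_to_local("https://github.com/xx/stock-prices.csv",
#                         "/tmp/stock-prices.csv")
#   print(load_local_statement(path, "stocks"))
```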

loading data local to hive database facing issue

hive>LOAD DATA INPATH '/hadoop/abc/POC2_Setup/input/warr2_claims_5441F.csv' OVERWRITE INTO TABLE baumuster_pre_analysi_text;
FAILED: SemanticException Line 1:17 Invalid path ''/hadoop/abc/POC2_Setup/input/warr2_claims_5441F.csv'': No files matching path hdfs://localhost:9000/hadoop/abc/POC2_Setup/input/warr2_claims_5441F.csv
If we are loading from the local file system, we need to use the keyword "local", as below:
LOAD DATA LOCAL INPATH 'your local file path' OVERWRITE INTO TABLE your-hive-table;
If loading from HDFS,
LOAD DATA INPATH 'your hdfs file path' OVERWRITE INTO TABLE your-hive-table;
If you are loading the data from the local file system, you have to include LOCAL.
hive>LOAD DATA LOCAL INPATH '/hadoop/abc/POC2_Setup/input/warr2_claims_5441F.csv' OVERWRITE INTO TABLE baumuster_pre_analysi_text;
If the path still does not resolve to the local file system, prefix it with the file:// scheme, for example file:///hadoop/abc/POC2_Setup/input/warr2_claims_5441F.csv.
To load data from local machine you can use the following command:
LOAD DATA LOCAL INPATH '/hadoop/abc/POC2_Setup/input/warr2_claims_5441F.csv' OVERWRITE INTO TABLE baumuster_pre_analysi_text;

Difference between `load data inpath ` and `location` in hive?

At my firm, I see these two commands used frequently, and I'd like to be aware of the differences, because their functionality seems the same to me:
1
create table <mytable>
(name string,
number double);
load data inpath '/directory-path/file.csv' into table <mytable>;
2
create table <mytable>
(name string,
number double)
location '/directory-path/file.csv';
They both copy the data from the directory on HDFS into the directory for the table on HIVE. Are there differences that one should be aware of when using these? Thank you.
Yes, they are used for entirely different purposes.
The load data inpath command is used to load data into a Hive table. 'LOCAL' signifies that the input file is on the local file system. If 'LOCAL' is omitted, Hive looks for the file in HDFS.
load data inpath '/directory-path/file.csv' into table <mytable>;
load data local inpath '/local-directory-path/file.csv' into table <mytable>;
The LOCATION keyword lets you point a table at any HDFS directory for its storage, rather than the folder specified by the configuration property hive.metastore.warehouse.dir.
In other words, with LOCATION '/your-path/' specified, Hive does not use the default location for this table. This comes in handy if you already have data generated.
Remember, LOCATION is most often used with EXTERNAL tables. The Hive DDL also accepts it for regular (managed) tables, but dropping a managed table deletes the data at that location. Without LOCATION, the default location is used.
To summarize,
load data inpath tells Hive where to look for input files, and the LOCATION keyword tells Hive where the table's data is stored on HDFS.
References:
https://cwiki.apache.org/confluence/display/Hive/GettingStarted
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL
Option 1: Internal table
create table <mytable>
(name string,
number double);
load data inpath '/directory-path/file.csv' into <mytable>;
This creates an internal (managed) table and moves the file out of the source directory into the table's directory.
Option 2: External table
create external table <mytable>
(name string,
number double)
location '/directory-path/';
This creates an external table over the data that already sits in that directory. Note that LOCATION must name a directory, not a file. Nothing is copied or moved. You can drop the external table and the source data is still available.
When you drop an external table, only the Hive table metadata is dropped. The data still exists at the HDFS location.
Have a look at this related SE question on use cases for internal and external tables:
Difference between Hive internal tables and external tables?
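To visualize the difference in what happens to the files, here is a toy Python model that uses a temporary local directory in place of HDFS. It is not Hive code and none of these function names exist in Hive. It only mimics the behavior described above: a non-local LOAD DATA moves the file into the table directory, dropping a managed table deletes its data, and dropping an external table leaves the data in place.

```python
import shutil
import tempfile
from pathlib import Path


def load_data_inpath(src: Path, table_dir: Path) -> None:
    """Model of LOAD DATA INPATH: move the file into the table directory."""
    table_dir.mkdir(parents=True, exist_ok=True)
    shutil.move(str(src), str(table_dir / src.name))


def drop_table(table_dir: Path, external: bool) -> None:
    """Model of DROP TABLE: only managed tables delete their data."""
    if not external:
        shutil.rmtree(table_dir)


def demo() -> dict:
    root = Path(tempfile.mkdtemp())
    try:
        # Managed table: the staged file is moved, then deleted on drop.
        staging = root / "staging"
        staging.mkdir()
        src = staging / "file.csv"
        src.write_text("a,1.0\n")
        managed_dir = root / "warehouse" / "mytable"
        load_data_inpath(src, managed_dir)
        moved = (not src.exists()) and (managed_dir / "file.csv").exists()
        drop_table(managed_dir, external=False)
        managed_gone = not managed_dir.exists()

        # External table: data stays where it is, even after drop.
        external_dir = root / "external_data"
        external_dir.mkdir()
        (external_dir / "file.csv").write_text("b,2.0\n")
        drop_table(external_dir, external=True)
        external_kept = (external_dir / "file.csv").exists()
        return {"moved": moved, "managed_gone": managed_gone,
                "external_kept": external_kept}
    finally:
        shutil.rmtree(root)


print(demo())
```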

How to load data into Hive table

I'm using Hortonworks' Hue (a GUI that connects HDFS, Hive, and Pig together) and I want to load data that is in HDFS into the table I just created.
Suppose the table's name is "test", and the path of the file that contains the data is:
/user/hdfs/test/test.txt
But I'm unable to load the data into the table, I tried:
load data local inpath '/user/hdfs/test/test.txt' into table test
But I get an error saying the file can't be found: there is no matching path.
I'm still so confused.
Any suggestions?
Thanks
As you said, you want to "load the data within the hdfs into my current created table".
But in your command you are using:
load data local inpath '/user/hdfs/test/test.txt' into table test
With the local keyword, Hive looks for the file in your local filesystem. But your file is in HDFS.
I think you need to remove the local keyword from your command.
Hope it helps...!!!
Since you are using Hue and the error reports no matching path, I think you have to give the complete path.
For example:
load data local inpath '/home/cloudera/hive/Documents/info.csv' into table tablename;
In the same way, for a file in HDFS you can give the complete HDFS path where the document resides, without the local keyword.
You can use a file in any other format.
Remove the local keyword, since your file is in HDFS and not on the local file system.

Loading data into a table

I'm new to Hive and am using DBVisualizer for Hive.
I have text files in the path *D:\data files\datafiles*. I want to load data from one of the files into a table created in Hive. I tried the following:
load data inpath "D:\data files\sample.txt" into table sample;
It shows an error like this:
cause: FAILED: Error in semantic analysis: Line 1:17 Invalid path "D:\data files\sample.txt": only "file" or "hdfs" file systems accepted
How can I proceed? Where should I place the file so that the path is correct?
Either upload the file into HDFS and try the same command with the HDFS path,
or
use the local keyword as below.
load data local inpath "D:\data files\sample.txt" into table sample;
Backslashes may be the problem here. Try:
load data inpath "D:/data files/sample.txt" into table sample;
If you are loading data from your local machine, you have to use "LOCAL" in the load data command:
load data LOCAL inpath "D:\data files\sample.txt" into table sample;
There are two ways to load the data.
One loads from the local file system and the other loads from HDFS, but the path format varies with the OS.
If you load data on Linux:
-- local path
load data local inpath '/home/local/path/sample.txt' into table sample;
-- Hadoop path
load data inpath '/home/hadoop/path/sample.txt' into table sample;
On Windows:
-- note the forward slashes (/), not backslashes (\)
load data inpath "D:/data files/sample.txt" into table sample;
-- local path
load data local inpath "D:/data files/sample.txt" into table sample;
Check once.
load data inpath "D:\data files\sample.txt" into table sample;
The command above looks for the file in HDFS, but the path given is on the local machine. Use the command below to solve the issue:
load data local inpath "D:\data files\sample.txt" overwrite into table sample;
With overwrite, the existing data in the table is replaced.
You might not have stored sample.txt file as ".txt" file.
Please check if the file is saved properly as ".txt" file and try again.
When you want to load data from the edge node's local file system, use:
load data local inpath '/user/cloudera/datah/txns' into table txn_externalh;
When you want to load data that is already in HDFS into Hive, use:
load data inpath '/user/cloudera/datah/txns' into table txn_externalh;
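The answers in this thread boil down to two choices: whether to include LOCAL, and whether to use forward slashes. A small Python sketch that builds the statement from those choices might look like this. The helper name hive_load_statement and its parameters are invented for illustration, and it only produces the statement text; whether the path is visible to Hive still depends on where the file actually lives.

```python
def hive_load_statement(path: str, table: str, local: bool = False,
                        overwrite: bool = False) -> str:
    """Build a Hive LOAD DATA statement.

    local=True adds the LOCAL keyword (file on the local file system);
    local=False means the path is looked up in HDFS.
    Backslashes are replaced with forward slashes, as suggested above.
    """
    if "'" in path:
        # Keep the sketch simple: refuse paths that would need escaping.
        raise ValueError("path must not contain a single quote")
    normalized = path.replace("\\", "/")
    parts = ["LOAD DATA"]
    if local:
        parts.append("LOCAL")
    parts.append(f"INPATH '{normalized}'")
    if overwrite:
        parts.append("OVERWRITE")
    parts.append(f"INTO TABLE {table}")
    return " ".join(parts)


# File already in HDFS: no LOCAL keyword.
print(hive_load_statement("/user/cloudera/datah/txns", "txn_externalh"))
# File on the local disk, Windows-style path: LOCAL plus forward slashes.
print(hive_load_statement(r"D:\data files\sample.txt", "sample", local=True))
```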