How to load data into a Hive table - sql

I'm using Hortonworks' Hue (more like a GUI that connects HDFS, Hive, and Pig together), and I want to load data that is already in HDFS into my newly created table.
Suppose the table's name is "test", and the file that contains the data is at this path:
/user/hdfs/test/test.txt
But I'm unable to load the data into the table. I tried:
load data local inpath '/user/hdfs/test/test.txt' into table test
But there's an error saying the file can't be found; there's no matching path.
I'm still so confused.
Any suggestions?
Thanks

As you said, you want to "load the data within the hdfs into my current created table".
But in your command you are using:
load data local inpath '/user/hdfs/test/test.txt' into table test
With the LOCAL keyword, Hive looks for the file on your local filesystem. But your file is in HDFS.
I think you need to remove the LOCAL keyword from your command.
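For example, the same statement with LOCAL removed (using the HDFS path from your question) should work:
load data inpath '/user/hdfs/test/test.txt' into table test;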
Hope it helps...!!!

Since you are using Hue and the output says there is no matching path, I think you have to give the complete path.
for example:
load data local inpath '/home/cloudera/hive/Documents/info.csv' into table tablename;
In the same way, you can give the complete HDFS path where the file resides.
You can use files of other formats as well.

Remove the LOCAL keyword: with LOCAL, Hive reads from your local file system, but your file is in HDFS.

Related

How do I load data into a Cloudera Impala table?

I'm loading data into a Cloudera Impala ODBC table using a post-SQL statement, but I'm getting a "URI path must be absolute" error. Below is my SQL.
REFRESH sw_cfnusdata.CPN_Sales_Data;
DROP TABLE IF EXISTS sw_cfnusdata.CPN_Sales_Data_parquet;
CREATE TABLE IF NOT EXISTS sw_cfnusdata.CPN_Sales_Data_parquet LIKE
sw_cfnusdata.CPN_Sales_Data STORED AS PARQUET;
REFRESH sw_cfnusdata.CPN_Sales_Data_parquet;
LOAD DATA INPATH 'data/shared_workspace/sw_cfnusdata/Alteryx_CPN_Sales_Data' OVERWRITE INTO TABLE sw_cfnusdata.CPN_Sales_Data_parquet;
REFRESH sw_cfnusdata.CPN_Sales_Data_parquet;
COMPUTE STATS sw_cfnusdata.CPN_Sales_Data;
DROP TABLE sw_cfnusdata.CPN_Sales_Data;
Any ideas on what I'm missing here? I tried the same statement without the COMPUTE STATS step and still got the same error. Thank you in advance.
You need to provide an HDFS path.
Upload the file into HDFS and try the same command with an HDFS path like hdfs://DEV/data/sampletable.
Or else you can put the file on the local disk and try the command below:
load data local inpath "/data/sampletable.txt" into table sampletable;
So the statement below needs to be changed: give it either an absolute HDFS path or a local path.
LOAD DATA INPATH 'data/shared_workspace/sw_cfnusdata/Alteryx_CPN_Sales_Data' OVERWRITE INTO TABLE sw_cfnusdata.CPN_Sales_Data_parquet;
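For example, assuming the files actually sit under /data/shared_workspace on HDFS (a leading slash is what makes the URI absolute), the corrected statement would look like:
LOAD DATA INPATH '/data/shared_workspace/sw_cfnusdata/Alteryx_CPN_Sales_Data' OVERWRITE INTO TABLE sw_cfnusdata.CPN_Sales_Data_parquet;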

LOAD DATA INPATH where table files start with some string in Impala

Just a simple question; I'm new to Impala.
I want to load data from HDFS into my data lake using Impala.
So I have a CSV file, this_is_my_data.csv, and what I want to do is load the file without specifying the full name, something like the following:
LOAD DATA INPATH 'user/myuser/this_is.*' INTO TABLE my_table
That is, a string starting with this_is and whatever follows.
If you need some additional information, please let me know. Thanks in advance.
The documentation says:
You can specify the HDFS path of a single file to be moved, or the HDFS path of a directory to move all the files inside that directory. You cannot specify any sort of wildcard to take only some of the files from a directory.
The workaround is to put your files into the table directory using an mv or cp command. Find your table directory using the DESCRIBE FORMATTED command, then run mv or cp (in a shell, not Impala of course):
hdfs dfs -mv '/user/myuser/this_is*' /user/cloudera/mytabledir
(HDFS globs are shell-style, so this_is* — rather than the regex-like this_is.* — matches every name starting with this_is.)
Or put the files you need to load into some directory first, then load the whole directory.
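Putting the mv workaround together as a sketch (my_table and the warehouse path are hypothetical):
DESCRIBE FORMATTED my_table; -- read the Location: row to find the table's directory
Then, in a shell:
hdfs dfs -mv '/user/myuser/this_is*' /user/hive/warehouse/my_table
And back in Impala:
REFRESH my_table; -- let Impala pick up the files that were added outside of it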

Creating external table from file, not directory

When I run the CREATE EXTERNAL TABLE query, I have to provide a directory for the LOCATION attribute. But if the directory I point to has more than one file, all of them are read. For example, if I put LOCATION 'dir1/', and dir1 contains file1 and file2, both files will be read.
To avoid this, I want to point to a single file. When I tried LOCATION 'dir1/file1', it gave me an error saying the file path is not a directory or that it was unable to create one. Is there a way to point to just a single file?
If you want to load data from HDFS, try this:
LOAD DATA INPATH '/user/data/file1' INTO TABLE table1;
And if you want to load data from local storage:
LOAD DATA LOCAL INPATH '/data/file1' INTO TABLE table1;

hive external table location vs load path

From going through the internet about external and managed tables, I understood that we need to specify the LOCATION while creating an external table, as Hive will create the table at the given location; in the case of a managed table, the default directory set in hive.metastore.warehouse.dir is used.
Please correct me if I have stated anything wrongly.
What is confusing me is:
Is the LOCATION clause used to specify where the data exists for an external table, or where to create the directory that stores the actual data?
If the LOCATION clause is used to specify where the data exists, then why are we using the INPATH clause in the LOAD statement?
The LOCATION clause in the DDL of an external table is used to specify the HDFS location where the data needs to be stored. Later on, when we query the table, the data is read from this specified path.
The LOAD DATA INPATH clause gives the path of the source file from which the data is loaded into the table. The source can be either a local file path or an HDFS file path.
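A small end-to-end illustration (the table name, columns, and paths here are made up):
CREATE EXTERNAL TABLE sales (id INT, amount DOUBLE)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/hive/external/sales'; -- LOCATION: where the table's data lives and is read from
LOAD DATA INPATH '/user/me/staging/sales.csv' INTO TABLE sales; -- INPATH: the source file, moved into the LOCATION above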
Hope I have cleared your confusion.

Loading data into a table

I'm new to Hive and am using DBVisualizer for Hive.
I have a text file in the path D:\data files\datafiles. I want to load data from one of the files there into a table created in Hive. When I try the following,
load data inpath "D:\data files\sample.txt" into table sample;
It shows an error like:
cause: FAILED: Error in semantic analysis: Line 1:17 Invalid path "D:\data files\sample.txt": only "file" or "hdfs" file systems accepted
How can I proceed? Where should I place the file so that the path is correct?
Either you can upload that file into HDFS and try the same command with the HDFS path,
or
you may use the LOCAL keyword as below:
load data local inpath "D:\data files\sample.txt" into table sample;
Backslashes may be the problem here. Try:
load data inpath "D:/data files/sample.txt" into table sample;
If you are loading data from your local machine into HDFS, you have to use the LOCAL keyword in the LOAD DATA command:
load data LOCAL inpath "D:\data files\sample.txt" into table sample;
There are two ways to load the data: load it from the local file system, or load it from HDFS... but the path varies by OS.
If you load data on Linux:
load data local inpath '/home/local/path/sample.txt' into table sample; // local path
load data inpath '/home/hadoop/path/sample.txt' into table sample; // Hadoop path
If on Windows:
load data inpath "D:/data files/sample.txt" into table sample; // here carefully observe / not \
load data local inpath "D:/data files/sample.txt" into table sample; // local path
Check once.
load data inpath "D:\data files\sample.txt" into table sample;
The above command looks for an HDFS location, but the mentioned path is in the local environment, so use the command below; only then can we solve the issue:
load data local inpath "D:\data files\sample.txt" overwrite into table sample;
With the above command, the data is overwritten into the mentioned table.
You might not have saved the sample.txt file as a ".txt" file.
Please check that the file is saved properly as a ".txt" file and try again.
When you want to load data from an edge node (the local file system) into Hive, you have to go for:
load data local inpath '/user/cloudera/datah/txns' into table txn_externalh;
When you want to load data that is already in HDFS into Hive, you have to go for:
load data inpath '/user/cloudera/datah/txns' into table txn_externalh;