How to load XML data file into Hive table? - hive

While loading XML data file into HIVE table i got following error message:
FAILED: SemanticException 7:9 Input format must implement InputFormat. Error encountered near token 'StoresXml'.
The way i am loading the XML file is as follows :
**Create a table StoresXml
'CREATE EXTERNAL TABLE StoresXml (storexml string)
STORED AS INPUTFORMAT 'org.apache.mahout.classifier.bayes.XmlInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/user/hive/warehouse/stores';'
** Location /user/hive/warehouse/stores is in HDFS.
load data inpath <local path where the xml file is stored> into table StoresXml;
Now,problem is when i select any column from table StoresXml ,the above mentioned error comes up.
Please help me with it.Where i am going wrong?

1) first you need to create single column table like
CREATE TABLE xmlsample(xml string);
2) after that you need to load data in local/hdfs to hive table like
LOAD DATA INPATH '---------' INTO TABLE XMLSAMPLE;
3) NEXT BY USING XPATH, XPATH_ARRAY,XPATH_STRING LIKE SAMPLE XML QUERIES..

I have just loaded this transactions.xml file into hive table using xpath
for XML file:
**Bring records of xml file into one line:
terminal> cat /home/cloudera/Desktop/Test/Transactions_xml.xml | tr -d '&' | tr '\n' ' ' | tr '\r' ' ' | sed 's|</record>|</record>\n|g' | grep -v '^\s*$' > /home/cloudera/Desktop/trx_xml;
terminal> hadoop fs -put /home/cloudera/Desktop/trx_xml.xml /user/cloudera/DataTest/Transactions_xml
hive>create table Transactions_xml1(xmldata string);
hive>load data inpath '/user/cloudera/DataTest/Transactions_xml' overwrite into table Transactions_xml1;
hive>create table Transactions_xml(trx_id int,account int,amount int);
hive>insert overwrite table Transactions_xml select xpath_int(xmldata,'record/Tid'),
xpath_int(xmldata,'record/AccounID'),
xpath_int(xmldata,'record/Amount') from Transactions_xml1;
I hope this will help you. Let me know the result.

I have developed a tool to generate hive scripts from a csv file. Following are few examples on how files are generated.
Tool -- https://sourceforge.net/projects/csvtohive/?source=directory
Select a CSV file using Browse and set hadoop root directory ex: /user/bigdataproject/
Tool Generates Hadoop script with all csv files and following is a sample of
generated Hadoop script to insert csv into Hadoop
#!/bin/bash -v
hadoop fs -put ./AllstarFull.csv /user/bigdataproject/AllstarFull.csv
hive -f ./AllstarFull.hive
hadoop fs -put ./Appearances.csv /user/bigdataproject/Appearances.csv
hive -f ./Appearances.hive
hadoop fs -put ./AwardsManagers.csv /user/bigdataproject/AwardsManagers.csv
hive -f ./AwardsManagers.hive
Sample of generated Hive scripts
CREATE DATABASE IF NOT EXISTS lahman;
USE lahman;
CREATE TABLE AllstarFull (playerID string,yearID string,gameNum string,gameID string,teamID string,lgID string,GP string,startingPos string) row format delimited fields terminated by ',' stored as textfile;
LOAD DATA INPATH '/user/bigdataproject/AllstarFull.csv' OVERWRITE INTO TABLE AllstarFull;
SELECT * FROM AllstarFull;
Thanks
Vijay

Related

Bash script to import the multiple CSV files into a mysql database using load data local infile command

I have Multiple CSV files which are stored in one of the folder then I need to use these folder to fetch the csv files then load them into Database Table.
This script need to prepare in Bash with parameterized fields like InputFolderPath(loop Csv Files), DatabaseConnection, SchemaName, TableName then pass these fields using
Load Data Local Infile Command.
This worked for me,
for f in /var/www/path_to_your_folder/*.csv
do
mysql -e "use database_name" -e "
load data local infile '"$f"' into table your_table_name FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"' ESCAPED BY '\"' LINES TERMINATED BY '\r\n' IGNORE 1 LINES (column_name1, #date_time_variable1, column_name3)
SET some_column_name_which_contains_date = STR_TO_DATE(#date_time_variable1, '%d-%m-%Y');" -u your_mysql_username_here --p --local-infile=1
echo "Done: '"$f"' at $(date)"
done
This script will prompt password for mysql.
i am using this script on ec2 + ubuntu

What does 'insert overwrite local directory' mean in Hive?

Im having some issues understanding what does the following type of query do:
insert overwrite local directory $directorey_name$
select $some_query$
What does this mean, and what are the side effects of this?
Export the query results into a file on the local file system
insert overwrite local directory '/tmp/hello'
row format delimited
fields terminated by '|'
select 1,2,3,'Hello','world'
;
! ls /tmp/hello;
000000_0
! cat /tmp/hello/000000_0;
1|2|3|Hello|world

LOAD DATA LOCAL INPUT PATH doesnt exist

I am new to the Spark and Scala Technology. I'm getting the following exception while trying to load a file from local file system into table using Spark.
Spark version -2.0 and Scala version - 2.11
scala> sqlContext.sql("LOAD DATA LOCAL INPATH 'file.txt' INTO TABLE student")
org.apache.spark.sql.AnalysisException: LOAD DATA input path does not exist: file.txt
Please try to give complete path as file:/complete path to the file.
In above case:
sqlContext.sql("LOAD DATA LOCAL INPATH 'file:/complete path to the file.txt' INTO TABLE student")
~Kedar

not able to load file from hdfs to hive

I can see the file is on HDFS.
$hadoop fs -cat /user/root/1.txt
1
2
3
but from hive, it is not recognize the file.
hive> create table test4 (numm INT);
OK
Time taken: 0.187 seconds
hive> load data inpath '/user/root/1.txt' into table test4;
FAILED: SemanticException Line 1:17 Invalid path ''/user/root/1.txt'': No files matching path file:/user/root/1.txt
load file from local file system looks good.
Requesting you to please put the complete path for the file.
Eg. load data inpath 'Namenode:' in to table .
Hope this help. Please let me know if you still face any difficulties.

run os command and set out put to hive variable

Is it possible to run something like this in Hive CLI?
I am trying to pass file contents as a variable to another query.
set column_list=!cat /home/user/filename.lst ;
create table tabname as select $column_list from ...
if you have a query file you pass the variables as hiveconf
hive -hiveconf var1=abcd -f file.txt
or you can construct your query and then pass it to hive cli using -e
hive -e "create table ..."
file filename.lst
line
make a file test.sh,
temp=$(cat /home/user/filename.lst)
hive -f test.hql -hiveconf var=$temp
make a another file test.hql
create table test(${hiveconf:var} string);
on terminal
sh -x test.sh
It will pass the line to the test.hql and it will create a table with line as column;
note- all files should be in same directory .This script is passing only one variable.