How can I migrate thousands of Hive tables to a MySQL database?

Is this possible with a sqoop export script? With the script below, I can only export one table at a time.
sqoop export \
--connect jdbc:mysql://localhost/db \
--username root \
--table employee \
--export-dir /emp/emp_data
What options do we have for exporting data from a Hive table to a MySQL database?

Here is working code you can use. Each option is explained in the comments:
# MySQL connection info, then the Hive (HCatalog) database/table to export,
# plus an optional partition filter; --table names the target MySQL table
# (the name can be the same or different, and the column order can differ,
# but data types must be compatible and at least as wide as the Hive columns);
# -m sets the number of mappers
sqoop export --connect jdbc:mysql://localhost/db \
  --username xx --password pass12 \
  --hcatalog-database mydb --hcatalog-table myhdfstable \
  --hcatalog-partition-keys mypartcolumn --hcatalog-partition-values "PART_VAL1" \
  --table mymysqltable \
  -m 4
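The original question mentions thousands of tables; sqoop export handles one table per invocation, so one practical option is to drive it from a shell loop. The sketch below assumes a file tables.txt (hypothetical) with one table name per line and a matching, schema-compatible MySQL table already created for each Hive table:
# tables.txt (hypothetical): one Hive/MySQL table name per line (no spaces)
for tbl in $(cat tables.txt); do
  sqoop export --connect jdbc:mysql://localhost/db \
    --username xx --password pass12 \
    --hcatalog-database mydb --hcatalog-table "$tbl" \
    --table "$tbl" \
    -m 4 || echo "$tbl" >> failed_exports.txt   # keep going and record failures
done
You can generate tables.txt from Hive itself, for example with hive -e 'show tables in mydb' > tables.txt.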

Related

sqoop import changes ownership : hive table displays NULL values in columns

I have a table orders_1 with 300 records and I want to load the data into a Hive table.
When I ran the sqoop hive-import command, the table data was loaded into the /hive/warehouse/ location, but the user ownership changed and I'm now facing an issue when I query the Hive table to view the loaded data: it displays all the columns with NULL values.
sqoop-import --connect jdbc:mysql://localhost/retail_db --username root --password cloudera --table orders_1 --hive-import --hive-database my_sqoop_db --hive-table hive_orders1 --hive-overwrite --fields-terminated-by '\t' --lines-terminated-by '\n' -m 1;
Output :
INFO hive.HiveImport: Loading uploaded data into Hive
Logging initialized using configuration in jar:file:/usr/lib/hive/lib/hive-common-0.13.1-cdh5.3.0.jar!/hive-log4j.properties
OK
Time taken: 1.039 seconds
Loading data to table my_sqoop_db.hive_orders1
chgrp: **changing ownership** of 'hdfs://quickstart.cloudera:8020/user/hive/warehouse/my_sqoop_db.db/hive_orders1/part-m-00000': User does not belong to hive
Table my_sqoop_db.hive_orders1 stats: [numFiles=1, **numRows=0**, totalSize=12466, rawDataSize=0]
OK
FYI, the first time, with the same user (root), I was able to run the sqoop hive-import, view the data in Hive, and perform Hive operations.
I could not figure out what went wrong. Can anybody please help me resolve this issue?

issues with sqoop and hive

We are facing the following issues; details are below. Please share your inputs.
1) Issue with --validate option in sqoop
If we run the sqoop command without creating a job for it, --validate works. But if we first create a job with the --validate option, validation doesn't seem to run.
Works with:
sqoop import --connect "DB connection" --username $USER --password-file $File_Path --warehouse-dir $TGT_DIR --as-textfile --fields-terminated-by '|' --lines-terminated-by '\n' --table emp_table -m 1 --outdir $HOME/javafiles --validate
Does not work with:
sqoop job --create Job_import_emp -- import --connect "DB connection" --username $USER --password-file $File_Path --warehouse-dir $TGT_DIR --as-textfile --fields-terminated-by '|' --lines-terminated-by '\n' --table emp_table -m 1 --outdir $HOME/javafiles --validate
2) Issue with Hive import
If we are importing data into Hive for the first time, we have to create the Hive table (Hive-managed), so we keep "--create-hive-table" in the sqoop command.
Even though we keep the "--create-hive-table" option, is there any way to skip the create-table step in Hive while importing, if the table already exists?
Thanks
Sheik
Sqoop allows the --validate option only for the sqoop import and sqoop export commands.
Per the official Sqoop User Guide, validation has these limitations; it does not work with:
the --all-tables option
the free-form query option
data imported into a Hive or HBase table
imports with the --where argument
No, the table check cannot be skipped if the --create-hive-table option is set; the job will fail if the target table already exists.
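If the goal is simply to have the import succeed whether or not the Hive table already exists, one option is to drop --create-hive-table and rely on --hive-import, which (in the default Sqoop behavior) issues a CREATE TABLE IF NOT EXISTS, so an existing table is reused rather than causing a failure. A minimal sketch based on your working command, with the same placeholders:
# same flags as the working import above, minus --create-hive-table;
# the Hive table is created only if it does not exist yet
sqoop import --connect "DB connection" --username $USER --password-file $File_Path \
  --warehouse-dir $TGT_DIR --as-textfile \
  --fields-terminated-by '|' --lines-terminated-by '\n' \
  --table emp_table -m 1 --outdir $HOME/javafiles \
  --hive-import --hive-table emp_table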

Difference between 2 commands in Sqoop

Please tell me what the difference is between the two commands below.
sqoop import --connect jdbc:mysql://localhost:3306/db1 \
  --username root --password password \
  --table tableName --hive-table tableName --create-hive-table --hive-import;
sqoop create-hive-table --connect jdbc:mysql://localhost:3306/db1 \
  --username root --password password;
What is the difference between using --create-hive-table and just create-hive-table in the two commands?
Consider the two commands:
1) When --create-hive-table is used, the contents of the RDBMS table are copied to the location given by --target-dir (an HDFS location). Sqoop then checks whether the table sqoophive.emp exists in Hive.
If the table does not exist in Hive, the data from the HDFS location is moved into the Hive table and everything goes well.
If the table (sqoophive.emp) already exists in Hive, an error is thrown: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. AlreadyExistsException(message:Table emp already exists)
Example:
sqoop import \
--connect jdbc:mysql://jclient.ambari.org/sqoop \
--username hdfs -P \
--table employee \
--target-dir /user/hive/sqoop/employee \
--delete-target-dir \
--hive-import \
--hive-table sqoophive.emp \
--create-hive-table \
--fields-terminated-by ',' \
--num-mappers 3
2) When create-hive-table is used without --hive-import:
The schema of the RDBMS table (sqoop.employee) is fetched, and from it a table is created under the default database in Hive (default.employee). No data is transferred.
Example (a modified form of the one given in Hadoop: The Definitive Guide by Tom White):
sqoop create-hive-table \
--connect jdbc:mysql://jclient.ambari.org/sqoop \
--username hdfs -P \
--table employee \
--fields-terminated-by ','
Now the question is when to use which. The former is used when the data is present only in the RDBMS and we need to not only create but also populate the Hive table in one go.
The latter is used when the table has to be created in Hive but not populated, or when the data already exists in HDFS and is to be used to populate the Hive table.
sqoop-import --connect jdbc:mysql://localhost:3306/db1 \
  --username root --password password \
  --table tableName --hive-table tableName --create-hive-table --hive-import;
The above command will import data from the DB into Hive with Hive's default settings, and if the table is not already present it will create a table in Hive with the same schema as in the DB.
sqoop create-hive-table --connect jdbc:mysql://localhost:3306/db1 \
  --username root --password password;
The create-hive-table tool will create a table in the Hive metastore, with a definition based on a database table previously imported to HDFS, or one planned to be imported (it will pick it up from the sqoop job). This effectively performs the "--hive-import" step of sqoop-import without running the preceding import.
For example, suppose you have imported table1 from db1 into HDFS using sqoop. If you then execute create-hive-table, it will create a table in the Hive metastore with the schema of table1 from db1. That is useful for loading data into this table later whenever needed.
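A sketch of that two-step flow (the database, table, and path names here are only illustrative): import into HDFS first, create the Hive table from the RDBMS schema, then load the staged files into it:
# 1) copy the RDBMS table into a staging directory in HDFS (no Hive yet)
sqoop import --connect jdbc:mysql://localhost:3306/db1 \
  --username root --password password \
  --table table1 --target-dir /user/hive/staging/table1 \
  --fields-terminated-by ',' -m 1
# 2) create the Hive table definition from the same RDBMS schema
sqoop create-hive-table --connect jdbc:mysql://localhost:3306/db1 \
  --username root --password password \
  --table table1 --hive-table table1 \
  --fields-terminated-by ','
# 3) move the staged files into the new Hive table
hive -e "LOAD DATA INPATH '/user/hive/staging/table1' INTO TABLE table1;"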

How to easily import RDBMS data into Hive partitioned tables

I have tables in my RDBMS. I have chosen the 3rd column of a table as the partition column for my Hive table.
How can I easily import the RDBMS table's data into the Hive table, taking the partition column into account?
This works only for static partitions.
Refer to the sqoop script below for details:
sqoop import \
  --connect "jdbc:mysql://quickstart.cloudera:3306/prac" \
  --username root \
  --password cloudera \
  --hive-import \
  --query "select id,name,ts from student where city='Mumbai' and \$CONDITIONS" \
  --hive-table prac.student \
  --hive-partition-key city \
  --hive-partition-value 'Mumbai' \
  --target-dir /user/mangesh/sqoop_import/student_temp5 \
  --split-by id
Importing from an RDBMS into Hive can be achieved with sqoop.
Here is the relevant info for importing into partitioned tables:
http://sqoop.apache.org/docs/1.4.4/SqoopUserGuide.html#_importing_data_into_hive
You can tell a Sqoop job to import data for Hive into a particular partition by specifying the --hive-partition-key and --hive-partition-value arguments. The partition value must be a string. Please see the Hive documentation for more details on partitioning.
For dynamic partitions you can use something like:
sqoop import \
  --connect "jdbc:mysql://quickstart.cloudera:3306/prac" \
  --username root \
  --password cloudera \
  --table <mysql-tablename> \
  --hcatalog-database <hive-databasename> \
  --hcatalog-table <hive-tablename>
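For the dynamic-partition route, the partitioned Hive table is normally created up front; Sqoop's HCatalog integration then places each row into the partition matching its partition-column value. A minimal sketch of the pre-created table, reusing the prac.student and city names from the static example above (adjust the columns to your schema):
# pre-create the partitioned table that --hcatalog-table will write into;
# the column list follows the earlier prac.student example
hive -e "CREATE TABLE IF NOT EXISTS prac.student (id INT, name STRING, ts TIMESTAMP)
         PARTITIONED BY (city STRING);"
With the table in place, the import above (with --hcatalog-database prac --hcatalog-table student) should fill one partition per distinct city value.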

sqoop unable to import table with dot

I am trying to import a table with a dot in its name, and sqoop tells me the schema doesn't exist:
sqoop-import --connect jdbc:postgresql://db.xxxxxxxx:5432/production --driver org.postgresql.Driver --username xxxx --password xxxx --connection-manager org.apache.sqoop.manager.GenericJdbcManager --hive-database exxxxxxx --hive-import --warehouse-dir '/user/xxxxx/xxx_import/xxxx' --create-hive-table --table product.product
This works with import-all-tables, but that is really slow and it always fails.
A valid Hive table name consists of alphanumeric characters and underscores. In your example, product.product means the table product in the product schema.
Odoo uses an underscore as the delimiter, not a dot.
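One workaround (a sketch reusing your connection options; the target directory and Hive table name here are made up) is to read the schema-qualified table through a free-form query and give the Hive table an underscore-only name:
# the free-form query keeps the dotted name on the PostgreSQL side only;
# --query needs $CONDITIONS plus either --split-by or a single mapper
sqoop import --connect jdbc:postgresql://db.xxxxxxxx:5432/production \
  --driver org.postgresql.Driver --username xxxx --password xxxx \
  --query 'SELECT * FROM product.product WHERE $CONDITIONS' \
  --target-dir /user/xxxxx/xxx_import/product_product \
  --hive-import --hive-database exxxxxxx --hive-table product_product \
  -m 1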