I am trying to load a Hive table using the Sqoop import command, but when I run it, Sqoop reports that it does not support the SEQUENCE FILE format when loading into Hive.
Is this correct? I thought Sqoop had matured enough to handle all the formats available in Hive. Can anyone guide me on this? And is there a standard procedure for loading Hive tables stored in SEQUENCE FILE format using Sqoop?
Importing sequence files directly into Hive is currently not supported. You can, however, import the data into HDFS with --as-sequencefile and then create an external table on top of it. If you are getting exceptions even with this approach, please paste your sample code and logs so that I can help you.
Please find the code below:
sqoop import --connect jdbc:mysql://xxxxx/Emp_Details --username xxxx --password xxxx --table EMP --as-sequencefile --hive-import --target-dir /user/cloudera/emp_2 --hive-overwrite
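For comparison, here is a minimal sketch of the first half of the two-step approach suggested in the answer: the same import with --hive-import and --hive-overwrite removed, so the SequenceFiles are only written to HDFS. The function name is made up for illustration and the connection values are the placeholders from the command above. The external table definition is a separate step that depends on your Hive setup and is not shown here.

```shell
# Sketch only: same import as in the question, minus --hive-import and
# --hive-overwrite, so the SequenceFiles simply land in --target-dir on HDFS.
# Connection values are the placeholders from the question.
import_emp_as_sequencefile() {
  sqoop import \
    --connect jdbc:mysql://xxxxx/Emp_Details \
    --username xxxx \
    --password xxxx \
    --table EMP \
    --as-sequencefile \
    --target-dir /user/cloudera/emp_2
}

# Usage (on a node with Sqoop installed):
#   import_emp_as_sequencefile
```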
Is it possible to do a sqoop export from a partitioned Parquet Hive table to an Oracle database?
Our requirement is to send processed data to a legacy system that cannot connect to Hadoop/Hive. Thank you.
I tried:
sqoop export -Dmapreduce.job.queuename=root.hsi_sqm \
--connect jdbc:oracle:thin:@host:1521:sid \
--username abc \
--password cde \
--export-dir '/user/hive/warehouse/stg.db/tb_parquet_w_partition/' \
--table UNIQSUBS_DAY
and got this error:
ERROR sqoop.Sqoop: Got exception running Sqoop: org.kitesdk.data.DatasetNotFoundException: Descriptor location does not exist: hdfs://nameservice1/user/hive/warehouse/stg.db/tb_parquet_w_partition/.metadata
org.kitesdk.data.DatasetNotFoundException: Descriptor location does not exist: hdfs://nameservice1/user/hive/warehouse/stg.db/tb_parquet_w_partition/.metadata
at org.kitesdk.data.spi.filesystem.FileSystemMetadataProvider.checkExists(FileSystemMetadataProvider.java:562)
at org.kitesdk.data.spi.filesystem.FileSystemMetadataProvider.find(FileSystemMetadataProvider.java:605)
at org.kitesdk.data.spi.filesystem.FileSystemMetadataProvider.load(FileSystemMetadataProvider.java:114)
at org.kitesdk.data.spi.filesystem.FileSystemDatasetRepository.load(FileSystemDatasetRepository.java:197)
at org.kitesdk.data.Datasets.load(Datasets.java:108)
at org.kitesdk.data.Datasets.load(Datasets.java:140)
at org.kitesdk.data.mapreduce.DatasetKeyInputFormat$ConfigBuilder.readFrom(DatasetKeyInputFormat.java:92)
at org.kitesdk.data.mapreduce.DatasetKeyInputFormat$ConfigBuilder.readFrom(DatasetKeyInputFormat.java:139)
at org.apache.sqoop.mapreduce.JdbcExportJob.configureInputFormat(JdbcExportJob.java:84)
at org.apache.sqoop.mapreduce.ExportJobBase.runExport(ExportJobBase.java:432)
at org.apache.sqoop.manager.OracleManager.exportTable(OracleManager.java:465)
at org.apache.sqoop.tool.ExportTool.exportTable(ExportTool.java:80)
at org.apache.sqoop.tool.ExportTool.run(ExportTool.java:99)
at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
Is there a correct approach for this?
We were facing similar issues.
Sqoop expects a .metadata folder next to the Parquet files. If you created the Parquet files with some other process, the folder might be named something like .metadata-00000 instead.
You can try renaming that folder to .metadata and running the export again.
If that does not work, you can try a sqoop export through HCatalog.
For anyone hitting the same problem, here is the workaround that worked for me (details may vary with your environment):
1) Write the Hive data to an HDFS directory, for example with the INSERT OVERWRITE DIRECTORY command in Hive.
2) If the Hive query produced deflate-compressed files in that HDFS path, decompress them like this:
hdfs dfs -text <hdfs_path_file>/000000_0.deflate | hdfs dfs -put - <hdfs_target_path>/<target_file_name>
3) Run sqoop export on the inflated files, and remember to map the columns to the data types of the target table.
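A rough sketch of the final export step could look like the following. It assumes the Hive output was written without an explicit ROW FORMAT, in which case fields are separated by Ctrl-A (\001). The function name, the EXPORT_DIR path and the column names are hypothetical; the connection values are the placeholders from the question.

```shell
# Sketch only: export Hive's plain-text output from HDFS into the Oracle table.
# EXPORT_DIR and the column names are hypothetical; connection values are
# the placeholders from the question.
EXPORT_DIR="${EXPORT_DIR:-/tmp/uniqsubs_day_export}"

export_text_to_oracle() {
  sqoop export \
    --connect jdbc:oracle:thin:@host:1521:sid \
    --username abc \
    --password cde \
    --table UNIQSUBS_DAY \
    --columns "COL_A,COL_B" \
    --export-dir "$EXPORT_DIR" \
    --input-fields-terminated-by '\001'
}

# Usage (on a node with Sqoop installed):
#   export_text_to_oracle
```

The --columns list should name the target columns in the same order as the fields in the exported files.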
We are facing the issues described below. Please share your inputs.
1) Issue with the --validate option in sqoop
If we run the sqoop command directly, without creating a job for it, validation works. But if we first create a job with the --validate option, validation does not seem to run.
Works with:
sqoop import --connect "DB connection" --username $USER --password-file $File_Path --warehouse-dir $TGT_DIR --as-textfile --fields-terminated-by '|' --lines-terminated-by '\n' --table emp_table -m 1 --outdir $HOME/javafiles --validate
Does not work with:
sqoop job --create Job_import_emp import --connect "DB connection" --username $USER --password-file $File_Path --warehouse-dir $TGT_DIR --as-textfile --fields-terminated-by '|' --lines-terminated-by '\n' --table emp_table -m 1 --outdir $HOME/javafiles --validate
2) Issue with Hive import
When we import data into Hive for the first time, the Hive table (a Hive internal table) has to be created, so we keep "--create-hive-table" in the sqoop command.
If I keep the "--create-hive-table" option, is there any way to skip the create-table step during the import when the table already exists?
Thanks
Sheik
Sqoop allows the --validate option only for the sqoop import and sqoop export commands.
According to the official Sqoop User Guide, validation does not support the following:
all-tables option
free-form query option
Data imported into Hive or HBase table
import with --where argument
No, the table check cannot be skipped when the --create-hive-table option is set; the job will fail if the target table already exists.
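One possible way around this, sketched below, is to leave --create-hive-table out and keep only --hive-import. To my knowledge Sqoop then issues a CREATE TABLE IF NOT EXISTS statement, so the same command can run whether or not the table is already there, but please verify this on your Sqoop version. The function name and the DB_URL, DB_USER and PASSWORD_FILE defaults are hypothetical placeholders.

```shell
# Sketch only: re-runnable Hive import that leaves out --create-hive-table.
# DB_URL, DB_USER and PASSWORD_FILE are hypothetical placeholders.
DB_URL="${DB_URL:-jdbc:mysql://dbhost/hr}"
DB_USER="${DB_USER:-etl_user}"
PASSWORD_FILE="${PASSWORD_FILE:-/user/etl/.db_password}"

import_emp_into_hive() {
  sqoop import \
    --connect "$DB_URL" \
    --username "$DB_USER" \
    --password-file "$PASSWORD_FILE" \
    --table emp_table \
    -m 1 \
    --hive-import \
    --hive-table emp_table
}

# Usage (on a node with Sqoop and Hive installed):
#   import_emp_into_hive
```

Note that a second run appends to the existing Hive table unless --hive-overwrite is added.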
I want to import data from MySQL using sqoop import. My requirement is to use 4 mappers, but the import should create only one file in the HDFS target directory. Is there any way to do this?
No, there is no option in sqoop to merge the output files into one file.
I don't think this should be sqoop's concern anyway.
You can do it easily with Hadoop's getmerge feature. Example:
hadoop fs -getmerge /sqoop/target-dir/ /desired/local/output/file.txt
Here:
/sqoop/target-dir is the target-dir of your sqoop command (the directory containing all the part files).
/desired/local/output/file.txt is the combined single file on the local file system.
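If the single file should end up on HDFS rather than on the local disk, one alternative is to stream the part files through hadoop fs -cat into hadoop fs -put, which reads from stdin when the source is "-". This is a minimal sketch for plain-text output; the function name and the destination path in the usage line are hypothetical.

```shell
# Sketch only: concatenate Sqoop's part files into one HDFS file without
# keeping a local copy. The paths passed in are up to you.
merge_parts() {
  src_dir=$1
  dest_file=$2
  # The glob is quoted so that Hadoop, not the local shell, expands it.
  hadoop fs -cat "$src_dir/part-*" | hadoop fs -put - "$dest_file"
}

# Usage:
#   merge_parts /sqoop/target-dir /sqoop/merged/emp.txt
```

The put step fails if the destination file already exists, so remove or rename an old copy first.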
You can use the sqoop command below.
Suppose the database name is prateekDB and the table name is Emp:
sqoop import --connect "jdbc:mysql://localhost:3306/prateekDB" --username=root \
--password=data --table Emp --target-dir /SqoopImport --split-by empno
Add this option to the sqoop command:
--num-mappers 1
The sqoop log then shows:
Job Counters
Launched map tasks=1
Other local map tasks=1
and in the end only ONE file is created on HDFS.
I have a Hive table that is stored in ORC file format. I want to export the data to a Teradata database. I researched sqoop but could not find a way to export ORC files.
Is there a way to make sqoop work with ORC, or is there another tool I could use to export the data?
Thanks.
You can use HCatalog:
sqoop export --connect "jdbc:sqlserver://xxxx:1433;databaseName=xxx;USERNAME=xxx;PASSWORD=xxx" --table rdmsTableName --hcatalog-database hiveDB --hcatalog-table hiveTableName
I have a parquet table formatted as follows:
.impala_insert_staging
yearmonth=2013-04
yearmonth=2013-05
yearmonth=2013-06
...
yearmonth=2016-04
Underneath each of these directories are my Parquet files. I need to get them into another table, which currently contains only a .impala_insert_staging entry.
Please help.
The best approach I found is to pull the files down locally and sqoop them back up into a text table.
To pull the Parquet table down, I ran the following:
impala-shell -i <ip-addr> -B -q "use default; select * from <table>" -o filename '--output_delimiter=\x1A'
Unfortunately this adds the yearmonth value as another column on my table. So I either go into my 750GB file and sed/awk out that last column, or use mysqlimport (since I'm using MySQL as well) to import only the columns I'm interested in.
Finally, I sqoop the data up into a new text table:
sqoop import --connect jdbc:mysql://<mysqlip> --table <mysql_table> --username root --password <pass> --hive-import --hive-table <new_db_text>