Hadoop: Reading ORC files and putting into RDBMS?

I have a Hive table stored in ORC format. I want to export the data to a Teradata database. I researched Sqoop but could not find a way to export ORC files.
Is there a way to make Sqoop work with ORC, or is there any other tool I could use to export the data?
Thanks.

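You can use the HCatalog integration. Sqoop then reads the table through its definition in the Hive metastore, so the ORC format is handled by Hive. The example below targets SQL Server; adjust the --connect string (and the JDBC driver) for Teradata: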
sqoop export --connect "jdbc:sqlserver://xxxx:1433;databaseName=xxx;USERNAME=xxx;PASSWORD=xxx" --table rdmsTableName --hcatalog-database hiveDB --hcatalog-table hiveTableName
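Applied to the Teradata database from the question, the same HCatalog approach might look like the sketch below. It is untested and rests on assumptions: the Teradata JDBC driver jar (driver class com.teradata.jdbc.TeraDriver) is assumed to be in Sqoop's lib directory, the function name is hypothetical, and every host, database, table and user name is a placeholder. Since the rows are read through HCatalog, no --export-dir is given and the ORC files are not converted first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: export an ORC-backed Hive table to Teradata via HCatalog.
# Assumes the Teradata JDBC driver jar is on Sqoop's classpath.
# Usage: export_hive_to_teradata <td_host> <td_db> <td_table> <td_user> <hive_db> <hive_table>
export_hive_to_teradata() {
  local td_host="$1" td_db="$2" td_table="$3" td_user="$4"
  local hive_db="$5" hive_table="$6"

  # --driver makes Sqoop use its generic JDBC manager for Teradata.
  # -P prompts for the password instead of putting it in the URL.
  # HCatalog reads the rows in the table's own storage format (ORC here),
  # so there is no --export-dir.
  sqoop export \
    --connect "jdbc:teradata://${td_host}/DATABASE=${td_db}" \
    --driver com.teradata.jdbc.TeraDriver \
    --username "$td_user" -P \
    --table "$td_table" \
    --hcatalog-database "$hive_db" \
    --hcatalog-table "$hive_table"
}
```

After sourcing the file, a call with placeholder values would be: export_hive_to_teradata tdhost sales ORDERS etl_user default orders_orc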

Related

Command to import data from CSV to Avro table using sqoop

I have a CSV file named test.csv in HDFS.
I have created an Avro table (avro_test) using Hue, with the same column names as the CSV file. I want to use a sqoop command to load the CSV rows into the Avro table.
What sqoop command will achieve this?
Sqoop is meant for transferring data between an RDBMS and Hadoop, not between files that are already in HDFS. You can simply insert the CSV data into the Avro table you have created, using Hive.
Please refer to the question below:
Load from CSV File to Hive Table with Sqoop?
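A minimal sketch of that insert, driven from the shell through beeline. It assumes HiveServer2 is reachable at the JDBC URL you pass in, that test.csv sits alone in its own HDFS directory (an external table reads every file under its LOCATION), and that avro_test has the columns id INT and name STRING. The function name, the staging table name and the column list are hypothetical and must be changed to match your table. Hive parses the text on read and writes Avro on insert, so no Sqoop step is involved.

```shell
#!/usr/bin/env bash
# Hypothetical helper: load CSV files already in HDFS into the existing
# Avro-backed Hive table avro_test by staging them as an external text table.
# The column list (id INT, name STRING) is a placeholder; match it to avro_test.
# Usage: load_csv_into_avro <hive_jdbc_url> <csv_dir>
load_csv_into_avro() {
  local jdbc_url="$1" csv_dir="$2"
  # csv_dir must contain only the CSV file(s) to load.
  local hql="
CREATE EXTERNAL TABLE IF NOT EXISTS test_csv_staging (id INT, name STRING)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
  STORED AS TEXTFILE
  LOCATION '${csv_dir}';
INSERT INTO TABLE avro_test SELECT * FROM test_csv_staging;"
  # beeline runs both statements in order against HiveServer2.
  beeline -u "$jdbc_url" -e "$hql"
}
```

If the CSV file has a header row, adding TBLPROPERTIES ('skip.header.line.count'='1') to the staging table definition makes Hive skip it.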

sqoop export from hive partitioned parquet table to oracle

Is it possible to do a Sqoop export from a partitioned Parquet Hive table to an Oracle database?
Our requirement is to feed processed data to a legacy system that cannot connect to Hadoop/Hive. Thank you.
I tried:
sqoop export -Dmapreduce.job.queuename=root.hsi_sqm \
--connect jdbc:oracle:thin:@host:1521:sid \
--username abc \
--password cde \
--export-dir '/user/hive/warehouse/stg.db/tb_parquet_w_partition/' \
--table UNIQSUBS_DAY
and got this error:
ERROR sqoop.Sqoop: Got exception running Sqoop: org.kitesdk.data.DatasetNotFoundException: Descriptor location does not exist: hdfs://nameservice1/user/hive/warehouse/stg.db/tb_parquet_w_partition/.metadata
org.kitesdk.data.DatasetNotFoundException: Descriptor location does not exist: hdfs://nameservice1/user/hive/warehouse/stg.db/tb_parquet_w_partition/.metadata
at org.kitesdk.data.spi.filesystem.FileSystemMetadataProvider.checkExists(FileSystemMetadataProvider.java:562)
at org.kitesdk.data.spi.filesystem.FileSystemMetadataProvider.find(FileSystemMetadataProvider.java:605)
at org.kitesdk.data.spi.filesystem.FileSystemMetadataProvider.load(FileSystemMetadataProvider.java:114)
at org.kitesdk.data.spi.filesystem.FileSystemDatasetRepository.load(FileSystemDatasetRepository.java:197)
at org.kitesdk.data.Datasets.load(Datasets.java:108)
at org.kitesdk.data.Datasets.load(Datasets.java:140)
at org.kitesdk.data.mapreduce.DatasetKeyInputFormat$ConfigBuilder.readFrom(DatasetKeyInputFormat.java:92)
at org.kitesdk.data.mapreduce.DatasetKeyInputFormat$ConfigBuilder.readFrom(DatasetKeyInputFormat.java:139)
at org.apache.sqoop.mapreduce.JdbcExportJob.configureInputFormat(JdbcExportJob.java:84)
at org.apache.sqoop.mapreduce.ExportJobBase.runExport(ExportJobBase.java:432)
at org.apache.sqoop.manager.OracleManager.exportTable(OracleManager.java:465)
at org.apache.sqoop.tool.ExportTool.exportTable(ExportTool.java:80)
at org.apache.sqoop.tool.ExportTool.run(ExportTool.java:99)
at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
Is there a correct approach for this?
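We were facing similar issues.
Sqoop's Parquet support goes through the Kite SDK (note the org.kitesdk classes in the stack trace), which expects a .metadata folder in the dataset directory. If the Parquet files were created by some other process, that folder might be named differently, e.g. .metadata-00000 (or something similar).
You can try renaming the folder to .metadata and running the export again.
If that does not work, you can try a Sqoop export with HCatalog instead.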
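The HCatalog variant of the export in the question could look like the untested sketch below. With --hcatalog-table, Sqoop should read the rows through the Hive metastore instead of treating --export-dir as a Kite dataset, which is where the .metadata lookup in the stack trace comes from. The function name and its argument order are hypothetical; the queue, Hive database, table and user names are the ones from the question, and the Oracle table name stays upper case.

```shell
#!/usr/bin/env bash
# Hypothetical helper: export a partitioned Parquet Hive table to Oracle via
# HCatalog instead of pointing --export-dir at the warehouse directory.
# Usage: export_parquet_to_oracle <host> <sid> <user> <hive_db> <hive_table> <oracle_table>
export_parquet_to_oracle() {
  local host="$1" sid="$2" user="$3"
  local hive_db="$4" hive_table="$5" ora_table="$6"

  # Generic Hadoop -D options must come directly after the tool name.
  # The Oracle thin URL uses "@" before the host.
  sqoop export -Dmapreduce.job.queuename=root.hsi_sqm \
    --connect "jdbc:oracle:thin:@${host}:1521:${sid}" \
    --username "$user" -P \
    --table "$ora_table" \
    --hcatalog-database "$hive_db" \
    --hcatalog-table "$hive_table"
}
```

With the names from the question, the call would be: export_parquet_to_oracle <host> <sid> abc stg tb_parquet_w_partition UNIQSUBS_DAY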
For anyone encountering the same problem, here is my own solution (it may vary depending on your environment):
1. Write the Hive data to an HDFS directory; you can use the INSERT OVERWRITE DIRECTORY command in Hive.
2. If the Hive query generated deflate-compressed files in the designated HDFS path, decompress them (the "-" after -put makes it read from standard input):
hdfs dfs -text <hdfs_path_file>/000000_0.deflate | hdfs dfs -put - <hdfs_target_path>/<target_file_name>
3. Export the decompressed files with the sqoop export command; don't forget to map your columns to the data types of the target table.
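The three steps above might be scripted roughly as follows. This is a sketch under assumptions: the Hive output is deflate-compressed as in the answer (so the part files end in .deflate), the scratch directories /tmp/export_raw and /tmp/export_plain and the function name are hypothetical, and the data contains no commas. ROW FORMAT DELIMITED in INSERT OVERWRITE DIRECTORY (Hive 0.11 or later) keeps the delimiter explicit so it can be matched by --input-fields-terminated-by, and Hive writes NULL as \N, which the --input-null-* options translate back.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the Hive -> text -> Sqoop export workaround.
# Usage: export_via_text_dir <hive_table> <jdbc_url> <user> <rdbms_table>
export_via_text_dir() {
  local hive_table="$1" jdbc_url="$2" user="$3" rdbms_table="$4"
  local raw_dir=/tmp/export_raw plain_dir=/tmp/export_plain

  # 1. Write the Hive rows as comma-delimited text into an HDFS directory.
  hive -e "INSERT OVERWRITE DIRECTORY '${raw_dir}'
           ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
           SELECT * FROM ${hive_table};"

  # 2. Decompress every .deflate part file into one plain-text file.
  #    The glob is quoted so HDFS, not the local shell, expands it;
  #    "-put -" reads from standard input, -f overwrites an old copy.
  hdfs dfs -mkdir -p "$plain_dir"
  hdfs dfs -text "${raw_dir}/*.deflate" | hdfs dfs -put -f - "${plain_dir}/part-00000"

  # 3. Export the plain text; the delimiter must match step 1.
  sqoop export --connect "$jdbc_url" --username "$user" -P \
    --table "$rdbms_table" \
    --export-dir "$plain_dir" \
    --input-fields-terminated-by ',' \
    --input-null-string '\\N' --input-null-non-string '\\N'
}
```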

How to create single file while using sqoop import with multiple mappers

I want to import data from MySQL using sqoop import. My requirement is to use 4 mappers, but it should create only one file in the HDFS target directory. Is there any way to do this?
No, there is no option in Sqoop to re-partition the output into one file.
I don't think this should be Sqoop's job anyway.
You can do it easily with Hadoop's getmerge command. Example:
hadoop fs -getmerge /sqoop/target-dir/ /desired/local/output/file.txt
Here, /sqoop/target-dir is the target-dir of your sqoop command (the directory containing all the part files), and /desired/local/output/file.txt is the combined single file on the local file system.
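If the single file is needed back in HDFS rather than on local disk, the getmerge step can be followed by a put. The sketch below keeps the 4 mappers from the question; the function name, the _merged directory suffix and all arguments are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: parallel import, then merge the part files into one.
# Usage: import_then_merge <jdbc_url> <user> <table> <split_col> <target_dir> <local_file>
import_then_merge() {
  local jdbc_url="$1" user="$2" table="$3" split_col="$4"
  local target_dir="$5" local_file="$6"

  # 4 mappers write 4 part files into target_dir.
  sqoop import --connect "$jdbc_url" --username "$user" -P \
    --table "$table" --split-by "$split_col" \
    --num-mappers 4 --target-dir "$target_dir"

  # Concatenate all part files into a single local file...
  hadoop fs -getmerge "$target_dir" "$local_file"
  # ...and, if the single file must live in HDFS, copy it back.
  hadoop fs -mkdir -p "${target_dir}_merged"
  hadoop fs -put -f "$local_file" "${target_dir}_merged/$(basename "$local_file")"
}
```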
You can use the sqoop command below. Suppose the database name is prateekDB and the table name is Emp:
sqoop import --connect "jdbc:mysql://localhost:3306/prateekDB" --username root \
--password data --table Emp --target-dir /SqoopImport --split-by empno
Add this option to the sqoop command:
--num-mappers 1
The sqoop log then shows:
Job Counters
Launched map tasks=1
Other local map tasks=1
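and in the end only ONE file is created in HDFS. Note that this runs the import with a single mapper rather than the four the question asks for.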

Sqoop export hive table to RDBMS using Hive table name

How can I export Hive table data to an RDBMS with Sqoop using the Hive table name? I am able to export Hive data from the Hive table path.
To export directly from a Hive table, you need to reference it through HCatalog. The case of the destination table name also matters: for MySQL the table name should be lower case, but for Oracle it will be upper case. Below is an example on Cloudera where the destination database is MySQL. I did not need --hcatalog-home or --hcatalog-database, but depending on your setup they might be required.
sqoop export \
--connect jdbc:mysql://localhost/retail_db \
--username root -P \
--table mysql_test \
--hcatalog-table test

Sqoop import into Hive Sequence table

I am trying to load a Hive table using the Sqoop import command, but when I run it, it says that Sqoop doesn't support the SEQUENCEFILE format when loading into Hive.
Is this correct? I thought Sqoop had matured to support all the formats present in Hive. Can anyone guide me on this, and is there a standard procedure for loading Hive tables stored as SEQUENCEFILE using Sqoop?
Importing sequence files directly into Hive is not supported yet. But you can import the data into HDFS with --as-sequencefile and then create an external table on top of it. Since you say you are getting exceptions even with this approach, please paste your sample code and logs so that I can help you.
Please find my code below:
sqoop import --connect jdbc:mysql://xxxxx/Emp_Details --username xxxx --password xxxx --table EMP --as-sequencefile --hive-import --target-dir /user/cloudera/emp_2 --hive-overwrite
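The command above combines --as-sequencefile with --hive-import, which is the unsupported combination. One workaround, a different technique from the external-table route suggested in the answer and untested here, is to let Sqoop do a plain-text Hive import into a staging table and have Hive copy the rows into the existing SequenceFile table. The function name and the _staging table name are hypothetical, and the staging and target tables are assumed to have the same columns in the same order.

```shell
#!/usr/bin/env bash
# Hypothetical two-step load into a Hive table STORED AS SEQUENCEFILE.
# Usage: load_into_seq_table <jdbc_url> <user> <src_table> <seq_table>
load_into_seq_table() {
  local jdbc_url="$1" user="$2" src_table="$3" seq_table="$4"
  local staging="${seq_table}_staging"

  # 1. Plain-text Hive import (supported by --hive-import) into a staging
  #    table; Sqoop creates the staging table if it does not exist.
  sqoop import --connect "$jdbc_url" --username "$user" -P \
    --table "$src_table" \
    --hive-import --hive-table "$staging" --hive-overwrite

  # 2. Hive rewrites the rows in the target table's SequenceFile format.
  hive -e "INSERT OVERWRITE TABLE ${seq_table} SELECT * FROM ${staging};"
}
```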