Importing data from RDBMS using sqoop to hive

Importing data from RDBMS using sqoop to hive - hive

I was trying to import data from RDBMS using sqoop into hive but getting below error :
premature end of file hive-site.xml

Related

is there any way to import all tables not views from RDBMS to hive using sqoop

I m trying to import 800 tables from SQL server to hive using sqoop import-all command, i have some views and some tables in SQL database i want to import only tables not views in hive using sqoop
also when i m doing it with import all command if there is any error sqoop exits at that point again have to start the process from the beginning.
errors are like naming conventions datatype issue, no primary key in SQL table.
thank you

sqoop export from hive partitioned parquet table to oracle

is it possible to do sqoop export from parquet partitioned hive table to oracle database?
our requirement is to use processed data to legacy system that cannot support hadoop/hive connection, thank you..
tried:
sqoop export -Dmapreduce.job.queuename=root.hsi_sqm \
--connect jdbc:oracle:thin:#host:1521:sid \
--username abc \
--password cde \
--export-dir '/user/hive/warehouse/stg.db/tb_parquet_w_partition/' \
--table UNIQSUBS_DAY
got error:
ERROR sqoop.Sqoop: Got exception running Sqoop: org.kitesdk.data.DatasetNotFoundException: Descriptor location does not exist: hdfs://nameservice1/user/hive/warehouse/stg.db/tb_parquet_w_partition/.metadata
org.kitesdk.data.DatasetNotFoundException: Descriptor location does not exist: hdfs://nameservice1/user/hive/warehouse/stg.db/tb_parquet_w_partition/.metadata
at org.kitesdk.data.spi.filesystem.FileSystemMetadataProvider.checkExists(FileSystemMetadataProvider.java:562)
at org.kitesdk.data.spi.filesystem.FileSystemMetadataProvider.find(FileSystemMetadataProvider.java:605)
at org.kitesdk.data.spi.filesystem.FileSystemMetadataProvider.load(FileSystemMetadataProvider.java:114)
at org.kitesdk.data.spi.filesystem.FileSystemDatasetRepository.load(FileSystemDatasetRepository.java:197)
at org.kitesdk.data.Datasets.load(Datasets.java:108)
at org.kitesdk.data.Datasets.load(Datasets.java:140)
at org.kitesdk.data.mapreduce.DatasetKeyInputFormat$ConfigBuilder.readFrom(DatasetKeyInputFormat.java:92)
at org.kitesdk.data.mapreduce.DatasetKeyInputFormat$ConfigBuilder.readFrom(DatasetKeyInputFormat.java:139)
at org.apache.sqoop.mapreduce.JdbcExportJob.configureInputFormat(JdbcExportJob.java:84)
at org.apache.sqoop.mapreduce.ExportJobBase.runExport(ExportJobBase.java:432)
at org.apache.sqoop.manager.OracleManager.exportTable(OracleManager.java:465)
at org.apache.sqoop.tool.ExportTool.exportTable(ExportTool.java:80)
at org.apache.sqoop.tool.ExportTool.run(ExportTool.java:99)
at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
is there any correct approach for this?

We were facing similar issues.
Parquet creates .metadata folder. If you created the parquet using some other process , it might create like .metadata-00000 ( something similar ).
You can try renaming the folder to .metadata and try.
Else, if this does not works you can try with hcatalog sqoop export.

Hi for those encountering the same problem as me, here is my own solution (this could be vary depends on your environment)
write hive data to hdfs directory, you can use insert overwrite directory command in hive.
if you have deflated generated data from hive query in designated hdfs path, use this:
hdfs dfs -text <hdfs_path_file>/000000_0.deflate | hdfs dfs -put <target_file_name> <hdfs_target_path>
sqoop export the inflated files using sqoop export command, don't forget to map your column according to your data type in target table

Hadoop : Reading ORC files and putting into RDBMS?

I have a hive table which is stored in ORC files format. I want to export the data to a Teradata database. I researched sqoop but could not find a way to export ORC files.
Is there a way to make sqoop work for ORC ? or is there any other tool that I could use to export the data ?
Thanks.

You can use Hcatalog
sqoop export --connect "jdbc:sqlserver://xxxx:1433;databaseName=xxx;USERNAME=xxx;PASSWORD=xxx" --table rdmsTableName --hcatalog-database hiveDB --hcatalog-table hiveTableName

Sqoop import into Hive Sequence table

I am trying to load a Hive table using the Sqoop import commands. But when I run it says that Sqoop doesn't support SEQUENCE FILE FORMAT while loading into hive.
Is this correct , I through SQOOP has matured for all the formats present in Hive. Can anyone guide me on this. And if at all standard procedure to load Hive tables which have SEQUENCE FILE FORMAT using sqoop.

Currently importing of sequence files directly into Hive is not supported yet. But you can import data --as-seuquencefile into HDFS and then you can create an external table on top of that. As you are saying you are getting exceptions even with this approach, please paste your sample code & logs, so that I can help you.

PFB code
sqoop import --connect jdbc:mysql://xxxxx/Emp_Details --username xxxx--password xxxx --table EMP --as-sequencefile --hive-import --target-dir /user/cloudera/emp_2 --hive-overwrite

how do you import a local csv into hive without creating a schema using sqoop

I have a csv in my local directory and i wish to create a hive table of it..The problem is csv has many columns...

In authors words Sqoop means Sql-to-Hadoop.. you can't use Sqoop to import data from your local to hdfs in any way.
Sqoop (“SQL-to-Hadoop”) is a straightforward command-line tool with the following capabilities:
Imports individual tables or entire databases to files in HDFS
Generates Java classes to allow you to interact with your imported data
Provides the ability to import from SQL databases straight into your Hive data warehouse
For more information follow below links:
http://blog.cloudera.com/blog/2009/06/introducing-sqoop/
http://kickstarthadoop.blogspot.com/2011/06/how-to-speed-up-your-hive-queries-in.html

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Importing data from RDBMS using sqoop to hive - hive

I was trying to import data from RDBMS using sqoop into hive but getting below error : premature end of file hive-site.xml

Related

is there any way to import all tables not views from RDBMS to hive using sqoop

sqoop export from hive partitioned parquet table to oracle

Hadoop : Reading ORC files and putting into RDBMS?

Sqoop import into Hive Sequence table

how do you import a local csv into hive without creating a schema using sqoop

Categories

Resources