Is there any way to validate data at the time of import in Hive? Since Hive is schema-on-read, any help would be appreciated.
Thanks
I am trying to import 800 tables from SQL Server to Hive using the sqoop import-all-tables command. The SQL Server database contains both views and tables, and I want to import only the tables, not the views, into Hive.
Also, when I run import-all-tables and Sqoop hits any error, it exits at that point and I have to restart the whole process from the beginning.
The errors are things like naming-convention problems, datatype issues, and tables with no primary key in SQL Server.
Thank you
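For the views part of the question, one option worth checking is the --exclude-tables argument of import-all-tables, which skips the listed objects by name. A minimal sketch, where the JDBC URL, credentials and view names are placeholders:

sqoop import-all-tables \
  --connect "jdbc:sqlserver://dbhost:1433;databaseName=mydb" \
  --username myuser --password-file /user/etl/.db_password \
  --exclude-tables vw_sales_summary,vw_customer_orders \
  --hive-import \
  --autoreset-to-one-mapper

The --autoreset-to-one-mapper flag is one way to get past tables that have no primary key (Sqoop falls back to a single mapper instead of failing), but it does not change the stop-on-error behaviour of import-all-tables.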
What are the ways to load data from a BigQuery table into another database?
What steps are required to load data from BigQuery into another DB?
You have to do it in three steps (see the sketch after this list):
1) Export the data from BigQuery: https://cloud.google.com/bigquery/docs/exporting-data
2) Prepare your data to be imported
3) Import it into Db2
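As a rough sketch of those steps with the bq, gsutil and db2 command-line tools (the dataset, bucket, file and table names are placeholders, and the Db2 table is assumed to already exist with matching columns):

# 1) Export the BigQuery table to Cloud Storage as CSV
bq extract --destination_format CSV 'mydataset.mytable' gs://my-bucket/export/mytable-*.csv

# 2) Copy the exported files down and do any cleanup/format fixes needed
gsutil cp gs://my-bucket/export/mytable-*.csv /tmp/export/

# 3) Load the delimited files into Db2
db2 connect to MYDB
db2 "IMPORT FROM /tmp/export/mytable-000000000000.csv OF DEL INSERT INTO myschema.mytable"

If the export includes a header row, strip it (or skip it on the Db2 side) before the import.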
I have a list of IDs based on which I want to import data into Hive using Sqoop, but there are too many IDs to pass them in manually. How can I do this with minimal effort?
Any guidance would be very much appreciated.
Thanks in advance. :)
Sqoop cannot read a list of values from a file.
The best option is to write a shell script that reads the list of values, builds the Sqoop command on the fly from the values it read, and executes it.
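A minimal sketch of such a script, assuming the IDs sit one per line in a file called ids.txt and that the connection details, table and column names below are placeholders:

#!/bin/bash
# Join the IDs into a comma-separated list, e.g. "101,102,103"
IDS=$(paste -sd, ids.txt)

# Build and run the Sqoop import, filtering on the ID list
sqoop import \
  --connect jdbc:mysql://dbhost/sales \
  --username etl --password-file /user/etl/.db_password \
  --table orders \
  --where "order_id IN (${IDS})" \
  --hive-import --hive-table default.orders_subset \
  --num-mappers 4

If the list is very long, the IN (...) clause can hit the database's statement-length limit, so splitting ids.txt into batches and looping over them is a common refinement.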
All,
I have a question about sqooping. I am sqooping around 2 TB of data for one table and then need to write an ORC table with that data. What is the best way to achieve this?
1) Sqoop all the data into dir1 as text and write HQL to load it into an ORC table, but the script fails with a vertex issue
2) Sqoop the data in chunks, then process and append it into the Hive table (have you done this?)
3) Sqoop with hive import to write all the data directly to a Hive ORC table
Which is the best way?
Option three will be better, because you don't need to create a Hive table and then load the data into it and store it in ORC format separately, which is a long process for 2 TB of data. It is better to do it in Sqoop so it can push the data directly into the Hive table in ORC format. But when you are returning data from the Hive table to the RDBMS, you have to use sqoopserde.
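As an illustration of that direct route (a sketch only; the connection details, table names and split column are placeholders), Sqoop can write straight into an ORC-backed Hive table through its HCatalog integration:

sqoop import \
  --connect jdbc:oracle:thin:@dbhost:1521/ORCL \
  --username etl --password-file /user/etl/.db_password \
  --table BIG_TABLE \
  --split-by ID \
  --num-mappers 16 \
  --hcatalog-database default \
  --hcatalog-table big_table_orc \
  --create-hcatalog-table \
  --hcatalog-storage-stanza "stored as orcfile"

The --hcatalog-storage-stanza text is passed through to the CREATE TABLE statement that HCatalog issues, which is what makes the target table ORC.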
I am trying to load a Hive table using the Sqoop import command, but when I run it, it says that Sqoop doesn't support SEQUENCE FILE FORMAT when loading into Hive.
Is this correct? I thought Sqoop had matured to cover all the formats present in Hive. Can anyone guide me on this, and on the standard procedure, if there is one, to load Hive tables that use SEQUENCE FILE FORMAT with Sqoop?
Importing sequence files directly into Hive is currently not supported. But you can import the data --as-sequencefile into HDFS and then create an external table on top of that. As you are saying you are getting exceptions even with this approach, please paste your sample code and logs so that I can help you.
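A sketch of that two-step approach (the connection string, paths, table name and column list are placeholders and would need to match your actual schema):

# 1) Import into HDFS as sequence files, without --hive-import
sqoop import \
  --connect jdbc:mysql://dbhost/Emp_Details \
  --username etl --password-file /user/etl/.db_password \
  --table EMP \
  --as-sequencefile \
  --target-dir /user/cloudera/emp_seq

# 2) Create an external Hive table over the imported directory
hive -e "
CREATE EXTERNAL TABLE emp_seq (emp_id INT, emp_name STRING, salary DOUBLE)
STORED AS SEQUENCEFILE
LOCATION '/user/cloudera/emp_seq';
"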
Please find the code below:
sqoop import --connect jdbc:mysql://xxxxx/Emp_Details --username xxxx --password xxxx --table EMP --as-sequencefile --hive-import --target-dir /user/cloudera/emp_2 --hive-overwrite