I have a CSV file named test.csv on HDFS.
I have created an Avro table (avro_test) using Hue with the same column names as the CSV file. I want to use a Sqoop command to load the CSV data into the Avro table.
What Sqoop command will achieve this?
Sqoop is meant to load/transfer data between an RDBMS and Hadoop, not between files that are already on HDFS and Hive tables. You can simply insert the CSV data into the Avro table you have created from within Hive.
Please refer to the link below.
Load from CSV File to Hive Table with Sqoop?
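For example, a minimal sketch of that approach in HiveQL, assuming the CSV has two columns (id and name) and sits in the directory /user/me/csv_staging/; the column names and path are placeholders to replace with your own:

-- Staging table over the directory that holds test.csv (hypothetical schema and path)
CREATE EXTERNAL TABLE csv_staging (id INT, name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/user/me/csv_staging/';

-- Copy the rows into the Avro table created in Hue
INSERT INTO TABLE avro_test SELECT id, name FROM csv_staging;

Note that LOCATION must point at a directory containing the file, not at test.csv itself.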
Related
I have .sql files in HDFS that contain MySQL INSERT queries. I created an external table in Impala with the appropriate schema. Now I want to run all of the INSERT commands stored in HDFS against that table. Is there a way to do it? The data size is 900 GB!
Is there any other way to import the .sql files from HDFS? We have tried Hive, but it requires all of the insert fields to be in lowercase, which ours are not.
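For context, the external table was created with a statement along these lines; the column names and location are hypothetical placeholders, not the real schema:

-- Hypothetical Impala external table; replace the columns and location with the real ones
CREATE EXTERNAL TABLE target_table (id INT, name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/me/target_table/';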
All,
I have a question about sqooping: I am sqooping around 2 TB of data for one table and then need to write an ORC table with that. What is the best way to achieve it?
1) Sqoop all the data into dir1 as text and write HQL to load it into an ORC table, where the script fails with a vertex issue.
2) Sqoop the data in chunks, then process and append it into the Hive table (have you done this?).
3) Sqoop Hive import to write all the data directly to a Hive ORC table.
Which is the best way?
Option three will be better, because you don't need to create a Hive table, load the data into it, and then store that data in ORC format; that is a long process for 2 TB of data. It is better to do it in Sqoop so it can push the data directly into the Hive table in ORC format. But when you are returning data from the Hive table to the RDBMS, you have to use Sqoop's HCatalog/SerDe support.
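A minimal sketch of option three, assuming Sqoop's HCatalog integration is available; the connection string, credentials, table names and mapper count are placeholders, and --hcatalog-storage-stanza is what makes the created Hive table ORC:

sqoop import \
  --connect "jdbc:mysql://dbhost:3306/sourceDB" \
  --username xxx --password xxx \
  --table sourceTable \
  --hcatalog-database hiveDB \
  --hcatalog-table targetOrcTable \
  --create-hcatalog-table \
  --hcatalog-storage-stanza "stored as orc" \
  -m 8

For 2 TB you would also want a sensible --split-by column so the mappers divide the work evenly.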
I am exporting a Hive partitioned table to Teradata. How can the partition column be included in the export?
I know this can be done by copying the data to an HDFS directory and then using that directory as the source for sqoop export. Is there a one-step approach to include the partition column during the export?
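The two-step approach mentioned above looks roughly like this; the paths, columns, connection string and table names are placeholders, and the Teradata JDBC driver is assumed to be on Sqoop's classpath:

-- Step 1 (Hive): write the table, including the partition column, to a plain directory
INSERT OVERWRITE DIRECTORY '/tmp/export_staging'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
SELECT col1, col2, partition_col FROM my_partitioned_table;

# Step 2: export that directory with Sqoop
sqoop export \
  --connect "jdbc:teradata://tdhost/DATABASE=targetDB" \
  --username xxx --password xxx \
  --table targetTable \
  --export-dir /tmp/export_staging \
  --input-fields-terminated-by ','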
I exported a Hive table to HDFS using
export table abc to '/tmp/123'
It created two directories, /data and /metadata, which hold the data and metadata respectively.
I then sent this data to another Hadoop cluster using the distcp command.
Now I want to import this data into a table xyz (not created yet).
Using the Hive import command
import table xyz from '/tmp/123'
It created table xyz with the data, but in text format.
I want xyz in ORC format.
Yes, I can create another table from this text table.
Is there a better way to do that?
Can I specify the format anywhere in the command?
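The workaround mentioned above (creating another table from the text table) is a one-liner in HiveQL; xyz_orc is just a hypothetical name for the ORC copy:

CREATE TABLE xyz_orc STORED AS ORC AS SELECT * FROM xyz;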
I have a Hive table which is stored in ORC file format. I want to export the data to a Teradata database. I researched Sqoop but could not find a way to export ORC files.
Is there a way to make Sqoop work with ORC? Or is there any other tool that I could use to export the data?
Thanks.
You can use HCatalog:
sqoop export \
  --connect "jdbc:sqlserver://xxxx:1433;databaseName=xxx;USERNAME=xxx;PASSWORD=xxx" \
  --table rdmsTableName \
  --hcatalog-database hiveDB \
  --hcatalog-table hiveTableName
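The example above uses a SQL Server JDBC URL; for the Teradata target in the question, the same HCatalog options apply with a Teradata connection string (the host, database, table and credentials below are placeholders, and the Teradata JDBC driver must be available to Sqoop):

sqoop export \
  --connect "jdbc:teradata://tdhost/DATABASE=targetDB" \
  --username xxx --password xxx \
  --table targetTable \
  --hcatalog-database hiveDB \
  --hcatalog-table hiveTableName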