how do you import a local csv into hive without creating a schema using sqoop - hive

I have a CSV in my local directory and I wish to create a Hive table from it. The problem is the CSV has many columns.

In the author's words, Sqoop means SQL-to-Hadoop. You can't use Sqoop to import data from your local filesystem into HDFS in any way.
Sqoop (“SQL-to-Hadoop”) is a straightforward command-line tool with the following capabilities:
Imports individual tables or entire databases to files in HDFS
Generates Java classes to allow you to interact with your imported data
Provides the ability to import from SQL databases straight into your Hive data warehouse
For more information, follow the links below:
http://blog.cloudera.com/blog/2009/06/introducing-sqoop/
http://kickstarthadoop.blogspot.com/2011/06/how-to-speed-up-your-hive-queries-in.html
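If the goal is just to get a local CSV into Hive, Hive itself can do that without Sqoop, although as far as I know you still have to spell out the column list; Hive does not infer a schema from a plain CSV. A minimal sketch, where the table name, columns, and file path are placeholders rather than anything from the question:

-- define the schema for the CSV (column names/types are examples only)
CREATE TABLE my_csv_table (
  id INT,
  name STRING,
  price DOUBLE
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- copy the local file into the table's warehouse directory
LOAD DATA LOCAL INPATH '/home/user/data.csv' INTO TABLE my_csv_table;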

Related

Is there any way to import all tables, but not views, from an RDBMS to Hive using Sqoop?

I'm trying to import 800 tables from SQL Server to Hive using the sqoop import-all-tables command. I have some views and some tables in the SQL database, and I want to import only the tables, not the views, into Hive using Sqoop.
Also, when I run the import-all command, if there is any error Sqoop exits at that point and I have to start the whole process again from the beginning.
The errors are things like naming conventions, datatype issues, and no primary key in the SQL table.
Thank you
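Not a full answer, but as a sketch: import-all-tables does have an --exclude-tables option, so if you know the view names you can list them there, and --autoreset-to-one-mapper covers tables without a primary key. The connection string, credentials, and names below are placeholders:

sqoop import-all-tables \
  --connect "jdbc:sqlserver://dbhost:1433;databaseName=mydb" \
  --username sqoop_user \
  --password-file /user/sqoop/.password \
  --hive-import \
  --hive-database staging \
  --exclude-tables view_a,view_b \
  --autoreset-to-one-mapper \
  -m 4

There is no built-in resume if a table fails partway through; adding the already-imported tables to --exclude-tables on the rerun is one way to avoid redoing them.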

Impala: import data from an SQL file

I have .sql files in HDFS which contain MySQL INSERT queries. I created an external table in Impala with the appropriate schema. Now I want to run all the INSERT commands stored in HDFS against the table. Is there a way to do it? The data size is 900 GB!
Is there any other way to import the .sql files from HDFS? We have tried with Hive, but it requires all the insert fields to be in lowercase, which ours are not.
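I am not aware of a way for Impala to execute INSERT statements directly out of HDFS. One heavily caveated sketch is to pull each .sql file to the local filesystem and replay it with impala-shell, assuming the statements are actually Impala-compatible SQL (MySQL-specific syntax would need rewriting first); the host, database, and paths are placeholders:

# copy one script out of HDFS and run it against Impala
hadoop fs -get /data/sql/inserts_0001.sql /tmp/inserts_0001.sql
impala-shell -i impalad-host -d mydb -f /tmp/inserts_0001.sql

At 900 GB of INSERT statements this would be painfully slow; converting the dumps into data files and placing them in the external table's location is usually the more practical route.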

In Sqoop export, using an Avro table to define the schema in the RDBMS

I'm loading data from HDFS to MySQL using Sqoop. In this data, one record has more than 70 fields, making it difficult to define the schema while creating the table in the RDBMS.
Is there a way to use Avro tables to dynamically create the table with its schema in the RDBMS using Sqoop?
Or is there some other tool which does the same?
This is not supported in Sqoop today. From the Sqoop documentation:
The export tool exports a set of files from HDFS back to an RDBMS. The
target table must already exist in the database. The input files are
read and parsed into a set of records according to the user-specified
delimiters.
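For illustration, a typical export looks like the sketch below (connection details, table, and directory are placeholders); the point from the documentation is that target_table has to be created in MySQL by hand, with all 70-odd columns, before the export runs:

sqoop export \
  --connect jdbc:mysql://dbhost/mydb \
  --username sqoop_user \
  --password-file /user/sqoop/.password \
  --table target_table \
  --export-dir /user/hive/warehouse/mydb.db/source_table \
  --input-fields-terminated-by '\001'

One workaround I have seen is generating the CREATE TABLE DDL from the Avro schema with a script, but that happens outside Sqoop.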

Hive import from HDFS as ORC

I exported a Hive table to HDFS using
export table abc to '/tmp/123'
It created two directories, data and metadata, which hold the data and the metadata respectively.
I then copied this data to another Hadoop cluster using the distcp command.
Now I want to import this data into a table xyz (not created yet) using the Hive import command:
import table xyz from '/tmp/123'
It created table xyz with the data, but in text format.
I want xyz in ORC format.
Yes, I know I can create another ORC table from this text table.
Is there a better way to do it?
Can I specify the format anywhere in the command?
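As far as I know, IMPORT TABLE recreates the table with the storage format it was exported with, and there is no option on the import command to switch it to ORC. The usual workaround is a CTAS from the imported text table, roughly as below (xyz_orc is just a placeholder name):

-- create an ORC copy of the imported text table
CREATE TABLE xyz_orc STORED AS ORC AS SELECT * FROM xyz;

-- optionally drop the text table and take over its name
DROP TABLE xyz;
ALTER TABLE xyz_orc RENAME TO xyz;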

Export Hive data incrementally

We have a requirement to run a HiveQL query incrementally and export the resulting records to a file in Avro format.
The following are the two approaches I am looking at and the challenges I see in using them.
Option 1: using Pig and a custom loader:
a. Write a custom Pig loader which runs the Hive query incrementally.
b. Write a Pig flow and create a relation from the results of the Hive loader.
c. Save the result in an Avro file.
Option 2: Sqoop export - I couldn't find a way to export Hive query results incrementally.
So far, from my analysis, I think option 1 will better suit my requirement.
Can someone please explain whether we can achieve this easily in Sqoop?
Sqoop can export data from an HDFS directory to a target database, not to files. In this case Sqoop cannot:
read incremental results, unless you have a separate Hive table or partition (which results in a new directory), or
write into external files in Avro format.
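One way to work within those constraints, as a sketch rather than a drop-in solution (assuming Hive 0.14 or later for STORED AS AVRO): keep an Avro-backed Hive table partitioned per run, write each increment into a new partition, and pick the Avro files up straight from that partition's directory. The table, columns, and paths below are placeholders:

-- Avro-backed table, one partition per incremental run
CREATE EXTERNAL TABLE export_results (
  id BIGINT,
  payload STRING
)
PARTITIONED BY (run_date STRING)
STORED AS AVRO
LOCATION '/data/exports/results';

-- each run writes its increment into a fresh partition
INSERT INTO TABLE export_results PARTITION (run_date = '2016-01-15')
SELECT id, payload
FROM source_table
WHERE updated_at >= '2016-01-15';

The Avro files for that run then sit under /data/exports/results/run_date=2016-01-15/ and can be shipped wherever they are needed, without involving Sqoop.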