How to import Teradata table into HIVE using HUE? - hive

I need help importing a Teradata table (with LDAP credentials) into Hive using the Hue interface. I'm new to Hive and have very little knowledge of it. Also, the import needs to be a fresh load and not an append.
Thank you.

Ask your admin for access to sqoop.
First of all, create a table mytable in Hive with the same structure as the source table.
Use the command below when you are using a SQL query to extract the data. I used Oracle as an example; for Teradata you will need the corresponding Teradata JDBC driver.
sqoop import --connect jdbc:oracle:thin:@host:port/service_name --username xx --password yy --target-dir 'hdfs://your/table1/' --query "select * from scott.mytable where emp_id =1 AND \$CONDITIONS" --hive-import --hive-drop-import-delims --hive-database mydb --hive-table mytable1 --hive-overwrite --delete-target-dir -m 1
Use the command below if you want to extract the complete table without any filter.
sqoop import --connect jdbc:oracle:thin:@host:port/service_name --username xx --password yy --target-dir 'hdfs://your/table1/' --table mytable --hive-import --hive-drop-import-delims --hive-database mydb --hive-table mytable1 --hive-overwrite --delete-target-dir -m 1
Please note that Sqoop takes many options; you may need to add or remove parameters for your environment.
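Since the original question is about Teradata with LDAP credentials, here is a rough sketch of what the equivalent command could look like. This is only an illustration: it assumes the Teradata JDBC driver jar has been made available to Sqoop, the host, database, table and password-file path are placeholders, and LOGMECH=LDAP is the Teradata JDBC connection parameter for LDAP authentication. --hive-overwrite gives the fresh (non-append) load asked for.
sqoop import \
  --driver com.teradata.jdbc.TeraDriver \
  --connect "jdbc:teradata://td-host/DATABASE=mydb,LOGMECH=LDAP" \
  --username ldap_user \
  --password-file /user/myuser/teradata.password \
  --table mytable \
  --hive-import --hive-database mydb --hive-table mytable1 \
  --hive-overwrite --delete-target-dir \
  -m 1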

Related

sqoop : Pull data to hive table with extra columns

I need to pull records from a MySQL table with n columns and store them in Hive with extra columns. Is there any way to do this with Sqoop?
Example:
The MySQL table has the following fields: id, name, place.
The Hive table structure is id, name, place and contact_number (null).
So when performing the Sqoop import, I want to add the extra column contact_number in Hive as NULL.
You can do this by using the --query option in Sqoop and selecting the extra column with NULL AS. Note that a free-form --query import also requires $CONDITIONS in the WHERE clause and either --split-by or -m 1.
sqoop import \
--query 'SELECT id, name, place, NULL AS contact_number FROM mysql_table WHERE $CONDITIONS' \
--connect jdbc:mysql://mysql.example.com/sqoop \
--Any other options
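For reference, a fuller sketch of the whole command; the --target-dir path, the Hive table name and -m 1 here are illustrative assumptions, not part of the original answer (Sqoop requires a --target-dir for free-form --query imports):
sqoop import \
  --connect jdbc:mysql://mysql.example.com/sqoop \
  --query 'SELECT id, name, place, NULL AS contact_number FROM mysql_table WHERE $CONDITIONS' \
  --target-dir /tmp/mysql_table_import \
  --hive-import --hive-table hive_table_with_contact \
  -m 1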

Why impala not showing all tables created by Hive

I have imported all tables using Sqoop into a Hive database "sqoop_import" and I am able to see that all tables were imported successfully, as below:
hive> use sqoop_import;
OK
Time taken: 0.026 seconds
hive> show tables;
OK
categories
customers
departments
order_items
orders
products
Time taken: 0.025 seconds, Fetched: 6 row(s)
hive>
But when I try the same from impala-shell or Hue using the same user, it shows different results, as below:
[quickstart.cloudera:21000] > use sqoop_import;
Query: use sqoop_import
[quickstart.cloudera:21000] > show tables;
Query: show tables
+--------------+
| name         |
+--------------+
| customers    |
| customers_nk |
+--------------+
Fetched 2 row(s) in 0.01s
[quickstart.cloudera:21000] >
What am I doing wrong?
When you import a new table into Hive with Sqoop, in order to see it through impala-shell you should run INVALIDATE METADATA for that specific table. So from the command line run the following: impala-shell -d DB_NAME -q "INVALIDATE METADATA table_name";
But if you append new data files to an existing table through Sqoop, you only need to do a REFRESH. So from the command line run the following:
impala-shell -d DB_NAME -q "REFRESH table_name";
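The same can also be done interactively from inside the impala-shell session shown above; the table name orders below is just one of the tables from the Hive listing, used as an example:
INVALIDATE METADATA sqoop_import.orders;
SHOW TABLES;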

When to use Sqoop --create-hive-table

Can anyone tell me the difference between the create-hive-table and hive-import methods? Both will create a Hive table, but what is the significance of each?
hive-import command:
The hive-import option automatically populates the metadata for the target table in the Hive metastore. If the table in Hive does not exist yet, Sqoop will simply create it based on the metadata fetched for your table or query. If the table already exists, Sqoop will import data into the existing table. If you're creating a new Hive table, Sqoop will convert the data types of each column from your source table to a type compatible with Hive.
create-hive-table command:
Sqoop can generate a Hive table (using the create-hive-table command) based on a table from an existing relational data source. If --create-hive-table is set, the job will fail if the target Hive table already exists; by default this property is false.
Using the create-hive-table command involves three steps: importing data into HDFS, creating the Hive table, and then loading the HDFS data into the Hive table. This can be shortened to one step by using hive-import.
During a hive-import, Sqoop will first do a normal HDFS import to a temporary location. After a successful import, Sqoop generates two queries: one for creating a table and another one for loading the data from a temporary location. You can specify any temporary location using either the --target-dir or --warehouse-dir parameter.
An example for the above description is added below.
Using create-hive-table command:
Involves three steps:
Importing data from RDBMS to HDFS
sqoop import --connect jdbc:mysql://localhost:3306/hadoopexample --table employees --split-by empid -m 1;
Creating hive table using create-hive-table command
sqoop create-hive-table --connect jdbc:mysql://localhost:3306/hadoopexample --table employees --fields-terminated-by ',';
Loading data into Hive
hive> load data inpath "employees" into table employees;
Loading data to table default.employees
Table default.employees stats: [numFiles=1, totalSize=70]
OK
Time taken: 2.269 seconds
hive> select * from employees;
OK
1001 emp1 101
1002 emp2 102
1003 emp3 101
1004 emp4 101
1005 emp5 103
Time taken: 0.334 seconds, Fetched: 5 row(s)
Using hive-import command:
sqoop import --connect jdbc:mysql://localhost:3306/hadoopexample --table departments --split-by deptid -m 1 --hive-import;
The difference is that create-hive-table will create a table in Hive based on the source table in the database but will NOT transfer any data. The command "import --hive-import" will both create the table in Hive and import data from the source table.

How to backup some tables with data and some tables only schema PostgreSQL

I want to dump a database.
I have three tables:
table1
table2
table3
From table1 I want the schema plus data.
From table2 and table3 I just want the schema.
How do I do that?
To get data from just a few tables:
pg_dump myDatabase --inserts -a -t table1 -t table2 > backup.sql;
pg_dump myDatabase --inserts -a -t seq1 -t seq2 > backupSequences.sql;
Parameters descriptions:
-a, --data-only     dump only the data, not the schema
-t, --table=TABLE   dump the named table(s) only
--inserts           dump data as INSERT commands, rather than COPY
This is what I wanted :)
Thanks all!
Use pg_dump, which has both schema-only and schema + data output.
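For the exact split in the question (schema plus data for table1, schema only for table2 and table3), a minimal sketch with pg_dump could look like this; the database name is a placeholder:
pg_dump mydatabase -t table1 > table1_schema_and_data.sql
pg_dump mydatabase --schema-only -t table2 -t table3 > table2_table3_schema_only.sql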

PostgreSQL, update existing rows with pg_restore

I need to sync two PostgreSQL databases (some tables from development db to production db) sometimes.
So I came up with this script:
[...]
pg_dump -a -F tar -t table1 -t table2 -U user1 dbname1 | \
pg_restore -a -U user2 -d dbname2
[...]
The problem is that this works only for newly added rows. When I edit a non-PK column I get a constraint error and the row isn't updated. For each dumped row I need to check whether it exists in the destination database (by PK) and, if so, delete it before the INSERT/COPY.
Thanks for any advice.
Do this:
pg_dump -t table1 production_database > /tmp/old_production_database_table1.sql
pg_dump -t table1 devel_database > /tmp/devel_database_table1.sql
psql production_database
truncate table1;
\i /tmp/devel_database_table1.sql
\i /tmp/old_production_database_table1.sql
You'll get a lot of duplicate primary key errors on the second \i, but it will do what you want: all rows from devel will be updated, and rows that are not in devel will be neither updated nor deleted.
If you have any references to table1 then you'll have to drop them before and recreate them after importing. Especially check for ON DELETE CASCADE, SET NULL or SET DEFAULT references to table1 - you'd lose data in other tables if you have those.
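On PostgreSQL 9.5 or later, an alternative sketch is to load the devel rows into a staging table and upsert from it, which avoids the truncate-and-replay dance. The primary key id and the column val below are placeholder names, not from the original question:
-- staging table with the same structure as table1
CREATE TEMP TABLE table1_staging (LIKE table1 INCLUDING ALL);
-- load the devel rows into table1_staging (e.g. with a data-only dump or \copy), then:
INSERT INTO table1
SELECT * FROM table1_staging
ON CONFLICT (id) DO UPDATE SET val = EXCLUDED.val;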