I have the default database in Hive, which contains 80 tables.
I have created one more database and I want to copy all the tables from the default DB to the new database.
Is there any way I can copy them from one DB to the other without creating each table individually?
Please let me know if there is any solution.
Thanks in advance
I can think of a couple of options.
Use CTAS (CREATE TABLE ... AS SELECT).
CREATE TABLE NEWDB.NEW_TABLE1 AS select * from OLDDB.OLD_TABLE1;
CREATE TABLE NEWDB.NEW_TABLE2 AS select * from OLDDB.OLD_TABLE2;
...
Use the EXPORT/IMPORT feature of Hive:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+ImportExport
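A minimal sketch of that approach (the table names and the HDFS export path below are placeholders, not something from your setup):
USE OLDDB;
EXPORT TABLE OLD_TABLE1 TO '/tmp/hive_export/old_table1';
USE NEWDB;
IMPORT TABLE OLD_TABLE1 FROM '/tmp/hive_export/old_table1';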
Hope this helps.
create external table new_db.table like old_db.table location '(path of the table data in hdfs)';
If the table is partitioned, you will also have to add the partitions to new_db.table.
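A minimal sketch of adding one partition (the partition column dt and the path are just placeholders):
alter table new_db.table add partition (dt='2020-01-01') location '(path of that partition in hdfs)';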
These are probably the fastest and simplest ways to copy / move tables from one DB to another.
To move a table
Since Hive 0.14, you can use the following statement to move a table from one database to another within the same metastore:
alter table old_database.table_a rename to new_database.table_a;
The above statement will also move the table data on HDFS if table_a is a managed table.
To copy a table
You can always use CREATE TABLE <new_db>.<new_table> AS SELECT * FROM <old_db>.<old_table>; statements. But I believe this alternative of copying the database directory with hdfs dfs -cp and then creating the tables with LIKE can be a little faster if your tables are huge:
hdfs dfs -cp /user/hive/warehouse/<old_database>.db /user/hive/warehouse/<new_database>.db
And then in Hive:
CREATE DATABASE <new_database>;
CREATE TABLE <new_database>.<new_table> LIKE <old_database>.<old_table>;
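If the copied tables are partitioned, you will likely also need to recover the partition metadata in the new database; a rough sketch, assuming the copied directories follow Hive's <column>=<value> layout:
MSCK REPAIR TABLE <new_database>.<new_table>;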
You can also take the EXPORT / IMPORT approach.
The syntax looks something like this:
EXPORT TABLE table_or_partition TO hdfs_path;
IMPORT [[EXTERNAL] TABLE table_or_partition] FROM hdfs_path [LOCATION [table_location]];
Some sample statements would look like:
EXPORT TABLE <table_name> TO 'location in hdfs';
USE test_db;
IMPORT FROM 'location in hdfs';
EXPORT / IMPORT can be applied on a partition basis as well:
EXPORT TABLE <table_name> PARTITION (loc="USA") TO 'location in hdfs';
The import command below imports to an external table instead of a managed one:
IMPORT EXTERNAL TABLE FROM 'location in hdfs' LOCATION '/location/of/external/table';
Probably a very trivial question, but I'm not sure about this and also don't want to lose the table. How do I rename a table in Athena?
Database name - friends
table name - centralPark
desired table name - centralPerk
You can't!
See the list of unsupported DDL in Athena.
What you can do is create a new table using a SELECT:
CREATE TABLE centralPerk
AS SELECT * FROM centralPark
WITH DATA
and drop the old table:
DROP TABLE IF EXISTS centralPark
Using a CTAS query is effective, but I found it to be quite slow. It needs to copy all the files.
But you don't need to copy the files. You can create a new table directly in the Glue catalog and point it at the existing files. This works in seconds or less.
If you're using Python, I highly recommend the awswrangler library for this kind of work.
import awswrangler as wr
def wrangler_copy(db, old_name, new_name):
    # Register a new table in the Glue catalog that points at the
    # existing data files of the old table (no files are copied).
    wr.catalog.create_parquet_table(
        db,
        new_name,
        path=wr.catalog.get_table_location(db, old_name),
        columns_types=wr.catalog.get_table_types(db, old_name),
        # TODO: partitions, etc
    )
And then drop the old table if you like.
DROP TABLE IF EXISTS <old_name>
When creating a table in SQL Server, it is created in the dbo.<tableName> format by default.
I want to change dbo.tableName to source.tableName, as I want to import data into a source table and then process that data.
Thanks
You are talking about schemas. If the schema source doesn't exist yet, you need to run create schema source. Once the schema exists, it's as easy as create table source.tableName (...).
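A minimal sketch, assuming a hypothetical tableName with made-up columns; the ALTER SCHEMA line is an alternative for a dbo.tableName that already exists:
CREATE SCHEMA source;
GO

-- Option 1: create the table directly in the new schema (columns are placeholders)
CREATE TABLE source.tableName (
    id INT PRIMARY KEY,
    payload NVARCHAR(200) NULL
);
GO

-- Option 2: if dbo.tableName already exists, move it into the source schema instead
ALTER SCHEMA source TRANSFER dbo.tableName;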
I am new to HDFS and Hive. I got some introduction to both after reading some books and documentation. I have a question regarding the creation of a table in Hive for a file that is present in HDFS.
I have a file with 300 fields in HDFS. I want to create a table accessing this file in HDFS, but I only want to make use of, say, 30 fields from this file.
My questions are
1. Does Hive create a separate file directory?
2. Do I have to create the Hive table first and then import the data from HDFS?
3. Since I want to create a table with 30 columns out of the 300, does Hive create a file with only those 30 columns?
4. Do I have to create a separate file with the 30 columns, import it into HDFS, and then create a Hive table pointing to that HDFS directory?
My questions are
Does Hive create a separate file directory?
Yes, if you create a Hive table (managed or external) and load data into it using the LOAD command.
No, if you create an external table and point it at the existing file.
Do I have to create the Hive table first and import data from HDFS?
Not necessarily; you can create an external Hive table that points to the existing file.
Since I want to create a table with 30 columns out of 300 columns, does Hive create a file with only those 30 columns?
You can do it easily using HiveQL. Follow the steps below (note: this is not the only approach):
1. Create an external table with the 300 columns and point it at the existing file.
2. Create another Hive table with the desired 30 columns and insert data into this new table from the 300-column table using INSERT INTO table30col SELECT ... FROM table300col. Note: Hive will create the file with the 30 columns during this insert operation.
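A minimal sketch of those two steps, assuming a comma-delimited file; only the table names come from the steps above, the columns and HDFS path are placeholders:
create external table table300col (
    col1 string,
    col2 string
    -- ... declare all 300 columns here ...
)
row format delimited fields terminated by ','
location '/path/to/existing/hdfs/directory';

create table table30col (
    col1 string,
    col2 string
    -- ... only the 30 columns you need ...
);

insert into table table30col
select col1, col2  -- ... list the 30 desired columns here ...
from table300col;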
Do I have to create a separate file with 30 columns, import it into HDFS, and then create a Hive table pointing to that HDFS directory?
Yes, this can be an alternative.
I personally like the solution for question 3, as I don't have to recreate the file and I can do all of it in Hadoop without depending on some other system.
You have several options. One is to have Hive simply point to the existing file, i.e. create an external Hive table:
CREATE EXTERNAL TABLE ... LOCATION '<your existing hdfs file>';
This table in Hive will, obviously, match your existing table exactly. You must declare all 300 columns. There is no data duplication; there is only one file, and Hive simply references the already existing file.
A second option would be to either IMPORT or LOAD the data into a Hive table. This would copy the data into a Hive table and let Hive control the location. But it is important to understand that neither IMPORT nor LOAD transforms the data, so the resulting table will have exactly the same structure, layout, and storage as your original table.
Another option, which I would recommend, is to create a specific Hive table and then import the data into it, using a tool like Sqoop or going through an intermediate staging table created by one of the methods above (preferably an external reference to avoid an extra copy). Create the desired table, create the external reference staging table, insert the data into the target using INSERT ... SELECT, then drop the staging table. I recommend this because it lets you control not only the table structure/schema (i.e. have only the needed 30 columns) but also, importantly, the storage. Hive has a highly performant columnar storage format, namely ORC, and you should strive to use this storage format because it will give you a tremendous query performance boost.
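A rough sketch of that flow, with made-up names (staging300_ext standing in for the external staging table, target30 for the final table); the key point is the ORC storage clause on the target:
CREATE TABLE target30 (
    -- only the 30 needed columns, as placeholders
    col1 STRING,
    col2 STRING
)
STORED AS ORC;

INSERT INTO TABLE target30
SELECT col1, col2 FROM staging300_ext;

DROP TABLE staging300_ext;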
Using a SQL tool like SQL Developer / Toad for Oracle
Is it possible to write a SQL query that will do the following?
SELECT * FROM TABLE
WHERE COLUMN1 IN CSV_FILE
The CSV file is just one column of data with no delimiters.
How can I achieve this?
Constraints
I cannot create a temp table to insert the CSV file into (no create permissions).
The column I am querying on is the only indexed column in that table, so I cannot query on other columns or else it will be really slow.
Thanks
Creating an external table is the best way. If you don't have permission, then the other way is to move the file to the path of an Oracle directory (the Oracle DIRECTORY object), and with the help of UTL_FILE read the file, loop through it, and do your operation inside a PL/SQL block, which is rather tedious.
See the examples for using UTL_FILE - http://psoug.org/reference/utl_file.html
But it's better if you try to get CREATE access.
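A rough PL/SQL sketch of that UTL_FILE approach, assuming a directory object MY_DIR containing values.csv; the table and column names are made up:
DECLARE
    f   UTL_FILE.FILE_TYPE;
    val VARCHAR2(4000);
BEGIN
    f := UTL_FILE.FOPEN('MY_DIR', 'values.csv', 'R');
    LOOP
        BEGIN
            UTL_FILE.GET_LINE(f, val);      -- one value per line in the CSV
        EXCEPTION
            WHEN NO_DATA_FOUND THEN EXIT;   -- end of file
        END;
        -- do your operation for each value read from the file
        FOR r IN (SELECT * FROM my_table WHERE column1 = TRIM(val)) LOOP
            DBMS_OUTPUT.PUT_LINE(r.column1);
        END LOOP;
    END LOOP;
    UTL_FILE.FCLOSE(f);
END;
/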
Toad for Oracle data import (uses sqlldr internally)
Create a temp table, load the data using this utility, and select the values.
External tables
Create an external table over the file and select the values from it.
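A hedged sketch of the external table route, assuming a directory object MY_DIR containing values.csv; the external table name, your_table, and column1 are placeholders:
CREATE TABLE csv_values_ext (
    column1 VARCHAR2(200)
)
ORGANIZATION EXTERNAL (
    TYPE ORACLE_LOADER
    DEFAULT DIRECTORY my_dir
    ACCESS PARAMETERS (
        RECORDS DELIMITED BY NEWLINE
    )
    LOCATION ('values.csv')
);

SELECT *
FROM your_table
WHERE column1 IN (SELECT column1 FROM csv_values_ext);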
Using SQL Developer you can create a table in your schema and load it with data from a CSV file.
Notes:
You will need to define a column in the target table for each column you want to import from the Excel/CSV file.
Excel exports CSV with a ";" delimiter.
If SQL Developer (4.1.5) doesn't preview the fields in separate columns, try moving forwards/backwards with the Next/Back buttons.
There is also a very visual guide on the following page:
http://www.thatjeffsmith.com/archive/2012/04/how-to-import-from-excel-to-oracle-with-sql-developer/
I need a command for importing a table for the following scenario.
I have a table EMPLOYEE on server A. I am exporting the table.
I have another table PDATA (having the same structure as the EMPLOYEE table) on server B.
I need to import the records from the EMPLOYEE table (server A) into the PDATA table (server B).
I am using Oracle 10g. Please advise.
There are a couple of options. I am going to assume that you don't have any binary data and that the tables aren't absurdly large. We also don't know what type of access you have to either server.
You could use a tool, such as TOAD, to either export to csv or create insert statements. Then execute those on the second server.
You could use PL/SQL and the UTL_FILE library to dump the contents of the table to a csv file. Then mount the csv file as an external table and select into your new table.
If you have the appropriate permissions and the machines can physically see each other, you can set up a database link: http://docs.oracle.com/cd/B14117_01/server.101/b10759/statements_5005.htm Once the link is created, you can select from one table into the other (see the sketch after these options).
If you are a DBA then you can use the Export utility, which will export the table into a binary format that can be imported elsewhere.
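A rough sketch of the database link option, run on server B; the link name, credentials, and TNS alias are placeholders:
CREATE DATABASE LINK server_a_link
    CONNECT TO scott IDENTIFIED BY password    -- placeholder credentials for server A
    USING 'SERVER_A_TNS';                      -- placeholder TNS alias for server A

INSERT INTO pdata
SELECT * FROM employee@server_a_link;

COMMIT;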