Does Apache Hive have an equivalent to PostgreSQL's pg_dump?


I have a bunch of databases in Apache Hive. I want to output their structure - table names, column names, keys, relationships, etc. An equivalent of PostgreSQL's pg_dump would be perfect. Does anything like that exist?

I know three ways to do that kind of thing. None of them is fun.
A. Use some custom scripting and Beeline (or the Hive CLI) to reverse-engineer all the tables and views:
run a show databases query
parse the result, iterate on show tables in <database>
parse the result, iterate on show CREATE TABLE <database>.<table>
...and show partitions <database>.<table> and show indexes <database>.<table> (but you have to rebuild the ALTER TABLE / CREATE INDEX commands yourself)
if you have an authorization policy, also run show principals and iterate on show grant <principal> (but you have to rebuild the CREATE ROLE and GRANT commands yourself)
B. Develop a custom Java program that connects to the MetaStore service and scans the databases, the tables/views, the partitions, the StorageDescriptors, the columns, everything and its dog ... to get what you really want. Good luck. Some pointers here.
C. Connect straight to the MetaStore database back-end (Derby, MySQL, Postgres...), then work out where the information you want is stored and SELECT whatever you need.
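As a rough illustration of option A, the loop over databases and tables might look like the Python sketch below. The `run_query` callable is hypothetical: it stands for whatever you use to send one HiveQL statement (a Beeline wrapper, a DB-API cursor from a Hive driver, ...) and is assumed to return the result as a list of rows. Partitions, indexes and grants are left out and would need the extra statements listed above.

```python
from typing import Callable, Dict, List

# Hypothetical signature: takes one HiveQL statement and returns the result
# rows, each row being a list of column values.
QueryRunner = Callable[[str], List[List[str]]]


def dump_hive_ddl(run_query: QueryRunner) -> Dict[str, str]:
    """Collect the SHOW CREATE TABLE output for every table and view."""
    ddl: Dict[str, str] = {}
    for db_row in run_query("SHOW DATABASES"):
        database = db_row[0]
        for table_row in run_query(f"SHOW TABLES IN `{database}`"):
            table = table_row[0]
            rows = run_query(f"SHOW CREATE TABLE `{database}`.`{table}`")
            # The statement comes back as one row per line of DDL.
            ddl[f"{database}.{table}"] = "\n".join(row[0] for row in rows)
    return ddl
```

The returned dictionary maps `database.table` to its CREATE statement, which can then be written to one file per database or table.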

Related

How to sync tables schema without dropping the table?

Not:
DROP -> CREATE
I need:
COMPARE -> ALTER
I have a test and a production database. The data within the two is different, but the schemas should be the same.
I need something like a production script, a tool, or a method that compares the schemas of these two databases and syncs them. I'm coding in Node.js, and I haven't used tools like an ORM or db-migrate. I created the database using MySQL Workbench, and writing every ALTER query by hand costs a lot of time. There must be an easier way.

How to extract the SQL Create statement from ignite

Using Apache Ignite, is it possible to extract the CREATE statement used to create the table? You can do this in MySQL with the SHOW CREATE TABLE x command for example.

I don't think dumping the DDL (database structure) is currently possible, especially since CREATE TABLE is only one of three ways of making tables in Ignite.
However, you can query tables, schemas and indexes via the JDBC metadata introspection feature.

How to add or route PostgreSQL Data to New Hard Drive

I'm using Windows Server 2008 R2 Standard.
I'm running PostgreSQL 9.0.1, compiled by Visual C++ build 1500, 32-bit.
I have a C:/ and a D:/ drive:
C:/ --> 6.7 GB free space (almost full, and my server's performance is suffering)
D:/ --> 141 GB free space
Currently my PostgreSQL data is stored on C:/. Now I want to route or add a path to D:/ without migrating the data from C:/ to D:/, because my PostgreSQL data is already around 148 GB. It is heavy and massive.
If this succeeds, should I still be able to run a query like SELECT * FROM table_bla_bla and get results from both drives?
Please do not suggest that I switch from PostgreSQL to another database.
I'm running 39,763 GPS meter devices that send their data to my server.
I have to take care of this server because our expert has passed away.

You need to use tablespaces.
Create the tablespace, for example CREATE TABLESPACE second_drive LOCATION 'D:/postgresdata/'; (see this other answer if you get permission denied errors)
Then move a table into it: ALTER TABLE table_bla_bla SET TABLESPACE second_drive;
Tablespaces let you decide which tables go on which drives. That can improve performance, because you control where reads and writes go, and it also helps with space.
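If several tables have to move, the two statements above can be generated rather than typed. A minimal Python sketch, assuming you already know which tables to relocate; the helper names are made up for illustration, and the output is only text to review and run in psql:

```python
from typing import Iterable, List


def quote_ident(name: str) -> str:
    """Double-quote an SQL identifier, escaping embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def tablespace_statements(tablespace: str, location: str,
                          tables: Iterable[str]) -> List[str]:
    """Build CREATE TABLESPACE plus one ALTER TABLE per table to move."""
    escaped_location = location.replace("'", "''")
    statements = [
        f"CREATE TABLESPACE {quote_ident(tablespace)} "
        f"LOCATION '{escaped_location}';"
    ]
    for table in tables:
        statements.append(
            f"ALTER TABLE {quote_ident(table)} "
            f"SET TABLESPACE {quote_ident(tablespace)};"
        )
    return statements


if __name__ == "__main__":
    for statement in tablespace_statements(
            "second_drive", "D:/postgresdata/", ["table_bla_bla"]):
        print(statement)
```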

Postgres places each table in a tablespace (which relates to a single disk). That is enough if you have multiple tables and can achieve what you need by moving some of them to the other disk.
On the other hand, if you have one large table that you need to split over multiple disks, you need to use Postgres's horizontal partitioning capability.
This builds on tablespaces by letting you create a master table table_bla_bla that is just a facade on top of two or more tables that actually hold the data. These data tables can then be put in different tablespaces, effectively splitting your data over disks.
For this you would:
Rename your current table_bla_bla to something like table_bla_bla_c.
Create a new table_bla_bla master table.
Alter table_bla_bla_c to mark that it inherits from table_bla_bla.
Create a new table_bla_bla_d table that inherits from table_bla_bla and specify the tablespace as the D drive.
Apply partitioning triggers and check constraints as per the partitioning documentation.
Once this is in place, you can arrange it so that any inserts into table_bla_bla create new records on the D drive. Selects on table_bla_bla will read from both disks.
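The steps above could translate into SQL roughly as follows. This is only a sketch, written as a Python helper that prints the statements: it assumes a tablespace (called second_drive here) already exists on the D drive, the `_insert_trigger` names are invented, and the check constraints are omitted because they depend on a partitioning column the question does not name.

```python
from typing import List


def partition_statements(table: str, tablespace: str) -> List[str]:
    """Build inheritance-based partitioning DDL that sends new rows to
    a child table stored in the given tablespace."""
    old_child = f"{table}_c"   # existing data, stays on the C drive
    new_child = f"{table}_d"   # new data, lives in the other tablespace
    trigger_function = (
        f"CREATE OR REPLACE FUNCTION {table}_insert_trigger() "
        "RETURNS trigger AS $$\n"
        "BEGIN\n"
        f"    INSERT INTO {new_child} VALUES (NEW.*);\n"
        "    RETURN NULL;  -- row is stored in the child, not the master\n"
        "END;\n"
        "$$ LANGUAGE plpgsql;"
    )
    trigger = (
        f"CREATE TRIGGER {table}_insert BEFORE INSERT ON {table} "
        f"FOR EACH ROW EXECUTE PROCEDURE {table}_insert_trigger();"
    )
    return [
        f"ALTER TABLE {table} RENAME TO {old_child};",
        f"CREATE TABLE {table} (LIKE {old_child} INCLUDING DEFAULTS);",
        f"ALTER TABLE {old_child} INHERIT {table};",
        f"CREATE TABLE {new_child} () INHERITS ({table}) "
        f"TABLESPACE {tablespace};",
        trigger_function,
        trigger,
    ]


if __name__ == "__main__":
    print("\n".join(partition_statements("table_bla_bla", "second_drive")))
```

Rehearse the generated script on a copy of the database first; the rename in particular affects every application that writes to the table.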

Update Hive metadata location for many tables

I would like to change the bucket name in the location of many Hive tables. Is it possible to connect to the MySQL database and update it directly? I think it is possible, but I would like to know whether it is safe to do on a production database.

Yes, it is possible, and I have seen it done; but
(a) the Metastore schema is not documented, and each Hive version brings some minor changes, so you have to do your own exploration to find where/how the StorageDescriptor objects are persisted -- then some unit tests / non-regression tests on a Dev system -- plus, don't forget to run a full DB backup before tinkering with your Prod system (and to rehearse an emergency restoration on your Dev system, too!)
(b) you have to update the StorageDescriptor for tables, but also for partitions -- remember that for partitioned tables, the table-level LOCATION is just used as default root dir for future partitions; once created, a partition retains its location until it is ALTERed explicitly.
For the record, the preferred method for bulk updates is (in theory) the Hive MetaTool, but unfortunately it does not support the kind of updates that you need. Right now it's only good for changing the NameNode alias in all HDFS paths, because that was a real pain point...
A valid alternative to brutal SQL updates would be to develop a custom Java program, using the Hive MetaStore API, to scan all tables and partitions, read their StorageDescriptors, run regex changes on their locations, then write back the changes (which is exactly what the MetaTool does, only at a lower level). But that would be overkill.
Finally, a possible compromise would be a SQL SELECT on the appropriate MySQL tables to generate (with regexp_replace()) a chain of ALTER TABLE / PARTITION ... SET LOCATION commands to run later in the Hive CLI, plus a chain of ALTERs to revert to the original locations in case you have to do an emergency rollback :-/
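The compromise can also be scripted outside the database. The Python sketch below assumes you have already extracted, by whatever means, a list of (database, table, partition spec, current location) entries; the function name and the entry layout are made up for illustration. It produces both the forward ALTER statements and the matching rollback statements.

```python
import re
from typing import Iterable, List, Optional, Tuple

# (database, table, partition spec or None, current location)
Entry = Tuple[str, str, Optional[str], str]


def location_statements(entries: Iterable[Entry], old_bucket: str,
                        new_bucket: str) -> Tuple[List[str], List[str]]:
    """Return (forward, rollback) ALTER ... SET LOCATION statements."""
    # Match the bucket only when it directly follows the URI scheme.
    pattern = re.compile(rf"^(\w+://){re.escape(old_bucket)}(?=/|$)")
    forward: List[str] = []
    rollback: List[str] = []
    for database, table, partition, location in entries:
        new_location = pattern.sub(
            lambda match: match.group(1) + new_bucket, location)
        if new_location == location:
            continue  # not stored in the old bucket, leave untouched
        target = f"ALTER TABLE `{database}`.`{table}`"
        if partition:
            target += f" PARTITION ({partition})"
        forward.append(f"{target} SET LOCATION '{new_location}';")
        rollback.append(f"{target} SET LOCATION '{location}';")
    return forward, rollback
```

Writing the two lists to separate files keeps the rollback script ready before the first forward statement is run.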

Bteq Scripts to copy data between two Teradata servers

How do I copy data from multiple tables within one database to another database residing on a different server?
Is this possible through a BTEQ Script in Teradata?
If so, provide a sample.
If not, are there other options to do this other than using a flat-file?

This is not possible using BTEQ, since you mentioned that the two databases reside on different servers.
There are two solutions for this.
Arcmain - First run an Arcmain backup, which creates files containing the data from your tables. Then run an Arcmain restore, which restores the data from those files.
TPT - Teradata Parallel Transporter. This is a very advanced tool. Unlike Arcmain, it does not create any files; it moves the data directly between the two Teradata servers. (Wikipedia)

If I am understanding your question, you want to move a set of tables from one DB to another.
You can use the following syntax in a BTEQ Script to copy the tables and data:
CREATE TABLE <NewDB>.<NewTable> AS <OldDB>.<OldTable> WITH DATA AND STATS;
Or just the table structures:
CREATE TABLE <NewDB>.<NewTable> AS <OldDB>.<OldTable> WITH NO DATA AND NO STATS;
If you get real savvy you can create a BTEQ script that dynamically builds the above statement in a SELECT statement, exports the results, then in turn runs the newly exported file all within a single BTEQ script.
There are a bunch of other options that you can use with CREATE TABLE <...> AS <...>;. You would be best served by reviewing the Teradata manuals for more details.
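The answer builds the statements dynamically inside BTEQ itself. The same generation step can be sketched in Python, assuming the list of table names is already known; the function name and the example database names are illustrative only.

```python
from typing import Iterable, List


def copy_table_statements(old_db: str, new_db: str, tables: Iterable[str],
                          with_data: bool = True) -> List[str]:
    """Build one CREATE TABLE ... AS statement per table."""
    suffix = "WITH DATA AND STATS" if with_data else "WITH NO DATA AND NO STATS"
    return [
        f"CREATE TABLE {new_db}.{table} AS {old_db}.{table} {suffix};"
        for table in tables
    ]


if __name__ == "__main__":
    for statement in copy_table_statements("OldDB", "NewDB", ["T1", "T2"]):
        print(statement)
```

The printed statements can be saved to a file and run from a BTEQ script.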

There are a few more options which will allow you to copy from one table to another.
Possibly the simplest way would be to write a smallish program that uses one of their communication layers (ODBC, .NET Data Provider, JDBC, CLI, etc.) to run a SELECT statement against the source and an INSERT statement against the target. This would require some work, but it would have less overhead than trying to learn how to write TPT scripts. You would not need any 'DBA' permissions to write your own.
Teradata also sells other applications that hide the complexity of some of the tools. Teradata Data Mover provides an abstraction layer over tools like Arcmain and TPT. Access to this tool is most likely restricted to DBA types.
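The "smallish program" idea might look like the Python sketch below. It relies only on the generic DB-API calls (cursor, execute, fetchmany, executemany, commit), so it assumes a driver that follows that interface on both sides. The demo uses two in-memory SQLite databases purely as stand-ins for the two servers, and the `?` placeholder style is SQLite's; other drivers may expect a different one.

```python
import sqlite3


def copy_rows(source_conn, target_conn, select_sql, insert_sql,
              batch_size=1000):
    """Stream rows from one connection to another in batches."""
    source_cursor = source_conn.cursor()
    target_cursor = target_conn.cursor()
    source_cursor.execute(select_sql)
    copied = 0
    while True:
        rows = source_cursor.fetchmany(batch_size)
        if not rows:
            break
        target_cursor.executemany(insert_sql, rows)
        copied += len(rows)
    target_conn.commit()
    return copied


def demo():
    """Copy three rows between two in-memory SQLite databases."""
    source = sqlite3.connect(":memory:")
    target = sqlite3.connect(":memory:")
    source.execute("CREATE TABLE t (id INTEGER, name TEXT)")
    source.executemany("INSERT INTO t VALUES (?, ?)",
                       [(1, "a"), (2, "b"), (3, "c")])
    target.execute("CREATE TABLE t (id INTEGER, name TEXT)")
    copied = copy_rows(source, target,
                       "SELECT id, name FROM t ORDER BY id",
                       "INSERT INTO t VALUES (?, ?)",
                       batch_size=2)
    return copied, target.execute("SELECT id, name FROM t ORDER BY id").fetchall()


if __name__ == "__main__":
    print(demo())
```

Fetching in batches keeps memory use bounded, which matters once the tables are large; for very large volumes the bulk utilities mentioned in the other answers are likely faster.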

If you want to move data from one server to another, you can do it with a flat file.
First, fetch the data from the source table into a flat file with a utility such as BTEQ or FastExport.
Then load that data into the target table with MultiLoad, FastLoad, or a BTEQ script.
