Migrate Hive Table Schema - hive

I have a set of Hive databases that were expensive and time consuming to build, whose schemas need to change (in a small way). Is there a tool to migrate the data from the current schema to the new schema?
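For small, additive changes there may be no data migration needed at all, since Hive lets you evolve a table's schema in place; only incompatible changes need the data rewritten into a new table. A minimal sketch in HiveQL, with made-up table and column names:

    -- Additive change: existing data files are left untouched.
    ALTER TABLE events ADD COLUMNS (source STRING COMMENT 'origin system');

    -- Incompatible change: create the new layout and copy the data across.
    CREATE TABLE events_v2 (id BIGINT, ts TIMESTAMP, source STRING)
      STORED AS ORC;
    INSERT INTO TABLE events_v2
      SELECT id, ts, NULL AS source FROM events;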

Related

How is AWS Athena so different from a database

AWS Athena can read tables, create new tables and insert into tables (among other things). Is it fair to say that Athena can replace the basic functions of a database, or that it can be seen as a lightweight serverless replacement for a relational DB? What are its major differences compared to a DB?
Athena is not a database; it is a query engine. In Athena, compute and storage are separate, whereas in a database both are tightly coupled.
Athena is more of an OLAP solution, whereas a relational DB is OLTP, and it is a fully managed service. It also does not maintain its own metadata; instead it uses the Glue catalog to store and retrieve it.
I also found this article, which talks a little more about the differences. You can have a read.
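To make the "separate compute and storage" point concrete, here is a rough sketch of a typical Athena workflow (bucket and table names are invented): the table definition is only metadata in the Glue catalog and points at files sitting in S3, which Athena scans at query time.

    -- The table is just metadata; the data stays in S3.
    CREATE EXTERNAL TABLE sales (
      order_id STRING,
      amount   DOUBLE,
      order_ts TIMESTAMP
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    LOCATION 's3://my-example-bucket/sales/';

    -- No server to manage; you are billed for the data scanned by each query.
    SELECT date_trunc('month', order_ts) AS month, sum(amount) AS total
    FROM sales
    GROUP BY 1;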

Load daily MySQL DB snapshots from S3 to snowflake

I have daily MySQL DB snapshots stored on S3. Each daily snapshot is a mysqldump backup of the 1000 tables in our DB, about 300 MB per day (we keep one year of snapshots, roughly 110 GB in total).
Now we want to load these snapshots into Snowflake daily for reporting purposes. How should we create the tables in Snowflake? Should we create 1000 tables? Will Snowflake be able to handle this scenario?
All comments are welcome. Thanks!
One comment before I look at possible solutions: your statement "Our purpose is to avoid creating dimension or fact tables (typical data warehouse approach) to save cost at the beginning" is the sort of thinking that can get companies into real trouble. Once you build something and start using it, in 99% of cases you will be stuck with it - so not designing a proper, supportable reporting solution (whether it is a Kimball model or something else) from the start is always a false economy. If you take a "quick and dirty" approach now you will regret it in a year's time.
With that out of the way, there seem to be 2 issues you need to address:
How to store your data
How to process your data (to produce your metrics and whatever else you want to do with it)
Data Storage
(Probably stating the obvious) Any tables that you create to hold metrics, or that will be accessed by BI tools (including direct SQL), I would hold in Snowflake - otherwise you won't get the performance that Snowflake can deliver and there is little point in using Snowflake; you might as well be querying Athena directly against your S3 buckets.
For your source tables (currently in S3), in an ideal world I would also copy them into Snowflake and treat S3 as your staging area - so once the data has been copied from S3 to Snowflake you can drop it from S3 (or archive it, or do whatever else you want with it).
However, if you need the S3 versions of the data for other purposes (and so can't delete it once it has been copied to Snowflake) then rather than keep duplicate copies of the data you could create External Tables in Snowflake that point to your S3 buckets and don't require you to move the data into Snowflake. Query performance against External Tables will be worse than if the tables were within Snowflake, but performance may be good enough for your purposes - especially if they are "just" being used as data sources rather than for analytical queries.
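As a rough sketch of the two storage approaches above (the stage, table, and column names are all invented, the credentials are placeholders, and the file format will depend on how your snapshots are produced):

    -- Approach 1: use S3 as a staging area and land the data in Snowflake.
    CREATE STAGE mysql_snapshots
      URL = 's3://my-snapshot-bucket/daily/'
      CREDENTIALS = (AWS_KEY_ID = '...' AWS_SECRET_KEY = '...');  -- placeholders

    CREATE TABLE customers (id NUMBER, name STRING, updated_at TIMESTAMP);
    COPY INTO customers
      FROM @mysql_snapshots/customers/
      FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1);

    -- Approach 2: leave the files in S3 and expose them as an external table.
    CREATE EXTERNAL TABLE customers_ext (
      id         NUMBER    AS (VALUE:c1::NUMBER),
      name       STRING    AS (VALUE:c2::STRING),
      updated_at TIMESTAMP AS (VALUE:c3::TIMESTAMP)
    )
    LOCATION = @mysql_snapshots/customers/
    FILE_FORMAT = (TYPE = CSV)
    AUTO_REFRESH = FALSE;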
Computation
There are a number of options for the technologies you use to calculate your metrics - which one you choose is probably down to your existing skillset, cost, supportability, etc.
Snowflake functionality - Stored Procedures, External Functions (still in Preview rather than GA, I believe), etc.
External coding tools: anything that can connect to Snowflake and read/write data (e.g. Python, Spark, etc.)
ETL/ELT tool - probably overkill for your specific use case but if you are building a proper reporting platform that requires an ETL tool then obviously you could use this to create your metrics as well as move your data around
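For the first option (keeping the computation inside Snowflake), one plain-SQL possibility is a scheduled task rather than a stored procedure; a minimal sketch, with invented warehouse, task, and table names:

    -- Rebuild a (hypothetical) daily metrics table every morning.
    CREATE OR REPLACE TASK refresh_daily_sales
      WAREHOUSE = reporting_wh
      SCHEDULE  = 'USING CRON 0 6 * * * UTC'
    AS
      INSERT OVERWRITE INTO daily_sales
      SELECT order_ts::DATE AS day, SUM(amount) AS total
      FROM orders            -- hypothetical source table (internal or external)
      GROUP BY 1;

    -- Tasks are created suspended and need to be resumed once.
    ALTER TASK refresh_daily_sales RESUME;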
Hope this helps?

Best way to set up a new database on a new server which periodically refreshes tables from a live SQL Server?

I need to create a database solely for analytical purposes. The idea is for it to start off as a 1:1 replica of a current SQL Server database, to which we will then add additional tables. The goal is to have read-write access to a DB without inadvertently dropping anything in production.
We would ideally like to set a daily refresh schedule to update all tables in the new DB to match the tables in the live environment.
In terms of the DBMS for the new database, I am flexible - MySQL, SQL Server, or PostgreSQL would be great. I am not hugely familiar with the Google Cloud Storage/BigQuery stack, but if it is an easy option, I'm open to it.
You could use a standard HA/DR solution with a readable secondary (Availability Groups/mirroring/log shipping), then have a second database on the new server for your additional tables.
Cloud Storage and BigQuery are not RDBMS services themselves, but could be used in this case to store the backups/exports/dumps from the replica, and then have the analytical work performed on those backups.
Here is an example workflow:
Perform a backup and restore it into a different database
Add the new tables in the new database
Export the database as a CSV file on your local machine
Here you could either directly load the CSV file into BigQuery, or upload the file to a previously created Cloud Storage bucket (a sketch of both follows below)
Query the data
I suggest taking a look at the various methods for loading data into BigQuery, as well as the methods for querying external data sources, which may help you determine which database replication/export method is best for your use case.
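To make the load-and-query steps concrete, here is a rough sketch in BigQuery SQL (dataset, table, and bucket names are made up); you can either load the exported CSV from Cloud Storage into a native table or query it in place through an external table:

    -- Load the exported CSV from Cloud Storage into a native BigQuery table.
    LOAD DATA INTO analytics.customers (id INT64, name STRING, updated_at TIMESTAMP)
    FROM FILES (
      format = 'CSV',
      uris = ['gs://my-export-bucket/customers.csv'],
      skip_leading_rows = 1
    );

    -- Or query the file where it is, without loading it.
    CREATE EXTERNAL TABLE analytics.customers_ext
    OPTIONS (
      format = 'CSV',
      uris = ['gs://my-export-bucket/customers.csv'],
      skip_leading_rows = 1
    );

    SELECT COUNT(*) AS row_count FROM analytics.customers_ext;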

Why do we need to mention schema name in pgAdmin4 tool when querying each time

Attached is an image which shows the error. Why do we need to mention the schema name every time when querying tables in the pgAdmin4 tool, while it is not mandatory in other database tools like SQL Developer?
Is there any option in pgAdmin4 to point to a specific schema?
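As a hedged note, this is not specific to pgAdmin4: PostgreSQL resolves unqualified table names through the search_path setting, so one option is to set it for your session or role (the schema and role names below are just examples):

    -- Show which schemas are currently searched for unqualified names.
    SHOW search_path;

    -- Use an example schema for the rest of this session.
    SET search_path TO my_schema, public;

    -- Or make it the default for a given role.
    ALTER ROLE my_user SET search_path TO my_schema, public;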

Duplicate Schema in Sql Server 2008

I want to duplicate an existing schema's table structure, but not any of the existing data. Essentially, we are separating two companies that currently share a single schema in the database; they have exactly the same data structure, but we want them in different schemas (for access control purposes).
Is it possible to copy the entire table structure of one schema into a new schema without bringing over any of the data?
You can do that in SSMS (SQL Server Management Studio):
Right-click on the database
Script Database as
Create to
File
Do a global search-and-replace in the resulting file, changing your schema name to the desired new schema name.
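To illustrate what the scripted-and-replaced output ends up looking like (the schema and table names here are invented), the new schema just needs to exist before the re-created objects are run in, and no data is copied:

    -- Create the new schema for the second company.
    CREATE SCHEMA CompanyB AUTHORIZATION dbo;
    GO

    -- A scripted table with the schema name replaced
    -- (e.g. CompanyA.Customers -> CompanyB.Customers).
    CREATE TABLE CompanyB.Customers (
        CustomerId INT IDENTITY(1,1) PRIMARY KEY,
        Name       NVARCHAR(200) NOT NULL,
        CreatedAt  DATETIME      NOT NULL DEFAULT GETDATE()
    );
    GO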
Going forward, I suggest you maintain change scripts to apply any needed changes to the DB as the application is further developed. That way, you can just share the change scripts, and each side can apply them when ready to upgrade the app version.