Does Liquibase support DynamoDB for code deployment/version control, similar to how it supports relational database deployments?
If Liquibase does not support DynamoDB, what other tool could be used against DynamoDB for database deployments/version control?
I want to take a monthly/quarterly backup of both Hive metadata and Hive data at once, for more than 1,000 tables, with an easy way to restore. So far I have found the options below, but I am not sure which is best for backing up Hive tables in production. Any tips?
Apache Falcon - http://saptak.in/writing/2015/08/11/mirroring-datasets-hadoop-clusters-apache-falcon
Pro: Easily available as a service in Ambari for install
Con: No community support
Hortonworks DataFlow - https://docs.hortonworks.com.s3.amazonaws.com/HDPDocuments/Ambari-2.7.4.0/bk_ambari-upgrade-major/content/prepare_hive_for_upgrade.html
Pro: The most recent option
Con: Not much documentation to test against. Please share any resources on how to back up with Hortonworks DataFlow
Other ways - Hive data backup with DistCp, Export/Import, or snapshots, plus Hive metastore backup using relational database dumps (a rough sketch of the Export/Import route follows this list)
Con: Not sure whether both Hive data and Hive metadata get backed up at the same time. Time-consuming to implement a monthly/quarterly scheduler.
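For the Export/Import route mentioned above, here is a rough Python sketch, not a production script: the database name, table list, and backup location are placeholders, and it simply shells out to the Hive CLI.

```python
import subprocess

# Rough sketch only: back up a fixed list of Hive tables with EXPORT TABLE,
# which writes both the data files and a _metadata file to the target
# directory so the table can later be restored with IMPORT TABLE.
# Database, table list, and backup root below are placeholders.
DATABASE = "analytics"
TABLES = ["orders", "customers"]
BACKUP_ROOT = "hdfs:///backups/hive/2024-01"

for table in TABLES:
    target = f"{BACKUP_ROOT}/{DATABASE}/{table}"
    subprocess.run(
        ["hive", "-e", f"USE {DATABASE}; EXPORT TABLE {table} TO '{target}';"],
        check=True,  # fail loudly if any export fails
    )
```

In practice you would pull the table list from the metastore and wrap this in whatever scheduler you settle on, which is exactly the part that makes this option time-consuming.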
I have been using AWS CloudFormation and Terraform to manage cloud infrastructure as code (IaC). The benefits are obvious.
1) Template file to concisely describe your infrastructure
2) Versioning
3) Rollbacks
I also have a PostgreSQL DB where I can dump the schema into a single file. Now, it would be amazing if I could edit a dumped SQL file the way I edit an IaC template. I could then validate my new SQL template and apply changes to my DB with the same workflow as CloudFormation or Terraform.
Does anyone know if a tool like this exists for any of the various SQL providers?
Have you given Flyway a try?
It supports versioning database migrations as well as rolling back and undoing migrations when needed. It also keeps a schema history table in the database that tracks which migrations have been applied, so you can continuously deploy new scripts and changes to an existing application that uses Flyway.
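For context, here is a minimal Python sketch of the idea Flyway implements (versioned SQL scripts plus a history table). It is not Flyway's code or API; sqlite3 and the `migrations` directory are just placeholders to keep the sketch self-contained and runnable.

```python
import os
import re
import sqlite3

# Sketch of the versioned-migration idea: SQL files named
# V<version>__<description>.sql are applied in version order, and each applied
# version is recorded in a history table so re-running only applies what is new.
MIGRATIONS_DIR = "migrations"  # hypothetical directory of V*.sql files
PATTERN = re.compile(r"V(\d+)__.+\.sql")

conn = sqlite3.connect("app.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS schema_history (version INTEGER PRIMARY KEY, script TEXT)"
)
applied = {row[0] for row in conn.execute("SELECT version FROM schema_history")}

pending = []
for name in os.listdir(MIGRATIONS_DIR):
    match = PATTERN.fullmatch(name)
    if match and int(match.group(1)) not in applied:
        pending.append((int(match.group(1)), name))

for version, name in sorted(pending):
    with open(os.path.join(MIGRATIONS_DIR, name)) as f:
        conn.executescript(f.read())  # apply the migration's SQL
    conn.execute(
        "INSERT INTO schema_history (version, script) VALUES (?, ?)", (version, name)
    )
    conn.commit()
```

This is exactly the workflow you described for IaC: the migration scripts live in version control, and the history table tells you which version each database is on.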
We are planning to offload events from Kafka to S3 (e.g. via Kafka Connect). The goal is to spin up a service (e.g. Amazon Athena) and provide a query interface on top of the exported Avro events. The obstacle is that the Amazon Athena Avro SerDe (which uses org.apache.hadoop.hive.serde2.avro.AvroSerDe) does not support the magic bytes that Schema Registry uses to store the schema ID. Do you know of any alternative that plays nicely with Confluent Schema Registry?
Thanks!
Using S3 Connect's AvroConverter does not put any schema ID in the file. In fact, after the message is written, you lose the schema ID entirely.
We have lots of Hive tables that work fine with these files, and users query them using Athena, Presto, SparkSQL, etc.
Note: If you want to use AWS Glue, the S3 Connector doesn't (currently, as of 5.x) offer automatic Hive partition creation the way the HDFS Connector does, so you might want to look for alternatives if you want to use it that way.
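To illustrate the difference this answer relies on, here is a hedged Python sketch (the `describe` helper is hypothetical; the byte layouts are the standard ones): records serialized with the Confluent Avro serializer carry a wire-format header of one magic byte plus a 4-byte schema ID, while Avro container files embed the schema themselves, which is what Hive/Athena's AvroSerDe expects.

```python
import struct

def describe(payload: bytes) -> str:
    """Distinguish a raw Confluent-serialized record from an Avro container file."""
    if payload[:4] == b"Obj\x01":
        # Avro object container files start with "Obj" + version byte 1
        # and embed the writer schema, so Hive/Athena can read them directly.
        return "Avro container file: schema embedded"
    if payload and payload[0] == 0:
        # Confluent wire format: magic byte 0, then a 4-byte big-endian schema ID
        # that must be resolved against Schema Registry before decoding.
        magic, schema_id = struct.unpack(">bI", payload[:5])
        return f"Confluent wire format: schema ID {schema_id}"
    return "Unknown format"
```

The S3 sink's Avro output falls into the first case, which is why the exported files query fine without Schema Registry.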
I am doing some research on how to prevent data loss during migration and stumbled upon Liquibase.
How does Liquibase handle data loss?
Is there any loss of data when we use Liquibase for data migration (dropping an index/column, etc.)?
Thanks
That's not the goal of Liquibase, which is designed to handle the schema lifecycle of an application: creating tables, indexes, and columns, dropping tables, etc. (DDL).
Liquibase deals with data only for initialization or configuration (as a best practice).
If you want to migrate data from one database to another, you can use editor tools to export/import (if the target schema is the same).
Otherwise, you can use ETL tools such as Talend.
AWS also offers tools for this in their cloud environment.
My requirement is to migrate data from a Teradata database to a Google BigQuery database, keeping the table structure and schema unchanged. Later, I want to generate reports from the BigQuery database.
Can anyone suggest how I can achieve this?
I think you should try TDCH (the Teradata Connector for Hadoop) to export the data to Google Cloud Storage in Avro format. TDCH runs on top of Hadoop and exports data in parallel. You can then import the data from the Avro files into BigQuery.
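A minimal sketch of the second step, loading the exported Avro files from Cloud Storage into BigQuery with the google-cloud-bigquery client; the bucket, dataset, and table names are placeholders:

```python
from google.cloud import bigquery

client = bigquery.Client()

# Avro files carry their own schema, so no explicit schema is needed here.
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.AVRO,
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
)

load_job = client.load_table_from_uri(
    "gs://my-migration-bucket/teradata_export/orders/*.avro",  # placeholder URI
    "my-project.my_dataset.orders",                            # placeholder table
    job_config=job_config,
)
load_job.result()  # wait for the load job to finish

table = client.get_table("my-project.my_dataset.orders")
print(f"Loaded {table.num_rows} rows")
```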
I was part of a team that addressed this issue in a white paper.
The white paper documents the process of migrating data from Teradata Database to Google BigQuery. It highlights several key areas to consider when planning a migration of this nature, including the rationale for Apache NiFi as the preferred data flow technology, pre-migration considerations, details of the migration phase, and post-migration best practices.
Link: How To Migrate From Teradata To Google BigQuery
I think you can also try Cloud Composer (Apache Airflow), or install Apache Airflow on an instance.
If you can open the ports from the Teradata DB, you can run the 'gsutil' command from there and schedule it via Airflow/Composer to run the jobs on a daily basis. It's quick, and you can leverage the scheduling capabilities of Airflow.
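A hedged sketch of what that schedule could look like as an Airflow DAG; the DAG id, local export path, and bucket are placeholders, and the Teradata extract itself is assumed to have already landed files on the host running the task:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# Placeholder DAG: upload previously extracted Teradata files to GCS once a day.
with DAG(
    dag_id="teradata_to_gcs_daily",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    upload_to_gcs = BashOperator(
        task_id="upload_extract_to_gcs",
        # {{ ds }} is templated by Airflow to the run date, giving dated folders in GCS.
        bash_command="gsutil -m cp /data/teradata_exports/*.csv "
                     "gs://my-landing-bucket/teradata/{{ ds }}/",
    )
```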
BigQuery introduced the Migration Service, a comprehensive solution for migrating a data warehouse to BigQuery. It includes free-to-use tools that help with each phase of migration, from assessment and planning to execution and verification.
Reference:
https://cloud.google.com/bigquery/docs/migration-intro