Is there a way to generate Redshift SQL? - sql

I have a Visio .vdx file with the design of my data warehouse, created with Lucidchart. Is there a way to generate Redshift SQL from that?
What would be the best tool to work with for Redshift data modeling?
Also, can those SQL generators create tables from special Visio stencils, like http://www.visualdatavault.com?

Amazon Redshift is (mostly) compatible with PostgreSQL, so any tool that can introspect PostgreSQL tables should work with Redshift.
One thing to note -- constraints and foreign keys are not enforced in Redshift.
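As a rough illustration (table names here are hypothetical), Redshift will accept constraint definitions, since the planner can use them as hints, but it will not reject data that violates them:

CREATE TABLE dim_customer (
    customer_id INT PRIMARY KEY,
    name        VARCHAR(128)
);

CREATE TABLE fact_order (
    order_id    INT PRIMARY KEY,
    customer_id INT REFERENCES dim_customer (customer_id)
);

-- All of these succeed even though they violate the declared constraints:
INSERT INTO dim_customer VALUES (1, 'a');
INSERT INTO dim_customer VALUES (1, 'b');  -- duplicate primary key, not rejected
INSERT INTO fact_order   VALUES (10, 999); -- references a customer that does not exist, not rejected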

Related

How to createOrReplaceTempView in Delta Lake?

I want to use Delta Lake tables in my Hive metastore on Azure Data Lake Gen2 as the basis for my company's lakehouse.
Previously, I used "regular" Hive catalog tables. I would load data from parquet into a Spark DataFrame and create a temporary view using df.createOrReplaceTempView("TableName"), so that I could use spark.sql or the %%sql magic to do ETL against TableName. When I was done, I would write my tables to the Hive metastore.
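(For reference, a minimal sketch of that temp-view step expressed as Spark SQL rather than the DataFrame API; the parquet path is just a placeholder:

CREATE OR REPLACE TEMPORARY VIEW TableName AS
SELECT * FROM parquet.`abfss://container@storageaccount.dfs.core.windows.net/path/to/data`;

After that, spark.sql or %%sql queries can reference TableName as before.)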
However, what if I don't want to perform this saveAsTable operation and write to my data lake? What would be the best way to perform ETL with SQL?
I know I can persist Delta tables in the Hive metastore in a multitude of ways, for instance by creating a managed catalog table through df.write.format("delta").saveAsTable("LakeHouseDB.TableName").
I also know that I can create a DeltaTable object through DeltaTable(spark, table_path_data_lake), but then I can only use the Python API and not SQL.
Does there exist some equivalent of createOrReplaceTempView(), or is there a better way to achieve ETL with SQL without 'writing' to the data lake first?
That is not possible with Delta Lake, since it relies heavily on a transaction log (_delta_log) under the data directory of a Delta table.

Azure Databricks INFORMATION_SCHEMA

I am using Azure Databricks and need a way to find out which columns are allowed to be NULL in several tables. In MySQL there is the well-known INFORMATION_SCHEMA, which does not exist in Databricks.
My idea was to use Spark SQL to get the schema from there. I am now wondering whether this is an equivalent way to generate the information schema. My approach looks like this:
df = spark.sql("Select * from mytable")
df.schema
Any comment would be much appreciated!
By default, any column of a Spark DataFrame can be null. If you need to enforce that some data should not be null, then you either use code to check before writing the data, or you use the constraints supported by Delta tables, such as NOT NULL or CHECK (for arbitrary conditions). With these constraints, Spark will check the data before writing and will fail if the data doesn't match the given constraint, like this:
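A sketch of what that can look like with SQL constraints on a Delta table (the table and column names here are just examples):

-- Declare constraints on an existing Delta table:
ALTER TABLE mytable ALTER COLUMN id SET NOT NULL;
ALTER TABLE mytable ADD CONSTRAINT amount_is_positive CHECK (amount >= 0);

-- Later writes that violate a constraint fail instead of storing bad data:
INSERT INTO mytable VALUES (NULL, 10);  -- rejected: id must be NOT NULL
INSERT INTO mytable VALUES (1, -5);     -- rejected: violates amount_is_positive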
P.S. You can get more information about a table's schema and these constraints with SQL commands like DESCRIBE TABLE or DESCRIBE TABLE EXTENDED.

Optimize Temporary Table on Presto/Hive SQL

I would like to optimize the computation time of queries run on Presto/Hive SQL. One of the techniques I used on Redshift was to improve the efficiency of temporary tables, as in the following:
BEGIN;
CREATE TEMPORARY TABLE my_temp_table(
column_a varchar(128) encode lzo,
column_b char(4) encode bytedict)
distkey (column_a) -- Assuming you intend to join this table on column_a
sortkey (column_b) -- Assuming you are sorting or grouping by column_b
;
INSERT INTO my_temp_table SELECT column_a, column_b FROM my_table;
COMMIT;
I have tried that on Presto/Hive SQL, but it is not supported. Do you know of an equivalent of this technique on Presto/Hive SQL?
Many thanks!
Redshift is a relational database, while Presto is a distributed SQL query engine. Presto currently supports neither the creation of temporary tables nor the creation of indexes, but you can create tables based on a SQL statement via CREATE TABLE AS (see the Presto documentation).
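A minimal sketch of that (table names are hypothetical):

CREATE TABLE my_precomputed_table AS
SELECT column_a, column_b
FROM my_table;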
You can optimize the performance of Presto in two ways:
Optimizing the query itself
Optimizing how the underlying data is stored
One of the best articles around is Top 10 Performance Tuning Tips for Amazon Athena. Athena is an AWS service based on Presto 0.172, so the tips should also work for Presto.
I am not a Redshift expert, but it seems you want to precompute a data set, distributing it and sorting it by selected columns so that it is faster to query.
This corresponds to the Presto Hive connector's ability to:
partition data -- rows with the same value in the partitioning column(s) form a single partition, which is a folder on storage; do not use partitioning on high-cardinality columns. This is defined using the partitioned_by table property.
bucket data -- rows are grouped into files using a hash of the bucketing column(s); this is similar to partitioning to a certain extent. This is defined using the bucketed_by and bucket_count table properties.
sort data -- within each data file, rows are sorted by the given column(s). This is defined using the sorted_by table property.
See examples in the Trino (formerly Presto SQL) Hive connector documentation.
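For instance, a rough sketch of a CTAS combining these properties (catalog, schema, table and column names are hypothetical; note that with the Hive connector the partition columns must come last in the select list):

CREATE TABLE hive.analytics.my_precomputed_table
WITH (
    format = 'ORC',
    partitioned_by = ARRAY['event_date'],
    bucketed_by = ARRAY['column_a'],
    bucket_count = 32,
    sorted_by = ARRAY['column_b']
)
AS
SELECT column_a, column_b, event_date
FROM hive.analytics.my_source_table;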
Note: while I realize the documentation is scarce at the moment, I filed an issue to improve it. In the meantime, you can get additional information on the Trino (formerly Presto SQL) community Slack.

How to extract the SQL CREATE statement from Ignite

Using Apache Ignite, is it possible to extract the CREATE statement used to create the table? You can do this in MySQL with the SHOW CREATE TABLE x command for example.
I don't think dumping DDL (the database structure) is currently possible, especially since CREATE TABLE is only one of three ways of creating tables in Ignite.
However, you can query tables, schemas and indexes via the JDBC metadata introspection feature.
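As a related option (this is not the JDBC metadata API itself, and it is only available in newer Ignite versions), Ignite also exposes system views in the SYS schema that can be queried with plain SQL, roughly like this:

-- List tables, their columns, and their indexes via Ignite system views
SELECT * FROM SYS.TABLES;
SELECT * FROM SYS.TABLE_COLUMNS;
SELECT * FROM SYS.INDEXES;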

Tool for comparing two databases and creating ALTER scripts to sync them

I have two IBM DB2 databases, A and B, with different database schemas. Database A's schema is older and database B's schema is newer. I would need to create SQL ALTER scripts that can update schema A to match schema B. This can of course be done manually, but is there a tool that could analyse the two databases and do this for me?
I am using the free IBM Data Studio client for querying the database. Can the above operation be done using this tool?
Redgate SQL Compare is one of the best.