How to get my Azure Synapse database and table names to not always be lowercased - azure-data-lake

I'm running the following in my Synapse pyspark notebook to create a database and table:
%%sql
CREATE DATABASE IF NOT EXISTS Database1 LOCATION '/Database1';
CREATE TABLE IF NOT EXISTS Database1.Table1(Column1 int) USING CSV OPTIONS (header=true);
Then I see 'database1' and 'table1' in the Studio UI. However, in ADLS I see 'Database1/table1' as the folder names.
Is there a way to preserve the case for both of these?

Running this seems to do the trick: spark.conf.set('spark.sql.caseSensitive', True)
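A minimal PySpark sketch of that workaround, assuming the config is set in the same notebook session before the DDL runs (the database and table names are the ones from the question, re-issued via spark.sql):
# Set the session config first, then re-run the DDL from the question.
spark.conf.set('spark.sql.caseSensitive', True)
spark.sql("CREATE DATABASE IF NOT EXISTS Database1 LOCATION '/Database1'")
spark.sql("CREATE TABLE IF NOT EXISTS Database1.Table1 (Column1 int) USING CSV OPTIONS (header=true)")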

Related

How to Parameterize Copy Activity to SQL DB with Azure Data Factory

I'm trying to automatically update tables in Azure SQL Database from another SQL DB with Azure Data Factory. At the moment, the only way to update a table in Azure SQL Database is to manually select the table you want to update in Azure SQL Database.
My configuration to automatically select a table from the SQL DB that I want to copy to Azure SQL Database is as follows:
The parameters are as follows:
@concat('SELECT * FROM ',pipeline().parameters.Domain,'.',pipeline().parameters.TableName)
Can someone let me know how to configure my SINK and/or connection to automatically insert the table selected from SOURCE?
You can use the Edit option in the SQL sink dataset.
Create a dataset parameter for the sink table name. In the SQL sink dataset, check the Edit checkbox and use the dataset parameter for the table name. If you want, you can use a dataset parameter for the schema name as well; here I have given it directly (dbo).
Now, in the copy activity sink, you can give the table name dynamically from any pipeline parameter (your parameter, in this case) or any variable using dynamic content.
Also, enable the Auto create table option, which creates a new table if no table with the given name exists; if it does exist, creation is skipped and the data is copied into it.

How to use CETAS (Synapse Serverless Pool) in dbt?

In Synapse Serverless Pool, I can use CETAS to create an external table and export the results to Azure Data Lake Storage.
CREATE EXTERNAL TABLE external_table
WITH (
LOCATION = 'location/',
DATA_SOURCE = staging_zone,
FILE_FORMAT = SynapseParquetFormat
)
AS
SELECT * FROM table
It will create an external table named external_table in Synapse and write a parquet file to my staging zone in Azure Data Lake.
How can I do this in dbt?
I was trying to do something very similar and run my dbt project with Synapse Serverless Pool, but ran into several issues. Ultimately I was misled by CETAS. When you create the external table, it creates a folder hierarchy in which it places the parquet file. If you were to run the same script as the one in your example again, it would fail, because you cannot overwrite with CETAS. So dbt would be able to run it like any other model, but it wouldn't be easy to overwrite. Maybe you could dynamically create a new parquet file every time the script runs and delete the old one, but that seems like putting a small bandage on the hemorrhaging wound that is the Synapse and serverless pool interaction. I had to switch up my architecture for this reason.
I was trying to export as parquet to preserve the column data types and descriptions so I didn't have to re-schematize, and also so I could create tables based on incremental points in my pipeline. I ended up finding a way to pull from a database that already had the data type schemas, using the dbt-synapse adapter. Then, if I needed an incremental table, I could materialize it as a table via dbt and dbt-synapse and access it that way.
What is your goal with the exported parquet file?
Maybe we can find another solution?
Here's the dbt-synapse-serverless adapter GitHub repo, which lists the caveats for serverless pools.
I wrote a materialization for CETAS (Synapse Serverless Pool) here: https://github.com/intheroom/dbt-synapse-serverless
It's forked from dbt-synapse-serverless here: https://github.com/dbt-msft/dbt-synapse-serverless
You can also use hooks in dbt to run CETAS.

Bulk copy multiple csv files from Blob Container to Azure SQL Database

Environment:
MS Azure:
Blob Container, with multiple CSV files saved in a folder. This is my source.
Azure SQL Database. This is my target.
Goal:
Use Azure Data Factory and build a pipeline to "copy" all files from the container and store them in their respective tables in the Azure SQL Database by automatically creating those tables.
How do I do that? I tried following this, but I just end up with tables incorrectly created in the database, where each table is created with a single column that has the same name as the table.
I believe I followed the instructions from that link pretty much as they are.
My CSV file is as follows; one column contains the table name.
The previous steps will not be repeated; they are the same as in the link.
At Step 3, inside the ForEach activity, we should add a Lookup activity to query the table name from the source dataset.
We can declare a String-type variable tableName beforehand, then set its value via the expression @activity('Lookup1').output.firstRow.tableName.
In the sink settings of the Copy activity, we can key in @variables('tableName').
ADF will auto-create the table for us.

How to rename a database in Azure Databricks?

I am trying to rename a database in Azure Databricks, but I am getting the following error:
no viable alternative at input 'ALTER DATABASE inventory
Below is the code:
%sql
use inventory;
ALTER DATABASE inventory MODIFY NAME = new_inventory;
Please explain what is meant by the error "no viable alternative at input 'ALTER DATABASE inventory'"
and how I can solve it.
It's not possible to rename a database on Databricks. If you go to the documentation, you will see that you can only set DBPROPERTIES.
If you really need to rename the database, then you have 2 choices:
if you have unmanaged tables (not created via saveAsTable, etc.), then you can produce the SQL using SHOW CREATE TABLE, drop your database (be careful anyway), and recreate all tables from the saved SQL
if you have managed tables, then the solution would be to create a new database and either use CLONE (only for Delta tables) or CREATE TABLE ... AS SELECT for other file types, and after that drop your old database (see the sketch below)
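A rough PySpark sketch of that second option, assuming managed Delta tables and hypothetical database names old_db and new_db (for non-Delta tables you would swap the DEEP CLONE statement for a CREATE TABLE ... AS SELECT):
# Hypothetical sketch: copy every managed Delta table into a new database, then drop the old one.
spark.sql("CREATE DATABASE IF NOT EXISTS new_db")
for row in spark.sql("SHOW TABLES IN old_db").collect():
    if not row.isTemporary:  # SHOW TABLES also returns temp views; skip them
        # DEEP CLONE copies both data and metadata; Delta tables only
        spark.sql(f"CREATE TABLE new_db.{row.tableName} DEEP CLONE old_db.{row.tableName}")
spark.sql("DROP DATABASE old_db CASCADE")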
Alex Ott's answer, to use Clone, is OK if you do not need to maintain the versioning history of your database when you rename it.
However if you wish to time travel on the database of Delta tables after the renaming, this solution works:
Create your new database, specifying its location
Move the file system from the old location to the new location
For each table in the old database, create a table in the new database based on the location (my code relies on the standard file structure of {database name}/{table name} being observed). There is no need to specify the schema, as it is just taken from the files in place.
Drop old database
You will then be left with a database under your new name that has all of the data and all of the history of your old database, i.e. a renamed database of Delta tables.
PySpark method (on Databricks, with "spark" and "dbutils" already defined by default):
def rename_db(original_db_name, original_db_location, new_db_name, new_db_location):
    # Create the new database at the new location.
    spark.sql(f"create database if not exists {new_db_name} location '{new_db_location}'")
    # Move the old database's files to the new location.
    dbutils.fs.mv(original_db_location, new_db_location, True)
    # Recreate each table in the new database, pointing at the moved {database}/{table} folders.
    for table in list(map(lambda x: x.tableName, spark.sql(f"SHOW TABLES FROM {original_db_name}").select("tableName").collect())):
        spark.sql(f"create table {new_db_name}.{table} location '{new_db_location}/{table}'")
    # Drop the old database and return the tables now registered in the new one.
    spark.sql(f"drop database {original_db_name} cascade")
    return spark.sql(f"SHOW TABLES FROM {new_db_name}")
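A hedged usage sketch, with hypothetical database names and ADLS locations (substitute your own storage account, container, and paths):
# Example call: renames 'inventory' to 'new_inventory' by moving its files and re-registering its tables.
rename_db(
    "inventory",
    "abfss://container@account.dfs.core.windows.net/inventory",
    "new_inventory",
    "abfss://container@account.dfs.core.windows.net/new_inventory",
)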

SQL Server: creating a table

When creating a table in SQL Server, it is created using the dbo.<tableName> format.
I want to change dbo.tableName to source.tableName, as I want to import data into a source table and then cook that data.
Thanks
You are talking about schemas. If the schema source doesn't exist yet, you need to run create schema source. Once the schema exists, it's as easy as create table source.tableName (...).