I have an Azure Synapse workspace that contains a number of pipelines and external tables in the serverless SQL pool, all associated with one particular project.
There are another two or three completely separate projects on the way that will require the Synapse toolset.
Should I create a new workspace, or allow them all to share this one?
What are the best criteria to use to decide?
This is probably a bit of an opinion question, which doesn't tend to do well on Stack Overflow, but that said, I tend to think of a Synapse workspace as similar to an instance of SQL Server, so ask yourself: historically, why would you have used the same SQL instance?
Generally it was where projects had things in common, e.g. the same data, similar permission (AAD) groups, similar HADR requirements, etc., so ask yourself those questions.
Bear in mind you can have multiple databases (dedicated and serverless) within a workspace, but cross-database queries for tables in a dedicated SQL pool are only possible via Spark pools¹. This could work in your favour if you require separation. Also bear in mind you can connect multiple storage accounts to the workspace. There is no cost overhead to having multiple workspaces, but there is an admin overhead, and there would be a cost implication to duplicating any of your data across multiple lakes, storage accounts and databases.
One example: we're using separate workspaces as environments where there aren't separate dev, test and UAT Azure subscriptions.
So a few things to consider.
¹ Import the two tables as dataframes, then join them in a Synapse notebook, as per this example (a minimal sketch follows).
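To make that concrete, here is a minimal PySpark sketch of the approach, assuming the notebook runs on a Synapse Spark pool with the built-in dedicated SQL pool connector (synapsesql); the database, schema, table and column names are placeholders:

```python
# Synapse notebook cell (PySpark): join tables that live in two different
# dedicated SQL pool databases by first loading each into a Spark dataframe.
# Database, schema, table and column names below are hypothetical placeholders.

orders_df = spark.read.synapsesql("SalesDb.dbo.Orders")        # table in database 1
employees_df = spark.read.synapsesql("HrDb.dbo.Employees")     # table in database 2

# The cross-database join happens in Spark, which T-SQL in the dedicated pools
# cannot do directly across databases.
joined_df = orders_df.join(
    employees_df,
    orders_df.EmployeeId == employees_df.EmployeeId,
    "inner",
)

joined_df.show(10)
```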
Related
I have N databases, for example 10 databases.
Every database has the same schema, but different data.
Now I would like to take the data from the table "Table1" in each database and insert it into a common table, named Table1Common, in a new database, "DWHDatabase".
So it's an N-to-1 insert.
How can I do that? I'm trying to solve this with elastic queries, but that seems to be a 1-to-1 thing.
Use Azure Data Factory with a linked service to each database, and use the Copy activity to load the data.
You can also parameterize the solution (see the sketch after the links below).
Parameterize linked services
Parameters in Azure Data Factory by Cathrine Wilhelmsen
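ADF pipelines are authored in the portal (or as JSON), so there is no Python to show for the Copy activity itself; purely as an illustration of the same parameterized N-to-1 pattern, here is a hedged pyodbc sketch that loops over the source databases and appends their Table1 rows into Table1Common. Server name, credentials and the column list are placeholders:

```python
# Not ADF itself: a plain pyodbc illustration of the same parameterized N-to-1 copy.
# Server, database, credential and column names are hypothetical placeholders.
import pyodbc

DRIVER = "DRIVER={ODBC Driver 18 for SQL Server};SERVER=myserver.database.windows.net;"
CREDS = "UID=etl_user;PWD=<secret>;"
source_databases = [f"SourceDb{i}" for i in range(1, 11)]   # the N source databases

target = pyodbc.connect(DRIVER + "DATABASE=DWHDatabase;" + CREDS)
tcur = target.cursor()
tcur.fast_executemany = True                                # speeds up the bulk insert

for db in source_databases:
    source = pyodbc.connect(DRIVER + f"DATABASE={db};" + CREDS)
    rows = source.cursor().execute("SELECT Col1, Col2 FROM dbo.Table1").fetchall()
    source.close()
    if rows:
        tcur.executemany(
            "INSERT INTO dbo.Table1Common (Col1, Col2) VALUES (?, ?)", rows
        )
        target.commit()

target.close()
```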
Elastic query is best suited for reporting scenarios in which the majority of the processing (filtering, aggregation) can be done on the external source side. It is unsuitable for ETL processes involving significant amounts of data transfer from a remote database (or databases). Consider Azure Synapse Analytics for large reporting workloads or data warehousing applications with more sophisticated queries.
You may use the Copy activity to copy data across on-premises and cloud-based data stores. After you've copied the data, you may use other activities to transform and analyse it. The Copy activity may also be used to publish transformation and analysis results for use in business intelligence (BI) and application consumption.
MSFT Copy Activity Overview: Here.
Currently our team is having a major database/data management issue: hundreds of databases are being built and used for minor, one-off applications where the app should really be pulling from an already existing database.
Since our security is so tight, the owners of these systems of authority will not allow others to pull data from them at a consistent (app-necessary) rate; rather, they allow a single app to do a weekly pull, and that data is then given to the org.
I am being asked to compile all of those publicly available weekly snapshots into a single data warehouse for end users to go to. We are realistically talking about 30-40 databases, each with hundreds of thousands of records.
What is the best way to turn this into a data warehouse? Create a SQL Server instance and treat each one as its own DB on the server? I am less worried about the individual app connections; I really want to know the best practice for housing all of the data for consumption.
What you're describing is more of a simple data lake. If all you're being asked for is a single place for the existing data to live as-is, then sure, directly pulling all 30-40 databases to a new server will get that done. One thing to note is that if they're creating Database Snapshots, those wouldn't be helpful here. With actual database backups, it would be easy to build a process that would copy and restore those to your new server. This is assuming all of the sources are on SQL Server.
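As a rough sketch of that copy-and-restore process (assuming the weekly snapshots arrive as SQL Server .bak files, that you can reach the new server with pyodbc, and that the logical file names follow the usual <db>/<db>_log convention; every path, server and database name here is a placeholder):

```python
# Hypothetical sketch: restore each weekly .bak onto the new consolidation server,
# one database per source system. All paths, server and database names are placeholders.
import os
import pyodbc

BACKUP_DIR = r"\\fileshare\weekly_backups"   # where the weekly snapshots land
DATA_DIR = r"D:\SQLData"                     # data/log directory on the new server

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};SERVER=consolidation-server;"
    "Trusted_Connection=yes;",
    autocommit=True,                         # RESTORE cannot run inside a transaction
)
cur = conn.cursor()

for bak in os.listdir(BACKUP_DIR):
    if not bak.lower().endswith(".bak"):
        continue
    db_name = os.path.splitext(bak)[0]       # e.g. Finance.bak -> Finance
    # Assumes the backup's logical file names are <db> and <db>_log;
    # check with RESTORE FILELISTONLY if they differ.
    cur.execute(
        f"RESTORE DATABASE [{db_name}] "
        f"FROM DISK = N'{os.path.join(BACKUP_DIR, bak)}' "
        f"WITH REPLACE, "
        f"MOVE N'{db_name}' TO N'{os.path.join(DATA_DIR, db_name + '.mdf')}', "
        f"MOVE N'{db_name}_log' TO N'{os.path.join(DATA_DIR, db_name + '_log.ldf')}'"
    )

conn.close()
```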
"Data warehouse" implies a certain level of organization beyond that, to facilitate reporting on an aggregate of the data across the multiple sources. Generally you'd identify any concepts that are shared between the databases and create a unified table for each concept, then create an ETL (extract, transform, load) process to standardize the data from each source and move it into those unified tables. This would be a large lift for one person to build. There's plenty of resources that you could read to get you started--Ralph Kimball's The Data Warehouse Toolkit is a comprehensive guide.
In either case, a tool you might want to look into is SSIS. It's good for copying data across servers and has drivers for multiple different RDBMS platforms. You can schedule SSIS packages from SQL Agent. It has other features that could help for data warehousing as well.
I need to grant permissions to a remote development team so they can copy schema changes on a database to their local dev instances. I see many posts similar to this, but they seem to focus on what is required in the destination server, rather than rights to read everything necessary on the source.
Currently, the user is in the db_datareader role, and while they seem to be able to read a good portion of the table structure, configuration items such as defaults seem to be obscured, and stored procedure and view definitions don't seem to be available either.
I need the team to be able to copy from our Test/UAT instance, but I don't want them to be able to modify it. They should already have sa access to their local dev instances.
I need to grant permissions to a remote development team so they can copy schema changes on a database to their local dev instances.
I think you can use Azure SQL Data Sync.
Data Sync is useful in cases where data needs to be kept up-to-date across several Azure SQL databases or SQL Server databases. Here are the main use cases for Data Sync:
Hybrid Data Synchronization: With Data Sync, you can keep data synchronized between your on-premises databases and Azure SQL databases to enable hybrid applications. This capability may appeal to customers who are considering moving to the cloud and would like to put some of their application in Azure.
Distributed Applications: In many cases, it's beneficial to separate different workloads across different databases. For example, if you have a large production database, but you also need to run a reporting or analytics workload on this data, it's helpful to have a second database for this additional workload. This approach minimizes the performance impact on your production workload. You can use Data Sync to keep these two databases synchronized.
Globally Distributed Applications: Many businesses span several regions and even several countries/regions. To minimize network latency, it's best to have your data in a region close to you. With Data Sync, you can easily keep databases in regions around the world synchronized.
Data Sync is based around the concept of a Sync Group. A Sync Group is a group of databases that you want to synchronize.
A Sync Group has the following properties:
The Sync Schema describes which data is being synchronized.
The Sync Direction can be bi-directional or can flow in only one direction. That is, the Sync Direction can be Hub to Member, or Member to Hub, or both.
The Sync Interval describes how often synchronization occurs.
The Conflict Resolution Policy is a group level policy, which can be Hub wins or Member wins.
For more detail, please see Overview of SQL Data Sync.
With Data Sync, you can set your Azure SQL database as the hub database and the team's local dev instances as member databases, and set the Sync Direction to 'Hub to Member'.
Then you can sync the schema changes on a database to their local dev instances, manually or automatically. Reference: Tutorial: Set up SQL Data Sync between Azure SQL Database and SQL Server on-premises
Hope this helps.
GRANT VIEW DEFINITION was what I needed.
Not sure how I didn't stumble on that in my searches, but there it is.
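For anyone else landing here, a minimal sketch of applying that grant at database scope (shown via pyodbc to keep the examples in one language; the server, database and role names are placeholders):

```python
# Give the remote dev team's role the right to see object definitions
# (procs, views, defaults) on the Test/UAT database without any write access.
# Server, database and role names are placeholders.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};SERVER=test-uat-server;"
    "DATABASE=AppDb;Trusted_Connection=yes;",
    autocommit=True,
)
# Database-scoped VIEW DEFINITION covers every object in the database;
# combined with db_datareader it lets the team script out the schema but not change it.
conn.execute("GRANT VIEW DEFINITION TO [dev_readers];")
conn.close()
```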
Introduction:
Hi, we use SQL Server 2016 servers for app DBs (437 applications) in our corporation. We have other environments (such as HCM, etc.) from which data has to be available to these applications.
To meet this requirement, we have a shared DB named CentralRepository, into which data flows from the other environments, making it available to those DBs.
Problem Description:
Now we are trying to migrate a few of the critical applications (26) to Azure servers. Hence, we have to move the CentralRepository as well, to make sure the necessary data is available to those applications. But moving the whole DB is a waste of resources, as we don't require the tables needed by the remaining 410 DBs. Hence, we are planning to move only the data necessary for these 26 applications, i.e. around 110+ tables out of hundreds of tables.
I would like to know if there is any way we can do that other than the Import/Export Wizard (it's tough to move 110 tables' worth of data to all the Azure environments) or a complete DB restore (as it is a relatively huge DB).
It would be very helpful if you could suggest a workaround for this problem. Thanks in advance :)
Good morning,
I am using an ASP.NET Framework application with an Azure client database.
I am now creating another server on Azure to host databases. On this server, for each customer registering on the website (for which one entry is created in my first database), I need to create a database with 8 tables, identical for each customer.
What would be the best way to map the ASP.NET ID to a new database? Which framework would you recommend?
Thanks
Rather than running a VM where you're going to have to manage a SQL Server installation and write a bunch of code to handle a database per tenant scenario, I highly, highly, highly recommend taking a look at Azure SQL's multi-tenant sharding support. All of this code is already written for you. And it's not that you're paying for one DB per client - check out elastic pooling.
You can read the docs here.
Also note, this option will scale very well.
I have done this three different ways: a database per client where I wrote my own code to manage sharding, a single database with a separate schema per client (a huge pain in the rear), and using Azure SQL sharding support. It's not just the issue of correctly separating client data. You also need to think about querying for reporting across all client databases, and managing schema changes. Under the first two options, if you change a schema, you get to modify N client databases. Azure SQL's sharding tools will manage this for you.
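The Elastic Database client library that implements this is a .NET package, so there's no direct Python equivalent to show; purely to illustrate the shard-map idea behind it (a small catalog that maps the ASP.NET user ID to its tenant database), here is a conceptual pyodbc sketch with placeholder server, table and column names:

```python
# Conceptual sketch of the shard-map idea behind Azure SQL's sharding tools:
# a small catalog table maps each tenant (the ASP.NET user ID) to its database,
# and every data access first resolves the tenant's connection through it.
# Server, table and column names are hypothetical placeholders; the real
# Elastic Database client library (.NET) does this bookkeeping for you.
import pyodbc

CATALOG_CONN = (
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=catalog-server.database.windows.net;DATABASE=TenantCatalog;"
    "UID=app_user;PWD=<secret>;"
)

def get_tenant_connection(aspnet_user_id: str) -> pyodbc.Connection:
    """Look up the tenant's database in the catalog and open a connection to it."""
    catalog = pyodbc.connect(CATALOG_CONN)
    row = catalog.execute(
        "SELECT ServerName, DatabaseName FROM dbo.TenantMap WHERE AspNetUserId = ?",
        aspnet_user_id,
    ).fetchone()
    catalog.close()
    if row is None:
        raise LookupError(f"No tenant database registered for user {aspnet_user_id}")
    return pyodbc.connect(
        "DRIVER={ODBC Driver 18 for SQL Server};"
        f"SERVER={row.ServerName};DATABASE={row.DatabaseName};"
        "UID=app_user;PWD=<secret>;"
    )

# Usage: run per-tenant queries against whichever of the 8-table databases this user owns.
# conn = get_tenant_connection("user-guid-123")
# rows = conn.execute("SELECT * FROM dbo.Orders").fetchall()
```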