Can anyone please help me understand what components/services Azure Synapse Analytics includes?
From what I have read on both the Microsoft website and in other reviews, it is the new SQL Data Warehouse; however, it also says it brings together all of these: data ingestion (like Azure Data Factory), data warehousing, and big data analytics (like Data Lake).
So what components exactly does Azure Synapse Analytics include when you purchase it?
Thanks.
As of 6 May 2020, the Azure Synapse Analytics service refers to Azure SQL Data Warehouse, more specifically to the "gen2" version of it. Microsoft announced the new name "Azure Synapse Analytics" and the upcoming features for the service at the Ignite 2019 event in November 2019. The new features are currently available only in private preview, but I would assume they will be released in public preview soon. Access to the private preview is already closed to new users, even though some Microsoft material still hints that you could apply for it.
You can already find information about the new features in the documentation and other material. The confusing part is that you cannot find them in the portal yet if you are not part of the private preview. This currently makes it really hard for new users to understand what is actually available and what is not.
A good starting point for information on the current situation and the features of both versions can be found here:
Blog post Azure SQL Data Warehouse is now Azure Synapse Analytics
SQL DW documentation
Synapse new features documentation
Microsoft has made the release of this update very confusing. I assume they wanted to communicate early, at Ignite 2019, that they had a competitive offering coming. Compared to some other cloud-native data warehousing solutions, the old version of Azure SQL DW was clearly behind in many areas, e.g. in flexible scalability. The new Synapse Analytics capabilities look good and could bring Microsoft back to the lead in this area.
We have recently implemented an Azure Synapse workspace in our reporting landscape. The purpose of this Synapse workspace is to store Dynamics data to be reported on by Power BI. We were using the Data Export Service (DES) mechanism to move data from Dynamics to Azure SQL Server and report from there, but MS have deprecated DES and we now use Synapse as the substitute.
I have found that when I amend a Power BI report and the Power Query element re-evaluates itself, seeking an update from the source (the Azure Synapse workspace), the re-evaluation fails; this happens about 50% of the time during working hours. I get an 'Operating system error code 12'. The error message is below, with sensitive text scrubbed.
Googling the error tells me that a read (i.e. a Power BI refresh/re-evaluation) cannot take place against the Synapse workspace if Synapse is being updated by Dynamics at the same time.
Is this correct? I can't believe MS would devise a DES replacement that cannot be read from while it is being updated from the source. The source (Dynamics) will be updated throughout the working day, so this would mean that no one can read from Synapse during the working day.
I'm wondering if further configuration is required within Synapse to allow reads.
If you can confirm that what I'm facing is correct and/or advise me on how to remedy it, that would be greatly appreciated.
Thanks.
I'm stumped on what to do. I have verified that when the Power BI report fails its refresh/re-evaluation, the failing entity has indeed just been updated in the Synapse workspace's CSV file, so the explanation I found seems to be correct.
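One idea I'm exploring is to stop pointing Power BI at the live CSV files and instead take a periodic snapshot with the Synapse serverless SQL pool, so reports read a stable copy. This is only a sketch of that idea, not a confirmed fix; the storage URL, external data source, file format, and table names below are all made up:

```sql
-- Sketch: copy the Dataverse-exported CSVs into a stable Parquet snapshot
-- using CETAS in the serverless SQL pool. All names and paths are hypothetical;
-- the external data source and file format must be created beforehand.
CREATE EXTERNAL TABLE snapshots.account_latest
    WITH (
        LOCATION = 'snapshots/account/',
        DATA_SOURCE = reporting_storage,   -- hypothetical external data source
        FILE_FORMAT = parquet_format       -- hypothetical external file format
    )
AS
SELECT *
FROM OPENROWSET(
        BULK 'https://mylake.dfs.core.windows.net/dataverse/account/*.csv',  -- hypothetical path
        FORMAT = 'CSV',
        PARSER_VERSION = '2.0',
        HEADER_ROW = TRUE
     ) AS src;
```

Power BI would then query snapshots.account_latest instead of the files Dynamics is actively writing to. Since CETAS cannot overwrite an existing location, a scheduled refresh (e.g. from a Synapse pipeline) would need to drop and recreate the table each run.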
Can anyone share a Kusto query (KQL) that I can use in Log Analytics to return some usage-tracking stats?
I am trying to identify which views and tables are used the most. I am also trying to find out who the power users are and which commands/queries are run against the tables.
Any insights would be appreciated.
You can use the functions below to gather the usage statistics:
DiagnosticMetricsExpand()
DiagnosticLogsExpand()
ActivityLogRecordsExpand()
Then create target tables to store the function output, so you can analyze the usage information.
Refer to the Azure documentation for complete details: https://learn.microsoft.com/en-us/azure/data-explorer/ingest-data-no-code?tabs=activity-logs
Tutorial: Ingest monitoring data in Azure Data Explorer without code
In this tutorial, you learn how to ingest monitoring data into Azure Data Explorer without writing a single line of code, and how to query that data.
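As a starting point, here is a rough KQL sketch of the kind of usage queries you could run in Log Analytics. It assumes the cluster's "Query" diagnostic logs flow into the AzureDiagnostics table and that the user and query text end up in columns named User_s and Text_s; verify the actual table and column names in your workspace, as they vary by resource type:

```kusto
// Top 10 users by query count over the last 30 days.
// The Category / User_s / Text_s names are assumptions - check your schema.
AzureDiagnostics
| where TimeGenerated > ago(30d)
| where Category == "Query"
| summarize QueryCount = count() by User_s
| top 10 by QueryCount

// Rough table/view usage: count queries whose text mentions each object.
// Replace the list with your own table and view names (these are hypothetical).
AzureDiagnostics
| where TimeGenerated > ago(30d) and Category == "Query"
| extend obj = dynamic(["Sales", "Customers", "vw_DailyOrders"])
| mv-expand obj
| where Text_s contains tostring(obj)
| summarize Uses = count() by ObjectName = tostring(obj)
| order by Uses desc
```

Matching on query text is crude (it also counts comments and partial matches), but it is often enough to spot the heavily used tables and the power users.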
Azure Data Lake Analytics and Azure Databricks can both be used for batch processing. Could anyone please help me understand when to choose one over the other?
In my humble opinion, a lot of it comes down to existing skillsets. If you have a team experienced in Spark, Java, Python, R, or Scala then Databricks is a natural fit. If on the other hand you have a team with existing SQL and C# skills, then the learning curve for them with U-SQL will be less steep (see the short U-SQL sketch after the list below).
That aside, there are other questions which can drive out differences:
Do you require realtime interaction (Databricks) or batch-mode analytics (both)? There is a feedback item requesting real-time interactivity for U-SQL; please vote for it.
Do you want a pay-as-you-go model (U-SQL) or clusters with auto-terminate after a certain period (Databricks)?
Do you prefer working in a notebook (Databricks) or in Visual Studio / VS Code / PowerShell / the .NET SDK (U-SQL)?
Do you want to use Spark libraries like GraphX (Databricks)?
Do you want the ability to run and scale any runtime (U-SQL)? See here for more details.
Do you want a local development emulator (U-SQL)?
The U-SQL emulator in Visual Studio is seamless, i.e. you develop your code against your local drives in the same structure as your lake (for free), then simply click the drop-down in Visual Studio to run in the cloud. Although I think you can have a local Spark environment, I'm not sure what the local (and disconnected) development experience is for Databricks.
Are you using ADLS Gen 2 (only Databricks)? See here.
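To make the SQL + C# point concrete, here is a minimal U-SQL batch script; the input path, schema, and output path are invented for illustration:

```usql
// Minimal U-SQL batch job: extract a CSV from the lake, aggregate with
// SQL-like syntax, and write the result back. Paths and schema are hypothetical.
@sales =
    EXTRACT Region string,
            Amount decimal,
            OrderDate DateTime
    FROM "/input/sales.csv"
    USING Extractors.Csv(skipFirstNRows: 1);

@summary =
    SELECT Region,
           SUM(Amount) AS TotalAmount,
           COUNT(*) AS OrderCount
    FROM @sales
    GROUP BY Region;

OUTPUT @summary
TO "/output/sales_by_region.csv"
USING Outputters.Csv(outputHeader: true);
```

The same script runs against local drives in the emulator or against the lake in the cloud, which is the development experience described above.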
UPDATE October 2018:
As far as I am aware, U-SQL does not currently support ADLS Gen 2, which would count against it (happy to be corrected). I will update the post if and when that support is added.
UPDATE January 2019:
U-SQL has not had any meaningful updates since Spring 2018.
HTH
Databricks has more language options, which allows professionals with different skills to work on the data. Also, with Databricks you can run jobs on high-performance, in-memory clusters.
In one project, we use the data lake more as storage and do all the jobs (ETL, analytics) via Databricks notebooks; storing data in the data lake is also cheaper.
Back to your question: if you have complex batch jobs and different types of professionals will work on the data, you may choose an Azure Data Lake + Databricks architecture. Otherwise, Azure Data Lake Analytics alone would satisfy your needs.
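As an illustration of that notebook pattern, a simple batch ETL cell in Databricks might look like this (the mount point, paths, and column names are made up):

```python
# Minimal Databricks batch ETL sketch: read raw CSVs from the data lake,
# aggregate, and write the result back as Parquet.
# `spark` is predefined in a Databricks notebook; all paths are hypothetical.
from pyspark.sql import functions as F

raw = (spark.read
       .option("header", "true")
       .option("inferSchema", "true")
       .csv("/mnt/datalake/raw/sales/*.csv"))

summary = (raw.groupBy("region")
              .agg(F.sum("amount").alias("total_amount"),
                   F.count("*").alias("order_count")))

(summary.write
        .mode("overwrite")
        .parquet("/mnt/datalake/curated/sales_by_region"))
```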
Taking a look at these two articles would also help:
https://databricks.com/glossary/data-lake
https://visualbi.com/blogs/microsoft/azure/etl-azure-databricks-vs-data-lake-analytics/
I am new to databases and have some data stored as entities in Google Cloud Datastore. I would like to be able to analyze and plot this data in a web interface, and it seems like Google Data Studio provides an easy-to-use way to do this. However, I'm a bit confused as to how I can actually use the two together; it seems like either Google Cloud Storage or Google BigQuery could act as a middleman between them, but I'm not sure how this would work. Could anyone advise on whether Google Data Studio is the best approach for plotting/analyzing data in Google Cloud Datastore, and if so, offer tips on how I could go about it? There are a large number of tutorials, but none that I've found explain how to load data from Datastore into a usable form for Data Studio.
Thanks!
As Graham Polley says, the question is answered here. The workaround to connect Cloud Datastore to Google Data Studio is to first export Datastore entities to BigQuery, as explained in this guide.
Then see this in order to connect Data Studio to BigQuery tables.
Finally in this blog post, there's a tutorial for building a dashboard with Google Data Studio and BigQuery.
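For reference, the export-then-load flow in those guides boils down to two commands; the bucket, project, kind, and timestamped path below are placeholders (the export step prints the real path):

```bash
# 1. Export one Datastore kind to a Cloud Storage bucket.
gcloud datastore export gs://my-export-bucket \
    --project=my-project --kinds=MyKind

# 2. Load the exported metadata file into a BigQuery table.
#    The timestamped folder name comes from the export step's output.
bq load --source_format=DATASTORE_BACKUP \
    my_dataset.my_kind \
    "gs://my-export-bucket/2020-05-06T00:00:00_12345/all_namespaces/kind_MyKind/all_namespaces_kind_MyKind.export_metadata"
```

Once the table exists in BigQuery, Data Studio can connect to it directly with its native BigQuery connector.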
Right now, our application has only one Web Site instance, along with a SQL Database, deployed in an Azure US datacenter. We are looking at deploying more Web Site instances in other datacenters, such as APAC and Europe. There would still be a local SQL Database for each of those Web Site instances. We would like end users to be able to fail over to another instance if their registered instance is not available; for example, if the US Web Site instance is down, we could fail users over to the Europe instance. For this, we would need to synchronize the local SQL Databases across all datacenters: US, Europe, and APAC.
So we are looking for the best approach to implement database synchronization for Azure SQL Database. Here is what we have found so far:
Azure Data Sync looks like the perfect choice, since it is available right away in the Azure Management Portal and would be up and running with some simple configuration. However, there seem to be a couple of catches. The feature has been in preview for about 2 years now (see this link, with the following quote from a comment):
SQL Data Sync has been in preview for over 2 years and the last update was December 2012. Has this been abandoned? Is this a technology we should encourage our clients to use? There absolutely needs to be an ability to synchronize data between a local SQL DB and Azure but Microsoft seems to have dropped this and I'm leery of putting a client on this only to find that the plug has been pulled. You owe it to your users to give us some information
I also saw the post Azure data sync not syncing all databases on SO; it seems that this is a second-class feature in Azure and MS doesn't pay sufficient attention to it, so I am worried about how good it is.
Microsoft Sync Framework seems to be a more generic sync framework, more suitable for client/server sync than for sync among server databases. Plus, it is not as simple as SQL Data Sync above, which is available just through configuration in Azure.
Any other suggestions on SQL database sync in Azure? It would be really appreciated if you could share your experience here.
Thanks very much in advance for your insight.
Update:
Azure Data Sync is built on the Microsoft Sync Framework: see this link, with the quote:
Microsoft SQL Data Sync is a cloud-based data synchronization service built on the Microsoft Sync Framework technologies.
Since no one is answering this question, I am going to do it myself. Based on the latest information, Azure Data Sync is buggy and cannot be used for production at this point. I guess that's the reason why it has never moved out of preview, even after around 2 years. There is no other good approach for handling Azure SQL Database sync at this point, unless you want to build something yourself.
You can use Redgate Data Compare to sync your Azure SQL DB with your local DB.