Dedicated SQL Pool - Workload management - azure-synapse

I have a DW200c dedicated pool, which allows 8 concurrent queries to run on the pool at any point, and based on this I would like to have three workload groups
(1. for data loads, 2. for data analysis using Power BI, 3. for direct DB users).
For Data loads -> Need 50% allocation;
For Data Analysis using power bi -> Need 30% allocation;
For Direct DB users -> Need 20% allocation
Can someone please advise what values the parameters below should have in each case (when I create a workload group):
-MIN_PERCENTAGE_RESOURCE
-CAP_PERCENTAGE_RESOURCE
-REQUEST_MIN_RESOURCE_GRANT_PERCENT
-REQUEST_MAX_RESOURCE_GRANT_PERCENT
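For illustration only, here is a minimal sketch of how a 50/30/20 split might map onto CREATE WORKLOAD GROUP statements. The group names and per-request grant values are assumptions, not a recommendation: they would need checking against the effective minimum grant for a DW200c pool, and each group's MIN_PERCENTAGE_RESOURCE should line up with its REQUEST_MIN_RESOURCE_GRANT_PERCENT.

```sql
-- Hypothetical group names and per-request grants; verify the effective
-- minimum resource grant for DW200c before relying on values this small.
CREATE WORKLOAD GROUP wgDataLoads
WITH
(
    MIN_PERCENTAGE_RESOURCE            = 50,  -- guaranteed 50% of the pool
    CAP_PERCENTAGE_RESOURCE            = 50,  -- never exceeds 50%
    REQUEST_MIN_RESOURCE_GRANT_PERCENT = 25,  -- ~2 concurrent load requests
    REQUEST_MAX_RESOURCE_GRANT_PERCENT = 25
);

CREATE WORKLOAD GROUP wgPowerBI
WITH
(
    MIN_PERCENTAGE_RESOURCE            = 30,
    CAP_PERCENTAGE_RESOURCE            = 30,
    REQUEST_MIN_RESOURCE_GRANT_PERCENT = 15,  -- ~2 concurrent BI requests
    REQUEST_MAX_RESOURCE_GRANT_PERCENT = 15
);

CREATE WORKLOAD GROUP wgDirectUsers
WITH
(
    MIN_PERCENTAGE_RESOURCE            = 20,
    CAP_PERCENTAGE_RESOURCE            = 20,
    REQUEST_MIN_RESOURCE_GRANT_PERCENT = 10,  -- ~2 concurrent ad-hoc requests
    REQUEST_MAX_RESOURCE_GRANT_PERCENT = 10
);
```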

Related

SSAS Tabular in memory compression

I am testing SSAS Tabular on my existing data warehouse. I read that compression of data in memory would be fantastic, up to 10 times. The warehouse weighs about 600 MB, and the analytical model has about 60 measures (mostly row counts and basic calculations). In SQL Server Management Studio I checked the estimated size of the analytical database: ~1000 MB. Not what I expected (I was hoping for 100 MB at most).
I checked the memory usage of the msmdsrv.exe process using a simple Resource Monitor. To my surprise, after fully processing the database, memory consumption of the msmdsrv process jumped from 200 MB to 1600 MB. I deployed a second instance of the same model connected to the same source and it grew to over 2500 MB. So the estimated size was in fact correct.
Data Warehouse is quite typical - star schema, facts and dimensions, nothing fancy.
Why was the data not compressed in any way? How is it possible that it takes even more memory than the uncompressed source warehouse?
I will be most grateful for any tips on this mystery :)
You should read and watch Marco Russo's materials about VertiPaq Analyzer. You can find out which parts of your model take up most of the memory.
https://www.sqlbi.com/articles/data-model-size-with-vertipaq-analyzer/
https://www.sqlbi.com/tv/checking-model-size-using-vertipaq-analyzer-in-dax-studio/
And maybe this can shed some light:
https://www.microsoftpressstore.com/articles/article.aspx?p=2449192&seqNum=3
The Tabular model is based on a column store, which means that a column with many unique values gets lower compression (e.g. an incremental ID column like TransactionID).
-> Omit high-cardinality columns where possible
-> Try to split columns where possible. If you have DateTime columns, split them into two parts (date and time); you then have more repeated values (see the sketch after this list)
-> The sort order of data within partitions may affect the compression rate [Run-Length Encoding (RLE)]
-> Use a measure (it takes no space) instead of a calculated column (which does)
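As a rough illustration of the date/time split, here is a minimal T-SQL sketch over a hypothetical dbo.FactSales table with a SaleDateTime column; feeding the Tabular model from a view like this lowers the cardinality of each column, which helps VertiPaq's dictionary and RLE compression.

```sql
-- Hypothetical source view: split a high-cardinality datetime into two
-- lower-cardinality columns before the Tabular model imports the data.
CREATE VIEW dbo.vFactSalesForTabular
AS
SELECT
    CAST(SaleDateTime AS date)    AS SaleDate,  -- repeats for every row that day
    CAST(SaleDateTime AS time(0)) AS SaleTime,  -- repeats across days
    CustomerKey,
    ProductKey,
    SalesAmount
FROM dbo.FactSales;
```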

Process more than 500 million rows to cube without performance issues

I have a huge database. For example:
My customer loads 500 million records of sales data every day into a buffer fact table called "Sales". I have to process these sales into my cube in append/update mode, but this is destroying performance even with 186 GB of RAM.
I've already tried creating indexes on the dimension tables; this helps a little, but not much.
My customer says they expect a 15% increase in sales data every 6 months...
Is there a smart way to load this data without waiting too many hours?
I'm using SQL-Server 2016.
Thanks!
You can adopt the columnstore index feature of SQL Server 2016.
Columnstore indexes are the standard for storing and querying large data warehousing fact tables. This index uses column-based data storage and query processing to achieve gains up to 10 times the query performance in your data warehouse over traditional row-oriented storage. You can also achieve gains up to 10 times the data compression over the uncompressed data size. Beginning with SQL Server 2016 (13.x), columnstore indexes enable operational analytics: the ability to run performant real-time analytics on a transactional workload.
You can get more details about this from the Microsoft documentation.
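As a minimal sketch, assuming the buffer fact table is dbo.Sales (the table name comes from the question; the index name is made up):

```sql
-- A clustered columnstore index stores dbo.Sales column-wise, which typically
-- gives large fact tables better compression and much faster scans/aggregations.
CREATE CLUSTERED COLUMNSTORE INDEX cci_Sales
    ON dbo.Sales;
```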
If you're using a SAN to store your database, you might want to look into software like Condusiv V-locity to eliminate a lot of the I/O being sent to and received from the database engine.
I might suggest creating a separate database instance, shipping the transaction log over to a separate server, and applying the transaction logs every 15 minutes so you can build analytics without using live data. The heavy writes to the production DB will then not affect your ability to run the complex queries that lock tables or rows from time to time on your reporting server.

Using PowerBI to visualize large amounts of data on a SQL Data Warehouse

I have a SQL DW which is about 30 GB. I want to use Power BI to visualize this data, but I noticed Power BI Desktop only supports file sizes up to 250 MB. What is the best way to connect Power BI to visualize this data?
You have a couple of choices depending on your use case:
Direct query of the source data
View based aggregations of the source data
Direct Query
For smaller datasets (think in the thousands of rows), you can simply connect PowerBI directly to Azure SQL Data Warehouse and use the table view to pull in the data as necessary.
View Based Aggregations
For larger datasets (think millions, billions, even trillions of rows) you're better served by running the aggregations within SQL Data Warehouse. This can take the shape of a view that creates the aggregations (think sales by hour instead of every individual sale), or you can create a permanent table at data-loading time through a CTAS operation that contains the aggregations your users commonly query against. With the latter CTAS model, the user's query becomes a simple select with a filter (say, aggregated sales newer than today minus 90 days). Once the view or reporting table is created, you can simply connect to Power BI as you normally would.
The PowerBI team has a blog post - Exploring Azure SQL Data Warehouse with PowerBI - that covers this as well.
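To make the CTAS approach concrete, here is a minimal sketch; the table, column, and distribution choices below are assumptions, not a recommendation.

```sql
-- Hypothetical reporting table built at load time: Power BI then queries a
-- small, pre-aggregated table instead of scanning every individual sale.
CREATE TABLE dbo.SalesByHour
WITH
(
    DISTRIBUTION = HASH(StoreId),
    CLUSTERED COLUMNSTORE INDEX
)
AS
SELECT
    StoreId,
    CAST(SaleDateTime AS date)   AS SaleDate,
    DATEPART(hour, SaleDateTime) AS SaleHour,
    SUM(SalesAmount)  AS TotalSales,
    COUNT_BIG(*)      AS SaleCount
FROM dbo.FactSales
WHERE SaleDateTime >= DATEADD(day, -90, GETDATE())   -- keep a rolling 90 days
GROUP BY StoreId, CAST(SaleDateTime AS date), DATEPART(hour, SaleDateTime);
```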
You could also create a query (Power Query / M) that retrieves only the required level of data (i.e. groups, joins, filters, etc.). If done right, the query is translated to T-SQL and only a limited amount of data is downloaded into the Power BI designer.

max memory per query

How can I configure the maximum memory that a query (SELECT query) can use in SQL Server 2008?
I know there is a way to set the minimum value, but what about the maximum? I would like to use this because I have many processes running in parallel. I know about the MAXDOP option, but that is for processors.
Update:
What I am actually trying to do is run a data load continuously. This data load is in ETL form (extract, transform and load). While the data is being loaded I want to run some queries (SELECT). All of them are expensive queries (containing GROUP BY). The most important process for me is the data load. I get an average speed of 10,000 rows/sec, and when I run the queries in parallel it drops to 4,000 rows/sec or even lower. I know a few more details should be provided, but this is a more complex product that I work on and I cannot detail it further. Another thing I can guarantee is that my load speed does not drop due to lock problems, because I monitored for those and removed them.
There isn't any way of setting a maximum memory at a per query level that I can think of.
If you are on Enterprise Edition you can use resource governor to set a maximum amount of memory that a particular workload group can consume which might help.
In SQL 2008 you can use Resource Governor to achieve this. There you can set REQUEST_MAX_MEMORY_GRANT_PERCENT to limit the memory (this is a percentage relative to the pool size specified by the pool's MAX_MEMORY_PERCENT value). This setting is not query specific; it applies to every request in the workload group.
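A minimal Resource Governor sketch along those lines (the pool, group, and login names are made up; run it in master and adjust the percentages to taste):

```sql
-- Cap the memory grant of any single request in the reporting group at 25%
-- of its pool, so expensive GROUP BY queries cannot starve the data load.
CREATE RESOURCE POOL ReportPool
    WITH (MAX_MEMORY_PERCENT = 50);

CREATE WORKLOAD GROUP ReportGroup
    WITH (REQUEST_MAX_MEMORY_GRANT_PERCENT = 25)
    USING ReportPool;
GO

-- A classifier function routes sessions into the group, here by login name.
CREATE FUNCTION dbo.fnRgClassifier() RETURNS sysname
WITH SCHEMABINDING
AS
BEGIN
    RETURN CASE WHEN SUSER_SNAME() = N'report_user'
                THEN N'ReportGroup' ELSE N'default' END;
END;
GO

ALTER RESOURCE GOVERNOR WITH (CLASSIFIER_FUNCTION = dbo.fnRgClassifier);
ALTER RESOURCE GOVERNOR RECONFIGURE;
```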
In addition to Martin's answer
If your queries are all the same or similar, working on the same data, then they will be sharing memory anyway.
Example:
A busy web site with 100 concurrent connections running 6 different parametrised queries between them on broadly the same range of data.
6 execution plans
100 user contexts
one buffer pool with assorted flags and counters to show usage of each data page
If you have 100 different queries or they are not parametrised then fix the code.
Memory per query is something I've never thought or cared about since last millennium.
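To illustrate the parametrisation point, a minimal sketch (the table and parameter names are hypothetical):

```sql
-- One parameterised statement means all sessions reuse a single cached plan
-- instead of compiling a new ad-hoc plan for every literal value.
DECLARE @CustomerId int = 42;

EXEC sp_executesql
    N'SELECT OrderId, OrderDate, TotalDue
      FROM dbo.Orders
      WHERE CustomerId = @CustomerId;',
    N'@CustomerId int',
    @CustomerId = @CustomerId;
```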

Quickly Large Data Pivoting

We are developing a product that can be used for building predictive models and for slicing and dicing data in order to provide BI.
We have two kinds of data access requirements.
For predictive modeling, we need to read data on a daily basis and do it row by row. For this, a normal SQL Server database is sufficient and we have no issues.
For slicing and dicing data of large sizes, say 1 GB with roughly 300 million rows, we want to pivot that data easily with minimal response time.
The current SQL database has response-time issues with this.
We would like our product to run on any normal client machine with 2 GB of RAM and a Core 2 Duo processor.
I would like to know how I should store this data and how I can then create a pivoting experience for each dimension.
Ideally we will have data such as daily sales by salesperson, by region, and by product for a large corporation. We would like to slice and dice it on any dimension and also be able to compute aggregations, unique values, maximums, minimums, averages, and some other statistical functions.
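To make the slicing-and-dicing requirement concrete, here is a minimal T-SQL sketch over a hypothetical dbo.DailySales table; whatever engine is chosen needs to answer queries of roughly this shape quickly.

```sql
-- Aggregate sales across every combination of the three dimensions;
-- CUBE produces the sub-totals a pivoting UI would drill into.
SELECT
    SalesPerson,
    Region,
    Product,
    SUM(SalesAmount)        AS TotalSales,
    MIN(SalesAmount)        AS MinSale,
    MAX(SalesAmount)        AS MaxSale,
    AVG(SalesAmount)        AS AvgSale,
    COUNT(DISTINCT OrderId) AS UniqueOrders
FROM dbo.DailySales
GROUP BY CUBE (SalesPerson, Region, Product);
```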
I would build an in-memory cube on top of that data. To give you an example, icCube has sub-second response times for 3-4 measures over 50M rows on a single Core i5, without any cache or pre-aggregation (i.e., this response time is constant across all dimensions).
Contact us directly for more details about how to integrate it into your product.
You could also use PowerPivot to do this. It is a free add-in for Excel 2010 that allows large data sets to be handled, sliced and diced, etc.
If you want to code around it, you can connect to the PowerPivot database (effectively an SSAS cube) using the SSAS database connector.
Hope that is of some use.