Optimize Reporting in Reporting Services - sql

I have 10 reports and 250 customers. All the reports are run by my customers and each report takes parameters. Depending on the parameters, the same report connects to a different database and gets its results. I know that with different parameters caching is not an option, but I don't want to run these reports against live data during the day. Is there anything I can do (snapshot, subscription) that can run overnight and either send these reports or save a snapshot that could be used for the next 24 hours?
Thanks in advance.

As M Fredrickson suggests, subscriptions might work here depending on the number of different reports to be sent.
Another approach is to consolidate your data query into a single shared dataset. Shared datasets can have caching enabled, and there are several options for refreshing that cache, such as on first access or on a timed schedule. See MSDN for more details.
The challenge with a cached dataset is to figure out how to remove all parameters from the actual data query by moving them elsewhere, usually into the dataset filter in the report or into the filters of the individual data regions, such as your tablixes.
I use this approach to refresh a 10 minute query overnight, and then return the report all day long in less than 30 seconds, with many different possible parameters filtering the dataset.
You can also mix this approach with others by using multiple datasets in your report, some cached and some not.
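To make the "move the parameters out of the query" idea concrete, here is a minimal Python sketch of the same pattern outside SSRS. The function names and sample rows are hypothetical; in a real report the cache would be the cached dataset and the filtering would be done by dataset or tablix filters.

```python
# Hypothetical stand-in for the slow, parameter-free query behind the cache.
query_calls = {"count": 0}

def run_slow_query():
    query_calls["count"] += 1
    return [
        {"customer": "A", "region": "East", "total": 120},
        {"customer": "B", "region": "West", "total": 75},
        {"customer": "A", "region": "West", "total": 30},
    ]

_cache = {}

def load_dataset():
    """Return the full result set, running the slow query only once."""
    if "rows" not in _cache:
        _cache["rows"] = run_slow_query()
    return _cache["rows"]

def report_rows(customer=None, region=None):
    """Apply the report parameters as filters on the cached rows."""
    return [
        row for row in load_dataset()
        if (customer is None or row["customer"] == customer)
        and (region is None or row["region"] == region)
    ]
```

Every call to `report_rows` with different parameters reuses the same cached rows, which is why the query itself must not depend on any parameter.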

I would suggest going the route of subscriptions. While you could do some fancy hack to get multiple snapshots of a single report, it would be cleaner to use subscriptions.
However, since you've got 250 customers and 10 different reports, I doubt that you'll want to configure and manage 2,500 different subscriptions within Report Manager, so I would suggest that you create a data-driven subscription for each of the reports.
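A data-driven subscription gets its recipients and parameter values from a query at run time, one row per delivery. As a rough sketch of what that query could return, here is an example using SQLite from Python as a stand-in for your own database. The `customers` table, its columns, and the sample values are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table holding one row per customer.
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        email       TEXT NOT NULL,
        db_name     TEXT NOT NULL
    );
    INSERT INTO customers VALUES
        (1, 'ops@customer-one.example', 'CustomerOneDb'),
        (2, 'ops@customer-two.example', 'CustomerTwoDb');
""")

# One row per delivery: who receives the report and which parameter values to use.
subscription_query = """
    SELECT email       AS recipient,
           customer_id AS customer_param,
           db_name     AS database_param
    FROM customers
    ORDER BY customer_id
"""
deliveries = conn.execute(subscription_query).fetchall()
```

With a query like this, adding a customer means adding a row to the table rather than creating another subscription.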

Related

Allowing many users to view stale BigQuery data query results concurrently

If I have a BigQuery dataset with data that I would like to make available to 1000 people (where each of these people would only be allowed to view their subset of the data, and is OK to view a 24hr stale version of their data), how can I do this without exceeding the 50 concurrent queries limit?
The BigQuery documentation mentions a limit of 50 concurrent queries that return up-to-the-moment data. I would exceed that limit if all of these people needed up-to-the-moment data, but they don't.
In the documentation there is mention of Batch jobs being permitted and saving of results into destination tables which I'm hoping would somehow allow a reliable solution for my scenario, but am having difficulty finding information on how reliably or frequently those batch jobs can be expected to run, and whether or not someone querying results that exist in those destination tables is in itself counting towards the 50 concurrent users limit.
Any advice appreciated.
Without knowing the specifics of your situation and depending on how much data is in the output, I would suggest putting your own cache in front of BigQuery.
This sounds kind of like a dashboarding/reporting solution, so I assume there is a large amount of data going in and a relatively small amount coming out (per user).
Run one query per day with a batch script to generate your output (grouped by user) and then export it to GCS. You can then break it up into multiple flat files (or just read it into memory on your frontend). Each user hits your frontend, you determine which part of the output to serve up to them and respond.
This should be relatively cheap if you can work off the cached data and it is small enough that handling the BigQuery output isn't too much additional processing.
Google Cloud Functions might be an easy way to handle this, if you don't want the extra work of setting up a new VM to host your frontend.
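A minimal sketch of the caching layer described above, assuming the daily batch output has already been read into a list of rows. The row shape, the `user_id` field, and the function names are hypothetical, and the BigQuery and GCS calls are left out on purpose.

```python
import json
from collections import defaultdict

def split_by_user(rows):
    """Group the daily output by user and serialize each user's slice once."""
    per_user = defaultdict(list)
    for row in rows:
        per_user[row["user_id"]].append(row)
    return {user_id: json.dumps(items) for user_id, items in per_user.items()}

def serve(cache, user_id):
    """Return only the requesting user's slice, or an empty list if there is none."""
    return cache.get(user_id, "[]")

# Hypothetical daily output, standing in for the exported query result.
daily_rows = [
    {"user_id": "u1", "metric": "visits", "value": 10},
    {"user_id": "u2", "metric": "visits", "value": 3},
    {"user_id": "u1", "metric": "orders", "value": 2},
]
cache = split_by_user(daily_rows)
```

Since the frontend only ever reads from `cache`, the number of people viewing their data has no effect on the number of queries BigQuery sees: it is one per day.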

Azure SQL DW - Control Resource Class on Query Level

I am running some ETL on my Azure SQL DW at DW500, so I have 20 concurrency slots available.
Some of my queries would require resource class (RC) xlargerc, some largerc, etc., so the expected load can vary from query to query.
Is there any option to control the assigned RC in the query directly, e.g. using OPTION or any other hints?
The only workaround I could find so far is to create separate users with different resource classes assigned, which is not really feasible.
thanks in advance,
-gerhard
There is currently no option to control this at query level. You have to be logged in as the appropriate user with the appropriate resource class (smallrc, mediumrc, largerc, and xlargerc) assigned to them.
DWU500 is pretty low, with a maximum of 20 concurrent queries and only 20 concurrency slots. Remember that an xlargerc user would take 16 of those slots, so you could only have 1 other mediumrc user or 4 smallrc users running at the same time, i.e. you could not have one largerc and one xlargerc user running at the same time. These queries would queue.
Can you tell us a bit more about your scenario? For example, why switch users during ETL? What ETL tool are you using, e.g. SSIS, Azure Data Factory, etc.?
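The slot arithmetic can be checked with a few lines of Python. The xlargerc value of 16 and the total of 20 come from the numbers above; mediumrc = 4 and smallrc = 1 are implied by them, and largerc = 8 is an assumption based on the DW500 slot table as I remember it, so verify it against the current documentation.

```python
# Concurrency slots per resource class at DW500 (largerc = 8 is an assumption).
SLOTS_PER_CLASS = {"smallrc": 1, "mediumrc": 4, "largerc": 8, "xlargerc": 16}
TOTAL_SLOTS = 20

def can_run_together(resource_classes):
    """True if the queries fit in the available slots; otherwise some would queue."""
    needed = sum(SLOTS_PER_CLASS[rc] for rc in resource_classes)
    return needed <= TOTAL_SLOTS
```

For example, `can_run_together(["xlargerc", "mediumrc"])` uses exactly 20 slots, while `["xlargerc", "largerc"]` would need 24.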
If you think this is a worthwhile option, consider making a feedback request.

SSRS Caching and/or Snapshot

I am fairly new to SSRS reports so I am looking for guidance. I have SSRS reports that have 3 visible parameters: Manager, Director, and VP. The report will display data based on the parameters selected. Initially, the report was taking a very long time to load and my research led me to creating a snapshot of the report.
The initial load of the report is really quick (~5 secs) but the parameters are set to "Select All" in all sections. When the report is later filtered to say, only 1 VP, the load time can vary anywhere between 20 to 90 seconds. Because this report will be used by all aspects of management within the organization, load time is critical.
Is it possible to load the filtered data quicker? Is there anything I can do?
Any help will be much appreciated.
Thank you!
This is a pretty broad efficiency issue. One of the big questions is whether the query takes a long time to run in the database or just in SSRS. Ideally you would start with optimizing the query and indexing, but that's not always enough. The work has to be done somewhere; all you can do is shift it so that it is done before the report is run. Here are a couple of options:
Caching
Turn on caching for the report.
Schedule a subscription to run with each possible value for the parameter. This will cause the report to still load quickly once an individual is specified.
Intermediate Table
Schedule a SQL stored procedure to aggregate and index the data in a new table in your database.
Point the report to run from this data for quick reads.
Each option has its pros and cons because you have to balance where the data preparation work is done. Sometimes you have to try a few options to see what works best for your situation.
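As a rough illustration of the intermediate table option, here is a sketch using SQLite from Python as a stand-in for a scheduled SQL Server stored procedure. The `sales` and `report_summary` tables, their columns, and the sample data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical detail table that the slow report currently reads from.
    CREATE TABLE sales (vp TEXT, director TEXT, manager TEXT, amount REAL);
    INSERT INTO sales VALUES
        ('VP1', 'D1', 'M1', 100.0),
        ('VP1', 'D1', 'M1', 50.0),
        ('VP1', 'D2', 'M2', 25.0),
        ('VP2', 'D3', 'M3', 10.0);
""")

def rebuild_summary(conn):
    """The scheduled step: aggregate once, then index on the report parameters."""
    conn.executescript("""
        DROP TABLE IF EXISTS report_summary;
        CREATE TABLE report_summary AS
            SELECT vp, director, manager,
                   SUM(amount) AS total, COUNT(*) AS row_count
            FROM sales
            GROUP BY vp, director, manager;
        CREATE INDEX ix_report_summary
            ON report_summary (vp, director, manager);
    """)

rebuild_summary(conn)

# The report then reads the small, indexed table, filtered by its parameters.
vp1_rows = conn.execute(
    "SELECT director, manager, total FROM report_summary "
    "WHERE vp = ? ORDER BY director",
    ("VP1",),
).fetchall()
```

The trade-off is that the report is only as fresh as the last rebuild, so the schedule has to match how stale the data is allowed to be.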

Gain a Customized report

Goal:
Display the result based on the picture below in Reporting Services 2008 R2.
Problem:
How should I do it?
You also have to remember that in reality the list contains lots of data, maybe a million rows.
In terms of the report itself, this should be a fairly standard implementation.
You'll need to create one Tablix, with one Group for Customer (one row) and one Group for Artist (two rows, one for the headers and one for the Artist name), then a detail row for the Title.
It looks like you need more formatting options for the Customers Textbox - you could merge the cells in the Customer header row, then insert a Rectangle, which will give you more options to move objects around in the row.
For large reports you've got a few options:
Processing large reports: http://msdn.microsoft.com/en-us/library/ms159638(v=sql.105).aspx
Report Snapshots: http://msdn.microsoft.com/en-us/library/ms156325(v=sql.105).aspx
Report Caching: http://msdn.microsoft.com/en-us/library/ms155927(v=sql.105).aspx
I would recommend scheduling a Snapshot overnight to offload the processing to a quiet time, then making sure the report has sensible pagination set up so that not too much data has to be handled at one time (i.e. not trying to view thousands of rows at once in Report Manager).
Another option would be to set up an overnight Subscription that could save the report to a fileshare or send it as an email.
Basically you're thinking about reducing the amount of processing that needs to be done at peak times and processing the report once for future use to reduce overall resource usage.
I would use a List with text boxes inside for that kind of display.
In addition, you may consider adding a page break after each customer.
Personally, I have experienced lots of performance issues when dealing with thousands of rows, not to mention millions.
My advice to you is to reconsider the main purpose of the report: if the report is for exporting purposes, then don't use SSRS for that.
If the report is for viewing - then perhaps it is possible to narrow down the data using parameters per user's choice.
Last thing, I wish you Good luck :)

Load balancing weighted reports?

I work for a fleet tracking company and this question is specifically about how I plan to do reports. Let me explain our environment. We have 1x Database, 1x Load Distributing process, and 3x Report Processing servers (let's assume these are equal in every way). When a customer requests a report, all the parameters of that report go in the database. I'm currently working on a load distributing app that will take pending reports from the database and delegate them to the 3 report processing servers that build and email the reports. When a server finishes a report (or an error arises), it notifies the load distributing app. Reports can come in all sizes, from 1 day's worth of GPS data for 1 vehicle to 3 months of GPS data for hundreds of vehicles.
I can think of a few ways to do the load balancing but I'm not quite happy with them. I could have each server only do 5 reports at most, but 1 server might get 5 small reports while another gets 5 large reports. I could do a "Round Robin" approach and just hand out the reports sequentially across the servers, but this still doesn't protect against overloading any of the servers.
The best idea I think I have right now is to keep a count of how much GPS data is needed by each report (an easy task to do) and, as I assign reports to each server, keep a running total for each server. When a server finishes a report (and notifies the load balancer), I subtract that report's amount of GPS data from the running total for that server. This way, I could assign the next report to the server with the smallest amount of GPS data to work with. I could also set a max so that a server cannot get overworked (the problem that is causing us to refactor our whole reports process to begin with). If there are more reports when all servers hit their max, it can just queue them up and attempt them later when the servers finish a few of their reports.
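The running-total idea can be sketched roughly as follows. The class and method names are hypothetical, `size` stands for the amount of GPS data a report needs, and letting an idle server accept a report larger than the max is my own assumption so that an oversized report cannot sit in the queue forever.

```python
from collections import deque

class ReportBalancer:
    def __init__(self, servers, max_load):
        self.load = {server: 0 for server in servers}  # running total per server
        self.max_load = max_load
        self.assigned = {}      # report_id -> (server, size)
        self.pending = deque()  # reports waiting for capacity, in arrival order

    def submit(self, report_id, size):
        """Queue a report, then start whatever fits. Returns (report_id, server) pairs."""
        self.pending.append((report_id, size))
        return self._dispatch()

    def finished(self, report_id):
        """Called when a server reports completion (or an error)."""
        server, size = self.assigned.pop(report_id)
        self.load[server] -= size
        return self._dispatch()

    def _dispatch(self):
        started = []
        while self.pending:
            report_id, size = self.pending[0]
            server = min(self.load, key=self.load.get)  # least-loaded server
            over_max = self.load[server] + size > self.max_load
            # An idle server always accepts, even if the report exceeds the max.
            if over_max and self.load[server] > 0:
                break
            self.pending.popleft()
            self.load[server] += size
            self.assigned[report_id] = (server, size)
            started.append((report_id, server))
        return started
```

The queue here is strictly first in, first out, so one large report at the head can hold back smaller ones behind it. Whether that is acceptable depends on how much you care about small reports finishing quickly.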
I'm not convinced it's the best approach for finishing reports as quickly as possible. These are just the best I have come up with so far.
How can I optimize my approach to load balancing reports of different sizes across multiple servers?
Assuming that you have only one major table which you select data from, then I would configure one server to do all the big reports first and leave the other two to do smallest to largest. Otherwise big reports might never get done.
For the smaller reports, you want to try, in the absence of anything better, to have them try and run 'similar' reports, meaning those that cluster around similar values in the index mainly used. For example if a server has just completed a report for June 2011, then the next best report to run is same period, not jumping to November 2012. This is dependent on the actual table though, but I am presuming you have lots of date ordered data comprising the bulk of the selection. All you are really trying to do is group reports that are likely to reuse cached indexes/etc as this should give best throughput.
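One simple way to pick a 'similar' report, assuming the data is mostly date-ordered as described, is to take the pending report whose start date is closest to the one the server just finished. The report shape and function name below are hypothetical.

```python
from datetime import date

def next_report(pending, last_start):
    """Pick the pending report whose start date is closest to the one just finished."""
    return min(pending, key=lambda report: abs((report["start"] - last_start).days))

# Hypothetical pending reports, each identified by the start of its period.
pending = [
    {"id": "r1", "start": date(2012, 11, 1)},
    {"id": "r2", "start": date(2011, 6, 1)},
    {"id": "r3", "start": date(2011, 7, 1)},
]
```

After a server completes a June 2011 report, `next_report(pending, date(2011, 6, 1))` returns `r2` rather than jumping to November 2012.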
I have a similar scheduling problem, and any queries that are directed to major tables go to one server (slow queue) and anything else goes to another (fast queue), with some exceptions for special cases.