In order to test part of my SSIS process, I want to simulate part of the SSAS process failing.
The Package runs several processing steps in OLAP and we want to be sure that it will run even in the case of a partial failure.
How can I simulate this?
Since I'm assuming you aren't doing this testing in your production environment, you could temporarily drop one of the tables/views that your cube depends on.
Depending on how you trap errors you could remove some dimension keys from the fact table.
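For example, a quick way to force a dimension-key failure (table and column names below are placeholders, adjust to your schema) is to point a few fact rows at a surrogate key the dimension doesn't contain:

    -- Test environment only; dbo.FactSales / ProductKey / dbo.DimProduct are made-up names.
    -- Pointing fact rows at a non-existent key should make the next partition process fail
    -- with an "attribute key cannot be found" error, unless the cube's error configuration
    -- is set to ignore key errors.
    UPDATE TOP (10) f
    SET    f.ProductKey = -999          -- assumed to be absent from dbo.DimProduct
    FROM   dbo.FactSales AS f
    WHERE  f.ProductKey <> -999;

Re-run your ETL (or restore the rows) afterwards so the broken test data doesn't linger.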
I have an application that is driven by a Microsoft Analysis Services Multidimensional Cube.
Users access the application and load data periodically to the underlying SQL database.
The cube is processed fully overnight so users may see updated data the next day.
Users may also kick off a cube partition process via the interface if they need data to be available in the application sooner.
The cube processing is executed in the background using SQL Agent jobs with XMLA scripts.
The partition processing happens in two steps:
Dimension processing
Partition processing
The cube process is one step:
Process Full
Recently, I have run into an issue where users may load a significant amount of data late in the evening and kick off a partition process that then runs at the same time the cube is being processed. Often this isn't an issue, other than longer-running processing, but I have run into failures periodically.
The Ask
Using XMLA, is it possible to modify the cube partition script so that it only runs if no part of the cube is already being processed?
I am aware that I could probably accomplish this with SSIS, but it seems overkill if there is a possibility to use straight XMLA.
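For context, if straight XMLA can't express the condition (as far as I know it has no conditional logic of its own), the fallback I can picture is a T-SQL guard step ahead of the partition-processing step in its Agent job, something like the sketch below (the job name is made up):

    -- Guard step sketch; 'OLAP - Nightly Full Process' is an assumed job name.
    -- A job counts as running when its latest activity row has a start time but
    -- no stop time for the current Agent session.
    IF EXISTS (
        SELECT 1
        FROM   msdb.dbo.sysjobactivity AS ja
        JOIN   msdb.dbo.sysjobs        AS j ON j.job_id = ja.job_id
        WHERE  j.name = N'OLAP - Nightly Full Process'
        AND    ja.session_id = (SELECT MAX(session_id) FROM msdb.dbo.syssessions)
        AND    ja.start_execution_date IS NOT NULL
        AND    ja.stop_execution_date  IS NULL
    )
        RAISERROR (N'Cube is already being processed; skipping partition processing.', 16, 1);

The partition job would be configured so a failure of this step ends the job (or exits quietly) instead of continuing to the XMLA step.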
I'm in a situation where I have a server running SQL Server 2012 with roughly two hundred scheduled jobs (all are SSIS package executions). I'm facing a directive from management where I need to run some custom software to create a bug report ticket whenever a job fails. Right now I'm relying on half the jobs notifying an operator on failure, while the other half do a "go to step X - send failure email" for each step on failure, where "step X" is some SQL that queries the DB and sends out an email saying which job failed at which step.
So what I'm looking for is some universal solution where I can have every job do the same thing when it fails (in this case, run some program that creates a bug tracking ticket). I am trying to avoid the situation where I manually go into every single job and add a new step at the end, with all previous steps changing to "go to step Y on failure" where step Y is this thing that creates the bug report.
My first thought was to create a new job that queries the execution history tables and looks for unhandled failures and then does the bug report creation itself. However, I already made the mistake of presenting this idea to the manager and was told it's not a viable solution because it's "reactive and not proactive" and also not creating tickets in real-time. I should know better than to brainstorm with non-programming management but it's too late, so that option is off the table and I haven't been able to uncover any other methods.
Any suggestions?
I'm proposing this as an answer, though it's not a technical solution. Present the possible solutions and let the manager decide:
Update all the Agent Jobs - Every job will need to be changed and then tested, both of which take a lot of time. I'd guess 2-8 weeks depending on how it's done.
Create an error handler job that monitors the logs and creates tickets based on those errors. This has two drawbacks - it is not "real-time" (as desired by the manager) and something will need to be put into place to ensure errors are only reported once. It has the upside of being one change to manage, and it can be made near real-time if it runs every minute (see the sketch after this list).
A third option, which would be more of a preliminary step, is to create an error report based on the logs. This will help to understand the quantity and types of failures, which may help shape the ultimate solution - do we want all these tickets, can they be broken up into different categories, do we want tickets for errors that are self-healing (i.e. connection errors which have built-in retries)?
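If option 2 gets picked, a rough sketch of the poller could look like this (dbo.FailureTickets and its columns are assumed; it doubles as the de-duplication record), scheduled to run every minute:

    -- Record any job-step failures not yet ticketed; sysjobhistory.instance_id is unique
    -- per history row, so it is a convenient de-duplication key.
    INSERT INTO dbo.FailureTickets (history_instance_id, job_name, step_id, step_name, message)
    SELECT  h.instance_id, j.name, h.step_id, h.step_name, h.message
    FROM    msdb.dbo.sysjobhistory AS h
    JOIN    msdb.dbo.sysjobs       AS j ON j.job_id = h.job_id
    WHERE   h.run_status = 0          -- 0 = failed
    AND     h.step_id    > 0          -- step rows only, not the job outcome row
    AND     NOT EXISTS (SELECT 1 FROM dbo.FailureTickets AS t
                        WHERE t.history_instance_id = h.instance_id);

A later step in the same job would then call the ticket-creation program for each newly inserted row.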
I've been looking around for quite a while to see if someone could provide me with any direction and/or tests to fix this issue, without success so far.
I'm working on a client's multidimensional cube (they have several in the same warehouse), and have created my own development copy of that exact cube so I don't break anything in production while developing.
The issue is that whenever I edit my cube and then deploy, it removes the data from the cube, and in some programs the cube disappears altogether. The cube itself is still visible in SSMS but contains no data.
I then have to do a full process of the entire database to get the data back, which is rather annoying given it takes around 30-40 minutes during which I cannot work on it, even though it's a minor change I've made (such as changing the Order property of a dimension from Name to Key or creating a measure group).
Some settings/extra info:
When I deploy, I have specified the cube's processing option as Do Not Process, due to some prior processing issues when processing from BIDS.
I have a delta process that runs continuously and doesn't fail, to keep data up to date. It moves no data into the failed cube, however; the other cubes present work just fine.
In script view, the first MDX statement under Calculations is a CALCULATE statement, since some sources suggested it could be an issue if that were missing.
It is deployed from VS 2008 (the client's version).
Deploying to Localhost
The view upon which some dimensions are built contains UNION statements, but only a few records.
Scenarios where it fails:
Refresh data source view
Create new dimension
Change dimension properties
Create measure groups
Updating dimensions
Probably more that I either haven't tested or can't remember.
Does anyone have any idea what the issue is and how to fix it? I'd really appreciate it if someone could point me in the right direction; I haven't found a solution yet.
Well, this is expected behaviour. SSAS creates aggregations during processing; if the structure of the cube/dimension changes, the existing aggregations become invalid and the entire cube goes into the "Unprocessed" state. As you have found out yourself, you then need to do a full process to be able to browse the cube.
Here's a blog post with the list of actions and their effect on the state of the cube: http://bimic.blogspot.com/2011/08/ssas-which-change-makes-cubedimension.html
I suggest you create a small data set for development purposes and test the cube on that data before moving to production. You can also limit the data loaded into the cube by switching to a query (instead of the table) in the partition designer; in the query you can then use a WHERE condition to limit the records loaded into the cube and make processing faster.
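For example, the partition's query binding could look something like this (the table and column names are placeholders), so a full process during development only touches a thin slice of data:

    -- Partition source query sketch: restrict to a narrow date range (or a TOP sample)
    -- so that development processing stays fast.
    SELECT *
    FROM   dbo.YourFactTable
    WHERE  DateKey >= 20240101;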
I am having a strange problem when building a cube in SSAS. I have a fact table, let's say FactActivity. Then I have a dimension DimActivity, which has a 1-to-1 relationship with this fact, and all the foreign keys are bound to the dimension. So date dimensions, product dimensions and so on are all bound to DimActivity.
When I build the whole cube, it seems it builds the fact before the dimension, and therefore it gives me errors. If, however, I manually build the dimension before the fact, it works.
Is there anywhere in the SSAS that I can configure the build order, other than doing this from SSIS with the use of the Analysis Services Processing Task?
Many thanks!
Processing a cube will not process the dimensions it relates to because they are constructed as separate entities in SSAS. In practice, this means that a dimension can exist, be processed and accessed without a relationship to a cube.
There is no such thing as a general build order to configure. It is up to you to decide how AS objects should be processed. There are many tools that facilitate this, and they will all do the same thing: construct XMLA scripts to run on the AS server.
SSIS: Analysis Services Processing task
Configure a SQL agent job.
Perform a manual process using SSMS.
Program your processing activities using AMO
...
The important thing is that you process your dimensions before you process your cube. A simple solution is to process the entire SSAS database (containing your cubes and dimensions); that way, SSAS will automatically process the dimensions before processing the cubes.
Documentation on processing Analysis Services objects
When processing a dimension or the whole cube, click the 'Change Settings...' button before you click 'Run'. There you can change the way it should process. The link below describes the effect of the available options.
http://technet.microsoft.com/en-us/library/ms174774.aspx
HTH
For others who are encountering similar problems....
The reason I was getting occasional cube processing errors is that the refreshing was happening at the same time, due to scheduled hourly imports.
I am now using a log table to see which SSIS package is running. When importing activity, I insert a record into this table with a "Running" status.
Before processing the cube, I have a semaphore check for records in this table that are data imports with a "Running" status. I only allow the refresh of the cube to happen if no imports are currently running. When the cube is processing, the imports also have a semaphore and will not start importing unless no cube processing is currently "Running".
After implementing this logic, I've never gotten any errors when processing the cubes.
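A stripped-down sketch of that check, with made-up names for the log table, process types and statuses, as the first step of the cube-processing job:

    -- Sketch only: dbo.ProcessLog and its values are illustrative.
    -- Bail out if any import is still running.
    IF EXISTS (SELECT 1
               FROM   dbo.ProcessLog
               WHERE  ProcessType = 'Import'
               AND    Status      = 'Running')
    BEGIN
        RAISERROR (N'Data import in progress - cube processing skipped.', 16, 1);
        RETURN;
    END

    -- Otherwise mark the cube processing as running so imports will wait their turn.
    INSERT INTO dbo.ProcessLog (ProcessType, Status, StartTime)
    VALUES ('CubeProcess', 'Running', GETDATE());

The import packages run the mirror-image check against the 'CubeProcess' rows before they start, and both sides update their row when they finish.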
I have a number of stored procs which I would like to all run simultaneously on the server, ideally without reliance on connections to an external client.
What options are there to launch all these and have them run simultaneously (I don't even need to wait until all the processes are done to do additional work)?
I have thought of:
Launching multiple connections from a client, having each start the appropriate SP.
Setting up jobs for each SP and starting the jobs from a SQL Server connection or SP.
Using xp_cmdshell to start additional runs equivalent to osql or whatever.
SSIS - I need to see if the package can be dynamically written to handle more SPs, because I'm not sure how much access my clients are going to get to production
In the job and cmdshell cases, I'm probably going to run into permissions level problems from the DBA...
SSIS could be a good option - if I can table-drive the SP list.
This is a datawarehouse situation, and the work is largely independent and NOLOCK is universally used on the stars. The system is an 8-way 32GB machine, so I'm going to load it down and scale it back if I see problems.
I basically have three layers. Layer 1 has a small number of processes and depends on basically all the facts/dimensions already being loaded (effectively, the stars are a Layer 0 - and yes, unfortunately they will all need to be loaded), Layer 2 has a number of processes which depend on some or all of Layer 1, and Layer 3 has a number of processes which depend on some or all of Layer 2. I have the dependencies in a table already, and would initially only launch all the procs in a particular layer at the same time, since they are orthogonal within a layer.
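For reference, the job-based option I'm picturing would look roughly like this, driven from the dependency table (dbo.ProcDependencies with a layer column and the 'WH - ' job-naming convention are just placeholders):

    -- Kick off everything in one layer; sp_start_job returns as soon as the job is queued,
    -- so the procs run concurrently on the server with no external client involved.
    DECLARE @job SYSNAME;
    DECLARE layer_jobs CURSOR LOCAL FAST_FORWARD FOR
        SELECT N'WH - ' + proc_name       -- assumed job-naming convention
        FROM   dbo.ProcDependencies
        WHERE  layer = 2;
    OPEN layer_jobs;
    FETCH NEXT FROM layer_jobs INTO @job;
    WHILE @@FETCH_STATUS = 0
    BEGIN
        EXEC msdb.dbo.sp_start_job @job_name = @job;   -- asynchronous: does not wait
        FETCH NEXT FROM layer_jobs INTO @job;
    END
    CLOSE layer_jobs;
    DEALLOCATE layer_jobs;

I'd then poll sysjobactivity (or have each job log its completion) before releasing the next layer.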
Is SSIS an option for you? You can create a simple package with parallel Execute SQL tasks to execute the stored procs simultaneously. However, depending on what your stored procs do, you may or may not get benefit from starting this in parallel (e.g. if they all access the same table records, one may have to wait for locks to be released etc.)
At one point I did some architectural work on a product known as Acumen Advantage that has a warehouse manager that does this.
The basic strategy for this is to have a control DB with a list of the sprocs and their dependencies. Based on the dependencies you can do a Topological Sort to give them an order to run in. If you do this, you need to manage the dependencies - all of the predecessors of a stored procedure must complete before it executes. Just starting the sprocs in order on multiple threads will not accomplish this by itself.
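As an illustration (the two dependency tables below are invented for the sketch, and the graph is assumed to be acyclic), the topological levels can be derived with a recursive CTE and the result used to release one "wave" of sprocs at a time:

    -- dbo.Procs(proc_name): one row per sproc.
    -- dbo.ProcDependsOn(proc_name, depends_on): one row per dependency edge.
    WITH waves AS
    (
        -- Procs with no predecessors start in wave 0.
        SELECT p.proc_name, 0 AS wave
        FROM   dbo.Procs AS p
        WHERE  NOT EXISTS (SELECT 1 FROM dbo.ProcDependsOn AS d
                           WHERE d.proc_name = p.proc_name)
        UNION ALL
        -- A dependent proc sits one wave after each of its predecessors.
        SELECT d.proc_name, w.wave + 1
        FROM   dbo.ProcDependsOn AS d
        JOIN   waves             AS w ON w.proc_name = d.depends_on
    )
    SELECT   proc_name, MAX(wave) AS run_wave   -- deepest predecessor chain decides the wave
    FROM     waves
    GROUP BY proc_name
    ORDER BY run_wave, proc_name;

Everything in wave 0 can start immediately; wave N is released only once every wave below it has completed.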
Implementing this meant knocking much of the SSIS functionality on the head and implementing another scheduler. This is OK for a product but probably overkill for a bespoke system. A simpler solution is thus:
You can manage the dependencies at a more coarse-grained level by organising the ETL vertically by dimension (sometimes known as Subject Oriented ETL) where a single SSIS package and set of sprocs takes the data from extraction through to producing dimensions or fact tables. Typically the dimensions will mostly be siloed, so they will have minimal interdependency. Where there is interdependency, make one dimension (or fact table) load process dependent on whatever it needs upstream.
Each loader becomes relatively modular and you still get a useful degree of parallelism by kicking off the load processes in parallel and letting the SSIS scheduler work it out. The dependencies will contain some redundancy. For example an ODS table may not be dependent on a dimension load being completed but the upstream package itself takes the components right through to the dimensional schema before it completes. However this is not likely to be an issue in practice for the following reasons:
The load process probably has plenty of other tasks that can execute in the meantime
The most resource-hungry tasks will almost certainly be the fact table loads, which will mostly not be dependent on each other. Where there is a dependency (e.g. a rollup table based on the contents of another table) this cannot be avoided anyway.
You can construct the SSIS packages so they pick up all of their configuration from an XML file whose location can be supplied externally in an environment variable. This sort of thing can be fairly easily implemented with scheduling systems like Control-M.
This means that a modified SSIS package can be deployed with relatively little manual intervention. The production staff can be handed the packages to deploy along with the stored procedures and can maintain the config files on a per-environment basis without having to manually fiddle with configuration in the SSIS packages.
You might want to look at Service Broker and its activation stored procedures... it might be an option...
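Something like the following, very roughly (all object names invented, Service Broker assumed to be enabled on the database, poison-message and error handling left out):

    -- Plumbing: message type, contract, queue and service.
    CREATE MESSAGE TYPE RunProcMsg VALIDATION = WELL_FORMED_XML;
    CREATE CONTRACT RunProcContract (RunProcMsg SENT BY INITIATOR);
    CREATE QUEUE RunProcQueue;
    CREATE SERVICE RunProcService ON QUEUE RunProcQueue (RunProcContract);
    GO
    -- Activation proc: pull one message, run the proc it names, close the conversation.
    CREATE PROCEDURE dbo.usp_RunQueuedProc
    AS
    BEGIN
        DECLARE @h UNIQUEIDENTIFIER, @type SYSNAME, @body VARBINARY(MAX),
                @xml XML, @proc NVARCHAR(400);

        RECEIVE TOP (1) @h    = conversation_handle,
                        @type = message_type_name,
                        @body = message_body
        FROM RunProcQueue;

        IF @type = N'RunProcMsg'
        BEGIN
            SET @xml  = CAST(@body AS XML);
            SET @proc = @xml.value('(/proc)[1]', 'nvarchar(400)');
            EXEC (@proc);              -- run the stored procedure named in the message
            END CONVERSATION @h;
        END
        ELSE IF @type IS NOT NULL      -- EndDialog / error messages: just close out
            END CONVERSATION @h;
    END
    GO
    ALTER QUEUE RunProcQueue
    WITH ACTIVATION (STATUS = ON,
                     PROCEDURE_NAME    = dbo.usp_RunQueuedProc,
                     MAX_QUEUE_READERS = 8,   -- degree of parallelism
                     EXECUTE AS OWNER);
    GO
    -- Queue work: one message per stored proc to run.
    DECLARE @h UNIQUEIDENTIFIER;
    BEGIN DIALOG CONVERSATION @h
        FROM SERVICE RunProcService TO SERVICE 'RunProcService'
        ON CONTRACT RunProcContract WITH ENCRYPTION = OFF;
    SEND ON CONVERSATION @h MESSAGE TYPE RunProcMsg (N'<proc>dbo.Load_Layer2_Sales</proc>');

Each SEND queues one proc name, and the broker activates up to MAX_QUEUE_READERS copies of the activation proc, which gives server-side parallelism without an external client.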
In the end, I created a C# management console program which launches the processes asynchronously as they become eligible to run and keeps track of the connections.