OLAP Cube Parallelism not working as expected - sql

We are having a OLAP SSAS cube setup and the Cube Processing is triggered from SQL Server Agent Job (on SQL Server 2014). The SQL Server Agent Job step is as below:
<Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
<Parallel MaxParallel="3">
<Process xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<Object>
<DatabaseID>{DatabaseID}</DatabaseID>
<CubeID>{CubeName}</CubeID>
</Object>
<Type>ProcessFull</Type>
<WriteBackTableCreation>UseExisting</WriteBackTableCreation>
</Process>
</Parallel>
</Batch>
This step is supposed to trigger three parallel measure pull call from the SQL Server databases. But what we could see is it is triggering 10 parallel select queries to the Database. There are few other cubes which has similar parallel settings and fire only 3 queries at a time. The issue of 10 calls are happening only in case of this cube and is happening only in one specific environment. Is there any settings at SSAS cube level which overrides the parallel setting set by the SQL Agent Job?

I think the setting you want is: Maximum Number of Connections in the Data Souce.
https://learn.microsoft.com/en-us/analysis-services/multidimensional-models/set-data-source-properties-ssas-multidimensional?view=asallproducts-allversions
Specifies the maximum number of connections allowed by Analysis Services to connect to the data source. If more connections are needed, Analysis Services will wait until a connection becomes available. The default is 10. Constraining the number of connections ensures that the external data source is not overloaded with Analysis Services requests.
The maxParallel attribute for Parallel element controls the number of processing threads.
https://learn.microsoft.com/en-us/analysis-services/xmla/xml-elements-properties/parallel-element-xmla?view=asallproducts-allversions
Indicates the maximum number of threads on which to run commands in parallel. If not specified or set to 0, the instance of Analysis Services determines an optimal number of threads based on the number of processors available on the computer.

Related

What type of SQL queries must have data on 1 server (can not be distributed to different servers)?

I am evaluating e.g. RediSQL which must hold all data on 1 server in order to execute SQL query. It basically can read data only from that one same server (can not aggregate data from a cluster).
Thinking now on a high level - which use cases / types of queries this will support? (e.g. ST.Dev.... or some heavy mathematical functions) which Must hold all data on 1 server?
So Basically You do not want to share your Sql server to your network? There is no query like this type of validation but you can check it out from this link
How to disable Network Sharing on sql server
https://learn.microsoft.com/en-us/sql/database-engine/configure-windows/enable-or-disable-a-server-network-protocol?view=sql-server-2017

How can I collect SQL commands of an application on server level?

What is the technics if I want to catch/monitor/log/save the native SQL commands of the application developed by us? We have Oracle database.
I have already tried the SQL Developer/Tools/Monitor session function, but it does not include the SQL statements of our apps. RealTime SQL Monitor function contains only a part of the required commands and a lot of useless entries….
Practically what I want:
- „Switch On” the trace function (e.g. in SQL developer or SQL*Plus)
- Launch the application and try some functionalities with real data (e.g. the slow queries)
- As soon as I think I have enough measurement: „Switch Off” the trace function and….
- Start analyzing/tuning the SQL commands (e.g. with SQL developer/Explain Plan, etc.)
1) You can always use awr reports for specified time to know which queries were running in the databases in given time period.
Run awr report using ;
#$ORACLE_HOME/rdbms/admin/awrrpt.sql
AWR captured top sql, but you can increase number of sql captured in awr.
2) Database level trace can capture all sqls run, with execution plans & other stats. You can manually start & stop trace. Once trace is stopped, you can use tkprof to generate readable files. trace always cause bit of performance over head & space overhead , but not massive in my experience.
Abhi

How to optimize the downloading of data to the server in SSIS

Good day.
Need to get records from an Oracle database to a database in SQL Server. The data source type (ODBC) the performed using a SQL command, where I am taking all possible indices according to my requirement. The process runs fine, the problem is that it takes a long time and I need to be something quick. The process can not be performed with lookup, requires merge or merge join, simply load a table from Oracle to SQL under certain conditions.
Thank you for your help
Check what is your limiting factor. Generally there are 3 points to check:
Remote server is slow.
Source DB can run low on memory, read speed or free CPU. Substitute you query with a straight SELECT statement with no WHERE clause or JOINs and see if your SSIS package runs faster.
Target DB.
You may have indexes enabled, high write latency on HDD or not enough CPU.
Run an INSERT for your target table and see how longer it takes.
Problem may be in the middle: transfer between 2 servers. Network usually is main bottleneck. Is SSIS hosted on the same server as SQL server? then you have 2 network connections + possible hardware bottleneck on dedicated SSIS machine.
Depending on the bottleneck there are different solutions.
If you have network capacity and bottleneck is 1 CPU per query on Oracle, then you can partition your data horisontally (IDs 1 to 100, 101 to 200 etc); establish multiple connections to Oracle and load data in several streams. Number of streams is 1 less then number of CPUs on Oracle, SSIS or SQL Server (which ever is smaller).

Retrieving billions of rows from remote server?

I am trying to retrieve around 200 billion rows from a remote SQL Server. To optimize this, I have limited my query to use only an indexed column as a filter and am selecting only a subset of columns to make the query look like this:
SELECT ColA, ColB, ColC FROM <Database> WHERE RecordDate BETWEEN '' AND ''
But it looks like unless I limit my query to a time window of a few hours, the query fails in all cases with the following error:
OLE DB provider "SQLNCLI10" for linked server "<>" returned message "Query timeout expired".
Msg 7399, Level 16, State 1, Server M<, Line 1
The OLE DB provider "SQLNCLI10" for linked server "<>" reported an error. Execution terminated by the provider because a resource limit was reached.
Msg 7421, Level 16, State 2, Server <>, Line 1
Cannot fetch the rowset from OLE DB provider "SQLNCLI10" for linked server "<>".
The timeout is probably an issue because of the time it takes to execute the query plan. As I do not have control over the server, I was wondering if there is a good way of retrieving this data beyond the simple SELECT I am using. Are there any SQL Server specific tricks that I can use? Perhaps tell the remote server to paginate the data instead of issuing multiple queries or something else? Any suggestions on how I could improve this?
This is more of the kind of job SSIS is suited for. Even a simple flow like ReadFromOleDbSource->WriteToOleDbSource would handle this, creating the necessary batching for you.
Why read 200 Billion rows all at once?
You should page them, reading say a few thousand rows at a time.
Even if you do genuinely need to read all 200 Billion rows you should still consider using paging to break up the read into shorter queries - that way if a failure happens you just continue reading where you left off.
See efficient way to implement paging for at least one method of implementing paging using ROW_NUMBER
If you are doing data analysis then I suspect you are either using the wrong storage (SQL Server isn't really designed for processing of large data sets), or you need to alter your queries so that the analysis is done on the Server using SQL.
Update: I think the last paragraph was somewhat misinterpreted.
Storage in SQL Server is primarily designed for online transaction processing (OLTP) - efficient querying of massive datasets in massively concurrent environments (for example reading / updating a single customer record in a database of billions, at the same time that thousands of other users are doing the same for other records). Typically the goal is to minimise the amout of data read, reducing the amount of IO needed and also reducing contention.
The analysis you are talking about is almost the exact opposite of this - a single client actively trying to read pretty much all records in order to perform some statistical analysis.
Yes SQL Server will manage this, but you have to bear in mind that it is optimised for a completely different scenario. For example data is read from disk a page (8 KB) at a time, despite the fact that your statistical processing is probably only based on 2 or 3 columns. Depending on row density and column width you may only be using a tiny fraction of the data stored on an 8 KB page - most of the data that SQL Server had to read and allocate memory for wasn't even used. (Remember that SQL Server also had to lock that page to prevent other users from messing with the data while it was being read).
If you are serious about processing / analysis of massive datasets then there are storage formats that are optimised for exactly this sort of thing - SQL Server also has an add on service called Microsoft Analysis Services that adds additional online analytical processing (OLAP) and data mining capabilities, using storage modes more suited to this sort of processing.
Personally I would use a data extraction tool such as BCP to get the data to a local file before trying to manipulate it if I was trying to pull that much data at once.
http://msdn.microsoft.com/en-us/library/ms162802.aspx
This isn't A SQL Server specific answer, but even when the rDBMS supports server side cursors, it's considered poor form to use them. Doing so means that you are consuming resources on the server even though the server is still waiting for you to request more data.
Instead you should reformulate your query usage so that the server can transmit the entire result set as soon as it can, and then completely forget about you and your query to make way for the next one. When the result set is too large for you process all in one go, you should keep track of the last row returned by the current batch so that you can fetch another batch starting at that position.
Odds are the remote server has the "Remote Query Timeout" set. How long does it take for the query to fail?
Just run into the same problem, I also had the message at 10:01 after running the query.
Check this link. There's a remote query timeout setting under Connections that's setup to 600secs by default and you need to change it to zero (unlimited) or other value you think is right.
Try to change remote server connection timeout property.
For that go to SSMS, connect to the server, right click on server's name in object explorer, further select Properties -> Connections and change value in the Remote query timeout (in seconds, 0 = no timeout) text box.

Suspended status in SQL Activity Monitor

What would cause a query being done in Management Studio to get suspended?
I perform a simple select top 60000 from a table (which has 11 million rows) and the results come back within a sec or two.
I change the query to top 70000 and the results take up to 40 min.
From doing a bit of searching on another but related issue I came across someone using DBCC FREEPROCCACHE to fix it.
I run DBCC FREEPROCCACHE and then redo the query for 70000 and it seemmed to work.
However, the issue still occurs with a different query.
I increase to say 90000 or if I try to open the table using [Right->Open Table], it pulls about 8000 records and stops.
Checking the activity log for when I do the Open Table shows the session has been suspended with a wait type of "Async_Network_IO". For the session running the select of 90000 the status is "Sleeping", this is the same status for the above select 70000 query which did return but in 45min. It is strange to me that the status shows "Sleeping" and it does not appear to be changing to "Runable" (I have the activiy monitor refreshing ever 30sec).
Additional notes:
I am not running both the Open Table and select 90000 at the same time. All queries are done one at a time.
I am running 32bit SQL Server 2005 SP2 CU9. I tried upgrading to SP3 but ran into install failurs. The issues was occuring prior to me trying this upgrade.
Server setup is an Active/Active cluster the issue occurs on either node, and the other instance does not have this issue.
I have ~20 other database on this same server instance but only this one DB is seeing the issue.
This database gets fairly large. It is currently at 76756.19MB. Data file is 11,513MB.
I am logged in locally on the Server box using Remote Desktop.
The wait type "Async_Network_IO" means that its waiting for the client to retrieve the result set as SQL Server's network buffer is full. Why your client isn't picking up the data in a timely manner I can't say.
The other case it can happen is with linked servers when SQL Server is querying a remote table, in this case SQL Server is waiting for the remote server to respond.
Something worth looking at is virus scanners, if they are monitoring network connections sometimes they can get lagged, its often apparent by them hogging all the CPU.
Suspended means it is waiting on a resource and will resume when it gets its resource. Judging from the sizes you are pulling back, it seems you are in an OLAP type of query.
Try the following things:
Use NOLOCK or set the TRANSACTION ISOLATION LEVEL at the top of the query
Check your execution plan and tune the query to be more efficient