We currently have an issue that occurs roughly once a day on a SQL Server 2005 database server, although the time it happens is not consistent.
Basically, the database grinds to a halt and starts refusing connections with the following error message. This includes logging in via SSMS:
A connection was successfully established with the server, but then an error occurred during the login process. (provider: TCP Provider, error: 0 - The specified network name is no longer available.)
Our CPU usage for SQL is usually around 15%, but when the DB is in its broken state it's around 70%, so it's clearly doing something, even if no one can connect. Even if I disable the web app that uses the database, the CPU still doesn't go down.
I am unable to restart the SQLSERVER process as it is unresponsive, so I end up having to kill the process manually, which then puts the DB into Suspect/Recovery mode (which I can fix, but it's a pain).
Below are some PerfMon stats I gathered while the DB was in its broken state, which might help. I have a bunch more if people want to request them:
Active Transactions: 2 (never changes)
Logical Connections: 34 (NC)
Processes Blocked: 16 (NC)
User Connections: 30 (NC)
Batch Requests: 0 (NC)
Active Jobs: 2 (NC)
Log Truncations: 596 (NC)
Log Shrinks: 24 (NC)
Longest Running Transaction Time: 99 (NC)
I guess the key is finding out what the DB is using its CPU on, but as I can't even log into SSMS, this isn't possible with the standard methods.
Disturbingly, I can't even use the dedicated admin connection to get into SSMS. I get the same timeout as with all other requests.
Any advice, recommendations, or even sympathy, is much appreciated!
You will need to use Profiler to determine which queries and processes may be causing this.
Whilst it is blocking normal connections, you might want to try going in over the Dedicated Administrator Connection (DAC). You will need to be in the sysadmin role on the server to use it; in SSMS, prefix the server name with 'admin:'. This uses a separate connection that is far less likely to be blocked (not impossible, it just takes extreme circumstances).
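If SSMS itself won't cooperate, the same DAC is reachable from the command line; a minimal sketch with a placeholder server name (sqlcmd's -A switch requests the dedicated admin connection, -E uses Windows authentication):

sqlcmd -S YourServerName -A -E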
You shouldn't use the DAC routinely: it gives you access to the system tables and various other items you cannot normally see, so you can also do a lot of damage with it.
Once in, you have a normal query window and can start looking at what is running, what is locked, and so on.
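For example, a minimal sketch of a first query to run over the DAC, using the standard SQL Server 2005 DMVs (the session_id > 50 filter is a common heuristic for skipping system sessions):

SELECT r.session_id,
       r.blocking_session_id,
       r.status,
       r.wait_type,
       r.cpu_time,
       t.[text] AS running_sql   -- batch currently executing
FROM sys.dm_exec_requests r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) t
WHERE r.session_id > 50          -- skip most system sessions
ORDER BY r.cpu_time DESC;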
The dedicated admin connection is meant to help in exactly these situations.
Then, this script can tell you what has an open transaction and what SQL is running:
SELECT s_tst.[session_id],
s_es.[login_name] AS [Login Name],
s_tdt.[database_transaction_begin_time] AS [Begin Time],
s_tdt.[database_transaction_log_record_count] AS [Log Records],
s_tdt.[database_transaction_log_bytes_used] AS [Log Bytes],
s_tdt.[database_transaction_log_bytes_reserved] AS [Log Reserved],
s_est.[text] AS [Last T-SQL Text],
s_eqp.[query_plan] AS [Last Query Plan]
FROM sys.dm_tran_database_transactions s_tdt
JOIN sys.dm_tran_session_transactions s_tst
ON s_tst.[transaction_id] = s_tdt.[transaction_id]
JOIN sys.[dm_exec_sessions] s_es
ON s_es.[session_id] = s_tst.[session_id]
JOIN sys.dm_exec_connections s_ec
ON s_ec.[session_id] = s_tst.[session_id]
LEFT OUTER JOIN sys.dm_exec_requests s_er
ON s_er.[session_id] = s_tst.[session_id]
CROSS APPLY sys.dm_exec_sql_text (s_ec.[most_recent_sql_handle]) AS s_est
OUTER APPLY sys.dm_exec_query_plan (s_er.[plan_handle]) AS s_eqp
ORDER BY [Begin Time] ASC;
Finally, SQL Server 2005 has a default trace running; you may be able to use this to find out what went wrong.
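A sketch of reading the default trace with the built-in trace functions; nothing here is server-specific, since the file path is discovered at run time:

-- Find the current default trace file, then read its most recent events.
DECLARE @path nvarchar(260);
SELECT @path = CONVERT(nvarchar(260), [value])
FROM fn_trace_getinfo(DEFAULT)
WHERE property = 2;   -- property 2 = trace file name

SELECT TOP (100) StartTime, EventClass, TextData, HostName, ApplicationName
FROM fn_trace_gettable(@path, DEFAULT)
ORDER BY StartTime DESC;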
Back end: SQL Server 2017 Express
Front end: Microsoft Access 2019 with ODBC-linked tables to the SQL Server database
Objective: to detect inactivity of, say, 30 minutes in the current session, and then exit from Access
I would like a SQL Server query (to be called from the front-end Access database, via a timer) which will return the date/time of the last SQL statement (e.g. select/insert/update/delete) for the current session, so that the Access application can exit after a defined period of inactivity.
So far, I have looked at sp_who, sp_who2, sysprocesses, dm_exec_connections, dm_db_index_usage_stats and dm_exec_sessions.
Whilst these return useful-looking columns such as LastBatch, the problem is that the act of querying the database updates the return value. For instance, if I run sp_who2 and look at the row for my SPID, the value of LastBatch is always the same as GetDate().
I know that the options above would work if I was monitoring another session (SPID) but I'm looking for a way to find the time of last activity (excluding sp_who2 etc) for my own session.
Any suggestions?
You are correct that the last request time is the time of the request requesting the last request time. (Had to write that.) Maybe you should be detecting if Access is idle instead? Searching "automatically close access if idle" has promising results.
https://learn.microsoft.com/en-us/office/vba/access/concepts/miscellaneous/detect-user-idle-time-or-inactivity
https://www.iaccessworld.com/set-program-close-automatically/
https://www.tek-tips.com/faqs.cfm?fid=1432
If you absolutely have to do the check in SQL Server, then perhaps a job that checks connections. It can kill connections if they have been idle for a while without an open transaction. I don't know whether the Access connections will be obvious, as not all apps provide their name. I also don't know how the Access front end will respond. I'd be leery of doing this.
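A hedged sketch of such a check; the program_name filter is an assumption (Access/ODBC may report a different name, or none at all), and note that Express editions have no SQL Server Agent, so this would have to be scheduled externally (e.g. Windows Task Scheduler):

-- Kill user sessions idle for more than 30 minutes with no open transaction.
DECLARE @spid int, @cmd nvarchar(20);
DECLARE idle_cur CURSOR LOCAL FAST_FORWARD FOR
    SELECT session_id
    FROM sys.dm_exec_sessions
    WHERE is_user_process = 1
      AND program_name LIKE '%Access%'   -- assumption about the app's name
      AND last_request_end_time < DATEADD(minute, -30, GETDATE())
      AND session_id NOT IN (SELECT session_id
                             FROM sys.dm_tran_session_transactions);
OPEN idle_cur;
FETCH NEXT FROM idle_cur INTO @spid;
WHILE @@FETCH_STATUS = 0
BEGIN
    SET @cmd = N'KILL ' + CAST(@spid AS nvarchar(10));
    EXEC (@cmd);   -- KILL needs dynamic SQL when the spid is in a variable
    FETCH NEXT FROM idle_cur INTO @spid;
END;
CLOSE idle_cur;
DEALLOCATE idle_cur;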
We are doing a migration from our old system (SQL Server 2008) to the new system (SQL Server 2012). The data sources we are using are remote, so we have them configured as linked servers. The data in the source we are migrating includes special data (the Geography type). We are migrating the data per customer, so some customers have more data than others; we batch the data and use OPENQUERY to pull the spatial data across. For the customers with less data the migration completes successfully, but for customers with more than a couple of million records in one table the migration stops and gives mainly two errors.
This is what the error looks like:
OLE DB provider "yyy" for linked server "xxx" returned message "Query timeout expired".
Msg 7399, Level 16, State 1, Server nnn, Line 1
The OLE DB provider "yyy" for linked server "xxx" reported an error. Execution terminated by the provider because a resource limit was reached.
Msg 7320, Level 16, State 2, Server ttt , Line 1
Cannot execute the query "
select top (200000)
[row] = row_number () over ( order by t.[x])
, .....
, [Spatial] = cast(ts.[Spatial] as varbinary(max))
from [..].[..].[..] t
join [...].[..].[… ] s
on t.[..] = s.[...]
where (t.[x] > '00000000-0000-0000-0000-000000000000')
and v.[x] = x
order by t.[x]
" against OLE DB provider "yyy" for linked server "xxx".Build step 'Execute Windows batch command' marked build as failure
This problem also happened with one other table that doesn't have spatial data in it.
The approaches we have tried:
- We increased the query timeout.
- We dropped the batch size to 200,000 rows per batch.
- The provider is in "in process" mode.
- We only have a couple of linked servers, so the buffer size is more than acceptable.
- We ran the migration using an admin role to make sure it's not a permissions problem.
We think this might be a network problem, but it's not a load balancer issue; maybe it's something else.
The other error that comes up frequently is:
HResult 0x40, Level 16, State 1
TCP Provider: The specified network name is no longer available.
Any ideas about what the reason could be will be much appreciated.
Thank you,
Lsaif
I would say the "The specified network name is no longer available." error indicates no response from the remote server. Since SQL Server hasn't "heard" from the remote server in a while, it gives up. I would cut down the batch size to something really small and increase with success (rather than the other way around). That way you'll find a batch size that works. Also, this may vary between "customers" depending on your connection to them (i.e., type and size of line, traffic on the line, etc.).
Personally, I like the BCP OUT/BCP IN option as well because I know it works. However, you still have to consider the transfer method of the data from the remote server. If you have a robust enterprise MFT over a dedicated T1 or better, you probably won't have an issue.
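A sketch of what that looks like from the command line; server, database, table, and path names are placeholders (-n keeps SQL Server's native format, -T uses Windows authentication, -b commits in batches):

bcp SourceDb.dbo.BigTable out C:\dump\BigTable.dat -n -S RemoteServer -T
bcp TargetDb.dbo.BigTable in C:\dump\BigTable.dat -n -S LocalServer -T -b 50000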
SSIS offers more of a direct transfer but I imagine you'll run into the same network issues you're having now. That said, you can create a general BCP solution within SSIS.
I am trying to retrieve around 200 billion rows from a remote SQL Server. To optimize this, I have limited my query to use only an indexed column as a filter and am selecting only a subset of columns to make the query look like this:
SELECT ColA, ColB, ColC FROM <Table> WHERE RecordDate BETWEEN '' AND ''
But it looks like unless I limit my query to a time window of a few hours, the query fails in all cases with the following error:
OLE DB provider "SQLNCLI10" for linked server "<>" returned message "Query timeout expired".
Msg 7399, Level 16, State 1, Server M<, Line 1
The OLE DB provider "SQLNCLI10" for linked server "<>" reported an error. Execution terminated by the provider because a resource limit was reached.
Msg 7421, Level 16, State 2, Server <>, Line 1
Cannot fetch the rowset from OLE DB provider "SQLNCLI10" for linked server "<>".
The timeout is probably an issue because of the time it takes to execute the query plan. As I do not have control over the server, I was wondering if there is a good way of retrieving this data beyond the simple SELECT I am using. Are there any SQL Server specific tricks that I can use? Perhaps tell the remote server to paginate the data instead of issuing multiple queries or something else? Any suggestions on how I could improve this?
This is more the kind of job SSIS is suited for. Even a simple flow like ReadFromOleDbSource -> WriteToOleDbSource would handle this, creating the necessary batching for you.
Why read 200 billion rows all at once?
You should page them, reading, say, a few thousand rows at a time.
Even if you do genuinely need to read all 200 billion rows, you should still consider using paging to break the read into shorter queries; that way, if a failure happens, you just continue reading from where you left off.
See "efficient way to implement paging" for at least one method of implementing paging using ROW_NUMBER.
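A minimal sketch of that approach, reusing the column names from the question (the four-part table name and date range are placeholders):

DECLARE @first bigint, @last bigint;
SET @first = 1;        -- first row of this page
SET @last  = 100000;   -- last row of this page

WITH numbered AS (
    SELECT ColA, ColB, ColC,
           ROW_NUMBER() OVER (ORDER BY RecordDate) AS rn
    FROM LinkedServer.RemoteDb.dbo.BigTable   -- placeholder
    WHERE RecordDate BETWEEN '20100101' AND '20100201'
)
SELECT ColA, ColB, ColC
FROM numbered
WHERE rn BETWEEN @first AND @last;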
If you are doing data analysis, then I suspect you are either using the wrong storage (SQL Server isn't really designed for processing of large data sets), or you need to alter your queries so that the analysis is done on the server using SQL.
Update: I think the last paragraph was somewhat misinterpreted.
Storage in SQL Server is primarily designed for online transaction processing (OLTP) - efficient querying of massive datasets in massively concurrent environments (for example, reading or updating a single customer record in a database of billions, at the same time that thousands of other users are doing the same for other records). Typically the goal is to minimise the amount of data read, reducing the amount of IO needed and also reducing contention.
The analysis you are talking about is almost the exact opposite of this - a single client actively trying to read pretty much all records in order to perform some statistical analysis.
Yes, SQL Server will manage this, but you have to bear in mind that it is optimised for a completely different scenario. For example, data is read from disk a page (8 KB) at a time, despite the fact that your statistical processing is probably based on only 2 or 3 columns. Depending on row density and column width, you may be using only a tiny fraction of the data stored on an 8 KB page - most of the data that SQL Server had to read and allocate memory for wasn't even used. (Remember that SQL Server also had to lock that page to prevent other users from messing with the data while it was being read.)
If you are serious about processing / analysis of massive datasets, then there are storage formats optimised for exactly this sort of thing - SQL Server also has an add-on service called Microsoft Analysis Services that adds online analytical processing (OLAP) and data mining capabilities, using storage modes better suited to this sort of processing.
Personally, if I were trying to pull that much data at once, I would use a data extraction tool such as BCP to get the data into a local file before trying to manipulate it.
http://msdn.microsoft.com/en-us/library/ms162802.aspx
This isn't a SQL Server-specific answer, but even when the RDBMS supports server-side cursors, it's considered poor form to use them. Doing so means that you are consuming resources on the server while the server is still waiting for you to request more data.
Instead, you should reformulate your query usage so that the server can transmit the entire result set as soon as it can, and then completely forget about you and your query to make way for the next one. When the result set is too large to process all in one go, you should keep track of the last row returned by the current batch so that you can fetch another batch starting at that position.
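A sketch of that batching pattern, again borrowing RecordDate from the question; @lastSeen carries the position between batches (if RecordDate is not unique, include a unique column in the ordering so no ties are skipped):

DECLARE @lastSeen datetime;
SET @lastSeen = '20100101';   -- last RecordDate returned by the previous batch

SELECT TOP (50000) ColA, ColB, ColC, RecordDate
FROM dbo.BigTable             -- placeholder table name
WHERE RecordDate > @lastSeen
ORDER BY RecordDate;          -- the indexed filter column keeps each batch cheap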
Odds are the remote server has the "Remote Query Timeout" set. How long does it take for the query to fail?
I just ran into the same problem; I also got the message 10:01 after running the query.
Check this link. There's a remote query timeout setting under Connections that is set to 600 seconds by default; you need to change it to zero (unlimited) or whatever value you think is right.
Try changing the remote server connection timeout property.
To do that, go to SSMS, connect to the server, right-click the server's name in Object Explorer, select Properties -> Connections, and change the value in the "Remote query timeout (in seconds, 0 = no timeout)" text box.
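The same setting can also be changed in T-SQL; run it on the server that issues the linked-server query (0 means no timeout):

EXEC sp_configure 'remote query timeout', 0;
RECONFIGURE;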
What would cause a query being done in Management Studio to get suspended?
I perform a simple SELECT TOP 60000 from a table (which has 11 million rows) and the results come back within a second or two.
I change the query to TOP 70000 and the results take up to 40 minutes.
From a bit of searching on another but related issue, I came across someone using DBCC FREEPROCCACHE to fix it.
I ran DBCC FREEPROCCACHE and then re-ran the query for 70000, and it seemed to work.
However, the issue still occurs with a different query.
If I increase to, say, 90000, or if I try to open the table using [Right-click -> Open Table], it pulls about 8000 records and stops.
Checking the activity log for when I do the Open Table shows the session has been suspended with a wait type of "Async_Network_IO". For the session running the SELECT of 90000 the status is "Sleeping"; this is the same status as for the SELECT 70000 query above, which did return, but in 45 minutes. It is strange to me that the status shows "Sleeping" and does not appear to change to "Runnable" (I have Activity Monitor refreshing every 30 seconds).
Additional notes:
I am not running both the Open Table and select 90000 at the same time. All queries are done one at a time.
I am running 32-bit SQL Server 2005 SP2 CU9. I tried upgrading to SP3 but ran into install failures. The issue was occurring prior to me trying this upgrade.
The server setup is an Active/Active cluster; the issue occurs on either node, and the other instance does not have this issue.
I have ~20 other databases on this same server instance, but only this one DB is seeing the issue.
This database is fairly large. It is currently at 76,756.19 MB; the data file is 11,513 MB.
I am logged in locally on the Server box using Remote Desktop.
The wait type "Async_Network_IO" means that its waiting for the client to retrieve the result set as SQL Server's network buffer is full. Why your client isn't picking up the data in a timely manner I can't say.
The other case where it can happen is with linked servers: when SQL Server is querying a remote table, it waits like this for the remote server to respond.
Something worth looking at is virus scanners: if they are monitoring network connections, they can sometimes lag, and it's often apparent from them hogging all the CPU.
Suspended means the query is waiting on a resource and will resume when it gets that resource. Judging from the sizes you are pulling back, it seems you are running an OLAP-type query.
Try the following things:
Use NOLOCK or set the TRANSACTION ISOLATION LEVEL at the top of the query (a sketch follows after this list)
Check your execution plan and tune the query to be more efficient
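A minimal sketch of both variants of the first suggestion; the table name is a placeholder, and bear in mind NOLOCK/READ UNCOMMITTED permits dirty reads:

-- Option 1: set the isolation level for the whole batch.
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT TOP (90000) * FROM dbo.BigTable;

-- Option 2: apply the hint per table.
SELECT TOP (90000) * FROM dbo.BigTable WITH (NOLOCK);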
We have a system with an Oracle backend to which we have access (though possibly not administrative access) and a front end to which we do not have the source code. The database is quite large and not easily understood - we have no documentation. I'm also not particularly knowledgeable about Oracle in general.
One aspect of the front end queries the database for a particular set of data and displays it. We have a need to determine what query is being made so that we can replicate and automate it without the front end (e.g. by generating a csv file periodically).
What methods would you use to determine the SQL required to retrieve this set of data?
Currently I'm leaning towards using an EeePC, Wireshark, and a hub (installing Wireshark on the client machines may not be possible), but I'm curious to hear any other ideas, and whether anyone can think of any pitfalls with this particular approach.
Clearly there are many methods. The one that I find easiest is:
(1) Connect to the database as SYS or SYSTEM
(2) Query V$SESSION to identify the database session you are interested in.
Record the SID and SERIAL# values.
(3) Execute the following commands to activate tracing for the session:
exec sys.dbms_system.set_bool_param_in_session( *sid*, *serial#*, 'timed_statistics', true )
exec sys.dbms_system.set_int_param_in_session( *sid*, *serial#*, 'max_dump_file_size', 2000000000 )
exec sys.dbms_system.set_ev( *sid*, *serial#*, 10046, 5, '' )
(4) Perform some actions in the client app
(5) Either terminate the database session (e.g. by closing the client) or deactivate tracing ( exec sys.dbms_system.set_ev( sid, serial#, 10046, 0, '' ) )
(6) Locate the udump folder on the database server. There will be a trace file for the database session showing the statements executed and the bind values used in each execution.
This method does not require any access to the client machine, which could be a benefit. It does require access to the database server, which may be problematic if you're not the DBA and they don't let you onto the machine. Also, identifying the proper session to trace can be difficult if you have many clients or if the client application opens more than one session.
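If the raw trace file is hard to read, Oracle's tkprof utility can format it into a report; a sketch with placeholder file names (sys=no suppresses recursive SYS statements):

tkprof orcl_ora_12345.trc trace_report.txt sys=no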
Start by querying Oracle system views like V$SQL, V$SQLAREA and V$SQLTEXT.
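For example, a hedged sketch against V$SQL using 10g+ column names; the schema filter is an assumption about how the application connects:

SELECT sql_id, executions, last_active_time, sql_text
FROM v$sql
WHERE parsing_schema_name = 'APP_SCHEMA'   -- assumption: the app's schema
ORDER BY last_active_time DESC;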
Which version of Oracle? If it is 10+, and if you have administrative access (SYSDBA), then you can relatively easily find executed queries through Oracle Enterprise Manager.
For older versions, you'll need access to views that tuinstoel mentioned in his answer.
You can get the same data through TOAD for Oracle, which is a quite capable piece of software, but expensive.
Wireshark is indeed a good idea, it has Oracle support and nicely displays the whole conversation.
A packet sniffer like Wireshark is especially interesting if you don't have admin access to the database server but do have access to the network (for instance, because there is port mirroring on the Ethernet switch).
I have used these instructions successfully several times:
http://www.orafaq.com/wiki/SQL_Trace#Tracing_a_SQL_session
"though possibly not administrative access". Someone should have administrative access, probably whoever is responsible for backups. At the very least, I expect you'd have a user with root/Administrator access to the machine on which the oracle database is running. Administrator should be able to login with a
"SQLPLUS / AS SYSDBA" syntax which will give full access (which can be quite dangerous). root could 'su' to the oracle user and do the same.
If you really can't get admin access, then as an alternative to Wireshark: if your front end connects to the database through an Oracle client, look for the file sqlnet.ora. You can set TRACE_LEVEL_CLIENT, TRACE_FILE_CLIENT and TRACE_DIRECTORY_CLIENT to get it to log the Oracle network traffic between the client and the database server.
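A sketch of the relevant sqlnet.ora entries on the client; the directory is a placeholder, and SUPPORT is the most verbose trace level:

TRACE_LEVEL_CLIENT = SUPPORT
TRACE_FILE_CLIENT = client_sql
TRACE_DIRECTORY_CLIENT = C:\oracle\network\trace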
However, it is possible that the client calls a stored procedure and retrieves the data as output parameters or a ref cursor, which means you may not see the query being executed through that mechanism. If so, you will need admin access to the DB server, and should trace as per Dave Costa's answer.
A quick and dirty way to do this, if you can catch the SQL statement(s) in the act, is to run this in SQL*Plus:
set verify off lines 140 head on pagesize 300
column sql_text format a65
column username format a12
column osuser format a15
break on username on sid on osuser
select s.username, s.sid, s.osuser, t.sql_text
from v$sqltext_with_newlines t, v$session s
where t.address = s.sql_address
and t.hash_value = s.sql_hash_value
order by s.sid, t.piece
/
You need access to those v$ views for this to work. Generally, that means connecting as SYSTEM.