Trying to create an index (and run some long queries) on DB2 v9.1 and failing with the following error message:
SQL30081N (A communication error has been detected. Communication protocol being used: "TCP/IP". Communication API being used: "SOCKETS". Location where the error was detected: "". Communication function detecting the error...")
I have tried to follow the advice IBM gives about setting QUERYTIMEOUTINTERVAL=0 (http://www-01.ibm.com/support/docview.wss?rs=71&uid=swg21164785), but it did not take.
Any ideas? Queries and commands seem to time out after about 15 minutes.
You can rule out any network interference by running the DDL and SQL locally on the server. By using nohup on UNIX or schtasks on Windows, you can start a DB2 job that will run to completion even if the database server loses all network connectivity.
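If it helps to script that local run, here is a minimal sketch in Python, assuming the ibm_db driver is installed on the server; the database name, credentials, and index DDL are placeholders, not your actual objects:

# Run the DDL locally on the DB2 server so the client/server network
# link is out of the picture entirely. Database name, credentials, and
# the index DDL below are placeholders.
import ibm_db

conn = ibm_db.connect("SAMPLEDB", "db2admin", "secret")
try:
    # Blocks until DB2 finishes building the index, however long that takes
    ibm_db.exec_immediate(
        conn, "CREATE INDEX idx_orders_cust ON orders (customer_id)"
    )
finally:
    ibm_db.close(conn)

Launched on the server itself with nohup (nohup python create_index.py &), the statement keeps running even if your terminal session drops.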
This looks like a network error; your client machine is probably losing its connection to the server. Are you on an unstable network connection, for example a VPN over the internet?
Problem
When executing a data-heavy SSIS package that inserts data from a database in EnvironmentA1 into a database in EnvironmentB1, I get the following error:
A fatal error occurred while reading the input stream from the network. The session will be terminated (input error: 10060, output error: 0)
Context Information
EnvironmentA1 - virtual machine in local data center, running SQL Server 2017
EnvironmentB1 - virtual machine in Azure, running SQL Server 2017
The package is executed from the SSIS Catalog, scheduled daily by SQL Agent. Very occasionally it succeeds, but it is now generally expected to fail every time it runs, at a different step each time.
What is really baffling to me is that if I run the same package interactively in Visual Studio, using the exact same connection strings and the same security context for both the EnvironmentA1 and EnvironmentB1 connection managers, it succeeds every time without any issues. Visual Studio itself is installed elsewhere, in EnvironmentC1.
This is what example entries in the SQL Server error log on EnvironmentB1 look like around the time of failure:
Error messages from SSIS Catalog execution report:
Everything above, together with the research I have done, suggests that this is a network-related issue. The common suggestion I found was to disable any TCP-offloading-related features, which I did for both environments, but that didn't make any difference.
Additionally, for testing purposes, I disabled the following features in the NIC configuration on each environment:
EnvironmentA1:
Receive-Side Scaling State
Large Send Offload V2 IPv4
Large Send Offload V2 IPv6
TCP Checksum Offload IPv4
TCP Checksum Offload IPv6
EnvironmentB1:
Receive-Side Scaling
Large Send Offload Version 2 IPv4
Large Send Offload Version 2 IPv6
TCP Checksum Offload IPv4
TCP Checksum Offload IPv6
IPSec Offload
Also of note: there are other SSIS packages that interact with the same two environments, and some of them have never produced a similar error, but those are either dealing with an insignificant amount of data or pushing it in the opposite direction (EnvironmentB1 to EnvironmentA1).
As a temporary measure I have also tried deploying the package to the SSIS Catalog of EnvironmentA2 (the development version of EnvironmentA1) and scheduling execution with the production connection strings, but it hits the exact same issue, and the only guaranteed way to run the package successfully remains running it via Visual Studio.
If anyone could at least point me in the right direction of diagnosing this issue, that would be greatly appreciated. Please let me know if I can add any other info for the context.
Your third SSIS error states that the connection was forcibly closed by the remote host.
That suggests firewall or network filtering issues. Check with your network guys if that could be the case.
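One way to confirm that is to reproduce the drop outside of SSIS entirely. The following is a rough sketch (Python with pyodbc; the server, database, table, and credentials are hypothetical) that pushes a sustained stream of inserts at the Azure side and timestamps the failure, so it can be lined up against firewall or idle-timeout logs:

# Sustained bulk-insert probe toward the Azure SQL instance. All
# connection details and the target table are hypothetical.
import datetime
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=environmentb1.example.com;DATABASE=TargetDb;"
    "UID=probe_user;PWD=probe_password;"
)
cur = conn.cursor()
cur.fast_executemany = True  # mimic a data-heavy SSIS data flow

rows = [(i, "x" * 500) for i in range(10_000)]
try:
    for batch in range(1_000):
        cur.executemany(
            "INSERT INTO dbo.LoadTest (id, payload) VALUES (?, ?)", rows
        )
        conn.commit()
        print(datetime.datetime.now(), "batch", batch, "ok")
except pyodbc.Error as e:
    # A 10060 here, minutes into the run, points at the network path
    # (firewall idle timeout, SNAT, packet inspection), not at SSIS.
    print(datetime.datetime.now(), "connection dropped:", e)

If this script also dies after a few minutes of sustained traffic, the package is off the hook and the conversation with the network team gets much easier.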
Wall of text (my apologies, but you'll need to read it all):
Error Message: Database Error: Connection Timeout Expired. The timeout period elapsed while attempting to consume the pre-login handshake acknowledgement. This could be because the pre-login handshake failed or the server was unable to respond back in time.
Environment:
Virtual VMware server: Windows Server 2008 R2 SP1, running SQL Server 2008 SP3
32GB RAM - about 50 Databases
10Gb LAN connection, datastore storage provided by SSD SAN.
Application is CSTS connecting to SQL Server "DIRGE".
The application is configured to connect to another application for document retrieval, "Onbase", whose database is also stored on DIRGE.
Throughout the day, CSTS will get connection time-outs. It's usually in spurts, so if one user is getting a timeout, usually someone else is getting one as well.
SQL has 28GB of the 32GB allocated. Memory utilization is a consistent >95%.
We cannot add more RAM as 2008R2 standard doesn't see more than 32GB.
CPU utilization was very high at times and the trend was it was getting more and more utilization, so we added a second CPU (2 sockets, 4 cores per socket).
I've scoured the event logs, the SQL Server logs, and the CSTS error logs looking for a commonality, and I'm finding very little. I've resolved all the event log errors; no joy.
NOTE: Onbase server also gets connection time outs to SQL, so I don't believe it's application specific.
Scheduled Events:
Logs are backed up at 8 am, 11 am, 2 pm, and 5 pm.
There's an SSIS package that runs every 15 minutes and takes about 8 minutes to run. However, I did not find any correlation with the timeouts.
There are maintenance plans that run after hours as well.
IPv4 and IPv6 are both enabled.
Clients are referencing the database server by IP, so it's not a name resolution issue.
IP protocols are enabled, with a static port set to 1433.
I ran a portqry from the Onbase server against TCP 1433 and UDP 1434, and it IS listening.
We have a Solarwinds Database Analyzer running and watching this server; it says CPU and RAM are issues. I can get more details from it if anyone is interested.
I've google-fu'd the heck out of this and I just can't seem to find a good answer. From my searching it seems this is a networking issue, but we've watched the network and I'm not seeing anything that would be the cause. Overall throughput is very low.
I will say this: the ONBASE server is on a different subnet than DIRGE, but I've run a test DB connection using the name, the named pipe, and the IP, and they all work without issue.
The problem is I'm not a DBA, so I'm learning this on the fly (I'm a Sr Systems Engineer).
I'm curious if someone has a suggestion on how to hunt this down.
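For what it's worth, this is the kind of probe I'm planning to leave running so each timeout spurt gets a timestamp I can line up against the log backups and the 15-minute SSIS package (Python with pyodbc; the driver name and connection string are placeholders for our environment):

# Connection probe: try to log in every 30 seconds and log timestamped
# failures. Driver name, server IP, and port are placeholders.
import datetime
import time
import pyodbc

CONN_STR = (
    "DRIVER={SQL Server Native Client 10.0};"
    "SERVER=10.0.0.10,1433;DATABASE=master;Trusted_Connection=yes;"
)

while True:
    start = datetime.datetime.now()
    try:
        conn = pyodbc.connect(CONN_STR, timeout=15)  # login timeout, seconds
        conn.close()
        print(start, "ok")
    except pyodbc.Error as e:
        elapsed = (datetime.datetime.now() - start).total_seconds()
        print(start, "FAILED after", elapsed, "s:", e)
    time.sleep(30)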
I am frequently getting the errors mentioned below; the DLL version used in the project is 1.0.488.0.
System.TimeoutException: Timeout performing GET
StackExchange.Redis.RedisConnectionException: No connection is available to service this operation: GET
No connection is available to service this operation: EXISTS
Can anyone help me figure out what the issue might be?
I have also created an issue on StackExchange's GitHub repo for the same.
It looks like your connection broke. When it did, any commands already sent to Redis would have timed out on the client application, even though they may still have executed on the server. If you upgrade to a later version of the StackExchange.Redis client, you will get richer diagnostic information about the state of the thread pool, CPU, etc. on the client application side at the time of the error.
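If you want to rule the server in or out independently of the .NET client, a quick cross-check from a different client library can help: if GETs time out from here too, look at the server or network; if they don't, suspect the client application side (thread pool starvation, CPU). This is only a diagnostic sketch using Python's redis-py, with a placeholder hostname:

# Cross-check the Redis server from redis-py with explicit timeouts.
# Host and port are placeholders for your instance.
import redis

r = redis.Redis(
    host="my-redis.example.com",
    port=6379,
    socket_timeout=2,          # fail fast instead of hanging
    socket_connect_timeout=2,
)
r.ping()                       # raises if the server is unreachable
r.set("probe:key", "value")
print(r.get("probe:key"))      # b'value' if the round trip works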
I am running a Java + JPA/Hibernate application on App Engine. I switched my database from a first-generation Google Cloud SQL instance to the second generation, and now I get a lot of these errors:
2017-05-20T22:49:53.533247Z 2235 [Note] Aborted connection 2235 to db:
'mydb' user: 'root' host: 'cloudsqlproxy~myip'
(Got an error reading communication packets)
As far as I can tell, most of these errors occur during database requests inside task queue tasks.
This did not happen with the first generation. How can this be avoided?
Τhe "Aborted connection nnnn to db:" message is triggered when an existing connection is terminated improperly as described by Google’s documentation. Most of the aborted connections happen because of the termination of your connection was not correct or because of networking problem between the server and the client, as described in the documentation here.
I advise you to follow the Google’s documentation about managing Cloud SQL connections, emphasizing on the “connection pools” section and of course the “opening and closing connection” section.
Managing database connections talks about closing connections properly. However, in my case the error still occurs when I use a GCP Cloud Function to connect to GCP Cloud SQL.
A Google Groups thread says that unless you use NullPool or dispose of the engine explicitly, the error message will always occur. It also advises against using engine.dispose().
So I wonder: what is the best way to release connection-pool resources without generating this error message on Cloud SQL?
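For reference, this is roughly the pattern I am testing in the function (simplified sketch; the connection URL is a placeholder for the Cloud SQL proxy endpoint):

# Simplified sketch of the Cloud Function. With NullPool, every
# checkout is a brand-new connection that is really closed on return,
# so nothing idles around for Cloud SQL to abort later. The connection
# URL is a placeholder.
import sqlalchemy
from sqlalchemy.pool import NullPool

engine = sqlalchemy.create_engine(
    "mysql+pymysql://root:secret@127.0.0.1:3306/mydb",
    poolclass=NullPool,
)

def handler(request):
    # Open, use, and close the connection within a single request
    with engine.connect() as conn:
        value = conn.execute(sqlalchemy.text("SELECT 1")).scalar()
    return str(value)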
I get this error when running a large query on Oracle. Any advice?
I'm using PL/SQL version 10.2.
I have noticed that the error is caused by a view that is built on many tables: when I select from this view with a WHERE condition on a specific parameter, I get the error. When I checked the logs I found this:
ORA-07445 access violation
So it is due to something in the view. I have full rights on the tables I'm creating the view from. And I'm not using any network; the database is on my machine.
Thanks.
From the useful oerr command:
$ oerr ora 3113
03113, 00000, "end-of-file on communication channel"
// *Cause: The connection between Client and Server process was broken.
// *Action: There was a communication error that requires further investigation.
// First, check for network problems and review the SQL*Net setup.
// Also, look in the alert.log file for any errors. Finally, test to
// see whether the server process is dead and whether a trace file
// was generated at failure time.
So the likeliest causes:
The server process you were connected to crashed.
A network problem broke your connection.
Someone manually killed the process on the server you were connected to.
When the server process you were connected to crashed, it threw an ORA-07445. That error, along with ORA-00600, is one of the relatively famous Oracle errors. They're functionally unhandled exceptions: ORA-00600 is an unhandled exception in the Oracle code itself, whereas ORA-07445 is a fatal signal from the OS, generally raised because Oracle did something the OS didn't approve of, so the OS killed the Oracle process.
Oracle's support site (http://metalink.oracle.com) has an online troubleshooter for these errors: search within Metalink for document 600.1, enter the appropriate information from the log file, and you may get some useful troubleshooting information.
This usually happens when something is killed at the database server OS level, but it is a fairly generic error. In my specific world, I'll see this in an application server log on machine A if the database server on machine B is shut down. In your case, your desktop is losing communication with your DBMS. Your 'large query' may be getting killed at the process level if some administrator or automated process identifies your query as a resource hog (i.e. you have a Cartesian product).
To be clear, this is very likely something you're doing wrong as the client, and not a bug in your server or Oracle itself.
UPDATE, since you provided additional details: because the DB is running on your machine, I would bet that your query is running out of RAM to support both the client and server operations.