SSIS package gets fatal error while reading the input stream from the network - sql

Problem
When executing a data-heavy SSIS package that inserts data from a database in EnvironmentA1 into a database in EnvironmentB1, I get the following error:
A fatal error occurred while reading the input stream from the network. The session will be terminated (input error: 10060, output error: 0)
Context Information
EnvironmentA1 - virtual machine in local data center, running SQL Server 2017
EnvironmentB1 - virtual machine in Azure, running SQL Server 2017
The package is executed from the SSIS Catalog, scheduled daily by SQL Agent. Very occasionally it succeeds, but it is now generally expected to fail every time it runs, at a different step each time.
What really baffles me is that if I run the same package interactively in Visual Studio, using the exact same connection strings and the same security context for both the EnvironmentA1 and EnvironmentB1 connection managers, it succeeds every time without any issues. Visual Studio itself is installed elsewhere, in EnvironmentC1.
This is what example entries in the SQL Error Log on EnvironmentB1 look like around the time of failure:
Error messages from SSIS Catalog execution report:
Everything above, together with my research, suggests that this is a network-related issue. The common suggestion I found was to disable any TCP offloading features, which I did for both environments, but that didn't make any difference.
EnvironmentA1:
EnvironmentB1:
Additionally, for testing purposes, I disabled the following features in the NIC configuration on each environment:
EnvironmentA1:
Receive-Side Scaling State
Large Send Offload V2 IPv4
Large Send Offload V2 IPv6
TCP Checksum Offload IPv4
TCP Checksum Offload IPv6
EnvironmentB1:
Receive-Side Scaling
Large Send Offload Version 2 IPv4
Large Send Offload Version 2 IPv6
TCP Checksum Offload IPv4
TCP Checksum Offload IPv6
IPSec Offload
Also of note: there are other SSIS packages that interact with the same two environments, and some of them have never produced a similar error, but they either deal with an insignificant amount of data or push it in the opposite direction (EnvironmentB1 to EnvironmentA1).
As a temporary measure I have also tried deploying the package to the SSIS Catalog of EnvironmentA2 (development version of EnvironmentA1) and scheduling execution using production connection strings, but it gets the exact same issue and the only guaranteed way to run the package successfully remains running it via Visual Studio.
If anyone could at least point me in the right direction of diagnosing this issue, that would be greatly appreciated. Please let me know if I can add any other info for the context.

Your 3rd SSIS error states that the connection was forcibly closed by the remote host.
That suggests firewall or network filtering issues. Check with your network guys if that could be the case.
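As an aid when reading these messages: the numbers in "input error: 10060, output error: 0" appear to be Winsock error codes, where 10060 is WSAETIMEDOUT and 10054 is WSAECONNRESET. Below is a minimal sketch, assuming the message format quoted in the question, that extracts the two codes and names them. The helper name and the short lookup table are illustrative only, not part of any SQL Server tooling.

```python
import re

# Small, hand-picked subset of Winsock error codes that commonly show up
# when a connection is dropped (not an exhaustive table).
WINSOCK_CODES = {
    0: "no error",
    10053: "WSAECONNABORTED (software caused connection abort)",
    10054: "WSAECONNRESET (connection reset by peer)",
    10060: "WSAETIMEDOUT (connection timed out)",
}

# Matches the tail of the SQL Server message quoted in the question.
_PATTERN = re.compile(r"input error:\s*(\d+),\s*output error:\s*(\d+)")


def explain_network_error(message):
    """Return (input_meaning, output_meaning), or None if the codes are not found."""
    match = _PATTERN.search(message)
    if match is None:
        return None
    return tuple(
        WINSOCK_CODES.get(int(code), "unknown code " + code)
        for code in match.groups()
    )
```

A timeout (10060) on the receiving side, as opposed to a reset (10054), would be consistent with packets silently going missing in transit rather than a host actively refusing them, which is one reason to involve the network team.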

Related

SQL 2008 getting connection timeouts - Not a DBA

Wall of text (my apologies, but you'll need to read it all):
Error Message: Database Error: Connection Timeout Expired. The timeout period elapsed while attempting to consume the pre-login handshake acknowledgement. This could be because the pre-login handshake failed or the server was una
Environment:
Virtual VMware Server 2008R2 SP1, running SQL 2008 SP3
32GB RAM - about 50 Databases
10Gb LAN connection, datastore storage provided by SSD SAN.
Application is CSTS connecting to SQL Server "DIRGE".
The application is configured to connect to another application for document retrieval, "Onbase", whose database is also stored on DIRGE.
Throughout the day, CSTS will get connection time-outs. It's usually in spurts, so if one user is getting a timeout, usually someone else is getting one as well.
SQL has 28GB of the 32GB allocated. Memory utilization is a consistent >95%.
We cannot add more RAM as 2008R2 standard doesn't see more than 32GB.
CPU utilization was very high at times and the trend was it was getting more and more utilization, so we added a second CPU (2 sockets, 4 cores per socket).
I've scoured the event logs and the SQL logs, and the CSTS error logs looking for a commonality. I'm finding very little. I've resolved all the event log errors, no joy.
NOTE: Onbase server also gets connection time outs to SQL, so I don't believe it's application specific.
Scheduled Events:
Logs are backed up at 8am, 11a, 2pm, 5pm.
There's a SSIS package that runs every 15 minutes and takes about 8 minutes to run. However, I did not find any correlation to the timeouts.
There are maintenance plans that run after hours as well.
IP4 and 6 are both enabled.
Clients are referencing the database server by IP, so it's not a name resolution issue.
IP Protocols are enabled, static set to port 1433.
I ran a portqry from the Onbase server to TCP 1433 and UDP 1434 and it IS listening.
We have a Solarwinds Database Analyzer running and watching this server; it says CPU and RAM are issues. I can get more details from it if anyone is interested.
I've google-fu'd the heck out of this and I just can't seem to find a good answer. From my searching, it seems this is a networking issue, but we've watched the network and I'm not seeing anything that would be the cause. Throughput is very little overall.
I will say this: The ONBASE server is on a different subnet than DIRGE, but I've run a test DB connection using the name, named pipe and IP and they all work without issue.
The problem is I'm not a DBA, so I'm learning this on the fly (I'm a Sr Systems Engineer).
I'm curious if someone has a suggestion on how to hunt this down.
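One way to hunt for intermittent timeouts is to probe the SQL port repeatedly from an affected client and log how long each connect takes, then line the failures up against scheduled jobs and CPU/RAM spikes. The following is a minimal sketch using only Python's standard library. The function name and its defaults are hypothetical, and it tests only the TCP handshake, not the SQL pre-login handshake mentioned in the error.

```python
import socket
import time


def probe_tcp(host, port, attempts=5, timeout=3.0, pause=0.0):
    """Open a TCP connection `attempts` times; return a list of (ok, seconds)."""
    results = []
    for _ in range(attempts):
        start = time.monotonic()
        try:
            # create_connection raises OSError (including timeouts) on failure.
            with socket.create_connection((host, port), timeout=timeout):
                ok = True
        except OSError:
            ok = False
        results.append((ok, time.monotonic() - start))
        if pause:
            time.sleep(pause)
    return results
```

For example, calling probe_tcp with the database server's IP address, port 1433, attempts=600 and pause=1.0 from the Onbase server would give roughly ten minutes of samples. If the TCP connects stay fast while the application still reports timeouts, that points away from the network and toward the server being too busy to answer the login.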

Perforce replica server that can write to main server and has build capability

I need to customize the Perforce server to achieve the following requirements:
I need a local replica server which gets synced with the main server in a different geographical location. I can have the same time zone settings for the local and main servers
The client should be able to commit to the replica server.
The replica server will have build capability as well as a test framework that is run whenever a build is successful.
Once the build and tests are successful, the code should get committed to the main server.
I know that the replica server provided by perforce is used as a readonly server which can't write to main server and the forwarding replica just forwards the commands to main server.
I can't use proxy server, as the local server should work even when the main server is offline.
Is it possible to do this? Can anyone point me to some articles which would help me to set up such a server?
I had asked the same question in the Perforce forum, but the question is still under verification by moderators.
An edge/commit setup may meet your requirements, as an Edge Server handles some local operations associated with workspaces and work in progress.
As well as read-only commands, the following operations can be performed on an Edge Server:
syncing, checking out, merging, resolving, and reverting files
More information about edge/commit architecture is available here:
http://www.perforce.com/perforce/doc.current/manuals/p4dist/chapter.distributed.html
You may also want to look at BuildFarm servers:
http://www.perforce.com/perforce/doc.current/manuals/p4dist/chapter.replication.html#DB5-72814
Hope this helps,
Jen!
A build farm server doesn't allow build workspaces to submit files. If submitting files is required as part of the build process, consider the use of an edge server to support your automated build processes.
With the implementation of edge servers in 2013.2, we now recommend that you use an edge server instead of a build farm server.
Edge servers offer all the functionality of build farm servers and yet offload more work from the main server and improve performance, with the additional flexibility of being able to run write commands as part of the build process.

BizTalk connectivity issue to SQL during VM snapshot

We have one VM for BizTalk and a separate VM for the SQL backend. We are using Veeam for backups which basically kicks off a snapshot of the VM. When this snapshot is being finalized on the SQL VM, BizTalk services on the application server fail. Usually they restart automatically but sometimes this requires manual intervention to start the services. The error below is logged on the BizTalk server.
Is there any timeout setting or config changes that will allow BizTalk services to stay up during the snapshot process?
An error occurred that requires the BizTalk service to terminate. The most common causes are the following:
1) An unexpected out of memory error.
OR
2) An inability to connect or a loss of connectivity to one of the BizTalk databases.
The service will shutdown and auto-restart in 1 minute. If the problematic database remains unavailable, this cycle will repeat.
Error message: [DBNETLIB][ConnectionRead (recv()).]General network error. Check your network documentation.
Error source:
BizTalk host name: BizTalkServerApplication
Windows service name: BTSSvc$BizTalkServerApplication
We experienced the same situation and error with both BizTalk 2009 and BizTalk 2013, each set up with two App servers and one SQL DB server.
When our VMware does the final step of the snapshot backup on the application servers, it freezes the application server for about 10 seconds, preventing it from receiving packets. By default, SQL Server 2008 and 2012 send keep-alive packets to the clients every 30 seconds (30,000 ms). If the SQL server fails to receive a response from the app server, it retries the keep-alive request 5 times (the default setting), 1 second (1,000 ms) apart. If SQL still does not receive a response, it terminates the connection, which causes the BizTalk hosts on the app server to reset. In our case, when our German-made ERP system sends its EDI documents over to BizTalk during that reset period, the transmission fails.
We trapped the issue by running NetMon on the DB and App servers, waiting for the next error message. Upon inspection, we see the five SQL keep-alive packets being sent to the App servers 1 second apart, and at the same time there were NO packets at all received on the Application server. At first guess, one might think they were "just dropped network packets", which is rarely the case. We then made the correlation to the timing of the VM Snapshots, and now confirm each time the snapshot finishes each day, the App servers freeze.
As a short-to-mid-term workaround, we raised the number of retries SQL attempts before declaring a connection dead (5 by default) by adding the registry value TcpMaxDataRetransmissions and setting it to 30 (thus 30 seconds before SQL declares the client unresponsive). This has masked the problem for us for now; use it at your own discretion.
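The arithmetic behind that workaround can be sketched as follows. This follows the simplified model described above (a fixed number of retries, 1 second apart) and is only an illustration; the function names are made up here, and real TCP timing on a given system may differ.

```python
def seconds_until_drop(retries, retry_interval_s=1.0):
    """Time the server keeps retrying before it declares the client dead."""
    return retries * retry_interval_s


def survives_freeze(freeze_s, retries, retry_interval_s=1.0):
    """True if a client frozen for `freeze_s` seconds wakes up before the drop."""
    return freeze_s < seconds_until_drop(retries, retry_interval_s)


# Figures from the answer: a ~10 second freeze, 5 retries by default, 30 after the change.
default_ok = survives_freeze(10, retries=5)    # 5 s window, shorter than the freeze
raised_ok = survives_freeze(10, retries=30)    # 30 s window, longer than the freeze
```

Under this model the default setting tolerates a freeze of just under 5 seconds, so a 10-second snapshot stun drops the connection, while 30 retries leaves a comfortable margin.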
We are also looking at an Agent-based version of the VM Snapshot, which may alleviate the condition of freezing the server.
Is there any timeout setting or config changes that will allow BizTalk services to stay up during the snapshot process?
Not that I am aware of, however you might want to Google config options in the btsntsvc.exe.config file which is located in your BizTalk installation directory.
All messages that pass through BizTalk are written to the BizTalkMsgBoxDb and its other databases are involved if you are running tracking, BAM etc. The only service that can cache 'stuff' and handle a database outage is the Enterprise Single Sign-On (ESSO) Service. BizTalk therefore needs a persistent connection to the database server to remain 'up', hence why your Host Instance (BizTalkServerApplication) is stopping - it simply wouldn't be able to process messages if the database wasn't there.
I would add that your approach to back-ups probably isn't supported by Microsoft and I would further suggest that you seriously consider whether an approach that takes your database server offline during the backup is viable?
BizTalk has a pretty robust backup solution for its various databases built into the product, and I would recommend that you take a look at using this supported method.
If you do need to take snapshots of the database system - say once a night - you might want to consider stopping the BizTalk Host Instances, performing the snapshot, and then re-starting the Host Instances through some scripted task.
You might also want to consider checking whether there are any hotfixes for your version of BizTalk Server included in a Cumulative Update that might help address your problem.

ORA-07445 access violation

I get this error when running a large query on Oracle. Any advice?
I'm using PL/SQL version 10.2.
I have noticed that the error is caused by a view that is based on many tables: when I select from this view for a specific parameter with a WHERE condition, I get the error. When I checked the logs I found this:
ora 07445 access violation
So it is due to something on the view. I have full rights on the tables that I'm creating the views from. And I'm not using any network, the database is on my machine.
Thanks.
From the useful oerr command:
$ oerr ora 3113
03113, 00000, "end-of-file on communication channel"
// *Cause: The connection between Client and Server process was broken.
// *Action: There was a communication error that requires further investigation.
// First, check for network problems and review the SQL*Net setup.
// Also, look in the alert.log file for any errors. Finally, test to
// see whether the server process is dead and whether a trace file
// was generated at failure time.
So the likeliest causes:
The server process you were connected to crashed.
A network problem broke your connection.
Someone manually killed the process on the server you were connected to.
When the server process you were connected to crashed, it threw an ORA-07445. That error, along with ORA-00600, are relatively famous Oracle errors. They're functionally unhandled exceptions, with an ORA-00600 being an unhandled exception in the Oracle code, whereas ORA-07445 is a fatal signal from the OS, generally because Oracle did something that the OS didn't approve of, so the OS killed the Oracle process.
Oracle's support site (http://metalink.oracle.com) has an online troubleshooter for these errors -- search within metalink for document 600.1, and enter the appropriate information from the log file and you might receive some useful troubleshooting information.
This is usually when something is killed at the database server OS level. But it is a fairly generic error. But in my specific world, I'll see this in an application server log on machine A if the database server on machine B is shutdown. In your case, your desktop is losing communication with your DBMS. Your 'large query' may be getting killed at the process level if some administrator or automated process is identifying your query as a resource hog (i.e. you have a Cartesian product).
To be clear, this is very likely something you're doing wrong as the client and not a bug with your server or Oracle itself.
UPDATE since you provided additional details. Since the db is running on your machine I would bet that your query is encountering a lack of RAM to support both client and server operations.

Cannot create DB2 index, getting SQL30081N error

Trying to create an index (and run some long queries) on DB2 v9.1 and failing with the following error message:
SQL30081N (A communication error has been detected. Communication protocol being used: "TCP/IP". Communication API being used: "SOCKETS". Location where the error was detected: "". Communication function detecting the error...")
I have tried to follow the advice given by IBM here regarding setting QUERYTIMEOUTINTERVAL=0 (http://www-01.ibm.com/support/docview.wss?rs=71&uid=swg21164785), but it did not take.
Any ideas? Queries and commands seem to time out at about 15 minutes.
You can rule out any network interference by running the DDL and SQL locally on the server. By using nohup on UNIX or schtasks on Windows, you can start a DB2 job that will run to completion even if the database server loses all network connectivity.
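A rough sketch of that idea, for a UNIX server with Python available: start the job in its own session so it is not tied to the login terminal, and send its output to a log file. This is similar in spirit to nohup but not identical to it. The helper name, the script name create_index.sql and the log path are hypothetical, and the db2 -tvf invocation in the usage note assumes the DB2 command line processor is on the PATH and that the script does its own CONNECT.

```python
import subprocess


def run_detached(cmd, log_path):
    """Start `cmd` in a new session (POSIX only), appending its output to `log_path`."""
    with open(log_path, "ab") as log:
        # start_new_session=True calls setsid() in the child, so the job does not
        # receive the hangup signal when the launching terminal goes away.
        return subprocess.Popen(
            cmd,
            stdin=subprocess.DEVNULL,
            stdout=log,
            stderr=subprocess.STDOUT,
            start_new_session=True,
        )
```

On the database server this might be called as run_detached(["db2", "-tvf", "create_index.sql"], "/tmp/create_index.log"). If the index builds to completion this way, the 15-minute cutoff is being imposed somewhere between client and server rather than by DB2 itself.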
It seems like a network error, probably your client machine is losing the connection to the server. Are you over an unstable network connection, for example a VPN over the internet?