SQL Server Express Idle Mode Partial Data Returns? - sql-server-2005

I'm attempting to help our network engineers troubleshoot a situation for one of our clients. This client purchased a point-of-sale system from quite literally a "mom-and-pop" vendor, and said vendor recommended SQL Server Express 2005 as the back-end database to save the client from having to incur extra licensing fees. (Please don't get me started on that!)
We didn't write the app, and because it's a commercial app, we have no source code available. (Not that it would help us if we did; the thing was built in PowerBuilder, so we don't have tooling for it.) The app does no logging of its own, as far as we can ascertain. All we have to go on is SQL Server Express's own logging.
In the application, an end user swipes a membership card. Occasionally (a few times a day), the swipe will not return data from the database. The message on screen will say, "Member 123 not found." (The member numbers are actually six digits, "000123.") A rescan immediately afterward returns the member data correctly.
We've eliminated the scanner itself as a source of issues -- it routinely scans the full six-digit number. A scan of SQL Server Express's log indicates that it is coming back online from being idle, often at the point of the scan (but also at several other times per day). (Idle mode is explained here.)
I understand that allocating/deallocating RAM the way SQL Express does is a time-consuming process, especially if we're talking about hundreds of megabytes at a time -- which appears to be the case.
What we're not sure of is whether or not we're getting back partial data, or if the app is simply failing to connect to the database and displaying a generic error message. Since everything is so opaque, and the client is (for obvious reasons) unwilling to pay us to sit in their facility for 8 hours or so to physically see it happen (perhaps with network monitoring/packet sniffing tools), we're kind of at a loss.
At this point, our recommendation is that the client upgrade to SQL Server 2005 Workgroup Edition, with 5 CALs. But that doesn't completely sit well with me as the solution to this issue, because I'm reasonably certain that no SQL Server ever returns partial data -- if you can't connect, you can't connect. (That said, I still recommend it because it's a solution to a number of their other issues!)
I don't have much experience with Express. (I never use it for anything but local development, and there only at home; I certainly never recommend it to my clients.)
My question to those who might have experience with Express is, have you ever seen an instance of SQL Express return partial data, without the app itself being the cause of it? Specifically, have you seen this behavior when returning from idle mode?
(For what it's worth, we're inclined to believe that the app is failing to connect and merely displaying a generic error message, lopping off leading zeroes on the member ID when it does. That seems the most reasonable answer -- a third question might be, do you guys concur with that assessment?)

I've never heard of or experienced SQL Server Express returning partial data. It's essentially the same code base as the full SQL Server.
It is more likely that the application is experiencing a timeout (which defaults to 30 seconds) because SQL Server Express has gone idle. The application probably receives a timeout that it does not expect and does not handle well.
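You can't fix that in a closed PowerBuilder app, but as an illustration of what "handling it well" would look like in an ADO.NET client: catch the timeout and retry with a longer limit instead of reporting the member as not found. The connection string, table, and column names below are placeholders, not the vendor's actual schema:

// Illustrative only -- the real app is closed-source PowerBuilder.
// Shows how a client could tolerate the slow wake-up of an idle Express instance.
using System;
using System.Data;
using System.Data.SqlClient;

static class MemberLookup
{
    // Placeholder connection string; adjust for the real instance and database.
    const string ConnStr = @"Data Source=.\SQLEXPRESS;Initial Catalog=POS;Integrated Security=True";

    public static DataTable FindMember(string memberId)
    {
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                using (var conn = new SqlConnection(ConnStr))
                using (var cmd = new SqlCommand("SELECT * FROM Members WHERE MemberId = @id", conn))
                {
                    cmd.CommandTimeout = 60;  // default is 30 seconds; give an idle instance time to spin up
                    cmd.Parameters.AddWithValue("@id", memberId);
                    conn.Open();
                    var table = new DataTable();
                    table.Load(cmd.ExecuteReader());
                    return table;  // zero rows means the member genuinely isn't there
                }
            }
            catch (SqlException ex) when (attempt < 3 && ex.Number == -2)
            {
                // -2 is the ADO.NET timeout error number; pause briefly and retry instead of
                // surfacing "member not found" for what is really a connection problem.
                System.Threading.Thread.Sleep(2000);
            }
        }
    }
}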
The problem and possible solutions are discussed in this forum thread: http://social.msdn.microsoft.com/forums/en-US/sqlexpress/thread/a8fbf8d6-9949-47a5-a32b-50f8131f1127/
I suspect you have a connection string that looks like this:
Data Source=.\SQLEXPRESS; Integrated Security=True;AttachDbFilename=|DataDirectory|\myDatabase.mdf;User Instance=True
From the referenced thread:
This connection string will cause an initial connection to the main instance (.\SQLEXPRESS) and then instruct the main instance to spawn a new instance of SQL Server under the user's context and attach the database specified to that new User Instance. The User Instance is a completely separate running instance of SQL Server from the main instance that is unique to the user and that will be shut down when there are no longer any connections to it.
This is totally different than attaching a database to the main instance, which stays running at all times unless you've manually shut it down. If your question is about the main instance going into an Idle state, then your question is not unique to SQL Express and you should ask it in the Database Engine forum. I believe all Editions of SQL Server have an Idle state, and the other forum would be where you can find out how to affect that behavior.
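If that is indeed the vendor's connection string, and the app lets you change it, the usual alternative is to attach the database to the always-running main instance once and connect to it by name, with no User Instance at all. A minimal example (the database name here is a placeholder):
Data Source=.\SQLEXPRESS; Initial Catalog=myDatabase; Integrated Security=True
That keeps the database on the main instance, so you are only dealing with that instance's normal idle behavior rather than a user instance being torn down and respawned.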

Related

ColdFusion 11 to 2018 Upgrade -- Server Locking Up, How to Test Better?

We are currently testing an upgrade from CF11 to CF2018 for my company's intranet. To give you an idea how long this site has been running, our first version of CF was 3.1! It is still using application.cfm, and there is code from 1998, when I started writing this thing. Yes, 21 years -- I'm astonished, too. It is a hodgepodge of all kinds of older frameworks, too, including Fusebox.
Anyway, we're running a Win 2012 VM connected to a SQL 2016 farm. Everything looked OK initially, but in the week I've been testing, the server has slowed down once (a page took more than 5 seconds to run, something that usually takes 100ms, with no DB involvement), and another time the server came to a grinding halt. The only way I could restart the CF Application service was by connecting to the server from another server via Services, because doing it via Remote Desktop was so slow.
Now keep in mind -- it's just me testing. This is a site that doesn't have a ton of users, but still, having 5 concurrent connections is normal and there are upwards of 200-400 users hitting this thing every day.
I have FusionReactor running on this thing now, so the next time a lockup happens, I will be able to take a closer look, but what do you think is the best way I can test this? Our site is mostly transactional, users going and filling out forms to put internal orders through. We also connect to XML web services and REST services; we also provide REST services, too. Obviously there's no way to completely replicate a production server's requests onto a test server, but I need to do more thorough testing. Any advice would be hugely appreciated.
I realize your focus for now is trying to recreate the problem on test. That may not be as easy as hoped. Instead, you should be able to understand and resolve it in production. FusionReactor can help, but the answer may well be in the cf logs.
You don't mention assessing the logs at the time of the hangup. See especially the coldfusion-error log for OutOfMemory conditions.
You mention raising the heap, but the problem may be with the metaspace instead. If so, consider simply removing the maxmetaspace setting in the jvm args. That may be the sole and likely cause of such new and unexpected outages.
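For example, if your jvm.config currently caps the metaspace like the line below, try removing just the -XX:MaxMetaspaceSize argument and leave the heap settings alone. The values shown are purely illustrative, not recommendations:

# In cf_root/cfusion/bin/jvm.config (illustrative values only)
java.args=-server -Xms2048m -Xmx2048m -XX:MaxMetaspaceSize=192m ...
# Removing -XX:MaxMetaspaceSize lets the metaspace grow as needed instead of hitting a hard cap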
Or if it's not, and there's nothing in the logs at the time, THEN do consider FR. Does IT show anything happening at the time?
If not, then consider a need to tune the CF/web server connector. I assume you're using IIS. How many sites do you have? And how many connectors (folders in the cf config/wsconfig folder)? What are the settings in their workers.properties file? Are they optimized for the number of sites using that connector?
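For reference, a connector's workers.properties typically looks something like the sketch below; the numbers are illustrative placeholders, and the right values depend on how many sites share the connector and their traffic:

# {cf_root}/config/wsconfig/1/workers.properties (illustrative)
worker.list=cfusion
worker.cfusion.type=ajp13
worker.cfusion.host=localhost
worker.cfusion.port=8012
worker.cfusion.max_reuse_connections=250
worker.cfusion.connection_pool_size=500
worker.cfusion.connection_pool_timeout=60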
Also, have you updated cf2018? Are there any errors in the update error log? Did you update the web server connector also?
Are you running the CF2018 PMT (Performance Monitoring Toolset)? Have you updated it?
There could be still more to consider, but let's see how it goes with those. I have blog posts on these and many more topics that would elaborate on things, both at my site (carehart.org) and the Adobe cf portal (coldfusion.adobe.com).
But let's hear if any of this gets you going.

Cannot access SQL azure

Just had a bizarre issue with SQL Azure, and it happened during a small phase just before full go-live, with some users doing data entry.
"Database 'dbname' on server 'xxx' is not currently available. Please retry the connection later. If the problem persists, contact customer support."
When I tried to connect via SQL Azure database website I got:
"Firewall check failed.
Resource ID : 1. The request minimum guarantee is 0,
maximum limit is 180 and the current usage for the database is 0.
However, the server is currently too busy to support request greater than 0 for this database."
Looking at the Databases section of the Azure Management website, the site reported it couldn't access the DB, but unfortunately I didn't capture the exact error message.
Bizarrely, a couple of my users were still able to log in to our system website that accesses the DB, and view and save data. Eventually they lost their connection too, however.
After an hour or so, the databases came back to life and we could fully access them again.
I have looked at the server's master DB event table using queries from here, and there were a couple of connection failures but nothing interesting. No throttling or deadlocks; a couple of failed connections had "Client may have timed out when establishing connection. Try increasing the connection timeout." in the description.
Any ideas where else to look?
Business users have had a massive drop in confidence because of this.
What you're describing normally occurs because of:
1) The SQL connection limit being hit. Assuming you don't see this often, it's unlikely to be the cause, but it's worth checking; putting a limit on your connection pool can help.
2) Your neighbours being extremely noisy, causing the node to re-adjust.
3) Hardware failure and Microsoft bringing your database back online on a different node. This can take some time.
Normally I have seen this when Microsoft has throttled or had problems with a box and had to fail everyone over. Because you are on a shared system, keep in mind that they are recovering everyone else on that node as well, and that sometimes takes time.
The best bet, if you are worried and need a resolution for the business, is to open a support ticket with MS and give them the time and the error message you saw. They will investigate, and they generally have really good back-end telemetry that will point to a reason. That will allow you to give the business an answer, and then you can make a call on future plans and contingencies. You have to keep in mind, though, that SQL Azure is a shared system and transient errors can happen; you may need to design more retry and failover logic into your application.
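As a rough illustration of that last point, client code can wrap its SQL Azure calls in a small retry loop so that brief unavailability shows up as a short delay rather than an error in front of the business users. The error numbers and timings below are an example of the pattern, not an exhaustive or authoritative list:

// Minimal transient-fault retry sketch for SQL Azure (illustrative only).
using System;
using System.Data.SqlClient;
using System.Threading;

static class Transient
{
    public static T Execute<T>(string connStr, Func<SqlConnection, T> work)
    {
        // 40613: database not currently available; 40501: service busy; -2: client timeout.
        int[] transientErrors = { 40613, 40501, -2 };

        for (int attempt = 1; ; attempt++)
        {
            try
            {
                using (var conn = new SqlConnection(connStr))
                {
                    conn.Open();
                    return work(conn);
                }
            }
            catch (SqlException ex) when (attempt < 5 && Array.IndexOf(transientErrors, ex.Number) >= 0)
            {
                // Simple linear back-off before retrying.
                Thread.Sleep(TimeSpan.FromSeconds(5 * attempt));
            }
        }
    }
}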

SyncFramework 2.1 updates & deletes do not seem to apply properly

I'm synchronizing SQL Server 2008 with ~6 SQL Server 2008 Express clients (everything R2 I believe), using the SyncOrchestrator or specifically using http://code.msdn.microsoft.com/windowsdesktop/Database-SyncSQL-Server-e97d1208 as a base with slight modifications. To my knowledge this means all connections are peers or nodes.
I have 2 scopes. One is download only and the other is upload only. The download only scope is ridden with identity columns primarily because I didn't know any better and still couldn't wrap my head around introducing Guids as the PK on the client side. It doesn't totally matter as all clients should have exact replicas of about 8 or so tables and these machines don't touch this data in any way, only read it.
The upload only scope uses Guids as fortunately I can control that portion of the database and there would be no way 10 clients all using the same identity seed could sync back to the server properly. Both scopes use the default provisioning with bulk inserts and the whole 9 yards so there shouldn't be anything I'm doing on the provisioning end to screw this up.
I initially set everything up not using PerformPostRestoreFixup, AND the initial database would be manually synchronized with insert statements from the host. This seemed fine, but no updates or deletes ever seemed to be applied. You can safely ignore this (only used for historical accuracy and to prove my ineptness) as I then used VS2010 Database Projects to rebuild the database down to schema only & synchronized. I then used the steps outlined here (http://social.microsoft.com/Forums/br/syncdevdiscussions/thread/9ac6d1a1-1565-4b82-a8d8-3d4a9ff5d07b) (sync, backup, restore, call PerformPostRestoreFixup, sync on x clients) and on my dev box where I'm setting all this up I could see updates and deletes just fine. It's when I deploy this to the x clients that I'm not seeing a mirror of the database as I think I should.
The initial sync will complain and try to synchronize all records again. I believe this is expected. During the ApplyChangeFailed event on the client I set everything other than DbConflictType.ErrorsOccurred to ApplyAction.RetryWithForceWrite. This may be a source of problems, as I initially thought this should be done to force the change down to the client. I want the server to always win in this scenario, but during a trace I always see the phrase "Local wins" during the bulk insert/update calls. It's possible I'm seeing the error before the re-apply happens, but it's awkward to look at.
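For clarity, the client-side conflict handling is wired up roughly like the sketch below (simplified; the scope name and connection string are placeholders and the rest of the provider setup is omitted):

// Simplified sketch of the conflict handling described above.
using System.Data.SqlClient;
using Microsoft.Synchronization;
using Microsoft.Synchronization.Data;
using Microsoft.Synchronization.Data.SqlServer;

static class SyncSetup
{
    public static SqlSyncProvider CreateClientProvider(string clientConnStr)
    {
        // "DownloadScope" and the connection string are placeholders.
        var provider = new SqlSyncProvider("DownloadScope", new SqlConnection(clientConnStr));

        provider.ApplyChangeFailed += (sender, e) =>
        {
            // Force the incoming change onto the client for everything except real errors,
            // so the server should win these conflicts.
            if (e.Conflict.Type != DbConflictType.ErrorsOccurred)
                e.Action = ApplyAction.RetryWithForceWrite;
        };

        return provider;
    }
}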
The only problem I seem to be having is with the download only scope. The initial client database is about a week old now and if I use the performpostrestorefixup steps I don't see any of the updates that have applied between now and then as I think I should. It's as if SyncFx almost prefers a blank database on the client side to kick off the initial sync then all the updates seem to apply just fine with no ApplyChangesFailed events kicking off.
If anyone has seen this before or has a clue where to go I would greatly appreciate it. My brain has fried trying to determine what it is that's going on. My last ditch effort will be to deploy blank databases to all the clients and have them start the sync. I've had no issues with this on the dev side but I can only test one other client to know if that'll do anything different. Aside from that I don't know what to do other than to keep doing manual syncs which would defeat this purpose entirely. I thought PerformPostRestoreFixup would alleviate the issue entirely but I seem to be having the same problems with or without it or perhaps I'm not looking at what I need to be.
Thanks
I wanted to report and close the entry with my findings.
When I would deploy a previously configured client database, I'd often get ApplyChangeFailed events in the form of this log:
"[05:30:41 PM] - ApplyChange Failed: TableName: , Stage: ApplyingInserts, ConflictType: LocalInsertRemoteInsert, Action: RetryWithForceWrite"
This is what I thought would be expected as it tried to reinsert the data that is already there. What this should've been changed to was an update statement during RetryWithForceWrite but I found the data was not updating with what was being sent down.
Once I started each client with a completely blank database and provisioned locally, all of these errors went away. It's as if every client expects some unique id only it sets. I'm also using x64 builds versus x86 which may have some or no bearing on the results. I wish I could determine what exactly happened but it seems that when in doubt, and whenever possible, starting from absolute zero and letting sync fill in the data is your safest option.

sql 2008 express slow on first request? Goes to sleep?

I have read somewhere about SQL express running as a user instance or something.. and as such, the instance/service "goes to sleep" if not used for x time.. (don't know the actual timings etc)
So the scenario is:
If my website (in this case) doesn't have anyone using it for "a few hours", SQL Express "seems" to go to sleep.
The next time someone comes along (after the pause for however long), the initial response takes quite a few seconds more to action.
Subsequent requests directly after the initial one seem very fast.. again until there is a pause "for a few hours" or whatever the timing is?
Any ideas? if so, any examples/directions of what to do?
Thanks!
David.
Yes, there is the so-called RANU instance, which is what you get when you specify User Instance=True in the connection string. Read more about this in SQL Server 2005 Express Edition User Instances. I would recommend you stay as far away as possible from anything related to User Instances. They are impossible to debug and troubleshoot when things go wrong, they can sometimes take minutes to spin up the new instance, and they really offer no advantage in the real world. Besides, they are deprecated in SQL Server Express 2008.
If you're using SQL Express 2008 and you do not specify User Instance=True in your connection string, then you do not get a user instance, so the slow first request probably comes from the IIS app pool warm-up, as others have suggested. It may also occur due to ordinary process working-set trimming, which would cause the SQL buffer pools to go cold. You can easily identify whether it is IIS or SQL by monitoring the appropriate performance counters on your system.
This isn't the database going to sleep; this is the Application Pool in IIS. If no users are connected to or using the website, the application pool will recycle and the sessions will shut down. Then, when a user comes back to the website, IIS has to restart the application.
There is a technique called database warm-up; you can find out more here, and it is probably your solution.
Are you sure it's the database going to sleep and not IIS? IIS will unload websites after a certain period of inactivity, and they can be very slow to reload.

Stop Monitoring SQL Services for Registered Servers in SSMS

Question: Is it possible to stop SSMS from monitoring the service status of registered servers?
Details:
SSMS 2008 monitors the service status of every registered server. From what I have seen, it seems to reach out to every registered server every minute or so to check its status; in my case that is over 100 servers. This process has raised issues with our Security and Network departments. Network identified it initially as suspicious traffic because it appeared as if an unknown utility was scanning the network for SQL Servers. Security was concerned because the Security Event Logs on each server are being filled up with my logon events.
I have looked all over for a setting but can't seem to find one. Am I missing it somewhere?
TIA,
Brian
I finally found an answer!!
While it is not possible (at least as far as I've found) to stop SSMS from checking the service status of registered servers, it is possible to change the interval at which it checks.
The short version is to create the following registry keys (DWORD):
(SQL Server 2008)
HKLM\Software\Microsoft\Microsoft SQL Server\100\Tools\Shell | PollingInterval = 600 (decimal)
(SQL Server 2005)
HKLM\Software\Microsoft\Microsoft SQL Server\90\Tools\Shell | PollingInterval = 600 (decimal)
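If you prefer to script it, the same value can be created with reg add from an elevated prompt (mirror whichever of the keys above matches your SSMS version):

reg add "HKLM\Software\Microsoft\Microsoft SQL Server\100\Tools\Shell" /v PollingInterval /t REG_DWORD /d 600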
With PollingInterval set to 600 (decimal), SSMS checks the registered servers far less frequently, instead of every minute or so.
See this MS Connect Post for details.
Since it doesn't appear that there's any way to stop these status checks by SSMS, can you focus on helping them see that the checks are harmless?
Can the network group allow certain exceptions to this particular rule (pinging servers on port 1433) in their scanning software, which would allow you and your group to monitor SQL Server uptime? Even if you weren't using SSMS, this type of sweeping monitoring activity is pretty common, and you'll know the requests will only ever come from a handful of workstations.
I don't think these SQL status checks generate any more events in the security log than any other activity, so maybe they were just concerned because it was something they weren't expecting. Could the security group be convinced that these events aren't dangerous, again as long as they're coming from certain approved workstations?
If neither of these is an option (or even if it is), you could help mitigate the problem by not connecting to all your SQL servers at once. Maybe just connect to the ones you need at the time - it looks like loading the entire list actively connects to each of them, but just connecting to the ones you intend to use in that session might help reduce the number of network sessions open.
I hope this helps - if it doesn't, or you've got some additional input that might help find a workaround, please post it!