Slow SQL connection over sea - reverse-proxy

This is what we have now:
web server in the UK + SQL Server in the UK
Because we can't do live replication of the database, we came up with this solution for the US:
web server in the US, talking to the SQL Server in the UK.
And we see a strange result: the pages load slowly, slower than simply proxying from the US to the UK, and we don't understand why.
Logic tells us that the SQL data is smaller than the proxied response (all the data in the page).
Do you have any ideas?

If you want your SQL database to be that far away from your server, you need to seriously think about reducing the number of sequential queries used.
If your round-trip ping to the database server is 0.2ms and you make a query, that query waits for one round trip. If you make 5 round-trip queries sequentially (that is, you wait for the first query to finish before starting the second), that's 0.2ms * 5 = 1ms.
Adding 1ms extra latency is no big deal. You probably won't notice.
If your database server is located outside the same datacenter, you'll probably get at least 20ms latency to the database. Five queries in a row would then take 100ms. Still not that bad.
If you're located across the ocean from your datacenter, you're probably talking 100-200ms latency. Five sequential queries would then take as long as a full second to return.
If your backend runs 20-30 sequential queries per page, round-trip latency alone adds several seconds, and the page could take 10+ seconds to load.
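To make that arithmetic concrete, here is a minimal sketch of sequential versus parallel querying, assuming Python with pyodbc; the connection string, table names, and queries are placeholders, not anything from the question. With ~200ms of transatlantic round-trip time, the sequential loop pays the latency once per query, while the parallel version pays it roughly once overall (plus connection setup).

```python
# Latency sketch (assumed: pyodbc installed; CONN_STR and QUERIES are placeholders).
import time
from concurrent.futures import ThreadPoolExecutor

import pyodbc

CONN_STR = "DRIVER={ODBC Driver 17 for SQL Server};SERVER=uk-sql;DATABASE=app;UID=user;PWD=secret"
QUERIES = [
    "SELECT COUNT(*) FROM Orders",
    "SELECT COUNT(*) FROM Customers",
    "SELECT COUNT(*) FROM Products",
    "SELECT COUNT(*) FROM Invoices",
    "SELECT COUNT(*) FROM Sessions",
]

def run_query(sql):
    # Each worker opens its own connection; pyodbc connections should not
    # be shared across threads for concurrent execution.
    conn = pyodbc.connect(CONN_STR)
    try:
        return conn.cursor().execute(sql).fetchone()[0]
    finally:
        conn.close()

# Sequential: one ocean round trip per query, paid one after another.
start = time.perf_counter()
sequential = [run_query(q) for q in QUERIES]
print(f"sequential: {time.perf_counter() - start:.2f}s")

# Parallel: independent queries overlap their round trips.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=len(QUERIES)) as pool:
    parallel = list(pool.map(run_query, QUERIES))
print(f"parallel:   {time.perf_counter() - start:.2f}s")
```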
Solutions?
Put your database server in the same datacenter as your web server. Unless you can do all queries in parallel, or reduce the system to a single query per page, it's actually faster to have your webserver in the UK than to separate the web server and database server by an ocean.
Greatly reduce the number of queries.
Cache.
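For the caching option, a minimal sketch of a per-process TTL cache; the key name, loader function, and 60-second TTL are assumptions for illustration, not from the question. Anything that tolerates slightly stale data can be served from memory instead of crossing the ocean on every page view.

```python
# Tiny TTL cache: serve repeat requests from memory instead of paying a
# transatlantic round trip for every page view.
import time

_cache = {}          # key -> (expires_at, value)
TTL_SECONDS = 60     # assumed: this data may be up to a minute stale

def cached(key, loader, ttl=TTL_SECONDS):
    entry = _cache.get(key)
    now = time.monotonic()
    if entry and entry[0] > now:
        return entry[1]                  # cache hit: no database round trip
    value = loader()                     # cache miss: one round trip
    _cache[key] = (now + ttl, value)
    return value

# Usage (fetch_product_list would run the actual SQL query):
# products = cached("product_list", fetch_product_list)
```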

Related

What data load on my DB should I expect if I get more users?

Currently, as a single user, it takes 260ms for a certain query to run from start to finish.
What will happen if 1000 queries are sent at the same time? Should I expect the same query to take ~4 minutes (260ms * 1000)?
It is not possible to make predictions without any knowledge of the situation. There will be a number of factors which affect this time:
Resources available to the server (if it is able to hold data in memory, things run quicker than if disk is being accessed)
What is involved in the query (e.g. a repeated query will usually execute quicker the second time around, assuming the underlying data has not changed)
What other bottlenecks are in the system (e.g. if the webserver and database server are on the same machine, the two processes will be fighting for available resources under heavy load)
The only way to properly answer this question is to perform load testing on your application.
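To show what load testing can look like in its simplest form, here is a rough sketch; the connection string, query, and concurrency numbers are all made up for the example. It fires the same query from many concurrent workers and reports how per-query latency behaves under load, which answers the "260ms * 1000" question empirically rather than by multiplication.

```python
# Crude load test: run the same query from many concurrent workers and
# look at how per-query latency changes under load.
import statistics
import time
from concurrent.futures import ThreadPoolExecutor

import pyodbc

CONN_STR = "DRIVER={ODBC Driver 17 for SQL Server};SERVER=db;DATABASE=app;UID=user;PWD=secret"
QUERY = "SELECT COUNT(*) FROM Orders WHERE Status = 'open'"   # placeholder query
CONCURRENCY = 50
REQUESTS = 1000

def timed_query(_):
    conn = pyodbc.connect(CONN_STR)
    try:
        start = time.perf_counter()
        conn.cursor().execute(QUERY).fetchall()
        return time.perf_counter() - start
    finally:
        conn.close()

with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
    latencies = sorted(pool.map(timed_query, range(REQUESTS)))

print(f"median: {statistics.median(latencies) * 1000:.0f} ms")
print(f"95th:   {latencies[int(len(latencies) * 0.95)] * 1000:.0f} ms")
```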

SQL Azure - very slow compared to localhost database

I decided I wanted to try out Microsoft SQL Azure, as many people have spoken very highly about it. It should be fast, flexible, cheap and many other things.
I got it up and running, migrated my data to Azure and hooked up the connection string. I tried to run some queries on the database, and was shocked at how slow even simple queries were. A "SELECT *" from a table with 700 rows took 7 seconds. My page also seems extremely slow compared to when I used a local database in Management Studio or a database on shared hosting.
Now, when I set up my server, I couldn't pick a physical location. However, I live in Denmark, and I can see the server is in the "South Central US" region. This might be the issue.
I don't use any stored procedures (so I guess no parameter sniffing). I can also see my indexes were transferred successfully.
Any ideas on what to do? Any performance things I am missing?
I ran into this very issue in the last few days. Change your database tier from Basic to Standard and you will see a HUGE increase in performance. I am working on a query-intensive dashboard at the moment; it brought a 20-second response time down to 2 seconds.
I've used Azure for many years now, and my original question is pretty much solved.
My main take-aways after dealing with Azure databases for a while:
It's extremely important that your application and database are placed in the same region; if not, you will have a slow application. Recently I had an API and an app running in two different regions - every response took ~1 second. After moving them to the same region, responses were near-instant.
If your application has a high load, it's often a good idea to upgrade to a higher tier. You hit that point earlier than you might expect.
Pick the nearest region - it really matters.
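One quick way to see what the region choice is costing you is to time a query that does almost no work, so the measurement is dominated by the network; this is only a sketch, and the server name and credentials are placeholders. Single-digit milliseconds suggests you are in the right region; tens or hundreds of milliseconds per round trip points at distance, not query cost.

```python
# Measure raw round-trip latency to the database with a trivial query.
import time

import pyodbc

CONN_STR = ("DRIVER={ODBC Driver 17 for SQL Server};"
            "SERVER=myserver.database.windows.net;DATABASE=app;UID=user;PWD=secret")

conn = pyodbc.connect(CONN_STR)
cur = conn.cursor()
samples = []
for _ in range(20):
    start = time.perf_counter()
    cur.execute("SELECT 1").fetchone()
    samples.append((time.perf_counter() - start) * 1000)
conn.close()

print(f"min {min(samples):.1f} ms, avg {sum(samples) / len(samples):.1f} ms")
```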

What is "Excessive resource usage" in SQL Azure?

I searched online for a while about what "Excessive resource usage" means on SQL Azure, and still can't get a clear idea.
Some articles suggest that queries taking too long, using too much memory, etc. will cause "Excessive resource usage". But if I use simple queries and a simple data structure, what will happen?
For example: I use a 1GB SQL Azure database for session state. Since each session is a very small string that is saved and deleted all the time, I don't think it will grow to 1GB even for millions of simultaneous sessions. You can do the arithmetic: 1 million sessions at 20 characters each take only about 20MB, and with a 20-minute expiry and so on it cannot come anywhere close to 1GB. But there should be lots and lots of queries, each very simple and fast via an index.
I want to know: would this usage be considered "Excessive resource usage"? Is there any hard number that limits you on the usage?
By the way, in the example above, if it all happens in the same datacenter, the only cost is the 1GB database at $10 a month, right?
Unfortunately the answer is 'it depends'. I think that probably the best reference (with guidance) on the SQL Azure query throttle is here: TechNet Article on SQL Azure Performance. It provides details about the metrics that are monitored and the mechanism of the throttle.
The reason I say it depends is that the throttle is non-deterministic for any given user, because it is activated based on the total load on the node (the physical SQL Server in the Azure datacenter). While the subscribers who get throttled are the ones delivering the greatest load, the level at which the throttle kicks in depends on the total load on the node. So if you are on a quiet node (where other tenant DBs are relatively inactive), you will be able to put through much more throughput than if you are on a busy node.
It is very appealing to use 1GB SQL Azure DBs for session state storage; you've identified the cost benefits. You are taking a risk though. One way to mitigate this risk is to partition across at least two SQL Azure 1GB DBs and adjust the load yourself based on whether one of the DBs starts hitting the throttle.
Another option, if you want determinism for throughput, is to use the Windows Azure Cache to back your session state store. The Cache has hard, pre-defined limits for query throughput, so you can plan for it more easily (see the Azure Caching FAQ, including limits). The Cache approach is probably a bit more expensive, but with a lower risk of problems.
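To illustrate the partitioning suggestion above, here is a minimal sketch that hashes the session ID across two small databases so neither carries the full load; the connection strings and the SessionState table are assumptions for the example, not part of the original question.

```python
# Sketch: spread session-state traffic across two 1GB databases by
# hashing the session id, so the throttle risk is split between them.
import zlib

import pyodbc

SHARDS = [
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=shard0.database.windows.net;DATABASE=sessions0;UID=u;PWD=p",
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=shard1.database.windows.net;DATABASE=sessions1;UID=u;PWD=p",
]

def shard_for(session_id: str) -> str:
    # crc32 is stable across processes, unlike Python's randomized hash().
    return SHARDS[zlib.crc32(session_id.encode()) % len(SHARDS)]

def save_session(session_id: str, payload: str) -> None:
    conn = pyodbc.connect(shard_for(session_id))
    try:
        # Upsert: try the update first, insert if the row did not exist.
        conn.cursor().execute(
            "UPDATE SessionState SET Payload = ? WHERE SessionId = ? "
            "IF @@ROWCOUNT = 0 INSERT INTO SessionState (SessionId, Payload) VALUES (?, ?)",
            payload, session_id, session_id, payload)
        conn.commit()
    finally:
        conn.close()
```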

What are the factors that affect the time taken to run a SQL on a database?

I have a query that runs on a data warehouse. I ran the report last month and it gave me some results in, say, x minutes. The same report, run on the same database without any modifications to the database, returns the same results but now takes y minutes.
y > x, and the difference between the times is large.
The amount of data and the indexes are also the same. There is no difference in them.
Now clients are asking me for a reason. What are the possible causes?
You leave a lot of questions open:
Is the database running on a dedicated server?
Do you run the reports from clients or directly on the server?
Have there been changes to the physical network? Have any settings been changed?
Did they (by accident) change the protocol used to communicate with the server (TCP, named pipes, ...)?
Have you tried defragmenting?
Have you rebooted the server?
Do you have an execution plan from before and after?
Most likely the query plan has changed. Some minor difference in the data has pushed the query optimiser's calculations onto a new, less optimal plan.
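If you still have access to the system from both periods, comparing cached plan statistics is the first thing to check. Here is a rough sketch that reads SQL Server's plan-cache DMVs; the LIKE filter and connection string are placeholders, and note that these statistics only cover executions since the plan was cached (they are lost on restart or plan eviction).

```python
# Pull per-query statistics from the plan cache so the report query's
# average elapsed time and logical reads can be compared over time.
import pyodbc

DMV_SQL = """
SELECT TOP 20
    qs.creation_time,
    qs.execution_count,
    qs.total_elapsed_time / qs.execution_count   AS avg_elapsed_us,
    qs.total_logical_reads / qs.execution_count  AS avg_logical_reads,
    SUBSTRING(st.text, 1, 120)                   AS query_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
WHERE st.text LIKE ?
ORDER BY qs.total_elapsed_time DESC;
"""

conn = pyodbc.connect("DRIVER={ODBC Driver 17 for SQL Server};SERVER=dw;DATABASE=warehouse;Trusted_Connection=yes")
for row in conn.cursor().execute(DMV_SQL, "%FactSales%"):   # placeholder filter
    print(row)
conn.close()
```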
Here are a few:
The amount of data in the warehouse has changed.
Indexes might have been modified.
Your warehouse is split across different servers and there is connectivity lag between them...
Your database server is processing something else as well, leaving less memory and CPU for your reports to run.

Is it possible to get sub-1-second latency with transactional replication?

Our database architecture consists of two SQL Server 2005 servers, each with an instance of the same database structure: one for all reads, and one for all writes. We use transactional replication to keep the read database up to date.
The two servers are very high-spec indeed (the write server has 32GB of RAM), and are connected via a fibre network.
When deciding upon this architecture we were led to believe that the latency for data to be replicated to the read server would be in the order of a few milliseconds (depending on load, obviously). In practice we are seeing latency of around 2-5 seconds in even the simplest of cases, which is unsatisfactory. By a simplest case, I mean updating a single value in a single row in a single table on the write db and seeing how long it takes to observe the new value in the read database.
What factors should we be looking at to achieve latency below 1 second? Is this even achievable?
Alternatively, is there a different mode of replication we should consider? What is the best practice for the locations of the data and log files?
Edit
Thanks to all for the advice and insight - I believe that the latency periods we are experiencing are normal; we were misled by our db hosting company as to what latency times to expect!
We're using the technique described near the bottom of this MSDN article (under the heading "scaling databases"), and we'd failed to deal properly with this warning:
The consequence of creating such specialized databases is latency: a write is now going to take time to be distributed to the reader databases. But if you can deal with the latency, the scaling potential is huge.
We're now looking at implementing a change to our caching mechanism that enforces reads from the write database when an item of data is considered to be "volatile".
No. It's highly unlikely you could achieve sub-1s latency times with SQL Server transactional replication even with fast hardware.
If you can get 1 - 5 seconds latency then you are doing well.
From here:
Using transactional replication, it is possible for a Subscriber to be a few seconds behind the Publisher. With a latency of only a few seconds, the Subscriber can easily be used as a reporting server, offloading expensive user queries and reporting from the Publisher to the Subscriber.
In the following scenario (using the Customer table shown later in this section) the Subscriber was only four seconds behind the Publisher. Even more impressive, 60 percent of the time it had a latency of two seconds or less. The time is measured from when the record was inserted or updated at the Publisher until it was actually written to the subscribing database.
I would say it's definitely possible.
I would look at:
Your network: run ping commands between the two servers and see if there are any issues. If the servers are next to each other you should see < 1 ms.
Bottlenecks on the servers: this could be network traffic volume, network cards not being configured for 1Gb/sec, anti-virus, or other things.
Do some analysis on a few queries and see if you can identify indexes or locking that might be a problem.
See if any of the SELECTs on the read database might be blocking the writes. Add WITH (NOLOCK) and see if this makes a difference on one or two queries you're analyzing.
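For the WITH (NOLOCK) experiment, a small sketch that times the same query with and without the hint; the table, filter, and connection string are placeholders. NOLOCK avoids taking shared locks, so the read can neither block nor be blocked by the replication writes, at the price of possibly reading uncommitted data.

```python
# Time a report query with and without the NOLOCK hint on the read server.
import time

import pyodbc

conn = pyodbc.connect("DRIVER={ODBC Driver 17 for SQL Server};SERVER=read-server;DATABASE=app;Trusted_Connection=yes")
cur = conn.cursor()

for sql in (
    "SELECT COUNT(*) FROM dbo.Customer WHERE Region = 'UK'",                # locking read
    "SELECT COUNT(*) FROM dbo.Customer WITH (NOLOCK) WHERE Region = 'UK'",  # dirty read
):
    start = time.perf_counter()
    cur.execute(sql).fetchone()
    print(f"{time.perf_counter() - start:.3f}s  {sql}")

conn.close()
```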
Essentially you have a complicated system with a problem; you need to determine which component is at fault and fix it.
Transactional replication is probably best if the reports/selects you need to run must be up to date. If they don't need to be, you could look at log shipping, although that would add some downtime with each import.
For data/log files, make sure they're on separate drives so that performance is maximized.
Something to remember about transactional replication is that a single update now requires several operations for that change to occur.
First you update the source table.
Next the log reader sees the change and writes it to the distribution database.
Next the distribution agent sees the new entry in the distribution database, reads that change, and runs the correct stored procedure on the subscriber to update the row.
If you monitor the statement run times on the two servers you'll probably see that they run in just a few milliseconds. However, it is the lag while waiting for the log reader and distribution agent to notice that they have work to do that is going to kill you.
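To see that lag end-to-end, you can do exactly the measurement described in the question: stamp a marker row on the publisher and poll the subscriber until it appears. A rough sketch follows; the ReplLatencyProbe table and connection strings are assumptions, and SQL Server's built-in replication tracer tokens give similar numbers without a custom table.

```python
# Measure publisher-to-subscriber lag: insert a marker on the write
# server, then poll the read server until the same marker is visible.
import time
import uuid

import pyodbc

WRITE_CONN = "DRIVER={ODBC Driver 17 for SQL Server};SERVER=write-server;DATABASE=app;Trusted_Connection=yes"
READ_CONN  = "DRIVER={ODBC Driver 17 for SQL Server};SERVER=read-server;DATABASE=app;Trusted_Connection=yes"

marker = str(uuid.uuid4())

pub = pyodbc.connect(WRITE_CONN, autocommit=True)
pub.cursor().execute(
    "INSERT INTO dbo.ReplLatencyProbe (Marker, PostedAt) VALUES (?, GETDATE())", marker)
posted = time.perf_counter()

sub = pyodbc.connect(READ_CONN, autocommit=True)
cur = sub.cursor()
while True:
    if cur.execute("SELECT 1 FROM dbo.ReplLatencyProbe WHERE Marker = ?", marker).fetchone():
        print(f"replication lag: {time.perf_counter() - posted:.2f}s")
        break
    time.sleep(0.05)   # poll every 50 ms
```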
If you truly need sub-second processing time, then you will want to look into writing your own processing engine to handle data moving from one server to another. I would recommend using SQL Service Broker for this, as everything stays native to SQL Server and no third-party code has to be written.