I have a bunch of WCF REST services hosted on Azure that access a SQL Azure database. I see that ServicePointManager.UseNagleAlgorithm is set to true. I understand that setting this to false would speed up calls (inserts of records < 1460 bytes) to table storage - the following link talks about it.
My Question - Would disabling the Nagle Algorithm also speed up my calls to SQL Azure?
Nagle's algorithm is all about buffering tcp-level data into a smaller # of packets, and is not tied to record size. You could be writing rows to Table Storage of, say, 1300 bytes of data, but once you include tcp header info, content serialization, etc., the data transmitted could be larger than the threshold of 1460 bytes.
In any case: the net result is that you could be seeing write delays of up to 500ms when the algorithm is enabled, as data is buffered, resulting in fewer TCP packets over the wire.
It's possible that disabling Nagle's algorithm would help with your access to SQL Azure, but you'd probably need to do some benchmarking to see if your throughput is being affected based on the type of reads/writes you're doing. It's possible that the calls to SQL Azure, with the requisite SQL command text, result in large-enough packets that disabling Nagle wouldn't make a difference.
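At the socket level, Nagle's algorithm is switched off with the TCP_NODELAY option. Here is a minimal Python sketch of that, useful if you want to benchmark a raw TCP connection with and without Nagle. The helper names are made up for illustration, and this is not the .NET API from the question:

```python
import socket


def make_nodelay_socket():
    """Create a TCP socket with Nagle's algorithm disabled (TCP_NODELAY set)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # A non-zero value asks the stack to send small writes immediately
    # instead of buffering them into fewer, larger packets.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


def nagle_disabled(sock):
    """Return True if TCP_NODELAY is currently set on the socket."""
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
```

Run the same workload over a socket created this way and over a default socket, and compare the timings before deciding anything.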
As in the question, I want to increase the number of queries per second on GCS. Currently, my application is on my local machine; when it runs, it repeatedly sends queries to and receives data back from the GCS server. More specifically, I am located in Vietnam, and the server (free tier though) is in Singapore. The maximum QPS I can get is ~80, which is unacceptable. I know I can get better QPS by putting my application in the cloud, in the same location as the SQL server, but that alone requires a lot of configuration and work. Are there any solutions for this?
Thank you in advance.
Colocating your application front-end layer with the data persistence layer should be your priority: deploy your code to the cloud as well.
Use persistent connections/connection pooling to cut down on connection establishment overhead.
Free tier instances for Cloud SQL do not exist. What are you referring to here? f1-micro GCE instances are not free in the Singapore region either.
Depending on the complexity of your queries, read/write pattern, size of your dataset, etc., the performance of your DB could be I/O bound. Ensuring your instance is provisioned with SSD storage and/or increasing the data disk size can help lift IOPS limits, further improving DB performance.
Side note: don't confuse commonly used abbreviation GCS (Google Cloud Storage) with Google Cloud SQL.
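A back-of-the-envelope sketch of why colocation and pooling matter. It assumes (this is not stated in the question) that queries are issued one at a time, each waiting for the previous reply, and it ignores server-side limits. The function names are made up for illustration:

```python
def max_sequential_qps(seconds_per_query):
    """Upper bound on queries/second when each query waits for the previous one."""
    return 1.0 / seconds_per_query


def max_pooled_qps(seconds_per_query, connections):
    """Rough upper bound with several independent connections in flight at once."""
    return connections / seconds_per_query


# The ~80 QPS from the question corresponds to about 12.5 ms per query.
seconds_per_query = 1.0 / 80
print(max_sequential_qps(seconds_per_query))
print(max_pooled_qps(seconds_per_query, connections=4))
```

Under that assumption, the per-query round trip is the ceiling: cutting it (colocation) or overlapping it (several pooled connections) are the two levers.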
I searched online for a while about what "Excessive resource usage" means on SQL Azure, but still cannot get an idea.
Some articles suggest that queries taking too long, using too much memory, etc. will cause "Excessive resource usage". But if I use simple queries and a simple data structure, what will happen?
For example: I use a 1 GB SQL Azure database for session state. Since a session is a very small string that is saved and deleted all the time, I don't think it will grow to 1 GB even with millions of simultaneous sessions. You can calculate: 1 million sessions at 20 chars each take only 20 MB of space, and with a 20-minute expiry it cannot even get close to 1 GB. But there should be lots and lots of queries. Each query will be very simple and fast thanks to an index.
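The estimate above, spelled out as a small Python sketch. It assumes one byte per character and counts only the session payload, not row or index overhead; the function name is made up:

```python
def session_storage_bytes(sessions, bytes_per_session):
    """Raw payload size for a number of sessions (no row or index overhead)."""
    return sessions * bytes_per_session


estimate = session_storage_bytes(1_000_000, 20)
print(estimate / 1_000_000, "MB of a 1 GB database")
```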
I want to know whether this kind of use will be considered "Excessive resource usage". Is there any hard number that limits your usage?
Btw, in the example above, if everything happens in the same datacenter, the total cost is just the 1 GB database, which is $10 a month, right?
Unfortunately the answer is 'it depends'. I think that probably the best reference (with guidance) on the SQL Azure Query Throttle is here: TechNet Article on SQL Azure Performance. This will provide details about the metrics that are monitored and the mechanism of the throttle.
The reason that I say it depends is that the throttle is non-deterministic for any given user. This is because the throttle will be activated based on the total load on the node (physical SQL Server in the Azure DC). While the subscribers who will get throttled are the subscribers delivering the greatest load, the level at which the throttle kicks in will depend on the total load on the node. So if you are on a quiet node (where other tenant DBs are relatively inactive) then you will be able to put through a bunch more throughput than if you are on a busy node.
It is very appealing to use 1GB SQL Azure DBs for session state storage; you've identified the cost benefits. You are taking a risk though. One way to mitigate this risk is to partition across at least two SQL Azure 1GB DBs and adjust the load yourself based on whether one of the DBs starts hitting the throttle.
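A rough Python sketch of that partitioning idea, under some assumptions of mine: the shard index is embedded in the session ID so existing sessions keep going to the database that holds them, and only new sessions follow the adjusted weights. The class and database names are hypothetical:

```python
import random
import uuid


class SessionShardRouter:
    """Spread new sessions across databases using adjustable weights."""

    def __init__(self, shard_names, rng=None):
        self.shard_names = list(shard_names)
        self.weights = [1.0] * len(self.shard_names)
        self._rng = rng or random.Random()

    def set_weight(self, shard_index, weight):
        # Lower the weight of a shard that starts hitting the throttle.
        # At least one weight must stay above zero.
        self.weights[shard_index] = weight

    def new_session_id(self):
        # Pick a shard for the new session and record it in the ID prefix.
        indexes = range(len(self.shard_names))
        index = self._rng.choices(indexes, weights=self.weights)[0]
        return f"{index}-{uuid.uuid4().hex}"

    def shard_for(self, session_id):
        # Existing sessions always map back to the shard named in their ID.
        index = int(session_id.split("-", 1)[0])
        return self.shard_names[index]
```

For example, `router.set_weight(0, 0.2)` would send most new sessions to the second database while the first one recovers; how you detect the throttle is up to your own error handling.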
Another option, if you want deterministic throughput, is to use the Windows Azure Cache to back your session state store. The Cache has hard, pre-defined limits for query throughput, so you can plan for it more easily (see the Azure Caching FAQ, including limits). The Cache approach is probably a bit more expensive but carries a lower risk of problems.
I need to access a Sybase database (12.5) from overseas. The high latency is definitely a problem.
I already optimized the connection parameters to make better use of the network and achieved a 20x performance increase, but it's still not enough : 1 minute to get 3Mb of data.
We need another 10x or 20x increase for our application.
Technical data :
the data are flowing through a single TCP connection using the TDS protocol
the client app is an excel sheet with macros, using the default Sybase driver
the corporate environment makes it difficult to push big changes in the 10+ years architecture, so solutions need to be the least intrusive. But some changes may be bargained due to the importance of this project.
Can anyone give me pointers ?
I already thought of :
splitting SQL requests over several concurrent connections to the database. The problem is data consistency : what if records are modified at the same time since requests will not be exactly executed at the same time ? Is there an existing mechanism to spread a request over several calls on different connections ?
using some kind of database "cache" or "local replication" overseas, but I don't know what is possible.
Thanks.
Try installing a local database (ASE or ASA) and synchronizing the two databases with Sybase MobiLink (or Sybase Replication Server if you need low replication latency and you have a lot of money).
(I know I'm answering my own question.)
Eventually, we settled on designing our own database remote access protocol. It's not complicated since we are only using a basic subset of SQL (SELECT and UPDATE), and the protocol doesn't have to understand SQL anyway.
By using our own protocol, we'll be able to use compression, let the client use several TCP links at the same time, maximize network utilisation and add some functional caching specific to our application.
The client will be our app and the server will be a "proxy" to the real database, sitting next to it (like @Tim suggested in the comments).
It's not the only solution, but we feel that it's a good balance between enormous replication price, development complexity and expected benefits.
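To give an idea of the compression part, here is a minimal Python sketch of a length-prefixed, zlib-compressed frame. It is only an illustration of the technique, not our actual protocol: the frame layout and function names are assumptions.

```python
import struct
import zlib


def encode_frame(payload: bytes) -> bytes:
    """Compress a payload and prefix it with its compressed length (4 bytes)."""
    compressed = zlib.compress(payload, 6)
    return struct.pack("!I", len(compressed)) + compressed


def decode_frame(frame: bytes) -> bytes:
    """Read the 4-byte length prefix and decompress the payload that follows."""
    (length,) = struct.unpack("!I", frame[:4])
    return zlib.decompress(frame[4:4 + length])
```

The proxy next to the database would encode result sets this way and the client would decode them. Repetitive tabular data usually compresses well, but the gain depends on the data.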
I'm writing both client and server code using WCF, where I need to know the "perceived" bandwidth of traffic between the client and server. I could use ping statistics to gather this information separately, but I wonder if there is a way to configure the channel stack in WCF so that the same statistics can be gathered simultaneously while performing my web service invocations. This would be particularly useful in cases where ICMP is disabled (e.g. ping won't work).
In short, while making my regular business-related web service calls (REST calls to be precise), is there a way to collect connection speed data implicitly?
Certainly I could time the web service round trip, compared to the size of data used in the round-trip, to give me an idea of throughput - but I won't know how much of that perceived bandwidth was network related, or simply due to server-processing latency. I could perhaps solve that by having the server send back a time delta, representing server latency, so that the client can compute the actual network traffic time. If a more sophisticated approach is not available, that might be my answer...
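A small Python sketch of that last idea, assuming the server reports its own processing time with each response (for example in a response header; that mechanism is hypothetical). The function name is made up:

```python
def network_throughput(bytes_transferred, round_trip_seconds, server_seconds):
    """Estimate bytes/second on the wire after removing server processing time."""
    network_seconds = round_trip_seconds - server_seconds
    if network_seconds <= 0:
        raise ValueError("server time must be smaller than the round-trip time")
    return bytes_transferred / network_seconds
```

This only separates server time from everything else. Per-call overhead such as connection setup still ends up in the "network" part.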
ICMP was not created with the intention of gathering connection speed statistics, but rather to check whether a valid connection can be made between two hosts.
My best guess is that the amount of data sent in those REST calls or in ICMP traffic is not enough to calculate a perceived connection speed / bandwidth.
If you calculate from such small samples, you will get bandwidth figures that are either very high or very low; the file copy dialog in Windows XP is an example of this. You need a constant and substantial amount of data to be sent in order to calculate valid throughput statistics.
Our database architecture consists of two Sql Server 2005 servers each with an instance of the same database structure: one for all reads, and one for all writes. We use transactional replication to keep the read database up-to-date.
The two servers are very high-spec indeed (the write server has 32GB of RAM), and are connected via a fibre network.
When deciding upon this architecture we were led to believe that the latency for data to be replicated to the read server would be in the order of a few milliseconds (depending on load, obviously). In practice we are seeing latency of around 2-5 seconds in even the simplest of cases, which is unsatisfactory. By a simplest case, I mean updating a single value in a single row in a single table on the write db and seeing how long it takes to observe the new value in the read database.
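For clarity, the measurement described above looks roughly like this Python sketch. `write_value` and `read_value` are placeholders for whatever code updates the row on the write db and reads it back from the read db; they are not real library calls:

```python
import time


def measure_replication_latency(write_value, read_value, new_value,
                                timeout=30.0, poll_interval=0.01):
    """Write a value, then poll the read side until the new value shows up."""
    start = time.monotonic()
    write_value(new_value)
    while time.monotonic() - start < timeout:
        if read_value() == new_value:
            # Elapsed time includes the write itself and up to one poll interval.
            return time.monotonic() - start
        time.sleep(poll_interval)
    raise TimeoutError("value did not reach the read database in time")
```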
What factors should we be looking at to achieve latency below 1 second? Is this even achievable?
Alternatively, is there a different mode of replication we should consider? What is the best practice for the locations of the data and log files?
Edit
Thanks to all for the advice and insight - I believe that the latency periods we are experiencing are normal; we were misled by our db hosting company as to what latency times to expect!
We're using the technique described near the bottom of this MSDN article (under the heading "scaling databases"), and we'd failed to deal properly with this warning:
The consequence of creating such specialized databases is latency: a write is now going to take time to be distributed to the reader databases. But if you can deal with the latency, the scaling potential is huge.
We're now looking at implementing a change to our caching mechanism that enforces reads from the write database when an item of data is considered to be "volatile".
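A minimal Python sketch of what such a rule might look like, assuming an item counts as "volatile" for a fixed window after its last write. The 5-second window is an assumption based on the latency we observed, and the class name is made up:

```python
import time


class VolatileReadRouter:
    """Send reads of recently written keys to the write database."""

    def __init__(self, volatile_seconds=5.0, clock=time.monotonic):
        self.volatile_seconds = volatile_seconds
        self._clock = clock
        self._last_write = {}

    def record_write(self, key):
        # Remember when the key was last written.
        self._last_write[key] = self._clock()

    def target_for_read(self, key):
        # A key is volatile until the replication window has passed.
        written_at = self._last_write.get(key)
        if written_at is not None and self._clock() - written_at < self.volatile_seconds:
            return "write"
        return "read"
```

A real version would also need to evict old entries and share this state across web servers.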
No. It's highly unlikely you could achieve sub-1s latency times with SQL Server transactional replication even with fast hardware.
If you can get 1 - 5 seconds latency then you are doing well.
From here:
Using transactional replication, it is possible for a Subscriber to be a few seconds behind the Publisher. With a latency of only a few seconds, the Subscriber can easily be used as a reporting server, offloading expensive user queries and reporting from the Publisher to the Subscriber.
In the following scenario (using the Customer table shown later in this section) the Subscriber was only four seconds behind the Publisher. Even more impressive, 60 percent of the time it had a latency of two seconds or less. The time is measured from when the record was inserted or updated at the Publisher until it was actually written to the subscribing database.
I would say it's definitely possible.
I would look at:
Your network
Run ping commands between the two servers and see if there are any issues
If the servers are next to each other you should have < 1 ms.
Bottlenecks on the server
This could be network traffic (volume)
Like network cards not being configured for 1 Gbit/s
Anti-virus or other things
Do some analysis on some queries and see if you can identify indexes or locking which might be a problem
See if any of the selects on the read database might be blocking the writes.
Add with (nolock), and see if this makes a difference on one or two queries you're analyzing.
Essentially you have a complicated system which you have a problem with, you need to determine which component is the problem and fix it.
Transactional replication is probably best if the reports / selects you need to run need to be up to date. If they don't you could look at log shipping, although that would add some down time with each import.
For data/log files, make sure they're on separate drives so the performance is maximized.
Something to remember about transactional replication is that a single update now requires several operations to happen for that change to occur.
First you update the source table.
Next the log reader sees the change and writes it to the distribution database.
Next the distribution agent sees the new entry in the distribution database and reads that change, then runs the correct stored procedure on the subscriber to update the row.
If you monitor the statement run times on the two servers you'll probably see that they are running in just a few milliseconds. However it is the lag time while waiting for the log reader and distribution agent to see that they need to do something which is going to kill you.
If you truly need sub-second processing time then you will want to look into writing your own processing engine to handle data moving from one server to another. I would recommend using SQL Server Service Broker to handle this, as this way everything is native to SQL Server and no third-party code has to be written.