Changing PostgreSQL server changed Django app characteristics

I had to switch an enterprise Django 1.11 site from a corporate-hosted PostgreSQL 9.4 server to an AWS RDS Aurora PostgreSQL 10 cluster. My initial impression was that it would be a straightforward migration, since I was not using any version-specific code.
Immediately after the migration, the site started breaking down horribly. Queries that used to take milliseconds suddenly took 100x as long, causing timeouts all over the gunicorn threads. I also kept seeing connections being dropped on both the RDS and Django sides.
It kept looking as if there were some setting I needed to match between the previous server and the current one, but despite engaging PostgreSQL experts and AWS support, there were no simple answers (or even complex ones). I finally had to fine-tune most of the queries in my Django code to bring stability to the site.
The app has several queries that traverse foreign-key relationships, so I used a number of prefetch_related calls and similar tricks to fix the slowdown. For example, a query that had been taking 0.5 seconds jumped to 80 seconds, and after I added prefetch_related it went back to 0.5 seconds.
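For illustration only, since I cannot share the real models: the pattern is the classic N+1 fix, sketched below with made-up model names.

# Hypothetical models, for illustration; the real queries cannot be shared.
# Following a reverse foreign key lazily issues one query per parent row
# (the N+1 pattern); prefetch_related collapses that into one extra query.
from django.db import models

class Author(models.Model):
    name = models.CharField(max_length=100)

class Book(models.Model):
    author = models.ForeignKey(Author, related_name="books",
                               on_delete=models.CASCADE)
    title = models.CharField(max_length=200)

def titles_by_author():
    # Two queries total (authors + all related books), instead of 1 + N
    # queries when each author's books are fetched on first access.
    authors = Author.objects.prefetch_related("books")
    return {a.name: [b.title for b in a.books.all()] for a in authors}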
Even though the site is now stable, I am posting this in the hope that a PostgreSQL and/or Django expert recognizes this as the symptom of some wrong setting. I am not in a position to share sample queries, and I am not asking for query optimization. The question is: what would cause a query to become 100x slower when moving from one PostgreSQL server to another, with no change in application code?

In general, Postgres-compatible Aurora has wildly different performance characteristics from vanilla PostgreSQL, and the configuration and tuning for the two can be very different. If you wanted performance characteristics close to your self-hosted Postgres, the easier path would have been plain AWS RDS for PostgreSQL rather than Aurora PostgreSQL. There are also a number of configuration details you didn't share that can affect performance between RDS and a self-hosted server, including VPC settings, SSL, and so on.
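If you want to chase the wrong-setting theory concretely, one low-effort check is to dump the settings that differ from their built-in defaults on both servers and diff them. A minimal sketch with psycopg2; the DSNs are placeholders, and keep in mind that Aurora fixes or hides some parameters:

# Sketch: list settings that differ from their built-in defaults on two
# servers, so the two configurations can be compared side by side.
import psycopg2

QUERY = """
    SELECT name, setting
    FROM pg_settings
    WHERE setting IS DISTINCT FROM boot_val
    ORDER BY name
"""

def settings_of(dsn):
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute(QUERY)
        return dict(cur.fetchall())

# Placeholder DSNs; point these at the old and new servers.
old = settings_of("host=old-corporate-host dbname=app user=report")
new = settings_of("host=my-aurora-endpoint dbname=app user=report")

for name in sorted(set(old) | set(new)):
    if old.get(name) != new.get(name):
        print(f"{name}: {old.get(name)!r} -> {new.get(name)!r}")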

Related

Redis performance on localhost

I am trying to compare Redis performance against MySQL on my Windows localhost. I am a student and we are learning various things in school. I have around 1,048,580 records in my local MySQL database and I am performing various REST operations. I have also implemented Redis to store the values, using Spring Boot's @Cacheable and Lettuce. It all works fine, but I don't know how to measure the performance to see that Redis is performing better than MySQL. I think the difference would be easier to see at the scale of a very large company. Can I simulate that locally? Also, how do I benchmark Redis performance locally for my academic project?
I have tried sending multiple requests in a loop to measure performance, but I don't see much of a difference on localhost with my records. I have also tried various redis-cli monitoring commands, but I don't see much latency.
Well, it depends on how you are actually testing Redis vs MySQL. You have to keep in mind that MySQL uses caches internally, and if you use Hibernate, it adds another level of caching. If you make the same GET request several times, there will not be any major difference between the Redis and MySQL results.
You should compare your results by doing several different operations, like inserting/deleting/getting thousands of different values, and then the same operations for identical values, etc.
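To make the comparison concrete, a harness only needs to time many distinct reads against both stores, so internal caches can't serve every request. Your setup is Spring Boot, but the idea is language-agnostic; below is a minimal Python sketch, assuming a local MySQL table named records and a Redis instance pre-populated with the same values under records:<id> keys (all names are hypothetical):

# Sketch: time 10,000 distinct primary-key reads against MySQL and Redis.
import random
import time

import pymysql   # pip install pymysql
import redis     # pip install redis

db = pymysql.connect(host="localhost", user="root",
                     password="secret", database="test")
cache = redis.Redis(host="localhost", port=6379)

ids = random.sample(range(1, 1_048_580), 10_000)

start = time.perf_counter()
with db.cursor() as cur:
    for pk in ids:
        cur.execute("SELECT payload FROM records WHERE id = %s", (pk,))
        cur.fetchone()
print("mysql:", time.perf_counter() - start, "s")

start = time.perf_counter()
for pk in ids:
    cache.get(f"records:{pk}")
print("redis:", time.perf_counter() - start, "s")

Redis also ships a redis-benchmark command-line tool if you only need raw Redis throughput numbers.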

How to analyze poor performance from Azure PostgreSQL PaaS

I'm experiencing poor performance from Azure PostgreSQL PaaS and need help with how to proceed.
I'm trying out Azure PostgreSQL PaaS in a project. I'm experiencing intolerable performance from the database (or at least it seems like the database is the problem).
Our application is running in an Azure VM, and both the VM and the database are located in Western Europe.
The network between the VM and the database seems to perform OK. (Using psping from Sysinternals against the database port 5432, I get latencies between 2 ms and 4 ms.)
PostgreSQL includes a benchmark tool called pgbench. This tool runs a sequence of simple SQL statements against a test dataset and reports timings.
I ran pgbench on the VM against the database. pgbench reports latencies between 800 ms and 1600 ms.
If I run the same pgbench test in-house, on our local network against an in-house database, I typically get latencies below 10 ms.
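The same round-trip overhead can be demonstrated without pgbench: even a trivial statement pays the full per-query latency. A minimal Python/psycopg2 sketch with placeholder connection details:

# Sketch: measure the round-trip latency of a trivial statement, which
# isolates per-query network/gateway overhead from real query cost.
import statistics
import time

import psycopg2

# Placeholder connection string; replace with your own server and user.
conn = psycopg2.connect("host=myserver.postgres.database.azure.com "
                        "dbname=postgres user=me@myserver password=REPLACE_ME")
cur = conn.cursor()

samples = []
for _ in range(100):
    start = time.perf_counter()
    cur.execute("SELECT 1")
    cur.fetchone()
    samples.append((time.perf_counter() - start) * 1000)

print(f"median {statistics.median(samples):.2f} ms, "
      f"max {max(samples):.2f} ms")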
I tried to contact Microsoft support about this, but I was basically told that since the network seems to perform OK, this must be a PostgreSQL software problem and not related to Microsoft.
Since the database is PaaS, I only have limited access to logs and metrics.
Can anyone please help or advise me on how to proceed?
Performance of the Azure PostgreSQL PaaS offering depends on various server and client configuration choices, including the provisioned SKU and storage IOPS. Microsoft engineering has published a series of performance blog posts that help customers obtain measurable, empirical gains for their workloads. Please review these blog posts:
Performance best practices for Azure PostgreSQL
Performance tuning of Azure PostgreSQL
Performance quick tips for Azure PostgreSQL
Is your in-house Postgres set up similarly to the setup in Azure?
I had the same issue. We moved from a dedicated VM (Ubuntu, size Standard B2s, 2 vCPUs, 4 GiB memory, ~35 € p.m.) running PostgreSQL to the Azure managed PostgreSQL instance (General Purpose, single server, 2 vCPUs, 10 GB memory, ~130 € p.m.).
I first noticed the bad performance when the main API request of our web application suddenly took 3 s instead of 1.7-2 s.
I ran some very simple timing tests on my old setup with dedicated VM:
select count(*) from mytable;
count
-------
4686
Time: 0.940 ms
And those are the timings of the new setup with Azure managed PostgreSQL:
select count(*) from mytable;
count
-------
4686
Time: 21.353 ms
I think I do not have to explain these numbers :)
I have created a support ticket, and got some insights:
"In Azure PostgreSQL single server, we have a gateway to manage and route connections and there are always 3 copies of the data to ensure your data is not lost, and all of this will create latency."
I also asked what the benefits are of the managed database:
A: "Being an instance running on Azure, you benefit from:
- Automatic patching; your instance is upgraded automatically.
- Crash recovery: if our system detects that the instance is not running, it tries to perform a restart/switchover to a new host. If all this fails, an on-call engineer is activated to manually restore the instance.
- Automatic backups and one-click point-in-time restore.
- Redundancy of data."
They suggested that I switch from Single Server to a Flexible Server, which ditches the gateway; performance should apparently be better, but still not as good as on a dedicated VM:
"In several tests we’ve made, the performance comparing to single server is much better. But to setup the right expectactions, you will not get 1 to 1 performance as having PostgreSQL running in a dedicated virtual machine."
I asked for the results of those tests, I will post them here as soon as I get them.
I think you have to decide whether the benefits mentioned above are worth paying at least 4 times as much compared to a dedicated VM, and whether you can live with the worse performance. We will now switch back to a master/slave configuration with 2 dedicated VMs.

Query Terminating in Redshift

We are migrating our database from SQL Server 2012 to Amazon Redshift.
The front end of our application is developed in MicroStrategy (MSTR) which fires the queries on Redshift.
Although the application is working fine in Production (on SQL Server 2012), we have run into a strange issue in our PoC Environment on Redshift.
When we kicked off a dashboard in MSTR, the query from the dashboard hits Redshift and it completes successfully without any issues.
But when we stress-test the application by running all the dashboards simultaneously, that particular dashboard's query terminates in Redshift. The database does not throw any error message, which is why we cannot troubleshoot why the query is terminating.
Can anyone please suggest how we should go about solving this problem?
Thank you
The problem might be that you have a timeout on the queue you are sending the query to, set in the WLM configuration.
Redshift is designed differently from other databases; it is optimized for analytical queries. For that reason it doesn't cache query results, as you would with an OLTP database. The other difference is that there is a predefined concurrency level (also part of WLM - http://docs.aws.amazon.com/redshift/latest/mgmt/workload-mgmt-config.html). Each concurrency slot has its own allocated resources to complete big queries quickly, but this limits the number of queries that can run concurrently. The default is 5, and you can increase it up to 50. The recommendation is to increase it to no more than 15-20, because at 50 each query gets only 2% of the cluster's resources, instead of 20% (at 5) or 5% (at 20).
The combination of these two differences means that if you connect many dashboards, each one sends its queries to Redshift, they compete over resources (without caching, each query runs again and again), and queries might time out or simply be too slow for an interactive dashboard.
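To confirm this, you can ask Redshift itself why the queries disappeared: the system tables record aborted queries and any actions taken by WLM query monitoring rules. A minimal Python/psycopg2 sketch with placeholder credentials:

# Sketch: look for recently aborted queries and WLM rule actions that could
# explain queries terminating without a client-side error message.
import psycopg2

conn = psycopg2.connect(host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",
                        port=5439, dbname="dev",
                        user="admin", password="REPLACE_ME")
cur = conn.cursor()

# Queries aborted in the last hour (aborted = 1 means they did not finish).
cur.execute("""
    SELECT query, starttime, SUBSTRING(querytxt, 1, 80)
    FROM stl_query
    WHERE aborted = 1 AND starttime > DATEADD(hour, -1, GETDATE())
    ORDER BY starttime DESC
""")
for row in cur.fetchall():
    print("aborted:", row)

# Actions taken by WLM query monitoring rules (log / hop / abort).
cur.execute("""
    SELECT query, rule, action, recordtime
    FROM stl_wlm_rule_action
    ORDER BY recordtime DESC
    LIMIT 20
""")
for row in cur.fetchall():
    print("wlm rule action:", row)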
Please make sure that you are using the Redshift-optimized drivers for MicroStrategy, which send queries to Redshift with the above characteristics in mind.
You can also consider putting an RDS instance between your dashboards and Redshift, holding the aggregated data you need for your dashboards, which gives you in-memory caching and higher concurrency on that summary data. There is an interesting pattern you can implement with pg-bouncer that can help you send some queries (the analytical ones) to Redshift, and some (the aggregated dashboard ones) to a PostgreSQL instance.

SQL Azure - very slow compared to localhost database

I decided I wanted to try out Microsoft SQL Azure, as many people have spoken very highly of it. It is supposed to be fast, flexible, cheap, and many other things.
I got it up and running, migrated my data to Azure, and hooked up the connection string. I tried to run some queries against the database and was shocked at how slow even simple queries were. A SELECT * from a table with 700 rows took 7 seconds. My page also seems extremely slow compared to when I used Management Studio against a localhost database or a database on shared hosting.
Now, when I set up my server, I couldn't pick a physical location. However, I live in Denmark, and I can see the server is in "South Central US". This might be the issue.
I don't use any stored procedures (so I guess no parameter sniffing). I can also see that my indexes were transferred successfully.
Any ideas on what to do? Any performance things I am missing?
I ran into this very issue in the last few days. Change your database tier from Basic to Standard and you will see a HUGE increase in performance. I am working on a query-intensive dashboard at the moment; the change took a 20-second response time down to 2 seconds.
I've used Azure for many years now, and my original question is pretty much solved.
My main take-aways after dealing with Azure databases for a while:
It's extremely important that your application and database are placed in the same region; if they aren't, your application will be slow. Recently I had an API and an app running in two different regions - every response took ~1 second. After moving them to the same region, responses were near-instant.
If your application has a high load, it's often a good idea to upgrade the tier. This becomes necessary earlier than you might expect.
Pick the nearest region - it really matters

Need hints to optimise Sybase access over a big fat pipe

I need to access a Sybase database (12.5) from overseas. The high latency is definitely a problem.
I have already optimized the connection parameters to make better use of the network and achieved a 20x performance increase, but it's still not enough: 1 minute to get 3 MB of data.
We need another 10x or 20x improvement for our application.
Technical data:
the data flows through a single TCP connection using the TDS protocol
the client app is an Excel sheet with macros, using the default Sybase driver
the corporate environment makes it difficult to push big changes to the 10+-year-old architecture, so solutions need to be minimally intrusive. But some changes may be negotiable given the importance of this project.
Can anyone give me pointers ?
I already thought of :
splitting SQL requests over several concurrent connections to the database. The problem is data consistency: what if records are modified in the meantime, since the requests will not be executed at exactly the same time? Is there an existing mechanism to spread a request over several calls on different connections?
using some kind of database "cache" or "local replica" overseas, but I don't know what is possible.
Thanks.
Try installing a local database (ASE or ASA) and synchronizing it with the remote Sybase database using Sybase MobiLink (or Sybase Replication Server if you need low replication latency and have a lot of money).
(I know I am answering my own question.)
Eventually, we settled on designing our own remote database access protocol. It's not complicated, since we are only using a basic subset of SQL (SELECT and UPDATE), and the protocol doesn't have to understand SQL anyway.
By using our own protocol, we'll be able to use compression, let the client use several TCP links at the same time, maximize network utilisation, and add some functional caching specific to our application.
The client will be our app, and the server will be a "proxy" to the real database, sitting next to it (as @Tim suggested in the comments).
It's not the only solution, but we feel that it strikes a good balance between the enormous price of replication, development complexity, and the expected benefits.
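For what it's worth, here is a toy sketch of the proxy idea, not our actual implementation: the proxy runs next to the database, executes the query over the fast local link, and ships the result back compressed over the slow WAN link. The wire format (a 4-byte length prefix plus a zlib-compressed pickled payload) and all names are invented for illustration; a real deployment would also need authentication and query validation.

# Toy sketch: a query proxy that sits next to the database and returns
# compressed results to a remote client over a single TCP connection.
import pickle
import socketserver
import struct
import zlib


def run_query(sql):
    # Placeholder for a real call to the local Sybase server (e.g. via an
    # ODBC driver); returns a list of row tuples.
    return [("example", 1)]


class ProxyHandler(socketserver.StreamRequestHandler):
    def handle(self):
        # Read a 4-byte big-endian length, then that many bytes of SQL text.
        length, = struct.unpack("!I", self.rfile.read(4))
        sql = self.rfile.read(length).decode("utf-8")
        # Execute locally, then compress the serialized result for the WAN.
        payload = zlib.compress(pickle.dumps(run_query(sql)))
        self.wfile.write(struct.pack("!I", len(payload)) + payload)


if __name__ == "__main__":
    with socketserver.TCPServer(("0.0.0.0", 9000), ProxyHandler) as server:
        server.serve_forever()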