Postgres: Paginating an FTS Query - sql

What is the best way to paginate an FTS query? LIMIT and OFFSET spring to mind. However, I am concerned that by using LIMIT and OFFSET I'd be running the same query over and over (i.e., once for page 1, again for page 2, and so on).
Will PostgreSQL be smart enough to transparently cache the query result, so that subsequent pagination queries are satisfied from a cache? If not, how do I paginate efficiently?
Edit
The database is for single-user desktop analytics, but I still want to know what the best approach would be if this were a live OLTP application. I have addressed the problem in the past with SQL Server by creating an ordered set of document IDs and caching the query parameters against those IDs in a separate table, clearing the cache every few hours so that new documents can enter the result set.
Perhaps this approach is viable for Postgres, but I still want to know what mechanisms the database provides and how best to leverage them. If I were a DB developer, I'd make the query-response cache work with the FTS system.

A server-side SQL cursor can be used effectively for this if a client session can be tied to a specific database connection that stays open for the entire session, because cursors cannot be shared between different connections. For a desktop app with one connection per running instance, that's fine.
The documentation for DECLARE CURSOR explains how the result set is materialized when the cursor is declared WITH HOLD in a committed transaction.
Locking shouldn't be a concern at all: if the data is modified after the cursor has been materialized, that neither affects the reader nor blocks the writer.
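For illustration, here is a minimal sketch of that approach (the documents table and tsv column are assumptions): the FTS result is materialized when the WITH HOLD cursor's transaction commits, and each page is then served with a FETCH.

BEGIN;
DECLARE fts_results CURSOR WITH HOLD FOR
    SELECT id, title
    FROM documents
    WHERE tsv @@ plainto_tsquery('english', 'search terms')
    ORDER BY ts_rank(tsv, plainto_tsquery('english', 'search terms')) DESC;
COMMIT;                             -- the result set is materialized at commit

FETCH FORWARD 20 FROM fts_results;  -- page 1
FETCH FORWARD 20 FROM fts_results;  -- page 2
MOVE FORWARD 20 IN fts_results;     -- skip a page without transferring rows
FETCH FORWARD 20 FROM fts_results;  -- page 4
CLOSE fts_results;                  -- release the materialized result when done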
Other than that, there is no implicit query cache in PostgreSQL. The LIMIT/OFFSET technique implies a new execution of the query for each page, which may be as slow as the initial query depending on the complexity of the execution plan and the effectiveness of the buffer cache and disk cache.
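For comparison, the LIMIT/OFFSET variant re-runs the full query for every page and discards the rows before the offset (same assumed table and column names as above):

SELECT id, title,
       ts_rank(tsv, plainto_tsquery('english', 'search terms')) AS rank
FROM documents
WHERE tsv @@ plainto_tsquery('english', 'search terms')
ORDER BY rank DESC
LIMIT 20 OFFSET 40;   -- page 3 at 20 rows per page; later pages cost progressively more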

Well, to be honest, what you may want is for your query to return a live cursor that you can then reuse to fetch portions of the results it represents. I don't know whether PostgreSQL supports this well; MongoDB does, and I've tried going down that road and it's not great. For example: how much time will pass between when a query is run and when a second page of its results is requested? Can the cursor stay open for that amount of time? And if it can, what does that mean exactly - will it hold resources, so that if you have many slow users who start queries but take a long time to page through the results, your server might be bogged down by open cursors?
Honestly, I think re-running a paginated query each time someone asks for a certain page is fine. First, you'll be returning a small number of entries (there's no need to display more than 10-20 at a time), so it's going to be pretty fast; second, you're better off tuning your server so it executes frequent requests quickly (add indexes, put it behind a Solr server if necessary, etc.) rather than letting those queries run slowly and caching the results.
Finally, if you really want to speed up full-text searches, with fancy features like case-insensitive, prefix and suffix matching, you should take a look at Lucene, or better yet Solr (which is Lucene on steroids), as a search and indexing layer between your users and your persistence tier.

Related

Handling paging with changing sort orders

I'm creating a RESTful web service (in Golang) which pulls a set of rows from the database and returns them to a client (smartphone app or web application). The service needs to be able to provide paging. The only problem is that this data is sorted on a regularly changing "computed" column (for example, the number of "thumbs up" or "thumbs down" a piece of content on a website has), so rows can jump between page numbers in between a client's requests.
I've looked at a few PostgreSQL features that I could potentially use to help me solve this problem, but nothing really seems to be a very good solution.
Materialized Views: to hold "stale" data which is only updated every once in a while. This doesn't really solve the problem, as the data would still jump around if the user happens to be paging through the data when the Materialized View is updated.
Cursors: created for each client session and held between requests. This seems like it would be a nightmare if there are a lot of concurrent sessions at once (which there will be).
Does anybody have any suggestions on how to handle this, either on the client side or database side? Is there anything I can really do, or is an issue such as this normally just remedied by the clients consuming the data?
Edit: I should mention that the smartphone app allows users to view more pieces of data through "infinite scrolling", so it keeps track of its own list of data client-side.
This is a problem without a perfectly satisfactory solution because you're trying to combine essentially incompatible requirements:
Send only the required amount of data to the client on-demand, i.e. you can't download the whole dataset then paginate it client-side.
Minimise the amount of per-client state that the server must keep track of, for scalability with large numbers of clients.
Maintain different state for each client
This is a "pick any two" kind of situation. You have to compromise; accept that you can't keep each client's pagination state exactly right, accept that you have to download a big data set to the client, or accept that you have to use a huge amount of server resources to maintain client state.
There are variations within those that mix the various compromises, but that's what it all boils down to.
For example, some people will send the client some extra data, enough to satisfy most client requirements. If the client exceeds that, then it gets broken pagination.
Some systems will cache client state for a short period (with short-lived unlogged tables, tempfiles, or whatever), but expire it quickly, so if the client isn't constantly asking for fresh data it gets broken pagination.
Etc.
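As a rough illustration, the short-lived unlogged-table variant mentioned above might look something like this (all names and values here are assumptions): each client's result set is frozen under a client token, pages are served from the frozen copy, and stale state is expired periodically.

CREATE UNLOGGED TABLE IF NOT EXISTS paging_cache (
    client_token text        NOT NULL,
    row_num      bigint      NOT NULL,
    payload      jsonb       NOT NULL,
    created_at   timestamptz NOT NULL DEFAULT now(),
    PRIMARY KEY (client_token, row_num)
);

-- Fill it once per client query:
INSERT INTO paging_cache (client_token, row_num, payload)
SELECT 'abc123', row_number() OVER (ORDER BY thumbs_up DESC, id), to_jsonb(c)
FROM content AS c;

-- Serve page 2 (rows 21-40) from the frozen snapshot:
SELECT payload
FROM paging_cache
WHERE client_token = 'abc123' AND row_num BETWEEN 21 AND 40
ORDER BY row_num;

-- Expire stale client state:
DELETE FROM paging_cache WHERE created_at < now() - interval '10 minutes';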
See also:
How to provide an API client with 1,000,000 database results?
Using "Cursors" for paging in PostgreSQL
Iterate over large external postgres db, manipulate rows, write output to rails postgres db
offset/limit performance optimization
If PostgreSQL count(*) is always slow how to paginate complex queries?
How to return sample row from database one by one
I'd probably implement a hybrid solution of some form, like:
Using a cursor, read and immediately send the first part of the data to the client.
Immediately fetch enough extra data from the cursor to satisfy 99% of clients' requirements. Store it to a fast, unsafe cache like memcached, Redis, BigMemory, EHCache, whatever under a key that'll let me retrieve it for later requests by the same client. Then close the cursor to free the DB resources.
Expire the cache on a least-recently-used basis, so if the client doesn't keep reading fast enough they have to go get a fresh set of data from the DB, and the pagination changes.
If the client wants more results than the vast majority of its peers, pagination will change at some point as you switch to reading direct from the DB rather than the cache or generate a new bigger cached dataset.
That way most clients won't notice pagination issues and you don't have to send vast amounts of data to most clients, but you won't melt your DB server. However, you need a big boofy cache to get away with this. Whether it's practical depends on whether your clients can cope with pagination breaking - if it's simply not acceptable to break pagination, then you're stuck with doing it DB-side with cursors, temp tables, copying the whole result set at the first request, etc. It also depends on the data set size and how much data each client usually requires.
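The database-facing half of that hybrid might look roughly like this (names assumed); the application would push the prefetched rows into Redis/memcached under the client's key before closing the cursor:

BEGIN;
DECLARE feed_cur CURSOR FOR
    SELECT id, title, score
    FROM content
    ORDER BY score DESC, id;

FETCH FORWARD 20  FROM feed_cur;  -- first page: send straight to the client
FETCH FORWARD 480 FROM feed_cur;  -- prefetch enough for ~99% of clients;
                                  -- the app writes these rows to the cache
CLOSE feed_cur;                   -- free DB resources immediately
COMMIT;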
I am not aware of a perfect solution to this problem. But if you want the user to have a stale view of the data, then a cursor is the way to go. The only tuning you can do is to store just the data for the first two pages in the cursor; beyond that, you fetch it again.

Why does my SELECT query take so much longer to run on the web server than on the database itself?

I'm running the following setup:
Physical Server
Windows 2003 Standard Edition R2 SP2
IIS 6
ColdFusion 8
JDBC connection to iSeries AS400 using JT400 driver
I am running a simple SQL query against a file in the database:
SELECT
column1,
column2,
column3,
....
FROM LIB/MYFILE
No conditions.
The file has 81 columns - alphanumeric and numeric - and about 16,000 records.
When I run the query in the emulator using the STRSQL command, the query comes back immediately.
When I run the query on my Web Server, it takes about 30 seconds.
Why is this happening, and is there any way to reduce this time?
While I cannot address whatever overhead might be involved in your web server, I can say there are several other factors to consider:
This most likely has to do with the differences in the way the two system interfaces work.
Your interactive STRSQL session will start displaying results as quickly as it receives the first few pages of data. You are able to page down through that initial data, but generally at some point you will see a status message at the bottom of the screen indicating that it is now getting more data.
I assume your web server is waiting until it receives the entire result set: it wants all the data before it builds and sends the HTML page. Thus you will naturally wait longer.
If this is not how your web server application works, then it is likely to be a JT400 JDBC Properties issue.
If you have overridden any default settings, make sure that those are appropriate.
In some situations the OPTIMIZATION_GOAL settings might be a factor. But if you are reading the table (aka physical file or PF) directly, in its physical sequence, without any index or key, then that might not apply here.
Your interactive STRSQL session will default to a setting of *FIRSTIO, meaning that the query is optimized for returning the first pages of data quickly, which corresponds to the way it works.
Your JDBC connection will default to a "query optimize goal" of "0", which will translate to an OPTIMIZATION_GOAL setting of *ALLIO, unless you are using extended dynamic packages. *ALLIO means the optimizer will try to minimize the time needed to return the entire result set, not just the first pages.
Or, perhaps first try simply adding FOR READ ONLY onto the end of your SELECT statement.
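That is, something like the following (declaring the statement read-only can let the driver use blocked fetches, though how much it helps will depend on your setup):

SELECT
column1,
column2,
column3,
....
FROM LIB/MYFILE
FOR READ ONLY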
Update: a more advanced solution
You may be able to bypass the delay caused by waiting for the entire result set as part of constructing the web page to be sent.
Send a web page out to the browser without any records, or limited records, but use AJAX code to load the remainder of the data behind the scenes.
Use large block fetches whenever feasible, to grab plenty of rows in one clip.
One thing you need to remember: the IBM i saves the access paths it creates in the job in case they are needed again. That means if you log out, log back in, and run your query, it will take longer the first time, and the second time you run the query it'll be faster. When running queries in a web application, you may or may not be reusing a job, which means the access paths may have to be rebuilt.
If speed is important, I would:
Look into optimizing the query. I know there are better sources, but I can't find them right now.
Create a stored procedure. A stored procedure saves the access paths created.
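A minimal sketch of the stored procedure option (the procedure name GET_MYFILE is an assumption, and this uses SQL naming; adjust for your library and naming convention):

CREATE PROCEDURE LIB.GET_MYFILE ()
    DYNAMIC RESULT SETS 1
    LANGUAGE SQL
BEGIN
    DECLARE c1 CURSOR WITH RETURN FOR
        SELECT column1, column2, column3
        FROM LIB.MYFILE
        FOR READ ONLY;
    OPEN c1;
END

The web application would then simply CALL LIB.GET_MYFILE() and read the returned result set.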
With only 16000 rows and no WHERE or ORDER BY this thing should scream. Break the problem down to help diagnose where the bottleneck is. Go back to the IBM i, run your query in the SQL command line and then use the B, BOT or BOTTOM command to tell the database to show the last row. THAT will force the database to cough up the entire 16k result set, and give you a better idea of the raw performance on the IBM side. If that's poor, have the IBM administrators run Navigator and monitor the performance for you. It might be something unexpected, like the 'table' is really a view and the columns you are selecting might be user defined functions.
If the performance on the IBM side is OK, then look to what Cold Fusion is doing with the result set. Not being a CF programmer, I'm no help there. But generally, when I am tasked with solving multi-platform performance issues, the client side tends to consume the entire result set and then use program logic to choose what rows to display/work with. The server is MUCH faster than the client, and given the right hints, the database optimiser can make some very good decisions about how to get at those rows.

NHibernate Caching Dilemma

My application includes a client, web tier (load balanced), application tier (load balanced), and database tier. The web tier exposes services to clients, and forwards calls onto the application tier. The application tier then executes queries against the database (using NHibernate) and returns the results.
Data is mostly read, but writes occur fairly frequently, particularly as new data enters the system. Much more often than not, data is aggregated and those aggregations are returned to the client - not the original data.
Typically, users will be interested in the aggregation of recent data - say, from the past week. Thus, to me it makes sense to introduce a cache that includes all data from the past 7 days. I cannot just cache entities as and when they are loaded because I need to aggregate over a range of entities, and that range is dictated by the client, along with other complications, such as filters. I need to know whether - for a given range of time - all data within that range is in the cache or not.
In my ideal fantasy world, my services would not have to change at all:
public AggregationResults DoIt(DateTime starting, DateTime ending, Filter filter)
{
    // execute HQL/criteria call and have it automatically use the cache where possible
}
There would be a separate filtering layer that would hook into NHibernate and intelligently and transparently determine whether the HQL/criteria query could be executed against the cache or not, and would only go to the database if necessary. If all the data was in the cache, it would query the cached data itself, kind of like an in-memory database.
However, on first inspection, NHibernate's second level cache mechanism does not seem appropriate for my needs. What I'd like to be able to do is:
Configure it to always have the last 7 days' worth of data in the cache, e.g. "For this table, cache all records where this field is between 7 days ago and now."
Have the ability to manually maintain the cache. As new data enters the system, it would be nice if I could just throw it straight into the cache rather than waiting until the cache is invalidated. Similarly, as data falls out of the time period, I'd like to be able to pull it from the cache.
Have NHibernate intelligently understand when it can serve a query directly from the cache rather than hitting the database at all. eg. If the user asks for an aggregate of data over the past 3 days, that aggregation should be calculated directly from the cache rather than touching the DB.
Now, I'm pretty sure #3 is asking too much. Even if I can get the cache populated with all the data required, NHibernate has no idea how to efficiently query that data. It would literally have to loop over all entities in order to discriminate which are relevant to the query (which might be fine, to be honest). Also, it would require an implementation of NHibernate's query engine that executed against objects rather than a database. But I can dream, right?
Assuming #3 is asking too much, I would require some logic in my services like this:
public AggregationResults DoIt(DateTime starting, DateTime ending, Filter filter)
{
    if (CanBeServicedFromCache(starting, ending, filter))
    {
        // execute some LINQ to object code or whatever to determine the aggregation results
    }
    else
    {
        // execute HQL/criteria call to determine the aggregation results
    }
}
This isn't ideal because each service must be cache-aware, and must duplicate the aggregation logic: once for querying the database via NHibernate, and once for querying the cache.
That said, it would be nice if I could at least store the relevant data in NHibernate's second level cache. Doing so would allow other services (that don't do aggregation) to transparently benefit from the cache. It would also ensure that I'm not doubling up on cached entities (once in the second level cache, and once in my own separate cache) if I ever decide the second level cache is required elsewhere in the system.
I suspect if I can get a hold of the implementation of ICache at runtime, all I would need to do is call the Put() method to stick my data into the cache. But this might be treading on dangerous ground...
Can anyone provide any insight as to whether any of my requirements can be met by NHibernate's second level cache mechanism? Or should I just roll my own solution and forgo NHibernate's second level cache altogether?
Thanks
PS. I've already considered a cube to do the aggregation calculations much more quickly, but that still leaves me with the database as the bottleneck. I may well use a cube in addition to the cache, but the lack of a cache is my primary concern right now.
Stop using your transactional (OLTP) datasource for analytical (OLAP) queries and the problem goes away.
When a domain-significant event occurs (e.g. a new entity enters the system or is updated), fire an event (à la domain events). Wire up a handler for the event that takes the details of the created or updated entity and stores the data in a denormalised reporting store specifically designed to allow reporting of the aggregates you desire (most likely pushing the data into a star schema). Now your reporting is simply querying aggregates (which may even be precalculated) along predefined axes, requiring nothing more than a simple select and a few joins. Querying can be carried out using something like L2SQL or even simple parameterised queries and data readers.
Performance gains should be significant as you can optimise the read side for fast lookups across many criteria while optimising the write side for fast lookups by id and reduced index load on write.
Additional performance and scalability is also gained as once you have migrated to this approach, you can then physically separate your read and write stores such that you can run n read stores for every write store thereby allowing your solution to scale out to meet increased read demands while write demands increase at a lower rate.
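A hedged sketch of what that might look like, using PostgreSQL-flavoured SQL and entirely assumed names: the event handler upserts into a small fact table, and reporting becomes a trivial aggregate query over it.

CREATE TABLE daily_activity_fact (
    activity_date date    NOT NULL,
    region_id     int     NOT NULL,
    product_id    int     NOT NULL,
    event_count   bigint  NOT NULL,
    total_amount  numeric NOT NULL,
    PRIMARY KEY (activity_date, region_id, product_id)
);

-- Event handler: fold each created/updated entity into its fact row.
INSERT INTO daily_activity_fact AS f
VALUES ('2024-05-01', 3, 17, 1, 49.95)
ON CONFLICT (activity_date, region_id, product_id)
DO UPDATE SET event_count  = f.event_count + 1,
              total_amount = f.total_amount + EXCLUDED.total_amount;

-- Reporting: a simple select over precalculated aggregates.
SELECT activity_date, SUM(event_count) AS events, SUM(total_amount) AS amount
FROM daily_activity_fact
WHERE activity_date >= current_date - 7
GROUP BY activity_date
ORDER BY activity_date;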
Define 2 cache regions "aggregation" and "aggregation.today" with a large expiry time. Use these for your aggregation queries for previous days and today respectively.
In DoIt(), make 1 NH query per day in the requested range using cacheable queries. Combine the query results in C#.
Prime the cache with a background process which calls DoIt() periodically with the date range that you need cached. The interval between runs must be shorter than the expiry time of the aggregation cache regions.
When today's data changes, clear cache region "aggregation.today". If you want to reload this cache region quickly, either do so immediately or have another more frequent background process which calls DoIt() for today.
When you have query caching enabled, NHibernate will pull the results from the cache if possible, based on the query and parameter values.
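For what it's worth, each of those per-day cacheable queries would map to SQL along these lines (table and column names assumed); because the SQL and parameter values are identical from one call to the next, previous days can be served straight from the "aggregation" region:

SELECT sum(amount) AS total, count(*) AS entries
FROM measurements
WHERE recorded_at >= :day_start
  AND recorded_at <  :day_end;   -- :day_start / :day_end bound per day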
When analyzing the NHibernate cache details, I remember reading that you should not rely on the cache being there, which seems like good advice.
Instead of trying to make your O/R mapper cover your application's needs, I think rolling your own data/cache management strategy might be more reasonable.
Also, the 7-day caching rule you talk about sounds like something business-related, which is something the O/R mapper should not know about.
In conclusion, make your app work without any caching at all, then use a profiler (or several - the .NET, SQL and NHibernate profilers) to see where the bottlenecks are and start improving the "red" parts by adding caching or other optimizations.
PS: about caching in general - in my experience one caching point is fine, two caches is in the gray zone and you should have a strong reason for the separation, and more than two is asking for trouble.
Hope it helps.

mysql slow on first query, then fast for related queries

I have been struggling with a problem that only happens when the database has been idle for a period of time for the data queried. The first query will be extremely slow, on the order of 30 seconds, and then related queries will be fast, around 0.1 seconds. I am assuming this is related to caching, but I have been unable to find the cause of it.
Changing the MySQL variables tmp_table_size and max_heap_table_size to a larger size had no effect except to create the temp tables in memory.
I do not think this is related to the query itself as it is well indexed and after the first slow query, variants of the same query do not show up in the slow query log. I am most interested in trying to determine the cause of this or a way to reset the offending cache so I can troubleshoot the issue.
Pages of the InnoDB data files get cached in the InnoDB buffer pool. This is what you'd expect. Reading files is slow, even on good hard drives, especially random reads, which is mostly what databases see.
It may be that your first query is doing some kind of table scan which pulls a lot of pages into the buffer pool, then accessing them is fast. Or something similar.
This is what I'd expect.
Ideally, use the same engine for all tables (exceptions: system tables, temporary tables (perhaps), and very small or short-lived ones). If you don't do this, the engines have to fight for RAM.
Assuming all your tables are InnoDB, make the buffer pool use up to 75% of the server's physical RAM (assuming you don't run too many other tasks on the machine).
Then you will be able to fit around 12G of your database into RAM, so once it's "warmed up", the "most used" 12G of your database will be in RAM, where accessing it is nice and fast.
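For example (the 12G figure assumes roughly a 16G box as described above; online resizing only works on MySQL 5.7.5+, on older versions set innodb_buffer_pool_size in my.cnf and restart):

SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
SET GLOBAL innodb_buffer_pool_size = 12 * 1024 * 1024 * 1024;  -- ~12G, about 75% of RAM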
Some MySQL users "warm up" production servers after a restart by replaying queries copied from another machine for a while (these will be replication slaves) before adding them into their production pool. This avoids the extreme slowness seen while the cache is cold. For example, YouTube does this (or at least it used to; Google bought them and they may now use Google-fu).
MySQL Workbench:
The below isn't 100% related to this SO question, but the symptoms are very related and this is the first SO result when searching for "mysql workbench slow" or related terms, so hopefully it's useful for others.
Clear the query history! Following the process at "MySql workbench query history ( last executed query / queries ) i.e. create / alter table, select, insert update queries" to clear MySQL Workbench's query history really sped up the program for me.
In summary: change the Output pane to History Output, right click on a Date and select Delete All Logs.
The issue I was experiencing was "slow first query" in that it would take a few seconds to load the results even though the duration/fetch were well under 1 second. After clearing my query history, the duration/fetch times stayed the same (well under 1 second, so no DB behavior actually changed), but now the results loaded instantly rather than after a few second delay.
Is anything else running on your mysql server? My thought is that maybe after the first query, your table is still cached in memory. Once it's idle, another process is causing it to be de-cached. Just a guess though.
How much memory do you have, and what else is running?
I had an SSIS package that was timing out. The query was very simple, from a single MySQL table, but it sometimes returned a lot of records and would sometimes take a few minutes initially to execute, then only a few milliseconds afterwards if I wanted to query it again. We were stuck with the ADO connection, which meant it would time out after 30 seconds, so about half the databases we were trying to load were failing.
After beating my head against the wall, I tried performing an initial query first: something very simple that only returned a few rows. Since it was so simple, it executed quickly and got the table into the cache for faster querying. In the next step of the package I would run the more complex query which returned the large data set that kept timing out. Problem solved - all tables loaded. I may start doing this on a regular basis; the complex queries execute much faster after doing a simple query first.
Try comparing the output of "vmstat 1" on the Linux command line when running the query after a period of idle time versus when you re-run it and get results fast. Specifically check the "bi" column (that's the KB read from disk per second).
You may find the operating system is caching the disk blocks in the fast case (and thus a low "bi" figure), but not in the slow case (and hence a large "bi" figure).
You might also find that vmstat shows high/low CPU usage in either case. If it's low when fast, and disk throughput is also low, then your system may still be returning a cached query, even though you've indicated the relevant config value is set to zero. Perhaps check the output of SHOW ENGINE INNODB STATUS and SHOW VARIABLES and confirm.
innodb_buffer_pool_size may also be set high (it should be...), which would cache the blocks even before the OS can return them.
You might also find that "key_buffer" is set high - this would cache the keys in the indexes, which could make your select blindingly fast.
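The checks mentioned above, spelled out:

SHOW ENGINE INNODB STATUS;                       -- buffer pool hit rate and I/O activity
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
SHOW VARIABLES LIKE 'key_buffer_size';           -- MyISAM index cache
SHOW VARIABLES LIKE 'query_cache%';              -- confirm whether the query cache is really off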
Try checking the MySQL Performance Blog site for lots of useful info.
I had an issue where MySQL 5.6 was slow on the first query after an idle period. This was a connection problem, not a MySQL instance problem: e.g. if you run MySQL Query Browser, execute "select * from some_queue", leave it alone for a couple of hours, then execute any query, it runs slowly, while at the same time processes on the server or a new instance of the Browser will select from the same tables instantly.
Adding skip-host-cache, skip-name-resolve to my.ini file solved this problem.
I don't know why that is. Why I tried this: MySQL 5.1 without those options was slow at establishing connections from other networks (e.g. the server is 192.168.1.100; 192.168.1.101 connects fast, 192.168.2.100 connects slowly). MySQL 5.6 didn't have that problem to start with, so we didn't add these to my.ini initially.
UPD: That solved half the cases, actually. Setting wait_timeout to the maximum integer fixed the other half. Maybe I can now even remove skip-host-cache and skip-name-resolve and it won't slow down in 100% of the cases.

Sorting on the server or on the client?

I had a discussion with a colleague at work about SQL queries and sorting. He is of the opinion that you should let the server do any sorting before returning the rows to the client. I, on the other hand, think that the server is probably busy enough as it is, and that it must be better for performance to let the client handle the sorting after it has fetched the rows.
Does anyone know which strategy is best for the overall performance of a multi-user system?
In general, you should let the database do the sorting; if it doesn't have the resources to handle this effectively, you need to upgrade your database server.
First off, the database may already have indexes on the fields you want so it may be trivial for it to retrieve data in sorted order. Secondly, the client can't sort the results until it has all of them; if the server sorts the results, you can process them one row at a time, already sorted. Lastly, the database is probably more powerful than the client machine and can probably perform the sorting more efficiently.
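A minimal sketch of that first point (table and column names assumed): with a matching index, the server can hand back rows already in order instead of sorting the whole set afterwards.

CREATE INDEX idx_customers_name ON customers (last_name, first_name);

SELECT customer_id, last_name, first_name
FROM customers
ORDER BY last_name, first_name
LIMIT 50;   -- the index supplies the order; no separate sort step needed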
It depends... Is there paging involved? What's the maximum size of the data set? Does the entire dataset need to be sorted the same way all the time, or according to user selection? Or (if paging is involved) do only the records on the single page on the client screen need to be sorted (not normally acceptable), or does the entire dataset need to be sorted and page one of the newly sorted set redisplayed?
What's the distribution of client hardware compared to the processing requirements of this sort operation?
Bottom line: it's the overall user experience (measured against cost, of course) that should drive your decision. In general, client machines are slower than servers and may add latency. ...
... But how often will clients request additional custom sort operations after the initial page load? (A client-side sort of data already on the client is far faster than a round trip...)
But sorting on the client always requires that the entire dataset be sent to the client on the initial load. That delays the initial page display, which may require lazy loading, AJAX, or other technical complexities to mitigate.
Sorting on the server, on the other hand, introduces additional scalability issues and may require that you add more boxes to the server farm to deal with the extra load; if you're doing the sorting in the DB and you reach that threshold, things can get complicated. (To scale out the DB, you have to implement some read-only replication scheme, or another solution that allows multiple servers - each doing processing - to share read-only data.)
I am in favor of Robert's answer, but I wanted to add a bit to it.
I also favor sorting data in SQL Server. I have worked on many systems that tried to do it on the client side, and in almost every case we had to re-write the process to do it inside SQL Server. Why is that, you might ask? Well, we have two primary reasons:
The amount of data being sorted
The need to implement proper paging due to #1
We deal with interfaces that show users very large sets of data, and leveraging the power of SQL Server to handle sorting and paging is by far better performing than doing it client side.
To put some numbers to this, comparing a SQL Server-side sort to a client-side sort in our environment, with no paging for either: the client-side sort (using XML for the sorting) took 28 seconds, while the server-side sort gave a total load time of 3 seconds.
Generally I agree with the views expressed above that server-side sorting is usually the way to go. However, there are sometimes reasons to do client-side sorting:
The sort criteria are user-selectable or numerous. In this case, it may not be a good idea to go adding a shedload of indices to the table - especially if insert performance is a concern. If some sort criteria are rarely used, an index isn't necessarily worth it since inserts will outnumber selects.
The sort criteria can't be expressed in pure SQL [uncommon], or can't be indexed. It's not necessarily any quicker client-side, but it takes load off the server.
The important thing to remember is that while balancing the load between powerful clients and the server may be a good idea in theory, only the server can maintain an index which is updated on every insert. Whatever the client does, it's starting with a non-indexed unsorted set of data.
As usual, "It Depends" :)
If you have a stored procedure, for instance, that sends results to your presentation layer (whether a report, grid, etc.), it probably doesn't matter which method you go with.
What I typically run across, though, are views which have sorting (because they were used directly by a report, for instance) but are also used by other views or other procedures with their own sorting.
So as a general rule, I encourage others to do all sorting on the client-side and only on the server when there's reasonable justification for it.
If the sorting is just cosmetic and the client is getting the entire set of data I would tend to let the client handle it as it is about the presentation.
Also, say in a grid, you may have to implement the sorting in the client anyway as the user may change the ordering by clicking a column header (don't want to have to ask the server to retrieve all the information again)
Like any other performance-related question, the universal answer is... "It depends." However, I have developed a preference for sorting on the client. We write browser-based apps, and my definition of "client" is split between the web servers and the actual end-user client, the browser. I have two reasons for preferring sorting on the client to sorting in the DB.
First, there's the issue of the "right" place to do it from a design point of view. Most of the time the order of data isn't a business-rule thing but rather an end-user convenience thing, so I view it as a function of the presentation, and I don't like to push presentation issues into the database. There are exceptions; for example, where the current price for an item is the most recent one on file. If you're getting the price with something like:
SELECT TOP 1 price
FROM itemprice
WHERE ItemNumber = ?
AND effectivedate <= getdate()
ORDER BY effectivedate DESC
Then the order of the rows is very much a part of the business rule and obviously belongs in the database. However, if you're sorting on LastName when the user views customer by last name, and then again on FirstName when they click the FirstName column header, and again on State when they click that header then your sorting is a function of the presentation and belongs in the presentation layer.
The second reason I prefer sorting in the client layer is performance. Web servers scale horizontally; that is, if I overload my web server with users I can add another, and another, and another. I can have as many front-end servers as I need to handle the load and everything works just fine. But if I overload the database, I'm screwed. Databases scale vertically; you can throw more hardware at the problem, sure, but at some point that becomes cost-prohibitive. So I like to let the DB do the selection, which it has to do, and let the client do the sorting, which it can do quite simply.
I prefer custom sorting on the client; however, I also suggest that most SQL statements should have some reasonable ORDER BY clause by default. It has very little impact on the database, but without it you could wind up with problems later. Often, without ever realizing it, a developer or user will begin to rely on some initial default sort order. If an ORDER BY clause wasn't specified, the data is only in that order by chance. At some later date an index could change or the data might be re-organized, and the users will complain because the initial order of the data changed out from under them.
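For example (names assumed), spelling out the default order, ideally with a unique column as a tie-breaker, protects users from relying on an order that only exists by chance:

SELECT order_id, customer_name, created_at
FROM orders
ORDER BY created_at DESC, order_id DESC;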
Situations vary, and measuring performance is important.
Sometimes it's obvious - if you have a big dataset and you're interested in a small range of the sorted list (e.g. paging in a UI app) - sorting on the server saves the data transfer.
But often you have one DB and several clients, and the DB may be overloaded while the clients are idle. Sorting on the client isn't heavy, and in this situation it could help you scale.