Resort (reorder) table data - sql

What is the fastest way to reorder (resort) a table so that the physical representation most closely matches the logical, using PostgreSQL?

Having data ordered does not mean that you won't need ORDER BY clause in your queries anymore.
It just means that the logical order of the data is likely to match the physical order, so retrieving the data in logical order (say, from an index scan) is more likely to result in sequential read access to the table.
Note that neither MySQL nor PostgreSQL guarantees that INSERT … SELECT … ORDER BY will result in the data being ordered in the table.
To order data in PostgreSQL you should define an index with the given order and issue this command:
CLUSTER mytable USING myindex
This will rebuild the table and all indexes defined on it.

Order of insertion does not, in the end, always control order in the table. In engines that support clustered indexes, tables are ordered by their clustered index, if they have one. If they do not have one, the order is technically undefined. It is probably safe in many cases to assume that rows are in insertion order, but that doesn't mean they'll stay that way: lacking a specific ordering, the DB engine is free to reorder as it sees fit to optimize retrieval.
If you want to control order on disk, the single best way is to properly define your clustered index. Always.

Use CLUSTER to reorder a table by a given index.
(BTW, ALTER TABLE ... ORDER BY ... is MySQL syntax; PostgreSQL does not support it.)

Related

How to reorganize the records in a postgres table and persist it permanently

I have the following table structure
id     name
--------------
10991  Shoug
10990  Moneera
10989  Abc
10988  xyz
id is the primary key column. As you can see, the ids are in decreasing order (i.e. select * from users returns the records in this order) because of the way the data was inserted.
How do I resort the table in the ascending order of the primary key permanently? Preferably with SQL alone?
I found this answer but it's not working for me. I am using Postgres as the database.
How do I resort the table in the ascending order of the primary key permanently?
You are missing a fundamental concept about relational databases. In SQL, tables represent unordered sets. There is no "permanent" ordering. The only ordering is provided by the order by clause on a query.
Some databases support a concept called "clustering"/"clustered indexes". This means that the data on data pages is actually ordered according to some key. Even in these databases, a table with a clustered index is still not guaranteed to return data in any particular order unless you use ORDER BY.
Postgres does not maintain clustered indexes (its CLUSTER command is a one-time physical reorder that later writes do not preserve), so even this is not available.
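To make the "only ORDER BY gives an ordering" point concrete, here is a minimal sketch using the rows from the question. It assumes an in-memory SQLite database purely so it runs anywhere; it is not Postgres, but the ORDER BY behaviour shown is standard SQL.

```python
import sqlite3

# Assumption: SQLite in memory stands in for the real Postgres database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

# Insert in descending id order, mirroring the data in the question.
rows = [(10991, "Shoug"), (10990, "Moneera"), (10989, "Abc"), (10988, "xyz")]
conn.executemany("INSERT INTO users (id, name) VALUES (?, ?)", rows)

# The order of the result is requested on the query, not stored in the table.
ascending = conn.execute(
    "SELECT id, name FROM users ORDER BY id ASC"
).fetchall()
print(ascending)
```

Whatever order the rows were inserted in, the query with ORDER BY id returns them ascending, so there is nothing to "persist" in the table itself.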
Apparently not possible in Postgres.
Curious what your use case was for this. I was researching any downside to using UUID as PK in Postgres, which led me to your post. Since pg does not physically reorder the data according to PK, I concluded that only the size (vs. a traditional sequence) would be the difference. I'm more used to SQL Server creating a clustered index and reordering data by default.
As your link in your OP suggests, you can force it to reorder, but it is not persistent. Postgres also has a CLUSTER command to do this. This does lock your table, though.
https://www.postgresql.org/docs/current/sql-cluster.html
https://dba.stackexchange.com/questions/38710/how-does-postgresql-physically-order-new-records-on-disk-after-a-cluster-on-pri

AWSDynamoDBQueryExpression order by UNIX timestamp

I am new to NoSQL databases and I have changed my database schema from storing dates as a UTC timestamp string, to a UNIX timestamp (number), in hopes that I can create either a scan or a query expression to find the 1000 most recent items in the table. I have yet to find a simple snippet of code to accomplish this using the AWSDynamoDBQueryExpression class. Scan doesn't appear to have any sort mechanism but query might. Any ideas?
There is no ORDER BY functionality in DynamoDB. If you want to run a top N query you'll have to perform a scan and then order the results yourself.
Mark B is right that query results can be ordered by the sort key but that is only within the context of a query. Queries are inherently limited to a single partition key.
If your table is small then you can get away with creating a Global Secondary Index on the table in which the partition key is an attribute that has the same value for all items, and then use the timestamp attribute as the sort key. But keep in mind that this will break down once your table gets bigger. And if you're doing that you might as well not be using Dynamo. You're better off with a relational database on RDS.
First you need to make sure the timestamp field is the sort key for your DynamoDB table (or the sort key for a Global Secondary Index on that table). Then you just need to run a query. From the documentation:
Query results are always sorted by the sort key value. If the data
type of the sort key is Number, the results are returned in numeric
order; otherwise, the results are returned in order of UTF-8 bytes. By
default, the sort order is ascending. To reverse the order, set the
ScanIndexForward parameter to false.
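As a rough sketch of what such a query looks like, the function below only builds the request parameters of the DynamoDB Query API; it does not call AWS. The table name `Events`, index name `ByTimestamp`, and partition attribute `bucket` are hypothetical, and it assumes the timestamp is the sort key of that index.

```python
def latest_items_query(table_name, index_name, partition_attr,
                       partition_value, limit=1000):
    """Build Query parameters that return the newest items first.

    Assumes the numeric timestamp is the sort key of the index and that
    the partition key is a string attribute (hypothetical schema).
    """
    return {
        "TableName": table_name,
        "IndexName": index_name,
        # A Query must pin exactly one partition key value.
        "KeyConditionExpression": "#pk = :pk",
        "ExpressionAttributeNames": {"#pk": partition_attr},
        "ExpressionAttributeValues": {":pk": {"S": partition_value}},
        # False reverses the default ascending sort-key order.
        "ScanIndexForward": False,
        "Limit": limit,
    }


params = latest_items_query("Events", "ByTimestamp", "bucket", "all")
print(params)
```

With the Python SDK these parameters could be passed as `client.query(**params)`. A single response is capped in size, so collecting a full 1000 items may still require following `LastEvaluatedKey` across pages.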

How to create trigger "ORDER BY"

I want to order my table Jogadores by Total value which I can achieve by writing this:
ALTER TABLE `Jogadores` ORDER BY `Total` DESC ;
My question is how do I use this as a trigger every time I edit a Total value or insert a new row?
I'm using phpMyAdmin
Thanks
I'm not sure I understand what you are trying to accomplish. I am going to assume you want your table sorted in that fashion for when you retrieve it. If that is true, you can sort it upon retrieval rather than altering the table.
SELECT *
FROM Jogadores
ORDER BY Total DESC;
Why? That's not usually how SQL databases work.
What if some totals are the same? What is the second way to sort them?
It's also usually not a good idea to store a total in your operational db. Is this a data warehouse?
It would probably be easier to add an index on total and create a view to order by that column.
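A small sketch of the "sort on retrieval" approach, including a tie-breaker for equal totals. It assumes SQLite in memory as a stand-in for MySQL, and the `id` and `Nome` columns and the sample rows are made up for illustration; only `Jogadores` and `Total` come from the question.

```python
import sqlite3

# Assumption: in-memory SQLite instead of MySQL, so the sketch is runnable.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Jogadores (id INTEGER PRIMARY KEY, Nome TEXT, Total INTEGER)"
)
conn.executemany(
    "INSERT INTO Jogadores (id, Nome, Total) VALUES (?, ?, ?)",
    [(1, "Ana", 30), (2, "Bruno", 50), (3, "Carla", 30), (4, "Duarte", 40)],
)

# An index on Total can help the engine sort; it does not reorder the table.
conn.execute("CREATE INDEX idx_jogadores_total ON Jogadores (Total)")

# Edit a total; no trigger or ALTER TABLE is needed afterwards.
conn.execute("UPDATE Jogadores SET Total = 50 WHERE id = 3")

# Sort at read time; id breaks ties between equal totals.
ranking = conn.execute(
    "SELECT Nome, Total FROM Jogadores ORDER BY Total DESC, id ASC"
).fetchall()
print(ranking)
```

Because the ordering is computed by each SELECT, every edit or insert is reflected the next time the query runs.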
After reading the new comments, what you want is not doable. See this answer.
Original content:
As others have said, the physical order of the rows in the table doesn't matter since you can always sort when querying. If you're thinking of clustered indexes, then all you need to do is define your primary key properly.
From the MySQL glossary entry for "clustered index": The InnoDB term for a primary key index. InnoDB table storage is organized based on the values of the primary key columns, to speed up queries and sorts involving the primary key columns. For best performance, choose the primary key columns carefully based on the most performance-critical queries. Because modifying the columns of the clustered index is an expensive operation, choose primary columns that are rarely or never updated.
So a clustered index would achieve what you want, but it is probably not what you actually need. Just so you know, the clustered index speedups are practically zero if you're dealing with fewer than a million or so rows (rough estimate).

Oracle Index Sort Order and Joining

I have 2 tables of a few million rows each, with indexes. I'm looking to convert one of the indexes to DESC order to optimize some operations. However, will that affect joining speed or other optimizations?
For example:
Table A:
a_id (pk)
Table B:
b_id (pk)
a_id (fk)
If A.a_id is stored as DESC and B.a_id is stored ASC will I encounter any problems or slowness on joins? Will Oracle be able to use the indexes for joining even though they have different sort orders? Do I have to make B.a_id DESC as well or create a second index that is DESC? Obviously I'd like to try a simple experiment but I don't have DBA access or a spare Oracle setup to work with.
Will oracle be able to use the indexes
for joining even though they have
different sort orders?
Indexes are not used "for joining". They're used to access data. The row sources thus created are then joined. The only reason I can think of that the sort order of the index would have any impact on joining would be if a merge join is occurring and the index is being used to avoid sorting. In this case, the impact of changing to a descending index might be that the data needs to be sorted in memory after it is accessed; or it might not, if the optimizer is intelligent enough to simply walk through that data in reverse order when doing the merge.
If you have queries whose execution plans rely on using the index on A.A_ID to get the data in ascending order (either for purposes of a merge join or to meet your requested ordering of the results), then changing the index to descending order could have an impact.
Edit: Just did a quick test on some sample data. The optimizer does seem to have the capability to merge row sources sorted in opposite orders without re-sorting either of them. So at the most obvious level, having one index ascending and the other descending should not cause serious performance problems. However, it does look like descending indexes can have other effects on the execution plan: in my case, the ascending index was used for a fast full scan, while the descending one was used for a range scan. This could cause changes in query performance (good or bad), but the only way to know for certain is to test it.
The leaf blocks of an Oracle B-tree index are doubly linked, so the index can be scanned in either direction and it makes little difference whether you specify ASC or DESC for a single-column index.
DESC indexes are a special case that helps when you have a multi-column index, e.g. if I have a query that often orders by colA ASC, colB DESC, then I might decide to add an index on (colA, colB DESC) in order to avoid a sort.
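The mixed-direction case can be sketched as follows. This assumes SQLite in memory as a stand-in, not Oracle; the table `t` and its data are made up. Whether the optimizer really uses the index to skip the sort is engine-specific, so the plan is printed for inspection rather than relied on.

```python
import sqlite3

# Assumption: SQLite stands in for Oracle; only the idea carries over.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (colA INTEGER, colB INTEGER)")
conn.executemany(
    "INSERT INTO t (colA, colB) VALUES (?, ?)",
    [(2, 1), (1, 1), (1, 3), (2, 5), (1, 2)],
)

# Index whose column directions match the ORDER BY of the frequent query.
conn.execute("CREATE INDEX idx_t_a_bdesc ON t (colA ASC, colB DESC)")

query = "SELECT colA, colB FROM t ORDER BY colA ASC, colB DESC"
result = conn.execute(query).fetchall()
print(result)

# Inspect the plan; its exact wording varies by engine and version.
for row in conn.execute("EXPLAIN QUERY PLAN " + query):
    print(row)
```

On Oracle the equivalent check would be to compare execution plans with and without the (colA, colB DESC) index and see whether the sort step disappears.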
Developing without a development and test system?
Your answer is to develop with one. Oracle comes on all platforms, just install, add data, do your work.
For you, just live dangerously and do the index change, who cares what happens. Grab for that brass ring. So you miss. You won't lose any data.
I'm not sure I get what you're trying to ask - you cannot "store" in descending or ascending order. You can fetch the results of the query and order it using ORDER BY clause which will sort the resulting set in ascending or descending order.
There is no guarantee that you're inserting any data in ascending or descending order.
Consequently, the "order" in which it is inserted will have no bearing on performance, because there is no order.
Generally speaking, an index can be scanned in either ascending or descending order without sorting in memory, because each leaf block holds pointers to both the previous and the next leaf block.
However, if we create an index with a DESC column definition, its structure can be much larger than a normal index. For monotonically increasing keys a normal index does 90-10 block splits, whereas a DESC index does 50-50 splits, which leaves unused space and makes the index a candidate for rebuilds, with the additional maintenance and overhead that implies.
DESC indexes can be helpful when you have a multi-column index where one column is needed in ascending order and the other in descending order, to avoid sorting in memory.
Early optimization is a waste of time. Just leave this problem and do the next thing. When there are 100 million rows in this table change the indexes and test what happens, until then your ten rows of data are not worth the time to "optimize".

SQL Server table is sorted by default

I have a simple SSIS package where I import data from a flat file into a SQL Server table (SQL Server 2005). The file contains 70k rows and the table has no primary key. The import is successful, but when I open the SQL Server table the order of the rows is different from that of the file. After observing closely, I see that the data in the table is sorted by default by the first column. Why is this happening, and how can I avoid the default sort?
Thanks.
You cannot rely on ordering unless you specify order by in your SQL query. SQL is based on relational algebra, which works with sets. Those sets are unordered. Database tables do not have an intrinsic ordering.
It may well be that the sets are ordered due to the way the data is retrieved from the tables. This may be based on primary key, order of insertion, clustered key, seemingly random order based on the execution plan of the query or the actual data in the table or even the phase of the moon.
Bottom line, if you want a specific order, use order by. If you don't want a specific order, the DBMS is free to deliver your rows in any order, including one based on the first column.
If you really want them sorted depending on the position in the import file, you should add another column to the table to store an increasing number based on its position in that file. Then use order by using that column. But that's a pretty arbitrary sort order, you're generally better off choosing one that makes more sense to the data (transaction ID, date/time, customer number or whatever else you have).
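A minimal sketch of the extra-column idea, assuming an in-memory SQLite table in place of SQL Server and a made-up three-line file; the names `imported` and `file_pos` are hypothetical.

```python
import csv
import io
import sqlite3

# Stand-in for the flat file; the contents are invented for the example.
flat_file = io.StringIO("pear,3\napple,7\nmango,1\n")

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE imported (file_pos INTEGER PRIMARY KEY, item TEXT, qty INTEGER)"
)

# Store each row's 1-based position in the file alongside its data.
for pos, (item, qty) in enumerate(csv.reader(flat_file), start=1):
    conn.execute(
        "INSERT INTO imported (file_pos, item, qty) VALUES (?, ?, ?)",
        (pos, item, int(qty)),
    )

# File order is recovered by sorting on the stored position.
in_file_order = conn.execute(
    "SELECT item FROM imported ORDER BY file_pos"
).fetchall()
print(in_file_order)
```

The position column makes the file order an explicit piece of data, so it survives regardless of how the engine chooses to store or return the rows.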
If you want to avoid the default sort (however variable that may be), use a specific sort.
In general no order is applied if there is no ordering in the select query.
What I have noticed is that the table results might be returned in the order of the primary key, but this is not guaranteed either.
So all in all, if you do not specify an ordering, no ordering can be assumed.