Does indexing in Postgres improve ordering speed?

Let's say you have a table with a primary key A, and two columns B and C.
When querying we want to do SELECT * FROM table WHERE A = 'thing' ORDER BY B, C
Since A is a primary key, it already has an index. Is there any benefit to adding an index on B and C in terms of speeding up ordering?
Thanks!

This query cannot benefit from additional indexes.
If a is the primary key, then the query can only return zero or one rows, so ordering is trivial and cannot be made faster.
In fact, you should omit the ORDER BY clause.
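Here is a small sketch of why. It uses Python's built-in sqlite3 module, with an in-memory SQLite table standing in for the PostgreSQL one. This assumes only primary-key semantics matter here, not the planner. The table name t and the sample rows are made up.

```python
import sqlite3

# In-memory SQLite table standing in for the PostgreSQL table (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a TEXT PRIMARY KEY, b INTEGER, c INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [("thing", 2, 7), ("other", 1, 3), ("third", 5, 0)],
)

def lookup(key):
    # Equality on the primary key: at most one row can match,
    # so ORDER BY b, c has nothing to reorder.
    return conn.execute(
        "SELECT * FROM t WHERE a = ? ORDER BY b, c", (key,)
    ).fetchall()

print(lookup("thing"))    # [('thing', 2, 7)]
print(lookup("missing"))  # []
```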

Related

Does PostgreSQL use all available indexes to run a query faster?

We are structuring a project where some tables will have many records. We intend to use 4 numeric foreign keys and 1 numeric primary key. Our assumption is that if we create an index for each foreign key, in addition to the default index on the primary key, the Postgres planner will use all of the indexes (5 in total) to perform the query.
95% of the time the queries would be providing at least the 4 foreign keys.
Would each index be used to position the search faster in the sequential section of records?
Would having 4 indexes increase the speed of the query or would it suffice with a single index of the parent level (branch_id)?
Thank you for your time and experience.
example: if all foreign keys have an index
SELECT * FROM products WHERE
account_id=1 AND
organization_id=2 AND
business_id=3 AND
branch_id=4 AND
product_id=5;
example: if I only indicate the id of the primary key
SELECT * FROM products WHERE product_id=5;

If all 4 columns are specified by equality, it is possible to combine the single-column indexes using BitmapAnd. However, this would be less efficient than using one multi-column index on all four columns.
Since that will apparently be a very common query, it would make sense to have that multi-column index.
Usually you will want to index each foreign key column. Otherwise, if you want to delete an organization, for example, it would need to scan the whole table to verify that no records were still referencing it. Whichever column is the first one in the multi-column index will not need to also have a single-column index for it. But the other 3 which are not first probably still need their own indexes.
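Below is a toy Python model of the difference. It is not PostgreSQL's actual bitmap implementation. In the first approach, four single-column indexes are probed and their row sets intersected, as a BitmapAnd would do. In the second, a four-column index is probed once. The table contents are hypothetical (one row per combination of key values), and "touched" simply counts the index entries read.

```python
from collections import defaultdict
from itertools import product

# Hypothetical products table: one row per combination of
# (account_id, organization_id, business_id, branch_id).
rows = list(product(range(1, 4), range(1, 6), range(1, 11), range(1, 21)))

# Four single-column indexes (value -> set of row positions) ...
single = [defaultdict(set) for _ in range(4)]
# ... versus one multi-column index on all four columns.
multi = defaultdict(set)
for rid, row in enumerate(rows):
    for col, value in enumerate(row):
        single[col][value].add(rid)
    multi[row].add(rid)

def bitmap_and(key):
    """Probe every single-column index, then intersect the row sets (BitmapAnd-style)."""
    sets = [single[col][value] for col, value in enumerate(key)]
    touched = sum(len(s) for s in sets)
    return set.intersection(*sets), touched

def composite(key):
    """Probe the multi-column index once; only matching entries are read."""
    ids = multi.get(key, set())
    return ids, len(ids)

key = (1, 2, 3, 4)
print(bitmap_and(key)[1], composite(key)[1])  # 2050 1
```

Both approaches return the same row. The single-column route reads every entry for account_id=1, organization_id=2, and so on, before intersecting them.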

Indexes are (predominantly) used when filtering or joining tables, so whether the indexes you are proposing are useful is entirely dependent on the SQL you are running and whether the query optimiser determines that using an index would be beneficial.
For example, if you ran SELECT * FROM TABLE then none of the indexes would be used.
Many DBMSs automatically create an index when you define a PK, and some (e.g. MySQL's InnoDB) also create one when you define an FK, so you may get those indexes anyway, regardless of any performance tuning you are trying to implement. Note that PostgreSQL only does this for PKs (and unique constraints), not for FKs.
Update
Having individual indexes on each column is not going to help with the query you’ve provided: the optimiser will only use one of them, probably the PK. A compound index on multiple columns would help, but the more columns you add to the index, the more restrictive the pattern of queries it will benefit.
Say you have 3 columns A, B, C and include them all in WHERE clause, then having a compound index of A+B+C would be highly beneficial.
If you keep this index but your WHERE clause only has columns A, B it will still benefit significantly as the query can still use the A+B subset of the index.
If your WHERE clause only has columns A, C, then it would benefit only slightly. It would select all records from the index that start with the A value, but would then have to filter them to find the subset with the C value.
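The leftmost-prefix behaviour can be sketched in Python. The compound index is treated as a sorted list of (A, B, C) tuples, and binary search (bisect) plays the role of a B-tree seek. This is a simplified model with hypothetical data, not how any particular DBMS stores its pages.

```python
from bisect import bisect_left, bisect_right
from itertools import product

INF = float("inf")  # sorts after every integer key value

# Hypothetical compound index on (A, B, C): every combination of 0-9 in each column.
index = sorted(product(range(10), range(10), range(10)))

def seek(prefix):
    """Entries whose leading columns equal prefix: a contiguous run found by binary search."""
    return index[bisect_left(index, prefix):bisect_right(index, prefix + (INF,))]

def where_a_c(a, c):
    """WHERE A = a AND C = c: seek on A only, then filter each entry in that run on C."""
    run = seek((a,))
    return [k for k in run if k[2] == c], len(run)

print(len(seek((1, 2, 3))))    # 1   (A, B, C all used by the seek)
print(len(seek((1, 2))))       # 10  (A, B prefix: every entry read is a match)
matches, examined = where_a_c(1, 3)
print(len(matches), examined)  # 10 100 (only A narrows the search)
```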

I have a composite key for a table. I want to join on just one column of this key. Does that column need a separate index?

Imagine I have a table with a composite primary key containing DateCode and AddressCode.
I want to join that table with another table on just AddressCode.
I know there will be a single index on DateCode combined with AddressCode, since that is the primary key. Should I also have an index on just AddressCode in this table, purely for efficient joins to other tables that use only AddressCode as a foreign key? This is what I would do in MySQL, though I'm not sure whether Microsoft SQL Server somehow handles this situation better automatically.

After further research and experimentation, I have my own answer. Yes, a join on a column that is part of a composite key but is not the first element of that index (that is, "most significant member") requires a separate index. Without that index, performing a JOIN on that column requires a full scan of either the composite index or the table.
To clarify this further: suppose there is a composite index on three columns a, b, and c (such as the one automatically created for a composite primary key). If the index was created on a, b, c via
CREATE INDEX NewIndex ON Table(a, b, c)
then a is the most significant and c is the least. If the index was created on b, c, a, like so
CREATE INDEX NewIndex ON Table(b, c, a)
then b is the most significant. The index is ordered by this significance. As a result, finding values by the most significant column of a composite index takes only a trivial amount of extra effort compared with an index on that column alone. It’s like looking for all integers that begin with “7” in an ordered list from 1 to 1000. Finding values by a less significant column, by contrast, typically requires a full index scan. That is like looking for all integers that end with “7” in the same list.
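Here is the same idea as a small Python sketch with hypothetical data. A sorted list of (a, b, c) tuples stands in for the composite index. A lookup on a is a binary search over a contiguous run, while a lookup on c has to examine every entry. A separate index led by c (modelled as sorted (c, a, b) tuples) turns the c lookup back into a seek.

```python
from bisect import bisect_left, bisect_right
from itertools import product

INF = float("inf")  # sorts after every integer key value

# Hypothetical composite index on (a, b, c), sorted like the leaf level of a B-tree.
abc = sorted(product(range(20), range(20), range(20)))
# A separate index led by c, stored as (c, a, b) so c is the most significant column.
cab = sorted((c, a, b) for a, b, c in abc)

def seek(index, prefix):
    """Contiguous run of entries whose leading columns equal prefix (two binary searches)."""
    return index[bisect_left(index, prefix):bisect_right(index, prefix + (INF,))]

def scan_on_c(index, c):
    """c is the least significant column of (a, b, c): matches are scattered, so scan all."""
    return [k for k in index if k[2] == c], len(index)

by_a = seek(abc, (7,))                   # like integers that begin with "7"
by_c_scan, examined = scan_on_c(abc, 7)  # like integers that end with "7"
by_c_seek = seek(cab, (7,))              # the same rows via the separate index
print(len(by_a), len(by_c_scan), examined, len(by_c_seek))  # 400 400 8000 400
```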

Oracle multiple vs single column index

Imagine I have a table with the following columns:
Column: A (number(10)) (PK)
Column: B (number(10))
Column: C (number(10))
CREATE TABLE schema_name.table_name (
column_a number(10) primary key,
column_b number(10) ,
column_c number(10)
);
Column A is my PK.
Imagine my application now has a flow that queries by B and C. Something like:
SELECT * FROM SCHEMA.TABLE WHERE B=30 AND C=99
If I create an index only using the Column B, this will already improve my query right?
The strategy behind this query would benefit from the index on column B?
Q1 - If so, why should I create an index with those two columns?
Q2 - If I decided to create an index with B and C, If I query selecting only B, would this one be affected by the index?

The simple answers to your questions.
For this query:
SELECT *
FROM SCHEMA.TABLE
WHERE B = 30 AND C = 99;
The optimal index is either (B, C) or (C, B). The order does not matter because both comparisons are =.
An index on either column can be used, but all the matching values will need to be scanned to compare to the second value.
If you have an index on (B, C), then this can be used for a query on WHERE B = 30. Oracle also implements a skip-scan optimization, so it is possible that the index could also be used for WHERE C = 99 -- but it probably would not be.
I think the documentation for MySQL has a good introduction to multi-column indexes. It doesn't cover the skip-scan but is otherwise quite applicable to Oracle.
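Here is a simplified Python model of these points, using hypothetical (B, C) data. Equality on both columns finds the same single entry whether the index is (B, C) or (C, B), and the (B, C) index also serves B = 30 on its own. For C = 99, skip_scan loosely mimics the idea behind Oracle's skip scan by doing one seek per distinct leading value. It is not Oracle's actual implementation.

```python
from bisect import bisect_left, bisect_right
from itertools import product

INF = float("inf")  # sorts after every integer key value

# Hypothetical rows (B, C): B takes 10 values (25-34), C takes 200 values (0-199).
rows = list(product(range(25, 35), range(200)))
bc = sorted(rows)                     # index on (B, C)
cb = sorted((c, b) for b, c in rows)  # index on (C, B)

def seek(index, prefix):
    """Entries whose leading columns equal prefix, found with two binary searches."""
    return index[bisect_left(index, prefix):bisect_right(index, prefix + (INF,))]

def skip_scan(index, second):
    """Rough emulation of a skip scan: one seek per distinct leading value."""
    hits, seeks, i = [], 0, 0
    while i < len(index):
        lead = index[i][0]
        hits += seek(index, (lead, second))
        seeks += 1
        i = bisect_right(index, (lead, INF))  # jump to the next leading value
    return hits, seeks

print(len(seek(bc, (30, 99))), len(seek(cb, (99, 30))))  # 1 1
print(len(seek(bc, (30,))))                              # 200
print(skip_scan(bc, 99)[1])                              # 10
```

The skip scan only pays off when the leading column has few distinct values. With many distinct B values it degrades toward a full scan, which is why it "probably would not be" chosen.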

Short answer: always check real performance, not theoretical performance. That means my answer needs to be verified on a real database.
In SQL databases (Oracle, PostgreSQL, MS SQL Server, etc.) the primary key is used for at least two purposes:
Ordering of rows (e.g. if the PK is only ever incremented, new rows are always appended at the end).
Linking to rows: any additional index contains the whole PK, so it can jump from that index to the rest of the row.
If I create an index only using the Column B, this will already improve my query right?
The strategy behind this query would benefit from the index on column B?
It depends. If your table is small enough, Oracle can just do a full scan of it. For a large table, Oracle can (and in the common case will) use the index on column B and then do a range scan. In this case Oracle checks all rows with B=30. Therefore, if you have only one row with B=30, you will get good performance. If you have millions of such rows, Oracle will need to do millions of reads. Oracle gets this information from statistics.
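This cost can be sketched with a toy Python model using hypothetical data and a single-column index on B only. The number of entries visited equals the number of rows with B = 30, however many of them also have C = 99.

```python
from collections import defaultdict

def build_b_index(rows):
    """Single-column index on B: value -> row positions, in table order."""
    idx = defaultdict(list)
    for rid, (_, b, _) in enumerate(rows):
        idx[b].append(rid)
    return idx

def query(rows, idx, b, c):
    """WHERE B = b AND C = c using only the B index: visit every B match, then test C."""
    visited = idx.get(b, [])
    return [rows[rid] for rid in visited if rows[rid][2] == c], len(visited)

# Hypothetical (A, B, C) rows: B = 30 occurs once in `rare` and in every row of `common`.
rare = [(i, 30 if i == 0 else i % 7, 99) for i in range(10_000)]
common = [(i, 30, i % 100) for i in range(10_000)]

for rows in (rare, common):
    matches, visited = query(rows, build_b_index(rows), 30, 99)
    print(len(matches), visited)
# 1 1
# 100 10000
```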
Q1 - If so, why should I create an index with those two columns?
It gives direct access to the row: Oracle needs just a few jumps to find it. Moreover, you can add the UNIQUE modifier to help Oracle; then it will know that no more than a single row will be returned.
However, if your table has other columns, the real execution plan will also include access by PK (to retrieve the remaining columns).
If I decided to create an index with B and C, If I query selecting only B, would this one be affected by the index?
Yes. If an index has several columns, Oracle sorts its entries according to the column order. E.g. if you create an index on columns B, C, then Oracle will be able to use it to retrieve values like "B=30", i.e. when you restrict only B.

Well, it all depends.
If that table is tiny, you won't see any benefit regardless of any indexes you might create: it is just too small, and Oracle returns data immediately.
If the table is huge, then it depends on the column's selectivity. There's no guarantee that Oracle will ever use that index. If the optimizer decides (based on the information it has; don't forget to collect statistics regularly!) that the index should not be used, then you created it in vain. You could force it with a hint, but unless you know what you're doing, don't.
How will you know what's going on? See the explain plan.
But, generally speaking, yes - indexes help.
Q1 - If so, why should I create an index with those two columns?
Which "two columns"? A? If it is a primary key column, Oracle automatically creates an index, you don't have to do that.
Q2 - If I decided to create an index with B and C, If I query selecting only B, would this one be affected by the index?
If you are talking about a composite index (containing both B and C columns, in that order), and the query uses column B, then yes, the index will (OK, might) be used. But if the query uses only column C, then this index will be completely useless.

In spite of this question being answered and one answer being accepted already, I'll just throw in some more information :-)
An index is an offer to the DBMS that it can use to access data quicker in some situations. Whether it actually uses the index is a decision made by the DBMS.
Oracle has a built-in optimizer that looks at the query and tries to find the best execution plan to get the results you are after.
Let's say that 90% of all rows have B = 30 AND C = 99. Why then should Oracle laboriously walk through the index only to have to access almost every row in the table at last? So, even with an index on both columns, Oracle may decide not to use the index at all and even perform the query faster because of the decision against the index.
Now to the questions:
If I create an index only using the Column B, this will already improve my query right?
It may. If Oracle thinks that B = 30 immensely reduces the number of rows it has to read from the table, it will.
If so, why should I create an index with those two columns?
If the combination of B = 30 AND C = 99 limits the rows to read from the table further, it's a good idea to use this index instead.
If I decided to create an index with B and C, If I query selecting only B, would this one be affected by the index?
If the index is on (B, C), i.e. B first, then Oracle may find it useful, yes. In the extreme case that there are only the two columns in the table, that would even be a covering index (i.e. containing all columns accessed in the query) and the DBMS wouldn't have to read any table row, as all the information is already in the index itself. If the index is (C, B), i.e. C first, it is quite unlikely that the index would be used. In some edge-case situations, Oracle might do so, though.
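Here is a rough Python model of the covering-index case, with hypothetical data. The (B, C) index entries carry a rowid back to the table. A query that needs only B and C can be answered from the entries alone. SELECT * has to follow every rowid into the table to fetch the other columns.

```python
from bisect import bisect_left

# Hypothetical table: rowid -> (A, B, C, D), where A and D are not in the index.
table = {rid: (rid, rid % 50, rid % 7, f"payload-{rid}") for rid in range(1_000)}
# Index on (B, C); each entry also carries the rowid pointing back to the table row.
index = sorted((b, c, rid) for rid, (_, b, c, _) in table.items())

def range_b(b):
    """Index entries with B = b (integer keys, so (b + 1,) bounds the run)."""
    return index[bisect_left(index, (b,)):bisect_left(index, (b + 1,))]

def select_b_c(b):
    """SELECT B, C ... WHERE B = b: every needed value is in the index entry, `table` is never read."""
    return [(eb, ec) for eb, ec, _ in range_b(b)]

def select_all(b):
    """SELECT * ... WHERE B = b: each index hit needs a lookup in `table` for A and D."""
    return [table[rid] for _, _, rid in range_b(b)]

print(len(select_b_c(30)), len(select_all(30)))  # 20 20
```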

MS Access: Best indexing strategy for retrieving DISTINCT combinations of joined fields

I have two tables in MS Access 2010:
Table tblA:
idA AutoNumber
a Text(255)
b Text(255)
c Text(255)
x Text(255)
y Text(255)
Table tblB:
idB AutoNumber
fkA Long Integer
d Text(255)
e Text(255)
z Text(255)
... and need to execute the following query:
SELECT DISTINCT
tblA.a
, tblA.b
, tblA.c
, tblB.d
, tblB.e
FROM tblA
INNER JOIN tblB
on tblA.idA = tblB.fkA
;
Both tables are very large and I was wondering what is the best indexing strategy to achieve the fastest response time.
idA and idB are the primary keys for their respective tables and fkA has its own index.
But what about tblA.a, tblA.b, tblA.c, tblB.d, tblB.e? Should I create a composite index on tblA.a, tblA.b, tblA.c and one on tblB.d, tblB.e? Or should each field be indexed individually?
I tried both options and the first one seems to yield slightly better results, though both are not very satisfactory in terms of performance. I would like to understand more about the theoretical background and appreciate every input.

As you are joining all records, the DBMS may simply decide for full table scans to join the tables.
With indexes on tblA(idA) and tblB(fkA) you give the DBMS the option to use these instead, but it's up to the DBMS to do so or not (it will - hopefully - decide for the faster way, whichever this is).
You can also offer the DBMS covering indexes. That means all columns used in the query are in the index, so if the DBMS uses it, it doesn't have to access the table at all but can get everything from the index itself. As you have no WHERE clause, the DBMS may still prefer to access the tables row by row rather than run through indexes. The covering indexes would be:
tblA(idA, a, b, c)
tblB(fkA, d, e)
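To show why those two indexes are "covering", here is a Python sketch of a merge join done purely on index entries, followed by DISTINCT. The index contents are hypothetical. A merge join is just one strategy a DBMS might choose, and this is not a claim about what the Access engine actually does.

```python
# Hypothetical entries of the two covering indexes, each sorted by its leading column.
ix_a = sorted([(1, "x", "y", "z"), (2, "x", "y", "z"), (3, "p", "q", "r")])  # tblA(idA, a, b, c)
ix_b = sorted([(1, "d1", "e1"), (1, "d1", "e1"), (2, "d1", "e1"),
               (3, "d2", "e2"), (4, "d3", "e3")])                          # tblB(fkA, d, e)

def merge_join_distinct(left, right):
    """Join idA = fkA by walking both sorted lists once, then apply DISTINCT via a set.

    Assumes idA is unique in `left` (it is tblA's primary key).
    """
    out, i, j = set(), 0, 0
    while i < len(left) and j < len(right):
        lkey, rkey = left[i][0], right[j][0]
        if lkey < rkey:
            i += 1
        elif lkey > rkey:
            j += 1
        else:
            # One output row per matching tblB entry; the set removes duplicates.
            k = j
            while k < len(right) and right[k][0] == lkey:
                out.add(left[i][1:] + right[k][1:])
                k += 1
            i += 1
    return out

print(sorted(merge_join_distinct(ix_a, ix_b)))
# [('p', 'q', 'r', 'd2', 'e2'), ('x', 'y', 'z', 'd1', 'e1')]
```

No table row is read. Every output value comes from an index entry.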

Is an index on A, B redundant if there is an index on A, B, C?

Having years of experience as a DBA, I do believe I know the answer to the question, but I figured it never hurts to check my bases.
Using SQL Server, assuming I have a table which has an index on column A and column B, and a second index on columns A, B, and C, would it be safe to drop the first index, as the second index basically would satisfy queries that would benefit from the first index?

It depends, but the answer is often 'Yes, you could drop the index on (A,B)'.
The counter-case (where you would not drop the index on (A,B)) is when the index on (A,B) is a unique index that is enforcing a constraint; then you do not want to drop the index on (A,B). The index on (A,B,C) could also be unique, but the uniqueness is redundant because the (A,B) combination is unique because of the other index.
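Here is a quick way to see that a unique (A,B,C) index does not enforce uniqueness of (A,B). It is sketched with Python's sqlite3 module, using an in-memory SQLite table as a stand-in for SQL Server. The table and index names are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER, c INTEGER)")
conn.execute("CREATE UNIQUE INDEX ux_abc ON t (a, b, c)")

# These rows differ in c, so the unique (a, b, c) index accepts both,
# even though they share the same (a, b).
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", [(1, 1, 1), (1, 1, 2)])

# A unique index on (a, b) cannot even be created over this data.
try:
    conn.execute("CREATE UNIQUE INDEX ux_ab ON t (a, b)")
    ab_enforceable = True
except sqlite3.IntegrityError:
    ab_enforceable = False

print(ab_enforceable)  # False
```

So if the (A,B) unique index were dropped, duplicates on (A,B) would be silently accepted.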
But in the absence of such unusual cases (for example, if both (A,B) and (A,B,C) allow duplicate entries), the (A,B) index is logically redundant. However, if column C is 'wide' (a CHAR(100) column perhaps), whereas A and B are small (say INTEGER), then the (A,B) index is more efficient than the (A,B,C) index because each page of the (A,B) index holds more entries. So, even though (A,B) is redundant, it may be worth keeping. You also need to consider the volatility of the table. If the table seldom changes, the extra indexes don't matter much; if the table changes a lot, extra indexes slow down modifications to the table. Whether that's significant is difficult to guess; you probably need to do performance measurements.

The first index covers queries that look up on A or on A,B. The second index can be used to cover queries that look up on A, on A,B, or on A,B,C, which is clearly a superset of the first case.
If C is very wide however the index on A,B may still be useful as it can satisfy certain queries with fewer reads.
e.g. if C was a char(800) column the following query may benefit significantly from having the narrower index available.
SELECT a,b
FROM YourTable
ORDER BY a,b
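Here is back-of-the-envelope arithmetic for the char(800) case, in Python. The numbers are assumptions for illustration only: 8,060 usable bytes per page (roughly SQL Server's in-row limit), 4-byte integer columns a and b, a 1,000,000-row table, and no per-row or pointer overhead. Real entries-per-page figures will be somewhat lower for both indexes.

```python
# Illustrative assumptions, not measured values.
PAGE_BYTES = 8060      # usable bytes per page (roughly SQL Server's in-row limit)
ROWS = 1_000_000       # rows in the hypothetical table
INT_BYTES = 4          # columns a and b
CHAR_C_BYTES = 800     # column c as char(800)

def pages_for(entry_bytes):
    """Entries per page and pages needed to read every entry (ceiling division)."""
    per_page = PAGE_BYTES // entry_bytes
    return per_page, -(-ROWS // per_page)

narrow = pages_for(2 * INT_BYTES)               # index on (a, b)
wide = pages_for(2 * INT_BYTES + CHAR_C_BYTES)  # index on (a, b, c)
print(narrow, wide)  # (1007, 994) (9, 111112)
```

Under these assumptions, the ORDER BY a,b scan reads roughly a hundred times fewer pages from the narrow index.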

Yes, this is a common optimization. Any query that would benefit from the index on A,B can also benefit just as well from the index on A,B,C.
In the MySQL community, there's even a tool to search your whole schema for redundant indexes: http://www.percona.com/doc/percona-toolkit/pt-duplicate-key-checker.html
The possible exception case would be if the index on A,B were more compact and used much more frequently, and you wanted to control which index was kept loaded in memory.
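This is not how pt-duplicate-key-checker works internally. It is just a minimal Python sketch of the basic rule: an index is flagged if its columns (with their sort directions) form a leading prefix of another index. Unique indexes are skipped because they may enforce a constraint, and mismatched directions are conservatively treated as different. The Index type and the index names are hypothetical.

```python
from typing import NamedTuple

class Index(NamedTuple):
    name: str
    columns: tuple  # ordered (column, direction) pairs, e.g. ("a", "asc")
    unique: bool = False

def redundant(indexes):
    """Return (redundant, covering) name pairs where the first index is a strict prefix of the second."""
    found = []
    for small in indexes:
        if small.unique:
            continue  # may be enforcing a constraint, never report it
        for big in indexes:
            n = len(small.columns)
            if big is not small and len(big.columns) > n and big.columns[:n] == small.columns:
                found.append((small.name, big.name))
                break
    return found

idx = [
    Index("ix_ab", (("a", "asc"), ("b", "asc"))),
    Index("ix_abc", (("a", "asc"), ("b", "asc"), ("c", "asc"))),
    Index("ux_ab", (("a", "asc"), ("b", "asc")), unique=True),
    Index("ix_ab_desc", (("a", "desc"), ("b", "asc"))),
]
print(redundant(idx))  # [('ix_ab', 'ix_abc')]
```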

Much of what I was thinking was written by Jonathan in a previous answer. Uniqueness, faster work, and one other thing I think he missed.
If the first index is made A desc, B asc and second A asc, B asc, C asc, then deleting the first index isn't really a way to go, because the second one isn't a superset of the first one, and your query cannot benefit from the second index if ordering is as written in the first one.
With the first index you can, of course, order by A desc, B asc, and also by A asc, B desc (reading the index backwards). You can also write a query that uses any leading part of that index, like ORDER BY A desc.
But a query like order by A asc, B asc, will not be 'covered' by the first index.
So I would add up, you can usually delete the first index, but that depends on your table configuration and your query (and, of course, indexes).
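The rule can be written as a small Python check. It is a simplification that ignores things like NULL ordering and collations. An ORDER BY can be read straight off an index if it matches a leading prefix of the index's columns, either in the stored directions or with every direction flipped (a backward scan).

```python
def serves_order_by(index_cols, order_by):
    """True if scanning the index forwards or backwards returns rows in order_by order.

    Both arguments are lists of (column, direction) pairs, direction "asc" or "desc".
    """
    if len(order_by) > len(index_cols):
        return False
    prefix = index_cols[:len(order_by)]
    flip = {"asc": "desc", "desc": "asc"}
    backwards = [(col, flip[d]) for col, d in prefix]
    return order_by == prefix or order_by == backwards

first = [("A", "desc"), ("B", "asc")]                # first index: A desc, B asc
second = [("A", "asc"), ("B", "asc"), ("C", "asc")]  # second index: A asc, B asc, C asc

print(serves_order_by(first, [("A", "desc"), ("B", "asc")]))   # True
print(serves_order_by(first, [("A", "asc"), ("B", "desc")]))   # True (backward scan)
print(serves_order_by(first, [("A", "desc")]))                 # True (leading part)
print(serves_order_by(first, [("A", "asc"), ("B", "asc")]))    # False
print(serves_order_by(second, [("A", "desc"), ("B", "asc")]))  # False
```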

I typically find this kind of "almost" duplicate index in tables that contain historical data. If column C is a date or integer column, be careful: the index is most likely there to satisfy a MAX function, as in WHERE tblA.C = MAX(tblB.C). That lookup skips the table altogether and uses an index-only access path.
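Here is a Python sketch of why such an index helps MAX, with hypothetical data. In a sorted (A, B, C) index, the largest C for a given A and B is simply the last entry of the (A, B) run. It can be found with one binary search and no table access. ISO-8601 date strings are used because they sort chronologically.

```python
from bisect import bisect_left

# Hypothetical index on (A, B, C), where C is a date stored as an ISO-8601 string.
index = sorted([
    (1, 10, "2023-01-05"), (1, 10, "2023-03-01"), (1, 10, "2024-02-29"),
    (1, 11, "2022-12-31"), (2, 10, "2021-06-30"),
])

def max_c(a, b):
    """MAX(C) WHERE A = a AND B = b: seek to the end of the (a, b) run, read one entry."""
    pos = bisect_left(index, (a, b + 1))  # first entry past the (a, b) run
    if pos == 0 or index[pos - 1][:2] != (a, b):
        return None  # no rows for this (a, b)
    return index[pos - 1][2]

print(max_c(1, 10))  # 2024-02-29
print(max_c(3, 1))   # None
```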
