MS Access: Best indexing strategy for retrieving DISTINCT combinations of joined fields

I have two tables in MS Access 2010:
Table tblA:
idA AutoNumber
a Text(255)
b Text(255)
c Text(255)
x Text(255)
y Text(255)
Table tblB:
idB AutoNumber
fkA Long Integer
d Text(255)
e Text(255)
z Text(255)
... and need to execute the following query:
SELECT DISTINCT
tblA.a
, tblA.b
, tblA.c
, tblB.d
, tblB.e
FROM tblA
INNER JOIN tblB
on tblA.idA = tblB.fkA
;
Both tables are very large and I was wondering what is the best indexing strategy to achieve the fastest response time.
idA and idB are the primary keys for their respective tables and fkA has its own index.
But what about tblA.a, tblA.b, tblA.c, tblB.d, tblB.e? Should I create a composite index on tblA.a, tblA.b, tblA.c and one on tblB.d, tblB.e? Or should each field be indexed individually?
I tried both options and the first one seems to yield slightly better results, though both are not very satisfactory in terms of performance. I would like to understand more about the theoretical background and appreciate every input.

As you are joining all records, the DBMS may simply decide on full table scans to join the tables.
With indexes on tblA(idA) and tblB(fkA) you give the DBMS the option to use these instead, but it is up to the DBMS whether it does (it will, hopefully, pick the faster way, whichever that is).
You can also offer the DBMS covering indexes. That means all columns used in the query are in the index, so if the DBMS uses it, it doesn't have to access the table additionally, but can get everything from the index itself. As you have no WHERE clause, the DBMS may still prefer to read the tables row by row rather than run through the indexes. The covering indexes would be:
tblA(idA, a, b, c)
tblB(fkA, d, e)
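A minimal sketch of those two covering indexes, using Python's built-in sqlite3 module as a stand-in engine (an assumption made only so the example runs anywhere; MS Access has its own optimizer and may choose a different plan). The index names ix_tblA_cover and ix_tblB_cover and the sample rows are made up for illustration.

```python
import sqlite3

# SQLite stands in for MS Access here; the planner behaviour will differ.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE tblA (idA INTEGER PRIMARY KEY, a TEXT, b TEXT, c TEXT, x TEXT, y TEXT);
CREATE TABLE tblB (idB INTEGER PRIMARY KEY, fkA INTEGER, d TEXT, e TEXT, z TEXT);
-- Covering indexes: join column first, then the selected columns.
CREATE INDEX ix_tblA_cover ON tblA (idA, a, b, c);
CREATE INDEX ix_tblB_cover ON tblB (fkA, d, e);
""")
con.executemany("INSERT INTO tblA VALUES (?, ?, ?, ?, ?, ?)", [
    (1, "a1", "b1", "c1", "x1", "y1"),
    (2, "a1", "b1", "c1", "x2", "y2"),  # same (a, b, c) as row 1
    (3, "a2", "b2", "c2", "x3", "y3"),
])
con.executemany("INSERT INTO tblB (fkA, d, e, z) VALUES (?, ?, ?, ?)", [
    (1, "d1", "e1", "z1"),
    (2, "d1", "e1", "z2"),  # produces the same combination as the first row
    (3, "d2", "e2", "z3"),
])

query = """
SELECT DISTINCT tblA.a, tblA.b, tblA.c, tblB.d, tblB.e
FROM tblA INNER JOIN tblB ON tblA.idA = tblB.fkA
"""
rows = sorted(con.execute(query).fetchall())

# Print whatever plan this engine picks; it is free to ignore the indexes.
for step in con.execute("EXPLAIN QUERY PLAN " + query):
    print(step[-1])
print(rows)
```

Whether the indexes are actually used is up to the engine, so the printed plan is worth reading rather than assuming.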

Related

Oracle multiple vs single column index

Imagine I have a table with the following columns:
Column: A (number(10)) (PK)
Column: B (number(10))
Column: C (number(10))
CREATE TABLE schema_name.table_name (
column_a number(10) primary key,
column_b number(10) ,
column_c number(10)
);
Column A is my PK.
Imagine my application now has a flow that queries by B and C. Something like:
SELECT * FROM SCHEMA.TABLE WHERE B=30 AND C=99
If I create an index only using the Column B, this will already improve my query right?
The strategy behind this query would benefit from the index on column B?
Q1 - If so, why should I create an index with those two columns?
Q2 - If I decided to create an index with B and C, If I query selecting only B, would this one be affected by the index?
The simple answers to your questions.
For this query:
SELECT *
FROM SCHEMA.TABLE
WHERE B = 30 AND C = 99;
The optimal index is either (B, C) or (C, B). The order does not matter because both comparisons are =.
An index on either column can be used, but all the matching values will need to be scanned to compare to the second value.
If you have an index on (B, C), then this can be used for a query on WHERE B = 30. Oracle also implements a skip-scan optimization, so it is possible that the index could also be used for WHERE C = 99 -- but it probably would not be.
I think the documentation for MySQL has a good introduction to multi-column indexes. It doesn't cover the skip-scan but is otherwise quite applicable to Oracle.
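The leading-column behaviour described above can be observed with a small experiment. This is only a sketch using Python's sqlite3 module as a stand-in (an assumption; Oracle's optimizer, including its skip-scan, behaves differently). The table name t, the extra column d, the index name ix_bc and the generated rows are all hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE t (a INTEGER PRIMARY KEY, b INTEGER, c INTEGER, d TEXT);
CREATE INDEX ix_bc ON t (b, c);  -- composite index, B first
""")
# 500 rows: b in 0..49, c in 90..99, so exactly one row has b=30 and c=99.
con.executemany("INSERT INTO t (b, c, d) VALUES (?, ?, ?)",
                [(b, c, "pad") for b in range(50) for c in range(90, 100)])

def plan(sql):
    """Return the EXPLAIN QUERY PLAN detail lines joined into one string."""
    return "; ".join(r[-1] for r in con.execute("EXPLAIN QUERY PLAN " + sql))

plan_bc = plan("SELECT * FROM t WHERE b = 30 AND c = 99")  # both columns
plan_b = plan("SELECT * FROM t WHERE b = 30")              # leading column only
plan_c = plan("SELECT * FROM t WHERE c = 99")              # trailing column only

matches = con.execute(
    "SELECT COUNT(*) FROM t WHERE b = 30 AND c = 99").fetchone()[0]

for label, p in (("b and c", plan_bc), ("b only", plan_b), ("c only", plan_c)):
    print(label, "->", p)
```

In this stand-in engine the first two queries seek into ix_bc, while the query on C alone falls back to a scan. On Oracle, compare the real execution plans instead of relying on this sketch.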
Short answer: always check the real performance, not the theoretical one. That means my answer needs to be verified against a real database.
In SQL databases (Oracle, PostgreSQL, MS SQL Server, etc.) the primary key can serve at least two purposes, depending on the engine and the table organization:
Ordering of rows (e.g. if the PK only ever increases, new rows are simply appended).
Linking to the row. In engines that cluster the table on the PK, every additional index stores the whole PK so it can jump from the index entry to the rest of the row. (An Oracle heap table stores a ROWID in the index entry instead.)
If I create an index only using the Column B, this will already improve my query right?
The strategy behind this query would benefit from the index on column B?
It depends. If your table is very small, Oracle may simply do a full scan of it. For a large table Oracle can (and in the common scenario will) use the index on column B and do a range scan. In that case Oracle checks every row with B=30. So if only one row has B=30, you get good performance; if millions of rows do, Oracle needs millions of reads. Oracle learns about this distribution from its statistics.
Q1 - If so, why should I create an index with those two columns?
It gives direct access to the row: Oracle then needs just a few jumps to find it. Moreover, if the combination of B and C is unique, you can declare the index UNIQUE to help Oracle; it then knows that no more than a single row will be returned.
However, if your table has other columns, the real execution plan will still include an access to the table itself (to retrieve the other columns).
If I decided to create an index with B and C, If I query selecting only B, would this one be affected by the index?
Yes. If an index has several columns, Oracle sorts its entries according to the column order. E.g. if you create an index on columns B, C, then Oracle is able to use it for conditions like "B=30", i.e. when you restrict only B.
Well, it all depends.
If that table is tiny, you won't see any benefit regardless of any indexes you might create - it is just too small and Oracle returns data immediately.
If the table is huge, then it depends on the column's selectivity. There's no guarantee that Oracle will ever use that index. If the optimizer decides (based on the information it has - don't forget to regularly collect statistics!) that the index should not be used, then you created it in vain (you can choose to use a hint, but unless you know what you're doing, don't).
How will you know what's going on? See the explain plan.
But, generally speaking, yes - indexes help.
Q1 - If so, why should I create an index with those two columns?
Which "two columns"? A? If it is a primary key column, Oracle automatically creates an index, you don't have to do that.
Q2 - If I decided to create an index with B and C, If I query selecting only B, would this one be affected by the index?
If you are talking about a composite index (containing both B and C columns, in that order), and if the query uses column B, then yes - the index will (OK, might) be used. But if the query uses only column C, then this index will most likely be useless.
In spite of this question being answered and one answer being accepted already, I'll just throw in some more information :-)
An index is an offer to the DBMS that it can use to access data quicker in some situations. Whether it actually uses the index is a decision made by the DBMS.
Oracle has a built-in optimizer that looks at the query and tries to find the best execution plan to get the results you are after.
Let's say that 90% of all rows have B = 30 AND C = 99. Why then should Oracle laboriously walk through the index only to have to access almost every row in the table at last? So, even with an index on both columns, Oracle may decide not to use the index at all and even perform the query faster because of the decision against the index.
Now to the questions:
If I create an index only using the Column B, this will already improve my query right?
It may. If Oracle thinks that B = 30 reduces the rows it will have to read from the table immensely, it will.
If so, why should I create an index with those two columns?
If the combination of B = 30 AND C = 99 limits the rows to read from the table further, it's a good idea to use this index instead.
If I decided to create an index with B and C, If I query selecting only B, would this one be affected by the index?
If the index is on (B, C), i.e. B first, then Oracle may find it useful, yes. In the extreme case that there are only the two columns in the table, that would even be a covering index (i.e. containing all columns accessed in the query) and the DBMS wouldn't have to read any table row, as all the information is already in the index itself. If the index is (C, B), i.e. C first, it is quite unlikely that the index would be used. In some edge-case situations, Oracle might do so, though.

Does indexing in Postgres improve ordering speed?

Let's say you have a table with a primary key A, and two columns B and C.
When querying we want to do SELECT * FROM table WHERE A = 'thing' ORDER BY B, C
Since A is a primary key, it already has an index. Is there any benefit to adding an index on B and C in terms of speeding up ordering?
Thanks!
This query cannot benefit from additional indexes.
If a is the primary key, then the query can only return zero or one rows, so ordering is trivial and cannot be made faster.
In fact, you should omit the ORDER BY clause.

Does it make sense to index a table with just one column?

I wonder if it makes sense to index a table, which contains just a single column? The table will be populated with 100's or 1000's of records and will be used to JOIN to another (larger table) in order to filter its records.
Thank you!
Yes and no. An explicit index probably does not make sense. However, defining the single column as a primary key is often done (assuming it is never NULL and unique).
This is actually a common practice. It is not uncommon for me to create exclusion tables, with logic such as:
from . . .
where not exists (select 1 from exclusion_table et where et.id = ?.id)
A primary key index can speed up such a query.
In your case, it might not make a difference if the larger table has an index on the id used for the join. However, you can give the optimizer the option of choosing which index to use.
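The exclusion-table pattern from this answer can be sketched as follows. Python's sqlite3 module is used only to make the example runnable; big_table, its payload column and the sample ids are hypothetical, while exclusion_table follows the naming in the answer.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE big_table (id INTEGER, payload TEXT);
-- The single column doubles as the primary key, so it is indexed implicitly.
CREATE TABLE exclusion_table (id INTEGER PRIMARY KEY);
""")
con.executemany("INSERT INTO big_table VALUES (?, ?)",
                [(i, f"row {i}") for i in range(1, 11)])
con.executemany("INSERT INTO exclusion_table VALUES (?)", [(2,), (5,), (9,)])

# Keep every row of big_table whose id is not listed in exclusion_table.
kept = [r[0] for r in con.execute("""
    SELECT bt.id
    FROM big_table bt
    WHERE NOT EXISTS (SELECT 1 FROM exclusion_table et WHERE et.id = bt.id)
    ORDER BY bt.id
""")]
print(kept)
```

Declaring the column as the primary key gives the optimizer an index for the NOT EXISTS probe without creating a separate explicit index.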
My vote is that it probably doesn't really make sense in your scenario. You're saying this table with a single column will be joined to another table to filter records in the other table, so why not just delete this table, index the other column in the other table, and filter that?
Essentially, why are you writing:
SELECT * FROM manycols M INNER JOIN singlecol s ON m.id = s.id WHERE s.id = 123
When that is this:
SELECT * FROM manycols m WHERE m.id = 123
Suppose the argument is that manycols has a million rows and singlecol has a thousand. If you want the thousand matching rows, it's manycols that would need to be indexed to see a benefit.
Suppose the argument is that you want all rows except those in singlecol; you could index singlecol, but the optimiser might choose to just load the entire table into a hash anyway, so again, indexing it wouldn't necessarily help.
It feels like there's probably another way to do what you require that ditches this single-column table entirely.

DB Design: Looking for performance improvement when a BIT column from every table is used in every SQL query

I recently got added to a new ASP.NET project (a web application). There were recent performance issues with the application, and I am on a team whose current task is to optimize some slow-running stored procedures.
The database design is highly normalized. All the tables have a BIT column, [Status_ID]. In every stored procedure, for every T-SQL query, this column is involved in the WHERE condition for all tables.
Example:
Select A.Col1,
C.Info
From dbo.table1 A
Join dbo.table2 B On A.id = B.id
Left Join dbo.table21 C On C.map = B.Map
Where A.[Status_ID] = 1
And B.[Status_ID] = 1
And C.[Status_ID] = 1
And A.link > 50
In the above SQL, 3 tables are involved, and the [Status_ID] column from all 3 tables is involved in the WHERE condition. This is just an example; [Status_ID] is involved like this in almost all the queries.
When I look at the execution plans of most of the SPs, there are a lot of Key Lookup (Clustered) operations, and most of them are looking up [Status_ID] in the respective table.
In the application, I found that it is not possible to remove these column checks from the queries. So:
Will it be a good idea to
Alter all [Status_ID] columns to NOT NULL, and then add them to the PRIMARY KEY of each table? Key values 12, 13, ... would become (12,1), (13,1), ...
Add the [Status_ID] column to the INCLUDE part of all the nonclustered indexes on each table?
Please share your suggestions on the above two points, as well as any others.
Thanks for reading.
If you add Status_ID to the PK, you change the definition of the PK.
If you add Status_ID to the PK, then you could have duplicate IDs.
And changing the Status_ID would fragment the index.
Don't do that.
The PK should be what makes the row unique.
Add a separate nonclustered index for the Status_ID.
And if it is never null, then change the column to NOT NULL.
This will only cut the workload in half.
Another option is to add [Status_ID] to every other nonclustered index.
But if it is first, it only cuts the workload in half.
And if it is second, it is only effective if the other component of the index is in the query.
Try Status_ID as a separate index.
I suspect the query optimizer will be smart enough to evaluate it last since it will be the least specific index.
If you don't have an index on link, then add one.
And try changing the query.
Sometimes this helps the query optimizer.
Select A.Col1, C.Info
From dbo.table1 A
Join dbo.table2 B
On A.id = B.id
AND A.[Status_ID] = 1
And A.link > 50
And B.[Status_ID] = 1
Left Join dbo.table21 C
On C.map = B.Map
And C.[Status_ID] = 1
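One thing worth checking before adopting a rewrite like this: with a LEFT JOIN, a filter on the right-hand table in the WHERE clause discards the unmatched (NULL) rows, while the same filter in the ON clause keeps them. The sketch below shows the difference on made-up data, using Python's sqlite3 module as a stand-in for SQL Server (an assumption; the table and column names follow the question, the rows are hypothetical).

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE table1  (id INTEGER, link INTEGER, Status_ID INTEGER, Col1 TEXT);
CREATE TABLE table2  (id INTEGER, Map INTEGER, Status_ID INTEGER);
CREATE TABLE table21 (map INTEGER, Status_ID INTEGER, Info TEXT);
INSERT INTO table1  VALUES (1, 60, 1, 'A1'), (2, 70, 1, 'A2'), (3, 40, 1, 'A3');
INSERT INTO table2  VALUES (1, 10, 1), (2, 20, 1), (3, 30, 1);
INSERT INTO table21 VALUES (10, 1, 'info10'), (20, 0, 'info20');
""")

# Filters in WHERE (as in the question): rows without an active C are dropped.
where_rows = con.execute("""
    SELECT A.Col1, C.Info
    FROM table1 A
    JOIN table2 B ON A.id = B.id
    LEFT JOIN table21 C ON C.map = B.Map
    WHERE A.Status_ID = 1 AND B.Status_ID = 1 AND C.Status_ID = 1
      AND A.link > 50
    ORDER BY A.Col1
""").fetchall()

# Filters in ON (as in the rewrite): rows without an active C come back with NULL.
on_rows = con.execute("""
    SELECT A.Col1, C.Info
    FROM table1 A
    JOIN table2 B ON A.id = B.id AND A.Status_ID = 1 AND A.link > 50
                 AND B.Status_ID = 1
    LEFT JOIN table21 C ON C.map = B.Map AND C.Status_ID = 1
    ORDER BY A.Col1
""").fetchall()

print(where_rows)
print(on_rows)
```

If the application relies on the original behaviour, the two forms are not interchangeable, so compare the result sets as well as the timings.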
Check the fragmentation of the indexes.
Check the type of join.
If it is using a loop join, then try join hints.
This query should not be performing poorly.
It might be lock contention.
Try with (nolock).
That might not be an acceptable long-term solution, but it would tell you if locks are the problem.

mysql: which queries can utilize which indexes?

I'm using Mysql 5.0 and am a bit new to indexes. Which of the following queries can be helped by indexing and which index should I create?
(Don't assume either table to have unique values. This isn't homework, it's just some examples I made up to try and get my head around indexing.)
Query1:
Select a.*, b.*
From a
Left Join b on b.type=a.type;
Query2:
Select a.*, b.*
From a,b
Where a.type=b.type;
Query3:
Select a.*
From a
Where a.type in (Select b.type from b where b.brand=5);
Here is my guess for what indexes would be use for these different kinds of queries:
Query1:
Create Index Query1 Using Hash on b (type);
Query2:
Create Index Query2a Using Hash on a (type);
Create Index Query2b Using Hash on b (type);
Query3:
Create Index Query3 Using Hash on b (brand,type);
Am I correct that neither Query1 nor Query3 would utilize any indexes on table a?
I believe these should all be hash because there is only = or !=, right?
Thanks
using the explain command in mysql will give a lot of great info on what mysql is doing and how a query can be optimized.
in q1 and q2: an index on (a.type, all other a cols) and one on (b.type, all other b cols)
in q3: an index on (a.b_type, all other a cols) and one on b (brand, type)
ideally, you'd want all the columns that were selected stored directly in the index so that mysql doesn't have to jump from the index back to the table data to fetch the selected columns. however, that is not always manageable (i.e.: sometimes you need to select * and indexing all columns is too costly), in which case indexing just the search columns is fine.
so everything you said works great.
query 3 is invalid, but i assume you meant
where a.type in ....
Query 1 is the same as query two, just better syntax, both probably have the same query plan and both will use both indexes.
Query 3 will use the index on b.brand, but not the type portion of it. It would also use an index on a.type if you had one.
You are right that they should be hash indexes.
Query 3 could utilize an index on a.type if the number of b's with brand=5 is close to zero
Query2 will utilize indices if they are B-trees (and thus are sorted). Using hash indices with index-join may slow down your query (because you'll have to read Size(a) values in non-sequential way)
Query optimization and indexing is a huge topic, so you'll definitely want to read about MySQL and the specific storage engines you're using. As far as I know, "using hash" is only honored by the MEMORY and NDB engines; InnoDB and MyISAM accept the syntax but build a B-tree index anyway.
The joins you have will perform a full table or index scan even though the join condition is an equality; every row will have to be read because there's no WHERE clause.
You'll probably be better off with a standard B-tree index, but measure it and investigate the query plan with "explain". MySQL InnoDB stores row data organized by primary key, so you should also have a primary key on your tables, not just an index. It's best if you can use the primary key in your joins, because otherwise MySQL retrieves the primary key from the index and then does another fetch to get the row. The nice exception to that rule is if your secondary index includes all the columns you need in the query. That's called a covering index, and MySQL will not have to look up the row at all.
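The difference between a covering and a non-covering secondary index shows up directly in a query plan. The sketch below uses Python's sqlite3 module rather than MySQL (an assumption made so it runs without a server; MySQL's EXPLAIN reports the same idea as "Using index" in its Extra column). The descr column and the index name ix_b_brand_type are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE b (id INTEGER PRIMARY KEY, brand INTEGER, type INTEGER, descr TEXT);
CREATE INDEX ix_b_brand_type ON b (brand, type);  -- ordinary B-tree index
""")

def plan(sql):
    """Return the EXPLAIN QUERY PLAN detail lines joined into one string."""
    return "; ".join(r[-1] for r in con.execute("EXPLAIN QUERY PLAN " + sql))

# Only brand and type are needed, so the index alone answers the query.
covered = plan("SELECT type FROM b WHERE brand = 5")

# descr is not in the index, so each match needs an extra row lookup.
not_covered = plan("SELECT type, descr FROM b WHERE brand = 5")

print(covered)
print(not_covered)
```

The first plan reports a covering index; the second uses the same index but has to go back to the row for descr.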