I'm new to SQL. If I use this query frequently:
SELECT * FROM student WHERE key1=? AND key2=?
I want to create an index on student. What is the main difference between these two options?
CREATE INDEX idx_key1 on student (key1);
CREATE INDEX idx_key2 on student (key2);
and
CREATE INDEX idx_keys on student (key1, key2);
Thanks!
The second one (CREATE INDEX idx_keys on student (key1, key2)) finds all the rows you need with a single index seek, followed by key lookups to fetch the remaining columns.
If you create two single-column indexes, usually only one of them is used for the index seek. Then, for every row it returns, you need a key lookup to get the other key and filter the results. Or the DB engine may simply decide it's faster to do a table scan and filter.
So the second one is much better for your query.
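To see the difference concretely, here is a minimal sketch using SQLite through Python's built-in sqlite3 module. SQLite is an assumption (the question names no database), and the id and name columns are hypothetical. SQLite uses at most one index per table for an AND condition like this one, so the "only one index does the seek" case is easy to observe; other engines may behave somewhat differently.

```python
import sqlite3

QUERY = "SELECT * FROM student WHERE key1 = ? AND key2 = ?"


def plan_for(index_ddl):
    """Create a fresh in-memory student table, run the given CREATE INDEX
    statements, and return the detail text of EXPLAIN QUERY PLAN for QUERY."""
    conn = sqlite3.connect(":memory:")
    # id and name are hypothetical extra columns, so SELECT * needs the table row.
    conn.execute(
        "CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT,"
        " key1 INTEGER, key2 INTEGER)"
    )
    for ddl in index_ddl:
        conn.execute(ddl)
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, (1, 2)).fetchall()
    conn.close()
    # Each plan row is (id, parent, notused, detail); keep the readable detail.
    return [row[3] for row in rows]


separate = plan_for([
    "CREATE INDEX idx_key1 ON student (key1)",
    "CREATE INDEX idx_key2 ON student (key2)",
])
composite = plan_for(["CREATE INDEX idx_keys ON student (key1, key2)"])

print(separate)   # one single-column index; the other key is checked row by row
print(composite)  # idx_keys constrains key1 and key2 in one seek
```

The first plan should name a single index with only one key constraint; the second should show idx_keys constrained on both key1 and key2.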
This is some pseudo SQL in which the 'problem' is easily replicated:
create table Child (
childId text primary key,
some_int int not null
);
create table Person (
personId text primary key,
childId text,
foreign key (childId) references Child (childId) on delete cascade
);
create index Person_childId on Person (childId);
explain query plan select count(1)
from Person
left outer join Child on Child.childId = Person.childId
where Person.childId is null or Child.some_int = 0;
The result of the query plan is this:
SCAN Person USING COVERING INDEX Person_childId
SEARCH Child USING INDEX sqlite_autoindex_Child_1 (childId=?) LEFT-JOIN
This looks great, right? But I wonder whether this is the 'full' plan, because some_int does not have an index. The query plan does not reveal this; I don't see the filtering anywhere. The database must filter on this field, right?
When I filter on the some_int field in a separate query, the plan shows a SCAN, exactly as I thought I would see in the previous query plan, because there is no index:
explain query plan select * from Child where some_int = 0;
Gives:
SCAN Child
Now my questions:
Why isn't there SCAN Child shown in the first query plan?
Why is there a SCAN on Person and not a SEARCH?
Is the first query plan 'quick' or do I still need to add an index?
You should take a look at this page. It explains the SQLite query planner in depth, and you can find answers to all your questions there.
Note that filtering conditions like WHERE some_int=0 are not displayed in the query plan: they don't change how rows are located, only which of the rows read end up in the result set.
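A small SQLite check of that point, assuming Python's built-in sqlite3 module: the plan for the question's Child query comes out the same with or without the WHERE some_int = 0 filter, because the filter is evaluated on each row the SCAN produces rather than being a separate plan step.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Child table as defined in the question.
conn.execute("CREATE TABLE Child (childId TEXT PRIMARY KEY, some_int INT NOT NULL)")


def plan(sql):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


unfiltered = plan("SELECT * FROM Child")
filtered = plan("SELECT * FROM Child WHERE some_int = 0")

# Both plans read every row of Child; the WHERE test is applied to each row
# as it is read, so it does not show up as its own step.
print(unfiltered)
print(filtered)
```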
In brief:
Why isn't there SCAN Child shown in the first query plan?
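Because of the LEFT JOIN, SQLite needs to SCAN Person and, for every row of Person, use the index on Child.childId (the automatic primary-key index sqlite_autoindex_Child_1) to find the corresponding record in Child. Child is searched by key, never scanned.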
Why is there a SCAN on Person and not a SEARCH?
A SCAN reads all rows of a table (or of an index) in the order in which they are stored. A SEARCH uses an index to jump straight to the matching rows: for example, it looks up a value in the index to find the rowid, then uses the rowid to get that row of the table, without scanning the whole table to find it.
Since your query needs to read every Person.childId, it does a full SCAN; here it scans the covering index Person_childId, which is smaller than the table itself.
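The SCAN/SEARCH distinction can be seen on the question's Person table. This sketch assumes Python's sqlite3 module; the lookup values 'c1' and 'p1' are hypothetical, and the foreign key is left out since it doesn't affect these plans. The exact plan wording varies a little between SQLite versions (older ones print SCAN TABLE / SEARCH TABLE).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Person table and index from the question (foreign key omitted for brevity).
conn.executescript("""
CREATE TABLE Person (personId TEXT PRIMARY KEY, childId TEXT);
CREATE INDEX Person_childId ON Person (childId);
""")


def plan(sql):
    """Return the detail text of the single EXPLAIN QUERY PLAN row."""
    return conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()[0][3]


# Needs every childId, so every index entry is visited: SCAN.
all_rows = plan("SELECT childId FROM Person")
# Needs one value, so the index can jump straight to it: SEARCH.
one_child = plan("SELECT childId FROM Person WHERE childId = 'c1'")
# Same idea through the primary key's automatic index.
one_person = plan("SELECT * FROM Person WHERE personId = 'p1'")

for detail in (all_rows, one_child, one_person):
    print(detail)
```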
Is the first query plan 'quick' or do I still need to add an index?
Your query is already using all the indexes it could use, so it is already about as fast as it can get.
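One way to check this claim, sketched with Python's sqlite3 module and the question's own schema: add a hypothetical index Child_some_int on the filtered column and compare the plans. Because each Child row is already found through its primary key, the extra index is not expected to change anything.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Child (childId TEXT PRIMARY KEY, some_int INT NOT NULL);
CREATE TABLE Person (
    personId TEXT PRIMARY KEY,
    childId TEXT,
    FOREIGN KEY (childId) REFERENCES Child (childId) ON DELETE CASCADE
);
CREATE INDEX Person_childId ON Person (childId);
"""
QUERY = """
SELECT count(1)
FROM Person
LEFT OUTER JOIN Child ON Child.childId = Person.childId
WHERE Person.childId IS NULL OR Child.some_int = 0
"""


def plan(extra_ddl=""):
    """Build the schema (plus optional extra DDL) and return the plan details."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA + extra_ddl)
    detail = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY)]
    conn.close()
    return detail


before = plan()
# Hypothetical extra index on the column used only in the WHERE filter.
after = plan("CREATE INDEX Child_some_int ON Child (some_int);")
print(before)
print(after)
```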
I have a patient table with a few columns, and a clustered index on column ID and a non-clustered index on column birth.
create clustered index CI_patient on dbo.patient (ID)
create nonclustered index NCI_patient on dbo.patient (birth)
Here are my queries:
select * from patient
select ID from patient
select birth from patient
Looking at the execution plans, the first query is a 'clustered index scan', which is understandable because the table is a clustered table. The third one is an 'index scan (nonclustered)', which is also understandable because this column has a nonclustered index.
My question is: why is the second one an 'index scan (nonclustered)'? This column is supposed to have a clustered index, so shouldn't that be a clustered index scan? Any thoughts on this?
Basically, your second query wants to get all ID values from the table (no WHERE clause or anything).
SQL Server can do this two ways:
1) Clustered index scan: basically a full table scan that reads all the data from all rows and extracts the ID from each row. This works, but it reads the WHOLE table, page by page.
2) Scan across the non-clustered index: every non-clustered index also includes the clustering column(s) at its leaf level. Since this index is much smaller than the full table, SQL Server needs to read fewer data pages and can therefore return the answer (all ID values from all rows) faster than with a full table scan (clustered index scan).
The cost-based optimizer in SQL Server just picks the more efficient route to get the answer to the question you've asked with your second query.
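SQL Server itself can't be scripted here, so this is only a rough analog in SQLite via Python's sqlite3 (an assumption, not the asker's setup). In SQLite an INTEGER PRIMARY KEY column is the rowid, so the table is stored ordered by ID, similar to a clustered index, and every secondary index also stores the rowid. The name and address columns are hypothetical. The planner's choice for SELECT ID should mirror the answer: it reads the smaller index NCI_patient instead of the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical SQLite stand-in for dbo.patient; ID is the rowid.
conn.executescript("""
CREATE TABLE patient (
    ID INTEGER PRIMARY KEY,
    name TEXT,
    address TEXT,
    birth TEXT
);
CREATE INDEX NCI_patient ON patient (birth);
""")


def plan(sql):
    """Return the detail strings from EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


for sql in ("SELECT * FROM patient",      # needs every column: table scan
            "SELECT ID FROM patient",     # rowid is in the index: index scan
            "SELECT birth FROM patient"): # birth is in the index: index scan
    print(sql, "->", plan(sql))
```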
I have never created an index before, but I'm thinking it may help here. I have a SAS dataset of approximately 7 million records. It is a listing of employee entries along with their respective timestamps. I want to identify whether there are any subsequent entries by the same user on the same day and, if so, note the timestamp.
The data set (Entries) has 3 columns: Storage_ID, User_ID and EventTimestamp.
I'm thinking maybe an index on Storage_ID and User_ID would help speed things along.
If it would help, how/where would I go about creating the index?
PROC SQL;
CREATE TABLE sub_ENTRIES AS
SELECT A.*,
(SELECT
MIN(B.EVENTTIMESTAMP)
FROM
ENTRIES B
WHERE
A.STORAGE_ID=B.STORAGE_ID
AND A.USER_ID=B.USER_ID
AND DATEPART(A.EVENTTIMESTAMP)=DATEPART(B.EVENTTIMESTAMP)
AND B.EVENTTIMESTAMP > A.EVENTTIMESTAMP
) AS NEXT_ACCESS FORMAT=DATETIME27.6
FROM
ENTRIES A
;
You can create a composite index (two or more columns) using SQL.
For example:
Proc SQL;
create index STORAGE_USER on ENTRIES (storage_id, user_id);
quit;
The general syntax for an index key of n columns is:
create index <index-name>
on <table-name>
( <column-name-1>,
<column-name-2>,
…
<column-name-<n>>
)
The index is most effective when the query's WHERE or join criteria involve all the columns of the composite key. Use OPTIONS MSGLEVEL=I; to have SAS write index-usage notes to the log.
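The idea can be sketched outside SAS, using SQLite via Python's sqlite3 module as a stand-in. The lowercase names, timestamps stored as ISO-format text, and SQLite's date() in place of SAS's DATEPART() are all assumptions, and the sample rows are made up. The plan should show the correlated subquery searching the composite index on both storage_id and user_id, which is the case where the answer says the index helps most.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the SAS ENTRIES dataset.
conn.execute("CREATE TABLE entries (storage_id INTEGER, user_id INTEGER, event_ts TEXT)")
# Composite index, mirroring the answer's STORAGE_USER index.
conn.execute("CREATE INDEX storage_user ON entries (storage_id, user_id)")
conn.executemany(
    "INSERT INTO entries VALUES (?, ?, ?)",
    [
        (1, 10, "2024-01-05 08:00:00"),
        (1, 10, "2024-01-05 09:30:00"),
        (1, 10, "2024-01-06 07:00:00"),
        (2, 10, "2024-01-05 08:15:00"),
    ],
)

# Same shape as the question's query: for each entry, the earliest later entry
# for the same storage/user pair on the same calendar day. ISO text timestamps
# compare correctly as strings, so > works as "later than".
QUERY = """
SELECT a.storage_id, a.user_id, a.event_ts,
       (SELECT MIN(b.event_ts)
          FROM entries b
         WHERE b.storage_id = a.storage_id
           AND b.user_id = a.user_id
           AND date(b.event_ts) = date(a.event_ts)
           AND b.event_ts > a.event_ts) AS next_access
  FROM entries a
 ORDER BY a.storage_id, a.user_id, a.event_ts
"""

rows = conn.execute(QUERY).fetchall()
plan_detail = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY)]

for row in rows:
    print(row)
for detail in plan_detail:
    print(detail)
```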
I want to extract the record of a person from a table (employ), so I wrote this query:
SELECT *
FROM employ
WHERE employ_Id=some_specific_id
Now my question is: what does this query do first? Does it go to the table (employ), select all the records and then apply the filter, or does it go to the table and directly find the record of the employee with the specific id given in the WHERE clause?
1) In many databases (e.g. SQL Server, MySQL/InnoDB), table rows are stored in primary key order (this is known as a clustered index). So when you use the primary key in the WHERE condition, the RDBMS doesn't need to scan the whole table (all records).
2) For columns other than the primary key, the RDBMS checks whether an index exists on the table that can be used for your WHERE condition, so it can avoid a full table scan.
3) If none of the above is possible, a full table scan is performed.
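These three cases can be observed with a minimal sketch in SQLite via Python's sqlite3 module (an assumption; the question doesn't name a database). SQLite stores an ordinary table in rowid order, and an INTEGER PRIMARY KEY column is that rowid, which roughly corresponds to case 1). The name and dept columns, the employ_dept index, and the lookup values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# employ_Id comes from the question; name and dept are hypothetical.
conn.execute("CREATE TABLE employ (employ_Id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")


def plan(sql):
    """Return the detail text of the single EXPLAIN QUERY PLAN row."""
    return conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()[0][3]


# 1) Primary key: the row is looked up by rowid, no scan.
by_pk = plan("SELECT * FROM employ WHERE employ_Id = 42")
# 3) No usable index on dept yet: full table scan.
by_dept_no_index = plan("SELECT * FROM employ WHERE dept = 'HR'")
# 2) After indexing dept, the same query can search the index.
conn.execute("CREATE INDEX employ_dept ON employ (dept)")
by_dept_index = plan("SELECT * FROM employ WHERE dept = 'HR'")

print(by_pk)
print(by_dept_no_index)
print(by_dept_index)
```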
Without a usable index, the query has to look through ALL ROWS to see whether they match your condition. This is why the more data you have, the longer the query takes.
If your condition is on an indexed column, as I believe is the case in your query (assuming employ_Id is the primary key of that table), then the search only uses that sorted index, which is much faster because not all the rows need to be checked.
1-> First, the database checks for the table in the user_tab data dictionary.
2-> Then it checks that the columns exist in the table; if they do, it checks the WHERE condition.
3-> The condition may or may not be true for each row; for the rows where it is true, control goes on to the selected columns.
Simple question I think. I want to do an index scan on a table but it's not doing it. So I have a table with a unique clustered index on ID column and have 2 other columns, first_name and last_name. The following was my query...
SELECT FIRST_NAME
FROM TABLE_A
WHERE FIRST_NAME LIKE 'GUY'
I thought since I wasn't searching on the column with the index it should do it.
Why isn't it working and how do I make sure that I can get this to work every time I want it to?
Since first_name is not part of any index, there's no point in the database using an index: it would have to scan the whole index, access the actual table row for each entry, and evaluate the first_name value there. Since it's accessing all the table's rows anyway, the optimizer prefers a full table scan and skips the (useless) index accesses.
If you want to use an index to speed up your query, you should create one that covers this column. E.g.:
CREATE INDEX table_a_first_name_ind ON table_a(first_name)
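A minimal before/after sketch of that suggestion, assuming SQLite via Python's sqlite3 module; the column types are guesses. It uses = 'GUY' instead of LIKE 'GUY': a pattern with no wildcards acts much like an equality test, but SQLite's LIKE is case-insensitive by default and only uses an index under extra collation conditions (its "LIKE optimization"), which would complicate the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column types are assumptions; the question only names the columns.
conn.execute(
    "CREATE TABLE table_a (ID INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT)"
)

QUERY = "SELECT first_name FROM table_a WHERE first_name = 'GUY'"


def plan():
    """Return the detail text of the single EXPLAIN QUERY PLAN row for QUERY."""
    return conn.execute("EXPLAIN QUERY PLAN " + QUERY).fetchall()[0][3]


before = plan()  # no index contains first_name, so the table is scanned
conn.execute("CREATE INDEX table_a_first_name_ind ON table_a (first_name)")
after = plan()   # the new index can be searched directly for 'GUY'

print(before)
print(after)
```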