Indexing boolean fields - sql

Is there going to be much benefit in indexing a boolean field in a database table?
Given a common situation, like "soft-delete" records which are flagged as inactive, and hence most queries include WHERE deleted = 0, would it help to have that field indexed on its own, or should it be combined with the other commonly-searched fields in a different index?

No.
You index fields that are searched upon and have high selectivity/cardinality. A boolean field's cardinality is obliterated in nearly any table. If anything it will make your writes slower (by an oh so tiny amount).
Maybe you would make it the first field in the clustered index if every query took into account soft deletes?

What is about a deleted_at DATETIME column? There are two benefits.
If you need an unique column like name, you can create and soft-delete a record with the same name multiple times (if you use an unique index on the columns deleted_at AND name)
You can search for recently deleted records.
You query could look like this:
SELECT * FROM xyz WHERE deleted_at IS NULL

I think it would help, especially in covering indices.
How much/little is of course dependent on your data and queries.
You can have theories of all sorts about indices but final answers are given by the database engine in a database with real data. And often you are surprised by the answer (or maybe my theories are too bad ;)
Examine the query plan of your queries and determine if the queries can be improved, or if the indices can be improved.
It's quite simple to alter indices and see what difference it makes

I think it would help if you were using a view (where deleted = 0) and you are regularly querying from this view.

i think if your boolean field is such that you would be referring to them in many cases, it would make sense to have a separate table, example DeletedPages, or SpecialPages, which will have many boolean type fields, like is_deleted, is_hidden, is_really_deleted, requires_higher_user etc, and then you would take joins to get them.
Typically the size of this table would be smaller and you would get some advantage by taking joins, especially as far as code readability and maintainability is concerned. And for this type of query:
select all pages where is_deleted = 1
It would be faster to have it implemented like this:
select all pages where pages
inner join DeletedPages on page.id=deleted_pages.page_id
I think i read it somewhere about mysql databases that you need a field to at least have cardinality of 3 to make indexing work on that field, but please confirm this.

If you are using database that supports bitmap indexes (such as Oracle), then such an index on a boolean column will much more useful than without.

Related

Index on multiple bit fields in SQL Server

We currently have a scenario where one table effectively has several (10 to 15) boolean flags (not nullable bit fields). Unfortunately, it is not really possible to simplify this too much on a logical level, because any combination of the boolean values is permissible.
The table in question is a transactional table which may end up having tens of millions of rows, and both insert and select performance is fairly critical. Although we are not quite sure of the distribution of the data at this time, the combination of all flags should provide relative good cardinality, i.e. make it a "worthwhile" index for SQL Server to make use of.
Typical select query scenarios might be to select records based on 3 or 4 of the flags only, e.g. WHERE FLAG3=1 AND FLAG7=0 AND FLAG9=1. It would not be practical to create separate indexes for all combinations of the flags used by these select queries, as there will be many of them.
Given this situation, what would be the recommended approach to effectively index these fields? The table is new, so there is no existing data to worry about yet, and we have a fair amount of flexibility in the actual implementation of the table.
There are two main options that we are considering at the moment:
Create a single index which includes all the bit fields (this would probably include 1 or 2 other int fields which would always used). My concern is that given the typical usage of only including a few of the fields, this approach would skip the index and resort to a table scan. Let's call this Option A (Having read some of the replies, it seems that this approach would not work well, since the order of the fields in the index would make a difference making it impossible to index effectively on ALL the fields).
Effectively do what I believe SQL Server is doing internally, and encode the bit fields into a single int field using binary operators (AND-ing and OR-ing numbers together: 1, 2, 4, 8, etc). My concern here is that we'd need to do some kind of calculation to query on this encoded field, which would skip the index again. Maintenance and the complexity of this solution is also a concern. Let's call this Option B. Additional info: The argument for this approach is that we could have a relatively simple and short index which includes one or two other fields from the table and this field. The other fields would narrow down the number of records needing to be evaluated, and since the encoded field would contain all of our bit fields, SQL Server would be able to perform the calculation using the data retrieved from the index directly (i.e. an index scan) as opposed to the table (i.e. a table scan).
At the moment, we are heavily leaning towards Option B. For completeness, this would be running on SQL Server 2008.
Any advice would be greatly appreciated.
Edit: Spelling, clarity, query example, additional info on Option B.
A single BIT column typically is not selective enough to be even considered for use in an index. So an index on a single BIT column really doesn't make sense - on average, you'd always have to search about half the entries in the table (50% selectiveness) and so the SQL Server query optimizer will instead use a table scan.
If you create a single index on all 15 bit columns, then you don't have that problem - since you have 15 yes/no options, your index will become quite selective.
Trouble is: the sequence of the bit columns is important. Your index will only ever be considered if your SQL statement uses at least 1-n of the left-most BIT columns.
So if your index is on
Col1,Col2,Col3,....,Col14,Col15
then it might be used for a query that uses
Col1
Col1 and Col2
Col1 and Col2 and Col3
....
and so on. But it cannot be used for a query that specifies Col6,Col9 and Col14.
Because of that, I don't really think an index on your collection of BIT columns really makes a lot of sense.
Are those 15 BIT columns the only columns you use for querying? If not, I would try to combine those BIT columns that you use most for selection with other columns, e.g. have an index on Name and Col7 or something (then your BIT columns can add some additional selectivity to another index)
Whilst there are probably ways to solve your indexing problem against your existing table schema, I would reduce this to a normalisation problem:
e.g I would highly recommend creating a series of new tables:
Lookup table for the names of this bit flags. e.g. CREATE TABLE Flags (id int IDENTITY(1,1), Name varchar(256)) (you don't have to make id an identity-seed column if you want to manually control the id's - e.g. 2,4,8,16,32,64,128 as binary flags.)
Create a new link-table that contains the id's of the original data table and the new link table e.g. CREATE TABLE DataFlags_Link (id int IDENTITY(1,1), MyFlagId int, DataId int)
You could then create an index on the DataFlags_Link table and write queries like:
SELECT Data.*
FROM Data
INNER JOIN DataFlags_Link ON Data.id = DataFlags_Link.DataId
WHERE DataFlags_Link.MyFlagId IN (4,7,2,8)
As for performance, that's where good DBA maintenance comes in. You'll want to set the INDEX fill-factor and padding on your tables appropriately and run regular index defragmentation or rebuild your indexes on a schedule.
Performance and maintenance go hand-in-hand with databases. You can't have one without the other.
Whilst I think Neil Fenwick's answer is probably right, I think the real answer is to try out the different options and see which one is fast enough.
Option 1 is probably the most straightforward solution, and therefore likely the most maintainable - and it may well be fast enough.
I would build a prototype database, with the "option 1" schema, and use something like http://www.red-gate.com/products/sql-development/sql-data-generator/ or http://sourceforge.net/projects/dbmonster/ to create twice as much data as you expect to need, and then build the queries you expect to need. Agree an acceptable response time, and only consider a "faster" schema if you exceed those response times (and you can't throw hardware at the problem).
Neil's solution is probably just as obvious and maintainable as "option 1" - and it should be easy to index. However, I'd still test it by creating a prototype schema and generating a lot of test data...

How to search millions of record in SQL table faster?

I have SQL table with millions of domain name. But now when I search for let's say
SELECT *
FROM tblDomainResults
WHERE domainName LIKE '%lifeis%'
It takes more than 10 minutes to get the results. I tried indexing but that didn't help.
What is the best way to store this millions of record and easily access these information in short period of time?
There are about 50 million records and 5 column so far.
Most likely, you tried a traditional index which cannot be used to optimize LIKE queries unless the pattern begins with a fixed string (e.g. 'lifeis%').
What you need for your query is a full-text index. Most DBMS support it these days.
Assuming that your 50 million row table includes duplicates (perhaps that is part of the problem), and assuming SQL Server (the syntax may change but the concept is similar on most RDBMSes), another option is to store domains in a lookup table, e.g.
CREATE TABLE dbo.Domains
(
DomainID INT IDENTITY(1,1) PRIMARY KEY,
DomainName VARCHAR(255) NOT NULL
);
CREATE UNIQUE INDEX dn ON dbo.Domains(DomainName);
When you load new data, check if any of the domain names are new - and insert those into the Domains table. Then in your big table, you just include the DomainID. Not only will this keep your 50 million row table much smaller, it will also make lookups like this much more efficient.
SELECT * -- please specify column names
FROM dbo.tblDomainResults AS dr
INNER JOIN dbo.Domains AS d
ON dr.DomainID = d.DomainID
WHERE d.DomainName LIKE '%lifeis%';
Of course except on the tiniest of tables, it will always help to avoid LIKE clauses with a leading wildcard.
Full-text indexing is the far-and-away best option here - how this is accomplished will depend on the DBMS you're using.
Short of that, ensuring that you have an index on the column being matched with the pattern will help performance, but by the sounds of it, you've tried this and it didn't help a great deal.
Stop using LIKE statement. You could use fulltext search, but it will require MyISAM table and isn't all that good solution.
I would recommend for you to examine available 3rd party solutions - like Lucene and Sphinx. They will be superior.
One thing you might want to consider is having a separate search engine for such lookups. For example, you can use a SOLR (lucene) server to search on and retrieve the ids of entries that match your search, then retrieve the data from the database by id. Even having to make two different calls, its very likely it will wind up being faster.
Indexes are slowed down whenever they have to go lookup ("bookmark lookup") data that the index itself doesn't contain. For instance, if your index has 2 columns, ID, and NAME, but you're selecting * (which is 5 columns total) the database has to read the index for the first two columns, then go lookup the other 3 columns in a less efficient data structure somewhere else.
In this case, your index can't be used because of the "like". This is similar to not putting any where filter on the query, it will skip the index altogether since it has to read the whole table anyway it will do just that ("table scan"). There is a threshold (i think around 35-50% where the engine normally flips over to this).
In short, it seems unlikely that you need all 50 million rows from the DB for a production application, but if you do... use a machine with more memory and try methods that keep that data in memory. Maybe a No-SQL DB would be a better option - mongoDB, couch DB, tokyo cabinet. Things like this. Good luck!
You could try breaking up the domain into chunks and then searh the chunks themselves. I did some thing like that years ago when I needed to search for words in sentences. I did not have full text searching available so I broke up the sentences into a word list and searched the words. It was really fast to find the results since the words were indexed.

Why can't I simply add an index that includes all columns?

I have a table in SQL Server database which I want to be able to search and retrieve data from as fast as possible. I don't care about how long time it takes to insert into the table, I am only interested in the speed at which I can get data.
The problem is the table is accessed with 20 or more different types of queries. This makes it a tedious task to add an index specially designed for each query. I'm considering instead simply adding an index that includes ALL columns of the table. It's not something you would normally do in "good" database design, so I'm assuming there is some good reason why I shouldn't do it.
Can anyone tell me why I shouldn't do this?
UPDATE: I forgot to mention, I also don't care about the size of my database. It's OK that it means my database size will grow larger than it needed to
First of all, an index in SQL Server can only have at most 900 bytes in its index entry. That alone makes it impossible to have an index with all columns.
Most of all: such an index makes no sense at all. What are you trying to achieve??
Consider this: if you have an index on (LastName, FirstName, Street, City), that index will not be able to be used to speed up queries on
FirstName alone
City
Street
That index would be useful for searches on
(LastName), or
(LastName, FirstName), or
(LastName, FirstName, Street), or
(LastName, FirstName, Street, City)
but really nothing else - certainly not if you search for just Street or just City!
The order of the columns in your index makes quite a difference, and the query optimizer can't just use any column somewhere in the middle of an index for lookups.
Consider your phone book: it's order probably by LastName, FirstName, maybe Street. So does that indexing help you find all "Joe's" in your city? All people living on "Main Street" ?? No - you can lookup by LastName first - then you get more specific inside that set of data. Just having an index over everything doesn't help speed up searching for all columns at all.
If you want to be able to search by Street - you need to add a separate index on (Street) (and possibly another column or two that make sense).
If you want to be able to search by Occupation or whatever else - you need another specific index for that.
Just because your column exists in an index doesn't mean that'll speed up all searches for that column!
The main rule is: use as few indices as possible - too many indices can be even worse for a system than having no indices at all.... build your system, monitor its performance, and find those queries that cost the most - then optimize these, e.g. by adding indices.
Don't just blindly index every column just because you can - this is a guarantee for lousy system performance - any index also requires maintenance and upkeep, so the more indices you have, the more your INSERT, UPDATE and DELETE operations will suffer (get slower) since all those indices need to be updated.
You are having a fundamental misunderstanding how indexes work.
Read this explanation "how multi-column indexes work".
The next question you might have is why not creating one index per column--but that's also a dead-end if you try to reach top select performance.
You might feel that it is a tedious task, but I would say it's a required task to index carefully. Sloppy indexing strikes back, as in this example.
Note: I am strongly convinced that proper indexing pays off and I know that many people are having the very same questions you have. That's why I'm writing a the a free book about it. The links above refer the pages that might help you to answer your question. However, you might also want to read it from the beginning.
...if you add an index that contains all columns, and a query was actually able to use that index, it would scan it in the order of the primary key. Which means hitting nearly every record. Average search time would be O(n/2).. the same as hitting the actual database.
You need to read a bit lot about indexes.
It might help if you consider an index on a table to be a bit like a Dictionary in C#.
var nameIndex = new Dictionary<String, List<int>>();
That means that the name column is indexed, and will return a list of primary keys.
var nameOccupationIndex = new Dictionary<String, List<Dictionary<String, List<int>>>>();
That means that the name column + occupation columns are indexed. Now imagine the index contained 10 different columns, nested so far deep it contains every single row in your table.
This isn't exactly how it works mind you. But it should give you an idea of how indexes could work if implemented in C#. What you need to do is create indexes based on one or two keys that are queried on extensively, so that the index is more useful than scanning the entire table.
If this is a data warehouse type operation where queries are highly optimized for READ queries, and if you have 20 ways of dissecting the data, e.g.
WHERE clause involves..
Q1: status, type, customer
Q2: price, customer, band
Q3: sale_month, band, type, status
Q4: customer
etc
And you absolutely have plenty of fast storage space to burn, then by all means create an index for EVERY single column, separately. So a 20-column table will have 20 indexes, one for each individual column. I could probably say to ignore bit columns or low cardinality columns, but since we're going so far, why bother (with that admonition). They will just sit there and churn the WRITE time, but if you don't care about that part of the picture, then we're all good.
Analyze your 20 queries, and if you have hot queries (the hottest ones) that still won't go any faster, plan it using SSMS (press Ctrl-L) with one query in the query window. It will tell you what index can help that queries - just create it; create them all, fully remembering that this adds again to the write cost, backup file size, db maintenance time etc.
I think the questioner is asking
'why can't I make an index like':
create index index_name
on table_name
(
*
)
The problems with that have been addressed.
But given it sounds like they are using MS sql server.
It's useful to understand that you can include nonkey columns in an index so they the values of those columns are available for retrieval from the index, but not to be used as selection criteria :
create index index_name
on table_name
(
foreign_key
)
include (a,b,c,d) -- every column except foreign key
I created two tables with a million identical rows
I indexed table A like this
create nonclustered index index_name_A
on A
(
foreign_key -- this is a guid
)
and table B like this
create nonclustered index index_name_B
on B
(
foreign_key -- this is a guid
)
include (id,a,b,c,d) -- ( every key except foreign key)
no surprise, table A was slightly faster to insert to.
but when I and ran these this queries
select * from A where foreign_key = #guid
select * from B where foreign_key = #guid
On table A, sql server didn't even use the index, it did a table scan, and complained about a missing index including id,a,b,c,d
On table B, the query was over 50 times faster with much less io
forcing the query on A to use the index didn't make it any faster
select * from A where foreign_key = #guid
select * from A with (index(index_name_A)) where foreign_key = #guid
I'm considering instead simply adding an index that includes ALL columns of the table.
This is always a bad idea. Indexes in database is not some sort of pixie dust that works magically. You have to analyze your queries and according to what and how is being queried - append indexes.
It is not as simple as "add everything to index and have a nap"
I see only long and complicated answers here so I thought I should give the simplest answer possible.
You cannot add an entire table, or all its columns, to an index because that just duplicates the table.
In simple terms, an index is just another table with selected data ordered in the order you normally expect to query it in, and a pointer to the row on disk where the rest of the data lives.
So, a level of indirection exists. You have a partial copy of a table in an preordered manner (both on disk and in RAM, assuming the index is not fragmented), which is faster to query for the columns defined in the index only, while the rest of the columns can be fetched without having to scan the disk for them, because the index contains a reference to the correct position on disk where the rest of the data is for each row.
1) size, an index essentially builds a copy of the data in that column some easily searchable structure, like a binary tree (I don't know SQL Server specifcs).
2) You mentioned speed, index structures are slower to add to.
That index would just be identical to your table (possibly sorted in another order).
It won't speed up your queries.

faster way to use sets in MySQL

I have a MySQL 5.1 InnoDB table (customers) with the following structure:
int record_id (PRIMARY KEY)
int user_id (ALLOW NULL)
varchar[11] postcode (ALLOW NULL)
varchar[30] region (ALLOW NULL)
..
..
..
There are roughly 7 million rows in the table. Currently, the table is being queried like this:
SELECT * FROM customers WHERE user_id IN (32343, 45676, 12345, 98765, 66010, ...
in the actual query, currently over 560 user_ids are in the IN clause. With several million records in the table, this query is slow!
There are secondary indexes on table, the first of which being on user_id itself, which I thought would help.
I know that SELECT(*) is A Bad Thing and this will be expanded to the full list of fields required. However, the fields not listed above are more ints and doubles. There are another 50 of those being returned, but they are needed for the report.
I imagine there's a much better way to access the data for the user_ids, but I can't think how to do it. My initial reaction is to remove the ALLOW NULL on the user_id field, as I understand NULL handling slows down queries?
I'd be very grateful if you could point me in a more efficient direction than using the IN ( ) method.
EDIT
Ran EXPLAIN, which said:
select_type = SIMPLE
table = customers
type = range
possible_keys = userid_idx
key = userid_idx
key_len = 5
ref = (NULL)
rows = 637640
Extra = Using where
does that help?
First, check if there is an index on USER_ID and make sure it's used.
You can do it with running EXPLAIN.
Second, create a temporary table and use it in a JOIN:
CREATE TABLE temptable (user_id INT NOT NULL)
SELECT *
FROM temptable t
JOIN customers c
ON c.user_id = t.user_id
Third, how may rows does your query return?
If it returns almost all rows, then it just will be slow, since it will have to pump all these millions over the connection channel, to begin with.
NULL will not slow your query down, since the IN condition only satisfies non-NULL values which are indexed.
Update:
The index is used, the plan is fine except that it returns more than half a million rows.
Do you really need to put all these 638,000 rows into the report?
Hope its not printed: bad for rainforests, global warming and stuff.
Speaking seriously, you seem to need either aggregation or pagination on your query.
"Select *" is not as bad as some people think; row-based databases will fetch the entire row if they fetch any of it, so in situations where you're not using a covering index, "SELECT *" is essentially no slower than "SELECT a,b,c" (NB: There is sometimes an exception when you have large BLOBs, but that is an edge-case).
First things first - does your database fit in RAM? If not, get more RAM. No, seriously. Now, suppose your database is too huge to reasonably fit into ram (Say, > 32Gb) , you should try to reduce the number of random I/Os as they are probably what's holding things up.
I'll assuming from here on that you're running proper server grade hardware with a RAID controller in RAID1 (or RAID10 etc) and at least two spindles. If you're not, go away and get that.
You could definitely consider using a clustered index. In MySQL InnoDB you can only cluster the primary key, which means that if something else is currently the primary key, you'll have to change it. Composite primary keys are ok, and if you're doing a lot of queries on one criterion (say user_id) it is a definite benefit to make it the first part of the primary key (you'll need to add something else to make it unique).
Alternatively, you might be able to make your query use a covering index, in which case you don't need user_id to be the primary key (in fact, it must not be). This will only happen if all of the columns you need are in an index which begins with user_id.
As far as query efficiency is concerned, WHERE user_id IN (big list of IDs) is almost certainly the most efficient way of doing it from SQL.
BUT my biggest tips are:
Have a goal in mind, work out what it is, and when you reach it, stop.
Don't take anybody's word for it - try it and see
Ensure that your performance test system is the same hardware spec as production
Ensure that your performance test system has the same data size and kind as production (same schema is not good enough!).
Use synthetic data if it is not possible to use production data (Copying production data may be logistically difficult (Remember your database is >32Gb) ; it may also violate security policies).
If your query is optimal (as it probably already is), try tuning the schema, then the database itself.
Is this your most important query? Is this a transactional table?
If so, try creating a clustered index on user_id. Your query might be slow because it still must make random disk reads to retrieve the columns (key lookups), even after finding the records that match (index seek on the user_Id index).
If you cannot change the clustered index, then you might want to consider an ETL process (simplest is a trigger that inserts into another table with the best indexing). This should yield faster results.
Also note that such large queries may take some time to parse, so help it out by putting the queried ids into a temp table if possibl
Are they the same ~560 id's every time? Or is it a different ~500 ids on different runs of the queries?
You could just insert your 560 UserIDs into a separate table (or even a temp table), stick an index on the that table and inner join it to you original table.
You can try to insert the ids you need to query on in a temp table and inner join both tables. I don't know if that would help.

Do indexes work with "IN" clause

If I have a query like:
Select EmployeeId
From Employee
Where EmployeeTypeId IN (1,2,3)
and I have an index on the EmployeeTypeId field, does SQL server still use that index?
Yeah, that's right. If your Employee table has 10,000 records, and only 5 records have EmployeeTypeId in (1,2,3), then it will most likely use the index to fetch the records. However, if it finds that 9,000 records have the EmployeeTypeId in (1,2,3), then it would most likely just do a table scan to get the corresponding EmployeeIds, as it's faster just to run through the whole table than to go to each branch of the index tree and look at the records individually.
SQL Server does a lot of stuff to try and optimize how the queries run. However, sometimes it doesn't get the right answer. If you know that SQL Server isn't using the index, by looking at the execution plan in query analyzer, you can tell the query engine to use a specific index with the following change to your query.
SELECT EmployeeId FROM Employee WITH (Index(Index_EmployeeTypeId )) WHERE EmployeeTypeId IN (1,2,3)
Assuming the index you have on the EmployeeTypeId field is named Index_EmployeeTypeId.
Usually it would, unless the IN clause covers too much of the table, and then it will do a table scan. Best way to find out in your specific case would be to run it in the query analyzer, and check out the execution plan.
Unless technology has improved in ways I can't imagine of late, the "IN" query shown will produce a result that's effectively the OR-ing of three result sets, one for each of the values in the "IN" list. The IN clause becomes an equality condition for each of the list and will use an index if appropriate. In the case of unique IDs and a large enough table then I'd expect the optimiser to use an index.
If the items in the list were to be non-unique however, and I guess in the example that a "TypeId" is a foreign key, then I'm more interested in the distribution. I'm wondering if the optimiser will check the stats for each value in the list? Say it checks the first value and finds it's in 20% of the rows (of a large enough table to matter). It'll probably table scan. But will the same query plan be used for the other two, even if they're unique?
It's probably moot - something like an Employee table is likely to be small enough that it will stay cached in memory and you probably wouldn't notice a difference between that and indexed retrieval anyway.
And lastly, while I'm preaching, beware the query in the IN clause: it's often a quick way to get something working and (for me at least) can be a good way to express the requirement, but it's almost always better restated as a join. Your optimiser may be smart enough to spot this, but then again it may not. If you don't currently performance-check against production data volumes, do so - in these days of cost-based optimisation you can't be certain of the query plan until you have a full load and representative statistics. If you can't, then be prepared for surprises in production...
So there's the potential for an "IN" clause to run a table scan, but the optimizer will
try and work out the best way to deal with it?
Whether an index is used doesn't so much vary on the type of query as much of the type and distribution of data in the table(s), how up-to-date your table statistics are, and the actual datatype of the column.
The other posters are correct that an index will be used over a table scan if:
The query won't access more than a certain percent of the rows indexed (say ~10% but should vary between DBMS's).
Alternatively, if there are a lot of rows, but relatively few unique values in the column, it also may be faster to do a table scan.
The other variable that might not be that obvious is making sure that the datatypes of the values being compared are the same. In PostgreSQL, I don't think that indexes will be used if you're filtering on a float but your column is made up of ints. There are also some operators that don't support index use (again, in PostgreSQL, the ILIKE operator is like this).
As noted though, always check the query analyser when in doubt and your DBMS's documentation is your friend.
#Mike: Thanks for the detailed analysis. There are definately some interesting points you make there. The example I posted is somewhat trivial but the basis of the question came from using NHibernate.
With NHibernate, you can write a clause like this:
int[] employeeIds = new int[]{1, 5, 23463, 32523};
NHibernateSession.CreateCriteria(typeof(Employee))
.Add(Restrictions.InG("EmployeeId",employeeIds))
NHibernate then generates a query which looks like
select * from employee where employeeid in (1, 5, 23463, 32523)
So as you and others have pointed out, it looks like there are going to be times where an index will be used or a table scan will happen, but you can't really determine that until runtime.
Select EmployeeId From Employee USE(INDEX(EmployeeTypeId))
This query will search using the index you have created. It works for me. Please do a try..