SQL - How to find a specific value from DB without going through the entire DB table

How can I find a specific value in a database without going through the entire table?
For example:
There is a table of students and we are looking for all the students with a certain name. How do you do that without going through the whole table?

Use INDEXES
Indexes are used to quickly locate data without having to search every row in a database table every time a database table is accessed. ... Indexes can be created using one or more columns of a database table, providing the basis for both rapid random lookups and efficient access of ordered records.
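To make that concrete, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is only a stand-in for whatever engine you actually use, and the students table, its columns and the index name are made up for the example. It asks the planner how it would run the name lookup before and after the index exists.

```python
import sqlite3

# Hypothetical students table; SQLite stands in for the real database engine.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, grade INTEGER)")
conn.executemany(
    "INSERT INTO students (name, grade) VALUES (?, ?)",
    [(f"student{i % 1000}", i % 12) for i in range(10_000)],
)


def plan(sql, params=()):
    """Return the planner's description of how it would execute `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column holds the detail text


query = "SELECT * FROM students WHERE name = ?"

plan_before = plan(query, ("student42",))  # no index yet: every row is visited
conn.execute("CREATE INDEX idx_students_name ON students (name)")
plan_after = plan(query, ("student42",))   # index exists: the planner searches it

print(plan_before)
print(plan_after)
```

The exact wording of the plan text differs between SQLite versions, but the first plan is a scan of the table and the second is a search that names idx_students_name.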

SQL Server has four options for improving performance for this type of query:
A regular index (either clustered or non-clustered).
A full text index.
Partitioning.
Hash index (for memory optimized tables).
A regular index, created using create index, is the "canonical" answer to this question. It is like an alphabetical list of all names with a pointer to the record. The implementation uses something called B-trees, so the analogy is not perfect. These indexes can be used for equality comparisons (e.g. =, in, is null) and inequality comparisons (e.g. <, >).
A full text index indexes all words in a text column (for some definition of "word"). This can be used for a range of full text search options, available through contains.
Partitioning is used when you have lots and lots of data and only a handful of categories. That is highly unlikely with a name in a student database. But it physically splits the data into separate files for each name or range of names.
Hash-based indexing is only available on memory-optimized tables. These are only useful for comparisons using = and in (and <> and not in).

Related

Are indexes on columns with always a different value worth it?

Does creating an index on a column that will always have a different value in each record (like a unique column) improve the performance of SELECTs?
I understand that having an index on a column named e.g. status, which can have 3 values (such as PENDING, DONE, FAILED), and searching only for FAILED in 1 million records will be faster.
But what happens if I have a unique id (not the primary key) in 1 million records, and I'm doing a SELECT on that column?
An index on a unique column is actually better than an index on a column with a few values.
To understand why, you need a basic understanding of how databases manage storage. This is a high-level view.
The primary purpose of an index is to reduce the number of pages that need to be read for a query. The rows themselves are stored on data pages. If you don't have an index, then all the data needs to be read.
The index is a data structure that makes it efficient to find a particular value. You can think of it as a sorted list, where a binary search is used to identify the right location. In actual fact, these are usually stored in a structure called b-trees (where the "b" stands for "balanced", not "binary") but that is an implementation detail. And there are types of indexes that don't use b-trees.
So, if the values are unique, then an index is extremely helpful. Instead of doing a full table scan, the "row id" can efficiently be looked up in the index and then only one data page needs to be read.
Note that unique constraints are implemented using indexes. If you have declared a column to be unique, there is no need for an additional index because it is already there.
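Here is a small sketch of that last point, using Python's built-in sqlite3 module as a stand-in engine (the records table and its columns are hypothetical). Other databases expose this through different catalog views, but the idea is the same: declaring the column unique is enough to get an index that lookups can use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: external_id is unique but is not the primary key.
conn.execute(
    "CREATE TABLE records (id INTEGER PRIMARY KEY, external_id TEXT UNIQUE, payload TEXT)"
)

# PRAGMA index_list reports every index on the table, including the ones
# the engine created by itself to enforce a constraint.
indexes = conn.execute("PRAGMA index_list('records')").fetchall()
index_names = [row[1] for row in indexes]
print(index_names)

# No CREATE INDEX was issued, yet the lookup is planned as an index search.
plan_rows = conn.execute(
    "EXPLAIN QUERY PLAN SELECT payload FROM records WHERE external_id = ?",
    ("abc",),
).fetchall()
unique_lookup_plan = plan_rows[0][-1]
print(unique_lookup_plan)
```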

Two indexes for same column and change the order

I have a large table in Microsoft SQL Server 2008. It has two indexes: one on column A in descending order, and another on column A in ascending order together with some other columns.
My application does the following:
Select for the record.
If there is no record then insert
If find then update the records
Note that this table has millions of records.
The question is: do these indexes affect select/insert/update performance?
Any suggestions?
Having two otherwise identical indexes whose only difference is the sort order makes no difference to the SQL engine; it will (practically) just pick either one.
Imagine you have two dictionaries of the English language: one sorts words from A to Z and the other from Z to A. The effort you need to search for a particular word will be roughly the same in both cases.
A different case would be if you had two lists of people's data, one ordered by first name then last name, and the other by last name then first name. If you have to look for everyone whose first name is "John", the list that's ordered by last name first will be practically useless compared to the other one.
These examples are very simplified representations of indexes on SQL Server. Indexes store their data in a structure called a B-Tree, but for searching purposes these examples are enough to understand when an index will be useful.
In summary: you can drop the first index and keep the one that has the additional columns, since it can serve more scenarios, including all the cases that would require the other one. Keeping a redundant index brings additional maintenance work, such as keeping the index updated on every insert, update and delete and refreshing its statistics, so you might want to drop it.
PS: Whether the second index actually gets used depends greatly on the query you use to access that table, its joins and its where clauses. You can highlight the query in SSMS and use "Display Estimated Execution Plan" to see how the engine plans to access each object. If the index is used, you will see it there.
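The column-order point from the people-list analogy can be checked with a small sketch. This uses Python's built-in sqlite3 module rather than SQL Server, and the people table and index name are made up for illustration, so treat it as a demonstration of the principle and not of SQL Server's exact behavior.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT, city TEXT)"
)
# One composite index, ordered by last name first, then first name.
conn.execute("CREATE INDEX idx_people_last_first ON people (last_name, first_name)")


def plan(sql):
    """Return the planner's description of how it would execute `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


# The leading column alone can use the index ...
by_last = plan("SELECT * FROM people WHERE last_name = 'Doe'")
# ... and so can a lookup on both columns ...
by_both = plan("SELECT * FROM people WHERE first_name = 'John' AND last_name = 'Doe'")
# ... but the second column alone cannot, so the table is scanned.
by_first = plan("SELECT * FROM people WHERE first_name = 'John'")

for label, text in [("last", by_last), ("both", by_both), ("first", by_first)]:
    print(label, "->", text)
```

This is also why an index on column A alone adds nothing once another index starts with column A: every lookup the narrow index can serve, the wider one can serve too.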
Thanks for all the answers. I explored the SQL Server Profiler and SQL Tuning Advisor for the SQL Server and ran them to get the recommended indexes. It recommended a single index with include options. I used the index and the performance has been improved.

What is the time complexity for string equality query in databases like sql or oracle?

Let's say I have a database with string entries in it, and I need to look for a certain string, say "gamma". What would be the time complexity if I search for the particular string via this query :
SELECT * FROM TABLE WHERE NAME='GAMMA'
Is there a faster or more efficient way to do this? Would it be better if I stored the SHA256 of the strings, as this would ensure a fixed size even if the string is large (assuming that string comparison takes O(length of string))?
Assume that only unique entries of strings are there.
This is a pretty meaningless question in SQL. This query can be approached in multiple different ways:
Doing a full table scan
Using an index on (name), or any index where name is the first column, followed by a row lookup
Doing a partial table scan, if name is a partitioning column
Doing a binary search, if the table is an index-organized table
And there are variations and complexities on these.
When thinking about the performance of SQL queries, you generally want to think about the I/O operations needed for the set, not about the complexity of specific operations. That said, some operations are quite expensive, so sometimes you need to take that into account. String comparison on relatively short strings is not one of those operations.
If you have large strings, then hashing the string can be a helpful optimization -- particularly for comparing strings in different rows. However, it comes at the cost of query complexity. An index is typically sufficient for this operation.
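For the large-string case, the hashing idea could look roughly like the sketch below. It uses Python's sqlite3 and hashlib modules; the documents table, the body_sha256 column and the find_document helper are hypothetical names for the example, and the extra column and extra predicate are exactly the added query complexity mentioned above.

```python
import hashlib
import sqlite3


def sha256_hex(text):
    """Fixed-size (64 hex characters) digest of an arbitrarily long string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents (id INTEGER PRIMARY KEY, body TEXT, body_sha256 TEXT)"
)
# Index the short digest instead of the long text itself.
conn.execute("CREATE INDEX idx_documents_sha ON documents (body_sha256)")

bodies = ["alpha " * 5000, "beta " * 5000, "gamma " * 5000]
conn.executemany(
    "INSERT INTO documents (body, body_sha256) VALUES (?, ?)",
    [(body, sha256_hex(body)) for body in bodies],
)


def find_document(body):
    """Look up by digest, then confirm on the full text to rule out a collision."""
    row = conn.execute(
        "SELECT id FROM documents WHERE body_sha256 = ? AND body = ?",
        (sha256_hex(body), body),
    ).fetchone()
    return row[0] if row else None


found_id = find_document("gamma " * 5000)
missing_id = find_document("delta")
print(found_id, missing_id)
```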

Creating index for a query

I have one table Person with two columns Name and Gender and suppose in my application if I have a query which is called frequently :
select * from Person where Gender = 'M'
So is it advisable to create an index on the column Gender?
It's not advisable unless there are loads of one and only a few of the other, and your query only looks at the few. A full table scan would give you a much more efficient result than diving through an index. In fact, even if you created the index, it's highly unlikely the optimiser would use it.
The points below might give you the idea:
From Documentation
In general, index access paths are more efficient for statements that retrieve a small subset of table rows, whereas full table scans are more efficient when accessing a large portion of a table.
Do not index columns that are modified frequently. UPDATE statements that modify indexed columns and INSERT and DELETE statements that modify indexed tables take longer than if there were no index. Such SQL statements must modify data in indexes as well as data in tables. They also generate additional undo and redo.
When choosing to index a key, consider whether the performance gain for queries is worth the performance loss for INSERTs, UPDATEs, and DELETEs and the use of the space required to store the index. You might want to experiment by comparing the processing times of the SQL statements with and without indexes. You can measure processing time with the SQL trace facility.

What are the different types of indexes, what are the benefits of each?

I heard of covering and clustered indexes, are there more? Where would you use them?
Unique - Guarantees unique values for the column(or set of columns) included in the index
Covering - Includes all of the columns that are used in a particular query (or set of queries), allowing the database to use only the index and not actually have to look at the table data to retrieve the results
Clustered - This is the way in which the actual data is ordered on the disk, which means that if a query uses the clustered index to look up the values, it does not have to take the additional step of looking up the actual table row for any data not included in the index.
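A quick way to see the covering case in action is the sketch below, written with Python's built-in sqlite3 module as a stand-in engine. The orders table and index name are invented for the example. SQLite's planner labels the index as covering when it can answer the query without touching the table rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL, notes TEXT)"
)
conn.execute("CREATE INDEX idx_orders_customer_total ON orders (customer_id, total)")


def plan(sql):
    """Return the planner's description of how it would execute `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


# Every column the query needs is in the index: no table lookup is required.
covered = plan("SELECT total FROM orders WHERE customer_id = 7")
# notes is not in the index, so each match needs an extra visit to the table row.
not_covered = plan("SELECT total, notes FROM orders WHERE customer_id = 7")

print(covered)
print(not_covered)
```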
OdeToCode has a good article covering the basic differences
As it says in the article:
Proper indexes are crucial for good performance in large databases. Sometimes you can make up for a poorly written query with a good index, but it can be hard to make up for poor indexing with even the best queries.
Quite true, too... If you're just starting out with it, I'd focus on clustered and composite indexes, since they'll probably be what you use the most.
I'll add a couple of index types
BITMAP - for columns with a very low number of different possible values; very fast and doesn't take up much space
PARTITIONED - allows the index to be partitioned based on some property, usually advantageous on very large database objects for storage or performance reasons.
FUNCTION/EXPRESSION indexes - used to pre-calculate some value based on the table and store it in the index; a very simple example might be an index based on lower() or a substring function.
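To illustrate the function/expression case, here is a minimal sketch with Python's built-in sqlite3 module, since SQLite also supports indexes on expressions. The users table and index name are hypothetical, and other engines use their own syntax for the same idea. The index is only considered when the query uses the same expression that was indexed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
# The index stores lower(email), not email itself.
conn.execute("CREATE INDEX idx_users_email_lower ON users (lower(email))")


def plan(sql):
    """Return the planner's description of how it would execute `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


# Same expression as the index definition: the index can be searched.
matching = plan("SELECT id FROM users WHERE lower(email) = 'a@example.com'")
# Plain column comparison: the expression index does not apply.
non_matching = plan("SELECT id FROM users WHERE email = 'a@example.com'")

print(matching)
print(non_matching)
```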
PostgreSQL allows partial indexes, where only rows that match a predicate are indexed. For instance, you might want to index the customer table for only those records which are active. This might look something like:
create index i on customers (id, name, whatever) where is_active is true;
If you index many columns, and you have many inactive customers, this can be a big win in terms of space (the index will be stored in fewer disk pages) and thus performance. To hit the index you need to, at a minimum, specify the predicate:
select name from customers where is_active is true;
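The need to repeat the predicate can be tried out with the sketch below. It uses Python's built-in sqlite3 module (SQLite supports partial indexes too) and not PostgreSQL itself. The table layout and index name are assumptions for the example, and is_active is stored as 0/1 because SQLite has no separate boolean type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, is_active INTEGER)"
)
# Partial index: only rows with is_active = 1 get an index entry.
conn.execute(
    "CREATE INDEX idx_active_customers ON customers (name) WHERE is_active = 1"
)


def plan(sql):
    """Return the planner's description of how it would execute `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


# The query repeats the index predicate, so the partial index is usable.
with_predicate = plan(
    "SELECT id FROM customers WHERE is_active = 1 AND name = 'Ann'"
)
# Without the predicate the index might miss inactive rows, so it is not used.
without_predicate = plan("SELECT id FROM customers WHERE name = 'Ann'")

print(with_predicate)
print(without_predicate)
```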
Conventional wisdom suggests that index choice should be based on cardinality. They'll say,
For a low cardinality column like GENDER, use bitmap. For a high cardinality like LAST_NAME, use b-tree.
This is not the case with Oracle, where index choice should instead be based on the type of application (OLTP vs. OLAP). DML on tables with bitmap indexes can cause serious lock contention. On the other hand, the Oracle CBO can easily combine multiple bitmap indexes together, and bitmap indexes can be used to search for nulls. As a general rule:
For an OLTP system with frequent DML and routine queries, use btree. For an OLAP system with infrequent DML and adhoc queries, use bitmap.
I'm not sure if this applies to other databases, comments are welcome. The following articles discuss the subject further:
Bitmap Index vs. B-tree Index: Which and When?
Understanding Bitmap Indexes
Different database systems have different names for the same type of index, so be careful with this. For example, what SQL Server and Sybase call "clustered index" is called in Oracle an "index-organised table".
I suggest you search the blogs of Jason Massie (http://statisticsio.com/) and Brent Ozar (http://www.brentozar.com/) for related info. They have some posts about real-life scenarios that deal with indexes.
Oracle has various combinations of b-tree, bitmap, partitioned and non-partitioned, reverse byte, bitmap join, and domain indexes.
Here's a link to the 11gR1 documentation on the subject: http://download.oracle.com/docs/cd/B28359_01/server.111/b28274/data_acc.htm#PFGRF004
Unique
clustered
non-clustered
column store
Index with included column
index on computed column
filtered
spatial
xml
full text
SQL Server 2008 has filtered indexes, similar to PostgreSQL's partial indexes. Both allow an index to include only the rows matching specified criteria.
The syntax is identical to PostgreSQL:
create index i on Customers(name) where is_alive = cast(1 as bit);
For the types of indexes and what they mean, see:
https://msdn.microsoft.com/en-us/library/ms175049.aspx