How database index works

Ever wonder when you trigger a query that took 2 minutes to complete then when you applied index as suggested by the SQL Server Database Engine Tuning Advisor then it suddenly runs under a second?.

Think of it like this, when a data is stored in a disk based storage devices, the entirety of the data is stored as blocks of data. When running a query against unindexed field which value is not unique, to search a value it would require to scan the entire blocks of data (at worst N).

With an indexed field, a new blocks of data is created to store the indexed field which value is already sorted. Therefore binary search is performed when trying to find a value that in the indexed fields (log2 N).

Now for example if you have a table schema like the following

Person

Field Data type Size in disk
Id (primary key) unsigned int 4 bytes
FirstName char(50) 50 bytes
LastName char(50) 50 bytes

with that schema, to store a record in a disk it would take 104 bytes, or in one disk block (1024 bytes) it can store 1024/104 = 9 records in a disk block. If you have 1.000.000 records then it would take 1.000.000/9 = 111.111 disk block to store all of those data.

Now depending on the type of query that you run against the table you would get different result in performance, for example if you do a search query against the Id field it would perform a binary search (log2 n) which results in log2 111.111 = 16 block access. This is possible because the Id field is a primary key which value has to be unique and also has been sorted.

Compare it with a query against the FirstName field, since the FirstName field is not sorted a binary search would not be possible, thus it would require exactly 111.111 block access to find the value,  a huge difference.

Creating an index would help greatly in slow performing query, but once again you have to keep mind that creating index would also mean creating a new data structure that is stored in the disk. For example if we are to create an index for the FirstName field

Field Data type Size in disk
FirstName char(50) 50 bytes
(record pointer) special 4 bytes

Based on that schema then it would require 54 bytes for each record. 1024/54 = 18 record in a disk block.  1.000.000/18 = 55.555 disk block. Manage your index wisely, the more fields the index contains or the more index that you created the more disk space that it’s going to take.

reference:

http://stackoverflow.com/questions/1108/how-does-database-indexing-work/1130#1130

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s