How database index works

Ever wonder when you trigger a query that took 2 minutes to complete then when you applied index as suggested by the SQL Server Database Engine Tuning Advisor then it suddenly runs under a second?.

Think of it like this, when a data is stored in a disk based storage devices, the entirety of the data is stored as blocks of data. When running a query against unindexed field which value is not unique, to search a value it would require to scan the entire blocks of data (at worst N).

With an indexed field, a new blocks of data is created to store the indexed field which value is already sorted. Therefore binary search is performed when trying to find a value that in the indexed fields (log2 N).

Now for example if you have a table schema like the following

Person

Field Data type Size in disk
Id (primary key) unsigned int 4 bytes
FirstName char(50) 50 bytes
LastName char(50) 50 bytes

with that schema, to store a record in a disk it would take 104 bytes, or in one disk block (1024 bytes) it can store 1024/104 = 9 records in a disk block. If you have 1.000.000 records then it would take 1.000.000/9 = 111.111 disk block to store all of those data.

Now depending on the type of query that you run against the table you would get different result in performance, for example if you do a search query against the Id field it would perform a binary search (log2 n) which results in log2 111.111 = 16 block access. This is possible because the Id field is a primary key which value has to be unique and also has been sorted.

Compare it with a query against the FirstName field, since the FirstName field is not sorted a binary search would not be possible, thus it would require exactly 111.111 block access to find the value,  a huge difference.

Creating an index would help greatly in slow performing query, but once again you have to keep mind that creating index would also mean creating a new data structure that is stored in the disk. For example if we are to create an index for the FirstName field

Field Data type Size in disk
FirstName char(50) 50 bytes
(record pointer) special 4 bytes

Based on that schema then it would require 54 bytes for each record. 1024/54 = 18 record in a disk block.  1.000.000/18 = 55.555 disk block. Manage your index wisely, the more fields the index contains or the more index that you created the more disk space that it’s going to take.

reference:

http://stackoverflow.com/questions/1108/how-does-database-indexing-work/1130#1130

Stack VS Heap

There’s two type of memory that we should know. Stack and Heap, what’s the difference ?

Stack

  • keeps track of the code execution contains
  • it uses LIFO structure
  • high performance memory, fixed limit
  • local variables goes to stack
  • points to an object in heap

Heap

  • It’s a large pool of operating system memory
  • used in dynamic memory allocation
  • garbage collector remove any resources that no longer used

 

let’s take a look on how stack and heap works

public void Go()
{
    MyInt x = new MyInt();
    x.MyValue = 2;

    DoSomething(x);

    Console.WriteLine(x.MyValue.ToString());
  
}

public void DoSomething(MyInt pValue)
{
    pValue.MyValue = 12345;
}

when that method is called, this what’s happening in the stack

  1. The Go() method is called
  2. the x local variable is defined and goes to the stack, x is a pointer that point to MyInt instance in the heap
  3. MyValue property is set, the value is stored in heap because MyValue is declared on the heap
  4. DoSomething() method is called
  5. the MyValue value in the heap is changed to 12345

references:

https://www.youtube.com/watch?v=clOUdVDDzIM
http://stackoverflow.com/questions/79923/what-and-where-are-the-stack-and-heap
http://www.c-sharpcorner.com/UploadFile/rmcochran/csharp_memory01122006130034PM/csharp_memory.aspx

SQL Connection Pooling

It is expensive to create an SQL connection, a socket connection must be established, a handshake must occurred, the connection credential must be check against the list of known credentials. To optimize this a technique called SQL Connection Pooling is implemented.

Every time a connection is required, we request it from the connection pool, if there’s exist the connection by the specified connection string then it’s going to return that connection rather than creating a new one. Every time we are going to close a connection we do not close it but instead we returned it to the connection pool. This way we can save the connection for later use.

reference: http://msdn.microsoft.com/en-us/library/8xx3tyca(v=vs.80).aspx

What is SQL Injection ?

It’s a technique which is used to attack a software, be it a desktop application or a website as long as it uses a database server behind it. It is done by inputting a malicious input in attempt to get a dangerous query to run.

Example:

select * from Products where ProductName = 'Car'

This will get all products where the product name is Car

Imagine the Car value is retrieved from an input text on a form somewhere, what would happen if we input something like this.

select * from Products where ProductName = ''; delete from Products –''

This query will delete –assuming the running user has access- all data from our products table. which is clearly not good.

Issues like this are commonly happens on code that concatenates string to form query

What should we do ?

  • Do not concatenates string to form a query
  • Use a parameterized query to execute query, ORM tools such as EF uses parameterized query
  • Turn on custom error page on production to avoid giving crucial information to malicious users
  • Give the running user appropriate permissions, do not give access to modify table or creating new record in table if you only want a read only operation