B Trees and B+ Trees

0. Foundation

Disk Structure

Let's start with disk structure. It's a platter with concentric circles, which are logical not physical. These circles of different radii are called tracks, and the vertical sections are called sectors. The intersection of a track and a sector is called a block. Hence, any location on the disk, i.e., block address, can be identified by the track number and sector number.

How is Data Stored on Disk

Let's consider a block size of 512 bytes (hypothetical), meaning each block is 512 bytes. All read and write operations are in terms of blocks. Considering one block of 512 bytes, we have a beginning address of 0 up to 511. Each byte can have its own address and is called an offset. Now, we can refer to any one byte on the disk in terms of block address and offset.

The disk is mounted on a spindle and has a head post. By spinning, we can change the sectors, and with the arm movement of the head (for reading and writing), the tracks are changed. This allows us to access any block or byte of data.

Figure 1: Main Memory for Processing

Another type of memory is main memory (RAM). We run programs in main memory, and these programs access the data on the disk. To do so, the data has to be brought into the main memory to be accessed, and any updated results are written back to the disk. Thus, the data cannot be directly processed on the disk and must be brought into the main memory.

Organizing the data inside the main memory as used by the program involves data structures, while organizing the data on the disk so that it can be accessed efficiently is managed by a DBMS (Database Management System).

Moving on to understanding how the data is organized on the disk, let's consider an employee table with columns such as ID, name, department, and several others, containing 100 records/rows. Each record is 128 bytes, and each block on the disk is 512 bytes. Hence, the number of records that can be stored in each block is 4.

Figure 2: Disk Organization

Number of blocks required = Total Number of Records / Records per block = 100/4 = 25 blocks are required for storing 100 records.

What is Indexing

Now, let's consider a query to search for a particular record. Because of the sequential access, this would require reading up to 25 blocks on the disk. To reduce this time and the number of blocks to scan, we prepare an index table. Each record in the index has a pointer to the table inside the disk/block, called the record pointer.

Figure 3: Table Index

Typically, for a dense index, each record in the table has an entry in the index. The index is also stored on the disk. Given that the index is stored on the disk, the space it takes can be calculated as follows: We know that the ID takes 10 bytes, and assume the pointer takes 6 bytes. So, each entry in the index takes 16 bytes.

For 100 records, the size of the index on the disk is: 100 × 16 bytes = 1600 bytes. And the number of blocks required is 1600 bytes/512 bytes per block = 3.125. Approximately, we need 4 blocks to store the index records on the disk.

This significantly reduces the number of blocks to be read on the disk. By finding the ID in the index (accessing up to 4 blocks), followed by using the pointer address to find the record (accessing 1 block), the maximum number of blocks to access a record is reduced from 25 to 5 blocks.

What is Multi-Level Indexing

But as the number of records grows, the size of the index also grows. For instance, if we have 1,000 records in the employee table, this would take up 250 blocks on the disk. The index would now need 32 blocks for the 1,000 entries, leading to the index itself becoming too large or requiring too many blocks to scan.

Figure 4: Multi Level Index

This can be optimized by using multi-level indexing. The level-2 index points to the first record of each level-1 index block. Each level-1 index block contains 32 index records (ID + Pointer). Thus, each entry in the level-2 index holds the first ID and a pointer to a level-1 index block. This sparse index reduces the number of blocks needed to search.

By adding another level of indexing, the number of blocks to scan for a given ID is reduced from 33 blocks (32 level-1 Index + 1 table block) to 3 blocks (1 level-2 index + 1 level-1 index + 1 table block).

Figure 4: Two Level Sparse Index

The multi-level index forms a tree structure. However, manually adding indexes every time the number of records increases significantly is not feasible. What we want is a system that can automatically add and delete higher-level indices, resulting in a self-managed multi-level indexing system.

This concludes the foundation and introduces the basic concept behind B-trees, B+ trees, and more generally, M-way search trees.

1. M-way Search Trees

Starting with a BST (Binary Search Tree), each node has one key/value and at most two children: the left child contains values less than the parent node, and the right child contains values greater than the parent node.

Figure 5: BST vs M-Way Search Tree

Extending this to a similar search tree structure with multiple keys per node, consider an example with 2 keys: [20, 50]. Here, the keys are arranged in ascending order, and the node can have 3 children: one for values less than 20, one for values between 20 and 50, and one for values greater than 50.

These are called M-Way Search Trees. The above example is a 3-way search tree, where the node has 2 keys and at most 3 children. In general, an M-Way Search Tree can have at most M children and (M - 1) keys. Thus, M is based on the number of children, representing the degree of a node.

Figure 6: BST and 4-Way Search Tree Node

Similar to a BST, where a node has a key/data and left and right child pointers, in an M-Way Search Tree, a node can have up to M children pointers and (M - 1) keys. For example, in a 4-way search tree, a node can have up to 4 children pointers and 3 keys.

Figure 7: M-Way Search Tree (Multi Index) Node

For multi-level indexes, we can use an M-Way Search Tree, where each node contains keys, child pointers, and record pointers (to the database record).

However, let's take an example of using a 10-Way Search Tree for Multi-Level Indexing to see the issue with M-Way Trees. Where each node can have at most 10 children (M - 1) and 9 keys. Insert the keys: 10, 20, 30

Figure 8: M-Way Search Tree - Out of Control

While it may seem obvious to first fill up the node keys before branching to a new child, M-Way Search Trees have no strict rules enforced on branching (inserts and deletes). In the worst case, an M-Way Search Tree could have a length of N for N number of keys, which is as bad as a linear search.

2. B Trees

In short, B-trees are M-Way trees with rules. The 4 rules:

A node should have at least ⌈M/2⌉ children before creating a new child node to control the height of the M-Way Search Tree.
The root node can have a minimum of 2 children without the restriction of Point 1.
All leaf nodes should be at the same level.
The creation process is bottom-up.

Taking an example, M = 4 (4 children, 3 keys), starting with keys: [10, 20, 40, 50]:

Initial Insertion:

Insert 10: The tree is empty, so 10 becomes the first key in the root node.
Insert 20: 20 is greater than 10, so it is placed in the same node, resulting in [10, 20].
Insert 40: 40 is greater than both 10 and 20, so it is placed in the same node, resulting in [10, 20, 40].

Figure 9: B Tree, Insertion and Overflow (1)

Handling Overflow:

Insert 50: The node now has 4 keys, exceeding the limit of 3 keys per node. This requires splitting the node.
The middle key, 40, will be promoted to a new root node.
The keys are split into two nodes: [10, 20] and [50], with 40 as the root.

Adding more keys [60, 70, 70]:

Insert 60: Goes to the right node [50], resulting in [50, 60].
Insert 70: Goes to the right node [50, 60], resulting in [50, 60, 70].

Figure 10: B Tree, Insertion and Overflow (2)

Handling Overflow in Right Node:

Insert 80: The right node now has 4 keys, exceeding the limit. The middle key, 70, is promoted to the root.
The keys are split into two nodes: [50, 60] and [80].

Adding more keys [30, 35]:

Insert 30: Goes to the left node [10, 20], resulting in [10, 20, 30].
Insert 35: The left node [10, 20, 30] now has 4 keys, exceeding the limit. The middle key, 30, will be promoted to the root.
The keys are split into two nodes: [10, 20] and [35].

Figure 11: B Tree, Insertion and Overflow (3)

The root now has [30, 40, 70].
Children: [10, 20], [35], [50, 60], [80].

Adding more keys [5, 15]:

Insert 5: Goes to the leftmost node [10, 20], resulting in [5, 10, 20].
Insert 15: The leftmost node [5, 10, 20] now has 4 keys, exceeding the limit. The middle key, 15, will be promoted to the parent node.

Figure 12: B Tree, Insertion and Overflow (3)

The node [30, 40, 70] will now have [15, 30, 40, 70], and needs to split because it has exceeded the limit.
The middle key, 40, will be promoted to a new root.
The keys are split into three nodes: [15, 30] and [70].
Re-arrange the links/connections.

Figure 13: B Tree, Final Form

Note: Splitting the node in the above example is ⌈M/2⌉ (Ciel), which makes it left-biased, whereas choosing ⌊M/2⌋ (Floor) is still valid and would be right-biased.

3. B+ Trees

B+ Tree is a variant of a B Tree. In a B Tree, every node has keys with record pointers, as shown in Figure 7. In contrast, a B+ Tree does not have record pointers in every node. Instead, only the leaf nodes have record pointers.

Figure 13: B+ Tree

Every key in the B+ Tree has its copy in the leaf node along with its record pointer. Additionally, the leaf nodes are connected, forming a linked list, making the leaf-node level a dense index, which matches the format of the multi-level index we wanted (Figure 4).

4. References

"B-Trees and B+ Trees,"  Abdul Bari - YouTube. [Online]. Available: https://www.youtube.com/watch?v=aZjYr87r1b8.
"Platter," Wikipedia. [Online]. Available: https://en.wikipedia.org/wiki/Platter.
"Sector," Wikipedia. [Online]. Available: https://en.wikipedia.org/wiki/Sector.
"Random-access memory," Wikipedia. [Online]. Available: https://en.wikipedia.org/wiki/Random-access_memory.
"Database index," Wikipedia. [Online]. Available: https://en.wikipedia.org/wiki/Database_index.
"Dense Index," MLWiki. [Online]. Available: https://mlwiki.org/index.php/Dense_Index.
"Sparse Index," MLWiki. [Online]. Available: https://mlwiki.org/index.php/Sparse_Index.
"Binary search tree," Wikipedia. [Online]. Available: https://en.wikipedia.org/wiki/Binary_search_tree.
"B-tree," Wikipedia. [Online]. Available: https://en.wikipedia.org/wiki/B-tree.
"B+ tree," Wikipedia. [Online]. Available: https://en.wikipedia.org/wiki/B%2B_tree.
"M-ary tree," Wikipedia. [Online]. Available: https://en.wikipedia.org/wiki/M-ary_tree.

Disk Structure

How is Data Stored on Disk

What is Indexing

What is Multi-Level Indexing

#index table of contents