Hash File Organizations

Focusing on static hashing and various methods to handle bucket overflow. It also touches on dynamic hashing as an alternative approach. Let's break down the key points:

Data bucket – Data buckets are the memory locations where the records are stored. These buckets are also considered as Unit Of Storage.
Hash Function – Hash function is a mapping function that maps all the set of search keys to actual record address. Generally, hash function uses the primary key to generate the hash index – address of the data block. Hash function can be simple mathematical function to any complex mathematical function.
Hash Index-The prefix of an entire hash value is taken as a hash index. Every hash index has a depth value to signify how many bits are used for computing a hash function. These bits can address 2n buckets. When all these bits are consumed ? then the depth value is increased linearly and twice the buckets are allocated.

Static Hashing:

Static hashing is a file organization technique where a fixed number of buckets (data blocks) is used to store records.
The hash function consistently maps each search key to a specific bucket address. For example, the search key "STUDENT_ID = 104" always maps to bucket 4 using a mod(5) hash function.
Operations in static hashing include insertion, searching, deletion, and updating. The hash function is crucial for determining the bucket address of each record.

Bucket Overflow in Static Hashing:

Bucket overflow occurs when a record needs to be inserted into a bucket that is already full. This situation can be problematic and requires handling.
Two common methods to handle bucket overflow are Open Hashing (linear probing) and Closed Hashing (overflow chaining).
- Open Hashing: When a bucket is full, the system searches for the next available bucket and inserts the record there.
Closed Hashing: A new data bucket is allocated with the same address and linked after the full bucket. This creates a chain of buckets at the same address.

Alternative Methods for Handling Bucket Overflow:

Quadratic Probing: Similar to Open Hashing, it searches for the next available bucket, but the difference between old and new bucket addresses is calculated using a quadratic function.
Double Hashing: Also similar to Open Hashing, but the difference between old and new bucket addresses is calculated using another hash function.
These methods are used when Open Hashing or Closed Hashing might not be suitable for a specific use case or to address performance concerns.

Dynamic Hashing:

Dynamic hashing (or extended hashing) addresses the limitations of static hashing by allowing the number of buckets to grow or shrink dynamically based on the number of records.
In dynamic hashing, the hash function generates a large number of values, and a portion of these values (e.g., the first bit) is used to determine the bucket address.

When a bucket becomes full, dynamic hashing can expand the number of bits used for the address, effectively doubling the number of buckets. This expansion occurs dynamically to accommodate additional records.
Dynamic hashing is particularly useful when the database size is expected to change over time.

Overall, hash organization, whether static or dynamic, provides a method to efficiently locate and manage records in a database using a hash function. Handling bucket overflow is a crucial aspect of hash organization to ensure that data can be accommodated even when the originally allocated bucket becomes full. Dynamic hashing offers scalability, making it suitable for databases with varying data sizes.

PreviousFile Organization NextB+ Tree File Organization: A Guide to Efficient Data Indexing

Last updated 1 year ago

Was this helpful?