💾
Welcome to DataGenesis !
  • 🚀 Welcome to the Database Management System Playground! 📊💾
  • Basics of DBMS
    • Database Management System
    • DBMS V/S File System
    • DBMS Architectures
    • Tier 3 Architecture / Three Schema Architecture
  • E-R Data Model
    • Basics of E-R Model
    • Attributes in E-R Model
    • Null Values
    • Strong & Weak Entities
    • Relationship Constraints
    • Recursive Relationships
    • E-R Diagrams
    • Extended E-R Model
  • Relational Model
    • Relational Model
    • Facts About Relational Model
    • Types of Keys in Relational Model
    • Integrity Constraints
    • Anomalies in Relational Model
  • Transform - ER Model to Relational Model
    • Mapping from ER Model to Relational Model
  • SQL - Structured Query Language
    • SQL
    • CRUD Operations
    • Data Types
    • Type of Commands in SQL
    • Working With Commands
    • Data Retrieval Commands
  • Normalisation
    • Functional Dependencies
    • Armstrong's Axioms
    • Multivalued Dependency
    • 1 Normal Form
    • 2 Normal Form
    • 3 Normal Form
    • Boyce-Codd Normal Form (BCNF)
    • 4 Normal Form
    • 5 Normal Form
    • Lossless Decomposition, Lossless Join ,and Dependency Preserving Decomposition, Denormalization
  • Concurrency Control
    • Transactions & Concurrency
    • Scheduling of Transactions
    • Problems & Strategies in Concurrency Control
    • Transaction & ACID Properties
    • How to implement ACID Properties
    • Atomicity Techniques
    • Durability Techniques
    • Implementing Locking in DBMS
    • Concurrency Control Protocols
      • Two Phase Locking
      • Timestamp Ordering
      • Multi Version Concurrency Control Techniques
    • Starvation in DBMS
    • Deadlock in DBMS
    • Log Based Recovery
  • NoSQL & Types of Databases
    • SQL V/S NoSQL
    • Types of Databases
  • DB Optimization
    • File Organization
      • Hash File Organizations
      • B+ Tree File Organization: A Guide to Efficient Data Indexing
      • Cluster File Organization
    • Indexing in DBMS
      • Primary Indexing
      • Clustered Indexing
      • Secondary Indexing
      • Multilevel Indexing
  • Distributed Databases
    • Database Clustering
    • Partitioning and Sharding
    • CAP Theorm
Powered by GitBook
On this page

Was this helpful?

  1. Distributed Databases

Partitioning and Sharding

Partitioning and sharding are techniques used in database management to manage and distribute large datasets across multiple storage locations or servers. While they share similarities, they have different applications and implementation details.

Partitioning:

  1. Definition:

    • Partitioning is the process of dividing a large database table into smaller, more manageable pieces called partitions.

    • Each partition holds a subset of the data based on specific criteria, such as a range of values in a column, a list of values, or a hashing function.

  2. Purpose:

    • Performance Optimization: Partitioning can significantly improve query performance by allowing the database to scan and retrieve data from a specific partition instead of the entire table.

    • Manageability: Smaller partitions are easier to manage, especially in cases where the dataset is too large to fit into memory or requires maintenance operations.

    • Data Archiving and Purging: Older data can be archived or purged more efficiently by partitioning based on time.

  3. Types of Partitioning:

    • Range Partitioning: Data is divided into partitions based on a specific range of values in a column, e.g., partitioning by date ranges.

    • List Partitioning: Data is divided into partitions based on a list of predefined values in a column.

    • Hash Partitioning: Data is distributed into partitions based on the result of applying a hash function to a column value.

    • Composite Partitioning: Combines multiple partitioning methods to achieve more complex data distribution strategies.

  4. Examples:

    • In a sales database, you can partition the sales data by date, with each partition containing sales transactions for a specific month or year.

    • In a customer database, you can partition customer data by geographic region, where each partition holds customer records from a specific region.

Sharding:

  1. Definition:

    • Sharding is a technique for distributing data across multiple database instances, often on different physical servers or even in different geographic locations.

    • Each database instance is responsible for a subset of the data, and there is no shared storage or single point of failure.

  2. Purpose:

    • Horizontal Scalability: Sharding allows you to scale out your database horizontally by adding more shard servers as your data grows.

    • Load Balancing: Incoming requests are distributed across shards to prevent overloading a single server.

    • Fault Tolerance: Sharding enhances fault tolerance because if one shard server fails, the other shards can continue to operate.

  3. Sharding Strategies:

    • Key-Based Sharding: Data is distributed to shards based on a specific shard key or hash of a key. This ensures that related data is stored on the same shard.

    • Range-Based Sharding: Data is sharded based on ranges of values in a specified column.

    • Directory-Based Sharding: A central directory or service manages the mapping of data to shards, helping route queries to the appropriate shard.

  4. Examples:

    • In a global e-commerce platform, customer data could be sharded based on the geographic region of the customers to reduce latency and comply with data sovereignty laws.

    • In a social media platform, user-generated content like posts and comments could be sharded based on the user's ID or username.

In summary, partitioning and sharding are techniques used to manage large datasets in databases. Partitioning focuses on dividing data within a single database, while sharding involves distributing data across multiple database instances. These techniques are critical for improving performance, manageability, and scalability in data-intensive applications.

PreviousDatabase ClusteringNextCAP Theorm

Last updated 1 year ago

Was this helpful?