Partition Key, Clustering Key

  Apache Cassandra Partition Key, Clustering Key



Partition Key



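In Cassandra, the partition key is declared when the table is created, as part of the PRIMARY KEY definition.

Cassandra hashes the partition key into a token using the cluster's partitioner (Murmur3Partitioner by default), and that token determines which node(s) in the cluster store the data. A partition key with many distinct, evenly used values therefore spreads data, and load, across the cluster.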


Syntax for Defining a Partition Key


When creating a table, the partition key is the first part of the PRIMARY KEY definition.


  1. Single-Column Partition Key


If the partition key is based on a single column, it is simply the first column in the PRIMARY KEY definition:


CREATE TABLE users (
    user_id UUID,        -- Partition key
    name TEXT,
    age INT,
    PRIMARY KEY (user_id)
);


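user_id is the partition key in this table. Each row is assigned to a partition based on the value of this column; because user_id is also the entire primary key, each partition holds exactly one row.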
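A minimal sketch of how the partition key shows up in queries, assuming the users table above exists in the current keyspace (the UUID literal in the last query is a placeholder). The built-in token() function exposes the token the partitioner derived from the partition key, which is what decides where each row lives.

```cql
-- uuid() generates a random (version 4) UUID to use as the partition key value
INSERT INTO users (user_id, name, age) VALUES (uuid(), 'Alice', 30);
INSERT INTO users (user_id, name, age) VALUES (uuid(), 'Bob', 25);

-- token(user_id) returns the token computed from the partition key;
-- a full scan like this comes back in token order, not insertion order
SELECT token(user_id), user_id, name FROM users;

-- Reading one partition: restrict the partition key with equality
-- (placeholder UUID; substitute one returned by the query above)
SELECT name, age FROM users WHERE user_id = 123e4567-e89b-12d3-a456-426614174000;
```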


  2. Composite Partition Key


If we want to use more than one column as the partition key (a composite partition key), wrap those columns in an extra pair of parentheses inside the PRIMARY KEY clause:


CREATE TABLE orders (
    customer_id UUID,    -- Part of the partition key
    order_id UUID,       -- Part of the partition key
    order_date TIMESTAMP,
    total DECIMAL,
    PRIMARY KEY ((customer_id, order_id))
);


customer_id and order_id together form the composite partition key. Cassandra hashes the combination of both values to place each row, so both values are needed to locate a partition.
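With a composite partition key, a query addresses a partition by supplying every partition key column. A sketch against the orders table above; the UUID literals are placeholders, and the exact behaviour of the commented-out query may vary between Cassandra versions.

```cql
-- Both partition key columns restricted with equality: Cassandra can hash
-- (customer_id, order_id) and go straight to the replicas that own it
SELECT order_date, total
FROM orders
WHERE customer_id = 123e4567-e89b-12d3-a456-426614174000
  AND order_id = 9b2f1c3e-5d7a-4e8b-9c0d-1a2b3c4d5e6f;

-- token() takes all partition key columns, in their declared order
SELECT token(customer_id, order_id), customer_id, order_id FROM orders;

-- Restricting only customer_id does not identify a partition, so Cassandra
-- rejects this query as written (adding ALLOW FILTERING forces a scan instead):
-- SELECT * FROM orders WHERE customer_id = 123e4567-e89b-12d3-a456-426614174000;
```

Because this table has no clustering columns, every (customer_id, order_id) pair is its own single-row partition; listing all orders for one customer is not something this key layout supports efficiently.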



Clustering Key


Any columns in the PRIMARY KEY after the partition key are clustering columns. They determine the order in which rows are stored, and returned, within a partition (ascending by default).


CREATE TABLE events (
    event_id UUID,           -- Partition key
    timestamp TIMESTAMP,     -- Clustering column
    event_description TEXT,
    PRIMARY KEY (event_id, timestamp)
);



event_id is the partition key.

timestamp is a clustering column, so rows that share the same event_id are stored together in one partition, sorted by timestamp.
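A sketch of clustering in action, assuming several rows share the same event_id (for instance, if event_id identifies an event source rather than a single event; otherwise each partition would hold one row and there would be nothing to sort). The UUID and timestamps are placeholder values.

```cql
-- Three rows with the same partition key value, written out of order
INSERT INTO events (event_id, timestamp, event_description)
VALUES (123e4567-e89b-12d3-a456-426614174000, '2024-01-03 10:00:00+0000', 'third');
INSERT INTO events (event_id, timestamp, event_description)
VALUES (123e4567-e89b-12d3-a456-426614174000, '2024-01-01 10:00:00+0000', 'first');
INSERT INTO events (event_id, timestamp, event_description)
VALUES (123e4567-e89b-12d3-a456-426614174000, '2024-01-02 10:00:00+0000', 'second');

-- Rows come back in clustering order: first, second, third
SELECT timestamp, event_description FROM events
WHERE event_id = 123e4567-e89b-12d3-a456-426614174000;

-- Range restriction on the clustering column inside one partition
SELECT timestamp, event_description FROM events
WHERE event_id = 123e4567-e89b-12d3-a456-426614174000
  AND timestamp >= '2024-01-02 00:00:00+0000'
  AND timestamp < '2024-01-04 00:00:00+0000';

-- Reverse the clustering order for this query only
SELECT timestamp, event_description FROM events
WHERE event_id = 123e4567-e89b-12d3-a456-426614174000
ORDER BY timestamp DESC;
```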



Primary Key


The primary key is the combination of the partition key and the clustering columns (if any).

It uniquely identifies each row in the table.

The partition key determines which nodes (replicas) store the row, while the clustering columns determine how rows are ordered within the same partition.
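To see the pieces together, here is a sketch of a hypothetical orders_by_customer table (its name and columns are illustrative, not from the examples above) that combines a composite partition key with two clustering columns, and uses CLUSTERING ORDER BY to store the newest orders first. The second half shows the primary key identifying a row: an INSERT whose full primary key already exists overwrites that row instead of adding a duplicate.

```cql
-- Partition key: (customer_id, order_year); clustering columns: order_date, order_id
CREATE TABLE IF NOT EXISTS orders_by_customer (
    customer_id UUID,
    order_year INT,
    order_date TIMESTAMP,
    order_id UUID,
    total DECIMAL,
    PRIMARY KEY ((customer_id, order_year), order_date, order_id)
) WITH CLUSTERING ORDER BY (order_date DESC, order_id ASC);

-- Two writes with the same full primary key target the same row (an upsert)
INSERT INTO orders_by_customer (customer_id, order_year, order_date, order_id, total)
VALUES (123e4567-e89b-12d3-a456-426614174000, 2024, '2024-03-01 12:00:00+0000',
        9b2f1c3e-5d7a-4e8b-9c0d-1a2b3c4d5e6f, 10.50);
INSERT INTO orders_by_customer (customer_id, order_year, order_date, order_id, total)
VALUES (123e4567-e89b-12d3-a456-426614174000, 2024, '2024-03-01 12:00:00+0000',
        9b2f1c3e-5d7a-4e8b-9c0d-1a2b3c4d5e6f, 12.00);

-- Returns a single row carrying the total from the second write
SELECT * FROM orders_by_customer
WHERE customer_id = 123e4567-e89b-12d3-a456-426614174000 AND order_year = 2024;
```

Here a query must supply both customer_id and order_year to reach a partition, and rows within it come back newest first because of the CLUSTERING ORDER BY option.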
