Node

Apache Cassandra Node

As the load on a single node increases, the node starts to struggle.

To handle this, we add more nodes to the ring.

Data is distributed evenly across the nodes in the ring.

Each node is assigned a token range
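
The idea of token ranges can be sketched in a few lines of Python. This is only an illustration: the four node names and their tokens are made up, and a real cluster usually gives each node many tokens (virtual nodes). Each node is treated as owning the range that ends at its own token, and tokens past the last node wrap around to the first.

```python
import bisect

# Hypothetical four-node ring: (token, node) pairs sorted by token.
# Each node owns the range (previous_token, its_token].
RING = [
    (-6 * 10**18, "node-a"),
    (-2 * 10**18, "node-b"),
    (2 * 10**18, "node-c"),
    (6 * 10**18, "node-d"),
]
TOKENS = [token for token, _ in RING]


def owner(token):
    """Return the node whose token range contains the given token."""
    # First node whose token is >= the given token; wrap around past the end.
    index = bisect.bisect_left(TOKENS, token)
    return RING[index % len(RING)][1]
```

For example, owner(0) falls between the tokens of node-b and node-c, so it belongs to node-c, while a token larger than every node token wraps around to node-a.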

How does Cassandra retrieve the data from the correct node?

Query using partition key only:

e.g.: SELECT * FROM users WHERE user_id = 1234;

Cassandra takes the partition key and calculates its hash value using the default Murmur3 partitioner.
The calculated hash value of the partition key is the token.
Cassandra uses the token to identify which node owns the token range that contains it.
Once the node is identified, the coordinator sends the query to it. With a replication factor greater than 1, the same data also lives on other replica nodes, and the consistency level determines how many of those replicas must respond.
When data comes back from multiple replicas, Cassandra compares the write timestamps and keeps the most recent version.
Cassandra then sends the data to the client.
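
The read path described above can be sketched roughly as follows. This is a simplified model, not Cassandra's implementation: the ring and node names are hypothetical, token_for uses MD5 from the standard library as a stand-in because Murmur3 is not available there, replicas_for assumes the simplest placement (the owner plus the next nodes clockwise, as SimpleStrategy does), and reconcile compares whole values while Cassandra compares timestamps per column.

```python
import bisect
import hashlib

# Hypothetical ring: (token, node) pairs sorted by token.
RING = [
    (-6 * 10**18, "node-a"),
    (-2 * 10**18, "node-b"),
    (2 * 10**18, "node-c"),
    (6 * 10**18, "node-d"),
]
TOKENS = [token for token, _ in RING]


def token_for(partition_key):
    """Hash the key to a signed 64-bit token (stand-in hash, NOT Murmur3)."""
    digest = hashlib.md5(partition_key.encode()).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


def replicas_for(token, replication_factor):
    """Owner of the token plus the next nodes clockwise on the ring."""
    start = bisect.bisect_left(TOKENS, token)
    return [RING[(start + i) % len(RING)][1] for i in range(replication_factor)]


def reconcile(responses):
    """Given (value, write_timestamp) pairs, keep the most recently written value."""
    value, _ = max(responses, key=lambda response: response[1])
    return value
```

With a replication factor of 3, replicas_for(0, 3) gives node-c, node-d and node-a. If two replicas answer with different versions of a row, reconcile returns the one with the higher timestamp.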

Query using partition key and clustering key:

e.g.: SELECT * FROM users WHERE user_id = 1234 AND email = 'ksherasagar@gmail.com';

Cassandra takes the partition key and calculates its hash value using the default Murmur3 partitioner.
The calculated hash value of the partition key is the token.
Cassandra uses the token to identify which node owns the token range that contains it.
Once the node is identified, the coordinator sends the query to it. Within the partition, rows are stored sorted by the clustering key, so the condition on email narrows the read down to the matching row. With a replication factor greater than 1, the same data also lives on other replica nodes, and the consistency level determines how many of those replicas must respond.
When data comes back from multiple replicas, Cassandra compares the write timestamps and keeps the most recent version.
Cassandra then sends the data to the client.
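
To show why the clustering key makes the lookup inside a partition cheap, here is a small sketch. The rows and email addresses are invented, and a sorted Python list stands in for the on-disk partition. Because the rows are already ordered by email, a binary search finds the matching row without scanning the whole partition.

```python
import bisect

# Hypothetical partition for user_id = 1234.
# Rows are kept sorted by the clustering key (email).
partition_rows = [
    ("alice@example.com", {"name": "Alice"}),
    ("bob@example.com", {"name": "Bob"}),
    ("carol@example.com", {"name": "Carol"}),
]
clustering_keys = [email for email, _ in partition_rows]


def find_row(email):
    """Binary-search the sorted clustering keys for an exact match."""
    index = bisect.bisect_left(clustering_keys, email)
    if index < len(clustering_keys) and clustering_keys[index] == email:
        return partition_rows[index][1]
    return None  # no row with this clustering key in the partition
```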

How to add a new node to an existing ring?

To add a new node to an existing ring in Cassandra, follow these steps:

1. Prepare the New Node

• Install Cassandra on the new node, ensuring it matches the version of the existing nodes.

• Set up the same configurations, particularly for cassandra.yaml.

2. Configure cassandra.yaml

• cluster_name: Ensure it matches the existing cluster.

• seed_provider: Point it to the IPs of at least two existing seed nodes in the cluster.

• listen_address: Set to the IP address of the new node.

• rpc_address: Set to the IP address for client connections (often the same as listen_address).

• auto_bootstrap: Set to true so that the new node receives data.
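
The settings above can be checked before starting the node with a small script. The sketch below is hypothetical: it assumes the settings have already been loaded into a flat Python dict, with the seed IPs under a "seeds" key, whereas the real cassandra.yaml nests them inside seed_provider. The function name and messages are illustrative only.

```python
def check_new_node_config(config, cluster_name, known_seeds):
    """Return a list of problems found in a new node's settings (empty if none)."""
    problems = []
    if config.get("cluster_name") != cluster_name:
        problems.append("cluster_name does not match the existing cluster")
    # Count only seeds that really are existing seed nodes of the cluster.
    seeds = [seed for seed in config.get("seeds", []) if seed in known_seeds]
    if len(seeds) < 2:
        problems.append("seeds should list at least two existing seed nodes")
    if not config.get("listen_address"):
        problems.append("listen_address is not set")
    # A missing auto_bootstrap entry is treated as true here.
    if config.get("auto_bootstrap", True) is not True:
        problems.append("auto_bootstrap must be true so the node receives data")
    return problems
```

An empty list means the basic settings look consistent with the checklist; it does not prove the node will join successfully.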

3. Start the New Node

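• Start Cassandra on the new node. For a tarball install, run it as the Cassandra user rather than root (recent versions refuse to start as root unless the -R option is passed):

cassandra -f

• For a package install, start the service instead, e.g. sudo service cassandra start.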

• The new node will communicate with the seed nodes, identify its place in the ring, and begin streaming data it needs.

4. Monitor the Data Streaming Process

• Use nodetool status on an existing node to monitor the new node’s integration into the ring. While it is bootstrapping, the new node shows the status UJ (Up and Joining).

• Use nodetool netstats on the new node to check the status of the streaming process, which may take some time depending on data size.

5. Verify the New Node

• Run nodetool status again to ensure the new node is up and showing the status as UN (Up and Normal).

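• Once it has the UN status, the new node is fully integrated. The existing nodes still hold data for ranges they no longer own, so run nodetool cleanup on each of them to reclaim that space.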
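
The verification step can be automated by reading the first two columns of each node row in the nodetool status output. The sketch below assumes each node row starts with a two-letter state code (U or D, then N, L, J or M) followed by the node address. The sample text is made up for illustration and is not real output.

```python
import re

# Node rows start with U/D (up/down) followed by N/L/J/M
# (normal/leaving/joining/moving), then the node address.
NODE_ROW = re.compile(r"^([UD][NLJM])\s+(\S+)")


def node_states(status_output):
    """Map each node address to its two-letter state code."""
    states = {}
    for line in status_output.splitlines():
        match = NODE_ROW.match(line.strip())
        if match:
            states[match.group(2)] = match.group(1)
    return states


def is_fully_joined(status_output, address):
    """True once the given node is reported as UN (Up and Normal)."""
    return node_states(status_output).get(address) == "UN"


# Made-up sample: only the first two columns matter to the parser.
sample = """\
--  Address   Load     Tokens
UN  10.0.0.1  1.2 GiB  256
UJ  10.0.0.4  300 MiB  256
"""
```

With this sample, is_fully_joined(sample, "10.0.0.4") is False because the node is still joining, while the same call for 10.0.0.1 is True.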

Additional Notes:

• auto_bootstrap is required only when adding new nodes to a ring with data.

• If you’re adding multiple nodes, consider doing it one at a time to avoid overloading the network.

This approach should effectively scale your Cassandra cluster with minimal disruption to its operation.

Node Tool commands

Command	Description
nodetool bootstrap	Monitor and manage a node's bootstrap process.
nodetool cleanup	Cleans up keyspaces and partition keys no longer belonging to a node.
nodetool compact	Forces a major compaction on one or more tables.
nodetool decommission	Deactivates a node by streaming its data to another node.
nodetool flush	Flushes one or more tables from the memtable.
nodetool join	Causes the node to join the ring.