Node

 

Apache Cassandra Node






But as the load on a single node increases, the node starts to struggle.





To handle the extra load, we simply add more nodes to the ring.





Data is distributed evenly across the nodes in the ring.


Each node is assigned a token range 
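
As a rough illustration of token ranges, the sketch below splits the Murmur3 token space (-2^63 to 2^63-1) evenly across a given number of nodes. It assumes one token per node; with virtual nodes, each node owns many smaller ranges instead. The function names are hypothetical and not part of Cassandra.

```python
MIN_TOKEN = -2**63   # smallest token in the Murmur3 token space
RING_SIZE = 2**64    # total number of possible tokens


def even_tokens(node_count):
    """Return one evenly spaced token per node."""
    step = RING_SIZE // node_count
    return [MIN_TOKEN + i * step for i in range(node_count)]


def token_ranges(tokens):
    """Pair each token with its predecessor: a node owns (previous, own]."""
    ordered = sorted(tokens)
    # For the first node, index -1 wraps around to the last token on the ring.
    return [(ordered[i - 1], ordered[i]) for i in range(len(ordered))]


tokens = even_tokens(4)
for start, end in token_ranges(tokens):
    print(f"({start}, {end}]")
```

Each printed pair is the range one node is responsible for; the first range wraps around the end of the ring.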





How does Cassandra retrieve the data from the correct node?


Query using partition key only: 


When we query data using only the partition key, for example:

select * from users where user_id=1234;

Cassandra handles the read as follows:

  1. Cassandra hashes the partition key using the default partitioner, Murmur3Partitioner.
  2. The resulting hash value is the token.
  3. Cassandra uses the token to identify which node owns the token range that contains it.
  4. The coordinator node sends the query to that node. If the replication factor is greater than 1, the data lives on several replica nodes, and the consistency level determines how many of them must respond.
  5. When data comes back from more than one replica, Cassandra compares the write timestamps and keeps the most recent version.
  6. Cassandra then returns the data to the client.
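
The read path above can be sketched in a few lines of Python. This is a simplified model, not Cassandra's implementation: `toy_token` uses MD5 as a stand-in because Murmur3 is not in the Python standard library, the ring has one token per node, replicas are simply the next nodes clockwise (similar in spirit to SimpleStrategy), and timestamps are compared per row rather than per column. All names and sample data are hypothetical.

```python
import bisect
import hashlib


def toy_token(partition_key):
    """Stand-in hash (NOT Murmur3): a deterministic signed 64-bit value."""
    digest = hashlib.md5(str(partition_key).encode()).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


def replicas_for(token, ring, replication_factor):
    """ring is a sorted list of (token, node); walk clockwise from the owner."""
    tokens = [t for t, _ in ring]
    # Owner is the first node whose token is >= the key's token (wraps to 0).
    start = bisect.bisect_left(tokens, token) % len(ring)
    count = min(replication_factor, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


def quorum(replication_factor):
    """Number of replicas that must respond at consistency level QUORUM."""
    return replication_factor // 2 + 1


def newest(responses):
    """Keep the response with the latest write timestamp (last write wins)."""
    return max(responses, key=lambda r: r["timestamp"])


ring = [(-2**63, "node-a"), (-2**62, "node-b"), (0, "node-c"), (2**62, "node-d")]
replicas = replicas_for(toy_token(1234), ring, replication_factor=3)
print(replicas, "must hear from", quorum(3))

# Made-up responses from two replicas for user_id=1234
responses = [
    {"node": replicas[0], "name": "old", "timestamp": 100},
    {"node": replicas[1], "name": "new", "timestamp": 250},
]
print(newest(responses)["name"])  # prints "new"
```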


Query using partition key and clustering key: 


When we query data using both the partition key and the clustering key, for example:

select * from users where user_id=1234 and email = 'ksherasagar@gmail.com';

Cassandra handles the read as follows:

  1. Cassandra hashes the partition key using the default partitioner, Murmur3Partitioner.
  2. The resulting hash value is the token.
  3. Cassandra uses the token to identify which node owns the token range that contains it.
  4. The coordinator node sends the query to that node. Within the partition, rows are stored sorted by the clustering key (email), so the node uses email = 'ksherasagar@gmail.com' to narrow the result down to the matching row. If the replication factor is greater than 1, the data lives on several replica nodes, and the consistency level determines how many of them must respond.
  5. When data comes back from more than one replica, Cassandra compares the write timestamps and keeps the most recent version.
  6. Cassandra then returns the data to the client.
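
To show why the clustering key narrows the search so cheaply, here is a minimal sketch that models one partition as a list of rows kept sorted by email and looks rows up with binary search. Cassandra's real on-disk structures are more involved; the row contents and function names are made up for illustration.

```python
import bisect

# One partition (user_id=1234); rows are kept sorted by the clustering key.
partition = [
    ("alice@example.com", {"name": "Alice"}),
    ("bob@example.com", {"name": "Bob"}),
    ("carol@example.com", {"name": "Carol"}),
]


def find_row(rows, clustering_key):
    """Binary-search the sorted rows for an exact clustering key match."""
    keys = [key for key, _ in rows]
    index = bisect.bisect_left(keys, clustering_key)
    if index < len(rows) and keys[index] == clustering_key:
        return rows[index][1]
    return None


def slice_rows(rows, start_key, end_key):
    """Return rows with start_key <= clustering key < end_key, in sorted order."""
    keys = [key for key, _ in rows]
    lo = bisect.bisect_left(keys, start_key)
    hi = bisect.bisect_left(keys, end_key)
    return rows[lo:hi]


print(find_row(partition, "bob@example.com"))
print([key for key, _ in slice_rows(partition, "b", "d")])
```

Because the rows are already ordered, a range of clustering keys comes back sorted without any extra sorting step.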


How to add a new node to an existing ring?



To add a new node to an existing ring in Cassandra, follow these steps:


1. Prepare the New Node


Install Cassandra on the new node, ensuring it matches the version of the existing nodes.

Apply the same configuration as the existing nodes, particularly in cassandra.yaml.


2. Configure cassandra.yaml


cluster_name: Ensure it matches the existing cluster.

seed_provider: Point it to the IPs of at least two existing seed nodes in the cluster.

listen_address: Set to the IP address of the new node.

rpc_address: Set to the IP address for client connections (often the same as listen_address).

auto_bootstrap: Set to true so that the new node receives data.


3. Start the New Node


Start Cassandra on the new node as the Cassandra service user (recent versions refuse to start as root unless the -R option is passed) by running:


cassandra -f


The new node will communicate with the seed nodes, identify its place in the ring, and begin streaming data it needs.


4. Monitor the Data Streaming Process


Use nodetool status on an existing node to monitor the new node’s integration into the ring. While it is still streaming, the new node is listed as UJ (Up/Joining).

Use nodetool netstats on the new node to check the status of the streaming process, which may take some time depending on data size.


5. Verify the New Node


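Run nodetool status again to ensure the new node is up and shows the status UN (Up/Normal).

Once it has the UN status, the new node is fully integrated into the ring. The existing nodes still hold data for token ranges they no longer own, so run nodetool cleanup on each of them to reclaim that space.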
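
If you want to script this check, the sketch below reads the two-letter status code at the start of each node row in nodetool status output. It is only a sketch: the helper names are hypothetical, the sample text is hand-written and abbreviated to the first two columns (the remaining columns are omitted), and a real script would feed in the actual command output.

```python
def node_states(status_output):
    """Map each node address to its two-letter status code (UN, UJ, DN, ...)."""
    states = {}
    for line in status_output.splitlines():
        parts = line.split()
        # Node rows start with a code: U/D (up/down) then N/L/J/M (state).
        if (len(parts) >= 2 and len(parts[0]) == 2
                and parts[0][0] in "UD" and parts[0][1] in "NLJM"):
            states[parts[1]] = parts[0]
    return states


def all_up_and_normal(status_output):
    """True only if at least one node is listed and every node is UN."""
    states = node_states(status_output)
    return bool(states) and all(code == "UN" for code in states.values())


# Hand-written, abbreviated sample; the addresses are made up.
SAMPLE = """\
--  Address
UN  10.0.0.1
UN  10.0.0.2
UJ  10.0.0.5
"""

print(node_states(SAMPLE))
print(all_up_and_normal(SAMPLE))  # False while 10.0.0.5 is still joining
```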


Additional Notes:


auto_bootstrap is required only when adding new nodes to a ring with data.

If you’re adding multiple nodes, consider doing it one at a time to avoid overloading the network.


This approach should effectively scale your Cassandra cluster with minimal disruption to its operation.



nodetool commands


Command                 Description

nodetool bootstrap      Monitor and manage a node's bootstrap process.

nodetool cleanup        Cleans up keyspaces and partition keys no longer belonging to a node.

nodetool compact        Forces a major compaction on one or more tables.

nodetool decommission   Deactivates a node by streaming its data to another node.

nodetool flush          Flushes one or more tables from the memtable.

nodetool join           Causes the node to join the ring.

   


