- Cassandra does not have a central coordinator such as the NameNode in HDFS or the HMaster in HBase, so it has no single point of failure (SPOF).
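- Every data row needs to have a primary key. The partition key part of the primary key is hashed to a token: the older RandomPartitioner uses MD5, while Murmur3Partitioner (the default since Cassandra 1.2) uses MurmurHash3.
- Cassandra uses consistent hashing to distribute keys across nodes.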
- Cassandra's data storage model is different from that of HDFS.
- The enterprise version of Cassandra (a DataStax product) includes an HDFS-compatible file system, the Cassandra File System (CFS), which can support Hadoop MapReduce, Hive, etc.
- The idea is to use one dedicated Cassandra column family (table) to store file metadata (file name, path) and another to store file block data. This removes the need for a NameNode and DataNodes. In effect, it re-implements the HDFS API.
- Every node in Cassandra serves both as a data node and as a coordinator. A client can connect to any node to read data. If the data does not live on the currently connected node, that node computes which node owns the key and forwards the request to it. This is different from HBase, where the client itself finds the destination region server through region metadata that is located via ZooKeeper.
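
The key hashing, consistent hashing, and coordinator forwarding described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Cassandra's actual implementation: the names `token`, `Ring`, `Node`, and `build_cluster` are hypothetical, each node gets a single token derived from hashing its name, MD5 is used only because the notes mention it, and replication is ignored entirely.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Map a key to a position on the ring using MD5 (illustrative choice)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


class Ring:
    """Toy consistent-hash ring with one token per node (assumption)."""

    def __init__(self, node_names):
        # Sort nodes by their token so the ring can be searched with bisect.
        self._entries = sorted((token(name), name) for name in node_names)
        self._tokens = [t for t, _ in self._entries]

    def owner(self, key: str) -> str:
        # The owner is the first node whose token is >= the key's token,
        # wrapping around to the first node at the end of the ring.
        i = bisect.bisect_left(self._tokens, token(key))
        return self._entries[i % len(self._entries)][1]


class Node:
    """A node that stores data and also coordinates client requests."""

    def __init__(self, name, ring, cluster):
        self.name = name
        self.ring = ring
        self.cluster = cluster  # shared dict: node name -> Node
        self.local = {}         # rows physically stored on this node

    def put(self, key, value):
        owner = self.ring.owner(key)
        if owner == self.name:
            self.local[key] = value
        else:
            # Act as coordinator: forward the write to the owning node.
            self.cluster[owner].put(key, value)

    def get(self, key):
        owner = self.ring.owner(key)
        if owner == self.name:
            return self.local.get(key)
        # Act as coordinator: forward the read to the owning node.
        return self.cluster[owner].get(key)


def build_cluster(names):
    ring = Ring(names)
    cluster = {}
    for name in names:
        cluster[name] = Node(name, ring, cluster)
    return cluster


if __name__ == "__main__":
    cluster = build_cluster(["node-a", "node-b", "node-c"])
    cluster["node-a"].put("user:42", "alice")  # client talks to node-a
    print(cluster["node-c"].get("user:42"))    # client talks to node-c
```

A client can write through one node and read through another because every node resolves the owner from the same ring. In this sketch the row is stored on exactly one node; a real cluster would also write it to replicas.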