Friday, November 15, 2024

New Apache Cassandra 5.0 gives open source NoSQL database a scalability and performance boost




After years of development effort and community discussion, the open-source Apache Cassandra 5.0 database is finally generally available. The new database update offers enterprises the promise of improved performance, AI enablement and better data efficiency.

The new release marks the first major version number change since Apache Cassandra 4.0 was released in 2021. An Apache Cassandra 4.1 update followed in 2022 and added scalability features; since then, the focus has been on 5.0. Apache Cassandra is among the most widely deployed database technologies and is used by big-name organizations including Apple, Netflix and Meta, as well as enterprises of all types. Cassandra is developed as a multi-stakeholder open-source technology. Multiple commercial vendors support Cassandra, including DataStax, and managed database offerings are available on Amazon Web Services, Microsoft Azure and Google Cloud.

A key benefit Cassandra has always had is that it is a massively distributed NoSQL database, which enables organizations to run multiple nodes in different locations that are all kept in sync. With 5.0, that distributed nature gets a big boost from a new indexing approach that also improves overall performance.

Apache Cassandra 5.0 also marks the official debut of vector search support in the generally available open-source version of Cassandra. Some commercial Cassandra vendors, notably DataStax, integrated vector support long before the technology became part of the official stable 5.0 release.

“We changed how indexing works in Cassandra, that’s the big change,” Patrick McFadin, VP of developer relations and Apache Cassandra committer, told VentureBeat. “Not only is it vector, but it’s also the way we do normal indexes.”

Why Cassandra’s new data index matters to enterprise users

The new data indexing approach will offer enterprise users all manner of benefits.

McFadin said that developers now have a much easier way to work with Cassandra and are no longer constrained by very tight data models. He noted that previously, in a data modeling exercise, organizations had to be very specific about how the data model was built.

“Now we’re loosening the requirements,” he said. “You can build the data model, have a change, and then just add an index to use that data model in a different way.”
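As a rough illustration of the workflow McFadin describes, the sketch below builds the CQL statement that adds a storage-attached index (SAI, the index type that ships with Cassandra 5.0) to a column of an existing table. The keyspace, table and column names are hypothetical, and the helper function `sai_index_cql` is an illustration of ours, not part of Cassandra or any driver.

```python
import re

# Conservative check for unquoted CQL identifiers (an assumption of this sketch).
_IDENT = re.compile(r"^[a-z][a-z0-9_]*$")


def sai_index_cql(keyspace: str, table: str, column: str) -> str:
    """Return a CQL statement that adds a storage-attached index on one column."""
    for name in (keyspace, table, column):
        if not _IDENT.match(name):
            raise ValueError(f"unsupported identifier: {name!r}")
    # Hypothetical naming convention for the index.
    index_name = f"{table}_{column}_sai"
    return (
        f"CREATE INDEX IF NOT EXISTS {index_name} "
        f"ON {keyspace}.{table} ({column}) USING 'sai';"
    )


# Hypothetical case: a table modeled for lookups by order id
# now also needs lookups by customer email.
print(sai_index_cql("shop", "orders", "customer_email"))
```

Once a statement like this has been run against a cluster, a query that filters on `customer_email` should be possible without redesigning the table's primary key, which is the loosening of data modeling requirements described above.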

What makes the new indexing approach particularly noteworthy with Apache Cassandra is that it works in a highly distributed way.

“We have users that have five data centers worldwide that are in sync, in a cluster that spans the entire world,” McFadin said.

How Cassandra 5.0 improves data density and performance

Beyond the new indexing approach, Cassandra 5.0 introduces a unified compaction strategy that significantly increases data density per node. 

“Instead of having four terabytes per node, now you can have maybe 10 or more terabytes per node,” McFadin said.

The ability to store more data per node will help enterprise users by reducing hardware requirements for large-scale deployments. It will also lower operational costs, since there are fewer nodes to manage.
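To make the density claim concrete, here is a back-of-the-envelope sketch of how node count changes when per-node capacity goes from 4 TB to 10 TB, the figures McFadin cites. The 200 TB dataset size and the replication factor of 3 are assumptions chosen for illustration, and the calculation ignores headroom for compaction and growth.

```python
import math


def nodes_needed(dataset_tb: float, tb_per_node: float, replication_factor: int = 3) -> int:
    """Estimate node count from dataset size, per-node density and replication."""
    # Every row is stored replication_factor times across the cluster.
    raw_tb = dataset_tb * replication_factor
    return math.ceil(raw_tb / tb_per_node)


dataset_tb = 200  # hypothetical dataset size, before replication

nodes_at_4tb = nodes_needed(dataset_tb, tb_per_node=4)
nodes_at_10tb = nodes_needed(dataset_tb, tb_per_node=10)

print(f"4 TB per node:  {nodes_at_4tb} nodes")
print(f"10 TB per node: {nodes_at_10tb} nodes")
```

Under these assumptions the cluster shrinks from 150 nodes to 60, which is where the hardware and operational savings would come from.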

Cassandra 5.0 also introduces a pair of new data structures known as trie memtables and trie SSTables. McFadin explained that those feature changes align data structures for faster processing and improved overall performance in the database. He noted that by aligning data structure from the user to the disk, the database spends less time doing unnecessary work, leading to these significant performance gains.

“In a nutshell, when you’re looking for data that’s in memory or on a disk or something like that, databases have to go through this massive conversion process,” McFadin explained. “What the trie features do is it makes everything aligned, so there’s no conversions that need to happen.”

The future of Apache Cassandra is ACID transactions

With Apache Cassandra 5.0 now generally available, the open-source community can turn its full attention to what comes next.

McFadin noted that work on Cassandra 5.1 has actually been going on since November 2023, after a feature freeze came into effect for the 5.0 release. Looking ahead, the Cassandra project is working on implementing full ACID (Atomicity, Consistency, Isolation, Durability) transactions. 

“That is probably the most exciting thing to come to the Cassandra database in 15 years,” he said.

