Glossary
NewSQL Databases: Distributed SQL for Real-Time Applications
NewSQL databases combine SQL semantics with horizontal scalability. Learn how they enable distributed ACID transactions at scale for streaming systems.
NewSQL Databases: Distributed SQL for Real-Time Applications
Introduction: The NewSQL Evolution
For decades, organizations faced a difficult choice when selecting databases. Traditional relational databases (RDBMS) like PostgreSQL and MySQL offered strong ACID guarantees and familiar SQL interfaces, but struggled to scale horizontally. NoSQL databases like Cassandra and MongoDB provided massive scalability, but sacrificed transactional consistency and the powerful querying capabilities of SQL.
NewSQL databases emerged to bridge this gap, offering the best of both worlds: the familiar SQL interface and ACID guarantees of traditional RDBMS, combined with the horizontal scalability and fault tolerance of NoSQL systems.
What is NewSQL?
NewSQL is a category of modern relational database management systems that provide the same scalable performance as NoSQL systems while maintaining the ACID guarantees and SQL interface of traditional databases. The term was coined around 2011 to describe this new generation of distributed databases.
The core promise of NewSQL is simple: you shouldn't have to choose between consistency and scalability. These systems achieve this through modern distributed systems techniques, including:
Distributed consensus protocols like Raft or Paxos for maintaining consistency across nodes
Automatic sharding that distributes data across multiple machines transparently
Global transaction coordination that ensures ACID properties across distributed data
Multi-version concurrency control (MVCC) for high-performance reads without blocking writes
Unlike traditional databases that scale vertically (adding more resources to a single machine), NewSQL databases scale horizontally by adding more commodity servers to the cluster.
Key NewSQL Characteristics
Distributed Architecture
NewSQL databases are designed from the ground up for distributed deployment. Data is automatically partitioned (sharded) across multiple nodes, with each shard replicated for fault tolerance. If a node fails, the system automatically reroutes requests to healthy replicas without data loss or downtime.
Strong Consistency
Despite being distributed, NewSQL databases maintain strong consistency guarantees. Unlike eventually consistent NoSQL systems where different nodes might temporarily show different values, NewSQL databases ensure that all clients see the same view of the data. This is achieved through distributed consensus protocols that coordinate writes across replicas.
Horizontal Scalability
NewSQL databases can grow by adding more machines to the cluster. This happens transparently—applications don't need to be rewritten when the cluster expands. The database automatically rebalances data across the new nodes and adjusts query routing.
SQL Compatibility
NewSQL databases support standard SQL interfaces, making them compatible with existing tools, ORMs, and developer skills. Most implement large subsets of the SQL standard, including joins, transactions, indexes, and constraints.
Popular NewSQL Databases
CockroachDB
Built on Google's Spanner design, CockroachDB uses a distributed key-value architecture with a SQL layer on top. It emphasizes survivability, automatically replicating and rebalancing data across geographically distributed nodes. CockroachDB uses a multi-active availability model, meaning all nodes can accept both reads and writes.
YugabyteDB
YugabyteDB combines PostgreSQL's query layer with a distributed storage engine inspired by Apache HBase. It offers both SQL and Cassandra-compatible APIs. YugabyteDB focuses on PostgreSQL compatibility, making migration from traditional PostgreSQL databases relatively straightforward.
TiDB
Developed by PingCAP, TiDB uses a layered architecture with stateless SQL nodes (TiDB) and a distributed transactional key-value store (TiKV). It's MySQL-compatible and particularly popular in the APAC region. TiDB separates storage (TiKV) from compute (TiDB layer), allowing independent scaling.
Google Spanner
The original inspiration for many NewSQL systems, Spanner is Google's globally distributed database service. It uses atomic clocks and GPS to provide external consistency across data centers worldwide. While proprietary to Google Cloud, Spanner's design papers have influenced the entire NewSQL category.
NewSQL in Streaming Architectures
NewSQL databases play several critical roles in real-time streaming architectures.
Change Data Capture (CDC)
NewSQL databases can serve as source systems for event streams through CDC. Tools like Debezium can capture row-level changes from databases like CockroachDB and YugabyteDB, publishing them to Kafka topics. Because NewSQL databases support transactions, CDC can capture changes in a consistent, ordered manner that preserves transactional boundaries.
For example, a financial transaction involving multiple account updates can be captured as a single logical change event, maintaining the atomicity of the original transaction as it flows through the streaming pipeline.
Real-Time OLTP
NewSQL databases excel as the operational data store for real-time applications that consume from streams. Consider an inventory management system that processes order events from Kafka. The system needs to:
Update inventory counts transactionally
Enforce constraints (no overselling)
Serve real-time queries about current stock levels
Scale to handle high order volumes
A NewSQL database can handle all these requirements while maintaining ACID guarantees that prevent inventory inconsistencies, even under high concurrency.
Event Sourcing and CQRS
In event sourcing architectures, NewSQL databases often serve as the read model (query side) in CQRS patterns. Events from Kafka are consumed and projected into queryable tables that support complex analytical queries. The distributed nature of NewSQL databases allows these read models to scale independently of the write-optimized event log.
Stream Processing State Stores
While stream processors like Kafka Streams and Flink typically use embedded state stores, some architectures use NewSQL databases as external state stores when state needs to be:
Queried by multiple applications
Accessed via SQL for ad-hoc analysis
Persisted with strong durability guarantees
Shared across geographically distributed stream processors
NewSQL vs Traditional Databases and NoSQL
Aspect | Traditional RDBMS | NoSQL | NewSQL |
|---|---|---|---|
Scalability | Vertical (limited) | Horizontal (unlimited) | Horizontal (unlimited) |
Consistency | Strong (ACID) | Eventual (typically) | Strong (ACID) |
Query Language | SQL | Varies | SQL |
Transaction Support | Full ACID | Limited/None | Full ACID |
CAP Trade-off | CP (availability suffers during partitions) | AP (consistency suffers) | CP (optimized for availability) |
Use Cases | Traditional OLTP | High-volume writes, flexible schema | Distributed OLTP, real-time apps |
CAP Theorem Considerations
The CAP theorem states that distributed systems can provide at most two of three guarantees: Consistency, Availability, and Partition tolerance. NewSQL databases choose consistency and partition tolerance (CP), but use advanced techniques to maximize availability:
Multi-region replication for geographic redundancy
Automatic failover with minimal downtime
Read replicas for distributing query load
Sophisticated consensus protocols that minimize coordination overhead
While they may have slightly lower availability during network partitions compared to AP systems like Cassandra, modern NewSQL databases achieve availability levels that meet most production requirements.
When to Choose NewSQL
Ideal Use Cases
NewSQL databases are particularly well-suited for:
Financial Services: Transaction processing that requires strong consistency, such as payment systems, trading platforms, and account management. The combination of ACID guarantees and horizontal scalability makes NewSQL ideal for financial workloads that can't tolerate inconsistency but need to scale globally.
Inventory and Supply Chain: Real-time inventory management where overselling prevention is critical. NewSQL transactions ensure that concurrent order processing doesn't violate stock constraints, while scalability handles peak shopping periods.
User-Facing Applications: Social platforms, SaaS applications, and gaming systems that need responsive queries, transactional updates, and the ability to scale with user growth. The SQL interface accelerates development with familiar tools and patterns.
IoT and Time-Series: Ingesting sensor data at scale while supporting complex analytical queries. NewSQL databases can handle high write throughput while maintaining the ability to run sophisticated SQL analytics.
Multi-Region Deployments: Applications serving global users with low-latency requirements across regions. NewSQL databases like CockroachDB and Spanner can place data close to users while maintaining consistency.
Decision Criteria
Choose NewSQL when you need:
Strong consistency across distributed data
SQL interface for developer productivity and tool compatibility
Horizontal scalability beyond what a single database server can provide
High availability with automatic failover
Complex transactions involving multiple rows or tables
Regulatory compliance requiring ACID guarantees and audit trails
Consider alternatives when:
Simple key-value access patterns dominate (consider NoSQL)
Eventual consistency is acceptable (consider NoSQL)
You don't need horizontal scalability (consider traditional RDBMS)
Cost is the primary concern and scale is limited (traditional RDBMS is typically cheaper)
Governance and Compliance
In regulated industries, NewSQL databases offer advantages for data governance. Governance platforms provide layers for streaming platforms, ensuring data quality, access control, and compliance. When NewSQL databases feed streams via CDC, proper governance ensures that sensitive database changes are properly classified, masked, and audited as they flow through the streaming platform.
The combination of transactional guarantees in NewSQL and streaming governance creates end-to-end data integrity—from the database transaction log through Kafka topics to downstream consumers.
Conclusion
NewSQL databases represent a significant evolution in database technology, eliminating the false choice between consistency and scalability. For real-time streaming architectures, they serve as reliable, scalable systems of record that can both source and sink event streams while maintaining strong transactional guarantees.
The key is understanding when the trade-offs of NewSQL—typically slightly higher latency than pure NoSQL and higher cost than traditional single-server databases—are worth the benefits of distributed ACID transactions and horizontal scalability. For applications where correctness and consistency are paramount, but scale is essential, NewSQL databases are often the optimal choice.
Sources and References
CockroachDB Architecture Documentation - Detailed architecture of a leading NewSQL database
Google Spanner: TrueTime and External Consistency - Original research paper on Spanner's distributed transaction model
YugabyteDB Documentation - Architecture and design of PostgreSQL-compatible NewSQL
CAP Theorem: Revisiting the CAP Theorem - Eric Brewer's updated perspective on CAP theorem and modern databases
Debezium CDC for Change Data Capture - Connecting NewSQL databases to streaming platforms