A Deep Dive into ACID and the Real Challenges in Distributed Systems
ACID is an acronym for Atomicity, Consistency, Isolation, and Durability. It is closely tied to the development of relational database systems and the need for reliable, transactional data management.
The principles behind Atomicity, Consistency, Isolation, and Durability took shape in database research during the 1970s, and the acronym ACID was coined by Theo Härder and Andreas Reuter in 1983.
Foundations of ACID
Let’s dive into each element of ACID.
- Atomicity (A): Atomicity ensures that a transaction is treated as a single, indivisible unit of work. Either all the changes made by the transaction are committed, or none of them are. There is no partial completion of the transaction.
- Consistency (C): Consistency ensures that a transaction brings the database from one valid state to another. The database must satisfy integrity constraints before and after the transaction, ensuring that the data remains in a consistent state.
- Isolation (I): Isolation ensures that the execution of one transaction is isolated from the execution of other transactions. Each transaction should appear as if it is the only transaction executing, even in a multi-user environment. Isolation prevents interference between concurrent transactions.
- Durability (D): Durability guarantees that once a transaction is committed, its changes are permanent and will survive subsequent failures, such as power outages or crashes. The committed changes are stored in non-volatile storage (e.g., disk) to ensure durability.
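In practice, durability comes down to one rule: do not acknowledge a commit until its record is on stable storage. The Python sketch below illustrates that rule with a hypothetical append-only commit log (`durable_commit`, `replay`, and the one-JSON-object-per-line format are illustrative assumptions, not any database's API). Each record is flushed and passed to `os.fsync` before the function returns, and state is rebuilt after a restart by replaying the log. Real engines add much more (checksums, checkpoints, group commit, syncing the containing directory), so treat this as a minimal illustration only.

```python
import json
import os
import tempfile

# Hypothetical log format (assumption): one JSON object per line, one line per commit.


def durable_commit(log_path: str, txn_id: int, changes: dict) -> None:
    """Append one commit record and force it to stable storage before returning."""
    record = json.dumps({"txn": txn_id, "changes": changes})
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(record + "\n")
        log.flush()             # push Python's buffer into the OS page cache
        os.fsync(log.fileno())  # ask the OS to write the cached data to the device
    # Only after this point would a database report "committed" to the client.


def replay(log_path: str) -> dict:
    """Rebuild the latest state after a restart by re-applying every logged commit."""
    state: dict = {}
    if not os.path.exists(log_path):
        return state
    with open(log_path, encoding="utf-8") as log:
        for line in log:
            state.update(json.loads(line)["changes"])
    return state


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "commit.log")
    durable_commit(path, 1, {"A": 70, "B": 130})
    durable_commit(path, 2, {"A": 60, "C": 10})
    print(replay(path))  # {'A': 60, 'B': 130, 'C': 10}
```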
Example:
Consider a banking application transferring funds between two accounts. The transfer involves at least two steps, debiting one account and crediting the other, and each ACID property protects it in a different way:
- Atomicity: If any step fails (e.g., debiting one account or crediting the other), the entire transaction is rolled back, so money is never debited from one account without being credited to the other.
- Consistency: Before and after the transaction, the account balances satisfy the integrity constraints (for example, no balance is negative and the total across both accounts is unchanged).
- Isolation: If another transaction is updating the same accounts at the same time, isolation prevents the two from interfering, for example by ensuring that each transaction sees a consistent snapshot of the data.
- Durability: Once the funds transfer is committed, the changes persist even if the system encounters a failure, providing durability.
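The transfer maps directly onto a real database transaction. Below is a minimal sketch using Python's built-in `sqlite3` module; the `accounts` table, its `CHECK (balance >= 0)` rule, and the `open_bank`/`transfer`/`balances` helpers are illustrative assumptions. The credit is deliberately issued before the debit so that, when the debit violates the constraint, you can see the already-applied credit being rolled back along with it.

```python
import sqlite3


def open_bank() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Illustrative schema (assumption): the CHECK rule is the consistency constraint.
    conn.execute(
        "CREATE TABLE accounts ("
        "id TEXT PRIMARY KEY, balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 100), ("B", 100)])
    conn.commit()
    return conn


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Move `amount` from src to dst as one atomic unit; return True if committed."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        # The debit broke the CHECK rule; the earlier credit was rolled back too.
        return False


def balances(conn: sqlite3.Connection) -> dict:
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))


if __name__ == "__main__":
    conn = open_bank()
    print(transfer(conn, "A", "B", 30), balances(conn))   # True {'A': 70, 'B': 130}
    print(transfer(conn, "A", "B", 500), balances(conn))  # False {'A': 70, 'B': 130}
```

The `with conn:` block is what provides atomicity here: `sqlite3` commits when the block exits normally and rolls back when it raises.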
Transaction
A transaction is a sequence of one or more operations that are executed as a single unit.
In a transactional database system, the isolation level refers to the degree to which the operations within one transaction are isolated or separated from the operations of other concurrent transactions. Different isolation levels provide different levels of protection against issues such as dirty reads, non-repeatable reads, and phantom reads. Let’s explore these concepts further:
SQL Phenomena
Dirty read
A transaction reads data written by a concurrent uncommitted transaction.
Nonrepeatable read
A transaction re-reads data it has previously read and finds that data has been modified by another transaction (that committed since the initial read).
Phantom read
A transaction re-executes a query returning a set of rows that satisfy a search condition and finds that the set of rows satisfying the condition has changed due to another recently-committed transaction.
Serialization anomaly
The result of successfully committing a group of transactions is inconsistent with all possible orderings of running those transactions one at a time.
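Serialization anomalies are the least intuitive of these phenomena. The pure-Python simulation below involves no database; the on-call scenario and the names `go_off_call`, `run_serially`, `run_on_shared_snapshot`, and `has_serialization_anomaly` are hypothetical. It models the commonly cited write skew case: the application wants at least one doctor on call, and each transaction takes its own doctor off call only if it sees two on call. Run one after the other, in either order, one doctor stays on call. Run concurrently against the same starting snapshot, each transaction checks the rule, writes a different row, and both commit, leaving nobody on call, which is an outcome no serial order can produce. Snapshot-based levels such as PostgreSQL's Repeatable Read permit this pattern, while its Serializable level is designed to detect it.

```python
from itertools import permutations
from typing import Callable, Dict, List

State = Dict[str, bool]         # doctor -> currently on call? (hypothetical scenario)
Txn = Callable[[State], State]  # reads a state, returns the rows it wants to write


def go_off_call(doctor: str) -> Txn:
    def txn(view: State) -> State:
        # Application rule: only leave if someone else is still on call.
        if sum(view.values()) >= 2:
            return {doctor: False}
        return {}
    return txn


def run_serially(initial: State, txns: List[Txn]) -> State:
    state = dict(initial)
    for txn in txns:
        state.update(txn(state))  # each transaction sees all earlier commits
    return state


def run_on_shared_snapshot(initial: State, txns: List[Txn]) -> State:
    snapshot = dict(initial)  # every transaction reads the same starting point
    state = dict(initial)
    for txn in txns:
        # No conflict check: the writes touch different rows anyway.
        state.update(txn(snapshot))
    return state


def has_serialization_anomaly(initial: State, txns: List[Txn]) -> bool:
    concurrent = run_on_shared_snapshot(initial, txns)
    serial_outcomes = [run_serially(initial, list(order)) for order in permutations(txns)]
    return concurrent not in serial_outcomes


if __name__ == "__main__":
    doctors = {"alice": True, "bob": True}
    txns = [go_off_call("alice"), go_off_call("bob")]
    print(run_on_shared_snapshot(doctors, txns))     # {'alice': False, 'bob': False}
    print(has_serialization_anomaly(doctors, txns))  # True
```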
Isolation Levels
Isolation levels determine which of these phenomena a transaction may encounter, and they manage the trade-off between data consistency and system performance in a multi-user environment.
The SQL standard defines four isolation levels:
Read Uncommitted:
- Allows transactions to read uncommitted changes made by other transactions.
- Provides the weakest isolation: dirty reads, non-repeatable reads, and phantom reads are all possible.
- Rarely used in practice because of its weak consistency guarantees; PostgreSQL, for instance, accepts the setting but behaves as Read Committed.
Read Committed:
- Ensures that a transaction can only read committed changes made by other transactions.
- Eliminates dirty reads but may still allow non-repeatable reads and phantom reads.
- Provides a higher level of consistency compared to Read Uncommitted.
Repeatable Read:
- Guarantees that when a transaction re-reads a row it has already read, it sees the same committed values, preventing non-repeatable reads.
- Still allows for phantom reads, where new rows meeting the search criteria may appear between consecutive reads.
- Provides a higher level of consistency compared to Read Committed.
Serializable:
- Provides the highest level of isolation by ensuring that transactions are executed in a manner that is equivalent to some serial order.
- Eliminates dirty reads, non-repeatable reads, phantom reads, and serialization anomalies.
- Ensures the strictest consistency but may lead to increased contention and reduced concurrency.
PostgreSQL — Transaction Isolation Levels
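The four levels can also be summarized as data: for each level, the phenomena it may still allow. The Python sketch below encodes the minimum guarantees of the SQL standard, with a serialization-anomaly column following the PostgreSQL documentation's table, and picks the weakest level that rules out a given set of phenomena. `Phenomenon`, `LEVELS`, and `weakest_level_preventing` are illustrative names. Engines may be stricter than this minimum: PostgreSQL treats Read Uncommitted as Read Committed, and its Repeatable Read also prevents phantom reads.

```python
from enum import Enum, auto
from typing import List, Set, Tuple


class Phenomenon(Enum):
    DIRTY_READ = auto()
    NONREPEATABLE_READ = auto()
    PHANTOM_READ = auto()
    SERIALIZATION_ANOMALY = auto()


# Illustrative encoding: weakest to strongest, each with the phenomena it may allow.
LEVELS: List[Tuple[str, Set[Phenomenon]]] = [
    ("READ UNCOMMITTED", {Phenomenon.DIRTY_READ, Phenomenon.NONREPEATABLE_READ,
                          Phenomenon.PHANTOM_READ, Phenomenon.SERIALIZATION_ANOMALY}),
    ("READ COMMITTED", {Phenomenon.NONREPEATABLE_READ, Phenomenon.PHANTOM_READ,
                        Phenomenon.SERIALIZATION_ANOMALY}),
    ("REPEATABLE READ", {Phenomenon.PHANTOM_READ, Phenomenon.SERIALIZATION_ANOMALY}),
    ("SERIALIZABLE", set()),
]


def weakest_level_preventing(unwanted: Set[Phenomenon]) -> str:
    """Return the weakest standard level that rules out every phenomenon in `unwanted`."""
    for name, allowed in LEVELS:
        if not (allowed & unwanted):
            return name
    return LEVELS[-1][0]  # not reached: SERIALIZABLE allows nothing


if __name__ == "__main__":
    print(weakest_level_preventing({Phenomenon.DIRTY_READ,
                                    Phenomenon.NONREPEATABLE_READ}))  # REPEATABLE READ
```

In SQL, the chosen level is then set per transaction, for example with `SET TRANSACTION ISOLATION LEVEL SERIALIZABLE` at the start of the transaction.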
Challenges with ACID in Distributed Systems
In a distributed system, achieving ACID guarantees is hard for several reasons:
High-Latency Networks:
- Challenge: In distributed systems that span different geographical locations, network latency becomes significant. ACID transactions are typically designed around low-latency communication between components, which does not hold in a globally distributed environment.
- Issue: Every transaction that must coordinate across nodes pays for the time messages take to travel between them, which hurts the overall throughput and responsiveness of the system.
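A back-of-the-envelope calculation shows why latency matters. The sketch assumes, as a simplification, that a distributed commit needs two sequential network round trips (as in two-phase commit), that a transaction holds its row locks for that whole time, and that disk and CPU time are negligible. Under those assumptions, the round-trip time alone caps how often a single frequently updated row can change. The function names and the RTT values are hypothetical.

```python
# Simplified model (assumption): a commit costs `round_trips` sequential network
# round trips, locks are held for that whole time, and disk/CPU time is ignored.


def coordination_time_ms(rtt_ms: float, round_trips: int = 2) -> float:
    """Time a commit spends waiting on the network alone."""
    return rtt_ms * round_trips


def max_hot_row_updates_per_sec(rtt_ms: float, round_trips: int = 2) -> float:
    """Upper bound on conflicting commits per second against one row,
    since each must wait for the previous one to release its lock."""
    return 1000.0 / coordination_time_ms(rtt_ms, round_trips)


if __name__ == "__main__":
    # Hypothetical round-trip times: within one data center vs. across continents.
    for rtt in (0.5, 100.0):
        print(f"RTT {rtt} ms -> {coordination_time_ms(rtt)} ms per commit, "
              f"at most {max_hot_row_updates_per_sec(rtt)} updates/s on a hot row")
```

With these assumed numbers, going from a 0.5 ms to a 100 ms round trip turns a 1 ms coordination cost into 200 ms, and the ceiling on updates to one hot row drops from 1000 to 5 per second.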
Global Data Distribution:
- Challenge: Distributed systems often involve replicating data across multiple nodes for scalability and fault tolerance. ACID transactions struggle when dealing with globally distributed data due to the time it takes to synchronize updates.
- Issue: Keeping distributed replicas consistent becomes complex, and the potential for conflicts or inconsistencies increases.
Consistency vs. Availability:
- Challenge: The CAP theorem (Consistency, Availability, Partition tolerance) states that a distributed system cannot guarantee all three at once: when a network partition occurs, it must give up either consistency or availability. (The C in CAP means linearizability, which is not the same as the C in ACID.) ACID transactions usually favor consistency, so they pay for it in availability.
- Issue: During network partitions or failures, the system must either reject some requests to stay consistent or keep serving them and risk returning stale or conflicting data.
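The trade-off can be made concrete with a toy two-replica store. In this sketch (the `TwoReplicaStore` class, its `mode` switch, and `Unavailable` are hypothetical, not a real system's API), a write goes to replica `a` and is copied synchronously to replica `b`. While the replicas are partitioned, a CP configuration rejects the write so the replicas never disagree, and an AP configuration accepts it and lets `b` keep serving the old value.

```python
from dataclasses import dataclass, field


class Unavailable(Exception):
    """Raised when the store refuses a request in order to stay consistent."""


@dataclass
class TwoReplicaStore:
    mode: str                            # "CP" or "AP" (hypothetical switch)
    a: dict = field(default_factory=dict)
    b: dict = field(default_factory=dict)
    partitioned: bool = False            # True while a and b cannot reach each other

    def write(self, key: str, value: int) -> None:
        if not self.partitioned:
            self.a[key] = value
            self.b[key] = value          # synchronous replication keeps a and b equal
            return
        if self.mode == "CP":
            raise Unavailable("partitioned: write rejected to keep replicas consistent")
        self.a[key] = value              # AP: accept locally; b now serves a stale value

    def read_from_b(self, key: str):
        return self.b.get(key)


if __name__ == "__main__":
    ap = TwoReplicaStore("AP")
    ap.write("x", 1)
    ap.partitioned = True
    ap.write("x", 2)
    print(ap.a, ap.b)  # {'x': 2} {'x': 1}
```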
Transaction Coordination:
- Challenge: Coordinating transactions across distributed nodes requires dedicated protocols. The classic two-phase commit (2PC) protocol is blocking: if the coordinator fails after participants have voted to commit, they must keep holding their locks until it recovers. This makes 2PC a poor fit for large-scale distributed systems.
- Issue: Bottlenecks can occur as transactions wait for global coordination, impacting system performance.
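To see where the blocking comes from, here is a minimal in-process sketch of two-phase commit. It assumes one Python process with no network, no timeouts, and no persistent logs, and `Participant` and `two_phase_commit` are illustrative names. The critical gap lies between the two phases: a participant that has voted yes is "prepared" and may neither commit nor abort on its own, so if the coordinator crashed at that point, the participant would keep its locks until the coordinator came back.

```python
from dataclasses import dataclass, field
from typing import List

# Toy, in-process model (assumption): no network, timeouts, or durable logs.


@dataclass
class Participant:
    name: str
    can_commit: bool = True  # hypothetical: outcome of local validation
    state: str = "idle"      # idle -> prepared | aborted -> committed | aborted
    log: List[str] = field(default_factory=list)

    def prepare(self) -> bool:
        """Phase 1: vote. A yes vote is a promise to commit if told to."""
        if self.can_commit:
            self.state = "prepared"  # locks stay held from here until phase 2
            self.log.append("PREPARED")
            return True
        self.state = "aborted"       # a no vote lets the participant abort locally
        self.log.append("ABORTED")
        return False

    def commit(self) -> None:
        self.state = "committed"
        self.log.append("COMMITTED")

    def abort(self) -> None:
        self.state = "aborted"
        self.log.append("ABORTED")


def two_phase_commit(participants: List[Participant]) -> bool:
    """Coordinator: commit everywhere only if every participant votes yes."""
    # Phase 1 (voting). Collecting every vote keeps the sketch simple;
    # a coordinator could also stop at the first "no".
    votes = [p.prepare() for p in participants]
    decision = all(votes)
    # A coordinator crash right here would leave "prepared" participants blocked.
    # Phase 2 (completion): send the single global decision.
    for p in participants:
        if decision:
            p.commit()
        elif p.state == "prepared":
            p.abort()
    return decision


if __name__ == "__main__":
    group = [Participant("orders"), Participant("payments", can_commit=False)]
    print(two_phase_commit(group), [p.state for p in group])  # False ['aborted', 'aborted']
```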
SAGA and BASE
Because of these challenges, distributed systems often turn to alternatives to strict ACID transactions.
SAGA is a pattern for managing distributed transactions across multiple services in a way that ensures eventual consistency.
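A minimal orchestration-style saga might look like the Python sketch below. `SagaStep` and `run_saga` are hypothetical names, the steps are plain functions standing in for local transactions in separate services, and a real implementation would also persist saga progress and retry compensations until they succeed. Each step pairs an action with a compensation; when a step fails, the steps that already completed are compensated in reverse order.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SagaStep:
    name: str
    action: Callable[[], None]      # stands in for a local transaction in one service
    compensate: Callable[[], None]  # semantically undoes `action`


def run_saga(steps: List[SagaStep], journal: List[str]) -> bool:
    """Hypothetical orchestrator: run steps in order; on failure, compensate in reverse."""
    completed: List[SagaStep] = []
    for step in steps:
        try:
            step.action()
        except Exception as exc:
            journal.append(f"{step.name} failed: {exc}")
            for done in reversed(completed):
                done.compensate()
                journal.append(f"compensated {done.name}")
            return False
        completed.append(step)
        journal.append(f"{step.name} done")
    return True


if __name__ == "__main__":
    stock = {"widget": 1}

    def reserve() -> None:
        stock["widget"] -= 1

    def release() -> None:
        stock["widget"] += 1

    def charge() -> None:
        raise RuntimeError("card declined")

    def refund() -> None:
        pass

    log: List[str] = []
    ok = run_saga([SagaStep("reserve", reserve, release),
                   SagaStep("charge", charge, refund)], log)
    print(ok, stock, log)
```

Unlike an ACID rollback, the compensation runs after the reservation was already visible to others, which is why the saga only promises eventual consistency.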
BASE is an acronym for “Basically Available, Soft state, Eventually consistent.”
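Eventual consistency can be illustrated with two replicas that accept writes independently and later exchange state. The sketch below uses a last-writer-wins rule, which is only one of several conflict-resolution strategies; the `LwwReplica` class and its integer timestamps, standing in for a real clock, are assumptions. Each replica stays available for writes, the replicas diverge for a while (soft state), and once they have merged with each other they agree (eventually consistent). Note that last-writer-wins resolves a conflict by silently discarding the losing concurrent write.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# (timestamp, replica_id, value); replica_id breaks timestamp ties deterministically.
# Integer timestamps are an assumption standing in for a real clock.
Entry = Tuple[int, str, str]


@dataclass
class LwwReplica:
    replica_id: str
    store: Dict[str, Entry] = field(default_factory=dict)

    def put(self, key: str, value: str, timestamp: int) -> None:
        # Basically available: accept the write locally without contacting peers.
        self._apply(key, (timestamp, self.replica_id, value))

    def get(self, key: str) -> Optional[str]:
        entry = self.store.get(key)
        return None if entry is None else entry[2]

    def merge_from(self, other: "LwwReplica") -> None:
        # Anti-entropy: pull the peer's entries and keep the newest version of each key.
        for key, entry in other.store.items():
            self._apply(key, entry)

    def _apply(self, key: str, incoming: Entry) -> None:
        current = self.store.get(key)
        if current is None or incoming[:2] > current[:2]:
            self.store[key] = incoming


if __name__ == "__main__":
    r1, r2 = LwwReplica("r1"), LwwReplica("r2")
    r1.put("cart", "book", timestamp=1)
    r2.put("cart", "pen", timestamp=2)
    print(r1.get("cart"), r2.get("cart"))  # book pen
    r1.merge_from(r2)
    r2.merge_from(r1)
    print(r1.get("cart"), r2.get("cart"))  # pen pen
```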
These two strategies, SAGA and BASE, provide alternative approaches to distributed transactions that emphasize resilience, scalability, and flexibility. SAGA coordinates a sequence of local transactions, each paired with a compensating action, while BASE relaxes consistency constraints to prioritize availability and scalability. Both offer practical ways to deal with the challenges inherent in distributed systems.
Conclusion
When working with databases, we should be aware of the four ACID properties: Atomicity, Consistency, Isolation, and Durability.
Maintaining these four properties in a distributed system is challenging for the reasons above. Patterns and strategies such as SAGA and BASE address this, generally by favoring eventual consistency in exchange for high availability.
We discussed only two ways to handle these challenges, but like everything else, they are not a one-size-fits-all solution. If you’re interested in learning more, stay tuned as I’ll be sharing more related information soon 😁!
PS: If you liked this article and want to stay updated, I invite you to subscribe to my free newsletter: https://devjava.substack.com/