Java – Volatile keyword and Double Checked Locking

Java volatile keyword:

(http://tutorials.jenkov.com/java-concurrency/volatile.html)

  1. A CPU may have its own cache, a variable’s value may reside in cpu’s cache when a thread read it.
  2. volatile keyword inform enforce that the variable need to be read/write from/to main memory (i.e., no cpu cache)

volatile keyword also enforce that

  • If Thread A writes to a volatile variable and Thread B subsequently reads the same volatile variable, then all variables visible to Thread A before writing the volatile variable, will also be visible to Thread B after it has read the volatile variable. Example:
int a = 0;
volatile int b = 0;
a = 1;
b= 2;    
// Here, not only b's value need to be write into main memory, but also               
// a need to be write to main memory so that all other thread can read                 
// the same a value as current thread
  • The reading and writing instructions of volatile variables cannot be reordered by the JVM (the JVM may reorder instructions for performance reasons as long as the JVM detects no change in program behavior from the reordering). Instructions before and after can be reordered, but the volatile read or write cannot be mixed with these instructions. Whatever instructions follow a read or write of a volatile variable are guaranteed to happen after the read or write.

For a given shared variable, if there is one write thread, and multiple read thread, volatile keyword is good enough to maintain the correct access concurrency. If there is multiple write thread, volatile keyword is not good enough anymore, we need synchronization (lock) for write access.

Double checked locking problem

(https://en.wikipedia.org/wiki/Double-checked_locking)

Following code return a helper variable, works ok in single-thread case.

// Single-threaded version
class Foo {
    private Helper helper;
    public Helper getHelper() {
        if (helper == null) {
            helper = new Helper();
        }
        return helper;
    }

    // other functions and members...
}

To support multi-thread, synchronization works but could be expensive

class Foo {
    private Helper helper;
    public synchronized Helper getHelper() {
        if (helper == null) {
            helper = new Helper();
        }
        return helper;
    }

    // other functions and members...
}

Following code may looks works, but actually not in java. Note that the purpose of this double-checked locking code is to do lazy-initialization, while trying to avoid synchronization as much as possible.

// Broken multithreaded version
// "Double-Checked Locking" idiom
class Foo {
    private Helper helper;
    public Helper getHelper() {
        if (helper == null) {  // check 1
            synchronized(this) {
                if (helper == null) {  // check 2
                    helper = new Helper();    // potential problem
                }
            }
        }
        return helper;
    }

    // other functions and members...
}

Why it could fails? Because when helper = new Helper(); is called in thread 1, before Helper’s constructer  is fully finished and the helper object instance is fully initialized,  JVM may optimize the code to make helper variable available (i.e, not null anymore). This means when thread 2 call getHelper function before thread 1 finish the new statement, thread 2 can get “helper” variable, even though it is not fully initialized. If thread 2 access helper.xxx, where xxx is an un-initialized member in Helper object, problem could happens.

Using volatile keyword can help solve the problem as follows:

// Works with acquire/release semantics for volatile in Java 1.5 and later
// Broken under Java 1.4 and earlier semantics for volatile
class Foo {
    private volatile Helper helper;
    public Helper getHelper() {
        Helper result = helper;
        if (result == null) {
            synchronized(this) {
                result = helper;
                if (result == null) {
                    helper = result = new Helper();
                }
            }
        }
        return result;
    }

    // other functions and members...
}

Since helper is volatile, it will not be make available before it is fully initialized (i.e., no JVM optimization).

Note the local variable result, which seems unnecessary. The effect of this is that in cases where helper is already initialized (i.e., most of the time), the volatile field is only accessed once (due to “return result;” instead of “return helper;”), which can improve the method’s overall performance by as much as 25 percent.[7]

Best lazy singleton lazy initialization practice

this relies on the fact that nested classes are not loaded until they are referenced.

// Correct lazy initialization in Java
class Foo {
    private static class HelperHolder {
       public static final Helper helper = new Helper();
    }

    public static Helper getHelper() {
        return HelperHolder.helper;
    }
}

 

Java – Volatile keyword and Double Checked Locking

ACID explained

Acid: Atomicity, Consistency, Isolation, Durability

Isolation Level Explained:

Assuming that there is one read transaction (with two exact same read) and one write transaction, the time sequence are (read1,  write1 + commit write, read2 + commit read)

Serializable (Higest Level)-  Read and write transaction need to be fully serializable, i.e., One cannot start after the other one finished

Repeatable reads – No range lock support, this means it is possible for write transaction to delete a row and add another back. Even though the final result may (or maynot) be same for read (e.g, count(*), max() ), but it may not be expected.   The problem this causes is called Phantom Read.  For single row read, the DBMS must return the old value for the second SELECT.

Read commit – Each read is guaranteed to see “committed” write transaction value.  But two read is not guaranteed to get the same result. The problem this causes is called “non-repeatable read”

Read uncommitted – Read may get committed write result (dirty value). The problem this causes is called “Dirty read”

In summary, we have

Dirty Read            non-repeatable Read      Phantom Read

Read uncommitted             V                                   V                                        V

Read committed                 X                                    V                                        V

Repeatable read                 X                                    X                                        V

Serializable                         X                                    X                                        X

ACID explained

Understanding Paxos/Raft/Zab Algorithm

Paxos/Raft/Zab are the different variation of distributed consensus algorithms. They follow two key principles, namely 1) two phase commit, and 2) quorum majority vote. A value is commit only when 1) majority nodes have went through 2-phase commit process.

Paxos algorithms:

  1. There are two roles in the system: proposer, candidate, follower and client. The proposer and follower are the two most important roles in the algorithms.
  2. Every client can change/set some value through the proposer it connects. If two client want to change same value at the same time, two proposer essentially compete for the majority vote.
  3. We can think of so called “value” as a particular “key”, whose value we want to change from version X to version X+1.
  4. Multiple clients/proposals are competing to change  the same key’s corresponding value from version X to version X+1 at the same time, which forms a so called “instance” in Paxos algorithm
  5. Here is good explanation for Paxos algorithm in Chinese. (http://blog.csdn.net/sinat_35172510/article/details/51547388 )

 

Raft algorithms:

  1. Raft algorithms simplify Paxos algorithm by requiring there is a Single proposal at any given time in the system. Only this proposer make proposal.
  2. When leader selection is needed, a random timeout is added to each candidate before it start proposal itself to be the leader. This random timeout helps to minimize competition between different candidate.

 

Zookeeper atomic broadcasting algorithms (Zab):

  1. The difference between Zab and Paxos https://cwiki.apache.org/confluence/display/ZOOKEEPER/Zab+vs.+Paxos

 

Related resources are as follows.

Raft:

 

Understanding Paxos/Raft/Zab Algorithm

Two-phase commit

Key points for two phase commit:

  1. Multiple party need finish an action together in a distributed environment. Assume there is an coordinator to coordinate the action for all parties and everyone can write “redo/undo” log.
  2. Assume that every party can recover/revert the action as long as the log is written to disk.

Commit request phase

or voting phase

  1. The coordinator sends a query to commit message to all cohorts and waits until it has received a reply from all cohorts.
  2. The cohorts execute the transaction up to the point where they will be asked to commit. They each write an entry to their undo logand an entry to their redo log.
  3. Each cohort replies with an agreement message (cohort votes Yes to commit), if the cohort’s actions succeeded, or an abortmessage (cohort votes No, not to commit), if the cohort experiences a failure that will make it impossible to commit.

Commit phase

or Completion phase

Success

If the coordinator received an agreement message from all cohorts during the commit-request phase:

  1. The coordinator sends a commit message to all the cohorts.
  2. Each cohort completes the operation, and releases all the locks and resources held during the transaction.
  3. Each cohort sends an acknowledgment to the coordinator.
  4. The coordinator completes the transaction when all acknowledgments have been received.

Failure

If any cohort votes No during the commit-request phase (or the coordinator’s timeout expires):

  1. The coordinator sends a rollback message to all the cohorts.
  2. Each cohort undoes the transaction using the undo log, and releases the resources and locks held during the transaction.
  3. Each cohort sends an acknowledgement to the coordinator.
  4. The coordinator undoes the transaction when all acknowledgements have been received.
Two-phase commit