cloud storage mechanisms

A Final Year Project by Ashley Blackmore at The University of Sydney


So I’m done. It has been a hell of a ride. Spent the weekend catching up with friends and sleep.

Next up:

  1. Thesis presentation
  2. Open-sourcing of code
  3. Kick my final two uni exams to the kerb
  4. Publish results

Results, DBRG Presentation

There are several key relationships in the data. I am currently plotting the results. matplotlib is so nice to work with, and its colormaps are pretty.

I will be presenting the implementation and current results of my research at the University of Sydney DBRG (Database Reading Group) this Thursday.

Squashed two bugs

A couple of bugs have shown up after a whole bunch of testing. One was a concurrency issue in the  YCSB log writer (The log is written from a Java Vector, so had synchronization issues when I tried to write from the consistency checking thread). I kept the vector, but changed it so it performed its write from within a synchronized block.

The other bug was quite interesting. I had not tested with a number of keys above 10000, revealing a flaw: nulls were being put into the work queue from reads of pre-existent writes. This resulted in an NPE during the consistency checking. These are now discarded.

The fixing of these issues will allow me to get a wider range of results.

git repos

I should probably post my git repos here too.

These are the parsers I wrote for producing results:

I won’t be posting my main algorithm until the thesis proper is complete. It needs quite a bit of refactoring before I feel it will be presentable.

Draft completed, starting final edit

Results are still coming in for processing. Doing over 4000 test runs per day!

Algorithm (Actually) Completed

My procedures have been debugged, approved and I have been pulling results. Happy days…

Algorithm Completed…

I have worked out the majority of the algorithm logic, but there a few bugs to squash.

The next priority is to set up the set of false transactions, so that we may test the database’s consistency properties.

Digraph Implementation

I have implemented digraphs to hold the register data - now I have to figure out how to get data into tens of thousands of them without bottlenecking anywhere. Cakewalk.

What consistency does your key-value store actually provide?

An operation on the register is either a read (i.e., get) or a write (i.e., put). Using a global clock, we assign each operation a start time and a finish time. We say opera- tion A time-precedes operation B, written as A < B, if A finishes before B starts. If neither A < B nor B < A, then A and B are said to be concurrent with each other, written as A||B. It is easy to see that this time precedence relation imposes a partial order on the operations. A valid total order is one that conforms to this partial order. Given a total order, each read has a unique most recent write, if any. A register has the following semantics if there exists a valid total order that satisfy the corresponding conditions.

A register is said to be safe if a read not concurrent with any writes returns the value of the most recent write, and a read concurrent with some writes returns any value.

A register is said to be regular if a read not concurrent with any writes returns the value of the most recent write, and a read concurrent with some writes returns either the value of the most recent write, or the value of one of the concurrent writes.

A register is said to be atomic if every read (regardless of whether they are concurrent with any writes or not) returns the value of the most recent write. It is easy to see that safety is weaker than regularity, which is in turn weaker than atomicity.

from What consistency does your key-value store actually provide? by Eric Anderson, Xiaozhou Li, Mehul A. Shah, Joseph Tucek, Jay J. Wylie — Hewlett-Packard Laboratories

Progress and Upcoming Hurdles

I have a made a great deal of progress with the algorithm. I have written the safety verification code. This presented a number of hurdles:

Firstly, migrating the code from my custom client back over to YCSB has been difficult, due to a number of threading  problems.

Secondly, since I have to pull values out of the underlying database for checking, I have had to add value-reading functionality to the underlying clients.

Another issue (now solved) was that YCSB-generated entries to the database were not being escaped correctly somewhere in the build pipeline. This was a massive timesink, since I thought I had run into a concurrency bug. For now, all database field values entries are alphanumeric. This should be more than sufficient for testing. The lesson learned is that the rubber ducky debugging should have started sooner rather than later.

Current (high-priority) tasks

The sequential consistency verification algorithm must be written. The algorithm requires that each operation can be traced back to its issuing process:

[To] verify sequential consistency, we need to know which

process issues which operation in the history, in contrast to the pre-

vious three consistency properties. In this section, we assume that

the history includes such information, and that we know in advance

the full set of processes that may issue operations. Both assump-

tions can be realized easily in practice.

Offline verification of sequential consistency is NP-complete if

writes can assign duplicate values [14, 27], but admits straight-

forward solutions if we assume that each write assigns a unique


Due to the way that YCSB launches operations, this may require some cross-thread communication (this is a major hurdle).

Future (low-priority) tasks

The atomicity verification algorithm 

Give value-reading functionality to each YCSB database client. I have already implemented (but not thoroughly tested) this for the CassandraClient10 class. I do not believe this is necessary for my thesis, but plan to contribute the necessary code to the open-source community once my thesis is complete.