Tuesday, August 19, 2014

TRAC Audit: Do-It-Yourself Demos

In my post TRAC Audit: Process I explained how we demonstrated the LOCKSS Polling and Repair Protocol to the auditors, and linked to the annotated logs we showed them. These demos have been included in the latest release of the LOCKSS software. Below the fold, and now in the documentation, are step-by-step instructions allowing you to replicate this demo.


These instructions have been tested on a vanilla install of Ubuntu 14.04.1, up-to-date as of August 4. They should work on other recent Debian-based Linux systems, but there are no guarantees and no support: if they don't work, it is up to you, not me, to figure out why.

The first step is to install the pre-requisites:
foo@bar:~$ cd
foo@bar:~$ sudo apt-get install default-jdk ant subversion libxml2-utils
[sudo] password for foo:
Reading package lists... 0%
...
0 upgraded, 46 newly installed, 0 to remove and 0 not upgraded.
Need to get 70.6 MB of archives.
After this operation, 127 MB of additional disk space will be used.
Do you want to continue? [Y/n] y
...
done.
foo@bar:~$ ls -l /etc/alternatives/javac
lrwxrwxrwx 1 root root 42 Aug  4 18:45 /etc/alternatives/javac -> /usr/lib/jvm/java-7-openjdk-i386/bin/javac
foo@bar:~$ export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-i386
foo@bar:~$ 
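The JAVA_HOME value above is just the target of the javac symlink with the trailing bin/javac removed, so it will differ on a machine with another architecture or JDK version. Here is a minimal sketch of deriving it, assuming Python 3. The function names are made up for illustration and are not part of the LOCKSS software.

```python
import os

def java_home_from_javac(javac_path):
    """Return the JDK directory two levels above a javac binary path."""
    # .../java-7-openjdk-i386/bin/javac -> .../java-7-openjdk-i386
    return os.path.dirname(os.path.dirname(javac_path))

def detect_java_home(link="/etc/alternatives/javac"):
    """Resolve the alternatives symlink and derive JAVA_HOME from it."""
    return java_home_from_javac(os.path.realpath(link))

if __name__ == "__main__":
    # Prints a line you could paste into the shell.
    print("export JAVA_HOME=" + detect_java_home())
```

If the symlink does not exist, the printed path will be meaningless, so check it against the ls -l output before using it.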
The next step is to check the latest release of the LOCKSS daemon out from SourceForge:

foo@bar:~$ svn checkout svn://svn.code.sf.net/p/lockss/svn/lockss-daemon/tags/last_released_daemon lockss-daemon
...
foo@bar:~$ 
The next step is to build the LOCKSS daemon. This takes a while because there's a lot of code to build. The build generates Java warnings that you should be able to ignore, but no errors. Just to be sure that everything is OK, we run the unit and functional tests on the daemon that gets built. This takes much longer, especially on the little netbook I'm using to test the instructions:

foo@bar:~$ mkdir ~/.ant
foo@bar:~$ cd ~/.ant
foo@bar:~/.ant$ ln -s ~/lockss-daemon/lib .
foo@bar:~/.ant$ cd ~/lockss-daemon
foo@bar:~/lockss-daemon$ ant
Buildfile: /home/foo/lockss-daemon/build.xml
...
BUILD SUCCESSFUL
Total time: 81 minutes 21 seconds

real 81m21.980s
user 84m29.308s
sys 5m13.708s
foo@bar:~/lockss-daemon$  
The next step is to configure STF for the demos. The demos work without this configuration, but they are much more informative with it:


foo@bar:~$ cd ~/lockss-daemon/test/frameworks/run_stf
foo@bar:~/lockss-daemon/test/frameworks/run_stf$ cp testsuite.opt.demo testsuite.opt
foo@bar:~/lockss-daemon/test/frameworks/run_stf$ 
This configuration ensures that:
  • The logs contain detailed information about the polling and repair process.
  • The logs aren't deleted after the demo.
  • The daemons stay running until you hit Enter. This allows you to use a Web browser to access the UI of the daemons and see the polling and voting status pages. See the STF README.txt file for details of how to do this.
Now you can go ahead and run the first demo in the STF test framework. It creates a network of 5 LOCKSS boxes each preserving the same Archival Unit (AU) of synthetic content, and causes the first box to call a poll on it, which should result in complete agreement among the boxes:

foo@bar:~/lockss-daemon/test/frameworks/run_stf$ python testsuite.py AuditDemo1
11:27:35.057: INFO: ===================================
11:27:35.057: INFO: Demo a V3 poll with no disagreement
11:27:35.057: INFO: -----------------------------------
11:27:35.250: INFO: Starting framework in /home/foo/gamma/lockss-daemon/test/frameworks/run_stf/testcase-1
11:27:35.266: INFO: Waiting for framework to become ready
11:27:45.624: INFO: Creating simulated AU's
11:27:47.546: INFO: Waiting for simulated AU's to crawl
11:27:47.759: INFO: AU's completed initial crawl
11:27:47.760: INFO: No nodes damaged on client localhost:8041
11:27:47.777: INFO: Waiting for a V3 poll to be called...
11:28:18.087: INFO: Successfully called a V3 poll
11:28:18.088: INFO: Checking V3 poll result...
11:28:18.215: INFO: Asymmetric client localhost:8042 repairers OK
11:28:18.249: INFO: Asymmetric client localhost:8043 repairers OK
11:28:18.287: INFO: Asymmetric client localhost:8044 repairers OK
11:28:18.322: INFO: Asymmetric client localhost:8045 repairers OK
11:28:18.425: INFO: AU successfully polled
11:28:19.427: INFO: No deadlocks detected
>>> Delaying shutdown.  Press Enter to continue...
11:29:08.161: INFO: Stopping framework
----------------------------------------------------------------------
Ran 1 test in 93.213s

OK
foo@bar:~/lockss-daemon/test/frameworks/run_stf$ 
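The console output above has a regular shape (timestamp, level, message), so it is easy to measure how long each phase took. Here is a rough sketch, assuming Python 3 and assuming every line of interest looks like the ones shown. The helper names are my own, not part of STF, and the code does not handle a run that crosses midnight.

```python
import re
from datetime import datetime

# Matches lines such as "11:27:47.777: INFO: Waiting for a V3 poll to be called..."
LINE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}\.\d{3}): (\w+): (.*)$")

def parse_line(line):
    """Split a console line into (time, level, message), or return None."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None  # e.g. the URL lines or the "Ran 1 test" summary
    stamp = datetime.strptime(m.group(1), "%H:%M:%S.%f")
    return stamp, m.group(2), m.group(3)

def seconds_between(lines, start_text, end_text):
    """Seconds from the first message containing start_text to the
    first later message containing end_text, or None if not found."""
    start = None
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        stamp, _level, message = parsed
        if start is None and start_text in message:
            start = stamp
        elif start is not None and end_text in message:
            return (stamp - start).total_seconds()
    return None

sample = [
    "11:27:47.777: INFO: Waiting for a V3 poll to be called...",
    "11:28:18.087: INFO: Successfully called a V3 poll",
]
print(seconds_between(sample, "Waiting for a V3 poll", "Successfully called"))
```

For the two sample lines, taken from the AuditDemo1 run above, this prints 30.31, the number of seconds the framework waited for the poll to be called.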
You will find that the demo has created a file system tree under testcase-1 with a directory for each of the five boxes in the network:

foo@bar:~/lockss-daemon/test/frameworks/run_stf$ ls testcase-1
daemon-8041  daemon-8042  daemon-8043  daemon-8044  daemon-8045  lockss.opt  lockss.txt
foo@bar:~/lockss-daemon/test/frameworks/run_stf$ 
daemon-8041 is the poller, the box that called the poll and tallied the result. You can see its log (an annotated version is here):

foo@bar:~/lockss-daemon/test/frameworks/run_stf$ ls -l testcase-1/daemon-8041/test.out
-rw-rw-r-- 1 foo foo 31399 Aug  5 13:56 testcase-1/daemon-8041/test.out
foo@bar:~/lockss-daemon/test/frameworks/run_stf$ 
daemon-8042 through daemon-8045 are the voters, the boxes whose content is compared with the poller's. You can see their logs (an annotated version is here):

foo@bar:~/lockss-daemon/test/frameworks/run_stf$ ls -l testcase-1/daemon-8042/test.out
-rw-rw-r-- 1 foo foo 14755 Aug  5 13:56 testcase-1/daemon-8042/test.out
foo@bar:~/lockss-daemon/test/frameworks/run_stf$ 
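If you want to look at all five logs at once, the directory layout shown above can be walked with a few lines of code. This is a sketch assuming Python 3 and assuming every daemon-NNNN directory keeps its log in test.out, as the two listed above do. The function name is hypothetical, and treating port 8041 as the poller only matches the first two demos.

```python
import os
import re

DAEMON_DIR_RE = re.compile(r"^daemon-(\d+)$")

def find_daemon_logs(testcase_dir):
    """Map each daemon's port number to the path of its test.out, if present."""
    logs = {}
    for name in sorted(os.listdir(testcase_dir)):
        m = DAEMON_DIR_RE.match(name)
        if m is None:
            continue  # skip lockss.opt, lockss.txt and anything else
        log_path = os.path.join(testcase_dir, name, "test.out")
        if os.path.isfile(log_path):
            logs[int(m.group(1))] = log_path
    return logs

if __name__ == "__main__":
    # Run from the run_stf directory after a demo has finished.
    if os.path.isdir("testcase-1"):
        for port, path in sorted(find_daemon_logs("testcase-1").items()):
            role = "poller" if port == 8041 else "voter"
            print(port, role, path, os.path.getsize(path), "bytes")
```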
Now we clean up in preparation for the second demo:

foo@bar:~/lockss-daemon/test/frameworks/run_stf$ ./clean.sh
foo@bar:~/lockss-daemon/test/frameworks/run_stf$ 
In the second demo one of the daemons calls a poll, but before it does, one file in its simulated content is damaged. The other 4 vote, and they all disagree with the poller about the damaged file. The poller requests a repair of this file from one of the voters. Once the repair is received, the poller re-tallies the poll and now finds 100% agreement. The logs end up in the usual place; annotated versions are available for the poller and a voter.

foo@bar:~/lockss-daemon/test/frameworks/run_stf$ python testsuite.py AuditDemo2
11:16:24.793: INFO: ================================================
11:16:24.793: INFO: Demo a basic V3 poll with repair via open access
11:16:24.793: INFO: ------------------------------------------------
11:16:24.987: INFO: Starting framework in /home/foo/gamma/lockss-daemon/test/frameworks/run_stf/testcase-1
11:16:25.002: INFO: Waiting for framework to become ready
11:16:35.392: INFO: Creating simulated AU's
11:16:37.454: INFO: Waiting for simulated AU's to crawl
11:16:37.671: INFO: AU's completed initial crawl
11:16:38.320: INFO: Damaged the following node(s) on client localhost:8041:
   http://www.example.com/branch1/001file.txt
11:16:38.337: INFO: Waiting for a V3 poll to be called...
11:17:03.523: INFO: Successfully called a V3 poll
11:17:03.523: INFO: Waiting for V3 repair...
11:17:03.765: INFO: Asymmetric client localhost:8042 repairers OK
11:17:03.802: INFO: Asymmetric client localhost:8043 repairers OK
11:17:03.839: INFO: Asymmetric client localhost:8044 repairers OK
11:17:03.869: INFO: Asymmetric client localhost:8045 repairers OK
11:17:03.943: INFO: AU successfully repaired
11:17:04.945: INFO: No deadlocks detected
>>> Delaying shutdown.  Press Enter to continue...
11:17:15.661: INFO: Stopping framework
----------------------------------------------------------------------
Ran 1 test in 50.956s

OK
foo@bar:~/lockss-daemon/test/frameworks/run_stf$ 
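The sequence in the second demo (tally, disagreement, repair, re-tally) can be mimicked with a toy model. To be clear, this is not the LOCKSS protocol or any of its code. It is a deliberately simplified sketch in Python 3 in which each box is a dictionary from URL to content, agreement is a plain hash comparison, and the repair is taken from the first voter that has the file. All names are invented for illustration. The real protocol is considerably more involved, and the annotated logs are the place to see what actually happens.

```python
import hashlib

def digest(content):
    """Hash a file's bytes; a stand-in for the real protocol's hashing."""
    return hashlib.sha256(content).hexdigest()

def tally(poller, voters):
    """For each URL the poller holds, the fraction of voters that agree."""
    result = {}
    for url, content in poller.items():
        agree = sum(
            1 for v in voters
            if url in v and digest(v[url]) == digest(content)
        )
        result[url] = agree / len(voters)
    return result

def poll_and_repair(poller, voters):
    """Tally, repair URLs that every voter disagrees on, then re-tally."""
    first = tally(poller, voters)
    for url, agreement in first.items():
        if agreement == 0.0:
            # Toy rule: ask the first voter that holds the URL for a repair.
            donor = next((v for v in voters if url in v), None)
            if donor is not None:
                poller[url] = donor[url]
    return first, tally(poller, voters)

damaged = "http://www.example.com/branch1/001file.txt"
good = {damaged: b"original", "http://www.example.com/index.html": b"index"}
voters = [dict(good) for _ in range(4)]   # four voters with intact content
poller = dict(good)
poller[damaged] = b"damaged"              # damage one file at the poller

before, after = poll_and_repair(poller, voters)
print(before[damaged], after[damaged])    # 0.0 1.0
```

As in the demo, agreement on the damaged file goes from none of the four voters to all of them once the repair has been applied.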
Now we clean up in preparation for the third demo:

foo@bar:~/lockss-daemon/test/frameworks/run_stf$ ./clean.sh
foo@bar:~/lockss-daemon/test/frameworks/run_stf$ 
In the second demo, the simulated content was open access, so there was no restriction on the voter sending a repair to the poller. In the common case the content is not open access. The voter then has to remember having agreed with the poller in the past about the AU being repaired, so that it doesn't leak content to boxes that could not get it directly from the publisher.

In the third demo the daemons achieve agreement on the non-open access content before damage is created at the poller. Then when the poller next calls a poll, detects the damage and requests a repair, the voter remembers the prior agreement and sends a repair.

foo@bar:~/lockss-daemon/test/frameworks/run_stf$ python testsuite.py AuditDemo3
11:18:05.527: INFO: =======================================================
11:18:05.527: INFO: Demo a basic V3 poll with repair via previous agreement
11:18:05.527: INFO: -------------------------------------------------------
11:18:05.722: INFO: Starting framework in /home/foo/gamma/lockss-daemon/test/frameworks/run_stf/testcase-1
11:18:05.743: INFO: Waiting for framework to become ready
11:18:21.199: INFO: Creating simulated AU's
11:18:23.138: INFO: Waiting for simulated AU's to crawl
11:18:23.355: INFO: AU's completed initial crawl
11:18:23.449: INFO: Waiting for a V3 poll by all simulated caches
11:18:48.653: INFO: Client on port 8041 called V3 poll...
11:18:48.694: INFO: Client on port 8042 called V3 poll...
11:18:48.732: INFO: Client on port 8043 called V3 poll...
11:18:48.764: INFO: Client on port 8044 called V3 poll...
11:18:48.814: INFO: Client on port 8045 called V3 poll...
11:18:48.814: INFO: Waiting for all peers to win their polls
11:18:48.891: INFO: Client on port 8041 won V3 poll...
11:18:48.972: INFO: Client on port 8042 won V3 poll...
11:18:49.072: INFO: Client on port 8043 won V3 poll...
11:18:49.157: INFO: Client on port 8044 won V3 poll...
11:18:49.248: INFO: Client on port 8045 won V3 poll...
11:18:50.347: INFO: Damaged the following node(s) on client localhost:8041:
   http://www.example.com/001file.bin
   http://www.example.com/001file.txt
   http://www.example.com/002file.bin
   http://www.example.com/002file.txt
   http://www.example.com/branch1/001file.bin
   http://www.example.com/branch1/001file.txt
   http://www.example.com/branch1/002file.bin
   http://www.example.com/branch1/002file.txt
   http://www.example.com/branch1/index.html
   http://www.example.com/index.html
11:18:50.375: INFO: Waiting for a V3 poll to be called...
11:19:25.638: INFO: Successfully called a V3 poll
11:19:25.714: INFO: Waiting for a V3 poll to be called...
11:19:25.742: INFO: Successfully called a V3 poll
11:19:25.742: INFO: Waiting for V3 repair...
11:19:26.871: INFO: Asymmetric client localhost:8042 repairers OK
11:19:26.871: INFO: Asymmetric client localhost:8042 repairers OK
11:19:26.872: INFO: Asymmetric client localhost:8042 repairers OK
11:19:26.872: INFO: Asymmetric client localhost:8042 repairers OK
11:19:26.919: INFO: Asymmetric client localhost:8043 repairers OK
11:19:26.919: INFO: Asymmetric client localhost:8043 repairers OK
11:19:26.919: INFO: Asymmetric client localhost:8043 repairers OK
11:19:26.920: INFO: Asymmetric client localhost:8043 repairers OK
11:19:26.955: INFO: Asymmetric client localhost:8044 repairers OK
11:19:26.955: INFO: Asymmetric client localhost:8044 repairers OK
11:19:26.955: INFO: Asymmetric client localhost:8044 repairers OK
11:19:26.956: INFO: Asymmetric client localhost:8044 repairers OK
11:19:26.999: INFO: Asymmetric client localhost:8045 repairers OK
11:19:26.999: INFO: Asymmetric client localhost:8045 repairers OK
11:19:26.999: INFO: Asymmetric client localhost:8045 repairers OK
11:19:26.999: INFO: Asymmetric client localhost:8045 repairers OK
11:19:27.083: INFO: AU successfully repaired
11:19:28.086: INFO: No deadlocks detected
>>> Delaying shutdown.  Press Enter to continue...
11:20:46.489: INFO: Stopping framework
----------------------------------------------------------------------
Ran 1 test in 161.058s

OK
foo@bar:~/lockss-daemon/test/frameworks/run_stf$ 
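The difference between the second and third demos comes down to the voter's decision about whether it may send a repair. Here is a toy illustration of that rule as described above, assuming Python 3. The class and method names are invented and do not correspond to anything in the LOCKSS daemon, which records agreement in its own way.

```python
class Voter:
    """Toy voter that remembers which peers it has agreed with, per AU."""

    def __init__(self):
        self._agreed = set()  # (peer, au) pairs seen in earlier polls

    def record_agreement(self, peer, au):
        """Remember that this peer agreed with us about this AU."""
        self._agreed.add((peer, au))

    def may_send_repair(self, peer, au, open_access):
        """Open-access content can always be sent; anything else
        requires remembered prior agreement with the requesting peer."""
        return open_access or (peer, au) in self._agreed

voter = Voter()
poller, au = "localhost:8041", "simulated-au"   # labels are illustrative only

print(voter.may_send_repair(poller, au, open_access=True))    # True  (demo 2)
print(voter.may_send_repair(poller, au, open_access=False))   # False (no history)
voter.record_agreement(poller, au)
print(voter.may_send_repair(poller, au, open_access=False))   # True  (demo 3)
```

This is why the third demo first has every box call and win a poll: those polls establish the agreement that the voter later relies on.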
Finally we clean up again:

foo@bar:~/lockss-daemon/test/frameworks/run_stf$ ./clean.sh ; rm testsuite.opt
foo@bar:~/lockss-daemon/test/frameworks/run_stf$ 
We hope these demos help you understand how the LOCKSS Polling and Repair Protocol works.

1 comment:

  1. The post has been updated to reflect the move from CVS to SVN.
