Advance to content

Galois
Technology
Client Services
Company
Blog

Entries in the “Technology” category

Shuffling a deck of cards, Cryptol style

December 26th, 2009 by Levent Erkok

I can never shuffle cards properly. They seem to go every which way when I try, and a perfect random shuffle seems nigh-impossible to achieve, even though the experts claim it takes a mere 7 moves. (The mathematical argument is surprisingly quite technical.) Luckily, we shall concern ourselves with a much simpler problem today: How many perfect out-shuffles does it take to restore a deck back to its original order? We’ll throw in a couple of variants of the problem for fun, but rest assured that we’ll let computers do the work. And, of course, we’ll use Cryptol to help us along the way.

What is a riffle shuffle?

According to wikipedia, a riffle (or dovetail) shuffle goes like this:

… half of the deck is held in each hand with the thumbs inward, then cards are released by the thumbs so that they fall to the table interleaved. Many also lift the cards up after a riffle, forming what is called a bridge which puts the cards back into place…

Well, I read that a couple times, and watched a couple of videos on the internet showing how to do it, but no luck so far. Luckily, this sort of shuffling is quite easy to express programmatically, and Cryptol has the right abstractions to code this in a couple of lines.

Bisecting the deck

The first step in the shuffle is bisecting the deck into two equal halves:

bisect : {a b} (fin a) => [2*a]b -> ([a]b, [a]b);
bisect xs = (take (w, xs), drop (w, xs))
  where w = width xs / 2;

We simply compute the mid-point, and divide the given sequence xs into two, by take‘ing and drop‘ping the correct amounts from the input sequence. In fact, the type of bisect is more interesting than its definition: It succinctly captures the following four facts:

  1. The input has to be of even length (2*a),
  2. The input has to be finite (fin a),
  3. The output has two components, each of which has precisely a elements, that is, half the input,
  4. The actual contents of the sequence can be of any type (b), i.e., the function bisect is shape-polymorphic in the contents.

The ability to express precise size/shape-polymorphic properties using types is one of the strengths of Cryptol.

Out-shuffle vs in-shuffle

Once the deck is split into two, we proceed by picking the cards alternatingly from each half. We have two choices: We can either start with the first half, or the second. If you start with the first half, that’s called an out-shuffle. If you start with the second half, then it’s an in-shuffle. These two functions are actually quite easy to code in Cryptol:

  outShuffle : {a b} (fin a) => [2*a]b -> [2*a]b;
  outShuffle xs = join [| [x y] || x <- fh || y <- sh |]
    where (fh, sh) = bisect xs;

  inShuffle : {a b} (fin a) => [2*a]b -> [2*a]b;
  inShuffle xs = join [| [y x] || x <- fh || y <- sh |]
    where (fh, sh) = bisect xs;

The definitions are almost identical, except for the order in which we put the cards from the halves (fh and sh) together. In the outShuffle, the first card in each pair comes from the first-half. In the inShuffle, it comes from the second half. Easier done than said! Let’s make sure they behave as we expect:

  Cryptol> bisect [1..8]
  ([1 2 3 4], [5 6 7 8])
  Cryptol> outShuffle [1..8]
  [1 5 2 6 3 7 4 8]
  Cryptol> inShuffle [1..8]
  [5 1 6 2 7 3 8 4]

Good! It’s interesting to see what happens when we apply bisect to an odd-length sequence:

  Crytpol> bisect [1..9]
  In a top-level expression: with inferred type:
  {a} ([a][4],[a][4])
  encountered the following unresolved constraints:
  fin a
  2*a == 9

Cryptol is basically telling us that there is no a such that 2*a is 9, resulting in a type-error. Note that this is a static-check before you run your program! In other words, if your program type-checks, then you can rest assured that whenever you call bisect, it is guaranteed to get an even-length sequence as its argument. Strong static typing and size-polymorphism of Cryptol really pays off in this case!

Sequences of shuffles

Before proceeding to the properties of shuffles, we need one last notion: The application of a shuffle repeatedly to a given input, yielding an infinite sequence of transformations:

  iterate : {a} (a -> a, a) -> [inf]a;
  iterate (f, x) = [x] # iterate (f, f x);

  outShuffles, inShuffles :
             {a b} (fin a) => [2*a]b -> [inf][2*a]b;
  outShuffles xs = iterate(outShuffle, xs);
  inShuffles  xs = iterate(inShuffle, xs);

The high-order function iterate gives us the infinite sequence of results of applying a function to a value over and over. We simply use this helper to define outShuffles and inShuffles to apply the corresponding functions indefinitely to their input. Note that the resulting type shows that we get an infinite sequence as output, as indicated by the size inf.

Properties of shuffles

It turns out that if one applies 8 out-shuffles to a deck, a remarkable thing happens: Nothing! The deck goes back to its original order! This is a bit hard to believe, and harder to realize using a real deck of cards. (A friend of mine says he saw it done at college once by hand, but I’m yet to meet anyone who can do this successfully so far!)

Well, the good thing about programming is that we can manipulate the sequences at will, without fear of messing up the cards. Even better, we can assert the above claim as a theorem in Cryptol:

  type Deck = [52][6];

  outShuffle8 : Deck -> Bit;
  theorem outShuffle8: {deck}.
            outShuffles(deck) @ 8 == deck;

We have defined a Deck to be a sequence of 52 things, each of which is 6-bits wide, which is more than enough to cover all the 52-unique elements that appear in an ordinary deck. (6-bits can encode 64 values, so we have 12 unused elements.) The theorem simply states that the 8′th element of the infinite sequence of outShuffle applied to an arbitrary deck gives us back the original deck.

Let’s ask Cryptol to prove this theorem: (Cryptol’s symbolic and sbv backends can perform these proofs, so we first set our mode accordingly below.)

   Cryptol> :set sbv
   Cryptol> :prove outShuffle8
   Q.E.D.

Voila! The proof completes instantaneously, with almost no time elapsed. (This might be surprising at first, since the input space to the theorem has 52*6 = 312 bits, which is quite large. However, we note that the theorem is actually fairly easy to prove since all shuffling does is a mere re-ordering of things with no specific computation; which is easy to manipulate symbolically for Cryptol’s proof engine.)

Reversing the deck

Can we reverse a deck of cards using outShuffle’s? Turns out that this cannot be done. In particular, an outShuffle never moves the first element of the deck anywhere:

  outShuffleFirstCard : Deck -> Bit;
  theorem outShuffleFirstCard: {deck}.
          outShuffle deck @ 0 == deck @ 0;

We have:

  Cryptol> :prove outShuffleFirstCard
  Q.E.D.

Since the first card remains stationary, there is no way to reverse a deck of cards by just using outShuffles.

How about with inShuffle? Turns out the magic number is 26 for reversing a deck of cards in this particular case:

  inShuffle26Rev : Deck -> Bit;
  theorem inShuffle26Rev : {deck}.
            inShuffles(deck) @ 26 == reverse deck;

Again, the proof is immediate:

  Cryptol> :prove inShuffle26Rev
  Q.E.D.

If 26 in-shuffle’s reverse the deck, then 52 of them will restore it back. Here’s the corresponding theorem:

   inShuffle52 : Deck -> Bit;
   theorem inShuffle52: {deck}.
             inShuffles(deck) @ 52 == deck;

Again, the proof is immediate.

The Mongean Shuffle

There is one more variation on the shuffle that we will consider. The mongean shuffle picks the even and odd numbered elements, reverses the odds and adds the evens at the back. For instance, given the sequence: 0 1 2 3 4 5 6 7 8 9, we first construct two sub-sequences: The even index elements: 0 2 4 6 8, and the odd ones 1 3 5 7 9. We then reverse the latter to get 9 7 5 3 1, and append the former, obtaining: 9 7 5 3 1 0 2 4 6 8. Luckily, the Cryptol definition is much easier to read:

  monge xs = reverse odds # evens
    where { w = width xs;
            evens = xs @@ [0 2 .. (w-1)];
            odds  = xs @@ [1 3 .. (w-1)]
          };

  monges xs = iterate(monge, xs);

With a monge shuffle, it takes 12 rounds to restore a deck:

  monge12 : Deck -> Bit;
  theorem monge12: {deck}. monges(deck) @ 12 == deck;

We will leave it to the reader to construct the argument that no sequence of monge shuffles would reverse a deck. (In particular, one can prove that the 18th element from top will never move in a deck of 52. Proving this theorem in Cryptol is again quite trivial.)

A note on theorems

The attentive reader might worry that our Deck type does not quite correspond to a deck-of-cards. This is indeed the case. There are two discrepancies. First, as we mentioned before, our decks can represent 64 elements, while a deck of cards has only 52 distinct cards. On the plus side, this just makes our theorems “stronger,” since we allow for more cards then possible. More importantly, the properties are intended for decks that have no-repeating cards in them. (That is, every card occurs precisely once.) But our theorems apply to arbitrary deck’s, even those that have repeating elements in them. Again, this just makes our theorems stronger, as the unique-sequence cases directly follow from them. We can rest assured that our proofs are conclusive, even though our model of playing-cards is not perfect.

Download

Free evaluation licenses of Cryptol are available at www.cryptol.net. The Cryptol source code for shuffling cards is also available as well.

Posted in Formal Methods, Misc, Technology | Comments (2)

Tech Talk: Hoare-Logic – fiddly details and small print

November 4th, 2009 by Levent Erkok

[Note the Friday date, instead of the usual Tuesday slot.]

The November 13th Galois Tech Talk will be delivered by Rod Chapman, titled Hoare-Logic – fiddly details and small print.”

  • Date: Friday, November 13th, 2009
  • Time: 10:30am – 11:30am
  • Location: Galois, Inc.
    421 SW 6th Ave. Suite 300
    (3rd floor of the Commonwealth Building)
    Portland, OR 97204

Abstract: Hoare’s “assignment axiom” is noted for its simplicity and elegance, which seems to suggest that practical implementations are somehow “easy”. This talk will focus on our experience with SPARK – a programming language and toolset whose principal design goal is the provision of a sound Hoare-style verification system. I will concentrate on the “small print” and various fiddly (but essential) langauge features, such as volatility, I/O, dealing with commercial compilers, soundness and so on, that make the system much harder to implement than you might think.

Bio: Rod Chapman is a Principal Engineer with Praxis, specializing in the design and implementation of safety and security-critical systems. He currently leads the development of the SPARK language and its associated analysis tools.

Rod is a well-known conference speaker and has presented papers, tutorials, and workshops at many international events including SSTC, NSA HCSS, SIGAda, Ada-Europe and the Society of Automotive Engineers (SAE) World Congress. In addition to SPARK, Rod has been the key contributor to many of Praxis’ major projects such as SHOLIS, MULTOS CA, Tokeneer and Software verification tools. He received a MEng in Computer Systems and Software Engineering and a DPhil in Computer Science from the University of York, England, in 1991 and 1995 respectively. He is a Chartered Engineer, a Fellow of the British Computer Society, and also an SEI-Certified PSP Instructor.


Galois has been holding weekly technical seminars for several years on topics from functional programming, formal methods, compiler and language design, to cryptography, and operating system construction, with talks by many figures from the programming language and formal methods communities. The talks are open and free. An RSVP is not required, but feel free to contact the organizer with questions and comments.

Posted in Community, Formal Methods, Tech Talks, Technology | Comments (0)

Tech Talk: Testing First-Order-Logic Axioms in AutoCert

October 28th, 2009 by Levent Erkok

The November 3rd Galois Tech Talk will be delivered by Ki Yung Ahn, titled “Testing First-Order-Logic Axioms in AutoCert.”

  • Date: Tuesday, November 3rd, 2009
  • Time: 10:30am – 11:30am
  • Location: Galois, Inc.
    421 SW 6th Ave. Suite 300
    (3rd floor of the Commonwealth Building)
    Portland, OR 97204

Abstract: AutoCert is a formal verification tool for machine generated code in safety critical domains, such as aerospace control code generated from MathWorks Real-Time Workshop.  AutoCert uses Automated Theorem Proving (ATP) systems based on First-Order-Logic (FOL) to formally verify safety and functional correctness properties of the code.  These ATP systems try to build proofs based on user provided domain-specific axioms, which can be arbitrary First-Order-Formulas (FOFs).  That is, these axioms are the crucial trusted base in AutoCert.  However, formulating axioms correctly (i.e.  precisely as the user had really intended) is non-trivial in practice, and speculating validity of the axioms from the ATP systems is very hard since theorem provers do not give examples or counter-examples in general.

We adopt the idea of model-based testing to aid axiom authors to discover errors in axiomatization.  To test the validity of axioms, users can define a computational model of the axiomatized logic by giving interpretations to each of the function symbols and constants as computable functions and data constants in a simple declarative programming language.  Then, users can test axioms against the computational model with widely used software testing frameworks.  The advantage of this approach is that the users have a concrete intuitive model with which to test validity of the axioms, and can observe counterexamples when the model does not satisfy the axioms.  We have implemented a proof of concept tool using Template Haskell, QuickCheck, and Yices.

(This is a joint work with Ewen Denney at SGT/ NASA Ames Research Center, going to be presented in the poster session at Asian Symposium on Programming Languages and Systems this December.)

Bio: Ki Yung Ahn is a Ph.D student at Portland State University working with Tim Sheard and other members of the TRELLYS project, designing and implementing a dependently typed programming language that is general purpose programming friendly. He received a BS in Computer Science from KAIST in 2002, worked as online storage service server programmer at Gretech before joining graduate program at Portland State University in 2005.  He has been working on Shared Subtypes, Korean translation of “Programming in Haskell” by Graham Hutton, and enjoying other small funs of Haskell hacking.


Galois has been holding weekly technical seminars for several years on topics from functional programming, formal methods, compiler and language design, to cryptography, and operating system construction, with talks by many figures from the programming language and formal methods communities. The talks are open and free. An RSVP is not required, but feel free to contact the organizer with questions and comments.

Posted in Community, Formal Methods, Functional Programming, Tech Talks, Technology | Comments (0)

Tech Talk: How to choose between a screwdriver and a drill

October 20th, 2009 by Levent Erkok

The October 27th Galois Tech Talk will be delivered by Tanya Crenshaw, titled “How to choose between a screwdriver and a drill.”

  • Date: Tuesday, October 27th, 2009
  • Time: 10:30am – 11:30am
  • Location: Galois, Inc.
    421 SW 6th Ave. Suite 300
    (3rd floor of the Commonwealth Building)
    Portland, OR 97204

Abstract: Suppose you have two different algorithms to compute the same result. One is clearly safe. It is simple enough to verify. The other has more desirable features, but it is too complicated to verify. You want a system that can choose between these two algorithms while meeting its quality of service goals. This is the intuition behind Lui Sha’s Simplex Architecture. The intuition is simple, but its verification is not. Implementations of the Simplex Architecture are reactive systems whose properties are not always easy to express. In this talk, I’ll describe the history of Simplex, the challenges of verifying its safety properties, and a variety of approaches to modeling it, from an architecture description language to an executable specification language.

Bio: Tanya L. Crenshaw is an assistant professor of computer science at the University of Portland. She teaches courses in both Computer Science and Electrical Engineering and conducts research into security for embedded systems. Before University of Portland, she worked for Wind River and Sharp Microelectronics and studied at the University of Illinois at Urbana-Champaign where she completed her Ph.D. (Her personal web-page is at: http://kaju.dreamhosters.com/.)


Galois has been holding weekly technical seminars for several years on topics from functional programming, formal methods, compiler and language design, to cryptography, and operating system construction, with talks by many figures from the programming language and formal methods communities. The talks are open and free. An RSVP is not required, but feel free to contact the organizer with questions and comments.

Posted in Formal Methods, Functional Programming, Tech Talks, Technology | Comments (0)

Tech Talk: Writing Linux Kernel Modules with Haskell

October 13th, 2009 by Levent Erkok

The October 20th Galois Tech Talk will be delivered by Thomas DuBuisson, titled “Writing Linux Kernel Modules with Haskell.”

  • Date: Tuesday, October 20th, 2009
  • Time: 10:30am – 11:30am
  • Location: Galois, Inc.
    421 SW 6th Ave. Suite 300
    (3rd floor of the Commonwealth Building)
    Portland, OR 97204

Slides for this talk are now available.

Abstract: Current operating systems are developed and extended primarily with imperative languages that lack in expressiveness and safety.  Pure and functional languages fill these gaps nicely but existing tools are not ideal fits and the language abstracts away important environmental information.

This talk will focus on modifications of the Glasgow Haskell Compiler to generate suitable code and nuances of binding to the Linux API. Short-comings of the language and tools will be identified along with known workarounds and potential engineering efforts.

Bio: Thomas DuBuisson is a PhD student in Computer Science at Portland State University, working with advisor Andrew Tolmach.  Current research efforts revolve around the use of functional languages for systems programming and improving runtime system security.


Galois has been holding weekly technical seminars for several years on topics from functional programming, formal methods, compiler and language design, to cryptography, and operating system construction, with talks by many figures from the programming language and formal methods communities. The talks are open and free. An RSVP is not required, but feel free to contact the organizer with questions and comments.

Posted in Functional Programming, Tech Talks, Technology | Comments (5)

Tech Talk: Roll Your Own Test Bed for Embedded Real-Time Protocols: A Haskell Experience

September 29th, 2009 by Levent Erkok

The October 6th Galois Tech Talk will be delivered by Lee Pike, titled Roll Your Own Test Bed for Embedded Real-Time Protocols: A Haskell Experience

[Note: Lee has recently presented this talk at the Haskell Symposium'09 in Edinburgh. Further info and downloadable artifacts are available at his personal web-site.]

  • Date: Tuesday, October 6th, 2009
  • Time: 10:30am – 11:30am
  • Location: Galois, Inc.
    421 SW 6th Ave. Suite 300
    (3rd floor of the Commonwealth Building)
    Portland, OR 97204

Abstract: We present by example a new application domain for functional languages: emulators for embedded real-time protocols. As a case-study, we implement a simple emulator for the Biphase Mark Protocol, a physical-layer network protocol in Haskell. The surprising result is that a pure functional language with no built-in notion of time is extremely well-suited for constructing such emulators. Furthermore, we use Haskell’s property-checker QuickCheck to automatically generate real-time parameters for simulation. We also describe a novel use of QuickCheck as a probability calculator for reliability analysis.

Bio: Lee Pike is a member of the technical staff at Galois. Previously, he was a research scientist with the NASA Langley Formal Methods Group, primarily involved in the SPIDER project. His research interests include applying formal methods to safety-critical and security-critical applications, with a focus on industrial-scale endeavors.


Galois has been holding weekly technical seminars for several years on topics from functional programming, formal methods, compiler and language design, to cryptography, and operating system construction, with talks by many figures from the programming language and formal methods communities. The talks are open and free. An RSVP is not required, but feel free to contact the organizer with questions and comments.

Posted in Functional Programming, Tech Talks, Technology | Comments (0)

Tech Talk: Building Systems That Enforce Measurable Security Goals

September 10th, 2009 by Levent Erkok

The September 18th Galois Tech Talk will be delivered by Trent Jaeger, titled “Building Systems That Enforce Measurable Security Goals.”

[Note the non-standard Friday date; instead of the usual Tuesday slot!]

  • Date: Friday, September 18th, 2009
  • Time: 10:30am – 11:30am
  • Location: Galois, Inc.
    421 SW 6th Ave. Suite 300
    (3rd floor of the Commonwealth Building)
    Portland, OR 97204

The slides for this talk are now available.

Abstract: In this talk, I will argue for an approach for building and deploying systems that enforce measurable security goals. Historically, the security community has developed “ideal” goals for security, but conventional systems are not built to satisfy such goals, leading to vulnerabilities. However, we find that building conventional systems to ideal security goals is not a practical option. Ideal security requires heavyweight tasks, such as complete formal assurance, and conventional systems depend on security enforcement in too many programs to make assurance cost-effective. As an alternative, we propose an approach where we use ideal goals as a means to gain a comprehensive understanding of which programs we depend upon for security enforcement and the risks that result from such enforcement. The result is a model that enables comprehensive evaluation of security goals and treatment of risks, once identified. In this talk, I will discuss the motivation for our approach in the development of a practical integrity model, called CW-Lite integrity. Then, I will describe two further experiments. The first examines whether user-level processes can be automatically deployed in a manner in which correct enforcement of system policy can be verified. The second examines whether virtual machine systems can be deployed in a manner in which integrity goals can be determined and verified. In these experiments, we leverage the mandatory access control enforcement of the Linux and Xen, although the talk will focus on the conceptual problems in obtaining a comprehensive view of security in conventional systems. The result of these experiments is that by making security goals measurable in conventional systems a comprehensive view of security can be obtained that enables the solution of key problems in building and deploying secure systems.

Bio:

Trent Jaeger received his M.S.E. and Ph.D degrees in computer science and engineering from the University of Michigan, Ann Arbor in 1993 and 1997, respectively. Trent joined the Computer Science and Engineering department at Penn State in 2005. He is co-director of the Systems and Internet Infrastructure Security (SIIS) Lab at Penn State. Prior to joining Penn State, he was a research staff member at IBM Thomas J. Watson Research Center for nine years.

His research/teaching interests are in the areas of computer security, operating systems, security policies, and source code analysis for security. Trent has been active in the Linux community for several years, particularly in contributing code and tools for the Linux Security Modules (LSM) framework (in Linux 2.6) and for integrating the SELinux/LSM with IPsec (called Labeled IPsec, available in Linux 2.6.18 and above). Trent also has active interests in virtual machine systems (mostly Xen) and trusted computing hardware (Linux Integrity Measurement Architecture and its successor PRIMA).

He is also active in the security research community at large, having served on the program committees of many of the major security conferences, including the IEEE Symposium on Security and Privacy, USENIX Security Symposium, ACM Conference on Computer and Communications Security, European Symposium on Research in Computer Security, Annual Computer Security Applications Conference, ISOC Network and Distributed Systems Symposium. Trent has been Program Chair of the ACM Symposium on Access Control Models and Technologies and the ACM Conference on Computer and Communications Security (Industry Track). He is the current Program Chair for the 2007 USENIX Workshop on Hot Topics in Security. He has over 60 refereed publications, and is the holder of several U.S. patents.


Galois has been holding weekly technical seminars for several years on topics from functional programming, formal methods, compiler and language design, to cryptography, and operating system construction, with talks by many figures from the programming language and formal methods communities. The talks are open and free. An RSVP is not required, but feel free to contact the organizer with questions and comments.

Posted in Community, Formal Methods, Tech Talks, Technology | Comments (2)

Substitution ciphers in Cryptol

August 24th, 2009 by Levent Erkok

Substitution ciphers are one of the oldest encryption methods, dating back to at least the 15th century. In a substitution cipher, each character in the plain-text is simply “substituted” according to a predefined map. Decryption is simply the substitution in the reverse direction. Wikipedia has a nice description of these ciphers. Obviously, you wouldn’t want your bank to use such a cipher when executing your web-based transactions! But they are fun to play around, especially when entertaining kids in hot summer days. In this post, we’ll see how to code simple substitution ciphers in Cryptol, and go a step further and actually prove that our implementation is correct.

Preliminaries

The simplest form of substitution ciphers use a permutation of the input alphabet. That is, each letter in the input alphabet gets mapped to another in the same  alphabet. (Strictly speaking, input and output alphabets need not be the same, but nothing essential changes by making that assumption.) For instance, you might decide that your substitution will map ‘a’ to ‘q’, and ‘b’ to ‘d’, …, etc., making sure no two letters are mapped to the same target. Once this mapping is agreed on, all you have to do to encrypt a given message is to map each character to the corresponding element according to your predefined mapping rules.

Here’s our Cryptol encoding of these ciphers. First, some preliminary declarations:

type Char      = [8];
type String(l) = [l]Char;
type Table(n)  = [n](Char, Char);

We’ll simply assume that the input consist of “characters,” each of which will be 8-bit quantities (i.e., numbers from 0 to 255). We will simply use ASCII encoding for normal English characters. This is captured by the Char type declaration above, which simply gives a convenient name for 8-bit wide words. The second type declaration captures sized-strings: For any given size l, the type String(l) represents a sequence of length l, containing 8-bit words. For instance, String(16) is the type of all sequences of length 16, containing numbers from 0 to 255 as elements. Finally a Table of size n is simply n-pairings of characters that form a substitution. Here’s the example table we will use:

cipherTable : Table(28);
cipherTable = [| (x, y) || x <- plain || y <- cipher |]
  where { plain  = "abcdefghijklmnopqrstuvwxyz .";
          cipher = "oebxa.cdf hijklmnzpqtuvwrygs"
        };

Note that our table has 28 entries (the lower-case English alphabet, plus space and the dot). A simple Cryptol sequence-comprehension succinctly zips the sequences up, forming our example table.

Performing the substitution

Given a table and a character, the function subst returns the element that the table maps the character to:

subst : {n} (fin n) => (Table(n), Char) -> Char;
subst (table, chr) = find 0
 where find i = if i == width table
                then '?'
                else if chr == chr' then code
                                    else find (i+1)
                   where (chr', code) = table @ i;

To do the search, we simply start from index 0 and walk through the given table recursively, returning the mapped element if we have a match. If the given table does not have the corresponding element,  we simply return the character ‘?’, which will indicate failure.  (Aside: Note how the predicate "fin n" ensures we’re given a finite table, ensured by Cryptol’s type-system. Haskell enthusiasts can gripe about receiving an infinite list when they didn’t ask for one!)

Encryption and Decryption

Having defined subst, encryption and decryption are mere maps:

encrypt (table, msg) = [| subst (table,  c) || c <- msg |];
decrypt (table, msg) = [| subst (table', c) || c <- msg |]
  where table' = [| (y, x) || (x, y) <- table |];

where we simply swap the elements in the table for decryption.

Substitution in action

That’s pretty much all there is to it for substitution ciphers. To illustrate, let us create specialized versions of encrypt and decrypt for our example table, together with some test data:

enc, dec : {l} String(l) -> String(l);
enc msg = encrypt (cipherTable, msg);
dec msg = decrypt (cipherTable, msg);

plainText, cipherText, decodedText : String(51);
plainText
   = "the quick brown fox jumped over the lazy black dog.";
cipherText  = enc plainText;
decodedText = dec cipherText;

Here’s Cryptol in action:

SubstCipher> :p plainText
the quick brown fox jumped over the lazy black dog.
SubstCipher> :p cipherText
qdagntfbhgezlvkg.lwg tjmaxgluazgqdagioyrgeiobhgxlcs
SubstCipher> :p decodedText
the quick brown fox jumped over the lazy black dog.

Volia! The secret is now safe, thanks to our substitution cipher.. (Note that we’re using Cryptol’s :p command which prints its argument as a string, as opposed to a sequence of 8-bit words.)

Correctness

Substitution ciphers are dead-simple, but it would be nice to have further assurance that our implementation is indeed correct. This is where Cryptol’s verification tools come into play. Let us first try to write a simple theorem, stating that encryption followed by decryption does not alter the message:

theorem checkEncDec: {msg}.
   dec (enc msg) == (msg : String(16));

Note that we have restricted the theorem to messages of size 16 only, in order to create a  monomorphic theorem that Cryptol’s verification engine can check/prove automatically. (While polymorphic theorems can be stated in Cryptol, they cannot be proven automatically.)

Before attempting a proof, it’s always good to get a quick-check evidence that we’re on safe ground. Here’s what happens when I try to quick-check the above theorem using Cryptol:

SubstCipher> :check checkEncDec
Checking case 1 of 1000 (0.10%)
Falsifiable.
checkEncDec [0x15 0x3e 0xf2 0x4f 0x34 0xc4 0x69 0x5a
             0x64 0x9e 0xb0 0xe2 0x6f 0xf8 0x6a 0x4a]
	= False

Oops! Something went terribly wrong, Cryptol found a counter-example without even really trying. Let’s use Cryptol’s “:p” command to see what the counter-example really is:

SubstCipher> :p [0x15 0x3e 0xf2 0x4f 0x34 0xc4 0x69 0x5a
                 0x64 0x9e 0xb0 0xe2 0x6f 0xf8 0x6a 0x4a]
>?O4?iZd???o?jJ

Ah, it contains characters that we have not mapped! Since our input accepts any 8-bit word, all values from 0 to 255 are valid “characters.” But we clearly have not mapped most of these to anything, causing Cryptol to rightfully reject our theorem!

Conditional theorems

What we need is a conditional theorem, one that has a hypothesis that says “for all good messages.” That is, we need to consider only those messages that our cipher knows how to map. Here’s a helper function to do just this:

all, any : {n a} (fin n) => (a -> Bit, [n]a) -> Bit;
all (f, xs) = [| f x || x <- xs |] == ~zero;
any (f, xs) = [| f x || x <- xs |] != zero;

checkMessage (table, msg) = all (isMapped, msg)
   where isMapped c = any (\(c', _) -> c == c', table);
[Aside: Cryptol's polymorphic constant zero is simply the value that is False at every point. Thus, checking against its negation (~zero) ensures all the elements are True, and checking inequivalence against it (i.e., != zero), ensures at least one element is True.]

Having defined checkMessage, we can now express our correctness theorem (again constrained to strings of size 16):

theorem cipherIsCorrect: {msg}.
      if checkMessage(cipherTable, (msg : String(16)))
      then dec (enc msg) == msg
      else True;

If you try the :check command on this theorem, you will find that Cryptol is now happy with it:

SubstCipher> :check cipherIsCorrect
Checking case 1000 of 1000 (100.00%)
1000 tests passed OK
[Coverage: 0.00%.
           (1000/340282366920938463463374607431768211456)]

The coverage info suggests we have not even barely scratched the test state-space in this case, as one can expect. However, that’s the least of our worries in this case. More importantly, quick-check’s OK is not  satisfactory at all for conditional theorems. How do we know that the random test data generated by Cryptol will pass the checkMessage test? In fact, it is very likely that very few of the test cases actually executed the actual equality check itself, as the “randomly-generated” messages are unlikely to pass the checkMessage filter.

Verification

Luckily, we can do better. Using Cryptol’s automated theorem prover, we can rigorously prove that the theorem is indeed true for all possible messages:

SubstCipher> :prove cipherIsCorrect
Q.E.D.

Cryptol proves this theorem on my 3 year old laptop in less than a second!  (We  should also note that the :prove command is available in the symbolic and SBV backends of Cryptol that comes with the full release. It uses off-the-shelf SAT/SMT solvers to prove Cryptol theorems automatically.)

Now we can rest assured that our implementation of substitution ciphers in Cryptol is indeed correct!

For the curious

As the alert reader would have no doubt noticed, we have only proved our implementation correct with respect to our example translation map cipherTable. A more general theorem would have proved the cipher correct with respect to all possible tables. Such a theorem is indeed easy to express in Cryptol, but you’ll also need to add the condition that the table is well defined. (That is, it should be a one-to-one map and no letter should be mapped to the special invalid marker ‘?’.) We invite the curious reader to play with this variant. Note that the automated proof will be more complicated in this case, and the backend prover might need more time before returning the final Q.E.D.

Download

The Cryptol source file implementing the substitution cipher is available for download. The Cryptol toolset licenses are freely available at www.cryptol.net. Enjoy!

Posted in Cryptography, Formal Methods, Technology | Comments (0)

Tech talk: Programming the fleet

August 19th, 2009 by Levent Erkok

The August 25th Galois Tech Talk will be delivered by Adam Megacz, titled “Programming the Fleet”

  • Date: Tuesday, August 25th, 2009
  • Time: 10:30am – 11:30am
  • Location: Galois, Inc.
    421 SW 6th Ave. Suite 300
    (3rd floor of the Commonwealth Building)
    Portland, OR 97204

[This is a continuation to the earlier tech seminar on the Fleet Architecture.]

Slides for this talk are now available.

Abstract: Computing has become far too complicated, as evidenced by recent high-profile chip design failures.  The Fleet project seeks to simplify processor design by replacing large irregular “cores” with simple functional units connected by a uniform switch fabric.  This simplification frees up hardware design resources, which can be reallocated to the use of ultra-high-performance circuit styles such as GasP.

These benefits do not come for free: simplifying the hardware in this manner shifts complexity out of the silicon and into the compiler and programming language.  This talk will present current progress on and future plans for the challenging task of programming Fleet.

Bio: Adam Megacz is a PhD candidate in Computer Science at UC Berkeley, a consultant to the VLSI Research Group at Sun Labs, and an NSF Graduate Fellow.


Galois has been holding weekly technical seminars for several years on topics from functional programming, formal methods, compiler and language design, to cryptography, and operating system construction, with talks by many figures from the programming language and formal methods communities. The talks are open and free. An RSVP is not required, but feel free to contact the organizer with questions and comments.

Posted in Community, Tech Talks, Technology | Comments (0)

Verifying Legato’s multiplier in Cryptol

July 8th, 2009 by Levent Erkok

Consider the following multiplication algorithm, coded in Mostek 6502 Assembler:

        LDX #8    ; 1; load X immediate with the 8
        LDA #0    ; 2; load A immediate with the 0
        CLC       ; 3; set C to 0
 LOOP   ROR F1    ; 4; rotate F1 right circular through C
        BCC ZCOEF ; 5; branch to ZCOEF if C = 0
        CLC       ; 6; set C to 0
        ADC F2    ; 7; set A to A+F2+C and C to the carry
 ZCOEF  ROR A     ; 8; rotate A right circular through C
        ROR LOW   ; 9; rotate LOW right circular through C
        DEX       ;10; set X to X-1
        BNE LOOP  ;11; branch to LOOP if Z = 0

This program comes from Wilfred Legato’s paper “A Weakest Precondition Model for Assembly Language Programs.” It multiplies the contents of the memory locations F1 and F2; each of which is 8-bits wide. The result is stored in the accumulator register A and the memory location LOW, each of which is, again, 8-bits. It holds that:

    F1 * F2 = 256 * A + LOW

when the algorithm terminates, correctly handling the overflow. It is worth spending a moment or two pondering how this algorithm works; it is not at all obvious how the multiplication is done!

Legato’s challenge  (as  referred to in ACL2 circles) is to prove a deep-embedding of Legato’s algorithm correct with respect to a Mostek simulator coded in ACL2. We do not attempt to solve  Legato’s challenge in Cryptol. We are merely interested in coding and proving that Legato’s multiplier is correct in Cryptol. Our interest stems from the fact that Legato’s algorithm is a truly interesting multiplier on its own right, and we would like to make sure that a straightforward encoding of it in Cryptol can be proven correct automatically by Cryptol’s verification tools. And of course, it’s just too hard to pass up on the opportunity to  pay respect to the Mostek chip that powered the Commodore 64’s and Atari 800XL’s of our childhood.

A shallow embedding

The Cryptol solution to Legato’s problem will be a fairly shallow encoding of the multiplier, together with an automated proof of correctness. We choose to do a shallow encoding here since it allows us to focus on the multiplication algorithm itself, as opposed to the particulars of the underlying Mostek chip. Theorem proving based solutions (such as those given by ACL2 folks) will rightly pursue a deeper embedding of the algorithm and the Mostek architecture in general. Cryptol is not particularly suitable for deep embeddings. Representing Mostek assembly instructions directly as Cryptol functions is a much simpler and straightforward choice.

Looking at Legato’s multiplier above, we will represent each instruction (from 1 to 11) as a simple state transformer, taking a simplified representation of the Mostek machine state as input and delivering a new one. We will only represent parts of the state that matter for our problem. The following Cryptol type declaration succinctly captures what we need:

  type Mostek = ( [8] // F1
                , [8] // F2
                , [8] // A
                , [8] // X
                , [8] // LOW
                , Bit // C (Carry)
                , Bit // Z (Zero)
                );

Using this state representation, each instruction in the program can be modeled as a  state transformer:

  type Instruction = Mostek -> Mostek;

This takes care of the data-flow aspect of the embedding; but the question of how to model control-flow remains. We will simply use the host-language’s control-flow features, using the quintessential functional idiom: by calling functions! This is actually easier done than said, and here’s our embedding of the first instruction of the program:

   // step1: LDX #8; load X immediate with the integer 8.
  step1 : Instruction;
  step1 (f1, f2, a, _, l, c, z) =
                 step2 (f1, f2, a, 8, l, c, z);

Let’s spend a minute explaining this in detail. The first step in the program loads the register X with the immediate value 8. Using our state-transformer model, our step1 function will receive a Mostek state (consisting of the “current” values of F1, F2, A, X, LOW, CARRY, and ZERO). The “effect” of this instruction is to put the value 8 into the register X, leaving everything else the same. Once this is done, the control goes to the next instruction, which we model by calling the function step2 (which is yet to be defined).

In this fashion, we can shallowly embed all the instructions in Legato’s multiplier, using Cryptol’s native functions and control-flow features. Of course, this is hardly a new idea, being the essence of the whole domain-specific embedded language saga: Using a rich host-language to “fake” other languages.

Following the recipe set by step1, it is easy to model the next two instructions:

  // step2: LDA #0; load A immediate with the integer 0.
  step2 : Instruction;
  step2 (f1, f2, _, x, l, c, z) =
                 step3 (f1, f2, 0, x, l, c, z);

  // step3: CLC; set C to 0 (Note the use of Bit False here)
  step3 : Instruction;
  step3 (f1, f2, a, x, l, _, z) =
                 step4 (f1, f2, a, x, l, False, z);

Step 4 is equally easy in terms of control flow, but is tricky in terms of operation. After some head-scratching, one figures out that the term “rotate F1 right circular through C” means put the right-most bit of F1 in C, and put C in the first position of F1. A bizarre thing to do indeed, but that’s the beauty of Legato’s multiplier. The Cryptol translation is almost literal:

// step4: LOOP  ROR F1; rotate F1 right circular through C.
step4 : Instruction;
step4 (f1, f2, a, x, l, c, z) =
               step5 (f1', f2, a, x, l, b0, z)
   where {
      [b0 b1 b2 b3 b4 b5 b6 b7] = f1;
      f1' = [b1 b2 b3 b4 b5 b6 b7 c];
   };

The use of pattern matching in getting the bits out of f1, and the construction of the new value of f1 is idiomatic Cryptol. There’s one little catch though: Apparently Mostek was a big-endian machine, having a most-significant-bit-first representation. Cryptol is little-endian. So, instead of rotating the bits to right, we  rotate them left.

The fifth instruction is the first time where we use Cryptol’s control-flow to model the Mostek jump instruction:

  // step5 : BCC ZCOEF; branch to ZCOEF if C = 0.
  // ZCOEF is step8 in our encoding
  step5 (f1, f2, a, x, l, c, z)
    = if c then step6 (f1, f2, a, x, l, c, z)
           else step8 (f1, f2, a, x, l, c, z);

In this case, we simply receive a state, and depending on the value of the carry bit (C), we either go to the next step (i.e., no jump); or go to the ZCOEF instruction, which is going to be step-8 in our model. Easy as pie!

Step-6 is a replica of Step-3, clearing the carry bit:

  // step6: CLC; set C to 0
  step6 (f1, f2, a, x, l, _, z) =
                 step7 (f1, f2, a, x, l, False, z);

Step-7 is the most compute intensive part of the algorithm. The Cryptol encoding is a bit complicated due to the need to determine if there was a carry in the addition. Since all Cryptol arithmetic is modular, we are forced to do the computation at an extended bit-size. Otherwise, the modeling of the ADC instruction is quite straightforward:

  // step7: ADC F2; set A to A+F2+C and C to the carry.
  step7 (f1, f2, a, x, l, c, z) =
                 step8 (f1, f2, a', x, l, c', z')
    where {
        // 8-bit "modular" result
        a' = a + f2 + (if c then (1:[8]) else (0:[8]));
        // Was there a carry? Check that "real"
        // result is larger than 255
        a'Large : [9];
        a'Large =
            (a # zero)    // extend a by adding zero bits
          + (f2 # zero)   // same for f2
          + (if c then (1:[9]) else (0:[9]));
        c' = a'Large > (255:[9]);
        // set the zero flag
        z' = a' == 0;
      };

The Cryptol idiom x # zero simply represents the value x extended on the right with 0 bits. (Remember that Cryptol is little-endian, hence the addition of zero bits on the right does not change the value.) Due to the polymorphic type of the value zero, the result has any number of bits larger than equal to the original bit-size of x. (Since we only need 9-bits in this case, we could have coded the same via the expression x # [False], but the former expression is more idiomatic Cryptol.)

Steps 8 and 9 are similar to Step-4, using A and LOW instead of F1, respectively:

  // step8 : ZCOEF ROR A; rotate A right circular through C.
  step8 : Instruction;
  step8 (f1, f2, a, x, l, c, z) =
                 step9 (f1, f2, a', x, l, a0, z)
    where {
        [a0 a1 a2 a3 a4 a5 a6 a7] = a;
        a' = [a1 a2 a3 a4 a5 a6 a7 c];
     };

  // step9 : ROR LOW; rotate LOW right circular through C.
  step9 : Instruction;
  step9 (f1, f2, a, x, l, c, z) =
                 step10 (f1, f2, a, x, l', l0, z)
    where {
      [l0 l1 l2 l3 l4 l5 l6 l7] = l;
      l' = [l1 l2 l3 l4 l5 l6 l7 c];
    };

Step-10 simply decrements X, setting the ZERO flag appropriately:

  // step10: DEX; set X to X-1
  step10 : Instruction;
  step10 (f1, f2, a, x, l, c, z) =
                  step11 (f1, f2, a, x', l, c, x'==0)
        where x' = x-1;

Finally, step-11 either jumps back to the top of the loop (step-4), or finishes the algorithm:

  // step11: BNE LOOP; branch to LOOP if Z = 0.
  // LOOP is step4 in our encoding
  step11 : Instruction;
  step11 (f1, f2, a, x, l, c, z)
    = if z
         then (f1, f2, a, x, l, c, z)           // done!
         else step4 (f1, f2, a, x, l, c, z);

From a control-flow perspective, we indicate the end of the algorithm by simply returning the final Mostek state. It is worthwile at this point to go through the Cryptol embeddings of the instructions to see how they match-up to the Mostek assembly given by Legato.

Extracting the multiplier

Having coded Legato’s multiplier as a sequence of state transformers, we can simply call the function step1 to use it with an appropriate state. The following helper function simplifies this task for us, by loading the registers F1 and F2, and extracting the high and low bits at the end:

legato : ([8], [8], Mostek) -> ([8], [8]);
legato (f1, f2, st) = (hi, lo)
  where {
       // get the relevant parts
       // to construct the starting state
       (_, _, A, X, LOW, C, Z) = st;
       // Run legato multiplier;
       //  final A is hi; and final LOW is low
       (_, _, hi, _, lo, _, _) =
                    step1 (f1, f2, A, X, LOW, C, Z);
  };

Note that legato still takes the starting machine state st as an argument. Legato’s claim (which we will shortly prove) is that the algorithm works correctly no matter what the initial state is, hence it is important to be explicit about the starting state.

To see legato in action, let’s just run it on a simple input:

   legato> legato (12, 93, (9, 42, 3, 8, 1, False, True))
   (4, 92)

where I just made up the initial state by plugging in some random values. If Legato is right, then it must be the case that

    12 * 93 = 256 * 4 + 92

correctly computing the high and low bytes. And voila! Both sides equal 1116. Magic!

Correctness

If you do believe in magic,  you can stop reading now. But I suspect most readers of the Galois blog will be looking for something more concrete. Surely, we must be able to give a better argument than claiming witchcraft for the correctness of our implementation.

Let us first formally capture what we mean by “correct,” by writing a Cryptol theorem that expresses our intuitive expectation:

theorem
   legatoIsCorrect: {x y st}. x' * y' == 256 * hi' + lo'
  where { (hi, lo) = legato (x, y, st);
           hi', lo', x', y' : [16];
           hi' = hi # zero;
           lo' = lo # zero;
           x'  = x  # zero;
           y'  = y  # zero
        };

Here’s the English reading of this theorem: “For all values of x, y, and st, if we run legato on these values and get the results hi and lo, then, it’ll be the case that x * y = 256 * hi + lo.” The only caveat is that we have to do arithmetic operations over 16 bit values (instead of 8), to make sure the theorem statement correctly captures the intended mathematical meaning. (Recall that all Cryptol arithmetic is modular with respect to the bit-size involved.) Hence, we simply add extra zero’s at the end to enlarge the arguments to 16 bits. Note that, we do not have to assert that the value of lo is at most 255; this is automatically guaranteed by the fact that it is an 8-bit value. Cryptol’s bit-precise type system saves the day!

Verification

Here’s what happens when I run cryptol on the file containing the above theorem:

$ cryptol legato.cry
Cryptol version 1.8.5, Copyright (C) 2004-2009 Galois, Inc.
                                           www.cryptol.net
Type :? for help
Loading "legato.cry".. Checking types.. Processing.. Done!
*** Auto quickchecking 1 theorem.
*** Checking "legatoIsCorrect" ["legato.cry", line 147, col 1]
Checking case 100 of 100 (100.00%)
100 tests passed OK
[Coverage: 3.47e-14%. (100/288230376151711744)]

When Cryptol sees a theorem declaration in a loaded file, it automatically performs a quick-check run to provide feedback on its validity. In this case, Cryptol automatically created 100 random test values for the theorem and checked that each one of them satisfied the statement. This is a quick way of getting feedback on the correctness of theorems, courtesy of Cryptol at no additonal cost to the user!

While the quick-check run is promising, the coverage info indicates that we’ve barely scratched the surface. The entire state space in this case has 58 bits (8 each for x and y, plus the starting arbitrary state of the Mostek machine costing us an extra 42 bits; for a total of 58). The total number of possible inputs is, therefore, 258 or 288230376151711744. This is a huge number: If you had a computer that run 1-billion (109) test cases every second, it’d still take you over 9 years to go through all possible inputs!

Of course, we can do better. Cryptol’s theorem proving environment uses modern equivalence-checkers to prove such theorems automatically, at the push of a (virtual) button:

  legato> :prove legatoIsCorrect
  Q.E.D.

And there, we’ve proved that our implementation of Legato’s multiplier is indeed correct for all possible inputs! (The above proof takes about 2.5 minutes to complete on my 3-year old MacBook Pro, using abc as the underlying equivalence checker in Cryptol’s symbolic mode. I should also note that the symbolic mode is only available in the full Cryptol release, for which free licenses are available.)

Closing thoughts

I must emphasize that we are not advocating Cryptol as a platform for doing proofs of algorithm correctness. Modern theorem provers such as ACL2, Coq, or Isabelle are the leading tools in this regard. (In particular, the logic behind Cryptol’s automated theorem prover is much less expressive, for starters.) Where Cryptol shines is in its restricted attention to bit-vectors and data-flow algorithms (cryptography being a prime application area), and it turns out that automated equivalence-checking based techniques do perform rather well for such problems. Our shallow embedding of Legato’s multiplier and the automated proof-of-correctness is a case in point.

There is one more important point to make. While push-button provers are indispensable in industrial practice, the final Q.E.D. you get from an interactive theorem prover such as ACL2 or Isabelle is much more satisfactory. For instance, we can hardly claim that the above proof increased our understanding of Legato’s algorithm in any sense, it just made us really believe it. I’m willing to bet that anyone who goes through a similar proof in ACL2 or Isabelle would have a much higher chance of having their “aha!” moment, where everything just sinks in…

On the practical side, however, nothing beats the fully-automated Q.E.D., especially when your boss is breathing down your neck!

Download

The Cryptol file containing Legato’s multiplier and the correctness theorem is here. The Cryptol toolset licenses are freely available at www.cryptol.net.

Posted in Formal Methods, Misc, Technology | Comments (0)

©2000–2010 Galois, Inc. All rights reserved. Terms of use. Contact Us