Advance to content

Galois
Technology
Client Services
Company
Blog

Entries in the “Cryptography” category

Solving n-Queens in Cryptol

April 11th, 2010 by Levent Erkok

The eight queens puzzle asks how to place eight queens on a chess board such that none of them attacks any other. The problem easily generalizes to n queens, using an nxn board. In this post, we’ll see how to solve the n-Queens puzzle in Cryptol, without lifting a finger!

Representing the board

It is easy to see that any solution to the n-Queens puzzle will have precisely one queen on each row and column of the board. Therefore, we can represent the solution as a sequence of n row numbers, corresponding to subsequent columns. For instance, the board pictured below (taken from the wikipedia page), can be represented by the sequence

3  7  0  2  5  1  6  4

recording the row of the queen appearing in each consecutive column, starting from the top-left position. (And yes, we always count from 0!)

Recognizing a valid solution

Let us start by asking a simpler question. How can we recognize that a given sequence of n row indices constitute a valid solution? We have to verify that the followings hold:

  • The numbers must be between 0 and n-1. (Sanity checking.)
  • The numbers should not be duplicated. (Ensures that there is exactly one queen per row.)
  • For any pair of numbers in the sequence, their row-value difference should be different than their column-value difference. (Ensures that there are no diagonal conflicts.)

The first two conditions are fairly trivial. The last one simply says that if two queens are positioned such that their vertical and horizontal displacements are the same, then they are on the same diagonal: A situation we must avoid for all pairs of queens given.

Writing a Cryptol predicate

Having described how we will recognize an alleged solution, let us now consider how we can code this in Cryptol.

First, we will need a helper function that applies a given predicate to all the elements of a sequence and returns True if all elements satisfy the predicate. Naturally, we call this function all:

 all : {n a} (fin n) => (a -> Bit,[n]a) -> Bit;
 all (f, xs) = [|f x || x <- xs|] == ~zero;

The type of this function is worth spending a few moments looking at, as it demonstrates some crucial aspects of  Cryptol’s type system that sets it apart from other languages. In particular, it says that the function all takes a sequence of n elements, each of some arbitrary size a. However, it also restricts n to be  finite, i.e., the type system will complain if you pass all an infinite sequence. Then, we apply the predicate f to each element in the sequence, and check that the result consists of all True bits. (The polymorphic Cryptol value zero represents a value with an arbitrary shape with all bits set to False. The complement operator (~) flips all those bits to True.)

The second utility function we will need is a predicate that checks whether the elements of a given sequence are all distinct:

 distinct : {a b} (fin a,fin b) => [a]b -> Bit;
 distinct xs = diff == ~zero
   where {
     n = width xs;
     diff = [|check (x, i)
            || x <- xs
            || i <- [0 .. (n-1)]
            |];
     check (x, start) = walk (start + 1)
        where walk cur =
                if cur == n
                   then True
                   else if xs @ cur == x
                        then False
                        else walk (cur + 1);
      };

The function distinct is fairly simple: For each element in the sequence, it walk’s through the rest to make sure that none of the remaining elements are the same as that particular element. Note how Cryptol’s width function allows us to determine the size of the input sequence, and recurse properly.

Having done the heavy-lifting above, we can now write the predicate that recognizes a valid n-Queens solution in Cryptol:

type Solution(n) = [n][1+width n] -> Bit;

nQueens : {n} (fin n) => Solution(n);
nQueens qs = all(inRange, qs)
           & distinct qs
           & all(checkDiag, ijs)
  where {
     n = width qs;
     inRange x = x < n;
     ijs = [| (i, j) || i <- [0 .. n-1]
                      , j <- [0 .. n-1] |];
     checkDiag (i, j) =
        (i >= j) | (diffR != diffC)
          where { qi = qs @ i;
                  qj = qs @ j;
                  diffR = if qi >= qj
                          then qi-qj
                          else qj-qi;
                  diffC = j - i;
           };
    };

The nQueens function simply executes our checks as we described above. It first makes sure that all the elements are inRange. (Note that we do not have to check for negative numbers since all Cryptol numbers are non-negative.) Then, we check that all the elements are distinct. Following this, we generate the pairs of indices ijs, and call checkDiag on all these pairs to make sure that the row and column differences do not match.

We can test this function in Cryptol, by providing sample input:

  NQueens> nQueens [3 7 0 2 5 1 6 4]
  True
  NQueens> nQueens [3 7 0 2 5 6 1 4]
  False

In the second example, I swapped the elements 1 and 6; effectively putting the 6th queen on row 6 and the 7th on row 1. This creates a diagonal conflict  as the reader can easily verify.

Solving n-Queens

It appears that all we have done so far is to write a Cryptol program to recognize whether a given solution to the n-Queens problem is valid. But, how do we actually solve the puzzle?

Remember that I promised to solve n-Queens without lifting a finger, and this is where Cryptol’s formal methods based tools come into the picture. Given any predicate (like nQueens above), Cryptol can find inputs that will satisfy it automatically, at the push of a button! (We have briefly described how this works in our previous post on Sudoku, so I shall not go into any details. Suffice it to say that one can exploit modern SAT/SMT solvers to generate automatic solvers for validity problems, and Cryptol provides just the right framework for doing so.)

Without further ado, here is how we solve n-Queens in Cryptol, without writing any additional code on top of what we have already seen:

  NQueens> :set sbv
  NQueens> :set base=10
  NQueens> :sat nQueens : Solution(8)
  (nQueens : Solution(8))  [2 7 3 6 0 5 1 4]
  	 = True

The first command puts Cryptol in the sbv mode, which allows for the use of SAT/SMT solvers. The second command tells Cryptol to print results in base 10. Finally, the :sat command finds satisfying assignments for predicates. The type-signature on the :sat command indicates that we are interested in solutions for 8-queens, for which we are given a solution almost instantaneously!

If you look at the wiki-page for n-queens, you will see that they claim there is always a solution when n=1 and n>=4. We can indeed verify that there are no solutions when n is 2 or 3, using Cryptol:

NQueens> :sat nQueens : Solution(2)
No variable assignment satisfies this function
NQueens> :sat nQueens : Solution(3)
No variable assignment satisfies this function

While these two properties are easy to see, it’s quite valuable to get independent verification!

In practice..

We would be remiss if we didn’t mention some of the actual uses of the satisfiability checker in the Cryptol tool chain. The most important application is in verifying functional equivalence of two functions. If f and g are two functions, then they are functionally equivalent exactly when the predicate (\x -> f x != g x) is not satisfiable. (Conversely, if this predicate is satisfiable then the satisfying value is where f and g disagree, indicating a potential bug.) More importantly, f and g need not be both written in Cryptol. The toolchain allows performing equivalence checking between functions written in VHDL, C, and Java (with varying degrees of “seamless” integration!). Furthermore, this is also how we make sure our compiler produces correct code: The function g can be generated code, for instance in one of Cryptol’s intermediate languages as we compile Cryptol to FPGA descriptions. In summary, we can check the equivalence of 3rd party code (generated or hand-written) against “gold” Cryptol specifications, increasing assurance in the correctness of crypto-implementations.

Download

You can download the Cryptol solution for n-Queens. Cryptol licenses are also available free of charge at www.cryptol.net, although I should note that the SAT/SMT based backends are only available in the full-releases of Cryptol, which requires a simple form to be filled before downloading. Enjoy!

Posted in Cryptography, Formal Methods, Functional Programming, Misc, Technology | Comments (8)

Substitution ciphers in Cryptol

August 24th, 2009 by Levent Erkok

Substitution ciphers are one of the oldest encryption methods, dating back to at least the 15th century. In a substitution cipher, each character in the plain-text is simply “substituted” according to a predefined map. Decryption is simply the substitution in the reverse direction. Wikipedia has a nice description of these ciphers. Obviously, you wouldn’t want your bank to use such a cipher when executing your web-based transactions! But they are fun to play around, especially when entertaining kids in hot summer days. In this post, we’ll see how to code simple substitution ciphers in Cryptol, and go a step further and actually prove that our implementation is correct.

Preliminaries

The simplest form of substitution ciphers use a permutation of the input alphabet. That is, each letter in the input alphabet gets mapped to another in the same  alphabet. (Strictly speaking, input and output alphabets need not be the same, but nothing essential changes by making that assumption.) For instance, you might decide that your substitution will map ‘a’ to ‘q’, and ‘b’ to ‘d’, …, etc., making sure no two letters are mapped to the same target. Once this mapping is agreed on, all you have to do to encrypt a given message is to map each character to the corresponding element according to your predefined mapping rules.

Here’s our Cryptol encoding of these ciphers. First, some preliminary declarations:

type Char      = [8];
type String(l) = [l]Char;
type Table(n)  = [n](Char, Char);

We’ll simply assume that the input consist of “characters,” each of which will be 8-bit quantities (i.e., numbers from 0 to 255). We will simply use ASCII encoding for normal English characters. This is captured by the Char type declaration above, which simply gives a convenient name for 8-bit wide words. The second type declaration captures sized-strings: For any given size l, the type String(l) represents a sequence of length l, containing 8-bit words. For instance, String(16) is the type of all sequences of length 16, containing numbers from 0 to 255 as elements. Finally a Table of size n is simply n-pairings of characters that form a substitution. Here’s the example table we will use:

cipherTable : Table(28);
cipherTable = [| (x, y) || x <- plain || y <- cipher |]
  where { plain  = "abcdefghijklmnopqrstuvwxyz .";
          cipher = "oebxa.cdf hijklmnzpqtuvwrygs"
        };

Note that our table has 28 entries (the lower-case English alphabet, plus space and the dot). A simple Cryptol sequence-comprehension succinctly zips the sequences up, forming our example table.

Performing the substitution

Given a table and a character, the function subst returns the element that the table maps the character to:

subst : {n} (fin n) => (Table(n), Char) -> Char;
subst (table, chr) = find 0
 where find i = if i == width table
                then '?'
                else if chr == chr' then code
                                    else find (i+1)
                   where (chr', code) = table @ i;

To do the search, we simply start from index 0 and walk through the given table recursively, returning the mapped element if we have a match. If the given table does not have the corresponding element,  we simply return the character ‘?’, which will indicate failure.  (Aside: Note how the predicate "fin n" ensures we’re given a finite table, ensured by Cryptol’s type-system. Haskell enthusiasts can gripe about receiving an infinite list when they didn’t ask for one!)

Encryption and Decryption

Having defined subst, encryption and decryption are mere maps:

encrypt (table, msg) = [| subst (table,  c) || c <- msg |];
decrypt (table, msg) = [| subst (table', c) || c <- msg |]
  where table' = [| (y, x) || (x, y) <- table |];

where we simply swap the elements in the table for decryption.

Substitution in action

That’s pretty much all there is to it for substitution ciphers. To illustrate, let us create specialized versions of encrypt and decrypt for our example table, together with some test data:

enc, dec : {l} String(l) -> String(l);
enc msg = encrypt (cipherTable, msg);
dec msg = decrypt (cipherTable, msg);

plainText, cipherText, decodedText : String(51);
plainText
   = "the quick brown fox jumped over the lazy black dog.";
cipherText  = enc plainText;
decodedText = dec cipherText;

Here’s Cryptol in action:

SubstCipher> :p plainText
the quick brown fox jumped over the lazy black dog.
SubstCipher> :p cipherText
qdagntfbhgezlvkg.lwg tjmaxgluazgqdagioyrgeiobhgxlcs
SubstCipher> :p decodedText
the quick brown fox jumped over the lazy black dog.

Volia! The secret is now safe, thanks to our substitution cipher.. (Note that we’re using Cryptol’s :p command which prints its argument as a string, as opposed to a sequence of 8-bit words.)

Correctness

Substitution ciphers are dead-simple, but it would be nice to have further assurance that our implementation is indeed correct. This is where Cryptol’s verification tools come into play. Let us first try to write a simple theorem, stating that encryption followed by decryption does not alter the message:

theorem checkEncDec: {msg}.
   dec (enc msg) == (msg : String(16));

Note that we have restricted the theorem to messages of size 16 only, in order to create a  monomorphic theorem that Cryptol’s verification engine can check/prove automatically. (While polymorphic theorems can be stated in Cryptol, they cannot be proven automatically.)

Before attempting a proof, it’s always good to get a quick-check evidence that we’re on safe ground. Here’s what happens when I try to quick-check the above theorem using Cryptol:

SubstCipher> :check checkEncDec
Checking case 1 of 1000 (0.10%)
Falsifiable.
checkEncDec [0x15 0x3e 0xf2 0x4f 0x34 0xc4 0x69 0x5a
             0x64 0x9e 0xb0 0xe2 0x6f 0xf8 0x6a 0x4a]
	= False

Oops! Something went terribly wrong, Cryptol found a counter-example without even really trying. Let’s use Cryptol’s “:p” command to see what the counter-example really is:

SubstCipher> :p [0x15 0x3e 0xf2 0x4f 0x34 0xc4 0x69 0x5a
                 0x64 0x9e 0xb0 0xe2 0x6f 0xf8 0x6a 0x4a]
>?O4?iZd???o?jJ

Ah, it contains characters that we have not mapped! Since our input accepts any 8-bit word, all values from 0 to 255 are valid “characters.” But we clearly have not mapped most of these to anything, causing Cryptol to rightfully reject our theorem!

Conditional theorems

What we need is a conditional theorem, one that has a hypothesis that says “for all good messages.” That is, we need to consider only those messages that our cipher knows how to map. Here’s a helper function to do just this:

all, any : {n a} (fin n) => (a -> Bit, [n]a) -> Bit;
all (f, xs) = [| f x || x <- xs |] == ~zero;
any (f, xs) = [| f x || x <- xs |] != zero;

checkMessage (table, msg) = all (isMapped, msg)
   where isMapped c = any (\(c', _) -> c == c', table);
[Aside: Cryptol's polymorphic constant zero is simply the value that is False at every point. Thus, checking against its negation (~zero) ensures all the elements are True, and checking inequivalence against it (i.e., != zero), ensures at least one element is True.]

Having defined checkMessage, we can now express our correctness theorem (again constrained to strings of size 16):

theorem cipherIsCorrect: {msg}.
      if checkMessage(cipherTable, (msg : String(16)))
      then dec (enc msg) == msg
      else True;

If you try the :check command on this theorem, you will find that Cryptol is now happy with it:

SubstCipher> :check cipherIsCorrect
Checking case 1000 of 1000 (100.00%)
1000 tests passed OK
[Coverage: 0.00%.
           (1000/340282366920938463463374607431768211456)]

The coverage info suggests we have not even barely scratched the test state-space in this case, as one can expect. However, that’s the least of our worries in this case. More importantly, quick-check’s OK is not  satisfactory at all for conditional theorems. How do we know that the random test data generated by Cryptol will pass the checkMessage test? In fact, it is very likely that very few of the test cases actually executed the actual equality check itself, as the “randomly-generated” messages are unlikely to pass the checkMessage filter.

Verification

Luckily, we can do better. Using Cryptol’s automated theorem prover, we can rigorously prove that the theorem is indeed true for all possible messages:

SubstCipher> :prove cipherIsCorrect
Q.E.D.

Cryptol proves this theorem on my 3 year old laptop in less than a second!  (We  should also note that the :prove command is available in the symbolic and SBV backends of Cryptol that comes with the full release. It uses off-the-shelf SAT/SMT solvers to prove Cryptol theorems automatically.)

Now we can rest assured that our implementation of substitution ciphers in Cryptol is indeed correct!

For the curious

As the alert reader would have no doubt noticed, we have only proved our implementation correct with respect to our example translation map cipherTable. A more general theorem would have proved the cipher correct with respect to all possible tables. Such a theorem is indeed easy to express in Cryptol, but you’ll also need to add the condition that the table is well defined. (That is, it should be a one-to-one map and no letter should be mapped to the special invalid marker ‘?’.) We invite the curious reader to play with this variant. Note that the automated proof will be more complicated in this case, and the backend prover might need more time before returning the final Q.E.D.

Download

The Cryptol source file implementing the substitution cipher is available for download. The Cryptol toolset licenses are freely available at www.cryptol.net. Enjoy!

Posted in Cryptography, Formal Methods, Technology | Comments (0)

Tech Talk: Mathematics of Cryptography: A Guided Tour

July 22nd, 2009 by Levent Erkok

The July 28th Galois Tech Talk will be delivered by Joe Hurd, titled “Mathematics of Cryptography: A Guided Tour.”

  • Date: Tuesday, July 28th, 2009
  • Time: 10:30am – 11:30am
  • Location: Galois, Inc.
    421 SW 6th Ave. Suite 300
    (3rd floor of the Commonwealth Building)
    Portland, OR 97204

Slides are now available.

Abstract: In this informal talk I’ll give a guided tour of the mathematics underlying cryptography. No prior knowledge will be assumed: the goal of the talk is to demonstrate how simple mathematical concepts from algebra and number theory are used to build a wide range of cryptographic algorithms, from the familiar (encryption) to the exotic (zero knowledge proofs).

Bio: Joe Hurd is a Formal Methods Engineer at Galois, Inc. He completed a Ph.D. at the University of Cambridge on the formal verification of probabilistic programs, and his work since has included: developing a package management system for higher order logic theories; applying automatic proof techniques for first order logic; and creating the world’s first formally verified chess endgame database.


Galois has been holding weekly technical seminars for several years on topics from functional programming, formal methods, compiler and language design, to cryptography, and operating system construction, with talks by many figures from the programming language and formal methods communities. The talks are open and free. An RSVP is not required, but feel free to contact the organizer with questions and comments.

Posted in Community, Cryptography, Tech Talks | Comments (0)

One Million Haskell Downloads…

March 23rd, 2009 by admin

Galois engineers write a lot of Haskell (in fact, our tech catalogue is built pretty much entirely on it). We find we’re able to build systems faster, with fewer errors, and in turn are able to apply techniques to increase assurance, helping us deliver value to our clients. We’ve successfully engineered large systems in the language for nearly a decade.

We also use and write a lot of open source Haskell code.

Since 2004 we’ve been investing in improving packaging and distribution infrastructure for Haskell code, and since 2007 Galois has been hosting hackage.haskell.org: the central online database of open source Haskell libraries and applications. These packages are built via Cabal (dreamed up by Galois’ own Isaac Potoczny-Jones), and distributed via cabal-install. Hackage now hosts more than 1100 released libraries and tools, and has been growing rapidly (and, incidentally, Galois employees have released or been significant contributors to just shy of 10% of all Hackage projects).

We’ve wondered for a while now just how busy Hackage was becoming, and in turn, what other interesting information about Haskell were buried in the Hackage logs. This post answers those questions for the first time.

We’ll see:

  • Total, and growing, Haskell source downloads
  • The most popular Haskell projects hosted on Hackage
  • The most popular development categories
  • The most popular methods for distributing Haskell source

And speculate a little on where Hackage is heading.

Background

We’ve known for a while that uploads to Hackage were growing. You might have seen this graph elsewhere (it’s derivable from the RSS logs of package uploads):

There’s a pretty clear trend upwards. Average daily Hackage releases have increased 4-fold since Hackage was launched, and it’s now averaging 10 packages a day released. The question is: was anyone using this code?

Measuring Downloads

To measure downloads, we processed the apache logs for Hackage going back to its launch (incidentally, the log processing script – in Haskell of course – uses the Haskell zlib, bytestring, filepath, containers, bytestring-csv libraries). This generates a two dimensional map of downloads per project, per month. (You can find links to the raw data at the end of the article).

We can now play with the data to see some interesting trends.

Some cautionary notes interpreting this data: we only process Hackage package downloads (i.e. “GET binary-0.5..tar.gz” requests). We are only able to measure source downloads – that is, someone downloads a package they will build from source, with GHC. We cannot measure downloads from open source mirrors (such as those provided by the major unix distributions), nor can we measure users of binary packages (such as on Debian, Ubuntu or Fedora), nor can we measure downloads of packages not hosted on Hackage (such as gtk2hs, pugs, darcs or ghc). So this doesn’t represent all Haskell downloads, only downloads of source packages.

We have complete data for Hackage, from the initial alpha tests in July 2006, through to launch in January 2007, until March 2009.

Total Downloads

To begin with, here’s the cumulative downloads from Hackage, over time:

As you can see, we’re just shy of 900 thousand package downloads, and from January 2008 to December 2008 – the first complete year of live operation – there were 500 thousand downloads, with a further 150 thousand downloads in the first 2 months of 2009.

To visualize the growth trend, here is the same cumulative download line on a logarithmic scale:

In the alpha-testing period from July 2006 to July 2007, downloads grew exponentially (4 orders of magnitude in 4 quarters) as existing developers started to use the system. Since the middle of 2007, the rate of growth has slowed, increasing by an order of magnitude in a 15 month period.

Hackage is on target to reach its 1 millionth download next month. We’ll have a party.

Downloads by Month

Next up is the downloads per month, over time. I dropped a bezier curve on top to give a sense of the trend. As we only have partial data for March 2009, I’m excluding that. This graph essentially confirms the same trend as we see in the cumulative graphs. Interestingly, download spikes roughly correspond with upload spikes in our first graph (presumably as people scramble to get the new code).

Hackage is currently seeing 100 package downloads an hour, and that value has been doubling every 4 months for the past year and a half.

Package Popularity

Besides overall downloads, there’s a wealth of per-package information. In the following graphs we extract total downloads for each package (ignoring version numbers).

The popularity graph displays a classic “long tail“, where the download frequency of any package is inversely proportional to its rank in the frequency table.

This suggests that a good interface to a large Hackage database should behave something like the “long tail” sites like Amazon – where we rely on recommendations and other interlinking to ensure visibility of projects in the tail.

The frequency of downloads is even more visible on a double log scale, where the popularity pretty much matches Zipf’s law:

The download frequency doesn’t quite match the classic distribution at the top and tail of the curve. There are two reasons for this (and maybe other factors at play). Firstly, the tail of the curve drops off, as the bottom 10% of popular packages tend to be the newest packages, and so are unde-represented in the “market”.

The other interesting part of the popularity curve is at the top. The top 10 to 50 packages are, to varying degrees, less popular than we might predict. Why is this?

The Distribution Effect

Remember that Hackage is a source-only repository. So it is of primary interest to developers. As a Haskell package becomes more popular, it tends to get picked up by Linux and BSD distributions such as Debian or Ubuntu (and also distributed in binary form for Mac and Windows), removing the need to download the source. Popular packages are doomed to become less popular in source form if the distributions are doing their work!

This is particularly apparent for libraries distributed with GHC (the “extra libs”). Libraries such as containers, arrays, bytestring, parsec and network rarely need to be downloaded in source form, as they’re bundled with GHC forming a platform base. They should thus be under-repesented in source downloads.

You can see this “distribution effect” in this overlay of xmonad and its Debian package installs where source installs decline dramatically as soon as the binary packaging takes off.

Popular packages are doomed to be distributed through other channels.

Most popular packages

And here – for the first time – are the per-package popularity statistics for Hackage. First, the top 25 packages sorted by their cumulative total downloads. Executable applications are marked in blue.

1 xmonad 35428
2 HTTP 26203
3 zlib 24431
4 Cabal 23691
5 X11 21563
6 binary 15752
7 utf8-string 12633
8 mtl 12517
9 cabal-install 12274
10 regex-posix 11351
11 X11-extras 10509
12 xmonad-contrib 9794
13 haddock 9209
14 parsec 8468
15 bytestring 7473
16 regex-base 7438
17 HaXml 6307
18 network 6285
19 xmobar 6272
20 yi 6268
21 hscolour 6264
22 QuickCheck 5697
23 hslogger 5434
24 regex-compat 5266
25 ghc-paths 4653

And the next 75 Haskell packages, in order:

filepath, X11-xft, alex, happy, vty, cgi, terminfo, unix, GLUT, chunks, fingertree, OpenGL, time, pureMD5, regex-tdfa, xhtml, bzlib, Crypto, syb-with-class, hxt, tagsoup, HDBC, MissingH, SDL, haskell-src-exts, plugins, Stream, frag, curl, pcre-light, unix-compat, uniplate, wxcore, hinstaller, stm, html, Diff, polyparse, leksah, HUnit, hmp3, haskell-src, RJson, fastcgi, pandoc, arrows, YamlReference, parsedate, HGL, GLFW, process, extensible-exceptions, zip-archive, iconv, HDBC-sqlite3, TypeCompose, cpphs, hmatrix, HPDF, HAppS-Server, haskell98, hspread, HAppS-Util, rosezipper, gd, dlist, array, yi-gtk, haskeline, HStringTemplate, HAppS-Data, fgl, haskelldb, xml, cabal-rpm

Congratulations to authors of these packages – you made the top 10% most popular releases.

Most Popular Downloads in February

The previous table looked at the cumulative most popular projects. But that doesn’t necessarily reflect what is popular at the moment to download in source form. This table compares January downloads against February downloads, for the top 25 packages in Feburary:

Package Downloads Rank Change
HTTP 2926
zlib 2345
Cabal 2148
cabal-install 1490
utf8-string 1352
xmonad 1280
binary 1174
regex-posix 901 +8
parsec 842 +2
X11 834 -1
xmonad-contrib 754 +1
hscolour 739 -2
terminfo 713 +1
haddock 669 -6
ghc-paths 630 +2
HaXml 600 +12
extensible-exceptions 596 +16
QuickCheck 584 +9
regex-base 558 -4
time 529 -1
darcs 501 1
leksah 500 +18
regex-tdfa 496 +24
hslogger 441 -2

Most popular applications

The 25 most popular Haskell applications hosted on Hackage, to download in source form are:

xmonad, cabal-install, haddock, xmobar, yi, hscolour, alex, happy, frag, leksah, hmp3, pandoc, cpphs, cabal-rpm, darcs, c2hs, hoogle, lambdabot, cabal2arch, hpodder, monadius, lhs2tex, mkcabal, pugs, ghc-core

Note pugs and darcs have been primarily distributed separately to Hackage, until recently.

Most popular libraries by category

We can also determine the most popular libraries and tools in each semantic category on Hackage:

Task Library Downloads
Client-side HTTP HTTP 26203
Database HDBC 3098
XML HaXml 6307
Control mtl 12517
Parsing parsec 8468
Binary Parsing binary 15752
Logging hscolour 6264
Testing QuickCheck 5697
Regex regex-base + regex-posix 7438
Lexing alex 4360
Codec zlib 24431
Unicode IO utf8-string 12633
Sockets network 6285
Build System Cabal 23691
Documentation haddock 9209
Syntax hscolour 6264
3D Graphics GLUT + OpenGL 7345
2D Graphics SDL * 3016
Hashing pureMD5 3460
HTML xhtml 3391
Cryptography Crypto 3243
Generics syb-with-class 3230
IDE leksah 2408
JSON RJson 2222
Markup pandoc 2210
Numerics hmatrix 1844
Web Framework HAppS-Server * 1759
Graphs fgl 1658
Parallelism parallel 1370
Charting chart 1300
Code generation llvm 970
RSS feed 726
Wiki gitit 759

Note that SDL is represented here as gtk2hs (which provides many cairo-based 2D graphics functions) isn’t distributed on Hackage. Also, HAppS-Server has been superceded by happstack.

Honourable mentions

These packages didn’t quite take first place, but still have significant user support. I include happstack, although it is only a month old, as it replaces the previously popular HAppS-Server.

Parsing happy, polyparse 6685
Regex Regex-tdfa, regexpr, pcre-light 8791
Codec bzlib 3275
XML hxt, xml 4871
HTML tagsoup 3141
Client-side HTTP curl 2678
Generics uniplate 2675
2D Graphics wxcore 2601
Testing HUnit 2325
Database haskelldb, hsql 3164
Network Network-bytestring 1210
Control monadLib 1081
Web Framework happstack 549

Future

Finally, we can speculate on what would happen if the current download growth rate continued for a couple more years (projecting forward 18 months, using the trend of the last 18). We’d reach a cumulative total of 10 million source downloads around the end of 2010 (continuing the order of magnitude growth of the last 18 months).

Of course, a lot is unknown in this scenario. If everyone starts installing all their code via cabal-install, downloads will rocket, as does increasing reuse, by using more libraries.

However, if the top Haskell applications and libraries become an order of magnitude more popular, the distros will take them up, slowing growth. Growth will also slow if we run out of resources in some form or another: no more easy libraries to bind to, for example, just as we ran out of existing things to cabalize in 2007.

Get the data yourself

You can play with the data set yourself here:

  • Monthly downloads per package (CSV, HTML)
  • Packages by download frequency (CSV, HTML)

You can also get the full data in a sqlite database, courtesy of mmorrow on #haskell.

Posted in Community, Cryptography, Functional Programming, Galois News, Haskell, Open Source | Comments (8)

Solving Sudoku Using Cryptol

March 18th, 2009 by Levent Erkok

Cryptol is a language tailored for cryptographic algorithms. Sudoku is a popular puzzle the reader  is no-doubt already familiar with. We will offer no deep reason why anyone should try to solve Sudoku in Cryptol; other than the very fact that it’d be a shame if we couldn’t!

Needless to say, Cryptol has not been designed for encoding search algorithms. Nonetheless, some of the features of Cryptol and its associated toolset make it extremely suitable for expressing certain constraint satisfaction problems very concisely; and Sudoku very nicely falls into this category.

Representing the board

A Sudoku board can be represented in a variety of ways. We will pick the simplest: A sequence of 9 rows, each of which has 9 elements storing the digits. Each digit will require 4 bits; since they range from 1 to 9. So, a good Cryptol type for a board is:

     [9][9][4]

In Cryptol-speak, this type simply represents a sequence of precisely 9 elements, each of which is a sequence of 9 elements themselves, each of which are 4-bit words. (Technically, the type [4] also represents a sequence of precisely 4 elements, each of which are bits. But it’s easier to read that as 4-bit words. The type [4] and [4]Bit are synonymous in Cryptol, and can be used interchangeably in all contexts.)

Recognizing a valid row, column, or box

Let us tackle a much simpler problem to start with. How would we determine if a given set of 9 numbers form a valid Sudoku row, column, or a box? We should simply check that each number from 1 to 9 appears precisely once in the sequence:

check : [9][4] -> Bit;
check group = [| contains x  || x <- [1 .. 9] |] == ~zero
   where contains x = [| x == y || y <- group |] != zero;

We simply iterate over the numbers 1 through 9, and check that the given  group contains that number.  The function contains iterates through all the elements in the given group, and makes sure one of them is the currently looked for element.

(The Cryptol primitive  zero is a polymorphic constant representing all False’s. The operator ~ inverts all the bits. Hence, the test “== ~zero” makes sure all the components are True; and the test “!= zero” makes sure at least one bit is True.)

Recognizing a full board

Given a full Sudoku board, checking it’s a valid solution simply amounts to identifying rows, columns, and squares; and “check“-ing them all, in the above sense. The following Cryptol function accomplishes this task rather concisely:

valid : [9][9][4] -> Bit;
valid rows = [| check grp
             || grp <- rows # columns # squares |] == ~zero
 where {
   columns = transpose rows;
   regions = transpose [| groupBy (3, row) || row <- rows |];
   squares = [| join sq || sq <- groupBy(3, join regions) |]
 };

The function valid receives 9 rows; and calls check on all these rows, columns, and the squares. Columns are easy to compute: we simply use Cryptol’s transpose primitive. The squares are slightly more tricky, but not particularly hard. We first group all the rows in length 3 segments, and transpose these to align them, thus forming the regions. Then the squares are simply grouping of the regions 3 elements at a time. It’s a good exercise to precisely work out how the squares are formed using the above code, something we encourage the interested reader to do on a rainy afternoon..

Solving Sudoku

All we have done so far is to recognize a given Sudoku board as valid; we have not written a single line of code to actually fill a partially empty board. The good news is that we do not need to! We have all the bits and pieces ready to go. Sounds too good to be true? Well, read on!

Enter Formal methods

What if I told you that recognizing a valid Sudoku board is  sufficent to actually solve one that has empty squares on it, using Cryptol’s formal-methods toolbox? The idea is rather simple. But before we get there, we need to take a detour into the Cryptol toolbox.

Checking  satisfiability

Cryptol’s formal-methods tools can perform equivalence, safety, and satisfiability checking. We have talked about the former two in an earlier post. Today, we will look at satisfiability checking only. Given a function  f, the satisfiability checking problem asks if there is any  x such that f x = True. Here is a simple example. Let:

  f : [8] -> Bit;
  f x = x*x - 7*x + 12 == 0;

The function f returns True if its given 8-bit argument is a solution to the quadratic equation x2 – 7x + 12 = 0.  We have:

Cryptol> :sat f
f 4
         = True

Indeed, 4 is a solution to this equation. Is there any other solution? It is easy to formulate a similar query using the lambda-notation:

Cryptol> :sat (\x -> f x & (x != 4))
((\x -> f x & (x != 4))) 3
         = True

Cryptol tells us 3 is a solution as well! Since this is a quadratic equation, there can be at most two solutions; let’s verify:

Cryptol> :sat (\x -> f x & (x != 4) & (x != 3))
No variable assignment satisfies this function

As expected, Cryptol confirms that 3 and 4 are the only  8-bit values that satisfy the equation x2 – 7x + 12 = 0.

(I should mention  that the :sat command is available only in the symbolic and sbv backends of Cryptol; the two main backends of Cryptol that are capable of performing formal-verification.)

Back to Sudoku

Remember the valid function that returns True if a given full board is a correctly laid-out Sudoku board? With the magic of satisfiability checking, we can just use that definition to fill in the blanks for us! To illustrate, consider the board below.

How do we encode a board with empty cells in Cryptol? One simple idea is to represent the board as a function: It will take the values of its “empty” cells, and return the full board. In the Cryptol encoding below I have tried to align the variables so that they correspond exactly to the empty cells, and named them row-by-row:

  puzzle : [53][4] -> Bit;
  puzzle   [ a1    a3    a5 a6       a9
             b1       b4 b5    b7    b9
                c2    c4 c5 c6 c7 c8 c9
             d1 d2    d4    d6 d7 d8
             e1 e2 e3    e5    e7 e8 e9
                f2 f3 f4    f6    f8 f9
             g1 g2 g3 g4 g5 g6    g8
             h1    h3    h5 h6       h9
             i1       i4 i5    i7    i9 ]
   = valid [[a1  9 a3  7 a5 a6  8  6 a9]
            [b1  3  1 b4 b5  5 b7  2 b9]
            [ 8  c2 6 c4 c5 c6 c7 c8 c9]
            [d1 d2  7 d4  5 d6 d7 d8  6]
            [e1 e2 e3  3 e5  7 e7 e8 e9]
            [ 5 f2 f3 f4  1 f6  7 f8 f9]
            [g1 g2 g3 g4 g5 g6  1 g8  9]
            [h1  2 h3  6 h5 h6  3  5 h9]
            [i1  5  4 i4 i5  8 i7  7 i9]];

It might take a bit of staring at this definition; but the idea is strikingly simple.  Notice that the type of puzzle is [53][4] -> Bit, precisely because there are 53 empty cells. Also, instead of just returning the final board, I simply pass it to the function valid; so that the function puzzle will return True precisely when it is given the correct numbers that solve it!

By now, it must be obvious how we’ll solve Sudoku in Cryptol: All we need to do is to ask Cryptol to find the right input value to make the function return True, i.e., we need to find a satisfying assignment. Here’s the response from Cryptol:

Sudoku> :sat puzzle
puzzle [2 5 4 3 1 4 8 6 9 7 7 1 9 2 5 4 3 3 8 4
        9 2 1 6 1 2 8 4 9 5 4 9 2 6 3 8 7 6 3 5
        2 4 8 9 8 7 1 4 1 9 3 6 2]
         = True

If we plugin the numbers we get from Cryptol back into the grid, we get the full  solution depicted below. (I used italic for the numbers found by Cryptol.)

Well; that’s what we set out to do originally; so mission accomplished!

What just happened here?

Apologies if you were expecting to see Cryptol code that actually searched for the values of the empty cells! Note that we have not written a single line of code that tried to deduce what must go in the empty cells, nor  have we implemented a search algorithm. We merely viewed Sudoku as a satisfiability problem, and asked Cryptol’s formal-methods tools to find the missing values for us. The necessary search is all done by the underlying formal-methods engine, freeing us from the labor. Yet another instance of telling the computer “what” to do, instead of “how.”

Download

Here is the Cryptol code that contains all the definitions you need.  Enjoy!

Posted in Cryptography, Formal Methods, Functional Programming, Technology | Comments (6)

Trustworthy Voting Systems

March 2nd, 2009 by Aaron Tomb

Accurate and reliable elections are a critical component of an effective democracy. However, completely secure and trustworthy voting procedures are difficult to design, and no perfect solutions are known. Ideally, a trustworthy voting system should guarantee both verifiability (the ability to prove that the counted vote matches the submitted ballots) and privacy (the inability to link the contents of a vote with the voter who cast it).

These guarantees may now be achievable. Many researchers have proposed voting protocols that achieve verifiability and privacy in theory, and a few do so under assumptions that are satisfied by current election practices. Most of the protocols involve posting an encrypted version of the contents of every ballot in some public place (likely a web site), and depend on the properties of cryptographic operations to achieve privacy while allowing anyone to verify the final tally.

Now that practical, secure voting protocols exist, the time has come to bring them into use. One existing solution that comes close to achieving these goals while retaining compatibility with current voting practices is the Scantegrity II system. It has the advantage that it can operate under current US election conditions, without requiring any modification to existing optical ballot scanners, and with very little change to the individual voting process.

However, the software used in this system is only a prototype, with a number of shortcomings. Voter privacy depends on ability of a computer system to keep a key database completely secret, and accurate vote counting depends on the correct implementation of complex cryptographic algorithms. The software is tens of thousands of lines of code, and as with any other software of that size, many bugs certainly exist.

We believe that the importance of trustworthy election results and the past lack of success in creating reliable solutions warrants a new approach to the design of voting systems. In particular, we advocate a class of techniques known as formal methods that allow us to make precise mathematical assertions about how software should behave, and determine whether it satisfies those assertions. Government agencies within the Department of Defense make use of formal methods to ensure the reliability of important computer systems, and the draft update to the development standards used by the Federal Aviation Administration, DO178C, includes provisions for the use of formal methods. Voting systems deserve similar care.

One Approach

Formal Methods

Software designers employ a number of techniques to combat the difficulty of producing correct software. At the simple extreme, running the software directly through a few scenarios that match expected operating conditions and observing the results can help eliminate egregious mistakes, but does little to prevent subtle mistakes or malicious attacks. Most software, including that in existing voting machines, undergoes little more than this preliminary sanity checking.

The goal of formal methods is instead to prove that a piece of software necessarily behaves properly under all possible circumstances. The first step toward this goal is to make a connection between the concrete software and some well-defined mathematical model. It can be possible to extract a model directly from the concrete software. Alternatively, software architects can build the model first, and derive the concrete software from the mathematical abstraction.

In either case, the connection between the model and the concrete software should be such that a proving a property of the model guarantees that the property holds of the software.

A Formal Model of the Voting Process

We advocate developing a formal model of the complete pipeline of voting events, including ballot creation, vote marking, ballot recording or scanning, ballot aggregation and counting, and final certification. If described appropriately, in a fully-executable language, portions of this formal model can be used directly in a software implementation. This method of software development also makes feasible a wide variety of techniques to gain confidence in the correctness of the software, potentially including a complete formal proof of correctness for key components. It has the additional benefit of allowing us to make confident statements about the security and reliability of the non-computerized stages of the process.

The Voluntary Voting System Guidelines recommendation released in 2007 by the National Institute of Standards and Technology makes significant steps toward a formal definition of voting, going as far as including UML process diagrams and mathematical descriptions of a few important properties. We advocate taking this work a step farther, and describing as much of the process as possible in a precise, formal, and machine-readable language.

High-Assurance Cryptography

Voting protocols that provide simultaneous guarantees of integrity and privacy typically depend heavily on the properties of cryptographic algorithms. While, in theory, these algorithms transform data in ways that achieve the properties necessary for trustworthy voting, it is absolutely critical that the algorithms be implemented properly.

Recent work has made significant progress toward high-assurance cryptography. Researchers at Stanford have done work to formally verify that Java code implementing a cryptographic algorithm matches a specification written in a functional language. Galois’ own programming language, Cryptol, is also a functional language, designed specifically for implementation of cryptographic algorithms at a high level of abstraction, and also allows the high-level specification to be proven equivalent to code generated in C or VHDL. The use of a functional language in both approaches allows developers to describe cryptographic algorithms in a form that closely resembles their mathematical specification, making it significantly easier to gain confidence in their correctness.

The existence of these approaches demonstrates the feasibility of high-assurance implementations of cryptographic algorithms. These specific technologies may or may not be the most appropriate solutions for a standard, highly trustworthy voting system, but they show that such a system can be built.

Guaranteed Privacy

Secure voting protocols often depend on the secrecy of certain pieces of information to protect voter privacy. To prevent vote buying and voter intimidation, it must be infeasible to prove that a specific voter cast a particular vote.

Technology exists to provide this guarantee, as well. Work on cross domain solutions has resulted in a number of highly trustworthy systems. As an example, Galois’ Trusted Services Engine (TSE) includes a software component that enforces strong separation between different information domains. A formal proof of correctness, designed for evaluation at EAL6, guarantees that private information can never be accessed by unauthorized parties. General Dynamics also has a whole suite of products aimed at this market.

In the context of voting systems, a cross domain component could be used to protect the database used to generate ballots. This database could be inaccessible to any human user, while still allowing auditors, and even the general public, to view the data necessary to tally votes, and verify that any particular vote was recorded correctly.

Though these existing systems, as they stand, may not be  entirely appropriate for use in a voting system (none of them are available to the general public, for instance) they do show that it is feasible to gain a high degree of assurance in the separation of secret and public data.

Openness

Because of the importance of voting to an effective democracy, and the temptations to subvert the process, we believe that transparency is essential. Therefore, we advocate that all software source code, design documents, and assurance artifacts involved in a voting system be made freely available to the public (either in the public domain or under an OSI-approved open source license). In particular, any interested party should be able to obtain the source code and use it to build executable software identical to that stored in the voting systems in use.

Conclusion

We argue that a key component of a secure voting system is trustworthy software that is developed using the best known assurance techniques and completely open to public inspection. In addition, the essence of the voting process should remain software independent, in that a manual recount should always be possible. Higher software reliability should make the necessity of a hand recount less common, but software should never be designed in such a way that a hand recount is impossible, or even more difficult than it would be in a completely manual election.

Posted in Cryptography, Formal Methods, Open Source, Technology | Comments (6)

Equivalence and Safety Checking in Cryptol

February 5th, 2009 by Levent Erkok

The Cryptol language comes with an integrated verification tool-set that can automatically perform equivalence and safety checking on Cryptol programs. Recently, we have presented a paper on this topic at PLPV’09: “Programming Languages Meets Program Verification” workshop. (Slides are also available.)

Briefly, equivalence checking refers to the problem of proving that two functions have the exact same input/output behavior. Typically, these functions are versions of the same algorithm; one being a reference implementation and the other being an optimized version. Cryptol automatically establishes that the optimized version is precisely equivalent to the original. If the functions are not equivalent, Cryptol provides a counter-example where they disagree; aiding greatly in development/debugging.

Safety checking refers to the problem of proving that the execution of a function cannot raise any exceptions; such as division by zero; index out-of-bounds, etc. When the safety checker says that a function is safe, you will know for sure that such conditions will never arise at run-time. (Similarly, you will get a concrete counter-example from Cryptol if this is not the case.)

Cryptol uses symbolic simulation to translate equivalence and safety checking problems to equivalent problems using the bit-vector logic of SMT-Lib. Furthermore, Cryptol has built-in connections to several SAT/SMT solvers. It automatically calls these provers and presents the results to the user in original Cryptol terms; providing a seamless verification environment for the end-user.

The full paper and slides on equivalence checking in Cryptol is available for download.

Posted in Cryptography, Formal Methods | Comments (0)

Cryptol User’s Mailing List

January 29th, 2009 by Levent Erkok

Cryptol now has a mailing list for discussions: The language, the tool set, programming idioms, everthing and anything related to Cryptol. Looking forward to seeing you join us!

Posted in Community, Cryptography, Formal Methods, Technology | Comments (0)

A Cryptol implementation of Skein

January 23rd, 2009 by Galois

Following on from the MD6-in-Cryptol posting, let’s consider another very interesting candidate from the (deep) pool of SHA-3 submissions; Skein

  http://www.skein-hash.info/
  http://www.schneier.com/skein.html

by the merry band of Ferguson, Lucks, Schneier, et al.

The expression of their reference implementation comes out, we think, fairly cleanly in Cryptol. The digest output size is a variable parameter to the algorithm, but we’ll focus on the 512-bit version here — the submission’s primary candidate for SHA-3.

In order to avoid duplicating the introductory material on Cryptol, we suggest the reader go through the MD6 writeup to get a grounding in Cryptol, its idioms, and syntax.

The structure of Skein

Skein refers to a family of hashing functions, differing in the size of their internal state. We’ll look at the 512-bit version here. Skein is built up out of three main components:

  • A tweakable block cipher, Threefish.
  • The Unique Block Iteration(UBI) chaining mode, defining the mode-of-operation by the repeated application of the block cipher function.
  • Optional arguments to the hashing function. Extra functionality and input to the hashing function can be accommodated via Skein’s optional argument system. There is also input, like key material and randomized-hashing via nonces, etc. (see spec for details). We won’t consider any such additional inputs in our implementation of the basic hashing functionality.

We’ll take the same tack as with the MD6 presentation – starting at the top by showing the mode-of operation function, working our way down to the tweakable block cipher that the Skein team developed as part of Skein.

The Skein hashing function

The pure/simple Skein hashing function (512 bit) transforms an arbitrary length byte sequence into a 512-bit output:

  hash512 : {a}
              // type constraints
            ( fin a  // finite input
	    , 32 >= width (8*((8*a+7)/8))
	    , 32 >= width (8*a)
	    , 64*(((8*a+7)/8+63)/64) >= (8*a+7)/8
	    , 8*((8*a+7)/8) >= 8*a
	    )
	  => [a][8]     // input message
	  -> [64][8];   // 512-bit result

The Cryptol type signature captures that (and more!), with the [64][8] result packaging up the result into a 64-element wide sequence of 8-bit bytes. The type constraints look involved, but capture the pre-condition that the input message size has already been padded out to the desired width. (Writing Cryptol functions that automatically perform padding based on input sizes is possible, but would be a distraction here.)

The definition of hash512 closely mirrors the specification (Section 3.5.4):

  hash512 m = output512 g1
    where {
     k   = (zero : [64][8]);
     c   = mkConfig(512); // config string, see Section 3.5.2 of spec.
     g0  = ubi512 (encrypt512, k,join c, t_cfg);
     g1  = ubi512 (encrypt512,g0,join m, t_msg);
    };

i.e., it is the double application of the chaining mode UBI, followed in the end by the computation of the result by the output512 function. As the internal size has been fixed here to 512, the innermost UBI computation is actually all over constant inputs and could be folded/optimized away. This optimization can be done manually or by the Cryptol tools, but we won’t bother with the optimization here.

The output function is simply another application of the UBI:

  output512 : [64][8] -> [64][8];
  output512 g = ubi512 (encrypt512, g, (zero:[64]), t_out);

i.e., a straight UBI computation of the internal state argument g to the null message and a special output type value.

Regarding t_out, the various t_X values in the above represent the type values that Skein defines –

  t_out : [128];
  t_out = ((0x3f: [6]) # zero) << 120;

  t_cfg : [128];
  t_cfg = ((0x04: [6]) # zero) << 120;

  t_msg : [128];
  t_msg = ((0x30: [6]) # zero) << 120;

i.e., padded out to 128-bit widths so as to fit with subsequent use as basis for Skein’s tweak values.

That’s the toplevel driver for the Skein hashing function. Let’s consider next the chaining mode that it builds upon.

UBI: the Unique Block Iteration chaining mode

The chaining mode employed by Skein is parameterized over the following values:

  • G – (chained) internal state / starting value.
  • M – the message string.
  • T – a 128-bit starting value for the tweak (the type value from the above.)

As a Cryptol type signature, its external interface would be

  ubi512 :  {a}
            ( fin a
            , 32 >= width (8*((a+7)/8))
            , 32 >= width a
            , 64*(((a+7)/8+63)/64) >= (a+7)/8
            , 8*((a+7)/8) >= a
            )
         => ( (([64][8],[16][8],[64][8]) -> [8][64]) // block cipher function
	    , [64][8]  // initial (chained) value / state.
	    , [a]      // input message
	    , [128]    // tweak base value
	    )
         -> ([64][8]); // 512 bit result

The type constraints all involve the type variable a, which is the size of the input message. Together they express the padding assumptions made or introduced over that bit string.

The function takes four arguments, the first being a function value — the block cipher function to apply for each of the blocks that the input message is divvied up into. We’ll have more to say about its type and properties in the next section when we show how the Threefish cipher is expressed in Cryptol.

The other three are the G, M, and T parameters to the UBI, which we can now define as follows:

  ubi512 (encryptor,g,m,t) = last hs
   where {
    hs = [g] #
       [| m ^ split(join(encryptor(h,
                  groupBy(8,mkTweak(t,width(m1),nb,width ms,i,b)),m)))
       || h <- hs
       || m <- ms
       || i <- [(0:[32])..]
       |];

    //ms : [N][32][8];
    ms = toBlocks64(groupBy(8,m1));

    nb = 64;
    m1 = pad8(m);
    sz = width(m1);

    b = False;
   };

Like before, let’s start with the result value and work ourselves inwards from there — the result being the last element of the hs sequence. hs represents the result of processing each block (512 bits) of the input message. The definition of hs expresses how to compute block N’s digested output:

   hs = [g] #
       [| m ^ split(join(encryptor(h,
                    groupBy(8,mkTweak(t,width(m1),nb,width ms,i,b)),m)))
       || h <- hs
       || m <- ms
       || i <- [(0:[32])..]
       |];

The iterative nature of the chaining mode is captured as a recursive sequence comprehension in Cryptol, with each of the arms of the comprehension providing input to the UBI’s result computation (e.g., m <- ms selects each block in turn.) That result computation is a bit of a mouthful, but easy enough to pick apart a bit:

  m ^ split(join(encryptor(h,
           groupBy(8,mkTweak(t,width(m1),nb,width ms,i,b)),m)))

i.e., XORing the message block with the application of the block cipher encryptor

 encryptor(H,Tweak,MsgBlock)

where Tweak is a 128-bit value made up out of a range of parameters, globally-constant, per-UBI-specific and per-round, all contributing to diffusing the output. The h is the output from the previous invocation of the block cipher, i.e., to compute element we need to have access to the previous UBI result only.

We hope it is relatively straightforward to map the definition of hs back to the recurrence relation that defines the UBI in Section 3.4 of the Skein specification.

With tweaks and message blocks in place, it is time to consider the block cipher that Skein uses.

Threefish, a tweakable block cipher

This block cipher is parameterized over the following values:

  • K – block cipher key (512 bits / 64 bytes)
  • T – tweak (128 bits / 16 bytes)
  • P – plaintext (equal to key)

Deriving the Cryptol type signature from that is straightforward:

  encrypt512  : ([64][8],[16][8],[64][8]) -> [8][64];

with the result being 8 64-bit words. This simple type hides many a clever detail though:

  encrypt512 (key,tweak,pt) = vn + kn
    where {
     vn : [nw][64];
     vn  = last vss;

     kn : [nw][64];
     kn  = last kss;

     // ...more below...

The result is the element-wise addition of two 64-bit word sequences of width 8 (cf. the definition of the ciphertext C in Section 3.3 of the specification). kn is the last value from the key schedule of the cipher, and vn is the last value from the round computation. The key schedule is a Cryptol sequence of values, splitting up the input key and tweak into round/4 sub-keys:

     kss : [19][nw][64];
     kss = [| keySchedule_8(tw_words, key_words, d)
           || d <- [0 .. nr/4]
           |];

     key_words : [nw][64];
     key_words = split(join key);

     tw_words : [2][64];
     tw_words = split(join tweak);

     nw  = 8;
     // Threefish-512, like 256, has 72 rounds:
     nr  = 72;

We won’t show the implementation of the keySchedule_8() function here, but the reader is encouraged to download the sources and look it over in Skein/Threefish.cry.

The subkeys are combined together with the per-round values:

     vss : [73][nw][64];
     vss = [pt_words] #
           [| fs @@ pi_8 || fs <- fss |];
	      // per-round mix values are permuted wrt the pi
	      // permutation vector (cf. Table 3 in Section 3.3)

     // the per-round mix-in values
     fss : [72][nw][64];
     fss = [| mix8(d,es)
           || d  <- [0 .. (nr-1)] // one per round
	   || es <- ess
	   |];

     ess : [72][nw][64];
     ess = [| if d%4 == 0 then v + k else v
           || d <- [0 .. (nr-1)]
	   || k <- pad3 kss   // key schedule sub-keys
  	   || v <- vss
	   |];

     pt_words : [nw][64];
     pt_words = split(join pt);
    };

Again, we won’t go into further details on the internals of the cipher, but it is worth highlighting the use of mutually recursive sequences values — vss is computed in terms of fss, which is defined in terms of ess, which ties the knot back to vss in its definition. The recursive nature works out by being initialized with pt_words as the result of round 0.

The MIX function

Threefish uses the MIX function to perform its permutation and mixing; encoded in Cryptol as follows:

  mix8 : ([8],[8][64]) -> [8][64];
  mix8 (d,xss) = join [| mix_8(d,j,xs)
                      || j <- [0..3]
		      || xs <- groupBy(2,xss)
		      |]
   where {
    mix_8 : ([8],[8],[2][64]) -> [2][64];
    mix_8 (d,j,[x0 x1]) = [y0 y1]
      where {
        y0 : [64];
        y0 = (x0 + x1);

        y1 = (x1 <<< (rs_8 @ j @ ((d%0x8)#zero))) ^ y0;
      };
  };

Included here for completeness, the round-dependent permutation of its 512-bit input maps relatively cleanly onto the specification of MIX in Section 3.3.1. The built-in groupBy operator perform useful function here in dividing up its per-round input:

  groupBy : {a b c} (b,[a*b]c) -> [a][b]c

i.e., if the second argument is a multiple of the first (b), we can divide it up into a values of width b. Use it here to split up the 512-bit input into 4 pairs of 128-bit values.

Concluding remarks

Skein has an appealingly clean structure and composition, something we hope to have maintained some in its Cryptol rendition. The use of recursive sequence definitions to express the chaining and iterative nature of the algorithm is one place which helped in maintaining the clear mapping back to the specification.

Having a clean reference implementation like this has considerable value, but using it with Cryptol and its toolchain, you may take advantage of its definition in a number of ways. Please see the concluding remarks for the MD6 posting for an outline of what they are.

Downloading Cryptol Skein

If you want to try out the implementation of Skein in Cryptol, it’s available from download as a separate archive. Load it into your Cryptol interpreter, and verify its sanity by running some of the known-answer tests:

  Cryptol> :load Skein.cry
  ...
  Skein> ts_1
  True
  Skein> ts_4
  True
  Skein>

To have a closer look at the underlying code and experiment with it, edit the .cry files followed by :reload from the Cryptol interpreter prompt.

Want more?

If you’d like to see more examples of Cryptol, maybe other SHA-3 candidates, please let us know. Or, better still, have a go at implementing some of them yourself. The CubeHash submission by D.J. Bernstein is perhaps a good place to start?

Author

("Sigbjorn Finne" # "Galois, Inc" # "2009-01-23") : [35][8]

Posted in Community, Cryptography, Formal Methods, Functional Programming, Technology | Comments (3)

MD6 in Cryptol

January 23rd, 2009 by Galois

NIST is currently running a competition to come up with the next generation message hashing function that it intends to standardize and FIPS recommend upon completion (assuming one good candidate is left standing and well at the conclusion of the evaluation process):

    http://csrc.nist.gov/groups/ST/hash/sha-3/index.html

Apart from the need to come up with better alternatives to its current recommendation, the SHA-2 family of hashing functions, this competition draws inspiration from the success that the AES competition had a couple of years ago in engaging the community in coming up with a replacement for the DES block cipher. As then, a lot of new innovation has resulted.

As with block ciphers, many common types of hashing functions lend themselves well to expression in Cryptol. To demonstrate some of the features of Cryptol and how it could be used to express SHA-3 candidates, here’s one of the submissions, MD6 from the CSAIL group at MIT, headed by Ronald L. Rivest:

   http://groups.csail.mit.edu/cis/md6/

The goal of this writeup is twofold:

  • Introduce you to the MD6 hashing algorithm and its construction.
  • Expose you to the Cryptol language, and how it lends itself to expressing MD6.

Ideally, you’ll come away with enthusiasm on both accounts!

Setting the context

We won’t try to re-count or re-express the design and implementation details of MD6 here, but assume the reader has either a basic knowledge of its structure or is able to separately consult a description of MD6 (such as its specification, a mighty fine and interesting document.) Instead, we’ll limit ourselves to presenting some code snippets representing key portions of MD6 and highlight the Cryptol features that it employs. A pointer to the complete code of MD6 in Cryptol is provided at the end, for your own use and experimentation.

The MD6 hashing function mode

At the toplevel, MD6 accommodates running in a number of different modes, tailored to the resources of the execution environment and external parameters like the digest output size and the number of rounds to apply each step. The implementation presented here is limited to the sequential/iterative version.

The Cryptol function that implements the MD6 sequential mode of operation is md6, having the following type:

  md6 : {a b c d}
        ( ...type constraints... )
     => ( [8*a][b]   // input message
        , [32]       // bit width of original message input
                     // (before byte-padding)
        , [8]        // number of rounds (104 for 256.)
        , [c][8]     // key (zero for hashing applications)
        , [32]       // output digest length (in bits.)
        )
     -> [16][64];

To explain the Cryptol type signature, let’s start at the end — the result. It is made up of 16 64-bit words. In Cryptol, [N] is the type of an N-bit wide value. You can concatenate the [] sequence type operators, so [N]Ty is the type for a sequence of N Ty values; in fact, the [N] syntax is a shorthand for [N]Bit, i.e., an N-bit wide sequence of bits. So for the above result type, [16][64] is the type of a sequence of 16 64-bit values.

The inputs to the md6 function mirror those of the parameters to MD6 SEQ (see specification; Fig 2.6). The argument types make use of Cryptol’s size polymorphism via type variables; for example, the input message’s type is given as [8*a][b]. This means: a sequence with widths that are a multiple of 8 (i.e., if needs be, the input message has already been padded out to make it so.) of [b] values, i.e., b-bit wide words (for any b).

Notice that we’ve left out the type constraints from the above signature. It is a somewhat long and involved for this function, but this portion of the Cryptol type signature let’s you state constraints that must hold for type variables, like:

  select5 : {a b} ( a >= 5 ) => [a]b -> b;
  select5 xs = xs@4;

The indexing operator @ better be restricted to sequence values that have at least 5 elements, which is what the type constraint expresses here. If you try to evaluate select5 with the Cryptol interpreter for narrower sequence, you’ll see something like this:

  Cryptol> :l "select.cry"
  ...
  Select> select5 [1 2 3]
  ["command line", line 1, col 1 "select.cry", line 1, col 13]:
  Inferred size of 3 is too small - expected to be >= 5
  Select> select5 [1..10]
  0x5
  Select>

That’s just a flavor of Cryptol’s type system; it is capable of expressing the shape and sizes of the data you manipulate quite precisely and with some sophistication. Like size polymorphism for the md6 mode function signature, letting you use type variables instead of concrete sizes, but in a constrained manner. The size preconditions that must hold for an argument to a function like md6 are separately specificed using type constraints.

Notice that Cryptol is able to infer the most general type for most definitions and values, so writing out the type signature is optional. But due to its compact and precise nature, it provides valuable documentation that’s worth including alongside its definition. One popular way of using the type inference mechanism is to have Cryptol come up with the initial type for a definition, which you then copy in and use. Indeed, that was how the md6 type signature ended up being derived!

Enough on types for now; time to get to grips with the bit and word values that md6 work over.

The MD6 mode of operation

MD6 defines its toplevel, mode-of-operation as follows:

  md6 (msg,wi,rnds,ks,digLen) = compress(last_chunk, rnds)
  where {
   // The final 89-word chunk to compress:
   last_chunk = qs # key # [uu vv] # r # last padMsg;

   r = cs @ (width(padMsg) - 1);

   //
   // sequential mode of operation, each compression round
   // is fed into the next (cf. the 'c <- cs' knot-tying below.)
   //
   cs = [c0] # [| compress((qs # key # [u v] # c # b), rnds)
               || c <- cs
               || u <- us
               || v <- vs
               || b <- padMsg
               |];

   ...
  };

MD6 iteratively operates on chunks of 89 64-bit values, each chunk made up out of internal state, the chained output from the previous “round” (this being the sequential, Merkle-Damgaard rendition of the algorithm.) The result is one final application of the diffusion function compress on the last of these chunks (last_chunk.) We won’t drill down here into detail about how that final chunk is constructed, but the syntax of last_chunk’s Cryptol definition merits some unscrambling:

    last_chunk = qs # key # [uu vv] # r # last padMsg;

i.e., it is a sequence constructed by the concatenation of number of sub-sequences (# is the Cryptol concatentation operator); the constant Q sequence (see spec), an (optional) key, the control words U and V, and the final portion of the suitably padded input message. Preceding it is r, which is the output of the (last-1) chained compression steps. That value (r) is computed by taking the last element of the cs sequence:

 //
 // sequential mode of operation, each compression round
 // is fed into the next (cf. the 'c <- cs' knot-tying below.)
 //
 cs = [c0] # [| compress((qs # key # [u v] # c # b), rnds)
             || c <- cs
             || u <- us
             || v <- vs
             || b <- padMsg
             |];

It is defined using Cryptol’s parallel sequence comprehensions, prefixed by the initial value/chunk (IV) c0. The function compress is computed for each element, with the desired chaining of output from one compress invocation being fed into the next through the use of the recursive use of cs in one of the “arms” of the comprehension.

Recursive sequences allow us to capture stateful computations in Cryptol without the explicit mentioning of registers or an internal state being iteratively updated. It is a common “functional” / declarative idiom used in Cryptol code, using such sequences to express chaining/feedback.

As c0, which can be likened to the initial state of the mode function, is initially prefixed to cs, the recursive definition will produce values for any N, i.e., element N can be computed only by knowing the value of the sequence element (N-1) — not N itself nor successive elements. The number of elements in this comprehension is equal to the length of the shortest of its “arms”. In the above, that would be the padded-out sequence of message chunks.

The net effect being that the md6 mode function will iterate over all the chunks of the input message, applying at each step the compression function to the internal state and the chained output from the previous step. To finalize and generate output, the compress function is called again on the final element.

The construction used by this MD6 mode (i.e., it’s not the only one, alternative versions are supported by the spec.) is standard. Indeed, we could easily re-factor the md6 function to have it be parameterized over the compression function to call at each step. The resulting chained-mode function could then be instantiated by any algorithm that uses the same mode-of-operation, but furnishing their own compression/diffusion function.

Back to MD6 in Cryptol, expressing the compression function is next.

The compress function

At the heart of MD6 is the function that compresses/diffuses the incoming message, called repeatedly by the chained mode in the previous section. Like with md6, let’s first consider its Cryptol type:

  compress : ([89][64],[8]) -> [16][64];

A function of two arguments, the first being the message block/chunk to process, which consists of 89 64-bit words. The second argument is the number of rounds to iterate over that chunk. For 512-bit digest outputs, the suggested number of rounds is 168.

The result is 16 words wide. The function is expressed as follows in Cryptol:

  compress (chunk,r) = as @@ outs
   where {
     as   = chunk #
             [| cf (a,b,c,d)
             || a <- ss
             || b <- a_ts
             || c <- rs
             || d <- ls
             |];

     cf (s_i,a_i,r_i,l_i) = x3
      where {
        x1 = s_i ^ a_i;
        x2 = x1 ^ (x1 >> r_i);
        x3 = x2 ^ (x2 << l_i);
      };

     a_ts =
       [| a ^ b ^ (c & d) ^ (e & f)
       || a <- as
       || b <- drop(n-t_0,as)
       || c <- drop(n-t_1,as)
       || d <- drop(n-t_2,as)
       || e <- drop(n-t_3,as)
       || f <- drop(n-t_4,as)
       |];

     outs : [16][32];
     outs = [| (t + 89 - 16 + x) || x <- [0 .. 15] |];

     t : [32];
     t = (r # zero) * (c # zero);

     c : [8];
     c = 16;

     n : [8];
     n = 89;
   };

The final 16 word result is produced by applying the @@ operator, which indexes its first argument sequence value by the indices from the second. The MD6 compression function slides a window across its input chunk, so in this case the result is the last 16 words in that window — see the MD6 spec for details about the portion that outs selects.

The as sequence is defined using a common Cryptol idiom, the parallel sequence comprehension:

   as   = chunk #
           [| cf (a,b,c,d)
           || a <- ss
           || b <- a_ts
           || c <- rs
           || d <- ls
           |];

Its first element is the input chunk, followed by sequence values of equal chunk width, each computed by applying the function cf to successive elements of four generator sequences, ss, a_ts, rs, and ls. Apart from a_ts, these are constant-valued sequences that the MD6 specification defines. a_ts is rather more interesting, another parallel sequence comprehension:

   a_ts =
     [| a ^ b ^ (c & d) ^ (e & f)
     || a <- as
     || b <- drop(n-t_0,as)
     || c <- drop(n-t_1,as)
     || d <- drop(n-t_2,as)
     || e <- drop(n-t_3,as)
     || f <- drop(n-t_4,as)
     |];

It is recursively defined in terms of the above as sequence value, i.e., to compute element I it reaches ‘back’ at previous values at 5 different positions of the ‘as’ sequence, as determined by the tap positions t_X ( 0 =< X <= 5) that MD6 specifies. In the words of the specification, a_ts implements a non-linear feedback shift register. The drop(Num,Seq) Cryptol built-in operator returns the sequence Seq, but with its first Num elements chopped off. In the above, n is the constant chunk size, 89. The values from these different positions are then combined together by simple (and hardware efficient) composition of word-sized bit operators.

The result of compress is then the final 16 words of the as sequence. The length of the as sequence is determined by the number of iterations or rounds to perform, r.

That’s it – a tidy and simple compression function at the heart of MD6. Its expression in Cryptol is, we hope, reasonably clear and close to the specification.

What’s really gained from using Cryptol here? From a programming perspective, having the backing of its sized type system lends great help wrt. correctness and consistency of the word-level operations. The resulting declarative Cryptol program has a number of follow-on benefits though:

  • to generate and derive an efficient hardware implementation (for mapping to FPGAs, perhaps). The types aid the generation of this hardware mapping. But, more importantly, use of the recursive sequence idiom allows the Cryptol implementation to infer how to effectively ‘render’ the function in hardware, mapping the feedback loop to a register-based design.
  • as input to Cryptol’s C code generator to output C code for embedding into other codebases.
  • to check correctness of the generated implementations by using Cryptol’s symbolic verification support.
  • for prototyping and ‘playing around’ with ideas and variations of MD6. Cryptol’s equivalence checking support can help you determine if the transformations you do to one maintain equivalence with previous versions.
  • to check equivalence between this Cryptol ’specification’ and other, non-Cryptol implementations in either C or VHDL. Cryptol’s toolchain is able to symbolically execute code in either language, followed by the equivalence checking them via SAT solvers.

Being nice and high-level isn’t the goal (but we don’t mind! :-) ), but using Cryptol to help the hard work of generating algorithms that are correct and possibly high-performing.

Cryptol sample implementation

The Cryptol files making up the MD6 implementation is available for download via the Galois Cryptol pages as a .zip archive. To use the code, simply load it in:

  Cryptol> :load "MD6.cry"
  ...
  MD6> t1 // runs the t1 test, see MD6/Tests.cry.
  True
  MD6>

Please fire up an editor to have a look at the various “.cry” files.

Next up: Skein

If that left you wanting more, the next Cryptol blog entry will present a reference implementation for another SHA-3 candidate, Schneier et al.’s Skein. Stay tuned.

Author

("Sigbjorn Finne" # "Galois, Inc" # "2009-01-23") : [35][8]

Posted in Community, Cryptography, Formal Methods, Functional Programming, Technology | Comments (2)

©2000–2010 Galois, Inc. All rights reserved. Terms of use. Contact Us