2013-08-25

Monads in Scala Part Three: Lisst[A]

We continue our Monads in Scala series by reviewing and translating the third page of the chapter on monads from the Haskell wikibook. This page takes quite an interesting view on lists, as it does not treat lists as things that you cons onto or even look things up in. It manages to treat lists nearly entirely as monadic structures. It even manages to come up with a couple of examples of using lists that only use the basic monadic operations. To view lists as things that we cons onto, we have to wait for a later page on additive monads.

Since the major point of this blog post series is to get our feet wet playing with monads, it only makes sense that we will implement a list. I'm going to use a basic linked list implementation, and call my type Lisst, to avoid confusion with the List type in the Scala collections library. Just to be clear, this Lisst is by no means intended as an alternative to the Scala's List. It is only intended as an exercise to experiment with the monadic aspects of lists.

All the source code presented here is freely available on GitHub. Most of the new code is under package monads.lisst. Be sure to take a look under both src/main and src/test. I recommend you get a local copy, run ~ test in sbt, and play around with it.

This post builds upon the two previous posts in this series - here and here. If you haven't read them yet, I highly recommend you start there.

Monadic operations on lists

As we might expect, the page on the list monad begins by defining the unit function and the binding operation for a list. We recall from our first blog post that the unit function is a function from the contained type to the container type, and that binding operation is the flatMap method in Scala. We will once again put the unit function in the companion object. Let's just nail down the signatures for these methods before we provide an implementation:

In case you haven't seen it before, the ??? is a very handy Scala operator which is basically a shorthand for "not implemented yet". If you actually invoke such a method, you will get a "scala.NotImplementedError: an implementation is missing" error message. My old-timer Java technique was to return null with a comment indicating that the method is a stub. ??? beats this in two ways: it is less verbose, and it is safer, because you get an error right away, and not a NullPointerException at some later point in time.

Linked list data structure

In order to start implementing things, we need to flesh out our linked list data type. I'll create a NonEmptyLisst type, that has a head and a tail, and an EmptyLisst class, which is used to terminate the linked list. (If you've never implemented a linked list before, or it's been a while, you may want to freshen up. I'm implementing a basic singly linked list here.) With this basic data structure defined, we can easily implement our unit function:


Varargs factory method

Before proceeding to implement flatMap, I'd like to make one further change. Instead of just providing a unit method, I would like to provide a Lisst constructor that can take any number of arguments. This will allow me to construct items of type Lisst[Int] as in any of the following examples:
  • Lisst[Int]()
  • Lisst(1)
  • Lisst(1, 2, 3, 4, 5)
As you can see, this constructor will still provide us with a suitable unit function. It will also be useful for constructing Lissts to use in testing. To implement this, we use Scala "repeated parameters", commonly known as varargs.

Within our implementation, the we get a Seq for the repeated parameters. But what kind of Seq we get is not specified in the Scala Language Specification (section 4.6.2). And a little experimentation shows out that we can indeed end up with nearly any sequence type inside our method. Since these sequences have different performance characteristics, we have to be careful to make sure we don't transform what should be an O(n) operation into an O(n^2) operation. For instance, if we get a List, using indexed access will give us O(n^2) behavior. On the other hand, it we get a Vector, then using head and tail will be O(n^2). The safest way to proceed is using foreach and reverse, both of which should be O(n) operations on any Seq type:

We've extracted a reverseSeq factory method because we'll have use for it later. We've used a var in reverseSeq, which we don't normally do. We could easily replace it with a tail-recursive method, but this seems like a perfectly acceptable use of a var. It's entirely encapsulated in a little 3-line method. And I don't want to get too distracted from the main goal of understanding the list monad.

Implementing map and flatten

The Haskell wikibook implements the bind operation in terms of Haskell list operations concat and map. In Scala terms, flatMap is implemented in terms of flatten and map. However, we remember from the first blog post that while we can define flatMap in terms of flatten and map, we can conversely define flatten and map in terms of flatMap and the unit function. We're going to take the latter approach, as we did with Maybe. Let's fill in map and flatten now:


Implementing foreach and flatMap

Now we're nearly ready to implement flatMap. To do this we are going to introduce an instance method foreach that walks through a Lisst, applying a side-effecting function f to every member of the Lisst in turn. It's a straightforward recursive method on a linked list. You'll notice that we've annotated this method as @tailrec, which assures us that the compiler is able to eliminate the tail recursion, which will prevent the method from blowing out our stack on very long lists. The @tailrec annotation actually turns out to be very useful here, because we learn that we have to make the method final for the Scala compiler to apply tail recursion elimination. The compiler is worried the method might get overridden, even though the Lisst trait is sealed.

With the help of the foreach method, flatMap itself becomes very easy to reason about and understand. The intuition behind flatMap is that we want to apply a function from A to Lisst[B] to every member of the original Lisst[A], and then concatenate the resulting Lisst[B]s together. We break this process down into two steps. First, generate all the elements of type B and store them in a (Scala library) List. We end up with the results in reverse order, since we prepend successive elements to the List. Second, construct the result using the reverseSeq factory method we defined above.


Testing Lisst[A]

Before moving further on into the Haskell wiki page, let's take a moment to write some tests for the above code. We're going to split testing into two parts. First, we'll write some simple tests to make sure Lisst.flatMap is acting as expected. Then we'll go on to generalize our code for testing Maybe in regards to monadic laws, and apply that testing code to Lisst as well. I'm not going to dwell on the Lisst.flatMap spec, but you can look at it on GitHub here and here. It may be useful to review if you are still not entirely clear on what it means to flatMap a list.

To test Lisst in regards to the monad laws, we will generalize some of the Maybe monad tests that we developed for the previous blog post. Let's quickly review what we did there. Starting from the monads-in-scala-2 Git tag for the project, we launch sbt and run test-only *MaybeMonadLawsSpec. The output from the test is as follows:
[info] MaybeMonadLawsSpec:
[info] Maybe monad with respect to Person data
[info] - should obey left unit monadic law
[info] - should obey right unit monadic law
[info] - should obey associativity monadic law
[info] - should flatten a Maybe[Maybe[_]] according to monadic laws
[info] - should flatMap equivalently to calling map and then flatten
[info] Maybe monad with respect to safe math operations
[info] - should obey left unit monadic law
[info] - should obey right unit monadic law
[info] - should obey associativity monadic law
[info] - should flatten a Maybe[Maybe[_]] according to monadic laws
[info] - should flatMap equivalently to calling map and then flatten
[info] Maybe monad with respect to looking up Numbers and Registrations by Name
[info] - should obey left unit monadic law
[info] - should obey right unit monadic law
[info] - should obey associativity monadic law
[info] - should flatten a Maybe[Maybe[_]] according to monadic laws
[info] - should flatMap equivalently to calling map and then flatten
[info] Maybe monad with respect to looking up Registrations and TaxesOwed by Number
[info] - should obey left unit monadic law
[info] - should obey right unit monadic law
[info] - should obey associativity monadic law
[info] - should flatten a Maybe[Maybe[_]] according to monadic laws
[info] - should flatMap equivalently to calling map and then flatten
As you will recall, we exercise various monad laws with respect to Maybe over 4 different data sets. For each data set, there are five tests. The first three exercise the monad laws proper, and the remaining two are specific to the Maybe type. We will generalize the code that exercises the monad laws proper, refactor the MaybeMonadLawsSpec to use the generalized code, and then write some basic monad laws tests for Lisst.

Abstracting a Monad Type

In order to write generalized test code for a monad, let's first review what a monad is. A monad is defined as having three parts:
  • A type constructor M, which provides a higher-kinded type M[A] for any given base type A;
  • A unit function, which takes an A and returns an M[A]; and
  • A binding operation (i.e., flatMap), which takes an M[A] and a function from A to M[B], and returns an M[B].
These requirements readily translate into a Scala trait:

Note the import of scala.language.higherKinds. The higher kinded type in our example is the type M[_] inside the Monad trait. Higher kinded types are very powerful, but make the Scala type system Turing complete, which means the Scala compiler is no longer guaranteed to terminate. While this is highly improbable in practice, people generally expect a compiler to terminate, which makes this language feature a bit strange. As of Scala 2.10, a variety of language features have been classified as "advanced and contentious", and require a compiler flag, or an import such as this one, to use. For the time being, (and as of Scala 2.11.0-M4), leaving out the import produces a compiler warning, but it may produce an error in future versions. You can read SIP-18 for more details.

It's easy to define singleton objects MaybeMonad and LisstMonad that implement the Monad trait:

An important realization that hits us at this point is that our Maybe and Lisst classes are not actually monads in and of themselves! It's more appropriate to consider these classes as satisfying the first out of three parts of the definition of a monad: the type constructor. It's merely a Scala convention that the binding operation is a method called flatMap in the type constructor class. We could also say that having the unit function as an apply method in the companion object is a bit of a Scala convention. But all three of these things must come together to make a monad.

A Method to Test Monad Laws

Let's step back and review some of the testing code we developed previously for Maybe and the monad laws. The function takes these arguments: a description of the test; some test items of type A, a function from A to Maybe[B], and a function from B to Maybe[C]. Within the test, we also generate a test set of Maybes by applying the unit function to every test item, and prepending MaybeNot:


To generalize this, we will need to add three more arguments: the monad; a name for the monad to use in the test descriptions; and a list of test items of type M[A]. With Maybe, it was easy to generate this kind of test data, but in general, we will need to leave this up to the caller. Here is the generalized method:


The most interesting thing about this code is that we are using path dependent types in the signature. Specifically, we refer to monad.M[A], monad.M[B], and monad.M[C], where monad is the higher kinded type in question. For our purposes, monad will be one of MaybeMonad and LisstMonad defined above, and monad.M will resolve to types Maybe and Lisst. Note that to use path dependent types in argument lists this way, we need to declare monad in a separate argument list from the arguments with type monad.M.

LisstMonadLawsSpec

It's straightforward to rework the MaybeMonadLawsSpec to use this new method. You can view the details here. To test Lisst, we provide all combinations of methods f and g that return Lissts of length 0, 1, and greater than one. For types A, B, and C, we use Int, String, and Double. We construct our test data roughly as follows:

Finally, we need to add a little something to produce different test descriptions for each data set, as required by ScalaTest. So we just zipWithIndex the fgPairs, and insert the index into the test descriptions:


Bunny Invasion

The next section of the Haskell wiki presents a simple example using a function that replicates a single value into a list. This example is rather trivial compared to the work we've gone through to build and test Lisst, so the translation into Scala is presented here without further comment:


Tic Tac Toe

In the tic tac toe example (noughts and crosses) in the Haskell wiki, the type Board and the function nextConfigs are never defined. This is understandable, and these are implementation details of the game that are not directly relevant to the monadic treatment of lists. We don't want to spend the time to write a full blown tic tac toe game in Scala either, but we do want to be able to run and test the code. So let's cook up just enough of an implementation to allow us to write and test the nextConfigs method.

The Implementation

First of all, there are 9 positions on the board where a turn can be played. We don't want to worry about any of the details of the positions themselves, just to provide a safe way to enumerate them. Here's some defensive code that satisfies these requirements:


Because X always goes first, a sequence of the positions played will fully describe the game state. We follow the Haskell example and name this type the Board. We'll keep the constructor and the position sequence private. We'll give the user access to an initial board state, and let them generate new board states with nextConfigs:

The nextConfigs method itself is quite simple and elegant to write in Scala, and really shows the power of a rich collections API built out from a monadic base. We start with all possible positions; filter out any positions already taken; add each remaining position to the sequence of moves so far; construct a new Board for each new sequence of moves; and finally, construct a Lisst of those new Boards.


Four Versions of thirdConfigs

The Haskell example goes on to define the function thirdConfigs in four different ways. thirdConfigs takes a Board state as input, and produces a list of all the Boards three moves out. These readily translate into Scala:


Let's make sure these four versions produce the same results:


While we won't bother to confirm the full contents of the results, let's at least make sure that the size of the results is right. The number of board positions 3 turns out from an empty board is the permutation of 3 elements taken from a set of size 9:


thirdConfigs with Kleisli Composition

When reading over the Haskell variations on thirdConfigs, I couldn't help but thinking that this was a natural fit for Kleisli composition, introduced in the previous blog post (and previous page of the Wiki book).

In the previous blog post, we used an implicit class to pimp a function suitable for flatMap with a kleisliCompose method:

Unfortunately, this implicit class is hard-coded to the Maybe monad, and needs to be generalized. A good place to put the generalized version is inside the Monad trait we defined above. We have to replace references to Maybe with Monad.M, and the Scala for-comprehension with the equivalent call to Monad.bindingOperation. The new implicit class looks like this:

To make use of the new KleisliComposition class, we have to bring it into scope within the context of a specific instance of Monad. For instance, we use it with the singleton object LisstMonad like so to define another variant of thirdConfigs:

A simple test like those shown above for thirdConfigs versions one through four will demonstrate that this works as expected:


List Comprehensions

Scala doesn't have list comprehensions. That's a good thing. It doesn't need them. For-comprehensions and the collections API methods are enough. List comprehensions probably make sense in Haskell, but I'm really not qualified to say.

Wrapping Up

In this post, we've seen how to implement a singly linked list in the context of a monad. We came up with a generalization of what a monad is with the Scala trait Monad, and used that trait to generalize our testing framework for testing monad laws. We also provided a variety of examples of the Lisst monad in action, including in LisstFlatMapSpec, as well as the bunny invasion and tic tac toe examples provided in the Haskell wiki. Amazingly enough, we accomplished all this more or less using only the unit function and binding operation of the Lisst monad. We did make liberal use of the varargs factory method, but this was mainly used for constructing expected values in our tests. This goes to show that there is a lot we can do with lists without even appending, prepending, or concatenating.

At this point, we've thoroughly covered the two most canonical uses of monads: lists and optional values. I'm looking forward to covering slightly less mainstream monads such as State in the near future. We'll most likely skip over the Haskell wiki page on do notation entirely, or at best merge anything interesting there into a post on the IO monad, as the do notation wiki page is mostly devoted to describing a Haskell-specific language feature. Which means IO is next! (Interestingly enough, a moon of Jupiter that goes by the same name is undergoing a major volcanic eruption at the moment.)

No comments:

Post a Comment

Note: Only a member of this blog may post a comment.