Haskell for noobs, written by a noob

https://anadoxin.org/blog/haskell-for-noobs-written-by-a-noob.html

Thu, 25 December 2014 :: #haskell

Haskell is a purely functional language. It means that it's different from "normal" imperative languages like C/C++, Ruby or Python.

In a functional language, the thinking process is different. Instead of specifying the steps needed to perform an operation, you specify the result you'll want to get. This switch of thinking process to a different paradigm is the majority of the learning curve.

Recently I've started to learn this functional methodology. By writing down my progression, maybe it will be helpful to you as well, as I came from the imperative world without prior knowledge about functional approach.

The first problem is a Project Euler problem number 1. It goes like this:

If we list all the natural numbers below 10 that are multiples of 3 or 5, we get 3, 5, 6 and 9. The sum of these multiples is 23. Find the sum of all the multiples of 3 or 5 below 1000.

The algorithm I've used isn't the best or efficient, but it's easy to understand.

The problem states that we need to find numbers that are divisable by 3 or 5 from the pool of numbers from 0 to 1000. So, first we need to generate this pool. We start by building an infinite list, from 0 to infinity:

[0..]

This will build a list from 0 to infinity. It will be a lazy list, so it will reserve a small amount of memory to store only the generator code, not the resulting list contents.

Later, if you'll want to reference any element from this list, it will be computed on demand. So, indexing is slower, but at the same time it requires a lot less memory than a non-lazy method.

You can see how this list looks like in ghci, the Glasgow Haskell Compiler Interactive mode, but prepare to use ^C to cancel the printing process:

$ ghci
...

*Main> [0..]
[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,...

However, we need to get only first 1000 numbers from this list. There is a function in Haskell that takes first n elements of user-supplied list, named take.

take 1000 [0..]

The syntax is: function-name arg1 arg2.

So, take takes first 1000 elements from an infinite list of numbers from 0 to infinity.

The resulting list is a list of numbers from 0 to 999.

Since we have a list of numbers, we need to filter out those which are not divisable by 3 nor 5. Normally I'd just use an iterator that would allow me to iterate on every item in this list. This however implies that I would use a state. The "state" can be also thought as a class field, mutable local variable, mutable global variable, etc. Haskell is a pure functional language, this means that its functions are state-less. You can't have a state in your functions, unless you're using monads, but that is a topic for another day. We won't use it, so we have to build a function that will not use any state.

One way to generate a list with these numbers is to simply loop over the input list, get each item from the input list, and if this item is divisable by 3 or 5, append it to the output list. When repeating this for each element of the input list, we should have a proper output list with proper elements.

Before writing the body of the function, a good practice would be to declare the types on which the function is operating.

generateList :: [Int] -> [Int]

This means: there is a function named generateList. I would like to specify type information for this function (::). My first argument is [Int], so it's a list of Int's. OK, since I've specified the type of the first argument, I'm going to specify the type of next argument now (->). Oh wait, it will be last type in my definition, so it will actually be a type of the return value instead of a type of the second argument. So, the return value is also [Int]. So, my function will take a list of Int's as an argument, and will return a list of Int's as a return value.

The input list will be take 10 [0..], and the output list will be list of numbers divisable by 3 or 5.

generateList (x:xs) = div35 x ++ generateList xs
generateList [] = []

Before we go analyzing the body of the function and why it's written the way it is, there are multiple other items here that are worth explaining. Function definitions in Haskell are defined by using pattern matches. In the example above, we have 2 patterns that Haskell needs to match to invoke the proper function.

First match is defined as:

generateList (x:xs) = ...

which is a standard notation for matching a non-empty list as an argument. I will get back to it in a minute.

Second match is defined as:

generateList [] = ...

This is matched when generateList is called with an argument that is an empty list. So, if you'd call generateList function like this:

generateList []

Then the second function body would be executed. Second function is equal to [], so this means that if generateList will be called with an argument that is an empty list, it will return an empty list.

But, if generateList will be invoked with an argument that is not an empty list, first match will be triggered, and first body will be executed.

Now, there is a small shortcut notation being utilised here. (x:xs) matches a non-empty list, and automatically creates two variables named x and xs, that contain: first element of the list, and the rest of the list -- respectively. This means, that if generateList will be called as:

generateList [1,2,3,4,5]

then x will contain 1, and xs will contain this list: [2,3,4,5].

By using these two matches: (x:xs) and [], we are covering 100% of cases, because we are covering lists that are empty, and lists that are not empty. There are no other types of lists.

The body of the function invoked for non-empty lists is:

div35 x ++ generateList xs

This means: invoke the div35 function, and give x as its first parameter. Operator ++ is used as a list concatenation operator. This means that it can append multiple lists to one bigger list. Example:

[1] ++ [2] => [1,2]
[1,2] ++ [3] => [1,2,3]
[1] ++ [2,3] => [1,2,3]
[1,2] ++ [2,3] => [1,2,2,3]

Then, the function recursively calls itself on the remainding part ("tail") of the input list. When it will finish, we will get a list. This list gets merged through the usage of the ++ operator with the result of div35 function to one list. This list is returned as a return value.

By quickly looking at the div35 function, it's relatively clear what it does. First, let's look at the type of the function. It requires an Int as an argument, and will return a list of Ints -- or an empty list.

div35 :: Int -> [Int]
div35 x
    | x == 0            = []
    | x `mod` 3 == 0    = [x]
    | x `mod` 5 == 0    = [x]
    | otherwise         = []

If x is 0, the function returns []. If x is divisable by 3, it will return a list with one element inside: x. Same thing with x divisable by 5 -- it will return a list with that number as a sole element. In case nothing is matched, it returns an empty list.

Yes, this looks like a normal switch statement, from C/C++ or similar.

To get more clear view of this process, let's debug it.

Let's invoke the generateList function with the list of [1,2,3] as an argument:

generateList [1,2,3]

It got a non-empty list as an argument, so first function body is matched. x is 1, xs is [2,3]. Following operations are being done:

div35 1
generateList [2,3]
return new list

div35 for 1 will return an empty list, because 1 is not divisable neither by 3 nor 5.

Then, generateList recursively calls itself with an argument of [2,3]. Lets step into it.

In a new frame, x is 2 and xs is [3]. Again, we have these operations to perform:

div35 2
generateList [3]
return new list

div35 will return an empty list again. And we enter ourselves again, specifying a list [3] as an argument.

So, it's a new frame of our function again. x is 3, and xs is an empty list: []. Yet again, we have these operations to perform:

div35 3
generateList []
return new list

div35 will return [3]! Because 3 is divisable by 3 (obviously), div35 built a new list with only one item inside -- the same number as was specified in the argument. And yet again, we recursively enter our function again.

But this time, we match the second function pattern -- the one that specifies an empty list as a match pattern. This function implementation doesn't recursively call itself anymore, it merely returns an empty list. So we have our result immediately. So, div35 returned this list: [3], generateList returned an empty list []. We have both lists, so we can concatenate them:

[3] ++ [] => [3]

and we return a new list constructed this way to the previous frame.

In the previous frame, div35 2 returned an empty list, and generateList [3] returned [3]. This means that we can merge them:

[] ++ [3] => [3]

and return this new list to our previous frame.

In the previous frame, div35 1 returned an empty list as well, and generateList [2,3] returned [3] (we just calculated this). By concatenating [] and [3], we get [3]

So, we're returning it, and we're in the caller frame. generateList [1,2,3] has just returned a list containing [3]. And this is a valid result: this is a list of numbers from an input list that are divisable by 3 or 5. Only 3 meets this criteria, so our function works well.

By using exactly the same method, we can build a function that will sum all of the numbers in the input list:

sumList :: [Int] -> Int
sumList (x:xs) = x + sumList xs
sumList [] = 0

This is exactly the same method as above, so I'll just skip the analysis.

Now we have all the functions we need to have to solve the problem. We just need to use them.

Let's define a function that will use them to calculate the proper solution. In short words, we need to generate an input list of numbers, so that generateList will filter out all the numbers that are not divisable by 3 or 5. Then we will use the new list (that will be generated by generateList) as an argument to sumList function, that will return one Int value, being the sum of all elements in its input list.

result = sumList (generateList (take 1000 [0..]))

This can be shortened to:

result = sumList $ generateList $ take 1000 [0..]

And we're done! result will return one number, being the sum of all elements of a list generated by generateList.

Our work is done ;).

But wait, there's more.

Bad code: list generation

Actually, our list generation method isn't very nice. Instead of taking 1000 numbers from an infinite list, we can simply generate a finite list by using this syntax:

[0..999]

This will generate a finite list with elements from 0 to 999. So, why we've used an infinite list in the first place? To show you that Haskell has no problem with infinity, and by default it's using lazy evaluation!

Bad code: checking for divisors

While we can't get over with division of elements, we can surely compress some code. We don't really need div35 and generateList functions. They're reinventing the wheel. Instead of them, we can use two features from Haskell: the filter function and a lambda function.

You probably already know what is a lambda function. From Python's perspective it's a small function that can be inlined as an argument. C++ also has lambda functions in the form of auto func = [&] (Args...) { body; }. For Java people, you can imagine that a lambda function is an anonymous class containing just one (default) method inside.

One of the purposes of lambda functions is to provide a mechanism for delayed function invocation. If a higher-order function requires a function as one argument, you can either put a name of the function that is defined elsewhere, but you can also put a lambda function as this argument -- this way you will specify the body of the function in the place of its declaration. Neat.

The filter function works like this: it takes a function as the first argument, and an input list as the second argument. Then, for each element of this list, filter will invoke supplied function, putting the element as an argument to this function. Consider the following example -- you invoke the filter function with arguments f (name of a function) and [1,2,3,4] (a list of numbers):

filter f [1,2,3,4]

What you will get is another list with new content. This new content will be based on the input list, but will contain only elements for which the f function returned True. So if you have an f function defined as:

-- A function named `f`, taking one argument `x`, that is defined as
-- `x == 1`.

f x = x == 1

-- This function will return True if `x` is equal to 1. If it's not equal to 1,
-- the function will return False.

after calling filter, the resulting list will contain only one element:

[1]

because f function only returns True for 1. Let's try a different function as an example:

f x = x < 3

After invoking filter f [1,2,3,4] we will create this list:

[1,2]

because only for 1 and 2 the condition x < 3 will yield True.

So, to reiterate, we can build a list with numbers that are divisable by 3 or 5 by using the following Haskell equation:

generateList = filter (\ x -> x `mod` 3 == 0 || x `mod` 5 == 0) [0..999]

It uses a lambda function in the form of (\ x -> [...]). The (\ sign supposedly looks similar to a lambda character, but honestly speaking I fail to see the similarity. Important thing to remember is that (\ is used to begin the definition of a lambda function. x -> is a list of argments -- in our case we're using just one argument, x, and -> simply begins the body of the function. The body is:

x `mod` 3 == 0 || x `mod` 5 == 0

This one is easy to interpret, it will return True if x is divisable by 3 or 5. There is one peculiarity here in the form of the backtick notation for mod operator, but as this seems to be just a syntactical sugar, I'll leave the description of this until another day.

So our invocation of the filter function will invoke our lambda function for each element of the list of numbers from 0 to 999. If the lambda function will return True for an element, this element will be included in the output list.

This means that we have just built a list of elements that are divisable by 3 or 5.

Here is the code:

sumList :: [Int] -> Int
sumList (x:xs) = x + sumList xs
sumList [] = 0

result = sumList $ filter (\ x -> x `mod` 3 == 0 || x `mod` 5 == 0) [0..999]

So why did I use div35 and generateList functions in the first place? To demonstrate a recursive looping operation, which is quite common in Haskell. You would need to learn it anyway!

To run the code, simply save the above program into a file (i.e. 1.hs), invoke ghci and load the program into your current session:

:l 1.hs

Then, just run the result function:

Main*> result

ghci should produce a valid result.