<p><em>Haskell for noobs, written by a noob</em>, antek's tech blog, 2014-12-25 (https://anadoxin.org/blog/haskell-for-noobs-written-by-a-noob.html)</p>
<p>Haskell is a purely functional language. This means that it's different from
"normal" imperative languages like <code>C/C++</code>, <code>Ruby</code> or <code>Python</code>.</p>
<p>In a functional language, the thinking process is different. Instead of
specifying the steps needed to perform an operation, you describe the result
you want to get. Switching your thinking to this different paradigm is most of
the learning curve.</p>
<p>I've recently started learning this functional approach. I came from the
imperative world without any prior knowledge of functional programming, so I'm
writing down my progress in the hope that it will be helpful to you as well.</p>
<p>The first problem is Project Euler problem number 1. It goes like this:</p>
<blockquote>
<p>If we list all the natural numbers below 10 that are multiples of 3 or 5, we get
3, 5, 6 and 9. The sum of these multiples is 23. Find the sum of all the
multiples of 3 or 5 below 1000.</p>
</blockquote>
<p>The algorithm I've used isn't the best or the most efficient one, but it's easy
to understand.</p>
<p>The problem states that we need to find the numbers that are divisible by 3 or 5
among the numbers below 1000, that is, from 0 to 999. So, first we need to
generate this pool of numbers. We start by building an infinite list, from 0 to
infinity:</p>
<pre data-lang="haskell" class="language-haskell "><code class="language-haskell" data-lang="haskell">[0..]
</code></pre>
<p>This builds a list from 0 to infinity. Haskell evaluates it lazily, so at
first only an unevaluated description of the list (a so-called thunk) is kept in
memory, not the list's contents.</p>
<p>Later, when you reference an element of this list, the list is computed on
demand, only as far as needed to reach that element. Deferring work like this has
some bookkeeping overhead, but in exchange only the part of the list that is
actually used ever takes up memory, which is what makes an infinite list possible
at all.</p>
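<p>As a small sketch of on-demand evaluation, here are a few definitions you could
load into <code>ghci</code>. The name <code>naturals</code> and the other top-level
names are hypothetical, chosen only for this illustration:</p>
<pre data-lang="haskell" class="language-haskell "><code class="language-haskell" data-lang="haskell">-- Hypothetical name for the infinite list from the text.
naturals :: [Int]
naturals = [0..]

-- (!!) walks the list from the front, so the list is only computed
-- as far as index 10; everything after that is never evaluated.
eleventh :: Int
eleventh = naturals !! 10      -- 10

-- head only needs the very first element.
firstElement :: Int
firstElement = head naturals   -- 0
</code></pre>
<p>Both values are available instantly, even though <code>naturals</code> has no end.</p>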
<p>You can see what this list looks like in <code>ghci</code>, the interactive mode of
the Glasgow Haskell Compiler, but be prepared to press <code>^C</code> to stop the printing:</p>
<pre><code>$ ghci
...
*Main> [0..]
[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,...
</code></pre>
<p>However, we only need the first 1000 numbers from this list. Haskell has a
function named <code>take</code> that returns the first <code>n</code> elements of a
list:</p>
<pre data-lang="haskell" class="language-haskell "><code class="language-haskell" data-lang="haskell">take 1000 [0..]
</code></pre>
<p>Function application syntax is simply <code>function-name arg1 arg2</code>, with no parentheses or commas.</p>
<p>So, <code>take</code> takes the first <code>1000</code> elements of the infinite list of
numbers from 0 to infinity.</p>
<p>The resulting list is a list of numbers from 0 to 999.</p>
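<p>To convince yourself that <code>take</code> really cuts the infinite list down to a
finite one, you can check its length and its last element. A minimal sketch, where
<code>pool</code>, <code>poolSize</code> and <code>poolLast</code> are hypothetical names:</p>
<pre data-lang="haskell" class="language-haskell "><code class="language-haskell" data-lang="haskell">-- Hypothetical name for the list built in the text.
pool :: [Int]
pool = take 1000 [0..]

-- take stops after 1000 elements, so these are safe to evaluate
-- even though [0..] itself never ends.
poolSize :: Int
poolSize = length pool   -- 1000

poolLast :: Int
poolLast = last pool     -- 999
</code></pre>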
<p>Now that we have a list of numbers, we need to filter out those that are
divisible by neither 3 nor 5. Normally I'd just use an iterator to visit every
item in this list. This, however, implies using <em>state</em>. "State" can also be
thought of as a class field, a mutable local variable, a mutable global variable,
etc. Haskell is a purely functional language, which means that its functions are
<em>stateless</em>: their result depends only on their arguments, and they can't
modify anything. You can't have state in your functions unless you use
<code>monads</code>, but that is a topic for another day. We won't use them, so we
have to build a function that doesn't rely on any state.</p>
<p>One way to generate a list of these numbers is to walk over the input list, take
each item, and if the item is divisible by 3 or 5, append it to the output list.
Repeating this for every element of the input list gives us an output list that
contains exactly the numbers we want. Without mutable state, this "loop" is
written as recursion.</p>
<p>Before writing the body of the function, it's good practice to declare the types
the function operates on.</p>
<pre data-lang="haskell" class="language-haskell "><code class="language-haskell" data-lang="haskell">generateList :: [Int] -> [Int]
</code></pre>
<p>This means: there is a function named <code>generateList</code>, and I'm going to
specify its type (<code>::</code>). Its first argument is <code>[Int]</code>, so it's
a list of Ints. Having specified the type of the first argument, I move on to the
next type (<code>-></code>). Oh wait, it's the last type in the signature, so it's
actually the type of the return value rather than the type of a second argument.
So, the return value is also <code>[Int]</code>. In other words, the function takes
a list of Ints as its argument and returns a list of Ints.</p>
<p>The input list will be <code>take 1000 [0..]</code>, and the output list will be the
list of numbers divisible by 3 or 5.</p>
<pre data-lang="haskell" class="language-haskell "><code class="language-haskell" data-lang="haskell">generateList (x:xs) = div35 x ++ generateList xs
generateList [] = []
</code></pre>
<p>Before analyzing the body of the function and why it's written the way it is,
there are a few other things here worth explaining. Functions in Haskell can be
defined by pattern matching. In the example above, there are two equations, each
with its own pattern, and Haskell uses the equation whose pattern matches the
argument.</p>
<p>First match is defined as:</p>
<pre data-lang="haskell" class="language-haskell "><code class="language-haskell" data-lang="haskell">generateList (x:xs) = ...
</code></pre>
<p>which is a standard notation for matching a <em>non-empty list</em> as an argument. I will get
back to it in a minute.</p>
<p>Second match is defined as:</p>
<pre data-lang="haskell" class="language-haskell "><code class="language-haskell" data-lang="haskell">generateList [] = ...
</code></pre>
<p>This equation is used when <code>generateList</code> is called with an <em>empty
list</em> as its argument. So, if you call the <code>generateList</code> function like this:</p>
<pre data-lang="haskell" class="language-haskell "><code class="language-haskell" data-lang="haskell">generateList []
</code></pre>
<p>then the second equation is used. Its body is just <code>[]</code>, so if
<code>generateList</code> is called with an empty list, it returns an empty
list.</p>
<p>But if <code>generateList</code> is called with an argument that is <strong>not</strong> an empty
list, the first pattern matches, and the body of the first equation is evaluated.</p>
<p>There is a small shortcut notation being used here. <code>(x:xs)</code> matches a
non-empty list and binds two variables, <code>x</code> and <code>xs</code>, to the
first element of the list and the rest of the list, respectively. This means that
if <code>generateList</code> is called as:</p>
<pre data-lang="haskell" class="language-haskell "><code class="language-haskell" data-lang="haskell">generateList [1,2,3,4,5]
</code></pre>
<p>then <code>x</code> will contain <code>1</code>, and <code>xs</code> will contain this list: <code>[2,3,4,5]</code>.</p>
<p>By using these two patterns, <code>(x:xs)</code> and <code>[]</code>, we cover 100% of
the cases, because every list is either empty or not empty. There are no other
kinds of lists.</p>
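<p>To see the two patterns in action on their own, here is a hypothetical
<code>describeList</code> function (not part of the solution) that uses the same
<code>(x:xs)</code> and <code>[]</code> patterns and reports what was bound:</p>
<pre data-lang="haskell" class="language-haskell "><code class="language-haskell" data-lang="haskell">-- Hypothetical helper, using the same two patterns as generateList.
describeList :: [Int] -> String
-- Non-empty list: x is the first element, xs is the rest.
describeList (x:xs) = "first: " ++ show x ++ ", rest: " ++ show xs
-- Empty list.
describeList []     = "empty list"
</code></pre>
<p>For example, <code>describeList [1,2,3,4,5]</code> gives <code>"first: 1, rest: [2,3,4,5]"</code>,
and <code>describeList []</code> gives <code>"empty list"</code>.</p>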
<p>The body of the function invoked for non-empty lists is:</p>
<pre data-lang="haskell" class="language-haskell "><code class="language-haskell" data-lang="haskell">div35 x ++ generateList xs
</code></pre>
<p>This means: call the <code>div35</code> function with <code>x</code> as its argument.
The <code>++</code> operator concatenates two lists into one bigger list. For
example:</p>
<pre data-lang="haskell" class="language-haskell "><code class="language-haskell" data-lang="haskell">[1] ++ [2]       -- evaluates to [1,2]
[1,2] ++ [3]     -- evaluates to [1,2,3]
[1] ++ [2,3]     -- evaluates to [1,2,3]
[1,2] ++ [2,3]   -- evaluates to [1,2,2,3]
</code></pre>
<p>Then, the function recursively calls itself on the remaining part (the "tail")
of the input list. When that call finishes, we get a list. This list is
concatenated, using the <code>++</code> operator, with the result of the
<code>div35</code> function, and the combined list is returned as the result.</p>
<p>A quick look at the <code>div35</code> function makes it fairly clear what it does.
First, let's look at its type: it takes an Int as an argument and returns a list
of Ints, which may be empty.</p>
<pre data-lang="haskell" class="language-haskell "><code class="language-haskell" data-lang="haskell">div35 :: Int -> [Int]
div35 x
  | x == 0         = []
  | x `mod` 3 == 0 = [x]
  | x `mod` 5 == 0 = [x]
  | otherwise      = []
</code></pre>
<p>If <code>x</code> is <code>0</code>, the function returns <code>[]</code>. If <code>x</code> is divisible by <code>3</code>, it
returns a list with one element inside: <code>x</code>. The same goes for <code>x</code> divisible by <code>5</code>:
it returns a list with that number as its sole element. If none of these
conditions holds, the <code>otherwise</code> guard matches and the function returns an empty list.</p>
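<p>Yes, this looks a lot like an <code>if</code>/<code>else if</code> chain (or a <code>switch</code> statement) from C/C++ or similar languages: the guards are checked from top to bottom, and the first one that is <code>True</code> wins.</p>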
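<p>To make the comparison concrete, here is a sketch that puts <code>div35</code> next to a
hypothetical <code>div35If</code>, which expresses the same logic as an
<code>if</code>/<code>else if</code> chain. Both should return the same result for every input:</p>
<pre data-lang="haskell" class="language-haskell "><code class="language-haskell" data-lang="haskell">-- div35 as defined in the text, using guards.
div35 :: Int -> [Int]
div35 x
  | x == 0         = []
  | x `mod` 3 == 0 = [x]
  | x `mod` 5 == 0 = [x]
  | otherwise      = []

-- Hypothetical equivalent written as an if/else-if chain.
-- Like the guards, the conditions are checked from top to bottom.
div35If :: Int -> [Int]
div35If x =
  if x == 0 then []
  else if x `mod` 3 == 0 then [x]
  else if x `mod` 5 == 0 then [x]
  else []
</code></pre>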
<p>To get a clearer view of this process, let's trace it by hand.</p>
<p>Let's invoke the <code>generateList</code> function with the list of <code>[1,2,3]</code> as an
argument:</p>
<pre data-lang="haskell" class="language-haskell "><code class="language-haskell" data-lang="haskell">generateList [1,2,3]
</code></pre>
<p>It got a non-empty list as an argument, so the first equation is used: <code>x</code>
is <code>1</code> and <code>xs</code> is <code>[2,3]</code>. The expression to evaluate is therefore:</p>
<pre data-lang="haskell" class="language-haskell "><code class="language-haskell" data-lang="haskell">div35 1 ++ generateList [2,3]
</code></pre>
<p><code>div35 1</code> returns an empty list, because <code>1</code> is divisible by neither
<code>3</code> nor <code>5</code>.</p>
<p>Then, <code>generateList</code> recursively calls itself with <code>[2,3]</code> as the
argument. Let's step into it.</p>
<p>In a new frame, <code>x</code> is <code>2</code> and <code>xs</code> is <code>[3]</code>. Again, the
expression to evaluate is:</p>
<pre data-lang="haskell" class="language-haskell "><code class="language-haskell" data-lang="haskell">div35 2 ++ generateList [3]
</code></pre>
<p><code>div35 2</code> returns an empty list again, and the function calls itself once
more, this time with the list <code>[3]</code> as the argument.</p>
<p>So, we're in a new frame of our function again. <code>x</code> is 3, and <code>xs</code> is an empty list:
<code>[]</code>. Yet again, the expression to evaluate is:</p>
<pre data-lang="haskell" class="language-haskell "><code class="language-haskell" data-lang="haskell">div35 3 ++ generateList []
</code></pre>
<p><code>div35 3</code> returns <code>[3]</code>! Because <code>3</code> is divisible by <code>3</code> (obviously), <code>div35</code>
built a new list with only one item inside: the same number that was passed as
the argument. Then the function recursively calls itself once more.</p>
<p>But this time, the argument is an empty list, so the second equation is used:
the one whose pattern is the empty list. This equation doesn't call the function
recursively anymore; it simply returns an empty list, so we get our result
immediately. So, <code>div35 3</code> returned <code>[3]</code>, and <code>generateList []</code> returned
an empty list, <code>[]</code>. We have both lists, so we can concatenate them:</p>
<pre data-lang="haskell" class="language-haskell "><code class="language-haskell" data-lang="haskell">[3] ++ []   -- evaluates to [3]
</code></pre>
<p>and we return a new list constructed this way to the previous frame.</p>
<p>In the previous frame, <code>div35 2</code> returned an empty list, and <code>generateList [3]</code>
returned <code>[3]</code>. This means that we can merge them:</p>
<pre data-lang="haskell" class="language-haskell "><code class="language-haskell" data-lang="haskell">[] ++ [3]   -- evaluates to [3]
</code></pre>
<p>and return this new list to our previous frame.</p>
<p>In the previous frame, <code>div35 1</code> returned an empty list as well, and
<code>generateList [2,3]</code> returned <code>[3]</code> (we just calculated this). By concatenating
<code>[]</code> and <code>[3]</code>, we get <code>[3]</code>.</p>
<p>So, we return it, and we're back in the caller's frame. <code>generateList [1,2,3]</code>
has just returned <code>[3]</code>. And this is a valid result: it's the list of numbers
from the input list that are divisible by 3 or 5. Only 3 meets this criterion, so
our function works correctly.</p>
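<p>If you'd like to replay this trace in <code>ghci</code>, here are the two functions
from this section gathered into one loadable snippet. Applying
<code>generateList</code> to the numbers below 10 should also reproduce the multiples
from the problem statement: 3, 5, 6 and 9.</p>
<pre data-lang="haskell" class="language-haskell "><code class="language-haskell" data-lang="haskell">-- Returns [x] if x is a non-zero multiple of 3 or 5, otherwise [].
div35 :: Int -> [Int]
div35 x
  | x == 0         = []
  | x `mod` 3 == 0 = [x]
  | x `mod` 5 == 0 = [x]
  | otherwise      = []

-- Keeps the elements of the input list that div35 accepts.
generateList :: [Int] -> [Int]
generateList (x:xs) = div35 x ++ generateList xs
generateList []     = []
</code></pre>
<p>In <code>ghci</code>, <code>generateList [1,2,3]</code> gives <code>[3]</code>, and
<code>generateList [0..9]</code> gives <code>[3,5,6,9]</code>.</p>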
<p>By using exactly the same method, we can build a function that will sum all of
the numbers in the input list:</p>
<pre data-lang="haskell" class="language-haskell "><code class="language-haskell" data-lang="haskell">sumList :: [Int] -> Int
sumList (x:xs) = x + sumList xs
sumList [] = 0
</code></pre>
<p>This is exactly the same method as above, so I'll just skip the analysis.</p>
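<p>Since the analysis is skipped, here is a sketch of how <code>sumList</code> unfolds for
the multiples from the problem statement, written out as comments next to the
definition. The name <code>example</code> is hypothetical, and the final sum matches
the 23 given in the problem:</p>
<pre data-lang="haskell" class="language-haskell "><code class="language-haskell" data-lang="haskell">sumList :: [Int] -> Int
sumList (x:xs) = x + sumList xs
sumList []     = 0

-- Unfolding sumList [3,5,6,9] one equation at a time:
--   sumList [3,5,6,9]
--   = 3 + sumList [5,6,9]
--   = 3 + (5 + sumList [6,9])
--   = 3 + (5 + (6 + sumList [9]))
--   = 3 + (5 + (6 + (9 + sumList [])))
--   = 3 + (5 + (6 + (9 + 0)))
--   = 23
example :: Int
example = sumList [3,5,6,9]
</code></pre>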
<p>Now we have all the functions we need to solve the problem. We just need to use
them.</p>
<p>Let's define <code>result</code>, which uses them to calculate the solution. In short,
we need to generate an input list of numbers, and <code>generateList</code> will filter
out all the numbers that are not divisible by 3 or 5. Then we pass the new list
produced by <code>generateList</code> as the argument to the <code>sumList</code> function,
which returns a single Int: the sum of all elements in its input list.</p>
<pre data-lang="haskell" class="language-haskell "><code class="language-haskell" data-lang="haskell">result = sumList (generateList (take 1000 [0..]))
</code></pre>
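<p>Using the <code>$</code> operator, which is function application with the lowest precedence, this can be written without the nested parentheses:</p>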
<pre data-lang="haskell" class="language-haskell "><code class="language-haskell" data-lang="haskell">result = sumList $ generateList $ take 1000 [0..]
</code></pre>
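<p>To check that the two spellings are equivalent, here is a minimal sketch with a
hypothetical pipeline (unrelated to the Euler problem). Because <code>$</code> is
right-associative, <code>f $ g $ x</code> is parsed as <code>f (g x)</code>, so both
definitions below compute the same value:</p>
<pre data-lang="haskell" class="language-haskell "><code class="language-haskell" data-lang="haskell">-- Hypothetical pipeline: the sum of the first ten natural numbers,
-- reversed first just to have three functions in the chain.
withParens :: Int
withParens = sum (reverse (take 10 [0..]))

-- The same pipeline using ($) instead of nested parentheses.
withDollar :: Int
withDollar = sum $ reverse $ take 10 [0..]
</code></pre>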
<p>And we're done! <code>result</code> is a single number: the sum of all elements of
the list generated by <code>generateList</code>.</p>
<p>Our work is done ;).</p>
<p>But wait, there's more.</p>
<h2 id="bad-code-list-generation">Bad code: list generation</h2>
<p>Actually, our list generation method isn't very nice. Instead of taking 1000 numbers
from an infinite list, we can simply generate a finite list by using this
syntax:</p>
<pre data-lang="haskell" class="language-haskell "><code class="language-haskell" data-lang="haskell">[0..999]
</code></pre>
<p>This generates a finite list with elements from 0 to 999. So why did we use an
infinite list in the first place? To show you that Haskell has no problem with
infinite lists, because it uses lazy evaluation by default!</p>
<h2 id="bad-code-checking-for-divisors">Bad code: checking for divisors</h2>
<p>We can't avoid checking every element for divisibility, but we can certainly
make the code more compact. We don't really need the <code>div35</code> and
<code>generateList</code> functions; they're reinventing the wheel. Instead, we can
use two Haskell features: the <code>filter</code> function and a lambda function.</p>
<p>You probably already know what a lambda function is. In Python, it's a small
anonymous function that can be written inline, for example as an argument to
another function. C++ also has lambda functions, in the form of
<code>auto func = [&] (Args...) { body; }</code>. If you come from Java, you can
think of a lambda function as an anonymous class containing just a single
method.</p>
<p>A common use of lambda functions is passing a small piece of behavior to another
function. If a higher-order function (a function that takes another function as an
argument) requires a function as one of its arguments, you can pass the name of a
function defined elsewhere, but you can also pass a lambda function instead. This
way, you write the body of the function right where it is used. Neat.</p>
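<p>Here is a small sketch of the difference, using a hypothetical higher-order
function <code>applyTwice</code> (not from any library) that applies the function it
receives two times. We can pass it either a named function or an equivalent
lambda:</p>
<pre data-lang="haskell" class="language-haskell "><code class="language-haskell" data-lang="haskell">-- Hypothetical higher-order function: applies f to x twice.
applyTwice :: (Int -> Int) -> Int -> Int
applyTwice f x = f (f x)

-- A named function, defined separately...
addThree :: Int -> Int
addThree n = n + 3

byName :: Int
byName = applyTwice addThree 10        -- 16

-- ...or the same function written inline as a lambda.
byLambda :: Int
byLambda = applyTwice (\n -> n + 3) 10 -- 16
</code></pre>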
<p>The <code>filter</code> function works like this: it takes a function as its first
argument and an input list as its second argument. Then, for each element of the
list, <code>filter</code> calls the supplied function with that element as its
argument. Consider the following example, where you call <code>filter</code> with the
arguments <code>f</code> (the name of a function) and <code>[1,2,3,4]</code> (a list of
numbers):</p>
<pre data-lang="haskell" class="language-haskell "><code class="language-haskell" data-lang="haskell">filter f [1,2,3,4]
</code></pre>
<p>What you will get is another list with new content. This new content will be
based on the input list, but will contain only elements for which the <code>f</code>
function returned <code>True</code>. So if you have an <code>f</code> function defined as:</p>
<pre data-lang="haskell" class="language-haskell "><code class="language-haskell" data-lang="haskell">-- A function named `f`, taking one argument `x`, that is defined as
-- `x == 1`.
f x = x == 1
-- This function will return True if `x` is equal to 1. If it's not equal to 1,
-- the function will return False.
</code></pre>
<p>after calling <code>filter f [1,2,3,4]</code>, the resulting list will contain only one element:</p>
<pre data-lang="haskell" class="language-haskell "><code class="language-haskell" data-lang="haskell">[1]
</code></pre>
<p>because <code>f</code> returns <code>True</code> only for <code>1</code>. Let's try a different
function as an example:</p>
<pre data-lang="haskell" class="language-haskell "><code class="language-haskell" data-lang="haskell">f x = x < 3
</code></pre>
<p>After calling <code>filter f [1,2,3,4]</code>, we get this list:</p>
<pre data-lang="haskell" class="language-haskell "><code class="language-haskell" data-lang="haskell">[1,2]
</code></pre>
<p>because the condition <code>x < 3</code> is <code>True</code> only for <code>1</code> and <code>2</code>.</p>
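<p>Both examples can be checked in <code>ghci</code>. In this sketch, the two versions of
<code>f</code> get the hypothetical names <code>isOne</code> and <code>lessThanThree</code> so that
they can live in the same file:</p>
<pre data-lang="haskell" class="language-haskell "><code class="language-haskell" data-lang="haskell">-- The first f from the text: True only for 1.
isOne :: Int -> Bool
isOne x = x == 1

-- The second f from the text: True for anything below 3.
lessThanThree :: Int -> Bool
lessThanThree x = x < 3

onlyOnes :: [Int]
onlyOnes = filter isOne [1,2,3,4]          -- [1]

smallOnes :: [Int]
smallOnes = filter lessThanThree [1,2,3,4] -- [1,2]
</code></pre>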
<p>So, putting this together, we can build a list of the numbers from 0 to 999 that
are divisible by 3 or 5 with the following Haskell definition:</p>
<pre data-lang="haskell" class="language-haskell "><code class="language-haskell" data-lang="haskell">generateList = filter (\ x -> x `mod` 3 == 0 || x `mod` 5 == 0) [0..999]
</code></pre>
<p>It uses a lambda function of the form <code>(\ x -> ...)</code>. The backslash
<code>\</code> is supposed to resemble the Greek letter lambda (λ), but honestly, I
fail to see the similarity. The important thing to remember is that <code>\</code>
begins the definition of a lambda function; the surrounding parentheses just
delimit it. <code>x</code> is the list of arguments (in our case, just one argument),
and <code>-></code> begins the body of the function. The body is:</p>
<pre data-lang="haskell" class="language-haskell "><code class="language-haskell" data-lang="haskell">x `mod` 3 == 0 || x `mod` 5 == 0
</code></pre>
<p>This one is easy to interpret: it returns <code>True</code> if <code>x</code> is divisible by <code>3</code>
or <code>5</code>. The one peculiarity here is the backtick notation around
<code>mod</code>, but it's just syntactic sugar: wrapping a two-argument function in
backticks lets you write it between its arguments, so <code>x `mod` 3</code> means the
same as <code>mod x 3</code>.</p>
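<p>As a sketch, here is the same lambda given a hypothetical name,
<code>isMultipleOf3Or5</code>, together with a check that the backtick and prefix
spellings of <code>mod</code> agree. Note that, unlike <code>div35</code>, this predicate
also accepts <code>0</code>; that doesn't change the sum, since adding 0 has no
effect:</p>
<pre data-lang="haskell" class="language-haskell "><code class="language-haskell" data-lang="haskell">-- The lambda from the text, given a (hypothetical) name.
isMultipleOf3Or5 :: Int -> Bool
isMultipleOf3Or5 x = x `mod` 3 == 0 || x `mod` 5 == 0

-- The problem statement's example: multiples of 3 or 5 below 10.
-- 0 is included here, because 0 `mod` 3 == 0.
below10 :: [Int]
below10 = filter isMultipleOf3Or5 [0..9]   -- [0,3,5,6,9]

-- Backticks only change how mod is written, not what it does.
backtickSame :: Bool
backtickSame = (7 `mod` 3) == mod 7 3      -- True
</code></pre>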
<p>So our call to <code>filter</code> invokes our lambda function for each element of the
list of numbers from 0 to 999. If the lambda function returns <code>True</code> for an
element, that element is included in the output list.</p>
<p>This means that we have just built a list of the elements that are divisible by
3 or 5.</p>
<p>Here is the code:</p>
<pre data-lang="haskell" class="language-haskell "><code class="language-haskell" data-lang="haskell">sumList :: [Int] -> Int
sumList (x:xs) = x + sumList xs
sumList [] = 0
result = sumList $ filter (\ x -> x `mod` 3 == 0 || x `mod` 5 == 0) [0..999]
</code></pre>
<p>So why did I use <code>div35</code> and <code>generateList</code> functions in the first place? To
demonstrate a recursive looping operation, which is quite common in Haskell. You
would need to learn it anyway!</p>
<p>To run the code, simply save the above program into a file (e.g. <code>1.hs</code>), start
<code>ghci</code>, and load the program into your current session:</p>
<pre data-lang="haskell" class="language-haskell "><code class="language-haskell" data-lang="haskell">:l 1.hs
</code></pre>
<p>Then, just evaluate <code>result</code>:</p>
<pre data-lang="haskell" class="language-haskell "><code class="language-haskell" data-lang="haskell">*Main> result
</code></pre>
<p><code>ghci</code> should produce a valid result.</p>