Dan Newcome, blog

I'm bringing cyber back

Archive for February 2012

Running programs as Linux daemons using Upstart

with 2 comments

In the Windows world, any long-running background process on a machine is handled as a Windows Service. This is a well-known programming interface that allows the system to coordinate startup and shutdown procedures and allows an administrator to use the service management tools to control and log details of the process.

In Linux, from the very beginning there was the idea of a daemon, which is conceptually the same as what Windows calls a service. However, a daemon doesn’t really have any one definite meaning. For the most part, when a process “daemonizes” on its own, that means that it forks and dissociates itself from the controlling terminal [1]. However, most of the time we aren’t directly calling a process like this when we boot a system or want to manage a process, which brings us to tools like init or Upstart.

The classic way of starting daemons on a Unix is using init. Init is the first process that is started after booting the kernel [2]. Init is responsible for starting all of the rest of the processes that need to be started when the system boots. The scripts and system that support the use of init are typically referred to as a “System V” init system (sysv). BSD Unix and some Linux distributions (Slackware) use a simpler init system that is referred to as BSD init.

So why am I explaining all of this stuff? Despite their structural differences, all init-based systems are fundamentally shell scripts that are scheduled or controlled in different ways. In order to hook a new program in as a system daemon we need to write some wrapper scripts around the program and tell init about it. That’s really all. However, managing processes is tricky, and you have to be very careful when writing these scripts. Because init scripts are so difficult to get right, Ubuntu and later versions of Red Hat have started using an alternative system called Upstart.

Using Upstart, we only need to worry about a single configuration file, which by convention lives in /etc/init and is named service.conf, where service is the name of the service that the file describes. The format of a conf file is mostly declarative, with hooks for inserting arbitrary shell code. The quickest way to get a service mostly correct is to copy and modify one of the existing config files.

My experience is that Upstart is way easier than writing init scripts, so use it as your init if that is at all possible. I used the following template script for my new daemon [3]:

# myservice - myservice job file

description "my service description"
author "Me <myself@i.com>"

# Stanzas
#
# Stanzas control when and how a process is started and stopped
# See a list of stanzas here: http://upstart.ubuntu.com/wiki/Stanzas#respawn

# When to start the service
start on runlevel [2345]

# When to stop the service
stop on runlevel [016]

# Automatically restart process if crashed
respawn

# Essentially lets upstart know the process will detach itself to the background
expect fork

# Run before process
pre-start script
    [ -d /var/run/myservice ] || mkdir -p /var/run/myservice
    echo "Put bash code here"
end script

# Start the process
exec myprocess

In my case, the process I was running was not already a daemon, meaning that it didn’t fork into the background when run – it blocks indefinitely. So the very first thing I had to do was to remove the line:

expect fork

I didn’t use a PID (process identifier) file to control the service, so the pre-start script wasn’t necessary. If I have issues with startup/shutdown I might use a PID file in the future. Even without one, Upstart keeps track of the process it started, so it can tell that the service is already running and can kill it.

One other thing that got me initially was that the “exec” line is an Upstart stanza in this case, and not the shell’s exec command. In fact, in order to run more than one line of shell you need a stanza that uses script/end script:

script
  . /etc/default/hal
  exec /usr/sbin/hald --daemon=no $DAEMON_OPTS
end script

You seem to get a lot for free with Upstart. I’ll update things here if I find any drawbacks to keeping it this simple, but for now things are working well.

Written by newcome

February 26, 2012 at 2:43 pm

Posted in Uncategorized

Clojure and the Classpath


Since Clojure is built on Java, and Java depends on the classpath to find code that it wants to load and execute, we are stuck using the Classpath when we run Clojure code.

Usually we use Leiningen to manage the Classpath. “Lein”, as we’ll call it from now on, uses a small Clojure file as a configuration file. This file is called project.clj.

Project.clj defines the code locations for Clojure as well as declaring the external dependencies. These dependencies are kept in a lib/ folder under the project path.

All of this stuff is great when it’s set up the way you want it. However, I’ve recently wanted to be able to fire up the Clojure read-eval-print loop (REPL) on random code at will in the project. Typically we’d be able to use Lein to run code like this, but you need a “main” function defined. I don’t want to keep redefining this function in the project.clj file over and over.

What I want to be able to do is call clojure on a source file and load it, dropping immediately into the REPL.

$ clojure foo.clj

This is likely to go down in flames if there are any external dependencies. In my case I wanted to get all of the jar files into the classpath that are currently under lib/. To do this we can use a shell script like the following:

$ find ../lib -exec echo -n {}: \;

Which gives us something like:

../lib:../lib/commons-collections-3.2.1.jar:../lib/jsr305-1.3.9.jar:../lib/snakeyaml-1.8.jar:

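We could probably put a wildcard like this on the classpath instead of constructing it as above (Java 6 and later expand a classpath entry ending in /* to every jar in that directory, as long as it is quoted so the shell leaves it alone), but I haven’t tested this: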

../lib/*

As a side note, this is what my Clojure launch script looks like. It is essentially the same script you get when installing clojure on Ubuntu, but I’ve added ‘rlwrap’ so that I get GNU Readline functionality in the Clojure REPL. This gives you things like command history, which I find to be invaluable.

exec rlwrap java -cp /usr/share/java/clojure.jar clojure.main "$@"

To use the classpath script I’m just setting the CLASSPATH environment variable in front of the clojure execution.

CLASSPATH=`./classpath.sh`./:../resources clojure program.clj

Another note on the classpath is that I have added ../resources to the path. This happens to be a location where some Java .properties files are kept. These configuration files are read from a “well known location” by the Java runtime, which basically means that the Classpath is searched to find them. So we add this folder to the classpath.

So this still doesn’t get us into the REPL. It will load the file and drop back to the shell.

We could load the file manually from the REPL like this:

user=> (load-file "program.clj")
#'program/start

This shows us that the file has been loaded and that the function “program/start” has been defined. We can then call it using:

user=> (program/start)

So I don’t know how to avoid this manual step yet. Using clojure -r doesn’t seem to work. This should drop you into the REPL, but apparently it ignores the file when given.

One solution would be to write a small clojure script that takes the argument and loads the file and drops you into the REPL. I’ll save that for next time.

Written by newcome

February 16, 2012 at 11:08 am

Posted in Uncategorized

Functions vs macros in Clojure

with 2 comments

In my third post on the Clojure programming language I’m going to cover macros. I have a function that I converted to my very first macro in Clojure. I’m going to tell the story here, because it sort of made macros a lot less scary.

To begin the story, let’s consider a block of code that I wrote to write some data out to a file. I’ve changed the data to a simple foo=”bar” key-value pair for the sake of this discussion. Here is the code:

(use 'clojure.java.io)
(with-open [wrtr (writer "foo.out")]
    (.write wrtr (str {:foo "bar"})))

For those unfamiliar with Clojure idioms, the ‘with-open’ macro is a way to open a file using an underlying Java Writer and automatically close it when we are finished. The Writer is created in the ‘let’-style binding vector with the ‘writer’ function. This is very similar to the ‘using’ construct in C#. In C# we would have said something like:

using (FileStream fs = new FileStream( "foo.out", FileMode.Append, FileAccess.Write )) {
    // ... use fs here ...
} // fs is disposed once we leave the scope

Here, inside the ‘using’ statement’s resource acquisition section, we create the FileStream. Similarly, in the ‘with-open’ binding form we create a binding, wrtr, for the Writer returned by the ‘writer’ function.

So my next step in this process was to create a function that took another function and a filename. I wanted the function to be able to evaluate the given function and write the results to a new file with the given name. Here is my first attempt:

(defn write-results-to-file [fn name]
    (with-open [wrtr (writer name)]
        (.write wrtr (str(fn)))))

Here is an example usage:

(write-results-to-file #(str "<?xml version='1.0'?>" "<root><child>txt</child></root>") "foo.out")

That’s a little bit contrived, as we use ‘str’ to do a string concatenation as our function. But if we think that maybe we’d end up with a function like ‘create-xml-preamble’ to spit out the XML processing instruction for us, it makes more sense.

So what does this have to do with macros? Notice when I used the above function, in order for it to work correctly I had to structure my first argument as a lambda function. Take another look at the reader macro form used – #(). This takes the contents and wraps it in an anonymous function definition.

From experience we know that other Clojure forms like ‘if’ are able to take pieces of code without evaluating them right away, so there must be a way for us to write something like:

(write-results-to-file (str "<?xml version='1.0'?>" "<root><child>txt</child></root>") "foo.out")

The only difference is that we can just use a normal Clojure form as the first argument, without creating a lambda function. With the function version, though, Clojure evaluates the ‘str’ form before calling ‘write-results-to-file’, so we end up trying to call a string as if it were a function, giving us an error like:

ClassCastException java.lang.String cannot be cast to clojure.lang.IFn  user/eval12 (NO_SOURCE_FILE:16)
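
The same failure mode can be sketched in Javascript, which also evaluates arguments before the call is made. This is only an analogy: writeResults and its array “sink” are hypothetical stand-ins for write-results-to-file and the output file.

```javascript
// Hypothetical stand-in for write-results-to-file: calls `fn` and stores
// the stringified result in `sink` (an array, instead of a real file).
function writeResults(fn, sink) {
    sink.push(String(fn()));
}

var sink = [];

// Passing a function works: the body runs only when writeResults calls it.
writeResults(function () { return "<?xml version='1.0'?>" + "<root/>"; }, sink);

// Passing an expression does not: the concatenation is evaluated first,
// so writeResults receives a string and fails when it tries to call it.
var error = null;
try {
    writeResults("<?xml version='1.0'?>" + "<root/>", sink);
} catch (e) {
    error = e;
}

console.log(sink.length);                 // 1
console.log(error instanceof TypeError);  // true
```

Javascript has no way around this short of wrapping the expression in a function, which is exactly the limitation that a macro removes in Clojure.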

So let’s try to write a macro. For the first attempt I just took my function and put the body into the macro quoted with backtick:

(defmacro write-to-file-macro [fn name]
    `(with-open [wrtr# (writer ~name)]
        (.write wrtr# (str(~fn))))
)

It looks just like the function, but the body is quoted so it won’t be evaluated right away. The macro’s arguments need to be unquoted so that they are replaced by the forms passed in when the macro is expanded. This is done using a tilde in front of each name. There is one other small thing: the local symbols introduced in the template get a hash appended, so ‘wrtr’ becomes ‘wrtr#’. This creates unique symbols and is a shortcut for calling ‘gensym’. Otherwise the expansion could clash with a symbol of the same name in the calling code.

So, let’s use the macro.

(write-to-file-macro #(str "<?xml version='1.0'?>" "<root><child>txt</child></root>") "foo.out")

We still have to pass a lambda in. What gives? All we have to do now is use a different unquoting method:

(defmacro write-to-file-macro2 [fn name]
    `(with-open [wrtr# (writer ~name)]
        (.write wrtr# (str (~@fn))))
)

Note that the only difference is that we use ~@fn instead of ~fn. This causes the argument to be spliced inline. Expanding the two macros looks like this:

(macroexpand-1 '(write-to-file-macro #(str "f") "foo.out"))

(clojure.core/with-open [wrtr__12__auto__ (clojure.java.io/writer foo.out)] (.write wrtr__12__auto__ (clojure.core/str ((fn* [] (str f))))))

(macroexpand-1 '(write-to-file-macro2 #(str "f") "foo.out"))

(clojure.core/with-open [wrtr__18__auto__ (clojure.java.io/writer foo.out)] (.write wrtr__18__auto__ (clojure.core/str (fn* [] (str f)))))

It’s hard to see but the only difference between the two is that the ‘fn’ argument in the second one is not in its own list, that is, there is one fewer set of parentheses.

I noticed that the macros in the Clojure source code are constructed using ‘list’ rather than quoted templates. In some cases it can be cleaner. Here is the ‘when’ macro:

(defmacro when
  "Evaluates test. If logical true, evaluates body in an implicit do."
  [test & body]
  (list 'if test (cons 'do body)))

Written by newcome

February 12, 2012 at 12:13 pm

Posted in Uncategorized

Clojure lazy sequences, ISeq

with one comment

Functional languages like Clojure support lazy evaluation of expressions. This contrasts sharply with languages like C#, where every expression is evaluated immediately. To get something that resembles a lazy sequence in C#, we would use IEnumerable and yield.

Ok, so I should probably dig into comparing Clojure sequences with C# Iterators, but I’m going to do that later on. First I want to explore Clojure sequences, since they are a little more slippery than I first imagined.

The Clojure docs would initially have you believe that every data structure provided by the core library is already a sequence, implementing ISeq. This isn’t strictly true. Let’s take a look at a few examples:

Is a vector a seq?

user=> (seq? [1 2])
false

Nope. How about a map?

user=> (seq? {:foo 1})
false

Again, no. What is the deal? Aren’t all of these things supposed to support ISeq? Ok, what about a list?

user=> (seq? (list 2 3))
true

Ok, now we’re getting somewhere. So a list implements ISeq by default. So, since a vector isn’t a seq, we shouldn’t be able to take the first item using ‘first’ right? Let’s try it:

user=> (first [1 2])
1

Oops. What’s going on? Reading the docs more closely reveals that ‘first’ actually calls a function called ‘seq’ on its argument before taking the first item. So the following expression is equivalent:

user=> (first (seq [1 2]))
1

Now, what does it look like when we create a sequence from a vector?

user=> (seq [2 3])
(2 3)

It looks like a list. Let’s see if it is.

user=> (= (seq [2 3]) (list 2 3))
true

Wow. Let’s double-check that result.

user=> (= (seq [2 3]) [2 3])
true

Ok, that’s strange. We know that a vector is not a seq by default, so there must be some coercion going on here.

So now how about using lazy sequences? Let’s create an infinite sequence. We can do this easily using something like ‘cycle’ which takes some data structure and returns an infinitely repeating sequence of the given values. For example:

user=> (take 5 (cycle [1 2 ]))
(1 2 1 2 1)

If we don’t ‘take’ just a few elements, this will repeat forever. Let’s check what we assume is the case:

user=> (seq?(cycle [2 3 ]))
true

What do you know? ‘cycle’ returns a seq. In the above sample, the cycle is obviously not fully computed before we ‘take’ our result, otherwise it would never finish. So cycle returns a lazy seq.
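
For readers who think in Javascript, the same behavior can be sketched with a generator function. This is only an analogy, not how Clojure implements ‘cycle’; the cycle and take functions below are hypothetical helpers written for this sketch.

```javascript
// Hypothetical Javascript analogue of Clojure's cycle: a generator that
// yields the items of `items` over and over, one value per request.
function* cycle(items) {
    while (items.length > 0) {
        for (const item of items) {
            yield item;
        }
    }
}

// Analogue of take: pulls at most `n` values from any iterable.
function take(n, iterable) {
    const result = [];
    if (n <= 0) {
        return result;
    }
    for (const value of iterable) {
        result.push(value);
        if (result.length >= n) {
            break;
        }
    }
    return result;
}

console.log(take(5, cycle([1, 2]))); // [ 1, 2, 1, 2, 1 ]
```

As with the Clojure version, nothing is computed until ‘take’ asks for the next value, which is why the infinite loop inside the generator is harmless.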

Later on I’m going to explore laziness in Clojure and expand on some of these observations.

Written by newcome

February 11, 2012 at 4:59 pm

Posted in Uncategorized

Functionally speaking with Clojure and Javascript

with one comment

I’ve been playing around with Clojure a lot more recently as a result of a new project that I’m working on. I have played around with Clojure before, especially in the context of the recent .NET port of Clojure.

I considered myself to be pretty familiar with functional programming ideas: higher-order functions, function application, etc. These are all technically things that I do all the time with Javascript.

So I’ve also played around with F# and Haskell, which would probably be more interesting comparisons here, but since I know Javascript so well, and I’ve written a lot about JS in the past, I think I’ll see how far I can go in a comparison using what I’ve learned so far about Clojure. I’m going to start off with some similarities, but later on I want to address some things that are fundamentally different like lazy evaluation and immutable data structures.

First off, one of the most useful things functionally about Javascript is the ability to define an anonymous function, or lambda, shown here in the following code snippet:

function( x, y) {
    return x + y;
}

Guess what that does? Yep, it is a function that sums its arguments. In Javascript it’s very nice to be able to define a lambda in almost exactly the same way that we would normally define a regular named function. In fact, it is useful to think of this lambda definition as simply a language-supported constructor function that we call and leave the return value unbound.

Just for completeness, here is the same function, bound to the name ‘sum’ to be called later.

var sum = function( x, y) {
    return x + y;
};

There is also the alternative construct, where the name is ‘magically’ bound without the use of the assignment operator:

function sum( x, y) {
    return x + y;
}

I showed this last form since it reflects the Clojure example that I’m about to show.

In Clojure, we would define the named function sum using a macro called ‘defn’ that is provided by the Clojure system. What is a macro? Let’s not worry about that for the moment. Using a macro looks just like calling a function in this case, so let’s just think of the following sample as us calling a constructor function like we imagine the Javascript ‘function’ keyword to be.

(defn sum [x y] 
  (+ x y)
)

Ok, I have formatted the code so that it follows more closely the C-style indentation convention that I commonly use for Javascript. This is to more clearly show the parallels between what we wrote in JS with the Clojure example.

If we think of ‘defn’ as our ‘function’ language construct in JS, we can see that syntactically, the scope braces have been moved out around the entire expression and we don’t seem to explicitly be returning anything. Well we’ve definitely seen the latter in other familiar languages like Ruby. Of course the function evaluates to the result of the last expression in the function. The argument list is actually given as one of Clojure’s built-in data types called a vector. In JS the specification of the argument list is supported by the language parser. In Clojure it is just a regular data structure that is passed to the macro. This is an important distinguishing factor of Clojure – that is, the language is homoiconic. Clojure code is actually expressed in terms of Clojure data structures.

Ok enough philosophy. What about lambda functions? Well it turns out that there are two ways to express them in Clojure. Not surprisingly one maps closely to the other. The difference is that one is implemented as a macro and is more verbose, and the other is less verbose as a result of being implemented at the reader level. What is the reader? I don’t have enough space to go into that here, but suffice it to say that, much like in Javascript when code is evaluated as the file is parsed, there are several stages at which Clojure can evaluate code other than at runtime proper. One of these times is during the ‘reading’ of the file. This feature allows us to express some commonly-used constructs very concisely as we will see in a minute.

Using the ‘fn’ macro looks like this:

(fn [x y] 
  (+ x y)
)

The same explanations that I gave above for a named function apply here. The biggest difference (apart from having a different macro name) is that, wait for it, it doesn’t take an argument for the function name. The function definition is returned as a result of macro evaluation in both cases, but in the case of ‘defn’, ‘def’ is used to bind the result to an externally-accessible name.

We could actually mimic the behavior of ‘defn’ using a combination of ‘def’ and the lambda macro ‘fn’, as in the following code snippet:

(def sum (fn [x y] 
  (+ x y)
))

In my mind this is analogous to the Javascript example in which I used variable assignment to bind the function to a name:

var sum = function( x, y) {
    return x + y;
};

What about the more concise version using reader macros that I alluded to earlier? Well, I’ll drop this on you, but I don’t have a very good way to explain it other than the reader sees the special sequence #( and internally converts it to the form using ‘fn’ that we saw above.

#(+ %1 %2)

That’s all there is to it. The percent characters denote the positional arguments to the function. No need to explicitly define the argument list other than to name them when used. To see what the reader produces as output we can quote the form like this:

'#(+ %1 %2)
(fn* [p1__297# p2__298#] (+ p1__297# p2__298#))

The single quote character prevents the expression from being evaluated so that we can see what it looks like first. The argument names have been automatically generated to avoid conflicts (I think this is similar to the idea of ‘gensym’ in Lisp). I’m not sure what the difference between ‘fn’ and ‘fn*’ is at this point. Structurally, the generated code looks just like what we wrote in the first example using ‘fn’.

Ok so that’s lambdas.

JS doesn’t support lazy evaluation directly in the language, but since we can do higher-order functions, I think we could fake it if we wanted to. Immutability as it exists in Clojure is much harder to get in JS. It could be done by making deep copies of every data structure every time a change is made, but that costs time proportional to the size of the structure, rather than the near-constant-time updates that Clojure’s persistent data structures provide.
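
Here is a rough sketch of what faking laziness with higher-order functions might look like. All of the names (lazyCons, integersFrom, takeLazy) are made up for this example, and it is a toy under those assumptions rather than a replacement for Clojure’s lazy seqs.

```javascript
// A lazy sequence cell: `head` is a value, `tailThunk` is a function that
// produces the rest of the sequence only when asked (and only once).
function lazyCons(head, tailThunk) {
    var cached = null;
    var forced = false;
    return {
        head: head,
        tail: function () {
            if (!forced) {
                cached = tailThunk();
                forced = true;
            }
            return cached;
        }
    };
}

// Infinite sequence n, n+1, n+2, ...; nothing past the head is computed yet.
function integersFrom(n) {
    return lazyCons(n, function () { return integersFrom(n + 1); });
}

// Forces only the first `count` cells and copies their heads into an array.
function takeLazy(count, seq) {
    var out = [];
    while (count > 0 && seq !== null) {
        out.push(seq.head);
        count -= 1;
        if (count > 0) {
            seq = seq.tail();
        }
    }
    return out;
}

console.log(takeLazy(4, integersFrom(10))); // [ 10, 11, 12, 13 ]
```

The tail of each cell is wrapped in a function, so the “infinite” sequence only ever exists as far as somebody has walked it.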

However, the idea of homoiconicity in JS is really ripe for discussion I think. Javascript’s rich object literal format (JSON, roughly speaking) allows a whole lot of the language to be expressed as data structures. Not fully though, as statements of a function are not directly expressed as JS data, but I think I’ll write a post later comparing Clojure lists and vectors to JS Objects and arrays.
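
To make the idea a bit more concrete, here is a small sketch of “code as data” using nothing but JS arrays. The mini-language, the operators table, and the evaluate and swapOperator functions are all invented for this example; it only handles numbers and two operators.

```javascript
// Hypothetical mini-language: a form is either a number or an array whose
// first element names an operator, e.g. ["+", 1, ["*", 2, 3]].
var operators = {
    "+": function (a, b) { return a + b; },
    "*": function (a, b) { return a * b; }
};

// Evaluates a form by evaluating its arguments and folding the operator
// over them. Assumes every operator form has at least one argument.
function evaluate(form) {
    if (typeof form === "number") {
        return form;
    }
    if (!Object.prototype.hasOwnProperty.call(operators, form[0])) {
        throw new Error("unknown operator: " + form[0]);
    }
    var op = operators[form[0]];
    var args = form.slice(1).map(function (f) { return evaluate(f); });
    return args.reduce(function (a, b) { return op(a, b); });
}

// Because the program is plain data, it can be rewritten before it runs,
// which is roughly what a macro does.
function swapOperator(form, from, to) {
    if (!Array.isArray(form)) {
        return form;
    }
    var head = form[0] === from ? to : form[0];
    return [head].concat(form.slice(1).map(function (f) {
        return swapOperator(f, from, to);
    }));
}

var program = ["+", 1, ["*", 2, 3]];
console.log(evaluate(program));                         // 7
console.log(evaluate(swapOperator(program, "+", "*"))); // 6
```

The difference from Clojure is that here we had to write the evaluator ourselves, whereas in Clojure the data structures are the program that the language itself runs.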

Written by newcome

February 8, 2012 at 6:09 pm

Posted in Uncategorized

That pesky bug

with 3 comments

I want to talk for a minute about that annoying bug. That one that doesn’t crash the app but makes it do something annoying. That bug that, for some reason, has some dependency on the way that the app was designed early on, so the simple fixes break something else in the app.

How did this happen? We were so careful when we designed this thing! We built things up and tested in small pieces. We continuously integrated everyone’s changes and refactored things as we went.

Well, unfortunately it is nearly impossible to optimize a program along all axes. Somewhere along the line, a decision was made, probably a correct one, that put us down this path. So now we have one axis of the program that has gotten tricky to deal with. Optimizing for this axis is going to wreak havoc on the rest of the program.

Begrudgingly, we revert the quick fix for this bug and push it down on the priority list. But hey bug man, your number is up on the next iteration!

Written by newcome

February 7, 2012 at 11:58 am

Posted in Uncategorized