Dan Newcome on technology

I'm bringing cyber back

Archive for March 2012

Building mesh abstractions


I’ve had a concept rattling around in my head for a while, but I haven’t quite found the right way to describe it yet. The closest I’ve come is the metaphor of a mesh.

Abstractions are tricky in that they are never perfect. In some cases we try to make them a completely airtight layer over what we are trying to simplify. In the worst case there are gaping holes where the problem just doesn’t map well into our desired solution space. In other cases we concentrate only on some small area of the problem that is the most painful.

Frameworks and Libraries

The first extreme we generally call frameworks. A framework is a constructed worldview of the problem space, given so that everything can be fit into its place according to that view. Web frameworks, for example, have some abstractions around the request/response model, probably parsing HTTP headers for us and providing ways to set response codes. The problem with frameworks, especially ones that try to do everything, is that if you need to do something the author didn’t think of, you have to hack some kind of workaround. In the best case, if the framework is particularly well designed, there might be a plugin or module system that you can use to extend things to do what you need. In the worst case, there may be a bunch of sealed-off code standing in the way of getting what you want (usually only the case with proprietary precompiled code). Either way, you have to dig in and learn how to mold your code to the framework designer’s worldview.

At the other extreme we have libraries, which provide some focused functionality that we can pull in to solve a particular problem. If a library doesn’t do what we want, we can pick a different one, or hack it, or write our own. This second case is mercifully becoming more standard as the call for better software composability spreads through the developer community. Sites like microjs make it easy to find small libraries that work together.

Another way?

As a result of a recent project that I’ve been working on, namely the Donatello HTML/CSS drawing API, I’ve begun to think of some abstractions as more of a mesh than a framework or library. A mesh abstraction is intentionally and uniformly leaky, like a wire screen. Like a framework, there are plenty of guide wires to use when you need them, but you know that you can easily look down through the mesh – it has substance, but it is transparent and porous.

One of my goals for Donatello was to make it easy to leverage the power of CSS and HTML for drawing graphics in the browser. CSS is powerful as a graphics language, but I felt like it took too much effort to get the results I was looking for. I wanted to augment the power of CSS without obscuring it. Another goal was to leverage existing tools for things like animations and event handling. I didn’t want to address these tangential concerns that are better served by other existing tools.

Taking cues from jQuery and Raphael, I conceptualized a lightweight layer that handles the creation of drawing primitives rendered as CSS-styled HTML elements. Like jQuery and Raphael, you have full access to the underlying DOM elements at any time, and Donatello will get out of your way. However, like the wires in the mesh, the library provides SVG-like drawing attributes such as “fill” and “stroke” that map down to the appropriate CSS properties. Using the same API, any CSS property may also be applied directly. I’ve begun thinking of this as going down through the holes in the mesh. You get a more direct way of approaching a problem without completely going underneath the abstraction, which would be getting the underlying node and setting the property directly.
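To make this concrete, here is a rough usage sketch of the kind of layer I mean. Treat the exact names as illustrative rather than a definitive statement of the Donatello API:

// Rough usage sketch -- the names here are illustrative, not necessarily the
// exact Donatello API. Assumes the library is loaded on the page.

// A drawing surface backed by a positioned <div>.
var paper = Donatello.paper( 'drawing-div', 0, 0, 300, 200 );

// SVG-like attributes ("stroke", "fill") map down to the appropriate CSS
// properties (border, background-color) on the underlying element.
var rect = paper.rect( 10, 10, 100, 50, { stroke: 'black', fill: 'red' } );

// Going down through the mesh: a raw CSS property set through the same
// attribute API, without abandoning the abstraction.
rect.attr( { 'border-radius': '5px' } );

// Going underneath the abstraction entirely: the DOM node is always reachable,
// so jQuery, CSS animations, and event handlers can take over from here.
var node = rect.node();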

This may seem like a subtle difference but I think it’s important. The mesh augments the use of the underlying technology and at the same time offers consistency with the abstraction. Like a designer’s template, the mesh serves as a guide without forcing you to abandon it completely if something doesn’t quite fit between the lines.

Written by newcome

March 27, 2012 at 10:58 pm

Posted in Uncategorized

Text transformation with JSON and regular expressions


Ever since I wrote the Jath Javascript XML processing library I’ve been thinking about ways to declaratively transform various things to JSON.

Perhaps not-so-coincidentally, I’ve lately been talking about ways to update the tools we use to pass around data structures rather than text blobs.

Since I wrote that post, I’ve been toying with small stepping stones that would get us closer to realizing a more “functional” experience in the Linux shell without ripping everything out and starting over. A start is to have a way to get the output of common shell commands and parse it into something like JSON.

As I said at the top of this post, text transformation is something I’ve been playing with in various forms for a while now, so I dug up a project that I started back when I was working on Jath. It does essentially what Jath does, but with regular expressions instead of XPath queries. The idea is to provide a template that transforms plain text to JSON using regexps as the selectors.

For those of you not familiar with Jath, it uses a template like this:

var template = [ "//status", { id: "@id", message: "message" } ];

To turn this:

<statuses userid="djn">
    <status id="1">
        <message>Hello</message>
    </status>
    <status id="3">
        <message>Goodbye</message>
    </status>
</statuses>

into JSON like this:

[ 
    { id: "1", message: "Hello" }, 
    { id: "3", message: "Goodbye" } 
]

Keeping that same idea in mind, let’s look at the output of a common Unix utility, ifconfig:

dan@X200:~/$ ifconfig
eth0      Link encap:Ethernet  HWaddr 00:1f:16:15:2e:b1  
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)
          Interrupt:20 Memory:f2600000-f2620000 

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:550973 errors:0 dropped:0 overruns:0 frame:0
          TX packets:550973 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:70904676 (70.9 MB)  TX bytes:70904676 (70.9 MB)

I’d like to turn this into something like the following:

[ 
    { "Link encap": "Ethernet", "RX packets": "0" }, 
    { "Link encap": "Local Loopback", "RX packets": "550973" } 
]

As a first approximation, a template for the above transformation would look something like this:

[ "\n\n", {
        "RX packets": /RX packets:([^\s]*)/,
        "Link encap": /Link encap:([^\s]*)/ 
} ]

I have a little proof-of-concept code that works using the above template, but I’m thinking that the template format could be improved. It seems like a bit much to ask people to write such ugly regular expressions for the template selectors.
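Just to make the idea concrete, here is a minimal sketch of how a template like this might be applied. This is my own illustration rather than the actual proof-of-concept code, and applyTemplate is an invented name:

// Minimal sketch of applying a template -- not the actual proof-of-concept code.
// template[0] is the record separator and template[1] maps output field names
// to regexes whose first capture group is the value.
function applyTemplate( template, input ) {
    var separator = template[0];
    var fields = template[1];

    return input.split( separator )
        .filter( function( record ) { return record.trim().length > 0; } )
        .map( function( record ) {
            var result = {};
            for( var name in fields ) {
                var match = fields[name].exec( record );
                if( match ) {
                    result[name] = match[1];
                }
            }
            return result;
        });
}

// Usage with the template and the ifconfig output from above:
// var records = applyTemplate( template, ifconfigOutput );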

Also, maybe we’d want support for a hash object instead of an array. This is something that Jath can’t really do and I never thought about supporting it before. That is, the result collection might look like this:

{
    "eth0": { "Link encap": "Ethernet", "RX packets": "0" }, 
    "lo": { "Link encap": "Local Loopback", "RX packets": "550973" } 
}

Maybe the template format for a hash like this uses some kind of glob-style expansion for the key name:

{ "/\b(.+?)\b[\S\s]*\n\n"/g: {
        "RX packets": /RX packets:([^\s]*)/,
        "Link encap": /Link encap:([^\s]*)/ 
} }

The regular expression for finding the keys is a little rough in this case. In order to make this work, we rely on a global regex that we can keep calling exec() on to find the next match until the matches are exhausted. We only want the capture group, so the loop would look something like the following:

var res;
while( (res = re.exec( input )) !== null ) {
    returnval.push( res[1] );
}
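Putting those pieces together, a sketch of the hash-producing version might look like this. Again, this is my own sketch rather than working project code, and it assumes the key regex is global, captures the key in its first group, and matches exactly one record at a time:

// Sketch of the hash-producing version -- my own sketch, not working project code.
// keyRe is a global regex whose first capture group is the record key (e.g. "eth0")
// and whose full match is the text of one record.
function applyHashTemplate( keyRe, fields, input ) {
    var result = {};
    var match;

    while( (match = keyRe.exec( input )) !== null ) {
        var record = match[0];   // the full text of this record
        var obj = {};

        for( var name in fields ) {
            var fieldMatch = fields[name].exec( record );
            if( fieldMatch ) {
                obj[name] = fieldMatch[1];
            }
        }
        result[match[1]] = obj;  // capture group 1 is the key
    }
    return result;
}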

This stuff is rough for the moment, but I think it could be an interesting, declarative way to process arbitrary textual command output by mostly re-using existing solutions like regular expressions.

Written by newcome

March 13, 2012 at 1:24 am

Posted in Uncategorized

Testing the Clojure way

with 4 comments

I’ve been through several iterations of testing fads already in my programming career. The bottom line, regardless of any fads, is that you need to test your code. Somehow, some way. Manual testing is bad, automated testing is good. 100% test coverage is impossible and shouldn’t be a stated goal. Zero defects is silly: not all bugs are created equal, and an emphasis on design over testing should be preferred.

That’s a lot of philosophy for one blog post, and possibly I’ll delve more into some of these things later, but like most things in life, tests have diminishing returns, and the code that is least well understood tends to be the buggiest in your codebase. It’s best to focus our efforts where they matter, and that is what I want to talk about here.

So there is one thing that I like in my programs over all else – clarity of design. When you look at some code, there should not be more than one concern addressed by that code. In the normal course of development this can be adhered to, but many things work to thwart it. In my mind the two biggest threats are logging and testing.

With logging we are always trading code readability and performance against trace granularity when we want to figure out what things are doing at runtime. With testing, we are trading extra design complexity for ease (or even possibility) of testing.

I don’t want to rant too much here, but C# has been one of the messiest languages I’ve had to test in my career. I suspect that Java is just the same, but I have done comparatively little work in that language, so I’ll leave that for you to decide. The main problem is the mechanism of software composition. Both Java and C# have strong type systems that determine how code is organized into logical units. At runtime, these units are assembled either through inheritance or via composition. In either case, conditionally changing the way the code is wired up at runtime requires a lot of up-front planning. The most frequently used pattern here is Dependency Injection.

I have nothing against this pattern actually, but let’s be honest with ourselves – when one of the fundamental operators, the ‘new’ operator, becomes a code smell, you know the language has a serious problem that needs to be addressed.

So, what does any of this have to do with testing?

If you have not been particularly rigorous in adhering to a dependency injection approach, it’s likely that your software will be very hard to unit test. Isolating any one part of the system for a test becomes nearly impossible. And, since my favorite question to ask when testing anything is “what exactly are you testing?”, we’ll probably have to do some refactoring before isolated testing is even possible.

In case it wasn’t obvious, the answer to my favorite question is not “the database” or “the third-party web service”. We know that code works, and if it doesn’t, it is because we didn’t pay attention to version numbers or server maintenance, or any number of other things that are not our code. Let me repeat that: these things are not our code (not what we are testing!).

Finally, I’m going to talk about Clojure. It turns out that, despite the title, this article is only peripherally related to actually testing Clojure code, but it has everything to do with the philosophy of Clojure and functional programming.

How do we do dependency injection in Clojure? Well, we could explicitly pass functions in to our functions (make use of higher-order functions), make extensive use of macros, or even keep things limited to the built-in “map” function and the like. However, all of these things, while not quite as bad as DI, are orthogonal to our design goals. The idea is to write the code in the clearest possible way without thinking about peripheral concerns like testing. Ideally, clear and correct code would also be the most testable code.

OK, so here is an example of what I’m talking about. I’m going to take some code that was written to talk to an external web service and stub out the call to the service. Ordinarily we’d have to make sure that the service was injected into the code. In C# that would look like this:

void ProcessExternalData( IService svc, DateTime check ) {
    if( DateTime.Now > check ) {
        svc.GetStuffSince( check );
    }
}

In the code snippet above, we use a service to grab some data created since a time that we specify. When we ask “what are we testing?”, the answer should be that we are testing the checking logic here, not necessarily that the GetStuffSince service is doing the right thing. In our test we’d want to try:

ProcessExternalData( mockService, DateTime.Now );
ProcessExternalData( mockService, DateTime.Now.AddDays(-1) );
ProcessExternalData( mockService, DateTime.Now.AddDays(1) );

And we’d check our mock service at the end to make sure we’ve made the calls we thought we’d make.

The problem arises when the code is written like so:

void ProcessExternalData( DateTime check ) {
    if( DateTime.Now > check ) {
        IService svc = new Service();
        svc.GetStuffSince( check );
    }
}

I know that this is an oversimplification, but it illustrates the point that many times we don’t have an easy way of providing an alternative implementation of something. Now we have little choice but to call the real service, or resort to extreme measures like providing an alternative DLL that implements the service if it is an external reference. Trying to replace something that is defined in the code under test is pretty much going to require modifying that code. It might be possible to use some conditional compilation tricks such as:

#if MOCKS
using Service = MockService;
#endif

This would redefine Service in the code under test, allowing us to specify another implementation depending on some compiler definitions.

OK, so what if I told you that we could overload the C# using directive in the test to override which Service implementation we wanted to use? Here is what it would look like (this is not valid C# code):

using( Service = MockService ) {
    ProcessExternalData( DateTime.Now );
} 

This would be effectively changing the C# method under test to look like this:

void ProcessExternalData( DateTime check ) {
    if( DateTime.Now > check ) {
        IService svc = new MockService();
        svc.GetStuffSince( check );
    }
}

In Clojure we can do this with a form called “with-redefs”. Previously we would have used Clojure’s dynamic bindings, but as of Clojure 1.3 we would have to pre-declare the function as dynamic to make that work, which defeats the purpose that we are going for here.

In Clojure, this is the function under test:

(defn process-external-data [check]
    (if (> (now) check)
        (get-stuff-since check)))

Here is how we would test it, providing an alternative implementation of the get-stuff-since function:

(deftest test-process-external-data
    (with-redefs [get-stuff-since stub-get-stuff-since]
        (process-external-data (now))))

We have effectively done exactly what I described in the hypothetical C# example – defined a scope in which any references to the inner service call have been replaced by something of our own choosing.

I’ve only started using this technique, so there may be plenty of pitfalls when doing this. It is kind of unsettling to think that any code you write might be overridden later from above without your knowing, but then that’s kind of the trade-off when going to a less static language. Also, I’m thinking that testing should be a little more black-box, and here we basically need the software equivalent of an x-ray machine to test the way something works. Something about that doesn’t seem right to me. It might come down to which is worse – knowledge of the internals of the function under test, or changing the design for enhanced testability. I have a suspicion that there is no clear answer here.

However, as always I’ll keep everyone apprised!

Written by newcome

March 12, 2012 at 4:56 pm

Posted in Uncategorized

Functional programming and the death of the Unix Way

with 31 comments

Small tools, each doing one thing well, have been the call of the Unix Way since its inception some 40 years ago. However, if you look at even the basic tools that ship with a modern Unix (Linux, BSD), many of them have an abundance of options and layers of additional functionality added over the years.

Every time I have to use anything but the well-worn options of tools like tar and sed, I’m reminded of how deep some of these tools really are. Not only are they deep, but oftentimes there is a dizzying number of ways to do the same thing, and sometimes the only way to do what you really need is more complex than it should be. Take a look at the man page for something supposedly simple like find and check out how many options there are.

Case in point: despite the simplicity of the plain-text output of nearly every standard Unix tool, it can be quite complex to parse that text into the format you want. Oftentimes I’ll want to grab just one part of a command’s output to use as the input of another command. Sometimes I can use grep to do this, and sometimes grep isn’t quite flexible enough and sed is required. The regular expression required to get sed to do the right thing is often complex on its own, and of course the flags need to be set appropriately. If the data is in columns, sometimes cut is simpler.

With this in mind, it seems like the promise of the Unix Way has been lost. When questioned about this very topic, the one and only Rob Pike has been quoted as saying “Those days are dead and gone and the eulogy was delivered by Perl.” With the admission that a more cohesive general-purpose environment is more suited to modern computing, one wonders if the idea of small tools is at fault or whether the sands of time have simply diluted the initial simplicity of the Unix environment. In my view, Perl is hardly a model of cohesion or simplicity, so to say that it improves upon the standard Unix tools is particularly damning.

What would a programming environment look like that embodies the original ideals of the Unix Way? An environment of easily composable tools, each performing a specific function in a very general way? The answer is right there in the question, merely by the mention of “functions” and “composable”. Modern functional programming languages such as Clojure and Haskell are the closest thing we have to what Unix was intended to be.

Clojure, like most Lisp-like functional languages, is a small kernel with most of the language built up in itself. The idea is that small primitive functions, each doing something basic, are combined to form higher-level functions until finally the entire language has been implemented. Functions in Clojure are inherently composable. That is, like the Unix tools, functions can be combined to perform more complex tasks. The flexibility of the macro system even allows pipe-like syntax – the -> and ->> threading macros, for example – so that functions can be composed left to right.

Beyond what functional languages give us to mimic Unix, they far surpass it in the flexibility of the data output. Instead of plain text output we have data structures like lists and maps that are easily traversed to transform the data into what we need for the next step of the operation.

However, despite all of the advantages of functional languages, I still write shell scripts. Why? They are the most immediate way to interact with the OS. Unfortunately, languages like Clojure are cumbersome to use for something quick and dirty. Even Perl can be tricky due to the sheer size of the language and the possibility of module dependencies. Microsoft had a good start with its PowerShell language in that data is output as parseable objects rather than plain text, but it is marred by PHP/Perl-like syntax and a procedural focus. Doing many things requires knowledge of .NET and cumbersome syntax to integrate it into the shell.

I’m not advocating a return to Lisp machines here. We tried that and it didn’t work. Symbolics is dead, and no one even gave a eulogy at that funeral. Plan 9 OS never took off, probably because it’s too abstract and elitist. I do think that revisiting some of what we consider Unix gospel is worthwhile, though. What if we kept the good – “small tools that do one thing well” – and changed one or two not-so-good things – “everything is plain text, everything is a file”? What if we said that all command output was an S-expression (a data list), and that instead of files to interact with the kernel we had functions or S-expressions? For that matter, for the sake of argument, what if everything was JSON? Maybe getting data from the kernel process list would be the act of mapping a function over that data structure?
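As a rough sketch of what that might feel like, imagine a hypothetical ps that emitted JSON instead of columns of text. The usual grep/sort/awk pipeline becomes plain filtering and mapping over a data structure; the field names below are invented for illustration:

// Hypothetical: a `ps` that emits JSON rather than plain-text columns.
// The field names are invented for illustration.
var processes = [
    { "pid": 1,    "cmd": "init",    "rss": 1024   },
    { "pid": 8472, "cmd": "firefox", "rss": 524288 }
];

// "ps | grep | sort | awk" becomes filter/sort/map over a data structure --
// no regular expressions or column counting required.
var hogs = processes
    .filter( function( p ) { return p.rss > 100000; } )
    .sort( function( a, b ) { return b.rss - a.rss; } )
    .map( function( p ) { return p.cmd + ": " + p.rss; } );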

I think that a lot of progress could be made by applying some ideas of functional programming to the standard Unix way of computing.

Written by newcome

March 6, 2012 at 1:54 am

Posted in Uncategorized