Dan Newcome, blog

I'm bringing cyber back

Archive for June 2011

Human-writable RDF

with one comment

I’ve done a little bit with RDF in some past projects where I used RDF entailment to do some logic for me. Once I had the data in the right format, I wrote some rules and let one of the off-the-shelf reasoners do a lot of the work for me.

The trouble was, as simple as the abstract graph representation is, the actual serialization formats are tricky to work with by hand, and even trickier to work with in places where you don’t have access to a fully-featured RDF parser (JavaScript, for example).

I wrote a JavaScript tool for manipulating a Turtle-like RDF serialization format a while back, when I needed to do programmatic transformations on RDF data in the browser. The tool and the format are both called Jstle (pronounced “jostle”). You can check that project out on GitHub if you are interested.

While Jstle is a nice format for data-munging, it is still not fun to write. Turtle is much better to write by hand, at the expense of being more complex to parse. However, Turtle is not the most widely supported RDF format, and for many things I’ve found myself needing to whip up RDF/XML files. Tools abound that can convert between formats, but the generated output is often very verbose, fully expanding all URIs to their canonical representations. Once in such a format, the files are difficult to work with by hand.

Recently I’ve found myself needing to do some RDF again, and I went back through my notes to figure out the details of RDF/XML. There are many ways of expressing the same graph in XML, and there are a few tricks to keep things simple to deal with by hand.

Consider the following RDF graph expressed in the canonical N-Triples form:

<http://www.example.com/a> <http://www.example.com/b> <http://www.example.com/c> .
<http://www.example.com/a> <http://www.example.com/b> <http://www.example.com/d> .
<http://www.example.com/a> <http://www.example.com/e> <http://www.example.com/f> .
<http://www.example.com/a> <http://www.example.com/e> <http://www.example.com/g> .

Each edge of the graph is explicitly defined in full using fully-qualified URIs. Working with this format by hand is a pain, since there is a lot of redundancy in the markup that we can’t abbreviate.
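For comparison, here is the same graph written in Turtle, which lets us factor out the repetition with a prefix declaration, predicate lists (“;”), and object lists (“,”). The ex: prefix name is my own choice of abbreviation:

```turtle
@prefix ex: <http://www.example.com/> .

ex:a ex:b ex:c, ex:d ;
     ex:e ex:f, ex:g .
```

Every URI appears exactly once, which is what makes Turtle so pleasant to write by hand.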

Now consider the following graph expressed in RDF/XML:

<rdf:RDF 
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" 
  xmlns="http://www.example.com/">
  <rdf:Description rdf:about="http://www.example.com/a">
    <b rdf:resource="http://www.example.com/c"/>
    <b rdf:resource="http://www.example.com/d"/>
    <e rdf:resource="http://www.example.com/f"/>
    <e rdf:resource="http://www.example.com/g"/>
  </rdf:Description>
</rdf:RDF>

Here we’ve managed to avoid repeating that every assertion has ‘a’ as its subject. However, we still have a lot of repetition. We were able to shorten the predicate names because they reference a default XML namespace declared at the top of the document.

There is one more step we can take to avoid repeating the base URIs in the rdf:resource attributes: setting the XML base URI. This mechanism is closely related to using rdf:ID, but it does not use the HTML fragment mechanism by default (which is confusing in itself and may be the subject of a later discussion). Using xml:base, we can express the same graph as follows:

<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns="http://www.example.com/"
  xml:base="http://www.example.com">
  <rdf:Description rdf:about="a">
    <b rdf:resource="c"/>
    <b rdf:resource="d"/>
    <e rdf:resource="f"/>
    <e rdf:resource="g"/>
  </rdf:Description>
</rdf:RDF>

We still have the verbose XML markup, but at least we don’t have to worry about the base URIs anymore. I’m sure I’ll remember more details and tricks as I go along.


Written by newcome

June 22, 2011 at 4:23 pm

Posted in Uncategorized

Writing a .NET unit testing framework in 100 lines of code


Some time ago, I got frustrated with setting up NUnit for simple testing projects. I didn’t want to bother with the test runner, and I didn’t want to mess with any tooling in Visual Studio. My lazy default for doing the tests was to create a new console application and use it to drive the system under test.

After a while I figured it was about time to at least put my assertion code into a project and reuse it every time I did a lazy console test runner. You know how these things go after that. I figured it couldn’t be too hard to support using .NET attributes to mark test cases and maybe do test fixtures. I ended up with a complete unit-testing framework, including assertions, in under 100 lines of code. You can see the result on GitHub.

In order to enable something like this:

[FestTest]
public void FirstTest() {
   // do AAA here
}

We need to get all methods in the assembly that are adorned with the FestTest attribute and run them. That seems simple enough; here is the basic framework:

Assembly executingAssembly = Assembly.GetCallingAssembly();
Type[] types = executingAssembly.GetTypes();
foreach( Type type in types ) {
	MethodInfo[] methods = type.GetMethods();
	foreach( MethodInfo method in methods ) {
		object[] attributes = method.GetCustomAttributes( typeof( FestTest ), true );
		if( attributes.Length > 0 ) {
			// Invoke the test method on a fresh instance of its declaring type
			object instance = Activator.CreateInstance( type );
			method.Invoke( instance, null );
		}
	}
}

In my final version of the code I decided to support the use of test fixtures using attributes as well. That is, you are able to inject a new instance of a fixture into a test method using something like this:

// test fixture we'd like to inject into the test
public class MyFixture { 
	public string teststring = "teststring";
}

[FestTest]
[FestFixture( typeof( MyFixture ) )]
public void Test1( MyFixture myfixture ) {
	...
	Fest.AssertEqual<string>( myfixture.teststring, "teststring");
}

This added to the complexity of the code significantly, but still the library comes in at under 100 lines. I’ll post some details about how I did this in a future blog post.

Written by newcome

June 11, 2011 at 11:18 am

Posted in Uncategorized

Using the Mongrel2 web server with Mono

with one comment

I’ve been playing with evented web servers lately, and one of the projects that caught my attention was Mongrel2. Actually I followed its progress via Hacker News and even made a small donation to the author during development. At the time I hadn’t envisioned myself ever actually using it. Later on I realized that its language-agnostic architecture would allow me to set it up in front of some existing .NET code.

What I didn’t fully realize until I played around with it was that Mongrel2 achieves its language agnosticism by way of a message passing architecture. The main Mongrel2 server runs the event loop (really a coroutine system implemented using libtask) and everything else happens in handlers that communicate with the main process over zeromq connections. This architecture allows great flexibility in setting up a high-performance web server.

The configuration system is another area where Mongrel2 differs from most server software. It was written so that server management is loosely based on the Model-View-Controller pattern. The running server listens on a port that can receive control commands and return process statistics, and the configuration comes from an SQLite database that is managed using a command-line tool.

I think I’ve said enough about the design for now; for much more detailed info, check out the humorous and informative manual.

Building Mongrel2 on Ubuntu 10.10 required me to build zeromq from source, since the Ubuntu packages were too old. So run the usual drill, first for zeromq and then for Mongrel2:

./configure
make
sudo make install

In order to get .NET code to run in response to our HTTP requests, we’ll need a second process that handles the requests that the main Mongrel2 process forwards on. I used an open source handler called m2net.

In the source there is a project called “m2net HandlerTest”. We’ll need to make some changes to suit our environment, so look in the Project.cs file and set the IP address that we’ll be running the server on. We’ll also want the client ID to match what we put in the Mongrel2 server configuration for the handler.

string vboxIp = "127.0.0.1";
var conn = new Connection("34f9ceee-cd52-4b7f-b197-88bf2f0ec378",
    "tcp://" + vboxIp + ":9997", "tcp://" + vboxIp + ":9996");

Build the project using make. I had to tweak the makefile; your mileage may vary.

Here is what my Mongrel2 configuration file looked like. This file is boilerplate except for the section under ‘hosts’ for the /csharp route. The configuration should be pretty self-explanatory, but I will note that this configuration terminology mirrors what zeromq uses, and that the recv_ident is not specified because the handler will echo the send_ident when it handles the request.

main = Server(
    uuid="f400bf85-4538-4f7a-8908-67e313d515c2",
    access_log="/logs/access.log",
    error_log="/logs/error.log",
    chroot="./",
    default_host="localhost",
    name="test",
    pid_file="/run/mongrel2.pid",
    port=6767,
    hosts = [
        Host(name="localhost", routes={
            '/tests/': Dir(base='tests/', index_file='index.html', default_ctype='text/plain'),
            '/csharp': Handler(send_spec='tcp://127.0.0.1:9997',
                       send_ident='34f9ceee-cd52-4b7f-b197-88bf2f0ec378',
                       recv_spec='tcp://127.0.0.1:9996', recv_ident='')
        })
    ]
)

servers = [main]
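One step worth making explicit before starting anything: the configuration file has to be loaded into the SQLite database with m2sh. Here I’m assuming the file above was saved as mongrel2.conf (the filename is my choice), and config.sqlite matches the default the server complains about in the log below:

```
m2sh load -config mongrel2.conf -db config.sqlite
```
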
 

Start the handler process:

dan@X200:~/tmp/m2net/bin$ ./m2net.HandlerTest.exe 
WAITING FOR REQUEST

Start the mongrel process:

dan@X200:~/Downloads/mongrel2-1.6$ m2sh start -host localhost
[WARN] (errno: None) No option --db given, using "config.sqlite" as the default.
[INFO] (src/handler.c:327) MAX limits.handler_stack=102400
[INFO] (src/config/config.c:158) Loaded handler 1 with send_spec=tcp://127.0.0.1:9997 send_ident=34f9ceee-cd52-4b7f-b197-88bf2f0ec378 recv_spec=tcp://127.0.0.1:9996 recv_ident=
....
[INFO] (src/handler.c:281) Binding handler PUSH socket tcp://127.0.0.1:9997 with identity: 34f9ceee-cd52-4b7f-b197-88bf2f0ec378
[INFO] (src/handler.c:303) Binding listener SUB socket tcp://127.0.0.1:9996 subscribed to: 

I elided some of the less interesting log output to show mostly the handler messages. We can see that Mongrel2 has picked up our configuration and that the sockets are bound to our running handler. Something to keep in mind: if we start things up in a different order, things still work. Handlers can connect and disconnect at any time, and if no handler is available, requests are queued until one connects. I think we get a lot of this for free from zeromq’s message handling.

Now we’ll kick the tires by hitting the server with the following URL:

http://127.0.0.1:6767/csharp

If everything is working correctly it should spit back the headers like this:

Sender: 34f9ceee-cd52-4b7f-b197-88bf2f0ec378 Ident: 3 Path: /csharp Headers:
	PATH: /csharp
	x-forwarded-for: 127.0.0.1
	cache-control: max-age=0
	accept-language: en-US,en;q=0.8
	accept-encoding: gzip,deflate,sdch
	connection: keep-alive
	accept-charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3
	accept: application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
	user-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US) AppleWebKit/534.16 (KHTML, like Gecko) Chrome/10.0.648.204 Safari/534.16
	host: 127.0.0.1:6767
	METHOD: GET
	VERSION: HTTP/1.1
	URI: /csharp
	PATTERN: /csharp

Now that we’ve seen things in action, I’m going to point out a few things that make this pretty awesome.

First, messages get queued if the handler isn’t connected. This seems like a bad thing, since the client will hang until it times out, but if we design the architecture with a fail-fast mentality, our handlers can bail and restart very quickly. Not only that, but we can run many handlers at a time, so the probability that none is available is smaller.

Second, we don’t specify the handlers in the Mongrel2 configuration; we specify how Mongrel2 should listen for handlers that attempt to connect. There is a huge difference here: the actual HTTP server doesn’t know or care about the details of the handlers. Handlers can connect and disconnect at will, and Mongrel2 doesn’t care. Scaling up might mean just running a few more handler processes, which connect dynamically to handle the load.

I suspect that the “MVC” server administration system has some interesting benefits too, but I haven’t explored that part of things yet. If I end up using this for a real project, I’ll write some more about it.

Written by newcome

June 5, 2011 at 7:33 pm

Posted in Uncategorized

Method_missing in C#

with 3 comments

There have been a lot of instances where I wanted to intercept any call to an object in C#. In dynamic languages like Ruby this is easy, since there is a catch-all method that gets called when no matching method exists: method_missing.
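For reference, here is roughly what the Ruby mechanism looks like. This is my own minimal sketch (the Bag class and its hash-backed store are made up for illustration), not code from any particular library:

```ruby
# Minimal sketch of Ruby's method_missing: a class that traps
# unknown method calls and backs them with a hash.
class Bag
  def initialize
    @store = {}
  end

  # Called whenever no matching method is defined.
  # An assignment like bag.foo = "bar" arrives as :foo= with the
  # value as its argument; a plain bag.foo is a read.
  def method_missing(name, *args)
    key = name.to_s
    if key.end_with?("=")
      @store[key.chomp("=")] = args.first
    else
      @store[key]
    end
  end

  # Good manners: advertise that we respond to anything.
  def respond_to_missing?(name, include_private = false)
    true
  end
end

bag = Bag.new
bag.foo = "bar"   # no foo= method exists; method_missing traps it
puts bag.foo      # prints "bar"
```

Neither foo nor foo= is ever defined on Bag; every call goes through the catch-all. This is the behavior we want to emulate in C#.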

Now that C# supports the dynamic keyword and the framework supplies a dynamic object base class, we can emulate method_missing.

One of the things I want in C# is a JavaScript-like hash object that can be accessed like a dictionary or like a property bag, using either property or indexer syntax. Phil Haack covered an early usage of DynamicObject here, and later the .NET framework included a type called ExpandoObject, which allows the use of property accessor syntax for adding attributes dynamically.

However, I wanted to be able to use indexer syntax in addition to property syntax. The only way to do this with ExpandoObject is to cast it to IDictionary&lt;string, object&gt; and access the items using dictionary methods like this:

ExpandoObject expando = new ExpandoObject();
((IDictionary<string, object>)expando).Add("theanswer", 42);

This is unnecessarily messy, so I rolled my own ExpandoObject variant, based on Phil’s code, that allows indexer access to its members. Here is what I came up with:

using System;
using System.Collections.Generic;
using System.Dynamic;

public class SuperSpando : DynamicObject {

	Dictionary<string, object> m_store = new Dictionary<string, object>();

	public object this[ string key ] {
		get {
			return m_store[ key ];
		}
		set {
			m_store[ key ] = value;
		}
	}

	public override bool TrySetMember( SetMemberBinder binder, object value ) {
		m_store[ binder.Name ] = value;
		return true;
	}

	public override bool TryGetMember( GetMemberBinder binder, out object result ) {
		return m_store.TryGetValue( binder.Name, out result );
	}
} // class

Here is a sample use case showing both access methods:

dynamic ss = new SuperSpando();

// normal assignment works for ExpandoObject also
ss.foo = "bar";

// can't do this with ExpandoObject
ss["baz"] = "spaz";

Console.WriteLine( ss["foo"] );
Console.WriteLine( ss.baz );
Console.ReadLine();

Later on I might have to databind SuperSpando. I’m messing with a few reflection-based ways of doing this, but it looks like using a DataTable might be the best approach. It seems ugly to me, but then so does using reflection.

Written by newcome

June 5, 2011 at 11:34 am

Posted in Uncategorized