Dan Newcome, blog

I'm bringing cyber back

Archive for November 2010

Browsers like RockMelt and Flock are Harmful to the Web


I recently received a beta invite for the new RockMelt Web browser, which is funded by Andreessen Horowitz, of Marc Andreessen/Netscape fame. Naturally I was curious to see what they are working on, since Andreessen knows a thing or two about the browser market, having been instrumental in the birth of the segment in the early nineties.

However, another part of me remained skeptical, having tried out the Flock browser, which pursues a similar idea but is built around Mozilla/Gecko rather than Chrome/WebKit as RockMelt is.

Both RockMelt and Flock have as a central idea that the applications and social media sites that you frequent should be the center of your browsing experience. To that end they include built-in clients for services such as Twitter and Facebook. I didn’t realize how big of a shift this was until I tried out RockMelt, which required me to log into Facebook before it would even start up. Somehow this strikes me as a huge step backward and is reminiscent of my experience with Google Chrome OS, which required a Gmail account to even log into the operating system.

Why is this bad? The success of the early Web was predicated on universality and transparent interfaces. With services such as Facebook and Twitter, data is walled off and presented only through a Web user interface provided by the companies that publish the service. Instead of having the services move toward making themselves more interoperable with the Web, we have browsers that are catering to the individual whims of the social networks by building special-purpose clients right into the browser.

To me this screams the need for a way for the Web itself to express concepts like social relationships, status and presence, rather than scrambling to support each and every social networking site in our browsers.

One alternative that could bridge the gap between where we are now and future Web support for social networking features would be something like what GroupDock is doing. GroupDock is a commercial product that my friend and fellow entrepreneur Luc Castera is working on, but the idea is to allow the delivery of small apps that are built on OpenSocial and Sproutcore. If we twist this idea around slightly and allow these small apps to be more integrated into the browser, we might have a good blend: accommodating service-specific apps without treating the browser as a glorified thick client for specific services.

Written by newcome

November 21, 2010 at 4:21 pm

Posted in Uncategorized

The URI as a Data Lingua Franca for the Web

I just read Tim Berners-Lee’s piece in Scientific American about what makes the Web the powerful tool that it is and how certain forces threaten to jeopardize its power. One of the things that he mentions in the article is the increasing isolation of data on the Web. Whether we realize it or not, our data is stored in opaque “islands” on the Internet. Although data portability is a rising concern for many Web-savvy netizens, the problem is much further-reaching than just downloading a data extract with your social network contacts. Perhaps unintentionally, valuable data is being locked away in government agencies, research institutions and in industry.

At its heart, the Web isn’t just an application, nor is it a collection of applications. The Web is data and relationships between data. Programmers and application designers know that data often outlives its initial application, so why are we building data obsolescence into the Web?

The answer is mostly due to the legacy of relational databases and traditional data connection protocols. Before the Web, applications were often two-tier affairs where the server tier was simply an off-the-shelf RDBMS that held a custom user schema and data, while the client tier held the user interface and all of the business or domain logic. In this model, the data and schema were exposed via a vendor-specific protocol, but were queryable using an industry standard language — SQL.

On the Web, applications were initially deployed in three tiers — the data tier was still there just as it was before, but all of the heavy lifting was moved out of the client and into a Web application tier. Early Web browsers didn’t support many of the things that enable the rich client-side experiences we see now in modern AJAX Web applications, so all of the data access was hidden entirely behind a Web application layer. Later on in the evolution of modern Web applications, we began to see the rise of the Web API. These APIs typically ran in parallel to the main application and allowed many of the same actions to be performed, but were separate from the main application.

At the beginning of the year I wrote about needing a data API for the Web. Now I’m thinking that maybe we don’t need a specific API standard; we just need guidelines for data relations and discoverability on top of what we already have with REST. Semantic Web technologies showed just how far we could go using URIs alone as identifiers and as expressions of data relationships. What I’m proposing here is that we take some of the ideas put forth by the Semantic Web folks and apply a small subset of them, along with a new HTTP verb for retrieving the schema of a RESTful data URL.

I’m still thinking about what this would look like, but here is a quick example. Say we have some billing data. Two of the data elements that we want to manipulate are invoices and line items. This is a well-worn pattern to anyone who has done any business programming. These entities have a parent-child (or master-detail) relationship.

True to RESTful principles, let’s represent the two entities with URIs like the following:

http://example.org/billing/invoice
http://example.org/billing/invoiceitem

In order to see the data relationships between these entities we could do something like this:

Request entity fields

SCHEMA /billing/invoiceitem HTTP/1.1

Response

http://example.org/billing/invoiceitem#name
http://example.org/billing/invoiceitem#create_date
http://example.org/billing/invoiceitem#parent

Request individual relationship

SCHEMA /billing/invoiceitem#parent HTTP/1.1

Response

http://example.org/billing/invoice#id

This is just an example of a possible data schema format in keeping with REST principles. My thesis here isn’t the creation of the protocol itself; it is that we should be coding our applications against something like this rather than against an application-specific programming interface. The idea is that the applications as well as the APIs that consume the data would both be written against this slightly lower-level data interface. Consumers would be able to use the provided application-specific API for convenience, but power users and other data consumers could always look one level deeper for more direct data access.
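
To make this a little more concrete, here is a rough sketch of what a data consumer coded against this lower-level interface might look like. It is written in C# purely for illustration, and it assumes both the hypothetical SCHEMA verb and a server that answers it; none of this exists today.

using System;
using System.Net.Http;
using System.Threading.Tasks;

class SchemaClientSketch {
	// Ask a RESTful data URL for its fields using the proposed (non-standard) SCHEMA verb.
	static async Task Main() {
		using( HttpClient client = new HttpClient() ) {
			HttpRequestMessage request = new HttpRequestMessage(
				new HttpMethod( "SCHEMA" ),
				"http://example.org/billing/invoiceitem" );

			HttpResponseMessage response = await client.SendAsync( request );

			// The imagined response body is just a list of field URIs, one per line,
			// e.g. http://example.org/billing/invoiceitem#parent
			string body = await response.Content.ReadAsStringAsync();
			foreach( string fieldUri in body.Split( '\n' ) ) {
				Console.WriteLine( fieldUri );
			}
		}
	}
}

Nothing in this client is specific to the billing application; the same code could walk the fields of any data URL that responds to the schema request, which is exactly the kind of generality an application-specific API doesn’t give us.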

The basis of RESTful data interfaces is still the URI, as it is for RESTful application interfaces. The URI should always remain a first-class citizen on the Web, and we should strive to express data elements and schema in terms of URIs wherever possible.

This idea is pretty rough, and might be entirely the wrong way of thinking about the problem, but the evolution of my thinking about Web data is moving in this general direction.

Written by newcome

November 20, 2010 at 5:48 pm

Posted in Uncategorized

Fluent interfaces: Iterating on CRMQuery

I wrote an article several months ago about a small internal DSL that I developed for creating Microsoft CRM QueryExpressions called CRMQuery. I released this project in its nascent state on GitHub in the hopes that it would be useful to others who are in the trenches digging data out of CRM with the CRM SDK.

Lots of you have checked this project out, but I haven’t had many comments on whether it does everything that you’d want it to do (which I seriously doubt — it was intended to make my life easier on a particular project). Fast forward to today, and CRMQuery has been used on two more major consulting projects that I’ve been involved in, and I can’t imagine going back to creating QueryExpressions the old way. However, I realized recently that I could go a bit further in making queries readable with a relatively simple refactoring of the code.

To show you where I’m headed with this discussion, consider the following query expression written using CRMQuery:

QueryBase query = CrmQuery	
  .Select()
  .From( "events" )
  .Where( "events", "statuscode", ConditionOperator.Equal, new object[] { 1 } )
  .Where( "events", "lastdatetoregister", ConditionOperator.LessEqual, new object[]{ DateTime.Now.ToString() } ).Query;

I started thinking that the criteria expressions looked kind of ugly, and moreover, the same expressions were being used in several places in the project in other queries. What I wanted to write was something like this:

QueryBase query = CrmQuery
  .Select()
  .From( "events" )
  .Where( "events", StatusCodeIsActive )
  .Where( "events", NotPastLateRegistration ).Query;

Just the simple act of giving the “Where” expressions intuitive names made the query more readable to me. The question now is, how can we make this work in the CRMQuery code? Since the where expressions are simply CRM FilterExpressions, I thought that it made the most sense to expose an overload of the Where() method that allows a FilterExpression to be passed in. However, building a FilterExpression by hand outside of CRMQuery defeats the intent of building a DSL in the first place. How can we make this all work intuitively?
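
For contrast, here is roughly what the caller would have to write by hand outside of CrmQuery, using the same CRM SDK types that CrmQuery already wraps internally. This is only a sketch, but it shows the boilerplate we want to keep out of calling code:

// Hand-rolled CRM SDK boilerplate: exactly what the DSL was built to hide.
FilterExpression statusCodeIsActive = new FilterExpression();
statusCodeIsActive.FilterOperator = LogicalOperator.And;

ConditionExpression ce = new ConditionExpression();
ce.AttributeName = "statuscode";
ce.Operator = ConditionOperator.Equal;
ce.Values = new object[] { 1 };

statusCodeIsActive.Conditions.Add( ce );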

In the end we’d like to satisfy the following requirements:

  • Caller should not have to manually build a FilterExpression
  • Code to build FilterExpressions must not be duplicated between internal and external calling mechanisms
  • Calls to Where() must continue to work as written in existing dependent code

The current code for Where() looks like this:

		public CrmQuery Where( string in_entity, string in_field, ConditionOperator in_operator, object[] in_values ) {
			FilterExpression filterExpression = new FilterExpression();
			filterExpression.FilterOperator = LogicalOperator.And;

			ConditionExpression ce = new ConditionExpression();
			ce.AttributeName = in_field;
			ce.Operator = in_operator;
			ce.Values = in_values;

			filterExpression.Conditions.Add( ce );

			if( m_lastAddedLink != null ) {
				m_lastAddedLink.LinkCriteria.AddFilter( filterExpression );
			}
			else if( m_query.EntityName == in_entity ) {
				m_query.Criteria.AddFilter( filterExpression );
			}
			else {
				LinkEntity link = FindEntityLink( m_query.LinkEntities, in_entity );
				if( link != null ) {
					link.LinkCriteria.AddFilter( filterExpression );
				}
			}
			return this;
		}

We want to create an overload of this method with the signature:

public CrmQuery Where( string in_entity, FilterExpression in_filterExpression );

Given the current implementation of Where(), it should be obvious that the body of this new overload would be the same as the current implementation, minus the code that creates a new FilterExpression instance. Implemented naively, however, we would end up with duplicated code for inserting the FilterExpression into the correct place in the resulting query. We could factor out the insertion code, but thinking about it for a moment longer, an isolated implementation of the insertion logic isn’t especially valuable on its own, whereas one of our actual requirements is an isolated implementation of the code that creates the FilterExpression.

We can get what we want with a two-step refactoring job. The first step is to get the two different Where() methods implemented without duplicating any code. This is a common pattern: we split a method in two, with the original method delegating to the new overload under the hood. This intermediate step looks something like this:

		public CrmQuery Where( string in_entity, string in_field, ConditionOperator in_operator, object[] in_values ) {
			FilterExpression filterExpression = new FilterExpression();
			filterExpression.FilterOperator = LogicalOperator.And;

			ConditionExpression ce = new ConditionExpression();
			ce.AttributeName = in_field;
			ce.Operator = in_operator;
			ce.Values = in_values;

			filterExpression.Conditions.Add( ce );

			return Where( in_entity, filterExpression );
		}
		public CrmQuery Where( string in_entity, FilterExpression in_filterExpression ) {
			if( m_lastAddedLink != null ) {
				m_lastAddedLink.LinkCriteria.AddFilter( in_filterExpression );
			}
			else if( m_query.EntityName == in_entity ) {
				m_query.Criteria.AddFilter( in_filterExpression );
			}
			else {
				LinkEntity link = FindEntityLink( m_query.LinkEntities, in_entity );
				if( link != null ) {
					link.LinkCriteria.AddFilter( in_filterExpression );
				}
			}
			return this;
		}

After this first step we are halfway there. Now we extract the code for creating the filter expression into a static method that callers can use to build a FilterExpression on their own, and that Where() will also use internally to keep things DRY.

		public static FilterExpression WhereExpression( string in_field, ConditionOperator in_operator, object[] in_values ) {
			FilterExpression filterExpression = new FilterExpression();
			filterExpression.FilterOperator = LogicalOperator.And;

			ConditionExpression ce = new ConditionExpression();
			ce.AttributeName = in_field;
			ce.Operator = in_operator;
			ce.Values = in_values;

			filterExpression.Conditions.Add( ce );
			return filterExpression;
		}

Now we change the original four-argument Where() implementation to call this static method for creating a new FilterExpression, and we are finished with the refactoring. Here is what the final method looks like:

public CrmQuery Where( string in_entity, string in_field, ConditionOperator in_operator, object[] in_values ) {
	FilterExpression filterExpression = CrmQuery.WhereExpression( in_field, in_operator, in_values );
	return Where( in_entity, filterExpression );
}

Wow, that looks a ton better. Simple, concise and implemented bottom-up — reusing our more primitive building blocks. All of this shuffling around generated almost no new code but it allows us the flexibility to do the following:

// define filter expressions
FilterExpression StatusCodeIsActive = CrmQuery.WhereExpression( 
  "statuscode", 
  ConditionOperator.Equal, new object[] { 1 } 
);

FilterExpression NotPastLateRegistration = CrmQuery.WhereExpression( 
	"lastdatetoregister", 
	ConditionOperator.LessEqual, 
	// TODO: use server time instead of client!
	new object[] { DateTime.Now.ToString() }
);

// use filter expressions in a query expression
QueryBase query = CrmQuery
  .Select()
  .From( "events" )
  .Where( "events", StatusCodeIsActive )
  .Where( "events", NotPastLateRegistration ).Query;

Being able to define and name the filter arguments separately makes the interface much more fluent, and the resulting queries can be understood more quickly and clearly. Maintainability is also improved, since we can reuse the same filter definitions. Notice the TODO comment warning that we rely on the client time. This may cause problems down the road if the clients are in a different time zone than the server, among other things. Since we have isolated the issue rather than repeating it in every query, we will be able to address it much more quickly later on.
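
For example, addressing that TODO now means touching only the single WhereExpression definition. Here is a minimal sketch of one possible stopgap, assuming that comparing against UTC on the client is acceptable for now (a fuller fix would pull the current time from the CRM server itself):

FilterExpression NotPastLateRegistration = CrmQuery.WhereExpression(
	"lastdatetoregister",
	ConditionOperator.LessEqual,
	// Using UtcNow sidesteps the client's local time zone; swapping in a
	// server-supplied timestamp later only requires changing this one line.
	new object[] { DateTime.UtcNow.ToString() }
);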

Written by newcome

November 12, 2010 at 3:15 pm

Posted in Uncategorized