Dan Newcome, blog

I'm bringing cyber back

Functional programming and the death of the Unix Way

with 31 comments

Small tools, each doing one thing well: that has been the rallying cry of the Unix Way since its inception some 40 years ago. However, if you look at even the basic tools that ship with a modern Unix (Linux, BSD), many of them have accumulated an abundance of options and layers of additional functionality over the years.

Every time I have to use anything but the well-worn options of tools like tar and sed, I’m reminded of how deep some of these tools really are. Not only are they deep, but often there are a dizzying number of ways to do the same thing, and sometimes the only way to do what you really need is more complex than it should be. Take a look at the man page for something supposedly simple like find and check out how many options there are.

Case in point: despite the apparent simplicity of the plain-text output of nearly every standard Unix tool, it can be quite complex to parse that text into the format you want. Often I’ll want to grab just one part of a command’s output to use as the input of another command. Sometimes I can use grep to do this, and sometimes grep isn’t quite flexible enough and sed is required. The regular expression needed to get sed to do the right thing is often complex on its own, and of course the flags need to be set appropriately. If the data is in columns, sometimes cut can be simpler.

With this in mind, it seems like the promise of the Unix Way has been lost. When questioned about this very topic, the one and only Rob Pike was quoted as saying, “Those days are dead and gone and the eulogy was delivered by Perl.” With the admission that a more cohesive general-purpose environment is better suited to modern computing, one wonders whether the idea of small tools is at fault or whether the sands of time have simply diluted the initial simplicity of the Unix environment. In my view, Perl is hardly a model of cohesion or simplicity, so to say that it improves upon the standard Unix tools is particularly damning.

What would a programming environment look like that embodied the original ideals of the Unix Way? An environment of easily composable tools, each general in application yet specific in function? The answer is right there in the question, in the words “functions” and “composable”. Modern functional programming languages such as Clojure and Haskell are the closest thing we have to what Unix was intended to be.

Clojure, like most Lisp-like functional languages, is a small kernel with most of the language built up in itself. The idea is that small primitive functions, each doing something basic, are combined into higher-level functions until finally the entire language has been implemented. Functions in Clojure are inherently composable: like the Unix tools, they can be combined to perform more complex tasks. The macro system even provides pipe-like operators, the threading macros, so that functions can be composed left-to-right.
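
For instance, the ->> threading macro pushes a value through a chain of functions and reads left-to-right, much like a shell pipeline (this is standard Clojure, nothing hypothetical):

    ;; "generate | filter | transform | aggregate" as one pipeline
    (->> (range 100)
         (filter even?)
         (map #(* % %))
         (reduce +))
    ;; => 161700, the sum of the squares of the even numbers below 100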

Beyond what functional languages give us to mimic Unix, they far surpass it in the flexibility of their output. Instead of plain text we get data structures like lists and maps that are easily traversed to transform the data into what we need for the next step of the operation.
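
To make that concrete, imagine a df that handed back a list of maps instead of columns of text. The df function below is hypothetical, a sketch of the idea; today you would be reaching for awk or cut instead:

    ;; (df) => ({:filesystem "/dev/sda1" :avail 98765432832 :mount "/"} ...)
    (map :avail (df))       ; free space on every filesystem
    (:avail (first (df)))   ; just the first one: no cut, no regex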

However, despite all of the advantages of functional languages, I still write shell scripts. Why? They are the most immediate way to interact with the OS. Unfortunately, languages like Clojure are cumbersome for quick-and-dirty work. Even Perl can be tricky due to the sheer size of the language and the possibility of module dependencies. Microsoft had a good start with its PowerShell language in that data is output as parseable objects rather than plain text, but it is marred by PHP/Perl-like syntax and a procedural focus. Doing many things requires knowledge of .NET and cumbersome syntax to integrate it into the shell.

I’m not advocating a return to Lisp machines here. We tried that and it didn’t work. Symbolics is dead, and no one even gave a eulogy at that funeral. Plan 9 OS never took off, probably because it’s too abstract and elitist. I do think that revisiting some of what we consider Unix gospel is worthwhile, though. What if we keep the good – “small tools that do one thing well” – and change one or two not-so-good things – “everything is plain text, everything is a file”? What if we said that all command output was an S-expression (a data list), and that instead of files as the interface to the kernel we had functions or S-expressions? For that matter, for the sake of argument, what if everything was JSON? Maybe getting data from the kernel process list was the act of mapping a function over that data structure?
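
As a sketch of how that might look, suppose the process table were exposed as a sequence of maps rather than as text under /proc. Everything here is hypothetical: processes as the kernel call, :rss and :pid as field names, kill as a plain function:

    ;; kill every process using more than 1 GB of resident memory
    (->> (processes)
         (filter #(> (:rss %) (* 1024 1024 1024)))
         (map :pid)
         (run! kill))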

I think that a lot of progress could be made by applying some ideas of functional programming to the standard Unix way of computing.


Written by newcome

March 6, 2012 at 1:54 am

Posted in Uncategorized

31 Responses


  1. Take a close look at PowerShell’s design/philosophy on Windows.

    Will Smith

    March 6, 2012 at 6:43 am

  2. Will, you’re correct that PowerShell has the right philosophy. Maybe I haven’t given it quite a fair shake here. I also didn’t realize that there was an open source effort (http://sourceforge.net/projects/pash/) to make it portable until I looked into it after your comment. I might have a look and make some further comparisons.

    newcome

    March 6, 2012 at 6:53 am

  3. S-expressions or JSON would make command output difficult for humans to read when we’re just typing at the terminal. Maybe the output should change format depending on whether it was requested by a terminal or another script? Maybe the caller should be able to specify the format?

    Chris "Jesdisciple"

    March 7, 2012 at 8:19 am

  4. On second thought, just make a command that takes the S-expression or JSON and reformats it as plain text. So instead of `find --text {…}' we have `find {…} | text' (keeping with Bash, because I don’t know what our hypothetical language’s syntax is).

    Chris "Jesdisciple"

    March 7, 2012 at 8:26 am

  5. Absolutely! You totally get it. Just like in .NET etc., where most things implement ToString() to provide a basic human-readable representation. It could be a command, but it would be even nicer as part of the shell: when going out to the shell, the data would be printed according to a default text transformation. This would let you customize the way your shell shows data by providing your own formatter, and so on.

    newcome

    March 7, 2012 at 11:52 am

  6. Well, toString is more like `--text' than `| text'… It’s got to be implemented again and again for every class (unless you want Object.toString, which is rare).

    After my previous comment, I was also thinking about all that unnecessary typing and how to avoid it. Perhaps it could be both a command and part of the terminal. (I understood “shell” above as the CLI interface, but I prefer “terminal” for that.) I don’t know how real terminals are implemented, though, so they might not have all the freedom to mess with code that a terminal emulator obviously does, in which case this suggestion would break.

    If it’s a command, we can make terminals pipe all commands to it by default. If you want the “real” representation, use a command that converts the whole S-expression/JSON directly to a string; the `text’ command would simply pass a string on through because it’s already acceptable. If you care about optimizing the terminal, you can just detect this other command at the top level of a command and bypass both it and the `text’ command altogether to achieve the same result.

    Chris "Jesdisciple"

    March 8, 2012 at 5:10 am

  7. […] not-so-coincidentally, I’ve been talking about ways to update the tools we use to pass around data structures rather than text blobs […]

  8. Why restrict ourselves to things like S-expressions or JSON? How about using whatever protocols are necessary to do the job?

    Jake

    March 25, 2012 at 6:55 am

  9. Hmm, let’s see. Unix utilities have become Microsnot-Word-like behemoths, so the solution to that is to start using S-expressions or JSON???

    This doesn’t make sense!

    In the original UN*X spirit, you can think of each utility (cut, grep, ls, ps) as a “function”. They act and behave exactly like functions: they take input, operate on it, and return output without affecting the original input … that sounds “functional” already.

    The issue is the feature creep that has happened to the original utilities. I’m sick and tired of having to wade through 30 pages of man page just to look up one of the three options I always use. The issue is that the utilities are too complicated now and command documentation is written without any regard for day-to-day use. The example section in most man pages is absolutely fucking useless. Most people need the cut command to cut a certain field out of the previous command’s output, but I challenge you to go look at any of the man pages and see if you can figure that simple use case out from either the examples or the options galore that have been tacked onto a simple concept.

    So, the solution is to go back to a simpler, no-bullshit set of commands with no-BS man pages, instead of adding needless complexity to each command by requiring s-exprs or JSON as the intermediary. This goes back to a different way of thinking which has gone away, and THAT is the main problem. A way of thinking which does not equate the size of one’s penis to the moronic complexity one introduces into one’s particular pet utility. A way of thinking which finds “taking unimportant crap away” elegant, instead of acting like a feature pack-rat.

    The documentation has to be treated with the same respect and care (if not more) that is afforded to the actual code. It is all about design. Simple and elegant still works, as long as you don’t start redefining it to be whatever the writer thinks is simple and elegant (I’m looking at you, Emacs users/maintainers/creators!).

    Commercial documentation writers seem to get paid not by information density but by the length of their docs, so they cram them with useless shit and think they’re “helping” while in fact they are reducing the usability of the utility in question. A case can be made that this is a direct result of the feature bloat that overzealous utility writers introduce by adding all kinds of “nice features” to commands which were originally conceived to conform to the philosophy of “do one thing, and do it well, with clear, parseable output that is usable by other commands when strung together using pipes.”

    I call it the “emacs syndrome”. Every author tries to turn their creation into a behemoth that does everything, because they feel most comfortable with their own creation, all the while creating a frikkin nightmare for the end users. My editor should not check the weather, make coffee, create blogs, do financial calculations AND walk the dog by sending XML-JSON output to an Arduino-controlled robot, just because I’m too fucking lazy to get my fat ass out of one command and into another. And I shouldn’t create a monstrosity like that just because I can.

    Mike Moloch

    March 25, 2012 at 9:29 am

  10. “For that matter, for the sake of argument, what if everything was JSON? Maybe getting data from the kernel process list was the act of mapping a function over that data structure?”

    STOP! Just STOP right there!

    This is /exactly/ the kind of well meaning but fundamentally flawed design approach that has essentially destroyed the “UNIX way.”

    On second thought, and just for fun, you should try to present this to Linus, I’d be extremely interested in reading THAT particular exchange 🙂

    Mike Moloch

    March 25, 2012 at 10:02 am

  11. It seems to me that the feature creep is an indication that the Unix Way isn’t as holy as one is led to believe. There’s a good, human reason we have tar -z instead of gunzip | tar (or even just tar, since it tends to detect that it needs to call gunzip), and that’s because, in practice, it’s easier to have one program that does the thing you want than several programs that each do part of it.

    Leszek

    March 25, 2012 at 11:30 am

  12. lol ur retard

    fettemama (@fettemama)

    March 25, 2012 at 1:18 pm

  13. I don’t think the Unix way has failed.

    I think the way people have wanted to use Unix has ebbed and flowed. And Unix has followed suit.

    Loosely coupled tools with a gazillion options vs tightly coupled, easy-to-use tools with little flexibility. vi or emacs? Both have a place in the community.

    I honestly think it depends on the person and the technical problems they routinely tackle. I am continually bothered by PowerShell because I, as a programmer, deal in text, and PowerShell is a bear to do text transformations in (it’s great with objects). You have to slowly massage your strings out in a set of confusing cmdlet calls with odd switches, and the man pages are even less descriptive. But sadly it’s the best and most widely deployed and supported string search-and-manipulation solution for Windows.

    Unix lets _you_ pick the right tool for the job, instead of some company or ideology picking it for you. I miss it for that reason (few good Unix jobs where I live, lots of Windows ones).

    Jmp

    March 25, 2012 at 3:37 pm

  14. Following up on the PowerShell comment: Check out Object Shell (http://geophile.com/osh). It applies the “Unix Way” to python objects (and runs in a Linux environment). E.g., to kill all java processes:

    osh ps -o ^ select 'p: "java" in p.commandline' ^ f 'p: p.kill()'

    ps lists processes, ^ is the pipe character, select takes a process argument, p, and keeps those with “java” in the command line, and then pipes the selected processes to a function that applies the kill function.

    osh also does database access, piping rows out as python tuples, and distribution, piping commands to one or more hosts, and piping results back as python tuples, each labeled with the host generating it.

    (Disclaimer: I wrote osh.)

    Jack Orenstein

    March 25, 2012 at 7:36 pm

  15. PowerShell is just one language that binds to ActiveScripting. You can pretty much use anything you want.

    xcbsmith

    March 25, 2012 at 8:23 pm

  16. I always thought the Unix Way seemed inefficient, invoking several different processes to perform one operation. PowerShell seems like a good idea, passing objects around instead of requiring everything to parse and format text. Lua seems like it could be adapted to do this well…

    Also look at TermKit. It has a lot of the same ideas, like passing data around in machine-readable formats, then piping them through a converter before displaying human-readable results in the terminal.

    ⬡

    March 25, 2012 at 11:01 pm

  17. As always, the first enemy of code is size. This is true for any individual program, and it is true for a whole OS. Bigger means clumsier, more error-prone, harder to learn… If it can be done at all, significant resources should be devoted to making the system smaller.

    Currently, a full GNU/Linux desktop with basic applications weighs more than 200 million lines of code. To visualize that, imagine we print this code on paper: 50 lines per page, 400 pages per book. That’s 10,000 books, or a whole library. Dense _technical_ books. If you’re a super-fast learner and can read one book per week, it would take 200 years to get through them. Of course the UNIX way is long lost. There is simply no way a collection of orthogonal utilities can get this big. Not even a genius can comprehend it now.

    Now, on to concrete solutions. The Viewpoints Research Institute is currently working on a system for personal computing that does nearly all basic computing (desktop publishing + web + email + other things) in 20,000 lines of code (a single book), including the self-implementing compilers. So I’d say there is hope.
    http://vpri.org/html/work/ifnct.htm
    http://www.vpri.org/pdf/tr2011004_steps11.pdf

    Loup Vaillant

    March 25, 2012 at 11:52 pm

  18. […] from: Functional programming and the death of the Unix Way This entry was posted in Tyson Zinn and tagged archives, facebook, freedom, going-out, power, […]

  19. Everything is a stream of bytes. (BECAUSE THAT’S WHAT IT IS!) …

    This is a much simpler and infinitely more powerful concept than tooling around with “les objets du jour”.

    And with all due respect to the osh author, that command looks ugly! I’d take a simple and elegant Unix command pipeline over that any day!

    Mike Moloch

    March 26, 2012 at 3:30 am

  20. “I’m not advocating a return to Lisp machines here. We tried that and it didn’t work. Symbolics is dead”

    Lisp machines died because the hardware was underpowered, and compiler technology wasn’t yet good enough to make up for it. In a decade when Moore’s Law was stepping hard on the gas, they were slower than general-purpose CPUs, even at running compiled Lisp code, almost as soon as they were shipped. When you’ve got 10-minute garbage collection pauses, it doesn’t matter how good your user interface is. But that doesn’t mean the entire concept is bad.

    One of the weaknesses of both the Unix and Lisp ways (though their users don’t really consider it one) is that there’s basically no separation of interface and implementation. The interface to Unix is the set of binaries you happen to have installed; the interface to Lisp is Lisp. Perhaps this keeps people from imagining the possibility of a system with the usability and flexibility of a Lisp machine, but the performance of a modern CPU and optimizing compilers.

    While the performance of Lisp machines was poor, I’ve never heard of anyone who used one who didn’t love the user interface.

    Pat

    March 26, 2012 at 7:52 am

  21. I’m going to try to respond to some of the comments I’ve gotten here:

    @mike moloch I appreciate the rant; I’m kind of mad too every time I scan a man page needing to do something simple that is outside the options I already know off the top of my head. My hope is that by making the output of commands more consistent data-wise, it would be easier to split things up. For example, say deleting old files requires using find and rm. I hardly use find unless I have to do something like this, but I use ls all the time (so I’m more likely to remember its options). Doing something like ls -> take-oldest-n -> rm would be really logical, but the output of ls isn’t designed to be used this way, and I think you would need xargs to make rm read from stdin.
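
    To sketch what I mean, in Clojure the whole ls -> take-oldest-n -> rm pipeline is just composed functions (a minimal sketch: delete-oldest! is a name I’m inventing, and it only looks at one directory, non-recursively):

        (require '[clojure.java.io :as io])

        ;; list the files, sort by modification time, take the oldest n, delete each
        (defn delete-oldest! [dir n]
          (->> (.listFiles (io/file dir))
               (filter #(.isFile %))
               (sort-by #(.lastModified %))
               (take n)
               (run! #(io/delete-file %))))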

    Maybe naming JSON specifically here is just adding fuel to the fire, so let me say that in my experience there is no such thing as ‘plain text’. The output of every command has implicit semantics. I would be fine with some tab-delimited format if it were a consistent way of parsing command output, but even the simplest output like this is still more thinking than it should be. Take sha1sum for example. For some reason unknown to me, it outputs two columns of data, one of which is just a ‘-’ character. I never care about that; I just want the hex string without any spurious whitespace or newlines. There is a thought process then of “hmm, should I split this with cut, grep -o, or maybe sed with some nutty regex using capture groups,” when really it should be a matter of specifying that field by name.
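
    Here is that thought process in code. The first form is real Clojure run against today’s sha1sum (notes.txt stands in for whatever file you’re hashing); the field-by-name version underneath is the hypothetical part:

        (require '[clojure.java.shell :refer [sh]]
                 '[clojure.string :as str])

        ;; today: shell out, split on whitespace, and hope the format holds
        (-> (sh "sha1sum" "notes.txt") :out (str/split #"\s+") first)

        ;; what I'd rather write, if sha1sum returned a map:
        ;; (:digest (sha1sum "notes.txt"))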

    newcome

    March 26, 2012 at 10:34 am

  22. @jack orenstein – osh looks cool; I’m going to check it out soon. It looks like it embodies many of the ideals put forth here.

    newcome

    March 26, 2012 at 10:39 am

  23. @newcome I tend to agree with your statement that the output of every command has implicit semantics. I don’t have an issue with JSON per se, but by the same token (no pun intended here) there is no difference between JSON and plain old text (as far as UN*X utilities are concerned) as long as the output semantics are clear and certain agreed-upon rules are adhered to. If that were the case, then it would be just as easy to insert a “t2j” (text-to-JSON) converter utility in the middle of a pipeline, one which took the columnar output from a given command (which adhered to these output semantic rules) and converted it to JSON:

    ls | t2j | JSON-CONSUMER | j2t

    The thing is that there are no standard output semantics (or there don’t seem to be) that utility writers comply with. In my opinion the issue is that the utilities themselves have tacked on extra functionality through feature creep, and that has eroded the (supposedly) clean initial output semantics these tools were meant to have.

    So my thinking here is that it is not the “glue” medium (text or JSON) that needs to be fixed; the problem lies with the feature-bloated utilities themselves, and if that aspect were “fixed”, we could (/could/) go back to simply stringing commands together.

    BTW, I’ve thought about the exact same use of “ls” but mostly given up in the interest of time whenever I needed it and ended up using ‘find’ instead.

    Mike Moloch

    March 26, 2012 at 4:05 pm

  24. You might want to check out RecordStream. I have used it quite a number of times along with the shell’s arsenal of commands to get some specific stuff done. It uses Perl, which makes it a little eerie, but it’s a pretty powerful set of tools. Also inspired by PowerShell/Monad.

    https://github.com/benbernard/RecordStream

    jaskirat (@jaskirat)

    March 26, 2012 at 10:54 pm

  25. Similarly, there is also pied piper, which brings Python to the shell, with similar notions of pipes and so on. It seems simple, but I haven’t really tried it.

    http://code.google.com/p/pyp/wiki/pyp_manual

    jaskirat (@jaskirat)

    March 26, 2012 at 11:00 pm

  26. This is a real flamebait title.

    You are too smart to think that the “unix way” is ever going to die. Unix has the Buddha Nature.

    asdfasdfsfad

    April 10, 2012 at 5:16 pm

  27. On the need to support annotation… as Moon has said, a problem with s-exps is the inability to hang “extra” information off them. You can’t have a code list produced by read where everything is quietly carrying file positions, for instance. Adding annotations breaks existing expectations (absent an additional abstraction layer). So in pipelines, it’s hard for non-adjacent members to have private protocols.

    So, there’s XML. ‘Everything is a hash’ formats and languages, which permit slapping on extra slots. And languages with support for attaching annotations to objects. Objects with views. Anything else? I note that here in the future, pipeline components can load library support specific to the pipeline.

    Sigh. So the usual – an obvious, long known, society-crippling need, with obvious paths forward, but dysfunctional incentives, reducing progress to a multi-decade glacial-creep installment plan. I’m so looking forward to language change transitioning to incremental adoption. Yay javascript.

    Mitchell

    May 28, 2012 at 6:53 am

  28. “Functional programming and the death of the Unix Way « Dan Newcome, blog” really got me hooked on your internet page! I actually definitely will wind up being returning a lot more normally. Thanks, Jeremy

  29. PowerShell is a f*cking joke. Data is all just bytes no matter how you cut it. “Plain text” can be every bit as “structured” as anything else if you make it so. The output of most Unix utilities is very well structured and in many cases standardised. PowerShell sounds good on paper but it’s a laughable toy in practice. The great thing about plain text is that it imposes and enforces *nothing*, and yet allows *everything*.

    Craig

    May 18, 2013 at 2:28 pm

  30. Hi Craig, thanks for the comment. (You don’t happen to be at the BayHac Haskell hackathon right now, by chance?)

    I agree that text can be just as structured. You have to admit, though, that if the text is structured in a standard way it makes things much easier for the consumer. I parsed the output of ifconfig in some exploratory code, and it would have been a lot easier if there were some flag that output CSV instead of the format it typically comes in.

    PowerShell is a bit of a red herring in this discussion at this point. I think it had some interesting ideas, but of course the FOSS ecosystem is never going to adopt something like PS, so it’s pretty pointless.

    One of the questions remaining in my mind is whether “worse is better” applies here with respect to textual output. Your argument is obviously for it, where mine was for something more standardized. I realize that many a holy war has been fought over smaller details, though.

    newcome

    May 18, 2013 at 2:59 pm

  31. […] are still some places where the shell falls down in my opinion though (ahem), but it mostly has to do with just getting around the filesystem. I started writing some tools […]

