Is this thing on?

posted on August 19, 2008

I've been busy and blog-neglecting lately, but I'm still here and hope to start blogging again soon. And I want to test that my bits haven't completely rotted while I wasn't paying attention.

Update: Still fresh!

Look out Martians

posted on March 26, 2008

Thomas Edison has your number.

Nein!

posted on March 25, 2008

My friend posix4e has today the most awesome description evah of the nature of actual programming:

Basically programming is often like playing golf in alice in wonderland with a referee who was trained by the SS.

e-bookin’

posted on March 21, 2008

Ooh – I have a blog. Maybe I should use it to post things?

On that theory, I present British author Richard Herley. Mr. Herley – whose novel The Penal Colony was the basis for the film No Escape – is offering his books for free download. They are provided under a CC A-NC-ND licence, but he requests that “honourable” readers pay him a small fee if they enjoy the books. I really hope this sort of business model can work.

Ode to building, part 2

posted on August 08, 2007

In the first part of this series I introduced the idea of what I termed abstract build policies — higher-level methods for describing how to produce the DAG of a build from a description of the desired inputs (sources) and results (targets). An abstract build policy allows developers to specify their sources and targets in abstract terms and let the build system sort out the details. For example, the autotools allow a developer to specify that they want to generate a shared library from a set of C source files:

lib_LTLIBRARIES = libexample.la
libexample_la_SOURCES = example.c example.h ...

But what's really going on under the hood?

Instead of requiring developers to describe the complete DAG of all sources-to-targets steps in a build, the autotools conspire to allow the developer to operate in terms of a set of higher-order abstractions:

systems:: The build is occurring on a particular kernel/OS/hardware combination (the build-system) to run on another (the host-system) and may — if producing a compiler, etc. — generate system-specific entities for another (the target-system).
variants:: The user invoking the build may choose to add optional parts of the build, remove other parts, and cause yet other parts to occur in an alternative fashion.
sources:: The inputs to the build, some of which may themselves need to be generated from predecessor sources.
targets:: The kinds from a restricted set (e.g., shared library), names (libexample.so), and other properties (somewhat abstract installation location or lack thereof) of entities to result from the build.

The key aspect of the abstraction is that the developer describes her build only in terms of the details relevant for the particular build. "This set of source files produces a shared library." The autotools handle the nitty-gritty of populating the DAG with all the nasty little details only real toolchain-weenies actually care about.
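To make that concrete, here is a rough shell sketch of what "populating the DAG" means for the shared-library example. It is not how automake or libtool work internally. The function name emit_shared_library_rules is made up for illustration, and the -fPIC and -shared flags are assumptions that only hold for a gcc-style toolchain. All it does is expand one abstract statement into the concrete make rules a developer would otherwise write by hand.

```shell
#!/bin/sh
# Hypothetical sketch: expand "shared library NAME built from SOURCES"
# into concrete make rules. Flags assume a gcc-like compiler.
# Usage: emit_shared_library_rules NAME SOURCE.c...
emit_shared_library_rules() {
    name=$1; shift
    objs=
    for src in "$@"; do
        obj=${src%.c}.o
        objs="$objs $obj"
        # One compile rule (DAG edge) per source file.
        printf '%s: %s\n\t$(CC) $(CFLAGS) -fPIC -c -o %s %s\n' \
            "$obj" "$src" "$obj" "$src"
    done
    # One link rule joining all the objects into the library.
    printf 'lib%s.so:%s\n\t$(CC) -shared -o lib%s.so%s\n' \
        "$name" "$objs" "$name" "$objs"
}
```

Running emit_shared_library_rules example example.c util.c prints one compile rule per source file plus a link rule for libexample.so. The real tools also have to cope with every platform where those two flags are wrong.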

In a perfect world (or for sufficiently simple/well-written code), the developer doesn't care whether the code is being built with gcc or the Sun Workshop compiler, for OpenBSD or AIX. If the code depends upon OS features which vary from system to system, then the developer needs to account for those variations, but only those variations. Accessing /proc involves text-file parsing on Linux and a library interface on Solaris, but the developer still doesn't care which compiler the build invokes or which flags the linker requires to coax it into producing a shared library.

The key limitation of the autotools is that they provide no facility for generating new abstractions. As I said in part 1, they provide a set of abstract build policies, but no real build policy abstraction. The beauty and downfall of the autotools is that they produce truly portable build descriptions — build descriptions which depend only on the POSIX shell, POSIX utilities, POSIX make, and a toolchain for the source language. But providing even as simple a new abstract policy as one for producing a new target type or compiling a new source language is impossible without wholesale modification of the autotools core. And even then a new policy has little power to leverage intermediate abstractions, ultimately needing to itself generate portable POSIX make rules.

As originally promised for this part, part 3 will look at build tools which allow definition of new abstractions and what that implies.

Ode to building, part 1

posted on August 08, 2007

[This is an only slightly edited repost from an earlier incarnation of my blog. The archives from that blog didn't make the transition, but I'm planning to continue the series, so here's this post again.]

How does one build a piece of software from source? The superficial answer is to run the tools1 on the source and intermediate files in the proper order. I'm sure we've all slapped something in a single C source file and just run the compiler on it from the shell. The next logical step beyond this would be a batch mechanism like a shell script running all the necessary commands. Perhaps worthy of a post to worsethanfailure.com if I still had access to the original, I've seen production binaries built with a file like:

#! /bin/sh

cc -c source1.c
cc -c source2.c
# ...
cc -o product source1.o source2.o # ...

In practice most of us use higher-level tools (thank god). The make tool represents the obvious next step "Unix philosophy"-wise. It assembles a directed acyclic graph of files in the build and performs the steps necessary to build the parts of the graph not yet present, leveraging the POSIX shell to represent and execute each build step. Some projects — it seems to be especially popular for kernels — use what I'll call Plain Old Make (POM), applying make "directly."

I put "directly" in quotation marks because most large POM projects put a fair amount of engineering effort into constructing higher-level abstractions applied consistently and automatically throughout the project. In standard Unix fashion make provides "mechanism not policy." In this case make implements the "mechanism" of bringing all the DAG nodes up to date, but requires make users to specify the "policy" of how to create each of those nodes from their dependencies2.
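A minimal sketch of that mechanism half, assuming the usual modification-time comparison make implementations use: a node needs rebuilding when it is missing or older than one of its dependencies. The out_of_date helper is hypothetical, and the -nt test operator is an assumption about the shell (bash, dash and ksh support it). How to rebuild the node, the policy, is deliberately absent.

```shell
#!/bin/sh
# Hypothetical helper: exit 0 if TARGET must be rebuilt, 1 otherwise.
# Usage: out_of_date TARGET DEP...
out_of_date() {
    target=$1; shift
    # A missing target is always out of date.
    [ -e "$target" ] || return 0
    for dep in "$@"; do
        # -nt: true when $dep was modified more recently than $target.
        if [ "$dep" -nt "$target" ]; then
            return 0
        fi
    done
    return 1
}

# The policy is whatever the caller bolts on, for example:
#   out_of_date source1.o source1.c && cc -c source1.c
```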

This leads to two interesting observations:

  • Most projects — especially large ones — want factored build policies (node patterns) they can apply consistently throughout the project.
  • Most projects share an awful lot of policy in common.

The GNU version of make includes extensions to help with the first3, but for the second we move on to yet higher-level tools.

Most open source projects these days describe their builds using the GNU autotools: autoconf, automake, and libtool. The autotools provide abstract build policies which factor reusable build logic. They provide these policies in two ways. First, the autotools "know about" certain kinds of common build targets — they include the logic necessary to build a "program" or a "shared library" on supported platforms, requiring the developer only to specify the kind of target desired, not all the steps necessary to achieve it. Second, the autotools provide a policy for producing certain kinds of variant builds — they probe for features of the target platform and implement a policy for reacting to those features4.

Unfortunately, the autotools limit their abstract policy capabilities in an important way: the inability to implement new abstract policies. The autotools make it possible to descend to the mechanism level of make, m4, and the POSIX shell, but only in the same concrete way those tools do on their own. The autotools provide a collection of abstract policies, but no real policy abstraction.

Build tools which implement real build policy abstractions move us out of the realm of current popular use and on into part 2 of this series...

1 e.g., compiler, linker, Whiz-bang Source Frobnicator Enterprise Edition

2 The pedantic will note make support for "pattern rules," but I consider the fact that POSIX make pattern rules don't account for e.g. the different ways an object file needs to be generated for inclusion in an executable vs. a shared library enough for me to ignore them in this discussion.

3 $(include) and friends, although they're pretty clunky in practice.

4 This would be your config.h and its HAVE_FOO_H etc. macros.

A compendium of awesomeness

posted on August 05, 2007

Miro rocks. Due to its awesomeness I've been catching up on all those Google Tech Talks I haven't found the time to stare at in a Web browser. And it gives me a reason to stop cursing how wmii/stable1 considers "fullscreen" for "chumps" — miro in "fullscreen" mode fits quite nicely in one corner of my laptop display.

Among the Google Tech Talks, the one on 7 Habits For Effective Text Editing 2.0 included a mention of a vim command which (in Emacs-speak) starts an incremental search on the token at point. I rather liked that, which yields:

;; I-search with initial contents
(defvar isearch-initial-string nil)

(defun isearch-set-initial-string ()
  (remove-hook 'isearch-mode-hook 'isearch-set-initial-string)
  (setq isearch-string isearch-initial-string)
  (isearch-search-and-update))

(defun isearch-forward-at-point (&optional regexp-p no-recursive-edit)
  "Interactive search forward for the symbol at point."
  (interactive "P\np")
  (if regexp-p (isearch-forward regexp-p no-recursive-edit)
    (let* ((end (progn (skip-syntax-forward "w_") (point)))
           (begin (progn (skip-syntax-backward "w_") (point))))
      (if (eq begin end)
          (isearch-forward regexp-p no-recursive-edit)
        (setq isearch-initial-string (buffer-substring begin end))
        (add-hook 'isearch-mode-hook 'isearch-set-initial-string)
        (isearch-forward regexp-p no-recursive-edit)))))

But then the real awesomeness I can't believe I've only just recently discovered — what language do you think this is?:

cdef extern from "awesomelib.h":
    cdef int completely_awesome_function(char *nice)

cdef class CoolThing:
    cdef int totally
    def __init__(self, nice):
        self.totally = completely_awesome_function(nice)

Python? Not quite. C? Closer, in a sense. It's Pyrex, a Python-like language which encapsulates (but distinguishes) both Python and C constructs, then compiles to C code which uses the Python/C API to provide a Python module. Pyrex does the standard simple type conversions SWIG or a given <Foo>Inline will handle, but also allows the easy creation of extension types (Python objects with C data members) and by-god-simple Python callback routines passed to C callback APIs.

This might just be the straw which pushes me back into the squeezy embrace of Python as my LoC.

1 Debian's wmii/testing is 3.5, which breaks my ruby-wmii setup.

Will the real hack please stand up

posted on April 15, 2007

I think I've changed my mind re: SVK and svn:externals, because it turns out this works just like you'd hope it would1:

svk mirror svn://project1/ //mirror/project1
svk mirror svn://project2/ //mirror/project2
svk sync --all
svk cp //mirror/project1 //local/project1
svk cp //mirror/project2 //local/project1/vendor/project2
svk push //local/project1

Then later do:

svk pull //local/project1/vendor/project2
svk push //local/project1

I've decided that svn:externals is the real hack. I've generally seen it used to solve two different problems: (a) exporting common modules as part of different projects stored in the same repository; and (b) importing "vendor branches" from other repositories. The problem is that it doesn't really do either well.

First, svn:externals requires a fully-qualified URI. This causes it to break if either the repository moves or the repository is accessed by different protocols. Both of those problems can be "solved" with "don't do that," but I've personally had cause for both. Obviously either of these will break people vendoring from your repo, but a URI-change shouldn't break internal module sharing. The fact of the matter is that the URI for a repo is a non-canonical name — and Subversion even accounts for this fact by generating a UUID for each repository.

Second, directly incorporating some chunk of repo into your project is rarely what you want to do. I've been guilty of this before — and am rather embarrassed thinking about it. I might be going out on a limb, but I think most vendor branches need to (a) be pinned to a known-stable version; and (b) include local modifications. For the former, svn:externals depends on either good tagging in the source project or (shiver) pinning to a specific revision. For the latter, svn:externals provides no help whatsoever. In fact, the Subversion manual chapter on vendor branches barely mentions svn:externals, preferring instead careful branch-management and merge practices2.

So I think SVK's way of doing things is quite a bit closer to a solution, although still not quite. It solves the problems discussed above, but still requires the local mirror exist (i.e., external configuration), and breaks when switching the actual source (tag, etc.) of the vendor branch3. "Here's 2 cents — go get yourself a real distributed version control system," I hear some of you saying...

1 For the SVK-unfamiliar, this is almost exactly what Piston lets you do, only in both directions.

2 And a Perl script, apparently.

3 Breaks pull and the UUID-tracking, that is. A generic smerge will still work just fine.

Home sweet version-controlled home

posted on April 15, 2007

I've been meaning to do this for quite a while, but I finally have my entire home directory under revision control. I've been keeping most of my important rcfiles in a Subversion repo — and of course no source code goes unversioned — but everything else has just never made it into anything other than my... um... "infrequent"1 off-site back-ups. Until now!

I looked briefly over several version control systems before finally deciding to stick with Subversion. It's stable, the most widely used by open source projects (after CVS), hacked on by someone I know, and only lacks one feature I'd miss. Work-use of SCM might have affected my decision, but my current day job is using StarTeam2 so... no.

The one feature I'd miss is distributed development. All the cool kids are doing it — not to mention the more recent change-set oriented OSS version control systems. So I'm layering that on with SVK. If you haven't heard of it, SVK is... well... a giant hack. And written in Perl nonetheless. It essentially uses the Subversion libraries to implement an alternate interface for working with Subversion repositories. The main intent is something like distributed development3, but the way SVK goes about it has some interesting effects.

The big one is how working copies are maintained. The SVN tools maintain with each working copy a pristine copy of the revision checked out from the remote repository. This allows for fast revert, diff, etc., and is also probably the easiest way to allow preparation of the change-set for commit. SVK, instead of maintaining that pristine revision, mirrors the remote repository. This allows diffing not only against the checked out revision, but against any revision in the mirror — or between any revisions in the mirror. Want to checkpoint some experimental code? Commit it locally and only push it out after it's finished. Need to switch from your experimental code to mainline development? Create local branches and switch your working copy between them at will.

And this is kind of petty, but not having those .svn directories in each checkout means I can stop using this monstrosity:

export SFIND_PRUNE_PATS='*~:*/CVS:*/.svn'
sfind() {
    local dirs=''
    while [ $# != 0 ]; do
        echo "$1" | grep '^-' >/dev/null 2>&1 && break
        dirs="$dirs '$1'"; shift
    done
    local opts="$(squote "$@")"

    local prune_args="-path '$(\
        echo "$SFIND_PRUNE_PATS" | sed -e "s,:,' -o -path ',g")'"
    eval "find $dirs \
              '(' '(' $prune_args ')' -prune ')' \
              -o $opts -print"
}
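The squote helper that sfind calls isn't shown above. A hypothetical stand-in, assuming all it needs to do is single-quote each argument so it survives the later eval, might look like this:

```shell
#!/bin/sh
# Hypothetical stand-in for the squote helper used by sfind.
# Prints each argument wrapped in single quotes, with embedded single
# quotes rewritten as '\'' so that eval sees the original words.
# Caveat: trailing newlines in an argument are lost to $(...).
squote() {
    for arg in "$@"; do
        # The sed script that actually runs is: s/'/'\\''/g
        printf "'%s' " "$(printf '%s' "$arg" | sed "s/'/'\\\\''/g")"
    done
}
```

With no arguments it prints nothing, so a call like sfind . with no options adds no stray empty argument to the find command line.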

SVK used to have a reputation for being a bit unstable. So far, the only problem I've had there is trying to use version 2.0's "views" feature, which is clearly tagged as "very beta." And even if it does fall over, my "real" repository is still just plain old Subversion.

Lest I mislead you into believing SVK to be all kittens and sunshine, I have had some problems. SVK doesn't allow overlapping working copies, so keeping all of ~ in a repo involves symlinks. SVK keeps all pending non-file changes4 in one big YAML file — prepare a commit with a lot of deletes, and you can feel the parse time. SVK doesn't support svn:externals, so WYSIWYG when it comes to your repo's directory hierarchy. SVK's "views" are supposed to provide similar functionality, but in addition to not being there yet implementation-wise, they seem a bit off design-wise — and still aren't svn:externals. And did I mention that it's written in Perl?

But it's mostly kittens and sunshine, and I don't have to worry about letting cups of coffee near my laptop anymore. For next blog post: where I'm keeping this homedir Subversion repo of mine...

1 i.e., usually made shortly after some sort of laptop + liquid incident.

2 I can vaguely understand using a proprietary version control system, but one which didn't have atomic commits until its most recent release? Come on, people.

3 Projects are still tied to a central repository, which is either impure or just practical, depending on how you look at it.

4 Directory hierarchy changes, property modifications, etc.

Fun with screen

posted on March 10, 2007

Despite being such an Emacs zealot, I've never been able to love any of the Emacs shell solutions. Emacs' terminal emulation is just off enough to bug me, and eshell drives me nuts1. So that leaves me running a shell in a dedicated terminal (emulator). Because I do most of my work in Emacs — including using tramp to edit files on remote systems — I rarely need to have more than one terminal visible at any one time, however many I have running. Solution: the GNU screen "terminal multiplexer."

Check it out if you haven't heard of it, but this post is mostly about a screen configuration trick I've found useful. Screen has the ability to have a status line showing the open virtual terminals, each designated by a title. The title of each window can either be static, or dynamically updated based upon a watched-for pattern in the terminal output. The pattern-watching mechanism is geared toward titling each virtual terminal with the currently running command2. My configuration hack is to use that mechanism to display instead the name of the host I've remotely logged into in that terminal.

Here's the relevant bit of my .screenrc:

shelltitle '@|'
hardstatus alwayslastline '%-Lw%{= BW}%50>%n%f* %t%{-}%+Lw%<'

The hardstatus directive is taken pretty much directly from the screen documentation. The shelltitle directive tells screen that after it sees the (built-in hard-coded) escape sequence "ESC k ESC \" everything between the next '@' and the end of the line should be the terminal title.

For "normal" use of this feature, one would embed the escape sequence and pattern in one's shell prompt. I do the same thing, only tricky:

# 'screen' title escape
PS1="\033k\033\134@\h\n\033[1A"

I build my $PS1 incrementally in a prompt_command shell function, so that's only the first part, but it's where the magic happens. Each time the shell displays its prompt, it will print the "ESC k ESC \" escape sequence, then the '@' screen is searching for, the hostname (which screen will pick up as the new title), a newline (ending the title), then finally the terminal control escape sequence to go back up a line.

End result — screen picks up the hostname as the title, but the rest of the prompt overwrites the hostname output in the visible terminal text.
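The same byte sequence can be poked at outside of $PS1. This is only a sketch: the function name screen_title_prefix is made up, and it uses printf with a hostname argument instead of bash's \h prompt escape so that it can be run and inspected anywhere.

```shell
#!/bin/sh
# Hypothetical helper: emit the title-setting prefix for host $1.
screen_title_prefix() {
    # \033k \033\134 : ESC k ESC \  (screen's title escape, left empty)
    # @%s\n          : the '@' that shelltitle searches for, then the
    #                  hostname, then the newline that ends the title
    # \033[1A        : move the cursor back up one line
    printf '\033k\033\134@%s\n\033[1A' "$1"
}
```

Piping it through od, as in screen_title_prefix "$(hostname)" | od -c, shows the individual bytes without letting the terminal interpret them.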

1 I could probably hack the way command history and tab completion work, but scrolling behavior is so annoying I've just assumed it must be a hard problem to fix.

2 screen expects the new terminal title to end with a newline character.
