Home

shell

An Opinionated Guide to Options Parsing in Shell

Some may say that you shouldn’t write shell beyond a certain, very low bar of complexity. If you reach for arrays, certainly associative arrays (gasp!), or if your script approaches 20, 50, or 100 (how dare you!) lines, maybe you want a “real” language.

Everyone’s bar is different, but I’d wager actual options parsing is above it for most. I think this is misguided; parsing options in shell can be valuable enough, and done with low enough complexity, to more than pay for itself on this scale. I think the problem is a lack of familiarity (did you even know you could parse options in shell?) coupled with confusing alternatives and an information-dense (read: overwhelming) documentation style in the space.

I’ve arrived at a narrow pattern of shell options parsing that has drastically improved my scripts, without introducing much by way of downside. By accepting some limitations, I think I’ve found a good 80/20 of benefit vs. complexity in this space.

Skeleton

Here is how I begin any script I write:

#!/bin/sh
usage() {
  cat <<'EOM'
TODO
EOM
}

while getopts h opt; do
  case "$opt" in
    h)
      usage
      exit 0
      ;;
    \?)
      usage >&2
      exit 64
      ;;
  esac
done

shift $((OPTIND - 1))

printf '>%s<\n' "$@" # For demonstration purposes

I used to do this “when I needed”, but I’m done fooling myself. I always end up wanting this, and I’m always happy when I’ve done it from the start. Seeing usage front-and-center, top-of-file, is great. Being able to expect -h in a script that isn’t used very often is extremely useful, for me and my team.

Let’s break down what’s happening:

while getopts h opt; do

The getopts utility is typically a shell built-in and is specified by POSIX. This means you can use it in pretty much any shell, but it is less featureful; no long options, for example. There is also getopt, an external program whose enhanced (util-linux) version does support long options. I don’t actually care too much about POSIX compatibility, and I most often write scripts with a bash shebang, but I find getopt’s usage very clunky, so I prefer getopts. Your mileage may vary.

The h is the optstring or “options string”. It defines the options you are going to parse for. In this case, I’m saying the single option h without any arguments. I’ll extend it later and you’ll see how its syntax works.

opt is the name of the variable into which getopts places each option it parses, one per iteration of the loop.

case "$opt" in
  h)
    usage
    exit 0
    ;;
  \?)
    usage >&2
    exit 64
    ;;
esac

As mentioned, this will loop with $opt set to each (valid) flag we see, or ? if we were given something invalid. If given h, I print usage information to stdout and exit successfully. The invalid branch is similar, except it prints to stderr and exits unsuccessfully.

I prefer to let getopts print its own error message on invalid options:

% ./example -h
TODO
% ./example -f
./example: illegal option -- f
TODO
% echo $?
64

I think its messages are perfectly clear and I’m happy to not manage them myself. You can suppress these messages by prefixing the options string with :. See the manpage for more details.
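
If you do suppress them, error reporting changes shape: getopts still sets the variable to ?, but it puts the offending character in $OPTARG and prints nothing itself. A minimal sketch of that “silent mode” (the error wording here is my own):

while getopts :h opt; do
  case "$opt" in
    h)
      usage
      exit 0
      ;;
    \?)
      # In silent mode, the bad option character lands in $OPTARG
      echo "invalid option: -$OPTARG" >&2
      usage >&2
      exit 64
      ;;
  esac
done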

shift $((OPTIND - 1))

printf '>%s<\n' "$@"

Lastly, we shift past the parsed options. That way, anything we didn’t handle in getopts is what’s in $@ at this point in the script:

% ./example foo bar "baz bat"
>foo<
>bar<
>baz bat<
% ./example -f foo bar "baz bat"
./example: illegal option -- f
TODO

And since we’re parsing options “for real” instead of ad hoc, we get some behavior for free, such as -- to separate option-like arguments, which is needed to support that last example:

% ./example -- -f foo bar "baz bat"
>-f<
>foo<
>bar<
>baz bat<

Flag options

Now, let’s parse another option:

usage() {
  cat <<'EOM'
Usage: thing [-fh]

Options
  -f            Force the thing
  -h            Print this help

EOM
}

force=0

while getopts fh opt; do
  case "$opt" in
    f)
      force=1
      ;;
    # ...
  esac
done

Here you see one downside compared to “real” languages’ options parsers: we have to say things in 3 places.

  1. The argument to getopts must contain f
  2. The case statement must look for f
  3. The usage function must document -f

If you configure ShellCheck in your editor (you should!), it can at least protect you from most mistakes in item 2.
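
For example, given a hypothetical mismatch like the one below, where v is accepted but never handled, ShellCheck reports SC2213 (“getopts specified -v, but it's not handled by this 'case'”):

# 'v' appears in the optstring, but no branch handles it
while getopts fvh opt; do
  case "$opt" in
    f)
      force=1
      ;;
    h)
      usage
      exit 0
      ;;
    \?)
      usage >&2
      exit 64
      ;;
  esac
done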

Options with arguments

Now, let’s add an option with an argument:

usage() {
  cat <<'EOM'
Usage: thing [-fh] <-o PATH>

Options
  -f            Force the thing
  -o            Output file
  -h            Print this help

EOM
}

force=0
output=

while getopts fo:h opt; do
  case "$opt" in
    # ...
    o)
      output=$OPTARG
      ;;
    # ...
  esac
done

if [ -z "$output" ]; then
  echo "-o is required" >&2
  usage >&2
  exit 64
fi

As before, the same 3 things:

  1. Add o: to the options string; the : indicates an argument is required
  2. Look for o in the case; the argument will be present in $OPTARG
  3. Document accordingly in usage

And we see a new downside: required options are on us to enforce.

This is certainly error-prone, but again, I’m shooting for the 80/20 on complexity vs. featurefulness. If getopts somehow supported declaring options as required, it would then need to also support defaulting. Going in this direction can cause the complexity to spiral too far for POSIX.

For what it’s worth, I agree with where they’ve drawn the line, and leaving that to us makes defaulting pretty easy:

usage() {
  cat <<'EOM'
Usage: thing [-fh] [-o PATH]

Options
  -f            Force the thing
  -o            Output file, default is stdout
  -h            Print this help

EOM
}

output=/dev/stdout

while getopts # ...

Complete example

This snippet should be a good copy-paste source for the limit of what POSIX getopts provides:

#!/bin/sh
usage() {
  cat <<'EOM'
Usage: thing-mover [-fh] [-o PATH] [--] <THING> [THING...]
Move things into some output.

Options:
  -f            Overwrite output even if it exists
  -o            Output path, default is stdout
  -h            Show this help

Arguments:
  THING         Thing to move

EOM
}

force=0
output=/dev/stdout

while getopts fo:h opt; do
  case "$opt" in
    f)
      force=1
      ;;
    o)
      output=$OPTARG
      ;;
    h)
      usage
      exit 0
      ;;
    \?)
      usage >&2
      exit 64
      ;;
  esac
done

shift $((OPTIND - 1))

if [ $# -eq 0 ]; then
  echo "At least one thing is required" >&2
  usage >&2
  exit 64
fi

for thing in "$@"; do
  if thing_exists "$thing"; then
    if [ "$force" -ne 1 ]; then
      echo "Thing exists!" >&2
      exit 1
    fi
  fi

  move_thing "$thing" "$output"
done

NOTE: Normally I would just do nothing if no things were passed, as a form of defining errors out of existence1, but I’m enforcing the argument for demonstration purposes here.
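
Defining that error out of existence would mean simply dropping the guard; iterating an empty "$@" makes the loop a no-op, so the script would quietly succeed (a sketch of the relevant part):

for thing in "$@"; do
  # with no THINGs given, this body never runs
  move_thing "$thing" "$output"
done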


  1. A Philosophy of Software Design

23 Mar 2021, tagged with shell

Mocking Bash

Have you ever wanted to mock a program on your system so you could write fast and reliable tests around a shell script which calls it? Yeah, I didn’t think so.

Well I did, so here’s how I did it.

Cram

Verification testing of shell scripts is surprisingly easy. Thanks to Unix, most shell scripts have limited interfaces with their environment. Assertions against stdout can often be enough to verify a script’s behavior.

One tool that makes these kinds of executions and assertions easy is cram.

Cram’s mechanics are very simple. You write a test file like this:

The ls command should print one column when passed -1

  $ mkdir foo
  > touch foo/bar
  > touch foo/baz

  $ ls -1 foo
  bar
  baz

Any line beginning with an indented $ is executed (with > allowing multi-line commands). The indented text below such commands is compared with the actual output at that point. If it doesn’t match, the test fails and a contextual diff is shown.

With this philosophy, retrofitting tests on an already working script is incredibly easy. You just put in a command, run the test, then insert whatever the actual output was as the assertion. Cram’s --interactive flag is meant for exactly this. Aces.
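
Running a suite is a single command. Assuming a test file named test/ls.t for the example above, a passing run looks something like this:

$ cram test/ls.t
.
# Ran 1 tests, 0 skipped, 0 failed.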

Not Quite

Suppose your script calls a program internally whose behavior depends on transient things outside of your control. Maybe you call curl, which of course depends on the state of the internet between you and the server you’re accessing. With the output changing between runs, these tests become more trouble than they’re worth.

What’d be really great is if I could do the following:

  1. Intercept calls to the program
  2. Run the program normally, but record “the response”
  3. On subsequent invocations, just replay the response and don’t call the program

This means I could run the test suite once, letting it really call the program, but record the stdout, stderr, and exit code of the call. The next time I run the test suite, nothing would actually happen. The recorded response would be replayed instead; my script wouldn’t know the difference, and everything would pass reliably and instantly.

In case you didn’t notice, this is VCR.

The only limitation here is that a mock must be completely effective while mimicking only the stdout, stderr, and exit code of what it’s mocking. A command that creates files which are used by other parts of the script, for example, could not be mocked this way.

Mucking with PATH

One way to intercept calls to executables is to prepend $PATH with some controllable directory. Files placed in this leading directory will be found first in command lookups, allowing us to handle the calls.

I like to write my cram tests so that the first thing they do is source a test/helper.sh, so this makes a nice place to do such a thing:

test/helper.sh

export PATH="$TESTDIR/..:$TESTDIR/bin:$PATH"

This ensures that a) the executable in the source directory is used and b) anything in test/bin will take precedence over system commands.
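
In the test files themselves, that sourcing looks like any other command. A sketch (my-script and its output are made up for illustration; cram sets $TESTDIR to the directory containing the test file):

  $ . "$TESTDIR/helper.sh"

  $ my-script --version
  my-script 1.0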

Now all we have to do to mock foo is add a test/bin/foo which will be executed whenever our Subject Under Test calls foo.

Record/Replay

The logic of what to do in a mock script is straightforward:

  1. Build a unique identifier for the invocation
  2. Look up a stored “response” by that identifier
  3. If not found, run the program and record said response
  4. Replay the recorded response to satisfy the caller

We can easily abstract this in a short, generic proxy:

test/bin/act-like

#!/usr/bin/env bash
program="$1"; shift
base="${program##*/}"

# One fixture directory per unique invocation, keyed by a hash of the arguments
fixtures="${TESTDIR:-test}/fixtures/$base/$(echo "$*" | md5sum | cut -d ' ' -f 1)"

if [[ ! -d "$fixtures" ]]; then
  # No recording yet: run the real program and capture its response
  mkdir -p "$fixtures"
  "$program" "$@" >"$fixtures/stdout" 2>"$fixtures/stderr"
  echo $? > "$fixtures/exit_code"
fi

# Replay the recorded response
cat "$fixtures/stdout"
cat "$fixtures/stderr" >&2

read -r exit_code < "$fixtures/exit_code"

exit "$exit_code"

With this in hand, we can record any invocation of anything we like (so long as we only need to mimic the stdout, stderr, and exit code).

test/bin/curl

#!/usr/bin/env bash
act-like /usr/bin/curl "$@"

test/bin/makepkg

#!/usr/bin/env bash
act-like /usr/bin/makepkg "$@"

test/bin/pacman

#!/usr/bin/env bash
act-like /usr/bin/pacman "$@"

Success!

After my next test run, I find the following:

$ tree test/fixtures
test/fixtures
├── curl
│   ├── 008f2e64f6dd569e9da714ba8847ae7e
│   │   ├── exit_code
│   │   ├── stderr
│   │   └── stdout
│   ├── 2c5906baa66c800b095c2b47173672ba
│   │   ├── exit_code
│   │   ├── stderr
│   │   └── stdout
│   ├── c50061ffc84a6e1976d1e1129a9868bc
│   │   ├── exit_code
│   │   ├── stderr
│   │   └── stdout
│   ├── f38bb573029c69c0cdc96f7435aaeafe
│   │   ├── exit_code
│   │   ├── stderr
│   │   └── stdout
│   ├── fc5a0df540104584df9c40d169e23d4c
│   │   ├── exit_code
│   │   ├── stderr
│   │   └── stdout
│   └── fda35c202edffac302a7b708d2534659
│       ├── exit_code
│       ├── stderr
│       └── stdout
├── makepkg
│   └── 889437f54f390ee62a5d2d0347824756
│       ├── exit_code
│       ├── stderr
│       └── stdout
└── pacman
    └── af8e8c81790da89bc01a0410521030c6
        ├── exit_code
        ├── stderr
        └── stdout

11 directories, 24 files

Each hash-directory, representing one invocation of the given program, contains the full response in the form of stdout, stderr, and exit_code files.

I run my tests again. This time, rather than calling any of the actual programs, the responses are found and replayed. The tests pass instantly.

24 Aug 2013, tagged with shell

Goodsong

If you’re like me (which you’re probably not…), you enjoy listening to your music with the great music playing daemon known as mpd. You also have your entire collection on shuffle.

Occasionally, I’ll fall into a valley of bad music and end up hitting next far too much to get to a good song. For this reason, I wrote goodsong.

What is it?

Essentially, you press one key to say the currently playing song is good, then press a different key to say play me a good song.

Goodsong accomplishes exactly that. It creates a playlist file to which you can auto-magically add the currently playing song with the command goodsong. Subsequently, running goodsong -p will play a random track from that same list.

Here’s the --help:

usage: goodsong [ -p | -ls ]

options:
      -p,--play   play a random good song
      -ls,--list  print your list with music dir prepended

      none        note the currently playing song as good

Installation

Goodsong is available in its current form in my git repo.

Usage

Using goodsong is easy. You can always just run it from the CLI, but I find it’s best when bound to keys. I’ll leave the method for that up to you; xbindkeys is a nice WM-agnostic way to bind some keys, or you can use a WM-specific configuration to do so.

Personally, I keep Alt-g as goodsong and Alt-Shift-g as goodsong -p.
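
For the xbindkeys route, those bindings would live in ~/.xbindkeysrc, something like this (a sketch; verify the key names on your setup with xbindkeys --key):

"goodsong"
  alt + g

"goodsong -p"
  alt + shift + g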

You’re going to have to spend some time logging songs as “good” before the -p option becomes useful.

I recently received a patch from a reader for this script. It adds a few features which I’ve happily merged in.

  • Various methods are employed to try to determine exactly which mpd.conf you’re currently running with at the time
  • The goodsong list is now a legitimate playlist file stored in your playlist_directory as specified in mpd.conf

05 Dec 2009, tagged with shell

Dvdcopy

Do not use this for bad things, m’kay?

What it looks like

[Dvdcopy screenshot]

Usage

usage: dvdcopy [ --option(=<argument>) ] [...]

~/.dvdcopy.conf will be read first if it's found (even if --config
is passed). for syntax, see the help entry for the --config option.
commandline arguments will overrule what's defined in the config.

invalid options are ignored.

options:

  --config=<file>               read any of the below options from a
                                file, note that you must strip the
                                '--' and set any argument-less
                                options specifically to either true
                                or false

                                there is no error if <file> doesn't
                                exist

  --directory=<directory>       set the working directory, default
                                is ./dvdcopy

  --keep_files                  keep all intermediate files; note
                                that they will be removed the next
                                time dvdcopy is run regardless of
                                this option

  --device=<file>               set the reader/burner, default is
                                /dev/sr0

  --title=<number>              set the title, default is longest

  --size=<number>               set the desired output size in KB, 
                                default is 4193404

  --limit=<number>              set the number of times to attempt a
                                read/burn before giving up, default
                                is 15

  --mpeg_only                   stop after transcoding the mpeg
  --dvd_only                    stop after authoring the dvd
  --iso_only                    stop after generating the iso

  --mpeg_dir=<directory>        set a save location for the
                                intermediate mpeg file, default is
                                blank -- don't save it

  --dvd_dir=<directory>         set a save location for the
                                intermediate vob folder, default is
                                blank -- don't save it

  --iso_dir=<directory>         set a save location for the
                                intermediate iso file, default is
                                blank -- don't save it

  --mencoder_options=<options>  pass additional arbitrary arguments
                                to mencoder, multiple options should
                                be quoted and there is no validation
                                on these; you'll need to know what
                                you're doing. the options are placed
                                after '-dvd-device <device>' but
                                before all others

  --quiet                       be quiet
  --verbose                     be verbose

  --force                       disable any options validation,
                                useful if ripping from an image file

  --help                        print this

What’s it do?

Pop in a standard DVD9 (~9GB) and type dvdcopy. The script will calculate the video bitrate required to create an ISO under 4.3GB (standard DVD5). It will then use mencoder to create an authorable image and burn it back to a disc playable on any standard player.

Defaults are sane (IMO), but can be adjusted through the config file or the options passed at runtime (or both). I’ve now added a lot of cool features as described in the help.
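
As the help describes, the config file takes the same options with the ‘--’ stripped, and argument-less options set explicitly to true or false. A hypothetical ~/.dvdcopy.conf, using the documented defaults:

# hypothetical example; values shown are the documented defaults
device=/dev/sr0
size=4193404
limit=15
keep_files=false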

How to get it

Install the AUR package here.

Grab the source from my git repo here.

05 Dec 2009, tagged with shell