tee-io Lessons Learned

Apr 16, 2016 00:00 · 1584 words · 8 minute read

A while back, I launched a side project called tee-io. It’s sort of like a live pastebin. You use its API to create a command and then send it buffered output, usually a line at a time. Creating the command gives you a URL where you can watch the output come in in real time. We use it at work to monitor the commands run by our bot instead of waiting for the (potentially long) command to finish and report all the output back to us at once.

tee-io-in-action

While working on this project, which is built with Yesod, I started to settle on some conventions for things I’ve not seen written up in the wild. I’d like to collect my thoughts here, both for myself and in case these conventions are useful to others.

Worker 🔗

One thing tee-io does that I think is common but under-served in the tutorial space is background work. In addition to the main warp-based binary, it’s often necessary to run something on a schedule and do periodic tasks. In tee-io’s case, I want to archive older command output to S3 every 10 minutes.

My approach is to define a second executable target:

executable              tee-io-worker
    if flag(library-only)
        buildable:      False

    main-is:            main-worker.hs
    hs-source-dirs:     app
    build-depends:      base
                      , tee-io

    ghc-options:        -Wall -Werror -threaded -O2 -rtsopts -with-rtsopts=-N

This is basically a copy-paste of the existing executable, and the implementation is also similar:

import Prelude (IO)
import Worker (workerMain)

main :: IO ()
main = workerMain

workerMain uses the “unsafe” handler function to run a Handler action as IO:

workerMain :: IO ()
workerMain = handler $ do
    timeout <- appCommandTimeout . appSettings <$> getYesod
    archiveCommands timeout

archiveCommands :: Second -> Handler ()
archiveCommands timeout = runDB $ -- ...

Making the heavy lifting a Handler () means I have access to logging, the database, and any other configuration present in a fully-inflated App value. It’s certainly possible to write this directly in IO, but the only real downside to Handler is that if I accidentally try to do something request or response-related, it won’t work. In my opinion, pragmatism outweighs principle in this case.

Logging 🔗

One of the major functional changes I make to a scaffolded Yesod project is around AppSettings, and specifically logging verbosity.

I like to avoid the #define DEVELOPMENT stuff as much as possible. It’s required for template-reloading and similar settings because there’s no way to give the functions that need to know those settings an IO context. For everything else, I prefer environment variables.

In keeping with that spirit, I replace the compile-time, logging-related configuration fields with a single, env-based log-level:

Settings.hs

instance FromJSON AppSettings where
    parseJSON = withObject "AppSettings" $ \o -> do
        let appStaticDir = "static"
        appDatabaseConf <- fromDatabaseUrl
            <$> o .: "database-pool-size"
            <*> o .: "database-url"
        appRoot <- o .: "approot"
        appHost <- fromString <$> o .: "host"
        appPort <- o .: "port"
        appIpFromHeader <- o .: "ip-from-header"
        appCommandTimeout <- fromIntegral
            <$> (o .: "command-timeout" :: Parser Integer)
        S3URL appS3Service appS3Bucket <- o .: "s3-url"
        appMutableStatic <- o .: "mutable-static"

        appLogLevel <- parseLogLevel <$> o .: "log-level"
        -- ^ here

        return AppSettings{..}

      where
        parseLogLevel :: Text -> LogLevel
        parseLogLevel t = case T.toLower t of
            "debug" -> LevelDebug
            "info" -> LevelInfo
            "warn" -> LevelWarn
            "error" -> LevelError
            _ -> LevelOther t

config/settings.yml

approot: "_env:APPROOT:http://localhost:3000"
command-timeout: "_env:COMMAND_TIMEOUT:300"
database-pool-size: "_env:PGPOOLSIZE:10"
database-url: "_env:DATABASE_URL:postgres://teeio:teeio@localhost:5432/teeio"
host: "_env:HOST:*4"
ip-from-header: "_env:IP_FROM_HEADER:false"
log-level: "_env:LOG_LEVEL:info"
mutable-static: "_env:MUTABLE_STATIC:false"
port: "_env:PORT:3000"
s3-url: "_env:S3_URL:https://s3.amazonaws.com/tee.io"

I don’t use config/test-settings.yml and prefer to inject whatever variables are appropriate for the given context directly. To make that easier, I load .env files through my load-env package in the appropriate places.

.env (development)

COMMAND_TIMEOUT=5
LOG_LEVEL=debug
MUTABLE_STATIC=true
S3_URL=https://s3.amazonaws.com/tee.io.development

.env.test

DATABASE_URL=postgres://teeio:teeio@localhost:5432/teeio_test
LOG_LEVEL=error
S3_URL=http://localhost:4569/tee.io.test

Now I can adjust my logging verbosity in production with a simple heroku config:set, whereas before I needed a compilation and deployment to do that!

Yesod applications log in a few different ways, so there are a handful of touch-points where we need to check this setting. To make that easier, I put a centralized helper alongside the data type in Settings.hs:

allowsLevel :: AppSettings -> LogLevel -> Bool
AppSettings{..} `allowsLevel` level = level >= appLogLevel

The first place to use it is the shouldLog member of the Yesod instance:

shouldLog App{..} _source level = appSettings `allowsLevel` level

Second is the logging middleware. It’s a little tricky to get the right behavior here because, with the default scaffold, this logging always happens. It has no concept of level and wasn’t attempting to make use of shouldLog in any way.

The approach I landed on was to change the destination to (basically) /dev/null if we’re not logging at INFO or lower. That’s equivalent to if these messages were tagged INFO and respected our configured level, which seems accurate to me. The big win here is they no longer mess up my test suite output.

makeLogWare foundation = mkRequestLogger def
    { outputFormat = if appSettings foundation `allowsLevel` LevelDebug
        then Detailed True
        else Apache apacheIpSource
    , destination = if appSettings foundation `allowsLevel` LevelInfo
        then Logger $ loggerSet $ appLogger foundation
        else Callback $ \_ -> return ()
    }

One last thing, specific to tee-io, is that I can use this setting to turn on debug logging in the AWS library I use:

logger <- AWS.newLogger (if appSettings foundation `allowsLevel` LevelDebug
    then AWS.Debug
    else AWS.Error) stdout

It’s pretty nice to set LOG_LEVEL=debug and start getting detailed logging for all AWS interactions. Kudos to amazonka for having great logging too.

REPL-Driven-Development 🔗

DevelMain.hs has quickly become my preferred way to develop Yesod applications. This file ships with the scaffold and defines a module for starting, stopping, or reloading an instance of your development server directly from the REPL:

stack repl --ghc-options="-DDEVELOPMENT -O0 -fobject-code"
λ> :l DevelMain
DevelMain.update
Devel application launched: http://localhost:3000

The big win here in my opinion is that, in addition to viewing changes in your local browser, you naturally fall into a REPL-based workflow. It’s not something I was actively missing in Yesod projects, but now that I’m doing it, it feels really great.

I happen to have a nice Show instance for my settings, which I can see with handler:

λ> appSettings <$> handler getYesod
log_level=LevelDebug host=HostIPv4 port=3000 root="http://localhost:3000"
  db=[user=teeio password=teeio host=localhost port=5432 dbname=teeio]
  s3_bucket=tee.io.development command_timeout=5s

(Line breaks added for readability, here and below.)

And I can investigate or alter my local data easily with db:

λ> db $ selectFirst [] [Desc CommandCreatedAt]
Just (Entity
      { entityKey = CommandKey
          { unCommandKey = SqlBackendKey {unSqlBackendKey = 1097} }
      , entityVal = Command
          { commandToken = Token {tokenUUID = e79dae2c-020e-48d4-ac0b-6d9c6d79dbf4}
          , commandDescription = Just "That example command"
          , commandCreatedAt = 2016-02-11 14:50:19.786977 UTC
          }
      })
λ>

Finally, this makes it easy to locally test that worker process:

λ> :l Worker
λ> workerMain
16/Apr/2016:14:08:28 -0400 [Debug#SQL]
  SELECT "command"."id", ...
    FROM "command"
  LEFT OUTER JOIN "output"
     ON ("command"."id" = "output"."command")
    AND ("output"."created_at" > ?)
  WHERE (("command"."created_at" < ?)
    AND ("output"."id" IS NULL))
  ; [ PersistUTCTime 2016-04-16 18:08:23.903484 UTC
    , PersistUTCTime 2016-04-16 18:08:23.903484 UTC
    ]
16/Apr/2016:14:08:28 -0400 [Info] archive_commands count=1
  @(main:Worker /home/patrick/code/pbrisbin/tee-io/src/Worker.hs:37:7)
[Client Request] {
  host      = s3.amazonaws.com:443
  secure    = True
  method    = PUT
  target    = Nothing
  timeout   = Just 70000000
  redirects = 0
  path      = /tee.io.development/b9a74a98-0b16-4a23-94f1-5df0a01667d0
  query     = 
  headers   = ...
  body      = ...
}
[Client Response] {
  status  = 200 OK
  headers = ...
}
16/Apr/2016:14:08:28 -0400 [Debug#SQL] SELECT "id", "command", ...
16/Apr/2016:14:08:28 -0400 [Debug#SQL] DELETE FROM "output" WHERE ...
16/Apr/2016:14:08:28 -0400 [Debug#SQL] DELETE FROM "command" WHERE ...
16/Apr/2016:14:08:28 -0400 [Info] archived token=b9a74a98-0b16-4a23-94f1-5df0a01667d0
  @(main:Worker /home/patrick/code/pbrisbin/tee-io/src/Worker.hs:59:7)
λ>

Since I run with DEBUG in development, and that was picked up by the REPL, we can see all the S3 and database interactions the job goes through.

The console was one of the features I felt was lacking when first coming to Yesod from Rails. I got used to not having it, but I’m glad to see there have been huge improvements in this area while I wasn’t paying attention.

Deployment 🔗

I’ve been watching the deployment story for Yesod and Heroku change drastically over the past few years. From compiling on a VM, to a GHC build pack, to Halcyon, the experience hasn’t exactly been smooth. Well, it seems I might have been right in the conclusion of that last blog post:

Docker […] could solve these issues in a complete way by accident.

We now have a Heroku plugin for using Docker to build a slug in a container identical to their Cedar infrastructure, then extracting and releasing it via their API.

Everything we ship at work is Docker-based, so I’m very comfortable with the concepts and machine setup required (which isn’t much), so using this release strategy for my Yesod applications has been great. Your mileage may vary though: while I do feel it’s the best approach available today, there may be some bumps and yaks for those not already familiar with Docker – especially if on an unfortunate operating system, like OS X.

Thanks to the good folks at thoughtbot, who are maintaining a base image for releasing a stack-based project using this Heroku plugin, making tee-io deployable to Heroku looked like this:

% cat Procfile
web: ./tee-io

% cat app.json
{
  "name": "tee.io",
  "description": "This is required for heroku docker:release"
}

% cat docker-compose.yml
# This is required for heroku docker:release
web:
  build: .

% cat Dockerfile
FROM thoughtbot/heroku-haskell-stack:lts-5.12
MAINTAINER Pat Brisbin <pbrisbin@gmail.com>

And I just run:

heroku docker:release

And that’s it!

If you’re interested in seeing any of the code examples here in the context of the real project, checkout the tee-io source on GitHub.

haskell yesod