Sunday, February 23, 2014

process + pipes = process-streaming

There are already a number of good libraries on Hackage for launching external processes: basic ones like process, and libraries that build on top of it, like shelly and process-conduit (there is also the System.IO.Streams.Process module from io-streams).

An official pipes-process package hasn't come out yet. After working recently with Ruby's Open3 module, and being inspired by this thread in the Haskell Pipes Google group, I decided to cobble together my own experiment. process-streaming is the result (git repo here).

I wanted to scratch a number of itches:

  • To my knowledge, neither shelly nor process-conduit provides concurrent streaming access to both the stdout and the stderr of a process, which is sometimes necessary. (shelly does provide direct access to the handles, but you have to orchestrate the concurrency yourself.)
  • shelly and process-conduit use exceptions for signaling errors. This makes for simpler signatures. I wanted to test if a purely Either-based approach to errors could work without making the signatures excessively complicated.
  • Combining stdout with stderr is a frequent use case with some subtleties in the implementation. You have to be careful not to mangle lines, and also ensure that both streams are drained continuously (otherwise the process can block once the pipe nobody is reading from fills up).
  • Besides using plain Consumers, it should be easy to consume stdout and stderr using folds from Pipes.Prelude (and pipes-text and pipes-bytestring and pipes-group) and also parsers from the pipes-parse package. And it would be nice if two parsers could be run in parallel over the same Producer.
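To make the stdout/stderr point concrete, here is a minimal sketch that uses only base and process, not process-streaming itself. The names runCombined, drain and Source are hypothetical and made up for this illustration, and the example assumes a POSIX shell. Each stream gets its own thread, so neither pipe can fill up while the other is being read, and whole lines are appended atomically, so lines from the two streams are never interleaved mid-line.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar
import Control.Exception (finally)
import Control.Monad (unless)
import System.Exit (ExitCode (..))
import System.IO (Handle, hClose, hGetLine, hIsEOF)
import System.Process

-- | Which stream a line came from (hypothetical type for this sketch).
data Source = Out | Err
  deriving (Eq, Show)

-- | Read a handle line by line in its own thread, pushing each complete
-- line onto a shared accumulator. The returned MVar is filled when the
-- handle reaches EOF (or the reading thread fails).
drain :: MVar [(Source, String)] -> Source -> Handle -> IO (MVar ())
drain acc tag h = do
  done <- newEmptyMVar
  _ <- forkIO (loop `finally` putMVar done ())
  return done
  where
    loop = do
      eof <- hIsEOF h
      unless eof $ do
        l <- hGetLine h
        -- One whole line per update, so lines are never mangled.
        modifyMVar_ acc (return . ((tag, l) :))
        loop

-- | Run a shell command, draining stdout and stderr concurrently, and
-- return the exit code together with the tagged lines in arrival order.
runCombined :: String -> IO (ExitCode, [(Source, String)])
runCombined cmd = do
  (_, Just hout, Just herr, ph) <-
    createProcess (shell cmd) { std_out = CreatePipe, std_err = CreatePipe }
  acc <- newMVar []
  doneOut <- drain acc Out hout
  doneErr <- drain acc Err herr
  -- Wait for both readers before collecting the exit code.
  takeMVar doneOut
  takeMVar doneErr
  code <- waitForProcess ph
  hClose hout
  hClose herr
  ls <- readMVar acc
  return (code, reverse ls)
```

Something like runCombined "ls /nonexistent; echo done" would return the error line tagged Err and "done" tagged Out. The relative order of lines from different streams depends on scheduling, so only the order within each stream is guaranteed. Note that this sketch accumulates everything in memory; it does not stream, which is precisely the part that pipes is meant to handle.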
The haddock documentation for the System.Process.Streaming module is here. A tutorial with some examples can be found here.

If nothing else, this experiment has helped me understand the pipes ecosystem a little better. I really like the pipes-text approach for handling decoding leftovers (putting them in the return values of pipes) and the FreeT-based approach for parsing lines without ever having to keep a whole line in memory at any time.