Tuesday, December 23, 2014

Connecting to Denodo Virtual DataPort from Go

Perhaps I can reuse the setup of my previous post for another database connectivity experiment, this time from Go.

I have very little knowledge of Go. It seems like a pragmatic language with good tooling. And there exists a PostgreSQL driver written in pure Go that I can use.

The first step is to install Go and set up the environment. Once ready, the PostgreSQL driver can be installed in the workspace with the command go get github.com/lib/pq.

Then we create a project named github.com/danidiaz/hellopq in the workspace, with the following Go file:

The code was adapted from the one shown in this video. The connection parameters were taken from here.

I execute the program, and it seems to work!

But one doubt remains: how do I properly handle the nullability of the last_updated column?

Monday, December 22, 2014

Connecting to Denodo Virtual DataPort from Haskell

My employer Denodo has released an "Express" version of its data virtualization platform. The Virtual DataPort component lets you do cool things like joining tables that reside in completely different databases (say, one table in Oracle and another in MySQL). And it comes with a boatload of connectors for a wide range of data sources.

I'm interested in accessing all that data goodness from Haskell. There are a couple of ways of doing it.

One would be to use Virtual DataPort's REST interface. That's a valid option and maybe the subject of another post.

Another is to connect through the ODBC interface on port 9996. As it happens, the wire protocol aims to be compatible with that of PostgreSQL, so maybe I could just use Haskell's popular postgresql-simple library. Let's try it out.

Musicbrainz


First, we need an underlying database instance that we can "wrap" with Virtual DataPort. Let's use the Musicbrainz database, a perennial favorite for testing. We can download it as a virtual machine and run it on VirtualBox. I usually set the guest networking to NAT and forward only PostgreSQL's port, which is 5432.

Virtual DataPort


Then we download and install Denodo Express, start the Virtual DataPort server, and launch the graphical administration tool.

From the administration tool's GUI, we have to import a few tables from the Musicbrainz database. First we create a JDBC data source:


(The Musicbrainz database is called musicbrainz_db, the user/password is musicbrainz/musicbrainz.)

And then import the artist_name and release tables:


Haskell


OK, now for the Haskell part. postgresql-simple depends on native libraries. On CentOS 7, for example, we'll have to install the postgresql-devel package with yum before invoking cabal install postgresql-simple.

Once we have the required dependencies, we can dabble with the following snippet (notice that we are connecting to Virtual DataPort, not directly to Musicbrainz):
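(A sketch along these lines; the user, password and database name below are the defaults from my install, and the query just pulls names from the artist_name table we imported.)

    {-# LANGUAGE OverloadedStrings #-}

    import Database.PostgreSQL.Simple

    main :: IO ()
    main = do
        conn <- connect defaultConnectInfo
            { connectHost     = "localhost"
            , connectPort     = 9996        -- Virtual DataPort's ODBC interface
            , connectUser     = "admin"
            , connectPassword = "admin"
            , connectDatabase = "admin"
            }
        rows <- query_ conn "SELECT name FROM artist_name"
        -- in a real program we would limit the query itself instead of
        -- fetching everything and taking a prefix
        mapM_ (putStrLn . fromOnly) (take 10 (rows :: [Only String]))
        close conn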

...and it works! A sweet stream of data flows from the Musicbrainz database, passes through Virtual DataPort and arrives at our Haskell client.

And thanks to the virtualization capabilities of Virtual DataPort, that stream could potentially represent the combined flow of a number of tributaries.

Friday, November 14, 2014

Pawn shops, refrigerators, and contravariant functors

Recently, I learned (by reading instance declarations in Haskell) that the composition of contravariant functors is covariant.

In this talk (around 27:30) Erik Meijer gives a colorful analogy for contravariance in subtyping. The analogy involves trashcans, which apparently are contravariant. I wonder if I can find a similar analogy for the case of composition, because I find it difficult to gain an intuition for it.

So, consider bananas. Bananas are a type of fruit.

Consider refrigerators as well. There are different types of refrigerators. Some have technology advanced enough to freeze any fruit, but others are more modest and can only freeze bananas.

The refrigerators that can freeze any fruit are a subtype of the refrigerators that can freeze bananas. Because, any time you need to freeze a banana, you can freeze it in one of the refrigerators that can handle any kind of fruit.

In other words, refrigerators are contravariant.

Now consider pawn shops. Pawn shops are contravariant as well: anything you can sell to a pawn shop that only takes rings, you can sell to a pawn shop that takes any jewelry. The pawn shops that take any jewelry are a subtype of the pawn shops that take rings.

There are also pawn shops for refrigerators. For example pawn shops for refrigerators that can freeze any fruit, or pawn shops for refrigerators that can freeze bananas. (Yes, they are quite specialized pawn shops, but bear with me.)

The pawnbrokers are very meticulous. Before accepting a refrigerator for sale, they put a viand of the type associated with the pawn shop inside it, to check that the refrigerator indeed works.

Can you sell to the pawn shop for refrigerators that freeze bananas everything you could sell to the pawn shop for refrigerators that freeze any fruit? The answer is yes. You take the refrigerator that freezes any fruit to the pawn shop, the pawnbroker puts a banana in it, and it gets frozen. Deal!

So pawn shops for refrigerators are covariant in the type of viands that the refrigerators can freeze.

Whew, that was convoluted.

Monday, November 10, 2014

Two new "validation" applicatives

There exists an Either-like Applicative (usually called "Validation") that either collects all the errors or returns a success. This is different from the Monad instance of Either, which stops at the first error encountered.

The type can be found in the validation package. However, that package incurs a heavy lens dependency.

Recently, the type has cropped up without the lens dependency in both transformers (as the Errors type synonym) and either (as Validation).

The version in the either package has a more precise type, as it only requires the error values to form a Semigroup, not necessarily a Monoid. So you can use a NonEmpty list to accumulate errors.
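A small sketch of how this can be used, with Data.Either.Validation from the either package (the User type and the check helper are made up for illustration):

    import Data.Either.Validation (Validation (..))
    import Data.List.NonEmpty (NonEmpty (..))

    data User = User String Int deriving Show

    -- each check either succeeds with a value or fails with one complaint
    check :: Bool -> String -> a -> Validation (NonEmpty String) a
    check ok complaint value = if ok then Success value else Failure (complaint :| [])

    -- <*> accumulates the complaints of all the failing checks
    validateUser :: String -> Int -> Validation (NonEmpty String) User
    validateUser name age =
        User <$> check (not (null name)) "empty name" name
             <*> check (age >= 0) "negative age" age

    -- validateUser ""    (-1) evaluates to Failure ("empty name" :| ["negative age"])
    -- validateUser "Ana" 30   evaluates to Success (User "Ana" 30)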

Thursday, October 23, 2014

conceit 0.2.1.0

I have released a new version of the conceit package with the following changes:

The Applicative instance now doesn't require those ugly (Show e, Applicative e) constraints on the error type.

There's a Monad instance for Conceit. I initially made the >> operator concurrent, like *>, but that was a bad idea. In the end, the Monad instance has sequential operations and the Applicative concurrent ones, as happens in Haxl.

There are also MonadThrow, MonadCatch and MonadError instances. The first two work by throwing and catching exceptions in IO; the last one works more like ExceptT.

A special run function was added for when the error is "impossible": Conceit Void a -> IO a. This makes it easier to use Conceit in place of Concurrently.

The internals of this new version of conceit have been copied from those of Concurrently in Simon Marlow's async package, with modifications to support the new behaviour.

Sunday, October 19, 2014

Colchis, yet another JSON-RPC 2.0 client

There's no shortage of JSON-RPC 2.0 libraries on Hackage, as any cursory search will show. I have just added my own, called Colchis.

It is a pipes-based client that makes use of bidirectional pipes. It doesn't have a lot of features; in particular, the only supported transport is raw TCP. I think HTTP transport support would be easy to add, but I don't have the need right now. No notifications or batched requests, either.

My aim is to be able to communicate with jsonrpc4j servers in "streaming" mode.

Check the examples folder in the repo for some examples of usage.

Monday, October 6, 2014

conceit 0.1.0.0

I have uploaded to Hackage a tiny package called conceit. It contains a slight variation on the Concurrently type from the async package.

Concurrently is hugely useful. An instance of Applicative, its <*> executes two IO actions concurrently (duh!) and, if one of the threads throws an exception, the other one is killed before re-throwing the exception. Very handy for ensuring proper cleanup.

Conceit is similar to Concurrently, but the IO actions return an Either value. If both actions return with Right, the return values are combined. But if one of the actions returns with Left, the other action is immediately terminated and the Left value is returned. One could say that Conceit behaves like race for errors and like Concurrently for successful return values.
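The intended semantics can be sketched directly on top of the async package (this is not the actual Conceit API, just an illustration of the behaviour described above):

    import Control.Concurrent.Async (wait, waitEither, withAsync)

    bothOrBail :: IO (Either e a) -> IO (Either e b) -> IO (Either e (a, b))
    bothOrBail left right =
        withAsync left $ \la ->
            withAsync right $ \ra -> do
                -- wait for whichever action finishes first
                first <- waitEither la ra
                case first of
                    -- a Left ends everything: returning here makes the
                    -- enclosing withAsync cancel the other thread
                    Left  (Left e)  -> return (Left e)
                    Right (Left e)  -> return (Left e)
                    -- a Right means we still wait for the other side
                    Left  (Right a) -> fmap (fmap (\b -> (a, b))) (wait ra)
                    Right (Right b) -> fmap (fmap (\a -> (a, b))) (wait la)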

A Bifunctor instance is provided to help "massage" the errors.

Thursday, October 2, 2014

process-streaming 0.6.0.0

I have released (a few days ago already) version 0.6.0.0 of process-streaming, my library of pipes-based helpers built on top of the process package.

The API of version 0.3.0.0 proved to be too convoluted, with bloated and obscure function signatures. Hopefully the API for the new version is more intuitive.

The central abstraction is that of a Siphon. A Siphon represents a computation that either drains a Producer completely, or fails early, aborting the consumption of the Producer. stdout and stderr can only be consumed through Siphons. Siphons can be created out of regular Consumers, pipes folds, or Parsers from pipes-parse.

Why do we need Siphons? When consuming both the stdout and stderr of an external process, it is important that the two are drained to completion, because otherwise the process may block when an output buffer that nobody is reading fills up.

Siphons have an Applicative instance that "forks" a producer and feeds it to two computations. This Applicative instance made the support for branching pipelines of processes easy to implement.

The test suite contains several examples of usage.

Monday, June 30, 2014

The lazy Writer monad

I found an interesting example of the lazy State monad that uses "head recursion".

I tried to adapt the example for the lazy Writer monad but it proved difficult. If you naively consume the head of the infinite accumulator, it will hang, even with the lazy version of the monad:
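My own minimal example of the problem (the recursive call comes before the tell, which is the "head recursion" part):

    import Control.Monad.Writer.Lazy

    ones :: Writer [Int] ()
    ones = ones >> tell [1]

    -- head (execWriter ones) loops forever: the accumulator is nested as
    -- ((... ++ [1]) ++ [1]) ++ [1], so even its head needs the whole spine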

The trick is to use Dual from Data.Monoid:
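Continuing the sketch, with the output wrapped in Dual so that mappend is flipped:

    import Control.Monad.Writer.Lazy
    import Data.Monoid (Dual (..))

    ones' :: Writer (Dual [Int]) ()
    ones' = ones' >> tell (Dual [1])

    -- Dual w <> Dual [1] is Dual ([1] <> w), so the nesting ends up on the
    -- right and head (getDual (execWriter ones')) promptly yields 1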

Maybe I suffer from a failure of imagination, but I can't think of any practical problem in which the lazy Writer monad offers an advantage over the strict version.

Monday, June 9, 2014

process-streaming 0.3.0.0

I have released version 0.3.0.0 of process-streaming, my library of pipes-based helpers built on top of the process package.

It contains breaking changes. Some functions have new names, and a few newtypes have been introduced in order to hide unnecessarily complex signatures. Hopefully the changes make the library a bit more intuitive.

You might find this library useful if:

  • You want an easy way to consume the standard streams of an external process using the tools provided by the pipes ecosystem.
  • You want concurrent, streaming access to both the stdout and stderr of an external process, without fear of deadlocks caused by full output buffers.
  • You want to consume stdout and stderr combined in a single stream.
  • You want to be relieved of tedious bookkeeping, like automatically killing the external process in the face of errors and exceptions, ensuring the termination of threads spawned for concurrent reads, and so on.

Monday, May 12, 2014

The IO monad has a non-strict bind

The Maybe monad has a bind function that is strict in its arguments:
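(A quick check of my own, using seq to reduce the bind to WHNF:)

    maybeBind :: String
    maybeBind = (undefined >>= return :: Maybe ()) `seq` "reached"
    -- forcing maybeBind throws: Maybe's bind pattern matches on its first argument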

The IO monad however has a non-strict bind function:
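(The same check:)

    ioBind :: String
    ioBind = (undefined >>= return :: IO ()) `seq` "reached"
    -- evaluates to "reached": IO's bind wraps a new action without forcing the old one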

And so does the strict State monad:
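(And again, with the strict State monad:)

    import Control.Monad.State.Strict

    stateBind :: String
    stateBind = (undefined >>= return :: State () ()) `seq` "reached"
    -- also "reached": the bind builds a new StateT without forcing its arguments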

It's easy to see in the source code:
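(The strict StateT bind, roughly as it appears in Control.Monad.Trans.State.Strict:)

    m >>= k = StateT $ \ s -> do
        (a, s') <- runStateT m s
        runStateT (k a) s'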

To reduce m >>= k to Weak Head Normal Form, you only need to go as far as the StateT constructor; you don't need to touch any of the parameters. I suppose something similar happens with IO.

What's so strict then about the IO monad and the strict State monad? Their run functions. This:
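(My own example again, with the strict State monad:)

    import Control.Monad.State.Strict

    strictRun :: String
    strictRun = runState (undefined >>= return :: State () ()) () `seq` "reached"
    -- forcing strictRun throws: running the computation makes the strict
    -- bind pattern match on the result pair of undefined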

As opposed to this:
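(The same expression with the lazy State monad:)

    import Control.Monad.State.Lazy

    lazyRun :: String
    lazyRun = runState (undefined >>= return :: State () ()) () `seq` "reached"
    -- evaluates to "reached": the lazy bind uses an irrefutable ~(a, s')
    -- pattern, so the undefined computation is never forced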

Notice how in the strict case putting the expression in WHNF with seq is enough to trigger the exception.

Of course, with IO we don't have an explicit run function (one that isn't evil, I mean). We just put the IO action into main.

Notice however that even the strict State monad is not too strict:

This doesn't trigger an exception, either:

But this does:

Some good links about laziness here, here, here, here, here and here.

Monday, May 5, 2014

Understanding representable functors

Representable functors are put to good use in the linear library, and they can be a useful tool in other contexts. Here's a (hopefully not completely misleading) post on the subject.

The general idea

Consider the functor that carries a type a to the type of infinite lists of elements of type a. Let's call that functor Stream. So Stream Bool is the type of infinite lists of boolean values, Stream Char is the type of infinitely long strings, and so on.

There is a type with which the Stream functor has a special relationship: the type of natural numbers. Think about it: any value of type Stream a can alternatively be represented by a function of type ℕ -> a. To recover the stream, we just have to iterate over the naturals, applying the function at each step. Conversely, we can regard a Stream a value as a "memoized" version of a ℕ -> a function. We can recover the function by indexing into the stream.
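In code, the two directions of the correspondence look something like this (a sketch, with Nat standing in for ℕ and illustrative helper names):

    data Nat = Zero | Succ Nat

    data Stream a = Cons a (Stream a)

    indexStream :: Stream a -> Nat -> a
    indexStream (Cons x _)  Zero     = x
    indexStream (Cons _ xs) (Succ n) = indexStream xs n

    tabulateStream :: (Nat -> a) -> Stream a
    tabulateStream f = Cons (f Zero) (tabulateStream (f . Succ))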

There are other functors that have a similar "special relationship" with a certain type. That type need not be ℕ. Consider the following very simple functor:
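(A definition along these lines:)

    data Pair a = Pair a a

    instance Functor Pair where
        fmap f (Pair x y) = Pair (f x) (f y)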

This Pair functor has a special relationship with the Bool type. A value of type Pair a can alternatively be represented by a function Bool -> a. Conversely, we can regard a value of type Pair a as a memoized or tabulated version of a Bool -> a function.

There seems to be a pattern here!

One more example: the Identity functor is represented by the trivial unit type (), which has only one value and carries no information. Identity a is just a, so there is only a single "position" to index into, so to speak.

Composition, products, sums

Functors have the nice property that the composition of two functors is still a functor. We can ask ourselves if the composition of two representable functors is still representable. The answer is yes.

The composition (think of it as "nesting") of two functors is represented by the product of the representations of the original functors. Think of the type Stream (Stream a). It is like an infinite two-dimensional array of a values. To index into it, we need two coordinates. So it is represented by (ℕ,ℕ).

The product of two functors is a functor as well. And the product of two representable functors is again representable! It is represented by the sum of the representations of the original functors. Suppose you have a pair (Stream a, Stream a). If we want to index into it, we must first choose a side, and then drill into the stream on that side. So the product is represented by Either ℕ ℕ.

A third way of combining functors is summing them. Is a sum of representable functors representable? I don't know about the theory, but intuitively the answer seems to be no. We can't "index" on sum types, because we might attempt to target a branch that isn't there!

The Haskell implementation

The Representable typeclass can be found in module Data.Functor.Rep of the adjunctions package. It has the Distributive class as a prerequisite. Distributive can be understood as the class of functors with a "fixed shape" whose values can be zipped together easily.
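The core of the class looks roughly like this:

    class Distributive f => Representable f where
        type Rep f :: *
        tabulate :: (Rep f -> a) -> f a
        index    :: f a -> Rep f -> a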

If you check the source, you'll see that the type families extension is used to associate each instance of Representable with its representation. It is instructive to explore which representations correspond to which functors. For example, Rep Identity = (), as we have already mentioned.

You might wonder, why is Stream missing from the list of instances, after having been used many times as an example? It's there, actually, but you have to squint a little.

The Stream functor can be defined as Cofree Identity (the definition of Cofree is here). Cofree Identity is represented by sequences of unit () values. And the type of sequences of () is isomorphic to the natural numbers.
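For reference, Cofree (from the free package) is defined along these lines:

    data Cofree f a = a :< f (Cofree f a)

    -- with f = Identity there is exactly one tail at each step, so
    -- Cofree Identity a unrolls into an infinite stream of a values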

Saturday, May 3, 2014

Applicative vs. Monad

In an Applicative, effects build the railroad upon which the locomotive of function application will travel. All the effects take place while building the railroad, not when the locomotive moves. The locomotive cannot change its course in any way.

A Monad is like having a locomotive in a railroad which is still under construction. Passengers can actually yell to the construction crew a few meters ahead to tell them things like "I like the scenery on this side, please lay the tracks a bit more towards the right" or "you can stop laying tracks, I get out right here". Of course, for this strange railway, the company can't provide a timetable or a fixed list of stops that can be checked before embarking on the train.
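A bare-bones rendering of the difference in code, using Maybe:

    -- Applicative: the two effects are laid down up front; the function
    -- just travels over them
    viaApplicative :: Maybe Int
    viaApplicative = (+) <$> Just 1 <*> Just 2

    -- Monad: the second effect is chosen by inspecting the first result;
    -- the passenger gets to redirect the tracks
    viaMonad :: Maybe Int
    viaMonad = do
        x <- Just 1
        if x > 0 then Just (x * 10) else Nothing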

Sunday, February 23, 2014

process + pipes = process-streaming

There are a number of good libraries for launching external processes already in Hackage. Basic libraries like process, and libraries that build on top of it like shelly and process-conduit (there is also module System.IO.Streams.Process from io-streams).

An official pipes-process package hasn't come out yet. After working recently with Ruby's Open3 package, and becoming inspired by this thread in the Haskell Pipes Google group, I decided to cobble together my own experiment. process-streaming is the result (git repo here).

I wanted to scratch a number of itches:

  • To my knowledge, neither shelly nor process-conduit provides concurrent streaming access to both the stdout and the stderr of a process, which is sometimes necessary. (shelly does provide direct access to the handles, but you have to orchestrate the concurrency by yourself.)
  • shelly and process-conduit use exceptions for signaling errors. This makes for simpler signatures. I wanted to test if a purely Either-based approach to errors could work without making the signatures excessively complicated.
  • Combining stdout with stderr is a frequent use case with some subtleties in the implementation. You have to be careful not to mangle lines, and also ensure that both streams are drained continuously (because otherwise you can get yourself into a blocking scenario).
  • Besides using plain Consumers, it should be easy to consume stdout and stderr using folds from Pipes.Prelude (and pipes-text and pipes-bytestring and pipes-group) and also parsers from the pipes-parse package. And it would be nice if two parsers could be run in parallel over the same Producer.
The haddock documentation for the System.Process.Streaming module is here. A tutorial with some examples can be found here.

If nothing else, this experiment has helped me understand the pipes ecosystem a little better. I really like the pipes-text approach for handling decoding leftovers (putting them in the return values of pipes) and the FreeT-based approach for parsing lines without ever having to keep a whole line in memory.