Monday, December 22, 2014

Connecting to Denodo Virtual DataPort from Haskell

My employer Denodo has released an "Express" version of its data virtualization platform. The Virtual DataPort component lets you do cool things like joining tables that reside in completely different databases (say, one table in Oracle and another in MySQL). And it comes with a boatload of connectors for a wide range of data sources.

I'm interested in accessing all that data goodness from Haskell. There are a couple of ways of doing it.

One would be to use Virtual DataPort's REST interface. That's a valid option and maybe the subject of another post.

Another is to connect through the ODBC interface on port 9996. As it happens, the wire protocol aims to be compatible with that of PostgreSQL, so maybe I could just use Haskell's popular postgresql-simple library. Let's try it out.

Musicbrainz


First, we need an underlying database instance that we can "wrap" with Virtual DataPort. Let's use the Musicbrainz database, a perennial favorite for testing. We can download it as a virtual machine and run it on Virtual Box. I usually set the guest networking to NAT and only forward PostgreSQL's port, which is 5432.

Virtual DataPort


Then we download and install Denodo Express, start the Virtual DataPort server, and launch the graphical administration tool.

From the administration tool's GUI, we have to import a few tables from the Musicbrainz database. First we create a JDBC data source:


(The Musicbrainz database is called musicbrainz_db, the user/password is musicbrainz/musicbrainz.)

And then import the artist_name and release tables:


Haskell


Ok, now for the Haskell part. postgresql-simple depends on native libraries. In CentOS 7 for example, we'll have to install the package postgresql-devel using yum, before invoking cabal install postgresql-simple.

Once we have the required dependencies, we can dabble with the following snippet (notice that we are connecting to Virtual DataPort, not directly to Musicbrainz):

...and it works! A sweet stream of data flows from the Musicbrainz database, passes through Virtual DataPort and arrives at our Haskell client.

And thanks to the virtualization capabilities of Virtual DataPort, that stream could potentially represent the combined flow of a number of affluents.

No comments:

Post a Comment