Max Ogden | Open Web programmer
December 2011
Gut: Hosted Open Data Filet Knives

HTTP Unix pipes for Open Data

Not Invented Here

A topic that has fascinated me for years now is (broadly speaking) nationalism. On the internet it essentially boils down to something like "I am a Python programmer and this project is written in Java! Ignored!" This behavior shows up a lot in programming, mostly as "Not Invented Here" syndrome. It produces a pile of solutions to the same problem, written in a bunch of different languages, and many of them are half-baked.

For a concrete example, consider open data catalogues. As datacatalogs.org shows, there are a ton of different solutions to the same set of problems, namely hosting open data. Having a rich ecosystem is a good thing. But I believe there is a common open data infrastructure layer that we aren't collaborating on nearly as much as we could: the conversion of data between different formats.

Wouldn't it be great if I, as a JavaScript developer, could use the awesome data conversion libraries available in Java, like Apache POI? Or if Ruby developers could use Python packages like csvkit (which contains the super useful csvclean utility)? The good news is that the internet has settled on a common language for crossing these language barriers: HTTP and JSON. The web is also now full of hosted services (see SaaS, PaaS), and there are numerous platforms where hosted services can be deployed for free (Google App Engine, Heroku, dotCloud, Nodejitsu, etc.).

Unix pipes

On the Unix command line there are a bunch of useful single-purpose utilities. The Unix philosophy is "Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface." The Unix command wc is a great example. Give it a bunch of text and it will count the number of lines, words and characters. The cat command reads a file and dumps out all of its text. Using a Unix pipe (the | character), you can 'pipe' the data that cat dumps out into wc:

cat awesomeText.txt | wc
21      55     507
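The same idea fits in a few lines of Python. This is a rough stand-in for wc, not a faithful clone: it reads text from standard input and reports the three counts. One assumption to flag is that real wc reports bytes in its third column. This sketch counts Python characters instead, and the two only agree for ASCII text.

```python
import sys
from typing import Tuple


def wc(text: str) -> Tuple[int, int, int]:
    """Count lines, words and characters, in the order wc prints them."""
    lines = text.count("\n")   # wc counts newline characters
    words = len(text.split())  # runs of whitespace separate words
    chars = len(text)          # real wc reports bytes; this counts characters
    return lines, words, chars


def main() -> None:
    # Read whatever was piped in on stdin and print the three counts.
    lines, words, chars = wc(sys.stdin.read())
    print(f"{lines:8d}{words:8d}{chars:8d}")
```

To use it in a pipeline, save it as wc.py and add a call to `main()` at the bottom. It then works the same way as the real command: `cat awesomeText.txt | python3 wc.py`.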

What is gut

Taking heavy inspiration from Unix pipes, HTTP and JSON, I have come up with a modest proposal for how we might share our best tools for various data conversion jobs as hosted web services. I'm calling it gut, as in gutting a fish: you get the yummy filet out and leave all of the junk behind.

Here's a simple example of a gut server that takes in a CSV file and returns JSON data. As a developer using the gut server to process my CSV file, I would send the following HTTP request containing my CSV data:

POST / HTTP/1.1
User-Agent: curl
Host: gutcsv.nodejitsu.com
Accept: */*
Content-Length: 64
Content-Type: application/x-www-form-urlencoded

name,appearance
chewbacca,hairy
bill,nonplussed
bubbles,relaxed

This is what the gut server would give me back:

HTTP/1.1 200 OK
content-type: application/json
content-length: 186
Connection: close
  
{
  "headers": [
    {
      "name": "name"
    },
    {
      "name": "appearance"
    }
  ],
  "rows": [
    {
      "name": "chewbacca",
      "appearance": "hairy"
    },
    {
      "name": "bill",
      "appearance": "nonplussed"
    },
    {
      "name": "bubbles",
      "appearance": "relaxed"
    }
  ]
}
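Producing that headers/rows shape takes only a few lines. The sketch below uses Python's csv and json modules and assumes the first CSV line is the header row. It is an illustration, not the node.js code that actually ran on Nodejitsu, and `csv_to_gut_json` is a hypothetical name. Serialized without extra whitespace, the example comes to exactly 186 bytes, which matches the content-length header above. The indented version is just easier to read.

```python
import csv
import io
import json


def csv_to_gut_json(csv_text: str) -> dict:
    """Convert CSV text (first line = header row) into the headers/rows shape."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # empty input -> no columns
    return {
        "headers": [{"name": column} for column in header],
        # Pair each value with its column name; blank lines are skipped.
        "rows": [dict(zip(header, values)) for values in reader if values],
    }


csv_text = (
    "name,appearance\n"
    "chewbacca,hairy\n"
    "bill,nonplussed\n"
    "bubbles,relaxed\n"
)
result = csv_to_gut_json(csv_text)
compact = json.dumps(result, separators=(",", ":"))  # no extra whitespace
print(json.dumps(result, indent=2))  # the readable, indented form
print(len(compact))                  # 186
```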

Essentially I am piping data from my computer through a gut server, and when it comes back it is in the new format. In this example I used the node.js hosting platform Nodejitsu to deploy my CSV-to-JSON code so that it is available to anyone in the world who can make an HTTP request.
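On the client side, piping data through gut servers can be sketched with Python's standard library. `gut_request` and `pipe` are hypothetical helpers. gutcsv.nodejitsu.com is simply the host from the example and may no longer be online. As in a Unix pipeline, `pipe` feeds each server's response body into the next server as its request body.

```python
from typing import Iterable
from urllib.request import Request, urlopen


def gut_request(url: str, body: bytes) -> Request:
    """Build a POST carrying raw data, like the curl request in the example."""
    return Request(
        url,
        data=body,
        method="POST",
        # Same content type curl sends by default for --data; whether a
        # given gut server cares about it is an assumption.
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def pipe(body: bytes, urls: Iterable[str], timeout: float = 30.0) -> bytes:
    """Send data through each gut server in turn, like `cat file | a | b`."""
    for url in urls:
        with urlopen(gut_request(url, body), timeout=timeout) as response:
            body = response.read()  # output of one stage feeds the next
    return body


csv_data = (
    b"name,appearance\n"
    b"chewbacca,hairy\n"
    b"bill,nonplussed\n"
    b"bubbles,relaxed\n"
)
req = gut_request("http://gutcsv.nodejitsu.com/", csv_data)
```

If that server were reachable, `pipe(csv_data, ["http://gutcsv.nodejitsu.com/"])` would return the JSON bytes. Adding more URLs to the list chains more conversion steps.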

Voltron, assemble!

If you are writing code that converts data from one format to another, consider also exposing your solution as a gut server! Last year I had great success teaching ScraperWiki at International Open Data Day, because it scaled out well to a room full of people with different programming backgrounds. I think that writing these lightweight data converter/massager/transformer servers is also a task that anyone can tackle in a short amount of time.
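A gut server doesn't need much more than an HTTP listener wrapped around a conversion function. Here is a minimal sketch using Python's built-in http.server instead of node.js. The `GutCSVHandler` name, answering POST on any path, and the in-process demo are all assumptions for illustration. The demo binds to port 0 so the OS picks a free port, pipes a small CSV through the server, and then shuts it down.

```python
import csv
import io
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.request import ProxyHandler, Request, build_opener


def csv_to_json(csv_text: str) -> bytes:
    """CSV in, compact headers/rows JSON out (first line is the header)."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])
    payload = {
        "headers": [{"name": column} for column in header],
        "rows": [dict(zip(header, values)) for values in reader if values],
    }
    return json.dumps(payload, separators=(",", ":")).encode("utf-8")


class GutCSVHandler(BaseHTTPRequestHandler):
    """Hypothetical gut server: POST a CSV body, get JSON back."""

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        raw = self.rfile.read(length).decode("utf-8", errors="replace")
        body = csv_to_json(raw)
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args) -> None:
        pass  # keep the demo output quiet


# Demo: port 0 lets the OS pick a free port; serve in a background thread.
server = ThreadingHTTPServer(("127.0.0.1", 0), GutCSVHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_address[1]}/"

opener = build_opener(ProxyHandler({}))  # bypass proxy settings for localhost
csv_data = b"name,appearance\nchewbacca,hairy\nbill,nonplussed\nbubbles,relaxed\n"
with opener.open(Request(url, data=csv_data, method="POST"), timeout=10) as resp:
    content_type = resp.headers["Content-Type"]
    result = json.loads(resp.read())
print(content_type, result["rows"][0])

server.shutdown()
server.server_close()
```

For a standing service you would bind a fixed port and call `server.serve_forever()` in the foreground. The handler itself stays the same.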

There is a GitHub project that contains the gut servers I have been working on so far, and also a wiki page where you can add your gut server to the list. Once there are a handful of gut servers, we can start working on more extensive discovery and testing tools (ensuring gut server availability, better documentation, a web-based gut server API playground, etc.).