Max Ogden | Open Web programmer
November 2012
A Proposal For Streaming XHR

XHR2 isn't stream friendly. Let's explore why and propose a solution!

Note: This article predates the HTML5 Streams work

XHR2 (XHR is short for XMLHttpRequest, the HTTP client in AJAX) does some cool stuff. You can get binary response data, either read-only in a Blob or mutable in an ArrayBuffer (which you can turn into one of the many flavors of Typed Arrays). Binary data in JavaScript pushes the boundaries of the web by enabling rich multimedia experiences (like those demonstrated in this mind-blowing talk on the Web Audio API by @stuartmemo).

There is one fundamental problem with the current XHR specification (and implementations): they aren't designed for real-time streaming. With a few tweaks, though, I believe they can enable a much better web experience.

Here is an example of trying to wrap XHR in the node.js Stream API. The important part of the following code is the write function, which takes the ArrayBuffer that XHR returns, wraps it in a Typed Array and emits the newly arrived bytes as a chunk each time a readystatechange event fires with xhr.readyState 3.

var stream = require('stream')
var util = require('util')

function XHRStream(xhr) {
  stream.Stream.call(this)
  this.xhr = xhr
  this.offset = 0
  xhr.onreadystatechange = this.handle.bind(this)
  xhr.send(null)
}

// copy the Stream methods to this prototype
util.inherits(XHRStream, stream.Stream)

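XHRStream.prototype.handle = function () {
  // readyState 3 (LOADING) will be fired many times during a large download
  if (this.xhr.readyState === 3) this.write()
  if (this.xhr.readyState === 4) {
    // flush any bytes that arrived after the last LOADING event
    this.write()
    this.emit('end')
  }
}

XHRStream.prototype.write = function () {
  // response is null until the browser exposes some data
  if (!this.xhr.response) return
  // wrap the response on every call: a typed array created earlier
  // keeps its original length and would never see the new bytes
  var responseArray = new Uint8Array(this.xhr.response)
  if (responseArray.byteLength > this.offset) {
    // slice copies the unread bytes, so the chunk stands on its own
    this.emit('data', responseArray.slice(this.offset))
    this.offset = responseArray.byteLength
  }
}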

module.exports = XHRStream

Fundamentally, what node Streams (and the above code) do is take a huge response that may take a loooooong time to complete (like downloading a Blu-ray disc image) and split it up into chunks. This is a beautiful pattern due to its simplicity. The programmer can decide whether to combine the chunks and store them in a file or database, OR to process the chunks one at a time as they arrive and then throw them away so that the JavaScript VM can clean them up and free the memory they were using.

By contrast, xhr.response (xhr.responseText for non-binary responseTypes) is a single JavaScript object that just grows and grows and grows, so if you are downloading a 5GB file you will have a 5GB JavaScript object in memory at the end of the request. The "node way" would be to have hundreds of small objects, each containing a contiguous chunk of the file, that get emitted as soon as the client receives the data from the network.

The XHRStream prototype lets you write code that looks like this:

var xhr = new XMLHttpRequest()
xhr.responseType = 'arraybuffer'
xhr.open("GET", "http://bigdata.com/hugefile.zip", true)
var response = new XHRStream(xhr)

// 'data' events will happen each time a new chunk gets to the browser
response.on('data', function(chunk) {
  // chunk size in this case is determined by TCP and will
  // probably be in the range of 10s or 100s of kilobytes
})
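
As a rough sketch of the "process each chunk and throw it away" style, the following tallies bytes as they arrive without holding on to any chunk. It uses a plain EventEmitter as a stand-in for an XHRStream so it can run in node, and the names countBytes and fakeResponse are made up for this illustration.

```javascript
var EventEmitter = require('events').EventEmitter

// Hypothetical helper: tallies bytes from any emitter that fires
// 'data' and 'end' events, keeping no reference to the chunks.
function countBytes(source, done) {
  var total = 0
  source.on('data', function (chunk) {
    total += chunk.byteLength
    // nothing retains chunk after this handler returns,
    // so the VM is free to garbage collect it
  })
  source.on('end', function () {
    done(total)
  })
}

// Stand-in for an XHRStream that emits two small chunks
var fakeResponse = new EventEmitter()
var result = null
countBytes(fakeResponse, function (total) { result = total })

fakeResponse.emit('data', new Uint8Array([1, 2, 3]))
fakeResponse.emit('data', new Uint8Array([4, 5]))
fakeResponse.emit('end')

console.log(result) // 5
```

Memory use here stays at roughly one chunk at a time, no matter how large the whole response is.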

Unfortunately this isn't possible with the current XHR implementation, for two reasons. The first is that xhr.response is essentially one big buffer that keeps growing as the response data comes in, which means it can't be garbage collected while the request is in progress. In practice this just means you can't use XHR to download files larger than the amount of RAM in your machine, which isn't a total deal breaker for streaming data. The other reason is that the XHR spec currently prevents access to binary response data before the request has completed. Here is a simplified excerpt from the XMLHttpRequest specification on w3.org (thanks to @tobie for assisting me in navigating the web standards world here):

If responseType is "text"
  If the state is not LOADING or DONE, return empty string and terminate these steps
  Otherwise return the text response entity body.
Otherwise (for all other types of responseTypes such as 'arraybuffer' or 'blob')
  If the state is not DONE, return null and terminate these steps.
    
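
To make the consequence concrete, the rule in the excerpt can be modelled as a small function. This is only a sketch of the simplified excerpt above, not browser code, and readResponse and its parameters are names invented for this illustration.

```javascript
// XHR readyState values
var LOADING = 3
var DONE = 4

// Hypothetical model of the simplified spec excerpt.
// body stands for whatever has been received so far.
function readResponse(state, responseType, body) {
  if (responseType === 'text') {
    if (state !== LOADING && state !== DONE) return ''
    return body // partial text is visible while LOADING
  }
  // 'arraybuffer', 'blob' and the other types: nothing until DONE
  if (state !== DONE) return null
  return body
}

var partial = new ArrayBuffer(8)
console.log(readResponse(LOADING, 'text', 'partial text'))  // partial text
console.log(readResponse(LOADING, 'arraybuffer', partial))  // null
console.log(readResponse(DONE, 'arraybuffer', partial) === partial) // true
```

Text responses can be read incrementally while the request is LOADING, but binary responses stay hidden until DONE, which is why the XHRStream wrapper never gets a chunk to emit mid-download.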

Here is a discussion from the WHATWG mailing list on this topic that contains this particularly insightful comment:

Hmm! And I guess it's very difficult to create a abstract in/out 
interface that can handle any protocol/stream.
Although an abstract in/out would be ideal as that would let new 
protocols to be supported without needing to rewrite anything at the 
higher level.

-- 
Roger "Rescator" Hågensen.
    

I totally agree! It turns out that the node Stream API, the core I/O abstraction in Node.js (which is itself a tool for I/O), is essentially an abstract in/out interface that can handle any protocol/stream, and it happens to be written in JavaScript.
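
Here is a minimal sketch of what that abstract in/out interface looks like in practice. It uses the same old-style stream.Stream base class as the XHRStream code above (newer node code would more likely use stream.Readable and stream.Writable), and Collector is a hypothetical destination invented for this example. The source only has to emit 'data' and 'end', and the destination only has to implement write and end.

```javascript
var stream = require('stream')
var util = require('util')

// Hypothetical destination: joins whatever text it is fed
function Collector() {
  stream.Stream.call(this)
  this.writable = true
  this.text = ''
}

util.inherits(Collector, stream.Stream)

Collector.prototype.write = function (chunk) {
  this.text += chunk
  return true // tell the source it can keep sending
}

Collector.prototype.end = function () {
  this.writable = false
}

// Stand-in source: this could be a socket, a file or an XHR wrapper
var source = new stream.Stream()
var collector = new Collector()
source.pipe(collector)

source.emit('data', 'hello ')
source.emit('data', 'world')
source.emit('end')

console.log(collector.text) // hello world
```

Nothing in Collector knows where the data came from, which is the property the mailing list comment is asking for: a new protocol only needs a new source, and everything downstream stays the same.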

It should be noted that WebSockets now support transporting binary data, but the WebSocket spec still isn't perfect for streaming, and the overwhelming majority of APIs are HTTP.

So, to any standards people or browser implementers reading this: please please please take some notes from Node.js and change the spec to allow for truly streaming data over XMLHttpRequest.