XHR2 isn't stream friendly. Let's explore why and propose a solution!
Note: This article predates the HTML5 Streams work
XHR2 (XHR is short for XMLHttpRequest, the HTTP client behind AJAX) does some cool stuff. You can get binary response data, either read-only in a Blob or mutable in an ArrayBuffer (which you can turn into one of the many flavors of Typed Array). Binary data in JavaScript pushes the boundaries of the web by enabling rich multimedia experiences (like those demonstrated in this mind-blowing talk on the Web Audio API by @stuartmemo).
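As a quick aside, the typed-array "flavors" are just different fixed-width views over the same underlying ArrayBuffer. A minimal sketch (the buffer size and values here are arbitrary):

```javascript
// one 8-byte ArrayBuffer, viewed two different ways
var buffer = new ArrayBuffer(8)
var bytes = new Uint8Array(buffer) // eight 1-byte elements
var words = new Int32Array(buffer) // two 4-byte elements

bytes[0] = 42 // writes through one view are visible through the other

console.log(bytes.length) // 8
console.log(words.length) // 2
```

Because both views share the same memory, no copying happens when you reinterpret the buffer.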
There is one fundamental problem with the current XHR specification (and implementations) which is that they aren't designed for real-time streaming, but with a few tweaks I believe they can enable a much better web experience.
Here is an example of wrapping XHR in the node.js Stream API. The important part of the following code is the write
function, which takes the ArrayBuffer that XHR returns, wraps it in a Typed Array and emits each new chunk of binary data every time readyState 3 fires.
var stream = require('stream')
var util = require('util')

function XHRStream (xhr) {
  stream.Stream.call(this)
  this.xhr = xhr
  this.offset = 0
  xhr.onreadystatechange = this.handle.bind(this)
  xhr.send(null)
}

// copy the Stream methods to this prototype
util.inherits(XHRStream, stream.Stream)

XHRStream.prototype.handle = function () {
  // readyState 3 will be fired many times during a large download
  if (this.xhr.readyState === 3) this.write()
  if (this.xhr.readyState === 4) {
    this.write() // flush any bytes that arrived with the final packet
    this.emit('end')
  }
}

XHRStream.prototype.write = function () {
  // re-wrap the response buffer on every call, since the underlying
  // ArrayBuffer grows as data arrives
  var responseArray = new Int8Array(this.xhr.response)
  if (responseArray.byteLength > this.offset) {
    // emit only the bytes received since the last write, as a
    // zero-copy view (subarray) rather than a copy
    this.emit('data', responseArray.subarray(this.offset))
    this.offset = responseArray.byteLength
  }
}

module.exports = XHRStream
Fundamentally what node Streams (and the above code) do is take a huge response that may take a loooooong time to complete (like downloading a Blu-ray disc image) and split it up into chunks. This is a beautiful pattern due to its simplicity. The programmer can decide if they want to combine the chunks and store them in a file or database, OR process each chunk immediately and then throw it away so that the JavaScript VM can clean it up and free the memory it was using. The xhr.response (or xhr.responseText for non-binary responseTypes) in an XHR request is a single JavaScript object that just grows and grows and grows, so if you are downloading a 5GB file you will have a 5GB JavaScript object in memory at the end of the request. The "node way" would be to have hundreds of small objects, each containing a contiguous chunk of the file, emitted as soon as the client receives the data from the network.
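To make the difference concrete, here is a hypothetical sketch: instead of one buffer that grows to the full response size, a streaming consumer only ever holds one small chunk at a time (the chunk and total sizes below are made up):

```javascript
// stand-in for a 5 MB response delivered as 1 KB network packets
var CHUNK = 1024
var TOTAL = 5 * 1024 * 1024

// the "node way": inspect each chunk as it arrives, keep only a
// tiny running result, and let each chunk become garbage immediately
var bytesSeen = 0
for (var i = 0; i < TOTAL / CHUNK; i++) {
  var chunk = new Uint8Array(CHUNK) // would come off the network
  bytesSeen += chunk.byteLength     // ...process it...
  chunk = null                      // ...then drop the reference
}

console.log(bytesSeen === TOTAL) // true
```

Peak memory here stays around one chunk (1 KB) instead of the full 5 MB, and the same shape scales to responses far larger than RAM.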
The XHRStream prototype lets you write code that looks like this:
var xhr = new XMLHttpRequest()
xhr.open("GET", "http://bigdata.com/hugefile.zip", true)
// set responseType after open(); some implementations throw otherwise
xhr.responseType = 'arraybuffer'
var response = new XHRStream(xhr)
// 'data' events will happen each time a new chunk gets to the browser
response.on('data', function(chunk) {
// chunk size in this case is determined by TCP and will
// probably be in the range of 10s or 100s of kilobytes
})
Unfortunately this isn't possible with the current XHR implementation, for two reasons. The first is that xhr.response
is essentially one big buffer that keeps growing linearly as the response data comes in, which means it can never be garbage collected mid-request. In practice this means you can't download files with XHR that are larger than the amount of RAM in your machine, which is bad but isn't a total deal breaker for streaming data. The other reason is that the current XHR spec prevents access to binary response data before the request has completed. Here is a simplified excerpt from the XMLHttpRequest specification on w3.org (thanks to @tobie for assisting me in navigating the web standards world here):
If responseType is "text"
If the state is not LOADING or DONE, return empty string and terminate these steps
Otherwise return the text response entity body.
Otherwise (for all other types of responseTypes such as 'arraybuffer' or 'blob')
If the state is not DONE, return null and terminate these steps.
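Paraphrased as plain JavaScript (the readyState constants and the xhr object shape below are just for illustration), the spec's response getter behaves roughly like this:

```javascript
var LOADING = 3
var DONE = 4

// a rough paraphrase of the spec's response getter
function getResponse (xhr) {
  if (xhr.responseType === 'text') {
    // text is readable while the request is still in flight
    if (xhr.readyState !== LOADING && xhr.readyState !== DONE) return ''
    return xhr.responseText
  }
  // binary types ('arraybuffer', 'blob') are withheld until DONE
  if (xhr.readyState !== DONE) return null
  return xhr.response
}

// a partially downloaded binary request yields nothing at all
console.log(getResponse({ responseType: 'arraybuffer', readyState: LOADING })) // null
```

That final branch is the whole problem: during a binary download, readyState 3 fires over and over, but the response stays null until the very end.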
Here is a discussion from the whatwg mailing list on this topic that contains this particularly insightful comment:
Hmm! And I guess it's very difficult to create a abstract in/out
interface that can handle any protocol/stream.
Although an abstract in/out would be ideal as that would let new
protocols to be supported without needing to rewrite anything at the
higher level.
--
Roger "Rescator" Hågensen.
I totally agree! It turns out that the node Stream API, the core I/O abstraction in Node.js, is exactly that: an abstract in/out interface that can handle any protocol or stream, and it happens to be written in JavaScript.
It should be noted that WebSockets now support transporting binary data, but the WebSocket spec still isn't ideal for streaming, and the overwhelming majority of APIs speak HTTP.
So, to any standards people or browser implementers reading this: please please please take some notes from Node.js and change the spec to allow truly streaming data over XMLHttpRequest.