D3.js: How to handle dynamic JSON Data

When I started with d3.js, I really struggled to understand how I could link data from a JSON feed to an SVG graph. I read a lot of tutorials but still couldn't find what I was looking for. Now that I know how d3.js behaves, I thought it would be a good idea to share the things that I learned.

When you start using the library, there's stuff that might be foreign to you:

My goal was to create a dynamic graph where I could add, edit and remove data and have d3 update the graph in real time, but I couldn't understand how to handle enter() and exit() with JSON. Many examples out there used static data, so they didn't cover my issues with JSON.

Throughout this post, I will be using the following JSON response as an example to show you how to represent your JSON as a graph.

JSONData = [
  { "id": 3, "created_at": "Sun May 05 2013", "amount": 12000},
  { "id": 1, "created_at": "Mon May 13 2013", "amount": 2000},
  { "id": 2, "created_at": "Thu Jun 06 2013", "amount": 17000},
  { "id": 4, "created_at": "Thu May 09 2013", "amount": 15000},
  { "id": 5, "created_at": "Mon Jul 01 2013", "amount": 16000}
]

Let's build a graph

Before anything else, a working graph would be the best way to get started. To help with comprehension, I will not draw axes, lines and other stuff that you would normally see in a graph. The point here is to visualize and understand how enter() and exit() work with arbitrary data.

(function() {
  var data = JSONData.slice()
  var format = d3.time.format("%a %b %d %Y")
  var amountFn = function(d) { return d.amount }
  var dateFn = function(d) { return format.parse(d.created_at) }

  var x = d3.time.scale()
    .range([10, 280])
    .domain(d3.extent(data, dateFn))

  var y = d3.scale.linear()
    .range([180, 10])
    .domain(d3.extent(data, amountFn))
  
  var svg = d3.select("#demo").append("svg:svg")
  .attr("width", 300)
  .attr("height", 200)

  svg.selectAll("circle").data(data).enter()
   .append("svg:circle")
   .attr("r", 4)
   .attr("cx", function(d) { return x(dateFn(d)) })
   .attr("cy", function(d) { return y(amountFn(d)) }) 
})();

Hint: You can click on the dots.

Scales

I would like to take a moment to talk about the two scales I use here. You may wish to skip to the explanation regarding enter() if you already know about scales.

Scales are objects. d3.time.scale() and d3.scale.linear() are two constructors (you call them without new, and each returns a new scale). The methods chained after them are getters and setters merged into a single method: if you pass an argument, it sets the value; if you don't, it returns the current value.
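
To make the getter/setter idea concrete, here is a minimal sketch of the pattern in plain JavaScript. makeScale is a made-up name and this is not d3's actual implementation; it only mimics how one method can both read and write a setting, and why the setters can be chained.

```javascript
// Hypothetical sketch of the getter/setter pattern; not d3's real code.
function makeScale() {
  var range = [0, 1]
  var domain = [0, 1]
  var scale = {}

  // With an argument: store it and return the scale so calls can be chained.
  // Without an argument: return the current value.
  scale.range = function(value) {
    if (!arguments.length) return range
    range = value
    return scale
  }

  scale.domain = function(value) {
    if (!arguments.length) return domain
    domain = value
    return scale
  }

  return scale
}

var demoScale = makeScale().range([10, 280]).domain([2000, 17000])
console.log(demoScale.range())  // [ 10, 280 ]
console.log(demoScale.domain()) // [ 2000, 17000 ]
```

Returning the scale from the setter is what allows the .range(...).domain(...) chaining seen in the graph code.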

Range

The range is the span of pixels you wish to cover with the scale. When you start, you might want to cover the whole width and height of your SVG canvas. If you do so, you will eventually see that elements drawn on the edge become clipped. Giving padding to your range makes sense: every calculation and rendering step goes through your scales, so you set your padding in one place and everything you draw in your graph stays inside it.

In the case of our graph here, I kept at least 10px of padding on each side of the graph.

You may also notice that the scale in the y-axis is inverted. This is because SVG's y-coordinate is inverted. 0 is at the top of the graph while the height is at the bottom.
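
As a rough illustration of what the y scale computes, the sketch below does the same linear interpolation by hand. linearScale is a hypothetical helper, not part of d3, and it leaves out everything else a real scale offers. It uses the smallest and largest amounts from the sample data and the [180, 10] range from the graph.

```javascript
// Hypothetical helper: maps a value from [d0, d1] to [r0, r1] linearly.
function linearScale(domain, range) {
  return function(value) {
    var t = (value - domain[0]) / (domain[1] - domain[0])
    return range[0] + t * (range[1] - range[0])
  }
}

// Smallest amount goes to the bottom (180px), largest to the top (10px).
var yDemo = linearScale([2000, 17000], [180, 10])

console.log(yDemo(2000))  // 180
console.log(yDemo(17000)) // 10
console.log(yDemo(9500))  // 95
```

Because the range is written as [180, 10] instead of [10, 180], bigger amounts get smaller y values and therefore appear higher on the canvas.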

Domain

The domain is the start and end of your dataset. It can be any kind of value that can be compared in JavaScript. Here, one domain is a range of numbers while the other is a range of dates. While the range is usually fixed and doesn't change during the lifetime of the graph, the domain may have to change if your dataset changes.

Notice that I use d3.extent() as an argument to the domain. This is basically shorthand for the following:

x.domain( [ d3.min(data, dateFn), d3.max(data, dateFn) ] )
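
For readers who want to see what that pair of calls boils down to, here is a hand-rolled equivalent in plain JavaScript. extentOf is an illustrative name, and unlike d3.extent() it makes no attempt to skip undefined or NaN values. For brevity it parses dates with new Date() rather than the format.parse() used in the post.

```javascript
// Illustrative stand-in for d3.extent(); does not skip undefined/NaN values.
function extentOf(array, accessor) {
  var min, max
  array.forEach(function(d, i) {
    var value = accessor(d)
    if (i === 0 || value < min) min = value
    if (i === 0 || value > max) max = value
  })
  return [min, max]
}

var sample = [
  { "id": 3, "created_at": "Sun May 05 2013", "amount": 12000 },
  { "id": 1, "created_at": "Mon May 13 2013", "amount": 2000 },
  { "id": 2, "created_at": "Thu Jun 06 2013", "amount": 17000 }
]

var amounts = extentOf(sample, function(d) { return d.amount })
console.log(amounts) // [ 2000, 17000 ]

// Dates compare the same way, which is why they work as a domain too.
var dates = extentOf(sample, function(d) { return new Date(d.created_at) })
console.log(dates[0].getMonth(), dates[1].getMonth()) // 4 5 (May, June)
```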

Enter()

If you read the documentation about enter or exit, you know that data is bound to nodes.

How are they bound? It depends on data().

svg.selectAll("circle")
 .data(JSONData, function(d) {
   return d.created_at
 })

If you only set the values for data (like data([1,2,3,4])), d3.js will bind the first value in the array to the first node element (in this case, a circle), the second value to the second node, and so on.

If you set a key function, it will bind each value to the node that has the same key. It's worth mentioning that if a node already holds data with the same key, d3.js will store the new data in the node, but the node will still be drawn from the original data until you update its attributes. Don't worry too much about this for now; I'll cover it further down the post.

Once your data is bound to your selection (svg.selectAll("circle").data(JSONData)), enter() will return a placeholder for every value for which no circle node could be found.

After that, it's your job to append a new element ("svg:circle") and set the proper attributes on that element. If you have ever read a d3.js tutorial, those lines should mean something to you.
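
The matching that data() performs can be approximated in a few lines of plain JavaScript. The sketch below is a simplified model, not d3's implementation: join, and the plain objects standing in for DOM nodes, are made up for illustration, and details such as duplicate keys are ignored. It sorts the data into three groups: enter (values with no node), update (values matched to a node) and exit (nodes with no value).

```javascript
// Simplified model of a keyed data join. `nodes` are plain objects that
// each hold the datum they were last bound to, standing in for DOM nodes.
function join(nodes, data, keyFn) {
  var nodeByKey = {}
  nodes.forEach(function(node) { nodeByKey[keyFn(node.datum)] = node })

  var result = { enter: [], update: [], exit: [] }
  data.forEach(function(d) {
    var key = keyFn(d)
    if (nodeByKey.hasOwnProperty(key)) {
      result.update.push(d)      // a node already exists for this key
      delete nodeByKey[key]
    } else {
      result.enter.push(d)       // no node yet: a circle must be appended
    }
  })

  // Whatever is left has no matching datum anymore.
  result.exit = Object.keys(nodeByKey).map(function(key) { return nodeByKey[key] })
  return result
}

var rendered = [
  { datum: { created_at: "Sun May 05 2013", amount: 12000 } },
  { datum: { created_at: "Mon May 13 2013", amount: 2000 } }
]
var incoming = [
  { created_at: "Mon May 13 2013", amount: 2500 },
  { created_at: "Thu Jun 06 2013", amount: 17000 }
]

var joined = join(rendered, incoming, function(d) { return d.created_at })
console.log(joined.enter.length)  // 1 (Jun 06 has no circle yet)
console.log(joined.update.length) // 1 (May 13 matches an existing circle)
console.log(joined.exit.length)   // 1 (May 05 is no longer in the data)
```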

Add new entries

The goal of this post is to add and remove arbitrary data from the graph. While editing and removing data will need a bit of work, adding a new entry is pretty straightforward: the current code already supports dynamic data addition.

The enter() code will need to be refactored into a function to avoid duplication where possible.

For the sake of this experiment, I'll build a button that adds a random JavaScript object to the data array (a copy of JSONData).

(function() {
  var data = JSONData.slice()
  var format = d3.time.format("%a %b %d %Y")
  var amountFn = function(d) { return d.amount }
  var dateFn = function(d) { return format.parse(d.created_at) }

  var x = d3.time.scale()
    .range([10, 280])
    .domain(d3.extent(data, dateFn))

  var y = d3.scale.linear()
    .range([180, 10])
    .domain(d3.extent(data, amountFn))
  
  var svg = d3.select("#demo").append("svg:svg")
  .attr("width", 300)
  .attr("height", 200)

  var refreshGraph = function() {
    svg.selectAll("circle").data(data).enter()
     .append("svg:circle")
     .attr("r", 4)
     .attr("cx", function(d) { return x(dateFn(d)) })
     .attr("cy", function(d) { return y(amountFn(d)) }) 
  }

  d3.selectAll(".add-data")
   .on("click", function() {
     var start = d3.min(data, dateFn)
     var end = d3.max(data, dateFn)
     var time = start.getTime() + Math.random() * (end.getTime() - start.getTime())
     var date = new Date(time)

     var obj = {
       'id': Math.floor(Math.random() * 70),
       'amount': Math.floor(1000 + Math.random() * 20001),
       'created_at': date.toDateString()
     }
     data.push(obj)
     refreshGraph()
  })

  refreshGraph()

})();

Hint: Add more than 1 random entry

Outside the current (domain)

When adding data, it might fall outside the current graph. What you need to do is reset the scales to include the new minimum and maximum from your dataset.

This is exactly what I was talking about earlier when I was explaining how scales work. Because some values are smaller or greater than the limits we set at the beginning, the x and y values generated with the original scales are too small or too big to be rendered on the canvas. The scales need to be updated to reflect the current data.

To achieve this, refreshGraph() needs to be refactored to include two changes. It needs to

  1. update the domains with the new dataset;
  2. update existing circles (a.k.a. nodes) with the new scales.

var refreshGraph = function() {
  x.domain(d3.extent(data, dateFn))
  y.domain(d3.extent(data, amountFn))

  var circles = svg.selectAll("circle").data(data)
  
  circles.transition()
   .attr("cx", function(d) { return x(dateFn(d)) })
   .attr("cy", function(d) { return y(amountFn(d)) })

  circles.enter()
   .append("svg:circle")
   .attr("r", 4)
   .attr("cx", function(d) { return x(dateFn(d)) })
   .attr("cy", function(d) { return y(amountFn(d)) })
}

Hint: Add many entries

Modify the scales

First, I don't recreate the scales from scratch. The range values are still good, and I don't want to reconfigure every setting just because the domain changes.

The domain is the only thing that needs to change.
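
A small numeric sketch shows why refreshing the domain is enough. miniScale is a hypothetical stand-in for the y scale (again, not d3's code) with a fixed range of [180, 10] and a domain that can be replaced. The value 20000 is an invented new entry, larger than anything in the sample data.

```javascript
// Hypothetical stand-in for the y scale: fixed range, replaceable domain.
function miniScale(range) {
  var domain = [0, 1]
  function scale(value) {
    var t = (value - domain[0]) / (domain[1] - domain[0])
    return range[0] + t * (range[1] - range[0])
  }
  scale.domain = function(value) {
    if (!arguments.length) return domain
    domain = value
    return scale
  }
  return scale
}

var yStale = miniScale([180, 10]).domain([2000, 17000])

// With the old domain, the new amount lands above the top of the SVG.
var before = yStale(20000)
console.log(before < 0) // true

// Updating the domain in place brings it back inside the padded area.
yStale.domain([2000, 20000])
console.log(yStale(20000)) // 10
console.log(yStale(2000))  // 180
```

The range never had to be touched: the same scale object keeps its pixel settings and only learns about the new minimum and maximum.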

Transition?

Two other things are different from the previous version: I now assign a circles variable and I use transition() for the first time.

I assign the result of data() to a variable because enter() only affects new data that are not bound to nodes.

When I started playing with d3.js, I thought that rendered data would react automatically when I changed scales, domains and whatnot. It doesn't. You have to tell d3.js to update what is already drawn or it will stay there, out of sync.

So, the first cx and cy assignments are for nodes that are already rendered. transition() is added there to animate each node between its two positions. If you don't want animation, remove transition() and the graph will update immediately.

Remove Data

Removing data is trickier. Because the current data() call does not have a key function, removing data from anywhere in the array will cause nodes and data to fall out of sync. The solution is to add the key function that was explained at the beginning of this post. Here's how we can fix this out-of-sync issue:

svg.selectAll("circle").data(JSONData, dateFn) // I want the key to be a date object
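
To see the out-of-sync problem without a browser, the sketch below models the two matching strategies on plain arrays. matchByIndex and matchByKey are hypothetical helpers that only report which rendered datum would be left without a partner (roughly what exit() would hand back); they are a simplification of what d3 does.

```javascript
// Both helpers return the rendered data whose node would end up in exit().

// Without a key function, nodes and data are paired by position, so any
// node beyond the new array length is the one that gets removed.
function matchByIndex(rendered, data) {
  return rendered.slice(data.length)
}

// With a key function, a node exits when no datum shares its key.
function matchByKey(rendered, data, keyFn) {
  var keys = data.map(keyFn)
  return rendered.filter(function(d) { return keys.indexOf(keyFn(d)) === -1 })
}

var shown = [
  { created_at: "Sun May 05 2013", amount: 12000 },
  { created_at: "Mon May 13 2013", amount: 2000 },
  { created_at: "Thu Jun 06 2013", amount: 17000 }
]
var afterDelete = shown.slice(1) // the user deleted the first entry

var byIndex = matchByIndex(shown, afterDelete)
var byKey = matchByKey(shown, afterDelete, function(d) { return d.created_at })

console.log(byIndex[0].created_at) // Thu Jun 06 2013 (last circle exits)
console.log(byKey[0].created_at)   // Sun May 05 2013 (the deleted entry)
```

With positional matching the last circle is the one that leaves and the remaining circles are rebound to shifted data; with a key, the circle that leaves is the one whose entry was actually deleted.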

And here's all the code that generates the following graph (I don't show the add/remove button's logic because it's not relevant to the graph. You are more than welcome to look at the source code of this page to know how I did it)

(function() {
  var data = JSONData.slice()
  var x = d3.time.scale()
    .range([10, 280])

  var y = d3.scale.linear()
   .range([180, 10])

  var svg = d3.select("#demoAddRandomAndResize").append("svg:svg")
   .attr("width", 300)
   .attr("height", 200)

  var format = d3.time.format("%a %b %d %Y")
  var amountFn = function(d) { return d.amount }
  var dateFn = function(d) { return format.parse(d.created_at) }
  var refreshGraph = function() {

    x.domain(d3.extent(data, dateFn))
    y.domain(d3.extent(data, amountFn))

    var circles = svg.selectAll("circle").data(data, dateFn)

    circles.transition()
     .attr("cx", function(d) { return x(dateFn(d)) })
     .attr("cy", function(d) { return y(amountFn(d)) })

    circles.enter()
     .append("svg:circle")
     .attr("r", 4)
     .attr("cx", function(d) { return x(dateFn(d)) })
     .attr("cy", function(d) { return y(amountFn(d)) })

    circles.exit()
     .remove()
  }

  refreshGraph()
  
})()

Hint: Add many entries

As in the previous example, transition, enter and exit are kept separate since those three methods cannot be chained onto one another. I also think it gives a boost to readability.

The real differences here are the key function passed to data() (the second argument, dateFn) and the call to exit().remove() at the end of refreshGraph().

Now that we have set a key, there's another new behavior: no duplicates!

You can test this behavior by going to the previous graph and adding a lot of data; you will soon see more than one entry for a given date. Add a lot of data here and, after a while, the graph won't add any more nodes. That's the key function in action!

Done!

Now you can add, edit and remove nodes, as well as rescale the graph to include every node. You can also use extent() to compute a domain and use a key function as a primary key for your data.

You can also transition() nodes so they move around when the scales change, and we refactored the code so it's DRY enough to be called from different places.

Not bad, right?