In praise of the humble timeline visualization

January 30, 2024
visualization

tl;dr: Timeline visualizations are surprisingly effective, especially for spotting parallelization opportunities. Reduce friction by publishing self-contained visualizations as HTML files.

Visualizing timings as a timeline can be effective at conveying why something is slow and how to speed it up. For example, I've used timeline visualizations to show the steps of a build process. Builds often have embarrassingly parallel steps and can be sped up by using a big machine and running tasks in parallel, thereby reducing wall-clock time! This post highlights a few self-contained ways to draw these visualizations.
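To make the wall-clock argument concrete, here is a minimal sketch with made-up step timings. The `steps` array and the `summarize` helper are hypothetical and not taken from any build tool. It compares the serial cost (the sum of all durations) with the wall-clock time of a schedule in which independent steps overlap.

```javascript
// Hypothetical step timings, in seconds from the start of the build.
const steps = [
  {name: 'container build', start: 0, duration: 30},
  {name: 'typescript', start: 30, duration: 20},
  {name: 'a.test', start: 30, duration: 10},
  {name: 'b.test', start: 40, duration: 10},
];

function summarize(steps) {
  // Serial cost: what the build would take if every step ran back to back.
  const serialSeconds = steps.reduce((sum, s) => sum + s.duration, 0);
  // Wall-clock time: from the earliest start to the latest finish.
  const firstStart = Math.min(...steps.map((s) => s.start));
  const lastEnd = Math.max(...steps.map((s) => s.start + s.duration));
  const wallClockSeconds = lastEnd - firstStart;
  return {serialSeconds, wallClockSeconds, speedup: serialSeconds / wallClockSeconds};
}

const summary = summarize(steps);
console.log(summary);
```

The gap between the two numbers is what a timeline makes visible at a glance: bars stacked above each other are time you are already saving, and long lonely bars are where to look next.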

Often, the first such visualization is produced "manually" with a spreadsheet, which is great. As things change around you, however, you'll want to repeat the visualization, so it becomes convenient to generate it programmatically and keep it around as a build output. Prioritize ease of use for the consumers of the visualization and for yourself. If generating things with SVG is your jam, great. If you're a whiz at gnuplot, excellent.
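Generating the picture programmatically starts with capturing the timings. Here is a rough sketch of how that could look. `createRecorder` is a hypothetical helper (not from any library), it only handles synchronous steps, and the injectable `now` clock exists purely to make the example deterministic. The record shape (`name`, `start`, `duration`, `state`) matches the sample data used in the examples later in this post.

```javascript
// Hypothetical recorder: wraps each build step and notes when it ran.
// `now` returns milliseconds; it defaults to Date.now() and can be faked.
function createRecorder(now = () => Date.now()) {
  const t0 = now();
  const records = [];
  return {
    records,
    // Run `fn` and record its start and duration in seconds relative to t0.
    // A step that throws is recorded as FAIL and the error is rethrown.
    step(name, fn) {
      const start = (now() - t0) / 1000;
      let state = 'PASS';
      try {
        return fn();
      } catch (err) {
        state = 'FAIL';
        throw err;
      } finally {
        const end = (now() - t0) / 1000;
        records.push({name, start, duration: end - start, state});
      }
    },
  };
}

// Fake clock that advances 5 seconds on every reading, standing in for real work.
let fakeMs = 0;
const recorder = createRecorder(() => (fakeMs += 5000));
recorder.step('compile', () => {});
recorder.step('test', () => {});
console.log(JSON.stringify(recorder.records));
```

In a real build you would drop the fake clock and write `recorder.records` out as JSON (say, to `data.json`) next to the other build outputs. Steps that run in parallel would need an async variant of `step`.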

Vega

I learned about the Grammar of Graphics from Wickham's ggplot2 book and about Bostock's D3 at about the same time, while we were figuring out how to draw time-series graphs for monitoring.

Wilkinson (2005) created the grammar of graphics to describe the fundamental features that underlie all statistical graphics. The grammar of graphics is an answer to the question of what is a statistical graphic? ggplot2 (Wickham 2009) builds on Wilkinson’s grammar by focussing on the primacy of layers and adapting it for use in R. In brief, the grammar tells us that a graphic maps the data to the aesthetic attributes (colour, shape, size) of geometric objects (points, lines, bars). The plot may also include statistical transformations of the data and information about the plot’s coordinate system. Facetting can be used to plot for different subsets of the data. The combination of these independent components are what make up a graphic.
https://ggplot2-book.org/introduction#what-is-the-grammar-of-graphics

This idea of composing a graphic from a basis of independent components transformed how I see data visualization. No longer are we selecting from a large menu of existing chart types; instead, we can build an endless variety by combining those components.

At some point (the '90s, in my recollection), newspapers went from black and white to color. Similarly, the web added another dimension to visualization, namely interactivity. Hover and click behavior isn't something you could do on paper. Cool!

On to Vega-Lite: you describe a graphic as a JSON structure, and it draws it for you. You can compile down to this format from a higher-level tool, or you can write it by hand. The example below draws a sample build timeline. We assign tasks to swimlanes ourselves, and then draw the tasks as bars. We use color to distinguish successful tasks from failing ones. Hover to see additional detail.

For practical purposes, you can use https://vega.github.io/vega-lite-api/ to get some TypeScript help building these in JS, or Altair to do the same in Python. I use Altair and Jupyter Notebooks together.

Vega-Lite is Jeff Heer's work; check it out.

Source

<!doctype html>
<!-- See https://vega.github.io/vega-lite/usage/embed.html -->
<html>
  <head>
    <title>Vega-Lite Timeline</title>
    <script src="https://cdn.jsdelivr.net/npm/vega@5.25.0"></script>
    <script src="https://cdn.jsdelivr.net/npm/vega-lite@5.16.3"></script>
    <script src="https://cdn.jsdelivr.net/npm/vega-embed@6.22.2"></script>
  </head>
  <body>
    <div id="vis"></div>
    <script type="text/javascript">
      // Sample build steps; start and duration are in seconds.
      const values = [
        {name: 'container build', start: 0, duration: 30, state: 'PASS'},
        {name: 'typescript', start: 30, duration: 20, state: 'PASS'},
        {name: 'a.test', start: 30, duration: 10, state: 'PASS'},
        {name: 'b.test', start: 40, duration: 10, state: 'PASS'},
        {name: 'c.test', start: 30, duration: 5, state: 'PASS'},
        {name: 'd.test', start: 35, duration: 17, state: 'FAIL'},
        {name: 'e.test', start: 50, duration: 10, state: 'PASS'},
        {name: 'finish', start: 65, duration: 7, state: 'PASS'},
      ];
      // Greedy swimlane assignment: put each task in the first lane that is
      // free at its start time, opening a new lane when none is.
      // lanes[i] holds the time at which lane i becomes free.
      const lanes = [];
      for (const v of values) {
        for (let lane = 0; lane < lanes.length; ++lane) {
          if (lanes[lane] <= v.start) {
            lanes[lane] = v.start + v.duration;
            v.lane = lane;
            break;
          }
        }
        if (v.lane === undefined) {
          v.lane = lanes.length;
          lanes.push(v.start + v.duration);
        }
      }

      const spec = {
        $schema: 'https://vega.github.io/schema/vega-lite/v5.json',
        description: 'Timelines are great.',
        data: {values},
        // I typically do this against the data directly.
        transform: [
          {calculate: 'datum.start + datum.duration', as: 'end'},
        ],
        encoding: {
          x: {field: 'start', type: 'quantitative', axis: {title: 'Duration (seconds)'}, order: null},
          // y: {field: 'name', type: 'ordinal'},
          y: {field: 'lane', type: 'ordinal'},
          // Whole-row tooltips ({content: 'data'}) are a mark property
          // (mark: {type: 'bar', tooltip: {content: 'data'}}), not an
          // encoding, so the fields are listed explicitly here.
          tooltip: [{field: 'name', type: 'ordinal'}, {field: 'duration'}],
        },
        layer: [{
          mark: {type: 'bar', height: 20, cornerRadius: 8},
          encoding: {
            // You can also use type temporal and timeUnit: minutesseconds, but
            // then the input data needs to be in milliseconds or an ISO 8601 string.
            x2: {field: 'end', type: 'quantitative'},
            color: {field: 'state', type: 'ordinal', scale: {domain: ['PASS', 'FAIL'], range: ['lightgreen', 'red']}},
          },
        }, {
          mark: {
            type: 'text',
            align: 'left',
            dx: 3,
          },
          encoding: {
            text: {field: 'name', type: 'ordinal'},
            x: {field: 'start', type: 'quantitative', axis: {title: 'Duration (seconds)'}, order: null},
          },
        }],
        width: 600,
        title: 'Vega-Lite Timeline',
      };
      vegaEmbed('#vis', spec);
    </script>
  </body>
</html>
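The swimlane loop in the source above is the only real "algorithm" in the example, so here it is again as a standalone function. This is a sketch rather than a drop-in replacement. `assignLanes` is a name I made up, and sorting by start time first is my addition: unsorted input still never overlaps, but it can open more lanes than needed.

```javascript
// Assign each task to the first lane (row) that is free at its start time.
// Returns new task objects with a `lane` field; the input is not mutated.
function assignLanes(tasks) {
  const laneEnds = []; // laneEnds[i] = time at which lane i becomes free
  const sorted = [...tasks].sort((a, b) => a.start - b.start);
  return sorted.map((task) => {
    let lane = laneEnds.findIndex((end) => end <= task.start);
    if (lane === -1) lane = laneEnds.length; // no free lane: open a new one
    laneEnds[lane] = task.start + task.duration;
    return {...task, lane};
  });
}

// Deliberately out of order to show that the sort takes care of it.
const laned = assignLanes([
  {name: 'b.test', start: 40, duration: 10},
  {name: 'container build', start: 0, duration: 30},
  {name: 'a.test', start: 30, duration: 10},
  {name: 'typescript', start: 30, duration: 20},
]);
console.log(laned.map((t) => `${t.name}: lane ${t.lane}`));
```

The number of lanes this produces is also a handy metric in its own right: it is the peak concurrency of the build.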

Observable Plot

I "learned" Observable Plot for this blog post. (I put learned in quotes: there's much to learn, and I learned a fraction.) It's newer, and you can recognize it as an evolution of the ggplot2 and Vega lineage. It's not an intermediate format: it assumes you're drawing your graph in JS. (Even if you don't speak much JS, note that your data can be external, so you can treat this as a DSL for graphing if you'd like.)

Source

<!DOCTYPE html>
<div id="myplot"></div>
<script type="module">

import * as Plot from "https://cdn.jsdelivr.net/npm/@observablehq/plot@0.6/+esm";

const values = [
  {name: 'container build', start: 0, duration: 30, state: 'PASS'},
  {name: 'typescript', start: 30, duration: 20, state: 'PASS'},
  {name: 'a.test', start: 30, duration: 10, state: 'PASS'},
  {name: 'b.test', start: 40, duration: 10, state: 'PASS'},
  {name: 'c.test', start: 30, duration: 5, state: 'PASS'},
  {name: 'd.test', start: 35, duration: 17, state: 'FAIL'},
  {name: 'e.test', start: 50, duration: 10, state: 'PASS'},
  {name: 'finish', start: 65, duration: 7, state: 'PASS'},
];

const plot = Plot.plot({
  // Map the two states to fixed colors; the bar mark below opts in via its stroke channel.
  color: {
    domain: ["PASS", "FAIL"],
    range: ["green", "red"],
  },
  marks: [
    Plot.axisY({label: null, marginLeft: 80}),
    Plot.barX(values, {
      x1: "start",
      x2: d => d.start + d.duration,
      y: "name",
      sort: {y: "x"},
      stroke: 'state',
      tip: true
    }),
    Plot.text(values, {
      x: "start",
      y: "name",
      text: "name",
      dx: 3,
      textAnchor: "start"
    })
  ]
});
const div = document.querySelector("#myplot");
div.append(plot);
</script>

Mermaid.js

Unlike D3, Vega, and Observable Plot, Mermaid does not let you mix and match components, but it is great for the chart types it supports. GitHub Flavored Markdown also renders it natively, which makes it a great documentation format. Here we abuse the "Gantt" type:

Source

<!doctype html>
<html>
  <head>
    <title>Mermaid Timeline</title>
  </head>

<body>
<script type="module">
import mermaid from 'https://cdn.jsdelivr.net/npm/mermaid@10/dist/mermaid.esm.min.mjs';
mermaid.initialize({ startOnLoad: true });
</script>
<pre class="mermaid">
---
displayMode: compact
---
gantt
    title Mermaid Timeline via Gantt
		%% X is seconds since the epoch
    dateFormat	X
		%% Display as minutes, seconds.
    axisFormat	%M:%S
    section Stuff

    %% Generate this from the data like so:
		%% crit/active/done change the formatting, and I didn't encode those from the data.
		%% cat data.json | jq '.[] | (.name + ":" + (.start | tostring) + "," + (.duration|tostring) + "s")' -r | sed 's,^,\t,'
		container build:0,30s
		typescript:30,20s
		a.test:30,10s
		b.test:crit,40,10s
		c.test:active,30,5s
		d.test:done,35,17s
		e.test:50,10s
		finish:65,7s
</pre>
</body>
</html>
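If jq and sed aren't your thing, the same generation step can be sketched in JS. `toMermaidGantt` is a hypothetical helper, and mapping `FAIL` to Mermaid's `crit` tag is my own choice for illustration; the listing above does not derive tags from the data. It assumes task names contain no colons, since Mermaid uses the colon to separate a task's name from its metadata.

```javascript
// Build the text of a Mermaid Gantt chart from {name, start, duration, state} records.
// start and duration are in seconds, matching `dateFormat X` (seconds since the epoch).
function toMermaidGantt(values, title = 'Mermaid Timeline via Gantt') {
  const header = [
    '---',
    'displayMode: compact',
    '---',
    'gantt',
    `    title ${title}`,
    '    dateFormat X',
    '    axisFormat %M:%S',
    '    section Stuff',
  ];
  const tasks = values.map((v) => {
    // Assumption: highlight failures with the `crit` tag.
    const tag = v.state === 'FAIL' ? 'crit,' : '';
    return `    ${v.name}:${tag}${v.start},${v.duration}s`;
  });
  return [...header, ...tasks].join('\n');
}

const chart = toMermaidGantt([
  {name: 'a.test', start: 30, duration: 10, state: 'PASS'},
  {name: 'd.test', start: 35, duration: 17, state: 'FAIL'},
]);
console.log(chart);
```

The resulting string can go inside the `<pre class="mermaid">` element above, or into a fenced `mermaid` block in a Markdown file.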

See also Bryce Mecum's "TIL: Mermaid Gantt diagrams are great for displaying distributed traces in Markdown" post in the context of distributed tracing (Dapper, X-Trace, Honeycomb, Lightstep, Jaeger, etc.).

Chrome Tracing / Perfetto

chrome://tracing is a favorite obscure tool, great for showing nested traces (like what you'd find in a distributed trace viewer like Honeycomb/Dapper). You know it's good when the documentation is a Google Doc from 2016. It's being replaced with Perfetto.

Chrome tracing is not easy to embed. (The instructions at https://github.com/catapult-project/catapult/blob/master/tracing/docs/embedding-trace-viewer.md bring to mind one of my favorite web comics.) Once you look into the embedding documentation, you find a reference to this Chrome issue, which sends us to the replacement system, "https://ui.perfetto.dev, the WIP next generation of trace viewer."

So, in this case, we use jq to massage our input into the "old" trace format, which has one JSON object per event:

$ cat data.json | jq '{
  "traceEvents": [
    .[] | {
      "pid": 1,
      "tid": 1,
      "ts": (1000000 * .start),
      "dur": (1000000 * .duration),
      "ph": "X",
      "name": .name
    }
  ]
}' > perfetto.json

Self-hosting Perfetto is somewhat annoying. You can, however, include an iframe (or open a new window), and postMessage your data to it. This postMessage trick is also used when you "Open in Vega Editor" on the Vega-Lite example above.
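Here is a rough sketch of that postMessage hand-off. Treat the handshake as an assumption to check against Perfetto's current documentation: as I understand it, you send `'PING'` to the opened window until it answers `'PONG'`, then post an object of the form `{perfetto: {buffer, title}}`. `toTraceJson` and `openInPerfetto` are names I made up; `toTraceJson` mirrors the jq filter above.

```javascript
const PERFETTO_ORIGIN = 'https://ui.perfetto.dev';

// Same mapping as the jq filter: seconds become microseconds, and each task
// becomes one complete ("X") event.
function toTraceJson(values) {
  const traceEvents = values.map((v) => ({
    pid: 1,
    tid: 1,
    ts: 1000000 * v.start,
    dur: 1000000 * v.duration,
    ph: 'X',
    name: v.name,
  }));
  return JSON.stringify({traceEvents});
}

// Browser-only: opens the Perfetto UI in a new window and hands it the trace.
// Call it from a click handler so the popup is not blocked.
function openInPerfetto(values, title = 'Build timeline') {
  const buffer = new TextEncoder().encode(toTraceJson(values)).buffer;
  const win = window.open(PERFETTO_ORIGIN);
  if (!win) throw new Error('Popup blocked');
  // Keep pinging until the UI has loaded and replies.
  const timer = setInterval(() => win.postMessage('PING', PERFETTO_ORIGIN), 50);
  const onMessage = (evt) => {
    if (evt.origin !== PERFETTO_ORIGIN || evt.data !== 'PONG') return;
    clearInterval(timer);
    window.removeEventListener('message', onMessage);
    win.postMessage({perfetto: {buffer, title}}, PERFETTO_ORIGIN);
  };
  window.addEventListener('message', onMessage);
}

console.log(toTraceJson([{name: 'a.test', start: 30, duration: 10}]));
```

Wiring `openInPerfetto(values)` to a button in the generated HTML page keeps the page self-contained while still giving readers the full trace viewer one click away.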