escapewindow | blue sky: a federation of automation platforms (Reply)

Premise (Tl;dr)

A federated protocol for automation platforms could usher in a new era of collaboration between open-source projects, corporations, NGOs, and governments. This collaboration could happen, not at the latency of human handoffs, but at the speed of automation.

(I had decided to revive this idea in a blog post before the renewed interest in Mastodon, but the timing is good. I had also debated whether to post this on US election day, but it may be a welcome distraction?)

Background

Once upon a time, an excited computer lab assistant showed my class the world wide web. Left-aligned black text with blue, underlined hypertext on a grey background, interspersed with low-resolution GIFs. Sites, hosted on other people's computers across the country, transferred across analog phone lines at over a thousand baud. "This," he said. "This will change everything."

Some two decades later, I blogged about blue sky, next-gen Release Engineering infrastructure without knowing how we'd get there. Stars aligned, and many teams put in hard work. Today, most of our best ideas made it into taskcluster, the massively scalable, cloud-agnostic automation platform that runs Mozilla's CI and Release Pipelines.

Firefox CI in January 2019: 7,139,432 tasks; 259.0 compute years; 1,071,533 unique workers; 383,411 maximum concurrent tasks #Mozilla #ContinuousIntegration #Taskcluster
— Chris Cooper (@ccooper) February 6, 2019

The still-unimplemented idea that's stuck with me the longest is something I referred to as cross-cluster communication.

Simple cases

In the simplest case, what if you could spin up two Taskcluster instances, and one cluster's tasks could depend on the other cluster's tasks and artifacts? Task 2 on Cluster B could remain unscheduled until Task 1 on Cluster A finished. At that point, Task 2 could move to `pending`, then `running`, and use Task 1's artifacts as input.

We might have an upstream dependency that tends to break everything in our pipeline whenever a new release of that dependency ships. Our current solution might involve pinning this dependency and periodically bumping the pin, debugging and bisecting any bustage at that time. But what if we could fire a set of unit and integration tests, not just when the dependency ships a new release, but whenever their CI runs a build off of trunk? We could detect the breaking change much more easily and quickly. Cross-cluster communication would allow for this cross-cluster dependency while leaving the security, billing, and configuration decisions in the hands of the individual cluster owners.

Or we could split up the FirefoxCI cluster. The release pipeline could move to a locked-down cluster for enhanced security monitoring. In contrast, the CI pipeline could remain accessible for easier debugging. We wouldn't want to split the hardware test pools. We have a limited number of those machines. Instead, we could create a third cluster with the hardware test pools, triggering tests against builds generated by both upstream clusters.

Of course, we wouldn't need this layer if we only wanted to connect to one or two upstreams. This concept starts to shine when we scale up. Many-to-many.

Many-to-many

Open source projects could collaborate in ways we haven't yet seen. But this isn't limited to just open source.

If the US were using Taskcluster, municipal offices could collaborate, each owning their own cluster but federating with the others. The state could aggregate each and generate state-wide numbers and reports. The federal government could aggregate the states' data and federate with other nations. NGOs and corporations could also access public data. Traffic. Carbon. The disappearance and migration of wildlife. The spread of disease outbreaks.

A graph of cluster interdependencies. A matrix. A web, if you will. But instead of connecting machines or individuals, we're connecting automation platforms. Instead of cell-to-cell communication this is organ-to-organ communication.

(As I mentioned in the Tl;dr): A federated protocol for automation platforms could usher in a new era of collaboration between open-source projects, corporations, NGOs, and governments. This collaboration could happen, not at the latency of human handoffs, but at the speed of automation.