escapewindow: escape window (Default)

Since my last blog post, we've released seven more 1.0.0bX betas and a 2.0.0 final.

Since then, we've added beetmover-, balrog- and pushapk- scriptworker instance types, with chain of trust support, and upgraded them off of the now-retired 0.7.x branch. We now have a live_backing.log for easier treeherder log viewing. Our configs are now recursively frozen for more immutable goodness, and we have an unfreeze function as well. We're now running scriptworker instances against tier1 linux and android Firefox nightlies (and developer edition). And we have more contributors and contributions, including two releases pushed by jlund and jlorenzo.
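
The recursive freeze and unfreeze idea mentioned above can be sketched as follows. This is a minimal illustration using only the standard library, not scriptworker's actual implementation; the function names `freeze` and `unfreeze` and the sample config keys are assumptions made for the example.

```python
from types import MappingProxyType


def freeze(value):
    """Recursively convert dicts and lists into read-only equivalents."""
    if isinstance(value, dict):
        return MappingProxyType({k: freeze(v) for k, v in value.items()})
    if isinstance(value, (list, tuple)):
        return tuple(freeze(v) for v in value)
    return value


def unfreeze(value):
    """Recursively convert frozen values back into plain dicts and lists.

    Note: tuples always come back as lists in this sketch.
    """
    if isinstance(value, MappingProxyType):
        return {k: unfreeze(v) for k, v in value.items()}
    if isinstance(value, tuple):
        return [unfreeze(v) for v in value]
    return value


# Hypothetical config, for illustration only.
config = freeze({"worker_type": "signing", "retries": {"max": 5, "delays": [1, 2, 4]}})

try:
    config["retries"]["max"] = 10  # nested values are read-only too
except TypeError:
    mutated = False
else:
    mutated = True

# An unfrozen copy can be edited without touching the frozen original.
editable = unfreeze(config)
editable["retries"]["max"] = 10
```

Freezing at every level matters because a read-only top-level mapping alone would still let code modify the nested dicts and lists it contains.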

Why 2.0.0? First, we introduced some backwards-incompatible changes, and decided that the spirit of semver rule 5 included 1.0.0 betas. Why not 2.0.0b1? We're in production, tier 1, so let's stop futzing with betas and call it 2.0.0. The major version should be incrementing fairly rapidly, since we have a number of changes in the pipeline that may be backwards incompatible. Skipping 1.0.0 and getting used to larger major version numbers seems like a good first step.

Thanks to Johan and Jordan, to Mihai for the beetmover and balrog work, to Pankaj Ahuja for the recursive freeze/unfreeze functions, and to a bunch of other people on the Releng and Taskcluster teams for all their help!

Tl;dr: I just shipped scriptworker 1.0.0b1 (changelog) (github) (pypi).
This enables chain of trust verification for signing scriptworkers.

chain of trust

As I mentioned before, scriptworkers allow for more control and auditability around sensitive release-oriented tasks in Taskcluster. The Chain of Trust allows us to trace requests back to the tree and verify each previous task in the chain.

We have been generating Chain of Trust artifacts for a while now. These are gpg-signed json blobs with the task definition, artifact shas, and other information needed to verify the task and follow the chain back to the tree. However, nothing has been verifying these artifacts until now.

With the latest scriptworker changes, scriptworker follows and verifies the chain of trust before proceeding with its task. If there is any discrepancy in the verification step, it marks the task invalid before proceeding further. This is effectively a second factor to verify task request authenticity.
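
One piece of that verification, checking artifact shas against what a chain of trust blob recorded, might look roughly like the sketch below. The dictionary layout (`artifacts`, `sha256`) and the function names are assumptions for illustration and do not reproduce the real artifact format; the gpg signature check and the walk back through upstream tasks are left out entirely.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the hex sha256 digest of some artifact bytes."""
    return hashlib.sha256(data).hexdigest()


def verify_artifacts(cot: dict, artifacts: dict) -> list:
    """Compare downloaded artifacts against the shas recorded in `cot`.

    Returns a list of problems; an empty list means everything matched.
    """
    problems = []
    for path, info in cot["artifacts"].items():
        if path not in artifacts:
            problems.append(f"missing artifact: {path}")
        elif sha256_hex(artifacts[path]) != info["sha256"]:
            problems.append(f"sha mismatch: {path}")
    return problems


# Hypothetical data, for illustration only.
original = b"pretend this is a build"
cot = {"artifacts": {"public/build/target.apk": {"sha256": sha256_hex(original)}}}

ok = verify_artifacts(cot, {"public/build/target.apk": original})
tampered = verify_artifacts(cot, {"public/build/target.apk": b"something else"})
missing = verify_artifacts(cot, {})
```

A worker following this pattern would refuse to run its task whenever the returned list is non-empty.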

scriptworker 1.0.0b1

1.0.0b1 is largely two pull requests: scriptworker.yaml, which allows for more complex, commented config, and chain of trust verification, which grew a little large (275k patch!).

This is running on signing scriptworkers which sign nightlies on date-branch. We still need to support and update the other scriptworker instance types to enable end-to-end chain of trust verification.


I was planning for the 0.9.0 release to revolve around Chain of Trust verification. However, some pexpect async issues reared their ugly head. The fix is in scriptworker 0.9.0 (changelog) (github) (pypi); Chain of Trust verification will land in the next release, likely 1.0.0.

scriptworker 0.9.0

While working on the chain of trust verification code, I noticed that more than half the time I'd hit async pexpect errors during testing (we used async pexpect to sign gpg keys in the background).

This was just a personal annoyance until bug 1311111 ("please start landing docker-worker pubkeys in gpg repo") landed, and production signing scriptworker instances barfed on async pexpect errors.

The solution either called for fixing the upstream bug, or pulling the gpg homedir creation/rebuild out into its own process. We opted for the latter solution; so far this seems to be working much more smoothly.
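
The general shape of "do the fragile work in its own process" can be sketched like this. It is only an illustration of process isolation: the inline snippet stands in for a hypothetical standalone rebuild script, and nothing here reflects how scriptworker actually creates or rebuilds gpg homedirs.

```python
import subprocess
import sys

# Hypothetical stand-in for a standalone homedir rebuild script.
REBUILD_SNIPPET = "print('rebuilt gpg homedirs')"


def rebuild_gpg_homedirs() -> str:
    """Run the rebuild in a child process and return what it printed.

    check=True raises CalledProcessError on a non-zero exit, so a failed
    rebuild surfaces as one clear error instead of tangling with the
    worker's own event loop.
    """
    result = subprocess.run(
        [sys.executable, "-c", REBUILD_SNIPPET],
        capture_output=True,
        text=True,
        check=True,
        timeout=60,
    )
    return result.stdout.strip()


status = rebuild_gpg_homedirs()
```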


Tl;dr: I shipped scriptworker 0.8.2 (changelog) (github) (pypi) and scriptworker 0.7.2 (changelog) (github) (pypi) last Monday (Oct 24), and am only now getting around to blogging about them.

These are patch releases, and fix the polling loop.

scriptworker 0.8.2

The fix for bug 1310120 ("puppet reinstalls scriptworker on every run") exposed some scriptworker loop bugs: because puppet was restarting scriptworker regularly, we never had a long-running instance before.

:kmoir and :Callek noticed that signing scriptworker wasn't signing (nagios alerts on stuck queues are pending =\ ). It was clear that while git polling was continuing on its merry way, the task polling was dying fairly quickly. We also needed more logging around fatal exceptions.

I addressed these issues in scriptworker 0.8.2. We also have our third scriptworker contributor, :jlorenzo! (:jlund was #2)
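
The kind of failure described above, where one polling loop quietly dies while another carries on, is usually avoided by catching and logging exceptions inside the loop body. The sketch below shows that pattern in general terms; it is not the actual 0.8.2 patch, and `poll_forever`, `flaky_poll`, and the `max_iterations` cap (there only so the example terminates) are assumptions for illustration.

```python
import asyncio
import logging

log = logging.getLogger("worker_sketch")


async def poll_forever(name, poll_once, interval, max_iterations):
    """Call poll_once repeatedly, logging failures instead of dying."""
    completed = 0
    for _ in range(max_iterations):
        try:
            await poll_once()
            completed += 1
        except Exception:
            # Log the full traceback, then keep polling.
            log.exception("%s poll failed; continuing", name)
        await asyncio.sleep(interval)
    return completed


calls = {"n": 0}


async def flaky_poll():
    """Hypothetical poller that fails on its second call."""
    calls["n"] += 1
    if calls["n"] == 2:
        raise RuntimeError("simulated queue error")


completed = asyncio.run(poll_forever("task", flaky_poll, 0, 4))
```

Here the second poll raises, gets logged with a traceback, and the loop still goes on to make its third and fourth attempts.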

scriptworker 0.7.2

Since we already had the 0.7.1 release to spare other scriptworker instance types from having to deal with the churn from pre-1.0 changes, I backported the 0.8.2 fixes to the 0.7.x branch and released 0.7.2 off of it. This involved enough merge conflicts that I'm hoping to avoid too many more 0.7.x releases.


Tl;dr: I just shipped scriptworker 0.8.1 (changelog) (github) (pypi) and scriptworker 0.7.1 (changelog) (github) (pypi).
These are patch releases, and are currently the only versions of scriptworker that work.

scriptworker 0.8.1

The json, embedded in the Azure XML, now contains a new property, hintId. Ideally this wouldn't have broken anything, but I was using that json dict as kwargs, rather than explicitly passing taskId and runId. This means that older versions of scriptworker no longer successfully poll for tasks.

This is now fixed in scriptworker 0.8.1.
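
The failure mode is easy to reproduce in isolation. In the sketch below, `claim_task` and the message contents are hypothetical stand-ins rather than scriptworker's real functions or the real Azure payload; only the property names taskId, runId, and hintId come from the description above.

```python
def claim_task(taskId, runId):
    """Hypothetical function that only knows about taskId and runId."""
    return f"{taskId}/{runId}"


old_message = {"taskId": "abc123", "runId": 0}
new_message = {"taskId": "abc123", "runId": 0, "hintId": "xyz"}

# Fragile: splatting the whole dict works only until a new key shows up.
before = claim_task(**old_message)
error = None
try:
    claim_task(**new_message)
except TypeError as exc:
    error = str(exc)

# Robust: pass only the fields the function needs, and ignore the rest.
claimed = claim_task(new_message["taskId"], new_message["runId"])
```

Passing the two fields explicitly means that extra properties added upstream are simply ignored.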

scriptworker 0.7.1

Scriptworker 0.8.0 made some non-backwards-compatible changes to its config format, and there may be more such changes in the near future. To simplify things for other people working on scriptworker, I suggested they stay on 0.7.0 for the time being if they wanted to avoid the churn.

To allow for this, I created a 0.7.x branch and released 0.7.1 off of it. Currently, 0.8.1 and 0.7.1 are the only two versions of scriptworker that will successfully poll Azure for tasks.


Tl;dr: I just shipped scriptworker 0.8.0 (changelog) (RTD) (github) (pypi).
This is a non-backwards-compatible release.


By design, taskcluster workers are very flexible and user-input-driven. This allows us to put CI task logic in-tree, which means developers can modify that logic as part of a try push or a code commit. This allows for a smoother, self-serve CI workflow that can ride the trains like any other change.

However, a secure release workflow requires certain tasks to be less permissive and more auditable. If the logic behind code signing or pushing updates to our users is purely in-tree, and the related checks and balances are also in-tree, the possibility of a malicious or accidental change being pushed live increases.

Enter scriptworker. Scriptworker is a limited-purpose taskcluster worker type: each instance can only perform one type of task, and validates its restricted inputs before launching any task logic. The scriptworker instances are maintained by Release Engineering, rather than the Taskcluster team. This separates roles between teams, which limits damage should any one user's credentials become compromised.
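
To make "validates its restricted inputs before launching any task logic" concrete, here is a minimal sketch of that idea. The allowed payload keys, the exception class, and `validate_task` are all assumptions for illustration; scriptworker's real validation is not shown here.

```python
class TaskValidationError(Exception):
    """Raised when a task does not match what this worker may run."""


def validate_task(task: dict, worker_type: str, allowed_payload_keys: frozenset) -> dict:
    """Reject tasks for other worker types or with unexpected payload keys."""
    if task.get("workerType") != worker_type:
        raise TaskValidationError(f"wrong workerType: {task.get('workerType')!r}")
    payload = task.get("payload", {})
    unexpected = sorted(set(payload) - allowed_payload_keys)
    if unexpected:
        raise TaskValidationError(f"unexpected payload keys: {unexpected}")
    return payload


# Hypothetical allowlist and tasks, for illustration only.
ALLOWED = frozenset({"upstreamArtifacts", "maxRunTime"})
good_task = {"workerType": "signing", "payload": {"maxRunTime": 600}}
bad_task = {"workerType": "signing", "payload": {"command": ["rm", "-rf", "/"]}}

payload = validate_task(good_task, "signing", ALLOWED)

rejection = None
try:
    validate_task(bad_task, "signing", ALLOWED)
except TaskValidationError as exc:
    rejection = str(exc)
```

The point of the allowlist is that anything the worker does not explicitly expect gets rejected before any task logic runs.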

scriptworker 0.8.0

The past several releases have included changes involving the chain of trust. Scriptworker 0.8.0 is the first release that enables gpg key management and chain of trust signing.

An upcoming scriptworker release will enable upstream chain of trust validation. Once enabled, scriptworker will fail fast on any task or graph that doesn't pass the validation tests.

Page generated Feb. 27th, 2017 06:48 am