escapewindow | Recent Entries

Premise (Tl;dr)

A federated protocol for automation platforms could usher in a new era of collaboration between open-source projects, corporations, NGOs, and governments. This collaboration could happen, not at the latency of human handoffs, but at the speed of automation.

(I had decided to revive this idea in a blog post before the renewed interest in Mastodon, but the timing is good. I had also debated whether to post this on US election day, but it may be a welcome distraction?)

Background

Once upon a time, an excited computer lab assistant showed my class the world wide web. Left-aligned black text with blue, underlined hypertext on a grey background, interspersed with low-resolution GIFs. Sites, hosted on other people's computers across the country, transferred across analog phone lines at over a thousand baud. "This," he said. "This will change everything."

Some two decades later, I blogged about blue sky, next-gen Release Engineering infrastructure without knowing how we'd get there. Stars aligned, and many teams put in hard work. Today, most of our best ideas made it into taskcluster, the massively scalable, cloud-agnostic automation platform that runs Mozilla's CI and Release Pipelines.

Firefox CI in January 2019: 7,139,432 tasks; 259.0 compute years; 1,071,533 unique workers; 383,411 maximum concurrent tasks #Mozilla #ContinuousIntegration #Taskcluster
— Chris Cooper (@ccooper) February 6, 2019

The still-unimplemented idea that's stuck with me the longest is something I referred to as cross-cluster communication.

Simple cases

In the simplest case, what if you could spin up two Taskcluster instances, and one cluster's tasks could depend on the other cluster's tasks and artifacts? Task 2 on Cluster B could remain unscheduled until Task 1 on Cluster A finished. At that point, Task 2 could move to `pending`, then `running`, and use Task 1's artifacts as input.

We might have an upstream dependency that tends to break everything in our pipeline whenever a new release of that dependency ships. Our current solution might involve pinning this dependency and periodically bumping the pin, debugging and bisecting any bustage at that time. But what if we could fire a set of unit and integration tests, not just when the dependency ships a new release, but whenever their CI runs a build off of trunk? We could detect the breaking change much more easily and quickly. Cross-cluster communication would allow for this cross-cluster dependency while leaving the security, billing, and configuration decisions in the hands of the individual cluster owners.

Or we could split up the FirefoxCI cluster. The release pipeline could move to a locked-down cluster for enhanced security monitoring. In contrast, the CI pipeline could remain accessible for easier debugging. We wouldn't want to split the hardware test pools. We have a limited number of those machines. Instead, we could create a third cluster with the hardware test pools, triggering tests against builds generated by both upstream clusters.

Of course, we wouldn't need this layer if we only wanted to connect to one or two upstreams. This concept starts to shine when we scale up. Many-to-many.

Many-to-many

Open source projects could collaborate in ways we haven't yet seen. But this isn't limited to just open source.

If the US were using Taskcluster, municipal offices could collaborate, each owning their own cluster but federating with the others. The state could aggregate each and generate state-wide numbers and reports. The federal government could aggregate the states' data and federate with other nations. NGOs and corporations could also access public data. Traffic. Carbon. The disappearance and migration of wildlife. The spread of disease outbreaks.

A graph of cluster interdependencies. A matrix. A web, if you will. But instead of connecting machines or individuals, we're connecting automation platforms. Instead of cell-to-cell communication this is organ-to-organ communication.

(As I mentioned in the Tl;dr): A federated protocol for automation platforms could usher in a new era of collaboration between open-source projects, corporations, NGOs, and governments. This collaboration could happen, not at the latency of human handoffs, but at the speed of automation.

Current Mood: anxious

One year ago today, I rejoined Mozilla. I'm glad to be back :) Some highlights:

taskcluster-client.py: I added py3 support, pre-generated code, and async support.
scriptworker 0.1.0 through 2.1.0.
signingscript to run signing tasks in scriptworker
a py3-compatible, standalone signtool
docker-signing-server for testing purposes
dephash, for python dependency hashing
configman py3 support, a project I had started in 2015
chain of trust generation + verification
helped get linux and android nightlies to tier1 in taskcluster
I wrote a post about 100% test coverage
I wrote a post about generically packaged tools

There are already some cool things on the horizon for this next year, my [2nd] second year. I'm looking forward to it.

Since my last blog post, we've released seven more 1.0.0bX betas and a 2.0.0 final.

Since then, we've added beetmover-, balrog- and pushapk- scriptworker instance types, with chain of trust support, and upgraded them off of the now-retired 0.7.x branch. We now have a live_backing.log for easier treeherder log viewing. Our configs are now recursively frozen for more immutable goodness, and we have an unfreeze function as well. We're now running scriptworker instances against tier1 linux and android Firefox nightlies (and developer edition). And we have more contributors and contributions, including two releases pushed by jlund and jlorenzo.

Why 2.0.0? First, we introduced some backwards incompatible changes , and decided that the spirit of semver rule 5 included 1.0.0 betas. Why not 2.0.0b1? We're in production, tier 1, so let's stop futzing with betas and call it 2.0.0. The major version should be incrementing fairly rapidly, since we have a number of changes in the pipeline that may be backwards incompatible. Skipping 1.0.0 and getting used to larger major version numbers seems like a good first step.

Thanks Johan and Jordan, Mihai for the beetmover and balrog work, Pankaj Ahuja for the recursive freeze/unfreeze functions, and a bunch of other people on the Releng and Taskcluster teams for all their help!

Tl;dr: I just shipped scriptworker 1.0.0b1 (changelog) (github) (pypi).
This enables chain of trust verification for signing scriptworkers.

chain of trust

As I mentioned before, scriptworkers allow for more control and auditability around sensitive release-oriented tasks in Taskcluster. The Chain of Trust allows us to trace requests back to the tree and verify each previous task in the chain.

We have been generating Chain of Trust artifacts for a while now. These are gpg-signed json blobs with the task definition, artifact shas, and other information needed to verify the task and follow the chain back to the tree. However, nothing has been verifying these artifacts until now.

With the latest scriptworker changes, scriptworker follows and verifies the chain of trust before proceeding with its task. If there is any discrepancy in the verification step, it marks the task invalid before proceeding further. This is effectively a second factor to verify task request authenticity.

scriptworker 1.0.0b1

1.0.0b1 is largely two pull requests: scriptworker.yaml, which allows for more complex, commented config, and chain of trust verification, which grew a little large (275k patch !).

This is running on signing scriptworkers which sign nightlies on date-branch. We still need to support and update the other scriptworker instance types to enable end-to-end chain of trust verification.

It's been a while since I released dephash. Long enough that pip 9.0.1 and hashin 0.7.1 were released... I had to land some fixes to support those.

We now have dephash 0.3.0! (github) (changelog) (pypi)

Also, I'm now watching the github repo, so I should see any new issues or pull requests =\

I was planning for the 0.9.0 release to revolve around Chain of Trust verification. However, some pexpect async issues reared their ugly head. The fix is in scriptworker 0.9.0 (changelog) (github) (pypi) ; Chain of Trust verification will land in the next release, likely 1.0.0.

scriptworker 0.9.0

While working on the chain of trust verification code, I noticed that more than half the time I'd hit async pexpect errors during testing (we used async pexpect to sign gpg keys in the background).

This was just a personal annoyance until bug 1311111 - please start landing docker-worker pubkeys in gpg repo landed, and production signing scriptworker instances barfed on async pexpect errors.

The solution either called for fixing the upstream bug, or pulling the gpg homedir creation/rebuild out into its own process. We opted for the latter solution; so far this seems to be working much more smoothly.

Tl;dr: I shipped scriptworker 0.8.2 (changelog) (github) (pypi) and scriptworker 0.7.2 (changelog) (github) (pypi) last Monday (Oct 24), and am only now getting around to blogging about them.

These are patch releases, and fix the polling loop.

scriptworker 0.8.2

The fix for bug 1310120 - puppet reinstalls scriptworker on every run exposed some scriptworker loop bugs: because puppet was restarting scriptworker regularly, we never had a long-running instance before.

:kmoir and :Callek noticed that signing scriptworker wasn't signing (nagios alerts on stuck queues are pending =\ ). It was clear that while git polling was continuing on its merry way, the task polling was dying fairly quickly. We also needed more logging around fatal exceptions.

I addressed these issues in scriptworker 0.8.2. We also have our third scriptworker contributor, :jlorenzo! (:jlund was #2)

scriptworker 0.7.2

Since we already had the 0.7.1 release to help other scriptworker instance types from having to deal with the churn from pre-1.0 changes, I backported the 0.8.2 fixes to the 0.7.x branch and released 0.7.2 off of it. This involved enough merge conflicts that I'm hoping to avoid too many more 0.7.x releases.

Tl;dr: I just shipped scriptworker 0.8.1 (changelog) (github) (pypi) and scriptworker 0.7.1 (changelog) (github) (pypi)
These are patch releases, and are currently the only versions of scriptworker that work.

scriptworker 0.8.1

The json, embedded in the Azure XML, now contains a new property, hintId. Ideally this wouldn't have broken anything, but I was using that json dict as kwargs, rather than explicitly passing taskId and runId. This means that older versions of scriptworker no longer successfully poll for tasks.

This is now fixed in scriptworker 0.8.1.

scriptworker 0.7.1

Scriptworker 0.8.0 made some non-backwards-compatible changes to its config format, and there may be more such changes in the near future. To simplify things for other people working on scriptworker, I suggested they stay on 0.7.0 for the time being if they wanted to avoid the churn.

To allow for this, I created a 0.7.x branch and released 0.7.1 off of it. Currently, 0.8.1 and 0.7.1 are the only two versions of scriptworker that will successfully poll Azure for tasks.

Tl;dr: I just shipped scriptworker 0.8.0 (changelog) (RTD) (github) (pypi).
This is a non-backwards-compatible release.

background

By design, taskcluster workers are very flexible and user-input-driven. This allows us to put CI task logic in-tree, which means developers can modify that logic as part of a try push or a code commit. This allows for a smoother, self-serve CI workflow that can ride the trains like any other change.

However, a secure release workflow requires certain tasks to be less permissive and more auditable. If the logic behind code signing or pushing updates to our users is purely in-tree, and the related checks and balances are also in-tree, the possibility of a malicious or accidental change being pushed live increases.

Enter scriptworker. Scriptworker is a limited-purpose taskcluster worker type: each instance can only perform one type of task, and validates its restricted inputs before launching any task logic. The scriptworker instances are maintained by Release Engineering, rather than the Taskcluster team. This separates roles between teams, which limits damage should any one user's credentials become compromised.

scriptworker 0.8.0

The past several releases have included changes involving the chain of trust. Scriptworker 0.8.0 is the first release that enables gpg key management and chain of trust signing.

An upcoming scriptworker release will enable upstream chain of trust validation. Once enabled, scriptworker will fail fast on any task or graph that doesn't pass the validation tests.

Tl;dr: I reached 100% test coverage (as measured by coverage) on several of my projects, and I'm planning on continuing that trend for the important projects I maintain. We were chatting about this, and I decided to write a blog post about how to get to 100% test coverage, and why.

Why 100% test coverage?
How did you reach 100% test coverage [in python]?

Why 100% test coverage?

Find unreachable code

Back in 2014, Apple issued a critical security update for iOS, to patch a bug known as #gotofail.

static OSStatus
SSLVerifySignedServerKeyExchange(SSLContext *ctx, bool isRsa, SSLBuffer signedParams,
                                 uint8_t *signature, UInt16 signatureLen)
{
    OSStatus err;
    ...

    if ((err = SSLHashSHA1.update(&hashCtx, &serverRandom)) != 0)
        goto fail;
    if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0)
        goto fail;
        goto fail;
    if ((err = SSLHashSHA1.final(&hashCtx, &hashOut)) != 0)
        goto fail;
    ...

fail:
    SSLFreeBuffer(&signedHashes);
    SSLFreeBuffer(&hashCtx);
    return err;
}

The second highlighted goto fail caused SSLVerifySignedServerKeyExchange calls to jump to the fail block with a successful err. This bypassed the actual signature verification and returned a blanket success. With this bug in place, malicious servers would have been able to serve SSL certs with mismatched or missing private keys, and still look legit.

One might argue for strict indentation, using curly braces for all if blocks, better coding practices, or just Better Reviews. But 100% test coverage would have found it.

"For the bug at hand, even the simplest form of coverage analysis, namely line coverage, would have helped to spot the problem: Since the problematic code results from unreachable code, there is no way to achieve 100% line coverage.

"Therefore, any serious attempt to achieve full statement coverage should have revealed this bug."
- Arie van Deursen

Even though python has meaningful indentation, that doesn't mean it's foolproof against unreachable code. Simply marking each line as visited is enough to protect against this class of bug.

And even when there isn't a huge security issue at stake, unreachable code can clutter the code base. At best, it's dead weight, extra lines of code to work around, to refactor, to have to read when debugging. At worst, it's confusing, misleading, and hides serious bugs. Don't do unreachable code.

Uncovering bugs while writing tests

My first attempt at 100% test coverage was on a personal project, on a whim. I knew all my code was good, but full coverage was something I had never seen. It was a lark. A throwaway bit of effort to see if I could do it, and then something I could mention offhand. "Oh yeah, and by the way, it has 100% test coverage."

Imagine my surprise when I uncovered some non-trivial bugs.

This has been the norm. Every time I put forward a sustained effort to add test coverage to a project, I uncover some things that need fixing. And I fix them. Even if its as small as a confusing variable name or a complex bit of logic that could use a comment or two. And often it's more than that.

Certainly, if I'm able to find and fix bugs while writing tests, I should be able to find and fix those bugs when they're uncovered in the wild, no? The answer is yes. But. It's a difference of headspace and focus.

When I'm focused on isolating a discrete chunk of code for testing, it's easier to find wrong or confusing behavior than when when I'm deploying some large, complex production process that may involve many other pieces of software. There are too many variables, too little headspace for the problem, and often too little time. And Murphy says those are the times these problems are likely to crop up. I'd rather have higher confidence at those times. Full test coverage helps reduce those variables when the chips are down.

A helpful guide for future changes

Software is the proverbial shark: it's constantly moving, or it dies. Small patches are pretty easy to eyeball and tell if they're good. Well written and maintained code helps here too, as does documentation, mentoring, and code reviews. But only tests can verify that the code is still good, especially when dealing with larger patches and refactors.

Tests can help new contributors catch problems before they reach code review or production. And sometimes after a long hiatus and mental context switch, I feel like a new contributor to my own projects. The new contributor you are helping may be you in six months.

Sometimes your dependencies release new versions, and it's nice to have a full set of tests to see if something will break before rolling anything out to production. And when making large changes or refactoring, tests can be a way of keeping your sanity.

whenever someone else's test prevents me from landing new code with a bug, i'm hella grateful pic.twitter.com/s1Yq6Es4ex
— Adrienne Porter Felt (@__apf__) August 26, 2016

Momentum is contagious

There's the social aspect of codebase maintenance. If you want to ask a new contributor to write tests for their code, that's an easier ask when you're at 100% coverage than when you're at, say, 47%.

This can also carry over to your peers' and coworkers' projects. Enough momentum, and you can build an ecosystem of well-tested projects, where the majority of all bugs are caught before any code ever lands on the main repo.

Technical wealth

Have you read Forget Technical Debt - Here's How to Build Technical Wealth? It's well worth the read.

Automated testing and self-validation loom large in that post. She does caution against 100% test coverage as an end in itself in the early days of a startup, for example, but that's when time to market and profitability are opportunity costs. When developing technical wealth at an existing enterprise, the long term maintenance benefits speak for themselves.

How did you reach 100% test coverage [in python]?

Clean architecture

If you haven't watched Clean Architecture in Python, add it to your list. It's worth the time. It's the Python take on Uncle Bob Martin's original talk. Besides making your code more maintainable, it makes your code more testable. When I/O is cleanly separated from the logic, the logic is easily testable, and the I/O portions can be tested in mocked code and integration tests.

parse_args()

Some people swear by click. I'm not quite sold on it, yet, but it's easy to want something other than optparse or argparse when faced with a block of option logic like in this main() function. How can you possibly sanely test all of that, especially when right after the option parsing we go into the main logic of the script?

I pulled all of the optparse logic into its own function, and I called it with sys.argv[:1]. That let me start testing optparse tests, separate from my signing tests. Signtool is one of those projects where I haven't yet reached 100% coverage, but it's high on my list.

Toss `if name == 'main':`

if __name__ == '__main__': is one of the first things we learn in python. It allows us to mix script and library functionality into the same file: when you import the file, __name__ is the name of the module, and the block doesn't execute, and when you run it as a script, __name__ is set to __main__, and the block executes. (More detailed descriptions here.)

There are ways to skip these blocks in code coverage. That might be the easiest solution if all you have under if __name__ == '__main__': is a call to main(). But I've seen scripts where there are pages of code inside this block, all code-coverage-unfriendly.

I forget where I found my solution. This answer suggests __name__ == '__main__' and main(). I rewrote this as

def main(name=None):
    if name in (None, '__main__'):
        ...


main(name=__name__)

Either way, you don't have to worry about covering the special if block. You can mock the heck out of main() and get to 100% coverage that way. (See the mock and integration section.)

Rewrite the logic

mimetypes.guess_type('foo.log') returns 'text/plain' on OSX, and None on linux. I fixed it like this:

content_type, _ = mimetypes.guess_type(path)
if content_type is None and path.endswith('.log'):
    content_type = 'text/plain'

And that works. You can get full coverage on linux: a "foo.log" will enter the if block, and a "foo.unknown_extension" will skip it. But on OSX you'll never satisfy the conditional; the content_type will never be None for a foo.log.

More ugly mocking, right? You could. But how about:

content_type, _ = mimetypes.guess_type(path)
if path.endswith('.log'):
    content_type = content_type or 'text/plain'

Coverage will mark that as covered on both linux and OSX.

@pytest.mark.parametrize

I used to use nosetests for everything, just because that's what I first encountered in the wild. When I first hacked on PGPy I had to get accustomed to pytest, but that was pretty easy. One thing that drew me to it was @pytest.mark.parametrize (which is an alternate spelling.)

With @pytest.mark.parametrize, you can loop through multiple inputs with the same logic, without having to write the loops yourself. Like this. Or, for the PGPy unicode example, this.

This is doable in nosetests, but pytest encourages it by making it easy.

fixtures

In nosetests, I would often write small functions to give me objects or data structures to test against. That, or I'd define constants, and I'd be careful to avoid changing any mutable constants, e.g. dicts, during testing.

pytest fixtures allow you to do that, but more simply. With a @pytest.fixture(scope="function"), the fixture will get reset every function, so even if a test changes the fixture, the next test will get a clean copy.

There are also built-in fixtures, like tmpdir, mocker, and event_loop, which allow you to more easily and succintly perform setup and cleanup around your tests. And there are lots of additional modules which add fixtures or other functionality to pytest.

Here are slides from a talk on advanced fixture use.

mock and integration

It's certainly possible to test the I/O and loop-laden main sections of your project. For unit tests, it's a matter of mocking the heck out of it. Pytest's mocker fixture helps here, although it's certainly possible to nest a bunch of with mock.patch blocks to avoid mocking that code past the end of your test.

Even if you have to include some I/O in a function, that doesn't mean you have to use mock to test it. Instead of

def foo():
    requests.get(...)

you could do

def foo(request_function=requests.get):
    request_function(...)

Then when you unit test, you can override request_function with something that raises the exception or returns the value that you want to test.

Finally, even after you get to 100% coverage on unit tests, it's good to add some integration testing to make sure the project works in the real world. For scriptworker, these tests are marked with @pytest.mark.skipif(os.environ.get("NO_TESTS_OVER_WIRE"), reason=SKIP_REASON), so we can turn them off in Travis.

Use pytest!

Besides parameterization and the mocker fixture, pytest is easily superior to nosetests.

There's parallelization with pytest-xdist. The fixtures make per-test setup and cleanup a breeze. Pytest is more succinct: assert x == y instead of self.assertEquals(x, y) ; because we don't have the self to deal with, the tests can be functions instead of methods. The following

with pytest.raises(OSError):
    function(*args, **kwargs)

is a lot more legible than self.assertRaises(OSError, function, *args, **kwargs) . You've got @pytest.mark.asyncio to await a coroutine; the event_loop fixture is there for more asyncio testing goodness, with auto-open/close for each test.

Miscellaneous hints

Use tox, coveralls, and travis or taskcluster-github for per-checkin test and coverage results.

Use coverage>=4.2 for async code. Pre-4.2 can't handle the async.

Use coverage's branch coverage by setting branch = True in your .coveragerc.

I haven't tried hypothesis or other behavior- or property- based testing yet. But hypothesis does use pytest. There's an issue open to integrate them more deeply.

It looks like I need to learn more about mutation testing.

What other tricks am I missing?

Mozilla RelEng uses a tool called signtool for its signing. Historically, this has been part of the build/tools repo, which we've been wanting to move away from.

I recently ported build-tools to py3, essentially porting the libraries and unittests but ignoring most of the scripts. Signing in a py3 scriptworker was the impetus behind this. However, after the port, I noticed signtool stopped working; I suspect I had hacked my virtualenv to debug, and something about that got it working in py3.

After getting permission to hard fork signtool, I ported it to use requests, and refactored it a bit to make it a little easier to test. There are still some overly complex functions and I only have 65% test coverage right now, but it's working in py3.

The code is here and the latest package is here.

Back in 2015 I tried porting configman to python 3, but then I hit a wall with the DotDict __getattr__ hack. I wrote about that here.

For some reason I took another stab at it last week, after leveling up my python3 porting skills. I got the tests green, submitted a PR, which is now merged and released. Woot!

discovering pip tools

Last week, I wrote a long blog draft about python package pinning. Then I found this. It's well written, and covers many of the points I wanted to make. The author perfectly summarizes the divide between package development and package deployment to production:

Don't pin by default when you're building libraries! Only use pinning for end products.

So I dug deeper. pip-tools #303 perfectly describes the problem I was trying to solve:

Given a minimal dependency list,
generate a full, expanded dependency list, and

pin that full dependency list by version and hash.

Fixing pip-tools #303 and upstreaming seemed like the ideal solution.

~~However, pip-tools is broken with pip 8.1.2.~~ The Python Packaging Authority (PyPA) states that pip's public API is the CLI, and pip-tools could potentially break with every new pip patch release. This is solvable by either using pypa/packaging directly, or switching pip-tools to use the CLI. That's considerably more work than just integrating hashing capability into pip-tools. ([EDIT] pip-tools now works with pip 8.1.2, but shoehorning hashes into it is a non-trivial task. I do hope someone tackles it though.)

But I had already whipped up a quick'n'dirty python script that used the pip CLI. (I had assumed that bypassing the internal API was a hack, but evidently this is the supported way of doing things.) So, back to the original blog post, but much shorter:

dephash gen

dephash gen takes a minimal requirements file, and generates an expanded dependency list, pinned by version and hash.

$ cat requirements-dev.txt

requests
arrow

$ dephash gen requirements-dev.txt > requirements-prod.txt

$ cat requirements-prod.txt

# Generated from dephash.py + hashin.py
arrow==0.8.0 \
    --hash=sha512:b6c01970d408e1169d042f593859577...
python-dateutil==2.5.3 \
    --hash=sha512:413b935321f0a65fd8e8ba49990acd5... \
    --hash=sha512:d8e28dad57ea85663962f4518faea0e... \
    --hash=sha512:107ff2eb6f0715996061262b70844ec...
requests==2.10.0 \
    --hash=sha512:e5b7d20c4d692b2655c13fa177b8462... \
    --hash=sha512:05c6a1a742d31511ca4d3f39534e3e0...
six==1.10.0 \
    --hash=sha512:a41b40b720c5267e4a47ffb98cdc792... \
    --hash=sha512:9a53b7bc8f7e8b358c930eaecf91cc5...

Developers can work against requirements-dev.txt, with the latest available dependencies. At the same time, production can be pinned against specific package versions+hashes for stability and security.

dephash outdated

dephash outdated PATH checks whether PATH contains outdated packages. PATH can be a requirements file or virtualenv.

$ cat requirements-outdated.txt

six==1.9.0

$ dephash outdated requirements-outdated.txt

Found outdated packages in requirements-outdated.txt:
six (1.9.0) - Latest: 1.10.0 [wheel]

or,

$ virtualenv -q venv

$ venv/bin/pip install -q -r requirements-outdated.txt

$ dephash outdated venv

Found outdated packages in venv:
six (1.9.0) - Latest: 1.10.0 [wheel]

This just uses pip list --outdated on the backend. I'm tentatively thinking a whitelist of known-outdated dependencies might help here, but I haven't written it yet.

wrapup

I still think the glorious future involves fixing pip-tools #303 and getting pip-tools pointed at a supported pypa API. And/or getting hashin or pip-tools upstreamed into pip. But in the meantime, there's dephash.

(I'm leaving my package vendoring musings and python package wishlist for future blogpost(s).)

tl;dr
internal benefits to generically packaged tools
learning from history: tinderbox
mozharness and scriptharness
treeherder and taskcluster
other tools

tl;dr

One of the reasons I'm back at Mozilla is to work in-depth with some exciting new tools and infrastructure, at scale. However, I wish these tools could be used equally well by employees and non-employees. I see some efforts to improve this. But if this is really important to us, we need to make it a point of emphasis.

If we do this, we can benefit from a healthier, extended community. There are also internal benefits to making our tools packaged in a generic way. I'll go into these in the next section.

I did start to contact some tool maintainers, and so far the response is good. I'll continue doing so. Hopefully I can write a followup blog post about how efforts are under way to make generically packaged tools a reality.

internal benefits to generically packaged tools

Besides the strengthened community, there are other internal benefits.

upgrades
Once installation is packaged and automated, an upgrade to a service might be:
- spin up a new service
- test it
- send over some traffic (applicable if the service is load balanced)
- go/no-go
- cut over to the appropriate service and turn off the other one.
This entire process can be fully automated. Once this process is smooth enough, upgrading a service can be seamless and relatively worry free.
disaster recovery
If a service is only installable manually, a disaster recovery scenario might involve people working around the clock to reinstall a service.
Once the installation is automated and configurable, this changes. A cold backup solution might be similar to the above upgrade scenario. If disaster strikes, have someone install a new one from the automation, or have a backup instance already installed, ready for someone to switch over.
A hot backup solution might involve having multiple load balanced services running across regions, with automatic failovers. The automated install helps guarantee that each node in the cluster is configured correctly, without human error.
good first bugs
(or intern projects, or GSOC projects, or...)
The more special-snowflake and Mozilla-specific our tools are, the more likely the tool will be tied closely to other Mozilla-specific services, so a seemingly simple change might require touching many different codebases. These types of tools are also more likely to require VPN or special LDAP access that present barriers to new contributors.
If a new contributor is able to install tools locally, that guarantees that they can work on standalone bugs/projects involving those tools. And successful good first bugs and intern/GSOC type projects directly lead to a stronger contributor base.
special projects
At various team work weeks years past, we brainstormed being able to launch entire chunks of infrastructure up in self-contained units. These could handle project branch type work. When the code was merged back into trunk, we could archive the data and shut down the instances used.
This concept also works for special projects: things that don't fit within the standard workflow.
If we can spin up services in a separate, network isolated area, riskier or special-requirement work (whether in terms of access control, user permissions, partner secrets, etc) could happen there without affecting production.
self-testing
Installing the package from scratch is the test for the generic packaging feature. The more we install it, the smaller the window of changes we need to inspect for installation bustage. This is the same as any other software feature.
Having an install test for each tool gives us reassurances that the next time we need to install the service (upgrade, disaster recovery, etc.) it'll work.

learning from history: tinderbox

In 2000, a developer asked me to install tinderbox, a continuous integration tool written and used at Netscape. It would allow us see the state of the tree, and minimize bustage.

One of the first things I saw was this disclaimer:

This is not very well packaged code.  It's not packaged at all.  Don't
come here expecting something you plop in a directory, twiddle a few
things, and you're off and using it.  Much work has to be done to get
there.  We'd like to get there, but it wasn't clear when that would be,
and so we decided to let people see it first.

Don't believe for a minute that you can use this stuff without first
understanding most of the code.

I managed to slog through the steps and get a working tinderbox/bonsai/mxr install up and running. However, since then, I've met a number of other people who had tried and given up.

I ended up joining Netscape in 2001. (My history with tinderbox helped me in my interview.) An external contributor visited and presented tinderbox2 to the engineering team. It was configurable. It was modular. It removed Netscape-centric hardcodes.

However, it didn't fully support all tinderbox1 features, and certain default behaviors were changed by design. Beyond that, Netscape employees already had fully functional, well maintained instances that worked well for us. Rather than sinking time into extending tinderbox2 to cover our needs, we ended up staying with the disclaimered, unpackaged tinderbox1. And that was the version running at tinderbox.mozilla.org, until its death in May 2014.

For a company focused primarily on shipping a browser, shipping the tools used to build that browser isn't necessarily a priority. However, there were some opportunity costs:

Tinderbox1 continued to suffer from the same large barrier of entry, stunting its growth.
I don't know how widely tinderbox2 was used, but I imagine adoption at Netscape would have been a plus for the project. (I did end up installing tinderbox2 post-Netscape.)
A larger, healthier community could have result in upstreamed patches, and a stronger overall project in the long run.
People who use the same toolset may become external contributors or employees to the project in general (like me). People who have poor impressions of a toolset may be less interested in joining as contributors or employees.

mozharness and scriptharness

In my previous stint at Mozilla, I wrote mozharness, which is a python framework for scripts.

I intentionally kept mozilla-specific code under mozharness.mozilla and generic mozharness code under mozharness.base. The idea was to make it easier for external users to grab a copy of mozharness and write their own scripts and modules for a non-Mozilla project. (By "non-Mozilla" and "external user", I mean anyone who wants to automate software anywhere.)

However, after I left Mozilla, I didn't use mozharness for anything. Why not?

There's a non-trivial learning curve for people new to the project, and the benefits of adopting mozharness are most apparent when there's a certain level of adoption. I was working at time scales that didn't necessarily lend themselves to this.
I intentionally kept mozharness clone-and-run. I think this was the right model at the time, to lower the barrier for using mozharness until it had reached a certain level of adoption. Clone-and-run made it easier to use mozharness in buildbot, but makes it harder to install or use just the mozharness.base module.
We did our best to keep Mozilla-isms out of mozharness.base via review. However, this would have been more successful with either an external contributer speaking up before we broke their usage model, or automated tests, or both.

So even with the best intentions in mind, I had ended up putting roadblocks in the way of external users. I didn't realize their scope until I was fully in the mindset of an external user myself.

I wrote scriptharness to try to address these problems (among others):

I revisited certain decisions that made mozharness less approachable. The mixins and monolithic Script object. Requiring a locked config. Missing docstrings and tests.
I made scriptharness resources available at standard locations (generic packages on pypi, source at github, full docs at readthedocs).
Since it's a self-contained package, it's usable here or elsewhere. Since it's written to solve a generic problem rather than a Mozilla-specific problem, it's unencumbered by Mozilla-specific solutions.

I'd like to backport some of the better ideas from scriptharness to mozharness, to address some of these issues.

treeherder and taskcluster

After I left Mozilla, on several occasions we wanted to use other Mozilla tools in a non-Mozilla environment. As a general rule, this didn't work.

Continuous Integration (CI) Dashboard
We had multiple Jenkins servers, each with a partial picture of our set of build+test jobs. Figuring out the state of the code base was complex and a specialized skill. Why couldn't we have one dashboard showing a complete view?
I took a look at Treeherder. It has improved upon the original TBPL, but is designed to work specifically with Mozilla's services and workflows. I found it difficult to set up outside of a Mozilla environment.
CI Infrastructure
We were investigating other open source CI solutions. There are many solutions for server-side apps, or linux-only solutions, or cross-platform at small- to medium- scale. TaskCluster is the only one I know of that's cross-platform at massive scale.
When we looked, all the tutorials and docs had to do with using the existing Mozilla production instance, which required a mozilla.com email address at the time. There are no docs for setting up TaskCluster itself.
(Spoiler: I hear it may be a 2H project :D :D :D )
Single Sign-On
An open source, trusted SSO solution sounded like a good thing to implement.
However, we found out Persona has been EOL'd. We didn't look too closely at the implementation details after that.

(These are just the tools I tried to use in my 1 1/2 years away from Mozilla. There are more tools that might be useful to the outside world. I'll go into those in the next section.)

There are reasons behind each of these, and they may make a lot of sense internally. I'm not trying to place any blame or point fingers. I'm only raising the question of whether our future plans include packaging our tools for outside use, and if not, why not?

We're facing a similar decision to Netscape's. What's important? We can say we're a company that ships a browser, and the tools only exist for that purpose. Or we can put efforts towards making these tools useful in many environments. Generically packaged, with documentation that doesn't start with a disclaimer about how difficult they are to set up.

It's easy to say we'd like to, but we're too busy with ______. That's the gist of the tinderbox disclaimer. There are costs to designing and maintaining tools for use outside of one's subset of needs. But as long as packaging tools for outside use is not a point of emphasis, we'll maintain the status quo.

other tools

The above were just the tools that we tried to install. I asked around and built a list of Mozilla tools that might be useful to the outside world. I'm not sure if I have all the details correct; please correct me if I'm wrong!

mach - if all the mozilla-central-specific functions were moved to libraries, could this be useful for others?
bughunter - I don't know enough to say. This looks like a crash/assertion finder, tying into Socorro and bugzilla.
balrog - this now has docker support, which is promising for potential outside use.
marionette (already used by others)
reftest (already used by others)
pulse - this is a taskcluster dep.
Bugzilla - I've seen lots of instances successfully used at many other companies. Its installation docs are here.
I also hear that Socorro is successfully used at a number of other companies.

So we already have some success here. I'd love to see it extended -- more tools, and more use cases, e.g. supporting bugzilla or jira as the bug db backend when applicable.

I don't know how much demand there will be, if we do end up packaging these tools in a way that others can use them. But if we don't package them, we may never know. And I do know that there are entire companies built around shipping tools like these. We don't have to drop any existing goals on the floor to chase this dream, but I think it's worth pursuing in the future.

A few people have suggested I look at other packages for config solutions. I thought I'd record some of my thoughts on the matter. Let's look at requirements first.

Requirements

Commandline argument support. When running scripts, it's much faster to specify some config via the commandline than always requiring a new config file for each config change.
Default config value support. If a script assumes a value works for most cases, let's make it default, and allow for overriding those values in some way.
Config file support. We need to be able to read in config from a file, and in some cases, several files. Some config values are either too long and unwieldy to pass via the commandline, and some config values contain characters that would be interpreted by the shell. Plus, the ability to use diff and version control on these files is invaluable.
Multiple config file type support. json, yaml, etc.
Adding the above three solutions together. The order should be: default config value -> config file -> commandline arguments. (The rightmost value of a configuration item wins.)
Config definition and validation. Commandline options are constrained by the options that are defined, but config files can contain any number of arbitrary key/value pairs.
The ability to add groups of commandline arguments together. Sometimes familes of scripts need a common set of commandline options, but also need the ability to add script-specific options. Sharing the common set allows for consistency.
The ability to add config definitions together. Sometimes families of scripts need a common set of config items, but also need the ability to add script-specific config items.
Locking and/or logging any changes to the config. Changing config during runtime can wreak havoc on the debugability of a script; locking or logging the config helps avoid or mitigate this.
Python 3 support, and python 2.7 unicode support, preferably unicode-by-default.
Standardized solution, preferably non-company and non-language specific.
All-in-one solution, rather than having to use multiple solutions.

Packages and standards

argparse

Argparse is the standardized python commandline argument parser, which is why configman and scriptharness have wrapped it to add further functionality. Its main drawbacks are lack of config file support and limited validation.

Commandline argument support: yes. That's what it's written for.
Default config value support: yes, for commandline options.
Config file support: no.
multiple config file type support: no.
Adding the above three solutions together: no. The default config value and the commandline arguments are placed in the same Namespace, and you have to use the parser.get_default() method to determine whether it's a default value or an explicitly set commandline option.
Config definition and validation: limited. It only covers commandline option definition+validation, and there's the required flag but not a if foo is set, bar is required type validation. It's possible to roll your own, but that would be script-specific rather than part of the standard.
Adding groups of commandline arguments together: yes. You can take multiple parsers and make them parent parsers of a child parser, if the parent parsers have specified add_help=False
Adding config definitions together: limited, as above.
The ability to lock/log changes to the config: no. argparse.Namespace will take changes silently.
Python 3 + python 2.7 unicode support: yes.
Standardized solution: yes, for python. No for other languages.
All-in-one solution: no, for the above limitations.

configman

Configman is a tool written to deal with configuration in various forms, and adds the ability to transform configs from one type to another (e.g., commandline to ini file). It also adds the ability to block certain keys from being saved or output. Its argparse implementation is deeper than scriptharness' ConfigTemplate argparse abstraction.

Its main drawbacks for scriptharness usage appear to be lack of python 3 + py2-unicode-by-default support, and for being another non-standardized solution. I've given python3 porting two serious attempts, so far, and I've hit a wall on the dotdict __getattr__ hack working differently on python 3. My wip is here if someone else wants a stab at it.

Commandline argument support: yes.
Default config value support: yes.
Config file support: yes.
Multiple config file type support: yes.
Adding the above three solutions together: not as far as I can tell, but since you're left with the ArgumentParser object, I imagine it'll be the same solution to wrap configman as argparse.
Config definition and validation: yes.
Adding groups of commandline arguments together: yes.
Adding config definitions together: not sure, but seems plausible.
The ability to lock/log changes to the config: no. configman.namespace.Namespace will take changes silently.
Python 3 support: no. Python 2.7 unicode support: there are enough str() calls that it looks like unicode is a second class citizen at best.
Standardized solution: no.
All-in-one solution: no, for the above limitations.

docopt

Docopt simplifies the commandline argument definition and prettifies the help output. However, it's purely a commandline solution, and doesn't support adding groups of commandline options together, so it appears to be oriented towards relatively simple script configuration. It could potentially be added to json-schema definition and validation, as could the argparse-based commandline solutions, for an all-in-two solution. More on that below.

json-schema

This looks very promising for an overall config definition + validation schema. The main drawback, as far as I can see so far, is the lack of commandline argument support.

A commandline parser could generate a config object to validate against the schema. (Bonus points for writing a function to validate a parser against the schema before runtime.) However, this would require at least two definitions: one for the schema, one for the hopefully-compliant parser. Alternately, the schema could potentially be extended to support argparse settings for various items, at the expense of full standards compatiblity.

There's already a python jsonschema package.

Commandline argument support: no.
Default config value support: yes.
Config file support: I don't think directly, but anything that can be converted to a dict can be validated.
Multiple config file type support: no.
Adding the above three solutions together: no.
Config definition and validation: yes.
Adding groups of commandline arguments together: no.
Adding config definitions together: sure, you can add dicts together via update().
The ability to lock/log changes to the config: no.
Python 3 support: yes. Python 2.7 unicode support: I'd guess yes since it has python3 support.
Standardized solution: yes, even cross-language.
All-in-one solution: no, for the above limitations.

scriptharness 0.2.0 ConfigTemplate + LoggingDict or ReadOnlyDict

Scriptharness currently extends argparse and dict for its config. It checks off the most boxes in the requirements list currently. My biggest worry with the ConfigTemplate is that it isn't fully standardized, so people may be hesitant to port all of their configs to it.

An argparse/json-schema solution with enough glue code in between might be a good solution. I think ConfigTemplate is sufficiently close to that that adding jsonschema support shouldn't be too difficult, so I'm leaning in that direction right now. Configman has some nice behind the scenes and cross-file-type support, but the python3 and __getattr__ issues are currently blockers, and it seems like a lateral move in terms of standards.

An alternate solution may be BYOC. If the scriptharness Script takes a config object that you built from somewhere, and gives you tools that you can choose to use to build that config, that may allow for enough flexibility that people can use their preferred style of configuration in their scripts. The cost of that flexibility is familiarity between scriptharness scripts.

Commandline argument support: yes.
Default config value support: yes, both through argparse parsers and script initial_config.
Config file support: yes. You can define multiple required config files, and multiple optional config files.
Multiple config file type support: no. Mozharness had .py and .json. Scriptharness currently only supports json because I was a bit iffy about execfileing python again, and PyYAML doesn't always install cleanly everywhere. It's on the list to add more formats, though. We probably need at least one dynamic type of config file (e.g. python or yaml) or a config-file builder tool.
Adding the above three solutions together: yes.
Config definition and validation: yes.
Adding groups of commandline arguments together: yes.
Adding config definitions together: yes.
The ability to lock/log changes to the config: yes. By default Scripts use LoggingDict that logs runtime changes; StrictScript uses a ReadOnlyDict (sams as mozharness) that prevents any changes after locking.
Python 3 and python 2.7 unicode support: yes.
Standardized solution: no. Extended/abstracted argparse + extended python dict.
All-in-one solution: yes.

Corrections, additions, feedback?

As far as I can tell there is no perfect solution here. Thoughts?

I've been getting some good feedback about scriptharness 0.1.0; thank you. I listed the 0.2.0 highlights and changes in the 0.2.0 Release Notes, but wanted to mention a few things here.

First, mozharness' config had the flexibility of accepting any arbitrary key/value pairs from various sources (initial_config, commandline options, config files...). However, it wasn't always clear what each config variable was for, or if it was required, or if the config was valid. I filed bug 699343 back in 2011, but didn't know how to tackle it then. I believe I have the solution now, with ConfigTemplates.

Second, 0.1.0 was definitely lacking a run_command() and get_output_from_command() analogs. 0.2.0 has Command for just running+logging a command, ParsedCommand for parsing the output of a command, and Output for getting the output from a command, as well as run(), parse(), get_output(), and get_text_output() shortcut functions to instantiate the objects and run them for you. (Docs are here.) Each of these supports cross-platform output_timeouts and max_timeouts in both python 2.7 and python3, thanks to the multiprocessing module. As a bonus, I was able to add context line support to the ErrorLists for ParsedCommand. This was also a want since 2011.

I fleshed out some more documentation and populated the scriptharness issues with my todo list.

I think I know what I have in mind for 0.3.0, but feedback is definitely welcome!

0.2.0 Release Notes: http://python-scriptharness.readthedocs.org/en/stable/releasenotes/
PyPI: https://pypi.python.org/pypi/scriptharness/0.2.0
GitHub: https://github.com/scriptharness/python-scriptharness/releases/tag/v0.2.0
Read the Docs: http://python-scriptharness.readthedocs.org/en/stable

I found myself missing mozharness at various points over the past 10 months. Several things kept me from using it at my then-new job:

Even though we had kept mozharness.base.* largely non-Mozilla-specific, the mozharness clone-and-run model meant there was a lot of Mozilla-specific code that came along with it.
The monolithic BaseScript + mixins model had a very steep barrier of entry. There's a significant learning curve, and scripts need to be fully ported to mozharness to take advantage of its features.

I had wanted to address these issues for years, but never had time to devote fully to harness-specific development.

Now I do.

Introducing scriptharness 0.1.0:

Release Notes: http://python-scriptharness.readthedocs.org/en/stable/releasenotes/0.1.0/
- The release notes highlight differences between scriptharness and mozharness, as well as known issues.
PyPI: https://pypi.python.org/pypi/scriptharness/0.1.0
GitHub: https://github.com/scriptharness/python-scriptharness/releases/tag/v0.1.0
Read the Docs: http://python-scriptharness.readthedocs.org/en/stable/

I'm proud of this. I'm also aware it's not mature [yet], and it's currently missing some functionality.

There are some ideas I'd love to explore before 1.0.0:

multiple Script objects with threading and separate logs
Config Type Definitions
rethink how to enable/disable actions. I could keep mozharness' --add-action clobber --no-upload structure, or play with --actions +clobber -upload or something. (The - before the upload in the latter might cause argparse issues?)
also, explore Maven-style actions (all actions before target action are enabled) and actions with dependencies on other actions. I prefer keeping each action independent, idempotent, and individually targetable, but I can see someone wanting the other behavior for certain scripts.
I've already split out strings from code in a number of places, for unit testing. I'm curious what it would take to make scriptharness localizable, and if there would be demand for it.
ahal suggested adding structured logging; I'd love to investigate that.

I already have 0.2.0 on the brain. I'd love any feedback or patches.

Five years ago today, I landed the first mozharness commit in my user repo. (github)

starting something, or wasting my time. Log.py + a scratch trunk_nightly.json

The project had three initial goals:

First and foremost, I was tasked with building a multi-locale Fennec on Maemo. This was a more complex task than could sanely fit in a buildbot factory.
The Mozilla Releng team was already discussing pulling logic out of buildbot factories and into client-side scripts. I had been wanting to write a second version of my script framework idea. The first version was closed-source, perl, and very company- and product-specific. The second would improve on the first in every way, while keeping its three central principles of full logging, flexible config, and modular actions.
Finally, at that point I was still a Perl developer learning Python. I tend learn languages by writing a project from scratch in that new language; this was my opportunity.

Multi-locale Fennec became a reality, and then we started adding projects to mozharness, one by one.

As of last July, mozharness was the client-side engine for the majority of Mozilla's CI and release infrastructure. I still see plenty of activity in bugmail and IRC these days. I'll be the first to point out its shortcomings, but I think overall it has been a success.

Happy birthday, mozharness!

Today's my last day at Mozilla. It wasn't an easy decision to move on; this is the best team I've been a part of in my career. And working at a company with such idealistic principles and the capacity to make a difference has been a privilege.

Looking back at the past five-and-three-quarter years:

I wrote mozharness, a versatile scripting harness. I strongly believe in its three core concepts: versatile locking config; full logging; modularity.
- By my count, over 85% of our buildbot builders (build+test job types) are currently running through mozharness scripts. Given over 100k build+test jobs in a day, that's a sizeable amount. So many people across so many teams contributed to this, and we're not done yet. Once desktop builds are in mozharness and we discontinue Android 2.2 support, that number should be close to, if not over, 90%.
I helped FirefoxOS (b2g) ship, and it's making waves in the industry. Internally, the release processes are well on the path to maturing and stabilizing, and b2g is now riding the trains.
- Merge day: Releng took over ownership of merge day, and b2g increased its complexity exponentially.
  Listening to @escapewindow explain merge day processes is like looking Cthulhu in the eyes. Sanity draining away rapidly
  — Laura Thomson (@lxt) April 29, 2014
  I don't think it's quite that bad :) I whittled it down from requiring someone's full mental capacity for three out of every six weeks, to several days of precisely following directions.
- I rewrote vcs-sync to be more maintainable and robust, and to support gecko-dev and gecko-projects. Being able to support both mercurial and git across many hundreds of repos has become a core part of our development and automation, primarily because of b2g. The best thing you can say about a mission critical piece of infrastructure like this is that you can sleep through the night or take off for the weekend without worrying if it'll break. Or go on vacation for 3 1/2 weeks, far from civilization, without feeling guilty or worried.
I helped ship three mobile 1.0's. I learned a ton, and I don't think I could have gotten through it by myself; John and the team helped me through this immensely.
- On mobile, we went from one or two builds on a branch to full tier-1 support: builds and tests on checkin across all of our integration-, release-, and project- branches. And mobile is riding the trains.
- We Sim-shipped 5.0 on Firefox desktop and mobile off the same changeset. Firefox 6.0b2, and every release since then, was built off the same automation for desktop and mobile. Those were total team efforts.
- I will be remembered for the mobile pedalboard. When we talked to other people in the industry, this was more on-device mobile test automation than they had ever seen or heard of; their solutions all revolved around manual QA.
  
  (full set)
- And they are like effin bunnies; we later moved on to shoe rack bunnies, rackmounted bunnies, and now more and more emulator-driven bunnies in the cloud, each numbering in the hundreds or more. I've been hands off here for quite a while; the team has really improved things leaps and bounds over my crude initial attempts.
I brainstormed next-gen build infrastructure. I started blogging about this back in January 2009, based largely around my previous webapp+db design elsewhere, but I think my LWR posts in Dec 2013 had more of an impact. A lot of those ideas ended up in TaskCluster; mozharness scripts will contain the bulk of the client-side logic. We'll see how it all works when TaskCluster starts taking on a significant percentage of the current buildbot load :)

I will stay a Mozillian, and I'm looking forward to see where we can go from here!

[stating the problem]

Mozharness currently handles a lot of complexity. (It was designed to be able to, but the ideal is still elegantly simple scripts and configs.)

Our production-oriented scripts take (and sometimes expect) config inputs from multiple locations, some of them dynamic; and they contain infrastructure-oriented behavior like clobberer, mock, and tooltool, which don't apply to standalone users.

We want mozharness to be able to handle the complexity of our infrastructure, but make it elegantly simple for the standalone user. These are currently conflicting goals, and automating jobs in infrastructure often wins out over making the scripts user friendly. We've brainstormed some ideas on how to fix this, but first, some more details:

[complex configs]

A lot of the current complexity involves config inputs from many places:

buildbot-configs, through
- environment variables,
- command line options, and
- buildbot properties sent through buildprops.json
mozharness, through
- one or more mozharness config files (e.g. configs/b2g/releng-fota-eng.py),
- the script itself (e.g. b2g_build.py)
in-tree config files (e.g. testing/config/mozharness/linux_config.py or b2g/config/emulator/config.json), and
web-based resources, whether through hgweb or a service like mapper.

We want to lock the running config at the beginning of the script run, but we also don't want to have to clone a repo or make external calls to web resources during __init__(). Our current solution has been to populate runtime configs during one of our script actions, but then to support those runtime configs we have to check multiple config locations for our script logic. (self.buildbot_config, self.test_config, self.config, ...)

We're able to handle this complexity in mozharness, and we end up with a single config dict that we then dump to the log + to a json file on disk, which can then be reused to replicate that job's config. However, this has a negative effect on humans who need to either change something in the running configs, or who want to simplify the config to work locally.

[in-tree vs out-of-tree]

We also want some of mozharness' config and logic to ride the trains, but other portions need to be able to handle outside-of-tree processes and config, for various reasons:

some processes are volatile enough that they need to change across the board across all trees on a frequent basis;
some processes act across multiple trees and revisions, like the bumper scripts and vcs-sync;
some infrastructure-oriented code needs to be able to change across all processes, including historical-revision-based processes; and
some processes have nothing to do with the gecko tree at all.

[brainstorming solutions]

Part of the solution is to move logic out of mozharness. Desktop Firefox builds and repacks moving to mach makes sense, since they're

configurable by separate mozconfigs,
tasks completely shared by developers, and
completely dependent on the tree, so tying them to the tree has no additional downside.

However, Andrew Halberstadt wanted to write the in-tree test harnesses in mozharness, and have mach call the mozharness scripts. This broke some of the above assumptions, until we started thinking along the lines of splitting mozharness: a portion in-tree running the test harnesses, and a portion out-of-tree doing the pre-test-run machine setup.

(I'm leaning towards both splitting mozharness and using helper objects, but am open to other brainstorms at this point...)

[splitting mozharness]

In effect, the wrapper, out-of-tree portion of mozharness would be taking all of the complex inputs, simplifying them for the in-tree portion, and setting up the environment (mock, tooltool, downloads+installs, etc.); the in-tree portion would take a relatively simple config and run the tests.

We could do this by having one mozharness script call another. We'd have to fix the logging bug that causes us to double-log lines when we instantiate a second BaseScript, but that's not an insurmountable problem. We could also try execing the second script, though I'd want to verify how that works on Windows. We could also modify our buildbot ScriptFactory to be able to call two scripts consecutively, after the first script dynamically generates the simplified config for the second script.

We could land the portions of mozharness needed to run test harnesses in-tree, and leave the others out-of-tree. There will be some duplication, especially in the mozharness.base code, but that's changing less than the scripts and mozharness.mozilla modules.

We would be able to present a user-friendly "inner" script with limited inputs that rides the trains, while also allowing for complex inputs and automation-oriented setup beforehand in the "outer" script. We'd most likely still have to allow for automation support in the inner script, if there's some reporting or error checking or other automation task that's needed after the handoff, but we'd still be able to limit the complexity of that inner script. And we could wrap that inner script in a mach command for easy developer use.

[helper objects]

Currently, most of mozharness' logic is encapsulated in self. We do have helper objects: the BaseConfig and the ReadOnlyDict self.config for config; the MultiFileLogger self.log_obj that handles all logging; MercurialVCS for cloning, ADBDeviceHandler and SUTDeviceHandler for mobile device wrangling. But a lot of what we do is handled by mixins inherited by self.

A while back I filed a bug to create a LocalLogger and BaseHelper to enable parallelization in mozharness scripts. Instead of cloning 90 locale repos serially, we could create 10 helper objects that each clone a repo in parallel, and launch new ones as the previous ones finish. This would have simplified Armen's parallel emulator testing code. But even if we're not planning on running parallel processes, creating a helper object allows us to simplify the config and logic in that object, similar to the "inner" script if we split mozharness into in-tree and out-of-tree instances, which could potentially also be instantiated by other non-mozharness scripts.

Essentially, as long as the object has a self.log_obj, it will use that for logging. The LocalLogger would log to memory or disk, outside of the main script log, to avoid parallel log interleaving; we would use this if we were going to run the helper objects in parallel. If we wanted the helper object to stream to the main log, we could set its log_obj to our self.log_obj. Similarly with its config. We could set its config to our self.config, or limit what config we pass to simplify.

(Mozharness' config locking is a feature that promotes easier debugging and predictability, but in practice we often find ourselves trying to get around it somehow. Other config dicts, self.variables, editing self.config in _pre_config_lock() ... Creating helper objects lets us create dynamic config at runtime without violating this central principle, as long as it's logged properly.)

Because this "helper object" solution overlaps considerably with the "splitting mozharness" solution, we could use a combination of the two to great efficacy.

[functions and globals]

This idea completely alters our implementation of mozharness, by moving self.config to a global config, directly calling logging methods (or wrapped logging methods). By making each method a standalone function that's only slightly different from a standard python function, it lowers the bar for contribution or re-use of mozharness code. It does away with both the downsides and benefits of objects.

The first, large downside I see is this solution appears incompatible with the "helper objects" solution. By relying on a global config and logging in our functions, it's difficult to create standalone helpers that use minimized configs or alternate logging configurations. I also think the global logging may make the double-logging bug more prevalent.

It's quite possible I'm downplaying the benefit of importing individual functions like a standard python script. There are decorators to transform functions into class methods and vice versa, which might allow for both standalone functions and object-based methods with the same code.

[related links]

Jordan Lund has some ideas + wip patches linked from bug 753547 comment 6.

Andrew Halberstadt's Sharing code not always a good thing and How to deal with IFFY requirements

My mozharness core principles example scripts+configs and video

Lars Lohn's Crouching Argparse Hidden Configman. Afaict configman appears to solve similar problems to mozharness' BaseConfig, but Argparse requires python 2.7 and mozharness locks the config.

Profile

aki

escape(window)

November 2022

S	M	T	W	T	F	S
		1	2	3	4	5
6	7	8	9	10	11	12
13	14	15	16	17	18	19
20	21	22	23	24	25	26
27	28	29	30

Syndicate

Page Summary

Style Credit

Style: Plain for Tabula Rasa

Expand Cut Tags

No cut tags

Page generated Jun. 8th, 2025 01:18 am